A very good article that points out the importance of statistics in modern society.
The original link is:
http://www.nytimes.com/2009/08/06/technology/06stats.html?_r=2&hp
MOUNTAIN VIEW, Calif. — At Harvard, Carrie Grimes majored in anthropology and archaeology and ventured to places like Honduras, where she studied Mayan settlement patterns by mapping where artifacts were found. But she was drawn to what she calls “all the computer and math stuff” that was part of the job.
Photo (Thor Swift for The New York Times): Carrie Grimes, senior staff engineer at Google, uses statistical analysis of data to help improve the company's search engine.
Photo (Daniel Rosenbaum for The New York Times): T-shirts for sale at the Joint Statistical Meetings in Washington this week.
“People think of field archaeology as Indiana Jones, but much of what you really do is data analysis,” she said.
Now Ms. Grimes does a different kind of digging. She works at Google, where she uses statistical analysis of mounds of data to come up with ways to improve its search engine.
Ms. Grimes is an Internet-age statistician, one of many who are changing the image of the profession as a place for dronish number nerds. They are finding themselves increasingly in demand — and even cool.
“I keep saying that the sexy job in the next 10 years will be statisticians,” said Hal Varian, chief economist at Google. “And I’m not kidding.”
The rising stature of statisticians, who can earn $125,000 at top companies in their first year after getting a doctorate, is a byproduct of the recent explosion of digital data. In field after field, computing and the Web are creating new realms of data to explore — sensor signals, surveillance tapes, social network chatter, public records and more. And the digital data surge only promises to accelerate, rising fivefold by 2012, according to a projection by IDC, a research firm.
Yet data is merely the raw material of knowledge. “We’re rapidly entering a world where everything can be monitored and measured,” said Erik Brynjolfsson, an economist and director of the Massachusetts Institute of Technology’s Center for Digital Business. “But the big problem is going to be the ability of humans to use, analyze and make sense of the data.”
The new breed of statisticians tackle that problem. They use powerful computers and sophisticated mathematical models to hunt for meaningful patterns and insights in vast troves of data. The applications are as diverse as improving Internet search and online advertising, culling gene sequencing information for cancer research and analyzing sensor and location data to optimize the handling of food shipments.
Even the recently ended Netflix contest, which offered $1 million to anyone who could significantly improve the company’s movie recommendation system, was a battle waged with the weapons of modern statistics.
Though at the fore, statisticians are only a small part of an army of experts using modern statistical techniques for data analysis. Computing and numerical skills, experts say, matter far more than degrees. So the new data sleuths come from backgrounds like economics, computer science and mathematics.
They are certainly welcomed in the White House these days. “Robust, unbiased data are the first step toward addressing our long-term economic needs and key policy priorities,” Peter R. Orszag, director of the Office of Management and Budget, declared in a speech in May. Later that day, Mr. Orszag confessed in a blog entry that his talk on the importance of statistics was a subject “near to my (admittedly wonkish) heart.”
I.B.M., seeing an opportunity in data-hunting services, created a Business Analytics and Optimization Services group in April. The unit will tap the expertise of the more than 200 mathematicians, statisticians and other data analysts in its research labs — but that number is not enough. I.B.M. plans to retrain or hire 4,000 more analysts across the company.
In another sign of the growing interest in the field, an estimated 6,400 people are attending the statistics profession’s annual conference in Washington this week, up from around 5,400 in recent years, according to the American Statistical Association. The attendees, men and women, young and graying, looked much like any other crowd of tourists in the nation’s capital. But their rapt exchanges were filled with talk of randomization, parameters, regressions and data clusters. The data surge is elevating a profession that traditionally tackled less visible and less lucrative work, like figuring out life expectancy rates for insurance companies.
Ms. Grimes, 32, got her doctorate in statistics from Stanford in 2003 and joined Google later that year. She is now one of many statisticians in a group of 250 data analysts. She uses statistical modeling to help improve the company’s search technology.
For example, Ms. Grimes worked on an algorithm to fine-tune Google’s crawler software, which roams the Web to constantly update its search index. The model increased the chances that the crawler would scan frequently updated Web pages and make fewer trips to more static ones.
The goal, Ms. Grimes explained, is to make tiny gains in the efficiency of computer and network use. “Even an improvement of a percent or two can be huge, when you do things over the millions and billions of times we do things at Google,” she said.
It is the size of the data sets on the Web that opens new worlds of discovery. Traditionally, social sciences tracked people’s behavior by interviewing or surveying them. “But the Web provides this amazing resource for observing how millions of people interact,” said Jon Kleinberg, a computer scientist and social networking researcher at Cornell.
For example, in research just published, Mr. Kleinberg and two colleagues followed the flow of ideas across cyberspace. They tracked 1.6 million news sites and blogs during the 2008 presidential campaign, using algorithms that scanned for phrases associated with news topics like “lipstick on a pig.”
The Cornell researchers found that, generally, the traditional media leads and the blogs follow, typically by 2.5 hours. But a handful of blogs were quickest to quotes that later gained wide attention.
The rich lode of Web data, experts warn, has its perils. Its sheer volume can easily overwhelm statistical models. Statisticians also caution that strong correlations of data do not necessarily prove a cause-and-effect link.
For example, in the late 1940s, before there was a polio vaccine, public health experts in America noted that polio cases increased in step with the consumption of ice cream and soft drinks, according to David Alan Grier, a historian and statistician at George Washington University. Eliminating such treats was even recommended as part of an anti-polio diet. It turned out that polio outbreaks were most common in the hot months of summer, when people naturally ate more ice cream, showing only an association, Mr. Grier said.
If the data explosion magnifies longstanding issues in statistics, it also opens up new frontiers.
“The key is to let computers do what they are good at, which is trawling these massive data sets for something that is mathematically odd,” said Daniel Gruhl, an I.B.M. researcher whose recent work includes mining medical data to improve treatment. “And that makes it easier for humans to do what they are good at — explain those anomalies.”
Andrea Fuller contributed reporting.