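A great idea. Have any doctors started using it? How well does it work?
Very nice!
- Earning 120,000 is probably still the exception, but at a large pharmaceutical company 80,000 to 90,000 is typical.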
- In the thread "Once upon a time there was a panda": [s:11]
- A very good article that points out the importance of statistics in modern society.
The original link is:
http://www.nytimes.com/2009/08/06/technology/06stats.html?_r=2&hp
MOUNTAIN VIEW, Calif. — At Harvard, Carrie Grimes majored in anthropology and archaeology and ventured to places like Honduras, where she studied Mayan settlement patterns by mapping where artifacts were found. But she was drawn to what she calls “all the computer and math stuff” that was part of the job.
Photo (Thor Swift for The New York Times): Carrie Grimes, senior staff engineer at Google, uses statistical analysis of data to help improve the company's search engine.
Graphic: Data Sleuths in an Internet Age
Photo (Daniel Rosenbaum for The New York Times): T-shirts for sale at the Joint Statistical Meetings in Washington this week.
“People think of field archaeology as Indiana Jones, but much of what you really do is data analysis,” she said.
Now Ms. Grimes does a different kind of digging. She works at Google, where she uses statistical analysis of mounds of data to come up with ways to improve its search engine.
Ms. Grimes is an Internet-age statistician, one of many who are changing the image of the profession as a place for dronish number nerds. They are finding themselves increasingly in demand — and even cool.
“I keep saying that the sexy job in the next 10 years will be statisticians,” said Hal Varian, chief economist at Google. “And I’m not kidding.”
The rising stature of statisticians, who can earn $125,000 at top companies in their first year after getting a doctorate, is a byproduct of the recent explosion of digital data. In field after field, computing and the Web are creating new realms of data to explore — sensor signals, surveillance tapes, social network chatter, public records and more. And the digital data surge only promises to accelerate, rising fivefold by 2012, according to a projection by IDC, a research firm.
Yet data is merely the raw material of knowledge. “We’re rapidly entering a world where everything can be monitored and measured,” said Erik Brynjolfsson, an economist and director of the Massachusetts Institute of Technology’s Center for Digital Business. “But the big problem is going to be the ability of humans to use, analyze and make sense of the data.”
The new breed of statisticians tackle that problem. They use powerful computers and sophisticated mathematical models to hunt for meaningful patterns and insights in vast troves of data. The applications are as diverse as improving Internet search and online advertising, culling gene sequencing information for cancer research and analyzing sensor and location data to optimize the handling of food shipments.
Even the recently ended Netflix contest, which offered $1 million to anyone who could significantly improve the company’s movie recommendation system, was a battle waged with the weapons of modern statistics.
Though at the fore, statisticians are only a small part of an army of experts using modern statistical techniques for data analysis. Computing and numerical skills, experts say, matter far more than degrees. So the new data sleuths come from backgrounds like economics, computer science and mathematics.
They are certainly welcomed in the White House these days. “Robust, unbiased data are the first step toward addressing our long-term economic needs and key policy priorities,” Peter R. Orszag, director of the Office of Management and Budget, declared in a speech in May. Later that day, Mr. Orszag confessed in a blog entry that his talk on the importance of statistics was a subject “near to my (admittedly wonkish) heart.”
I.B.M., seeing an opportunity in data-hunting services, created a Business Analytics and Optimization Services group in April. The unit will tap the expertise of the more than 200 mathematicians, statisticians and other data analysts in its research labs — but that number is not enough. I.B.M. plans to retrain or hire 4,000 more analysts across the company.
In another sign of the growing interest in the field, an estimated 6,400 people are attending the statistics profession’s annual conference in Washington this week, up from around 5,400 in recent years, according to the American Statistical Association. The attendees, men and women, young and graying, looked much like any other crowd of tourists in the nation’s capital. But their rapt exchanges were filled with talk of randomization, parameters, regressions and data clusters. The data surge is elevating a profession that traditionally tackled less visible and less lucrative work, like figuring out life expectancy rates for insurance companies.
Ms. Grimes, 32, got her doctorate in statistics from Stanford in 2003 and joined Google later that year. She is now one of many statisticians in a group of 250 data analysts. She uses statistical modeling to help improve the company’s search technology.
For example, Ms. Grimes worked on an algorithm to fine-tune Google’s crawler software, which roams the Web to constantly update its search index. The model increased the chances that the crawler would scan frequently updated Web pages and make fewer trips to more static ones.
The goal, Ms. Grimes explained, is to make tiny gains in the efficiency of computer and network use. “Even an improvement of a percent or two can be huge, when you do things over the millions and billions of times we do things at Google,” she said.
It is the size of the data sets on the Web that opens new worlds of discovery. Traditionally, social sciences tracked people’s behavior by interviewing or surveying them. “But the Web provides this amazing resource for observing how millions of people interact,” said Jon Kleinberg, a computer scientist and social networking researcher at Cornell.
For example, in research just published, Mr. Kleinberg and two colleagues followed the flow of ideas across cyberspace. They tracked 1.6 million news sites and blogs during the 2008 presidential campaign, using algorithms that scanned for phrases associated with news topics like “lipstick on a pig.”
The Cornell researchers found that, generally, the traditional media leads and the blogs follow, typically by 2.5 hours. But a handful of blogs were quickest to quotes that later gained wide attention.
The rich lode of Web data, experts warn, has its perils. Its sheer volume can easily overwhelm statistical models. Statisticians also caution that strong correlations of data do not necessarily prove a cause-and-effect link.
For example, in the late 1940s, before there was a polio vaccine, public health experts in America noted that polio cases increased in step with the consumption of ice cream and soft drinks, according to David Alan Grier, a historian and statistician at George Washington University. Eliminating such treats was even recommended as part of an anti-polio diet. It turned out that polio outbreaks were most common in the hot months of summer, when people naturally ate more ice cream, showing only an association, Mr. Grier said.
If the data explosion magnifies longstanding issues in statistics, it also opens up new frontiers.
“The key is to let computers do what they are good at, which is trawling these massive data sets for something that is mathematically odd,” said Daniel Gruhl, an I.B.M. researcher whose recent work includes mining medical data to improve treatment. “And that makes it easier for humans to do what they are good at — explain those anomalies.”
Andrea Fuller contributed reporting.
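- Taking the SAS Base and SAS Advanced certification exams would be fairly useful, since preparing for them lets you learn SAS systematically. There is a resource called SAS Base online tutor that is quite good. I remember someone posted it on 人大经济论坛, so you can download it there. Recalled past questions circulate for both exams and are worth reviewing just before the test. If you want a SAS programmer job at a pharmaceutical company, I suggest learning some clinical trial and survival analysis material, and ideally getting a chance to work on related projects.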
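- I thought of a somewhat complicated approach.
Equation 1: y = a1 + a2*x1 + b1*x2 + b2*x2^2 + b3*x1*x2 + b4*x1*x2^2
x1 is the rat strain: x1 = 0 for gk, x1 = 1 for wky
x2 is time
When x1 = 0: y = a1 + b1*x2 + b2*x2^2
When x1 = 1: y = (a1 + a2) + (b1 + b3)*x2 + (b2 + b4)*x2^2
So to decide whether the two curves are the same, fit Equation 1 to all the data and test whether a2 = 0, b3 = 0, and b4 = 0.
My concern is that there are many variables and little data. Is this approach feasible, and how reliable would it be?
- Taking a hint from everyone's replies: could ANCOVA be used? Treat rat strain as a qualitative factor and time as a quantitative factor. Does that make sense?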
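The curve comparison discussed above (Equation 1, which is also what an ANCOVA-style model gives when strain is a qualitative factor, time is quantitative, and their interactions are included) can be checked with a nested-model F test: fit the full six-coefficient model and the reduced single-curve model, then compare residual sums of squares. Below is a minimal sketch under stated assumptions. The function names (`solve`, `rss`, `curve_f_test`) and the toy data are made up for illustration and are not the data posted in this thread. The small pure-Python solver is only there to keep the example dependency-free; in practice one would probably use a statistics package.

```python
# Sketch: nested-model F test for H0: a2 = b3 = b4 = 0 in
# y = a1 + a2*x1 + b1*x2 + b2*x2^2 + b3*x1*x2 + b4*x1*x2^2.
# All names and data here are illustrative assumptions.

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def rss(X, y):
    """Residual sum of squares of the least-squares fit of y on the columns of X."""
    p = len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    beta = solve(XtX, Xty)
    return sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2 for r, yi in zip(X, y))

def curve_f_test(x1, x2, y):
    """Return (F, df_num, df_den) for the hypothesis that both strains share one curve."""
    full = [[1.0, s, t, t * t, s * t, s * t * t] for s, t in zip(x1, x2)]
    reduced = [[1.0, t, t * t] for t in x2]
    rss_full, rss_red = rss(full, y), rss(reduced, y)
    df_num, df_den = 3, len(y) - 6  # 3 tested coefficients, 6 fitted in the full model
    f_stat = ((rss_red - rss_full) / df_num) / (rss_full / df_den)
    return f_stat, df_num, df_den

# Toy data (made up): 2 strains x 5 time points x 2 replicates.
weeks = [4, 8, 12, 16, 20]
x1, x2, y = [], [], []
for strain, (c0, c1, c2) in enumerate([(4.0, -0.2, 0.005), (2.0, -0.05, 0.001)]):
    for t in weeks:
        for noise in (0.1, -0.1):
            x1.append(float(strain))  # 0 = gk, 1 = wky
            x2.append(float(t))
            y.append(c0 + c1 * t + c2 * t * t + noise)

f_stat, df_num, df_den = curve_f_test(x1, x2, y)
print(f"F({df_num}, {df_den}) = {f_stat:.2f}")
```

A large F relative to the F(df_num, df_den) distribution suggests the two curves differ. If SciPy is available, `scipy.stats.f.sf(f_stat, df_num, df_den)` gives the p-value. The denominator degrees of freedom, n - 6, show how much information is left after fitting six coefficients, which bears on the worry about having many variables and little data.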
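- I'm not sure how GLM works in Minitab. Which link function did it use?
- What are the English terms for 纵向数据分析 and 分层模型 (longitudinal data analysis and hierarchical models)? I can look them up myself.
- Thanks for the reply above, but both ANOVA and Kruskal-Wallis treat time as one factor and the treatment of the rats as another. Is there a method that can test how the mRNA level changes over time?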
- In statistics, point estimation involves the use of sample data to calculate a single value (known as a statistic) which is to serve as a "best guess" for an unknown (fixed or random) population parameter.
Here are several commonly used ones:
* maximum likelihood (ML)
* method of moments, generalized method of moments
* minimum mean squared error (MMSE)
* minimum variance unbiased estimator (MVUE)
* best linear unbiased estimator (BLUE)
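One way to see how these criteria can disagree is a small simulation. The sketch below assumes a Uniform(0, theta) model, chosen purely for illustration, and compares three estimators of theta: the method-of-moments estimator (twice the sample mean), the maximum likelihood estimator (the sample maximum), and the bias-corrected maximum (n+1)/n * max, which is the minimum variance unbiased estimator for this model. The function name `simulate` and all parameter values are assumptions made for the example.

```python
import random

def simulate(theta=1.0, n=10, reps=5000, seed=0):
    """Monte Carlo bias and MSE of three estimators of theta for Uniform(0, theta) samples."""
    rng = random.Random(seed)
    estimators = {
        "mom": lambda xs: 2 * sum(xs) / len(xs),               # method of moments
        "mle": lambda xs: max(xs),                             # maximum likelihood
        "mvue": lambda xs: (len(xs) + 1) / len(xs) * max(xs),  # bias-corrected maximum
    }
    errors = {name: [] for name in estimators}
    for _ in range(reps):
        xs = [rng.uniform(0, theta) for _ in range(n)]
        for name, est in estimators.items():
            errors[name].append(est(xs) - theta)
    bias = {k: sum(v) / reps for k, v in errors.items()}
    mse = {k: sum(e * e for e in v) / reps for k, v in errors.items()}
    return bias, mse

bias, mse = simulate()
for name in ("mom", "mle", "mvue"):
    print(f"{name:5s} bias={bias[name]:+.4f}  mse={mse[name]:.4f}")
```

In this model the maximum likelihood estimator is biased downward yet has a smaller mean squared error than the unbiased method-of-moments estimator, so the ranking depends on whether unbiasedness or overall error matters more for the problem at hand.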
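How should one choose among them in different situations?
- In the thread "Question about the mathematical statistics track": For a master's degree, it is probably better to study something more applied.
Biostatistics track: pharmaceutical companies
Economic statistics track: insurance companies and banks
These two paths are the more common ones.
- Sorry, my friend wrote that part incorrectly. As the figure shows, at each time point there are two strains of rat with six rats each, so there are 12 measurements in total. Each measurement requires killing the rat, so the rats at each time point are different animals and no ID is needed. Does anyone have a good method for this kind of problem?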
- Congrats!
- Here are the data and the figure:
strain time (week) mRNA (fmol/g tissue)
gk 4 3.457556317
gk 4 3.770057966
gk 4 4.41222839
gk 4 2.25564172
gk 4 5.697042369
gk 4 5.499910792
wky 4 1.495201124
wky 4 1.5808647
wky 4 3.179360818
wky 4 3.113355782
wky 4 1.672961823
wky 4 2.51877681
gk 8 1.986966807
gk 8 1.817359067
gk 8 2.607335716
gk 8 2.521751939
gk 8 3.54783496
gk 8 1.603732451
wky 8 0.57066157
wky 8 2.039664195
wky 8 1.582738336
wky 8 1.521576999
wky 8 0.982305373
wky 8 1.453794745
gk 12 2.580166887
gk 12 2.680766965
gk 12 2.327498637
gk 12 2.782070612
gk 12 2.478199531
gk 12 1.772934765
wky 12 1.259066597
wky 12 2.143513256
wky 12 1.945891739
wky 12 2.020674981
wky 12 0.718499774
wky 12 1.188810565
gk 16 1.33091868
gk 16 2.279603818
gk 16 2.404676131
gk 16 2.225135797
gk 16 3.0657563
gk 16 2.136957405
wky 16 1.604713246
wky 16 1.469989954
wky 16 0.12755652
wky 16 1.168993362
wky 16 1.808343899
wky 20 3.387474245
gk 20 2.349570002
gk 20 2.405931947
gk 20 1.686711135
gk 20 2.346464483
gk 20 2.112037923
gk 20 3.150802465
wky 20 2.205191972
wky 20 2.69184351
wky 20 1.459197824
wky 20 1.340014551
wky 20 1.226826031
- A friend asked me about this problem, and I could not think of a good method myself. Any pointers would be much appreciated!
Dear All,
Sorry, I cannot type Chinese here, but I have a question to ask. I would like to test the significance of the difference between two groups of measurements. What I have is RNA measurements for two strains of rats at different time points. For each time point, I have 6 measurements: two strains, and each strain has three measurements from three rats. The rats are killed after the measurement.
I have attached the data at the end.
If I plot the RNA measurement against time, I get two curves representing the progression of that RNA in the two strains. What I want is to test whether the two strains progress differently on this particular RNA measurement.
I was wondering if there is any way besides ANOVA to test that. What I don't like about ANOVA is that it treats time as a discrete variable, whereas what I want to test is the progression of the RNA-versus-time curve.
Any suggestions are appreciated. Thanks,
Jing
[img]file:///C:/DOCUME~1/yuanye/LOCALS~1/Temp/moz-screenshot-1.jpg[/img]
- In the thread "The most badass passenger"
- [s:14]
- One more thing: you can also look at the 飞跃 (abroad) boards on the BBSes of a few universities that send many students overseas.