http://www.nature.com/nature/journal/v415/n6871/full/415530a.html
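In this Nature article, the authors analyze microarray gene-expression data to find which genes are associated with patients' recurrence within 5 years. The gene-expression data are continuous numeric variables, while 5-year recurrence is a binary (two-category) variable, yet the authors used the "Pearson Coefficient" to screen for genes associated with 5-year recurrence. As far as I know, the Pearson coefficient is meant for data such as the relationship between people's height and weight, where both variables are continuous. How did this article make it work?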
The authors describe their analysis method as follows:
Method of supervised classification
We calculated the correlation between the prognostic category (metastasis vs. no-metastasis) and the logarithmic expression ratio across all 78 samples for each individual gene in the 5,000 significant genes. The distribution of the correlation coefficients is shown in red in the histogram in Figure S1 (a). Genes with a greater correlation coefficient (Pearson Coefficient) are likely candidates for reporting prognosis, i.e., short interval to metastasis or no metastasis.
(See the supplementary Methods of the article: http://www.nature.com/nature/journal/v415/n6871/suppinfo/415530a.html)
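A Pearson coefficient can be computed against a two-category outcome once the categories are coded as numbers (for example metastasis = 1, no metastasis = 0). With a 0/1 variable, Pearson's r is the same quantity as the point-biserial correlation, which compares the means of the continuous variable in the two groups. The Python sketch below illustrates this per gene. It is not the authors' code: the 0/1 coding, the function names, the toy numbers, and ranking genes by the absolute value of r are all assumptions made for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def point_biserial(x, labels):
    """Point-biserial correlation from group means; equals pearson(x, labels) for 0/1 labels."""
    g1 = [a for a, lab in zip(x, labels) if lab == 1]
    g0 = [a for a, lab in zip(x, labels) if lab == 0]
    n, n1, n0 = len(x), len(g1), len(g0)
    mean = sum(x) / n
    s_n = math.sqrt(sum((a - mean) ** 2 for a in x) / n)  # population SD of x
    return (sum(g1) / n1 - sum(g0) / n0) / s_n * math.sqrt(n1 * n0 / n ** 2)


def rank_genes(expression, labels):
    """Correlate every gene with the 0/1 outcome and sort by |r|, largest first (assumed ranking rule)."""
    scores = {gene: pearson(values, labels) for gene, values in expression.items()}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)


# Hypothetical toy data: 6 samples, 1 = metastasis, 0 = no metastasis.
labels = [1, 1, 1, 0, 0, 0]
expression = {  # made-up log expression ratios, one list per gene
    "geneA": [0.8, 0.6, 0.7, -0.2, 0.0, -0.1],
    "geneB": [0.1, -0.3, 0.2, 0.0, 0.3, -0.2],
}

for gene, r in rank_genes(expression, labels):
    pb = point_biserial(expression[gene], labels)
    print(f"{gene}: Pearson r = {r:+.3f}, point-biserial = {pb:+.3f}")
```

In this toy example geneA, whose values are clearly higher in the metastasis group, gets a large positive r, while geneB, whose group means barely differ, gets an r close to zero. The two functions print identical values for each gene, which is the point: Pearson's r on a 0/1 outcome measures how far apart the two group means are, scaled by the gene's overall spread.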
I hope someone can clear up my confusion. Many thanks in advance.