R中关于fdr的计算,可以用命令p.adjust.
但是最近我在用这个命令对差异基因的P值做修正时,觉得有些问题。
我用的方法是BH,也就是fdr
命令如下:
p<-c(0.001315146,0.001236789,0.001229388,0.000889006,0.000876515,0.000578648,0.000565415,0.000536447,0.000517434,0.000487215,0.000364518,0.000364518,0.000247193,7.93E-05);
p.adjust(p,method="fdr",n=length(p));
这样得到的修正后的fdr值为:
0.0013151460,0.0013151460,0.0013151460,0.0011314622,0.0011314622,0.0009001191,0.0009001191,
0.0009001191,0.0009001191,0.0009001191,0.0009001191,0.0009001191,0.0009001191,0.0009001191
可以看到,修正前后面几个P值并不相等,但是修正后的fdr后面几个都变一样的值了。
而且根据fdr的定义,用命令length(p)*p/rank(p)计算出来的结果也和用命令p.adjust("fdr")计算出来的结果不一样。
但是我试了一个比较简单的例子,p<-c(0.0003,0.0001,0.02).这个用上述的p.adjust做修正的话是没有问题的。
我看了下p.adjust的源代码,其中BH这种方法计算的过程主要是:
lp<-length(p)
n<-length(p)
i <- lp:1L
o <- order(p, decreasing = TRUE)
ro <- order(o)
pmin(1, cummin(n/i * p[o]))[ro]
我仔细研究了半天,觉得问题应该是cummin()这个函数这里。p[o]是将p值从大到小排列,i是从大到小排列的序号,这里用cummin(),就是默认n/i * p[o]也是从大到小排列滴?那个简单的例子确实如此。但是我觉得实际上并不是所有的数据都是这样吧。
这只是我的疑问,很困惑呀。怎么这么多人都用p.adjust却没有问题呢?是我哪里用错了么?请求高人指点指点呀~~~