How do I extract the p-value after fitting lm in a loop?

Ihavenothing · July 26, 2017

f = summary(FitNum)$fstatistic
1 - pf(f[1], f[2], f[3])

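yihui · July 26, 2017

The code FitNum["F-statistic"] shows that you are not clear about the data structure of the FitNum object, so you started guessing. There is no need to guess: to learn the structure of any object, just call str() on it. Example: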

fit = lm(hp ~ mpg + disp, data = mtcars)
str(summary(fit))

List of 11
 $ call         : language lm(formula = hp ~ mpg + disp, data = mtcars)
 $ terms        :Classes 'terms', 'formula'  language hp ~ mpg + disp
  .. ..- attr(*, "variables")= language list(hp, mpg, disp)
  .. ..- attr(*, "factors")= int [1:3, 1:2] 0 1 0 0 0 1
  .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. ..$ : chr [1:3] "hp" "mpg" "disp"
  .. .. .. ..$ : chr [1:2] "mpg" "disp"
  .. ..- attr(*, "term.labels")= chr [1:2] "mpg" "disp"
  .. ..- attr(*, "order")= int [1:2] 1 1
  .. ..- attr(*, "intercept")= int 1
  .. ..- attr(*, "response")= int 1
  .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv> 
  .. ..- attr(*, "predvars")= language list(hp, mpg, disp)
  .. ..- attr(*, "dataClasses")= Named chr [1:3] "numeric" "numeric" "numeric"
  .. .. ..- attr(*, "names")= chr [1:3] "hp" "mpg" "disp"
 $ residuals    : Named num [1:32] -14.3 -14.3 -10 -38.2 -11.4 ...
  ..- attr(*, "names")= chr [1:32] "Mazda RX4" "Mazda RX4 Wag" "Datsun 710" "Hornet 4 Drive" ...
 $ coefficients : num [1:3, 1:4] 172.22 -4.273 0.261 69.901 2.303 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:3] "(Intercept)" "mpg" "disp"
  .. ..$ : chr [1:4] "Estimate" "Std. Error" "t value" "Pr(>|t|)"
 $ aliased      : Named logi [1:3] FALSE FALSE FALSE
  ..- attr(*, "names")= chr [1:3] "(Intercept)" "mpg" "disp"
 $ sigma        : num 41
 $ df           : int [1:3] 3 29 3
 $ r.squared    : num 0.665
 $ adj.r.squared: num 0.642
 $ fstatistic   : Named num [1:3] 28.8 2 29
  ..- attr(*, "names")= chr [1:3] "value" "numdf" "dendf"
 $ cov.unscaled : num [1:3, 1:3] 2.90554 -0.09333 -0.00433 -0.09333 0.00315 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:3] "(Intercept)" "mpg" "disp"
  .. ..$ : chr [1:3] "(Intercept)" "mpg" "disp"
 - attr(*, "class")= chr "summary.lm"
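As a minimal sketch of how the str() output above can be put to use: fstatistic is a named numeric vector with elements value, numdf, and dendf, so the overall p-value can be computed by name rather than by position. The helper name lm_pvalue is hypothetical (it does not come from the thread), and lower.tail = FALSE is swapped in for 1 - pf(...); the two are equivalent, but the former avoids rounding to zero when the p-value is extremely small.

```r
# Hypothetical helper (not from the thread): overall F-test p-value of an lm fit
lm_pvalue <- function(fit) {
  f <- summary(fit)$fstatistic  # named vector: value, numdf, dendf
  # Upper tail of the F distribution; same as 1 - pf(...) up to rounding
  unname(pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE))
}

fit <- lm(hp ~ mpg + disp, data = mtcars)
p <- lm_pvalue(fit)
```

Note that this assumes the model has at least one predictor; for an intercept-only fit, summary() does not return an fstatistic element.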

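Summary16 · July 26, 2017

Ihavenothing
Clear and simple, thank you very much!
Once it worked for a single lm, I tried the same approach with apply to build the whole f matrix, but the p-values in the last row are all equal to the p-value from the first probe's fit. Could you give me a bit more guidance? Thanks!!!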

f <- apply(tLmbeta,2,function(x) lm(x~cos(t)+sin(t))$fstistic)
> f[,1:6]
      cg00000029 cg00000165 cg00000289 cg00000363 cg00000714 cg00000769
value   2.504589  0.9791611  0.0548869  0.1546277  0.9121745  0.6058973
numdf   2.000000  2.0000000  2.0000000  2.0000000  2.0000000  2.0000000
dendf  44.000000 44.0000000 44.0000000 44.0000000 44.0000000 44.0000000

pva <- head(rbind(f, 1 - pf(f[1],f[2],f[3])))
> pva[,1:6]
       cg00000029  cg00000165  cg00000289  cg00000363  cg00000714  cg00000769
value  2.50458868  0.97916108  0.05488690  0.15462769  0.91217452  0.60589726
numdf  2.00000000  2.00000000  2.00000000  2.00000000  2.00000000  2.00000000
dendf 44.00000000 44.00000000 44.00000000 44.00000000 44.00000000 44.00000000
       0.09329445  0.09329445  0.09329445  0.09329445  0.09329445  0.09329445

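Summary16 · July 26, 2017

yihui
Right, I'm a bioinformatics rookie who is just getting started. I'm not familiar with data structures, and I don't know R or statistics well. I've now picked up str() as a new trick, thanks!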

Summary16 · July 26, 2017

yihui
At first I only knew str as the thing in Python that converts a value to a string, thanks!!!

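Ihavenothing · July 26, 2017

Summary16
First, inside your apply call it should be summary(lm(...))$fstatistic. Second, apply has bound the F-statistic information for all the fits together into one matrix, so you can use a vectorized operation: 1 - pf(f[1, ], f[2, ], f[3, ]). Working through some introductory tutorials will help you understand these operations.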
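To make the difference between f[1] and f[1, ] concrete, here is a small sketch on simulated data. The matrix tLmbeta and the covariate t below are random stand-ins (assumptions, not the poster's data), sized at 47 samples so that dendf comes out as 44, matching the output shown earlier.

```r
set.seed(1)
# Simulated stand-ins (assumptions): 47 samples x 6 probes, plus a time covariate t
t <- seq(0, 2 * pi, length.out = 47)
tLmbeta <- matrix(rnorm(47 * 6), nrow = 47,
                  dimnames = list(NULL, paste0("probe", 1:6)))

# One column of (value, numdf, dendf) per probe; note the summary() call
f <- apply(tLmbeta, 2, function(x) summary(lm(x ~ cos(t) + sin(t)))$fstatistic)

# f[1], f[2], f[3] are single numbers taken from the first column only,
# so this is one p-value, which rbind() then recycles across all columns
p_wrong <- 1 - pf(f[1], f[2], f[3])

# Indexing whole rows gives one p-value per probe
p_value <- 1 - pf(f["value", ], f["numdf", ], f["dendf", ])
pva <- rbind(f, p_value)
```

Indexing the rows by name ("value", "numdf", "dendf") is a design choice for readability; f[1, ], f[2, ], f[3, ] selects the same rows.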

Summary16 · July 27, 2017

Ihavenothing
Yes, the problem is solved, thanks O(∩_∩)O. I'll start by reading some open-source introductory tutorials. Thanks for the suggestion.

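dapengde · July 28, 2017

Summary16 You could use beginr, a package I wrote for beginners. It fits a linear model to every pair of columns in a data frame and collects all the fit results into a single table.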

install.packages('beginr')
df <- data.frame(a = 1:10, b = 1:10 + rnorm(10), c = 1:10 + rnorm(10))
beginr::lmdf(df)
##   x y r.squared adj.r.squared  intercept     slope Std.Error.intercept
## 1 a b 0.9630442     0.9584247 -0.7901086 1.1271749           0.4843895
## 2 a c 0.8568806     0.8389907 -0.3427519 1.0425054           0.9346580
## 3 b a 0.9630442     0.9584247  0.8783157 0.8543876           0.3749251
## 4 b c 0.8157952     0.7927696  0.6004724 0.8856059           0.9426967
## 5 c a 0.8568806     0.8389907  1.0688792 0.8219436           0.7466774
## 6 c b 0.8157952     0.7927696  0.4432909 0.9211717           0.9729768
##   Std.Error.slope t.intercept   t.slope Pr.intercept     Pr.slope
## 1      0.07806644  -1.6311430 14.438661   0.14150586 5.177385e-07
## 2      0.15063377  -0.3667137  6.920795   0.72334232 1.219459e-04
## 3      0.05917360   2.3426432 14.438661   0.04722045 5.177385e-07
## 4      0.14878374   0.6369730  5.952303   0.54193651 3.410966e-04
## 5      0.11876434   1.4315142  6.920795   0.19016727 1.219459e-04
## 6      0.15475888   0.4556027  5.952303   0.66078732 3.410966e-04

qingyi · July 28, 2017

dapengde looks nice

Ihavenothing · July 28, 2017

dapengde Big thumbs up!

Summary16 · August 1, 2017

dapengde
Wow~ super nice~ Beginners love this kind of one-step solution~~~~(>_<)~~~~ O(∩_∩)O thanks

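yihui · August 1, 2017

Summary16 But it doesn't seem to fit your problem: yours is not a single-predictor regression, and you don't need to regress every pair of columns in a data frame. In the end, you still have to understand data structures. str() is a way to catch fish; beginr::lmdf() is just one fish among many.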

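dapengde · August 1, 2017

Summary16 Ihavenothing qingyi yihui Actually, some of the functions in 'beginr' were written when I had only just started learning R. They are simple and easy to follow, and although they may be ugly and inefficient, the source code is well suited for beginners to modify and practice on. In RStudio, put the cursor on a function name and press F2 to see its source code. Take beginr::lmdf() for example: a look at the source will show you how to catch the fish.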
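Outside RStudio, a function's source can also be inspected by printing the function object itself, that is, typing its name without parentheses. A minimal sketch using stats::sd, chosen here only because it ships with base R; the same approach should work for beginr::lmdf once that package is installed.

```r
# Printing a function object (no parentheses) shows its source
stats::sd

# The same source as a character vector, handy for searching
src <- deparse(stats::sd)
```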

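The help files for 'beginr' are still very incomplete, and there are plenty of useful and fun functions that could be added, but I don't have time for that right now. I published it in a hurry mainly to claim the package name 'beginr' first. I hope to eventually turn it into a package that can replace an introductory textbook, so that a beginner only needs to walk through the functions and help files in beginr to get started. If anyone has the interest and the time, feel free to take this over; we can simply share authorship.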