• R language (solved)
  • Help: logistic regression with glm: how does a "0/1" outcome differ from cbind(success, fail)?

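According to the help page for glm, these two ways of specifying the response should be equivalent.

In practice, though, the fitted results differ. That suggests the two response formats mean different things, so what exactly is the difference?
Here is the code: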

df1

    race dis sucess fail
1  58.40  17     61   49
2  92.40   7      0   15
3  18.28  15      4   38
4  59.38   0   1790   14
5  55.51   7      0   15
6  58.40  12     16   52
7  83.39   2     31   11
8  94.54   9     87   67
9  88.54   0    107   11
10 77.19   4     90   12
11 38.75   7      6   17

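library(dplyr)  # provides %>%, mutate() and if_else()

# Collapse each row to a single 0/1 label: 1 if successes outnumber failures
df2 <- df1 %>%
  mutate(y = if_else(sucess > fail, 1, 0))
df2$y <- as.factor(df2$y)
# Grouped binomial response: each row contributes sucess + fail trials
mod1 <- glm(cbind(sucess, fail) ~ dis + race, data = df1, family = 'binomial')
# Binary response: each row contributes exactly one trial
mod2 <- glm(y ~ dis + race, data = df2, family = 'binomial')
res1 <- exp(cbind(OR = coef(mod1), confint(mod1)))
res2 <- exp(cbind(OR = coef(mod2), confint(mod2)))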

res1

                    OR      2.5 %      97.5 %
(Intercept) 68.1711286 38.2284603 122.6115988
dis          0.7469668  0.7276167   0.7659900
race         0.9868600  0.9793819   0.9944779

res2

                    OR        2.5 %    97.5 %
(Intercept) 0.06943074 2.541349e-05 31.938780
dis         0.92467344 6.551580e-01  1.226807
race        1.05313081 9.817307e-01  1.170621

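The two data sets you are fitting are not equivalent. Take the first row: 61 successes and 49 failures. To turn that into a single-column factor response, the row has to be repeated 110 times, with the factor equal to 1 in 61 of the copies and 0 in the other 49. Your if_else(sucess > fail, 1, 0) instead collapses each row into one observation, so mod2 sees only 11 binary outcomes and the counts are thrown away.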
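
To make the point concrete, here is a minimal sketch of the expansion described above, using base R only. The data frame is retyped from the table in the question (the column name sucess is kept as posted); the names n, long, mod_grouped and mod_long are made up for this illustration. It assumes the first two columns really are race and dis in that order, as the header suggests.

```r
# Data retyped from the question; `sucess` spelling kept to match the original code
df1 <- data.frame(
  race   = c(58.40, 92.40, 18.28, 59.38, 55.51, 58.40, 83.39, 94.54, 88.54, 77.19, 38.75),
  dis    = c(17, 7, 15, 0, 7, 12, 2, 9, 0, 4, 7),
  sucess = c(61, 0, 4, 1790, 0, 16, 31, 87, 107, 90, 6),
  fail   = c(49, 15, 38, 14, 15, 52, 11, 67, 11, 12, 17)
)

# Grouped form: one row per covariate pattern, two-column response
mod_grouped <- glm(cbind(sucess, fail) ~ dis + race, data = df1, family = binomial)

# Long form: repeat every row (sucess + fail) times ...
n <- df1$sucess + df1$fail
long <- df1[rep(seq_len(nrow(df1)), times = n), c("race", "dis")]
# ... and give each block `sucess` ones followed by `fail` zeros
long$y <- unlist(Map(function(s, f) rep(c(1, 0), times = c(s, f)),
                     df1$sucess, df1$fail))

# Binary form fitted on the expanded data
mod_long <- glm(y ~ dis + race, data = long, family = binomial)

# The coefficient estimates should agree up to numerical tolerance
print(all.equal(coef(mod_grouped), coef(mod_long), tolerance = 1e-4))
```

If the comparison holds, the 0/1 form and the cbind form are indeed interchangeable, but only when the 0/1 data contain one row per individual trial. The reported deviances of the two fits will generally still differ, since the saturated model is not the same for grouped and ungrouped data.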