Help: logistic regression with glm(): what is the difference between a "0, 1" outcome variable and "cbind(success, fail)"?

hagiaatcos · April 21, 2023

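According to the help page for glm(), these two ways of specifying the outcome should be equivalent.

In practice, however, the fitted results differ. That suggests the two outcome formats do not mean the same thing, so where exactly does the difference lie?
The code is as follows: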

df1

    race dis sucess fail
1  58.40  17     61   49
2  92.40   7      0   15
3  18.28  15      4   38
4  59.38   0   1790   14
5  55.51   7      0   15
6  58.40  12     16   52
7  83.39   2     31   11
8  94.54   9     87   67
9  88.54   0    107   11
10 77.19   4     90   12
11 38.75   7      6   17

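library(dplyr)  # provides %>%, mutate() and if_else()

# Collapse each row to one 0/1 outcome: 1 if successes outnumber failures
df2 <- df1 %>%
  mutate(y = if_else(sucess > fail, 1, 0))
df2$y <- as.factor(df2$y)
# mod1: grouped binomial response on df1; mod2: derived 0/1 factor on df2
mod1 <- glm(cbind(sucess, fail) ~ dis + race, data = df1, family = 'binomial')
mod2 <- glm(y ~ dis + race, data = df2, family = 'binomial')
# Odds ratios with 95% confidence intervals
res1 <- exp(cbind(OR = coef(mod1), confint(mod1)))
res2 <- exp(cbind(OR = coef(mod2), confint(mod2)))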

res1

                    OR      2.5 %      97.5 %
(Intercept) 68.1711286 38.2284603 122.6115988
dis          0.7469668  0.7276167   0.7659900
race         0.9868600  0.9793819   0.9944779

res2

                    OR        2.5 %    97.5 %
(Intercept) 0.06943074 2.541349e-05 31.938780
dis         0.92467344 6.551580e-01  1.226807
race        1.05313081 9.817307e-01  1.170621

fenguoerbian · April 21, 2023

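The two data sets you fitted are not equivalent. Take the first row as an example: 61 successes and 49 failures. To turn that into a single-column factor response, the row has to be repeated 110 times, with the factor equal to 1 in 61 of those copies and 0 in the other 49. Your y = if_else(sucess > fail, 1, 0) instead collapses each row into a single 0/1 observation, so mod2 is fitted to only 11 data points and discards the counts.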
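A minimal sketch of that expansion, assuming the df1 printed in the question (retyped by hand here, keeping the original column name sucess). The names idx, long, mod_grouped and mod_long are made up for illustration. If the expansion is done this way, the grouped fit and the row-per-trial fit should agree on the coefficients up to numerical tolerance.

```r
# df1 as printed in the question, retyped by hand
df1 <- data.frame(
  race   = c(58.40, 92.40, 18.28, 59.38, 55.51, 58.40,
             83.39, 94.54, 88.54, 77.19, 38.75),
  dis    = c(17, 7, 15, 0, 7, 12, 2, 9, 0, 4, 7),
  sucess = c(61, 0, 4, 1790, 0, 16, 31, 87, 107, 90, 6),
  fail   = c(49, 15, 38, 14, 15, 52, 11, 67, 11, 12, 17)
)

# Repeat each row once per trial (sucess + fail times)
idx  <- rep(seq_len(nrow(df1)), times = df1$sucess + df1$fail)
long <- df1[idx, c("race", "dis")]

# Within each original row: `sucess` ones followed by `fail` zeros,
# in the same row order as idx
long$y <- unlist(Map(function(s, f) c(rep(1, s), rep(0, f)),
                     df1$sucess, df1$fail))

# Grouped response vs. one row per trial
mod_grouped <- glm(cbind(sucess, fail) ~ dis + race,
                   data = df1, family = binomial)
mod_long    <- glm(y ~ dis + race, data = long, family = binomial)

# Compare the coefficient estimates of the two fits
print(cbind(grouped = coef(mod_grouped), long = coef(mod_long)))
print(all.equal(coef(mod_grouped), coef(mod_long), tolerance = 1e-5))
```

The residual deviance and degrees of freedom reported by the two fits will generally differ, because the saturated model is defined differently for grouped and ungrouped data; the comparison above is only about the coefficients.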

hagiaatcos · April 24, 2023

fenguoerbian Thanks for the answer!