[Solved] Are the xlab and ylab arguments of plot() handled differently?

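dapengde · May 21, 2019

Cloud2016 How strange.

The cause may be that using data = inside plot() makes the function's ylab= go and look up the value of y1. I don't know why it doesn't do the same for x1, though. Writing it differently avoids the problem: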

plot(anscombe$x1, anscombe$y1, 
     xlab = substitute(x[i], list(i = 1)),
     ylab = substitute(y[i], list(i = 1))
)

Or use mtext(), which may be a bit more robust:

plot(y1 ~ x1, data = anscombe, xlab = '', ylab = '')
mtext(substitute(x[i], list(i = 1)), 1, line = 3)
mtext(substitute(y[i], list(i = 1)), 2, line = 3)

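Cloud2016 · May 21, 2019

dapengde What I don't get is why ylab would look up the value of y1 (and it really does seem to pick up the value belonging to y1 rather than the string "y1"), given that ylab only takes a character string or an expression object.

mtext does solve the problem, but method 2 looks like the clean, correct approach, so why does it go wrong? I wonder whether this is a bug in R itself.

dapengde · May 21, 2019

Cloud2016 No idea. Maybe it's a big discovery.

I've never run into this, first because I never use data=, and second because I find the default xlab and ylab too far from the axes; with mtext() I can adjust them and save space.

As for a clean approach: substitute(expression()) used to confuse me every single time. Then I came across the latex2exp package, started writing LaTeX formulas, and the whole world became beautiful.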

library(latex2exp)

i <- 1
plot(anscombe$x1, anscombe$y1,
     xlab = TeX(paste0('$x_{', i, '}$')),
     ylab = TeX(paste0('$y_{', i, '}$')))

Cloud2016 · May 22, 2019

dapengde I've found something.

The right approach is to coerce the call object to an expression, even though the two look the same:

plot(y1 ~ x1,
  data = anscombe, 
  xlab = as.expression(substitute(x[i], list(i = 1))),
  ylab = as.expression(substitute(y[i], list(i = 1)))
)

> xsub <- substitute(x[i], list(i = 1))
> xsub
x[1]
> class(xsub)
[1] "call"
> is.expression(xsub)
[1] FALSE

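?substitute seems to lead into some of R's more advanced material. I read <https://adv-r.hadley.nz/environments.html> and for now I only half understand it.

I had always assumed substitute returns an expression, because the printed result looks the same. That's wrong: you should check with is.expression(). Human eyes can't be trusted!!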

as.expression attempts to coerce its argument into an expression object. It is generic, and only the default method is described here. (The default method calls as.vector(type = "expression") and so may dispatch methods for as.vector.) NULL, calls, symbols (see as.symbol) and pairlists are returned as the element of a length-one expression vector. Atomic vectors are placed element-by-element into an expression vector (without using any names): lists are changed type to an expression vector (keeping all attributes). Other types are not currently supported.
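A quick sketch of what the quoted documentation means (the names xsub and xexp are only illustrative): as.expression() wraps the call in a length-one expression vector whose single element is the original call, and a class check tells the two apart more reliably than eyeballing printed output.

```r
# Build the label as a call, then wrap it in an expression vector
xsub <- substitute(x[i], list(i = 1))
xexp <- as.expression(xsub)

class(xsub)                  # "call"
class(xexp)                  # "expression"
length(xexp)                 # 1: the call becomes the only element
identical(xexp[[1]], xsub)   # TRUE: that element is the same call

# Check the type instead of trusting the printout
is.expression(xsub)          # FALSE
is.expression(xexp)          # TRUE
```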

One last question, taking method 1 as the example: how does the returned call object end up finding 8.04, and how does the call stack get there?

substitute works on a purely lexical basis. There is no guarantee that the resulting expression makes any sense.

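I'm not sure whether this sentence helps answer that question.

I do know about latex2exp, and about the tikzDevice package; I just didn't want to sidestep the problem here, since I feel there ought to be a direct way.

A small digression: I originally got here from the help page of the anscombe dataset, help(anscombe). At the end of the page it says it has some "magic" for doing the four linear regressions in one for loop. (I stared at it for a long time without seeing what was so elegant about the magic, so I wrote my own version; running models in batch seems useful, after all.)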

##-- now some "magic" to do the 4 regressions in a loop:
ff <- y ~ x
mods <- setNames(as.list(1:4), paste0("lm", 1:4))
for(i in 1:4) {
  ff[2:3] <- lapply(paste0(c("y","x"), i), as.name)
  ## or   ff[[2]] <- as.name(paste0("y", i))
  ##      ff[[3]] <- as.name(paste0("x", i))
  mods[[i]] <- lmi <- lm(ff, data = anscombe)
  print(anova(lmi))
}

## See how close they are (numerically!)
sapply(mods, coef)
lapply(mods, function(fm) coef(summary(fm)))

## Now, do what you should have done in the first place: PLOTS
op <- par(mfrow = c(2, 2), mar = 0.1+c(4,4,1,1), oma =  c(0, 0, 2, 0))
for(i in 1:4) {
  ff[2:3] <- lapply(paste0(c("y","x"), i), as.name)
  plot(ff, data = anscombe, col = "red", pch = 21, bg = "orange", cex = 1.2,
       xlim = c(3, 19), ylim = c(3, 13))
  abline(mods[[i]], col = "blue")
}
mtext("Anscombe's 4 Regression data sets", outer = TRUE, cex = 1.5)
par(op)

Here is my elegant version, with no magic at all:

data(anscombe)
form <- paste(paste0("y", seq(4)), paste0("x", seq(4)), sep = "~")
fit <- lapply(form, lm, data = anscombe)
op <- par(mfrow = c(2, 2), mar = 0.1 + c(4, 4, 1, 1), oma = c(0, 0, 2, 0))
for (i in seq(4)) {
  plot(as.formula(form[i]),
    data = anscombe, col = hcl.colors(11),
    pch = 19, cex = 1.2,
    xlim = c(3, 19), ylim = c(3, 13),
    xlab = as.expression(substitute(x[i], list(i = i))),
    ylab = as.expression(substitute(y[i], list(i = i)))
  )
  abline(fit[[i]], col = "red", lwd = 2)
}
mtext("Anscombe's 4 Regression data sets", outer = TRUE, cex = 1.5)
par(op)
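As a variant of the version above, the formulas can be built as formula objects instead of strings. This sketch swaps in stats::reformulate() for the paste() calls; the names forms, fits and coefs are illustrative, and nothing is plotted.

```r
# Build y1 ~ x1, ..., y4 ~ x4 as formula objects
forms <- lapply(seq(4), function(i) {
  reformulate(paste0("x", i), response = paste0("y", i))
})

# Fit all four models in one pass
fits <- lapply(forms, lm, data = anscombe)

# One column per data set: intercepts near 3, slopes near 0.5
coefs <- sapply(fits, coef)
round(coefs, 2)
```

The near-identical coefficients are exactly what the four plots are meant to debunk.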

dapengde · May 22, 2019

Cloud2016 Digging this deep is exhausting. And it still doesn't seem to explain why xlab and ylab are treated differently. Do you want to report this bug to R?

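Cloud2016 · May 22, 2019

dapengde Having dug into it, I don't think it's a bug, and xlab and ylab aren't treated differently either. I just passed it an R object (a call) that it doesn't know how to handle, so it handles it arbitrarily, where "arbitrarily" means with no guarantee of what happens! Let me try pinging @yihui, he should have the answer.

dapengde · May 22, 2019

Cloud2016 If it were handled arbitrarily, xlab should have a 50/50 chance of behaving like method 1 or like method 2. I plotted it several times earlier and it seemed quite consistent. Want to try plotting it 100 times? I don't have a computer at hand right now...

Cloud2016 · May 23, 2019

dapengde I must have misunderstood; I spoke too hastily.

yihui · May 23, 2019

Cloud2016 Take a look at the source code of the plot.formula method: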

> graphics:::plot.formula
function (formula, data = parent.frame(), ..., subset, ylab = varnames[response], 
  ask = dev.interactive()) 
{
  ...

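I haven't looked closely, but the ylab argument does get special treatment. Presumably eval() is applied to it somewhere, which turns y[i] back into a plain value.

Cloud2016 · May 23, 2019

yihui dapengde plot.formula really does handle ylab differently. There's quite a lot of source code and it makes my head spin. Do you use any aids when reading source code?

dapengde · May 23, 2019

Cloud2016 I only understood one thing: ylab does go through special handling, so xlab and ylab really are treated differently. The rest is beyond me.

PS. ggplot does support method 1.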

library(ggplot2)

gg <- ggplot(anscombe) + geom_point(aes(x1, y1))

gg + 
  xlab(substitute(x[i], list(i = 1))) + 
  ylab(substitute(y[i], list(i = 1)))

gg + 
  xlab(as.expression(substitute(x[i], list(i = 1)))) +
  ylab(as.expression(substitute(y[i], list(i = 1))))

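Cloud2016 · May 23, 2019

I hadn't paid proper attention to one point: plot is a generic function, with different methods for different kinds of objects. When I said xlab and ylab were no different, I was talking about plot.default; plot.formula really does give ylab special treatment.

> methods(plot)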
 [1] plot.acf*           plot.data.frame*    plot.decomposed.ts*
 [4] plot.default        plot.dendrogram*    plot.density*      
 [7] plot.ecdf           plot.factor*        plot.formula*      
[10] plot.function       plot.hclust*        plot.histogram*    
[13] plot.HoltWinters*   plot.isoreg*        plot.lm*           
[16] plot.medpolish*     plot.mlm*           plot.ppr*          
[19] plot.prcomp*        plot.princomp*      plot.profile.nls*  
[22] plot.raster*        plot.spec*          plot.stepfun       
[25] plot.stl*           plot.table*         plot.ts            
[28] plot.tskernel*      plot.TukeyHSD*     
see '?methods' for accessing help and source code
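The dispatch can be checked directly, as a sketch that inspects arguments and draws nothing. A formula has class "formula", so plot() calls plot.formula(), which the listing marks with an asterisk (not exported), hence utils::getS3method() below. Its argument list names ylab explicitly, while xlab can only arrive through `...`, which is consistent with the two being handled along different paths.

```r
f <- y1 ~ x1
class(f)        # "formula", so plot(f, ...) dispatches to plot.formula()

# plot.formula is not exported, so fetch it from the S3 registry
pf <- getS3method("plot", "formula")
names(formals(pf))

"ylab" %in% names(formals(pf))   # TRUE: ylab is a named argument
"xlab" %in% names(formals(pf))   # FALSE: xlab travels inside ...
```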

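yihui · May 23, 2019

Cloud2016 Honestly, you might as well skip reading base R's source code; it isn't well suited for learning. In plot.formula, I suspect it is this do.call that evaluates ylab from a call into an actual value: https://github.com/wch/r-source/blob/540cb48/src/library/graphics/R/plot.R#L207 because do.call() does not protect expressions by default (its quote argument defaults to FALSE). Compare: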

> do.call(c, list(a = substitute(cars[i, 1], list(i = 1))))  # 默认不保护
a 
4 
> do.call(c, list(a = substitute(cars[i, 1], list(i = 1))), quote = TRUE)  # 保护表达式
$a
cars[1, 1]
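This do.call() behaviour may also account for the 8.04 asked about earlier, since anscombe$y1[1] is 8.04. Assuming (not verified here) that plot.formula keeps the response in a local variable named y before calling do.call(), an unprotected y[1] would be evaluated against that local y. The sketch imitates the pattern with two hypothetical functions, fake_plot_formula() and show_label(); it is an illustration under that assumption, not R's actual code.

```r
# Hypothetical helper: returns the labels it receives
show_label <- function(xlab, ylab) list(xlab = xlab, ylab = ylab)

# Hypothetical stand-in for the suspected pattern: the response is stored
# in a local `y`, and labels are forwarded with do.call(), which by default
# evaluates any call it finds in its argument list
fake_plot_formula <- function(formula, data, ylab, protect = FALSE) {
  y <- data[[all.vars(formula)[1]]]    # e.g. anscombe$y1
  if (protect) ylab <- enquote(ylab)   # becomes quote(y[1]), survives one eval
  do.call(show_label, list(xlab = "x", ylab = ylab))
}

lbl <- substitute(y[i], list(i = 1))   # the call y[1]

fake_plot_formula(y1 ~ x1, anscombe, ylab = lbl)$ylab
# 8.04: y[1] was evaluated against the local y

fake_plot_formula(y1 ~ x1, anscombe, ylab = lbl, protect = TRUE)$ylab
# y[1]: the call comes through unevaluated
```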

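Concepts like quote() / enquote() are fairly hard to grasp. I still haven't fully figured them out, and I'm not very keen to dig deeper.

Cloud2016 · May 23, 2019

yihui dapengde Great, we've finally dug out the root cause. I think this does count as a bug. If any of you experts with good English have the time, you might submit a report.

Jiena · May 23, 2019

yihui About quote() / enquote(): all I know is that enquote() wraps an expression in an extra quote(). For example: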

> quote(AA(c(10.5, 15.2, 12.3))) # AA is undefined; the expression is protected
AA(c(10.5, 15.2, 12.3))
> enquote(call("AA", c(10.5, 15.2, 12.3))) # protects the expression one level further? I've hardly ever used this.
quote(AA(c(10.5, 15.2, 12.3)))

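As for the difference between quote() and call(): call() does not protect its arguments. For example:

> quote(round(BB)) ## BB is undefined; the whole expression is protected
round(BB)
> call("round", BB) ## "round" is protected, but BB is evaluated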
Error: object 'BB' not found
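Since call() evaluates its arguments, one way to build round(BB) while BB is undefined is to pass a symbol. A short sketch using as.name() (the same function the help(anscombe) loop uses) together with enquote(); the names cl and eq are illustrative.

```r
# call() evaluates its arguments, so hand it a symbol instead of BB itself
cl <- call("round", as.name("BB"))
cl                                # round(BB)
identical(cl, quote(round(BB)))   # TRUE: same call that quote() builds

# enquote() adds exactly one layer of protection:
# evaluating quote(round(BB)) gives back the call round(BB), not its value
eq <- enquote(cl)
eq                                # quote(round(BB))
identical(eval(eq), cl)           # TRUE, and BB is never looked up
```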

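yihui · June 7, 2020

Cloud2016 This ancient bug has finally been fixed in the latest release, R 4.0.1: https://stat.ethz.ch/pipermail/r-announce/2020/000655.html

Cloud2016 · June 7, 2020

yihui Wow, someone had already filed a bug report for this back in 2007. I recently noticed that Michael Chirico has mirrored the R bug reports to GitHub: https://github.com/MichaelChirico/r-bugs which certainly makes them much easier to search and cuts down on duplicate bug reports.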