dapengde Liechi
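My suggestion: when reproducing Base R graphics in ggplot2, could we aim for a likeness in spirit and not only in appearance? By that I do not mean making the two versions indistinguishable. I mean first understanding in depth how Base R draws the plot (which may well involve statistics, as with the histogram) and then translating it into equivalent ggplot2 code. If the gallery chapter were reproduced to this standard, I think we would gain on both the statistics side and the visualization side, which would make the exercise more meaningful. After all, I notice that both of you are ggplot2 fans and already know the basic operations.
To make this concrete, take the histogram from the gallery chapter of Modern Statistical Graphics (现代统计图形) as an example:
# Base R version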
par(mfrow = c(2, 2), mar = c(2, 3, 2, .5), mgp = c(2, .5, 0))
data(geyser, package = "MASS")
hist(geyser$waiting, main = "(1) freq = TRUE", xlab = "waiting")
hist(geyser$waiting, freq = FALSE, xlab = "waiting", main = "(2) freq = FALSE")
hist(geyser$waiting, breaks = 5, density = 10, xlab = "waiting", main = "(3) breaks = 5")
hist(geyser$waiting, breaks = 40, col = "red", xlab = "waiting", main = "(4) breaks = 40")
# ggplot2 version
library(ggplot2)
library(cowplot)
p <- ggplot(aes(waiting), data = geyser)
p1 <- p + geom_histogram(breaks = seq(40, 110, by = 5))
# after_stat() replaces stat(), which is deprecated in recent ggplot2 releases
p2 <- p + geom_histogram(breaks = seq(40, 110, by = 5), aes(y = after_stat(density)))
p3 <- p + geom_histogram(breaks = seq(40, 110, by = 10))
p4 <- p + geom_histogram(breaks = seq(42, 108, by = 2), fill = "red", color = "black")
plot_grid(p1, p2, p3, p4, labels = c(
"(1) freq = TRUE",
"(2) freq = FALSE",
"(3) breaks = 5",
"(4) breaks = 40"
), ncol = 2)
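Note how breaks is set in the ggplot2 version: the values are taken from the result of the Base R plot, not found by trial and error. Take the first panel as an example: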
h <- hist(geyser$waiting, main = "(1) freq = TRUE", xlab = "waiting")
h
$breaks
[1] 40 45 50 55 60 65 70 75 80 85 90 95 100 105 110
$counts
[1] 2 26 29 25 17 10 34 59 44 35 14 3 0 1
$density
[1] 0.0013377926 0.0173913043 0.0193979933 0.0167224080 0.0113712375
[6] 0.0066889632 0.0227424749 0.0394648829 0.0294314381 0.0234113712
[11] 0.0093645485 0.0020066890 0.0000000000 0.0006688963
$mids
[1] 42.5 47.5 52.5 57.5 62.5 67.5 72.5 77.5 82.5 87.5 92.5 97.5
[13] 102.5 107.5
$xname
[1] "geyser$waiting"
$equidist
[1] TRUE
attr(,"class")
[1] "histogram"
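The same idea can be written so that the breaks are never typed by hand. This is only a minimal sketch: it assumes the default `hist()` settings (right-closed bins) and passes `closed = "right"` to `geom_histogram()` explicitly so that both sides bin the data the same way. `gg_counts` is a name introduced here for illustration.

```r
library(ggplot2)
data(geyser, package = "MASS")

# Let hist() compute the bins without drawing anything
h <- hist(geyser$waiting, plot = FALSE)

# Reuse exactly the same break points in ggplot2
p1 <- ggplot(geyser, aes(waiting)) +
  geom_histogram(breaks = h$breaks, closed = "right")

# Pull the bin counts that ggplot2 computed and compare them with hist()
gg_counts <- layer_data(p1)$count
all.equal(as.numeric(h$counts), as.numeric(gg_counts))
```

Comparing the counts this way checks that the two histograms are equivalent in the data they show, not just similar in appearance.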
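Another example from this section is
demo("hist_geyser", package = "MSG")
# `ht` is not defined in this post; it is assumed to be left in the
# workspace by the demo above (one segment height per break point)
df <- data.frame(x = seq(40, 110, 5), y = 0,
xend = seq(40, 110, 5), yend = ht)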
p2 + geom_density(fill = "lightgray", color = "black") +
geom_segment(aes(x = x, y = y, xend = xend, yend = yend),
data = df, lty = 3
)
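If the demo is not available, the segment heights can be rebuilt by hand. The sketch below assumes that `ht` is meant to be the kernel density estimate evaluated at each break point; the demo itself may compute it differently. `brk` and `d` are names introduced here for illustration.

```r
data(geyser, package = "MASS")

# Kernel density estimate with the default bandwidth
d <- density(geyser$waiting)

# Break points of the histogram, as printed by hist() above
brk <- seq(40, 110, 5)

# Interpolate the density curve at each break point
ht <- approx(d$x, d$y, xout = brk)$y

# One vertical segment per break, from the x-axis up to the density curve
df <- data.frame(x = brk, y = 0, xend = brk, yend = ht)
```

The resulting `df` has the same columns as the one used in the `geom_segment()` call above.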