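Problem description
My data frame has two categorical variables, city and dim, and two measured values, area and weight. I want to group by city and then, within each group, compute a statistic of area or weight for the rows where dim equals a particular category. I can do the calculation with dplyr's group_by and summarise, but how do I wrap it in a function? Details follow.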
Code and results
# Load the required package
library(dplyr)
# Build the example data
mydata <- data.frame(
  "city" = rep(c("Beijing", "Shanghai", "Kunming"), each = 4),
  "dim" = rep(c(1, 2, 3), 4),
  "area" = 1:12,
  "weight" = rep(2, 12)
)
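Here is my calculation and the results I expect:
# Group by city, then compute, within each group, the share of total area held by rows with dim == 1
# Then do the same for weight: the share of each group's total weight held by rows with dim == 1
mydata %>% group_by(city) %>%
  summarise(perc = sum(ifelse(dim == 1, area, 0)/sum(area)))
mydata %>% group_by(city) %>%
  summarise(perc = sum(ifelse(dim == 1, weight, 0)/sum(weight)))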
# `summarise()` ungrouping output (override with `.groups` argument)
# A tibble: 3 x 2
# city perc
# <chr> <dbl>
# 1 Beijing 0.5
# 2 Kunming 0.238
# 3 Shanghai 0.269
# `summarise()` ungrouping output (override with `.groups` argument)
# A tibble: 3 x 2
# city perc
# <chr> <dbl>
# 1 Beijing 0.5
# 2 Kunming 0.25
# 3 Shanghai 0.25
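Next I tried to write a function that performs the grouped calculation above, where x is the input data frame mydata and y is the variable to summarise, i.e. area or weight: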
myfun <- function(x,y) {
  x %>% group_by(city) %>%
    summarise(perc = sum(ifelse(dim == 1, y, 0)/sum(y)))
}
myfun(mydata, mydata$area)
# However, the result is wrong:
# `summarise()` ungrouping output (override with `.groups` argument)
# A tibble: 3 x 2
# city perc
# <chr> <dbl>
# 1 Beijing 0.0641
# 2 Kunming 0.0256
# 3 Shanghai 0.0385
# So I tried this instead:
myfun(mydata, "area")
# But this fails with an error:
# Error: Problem with `summarise()` input `perc`.
# x invalid 'type' (character) of argument
# i Input `perc` is `sum(ifelse(dim == 1, y, 0)/sum(y))`.
# i The error occurred in group 1: city = "Beijing".
# Run `rlang::last_error()` to see where the error occurred.
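For what it's worth, both failures seem to come from how y reaches summarise. Passing mydata$area hands over the full 12-element vector, which is not subset per group, so sum(y) is the total over all cities (78) rather than the group total. Passing "area" hands over a plain string, and sum() cannot add a character value. Below is a minimal sketch of two common ways around this, assuming a dplyr/rlang version that supports the {{ }} (embrace) operator and the .data pronoun. The function names perc_by_city and perc_by_city_chr are made up for illustration.

```r
library(dplyr)

# Same example data as in the question
mydata <- data.frame(
  city = rep(c("Beijing", "Shanghai", "Kunming"), each = 4),
  dim = rep(c(1, 2, 3), 4),
  area = 1:12,
  weight = rep(2, 12)
)

# Variant 1 (hypothetical name): take a bare column name and forward it
# into summarise() with {{ }}, so it is evaluated per group inside the data
perc_by_city <- function(data, col) {
  data %>%
    group_by(city) %>%
    summarise(perc = sum(ifelse(dim == 1, {{ col }}, 0)) / sum({{ col }}))
}

# Variant 2 (hypothetical name): take the column name as a string and
# look it up through the .data pronoun
perc_by_city_chr <- function(data, col) {
  data %>%
    group_by(city) %>%
    summarise(perc = sum(ifelse(dim == 1, .data[[col]], 0)) / sum(.data[[col]]))
}

res_area <- perc_by_city(mydata, area)             # bare name, no quotes
res_weight <- perc_by_city_chr(mydata, "weight")   # quoted name
print(res_area)
print(res_weight)
```

With the example data, both calls should reproduce the per-city proportions from the hand-written pipelines above. The bare-name variant reads more like ordinary dplyr code, while the string variant is handier when the column name comes from a variable or a loop.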