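Suppose I have data from a multi-year panel survey with gaps, already read into a single list. Each data frame in the list contains the respondent `id`, the region `area`, a variable `var_1`, and the `year`, but the set of areas is not the same every year. What is a convenient way to get, for each area, the latest year in which it appears, together with the respondent ids and variable values from that year?
Example data: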
library(stringr)  # provides str_remove_all()

x <- list(
  'year_2014' = data.frame(id    = c(41, 42, 43, 44, 45, 46),
                           area  = c("A", "A", "B", "B", "C", "C"),
                           var_1 = c(11, 13, 12, 14, 17, 17)),
  'year_2015' = data.frame(id    = c(51, 52, 53, 54, 55, 56),
                           area  = c("A", "A", "B", "B", "D", "D"),
                           var_1 = c(10, 13, 13, 15, 25, 27)),
  'year_2016' = data.frame(id    = c(61, 62, 63, 64, 65, 66),
                           area  = c("A", "A", "E", "E", "F", "F"),
                           var_1 = c(12, 12, 25, 23, 7, 8))
)
x <- mapply(cbind, x, 'year' = str_remove_all(names(x), 'year_'), SIMPLIFY = FALSE)  # add a year column taken from each list name
Expected result:
| area | year | id | var_1 |
| ---- | ---- | ---- | ---- |
| A | 2016 | 61 | 12 |
| A | 2016 | 62 | 12 |
| B | 2015 | 53 | 13 |
| B | 2015 | 54 | 15 |
| C | 2014 | 45 | 17 |
| C | 2014 | 46 | 17 |
| D | 2015 | 55 | 25 |
| D | 2015 | 56 | 27 |
| E | 2016 | 63 | 25 |
| E | 2016 | 64 | 23 |
| F | 2016 | 65 | 7 |
| F | 2016 | 66 | 8 |
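I have rarely worked with lists before (although reading many similarly structured files into a list in one go is very neat). Is there a better approach than first merging all the data frames in the list into a single data frame and then computing on that?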
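
For reference, here is a minimal base R sketch of the two routes, not a definitive answer. The first is the merge-first baseline mentioned above. The second walks the list from the newest year to the oldest and keeps only the areas not seen yet, so the full merged table is never built. The names `all_rows`, `latest`, `seen`, `pieces` and `latest2` are made up for illustration, and the year is parsed with `sub()` as an integer, which is an assumption of this sketch rather than part of the original setup.

```r
# Rebuild the example list from the question (base R only).
x <- list(
  year_2014 = data.frame(id    = c(41, 42, 43, 44, 45, 46),
                         area  = c("A", "A", "B", "B", "C", "C"),
                         var_1 = c(11, 13, 12, 14, 17, 17)),
  year_2015 = data.frame(id    = c(51, 52, 53, 54, 55, 56),
                         area  = c("A", "A", "B", "B", "D", "D"),
                         var_1 = c(10, 13, 13, 15, 25, 27)),
  year_2016 = data.frame(id    = c(61, 62, 63, 64, 65, 66),
                         area  = c("A", "A", "E", "E", "F", "F"),
                         var_1 = c(12, 12, 25, 23, 7, 8))
)

# Attach the year parsed from each list name (assumption: stored as integer).
x <- Map(function(df, nm) cbind(df, year = as.integer(sub("year_", "", nm))),
         x, names(x))

cols <- c("area", "year", "id", "var_1")

# Route 1 (merge-first baseline): stack all years, then keep the rows whose
# year equals the maximum year observed for their area.
all_rows <- do.call(rbind, x)
latest <- all_rows[all_rows$year == ave(all_rows$year, all_rows$area, FUN = max), ]
latest <- latest[order(latest$area, latest$id), cols]
rownames(latest) <- NULL

# Route 2 (no full merge): visit the data frames from newest to oldest and
# keep only rows from areas that have not appeared in a newer year.
yrs <- vapply(x, function(df) df$year[1], integer(1))
seen <- character(0)
pieces <- list()
for (nm in names(x)[order(yrs, decreasing = TRUE)]) {
  df <- x[[nm]]
  pieces[[nm]] <- df[!df$area %in% seen, ]
  seen <- union(seen, df$area)
}
latest2 <- do.call(rbind, pieces)
latest2 <- latest2[order(latest2$area, latest2$id), cols]
rownames(latest2) <- NULL

print(latest)
```

On the example data both routes return the same 12 rows as the expected table. The second route only helps if the merged table would be too large to hold comfortably in memory. Otherwise the merge-first version is shorter and easier to read.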