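1. The tstrsplit function in the data.table package is handy: when splitting on ";", rows that yield fewer pieces are padded with NA.
2. Use melt() to reshape from wide to long; it is the counterpart of reshape() in base R.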
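library("data.table")
# Example data: c1 holds ';'-separated values of varying length
x <- data.frame(year = c(2019, 2019, 2019, 2020, 2020),
                c1 = c('a;b', 'c;d;e', 'f;g', 'h;i;j;k', 'l;m'))
setDT(x)  # convert to a data.table by reference
# Split c1 on ';' into columns V1..V4; shorter rows are padded with NA
x[, tstrsplit(c1, ";")]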
#> V1 V2 V3 V4
#> 1: a b <NA> <NA>
#> 2: c d e <NA>
#> 3: f g <NA> <NA>
#> 4: h i j k
#> 5: l m <NA> <NA>
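# Split again and attach year, giving one wide table
y <- x[, tstrsplit(c1, ";")][, year := x$year]
# Wide to long: drop the NA padding, sort, and keep only value (as c1) and year
melt(y, id.vars = "year", na.rm = TRUE)[order(year, value), .(c1 = value, year)]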
#> c1 year
#> 1: a 2019
#> 2: b 2019
#> 3: c 2019
#> 4: d 2019
#> 5: e 2019
#> 6: f 2019
#> 7: g 2019
#> 8: h 2020
#> 9: i 2020
#> 10: j 2020
#> 11: k 2020
#> 12: l 2020
#> 13: m 2020
<sup>Created on 2022-05-02 by the reprex package (v2.0.1)</sup>
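Since point 2 mentions base R's reshape() as the counterpart of melt(), here is a rough sketch of the same wide-to-long step without data.table. It is only an illustration under a few assumptions: the names parts, n, wide, long and the timevar name "piece" are made up for this example, and the padding with `length<-` stands in for what tstrsplit does automatically.

```r
# Same example data as above, kept as a plain data.frame
x <- data.frame(year = c(2019, 2019, 2019, 2020, 2020),
                c1 = c("a;b", "c;d;e", "f;g", "h;i;j;k", "l;m"))

# Split on ';' and pad every piece vector with NA up to the longest one
parts <- strsplit(x$c1, ";", fixed = TRUE)
n <- max(lengths(parts))
wide <- as.data.frame(do.call(rbind, lapply(parts, `length<-`, n)))
wide$year <- x$year  # columns are now V1..Vn plus year

# Wide to long with base R reshape(); "piece" is a hypothetical name
# for the column that records which of V1..Vn a value came from
long <- reshape(wide, direction = "long",
                varying = paste0("V", seq_len(n)),
                v.names = "c1", timevar = "piece")

# Drop the NA padding, keep the two columns of interest, and sort
long <- long[!is.na(long$c1), c("c1", "year")]
long <- long[order(long$year, long$c1), ]
rownames(long) <- NULL
print(long)
```

Compared with melt(), reshape() needs the varying columns and the value name spelled out, and has no na.rm argument, so the NA rows are filtered afterwards.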