Problem
I have a data frame:
tbl <- structure(list(STN = c(402550L, 402550L, 402550L, 402550L, 402550L,
402550L), WBAN = c(99999L, 99999L, 99999L, 99999L, 99999L, 99999L
), YEARMODA = c(20200105L, 20200107L, 20200108L, 20200110L, 20200112L,
20200113L), TEMP = c(48.2, 50.6, 47.3, 44.7, 49.9, 51.4), DEWP = c(4L,
4L, 4L, 5L, 4L, 4L), SLP = c(9999.9, 9999.9, 9999.9, 9999.9,
9999.9, 9999.9), STP = c(0L, 0L, 0L, 0L, 0L, 0L), VISIB = c(9999.9,
9999.9, 9999.9, 9999.9, 9999.9, 9999.9), WDSP = c(0L, 0L, 0L,
0L, 0L, 0L), MXSPD = c(9999.9, 9999.9, 9999.9, 9999.9, 9999.9,
9999.9), GUST = c(0L, 0L, 0L, 0L, 0L, 0L), MAX = c(999.9, 999.9,
999.9, 999.9, 999.9, 999.9), MIN = c(0L, 0L, 0L, 0L, 0L, 0L),
PRCP = c(3.7, 3.2, 4.8, 1.8, 3.7, 4), SNDP = c(4L, 4L, 4L,
5L, 4L, 4L), FRSHTT = c(5.1, 4.1, 6, 4.1, 5.1, 6), V17 = c(999.9,
999.9, 999.9, 999.9, 999.9, 999.9), V18 = c("50.7", "58.1",
"52.3", "47.8*", "52.5*", "54.3*"), V19 = c("47.1*", "47.1*",
"45.1*", "43.0*", "47.7", "49.6"), V20 = c("0.24E", "0.12E",
"0.43E", "0.00I", "0.00I", "0.04E"), V21 = c(999.9, 999.9,
999.9, 999.9, 999.9, 999.9), V22 = c(0L, 0L, 0L, 0L, 0L,
0L)), row.names = c(NA, -6L), class = c("data.table", "data.frame"))
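There are six extra columns because some numbers in the source data (an .op file, imported with data.table::fread()) contain spaces and were split into two fields. There should be 16 columns. In the code above, columns 4 and 5 should be a single column named TEMP, columns 6 and 7 a single column named DEWP, and so on. Apart from the 999.9 columns, every floating-point column and the integer column that follows it should be one column. How can I quickly restore the columns?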
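To make the intended result concrete, here is a minimal base-R sketch of the pairing described above, run on a small made-up table rather than the real one. The helper name merge_pairs() and the toy values are hypothetical; it assumes the merged columns should take the first names of the header in order (TEMP, DEWP, ...), as stated above.

```r
# Toy stand-in for the imported table: one leading column, then two
# float/integer pairs. Names and values are illustrative only.
toy <- data.frame(
  STN  = c(402550L, 402550L),
  TEMP = c(48.2, 50.6),
  DEWP = c(4L, 4L),
  SLP  = c(9999.9, 9999.9),
  STP  = c(0L, 0L)
)

# merge_pairs() is a hypothetical helper, not part of any package.
# `first` holds the position of the left-hand column of each pair.
merge_pairs <- function(df, first, sep = "") {
  df <- as.data.frame(df)
  for (i in first) {
    # Paste the left-hand column with its right-hand neighbour.
    df[[i]] <- paste(df[[i]], df[[i + 1]], sep = sep)
  }
  # Drop the right-hand columns; the remaining order is unchanged.
  out <- df[, -(first + 1), drop = FALSE]
  # Reuse the leading header names, so the second pair becomes DEWP.
  names(out) <- names(df)[seq_along(out)]
  out
}

merged <- merge_pairs(toy, first = c(2, 4))
```

For the table in the question, the call would presumably be merge_pairs(tbl, first = seq(4, 14, by = 2)), assuming the six pairs occupy columns 4 through 15. The merged columns come back as character and would still need converting.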
My attempt
I tried tidyr::unite():
tbl %>%
unite('TEMP', TEMP:DEWP, sep = '') %>%
unite('DEWP', SLP:STP, sep = '') %>%
unite('SLP', VISIB:WDSP, sep = '') %>%
...
This loses columns.
tbl %>%
unite('TEMP', TEMP:DEWP, sep = '', remove = FALSE) %>%
unite('DEWP', SLP:STP, sep = '', remove = FALSE) %>%
unite('SLP', VISIB:WDSP, sep = '', remove = FALSE) %>%
...
This keeps every column, but it scrambles the column order.
Questions
1. Is there a simpler way?
2. Can this be fixed at the root, that is, when the data is read in?
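On question 2, one direction that might be worth trying is reading the file by column position instead of splitting on whitespace. The sketch below uses base R's read.fwf() on two made-up records. It assumes the .op file is fixed-width, which is not confirmed here, and the widths and column names are hypothetical placeholders, not the real layout.

```r
# Two made-up fixed-width records. The TEMP field holds a value and a
# trailing integer separated by spaces, which whitespace splitting breaks.
records <- c(
  "402550 20200105  48.2  4",
  "402550 20200107  50.6  4"
)
path <- tempfile(fileext = ".op")
writeLines(records, path)

# Hypothetical widths: the real ones must come from the file's format
# description. Here: STN (6), YEARMODA (9), TEMP including its integer (9).
widths <- c(6, 9, 9)

fw <- read.fwf(
  path,
  widths     = widths,
  col.names  = c("STN", "YEARMODA", "TEMP"),
  colClasses = "character"   # keep raw text so nothing is coerced early
)

# Trim the padding around each field, then remove the spaces inside
# TEMP so the two pieces form one value, as with sep = '' in unite().
fw[] <- lapply(fw, trimws)
fw$TEMP <- gsub("\\s+", "", fw$TEMP)

unlink(path)
```

Because every field is cut at a fixed position, the spaces inside a field no longer create extra columns, so no merging is needed afterwards.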