The help documentation for the R package data.table contains this passage:
rollends
A logical vector length 2 (a single logical is recycled) indicating whether values falling before the first value or after the last value for a group should be rolled as well.
- If rollends[2]=TRUE, it will roll the last value forward. TRUE by default for LOCF and FALSE for NOCB rolls.
- If rollends[1]=TRUE, it will roll the first value backward. TRUE by default for NOCB and FALSE for LOCF rolls.
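As a small illustration of the passage above, here is a minimal sketch using a made-up lookup table (the names X, Q, k and v are hypothetical and not part of the question). It assumes that rollends only matters for query values that fall before the first key or after the last key, which is how the quoted documentation reads.

```r
library(data.table)

# Hypothetical lookup table with keys 10, 20, 30
X <- data.table(k = c(10, 20, 30), v = c("a", "b", "c"), key = "k")

# Query values: one before the first key, one inside the range, one after the last key
Q <- data.table(k = c(5, 25, 35))

# LOCF roll with the documented default, i.e. rollends = c(FALSE, TRUE)
locf_default <- X[Q, on = "k", roll = TRUE]

# Both ends rolled: 5 picks up the first value, 35 picks up the last value
locf_both <- X[Q, on = "k", roll = TRUE, rollends = c(TRUE, TRUE)]

# Neither end rolled: only 25, which lies inside the key range, finds a match
locf_none <- X[Q, on = "k", roll = TRUE, rollends = c(FALSE, FALSE)]

print(locf_default)
print(locf_both)
print(locf_none)
```

In this sketch the query value 25 gets "b" in all three calls; only the rows for 5 and 35 change with rollends.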
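When rollends = c(F, T) or rollends = c(T, F), I can understand the meaning as "do not roll the first value backward but roll the last value forward" and "roll the first value backward but do not roll the last value forward", respectively. But when rollends = c(T, T) or rollends = c(F, F), I can no longer follow this logic to make sense of the output that data.table produces.
Here is an example: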
library(data.table)
A <- data.table(idA = c("A1", "A2", "A3"),
                DateA = as.Date(c("2023-01-01", "2023-02-23", "2023-05-09")),
                value = c(0.2, 0.5, 1.5))
B <- data.table(idB = c("B1", "B2", "B3", "B4", "B5", "B6"),
                DateB = as.Date(c("2023-01-31", "2023-02-28", "2023-03-30",
                                  "2023-04-30", "2023-05-31", "2023-06-30")))
A[, RollDT := DateA]
B[, RollDT := DateB]
setkey(A, RollDT)
setkey(B, RollDT)
A
#> idA DateA value RollDT
#> 1: A1 2023-01-01 0.2 2023-01-01
#> 2: A2 2023-02-23 0.5 2023-02-23
#> 3: A3 2023-05-09 1.5 2023-05-09
B
#> idB DateB RollDT
#> 1: B1 2023-01-31 2023-01-31
#> 2: B2 2023-02-28 2023-02-28
#> 3: B3 2023-03-30 2023-03-30
#> 4: B4 2023-04-30 2023-04-30
#> 5: B5 2023-05-31 2023-05-31
#> 6: B6 2023-06-30 2023-06-30
A[B, roll = T, rollends = c(T, T)]
#> idA DateA value RollDT idB DateB
#> 1: A1 2023-01-01 0.2 2023-01-31 B1 2023-01-31
#> 2: A2 2023-02-23 0.5 2023-02-28 B2 2023-02-28
#> 3: A2 2023-02-23 0.5 2023-03-30 B3 2023-03-30
#> 4: A2 2023-02-23 0.5 2023-04-30 B4 2023-04-30
#> 5: A3 2023-05-09 1.5 2023-05-31 B5 2023-05-31
#> 6: A3 2023-05-09 1.5 2023-06-30 B6 2023-06-30
A[B, roll = T, rollends = c(F, F)]
#> idA DateA value RollDT idB DateB
#> 1: A1 2023-01-01 0.2 2023-01-31 B1 2023-01-31
#> 2: A2 2023-02-23 0.5 2023-02-28 B2 2023-02-28
#> 3: A2 2023-02-23 0.5 2023-03-30 B3 2023-03-30
#> 4: A2 2023-02-23 0.5 2023-04-30 B4 2023-04-30
#> 5: <NA> <NA> NA 2023-05-31 B5 2023-05-31
#> 6: <NA> <NA> NA 2023-06-30 B6 2023-06-30
<sup>Created on 2023-03-27 by the reprex package (v2.0.1)</sup>
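For A[B, roll = T, rollends = c(T, T)], my understanding was "roll the first value backward and also roll the last value forward". That is, for the first value of DateB in table B, "2023-01-31", look in DateA of table A for the last value earlier than it, namely 2023-01-01, then also look for the first value later than it, namely 2023-02-23, and then put the corresponding observations from table A together so that both correspond to "2023-01-31". In other words, the result should contain the following two consecutive rows: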
#> idA DateA value RollDT idB DateB
#> 1: A1 2023-01-01 0.2 2023-01-31 B1 2023-01-31
#> 2: A2 2023-02-23 0.5 2023-01-31 B1 2023-01-31
But that is not what happens: only one row appears in the result.
Why is that? How should I interpret the results above? Thanks, everyone!
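For reference, the two-row result described above (both the preceding and the following row of A for each date in B) can be sketched with two separate rolling joins that are then stacked. This is only one possible approach, assuming that is the desired output; it is not what rollends does. The names prev_match, next_match and both are hypothetical.

```r
library(data.table)

# Same data as in the example above
A <- data.table(idA = c("A1", "A2", "A3"),
                DateA = as.Date(c("2023-01-01", "2023-02-23", "2023-05-09")),
                value = c(0.2, 0.5, 1.5))
B <- data.table(idB = c("B1", "B2", "B3", "B4", "B5", "B6"),
                DateB = as.Date(c("2023-01-31", "2023-02-28", "2023-03-30",
                                  "2023-04-30", "2023-05-31", "2023-06-30")))
A[, RollDT := DateA]
B[, RollDT := DateB]

# LOCF: for each row of B, the last row of A at or before DateB
prev_match <- A[B, on = "RollDT", roll = TRUE]

# NOCB: for each row of B, the first row of A at or after DateB
next_match <- A[B, on = "RollDT", roll = -Inf]

# Stack both results, drop duplicates (exact matches appear in both joins)
# and rows where no match was found, then sort
both <- unique(rbind(prev_match, next_match))[!is.na(idA)][order(DateB, DateA)]

print(both[idB == "B1"])
```

With this sketch, B1 ("2023-01-31") is paired with both A1 and A2, while dates after the last DateA only keep their LOCF match.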
R environment and versions:
xfun::session_info()
#> R version 4.1.3 (2022-03-10)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 19044)
#>
#> Locale:
#> LC_COLLATE=Chinese (Simplified)_China.936
#> LC_CTYPE=Chinese (Simplified)_China.936
#> LC_MONETARY=Chinese (Simplified)_China.936
#> LC_NUMERIC=C
#> LC_TIME=Chinese (Simplified)_China.936
#>
#> Package version:
#> base64enc_0.1.3 bslib_0.3.1 callr_3.7.0 cli_3.2.0
#> clipr_0.8.0 compiler_4.1.3 crayon_1.5.1 data.table_1.14.2
#> digest_0.6.29 ellipsis_0.3.2 evaluate_0.15 fansi_1.0.3
#> fastmap_1.1.0 fs_1.5.2 glue_1.6.2 graphics_4.1.3
#> grDevices_4.1.3 highr_0.9 htmltools_0.5.2 jquerylib_0.1.4
#> jsonlite_1.8.0 knitr_1.38 lifecycle_1.0.1 magrittr_2.0.3
#> methods_4.1.3 pillar_1.7.0 pkgconfig_2.0.3 processx_3.5.2
#> ps_1.6.0 purrr_0.3.4 R.cache_0.15.0 R.methodsS3_1.8.1
#> R.oo_1.24.0 R.utils_2.11.0 R6_2.5.1 rappdirs_0.3.3
#> rematch2_2.1.2 reprex_2.0.1 rlang_1.0.2 rmarkdown_2.13
#> rprojroot_2.0.3 rstudioapi_0.13 sass_0.4.1 stats_4.1.3
#> stringi_1.7.6 stringr_1.4.0 styler_1.7.0 tibble_3.1.6
#> tinytex_0.38 tools_4.1.3 utf8_1.2.2 utils_4.1.3
#> vctrs_0.4.1 withr_2.5.0 xfun_0.30 yaml_2.3.5