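frankzhang21, I've run into a new problem: the Store_Name in the result is really the full history of stores each customer visited, repeats included. Is there a better way to deduplicate the repeated store names within each combination?
For example, suppose customer A is especially fond of store x1, and the data set is changed to: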
library(data.table)
dt <- data.table(
  Customer_Name = c("A","A","C","D","D","B","A","A","A"),
  Store_Name = c("x1","x2","x2","x2","x3","x1","x1","x1","x1"))
dt[, .(Store_Name = paste0(Store_Name, collapse = "_")), by = .(Customer_Name)][, .N, by = .(Store_Name)]
The result then becomes:
       Store_Name N
1: x1_x2_x1_x1_x1 1
2:             x2 1
3:          x2_x3 1
4:             x1 1
rather than:
   Store_Name N
1:      x1_x2 1
2:         x2 1
3:      x2_x3 1
4:         x1 1
For now I just delete the duplicate rows from the data frame before aggregating, which feels pretty clumsy...
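A minimal sketch of that workaround, assuming "deleting the duplicate rows" means calling unique() on the whole table before pasting the store names together. The names res and res2 are introduced here only for illustration. The second expression is one possible variant that deduplicates inside each customer group instead, so the original rows stay untouched:

```r
library(data.table)

dt <- data.table(
  Customer_Name = c("A","A","C","D","D","B","A","A","A"),
  Store_Name = c("x1","x2","x2","x2","x3","x1","x1","x1","x1"))

# Workaround: drop duplicate (customer, store) rows first, then build the combinations
res <- unique(dt)[, .(Store_Name = paste0(Store_Name, collapse = "_")), by = .(Customer_Name)][, .N, by = .(Store_Name)]

# Possible variant: keep dt as it is and deduplicate within each customer group
res2 <- dt[, .(Store_Name = paste0(unique(Store_Name), collapse = "_")), by = .(Customer_Name)][, .N, by = .(Store_Name)]

print(res)
print(res2)
```

Both forms keep the order of first appearance, so "x1_x2" and "x2_x1" would still count as different combinations.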
Thanks in advance :)