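chaoyuan_101 Suppose both columns have the same data type (character). If the two columns contain both a,b and b,a, I want those rows treated as duplicates, and I also want to drop rows where both columns hold the same value, such as x,x. How can I remove these duplicates? I read the example in the earlier thread and tried it myself, but it doesn't work:

    index1 <- duplicated(a1[, 1])
    index2 <- duplicated(a1[, 2])
    index <- index1 & index2
    a2 <- a1[!index, ]
    a2

      first last
    1     q    e
    2     a    b
    3     o    e
    4     b    a
    5     c    h
    6     x    x

How can I solve this?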
tctcab chaoyuan_101 Going by your description, your definition of "duplicate" is a bit involved, so duplicated or unique can't be applied directly. Try this:

    library(data.table)
    library(dplyr)

    # read data
    a = "
    id first last
    1 q e
    2 a b
    3 o e
    4 b a
    5 c h
    6 x x
    "
    tmp <- fread(a)

    tmp2 <- tmp %>%
      # drop rows where both columns hold the same value (x, x)
      filter(first != last) %>%
      # construct new column idx: a,b and b,a get the same idx
      # because the two values are pasted in alphabetical order (larger first)
      mutate(idx = ifelse(first > last, paste0(first, last), paste0(last, first))) %>%
      # filter out duplicated rows based on idx
      filter(!duplicated(idx))

    ## output, rows 4 and 6 are removed
    tmp2

      id first last idx
    1  1     q    e  qe
    2  2     a    b  ba
    3  3     o    e  oe
    4  5     c    h  hc
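For anyone who prefers to stay in base R, here is a minimal sketch of the same idea. It is not the method posted above, just one possible variant: it assumes the data frame is called a1 with character columns first and last, as in the question, and the names lo, hi and keep are made up for illustration. Instead of pasting the two values into one string, it keeps the ordered pair as two separate key columns, which should also avoid accidental collisions when the values are longer than one character (for example b,ab and ba,b would both paste to "bab").

```r
# Sample data from the question (a1, first, last are the names used there).
a1 <- data.frame(
  first = c("q", "a", "o", "b", "c", "x"),
  last  = c("e", "b", "e", "a", "h", "x"),
  stringsAsFactors = FALSE
)

# Order each pair so that (a, b) and (b, a) produce the same key.
# lo / hi are illustrative helper names, not part of the original post.
lo <- pmin(a1$first, a1$last)
hi <- pmax(a1$first, a1$last)

# Keep a row only if its two values differ and its unordered pair
# has not already appeared in an earlier row.
keep <- a1$first != a1$last & !duplicated(data.frame(lo, hi))

a2 <- a1[keep, ]
a2
```

On the sample data this keeps rows 1, 2, 3 and 5, the same rows as the pipeline above. String comparison in R depends on the locale, but since the key is only used to match pairs within one session, that should not affect which rows are treated as duplicates.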