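[Unknown user]
Bro, thanks a lot for your help. However, when I run unlist(dm) I get "cannot allocate vector of size 595.8Mb". I have 4 GB of RAM and I also ran memory.limit(3000), but the problem persists. Is there a good way around it?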
# Collect the distinct elements (low-memory approach: never unlist everything at once)
all<-unique(dm[[1]])
for(i in seq_along(dm)[-1]){
all<-unique(c(all,dm[[i]]))
}
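# Conversion function: logical vector marking which elements of `all` occur in dmi
trand <- function(dmi){
all%in%dmi
}
# Convert the data: one row per list entry, one column per distinct element
ndm<-do.call(rbind,lapply(dm,trand))
colnames(ndm)<-all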
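tndm<-ndm
while(length(tndm)!=0){
# Which column has the most TRUE values?
if(is.null(nrow(tndm))){
# tndm has collapsed to a plain vector: print its first TRUE entry and stop
print(names(tndm[tndm][1]))
break
}else{
dc<-which.max(colSums(tndm))
}
print(colnames(tndm)[dc])
# Drop the rows that contain this element, then drop its column
tndm<-tndm[tndm[,dc]==FALSE,-dc]
}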
> tables()
NAME NROW NCOL MB COLS KEY
[1,] dm_dt 2,199,189 2 17 where,what
Total: 17MB
Converting it to a wide table gives a huge sparse matrix, and that no longer fits in memory...
Using 'what' as value column. Use 'value.var' to override
Aggregate function missing, defaulting to 'length'
Error: cannot allocate vector of size 14.9 Gb
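One way to avoid the dense wide table is to keep it sparse, so that only the non-zero cells are stored. The following is a minimal sketch, assuming the Matrix package is installed; the `where`/`what` vectors are toy data made up for illustration, not the thread's data, and `lev`, `j`, `m`, `counts` and `top` are hypothetical names.

```r
library(Matrix)

# Toy long-format data (hypothetical): `where` is the list index, `what` the element.
where <- c(1L, 1L, 1L, 2L, 3L, 3L, 3L)
what  <- c(5L, 5L, 7L, 5L, 7L, 9L, 5L)

# sparseMatrix() adds up the x values of repeated (i, j) pairs,
# so drop repeated (where, what) pairs first to get a 0/1 matrix.
keep  <- !duplicated(cbind(where, what))
where <- where[keep]
what  <- what[keep]

# Map element values to compact column indices.
lev <- sort(unique(what))
j   <- match(what, lev)

# Memory use scales with length(what), not with nrow * ncol.
m <- sparseMatrix(i = where, j = j, x = 1,
                  dims = c(max(where), length(lev)))

# Column sums: how many list entries contain each element.
counts <- colSums(m)
top <- lev[which.max(counts)]
top
```

Whether this is fast enough inside the greedy loop is a separate question; the sketch only shows that the wide shape does not have to be dense.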
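Does the so-called chunked computation approach mean shuffling data back and forth between disk and memory? Then again, the long table is so small that I feel simply looping over it and overwriting it in place would work too...
# Simulated data: 1,000,000 list entries, each holding 1 to 20 values drawn from 1:20000
len <- sample(1:20, 1000000, replace=TRUE)
dm <- vector("list", 1000000)
for (i in seq_along(dm)){
dm[[i]] <- sample(1:20000, len[i], replace=TRUE)
}
# Long format: `where` is the list index, `what` is the element
where <- rep(seq_along(dm), times=len)
what <- unlist(dm)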
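where1<-where
what1<-what
while(length(what1)!=0){
# Find the most frequent remaining element
top<-which.max(table(what1))
print(names(top))
# Keep only the rows whose list entry does not contain that element
flag<-!where1%in%unique(where1[what1==names(top)[1]])
where1<-where1[flag]
what1<-what1[flag]
}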
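To make the long-format loop above easier to check by hand, here is the same greedy idea wrapped in a function and run on a tiny input. This is only a sketch: `greedy_cover` is a hypothetical helper name, and the toy vectors are invented for illustration.

```r
# Hypothetical helper: greedy selection on long-format (where, what) vectors.
greedy_cover <- function(where, what) {
  picked <- character(0)
  while (length(what) > 0) {
    # Most frequent remaining element (ties go to the first table entry)
    top <- names(which.max(table(what)))
    picked <- c(picked, top)
    # List entries that contain it
    hit <- unique(where[what == top])
    # Drop every row belonging to those entries
    keep <- !(where %in% hit)
    where <- where[keep]
    what  <- what[keep]
  }
  picked
}

# Toy data: entry 1 = {a, b}, entry 2 = {a}, entry 3 = {b, c}, entry 4 = {c}
where <- c(1, 1, 2, 3, 3, 4)
what  <- c("a", "b", "a", "b", "c", "c")
res <- greedy_cover(where, what)
res
```

On this input the first pass picks "a" (removing entries 1 and 2) and the second picks "c" (removing entries 3 and 4). Note that table() counts repeated values inside one entry more than once, which may matter for data sampled with replacement.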
keep.ele <- vector()
system.time({
while(length(dm)!=0){
dm.cha <- unlist(dm)
most.ele <- names(which.max(table(dm.cha)))[1]
keep.ele <- c(keep.ele, most.ele)
dm <- dm[-grep(most.ele, dm)]
}
})
dm <- dm[-grep(most.ele, dm)]
ele.idx <- lapply(dm, function(i) most.ele %in% i)
dm <- dm[!unlist(ele.idx)]
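The last two lines appear to be meant as a replacement for the grep() call. A small sketch of why that matters, using a made-up three-entry list: grep() coerces each list element to a string and matches substrings, while %in% tests exact membership. The names `dm_toy`, `by_grep` and `has` are hypothetical.

```r
# Toy list (hypothetical): only the second entry actually contains 2
dm_toy <- list(c(1L, 12L), c(2L, 3L), c(21L, 5L))
most.ele <- "2"

# grep() matches "2" as a substring of the coerced text, so 12 and 21 also hit
by_grep <- grep(most.ele, dm_toy)

# Exact membership test per entry
has <- vapply(dm_toy, function(x) most.ele %in% x, logical(1))

# Entries kept by each approach
length(dm_toy[-by_grep])
length(dm_toy[!has])

# A second pitfall: when nothing matches, x[-integer(0)] selects nothing at all
length(dm_toy[-grep("99", dm_toy)])
```

With the logical index `!has`, an element that matches nowhere leaves the list unchanged instead of emptying it.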