I want to extract the five Excel-style tables from this web page: http://www.p2peye.com/forum.php?mod=viewthread&tid=48727. The page is encoded in GBK.
My code is as follows:
library(XML)
web1<-htmlParse('http://www.p2peye.com/thread-48727-1-3.html')
web2<-readHTMLTable(web1)
web3<-lapply(web2,function(g) iconv(g,'GBK',"UTF-8"))
web3[[5]]
But the extracted data is not at all what I want:
> web3[[5]]
[1] "c(4, 12, 5, 1, 16, 18, 17, 14, 6, 1, 19, 2, 13, 7, 1, 9, 10, 8, 1, 15, 21, 11, 20, 1, 3)"
[2] "c(NA, NA, NA, 10, 3, 8, 9, NA, NA, 11, 2, NA, NA, NA, 11, 7, NA, NA, 1, 6, 4, NA, NA, 11, 5)"
[3] "c(NA, NA, NA, 10, 8, 4, 3, NA, NA, 11, 5, NA, NA, NA, 11, 9, NA, NA, 1, 6, 2, NA, NA, 11, 7)"
[4] "c(NA, NA, NA, 11, 9, 6, 5, NA, NA, 4, 3, NA, NA, NA, 4, 2, NA, NA, 1, 10, 8, NA, NA, 4, 7)"
[5] "c(NA, NA, NA, 5, 3, 2, 1, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 6, 4)"
What is going on here?
Also, here is my sessionInfo():
R version 3.0.2 (2013-09-25)
Platform: i386-w64-mingw32/i386 (32-bit)
locale:
[1] LC_COLLATE=Chinese (Simplified)_People's Republic of China.936
[2] LC_CTYPE=Chinese (Simplified)_People's Republic of China.936
[3] LC_MONETARY=Chinese (Simplified)_People's Republic of China.936
[4] LC_NUMERIC=C
[5] LC_TIME=Chinese (Simplified)_People's Republic of China.936
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] XML_3.98-1.1
loaded via a namespace (and not attached):
[1] tools_3.0.2
I'll be waiting online for an expert's help! Thank you.
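
One possible explanation for the output above, offered as a sketch rather than a confirmed diagnosis: readHTMLTable() returns a list of data frames, while iconv() expects a character vector. When iconv() is handed a whole data frame it coerces it with as.character(), which turns each factor column into a single deparsed string of integer codes, much like the "c(4, 12, 5, ...)" lines shown. The example below reproduces this without the network. The data frame `tbl` and the helper `convert_table` are hypothetical names introduced for illustration, and plain ASCII values stand in for the GBK text of the real page.

```r
# Hypothetical stand-in for one table returned by readHTMLTable().
# Columns are factors, as they would be when strings are read as factors.
tbl <- data.frame(
  name = c("b", "a", "c"),
  rate = c("12%", "8%", "12%"),
  stringsAsFactors = TRUE
)

# What the original lapply() call does to each table: iconv() receives a
# whole data frame, coerces it with as.character(), and each factor column
# collapses into one string built from its integer codes.
broken <- iconv(tbl, "GBK", "UTF-8")

# Assumed fix: re-encode column by column. as.character() turns the factor
# codes back into their labels before iconv() sees them.
convert_table <- function(df, from = "GBK", to = "UTF-8") {
  df[] <- lapply(df, function(col) iconv(as.character(col), from, to))
  df
}

fixed <- convert_table(tbl)
print(broken)
print(fixed)
```

If this is indeed the cause, replacing the original conversion step with `web3 <- lapply(web2, convert_table)` should keep the cell text intact. Any NULL entries that readHTMLTable() returns for tables it cannot parse would probably need to be dropped first.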