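The spam-classification code from Chapter 3 of this book fails when I run it. Could someone take a look and tell me what is going wrong?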
</p>
<p>> spam.path <- file.path("data", "spam")<br />
> get.msg <- function(path)<br />
+ {<br />
+ con <- file(path, open = "rt", encoding = "latin1")<br />
+ text <- readLines(con)<br />
+ # The message always begins after the first full line break<br />
+ msg <- text[seq(which(text == "")[1] + 1, length(text), 1)]<br />
+ close(con)<br />
+ return(paste(msg, collapse = "\n"))<br />
+ }<br />
> spam.docs <- dir(spam.path)<br />
> spam.docs <- spam.docs[which(spam.docs != "cmds")]<br />
> all.spam <- sapply(spam.docs,<br />
+ function(p) get.msg(file.path(spam.path, p)))<br />
Error in seq.default(which(text == "")[1] + 1, length(text), 1) :<br />
invalid '(to - from)/by' in seq(.)<br />
In addition: Warning messages:<br />
1: In readLines(con) :<br />
invalid input found on input connection 'data/spam/00006.5ab5620d3d7c6c0db76234556a16f6c1'<br />
2: In readLines(con) :<br />
invalid input found on input connection 'data/spam/00009.027bf6e0b0c4ab34db3ce0ea4bf2edab'<br />
3: In readLines(con) :<br />
invalid input found on input connection 'data/spam/00035.7ce3307b56dd90453027a6630179282e'<br />
4: In readLines(con) :<br />
incomplete final line found on 'data/spam/00035.7ce3307b56dd90453027a6630179282e'<br />
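The file path should not be the problem. The error says something is wrong inside <code>seq</code>, but I cannot see where the mistake is. Any pointers would be appreciated.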
</p>
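<p>One likely reading of the error: <code>which(text == "")</code> returns <code>integer(0)</code> when <code>text</code> contains no empty line, so <code>which(text == "")[1]</code> is <code>NA</code> and <code>seq()</code> is called with <code>NA + 1</code> as its start. The <code>readLines</code> warnings suggest that, for some files, reading stopped at input that could not be converted, which could leave <code>text</code> without the blank line separating headers from body. Below is a minimal defensive sketch, not the book's code; the name <code>get.msg.safe</code>, its <code>encoding</code> parameter, and returning an empty string when no body is found are assumptions for illustration.</p>

```r
# Hypothetical defensive variant of get.msg (name and behavior are assumptions).
# Returns the text after the first empty line, or "" when there is none.
get.msg.safe <- function(path, encoding = "latin1") {
  con <- file(path, open = "rt", encoding = encoding)
  on.exit(close(con))               # close the connection on every exit path
  text <- readLines(con)
  blank <- which(text == "")        # integer(0) if no empty line exists
  # Guard both failure cases: no empty line at all, or an empty last line
  # (which would make (blank[1] + 1):length(text) run backwards).
  if (length(blank) == 0 || blank[1] == length(text)) {
    return("")
  }
  msg <- text[(blank[1] + 1):length(text)]
  paste(msg, collapse = "\n")
}

# Usage: one well-formed file and one with no header/body separator.
ok <- tempfile()
writeLines(c("Subject: hi", "", "body line 1", "body line 2"), ok)
bad <- tempfile()
writeLines(c("Subject: hi", "no blank line here"), bad)
get.msg.safe(ok)    # "body line 1\nbody line 2"
get.msg.safe(bad)   # ""
```

<p>This only prevents the crash. Files flagged by the warnings may still be read incompletely, so checking whether a different <code>encoding</code> value reads those files fully may be worthwhile (an assumption, not verified here).</p>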