• R language
  • A function for downloading macroeconomic data from Sina Finance

mac <- function() {
  # Interactively pick an indicator from the Sina Finance macro-data page
  # and return the downloaded table as a data frame.
  library(XML)
  xpath <- "//div[@class='tree_wrap']//dl"
  doc <- htmlParse('http://finance.sina.com.cn/mac/', encoding = 'GBK')
  # Helper: text of every node matching an XPath expression
  # (the UTF-8 to UTF-8 iconv call is kept from the original)
  node_text <- function(path)
    sapply(getNodeSet(doc, path),
           function(z) iconv(xmlValue(z), from = 'UTF-8', to = 'UTF-8'))

  # Level 1: main categories (the <dt> elements)
  main <- node_text(paste0(xpath, '//dt'))
  cat(paste(seq_along(main), main, collapse = '\n'), '\n', sep = '')
  get1 <- as.integer(readline('which one: '))

  # Level 2: indicators (the <dd> elements) under the chosen category
  sub <- node_text(paste0(xpath, '[', get1, ']/dd'))
  cat(paste(seq_along(sub), sub, collapse = '\n'), '\n', sep = '')
  get2 <- as.integer(readline('which one: '))

  # Category code = the "param" attribute of the chosen <dt>;
  # event code = zero-based position of the chosen <dd>
  maincode <- as.character(getNodeSet(doc, paste0(xpath, '/dt/@param'))[[get1]])
  subcode <- get2 - 1

  # "n" (not "length") so that base::length is not masked; sent as num=
  n <- as.integer(readline('length: '))
  url <- paste0('http://money.finance.sina.com.cn/mac/view/vMacExcle.php?',
                'cate=', maincode, '&event=', subcode, '&from=0&num=',
                n, '&condition=')
  tp <- tempfile(fileext = '.xls')
  on.exit(unlink(tp))  # remove the temp file even if a later step fails
  download.file(url, tp, quiet = TRUE)
  # The file is read as delimited text; its first two lines are skipped
  read.csv(tp, header = TRUE, skip = 2, check.names = FALSE)
}
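
The prompts make the function awkward to call from a script. One way around that, as a rough sketch only, is to pull the URL-building step into its own helper so the codes can be passed in directly. The name `mac_url` and the category code `'price'` below are made up for illustration; the real codes come from the `param` attributes on the page, and only the URL pattern is taken from the function above.

```r
# Hypothetical helper: builds the download URL used in mac() from a
# category code, an event (indicator) code and the number sent as num=.
mac_url <- function(maincode, subcode, n) {
  paste0('http://money.finance.sina.com.cn/mac/view/vMacExcle.php?',
         'cate=', maincode, '&event=', subcode,
         '&from=0&num=', n, '&condition=')
}

# 'price' is an invented category code, used only to show the shape of the URL
url <- mac_url('price', 0, 10)
print(url)
```

With a helper like this, the download and `read.csv` steps could be run non-interactively once the codes are known.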

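Reply to bpdszxpddks (post #1): I would like to learn how to scrape data from web pages, but I don't understand what the page address and the function arguments mean, for example xpath="//div[@class='tree_wrap']//dl". Could you point me to a book to read, or suggest how to go about learning this? Thank you very much!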

Reply to jcl (post #3): http://www.w3school.com.cn/xpath/
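
To make the XPath pieces concrete, here is a small sketch that runs the same expressions against a made-up HTML fragment instead of the live page. The markup is an assumption inferred from the XPath in the function, not a copy of the real site. In the expression, `//div` matches a div anywhere in the document, `[@class='tree_wrap']` keeps only divs whose class attribute has that value, `//dl` matches any dl below it, `[2]` selects by position, `/dd` takes child elements, and `/@param` takes an attribute.

```r
library(XML)

# Invented stand-in for the menu markup (an assumption, for illustration only)
html <- "
<div class='tree_wrap'>
  <dl><dt param='a'>Group A</dt><dd>A1</dd><dd>A2</dd></dl>
  <dl><dt param='b'>Group B</dt><dd>B1</dd></dl>
</div>
<div class='other'><dl><dt param='x'>Ignored</dt></dl></div>"

doc <- htmlParse(html, asText = TRUE)
xpath <- "//div[@class='tree_wrap']//dl"

# Text of every <dt> under the matching div (the main categories)
groups <- xpathSApply(doc, paste0(xpath, '//dt'), xmlValue)

# Text of the <dd> children of the second <dl> (the sub-items of group 2)
items <- xpathSApply(doc, paste0(xpath, '[2]/dd'), xmlValue)

# Values of the "param" attribute on each <dt> (the category codes)
codes <- as.character(xpathSApply(doc, paste0(xpath, '/dt/@param')))

print(groups)
print(items)
print(codes)
```

The div with class `other` is skipped by the attribute predicate, which is why its `dt` never shows up in the results.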