[quote]
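Quoting post #8 by hexm26, posted 2007-07-30 11:21:
Haha, this is where SAS's strength can really show. If possible, the original poster should try SAS; it may be faster than you can imagine. [/quote][quote]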
Quoting post #10 by alexru, posted 2007-08-01 02:48:
At this point it may be worth trying different software.[/quote]
Let the facts speak for themselves. Here are test results from the same machine:
Data simulation:
cols=1e6;rows=12
dat=matrix(rnorm(rows*cols),rows,cols)
write.table(dat,"tmp.dat",sep="\t",row.names=FALSE)
R code and results for reading 'tmp.dat':
> system.time({
+ cols=1e6;rows=12
+ header.read=scan("tmp.dat","",nlines=1)
+ dat.read=scan("tmp.dat",skip=1)
+ dim(dat.read)=c(cols,rows)
+ rownames(dat.read)=header.read
+ dat.read=t.default(dat.read) ## this is actually unnecessary
+ })
Read 1000000 items
Read 12000000 items
   user  system elapsed
 38.010   2.030  40.042
>
> system.time({
+ cols=1e6;rows=12
+ header.read=scan("tmp.dat","",nlines=1)
+ dat.read=scan("tmp.dat",skip=1)
+ dim(dat.read)=c(cols,rows)
+ rownames(dat.read)=header.read
+ })
Read 1000000 items
Read 12000000 items
   user  system elapsed
  36.10    1.72   37.81
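For anyone who wants to check the scan() + dim() trick at a small scale before running the full benchmark, here is a minimal sketch. The sizes (3 rows, 5 columns), the tempfile() path and quiet=TRUE are my own assumptions for illustration and are not part of the timed test. scan() returns the values in file order (row by row), while R fills a matrix column by column, so setting dim to c(cols,rows) gives the transpose of the original data.

```r
# Scaled-down round trip; sizes and file path are illustrative assumptions
set.seed(1)
rows <- 3; cols <- 5
dat <- matrix(rnorm(rows * cols), rows, cols)
tmp <- tempfile()
write.table(dat, tmp, sep = "\t", row.names = FALSE)

# First line holds the column names; the rest are the numeric values
header.read <- scan(tmp, "", nlines = 1, quiet = TRUE)
dat.read <- scan(tmp, skip = 1, quiet = TRUE)

# Values arrive row by row, so this yields a cols x rows matrix (the transpose)
dim(dat.read) <- c(cols, rows)
rownames(dat.read) <- header.read

# Transposing back recovers the original matrix up to write.table's rounding
same <- isTRUE(all.equal(unname(t(dat.read)), dat))
print(same)
```

If the transposed layout is acceptable downstream, the t.default() step can be skipped, which is what the second timing does.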
SAS code: read.performance.sas
data d;
infile 'tmp.dat' firstobs=2 delimiter='09'x RECFM=V LRECL=20000000;
input V1-V1000000;
run;

*proc import file='tmp.dat' dbms=tab out=d1; ** this is unbearably slow, so it was not run;
*run;
Run SAS code:
% sas -work . -memsize 2g read.performance.sas
%
SAS log output:
NOTE: 12 records were read from the infile 'tmp.dat'.
      The minimum record length was 18180739.
      The maximum record length was 18182624.
NOTE: The data set WORK.D has 12 observations and 1000000 variables.
NOTE: DATA statement used (Total process time):
      real time           2:17.99
      cpu time            52.45 seconds
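So the test results suggest that R is still faster than SAS in this case: about 38 to 40 seconds elapsed in R versus 2:17.99 real time (52.45 seconds CPU time) in SAS!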