Thanks to all the experts for your generous help. Today, though, I stumbled on a function in base that does exactly this job.
findInterval:
Find the indices of x in vec, where vec must be sorted (non-decreasingly); i.e., if i <- findInterval(x,v), we have v[i[j]] ≤ x[j] < v[i[j] + 1] where v[0] := - Inf, v[N+1] := + Inf, and N <- length(vec).
Better still, the function accepts an x that is unsorted and an x that contains duplicates, and it is extremely fast.
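As a small illustration of those two properties, here is a sketch that reuses the toy vectors commented out in the benchmark code, with x deliberately shuffled. The names idx and res are my own, chosen only for this example.

```r
y <- c(1, 2, 5, 6, 7, 9)        # must be sorted in non-decreasing order
x <- c(4, 1, 8, 1, 6, 2, 4)     # unsorted, with duplicates

idx <- findInterval(x, y)       # index of the largest y[i] that is <= x[j]
res <- y[idx]                   # the corresponding y values

print(idx)                      # 2 1 5 1 4 2 2
print(res)                      # 2 1 7 1 6 2 2
```

The result comes back in the same order as x, so no sorting or rank bookkeeping is needed on the caller's side.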
<br />
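set.seed(65535)<br />
y <- 1:50000<br />
x <- sample(y,1000,replace=TRUE) + 0.5<br />
# Method 1: for each x, filter y down to the values <= x and take the last one<br />
t1 <- system.time(re <- sapply(x, function(x,y){y <- y[y <= x]; tail(y,1)}, y))<br />
# Method 3: findInterval gives the index of the largest y that is <= x<br />
t3 <- system.time(re3 <- y[findInterval(x,y)])</p>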
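<p>#y <- c(1,2,5,6,7,9)<br />
#x <- c(1,1,2,4,4,6,8)<br />
# Method 2: sort x, bin y by the unique x values, take the max of each bin<br />
pt <- proc.time()<br />
rk <- rank(x, ties.method="first")  # integer ranks, used to restore the original order<br />
x <- sort(x)<br />
uq <- unique(x)<br />
z <- split(y, cut(y, c(-Inf,uq), include.lowest=TRUE))<br />
re2 <- numeric(length(uq))<br />
for (i in seq_along(uq)) {<br />
# an empty bin means no new y since the previous x, so carry that answer forward<br />
re2[i] <- ifelse (length(z[[i]])== 0, re2[i-1], max(z[[i]]))<br />
}<br />
re2 <- rep(re2,rle(x)$lengths)  # expand back to one value per sorted x<br />
re2 <- re2[rk]<br />
t2 <- proc.time() - pt</p>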
<p>> t1<br />
user system elapsed<br />
2.11 0.00 2.11<br />
> t2<br />
user system elapsed<br />
0.14 0.01 0.30<br />
> t3<br />
user system elapsed<br />
0 0 0<br />
> all.equal(re,re2)<br />
[1] TRUE<br />
> all.equal(re,re3)<br />
[1] TRUE</p>
<p>
With a larger data set:
</p>
<p>set.seed(65535)<br />
y <- 1:500000<br />
x <- sample(y,1000,replace=TRUE) + 0.5<br />
> t1<br />
user system elapsed<br />
30.65 1.46 32.78<br />
> t2<br />
user system elapsed<br />
0.64 0.01 0.81<br />
> t3<br />
user system elapsed<br />
0.01 0.00 0.01<br />
> all.equal(re,re2)<br />
[1] TRUE<br />
> all.equal(re,re3)<br />
[1] TRUE<br />
</p>
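One caveat that the benchmark above does not exercise: every x there is at least min(y). When some x falls below y[1], findInterval returns 0 for it, and indexing with 0 silently drops that element, so the result ends up shorter than x. Below is a minimal sketch of one way to guard against this, assuming NA is the answer you want in that case. The helper name last_le is hypothetical, not part of base R.

```r
# Hypothetical helper: largest element of sorted y that is <= each x,
# or NA when no such element exists.
last_le <- function(x, y) {
  idx <- findInterval(x, y)
  idx[idx == 0L] <- NA_integer_   # x below y[1]: nothing to return
  y[idx]
}

y <- c(1, 2, 5, 6, 7, 9)
x <- c(0.5, 4, 9, 10)

naive <- y[findInterval(x, y)]    # index 0 is dropped: 2 9 9
safe  <- last_le(x, y)            # NA 2 9 9

print(length(naive))              # 3, one short of length(x)
print(safe)
```

Whether NA, -Inf, or an error is the right response for such an x depends on the use case; the point is only that the plain y[findInterval(x, y)] idiom will not flag it.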
And of course, thanks again to everyone for the enthusiastic discussion. I learned a great deal!