Barton
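Interesting question. I found this answer,
but it is only a near match: it extracts just the first sequence. So I modified it, and it now handles multiple runs of consecutive years.
The idea is similar to the answer above: diff() gives the difference between adjacent elements, a difference of 1 means the two elements are consecutive, and those runs are then extracted.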
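To see the idea on a small input first, here is a minimal sketch. The vector `x` is made up for illustration and is not part of the original data.

```r
# Toy input, invented for illustration only
x <- c(2001L, 2002L, 2003L, 2005L, 2008L, 2009L)

d <- diff(x)  # differences between adjacent elements: 1 1 2 3 1
r <- rle(d)   # run-length encoding of those differences

r$values      # 1 2 3 1
r$lengths     # 2 1 1 1
```

A run of k differences equal to 1 covers k + 1 consecutive years, so the first run here (length 2) corresponds to 2001, 2002, 2003, and the last run (length 1) to 2008, 2009.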
Code:
mydate = c(1950L, 1950L, 1951L, 1951L, 1951L, 1952L, 1953L, 1954L, 1954L,
1954L, 1954L, 1954L, 1955L, 1955L, 1955L, 1955L, 1955L, 1957L,
1958L, 1959L, 1960L, 1960L, 1960L, 1962L, 1964L, 1967L, 1967L,
1967L, 1968L, 1969L, 1969L, 1970L, 1971L, 1972L, 1972L, 1972L,
1972L, 1972L, 1973L, 1973L, 1974L, 1975L, 1976L, 1977L, 1978L,
1979L, 1980L, 1981L, 1981L, 1982L, 1984L, 1984L, 1985L, 1986L,
1987L, 1987L, 1987L, 1988L, 1988L, 1989L, 1990L, 1990L, 1991L,
1992L, 1992L, 1992L, 1992L, 1992L, 1993L, 1994L, 1994L, 1995L,
1995L, 1995L, 1996L, 1996L, 1996L, 1997L, 1997L, 1998L, 1999L,
1999L, 1999L, 1999L, 2000L, 2000L, 2000L, 2000L, 2000L)
get_start_ed = function(vdate){
  vdateuni = unique(sort(vdate))  # drop duplicate years and sort
  mydiff = diff(vdateuni)         # difference between adjacent elements
  myrle = rle(mydiff)             # run-length encoding of the differences
  # end index of each run in vdateuni;
  # + 1 because a run of k differences spans k + 1 elements
  ed = cumsum(myrle$lengths) + 1
  # start index of each run: 1, then the end index of the previous run
  # (seq_along avoids the 1:0 trap when there are no runs)
  start = c(1, ed)[seq_along(ed)]
  # keep only the runs whose difference is 1, then extract them
  df = data.frame(s = start, e = ed, v = myrle$values, l = myrle$lengths)
  dfseqs = df[which(df$v == 1), ]
  # seq_len returns an empty index when no run is found, so lapply yields list()
  sequences = lapply(seq_len(nrow(dfseqs)), function(i){
    vdateuni[dfseqs$s[i]:dfseqs$e[i]]
  })
  return(sequences)
}
unique(sort(mydate))
#> [1] 1950 1951 1952 1953 1954 1955 1957 1958 1959 1960 1962 1964 1967 1968
#> [15] 1969 1970 1971 1972 1973 1974 1975 1976 1977 1978 1979 1980 1981 1982
#> [29] 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997
#> [43] 1998 1999 2000
# results
get_start_ed(mydate)
#> [[1]]
#> [1] 1950 1951 1952 1953 1954 1955
#>
#> [[2]]
#> [1] 1957 1958 1959 1960
#>
#> [[3]]
#> [1] 1967 1968 1969 1970 1971 1972 1973 1974 1975 1976 1977 1978 1979 1980
#> [15] 1981 1982
#>
#> [[4]]
#> [1] 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997
#> [15] 1998 1999 2000
<sup>Created on 2019-01-03 by the reprex package (v0.2.1)</sup>
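For comparison, the same grouping can be sketched more compactly with `split()` and `cumsum()`. This is a different technique from the `rle()` approach above, not a rewrite of it; the function name `split_runs` and the `min_len` parameter are hypothetical names chosen for this sketch.

```r
# Alternative sketch: start a new group wherever the gap to the
# previous year is not exactly 1, then split on the group id.
split_runs <- function(years, min_len = 2L) {
  y <- sort(unique(years))
  if (length(y) == 0L) return(list())
  # TRUE marks the first element of each run; cumsum turns marks into ids
  grp <- cumsum(c(TRUE, diff(y) != 1))
  runs <- unname(split(y, grp))
  # drop isolated years, matching the behaviour of get_start_ed()
  runs[lengths(runs) >= min_len]
}

# Toy input, invented for illustration only
split_runs(c(1950L, 1950L, 1951L, 1953L, 1955L, 1956L, 1957L))
# returns list(1950:1951, 1955:1957)
```

With `min_len = 1L`, isolated years such as 1953 in the toy input would be kept as length-one runs.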