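In a low-dimensional setting where the number of observations far exceeds the number of dimensions, so that the training set is very large, I am wondering whether I could draw repeated subsamples (each with roughly 75% of the observations), train on each one separately, and then average the predictions to approximate the accuracy of training on the full training set in one go. The reason is that I really don't have enough memory to train on the full set at once, and there is no room left to optimize either the implementation of the method or the memory size.
Apart from this, does anyone have other good approaches to recommend?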
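To make the idea concrete, here is a minimal sketch of averaging predictions over random subsamples. It assumes an ordinary least-squares model and synthetic in-memory data purely for illustration (the thread does not say which method is being trained); the helper names `fit_ols`, `predict_ols` and `subsample_average_predict` are hypothetical.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares fit with an intercept; returns [intercept, slopes...]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_ols(coef, X):
    """Predictions from a coefficient vector produced by fit_ols."""
    return coef[0] + X @ coef[1:]

def subsample_average_predict(X, y, X_new, n_rounds=10, frac=0.75, seed=0):
    """Fit on n_rounds random subsamples and average the predictions."""
    rng = np.random.default_rng(seed)
    n = len(X)
    m = int(frac * n)
    total = np.zeros(len(X_new))
    for _ in range(n_rounds):
        # Draw the subsample without replacement.
        idx = rng.choice(n, size=m, replace=False)
        total += predict_ols(fit_ols(X[idx], y[idx]), X_new)
    return total / n_rounds

# Synthetic data (an assumption, only to make the sketch runnable).
rng = np.random.default_rng(42)
X = rng.normal(size=(20000, 3))
y = 1.0 + X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.5, size=20000)
X_new = rng.normal(size=(5, 3))

avg_pred = subsample_average_predict(X, y, X_new)
full_pred = predict_ols(fit_ols(X, y), X_new)
max_gap = float(np.max(np.abs(avg_pred - full_pred)))
print(max_gap)
```

In a real out-of-memory setting only the sampled rows would be read from disk in each round. Note that a 75% subsample still needs about 75% of the full memory, so a smaller `frac` may be necessary.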
Another option is to try a series of subsample sizes, then extrapolate to the total sample size.
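A rough sketch of what this could look like, under several assumptions: the quantity being extrapolated is the hold-out mean squared error of a least-squares fit, the curve has the form err(m) = a + b/m in the subsample size m, and the data are synthetic. The names `holdout_mse` and `extrapolate_error` are made up for illustration, and the choice of curve is exactly the kind of modelling decision that has to be made case by case.

```python
import numpy as np

def holdout_mse(X, y, X_val, y_val, idx):
    """Fit least squares on rows idx and return the MSE on the validation set."""
    A = np.column_stack([np.ones(len(idx)), X[idx]])
    coef, *_ = np.linalg.lstsq(A, y[idx], rcond=None)
    pred = coef[0] + X_val @ coef[1:]
    return float(np.mean((y_val - pred) ** 2))

def extrapolate_error(X, y, X_val, y_val, sizes, n_total, n_reps=20, seed=0):
    """Average the error at each subsample size, fit a + b/m, evaluate at n_total."""
    rng = np.random.default_rng(seed)
    mean_err = []
    for m in sizes:
        errs = [
            holdout_mse(X, y, X_val, y_val,
                        rng.choice(len(X), size=m, replace=False))
            for _ in range(n_reps)
        ]
        mean_err.append(float(np.mean(errs)))
    # Assumed extrapolation curve: err(m) = a + b / m.
    D = np.column_stack([np.ones(len(sizes)),
                         1.0 / np.asarray(sizes, dtype=float)])
    (a, b), *_ = np.linalg.lstsq(D, np.asarray(mean_err), rcond=None)
    return float(a + b / n_total), mean_err

# Synthetic data (an assumption): noise variance 1, so the error floor is about 1.
rng = np.random.default_rng(1)
N, p = 50000, 5
X = rng.normal(size=(N, p))
y = X @ np.arange(1.0, p + 1.0) + rng.normal(size=N)
X_val = rng.normal(size=(5000, p))
y_val = X_val @ np.arange(1.0, p + 1.0) + rng.normal(size=5000)

est, mean_err = extrapolate_error(X, y, X_val, y_val,
                                  sizes=[50, 100, 200, 400], n_total=N)
print(est, mean_err)
```

Averaging several replicates per size is there to tame the variability of the extrapolated value; with only one replicate per size the fitted curve can swing quite a lot.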
Reply to #2, Dexim Corp deleveled proC mixeD: Hmm, that sounds reasonable... Is there any specific literature or research I could refer to?
It's generally a regression task.
Reply to #3, nan.xiao: I don't know of any papers that discuss this specifically. It is somewhat similar to SIMEX (simulation-extrapolation) in measurement error models. I did something similar a few years ago, but haven't published anything on the topic... The difficulty lies in choosing the regression function used for the extrapolation, and in the large variability of the resulting extrapolated estimates. Still, I believe it deserves a try.
Reply to #4, micro@: Thanks for the hint. I will give it a shot.