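I ran a small experiment, and constant (single-valued) variables do make a difference. The effect seems to depend on how the classifier works, not on whether the constant value is nonzero. My earlier idea was wrong.
For the SVM, the result is the same whether the constant variables are zero or nonzero, as long as they are left in (accuracy 0.828 in both cases). Removing them lowers the accuracy (0.820). So for classifiers built on perceptron-like ideas, constant variables deserve attention whatever their value.
For the tree, the result is exactly the same (accuracy 0.602) whether the constant is zero or nonzero, and whether or not it is removed.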
<br />
# mlbench supplies the simulated data; caret runs the cross-validated training<br />
library(mlbench)<br />
library(caret)</p>
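<p>set.seed(1001)<br />
# Simulate 100 observations with 10 features (threenorm benchmark)<br />
p = mlbench.threenorm(100, 10)<br />
x = p$x<br />
# Append 5 constant columns: all zeros (x0) or all eights (x8)<br />
x0 = cbind(x, matrix(0, 100, 5))<br />
x8 = cbind(x, matrix(8, 100, 5))<br />
y = p$classes<br />
levels(y) = c('x1', 'x2')</p>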
<p># 5-fold cross-validation, repeated 5 times<br />
fitControl = trainControl(method = "repeatedcv",<br />
number = 5,<br />
repeats = 5)</p>
<p>set.seed(1001)<br />
# SVM (RBF kernel), no constant columns<br />
svm_na = train(x, y, metric = 'Accuracy', trControl = fitControl,<br />
method = 'svmRadial', scaled = FALSE)</p>
<p># Accuracy: 0.820</p>
<p>set.seed(1001)<br />
# SVM (RBF kernel), with 5 all-zero columns<br />
svm_0 = train(x0, y, metric = 'Accuracy', trControl = fitControl,<br />
method = 'svmRadial', scaled = FALSE)</p>
<p># Accuracy: 0.828</p>
<p>set.seed(1001)<br />
# SVM (RBF kernel), with 5 all-eight columns<br />
svm_8 = train(x8, y, metric = 'Accuracy', trControl = fitControl,<br />
method = 'svmRadial', scaled = FALSE)</p>
<p># Accuracy: 0.828</p>
<p>set.seed(1001)<br />
# Decision tree (rpart), no constant columns<br />
tree_na = train(x, y, metric = 'Accuracy', trControl = fitControl,<br />
method = 'rpart')</p>
<p># Accuracy: 0.602</p>
<p>set.seed(1001)<br />
# Decision tree (rpart), with 5 all-zero columns<br />
tree_0 = train(x0, y, metric = 'Accuracy', trControl = fitControl,<br />
method = 'rpart')</p>
<p># Accuracy: 0.602</p>
<p>set.seed(1001)<br />
# Decision tree (rpart), with 5 all-eight columns<br />
tree_8 = train(x8, y, metric = 'Accuracy', trControl = fitControl,<br />
method = 'rpart')</p>
<p># Accuracy: 0.602<br />
</p>
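In the experiment above, "removing" the constant variables simply means training on `x` instead of `x0` or `x8`. On real data you would have to find those columns first. The following is a minimal base-R sketch of one way to do that, not part of the original experiment: `is_constant`, `keep` and `x_clean` are hypothetical names introduced here, and a random matrix stands in for `p$x` so the block runs without mlbench.

```r
# Hypothetical helper (not from the original post):
# TRUE when a column holds only one distinct value.
is_constant <- function(col) length(unique(col)) == 1

set.seed(1001)
x  <- matrix(rnorm(100 * 10), nrow = 100, ncol = 10)  # stand-in for p$x (assumption)
x8 <- cbind(x, matrix(8, 100, 5))                     # 5 constant columns, as in the experiment

# Logical mask of the columns to keep. Logical indexing is used on purpose:
# negative indexing with an empty index vector would drop every column.
keep    <- !apply(x8, 2, is_constant)
x_clean <- x8[, keep, drop = FALSE]

print(which(!keep))   # positions of the constant columns: 11 12 13 14 15
print(dim(x_clean))   # 100 10
```

caret also ships `nearZeroVar()` for this kind of screening; the base-R version is used here only to keep the sketch free of dependencies.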