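Most people hanging around this forum are probably R enthusiasts. When it comes to R's strengths, being extremely flexible has to be on the list. Unfortunately, for an interpreted language, the more flexible the syntax, the easier it is to make mistakes, and some of those mistakes can leave you wanting to cry. To keep more people from falling into the same traps, let's share the places in R where it is easy to go wrong. I'll start the ball rolling: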
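1. The most classic mistakes probably involve argument passing, thanks to the lovable yet hateful "...". For example:
a. seq(1,10,step=2), expecting it to produce 1, 3, ..., 9. VB has poisoned me. The argument is actually called by, so it should be seq(1,10,by=2); the unknown step is swallowed by "..." and the result is simply 1 to 10.
b. cat(data,fileName). This one really burned me once: I spent a whole evening running a simulation in R and wanted to write the resulting data to the file fileName. In a moment of carelessness I caused this disaster: the second unnamed argument also goes into "...", so cat just prints the data and the file name to the console and nothing is written to the file. It has to be cat(data,file=fileName).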
c. The more subtle problem of global versus local variables:
A really tricky problem concerns global and local variables. You can read any global variable inside a user-defined function, which is true at least in R 2.13 and earlier versions. For example, if you have defined a global variable a, you can use a directly in user-defined functions. However, you cannot use =, <- or -> to modify a global variable. Whenever you use one of these three operators inside a function, R either creates a new local variable (if no local variable with that name exists yet) or updates the existing local variable. So if a global variable a exists and you run a=2 (or a<-2 or 2->a) inside a function, R does not update the global a; it creates a new local variable a instead.
The same applies to element assignment. If a global variable a exists but there is no local a, then a[1]=1 (or an equivalent form) inside the function does not touch the global a: R copies the global value into a new local variable a and changes the first element of that copy. No error is raised, so the global a silently stays unchanged. (You only get an "object 'a' not found" error when no variable a exists at all.) If you really want to update (or create) a global variable from inside a user-defined function, use the operator <<- or ->>. For example, to update the global variable a you can write a<<-2 (or 2->>a). Strictly speaking, <<- looks through the enclosing environments and assigns where it first finds the variable, creating it in the global environment if it is not found anywhere.
As you can see, global variables can easily mess up the results of your own functions. It gets even trickier and harder to debug if you define a local variable in one branch of the code (e.g. one branch of an if()...else... statement), forget to define it in the other branches, and a global variable with the same name happens to exist. My suggestion is to first remove all global variables from the R workspace, if there are any (for example with rm(list=ls())), and then run your program. That way you avoid the confusion caused by global variables.
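d. Mistakes caused by operator precedence: the : operator binds tighter than +, -, *, / and so on, while [] binds tighter than :. I once wrote a function that needed a vector from 1 to n-1 and used 1:n-1 for it. Ouch: that is (1:n)-1, i.e. 0 to n-1; what I needed was 1:(n-1).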
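e. Mixing up functions. For example, wanting the length of a string and using length without thinking: it returns the number of elements in the vector (1 for a single string), whereas the number of characters comes from nchar. If this happens inside a function you are writing, it will end in disaster eight or nine times out of ten.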
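f. A phantom error: I seem to remember that when using median to compute a median, I once got two medians back (that is, with an even number of values it returned the two middle values), but I have never been able to reproduce it since. Maybe it only ever happened in my imagination? Has anyone else run into this?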
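g. Using && or || for logical operations on vectors. They only work on single values (older versions of R silently use just the first element, and R 4.3.0 and later raise an error for longer vectors); element-wise operations need & and |.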
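h. Using apply on the rows of a data frame, which causes the data types to be converted: apply first coerces the data frame to a matrix, so with mixed column types everything typically ends up as character.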
...
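For anyone who wants to try these at the console, here is a rough sketch in R of pitfalls a to e, g and h from the list above (f is left out since it could not be reproduced). The names sim_data, out_file, modify_local and modify_global are made up for illustration, and the comments describe what I would expect on a reasonably recent R, so treat it as a sketch rather than a definitive reference.

```r
# a. seq(): the step argument is called `by`; an unknown `step` lands in `...`
wrong_seq <- suppressWarnings(seq(1, 10, step = 2))  # step is ignored: 1, 2, ..., 10
right_seq <- seq(1, 10, by = 2)                      # 1 3 5 7 9

# b. cat(): unnamed arguments are all printed; the target must be given as file =
out_file <- tempfile(fileext = ".txt")  # hypothetical output path
sim_data <- c(1.5, 2.5, 3.5)            # stand-in for simulation results
cat(sim_data, "\n", file = out_file)    # writes the numbers to the file
# cat(sim_data, out_file) would only print the numbers and the path to the console
written <- scan(out_file, quiet = TRUE) # read the file back to check it

# c. scoping: plain assignment inside a function never changes the outer variable
a <- c(10, 20, 30)
modify_local <- function() {
  a[1] <- 1        # copies the outer `a` into a new local `a`, then edits the copy
  a
}
local_result  <- modify_local()  # 1 20 30
a_after_local <- a               # still 10 20 30

modify_global <- function() {
  a[1] <<- 1       # <<- assigns to the `a` found in the enclosing environment
  invisible(NULL)
}
modify_global()
a_after_global <- a              # now 1 20 30

# d. precedence: `:` binds tighter than binary minus
n <- 5
wrong_range <- 1:n-1    # parsed as (1:n) - 1, i.e. 0 1 2 3 4
right_range <- 1:(n-1)  # 1 2 3 4

# e. length() counts vector elements, nchar() counts characters
s <- "hello"
n_elements <- length(s)  # 1
n_chars    <- nchar(s)   # 5

# g. use & and | for element-wise logic; && and || are for single values only
x <- c(TRUE, FALSE, TRUE)
y <- c(TRUE, TRUE, FALSE)
elementwise <- x & y     # TRUE FALSE FALSE

# h. apply() coerces a data frame to a matrix, so mixed columns become character
df <- data.frame(id = 1:2, name = c("u", "v"), stringsAsFactors = FALSE)
col_types <- sapply(df, class)                        # "integer" "character"
row_types <- apply(df, 1, function(row) class(row))   # "character" "character"
```

For row-wise work on a data frame with mixed column types, looping over seq_len(nrow(df)) and indexing df[i, ] keeps each column's type, at the cost of being slower than apply.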
Over to everyone else to keep the list going.