请教一个很有用的问题

huadeng

一般来说察看函数的原代码用edit命令就可，但下面的情况怎办：

>edit(normalTest)

function(x, ...)

UseMethod("normalTest")

> dumpMethod("normalTest")

Method specification written to file "normalTest.default.q"

而"normalTest.default.q"显示为：

setDefaultMethod("normalTest",

function(x, ...)

UseMethod("normalTest")

)

其实还是没看见其定义的方法，我也不知这方法对不，希望各位高手给与帮助。谢谢

yihui

源代码恐怕不是给Windows用户看的，如果你用Linux，应该就会很容易看到

在那些zip包中，并不方便看源代码，你下载相应的*.tar.gz包之后就会看到在每个包的R文件夹下有一些文件名为*.r的文件，那些就是真正的源代码了。

比如，进入http://cran.cnr.berkeley.edu/src/contrib/PACKAGES.html中的某一个包，会看到

Package source: *.tar.gz

fu_neng

.cor()等内置函数代码怎样查看到? 比如查它所用的公式?

yihui

上面不是说了么，下载源码包才能看，Windows binary一般是看不到的。

yihui

cor()函数不是在contrib pkg中，你得下载base source pkg

huadeng

谢老师,我问的是在splus的确情况下如何查看

huadeng

谢老师,我问的是在splus的确情况下如何查看

januslian

我记得有个函数getS3method可以查看成员函数的具体定义。但是不知道在Splus是不是这样的

sociology

[quote]引用第0楼huadeng于2006-11-18 09:55发表的“请教一个很有用的问题”:

一般来说察看函数的原代码用edit命令就可，但下面的情况怎办：

>edit(normalTest)

function(x, ...)

UseMethod("normalTest")

> dumpMethod("normalTest")

.......[/quote]

察看源代码的方法取决于具体的环境，以下是一个例子：

t.test

methods("t.test")

getS3method("t.test","default")

在我blog上面有一些总结http://sociology.yculblog.com/post.882537.html

另外，最详细的文章请见Rnew 2006，vol 4的R Help Desk

areg

你好，在线啊？

Statistics with R

Vincent Zoonekynd

<zoonek@math.jussieu.fr>

28th August 2005

我刚才去看的，把R程序包（UNIX）的解压了，没有找到正态性检验函数，我把查到的相关信息和参考示例给发上来：

第449/1085页

7.2.3 Gaussian distribution

This is the famous “bell-shaped” distribution.

More precisely, the central limit theorem states that if X1, X2, ... X3 are independant

identically distributed random varaibles with expectation m avd variance sˆ2, then

(X1+X2+...+Xn) - nm

---------------------

sqrt(n) s

converges in law to a gaussian distribution when n tends to infinity.

in other words, the empirical means

X1+...Xn

--------

n

is “close to” a gaussian distribution of expectation m and standard deviation s/sqrt(n).

This explains the omnipresence of the gaussian law: when you repeat an experiment a large

number of times, the average result (almost) follows a gaussian distribution.

limite.centrale <- function (r=runif, m=.5, s=1/sqrt(12), n=c(1,3,10,30), for (i in n) { x <- matrix(r(i*N),nc=i)

x <- ( apply(x, 1, sum) - i*m )/(sqrt(i)*s)

hist(x, col=’light blue’, probability=T, main=paste("n =",i),

ylim=c(0,max(.4, density(x)$y)))

lines(density(x), col=’red’, lwd=3)

curve(dnorm(x), col=’blue’, lwd=3, lty=3, add=T)

if( N>100 ) { rug(sample(x,100))

} else { rug(x)

}

}

}

op <- par(mfrow=c(2,2))

limite.centrale()

par(op)

op <- par(mfrow=c(2,2))

limite.centrale(rexp, m=1, s=1)

par(op)

op <- par(mfrow=c(2,2))

limite.centrale(rexp, m=1, s=1)

par(op)

op <- par(mfrow=c(2,2))

limite.centrale(function (n) { rnorm(n, sample(c(-3,3),n,replace=m=0, s=sqrt(10), n=c(1,2,3,10))

par(op)

Exercise: dra a plot, similar to the one above, but with the theoretical probability densities.

For the formula-savvy reader, the gaussian probability density function is

f(x) = exp( -x^2/2 ) / sqrt( 2 pi )

In R:

curve(dnorm(x), xlim=c(-3,3), col=’red’, lwd=3)

title(main=’Gaussian Probability Distribution Function’)

curve(pnorm(x), xlim=c(-3,3), col=’red’, lwd=3)

title(main=’Cumulative gaussian distribution function’)

curve(qnorm(x), xlim=c(0,1), col=’red’, lwd=3)

title(main=’Gaussian quantiles function’)

n <- 1000

x <- rnorm(n)

hist(x, probability=T, col=’light blue’, main=’Gaussian Distribution’)

lines(density(x), col=’red’, lwd=3)

curve(dnorm(x), add=T, col=’red’, lty=2, lwd=3)

legend(par(’usr’)[2], par(’usr’)[4], xjust=1,

c(’sample density’, ’theoretical density’),

lwd=2, lty=c(1,2),

col=’red’)

In the discussion above, we had assumed the mean was mu=0 and the standard deviation

sigma=1 (the “standard gaussian” distribution): to get the general case, we just apply an

affine transformation:

f(x) = exp( -( (x-mu) / sigma )^2 /2 ) / sqrt( 2 pi sigma )

The gaussian distribution is sometimes called the “normal” distribution – I shall try to avoid

this word, because in some situations, the distribution we would like to observe (the one we

would like to call “normal”) is not the gaussian one.

########################################################################

上面是statistics for R中的材料

下面是从source得到的资料（UNIX）安装包解压得的

############# t检验代码###########

t.test <- function(x, ...) UseMethod("t.test")

t.test.default <-

function(x, y = NULL, alternative = c("two.sided", "less", "greater"),

mu = 0, paired = FALSE, var.equal = FALSE, conf.level = 0.95,

...)

{

alternative <- match.arg(alternative)

if(!missing(mu) && (length(mu) != 1 || is.na(mu)))

stop("'mu' must be a single number")

if(!missing(conf.level) &&

(length(conf.level) != 1 || !is.finite(conf.level) ||

conf.level < 0 || conf.level > 1))

stop("'conf.level' must be a single number between 0 and 1")

if( !is.null(y) ) {

dname <- paste(deparse(substitute(x)),"and",

deparse(substitute(y)))

if(paired)

xok <- yok <- complete.cases(x,y)

else {

yok <- !is.na(y)

xok <- !is.na(x)

}

y <- y[yok]

}

else {

dname <- deparse(substitute(x))

if( paired ) stop("'y' is missing for paired test")

xok <- !is.na(x)

yok <- NULL

}

x <- x[xok]

if( paired ) {

x <- x-y

y <- NULL

}

nx <- length(x)

if(nx < 2) stop("not enough 'x' observations")

mx <- mean(x)

vx <- var(x)

estimate <- mx

if(is.null(y)) {

df <- nx-1

stderr <- sqrt(vx/nx)

if(stderr < 10 *.Machine$double.eps * abs(mx))

stop("data are essentially constant")

tstat <- (mx-mu)/stderr

method <- ifelse(paired,"Paired t-test","One Sample t-test")

names(estimate) <- ifelse(paired,"mean of the differences","mean of x")

} else {

ny <- length(y)

if(ny < 2) stop("not enough 'y' observations")

my <- mean(y)

vy <- var(y)

method <- paste(if(!var.equal)"Welch", "Two Sample t-test")

estimate <- c(mx,my)

names(estimate) <- c("mean of x","mean of y")

if(var.equal) {

df <- nx+ny-2

v <- ((nx-1)*vx + (ny-1)*vy)/df

stderr <- sqrt(v*(1/nx+1/ny))

} else {

stderrx <- sqrt(vx/nx)

stderry <- sqrt(vy/ny)

stderr <- sqrt(stderrx^2 + stderry^2)

df <- stderr^4/(stderrx^4/(nx-1) + stderry^4/(ny-1))

}

if(stderr < 10 *.Machine$double.eps * max(abs(mx), abs(my)))

stop("data are essentially constant")

tstat <- (mx - my - mu)/stderr

}

if (alternative == "less") {

pval <- pt(tstat, df)

cint <- c(-Inf, tstat + qt(conf.level, df) )

}

else if (alternative == "greater") {

pval <- pt(tstat, df, lower = FALSE)

cint <- c(tstat - qt(conf.level, df), Inf)

}

else {

pval <- 2 * pt(-abs(tstat), df)

alpha <- 1 - conf.level

cint <- qt(1 - alpha/2, df)

cint <- tstat + c(-cint, cint)

}

cint <- mu + cint * stderr

names(tstat) <- "t"

names(df) <- "df"

names(mu) <- if(paired || !is.null(y)) "difference in means" else "mean"

attr(cint,"conf.level") <- conf.level

rval <- list(statistic = tstat, parameter = df, p.value = pval,

conf.int=cint, estimate=estimate, null.value = mu,

alternative=alternative,

method=method, data.name=dname)

class(rval) <- "htest"

return(rval)

}

t.test.formula <-

function(formula, data, subset, na.action, ...)

{

if(missing(formula)

|| (length(formula) != 3)

|| (length(attr(terms(formula[-2]), "term.labels")) != 1))

stop("'formula' missing or incorrect")

m <- match.call(expand.dots = FALSE)

if(is.matrix(eval(m$data, parent.frame())))

m$data <- as.data.frame(data)

m[[1]] <- as.name("model.frame")

m$... <- NULL

mf <- eval(m, parent.frame())

DNAME <- paste(names(mf), collapse = " by ")

names(mf) <- NULL

response <- attr(attr(mf, "terms"), "response")

g <- factor(mf[[-response]])

if(nlevels(g) != 2)

stop("grouping factor must have exactly 2 levels")

DATA <- split(mf[[response]], g)

names(DATA) <- c("x", "y")

y <- do.call("t.test", c(DATA, list(...)))

y$data.name <- DNAME

if(length(y$estimate) == 2)

names(y$estimate) <- paste("mean in group", levels(g))

y

}

areg

[quote]引用第8楼sociology于2006-11-20 22:51发表的“”:

察看源代码的方法取决于具体的环境，以下是一个例子：

t.test

methods("t.test")

.......[/quote]

sociology

你好啊，刚才我也是去查看了半天，觉得t test比较典型，就举它为例，结果一提交，发现你先举了啊

areg

是不是把正态性检验放在密度函数中了？请看下面：来自source(UNIX)解压包

###############################

density <- function(x, ...) UseMethod("density")

density.default <-

function(x, bw = "nrd0", adjust = 1,

kernel = c("gaussian", "epanechnikov", "rectangular",

"triangular", "biweight", "cosine", "optcosine"),

weights = NULL, window = kernel, width,

give.Rkern = FALSE,

n = 512, from, to, cut = 3, na.rm = FALSE, ...)

{

if(length(list(...)) > 0)

warning("non-matched further arguments are disregarded")

if(!missing(window) && missing(kernel))

kernel <- window

kernel <- match.arg(kernel)

if(give.Rkern)

##-- sigma(K) * R(K), the scale invariant canonical bandwidth:

return(switch(kernel,

gaussian = 1/(2*sqrt(pi)),

rectangular = sqrt(3)/6,

triangular = sqrt(6)/9,

epanechnikov= 3/(5*sqrt(5)),

biweight = 5*sqrt(7)/49,

cosine = 3/4*sqrt(1/3 - 2/pi^2),

optcosine = sqrt(1-8/pi^2)*pi^2/16

))

if (!is.numeric(x))

stop("argument 'x' must be numeric")

name <- deparse(substitute(x))

x <- as.vector(x)

x.na <- is.na(x)

if (any(x.na)) {

if (na.rm) x <- x[!x.na]

else stop("'x' contains missing values")

}

N <- nx <- length(x)

x.finite <- is.finite(x)

if(any(!x.finite)) {

x <- x[x.finite]

nx <- length(x) # == sum(x.finite)

}

## Handle 'weights'

if(is.null(weights)) {

weights <- rep.int(1/nx, nx)

totMass <- nx/N

}

else {

if(length(weights) != N)

stop("'x' and 'weights' have unequal length")

if(!all(is.finite(weights)))

stop("'weights' must all be finite")

if(any(weights < 0))

stop("'weights' must not be negative")

wsum <- sum(weights)

if(any(!x.finite)) {

weights <- weights[x.finite]

totMass <- sum(weights) / wsum

} else totMass <- 1

## No error, since user may have wanted "sub-density"

if (!isTRUE(all.equal(1, wsum)))

warning("sum(weights) != 1 -- will not get true density")

}

n.user <- n

n <- max(n, 512)

if (n > 512) n <- 2^ceiling(log2(n)) #- to be fast with FFT

if (missing(bw) && !missing(width)) {

if(is.numeric(width)) {

## S has width equal to the length of the support of the kernel

## except for the gaussian where it is 4 * sd.

## R has bw a multiple of the sd.

fac <- switch(kernel,

gaussian = 4,

rectangular = 2*sqrt(3),

triangular = 2 * sqrt(6),

epanechnikov = 2 * sqrt(5),

biweight = 2 * sqrt(7),

cosine = 2/sqrt(1/3 - 2/pi^2),

optcosine = 2/sqrt(1-8/pi^2)

)

bw <- width / fac

}

if(is.character(width)) bw <- width

}

if (is.character(bw)) {

if(nx < 2)

stop("need at least 2 points to select a bandwidth automatically")

bw <- switch(tolower(bw),

nrd0 = bw.nrd0(x),

nrd = bw.nrd(x),

ucv = bw.ucv(x),

bcv = bw.bcv(x),

sj = , "sj-ste" = bw.SJ(x, method="ste"),

"sj-dpi" = bw.SJ(x, method="dpi"),

stop("unknown bandwidth rule"))

}

if (!is.finite(bw)) stop("non-finite 'bw'")

bw <- adjust * bw

if (bw <= 0) stop("'bw' is not positive.")

if (missing(from))

from <- min(x) - cut * bw

if (missing(to))

to <- max(x) + cut * bw

if (!is.finite(from)) stop("non-finite 'from'")

if (!is.finite(to)) stop("non-finite 'to'")

lo <- from - 4 * bw

up <- to + 4 * bw

y <- .C("massdist",

x = as.double(x),

xmass = as.double(weights),

nx = nx,

xlo = as.double(lo),

xhi = as.double(up),

y = double(2 * n),

ny = as.integer(n),

PACKAGE = "base" )$y * totMass

kords <- seq(0, 2*(up-lo), length = 2 * n)

kords[(n + 2):(2 * n)] <- -kords[n:2]

kords <- switch(kernel,

gaussian = dnorm(kords, sd = bw),

## In the following, a := bw / sigma(K0), where

## K0() is the unscaled kernel below

rectangular = {

a <- bw*sqrt(3)

ifelse(abs(kords) < a, .5/a, 0) },

triangular = {

a <- bw*sqrt(6) ; ax <- abs(kords)

ifelse(ax < a, (1 - ax/a)/a, 0) },

epanechnikov = {

a <- bw*sqrt(5) ; ax <- abs(kords)

ifelse(ax < a, 3/4*(1 - (ax/a)^2)/a, 0) },

biweight = { ## aka quartic

a <- bw*sqrt(7) ; ax <- abs(kords)

ifelse(ax < a, 15/16*(1 - (ax/a)^2)^2/a, 0) },

cosine = {

a <- bw/sqrt(1/3 - 2/pi^2)

ifelse(abs(kords) < a, (1+cos(pi*kords/a))/(2*a),0)},

optcosine = {

a <- bw/sqrt(1-8/pi^2)

ifelse(abs(kords) < a, pi/4*cos(pi*kords/(2*a))/a, 0)}

)

kords <- fft( fft(y)* Conj(fft(kords)), inv=TRUE)

kords <- Re(kords)[1:n]/length(y)

xords <- seq(lo, up, length = n)

# keep <- (xords >= from) & (xords <= to)

x <- seq(from, to, length = n.user)

structure(list(x = x, y = approx(xords, kords, x)$y, bw = bw, n = N,

call=match.call(), data.name=name, has.na = FALSE),

class="density")

}

plot.density <- function(x, main=NULL, xlab=NULL, ylab="Density", type="l",

zero.line = TRUE, ...)

{

if(is.null(xlab))

xlab <- paste("N =", x$n, " Bandwidth =", formatC(x$bw))

if(is.null(main)) main <- deparse(x$call)

plot.default(x, main=main, xlab=xlab, ylab=ylab, type=type, ...)

if(zero.line) abline(h=0, lwd=0.1, col = "gray")

invisible(NULL)

}

print.density <- function(x, digits=NULL, ...)

{

cat("\nCall:\n\t",deparse(x$call),

"\n\nData: ",x$data.name," (",x$n," obs.);",

"\tBandwidth 'bw' = ",formatC(x$bw,digits=digits), "\n\n",sep="")

print(summary(as.data.frame(x[c("x","y")])), digits=digits, ...)

invisible(x)

}

yihui

[quote]引用第6楼huadeng于2006-11-20 21:38发表的“”:

谢老师,我问的是在splus的确情况下如何查看[/quote]

很久没用S-Plus了，也不打算用了……你看Object Explorer能不能看到吧。S-Plus可没说“Open Source”哦！

fu_neng

[quote]引用第12楼谢益辉于2006-11-21 12:40发表的“”:

很久没用S-Plus了，也不打算用了……你看Object Explorer能不能看到吧。S-Plus可没说“Open Source”哦！[/quote]

是啊, R比它强大多了,并且是在s基础上产生的精华, 何必还用s

sociology

[quote]引用第13楼fu_neng于2006-11-21 17:49发表的“”:

是啊, R比它强大多了,并且是在s基础上产生的精华, 何必还用s[/quote]

各有各的优势吧。比如处理大数据，splus比R好。

但是，在作图方面，R应该比Splus好，特别是在grid完善后：）

fu_neng

stats包里的cor()函数, 用下载linux源码的方法看cor.r文件,里面仍是调用

.Internal(cor(x2, y2, na.method, method == "kendall"))

用methods(cor),getS3method("cor","deafault")

等方法说找不到此cor()函数, 看来是禁止我们看它的源代码了.

yihui

肯定不是这样，既然R是Open Source的，肯定不会禁止查看源代码。我也不知道你怎么对这个问题这么感兴趣：）

cor的源文件恐怕在stats的包中不能直接找到，因为它涉及到一些内置函数，都是用C语言编写的，所以你要追根溯源的话，得去看\src\main下面的那些*.c文件，因为我也没有专门去找过cor的源代码，所以不确定是不是在cov.c那个文件中（按理说是）。

areg

S-PLUS用起来很方便，R很强大，很灵活，都是S语言为基础的，总想试试揭开它的机制，说白了，想自己试着破解一个S-PLUS的新版本长期用，毕竟试用期会结束的！

januslian

[quote]引用第15楼fu_neng于2006-11-21 18:15发表的“”:

stats包里的cor()函数, 用下载linux源码的方法看cor.r文件,里面仍是调用

.Internal(cor(x2, y2, na.method, method == "kendall"))

用methods(cor),getS3method("cor","deafault")

等方法说找不到此cor()函数, 看来是禁止我们看它的源代码了.[/quote]

R Internal 文档中有关于.Internal和.Primitive的论述。

不妨用SubVersion，checkout一个分支看看里边的源代码。

BTW：破解S+估计用不到这么多东西吧，破解GAUSS或者SAS的人不见得就懂它的内部机制啊。

fu_neng

用cor(y,y') 求到的与模型返回的R^2的平方根的值不一样，想看cor（）用到的公式究竟和决定系数有什么不同