The confusing "Set Seed for Random Number Generators" in S-PLUS

areg

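I am not a math major. In S-PLUS 7.0 I read every help file related to "Seed" and still do not understand what a seed is. Below I have pasted the passages that cover it most directly; could someone explain this basic concept?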

Random Number Seed

Specify an integer between 0 and 1000 to set the random number seed to a desired value. Specifying the seed allows a way to obtain identical results from multiple bootstrap runs.

Set Seed with

Enter a random number seed used in the random generation algorithm. When this field has a value, rerunning the dialog in the same state reproduces the same data.

Set Seed for Random Number Generators

DESCRIPTION:

Puts the random number generator in a reproducible state

USAGE:

set.seed(i, congruential, tausworthe)

set.seed()

OPTIONAL ARGUMENTS:

i

an integer in the interval [0,1023].

congruential

an integer. The number of steps to iterate the congruential part of the random number generator from its default starting point. This iterating is done by a call to a power function, so it is quick for any value of the argument.

tausworthe

an integer. The number of steps to iterate the Tausworthe part of the random number generator. Each iteration involves two shifts and xor operations per cycle so can take a while for very large numbers of iterations (there is no analogue to a power function for iterating this).

If i is an integer between 0 and 1023 then set.seed(i) is equivalent to set.seed(cong=i*2^20,taus=0). Each time a random uniform is generated both the congruential and Tausworthe parts of the seed are updated, so if you use identical values of congruential and tausworthe you will be sampling in the same stream of random numbers as you would when both are 0.
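The remark that the congruential part is iterated "by a call to a power function" can be sketched in Python (not S-PLUS). The sketch assumes a purely multiplicative congruential generator modulo 2^32 with multiplier 69069, the value usually associated with Marsaglia's Super-Duper; the help text does not state either constant, and the starting value 12345 is arbitrary.

```python
M = 2**32   # modulus (assumption, not stated in the help text)
A = 69069   # multiplier (assumption, not stated in the help text)

def step(x):
    """Advance the congruential generator by one step."""
    return (A * x) % M

def slow_jump(x, n):
    """Advance n steps one at a time: cost grows with n."""
    for _ in range(n):
        x = step(x)
    return x

def jump(x, n):
    """Advance n steps at once, using x_n = A**n * x_0 (mod M)."""
    return (pow(A, n, M) * x) % M

x0 = 12345
n = 3 * 2**10   # kept small so the step-by-step loop finishes quickly
same = jump(x0, n) == slow_jump(x0, n)

# set.seed(i) is described as congruential = i * 2**20 steps
steps_for_i = [i * 2**20 for i in (0, 1, 1023)]
print(same, steps_for_i)
```

Modular exponentiation makes the jump cheap even for the largest offset, 1023 * 2^20 steps, which is why the help text calls it quick for any value of the argument.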

areg

Set Seed for Random Number Generators

DETAILS:

If no arguments are given then congruential and tausworthe are set to values based on the number of milli- or microseconds since 1970 or the last reboot, depending on the machine. This can be used to set a fairly random start for the generator when you start a new chapter. Otherwise a new chapter starts with the .Random.seed stored in the data library, which is a constant.
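A rough Python analogue of the no-argument call is to derive the seed from the clock. The helper name `time_based_seed` is hypothetical, and the way S-PLUS actually turns the clock reading into its two seed components is not shown in the help text.

```python
import random
import time

def time_based_seed():
    """Return the number of microseconds since 1970 as an integer."""
    return time.time_ns() // 1000

seed_a = time_based_seed()
time.sleep(0.01)            # a later start gives a different clock reading
seed_b = time_based_seed()

# Generators seeded from different clock readings start in different places.
rng_a = random.Random(seed_a)
rng_b = random.Random(seed_b)
print(seed_a, seed_b, rng_a.random(), rng_b.random())
```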

Random number generators in S-PLUS are all based on a single uniform random number generator that generates numbers in a very long, but ultimately periodic sequence. The generator implemented in S-PLUS is adapted from George Marsaglia's original "Super-Duper" package from 1973. It produces a 32-bit integer whose top 31 bits are divided by 2^31 = 2,147,483,648. The result is a real number in the half-open interval [0,1). The 32-bit integer is computed by a bitwise exclusive-or of two additional 32-bit integers: one produced by a congruential generator, and one produced by a Tausworthe generator. For most starting seeds, the congruential and Tausworthe generators combine to give a period of 2^30 * 4,292,868,097, which is approximately 4.6 * 10^18. The S-PLUS generator skips cases in which the result is exactly 0, producing random numbers in the open interval (0,1); this reduces the period by a small amount.
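The structure described in this paragraph can be sketched in Python. The multiplier 69069 and the shift amounts 15 and 17 come from common descriptions of Super-Duper and are assumptions here; the S-PLUS implementation may differ, so the output will not match S-PLUS. The class name `CombinedGenerator` and its default seeds are made up for illustration.

```python
MASK32 = 0xFFFFFFFF

class CombinedGenerator:
    """Congruential generator XOR shift-register (Tausworthe) generator."""

    def __init__(self, congruential=1, tausworthe=0x12345678):
        # The multiplicative congruential state must be odd, and the
        # shift-register state must be nonzero, or the streams degenerate.
        self.c = (congruential & MASK32) | 1
        self.t = (tausworthe & MASK32) or 0x12345678

    def next_uniform(self):
        while True:
            # Congruential part: multiply modulo 2**32 (assumed multiplier).
            self.c = (69069 * self.c) & MASK32
            # Tausworthe part: two shifts, each followed by an xor.
            t = self.t
            t ^= t >> 15
            t ^= (t << 17) & MASK32
            self.t = t
            # Combine by bitwise exclusive-or, keep the top 31 bits.
            top31 = (self.c ^ self.t) >> 1
            if top31 != 0:              # skip exact zeros: result in (0,1)
                return top31 / 2**31

# Period quoted in the help text, checked by plain arithmetic.
PERIOD = 2**30 * 4292868097

g1 = CombinedGenerator(congruential=7, tausworthe=99)
g2 = CombinedGenerator(congruential=7, tausworthe=99)
draws1 = [g1.next_uniform() for _ in range(1000)]
draws2 = [g2.next_uniform() for _ in range(1000)]
print(draws1[:3], draws1 == draws2, float(PERIOD))
```

Two generators started from the same pair of seeds produce the same stream, which is the reproducibility the help file refers to.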

The object .Random.seed stores the current position in the random number generator's sequence. The first time random numbers are computed in an S-PLUS session, .Random.seed is modified and copied to the local working database. In general, .Random.seed is updated with the current congruential and Tausworthe values whenever S-PLUS computes a random sample. This mechanism maintains the long-term properties of the generator, and also allows for reproducibility of results.

The function set.seed defines .Random.seed so that subsequent calls to random number generator functions (runif, rnorm, etc.) will generate numbers from a new portion of the overall cycle.
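The role of .Random.seed as a stored position in the sequence has a close analogue in Python's `random` module, where `getstate()` and `setstate()` save and restore the generator's position. This is only an analogy: Python uses a different generator, and the variable names below are illustrative.

```python
import random

rng = random.Random(3)

first = [rng.random() for _ in range(3)]    # moves the position forward
saved = rng.getstate()                      # snapshot of the current position
second = [rng.random() for _ in range(3)]   # moves it forward again

rng.setstate(saved)                         # go back to the snapshot
replay = [rng.random() for _ in range(3)]   # repeats the draws in `second`

print(second == replay, first == second)
```

Restoring the saved state replays exactly the draws that followed it, while draws from a different position in the stream differ.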

areg

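I finally get it. The help file goes on at such length that it confused me.

Put simply: if you do not set a specific number as the seed, the random numbers you generate from a given distribution are different every time. If you set the seed to some value, say 3, then running that distribution function repeatedly gives the same result every time.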

> study<-menuRdist(n=5, distribution="normal", seed=3, print.object.p=T, mean=0, sd=1)
      sample
1 -0.2931156
2 -0.2614306
3 -1.3840270
4 -1.6942764
5  1.0330899

> study<-menuRdist(n=5, distribution="normal", seed=3, print.object.p=T, mean=0, sd=1)
      sample
1 -0.2931156
2 -0.2614306
3 -1.3840270
4 -1.6942764
5  1.0330899

> study<-menuRdist(n=5, distribution="normal", print.object.p=T, mean=0, sd=1)
      sample
1 -0.2024932
2  0.4263001
3  1.2855138
4  0.1744993
5 -0.4636422

> study<-menuRdist(n=5, distribution="normal", print.object.p=T, mean=0, sd=1)
       sample
1 -0.04388846
2  0.30065397
3  0.52000641
4 -0.15322038
5 -0.04014543

> study<-menuRdist(n=5, distribution="normal", print.object.p=T, mean=0, sd=1)
      sample
1  1.5845531
2  0.9458098
3 -0.4685173
4  0.8232337
5  0.5010749
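The same experiment can be repeated in Python as a cross-check of the idea. This is not S-PLUS code, and the numbers will not match the session above because the generator is different; `draw_normal` is a hypothetical helper that loosely mirrors the `n`, `seed`, `mean` and `sd` arguments of the menuRdist calls.

```python
import random

def draw_normal(n, seed=None, mean=0.0, sd=1.0):
    """Draw n normal variates; seed=None lets Python pick a fresh seed."""
    rng = random.Random(seed)
    return [rng.gauss(mean, sd) for _ in range(n)]

run1 = draw_normal(5, seed=3)   # seed fixed at 3
run2 = draw_normal(5, seed=3)   # same seed again
run3 = draw_normal(5)           # no seed given
run4 = draw_normal(5)           # no seed given

print(run1 == run2)   # identical draws for identical seeds
print(run3 == run4)   # unseeded runs are expected to differ
```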

areg

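A while ago I saw a post on this forum saying that the random numbers generated by SAS, or some other program (sorry, I do not remember which), were the same every time, while those generated by the R statistical software were different every time. It looked as if the first program was not "random".

Today I finally understand: that poster's results were not non-random. A seed had been set, or a default seed was used. If no seed is set, the results differ from run to run; if a seed is set, the "random numbers" based on that seed are the same every time.

See the results of my experiment in the post above.

Experts, please do not laugh. I hope beginners no longer have to worry about "Seed".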

areg

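"Plant beans and you get beans; plant melons and you get melons."

When generating random numbers from the same distribution function, different "varieties" of seed (that is, different seed values) naturally give different random numbers. If the seed equals one specific value, the random numbers generated are the same every time, because they are produced by a fixed algorithmic rule (see the help file).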

yihui

What I mentioned last time was Stata :)

Thanks, areg.

areg

[quote]Quoting post #5 by 谢益辉, posted 2006-11-11 15:47:

What I mentioned last time was Stata :)

Thanks, areg. [/quote]

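I just woke up from a nap and suddenly remembered that someone, I think on the 宏软件 forum, once pointed out that machine-generated "random numbers" are not truly random but "pseudo-random numbers". I did not think it through at the time, so I never understood what that meant.

In fact, machine-generated random numbers are "pseudo-random" because they are produced by a deterministic algorithm. That algorithm is the one described in the help file quoted above. I still do not really understand how the computation works, but the principle by which a computer program generates random numbers is now clear to me.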
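To make "produced by a deterministic algorithm" concrete, here is a minimal Python sketch of a linear congruential generator. It is not the S-PLUS algorithm: the class name `TinyLCG` is made up, and the constants 1664525 and 1013904223 are a widely published textbook parameter set chosen only for illustration.

```python
class TinyLCG:
    """A minimal linear congruential generator: x <- (a*x + c) mod m."""

    def __init__(self, seed):
        self.state = seed % 2**32

    def next_uniform(self):
        # The next state depends only on the current state.
        self.state = (1664525 * self.state + 1013904223) % 2**32
        return self.state / 2**32    # scale to [0, 1)

a = TinyLCG(3)
b = TinyLCG(3)
c = TinyLCG(4)
seq_a = [a.next_uniform() for _ in range(5)]
seq_b = [b.next_uniform() for _ in range(5)]
seq_c = [c.next_uniform() for _ in range(5)]

print(seq_a == seq_b, seq_a == seq_c)
```

Because each value is computed from the previous state, the seed fixes the whole sequence: equal seeds give equal sequences, and a different seed gives a different one.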

eshanzi

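Many Taiwanese textbooks simply say that "pseudo-random numbers" are generated. I happened to run into this issue the last time I tried Stata. Thanks to areg for the thorough answer, and I admire your persistence in digging into it.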

rtist

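I think I posted about this on some forum before. The random numbers in common use are generally pseudo-random. Numbers obtained from physical devices can usually be regarded as truly random. A third kind, used in a minority of cases, is quasi-random numbers: they are usually less random than pseudo-random numbers, but are sometimes used in their place to gain other properties. All pseudo-random numbers are periodic, that is, the sequence eventually repeats. Common algorithms have very long periods, so in practice this is usually ignored.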
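Both points can be illustrated with a short Python sketch. The first function measures the period of a deliberately tiny congruential generator (parameters 5, 3 and 16 chosen by hand for the example). The second generates a van der Corput sequence, one standard example of a quasi-random sequence; the posts above do not name any particular one, so that choice is an assumption.

```python
def lcg_period(a, c, m, seed):
    """Count steps until x <- (a*x + c) mod m returns to the seed.

    Assumes 0 <= seed < m and a coprime to m, so the state does come back.
    """
    x = (a * seed + c) % m
    steps = 1
    while x != seed:
        x = (a * x + c) % m
        steps += 1
    return steps

def van_der_corput(n, base=2):
    """n-th term of the van der Corput sequence: digits of n mirrored
    around the radix point."""
    q, denom = 0.0, 1.0
    while n:
        n, digit = divmod(n, base)
        denom *= base
        q += digit / denom
    return q

period = lcg_period(a=5, c=3, m=16, seed=0)
quasi = [van_der_corput(n) for n in range(1, 5)]
print(period, quasi)
```

The tiny generator repeats after 16 steps, whereas practical generators have periods so long that the repetition is never reached. The quasi-random terms fill the interval evenly rather than imitating randomness, which is the "other property" they trade randomness for.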