A study thread: beginners are welcome to drop in and see how I understand a few topics.

superdesolator

92. PKD: Documentation

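I won't go over why documentation matters; I'll only introduce R's documentation framework. Help pages are stored in .Rd files, which use a custom syntax, loosely based on LaTeX, and are rendered to HTML for viewing.

Rather than writing .Rd files by hand, we let roxygen2 generate them. Its advantages:

1. Code and documentation are interleaved, so when you change a function it is easy to update its documentation at the same time.

2. roxygen2 has features that save us from repetitive work.

3. It is easy to learn.

Let's first walk through the whole workflow and then go deeper step by step:

1. Add roxygen comments to the .R files.

2. Run devtools::document() (or press Cmd + Shift + D in RStudio) to convert the roxygen comments into .Rd files.

3. Preview the documentation with ?.

4. Adjust and repeat.

A quick demonstration with the author's example:

Step 1: write the R file and add roxygen comments (lines starting with #'):

#' Add together two numbers.
#'
#' @param x A number.
#' @param y A number.
#' @return The sum of \code{x} and \code{y}.
#' @examples
#' add(1, 1)
#' add(10, 1)
add <- function(x, y) {
  x + y
}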

Step 2: run devtools::document() (or press Cmd + Shift + D in RStudio) to generate the corresponding .Rd file:

% Generated by roxygen2 (4.0.0): do not edit by hand
\name{add}
\alias{add}
\title{Add together two numbers}
\usage{
add(x, y)
}
\arguments{
  \item{x}{A number}

  \item{y}{A number}
}
\value{
The sum of \code{x} and \code{y}
}
\description{
Add together two numbers
}
\examples{
add(1, 1)
add(10, 1)
}

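Step 3: first press Ctrl+Shift+L (load all) or Ctrl+Shift+B (build and reload). Then, when you run ?add, help("add") or example("add"), R looks for the .Rd file containing \alias{add} and displays it.

You have already seen what the rendered help HTML looks like from ?sum, ?mean and so on, so I'll skip it here to save space.

Step 4: go back to step 1 and revise.

One caveat: after Ctrl+Shift+L and step 2, ? and help() work because devtools overrides those functions, but example("add") does not run. After a build (Ctrl+Shift+B), all three work.

OK, now that we are familiar with the basic workflow, let's look at how to write roxygen comments.

The roxygen comments that precede a function form a block, and a block is broken up into tags of the form @tagName details.

The content of each tag runs until the next tag starts. The text before the first tag (called the introduction) is parsed specially. The first sentence is the title; it is the description shown in the package index at help(package = newpkg).

The second paragraph is the description.

The third and subsequent paragraphs are the details.

Every function must have the first two; the third is optional. A simple example:

#' Sum of vector elements.
#'
#' \code{sum} returns the sum of all the values present in its arguments.
#'
#' This is a generic function: methods can be defined for it directly or via the
#' \code{\link{Summary}} group generic. For this to work properly, the arguments
#' \code{...} should be unnamed, and dispatch is on the first argument.
sum <- function(..., na.rm = TRUE) {}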
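To make the title / description / details rule concrete, here is a minimal base R sketch that splits the introduction of a roxygen block by blank lines. The helper name split_intro is hypothetical and this is not how roxygen2 itself parses blocks; it only mimics the rule described above (for instance, it treats the whole first paragraph as the title).

```r
# Hypothetical helper: split the introduction of a roxygen block into
# title, description and details, using blank "#'" lines as separators.
split_intro <- function(block) {
  text <- sub("^#' ?", "", block)                 # strip the roxygen prefix
  first_tag <- grep("^@", text)                   # the introduction ends at the first tag
  if (length(first_tag) > 0) {
    text <- text[seq_len(first_tag[1] - 1)]
  }
  keep <- text != ""
  para_id <- cumsum(!keep)                        # paragraph number of each line
  paras <- tapply(text[keep], para_id[keep], paste, collapse = " ")
  paras <- as.character(unname(paras))
  list(
    title = paras[1],
    description = paras[2],
    details = paras[-(1:2)]
  )
}

block <- c(
  "#' Sum of vector elements.",
  "#'",
  "#' \\code{sum} returns the sum of all the values present in its arguments.",
  "#'",
  "#' This is a generic function.",
  "#' @param ... Numeric vectors."
)
intro <- split_intro(block)
intro$title
intro$details
```

Here the first paragraph becomes the title, the second the description, and everything after that (up to the first @tag) the details.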

This block matches the structure just described. However, when I looked through pipeR's documentation I found the following, which I quote here:

#' Pipe an object forward
#'
#' The \code{\%>>\%} operator pipes the object on the left-hand side to the
#' right-hand side as either the first argument and \code{.}, or a symbol
#' defined by lambda expression.
#'
#' @param x object
#' @param expr expression
#' @details
#' \code{\%>>\%} supports the following pipeline mechanisms:
#'
#' 1. Pipe to first argument:
#'
#' \code{x \%>>\% f} as \code{f(x)}
#'
#' \code{x \%>>\% f(...)} as \code{f(x,...)}
#'

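It marks the details explicitly with @details, which seems to give more control over where the details appear than the paragraph-based method described above.

Note that \code{} and \link{} are formatting commands, covered later. Also worth knowing: Ctrl+Shift+/ (or Code | Reflow Comment) rewraps a block so that its lines stay within 80 characters.

We can use @section to break a long introduction into clearly titled sections. Quoting the author:

[quote]Section titles should be in sentence case and must be followed by a colon; they can only be one line long.[/quote]

For example:

#' @section Warning:
#' Do not operate heavy machinery within 8 hours of using this function.

Other commonly used sections are Note and References (see ?lapply for an example).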

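Next, two tags for navigating between help pages:

@seealso points to other resources: on the web, \url{http://www.r-project.org}; in your package, \code{\link{functioname}}; or in another package, \code{\link[packagename]{functioname}}.

@family links a function to the other functions of the same family (I haven't fully figured this one out yet).

Then come three tags that make documentation easier for users to find. Since I don't fully understand them yet, I'll quote the author: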

[quote]Three other tags make it easier for the user to find documentation:

@aliases space separated aliases adds additional aliases to the topic. An alias is another name for the topic that can be used with ?.

@concept adds extra keywords that will be found with help.search().

@keywords keyword1 keyword2 ... adds standardised keywords. Keywords are optional, but if present, must be taken from a predefined list found in file.path(R.home("doc"), "KEYWORDS").

Generally, keywords are not that useful except for @keywords internal. Using the internal keyword removes the function from the package index and disables some of its automated tests. It’s common to use @keywords internal for functions that are of interest to other developers extending your package, but not most users.[/quote]

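Note that @aliases works after Build & Reload plus Document, but the extra aliases do not show up with ? under load_all() (my guess is that devtools' ? does not fully reimplement ?, so some features don't work). As for the other two, I don't know how to use them; I couldn't even get them to appear in the output of Ctrl+Shift+D. One likely cause: the roxygen2 tag is spelled @concept, in the singular.

Other kinds of objects have their own tags. Below we cover the tags for documenting functions and for R's three OO systems.

Documenting functions is mostly simple: @param, @examples and @return, as our first add example showed. To document several parameters together, write them as @param x,y Numeric vectors. Finally, and most importantly, @examples supports three formatting commands: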

\dontrun{},\dontshow{},\donttest{}

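Example code without any of these commands is both run and shown (I don't yet know whether it is also tested, so I won't say). "Run" means executed by example(), "show" means displayed in the ? help page, and "test" means executed when R CMD check tests the examples. In short: \dontrun{} code is shown but never run, \dontshow{} code is run but not shown, and \donttest{} code is run by example() but by default skipped when the examples are tested.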
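As an illustration of where the three commands go, here is a small sketch of a documented function. The function head_lines and its arguments are made up for this example; only the placement of \dontshow{}, \dontrun{} and \donttest{} inside @examples is the point.

```r
#' Read the first lines of a text file.
#'
#' @param path Path to a text file.
#' @param n Maximum number of lines to read.
#' @return A character vector with at most \code{n} lines.
#' @examples
#' tmp <- tempfile()
#' writeLines(c("a", "b", "c"), tmp)
#' head_lines(tmp, 2)
#' \dontshow{
#' # run, but hidden from the rendered help page
#' unlink(tmp)
#' }
#' \dontrun{
#' # shown in the help page, never run
#' head_lines("/path/that/does/not/exist")
#' }
#' \donttest{
#' # run by example(), skipped by default when examples are tested
#' head_lines(file.path(R.home("doc"), "COPYING"), 1)
#' }
head_lines <- function(path, n = 6) {
  readLines(path, n = n)
}

# Quick check of the function itself (hypothetical usage):
demo_file <- tempfile()
writeLines(c("a", "b", "c"), demo_file)
first_two <- head_lines(demo_file, 2)
unlink(demo_file)
```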

Also note the @example path/relative/to/package/root form:

#' @example
#' test/test.r
#' @example
#' test/test2.r

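Here test is a folder under newpkg; just put the example code in it. Those files may also use the three formatting commands above.

The author's advice: for ordinary tags, indent the second and subsequent lines; tags that usually span several lines, such as @example, should start on a new line and need no indent.

Now let's see how to document the three OO systems:

1. S3

S3 generics are documented like any other function. There is no formal rule for methods. Older versions of roxygen required an explicit @method generic class; roxygen2 now works this out automatically, except in ambiguous cases (e.g. a.b.c).

2. S4

I don't need it yet, but briefly:

First, put a block before the class and use @slot to describe its slots, just as @param describes function parameters.

#' An S4 class to represent a bank account.
#'
#' @slot balance A length-one numeric vector
Account <- setClass("Account",
  slots = list(balance = "numeric")
)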

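Second, S4 generics are documented like ordinary functions, but every S4 method must be documented. Since we usually don't want a separate page for each method, we can put the method documentation in one of three places: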

[quote]In the class. Most appropriate if the corresponding generic uses single dispatch and you created the class.

In the generic. Most appropriate if the generic uses multiple dispatch and you have written both the generic and the method.

In its own file. Most appropriate if the method is complex, or if you’ve written the method but not the class or generic.[/quote]

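Also pay attention to the load order of S4 code, which is controlled with @include:

#' @include class-a.R
setClass("B", contains = "A")

This says that class-a.R must be loaded before the current file. Otherwise R loads files in alphabetical order, which can cause errors.

Similarly:

#' @include foo.R bar.R baz.R
NULL

setMethod("foo", c("bar", "baz"), ...)

That is, we can list the files to load first at the top of the file, attached to a NULL object.

roxygen2 writes the @include information into the Collate field of DESCRIPTION. We can also avoid @include altogether, for example by defining all classes and methods in aaa-classes.R and aaa-generics.R, so that alphabetical order loads them first. The drawback is that the definitions can no longer be spread flexibly across multiple files.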
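To see what "load every file after the files it @include-s" means in practice, here is a minimal base R sketch that computes such an order. The helper collate_order and the file names are hypothetical, it does no cycle detection, and roxygen2's real Collate logic may differ; it only illustrates the idea.

```r
# Hypothetical helper: given a named list mapping each file to the files it
# @include-s, return an order in which every file follows its dependencies.
# Files are otherwise visited alphabetically.
collate_order <- function(includes) {
  ordered <- character()
  visit <- function(file) {
    if (file %in% ordered) return(invisible(NULL))
    for (dep in includes[[file]]) visit(dep)      # dependencies first
    ordered <<- c(ordered, file)
  }
  for (file in sort(names(includes))) visit(file)
  ordered
}

includes <- list(
  "class-a.R"     = character(),
  "class-b.R"     = "class-a.R",      # class B extends class A
  "aaa-methods.R" = "class-b.R"       # methods need class B, despite the name
)
collate_order(includes)
```

Alphabetically aaa-methods.R would come first, but because it includes class-b.R (which includes class-a.R), it ends up last.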

Older versions required explicit @usage, @alias and @docType; these are now handled automatically.

3. RC

Because RC methods are defined inside the class, RC has a special docstring syntax: each method is documented with a docstring inside the class definition. We only need to describe the fields with @field, and roxygen2 takes care of the methods:

#' A Reference Class to represent a bank account.
#'
#' @field balance A length-one numeric vector.
Account <- setRefClass("Account",
  fields = list(balance = "numeric"),
  methods = list(
    withdraw = function(x) {
      "Withdraw money from account. Allows overdrafts"
      balance <<- balance - x
    }
  )
)

The rendered result looks like this:

Account-class {newpkg}	R Documentation

A Reference Class to represent a bank account.

Description

A Reference Class to represent a bank account.

Fields

balance
A length-one numeric vector.

Methods

withdraw(x)
Withdraw money from account. Allows overdrafts
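For readers who have not used RC classes yet, here is a short usage sketch of the same Account class. It only uses the methods package that ships with R; the variable name acct and the amounts are made up for the example.

```r
library(methods)

# Same class as in the post: the string at the top of withdraw() is the docstring.
Account <- setRefClass("Account",
  fields = list(balance = "numeric"),
  methods = list(
    withdraw = function(x) {
      "Withdraw money from account. Allows overdrafts"
      balance <<- balance - x
    }
  )
)

acct <- Account$new(balance = 100)
acct$withdraw(30)     # modifies the object in place
acct$balance
```

The docstring has no effect on what withdraw() does; it is only picked up as documentation. ?setRefClass also describes a $help() method on the generator for viewing such docstrings.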

OK, a basic understanding of the above is enough for now, especially for S4 and RC.

Finally, let's look at the two ways roxygen2 saves us from repeating ourselves.

The first is @inheritParams:

#' @param a This is the first argument
foo <- function(a) a + 10

#' @param b This is the second argument
#' @inheritParams foo
bar <- function(a, b) {
  foo(a) * 10
}

which is equivalent to

#' @param a This is the first argument
#' @param b This is the second argument
bar <- function(a, b) {
  foo(a) * 10
}

The second is documenting several functions in the same file, with @rdname or @describeIn.

First, @describeIn.

The code:

#' Foo bar generic
#'
#' @param x Object to foo.
foobar <- function(x) UseMethod("foobar")

#' @describeIn foobar Difference between the mean and the median
foobar.numeric <- function(x) abs(mean(x) - median(x))

#' @describeIn foobar First and last values pasted together in a string.
foobar.character <- function(x) paste0(x[1], "-", x[length(x)])

The result:

foobar {newpkg}	R Documentation

Foo bar generic

Description

Foo bar generic

Usage

foobar(x)

## S3 method for class 'numeric'
foobar(x)

## S3 method for class 'character'
foobar(x)

Arguments

x
Object to foo.

Methods (by class)

numeric: Difference between the mean and the median

character: First and last values pasted together in a string.

As we can see, the Usage section lists the generic and all of its methods, and the per-method descriptions appear under Methods (by class).
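To check that the methods behave as the help page describes, the generic can be exercised directly in base R. This sketch assumes the generic dispatches on its own name, i.e. calls UseMethod("foobar"); the input vectors are made up for the example.

```r
# S3 generic: UseMethod() must be given the name of the generic itself.
foobar <- function(x) UseMethod("foobar")

# Method for numeric vectors: distance between mean and median.
foobar.numeric <- function(x) abs(mean(x) - median(x))

# Method for character vectors: first and last element joined by "-".
foobar.character <- function(x) paste0(x[1], "-", x[length(x)])

num_result <- foobar(c(1, 2, 10))        # dispatches to foobar.numeric
chr_result <- foobar(c("a", "b", "c"))   # dispatches to foobar.character
chr_result
```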

In fact:

[quote]

@describeIn is designed for the most common cases:

Documenting methods in a generic.

Documenting methods in a class.

Documenting functions with the same (or similar arguments).

It generates a new section, named either “Methods (by class)”, “Methods (by generic)” or “Functions”. The section contains a bulleted list describing each function, labelled so that you know what function or method it’s talking about.[/quote]

Our example shows the first case.

The alternative to @describeIn is @rdname, which lets one object reuse the documentation of another:

#' Basic arithmetic
#'
#' @param x,y numeric vectors.
add <- function(x, y) x + y

#' @rdname add
times <- function(x, y) x * y

Or we can first write shared documentation attached to a NULL object and have the other objects refer to it:

#' Basic arithmetic
#'
#' @param x,y numeric vectors.
#' @name arith
NULL

#' @rdname arith
add <- function(x, y) x + y

#' @rdname arith
times <- function(x, y) x * y

One remark: looking at pipeR's roxygen comments I noticed the @name tag, and I infer that it sets the name of the generated .Rd file, as here:

#' Pipe an object forward to expression (deprecated)
#' @param . object
#' @param expr expression
#' @name deprecated
#' @export
`%:>%` <- function(.,expr) {
 ...
}

#' @rdname deprecated
#' @export
`%|>%` <- function(.,expr) {
....}

Another example:

#' Basic arithmetic
#'
#' @param x,y numeric vectors.
#' @name arith
NULL

#' @rdname arith
add <- function(x, y) x + y

#' @rdname arith
times <- function(x, y) x * y

In the generated .Rd file the aliases are arith, add and times. As described earlier, ? looks for the topic among the \alias entries of the .Rd files, so all three show the same page. Try it yourself to see the effect.

Finally, a text formatting reference sheet covering the most important commands. The complete list is in Writing R Extensions: http://cran.r-project.org/doc/manuals/R-exts.html#Marking-text

[quote]Character formatting

\emph{italics}.

\strong{bold}.

\code{r_function_call(with = "arguments")}, \code{NULL}, \code{TRUE}.

\pkg{package_name}.

Links

To other documentation:

\code{\link{function}}: function in this package.

\code{\link[MASS]{stats}}: function in another package.

\link[=dest]{name}: link to dest, but show name.

\linkS4class{abc}: link to an S4 class.

To the web:

\url{http://rstudio.com}.

\href{http://rstudio.com}{Rstudio}.

\email{hadley@@rstudio.com} (note the doubled @).

Lists

Ordered (numbered) lists:

#' \enumerate{

#' \item First item

#' \item Second item

#' }

Unordered (bulleted) lists:

#' \itemize{

#' \item First item

#' \item Second item

#' }

Definition (named) lists:

#' \describe{

#' \item{One}{First item}

#' \item{Two}{Second item}

#' }

Mathematics

You can use standard LaTeX math (with no extensions). Choose between either inline or block display:

\eqn{a + b}: inline equation.

\deqn{a + b}: display (block) equation.[/quote]

Lastly, tables. The author uses the following function to turn an R data frame into an Rd table (this code is worth reading):

tabular <- function(df, ...) {
  stopifnot(is.data.frame(df))

  align <- function(x) if (is.numeric(x)) "r" else "l"
  col_align <- vapply(df, align, character(1))

  cols <- lapply(df, format, ...)
  contents <- do.call("paste",
    c(cols, list(sep = " \\tab ", collapse = "\\cr\n  ")))

  paste("\\tabular{", paste(col_align, collapse = ""), "}{\n  ",
    contents, "\n}\n", sep = "")
}

cat(tabular(mtcars[1:5, 1:5]))
#> \tabular{rrrrr}{
#>   21.0 \tab 6 \tab 160 \tab 110 \tab 3.90\cr
#>   21.0 \tab 6 \tab 160 \tab 110 \tab 3.90\cr
#>   22.8 \tab 4 \tab 108 \tab  93 \tab 3.85\cr
#>   21.4 \tab 6 \tab 258 \tab 110 \tab 3.08\cr
#>   18.7 \tab 8 \tab 360 \tab 175 \tab 3.15
#> }

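PS: a final reminder. After Ctrl+Shift+B you can use ? with aliases, run example(""), and follow \code{\link{}} links. After Ctrl+Shift+L these do not work, because devtools does not reimplement those features.

Since this post covers a lot of ground, here is a short summary:

1. The basic workflow

2. The introduction, @details, @seealso, @family, @aliases, etc.

3. @section Warning: (also Note, References)

4. @examples, @example, \dontrun{} and friends for function documentation

5. Documenting the three OO systems; note in particular @include for S4 and docstrings for RC

6. Avoiding repetition: @inheritParams, @rdname or @describeIn

7. Some formatting commands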

superdesolator

93. PKD: Package documentation

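This post is mainly about writing vignettes. A vignette describes the package as a whole: the long-form guides we read to learn how to use a package are vignettes, whereas the previous post covered the help pages shown by ?.

browseVignettes() lists all available vignettes, and browseVignettes("packagename") lists those of one package. You can also open a specific vignette directly with vignette("", package = ""), and extract its code with edit(vignette("", package = "")) (I haven't got that to work yet and am still asking around).

The author recommends building vignettes with the R Markdown vignette engine provided by knitr.

It has three benefits:

1. You write in markdown syntax and get all the advantages of markdown.

2. knitr lets you mix code and text.

3. pandoc converts the markdown to HTML.

My own impression of the process (personal understanding only): knitr runs the code and turns it into markdown (we will see this later), and then pandoc converts that, together with the original markdown text, into HTML.

With RStudio these tools are easy to use. Otherwise you need to install rmarkdown with devtools::install_github("rstudio/rmarkdown") and install pandoc.

Let's look at the vignette workflow.

The author says to start with:

devtools::use_vignette("my-vignette")

This does three things for us:[quote]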

1.Create a vignettes/ directory.

2.Add the necessary dependencies to DESCRIPTION (i.e. knitr in suggests and vignette builder).

3.Creates a draft vignette, my-vignette.Rmd.[/quote]

After that it's simple: keep editing the vignette and press Ctrl+Shift+K to knit it.

So first, how to write a vignette. Following the above, there are three parts:

1) metadata

---
title: "Vignette Title"
author: "Vignette Author"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Vignette Title}
  %\VignetteEngine{knitr::rmarkdown}
  %\usepackage[utf8]{inputenc}
---

In this metadata, the only thing to change is the Vignette Title in \VignetteIndexEntry{Vignette Title}, which should match title. The metadata is written in YAML, a format very similar to DCF. The text after > is plain text that is not interpreted as YAML; R itself reads it, and it looks like LaTeX because vignettes used to be written with Sweave, which uses LaTeX.

2) markdown

We don't need to worry much about part 1) for now; the markdown syntax is what matters:

First, there is a very thorough explanation here.

Let's go over the syntax that is most important and most often used for now:

1. Headings, in two styles

Style 1:
This is an H1
=============
This is an H2
-------------
Style 2:
# Heading 1
## Heading 2
### Heading 3
Horizontal rules:
* * *

***

*****

- - -

---------------------------------------

2. Lists

Unordered lists can use *, + or -:
-   Red
-   Green
-   Blue
Ordered lists. This:
1.  Bird
2.  McHale
3.  Parish
and this:
1.  Bird
1.  McHale
1.  Parish
render the same. But:
3. Bird
1. McHale
8. Parish
starts counting from 3. Be careful with a line that begins with a year; escape the period so that it is not read as a list item:
1986\. What a great season.
Lists can be nested:
* Bulleted list
* Item 2
    * Item 2a (notice 4-space indent)
    * Item 2b
For a list item with several paragraphs, the second and later paragraphs must be indented, at least on their first line; indenting every line of those paragraphs looks better:
  * It's possible to put multiple paragraphs of text in a list item.

    But to do that, the second and subsequent paragraphs must be
    indented by four or more spaces. That looks better if the first
    bullet is also indented.
A blockquote inside a list item must be indented:
*   A list item with a blockquote:

    > This is a blockquote
    > inside a list item.
A code block inside a list item must be indented twice:
*   A list item with a code block:

        <code goes here>
One more thing:
Definition lists use :

Definition
  : a statement of the exact meaning of a word, especially in a dictionary.
List
  : a number of connected items or names written or printed consecutively,
    typically one below the other.
  : barriers enclosing an area for a jousting tournament.

3. Blockquotes

Blockquotes can contain other elements and can be nested:
> ## This is a header.
>
> 1.   This is the first list item.
> 2.   This is the second list item.
> > This is nested blockquote
> Here's some example code:
>
>     return shell_exec("echo $input | $markdown_script");

4. Code blocks

Indent the block. Remember that inside a list item a code block is indented twice and a blockquote once.
This is a normal paragraph:

    This is a code block.

5. Links

This is [an example](http://example.com/ "Title") inline link.

[This link](http://example.net/) has no title attribute.
[link text](destination)
<http://this-is-a-raw-url.com>

6. Formatting

_italic_, *italic*
__bold__, **bold**

7. Code

inline use `code`
For bigger blocks of code, use ```

For syntax highlighting, add a language name; a great many languages are supported (cpp, python, java, ruby, r, sql and so on):
```c
int add(int a, int b) {
  return a + b;
}
```

In a vignette, however, we normally write ```{r} rather than ```r, because knitr then processes the chunk, which gives us more control.

3) knitr

I have used knitr before to go from R Sweave to PDF, so I am somewhat familiar with it. Here are its main features; I won't go into the details of chunks for now.

Chunks are controlled with ```{r, opt1 = val1, opt2 = val2}. The most commonly used options are:

1) eval = FALSE: do not run the code

2) echo = FALSE: do not print the code itself

3) results = "hide": do not print the results

4) warning = FALSE, message = FALSE: do not show warnings and messages

5) error = TRUE: show errors in the output

6) collapse = TRUE, comment = "#>"

The author says he often puts the following at the start of a document:

```{r, echo = FALSE}
knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
```

7) fig.show = "hold": hold all figures until the end of the chunk

8) fig.width = 5, fig.height = 5

9) results = "asis": very useful, because it treats the R output as markdown content, which is then converted to HTML

For example:

```{r, results = "asis"}
pander::pandoc.table(iris[1:3, 1:4])
```

pandoc.table() from the pander package outputs markdown table syntax:

--------------------------------------------------------
 Sepal.Length   Sepal.Width   Petal.Length   Petal.Width
-------------- ------------- -------------- -------------
     5.1            3.5           1.4            0.2

     4.9             3            1.4            0.2

     4.7            3.2           1.3            0.2
--------------------------------------------------------

Finally, the markdown table above is converted into an HTML table.

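For the full list of options, see http://yihui.name/knitr/options

A reminder: devtools::build_vignettes() builds all vignettes but is rarely needed; normally build() builds them when it creates the bundle. Note also that Build & Reload (Ctrl+Shift+B) does not build vignettes, to save time; use devtools::install() instead. Likewise, install functions such as install_github() do not build vignettes by default.

For common problems, I'll just quote: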

[quote]The vignette builds interactively, but when checking, fails with an error about a missing package that you know is installed. This means that you’ve forgotten to declare that dependency in the DESCRIPTION (usually it should go in suggests).

Everything works interactively, but the vignette doesn’t show up after you’ve installed the package. First, remember that RStudio’s “build and reload” doesn’t build vignettes, so instead run devtools::install(). Next check:

The directory is called vignettes/ and not vignette/.

Check that you haven’t inadvertently excluded the vignettes with .Rbuildignore

Ensure you have the necessary vignette metadata.

error = TRUE[/quote]

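In short: packages used by a vignette should be listed under Suggests in DESCRIPTION. Then, having installed with devtools::install() so that the vignettes are built, check that the directory is named vignettes/ (not vignette/), that it is not excluded by .Rbuildignore, and that the necessary metadata is present.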

superdesolator

94. PKD: NAMESPACE

Let me start by quoting a passage:[quote]The namespace controls the search strategy for variables used by functions in the package. If not found locally, R searches the package namespace first, then the imports, then the base namespace and then the normal search path.[/quote]

I studied the search path in detail in an earlier post (click here).

The passage above complements it nicely; the environments chapter in an earlier post also touched on this:

nrow
#> function (x)
#> dim(x)[1L]
#> <bytecode: 0x2a4ebc8>
#> <environment: namespace:base>
dim <- function(x) c(1, 1)
dim(mtcars)
#> [1] 1 1
nrow(mtcars)
#> [1] 32

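This happens because nrow looks up dim first in the namespace of its own package (base), then in the imports, then in the base namespace, and only then along the search path.

My earlier post covered lookup along the search path, whereas everything here happens before the search path is reached.

So nrow finds dim in its own package's namespace.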
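The same lookup order can be imitated with plain environments. The sketch below is an analogy, not a real package namespace: fake_ns, helper and f are made-up names, and the environment simply plays the role of "the package's own namespace" that is searched before the global environment.

```r
# A stand-in for a package namespace: an environment whose parent is base.
fake_ns <- new.env(parent = baseenv())

# Define a helper and a function that uses it *inside* the fake namespace.
evalq({
  helper <- function() "namespace version"
  f <- function() helper()
}, fake_ns)

# A competing definition in the global environment, like our own dim().
helper <- function() "global version"

helper()       # what the user sees at top level
fake_ns$f()    # f() still finds the helper defined next to it
```

Just as with nrow and dim, the function inside the "namespace" is unaffected by the user's redefinition, because its own enclosing environment is searched first.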

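The simplest way to work with namespaces is roxygen2, which translates tags into NAMESPACE directives. For example:

@import plyr becomes import(plyr), and @importFrom plyr ddply becomes importFrom(plyr, ddply); @export works out the function name automatically.

In other words, directives such as import(plyr) are what actually appears in the NAMESPACE file, and roxygen2 generates them from the tags we write next to our R code.

Let's start with exports:

By default roxygen2 exports nothing, so you must @export explicitly.

The author lists these guidelines for what to export:

1) Functions: every function you export must be documented.

2) Datasets: they do not need to be exported; they are available anyway.

3) S3 classes: if you want others to be able to create instances, export the constructor function.

4) S3 generics: same as 1).

5) S3 methods: every S3 method must be exported (I don't understand the reason yet, even after looking through the Writing R Extensions manual).

6) S4 classes: I haven't quite understood these; the original text is quoted below, and I've opened a thread to ask.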

[quote]S4 classes: if you want others to be able to extend your class, @export it. If you want others to create instances of your class, but not extend it, @export the constructor function, but not the class.

# Can extend and create

#' @export

setClass("A")

# Can extend, but constructor not exported

#' @export

B <- setClass("B")

# Can create, but not extend

#' @export C

C <- setClass("C")

# Can create and extend

#' @export D

#' @exportClass D

D <- setClass("D")[/quote]

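7) S4 generics: same as 1).

8) S4 methods: best to export them all.

9) RC classes: same rules as S4 classes.

Also, for S4 you need:

Depends: methods in DESCRIPTION

import(methods) in NAMESPACE (@import methods can go anywhere, but is best placed in the package-level documentation)

The reason is that the methods package is loaded and attached in interactive R, but is not attached when running under Rscript.

Now for imports:

NAMESPACE also controls what our package needs from other packages.

If our package uses only a few functions from another package, we can list that package under Imports in DESCRIPTION and call the functions with ::. This is slightly less efficient than importing in NAMESPACE. However, packages imported in NAMESPACE are loaded when our package is loaded, whereas with :: the other package is loaded only when a pkg:: call is first evaluated, so we gain delayed loading. (I don't yet know exactly what load and attach mean, but this approach clearly does delay loading compared with an explicit import in NAMESPACE.)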
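The difference between "loaded" and "attached" can be observed directly in a plain R session. This sketch uses the tools package only because it ships with R and is not attached by default; the file name is made up.

```r
# Is tools on the search path (attached) before we touch it?
attached_before <- "package:tools" %in% search()

# Calling a function through :: loads the namespace if necessary,
# but does not put the package on the search path.
ext <- tools::file_ext("analysis.R")

loaded_after   <- isNamespaceLoaded("tools")       # TRUE: namespace is loaded
attached_after <- "package:tools" %in% search()    # unchanged: still not attached

c(loaded = loaded_after, attached = attached_after)
```

So :: makes a package's functions usable (loaded) without making them visible by bare name to the user (attached), which is what library() would do.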

The alternative is @importFrom pkg fun.

If our package needs many functions from another package, use @import pkg.

If we need to add a method to an S3 generic from another package, use @importFrom pkg generic.

If we use S4:

@importClassesFrom package classa classb ... to import selected S4 classes.

@importMethodsFrom package methoda methodb ... to import methods for selected S4 generics.

If we need compiled code from another package, use @useDynLib (Writing R Extensions covers this in some detail, but it is beyond me for now):

[quote]@useDynLib package imports all compiled functions.

@useDynLib package routinea routineb imports selected compiled functions.

Any @useDynLib specification containing a comma, e.g. @useDynLib mypackage, .registration = TRUE will be inserted as is into the NAMESPACE, e.g. useDynLib(mypackage, .registration = TRUE)[/quote]

Finally, the author gives a trick for registering S3 methods for a generic from a suggested package. Again I quote the original:

[quote]One complexity arises when you want to register S3 methods for a generic that’s defined in a suggested package. You can’t use S3method() because the generic is not available at package load time. Instead, you can use the following code to set up hooks that load automatically.

From htmltools:

# COPYRIGHT RStudio, GPL >= 2

registerMethods <- function(methods) {
  lapply(methods, function(method) {
    pkg <- method[[1]]
    generic <- method[[2]]
    class <- method[[3]]
    func <- get(paste(generic, class, sep="."))
    if (pkg %in% loadedNamespaces()) {
      registerS3method(generic, class, func, envir = asNamespace(pkg))
    }
    setHook(
      packageEvent(pkg, "onLoad"),
      function(...) {
        registerS3method(generic, class, func, envir = asNamespace(pkg))
      }
    )
  })
}[/quote]

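We may not know what methods looks like, how lapply walks over it, or how the pieces are pulled apart, but we can see what the setHook call says: register the S3 method when that package is loaded!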
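Judging only from how the function indexes its argument (method[[1]], method[[2]], method[[3]]), methods is presumably a list whose elements each hold a package name, a generic name and a class name. The sketch below shows that shape and the unpacking step in isolation; the entries are hypothetical and nothing is actually registered.

```r
# Assumed shape of the argument: one c(package, generic, class) per method.
methods <- list(
  c("somepkg", "print",  "myclass"),   # hypothetical entries
  c("somepkg", "format", "myclass")
)

# The same unpacking that registerMethods() performs for each element.
parts <- lapply(methods, function(method) {
  pkg     <- method[[1]]
  generic <- method[[2]]
  class   <- method[[3]]
  list(
    pkg = pkg,
    generic = generic,
    class = class,
    func_name = paste(generic, class, sep = ".")   # name looked up with get()
  )
})

parts[[1]]$func_name
```

Each element thus names an existing function such as print.myclass, which is then registered as the method of that generic for that class once the package is loaded.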

PS: remember Ctrl+Shift+D.

superdesolator

95. PKD: Testing

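This post mainly introduces the author's testthat package.

A brief summary of the benefits of testing:

1) The more tests you have, the fewer bugs you are likely to hit later.

2) Better code is easier to test, and writing tests tends to push us to split code into functional pieces.

3) It is easier to know what to work on, e.g. fixing a bug, or adding a feature when there are no bugs.

4) You can change code with more confidence.

1. Structure of testthat

An expectation checks one basic result; a test, written with test_that(), groups expectations that check one piece of functionality; a context groups a series of related test_that() calls.

2. Expectations

Quoting the original, with my own translation mixed in: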

[quote]•equals() uses all.equal() to check for equality, allowing a small numerical tolerance.

# Passes
expect_that(10, equals(10))
# Also passes
expect_that(10, equals(10 + 1e-7))

# Fails
expect_that(10, equals(10 + 1e-6))

# Definitely fails!
expect_that(10, equals(11))

•is_identical_to() uses identical() to check for exact equality.

# Passes
expect_that(10, is_identical_to(10))
# Fails
expect_that(10, is_identical_to(10 + 1e-10))

•is_equivalent_to() is a more relaxed version of equals() that ignores attributes:

# Fails
expect_that(c("one" = 1, "two" = 2), equals(1:2))
# Passes
expect_that(c("one" = 1, "two" = 2), is_equivalent_to(1:2))

•is_a() checks that an object inherits() from a specified class.

model <- lm(mpg ~ wt, data = mtcars)
# Passes
expect_that(model, is_a("lm"))

# Fails
expect_that(model, is_a("glm"))

•matches() matches a character vector against a regular expression. The optional all argument controls whether all elements or just one element needs to match. This code is powered by str_detect() from the stringr package.

string <- "Testing is fun!"
# Passes
expect_that(string, matches("Testing"))
# Fails, match is case-sensitive
expect_that(string, matches("testing"))
# Passes, match can be a regular expression
expect_that(string, matches("T.+ting"))

•prints_text() matches the printed output against a regular expression.

a <- list(1:10, letters)
# Passes
expect_that(str(a), prints_text("List of 2"))
# Passes
expect_that(str(a), prints_text(fixed("int [1:10]")))

•shows_message() checks that an expression produces a specific message:

# Passes
expect_that(library(mgcv), shows_message("This is mgcv"))

•gives_warning() checks that an expression produces a warning.

# Passes
expect_that(log(-1), gives_warning())
expect_that(log(-1), gives_warning("NaNs produced"))
# Fails
expect_that(log(0), gives_warning())

•throws_error() verifies that the expression throws an error. You can also supply a regular expression which is applied to the text of the error.

# Fails
expect_that(1 / 2, throws_error())
# Passes
expect_that(1 / "a", throws_error())
# But better to be explicit
expect_that(1 / "a", throws_error("non-numeric argument"))

•is_true() is a useful catchall if none of the other expectations do what you want - it checks that an expression is true. is_false() is the complement of is_true().

[/quote]
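The difference between equals(), is_identical_to() and is_equivalent_to() comes down to the base R functions they are described as wrapping. The sketch below reproduces the pass/fail pattern of the quoted examples with all.equal() and identical() only, so it runs without testthat; tolerant_equal is a made-up helper name.

```r
# all.equal() returns TRUE or a description of the difference,
# so wrap it in isTRUE() to get a plain logical.
tolerant_equal <- function(x, y) isTRUE(all.equal(x, y))

within_tol  <- tolerant_equal(10, 10 + 1e-7)   # TRUE: inside the default tolerance
outside_tol <- tolerant_equal(10, 10 + 1e-6)   # FALSE: relative difference too large
exact       <- identical(10, 10 + 1e-10)       # FALSE: identical() allows no tolerance

# Attributes (here: names) count for all.equal() unless explicitly ignored.
named <- c(one = 1, two = 2)
with_attrs    <- tolerant_equal(named, 1:2)                              # FALSE
without_attrs <- isTRUE(all.equal(named, 1:2, check.attributes = FALSE)) # TRUE
```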

These 11 expectations can be combined into many tests that each check for a specific failure. The author also provides shortcuts:

[quote]

Full form                               Shortcut
expect_that(x, is_true())               expect_true(x)
expect_that(x, is_false())              expect_false(x)
expect_that(x, is_a(y))                 expect_is(x, y)
expect_that(x, equals(y))               expect_equal(x, y)
expect_that(x, is_equivalent_to(y))     expect_equivalent(x, y)
expect_that(x, is_identical_to(y))      expect_identical(x, y)
expect_that(x, matches(y))              expect_match(x, y)
expect_that(x, prints_text(y))          expect_output(x, y)
expect_that(x, shows_message(y))        expect_message(x, y)
expect_that(x, gives_warning(y))        expect_warning(x, y)
expect_that(x, throws_error(y))         expect_error(x, y)

[/quote]

3. Tests

A test is written as test_that(name, code block). The name is a label that tells you where, or in which piece of functionality, something went wrong when the test fails; the code block contains the combined expectations that check that functionality. When something fails, the matching code is easy to find. For example, the author tests the floor_date() function from library(lubridate):

test_that("floor_date works for different units", {
  base <- as.POSIXct("2009-08-03 12:01:59.23", tz = "UTC")

  is_time <- function(x) equals(as.POSIXct(x, tz = "UTC"))
  floor_base <- function(unit) floor_date(base, unit)

  expect_that(floor_base("second"), is_time("2009-08-03 12:01:59"))
  expect_that(floor_base("minute"), is_time("2009-08-03 12:01:00"))
  expect_that(floor_base("hour"),   is_time("2009-08-03 12:00:00"))
  expect_that(floor_base("day"),    is_time("2009-08-03 00:00:00"))
  expect_that(floor_base("week"),   is_time("2009-08-02 00:00:00"))
  expect_that(floor_base("month"),  is_time("2009-08-01 00:00:00"))
  expect_that(floor_base("year"),   is_time("2009-01-01 00:00:00"))
})

If it fails, we see something like Test failed: "floor_date works for different units", Not expected: ..... and can quickly locate the code of that test.

Each test runs in its own environment, but some actions still have effects outside it:

1) The file system: creating and deleting files

2) The search path: loading and detaching packages

3) Global options, e.g. options() and par()

The author says that if you do any of these, you'll need to clean up after yourself. Some testing packages have set-up and teardown methods that are run automatically before and after each test; with testthat we can instead create objects outside the tests and rely on copy-on-modify to leave them unchanged (I haven't fully understood this part). The original text:

[quote]When you use these actions in tests, you’ll need to clean up after yourself. Many other testing packages have set-up and teardown methods that are run automatically before and after each test. These are not so important with testthat because you can create objects outside of the tests and rely on R’s copy-on-modify semantics to keep them unchanged between test runs. To clean up other actions you can use regular R functions[/quote]
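"Regular R functions" for cleaning up usually means on.exit(). Below is a minimal sketch, assuming we want to change a global option only for the duration of some code; with_digits is a hypothetical helper written for this example, not part of testthat.

```r
# Hypothetical helper: run `code` with a temporary value of options(digits),
# restoring the previous value afterwards even if `code` fails.
with_digits <- function(digits, code) {
  old <- options(digits = digits)     # options() returns the previous values
  on.exit(options(old), add = TRUE)   # registered clean-up, runs on exit
  force(code)                         # `code` is evaluated lazily, i.e. here
}

digits_before <- getOption("digits")
short_pi <- with_digits(3, format(pi))
digits_after <- getOption("digits")

short_pi
```

The same pattern works for the other side effects listed above, e.g. on.exit(unlink(path)) for a temporary file or on.exit(par(old_par)) for graphics parameters.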

4. Contexts

A context groups tests of related functionality; usually there is one context per file. Here is an example that tests str_length from the stringr package:

context("String length")

test_that("str_length is number of characters", {
  expect_that(str_length("a"), equals(1))
  expect_that(str_length("ab"), equals(2))
  expect_that(str_length("abc"), equals(3))
})

test_that("str_length of missing is missing", {
  expect_that(str_length(NA), equals(NA_integer_))
  expect_that(str_length(c(NA, 1)), equals(c(NA, 1)))
  expect_that(str_length("NA"), equals(2))
})
test_that("str_length of factor is length of level", {
  expect_that(str_length(factor("a")), equals(1))
  expect_that(str_length(factor("ab")), equals(2))
  expect_that(str_length(factor("abc")), equals(3))
})

We also run the same tests with nchar in place of str_length.

5. Running tests

There are two situations in which we run tests: interactively during development, and automatically.

1) Use test_file() or test_dir() to run all tests

2) Use auto_test() to rerun tests automatically

3) Let R CMD check run the tests

6. Test files and directories

The code and output below show the difference between source(path) and test_file(path): test_file reports every result, while source stops at the first failing test.

> source("test-str_length.r")
> test_file("test-str_length.r")
.........

> source("test-nchar.r")
Error: Test failure in 'nchar of missing is missing'
* nchar(NA) not equal to NA_integer_
'is.NA' value mismatch: 0 in current 1 in target
* nchar(c(NA, 1)) not equal to c(NA, 1)
'is.NA' value mismatch: 0 in current 1 in target

> test_file("test-nchar.r")
...12..34

1. Failure: nchar of missing is missing ---------------------------------
nchar(NA) not equal to NA_integer_
'is.NA' value mismatch: 0 in current 1 in target

2. Failure: nchar of missing is missing ---------------------------------
nchar(c(NA, 1)) not equal to c(NA, 1)
'is.NA' value mismatch: 0 in current 1 in target

3. Failure: nchar of factor is length of level --------------------------
nchar(factor("ab")) not equal to 2
Mean relative difference: 0.5

4. Failure: nchar of factor is length of level --------------------------
nchar(factor("abc")) not equal to 3
Mean relative difference: 0.6666667

test_dir() runs every file in a directory whose name starts with test. Below is the result for stringr: 12 contexts, each with between 2 and 25 expectations.

> test_dir("inst/tests/")
String and pattern checks : ......
Detecting patterns : .........
Duplicating strings : ......
Extract patterns : ..
Joining strings : ......
String length : .........
Locations : ............
Matching groups : ..............
Test padding : ....
Splitting strings : .........................
Extracting substrings : ...................
Trimming strings : ........

As for report formats: the test_dir output above uses the summary reporter, the default for test_dir and test_file. We can choose the minimal reporter instead (E stands for error, F for failure):

> test_dir("inst/tests/", reporter="minimal")
...............................................

The last reporter, stop, calls stop() as soon as a failure occurs. The original text:

[quote]The stop reporter is the default and stop()s whenever a failure is encountered.[/quote]

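My understanding is that the stop reporter produces something like:

4. Failure: nchar of factor is length of level --------------------------
nchar(factor("abc")) not equal to 3
Mean relative difference: 0.6666667

7. Autotest

auto_test() takes two arguments, code_path and test_path: the directory holding the code and the directory holding the tests. Once running, auto_test() watches both. If a test file changes, it reruns that test file; if a source file changes, it reloads that source file and reruns all test files.

As I understand it: start auto_test(), then just edit; every save triggers the tests.

This replaces the traditional modify -> save -> source -> check cycle.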

8.R CMD check

首先我们得把testthat放入DESCRIPTION文件，以免R CMD check警告unspecified dependencies.

然后就是要把测试代码放入R CMD check可以找到的地方.

最开始，最好的实践是把所有测试文件放入inst/tests然后添加下列代码到tests/test-all.R

library(testthat)<br />
library(yourpackage)<br />
test_package("yourpackage")

现在推荐的方式是：

把测试放入tests/testthat,然后在tests/test-all.R放入

library(testthat)<br />
test_check("yourpackage")

作者说这样的好处是可以在R CMD install的时候选择 -install-tests或者设置install.packages()的参数

INSTALL_opts = c(“–install-tests”)

我又要猜测为什么要这么做了[s:11]：

由于R包的代码和测试代码的位置相对可以找到，所以test_check里面很可能用了test_dir和test_file,从而利用了这两个函数的优势（例如比起source的优势）

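9.Two styles of development

One is called exploratory programming, the other confirmatory programming.

Confirmatory: when a test fails, start auto_test() and keep modifying until it passes.

Exploratory: keep cycling through source + modify.

Once everything works, run document() and update NEWS.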

superdesolator

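95.PKD: Data

There are three ways to include data in an R package:

1) If you want to store parsed data and make it available to users, put it in data/

2) If you want to store parsed data but not make it available to users, put it in R/sysdata.rda

3) If you want to store raw data, it is best to put it in inst/extdata

OK, let's go through them one by one: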

1.data/

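This folder is the best place for example datasets. The .rda files in it should be created with save():

Each file should contain a single object, and the file name should be the name of that object: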

save(mtcars, file = "data/mtcars.rda")

Objects in the data folder are exported automatically, so they must be documented. This is covered at the end of this post.
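A minimal sketch of the one-object-per-file convention. It uses a temporary directory in place of a real package's data/ folder, and the object name toy_data is made up for illustration:

```r
# Stand-in for a package's data/ directory (assumption: not a real package).
data_dir <- file.path(tempdir(), "data")
dir.create(data_dir, showWarnings = FALSE)

# One object per file; the file name matches the object name.
toy_data <- data.frame(id = 1:3, value = c(2.5, 3.1, 4.8))
save(toy_data, file = file.path(data_dir, "toy_data.rda"))

# load() restores the object under its original name and returns that name.
restored <- new.env()
loaded_names <- load(file.path(data_dir, "toy_data.rda"), envir = restored)
print(loaded_names)
```

Loading into a separate environment is only there to make the round trip visible; inside a package, R handles the loading itself.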

Remember the LazyData field in DESCRIPTION? If it is true, the datasets take up no memory until we actually use them:

pryr::mem_used()
#> 24.4 MB
library(nycflights13)
pryr::mem_used()
#> 24.6 MB
invisible(flights)
pryr::mem_used()
#> 59.9 MB

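The example shows that loading the package does not pull the flights dataset into memory; it is only loaded once flights is actually used. So this setting is clearly worth recommending.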
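The idea of "not loaded until first use" can be illustrated in plain R with delayedAssign(), which registers a promise that is only evaluated on first access. This is a toy illustration of the concept, not the actual LazyData machinery; the names lazy_env, big_data and was_loaded are made up:

```r
lazy_env <- new.env()
was_loaded <- FALSE

# Register big_data as a promise: the expression is not evaluated yet.
delayedAssign(
  "big_data",
  {
    was_loaded <- TRUE        # records that the expression has run
    seq_len(1e6)
  },
  assign.env = lazy_env
)

before <- was_loaded                   # nothing has been built so far
n <- length(lazy_env$big_data)         # first access forces the promise
after <- was_loaded                    # the expression has now run

print(c(before = before, after = after))
```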

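Also, some data files are built from raw data collected elsewhere. It is best to put the code that reproducibly creates them in a data-raw folder, and to remember to add that folder to .Rbuildignore.

For a real example, see nasaweather.

Finally, the author points out that if your package uses a large dataset that rarely changes, it is best to put it in a dedicated package, so that users do not have to download the data again whenever your code package changes.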

2.R/sysdata.rda

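Sometimes our functions need precomputed tables. This is the best place for them, because users should not be able to access them. The file can hold any number of objects: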

save(x, y, z, file = "R/sysdata.rda")

Since these objects are not for users and are not exported, they need no documentation.
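A small sketch of a file holding several internal objects. The directory is a temporary stand-in for a package's R/ folder, and the lookup tables unit_factors and month_codes are invented for the example:

```r
# Stand-in for a package's R/ directory (assumption: not a real package).
r_dir <- file.path(tempdir(), "R")
dir.create(r_dir, showWarnings = FALSE)

# Hypothetical precomputed lookup tables used only by internal functions.
unit_factors <- c(cm = 1, m = 100, km = 1e5)
month_codes <- setNames(1:12, month.abb)

# Unlike files in data/, a single sysdata.rda can hold several objects.
sysdata_path <- file.path(r_dir, "sysdata.rda")
save(unit_factors, month_codes, file = sysdata_path)

# Loading into a fresh environment shows what the file contains.
internal <- new.env()
stored <- load(sysdata_path, envir = internal)
print(sort(stored))
```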

3.inst/

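When we need to give examples of loading/parsing raw data, put the original files in inst/. Any subdirectory will do, but inst/extdata is best. When the package is installed, everything in inst/ is copied into the package's top-level directory, so inst/ must not contain directories with names like R or DESCRIPTION.

To get the path:

system.file("include", "Rcpp.h", package = "Rcpp")
#> [1] "/usr/lib/R/site-library/Rcpp/include/Rcpp.h"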

Note that if the file does not exist, system.file() does not raise an error; it returns an empty string:

system.file("include", "Rcp.h", package = "Rcpp")
#> [1] ""
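Because a missing file gives "" rather than an error, it is worth checking the result before using it. A short sketch using the stats package that ships with R (the file name no-such-file.csv is deliberately fictitious):

```r
# A file that ships with every R installation.
found <- system.file("DESCRIPTION", package = "stats")

# A file that does not exist: no error, just an empty string.
missing_path <- system.file("extdata", "no-such-file.csv", package = "stats")

# nzchar() is a compact way to test whether a path was found.
print(nzchar(found))
print(nzchar(missing_path))

# mustWork = TRUE turns the silent "" into an error instead.
result <- tryCatch(
  system.file("extdata", "no-such-file.csv", package = "stats", mustWork = TRUE),
  error = function(e) "error raised"
)
print(result)
```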

4.vignettes/

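If we need a dataset for writing a vignette, just put it in the vignettes/ folder and refer to it with a local path.

5.CRAN notes

If the package is to be submitted to CRAN, we must make sure the datasets are optimally compressed. The original text: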

[quote]Run tools::checkRdaFiles() to determine the best compression for each file. If you’ve lost the code for recreating the files, you can use tools::resaveRdaFiles() to save as best format, but it’s better to modify the original save() code[/quote]

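Have a look at the help pages of these two functions and at ?save; there is a lot to learn there.

6.Documenting datasets

Here is the author's example: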

#' Prices of 50,000 round cut diamonds.
#'
#' A dataset containing the prices and other attributes of almost 54,000
#' diamonds. The variables are as follows:
#'
#' \itemize{
#'   \item price. price in US dollars (\$326--\$18,823)
#'   \item carat. weight of the diamond (0.2--5.01)
#'   ...
#' }
#'
#' @format A data frame with 53940 rows and 10 variables
#' @source \url{http://www.diamondse.info/}
#' @name diamonds
NULL

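We do not document the dataset directly. Instead we document NULL and use @name to tell roxygen2 what is really being documented. @format describes the structure of the data; if we leave it out, roxygen adds it automatically from the first line of the str() output. @source gives where the data came from, often a \url{}.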

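PS: what follows supplements point 5; skip it if you are not interested.

A closer look at save, tools::resaveRdaFiles and tools::checkRdaFiles. You can read the help pages carefully yourselves; here I pick out the important parts and give an outline:

First, data compression in R: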

[quote]Large files can be reduced considerably in size by compression. A particular 46MB R object was saved as 35MB without compression in 2 seconds, 22MB with gzip compression in 8 secs, 19MB with bzip2 compression in 13 secs and 9.4MB with xz compression in 40 secs. The load times were 1.3, 2.8, 5.5 and 5.7 seconds respectively. These results are indicative, but the relative performances do depend on the actual file: xz compressed unusually well here.

It is possible to compress later (with gzip, bzip2 or xz) a file saved with compress = FALSE: the effect is the same as saving with compression. Also, a saved file can be uncompressed and re-compressed under a different compression scheme (and see resaveRdaFiles for a way to do so from within R).[/quote]

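This shows how gzip, bzip2 and xz differ.

Next, save:

save(..., list = character(),
     file = stop("'file' must be specified"),
     ascii = FALSE, version = NULL, envir = parent.frame(),
     compress = !ascii, compression_level,
     eval.promises = TRUE, precheck = TRUE)

... is the names of the objects to save; list is a character vector of object names; file is the path to save to; envir is the environment to look in; ascii chooses the storage format (the alternative is binary; roughly, ASCII is meant for moving data elsewhere, while the binary format is portable across all R platforms); version is the format version, 1 for the old format and 2 for the new one (anything since R 1.4.0 counts as new); compress is the compression method: TRUE means gzip, which ties in with ascii, so a non-ASCII (binary) file is gzip-compressed by default, and we can also choose "bzip2" or "xz"; compression_level is an integer giving the level of compression (according to ?save it defaults to 6 for gzip and 9 for bzip2 or xz); eval.promises says whether promise objects are evaluated before saving (if not, the whole associated environment has to be saved); precheck says whether the existence of the objects is checked first. Finally there is safe, an argument of save.image() documented on the same help page: the data is first written to a temporary file, which becomes the real file on success and is deleted on failure. At least that is how it looks.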

Finally, note that save also consults options(), so we can change the defaults with:

options(save.defaults = list(ascii = TRUE, safe = FALSE))

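Finally, tools::resaveRdaFiles and tools::checkRdaFiles:

Once save is clear, there is not much left to say about these two, so I will just give an example. First, download borders.rdata from the data folder at this location and put it in the working directory.

Then let's take a look: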

tools::checkRdaFiles("borders.rdata")
#                size ASCII compress version
# borders.rdata 79354 FALSE     gzip       2
tools::resaveRdaFiles("borders.rdata", compress = "xz", 2)
tools::checkRdaFiles("borders.rdata")
#                size ASCII compress version
# borders.rdata 50572 FALSE       xz       2
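For anyone without borders.rdata at hand, the same check / re-save / check cycle can be tried on made-up data. This is a sketch only: the data frame toy is invented, and how much xz actually saves depends on the data, so no sizes are claimed here:

```r
# Made-up, highly repetitive data so that compression has something to do.
toy <- data.frame(id = rep(1:100, 200), label = rep(letters[1:20], 1000))

path <- file.path(tempdir(), "toy.rda")

# save() with explicit gzip compression.
save(toy, file = path, compress = "gzip")
before <- tools::checkRdaFiles(path)

# Re-save the same file in place with xz compression.
tools::resaveRdaFiles(path, compress = "xz")
after <- tools::checkRdaFiles(path)

print(before[, c("size", "compress")])
print(after[, c("size", "compress")])
```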

OK, with save, tools::resaveRdaFiles and tools::checkRdaFiles this clear, what could be hard about putting some data in an R package?

superdesolator

96.PKD: Compiled code

This part is hard, and the author only wrote about 3 pages, so I will just list the points:

1.c++

Probably thanks to Rcpp, C++ is the easiest compiled language to use from R.

2.C

This is for wrapping old C code, or for the rare case where you are not wrapping through C++.

In that case:

Put the .c files in src/ and add

[quote]#include <R.h>

#include <Rdefines.h>[/quote]

Wrap the function like this:

#' @useDynLib qstack car
car_ <- function(x) .Call(car, x)

[quote]

Slowest: use .Call("car")

Faster: use .Call(car)

Fastest: don’t write wrapper function[/quote]

If you use useDynLib, it is best to clean up in .onUnload():

.onUnload <- function (libpath) {
  library.dynam.unload("qstack", libpath)
}

3.Fortran

...

4.Java

The author says rJava also involves a lot of copying, so it seems to be rather slow.

5.Dynamically loading dll

Load the DLL with @useDynLib,

then unload it:

.onUnload <- function(libpath) {
  library.dynam.unload("yourpackagename", libpath)
}

This makes sure the DLL is unloaded when the package is unloaded.

6.Licensing

If we use code written by someone else, we must credit it, e.g.

http://cran.rstudio.com/web/packages/optparse/COPYRIGHTS

I suspect the author has not finished some parts of this yet.

superdesolator

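97.PKD: Other components

Unfortunately the author is still writing this book and this part is not finished. Hadley Wickham is a very busy man (but could you please finish the book in one go?).

The author has listed the parts he plans to write: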

[quote]

.Rbuildignore

inst/

exec/: executable scripts

po/: translation files

the inst/CITATION file describes how to cite your package. If you have published a peer reviewed article which you’d like people to cite when they use your software, this is the place to put it.

the demo/ directory contains larger scale demos, that use many features of the package.[/quote]

As you can see, exec/ and po/ are still quite unfamiliar to us.

CITATION seems to be finished. It is tied to the citation() function, and the file lives at inst/CITATION. Its main purpose is citing, which for us means citing an R package. Running citation() directly shows how to cite base R:

To cite R in publications use:

  R Core Team (2012). R: A language and environment for statistical
  computing. R Foundation for Statistical Computing, Vienna, Austria.
  ISBN 3-900051-07-0, URL http://www.R-project.org/.

A BibTeX entry for LaTeX users is

  @Manual{,
    title = {R: A Language and Environment for Statistical Computing},
    author = {{R Core Team}},
    organization = {R Foundation for Statistical Computing},
    address = {Vienna, Austria},
    year = {2012},
    note = {{ISBN} 3-900051-07-0},
    url = {http://www.R-project.org/},
  }

We have invested a lot of time and effort in creating R, please cite it
when using it for data analysis. See also ‘citation("pkgname")’ for
citing R packages.

The CITATION file that produces the output above might look like this:

bibentry("Manual",
   title = "R: A Language and Environment for Statistical Computing",
   author = person("R Core Team"),
   organization = "R Foundation for Statistical Computing",
   address      = "Vienna, Austria",
   year   = version$year,
   note   = "{ISBN} 3-900051-07-0",
   url    = "http://www.R-project.org/",

   mheader = "To cite R in publications use:",

   mfooter =
     paste("We have invested a lot of time and effort in creating R,",
      "please cite it when using it for data analysis.",
      "See also", sQuote("citation(\"pkgname\")"),
      "for citing R packages.", sep = " ")
)

See ?bibentry: the first argument is bibtype. Compare mheader and mfooter with the output above; these two are optional, and the rest is the basic information.
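To get a feel for bibentry() without writing a CITATION file, it can be called directly. The package name mypkg, the author Jane Doe and the version are all invented for this sketch:

```r
# Hypothetical package and author, for illustration only.
cit <- utils::bibentry(
  bibtype = "Manual",
  title   = "mypkg: A Made-Up Example Package",
  author  = utils::person("Jane", "Doe"),
  year    = "2014",
  note    = "R package version 0.1.0",
  mheader = "To cite mypkg in publications use:"
)

print(cit, style = "citation")     # the layout that citation() prints
bib <- utils::toBibtex(cit)        # the BibTeX entry as lines of text
print(bib)
```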

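The demo section is unfinished. The author only says that the R files under demo/ are tied to demo(); compared with a function example, they combine several functions.

Demos need an index; the simplest is: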

#' @demoTitle my title

After that the author wrote nothing more. Let's hope he gets to it soon, though the book is not due out until June 2015, so there is a wait ahead.

superdesolator

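98.PKD: Git and github

At the moment the author has written almost nothing here, so I found a tutorial and studied that instead. It is very good.

Click here for the tutorial

I spent more than 10 hours working through it and can absolutely recommend it. After that tutorial you probably won't need to read this part once Hadley Wickham does write it.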

superdesolator

99.PKD: Checking

If we want to publish a package on CRAN, it must pass R CMD check, which means no warnings, no errors and as few notes as possible.

To check the package, run devtools::check(), or press Ctrl+Shift+E in RStudio.

1.Check components

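devtools::run_examples() has an optional argument for choosing which function to start from, so functions that already passed need not be rerun every time.

devtools::check_doc() checks the documentation, which is one of the most common causes of check failures.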

2.Common problems

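The author lists some common problems here. We have not really reached this stage yet, but it does no harm to look ahead:

1) Sometimes check() fails because the installation failed

2) Every exported object must be documented, and so must every function argument

3) An S3 method must have arguments consistent with its generic, in particular the ... argument of print, for example:

print
#> function (x, ...)
#> UseMethod("print")
#> <bytecode: 0x30475b8>
#> <environment: namespace:base>

# BAD
print.my_class <- function(x) cat("Hi")

# GOOD
print.my_class <- function(x, ...) cat("Hi")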
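One way to see whether a method covers its generic's arguments is to compare their formals. A small sketch with a hypothetical class my_class:

```r
# Hypothetical class "my_class", used only for this illustration.
print.my_class <- function(x, ...) {
  cat("<my_class with", length(unclass(x)), "elements>\n")
  invisible(x)
}

obj <- structure(1:3, class = "my_class")
print(obj)

# Compare the argument names of the generic and the method.
generic_args <- names(formals(print))
method_args <- names(formals(print.my_class))
print(all(generic_args %in% method_args))
```

This only compares names; R CMD check does the real consistency check.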

4) Use TRUE and FALSE, not T and F
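The reason, as far as I understand it, is that T and F are ordinary variables that can be reassigned, while TRUE and FALSE are reserved words. A short sketch of the difference:

```r
# T and F are ordinary variables that merely default to TRUE and FALSE.
T <- 0                       # perfectly legal, and silently changes meaning
shadowed <- isTRUE(T)        # our T is 0, so this is no longer TRUE

rm(T)                        # remove our variable; base's T is visible again
restored <- isTRUE(T)

# TRUE itself is a reserved word, so this assignment is rejected.
attempt <- tryCatch(
  eval(parse(text = "TRUE <- 0")),
  error = function(e) "cannot assign to TRUE"
)
print(c(shadowed = shadowed, restored = restored))
print(attempt)
```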

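5) Data should be suitably compressed; see the Data post

6) Bugs caused by examples or vignettes: note that they can only use exported functions, not internal functions

Also pay attention to the dependencies declared in DESCRIPTION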

7) building pdf manual

[quote]Occassionally you’ll get an error when building the PDF manual. This is usually because the pdf is built by latex and you’ve forgotten to escape something. Debugging this is painful - your best bet is to look up the latex logs and combined tex file and work back from their to .Rd files and

back to a roxygen comment. You may need to enlist the help of a local latex expert.[/quote]

3.Checking on Windows

[quote]You can check packages on Windows using build_win(). This builds your package and submits it to the CRAN win-builder. 30-60 minutes after submission, you’ll recieve and email which will point you to the check results.

This checks both 32-bit and 64-bit builds. So if you’re using C code, you’ll need to make sure it works for both.

By default build_win() checks with both the release and development version of R. You can override this with the version parameter.[/quote]

It seems to say that build_win() submits the package to the CRAN win-builder to be checked, and half an hour to an hour later we receive an email with the results.

4.Checking dependencies

I don't think the author has finished this, since the original has nothing but this heading.

5.Continuous integration with Travis

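You can skip this part for now. If we are not yet comfortable with github, there is little point in learning how to automate things such as running check on every commit or on every pull request. Get reasonably fluent with github first, then come back to the Travis tool. It does look very powerful.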

superdesolator

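100.PKD: Releasing

1.Releasing a package

Two places:

CRAN, github

Important files:

NEWS describes the changes in each version

README.md gives an overview of the package

2.README.md

The author says to use 3-4 paragraphs to answer the following questions: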

•Why should someone use your package?

•How does it compare to other existing solutions?

•What are the main functions?

He also recommends updating it whenever the package version changes; it is shown on the package's github home page.

3.NEWS

Here is the NEWS of stringr as an example:

stringr 0.5
===========

* new `str_wrap` function which gives `strwrap` output in a more
  convenient format

* new `word` function extract words from a string given user defined
  separator (thanks to suggestion by David Cooper)

* `str_locate` now returns consistent type when matching empty string
  (thanks to Stavros Macrakis)

* new `str_count` counts number of matches in a string.

* `str_pad` and `str_trim` receive performance tweaks - for large vectors
  this should give at least a two order of magnitude speed up

* str_length returns NA for invalid multibyte strings

* fix small bug in internal `recyclable` function

We can see:

1) Package name + version number, followed by a line of =

2) Each change is listed as its own bullet; if a bullet runs over several lines, the continuation lines are indented by at least 2 spaces

3) If there are many changes, they can be split into subsections, whose names must be upper case and left-aligned

4) Using markdown makes it easier to publish the NEWS elsewhere

We can use devtools::show_news(). By default it shows only the latest version; latest=FALSE changes that.

Another option is NEWS.md

•Can’t include in package (needs to be in .Rbuildignore), and so not on CRAN.

•Easier to read on github

•Easier to copy-and-paste into github releases

4.Version numbers

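R uses version numbers to decide whether package dependencies are satisfied, so they matter:

A version number is usually structured like this:

1.9 is equivalent to 1.9.0, but lower than 1.9.2

1 is the major version, changed when there are incompatible API changes

9 is the minor version, changed when there are backward-compatible API changes

0 is the patch version, changed for backward-compatible fixes

There is also a sub-patch component, which marks a version under development after a release: 1.9.0.9000, 1.9.0.9001

Note that these rules are not absolute. For example, a backward-incompatible change to a very rarely used API may not warrant changing the 1, while fixing a very important bug might.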
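The ordering can be tried out with base R's package_version(). A short sketch; the version strings are the ones used above plus 1.10.0, which I added to show that components are compared as numbers rather than as text:

```r
released <- package_version("1.9.0")
patched <- package_version("1.9.2")
dev_1 <- package_version("1.9.0.9000")
dev_2 <- package_version("1.9.0.9001")

# Versions compare component by component, not as strings or decimals.
print(released < patched)
print(released < dev_1)
print(dev_1 < dev_2)
print(package_version("1.10.0") > package_version("1.9.0"))

# Individual components can be read off the version object.
print(released$major)
print(released$minor)
```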

5.Publishing on Github

The author has not written this yet.

6.Publishing on CRAN

Once the package passes check, we can upload it to CRAN:

1.Build a package bundle with build().

2.Upload to http://cran.r-project.org/submit.html

3.Confirm submission email

4.Wait for response.

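The author recommends devtools::release(). We'll come back to it when we actually need it; without real practice I can't say how well it works.

It is worth noting that if the submission fails, you make the suggested changes, update cran-comments.md describing what you did and why, and re-run release().

7.CRAN policies

The most important ones: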

•Must pass R CMD check without errors, warnings and ideally notes. If there are notes that you do not believe are important, it is almost always easier to hack around them than to persuade CRAN that they’re ok.

•Must have clearly identified IP holders, have open source license (so CRAN can distribute it) and not include any binary executables.

•Maintainer email address must be correct. CRAN will make no effort to contact apart from through this address. Make sure it’s something that’s likely to be around for a while

•Must not be more than 5 megs.

•Packages shouldn’t make external changes without explicit user permission. Don’t write to the file system, change options, install packages, quit R, send information over the internet, open external software, etc.

•You must not use :::

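If you have followed along carefully, you should be familiar with everything above; the author mentions these points elsewhere too.

8.Post-release and Publicising

This part is basically about git tag and about where to publicise the package: mainly twitter, blogs, mailing lists and the package's own home page (github pages).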

superdesolator

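101.PKD: Style guide

This post is about style. The author also covers it in his Advanced R book; the two books are separate, so he probably has to write it twice. Here is a brief summary of what to watch for in good R style:

1) File names :

They should be meaningful and end in .R

# Good
fit-models.R
utility-functions.R
# Bad
foo.r
stuff.r

If files need to be run in sequence, prefix them with numbers:

0-download.R
1-parse.R
2-explore.R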

2)Object names:

Prefer underscores, and avoid . in names so that S3 method dispatch is not confused

# Good
day_one
day_1

# Bad
first_day_of_the_month
DayOne
dayone
djm1

Also avoid reusing the names of existing built-in objects

3)Syntax:

Put spaces around built-in operators (=, +, -, <-, etc.), including the = in function arguments. Put a space after a comma, never before

# Good
average <- mean(feet / 12 + inches, na.rm = TRUE)

# Bad
average<-mean(feet/12+inches,na.rm=TRUE)

The exceptions are :, :: and :::

Put a space before a left parenthesis, except in a function call, and do not put spaces inside parentheses or brackets

# Good
if (debug) do(x)
diamonds[5, ]

# Bad
if ( debug ) do(x)  # No spaces around debug
x[1,]   # Needs a space after the comma
x[1 ,]  # Space goes after comma not before

4)Curly braces

# Good

if (y < 0 && debug) {
  message("Y is negative")
}

if (y == 0) {
  log(x)
} else {
  y ^ x
}

# Bad

if (y < 0 && debug)
message("Y is negative")

if (y == 0) {
  log(x)
}
else {
  y ^ x
}

# Good
if (y < 0 && debug) message("Y is negative")

A single short statement may follow if on the same line; otherwise use braces and indent the body on the next line. Also note where else goes.

5)Indentation

long_function_name <- function(a = "a long argument",
                               b = "another argument",
                               c = "another long argument") {
  # As usual code is indented by two spaces.
}

6)Commenting guidelines

# Load data ---------------------------

# Plot data ---------------------------

PS: OK, that wraps up pkd for now.

superdesolator

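102. lubridate, a package for handling dates and times

A brief introduction: this package makes date-related work easier. Here is an example borrowed from elsewhere. Suppose we want to count how many Fridays there are in the month of a given date; the code below does that.

The key is what functions such as months, days, wday and day do.

I find it practical and intuitive, so I recommend it to anyone who doesn't know it yet.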

library(lubridate)

fridays <- function(the_day) {
  day(the_day) <- 1    # move to the first day of the month
  the_month <- seq(the_day, the_day + months(1) - days(1), by = 1)
  sum(wday(the_month) == 6)    # by default wday() counts from Sunday = 1, so 6 is Friday
}

That returns the number of Fridays in the month of a Date object given as arg:

> fridays(as.Date("2012-01-01"))
[1] 4
> fridays(as.Date("2012-02-01"))
[1] 4
> fridays(as.Date("2012-03-01"))
[1] 5
> fridays(as.Date("2012-04-01"))
[1] 4
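For comparison, roughly the same count can be done in base R without lubridate. This is an alternative sketch of mine, not the code from the example above; fridays_base is a made-up name:

```r
# Base-R variant (an alternative sketch, not the lubridate code above).
fridays_base <- function(the_day) {
  first <- as.Date(format(the_day, "%Y-%m-01"))            # first day of the month
  last <- seq(first, by = "month", length.out = 2)[2] - 1  # last day of the month
  month_days <- seq(first, last, by = "day")
  sum(as.POSIXlt(month_days)$wday == 5)    # POSIXlt counts from Sunday = 0, so 5 is Friday
}

counts <- vapply(
  c("2012-01-01", "2012-02-01", "2012-03-01", "2012-04-01"),
  function(d) fridays_base(as.Date(d)),
  integer(1)
)
print(counts)
```

The lubridate version reads more naturally, which is rather the point of the package.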

superdesolator

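103. A look at the search path

I needed this knowledge again recently. I'm not sure whether the earlier posts already covered it, so here is the link again

Click me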
