Building a GO annotation pipeline with R

wuxian2r · June 3, 2012

Hello everyone,

Is it possible to build a GO annotation pipeline in R? I would mainly like it to provide the following:

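I. Input:

1. Support for common database identifiers as input, such as protein GI numbers, protein names, UniProt IDs, gene GI numbers, gene names, and gene IDs;

2. Support for gene or protein sequence input, similar to Blast2GO;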

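II. Output:

1. The mapping between gene products and GO terms;

2. GO classification charts such as pie charts, similar to the classification plots that WEGO produces;

3. GO enrichment analysis.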

Any advice from the experts here would be much appreciated.

Thanks~

foshuochanyu · June 4, 2012

The "topGO" package in Bioconductor is suitable for these jobs, especially the GO analysis and category plotting (the "Output" items in your query). For the "Input" side, gene name conversion can easily be done with web-based tools such as "gene converter" (http://idconverter.bioinfo.cnio.es/). R packages such as "biomaRt" are also worth recommending if you are familiar with the R language. For gene/protein sequence queries, I have not found an R package that offers this kind of function. My suggestion is to convert each sequence to its gene name first, and then use the tools mentioned above.

Since GO analysis is quite a hot topic, if anyone has better ideas, please share them with me. Thanks.
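For anyone wondering what the "GO enrichment analysis" step actually computes, here is a minimal sketch of the idea, written in plain Python only so that it runs without installing anything. It is not topGO's code and not its API: it just applies a one-sided hypergeometric test to each GO term, which is the usual basic form of an enrichment test. The function names (`enrichment_p_value`, `enrich`), the gene names, and the toy gene-to-GO mapping are all made up for illustration, and no multiple-testing correction is applied.

```python
from math import comb


def enrichment_p_value(population, annotated, sample, hits):
    """Upper-tail hypergeometric probability P(X >= hits).

    population: number of genes in the background set
    annotated:  background genes annotated with the GO term
    sample:     number of genes in the study list
    hits:       study genes annotated with the GO term
    """
    max_hits = min(annotated, sample)
    total = comb(population, sample)
    # math.comb returns 0 when k > n, so impossible outcomes drop out.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, max_hits + 1)
    )
    return tail / total


def enrich(gene2go, study_genes):
    """Return {GO term: p-value} for every term seen in the study list.

    gene2go is a hypothetical mapping {gene name: set of GO IDs};
    its keys are treated as the background population.
    """
    population = len(gene2go)
    # Keep only study genes that have annotations, without duplicates.
    study = sorted({g for g in study_genes if g in gene2go})
    terms = {t for g in study for t in gene2go[g]}
    results = {}
    for term in terms:
        annotated = sum(1 for go_ids in gene2go.values() if term in go_ids)
        hits = sum(1 for g in study if term in gene2go[g])
        results[term] = enrichment_p_value(population, annotated, len(study), hits)
    return results


# Toy data (invented): six background genes, three of them in the study list.
toy_gene2go = {
    "geneA": {"GO:0006915"},
    "geneB": {"GO:0006915"},
    "geneC": {"GO:0006915", "GO:0008152"},
    "geneD": {"GO:0008152"},
    "geneE": {"GO:0008152"},
    "geneF": {"GO:0008152"},
}
toy_study = ["geneA", "geneB", "geneC"]

if __name__ == "__main__":
    for term, p in sorted(enrich(toy_gene2go, toy_study).items()):
        print(term, p)
```

In a real pipeline the mapping would come from an ID conversion step like the ones mentioned above, and the resulting p-values would still need adjusting for the number of terms tested.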