graceli
Several add-on packages implement ideas and methods developed at the borderline between computer science and statistics; this field of research is usually referred to as machine learning. The packages can be roughly structured into the following topics:
*Neural Networks : Single-hidden-layer neural networks are implemented in package nnet as part of the VR bundle (shipped with base R).
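As a minimal sketch, a single-hidden-layer network can be fitted with nnet() on the iris data; the number of hidden units (size = 2) is an arbitrary choice for illustration:

```r
library(nnet)

set.seed(1)  # nnet starts from random weights, so fix the seed
# Fit a single-hidden-layer network with 2 hidden units
fit <- nnet(Species ~ ., data = iris, size = 2, trace = FALSE)
# Predicted classes for the training data
pred <- predict(fit, iris, type = "class")
table(pred, iris$Species)
```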
*Recursive Partitioning : Tree-structured models for regression, classification and survival analysis, following the ideas in the CART book, are implemented in rpart (shipped with base R) and tree. Package rpart is recommended for computing CART-like trees. A rich toolbox of partitioning algorithms is available in Weka; package RWeka provides an interface to this implementation, including the J4.8 variant of C4.5 and M5.
Two recursive partitioning algorithms with unbiased variable selection and a statistical stopping criterion are implemented in package party. Function ctree() is based on non-parametric conditional inference procedures for testing independence between the response and each input variable, whereas mob() can be used to partition parametric models. Extensible tools for visualizing binary trees and node distributions of the response are available in package party as well.
An adaptation of rpart for multivariate responses is available in package mvpart. The validity of trees can be investigated via permutation approaches with package rpart.permutation, and a tree algorithm fitting nearest neighbors in each node is implemented in package knnTree. For problems with binary input variables the package LogicReg implements logic regression. Graphical tools for the visualization of trees are available in packages maptree and pinktoe.
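A CART-like classification tree can be grown with rpart on the kyphosis data that ships with the package (a minimal sketch):

```r
library(rpart)

# Grow a classification tree on the kyphosis data shipped with rpart
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
             method = "class")
printcp(fit)  # complexity-parameter table, useful for pruning decisions
pred <- predict(fit, type = "class")
```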
*Random Forests : The reference implementation of the random forest algorithm for regression and classification is available in package randomForest. Package ipred provides bagging for regression, classification and survival analysis as well as bundling, a combination of multiple models via ensemble learning. In addition, a random forest variant based on conditional inference trees is implemented in package party. The varSelRF package focuses on variable selection by means of random forest algorithms.
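A sketch of a randomForest fit follows; since randomForest is a CRAN package and not part of base R, the call is guarded with requireNamespace(), and ntree = 100 is an arbitrary illustration value:

```r
# Hedged sketch: randomForest is a CRAN package, not part of base R,
# so the call is guarded with requireNamespace().
rf <- if (requireNamespace("randomForest", quietly = TRUE)) {
  set.seed(1)
  randomForest::randomForest(Species ~ ., data = iris, ntree = 100)
} else NULL
if (!is.null(rf)) print(rf$confusion)  # out-of-bag confusion matrix
```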
*Regularized and Shrinkage Methods : Regression models with some constraint on the parameter estimates can be fitted with the lasso2 and lars packages. The solutions for all values of the shrinkage parameter can be simultaneously computed using the functionality in package elasticnet. The L1 regularization path for generalized linear models and Cox models can be obtained from functions available in package glmpath. The shrunken centroids classifier and utilities for gene expression analyses are implemented in package pamr.
*Boosting : Various forms of gradient boosting are implemented in packages gbm (tree-based functional gradient descent boosting) and boost (including LogitBoost and L2Boost). Package GAMBoost can be used to fit generalized additive models by a boosting algorithm. An extensible boosting framework for generalized linear, additive and nonparametric models is available in package mboost.
*Support Vector Machines : The function svm() from e1071 offers an interface to the LIBSVM library and package kernlab implements a flexible framework for kernel learning (including SVMs, RVMs and other kernel learning algorithms). An interface to the SVMlight implementation (only for one-against-all classification) is provided in package klaR.
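A minimal sketch of svm() from e1071 on the iris data (the package is on CRAN and not part of base R, so the call is guarded; the radial kernel is e1071's default and is spelled out only for clarity):

```r
# Hedged sketch: e1071 is a CRAN package; the call is guarded accordingly.
m <- if (requireNamespace("e1071", quietly = TRUE)) {
  e1071::svm(Species ~ ., data = iris, kernel = "radial")
} else NULL
if (!is.null(m)) print(table(predict(m, iris), iris$Species))
```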
*Bayesian Methods : Bayesian Additive Regression Trees (BART), where the final model is defined in terms of the sum over many weak learners (not unlike ensemble methods), are implemented in package BayesTree.
*Optimization using Genetic Algorithms : Packages gafit and rgenoud offer optimization routines based on genetic algorithms.
*Association Rules : Package arules provides both data structures for efficient handling of sparse binary data as well as interfaces to implementations of Apriori and Eclat for mining frequent itemsets, maximal frequent itemsets, closed frequent itemsets and association rules.
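A small sketch of Apriori mining with arules, using the Adult transaction data set shipped with the package; the support and confidence thresholds below are illustration values only:

```r
# Hedged sketch: arules is a CRAN package; Adult is a sparse
# transaction data set shipped with it.
rules <- if (requireNamespace("arules", quietly = TRUE)) {
  library(arules)
  data(Adult)
  apriori(Adult, parameter = list(supp = 0.5, conf = 0.9))
} else NULL
if (!is.null(rules)) inspect(rules[1:3])  # show the first few rules
```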
*Model selection and validation : Package e1071 has function tune() for hyperparameter tuning and function errorest() (ipred) can be used for error rate estimation. The cost parameter C for support vector machines can be chosen utilizing the functionality of package svmpath. Functions for ROC analysis and other visualisation techniques for comparing candidate classifiers are available from package ROCR.
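A guarded sketch of hyperparameter tuning with tune() from e1071, searching a small cost grid for an SVM; the grid values are arbitrary illustration choices:

```r
# Hedged sketch: tune() from e1071 performs a cross-validated grid
# search; the cost grid below is an arbitrary illustration.
tuned <- if (requireNamespace("e1071", quietly = TRUE)) {
  set.seed(1)
  e1071::tune(e1071::svm, Species ~ ., data = iris,
              ranges = list(cost = c(0.1, 1, 10)))
} else NULL
if (!is.null(tuned)) print(tuned$best.parameters)
```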
*Elements of Statistical Learning : Data sets, functions and examples from the book The Elements of Statistical Learning: Data Mining, Inference, and Prediction by Trevor Hastie, Robert Tibshirani and Jerome Friedman have been packaged and are available as ElemStatLearn.
neige
I am looking for the R code for the book
Trevor Hastie, Robert Tibshirani, Jerome Friedman. The elements of statistical learning. Springer, 2001
Does anyone have it?