arbor
Recursive Partitioning and Regression Trees

Description

Produces an object of class "arbor", which is a fit of a tree model to the data.

Usage

arbor(formula, data = environment(formula), weights, subset, 
   na.action = na.arbor, method, CppFunctions = list(), 
   model = FALSE, x = FALSE, y = TRUE, parms, control, cost, 
   nRandomSplitVars = 0, ...)

Arguments

formula a formula expression as for other regression models, of the form response ~ predictors, but interaction terms are not allowed. For details, see the documentation for lm and formula.
data an optional data frame in which to interpret the variables named in the formula.
weights optional weights.
subset an optional expression specifying that only a subset of the rows of the data should be used in the fit.
na.action the default action deletes all observations for which y is missing, but keeps those in which one or more predictors are missing.
method one of
  • "anova"
  • "poisson"
  • "class"
If method is missing, then the routine tries to make an intelligent guess. If y is a factor, then method = "class" is assumed; if y has multiple columns, then method = "poisson" is assumed; otherwise, method = "anova" is assumed.

The "exp" method and "longitudinal" method are not implemented in this version of arbor.
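
The guessing rule described above can be sketched in base R. The helper guess_method is hypothetical and is not part of the arbor package; it only mirrors the documented rule, and it assumes that "multi-column input" means a response with more than one column.

```r
# Hypothetical helper (not part of arbor): mirrors the documented rule
# that is applied when `method` is missing.
guess_method <- function(y) {
  if (is.factor(y)) {
    "class"                                   # factor response
  } else if (is.matrix(y) && ncol(y) > 1) {
    "poisson"                                 # multi-column response
  } else {
    "anova"                                   # everything else
  }
}

guess_method(factor(c("absent", "present")))            # "class"
guess_method(cbind(time = c(5, 8), status = c(1, 0)))   # "poisson"
guess_method(c(1.2, 3.4))                               # "anova"
```

Passing method explicitly avoids relying on the guess.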

CppFunctions not implemented in this version of arbor. The split, eval, and error functions cannot be specified; they are determined by method.

model keep a copy of the model frame in the result. If the input value for model is a model frame (likely from an earlier call to the arbor function), then this frame is used rather than constructing new data.
x keep a copy of the x matrix in the result.
y keep a copy of the dependent variable in the result.
parms optional list of parameters for the splitting function.

The anova method has no parameters.

For Poisson splitting, the list components can include the coefficient of variation of the prior distribution on the rates (component shrink), and an error method (component method). method can be either "deviance" or "sqrt". method defaults to "deviance". shrink can be any positive numeric value. The default for shrink is 1 when method="deviance" and 0 when method="sqrt".

For classification splitting, the list can contain any of: the vector of prior probabilities (component prior), the loss matrix (component loss) or the splitting index (component split). The priors must be positive and sum to 1. The loss matrix must have zeros on the diagonal and positive off-diagonal elements. The splitting index can be "gini" or "information". The default priors are proportional to the data counts, the losses default to 1, and the split defaults to "gini".
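
The constraints on the classification parms list can be checked before fitting. The sketch below uses base R only; check_class_parms is a hypothetical helper, not part of arbor, and the requirement that prior and loss match the number of classes is an assumption added for illustration.

```r
# Hypothetical validator (not part of arbor) for the documented constraints
# on the classification `parms` list.
check_class_parms <- function(parms, n_classes) {
  prior <- parms$prior
  if (!is.null(prior)) {
    stopifnot(length(prior) == n_classes,          # assumption: one prior per class
              all(prior > 0),                      # priors must be positive
              isTRUE(all.equal(sum(prior), 1)))    # ... and sum to 1
  }
  loss <- parms$loss
  if (!is.null(loss)) {
    stopifnot(is.matrix(loss),
              all(dim(loss) == n_classes),         # assumption: square, one row per class
              all(diag(loss) == 0),                # zeros on the diagonal
              all(loss[row(loss) != col(loss)] > 0))  # positive off-diagonal
  }
  split <- parms$split
  if (!is.null(split)) {
    stopifnot(split %in% c("gini", "information"))
  }
  invisible(TRUE)
}

parms <- list(prior = c(0.65, 0.35),
              loss  = matrix(c(0, 1, 2, 0), nrow = 2),
              split = "information")
check_class_parms(parms, n_classes = 2)
```

A list that passes this check can then be supplied as the parms argument together with method = "class".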

control options that control details of the arbor algorithm.
cost optional vector of variable costs, one value per predictor. In choosing the primary split variable, each variable's improvement is divided by its cost; this modified improvement is what is used to rank the variables, and what will be listed in the output. Values must be greater than 0; the default value is 1. Costs are not used in defining surrogate splits.
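
The effect of cost on the ranking of candidate split variables can be illustrated with plain arithmetic. The improvement and cost values below are made-up numbers for illustration, not output from arbor.

```r
# Illustrative numbers only; these are not produced by arbor.
improvement <- c(Age = 4.0, Number = 3.0, Start = 6.0)
cost        <- c(Age = 1,   Number = 0.5, Start = 3)
stopifnot(all(cost > 0))             # costs must be greater than 0

# Each variable's improvement is divided by its cost ...
adjusted <- improvement / cost       # Age = 4, Number = 6, Start = 2

# ... and the variables are ranked on the modified improvement.
ranking <- names(sort(adjusted, decreasing = TRUE))
ranking                              # "Number" "Age" "Start"
```

Here Start has the largest raw improvement but ranks last once its cost of 3 is taken into account.
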
nRandomSplitVars number of variables to sample at each tree node as candidates for splitting. The default value of 0 means that all variables are candidate split variables at every node. A value greater than 0 gives a randomized tree.
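
The per-node sampling implied by nRandomSplitVars can be sketched as follows. candidate_vars is a hypothetical helper, not part of arbor; sampling without replacement and capping the sample size at the number of predictors are assumptions made for this sketch.

```r
# Hypothetical helper (not part of arbor): choose candidate split variables
# for a single node.
candidate_vars <- function(predictors, nRandomSplitVars = 0) {
  if (nRandomSplitVars <= 0) {
    return(predictors)               # 0 means every variable is a candidate
  }
  # Assumption: sample without replacement, never more than are available.
  sample(predictors, size = min(nRandomSplitVars, length(predictors)))
}

predictors <- c("Age", "Number", "Start")
candidate_vars(predictors)                         # all three variables
set.seed(1)                                        # reproducible draw
candidate_vars(predictors, nRandomSplitVars = 2)   # a random pair
```

Because a fresh sample is drawn at each node, repeated fits with nRandomSplitVars > 0 can produce different trees unless the random seed is fixed.
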
... arguments to arbor.control may also be specified in the call to arbor. These arguments are overridden by settings given for the control argument.

Value

returns an object of class "arbor".

References

Atkinson, E. J. and Therneau, T. M. 2000. An Introduction to Recursive Partitioning Using the RPART Routines. Palo Alto, CA: Stanford Univ.
Breiman, L., et al. 1984. Classification and Regression Trees. Monterey, CA: Wadsworth and Brooks/Cole.
Breiman, L. 2001. Random Forests. Berkeley, CA: University of California Statistics Dept. Tech. Report.
Breiman, L. 2001. Statistical modeling: The two cultures. Statistical Science. Volume 16, No. 3, 199-231.
Hastie, T., Tibshirani, R., and Friedman, J. 2001. The Elements of Statistical Learning. New York, NY: Springer.

See Also

arbor.control, arbor.object, summary.arbor, print.arbor

Examples
library("arbor")
data(kyphosis, package = "Sdatasets")
fit <- arbor(Kyphosis ~ Age + Number + Start, data=kyphosis)
fit2 <- arbor(Kyphosis ~ Age + Number + Start, data=kyphosis,
    parms=list(prior=c(.65, .35), split='information'))
fit3 <- arbor(Kyphosis ~ Age + Number + Start, data=kyphosis,
    control=arbor.control(cp=.05))

print(fit)
print(fit2)

# return the model frame and use it in a new fit
data(lung, package = "Sdatasets")
fit4 <- arbor(cbind(time, status) ~ inst + age + sex + ph.ecog + ph.karno +
    pat.karno + meal.cal + wt.loss, method = "poisson", data = lung,
    model = TRUE)
fit5 <- arbor(model = fit4$model, method = fit4$method, cp = 0.001, xval = 0)

Package arbor version 6.1.1-7