arbor(formula, data = environment(formula), weights, subset, na.action = na.arbor, method, CppFunctions = list(), model = FALSE, x = FALSE, y = TRUE, parms, control, cost, nRandomSplitVars = 0, ...)
formula | a formula expression as for other regression models, of the form response ~ predictors, but interaction terms are not allowed. For details, see the documentation for lm and formula. |
data | an optional data frame in which to interpret the variables named in the formula. |
weights | optional weights. |
subset | an optional expression specifying that only a subset of the rows of the data should be used in the fit. |
na.action | a function that handles missing values. The default, na.arbor, deletes all observations for which y is missing, but keeps those in which one or more predictors are missing. |
method | the splitting method. The "exp" and "longitudinal" methods are not implemented in this version of arbor. |
CppFunctions | not implemented in this version of arbor. Custom split, eval, and error functions cannot be specified; the method determines these functions. |
model | keep a copy of the model frame in the result. If the input value for model is a model frame (likely from an earlier call to the arbor function), then this frame is used rather than constructing new data. |
x | keep a copy of the x matrix in the result. |
y | keep a copy of the dependent variable in the result. |
parms |
optional list of parameters for the splitting function.
The anova method has no parameters. For Poisson splitting, the list can include the coefficient of variation of the prior distribution on the rates (component shrink) and an error method (component method). method can be either "deviance" or "sqrt" and defaults to "deviance". shrink can be any positive numeric value; its default is 1 when method="deviance" and 0 when method="sqrt".
For classification splitting, the list can contain any of: the vector of prior probabilities (component prior), the loss matrix (component loss) or the splitting index (component split). The priors must be positive and sum to 1. The loss matrix must have zeros on the diagonal and positive off-diagonal elements. The splitting index can be "gini" or "information". The default priors are proportional to the data counts, the losses default to 1, and the split defaults to "gini". |
control | options that control details of the arbor algorithm. |
cost | optional vector of variable costs, one value per predictor. When the primary split variable is chosen, each variable's improvement is divided by its cost; this modified improvement is used to rank the variables and is the value listed in the output. Values must be greater than 0; the default value is 1. Costs are not used in defining surrogate splits. |
nRandomSplitVars | Number of variables to sample at each tree node as candidates for splitting. Default value of 0 means all variables are candidate split variables at a node. A value > 0 gives a randomized tree. |
... | arguments to arbor.control may also be specified in the call to arbor. These arguments are overridden by settings given for the control argument. |
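The constraints on the classification parms components described above (prior, loss, split) can be checked before calling arbor. The following is a minimal base R sketch under that assumption only: the helper check_class_parms is hypothetical and is not part of arbor, and arbor may validate its input differently.

```r
# Hypothetical helper (not part of arbor): checks a classification parms list
# against the constraints stated in the parms argument description.
check_class_parms <- function(parms, n_classes) {
  if (!is.null(parms$prior)) {
    # Priors must be positive and sum to 1, one value per class.
    stopifnot(length(parms$prior) == n_classes,
              all(parms$prior > 0),
              isTRUE(all.equal(sum(parms$prior), 1)))
  }
  if (!is.null(parms$loss)) {
    loss <- parms$loss
    # Loss matrix: zeros on the diagonal, positive off-diagonal elements.
    stopifnot(is.matrix(loss),
              all(dim(loss) == n_classes),
              all(diag(loss) == 0),
              all(loss[row(loss) != col(loss)] > 0))
  }
  if (!is.null(parms$split)) {
    # Splitting index must be one of the two documented values.
    stopifnot(parms$split %in% c("gini", "information"))
  }
  invisible(TRUE)
}

# Illustrative values for a two-class response.
parms_ok <- list(prior = c(0.65, 0.35),
                 loss  = matrix(c(0, 1, 2, 0), nrow = 2),
                 split = "information")
check_class_parms(parms_ok, n_classes = 2)
```

A list that passes this check could then be supplied as the parms argument; a list with, for example, priors that do not sum to 1 stops with an error instead.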
library("arbor")
data(kyphosis, package = "Sdatasets")
fit <- arbor(Kyphosis ~ Age + Number + Start, data = kyphosis)
fit2 <- arbor(Kyphosis ~ Age + Number + Start, data = kyphosis,
              parms = list(prior = c(.65, .35), split = 'information'))
fit3 <- arbor(Kyphosis ~ Age + Number + Start, data = kyphosis,
              control = arbor.control(cp = .05))
print(fit)
print(fit2)

# return the model frame and use it in a new fit
data(lung, package = "Sdatasets")
fit4 <- arbor(cbind(time, status) ~ inst + age + sex + ph.ecog + ph.karno +
              pat.karno + meal.cal + wt.loss,
              method = "poisson", data = lung, model = TRUE)
fit5 <- arbor(model = fit4$model, method = fit4$method, cp = 0.001, xval = 0)
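The effect of the cost and nRandomSplitVars arguments on the choice of primary split variable can be illustrated without fitting a tree. The sketch below is base R only and makes several assumptions: the improvement values are invented for illustration, the helper rank_candidates is hypothetical and not part of arbor, and the order in which arbor samples candidates and applies costs internally is assumed, not documented here.

```r
# Illustrative numbers only: raw improvements for three predictors at one node.
improvement <- c(Age = 4.0, Number = 3.0, Start = 6.0)
cost        <- c(Age = 1,   Number = 0.5, Start = 3)   # all values must be > 0

# Hypothetical helper (not part of arbor): ranks candidate split variables
# by improvement divided by cost, optionally sampling candidates first.
rank_candidates <- function(improvement, cost, nRandomSplitVars = 0) {
  stopifnot(all(cost > 0), length(cost) == length(improvement))
  vars <- names(improvement)
  # 0 means every variable is a candidate; a value > 0 samples that many.
  if (nRandomSplitVars > 0) {
    vars <- sample(vars, min(nRandomSplitVars, length(vars)))
  }
  adjusted <- improvement[vars] / cost[vars]
  sort(adjusted, decreasing = TRUE)
}

# All variables are candidates.
rank_candidates(improvement, cost)

# Two randomly chosen candidates, as in a randomized tree.
set.seed(1)
rank_candidates(improvement, cost, nRandomSplitVars = 2)
```

With these numbers the cost-adjusted improvements are 4 for Age, 6 for Number, and 2 for Start, so Number ranks first even though Start has the largest raw improvement.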