frame |
data frame with one row for each node in the tree.
The row.names of frame contain the (unique) node numbers that
follow a binary ordering indexed by node depth.
Elements of frame include var,
the variable used in the split at each node
(leaf nodes are denoted by the string <leaf>),
n, the size of each node,
wt, the sum of the weights given to the observations in each node,
dev, the deviance of each node,
yval, the fitted value of the response at each node,
and splits, a two column matrix of left and right split labels
for each node.
All of these are the same as for a tree object.
Extra response information is in yval2. The first column of yval2 is the same as yval. For the poisson and exponential methods, the second column of yval2 contains the number of events at the node. For classification the rest of yval2 consists of matrices of class counts and probabilities. Each matrix includes a column for each class. The anova method does not have a yval2. Also included in the frame are complexity, the complexity parameter at which this split will collapse, ncompete, the number of competitor splits retained, and nsurrogate, the number of surrogate splits retained. |
where | a vector of the same length as the number of observations in the root node, containing the row number of frame corresponding to the leaf node that each observation falls into. |
splits | a matrix describing the splits. The row label is the name of the split variable, and columns are count, the number of observations sent left or right by the split (for competitor splits this is the number that would have been sent left or right had this split been used, for surrogate splits it is the number missing the primary split variable which were decided using this surrogate), ncat, the number of categories or levels for the variable (+/-1 for a continuous variable), improve, which is the improvement in deviance given by this split, or, for surrogates, the concordance of the surrogate with the primary, index, the numeric split point, and adj, a measure of how much the split gains over and above the naive rule. For a factor, the index column contains the row number of the csplit matrix. For a continuous variable, the sign of ncat determines whether the subset x<cutpoint or x>cutpoint is sent to the left. |
csplit | this will be present only if one of the split variables is a factor. There is one row for each such split, and column i = 1 if this level of the factor goes to the left, 3 if it goes to the right, and 2 if that level is not present at this node of the tree. For an ordered categorical variable all levels are marked as R/L, including levels that are not present. |
variable.importance | a named numeric vector with a measure of importance of each predictor in the model. This is only present if there are any splits. When printed by summary.arbor, the values are scaled as a percentage. |
method | the method used to grow the tree. |
cppFunctions | the names of the C or C++ functions used in partitioning. |
cptable | the table of optimal prunings based on a complexity parameter. |
terms | an object of mode expression and class term summarizing the formula. Used by various methods, but typically not of direct relevance to users. |
control | a named list of control parameters. control is described in help files for arbor and arbor.control. |
parms | a vector of parameters used by the splitting function. The meaning of parms varies by method. See the manual for details of usage and a description of the output. |
ordered | a vector of logicals describing whether the predictors are ordered factors. |
functions | a list of functions such as summary and print used by the different methods. See the manual for more details. |
call | an image of the call that produced the object, but with the arguments all named and with the actual formula included as the formula argument. To re-evaluate the call, say update(tree). |
seed | if cross-validation groups are created internally in arbor, then the random seed setting used for this will be returned. |
Optional components include the matrix of predictors (x), the response vector (y) and the model frame (model). If none of these is requested explicitly by setting the corresponding argument to TRUE, the response variable (y) will be returned by default. The model frame from one run can be used as input to a future run of the arbor function; see the examples in the arbor help file. |
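
The relationship between the row names of frame and the where vector can be illustrated with a small language-neutral sketch, written here in Python rather than R. It rests on an assumption not stated in this table: that the "binary ordering" of node numbers means the root is node 1 and the children of node k are 2k and 2k+1, so that a node's depth follows from its number. The node numbers and where values below are made up for illustration and do not come from a fitted arbor object.

```python
# Hypothetical helpers; none of these names are part of arbor.
# ASSUMPTION: root is node 1 and node k has children 2k and 2k+1.

def node_depth(node: int) -> int:
    """Depth of a node under the assumed numbering (root has depth 0)."""
    return node.bit_length() - 1

def children(node: int) -> tuple[int, int]:
    """Node numbers of the left and right child under the assumed numbering."""
    return 2 * node, 2 * node + 1

def parent(node: int) -> int:
    """Node number of the parent (not meaningful for the root)."""
    return node // 2

# Made-up row names of frame for a tree with three leaves (nodes 4, 5, 3).
frame_nodes = [1, 2, 4, 5, 3]

# Made-up where vector: for each observation, the 1-based row number
# of frame that holds the leaf the observation falls into.
where = [3, 4, 5, 3, 5, 4]

# Translate frame row numbers into leaf node numbers.
leaf_of_obs = [frame_nodes[row - 1] for row in where]

# Depth of every node listed in frame.
depths = {node: node_depth(node) for node in frame_nodes}

print(leaf_of_obs)
print(depths)
```

The point of the sketch is that where indexes rows of frame, not node numbers, so a lookup through the row names is needed before node-number arithmetic such as parent or depth can be applied.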