treeRegFit
Fit a Regression Tree Model from Spotfire

Description

Fits a regression tree model from Spotfire using the supplied formula and data, and returns summary statistics and data for visualizations of the fit. This function is not intended to be called by the user.

Usage

treeRegFit(formula, data, modelName = NULL, minsplit,
    maxdepth, cp, xval)

Arguments

formula a formula object. The response variable, specified as a single numeric variable, must be on the left of a tilde (~) operator, and the terms, separated by plus sign (+) operators, must be on the right.
data a data frame containing the variables named in the formula.
modelName a character string containing the name of the model in Spotfire.
minsplit the minimum number of observations that must exist in a node for a split to be attempted.
maxdepth the maximum depth of any node of the final tree, with the root node counted as depth 0. If set above 30, arbor returns nonsense results.
cp the complexity parameter. Any split that does not decrease the overall lack of fit by a factor of cp is not attempted. For instance, with anova splitting, this means that the overall R-squared must increase by cp at each step. The main role of this parameter is to save computing time by pruning off splits that are obviously not worthwhile. Essentially, the user informs the program that any split that does not improve the fit by cp would likely be pruned off by cross-validation, so the program need not pursue it. If cp is given a positive value, mindev is set to -1.0.
xval an integer number representing the size of the cross-validation groups.
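
The cp rule can be illustrated with a small worked calculation in base R. This is only a sketch: the response values, the candidate split, and the cp value are made up, and it assumes the improvement is measured as the drop in sum of squares relative to the root node, as the R-squared wording above suggests. It does not call arbor.

```r
# Toy response values and one candidate split (both made up for illustration).
y <- c(1, 2, 3, 10, 11, 12)
left <- y[1:3]
right <- y[4:6]

# Sum of squared deviations from the mean: the anova lack-of-fit measure.
ss <- function(v) sum((v - mean(v))^2)

# Gain in R-squared from the split, relative to the root node's lack of fit.
improvement <- (ss(y) - ss(left) - ss(right)) / ss(y)

# Hypothetical complexity parameter; a split is attempted only if the
# improvement reaches it.
cp <- 0.01
worthTrying <- improvement >= cp

print(improvement)
print(worthTrying)
```

Here the split removes almost all of the variation (an improvement of about 0.97), so it clears any reasonable cp threshold.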

Details

This function gets invoked when a regression tree model is fit in Spotfire from the Tools menu.

Value

a list with components:
modelObj the object created by calling arbor using the formula and data. The object is converted to a "raw" binary object with the function SObjectToBlob from the SpotfireUtils package so that it can be stored in Spotfire. To convert it back to the original arbor object, use the BlobToSObject function, also in the SpotfireUtils package.
fitSummaryTable a single-column data frame containing a summary of the tree model. The first row of the data frame contains the name of the response column in data. The second row contains the names of the predictor columns from data in a single comma-separated string.
fitPlotData a data frame containing the residuals and fitted values.
fitPlotDesc a character matrix containing a description of the visualizations that can be created in Spotfire using the data in fitPlotData. The columns of the matrix are:
    MenuName the text to appear in the Spotfire menu.
    PlotType the type of visualization to create.
    Xdatatable the name of the data table for the x-axis variable. If the data table is generated by this function (i.e., fitPlotData), the name will have the prefix modelName_.
    Xcolumn the name of the x-axis column in Xdatatable.
    Ydatatable the name of the data table for the y-axis variable. If the data table is generated by this function (i.e., fitPlotData), the name will have the prefix modelName_.
    Ycolumn the name of the y-axis column in Ydatatable.
    Title the title for the visualization.
varImpTable a two-column data frame with columns Variable and VarImportance. The value of VarImportance is the variable.importance value from the arbor object.
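
To make the shapes of fitSummaryTable and fitPlotDesc more concrete, the following base R sketch builds look-alike objects by hand. It does not call treeRegFit. The column name Summary, the ", " separator, the table name suffix, and every cell value in the matrix are placeholders chosen for illustration; only the seven matrix column names and the modelName_ prefix come from the description above.

```r
# Hypothetical inputs, matching the example call on this page.
f <- ozone ~ wind + temperature
modelName <- "treeExample"

# Rows 1 and 2 of a fitSummaryTable-like data frame: the response name,
# then the predictor names joined into one comma-separated string.
response <- all.vars(f)[1]
predictors <- paste(attr(terms(f), "term.labels"), collapse = ", ")
summarySketch <- data.frame(Summary = c(response, predictors),
                            stringsAsFactors = FALSE)

# One row of a fitPlotDesc-like character matrix. The data table name
# carries the modelName_ prefix; all cell values are placeholders.
plotTable <- paste0(modelName, "_", "fitPlotData")
descSketch <- matrix(
  c("Residuals vs Fitted", "Scatter Plot", plotTable, "Fitted",
    plotTable, "Residuals", "Residuals vs Fitted Values"),
  nrow = 1,
  dimnames = list(NULL, c("MenuName", "PlotType", "Xdatatable", "Xcolumn",
                          "Ydatatable", "Ycolumn", "Title")))

print(summarySketch)
print(descSketch)
```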

See Also

treeRegEvaluate, treeRegPredict, arbor

Examples

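# Fit a regression tree to the air data set from the Sdatasets package.
ztree <- treeRegFit(ozone ~ wind + temperature, data=Sdatasets::air,
    modelName="treeExample")
# List the components of the returned list.
names(ztree)
# Show the second component of the list.
ztree[[2]]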
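
The modelObj component is stored as a raw binary object and must be converted back before use. The round trip can be sketched in base R with serialize and unserialize, which are used here only as an analogue; this page does not say how SObjectToBlob and BlobToSObject work internally, and the lm fit is a stand-in for an arbor object.

```r
# Stand-in for a fitted model; lm() is used only because it ships with base R.
fit <- lm(dist ~ speed, data = cars)

# Base R analogue of SObjectToBlob: serialize the object to a raw vector.
blob <- serialize(fit, connection = NULL)

# Base R analogue of BlobToSObject: rebuild the object from the raw vector.
restored <- unserialize(blob)

# The blob is raw, and the restored model has the same coefficients.
print(is.raw(blob))
print(all.equal(coef(fit), coef(restored)))
```

With the SpotfireUtils package available, the corresponding step for the example above would presumably be a call to BlobToSObject on ztree$modelObj; its exact argument list is not shown on this page.
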
Package SpotfireStats version 6.1.1-7