When creating a Data Flow, you can easily run predictive analytics on your data sets using Machine Learning functions, without prior knowledge of advanced statistics.
Train and run multiple iterations of predictive models in parallel, evaluate and compare the results, and select which model you want to save. Then you can re-run your saved model against new data sets.
Note: For more information, see the TIBCO WebFOCUS Installation and Configuration manual for your platform.
After you create a Data Flow, you can select from different model algorithms to run against your data set.
The Data Flow page opens, as shown in the following image.
Note: Double-click a data source to display sample data.
The Train Models panel opens, as shown in the following image.
The following models display within the Train Models panel:
Now you can select a model to train and run against your data.
Binary Classification models predict binary values using four different algorithms: Random Forest, K-Nearest-Neighbors, Logistic Regression, and Extreme Gradient Boosting.
Note: When running the Binary Classification model algorithms, smaller data files may not generate a model. Larger files are recommended for best results.
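As an illustration only, the following sketch shows what these four algorithms conceptually do when trained on the same predictors. It uses scikit-learn and xgboost with synthetic data; the libraries, data, and accuracy metric are assumptions for the example and are not the WebFOCUS implementation.

```python
# Illustrative sketch only -- not how WebFOCUS implements Binary Classification.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from xgboost import XGBClassifier  # Extreme Gradient Boosting

# Synthetic stand-in for a data set with a binary target and numeric predictors.
X, y = make_classification(n_samples=1000, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = {
    "Random Forest": RandomForestClassifier(random_state=0),
    "K-Nearest-Neighbors": KNeighborsClassifier(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Extreme Gradient Boosting": XGBClassifier(eval_metric="logloss"),
}

# Train each model on the same predictors and compare a simple evaluation metric.
for name, model in models.items():
    model.fit(X_train, y_train)
    print(name, accuracy_score(y_test, model.predict(X_test)))
```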
The Configure dialog box displays, as shown in the following image.
You can click the Target drop-down menu to select a different target. All numeric Field measures are selected as Predictors by default. You can add or remove Predictors by selecting or deselecting their check boxes.
Your selected model type appears on the dataflow canvas, as shown in the following image.
The Compare Model Evaluation Results dialog opens, as shown in the following image.
The model algorithms run in parallel, allowing you to easily compare results and determine which model is best. You can filter which model comparisons you want to see by selecting or deselecting the model check boxes.
Note: To re-open the Compare Model Evaluation Results dialog box, click the Compare icon on the canvas toolbar.
Your model data displays in the following tabs. You can select different model algorithm options from the model drop-down menu. The best model is selected by default.
Note: Feature Importances is available for the Random Forest model only.
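For context, feature importances of this kind can be computed from any fitted random forest. The minimal sketch below shows the idea with scikit-learn; the field names are invented for illustration and the code is not the product's implementation.

```python
# Minimal, illustrative sketch of Random Forest feature importances.
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=4, random_state=0)
predictors = ["price", "quantity", "discount", "cost"]  # hypothetical field names

forest = RandomForestClassifier(random_state=0).fit(X, y)

# Higher values indicate predictors that contributed more to the model's splits.
print(pd.Series(forest.feature_importances_, index=predictors)
        .sort_values(ascending=False))
```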
Regression models predict numeric values using four different regression algorithms: Random Forest, K-Nearest-Neighbors, Polynomial Regression, and Extreme Gradient Boosting.
The Configure dialog box displays, as shown in the following image.
You can click the Target drop-down menu to select a different target. All numeric Field measures are selected as Predictors by default. You can add or remove Predictors by selecting or deselecting their check boxes.
Your selected model type appears on the dataflow canvas, as shown in the following image.
The Compare Model Evaluation Results dialog opens, as shown in the following image.
The model algorithms run in parallel, allowing you to easily compare results and determine which model is best. The best model has the lowest Root Mean Square Error value and a scatter plot with dots closest to the red line. You can filter which model comparisons you want to see by selecting or deselecting the model check boxes.
Note: To re-open the Compare Model Evaluation Results dialog box, click the Compare icon on the canvas toolbar.
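As an illustration of the selection criterion only, the following sketch fits the four regression algorithms on the same synthetic data and selects the one with the lowest Root Mean Square Error. The library choices (scikit-learn, xgboost), data, and train/test split are assumptions for the example, not the product's code.

```python
# Illustrative sketch: compare regression models by Root Mean Square Error (RMSE).
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from xgboost import XGBRegressor

X, y = make_regression(n_samples=1000, n_features=6, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = {
    "Random Forest": RandomForestRegressor(random_state=0),
    "K-Nearest-Neighbors": KNeighborsRegressor(),
    "Polynomial Regression": make_pipeline(PolynomialFeatures(degree=2),
                                           LinearRegression()),
    "Extreme Gradient Boosting": XGBRegressor(),
}

rmse = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    rmse[name] = mean_squared_error(y_test, model.predict(X_test)) ** 0.5

# The best model is the one with the lowest RMSE.
print(min(rmse, key=rmse.get), rmse)
```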
Your model data displays in the following tabs. You can select different model algorithm options from the model drop-down menu. The best model is selected by default.
Note: Feature Importances is available for the Random Forest model only.
Clustering models produce cluster assignments based on two different clustering algorithms: K-Means and BIRCH. K-Means uses the geometric properties of data points to assign them to a set number of clusters based on similarity. BIRCH is a hierarchical method that places data points in the same cluster if they are separated by less than a set threshold distance. Both clustering model types run at the same time with default hyperparameters.
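The following minimal scikit-learn sketch illustrates the two algorithms on synthetic data. The cluster count and threshold shown here are illustrative values, not the product's default hyperparameters.

```python
# Illustrative sketch of the two clustering algorithms (not the product's internals).
from sklearn.cluster import KMeans, Birch
from sklearn.datasets import make_blobs

# Synthetic numeric predictors standing in for the selected Field measures.
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

# K-Means: assigns each point to the nearest of k centroids.
kmeans_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# BIRCH: builds a hierarchy and merges points whose distance is below a threshold.
birch_labels = Birch(threshold=0.5, n_clusters=3).fit_predict(X)

print(kmeans_labels[:10], birch_labels[:10])
```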
The Configure dialog box displays, as shown in the following image.
All numeric Field measures are selected as Predictors by default. You can add or remove Predictors by selecting or deselecting their check boxes.
Your selected model type appears on the dataflow canvas, as shown in the following image.
Your model data displays according to your model algorithm. You can select the K-MEANS Clustering or BIRCH Clustering algorithm from the drop-down menu. The best algorithm is selected by default.
Your model data displays in the following tabs, using the K-Means algorithm.
Locations of individual observations are shown in the image below.
Your model data displays in the following tabs, using the BIRCH algorithm.
Locations of individual observations are shown in the image below.
Anomaly Detection models detect anomalies based on one algorithm: Isolation Forest, an ensemble method that isolates outliers using random decision trees.
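As an illustration only, the sketch below shows how Isolation Forest flags observations that are easy to isolate with random splits. It uses scikit-learn with synthetic data; the contamination value is an assumed example setting, not a product default.

```python
# Illustrative sketch of Isolation Forest anomaly detection.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
normal = rng.normal(0, 1, size=(200, 2))    # typical observations
outliers = rng.uniform(6, 8, size=(5, 2))   # obvious anomalies
X = np.vstack([normal, outliers])

# fit_predict() returns -1 for anomalies and 1 for normal observations.
labels = IsolationForest(contamination=0.05, random_state=0).fit_predict(X)
print(np.where(labels == -1)[0])  # indexes of the points flagged as anomalies
```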
The Configure dialog box displays, as shown in the following image.
All numeric Field measures are selected as Predictors by default. You can add or remove Predictors by selecting or deselecting their check boxes.
Your selected model type appears on the dataflow canvas, as shown in the following image.
Your model data displays in the following tabs, using the Isolation Forest model algorithm.
Time Series Forecasting models produce time-series forecasts based on one forecasting algorithm: Auto-SARIMA.
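For illustration only, the open-source pmdarima library performs a comparable automatic seasonal ARIMA search. The sketch below is an assumption about tooling, not the product's implementation; the series, seasonal period, and forecast horizon are invented for the example.

```python
# Illustrative Auto-SARIMA sketch using pmdarima (assumed library, not the product's code).
import numpy as np
import pandas as pd
import pmdarima as pm

# Hypothetical monthly series with trend and yearly seasonality.
idx = pd.date_range("2015-01-01", periods=96, freq="MS")
y = pd.Series(np.arange(96) + 10 * np.sin(np.arange(96) * 2 * np.pi / 12), index=idx)

# Automatically search for seasonal ARIMA orders, then forecast 12 periods ahead.
model = pm.auto_arima(y, seasonal=True, m=12, suppress_warnings=True)
print(model.order, model.seasonal_order)
print(model.predict(n_periods=12))
```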
The Configure dialog box displays, as shown in the following image.
You can click the Forecast drop-down menu to select a different Forecast variable. You can choose a Date/Datetime variable by selecting its radio button.
Your selected model type appears on the dataflow canvas, as shown in the following image.
Your model data displays in the following tabs.
Before or after your model is trained, you can edit your model target, predictors, or datetime variables, depending on your model type. You can also edit the default parameters unique to each model.
To edit your model target and predictors, right-click the canvas model node, point to Edit Settings, and then click Target and Predictors, or for Time-Series forecasting models, click Forecast and Date/Datetime variables.
To edit your model parameters, right-click the canvas model node, point to Edit Settings, point to Parameters and Hyperparameters, and then click your model algorithm type. For Time-Series forecasting models, the Week setting of the Sampling frequency parameter is an Advanced option and may not work with your data set.
Note: For Time-Series forecasting models, if the chosen parameter for Sampling frequency is too long, it may result in too few data points to produce reliable statistics. In this case, the algorithm will modify the sampling to a shorter frequency, for example, from Year to Quarter, and redo the analysis. Sampling frequency modifications are reported in the training log.
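To illustrate why a long sampling frequency can leave too few data points, the pandas sketch below aggregates a short daily series to yearly and quarterly buckets. The library, series, and frequencies are assumptions for the example, not the product's behavior or code.

```python
# Illustrative sketch: a long sampling frequency can leave too few data points.
import numpy as np
import pandas as pd

# Hypothetical daily series covering three years.
idx = pd.date_range("2021-01-01", "2023-12-31", freq="D")
daily = pd.Series(np.random.default_rng(0).normal(100, 10, len(idx)), index=idx)

yearly = daily.resample("YS").mean()     # only 3 points -- too few for reliable statistics
quarterly = daily.resample("QS").mean()  # 12 points -- a more workable sample

print(len(yearly), len(quarterly))
```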
You can also click the Model Editor icon to change targets, predictors, and parameters.
When training a model, you can save it from the Compare Model Evaluation Results dialog box. After running a model, you can save it from the tabbed panel beneath the canvas. You can then re-run your saved model against new data sets.
The Save dialog opens, as shown in the following image.
You can change the model algorithm, name, or location, and add a description.
Your model is saved to your selected folder location.
Your saved model can be run later against new data that is similar to the data it was trained on.
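Outside of the product, the same save-and-score pattern is commonly achieved by serializing a fitted model and calling it on new rows with the same predictor layout. The sketch below is a generic illustration using joblib and scikit-learn; the file name and data are assumptions and this is not how WebFOCUS stores its models.

```python
# Illustrative sketch of saving a trained model and re-running it on new data.
from joblib import dump, load
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=6, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X, y)

dump(model, "saved_model.joblib")  # save the trained model

# Later: load the saved model and score new data with the same predictor layout.
new_X, _ = make_classification(n_samples=10, n_features=6, random_state=1)
reloaded = load("saved_model.joblib")
print(reloaded.predict(new_X))
```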