Data Modeling and Model Validation

Team Studio provides a robust set of operators for building models and for validating them.

For example, you can create classification models, logistic and linear regression models, or K-Means models. You can perform time series analysis or apply association rules.

Cluster Analysis Using K-Means
K-Means is one of the most well-known and commonly used partitioning methods for cluster analysis.
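A minimal sketch of the K-Means idea, assuming the standard Lloyd's algorithm (alternate between assigning points to the nearest centroid and moving each centroid to the mean of its points). The function name `kmeans` and its parameters are hypothetical and are not the Team Studio operator's interface.

```python
import random


def kmeans(points, k, iterations=100, seed=0):
    """Cluster `points` (tuples of numbers) into k groups with Lloyd's algorithm."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen input points
    clusters = []
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid
        # (squared Euclidean distance).
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])),
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members;
        # an empty cluster keeps its previous centroid.
        new_centroids = [
            tuple(sum(coord) / len(members) for coord in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # assignments are stable: converged
        centroids = new_centroids
    return centroids, clusters


points = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (8.5, 9), (9, 8)]
centroids, clusters = kmeans(points, k=2)
print(sorted(centroids))
```

Because the result depends on the starting centroids, implementations commonly run several initializations (or use a smarter seeding scheme) and keep the best clustering.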
Patterns in Data Sets
An "Association" is a pattern that occurs frequently in a data set. It can be a set of items, subsequences, substructures, and so on. It is also known as a frequent pattern.
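To make "occurs frequently" concrete, the sketch below counts the support of each itemset (the fraction of transactions containing all of its items) and keeps those above a minimum support. It is a brute-force illustration under that definition; the function `frequent_itemsets` and its parameters are hypothetical, and algorithms such as Apriori or FP-Growth prune the search instead of enumerating every combination.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=2):
    """Return {itemset: support} for itemsets (sorted tuples) with support >= min_support."""
    counts = Counter()
    for t in transactions:
        items = sorted(set(t))
        for size in range(1, max_size + 1):
            # Count every itemset of this size contained in the transaction.
            counts.update(combinations(items, size))
    n = len(transactions)
    return {itemset: c / n for itemset, c in counts.items() if c / n >= min_support}


baskets = [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}, {"milk"}]
print(frequent_itemsets(baskets, min_support=0.5))
```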
Alpine Forest Operators
Team Studio provides modeling, evaluation, validation, and prediction operators for regression (continuous) or classification (categorical) machine learning applications.
Model Export Formats
You can use the Export operator to export a model to a variety of formats.
Fitting a Trend Line for Linearly Dependent Data Values
Linear regression is the statistical fitting of a trend line to an observed dataset, in which one of the data values (the dependent variable) is found to be linearly dependent on the values of the other causal data values or variables (the independent variables).
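For a single independent variable, the ordinary least-squares trend line has a closed form: slope = covariance(x, y) / variance(x), and the intercept makes the line pass through the means. The sketch below assumes that one-variable case for brevity; `fit_line` is a hypothetical name, and the operator itself handles several independent variables.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-deviations and squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x  # line passes through (mean_x, mean_y)
    return slope, intercept


slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(slope, intercept)  # the data lie exactly on y = 2x + 1
```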
Probability Calculation Using Logistic Regression
Logistic Regression is the statistical fitting of an S-shaped logistic function (the inverse of the logit function) to a dataset in order to calculate the probability of the occurrence of a specific categorical event based on the values of a set of independent variables.
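The sketch below shows how a fitted logistic model turns independent variables into a probability, plus one simple way to fit it: batch gradient descent on the log-loss. The solver, learning rate, and function names (`logistic`, `predict_probability`, `fit_logistic`) are assumptions for illustration; the operator may use a different optimization method.

```python
import math


def logistic(z):
    """Numerically stable logistic (sigmoid) function: maps any real z into (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_probability(intercept, weights, features):
    """P(event | features) = logistic(intercept + sum of weight * feature)."""
    return logistic(intercept + sum(w * x for w, x in zip(weights, features)))


def fit_logistic(X, y, lr=0.1, epochs=1000):
    """Fit intercept and weights by batch gradient descent on the log-loss."""
    n, d = len(X), len(X[0])
    intercept, weights = 0.0, [0.0] * d
    for _ in range(epochs):
        grad_b, grad_w = 0.0, [0.0] * d
        for xi, yi in zip(X, y):
            err = predict_probability(intercept, weights, xi) - yi  # d(log-loss)/dz
            grad_b += err
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
        intercept -= lr * grad_b / n
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
    return intercept, weights


X = [[0.0], [1.0], [2.0], [3.0]]
y = [0, 0, 1, 1]
b, w = fit_logistic(X, y)
print(predict_probability(b, w, [0.0]), predict_probability(b, w, [3.0]))
```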
Classification Modeling with Decision Tree
The Decision Tree operator applies a classification modeling algorithm to a set of input data. This operator is most suitable for predicting or describing data sets with a binary outcome or a limited number of discrete output categories.
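A decision tree is grown by repeatedly choosing the split that best separates the output categories. The sketch below scores candidate thresholds on one numeric column by weighted Gini impurity, one common splitting criterion; the operator's actual criterion (for example, information gain) may differ, and `gini` and `best_split` are hypothetical names.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions (0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(values, labels):
    """Return (threshold, weighted_gini) for the best split 'value <= threshold'."""
    n = len(labels)
    best = (None, gini(labels))  # no split: impurity of the parent node
    for t in sorted(set(values))[:-1]:  # the largest value would leave the right side empty
        left = [lab for v, lab in zip(values, labels) if v <= t]
        right = [lab for v, lab in zip(values, labels) if v > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (t, score)
    return best


print(best_split([1, 2, 3, 10, 11, 12], ["no", "no", "no", "yes", "yes", "yes"]))
```

A full tree applies this search to every column, splits on the winner, and recurses on each side until the nodes are pure or a depth or size limit is reached.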
Classification Modeling with Naive Bayes
Naive Bayes is a classification modeling method, like Logistic Regression and Decision Tree models.
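Naive Bayes picks the class with the highest prior probability times the product of per-feature likelihoods, treating features as independent given the class. The sketch below assumes categorical features and add-one (Laplace) smoothing; `train_nb` and `predict_nb` are hypothetical names, not the operator's interface.

```python
import math
from collections import Counter, defaultdict


def train_nb(rows, labels):
    """Count classes and (feature index, class) -> value occurrences."""
    class_counts = Counter(labels)
    feature_counts = defaultdict(Counter)  # (feature index, class) -> Counter of values
    feature_values = defaultdict(set)      # feature index -> distinct values seen
    for row, c in zip(rows, labels):
        for i, v in enumerate(row):
            feature_counts[(i, c)][v] += 1
            feature_values[i].add(v)
    return class_counts, feature_counts, feature_values


def predict_nb(model, row):
    """Return the class maximizing log P(class) + sum of log P(value | class)."""
    class_counts, feature_counts, feature_values = model
    total = sum(class_counts.values())
    best_class, best_score = None, -math.inf
    for c, n_c in class_counts.items():
        score = math.log(n_c / total)  # log prior
        for i, v in enumerate(row):
            # Add-one smoothing so an unseen value does not zero out the product.
            score += math.log((feature_counts[(i, c)][v] + 1) / (n_c + len(feature_values[i])))
        if score > best_score:
            best_class, best_score = c, score
    return best_class


rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"), ("rainy", "cool"), ("sunny", "cool")]
labels = ["no", "no", "yes", "yes", "yes"]
model = train_nb(rows, labels)
print(predict_nb(model, ("rainy", "mild")), predict_nb(model, ("sunny", "hot")))
```

Summing logarithms instead of multiplying probabilities avoids numeric underflow when there are many features.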
Computed Metrics and Use Case for the Regression Evaluator
For model validation, the Regression Evaluator operator uses the MLlib regression evaluator. You can use it with
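Spark's `pyspark.ml.evaluation.RegressionEvaluator` selects its metric through the `metricName` parameter (for example `"rmse"`, `"mse"`, `"mae"`, or `"r2"`). Which of these the Team Studio operator reports is not stated here, so the plain-Python sketch below simply shows how each of those four metrics is computed from actual and predicted values; `regression_metrics` is a hypothetical name.

```python
import math


def regression_metrics(y_true, y_pred):
    """MSE, RMSE, MAE, and R^2 for paired actual and predicted values."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)  # sum of squared errors
    mean_y = sum(y_true) / n
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)  # total sum of squares
    mse = ss_res / n
    return {
        "mse": mse,
        "rmse": math.sqrt(mse),
        "mae": sum(abs(r) for r in residuals) / n,
        "r2": 1.0 - ss_res / ss_tot,
    }


print(regression_metrics([3, 5, 7, 9], [2.5, 5, 7.5, 9]))
```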
Collaborative Filtering
Collaborative filtering is commonly used for recommender systems. This approach collects information about a user's preferences and uses it to predict what that user may like, based on the user's similarity to other users who have rated similar products.
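The sketch below illustrates the similarity idea with user-based collaborative filtering: it measures cosine similarity between users on the items both have rated, then predicts a rating as the similarity-weighted average of other users' ratings. This is one simple variant chosen for illustration; the operator may use a different algorithm (such as matrix factorization), and the function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two users' ratings over the items both have rated."""
    common = set(a) & set(b)
    dot = sum(a[i] * b[i] for i in common)
    norm_a = math.sqrt(sum(a[i] ** 2 for i in common))
    norm_b = math.sqrt(sum(b[i] ** 2 for i in common))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # no co-rated items (or all-zero ratings): treat as unrelated
    return dot / (norm_a * norm_b)


def predict_rating(ratings, user, item):
    """Similarity-weighted average of other users' ratings for `item`."""
    num = den = 0.0
    for other, their in ratings.items():
        if other == user or item not in their:
            continue
        sim = cosine_similarity(ratings[user], their)
        num += sim * their[item]
        den += abs(sim)
    return num / den if den else None  # None: no other user has rated the item


ratings = {
    "ann": {"a": 5, "b": 3},
    "bob": {"a": 5, "b": 3, "c": 4},
    "cat": {"a": 1, "b": 5, "c": 1},
}
print(predict_rating(ratings, "ann", "c"))  # pulled toward bob's 4, since bob rates like ann
```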
Prediction Threshold
You can use the Classification Threshold Metrics operator to choose the prediction threshold: the probability cutoff that decides the predicted class from the probabilities the model outputs.
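To see how the threshold changes the predictions, the sketch below labels a record positive when its predicted probability is at or above the threshold, then computes precision, recall, and F1 for that cutoff. Comparing several candidate thresholds this way is one common method for choosing one; the function name and the choice of F1 as the selection criterion are assumptions, not the operator's documented behavior.

```python
def threshold_metrics(probabilities, actual, threshold):
    """Precision, recall, and F1 when predicting positive for probability >= threshold."""
    predicted = [p >= threshold for p in probabilities]
    tp = sum(1 for pr, a in zip(predicted, actual) if pr and a)
    fp = sum(1 for pr, a in zip(predicted, actual) if pr and not a)
    fn = sum(1 for pr, a in zip(predicted, actual) if not pr and a)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


probs = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]
actual = [True, True, False, True, False, False]
candidates = [0.35, 0.5, 0.7]
best = max(candidates, key=lambda t: threshold_metrics(probs, actual, t)[2])
print(best, threshold_metrics(probs, actual, best))
```

Raising the threshold generally trades recall for precision, so the best cutoff depends on the relative cost of false positives and false negatives.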
Principal Component Analysis
PCA, or Principal Component Analysis, is a multivariate technique for examining relationships among several quantitative variables. It uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of uncorrelated variables (principal components).
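The principal components are the eigenvectors of the data's covariance matrix, ordered by eigenvalue (the variance each one captures). The sketch below finds only the leading component with power iteration; a full PCA would compute all components, typically through an eigendecomposition or SVD. The function name and iteration count are hypothetical.

```python
import math


def first_principal_component(data, iterations=500):
    """Return (unit direction, variance, projected scores) for the leading component."""
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in data]
    # Sample covariance matrix of the centered data.
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)] for i in range(d)]
    v = [1.0] * d  # starting vector; assumed not orthogonal to the leading eigenvector
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v^T C v = variance of the data projected onto v.
    variance = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
    scores = [sum(r[j] * v[j] for j in range(d)) for r in centered]  # uncorrelated coordinates
    return v, variance, scores


data = [(1, 1), (2, 2.1), (3, 2.9), (4, 4.2), (5, 4.8)]
v, variance, scores = first_principal_component(data)
print(v, variance)
```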
Support Vector Machine Classification
Support Vector Machine (SVM) is an advanced supervised modeling technique for classifying both linear and nonlinear data. SVM classification separates the data into classes using the boundary that leaves the widest possible margin between the groups.
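A linear SVM can be trained by minimizing the regularized hinge loss: the hinge term penalizes points that are misclassified or closer than the margin, and shrinking the weight vector widens that margin. The sketch below uses plain subgradient descent and covers only the linear case (nonlinear data would need a kernel); the solver, hyperparameters, and function names are assumptions for illustration.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=500):
    """Minimize lam/2 * ||w||^2 + mean(max(0, 1 - y * (w.x + b))) by subgradient descent.

    Labels in y must be +1 or -1.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the regularization term
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # inside the margin or misclassified: hinge loss is active
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict_svm(w, b, x):
    """Classify by the sign of the decision function w.x + b."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


X = [(-2, -1), (-1, -2), (-3, -3), (2, 1), (1, 2), (3, 3)]
y = [-1, -1, -1, 1, 1, 1]
w, b = train_linear_svm(X, y)
print([predict_svm(w, b, x) for x in X])
```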
T-Tests
A t-test is any statistical test of significance that is based on the Student's t-distribution. T-tests are used to determine whether two sets of numeric data are likely to have been drawn from populations with different means.
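As a worked example, the sketch below computes Student's two-sample t statistic, assuming equal population variances (a pooled variance estimate). It returns only the statistic and degrees of freedom; a p-value needs the t-distribution CDF, which `scipy.stats.ttest_ind(a, b, equal_var=True)` provides. The function name `pooled_t_test` is hypothetical.

```python
import math
from statistics import mean, variance


def pooled_t_test(a, b):
    """Student's two-sample t statistic (equal variances assumed) and its degrees of freedom."""
    n1, n2 = len(a), len(b)
    # Pooled estimate of the common variance; statistics.variance uses the n - 1 denominator.
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (mean(a) - mean(b)) / standard_error
    return t, n1 + n2 - 2


t, df = pooled_t_test([5.1, 4.9, 5.0, 5.2, 4.8], [5.6, 5.8, 5.7, 5.5, 5.9])
print(t, df)
```

Here |t| is about 7 with 8 degrees of freedom, well beyond the two-sided 5% critical value of roughly 2.306, so the difference in means would be judged significant.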
Testing Models for Performance Decay
After models are trained and deployed, differences between the training data and the data to be scored might lead to a degree of deterioration in their predictive performance. This page addresses how to test models to verify that their performance lies within predefined thresholds of acceptability.
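The check described above reduces to comparing each monitored metric on newly scored data against its predefined limit. The sketch below assumes a hypothetical threshold format, `("min", limit)` for metrics where higher is better and `("max", limit)` where lower is better; the function name and metric names are illustrative only.

```python
def out_of_tolerance(current_metrics, thresholds):
    """Return (name, value, limit) for each metric that violates its acceptability threshold.

    thresholds maps a metric name to ("min", limit) when higher is better
    (e.g. accuracy) or ("max", limit) when lower is better (e.g. RMSE).
    """
    failures = []
    for name, (kind, limit) in thresholds.items():
        value = current_metrics[name]
        acceptable = value >= limit if kind == "min" else value <= limit
        if not acceptable:
            failures.append((name, value, limit))
    return failures


print(out_of_tolerance({"accuracy": 0.81, "rmse": 2.4},
                       {"accuracy": ("min", 0.85), "rmse": ("max", 3.0)}))
```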

Related concepts

Workflow Operator Reference
