Contents

What's New
System Requirements
Supported Legacy Database Platforms
Supported Legacy Hadoop Platforms
Supported TIBCO Data Virtualization Platforms
Supported Spark Cluster Platforms
System Administration
TIBCO Data Science - Team Studio Licensing
Installation
Prerequisites
Hadoop Connection Prerequisites
Spark Cluster Connection Prerequisites
Prerequisites Server Configuration
Installing TIBCO Data Science - Team Studio
Installer Command Line Interface
Upgrading TIBCO Data Science - Team Studio
Integrating TIBCO Data Virtualization with TIBCO Data Science - Team Studio
Importing the .car file to TIBCO Data Virtualization
Running the Initial Setup
Configuring TIBCO Data Virtualization Data Service
Migrating TIBCO Data Virtualization Assets
Accessing Data in TIBCO Data Virtualization from TIBCO Data Science - Team Studio
Shared Volume Data Access Configuration
Integrating TIBCO Data Science - Team Studio with TIBCO ModelOps
TIBCO Data Science - Team Studio Default Ports
Python Packages in Notebooks Container of TIBCO Data Science - Team Studio
TIBCO Data Science - Team Studio Configuration
TIBCO Data Science - Team Studio Deploy Properties
The Properties File
Configuring Indexing Frequency for Database Instances
TIBCO Data Science - Team Studio Configuration Properties
Server Ports
Configuring the HDFS Directory and Permissions for Results File Storage
TIBCO Data Science - Team Studio Related HDFS Configuration
Deleting Temporary Files
Security
Enabling LDAP Authentication
Configuring LDAP
LDAP Configuration Properties
Adding LDAP Users
Removing LDAP Users
Troubleshooting LDAP Configuration
LDAP Use Case Scenarios
Scenario 1: LDAP Authentication with Group Membership
Scenario 2: LDAP Authentication without Group Membership
Scenario 3: Import Users from an LDAP Group to TIBCO Data Science - Team Studio
Command-line Utilities for Managing the Services
Backing up in the previous version of TIBCO Data Science - Team Studio
Backing up in the current version of TIBCO Data Science - Team Studio
Restoring TIBCO Data Science - Team Studio
TIBCO Data Science - Team Studio Log Files
Download Logs
Monitoring Logs
Administering TIBCO Data Science - Team Studio
Connecting TIBCO Data Science - Team Studio to Data Sources
Database Data Sources
Connect to a JDBC Data Source
Connect to a Hive JDBC Data Source
Connect to an Oracle Database
Enable Oracle Databases
Connect to a Greenplum Database
Connect to a Pivotal HAWQ Database
Connect to an Amazon Redshift Data Source
Connect to a BigQuery Data Source
BigQuery Data Source Connection Tests and Troubleshooting
Hadoop Data Sources
Adding a Hadoop Data Source from the User Interface
Hadoop Data Source Connection Tests and Troubleshooting
TIBCO Data Virtualization Data Sources
Connecting TIBCO Data Science - Team Studio to Spark Cluster
Adding a Spark Cluster from the User Interface
Configuring Connection Parameters
Workflow Editor Preferences
Algorithm Preferences
System Preferences
Data Source Preferences
UI Preferences
Workflow Preferences
Datetime Formats Preferences
Administrator Options in TIBCO Data Science - Team Studio
Email Configuration
Usage Statistics
Data Visibility
Browsing Datasets In Your Workspace
Browsing Datasets In the Entire Application
Controlling Data Source Visibility
Controlling Data Source Permissions
Adding Data to a Workspace
Data Source Associations
Associating a Data Source
Data Source Credentials
Data Administrators
Data Source States
Manage TIBCO Data Science - Team Studio Users
Managing User Profiles
Add a New Person
User Roles
Establish Your Identity
The TIBCO Data Science - Team Studio Environment
Workspaces
Creating a Workspace
Other Workspace Activities
Workspace Tabs
Overview Tab
Workspace Stages
Workspace Roles
Data Sources Tab
Associating a Data Source with a Workspace from Your Workspace
Associating a Data Source with a Workspace from the Data Section
Data Tab
Exploring a Data Source
Previewing Data in a Workspace Table
Visualizing Data in a Workspace Table
Associating Datasets with a Workspace
Importing Data into a Workspace’s Sandbox
Importing Associated Data
Importing Oracle Datasets
Importing Hadoop Datasets
Scheduling Recurring Imports
Work Files Tab
Creating a Work File
Editing a Work File
Searching for a Work File
Copying a Work File to a Workspace
Importing a Work File
Deleting a Work File
Creating a SQL Work File
SQL Editor
Running a SQL Work File
Version Control in SQL Work Files
Jobs Tab
Viewing the Jobs List
Creating a Job
Adding Tasks to a Job
Running a Job
Viewing Job Results
Milestones Tab
Viewing a Milestone
Creating a Milestone
Engines Tab
Viewing the Engines List
Creating an Engine
Deploying and Governing an Engine
Testing an Engine
Workflows
Supported Workflows
Workflow Editor
Workflow Actions
Your Workflow Results
Refresh Metadata
Flow History
Export Flow
Workflow Variables
Clear Temporary Data
Convert to Spark/Revert to Non-Spark
Output All Results: Table or View
Preferences
Manage Custom Operators
System Logs
Workflow Menu
Data Explorer
Browse Hadoop Sources
Browse Database Sources
Data Sources
Operator Explorer
Operator Help
Creating a New Workflow
Importing TIBCO Data Virtualization Data into a Workflow
Creating a New Legacy Workflow
Importing Database Data into a Workflow
Importing Hadoop Data into a Workflow
Explore Visual Results
Navigating the Results Panel
Plotly Charts and Graphs
Spark Optimization for Data Scientists
Spark Autotuning
Settings for Spark-Enabled Operators
Advanced Settings dialog
alpine.conf Spark Settings
Spark Values
Team Studio-Specific Spark Values
YARN Configuration Values
Deploying Models and Workflows
Moving a Workflow from Development to Production
Preparing Data and Deploying Models
Optimizing Models
Batch Model Scoring
Real-Time Model Scoring
Code Generation
Model Management
Workflow Scheduling
Preview and Visualize Data
Preview
Show Table Metadata or Inspect Hadoop File Properties
Scatter Plot Chart
Bar Chart
Univariate Plot Chart
Box and Whisker Chart
Histogram
Summary Statistics (right-click)
Time Series Chart
Correlation Analysis
Frequency Analysis
Running a Workflow
Stepping Through a Workflow
Stopping a Workflow
Clearing a Workflow
Saving a Workflow
Reverting a Workflow
Running a Flow in Local Mode
Running Workflow Branches in Parallel
Handling Bad Data in Hadoop
Viewing Workflow Results
Saving Flow Output
Results Management
Downloading Results
Viewing Database SQL
Workflow Variables
Defining New Workflow Variables
Overriding Hadoop Data Source Parameters Using Workflow Variables
TIBCO Data Science - Team Studio Operator Job Names
Touchpoints
Creating a Touchpoint
Run Settings Tab
Adding Parameters to a Touchpoint
Testing a Touchpoint
Downloading Touchpoint Test Results
Running a Touchpoint
Publishing a Touchpoint to the Catalog
Touchpoint Parameters
Text Touchpoint Parameter
Multiline Text Touchpoint Parameter
Number Touchpoint Parameter
Single-Select Option Touchpoint Parameter
Multiple-Select Option Touchpoint Parameter
Date/Time Touchpoint Parameter
Search
Search Page Options
Tags
Viewing Tags
Navigating with Tags
Adding and Editing Tags
Deleting a Tag
Renaming a Tag
Jupyter Notebooks
Creating a Jupyter Notebook
Installing Python Packages
Python Packages Required for Jupyter Notebooks in TIBCO Data Science - Team Studio
Initializing PySpark for Spark Cluster
Initializing PySpark for Hadoop Distribution
Creating a Custom Environment for Running Jupyter Notebooks
Uploading and Running the Conda Environment Example
Adding Your Data to a Notebook
Reading Your Data Using Direct Connection in a Notebook
Incorporating Notebooks in a Workflow
Workflow Operators
Operator Actions
Connecting Operators
Selecting Multiple Operators
Editing Operator Properties
Deleting Operators
Moving Connections
Deleting Connections
Data Management
Selecting Groups of HDFS files
Data Exploration
Visualizing Data with Charts and Graphs
Explore Visual Results
Navigating the Results Panel
Plotly Charts and Graphs
Correlation and Covariance
Information Value and Weight of Evidence Analysis
Data Transformation
Aggregation Methods for Batch Aggregation
Outliers in Numerical Data
Creating a Join Condition for a Database Join
Key-Value Pairs Parsing Example using the Variable Operator
datetime Format Conversion Examples
Spark SQL Syntax and Expressions
Data Modeling and Model Validation
Cluster Analysis Using K-Means
K-Means Use Case
Patterns in Data Sets
Alpine Forest Operators
Ensemble Decision Tree Modeling with Alpine Forest
Model Export Formats
Fitting a Trend Line for Linearly Dependent Data Values
Linear Regression Use Case (1)
Linear Regression Use Case (2)
Probability Calculation Using Logistic Regression
Logistic Regression Use Case (1)
Logistic Regression Use Case (2)
Classification Modeling with Decision Tree
Decision Tree and CART Operator General Principles
Decision Tree Concept of Purity
Information Gain
Pruning or Pre-Pruning
Differences in Decision Tree Algorithms
Decision Tree Output Troubleshooting
Decision Tree Use Case
Classification Modeling with Naive Bayes
Naive Bayes Use Case
Computed Metrics and Use Case for the Regression Evaluator
Collaborative Filtering
Prediction Threshold
Principal Component Analysis
Support Vector Machine Classification
SVM Use Case
T-Tests
Independent Samples T-Test Use Case
Paired Samples T-Test Use Case
Single Sample T-Test Use Case
Testing Models for Performance Decay
Prediction
Prediction and Modeling Operator Pairings
Pearson's Chi Square Operations
Spark Node Fusion
Viewing Results for Individual Operators
Specialized Tools
Natural Language Processing Tools
Using the Results of Text Featurizer
Unsupervised Text Mining
LDA Training and Model Evaluation Tips
NLP Use Case
Test Corpus Parsing
Using Pig User-Defined Functions
DateTime Input Values
Setting Up Notebooks for Python Execute
R Execute
R Execute Error Messages
Syntax errors in the R script
Logical errors in the user's R script
Input data size limitations
Output data size limitations
Network issues
Missing output reference in the R script
Column name or type mismatches
Type coercion error
Workflow Operator Reference
Legacy Operators
Data Operators
Copy Between Databases
Copy To Database
Copy To Hadoop
Dataset
Hadoop File
Hive Table
Import Excel (DB)
Import Excel (HD)
Load To Hive
Exploration Operators
Bar Chart
Box Plot
Correlation (DB)
Correlation (HD)
Frequency
Histogram
Information Value
Line Chart
Scatter Plot Matrix
Summary Statistics (DB)
Summary Statistics (HD)
Variable Selection (DB)
Variable Selection (HD)
Transformation Operators
Aggregation (DB)
Aggregation (HD)
Batch Aggregation
Collapse
Column Filter (DB)
Column Filter (HD)
Correlation Filter (DB)
Correlation Filter (HD)
Distinct (DB)
Distinct (HD)
Fuzzy Join
Join (DB)
Join (HD)
Normalization (DB)
Normalization (HD)
Null Value Replacement (DB)
Null Value Replacement (HD)
Numeric to Text (DB)
Numeric to Text (HD)
One-Hot Encoding
Pivot (DB)
Pivot (HD)
Reorder Columns (DB)
Reorder Columns (HD)
Replace Outliers (DB)
Replace Outliers (HD)
Row Filter (DB)
Row Filter (HD)
Sessionization
Set Operations (DB)
Set Operations (HD)
Sort By Multiple Columns
Time Series SAX Encoder
Transpose
Unpivot (DB)
Unpivot (HD)
Unstack
Variable (DB)
Variable (HD)
Wide Data Variable Selector - Chi Square / Anova
Wide Data Variable Selector - Correlations
Window Functions - Aggregate
Window Functions - Lag/Lead
Window Functions - Rank
Sampling Operators
Random Sampling (DB)
Random Sampling (HD)
Resampling
Sample Selector
Stratified Sampling
Modeling Operators
Alpine Forest - MADlib
Alpine Forest Classification
Alpine Forest Predictor - MADlib
Alpine Forest Regression
ARIMA Time Series (DB)
ARIMA Time Series (HD)
Association Rules
Collaborative Filter Trainer
Decision Tree
Decision Tree - MADlib
Decision Tree Classification - CART
Decision Tree Regression - CART
Elastic Net Linear - MADlib
Elastic Net Logistic - MADlib
Generalized Linear Regression Models
Gradient Boosting Classification
Gradient Boosting Regression
K-Means (DB)
K-Means (HD)
K-Means Clustering - MADlib
Linear Regression (HD)
Linear Regression (DB)
Linear Regression - MADlib
Logistic Regression (DB)
Logistic Regression (HD)
Logistic Regression - MADlib
Naive Bayes (DB)
Naive Bayes (HD)
Neural Network
PCA (DB)
PCA (HD)
SVM Classification
NLP Operators
N-gram Dictionary Builder
N-gram Dictionary Loader
Text Extractor
Text Featurizer
Stop Words
LDA Predictor
LDA Trainer
Prediction Operators
Chi Square, Goodness of Fit
Chi Square, Independence Test
Classifier (DB)
Classifier (HD)
Collaborative Filter Predictor
Collaborative Filter Recommender
K-Means Predictor - MADlib
PCA Apply
Predictor (DB)
Predictor (HD)
Model Validation Operators
Alpine Forest Evaluator
Classification Threshold Metrics
Confusion Matrix
Goodness of Fit
Lift (DB)
Lift (HD)
Regression Evaluator (DB)
Regression Evaluator (HD)
ROC
T-Test - Independent Samples
T-Test - Paired Samples
T-Test - Single Sample
Tool Operators
Convert
Export
Export to Excel (DB)
Export to Excel (HD)
Export to FTP
Export to SBDF (DB)
Export to SBDF (HD)
Flow Control
HQL Execute
Load Model
Note
Pig Execute
Python Execute (DB)
Python Execute (HD)
R Execute (DB)
R Execute (HD)
SQL Execute
Sub-Flow
Apache Spark Specific Operators
Data Operators
Dataset
Import Excel
Exploration Operators
Correlation
Summary Statistics
Modeling Operators
Elastic-Net Linear Regression
Elastic Net Logistic Regression
Gradient-Boosted Tree Classification
Gradient-Boosted Tree Regression
Isolation Forest
K-Means Clustering
Naive Bayes
PCA
Random Forest Classification
Random Forest Regression
SOM Clustering
Model Validation Operators
Confusion Matrix
Goodness of Fit
Regression Evaluator
Prediction Operators
Predictor
Sampling Operators
Random Sampling
Resampling
Sample Selector
Tool Operators
Export Model to ModelOps
Export Model to Workspace
Export to SBDF
Load Model
SQL Execute
Transformation Operators
Batch Aggregation
Column Cleanser
Column Filter
Distinct
Dynamic Column Filter
Fourier Bessel
Join
Normalization
Null Value Replacement
Pivot
Reorder Columns
Row Cleanser
Row Filter
Set Operations
Unpivot
Variable
Wide Data Variable Selector - Chi Square/Anova
Wide Data Variable Selector - Correlations
Window Functions - Rank
Operator dialogs
Advanced Parameter Configuration dialog
Advanced Settings dialog
Bin Configuration dialog
Choose Collapse Columns dialog
Configure Columns dialog
Configure Columns: Text Files
Configure Columns: XML and JSON Files
Configure Columns: Log Files
Define Column Aggregations dialog
Define Filter dialog
Define Filter dialog for Row Filter Operator
Define Join Conditions dialog (Hadoop)
Define Pig Script dialog
Define Quantile Variables dialog
Define R Script dialog
Define Sample Size dialog
Define Sets dialog
Define SQL Statement dialog
Define Variables dialog
Edit Table Columns dialog
Input Table Mapping dialog
Interaction Parameters dialog
Join Properties - Database dialog
Key Columns dialog
Null Value Replacement Configuration dialog (DB)
Null Value Replacement Configuration dialog (HD)
Null Value Replacement Configuration dialog
Ordered Columns dialog
Results File Structure dialog
Select Columns dialog
Storage Parameters dialog
Store Intermediate Results
Sub-Flow Variable dialog
Window Column Configuration dialog
Operator Compatibility
AWS EMR Data Source and Operator Compatibility
Deprecated, Removed, or Replaced Operators
Processing Tools for Hadoop-Enabled Operators
Processing Tools for Data Load Operators (HDFS)
Processing Tools for Exploration Operators (HDFS)
Processing Tools for Transformation Operators (HDFS)
Processing Tools for Sample Operators (HDFS)
Processing Tools for Modeling Operators (HDFS)
Processing Tools for NLP Operators (HDFS)
Processing Tools for Prediction Operators (HDFS)
Processing Tools for Model Validation Operators (HDFS)
Processing Tools for Tool Operators (HDFS)
Glossary
For Developers
Custom Operators
Required Tools
Configuring Java
Installing and Configuring Maven
Installing the Custom Sample Operator for your Version
Compiling the Samples
Compiling the Samples on Windows
Troubleshooting Compiling the Samples
Uploading an Operator to TIBCO Data Science - Team Studio
Viewing and Running Samples From IntelliJ IDEA
Troubleshooting Viewing and Running Samples from IntelliJ IDEA
Building Your First Custom Operator in Scala
Setting Up Your Environment
Creating the Signature Class
Creating the Constants Object
Creating the GUI Node Class
Creating the onPlacement Method
Building the Operator dialog
Defining the Output Schema
Creating a Runtime Class
Preparing Your Operator to Build
Building a Source Operator
Setting Up Your Environment
Creating the Signature Class
Creating the Utils Object
Creating the GUI Node Class
Creating the onPlacement Method
Creating the onInputOrParameterChange Method
Creating a Runtime Class
Creating a Spark Job
Setting Up the Spark Job
Creating the Dataset
Exporting the Dataset
Creating a Custom Visualization with Scala and JavaScript
TIBCO Data Science - Team Studio API
ROOT_URL
Running a Workflow
Running a Workflow With Variable Substitution
Querying a Running Workflow
Stopping a Running Workflow
Response Failures
TIBCO Data Science - Team Studio API Demo
Version and Licensing
Open Source Attributions
Collaboration Framework
Common Infrastructure
Python Components for the Jupyter Notebook Packages
R Server Connector Package for TIBCO Data Science - Team Studio
TIBCO Data Science - Team Studio Core Attributions
API Documentation
Environment Overview
Product Connectivity
Ports and Protocols
Public-Facing Client Connection Ports
Outbound Connections
Authentication and Authorization
Authentication
Authentication with HTTP Request
Authorization
User Roles
Data Access Control
Data Visibility and Control
Data Access Permissions
For Monitoring the Logs
Session Management
Session IDs
Session Rotation
Session Timeouts and Configuration
Cryptography
Data at Rest
Data in Motion
TIBCO Documentation and Support Services
Legal and Third-Party Notices