TIBCO Data Science - Team Studio API Demo
You can run workflows remotely through the application's API. Beyond the ability to run the flow from outside the UI, you can define workflow variables at runtime. In this demo, we show how this can empower you to loop through a list in which the same modeling must be done for various values.
In this example, we look at the election92 dataset, a 3,141 row dataset with information about each county in the United States at the time of the 1992 election. Included in this data are the county name and state as well as statistics about the county demographics and voting patterns.
50 States, 50 Workflows?
While there are many useful applications of this entire dataset, it may be more interesting to analyze each state's data individually. This work could be replicated with the Group By configuration in the Summary Statistic operator, but that would require creating 50 row filters and manipulating them all within the UI.
Instead, we create a single workflow, use workflow variables, and contact the API to automate the looping over all states.
Data Set Exploration
A first glance at the dataset indicates that we should be able to do a simple row filter on the state column to isolate a state's data.
Row Filter with Workflow Variable
By using workflow variables, we can change the state we are filtering each time we run the workflow using the API.
@state set to AL. Note that all workflow variables begin with @.
Now we define our filter configuration by adding a row filter to our data, selecting Define Filter from the configuration dialog, and setting the Column Name (state[chararray]) to be equal to '@state'. Be sure to include the single quotes around the workflow variable.


To point out the ability to use workflow variables on data sets, we save our results as 50 individual tables within our Hadoop data source. To do this, we set the
Results Name field to
@state. This means that when the results are saved, the value of
@state, which is currently AL, is used as the file name for the results. By using the API, we can loop through a list of the state names and create one file per state.
Use the API to Run the Workflow
The TIBCO Data Science - Team Studio API enables us to run a workflow without having to go to TIBCO Data Science - Team Studio in our browser. Instead, we can use the command line or write a script to run workflows. This extends the power of TIBCO Data Science - Team Studio, allowing us to run the workflow as a piece of our script. In this demo, we write a Bash script that can be run from the command line. This script loops over a list of the state abbreviations, allowing us to define the abbreviation when we run the flow. By doing this, we can change the workflow variables we mentioned above 50 times in a fraction of the time it would take us to do manually.
Creating the Request
Each of the following steps are put into a Bash script. We begin by creating a file called
50states.sh.
First, we define our root URL as follows:
ROOT_URL="http://$host:$port/alpinedatalabs/api/v1/json"
where
$hostis the TIBCO Data Science - Team Studio application host.$portis the TIBCO Data Science - Team Studio port.alpinedatalabsis our application server name.
The complete URL for the request should take the following form:
url="$ROOT_URL/workflows/$workflowID/run?token=$token"
where
$workflowIDis the ID of the workflow being run.$tokenis the API authentication token (same as the TIBCO Data Science - Team Studio session ID, found at http://$host:$port/servername/api/sessions).
Next, we must define our workflow variables. To do this, we include a JSON payload in the request. Below is an example of this payload:
{
"meta" : { "version" : 1},
"variables" : [
{ "name" : "@state" , "value" : "AL"}
]
}
Finally, we combine this all in a Bash script that loops through all 50 2-letter state abbreviations and calls the workflow using the abbreviation as the workflow variable. Running the API creates a new directory in our Hadoop data source that contains the filtered data. As you can see, we've added the line 'sleep 15' to ensure that the workflow finishes running before moving to the next state. Alternatively, you query the process which is running to see if it has finished.
Our final script:
workflowID=??? token=mytokenhere ROOT_URL="http://$host:$port/servername/api/v1/json" url="$ROOT_URL/workflows/$workflowID/run?token=$token" states=("AL" "AK" "AZ" "AR" "CA" "CO" "CT" "DE" "FL" "GA" "HI" "ID" "IL" "IN" "IA" "KS" "KY" "LA" "ME" "MD" "MA" "MI" "MN" "MS" "MO" "MT" "NE" "NV" "NH" "NJ" "NM" "NY" "NC" "ND" "OH" "OK" "OR" "PA" "RI" "SC" "SD" "TN" "TX" "UT" "VT" "VA" "WA" "WV" "WI" "WY") for i in "${states[@]}" do workflow_variables='{"meta":{"version":1}, "variables":[{"name":"@state", "value":"'"$i"'"}]}' response=$(curl -i -H "Content-Type: application/json" -d "$workflow_variables" -X POST "$url") echo $response sleep 15 done
Adding Summary Statistics
After running the above script successfully, we create a workflow like the one below to get the summary statistics for each state. This is a new workflow file, so that it has a different ID in the API.
We navigate to the folder on our cluster where we stored the result file, and drag it to the canvas. We call it "State Data". Double-click the icon to bring up the configuration properties.
Change the value AL in the
Hadoop File Name parameter to be our workflow variable, @state. This allows us to change the state we are analyzing when we call the API.
Configure Summary Statistics
Our Summary Statistics configuration is shown below, showing statistics for the pop, income, and turnout columns.
Once again, we set the
Results Name configuration to
@state. By setting
Overwrite to
Yes, we create a way to rerun this information.
Run Workflow
Running the Summary Statistics workflow is nearly identical to running the Row Filter workflow from before. A different workflow ID is called in the script, using the same looping pattern as before to go through all 50 states. In the end, we are left with 50 result tables from the Summary Statistics calculations, stored on our HDFS cluster.