Execute External Workspace Node

The Statistica Workspace is a graphical data preparation and analysis environment that allows you to create, view, and edit a symbolic representation of a flow from the input data to the final results and models. Such flows can become very complex and hard to read, especially when they are deployed to production or to business processes with complex data structures. It is also difficult to collaborate with others when the whole workflow is presented on a single workspace.

Execute External Workspace nodes allow you to create references to other workspaces, pass data to them, execute them, and collect the results of their execution. They therefore simplify the main workspace and enable collaboration with team members who may be working on separate building blocks of a large workflow. The Execute External Workspace node accepts multiple inputs but produces a single output and, optionally, collections of reporting documents. It can reference other workspaces deployed to the Statistica Enterprise repository. The user of the main workspace must have at least read permission for the external workspace deployed to Statistica Enterprise.

Example

In this example of a credit scoring application, the data comes from three different sources:

Applicant Info: ID, Balance of Current Account, Payment of Previous Credits, Value of Savings, Employed by Current Employer for, Installment in % of Available Income, Marital Status, Gender, Living in Current Household for, Most Valuable Assets, Age, Further running credits, Type of Apartment, Number of previous credits at this bank, Occupation

Credit Info: ID, Duration of Credit, Purpose of Credit, Amount of Credit

Credit Rating: ID, Credit Rating

To build a credit scoring model, the three sources, which share the ID variable, must be merged. The complete workflow may look like the example below:

In practical use cases, the data preparation alone can involve dozens of nodes, which might represent:

Mappings to dictionaries

Various data quality and cleaning procedures

Business rules and transformations

You can move the data preparation operations (represented in this simple example by two Merge Variables nodes) to a separate external workspace.
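Conceptually, the two Merge Variables nodes combine the three sources on their shared ID variable. The plain-Python sketch below illustrates only that step; it is not the Statistica Merge Variables node. It assumes an inner join (only IDs present in every source are kept), which may differ from the merge mode configured in your workspace, and `merge_on_id` and the sample rows are hypothetical.

```python
def merge_on_id(*sources):
    """Inner-join several lists of row dicts on their shared "ID" column."""
    # Index each source by ID for constant-time lookups.
    indexed = [{row["ID"]: row for row in source} for source in sources]
    # Keep only the IDs that occur in every source (inner join).
    common_ids = set(indexed[0])
    for index in indexed[1:]:
        common_ids &= set(index)
    merged = []
    for row_id in sorted(common_ids):
        combined = {}
        for index in indexed:
            combined.update(index[row_id])  # each source adds its own columns
        merged.append(combined)
    return merged


# Hypothetical rows; column names are taken from the example sources.
applicant_info = [{"ID": 1, "Age": 35}, {"ID": 2, "Age": 52}]
credit_info = [{"ID": 1, "Duration of Credit": 24, "Amount of Credit": 5000},
               {"ID": 2, "Duration of Credit": 12, "Amount of Credit": 1200}]
credit_rating = [{"ID": 1, "Credit Rating": "good"}]

print(merge_on_id(applicant_info, credit_info, credit_rating))
```

Applicant 2 is dropped because it has no Credit Rating row; an outer join would keep it with missing values instead.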

Create a new workspace, Execute External Workspace - Credit rating (Merge Data), and place templates of the input nodes in it, as illustrated below:


  

Although this step is not required, we recommend that you include a small subset or a single row of data in the input nodes for testing and troubleshooting. When the calling workspace is executed, the data in the input nodes of the external workspace is replaced with the corresponding data from the calling workspace (for the nodes listed in Inputs Assignment).
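The substitution rule can be sketched as follows. This is a hypothetical model, not Statistica code: each input node's data is a list of row dicts, `substitute_inputs` is an illustrative name, and it assumes that external input nodes not listed in Inputs Assignment keep their template rows.

```python
def substitute_inputs(external_inputs, main_data, pairs):
    """Return the external workspace's input data with mapped nodes replaced.

    external_inputs: external input node name -> template rows
    main_data:       main-workspace node name -> rows produced upstream
    pairs:           (main node, external node) tuples from Inputs Assignment
    """
    resolved = dict(external_inputs)  # copy; unmapped nodes keep their template rows
    for main_node, external_node in pairs:
        if external_node not in resolved:
            raise KeyError(f"external workspace has no input node {external_node!r}")
        resolved[external_node] = main_data[main_node]
    return resolved


# Hypothetical data: each template holds a single test row.
templates = {"Applicant Info": [{"ID": 0}], "Credit Info": [{"ID": 0}]}
main = {"Applicant Info": [{"ID": 1}, {"ID": 2}], "Credit Info": [{"ID": 1}]}
result = substitute_inputs(templates, main, [("Applicant Info", "Applicant Info")])
print(result)
```

Here only Applicant Info is mapped, so Credit Info still holds its single test row.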

Once the workspace is created, mark the name of the node that produces the spreadsheet to be returned to the calling workspace. In this example, it is Merge Variables2.

Finally, deploy the workspace to Statistica Enterprise, following the standard deployment steps.

At this point, you can create the main modeling workflow, in which the data preparation is replaced by a single Execute External Workspace node, as follows.

Parameter Details

Workspace Path

Select the complete path to the external workspace deployed to Statistica Enterprise. You can set this parameter either by typing the path as a string or by clicking the Select button to browse the Statistica Enterprise repository.

Version Information

Select the version information from this drop-down list. If SDMS integration is enabled in Statistica Enterprise, you can use either the latest version or the latest approved version of the external workspace in the analysis.

Use Defined Input/Output Nodes

Check this option to ignore the Inputs and Output Assignment parameters, and use the nodes marked as Input/Output in the external workspace.

Inputs Assignment

Use this text box to map your inputs. This is the key parameter of the node: it maps the inputs of the Execute External Workspace node on the main workspace to the inputs of the referenced workspace.

Enclose the name of each node in double quotes.

Use -> to map a node (node name on the main workspace -> node name on the external workspace).

Use a semicolon to separate multiple mappings.

Example:

"Applicant Info"->"Applicant Info";"Credit Info"->"Credit Info";"Credit Rating"->"Credit Rating"
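To make the syntax concrete, here is a minimal parser for an Inputs Assignment string, written for illustration only. It is not how Statistica parses the parameter; `parse_inputs_assignment` is a hypothetical name, and it assumes that node names contain neither double quotes nor semicolons and that whitespace around `->` and `;` is harmless.

```python
import re

# One mapping: a quoted main-workspace node, "->", a quoted external node.
# Surrounding whitespace is tolerated here (an assumption).
_MAPPING = re.compile(r'\s*"([^"]+)"\s*->\s*"([^"]+)"\s*')


def parse_inputs_assignment(text):
    """Split an Inputs Assignment string into (main node, external node) pairs."""
    pairs = []
    for part in text.split(";"):
        if not part.strip():
            continue  # ignore empty segments, e.g. a trailing semicolon
        match = _MAPPING.fullmatch(part)
        if match is None:
            raise ValueError(f"malformed mapping: {part!r}")
        pairs.append(match.groups())
    return pairs


example = ('"Applicant Info"->"Applicant Info";'
           '"Credit Info"->"Credit Info";'
           '"Credit Rating"->"Credit Rating"')
for main_node, external_node in parse_inputs_assignment(example):
    print(f"{main_node} -> {external_node}")
```

Returning a list of pairs rather than a dictionary avoids assuming anything about whether one main-workspace node may feed several external inputs.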

Output Assignment

This parameter specifies the name of the node on the external workspace whose data is extracted and placed in a spreadsheet node downstream of the Execute External Workspace node. Enter the name without quotes; it must exactly match the name of the node on the external workspace.
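The two rules above (no quotes, exact match) can be expressed as a small check. This is an illustrative sketch with a hypothetical function name; it reads "match exactly" as case-sensitive, with leading and trailing spaces significant.

```python
def resolve_output_node(output_assignment, external_node_names):
    """Validate an Output Assignment value against the external workspace's node names."""
    if '"' in output_assignment:
        raise ValueError("Output Assignment must not contain quotes")
    # Exact comparison: no trimming and no case folding.
    if output_assignment not in external_node_names:
        raise ValueError(f"no node named {output_assignment!r} in the external workspace")
    return output_assignment


# Hypothetical node names for the Merge Data workspace in the example.
nodes = {"Applicant Info", "Credit Info", "Credit Rating",
         "Merge Variables", "Merge Variables2"}
print(resolve_output_node("Merge Variables2", nodes))
```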

Check Inputs For Consistency

When this option is selected, the system compares each pair of datasets mapped in Inputs Assignment and verifies that they have the same number of columns and the same column names and types. An error is raised if any differences are found.
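The sketch below shows one way such a check could work. It is not Statistica's implementation: each dataset is reduced to a hypothetical schema, a list of (column name, type) tuples, column names are assumed unique, column order is not compared, and the function collects every difference before raising a single error.

```python
def check_consistency(main_schemas, external_schemas, pairs):
    """Raise ValueError if any mapped pair of inputs differs in columns or types.

    main_schemas / external_schemas: node name -> [(column name, type), ...]
    pairs: (main node, external node) tuples from Inputs Assignment
    """
    problems = []
    for main_node, external_node in pairs:
        main_cols = main_schemas[main_node]
        ext_cols = external_schemas[external_node]
        label = f"{main_node} -> {external_node}"
        # 1. Same number of columns.
        if len(main_cols) != len(ext_cols):
            problems.append(f"{label}: {len(main_cols)} vs {len(ext_cols)} columns")
        main_types, ext_types = dict(main_cols), dict(ext_cols)
        # 2. Same column names (report names present on only one side).
        for name in sorted(main_types.keys() ^ ext_types.keys()):
            problems.append(f"{label}: column {name!r} exists on one side only")
        # 3. Same type for every shared column.
        for name in sorted(main_types.keys() & ext_types.keys()):
            if main_types[name] != ext_types[name]:
                problems.append(f"{label}: column {name!r} is {main_types[name]} "
                                f"vs {ext_types[name]}")
    if problems:
        raise ValueError("; ".join(problems))


# Hypothetical schemas: the main workspace stores Amount of Credit as text.
main = {"Credit Info": [("ID", "integer"), ("Amount of Credit", "text")]}
external = {"Credit Info": [("ID", "integer"), ("Amount of Credit", "double")]}
try:
    check_consistency(main, external, [("Credit Info", "Credit Info")])
except ValueError as err:
    print(err)
```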

Output Report Documents

In some cases, the external workspace writes output (such as descriptive statistics) to Reporting Documents during execution. When this option is selected, the system merges the reporting documents of the external workspace with the reporting documents of the main workspace.