TIBCO Data Science - Team Studio Related HDFS Configuration

TIBCO Data Science - Team Studio uses several temp directories in HDFS. When it is connected to EMR 5.35.x, these directories and the files in them are created by the HDFS, YARN, and other users.

The temp directories must be accessible at the base level to the Chorus user and other relevant users. Below the base level, each user can view only that user's own directories. The directories are:

  • Standard output for operators: @default_tmpdir/dsts_out/<user_name>/<workflow_name>/
  • TIBCO Data Science - Team Studio temporary output: @default_tmpdir/dsts_runtime/<user_name>/<workflow_name>/
  • TIBCO Data Science - Team Studio model location: @default_tmpdir/dsts_model/<user_name>/<workflow_name>/
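
The layout above can be sketched as a small path builder. This is an illustrative sketch only: the function name, the parameter names, and the default of /tmp for @default_tmpdir are assumptions, not part of the product.

```python
import posixpath

# Subdirectory names are taken from the list above.
# The dictionary keys ("output", "runtime", "model") are illustrative labels.
DSTS_SUBDIRS = {
    "output": "dsts_out",
    "runtime": "dsts_runtime",
    "model": "dsts_model",
}


def dsts_dir(kind, user_name, workflow_name, default_tmpdir="/tmp"):
    """Return the HDFS directory for one kind of Team Studio temp data.

    default_tmpdir stands in for @default_tmpdir; /tmp is an assumed value.
    """
    return posixpath.join(
        default_tmpdir, DSTS_SUBDIRS[kind], user_name, workflow_name
    ) + "/"


if __name__ == "__main__":
    for kind in DSTS_SUBDIRS:
        print(dsts_dir(kind, "alice", "churn_model"))
```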

Set or change the permissions and ownership as follows:

  • The /tmp directory should be readable and writable.
  • The /tmp/hadoop-yarn directory should be readable and writable for Spark jobs.
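
One possible way to apply these permissions is with the standard hdfs dfs -chmod command, scripted here as a dry run. The mode 777 is an assumption chosen for illustration; use the mode that your security policy allows. The helper names are hypothetical.

```python
import subprocess

# Directories named in the list above.
TMP_DIRS = ("/tmp", "/tmp/hadoop-yarn")


def chmod_commands(paths=TMP_DIRS, mode="777"):
    """Build one 'hdfs dfs -chmod' command per path.

    mode="777" is an assumed value, not a product requirement.
    """
    return [["hdfs", "dfs", "-chmod", mode, path] for path in paths]


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(chmod_commands())  # dry run: prints the commands without running them
```

Running with dry_run=False requires the hdfs client on the PATH and an account that is allowed to change permissions on these directories.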

The upgrade options are as follows (choose one):

  • Grant full permissions on the /tmp/dsts_* directories, so that everyone can read, write, and execute.
  • Delete the /tmp/dsts_* directories and let the upgraded TIBCO Data Science - Team Studio application recreate them. If you are using LDAP, the recreated directories have the default structure /tmp/dsts_*/<LDAP_username>/<workflow_name>/<operator>/, and permissions at this directory level can be limited to that LDAP user as desired.
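
The two upgrade options might be scripted roughly as follows. This sketch assumes that /tmp/dsts_* covers the three directories listed earlier (dsts_out, dsts_runtime, dsts_model) and that mode 700 is an acceptable per-user restriction; the function names and option labels are hypothetical. It only builds the hdfs dfs commands and does not run them.

```python
# Assumed expansion of /tmp/dsts_*, based on the directories named earlier.
DSTS_DIRS = ("/tmp/dsts_out", "/tmp/dsts_runtime", "/tmp/dsts_model")


def upgrade_commands(option):
    """Build the HDFS commands for one upgrade option.

    "open":     grant full permissions recursively on each directory.
    "recreate": delete each directory so the upgraded application recreates it.
                This includes dsts_model; export any needed models first.
    """
    if option == "open":
        return [["hdfs", "dfs", "-chmod", "-R", "777", d] for d in DSTS_DIRS]
    if option == "recreate":
        return [["hdfs", "dfs", "-rm", "-r", d] for d in DSTS_DIRS]
    raise ValueError(f"unknown option: {option!r}")


def restrict_to_user(ldap_username, mode="700"):
    """Limit each per-user directory to its owner (mode is an assumed value)."""
    return [
        ["hdfs", "dfs", "-chmod", "-R", mode, f"{d}/{ldap_username}"]
        for d in DSTS_DIRS
    ]


if __name__ == "__main__":
    for cmd in upgrade_commands("open") + restrict_to_user("alice"):
        print(" ".join(cmd))
```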

TIBCO Data Science - Team Studio overwrites the @default_tmpdir/dsts_* files as users re-run workflows. TIBCO Data Science - Team Studio users can clear selected @default_tmpdir/dsts_out files using Clear Temporary Data. Hadoop administrators can safely clear @default_tmpdir/dsts_runtime from HDFS, because this directory holds the information from runs for which TIBCO Data Science - Team Studio users have set the option Store Results = False.

Note: Handle @default_tmpdir/dsts_model with caution, because TIBCO Data Science - Team Studio users might need to export models from this directory.
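
A cleanup script for Hadoop administrators could guard against removing the wrong directory. The sketch below assumes that @default_tmpdir resolves to /tmp and that removing the dsts_runtime directory itself, rather than only its contents, is acceptable on your cluster; the function name is hypothetical. It builds the hdfs dfs -rm command but does not run it.

```python
import posixpath

# Only dsts_runtime is described above as safe for administrators to clear.
SAFE_TO_CLEAR = {"dsts_runtime"}


def clear_command(directory):
    """Build an 'hdfs dfs -rm -r' command, refusing any other directory."""
    name = posixpath.basename(directory.rstrip("/"))
    if name not in SAFE_TO_CLEAR:
        raise ValueError(f"{directory} is not listed as safe to clear")
    return ["hdfs", "dfs", "-rm", "-r", directory]


if __name__ == "__main__":
    print(" ".join(clear_command("/tmp/dsts_runtime")))
```

Passing /tmp/dsts_model or /tmp/dsts_out raises ValueError, which keeps models and user-managed output out of automated cleanup.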