Team Studio Related HDFS Configuration

Team Studio uses several temporary directories in HDFS. The directories and the files in them are created by the HDFS, Yarn, MapRed, and other users.

The base temporary directories must be accessible to the chorus user and other relevant users. Below the base level, each user can view only the directories that belong to that user:

  • Standard output for operators: @default_tmpdir/dsts_out/<user_name>/<workflow_name>/
  • Team Studio temporary output: @default_tmpdir/dsts_runtime/<user_name>/<workflow_name>/
  • Team Studio model location: @default_tmpdir/dsts_model/<user_name>/<workflow_name>/
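As an illustration, the per-user layout above can be expressed as a small Python helper that builds the three paths for a given user and workflow. The function and key names (dsts_paths, operator_output, runtime_output, model) are hypothetical. Only the dsts_* directory names and the <user_name>/<workflow_name>/ structure come from this page.

```python
from pathlib import PurePosixPath

# dsts_* subdirectory names, taken from the directory list above.
# The dictionary keys are hypothetical labels chosen for this sketch.
DSTS_SUBDIRS = {
    "operator_output": "dsts_out",
    "runtime_output": "dsts_runtime",
    "model": "dsts_model",
}


def dsts_paths(default_tmpdir: str, user_name: str, workflow_name: str) -> dict[str, str]:
    """Return the per-user, per-workflow temp directories under @default_tmpdir."""
    for name in (user_name, workflow_name):
        # A slash would silently add an extra directory level.
        if not name or "/" in name:
            raise ValueError(f"invalid path component: {name!r}")
    base = PurePosixPath(default_tmpdir)  # normalizes a trailing slash
    return {
        kind: f"{base / sub / user_name / workflow_name}/"
        for kind, sub in DSTS_SUBDIRS.items()
    }
```

For example, dsts_paths("/tmp", "alice", "churn")["model"] gives /tmp/dsts_model/alice/churn/.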

Set or change the permissions and ownership as follows:

  • The /tmp directory should be readable and writable.
  • The /tmp/hadoop-yarn directory should be readable and writable for Spark jobs.
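One way to check these two requirements is to inspect the permission string that the HDFS shell prints in its first column, for example with hdfs dfs -ls -d /tmp. The Python sketch below assumes that "readable and writable" means readable and writable by all users (the "other" permission class). The function name others_can_read_write is hypothetical.

```python
def others_can_read_write(permission: str) -> bool:
    """Return True if a permission string such as 'drwxrwxrwt' grants
    read and write access to the 'other' class (all users)."""
    perm = permission.rstrip("+")  # a trailing '+' marks an ACL
    if len(perm) != 10:
        raise ValueError(f"expected a 10-character permission string, got {permission!r}")
    other = perm[7:10]  # last triplet: r, w, and x (or a sticky-bit t/T)
    return other[0] == "r" and other[1] == "w"
```

A value such as drwxrwxrwt passes the check, and drwxr-xr-x fails because other users cannot write.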

When you upgrade Team Studio, choose one of the following options:

  • Change the permissions on the /tmp/dsts_* directories to full permissions, so that everyone can read, write, and execute.
  • Delete the /tmp/dsts_* directories and let the upgraded Team Studio application recreate them. If you are using LDAP, the recreated directories have the default structure /tmp/dsts_*/<LDAP_username>/workflowname/operator/, and you can restrict permissions at the <LDAP_username> level to that user as needed.
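The two upgrade options correspond to standard HDFS shell commands (hdfs dfs -chmod and hdfs dfs -rm -r). The Python sketch below only builds the command text for review and does not run anything. The option names and the function upgrade_commands are hypothetical, and applying the mode recursively (-R) is an assumption that you can turn off.

```python
import shlex


def upgrade_commands(option: str, base: str = "/tmp", recursive: bool = True) -> str:
    """Build (but do not run) the HDFS shell command for one upgrade option."""
    target = f"{base.rstrip('/')}/dsts_*"
    if option == "open_permissions":
        # Option 1: full permissions (777) so everyone can read, write, and execute.
        cmd = ["hdfs", "dfs", "-chmod"] + (["-R"] if recursive else []) + ["777", target]
    elif option == "delete_and_recreate":
        # Option 2: remove the directories; the upgraded application recreates them.
        cmd = ["hdfs", "dfs", "-rm", "-r", target]
    else:
        raise ValueError(f"unknown option: {option!r}")
    # shlex.join quotes the glob, so the local shell passes it through unexpanded
    # and the HDFS shell matches it against HDFS paths instead.
    return shlex.join(cmd)
```

Quoting the dsts_* pattern matters: without the quotes, a local shell might try to expand it against the local file system before HDFS ever sees it.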

By default, @default_tmpdir is set to /tmp. This can be modified for individual workflows using Workflow Variables, or for all newly created workflows using Workflow Editor Preferences.
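The Python sketch below models how the @default_tmpdir value is chosen. A new workflow takes its value from the editor preference if one is set, otherwise /tmp, and each workflow can later override its own value. This is an illustration of the behavior described above, not Team Studio's implementation. The Workflow class and the new_workflow and resolve_tmpdir functions are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional

DEFAULT_TMPDIR = "/tmp"


@dataclass
class Workflow:
    name: str
    variables: dict = field(default_factory=dict)  # workflow variables


def new_workflow(name: str, preferred_tmpdir: Optional[str] = None) -> Workflow:
    """The editor preference affects only workflows created after it is set."""
    return Workflow(name, {"@default_tmpdir": preferred_tmpdir or DEFAULT_TMPDIR})


def resolve_tmpdir(wf: Workflow) -> str:
    """A workflow's own variable wins; fall back to the built-in default."""
    return wf.variables.get("@default_tmpdir", DEFAULT_TMPDIR)
```

In this model, changing the preference does not alter existing workflows, because each workflow keeps the value it received when it was created.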

Team Studio overwrites the @default_tmpdir/dsts* files as users re-run workflows. Team Studio users can clear selected @default_tmpdir/dsts_out files using Clear Temporary Data. Hadoop administrators can safely clear @default_tmpdir/dsts_runtime from HDFS, because this directory holds data for which Team Studio users have set the option Store Results = False.
Note: Handle @default_tmpdir/dsts_model with caution, because Team Studio users might need to export models from this directory.
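For administrators who script HDFS housekeeping, the cleanup guidance above can be encoded as a lookup on the first directory below @default_tmpdir. The labels "safe", "user", and "caution" and the function cleanup_guidance are hypothetical. Any path outside the dsts_* directories is reported as "unknown" rather than assumed to be safe.

```python
from pathlib import PurePosixPath

# Guidance from the paragraph above, keyed by dsts_* directory name.
CLEANUP_GUIDANCE = {
    "dsts_runtime": "safe",   # administrators can clear it from HDFS
    "dsts_out": "user",       # users clear it with Clear Temporary Data
    "dsts_model": "caution",  # users might still need to export models
}


def cleanup_guidance(path: str, default_tmpdir: str = "/tmp") -> str:
    """Classify an HDFS path under @default_tmpdir by its cleanup guidance."""
    try:
        rel = PurePosixPath(path).relative_to(PurePosixPath(default_tmpdir))
    except ValueError:  # path is not under @default_tmpdir
        return "unknown"
    if not rel.parts:  # path is @default_tmpdir itself
        return "unknown"
    return CLEANUP_GUIDANCE.get(rel.parts[0], "unknown")
```

Defaulting to "unknown" keeps the sketch conservative: a script built on it would leave unrecognized paths, such as /tmp/hadoop-yarn, alone.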