Setting HDFS Permissions

Setting HDFS permissions involves creating a serviceuser directory and then setting permissions for Active Directory (AD) users and for several temporary directories.

Perform this task on the computer where Team Studio is installed.

Prerequisites

You must have write access to the configuration files.

Procedure

  1. Create the HDFS directory /user/serviceuser/ with the owner:group as serviceuser:supergroup.
    This directory is used to cache the uploaded JAR files such as spark-assembly.jar.
    Note: The staging directory is typically /user. If it is not, create the directory as /<staging directory>/serviceuser.
  2. Give the /user/serviceuser directory read, write, and execute permissions for the serviceuser.
  3. Set the Active Directory (AD) permissions.
    To run Pig jobs, the Team Studio application attempts to create a folder /user/<username> as the AD user. By default, the permissions are set to hdfs:supergroup:drwxr-xr-x, which prevents Team Studio from creating that folder.
    1. Change the permissions on the parent directory (/user) so that the AD users running the Team Studio application have write access and can create that folder (drwxrwxr-x or drwxrwxrwx).
  4. Set permissions for the temporary directory HDFS /tmp.
    To run YARN, Pig, and similar jobs, each individual user might need to write temporary files to the temporary directories. Hadoop has many temporary directory settings, such as hadoop.tmp.dir, pig.temp.dir, and so on. By default, all of them are based on the /tmp directory.
    1. Make the /tmp directory writable by everyone so that everyone can run different jobs.
    2. Make the /tmp directory executable by everyone so that all users can traverse the directory tree. Set both permissions on /tmp using the following command:
      hadoop fs -chmod +wx /tmp

  5. Set the permissions for the temporary directories HDFS /tmp/tsds_*.
    The Team Studio application generates these directories in HDFS:
    • /tmp/tsds_out/<username>
    • /tmp/tsds_model/<username>
    • /tmp/tsds_runtime/<username>
    1. Set or change the permissions and ownership appropriately, as follows.
      • The /tmp directory should be readable, writable, and executable by all users.
      • Each /tmp/tsds_*/<username> directory should be owned by, and readable by, the corresponding user.
      • The /tmp/hadoop-yarn directory should be readable and writable for Spark jobs.
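
The procedure above can be sketched as a short shell script. This is a minimal illustration under stated assumptions, not a definitive implementation: the variables SERVICE_USER, STAGING_DIR, TS_USER, and DRY_RUN and the helper functions run and set_hdfs_permissions are hypothetical names introduced here, the example user name is a placeholder, and the mode chosen in step 3 assumes that the AD users belong to supergroup. By default the script only prints the commands. Set DRY_RUN=0 and run it as an HDFS superuser to apply them.

```shell
#!/bin/sh
# Sketch only. All variable and function names below are hypothetical.
SERVICE_USER="${SERVICE_USER:-serviceuser}"
STAGING_DIR="${STAGING_DIR:-/user}"      # staging directory, typically /user
TS_USER="${TS_USER:-exampleuser}"        # placeholder for one <username>
DRY_RUN="${DRY_RUN:-1}"                  # 1 = print commands, 0 = execute them

# Print the command in dry-run mode; otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

set_hdfs_permissions() {
  # Steps 1-2: create the service user directory and give its owner rwx.
  run hadoop fs -mkdir -p "$STAGING_DIR/$SERVICE_USER"
  run hadoop fs -chown "$SERVICE_USER:supergroup" "$STAGING_DIR/$SERVICE_USER"
  run hadoop fs -chmod u+rwx "$STAGING_DIR/$SERVICE_USER"

  # Step 3: drwxrwxr-x on /user (assumes AD users are in supergroup;
  # use 777 for drwxrwxrwx if they are not).
  run hadoop fs -chmod 775 /user

  # Step 4: make /tmp writable and executable by everyone.
  run hadoop fs -chmod +wx /tmp

  # Step 5: the application generates these directories; only ownership
  # and owner read/traverse access are adjusted here.
  for d in tsds_out tsds_model tsds_runtime; do
    run hadoop fs -chown "$TS_USER" "/tmp/$d/$TS_USER"
    run hadoop fs -chmod u+rx "/tmp/$d/$TS_USER"
  done

  # Readable and writable for Spark jobs (granting this to all users
  # is an assumption; narrow it if your cluster policy requires).
  run hadoop fs -chmod a+rw /tmp/hadoop-yarn
}

set_hdfs_permissions
```

Reviewing the printed commands before setting DRY_RUN=0 gives you a chance to adjust the paths and modes to your cluster. Repeat the step 5 commands for each user who runs Team Studio jobs.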