Configuring the HDFS Directory and Permissions for Results File Storage

This procedure describes how to create an HDFS directory for a user, set Active Directory user permissions, and set permissions on temporary directories.

Prerequisites

Procedure

  1. Create an HDFS directory, /user/chorus, for the user "chorus". This directory caches uploaded .jar files such as spark-assembly.jar.
  2. Grant the chorus user read, write, and execute permissions on the /user/chorus directory.
  3. The staging directory is typically /user. If your cluster uses a different staging directory, create /<staging directory>/chorus instead.
  4. To run Pig jobs, the Team Studio application attempts to create the folder /user/<username> as the Active Directory user. By default, /user is owned by hdfs:supergroup with permissions drwxr-xr-x, which gives write access only to the owner (hdfs), so Team Studio cannot create that folder. To let the Active Directory users who run Team Studio create their folders, change the permissions on /user to drwxrwxr-x (775), which grants write access to members of the directory's group, or to drwxrwxrwx (777), which grants write access to all users.
  5. Set permissions on the temporary directories.
    To run YARN, Pig, and similar jobs, each user might need to write temporary files to the Hadoop temporary directories. Several settings, such as hadoop.tmp.dir and pig.temp.dir, default to locations under /tmp. Therefore, /tmp must be writable by all users so that they can run these jobs, and executable by all users so that they can traverse the directory tree. Set the /tmp permissions using the following command:
    hadoop fs -chmod +wx /tmp
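Steps 1 through 4 can also be scripted. The following is a minimal sketch, not an official Team Studio script. It assumes a shell with HDFS superuser rights (for example, running as the hdfs user), a staging directory that defaults to /user, and that making chorus the owner of its directory is an acceptable way to grant the permissions in step 2. It picks 775 for /user, which only helps Active Directory users who belong to the group of /user; use 777 if they do not. The helper names run and setup_chorus_dirs and the variables DRY_RUN and STAGING_DIR are hypothetical.

```shell
#!/bin/sh
# Minimal sketch of steps 1-4 (hypothetical helpers, not an official script).
# Assumptions: runs with HDFS superuser rights; STAGING_DIR defaults to /user;
# DRY_RUN=1 prints the commands instead of running them.
set -eu

STAGING_DIR="${STAGING_DIR:-/user}"
CHORUS_USER="chorus"

# Run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

setup_chorus_dirs() {
  # Steps 1 and 3: create <staging directory>/chorus; -p also creates missing parents.
  run hadoop fs -mkdir -p "$STAGING_DIR/$CHORUS_USER"
  # Step 2 (assumed approach): make chorus the owner and give the owner rwx.
  run hadoop fs -chown "$CHORUS_USER" "$STAGING_DIR/$CHORUS_USER"
  run hadoop fs -chmod u+rwx "$STAGING_DIR/$CHORUS_USER"
  # Step 4: drwxrwxr-x (775) lets members of the /user group create /user/<username>.
  run hadoop fs -chmod 775 /user
}
```

To use it, source the file, run DRY_RUN=1 setup_chorus_dirs to review the commands, and then run setup_chorus_dirs to apply them. Set STAGING_DIR when your staging directory is not /user.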
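To confirm that the /tmp permissions took effect, you can inspect the permission string that HDFS reports for /tmp, such as the first column of the /tmp line printed by hadoop fs -ls -d /tmp. The sketch below checks the "others" triad (characters 8 to 10) for write and execute access. tmp_is_open is a hypothetical helper name. It accepts t in the execute position because HDFS displays t when both the sticky bit and execute are set, and T when the sticky bit is set without execute.

```shell
#!/bin/sh
# Minimal sketch: return 0 if a permission string such as drwxrwxrwx grants
# all users write and execute (traverse) access, 1 otherwise.
# tmp_is_open is a hypothetical helper, not part of Hadoop or Team Studio.
tmp_is_open() {
  perms="$1"
  # Characters 8-10 are the "others" triad, e.g. rwx in drwxrwxrwx.
  others=$(printf '%s\n' "$perms" | cut -c8-10)
  case "$others" in
    # w must be set; execute appears as x, or as t when the sticky bit is also set.
    ?w[xt]) return 0 ;;
    *) return 1 ;;
  esac
}

# Example (assumes a configured Hadoop client):
#   tmp_is_open "$(hadoop fs -ls -d /tmp | awk '$NF == "/tmp" {print $1}')" && echo "/tmp is open"
```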