Configuring the HDFS Directory and Permissions for Results File Storage
This procedure describes how to create an HDFS directory for a user, set user permissions, and set permissions on temporary directories. Perform the following steps when connecting to a Spark cluster with YARN.
Note: For the New Workflow engine, perform step 1 and step 2. For the Legacy Workflow engine, perform step 1 through step 5.
Procedure
1. Create an HDFS directory for the user "tds". This directory is used to cache uploaded .jar files such as spark-assembly.jar.
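For example, assuming the default /user parent path and a user with permission to create directories under it (such as the hdfs superuser), a command like the following creates the directory:
hadoop fs -mkdir -p /user/tds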
2. Provide the user with read, write, and execute permissions for the /user/tds directory.
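For example, commands such as the following, run as the HDFS superuser, assign ownership of the directory to "tds" and grant the owner read, write, and execute permissions (the mode shown is a starting point, for example 700 or 770; adjust it to your site's security policy):
hadoop fs -chown tds /user/tds
hadoop fs -chmod 700 /user/tds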
3. The staging directory is typically set to /user. If it is not, create the directory under the modified path /<staging directory>/tds.
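The same command pattern applies; substitute the actual staging path for the /<staging directory> placeholder:
hadoop fs -mkdir -p /<staging directory>/tds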
4. To run Pig jobs, the TIBCO Data Science - Team Studio application attempts to create the folder /user/<username> as the Active Directory user. By default, the permissions on the /user directory are set to hdfs:supergroup:drwxr-xr-x, which prevents TIBCO Data Science - Team Studio from creating that folder. To grant write access to the Active Directory users who run the TIBCO Data Science - Team Studio application, change the permissions to drwxrwxr-x or drwxrwxrwx, as in the example that follows.
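For example, a command such as the following sets drwxrwxr-x on /user (use mode 777 for drwxrwxrwx only if world-writable access is acceptable at your site):
hadoop fs -chmod 775 /user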
5. Set permissions on the temporary directories.
To run YARN, Pig, and similar jobs, each individual user might need to write temp files to the temporary directories. There are many Hadoop temp directories, such as hadoop.tmp.dir and pig.tmp.dir, all of which are based on the /tmp directory by default. Therefore, the /tmp directory must be writable by everyone so that they can run different jobs. Additionally, it must be executable by everyone so that they can recurse into the directory tree. Set the /tmp permissions using the following command:
hadoop fs -chmod +wx /tmp
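Optionally, the resulting permissions on /tmp can be verified with a listing such as:
hadoop fs -ls -d /tmp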