Hadoop Cluster Configuration
This section shows how to configure several Kerberos settings on the Hadoop cluster. It includes examples of the files you must change; customize the value of each key as necessary.
According to the Cloudera documentation (Configure Secure YARN), you must set the YARN configuration that is detailed in Configuring HDFS and YARN. Similarly, the book Hadoop Security (O'Reilly), by Ben Spivey and Joey Echeverria, gives the following advice.
"In addition to configuring the NodeManager to use Kerberos for authentication, we need to configure the NodeManager to use the LinuxContainerExecutor. The LinuxContainerExecutor uses a setuid binary to launch YARN containers. This allows each NodeManager to run the containers using the UID of the user that submitted the job. This is required in a secure configuration to ensure that Alice can't access files created by a container Bob launched. Without the LinuxContainerExecutor, all of the containers would run as the yarn user and containers could access each other's local files. First set the following parameters in the yarn-site.xml file" (p. 57).
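The parameters that the quotation refers to are not reproduced in this section. As a minimal sketch, assuming the stock Apache Hadoop property names and a NodeManager Unix group called yarn (the group name is an assumption; use the one defined on your cluster), the yarn-site.xml entries might look like this:

```xml
<!-- yarn-site.xml (fragment): launch containers through the LinuxContainerExecutor -->
<property>
  <name>yarn.nodemanager.container-executor.class</name>
  <value>org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor</value>
</property>
<property>
  <!-- Assumption: the NodeManager runs under the Unix group "yarn"; substitute your own group -->
  <name>yarn.nodemanager.linux-container-executor.group</name>
  <value>yarn</value>
</property>
```

The same group is normally expected to appear in the container-executor.cfg file on each NodeManager host as well; check your distribution's documentation for the exact requirements before applying the change.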
In Team Studio testing, this setting was not required to enable Kerberos authentication. We include the LinuxContainerExecutor configuration because both Cloudera and Hadoop Security recommend it. The system administrator must determine whether to apply it.
- Configuring HDFS and YARN
To configure HDFS and YARN for your Kerberos integration of Team Studio, follow these steps.
- Setting HDFS Permissions
Setting HDFS permissions includes creating a serviceuser directory and then setting permissions for Active Directory and various temporary directories.