Hadoop Connection Prerequisites
This checklist helps ensure that all Team Studio components of a typical Hadoop-based installation are accounted for and completed.
The Hadoop connection configuration requires the HDFS host, HDFS port, Jobtracker host, and Jobtracker port. All Hadoop node hostnames must resolve to the correct machines from the Team Studio server. If the values provided in the form below are not valid, Team Studio needs access to a Hadoop administrator or to someone who can read the Hadoop configuration files (*-site.xml). If the Hadoop hostnames do not resolve, the hosts file of the Team Studio server might also have to be changed.
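The hostname requirement above can be checked before installation with a short script run on the Team Studio server. The following is a minimal sketch, not part of Team Studio: the function names `resolve_hosts` and `port_is_open` are hypothetical, and it uses only the Python standard library.

```python
import socket


def resolve_hosts(hostnames, resolver=socket.gethostbyname):
    """Map each hostname to its resolved IPv4 address, or None if lookup fails."""
    results = {}
    for name in hostnames:
        try:
            results[name] = resolver(name)
        except OSError:  # socket.gaierror is a subclass of OSError
            results[name] = None
    return results


def port_is_open(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling `resolve_hosts` with the list of Hadoop node hostnames returns a dictionary in which any entry with the value `None` points to a name that needs a DNS fix or a hosts file entry. `port_is_open` can then be used with the HDFS and Jobtracker ports to confirm that they are reachable.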
If the Hadoop cluster is not configured for Kerberos, the connection takes approximately two hours to configure and test. If the cluster is configured for Kerberos, the user running Team Studio on the Team Studio server must have a keytab to authenticate with Kerberos, and keytabs for the NameNode and Jobtracker must be located on the Team Studio server. If any of these three keytabs is missing or invalid, a Hadoop administrator must be available to contact during installation. Configuring the initial connection to a cluster configured for Kerberos takes approximately four hours.
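For a cluster configured for Kerberos, a quick presence check of the three keytab files can catch a missing file before installation starts. This is a minimal sketch under stated assumptions: the function name `check_keytabs` is hypothetical, the paths passed in are whatever locations the site uses, and the check only confirms that each file exists, is readable, and is not empty. It does not validate the keytab contents or authenticate against Kerberos.

```python
import os


def check_keytabs(paths):
    """Return a dict mapping each keytab path to a short status string."""
    status = {}
    for path in paths:
        if not os.path.isfile(path):
            status[path] = "missing"
        elif not os.access(path, os.R_OK):
            status[path] = "not readable"
        elif os.path.getsize(path) == 0:
            status[path] = "empty"
        else:
            status[path] = "ok"
    return status
```

Run the check as the same operating system user that runs Team Studio, because read permission is evaluated for the current user.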
Question | Response | For Reference |
---|---|---|
What are the HDFS host and port? | | Can be found in core-site.xml as fs.default.name: hdfs://HDFSHOST:HDFSPORT |
Question | Response | For Reference |
---|---|---|
What is the name of the name service? | | Can be found in hdfs-site.xml as dfs.nameservices, for example nameservice1 |
What is the value for dfs.ha.namenodes.<nameservice>? | | Can be found in hdfs-site.xml using the name of the name service. |
What are the values for dfs.namenode.rpc-address.<nameservice>.<namenode>? | | Can be found in hdfs-site.xml using the name of the name service and each NameNode specified in the previous row. |
What is the value for dfs.client.failover.proxy.provider.<nameservice>? | | Can be found in hdfs-site.xml using the name of the name service. |
Question | Response | For Reference |
---|---|---|
What are the Jobtracker host and port? | | Can be found in mapred-site.xml as mapred.job.tracker: JOBHOST:JOBPORT |
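The values requested in the tables above can be read directly from the Hadoop configuration files instead of being copied by hand. The following is a minimal sketch using only the Python standard library. The function names (`read_site_properties`, `split_host_port`, `ha_namenode_addresses`) are hypothetical, and the sketch assumes the standard Hadoop layout of `<property>` elements with `<name>` and `<value>` children.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit


def read_site_properties(xml_text):
    """Return {name: value} for every <property> in a Hadoop *-site.xml document."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


def split_host_port(value):
    """Split 'hdfs://host:port' or a bare 'host:port' into (host, port)."""
    if "://" not in value:
        value = "//" + value  # lets urlsplit treat a bare host:port as a network location
    parts = urlsplit(value)
    return parts.hostname, parts.port


def ha_namenode_addresses(props):
    """Return {nameservice: {namenode_id: rpc_address}} from hdfs-site.xml properties."""
    result = {}
    for ns in props.get("dfs.nameservices", "").split(","):
        ns = ns.strip()
        if not ns:
            continue
        ids = [i.strip() for i in props.get(f"dfs.ha.namenodes.{ns}", "").split(",")]
        result[ns] = {
            nn: props.get(f"dfs.namenode.rpc-address.{ns}.{nn}")
            for nn in ids
            if nn
        }
    return result
```

For example, passing the text of core-site.xml to `read_site_properties` and then the `fs.default.name` value to `split_host_port` yields the HDFS host and port. Note that `urlsplit` returns the hostname in lowercase, and a NameNode ID with no matching rpc-address property maps to `None`.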