Setting up Hadoop Distributed File System
Apache Spark is compatible with Hadoop data. You can run Spark in Hadoop clusters through YARN or in Apache Spark's standalone mode, and it can process data stored in the Hadoop Distributed File System (HDFS). HDFS is highly fault tolerant and efficient for parallel data processing. HDFS takes in data, breaks it into separate blocks, and distributes the blocks to different nodes in a cluster.
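The block splitting and distribution described above can be observed on a running cluster. The following is a sketch, assuming the Hadoop bin directory is on the PATH and a local file named sample.txt exists (both are assumptions, not part of this guide):

```shell
# Copy a local file into HDFS; HDFS splits it into blocks
# (128 MB by default) and replicates them across DataNodes.
hdfs dfs -mkdir -p /user/username
hdfs dfs -put sample.txt /user/username/sample.txt

# Report how the file was split into blocks and where each
# block replica is stored.
hdfs fsck /user/username/sample.txt -files -blocks -locations
```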
Prerequisites
Create the local directories in which HDFS stores NameNode metadata and DataNode blocks. For example, home/username/hadoop/hdfs/namenode and home/username/hadoop/hdfs/datanode.
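These directories are typically referenced from the hdfs-site.xml configuration file. A minimal sketch using the standard Hadoop property names, assuming the example directories above live under /home/username:

```xml
<configuration>
  <!-- Where the NameNode stores its metadata (fsimage, edit logs) -->
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///home/username/hadoop/hdfs/namenode</value>
  </property>
  <!-- Where DataNodes store the actual data blocks -->
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///home/username/hadoop/hdfs/datanode</value>
  </property>
  <!-- Replication factor; 1 is common for a single-node setup -->
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
```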
Procedure
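Setting up HDFS typically involves formatting the NameNode once and then starting the HDFS and YARN daemons. A sketch of the standard Hadoop commands, assuming the HADOOP_HOME environment variable points to the Hadoop installation:

```shell
# Format the NameNode (run once, on first setup only;
# reformatting destroys existing HDFS metadata).
$HADOOP_HOME/bin/hdfs namenode -format

# Start the HDFS daemons: NameNode, DataNode, SecondaryNameNode.
$HADOOP_HOME/sbin/start-dfs.sh

# Start the YARN daemons: ResourceManager and NodeManager
# (needed to run Spark on YARN).
$HADOOP_HOME/sbin/start-yarn.sh
```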
What to do next
- To verify that HDFS is running correctly, run the jps command from any path. The following processes with their IDs are displayed:
[username@TIBCO MDM master machine name sbin]# jps
5607 Jps
4634 DataNode
4842 SecondaryNameNode
5132 NodeManager
4527 NameNode
5023 ResourceManager
- Configure TIBCO MDM with Apache Spark. For information, see the "Configuration Properties for Apache Spark" and "Required JAR Files for Apache Spark" sections in TIBCO MDM User's Guide.
Copyright © Cloud Software Group, Inc. All rights reserved.