HDFSOperation

You can use the HDFSOperation activity to do basic operations on files in HDFS, including copying files between HDFS and a local system, renaming files in HDFS, and deleting files from HDFS.

General

In the General tab, you can specify the activity name in the process, establish a connection to HDFS, and select the specific HDFS operation that you want to perform.

The following table lists the configurations in the General tab of the HDFSOperation activity:

Field Module Property? Description
Name No The name to be displayed as the label for the activity in the process.
HDFSConnection Yes The HDFS Connection shared resource that is used to create a connection between the plug-in and HDFS. Click to select an HDFS Connection shared resource.

If no matching HDFS Connection shared resources are found, click Create Shared Resource to create one. For more details, see Creating an HDFS Connection.

HDFSOperation No The HDFS operation that you want to perform. Select an HDFS operation from the list:
  • PUT_LOCAL_TO_HDFS: copy local files or folders to HDFS.
  • GET_HDFS_TO_LOCAL: copy files or folders from HDFS to a local file system.
  • RENAME_HDFS: rename files in HDFS.
  • DELETE_FROM_HDFS: delete files from HDFS.
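This page does not say how the plug-in communicates with HDFS. As a rough analogy only, the four operations line up with the CREATE, OPEN, RENAME, and DELETE operations of Hadoop's WebHDFS REST API. The sketch below builds the HTTP method and URL for each one. The mapping itself, the helper name `webhdfs_request`, and the host and port in `base` are all assumptions made for illustration, not documented plug-in behavior.

```python
from urllib.parse import quote, urlencode

# Assumed mapping from the activity's operations to WebHDFS (HTTP method, op).
# Illustration only; the plug-in's internals are not described in this page.
WEBHDFS_OPS = {
    "PUT_LOCAL_TO_HDFS": ("PUT", "CREATE"),
    "GET_HDFS_TO_LOCAL": ("GET", "OPEN"),
    "RENAME_HDFS": ("PUT", "RENAME"),
    "DELETE_FROM_HDFS": ("DELETE", "DELETE"),
}


def webhdfs_request(operation, hdfs_path, base="http://namenode.example:9870", **params):
    """Return (http_method, url) for a hypothetical WebHDFS call.

    `base` is a placeholder NameNode address; extra keyword arguments
    become query parameters after the mandatory `op` parameter.
    """
    method, op = WEBHDFS_OPS[operation]
    query = urlencode({"op": op, **params})
    return method, f"{base}/webhdfs/v1{quote(hdfs_path)}?{query}"
```

For example, `webhdfs_request("DELETE_FROM_HDFS", "/tmp/old", recursive="true")` yields the `DELETE` method and a URL ending in `/webhdfs/v1/tmp/old?op=DELETE&recursive=true`.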

Description

In the Description tab, you can enter a short description for the HDFSOperation activity.

Input

In the Input tab, you can configure the HDFS operation that you select in the General tab. The input elements of the HDFSOperation activity vary depending on the HDFS operation that you select in the General tab.

The following table lists all the possible input elements in the Input tab of the HDFSOperation activity:
Note: For the PUT_LOCAL_TO_HDFS and GET_HDFS_TO_LOCAL operations, you can optionally copy all the files in a source folder to a destination folder. To do so, provide the full folder path in both sourceFilePath and destinationFilePath.
Input Item Data Type Description
HDFS Complex The HDFS operation configuration.

This element contains the elements from sourceFilePath to recursive that are listed in this table.

sourceFilePath String The path of the source file. Alternatively, to copy multiple files to a destination folder, you can provide the path to the folder that contains the source file(s). The plug-in will automatically copy all the files in the source folder to the destination folder that you specify in destinationFilePath.
destinationFilePath String The path of the destination file or folder into which you want the source files copied.
overwrite Boolean Specifies whether to overwrite an existing file that has the same name in the specified destination path: 1 (true) or 0 (false).
blockSize Long The block size of the file. The value in this field must be greater than 0.
replication Short The number of replications of the file. The value in this field must be greater than 0.
permission Integer The permission of the file. The value in this field must be in the range 0 - 777.
offset Long The starting byte position. The value in this field must be 0 or greater.
length Long The number of bytes to be processed.
bufferSize Integer The size of the buffer that is used in transferring data. The value in this field must be greater than 0.
recursive Boolean Specifies whether to operate on the content in the subdirectories: 1 (true) or 0 (false).
timeout Long The amount of time, in milliseconds, to wait for this activity to complete.

By default, this activity does not time out if you do not specify a value.
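The value constraints in the table above can be checked before the activity runs. The following is a minimal sketch, assuming the input is available as a plain dictionary keyed by element name. The function name `validate_hdfs_input` is hypothetical, and treating absent elements as optional is an assumption, since the table does not state which elements are required.

```python
def validate_hdfs_input(hdfs):
    """Return violations of the constraints stated in the input table.

    `hdfs` is a plain dict keyed by input element name. Absent elements
    are skipped rather than reported (an assumption; the table does not
    say which elements are mandatory).
    """
    errors = []
    # blockSize, replication, and bufferSize must all be greater than 0.
    for name in ("blockSize", "replication", "bufferSize"):
        if name in hdfs and hdfs[name] <= 0:
            errors.append(f"{name} must be greater than 0")
    # The table gives the permission range literally as 0 - 777.
    if "permission" in hdfs and not 0 <= hdfs["permission"] <= 777:
        errors.append("permission must be in the range 0 - 777")
    # offset is a starting byte position, so it cannot be negative.
    if "offset" in hdfs and hdfs["offset"] < 0:
        errors.append("offset must be 0 or greater")
    return errors
```

An empty list means that none of the stated constraints is violated. The check does not verify that the paths exist or that the values suit the selected operation.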

Output

In the Output tab, you can view whether the execution is successful.

The following table lists the output elements in the Output tab of the HDFSOperation activity:

Output Item Data Type Description
HDFS Complex The execution result of the HDFS operation.

This element contains the status and the msg elements.

status Integer A standard HTTP status code that indicates whether the execution is successful.
msg String The execution message.
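Because status is a standard HTTP status code, a downstream step can decide success by checking the code's class. The sketch below assumes the usual HTTP convention that any 2xx code means success. This page does not list the specific codes the plug-in returns, and the function name `is_successful` and the dictionary representation of the output are hypothetical.

```python
def is_successful(output):
    """Treat any 2xx HTTP status code in the HDFS output as success.

    `output` is a dict with the `status` and `msg` elements described
    in the output table.
    """
    return 200 <= output["status"] <= 299
```

With this convention, an output such as `{"status": 200, "msg": "OK"}` counts as successful, and one with status 404 does not.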

Fault

In the Fault tab, you can view the error code and error message of the HDFSOperation activity. See Error Codes for a more detailed explanation of the errors.

The following table lists the error schema elements in the Fault tab of the HDFSOperation activity:

Error Schema Element Data Type Description
msg String The error message description that is returned by the plug-in.
msgCode String The error code that is returned by the plug-in.
exception String The exception that occurs when the plug-in encounters an internal error.
message String The error message that is returned by the server.
javaClassName String The name of the Java class where an error occurs.