HDFS File Writer Adapter

Introduction

The TIBCO StreamBase® File Writer Adapter for Apache HDFS writes a file to a configured Hadoop Distributed File System resource.

The adapter acts on the tuples it receives: tuples on the control input port open and close the file, and tuples on the data port supply the contents to write.

The adapter has a sample, described in HDFS Delete Adapter Sample.

HDFS File Writer Properties

This section describes the properties you can set for this adapter, using the various tabs of the Properties view in StreamBase Studio.

General Tab

Name: Use this field to specify or change the component's name, which must be unique in the application. The name must contain only alphabetic characters, numbers, and underscores, and no hyphens or other special characters. The first character must be alphabetic or an underscore.

Adapter: A read-only field that shows the formal name of the adapter.

Class: A field that shows the fully qualified class name that implements the functionality of this adapter. Use this class name when loading the adapter in StreamSQL programs with the APPLY JAVA statement. You can right-click this field and select Copy from the context menu to place the full class name in the system clipboard.

Start with application: If this field is set to Yes or to a module parameter that evaluates to true, an instance of this adapter starts as part of the containing StreamBase Server. If this field is set to No or to a module parameter that evaluates to false, the adapter is loaded with the server, but does not start until you send an sbadmin resume command, or until you start the component with StreamBase Manager. With this option set to No or false, the adapter does not start even if the application as a whole is suspended and later resumed. The recommended setting is selected by default.

Enable Error Output Port: Select this check box to add an Error Port to this component. In the EventFlow canvas, the Error Port shows as a red output port, always the last port for the component. See Using Error Ports and Error Streams to learn about Error Ports.

Description: Optionally enter text to briefly describe the component's purpose and function. In the EventFlow canvas, you can see the description by pressing Ctrl while the component's tooltip is displayed.

Adapter Properties Tab

Default File Name: The default file name to create on start-up. If no file is specified, the control port must be enabled so that control tuples can open a file for writing.

Default User: The default user if none is provided on the control input port.

Data Field: The field from the incoming data port to write to the file. If left blank, the entire tuple is written as a string.

Enable Control Port: If checked, enables the control port, which allows the user to send control commands to open, flush, and close files.

Enable Status Port: If checked, enables the status port, which reports the actions being taken against files.

Write Line Separator After Tuple: If checked, the system new line character is written after each tuple is received and written to the file.

File Create Mode: Determines how a file is created:
  1. Append - If the file exists, its contents remain and the new data is appended at the end.

  2. Overwrite - If the file exists, its contents are overwritten.

  3. Fail - If the file exists, a Reject status tuple is emitted and the file is not opened for writing.

File Compression: Determines how a file is compressed when written:
  1. None - The file is written with no compression.

  2. GZip - The file is written with the GZip compression format.

  3. BZip2 - The file is written with the BZip2 compression format.

  4. Zip - The file is written with the standard Zip format and contains a single entry with the same name as the file name you specify, minus the extension.

Flush Interval: The interval, in milliseconds, at which the file is flushed to write out any buffered data. A value of 0 means a flush is performed after each tuple is received.

Log Level: Controls the level of verbosity the adapter uses to send notifications to the console. This setting can be higher than the containing application's log level. If set lower, the system log level is used. Available values, in increasing order of verbosity, are: OFF, ERROR, WARN, INFO, DEBUG, TRACE, and ALL.
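
The four File Compression modes can be illustrated with a small local-file analogue. This is only a sketch using the Python standard library, not the adapter's implementation: the function names zip_entry_name and write_compressed are hypothetical, the files are written to the local file system rather than HDFS, and the sketch assumes that the Zip entry name is derived from the base name of the file.

```python
import bz2
import gzip
import os
import tempfile
import zipfile


def zip_entry_name(file_name: str) -> str:
    """Entry name for Zip mode: the file's base name minus its last extension.

    Assumption: only the base name is used, not the full path.
    """
    base = os.path.basename(file_name)
    return os.path.splitext(base)[0]


def write_compressed(path: str, payload: bytes, compression: str = "None") -> None:
    """Write payload to a local path using one of the four documented modes."""
    if compression == "None":
        with open(path, "wb") as f:
            f.write(payload)
    elif compression == "GZip":
        with gzip.open(path, "wb") as f:
            f.write(payload)
    elif compression == "BZip2":
        with bz2.open(path, "wb") as f:
            f.write(payload)
    elif compression == "Zip":
        # A Zip archive holding exactly one entry.
        with zipfile.ZipFile(path, "w", zipfile.ZIP_DEFLATED) as archive:
            archive.writestr(zip_entry_name(path), payload)
    else:
        raise ValueError(f"unknown compression mode: {compression}")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "trades.zip")
        write_compressed(target, b"a,b,c\n", "Zip")
        with zipfile.ZipFile(target) as archive:
            print(archive.namelist())  # prints ['trades']
```

A Zip archive is a container of entries rather than a plain compressed stream, so the Zip mode behaves differently from the GZip and BZip2 modes, which compress the written bytes directly.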

HDFS Tab

Buffer Size (Bytes), int: The size of the buffer to use. If empty, the default is used.

Replication, short: The required block replication for the file. If empty, the server default is used. Only used during file creation.

Block Size (Bytes), long: The default data block size. If empty, the server default is used. Only used during file creation.
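
The "if empty, use the default" convention for these three properties can be sketched as follows. The helper parse_optional is hypothetical, and the sketch assumes that the int, short, and long data types listed here correspond to the Java primitive types of the same names; it does not reflect how the adapter itself parses the fields.

```python
from typing import Optional

# Value ranges of the Java primitive types (assumed to match the listed data types).
JAVA_RANGES = {
    "int": (-2**31, 2**31 - 1),
    "short": (-2**15, 2**15 - 1),
    "long": (-2**63, 2**63 - 1),
}


def parse_optional(text: str, java_type: str) -> Optional[int]:
    """Return None for an empty field (meaning "use the default"),
    otherwise the value as an int after a range check."""
    text = text.strip()
    if not text:
        return None
    value = int(text)
    low, high = JAVA_RANGES[java_type]
    if not low <= value <= high:
        raise ValueError(f"{value} does not fit in a Java {java_type}")
    return value


if __name__ == "__main__":
    print(parse_optional("", "int"))        # prints None
    print(parse_optional("3", "short"))     # prints 3
    print(parse_optional("134217728", "long"))  # prints 134217728
```

Under this assumption, a Replication value such as 40000 would be rejected because it exceeds the range of a short.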

Concurrency Tab

Use the Concurrency tab to specify parallel regions for this instance of this component, or multiplicity options, or both. The Concurrency tab settings are described in Concurrency Options, and dispatch styles are described in Dispatch Styles.

Caution

Concurrency settings are not suitable for every application, and using these settings requires a thorough analysis of your application. For details, see Execution Order and Concurrency, which includes important guidelines for using the concurrency options.

Description of This Adapter's Ports

The File Writer adapter's ports are used as follows:

  • Data (input): Tuples are received on this port and written to the current file. The schema for this port can be anything.

  • Control (input): Tuples enqueued on this port cause the adapter to perform actions on files. The schema for this port has the following fields:

    • Command, string, the command to perform:

      1. Open - Open a file for writing with the options given.

      2. Close - Close the currently open file.

      3. Flush - Force a flush operation on the current file to write out any buffered data.

    • FileName, string, Required by the Open command, the name of the file to open.

    • CreateMode, string, determines how a file will be created, if null this remains unchanged and the last value loaded is used:

      1. Append - If the file exists, its contents remain and the new data is appended at the end.

      2. Overwrite - If the file exists its contents will be overwritten.

      3. Fail - If the file exists a Reject status tuple will be emitted and the file will not be opened for writing.

    • Compression, string, determines how a file will be compressed when written, if null this remains unchanged and the last value loaded is used:

      1. None - The file will be written with no compression.

      2. GZip - The file will be written with the GZip compression format.

      3. BZip2 - The file will be written with the BZip2 compression format.

      4. Zip - The file will be written with the standard Zip format and contain a single entry with the same name as the file name you specify minus the extension.

    • FlushInterval, int, the interval, in milliseconds, at which the file is flushed to write out any buffered data, if null this remains unchanged and the last value loaded is used.

    • WriteLineSeparator, boolean, determines if the system new line character will be written after each tuple is received, if null this remains unchanged and the last value loaded is used.

  • Status (output): The adapter emits tuples from this port when significant events occur, such as when an attempt to open, flush, or close a file occurs. The schema for this port has the following fields:

    • Type, string: returns one of the following values to convey the type of event:

      • User Input

      • System

    • Action, string: returns an action associated with the event Type:

      • Rejected

      • Open

      • Close

      • Flush

      • Error

    • Object, string: returns an event type-specific value, such as the name of the file for which an open failed or the control input tuple that was rejected.

    • Message, string: Returns a human-readable description of the event.

    • Time, timestamp: Returns the time this event occurred.

    • InputTuple, tuple: Returns a combined data and control tuple which was used during this event.
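
The rule that a null control field leaves the last loaded value in place can be modeled with a short sketch. The class and function names (WriterSettings, ControlTuple, merge_settings) are hypothetical, the initial setting values are placeholders rather than the adapter's documented defaults, and the sketch assumes the merge is applied to every control tuple regardless of command.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class WriterSettings:
    # Placeholder starting values, not the adapter's documented defaults.
    create_mode: str = "Append"
    compression: str = "None"
    flush_interval: int = 0
    write_line_separator: bool = True


@dataclass(frozen=True)
class ControlTuple:
    # Field names follow the control port schema described above.
    Command: str
    FileName: Optional[str] = None
    CreateMode: Optional[str] = None
    Compression: Optional[str] = None
    FlushInterval: Optional[int] = None
    WriteLineSeparator: Optional[bool] = None


# Control field name -> settings attribute name.
FIELD_MAP = {
    "CreateMode": "create_mode",
    "Compression": "compression",
    "FlushInterval": "flush_interval",
    "WriteLineSeparator": "write_line_separator",
}


def merge_settings(current: WriterSettings, control: ControlTuple) -> WriterSettings:
    """Apply non-null control fields; null (None) fields keep the last value."""
    if control.Command == "Open" and control.FileName is None:
        raise ValueError("the Open command requires FileName")
    updates = {
        setting: getattr(control, field)
        for field, setting in FIELD_MAP.items()
        if getattr(control, field) is not None
    }
    return replace(current, **updates)


if __name__ == "__main__":
    settings = WriterSettings()
    settings = merge_settings(
        settings, ControlTuple("Open", FileName="/tmp/a.gz", Compression="GZip"))
    settings = merge_settings(
        settings, ControlTuple("Open", FileName="/tmp/b.gz", FlushInterval=500))
    print(settings)
```

In this example the second Open tuple leaves Compression null, so the GZip value from the first tuple is still in effect while the flush interval changes to 500.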

Typechecking and Error Handling

The File Writer adapter uses typecheck messages to help you configure the adapter within your StreamBase application. In particular, the adapter generates typecheck messages for the following reasons:

  • The Control Input Port does not have the required schema.

  • The field specified in the Data Field property does not exist in the incoming data schema.

  • The flush interval is not an integer value greater than or equal to 0.

  • The Default File Name property is blank and the control port is disabled.

  • The compression mode is Zip and the file create mode is Append.
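
The property-level conditions in the list above can be expressed as a simple validation function. This is a sketch only: the function typecheck and its message strings are hypothetical, the parameters stand in for the adapter's properties, and the control port schema check is omitted because it depends on the full schema definition.

```python
from typing import List, Sequence


def typecheck(
    data_fields: Sequence[str],
    data_field: str = "",
    flush_interval: object = 0,
    default_file_name: str = "",
    control_port_enabled: bool = True,
    compression: str = "None",
    create_mode: str = "Append",
) -> List[str]:
    """Return one message per violated condition (an empty list means no errors)."""
    messages = []
    # The Data Field, when set, must name a field in the incoming data schema.
    if data_field and data_field not in data_fields:
        messages.append(f"data field '{data_field}' is not in the data schema")
    # The flush interval must be an integer >= 0 (bool is excluded explicitly,
    # because bool is a subclass of int in Python).
    if (isinstance(flush_interval, bool)
            or not isinstance(flush_interval, int)
            or flush_interval < 0):
        messages.append("flush interval must be an integer >= 0")
    # Without a default file name, only the control port can open a file.
    if not default_file_name and not control_port_enabled:
        messages.append("no default file name and the control port is disabled")
    # Zip compression cannot be combined with the Append create mode.
    if compression == "Zip" and create_mode == "Append":
        messages.append("Zip compression cannot be used with Append mode")
    return messages


if __name__ == "__main__":
    for message in typecheck(
        ["symbol", "price"],
        data_field="quantity",
        compression="Zip",
        create_mode="Append",
    ):
        print(message)
```

The example configuration violates two conditions, so two messages are printed: one for the unknown data field and one for the Zip and Append combination.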

Suspend and Resume Behavior

When suspended, the adapter stops processing requests to write files.

When resumed, the adapter once again starts processing requests to write files.
