CSV File Writer Output Adapter

Introduction

The TIBCO StreamBase® CSV File Writer saves tuples to files in comma-separated value (CSV) format.

A number of options control how and when files are created, how large files can grow, and how strings are written to the file.
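For example, with the default comma delimiter, the header row included, and strings quoted only when necessary (all described in Properties below), a stream of tuples with hypothetical fields symbol, price, and quantity might produce a file like this:

symbol,price,quantity
IBM,125.38,100
"BRK,A",412.07,50

The string in the last row is quoted because it contains the field delimiter.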

Tip

In the StreamBase application that contains the CSV File Writer adapter, if the output CSV file will be used by an application that requires a specific order of fields, and the fields in the stream's tuples do not match that order, you can use a Map operator to arrange the fields as needed.

Tip

If you write a CSV file containing timestamps and import it into Microsoft Excel, by default Excel does not display fractions of seconds. To display times with millisecond precision, in Excel assign timestamp columns a custom format that you define as hh:mm:ss.000.

This adapter writes files to the directory named in a configuration file, or to a default location in the node directory, as described in Locating the Output Files.

Properties

This section describes the properties you can set for this adapter, using the various tabs of the Properties view in StreamBase Studio.

In the property listings in this section, each entry begins with the property name as it appears in the one or more adapter properties tabs of the Properties view for this adapter, followed by the property's data type, default value, and description. The corresponding StreamSQL property name ends each entry.

If you are maintaining a legacy StreamSQL program, use the StreamSQL names of the adapter's properties when using this adapter with the APPLY JAVA statement, or when specifying properties for a stream-to-CSV container connection (which uses this adapter's technology).

General Tab

Name: Use this required field to specify or change the component's name, which must be unique in the current module. The name must contain only alphabetic characters, numbers, and underscores, and no hyphens or other special characters. The first character must be alphabetic or an underscore.

Adapter: A read-only field that shows the formal name of the adapter.

Class name: Shows the fully qualified class name that implements the functionality of this adapter. If you need to reference this class name elsewhere in your application, you can right-click this field and select Copy from the context menu to place the full class name in the system clipboard.

Start with application: If this field is set to Yes (default) or to a module parameter that evaluates to true, this instance of this adapter starts as part of the JVM engine that runs this EventFlow fragment. If this field is set to No or to a module parameter that evaluates to false, the adapter instance is loaded with the engine, but does not start until you send an epadmin container resume command (or its sbadmin equivalent), or until you start the component with StreamBase Manager.

Enable Error Output Port: Select this check box to add an Error Port to this component. In the EventFlow canvas, the Error Port shows as a red output port, always the last port for the component. See Using Error Ports to learn about Error Ports.

Description: Optionally enter text to briefly describe the component's purpose and function. In the EventFlow canvas, you can see the description by pressing Ctrl while the component's tooltip is displayed.

File Creation Tab

File Name (string, default sample.csv): Name of the file to write to. Specify the directory to contain the output files with a dataAreaPath property in a configuration file, or accept the default location in the node directory, as described in Locating the Output Files below.

If this adapter is configured with a maximum file size on the Rolling and Flushing tab, the date and time are appended to this file name when the maximum file size is reached.

When using the Compress Data option described below, TIBCO recommends specifying the .csv.gz file name extension. The .gz extension is not added automatically.

StreamSQL property: FileName.

Use Default Charset (check box, default selected): If selected, the Java platform's default character set is used. If cleared, a valid character set name must be specified for the Character Set property. StreamSQL property: UseDefaultCharset.

Character Set (string, default none): The name of the character set encoding that the adapter uses to write its output. StreamSQL property: Charset.

Include Header In File (check box, default selected): Specifies whether to include an optional row at the top of each file with the name of each column. StreamSQL property: IncludeHeaderInFile.

If File Doesn't Exist (radio buttons, default Create new file): Specifies the action to take if the specified CSV file does not exist when the adapter starts: Create new file or Fail. StreamSQL property: IfFileDoesntExist.

If File Exists (drop-down list, default Append to existing file): Specifies the action to take if the specified CSV file already exists when the adapter starts: Append to existing file, Truncate existing file, or Fail. StreamSQL property: IfFileExists.

Open File During Initialization (check box, default cleared): When selected, the output file is created, or opened and truncated, even if the adapter is not configured to start with the application, or the container in which the adapter is running has not started. StreamSQL property: OpenOutputFileDuringInit.

Compress Data (check box, default cleared): If selected, the adapter compresses its output in gzip format. StreamSQL property: CompressData.

Start Control Port (check box, default cleared): Select this check box to give this adapter instance a control port you can use to specify a new output file name. The schema for the control port must begin with a field of type string that conveys the name of the new file to open. When a tuple is enqueued on this port, the existing file, if any, is closed, and the new file is opened. A sketch following this table shows one way to enqueue such a tuple. StreamSQL property: StartControlPort.

Start Event Port (check box, default cleared): Select this check box to create an output port that emits an informational tuple each time a CSV file is opened or closed. The informational tuple schema has five fields:

  • Type, string

  • Object, string

  • Action, string

  • Status, int

  • Info, string

For a file open event, the event port tuple's Type field is set to Open, while the Object field is set to the file name of the CSV file being opened.

For a file close event, the event port tuple's Type field is set to Close, while the Object field is set to the file name of the CSV file being closed.

For both open and close operations, the Status field is set to 0 to indicate success or –1 to indicate failure. The Info field always contains a text message describing the event.

For data events, the event port tuple's Type field is set to Write or Flush, depending on the operation that caused the status event. The Data field contains the input data tuple. These status events are emitted only when Pass Through Data To Event Port is enabled.

StreamSQL property: StartEventPort.

Pass Through Data To Event Port (check box, default cleared): If enabled, when a status event occurs while a data tuple is being processed, that data tuple is included in the status event tuple. StreamSQL property: PassThroughDataToEventPort.
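The following is a minimal sketch of driving the control port from a client application, assuming the standard StreamBase Java client API (StreamBaseClient, getSchemaForStream, enqueue); the server URI, the stream name CsvControl, and the file name are hypothetical placeholders for your deployment:

import com.streambase.sb.Schema;
import com.streambase.sb.Tuple;
import com.streambase.sb.client.StreamBaseClient;

public class SwitchCsvFile {
    public static void main(String[] args) throws Exception {
        // Hypothetical server URI; substitute your fragment's address.
        StreamBaseClient client = new StreamBaseClient("sb://localhost:10000");
        try {
            // The control port schema begins with a string field that
            // names the new file to open.
            Schema controlSchema = client.getSchemaForStream("CsvControl");
            Tuple control = controlSchema.createTuple();
            control.setString(0, "trades-archive-2.csv");
            // Enqueuing this tuple closes the current file, if any,
            // and opens the named file.
            client.enqueue("CsvControl", control);
        } finally {
            client.close();
        }
    }
}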

Record Formatting Tab

Field Delimiter (string, default , (comma)): Specifies the character used to mark the end of one field and the beginning of another. Control characters can be entered as &#ddd;, where ddd is the character's ASCII value; for example, &#9; specifies a tab. StreamSQL property: FieldDelimiter.

String Quote Character (string, default " (double quote)): Specifies the character to use to quote strings when they contain the field delimiter. StreamSQL property: StringQuoteCharacter.

String Quote Option (drop-down list, default Quote if necessary): Specifies when string fields are quoted in the CSV file: Quote if necessary, Always quote, or Never quote. StreamSQL property: StringQuoteOption.

Null Value Representation (string, default null): Specifies the string to write when a field is null. StreamSQL property: NullValueRepresentation.

Timestamp Format (string, default yyyy-MM-dd HH:mm:ss.SSSZ): Determines the format of all timestamp values written to the CSV output. A sketch following this table illustrates the default pattern. StreamSQL property: TimestampFormat.

Add Timestamp (drop-down list, default None): Optionally prepends or appends a timestamp to each CSV row of output. The column name in the header row is Timestamp. StreamSQL property: AddTimestamp.

Capture Transform Strategy (radio buttons, default FLATTEN): The strategy to use when transforming capture fields for this operator: FLATTEN or NEST. StreamSQL property: CaptureTransformStrategy.
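As a point of reference, the default Timestamp Format pattern can be previewed with Java's SimpleDateFormat, assuming the adapter interprets the pattern letters with the standard Java conventions:

import java.text.SimpleDateFormat;
import java.util.Date;

public class TimestampFormatDemo {
    public static void main(String[] args) {
        // The adapter's default Timestamp Format pattern.
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.SSSZ");
        // Prints, for example: 2024-05-14 09:30:15.123-0400
        System.out.println(fmt.format(new Date()));
    }
}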

Rolling and Flushing Tab

Max File Size (int, default 0 (no rollover)): Maximum size, in bytes, of the file on disk. When the file reaches this limit, it is renamed with the current timestamp appended, and new data is written to the name specified in the File Name property. This field must contain either 0, to disable file size rolling, or an integer greater than 65535. StreamSQL property: MaxFileSize.

Max Roll Seconds (int, default 0 (no rollover)): The maximum number of seconds before file names are rolled over as described in the previous entry. The Roll Period and Max Roll Seconds properties are mutually exclusive. StreamSQL property: MaxRollSecs.

Roll Period (drop-down list, default None): Select among None, Weekly, Daily, and Hourly to specify the time period for automatic file rollover.

  • Weekly file rollover is performed at 12:00 AM every Monday. If the program starts after Monday, the rollover is performed the following Monday at 12:00 AM, and on subsequent Mondays thereafter. When rollover is performed, the current file is renamed by appending the previous week's Monday starting date, in the format yyyyMMdd.

  • Daily file rollover is performed at 12:00 AM every day by default. If the program starts during the day, the rollover is performed at the next 12:00 AM, and at subsequent midnights thereafter. When rollover is performed, the current file is renamed by appending the current date in the format yyyyMMdd. The rollover time can be changed with the Roll Hour, Roll Minute, and Roll Second properties.

  • Hourly rollover is performed every hour on the hour (1:00 AM, 2:00 AM, and so on). If the program starts after the hour (such as 1:20 PM), rollover starts at the next hour mark (such as 2:00 PM), and at subsequent hour marks. When rollover is performed, the current file is renamed by appending the timestamp in the format yyyyMMddHHmm.

StreamSQL property: RollPeriod.

Roll Hour (int, default 0): Selects the hour (0-23) at which to perform the file roll when Daily is selected as the roll period. The default of 0 means 12:00 AM. A sketch following this table shows how the roll time properties combine. StreamSQL property: RollHour.

Roll Minute (int, default 0): Selects the minute (0-59) at which to perform the file roll when Daily is selected as the roll period. StreamSQL property: RollMinute.

Roll Second (int, default 0): Selects the second (0-59) at which to perform the file roll when Daily is selected as the roll period. StreamSQL property: RollSecond.

Check for Roll at Startup (check box, default cleared): If selected, causes the adapter to roll the file at startup if, based on the file's last modification time, the configured roll period, and the current time, the file would have been rolled before the adapter was started. StreamSQL property: CheckForRollAtStartup.

Flush Interval (int, default 1): Specifies how often, in seconds, to force tuples to disk. Set this value to zero to flush immediately. StreamSQL property: FlushInterval.

Sync on Flush (check box, default cleared): If selected, StreamBase syncs operating system buffers to the file system on each flush, to make sure all changes are written. Using this option incurs a significant performance penalty. StreamSQL property: SyncOnFlush.
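To illustrate how the daily roll properties combine, the following sketch (illustrative only, not adapter code) computes the next daily roll time for hypothetical settings of Roll Hour = 2, Roll Minute = 30, and Roll Second = 0:

import java.time.Duration;
import java.time.LocalTime;
import java.time.ZonedDateTime;

public class NextDailyRoll {
    public static void main(String[] args) {
        // Hypothetical settings: roll the file daily at 02:30:00 local time.
        LocalTime rollTime = LocalTime.of(2, 30, 0);

        ZonedDateTime now = ZonedDateTime.now();
        ZonedDateTime next = now.with(rollTime);
        if (!next.isAfter(now)) {
            // Today's roll time has already passed, so the next roll is tomorrow.
            next = next.plusDays(1);
        }
        System.out.println("Next daily roll at " + next
                + " (in " + Duration.between(now, next) + ")");
    }
}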

Miscellaneous Tab

Throttle Error Messages (check box, default cleared): When selected, shows any particular error message only once. StreamSQL property: ThrottleErrorMessages.

Log Level (drop-down list, default INFO): Controls the level of verbosity the adapter uses to send notifications to the console. This setting can be higher than the containing application's log level; if set lower, the system log level is used. Available values, in increasing order of verbosity, are: OFF, ERROR, WARN, INFO, DEBUG, TRACE. StreamSQL property: LogLevel.

Concurrency Tab

Use the Concurrency tab to specify parallel regions for this instance of this component, or multiplicity options, or both. The Concurrency tab settings are described in Concurrency Options, and dispatch styles are described in Dispatch Styles.

Caution

Concurrency settings are not suitable for every application, and using these settings requires a thorough analysis of your application. For details, see Execution Order and Concurrency, which includes important guidelines for using the concurrency options.

Locating the Output Files

You provide the basename of the files written by this adapter in the File Name control of the File Creation tab of the Properties view. If you also specify file rollover features in the Rolling and Flushing tab, a timestamp is appended to this basename when the file rolls over.

You must specify the directory in which these files are written using a configuration file, as described in the next section. If you do not provide a configuration file, the adapter writes its output files in a default subdirectory of the node directory, as described in Default Output File Location.

Specifying Output File Location

Specify the location of files written by this adapter using the dataAreaPath property in a HOCON configuration file of type com.tibco.ep.streambase.configuration.sbengine, which is described in the Configuration Guide. Specify a full, absolute path to a directory. For example:

name = "sbengine"
version = "1.0.0"
type = "com.tibco.ep.streambase.configuration.sbengine"

configuration = {
  StreamBaseEngine = {
    streamBase = {
      dataAreaPath = "C:/Users/sbuser/Documents/datadir"
    }
  }
}

Make sure you specify a directory to which you have write access. If you are developing your EventFlow module on Windows or macOS but expect to deploy on Linux, specify a network directory accessible to all systems, or use a substitution variable to specify one location for development and another for deployment.

If your dataAreaPath property specifies an invalid location, the EventFlow module fails at runtime.

Default Output File Location

If you do not provide a HOCON configuration file of type sbengine with a valid dataAreaPath property, the adapter writes its output file to a default location in the module's node directory. When running your EventFlow module in StreamBase Studio, navigate down the following folder sequence to find the default location:

  1. Studio workspace folder

  2. .nodes (This is a hidden folder on macOS.)

  3. nodename (see the Clusters view to identify the node name for your module's most recent launch)

  4. fragments

  5. enginename (see the Clusters view)

  6. engine-data-area

You can also identify the default write location using the Engine Data Area line of the engine properties for the current node in the Clusters view. This property shows the full path to the parent folder of the engine-data-area folder.

When running your EventFlow module in a node installed and started with the epadmin command, locate the default engine data area directory with the following folder sequence:

  1. nodedirectory (as specified on the epadmin install node command line with the --nodedirectory option, or the epadmin command's current directory, if --nodedirectory is not specified).

  2. nodename (as specified with the --nodename option).

  3. application

  4. engines

  5. enginename (usually a long name that begins with default-engine).

  6. engine-data-area

If a node containing your EventFlow module is installed and running, whether started from Studio or with epadmin, you can extract the default engine data area location with a command like the following:

epadmin servicename=nodename display engine | grep "Engine Data Area"

Remember that the node directory is normally removed on successful stopping and removal of a node. The node directories for all nodes that Studio starts are removed when Studio exits, if the nodes completed successfully. The node directories for nodes that ended with an error condition are preserved.

Typechecking and Error Handling

Typechecking fails in the following circumstances:

  • The File Name is null or a zero-length string.

  • The Flush Interval is less than zero.

  • The Max File Size is less than zero.

  • More than one string quote character is specified.

  • More than one field delimiter is specified.

  • An illegal string quote option is specified.

  • The Max Roll Seconds value is greater than zero and the Roll Period option is set to a value other than None.

Suspend and Resume Behavior

On suspend, this adapter stops processing tuples, flushes all tuples to disk, and closes the current CSV file.

On resumption, it reopens the current CSV file and begins processing tuples again.