Data Format

Copyright © Cloud Software Group, Inc. All Rights Reserved


Shared Configuration

The Data Format resource contains the specification for parsing or rendering a text string using the Parse Data and Render Data activities. This shared configuration resource specifies the type of formatting for the text (delimited columns or fixed-width columns), the column separator for delimited columns, the line separator, and the fill character and field offsets for fixed-width columns. You must also specify the data schema to use for parsing or rendering the text.

When parsing text, each column of an input line is transformed into the corresponding item in the specified data schema. The first column of the text line is turned into the first item in the data schema, the second column is transformed into the second item, and so on. Each line is treated as a record, and multiple lines result in a repeating data schema containing the lines of the input text string.

Figure 19 illustrates how an input text string is parsed into a specified data schema.

Figure 19 Parsing a text string into a data schema

When rendering text, each record in the input data schema is transformed into a line of output text. The first item of the data schema is transformed into the first column of the text line, the second item is transformed into the second column, and so on. Each record in a repeating data schema is transformed into a separate line in the output text string. Rendering is therefore the exact reverse of the parsing process illustrated in Figure 19.
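The positional mapping described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how the Parse Data and Render Data activities are implemented. The item names in SCHEMA, the comma column separator, and the functions parse and render are assumptions made for the example.

```python
from typing import Dict, List

# Hypothetical data schema: item names listed in column order.
SCHEMA = ["id", "last_name", "first_name"]


def parse(text: str, col_sep: str = ",", line_sep: str = "\n") -> List[Dict[str, str]]:
    """Turn each line into one record; column N fills schema item N."""
    records = []
    for line in text.split(line_sep):
        if line == "":
            # Skip empty pieces, such as the one after the final line separator.
            continue
        columns = line.split(col_sep)
        records.append(dict(zip(SCHEMA, columns)))
    return records


def render(records: List[Dict[str, str]], col_sep: str = ",", line_sep: str = "\n") -> str:
    """Reverse of parse: one output line per record, items in schema order."""
    return "".join(
        col_sep.join(record[name] for name in SCHEMA) + line_sep
        for record in records
    )


text = "57643,Smith,Chris\n57644,Jones,Pat\n"
records = parse(text)
print(records[0])               # {'id': '57643', 'last_name': 'Smith', 'first_name': 'Chris'}
print(render(records) == text)  # True
```

Rendering the parsed records reproduces the original string, which is the round trip the two paragraphs above describe.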

Configuration

The Configuration tab has the following fields.

Field

Description

Name

The name to appear as the label for the resource.

Description

Short description of the shared resource.

Format Type

The type of formatting for the text. The text can be either "Delimiter separated" or "Fixed format".

In delimiter-separated text, each column is separated by a delimiter character, specified in the Col Separator field. Each line is separated by the character specified in the Line Separator field. See Delimiter Separated Fields for more information.

In fixed format text, each column occupies a fixed position on the line. For this format type, you must specify the Fill Character, the line length, and the column offsets. See Field Offsets for more information.

Col Separator

This field specifies one or more separator characters between columns when "Delimiter separated" is specified in the Format Type field.

When rendering text, each element in the input data schema is separated by the column separator in the output text string. If more than one character is specified in this field, the Render Data activity places the entire string specified in this field between each column. For example, if ":;" is specified in this field, then ":;" appears between each column in the rendered string.

When parsing text, each column becomes an element in the output data schema. If more than one character is specified in this field, the Parse Data activity uses the rule specified in the Col Separator Parse Rule field to determine how to parse the data.

Col Separator Parse Rule

Specifies the rule to use for multiple column separator characters when parsing data. The choices are the following:

• Treat all characters as entered as a single column separator string

The characters entered into the Col Separator field are treated as a single string that acts as a separator. For example, if the specified Col Separator is ":;", then Apple:;Orange:;Pear is treated as three columns.

• Treat each character entered as a potential column separator

Any of the characters will act as a column separator. For example, if the specified Col Separator is ":;", then Apple;Orange:Pear is treated as three columns.
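The difference between the two parse rules, and the way a multi-character separator is written when rendering, can be illustrated with a short Python sketch. The function names are hypothetical and the code only mirrors the examples given above; it is not the product's implementation.

```python
import re

COL_SEPARATOR = ":;"


def split_as_single_string(line: str, sep: str = COL_SEPARATOR) -> list:
    """Rule 1: the whole separator string must appear between columns."""
    return line.split(sep)


def split_on_any_character(line: str, sep: str = COL_SEPARATOR) -> list:
    """Rule 2: any one character of the separator string ends a column."""
    # Build a character class; re.escape guards characters such as "]" or "^".
    pattern = "[" + re.escape(sep) + "]"
    return re.split(pattern, line)


def render_columns(columns: list, sep: str = COL_SEPARATOR) -> str:
    """Rendering places the entire separator string between columns."""
    return sep.join(columns)


print(split_as_single_string("Apple:;Orange:;Pear"))  # ['Apple', 'Orange', 'Pear']
print(split_on_any_character("Apple;Orange:Pear"))    # ['Apple', 'Orange', 'Pear']
print(render_columns(["Apple", "Orange", "Pear"]))    # Apple:;Orange:;Pear
```

Note that in this sketch the second rule yields an empty column when two separator characters are adjacent (for example, Apple:;Orange). This document does not state how the Parse Data activity handles that case.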

Line Separator

Specifies the character(s) that determine the end of each line. New line, carriage return, and line feed are predefined and can be selected from the drop-down list. You can also define a custom line separator in this field.

When parsing text, each line is treated as a new record in the output data schema. When rendering text, each data record is separated by the line separator character in the output text string.

The last line in your input file must be terminated by the specified line separator.

Note that BusinessWorks does not support specifying the line separator as part of the data.

Fill Character

When processing fixed format columns, this is the type of character that is used to fill the empty space in a column and between columns. This is only available when "Fixed format" is specified in the Format Type field. This fill character is only used by the Render Data activity.

You can select one of the following for this field:

• Space: fills with a space

• Dash: fills with a dash

• Others: allows you to specify your own custom fill character in the Fill With field.

For example, suppose a column holds an integer and its specified width is 10. If one row has the value "588" in that column, the value occupies three characters, so the remaining seven characters of the column are filled with the specified fill character.
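The padding arithmetic in this example can be sketched in Python as follows. The function pad_column is hypothetical. The sketch assumes the value is written first and the fill characters follow it, which matches the sample data under Field Offsets but is not stated explicitly here. Rejecting a value wider than the column is also an assumption of the sketch.

```python
def pad_column(value: str, width: int, fill: str = " ") -> str:
    """Pad a value with the fill character up to the column width."""
    if len(fill) != 1:
        raise ValueError("the fill character must be exactly one character")
    if len(value) > width:
        # Assumption: this sketch rejects values that do not fit the column.
        raise ValueError("value is wider than the column")
    # Assumption: the value comes first and the fill characters follow it.
    return value.ljust(width, fill)


padded = pad_column("588", 10, "-")
print(padded)       # 588-------
print(len(padded))  # 10
```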

Fill With

This field is only available when Others is selected in the Fill Character field. This field specifies the fill character to use to pad unused characters in fixed-width columns.

Only one character can be specified. The first character is the fill character and any other characters specified in this field are ignored.

Data Format

The Data Format tab allows you to define a custom schema for the text.

You can define your own datatype on this tab, and you can reference XML schema or ActiveEnterprise classes stored in the project. Once defined, the data specified on the Data Format tab is used to parse a text string into the specified schema or render the specified schema as a text string.

Data Format does not support nested schemas. You can specify only a basic, flat schema with no nesting.

See Appendix A, Specifying Data Schema for a description of how to define a schema.

Delimiter Separated Fields

When processing delimiter-separated text, each field in the input line is separated by the delimiter specified in the Col Separator field. Leading and trailing spaces are stripped from each field, and the specified Line Separator determines when a new record starts. Figure 19 illustrates a series of input lines containing comma-separated fields, with each record on one line.

In some situations, you may not be able to choose a column separator character that does not appear in any column data. For example, if you choose a comma as the column separator, there may be commas in some of the column values. To process data that contains column separator characters in a column, you can surround the column with double quotes (" "). Double quotes also allow you to include leading and trailing spaces as well as line breaks in a field. If you want to have a double quote appear in a field, escape the double quote by using two consecutive double quotes. That is, use "" to represent a double quote in a field.

The following data illustrates input lines with each field separated by commas. Some fields, however, contain commas, leading or trailing spaces, double quotes, and line breaks.

57643, Smith, "Chris", Accounting , "Statement: Be prepared!"

57644, Jones, "Pat  ",   Marketing   , "Statement: To paraphrase JFK, ""Ask not what your company can do for you,

ask what you can do for your company."""

57645, Walker, "Terry", Develpment  , "Statement: My goal is to be CEO someday."

Notice that Pat Jones’ statement spans two lines and contains double quotes as well as a comma. The entire field is surrounded by double quotes, so it is still treated as part of the same record.
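The quoting rules described in this section can be sketched as a small character-by-character parser in Python. This is an illustration under stated assumptions, not the Parse Data implementation. The function parse_delimited is hypothetical and it assumes single-character column and line separators. It also ignores any text after the last line separator, and it discards unquoted characters that surround a quoted field.

```python
from typing import List


def parse_delimited(text: str, col_sep: str = ",", line_sep: str = "\n") -> List[List[str]]:
    """Parse delimiter-separated text that may contain double-quoted fields."""
    records: List[List[str]] = []
    record: List[str] = []
    raw: List[str] = []      # characters seen outside double quotes
    quoted: List[str] = []   # characters seen inside double quotes
    was_quoted = False       # True if the current field used double quotes
    in_quotes = False
    i = 0
    while i < len(text):
        ch = text[i]
        if in_quotes:
            if ch == '"':
                if text[i + 1:i + 2] == '"':
                    quoted.append('"')   # "" is an escaped double quote
                    i += 1
                else:
                    in_quotes = False    # closing quote
            else:
                quoted.append(ch)        # separators and line breaks are data here
        elif ch == '"':
            in_quotes = True
            was_quoted = True
        elif ch == col_sep or ch == line_sep:
            # Quoted content is kept verbatim; unquoted content is stripped of spaces.
            record.append("".join(quoted) if was_quoted else "".join(raw).strip(" "))
            raw, quoted, was_quoted = [], [], False
            if ch == line_sep:
                records.append(record)
                record = []
        else:
            raw.append(ch)
        i += 1
    return records


text = (
    '57643, Smith, "Chris", Accounting , "Statement: Be prepared!"\n'
    '57644, Jones, "Pat  ",   Marketing   , "Statement: To paraphrase JFK, '
    '""Ask not what your company can do for you,\n'
    'ask what you can do for your company."""\n'
)
records = parse_delimited(text)
print(len(records))    # 2
print(records[0])      # ['57643', 'Smith', 'Chris', 'Accounting', 'Statement: Be prepared!']
print(repr(records[1][2]))  # 'Pat  '
```

The second record keeps the trailing spaces in "Pat  " because they are inside double quotes. Its last field contains the comma, the line break, and the inner double quotes as ordinary data.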

Field Offsets

When processing fixed format text, you must specify the line length and the column offsets. This allows a Parse Data or Render Data activity to determine where columns and lines begin and end. The Field Offsets tab allows you to specify the format of fixed-width text.

The line length is the total length of each input line, including the line separator character(s). When you calculate the line length, add the number of characters in the line separator selected on the Configuration tab to the number of data characters on each line.

The column offset is the starting and ending character position on each line for the column. Each line starts at 0 (zero). For each column of the line, you must specify the name of the data item associated with this column (this is the same name you specified for the corresponding element in the data schema), the starting offset for the column, and the ending offset for the column.

It is a good idea to have each column offset begin where the last column offset ended. Many fixed format data files are used by databases (for example, ISAM files) or are generated by automated processes. These types of files have rigid file record formats and may not have additional padding space between columns.

When you define each column offset to begin where the last column offset ends, the data can be read more quickly by TIBCO ActiveMatrix BusinessWorks because the bytes of the input records can be read in sequentially.

Consider the following text file. The first two lines of the file are a ruler that indicates offset numbers (each 0 in the second line marks another 10 characters), and the fill character between columns is a space:

0           12                30             45

0123456789012345678901234567890123456789012345678901234567

57643       Smith             Chris          Account

57644       Jones             Pat            Marketing

57645       Walker            Terry          Develpment

Figure 20 illustrates the Field Offsets tab for the file above. Notice that the line length is specified as 60, even though the offsets end at character number 58. The line separator is specified as "Carriage Return/Line Feed (windows)", so this adds two additional characters for a total line length of 60.

Figure 20 Fixed-width text strings and field offsets
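The relationship between line length, line separator, and column offsets can be sketched in Python. The offsets 0, 12, 30, 45, and 58 and the line length of 60 come from the example above. The item names, the functions render_fixed and parse_fixed, and the treatment of each ending offset as exclusive are assumptions of this sketch. Trimming trailing spaces when parsing is also an assumption, since this document does not say whether the Parse Data activity does so.

```python
from typing import Dict, List

LINE_SEPARATOR = "\r\n"   # carriage return/line feed: two characters
LINE_LENGTH = 60          # 58 data characters + 2 separator characters

# (item name, start offset, end offset). Item names are hypothetical, and the
# end offset is treated as exclusive, which is an assumption of this sketch.
FIELD_OFFSETS = [
    ("id", 0, 12),
    ("last_name", 12, 30),
    ("first_name", 30, 45),
    ("department", 45, 58),
]


def render_fixed(records: List[Dict[str, str]], fill: str = " ") -> str:
    """Pad every item to its column width and end each line with CR/LF."""
    lines = []
    for record in records:
        data = "".join(
            record[name].ljust(end - start, fill)
            for name, start, end in FIELD_OFFSETS
        )
        if len(data) + len(LINE_SEPARATOR) != LINE_LENGTH:
            raise ValueError("a value is wider than its column")
        lines.append(data + LINE_SEPARATOR)
    return "".join(lines)


def parse_fixed(text: str) -> List[Dict[str, str]]:
    """Read the text in LINE_LENGTH chunks and slice each chunk by offset."""
    records = []
    for pos in range(0, len(text), LINE_LENGTH):
        line = text[pos:pos + LINE_LENGTH]
        if len(line) != LINE_LENGTH or not line.endswith(LINE_SEPARATOR):
            raise ValueError("line does not match the configured line length")
        # Assumption: trailing spaces are trimmed from each column value.
        records.append(
            {name: line[start:end].rstrip(" ") for name, start, end in FIELD_OFFSETS}
        )
    return records


source = [
    {"id": "57643", "last_name": "Smith", "first_name": "Chris", "department": "Account"},
    {"id": "57644", "last_name": "Jones", "first_name": "Pat", "department": "Marketing"},
]
text = render_fixed(source)
print(len(text))                    # 120
print(parse_fixed(text) == source)  # True
```

Because each column starts where the previous one ends, every character of the 58 data positions belongs to exactly one column, and each record can be read sequentially in a single 60-character chunk.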