Configure Columns: Text Files

The Configure Columns dialog changes options depending on the file type specified in the Hadoop File properties dialog. The options described in this topic are available for text files.

Column configuration	Description
Vertical/Horizontal File View	If the text file contains a large number of columns, you can click the switch icon in the top-right corner to toggle the column display between vertical and horizontal. For files that have more than 300 columns, only the vertical view is available.
Escape and Quote Characters	Specify the escape and quote characters used in the file.
Delimiter	Select the delimiter from the list: Comma, Tab, Semicolon, Space, Control-A, or Other. Choosing Other specifies that a custom character is used as the delimiter.
Headers	When TIBCO Data Science – Team Studio opens the Configure Columns dialog, it uses heuristics to determine whether the first row of data is a header row, and selects or clears the First row contains header control accordingly. If it determines that the first row contains header information, the contents of that row are used as the default column names. You can select or clear First row contains header manually; if the source data does not have a header row, clear it. If the file does not include headers but the header information is available in a separate file, click Load header from file, and then browse to and select that file in the Hadoop file selector.
Data Columns	TIBCO Data Science – Team Studio attempts to infer the correct column names and data types from a sample of the first few rows. When the dialog is displayed, each column is preceded by the inferred data type. You can change these settings by providing new column names and data types. The drop-down list box provides the standard data types: `chararray`, `int`, `long`, `float`, `double`, `bytearray`, `sparse`, and `datetime`. In addition to the basic `datetime` type, the list offers `datetime` with each of these specific formats: yyyy-MM-dd'T'HH:mm:ss; yyyyMMdd HH:mm; yyyy-MM-dd; HH:mm:ss; yyyy-MM-dd'T'HH:mm:ss.SSSZ; MM-dd-yyyy; MM/dd/yyyy; dd-MM-yyyy; yyyy-MM-dd HH:mm:ss; yyyy-MM-dd'T'HH:mm:ss.SSSZZ. To change the data type for multiple columns at once, set the view to horizontal format, select the checkboxes for the desired columns, and then click Configure Selected. You can also filter the list of columns with the filter field. Note: If the source data uses the ISO `datetime` format, select the basic `datetime` data type option to preserve the flexibility of the ISO formatting. ISO provides an international data-exchange format for `datetime` data, and all `datetime` values are converted into the number of milliseconds since 1970. For more details, see ISO DateTime Format. If the source data is not in ISO `datetime` format, you must select the specific `datetime` format of the imported data file from the list of predefined formats. The default `datetime` formats in TIBCO Data Science – Team Studio are the ones listed in the drop-down list box; you can modify this list for the application using Datetime Format Preferences. Although the list of `datetime` formats is predefined, you can override the defaults at run time and specify a different `datetime` format (for a one-time Hadoop file import) using Joda-Time API formatting.
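
The sample-based inference described in the Headers and Data Columns rows can be illustrated with a small Python sketch. This is not the heuristic that TIBCO Data Science – Team Studio actually uses, which is not documented here. The function names (`match_datetime_pattern`, `infer_type`, `sniff_columns`), the header rule, the `column_N` fallback names, and the mapping from Joda-style patterns to Python `strptime` directives are all assumptions made for illustration. The sketch also assumes a non-empty, rectangular sample and covers only three of the listed `datetime` formats.

```python
import csv
import io
from datetime import datetime

# Three Joda-style patterns from the drop-down list, paired with equivalent
# Python strptime directives (this mapping is an assumption of the sketch).
DATETIME_PATTERNS = {
    "yyyy-MM-dd": "%Y-%m-%d",
    "MM/dd/yyyy": "%m/%d/%Y",
    "yyyy-MM-dd HH:mm:ss": "%Y-%m-%d %H:%M:%S",
}


def _all_parse(values, parse):
    """Return True if parse() accepts every value without a ValueError."""
    try:
        for value in values:
            parse(value)
    except ValueError:
        return False
    return True


def match_datetime_pattern(values):
    """Return the first Joda-style pattern that fits every value, else None."""
    for pattern, directive in DATETIME_PATTERNS.items():
        if _all_parse(values, lambda v: datetime.strptime(v, directive)):
            return pattern
    return None


def infer_type(values):
    """Pick a type name for a column from its sampled string values."""
    if _all_parse(values, int):
        return "int"
    if _all_parse(values, float):  # note: float() also accepts "nan" and "inf"
        return "double"
    if match_datetime_pattern(values) is not None:
        return "datetime"
    return "chararray"


def sniff_columns(text, delimiter=",", sample_rows=5):
    """Guess whether row 1 is a header, then name and type each column."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    rows = list(reader)[: sample_rows + 1]
    first, rest = rows[0], rows[1:]
    width = len(first)

    # Hypothetical header rule: every cell in the first row looks like plain
    # text, while at least one column below it looks typed.
    first_types = [infer_type([cell]) for cell in first]
    body_types = [infer_type([row[i] for row in rest]) for i in range(width)]
    has_header = (
        bool(rest)
        and all(t == "chararray" for t in first_types)
        and any(t != "chararray" for t in body_types)
    )

    if has_header:
        names, types = first, body_types
    else:
        names = [f"column_{i}" for i in range(width)]
        types = [infer_type([row[i] for row in rows]) for i in range(width)]
    return has_header, list(zip(names, types))
```

A sample value such as 01-02-2024 fits both MM-dd-yyyy and dd-MM-yyyy, so no sample-based guess can tell those two formats apart. That ambiguity is one reason a dialog of this kind lets you override the inferred type and choose the specific `datetime` format yourself.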
