Import Excel

This operator imports an Excel workbook sheet (or a specified portion of the sheet) from the Chorus workspace as a database table. The operator uses TIBCO® Data Virtualization to infer the schema of the imported database table.

Information at a Glance

Note: This operator can only be used with TIBCO® Data Virtualization and Apache Spark 3.2 or later.

Parameter

Description
Category Load Data
Data source type TIBCO® Data Virtualization
Send output to other operators Yes
Data processing tool TIBCO® DV, Apache Spark 3.2 or later

The Excel file from the workspace is read into memory. The data specified by the Sheet Number, Top Left Corner Cell, Right Cut-Off Column Letter, and Bottom Cut-Off Row Number parameters are extracted. If the Has Header Row parameter is set to No, then the operator prepends a row of column names with the Col prefix to the extracted data set. Then, the data is uploaded to TIBCO Data Virtualization in batches of 1,000 records. The TIBCO Data Virtualization infers the schema of a result database table based on the data values of each column. The resulting schema is then passed on to the downstream operators.

Note: You must Step Run or Run this operator before running a downstream operator.

Input

The Import Excel operator is a source operator. Hence, there is no input data port.

Bad or Missing Values
The blank cells or empty cells are converted to null values.

Restrictions

The Excel files are read on the TIBCO Data Science - Team Studio server. Depending on the memory available on your instance, loading very large Excel files on this server might require a large amount of memory and can cause out-of-memory issues. For more information, see the Apache POI limitations.

Configuration

The following table provides the configuration details for the Import Excel operator.

Parameter Description
Notes Notes or helpful information about this operator's parameter settings. When you enter content in the Notes field, a yellow asterisk appears on the operator.
Excel Workbook Select the Excel workbook stored in the current workspace. Only workbooks with the .xls, .xlsx, and .xlsm extensions can be selected. For more information on uploading an Excel file to the workspace, see Creating a Work File.
Sheet Number Specify the sheet number from the Excel workbook to extract (the first sheet is 1).

Default: 1

Top Left Corner Cell Specify the cell address that defines the top-left cell of the data portion to extract in the selected sheet (for example, B10).
Right Cut-Off Column Letter Specify the column letter where the portion of data to extract should be cut at the right. If not specified, the data extraction is cut at the last defined cell of the first row selected (the row number in the Top Left Corner Cell). This parameter is optional.
Bottom Cut-Off Row Number (0=not specified)

Specify the row number where the portion of data to extract should be cut at the bottom. If this is set to 0, the data extraction is cut at the last defined row in the sheet. This parameter is optional.

Note: It is the row number indicated in the Excel worksheet and not the actual number of rows to be extracted.

For example, if the Top Left Corner Cell is A1 and Has Header Row is Yes, then the number of rows to be extracted is the Bottom Cut-Off Row Number - 1.

Has Header Row Specify whether the first row is treated as a header and each cell value of the row is used as the column name of the result table. If this is set to No, column names are set to Col1, Col2, Col3, and so on.
Output Schema Specify the schema for the output table or view.
Output Table Specify the table path and name where the output of the results is generated. By default, this is a unique table name based on your user ID, workflow ID, and operator.
Store Results When set to Yes, the operator saves the results. If set to No, the operator does not save the results.

Output

Visual Output
Displays the data extracted from the specified Excel worksheet.
Output to successive operators
A single tabular data set extracted from the Excel sheet that can be used by a downstream operator. The column schema is generated after running this operator.

Example

The following example demonstrates the Import Excel operator.

Import Excel operator workflow
Data
None.
Parameter Setting
The parameter setting for the Import Excel operator is as follows:
  • Excel Workbook: raw_materials.xlsx

    Note: The raw_materials.xlsx file is available in the current workspace.
  • Sheet Number: 1

  • Top Left Corner Cell: A1

  • Bottom Cut-Off Row Number: 0

  • Has Header Row: Yes

  • Store Results: Yes

Results

The following figure displays the output for the parameter settings for the Import Excel operator.

Output
Import Excel operator - Result tab