Import Excel (DB)

Imports an Excel workbook sheet (or a portion of the sheet) as a database table.

Information at a Glance

Category Load Data
Data source type DB
Sends output to other operators Yes1
Data processing tool n/a
Note: The Import Excel (DB) operator is for database data only. For Hadoop data, use the Import Excel (HD) operator.

The Excel workbook can be stored in the current workspace.

Formula cells, styles, dates, currencies, percentages, and so on are supported, and are parsed as numeric values. Non-tabular data such as images and pivot tables are skipped. Hidden columns and protected sheets are parsed as normal.

Input

Import Excel (DB) is a source operator. No inputs are required.

Bad or Missing Values
Blank cells or empty cells are converted to null values.

Restrictions

Excel files are read on the Team Studio server. Depending on the memory available on your instance, loading very large Excel files on this server might require a large amount of memory and cause out-of-memory issues. For more information, see Apache POI limitations at https://poi.apache.org/spreadsheet/limitations.html.

Team Studio uses the configuration parameter custom_operators, set in the alpine.conf file, to avoid loading files that are too large. If the Excel file is bigger than this limit, it does not load and an error message is displayed. The default value is 30.0 (MB). The administrator of your Team Studio instance can modify the default value.

Configuration

Parameter Description
Notes Any notes or helpful information about this operator's parameter settings. When you enter content in the Notes field, a yellow asterisk is displayed on the operator.
Data Source (DB) The data source where the output should be stored (and where the Column Metadata Table is stored).
Schema The schema where the output should be stored (and where the Column Metadata Table is stored).
Chorus Workfile The Excel workbook stored in the current workspace. Only workbooks with the .xls, .xlsx, and .xlsm extensions are displayed.
Sheet Number The number of the sheet to extract (the first sheet is 1).
Top Left Corner Cell The cell address that defines the top-left cell of the data portion to extract in the selected sheet (for example, B10).
Note: If the header is included in the sheet, skip it in the selection. For example, if the header starts in A1, the top-left corner cell entered should be B1. This is because the header is read separately in the Column Metadata File (CSV) parameter, from a CSV file that includes both names and column types.
Right Cut-Off Column Letter The optional column letter that defines where the portion of data to extract should be cut at the right. If not specified, the data extraction is cut at the last defined cell of the first row selected (the row number in Top Left Corner Cell).
Note: If this parameter is not specified, and if the first row selected is empty or not defined, an error is displayed.
Bottom Cut-Off Row Number The optional row number that defines where the portion of data to extract should be cut at the bottom. If this is not specified, the data extraction is cut at the last defined row in the sheet.
Column Metadata Table The database table that defines the header for the output. It should be an empty table with only the column names (the types are defined at table creation, either when uploading via the user interface, or manually).

This table is read at runtime, and the output schema is available to the subsequent operator only after the operator is run.

Drop If Exists Specifies what to do if a table of the same name already exists.
  • If Yes (the default), drop the existing table of the same name and create a new one.
  • If No, stop the flow and alert the user that an error has occurred.

Output

Summary tab
Summary of the selected parameters.

Import Excel Summary

Output tab
Preview of the data extracted from the Excel workbook sheet.

Import Excel Output

Data Output
A single tabular data set that is extracted from the sheet, and can be transmitted to subsequent operators only after the operator is run.
1 The full output schema is not available until you step-run the operator. After the operator is run, the output schema automatically updates, and subsequent operators are either validated or turn red depending on the structure of the output data.