Import Excel (HD)
Imports an Excel workbook sheet (or a portion of the sheet) as an HDFS input.
Information at a Glance
Category | Load Data |
Data source type | HD |
Sends output to other operators | Yes |
Data processing tool |
The Excel workbook can be stored in HDFS or in the current workspace.
Formula cells, styles, dates, currencies, percentages, and so on are supported and are parsed as numeric values. Non-tabular data, such as images and pivot tables, is skipped. Hidden columns and protected sheets are parsed normally.
Restrictions
Excel files are read on the Team Studio server. Loading very large Excel files can require a large amount of memory and, depending on the memory available on your instance, might cause out-of-memory errors. For more information, see the Apache POI limitations at https://poi.apache.org/spreadsheet/limitations.html.
Team Studio uses the configuration parameter custom_operators, set in the alpine.conf file, to avoid loading files that are too large. If the Excel file is larger than this limit, it is not loaded and an error message is displayed. The default limit is 30.0 MB. The administrator of your Team Studio instance can modify the default value.
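To check a workbook against the size limit before uploading it, a small pre-flight script can help. The following is a minimal sketch, not Team Studio code: the function name exceeds_size_limit is hypothetical, the 30.0 default is taken from the text above, and it assumes that 1 MB means 1024 * 1024 bytes, which might differ from how the server measures file size.

```python
import os

# Default limit stated in this section; an administrator may have changed it.
DEFAULT_LIMIT_MB = 30.0


def exceeds_size_limit(path, limit_mb=DEFAULT_LIMIT_MB):
    """Return True if the file at `path` is larger than `limit_mb` megabytes.

    Assumption: 1 MB = 1024 * 1024 bytes. The server may count differently.
    """
    size_mb = os.path.getsize(path) / (1024 * 1024)
    return size_mb > limit_mb
```

If the function returns True for a workbook, consider splitting the workbook or asking your administrator whether the limit has been raised on your instance.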
Configuration
Output
- Summary tab
- Summary of the parameters selected, as shown in the following image.
- Output tab
- Data preview of the data extracted from the Excel workbook sheet, as shown in the following image.
- Data Output
- A single tabular data set extracted from the sheet. It can be passed to subsequent operators only after the operator has run.