Snowflake Bulk Load Activity

You can use this activity to load a large amount of data into the database. This activity uses the staging concept for loading data into the database table. The stage is an internal location or an external location, like the Amazon S3 that is used to store compressed or uncompressed data files. The plug-in can later load the data stored in the stage into the database table.

General

The General tab contains the following fields.

Field Module Property? Visual Diff? Description
Name No Yes Specify the name to be displayed as the label for the activity in the process.
Snowflake Connection Resource Yes Yes Click the Create New Process Property icon to select a Snowflake JDBC Connection shared resource. If no matching Snowflake JDBC Connection shared resource is found, click Create Shared Resource to create one. For more details, see Creating and Configuring the Snowflake JDBC Connection Resource.
Create Table from XSD No Yes

Use this option to create a table in Snowflake from the XSD provided by the user. When this checkbox is selected, the Entity field on the General tab is disabled. The Table Name field and the Load Schema button from the SQL Builder tab are enabled. Click the Load Schema button to provide the XSD schema for the table.

Note: When Stage Type is set to Amazon S3, Create Table from XSD works only when File Format is set as Delimited Files.
Entity No Yes

Click the Browse resource icon to select an entity. The entities downloaded in the Snowflake JDBC Connection shared resource Schema tab are available for selection.

This field is disabled when the Create Table from XSD checkbox is selected.

Time Out Yes   Defines activity timeout in seconds. The default value is 0 seconds.
Stage Type No Yes There are four stage types available:
  • User Stage
  • Table Stage
  • Named Stage
  • Amazon S3

Select the stage type. The Snowflake plug-in supports both the internal stage and the external stage. User, Table, and Named stages are internal stages whereas Amazon S3 is an external stage.

Named Stage Yes Yes

Specify the name of the Named Stage defined in the Snowflake. The field is only enabled if Stage Type is selected as Named Stage or Amazon S3.

File Format No No

This combo box is enabled only when Stage Type is selected as Amazon S3. This field provides information on the format of the file that is present in the Amazon S3 bucket. The following file formats are supported:

  • Delimited Files

  • JSON

  • AVRO

  • ORC

  • PARQUET

  • XML

Note: XML data can only be loaded into a variant type column.

For Stage Type=Amazon S3, you need to configure a storage integration object to delegate authentication responsibility for external cloud storage to a Snowflake identity and access management (IAM) entity. For more information, see the Snowflake documentation.

Description

On the Description tab, provide a short description for the activity.

SQL Builder

The SQL Builder tab displays the metadata of the entity selected in the General tab.

Field Editable Description
Table Name Yes

Displays the name of the table to be created.

When the General tab > Create Table from XSD checkbox is selected, this field is enabled. If you provide XSD using the Load Schema button, the value of this field is set to the name of the root complex element of the XSD. However, you can edit this value and set to the desired value.

Note: Table Name can be added as a module property.
Field Name No Displays the name of the column.
Data Type No Displays the data type of the column.
Primary Key

Yes, if the 'Metadata' table is loaded using the XSD provided with the 'Load Schema' button from the SQL Builder tab.

No, if the 'Metadata' table is loaded using the Entity field from the General tab.

Displays if the column is a primary key.
Not Null No

Displays if the column accepts a null value.

The checkbox is selected if the constraints to the column in snowflake DB is marked as 'Not Null'.

The checkbox is cleared if the constraints to the column in snowflake DB are not marked as 'Not Null'.

This checkbox is non-editable.

Dimension No Displays the dimension of the column.
Values Yes Displays the fields that must be a part of the Values clause for an INSERT SQL statement. By default, the Values column is selected for all the fields.
Load Schema No When the General tab > Create Table from XSD checkbox is selected, this field is enabled. If you click this button, a dialog is open where you can create or browse XSD present in the workspace. Once you select the XSD in the dialog, its details are displayed into the 'Metadata' table and the 'Table Name' field is set to the name of the root complex element of the XSD.

Advanced

This tab has the following fields:

Field Module Property? Visual Diff? Description
Override Database Name Yes Yes Overrides Database name specified in snowflakejdbcResource.
Override Schema Name Yes Yes Overrides Schema name specified in snowflakejdbcResource.
Interpret Empty String as NULL Yes Yes If this checkbox is selected, the empty string is interpreted as a NULL value.
Validation Mode Yes Yes

When this checkbox is selected, the plug-in returns errors in the data if present. The data is loaded into the table only when there are no errors present in the data.

Note: Validation mode is not supported for semi-structured datatypes
Purge Stage Files Yes Yes Select this checkbox to delete the stage files automatically once the data is loaded to the database table.
Compress Data Yes Yes Select this checkbox to compress data in .gz format and load it to Snowflake Internal Stage. This is not applicable when loading data from External Stage like Amazon S3.
On Error No Yes This field decides how the system behaves after encountering an error. The following actions are available:
  • CONTINUE
  • SKIP_FILE
  • SKIP_FILE_<num>
  • SKIP_FILE_<num>%
  • ABORT_STATEMENT
Skip a File if Error Count Yes Yes This text box is enabled if the SKIP_FILE_<num> option is selected from the On Error dropdown list. It skips the processing of a file if the number of errors present in the data file is equal to the error count specified.
Skip a File if Error Percentage Yes Yes This text box is enabled if the SKIP_FILE_<num>% option is selected from the On Error dropdown list. It skips the processing of a file if the percentage of errors in the data file is equal to the error percentage specified.
Merge No No When you select the Merge checkbox, if the record exists, it is updated by running the MERGE query. If the record does not exist, it is inserted.
Merge On Columns Yes No

Merge On Columns is available only when the Merge checkbox is selected. Enter the names of the columns on which the merge operation must be performed. For more than one column, enter the column names separated by comma. The column names are not case-sensitive.

Note: Provide all the column names in uppercase in Parquet, Avro, XML, or JSON format.

Input

For User, Table, and Named stage types, all the fields in the SQL Builder tab that have the Values column selected are a part of the Input tab. The information in the schema depends on the fields selected on the SQL Builder tab.

For Stage Type=Amazon S3, the schema on the Input tab is populated based on the file format selected on the General tab.

The Input tab displays the input schema of the activity as a tree structure.

The loadOptions field is not available when the Stage Type is Amazon S3 on the General tab and when the Merge checkbox is selected on the Advanced tab.

Output

The Output tab shows the output schema of the activity as a tree structure. The output is read-only. The Output tab displays the following fields:

Column Name Description
FILE Name of source file and relative path to the file
STATUS

Contains the following options: loaded, load failed, load_skipped, or partially_loaded.

ROWS_PARSED Number of rows parsed from the source file
ROWS_LOADED Number of rows loaded from the source file
ERROR_LIMIT If the number of errors reaches this limit, then cancel.
ERRORS_SEEN Number of error rows in the source file
FIRST_ERROR First error of the source file
FIRST_ERROR_LINE Line number of the first error
FIRST_ERROR_CHARACTER Position of the first error character
FIRST_ERROR_COLUMN_NAME Column name of the first error
Note: If new columns are added in the source table, the target table gets altered during runtime and the new columns are added in the target table. This is applicable only when the table does not contain any columns with semi-structured data types. The alter table operation is applicable only for delimited files.

Fault

The Fault tab lists exceptions that are generated by this activity.