Field | Description |
Concurrent Request Limit | This configuration accepts a value between 0 and 65536. It specifies the maximum number of concurrent requests that can be issued to the underlying data source. |
Default String Length | The default VARCHAR length. |
Detect Partition During Introspection | Select this option to automatically detect partitions in the files. Note that if partitions are not detected correctly, both usability and performance are adversely affected. |
CSV Options | |
Include CSV Files | Check this option to include delimited files from the storage area. |
Character Set | The character set used by the data source. |
Delimiter | Indicates the file delimiter character. |
Text Qualifier | Indicates the type of qualifier that is used in the file to enclose a string field. |
Has Header Row | Indicates whether or not the file has a header row. |
Infer Schema | Choosing this option enables the parser to infer the schema and the datatype of each column from the data in the file. Note: If this option is selected, it is recommended to provide a sampling ratio when introspecting the data source, so that only a sample of the data is read when inferring the schema. This avoids the overhead of reading every row during inference. Parquet files do not require schema inference because their schema is encoded in their metadata. |
CSV Escape Character | Indicates the character used to escape delimiters and quote characters that appear inside a field value. |
CSV Parser Lib | The library used to parse delimited files. The libraries currently supported are commons (the default) and uniVocity. |
CSV Parsing Mode | The parsing mode used by the data source. Allowed values are PERMISSIVE (include malformed rows), DROPMALFORMED (drop malformed rows), and FAILFAST (fail the introspection when a malformed row is encountered). |
CSV Comment Character | Indicates the character that marks a line as a comment in the file. |
CSV Null Value | Indicates the string that is treated as a null value in a row. |
CSV File Name Filters | Indicates the file name extensions that are valid. |
Parquet Options | |
Include Parquet Files | Check this option to include the parquet files from the storage area. |
Binary as String | Check this option to read binary values as strings. |
INT96 as Timestamp | Check this option to read INT96 values as timestamps. |
Compression Codec | Parquet files are typically compressed. This setting controls the compression algorithm used to process them. For more information about the different options, refer to https://spark.apache.org/docs/2.4.3/sql-data-sources-parquet.html |
Filter Push-Down | Controls whether a predicate specified in a WHERE clause in a SQL query will be pushed down to the Cloud File System data source. |
Merge Schema | For partitioned files, choosing this option merges the partitions' schemas into a single schema that includes the columns from all partitions. |
Parquet File Name Filters | Indicates the file name extensions that are valid. |
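To illustrate how the Infer Schema option and its sampling ratio interact, the sketch below implements a toy version of sample-based schema inference in plain Python. The function and parameter names (`infer_schema`, `sampling_ratio`, `has_header`) are illustrative only and are not the product's actual API; real parsers infer types in broadly this way, promoting each column to the widest type observed in the sampled rows.

```python
import csv
import io
import random

def infer_type(value):
    """Return the narrowest SQL-ish type name that fits one field value."""
    if value == "":           # empty fields are treated as NULL
        return "NULL"
    try:
        int(value)
        return "INTEGER"
    except ValueError:
        pass
    try:
        float(value)
        return "DOUBLE"
    except ValueError:
        return "VARCHAR"

def widen(a, b):
    """Combine two candidate types into the wider of the pair."""
    order = ["NULL", "INTEGER", "DOUBLE", "VARCHAR"]
    return a if order.index(a) >= order.index(b) else b

def infer_schema(text, sampling_ratio=1.0, has_header=True, delimiter=","):
    """Infer column names and types from a delimited string.

    Only a random sample of rows (controlled by sampling_ratio) is
    inspected, mirroring how a sampling ratio lets introspection skip
    reading every row.
    """
    rng = random.Random(0)    # fixed seed so results are repeatable
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    header = rows.pop(0) if has_header else None
    names = header or [f"col{i}" for i in range(len(rows[0]))]
    types = ["NULL"] * len(names)
    for row in rows:
        if rng.random() > sampling_ratio:   # skip rows outside the sample
            continue
        for i, field in enumerate(row):
            types[i] = widen(types[i], infer_type(field))
    return dict(zip(names, types))

data = "id,price,note\n1,9.99,ok\n2,12.50,late\n"
print(infer_schema(data))  # {'id': 'INTEGER', 'price': 'DOUBLE', 'note': 'VARCHAR'}
```

A lower `sampling_ratio` reads fewer rows and finishes faster, at the risk of inferring a too-narrow type for a column whose widest values fall outside the sample, which is why the ratio is a trade-off rather than a pure optimization.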