TIBCO Databricks SQL Direct Activity
Use this activity to run a simple or a complex SQL Query. The TIBCO Databricks SQL Direct activity returns information in the form of rows.
Settings
The Settings tab has the following field.
| Field | Description |
|---|---|
| Connection | Name of the Flogo Connector for Databricks connection from where information is retrieved. You can select the connection from the Connection dropdown list. |
Input
The Input tab has the following fields:
| Field | Required | Description |
|---|---|---|
| query | Yes | The SQL query string supports DDL, DML, and DQL commands. |
| queryTimeoutInSeconds | No |
Specifies how long (in seconds) the application waits for the next batch of query result rows before timing out. Default: 60 |
| batchSize | No |
The number of records to fetch per batch from the query results. Set to a positive number (example, 100) to enable batch processing. This is useful for processing large datasets in smaller chunks to minimize memory usage. |
| startRecord | No |
The record position to begin fetching from when using batch processing. Default: 0 (starts from the first record) Used in combination with batchSize to implement pagination. This is useful for reading the query results in a batch. |
Output
The Output tab has the following fields:
| Field | Required | Description |
|---|---|---|
| rows | Yes | The rows array is displayed for informational purposes only and cannot be modified or altered. The information in this schema varies depending on the schema provided for coercing. To configure the schema, hover over this field and click the … option. |
| rowCount | integer | The number of rows returned in the current result set. |
| done | boolean | true if no more records are available. false if there are more records available. This output item is useful to check whether there are more records when reading the query results in parts to preserve memory. |
Processing Large Datasets Using Batch Processing
Processing a large number of records from a Databricks query can consume significant system resources. To manage memory efficiently, you can process the data in smaller batches, handling a limited set of records at a time before moving to the next group.
To implement batching with the Databricks SQL Direct activity and process large data sets, perform the following steps:
SELECT statements are supported.-
Configure the Databricks SQL Direct activity with loop settings:
-
In the Loop tab, select Type:
Repeat while true -
Set Condition:
$activity[DatabricksSQLDirect].done == boolean.false()
-
-
Create an expression in the startRecord field to dynamically set the starting record for each iteration of the loop. For example,
$iteration[index] * 100. -
Define the number of records in the batchSize field. For example,
100records per batch. -
The loop uses the done flag available in the Output tab of the Databricks SQL Direct activity. The condition
$activity[DatabricksSQLDirect].done == boolean.false()ensures that the process continues as long as there are more records to read. Once all records have been processed, the done flag returns true, causing the loop to stop.
Loop
If you want this activity to iterate multiple times within the flow, enter an expression that evaluates the iteration details. Select a type of iteration from the Type menu. The default type is None, which means the activity does not iterate. For more information, see Using the Loop Feature in an Activity.
Retry on Error
For more information about the Retry on Error tab, see Using the Retry on Error Feature in an Activity.