TIBCO Databricks SQL Direct Activity

Use this activity to run a simple or a complex SQL Query. The TIBCO Databricks SQL Direct activity returns information in the form of rows.

Settings

The Settings tab has the following field.

Field	Description
Connection	Name of the Flogo Connector for Databricks connection from where information is retrieved. You can select the connection from the Connection dropdown list.

Input

The Input tab has the following fields:

Field	Required	Description
query	Yes	The SQL query string supports DDL, DML, and DQL commands.
queryTimeoutInSeconds	No	Specifies how long (in seconds) the application waits for the next batch of query result rows before timing out. Default: 60
batchSize	No	The number of records to fetch per batch from the query results. Set to a positive number (example, 100) to enable batch processing. This is useful for processing large datasets in smaller chunks to minimize memory usage.
startRecord	No	The record position to begin fetching from when using batch processing. Default: 0 (starts from the first record) Used in combination with batchSize to implement pagination. This is useful for reading the query results in a batch.

Output

The Output tab has the following fields:

Field	Required	Description
rows	Yes	The rows array is displayed for informational purposes only and cannot be modified or altered. The information in this schema varies depending on the schema provided for coercing. To configure the schema, hover over this field and click the … option.
rowCount	integer	The number of rows returned in the current result set.
done	boolean	true if no more records are available. false if there are more records available. This output item is useful to check whether there are more records when reading the query results in parts to preserve memory.

Processing Large Datasets Using Batch Processing

Processing a large number of records from a Databricks query can consume significant system resources. To manage memory efficiently, you can process the data in smaller batches, handling a limited set of records at a time before moving to the next group.

To implement batching with the Databricks SQL Direct activity and process large data sets, perform the following steps:

Note: In batching for Databricks, only SELECT statements are supported.

Configure the Databricks SQL Direct activity with loop settings:
1. In the Loop tab, select Type: Repeat while true
2. Set Condition: $activity[DatabricksSQLDirect].done == boolean.false()
Create an expression in the startRecord field to dynamically set the starting record for each iteration of the loop. For example, $iteration[index] * 100.
Define the number of records in the batchSize field. For example, 100 records per batch.
The loop uses the done flag available in the Output tab of the Databricks SQL Direct activity. The condition $activity[DatabricksSQLDirect].done == boolean.false() ensures that the process continues as long as there are more records to read. Once all records have been processed, the done flag returns true, causing the loop to stop.

Loop

If you want this activity to iterate multiple times within the flow, enter an expression that evaluates the iteration details. Select a type of iteration from the Type menu. The default type is None, which means the activity does not iterate. For more information, see Using the Loop Feature in an Activity.

Retry on Error

For more information about the Retry on Error tab, see Using the Retry on Error Feature in an Activity.

Did you find this helpful?

Yes No