Shared Volume Data Access Configuration
You can configure a Shared Volume between TIBCO Data Virtualization and the Apache Spark cluster for faster execution of results. It is configured using the tds.datavirt.sharedDataVolumes parameter while configuring the Spark Cluster data source, together with the URL of the File Adapter in TIBCO Data Virtualization.
The following scenarios are applicable:
- If the tds.datavirt.sharedDataVolumes parameter is not configured while configuring the Spark Cluster data source, this is known as TIBCO Data Virtualization managed data access.
- If the tds.datavirt.sharedDataVolumes parameter is configured with the same value as the File Adapter in TIBCO Data Virtualization, or if the file path is under the Shared Volume, this is known as Shared Volume Optimized data access.
- If the tds.datavirt.sharedDataVolumes parameter is configured and the file path is not under the Shared Volume in TIBCO Data Virtualization, this is known as Shared Volume Optimized data access for Remote sources.
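The three scenarios above boil down to a simple decision: is the parameter set at all, and if so, does the file path fall under one of the shared volumes? The following sketch illustrates that logic. The function name, the return labels, and the path values are illustrative assumptions, not product APIs.

```python
from pathlib import PurePosixPath

# Hypothetical helper illustrating the three data-access scenarios described
# above. "shared_volumes" stands in for the value of the
# tds.datavirt.sharedDataVolumes parameter; None means it was not configured.
def classify_access(shared_volumes, file_path):
    if shared_volumes is None:
        return "TIBCO Data Virtualization managed data access"
    path = PurePosixPath(file_path)
    for volume in shared_volumes:
        vol = PurePosixPath(volume)
        # A file path is "under" the Shared Volume when the volume is one of
        # its parent directories (or the path itself).
        if path == vol or vol in path.parents:
            return "Shared Volume Optimized data access"
    return "Shared Volume Optimized data access for Remote sources"
```

For example, with the parameter set to /mnt/shared, a file at /mnt/shared/input/data.csv would use the optimized path, while /remote/data.csv would fall into the Remote sources case.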
Recommendations
- To benefit from optimized file access, do not use an S3 data source as a shared data volume. You can use any other data source.
- The input and output schemas can be different.
- Assign the shared data volume to the @default_schema workflow variable. Make sure that all operators use this schema as the output schema unless you explicitly need to write to remote sources.
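As an illustration only, a configuration following these recommendations might look like the sketch below. The property-file syntax and the /mnt/shared path are assumptions; consult the product documentation for the exact format used when configuring the Spark Cluster data source.

```properties
# Hypothetical sketch: point the Spark Cluster data source at the shared
# volume (path is an assumption, not a default)
tds.datavirt.sharedDataVolumes=/mnt/shared

# The File Adapter URL in TIBCO Data Virtualization would reference the same
# location, and the @default_schema workflow variable would be assigned this
# shared data volume so operators write their output under it by default.
```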