Shared Volume Data Access Configuration

You can configure a Shared Volume between TIBCO Data Virtualization and the Apache Spark cluster for faster retrieval of results. Configure it using the tds.datavirt.sharedDataVolumes parameter when setting up the Spark Cluster data source, together with the URL of the File Adapter in TIBCO Data Virtualization.

The following scenarios are applicable:

  1. If the tds.datavirt.sharedDataVolumes parameter is not configured for the Spark Cluster data source, this is known as TIBCO Data Virtualization managed data access.

  2. If the tds.datavirt.sharedDataVolumes parameter is configured with the same value as the File Adapter in TIBCO Data Virtualization, or if the file path is under the Shared Volume, this is known as Shared Volume Optimized data access.

  3. If the tds.datavirt.sharedDataVolumes parameter is configured and the file path is not under the Shared Volume in TIBCO Data Virtualization, this is known as Shared Volume Optimized data access for Remote sources.
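As a sketch of how the scenarios differ, the fragment below shows the parameter together with two file paths; the mount point /mnt/shared and the example paths are hypothetical placeholders, not values mandated by the product:

```
# Spark Cluster data source property (hypothetical mount point)
tds.datavirt.sharedDataVolumes=/mnt/shared

# File path under the Shared Volume
# -> Shared Volume Optimized data access
file:///mnt/shared/sales/orders.parquet

# File path outside the Shared Volume
# -> Shared Volume Optimized data access for Remote sources
file:///data/archive/orders.parquet
```

If the tds.datavirt.sharedDataVolumes property were omitted entirely, both paths would fall back to TIBCO Data Virtualization managed data access.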

Recommendations

  • To benefit from optimized file access, do not use an S3 data source as a shared data volume; any other data source can be used.

  • The input and output schemas can be different.

  • Assign the shared data volume to the @default_schema workflow variable, and make sure that all operators use this schema as the output schema unless you explicitly need to write to remote sources.