Accessing Data in TIBCO Data Virtualization from TIBCO Data Science - Team Studio

There are three methods of accessing data in TIBCO Data Virtualization from TIBCO Data Science - Team Studio.

  1. TIBCO Data Virtualization managed Data Access

    This is the default data access method, and it supports a wide range of data source types. Data from sources such as Database, Remote File System, and Shared File System sources passes through TIBCO Data Virtualization before it is read by the Apache Spark cluster. Because Apache Spark evaluates lazily, only the data needed for processing is retrieved and sent to the Apache Spark cluster; the push-down features in TIBCO Data Virtualization make this possible. After computation, the results are returned to TIBCO Data Virtualization and then written back to the respective data sources.

    Examples of supported Database data sources are JDBC-compliant databases such as PostgreSQL, Oracle, and Redshift. Examples of supported Remote File System and Shared File System data sources are HDFS, NFS drives, and Amazon S3. This is the slowest of the three methods. To use this method, you must remove the tds.datavirt.sharedDataVolumes parameter when configuring the Spark Cluster data source.

    [Figure: TIBCO Data Virtualization managed Data Access]
  2. Shared Volume Optimized Data Access

    This method optimizes access to files in the shared volume. The Apache Spark cluster reads data directly from the Shared File System and, after computation, writes the results back to the same Shared File System. TIBCO Data Virtualization stores the metadata when the data is written to the Shared File System. Examples of supported data sources are HDFS, NFS drives, and Amazon S3. This is the fastest method of accessing data because the Apache Spark cluster reads and writes the files directly; in other words, the data is not moved across the cluster. For information on configuring the Shared Volume for data access, see Shared Volume Data Access Configuration.

    [Figure: Shared Volume Optimized Data Access]
  3. Shared Volume Optimized Data Access for Remote Sources

    This method optimizes writing data to remote sources. Data is read from remote sources through TIBCO Data Virtualization managed Data Access. After computation, the metadata is stored in TIBCO Data Virtualization, and the results are temporarily written to the Shared Auxiliary Volume before being written to the Target Volume (Database or Remote File System). The Shared File System serves as the Shared Auxiliary Volume, and TIBCO Data Virtualization moves the results from it to the remote source or Target Volume. For information on configuring the Shared Volume for data access, see Shared Volume Data Access Configuration.

    Note: This is the recommended method for accessing data.
    [Figure: Shared Volume Optimized Data Access for Remote Sources]
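
The read and write paths described for the three methods can be summarized in a short Python sketch. It is an illustrative model only, not part of TIBCO Data Science - Team Studio or TIBCO Data Virtualization. The names `AccessMethod`, `data_paths`, and `choose_method` are hypothetical. The selection rule in `choose_method` is an assumption inferred from the descriptions above, not documented product logic. Under that assumption, removing tds.datavirt.sharedDataVolumes selects TIBCO Data Virtualization managed Data Access. With shared volumes configured, data on the Shared File System uses Shared Volume Optimized Data Access, and other sources use the remote-source variant.

```python
from enum import Enum

# Illustrative model only: these names are hypothetical and are not part of any
# TIBCO API. Stage labels mirror the terms used in this section.
SPARK = "Apache Spark cluster"
TDV = "TIBCO Data Virtualization"
SOURCE = "Data source (Database, Remote or Shared File System)"
SHARED = "Shared File System"
AUX = "Shared Auxiliary Volume"
REMOTE = "Remote source"
TARGET = "Target Volume (Database or Remote File System)"


class AccessMethod(Enum):
    TDV_MANAGED = 1           # TIBCO Data Virtualization managed Data Access
    SHARED_VOLUME = 2         # Shared Volume Optimized Data Access
    SHARED_VOLUME_REMOTE = 3  # Shared Volume Optimized Data Access for Remote Sources


def data_paths(method: AccessMethod) -> dict:
    """Return the stages the data (not the metadata) passes through on read and write."""
    if method is AccessMethod.TDV_MANAGED:
        # Reads and writes are both routed through TDV.
        return {"read": [SOURCE, TDV, SPARK], "write": [SPARK, TDV, SOURCE]}
    if method is AccessMethod.SHARED_VOLUME:
        # Spark reads and writes the shared volume directly; TDV only stores metadata.
        return {"read": [SHARED, SPARK], "write": [SPARK, SHARED]}
    if method is AccessMethod.SHARED_VOLUME_REMOTE:
        # Reads go through TDV; results are staged on the auxiliary volume,
        # and TDV then moves them to the target volume.
        return {"read": [REMOTE, TDV, SPARK], "write": [SPARK, AUX, TDV, TARGET]}
    raise ValueError(f"unknown access method: {method!r}")


def choose_method(shared_volumes_configured: bool, data_on_shared_volume: bool) -> AccessMethod:
    """ASSUMED selection rule, inferred from the prose; not documented product logic.

    shared_volumes_configured stands in for whether tds.datavirt.sharedDataVolumes
    is set on the Spark Cluster data source.
    """
    if not shared_volumes_configured:
        return AccessMethod.TDV_MANAGED
    if data_on_shared_volume:
        return AccessMethod.SHARED_VOLUME
    return AccessMethod.SHARED_VOLUME_REMOTE


if __name__ == "__main__":
    for method in AccessMethod:
        paths = data_paths(method)
        print(method.name)
        print("  read: ", " -> ".join(paths["read"]))
        print("  write:", " -> ".join(paths["write"]))
```

Running the script prints the read and write path for each method. Keeping the paths as plain lists makes the key contrast visible. Only Shared Volume Optimized Data Access keeps TIBCO Data Virtualization off the data path, and there it stores only the metadata.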