Finishing Data Ship Configuration for Data Sources

All data sources that are to participate in data ship should be configured as data ship targets. If only one node is configured as a data ship target, the relative sizes of the nodes are not considered; the SQL operation can be performed only on the data source that can act as the data ship target.

At least one data source in a federated join must be configured as a target that can accept data shipments of external tables; otherwise, data ship optimization is not possible.

Data can only be shipped in the direction of the data ship target.

When all participating data sources are configured to act as data ship targets, the estimated costs of obtaining results from each of the nodes are analyzed. The node that is likely to be the most expensive to retrieve is designated as the target, and the results of the nodes with smaller cardinality are shipped to the target node. The SQL operation is then performed locally on the temporary tables in the target data source to gain the efficiencies available there.
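The selection logic described above can be sketched as follows. This is an illustrative model only, not TDV's actual implementation; the node structure and field names are assumptions made for the example.

```python
# Hypothetical sketch of data ship target selection (not TDV's actual code).
# Each query node carries an optimizer cost estimate; the node that would be
# most expensive to retrieve becomes the target, and the remaining (smaller)
# nodes are shipped to it as temporary tables.

def choose_dataship_target(nodes):
    """nodes: list of dicts with 'name', 'cost', and 'is_target_capable'."""
    candidates = [n for n in nodes if n["is_target_capable"]]
    if not candidates:
        return None, []  # no target configured: data ship is not possible
    target = max(candidates, key=lambda n: n["cost"])
    shipped = [n for n in nodes if n is not target]
    return target, shipped

# Example: the large Oracle node becomes the target; the small DB2
# result set is shipped to it.
nodes = [
    {"name": "oracle_ds", "cost": 1_000_000, "is_target_capable": True},
    {"name": "db2_ds", "cost": 5_000, "is_target_capable": True},
]
target, shipped = choose_dataship_target(nodes)
```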

About Lower and Upper Bounds

When data-source-specific analytic or aggregate functions are used in queries, set the lower bound to zero so that even nodes with very low cardinality results are pushed to the data source for shipment. For joins where the nodes have low cardinality and do not use data-source-specific functions, set the lower bound high enough to block data shipment of results with only a small number of rows. Performance is ordinarily better in this case, because query initialization and temp table creation can take a few seconds longer than direct processing of a small number of rows that do not require SQL analytical processing.

Where more than two nodes are involved in a SQL operation, it might not make sense to limit shipment of a small result set, because external table creation and shipment can happen in parallel. The SQL operation is not hindered by gathering relatively small result tables from multiple sources while a larger data set is loading.

The upper bound value sets the maximum size (estimated in rows) of result tables that can be considered for shipping as external tables.

An upper bound of zero disables data shipment from that data source, but does not disqualify it from participating in a data ship as the target for other result tables. If a query node estimate is greater than the upper bound for that data source, the node can participate only as the data ship target. The upper bound is data-source-specific; it restricts the movement of very large result tables.
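The bound checks described above can be sketched as a simple predicate. This is a hedged illustration under assumed semantics (inclusive bounds, zero upper bound meaning "disabled"); it is not TDV's actual implementation.

```python
# Hypothetical sketch of the lower/upper bound eligibility check
# (illustrative only; names and boundary semantics are assumptions).

def eligible_for_shipment(estimated_rows, lower_bound, upper_bound):
    """A node is shippable when its row estimate falls between the bounds.

    An upper bound of 0 disables shipment from this source entirely,
    though the source may still act as a data ship target.
    """
    if upper_bound == 0:
        return False
    return lower_bound <= estimated_rows <= upper_bound

# A very small result set is processed directly rather than shipped...
small = eligible_for_shipment(500, 1000, 1_000_000)
# ...while a mid-sized result set qualifies for shipment...
medium = eligible_for_shipment(50_000, 1000, 1_000_000)
# ...and an upper bound of zero blocks shipment from this source.
disabled = eligible_for_shipment(50_000, 1000, 0)
```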

To finish data ship configuration

1. Open Studio.
2. Right-click the data source name in the Studio resource tree and select Open.
3. Select the Configuration tab, and provide values for the following on the Basic tab:
Pass-through Login—For non-Netezza data sources, set to Disabled so that data ship functionality can work properly.
Transaction Isolation level—Set to Serializable for data ship functionality. This prevents dirty reads, non-repeatable reads, and phantom reads.
4. Use the Advanced tab to set values for the following fields. The fields that you see vary depending on the type of data source you are working with. For example, a Sybase IQ data source might display the Sybase iAnywhere JDBC Driver check box, but a Teradata data source would not.

Field: Is dataship source
Type of Data Source: DB2, Oracle 11g, Sybase IQ 15, Teradata, MS SQL Server, Vertica
Description: Check this box if the physical data source might be used as a source of shipped tables to another data-ship-enabled data source. Check Is dataship source for all data sources so that the TDV Server can analyze the query and determine, based on estimated query node cardinality, which side is best to ship to.

Field: Is dataship target
Type of Data Source: All
Description: Check this box if the physical data source might be used to receive shipped tables from another data-ship-enabled data source. Check Is dataship target for all data sources so that the TDV Server can analyze the query and determine, based on estimated query node cardinality, which side is best to ship to.

Field: Lower bound for data ship (LowerBoundForDataShip); Upper bound for data ship (UpperBoundForDataShip)
Type of Data Source: All
Description: TDV uses Explain Plan to arrive at a numeric estimate of the cost of shipping data from a node. When the cost of shipping a federated query node falls between the lower and upper bounds, the node is considered eligible for shipment so that it can be processed locally.

Field: Use global temp space for temp tables
Type of Data Source: DB2
Description: When this data source is the data ship target, this option creates global temporary tables and drops the data, together with the table structure, after the data ship query completes.

Field: Schema path for Temp Tables
Type of Data Source: All
Description: A relative path that sets the location of the temp tables on the data source. It is the name of a schema in the data source. Required for DB2; make sure that this name matches a schema name known to TDV, with exactly matching case.

Field: Temp Table Prefix
Type of Data Source: All
Description: A character string prepended to temporary table names so that they can be recognized if needed. Optional for DB2.

Field: Enable Bulk Import/Export
Type of Data Source: MS SQL Server
Description: Check box that enables bulk movement of data using the bcp utility. You must have finished the steps in Configuring Data Ship for Microsoft SQL Server.

Field: Database Link List
Type of Data Source: Oracle with database links
Description: Add your database links separated by semicolons, using the following syntax:
[DBLINK NAME]@[DBLINK OWNER DS PATH]
For example:
oral2_dblink@/users/composite/test/sources/dship/DEV-DSJ-ORA11G-2

Field: Enable Bulk Export/Load
Type of Data Source: PostgreSQL, Vertica
Description: Takes advantage of the PostgreSQL COPY command.

Field: Enable PostgreSql dblink
Type of Data Source: PostgreSQL
Description: Check to enable database links to improve performance if you plan to use this data source for data caching or data ship optimization. If you check this box, add one or more database links by specifying the database link name and the path of the data source for each link.

Field: Sybase iAnywhere JDBC Driver
Type of Data Source: Sybase IQ
Description: Select this check box to enable better performance. This option enables the Sybase IQ-specific LOAD TABLE SQL command (over ODBC) to import data. You must have finished the steps in Configuring Data Ship for Sybase IQ.

Field: Sql Anywhere Data Source
Type of Data Source: Sybase IQ
Description: Enter your ODBC DSN name.

Field: Enable Sybase IQ SQL Location
Type of Data Source: Sybase IQ with Location
Description: Select this check box to enable this option for the data source.

Field: SQL Location Name
Type of Data Source: Sybase IQ with Location
Description: The SQL Location name should take the form:
<server_name>.<database_name>

Field: Path of data source
Type of Data Source: Sybase IQ with Location
Description: The full TDV pathname to the other Sybase data source. For example:
/shared/sybase_iq_1

Field: Add Sql Location
Type of Data Source: Sybase IQ with Location
Description: If more than two Sybase IQ instances are involved, you can add multiple locations by using the green plus button.

Field: Enable FastLoad/FastExport for large tables
Type of Data Source: Teradata
Description: Setting this option indicates that you want to use Teradata's FastLoad or FastExport utility to speed up your query times. For a given query, cardinality information is used to decide whether to use fastpath or default JDBC loading.

Field: FastExport Session Count
Type of Data Source: Teradata
Description: The number of FastExport sessions to use for Teradata.

Field: FastLoad Session Count
Type of Data Source: Teradata
Description: The number of FastLoad sessions to use for Teradata.

Field: Enable Bulk Load
Type of Data Source: Vertica
Description: Setting this option indicates that you want to use Vertica's bulk load facility to speed up your query times. For a given query, cardinality information is used to decide whether to use bulk load or default JDBC loading.

Field: Enable Export To Another Vertica Database
Type of Data Source: Vertica
Description: When set, allows data to be exported to another Vertica database. You need to name each database that is available to accept the exported data. TDV uses the CONNECT and EXPORT commands to establish the connections between the data ship source and the data ship target.

Field: Exported Databases
Type of Data Source: Vertica
Description: Available only if you have selected the Enable Export To Another Vertica Database option.

Field: Exported database name
Type of Data Source: Vertica
Description: Name of the Vertica database to which you want to export data.

Field: Path of data source
Type of Data Source: Vertica
Description: The full TDV pathname to the other Vertica data source. For example:
/shared/vertica
5. Save your settings.
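As an illustration of the Database Link List syntax described above (semicolon-separated entries of the form [DBLINK NAME]@[DBLINK OWNER DS PATH]), the following is a hypothetical sketch of how such a value could be split into name/path pairs. TDV performs its own parsing internally; this helper exists only for this example.

```python
# Hypothetical parser for the Database Link List syntax described above:
# entries separated by semicolons, each of the form
#   [DBLINK NAME]@[DBLINK OWNER DS PATH]
# (illustrative only; not part of TDV).

def parse_dblink_list(value):
    links = []
    for entry in value.split(";"):
        entry = entry.strip()
        if not entry:
            continue
        # Split on the first "@": left side is the link name,
        # right side is the TDV path of the link owner data source.
        name, sep, path = entry.partition("@")
        if not sep or not name or not path:
            raise ValueError(f"malformed database link entry: {entry!r}")
        links.append((name, path))
    return links

links = parse_dblink_list(
    "oral2_dblink@/users/composite/test/sources/dship/DEV-DSJ-ORA11G-2"
)
```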