Finishing Data Ship Configuration for Data Sources
All data sources that are to participate in data ship should be configured as data ship targets. If only one node is configured as a data ship target, the relative sizes of the nodes are not considered; the SQL operation can be performed only on the data source that can act as the data ship target.
At least one data source in a federated join must be configured to accept data shipments of external tables as the target; otherwise, data ship optimization is not possible.
Data can only be shipped in the direction of the data ship target.
When all participating data sources are configured to act as data ship targets, the estimated costs of obtaining results from each of the nodes are analyzed. The node that is likely to be the most expensive to retrieve is designated as the target, and the results of the smaller-cardinality nodes are fetched and shipped to the target node. The SQL operation is then performed locally on the temporary tables in that data source to gain the efficiencies available on the target.
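The target-selection rule above can be sketched in Python. This is an illustrative sketch only; the QueryNode structure, its field names, and the selection function are assumptions made for the example, not TDV internals:

```python
from dataclasses import dataclass

@dataclass
class QueryNode:
    """Hypothetical stand-in for a node in a federated query plan."""
    name: str
    estimated_cost: float      # estimated cost to retrieve this node's results
    is_dataship_target: bool   # is the data source configured as a data ship target?

def pick_dataship_target(nodes):
    """Pick the costliest target-enabled node as the data ship target.

    All other nodes ship their results to it as external tables, so the
    SQL operation can run locally on the target. Returns (target, shipped);
    (None, []) means no node is configured as a target, so data ship
    optimization is not possible.
    """
    candidates = [n for n in nodes if n.is_dataship_target]
    if not candidates:
        return None, []
    target = max(candidates, key=lambda n: n.estimated_cost)
    shipped = [n for n in nodes if n is not target]
    return target, shipped
```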
About Lower and Upper Bounds
When data-source-specific analytic or aggregate functions are used in queries, set the lower bound to zero so that nodes with very low-cardinality results are still pushed to the data source for shipment. For joins where the nodes have low cardinality and do not use data-source-specific functions, set the lower bound high enough to block data shipment of results with only a small number of rows. Performance is ordinarily better in this case because query initialization and temp table creation can take a few seconds longer than direct processing of a small number of rows that do not require SQL analytic processing.
Where more than two nodes are involved in a SQL operation, it might not make sense to limit shipment of a small result set, because external table creation and shipment can happen in parallel. The SQL operation is not hindered by gathering relatively small results tables from multiple sources while a larger data set is loading.
The upper bound value sets the size (estimated and measured in rows) that limits which result tables can be considered for shipment as external tables.
A value of zero disables data shipment from that data source, but does not disqualify it from participating in a data ship as the target for other result tables. If a query node estimate is greater than the upper bound for that data source, the node can participate only as the data ship target. The upper bound is data-source-specific; it restricts the movement of very large result tables.
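The bound check described above amounts to a simple range test. A minimal sketch, assuming the bounds and the Explain Plan row estimate are plain integers (the function name is illustrative, not a TDV API):

```python
def eligible_for_shipment(estimated_rows: int, lower_bound: int, upper_bound: int) -> bool:
    """Return True if a query node's estimated result size makes it
    eligible to ship as an external table.

    An upper bound of zero disables shipment from the data source entirely
    (it can still act as the data ship target for other nodes' results).
    Otherwise the estimate must fall between the lower and upper bounds.
    """
    if upper_bound == 0:
        return False
    return lower_bound <= estimated_rows <= upper_bound
```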
To finish data ship configuration
1. Open Studio.
2. Right-click the data source name in the Studio resource tree and select Open.
3. Select the Configuration tab, and provide values for the following on the Basic tab:
Pass-through Login—For non-Netezza data sources, set to Disabled so that data ship functionality works properly.
Transaction Isolation level—Set to Serializable for data ship functionality. This prevents dirty reads, non-repeatable reads, and phantom reads.
4. Use the Advanced tab to set values for the following fields. The fields that you see vary depending on the type of data source with which you are working. For example, a Sybase IQ data source might display the Sybase iAnywhere JDBC Driver check box, but a Teradata data source would not.
Is dataship source
Applies to: DB2, Oracle 11g, Sybase IQ 15, Teradata, MS SQL Server, Vertica
Check this box if the physical data source might be used as a source of tables shipped to another data-ship-enabled data source. Check Is dataship source for all data sources so that the TDV Server can analyze the query and determine which side is best to ship to, based on estimated query node cardinality.

Is dataship target
Applies to: All
Check this box if the physical data source might be used to receive tables shipped from another data-ship-enabled data source. Check Is dataship target for all data sources so that the TDV Server can analyze the query and determine which side is best to ship to, based on estimated query node cardinality.
Lower bound for data ship (LowerBoundForDataShip)
Upper bound for data ship (UpperBoundForDataShip)
Applies to: All
TDV uses Explain Plan to arrive at a numeric estimate of the cost of shipping data from a node to the Data Virtualizer. When the cost of shipping a federated query node falls between the Lower bound and Upper bound limits, the node is considered eligible for shipment so that it can be processed locally.
Use global temp space for temp tables
Applies to: DB2
When this data source is the data ship target, this option creates global temporary tables and drops the data, together with the table structure, after the data ship query completes.

Schema path for Temp Tables
Applies to: All
A relative path that sets the location of the temp tables on the data source. It is the name of a schema in the data source.
Required for DB2; make sure that this name matches a schema name known to TDV, with exactly matching case.
Temp Table Prefix
Applies to: All
A character string added to temporary table names so that they can be recognized if needed.
Optional for DB2.

Enable Bulk Import/Export
Applies to: MS SQL Server
Check box that enables the bulk movement of data using the bcp utility. You must have finished the steps in Configuring Data Ship for Netezza.
Database Link List
Applies to: Oracle with database links
Add your database links separated by semicolons, using the following syntax:
[DBLINK NAME]@[DBLINK OWNER DS PATH]
For example:
oral2_dblink@/users/composite/test/sources/dship/DEV-DSJ-ORA11G-2
Enable Bulk Export/Load
Applies to: PostgreSQL, Vertica
Takes advantage of the PostgreSQL COPY command.

Enable PostgreSql dblink
Applies to: PostgreSQL
Check to enable database links, which can improve performance if you plan to use this data source for data caching or data ship optimization. If you check this box, add one or more database links by specifying the database link name and the path of the data source for each link.
Sybase iAnywhere JDBC Driver
Applies to: Sybase IQ
Select this check box to enable better performance. This option enables the Sybase IQ-specific LOAD TABLE SQL command, over ODBC, to import data into TDV. You must have finished the steps in Configuring Data Ship for Sybase IQ.

Sql Anywhere Data Source
Applies to: Sybase IQ
Enter your ODBC DSN name.
Enable Sybase IQ SQL Location
Applies to: Sybase IQ with Location
Select this check box to enable the SQL Location option for the data source.

SQL Location Name
Applies to: Sybase IQ with Location
The SQL Location name should take the form:
<server_name>.<database_name>

Path of data source
Applies to: Sybase IQ with Location
The full TDV pathname of the other Sybase data source. For example:
/shared/sybase_iq_1

Add Sql Location
Applies to: Sybase IQ with Location
If more than two Sybase IQ instances are involved, you can add multiple locations by using the green plus button.
Enable FastLoad/FastExport for large tables
Applies to: Teradata
Setting this option indicates that you want to use Teradata's FastLoad or FastExport utility to speed up your query times. For a given query, cardinality information is used to decide whether to use Fastpath or default JDBC loading.

FastExport Session Count
Applies to: Teradata
The number of FastExport sessions to use for Teradata.

FastLoad Session Count
Applies to: Teradata
The number of FastLoad sessions to use for Teradata.
Enable Bulk Load
Applies to: Vertica
Setting this option indicates that you want to use Vertica's Bulk Load utility to speed up your query times. For a given query, cardinality information is used to decide whether to use Bulk Load or default JDBC loading.

Enable Export To Another Vertica Database
Applies to: Vertica
When set, allows data to be exported to another Vertica database. You need to name each database that is available to accept the exported data. TDV uses the CONNECT and EXPORT commands to establish the connections between the data ship source and the data ship target.

Exported Databases
Applies to: Vertica
Available only if you have selected the Enable Export To Another Vertica Database option.

Exported database name
Applies to: Vertica
Name of the Vertica database to which you want to export data.

Path of data source
Applies to: Vertica
The full TDV pathname of the other Vertica data source. For example:
/shared/vertica
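To make the Database Link List syntax shown in the table above concrete, here is a small illustrative parser that splits such a value into (link name, data source path) pairs. The function is an assumption made for the example, not part of TDV:

```python
def parse_dblink_list(value: str):
    """Split a semicolon-separated Database Link List into
    (dblink_name, dblink_owner_ds_path) pairs; each entry has
    the form NAME@PATH."""
    links = []
    for entry in value.split(";"):
        entry = entry.strip()
        if not entry:
            continue  # ignore empty entries, e.g. a trailing semicolon
        name, _, path = entry.partition("@")
        links.append((name, path))
    return links
```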
5. Save your settings.