JDBC Data Source Overview

This topic explains how to use the JDBC Table data construct and one or more Query operators in an EventFlow module to access an external JDBC data source. To use a JDBC data source with a StreamSQL application, see the APPLY Statement in the StreamSQL Guide.

In an EventFlow module, a JDBC Table data construct points to the JDBC data source that you want to use. When connected to the JDBC Table, a Query operator can manipulate the JDBC data together with tuples from the application's input streams. JDBC Tables can be associated with multiple Query operators, but each Query operator can be associated only with one JDBC Table.

Tip

JDBC Query operators that specify a select statement generate a typecheck error when working in an environment that does not have connectivity to the database server. As a workaround, you can specify an explicit schema for the SQL query in the operator's Result Settings tab. If you anticipate having database connectivity when composing the operator, but none later, use the Execute query and populate fields link that derives the schema from executing the SQL query.

Before You Begin

When you develop an application that uses a JDBC data source, StreamBase Studio actually connects to the database server to perform typechecking. Before performing the steps in this topic, you will need the following information from the database administrator:

  • The JDBC URI to use to connect to the data source server.

  • If required, a username and password to use when connecting to the data source server.

  • The class name of the driver.

  • The name and location of the JDBC driver JAR file and any library files called by the JAR file.

You must obtain the JDBC driver JAR file from your database vendor's web site, or packaged as part of your database distribution. Remember that connecting to a JDBC data source requires configuring both a <data-source> element that describes the location of the database server, and a <jar> child of the <java-vm> element, with a path pointing to the JDBC driver JAR file.

Note

Under some circumstances, the JVM can access the wrong data source if multiple drivers attempt to access the same data source. To avoid this potential problem, it is best for all applications that access a given data source to be in the same project, within a StreamBase workspace.

Configuring a JDBC Data Source

To use a JDBC data source in an EventFlow module:

  1. To give StreamBase Studio runtime access to the JAR file that implements the JDBC driver for your database, add the JAR file to the Java Build Path for your Studio project:

    • Select the project of interest in the Package Explorer view.

    • Right-click and select Build PathAdd External Archives.

    • Navigate to the folder that contains your database vendor's JDBC JAR file and select it.

    • Click Open.

  2. If your JDBC driver requires access to any supporting library files (such as .so or DLL files), add those to the resource search path of your Studio project. You can:

    • Import the JAR and library files into a folder of your project and then add that folder to the resource search path.

    • Establish a project reference to another Studio project that already includes these files.

  3. In the Package Explorer view, navigate to your application's project folder and open the project's server configuration file, traditionally named sbd.sbconf. If the project does not have a configuration file, add a new, empty one.

  4. To make the JDBC driver visible when running your module from the command line, add a <jar> child element of the <java-vm> element of your configuration file, with a path pointing to the JDBC driver's JAR file obtained from your database vendor. If your JDBC driver requires access to library files, add the folder that contains those to the <library> child element. See <java-vm>.

  5. In the server configuration file, declare the JDBC data source in the <data-sources> element. The following shows an example for connecting to an Oracle database. See <data-sources> Examples for examples for other supported databases.

    <data-sources>
      <data-source name="Oracle_10g" type="jdbc">
        <driver value="oracle.jdbc.driver.OracleDriver"/>
        <uri value="jdbc:oracle:thin:@orasrv.example.com:1521:sb_test"/>
        <param name="user" value="orauser"/>
        <param name="password" enciphered="true"
          value="M5DSWylszg5aA9AK29MOiaDLq7SMqmTor+nW3qURTrT9E9eqJfTPyyUudCK34nhXHE53PXK6pregp4MW8qrueg=="/>
        <param name="jdbc-fetch-size" value="10000"/>
        <param name="jdbc-batch-size" value="20"/>
        <param name="jdbc-max-column-size" value="32768"/>
        </data-source>
    </data-sources>
    

    The name you assign to your data source must follow the StreamBase identifier naming rules, as described in Identifier Naming Rules.

    This example shows the use of an enciphered password. This feature is described on Enciphering Passwords and Parameter Values.

  6. Pay attention to the JDBC Data Source Options for your data source. In particular, notice that the StreamBase default for the jdbc-reconnect-attempts option is zero, which means failed database connections are not automatically reconnected by default. Be sure to specify a non-zero value for this option if you want to use automatic reconnection.

  7. Drag a JDBC Table icon from the Palette view to the EventFlow canvas.

  8. Open the Properties view of the new JDBC Table data construct. Name the component in the General tab, and in the Data Source tab, identify the data source this table will use. If no data sources appear in the drop-down list in this tab, this means you have not completely configured the server configuration file's <data-source> element described in step 5.

  9. Create one or more Query operators that will be associated with the JDBC Table. That is, for each query, drag a Query operator icon from the Palette view to the canvas.

  10. Connect the Query operators to the JDBC Table. If automatic typechecking is enabled, StreamBase Studio performs typechecking at this point.

    If a typecheck error reports an unknown data source:

    • Make sure the database server is running and is accepting connections.

    • Double-check your entries for the <jar> and <data-source> elements of the server configuration file.

  11. For each Query operator, edit its properties as described in Using the Query Operator with JDBC Tables.

Configuring for Deployment With a JDBC Data Source

When you complete your application and are ready to deploy it outside of StreamBase Studio, you can export all the files you have developed in StreamBase Studio to your deployment environment, using the Studio bundling feature.

The sbd.sbconf file that you use in your production environment must be configured to access the JDBC data sources in the production environment:

  • If the URL of the production database server is different than in development, edit the <uri> child of the <data-source> element to point to the production server.

  • Make sure the paths to the JDBC driver's JAR file and any required library files are still correctly described in the production configuration file. If you generated a StreamBase bundle file, paths relative to your Studio project directory are preserved.

JDBC Data Source Options

StreamBase provides several options to manage the behavior of your JDBC connections. If your application needs to access more than one JDBC data source, you can provide a different set of options for each JDBC data source. Specify options using the <param> child element of the <data-source> element in the server configuration file.

Each option is described in detail in the StreamBase Server Configuration File XML Reference, in the section for <data-sources/data-source/param>. In addition, certain <param> options are described in later sections of this page.

The following table summarizes the available JDBC <param> options in two groups. The first group gathers all parameters that manage reconnecting to the data source, while the second group lists the remaining parameters in alphabetical order.

Option (click for primary documentation) Summary
jdbc-reconnect-attempts Specifies an integer number of times that StreamBase Server is to attempt retrying a lost connection to the database server. The default value is zero, which specifies no reconnection attempts. Specify –1 to continuously try to reconnect.
jdbc-reconnect-sleep Specifies in milliseconds the period that StreamBase Server waits before retrying a connection to the database server.
jdbc-reconnect-regexp Specifies a database error message number or SQLSTATE code for which you DO want to retry communicating with the database server.
jdbc-dont-reconnect-regexp Specifies the database error message numbers or SQLSTATE codes for which you do NOT want to retry communicating with the database server.

When a reconnection becomes necessary, a tuple is emitted on the Error Output Port of the Query operator with the message Attempt reconnect to data source. When the reconnection succeeds, another tuple on this port reports Success reconnecting to data source.

The following JDBC data source parameters are in alphabetical order.

Option (click for primary documentation) Summary Further Discussion
jdbc-batch-size Sets the number of statements to be executed in a single batch.
jdbc-batch-timeout-ms Specifies the longest time in milliseconds that batched tuples remain buffered and unsent.
jdbc-fetch-size Controls the size of the buffer used by the JDBC driver when fetching rows. Avoiding Deadlocks With JDBC Queries and Limiting Buffer Size for Fetched JDBC Table Rows below on this page
jdbc-max-column-size Specifies the maximum column size expected in a JDBC table. Controlling Wide Column Data Types below on this page
jdbc-query-timeout Specifies an integer number of milliseconds that StreamBase waits for each JDBC Query operation to execute. Modifying the Timeout for JDBC Communications below on this page
jdbc-quote-strings Specifies sending quoted strings ("'like this'") to the JDBC PreparedStatement object if the database vendor, such as Vertica, requires it.
jdbc-retry-sqlstate Specifies a database error message number or SQLSTATE code for which you always want the previous select, insert, or update operation to be retried. Retrying Queries below on this page
jdbc-share-connection For applications with more than one JDBC Table data construct, set to true to have all JDBC Query operations to the same database server share a single connection. Set to false to have each Query set up and manage its own connection.
jdbc-timeout Specifies the timeout (in milliseconds) of background JDBC operations in StreamBase Studio. This parameter affects only typecheck times in Studio, and has no effect during runtime.

Controlling Wide Column Data Types

Different JDBC drivers allocate different sizes for column types. Very large column types (such as the text column for the Microsoft JDBC driver) can cause typecheck errors. For this reason, the StreamBase Server sbd.sbconf configuration file sets a maximum column size for JDBC tables. By default, this parameter is set to 2048.

If your JDBC database uses wide column types, there are several options to avoid typecheck problems in StreamBase:

  • Limit column sizes in the SQL statements of individual APPLY JDBC operations (in StreamSQL applications) or Query operators (in EventFlow applications). As long as you specify a column size of 2048 or less, output tuple sizes will be acceptable. The SQL syntax depends on your database; here are three different examples of setting a column size to 1000:

    select cast(large_col as varchar(1000)) from table
    select substr(large_col,1,1000) from table
    select convert(large_col,varchar(1000)) from table
    

    Note

    If the actual data in your columns exceeds the limit you set, the data will be truncated.

  • Set StreamBase Server's maximum column size to a higher value. Open the server configuration file, and add or uncomment the jdbc-max-column-size parameter, and set the value. For example:

    <data-sources>
      ...
      <param name="jdbc-max-column-size" value="4096"/>
    </data-sources>
    

    This setting applies to all JDBC tables connected to StreamBase Server. Note: The jdbc-max-column-size can affect the size of tuples sent to the output stream. Because the entire tuple's schema must be able to fit into the page size, consider also increasing the page-pool and page-size parameters, as described in StreamBase Server Configuration File XML Reference.

If neither of these options is acceptable, consider changing your JDBC table schema to avoid wide column types.

Modifying the Timeout for JDBC Communications

Recall that StreamBase Studio communicates with the data source server in order to typecheck an application that has a JDBC data source connection. If the data source server does not respond during a preset jdbc-timeout interval, the application fails to typecheck.

The default timeout value of 15 seconds is adequate for normal local area network connections; consider increasing the jdbc-timeout interval if you experience typecheck failures due to a slow network connection:

  1. Open your project's sbd.sbconf file in a text editor.

  2. Uncomment the <param name="jdbc-timeout"> element under <data-sources>.

  3. Change the default value. For example:

    <data-sources>
      ...
      <param name="jdbc-timeout" value="25000"/>
    </data-sources>
    

You may need to repeat this process, trying different jdbc-timeout values, until your application typechecks normally.

Avoiding Deadlocks With JDBC Queries

It is possible for deadlocks to occur if you perform multiple query operations against the same JDBC data source, and when the output of one query operation feeds the input of another. By default, when multiple EventFlow Query operators or StreamSQL APPLY statements connect to the same JDBC Table, all the query operations run in the same thread, and each query operation holds a connection to the external data source. Therefore, a deadlock can occur if the following conditions are all true:

  • The application contains a path running on a single thread with a Query operator connected to a JDBC database.

  • The query returns more than one row from one or more tables.

  • On the same path and in the same thread, a downstream Query operator attempts to update one or more of the selected JDBC tables.

Note

When we refer to a JDBC Query operator, we mean a Query operator that is connected to a JDBC Table data construct, which in turn is configured to connect to an external database. In a text-based StreamSQL application, we mean cases where you are using APPLY JDBC statements that operate on a selected result set, following by an attempted update of the selected tables.

For example, consider the following EventFlow module, where the first Query operator is used to read data from a large result set in a JDBC table. The second Query operator consumes those output tuples and then attempts to update the same JDBC table rows:

In this scenario, if the first query runs to completion first (all rows in the result set are fetched), the second query can run without a problem.

Now, consider what happens if the Select operator passes only a partial result set in its first output tuple to the Update operator. Because the select operation is not finished, the JDBC server forces the Update operator to wait. The Update operator cannot write to the data source while the JDBC server is waiting. Meanwhile, the Select cannot finish fetching rows because the thread shared by the two operators is waiting for the update operation.

To avoid this issue, you can take any one of these steps:

  • Redesign your application to eliminate multiple query operations on the same data source. This might involve writing more complex SQL update operations that select as well as update data. For example, combine the operations into a single Query operator, and within the updating expression use UPDATE ... WHERE.

  • Write a Query that will return a result set (per SELECT) that is less than the configured jdbc-fetch-size. Most JDBC drivers have a default fetch size that is larger than 1, but the actual value varies in different JDBC drivers. Consider explicitly setting the jdbc-fetch-size parameter in the data-sources section of the StreamBase Server configuration file, sbd.sbconf. This parameter attempts to specify the size of the buffer used by the JDBC driver when fetching rows during query execution. Decide on the minimum number of tuples that you can reasonably fetch from the database per query, and then make sure that jdbc-fetch-size is set to a larger value.

  • Change the fetch size used by your JDBC server to match or exceed the maximum result set expected in select and update operations, as described in Limiting Buffer Size for Fetched JDBC Table Rows. For further information about jdbc-fetch-size, refer to StreamBase Server Configuration File XML Reference in the Reference Guide.

  • Process one or more of the Query operations in parallel mode instead of the default serial mode, if feasible for your application, as described in Concurrency Options.

  • Another workaround is to use the (separately installed) external StreamBase Adapter for JDBC, instead of the JDBC Table data construct in a module. If you are interested in this option, contact your TIBCO representative to obtain the external StreamBase Adapter for JDBC.

Multiple operators in multiple threads can conflict, but they will usually succeed eventually, because StreamBase retries the DML statement that fails. In general, one way to work around deadlocks is to put operations in separate threads.

Limiting Buffer Size for Fetched JDBC Table Rows

Queries to some external JDBC data sources can produce very large result sets. If a SELECT statement returns too much data, memory can be exhausted. If the problem cannot be addressed in the JDBC data source itself, consider setting the jdbc-fetch-size parameter in the <data-sources> section of the StreamBase Server configuration file.

This parameter attempts to limit the size of the buffer used by the JDBC driver when fetching rows during query execution. The use and effects of this parameter vary in different JDBC drivers. See jdbc-fetch-size for details.

Retrying Queries

If you know that the database may return an exception due to transaction deadlocks in the database, and the expected behavior of your JDBC server is to roll back the operation or transaction and retry, consider enabling your StreamBase application to detect such exceptions for a select, insert, or update query.

To use the retry feature, edit your application's sbd.sbconf configuration file and add a <param name = "jdbc-retry-sqlstate"> entry in the <data-source> section of your JDBC data source (as described in the StreamBase Server Configuration File XML Reference reference topic). For example, the following code checks for deadlocks in a database:

<data-sources>
  ...
  <param name="jdbc-retry-sqlstate" value="40001"/>
</data-sources>

If the database driver returns a SQLSTATE code of 40001, the StreamBase application retries the query.