Statistica Query - Streaming of Data on Remote Servers

The query facilities (described in the Introductory Overview), when offered as part of the enterprise version of Statistica, are enhanced by options to process data from remote servers "in-place," that is, without having to import them and create a local data file. This technology is useful for processing extremely large data files, where it can produce significant performance gains and allow you to process data files that exceed the storage capacity of the local device.

The streaming database connector facilities enable Statistica to access data directly on the server, unlike the traditional method that requires importing the data first into a Statistica data file on the local computer before they can be processed. This "direct" mode of accessing data via the optional streaming DB connector component offers significant performance gains over the traditional data access method - especially when the data set is very large - because in many circumstances using the streaming DB connector allows the data to be read only once. The traditional method, on the other hand, requires one pass through the source data set in order to import it to the local computer, and then at least one more pass (through the already imported data set on the local device) in order to perform the actual analysis.

Connecting to external databases with the streaming DB connector

The Get External Data option executes a user-defined query and copies the resulting data to your local computer. In other words, all data retrieved by the query are copied into a Statistica data file and stored locally. Alternatively, you can connect to external databases and retrieve particular sets of data via query without creating a local copy of the data. Specifically, in the Create New Document dialog box, select the Streaming DB Connector tab to create a connection to a database that will look to Statistica as if it were a regular data file. When you use the streaming DB connector options, the program does not first copy and store all data returned by the query on your local computer and then begin the analyses; instead, it processes the database in place, fetching information from the database only when it is needed for computations. Hence, streaming DB connectors are particularly useful and efficient when processing extremely large data sets (retrieved by a query from an external database), and they are commonly used in data mining applications. See also Streaming Database Connector Technology for additional details.
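The idea of fetching rows only when they are needed can be illustrated outside Statistica with a generic database cursor. The following is a minimal sketch, not Statistica's own interface: it assumes Python's standard sqlite3 module as a stand-in for a remote server, and the table name measurements, the column value, and the function stream_mean are hypothetical names chosen for the example. The mean is computed in a single pass, and only one batch of rows is held in memory at a time.

```python
import sqlite3


def stream_mean(conn, query, batch_size=1000):
    """Compute the mean of the first result column in a single pass.

    Rows are pulled from the cursor in batches, so the full result set
    is never copied into a local file or held in memory at once.
    """
    cur = conn.cursor()
    cur.execute(query)
    count, total = 0, 0.0
    while True:
        rows = cur.fetchmany(batch_size)  # fetch only the next batch
        if not rows:
            break
        for (value,) in rows:
            count += 1
            total += value
    return total / count if count else float("nan")


# Stand-in for a remote database (assumption: an in-memory SQLite table).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE measurements (value REAL)")
conn.executemany(
    "INSERT INTO measurements VALUES (?)",
    [(float(i),) for i in range(1, 101)],
)

mean_value = stream_mean(conn, "SELECT value FROM measurements", batch_size=10)
print(mean_value)
```

The import-based approach would instead write every row to a local file first and then read that file again for the analysis, which is the extra pass described above.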

Writing information back to an external database

With Statistica, you can also write certain information computed by the program back to the original input data file or database and thus integrate computed statistics into an existing database or data warehouse. Specifically, with the Rapid Deployment of Models module, you can write computed statistics (predictions, predicted classifications, classification probabilities, residuals) back into the current input data file. This capability is extremely useful in data mining applications that deploy models for extremely large data sets; for example, you can merge classification probabilities computed by various models into an existing database or data warehouse. See the Query Options dialog box topic for additional details regarding the specific option settings that enable Statistica to write results to fields in an external database.
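The general pattern of writing computed values back into fields of the source table can be sketched as follows. This is an illustration only and does not use the Rapid Deployment of Models module: it assumes Python's standard sqlite3 module as a stand-in database, and the table cases, the columns x and predicted, and the scoring function score are all hypothetical.

```python
import sqlite3


def score(x):
    """Hypothetical stand-in for a deployed model's prediction."""
    return 2.0 * x + 1.0


# Stand-in for the external database (assumption: in-memory SQLite).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cases (id INTEGER PRIMARY KEY, x REAL, predicted REAL)"
)
conn.executemany(
    "INSERT INTO cases (id, x) VALUES (?, ?)",
    [(1, 0.0), (2, 1.5), (3, 4.0)],
)

# Read the inputs, compute predictions, and write them back to the
# same rows. The table is tiny here, so fetchall() is acceptable.
rows = conn.execute("SELECT id, x FROM cases").fetchall()
updates = [(score(x), case_id) for case_id, x in rows]
conn.executemany("UPDATE cases SET predicted = ? WHERE id = ?", updates)
conn.commit()

written = conn.execute(
    "SELECT id, predicted FROM cases ORDER BY id"
).fetchall()
print(written)
```

For a very large table, the read step would be batched rather than fetched all at once, but the write-back itself follows the same keyed UPDATE pattern.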

When the streaming DB connector will not produce speed gains

As mentioned above, the streaming DB connector technology may offer significant speed gains over the traditional method. However, this method is not recommended when multiple analyses are to be performed on the same data set or when the specific analytic method requires multiple passes through the raw data set. The main reason is that accessing data locally is faster than accessing them on a remote server, so the speed advantages of the streaming DB connector do not apply once the data have already been imported and are available on the local device.

However, there can still be circumstances when streaming DB connector access is recommended even though multiple passes through the data are necessary. Specifically, when the input data set is very large and its size exceeds the capacity of the local storage device, using the streaming DB connector is the only option. The streaming DB connector is also recommended when Statistica is running as part of a system performing automated queries to various segments of a large database or data warehouse, even if the resulting subsets are of moderate or small size. In those circumstances, it may be more efficient to access the source data directly and avoid repeatedly importing and saving files that are unlikely ever to be used again and would only have to be removed periodically from local storage.
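The trade-off described in this section can be made concrete with a simple cost model. The sketch below is an assumption for illustration, not a formula from Statistica: it treats cost as proportional to the number of rows read, with hypothetical per-row costs for a remote read, a local read, and the local write done during import. The function name recommend_access and all parameter names are invented for the example.

```python
def recommend_access(n_rows, passes, remote_cost, local_cost,
                     import_write_cost, data_bytes, local_capacity_bytes):
    """Return "streaming" or "import" under a simple per-row cost model.

    Streaming reads every row remotely on each pass. Importing reads
    every row remotely once, writes it locally, and then makes each
    pass over the local copy.
    """
    # If the data cannot fit on the local device, importing is impossible.
    if data_bytes > local_capacity_bytes:
        return "streaming"
    streaming = passes * n_rows * remote_cost
    traditional = (n_rows * (remote_cost + import_write_cost)
                   + passes * n_rows * local_cost)
    return "streaming" if streaming <= traditional else "import"


# Illustrative per-row costs (assumed units; local access is cheaper).
costs = dict(remote_cost=1.0, local_cost=0.2, import_write_cost=0.3)

single_pass = recommend_access(1_000_000, 1, data_bytes=10**9,
                               local_capacity_bytes=10**12, **costs)
five_passes = recommend_access(1_000_000, 5, data_bytes=10**9,
                               local_capacity_bytes=10**12, **costs)
too_large = recommend_access(1_000_000, 5, data_bytes=10**13,
                             local_capacity_bytes=10**12, **costs)
print(single_pass, five_passes, too_large)
```

With these assumed costs, a single pass favors streaming, five passes favor importing, and a data set larger than local storage leaves streaming as the only choice, which matches the guidance above. The model ignores the cleanup cost of temporary imported files, which would shift the balance further toward streaming for automated, one-off queries.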

See also the Cursor definition, Streaming Database Connector Technology (Technical Overview), and Streaming Database Connector FAQs.
