Data Alignment: Time Indexed vs ID Based
Statistica ETL provides two menu options for aligning data:
ID based
-
Aggregates data from multiple data sources based on an identifier variable and an optional time variable
- Optionally aligns data by N equal intervals or N user-specified intervals
Here are three scenarios illustrating these menu options:
Time indexed alignment with a Date/Time variable
In power plants, a database historian (for example., PI System) often stores data describing the continuous process of power generation. Meanwhile, a separate database (for example, Oracle, Microsoft SQL Server) stores data describing soot-blowing and/or other operations. While much data is available, it is not easily accessible to the staff and engineers supervising a particular power plant. As a result, most engineers responsible for power plant operations do not have the tools to easily summarize data, generate reports, or perform basic calculations (such as, root cause analysis). Statistica ETL can help engineers collect the information they need in an organized format. Moreover, this data becomes available for analysis, data mining, and reporting.
For more information, see also
ID based alignment with an ID variable
In manufacturing industries (such as, pharmaceutical), a standard merge/alignment of multiple tables by one or more ID variables is often required. Statistica ETL can perform different types of merges:
- One-to-One
Many-to-Many (such as Cartesian Join). By default, it aggregates means. However, if the aggregation statistics type is set to none, a Cartesian join results.
- One-to-Many
- Many-to-One
With this functionality, you can:
- Perform multiple merges (of multiple data sources) in a single operation
- Compute simple and robust aggregation statistics with data cleaning (e.g., filtering the data into a given range)
- Perform range checking (for valid ranges) in a single pass
For more information, see also
ID based alignment with an ID variable and a Time variable
This special ID-based scenario uses pre-processing of batch-time data to align with other batch-time or static data. For example, suppose you have batch data organized by batch ID, and each batch has different numbers of observations (e.g., some batches have 30 measurements, others have 29 or 31). Statistica ETL has enhanced features to pre-process batch-time data to: re-compute time variable into elapsed time values, and aggregate all batches into a fixed number of elapsed time intervals (normalize batch lengths). You can optionally unstack data to analyze time-dependent changes against static quality outcomes as with Multivariate Statistical Process Control (MSPC).
For more information, see also