Multivariate Statistical Process Control (MSPC) and Nonlinear Iterative Partial Least Squares (NIPALS) Overview

Statistica Multivariate Statistical Process Control (MSPC) and Nonlinear Iterative Partial Least Squares (NIPALS) implement two techniques from multivariate data analysis: Principal Component Analysis (PCA) and Partial Least Squares (PLS). MSPC also applies these methods to industrial batch processing for quality control and process monitoring. In Statistica, both PCA and PLS models are fitted with the well-established NIPALS algorithm.

Statistica PCA is a mathematical procedure that aims to represent a set of (possibly correlated) variables with a smaller number of uncorrelated variables known as principal components. In other words, it is a multivariate projection method designed to extract systematic variation and relationships among the variables of a data set. This transformation (projection) often simplifies the analysis at hand while also alleviating the worst symptoms of high dimensionality, which arise when the number of variables is large. This makes PCA well suited to problems that involve a large number of predictor variables.

Although Statistica PCA is primarily a dimensionality reduction tool, its use is by no means restricted to data preprocessing tasks. Equally important applications of PCA include data diagnostics, on both the observation and variable levels. The observation level helps us detect outliers, while the variable level provides insight into how the variables contribute to the observations and relate (correlate) to one another. These diagnostic features of Statistica PCA are particularly useful for process monitoring and quality control, as they provide effective and convenient analytic and graphic tools for detecting abnormalities that may arise during the development phase of a product. PCA data diagnostics also play an important role in batch processing, where the quality of the end product can only be ensured through constant monitoring during its production phase.

Statistica PLS is a popular method for modeling industrial applications. It was developed by Wold in the 1960s as an econometric technique, but its usefulness was soon recognized in many areas of science and engineering, including multivariate statistical process control in general and chemical engineering in particular. Although the PLS technique is primarily designed for handling regression problems, Statistica PLS also enables you to handle classification tasks. You will find this dual capability useful in many applications of regression or classification, especially when the number of predictor variables is large.
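To make the regression use of PLS concrete, here is a minimal sketch of single-response PLS (often called PLS1) in plain Python. It is not Statistica's implementation. The function names `pls1_fit` and `pls1_predict` are hypothetical, the inputs are assumed to be mean-centered already, and the number of components is assumed not to exceed the rank of the predictor matrix. With a single response, each component is obtained in one pass, so no inner iteration is needed.

```python
import math

def pls1_fit(X, y, n_components):
    """Fit a single-response PLS model; X (list of rows) and y must be mean-centered."""
    E = [row[:] for row in X]      # predictor residuals, deflated after each component
    f = list(y)                    # response residuals
    n, m = len(E), len(E[0])
    model = []                     # one (weights, loadings, y-loading) triple per component
    for _ in range(n_components):
        # Weight vector: direction in predictor space with maximal covariance with f.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(m)]
        norm = math.sqrt(sum(v * v for v in w))
        w = [v / norm for v in w]
        # Scores, then predictor loadings and the response loading.
        t = [sum(E[i][j] * w[j] for j in range(m)) for i in range(n)]
        tt = sum(v * v for v in t)
        p = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(m)]
        q = sum(f[i] * t[i] for i in range(n)) / tt
        # Deflate: remove what this component explains from X and y.
        for i in range(n):
            for j in range(m):
                E[i][j] -= t[i] * p[j]
            f[i] -= q * t[i]
        model.append((w, p, q))
    return model

def pls1_predict(model, x):
    """Predict the centered response for one centered observation x."""
    x = list(x)
    y_hat = 0.0
    for w, p, q in model:
        t = sum(xj * wj for xj, wj in zip(x, w))
        y_hat += q * t
        x = [xj - t * pj for xj, pj in zip(x, p)]
    return y_hat
```

Prediction replays the same deflation steps used in fitting, which avoids a matrix inverse at the cost of storing the weights and loadings for every component.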

Built upon the capabilities of PCA and PLS techniques, Statistica MSPC is a selection of methods particularly designed for process monitoring and quality control in industrial batch processing. Batch processes are of considerable importance in making products with the desired specifications and standards in many industrial sectors, such as polymers, paint, fertilizers, pharmaceuticals, cement, petroleum products, biochemicals, perfumes, and semiconductors. The objective of batch monitoring is profitability, achieved by reducing product variability and increasing quality. From a quality point of view, batches can be divided into normal and abnormal batches. Generally speaking, a normal batch leads to a product with the desired specifications and standards. This is in contrast to abnormal batch runs, where the end product is expected to be of poor quality. Another reason for batch monitoring relates to regulatory and safety requirements. Industrial producers are often required to keep a full history of the batch process as evidence of good quality control practice. Statistica MSPC can help you construct an effective engineering system that you can use to monitor the progression of a batch and predict the quality of the end product.

Data preprocessing
Statistica MSPC and NIPALS provide you with a variety of data preprocessing techniques such as scaling, mean centering, and batch unfolding. Scaling and mean centering enhance the quality of your model by improving its predictive ability. Scaling allows for the treatment of all variables on an equal basis, while mean centering makes model interpretation easier. Batch unfolding rearranges three-way batch data (batches, time points, variables) into a two-way table so that you can apply PCA and PLS methods to it. Statistica MSPC provides two ways of unfolding batch data: time-wise and batch-wise unfolding.
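The preprocessing steps above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not Statistica's code: the function names are hypothetical, scaling is taken to mean division by the sample standard deviation, and the two unfolding functions show the two common layouts (one row per batch, or one row per batch and time point) without asserting which of them Statistica labels time-wise or batch-wise.

```python
from statistics import mean, stdev

def autoscale(rows):
    """Mean-center each column and scale it to unit sample standard deviation."""
    cols = list(zip(*rows))
    means = [mean(c) for c in cols]
    sds = [stdev(c) for c in cols]
    scaled = [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]
    # Return the parameters too, so that new data can be scaled the same way.
    return scaled, means, sds

def unfold_one_row_per_batch(batches):
    """batches[i][k][j] is batch i, time point k, variable j.

    Each batch becomes a single row that lists all variables at time 1,
    then all variables at time 2, and so on. Assumes equal batch lengths.
    """
    return [[x for time_point in batch for x in time_point] for batch in batches]

def unfold_one_row_per_observation(batches):
    """Stack the batches on top of each other: one row per (batch, time point)."""
    return [list(time_point) for batch in batches for time_point in batch]
```

Keeping the column means and standard deviations alongside the scaled data matters later, because any new observation must be transformed with the training-set parameters rather than its own.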
Estimation
PCA and PLS models in Statistica are built using the NIPALS algorithm. NIPALS is a well-established iterative technique widely used in building PCA and PLS models. It extracts components one at a time, converges reliably in practice, and scales to large data sets, so it can construct PCA and PLS models for you accurately and efficiently.
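As an illustration of the iterative idea, the following is a minimal sketch of the textbook NIPALS procedure for PCA in plain Python. It is not Statistica's implementation. The function name `nipals_pca`, the tolerance, and the iteration cap are assumptions chosen for the example, the data is assumed to be mean-centered, and the number of components is assumed not to exceed the rank of the data.

```python
import math

def nipals_pca(X, n_components, tol=1e-10, max_iter=500):
    """Extract principal components one at a time from mean-centered rows X.

    Returns (scores, loadings): scores[a][i] is the score of row i on
    component a, and loadings[a] is the unit-length loading vector.
    """
    E = [row[:] for row in X]          # residual matrix, deflated per component
    n, m = len(E), len(E[0])
    scores, loadings = [], []
    for _ in range(n_components):
        # Start from the residual column with the largest sum of squares.
        j0 = max(range(m), key=lambda j: sum(E[i][j] ** 2 for i in range(n)))
        t = [E[i][j0] for i in range(n)]
        for _ in range(max_iter):
            tt = sum(v * v for v in t)
            # Regress columns on the scores to get loadings, then normalize.
            p = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(m)]
            norm = math.sqrt(sum(v * v for v in p))
            p = [v / norm for v in p]
            # Regress rows on the loadings to get updated scores.
            t_new = [sum(E[i][j] * p[j] for j in range(m)) for i in range(n)]
            diff = sum((a - b) ** 2 for a, b in zip(t_new, t))
            t = t_new
            if diff <= tol * sum(v * v for v in t):
                break
        # Deflate: subtract the part of the data explained by this component.
        for i in range(n):
            for j in range(m):
                E[i][j] -= t[i] * p[j]
        scores.append(t)
        loadings.append(p)
    return scores, loadings
```

Because each component is computed from the residual left by the previous ones, the loop can stop after any number of components without computing the rest, which is what makes the approach attractive when only a few components are needed from a wide data set.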
Analysis settings
Statistica MSPC and NIPALS provide you with numerous options for configuring your analysis settings and generating results. You may, for example, specify that the NIPALS algorithm build your model with a pre-set number of principal components. Better yet, you can use the automated cross-validation technique to determine the complexity of your model, i.e., to determine the optimal number of components. You can also add or remove components from your existing model in order to compare the performance of various models with different degrees of complexity using the same data set, all in one analysis.
Results
Statistica MSPC and NIPALS provide you with a wide variety of outputs in the form of spreadsheets and graphs (including brushing capabilities) for reviewing analysis results. These include a variety of control charts and graphs for data diagnostics (on both the observation and variable levels), such as Hotelling T², Q², squared prediction error (SPE), variable significance, and more.
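Two of the observation-level diagnostics named above, Hotelling T² and SPE, can be computed from a fitted PCA model as in the sketch below. It uses the standard textbook definitions and is not taken from Statistica. The function names are hypothetical, the data rows are assumed to be mean-centered, the loadings are assumed to be orthonormal, and control limits are deliberately left out.

```python
def project(X, loadings):
    """Scores t_ia = x_i . p_a for centered rows X and orthonormal loadings."""
    return [[sum(x * p for x, p in zip(row, load)) for load in loadings] for row in X]

def hotelling_t2(scores):
    """T2_i = sum over components of t_ia^2 / s_a^2.

    s_a^2 is the sample variance of component a; the scores are assumed to
    have zero mean, which holds when the data was mean-centered.
    """
    n, n_comp = len(scores), len(scores[0])
    variances = [sum(row[a] ** 2 for row in scores) / (n - 1) for a in range(n_comp)]
    return [sum(row[a] ** 2 / variances[a] for a in range(n_comp)) for row in scores]

def spe(X, scores, loadings):
    """SPE_i: squared distance between row i and its reconstruction from the model."""
    out = []
    for row, t in zip(X, scores):
        x_hat = [sum(t[a] * loadings[a][j] for a in range(len(loadings)))
                 for j in range(len(row))]
        out.append(sum((x - xh) ** 2 for x, xh in zip(row, x_hat)))
    return out

# Illustrative data: two rows lie along the model direction, two lie off it.
X = [[6.0, 8.0], [-6.0, -8.0], [0.4, -0.3], [-0.4, 0.3]]
loadings = [[0.6, 0.8]]
scores = project(X, loadings)
t2_values = hotelling_t2(scores)
spe_values = spe(X, scores, loadings)
```

The two statistics are complementary: T² flags observations that are extreme within the model plane, while SPE flags observations that the model does not describe well. In this example the first two rows have the larger T² and the last two rows have the larger SPE.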
Deployment
In predictive modeling, the goal is often to build statistical models that can be used reliably for predicting and examining future data, i.e., data that was not part of the model building process. This is more formally known as deployment. Statistica MSPC and NIPALS allow you to deploy existing PCA and PLS models for execution (implementation) against data sets that were not used in the original analysis. Several choices for generating deployed models are provided. They include C/C++, Statistica Visual Basic (SVB), and Predictive Model Markup Language (PMML). Models saved in the latter format can be used in the deployment options of Statistica MSPC and NIPALS for making predictions.
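The idea behind deployment can be illustrated with a small sketch: store the parameters of a fitted PCA model, then apply them to an observation that was not used for fitting. JSON stands in here for a real interchange format such as PMML, and nothing in the example reflects the files that Statistica generates. The function names and the model layout (column means, standard deviations, loadings) are hypothetical.

```python
import json

def export_model(means, sds, loadings):
    """Serialize the preprocessing parameters and loadings of a fitted PCA model."""
    return json.dumps({"means": means, "sds": sds, "loadings": loadings})

def score_new_observation(model_json, row):
    """Apply a stored model to one raw observation and return its scores."""
    model = json.loads(model_json)
    # Scale with the training-set parameters, never with the new data's own.
    z = [(x - m) / s for x, m, s in zip(row, model["means"], model["sds"])]
    return [sum(zj * pj for zj, pj in zip(z, p)) for p in model["loadings"]]

# Illustrative parameters, not taken from any real analysis.
stored = export_model(means=[10.0, 20.0], sds=[2.0, 4.0], loadings=[[0.6, 0.8]])
new_scores = score_new_observation(stored, [12.0, 24.0])
```

The point of the sketch is that a deployed model is self-contained: everything needed to score new data, including the preprocessing parameters, travels with it.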