Assumptions, Limitations, Practical Considerations - Multicollinearity and Matrix Ill-conditioning
This is a common problem in many correlation analyses. Imagine that you have two predictors (X variables) of a person's height: 1) weight in pounds and 2) weight in ounces. Obviously, our two predictors are completely redundant; weight is one and the same variable, regardless of whether it is measured in pounds or ounces. Trying to decide which one of the two measures is a better predictor of height would be rather silly; however, this is exactly what you would try to do if you were to perform a multiple regression analysis with height as the dependent (Y) variable and the two measures of weight as the independent (X) variables. Because one predictor is an exact multiple of the other, the cross-product matrix of the predictors is singular and cannot be inverted, so no unique set of regression coefficients exists. STATISTICA would issue a "matrix ill-conditioned" message to let the user know that he or she is trying to do the impossible. When many variables are involved, it is often not immediately apparent that this problem exists, and it may only manifest itself after several variables have already been entered into the regression equation. Nevertheless, when this problem occurs it means that at least one of the predictor variables is (almost) perfectly predictable from the other predictors. Multiple Regression contains many statistical indicators of this type of redundancy (e.g., tolerances and semi-partial R; see the Model Definition dialog) as well as some remedies (e.g., Ridge regression).
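The pounds-versus-ounces example can be reproduced outside STATISTICA with a short numpy sketch. The data are simulated, and the names (`weight_lb`, `weight_oz`, `weight_lb2`) and the `tolerance` helper are hypothetical illustrations, not STATISTICA's implementation. The helper uses the usual definition of tolerance, 1 minus the R² obtained by regressing one predictor on the others, so a tolerance near 0 flags a (nearly) redundant predictor. The sketch also shows that exact redundancy leaves the design matrix rank-deficient, while near redundancy shows up as a large condition number.

```python
import numpy as np

# Simulated (hypothetical) data: 50 people, weight recorded in pounds.
rng = np.random.default_rng(0)
n = 50
weight_lb = rng.normal(160.0, 25.0, size=n)
weight_oz = 16.0 * weight_lb                           # same variable, in ounces
weight_lb2 = weight_lb + rng.normal(0.0, 0.5, size=n)  # hypothetical second, slightly noisy scale


def tolerance(predictors, j):
    """Hypothetical helper: tolerance of predictor j, i.e. 1 - R^2 from
    regressing column j on the remaining columns (with an intercept)."""
    y = predictors[:, j]
    others = np.delete(predictors, j, axis=1)
    A = np.column_stack([np.ones(len(y)), others])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    centered = y - y.mean()
    return (resid @ resid) / (centered @ centered)   # SS_residual / SS_total


# Exact redundancy: pounds and ounces.
P_exact = np.column_stack([weight_lb, weight_oz])
X_exact = np.column_stack([np.ones(n), P_exact])     # design matrix with intercept
rank_exact = np.linalg.matrix_rank(X_exact)          # 2, not 3: X'X is singular
tol_exact = tolerance(P_exact, 0)                    # essentially 0

# Near redundancy: two scales that almost agree.
P_near = np.column_stack([weight_lb, weight_lb2])
tol_near = tolerance(P_near, 0)                      # small, but not 0
cond_near = np.linalg.cond(np.column_stack([np.ones(n), P_near]))

print(f"rank of X (lb, oz): {rank_exact} of 3 columns")
print(f"tolerance, exact case: {tol_exact:.2e}")
print(f"tolerance, near case:  {tol_near:.2e}")
print(f"condition number of X, near case: {cond_near:.2e}")
```

In the exact case no regression program can separate the two weights; in the near case the coefficients can be computed but will be very unstable, which is the situation the tolerance statistics are meant to reveal.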
- Fitting centered polynomial models
- Fitting higher-order polynomials of an independent variable whose mean is not zero can create difficult multicollinearity problems. Specifically, the polynomial terms (X, X², X³, ...) will be highly correlated with one another because of the mean of the primary independent variable: when X is large relative to its spread, each power of X changes almost linearly with X over the observed range. With large numbers (e.g., Julian dates), this problem is very serious and, if proper safeguards are not put in place, can produce incorrect results. The solution is to "center" the independent variable (this procedure is sometimes referred to as "centered polynomials"), i.e., to subtract its mean, and only then to compute the polynomials. See, for example, the classic text by Neter, Wasserman, & Kutner (1985, Chapter 9), for a detailed discussion of this issue (and analyses with polynomial models in general). Note that STATISTICA automatically checks for very large numbers (created in the process of computing the polynomials).
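The effect of centering can be checked numerically. The following numpy sketch assumes a hypothetical predictor of 30 consecutive Julian day numbers starting at 2451545 (the Julian day number of 1 January 2000); any predictor with large values relative to its spread behaves the same way. The helper `poly_design` is likewise illustrative. The sketch compares the correlation between X and X², and the condition number of a cubic design matrix, before and after subtracting the mean.

```python
import numpy as np

# Hypothetical predictor: 30 consecutive Julian day numbers.
x = 2451545.0 + np.arange(30)
xc = x - x.mean()                      # centered predictor


def poly_design(v, degree):
    """Hypothetical helper: design matrix with columns 1, v, v**2, ..., v**degree."""
    return np.column_stack([v ** k for k in range(degree + 1)])


# Correlation between the linear and quadratic terms.
corr_raw = np.corrcoef(x, x ** 2)[0, 1]          # practically 1
corr_centered = np.corrcoef(xc, xc ** 2)[0, 1]   # about 0 for these symmetric values

# Conditioning of a cubic polynomial design matrix.
cond_raw = np.linalg.cond(poly_design(x, 3))
cond_centered = np.linalg.cond(poly_design(xc, 3))

print(f"corr(x, x^2):   raw {corr_raw:.12f}, centered {corr_centered:.2e}")
print(f"cond(X), cubic: raw {cond_raw:.2e}, centered {cond_centered:.2e}")
```

Centering does not change the polynomial curve that can be fitted (a polynomial in X - mean is still a polynomial in X of the same degree); it only changes the coefficients, so that the computations are carried out on a well-conditioned matrix.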