Stepwise Model Builder - Cox Regression Introductory Overview

The purpose of the Statistica Stepwise Model Builder - Cox Regression module is to facilitate the identification of Cox regression models based on predictors chosen by the user at each step. The final Cox regression model can then be saved in XML form or PMML form, or deployed directly to Statistica Enterprise.

The module handles both continuous predictors and categorical predictors with multiple degrees of freedom. It automatically moves the latter into or out of the regression equation in single steps, so that all terms for a categorical predictor enter or leave together.
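As a rough illustration of why a categorical predictor carries multiple degrees of freedom, the following Python sketch codes a predictor with four levels into three indicator columns that would be added or removed as one block. The function name dummy_code, the variable region, and the choice of reference-level coding are assumptions made for this example; the coding scheme that Statistica uses internally is not described here.

```python
def dummy_code(values, reference=None):
    """Code a categorical predictor as 0/1 indicator columns.

    One level (the reference) gets no column, so a predictor with k levels
    yields k - 1 columns, i.e. k - 1 degrees of freedom.
    """
    levels = sorted(set(values))
    ref = levels[0] if reference is None else reference
    return {
        level: [1 if v == level else 0 for v in values]
        for level in levels
        if level != ref
    }


# Hypothetical categorical predictor with four levels.
region = ["north", "south", "east", "south", "west", "north"]

block = dummy_code(region)       # "east" is the reference level here
degrees_of_freedom = len(block)  # the columns in `block` move together
```

Treating the columns in block as a unit means the predictor is either entirely in the equation or entirely out of it.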

At each step, the program computes various statistics for the predictors in the current model and for the candidate predictors not in the current equation. Statistics reflecting overall model quality are also computed.
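The kind of per-candidate statistic involved can be sketched in a few lines of Python. The example below assumes an empty current model, fits each candidate on its own by Newton-Raphson on the Cox partial likelihood (Breslow form), and reports a likelihood-ratio statistic against the model with no predictors. The data, the names cox_terms and fit_single, and the choice of the likelihood-ratio statistic are all assumptions for illustration; the exact statistics reported by the module are not specified here.

```python
import math


def cox_terms(beta, times, events, x):
    """Partial log-likelihood, score, and information for one covariate."""
    ll = score = info = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if not d_i:  # censored rows contribute only through risk sets
            continue
        s0 = s1 = s2 = 0.0
        for j, t_j in enumerate(times):
            if t_j >= t_i:  # subject j is still at risk at time t_i
                w = math.exp(beta * x[j])
                s0 += w
                s1 += w * x[j]
                s2 += w * x[j] ** 2
        mean = s1 / s0
        ll += beta * x[i] - math.log(s0)
        score += x[i] - mean
        info += s2 / s0 - mean ** 2
    return ll, score, info


def fit_single(times, events, x, max_iter=50, tol=1e-9):
    """Newton-Raphson with step halving; returns (beta_hat, log-likelihood)."""
    beta = 0.0
    ll, score, info = cox_terms(beta, times, events, x)
    for _iteration in range(max_iter):
        if info <= 0 or abs(score) < tol:
            break
        step = score / info
        for _halving in range(30):
            new_ll, new_score, new_info = cox_terms(beta + step, times, events, x)
            if new_ll >= ll:  # accept only steps that do not lower the likelihood
                break
            step /= 2
        else:
            break  # no improving step found
        beta += step
        ll, score, info = new_ll, new_score, new_info
    return beta, ll


# Hypothetical data: follow-up times, event indicators (1 = event, 0 = censored).
times = [2, 3, 5, 7, 8, 11, 13, 17]
events = [1, 1, 0, 1, 1, 0, 1, 1]
candidates = {
    "treatment": [1, 1, 0, 1, 0, 1, 0, 0],
    "noise": [0.3, -1.2, 0.5, 0.1, -0.4, 0.9, -0.7, 0.2],
}

# With beta = 0 the covariate drops out, so any candidate gives the null value.
null_ll = cox_terms(0.0, times, events, candidates["treatment"])[0]

lr_stats = {}
for name, x in candidates.items():
    beta_hat, ll_hat = fit_single(times, events, x)
    lr_stats[name] = 2.0 * (ll_hat - null_ll)

best = max(lr_stats, key=lr_stats.get)
```

The candidate named in best is the natural one to examine first, but the user remains free to choose a different predictor on policy or other grounds.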

Thus, you can build models by manually selecting the most important predictors into the regression equation one step at a time, using criteria of statistical significance for the prediction as well as policy and other criteria. By moving selected variables or groups of variables into the prediction equation, and removing others from that equation, you can perform what-if (scenario) analyses to assess the impact of certain model assumptions, policies, or regulatory constraints (for example, on predictors that are not permitted). In this way, analysts can build models that are parsimonious and consistent with policies, guidelines, and regulatory constraints, but that are also as accurate as possible.

Cox Regression

Cox’s proportional hazards model is a distribution-free model in which the predictors act multiplicatively on the hazard rate.

The form of the Cox proportional hazards model is as follows:

h(t|x) = h0(t) exp(xb)

where h0(t) is the baseline hazard, x is the vector of predictor values, and b = (b1, ..., bp)' is the vector of regression coefficients. This model does not impose any distributional assumption on the baseline hazard. It is referred to as proportional because the ratio of the hazard rates of two individuals with predictor values x and x*, h(t|x) / h(t|x*) = exp((x - x*)b), is constant and does not depend on time.
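The proportionality property can be checked numerically. The Python sketch below evaluates the hazard for two individuals at several time points and shows that the ratio is the same at each one, because the baseline hazard cancels. The Weibull-shaped baseline, the coefficient values, and the predictor values are arbitrary assumptions chosen for the example; the Cox model itself leaves h0(t) unspecified.

```python
import math


def hazard(t, x, b, baseline):
    """h(t|x) = h0(t) * exp(xb) for predictor values x and coefficients b."""
    linear_predictor = sum(x_k * b_k for x_k, b_k in zip(x, b))
    return baseline(t) * math.exp(linear_predictor)


def weibull_baseline(t, shape=1.5, scale=10.0):
    """An arbitrary baseline hazard, used only to make the example concrete."""
    return (shape / scale) * (t / scale) ** (shape - 1)


b = [0.7, -0.02]      # hypothetical coefficients
x_a = [1.0, 50.0]     # individual A
x_b = [0.0, 50.0]     # individual B, differing only in the first predictor

ratios = [
    hazard(t, x_a, b, weibull_baseline) / hazard(t, x_b, b, weibull_baseline)
    for t in (1.0, 5.0, 20.0)
]

# The baseline cancels, so every ratio equals exp((x_a - x_b)b) = exp(0.7).
expected_ratio = math.exp(0.7)
```

Replacing weibull_baseline with any other positive function of t leaves the ratios unchanged.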

This model has become popular in many domains in which the dependent variable of interest is the time to a terminal event and the study has a limited duration, so that the event is not observed for every subject (censoring).