Intrinsically Nonlinear Regression Models - Models for Binary Responses: Probit & Logit
It is not uncommon for a dependent or response variable to be binary in nature, that is, to have only two possible values. For example, patients either do or do not recover from an injury; job applicants either succeed or fail at an employment test; subscribers to a journal either do or do not renew a subscription; coupons may or may not be returned; and so on. In all of these cases, you may be interested in estimating a model that describes the relationship between one or more continuous independent variables and the binary dependent variable. One model that is well suited to this purpose is the logit regression model:
y = exp(b0 + b1*x1 + ... + bn*xn)/{1 + exp(b0 + b1*x1 + ... + bn*xn)}
You can easily recognize that, regardless of the regression coefficients or the magnitude of the x values, this model will always produce predicted values (predicted y's) in the range of 0 to 1.
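To see this boundedness concretely, here is a minimal sketch in Python (the coefficient and predictor values are made up purely for illustration):

```python
import math

def logistic_predict(b, x):
    """Predicted probability from the logit model: exp(z)/(1 + exp(z))."""
    # z = b0 + b1*x1 + ... + bn*xn
    z = b[0] + sum(bi * xi for bi, xi in zip(b[1:], x))
    return math.exp(z) / (1.0 + math.exp(z))

# Even extreme linear predictors yield values strictly between 0 and 1:
p_high = logistic_predict([-2.0, 0.5], [20.0])   # z = 8,   p close to 1
p_low = logistic_predict([-2.0, 0.5], [-20.0])   # z = -12, p close to 0
print(p_high, p_low)
```

Because the numerator exp(z) is always positive and always smaller than the denominator 1 + exp(z), the predicted value can never escape the (0, 1) interval.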
The name logit stems from the fact that you can easily linearize this model via the logit transformation. Suppose we think of the binary dependent variable y in terms of an underlying continuous probability p, ranging from 0 to 1. We can then transform that probability p as:
p' = loge{p/(1-p)}
This transformation is referred to as the logit or logistic transformation. Note that while p is bounded by 0 and 1, the transformed value p' can theoretically assume any value between minus and plus infinity. Since the logit transform removes the 0/1 boundaries of the original dependent variable (the probability), we could use the logit-transformed values in an ordinary linear regression equation. In fact, if we apply the logit transform to both sides of the logit regression equation stated earlier, we obtain the standard linear regression model:
p' = b0 + b1*x1 + b2*x2 + ... + bn*xn
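A small sketch of the transform and its inverse in Python (the probability values are arbitrary, chosen only for illustration):

```python
import math

def logit(p):
    """Logit transform: p' = log_e(p / (1 - p)); maps (0, 1) onto the real line."""
    return math.log(p / (1.0 - p))

def inv_logit(p_prime):
    """Inverse logit: recovers p from p', i.e. exp(p')/(1 + exp(p'))."""
    return 1.0 / (1.0 + math.exp(-p_prime))

print(logit(0.5))              # 0.0 -- even odds
print(logit(0.99))             # ~4.595 -- p near 1 gives a large positive p'
print(inv_logit(logit(0.25)))  # ~0.25 -- round trip recovers p
```

The round trip at the end illustrates why the linearization works: any p' produced by a linear model on the right-hand side can be mapped back to a valid probability.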
Alternatively, we may think of the binary response as reflecting a normally distributed underlying variable. For example, a subscriber may feel anywhere from very strongly opposed to very strongly committed to renewing a subscription, and we could express this underlying intensity of feeling as a linear function of the predictors:

feeling... = b0 + b1*x1 + ...
which is, of course, the standard regression model. It is reasonable to assume that these feelings are normally distributed, and that the probability p of renewing the subscription is about equal to the area under the normal curve up to the respective feeling. Therefore, if we transform each side of the equation so as to reflect normal probabilities, we obtain:
NP(feeling...) = NP(b0 + b1*x1 + ...)
where NP stands for normal probability (the area under the normal curve), as tabulated in practically all statistics texts. The equation shown above is also referred to as the probit regression model. (The term probit was first used by Bliss, 1934.)
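As a sketch, NP can be computed from the error function in Python's standard library rather than from a table; the coefficients below are made-up illustration values, not estimates from any dataset:

```python
import math

def normal_probability(z):
    """Standard normal CDF: area under the normal curve to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def probit_predict(b, x):
    """Probit model prediction: NP(b0 + b1*x1 + ... + bn*xn)."""
    z = b[0] + sum(bi * xi for bi, xi in zip(b[1:], x))
    return normal_probability(z)

print(normal_probability(0.0))             # 0.5 -- half the area lies below the mean
print(probit_predict([-1.0, 0.8], [2.5]))  # z = 1.0 -> ~0.841
```

Like the logit model, the probit model maps any linear predictor into a valid probability; the two differ only in the shape of the curve used for the mapping.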
Note: Generalized Linear/Nonlinear Model (GLZ). You can also use the Generalized Linear/Nonlinear Model (GLZ) module to analyze binary response variables. GLZ is an implementation of the generalized linear model and allows you to compute standard, stepwise, or best-subset multiple regression analyses with continuous as well as categorical predictors, and for binomial or multinomial dependent variables (probit regression; binomial and multinomial logit regression; see also Link Functions). In general, the estimation algorithms implemented in the GLZ module are more efficient, and Statistica includes these models here only for compatibility purposes.