Survival Analysis - Notes and Technical Information

Life table
The life table technique is one of the oldest methods for analyzing survival (failure time) data. In order to arrive at reliable estimates of the three major functions (survival, probability density, and hazard), the minimum recommended sample size is 30. The life table estimates are computed according to standard formulas.
Specifying tabulated survival data as input
The life table routine in Survival Analysis accepts already tabulated data as input. For example, suppose we had 750 observations in a study, and have already tabulated the survival times in the following manner:
Var_1:            Var_2:                  Var_3:
Interval Start    No. of Obs. Censored    No. of Obs. Failed
0                 10                      185
1                 10                      88
2                 10                      55
3                 10                      43
4                 14                      32
5                 52                      31
6                 38                      20
7                 24                      7
8                 25                      6
9                 24                      6

On the Life Table & Distribution of Survival Times - Table of survival times tab, specify the three variables in exactly this order (1-3) in the variable selection window (click the Variables button to select the variables). Then, enter the number of observations at the beginning of the study (i.e., the number that entered the first interval) in the Number entering first interval box.
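
The quantities in such a table are the input to the standard life table formulas mentioned earlier. As a minimal sketch, the following Python code applies the usual actuarial estimates (cases exposed to risk, proportion failing, cumulative survival, density, and hazard) to this example; it is an illustration only, not the module's own code, and all variable names are made up.

```python
import numpy as np

censored = np.array([10, 10, 10, 10, 14, 52, 38, 24, 25, 24])
failed   = np.array([185, 88, 55, 43, 32, 31, 20, 7, 6, 6])
width    = 1.0    # common interval width (assumed here)
entering = 750    # number entering the first interval

# number entering each interval = 750 minus all prior censorings and failures
n = entering - np.concatenate(([0], np.cumsum(censored + failed)[:-1]))
exposed = n - censored / 2.0            # cases exposed to risk
q = failed / exposed                    # proportion failing
p = 1.0 - q                             # proportion surviving
S = np.concatenate(([1.0], np.cumprod(p)[:-1]))  # cumulative survival at interval start
f = S * q / width                       # probability density (interval midpoint)
h = 2.0 * q / (width * (1.0 + p))       # hazard rate (interval midpoint)

for i in range(len(failed)):
    print(f"{i:2d}  exposed={exposed[i]:6.1f}  S={S[i]:.3f}  f={f[i]:.4f}  h={h[i]:.4f}")
```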

Fitting Theoretical Survival Distributions. The regression procedure for fitting the four theoretical distributions to the life table is based on algorithms proposed by Gehan and Siddiqui (1973) and discussed in detail in Lee (1980). Basically, the hazard functions (specifically, the logarithmic transforms of the hazard functions) of all four theoretical distributions are linear functions of the survival times (or log-survival times). Thus, the hazard functions [h(t)] may be expressed in terms of linear regression functions as:

  1. h(t) = L                  Exponential
  2. h(t) = L'*g*t^(g-1)       Weibull (where L' = L^g)
  3. h(t) = exp(L + g*t)       Gompertz
  4. h(t) = L + g*t            Linear exponential

If we set y = h(t) or y = log h(t), then all four models can be stated in the general form:

y = a + b*x

Weighted least squares
Gehan and Siddiqui (1973) suggest weighted least squares methods to fit the parameters of the respective models to the data. Specifically, STATISTICA minimizes the quantity:

WSS = Σ[wi*(yi - a - b*xi)^2]

Three different weights are used in the estimation:

wi = 1 (unweighted least squares)

wi = 1/vi

wi = ni*hi

where vi is the variance of the hazard estimate, and hi and ni are the interval width and number of observations exposed to risk in the i'th interval, respectively.
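
As a rough illustration of this estimation step, the following sketch minimizes WSS for the general linear form y = a + b*x under a given weight vector; the data, weights, and function name are made up.

```python
import numpy as np

def wls_fit(x, y, w):
    """Minimize WSS = sum(w*(y - a - b*x)**2); returns (a, b)."""
    X = np.column_stack([np.ones_like(x), x])
    a, b = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))
    return a, b

# Example: fit log h(t) = a + b*t (the Gompertz case) with w = n*h weights.
t = np.array([0.5, 1.5, 2.5, 3.5, 4.5])           # interval midpoints (made up)
log_h = np.array([-1.2, -1.0, -0.7, -0.5, -0.2])  # log hazard estimates (made up)
w = np.array([400., 320., 250., 180., 120.])      # e.g. n_i * h_i
print(wls_fit(t, log_h, w))
```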

Parameterization of the Weibull distribution
As indicated in the expressions for the hazard function, for the Weibull distribution the program estimates and reports the parameter L' (where L' = L^g) instead of the standard scale parameter L. When comparing the results computed by the Survival Analysis module with those computed by, for example, the Process Analysis module, the estimates for the scale parameter are therefore not directly comparable. See the note describing the different parameterizations of the Weibull distribution for additional details.
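
For example, a reported L' can be converted back to the conventional scale parameter via L = L'^(1/g); the values below are made up.

```python
g = 1.8           # shape (gamma) estimate -- made-up value
L_prime = 0.05    # reported scale parameter, L' = L**g
L = L_prime ** (1.0 / g)   # conventional scale parameter
print(L)
```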
Correcting intervals without terminations (deaths)
The Correct intervals containing no terminations/deaths check box in the Life Tables and Distribution of Survival Times dialog box determines whether intervals in which there are no deaths/terminations are adjusted so that survival distributions can be fitted. Parameters cannot be computed for intervals in which no deaths/terminations occur; by default (i.e., when the check box is selected), the program therefore adjusts such intervals by setting the proportion failing to .5/(number of cases exposed to risk). Clear the Correct intervals containing no terminations/deaths check box when generating a life table for descriptive purposes only.
Two-sample tests
There are five two-sample tests that can be computed with this module, namely, Gehan's generalized Wilcoxon test (Gehan, 1965a, b; computed using Mantel's procedure, Mantel, 1967), Cox's F-test (Cox, 1964), the Cox-Mantel test (Cox, 1959, 1972; Mantel, 1966), the log-rank test (Peto & Peto, 1972), and Peto and Peto's generalized Wilcoxon test (Peto & Peto, 1972). All of these procedures test the statistical significance of the difference between the survival distributions in the two groups.
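
As an illustration of one of these tests, the following sketch computes the two-sample log-rank statistic from scratch; the data and function name are made up, and this is not the module's internal routine.

```python
import numpy as np
from scipy.stats import chi2

def logrank_test(time, event, group):
    """Two-sample log-rank test; event: 1 = failed, 0 = censored; group: 0/1."""
    time, event, group = map(np.asarray, (time, event, group))
    O1 = E1 = V = 0.0
    for t in np.unique(time[event == 1]):       # distinct failure times
        at_risk = time >= t                     # risk set at time t
        n = at_risk.sum()
        n1 = (at_risk & (group == 1)).sum()
        d = ((time == t) & (event == 1)).sum()  # total failures at t
        d1 = ((time == t) & (event == 1) & (group == 1)).sum()
        O1 += d1                                # observed failures, group 1
        E1 += d * n1 / n                        # expected under the null hypothesis
        if n > 1:
            V += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    stat = (O1 - E1) ** 2 / V
    return stat, chi2.sf(stat, df=1)

time  = [6, 7, 10, 15, 19, 25, 5, 8, 12, 20, 23, 30]   # made-up survival times
event = [1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1]
group = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
print(logrank_test(time, event, group))
```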
Analyzing large samples
Because of the nature of the algorithms used in the two-sample comparison routines, this program imposes stricter limitations on the maximum group size than other routines available in Survival Analysis. Use the multiple sample comparison option to analyze large data sets; if only two groups are specified for the multiple sample procedure, the computations and resulting statistics are identical to those of Gehan's generalized Wilcoxon test.
Cox's proportional hazard model
The proportional hazard model (Cox, 1972) is the most general of the regression models because it does not make any assumptions about the nature or shape of the underlying survival distribution. The model assumes that the underlying hazard rate (rather than survival time) is a function of the independent variables (covariates); thus, in a sense, Cox's regression model may be considered to be a nonparametric method. The model can be expressed as:

h(t,z) = h0(t)*exp(b'z)

In the above expression h(t,z) is the hazard rate, contingent on a particular covariate vector z; h0(t) is referred to as the baseline hazard, that is, it is the hazard rate when the values for all independent variables (that is, in z) are equal to zero; b is the vector of regression coefficients.

The program uses the Newton-Raphson method to maximize the simplified partial likelihood (proposed by Breslow, 1974):

L = Π exp(b'si)/[Σ exp(b'zj)]^d(i)

In this formula, d(i) stands for the number of cases observed to fail at time ti, si stands for the sum of the covariates over the d(i) cases observed to fail at time ti, and zj stands for the covariate vector for case j in the risk set at time ti. (The product Π is over all k distinct failure times; the sum Σ is over all cases j ∈ Ri in the respective risk set Ri, that is, cases that are observed to fail at or after the respective failure time ti.)
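
A compact sketch of this likelihood, in its negative log form as one would hand it to a numerical optimizer, might look as follows; the data are made up, no time-dependent covariates are assumed, and a generic quasi-Newton optimizer stands in for the module's Newton-Raphson routine.

```python
import numpy as np
from scipy.optimize import minimize

def breslow_neg_loglik(b, time, event, Z):
    """-log of Breslow's simplified partial likelihood."""
    nll = 0.0
    for t in np.unique(time[event == 1]):      # k distinct failure times
        failed = (time == t) & (event == 1)
        d = failed.sum()                       # d(i): failures at t
        s = Z[failed].sum(axis=0)              # s_i: covariate sum over failures
        risk = time >= t                       # risk set R_i
        nll -= b @ s - d * np.log(np.exp(Z[risk] @ b).sum())
    return nll

time  = np.array([5., 8., 12., 12., 20., 23.])   # made-up data
event = np.array([1, 1, 1, 0, 1, 0])
Z     = np.array([[0.], [1.], [0.], [1.], [1.], [0.]])
fit = minimize(breslow_neg_loglik, np.zeros(1), args=(time, event, Z))
print(fit.x)       # estimated regression coefficient(s)
```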

To estimate the survival function S contingent on a particular covariate vector z, the algorithm uses the relationship:

S(t,z) = S0(t)^exp(b'z)

In this formula, S0 is the baseline survival function which is independent of the covariates. Breslow (1974) proposed the following estimator of the baseline survival function:

S0(ti) = Π[1 - (d/Σ exp(b'z))]
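
The following sketch implements this estimator together with the relationship S(t,z) = S0(t)^exp(b'z) above; the data and the coefficient value are made up.

```python
import numpy as np

def breslow_S0(b, time, event, Z, t0):
    """Product over failure times t <= t0 of [1 - d/sum(exp(b'z)) over risk set]."""
    S0 = 1.0
    for t in np.unique(time[(event == 1) & (time <= t0)]):
        d = ((time == t) & (event == 1)).sum()
        S0 *= 1.0 - d / np.exp(Z[time >= t] @ b).sum()
    return S0

time  = np.array([5., 8., 12., 12., 20., 23.])   # made-up data
event = np.array([1, 1, 1, 0, 1, 0])
Z     = np.array([[0.], [1.], [0.], [1.], [1.], [0.]])
b     = np.array([0.3])                          # illustrative coefficient
S0 = breslow_S0(b, time, event, Z, t0=12.0)
print(S0 ** np.exp(np.array([1.0]) @ b))          # S(t, z) for z = [1.0]
```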

The Chi-square value is computed as a function of the log-likelihood for the model with all covariates (L1), and the log-likelihood of the model in which all covariates are forced to 0 (L0); specifically:

Chi-square = -2*(L0 - L1)
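
Numerically, the test then reduces to a likelihood ratio Chi-square; for instance, with made-up log-likelihood values and an assumed three covariates:

```python
from scipy.stats import chi2

L0, L1 = -180.4, -172.1        # made-up null and full-model log-likelihoods
stat = -2.0 * (L0 - L1)        # Chi-square = -2*(L0 - L1)
p = chi2.sf(stat, df=3)        # df = number of covariates (assumed 3 here)
print(stat, p)
```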

Estimating the Parameters for Time-Dependent Covariates. To accommodate time-dependent covariates, the following modification to the simplified partial likelihood (Breslow, 1974) is introduced (see Lawless, 1982, page 393):

L = Π exp(b'si(ti))/[Σ exp(b'zj(ti))]^d(i)

In this formula, si(ti) stands for the sum of the transformed (time-dependent) covariates over the cases observed to fail at a particular time ti, zj(ti) stands for the transformed (time-dependent) covariate vector for case j in the risk set at time ti, and d(i) stands for the number of cases observed to fail at time ti. (The product Π is over all distinct failure times; the sum Σ is over all cases in the respective risk set, that is, cases that are observed to fail at or after the respective failure time ti.)

This partial likelihood function is maximized using the Newton-Raphson method. Note that in order to compute the likelihood for a given set of parameters, for each case i, all cases with survival times greater than or equal to that of case i have to be processed. Thus, fitting models with time-dependent covariates may require extensive computations, particularly when there are many cases in the data file.
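
To make the modification concrete, the earlier sketch can be adapted so that a caller-supplied function (here a hypothetical zfun(j, t)) returns case j's transformed covariate vector at time t; all data are made up.

```python
import numpy as np

def breslow_neg_loglik_td(b, time, event, zfun):
    """-log partial likelihood with time-dependent covariates z_j(t)."""
    nll = 0.0
    for t in np.unique(time[event == 1]):            # distinct failure times
        failed = np.flatnonzero((time == t) & (event == 1))
        risk = np.flatnonzero(time >= t)             # risk set at t
        s = sum(zfun(j, t) for j in failed)          # s_i(t_i)
        denom = sum(np.exp(b @ zfun(j, t)) for j in risk)
        nll -= b @ s - len(failed) * np.log(denom)
    return nll

# Toy example: one covariate that grows linearly with time for odd-numbered cases.
time  = np.array([5., 8., 12., 20.])
event = np.array([1, 1, 1, 0])
zfun  = lambda j, t: np.array([0.1 * t if j % 2 else 0.0])
print(breslow_neg_loglik_td(np.array([0.2]), time, event, zfun))
```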

Wald statistic. The results spreadsheet with the parameter estimates for the Cox proportional hazard regression model includes the so-called Wald statistic, and the p-value for that statistic. This statistic is based on the asymptotic normality property of maximum likelihood estimates, and is computed as:

W = b * 1/Var(b) * b

In this formula, b stands for the parameter estimates, and Var(b) stands for the asymptotic variance of the parameter estimates. The Wald statistic is tested against the Chi-square distribution.
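
For a single coefficient this amounts to the following; the estimate and variance are made-up values.

```python
from scipy.stats import chi2

b, var_b = 0.42, 0.031          # made-up estimate and asymptotic variance
W = b * (1.0 / var_b) * b       # W = b * 1/Var(b) * b
p = chi2.sf(W, df=1)            # test against the Chi-square distribution
print(W, p)
```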

Exponential regression
The regression model for censored exponential survival data has been described by Feigl and Zelen (1965), Glasser (1967), Prentice (1973), and Zippin and Armitage (1966). The algorithm for obtaining the maximum likelihood estimates uses the Newton-Raphson method and is based on Lagakos and Kuhns (1978). The model assumes that the survival time distribution, contingent on the values of a covariate vector z, is exponential, and that the rate parameter can be expressed as:

L(z) = exp(a + b*z)

The Chi-square value is computed as usual, that is, as a function of the log-likelihood for the model with all covariates (L1), and the log-likelihood of the model in which all covariates are forced to 0 (zero; L0).
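
A sketch of the maximum likelihood step under this model follows, using a generic optimizer in place of the Newton-Raphson routine; the data and names are made up.

```python
import numpy as np
from scipy.optimize import minimize

def exp_neg_loglik(params, t, delta, z):
    """Censored exponential log-likelihood with rate L(z) = exp(a + b*z)."""
    a, b = params[0], params[1:]
    rate = np.exp(a + z @ b)
    # failures contribute the log density (log rate - rate*t);
    # censored cases contribute the log survival probability (-rate*t)
    return -np.sum(delta * np.log(rate) - rate * t)

t     = np.array([2., 5., 7., 11., 4., 9.])   # made-up survival times
delta = np.array([1, 1, 0, 1, 1, 0])          # 1 = failed, 0 = censored
z     = np.array([[0.], [0.], [1.], [1.], [0.], [1.]])
fit = minimize(exp_neg_loglik, np.zeros(2), args=(t, delta, z))
print(fit.x)    # estimates of a and b
```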

Computation of the standard exponential order statistic
One way to check the exponentiality assumption of this model is to plot the residual survival times against the standard exponential order statistic a (see Lawless, 1982). Specifically, a is computed as:

ai = Σ 1/(N - j + 1)    (sum over j = 1, ..., i)

If the exponentiality assumption is met, all points in this plot should be arranged roughly in a straight line.
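
For instance, for a sample of size N = 10 the order statistic can be computed as:

```python
import numpy as np

N = 10                                  # sample size (illustrative)
j = np.arange(1, N + 1)
a = np.cumsum(1.0 / (N - j + 1))        # a_i = sum over j = 1..i of 1/(N - j + 1)
print(a)
```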

Normal and lognormal regression
In this model, it is assumed that the survival times (or log survival times) come from a normal distribution; the model can be stated as:

ti = a + zi * b

where zi is the covariate vector for subject i, and a is a constant. If lognormal regression is requested, ti is replaced by its natural logarithm.

Estimation
STATISTICA uses a very efficient method, the so-called expectation maximization (EM) algorithm, for obtaining the maximum likelihood estimates for this model. This algorithm was first proposed by Dempster, Laird, and Rubin (1977) and is discussed in Cox and Oakes (1984). Specifically, Survival Analysis uses routines that follow the logic developed by Wolynetz (1979a, b).
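
Rather than reproducing the EM iterations, the following sketch obtains the same maximum likelihood estimates by maximizing the censored (log)normal likelihood directly with a generic optimizer; all data and names are made up.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def neg_loglik(params, t, delta, z):
    a, sigma = params[0], np.exp(params[-1])   # optimize log(sigma) so sigma > 0
    mu = a + z @ params[1:-1]
    # failed cases contribute the density, censored cases the survival probability
    return -(np.sum(delta * norm.logpdf(t, mu, sigma))
             + np.sum((1 - delta) * norm.logsf(t, mu, sigma)))

t     = np.log(np.array([2., 5., 7., 11., 4., 9.]))  # lognormal model: log times
delta = np.array([1, 1, 0, 1, 1, 0])                 # 1 = failed, 0 = censored
z     = np.array([[0.], [0.], [1.], [1.], [0.], [1.]])
fit = minimize(neg_loglik, np.zeros(3), args=(t, delta, z))
print(fit.x[:2], np.exp(fit.x[-1]))                  # a, b, and sigma
```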
Goodness-of-fit
The Chi-square value is computed as usual, that is, as a function of the log-likelihood for the model with all covariates (L1), and the log-likelihood of the model in which all covariates are forced to 0 (zero; L0). Note that the null model contains two parameters, namely maximum likelihood estimates of the variance and the mean (also estimated using the EM algorithm), and the full model contains parameters for all covariates, a constant (the intercept), and the variance.

The normal regression model is particularly useful because many data sets can be transformed to yield approximations of the normal distribution. Thus, in a sense this is the most general fully parametric model (as opposed to Cox's proportional hazard model which is non-parametric), and estimates can be obtained for a variety of different underlying survival distributions (see Schneider, 1986, for a discussion of the normal distribution and its transforms).