Other Obs. Stats

General Cook’s Distance

General Cook’s Distance (D) measures the difference in the residuals of all cases due to removing the ith observation. Cook’s D for the ith observation is given by:

Differential Chi-Square

The differential chi-square residual measures the difference in the Pearson chi-square statistic due to removing the ith observation.

Differential Deviance

The differential deviance residuals measure the difference in the deviance statistic due to removing the ith observation.

Differential Likelihood

The differential likelihood residuals measure the difference in the likelihood due to removing the ith observation.

ROC Curve

The ROC curve is often used to evaluate the fit of a logistic regression model for a binary response, that is, how well the model performs as a binary classifier. The plot shows the true positive rate (the rate of events that are correctly predicted as events, also called Sensitivity) on the y-axis against the false positive rate (the rate of non-events predicted to be events, also called 1-Specificity) on the x-axis for the different possible cutoff points based on the model.

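The range of values for the predicted probabilities, Sensitivity, and 1-Specificity is provided in the corresponding spreadsheet. Each predicted probability is used in turn as the threshold value, and the algorithm iterates through the cases to classify each one as an event or a non-event. Sensitivity and 1-Specificity are calculated by:

Sensitivity = TP / (TP + FN)

1-Specificity = FP / (FP + TN)

where, at the given threshold, TP and FN are the numbers of events classified as events and as non-events, and FP and TN are the numbers of non-events classified as events and as non-events.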
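The thresholding procedure described above can be sketched in Python. This is a minimal illustration, not the product's implementation: the function name `roc_points`, the sample data, and the choice to classify a case as an event when its predicted probability is greater than or equal to the threshold are all assumptions.

```python
def roc_points(y_true, scores):
    """Return (threshold, sensitivity, one_minus_specificity) tuples.

    y_true: 1 for an event, 0 for a non-event.
    scores: predicted probabilities from the model.
    Assumption: a case is classified as an event when score >= threshold.
    """
    positives = sum(1 for y in y_true if y == 1)
    negatives = len(y_true) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one event and one non-event")

    points = []
    # Each distinct predicted probability is tried as the cutoff.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
        sensitivity = tp / positives            # TP / (TP + FN)
        one_minus_specificity = fp / negatives  # FP / (FP + TN)
        points.append((threshold, sensitivity, one_minus_specificity))
    return points


# Hypothetical data: six cases with their predicted probabilities.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
points = roc_points(y_true, scores)
for threshold, sens, fpr in points:
    print(threshold, round(sens, 3), round(fpr, 3))
```

Plotting the second column against the third gives the ROC curve; the lowest threshold always yields the point (1, 1) because every case is then classified as an event.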

The area under the ROC curve, also referred to as the Area Under the Curve (AUC), gives a summary measure of the predictive power of the model: the larger the AUC value (the closer to 1), the better the overall performance of the model in correctly classifying the cases.

The AUC is computed using a nonparametric approach known as the trapezoidal rule. The area bounded by the curve and the baseline is divided into a series of trapezoidal regions (intervals) based on the (1-Specificity, Sensitivity) points. The area of each region is calculated, and the total area under the ROC curve is the sum of the areas of the regions.
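The trapezoidal rule can be illustrated with a short Python sketch. The function name `auc_trapezoid` and the sample points are hypothetical, and the sketch assumes that the end points (0, 0) and (1, 1) are added to the curve if they are missing.

```python
def auc_trapezoid(points):
    """Area under a ROC curve given (one_minus_specificity, sensitivity) points.

    Assumption: the curve is anchored at (0, 0) and (1, 1), so those two
    points are added when they are not already present.
    """
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    area = 0.0
    # Sum the area of the trapezoid between each pair of adjacent points.
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical (1-Specificity, Sensitivity) points for six cases.
roc = [(0.0, 1 / 3), (0.0, 2 / 3), (1 / 3, 2 / 3),
       (1 / 3, 1.0), (2 / 3, 1.0), (1.0, 1.0)]
auc = auc_trapezoid(roc)
print(round(auc, 3))  # 0.889
```

A curve that follows the diagonal from (0, 0) to (1, 1) gives an area of 0.5, which corresponds to a model with no discriminating power.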

Lift Chart

The lift chart provides a visual summary of the usefulness of the information provided by the logistic model for predicting a binomial dependent variable. The chart summarizes the utility (gain) that you can expect by using the respective predictive model shown by the Model curve as compared to using baseline information only.

Lift values (the Y-coordinate) are calculated as the ratio between the results obtained with and without the predictive model. They can be computed for each percentile of the population sorted in descending order by predicted value, that is, with the cases that have the highest classification probability for the respective category first. Each lift value indicates the effectiveness of the predictive model, that is, how many times better it is to use the model than not to use one.

Let n be the number of true positives that appear in the top k% of the sorted list.

Let r be the number of true positives expected in the top k% if the list is sorted randomly, which is equal to the total number of true positives multiplied by k/100.

Then, the lift value at the top k% is the ratio n / r.

The values for different percentiles can be connected by a line that typically descends slowly and merges with the baseline.
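The lift calculation for a given percentile can be sketched in Python as follows. The function name `lift_at`, the sample data, and the way the top k% is truncated to a whole number of cases are assumptions made for illustration only.

```python
def lift_at(y_true, scores, k_percent):
    """Lift for the top k_percent of cases sorted by predicted value.

    y_true: 1 for a true positive (event), 0 otherwise.
    Assumption: the top k% is truncated to a whole number of cases,
    with at least one case kept.
    """
    total_positives = sum(y_true)
    if total_positives == 0:
        raise ValueError("lift is undefined without any true positives")

    # Sort case indices in descending order of predicted value.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    top = max(1, int(len(order) * k_percent / 100))

    n = sum(y_true[i] for i in order[:top])   # true positives in the top k%
    r = total_positives * top / len(order)    # expected under random sorting
    return n / r


# Hypothetical data: ten cases, four of which are true positives.
y_true = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10]
lifts = {k: lift_at(y_true, scores, k) for k in (20, 50, 100)}
for k, value in lifts.items():
    print(k, round(value, 2))
```

In this example the lift falls from about 2.5 at the top 20% to 1 at 100%, which is the point where the model curve meets the baseline.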