Cutoff Points
The Cutoff Points Analysis complements the ROC Analysis section. The Cutoff Points Chart shows the intersection of the sensitivity (TPR) and specificity (TNR) lines, as well as the intersection of the FPR line with the FNR line. Seeing how these four lines change with the model output is a great aid to choosing the Ideal Cutoff Point for your test values.
The Ideal Cutoff Point varies from problem to problem, since different problems call for minimizing or maximizing different quantities. Sometimes the goal is to minimize the number of false positives; other times, the number of false negatives; still other times, one might need to maximize the number of true positives or true negatives. With the help of the Cutoff Points Chart of GeneXproTools, you can see clearly how best to move your model threshold to achieve your goals.
Nevertheless, there is a generic Optimal Cutoff Point. It is given by the Youden index, and you can see exactly where it lies in the Cutoff Points Chart. When you check the Show ROC CP checkbox, GeneXproTools draws the ROC Cutoff Point in dark brown. GeneXproTools also shows the Logistic Cutoff Point in the Cutoff Points Chart so that you can easily compare both cutoff points. To draw the Logistic Cutoff Point, just check the Show LCP checkbox.
The Youden index J is the maximum value of the following expression (for inverted models, it is the minimum):
J = max[SE(t) + SP(t) - 1]
where SE(t) and SP(t) are, respectively, the sensitivity and specificity at threshold t, and the maximum is taken over all possible threshold values t of the model. Thus, the ROC Cutoff Point corresponds to the model output at the Optimal Cutoff Point.
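As an illustration only, the search for the Youden index can be sketched in a few lines of Python. This is not GeneXproTools code: the function name `youden_cutoff`, the use of each distinct model output as a candidate threshold, and the rule "predict positive when the output is greater than or equal to t" are all assumptions made for the example. Inverted models are not handled.

```python
def youden_cutoff(scores, labels):
    """Return (J, cutoff) maximizing SE(t) + SP(t) - 1.

    Assumptions (hypothetical, for illustration):
    - labels are 1 (positive) or 0 (negative);
    - a record is predicted positive when score >= t;
    - candidate thresholds are the distinct model outputs.
    """
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative case")

    best_j, best_t = float("-inf"), None
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < t and y == 0)
        se = tp / positives   # sensitivity at threshold t
        sp = tn / negatives   # specificity at threshold t
        j = se + sp - 1
        if j > best_j:        # on ties, the lowest threshold is kept
            best_j, best_t = j, t
    return best_j, best_t


# Made-up model outputs and target classes
scores = [0.1, 0.2, 0.3, 0.55, 0.6, 0.8, 0.9]
labels = [0, 0, 1, 0, 1, 1, 1]
j, cutoff = youden_cutoff(scores, labels)
print(j, cutoff)
```

With this toy data the largest value of SE + SP - 1 is reached at the model output 0.6, which would play the role of the ROC Cutoff Point.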
In the Cutoff Points Table, GeneXproTools also shows all “SE + SP - 1” values and highlights in light green the row with the Optimal Cutoff Point and the corresponding ROC Cutoff Point. These parameters are also shown in the companion Cutoff Points Statistics Report.
The ROC Cutoff Point can, of course, be used to evaluate a Confusion Matrix (called the ROC Confusion Matrix in GeneXproTools). In the Cutoff Points Table, you have access to the Predicted Class, Match, and Type values used to evaluate the ROC Confusion Matrix (the graphical representation of the ROC Confusion Matrix is shown in the Confusion Matrix section).
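To make the link between a cutoff point and a confusion matrix concrete, here is a minimal Python sketch. It is not GeneXproTools code: the function name `classify_at_cutoff` is hypothetical, and the interpretation of the three values is an assumption (Predicted Class as 1 when the model output is at or above the cutoff, Match as whether the prediction agrees with the target, and Type as one of TP, TN, FP, or FN).

```python
from collections import Counter


def classify_at_cutoff(scores, labels, cutoff):
    """Classify each record at `cutoff` and tally the confusion matrix cells.

    Assumption (hypothetical): predicted class is 1 when score >= cutoff.
    Returns a list of (predicted_class, match, type) rows and a Counter
    of the TP / TN / FP / FN totals.
    """
    rows = []
    for score, target in zip(scores, labels):
        predicted = 1 if score >= cutoff else 0
        match = predicted == target
        # "T" or "F" for a correct or wrong call, "P" or "N" for the predicted class
        kind = ("T" if match else "F") + ("P" if predicted == 1 else "N")
        rows.append((predicted, match, kind))
    counts = Counter(kind for _, _, kind in rows)
    return rows, counts


# Made-up model outputs and target classes
scores = [0.1, 0.2, 0.3, 0.55, 0.6, 0.8, 0.9]
labels = [0, 0, 1, 0, 1, 1, 1]
rows, counts = classify_at_cutoff(scores, labels, cutoff=0.6)
print(counts["TP"], counts["TN"], counts["FP"], counts["FN"])
```

Moving `cutoff` up or down and re-running the tally shows how false positives trade against false negatives.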
The visualization of the ROC Confusion Matrix is a valuable tool and can in fact be used to determine the right number of bins to achieve a good fit with the Logistic Regression Model. But GeneXproTools allows you to do more with the ROC Confusion Matrix and its associated ROC Cutoff Point. Because Logistic Regression runs can be converted to the Classification Framework, you can use the model with its ROC Cutoff Point straightaway to make discrete classifications using the Classification Scoring Engine of GeneXproTools.
Note, however, that you'll have to change the Rounding Threshold to
ROC Threshold in the Settings Panel (when a Logistic Regression run
is converted to Classification, the Rounding Threshold is set to
Logistic Threshold by default) and then recalculate all model
thresholds by selecting Update All Thresholds in the History menu.
The Youden index is also used to evaluate a wide range of useful statistics at the Optimal Cutoff Point (OCP statistics for short). They include:
- TP (True Positives)
- TN (True Negatives)
- FP (False Positives)
- FN (False Negatives)
- TPR (True Positive Rate or Sensitivity)
- TNR (True Negative Rate or Specificity)
- FPR (False Positive Rate, also known as 1-Specificity)
- FNR (False Negative Rate)
- PPV (Positive Predictive Value)
- NPV (Negative Predictive Value)
- Classification Accuracy (Correct Classifications)
- Classification Error (Wrong Classifications)
How they are calculated is shown in the table below ("TC" represents the number of Total Cases):
| Statistic | Formula |
| --- | --- |
| TPR (Sensitivity) | TP / (TP + FN) |
| TNR (Specificity) | TN / (TN + FP) |
| FPR (1-Specificity) | FP / (FP + TN) |
| FNR | FN / (FN + TP) |
| PPV | TP / (TP + FP), where TP + FP ≠ 0 |
| NPV | TN / (TN + FN), where TN + FN ≠ 0 |
| Classification Accuracy | (TP + TN) / TC |
| Classification Error | (FP + FN) / TC |
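The formulas in the table can be checked with a short Python sketch. The function name `ocp_statistics` and the choice to return `None` when a denominator is zero are assumptions made for illustration, not GeneXproTools behavior.

```python
def ocp_statistics(tp, tn, fp, fn):
    """Compute the rate statistics from the four confusion matrix counts.

    Assumption (hypothetical): a statistic whose denominator is zero
    is reported as None instead of raising an error.
    """
    tc = tp + tn + fp + fn  # TC, the number of Total Cases

    def ratio(numerator, denominator):
        return numerator / denominator if denominator != 0 else None

    return {
        "TPR": ratio(tp, tp + fn),       # sensitivity
        "TNR": ratio(tn, tn + fp),       # specificity
        "FPR": ratio(fp, fp + tn),       # 1 - specificity
        "FNR": ratio(fn, fn + tp),
        "PPV": ratio(tp, tp + fp),
        "NPV": ratio(tn, tn + fn),
        "Accuracy": ratio(tp + tn, tc),
        "Error": ratio(fp + fn, tc),
    }


# Made-up counts: 3 true positives, 3 true negatives, 0 false positives, 1 false negative
stats = ocp_statistics(tp=3, tn=3, fp=0, fn=1)
for name, value in stats.items():
    print(name, value)
```

Note that TPR + FNR and TNR + FPR each sum to 1, as do Classification Accuracy and Classification Error, which is a quick sanity check on any set of counts.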
It is worth pointing out that OCP statistics are quantile-independent and
are therefore a good indicator of what could be achieved with a model in terms of logistic fit and accuracy.