Use the threshOptim() function or the RedYellowGreen() function from the RemixAutoML package in R for situations like this.

Okay, so you spent time building an awesome classification model, and you are seeing a great AUC compared to previous versions. Now what? Your product manager asks which threshold to use for turning your predicted probabilities into classes. How do you answer that?
You should know how to answer this question. There are several methods you can use. H2O, for example, offers several threshold-selection metrics that are useful to know, including max F1, F2, and f0point5.
Okay, those sound technical, but which one do you use when correct predictions and Type 1 and Type 2 errors carry asymmetric costs and profits, as in the payoff matrix below? H2O defaults to max F1, which will typically be sufficient for most cases, but it also offers F2 for penalizing a large number of false negatives and f0point5 for penalizing a large number of false positives. Those measures get you closer to where you want to be, but why not optimize the threshold precisely?
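For reference, F1, F2, and f0point5 are all instances of the standard F-beta score, written here in terms of true positives (TP), false negatives (FN), and false positives (FP):

$$
F_\beta = \frac{(1 + \beta^2)\,\mathrm{TP}}{(1 + \beta^2)\,\mathrm{TP} + \beta^2\,\mathrm{FN} + \mathrm{FP}}
$$

With beta = 2, a false negative counts four times as much as a false positive; with beta = 0.5, it counts one quarter as much. The weighting is fixed by beta, which is why these metrics only approximate an arbitrary cost structure.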
If your confusion matrix looks something like the table below, that is, it does not consist of 1's for correct predictions and 0's for incorrect predictions (the default values), then you should be using the threshOptim() and RedYellowGreen() functions in the RemixAutoML package for R.
| Actual \ Predicted | Positive Prediction | Negative Prediction |
|--------------------|---------------------|---------------------|
| Positive Outcome   | 0.0                 | -15.0               |
| Negative Outcome   | -4.0                | 0.0                 |
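To make the idea of a utility-maximizing threshold concrete, here is a minimal base R sketch that grid-searches thresholds against the payoff matrix above. It is not the RemixAutoML implementation: the helper names total_utility() and best_threshold(), the grid, and the simulated data are all assumptions made for illustration.

```r
# Payoff matrix from the table above (rows = actual, columns = predicted).
payoff <- matrix(
  c( 0, -15,
    -4,   0),
  nrow = 2, byrow = TRUE,
  dimnames = list(actual = c("pos", "neg"), predicted = c("pos", "neg"))
)

# Hypothetical helper: total utility of classifying at one threshold.
total_utility <- function(threshold, prob, actual, payoff) {
  predicted <- prob >= threshold
  tp <- sum(predicted & actual == 1)
  fn <- sum(!predicted & actual == 1)
  fp <- sum(predicted & actual == 0)
  tn <- sum(!predicted & actual == 0)
  tp * payoff["pos", "pos"] + fn * payoff["pos", "neg"] +
    fp * payoff["neg", "pos"] + tn * payoff["neg", "neg"]
}

# Hypothetical helper: grid search for the utility-maximizing threshold.
best_threshold <- function(prob, actual, payoff,
                           grid = seq(0.01, 0.99, by = 0.01)) {
  utilities <- vapply(grid, total_utility, numeric(1),
                      prob = prob, actual = actual, payoff = payoff)
  list(threshold = grid[which.max(utilities)],
       utility = max(utilities),
       curve = data.frame(threshold = grid, utility = utilities))
}

# Simulated scores and labels, for illustration only.
set.seed(42)
actual <- rbinom(5000, 1, 0.3)
prob <- plogis(rnorm(5000, mean = ifelse(actual == 1, 1, -1)))

result <- best_threshold(prob, actual, payoff)
result$threshold
```

Because a false negative costs more than a false positive here, the search tends to settle on a lower threshold than a symmetric-cost rule would. Plotting result$curve shows how flat or sharp the optimum is.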
The threshOptim() function uses the costs in the confusion matrix to determine a single optimal threshold: the one that maximizes utility. For cases where uncertain probability predictions warrant further analysis, for example by a medical professional, you should use the RedYellowGreen() function. It lets you plug in not only the costs of a false positive and a false negative but also the cost of further analysis, and it returns two thresholds. Any predicted probability that falls between the two thresholds should be sent for review, while probabilities below the lower threshold are treated as obvious negative outcomes and those above the upper threshold as obvious positive outcomes.
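The two-threshold idea can be sketched in base R as a search over (lower, upper) pairs. This is only an illustration under stated assumptions, not how RedYellowGreen() is implemented: the helpers triage_utility() and best_thresholds() are hypothetical, the review cost of -1 is made up, and a reviewed case is assumed to be resolved correctly at no cost beyond the review itself.

```r
# Hypothetical helper: total utility for one (lower, upper) threshold pair.
# Assumption: a reviewed case is resolved correctly and only costs review_cost.
triage_utility <- function(lower, upper, prob, actual,
                           fp_cost, fn_cost, review_cost) {
  flagged_pos <- prob > upper
  flagged_neg <- prob < lower
  review <- !flagged_pos & !flagged_neg
  fp <- sum(flagged_pos & actual == 0)
  fn <- sum(flagged_neg & actual == 1)
  fp * fp_cost + fn * fn_cost + sum(review) * review_cost
}

# Hypothetical helper: grid search over all pairs with lower <= upper.
best_thresholds <- function(prob, actual, fp_cost = -4, fn_cost = -15,
                            review_cost = -1,
                            grid = seq(0.05, 0.95, by = 0.05)) {
  pairs <- expand.grid(lower = grid, upper = grid)
  pairs <- pairs[pairs$lower <= pairs$upper, ]
  pairs$utility <- mapply(triage_utility, pairs$lower, pairs$upper,
                          MoreArgs = list(prob = prob, actual = actual,
                                          fp_cost = fp_cost, fn_cost = fn_cost,
                                          review_cost = review_cost))
  pairs[which.max(pairs$utility), ]
}

# Simulated scores and labels, for illustration only.
set.seed(42)
actual <- rbinom(2000, 1, 0.3)
prob <- plogis(rnorm(2000, mean = ifelse(actual == 1, 1, -1)))

best_thresholds(prob, actual)
```

The cheaper a review is relative to the error costs, the wider the review band becomes. As the review cost approaches the error costs, the band narrows toward a single threshold.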
Below is a sample plot output in R from RemixAutoML::RedYellowGreen(), which is generated automatically when you run the function. The lower threshold is 0.32 and the upper threshold is 0.34. If you generate a predicted probability of 0.33, you would send that instance for further review.
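Once you have the two thresholds, routing new predictions is a one-liner. The sketch below uses the 0.32 and 0.34 values from the example above. The helper name triage_zone() and its labels are hypothetical and not part of RemixAutoML.

```r
# Hypothetical helper: route predicted probabilities using two thresholds.
triage_zone <- function(prob, lower = 0.32, upper = 0.34) {
  ifelse(prob < lower, "negative",
         ifelse(prob > upper, "positive", "review"))
}

triage_zone(c(0.10, 0.33, 0.80))
# "negative" "review" "positive"
```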