Deep Learning - Test Results Confidence Values

fstarsinic | Member | Posts: 20 | Contributor II
edited April 2020 in Help
Predictions for test data come back with a prediction (0 or 1 in my case) and a confidence value (float) for each class:
confidence(0) and confidence(1).

To get an overall confidence, in the past I would create a new attribute and set it to abs(conf0 - conf1).
When I do this, I notice very small values for the "1" predictions: they are always below 0.27.
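A minimal Python sketch of the margin described above, assuming each scored row carries the prediction plus confidence(0) and confidence(1) as plain floats. The row layout, the function names, and the example values are hypothetical; this is not RapidMiner's API.

```python
from typing import Iterable, List, Tuple

# Hypothetical row layout: (prediction, confidence(0), confidence(1))
Row = Tuple[int, float, float]


def confidence_margin(conf0: float, conf1: float) -> float:
    """Overall confidence as the gap between the two class confidences."""
    return abs(conf0 - conf1)


def margins(rows: Iterable[Row]) -> List[Tuple[int, float]]:
    """Return (prediction, margin) for every scored row."""
    return [(pred, confidence_margin(c0, c1)) for pred, c0, c1 in rows]


# Hypothetical scored rows, not real model output.
rows = [(0, 0.95, 0.05), (0, 0.70, 0.30), (1, 0.40, 0.60), (1, 0.365, 0.635)]
for pred, m in margins(rows):
    print(pred, round(m, 3))
```

If the two confidences sum to 1, as is typical for a two-class model (an assumption here), the margin equals 2 * max(confidence) - 1, so thresholding the margin is equivalent to thresholding the larger of the two confidences.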

These seem very low, given that the predictions are coming back as expected.
The confidence for the 0 label, by contrast, can be very high.

The only explanation I can think of is that the dataset is highly imbalanced, with far more 0 labels than 1 labels.
Is this why the confidence values are so low for the minority class? Would more data give better confidence?

My ultimate goal was to "act" on all predictions above a certain confidence, but this suggests that I cannot use a single threshold for both predictions (0 and 1) and that I might need two different "trusted" confidence thresholds.

I trust 0 above 0.8
I trust 1 above 0.25 <-- this just seems very low to me, even though the results look good.
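The two-threshold idea can be sketched as a per-label filter. This assumes the thresholds apply to the abs(conf0 - conf1) margin and that "above" means strictly greater; the 0.8/0.25 values are the ones quoted above, while the names and example rows are hypothetical.

```python
from typing import Dict, List, Tuple

Row = Tuple[int, float, float]  # hypothetical: (prediction, confidence(0), confidence(1))

# Per-label thresholds on the abs(conf0 - conf1) margin, using the values quoted above.
THRESHOLDS: Dict[int, float] = {0: 0.8, 1: 0.25}


def actionable(rows: List[Row], thresholds: Dict[int, float] = THRESHOLDS) -> List[Row]:
    """Keep only rows whose margin is strictly above the threshold for their predicted label."""
    return [
        (pred, c0, c1)
        for pred, c0, c1 in rows
        if abs(c0 - c1) > thresholds[pred]
    ]


# Hypothetical scored rows, not real model output.
rows = [(0, 0.95, 0.05), (0, 0.70, 0.30), (1, 0.40, 0.60), (1, 0.365, 0.635)]
for row in actionable(rows):
    print(row)
```

Keying the thresholds by predicted label keeps the rule explicit: each class gets its own cut-off, and nothing about the model's confidences is altered.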

(or I artificially bump up the confidence of the minority class so the values look more normal)

As it is, in the best case I'd be trusting something near a 60%/40% confidence split, which isn't much better than flipping a coin (50%/50%).
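Assuming confidence(0) and confidence(1) sum to 1 (typical for a two-class model, but an assumption here), the margin maps directly back to a confidence split, which is where the roughly 60%/40% figure comes from. Writing $c_0, c_1$ for the two confidences and $m$ for the margin:

```latex
m = \lvert c_0 - c_1 \rvert, \qquad c_0 + c_1 = 1
\\
c_{\max} = \max(c_0, c_1) \;\Rightarrow\; m = c_{\max} - (1 - c_{\max}) = 2c_{\max} - 1 \;\Rightarrow\; c_{\max} = \frac{1 + m}{2}
\\
m = 0.25 \Rightarrow c_{\max} = 0.625, \qquad m = 0.8 \Rightarrow c_{\max} = 0.9
```

So a 0.25 margin corresponds to a 62.5%/37.5% split, while the 0.8 margin used for label 0 corresponds to 90%/10%.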

So I'm wondering how the confidence values are generated, and how I should interpret them: what minimum values can be "trusted" and considered "actionable"?

Thanks.

Best Answer

Answers

  • jacobcybulski | Member, University Professor | Posts: 391 | Unicorn
    When the model favours one class over another, it is a sign of bias. There can be many reasons for a biased model, e.g. (1) your data may be heavily unbalanced, so during training the model sees one class much more often than the other; (2) your model is too simple for the data, so your deep model does not have enough capacity to accommodate it; (3) relatedly, your model may underfit the data, so it needs more training; (4) finally, your training sample and validation sample may be very different. The confidence values are not necessarily 50-50; they reflect the nature of your data. "Bumping" your confidence up may be valid as long as the model is unbiased.

    It is best to get the best model performance first. Watch training performance vs. validation performance: the training performance shows whether the model is still learning, and the validation performance shows at what point the model starts to overfit your training data. If your data or your model is massive, cross-validation may be too much to ask, but you could at least check whether the distributions of the two partitions indicate that they come from the same population. Once the model is in good shape, you could adjust the classification threshold to improve specific performance indicators.
    Jacob
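Jacob's last suggestion, adjusting the classification threshold once the model is sound, can be sketched as a sweep over candidate thresholds on confidence(1) using a validation set. F1 for class 1 is just one example of a "performance indicator"; the function names and the validation data below are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def f1_at_threshold(conf1: Sequence[float], labels: Sequence[int], t: float) -> float:
    """F1 for class 1 when a row is predicted as 1 whenever confidence(1) >= t."""
    tp = fp = fn = 0
    for c, y in zip(conf1, labels):
        predicted_one = c >= t
        if predicted_one and y == 1:
            tp += 1
        elif predicted_one and y == 0:
            fp += 1
        elif not predicted_one and y == 1:
            fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_threshold(conf1: Sequence[float], labels: Sequence[int],
                   candidates: Optional[Sequence[float]] = None) -> Tuple[float, float]:
    """Return the (threshold, F1) pair with the highest F1; ties keep the earliest candidate."""
    if candidates is None:
        candidates = [i / 100 for i in range(1, 100)]  # 0.01 .. 0.99
    best_t, best_f1 = candidates[0], -1.0
    for t in candidates:
        score = f1_at_threshold(conf1, labels, t)
        if score > best_f1:
            best_t, best_f1 = t, score
    return best_t, best_f1


# Hypothetical validation set: confidence(1) from the model and the true labels.
val_conf1 = [0.05, 0.10, 0.20, 0.35, 0.40, 0.15, 0.62, 0.30]
val_labels = [0, 0, 0, 1, 1, 0, 1, 0]
t, f1 = best_threshold(val_conf1, val_labels)
print(f"threshold={t:.2f}, F1={f1:.3f}")
```

Choose the threshold on validation data rather than on the test data you later report; otherwise the reported performance will be optimistic.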
  • fstarsinicfstarsinic MemberPosts:20Contributor II
    I've decided not to adjust anything in the model, but instead to use a separate threshold for each label when deciding what gets flagged as an "actionable" prediction. That way I can use two different target values for the two labels.
  • Telcontar120 | Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member | Posts: 1,635 | Unicorn
    I understand you said that you don't necessarily want to make any adjustments, but for this type of problem in the future, you might want to check out the Rescale Confidences operator as well as the Drop Uncertain Predictions operator.
    Brian T.
    Lindon Ventures
    Data Science Consulting from Certified RapidMiner Experts
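For context, logistic rescaling of confidences is commonly done with Platt scaling: fit p(y=1 | s) = sigmoid(a*s + b) on held-out raw scores s. The sketch below fits a and b with Newton's method in plain Python. I'm assuming, not confirming, that the Rescale Confidences operator does something comparable; this shows the general technique only, and the names and data are hypothetical.

```python
import math
from typing import Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_platt(scores: Sequence[float], labels: Sequence[int],
              max_iter: int = 100, tol: float = 1e-10) -> Tuple[float, float]:
    """Fit p(y=1 | s) = sigmoid(a*s + b) by Newton's method on the log loss."""
    a, b = 0.0, 0.0
    for _ in range(max_iter):
        g_a = g_b = h_aa = h_ab = h_bb = 0.0
        for s, y in zip(scores, labels):
            p = sigmoid(a * s + b)
            w = p * (1.0 - p)
            g_a += (p - y) * s   # gradient of the log loss w.r.t. a
            g_b += p - y         # gradient w.r.t. b
            h_aa += w * s * s    # entries of the 2x2 Hessian
            h_ab += w * s
            h_bb += w
        det = h_aa * h_bb - h_ab * h_ab
        # Newton step: solve H @ [da, db] = [g_a, g_b]
        da = (h_bb * g_a - h_ab * g_b) / det
        db = (h_aa * g_b - h_ab * g_a) / det
        a, b = a - da, b - db
        if abs(da) < tol and abs(db) < tol:
            break
    return a, b


def rescale(score: float, a: float, b: float) -> float:
    """Map a raw confidence(1) to a calibrated probability of class 1."""
    return sigmoid(a * score + b)


# Hypothetical held-out calibration data: raw confidence(1) and true labels.
scores = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40]
labels = [0, 0, 0, 1, 0, 1, 1, 1]
a, b = fit_platt(scores, labels)
for s in scores:
    print(s, round(rescale(s, a, b), 3))
```

In general, Platt-style scaling is fit on data the model was not trained on and then applied to new scores; treat this as the general idea, not as a description of how the operator must be wired into a process.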
  • fstarsinicfstarsinic MemberPosts:20Contributor II
    Interesting, thank you. That looks very promising. I'll try those now.
  • fstarsinicfstarsinic MemberPosts:20Contributor II
    I'm looking at the Rescale Confidences operator. How/where does this operator fit into a process? Before the model is created? After? And does it need to be used in training only, testing only, or both?
  • sgenzer | Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator | Posts: 2,959 | Community Manager
    Ah, thank you @lionelderkrikor.
  • lionelderkrikor | Moderator, RapidMiner Certified Analyst, Member | Posts: 1,195 | Unicorn
    You're welcome, Scott!

    Regards,

    Lionel