Set Cutoff for Classification using Logistic Regression

btibert Member, University Professor Posts: 146 Guru
edited September 2019 in Help
Is it possible to manually set the cutoff threshold used to predict the label with a logistic regression? I read that the default cutoff is 0.5, which I understand, but my dataset is heavily imbalanced and I would like to set the cutoff by hand. There appears to be an automated way to do this, but for the sake of teaching the concept of the cutoff, I would prefer to show it manually.

Thanks!

Best Answer

  • arjun_gopal Member Posts: 7 Contributor II
    Solution Accepted
    Hi,
    "Create Threshold" and "Apply Threshold" should do the trick for you.

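For readers who want to see the idea outside of RapidMiner, here is a minimal Python sketch of what applying a manual cutoff amounts to. It is not the implementation of "Create Threshold" or "Apply Threshold"; the confidences, the labels, the function names, and the choice of `>=` at the boundary are all assumptions made for illustration.

```python
def apply_threshold(confidences, threshold=0.5, positive="yes", negative="no"):
    """Label a case positive when its positive-class confidence meets the cutoff.

    Assumption: the boundary case (confidence == threshold) counts as positive.
    """
    return [positive if c >= threshold else negative for c in confidences]


def true_positives(predicted, actual, positive="yes"):
    """Count the cases that are predicted positive and actually positive."""
    return sum(1 for p, a in zip(predicted, actual) if p == positive and a == positive)


# Hypothetical positive-class confidences from a logistic regression
# on an imbalanced dataset (3 positives out of 10 cases).
confidences = [0.02, 0.05, 0.08, 0.10, 0.15, 0.20, 0.25, 0.35, 0.45, 0.70]
actual = ["no", "no", "no", "no", "no", "no", "yes", "no", "yes", "yes"]

default_labels = apply_threshold(confidences)        # cutoff 0.5
lowered_labels = apply_threshold(confidences, 0.30)  # manual cutoff 0.3

tp_default = true_positives(default_labels, actual)
tp_lowered = true_positives(lowered_labels, actual)

print("cutoff 0.5:", tp_default, "of 3 positives found")
print("cutoff 0.3:", tp_lowered, "of 3 positives found")
```

Lowering the cutoff finds more of the rare positive cases, at the cost of one extra false positive in this made-up data, which is the trade-off the cutoff discussion is meant to expose.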



Answers

  • varunm1 Moderator, Member Posts: 1,207 Unicorn
    Hello @btibert

    The operators "Create Threshold" and "Apply Threshold" do this. Please let me know if this is what you are looking for.

    Hope this helps
    Regards,
    Varun
    https://www.varunmandalapu.com/

    Be Safe. Follow precautions and Maintain Social Distancing

  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,404 RM Data Scientist
    Hi,
    I think what @btibert is referring to is Platt scaling. The operator Rescale Confidences (Logistic) is, I think, what he is looking for. You can combine this with thresholds afterwards.

    Cheers,
    Martin
    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
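As a rough illustration of Platt scaling, the sketch below fits a sigmoid that maps raw model scores to probabilities, then applies a manual cutoff to the result. This is a simplified version fitted with plain gradient descent on log loss; it is not how Rescale Confidences (Logistic) is implemented, and the scores, labels, learning rate, and function names are assumptions chosen for the example.

```python
import math


def sigmoid(z):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_platt(scores, labels, lr=0.1, epochs=5000):
    """Fit p = sigmoid(a * score + b) by gradient descent on log loss.

    Simplified sketch: no regularized targets, no convergence check.
    """
    a, b = 1.0, 0.0
    n = len(scores)
    for _ in range(epochs):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, labels):
            err = sigmoid(a * s + b) - y  # derivative of log loss w.r.t. the logit
            grad_a += err * s
            grad_b += err
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


# Hypothetical raw scores (sorted ascending) and true labels (1 = positive).
scores = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5]
labels = [0, 0, 0, 1, 0, 1, 1, 1]

a, b = fit_platt(scores, labels)
calibrated = [sigmoid(a * s + b) for s in scores]

# A manual cutoff can then be applied to the rescaled confidences.
cutoff = 0.3
predictions = [1 if p >= cutoff else 0 for p in calibrated]

print("a =", round(a, 3), "b =", round(b, 3))
print([round(p, 3) for p in calibrated])
print(predictions)
```

The rescaling only changes how scores are mapped to confidences; choosing the cutoff remains a separate, manual step.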
  • varunm1 Moderator, Member Posts: 1,207 Unicorn
    @mschmitz oops, my bad, I totally missed the Logistic Regression.
    Regards,
    Varun
    https://www.varunmandalapu.com/

    Be Safe. Follow precautions and Maintain Social Distancing

  • btibert Member, University Professor Posts: 146 Guru
    I had seen those operators, @gopala, but had a hard time wrapping my head around the construction. Thanks for the screenshot, as that is now intuitive. Thanks!
  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn
    You can also use the Drop Uncertain Predictions operator if you want to exclude ambivalent cases rather than force them into one category or another simply by lowering (or raising) the threshold. This is often another helpful way of dealing with the issue because it allows you to recalculate the performance metrics without the excluded cases.

    Brian T.
    Lindon Ventures
    Data Science Consulting from Certified RapidMiner Experts
  • btibert Member, University Professor Posts: 146 Guru
    Thanks @Telcontar120, I will keep that in mind as well. Because this will likely be the first time my students have really sunk their teeth into logistic regression, discussing the cutoff and modifying it manually is perfect for helping them understand the construction before they use tools that optimize it for them.
  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn
    @btibert Sure, that makes sense, but just for clarity, Drop Uncertain Predictions doesn't automatically optimize anything. It simply excludes predictions below a certain confidence level that is set manually. It is conceptually the same as Create Threshold, only Create Threshold says "use all data but don't change my prediction until the confidence is above 70%" and Drop Uncertain Predictions says "only keep predictions that are above 70% confidence."
    If you take a look at the tutorial process it should make the outcome a bit clearer.
    Brian T.
    Lindon Ventures
    Data Science Consulting from Certified RapidMiner Experts
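The contrast described above can be sketched in a few lines of Python. This is only an illustration of the two behaviors as described in the post, not the operators' actual logic; the case IDs, confidences, function names, and the assumption that a prediction's confidence is the confidence of its predicted class are all hypothetical.

```python
# Hypothetical (case id, confidence for class "yes") pairs.
cases = [("a", 0.95), ("b", 0.75), ("c", 0.60), ("d", 0.40), ("e", 0.10)]
CUTOFF = 0.70


def threshold_labels(cases, cutoff):
    """Keep every case; predict "yes" only when confidence("yes") exceeds the cutoff."""
    return {cid: ("yes" if conf_yes > cutoff else "no") for cid, conf_yes in cases}


def drop_uncertain(cases, min_conf):
    """Keep only cases whose predicted class has at least min_conf confidence.

    Assumption: the predicted class is the more likely of the two, and its
    confidence is conf_yes for "yes" or 1 - conf_yes for "no".
    """
    kept = {}
    for cid, conf_yes in cases:
        if conf_yes >= 0.5:
            label, conf = "yes", conf_yes
        else:
            label, conf = "no", 1.0 - conf_yes
        if conf >= min_conf:
            kept[cid] = label
    return kept


thresholded = threshold_labels(cases, CUTOFF)
confident_only = drop_uncertain(cases, CUTOFF)

print(thresholded)     # all five cases receive a label
print(confident_only)  # cases "c" and "d" are excluded as uncertain
```

With thresholding, the borderline cases "c" and "d" are still labeled "no"; with dropping, they are removed, so performance metrics are computed on the remaining three cases only.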
  • btibert Member, University Professor Posts: 146 Guru
    edited September 2019
    Will do, @Telcontar120. Many thanks for the follow-up note.
