Predicting rare events with Auto Model

niall_dempster Member Posts: 1 Newbie
edited April 2020 in Help
Hi,

I am trying to develop 2 models that predict relatively rare events (the F3/4 column and F4 column in attached file). I am a physician and not too familiar with machine learning so trying to get up to speed. I used Turbo Prep to impute missing data in the attached training/validation database and I have a separate independent Testing database that I would like to use once the models have been generated.

Initially using Auto Model, accuracy seemed to be prioritised (every case was predicted to be index 1, which was almost always correct since index 2 is infrequent). However, for this problem it is important to have a sensitive model so I am picking up cases of the rare event (index 2). Is it possible to optimise the AUC/Youden Index rather than accuracy?

So far I have tried adding in custom settings for costs and benefits, so that predicting range 1 where true range is 2 is penalised, and correctly predicting true range 2 is rewarded. Are there recommended numbers to add in for these costs/benefits?

Many thanks for your help

BW,

Niall
Answers

  • hbajpai Member Posts: 102 Unicorn
    Hey @niall_dempster,

    The costs/benefits are typically based on domain knowledge. Think of it like this: how much do you gain for every correct prediction, and how much money do you lose for every incorrect one? You can then enter those exact values in the matrix.

    Auto Model's main criterion is set to classification error. However, you can open the process of your best-performing model and change the main criterion. It is in the (4) - SCORING, VALIDATION, EXPLANATIONS, WEIGHTS & SIMULATOR section. There you can open the Validate Model subprocess and evaluate different options with the Performance operator.
    Best,
    Harshit
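For readers who want to see what the two ideas in this thread amount to numerically, here is a minimal sketch in plain Python. It is not RapidMiner code and does not reflect how Auto Model works internally. It only illustrates, on made-up data, how a decision threshold could be picked either by maximising Youden's J (sensitivity + specificity - 1) or by minimising a total misclassification cost. The labels, scores, and cost values are all hypothetical, and the function names are chosen for illustration.

```python
# Illustrative sketch only: labels, scores, and costs are invented.
# Label 1 = rare event, label 0 = common class.

def confusion_counts(y_true, scores, threshold):
    """Count TP, FN, FP, TN when scores >= threshold are called positive."""
    tp = fn = fp = tn = 0
    for label, score in zip(y_true, scores):
        predicted_positive = score >= threshold
        if label == 1 and predicted_positive:
            tp += 1
        elif label == 1:
            fn += 1
        elif predicted_positive:
            fp += 1
        else:
            tn += 1
    return tp, fn, fp, tn


def youden_j(y_true, scores, threshold):
    """Youden's J = sensitivity + specificity - 1.

    Assumes both classes are present; otherwise a division by zero occurs.
    """
    tp, fn, fp, tn = confusion_counts(y_true, scores, threshold)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity + specificity - 1


def total_cost(y_true, scores, threshold, cost_fn, cost_fp):
    """Total cost of errors: missed rare events plus false alarms."""
    tp, fn, fp, tn = confusion_counts(y_true, scores, threshold)
    return cost_fn * fn + cost_fp * fp


def best_threshold(y_true, scores, objective):
    """Return the observed score that maximises objective(y_true, scores, t)."""
    candidates = sorted(set(scores))
    return max(candidates, key=lambda t: objective(y_true, scores, t))


def cost_threshold(y_true, scores, cost_fn, cost_fp):
    """Return the observed score that minimises the total error cost."""
    return best_threshold(
        y_true, scores,
        lambda y, s, t: -total_cost(y, s, t, cost_fn, cost_fp),
    )


# Hypothetical validation set: 8 common cases, 2 rare events.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.45, 0.60, 0.40, 0.80]

default_j = youden_j(y_true, scores, 0.5)            # J at the usual 0.5 cut-off
j_threshold = best_threshold(y_true, scores, youden_j)

print("J at 0.5:", default_j)
print("threshold maximising J:", j_threshold)
print("equal costs:", cost_threshold(y_true, scores, cost_fn=1, cost_fp=1))
print("missed event costs 10x:", cost_threshold(y_true, scores, cost_fn=10, cost_fp=1))
```

On this toy data, treating both error types as equally costly favours a high threshold that misses one of the two rare events, while penalising a missed event ten times more than a false alarm pulls the threshold down to the point where both are caught. The ratio between the two costs, rather than their absolute size, is what moves the threshold here, which is why the actual numbers have to come from domain knowledge.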