Logistic regression: Select or change reference group

chris1, Member, Posts: 5, Contributor I
edited December 2018 in Help

I am new to RapidMiner but have been working with logistic regression in SAS for years. When working with categorical attributes in logistic regression, how does RapidMiner choose which category is the reference category? Is it possible to assign a different reference category?

For example, suppose my race variable has five possible values (white, black, asian, other, and unknown) and RapidMiner assigns a weight of 0 to black (with all other weights relative to black), but I want asian or white to be the reference group with a weight of 0. Is there a way to do this?

Thanks.

Best Answer

  • earmijo, Member, Posts: 270, Unicorn
    Solution Accepted

    One solution to your problem is to create the dummies yourself.

    In this first example, I let RM choose the reference categories (they turn out to be Female for gender and First for Passenger Class).





    <macros/>
    <operator activated="true" class="process" compatibility="7.5.003" expanded="true" name="Process">
    <operator activated="true" class="retrieve" compatibility="7.5.003" expanded="true" height="68" name="Retrieve Titanic" width="90" x="45" y="238">
    <operator activated="true" class="set_role" compatibility="7.5.003" expanded="true" height="82" name="Set Role" width="90" x="179" y="238">
    <operator activated="true" class="select_attributes" compatibility="7.5.003" expanded="true" height="82" name="Select Attributes" width="90" x="313" y="238">
    <operator activated="true" class="h2o:logistic_regression" compatibility="7.5.000" expanded="true" height="103" name="Logistic Regression" width="90" x="514" y="238"/>

    Then you get:

    Screen Shot 2017-08-11 at 11.32.38 AM.png

    Say you want the reference categories to be Male and Third Class. You have to create the dummies yourself with Nominal to Numerical and specify comparison groups. This gives you more control, but it takes more work.





    <macros/>
    <operator activated="true" class="process" compatibility="7.5.003" expanded="true" name="Process">
    <operator activated="true" class="retrieve" compatibility="7.5.003" expanded="true" height="68" name="Retrieve Titanic" width="90" x="45" y="238">
    <operator activated="true" class="set_role" compatibility="7.5.003" expanded="true" height="82" name="Set Role" width="90" x="179" y="238">
    <operator activated="true" class="select_attributes" compatibility="7.5.003" expanded="true" height="82" name="Select Attributes" width="90" x="313" y="238">
    <operator activated="true" class="nominal_to_numerical" compatibility="7.5.003" expanded="true" height="103" name="Nominal to Numerical" width="90" x="447" y="238">
    <operator activated="true" class="h2o:logistic_regression" compatibility="7.5.000" expanded="true" height="103" name="Logistic Regression" width="90" x="648" y="238"/>

    Then you get:

    Screen Shot 2017-08-11 at 11.32.20 AM.png
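
The idea behind this second process (build the dummies yourself and leave out the category you want as the reference) can also be sketched outside RapidMiner. The following is a minimal plain-Python illustration, not the RapidMiner implementation: the function name `dummy_code`, the `is_` column prefix, and the sample rows are all made up for the example.

```python
def dummy_code(values, reference):
    """Return one 0/1 column per category, omitting the reference category.

    The omitted category becomes the reference group: its rows are 0 in
    every returned column.
    """
    categories = sorted(set(values))
    if reference not in categories:
        raise ValueError(f"{reference!r} is not a category in the data")
    return {
        f"is_{cat}": [1 if v == cat else 0 for v in values]
        for cat in categories
        if cat != reference
    }


# Made-up sample rows, not the Titanic data used in the post.
passenger_class = ["First", "Third", "Second", "Third"]
columns = dummy_code(passenger_class, reference="Third")
print(columns)
```

Because no column is generated for "Third", a model fitted on these columns would report the "First" and "Second" coefficients relative to "Third".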

    Obviously you can get the original result using:





    <macros/>
    <operator activated="true" class="process" compatibility="7.5.003" expanded="true" name="Process">
    <operator activated="true" class="retrieve" compatibility="7.5.003" expanded="true" height="68" name="Retrieve Titanic" width="90" x="45" y="238">
    <operator activated="true" class="set_role" compatibility="7.5.003" expanded="true" height="82" name="Set Role" width="90" x="179" y="238">
    <operator activated="true" class="select_attributes" compatibility="7.5.003" expanded="true" height="82" name="Select Attributes" width="90" x="313" y="238">
    <operator activated="true" class="nominal_to_numerical" compatibility="7.5.003" expanded="true" height="103" name="Nominal to Numerical" width="90" x="447" y="187">
    <operator activated="true" class="h2o:logistic_regression" compatibility="7.5.000" expanded="true" height="103" name="Logistic Regression" width="90" x="648" y="238"/>

    And you get:

    Screen Shot 2017-08-11 at 11.31.51 AM.png


Answers

  • Thomas_Ott, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member, Posts: 1,761, Unicorn

    Hi Chris,

    The way to control which target or reference variable you want to learn is to use the Set Role operator. Just select the variable name and set the role parameter to 'label'.

  • chris1, Member, Posts: 5, Contributor I

    Thanks for the reply, but I think maybe I didn't state my question clearly. I have the label set correctly; that's not an issue. What I'm trying to do is control which level of a categorical independent variable in the model is used as the reference group, i.e. the level that gets a weight of zero within that variable. The weights/coefficients that the model generates are relative to the reference group in the category.

    In my particular model, race is one of the independent variables. When I run the model, RapidMiner sets the "black" group as the reference group for the categorical race variable. All the coefficients associated with race in the model are then relative to the "black" race group. Instead, I want to set the "white" group as the reference group and show the coefficient for each race category relative to the "white" group. Some races have positive coefficient values right now relative to black but may have negative coefficient values when compared to the white group. Race isn't the only categorical predictor that I have in the model; it's just the one I'm using in my example since it's easily understood.

    Does that help clear up what I'm trying to do?

    Thanks.
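
The sign flip described above follows from simple arithmetic: with dummy coding in an unregularized fit, switching the reference group subtracts the new reference's old coefficient from every level, and the intercept absorbs the same amount. A small sketch of that relationship follows. The coefficient values are invented for illustration and `rebase` is a hypothetical helper, not a RapidMiner function; with regularization turned on, a refit may not match this shift exactly.

```python
def rebase(coefs, new_reference):
    """Re-express dummy-coded coefficients relative to a new reference level.

    `coefs` maps each level to its coefficient under the current coding,
    with the current reference level at 0.0.
    """
    shift = coefs[new_reference]
    return {level: value - shift for level, value in coefs.items()}


# Invented coefficients with "black" as the reference group.
relative_to_black = {
    "black": 0.0,
    "white": 0.5,
    "asian": 0.25,
    "other": -0.25,
    "unknown": 0.75,
}
relative_to_white = rebase(relative_to_black, "white")
print(relative_to_white)
```

In this made-up example, "asian" is positive relative to "black" (0.25) but negative relative to "white" (-0.25), which is the kind of change the question anticipates.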

  • chris1, Member, Posts: 5, Contributor I

    That's perfect, exactly what I was trying to do. Thanks for your help!

  • Telcontar120, Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member, Posts: 1,635, Unicorn

    You can also use the "Nominal to Numerical" operator and use the "effect coding" option, which allows you to specify your own comparison groups.

    Brian T.
    Lindon Ventures
    Data Science Consulting from Certified RapidMiner Experts
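
For readers unfamiliar with effect coding, here is a minimal plain-Python sketch of the standard textbook scheme: each non-comparison category gets its own column, and rows belonging to the comparison group are coded -1 in every column. It is meant only to illustrate the coding, under the assumption that RapidMiner's option follows the usual definition. The function name `effect_code` and the sample values are made up.

```python
def effect_code(values, comparison_group):
    """Return effect-coded columns: 1 for the column's own category,
    -1 for rows in the comparison group, 0 otherwise."""
    categories = sorted(set(values))
    if comparison_group not in categories:
        raise ValueError(f"{comparison_group!r} is not a category in the data")

    def code(value, category):
        if value == comparison_group:
            return -1
        return 1 if value == category else 0

    return {
        cat: [code(v, cat) for v in values]
        for cat in categories
        if cat != comparison_group
    }


# Made-up sample rows.
passenger_class = ["First", "Second", "Third"]
coded = effect_code(passenger_class, comparison_group="Third")
print(coded)
```

Note that with this scheme the coefficients are usually read as deviations from the average across categories rather than as differences from the comparison group, so the interpretation differs from the dummy-coded models shown earlier.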