How do I split up scored data into 20 equally sized segments?

simon_philipose (Member, Posts: 3, Learner I)
edited February 2020 in Help

Hi there, I'm only a few days into using RapidMiner and wasn't sure if or how I could do the following:

I created a logistic regression model for direct mail marketing. I've scored new data with the model, and now I want to split the scored data into 20 groups based on descending confidence(responder) value, so that group A holds the 1/20th of records most likely to respond, group B the next most likely 1/20th, and so on.

Your help is much appreciated.

-Simon




Answers

  • Pavithra_Rao (Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member, Posts: 123, RM Data Scientist)
    Hi @simon_philipose,

    You can first use the Sort operator to sort the confidence values in descending order, followed by the Split Data operator.
    In the Split Data operator's Parameters window, add partitions with a ratio of 1/20 (0.05) each.

    Hope this helps.

    Cheers,
    Pavithra
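
For readers who think better in code, the sort-then-split idea above can be sketched outside RapidMiner. This is a minimal plain-Python illustration, not RapidMiner's implementation: the `scored` rows, the customer IDs, and the confidence values are all made up, and it assumes the row count divides evenly by 20.

```python
import random

# Hypothetical scored data: (customer_id, confidence) pairs with made-up values.
random.seed(0)
scored = [(i, random.random()) for i in range(100)]

# Step 1: sort by confidence, highest first.
ranked = sorted(scored, key=lambda row: row[1], reverse=True)

# Step 2: cut the sorted rows into 20 consecutive partitions of equal size.
# Assumes len(ranked) is a multiple of n_groups.
n_groups = 20
size = len(ranked) // n_groups
partitions = [ranked[k * size:(k + 1) * size] for k in range(n_groups)]

# Each partition is a separate list, analogous to separate output datasets.
print(len(partitions), [len(p) for p in partitions[:3]])
```

Note that the result is 20 separate lists rather than one table with a group column.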
  • simon_philipose (Member, Posts: 3, Learner I)

    Hi Pavithra,

    Thank you for your response. So I ran into a few problems with using the Split Data operator.

    1. It splits the dataset into multiple datasets. What I need is one dataset with a field called Model_Group that has a value of A, B, C, D, etc., depending on the confidence values.

    2. It appears the maximum number of datasets I can split into is 8, by entering 0.125 in the partition ratio field 8 times. I can't do 10, much less 20 different splits.

  • MartinLiebig (Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor, Posts: 3,404, RM Data Scientist)
    Hi,
    I would do the following:

    Sort - by confidence
    Generate ID - to get an index
    Use Generate Attributes with id%10 to get your Model_Group

    Best,
    Martin
    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
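
One detail worth checking in the steps above: a modulo on the ID cycles through the group numbers row by row, so each group ends up with a mix of high and low scores, whereas integer division of the ID by the group size gives contiguous blocks. A minimal plain-Python sketch of the difference follows. It is not RapidMiner expression syntax; the scores are made up, and `block`, `cyclic`, and `group_size` are names chosen here for illustration.

```python
import math
import random
import string

random.seed(1)
confidences = [random.random() for _ in range(200)]  # made-up scores

n_groups = 20
ranked = sorted(confidences, reverse=True)       # Sort: by confidence, descending
group_size = math.ceil(len(ranked) / n_groups)   # rows per group

rows = []
for row_id, conf in enumerate(ranked, start=1):  # Generate ID: 1-based index
    block = (row_id - 1) // group_size           # integer division: contiguous blocks
    cyclic = (row_id - 1) % n_groups             # modulo: cycles 0..19 row by row
    rows.append({
        "id": row_id,
        "confidence": conf,
        "Model_Group": string.ascii_uppercase[block],  # A for the top block, then B, ...
        "cyclic_group": cyclic,
    })

# The first rows share one Model_Group but get different cyclic groups.
print([(r["id"], r["Model_Group"], r["cyclic_group"]) for r in rows[:3]])
```

This keeps everything in one table with a Model_Group column, which is the shape asked for earlier in the thread.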
  • simon_philipose (Member, Posts: 3, Learner I)
    Thank you so much, @rfuentealba. Your solution worked perfectly! Very much appreciated!!
  • Telcontar120 (Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member, Posts: 1,635, Unicorn)
    Wow, so many ways to do this in RapidMiner!
    If you copy your score attribute first, Discretize by Frequency should be able to do this directly for your score attribute by selecting that attribute and setting the number of bins to 20. This will create exactly the bins you are looking for, although a large number of ties can sometimes cause problems for the Discretize operators. (The reason to copy the score first is that Discretize replaces the selected attribute with a new one, so if you still want the raw score, you need two copies of it: one binned and one not.)
    Brian T.
    Lindon Ventures
    Data Science Consulting from Certified RapidMiner Experts
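
To illustrate why ties matter for equal-frequency binning, here is a small plain-Python sketch. It does not reproduce how RapidMiner's Discretize operators work internally; it only shows that any binning based on value cut points must put identical scores in the same bin, while a rank-based assignment can still fill every bin equally. The scores are made up, with 60 of 100 rows sharing one value.

```python
import bisect
import statistics
from collections import Counter

# Made-up scores with heavy ties: 60 of the 100 rows share the value 0.5.
scores = [0.5] * 60 + [i / 100 for i in range(40)]
n_bins = 20

# Value-based bins: 19 cut points taken from the quantiles of the data.
# Identical scores always fall on the same side of every cut point.
cuts = statistics.quantiles(scores, n=n_bins)
value_bins = [bisect.bisect_left(cuts, s) for s in scores]

# Rank-based bins: sort positions by score (descending); ties are broken
# by original order, so every bin receives the same number of rows.
order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
rank_bins = [0] * len(scores)
for rank, i in enumerate(order):
    rank_bins[i] = rank * n_bins // len(scores)

print("value-based bin sizes:", sorted(Counter(value_bins).values()))
print("rank-based bin sizes: ", sorted(Counter(rank_bins).values()))
```

The trade-off is that rank-based bins split tied scores across groups arbitrarily, which may or may not be acceptable for a mailing list.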