One-Hot Encoding Top 10 Items (Fractional) Rest Other

Zarrok (Member, Posts: 3, Newbie)
Hello everyone,

I am looking for a smart way to one-hot encode only the top 10 items (by fraction) and group the rest as "Other".
Currently I solve the problem by creating a new attribute for each of the top 10 values.
For example, for each value I need to generate a new column:
if((contains([Attri],"Example Data")) ,1,0)

Does anybody have a smart solution for this kind of issue?

Kind regards,
ZaRRoK

Answers

  • MartinLiebig (Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor, Posts: 3,404, RM Data Scientist)
    Hi,
    you could probably just use Remove Rare Values first and then One Hot Encoding?
    BR,
    Martin
    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
  • Zarrok (Member, Posts: 3, Newbie)
    edited July 2022
    I understand what you mean; the problem is rather that I have a large dataset with about 4000 groups, of which I would like to look at the top 100. The others should be labeled "Other", so I would have 101 columns.
    The top 100 groups account for about 70% of the total.
  • MartinLiebig (Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor, Posts: 3,404, RM Data Scientist)
    Yes, that's why I would propose using the Remove Rare Values operator to replace all strings that are not in the top 100 with "Other".
    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
  • Zarrok (Member, Posts: 3, Newbie)
    I have found a solution, but I am not happy with it... I created an aggregation (fractional), which I then join back to the table. Then I create a new attribute that, depending on the group's share, either takes over the original value or sets it to "Other".
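
For readers who want to see the logic outside RapidMiner, the two approaches discussed in this thread (keep the top N values, or keep values above a minimum share, and map everything else to "Other" before one-hot encoding) can be sketched in plain Python. This is only an illustrative sketch, not how the RapidMiner operators are implemented; the function names `collapse_top_n`, `collapse_by_share`, and `one_hot` and the sample data are hypothetical.

```python
from collections import Counter


def collapse_top_n(values, n, other="Other"):
    """Keep the n most frequent values; replace everything else with `other`."""
    keep = {value for value, _ in Counter(values).most_common(n)}
    return [v if v in keep else other for v in values]


def collapse_by_share(values, min_share, other="Other"):
    """Keep values whose relative frequency is at least `min_share`."""
    counts = Counter(values)
    total = len(values)
    keep = {v for v, c in counts.items() if c / total >= min_share}
    return [v if v in keep else other for v in values]


def one_hot(values):
    """Return (column_names, rows) with one 0/1 column per distinct value."""
    columns = sorted(set(values))
    rows = [[1 if v == col else 0 for col in columns] for v in values]
    return columns, rows


# Hypothetical sample data: four groups, of which only the top two are kept.
groups = ["A", "A", "A", "B", "B", "C", "D"]
collapsed = collapse_top_n(groups, n=2)
columns, rows = one_hot(collapsed)
```

With 4000 groups and `n=100`, the collapsed column would contain at most 101 distinct values, so the encoding yields at most 101 columns. The share-based variant corresponds to the "aggregate, join back, then relabel" workaround, just without the explicit join.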
