Attributes with too many possible values
I am a beginner and not yet familiar with all the operators.
I have a dataset in which attribute x (the attribute I want to predict with a classification technique) has over 1,000 possible values, which is far too many. I am only interested in the ten values with the highest absolute counts.
So my question is: how can I get a subset of the data that contains only the records whose value of x has an absolute count greater than, say, 50? Alternatively, can I keep only the records belonging to the top y values by absolute count? Is that possible?
         Best Answer
MartinLiebig
Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor
Posts: 3,404
RM Data Scientist
Hi @Sarah01, the Operator Toolbox extension has an operator, Replace Rare Values, which does exactly this.
Best,
Martin
- Sr. Director Data Solutions, Altair RapidMiner -
Dortmund, Germany
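
For readers who want to see the logic outside RapidMiner, the filtering described in the question can be sketched in plain Python. This is only an illustration under stated assumptions, not the implementation of the Replace Rare Values operator: the function names, the `"other"` replacement label, and the toy records are all hypothetical, and the dataset is assumed to be a list of dictionaries with the label stored under the key `x`.

```python
from collections import Counter


def keep_frequent(records, attribute, min_count=50):
    """Keep records whose attribute value occurs more than min_count times."""
    counts = Counter(r[attribute] for r in records)
    return [r for r in records if counts[r[attribute]] > min_count]


def keep_top_k(records, attribute, k=10):
    """Keep records whose attribute value is among the k most frequent values."""
    counts = Counter(r[attribute] for r in records)
    top_values = {value for value, _ in counts.most_common(k)}
    return [r for r in records if r[attribute] in top_values]


def replace_rare(records, attribute, min_count=50, replacement="other"):
    """Keep every record, but merge rare values into one replacement label.

    The "other" label is an assumption made for this sketch.
    """
    counts = Counter(r[attribute] for r in records)
    return [
        r if counts[r[attribute]] > min_count else {**r, attribute: replacement}
        for r in records
    ]


# Hypothetical toy data: "a" occurs 5 times, "b" 3 times, "c" once.
records = [
    {"x": value, "feature": i}
    for i, value in enumerate(["a"] * 5 + ["b"] * 3 + ["c"])
]

frequent = keep_frequent(records, "x", min_count=2)  # drops the "c" record
top_one = keep_top_k(records, "x", k=1)              # keeps only "a" records
merged = replace_rare(records, "x", min_count=2)     # "c" becomes "other"
```

Dropping rare records shrinks the dataset, while merging them into a single label keeps every row but leaves one catch-all class; which of the two suits a classification task depends on whether the rare cases still need a prediction. When several values tie at the cut-off, `keep_top_k` keeps whichever of them was seen first in the data.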

 
          
