How to balance data based on an attribute value?

User222697 (Member, Posts: 17, Contributor II)
Hi,

I have a dataset with an attribute called "demo".

There are 3 possible values of the "demo" attribute:

"Alpha" with 4000 rows
"Beta" with 3000 rows
"Omega" with 2000 rows

How can I generate a new, balanced dataset with the same number of rows for each value?

"Alpha" with 2000 rows
"Beta" with 2000 rows
"Omega" with 2000 rows

Thanks

Answers

  • BalazsBarany (Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert; Posts: 949; Unicorn)
    Hi,

    Look at the tutorial process for the Sample operator (linked in the help text).

    It shows that the "balance data" option is used for this. You need to set the role of your demo attribute to "label", activate "balance data" in Sample, set the sampling method to absolute, and then enter the desired number of examples for each label value in the Edit List dialog.

    Regards,
    Balázs
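For readers who want the same effect outside RapidMiner, here is a minimal Python sketch of per-class downsampling to absolute counts, roughly what "balance data" with absolute sampling does. It is not RapidMiner's implementation: the function name balance_by_attribute, the list-of-dicts row format, the fixed seed and the "default to the smallest class" behaviour are all hypothetical choices made for illustration.

```python
import random
from collections import Counter, defaultdict


def balance_by_attribute(rows, attribute, per_class=None, seed=42):
    """Downsample rows so each value of `attribute` keeps a fixed number of rows.

    rows: list of dicts (hypothetical stand-in for the dataset).
    per_class: dict mapping attribute value -> desired row count, like the
    absolute counts entered in the Edit List dialog. If None, every value
    is reduced to the size of the smallest group (hypothetical default).
    """
    rng = random.Random(seed)  # fixed seed so the sample is reproducible

    # Group rows by the value of the balancing attribute (the "label").
    groups = defaultdict(list)
    for row in rows:
        groups[row[attribute]].append(row)

    if per_class is None:
        smallest = min(len(g) for g in groups.values())
        per_class = {value: smallest for value in groups}

    balanced = []
    for value, count in per_class.items():
        group = groups.get(value, [])
        if count > len(group):
            raise ValueError(f"only {len(group)} rows for {value!r}, asked for {count}")
        # Sample without replacement, so no row appears twice.
        balanced.extend(rng.sample(group, count))

    rng.shuffle(balanced)  # mix the classes instead of keeping them in blocks
    return balanced


# Data shaped like the question: 4000 Alpha, 3000 Beta, 2000 Omega rows.
data = ([{"demo": "Alpha", "id": i} for i in range(4000)]
        + [{"demo": "Beta", "id": i} for i in range(3000)]
        + [{"demo": "Omega", "id": i} for i in range(2000)])

result = balance_by_attribute(data, "demo", {"Alpha": 2000, "Beta": 2000, "Omega": 2000})
counts = Counter(row["demo"] for row in result)
print(dict(counts))
```

Each value ends up with 2000 rows (6000 in total). Asking for more rows than a class has raises an error here rather than oversampling; upsampling the smaller classes with replacement would be a different technique.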