"A question about data sampling"
Hi all,
I have an unbalanced dataset. One class has 500 times more examples than the other groups,
and I want to resample so that every group has the same number of examples.
How can I do that?
I tried the sampling techniques, but all of them just resample and preserve the ratio of examples between the groups.
Thank you in advance for your time and consideration.
Regards,
REZA
Answers
Which RapidMiner version do you use?
Greetings,
Sebastian
Version 4.6.
To clarify, I want to run this balanced sampling several times and average the performance results, to get the overall performance of this method.
Thanks,
Regards,
REZA
I think there are several possibilities you could use:
If you are going to use a learner that supports example weights, you could use the EqualLabelWeighting operator. This does not sample the examples; instead it equalizes the total weight assigned to each label. That might be even better, because no examples are lost at all.
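To illustrate the weighting idea outside RapidMiner, here is a minimal Python sketch. It is not the EqualLabelWeighting operator's actual implementation; the function name `equal_label_weights` and the normalization (weights sum to the number of examples) are assumptions made for the example. It only shows the principle that every label ends up with the same total weight while all examples are kept.

```python
from collections import Counter


def equal_label_weights(labels):
    """Return one weight per example so that every label's weights sum to the same total.

    Hypothetical helper for illustration; the normalization is an assumption.
    """
    counts = Counter(labels)
    n_labels = len(counts)
    total = len(labels)
    # Each label receives total / n_labels of the weight mass,
    # split evenly among the examples carrying that label.
    return [total / (n_labels * counts[y]) for y in labels]


# 500 examples of class "a" against a single example of class "b".
labels = ["a"] * 500 + ["b"]
weights = equal_label_weights(labels)
```

With these weights the single "b" example counts as much as all 500 "a" examples together, so a weight-aware learner sees both classes as equally important.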
Another possibility would be to split the example set into several subsets by label and sample each subset down to the same size. After this, all subsets would have to be merged and voilà: you have a balanced example set.
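The split, sample, and merge steps can be sketched in plain Python as follows. This is only an illustration under assumptions, not a RapidMiner process: `balanced_sample`, `label_of`, and `seed` are hypothetical names, and each group is sampled down to the size of the smallest group.

```python
import random
from collections import defaultdict


def balanced_sample(examples, label_of, seed=None):
    """Split examples by label, sample each group to the smallest group's size, then merge.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)

    # Step 1: split the example set by label.
    groups = defaultdict(list)
    for example in examples:
        groups[label_of(example)].append(example)

    # Step 2: sample every group (without replacement) down to the same size.
    size = min(len(group) for group in groups.values())
    balanced = []
    for group in groups.values():
        balanced.extend(rng.sample(group, size))

    # Step 3: merge and shuffle so the labels are not stored in blocks.
    rng.shuffle(balanced)
    return balanced


# Toy data: 500 "majority" examples against 5 "minority" examples.
examples = [("majority", i) for i in range(500)] + [("minority", i) for i in range(5)]
balanced = balanced_sample(examples, label_of=lambda ex: ex[0], seed=0)
```

Calling the function several times with different `seed` values gives different balanced samples, so the performance measured on each can be averaged afterwards.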
If this becomes unwieldy because you have too many label values, you might use the ValueIterator together with an IOStorer and IORetriever...
OK, that seems rather complex. Here's how it would work: I hope this helps you understand what I'm suggesting.
Greetings,
Sebastian