Filtering examples based on number of occurrences in an attribute
Hi,
Say I have examples that contain information about visits. Every visit is assigned to a visitor_id. I want to filter the examples (rows) where the visitor_id occurs more than 5 times, so that there will be no more than 4 rows for every visitor_id. I tried Filter Examples, but that was not helpful.
Any idea how to do this in RapidMiner?
Thanks.
Answers
Hi,
While I am pretty sure that the answer to this question will involve the operators "Aggregate", "Pivot", and "Filter Examples", I am unfortunately not sure I fully understood the problem. Can you give us a small data sample (original data) and show what the desired output for this sample should look like?
Merci,
Ingo
Hi... no, the Filter Examples operator alone is not going to help you here (as you saw). The way I see it, you first need to create an attribute that holds the number of occurrences per visitor_id, and then you can filter on it for n > 5 or whatever threshold you need. Personally I would use the Aggregate operator, grouping by visitor_id and aggregating visitor_id with the count function. Then join the result with your original data set on the visitor_id attribute.
Scott
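For readers who want to see the count, join, filter logic spelled out, here is a minimal sketch in plain Python. It is not a RapidMiner process, only an illustration of the same three steps. The sample data, the names visit_id and visit_count, the helper functions, and the threshold of 4 are all assumptions made for the example. Since the original question leaves the exact cutoff and direction open, adjust the comparison as needed.

```python
from collections import Counter

# Hypothetical sample data: visitor "a" has 6 visits, "b" has 2, "c" has 5.
visits = [
    {"visit_id": i, "visitor_id": v}
    for i, v in enumerate("aaaaaabbccccc", start=1)
]


def add_visit_counts(rows, key="visitor_id"):
    """Count rows per key (the aggregate step) and attach the count
    to every original row (the join step)."""
    counts = Counter(row[key] for row in rows)
    return [{**row, "visit_count": counts[row[key]]} for row in rows]


def filter_by_count(rows, max_count):
    """Keep only rows whose visitor appears at most max_count times."""
    return [row for row in rows if row["visit_count"] <= max_count]


with_counts = add_visit_counts(visits)
# Assumed threshold: keep visitors with no more than 4 rows.
kept = filter_by_count(with_counts, max_count=4)

for row in kept:
    print(row)
```

To keep only the frequent visitors instead, flip the comparison in filter_by_count to `>`.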