How to automatically aggregate the numerical values of every 10/20/30... columns
cindyliu_au
The original data has 300 attributes, from day1 to day300.
I need to create 3 datasets with generated features (each row is still one student, identified by id):
dataset1: feature generation: aggregate every 10 days, resulting in 30 attributes (day1-10, day11-20...)
dataset2: feature generation: aggregate every 20 days, resulting in 15 attributes (day1-20, day21-40...)
dataset3: feature generation: aggregate every 30 days, resulting in 10 attributes (day1-30, day31-60...)
I know I can use the Generate Attributes operator and manually select day1 to day10, then day11 to day20, and so on,
but how can I generate these aggregated features automatically?
Thank you!
Best Answer
BalazsBarany (Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert)
Hi!
Here's an automatic solution. It transposes a copy of the data, so you have day1-dayN in rows. Then it processes these rows in batches using Loop Batches. You just enter the number of elements per batch in the batch size parameter. I tested it with different values; it works with every setting >= 2.
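The process itself is only attached, not shown in the thread, so here is a rough plain-Python analogue of the same idea, for illustration only: list the dayX attribute names (the role the transposed copy plays), cut that list into consecutive batches, and aggregate each batch per student. The row-as-dict layout, the hypothetical helpers `day_attributes` and `aggregate_in_batches`, and the use of `sum` as the aggregation function are all assumptions; the question only says "aggregate".

```python
from typing import Callable, Dict, List, Sequence

Row = Dict[str, float]  # one example (student): attribute name -> value


def day_attributes(rows: Sequence[Row], prefix: str = "day") -> List[str]:
    """List the dayX attribute names in numeric order.

    This plays the role of the transposed copy: one entry per day attribute.
    """
    names = [n for n in rows[0] if n.startswith(prefix) and n[len(prefix):].isdigit()]
    return sorted(names, key=lambda n: int(n[len(prefix):]))


def aggregate_in_batches(
    rows: Sequence[Row],
    batch_size: int,
    agg: Callable[[List[float]], float] = sum,  # assumption: sum; could be mean, max, ...
    id_attr: str = "id",
) -> List[Row]:
    """Aggregate consecutive day attributes in batches of `batch_size` (hypothetical helper)."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    names = day_attributes(rows)
    # Consecutive, non-overlapping batches; the last one may be shorter.
    batches = [names[i:i + batch_size] for i in range(0, len(names), batch_size)]
    result: List[Row] = []
    for row in rows:
        new_row: Row = {id_attr: row[id_attr]}
        for batch in batches:
            # Name the new attribute after the first and last day, e.g. "day1-day10".
            new_row[f"{batch[0]}-{batch[-1]}"] = agg([row[n] for n in batch])
        result.append(new_row)
    return result


# Two made-up students with 300 day attributes each (illustrative data only).
students = [
    {"id": 1, **{f"day{d}": float(d) for d in range(1, 301)}},
    {"id": 2, **{f"day{d}": 1.0 for d in range(1, 301)}},
]
# dataset1/2/3 from the question: windows of 10, 20 and 30 days.
datasets = {size: aggregate_in_batches(students, size) for size in (10, 20, 30)}
```

With 300 days and a window of 7 (or 14), the days don't divide evenly: for 7, the last batch holds only the 6 days day295-day300. This sketch simply keeps that short final batch, which may or may not be what you want for the 7/14/21/28/35-day experiments.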
Inside each batch, the process generates a macro for selecting the dayX attributes, generates a name like day1-dayN, and runs Generate Aggregation with this regular-expression-based attribute filter.
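The exact macro expression used in the attached process isn't shown here, so the following is only a hypothetical sketch of how such a per-batch name and regular expression could be built. It uses an alternation of day numbers; the helpers `batch_filter` and `select` are illustrative names, not RapidMiner functions. Python's `re.fullmatch` requires the whole attribute name to match, so a batch containing day1 does not accidentally pick up day10 or day100.

```python
import re
from typing import List, Tuple


def batch_filter(first_day: int, last_day: int, prefix: str = "day") -> Tuple[str, str]:
    """Return (new attribute name, regex selecting exactly the days in the batch)."""
    days = "|".join(str(d) for d in range(first_day, last_day + 1))
    name = f"{prefix}{first_day}-{prefix}{last_day}"
    pattern = f"{re.escape(prefix)}({days})"
    return name, pattern


def select(attributes: List[str], pattern: str) -> List[str]:
    """Keep only attribute names that match the pattern in full."""
    return [a for a in attributes if re.fullmatch(pattern, a)]


attributes = [f"day{d}" for d in range(1, 301)]
name, pattern = batch_filter(1, 10)
print(name, pattern)               # day1-day10 day(1|2|3|4|5|6|7|8|9|10)
print(select(attributes, pattern))  # day1 ... day10, but not day100
```

If the regex were applied as a substring search instead of a full match, `day(1|...)` would also hit day100 to day199, which is why anchoring matters.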
Regards,
Balázs
Answers
I know it is not an optimal method, but you can use the Generate Aggregation operator and select subset as the attribute filter type.
In the attached file, you can find a process...
Hope this helps,
Regards,
Lionel
The approach you provided is a manual method, which I have already done.
I am looking for an automatic way because I could have 4800 attributes later on, and I would try windows of 10/20/30/40 days as well as 7/14/21/28/35 days. That would be a huge workload to do manually...
But thank you for your help anyway!
I'm hoping someone can give me some clues about an automatic way.
This is an awesome solution!!! It works very well!!!
Thank you so much!!!!
Btw, your solution uses the operators "recall" and "remember changes", which look very interesting! I'll have to learn what they are and how they work.
Thanks again, BalazsBarany
Regards,
Cindy