Rolling features (Rolling mean, max, min, sum, ...) would be nice

fryasdf Member Posts: 7 Contributor II
edited December 2018 in Product Feedback - Resolved

Whenever I do data science and try to predict a target variable, I almost certainly include the past of the target variable. For example, when predicting how long some process will take, I would almost always include 'how long does it usually take' as a feature or even as a baseline model. One can compute this 'how long does it usually take' in different ways. For example, for each distinct process one could just take the average over the whole training set. However, this can be a bad idea because the length of the process may depend on seasonality or other mechanisms in the training data. That is why I prefer rolling window functions, i.e.

rollingMean((15,17,12,11,19,25,27,30,28), 3) would be something like (14.66667, 13.33333, 14.00000, 18.33333, 23.66667, 27.33333, 28.33333, 29.00000), where the last value averages only the final two elements because no full window of 3 is left.
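To make the example concrete, here is a minimal Python sketch of such a rolling mean. The name `rolling_mean` is illustrative only and is not a RapidMiner function. It emits only full windows, so nine values with a window of 3 yield seven means, and a partial window at the end is not computed.

```python
def rolling_mean(values, window):
    """Mean of every full window of `window` consecutive values."""
    if window < 1:
        raise ValueError("window must be at least 1")
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


durations = [15, 17, 12, 11, 19, 25, 27, 30, 28]
means = rolling_mean(durations, 3)
print([round(m, 5) for m in means])
# [14.66667, 13.33333, 14.0, 18.33333, 23.66667, 27.33333, 28.33333]
```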

RM does not include this at all yet, although it is a rather common thing to do in data science.

4 votes

Fixed and Released

Moving Window operators now in Time Series

Comments

  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn

    Hi,

    this feature already exists in RapidMiner. If you install the free Series extension, there's a moving average operator that does exactly what you want. It aggregates over a window of fixed length and moves this window over the dataset. You can select the usual aggregation functions, so you can also compute the standard deviation of a window, which can also be helpful.
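As a rough illustration of aggregating over a moving window with a selectable function, here is a plain Python sketch. It is not the Series extension's implementation, and `rolling_apply` is a hypothetical helper name. Any function that reduces a list to a single value can be plugged in.

```python
from statistics import stdev


def rolling_apply(values, window, func):
    """Apply `func` to every full window of `window` consecutive values."""
    return [
        func(values[i:i + window])
        for i in range(len(values) - window + 1)
    ]


x = [15, 17, 12, 11, 19, 25, 27, 30, 28]
rolling_max = rolling_apply(x, 3, max)    # largest value per window
rolling_sd = rolling_apply(x, 3, stdev)   # sample standard deviation per window
print(rolling_max)
# [17, 17, 19, 25, 27, 30, 30]
```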

    Greetings,

    Sebastian

  • m_oke Member Posts: 11 Contributor I

    @land, may I ask a follow-up to @fryasdf's question on rolling features?

    Thank you for the moving average operator you pointed out. It does not solve all my problems though. I would still like to do the following:

    1) I want to be able to restart the calculation of the moving average for each value of an index, say id. Take for instance a window of 2. The operator currently does

    id   x   MovingAverage
    1    3   ?
    1    2   2.5
    1    4   3
    2    1   2.5
    2    3   2

    but what I want is:

    id   x   MovingAverage
    1    3   ?
    1    2   2.5
    1    4   3
    2    1   ?
    2    3   2

    2) I want to be able to tell the operator to use the corresponding x value wherever the moving average is missing. In my above example, I would want the final result to look like:

    id   x   MovingAverage
    1    3   3
    1    2   2.5
    1    4   3
    2    1   1
    2    3   2

    Can you help?

  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn

    Hi,

    sure, this is our everyday work...

    1) Put the current process into a Loop Groups operator of the Jackhammer Extension. Select id as the attribute in the Loop Groups operator, so that its subprocess handles all rows with the same id value in one pass. Then append the results again.

    2) Simply use a Generate Attributes operator afterwards, where you test with if(missing([average(x)]),x,[average(x)])
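The logic of these two steps can be sketched in plain Python. This is an illustration only, not RapidMiner code: `trailing_mean` is a hypothetical helper, the window of 2 and the sample rows come from the question above, and the rows are assumed to be already grouped by id.

```python
from itertools import groupby
from operator import itemgetter


def trailing_mean(values, window):
    """Trailing mean; None until `window` values are available."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)  # window not yet full
        else:
            out.append(sum(values[i + 1 - window:i + 1]) / window)
    return out


rows = [(1, 3), (1, 2), (1, 4), (2, 1), (2, 3)]  # (id, x) from the question

result = []
# Step 1: compute the moving average separately for each id.
# groupby only merges consecutive rows, hence the grouping assumption.
for key, group in groupby(rows, key=itemgetter(0)):
    xs = [x for _, x in group]
    for x, avg in zip(xs, trailing_mean(xs, 2)):
        # Step 2: fall back to x where the average is missing.
        result.append((key, x, x if avg is None else avg))

print(result)
# [(1, 3, 3), (1, 2, 2.5), (1, 4, 3.0), (2, 1, 1), (2, 3, 2.0)]
```

If the rows are not already grouped by id, sorting them by id first (with a stable sort, so the order within each id is preserved) makes the grouping assumption hold.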

    Hope that helps! If you have such problems more often, you might want to consider Old World Computing's support services ;-)

    Greetings,

    Sebastian

  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager

    Time Series Extension

  • m_oke Member Posts: 11 Contributor I

    Great news!
