Moving Average operator behavior / settings

kypexin (Moderator, RapidMiner Certified Analyst, Member; Posts: 290; Unicorn)
edited December 2018 in Product Feedback - Resolved

Hi, I have come across unexpected behavior of the Moving Average operator from the series extension.

By default, it creates a new attribute which is the result of moving average (or another chosen function) calculation. With settings like this, for example:

Screenshot 2018-08-23 10.51.57.png

it creates a new attribute named 'average(sum7)'.

The problem is that this default name prevents me from chaining the operator multiple times. For example, if I need to calculate both a 7-day and a 30-day moving average, this process won't work:

Screenshot 2018-08-23 10.51.00.png

because the second operator (Moving Average 30) tries to create a new attribute with exactly the same name as the one the first operator (Moving Average 7) has already created.

I either have to duplicate the attribute I am aggregating to get a copy under another name, or rename the new attribute after the first Moving Average. Not critical, but still one unnecessary step in the process.

Is there anything that prevents adding a setting that would let us choose the name of the created attribute, instead of a fixed default name that cannot be changed?

Thanks.


Fixed and Released

Resolved with 9.3. Please post new comments if needed.

Comments

  • tftemme (Administrator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, RMResearcher, Member; Posts: 164; RM Research)

    Hi @kypexin,

    Concerning the operator from the series extension, I unfortunately don't know whether we can add the functionality you propose; I suggest opening this for voting. But if you only want to calculate the moving average (so no other aggregation function), you may want to have a look at the Moving Average Filter operator from the new time series extension (which has been bundled with the core since 9.0). Not only can you apply the moving average (select the simple filter type) to multiple attributes at once, it also gives you a parameter with a default prefix for the new attributes.

    Best regards, and I hope this helps

    Fabian

  • kypexin (Moderator, RapidMiner Certified Analyst, Member; Posts: 290; Unicorn)

    Hi @tftemme,

    Thanks for answering! I didn't know this had become part of the RM core, though it seems that this filter has slightly fewer settings and possibilities than the operator from the extension. I will try to evaluate it, however.

  • sgenzer (Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator; Posts: 2,959; Community Manager)
  • tftemme (Administrator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, RMResearcher, Member; Posts: 164; RM Research)

    Hi @kypexin,

    Yes, the new time series operators are still in development and do not yet replace the old series extension. I would also love to hear what functionality you are missing. In fact, I am the one developing the new time series operators, and I enjoy getting feedback ;-)

    Best regards
    Fabian

  • kypexin (Moderator, RapidMiner Certified Analyst, Member; Posts: 290; Unicorn)

    Hi @tftemme,

    I would say the 'result position' setting from the old version of the operator is useful, because it gives flexibility for different business tasks.

    'Aggregation function' is interesting, but I haven't come across any use case where I would need anything other than the 'average' function.

    Lastly, I am concerned about the difference between the results produced by the two versions of the operator. I compared 30-day moving averages on the same dataset, and the results are different.

    For some reason this part of the forum does not allow posting photos (??), so I will send you a DM instead.

  • MartinLiebig (Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor; Posts: 3,404; RM Data Scientist)

    @kypexin,

    there are quite a few scenarios where I prefer mode over average, because it is robust against outliers.

    BR,

    Martin

    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
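
    As a rough illustration of the point about mode versus average, here is a minimal Python sketch. It is not how either RapidMiner operator is implemented; the `rolling` helper, the trailing window alignment, and the sample values are all assumptions made for this example.

```python
from statistics import mean, mode


def rolling(values, window_width, aggregate):
    """Apply `aggregate` to each full trailing window; None until the window is full."""
    result = [None] * len(values)
    for i in range(window_width - 1, len(values)):
        result[i] = aggregate(values[i - window_width + 1 : i + 1])
    return result


# Made-up data containing a single outlier (100).
series = [5, 5, 6, 5, 100, 5, 5]

rolling_mean = rolling(series, 5, mean)
rolling_mode = rolling(series, 5, mode)

# Every full window contains the outlier: the mean is pulled upwards,
# while the mode stays at the most frequent value.
print(rolling_mean[4:])
print(rolling_mode[4:])
```

    In this toy series the outlier dominates the windowed mean but leaves the windowed mode untouched, which is the effect described above.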
  • tftemme (Administrator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, RMResearcher, Member; Posts: 164; RM Research)

    Hi @kypexin,

    I will answer in this thread again so that others can read it too. Strange that you cannot post photos. @sgenzer, any idea?

    Yes, we could consider adding the result position and the aggregation function.

    Concerning the differences between the two operators: I think the parameters of the new operator may be a bit misleading. The filter size is not the same as the window width parameter; rather, the window size of the new operator is 2 * filter size + 1, to ensure that it is symmetric. With a symmetric filter you have a clearly defined middle position, and I put the result at this position. I think I should loosen this condition and change it. So the reason the "new" moving average looks flatter to you is simply that it uses a larger filter. When you compare, for example, a Moving Average (series) with a window width of 7 and a Moving Average Filter (time_series) with a filter size of 3, you see that the calculated values are the same. I also realized that the result of Moving Average (series) is not placed at the center, even if center is selected. That seems to be a bug too, so the results are shifted by one Example.
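
    To make the relationship between the two parameters concrete, here is a minimal Python sketch. It is not the operators' actual implementation: it assumes the centered filter averages 2 * filter size + 1 values around each position, and, purely for comparison, that the other average covers the last `window_width` values up to and including the current position. The function names and the sample values are made up.

```python
def centered_moving_average(values, filter_size):
    """Average over a symmetric window of 2 * filter_size + 1 values.

    Positions where the window would leave the series are set to None.
    """
    window = 2 * filter_size + 1
    result = [None] * len(values)
    for i in range(filter_size, len(values) - filter_size):
        chunk = values[i - filter_size : i + filter_size + 1]
        result[i] = sum(chunk) / window
    return result


def trailing_moving_average(values, window_width):
    """Average over the last `window_width` values, result placed at the window end."""
    result = [None] * len(values)
    for i in range(window_width - 1, len(values)):
        chunk = values[i - window_width + 1 : i + 1]
        result[i] = sum(chunk) / window_width
    return result


# Made-up sample data, only for illustration.
series = [3.0, 8.0, 1.0, 9.0, 4.0, 7.0, 2.0, 6.0, 5.0, 10.0]

centered = centered_moving_average(series, filter_size=3)  # window of 2*3 + 1 = 7
trailing = trailing_moving_average(series, window_width=7)

# Same numbers, different positions: the centered result at index i
# equals the trailing result at index i + filter_size.
for i in range(3, len(series) - 3):
    print(i, centered[i], trailing[i + 3])
```

    Under these assumptions a filter size of 3 and a window width of 7 produce identical values; only the position at which each value is stored differs.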

    The reason that the Moving Average Filter (time_series) does not reach the end of a series is that the values are calculated for the center position, and they are not defined at the beginning and end of a series (because there the window reaches beyond the range of the series).
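
    The edge effect can be sketched in a few lines of Python. This only illustrates the reasoning above, assuming a symmetric window of 2 * filter size + 1 values; the helper name is hypothetical.

```python
def defined_center_positions(series_length, filter_size):
    """Indices at which a centered window of 2 * filter_size + 1 values fits in the series."""
    return list(range(filter_size, series_length - filter_size))


# With 10 Examples and a filter size of 3, the first and last 3 positions
# have no complete window and therefore no value.
positions = defined_center_positions(series_length=10, filter_size=3)
print(positions)  # [3, 4, 5, 6]
```

    So a larger filter size leaves more undefined positions at both ends of the series.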

    Here is the process I used to compare both operators.







    <operator activated="true" class="process" compatibility="9.0.001" expanded="true" name="Process">

    <operator activated="true" class="retrieve" compatibility="9.0.001" expanded="true" height="68" name="Retrieve Lake Huron" width="90" x="112" y="85">

    <operator activated="true" class="time_series:moving_average_filter" compatibility="9.0.002-SNAPSHOT" expanded="true" height="68" name="Moving Average Filter" width="90" x="246" y="85">
    <parameter key="attribute_filter_type" value="single"/>

    <operator activated="true" class="series:moving_average" compatibility="7.4.000" expanded="true" height="82" name="Moving Average" width="90" x="380" y="85">












