How can future predictions be made with a Time Series model in RapidMiner?

luc_bartkowski, Member, Posts: 46, Maven
edited December 2018 in Help

I guess this topic is the most asked question regarding RapidMiner Time Series Prediction. Some examples:

We all ask the same question.

We want to be able to do predictions for tomorrow, next week(s), next month(s), whatever the horizon and the dimension of time is.

Some have even asked the same question multiple times in their topic/post as if the question is not clear.

The following picture therefore illustrates the question.

rmcomq.jpeg

How to:

  • Calculate the prediction on Oct 5 (black markup);
  • Using the "-0 attributes" from the Windowing operator (blue markup);
  • In order to predict (orange arrow) the unknown future Last value on Oct 5 (red markup);
  • In the same way the "-0 attributes" (brown markup) are used to calculate the predictions (yellow markup) in the train/validation/test example set;
  • But without being able to use the unknown future Last value (red markup) as a label (green markup)?
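The situation in the picture can be sketched outside RapidMiner. This is a minimal Python illustration, not the Windowing operator itself: the function name `make_windows`, the sample values, and the window and horizon sizes are all assumptions made for the example. It shows that the last window has features (the "-0 attributes") but no label, because the label lies in the future.

```python
def make_windows(values, window, horizon):
    """Build (features, label) rows from a series.

    features: `window` consecutive values ending at index `end`
    label:    the value `horizon` steps after `end`, or None when that
              position lies beyond the known data (the unknown future value)
    """
    rows = []
    for end in range(window - 1, len(values)):
        features = values[end - window + 1 : end + 1]
        target = end + horizon
        label = values[target] if target < len(values) else None
        rows.append((features, label))
    return rows


# Hypothetical "Last" prices, oldest first.
series = [50.1, 50.4, 49.8, 50.9, 51.3, 51.0, 51.6]
rows = make_windows(series, window=3, horizon=1)

train = [row for row in rows if row[1] is not None]   # usable for training
future = [row for row in rows if row[1] is None]      # features only, no label
```

The rows in `future` are exactly the ones a trained model would need to score in order to predict beyond the last date of the example set.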

The only answer with a possible solution is from @Thomas_Ott: http://community.www.turtlecreekpls.com/t5/Getting-Started-Forum/Time-Series-Forecasting-for-Data/m-p/37315. His answer links to an XML RM process in http://community.www.turtlecreekpls.com/t5/RapidMiner-Studio-Forum/Recall-Error/m-p/37302#U37302. That XML implements a complex process including manipulation of macros, multiple Windowing operators in series, Remember/Recall and Loop operators, and even a "Materialize Data" operator to free up memory in RapidMiner. The process is also based on the Yahoo Historical Data operator, which unfortunately doesn't work anymore. I'm therefore not even sure whether this process answers the question of this topic. Is there a simpler process/solution available to answer the question of this topic?

Thanks,

Luc

Best Answer

  • luc_bartkowski, Member, Posts: 46, Maven
    Solution Accepted

    Happy to do so Martin.

    To be honest: don't know much yet about ARIMA. Will watch some YouTube regarding ARIMA this weekend.

    But luckily RapidMiner offers an Optimize Parameters operator.
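    The idea behind a parameter optimization step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it tunes the window length of a simple moving-average forecaster rather than ARIMA orders, and the names `moving_average_forecast`, `holdout_error`, and `best_parameter` are hypothetical, not RapidMiner operators.

```python
def moving_average_forecast(history, k):
    """Forecast the next value as the mean of the last k known values."""
    return sum(history[-k:]) / k


def holdout_error(series, k, n_test):
    """Mean absolute one-step-ahead error over the last n_test points."""
    errors = []
    for t in range(len(series) - n_test, len(series)):
        prediction = moving_average_forecast(series[:t], k)
        errors.append(abs(series[t] - prediction))
    return sum(errors) / len(errors)


def best_parameter(series, candidates, n_test):
    """Pick the candidate with the lowest hold-out error."""
    return min(candidates, key=lambda k: holdout_error(series, k, n_test))


# Hypothetical trending series: the shortest window tracks the trend best.
series = [1, 2, 3, 4, 5, 6, 7, 8]
chosen = best_parameter(series, candidates=[1, 2, 3], n_test=3)
```

    An evolutionary search samples candidates instead of trying all of them, but the selection criterion (lowest error on held-out data) is the same kind of thing.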

    So @tftemme, this is the result:

    Oil Prediction.jpg

    And the model:

    OilPredictionModelARIMA.jpeg

    And the XML













    <parameter key="encoding" value="SYSTEM"/>






















    <operator activated="true" class="subprocess" compatibility="7.6.001" expanded="true" height="82" name="Get/Join Data" width="90" x="112" y="85">

    <operator activated="true" class="subprocess" compatibility="7.6.001" expanded="true" height="82" name="Oil Futures" width="90" x="313" y="34">






















































    <connect from_op="Read Database (2)" from_port="output" to_op="Select Attributes (2)" to_port="example set input"/>
    <connect from_op="Select Attributes (2)" from_port="example set output" to_op="Nominal to Date (8)" to_port="example set input"/>
    <connect from_op="Nominal to Date (8)" from_port="example set output" to_op="Rename (8)" to_port="example set input"/>
    <connect from_op="Rename (8)" from_port="example set output" to_port="out 1"/>





    <connect from_op="Oil Futures" from_port="out 1" to_port="out 1"/>















































    <operator activated="true" class="multiply" compatibility="7.6.001" expanded="true" height="124" name="Multiply (3)" width="90" x="112" y="544"/>
    <operator activated="true" class="subprocess" compatibility="7.6.001" expanded="true" height="103" name="ARIMA Predict Last" width="90" x="246" y="442">


































    Applying the ARIMA process to forecast the next 10 values of the time series

    <connect from_port="input 1" to_op="ARIMA Trainer" to_port="example set"/>
    <connect from_op="ARIMA Trainer" from_port="forecast model" to_op="Apply Forecast" to_port="forecast model"/>
    <connect from_op="ARIMA Trainer" from_port="performance" to_port="performance"/>
    <connect from_op="Apply Forecast" from_port="example set" to_port="result 1"/>
    <connect from_op="Apply Forecast" from_port="original" to_port="result 2"/>





















    Applying the ARIMA process to forecast the next 10 values of the time series

    <connect from_port="in 1" to_op="Optimize Parameters (Evolutionary)" to_port="input 1"/>
    <connect from_op="Optimize Parameters (Evolutionary)" from_port="performance" to_port="out 1"/>
    <connect from_op="Optimize Parameters (Evolutionary)" from_port="result 1" to_port="out 2"/>
    <connect from_op="ARIMA Trainer (6)" from_port="forecast model" to_op="Apply Forecast (6)" to_port="forecast model"/>











    <parameter key="oilLast and forecast" value="regular"/>


    <operator activated="true" class="subprocess" compatibility="7.6.001" expanded="true" height="103" name="ARIMA Predict High" width="90" x="246" y="595">


































    Applying the ARIMA process to forecast the next 10 values of the time series

    <connect from_port="input 1" to_op="ARIMA Trainer (4)" to_port="example set"/>
    <connect from_op="ARIMA Trainer (4)" from_port="forecast model" to_op="Apply Forecast (4)" to_port="forecast model"/>
    <connect from_op="ARIMA Trainer (4)" from_port="performance" to_port="performance"/>
    <connect from_op="Apply Forecast (4)" from_port="example set" to_port="result 1"/>
    <connect from_op="Apply Forecast (4)" from_port="original" to_port="result 2"/>





















    Applying the ARIMA process to forecast the next 10 values of the time series

    <connect from_port="in 1" to_op="Optimize Parameters (2)" to_port="input 1"/>
    <connect from_op="Optimize Parameters (2)" from_port="performance" to_port="out 1"/>
    <connect from_op="Optimize Parameters (2)" from_port="result 1" to_port="out 2"/>
    <connect from_op="ARIMA Trainer (7)" from_port="forecast model" to_op="Apply Forecast (7)" to_port="forecast model"/>














    <operator activated="true" class="subprocess" compatibility="7.6.001" expanded="true" height="103" name="ARIMA Predict Low" width="90" x="246" y="748">


































    Applying the ARIMA process to forecast the next 10 values of the time series

    <connect from_port="input 1" to_op="ARIMA Trainer (5)" to_port="example set"/>
    <connect from_op="ARIMA Trainer (5)" from_port="forecast model" to_op="Apply Forecast (5)" to_port="forecast model"/>
    <connect from_op="ARIMA Trainer (5)" from_port="performance" to_port="performance"/>
    <connect from_op="Apply Forecast (5)" from_port="example set" to_port="result 1"/>




















    Applying the ARIMA process to forecast the next 10 values of the time series

    <connect from_port="in 1" to_op="Optimize Parameters (3)" to_port="input 1"/>
    <connect from_op="Optimize Parameters (3)" from_port="performance" to_port="out 1"/>
    <connect from_op="Optimize Parameters (3)" from_port="result 1" to_port="out 2"/>
    <connect from_op="ARIMA Trainer (2)" from_port="forecast model" to_op="Apply Forecast (2)" to_port="forecast model"/>

























































































































































    <connect from_op="Get/Join Data" from_port="out 1" to_op="Sort" to_port="example set input"/>
    <connect from_op="Sort" from_port="example set output" to_op="Set Role" to_port="example set input"/>
    <connect from_op="Set Role" from_port="example set output" to_op="Select Attributes" to_port="example set input"/>
    <connect from_op="Select Attributes" from_port="example set output" to_op="Filter Start of Trend" to_port="example set input"/>
    <connect from_op="Filter Start of Trend" from_port="example set output" to_op="Train until Hold-off" to_port="example set input"/>
    <connect from_op="Train until Hold-off" from_port="example set output" to_op="Multiply (3)" to_port="input"/>
    <connect from_op="Multiply (3)" from_port="output 1" to_op="ARIMA Predict Last" to_port="in 1"/>
    <connect from_op="Multiply (3)" from_port="output 2" to_op="ARIMA Predict High" to_port="in 1"/>
    <connect from_op="Multiply (3)" from_port="output 3" to_op="ARIMA Predict Low" to_port="in 1"/>
    <connect from_op="ARIMA Predict Last" from_port="out 1" to_port="result 3"/>
    <connect from_op="ARIMA Predict Last" from_port="out 2" to_op="Set Role (3)" to_port="example set input"/>
    <connect from_op="Set Role (3)" from_port="example set output" to_op="Filter Graph Last" to_port="example set input"/>
    <connect from_op="ARIMA Predict High" from_port="out 1" to_port="result 1"/>
    <connect from_op="ARIMA Predict High" from_port="out 2" to_op="Set Role (4)" to_port="example set input"/>
    <connect from_op="Set Role (4)" from_port="example set output" to_op="Filter Graph High" to_port="example set input"/>
    <connect from_op="ARIMA Predict Low" from_port="out 1" to_port="result 2"/>
    <connect from_op="ARIMA Predict Low" from_port="out 2" to_op="Set Role (5)" to_port="example set input"/>
    <connect from_op="Set Role (5)" from_port="example set output" to_op="Filter Graph Low" to_port="example set input"/>
    <connect from_op="Filter Graph Last" from_port="example set output" to_op="Join" to_port="left"/>
    <connect from_op="Filter Graph High" from_port="example set output" to_op="Join" to_port="right"/>
    <connect from_op="Filter Graph Low" from_port="example set output" to_op="Join (2)" to_port="right"/>
    <connect from_op="Join" from_port="join" to_op="Join (2)" to_port="left"/>
    <connect from_op="Join (2)" from_port="join" to_op="Oil Forecast" to_port="example set input"/>
    <connect from_op="Oil Forecast" from_port="example set output" to_port="result 4"/>
    <connect from_op="Generate Report" from_port="through 1" to_op="Report" to_port="reportable in"/>






    Process Configuration (training example set, horizon, cycles ARIMA optimization, prediction date)
    <description align="center" color="green" colored="true" height="166" resized="true" width="558" x="83" y="212">Select Time Series Scope
    Get source data
    Generate Future Predictions
    Reporting




    Really love RapidMiner.

    Have a nice weekend.

    Greetings,

    Luc


Answers

  • sgenzer, Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator, Posts: 2,959, Community Manager

    Hello @luc_bartkowski - thanks for this. I agree that this is a very frequent use case and also agree that it could be easier. A quick spoiler is that the Time Series Extension is undergoing a complete rebuild (see blog post from 2 weeks ago by @tftemme). That said, I think we can help here, consolidate these threads, and maybe turn this into a sample for the new extension? :) If so, could you please post (repost?) that data set and we will work on this together.

    As for the Yahoo Historical Data issue, yes, we have talked about this a lot in this forum. Numerous people have posted alternative solutions (see my KB article about Alpha Vantage or posts about using Quandl). Meanwhile we are working on pushing out a more permanent, better solution.


    Scott

  • Telcontar120, Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member, Posts: 1,635, Unicorn

    Personally, @sgenzer, I am very much looking forward to the rebuilding of the time series extension and the addition of new operators to make things easier, or to fill in gaps in the current offering (R package "forecast", anyone?).

    But in the meantime, @luc_bartkowski, you may find that there is another sample process, which is heavily annotated, that might help you along your way. If you install the series extension, then when you open the "File>New Process" window of RapidMiner, you will be prompted with a series forecasting template, shown here (just scroll down until you see it). I think you will find it helpful.

    time series sample.PNG

    Brian T.
    Lindon Ventures
    Data Science Consulting from Certified RapidMiner Experts
  • luc_bartkowski, Member, Posts: 46, Maven

    Dear @Telcontar120,

    Thank you for your answer but the "Time Series Forecasting" template doesn't predict beyond the dates of the example set either.

    Greetings,

    Luc

  • luc_bartkowski, Member, Posts: 46, Maven

    Hello @sgenzer / Scott,

    I've managed to reverse engineer the "loop" solution of @Thomas_Ott and built it into my own Time Series Prediction process.

    I am "close", but still "no cigar". See the following pictures and the attached XML. The first picture shows my "standard" Time Series Forecasting train/validate/test process. The second picture zooms in on the Loop subprocess.

    These processes are based on the Quandl CME_CL1 Crude Oil Futures Continuous Contract 1 CL1 Front Month dataset.

    Please note that I added "oil" in front of every attribute name. So attribute Open of this dataset has been renamed oilOpen.

    The same for all other attributes: oilDate, oilHigh, oilLow, oilLast, etc.

    The Loop subprocess generates a number of future dates following the last date of the Test example set. That number is equal to the horizon. But for some reason the Loop subprocess doesn't generate a new prediction(label) for every new (future) date. It copies the prediction(label) from the Remember/Recall operators (the last row of the Test example set) and adds this value (as a constant) to every new future date.

    It is my understanding that Thomas' Loop subprocess implementation generates a new prediction(label) using the model and puts its value in the attribute "Close". In my humble opinion the attribute Close doesn't exist in Thomas' Loop subprocess; it should be Close-0. So I don't know whether the example process that I reused in my process is functioning properly either.
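    For reference, the general shape of such a loop (often called recursive or iterated forecasting) can be sketched in Python. This is not Thomas' process or any RapidMiner operator: `recursive_forecast` and the stand-in `drift_model` are hypothetical names, and the model is a deliberately trivial one. The point is the window update inside the loop; if the window is not updated with each new prediction, every iteration scores the same row and returns the same constant value.

```python
def recursive_forecast(history, predict, window, steps):
    """Forecast `steps` values ahead by feeding each prediction back in.

    history: known values, oldest first
    predict: callable mapping a list of `window` values to the next value
    """
    buffer = list(history[-window:])
    forecasts = []
    for _ in range(steps):
        next_value = predict(buffer)
        forecasts.append(next_value)
        # Slide the window: drop the oldest value, append the prediction.
        buffer = buffer[1:] + [next_value]
    return forecasts


def drift_model(window_values):
    """Hypothetical stand-in model: last value plus the mean step in the window."""
    step = (window_values[-1] - window_values[0]) / (len(window_values) - 1)
    return window_values[-1] + step


forecasts = recursive_forecast([1, 2, 3, 4], drift_model, window=3, steps=3)
```

    In RapidMiner terms, the window update would correspond to writing the new prediction into the most recent windowed attribute and shifting the older ones, which is an assumption about what the Replace step in the loop is meant to achieve.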

    Any help to get rid of this last flaw in my process model is appreciated.

    Thanks for the support.

    Luc

    oilTimeSeriesPrediction.jpeg

    oilTimeSeriesPrediction2.jpeg













    <parameter key="encoding" value="SYSTEM"/>













    <operator activated="true" class="subprocess" compatibility="7.6.001" expanded="true" height="82" name="Get/Join Data" width="90" x="112" y="85">

    <operator activated="true" class="subprocess" compatibility="7.6.001" expanded="true" height="82" name="Oil Futures" width="90" x="313" y="34">






















































    <connect from_op="Read Database (2)" from_port="output" to_op="Store (11)" to_port="input"/>
    <connect from_op="Retrieve (2)" from_port="output" to_op="Select Attributes (2)" to_port="example set input"/>
    <connect from_op="Select Attributes (2)" from_port="example set output" to_op="Nominal to Date (8)" to_port="example set input"/>
    <connect from_op="Nominal to Date (8)" from_port="example set output" to_op="Rename (8)" to_port="example set input"/>
    <connect from_op="Rename (8)" from_port="example set output" to_port="out 1"/>





    <connect from_op="Oil Futures" from_port="out 1" to_port="out 1"/>









    <operator activated="true" class="multiply" compatibility="7.6.001" expanded="true" height="103" name="Multiply" width="90" x="45" y="340"/>


















































    <operator activated="true" class="series:sliding_window_validation" compatibility="7.4.000" expanded="true" height="124" name="Validation" width="90" x="849" y="238">



































    <connect from_port="training" to_op="SVM" to_port="training set"/>
    <connect from_op="SVM" from_port="model" to_port="model"/>

    <portSpacing port="sink_model" spacing="0"/>














    <connect from_port="model" to_op="Apply Model" to_port="model"/>
    <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
    <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
    <connect from_op="Performance" from_port="performance" to_port="averagable 1"/>

















































    <operator activated="true" class="multiply" compatibility="7.6.001" expanded="true" height="103" name="Multiply (2)" width="90" x="715" y="442"/>










    Calculate the number of rows of the Windowed Test example set





    Set macro filter_range to the number of rows in the Test example set minus 1 (to obtain the last row of the Test example set)





    Obtain the last row in the Test example set






    Remember the last row of the Test example set, incl. the last date, to start the loop for predictions on future dates













    Recall the last row of the Test example set to define the structure of the example set that will be generated by the loop operator. It also defines the last Test date in order to generate new dates.










    Generate n future dates (one per loop iteration) adjacent to the last date of the Test example set. n = %{PredictionHorizon}















    Set the role of the prediction(label) to regular



    <parameter key="attribute" value="prediction(label)"/>










    Select the prediction(label)
















    Replace oilLast-0 value, using backreference to the previous operator "$1-", by the prediction(label) value




    Clean up memory to get a clean example set

    <connect from_port="input 1" to_op="Apply Model (3)" to_port="model"/>
    <connect from_op="Recall" from_port="result" to_op="Apply Model (3)" to_port="unlabelled data"/>
    <connect from_op="Apply Model (3)" from_port="labelled data" to_op="Generate Attributes" to_port="example set input"/>
    <connect from_op="Generate Attributes" from_port="example set output" to_op="Set Role (3)" to_port="example set input"/>
    <connect from_op="Set Role (3)" from_port="example set output" to_op="Select Attributes (3)" to_port="example set input"/>
    <connect from_op="Select Attributes (3)" from_port="example set output" to_op="Replace" to_port="example set input"/>
    <connect from_op="Replace" from_port="example set output" to_op="Materialize Data (2)" to_port="example set input"/>
    <connect from_op="Materialize Data (2)" from_port="example set output" to_port="output 1"/>





    Generate in each loop a new future date and apply model on that date





    Append each result from loop to the future prediction example set

    <connect from_op="Get/Join Data" from_port="out 1" to_op="Sort" to_port="example set input"/>
    <connect from_op="Sort" from_port="example set output" to_op="Multiply" to_port="input"/>
    <connect from_op="Multiply" from_port="output 1" to_op="Set Role" to_port="example set input"/>
    <connect from_op="Multiply" from_port="output 2" to_op="Set Role (2)" to_port="example set input"/>
    <connect from_op="Set Role" from_port="example set output" to_op="Select Attributes" to_port="example set input"/>
    <connect from_op="Select Attributes" from_port="example set output" to_op="Filter Start of Trend" to_port="example set input"/>
    <connect from_op="Filter Start of Trend" from_port="example set output" to_op="Train until Hold-off" to_port="example set input"/>
    <connect from_op="Train until Hold-off" from_port="example set output" to_op="Windowing" to_port="example set input"/>
    <connect from_op="Windowing" from_port="example set output" to_op="Validation" to_port="training"/>
    <connect from_op="Validation" from_port="model" to_op="Apply Model (2)" to_port="model"/>
    <connect from_op="Validation" from_port="averagable 1" to_port="result 1"/>
    <connect from_op="Set Role (2)" from_port="example set output" to_op="Select Attributes (4)" to_port="example set input"/>
    <connect from_op="Select Attributes (4)" from_port="example set output" to_op="Filter Hold-off to Test" to_port="example set input"/>
    <connect from_op="Filter Hold-off to Test" from_port="example set output" to_op="Windowing (2)" to_port="example set input"/>
    <connect from_op="Windowing (2)" from_port="example set output" to_op="Multiply (2)" to_port="input"/>
    <connect from_op="Multiply (2)" from_port="output 1" to_op="Apply Model (2)" to_port="unlabelled data"/>
    <connect from_op="Multiply (2)" from_port="output 2" to_op="Extract Macro" to_port="example set"/>
    <connect from_op="Apply Model (2)" from_port="labelled data" to_port="result 2"/>
    <connect from_op="Apply Model (2)" from_port="model" to_op="Loop" to_port="input 1"/>
    <connect from_op="Extract Macro" from_port="example set" to_op="Generate Macro" to_port="through 1"/>
    <connect from_op="Generate Macro" from_port="through 1" to_op="Filter Example Range" to_port="example set input"/>
    <connect from_op="Filter Example Range" from_port="example set output" to_op="Remember" to_port="store"/>
    <connect from_op="Loop" from_port="output 1" to_op="Append" to_port="example set 1"/>
    <connect from_op="Append" from_port="merged set" to_port="result 3"/>





    Process Configuration (training example set, horizon, window, holdoff example set)
    <description align="center" color="green" colored="true" height="185" resized="true" width="947" x="83" y="199">Train / Validate the Time Series Model
    Test the Time Series Model
    Get source data
    Generate Future Predictions



  • Thomas_Ott, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member, Posts: 1,761, Unicorn

    I went back a while after that original process was posted and fixed it because it wasn't generating the closing values per day correctly. I have to look for it on my other machine.

  • sgenzer, Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator, Posts: 2,959, Community Manager

    Hi @luc_bartkowski - OK, I spent some time looking at your process. Maybe I'm missing something, but where you are "testing" the model you are actually forecasting forward. The output of that Apply Model operator is showing you 10-day-forward predictions of oilLast. Right?

    Screen Shot 2017-10-04 at 10.38.30 AM.png





















    <operator activated="true" class="subprocess" compatibility="7.6.001" expanded="true" height="82" name="Get/Join Data" width="90" x="112" y="85">

    <operator activated="true" class="subprocess" compatibility="7.6.001" expanded="true" height="82" name="Oil Futures" width="90" x="313" y="34">


































    <connect from_op="Read Database (2)" from_port="output" to_op="Store (11)" to_port="input"/>
    <connect from_op="Retrieve CHRIS-CME_CL1" from_port="output" to_op="Select Attributes (2)" to_port="example set input"/>
    <connect from_op="Select Attributes (2)" from_port="example set output" to_op="Rename (8)" to_port="example set input"/>
    <connect from_op="Rename (8)" from_port="example set output" to_port="out 1"/>





    <connect from_op="Oil Futures" from_port="out 1" to_port="out 1"/>








    <operator activated="true" class="multiply" compatibility="7.6.001" expanded="true" height="103" name="Multiply" width="90" x="45" y="340"/>





























    <operator activated="true" class="series:sliding_window_validation" compatibility="7.4.000" expanded="true" height="124" name="Validation" width="90" x="849" y="238">





    <connect from_port="training" to_op="SVM" to_port="training set"/>
    <connect from_op="SVM" from_port="model" to_port="model"/>

    <portSpacing port="sink_model" spacing="0"/>











    <connect from_port="model" to_op="Apply Model" to_port="model"/>
    <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
    <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
    <connect from_op="Performance" from_port="performance" to_port="averagable 1"/>

































    <connect from_op="Get/Join Data" from_port="out 1" to_op="Sort" to_port="example set input"/>
    <connect from_op="Sort" from_port="example set output" to_op="Multiply" to_port="input"/>
    <connect from_op="Multiply" from_port="output 1" to_op="Set Role" to_port="example set input"/>
    <connect from_op="Multiply" from_port="output 2" to_op="Set Role (2)" to_port="example set input"/>
    <connect from_op="Set Role" from_port="example set output" to_op="Select Attributes" to_port="example set input"/>
    <connect from_op="Select Attributes" from_port="example set output" to_op="Filter Start of Trend" to_port="example set input"/>
    <connect from_op="Filter Start of Trend" from_port="example set output" to_op="Train until Hold-off" to_port="example set input"/>
    <connect from_op="Train until Hold-off" from_port="example set output" to_op="Windowing" to_port="example set input"/>
    <connect from_op="Windowing" from_port="example set output" to_op="Validation" to_port="training"/>
    <connect from_op="Validation" from_port="model" to_op="Apply Model (2)" to_port="model"/>
    <connect from_op="Validation" from_port="averagable 1" to_port="result 1"/>
    <connect from_op="Set Role (2)" from_port="example set output" to_op="Select Attributes (4)" to_port="example set input"/>
    <connect from_op="Select Attributes (4)" from_port="example set output" to_op="Windowing (2)" to_port="example set input"/>
    <connect from_op="Windowing (2)" from_port="example set output" to_op="Apply Model (2)" to_port="unlabelled data"/>
    <connect from_op="Apply Model (2)" from_port="labelled data" to_op="Sort (2)" to_port="example set input"/>
    <connect from_op="Sort (2)" from_port="example set output" to_port="result 2"/>




    Process Configuration (training example set, horizon, window, holdoff example set)
    <description align="center" color="green" colored="true" height="185" resized="true" width="947" x="83" y="199">Train / Validate the Time Series Model
    Get source data



    Scott

  • luc_bartkowski, Member, Posts: 46, Maven

    My model has a process parameter (top right) which sets the horizon.

    So I can play around with different horizon options.

    This horizon is used in the training/validation process, and also in the test process.

    I want to use the same horizon for future predictions.

    So yes, if the horizon is set to 10 then I want to forecast the Last value of Oct 8, taking into account that the last date in the training/validate/test example set is Sep 28.
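    The date arithmetic in that example can be checked with a short Python sketch. The helper name `forecast_dates` is hypothetical, the year 2017 is an assumption (taken from the screenshot date earlier in the thread), and the sketch counts calendar days; a trading calendar that skips weekends and holidays would give different dates.

```python
from datetime import date, timedelta


def forecast_dates(last_known, horizon):
    """Return the `horizon` calendar dates that follow `last_known`."""
    return [last_known + timedelta(days=step) for step in range(1, horizon + 1)]


# Assumed year; the thread only gives the day and month.
last_date = date(2017, 9, 28)
future_dates = forecast_dates(last_date, horizon=10)
target_date = future_dates[-1]   # the date a horizon-10 forecast refers to
```

    With a horizon of 10 and Sep 28 as the last known date, the final entry is Oct 8, matching the description above.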

    I suspect that the example model of Thomas is working only on horizon = 1. I therefore have altered my model.

    My altered model selects the last n values from the test example set and puts it in a "Loop Examples" subprocess.

    So the subprocess in "Loop Examples" gets the values to calculate the prediction(label) for future oilDates.

    In the "Loop Examples" subprocess I have also managed to shift the date (oilDate) N days ahead, where N is again the horizon.

    But then I'm stuck, don't know what to do / which operators to use, to get the desired predictions for future dates.

    Please find the altered model in the following XML.

















    Replace oilLast-0 value, using backreference to the previous operator "$1-", by the prediction(label) value


  • luc_bartkowski, Member, Posts: 46, Maven

    @Thomas_Ott

    Dear Thomas, I agree.

    I realized that myself as well. You solved another problem, independent of the process.

    But still, it was the only template for a solution. I was happy to find any template for a solution regarding the topic.

    Despite all the information, tutorials, templates, blogs and videos on the web regarding Time Series Forecasting with RapidMiner, it was your posts that pointed me to a possible solution, and I thank you for that. Again, you're doing a great job; I learned a lot from you, thank you.

    Best regards,

    Luc

  • Thomas_Ott, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member, Posts: 1,761, Unicorn

    @luc_bartkowski Thank you for your kind words. I have a bunch of time series processes that I should just organize and repost. They are super important because Time Series in RapidMiner is not very organized (as of yet), but the development team and Community have made progress.

  • luc_bartkowski, Member, Posts: 46, Maven

    Hi Scott,

    I know why you are able to predict until Oct 3rd in your picture. That is because you downloaded the Quandl source data yesterday on Oct 4th.

    Your source data includes values for oilOpen, etc. on Oct 3rd. That is the reason Oct 3rd, including a valuable prediction, is visible in your picture. But your picture doesn't show predictions beyond Oct 3rd, whatever your horizon is.

    I'm sorry but I therefore cannot hit the "Solved" button on this topic.

    I'm beginning to suspect that the problem, addressed in this topic, is:

    The "Apply model" operator (also) always needs a Label to calculate predictions.

    Because such Label is not available for Future dates, the "Apply model" operator will never be able to calculate predictions for future dates.

    To illustrate this conclusion, take another look at my first picture in this post. To generate this picture I added future dates with fake values ("0", i.e. zero) for all attributes beyond Sep 28, including zero values for the Labels (oilLast) on Sep 29 to Oct 5. The "Apply model" operator uses these future (fake) Labels to predict on these future dates. Therefore all predictions beyond Sep 28 have a value of 7.667, based on a Label with a "0" (zero) value for these future dates. As stated before: I suspect that "Apply model" always needs a valid Label in order to predict.
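    Another possible reading of the constant value can be sketched in Python, under loudly stated assumptions: the model below is a plain linear predictor standing in for the SVM, and the weights, intercept, and sample windows are hypothetical, not taken from the process. In this sketch the prediction function never receives a label; it only sees the windowed features. If the padded future rows make every window all zeros, each such row gets the same output (the intercept), which would also produce a constant column.

```python
def linear_predict(features, weights, intercept):
    """Score one windowed row; note that no label is passed in."""
    return intercept + sum(w * x for w, x in zip(weights, features))


# Hypothetical fitted parameters (illustrative only).
weights = [0.2, 0.3, 0.5]
intercept = 2.5

real_window = [50.0, 51.0, 52.0]      # built from observed prices
padded_windows = [[0, 0, 0]] * 3      # built entirely from fake zero rows

real_prediction = linear_predict(real_window, weights, intercept)
padded_predictions = [linear_predict(w, weights, intercept) for w in padded_windows]
```

    Whether this is what happens inside the process above would need to be checked against the windowed example set; the sketch only shows that constant predictions do not by themselves prove that a label is required.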

    Either that is the explanation of the problem addressed in this topic or my implementation of "Apply model" is not correct.

    If the latter is the case, please send as a reply an example model in XML that implements an "Apply model" operator that will predict beyond the scope of a source data set.

    Best regards,

    Luc

  • sgenzer, Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator, Posts: 2,959, Community Manager

    OK I spent some time on this. Let me know what you think.









    <operator activated="true" class="subprocess" compatibility="7.6.001" expanded="true" height="82" name="Get/Join Data" width="90" x="112" y="85">

    <operator activated="true" class="subprocess" compatibility="7.6.001" expanded="true" height="82" name="Oil Futures" width="90" x="313" y="34">

































    <connect from_op="Read Database (2)" from_port="output" to_op="Store (11)" to_port="input"/>
    <connect from_op="Retrieve CHRIS-CME_CL1" from_port="output" to_op="Select Attributes (2)" to_port="example set input"/>
    <connect from_op="Select Attributes (2)" from_port="example set output" to_op="Rename (8)" to_port="example set input"/>
    <connect from_op="Rename (8)" from_port="example set output" to_port="out 1"/>





    <connect from_op="Oil Futures" from_port="out 1" to_port="out 1"/>

























































    <operator activated="true" class="series:sliding_window_validation" compatibility="7.4.000" expanded="true" height="124" name="Validation" width="90" x="849" y="238">





    <connect from_port="training" to_op="SVM" to_port="training set"/>
    <connect from_op="SVM" from_port="model" to_port="model"/>

    <portSpacing port="sink_model" spacing="0"/>











    <connect from_port="model" to_op="Apply Model" to_port="model"/>
    <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
    <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
    <connect from_op="Performance" from_port="performance" to_port="averagable 1"/>






















































    <connect from_op="Get/Join Data" from_port="out 1" to_op="Sort" to_port="example set input"/>
    <connect from_op="Sort" from_port="example set output" to_op="Analyze From Date" to_port="through 1"/>
    <connect from_op="Analyze From Date" from_port="through 1" to_op="Training To Date" to_port="through 1"/>
    <connect from_op="Training To Date" from_port="through 1" to_op="Prediction Horizon" to_port="through 1"/>
    <connect from_op="Prediction Horizon" from_port="through 1" to_op="Filter Analysis Data" to_port="example set input"/>
    <connect from_op="Filter Analysis Data" from_port="example set output" to_op="Split Data" to_port="example set"/>
    <connect from_op="Split Data" from_port="partition 1" to_op="Set Role" to_port="example set input"/>
    <connect from_op="Split Data" from_port="partition 2" to_op="Set Role (3)" to_port="example set input"/>
    <connect from_op="Set Role" from_port="example set output" to_op="Select Attributes" to_port="example set input"/>
    <connect from_op="Select Attributes" from_port="example set output" to_op="Windowing" to_port="example set input"/>
    <connect from_op="Windowing" from_port="example set output" to_op="Rename by Replacing" to_port="example set input"/>
    <connect from_op="Rename by Replacing" from_port="example set output" to_op="Rename" to_port="example set input"/>
    <connect from_op="Rename" from_port="example set output" to_op="Validation" to_port="training"/>
    <connect from_op="Validation" from_port="model" to_op="Apply Model (3)" to_port="model"/>
    <connect from_op="Validation" from_port="averagable 1" to_port="result 1"/>
    <connect from_op="Set Role (3)" from_port="example set output" to_op="Select Attributes (3)" to_port="example set input"/>
    <connect from_op="Select Attributes (3)" from_port="example set output" to_op="Windowing (2)" to_port="example set input"/>
    <connect from_op="Windowing (2)" from_port="example set output" to_op="Rename by Replacing (2)" to_port="example set input"/>
    <connect from_op="Windowing (2)" from_port="original" to_op="Apply Model (2)" to_port="unlabelled data"/>
    <connect from_op="Rename by Replacing (2)" from_port="example set output" to_op="Rename (3)" to_port="example set input"/>
    <connect from_op="Rename (3)" from_port="example set output" to_op="Apply Model (3)" to_port="unlabelled data"/>
    <connect from_op="Apply Model (3)" from_port="labelled data" to_op="Performance (3)" to_port="labelled data"/>
    <connect from_op="Apply Model (3)" from_port="model" to_op="Apply Model (2)" to_port="model"/>
    <connect from_op="Apply Model (2)" from_port="labelled data" to_op="Join" to_port="right"/>
    <connect from_op="Performance (3)" from_port="performance" to_port="result 2"/>
    <connect from_op="Performance (3)" from_port="example set" to_op="Select Attributes (4)" to_port="example set input"/>
    <connect from_op="Select Attributes (4)" from_port="example set output" to_op="Join" to_port="left"/>
    <connect from_op="Join" from_port="join" to_op="Sort (2)" to_port="example set input"/>
    <connect from_op="Sort (2)" from_port="example set output" to_port="result 3"/>





    Process Configuration (training example set, horizon, window, holdoff example set)
    <description align="center" color="green" colored="true" height="205" resized="true" width="1074" x="83" y="199">Train / Validate the Time Series Model</description>
    Get source data
    Test Model
    Forecasting



    Scott

  • luc_bartkowski (Member, Posts: 46, Maven)

    I did some additional tests.

    The model that Thomas repaired regarding Remember/Recall was based on an implementation of shifting dates.

    I noticed that RapidMiner uses global implementations of Java variables. Macros aren't storage locations; they're pointers to global variables.

    Using that knowledge, I changed the dates elsewhere, before the Loop operator. These values won't change inside the Loop because they're global.

    I included the results in the next pictures. One can change the dates and shift them forwards, backwards, anywhere. The pictures are based on a horizon of 10 days. The Apply Model operator just uses these dates as an ID; I noticed that in my previous topic. But the Apply Model operator doesn't predict beyond the scope of the source data set, whatever the dates of those examples are or become within the process.

    That is because there is no future label within the scope of the source data set.

    preddatemin.jpeg: Shifting dates 10 days backwards, Sep 28 becomes Sep 18

    preddateplus.jpeg: Sept 28 becomes Oct 8. Example of a glo
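
The point about the missing future label can be illustrated outside RapidMiner. The following is a minimal Python sketch, not the Windowing operator's actual implementation: `make_windows`, `prices`, the window size and the horizon are hypothetical names and values chosen for illustration. It shows that the last `horizon` windows still have input attributes, so a trained model can score them, but they have no label because that value lies beyond the end of the data.

```python
def make_windows(series, window_size, horizon):
    """Split a series into labelled training rows and unlabelled future rows.

    Each row holds the `window_size` values ending at position `end`.
    The label is the value `horizon` steps after `end`; when that position
    lies beyond the data, the row has no label yet.
    """
    labelled, unlabelled = [], []
    for end in range(window_size - 1, len(series)):
        features = series[end - window_size + 1 : end + 1]
        target = end + horizon
        if target < len(series):
            labelled.append((features, series[target]))
        else:
            unlabelled.append((end, features))
    return labelled, unlabelled


# Made-up closing prices, one per trading day (illustrative only).
prices = [48.1, 48.6, 49.0, 49.4, 49.9, 50.3, 50.1, 50.6, 51.0, 51.4]

labelled, unlabelled = make_windows(prices, window_size=3, horizon=2)

print(len(labelled), "rows can be used for training")
print(len(unlabelled), "rows have attributes but no label yet")
```

Under these assumptions, a model trained on `labelled` could be applied to the rows in `unlabelled`; each resulting prediction refers to a date `horizon` steps after the last date in that row.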

  • luc_bartkowski (Member, Posts: 46, Maven)

    Well, the only thing I changed in this XML process is the source data, which now comes from my MySQL-based example sets.

    As you know, that source example set has values until Sep 28.

    These are the results. See the following pictures.

    The only result example set in this process is provided by the operator Sort (2).

    The scope of that prediction does not go beyond the source data set, which in my case ends on Sep 28.

    predictionssort(2).jpeg: Sort (2) result example set.

    predictionscotttest.jpeg

  • sgenzer (Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator; Posts: 2,959; Community Manager)

    Hello. Judging by your posts, you did run the process I built. The predictions for 10 days forward are there in that screenshot; they are just not in new rows. The column labeled "prediction(10 days forward)" holds the predicted value of oilLast 10 days AFTER the date listed in oilDate. For example, on September 15, prediction(10 days forward) = 50.077, so this is the prediction for oilLast 10 days after September 15. By my calculations, that is not Sept 25, because these prices are only listed 5 out of 7 days per week. Hence the model is saying that oilLast will be 50.077 on Sept 29, and so on:

    Oct 12: 52.473

    Oct 11: 52.258

    Oct 10: 52.336

    Oct 9: 51.892

    Oct 6: 50.454

    Oct 5: 50.648

    Oct 4: 49.682

    Oct 3: 49.479

    Oct 2: 49.606

    Sept 29: 50.077

    This is why you do not see values in the "10 days forward" column there - it has not happened yet in your data set. Yes, I could have spent some time moving all that around so that it actually looks like what I typed above... :)

    Scott
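
The date shift described in the post above can be checked with a few lines of Python. This is a sketch under two assumptions that are not stated in the thread: the year is taken to be 2017 (the weekday pattern of the listed dates fits that year), and exchange holidays are ignored. `add_trading_days` is a hypothetical helper name.

```python
from datetime import date, timedelta


def add_trading_days(start, days):
    """Return the date `days` weekdays after `start`.

    Weekends are skipped; exchange holidays are ignored (assumption).
    """
    current = start
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # Monday=0 ... Friday=4
            remaining -= 1
    return current


# The year 2017 is an assumption; the thread only gives day and month.
first_row = date(2017, 9, 15)
last_row = date(2017, 9, 28)

print(add_trading_days(first_row, 10))  # 2017-09-29
print(add_trading_days(last_row, 10))   # 2017-10-12
```

With a horizon of 10 trading days, the rows dated Sept 15 through Sept 28 therefore map to the target dates Sept 29 through Oct 12 listed above.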

  • luc_bartkowski (Member, Posts: 46, Maven)

    Solved with ARIMA Trainer & Apply Forecast.

    Thank you for your support.

  • MartinLiebig (Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor; Posts: 3,417; RM Data Scientist)

    Dear @luc_bartkowski,

    if you have any feedback on the ARIMA operators, please post it here with @tftemme in "CC". We are happy to receive any feedback on this extension, which is a work in progress.

    Cheers,

    Martin

    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
  • luc_bartkowski (Member, Posts: 46, Maven)
  • Thomas_Ott (RapidMiner Certified Analyst, RapidMiner Certified Expert, Member; Posts: 1,761; Unicorn)

    Very nice @luc_bartkowski!

  • luc_bartkowski (Member, Posts: 46, Maven)

    The nice thing about prediction operators like SVM and neural nets is that they are multivariate.

    In stock trading terms: the amplitude of the moving average and the trading volume are probably correlated.

    ARIMA is univariate, but it is the only operator able to predict a real future.

    What I am going to do to enable multivariate future predictions is this:

    Feed the multivariate prediction operator with real multivariate data and, alongside it, the related univariate predictions for each attribute, that is, the prediction output of an ARIMA model. I will train that model with real data. Yes, that means I have to wait until the future has become the past and I have obtained the labels to train on. Yes, I know, the resulting prediction will have a lag. The label data cannot be newer than now(). None of us has real multivariate data from the future. But one can optimize a prediction.

    What happens if the p, d, q parameters used in ARIMA change? Well, I guess the multivariate prediction operator will get improved data to train its model on: training data up to now() minus the horizon, with labels up to now(). It is, and always will be, the future we want to predict. We have to make a guess. We therefore ask ARIMA for a prediction; it's ARIMA's best guess. The multivariate prediction operator will train on it with a target label up to now(), that is, the forecast date minus the horizon.
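
The data layout described in this post can be sketched in Python. This is only an illustration under stated assumptions: a plain least-squares AR(1) fit stands in for the ARIMA model (it is not RapidMiner's ARIMA Trainer), and `fit_ar1`, `forecast_ar1`, `build_rows`, `last` and `volume` are hypothetical names with made-up values. Each row combines the observed attributes at day t with the univariate guess for day t + horizon; the label is filled in only where that future day is already in the data.

```python
def fit_ar1(series):
    """Least-squares fit of y[t] = c + phi * y[t-1]; a stand-in for ARIMA."""
    x, y = series[:-1], series[1:]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    var_x = sum((xi - mean_x) ** 2 for xi in x)  # must be non-zero
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    phi = cov_xy / var_x
    return mean_y - phi * mean_x, phi


def forecast_ar1(last_value, c, phi, steps):
    """Forecast `steps` values ahead by feeding each forecast back into the model."""
    value, out = last_value, []
    for _ in range(steps):
        value = c + phi * value
        out.append(value)
    return out


def build_rows(last, volume, horizon, min_history=3):
    """One row per day t: observed attributes, the univariate guess for
    t + horizon, and the real label if day t + horizon is already known."""
    rows = []
    for t in range(min_history - 1, len(last)):
        c, phi = fit_ar1(last[: t + 1])  # use only data known at day t
        guess = forecast_ar1(last[t], c, phi, horizon)[-1]
        label = last[t + horizon] if t + horizon < len(last) else None
        rows.append({"t": t, "last": last[t], "volume": volume[t],
                     "univariate_guess": guess, "label": label})
    return rows


# Made-up values, for illustration only.
last = [10.0, 6.0, 4.0, 3.0, 2.5, 2.25]
volume = [120, 95, 80, 70, 66, 64]

rows = build_rows(last, volume, horizon=2)
train = [r for r in rows if r["label"] is not None]  # labels exist up to now()
future = [r for r in rows if r["label"] is None]     # rows still waiting for a label

print(len(train), "training rows,", len(future), "future rows")
```

In this sketch a multivariate learner would be trained on `train` and applied to `future`. Changing the univariate model's parameters changes the `univariate_guess` column, which the learner then sees at the next retraining.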
