"Sliding Window Validation - What Model?"

B_Miner Member Posts: 72 Maven
edited May 2019 in Help
Hi All,

I will admit I am perplexed by the sliding window validation process (what it does and the parameters). In trying to understand it, the first question is what model is actually fit at the end? Is it the one using the most recent records (with the number of said records depending on the settings in the operator)?

Answers

  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi,
    do you mean: which data is used to fit the model that is delivered at the mod port?

    With kind regards,
    Sebastian Land
  • B_Miner Member Posts: 72 Maven
    Hi Sebastian,

    Yes, that is what I mean. What is that final model - is it fit using the last k records, where k is set in the parameters as the window?
  • dcubed Member Posts: 6 Contributor II
    Hi All,
    I had the same question and couldn't find an answer.
    What model is delivered at the mod port? If a model is returned, what is its value for future data?

    My understanding is that a new model is created and tested for each window. What we are really validating is how well the process of learning a model works, right? Thus, no single model returned at the port will be of value.

    I am clearly confused. Please help.

    Thank you
  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    dcubed wrote:

    "My understanding is that a new model is created and tested for each window. What we are really validating is how well the process of learning a model works, right?"
    Exactly.
    "Thus, no single model returned at the port will be of value."
    That is wrong: if anything is connected to the model output of the validation, then after the validation process described above, a model is trained on the complete data and returned at the model output port.

    Best, Marius
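
The behaviour described in this reply (score a fresh model on each window, then deliver one model trained on the complete data) can be sketched in plain Python. This is only an illustration and not RapidMiner's implementation: `fit_mean`, `sliding_window_validation`, the one-row test window, and the step size of one are all assumptions made for the example.

```python
from statistics import mean


def fit_mean(train):
    """Toy learner (assumption): the 'model' predicts the mean of its training data."""
    m = mean(train)
    return lambda: m


def sliding_window_validation(series, train_width, fit=fit_mean):
    """Score the learning procedure on sliding windows, then refit on all data.

    Returns (mean absolute error over all windows, final model).
    """
    if len(series) <= train_width:
        raise ValueError("need more rows than the training window width")

    errors = []
    # Each window trains on `train_width` consecutive values
    # and is tested on the single value that follows.
    for start in range(len(series) - train_width):
        train = series[start:start + train_width]
        actual = series[start + train_width]
        model = fit(train)  # per-window model, discarded after scoring
        errors.append(abs(model() - actual))

    # The delivered model is trained on the complete data,
    # not on any single training window.
    final_model = fit(series)
    return mean(errors), final_model


series = [float(i) for i in range(10)]
avg_error, final_model = sliding_window_validation(series, train_width=3)
```

The error estimate describes how well the learning procedure works on windows of the chosen width, while `final_model` is a separate fit that never took part in that estimate.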
  • dcubed Member Posts: 6 Contributor II

    MariusHelf wrote:

    "That is wrong: if anything is connected to the model output of the validation, then after the validation process described above, a model is trained on the complete data and returned at the model output port."
    So the model returned differs from all the prior models in that it is trained on all the data in the data set, not just the data in one of the prior training windows?

    Stated differently, if I have 1000 rows with a training window of 50 validated on the next row, I will have gone through about 950 models, each trained on 50 rows. The model returned, however, will be trained on all 1000 rows?

    If the reason I am training on 50 rows to predict the next is that the process generating the rows is not stationary, does it not follow that the final model trained on 1000 rows will be of little value in predicting the 1001st row?
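
To make this scenario concrete, the sketch below counts the windows for 1000 rows with a 50-row training window, then compares a model trained on all rows with one trained on only the most recent 50. Everything here is an assumption for illustration: the window advances one row at a time, each test set is the single next row, the series is a hypothetical steadily drifting one, and the learner is a toy that predicts the mean of its training data. A learner that models the trend explicitly could behave very differently.

```python
from statistics import mean

n_rows, train_width = 1000, 50

# Number of (train, test) windows when each test set is the single next row
# and the window advances one row at a time (assumed settings).
n_windows = n_rows - train_width

# Hypothetical non-stationary series: the level rises by 1 with every row.
series = [float(i) for i in range(n_rows)]
next_value = float(n_rows)  # the value row 1001 would take under this toy drift

# Toy learner: predict the mean of the training data.
pred_all = mean(series)                     # trained on all 1000 rows
pred_recent = mean(series[-train_width:])   # trained on the last 50 rows only

err_all = abs(pred_all - next_value)
err_recent = abs(pred_recent - next_value)
print(n_windows, err_all, err_recent)  # prints: 950 500.5 25.5
```

Under these assumptions the all-data model is far worse on the next row than the recent-window model, which is the concern raised in this post. How large the gap is in practice depends on the learner and on how the data actually drifts.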
  • haddock Member Posts: 849 Maven
    Hi Dcubed,

    I remember having exactly this exchange with Ingo a year or so ago right here; I was using SVMs to make short-term forecasts in foreign exchange markets, and optimised the look-back and prediction horizon sizes in a sliding window validation. The performance figures were fine, as you would expect, but I had to store the model at every iteration within the validation, just to get the last one. Yes, wasteful of course; yes, easily fixable. That's the wonder of open source!

    What I, like you, never worked out was the correct scenario for using a model built on all the examples when there is concept drift.

    Happy days!
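
The workaround mentioned in this post, keeping the model from each iteration so that the last one survives, might look like the following in plain Python. It is a sketch under assumptions, not the RapidMiner process that was actually used: `fit_mean` and `validate_keeping_last_model` are hypothetical names, the learner is a toy mean predictor, and the window advances one row at a time with a one-row test set.

```python
from statistics import mean


def fit_mean(train):
    """Toy learner (assumption): the 'model' predicts the mean of its training data."""
    m = mean(train)
    return lambda: m


def validate_keeping_last_model(series, train_width, fit=fit_mean):
    """Sliding-window validation that also returns the last window's model.

    Returns (mean absolute error over all windows, model from the final window).
    """
    if len(series) <= train_width:
        raise ValueError("need more rows than the training window width")

    errors = []
    last_model = None
    for start in range(len(series) - train_width):
        model = fit(series[start:start + train_width])
        errors.append(abs(model() - series[start + train_width]))
        # Overwrite on every iteration; only the final window's model is kept.
        last_model = model
    return mean(errors), last_model


series = [float(i) for i in range(10)]
avg_error, last_model = validate_keeping_last_model(series, train_width=3)

# For deployment one might instead refit on the newest rows, including the
# final row, which the last validation window only used as its test row.
deploy_model = fit_mean(series[-3:])
```

Note that the final window's model has never seen the last row of the data, since that row served as its test example; refitting on the most recent `train_width` rows is one way to include it.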