"time series attribute selection dimensionality problem"

wessel Member Posts: 537 Guru
edited May 2019 in Help
Dear All,

I wish to use CFSFeatureSetEvaluator to remove a lot of irrelevant attributes.
Because I have a dataset of more than 20 attributes, and I'm using MultivariateSeries2WindowExamples with a window size of 96,
I end up with roughly 20 * 96 = 1920 windowed attributes.

Problem:
CFSFeatureSetEvaluator cannot handle that many attributes.

Solution?
Apply CFS 20 times, once per group of windowed attributes that come from the same original attribute.
So, for example, first on all attributes whose names match attribute_one-.*
Then do this again for the attributes whose names match attribute_two-.*
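
The grouping step described above can be sketched as follows, in Python rather than RapidMiner XML. This is only an illustration under stated assumptions: windowed attributes are assumed to be named `<base>-<lag>` (for example `attribute_one-24`), `group_by_base` and `select_per_group` are hypothetical helper names, and the `selector` callable stands in for whatever subset evaluator is run on each group.

```python
import re
from collections import defaultdict

# Assumed naming scheme: "<base>-<lag>", e.g. "attribute_one-24".
WINDOWED = re.compile(r"^(?P<base>.+)-(?P<lag>\d+)$")

def group_by_base(names):
    """Map each base name to its windowed attributes, in input order."""
    groups = defaultdict(list)
    for name in names:
        match = WINDOWED.match(name)
        # Attributes without a lag suffix form a group of their own.
        base = match.group("base") if match else name
        groups[base].append(name)
    return dict(groups)

def select_per_group(names, selector):
    """Run `selector` once per group and concatenate what it keeps."""
    kept = []
    for group in group_by_base(names).values():
        kept.extend(selector(group))
    return kept

names = ["attribute_one-0", "attribute_one-1", "attribute_two-0", "attribute_two-1"]
groups = group_by_base(names)
# Stand-in selector: keep only the first attribute of each group.
kept = select_per_group(names, lambda group: group[:1])
```

Each call to the selector then only sees one group (96 attributes in the setting above) instead of all of them at once.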


I have been trying out different XML set-ups, but I don't want to post them just yet, because they might be confusing.


Thanks in advance,

Regards,

Wessel

Answers

  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi Wessel,
    that is probably a nice exercise in process building, but I doubt it will yield very good results. As far as I know, CFS works on correlations? Then attributes from neighbouring time points are probably highly correlated, and most of them will be removed.
    At the very least you should use WindowExamples2ModelingData to transform your data into relative changes instead of absolute values. This is always worth a try in series prediction.
    But I personally prefer using learning algorithms and XValidation to evaluate a feature subset instead of heuristics...

    In general you have to consider whether removing an attribute that reflects the value x days before is of much use. If day -6 is important, that same value is day -7 in the next example...


    Greetings,
    Sebastian
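
The point about correlated neighbours can be made concrete with the merit heuristic usually associated with correlation-based feature selection, k * mean|r_cf| / sqrt(k + k(k-1) * mean|r_ff|), where r_cf are feature-to-label correlations and r_ff are feature-to-feature correlations. The sketch below assumes that formula; whether CFSFeatureSetEvaluator computes exactly this is not confirmed in the thread, and `cfs_merit` is a hypothetical name.

```python
from math import sqrt

def cfs_merit(feature_class_corr, feature_feature_corr):
    """Subset merit: k * mean|r_cf| / sqrt(k + k*(k-1) * mean|r_ff|)."""
    k = len(feature_class_corr)
    if k == 0:
        return 0.0
    mean_cf = sum(abs(r) for r in feature_class_corr) / k
    if feature_feature_corr:
        mean_ff = sum(abs(r) for r in feature_feature_corr) / len(feature_feature_corr)
    else:
        mean_ff = 0.0  # a single feature has no pairwise correlations
    return k * mean_cf / sqrt(k + k * (k - 1) * mean_ff)

# One attribute that correlates 0.6 with the label.
single = cfs_merit([0.6], [])
# Add a neighbouring lag that is perfectly correlated with the first one.
redundant = cfs_merit([0.6, 0.6], [1.0])
# Add an equally useful attribute that is uncorrelated with the first one.
complementary = cfs_merit([0.6, 0.6], [0.0])
```

Under this formula a perfectly correlated neighbour leaves the merit unchanged, while an uncorrelated attribute with the same label correlation raises it, which is why near-duplicate lags tend not to be selected together.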
  • wessel Member Posts: 537 Guru
    All attributes from near time points, attribute_name-[1...24], are already removed, since I'm doing 24-hours-ahead prediction.

    CFS does yield good results when I do it by hand.
    Problem: I don't understand how to automate it in RapidMiner.

    WindowExamples2ModelingData, yes, good idea.
    But I want to try CFS first :(

    Regards,

    Wessel
  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi Wessel,
    I think this problem can be solved using the ForwardSelectionOperator, but I assume it's not included in 4.4 and will become part of the upcoming 4.5.
    So to solve this problem now, you could use AttributeSubsetPreprocessing. I will post a process below.

    <parameter key="label_attribute" value="att1"/>

    Greetings,
    Sebastian
  • wessel Member Posts: 537 Guru
    Hey Land,

    I got your parameter iteration to work when attributes are really nicely named,
    but I can't practically implement it on my problem.

    I uploaded my dataset here:
    http://student.science.uva.nl/~wluijben/workfile.csv

    I tried to make my xml file as nice as I possibly could.
    It makes a prediction for wind, using a very small window size of 49 hours, with 23 horizon attributes removed.
    If I want to make the window size bigger, attribute selection gets really really slow!:(

    Any suggestions?
    Maybe your previous derivative + smoothing to reduce the number of attributes?

    Regards,

    Wessel

    Current output:
    absolute_error: 4.095 +/- 3.096 (mikro: 4.095 +/- 3.096)

    Weights:
    wk1_kn-27 1.0
    wk1_kn-26 1.0
    wk1_kn-25 1.0
    wk1_kn-24 1.0
    wind-24 1.0
    dampdruk-48 1.0
    dampdruk-44 1.0
    gewasverdamping-46 1.0
    gewasverdamping-45 1.0
    gewasverdamping-44 1.0
    gewasverdamping-35 1.0
    systime_kn_week 1.0
    systime_kn_month 1.0




































































































  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi,
    the growing slowness probably results from an algorithm whose runtime is quadratic in the number of attributes... Unfortunately you cannot do anything about that besides buying a faster computer...

    Greetings,
    Sebastian
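
To give a feel for the quadratic growth mentioned above, here is a small worked count. It assumes the cost is dominated by computing one correlation per pair of attributes, which is an assumption about the evaluator and not something stated in the thread; the figures 20 and 96 come from the first post.

```python
def pair_count(n):
    """Number of distinct attribute pairs among n attributes."""
    return n * (n - 1) // 2

series, window = 20, 96  # figures from the first post

# All windowed attributes evaluated together.
all_at_once = pair_count(series * window)
# One evaluation per original attribute, 96 windowed attributes each.
per_group = series * pair_count(window)

print(all_at_once, per_group)  # 1842240 91200
```

Under that assumption, evaluating each group separately needs roughly 20 times fewer pairwise correlations, at the price of not seeing redundancy between attributes from different groups.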
  • wessel Member Posts: 537 Guru
    @ AttributeSubsetPreprocessing
    Ehm, or be smarter? :P
    Is there any way I can use AttributeSubsetPreprocessing to take the first n attributes?
    Or to split the attributes into n subsets?


    Can I save my attribute selection from the last run,
    so I don't have to run it every time?
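
Outside RapidMiner, the three things asked for here (first n attributes, n subsets, a saved selection) could look like the sketch below. It is plain Python, not a RapidMiner process; `split_into_subsets`, `save_selection`, `load_selection` and the file name `selection.json` are hypothetical, and the attribute names are illustrative.

```python
import json
import os
import tempfile

def split_into_subsets(names, n):
    """Split names into n contiguous subsets whose sizes differ by at most one."""
    size, extra = divmod(len(names), n)
    subsets, start = [], 0
    for i in range(n):
        end = start + size + (1 if i < extra else 0)
        subsets.append(names[start:end])
        start = end
    return subsets

def save_selection(path, names):
    """Store the selected attribute names so the selection need not be rerun."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(names, handle)

def load_selection(path):
    """Read back a previously stored list of attribute names."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)

names = [f"wind-{lag}" for lag in range(24, 34)]  # 10 illustrative names
first_n = names[:4]                               # the first n attributes
subsets = split_into_subsets(names, 3)            # subset sizes 4, 3, 3

path = os.path.join(tempfile.mkdtemp(), "selection.json")
save_selection(path, first_n)
restored = load_selection(path)
```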
  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi,
    hmm, it is not designed to do this, but you might use regular expressions for specifying the attributes. Perhaps this already suits your needs?

    Hmm, you could save the resulting example set and later on merge all of the sets. Just use the ExampleSetWriter inside the loop. If you add the predefined macro %{a} to the filename, it will be replaced with the number of times the current operator has been applied. That way you avoid overwriting previous results in loops.

    Greetings,
    Sebastian
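
The write-per-iteration-then-merge idea can be sketched in Python as follows. This only mimics the effect of putting a counter such as %{a} into the filename; the file pattern `selected_<n>.csv` and the helper names are hypothetical, and the merge assumes every file holds the same examples in the same order.

```python
import csv
import os
import tempfile

def write_iteration(directory, iteration, rows):
    """Write one loop iteration's rows to its own file, e.g. selected_3.csv."""
    path = os.path.join(directory, f"selected_{iteration}.csv")
    with open(path, "w", newline="", encoding="utf-8") as handle:
        csv.writer(handle).writerows(rows)
    return path

def merge_iterations(paths):
    """Join the per-iteration files column-wise, row by row."""
    tables = []
    for path in paths:
        with open(path, newline="", encoding="utf-8") as handle:
            tables.append(list(csv.reader(handle)))
    # zip pairs up the i-th row of every file; sum concatenates their columns.
    return [sum(parts, []) for parts in zip(*tables)]

directory = tempfile.mkdtemp()
paths = [
    write_iteration(directory, 1, [["wind-24"], [1], [2]]),
    write_iteration(directory, 2, [["dampdruk-48"], [3], [4]]),
]
merged = merge_iterations(paths)
```

Because the iteration number is part of each filename, a later iteration never overwrites an earlier result. Note that the csv module reads values back as strings.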