"time series attribute selection dimensionality problem"

Wessel · June 2009

Dear All,

I wish to use CFSFeatureSetEvaluator to remove a lot of irrelevant attributes.
Because I have a dataset of more than 20 attributes, and I'm using MultivariateSeries2WindowExamples with window size 96,
I end up with 20 * 96 windowed attributes.

problem:
CFSFeatureSetEvaluator cannot handle so many attributes.

solution?
Apply CFS 20 times, once per original attribute, each time on all windowed attributes derived from it.
So for example first on all attributes with name attribute_one-.*
Then do this again for attributes with name attribute_two-.*, and so on.
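A minimal sketch of the grouping step, written in plain Python rather than as a RapidMiner process. It assumes the windowed attributes follow the `name-lag` pattern used in this thread (for example `wind-24`); the function name `group_windowed_attributes` is made up for illustration.

```python
import re
from collections import defaultdict

def group_windowed_attributes(names):
    """Group names such as 'wind-24' by the part before the trailing '-<lag>'."""
    pattern = re.compile(r"^(?P<base>.+)-(?P<lag>\d+)$")
    groups = defaultdict(list)
    for name in names:
        match = pattern.match(name)
        # Names without a numeric lag suffix form their own group.
        base = match.group("base") if match else name
        groups[base].append(name)
    return dict(groups)

names = ["attribute_one-0", "attribute_one-1",
         "attribute_two-0", "attribute_two-1", "label"]
for base, members in group_windowed_attributes(names).items():
    # A per-group selection step (CFS or anything else) would run here.
    print(base, members)
```

Each group could then be passed to the selection step on its own, so no single run sees all 20 * 96 attributes.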

I have been trying out different XML set-ups, but I don't want to post them just yet, because it might be confusing.

Thanks in advance,

Regards,

Wessel

land · June 2009

Hi Wessel,
building that process is probably a nice exercise, but I doubt it will yield very good results. As far as I know, CFS works on correlations? If so, attributes from neighbouring time points are probably highly correlated, and most of them will be removed.
At the very least you should use WindowExamples2ModelingData to transform your data into relative changes instead of absolute values. This is always worth a try in series prediction.
Personally, though, I prefer using learning algorithms and XValidation to evaluate a feature subset rather than heuristics...
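To illustrate the point about correlated neighbours, here is a rough sketch of the usual CFS merit heuristic, k * r_cf / sqrt(k + k(k-1) * r_ff), in plain Python. It uses absolute Pearson correlation as a stand-in; the measure the actual operator uses may differ, and the helper names are only for this example.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equally long numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def cfs_merit(features, label):
    """Merit of a feature subset: rewards correlation with the label,
    penalises correlation among the features themselves."""
    k = len(features)
    r_cf = sum(abs(pearson(f, label)) for f in features) / k
    pairs = [(i, j) for i in range(k) for j in range(i + 1, k)]
    r_ff = (sum(abs(pearson(features[i], features[j])) for i, j in pairs)
            / len(pairs)) if pairs else 0.0
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)

label = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
lag_a = [1.0, 2.5, 2.9, 4.2, 5.1, 5.8]
print(cfs_merit([lag_a], label))
# Adding an exact copy of the same feature does not raise the merit.
print(cfs_merit([lag_a, lag_a], label))
```

With a near-duplicate neighbouring lag the feature-feature term is close to 1, so the merit barely changes and the heuristic has little reason to keep both.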

In general you have to consider whether removing an attribute that reflects the value x days before is of much use. If day -6 is important, that same value is day -7 in the next example...
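Both remarks can be seen in a small sliding-window sketch in plain Python. The functions `make_windows` and `to_relative` are hypothetical helpers, and taking the last window value as the base for the relative changes is an assumption about what the operator does, not a description of its implementation.

```python
def make_windows(series, window_size, horizon):
    """Build (window, label) pairs; the label lies `horizon` steps after the window."""
    examples = []
    last_start = len(series) - window_size - horizon
    for start in range(last_start + 1):
        window = series[start:start + window_size]
        label = series[start + window_size + horizon - 1]
        examples.append((window, label))
    return examples

def to_relative(window, label):
    """Express window values and label relative to the last window value (assumed base)."""
    base = window[-1]
    return [value - base for value in window], label - base

series = [10, 12, 11, 15, 14, 18, 17]
examples = make_windows(series, window_size=3, horizon=1)

# The value in position 1 of one window sits in position 0 of the next one.
print(examples[0][0], examples[1][0])

# Relative form of the first example.
print(to_relative(*examples[0]))
```

Because every value slides one position per example, dropping a single lag only hides that value for one example; it reappears under the neighbouring lag in the next.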

Greetings,
Sebastian

Wessel · June 2009

All attributes from near time points, attribute_name-[1...24], are already removed, since I'm doing 24-hour-ahead prediction.

CFS does yield good results when I do it by hand.
Problem: I don't understand how to automate it in RapidMiner.

WindowExamples2ModelingData: yes, good idea.
But I want to try CFS first.

Regards,

Wessel

land · June 2009

Hi Wessel,
I think this problem can be solved using the ForwardSelectionOperator, but I assume it's not included in 4.4 and will become part of the upcoming 4.5.
To solve this problem in the meantime, you could use the AttributeSubsetPreprocessing. I will post a process below.









<parameter key="label_attribute" value="att1"/>

Greetings,
Sebastian

Wessel · July 2009

Hey Land,

I got your parameter iteration to work when the attributes are really nicely named,
but I can't practically apply it to my problem.

I uploaded my dataset here:
http://student.science.uva.nl/~wluijben/workfile.csv

I tried to make my xml file as nice as I possibly could.
It makes a prediction for wind, using a very small window size of 49 hours, with 23 horizon attributes removed.
If I make the window size bigger, attribute selection gets really slow!

Any suggestions?
Maybe your earlier derivative + smoothing idea, to reduce the number of attributes?

Regards,

Wessel

Current output:
absolute_error: 4.095 +/- 3.096 (mikro: 4.095 +/- 3.096)

Weights:
wk1_kn-27 1.0
wk1_kn-26 1.0
wk1_kn-25 1.0
wk1_kn-24 1.0
wind-24 1.0
dampdruk-48 1.0
dampdruk-44 1.0
gewasverdamping-46 1.0
gewasverdamping-45 1.0
gewasverdamping-44 1.0
gewasverdamping-35 1.0
systime_kn_week 1.0
systime_kn_month 1.0

land · July 2009

Hi,
the growing slowness probably results from an algorithm whose runtime is quadratic in the number of attributes. Unfortunately you cannot do anything about that besides buying a faster computer...
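As a rough worked example, if the selection has to look at every pair of attributes, the number of pairs is n(n-1)/2. The sketch below uses the figure of 20 series from the first post as an assumption and ignores the removed horizon attributes.

```python
def pair_count(n_attributes):
    """Number of distinct attribute pairs, n * (n - 1) / 2."""
    return n_attributes * (n_attributes - 1) // 2

n_series = 20  # assumed, taken from the first post
for window_size in (49, 96):
    n = n_series * window_size
    print(window_size, n, pair_count(n))
```

Roughly doubling the window size from 49 to 96 therefore almost quadruples the number of pairs, from 479,710 to 1,842,240.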

Greetings,
Sebastian

Wessel · July 2009

@ AttributeSubsetPreprocessing
Ehm, or be smarter? :P
Is there any way I can use AttributeSubsetPreprocessing to take the first n attributes?
Or to split the attributes into n subsets?
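Outside RapidMiner, the two operations asked about here could be sketched as follows in plain Python, working on a list of attribute names. Both helper names are hypothetical and this says nothing about what AttributeSubsetPreprocessing itself supports.

```python
def first_n(names, n):
    """Return the first n attribute names."""
    return names[:n]

def split_into_subsets(names, n_subsets):
    """Split names into n_subsets contiguous parts of nearly equal size."""
    size, remainder = divmod(len(names), n_subsets)
    subsets, start = [], 0
    for index in range(n_subsets):
        # The first `remainder` subsets get one extra name each.
        end = start + size + (1 if index < remainder else 0)
        subsets.append(names[start:end])
        start = end
    return subsets

names = ["wind-24", "wind-25", "wind-26", "dampdruk-44",
         "dampdruk-45", "dampdruk-46", "dampdruk-47"]
print(first_n(names, 3))
print(split_into_subsets(names, 3))
```

Each subset could then be handed to a separate selection run and the surviving names combined afterwards.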

Can I save my attribute selection from the last go?
So I don't have to run it every time?

land · July 2009

Hi,
hmm, it is not designed to do this, but you might use regular expressions for specifying the attributes. Perhaps this already suits your needs?

Hmm, you could save the resulting example set and later on merge all of the sets. Just use the ExampleSetWriter inside the loop. If you add the predefined macro %{a} to the filename, it will be replaced with the number of times the current operator has been applied. That way you avoid overwriting previous results in loops.
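The same save-then-merge idea, sketched in plain Python rather than with RapidMiner operators: each loop iteration writes its result to a file whose name contains the iteration counter, and the files are merged afterwards. The file naming scheme and both helpers are assumptions made for this example.

```python
import csv
import tempfile
from pathlib import Path

def save_selection(directory, iteration, selected):
    """Write one iteration's selected attribute names to its own file."""
    # Zero-padded counter in the filename, so earlier results are never overwritten.
    path = Path(directory) / f"selection_{iteration:03d}.csv"
    with path.open("w", newline="") as handle:
        csv.writer(handle).writerows([name] for name in selected)
    return path

def merge_selections(directory):
    """Read all per-iteration files back, in iteration order."""
    merged = []
    for path in sorted(Path(directory).glob("selection_*.csv")):
        with path.open(newline="") as handle:
            merged.extend(row[0] for row in csv.reader(handle) if row)
    return merged

with tempfile.TemporaryDirectory() as directory:
    save_selection(directory, 1, ["wk1_kn-27", "wk1_kn-26"])
    save_selection(directory, 2, ["wind-24"])
    print(merge_selections(directory))
```

Keeping the files around also means the selection from the last run can be reloaded instead of recomputed.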

Greetings,
Sebastian
