Feature Selection - Backward X Val

B_Miner Member Posts: 72 Maven
edited June 2019 in Help
Hi guys-

I am running a feature selection. I included the direct-mail generated data set so the process is replicable.

As I have it configured, I was under the impression that the backward algorithm of FS should:

1) start with all 'p' predictors, and use 10-fold cross-validation to get an accuracy figure.
2) drop the least important predictor and use 10-fold cross-validation to get an accuracy figure for the p-1 predictors.
3) continue until down to 1 predictor, or stop early if a stopping criterion is reached ('limit generations without improvement' is checked).
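For reference, here is what that expected behaviour looks like outside RapidMiner: a minimal Python sketch of backward elimination wrapped around 10-fold cross-validation, with scikit-learn standing in for RapidMiner's XValidation (the function names and generated data are illustrative, not RapidMiner's API):

```python
# Backward elimination sketch: start with all p predictors and, at each step,
# drop the feature whose removal hurts 10-fold CV accuracy the least.
# Illustrative only -- scikit-learn stands in for RapidMiner's XValidation.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def backward_elimination(X, y, min_features=1, folds=10):
    model = LogisticRegression(max_iter=1000)
    remaining = list(range(X.shape[1]))
    history = []
    while len(remaining) >= min_features:
        # score the current subset with 10-fold cross-validation
        score = cross_val_score(model, X[:, remaining], y, cv=folds).mean()
        history.append((tuple(remaining), score))
        if len(remaining) == min_features:
            break
        # try dropping each feature in turn; keep the best (p-1)-sized subset
        candidates = [
            (cross_val_score(model, X[:, [f for f in remaining if f != d]],
                             y, cv=folds).mean(), d)
            for d in remaining
        ]
        best_score, drop = max(candidates)
        remaining.remove(drop)
    return history

X, y = make_classification(n_samples=200, n_features=8, random_state=0)
for subset, score in backward_elimination(X, y, min_features=6):
    print(len(subset), round(score, 3))
```

Each `history` entry is one generation: the surviving subset and its cross-validated accuracy, which is the plot the process log was expected to show.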

Looking at the process log, this is not the case. It also seems that not all 10 folds are being run.

Thanks!

Process description from the attached sample (cleaned up from the XML):

Transformations of the attribute space may ease learning, so that simple learning schemes become able to learn complex functions. This is the basic idea of the kernel trick, but even without kernel-based learning schemes, transforming the feature space may be necessary to reach good learning results. RapidMiner offers several different feature selection, construction, and extraction methods. This selection process (the well-known forward selection) uses an inner cross-validation for performance estimation; that building block serves as the fitness evaluation for all candidate feature sets. Since the performance of a certain learning scheme is taken into account, processes of this type are called "wrapper approaches".

Additionally, the process log operator plots intermediate results. You can inspect them online in the Results tab. Please refer to the visualization sample processes or the RapidMiner tutorial for further details.

Try the following:

1) Start the process and change to the "Result" view. A plot can be selected there; plot the "performance" against the "generation" of the feature selection operator.
2) Select the feature selection operator in the tree view. Change the search direction from forward (forward selection) to backward (backward elimination) and restart the process. All features will be selected.
3) Select the feature selection operator, right-click to open the context menu, and replace the operator with another feature selection scheme (for example a genetic algorithm).
4) Have a look at the parameter list of the process log operator: every time it is applied, it collects the specified data. Please refer to the RapidMiner tutorial for further explanations. After changing the feature selection operator to the genetic algorithm approach, you have to specify the correct values. (Use the process log operator to log values online.)

Answers

  • B_Miner Member Posts: 72 Maven
    OK, I think I found that I was using a deprecated operator (FS)? From these posts it is hard to know (I copied one I found).

    So... here is the new code. The log operator tracking the inner process (there are 10 performance metrics for each of the 10 folds per generation) seems to work OK, but the outer one, which I was hoping would track the average of the x-fold validation, gives multiple performance metrics per generation (and a declining number of them). Shouldn't there be just one metric - the average - output for each generation?

    Also - do all of these feature selection processes produce a 1 or 0 (where 1 means retain)? Or is there a way to rank the attributes?

    Finally - can you feed the results of feature selection into a model and have only the important (I guess weight = 1) attributes used?

    Thanks!

  • B_Miner Member Posts: 72 Maven
    Hi all-

    Just curious if anyone has insight on this. Thanks!!
  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi,
    it seems I have somehow overlooked your questions a few times. I guess I read them and then forgot that I hadn't answered yet. But now here's my answer:

    I would suggest the new backward elimination and forward selection operators for this purpose. They are faster, consume less memory, and are much more stable. Last but not least, they offer better stopping criteria.

    Greetings,
    Sebastian
  • cherokee Member Posts: 82 Guru
    Hi B_Miner,

    just a bit to your direct questions:

    Multiple (but decreasing) performance vectors: the output of your outer log is not the average of a generation. It is the average of the 10-fold cross-validation -- the average for one individual. Assume you have 10 features. In the first step, backward elimination must create 10 feature combinations (each time leaving one out). Each of these combinations must be tested (run through the XVal), so you get 10 averages. The operator then chooses the best combination. In the next step it has to test 9 combinations (each time leaving one of the remaining 9 features out), and so on. This way you see multiple averages (one for every candidate feature combination), but in decreasing numbers (as there are fewer possible combinations over time).
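    This counting argument can be sketched in a few lines (purely illustrative; the function names are not RapidMiner's API):

```python
# Number of candidate subsets (and thus CV averages in the outer log) tested
# at each backward-elimination step: with p features the first step tests p
# subsets, the next step p-1, and so on down to 2.
def candidates_per_step(p, stop_at=1):
    return list(range(p, stop_at, -1))

def total_cv_runs(p, folds=10, stop_at=1):
    # each candidate subset is run through a full k-fold cross-validation
    return folds * sum(candidates_per_step(p, stop_at))

print(candidates_per_step(10))  # [10, 9, 8, 7, 6, 5, 4, 3, 2]
print(total_cv_runs(10))        # 540 fold-level evaluations
```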

    Putting feature selection into a model: this is not directly possible. You have to store the feature weights (actually only 1s and 0s). Then you can use those weights with the operator Select by Weight; just select every attribute with weight greater than or equal to 1.
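    The weight-filtering step can be mimicked outside RapidMiner like this (the weights mapping and function name are hypothetical, not the operator's actual format):

```python
# Apply 0/1 feature-selection weights to a data set: keep the attributes
# whose weight meets the threshold, mirroring the Select by Weight idea.
def select_by_weight(rows, weights, threshold=1.0):
    keep = {name for name, w in weights.items() if w >= threshold}
    return [{k: v for k, v in row.items() if k in keep} for row in rows]

weights = {"age": 1.0, "income": 0.0, "recency": 1.0}
rows = [{"age": 34, "income": 52000, "recency": 3}]
print(select_by_weight(rows, weights))  # [{'age': 34, 'recency': 3}]
```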

    This is (afaik) also the case for the new Operators mentioned by Sebastian.

    Best regards,
    chero
  • B_Miner Member Posts: 72 Maven
    That is an extremely helpful explanation - I figured I simply wasn't understanding what was happening.

    Thanks a lot!

    Brian
  • B_Miner Member Posts: 72 Maven
    Hi Cherokee, Can I ask a follow-up?

    So that I can understand what this algorithm does, is this correct?

    Step 1: take all 8 predictors and create eight runs, where in each run only 7 of the predictors are included. Run each of these subsets through 10-fold cross-validation. So my inner log should have 80 accuracy measures and the outer log should record 8 (the average of each of the 10-fold cross-validations).

    For this part, I see that what I actually get from the outer log is the LAST value from each of the 10-fold cross-validations, not the mean. (?!)

    Step 2: repeat step 1 but with only 7 predictors, where those 7 are chosen by picking the best subset (in essence dropping one predictor) based on the cross-validation.


    Do you know how the final '1's (instead of '0's) are chosen in the final output?


    Here is my code again:

    Thanks so much for your help!
  • cherokee Member Posts: 82 Guru
    Hi B_Miner,

    sure, a follow-up is no problem.
    B_Miner wrote:

    Step 1, take all the 8 predictors and create eight runs, where in each, only 7 of the predictors are included. Run each of these subsets through 10-fold x validation. So my inner log should have 80 accuracy measures and the outer log should record 8 (the average of each of the 10-fold cross validations).
    In general, yes. You can change that behaviour a bit with the parameter "keep best".

    For this part, I see that what I actually get from the outer loop is the LAST value from each of the 10 fold x validations, not the mean. (?!)
    Unfortunately, I could replicate this behaviour. I see that it is happening but I don't know why. One of the developers should check on that. Hopefully it is just a problem with the delivery of the values, not with the algorithm itself.

    Step 2: repeat step 1 but with only 7 predictors, where those 7 are chosen by picking the best subset (in essence dropping one predictor) based on the cross-validation.
    Yes. But you can change how many descendants are kept with the parameter "keep best".

    Do you know how the final '1' instead of '0' are chosen in the final output?
    Well, I don't know exactly what you mean here. Either you want to know (a) why the empty set isn't checked, or (b) how the resulting feature combination is selected. For (a) I don't know the answer; it should be checked, imho. Regarding (b): the final set is the checked set with the best performance value.

    Hope I could help,
    chero
  • B_Miner Member Posts: 72 Maven
    Thanks Cherokee! I get the concept now - hopefully one of the developers can chime in on why the last value is being extracted from the x-validation rather than the mean of the 10 folds.

    Brian
  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi,
    where did you log the value from? The performance value of the cross-validation will be the average of all previous iterations.

    Greetings,
    Sebastian
  • B_Miner Member Posts: 72 Maven
    Hi Sebastian,

    The XML code is immediately above. The log is from the x-validation. It appears not to be the average but the last value per validation run. Does this answer your question?
  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi,
    this is really embarrassing, but there was a bug in the XValidation that found its way into the code during the porting of the operator to 5.0. We have removed it in the current developer version and it will not be in the final release.

    Greetings,
    Sebastian
  • B_Miner Member Posts: 72 Maven
    Thanks!

    Is there a place to get a snapshot build when bugs are fixed? For example, there was one in the text mining plugin that I found that was corrected, but I'm not sure where to find the newest version.
  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi,
    you could check out the newest developer version from SVN on SourceForge. I'm not sure if the extensions get mirrored there, but if not, I will advise the admin in charge to do it.
    Unfortunately we are really busy because of CeBIT. There's a lot of work to do, so we cannot make updates available as frequently as we would wish.

    Greetings,
    Sebastian