Feature Selection mandatory columns

bhupendra_patil (Administrator, Employee, Member; RM Data Scientist; Posts: 168)
edited November 2018 in Knowledge Base

RapidMiner provides various feature selection techniques, such as forward selection, backward elimination, weight-guided selection, and evolutionary selection.

On rare occasions, you need a certain set of features (columns/attributes) to always be included while the selection tries various combinations. This article demonstrates one way to always keep a certain set of columns as part of feature selection.

Suppose you have columns like these and you want to ensure that columns a1 and a2 are always included during your optimization steps.

2016-08-23 16_26_24-Settings.png

To force a RapidMiner workflow to do so, we can use the Set Role operator so that the optimization step ignores these columns at first, and then reincorporate them during model building.

We will introduce a Set Role operator just outside the optimization step, as seen below.

2016-08-23 16_29_23-Settings.png

Then, in the parameters section, we select attribute name a1 and type any arbitrary string as the target role (Ignoreme in the screenshot).

If you have additional columns that you always want to use, you can specify them using the set additional roles dialog.

2016-08-23 16_31_37-Settings.png

Please note that the target role used for each column is a different string, so you will need to come up with a unique string for each column. A simple solution is to use ignoreme1, ignoreme2, ignoreme3, and so on.
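The naming scheme can be sketched in Python as follows. This only illustrates the bookkeeping and is not RapidMiner code; the function name `make_unique_roles` and the `ignoreme` prefix are assumptions chosen to match the example above.

```python
def make_unique_roles(columns, prefix="ignoreme"):
    """Map each mandatory column to its own special role name."""
    # Each column needs a distinct role string, so number them from 1.
    return {col: f"{prefix}{i}" for i, col in enumerate(columns, start=1)}


roles = make_unique_roles(["a1", "a2"])
print(roles)  # {'a1': 'ignoreme1', 'a2': 'ignoreme2'}
```

The resulting pairs are what you would type into the Set Role parameters by hand.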

With this metadata set, the optimization step always ignores these columns; however, the model operators and other downstream operators will ignore them as well.
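To make this effect concrete, here is a toy Python model of attribute roles. It is a sketch under stated assumptions, not how RapidMiner stores metadata: the dictionary `roles`, the helper `regular_columns`, and the extra columns a3 and a4 are hypothetical.

```python
# Toy model: every column has a role. Only "regular" columns are visible
# to feature selection and to learners (assumption for this sketch).
roles = {
    "a1": "ignoreme1",  # special role, hidden from selection and modeling
    "a2": "ignoreme2",  # special role, hidden from selection and modeling
    "a3": "regular",    # hypothetical extra column
    "a4": "regular",    # hypothetical extra column
}


def regular_columns(roles):
    """Return the columns that selection and modeling steps would see."""
    return [col for col, role in roles.items() if role == "regular"]


print(regular_columns(roles))  # ['a3', 'a4']
```

As long as a1 and a2 keep their special roles, neither the search nor the learner sees them, which is why the roles have to be changed back before the model is built.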

To counter this effect, we need to add an additional step inside the "Optimize" operator.

We will add an additional Set Role operator inside the Optimize step.

2016-08-23 16_35_00-Settings.png

Then change the role back to regular for the two attributes that we gave a special role earlier.

As the data moves to the validation step, these attributes will be included in model building as well as in validation.
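The overall idea can be sketched in plain Python, assuming the process behaves as described above: the search only varies the non-mandatory columns, but every candidate subset is scored together with the mandatory ones. This is a simplified greedy forward selection, not RapidMiner's implementation; `forward_selection`, `toy_score`, the extra columns a3 to a5, and the weights are all made up for illustration.

```python
def forward_selection(candidates, mandatory, score):
    """Greedy forward selection where `mandatory` columns join every evaluation.

    `score` is any callable that takes a list of column names and returns a
    number (higher is better), for example a cross-validated accuracy.
    """
    selected = []
    best = score(list(mandatory))  # baseline: mandatory columns only
    remaining = list(candidates)
    while remaining:
        # Score each remaining candidate together with the mandatory columns.
        trials = [(score(list(mandatory) + selected + [c]), c) for c in remaining]
        trial_score, trial_col = max(trials)
        if trial_score <= best:
            break  # no candidate improves the score
        best = trial_score
        selected.append(trial_col)
        remaining.remove(trial_col)
    return list(mandatory) + selected, best


def toy_score(columns):
    # Made-up usefulness values minus a fixed penalty per column.
    weights = {"a1": 30, "a2": 20, "a3": 25, "a4": 1, "a5": 10}
    return sum(weights[c] for c in columns) - 5 * len(columns)


columns, best = forward_selection(["a3", "a4", "a5"], ["a1", "a2"], toy_score)
print(columns)  # ['a1', 'a2', 'a3', 'a5']
```

In this sketch a1 and a2 are never candidates for removal, yet they influence every score, which corresponds to changing the roles back to regular before the validation step.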

Please find an example process attached as well.

Hopefully you find this article helpful. Feel free to post comments or questions on the community regarding these or other topics.

Comments

  • Fred12 (Member, Unicorn; Posts: 344)
    Are you sure this will take the previously ignored attributes into account in the attribute selection process? To me it looks like it will ignore a1 and a2, start with the remaining attributes (starting from one or from all remaining attributes), and choose (or eliminate) those with the best successive performance.
    But it doesn't take a1 and a2 into account in the optimization; it only adds them at the end. Or am I wrong?