How to find the most important features in a dataset?

Christos_Karapapas Member Posts: 25 Contributor II
I have a dataset in CSV format with more than 500 columns. I have imported it, marking every column as polynominal since they all hold different types of information, and now I want to find which of those columns are the most important.

So far, I have managed to get a table with each feature and its weight, using the Weight by "X" operators, but the problem is that in the results I get every feature-value pair on a separate row. What I want instead is to aggregate by feature and have a single weight for each one. I tried using the Aggregate operator, but with no luck.

As an example, this is what I get:
feature01-value05, weight:0,71
feature01-value13, weight:0,69
feature09-value03, weight:0,55

Instead I want something like this:
feature01, weight:0,7
feature09, weight:0,55
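
The aggregation described above can be sketched outside RapidMiner in a few lines of Python. This is only an illustration, not the RapidMiner way of doing it: it assumes the row labels always have the form `feature-value`, that weights use a decimal comma, and that averaging is the desired way to combine the per-value weights. The names `rows` and `aggregate_weights` are hypothetical.

```python
from collections import defaultdict

# Rows copied from the example above: (feature-value label, weight as text).
rows = [
    ("feature01-value05", "0,71"),
    ("feature01-value13", "0,69"),
    ("feature09-value03", "0,55"),
]

def aggregate_weights(rows):
    """Average the per-value weights so that each feature gets one weight."""
    grouped = defaultdict(list)
    for label, weight in rows:
        # Assumption: the feature name is everything before the last hyphen.
        feature = label.rsplit("-", 1)[0]
        # Convert the decimal comma to a decimal point before parsing.
        grouped[feature].append(float(weight.replace(",", ".")))
    return {feature: sum(ws) / len(ws) for feature, ws in grouped.items()}

feature_weights = aggregate_weights(rows)

# Print one line per feature, highest weight first.
for feature, weight in sorted(feature_weights.items(), key=lambda kv: -kv[1]):
    print(f"{feature}, weight: {weight:.2f}")
```

Taking the maximum instead of the mean would be an equally defensible choice; which one fits depends on how the weights are going to be used.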

Best Answer

  • Christos_Karapapas Member Posts: 25 Contributor II
    Solution Accepted
    Thank you so much Lionel!

    I finally managed to figure it out. I was getting an ArrayIndexOutOfBoundsException from the Weight by Information Gain operator because of some missing values in my dataset, so I tried various (wrong) operators to work around the problem. One of those was Nominal to Numerical, which apparently caused this behavior. Once I replaced it with the (obviously right for this job) Replace Missing Values operator, everything worked as expected.
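
For readers who want to see the idea behind the fix, here is a minimal Python sketch of the same two steps: fill in missing values first, then compute one information-gain weight per nominal feature. It is not RapidMiner's implementation. The toy data, the helper names, and the choice of filling gaps with the most frequent value are all assumptions made for illustration.

```python
import math
from collections import Counter

def replace_missing(column, missing=None):
    """Replace missing entries with the most frequent observed value."""
    observed = [v for v in column if v is not missing]
    mode = Counter(observed).most_common(1)[0][0]
    return [mode if v is missing else v for v in column]

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(column, labels):
    """Entropy of the labels minus the weighted entropy after splitting on column."""
    total = len(labels)
    remainder = 0.0
    for value in set(column):
        subset = [lab for v, lab in zip(column, labels) if v == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder

# Hypothetical toy data: two nominal features with gaps, plus a label column.
data = {
    "outlook": ["sunny", "sunny", "overcast", "rain", None, "overcast"],
    "windy": ["no", "yes", "no", "no", "yes", None],
}
play = ["no", "no", "yes", "yes", "no", "yes"]

# One weight per feature, computed on the nominal columns directly.
weights = {name: information_gain(replace_missing(col), play)
           for name, col in data.items()}

for name, weight in sorted(weights.items(), key=lambda kv: -kv[1]):
    print(f"{name}, weight: {weight:.3f}")
```

Because the columns stay nominal, each feature yields a single weight. Expanding a nominal column into one numeric column per value before weighting would instead produce one weight per feature-value pair, which matches the output shown in the question.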

Answers

  • lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn
    Hi @chris_skg,

    I'm not able to reproduce the results you obtained.
    Here are the results I get by applying the Weight by Information Gain operator to the Golf dataset:

    So that we can reproduce what you observe and understand what's going on, can you please share:
    - your XML process or your file process (.rmp file)
    - your data

    Regards,

    Lionel


  • lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn
    OK, @chris_skg,

    Glad that you finally found a solution!

    Regards,

    Lionel