Weight of attributes

GosiaRze Member Posts: 3 Learner I
edited June 2020 in Help
Dear RapidMiner Community!

I am new here and new to data science. I am working on my first analysis project for a college assignment.
I searched the forum for an answer and followed the suggestions I found, but I am still stuck.

The data set I am working on has 35 attributes; the target attribute is binominal (yes/no).
Before I choose the most relevant attributes for further exploration and for examining correlation, I want to see what percentage of positive 'Yes' values there is for every attribute.

I would appreciate any help for a beginner student.
Cheers, Gosia

Best Answers

  • GosiaRze Member Posts: 3 Learner I
    Solution Accepted
    @mschmitz - Thank you! I tried the Aggregate operator yesterday, but I am still making a mistake somewhere.

    If I understood correctly:
    "Aggregate -> Default Aggregation -> Sum -> Group by attributes -> (my attribute)"

    What I get is the sum of the data in the different columns, e.g. for the column "Age" I got the sum of the age values for the "Yes" and "No" groups of my target attribute. That is not what I am looking for.

    I changed "Default Aggregation -> Sum" to "Default Aggregation -> Count (percentage)", but the results are the same for every column: each column shows the % of Yes and No from my target attribute.

    What I am trying to get is: what % of Yes from my target attribute is linked to every column?
    In other words, what % of examples in every column is defined by Yes and No from the target column?

    What mistake am I making?
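
The calculation asked for here can be sketched outside RapidMiner to make the goal concrete: for each attribute, group the examples by that attribute's values and compute the share of 'Yes' in the target within each group. The snippet below is only a minimal illustration in plain Python. The data, the column names (`Department`, `Overtime`, `Attrition`) and the helper `yes_share_per_value` are hypothetical and not taken from the thread.

```python
from collections import defaultdict

# Hypothetical toy data: "Attrition" stands in for the yes/no target,
# the other keys stand in for nominal attributes.
rows = [
    {"Department": "Sales", "Overtime": "Yes", "Attrition": "Yes"},
    {"Department": "Sales", "Overtime": "No",  "Attrition": "No"},
    {"Department": "R&D",   "Overtime": "No",  "Attrition": "No"},
    {"Department": "R&D",   "Overtime": "No",  "Attrition": "No"},
    {"Department": "R&D",   "Overtime": "Yes", "Attrition": "Yes"},
    {"Department": "HR",    "Overtime": "Yes", "Attrition": "No"},
]


def yes_share_per_value(rows, target, positive="Yes"):
    """Return {attribute: {value: % of rows whose target equals `positive`}}."""
    # attribute -> value -> [number of positives, number of rows]
    counts = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for row in rows:
        hit = row[target] == positive
        for column, value in row.items():
            if column == target:
                continue
            cell = counts[column][value]
            cell[0] += hit  # True counts as 1, False as 0
            cell[1] += 1
    return {
        column: {value: round(100.0 * pos / total, 1)
                 for value, (pos, total) in values.items()}
        for column, values in counts.items()
    }


shares = yes_share_per_value(rows, "Attrition")
for column, table in shares.items():
    print(column, table)
```

The key point is that the grouping is done by each attribute in turn, not by the target. A numeric column such as "Age" has many distinct values, so it would probably need to be binned into ranges before a table like this becomes readable.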




Answers

  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,389 RM Data Scientist
    Hmm,
    yes, it is a bit more complex. Maybe the attached process helps?
    Best,
    Martin

    [Attached process XML lost in extraction. The surviving fragments contain the text "only nominal", a connection into a "Generate Attributes" operator (port "example set input"), and an "Aggregate" operator (compatibility 9.7.000).]

    - Head of Data Science Services at RapidMiner -
    Dortmund, Germany
  • GosiaRze Member Posts: 3 Learner I
    @mschmitz, @Telcontar120 - thanks a million, I got what I wanted by following your suggestions! Best, Gosia
  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,389 RM Data Scientist
    Good to hear! It's worth noting that this quickly leads to operators like Weight by Gini Index or Weight by Information Gain. Those are more typically used to assess the "discriminative power" of your attributes.
    Cheers,
    Martin
    - Head of Data Science Services at RapidMiner -
    Dortmund, Germany
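
To illustrate the idea behind an information-gain weight mentioned above, here is a minimal sketch of the textbook definition: the entropy of the target minus the weighted entropy of the target within each value of an attribute. This is an assumption-laden illustration, not RapidMiner's implementation (the operators may, for example, normalize the weights), and the data, column names and function names are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy data: "Attrition" stands in for the yes/no target.
rows = [
    {"Department": "Sales", "Overtime": "Yes", "Attrition": "Yes"},
    {"Department": "Sales", "Overtime": "No",  "Attrition": "No"},
    {"Department": "R&D",   "Overtime": "No",  "Attrition": "No"},
    {"Department": "R&D",   "Overtime": "No",  "Attrition": "No"},
    {"Department": "R&D",   "Overtime": "Yes", "Attrition": "Yes"},
    {"Department": "HR",    "Overtime": "Yes", "Attrition": "No"},
]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Entropy of the target minus its weighted entropy after splitting on `attribute`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[attribute]].append(row[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([row[target] for row in rows]) - remainder


gains = {a: information_gain(rows, a, "Attrition") for a in ("Department", "Overtime")}
for attribute, gain in sorted(gains.items(), key=lambda kv: -kv[1]):
    print(f"{attribute}: {gain:.3f}")
```

On this toy data "Overtime" receives the higher score, because one of its values ("No") contains only 'No' examples. A higher score means the attribute separates 'Yes' from 'No' more cleanly, which is the sense of "discriminative power" here.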