Weight of attributes
Dear RapidMiner Community!
I am a newbie here, and in data science as well. I am doing my first analysis project for a college assignment.
I tried to find the answer here in the forum and followed the suggestions, but I am still stuck.
The data set I am working on has 35 attributes; the target attribute is binominal (yes/no).
Before I choose the most relevant attributes for further exploration and for examining correlation, I want to see what % of positive values ('Yes') there is for every attribute.
I will appreciate any help for a beginner student.
Cheers,
Gosia
Best Answers
MartinLiebig (Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor; Posts: 3,389; RM Data Scientist)
Hi @GosiaRze,
you can use Aggregate with a default aggregation of sum and group by your assignment attribute.
Best,
Martin
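To make the suggestion concrete outside RapidMiner, here is a rough pandas analogue of Aggregate with default aggregation = sum, grouped by the target. This is a sketch, not RapidMiner itself: the data, the column names `Age`, `Smoker` and `Outcome`, and the 0/1 encoding of the yes/no attribute are all hypothetical. It shows that summing only gives a useful count when the attribute is encoded as 0/1; summing a numeric attribute like `Age` just adds the ages up.

```python
import pandas as pd

# Hypothetical toy data: "Smoker" is ASSUMED to be a yes/no attribute encoded as 1/0,
# "Outcome" plays the role of the binominal target.
df = pd.DataFrame({
    "Age":     [25, 40, 31, 52, 47],
    "Smoker":  [1, 0, 1, 1, 0],
    "Outcome": ["Yes", "No", "Yes", "No", "Yes"],
})

# One row per target value; every other column is summed within that group.
summed = df.groupby("Outcome").sum()
print(summed)
```

With 0/1 encoding, the `Smoker` sum in the `Yes` row is the number of smokers among the `Yes` examples; the `Age` sum has no such interpretation.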
- Head of Data Science Services at RapidMiner -
Dortmund, Germany
GosiaRze (Member; Posts: 3; Learner I)
@mschmitz Thank you! I tried the Aggregate operator yesterday, but I am still making a mistake somewhere.
If I understood correctly:
"Aggregate -> Default Aggregation -> Sum -> Group by attributes -> (my attribute)"
What I get is the sum of the values in each column, e.g. for the column "Age" I get the sum of the age values for "Yes" and for "No" of my target attribute. That is not what I am looking for.
I changed "Default Aggregation -> Sum" to "Default Aggregation -> Count (percentage)", but the results are the same for every column: each column just shows the % of Yes and No in my target attribute.
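A likely reason every column looks identical: a count aggregation grouped by the target counts the examples in each group, and never looks at what the values in the column are. The pandas sketch below mimics this, assuming that "count (percentage)" reports each group's count as a share of all examples; the data and column names are hypothetical.

```python
import pandas as pd

# Hypothetical toy data (same shape as before).
df = pd.DataFrame({
    "Age":     [25, 40, 31, 52, 47],
    "Smoker":  [1, 0, 1, 1, 0],
    "Outcome": ["Yes", "No", "Yes", "No", "Yes"],
})

# count() counts non-missing entries per column within each group;
# the actual values (ages, 0/1 flags) play no role.
counts = df.groupby("Outcome").count()

# Express each group's count as a percentage of all examples.
pct = counts * 100 / len(df)
print(pct)
```

Both columns come out as 60 % Yes and 40 % No, which is simply the class split of the target repeated once per column.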
What I am trying to get is: what % of Yes in my target attribute is linked to each column?
In other words, for each column, what % of the examples have Yes and what % have No in the target column?
What mistake am I making?
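One way to read the question is: for each value of each attribute, what share of those examples have target = Yes? That is a cross-tabulation of attribute vs. target, normalized per attribute value. The sketch below does this in pandas (swapped in for illustration; it is not a RapidMiner operator), on hypothetical nominal attributes `Smoker` and `Gender` with target `Outcome`.

```python
import pandas as pd

# Hypothetical toy data with nominal attributes and a yes/no target.
df = pd.DataFrame({
    "Smoker":  ["yes", "no", "yes", "yes", "no", "no"],
    "Gender":  ["f", "m", "m", "f", "f", "m"],
    "Outcome": ["Yes", "No", "Yes", "No", "Yes", "No"],
})

target = "Outcome"
tables = {}
for col in df.columns.drop(target):
    # Rows: values of this attribute; columns: target values.
    # normalize="index" makes each row sum to 1, i.e. the share of
    # Yes / No among the examples that have this attribute value.
    tables[col] = pd.crosstab(df[col], df[target], normalize="index") * 100
    print(f"--- {col} ---")
    print(tables[col].round(1))
```

Here about 66.7 % of the smokers have Outcome = Yes versus about 33.3 % of the non-smokers. Numeric attributes such as `Age` would need a discretization (binning) step first, otherwise every distinct age becomes its own row.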
Telcontar120 (Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member; Posts: 1,635; Unicorn)
You can also run a Naive Bayes classifier and then output the model, which shows the distribution table with the % of Yes and No for each value of each attribute.
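For a nominal attribute, a Naive Bayes distribution table is built from the class priors and the class-conditional distributions P(attribute value | class). The pandas sketch below computes those quantities by hand on hypothetical data; it assumes no smoothing, while the RapidMiner Naive Bayes operator may apply a Laplace correction, so its numbers can differ slightly. Numeric attributes are typically summarized per class by a mean and standard deviation instead.

```python
import pandas as pd

# Hypothetical toy data: one nominal attribute and a yes/no target.
df = pd.DataFrame({
    "Smoker":  ["yes", "no", "yes", "yes", "no", "no"],
    "Outcome": ["Yes", "No", "Yes", "No", "Yes", "No"],
})

# Class priors P(Outcome = c).
priors = df["Outcome"].value_counts(normalize=True)

# Class-conditional distribution P(Smoker = v | Outcome = c), unsmoothed.
# normalize="columns" makes each target column sum to 1.
cond = pd.crosstab(df["Smoker"], df["Outcome"], normalize="columns")

print(priors)
print(cond.round(3))
```

This table reads column-wise (how attribute values are distributed within Yes and within No), whereas "% of Yes for each attribute value" reads row-wise; the two are related through Bayes' rule via the priors.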
Answers
<connect from_op="Main (2)" from_port="output" to_op="Generate Attributes" to_port="example set input"/>
<operator activated="true" class="aggregate" compatibility="9.7.000" expanded="true" height="82" name="Aggregate" width="90" x="447" y="289">