Sign parity / calculated data to weight
Dear community,
I am looking for a hint on how to realize the following algorithm in Rapidminer:
Given is an example set with a label and a few attributes:
label att1 att2
1 2 -1
3 2 -2
-1 1 -3
Next I want to calculate sign parity (-> ratio of identical sign):
Sign parity label / att1 => +/+ and +/+ and -/+ divided by 3 => 0,66
Sign parity label / att2 => +/- and +/- and -/- divided by 3 => 0,33
Finally, the results shall be assigned as weights to each attribute.
Weight att1 = 0,66
Weight att2 = 0,33
I managed to calculate sign parity so far by performing a simple statement (e.g. label * att1 >= 0) which I can loop through all examples and then divide by the number of examples. But how to transfer this back to weights?
Best regards
Sachs
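For readers who want to check the arithmetic outside RapidMiner, here is a minimal Python sketch of the calculation described above. It is not a RapidMiner process. The function name `sign_parity` and the dictionary layout are assumptions made for illustration. It uses the same test as the post (label * att >= 0), so a zero value counts as a match.

```python
def sign_parity(label, values):
    """Fraction of examples where label and attribute value share a sign.

    Mirrors the test from the post: label * value >= 0.
    """
    matches = [l * v >= 0 for l, v in zip(label, values)]
    return sum(matches) / len(matches)


# Example set from the question.
label = [1, 3, -1]
attributes = {
    "att1": [2, 2, 1],
    "att2": [-1, -2, -3],
}

# One weight per attribute (not per example).
weights = {name: sign_parity(label, column) for name, column in attributes.items()}

for name, weight in weights.items():
    print(f"Weight {name} = {weight:.2f}")
```

For the three examples this gives 2/3 for att1 and 1/3 for att2, which corresponds to the 0,66 and 0,33 in the question.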
Answers
The Set Role operator lets you select an attribute column and set it to weight. Just select "weight" in the drop-down menu and RapidMiner will recognize it as a weight.
Thank you for your input. The Set Role operator allows one attribute to be defined as a weight. This will give me one weight per example.
In contrast, I want one weight per attribute (based on how often the sign of each example's attribute value matches the sign of that example's label).
Kind regards
Sachs
OK, I see. So label = 0.66 * att1 + 0.33 * att2? Will the weights of the attributes sum to 1? Did you try the Weights to Data or Data to Weights operator?
Hi Thomas,
Maybe my process description was a bit misleading. Let me try again in other words:
1) Determine the weight for each attribute.
This is done by comparing the sign of each example's value in an attribute with the sign of the same example's label. Then the overall ratio is computed, so that I get a statement like "75% of the examples of label and attX have the same sign".
2) Assign weight to attribute.
The calculated values (e.g. 75% for attX) shall then be assigned as weights to the corresponding attributes.
(The final step would be to select the top n attributes with the "Select by Weights" operator.)
The "Data to Weights" operator sounded good, but it does nothing other than assign a weight of "1" to each attribute, and there is no way to feed in the determined weight.
Best regards
Sachs
Why not do this via the Generate Attributes operator?
Lindon Ventures
Data Science Consulting from Certified RapidMiner Experts
Hi Brian,
Thank you for trying to help! Your post came just a second after my last one, where I tried to give a better description of what the result should be.
The Generate Attributes operator is indeed what I use to do the comparison on the examples (label n * att1 n >= 0). But how do I accumulate the results and turn them into a weight of the ATTRIBUTE?
Kind regards
Sachs
A simple aggregation using the average function should do the trick after that, once you have the values for every example. That will give you one overall value per attribute. Then, if you want, you can transpose the resulting data to get a table of overall values per attribute (each attribute becoming one example in the transposed data), which can then be sorted so that the top N can be selected.
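The aggregate, transpose, and sort steps described here can be sketched in plain Python as follows. This only illustrates the data flow and is not RapidMiner code. The `indicator_rows` table is a hypothetical stand-in for what Generate Attributes would produce for the example data in the question (1 where label * att >= 0, otherwise 0).

```python
# Hypothetical per-example indicators: 1 if label * att >= 0, else 0.
indicator_rows = [
    {"att1": 1, "att2": 0},
    {"att1": 1, "att2": 0},
    {"att1": 0, "att2": 1},
]

# Aggregate: average every indicator column, giving a single wide row.
columns = list(indicator_rows[0])
aggregated = {
    col: sum(row[col] for row in indicator_rows) / len(indicator_rows)
    for col in columns
}

# Transpose: one (attribute, weight) row per attribute instead of one wide row.
transposed = [(col, aggregated[col]) for col in columns]

# Sort by weight, highest first, so the top N can be read off the front.
ranked = sorted(transposed, key=lambda pair: pair[1], reverse=True)

for attribute, weight in ranked:
    print(attribute, round(weight, 2))
```

The point of the transpose is that the wide aggregated row has one column per attribute, whereas sorting and selecting work on rows.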
Thank you for your contribution. In my attempt to implement your suggestion, the Aggregate operator with the average function on a generated attribute does calculate the desired value. I can also imagine what the transpose will look like. But currently I am stuck in the middle of this process.
Aggregate now calculates the "weight" for att1, but the operator's result still needs to be moved to the last example of att1. Only once the weights of all attributes are in the same example row can I start with the transpose.
Best regards
Sachs
Perhaps if you can post a small dataset with some examples and your process then it would be easier to try to work this through?
Hi Brian,
Here is a piece of code that calculates what I want to have as weights. The point where I am struggling now is to use this information in order to filter the original attributes' list.
Best regards
Sachs
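The remaining step, using the weights to filter the original attribute list, might look like the sketch below in plain Python. It is not the process referred to in the post, and it is not RapidMiner code. The helper name `select_top_attributes` and its `keep` parameter are hypothetical, and the weights are the ones worked out for the example data in the question.

```python
def select_top_attributes(rows, weights, n, keep=("label",)):
    """Keep the n highest-weighted attributes plus any columns listed in keep."""
    top = sorted(weights, key=weights.get, reverse=True)[:n]
    columns = list(keep) + top
    return [{col: row[col] for col in columns} for row in rows]


# Original example set from the question.
rows = [
    {"label": 1, "att1": 2, "att2": -1},
    {"label": 3, "att1": 2, "att2": -2},
    {"label": -1, "att1": 1, "att2": -3},
]

# Sign-parity weights per attribute (2 of 3 and 1 of 3 matching signs).
weights = {"att1": 2 / 3, "att2": 1 / 3}

filtered = select_top_attributes(rows, weights, n=1)
for row in filtered:
    print(row)
```

With n=1 only att1 survives alongside the label, since it has the higher sign parity.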