Sign parity / calculated data to weight

qwertz2 Member Posts: 49 Guru
edited November 2018 in Help

Dear community,

I am looking for a hint on how to realize the following algorithm in Rapidminer:

Given is an example set with a label and a few attributes:

label  att1  att2
    1     2    -1
    3     2    -2
   -1     1    -3


Next I want to calculate the sign parity (the share of examples in which label and attribute have the same sign):
Sign parity label / att1 => +/+, +/+ and -/+ => 2 of 3 match => 0.66
Sign parity label / att2 => +/-, +/- and -/- => 1 of 3 match => 0.33

Finally, the results shall be assigned as weights to the attributes:
Weight att1 = 0.66
Weight att2 = 0.33


So far I have managed to calculate the sign parity by evaluating a simple expression (e.g. label * att1 >= 0) for every example, counting how often it is true, and dividing by the number of examples. But how do I turn these results into attribute weights?
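
For reference, the calculation described above can be sketched outside RapidMiner in plain Python. This is only an illustration of the arithmetic, not a RapidMiner process; the function name `sign_parity` is made up for this example, and it assumes the same rule as the expression above (a product of zero counts as a match).

```python
def sign_parity(label, attribute):
    """Share of examples where label and attribute have the same sign.

    Uses the rule from the post: label * attribute >= 0 counts as a match,
    so a zero in either column is treated as "same sign".
    """
    if len(label) != len(attribute):
        raise ValueError("label and attribute must have the same length")
    matches = sum(1 for l, a in zip(label, attribute) if l * a >= 0)
    return matches / len(label)


# Example set from the post
label = [1, 3, -1]
attributes = {
    "att1": [2, 2, 1],
    "att2": [-1, -2, -3],
}

# One weight per attribute (not per example)
weights = {name: sign_parity(label, values) for name, values in attributes.items()}
print(weights)
```

For the three examples this gives roughly 0.67 for att1 and 0.33 for att2, i.e. the values listed above up to rounding.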



Best regards
Sachs

Answers

  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    The Set Role operator lets you select an attribute column and set it to weight. Just select "weight" in the drop-down menu and RapidMiner will recognize it as a weight.

  • qwertz2 Member Posts: 49 Guru
    Hi Thomas,

    Thank you for your input. The Set Role operator lets me define one attribute as a weight. This gives me one weight per example.

    In contrast, I want one weight for each attribute (based on how often the sign of an example's attribute value equals the sign of its label).


    Kind regards
    Sachs
  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    OK, I see, so label = 0.66*att1 + 0.33*att2? Will the sum of the attribute weights equal 1? Did you try the Weights to Data or Data to Weights operator?

  • qwertz2 Member Posts: 49 Guru

    Hi Thomas,

    Maybe my process description was a bit misleading. Let me try again in other words:

    1) Determine the weight for each attribute.

    This is done by comparing, example by example, the sign of the attribute value with the sign of the label. Then the overall ratio shall be computed, so that I get a statement like "in 75% of the examples, label and attX have the same sign".

    2) Assign weight to attribute.

    The calculated values (e.g. 75% for attX) shall then be assigned as weights to the corresponding attributes.

    (The final step would be to select top n attributes with "select by weights" operator.)

    The "Data to Weights" operator sounded good, but it actually does nothing other than assign a weight of "1" to each attribute, and there is no way to feed in the calculated weights.
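
To make the intended final step concrete, here is a rough Python sketch of "keep the top n attributes by weight". It is not how the Select by Weights operator is implemented; the helper `select_top_n` and the dictionary-based table are assumptions made purely for illustration, and the weights are the rounded values from the first post.

```python
def select_top_n(weights, n):
    """Return the names of the n attributes with the highest weights."""
    ranked = sorted(weights.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:n]]


# Attribute weights as computed from the sign parity (values from the first post)
weights = {"att1": 0.66, "att2": 0.33}

# Original example set, stored column-wise
table = {
    "label": [1, 3, -1],
    "att1": [2, 2, 1],
    "att2": [-1, -2, -3],
}

top = select_top_n(weights, n=1)

# Keep the label plus the selected attributes, drop everything else
reduced = {name: table[name] for name in ["label"] + top}
print(top)
print(reduced)
```

With n=1 only att1 survives next to the label, because it has the higher weight.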

    Best regards

    Sachs

  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn

    Why not do this via the Generate Attributes operator?

    Brian T.
    Lindon Ventures
    Data Science Consulting from Certified RapidMiner Experts
  • qwertz2 Member Posts: 49 Guru

    Hi Brian,

    Thank you for trying to help! Your post came just a second after my last one, where I tried to give a better description of what the result should be.

    The Generate Attributes operator is indeed what I use to do the comparison on the examples (label_n * att1_n >= 0). But how do I accumulate the results and transform them into a weight of the ATTRIBUTE?

    Kind regards

    Sachs

  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn

    A simple aggregation using the average function should do the trick after that, once you have the values for every example, which will give you one overall value per attribute. Then if you want you can transpose the resulting data to get a table of overall values per attribute (one attribute being each example in the transposed data) which can then be sorted and the top N can be selected.
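
The sequence suggested here (per-example flag, average, transpose, sort, top N) can be sketched in plain Python as follows. This only mirrors the data flow and is not RapidMiner code; the variable names and the `attribute`/`weight` column names are assumptions for the example.

```python
# Example set from the first post, one dict per example (row)
rows = [
    {"label": 1, "att1": 2, "att2": -1},
    {"label": 3, "att1": 2, "att2": -2},
    {"label": -1, "att1": 1, "att2": -3},
]
attribute_names = ["att1", "att2"]

# Step 1 (Generate Attributes): 1 if label and attribute share a sign, else 0
flags = [
    {name: 1 if row["label"] * row[name] >= 0 else 0 for name in attribute_names}
    for row in rows
]

# Step 2 (Aggregate, average): a single row with one overall value per attribute
averages = {
    name: sum(flag[name] for flag in flags) / len(flags)
    for name in attribute_names
}

# Step 3 (Transpose): one row per attribute instead of one column per attribute
transposed = [{"attribute": name, "weight": value} for name, value in averages.items()]

# Step 4 (Sort, then take the top N)
transposed.sort(key=lambda row: row["weight"], reverse=True)
top_n = transposed[:1]
print(top_n)
```

The key point is that the averages end up in one row first, and only the transpose turns them into a sortable list with one entry per attribute.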

    Brian T.
  • qwertz2 Member Posts: 49 Guru
    Hi Brian,

    Thank you for your contribution. In my attempt to implement your suggestion, the Aggregate operator with the average function on a generated attribute does the job of calculating the desired value. I can also imagine what the transposed result will look like. But currently I am stuck in the middle of this process.

    Aggregate now calculates the "weight" for att1, but the operator's result still needs to be moved to the last example of att1. Only once the weights of all attributes are in the same example row can I start with the transpose.


    Best regards
    Sachs
  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn

    Perhaps if you can post a small dataset with some examples and your process then it would be easier to try to work this through?

    Brian T.
  • qwertz2 Member Posts: 49 Guru

    Hi Brian,

    Here is a piece of code that calculates what I want to have as weights. The point where I am struggling now is how to use this information to filter the original list of attributes.

    Best regards

    Sachs







    <operator activated="true" class="process" compatibility="7.5.000" expanded="true" name="Process">

    <operator activated="true" class="generate_data" compatibility="7.5.000" expanded="true" height="68" name="Generate Data" width="90" x="45" y="34">

    <operator activated="true" class="concurrency:loop_attributes" compatibility="7.5.000" expanded="true" height="103" name="Loop Attributes" width="90" x="179" y="34">

    <operator activated="true" class="generate_attributes" compatibility="7.5.000" expanded="true" height="82" name="Generate Attributes" width="90" x="45" y="34">

    <operator activated="true" class="aggregate" compatibility="7.5.000" expanded="true" height="82" name="Aggregate" width="90" x="246" y="34">
