How to do Y-randomization in Rapidminer?

pengie Member Posts: 21 Maven
edited November 2018 in Help
Hi,

I was wondering how to do Y-randomization in RapidMiner. In Y-randomization, the y value of an example is randomly exchanged with the y value of another example. This is used to validate QSAR models: the performance of the original model (r2) is compared to that of models built on a permuted (randomly shuffled) response.

Regards
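
To make the idea in the question concrete, here is a minimal sketch of Y-randomization in plain Python rather than RapidMiner. It assumes a one-feature least-squares line as the "model" and toy data generated on the spot; the function names `r_squared` and `y_randomization` are made up for this illustration and are not part of any library.

```python
import random


def r_squared(x, y):
    """R^2 of a one-feature least-squares line fitted to (x, y).

    For simple linear regression this equals the squared Pearson correlation.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return (sxy * sxy) / (sxx * syy)


def y_randomization(x, y, n_rounds=100, seed=0):
    """Return the original r2 and the r2 of each model built on shuffled y."""
    rng = random.Random(seed)
    original = r_squared(x, y)
    permuted_scores = []
    for _ in range(n_rounds):
        shuffled = y[:]          # copy, so the original response stays intact
        rng.shuffle(shuffled)    # randomly exchange y values between examples
        permuted_scores.append(r_squared(x, shuffled))
    return original, permuted_scores


# Toy data (an assumption for this sketch): y depends linearly on x plus noise.
data_rng = random.Random(42)
x = [float(i) for i in range(30)]
y = [2.0 * xi + data_rng.gauss(0.0, 3.0) for xi in x]

original_r2, permuted_r2 = y_randomization(x, y)
mean_permuted_r2 = sum(permuted_r2) / len(permuted_r2)
print(f"original r2 = {original_r2:.3f}, mean permuted r2 = {mean_permuted_r2:.3f}")
```

If the original model captures a real relationship, its r2 should be clearly higher than the r2 values obtained on the shuffled responses.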

Answers

  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi,
    although there is no operator for Y-randomization in RapidMiner yet, we can make use of its modularity. I have created a process that performs Y-randomization. You could encapsulate it within an OperatorChain to use it within your own process.

    [Process XML not preserved; surviving fragment: <list key="noise">]

    Hope that helps.


    Greetings,
    Sebastian
  • pengie Member Posts: 21 Maven
    Hi,

    thank you for your help. The code worked perfectly. I am now trying to use RapidMiner to do Y-randomization, train a model, evaluate the model using leave-one-out validation, and repeat this 100 times to get an average classification error for the Y-randomization. I am using the following code:

    [Process XML not preserved; surviving fragment: <list key="noise">]

    However, it seems to give me an error about RepeatUntilOperatorChain.
  • TobiasMalbrecht Moderator, Employee, Member Posts: 294 RM Product Management
    Hi,

    just a hint: why not use the IteratingPerformanceAverage operator, which also iterates a predefined number of times and averages the performance vectors resulting from the inner operator chain?

    Regards,
    Tobias
  • pengie Member Posts: 21 Maven
    Great hint!

    I ran into another error: "Message: The attribute 'random' does not exist." I did a bit of tracing. It seems that AttributeFilter (2) removes the attribute 'random' after the first round, but in the second round the NoiseGenerator generates an attribute named 'random1' instead of 'random', which causes the error.

    [Process XML not preserved; surviving fragments: <operator name="IteratingPerformanceAverage" class="IteratingPerformanceAverage" expanded="yes"> and <list key="noise">]
  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi,
    try our Permutation operator. I forgot about it myself in the previous solution. So many operators... :)

    [Process XML not preserved; surviving fragment: <operator name="IteratingPerformanceAverage" class="IteratingPerformanceAverage" expanded="yes">]

    This should help.

    Greetings,
    Sebastian
  • pengie Member Posts: 21 Maven
    Thank you so much. It worked perfectly. ;D

    Just one last question: when I set a breakpoint in ExampleSetJoin, I noticed that the id number of the dataset keeps increasing. Why is that, and will it have any impact on memory?
  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi,
    no, this won't increase memory consumption. The memory of an ExampleSet is freed once no ExampleSet references it any more. Keep in mind that it does not have to be freed immediately: Java frees memory when it considers that appropriate or when it needs the space.

    Greetings,
    Sebastian
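
The workflow discussed in this thread (permute the labels, train a model, evaluate it with leave-one-out, repeat 100 times, and average the classification error) can be sketched outside RapidMiner as follows. This is only an illustration in plain Python, not a translation of the processes posted above: the 1-nearest-neighbor classifier is an arbitrary stand-in for the learner, the data are synthetic, and all function names are hypothetical.

```python
import random


def nearest_neighbor_predict(train_x, train_y, query):
    """Predict the label of the training point closest to `query` (1-NN)."""
    best = min(
        range(len(train_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], query)),
    )
    return train_y[best]


def leave_one_out_error(xs, ys):
    """Classification error when each example is held out once."""
    errors = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        if nearest_neighbor_predict(train_x, train_y, xs[i]) != ys[i]:
            errors += 1
    return errors / len(xs)


def average_permuted_error(xs, ys, n_iterations=100, seed=0):
    """Average leave-one-out error over models built on shuffled labels."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_iterations):
        permuted = ys[:]        # fresh copy each round; features stay in place
        rng.shuffle(permuted)   # Y-randomization step
        total += leave_one_out_error(xs, permuted)
    return total / n_iterations


# Synthetic two-class data (an assumption for this sketch).
data_rng = random.Random(1)
xs = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(20)]
xs += [(data_rng.gauss(5, 1), data_rng.gauss(5, 1)) for _ in range(20)]
ys = ["a"] * 20 + ["b"] * 20

true_error = leave_one_out_error(xs, ys)
permuted_error = average_permuted_error(xs, ys)
print(f"error with real labels: {true_error:.3f}")
print(f"average error with permuted labels: {permuted_error:.3f}")
```

Shuffling a fresh copy of the label column in every round avoids carrying helper attributes from one iteration to the next, which is the kind of naming problem ('random' versus 'random1') reported above.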