"Why doesn't Split-Data Inherit the Global Random Seed?"

Panj1 (Member, Posts: 2, Contributor I)
edited June 2019 in Help

I retrieved the Titanic dataset, then multiplied it, and copied and pasted three Split Data operators with a 0.7/0.3 split. The resulting data is different each time. I can set the local random seed to a fixed value to make sure it splits exactly the same way each time, but in this case I would have expected the operators to inherit the global random seed by default. Is this expected behavior? It seems unintuitive if it is.

If it is expected behavior, is there an option somewhere to force randomization operators to use a global random seed?

I am using RapidMiner Studio 8.2.

[Image: Titanic Split Data Test.png]

Thank You,

Please see XML below.

<macros/>

<operator activated="true" class="retrieve" compatibility="8.2.000" expanded="true" height="68" name="Retrieve Titanic" width="90" x="45" y="340">

<operator activated="true" class="retrieve" compatibility="8.2.000" expanded="true" height="68" name="Retrieve Titanic (2)" width="90" x="45" y="748">

<operator activated="true" class="retrieve" compatibility="8.2.000" expanded="true" height="68" name="Retrieve Titanic (3)" width="90" x="45" y="850">

<connect from_op="Retrieve Titanic" from_port="output" to_op="Multiply" to_port="input"/>

<connect from_op="Retrieve Titanic (2)" from_port="output" to_op="Split Data (5)" to_port="example set"/>

<connect from_op="Retrieve Titanic (3)" from_port="output" to_op="Split Data (6)" to_port="example set"/>


Best Answer

  • jczogalla (Employee, Member, Posts: 144, RM Engineering)
    Solution Accepted

    Hi Panj1!

    Welcome to the community. :) As a tip, you can use the code formatting button while writing your post to get a nicely formatted version of your XML. This prevents parts of the XML from being converted to smilies, for example.

    Regarding the random generator question: the operators do use the global random generator by default, but because it is a single global generator, its state advances with each operator that draws from it. This means that as long as the execution order stays the same, the end results will be the same between process executions. But if you want two Split Data operators to produce the same partitions, both need to use the same local random seed. The same is true for loops.
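
    The behavior described above can be illustrated with a small Python analogy. This is only a sketch, not RapidMiner's implementation: Python's `random.Random` stands in for the process-wide generator, `split_indices` and `run_process` are hypothetical helpers, and the seeds 42 and 7 are arbitrary. It shows that a shared generator reproduces results across whole runs but gives different partitions to consecutive splits, while a fixed local seed per split gives identical partitions.

```python
import random

def split_indices(n, ratio, rng):
    """Shuffle row indices 0..n-1 with rng, then cut them into (train, test) at ratio."""
    idx = list(range(n))
    rng.shuffle(idx)            # consumes random numbers, advancing rng's state
    cut = round(n * ratio)
    return sorted(idx[:cut]), sorted(idx[cut:])

def run_process(global_seed, n=100):
    """Three splits in a fixed order, all drawing from one shared ("global") generator."""
    rng = random.Random(global_seed)
    return [split_indices(n, 0.7, rng) for _ in range(3)]

# Re-running the whole "process" with the same seed and order reproduces every split.
first_run = run_process(global_seed=42)
second_run = run_process(global_seed=42)
print(first_run == second_run)  # True

# Within a single run the generator has moved on between splits,
# so the three partitions are (almost surely) not the same.
print(first_run[0] == first_run[1])

# Giving each split its own generator seeded with the same local seed
# makes the partitions identical.
local_a = split_indices(100, 0.7, random.Random(7))
local_b = split_indices(100, 0.7, random.Random(7))
print(local_a == local_b)  # True
```

    In this analogy, "using the global seed" means sharing one generator object, which is why three copies of the same split operator still see different random numbers.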

    If you just want to split the same data set the same way multiple times, you can also use the Split Data operator once and multiply its outputs; see the example XML below.

    <macros/>

    <operator activated="true" class="retrieve" compatibility="8.3.000-SNAPSHOT" expanded="true" height="68" name="Retrieve Titanic" width="90" x="45" y="340">

    <connect from_op="Retrieve Titanic" from_port="output" to_op="Split Data" to_port="example set"/>

    I hope this helps!

    Cheers

    Jan
