"Bug in Feature Generation: side effects"

steffen Member Posts: 347 Maven
edited May 2019 in Help
Hello RapidMiner Team

I am using the latest CVS version and tried to implement the ZTransformation, that is, calculating the mean and standard deviation from the input ExampleSet and then applying a series of RM operators, called from within my code. While trying some preprocessing steps before my operator, I stumbled over strange behaviour in the FeatureGenerationOperator, which I also use. I then reproduced the code as a process using only built-in RM operators, and the strange behaviour happened again. Two notes regarding the following setups:
  • The "useless" renaming is there because I (originally) wanted to use an attribute name containing a "(" within FeatureGeneration (confidence...)
  • In the following setups I used the dataset described by golf.aml, which is delivered with the RM distribution.
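For readers following along, here is a minimal sketch of the Z-transformation itself in plain Java, independent of any RapidMiner API. The class name `ZTransformSketch`, the sample values, and the choice of the population standard deviation (dividing by n) are assumptions made for illustration; they are not taken from the operator discussed in this thread.

```java
import java.util.Arrays;

/** Hypothetical helper: z-transforms one numeric column. */
final class ZTransformSketch {

    /** Arithmetic mean of the values. */
    static double mean(double[] values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    /** Population standard deviation (divides by n; this is an assumption). */
    static double std(double[] values, double mean) {
        double squares = 0.0;
        for (double v : values) {
            squares += (v - mean) * (v - mean);
        }
        return Math.sqrt(squares / values.length);
    }

    /** Returns a new array holding z = (x - mean) / std for every value. */
    static double[] zTransform(double[] values) {
        double m = mean(values);
        double s = std(values, m);
        double[] z = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            // A constant column has std 0; map it to 0 instead of dividing by zero.
            z[i] = (s == 0.0) ? 0.0 : (values[i] - m) / s;
        }
        return z;
    }

    public static void main(String[] args) {
        // Made-up values, not the golf data.
        double[] column = {2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0};
        System.out.println(Arrays.toString(zTransform(column)));
    }
}
```

The two statistics are computed first and only then applied, which mirrors the two phases described above (collect mean and standard deviation, then generate the transformed attribute).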
1. Here is my basic setup...which works!















<operator name="skip_ijon" class="FeatureNameFilter">









2. But when I accidentally used the wrong attribute name within FeatureGeneration, no error message appeared, only this wrong result.















<operator name="skip_ijon" class="FeatureNameFilter">









3. Setting the correct names but applying the Sorting operator beforehand causes the same result as in step 2.


















<operator name="skip_ijon" class="FeatureNameFilter">









At this point I came to the conclusion, that the problem must lurk deeply in the RapidMiner entrails ...

Hope this error description was somehow helpful

greetings

Steffen

Answers

  • TobiasMalbrecht Moderator, Employee, Member Posts: 292 RM Product Management
    Hi Steffen,

    wow, what a wonderfully detailed bug report. I must admit I only skimmed it since I do not have much time today, but I will take a closer look on Monday if nobody else has done so by then ... ;)

    Regards,
    Tobias
  • steffen Member Posts: 347 Maven
    Hello RapidMiner-Team

    I just checked out the 4.2 release and it seems that this bug is still there. I will open a ticket now, because I guess that makes it easier to keep track of such things given the huge amount of work you have to do. I thought about it before, but I didn't want to be annoying ;)

    Besides this ... keep up the good work!

    greetings

    Steffen
  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
    Hey,

    thanks for the reminder. We indeed missed this, sorry.

    Cheers,
    Ingo
  • Legacy User Member Posts: 0 Newbie
    Hi Steffen, Hi Tobias, Hi All,

    I have found a bug in FeatureGeneration too (maybe the same one?), but strangely the same kind of bug occurs in AttributeConstructionLoader.

    The basic idea of my experiment is to merge two lexical matrices in text mining: I have 10 documents in ".doc" format and 13 in "pdf". I use a "TextInput" subtree for each, but then I have to merge two example sets with different rows and different attributes.

    I have tried "ExampleSetMerge/Join/Cartesian", but none of them is satisfactory. Now I have tried AttributeConstructionLoader and FeatureGeneration, both with the "keep all=true" and "filepath= true" options, but I get this message:
    "The function name 'const' must be used with empty arguments".

    Here is my experiment:

































    Is this the known behaviour Steffen has been talking about?
    Cheers,
    Jean-Charles.
  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
    Hi Steffen, Hi Jean-Charles,

    so, back again to feature generation. First, some comments on Steffen's Report:

    At this point I came to the conclusion, that the problem must lurk deeply in the RapidMiner entrails ...
    Yes, it does. Very deeply. We have two different data structures (actually only one data structure and a view structure) for the data we handle: first, the ExampleTable, which actually holds the data, and second, the ExampleSets, which define views on the underlying tables. All operators work on the ExampleSets with one exception: the feature generation operators work directly on the tables, for performance reasons and to easily share newly generated attributes among views without the need for re-creation. This is useful, for example, for the evolutionary feature construction approaches.
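    The table/view split described above can be pictured with a toy model. This is only a sketch: `ToyTable`, `ToyView`, and all method names are hypothetical and are not RapidMiner's ExampleTable or ExampleSet classes. It assumes a view is nothing more than a list of column names on top of a shared table.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy stand-in for the data table: owns the actual column values. */
final class ToyTable {
    private final Map<String, double[]> columns = new LinkedHashMap<String, double[]>();

    void addColumn(String name, double[] values) {
        columns.put(name, values);
    }

    double[] column(String name) {
        return columns.get(name);
    }
}

/** Toy stand-in for a view: holds no data, only the names of the columns it exposes. */
final class ToyView {
    private final ToyTable table;
    private final List<String> visible = new ArrayList<String>();

    ToyView(ToyTable table, List<String> visibleColumns) {
        this.table = table;
        this.visible.addAll(visibleColumns);
    }

    boolean exposes(String name) {
        return visible.contains(name);
    }

    /** Explicitly registers an existing table column with this view. */
    void expose(String name) {
        if (table.column(name) == null) {
            throw new IllegalArgumentException("no such column: " + name);
        }
        if (!visible.contains(name)) {
            visible.add(name);
        }
    }

    /** Reads values through the view; columns the view does not list are rejected. */
    double[] values(String name) {
        if (!exposes(name)) {
            throw new IllegalStateException("column not exposed by this view: " + name);
        }
        return table.column(name);
    }
}

final class TableViewDemo {
    public static void main(String[] args) {
        ToyTable table = new ToyTable();
        table.addColumn("a", new double[] {1.0, 2.0});
        ToyView first = new ToyView(table, Arrays.asList("a"));
        ToyView second = new ToyView(table, Arrays.asList("a"));

        // A generator writes the new column straight into the shared table.
        table.addColumn("a_squared", new double[] {1.0, 4.0});

        // The data now exists once, but no view has been told about it yet.
        System.out.println(first.exposes("a_squared")); // prints false

        // After registration, both views read the very same array.
        first.expose("a_squared");
        second.expose("a_squared");
        System.out.println(first.values("a_squared") == second.values("a_squared")); // prints true
    }
}
```

    In this toy model the generated column is stored only once and can be shared by any number of views, but each view stays unaware of it until it is registered explicitly.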

    However, changing the underlying table columns without "notifying" the view columns (attributes) might lead to some strange behaviour. For that reason, one simply has to copy the attribute (I kept the renaming) as in the following process. Then it works with both attribute names in the construction:



















    <operator name="skip_ijon" class="FeatureNameFilter">













    About loading the attribute constructions: please use the operator "AttributeConstructionLoader" instead. The file parameter of the "FeatureGeneration" operator is more or less deprecated (unfortunately, we cannot mark parameters as such) and is only kept for backwards compatibility.


    However, just a small comment on the whole feature generation stuff: we will revise the feature generation algorithms by the next release anyway in order to ease the generation process and allow more generation types.

    Cheers,
    Ingo
  • steffen Member Posts: 347 Maven
    Hello Ingo

    Thank you for the workaround !

    However, just a small comment on the whole feature generation stuff: we will revise the feature generation algorithms by the next release anyway in order to ease the generation process and allow more generation types.
    This would be nice. Have you considered using a language like JavaScript for user-defined functions? Something like the "Modified Java Script Value" step in Pentaho Kettle? Besides "click-it-together" functions, it would be nice to have something powerful for users with a stronger programming background.

    greetings

    Steffen
  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
    Hi again,

    we actually also thought of a scripting engine for user-defined functions, which should be supported in Java 6 anyway (at least JavaScript should be supported).
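    As a rough sketch of what such a user-defined function could look like with the `javax.script` API introduced in Java 6: the class `ScriptedFunctionSketch` and its `evaluate` method are hypothetical, not part of RapidMiner. The code assumes a JavaScript engine is registered with the JVM; newer JDKs no longer bundle one, so the sketch returns null in that case.

```java
import java.util.HashMap;
import java.util.Map;
import javax.script.Bindings;
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;
import javax.script.ScriptException;
import javax.script.SimpleBindings;

/** Hypothetical helper: evaluates a user-written JavaScript expression. */
final class ScriptedFunctionSketch {

    /**
     * Evaluates the expression with the given named numeric inputs.
     * Returns null when the running JVM provides no JavaScript engine.
     */
    static Double evaluate(String expression, Map<String, Double> inputs) {
        ScriptEngine engine = new ScriptEngineManager().getEngineByName("JavaScript");
        if (engine == null) {
            return null; // no engine available on this JVM
        }
        // Each input becomes a script variable of the same name.
        Bindings bindings = new SimpleBindings();
        bindings.putAll(inputs);
        try {
            Object result = engine.eval(expression, bindings);
            if (!(result instanceof Number)) {
                throw new IllegalArgumentException("expression did not yield a number");
            }
            return ((Number) result).doubleValue();
        } catch (ScriptException e) {
            throw new IllegalArgumentException("invalid expression: " + expression, e);
        }
    }

    public static void main(String[] args) {
        Map<String, Double> inputs = new HashMap<String, Double>();
        inputs.put("x", 85.0);
        inputs.put("mean", 70.0);
        inputs.put("std", 10.0);
        // The user writes the function body; the host only supplies the variables.
        System.out.println(evaluate("(x - mean) / std", inputs));
    }
}
```

    The point of this design is that the expression is plain text supplied at run time, so users with a programming background can define new functions without any change to the host application.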


    For the more "traditional" mathematical functions we are currently evaluating JEP:

    http://www.singularsys.com/jep/index.html

    which would fit really nicely into RapidMiner.


    Any thoughts about this?

    Cheers,
    Ingo