"Bug in Feature Generation: side effects"
Hello RapidMiner Team
I am using the latest CVS version and tried to implement the Z-transformation, that is, calculating the mean and standard deviation from the input ExampleSet and then applying a series of RM operators called from within my code. While trying some preprocessing steps before my operator, I stumbled over strange behaviour of the FeatureGenerationOperator, which I also use. I then reproduced the code in a process using only built-in RM operators, and the strange things happened again. Two notes regarding the following setups:
- I had to perform the "useless" renaming because (originally) I wanted to use an attribute name containing a "(" within FeatureGeneration (confidence...)
- In the following setups I used the dataset described by golf.aml delivered with the RM-distribution.
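For readers unfamiliar with the Z-transformation mentioned above, here is a minimal plain-Java sketch of the computation, z = (x - mean) / std. It does not use the RapidMiner API; the class and method names are hypothetical, and the choice of the sample standard deviation (dividing by n - 1) is an assumption, since the post does not say which variant is meant.

```java
import java.util.Arrays;

// Hypothetical illustration; not part of RapidMiner.
class ZTransformSketch {

    /** Arithmetic mean of the values. */
    static double mean(double[] values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    /** Sample standard deviation (divides by n - 1); this variant is an assumption. */
    static double std(double[] values, double mean) {
        double squared = 0.0;
        for (double v : values) {
            squared += (v - mean) * (v - mean);
        }
        return Math.sqrt(squared / (values.length - 1));
    }

    /** Returns a new array holding (x - mean) / std for every x; needs at least two values. */
    static double[] zTransform(double[] values) {
        double m = mean(values);
        double s = std(values, m);
        double[] out = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            // Constant columns have std 0; map them to 0 instead of dividing by zero.
            out[i] = (s == 0.0) ? 0.0 : (values[i] - m) / s;
        }
        return out;
    }

    public static void main(String[] args) {
        double[] values = {85, 80, 83, 70, 68}; // illustrative numbers only
        System.out.println(Arrays.toString(zTransform(values)));
    }
}
```

The input array is left untouched and a new array is returned, so the sketch itself has no side effects on the data it reads.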
2. When I accidentally used the wrong attribute name within FeatureGeneration, no error message appeared; instead I got this wrong result.
<operator name="skip_ijon" class="FeatureNameFilter">
3. Setting the correct names but applying the Sorting operator beforehand causes the same results as in step 2.
<operator name="skip_ijon" class="FeatureNameFilter">
At this point I came to the conclusion that the problem must lurk deep in the RapidMiner entrails ...
<operator name="skip_ijon" class="FeatureNameFilter">
Hope this error description was somehow helpful
greetings
Steffen
Wow, what a wonderfully detailed bug report. I must admit I only skimmed it since I do not have much time today, but I will have a closer look at it on Monday if nobody else has done so by then ...
Regards,
Tobias
I just checked out the 4.2 release and it seems that this bug is still there. I will open a ticket now, because I guess it is easier to keep track of such things given the huge amount of work you have to do. I thought about it before, but I didn't want to be annoying.
Besides this ... keep up the good work!
greetings
Steffen
thanks for the reminder. We indeed missed this, sorry.
Cheers,
Ingo
I have found a bug in FeatureGeneration too (maybe the same one?), but strangely there is the same kind of bug in AttributeConstructionLoader.
The basic idea of my experiment is to merge two lexical matrices in text mining: I have 10 documents in ".doc" format and 13 in "pdf". I use a "TextInput" subtree for each, but I have to merge two example sets with different lines and different attributes.
I have tried "ExampleSetMerge/Join/Cartesian"; none of them is satisfactory. Now I have tried AttributeConstructionLoader and FeatureGeneration, both using the "keep all=true" and "filepath= true" options, but I get this message:
"The function name 'const' must be used with empty arguments".
Here is my experiment:
Is this the known behaviour Steffen has been talking about?
Cheers,
Jean-Charles.
So, back again to feature generation. First, some comments on Steffen's report and his conclusion that the problem lurks deep in RapidMiner: yes, it does. Very deep. We have two different data structures (actually only one data structure and one view structure) for the data we handle: first, the ExampleTable, which actually holds the data, and second, the ExampleSets, which define views on the underlying tables. All operators work on the ExampleSets with one exception: the feature generation operators work directly on the tables, for performance reasons and to easily share newly generated attributes among views without the need for re-creation. This is, for example, useful for the evolutionary feature construction approaches.
However, changing the underlying table columns without "notifying" the view columns (attributes) might lead to some strange behaviour. For that reason, one simply has to copy the attribute (I kept the renaming) as in the following process. Then it works with both attribute names in the construction:
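Independent of the concrete process setup, the kind of side effect described here can be illustrated with a minimal plain-Java sketch. It does not use the RapidMiner API and is not meant to mirror its internals: the `Table` and `View` classes, `scaleInPlace`, and `copyAttribute` are all hypothetical stand-ins for "one table holding the data", "views mapping attribute names to columns", and "a table-level operation that bypasses the views".

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; these classes are NOT the RapidMiner ExampleTable/ExampleSet.
class TableViewSketch {

    /** Holds the actual data, one double[] per column. */
    static class Table {
        final List<double[]> columns = new ArrayList<>();

        /** Appends a column and returns its index. */
        int addColumn(double[] data) {
            columns.add(data);
            return columns.size() - 1;
        }
    }

    /** A view maps attribute names to column indices; it stores no data itself. */
    static class View {
        final Table table;
        final Map<String, Integer> attributes = new HashMap<>();

        View(Table table) {
            this.table = table;
        }

        double[] get(String name) {
            return table.columns.get(attributes.get(name));
        }

        /** Copies an attribute into a fresh table column registered under a new name. */
        void copyAttribute(String from, String to) {
            attributes.put(to, table.addColumn(get(from).clone()));
        }
    }

    /** Table-level operation: overwrites column data in place, without telling any view. */
    static void scaleInPlace(Table table, int columnIndex, double factor) {
        double[] column = table.columns.get(columnIndex);
        for (int i = 0; i < column.length; i++) {
            column[i] *= factor;
        }
    }

    public static void main(String[] args) {
        Table table = new Table();
        int x = table.addColumn(new double[]{1.0, 2.0, 3.0});

        View view = new View(table);
        view.attributes.put("x", x);

        // Workaround in the spirit of the post: copy the attribute before the table is modified.
        view.copyAttribute("x", "x_copy");
        scaleInPlace(table, x, 10.0);

        System.out.println(view.get("x")[0]);      // reflects the table-level change
        System.out.println(view.get("x_copy")[0]); // the copy keeps the original value
    }
}
```

Because the view only stores a name-to-column mapping, any in-place change to a shared column shows up in every view that refers to it; copying gives the view a column that the table-level operation does not touch.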
About the attribute construction loading: please use the operator "AttributeConstructionLoader" instead. The file parameter of the "FeatureGeneration" operator is more or less deprecated (unfortunately, we cannot mark parameters as such) and is only kept for backwards compatibility.
However, just a small comment on the whole feature generation topic: we will revise the feature generation algorithms by the next release anyway, in order to ease the generation process and allow more generation types.
Cheers,
Ingo
Thank you for the workaround! This would be nice. Did you consider using a language like JavaScript for user-defined functions? Something like the "Modified Java Script Value" in Pentaho Kettle? Besides "click-it-together" functions, it would be nice to have something powerful for users with a stronger programming background.
greetings
Steffen
We actually also thought of a scripting engine for user-defined functions, which should be supported in Java 6 anyway (at least JavaScript should be supported).
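As a rough idea of what such a user-defined function could look like with the `javax.script` API introduced in Java 6, here is a minimal sketch. The class name, the `apply` method, and the convention of binding the input to a variable called `x` are assumptions made for illustration; nothing here is RapidMiner code. Note that JDK 15 and later no longer bundle a JavaScript engine, in which case the sketch simply returns an empty result.

```java
import java.util.Optional;
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;
import javax.script.ScriptException;

// Hypothetical illustration of a scripted user-defined function; not part of RapidMiner.
class ScriptedFunctionSketch {

    /**
     * Evaluates a JavaScript expression with the variable x bound to the given value.
     * Returns empty if no JavaScript engine is available on this JVM.
     * Assumes the expression evaluates to a number.
     */
    static Optional<Double> apply(String expression, double x) {
        ScriptEngine engine = new ScriptEngineManager().getEngineByName("JavaScript");
        if (engine == null) {
            return Optional.empty();
        }
        engine.put("x", x);
        try {
            Object result = engine.eval(expression);
            return Optional.of(((Number) result).doubleValue());
        } catch (ScriptException e) {
            throw new IllegalArgumentException("Invalid expression: " + expression, e);
        }
    }

    public static void main(String[] args) {
        Optional<Double> y = apply("(x - 70) / 5", 85.0);
        System.out.println(y.isPresent() ? y.get() : "no JavaScript engine available");
    }
}
```

A real integration would presumably create the engine once and reuse it across examples rather than looking it up on every call.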
For the more "traditional" mathematical functions we currently evaluate JEP:
http://www.singularsys.com/jep/index.html
which would really nicely fit into RapidMiner.
Any thoughts about this?
Cheers,
Ingo