inconsistent behaviour when using replaceAll
kayman
When using the replaceAll function, it seems some nested functions are ignored while others work fine.
As an example:
replaceAll([myField],"^(.)",upper("$1")) just returns the input unchanged, whereas the expected behaviour would be to get the first character returned in upper case. No error is thrown; the upper call is simply ignored.
replaceAll([myField],"^(.)",concat("-","$1","-")) nicely returns a concatenated field, as expected.
Any idea why?
Comments
Hello @kayman - I'm sorry this has sat here for so long. Can you please help with a sample XML and dataset so I can reproduce it?
Thanks.
Scott
Hi @sgenzer,
Thanks for your attention to this.
Below is an example. Basically you will notice that the regex results are stored and used, just not in combination with all of the functions. Hope it helps, it's easier to see than to explain...
Hi @kayman - OK, thanks for the sample. The way you're using the replaceAll function from within Generate Attributes is really interesting. I have never seen RegEx in the third input of this function; the instructions ask for a "nominal replacement", not a "nominal RegEx" as they do for the second term, so I would never have thought to put RegEx there.
I'm going to push this around internally and see what people think. My feeling is that you have discovered an undocumented, rather cool piece of functionality in replaceAll that could be made a documented feature.
Scott
Hey @kayman!
Let's think about how this is supposed to work. (Without having access to the source code.)
replaceAll is a function with three parameters: [myField], regular expression to search, replacement text.
When the function is called, the parameters are evaluated before being sent into it.
So if you use upper("$1"), this evaluates to "$1" (and that is the function parameter). If you use concat("-", "$1", "-"), that will evaluate to "-$1-". This is a correct regexp replacement string, so the $1 will be replaced by the string found by your regexp.
replaceAll can't magically apply arbitrary functions inside the replacement. It takes a replacement string; instead of manipulating that, just manipulate the result of replaceAll.
Your upper("$1") could also go outside of replaceAll: upper(replaceAll([myField],"^(.)","$1"))
But this could be done more easily: upper(prefix([myField], 1))
Regards,
Balázs
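To make the evaluation order described above concrete, here is a minimal sketch in plain Java. It is not RapidMiner's implementation: the class and the helpers replaceAll, upper and concat are hypothetical stand-ins for the expression functions, built on String.replaceAll, and "hello" is a made-up field value.

```java
import java.util.Locale;

final class EagerArguments {
    // Hypothetical stand-in for the expression function replaceAll(text, regex, replacement).
    static String replaceAll(String text, String regex, String replacement) {
        return text.replaceAll(regex, replacement);
    }

    // Hypothetical stand-in for upper(): "$1" contains no letters, so it comes back unchanged.
    static String upper(String s) {
        return s.toUpperCase(Locale.ROOT);
    }

    // Hypothetical stand-in for concat(): plain string concatenation.
    static String concat(String... parts) {
        return String.join("", parts);
    }

    public static void main(String[] args) {
        String myField = "hello"; // made-up sample value

        // The argument is evaluated first: upper("$1") is just "$1".
        System.out.println(upper("$1"));

        // So this call replaces the first character with itself.
        System.out.println(replaceAll(myField, "^(.)", upper("$1")));

        // concat yields "-$1-", which is a valid replacement string.
        System.out.println(replaceAll(myField, "^(.)", concat("-", "$1", "-")));

        // Applying upper to the result of replaceAll upper-cases the whole value.
        System.out.println(upper(replaceAll(myField, "^(.)", "$1")));
    }
}
```

In both replaceAll calls the function only ever receives a finished string; by the time the regex runs, upper has already been applied to the two characters "$1" and has nothing left to do.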
Yeah, I guess that's the advantage of not knowing that something isn't supposed to work and just trying stuff :-)
@BalazsBarany, for the given example the prefix option would indeed work, but my use case was a bit more complex, so I would end up with rather long and nasty code chains. That is why I wanted to try the replaceAll option: it was easier to catch my phrase using regex than the static way.
The point is that my function parameter seems to be accepted as a string sometimes (like when using concat), and is ignored other times (like with the upper function), so it would be really cool if a regex nominal could be used as a regular nominal all over the place. It doesn't throw an error so at least it seems to be accepted, and then ignored.
In the end, if I take the result of my regex search, that becomes a nominal. And if that nominal were treated like any other nominal, it would become a pretty powerful option.
Hi,
for reference, the source code is public: https://github.com/rapidminer/rapidminer-studio/tree/master/src/main/java/com/rapidminer/tools/expression/internal/function/text
And it's also relatively easy to add new functions.
Best,
Martin
Dortmund, Germany
Hey @kayman,
My point is that the function arguments are simple, non-magic strings. I'm sure they are like this in any programming language (maybe aside from Perl, too much magic going on there).
In any function call, the function arguments are evaluated before the value is passed to the function. This is how our programming languages work. So replaceAll sees "$1" from the upper() and "-$1-" from the concat().
This fully explains why concat() works in your example but upper() doesn't.
There is no way to specify that the regexp replacement should apply an arbitrary function to the replacement string inside of replaceAll.
There are languages like Perl (and libraries like PCRE) that support the "\U$1" syntax in the replacement to apply simple transformations like uppercase to the replacement string. But Java doesn't support this, so RapidMiner doesn't either.
Regards,
Balázs
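For anyone tempted to add their own function, as suggested earlier in the thread: Java's replacement strings have no "\U$1" syntax, but since Java 9 Matcher.replaceAll also accepts a callback that computes each replacement. The sketch below uses only the standard library; the class and method name upperFirst are made up for illustration, and nothing here claims that RapidMiner's expression parser exposes such a callback.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class UpperFirst {
    private static final Pattern FIRST_CHAR = Pattern.compile("^(.)");

    // Hypothetical helper: upper-case whatever group 1 captured, per match.
    static String upperFirst(String text) {
        Matcher matcher = FIRST_CHAR.matcher(text);
        // The lambda runs once per match, after the regex has captured group 1.
        // quoteReplacement keeps a captured '$' or '\' from being read as replacement syntax.
        return matcher.replaceAll(result ->
                Matcher.quoteReplacement(result.group(1).toUpperCase(Locale.ROOT)));
    }

    public static void main(String[] args) {
        System.out.println(upperFirst("hello")); // made-up sample value
    }
}
```

The difference from upper("$1") is timing: the callback receives the captured text, whereas a nested function in the argument list only ever sees the literal "$1".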
Fair enough @BalazsBarany, I just like a little magic from time to time ;-)
OK, I think we're all on the same page here. I have pushed this to the documentation folks and I will let them chew on this. Thanks @kayman for always showing me something new!