inconsistent behaviour when using replaceAll

kayman Member Posts: 662 Unicorn

When using the replaceAll function, it seems some nested functions are ignored while others work fine.

As an example :

replaceAll([myField],"^(.)",upper("$1")) just returns the input unchanged, whereas I would expect the first character to come back in upper case. No error is thrown; the upper call is simply ignored.

replaceAll([myField],"^(.)",concat("-","$1","-")) nicely returns a concatenated field, as expected.

Any idea why?


Declined·Last Updated

26 Jul 2019 - no votes or activity since December 2018. Changing status to "Declined". If you want to reopen this idea for voting, please comment and cc @sgenzer. PROD-699

Comments

  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager

    Hello @kayman - I'm sorry this has sat here for so long. Could you share a sample process XML and dataset so I can reproduce it?


    Thanks.


    Scott

  • kayman Member Posts: 662 Unicorn

    Hi @sgenzer,

    Thanks for your attention to this.

    Below is an example. Basically, you will notice that the regex results are stored and used, just not in combination with all of the functions. Hope it helps; it's easier to see than to explain...























    <connect from_op="Generate Attributes" from_port="example set output" to_port="result 1"/>

  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager

    Hi @kayman - OK, thanks for the sample. The way you're using the replaceAll function from within Generate Attributes is really interesting. I have never seen RegEx in the third input of this function; the instructions ask for a "nominal replacement", not a "nominal RegEx" as they do for the second term, so I would never have thought to put RegEx there:

    Screen Shot 2018-03-08 at 5.11.02 PM.png

    I'm going to push this around internally and see what people think. My feeling is that you have discovered an undocumented, rather cool piece of functionality in replaceAll that could possibly become a documented feature.

    Scott

  • BalazsBarany Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert Posts: 841 Unicorn

    Hey @kayman!

    Let's think about how this is supposed to work. (Without having access to the source code.)

    replaceAll is a function with three parameters: [myField], regular expression to search, replacement text.

    When the function is called, the parameters are evaluated before being sent into it.

    So if you use upper("$1"), this evaluates to "$1" (and that is the function parameter). If you use concat("-", "$1", "-"), that will evaluate to "-$1-". This is a correct regexp replacement string, so the $1 will be replaced by the string found by your regexp.
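
A rough Java sketch of that evaluation order. The helper names upper, concat and replaceAll are hypothetical stand-ins written for this example, not RapidMiner's actual classes, and it assumes the replacement string ends up in java.util.regex more or less unchanged:

```java
import java.util.Locale;

class ArgumentEvaluationDemo {

    // Hypothetical stand-in for upper(...): it only ever sees a plain string.
    static String upper(String s) {
        return s.toUpperCase(Locale.ROOT);
    }

    // Hypothetical stand-in for concat(...).
    static String concat(String... parts) {
        return String.join("", parts);
    }

    // Hypothetical stand-in for replaceAll(text, regex, replacement),
    // delegating to Java's regex engine.
    static String replaceAll(String text, String regex, String replacement) {
        return text.replaceAll(regex, replacement);
    }

    public static void main(String[] args) {
        String myField = "hello";

        // Both arguments are fully evaluated before replaceAll runs.
        String upperArg = upper("$1");              // still "$1": '$' and '1' have no upper-case form
        String concatArg = concat("-", "$1", "-");  // "-$1-"

        System.out.println(replaceAll(myField, "^(.)", upperArg));   // hello
        System.out.println(replaceAll(myField, "^(.)", concatArg));  // -h-ello
    }
}
```

By the time the regex engine substitutes the captured character for $1, upper has already returned, so there is nothing left to upper-case.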

    replaceAll can't magically apply arbitrary functions inside the replacement. It takes a replacement string; instead of manipulating that, just manipulate the result of replaceAll.

    Your upper("$1") could also be moved outside of replaceAll: upper(replaceAll([myField],"^(.)","$1"))

    But this could be done more easily: upper(prefix([myField], 1))
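
For comparison, a small Java sketch of what each rewrite returns. The method names are made up for this example, and it assumes prefix(text, n) returns the first n characters. The third method is not from the thread; it shows the "first character upper-cased, rest untouched" result the original question was after:

```java
import java.util.Locale;

class RewriteComparisonDemo {

    // Mirrors upper(replaceAll([myField], "^(.)", "$1")):
    // the replacement is a no-op, then the whole string is upper-cased.
    static String upperOutside(String text) {
        return text.replaceAll("^(.)", "$1").toUpperCase(Locale.ROOT);
    }

    // Mirrors upper(prefix([myField], 1)): only the first character survives.
    static String upperPrefix(String text) {
        int end = Math.min(1, text.length());
        return text.substring(0, end).toUpperCase(Locale.ROOT);
    }

    // Upper-case the first character and keep the rest unchanged.
    static String capitalizeFirst(String text) {
        if (text.isEmpty()) {
            return text;
        }
        return text.substring(0, 1).toUpperCase(Locale.ROOT) + text.substring(1);
    }

    public static void main(String[] args) {
        System.out.println(upperOutside("hello"));     // HELLO
        System.out.println(upperPrefix("hello"));      // H
        System.out.println(capitalizeFirst("hello"));  // Hello
    }
}
```

So the three variants are not interchangeable; which one fits depends on whether the rest of the value has to be kept.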

    Regards,

    Balázs

  • kayman Member Posts: 662 Unicorn

    Yeah, I guess that's the advantage of not knowing that something isn't supposed to work and just trying stuff :-)

    @BalazsBarany, for the given example the prefix option would indeed work, but my use case was a bit more complex, so I would end up with rather long and nasty code chains. That is why I wanted to try the replaceAll option: it was easier to catch my phrase using regex than the static way.

    The point is that my function parameter seems to be accepted as a string sometimes (like when using concat), and is ignored other times (like with the upper function), so it would be really cool if a regex nominal could be used as a regular nominal all over the place. It doesn't throw an error, so at least it seems to be accepted, and then ignored.

    In the end, if I take the result of my regex search, that becomes a nominal. And if that nominal were treated like any other nominal, it would become a pretty powerful option.

  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,287 RM Data Scientist

    Hi,

    for reference, the source code is public: https://github.com/rapidminer/rapidminer-studio/tree/master/src/main/java/com/rapidminer/tools/expression/internal/function/text

    And it's also relatively easy to add new functions.

    Best,

    Martin

    - Head of Data Science Services at RapidMiner -
    Dortmund, Germany
  • BalazsBarany Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert Posts: 841 Unicorn

    Hey @kayman,

    My point is that the function arguments are simple, non-magic strings. I'm sure they are like this in any programming language (maybe aside from Perl, too much magic going on there).

    In any function call, the function arguments are evaluated before the value is passed to the function. This is how our programming languages work. So replaceAll sees "$1" from the upper() and "-$1-" from the concat().

    This fully explains why concat() works in your example but upper() doesn't.

    There is no way to specify that the regexp replacement should apply an arbitrary function to the replacement string inside of replaceAll.

    There are languages like Perl (and libraries like PCRE) that support the "\U$1" syntax in the replacement to apply simple transformations like uppercase to the replacement string. But Java doesn't support this, and therefore RapidMiner doesn't either.
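
A short sketch of the Java side of this, using only java.util.regex. The class and method names are made up for the example, and this is plain Java rather than something the Generate Attributes expression parser exposes. In a Java replacement string the backslash just escapes the next character, so "\U" becomes a literal "U". Since Java 9, Matcher.replaceAll also accepts a function that computes the replacement for each match, which is roughly the callback behaviour asked about in this thread:

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CaseConversionDemo {

    private static final Pattern FIRST_CHAR = Pattern.compile("^(.)");

    // Perl-style "\U$1" handed to Java: the backslash only escapes the 'U',
    // so a literal "U" is inserted in front of the captured character.
    static String perlStyleAttempt(String text) {
        return text.replaceAll("^(.)", "\\U$1");
    }

    // Java 9+: the replacement is computed per match by a function.
    // quoteReplacement guards against '$' or '\' in the matched text.
    static String upperFirstChar(String text) {
        return FIRST_CHAR.matcher(text).replaceAll(
                m -> Matcher.quoteReplacement(m.group(1).toUpperCase(Locale.ROOT)));
    }

    public static void main(String[] args) {
        System.out.println(perlStyleAttempt("hello"));  // Uhello
        System.out.println(upperFirstChar("hello"));    // Hello
    }
}
```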

    Regards,

    Balázs

  • kayman Member Posts: 662 Unicorn

    Fair enough @BalazsBarany, I just like a little magic from time to time ;-)

  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager

    OK, I think we're all on the same page here :) I have pushed this to the documentation folks and will let them chew on it. Thanks @kayman for always showing me something new!
