"Split text on case - RegEx drops letters"

mob · January 2016

I have some text that comes in different cases. Sometimes tokens are TextendNewtext, other times textendNewtext. I found this regex online for Python:
([A-Z])([A-Z])([a-z])|([a-z])([A-Z])

But when I apply it to my dataset using the Replace Tokens operator, with ([A-Z])([A-Z])([a-z])|([a-z])([A-Z]) replaced by $1 $2, I get
Texten ewtext

I'm far from an expert in regex in any flavour, but can anyone help me resolve this?

mob · February 2016

For reference, this is the StackOverflow post I got the regex from: http://stackoverflow.com/questions/15369566/putting-space-in-camel-case-string-using-regular-expression

Is there a way in RapidMiner to handle CamelCaseTextOfVariousLengths and split it into tokens?

MartinLiebig · February 2016

Hi mob,

Why not simply use Replace and replace capital letters with white space followed by the letter? Seems to work.
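A minimal Python sketch of that idea, for anyone who wants to try it outside RapidMiner. The function name split_on_capitals is made up for illustration, and Python's re module writes the group reference as \1 where the Replace operator uses $1.

```python
import re


def split_on_capitals(text: str) -> str:
    """Insert a space before every capital letter (hypothetical helper)."""
    # The parentheses capture the capital letter so that \1 can put it back.
    spaced = re.sub(r"([A-Z])", r" \1", text)
    # A token that starts with a capital gets a leading space; strip it.
    # Note that strip() also removes any other leading/trailing whitespace.
    return spaced.strip()


print(split_on_capitals("TextendNewtext"))  # Textend Newtext
print(split_on_capitals("notcamelCase"))    # notcamel Case
```

Runs of capitals (for example an acronym) would be split letter by letter with this pattern, so it fits best when every capital starts a new word.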

~Martin

BalazsBarany · February 2016

The problem with your replacement is the following:

In the regular expression
([A-Z])([A-Z])([a-z])|([a-z])([A-Z])
all the parentheses are numbered. You're trying to replace by the value coming from the first and second parentheses, which would be two capital letters if your text matched that. It doesn't, so it goes to the second (alternate) match after the pipe symbol. But the contents of those parentheses are not in the replacement string. They would be $4 and $5.
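The numbering can be checked with a small Python sketch. It assumes Python 3.5 or later, where a group that did not take part in the match is substituted as an empty string, and it uses \1 where the Replace Tokens operator uses $1. The helper name replace_with and the template that combines all five groups are only illustrations, not something taken from the operator itself.

```python
import re

PATTERN = r"([A-Z])([A-Z])([a-z])|([a-z])([A-Z])"


def replace_with(template: str, text: str) -> str:
    """Apply the thread's pattern with a given replacement template."""
    return re.sub(PATTERN, template, text)


# Only the second alternative matches "dN", so groups 1-3 are unset.
print(re.search(PATTERN, "TextendNewtext").groups())
# (None, None, None, 'd', 'N')

# Groups 1 and 2 are empty here, so "dN" is replaced by a lone space.
first_pair = replace_with(r"\1 \2", "TextendNewtext")
print(first_pair)   # Texten ewtext

# Groups 4 and 5 hold the letters that were actually matched.
second_pair = replace_with(r"\4 \5", "TextendNewtext")
print(second_pair)  # Textend Newtext

# Referencing all five groups covers whichever alternative matched.
all_groups = replace_with(r"\1\4 \2\3\5", "HTMLParser")
print(all_groups)   # HTML Parser
```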

Martin's approach seems to be what you want.

mob · February 2016

Thanks Martin and Balázs. Martin's simple solution did exactly what I needed, and it also handled tokens like notcamelCase.

AndreasS · February 2017

Hi Martin, hi everybody

I am facing the same problem as mob. Unfortunately I couldn't solve it using your comment from 02-01-2016.

Problem:

I want to separate the following text: "PleaseSeparateMeByCapitalLetters" into "Please Separate Me By Capital Letters"

I tried to use the Replace Tokens operator

- replace what: [A-Z]

- replace by: $1

However the result is " lease eparate e y apital etters".

Thanks in advance for your help

Andreas

BalazsBarany · February 2017

The $1 refers to a "capture expression". You define capture expressions with (). If there is none in the "replace what" part, then $1 will be empty.
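For comparison, here is a rough Python sketch of the same mistake and two ways around it. Python's re module behaves differently from what is described above: it raises an error for a reference to a group that does not exist instead of substituting an empty string. The helper try_sub is hypothetical and only there to keep the example short.

```python
import re

TEXT = "PleaseSeparateMeByCapitalLetters"


def try_sub(pattern: str, template: str) -> str:
    """Run re.sub and report a regex error as text instead of raising."""
    try:
        return re.sub(pattern, template, TEXT)
    except re.error as exc:
        return f"error: {exc}"


# No parentheses in the pattern, so there is no group 1 to refer to.
no_group = try_sub(r"[A-Z]", r" \1")

# Wrapping the character class in () defines group 1.
with_group = try_sub(r"([A-Z])", r" \1")

# Alternatively, \g<0> refers to the whole match and needs no parentheses.
whole_match = try_sub(r"[A-Z]", r" \g<0>")

print(no_group)
print(with_group)
print(whole_match)
```

Both working variants still produce a leading space, because the first letter of the text is a capital too.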

kayman · February 2017

Try this: ([A-Z])(.), replaced by $1$2, and make sure there is a space before the $1. It also adds a space before the first word, but you can remove that one again with a second regex or by trimming the string.

So like this:

- replace what: ([A-Z])(.)

- replace by: " $1$2" (with the leading space, without the quotes)

Probably not the most sexy solution but plain simple sometimes does the trick also.
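The same two steps (replace, then trim) could look roughly like this in Python. This is only a sketch: the function name split_capital_plus_next is invented, and \1\2 stands in for $1$2.

```python
import re


def split_capital_plus_next(text: str) -> str:
    """Space before each capital letter that is followed by another character."""
    # Group 1 is the capital letter, group 2 is whatever character follows it.
    spaced = re.sub(r"([A-Z])(.)", r" \1\2", text)
    # Second step from the post: drop the space added before the first word.
    return spaced.strip()


print(split_capital_plus_next("PleaseSeparateMeByCapitalLetters"))
# Please Separate Me By Capital Letters

# A capital at the very end has no following character, so it is not split.
print(split_capital_plus_next("planB"))  # planB
```

Because (.) consumes the character after the capital, two capitals in a row are treated as one match and stay together.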

AndreasS · February 2017

Thank you !!!


Altair RapidMiner Community
