"[SOLVED] How to Split an Attribute and Keep the Split Character?"

jan_kvacek Member Posts: 4 Contributor I
edited June 2019 in Help
Hello!

I am having trouble splitting attributes in RapidMiner Studio. My attribute looks like this:

"A002W0541G001"

I need to split it into several new attributes:

"A002" "W0541" "G001" and so on.

But Split always drops the character I use to determine where to split the original attribute. Is there any way to keep it?

Thank you for your help!

Jan

Answers

  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,404 RM Data Scientist
    If it's always 4 chars, 5 chars, 5 chars, you might simply use Generate Attributes with cut?
    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
  • jan_kvacek Member Posts: 4 Contributor I
    Martin Schmitz wrote:

    If it's always 4 chars, 5 chars, 5 chars, you might simply use Generate Attributes with cut?
    Unfortunately it is not. I need to do something like "find a letter, take the letter and all the digits after it, and make that a new attribute".
  • JEdward RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 578 Unicorn
    It sounds like you just need the right RegEx.
    Assuming you have a pattern of [Letter+Numbers][Letter+Numbers] then this works: "(?<=[0-9]++)(.*?)(?=[A-Z])"
    A positive lookbehind checks that there are numbers before the split point, and a lookahead checks that a letter follows it. Anything in between is used to split; since nothing sits between a digit and the next letter, no character is dropped.

    Sample process below:

    <operator activated="true" class="split" compatibility="7.1.000-BETA" expanded="true" height="82" name="Split" width="90" x="447" y="34">
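The lookaround idea in the answer above can be tried outside RapidMiner. The following is a minimal sketch in Python, not the original process: RapidMiner's regular expressions follow Java syntax, and Python's `re` module only accepts fixed-width lookbehinds, so the pattern here is a simplified variant (`(?<=[0-9])(?=[A-Z])`) rather than the exact expression from the post. The function names `split_keep_letter` and `find_segments` are hypothetical, and splitting on an empty match assumes Python 3.7 or later.

```python
import re
from typing import List

# Zero-width split point: a digit on the left, an uppercase letter on the right.
# Nothing is consumed, so the letter stays with the segment that follows it.
BOUNDARY = re.compile(r"(?<=[0-9])(?=[A-Z])")

# Alternative view of the same task: match each "letter plus digits" group directly.
SEGMENT = re.compile(r"[A-Z][0-9]+")


def split_keep_letter(value: str) -> List[str]:
    """Split at every digit-to-letter boundary without dropping any character."""
    return BOUNDARY.split(value)


def find_segments(value: str) -> List[str]:
    """Collect every uppercase letter together with the digits that follow it."""
    return SEGMENT.findall(value)


if __name__ == "__main__":
    print(split_keep_letter("A002W0541G001"))  # ['A002', 'W0541', 'G001']
    print(find_segments("A002W0541G001"))      # ['A002', 'W0541', 'G001']
```

Both functions give the same result on the example value. They differ on input that does not fit the letter-plus-digits shape: the split keeps every character, while the match silently skips anything that is not a letter followed by digits.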
  • JEdward RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 578 Unicorn
    On a side note... does anyone happen to know the right RegEx to split into n-grams? I want one that splits a nominal value like "RapidMiner" into "Ra ap pi id dM Mi in ne er". When I try it I always get "Ra pi dM in er", which isn't right. I wrote a rather complex loop to do it instead, but would prefer to do it with one operator.
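One common way to get overlapping n-grams from a regex is to put a capturing group inside a lookahead, so that each match consumes nothing and the next match can start one character later. An ordinary pattern such as `..` consumes both characters, which is why the result skips every other pair ("Ra pi dM in er"). The sketch below shows the idea in Python; the function name `char_ngrams` is hypothetical, and whether the same pattern can be used in a single RapidMiner operator is not something this example settles.

```python
import re
from typing import List


def char_ngrams(text: str, n: int = 2) -> List[str]:
    """Return all overlapping character n-grams of `text`.

    The lookahead (?=(.{n})) matches at every position where n characters
    follow, captures them, and consumes nothing, so matches can overlap.
    """
    pattern = re.compile(rf"(?=(.{{{n}}}))")
    return pattern.findall(text)


if __name__ == "__main__":
    print(" ".join(char_ngrams("RapidMiner")))     # Ra ap pi id dM Mi in ne er
    print(" ".join(char_ngrams("RapidMiner", 3)))  # Rap api pid idM dMi Min ine ner
```

Note that `.` does not match a newline by default, so values containing line breaks would need the `re.DOTALL` flag.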
  • jan_kvacek Member Posts: 4 Contributor I
    JEdward wrote:

    It sounds like you just need the right RegEx.
    Assuming you have a pattern of [Letter+Numbers][Letter+Numbers] then this works: "(?<=[0-9]++)(.*?)(?=[A-Z])"
    A positive lookbehind checks that there are numbers before the split point, and a lookahead checks that a letter follows it. Anything in between is used to split.
    This does exactly what I needed! Thank you.
  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager
    Very nice, thanks. This is something I face often. Maybe a feature request to simply add a checkbox option to keep the split text instead of removing it? ;)