"Split string with n characters into n columns with each cell with only one character"

komal_chenthama Member Posts: 3 Contributor I
edited June 2019 in Help

I have around 700 rows with strings of varying length from 50 to 2000. They look like this:

MRILTWAITLLSLACFSLTEKYCYYPNGQIAVSDSPCNPNADDSACCDGDKGMMCMSNNLCRGPGGTTVRSSCTDKSWDSTACAALCMTENTVPADLTSCANVTGSDTTYCCDNHRVPCCDASIARFDVLPSKPQIFAIWDDSASAYLSINLPGTATTTATTTTSSSPAYPTDPPPSNTQPSSTPSNPPSPDAASAAALSLAVQAGIGVGAAVLALALAVVVYLVVKLRRNKNAVLAAGQRGQAGAVHGQYQGGVGVGGYDGWENKHMDKNGGGVGNGGGGAAAWYHPPAYGEPYHGGSGFGVVPRQELDAWPSVGYGQPRRQRHRQSHGQGYVQRFELPATPLGAPRRAF
MKTPLIFLLHLGLLQTCLGKKCYYPGGEEAPGDLPCDTEAEHSPCCAGGKIAGACLANKLCLAKGNPDWYARGSCTDPTFEAPECPKFCLSHEGRGWNLDYCFSQTGSETAFCCEGDANCCAAGRLEIQPAPTHVWALWNGAVSRYDVVTPLGTAKETSAPTSSATSSGTTSDAVEHSSTETTSASTTGTAAGGDRSDATGSANSNSNANSNESTGLSTGAQAGIGVGAAAGALLLAAVAFLWWRMNRMQKAMLVAQQQAAAAYPPPETPAYYSRTPAEKHELMAERPTHELAGQHYYVQGDTRSAELSSQPAYTPVESPAAGRNYGP
MRSVYIALAAALCWTGTLSASPAGAKDDVEVAMMAGRRRLTRTSGRYRSEFAALGARQGDQQCGAQFGRCPGDLCCSSYGFCGDSVDHCHPLFDCQTQYGTCGWPRAVPTTSARPTTSSTPAPPTTTTPSSTSVRPPTTSTSVTIPVPSGGLEVTQNGMCGNNTMCIGNPNYGPCCSQFFWCGSSIEFCGAGCQSDFGACLGIPGQPGNPITNGTTTSGGGSGPTSSPPTTRPTSTRVSTTTTTTTSSRTTSSSPSVTLPAGQTSSTDGRCGNNVNCLGSRFGRCCSQFGYCGDGDQYCPYIVGCQPQFGYCDPQ

I would like to split each string into n columns (where n is the length of the string in that cell), so that each cell contains only one character. This should be done for all rows. Each letter is then to be replaced by a specific score (a decimal). How can this be achieved? Please help.

Answers

  • kypexin Moderator, RapidMiner Certified Analyst, Member Posts: 290 Unicorn

    Hi @komal_chenthama

    This should not be too complicated using a Loop operator with Generate Macro + Generate Attributes inside it: the macro is just a counter of loop iterations, and each new attribute takes a substring of length 1 at the position equal to the current iteration number.

    But the question is whether it would be possible to make all strings the same length beforehand by padding them with a dummy or special character. Otherwise each example would generate a different number of attributes (equal to its string length), and you may end up with an error. Honestly, I am afraid I cannot come up with a very quick solution for that in RapidMiner, at least at the moment.
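To make the padding and position-by-position idea concrete, here is a minimal sketch in plain Python rather than RapidMiner. The function name `split_by_position` and the pad character `-` are assumptions made for illustration; they do not come from the thread, and the pad character must be one that never occurs in the data.

```python
# Sketch only: plain Python standing in for the Loop / Generate Attributes idea.
PAD = "-"  # assumed dummy character; pick one that never occurs in the data


def split_by_position(strings, pad=PAD):
    """Return one row per string, with one single-character cell per position."""
    width = max((len(s) for s in strings), default=0)
    # Pad every string to the longest length so all rows get the same columns.
    padded = [s.ljust(width, pad) for s in strings]
    rows = [[] for _ in padded]
    # One iteration per new column, playing the role of the loop counter macro.
    for position in range(width):
        for row, s in zip(rows, padded):
            # Substring of length 1 starting at the current position.
            row.append(s[position:position + 1])
    return rows


rows = split_by_position(["MRILT", "MKT", "MRSVYIA"])
```

Every row ends up with as many cells as the longest string, and shorter strings are filled with the pad character, which avoids rows producing different numbers of columns.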

  • lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn

    Hi @komal_chenthama,

    I like this kind of problem!

    The trick here is to replace the empty positions between the letters with "-" (in other words, to insert "-" between the letters) and then use the Split operator with the "-" pattern:





    <macros/>























    <connect from_op="Replace" from_port="example set output" to_op="Split" to_port="example set input"/>
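The same replace-then-split trick can be sketched outside RapidMiner. The Python below is an illustration under stated assumptions, not the process from this answer: the function names are made up, the delimiter is assumed never to occur in the strings, and the values in `EXAMPLE_SCORES` are placeholders, since the thread does not give the real scores.

```python
import re

# Placeholder scores for illustration only; the thread gives no real values.
EXAMPLE_SCORES = {"M": 0.1, "R": 0.2, "I": 0.3, "L": 0.4, "T": 0.5}


def split_with_delimiter(text, delimiter="-"):
    """Insert a delimiter between adjacent characters, then split on it."""
    if not text:
        return []
    # The lookarounds match the empty position between two characters,
    # so "MRI" becomes "M-R-I" with no leading or trailing delimiter.
    delimited = re.sub(r"(?<=.)(?=.)", delimiter, text)
    return delimited.split(delimiter)


def to_scores(letters, scores=EXAMPLE_SCORES):
    """Replace each letter by its score; letters without a score become None."""
    return [scores.get(letter) for letter in letters]


letters = split_with_delimiter("MRILT")
values = to_scores(letters)
```

Mapping the letters through a lookup table covers the last part of the question (replacing each letter by a decimal score); in practice the table would hold one entry per letter that occurs in the data.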







    Does this process meet your needs?

    Regards,

    Lionel
