Letter count in sequence

BDk Member Posts: 3 Newbie
Hi, I'm quite new to the software. I would like to count the number of letters in an arbitrary sequence (e.g. GGGAATCGTCA), for example how many times 'A' occurs in it, and put the result into a new column. Is there an operator that could be used for this? Thank you in advance!

Answers

  • MarcoBarradas Administrator, Employee, RapidMiner Certified Analyst, Member Posts: 270 Unicorn
    Hi @BDk

    You can use a Process Documents operator, split the text into tokens inside it, and set it to count occurrences.

    <?xml version="1.0" encoding="utf-8"?><process version="9.10.011">


  • BDk Member Posts: 3 Newbie
    Do I need an extension for this 'Process Documents' operator? I have an educational version of the software and could not find the operator.
  • BDk Member Posts: 3 Newbie
    OK, found the extension, sorry. It works fine for one row, thanks Marco. Can it be applied to multiple rows?
    I have a table with 1000+ rows, each containing a letter sequence like the one I posted above. I would like to count the letters in each row separately. With the posted solution, though, it either works for only one row or, if I enter all 1000+ sequences via 'Create Document', it counts the letters across all rows together.
  • MarcoBarradas Administrator, Employee, RapidMiner Certified Analyst, Member Posts: 270 Unicorn
    @BDk

    You'll need to use Process Documents, Process Documents from Data, or Process Documents from Files, depending on how your data was originally collected.
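To check what the per-row result should look like, independent of any RapidMiner operator, here is a minimal Python sketch of the counting logic. It is not the process posted above: the function names `count_letters` and `count_letters_per_row` and the default alphabet `"ACGT"` are assumptions made for illustration only.

```python
from collections import Counter


def count_letters(sequence, alphabet="ACGT"):
    """Return how often each letter of `alphabet` occurs in one sequence."""
    # Counter returns 0 for letters that never appear, so every
    # letter of the alphabet gets a value (one future column each).
    counts = Counter(sequence.upper())
    return {letter: counts[letter] for letter in alphabet}


def count_letters_per_row(sequences, alphabet="ACGT"):
    """Count letters separately for each row, never across rows."""
    return [count_letters(seq, alphabet) for seq in sequences]


if __name__ == "__main__":
    rows = ["GGGAATCGTCA", "ATAT"]  # stand-in for the 1000+ row table
    for seq, counts in zip(rows, count_letters_per_row(rows)):
        print(seq, counts)
```

Each row produces its own dictionary of counts, which is the behaviour asked for in the question: one set of letter counts per sequence rather than one total over the whole table.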
