"How to filter lines with regexp with RapidMiner?"

pocakka  Member  Posts: 1  Contributor I
edited June 2019 in Help
Hello!
I have ten million txt files in a folder (about 100 KB per file), and I would like to filter particular lines out of these files.
In UltraEdit I use this regexp:
<strong class="name".* id -.*
My problem is the large number of files: UltraEdit fails when it has to process that many...

How can I filter them? Could RapidMiner do it?
The process I have in mind is this:
1. Filter the lines matching this regexp from the ten million txt files:
<strong class="name".* id -.*
2. Write the filtered lines to a new txt file.

Can you help solve my problem?
Thanks,
Attila



Answers

  • MariusHelf  RapidMiner Certified Expert, Member  Posts: 1,869  Unicorn
    Hi,

    you can use the Text Processing extension to filter the files. Please have a look at the attached process: inside the Process Documents operator, the Tokenize operator splits each document into separate lines, and the next operator, Filter Tokens, keeps only the lines containing the word "hallo".

    Best, Marius

    [Attached process XML: not preserved in this copy]
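
For readers without the attached process, the same idea (split each file into lines, keep only the lines that match) can be sketched in plain Python. This is not the RapidMiner process from the answer, only an illustration. The function names, the `.txt` filter and the UTF-8 encoding are assumptions, and `PATTERN` is an assumed reading of the regexp in the question, so replace it with the real one.

```python
import os
import re
from typing import Iterator

# Assumed reading of the regexp from the question; replace with the real pattern.
PATTERN = re.compile(r'<strong class="name".* id -.*')


def matching_lines(folder: str, pattern: re.Pattern) -> Iterator[str]:
    """Yield every line (without its newline) in the folder's .txt files that matches."""
    # os.scandir yields directory entries lazily, so a folder with millions
    # of files is never loaded into one big list.
    with os.scandir(folder) as entries:
        for entry in entries:
            if not (entry.is_file() and entry.name.endswith(".txt")):
                continue
            # Read line by line so only one line is held in memory at a time.
            with open(entry.path, encoding="utf-8", errors="replace") as handle:
                for line in handle:
                    if pattern.search(line):
                        yield line.rstrip("\n")


def filter_folder(folder: str, out_path: str, pattern: re.Pattern = PATTERN) -> int:
    """Write all matching lines to out_path and return how many were written.

    Keep out_path outside `folder`, otherwise the output file would be
    scanned as an input as well.
    """
    count = 0
    with open(out_path, "w", encoding="utf-8") as out:
        for line in matching_lines(folder, pattern):
            out.write(line + "\n")
            count += 1
    return count
```

A call such as `filter_folder("input_folder", "filtered.txt")` (both paths hypothetical) would collect the matching lines into a single new file, which corresponds to step 2 in the question.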













