"Any Text Processing 5 extension examples?"

thomas0221, Member, Posts: 4, Contributor I
edited June 2019 in Help
Dear RapidMiner Experts,

I was able to get RapidMiner 4.6 and Text Plugin 4.6 working with the help of "rapidminer-text-4.6-tutorial.pdf" and "rapidminer-text-4.6-examples.zip", along with other online resources, including the discussions in this user forum. However, when I try basic text mining tasks (such as those based on the ideas in "rapidminer-text-4.6-examples.zip") in RapidMiner 5 with the Text Processing 5 extension, I have no luck. It seems that some members of this forum have figured out how to use the Text Processing 5 extension in RapidMiner 5 for basic tasks that we can accomplish in V4.6. So I wonder whether some of the experts could share working examples of text mining process XML files for RapidMiner 5. I understand that the RapidMiner 5 product team has limited resources and time, so they have not yet had a chance to provide a complete tutorial and examples for the Text Processing 5 extension (for the same reason, V4.6 has a Web Crawler but V5 does not yet). I hope some community members can help by sharing sample XML files of their text mining processes. I would greatly appreciate the help. Documentation, tutorials, and examples are the single most important factor in whether one gets the software to work.

By the way, I have been using RapidMiner for only about 10 days and I am impressed with its rich features. In RapidMiner 5 I like the new flow design (compared to V4.6's tree process), the meta-data availability on the design page, and the quick-fix suggestions. However, I find that a process designed in RapidMiner 4.6 cannot be imported into RapidMiner 5, and RapidMiner 5's process XML files cannot be opened in V4.6. I understand that the changes from V4.6 to V5 were significant: many operators were renamed and reorganized to be more logical. I guess one way to get a V4.6 process working in V5 is simply to redesign it from scratch in V5.

Thanks,
Thomas

Answers

  • jennylynnoh, Member, Posts: 1, Contributor I
    I'm in the same boat. I've been wrangling version 5 for a while now, and the farthest I've gotten is being able to set up the processes. However, when I review the findings, it says that every row has 0 tokens. I must be missing something, but I have no idea what. If I do manage to figure it out, I'll be sure to post a tutorial with screenshots on my blog.

    -Jen
  • land, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member, Posts: 2,531, Unicorn
    Hi all,
    in general, RapidMiner 4.x process files can be imported into RapidMiner 5.0 very well. We put a huge effort into writing an import mechanism, although the process structure has changed completely and several operators were redesigned to make their parameter settings more user-friendly and understandable. We wrote import rules even for the old plugins, and we would have done the same for the Text Plugin. Unfortunately, we found the old version much too limited and hard to maintain, and it didn't fit well into the RapidMiner design based on IO Objects, because it tended to write everything into temporary files. So we decided to redesign it from scratch, keep the best ideas (and there were many), and combine them with an up-to-date way of handling data objects. The result is a more flexible, more powerful, and much faster (!) extension that unfortunately changed so much that old processes couldn't be adapted automatically. So you only need to redesign processes that contain operators of the former Text Plugin.

    Here I will give you a basic example of how to work with the Text Processing Extension. The process below loads data that contains two attributes of type text. They are selected for vector creation by the specify weights parameter of the Process Documents operator.

    Inside the Process Documents operator, all letters are first changed to lower case, then the texts are split into single tokens, and finally the tokens are stemmed. Each token of the document finally delivered to the Process Documents operator becomes part of the word list and hence a single attribute in the resulting word vector that forms the example set.
    During this transformation, meta data might be attached to the documents. If you set a breakpoint inside the Process Documents operator, you will see all meta data to the right of the text. This meta data is added as additional attributes to the resulting ExampleSet if the add_meta_information parameter of the Process Documents operator is checked.
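
For readers who want to see the idea outside RapidMiner, here is a minimal Python sketch of the same three steps (lower-casing, tokenizing, stemming) followed by building a word list and word vectors. It is only an illustration under stated assumptions, not the extension's implementation: `toy_stem` is a made-up, crude suffix stripper standing in for a real stemmer, the vectors hold plain term counts, and the two sample sentences are invented.

```python
import re
from collections import Counter


def toy_stem(token):
    # Hypothetical stand-in for a real stemmer: strips a few English suffixes.
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


def process_document(text):
    lowered = text.lower()                      # step 1: transform cases
    tokens = re.findall(r"[^\W\d_]+", lowered)  # step 2: tokenize on non-letters
    return [toy_stem(t) for t in tokens]        # step 3: stem each token


def word_vectors(documents):
    processed = [process_document(d) for d in documents]
    # Every distinct token becomes one entry of the word list (one attribute).
    word_list = sorted({t for doc in processed for t in doc})
    vectors = []
    for doc in processed:
        counts = Counter(doc)
        vectors.append([counts[w] for w in word_list])  # term counts per document
    return word_list, vectors


# Invented sample documents, one per row of the input data.
docs = ["Friendly staff, friendly service", "Rooms cleaned daily"]
word_list, vectors = word_vectors(docs)
print(word_list)
for row in vectors:
    print(row)
```

Each row of `vectors` corresponds to one input document and each column to one entry of `word_list`, which mirrors how each token becomes an attribute of the example set.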

    Here is the process:

    <operator activated="true" class="text:transform_cases" expanded="true" height="60" name="Transform Cases" width="90" x="45" y="30"/>

    And here's what the data looks like:
    role     name              type
    label    score             numeric
    regular  reasons_negative  text
    regular  reasons_positive  text
    regular  customer_age      polynominal
    regular  customer_type     polynominal
    regular  customer_group    polynominal

    and here's a small snippet from the data:
    1 7.3 Hoher Preis für Internetnutzung. Schnelles Hotel - schnell in der City. 41-50 Jahre geschäftlich allein reisend
    2 8.7 Bei dem Preis für´s Frühstück fehlt uns ein wenig der Fisch aber es geht auch mal ohne. Auch nach unserem 3. Besuch in diesem Hotel. Alles in Ordnung, besonders das Personal, immer freundlich, immer hilfsbereit, kurz gesagt immer gut drauf. 51-60 Jahre geschäftlich als Paar reisend


    I hope this helps you get your processes running again. After that, you will discover the new possibilities bit by bit. In any case, we will add a basic tutorial as soon as possible.

    Greetings,
    Sebastian
  • thomas0221, Member, Posts: 4, Contributor I
    Hi Sebastian,

    Thank you very much for your text processing example XML code. Based on your example, I finally figured out how to use the Text Processing extension. What tripped me up (and maybe other newbies) is that the RapidMiner 5 design workspace has parent processes and child sub-processes. I need to navigate from a parent operator (such as Process Documents from Data or cross-validation) to its child sub-process by double-clicking the parent. Then, on the child sub-process page, I can add Tokenize, the stopword filter, stemming, and so on. I should not add these operators at the parent level. This may be why I could not get the RapidMiner 5 Text Processing Extension to work in the first place: I put Process Documents from Data, Tokenize, the stopword filter, stemming, and so on at the same level and tried to connect them. Anyway, this is only my partial understanding and I might be wrong. In the RapidMiner 4.6 Text Plugin, in the tree design mode, everything appears on the same page; moving to RapidMiner 5, I need to understand the parent-child sub-process relationship. In case other users want to see a simple example, I attach my text mining process XML file below. You can change the text directories to your local ones; I use the example data that comes with wvtool-1.1.
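
The parent-child nesting described above can be pictured as a small tree. The Python sketch below is purely illustrative and has nothing to do with RapidMiner's own file format or API: the `Operator` class and the root name "Process" are assumptions made for this example, and only the operator names mentioned in this thread are used.

```python
from dataclasses import dataclass, field


@dataclass
class Operator:
    # Hypothetical model: an operator with an optional child sub-process.
    name: str
    children: list = field(default_factory=list)

    def outline(self, depth=0):
        # Return one indented line per operator, children below their parent.
        lines = ["  " * depth + self.name]
        for child in self.children:
            lines.extend(child.outline(depth + 1))
        return lines


# The text operators live inside the parent operator, not next to it.
process = Operator("Process", [
    Operator("Process Documents from Data", [
        Operator("Transform Cases"),
        Operator("Tokenize"),
        Operator("Stem"),
    ]),
])

print("\n".join(process.outline()))
```

The printed outline shows Transform Cases, Tokenize, and Stem one level below Process Documents from Data, which is the level reached by double-clicking the parent operator.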

    Thomas

    <operator activated="true" class="text:transform_cases" expanded="true" height="60" name="Transform Cases" width="90" x="180" y="30">

    A cross-validation evaluating a decision tree model.
  • land, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member, Posts: 2,531, Unicorn
    Hi Thomas,
    if you are used to the tree, you can add the Tree as a view in RapidMiner 5, too. It gives you an overview of what your process does. There have been only slight changes, because subprocesses are now modeled explicitly instead of the implicit design in RapidMiner 4.x.

    Greetings,
    Sebastian
  • thomas0221, Member, Posts: 4, Contributor I
    Hi Sebastian,

    Thank you for your help. I did find "Tree View" in RapidMiner V5 under "View" > "Show View", so I can use the Tree view in RapidMiner V5.

    In RapidMiner V5, I see a new feature for searching operators by name. I can type in part of the name of an operator that I vaguely remember, and the software finds relevant ones for me. However, in RapidMiner 4.6 I do not see such an operator search filter. Is there any way to search operators in RapidMiner V4.6?

    Moreover, RapidMiner V4.6 has a Box View from which I can export the process design to a JPEG file. In RapidMiner V5, I cannot find such a Box View. So does RapidMiner V5 only support Flow View and Tree View? No Box View anymore?

    Thanks!

    Thomas
  • land, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member, Posts: 2,531, Unicorn
    Hi,
    the search box in RapidMiner 4.6 is below the operator tree, but it is there. Otherwise you can use the new operator dialog, where you can filter and search by various properties.

    The box view is gone now because the data flow is modeled explicitly rather than implicitly, so the process is no longer well defined by the execution order of the operators alone.

    Greetings,
    Sebastian
  • Jepse, Member, Posts: 11, Contributor II
    @Sebastian:
    Can you provide a snippet of 100 rows (or more) from the file "D01 - ProcessedHotelCustomerSatisfaction_de"? I couldn't find it in the sample repository.
    Do you plan to provide samples for the new text processing extension?
  • land, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member, Posts: 2,531, Unicorn
    Hi,
    we already planned to deliver it with the first version...I will see what we can do.

    Greetings,
    Sebastian
  • Jepse, Member, Posts: 11, Contributor II
    Oh, nice! Can't wait for it :-)