Text mining: Filter Stopwords (Dictionary)

kulturevulture Member Posts: 4 Contributor I
edited June 2019 in Help
Hello. I have a process that reads an Excel file, runs Process Documents from Data, and creates a correlation matrix. The process runs fine except when I add Filter Stopwords (Dictionary) after Filter Stopwords (English) and before Filter Tokens (by Length): it seems to ignore my list of stopwords. I've tried both a CSV and a TXT file for the dictionary, but the stopwords still show up in the output. I've set the preference rapidminer.general.encoding to UTF-8, and all the files should be in English. Any ideas?





<macros/>

























<operator activated="true" class="text:tokenize" compatibility="5.2.001" expanded="true" height="60" name="Tokenize" width="90" x="40" y="144"/>













<connect from_op="Transform Cases" from_port="document" to_op="Stem (Snowball)" to_port="document"/>
<connect from_op="Stem (Snowball)" from_port="document" to_op="Tokenize" to_port="document"/>
<connect from_op="Tokenize" from_port="document" to_op="Filter Stopwords (English)" to_port="document"/>
<connect from_op="Filter Stopwords (English)" from_port="document" to_op="Filter Stopwords (Dictionary)" to_port="document"/>
<connect from_op="Filter Stopwords (Dictionary)" from_port="document" to_op="Filter Tokens (by Length)" to_port="document"/>
<connect from_op="Filter Tokens (by Length)" from_port="document" to_op="Generate n-Grams (Terms)" to_port="document"/>
<connect from_op="Generate n-Grams (Terms)" from_port="document" to_port="document 1"/>






<connect from_op="Read Excel" from_port="output" to_op="Process Documents from Data" to_port="example set"/>
<connect from_op="Process Documents from Data" from_port="example set" to_op="Correlation Matrix" to_port="example set"/>
<connect from_op="Correlation Matrix" from_port="example set" to_port="result 1"/>
<connect from_op="Correlation Matrix" from_port="matrix" to_port="result 2"/>
<connect from_op="Correlation Matrix" from_port="weights" to_port="result 3"/>








Answers

  • Nils_Woehler Member Posts: 463 Maven
    Hi,

    Your process seems to be okay. What does your dictionary file look like?

    Best,
    Nils
  • kulturevulture Member Posts: 4 Contributor I
    Hi Nils,
    My dictionary has about 52 words, one per line, all in one column. Do I need a column header or any other formatting? Thanks for responding.
  • kulturevulture Member Posts: 4 Contributor I
    Hi Nils,
    I retyped all the words into a new txt file and it now runs fine. Thanks for your efforts.
  • mehak Member Posts: 6 Contributor I

    Hello, I am having the same problem as described above. I followed all the steps written in the comments and I also have a dictionary of words, but the process returns all the words and does not pick up the words specified in my dictionary, which is what I actually want.

    Regards.

    code.rmp 7.9K
  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    Did you create a new text file, like the previous poster?

  • mehak Member Posts: 6 Contributor I

    Thanks, T-Bone. Yes, I created a text file for the vocabulary.

    Regards,

  • mehak Member Posts: 6 Contributor I

    My input is a folder of plain-text files.

    regards,

  • mehak Member Posts: 6 Contributor I

    Hello. Is there a video tutorial on text mining with Filter Stopwords (Dictionary)? Please let me know, it's really urgent.

    Regards,

  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    The process you provided does not include an operator that loads the TXT file into the Filter Stopwords (Dictionary) operator. You will need to add an Open File operator and connect it to the "fil" (file) input port of Filter Stopwords (Dictionary).
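
Since retyping the word list into a fresh TXT file fixed the original problem, one possible cause is invisible characters in the dictionary file (for example a byte order mark, non-breaking spaces, or trailing spaces), which would stop an entry from matching a token exactly. The Python sketch below is one way to check a one-word-per-line stopword file outside RapidMiner before wiring it in. It is not part of RapidMiner: the function names, the list of suspect characters, and the file name `stopwords.txt` are assumptions made for illustration, and the checks are not exhaustive.

```python
# Characters that are invisible in most editors but change what a line contains.
# This list is an assumption for illustration, not an exhaustive one.
SUSPECT = {
    "\ufeff": "byte order mark",
    "\u00a0": "non-breaking space",
    "\u200b": "zero-width space",
    "\t": "tab",
}


def check_stopword_lines(lines):
    """Return (line_number, text, problems) for every entry that looks suspicious."""
    findings = []
    for number, line in enumerate(lines, start=1):
        entry = line.rstrip("\n")
        # Report each known invisible character found in the entry.
        problems = [name for ch, name in SUSPECT.items() if ch in entry]
        # Plain spaces before or after the word.
        if entry != entry.strip(" "):
            problems.append("leading or trailing space")
        # Any other non-ASCII character, e.g. a curly quote in an English list.
        if any(ord(ch) > 127 for ch in entry if ch not in SUSPECT):
            problems.append("non-ASCII character")
        # A space inside the entry means the line holds more than one word.
        if " " in entry.strip():
            problems.append("more than one word on the line")
        if problems:
            findings.append((number, entry, problems))
    return findings


def check_stopword_file(path):
    """Read a UTF-8 stopword file and check every line."""
    # Plain "utf-8" (not "utf-8-sig") keeps a byte order mark visible as "\ufeff".
    with open(path, encoding="utf-8") as handle:
        return check_stopword_lines(handle.readlines())


if __name__ == "__main__":
    # In-memory sample; for a real file, call check_stopword_file("stopwords.txt").
    sample = ["\ufeffthe\n", "and \n", "of\n"]
    for number, entry, problems in check_stopword_lines(sample):
        print(number, repr(entry), ", ".join(problems))
```

If the check reports nothing, the file itself is probably clean and the next thing to look at is how the file reaches the operator, as described in the reply above.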
