NUMBER OF EXAMPLES DOES NOT CORRELATE WITH INPUT DATA LOADED FROM PDFs

antonio_heredia Member Posts: 1 Learner I
edited April 2020 in Help
on="1.0" encoding="UTF-8"?>



<macros/>















<connect from_port="document" to_op="Tokenize" to_port="document"/>
<connect from_op="Tokenize" from_port="document" to_op="Filter Stopwords (English)" to_port="document"/>
<connect from_op="Filter Stopwords (English)" from_port="document" to_port="document 1"/>










<connect from_op="Process Documents from Files" from_port="example set" to_port="result 1"/>






I tried to tokenize PDF articles, but the resulting example set contains only 21 examples. Why does this happen? I expected many more. I used "Process Documents from Files" and placed "Tokenize" and "Filter Stopwords (English)" inside it. This works, but apparently not across all the documents. What should I do to fix it?

Cheers,

Antonio


Answers

  • lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn

    Hi @antonio_heredia,

    Do you have a lot of files?

    Can you share these files so that we can reproduce what you observe?

    Regards,

    Lionel

    NB: The first line of your XML process is broken; however, I was able to repair it.
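
One quick way to answer the question about the number of files is to count the PDFs in the input folder and compare that count with the 21 examples. The sketch below is not part of the original process: it is a small Python helper outside RapidMiner, and both the function name `count_pdfs` and the folder path are hypothetical. It assumes that "Process Documents from Files" normally produces one example per document rather than one per token, so a count close to 21 would suggest that the files are all being read.

```python
from pathlib import Path


def count_pdfs(folder: str, recursive: bool = True) -> int:
    """Count files with a .pdf extension (case-insensitive) under `folder`."""
    root = Path(folder)
    # rglob descends into subfolders; glob looks at the top level only.
    candidates = root.rglob("*") if recursive else root.glob("*")
    return sum(1 for p in candidates if p.is_file() and p.suffix.lower() == ".pdf")


if __name__ == "__main__":
    # Hypothetical path: replace it with the directory configured in the operator.
    print(count_pdfs("path/to/pdf_articles"))
```

If the count is much larger than 21, the file pattern or directory settings of the operator would be the first thing to check; if it is close to 21, the example set is probably complete and the tokens appear as attributes (columns) rather than as extra examples.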
