AMOUNT OF EXAMPLES DOES NOT CORRELATE WITH INPUT DATA LOADED FROM PDFs

antonio_heredia · July 2018

on="1.0" encoding="UTF-8"?>



<macros/>















<connect from_port="document" to_op="Tokenize" to_port="document"/>
<connect from_op="Tokenize" from_port="document" to_op="Filter Stopwords (English)" to_port="document"/>
<connect from_op="Filter Stopwords (English)" from_port="document" to_port="document 1"/>










<connect from_op="Process Documents from Files" from_port="example set" to_port="result 1"/>

I tried to tokenize PDF articles, but the result contains only 21 examples. Why does this happen? I expected many more. I used "Process Documents from Files" and, inside it, "Tokenize" and "Filter Stopwords (English)". This works, but apparently not across all the documents. What should I do to fix it?

Cheers,

Antonio

lionelderkrikor · July 2018

Hi @antonio_heredia,

Do you have a lot of files?

Can you share these files so that we can reproduce what you observe?

Regards,

Lionel

NB: The first line of your XML process is broken; however, I was able to repair it.
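
To answer the file-count question above, it may help to count the PDFs in the folder the operator reads and compare that number with the 21 examples. As far as I understand, "Process Documents from Files" produces one example per document rather than one per token, so 21 examples might simply mean that 21 files were read. The following is only a minimal sketch in Python, outside RapidMiner; the function name `count_pdfs` and the folder path are hypothetical placeholders, not something taken from the process above.

```python
from pathlib import Path


def count_pdfs(folder):
    """Count files with a .pdf extension (any letter case) under `folder`, recursively."""
    root = Path(folder)
    return sum(
        1
        for path in root.rglob("*")
        if path.is_file() and path.suffix.lower() == ".pdf"
    )


if __name__ == "__main__":
    # Hypothetical path: replace it with the directory configured in the operator.
    print(count_pdfs("path/to/your/pdf/folder"))
```

If the count matches the number of examples, every file was probably read and each one became a single row; if it is higher, some files were likely skipped or could not be parsed.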
