NUMBER OF EXAMPLES DOES NOT CORRELATE WITH INPUT DATA LOADED FROM PDFs
antonio_heredia
Member Posts: 1 Learner I
on="1.0" encoding="UTF-8"?>
<macros/>
<connect from_port="document" to_op="Tokenize" to_port="document"/>
<connect from_op="Tokenize" from_port="document" to_op="Filter Stopwords (English)" to_port="document"/>
<connect from_op="Filter Stopwords (English)" from_port="document" to_port="document 1"/>
<connect from_op="Process Documents from Files" from_port="example set" to_port="result 1"/>
I tried to tokenize PDF articles, but the result contains only 21 examples. Why does this happen? I expected many more. I used "Process Documents from Files" and, inside it, "Tokenize" and "Filter Stopwords (English)". This works, but apparently not across all the documents. What should I do to fix it?
Cheers,
Antonio
Answers
Hi @antonio_heredia,
Do you have a lot of files?
Can you share these files so that we can reproduce what you observe?
Regards,
Lionel
NB: The first line of your XML process is broken; however, I was able to repair it.
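To help answer the question about the number of files, a quick check outside RapidMiner can count the PDFs in the input folder and compare that number with the 21 examples. This is only a minimal sketch: it assumes that "Process Documents from Files" produces one example per document it reads (not one per token), and the function name `count_pdfs` and the folder path are hypothetical.

```python
from pathlib import Path


def count_pdfs(root, recursive=True):
    """Count files with a .pdf extension (case-insensitive) under root."""
    root = Path(root)
    # rglob walks subfolders as well; glob only looks at the top level.
    candidates = root.rglob("*") if recursive else root.glob("*")
    return sum(1 for p in candidates if p.is_file() and p.suffix.lower() == ".pdf")


if __name__ == "__main__":
    import sys

    # Hypothetical usage: pass the folder configured in the operator's text directories.
    folder = sys.argv[1] if len(sys.argv) > 1 else "."
    print(f"{count_pdfs(folder)} PDF files found under {folder}")
```

If the count is 21, the result may simply reflect one example per file; if it is larger, some files are probably not being read (for example because of the file pattern or the folder level).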