"Text mining"

Andreas_M_ Member Posts: 1 Contributor I
edited June 2019 in Help
Hi,

I'm new to RapidMiner and haven't quite got the hang of it yet.
What I want to do: I have about 300 PDF documents and one word list with about 100 different words. For each PDF document, I want to find the total number of occurrences of these words. I would also like to know the total number of words each PDF document contains.

Can somebody help me with modelling the process?

Thanks in advance.

Answers

  • Freddie2310 Member Posts: 2 Contributor I
    Hello,

    Did you find the process in the end? Would you please share it?
    I have the same problem with many PDF documents.

    Thank you
  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,438 RM Data Scientist
    Hi,
    the operator for reading PDF files is Read Document. You can combine it with Loop Files to read several files.

    Best,
    Martin
    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
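For readers who want to check the counting logic outside RapidMiner, here is a minimal Python sketch of the two numbers asked for in the question: the total word count per document and the occurrences of each word from the word list. It assumes the text has already been extracted from each PDF (in RapidMiner that would be the job of Read Document inside Loop Files). The function name `count_words`, the file names, and the sample texts are hypothetical, and the simple regex tokenizer is an assumption, not what RapidMiner uses.

```python
import re
from collections import Counter


def count_words(text, wordlist):
    """Return (total_tokens, {word: count}) for one document's text."""
    # Lowercase and split into runs of word characters.
    # This is a deliberately simple tokenizer (an assumption).
    tokens = re.findall(r"\w+", text.lower())
    counts = Counter(tokens)
    targets = [w.lower() for w in wordlist]
    # Counter returns 0 for words that never occur.
    return len(tokens), {w: counts[w] for w in targets}


# Hypothetical stand-ins for text already extracted from two PDFs.
documents = {
    "a.pdf": "Text mining finds patterns in text.",
    "b.pdf": "Mining data is not mining text.",
}
wordlist = ["text", "mining", "pdf"]

# One result per document: (total words, per-word occurrences).
results = {name: count_words(text, wordlist) for name, text in documents.items()}

for name, (total, hits) in results.items():
    # sum(hits.values()) is the combined occurrence count of all listed words.
    print(name, total, sum(hits.values()), hits)
```

With real files, the `documents` dictionary would be filled from whatever PDF text extraction is used. The counting step stays the same.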