Search for a phrase in multiple PDFs via a list of URLs

carlcarl Member Posts: 30 Guru
edited June 2019 in Help

Is it possible to input a list of URLs (which contain PDFs), then search for a phrase in all the PDFs, and return a table with the URL path and the searched-for text? I'd like to do this without downloading all the PDFs.

This is what I have so far, which runs, but doesn't create an attribute.

1 - Read Excel (with URLs)

2 - Loop Examples

2a - Extract Macro

2b - Open File

2c - Read Document

2d - Extract Information

3 - Select Attributes
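For comparison, the loop in steps 2a to 2c can be sketched outside RapidMiner in Python. This is only a rough sketch under some assumptions: the function names (`fetch_bytes`, `pdf_bytes_to_text`, `read_documents`) are made up for illustration, and the text extraction assumes the third-party `pypdf` package is installed. Each PDF is read into memory from its URL, so nothing is saved to disk.

```python
import io
from typing import Callable, Dict, Iterable
from urllib.request import urlopen


def fetch_bytes(url: str, timeout: float = 30.0) -> bytes:
    """Read the PDF at `url` into memory; nothing is written to disk."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def pdf_bytes_to_text(data: bytes) -> str:
    """Extract plain text from PDF bytes (assumes pypdf is installed)."""
    from pypdf import PdfReader  # third-party; imported lazily

    reader = PdfReader(io.BytesIO(data))
    # extract_text() can return an empty result for image-only pages
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def read_documents(
    urls: Iterable[str],
    fetch: Callable[[str], bytes] = fetch_bytes,
    to_text: Callable[[bytes], str] = pdf_bytes_to_text,
) -> Dict[str, str]:
    """Loop over the URLs and return a {url: extracted text} mapping."""
    texts: Dict[str, str] = {}
    for url in urls:
        try:
            texts[url] = to_text(fetch(url))
        except Exception as exc:  # keep going if one URL fails
            texts[url] = ""
            print(f"skipped {url}: {exc}")
    return texts
```

The `fetch` and `to_text` arguments are pluggable so the loop can be tried out with stand-in functions before pointing it at real URLs.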

<parameter key="url" value="%{GetURL}"/>

<connect from_op="Extract Information" from_port="document" to_port="output 1"/>


Best Answer

  • JEdward RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 578 Unicorn
    Solution Accepted

    Here you go. You needed a Documents to Data operator to change your PDF text into an ExampleSet.
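To illustrate what that conversion step is doing, here is a small Python sketch of turning extracted document text into table rows with the URL and the matched text. It is not RapidMiner's implementation; the function name `documents_to_rows`, the `context` parameter, and the column names `url` and `match` are assumptions chosen for the example.

```python
import re
from typing import Dict, List


def documents_to_rows(
    texts: Dict[str, str], phrase: str, context: int = 40
) -> List[dict]:
    """Return one row per hit: the source URL plus the phrase with context."""
    # re.escape treats the phrase literally; matching is case-insensitive
    pattern = re.compile(re.escape(phrase), re.IGNORECASE)
    rows: List[dict] = []
    for url, text in texts.items():
        for m in pattern.finditer(text):
            start = max(m.start() - context, 0)
            end = min(m.end() + context, len(text))
            # collapse line breaks and repeated spaces in the snippet
            snippet = " ".join(text[start:end].split())
            rows.append({"url": url, "match": snippet})
    return rows
```

Documents without a hit produce no rows, so the resulting table lists only the URLs where the phrase was found.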

    <parameter key="url" value="%{GetURL}"/>

    <connect from_op="Extract Information" from_port="document" to_op="Documents to Data" to_port="documents 1"/>

