Read full article RSS feeds with RapidMiner and a free API

SGolbert (RapidMiner Certified Analyst, Member; Posts: 344; Unicorn)
edited April 2020 in Knowledge Base

Hi RapidMiners!

I wanted to share a process that I use to get full articles out of RSS feeds. It uses Python's Beautiful Soup and a web API called Mercury Postlight.
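As a rough illustration of the first step (collecting the article addresses from a feed), here is a minimal Python sketch. It is independent of how the RapidMiner process itself reads the feed; the function name `article_links` and the sample feed are made up for this example, and it assumes a plain RSS 2.0 document where each `<item>` carries a `<link>`.

```python
import xml.etree.ElementTree as ET


def article_links(rss_xml):
    """Return the <link> of every <item> in an RSS 2.0 document."""
    root = ET.fromstring(rss_xml)
    links = []
    for item in root.iter("item"):
        link = item.findtext("link")
        if link:  # skip items that have no link at all
            links.append(link.strip())
    return links


# Hypothetical feed, only here to show the expected input shape.
SAMPLE_FEED = """<rss version="2.0"><channel>
<title>Example feed</title>
<link>https://example.invalid/</link>
<item><title>First</title><link>https://example.invalid/1</link></item>
<item><title>Second</title><link> https://example.invalid/2 </link></item>
<item><title>No link</title></item>
</channel></rss>"""

print(article_links(SAMPLE_FEED))
```

Each address returned this way is what would then be handed to the article parser API.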





<macros/>




https://www.presseportal.de/rss/polizei/laender/9.rss2"/>



        + address
        for dummy in range(10):
            try:
                response = requests.get(url, headers = headers)
                break
            except:
                continue
        html = json.loads(response.content)
        html = html['content']
        soup = BeautifulSoup(html, "lxml")
        text = soup.get_text()
        text = text.replace('\n', ' ')
        results.append(text)
    data['main_text'] = results
    return data"/>
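Since the script above survives only as a fragment, here is a self-contained sketch of the same idea: retry the API call a few times, read the `content` field of the JSON answer, and reduce the HTML to plain text. This is not the original script. To keep it runnable without extra packages it swaps `requests` for `urllib.request` and Beautiful Soup for the standard library's `HTMLParser`; the names `html_to_text`, `http_get` and `fetch_article_text` are made up for this example, and the response shape (a JSON object with a `content` key) is assumed from the fragment.

```python
import json
import time
import urllib.request
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects the text nodes of an HTML fragment and ignores the tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(html):
    """Strip tags and collapse all whitespace runs into single spaces."""
    collector = _TextCollector()
    collector.feed(html)
    collector.close()
    return " ".join("".join(collector.parts).split())


def http_get(url, headers):
    """Default fetcher: returns the raw response body as bytes."""
    request = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read()


def fetch_article_text(url, headers, fetch=http_get, attempts=10, pause=0.0):
    """Try the call up to `attempts` times; return None if every try fails."""
    for _ in range(attempts):
        try:
            body = fetch(url, headers)
        except OSError:  # urllib.error.URLError is a subclass of OSError
            time.sleep(pause)
            continue
        # Assumption: the answer is a JSON object whose 'content' key holds HTML.
        return html_to_text(json.loads(body)["content"])
    return None
```

Unlike the fragment, this version returns `None` when all attempts fail instead of continuing with an undefined `response`, and it only swallows network-level errors rather than every exception. The `fetch` parameter exists so the retry logic can be tried out with a stand-in function before spending real API calls.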









Considering that there are commercial products that do the same, I think it is a valuable resource! The number of API calls is limited, however, so take that into account. Its speed is also much lower than that of web scraping alternatives. I hope you enjoy it!
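One way to live with the call limit is to cache results and stop once a self-imposed budget is used up. The following is only a sketch of that idea, not part of the shared process: the class name `BudgetedFetcher` and the `max_calls` value are made up, and the actual quota of the API is not stated here.

```python
class BudgetedFetcher:
    """Wraps a fetch function with a cache and a hard cap on real calls."""

    def __init__(self, fetch, max_calls):
        self.fetch = fetch          # function that takes a URL and returns text
        self.max_calls = max_calls  # self-imposed budget, not the API's real quota
        self.calls_made = 0
        self.cache = {}

    def get(self, url):
        if url in self.cache:
            return self.cache[url]  # repeated articles cost nothing
        if self.calls_made >= self.max_calls:
            return None             # budget used up; leave this article empty
        self.calls_made += 1
        result = self.fetch(url)
        self.cache[url] = result
        return result


# Usage with a stand-in fetch function instead of a real API call.
fetcher = BudgetedFetcher(lambda url: "text of " + url, max_calls=2)
print(fetcher.get("https://example.invalid/1"))
print(fetcher.get("https://example.invalid/1"))  # served from the cache
print(fetcher.calls_made)
```

Because feeds often repeat items between runs, keeping the cache around between executions would save the most calls; how to persist it is left open here.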


Comments

  • sgenzer (Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator; Posts: 2,959; Community Manager)

    This is GREAT, @SGolbert! Can I put this on the community repo (with full credit to you, of course)?

  • SGolbert (RapidMiner Certified Analyst, Member; Posts: 344; Unicorn)

    Yes, sure!

  • sgenzer (Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator; Posts: 2,959; Community Manager)

    DONE! You can find the process here.

    Scott

  • SGolbert (RapidMiner Certified Analyst, Member; Posts: 344; Unicorn)

    Small update to the process: the code for the Mercury API has been open sourced!

    You can find it under

    and run it on your own server, which could make it a lot faster.

    Regards,
    Sebastian
