Read full article RSS feeds with RapidMiner and a free API
SGolbert
RapidMiner Certified Analyst, Member, Posts: 344, Unicorn
Hi RapidMiners!
I wanted to share a process that I use to get full articles out of RSS feeds. It uses Python's Beautiful Soup and a web API called Mercury Postlight.
<macros/>
[...] https://www.presseportal.de/rss/polizei/laender/9.rss2"/>
[...] https://mercury.postlight.com/parser?url=' + address
        for dummy in range(10):
            try:
                response = requests.get(url, headers = headers)
                break
            except:
                continue
        html = json.loads(response.content)
        html = html['content']
        soup = BeautifulSoup(html, "lxml")
        text = soup.get_text()
        text = text.replace('\n', ' ')
        results.append(text)
    data['main_text'] = results
    return data"/>
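For anyone who wants to try the per-article step outside RapidMiner, here is a minimal sketch of the same idea: call the parser endpoint, retry up to 10 times, and strip the returned HTML down to plain text. It is not the exact script from the process. To keep it dependency-free it swaps in the standard library (urllib and html.parser) for requests and Beautiful Soup. The function names, the `fetch` and `pause` parameters, the URL-encoding of the address, and returning None after the last failed attempt are all assumptions made for illustration. The `headers` dict (for example, whatever API key header your account needs) has to be supplied by you.

```python
import json
import time
import urllib.parse
import urllib.request
from html.parser import HTMLParser

PARSER_ENDPOINT = "https://mercury.postlight.com/parser?url="  # endpoint from the post


class _TextExtractor(HTMLParser):
    """Collects the text nodes of an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def html_to_text(html):
    """Drop the tags and flatten newlines to spaces."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.chunks).replace("\n", " ")


def http_get(url, headers):
    """Default fetcher (hypothetical helper): returns the response body as bytes."""
    request = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read()


def fetch_article_text(address, headers, fetch=http_get, retries=10, pause=1.0):
    """Ask the parser API for one article and return its plain text.

    Returns None if every attempt fails, so one bad link does not abort the run.
    """
    # Assumption: the article address is URL-encoded before being appended.
    url = PARSER_ENDPOINT + urllib.parse.quote(address, safe="")
    for attempt in range(retries):
        try:
            body = fetch(url, headers)
            return html_to_text(json.loads(body)["content"])
        except (OSError, ValueError, KeyError, TypeError):
            # Network error, invalid JSON, or missing "content": wait, then retry.
            if attempt < retries - 1:
                time.sleep(pause)
    return None
```

Inside an Execute Python operator you could then build the new column with something like `data['main_text'] = [fetch_article_text(a, headers) for a in data['Link']]`, where `'Link'` is a placeholder for whichever column holds the article URLs in your feed. Catching only specific exceptions, rather than a bare `except`, keeps real bugs in the script visible.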
Considering that there are commercial products that do the same, I think it is a valuable resource! The number of API calls is limited, however, so take that into account. Its speed is also much lower than that of web scraping alternatives. I hope you enjoy it!
3 Comments
This is GREAT, @SGolbert! Can I put this on the community repo (with full credit to you, of course)?
Yes, sure!
DONE! You can find the process here.
Scott