Is it possible to crawl the links on the "IBM Watson News Explorer"?
jonas_boersch
Member, Posts: 1, Learner I
Hello Community,
I can't manage to crawl the links to the news articles on IBM Watson's News Explorer. The "Crawl Web" operator just stops after crawling the header of the web page. The links to the articles are in the "details" window on the left side of the web page.
Can someone help me find a solution? I would be very thankful. The link to the webpage is: http://news-explorer.mybluemix.net/?query=ipcc&type=unconstrained
Kind regards,
Jonas
Best Answer
rfuentealba
Moderator, RapidMiner Certified Analyst, Member, University Professor, Posts: 568, Unicorn
Good Sir @jonas_boersch,
I am sorry to inform you that your requirement is not currently feasible with the RapidMiner tooling, because the "Get Page" and "Crawl Web" operators were developed before the proliferation of JavaScript-built, API-driven websites made with Vue.js, Angular.js, Ember.js or React.js. On such sites the article links are added by scripts running in the browser, so an operator that only reads the HTML sent by the server never sees them. The sun has not set, though, and to my knowledge there are two other choices:
- Explore the page's code and find the original data sources. After a quick inspection I made for you, it seems feasible to find the REST servers in the IBM Watson page's code.
- Use Selenium, a browser automation tool that can drive a headless web browser, which runs the page's scripts and then hands back the fully rendered page. I would call this the hard way, because it is not easy to set up, but it is worth the time if you retrieve pages frequently.
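For the second option, here is a rough sketch in Python of what that could look like outside RapidMiner. It is not a tested recipe for this site. It assumes the `selenium` package plus a local Chrome install, and the names `fetch_rendered_html`, `extract_links` and `LinkCollector` are made up for illustration. The wait condition (any `<a>` element present) is only a placeholder, since I have not checked how the "details" panel is marked up; you would likely want to wait on a more specific selector.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:  # skip anchors without an href
                self.links.append(urljoin(self.base_url, href))


def extract_links(html, base_url):
    """Return all link targets found in an HTML string, in document order."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    return parser.links


def fetch_rendered_html(url, wait_seconds=10):
    """Load a page in headless Chrome and return the HTML after scripts ran."""
    # Imported here so extract_links() works even without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait

    options = Options()
    options.add_argument("--headless")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Placeholder condition (assumption): wait until at least one <a> exists.
        WebDriverWait(driver, wait_seconds).until(
            lambda d: d.find_elements(By.TAG_NAME, "a")
        )
        return driver.page_source
    finally:
        driver.quit()


if __name__ == "__main__":
    page = "http://news-explorer.mybluemix.net/?query=ipcc&type=unconstrained"
    for link in extract_links(fetch_rendered_html(page), page):
        print(link)
```

The link extraction is kept separate from the browser step on purpose, so you can save the rendered HTML once and experiment with the parsing offline.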
Have a good day,
Rodrigo.
Answers
Scott