[SOLVED] Crawl Web and generate reporting
pemguinkpl
Member Posts: 14 Contributor II
Hi,
I have tried the Crawl Web process, but the result shows that no documents were crawled. May I know what the problem is?
I followed the steps from the video below exactly, but I still ran into this problem.
http://www.youtube.com/watch?v=zMyrw0HsREg
Any help, please... :-\
Also, how do I use the Generate Report and Report operators in RapidMiner?
Does anyone know?
Thank You!
Answers
I didn't watch the video and don't have the time to. Could you please post your process and describe more specifically what you are trying to do?
Best regards,
Marius
My initial research goal is to analyze H1N1 news, using a crawler to collect all the news about H1N1. This is the link I am trying to crawl:
http://my-h1n1.blogspot.com/search/label/news?updated-max=2009-07-26T02:03:00%2B08:00&max-results=20
But I cannot get any documents.
This is my process xml:
May I know what the problem is? Thanks.
The problem is that the page you want to crawl does not allow crawling, and of course RapidMiner obeys this exclusion by default. The crawl operator has two options to ignore the so-called robot exclusion, but as the documentation says, you are usually not allowed to disable it for pages that are not your own. These are the parameters:
obey robot exclusion: Specifies whether the crawler obeys the rules about which pages on a site may be visited by a robot. Disable only if you know what you are doing and if you are sure not to violate any existing laws by doing so. Range: boolean; default: true
really ignore exclusion: Do you really want to ignore the robot exclusion? This might be illegal. Range: boolean; default: false
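If you want to check outside RapidMiner whether a robot exclusion applies to a URL, here is a minimal Python sketch using the standard library's urllib.robotparser. The sample rules in SAMPLE_ROBOTS_TXT are an assumption for illustration (modeled on what Blogger-hosted blogs commonly serve) and were not fetched from the real site, and the helper name is_crawl_allowed is hypothetical. Check the site's actual /robots.txt before drawing conclusions.

```python
from urllib.robotparser import RobotFileParser

# ASSUMPTION: illustrative rules only, not the site's real robots.txt.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /search
Allow: /
"""


def is_crawl_allowed(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes the file content as a list of lines, so no network is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    search_url = (
        "http://my-h1n1.blogspot.com/search/label/news"
        "?updated-max=2009-07-26T02:03:00%2B08:00&max-results=20"
    )
    home_url = "http://my-h1n1.blogspot.com/"

    # Under the sample rules, everything below /search is excluded.
    print(is_crawl_allowed(SAMPLE_ROBOTS_TXT, search_url))  # False
    print(is_crawl_allowed(SAMPLE_ROBOTS_TXT, home_url))    # True
```

To test against the live file instead, RobotFileParser also offers set_url() followed by read(), which downloads the robots.txt before you call can_fetch().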
Best,
Marius
Thank you for the replies, that solved my problem.