"Using Regex in the web crawler"

guitarslinger — Member, Posts: 12, Contributor II
edited June 2019 in Help
Hi there,

I am struggling with the setup of the crawlers in the web mining extension:

I can't figure out how to set the crawling rules so that the crawler produces any results.
Leaving the rules empty does not work either.

Can I find an example for crawling rules somewhere?

Thx in advance

GS

Answers

  • B_Miner — Member, Posts: 72, Maven
    Post what you are trying to do (the XML) and a description; maybe someone can help. I have used it successfully, but I am not sure what your aim is.
  • guitarslinger — Member, Posts: 12, Contributor II
    Hi B_Miner, good point:

    Here is the XML, with just the crawler connected to the main process and two rules:
    1. follow every link ".*"
    2. store every page ".*"

    [The posted process XML was stripped by the forum formatting; only fragments survive, such as the start URL http://www.aol.com.]
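
    For reference, a minimal sketch of what such a process XML typically looks like. The operator class name (web:crawl_web) and the parameter keys (url, crawling_rules, follow_link_with_matching_url, store_with_matching_url, max_pages, max_depth) are assumptions about the Web Mining extension's Crawl Web operator, not recovered from the original post:

        <?xml version="1.0" encoding="UTF-8"?>
        <process version="5.3.000">
          <operator activated="true" class="process" expanded="true" name="Process">
            <process expanded="true">
              <!-- Crawl Web operator from the Web Mining extension (class name assumed) -->
              <operator activated="true" class="web:crawl_web" expanded="true" name="Crawl Web">
                <!-- start URL for the crawl -->
                <parameter key="url" value="http://www.aol.com"/>
                <!-- crawling rules: follow every link, store every page -->
                <list key="crawling_rules">
                  <parameter key="follow_link_with_matching_url" value=".*"/>
                  <parameter key="store_with_matching_url" value=".*"/>
                </list>
                <!-- without a value here the operator crawls nothing (see below) -->
                <parameter key="max_pages" value="100"/>
                <parameter key="max_depth" value="2"/>
              </operator>
            </process>
          </operator>
        </process>

    The ".*" values are Java regular expressions that match any URL, i.e. the two rules "follow every link" and "store every page" described above.
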
  • guitarslinger — Member, Posts: 12, Contributor II
    Problem solved: I had no value in the parameter "max. pages".

    I thought this parameter was optional and that leaving it blank would simply not limit the number of pages, but without a value it does not crawl at all.
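
    In the process XML, that corresponds to giving the parameter an explicit value, e.g. (parameter key max_pages assumed, as in the sketch above):

        <parameter key="max_pages" value="1000"/>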

    Works now, I am happy!

    Greetings, GS
    ;D
  • land — RapidMiner Certified Analyst, RapidMiner Certified Expert, Member, Posts: 2,531, Unicorn
    Well,
    it should be optional. ****. I will make sure it is optional in the future. :)
    Good thing you got it to work, though.

    Greetings,
    Sebastian