"WEB crawler rules"

keops9876 Member Posts: 5 Contributor II
edited June 2019 in Help
Hi!

I'm new to RapidMiner and I must say I like it. I have in-depth knowledge of MS SQL, but I'm completely new to RapidMiner.
So I've started to use the Web Crawler operator.

I use a query to process a Slovenian real estate website, and I'm having trouble setting the Web crawler rules.

I know that two rules are important: which URLs to follow and which to store.

I would like to store URLs of the form "http://www.realestate-slovenia.info/nepremicnine.html" followed by an id parameter.
For example, this is a URL I want to store: http://www.realestate-slovenia.info/nepremicnine.html?id=5725280

What about the URL rule to follow? It doesn't seem to work. I tried something like this: .+pg.+|.+id.+
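One way to check candidate rules outside RapidMiner is a small Python sketch like the one below. It assumes the crawler requires a rule to match the whole URL, which `re.fullmatch` approximates here; RapidMiner itself uses Java regular expressions, so this is only an illustration. The names `STORE_RULE`, `UNESCAPED_RULE`, and `matches` are made up for this example, and the store pattern is a guess at what the rule could look like, not a confirmed fix.

```python
import re

# Hypothetical store rule: "." and "?" are escaped so they match literally.
STORE_RULE = r".+nepremicnine\.html\?id=\d+"
# Same idea without escaping: "l?" makes the "l" optional instead of
# matching a literal question mark, so detail URLs no longer match.
UNESCAPED_RULE = r".+nepremicnine.html?id=\d+"
# Follow rule as tried in the post.
FOLLOW_RULE = r".+pg.+|.+id.+"


def matches(rule: str, url: str) -> bool:
    """Return True when the rule matches the entire URL (assumed semantics)."""
    return re.fullmatch(rule, url) is not None


detail_url = "http://www.realestate-slovenia.info/nepremicnine.html?id=5725280"
start_url = "http://www.realestate-slovenia.info/nepremicnine.html"

print("store, escaped:  ", matches(STORE_RULE, detail_url))
print("store, unescaped:", matches(UNESCAPED_RULE, detail_url))
print("follow, detail:  ", matches(FOLLOW_RULE, detail_url))
print("follow, start:   ", matches(FOLLOW_RULE, start_url))
```

If the follow rule matches the detail URL here but the crawler still stores nothing, the cause is probably somewhere other than the follow pattern itself, for example the store rule or the crawl depth.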

Any help would be appreciated!

U.