"Web mining Rapidminer robot_filter"

antoine · Member · Posts: 7 · Contributor II
edited June 2019 in Help
Hello all,


I don't know if this is the right place to post my request. I need to know how you (RapidMiner users who use it as a web mining tool) set up your robot_filter file when importing your web log file.

It works when I type [g|G]oogle in the robot_filter file, for example. However, I don't really want to do that for a thousand different bots...
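To make the idea behind a pattern like [g|G]oogle concrete, here is a minimal sketch of a regex-based robot check. It assumes (not confirmed for RapidMiner) that a log entry counts as a robot visit when any pattern matches somewhere in its user-agent string; `is_robot` is a hypothetical helper. It also shows a side effect worth knowing: inside square brackets `|` is a literal character, so `[g|G]` means "g, |, or G", and `[gG]` is the tighter way to write it.

```python
import re


def is_robot(user_agent: str, patterns: list[str]) -> bool:
    """Hypothetical check: True if any regex pattern occurs in the user agent.

    Uses re.search (match anywhere), which is an assumption about how a
    robot_filter file is applied, not RapidMiner's documented behavior.
    """
    return any(re.search(p, user_agent) for p in patterns)


# "[gG]oogle" matches "google" or "Google"; "[g|G]oogle" would also match "|oogle".
patterns = [r"[gG]oogle"]

print(is_robot("Mozilla/5.0 (compatible; Googlebot/2.1)", patterns))
print(is_robot("Mozilla/5.0 (Windows NT 10.0; Win64; x64)", patterns))
```

Writing one such pattern per bot by hand is exactly what becomes impractical for thousands of robots, hence the need for a generated list.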

So I tried to find a list I could paste into my file. The website http://www.robotstxt.org/db/all.txt offers the robots list for download in .txt format.
But apparently RapidMiner doesn't like it: I got many errors due to bad characters and wrong enclosure...
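The downloaded file is a database of robot records rather than a plain list of patterns, so pasting it in directly is likely to fail. A minimal sketch for pulling out just the user-agent strings, assuming (unverified here) that each record contains a line of the form `robot-useragent: <string>`; `extract_user_agents` and `load_all_txt` are hypothetical helpers. Reading with the `latin-1` codec is a deliberate choice: it decodes every possible byte, so stray non-UTF-8 characters do not raise decoding errors.

```python
def load_all_txt(path: str) -> str:
    # latin-1 maps all 256 byte values, so unusual characters never fail to decode
    with open(path, encoding="latin-1") as f:
        return f.read()


def extract_user_agents(text: str) -> list[str]:
    """Collect unique values of (assumed) 'robot-useragent:' lines, in order."""
    agents: list[str] = []
    for line in text.splitlines():
        # partition splits at the first colon only, so URLs in the value survive
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "robot-useragent":
            value = value.strip()
            if value and value not in agents:
                agents.append(value)
    return agents


sample = """robot-id: examplebot
robot-name: Example Bot
robot-useragent: ExampleBot/1.0 (+http://example.com)
"""
print(extract_user_agents(sample))
```

The extracted strings still contain characters such as `(`, `+` and `.`, which have special meaning in regular expressions.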

So what do I have to do to get a proper robots list that RapidMiner can read?


Thank you in advance,


Antoine

Answers

  • land · RapidMiner Certified Analyst, RapidMiner Certified Expert · Member · Posts: 2,531 · Unicorn
    Hi Antoine,
    what exactly does RapidMiner complain about? Unfortunately I'm not too familiar with the web mining operators, but I assume the file must consist of regular expressions? Then you would need to escape the special characters of regular expressions; you can find advice on this via Google.
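    Following the assumption that the file holds one regular expression per line, a minimal sketch of the escaping step: `re.escape` backslashes regex metacharacters such as `( ) . + ? [ ] |`, so each robot name is matched literally. `to_patterns` and `write_filter_file` are hypothetical helpers, and the one-pattern-per-line layout is an assumption to check against the operator's documentation. Python's `re.escape` only escapes non-alphanumeric characters, which Java regular expressions also accept as literals, so the output should be usable if RapidMiner (a Java application) compiles the lines with Java's regex engine, again an assumption.

```python
import re


def to_patterns(names: list[str]) -> list[str]:
    """Escape raw robot names/user agents so each one matches literally."""
    patterns = []
    for name in names:
        name = name.strip()
        if not name:
            continue  # skip blank entries
        pattern = re.escape(name)
        re.compile(pattern)  # sanity check: raises re.error if the pattern is invalid
        patterns.append(pattern)
    return patterns


def write_filter_file(path: str, patterns: list[str]) -> None:
    # Assumed layout: one pattern per line
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(patterns) + "\n")


patterns = to_patterns(["ExampleBot/1.0 (+http://example.com)", "a.b"])
print(patterns)
```

    Escaping everything trades flexibility for safety: a name like `a.b` then matches only the literal text `a.b`, not `axb`.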

    Greetings,
    Sebastian