0.0
No commit activity in last 3 years
No release in over 3 years
Detect robots that are ignoring your robots.txt file and give them the middle finger
2005
2006
2007
2008
2009
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019
2020
2021
2022
2023
2024
 Dependencies

Runtime

>= 0
 Project Readme

Rack::EvilRobot¶ ↑

Robots are good, they crawl your site and bring people to it. Evil robots are bad, they slam your site, follow links they shouldn’t and generally are a pain. Using Rack::EvilRobot you can tell them what’s up, and keep them away for good.

INSTALL¶ ↑

sudo gem install rack-evil_robot

SETUP¶ ↑

require 'rack/evil_robot'

use Rack::EvilRobot
-- or --
use Rack::EvilRobot, :redirect_path => "http://www.whatever-you-want.com"

Once you have a list (you’ll see how to build one in the next section) you will want to tell the app which ones to allow, so put this code somewhere in your load path.

module Rack
  class EvilRobot
    def evil_robots
      # add your evil-bots here... these are just for examples sake
      /badBot|reallyBadBot|Robokiyu/i
    end
  end
end

SET THE TRAP¶ ↑

Update your robots.txt to contain these lines:

User-agent: *
Disallow: /honey_pot/index.html

Include this link tag on your page(s)

<a href="/honey_pot/index.html"><img src="images/pixel.gif" border="0" alt=" " width="1" height="1"></a>

After you have this set up for a few days check your access logs and look at the user agents that have been accessing /honey_pot/index.html. Hopefully there is nothing there, but if there is, and they are misbehaving, add them to the regex defined in the ‘evil_robots’ method. This will prevent this user agent from accessing any pages on your site anymore.

Win¶ ↑

Now you don’t have to worry about getting crawled by bots you don’t care to be hit by (mp3 bots, torrent bots, bots that are slamming your servers, etc…)