
Spambot - prevent unwanted site downloads

Mod_spambot is an Apache plugin which monitors the data being downloaded from a server. When the number of requests from a client exceeds a preset level, no more downloads are allowed for a preset time. When this happens, the client receives a tailored message informing them of what has happened. Many of the features can be tailored to the needs of the webmaster, both to help prevent false positives and to customise the definition of a client that should be blacklisted.

Mod_spambot comes from code written for an old website I ran. My ISP blacklisted the site for being too "popular". Doing the research that the ISP couldn't be bothered to do, I discovered that unwanted crawlers were bombarding the site with requests. The ISP blamed me and threatened to take the site down, and frankly I didn't have the time to find a new ISP. Since these spiders are not real users, or Google, I had to put in code to stop them; mod_spambot was born out of that need.

As a useful side-effect, mod_spambot will help the system when it suffers a DoS attack.

If a client that has not been whitelisted downloads more than 100 pages in an hour, it is blacklisted, unless there has been a gap of more than 10 minutes between downloads. All the figures are configurable; those values are the defaults.

The algorithm is simple, but surprisingly effective. No doubt it will improve with time and feedback. When a client is blacklisted, it receives either a bespoke page, the default 403 ErrorDocument or a honeypot. Users are tracked either by IP or by their mod_usertrack cookie. A user then stays blacklisted until 10 minutes elapse between requests.
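
To make the rule concrete, here is a rough Python sketch of one way to read it. It is not the module's actual code, which runs inside Apache. The class name `DownloadLimiter`, its method and its parameter names are invented for illustration, and the exact way the hourly count and the 10 minute gap interact is an assumption.

```python
from collections import deque


class DownloadLimiter:
    """Illustrative sketch only; names and details are assumptions."""

    def __init__(self, max_requests=100, window=3600.0, quiet_gap=600.0,
                 whitelist=()):
        self.max_requests = max_requests  # pages allowed per window
        self.window = window              # counting window, in seconds
        self.quiet_gap = quiet_gap        # gap that clears a client, in seconds
        self.whitelist = set(whitelist)
        self._hits = {}                   # client -> deque of request times
        self._last_seen = {}              # client -> time of previous request
        self._blacklisted = set()

    def allow(self, client, now):
        """Return True if the request at time `now` should be served."""
        if client in self.whitelist:
            return True

        hits = self._hits.setdefault(client, deque())
        last = self._last_seen.get(client)
        self._last_seen[client] = now

        # Assumed reading: a quiet gap longer than `quiet_gap` wipes the
        # slate clean, both the request count and any blacklisting.
        if last is not None and now - last > self.quiet_gap:
            hits.clear()
            self._blacklisted.discard(client)

        if client in self._blacklisted:
            return False

        # Forget requests that have fallen out of the counting window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()

        hits.append(now)
        if len(hits) > self.max_requests:
            self._blacklisted.add(client)
            return False
        return True
```

The `client` key can be an IP address or a cookie value. Note that in this sketch a blacklisted client that keeps sending requests keeps resetting its own 10 minute clock.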

Users who blast many requests in a short time can be throttled back before being blacklisted.
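
As a hedged illustration of what throttling before blacklisting could look like, the Python sketch below hands back a delay once a client exceeds a short-term burst limit. The class `BurstThrottle`, the burst thresholds and the fixed delay are all hypothetical; the module's real throttling policy and settings may differ.

```python
from collections import deque


class BurstThrottle:
    """Illustrative sketch only; thresholds and names are assumptions."""

    def __init__(self, burst_limit=10, burst_window=5.0, delay=2.0):
        self.burst_limit = burst_limit    # requests tolerated per burst window
        self.burst_window = burst_window  # burst window length, in seconds
        self.delay = delay                # pause imposed on bursty clients
        self._recent = {}                 # client -> deque of request times

    def delay_for(self, client, now):
        """Return how many seconds to hold this request back (0.0 for none)."""
        recent = self._recent.setdefault(client, deque())
        # Drop requests that are older than the burst window.
        while recent and now - recent[0] >= self.burst_window:
            recent.popleft()
        recent.append(now)
        return self.delay if len(recent) > self.burst_limit else 0.0
```

A server would sleep for the returned delay before serving the request, which slows a burst down without yet refusing the client outright.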

HEAD requests are ignored.

Users are strongly encouraged to join the mailing list.