I checked my stats today and found 306 hits from EmailSiphon. This is an email address harvesting robot (‘spambot’) that collects email addresses from web sites and uses them to fill spam list databases. Although every email address on this site should be obfuscated in a way that most robots won’t recognise, I’m still annoyed that these bots get through. So I decided to block them. >:-)
The two resources I used, both from a Google search for ‘EmailSiphon’, were a mod_rewrite tutorial and ‘a close to perfect .htaccess file’. I would have used only the second one, but with it in place even I couldn’t get to any files. So I ditched it, kept its list of user agents, and used the syntax from the first site.
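For anyone wanting to try the same thing, here is a minimal sketch of that kind of rule set. It isn’t my exact file. It assumes Apache with mod_rewrite enabled and .htaccess overrides allowed (AllowOverride FileInfo or All). EmailSiphon is the bot from my logs; EmailCollector and EmailWolf are just two other harvester names of the sort those lists contain, so treat the list as illustrative rather than complete.

```apache
# .htaccess: refuse requests from known email-harvesting user agents
# (assumes mod_rewrite is loaded and .htaccess overrides are allowed)
RewriteEngine On

# Illustrative list only; [NC] = case-insensitive, [OR] = any one match is enough
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailCollector [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailWolf [NC]

# Match any path, leave it unchanged (-), and answer 403 Forbidden ([F])
RewriteRule .* - [F,L]
```

The ^ anchors each pattern to the start of the User-Agent string. The last condition deliberately has no [OR], so the chain ends there and the rule fires only when one of the listed agents matches.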
When I tried fetching pages with SamSpade using those user agents, the pages still came through, but I think that’s down to the University of Bradford’s caching server rather than the rules themselves.
I actually feel like writing a ‘terms of usage’ for this site forbidding the use of such robots on it, and then seeing about legal action against any company that does use them. It’s an idea, anyway, and I’ve invoked the LazyWeb in the hope that someone can shed some light. Help, anyone?