Neil Turner's Blog

Blogging about technology and randomness since 2002

Making the most of SpamLookup

Note: This guide has now been superseded by Making the most of SpamLookup 2.1, which is the version included in Movable Type 3.3. The new guide also covers SpamLookup 2.0, which ships with MT 3.2, so you should read that instead. This page is archived here for convenience.

Since upgrading to Movable Type 3.2 I’ve dumped Jay Allen’s MT-Blacklist and instead made SpamLookup handle comment/trackback spam on its own. The plugin is included by default on MT 3.2, and while it can do a good job as it is, you might like to try some tune-ups to make it more effective.

Moderation and Junking

In Movable Type 2.x, comments just had one status – published. Any spam blocking system could only accept or deny comments and trackbacks. In MT 3.0x and 3.1x, comments gained an additional status – ‘moderated’. This was where comments could be held for human approval before being published, and tools like SpamLookup and MT-Blacklist could hold comments here if they thought they might be spam but couldn’t be sure.
With 3.2x, trackbacks can also be moderated, but a new third status has been added for both: junk. Now, rather than deleting spam outright, you’ll find plugins send it here instead. That way, if you have a false positive – a comment that is seen as being spam but isn’t – you can retrieve it.
The junk status also has a rating system, and plugins can adjust the rating for an individual comment or trackback. The rating is between 10 and -10 – comments with a negative score are junked, otherwise they are moderated or published. You’ll find that SpamLookup can reduce the rating of comments that it thinks are spam, but also add points if, say, the comment has no links or has been posted with a URL that has already been accepted before.
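
To make the rating idea concrete, here is a minimal sketch in Python. It is not Movable Type's or SpamLookup's actual code (both are written in Perl); the function names, the sample adjustments and the clamping of the total to the -10 to 10 range are all assumptions made for illustration.

```python
# Hypothetical sketch of the junk rating described above; not MT's real code.
MIN_SCORE, MAX_SCORE = -10, 10

def total_score(adjustments):
    """Add up the adjustments made by each filter.

    Clamping the result to the -10..10 range is an assumption."""
    total = sum(adjustments)
    return max(MIN_SCORE, min(MAX_SCORE, total))

def is_junk(score):
    """A negative score means the comment or trackback is junked."""
    return score < 0

# Example: one blacklist hit (-1), one weighted keyword (-4), no-links bonus (+1)
score = total_score([-1, -4, 1])
print(score, is_junk(score))
```

A score of zero or above is not junk in this sketch; whether the item is then moderated or published depends on your other settings.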

1. How to find the SpamLookup configuration options

SpamLookup can be configured at two levels – blog and installation. If you just have the one blog, or want any settings to apply across all the blogs on your installation of Movable Type, configure SpamLookup at the installation level, using the Plugins item on the MT main menu or System Overview screen – it should be towards the bottom. If you only want settings to apply to one blog, you can configure SpamLookup using the Plugins tab of the Settings item on the weblog menu.

2. Lookups Settings

When the plugin was first launched as MT-DBSL, lookups were all the plugin did. Now, they’re just one weapon in its formidable anti-spam arsenal. There are three options here:

IP address lookups
These look up the source IP address of the comment or trackback and compare it with several centralised blacklist servers (you can add extra servers if you wish). If the IP address is found on the blacklist server, you have the option of forcing moderation of the comment, adjusting its junk status (the default action is to subtract 1 from its score) or doing nothing. This can be quite effective but only if you trust the blacklisting systems.
Domain name lookups
This works in the same way, but looks up the domain names of the posted links. This is similar to how MT-Blacklist worked, except the blacklist is hosted elsewhere and not on your MT installation. Again, this is effective but only if you trust the blacklisting systems.
Advanced Trackback Lookups
This is quite badly explained in the plugin itself, which is unfortunate as it can be very effective. It compares the IP address of the trackback’s source URL with the IP address the ping was sent from. Normally the blog software sending the ping is on the same server as the blog itself, so they should match. A lot of spam is sent from zombie machines, not from the web site itself, so this will catch that sort of spam. Much of the spam I get is caught by this rule, but some spammers have become wise and started sending trackback pings from the same IP address as the web site they are trying to promote, so as to get around it. Also, I often get pings from a reader who blogs with Blogger and sends his pings from a third-party service not hosted on his site, and these do sometimes get junked because of this option.
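
The lookups above can be sketched roughly as follows. This is a Python illustration, not SpamLookup's own code, and it makes two assumptions the post does not confirm: that the blacklist servers are conventional DNS-based blacklists (queried by reversing the IP address and appending the server's zone), and that the trackback check simply resolves the host name in the source URL. The function names, the `bl.example.org` zone and the `resolve` parameter are all hypothetical.

```python
import ipaddress
from urllib.parse import urlparse

def dnsbl_query_name(ip, zone):
    """Build the DNS name for an IPv4 blacklist query:
    the octets in reverse order, followed by the blacklist zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone

def trackback_ip_matches(source_url, sender_ip, resolve):
    """Return True if the host in source_url resolves to the sender's IP.

    `resolve` is any callable mapping a host name to a list of IP strings;
    passing it in keeps the sketch testable without network access."""
    host = urlparse(source_url).hostname
    if host is None:
        return False
    return sender_ip in resolve(host)

# A fake resolver standing in for a real DNS lookup (hypothetical data)
fake_dns = {"blog.example.com": ["192.0.2.10"]}
resolve = lambda host: fake_dns.get(host, [])

print(dnsbl_query_name("192.0.2.10", "bl.example.org"))
print(trackback_ip_matches("http://blog.example.com/post", "192.0.2.10", resolve))
print(trackback_ip_matches("http://blog.example.com/post", "198.51.100.7", resolve))
```

With a real resolver, for instance `lambda host: socket.gethostbyname_ex(host)[2]`, the second function also shows why pings sent through a third-party service fail the check: the sender's address is not among the addresses the blog's host name resolves to.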

3. Link Settings

This looks at the source URL (trackbacks) or comment author URL (comments), along with any URLs posted in the comment itself.

  • The first option adds to the junk score if a comment has no links, since the general aim of blog spam is to link to a dodgy site to improve its ranking in search engines – a comment with no links is unlikely to be spam.
  • The second option will forcibly moderate any comment or trackback that has more than a certain number of links. Spam comments tend to have lots of links, but some commenters may post legitimate lists of links to other handy resources, so don’t set this too low.
  • The third option is like the second but will subtract from the junk score. Set this higher than the previous option if you have it enabled – mine is at 4.
  • The ‘Link memory’ option adds to the score if you have previously approved a comment containing the same URLs. This means that regular commenters are less likely to fall foul of any other rules. Keep this enabled.
  • The ‘Email memory’ does the same, except with email addresses. This may be undermined if you approve a lot of comments with false email addresses, though, and obviously doesn’t really work with trackbacks.
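
As a rough illustration of how the link options interact, here is a small Python sketch. It is not the plugin's real logic: the function name, the one-point adjustments, the moderation threshold of 2 and the decision to count the author URL as a link are all assumptions. Only the junk threshold of 4 comes from my own settings above.

```python
import re

# Simplistic URL matcher, good enough for a sketch
URL_RE = re.compile(r"https?://\S+", re.IGNORECASE)

def link_adjustment(text, author_url=None, approved_urls=frozenset(),
                    moderate_over=2, junk_over=4):
    """Return (score_change, force_moderation) for one comment."""
    urls = URL_RE.findall(text)
    if author_url:
        urls.append(author_url)

    score = 0
    moderate = False
    if not urls:
        score += 1            # no links at all: unlikely to be spam
    if len(urls) > moderate_over:
        moderate = True       # too many links: hold for approval
    if len(urls) > junk_over:
        score -= 1            # far too many links: count against it
    if urls and all(u in approved_urls for u in urls):
        score += 1            # 'link memory': every URL was approved before
    return score, moderate

print(link_adjustment("Nice post!"))
print(link_adjustment("see http://ok.example/page",
                      approved_urls={"http://ok.example/page"}))
```

Keeping the junk threshold above the moderation threshold means a borderline comment is held for review before it starts losing points.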

4. Keyword Filter Settings

These options act upon keywords in comments, and again replicate some functionality of MT-Blacklist. The first box contains words that should force a comment to be moderated, a new word on each line. In here you would want to put in words that may be indicative of spam but could also be used often in a legitimate sense. Mine includes words like ‘video’, ‘sexy’, ‘bankruptcy’ and a variety of swear words.
The second box contains words that would force a comment to be junked. By default, 1 is subtracted from the junk score for each of these words that is found, so a spam comment that mentioned 3 different drug names all blocked by this list would get 3 taken off its junk score. However, if there is a word that you think will never be used in a legitimate comment, you can put a number after it and any comment containing that word will have that number subtracted from its junk score. So putting in “viagra 4” would subtract 4 from the junk score of any comment containing the word ‘viagra’.
In both cases, you can also use Perl regular expressions – one I use is “direct(v|tv)” which matches “directv” and “directtv”.
This is an incredibly powerful feature, but by default the plugin has hardly any keywords in it. A good starting place is, ironically enough, The WordPress Wiki, which has a list that you can paste in. And the entries don’t necessarily have to be words – I get a lot of spam from a site called xxlfind.biz, so I added “xxlfind.biz 4” to my list to block out the spam from it.
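
The junk keyword rules above could be sketched like this in Python. It is only an approximation of what the plugin does: SpamLookup uses Perl regular expressions, and the function names, the case-insensitive matching and the choice to count each rule once per comment are my assumptions. The three sample rules are the ones mentioned in this post, and they happen to be valid in both Perl and Python.

```python
import re

def parse_junk_keywords(lines):
    """Turn lines such as 'viagra 4' or 'direct(v|tv)' into
    (compiled pattern, weight) pairs. The weight defaults to 1."""
    rules = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        pattern, weight = line, 1
        head, _, tail = line.rpartition(" ")
        if head and tail.isdigit():
            pattern, weight = head, int(tail)
        rules.append((re.compile(pattern, re.IGNORECASE), weight))
    return rules

def keyword_penalty(text, rules):
    """Return the (negative) score change for every rule found in text."""
    return -sum(weight for pattern, weight in rules if pattern.search(text))

rules = parse_junk_keywords(["viagra 4", "direct(v|tv)", "xxlfind.biz 4"])
print(keyword_penalty("Cheap VIAGRA and directtv deals", rules))
print(keyword_penalty("Thanks, great post", rules))
```

Note that an unescaped dot, as in “xxlfind.biz”, matches any character in a regular expression; that is harmless here but worth remembering for shorter patterns.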
SpamLookup does miss out a couple of handy features, namely blocking of duplicate comments and trackbacks, and better throttling, like in Real Comment Throttle, but after doing the above on my site, I get around 98% accuracy, with only a couple of trackbacks getting marked as junk and no false negatives (spam that gets through all the filters). Hopefully you will too 🙂 .

7 Comments

  1. Making the most of SpamLookup

    Many people have been complaining about SpamLookup’s apparent lack of doing anything. This great writeup should help you, as the title says, get the most out of SpamLookup!

  2. Thanks Neil. That made it a lot easier for me to figure this out!

  3. Most helpful – thank you! One follow-up question: Do you use the WordPress list as-is or did you wind up adding a number (for junk status) after each? Or is that more of a “adjust to taste” sort of thing …

  4. Wow, this info should be built in to the plug in. I’ve been still using MT-BLacklist but I need to look at this more.
    One question: Is the junk score and the feedback score (rating) the same thing? It seems as if they are but when “junk” is used it’s a negative number and vice-versa if it’s feedback (or a “credit”).
    When it says “Junk when more than X link(s) are given (and a score to adjust)”, is it just subtracting from the score (or is it automatically moving it to the junk folder AND subtracting)?
    Hmm… Maybe I’ve answered the one part that’s been puzzling me. Does the junk score ever get high (I mean low) enough to where it just deletes the message (or does it just move it to the junk folder)? I’m now thinking the answer is “no” the junk score only gets it moved into the junk folder (so even if the score is -200, it’s just going to get held in the junk folder). Previous to this I’ve been confused as to where the threshold between the two (junk and delete) are, but now I’m thinking the one setting I’ve seen is it.
    But if so, why is it moving to the junk folder and still subtracting…

  5. Neil, congratulations, this is a very helpful guide for all MT users… Thank you so much!
    Just let me make you a question, probably simple. I noticed a lot of spam comments coming with the “h1” style tag… is there anyplace I can block comments with this tag? SpamLookup, with your tweaks, is getting them all, anyway; just thought this could be a somehow easy setting to increase defense, yet “real people” don’t use this kind of tag when commenting.
    Thank you once again (and sorry ’bout the bad english 😉

  6. Gary: Junk score and feedback score are, I think, different, but related. I’m not entirely sure though.
    As for it getting low enough to be deleted, to be honest I’m unsure. I’d imagine it does but I’ve never tried.
    Tiagon: You could try adding <h1> as a keyword. Again, I haven’t tried it.

  7. I generally moderate or junk if they have any part of ”