Neil Turner's Blog

Blogging about technology and randomness since 2002

Making the most of SpamLookup 2.1

With the release of Movable Type 3.3 approaching, I felt it was time I updated my Making the most of SpamLookup guide to feature the enhancements made with the new release. Most of the information here still applies to version 2.0 of SpamLookup, included in MT 3.2, but I’ve expanded some bits so it’s probably worth a read again even if you’re not contemplating a move to MT 3.3 just yet.

Moderation and Junking

In Movable Type 2.x, comments just had one status – published. Any spam blocking system could only accept or deny comments and trackbacks. In MT 3.0x and 3.1x, comments gained an additional status – ‘moderated’. This was where comments could be held for human approval before being published, and tools like SpamLookup and MT-Blacklist could hold comments here if they thought they might be spam but couldn’t be sure.
With 3.2x and 3.3x, trackbacks can also be moderated, and a new third status has been added for both: junk. Now, rather than deleting spam outright, you’ll find plugins send it here instead. That way, if you have a false positive – a comment that is seen as being spam but isn’t – you can retrieve it.
The junk status also has a rating system, and plugins can adjust the rating for an individual comment or trackback. The rating is between 10 and -10 – comments with a negative score are junked, otherwise they are moderated or published. You’ll find that SpamLookup can reduce the rating of comments that it thinks are spam, but also add points if, say, the comment has no links or has been posted with a URL that has already been accepted before.
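
To make the rating idea concrete, here is a minimal Python sketch. It assumes that each plugin's adjustment is simply added up and the total is kept within the -10 to 10 range; Movable Type's real calculation may well differ, and the function names are made up for illustration.

```python
# Hypothetical names; only the -10..10 range and "negative means junk"
# come from the text above. Summing and clamping is an assumption.
MIN_SCORE, MAX_SCORE = -10, 10

def final_score(adjustments):
    """Add up every plugin adjustment and keep the total within range."""
    total = sum(adjustments)
    return max(MIN_SCORE, min(MAX_SCORE, total))

def is_junk(adjustments):
    """A comment or trackback is junked only when its score is negative."""
    return final_score(adjustments) < 0
```

For example, a comment that loses a point for a blacklisted IP but gains one for having no links ends up at 0, so it is not junked.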

1. How to find the SpamLookup configuration options

SpamLookup can be configured at two levels – blog and installation. If you just have the one blog, or want any settings to apply across all the blogs on your installation of Movable Type, configure SpamLookup at the installation level, using the Plugins item on the MT main menu or System Overview screen – it should be towards the bottom. If you only want settings to apply to one blog, you can configure SpamLookup using the Plugins tab of the Settings item on the weblog menu.

2. Lookups Settings

When the plugin was first launched as MT-DBSL, all it did was perform lookups. Now, it’s just one weapon in its formidable anti-spam arsenal. There are three options here:

IP address lookups
These look up the source IP address of the comment or trackback and compare it with several centralised blacklist servers (you can add extra servers if you wish). If the IP address is found on the blacklist server, you have the option of forcing moderation of the comment, adjusting its junk score (the default action is to subtract 1 from its score) or doing nothing. This can be quite effective but only if you trust the blacklisting systems. I added the following extra services to the list – they resulted in more spam being blocked but also more false positives:

  • list.dsbl.org
  • dnsbl.sorbs.net

Separate each service with a comma.
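
For the curious, a DNS blacklist lookup conventionally works by reversing the octets of the IP address, putting them in front of the service name and resolving the result as a hostname; if it resolves, the address is listed. The Python sketch below illustrates that convention only. It is not SpamLookup's own code, the function names are hypothetical, and the resolver is passed in so the logic can be tried without touching the network.

```python
import ipaddress

def parse_services(setting):
    """Split the comma-separated service list from the settings box."""
    return [s.strip() for s in setting.split(",") if s.strip()]

def dnsbl_query_name(ip, service):
    """Build the hostname to resolve: reversed IPv4 octets plus the service."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + service

def is_listed(ip, service, resolve):
    """Return True if the query name resolves.

    resolve is any callable that returns an address or raises OSError
    when the name does not exist (an assumption made for this sketch).
    """
    try:
        resolve(dnsbl_query_name(ip, service))
        return True
    except OSError:
        return False
```

In real use you could pass `socket.gethostbyname` as `resolve`, since its lookup failure (`socket.gaierror`) is a subclass of `OSError`.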

Domain name lookups
This works in the same way, but looks up the domain names of the posted links. This is similar to how MT-Blacklist worked, except the blacklist is hosted elsewhere and not on your MT installation. Again, this is effective but only if you trust the blacklisting systems.
Advanced Trackback Lookups
This option is quite badly explained, which is unfortunate as it can be very effective. It compares the IP address of the trackback’s source URL with the IP address the ping was sent from. Normally the blog software sending the ping is on the same server as the blog itself, so they should match. A lot of spam is sent from zombie machines, not from the web site itself, so this will catch this sort of spam. As I said, much of the spam I get is caught by this rule, but some spammers have become wise and started sending trackback pings from the same IP address as the web site they are trying to promote so as to get around it. Also, I often get pings from a reader who blogs with Blogger and sends his pings from a third-party service not hosted on his site, and these do get junked sometimes because of this option.
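
The check described above can be sketched in a few lines of Python. This is only an illustration of the idea: the function name is hypothetical, the resolver is passed in as a parameter, and treating an unresolvable host as a mismatch is my assumption rather than documented plugin behaviour.

```python
from urllib.parse import urlparse

def trackback_ip_matches(source_url, sender_ip, resolve):
    """Compare the IP of the source URL's host with the IP that sent the ping.

    resolve is any callable mapping a hostname to an IP string and
    raising OSError on failure (an assumption made for this sketch).
    """
    host = urlparse(source_url).hostname
    if host is None:
        return False  # no usable hostname, so nothing to compare
    try:
        return resolve(host) == sender_ip
    except OSError:
        return False  # assumed: an unresolvable host counts as a mismatch
```

A ping sent through a third-party service fails this comparison even when it is legitimate, which is exactly the false positive described above.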

3. Link Settings

This looks at the source URL (trackbacks) or comment author URL (comments), along with any URLs posted in the comment itself.

  • The first option adds to the junk score if a comment has no links, since the general aim of blog spam is to link to a dodgy site to improve its ranking in search engines – a comment with no links is unlikely to be spam.
  • The second option will forcibly moderate any comment or trackback that has more than a certain number of links. Spam comments tend to have lots of links, but some commenters may post legitimate lists of links to other handy resources so don’t set this too low.
  • The third option is like the second but will subtract from the junk score. Set this higher than the previous option if you have it enabled – mine is at 4.
  • The ‘Link memory’ option adds to the score if you have previously approved a comment containing the same URLs. This means that regular commenters are less likely to fall foul of any other rules. Keep this enabled.
  • The ‘Email memory’ does the same, except with email addresses. This may be undermined if you approve a lot of comments with false email addresses, though, and obviously doesn’t really work with trackbacks.
  • You can also use the exclude function on the two ‘memory’ settings so that comments published within a certain recent period don’t count towards the memory. If a spam comment does get published for some reason, then with this enabled it is less likely to allow future spam comments containing the same URL and email address to be published. This feature is new in SpamLookup 2.1.
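
Here is a rough Python sketch of how the link count and ‘Link memory’ options could combine. The function name, the URL pattern, the one-point adjustments and the moderation threshold of 2 are all assumptions for illustration; only the junk threshold of 4 comes from my own settings.

```python
import re

# Simplified URL pattern, good enough for a sketch.
URL_RE = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)

def link_adjustment(text, author_url, approved_urls,
                    moderate_over=2, junk_over=4):
    """Return (score_change, force_moderate) for one comment."""
    urls = URL_RE.findall(text)
    if author_url:
        urls.append(author_url)

    delta = 0
    if not urls:
        delta += 1  # no links at all: unlikely to be spam
    force_moderate = len(urls) > moderate_over
    if len(urls) > junk_over:
        delta -= 1  # too many links: push towards junk
    if urls and all(u in approved_urls for u in urls):
        delta += 1  # link memory: every URL was approved before
    return delta, force_moderate
```

Note that `junk_over` is higher than `moderate_over`, so a comment with a handful of links is held for moderation before it starts losing points.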

4. Keyword Filter Settings

These options act upon keywords in comments, and again replicate some functionality of MT-Blacklist. The first box contains words that should force a comment to be moderated, a new word on each line. In here you would want to put in words that may be indicative of spam but could also be used often in a legitimate sense. Mine includes words like ‘video’, ‘sexy’, ‘bankruptcy’ and a variety of swear words.
The second box contains words that would force a comment to be junked. By default, 1 is subtracted from the junk score every time one of these words is found, so a spam comment that mentioned 3 different drug names all blocked by this list would get 3 taken off its junk score. However, if there are words that you think will never be used in a legitimate comment, you can put a number after the word and that number will be subtracted from the junk score of any comment containing it. So putting in “viagra 4” would subtract 4 from the junk score of any comment containing the word ‘viagra’.
In both cases, you can also use Perl regular expressions – one I use is “/direct(v|tv)/i” which matches “directv” and “directtv” (as well as “DirectTV” as it’s case insensitive).
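
As an illustration of how such a list might be read, here is a small Python sketch that handles plain words, words followed by a number, and /pattern/i style expressions. It is not the plugin's parser: the function names are hypothetical, Python's regular expressions are standing in for Perl's, and matching plain keywords case-insensitively on word boundaries is my assumption.

```python
import re

def parse_rule(line):
    """Turn one line of the keyword box into (compiled_pattern, weight)."""
    line = line.strip()
    weight = 1  # default: take 1 off the junk score
    numbered = re.match(r"^(.*\S)\s+(\d+)$", line)
    if numbered:
        line, weight = numbered.group(1), int(numbered.group(2))
    regex = re.match(r"^/(.*)/([a-z]*)$", line)
    if regex:
        flags = re.IGNORECASE if "i" in regex.group(2) else 0
        return re.compile(regex.group(1), flags), weight
    # Assumed behaviour: plain keywords match whole words, ignoring case.
    return re.compile(r"\b" + re.escape(line) + r"\b", re.IGNORECASE), weight

def junk_adjustment(text, rules):
    """Subtract each matching rule's weight once from the score."""
    return -sum(weight for pattern, weight in rules if pattern.search(text))
```

With the rules “viagra 4”, “/direct(v|tv)/i” and “poker”, a comment mentioning both viagra and DirectTV would lose 5 points under these assumptions.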
This is an incredibly powerful feature, but by default the plugin has hardly any keywords in it. So I’ve created some keyword lists based on keywords used on my own blog – here are keywords to moderate and here are keywords to junk. They may not be totally suited to your blog but they work well on mine and block a lot of junk.
You can go further by installing extra spam plugins – I’d personally recommend the following:

  • Real Comment Throttle – stops your weblog getting overloaded by floods of spam by setting limits on how many comments to accept in an hour/day
  • Akismet – another centralised system designed for weblog spam, with quite good accuracy rates
  • Autoban – automatically bans IP addresses of frequent spammers
  • Ban Ping 2 Old Entry – bans or junks trackback pings to old entries
  • SCode – adds a CAPTCHA for users to decipher before posting a comment (though they’re easily broken).

The first four of those are used here on this blog, and seem to work well.
I hope this guide has been useful for you, and that it helps to reduce the amount of spam you have to deal with on your blog. SpamLookup is a very powerful tool when configured correctly and can prove to be an ample warrior in the battle against spam.

7 Comments

  1. I’m not surprised that you saw a lot more false positives if you put dnsbl.sorbs.net in your IP lookup list. That list notoriously includes every IP address that is listed as dynamic by the ISP that owns the block.

  2. Hi Neil. Thanks for this helpful tutorial. I’m trying to download your keywords to moderate and junk. You mention that each keyword must be on a separate line but your files contain all of the keywords on 1 or 2 lines! I’m a bit confused about that. Can I simply download your keyword lists and paste them into SpamLookup?
    Thanks!

  3. You might want to check out my extended version of SpamLookup, which is backwards compatible but has bug fixes and lets you blacklist / whitelist on any comment or trackback field. The downside is that it’s a modification of SpamLookup, not a separate plugin.

  4. I have to agree that AOG’s extended SpamLookup is the linchpin for my spamfighting and IMHO nobody should be without it (and this is coming from someone who has a strong aversion to deviating from the production code base). In fact, I really think 6A should just write him a check and fold it into 3.4.

  5. A check — haha. 6A won’t even answer my email about contributing it for free.

  6. Thank you for this interesting information.
    I have one question: is there a way to customize the error message on SpamLookup, so that it goes with your template style and design?
    Thanks

  7. Hrmm that was weird, my comment got eaten. Anyway I wanted to say that it’s nice to know that someone else also mentioned this as I had trouble finding the same info elsewhere. This was the first place that told me the answer. Thanks.