Neil Turner's Blog

Blogging about technology and randomness since 2002

Trackback spam analysis

There’s something a bit revealing in my access logs. Here’s one entry:

    - - [03/Jul/2004:01:32:35 +0000] "GET /2004/May/09/nigritude_ultramarine.html HTTP/1.0" 200 10871 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
    - - [03/Jul/2004:01:32:37 +0000] "POST /scgi-bin/mt/ping.cgi/1795 HTTP/1.0" 200 84 "" "Mozilla/4.0 (compatible; MSIE 5.5; Windows 98; Win 9x 4.90)"

Now, maybe this is just me, but I don’t see how it’s possible to switch from IE6 on Windows XP to IE 5.5 on Windows Me in 2 seconds, unless someone’s using VMware. Or, more likely, one or both of those user agents is faked, especially as the client didn’t request my stylesheet or the external JavaScript file for TypeKey integration. A lot of the other pings sent a TypePad user agent instead of an IE5.5/Win Me agent.
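As a rough sketch of how this kind of check could be automated, the Python below scans log lines and flags any client whose user agent changes between two requests made within a few seconds of each other. It assumes the standard Apache combined log format with the client address as the first field, and that the lines are in chronological order. The address 192.0.2.1 is a placeholder, and the name `find_agent_switches` and the 10-second window are made up for illustration.

```python
import re
from datetime import datetime, timedelta

# Combined log format (assumed):
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def find_agent_switches(lines, window=timedelta(seconds=10)):
    """Return (host, earlier_agent, later_agent) for every host whose
    user agent differs between consecutive requests within `window`."""
    last_seen = {}  # host -> (timestamp, agent) of its previous request
    switches = []
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue  # skip lines that are not in the assumed format
        host = match["host"]
        when = datetime.strptime(match["time"], "%d/%b/%Y:%H:%M:%S %z")
        agent = match["agent"]
        if host in last_seen:
            prev_when, prev_agent = last_seen[host]
            if agent != prev_agent and when - prev_when <= window:
                switches.append((host, prev_agent, agent))
        last_seen[host] = (when, agent)
    return switches


# The two requests from the log, with a placeholder client address.
sample = [
    '192.0.2.1 - - [03/Jul/2004:01:32:35 +0000] '
    '"GET /2004/May/09/nigritude_ultramarine.html HTTP/1.0" 200 10871 "-" '
    '"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"',
    '192.0.2.1 - - [03/Jul/2004:01:32:37 +0000] '
    '"POST /scgi-bin/mt/ping.cgi/1795 HTTP/1.0" 200 84 "" '
    '"Mozilla/4.0 (compatible; MSIE 5.5; Windows 98; Win 9x 4.90)"',
]

for host, before, after in find_agent_switches(sample):
    print(host, "switched from", before, "to", after)
```

A switch like this is only a hint, not proof: several machines behind one NAT address would trip the same check.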
I also found that IP blocking hasn’t been as much of a failure as I thought – I’ve had a number of pings blocked. As Richy said, a number of these have come from computers owned by the US military, which is both amusing and very, very scary at the same time.
One thing I did find interesting was that I only had one GET request from any of the IP addresses used today, which was the one above. I reckon the spammers used that request to find out what my trackback script is called, and then appended random numbers to it. Some of the pings were to entries where trackbacks had been closed for some months. Richy says that despite renaming his script the attackers came back, so it’s possible that they’re rediscovering the script name once a day, or something.
Incidentally, this isn’t an MT-only phenomenon, as Les has been hit – he uses pMachine’s ExpressionEngine. Therefore, my theory is that the spam script is parsing the RDF code block containing the trackback data to get the trackback URL, so any blogging system which includes that block is potentially affected (assuming I’m correct).
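To show how little effort that would take, here is a minimal Python sketch of pulling the ping URL out of a page’s embedded RDF block. It is only an illustration of my theory, not a reconstruction of whatever the spammers actually run. The sample page, the example.com URLs and the name `discover_ping_urls` are all invented, and the regular expression simply looks for the `trackback:ping` attribute that trackback auto-discovery relies on.

```python
import re

# The RDF block carries the ping URL in a trackback:ping="..." attribute.
PING_PATTERN = re.compile(r'trackback:ping="([^"]+)"')


def discover_ping_urls(html):
    """Return every trackback ping URL advertised in the given page source."""
    return PING_PATTERN.findall(html)


# Hypothetical entry page with an embedded RDF block inside an HTML comment.
page = """
<html><body>
<!--
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
<rdf:Description
    rdf:about="http://example.com/2004/May/09/entry.html"
    dc:identifier="http://example.com/2004/May/09/entry.html"
    dc:title="An example entry"
    trackback:ping="http://example.com/mt/ping.cgi/1795" />
</rdf:RDF>
-->
</body></html>
"""

print(discover_ping_urls(page))
```

One GET of a single entry is enough to learn the script name this way, which would fit the lone GET request I saw in my logs.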
Like Jay, I am surprised it has taken so long for trackback spam to get off the ground, considering how easy it is. I’m starting to wonder, what with the problems with character encodings that I’ve heard the likes of Sam Ruby and Jacques Distler talk about, and now this, that maybe we need a Trackback 2.0 system that addresses some of the problems with the existing system.


  1. Well, for one thing, Six Apart could follow through with the full implementation of the spec.
    Notice the deprecation at the bottom under v1.1: GET requests. Uh uh.. Still there.
    Now sure, someone could write a script that POSTs the data instead, but at least this would, as Phil Ringnalda once said, raise the bar so that at least a basic knowledge of the LWP module (or comparable in other languages) is required.
    (By the way, Neil, you have that data loss bug on registered preview. Are you using the MTCommentFields tag? You may want to scrap that in favor of the full template code..)

  2. They could of course both be on the same NATed network, but yes – chances are they are both scripts rather than real people.

  3. Trackback Spam

    Spammers have discovered Trackback and have recently been leaving their trail of unwelcome links all over the blogosphere. As with comment spam, your first recourse is Jay Allen’s MT-Blacklist. The blacklist will help you delete the trackbacks and ban …

  4. Being military myself, I have been in contact with the appropriate .mil security people and they are working on this very diligently. I would like people to forward to me any 198.x.x.x IPs, or any others that tracert back to a .mil address.

  5. Trackback Spam

    So it’s come to this. Comment spamming is no longer a problem for MT 3.0 users (with comment moderation and typekey access). So they finally are spamming via trackback (who knows why it took them this long to figure that…