Neil Turner's Blog

Blogging about technology and randomness since 2002

Compression comparisons

Earlier today I downloaded a complete copy of the access log for this site. Unfortunately, my current host doesn’t let you split the log into chunks smaller than one month, which, on a relatively busy site, means you get some rather monstrous files. Its redeeming feature is that the log file is gzipped on the fly, so it is possible to download it in a reasonably timely fashion, even on dial-up.
Since I first moved to this host, my log file has grown to 58.9MB. Gzipped, it’s down to 6.09MB – somewhat more manageable. But gzip isn’t the most efficient format out there, so I did a little experiment. Using the latest 7-Zip beta with the compression settings turned up to the maximum (Deflate64 with Ultra compression and Word Size set to 255), I was able to get the file down to 5.05MB – a whole megabyte saved. WinZip 9, currently in beta, also supports this enhanced compression mode, but could only get it down to 5.25MB. Unfortunately, both of these files would suffer compatibility issues, since Deflate64 has only been around a few years and some older compression utilities won’t be able to read them (though the majority can, I gather).
With BZip2, another entirely open format, the file size dropped even further, to 3.74MB, although it took quite some time to compress. I could also have used 7-Zip’s own open format, 7z, but at maximum settings I’d need nearly 2GB of memory to compress it. Still, reducing a file from nearly 60MB down to just under 4MB is pretty impressive compression – a 93.7% reduction. And compared with gzip, using BZip2 in this instance would have led to a 38.6% reduction, although it would put extra load on the server.
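For anyone who wants to check the sums or try a similar comparison, here is a minimal Python sketch. The first half just redoes the percentage arithmetic with the sizes quoted above. The second half compresses some data with the standard library's gzip and bz2 modules and reports the sizes. The function names and the sample log lines are made up for illustration (they are not my real access log), and the results will not match the 7-Zip or WinZip numbers, since those tools use different settings and formats.

```python
import bz2
import gzip


def reduction_percent(original_size, compressed_size):
    """Percentage by which the size shrank, relative to original_size."""
    return (1 - compressed_size / original_size) * 100


# Sizes quoted in the post, in MB.
RAW_MB, GZIP_MB, BZIP2_MB = 58.9, 6.09, 3.74

print(f"bzip2 vs raw:  {reduction_percent(RAW_MB, BZIP2_MB):.1f}%")
print(f"bzip2 vs gzip: {reduction_percent(GZIP_MB, BZIP2_MB):.1f}%")


def compare_sizes(data):
    """Compress data in memory with gzip and bzip2 at their highest level.

    Returns a dict of sizes in bytes. For a real log you would pass in
    the bytes read from the file (hypothetical helper, not a real tool).
    """
    return {
        "raw": len(data),
        "gzip": len(gzip.compress(data, compresslevel=9)),
        "bzip2": len(bz2.compress(data, compresslevel=9)),
    }


# Invented sample data in a rough access-log shape, purely for illustration.
sample = b"".join(
    b'192.0.2.%d - - [01/Jan/2004:00:00:%02d +0000] "GET /page/%d HTTP/1.1" 200 %d\n'
    % (i % 250, i % 60, i % 40, 1000 + i)
    for i in range(5000)
)

sizes = compare_sizes(sample)
for name, size in sizes.items():
    saved = reduction_percent(sizes["raw"], size)
    print(f"{name:>5}: {size:>8} bytes ({saved:.1f}% smaller than raw)")
```

Both compressors run at level 9 here, the maximum each module accepts. How the two compare will depend heavily on the data, so treat the output on the sample as a rough illustration rather than a benchmark.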

2 Comments

  1. Neil,
    You have way too much time to blog!

  2. That is nuts.