Neil Turner's Blog

Blogging about technology and randomness since 2002

Caching gravatars

Gravatar.com is on a new host, which means it can go on serving gravatar images for a while yet. That doesn’t stop me feeling guilty about contributing to its bandwidth problem though.
The problem with Gravatar as it is at the moment is best illustrated by this entry on Redemption in a Blog (which is now 5th in Google for ‘Homer Simpson’). Here, you have a weblog entry which got lots of attention and had lots of comments posted to it. For each of those comments, a request to gravatar.com was needed. You then need to multiply that by the number of hits that page got. We’re therefore talking about a lot of requests to gravatar.com.
So what I’m proposing is some kind of caching mechanism. Instead of the client grabbing an image from gravatar.com every time someone visits the page, the server has a script which checks for a locally-cached copy of the gravatar, and, if it finds one, displays that instead. Otherwise, it pulls the image from gravatar.com, and then caches it.
This could mean far, far fewer requests being made to gravatar.com. Instead of a gravatar being requested every time it is needed, it may only be requested once a day (we’ll assume here that cached gravatars expire after 24 hours). That would amount to some sizable bandwidth savings.
To illustrate what I’m proposing, here’s a flow diagram:
[Diagram showing my Gravatar proposal]
Now, all we’d need is for someone to implement this. I’d do it myself but my PHP skills aren’t good enough. Any takers?
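As a starting point for anyone taking this up, the flow described above might be sketched roughly as follows. This is in Python rather than PHP, and it is only a sketch under assumptions: the names `cached_gravatar`, `fetch_remote` and `CACHE_TTL`, the cache directory, and the idea of naming cache files after a hash of the request URL are all illustrative choices, not part of Gravatar or of any existing plugin.

```python
import hashlib
import os
import time
import urllib.request

CACHE_TTL = 24 * 60 * 60  # 24 hours, the expiry assumed in the post


def fetch_remote(url):
    """Download the image bytes from the remote server."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.read()


def cached_gravatar(url, cache_dir, ttl=CACHE_TTL, fetch=fetch_remote):
    """Return image bytes for url, using the local copy while it is fresh."""
    os.makedirs(cache_dir, exist_ok=True)

    # Hypothetical naming scheme: one cache file per distinct request URL.
    key = hashlib.sha1(url.encode("utf-8")).hexdigest()
    path = os.path.join(cache_dir, key)

    # Serve the cached copy if it exists and is younger than the TTL.
    if os.path.exists(path) and time.time() - os.path.getmtime(path) < ttl:
        with open(path, "rb") as f:
            return f.read()

    # Otherwise pull the image from the remote server and cache it.
    data = fetch(url)
    with open(path, "wb") as f:
        f.write(data)
    return data
```

The `fetch` parameter is there only so the download step can be swapped out, for example to try the cache logic without touching the network. A real script would also need to decide what to serve when the remote server is unreachable and a stale copy is all that is available.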

9 Comments

  1. You mean something like my CacheRemote script which was developed back in the days when BlogRolling was having exactly the same problems?

  2. Yes I do. I had a play around trying to get that to work in this instance but didn’t get very far.

  3. I’m currently working on one in Python using MySQL. It’s an interesting project. Ironic, really, as my own gravatar doesn’t work. I’ll make it work one day.

  4. Done!
    It uses Python (tested in 2.3.4) and the MySQLdb module (lamp.inf.brad.ac.uk doesn’t have MySQLdb installed so there’s no working example unfortunately).
    It has a bit of a problem with JPEGs, but GIFs and PNGs work fine. It’s a bit kludgy at the moment, but it works.
    Grav cache.
    It takes the exact same GET stuff that Gravatar’s script accepts.

  5. Actually, I’ve been working on adding caching to EEGravatar for a while now, in part to help offset periods when the server isn’t available. I’ve been trying to make it comply with how ExpressionEngine does caching, though, and haven’t got it all figured out yet. I’m still relatively new to PHP programming after all, but it is something I’ve been meaning to add to my version of the plugin for a while now.
    Suppose I should take another crack at it. 🙂

  6. I sent off a few notes about the Gravatar API a while back and am going to attach them.
    Long story short the API is pretty useless and wastes a lot of bandwidth.
    Not supporting conditional gets is a problem too.
    Right now the API is insufficient for me to even use it. On top of that, there’s the problem that I’m wasting my OWN bandwidth, and since the API doesn’t use HTTP I don’t even have the option to save bandwidth. Ugh.
    Here’s my email:
    It might be a good idea for you to also support FOAF’s SHA1SUM format for use with Gravatar’s REST interface.
    “mailto:” concatenated with the email VALUE,
    then hashed with SHA1SUM, then Base16 encoded.
    The SHA1SUM is used for emails that are blinded by the system,
    as will be the case of FOAFnet import.
    This way I could fetch a Gravatar with JUST the SHA1SUM…
    … and another thing:
    If I use “burton@foobar.com” (which doesn’t exist in your DB) you return HTTP 200 OK and then a transparent GIF.
    If ‘default’ isn’t specified you should return 404 Not Found.
    If you don’t then the only way I know that it’s not found is to use the hashcode of the image as a token.
    I could certainly use your default parameter but it’s going to waste your bandwidth.
    Also… if the user specified ‘default’ (which is a full URL to another image) you should be able to use HTTP 302 and then set the Location header to the value of the ‘default’ param.
    Almost all HTTP implementations support this.
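
The two suggestions in the email above can be illustrated with a short Python sketch. The first function computes the FOAF-style hash as described (“mailto:” plus the address, SHA-1, hex encoded). The second shows the status codes the email asks for. Both function names are made up for illustration, and nothing here reflects how Gravatar actually behaves.

```python
import hashlib


def mbox_sha1sum(email):
    """SHA-1 of 'mailto:' + email, returned as a hex (Base16) string."""
    return hashlib.sha1(("mailto:" + email).encode("utf-8")).hexdigest()


def choose_response(found, default_url=None):
    """Return (status, headers) for an avatar lookup, as the email proposes."""
    if found:
        return 200, {}  # image exists: serve it normally
    if default_url:
        # No image, but a 'default' URL was given: redirect to it.
        return 302, {"Location": default_url}
    return 404, {}  # no image and no default: say so explicitly
```

With responses like these, a client could tell a missing gravatar from a real one by the status code alone, instead of hashing the returned image to spot the transparent GIF.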

  7. What about linking to gravatar through the distributed Coral cache ( http://www.scs.cs.nyu.edu/coral/ )?

  8. Why put the limit of 1 day on the gravatar? I can’t imagine them changing that frequently. Say, 1 week might be a bigger bandwidth saver.
    But I know nothing about programming so I can’t really say much!

  9. In response to the comment from Tom, I’ve ported his Python Script to PHP.
    PHP Port
    I’ve made a couple of comments in the PHP version as well, which weren’t in the original one.
    It uses the same DB Schema as Tom’s version, so should be identical.
    Colin