Neil Turner's Blog

Blogging about technology and randomness since 2002

Foursquare Thursday – The importance of data integrity

This is the ninth post in a series about Foursquare – read part one, part two, part three, part four, part five, part six, part seven, part eight and part nine.
In the two years since Foursquare launched, its users have added quite literally millions of venues. I don’t know exactly how many venues are in the database but I believe it’s in the region of 20 million places. Which is great because it means that in countries where Foursquare is prevalent, you can check into just about anywhere without having to add the venue yourself. Of course, if the venue genuinely doesn’t exist, the option to add it is there.
That being said, Foursquare does have its fair share of junk – duplicate venues, fake venues for places that don’t actually exist, or venues which do exist but have inaccurate or missing info. I’ve touched on how to get errors fixed and the role of superusers in keeping Foursquare tidy, but not necessarily why it’s so important.
On a simple level, making sure data is accurate means that we have a nice tidy database, but that isn’t a reason in itself and would mean that all superusers are just obsessive-compulsives who don’t want the precious Foursquare database being soiled with bad information. And while in my case that may be partly accurate, there are some real reasons why we want the venue database to be organised.
For example, we merge venues for the following reasons:

  1. If you want to check in to somewhere and see multiple listings for the same physical venue, which one do you check into? Making sure that each venue in the real world only appears once on Foursquare makes it less confusing for people.
  2. Businesses can claim ownership of their venues, and doing so gives them access to various useful statistics about who checks in and when, as well as being able to run special deals. If a business claims their venue, but there’s a duplicate they haven’t claimed, then they don’t get the full picture about who is checking in, which is bad for them. And if users check into the duplicate venue, they may not be able to take advantage of the special deal.
  3. If a critical mass of users check in at the same time to a popular venue then it’s possible that they will be awarded one of the ‘swarm’ badges. If there are duplicates, it may be that the checkins are split across several venues, making obtaining the swarm badges harder.
  4. It ensures that mayorships of busy venues, like major train stations, are reserved for those who make the effort and check in every day. Creating a duplicate ‘London Waterloo Station’ and becoming the mayor is arguably cheating.

Getting the category of venues correct is also important. Several badges are dependent on a number of checkins to venues in a specific category, and so venues that have no category – or the wrong category – won’t count. With the advent of version 3 of the Foursquare mobile apps for the iPhone and Android, correct categories are also needed for the ‘Explore’ feature that allows the app to suggest new venues to visit in a town. And we’re also keen to ensure people’s houses are correctly classified in the ‘Home’ category to prevent them from showing in general area searches.
A lot of venues lack addresses, mainly because it’s optional when adding venues via the mobile apps. While this does allow the database to grow more quickly, and means that you don’t need to know the address of the place you’re adding whilst on the move, it does result in lots of venues where we may just have the name and its location to work with. Superusers actively seek out addresses for venues, because when faced with several venues with the same name in a small area (for example, multiple Subway outlets) having the address makes them easier to differentiate. That, and it’s easier for superusers to find duplicates. Foursquare’s main competitor Gowalla emails users after they create a venue, inviting them to fill in any missing venue information on the main web site, and I think Foursquare would do well to imitate this feature to improve data integrity.
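To make the Subway example a bit more concrete, here is a rough Python sketch of how you might flag possible duplicates. To be clear, this is not how Foursquare or its superuser tools actually work. The venue fields (`id`, `name`, `location`, `address`), the `possible_duplicates` function and the 100-metre radius are all my own assumptions for illustration.

```python
import math
from itertools import combinations


def normalise(name):
    """Lower-case a venue name and collapse whitespace, so 'Subway ' matches 'subway'."""
    return " ".join(name.lower().split())


def distance_m(a, b):
    """Great-circle (haversine) distance in metres between two (lat, lon) pairs."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371000 * math.asin(math.sqrt(h))


def possible_duplicates(venues, radius_m=100):
    """Return pairs of venue ids that share a name, sit close together,
    and cannot be told apart by address (hypothetical venue records)."""
    pairs = []
    for v, w in combinations(venues, 2):
        if normalise(v["name"]) != normalise(w["name"]):
            continue
        if distance_m(v["location"], w["location"]) > radius_m:
            continue
        # Two different known addresses suggest two genuinely separate outlets.
        if v["address"] and w["address"] and v["address"] != w["address"]:
            continue
        pairs.append((v["id"], w["id"]))
    return pairs
```

The address check is the interesting bit: two nearby venues with the same name and different addresses are left alone, whereas a venue with no address can't be ruled out and ends up as a candidate for someone to check by hand.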
And then there are junk venues – venues which don’t exist, or are inappropriate, such as people’s beds, kitchens, toilets; tables at restaurants; or puerile things like ‘your mum’. These can be flagged for deletion, although Foursquare staff themselves have to approve all deletions and I believe the deletion queue is months long. Plus venues which used to exist but have closed down can be ‘closed’ – retaining the checkin history and mayorship, but not allowing any new checkins and not showing in search results.
So now you know both the how and the why of what superusers, and keen helpers, do to make Foursquare a tidier and more ordered place.
