On the cross-fertilization of geospatial and semantic web technology

Geohash for spatial index and search

Geohash is a new algorithm for encoding latitude and longitude coordinates. Given a pair of lat/long coordinates (e.g., 42.6 -5.6) as the input, the algorithm produces a string output that is the Geohash encoding of the coordinates (e.g., ezs42).

Why is this interesting? In many Web applications, we often need to create unique identifiers for locations. It’s easy to use the Geohash algorithm to construct URL (or URI) for identifying “point” locations (i.e., locations defined by a pair of lat/long coordinates).

Here is an example:

  • Location: Baltimore, MD
  • Lat/long coordinates: 39.286534 -76.613558

If we set this coordinates as the input to the Geohash algorithm, we will get an output string:

  • dqcx2xgswswx

To create a unique URL (or URI) that identifies “Baltimore, MD”, we simply append the string produced to an URL prefix (e.g., http://geohash.org)

  • http://geohash.org/dqcx2xgswswx

What are the advantages of using Geohash? Geohash seems to be useful in building simple spatial index and spatial search. In a typical geospatial application, we often rely on spatial databases (e.g., MySQL Spatial and Oracle Spatial) to provide spatial index and search. However, this imposes a significant amount of overhead during the development and deployment. This is where Geohash can help. If the spatial operations in the application are relatively simple (e.g., only involve points), Geohash provides an easy solution to build an index of locations and allow them to be searched.

How do we do this? The encoding function of the Geohash algorithm has an interesting property. Given any two pairs of latitude and longitude coordinates about a similar location but of different granularity, the strings produced by the algorithm will always share a common prefix string.

For example:

Given two points,

  • Point A (39.286534 -76.613558), and
  • Point B (39.286 -76.613)

The corresponding Geohash encoding of these two points are

  • dqcx2xgswswx
  • dqcx2xu1

Notice these two strings share a common prefix string: ‘dqcx2x’.

Given this unique property, we can utilize this knowledge to find locations that are nearby each other or deduce whether or not different locations are about the same physical place.

You can read more about Geohash on Wikipedia and Geohash.org.

Sharing is Good. These icons link to social bookmarking sites where readers can share and discover new web pages.
  • del.icio.us
  • YahooMyWeb
  • Reddit
  • co.mments
  • Furl
  • Ma.gnolia
  • NewsVine
  • Simpy
  • bodytext
  • E-mail this story to a friend!
  • Facebook
  • Google
  • StumbleUpon
  • Technorati
  • TwitThis

3 Comments

  1. hey — that’s pretty neat Harry! I’m sure there must be a good use we can put it to.

    Comment by tim finin — May 30, 2008 @ 5:24 pm

  2. That doesn’t work too well for London, which straddles 0 longitude–either side of 0 flips the MSB. These two places are pretty close to each other:

    http://geohash.org/u10hb7951
    http://geohash.org/gcpuzewfz

    Comment by Michael S. — June 8, 2008 @ 5:11 am

  3. [...] reading this article about geohashing, I thought I’d share a similar implementation I had which used a matrix to translate [...]

    Pingback by Sho Fukamachi Online » Blog Archive » Geohashing using matrices in Ruby — June 8, 2008 @ 12:26 pm

Leave a comment

XHTML: You can use these tags: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>