On the cross-fertilization of geospatial and semantic web technology

Geonames integrates hotel data

Geonames recently added 70,000 geocoded hotel records from three major hotel booking sites: hotels.com, diytravel and laterooms. This is part of Geonames’ latest initiative to include more Point of Interest data in its open source geonames database.

Integrating and matching data from different providers is challenging. Hotel names, addresses, and data quality can vary dramatically among providers, and it is often difficult to tell whether two records describe the same hotel.
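
To make the matching problem concrete, here is a minimal sketch of one common heuristic: treat two records as the same hotel when they are geographically close and their names are similar. This is not how Geonames did the integration (the post does not say). The record fields (`name`, `lat`, `lon`), the function names, and both thresholds are assumptions chosen for illustration.

```python
import math
from difflib import SequenceMatcher


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two WGS84 points."""
    earth_radius_m = 6371000.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * earth_radius_m * math.asin(math.sqrt(a))


def name_similarity(a, b):
    """Similarity ratio in [0, 1] between two names, ignoring case and outer whitespace."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def probably_same_hotel(rec_a, rec_b, max_distance_m=150.0, min_name_ratio=0.8):
    """Heuristic match: close together AND similarly named.

    Both thresholds are illustrative guesses, not tuned values.
    """
    distance = haversine_m(rec_a["lat"], rec_a["lon"], rec_b["lat"], rec_b["lon"])
    if distance > max_distance_m:
        return False
    return name_similarity(rec_a["name"], rec_b["name"]) >= min_name_ratio
```

A heuristic like this still fails on exactly the cases discussed in this post: renamed hotels, inconsistent address formats, and names in different languages.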

Thinking about this data integration problem, I first thought it could be easily solved if all data providers shared a common hotel ontology. Later I realized it’s not that simple: building such an ontology is not straightforward.

Here are some data modeling issues that must be considered:

  • Hotel name change: hotel names may change over time because of buyouts or re-branding. How should hotels whose names have changed be represented? If a hotel changes its name from A to B, are there two different hotels?
  • Address format representation: how should a hotel address be represented? The conventional text-string representation may work, but it can impose overhead on the data alignment process, e.g., matching “3000 Redwood Lane” to “3000 Redwood LN” needs extra matching logic.
  • Names/addresses in foreign languages: can ontologies be used to align hotel names/addresses that are in different languages? If no standard English name/address is associated with a hotel, is data alignment still possible?
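
The “extra matching logic” from the address bullet can be sketched as a small normalization step applied before comparing strings. This is only an illustration: the `SUFFIX_ABBREVIATIONS` table and the `normalize_address` function are hypothetical, the table is far from complete, and real addresses (especially non-English ones) need much more than this.

```python
import re

# Illustrative, incomplete mapping of street-type words to one canonical form.
SUFFIX_ABBREVIATIONS = {
    "lane": "ln",
    "street": "st",
    "avenue": "ave",
    "road": "rd",
    "boulevard": "blvd",
    "drive": "dr",
}


def normalize_address(address):
    """Lowercase, drop punctuation, and map known street-type words to abbreviations."""
    tokens = re.findall(r"\w+", address.lower())
    return " ".join(SUFFIX_ABBREVIATIONS.get(token, token) for token in tokens)


def same_address(a, b):
    """Compare two address strings after normalization."""
    return normalize_address(a) == normalize_address(b)
```

With this sketch, “3000 Redwood Lane” and “3000 Redwood LN” both normalize to `3000 redwood ln`, so `same_address` treats them as equal.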

Data integration is a hard problem. While Semantic Web technology may seem a natural solution, it’s not a “silver bullet”. Many domain-specific modeling issues must be considered.

3 Comments

  1. [...] Geonames | Geocoded Hotels Geonames | New map with hotel data Via Geospatial Semantic Web Blog | Geonames integrates hotel data [...]

    Pingback by La Cartoteca » Blog Archive » Geonames integra datos de hoteles — May 20, 2007 @ 3:59 pm

  2. Why is hotel name change a problem? If each hotel has its own URI then the name would just be an rdfs:label attached to the URI (also in multiple languages). Or you could have a temporal relationship such as label from 2003-2006. In any case, references to the hotel would remain the same, as would be the geo metadata.

    Comment by Holger Knublauch — May 20, 2007 @ 4:16 pm

  3. Holger,

    Thank you for your comment. I think hotel name change is a problem because it could lead to inconsistent hotel information across data providers. In a perfect world, a hotel name change would be reflected by every data provider. In the real world, however, that may not be the case.

    What you have suggested, adding a temporal property, is a plausible solution. Although it may seem a trivial design to you, it may not be so to other people. I chose to mention “hotel name change” explicitly because I want to draw attention to the fact that applying Semantic Web technology to a data integration problem takes more than hand-waving along the lines of “let’s use RDF/OWL”. We must also pay attention to domain-specific modeling, such as the temporal relationship that you mentioned.

    Comment by harrychen — May 20, 2007 @ 5:33 pm
