On the cross-fertilization of geospatial and semantic web technology

Geotagged blogs are rare in the blogosphere

To geotag blogs is to annotate weblog posts with geographical information. There are at least a few reasons why this is a good idea.

Geotagged blogs will enable web search engines to index blogs effectively by geographical information. This information will help build more powerful search engines that support spatial queries (e.g., find all blogs on the topic “war” written by people who are located in “Iraq”). Moreover, geographical information about blog posts will help us understand the trends and the ecology of the blogosphere (e.g., what is the most discussed topic in a particular geographical region, and how do opinions differ between people who live in different counties?). At present, these studies are done by collecting the IP locations of the bloggers and the written languages of the posts (I think).
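To make the idea of a spatial query concrete, here is a minimal Python sketch of the “war” in “Iraq” example. It is only an illustration: the `GeoPost` and `BoundingBox` types, the example URLs, and the rough bounding box are all assumptions of mine, and a real search engine would use proper region boundaries and a spatial index rather than a linear scan.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeoPost:
    """A blog post with a single point location (hypothetical structure)."""
    url: str
    topics: frozenset
    lat: float
    lon: float


@dataclass(frozen=True)
class BoundingBox:
    """A latitude/longitude rectangle used as a crude stand-in for a region."""
    south: float
    west: float
    north: float
    east: float

    def contains(self, lat: float, lon: float) -> bool:
        return self.south <= lat <= self.north and self.west <= lon <= self.east


def spatial_query(posts, topic, region):
    """Return the posts tagged with `topic` whose point falls inside `region`."""
    return [p for p in posts if topic in p.topics and region.contains(p.lat, p.lon)]


# Illustrative values only: a rough box around Iraq, not an authoritative boundary.
ROUGH_IRAQ = BoundingBox(south=29.0, west=38.8, north=37.4, east=48.6)

posts = [
    GeoPost("http://example.org/a", frozenset({"war"}), 33.3, 44.4),
    GeoPost("http://example.org/b", frozenset({"war"}), 48.9, 2.4),
    GeoPost("http://example.org/c", frozenset({"food"}), 33.3, 44.4),
]

matches = spatial_query(posts, "war", ROUGH_IRAQ)
print([p.url for p in matches])
```

Only the first post matches both the topic and the region. None of this is possible unless the posts carry coordinates in the first place, which is the point of geotagging.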

There are a few different languages and tools for publishing geographical information about blogs, for example, GeoRSS, Microformats, W3C Geo, FOAF, and the WordPress Geo Plugin (see also [1]).

A question I ask myself today is: “If we already have the languages and tools for geotagging blogs, why are geotagged blogs rare in the blogosphere?”

Understanding the Problem

Some may argue that geotagging is not a good idea to begin with, because typical users have no interest in GIS. I think that is a false argument. In a parallel universe, Flickr’s photo geotagging application caught the public’s attention within a short period of time, and the idea of geotagging has proved useful, at least for photos. Personally, I think that asking users to geotag their blogs is not much different from asking them to geotag their photos. The reason for the slow adoption of geotagged blogs must be something else.

Essential Elements for Enabling Geotagging

Enabling geotagged blogs requires at least two elements: (1) an effective language for publishing or embedding geographical information in blog posts, and (2) a set of tools that enable the discovery of geotagged posts and the data mining of geotagged information.

Core of the Problem

We don’t lack knowledge representation languages for describing geographical information. On the contrary, I think we have too many of them. The lack of a common standard language makes it hard to build effective tools.

Some critical issues that I’ve observed:

  • RDF vs. XML. RSS feeds are an important part of the present blogosphere. Since bloggers have adopted both XML and RDF representations for publishing their blog feeds, geotagging languages of both types have emerged. For example, GeoRSS was created for adding geo information to XML RSS feeds, and W3C Geo was created for describing geo information in RDF documents. There is no major technological difference between the two; the difference is rather one of bureaucracy. GeoRSS is backed by OGC and advocated by experts from the world of GIS and GML; W3C Geo is backed by W3C and advocated by DAML/OWL Semantic Web researchers.
  • Shared vocabularies with ambiguous usage semantics. Although different geotagging languages may be defined in different representation languages, they all seem to share a core set of vocabulary terms such as latitude and longitude (or a variation such as POINT). However, there is no standard agreement on how these terms should be used to represent information. A blog post may be associated with different kinds of location information: the location of the person who wrote the post, the location where the post was written, and the location that is related to or mentioned in the post. Different geotagging tools are built on different assumptions. This makes crawling geotagged data difficult and complex. See the difference between this, this and this.
  • An embedded solution vs. a separate document solution. There are two different ways to publish semantic annotations such as geotagged data on the Web. One can embed annotations in the original HTML, or publish a separate document and link to it from the original HTML. Microformats encourage the former approach, and others like W3C Geo, FOAF and GeoRSS suggest the latter. Building crawlers to discover geotagged information requires the designer to make some assumptions about the location of the documents. Since there are different ways to publish geo information, the design of a crawler becomes rather complex.
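The issues above can be seen in a small Python sketch of what a crawler has to do to read the same point from three encodings: a GeoRSS-Simple `georss:point` element, a W3C Geo `geo:lat`/`geo:long` pair, and a geo microformat embedded in well-formed XHTML. The function name `extract_points` and the sample documents are my own illustrations, and the sketch assumes the input parses as XML, which real-world HTML often does not.

```python
import xml.etree.ElementTree as ET

GEORSS_NS = "http://www.georss.org/georss"
W3C_GEO_NS = "http://www.w3.org/2003/01/geo/wgs84_pos#"


def extract_points(xml_text):
    """Return (lat, lon, source_format) tuples found in one XML/XHTML document."""
    root = ET.fromstring(xml_text)
    points = []

    # GeoRSS-Simple: <georss:point>lat lon</georss:point>
    for el in root.iter(f"{{{GEORSS_NS}}}point"):
        parts = (el.text or "").split()
        if len(parts) == 2:
            points.append((float(parts[0]), float(parts[1]), "georss"))

    for el in root.iter():
        # W3C Geo: <geo:lat> and <geo:long> as children of the same element
        lat = el.find(f"{{{W3C_GEO_NS}}}lat")
        lon = el.find(f"{{{W3C_GEO_NS}}}long")
        if lat is not None and lon is not None:
            points.append((float(lat.text), float(lon.text), "w3cgeo"))

        # geo microformat: class="geo" wrapping class="latitude" / class="longitude"
        if "geo" in el.get("class", "").split():
            found = {}
            for child in el.iter():
                for name in ("latitude", "longitude"):
                    if name in child.get("class", "").split():
                        found[name] = child.text
            if len(found) == 2:
                points.append(
                    (float(found["latitude"]), float(found["longitude"]), "microformat")
                )
    return points


# Hypothetical sample documents, one per encoding.
rss_item = """<item xmlns:georss="http://www.georss.org/georss">
  <title>Post</title>
  <georss:point>45.256 -71.92</georss:point>
</item>"""

rdf_item = """<item xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#">
  <geo:lat>45.256</geo:lat>
  <geo:long>-71.92</geo:long>
</item>"""

xhtml = """<div class="geo"><span class="latitude">45.256</span>,
<span class="longitude">-71.92</span></div>"""

for doc in (rss_item, rdf_item, xhtml):
    print(extract_points(doc))
```

Even after all three branches are written, the crawler still cannot tell whether the point is where the author lives, where the post was written, or what the post is about, and it has to know in advance whether to look in the page itself or in a linked feed or profile.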

Finding a Remedy

As steps toward resolving these issues, I suggest the following:

  • Standards bodies and special interest groups should pay extra attention to how the languages are used in conjunction with other established languages and technology. For example, should all bloggers provide both W3C Geo and GeoRSS descriptions in their FOAF profiles and SIOC profiles? Should bloggers provide both XML and RDF versions of a location description when publishing their blog posts?
  • Tool developers should avoid reinventing the wheel by reusing existing vocabularies as much as possible. SIOC Export for WordPress is a good example that encourages the reuse of ontology vocabularies. It uses the FOAF ontology to describe people and the SIOC ontology to describe posts. I imagine it could easily be extended to use W3C Geo or GeoRSS to describe location information as well.
  • Researchers and developers should attempt to put aside philosophical and political differences in building languages and tools. We should think about what is best for the users, not for ourselves. Whether it’s RDF or XML, or whether it’s semantic web (lower-case) or Semantic Web (upper-case), whatever works is the best solution. To build useful applications for everyday people, we need innovation: combining different technologies to solve an existing problem.
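As a rough illustration of the vocabulary reuse suggested above, the following Python sketch emits a Turtle description of a post that combines SIOC, FOAF and W3C Geo terms. This is not the output of SIOC Export for WordPress; the function `post_to_turtle`, the example URLs, and the choice to attach `geo:lat`/`geo:long` directly to the post (read here as “the place the post is about”) are my assumptions.

```python
PREFIXES = """@prefix sioc: <http://rdfs.org/sioc/ns#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix geo: <http://www.w3.org/2003/01/geo/wgs84_pos#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
"""


def turtle_literal(value):
    """Quote a Python value as a plain Turtle string literal."""
    text = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return '"' + text + '"'


def post_to_turtle(post_url, title, author_url, author_name, lat, lon):
    """Describe one geotagged post by reusing existing vocabularies (sketch)."""
    return (
        PREFIXES
        + "\n"
        + f"<{post_url}> a sioc:Post ;\n"
        + f"    dcterms:title {turtle_literal(title)} ;\n"
        + f"    foaf:maker <{author_url}> ;\n"
        # Assumption: the point is the place the post is about.
        + f"    geo:lat {turtle_literal(lat)} ;\n"
        + f"    geo:long {turtle_literal(lon)} .\n"
        + "\n"
        + f"<{author_url}> a foaf:Person ;\n"
        + f"    foaf:name {turtle_literal(author_name)} .\n"
    )


# Hypothetical post and author.
ttl = post_to_turtle(
    "http://example.org/blog/2006/09/geotagging",
    "Geotagged blogs are rare",
    "http://example.org/people/alice#me",
    "Alice",
    45.256,
    -71.92,
)
print(ttl)
```

No new terms are defined here; everything comes from vocabularies that already exist. What remains open, as noted earlier, is agreement on what the coordinates attached to a post are supposed to mean.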

[1] Other discussions on geotagging languages

6 Comments

  1. [...] Continuing with the semantic web, the future of the Internet (though there is no rush). The semantic web consists of a set of technologies, still to be standardized or popularized, that add semantic markup to pages so that they become a kind of database that any search engine can index and that can be related to one another. (X)HTML would be transformed, ceasing to be purely a presentation format and becoming a format of meaning as well, since it would be enriched with a series of new tags, understandable by all browsers, that would carry this kind of computable information; our searches would then find personalized answers, because search engines would at last be handling material clearly marked up by means of ontologies (vocabularies defined formally and publicly to avoid ambiguity and confusion). A semantic web for intelligent search engines, able to discriminate among the data that web pages contain. Perhaps an example is needed. Suppose there is a tag (we will call it “price”) that holds precisely the price of an item; another called “tv” with attributes such as “brand” to specify the manufacturer and “inches” for the size. One day you decide to search the Internet for information on 32″ televisions that cost less than 700 €, identified by manufacturer. The search engine (still Google?) would return a reliable result because it indexes thousands or millions of sites transparently, accessing the content of the categories “price”, “tv”, “brand” and “inches”. In reality the semantic web is more complex, but the example may still serve to imagine its potential. Well, back to the present. While the semantic web is not yet widespread, we can try out a few things. For example, geolocation for your blog. Does that seem strange? Consider, however, the success of geolocation on Flickr and other sites, which allow this kind of information to be added to their content, whether images, Messenger contacts or commercial products. Sites, and therefore blogs, can also contain this metadata. There is a site devoted to the subject called geospatialsemanticweb; I recommend it. That site has published an article analyzing the advantages of adding geotags to our blogs. Among the simplest ways to incorporate this meta-information is a plugin for Word Press, which adds it automatically to the header so that it is accessible to search engines, and which is highly configurable: it lets you specify the coordinates or decide what data to show and in what form (for example, a map). [...]

    Pingback by despuesdegoogle » Archivo del weblog » Añade geoinformación a tu Word Press — September 23, 2006 @ 4:10 am

  2. I think it comes down to usability. There is a disconnect between standards developers & actual users who are writing about their trips, photos, and news stories. Plugins/Tools/Applications need to be easy to use for the ‘typical user’ (whoever they are for your target). Most users don’t care about “ontology” or RDF, they just want to post up an interesting location, a hike, or an area of interest.

    As developers/standards advocates we need to push extensions and tools into the realm of our target users. Hide whatever deep tech you’re using and just present to the user a box “Enter your location:”

    Comment by Andrew Turner — September 24, 2006 @ 12:39 pm

  3. Hi, for your info, we are working on a GeoRSS slash plugin for slashgeo.org, see http://software.lottadot.com/projman.pl?op=view&id=18 . The plugin could eventually be used by slashdot and thus make a lot of people aware of the value of geolocating blog entries.

    Comment by Alexandre Leroux — September 25, 2006 @ 8:08 am

  4. You might be interested in Auto Geo, a tool I made to automatically add microformat geocoding to addresses marked up with the hCard microformat. So anyone living in an area covered by one of the free online geocoding services can gain the benefits of geocoding without actually looking up their coordinates.

    Comment by Scott Reynen — September 25, 2006 @ 1:45 pm

  5. Collecting IP addresses and geo information is wrong. We enter directly in the individual privacy zones.

    Collecting geo information in a post to locate what the post is talking about may be good. Example:

    Good: locating a post in Paris talking about the Tour Eiffel.
    Wrong: locating a post giving the apartment location of a friend when you went to his/her party.

    Comment by karl — September 26, 2006 @ 8:48 pm

  6. I agree with karl. No one wants some wacko to find where they live and pester them (or worse). If it’s a blog talking about a trip or location, maybe that would be okay, but most folks are concerned over privacy issues.

    Comment by Wynter — February 1, 2007 @ 8:57 am
