On the cross-fertilization of geospatial and semantic web technology

Reflecting on the news Digg embraces RDFa

Digg, one of the popular social news websites, announced that it will begin to support RDFa, a standard for embedding RDF statements in XML documents such as XHTML pages. Here is a screenshot of Digg’s RDFa in action.

Although it’s unclear at the moment how this new feature will help Digg expand its market share, the downstream consequence is definitely positive. Technologies like RDFa and Microformats are crucial to the success of the Semantic Web.

My speculation is that HTML will continue to dominate web publishing. People will continue to publish information in HTML because it’s the best markup language for displaying human-readable information in browsers. It’s the lowest common denominator for cross-platform information display. All desktop computers can run browsers to display HTML. Just about every mobile device on the market today supports some form of HTML rendering. In addition, there is little incentive to introduce other representation formats because HTML content displays well in mobile browsers like Opera Mobile.

If HTML is here to stay, then from the Semantic Web development point of view, we must figure out how to publish semantic data alongside HTML. In general, there are two approaches: (1) publish the semantic data of each HTML page in a separate document, or (2) embed the semantic description in the HTML page itself. RDFa and Microformats are technologies of the latter kind.

There are pros and cons associated with both approaches. For this reason, I think in the near future we will see web applications supporting both. However, if you ask which approach is more likely to attract web developers to share data, my answer is the latter (i.e., RDFa and Microformats).

First, they require less overhead in Web development. Adding a few extra HTML attributes to the existing template pages is relatively easy. Creating separate full-blown RDF documents, by contrast, would require a completely different set of business logic and template pages.
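To make the "few extra HTML attributes" point concrete, here is a minimal sketch in Python (chosen only for illustration). The template, the `render_story` helper, and the choice of Dublin Core properties are my own assumptions; this is not Digg's actual markup. The only difference from a plain HTML template is the `about` and `property` attributes.

```python
from html import escape
from string import Template

# Hypothetical story template. Compared with a plain HTML template, the only
# additions are the xmlns:dc declaration and the about= / property= attributes.
STORY_TEMPLATE = Template(
    '<div xmlns:dc="http://purl.org/dc/elements/1.1/" about="$url">\n'
    '  <h3 property="dc:title">$title</h3>\n'
    '  <span property="dc:creator">$author</span>\n'
    '</div>'
)


def render_story(url, title, author):
    """Render one story as HTML that also carries RDFa attributes."""
    return STORY_TEMPLATE.substitute(
        url=escape(url, quote=True),  # attribute value: escape quotes too
        title=escape(title),
        author=escape(author),
    )


if __name__ == "__main__":
    print(render_story("http://example.org/story/1", "RDFa & news", "alice"))
```

The business logic that produces `url`, `title`, and `author` is untouched; only the template changes.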

Second, RDFa and Microformats can take advantage of existing techniques for optimizing Web publishing. For example, caching is a common technique used by many web sites to improve performance. If semantic data is embedded in HTML, then it is cached along with the page without much re-implementation.

Third, embedding semantic data in HTML gives web developers a sense of familiarity. People like to work with what they are familiar with, and many of them are reluctant to change. At this early stage of the Semantic Web movement, some web developers may resist publishing RDF documents. Convincing them to use RDFa and Microformats, however, should be easy.

I’m happy to see that RDFa has been adopted by Digg, and I hope that more news sites will follow. I’m thinking that in the next release of gnizr, I will introduce the publishing of semantic data in RDFa or Microformats. Some editing of the existing Freemarker template pages should do the trick.

A solution to the RDF publishing dilemma

The Semantic Web is a vision of the Web in which computer programs can process not only the syntactic structure but also the semantic meaning of Web pages. To achieve this, we invented knowledge representation languages like RDF and OWL. The idea is that we will use these languages to describe the metadata about a Web page and the semantic meaning of its content. One question remains unanswered: how should we publish RDF documents on the Web?

I call this the RDF publishing dilemma. In the Semantic Web, should a content creator publish an explicit semantic description of an HTML page in a separate RDF document, or should the semantic information be embedded within the HTML page itself? There are pros and cons associated with either approach.

If the semantic information is described in a separate RDF document, it simplifies the editing and the management of documents. RDF documents will be treated like other Web documents (e.g., a unique URL for each document and no messy mixing of RDF and HTML syntax). However, it has some disadvantages. Version control becomes a bit more complex because we need to keep the information in an RDF document consistent with its HTML counterpart. Also, it discourages Web designers from adopting the Semantic Web idea because many see the creation of RDF documents as an extra task that gives no immediate benefit.

On the other hand, if semantic information is embedded within the HTML pages, it simplifies version control and lowers the barrier for Web designers to create semantic documents. Adding semantics to an HTML page is simply a matter of adding new tags or attributes to the existing page. But this approach has its own problem. Embedding semantic information in a Web page (e.g., RDF + HTML) imposes significant overhead on computer programs that process the document: extra logic needs to be implemented to parse the HTML page and extract the RDF description from it.
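To give a feel for that extra logic, here is a deliberately simplified sketch in Python using only the standard library. It is not a conforming RDFa parser: the `PropertyExtractor` class and `extract_triples` function are hypothetical names, and the code only handles flat `about` and `property` attributes, ignoring prefix resolution, nested markup, `rel`/`rev`, and `content`.

```python
from html.parser import HTMLParser


class PropertyExtractor(HTMLParser):
    """Collect (subject, property, text) tuples from about= / property= attributes.

    Simplified on purpose: assumes an element carrying property= contains
    plain text only, and does not resolve prefixes such as "dc:".
    """

    def __init__(self):
        super().__init__()
        self.triples = []
        self._subject = None    # most recent about= value seen
        self._property = None   # property= of the element being read, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "about" in attrs:
            self._subject = attrs["about"]
        if "property" in attrs:
            self._property = attrs["property"]
            self._text = []

    def handle_data(self, data):
        if self._property is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        # The first end tag after a property= element closes that element
        # (true only because we assume no nested markup inside it).
        if self._property is not None:
            value = "".join(self._text).strip()
            self.triples.append((self._subject, self._property, value))
            self._property = None


def extract_triples(html):
    parser = PropertyExtractor()
    parser.feed(html)
    parser.close()
    return parser.triples


if __name__ == "__main__":
    sample = (
        '<div about="http://example.org/story/1">'
        '<h3 property="dc:title">Hello</h3>'
        '<span property="dc:creator">alice</span>'
        "</div>"
    )
    for triple in extract_triples(sample):
        print(triple)
```

Even this toy version needs state tracking across tags, which is exactly the kind of burden a separate RDF document would spare the consumer.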

I came across Ivan’s blog, which describes a simple solution to the RDF publishing dilemma. You can read about the details in Ivan’s post. The basic idea is that Web publishers use RDFa to describe semantic information in an HTML page. Instead of requiring computer programs to parse and extract RDF information from the page, the web server is configured to serve an RDF version of the HTML page by using an RDFa-to-RDF translator and some Apache Rewrite rules.
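The routing decision behind such a setup can be sketched as follows. This is not the configuration from Ivan’s post, which uses Apache Rewrite rules; it is a rough Python illustration of one possible rule, and it assumes the server decides based on the `Accept` header. The function names and the `translator_url` with its `uri` query parameter are hypothetical.

```python
from urllib.parse import quote

RDF_XML = "application/rdf+xml"


def wants_rdf(accept_header):
    """Return True if the Accept header lists application/rdf+xml with q > 0.

    Simplified: it does not rank RDF against the other listed media types.
    """
    for part in accept_header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        q = 1.0
        for field in fields[1:]:
            if field.startswith("q="):
                try:
                    q = float(field[2:])
                except ValueError:
                    q = 0.0
        if media_type == RDF_XML and q > 0:
            return True
    return False


def choose_representation(page_url, accept_header, translator_url):
    """Decide whether to serve the HTML page or hand off to a translator.

    translator_url is a hypothetical RDFa-to-RDF service that takes the
    page address in a 'uri' query parameter.
    """
    if wants_rdf(accept_header):
        return ("redirect", translator_url + "?uri=" + quote(page_url, safe=""))
    return ("serve", page_url)


if __name__ == "__main__":
    print(choose_representation(
        "http://example.org/page.html",
        "application/rdf+xml",
        "http://translator.example/extract",
    ))
```

The publisher maintains a single RDFa-annotated HTML page, and clients that ask for RDF never have to parse HTML themselves.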

Mixing RDF/A with GeoRSS

RDF/A is a syntax for layering RDF information on any XML document via attributes. GeoRSS is a syntax for annotating RSS feeds with geographical information. In the past, I’ve shown examples using RDF/A.

Today I saw James Fee blogging about GeoRSS. I thought it would be interesting to try to mix GeoRSS and RDF/A.
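For readers who have not seen GeoRSS before, here is a small Python sketch of the GeoRSS half only. It reads a GeoRSS-Simple `georss:point` (latitude, then longitude, separated by whitespace) from a feed entry. The sample entry and the `read_point` helper are made up for illustration, and the sketch does not attempt the RDF/A mix itself.

```python
import xml.etree.ElementTree as ET

GEORSS_NS = "http://www.georss.org/georss"

# Made-up Atom entry carrying a GeoRSS-Simple point.
SAMPLE_ENTRY = """<entry xmlns="http://www.w3.org/2005/Atom"
       xmlns:georss="http://www.georss.org/georss">
  <title>Sample place</title>
  <georss:point>45.256 -71.92</georss:point>
</entry>"""


def read_point(entry_xml):
    """Return (lat, lon) from the entry's georss:point, or None if absent."""
    root = ET.fromstring(entry_xml)
    point = root.find("{%s}point" % GEORSS_NS)
    if point is None or not point.text:
        return None
    lat, lon = (float(value) for value in point.text.split())
    return lat, lon


if __name__ == "__main__":
    print(read_point(SAMPLE_ENTRY))
```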
