On the cross-fertilization of geospatial and semantic web technology

Reflecting on the news Digg embraces RDFa

Digg, one of the popular social news web sites, announced that it will begin to support RDFa, a standard for embedding RDF statements in XHTML documents. Here is a screenshot of Digg's RDFa in action.

Although it’s unclear at the moment how this new feature will help Digg expand its market share, the downstream consequences are definitely positive. Technologies like RDFa and Microformats are crucial to the success of the Semantic Web.

My speculation is that HTML will continue to dominate web publishing. People will continue to publish information in HTML because it’s the best markup language for displaying human-readable information in browsers. It’s the lowest common denominator for cross-platform information display. All desktop computers can run browsers to display HTML, and just about every mobile device on the market today supports some form of HTML rendering. In addition, there is little incentive to introduce other representation formats because HTML content already displays well in mobile browsers like Opera Mobile.

If HTML is here to stay, then from the Semantic Web development point of view, we must figure out how to publish semantic data alongside HTML. In general, there are two approaches: (1) publish the semantic data of each HTML page in a separate document, or (2) embed the semantic description in the HTML page itself. RDFa and Microformats are technologies of the latter kind.

There are pros and cons to both approaches. For this reason, I think in the near future we will see web applications support both. However, if you ask which approach is more likely to attract web developers to share data, my answer is the latter (i.e., RDFa and Microformats).

First, they require less overhead in web development. Adding a few extra HTML attributes to existing template pages is relatively easy, whereas creating separate full-blown RDF documents would require a completely different set of business logic and template pages.
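To make this concrete, here is a minimal sketch of what "a few extra attributes" might look like. It uses Python's string.Template as a stand-in for a real template engine, and the template, the function name render_story, and the choice of Dublin Core properties are all my own assumptions for illustration, not Digg's actual markup.

```python
from html import escape
from string import Template

# Hypothetical story template. The only additions for RDFa are the
# xmlns:dc, about, and property attributes; the visible HTML is unchanged.
STORY_TEMPLATE = Template(
    '<div xmlns:dc="http://purl.org/dc/elements/1.1/" about="$url">\n'
    '  <h3 property="dc:title">$title</h3>\n'
    '  <span property="dc:creator">$author</span>\n'
    '</div>'
)


def render_story(url, title, author):
    """Render one story as HTML with RDFa attributes embedded."""
    return STORY_TEMPLATE.substitute(
        url=escape(url, quote=True),
        title=escape(title),
        author=escape(author),
    )


if __name__ == "__main__":
    print(render_story("http://example.org/story/1", "RDFa & Microformats", "alice"))
```

The business logic that feeds the template does not change at all; only the markup around the values picks up the extra attributes.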

Second, RDFa and Microformats can take advantage of existing techniques for optimizing web publishing. For example, caching is a common technique used by many web sites to improve performance. If semantic data is embedded in HTML, then it is cached along with the page without much re-implementation.

Third, embedding semantic data in HTML gives web developers a sense of familiarity. People like to work with what they are familiar with, and many of them are reluctant to change. In the early stages of the Semantic Web movement, some web developers may show signs of resistance to RDF document publishing. But convincing them to use RDFa and Microformats should be easy.

I’m happy to see that RDFa has been adopted by Digg, and I hope that more news sites will follow. I’m thinking that in the next release of gnizr, I will introduce the publishing of semantic data in RDFa or Microformats; some editing of the existing Freemarker template pages should do the trick.

Optimus: Microformats data transformer

Microformats are a set of data format standards for embedding semantic information in XHTML documents. Dmitry Baranovskiy created a transformer application that converts Microformatted semantic information into formats suitable for mashups (JSON and XML).

The application is open source, and is hosted on Google Code.

Here are the XML and JSON outputs for my Biosketch page.

The implementation is surprisingly simple but powerful. It relies on XSLT to transform Microformatted content from an XHTML file into JSON or XML. If you want to use the application as a web service, follow the instructions here.
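To give a feel for the kind of transformation involved, here is a rough sketch in Python. Note that this is not how Optimus works: Optimus uses XSLT, while this sketch swaps in Python's standard html.parser. The class name HCardParser, the function hcard_to_json, and the small set of hCard property names handled are my own assumptions, and the sketch assumes well-formed XHTML containing a single hCard.

```python
import json
from html.parser import HTMLParser

# A small, assumed subset of hCard property class names.
HCARD_PROPS = {"fn", "org", "title", "email"}


class HCardParser(HTMLParser):
    """Collect the text of elements whose class matches an hCard property."""

    def __init__(self):
        super().__init__()
        self.parts = {}   # property name -> list of text fragments
        self._stack = []  # one entry per open element: property name or None

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        prop = next((c for c in classes if c in HCARD_PROPS), None)
        self._stack.append(prop)

    def handle_endtag(self, tag):
        if self._stack:
            self._stack.pop()

    def handle_data(self, data):
        # Attribute text to the innermost enclosing property element, if any.
        prop = next((p for p in reversed(self._stack) if p), None)
        if prop is not None:
            self.parts.setdefault(prop, []).append(data)


def hcard_to_json(xhtml):
    """Return the hCard properties found in an XHTML string as JSON."""
    parser = HCardParser()
    parser.feed(xhtml)
    parser.close()
    # Join fragments and normalize whitespace for each property.
    card = {k: " ".join("".join(v).split()) for k, v in parser.parts.items()}
    return json.dumps(card, sort_keys=True)


if __name__ == "__main__":
    sample = (
        '<div class="vcard"><span class="fn">Jane <b>Q.</b> Doe</span>, '
        '<span class="org">Example Corp</span></div>'
    )
    print(hcard_to_json(sample))
```

A real transformer has to handle nested properties, multiple cards per page, and the other Microformats, which is where a declarative XSLT stylesheet starts to pay off.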

What if Firefox 3 supports Microformats

Mozilla Firefox 3 may support Microformats, RWW reports. The news spread after Mozilla designer Alex Faaborg wrote an interesting blog post on why browsers should support semantic markup. In his post, Alex writes,

Much in the same way that operating systems currently associate particular file types with specific applications, future Web browsers are likely going to associate semantically marked up data you encounter on the Web with specific applications, either on your system or online.

Alex envisions the future Firefox browser as an information broker (or an aggregator) for the Web. It will automatically detect, extract and collect different semantic information on the Web and populate users’ favorite applications with this information. RWW has a nice diagram that illustrates Firefox as an information broker.