On the cross-fertilization of geospatial and semantic web technology

A solution to the RDF publishing dilemma

The Semantic Web is a vision of the Web in which computer programs can process not only the syntactic structure but also the semantic meaning of Web pages. To achieve this, we invented knowledge representation languages like RDF and OWL. The idea is that we will use these languages to describe the metadata about a Web page and the semantic meaning of its content. One question remains unanswered: how should we publish RDF documents on the Web?

I call this the RDF publishing dilemma. In the Semantic Web, should a content creator publish an explicit semantic description of an HTML page in a separate RDF document, or should the semantic information be embedded within the HTML page itself? There are pros and cons to either approach.

Describing the semantic information in a separate RDF document simplifies the editing and management of documents. RDF documents are treated like other Web documents: each has its own URL, and there is no messy mixing of RDF and HTML syntax. However, this approach has some disadvantages. Version control becomes a bit more complex because we need to keep the RDF document and the HTML document consistent with each other. It also discourages Web designers from adopting the Semantic Web idea, because many see the creation of RDF documents as an extra task that gives no immediate benefit.

On the other hand, embedding semantic information within the HTML pages simplifies version control and lowers the barrier for Web designers to create semantic documents. Adding semantics to an HTML page simply means adding new tags to the existing page. But this approach has its own problem. Embedding semantic information in a Web page (e.g., RDF + HTML) imposes significant overhead on the computer programs that process the document: extra logic has to be implemented to parse the HTML and extract the RDF description from it.
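To make that extraction overhead concrete, here is a toy sketch in Python. The page fragment, the URL, and the `PropertyExtractor` class are all made up for illustration. The sketch handles only flat `about` and `property` attributes, while real RDFa processing (prefix resolution, nested subjects, typed values) needs considerably more logic, which is exactly the burden described above.

```python
from html.parser import HTMLParser

# Hypothetical page fragment: ordinary HTML with RDFa-style attributes added.
PAGE = """
<div about="http://example.org/post/1">
  <h1 property="dc:title">A solution to the RDF publishing dilemma</h1>
  <span property="dc:creator">Example Author</span>
</div>
"""


class PropertyExtractor(HTMLParser):
    """Toy extractor: collects (subject, property, text) for simple cases only."""

    def __init__(self):
        super().__init__()
        self.subject = None   # most recent value of an "about" attribute
        self.pending = None   # property name still waiting for its text
        self.triples = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "about" in attrs:
            self.subject = attrs["about"]
        if "property" in attrs:
            self.pending = attrs["property"]

    def handle_data(self, data):
        text = data.strip()
        if self.pending and text:
            self.triples.append((self.subject, self.pending, text))
            self.pending = None


def extract(html):
    """Return the list of triples found in an HTML string."""
    parser = PropertyExtractor()
    parser.feed(html)
    parser.close()
    return parser.triples


if __name__ == "__main__":
    for triple in extract(PAGE):
        print(triple)
```

Every program that wants the semantics has to carry something like this (and much more), instead of reading an RDF document directly.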

I came across a post on Ivan’s blog that describes a simple solution to the RDF publishing dilemma. You can read about the details in Ivan’s post. The basic idea is that Web publishers use RDFa to describe semantic information in an HTML page. Instead of requiring computer programs to parse and extract RDF information from the page, the Web server is configured to serve an RDF version of the HTML page, using an RDFa-to-RDF translator and some Apache rewrite rules.
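The decision the server makes can be sketched in a few lines of Python. This is not Ivan's actual Apache configuration, only a model of the idea under my own assumptions: the translator endpoint `translator.example.org` is a placeholder, the function names are hypothetical, and I assume the server answers RDF clients with a redirect to the translator rather than proxying the result.

```python
from urllib.parse import quote

# Hypothetical translator endpoint; substitute a real RDFa-to-RDF service.
TRANSLATOR = "http://translator.example.org/extract?uri="


def wants_rdf(accept_header):
    """Return True if the client explicitly lists application/rdf+xml."""
    # Simplification: q-values are ignored, only the media type names are checked.
    media_types = [part.split(";")[0].strip().lower()
                   for part in accept_header.split(",")]
    return "application/rdf+xml" in media_types


def route(page_url, accept_header):
    """Return (status, location): redirect RDF clients, serve HTML otherwise."""
    if wants_rdf(accept_header):
        # Assumption: a 303 redirect hands the page URL to the translator.
        return 303, TRANSLATOR + quote(page_url, safe="")
    return 200, page_url


if __name__ == "__main__":
    print(route("http://example.org/post.html", "application/rdf+xml"))
    print(route("http://example.org/post.html", "text/html"))
```

The point of the design is that the publisher maintains a single RDFa-annotated HTML page, while RDF-aware clients never have to parse HTML themselves.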