On the cross-fertilization of geospatial and semantic web technology

MySpace plans to support the Semantic Web

According to the Web User News, MySpace plans to support Semantic Web technologies. From the article, it’s unclear exactly how the world’s largest social networking site plans to support RDF, RDFa or microformats. It may still be too early to celebrate.

At SocialDevCamp East, we talked about the importance of Semantic Web technologies in the context of social media. One question that didn’t get answered is “Who is going to publish semantic descriptions of social media data on the Web?”

Given that leaders like Digg and MySpace have plans to support the Semantic Web, I’m optimistic that the unanswered question will be answered soon.

Previously there was no way of linking this data, but the semantic web is able to retrieve and collate it. This means that it can now input your personal information should you join another social network.

MySpace users will still maintain complete control over what information they share and who gets to see it, but it will make sharing information across different platforms easier and quicker.

Yahoo announced earlier this year its plans to utilise the semantic web with a more efficient tagging system to give better search results.

MySpace’s DeWolfe said he hoped other networking sites, including Facebook, would sign up to the agreement.

Reflecting on the news: Digg embraces RDFa

Digg, one of the popular social news web sites, announced that it will begin to support RDFa, a standard for embedding RDF statements in XHTML documents. Here is a screenshot of Digg RDFa in action.

Although it’s unclear at the moment how this new feature will help Digg expand its market share, the downstream consequence is definitely positive. Technologies like RDFa and Microformats are crucial to the success of the Semantic Web.

My speculation is that HTML will continue to dominate the market of web publishing. People will continue to publish information in HTML because it’s the best markup language for displaying human-readable information in browsers. It’s the lowest common denominator for cross-platform information display. All desktop computers can run browsers to display HTML. Just about every mobile device on the market today supports some form of HTML rendering. In addition, there is little incentive to introduce other format representations because HTML content displays well in mobile browsers like Opera Mobile.

If HTML is here to stay, then from the Semantic Web development point of view, we must figure out how to publish semantic data alongside HTML. In general, there are two approaches: (1) publish the semantic data of each HTML page in a separate document, or (2) embed the semantic description in the HTML page itself. RDFa and Microformats belong to the latter approach.
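To make the second approach concrete, here is a minimal Python sketch of what embedded semantic data looks like to a program. The page fragment, the example.org URL and the RDFaSketchParser class are all made up for illustration, and the parser handles only the "about" and "property" attributes; it is not a conforming RDFa processor.

```python
from html.parser import HTMLParser

# Hypothetical page fragment: ordinary HTML with a few RDFa attributes added.
PAGE = """
<div about="http://example.org/story/1" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <h2 property="dc:title">MySpace plans to support the Semantic Web</h2>
  <span property="dc:creator">Example Author</span>
</div>
"""


class RDFaSketchParser(HTMLParser):
    """Collects (subject, property, literal) triples from a tiny subset of RDFa."""

    def __init__(self):
        super().__init__()
        self.subject = None   # value of the most recent "about" attribute
        self.pending = None   # property name waiting for its text content
        self.triples = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "about" in attrs:
            self.subject = attrs["about"]
        if "property" in attrs:
            self.pending = attrs["property"]

    def handle_data(self, data):
        text = data.strip()
        if self.pending and text:
            self.triples.append((self.subject, self.pending, text))
            self.pending = None


def extract_triples(html):
    """Parse an HTML string and return the triples found by the sketch parser."""
    parser = RDFaSketchParser()
    parser.feed(html)
    parser.close()
    return parser.triples


if __name__ == "__main__":
    for triple in extract_triples(PAGE):
        print(triple)
```

The same markup still renders as a normal heading and byline in a browser, which is the appeal of the embedded approach.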

There are pros and cons associated with both approaches. For this reason, I think in the near future we will see web applications supporting both approaches. However, if you ask which approach is more likely to attract web developers to share data, my answer is the latter approach (i.e., RDFa and Microformats).

First, they would require less overhead in Web development. Adding a few extra HTML attributes to the existing template pages is relatively easy. But creating separate full-blown RDF documents would require a completely different set of business logic and template pages.

Second, the use of RDFa and Microformats can take advantage of the existing techniques for optimizing Web publishing. For example, caching is a common technique used by many web sites to improve performance. If semantic data is embedded in HTML, then it can also be cached without much re-implementation.

Third, embedding semantic data in HTML gives web developers a sense of familiarity. People like to work with what they are familiar with, and many of them are reluctant to change. In an early stage of the Semantic Web movement, some web developers may show signs of resistance to RDF document publishing. But, convincing them to use RDFa and Microformats should be easy.

I’m happy to see that RDFa has been adopted by Digg, and I hope that more news sites will follow. I’m thinking that in the next release of gnizr, I will introduce the publishing of semantic data in RDFa or Microformats; some editing of the existing FreeMarker template pages should do the trick.

Semantic Web 2.0 and Semantic HTML

Continuing the previous Semantic Web discussion, I gave two more lectures on the subject. The first lecture was on thinking about the Semantic Web in the context of Web 2.0 and the Social Web. The second lecture was an introduction to RDFa and Microformats.

Students were very excited about these topics. Some continued their thinking online.

Next month, we should see more interesting student discussions on our blog. As part of their assignment, students are asked to write about their view of the Web in 2013.


Optimus: Microformats data transformer

Microformats are a set of data format standards for embedding semantic information in XHTML documents. Dmitry Baranovskiy created a transformer application that converts Microformatted semantic information into formats that are suitable for mashups (JSON and XML).

The application is open source, and is hosted on Google Code.

Here are the XML and JSON outputs of my Biosketch page.

The implementation is surprisingly simple but powerful. It relies on XSLT to transform Microformatted content from an XHTML file into JSON or XML. If you want to use the application as a web service, follow the instructions here.
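To show the kind of transformation involved, here is a rough Python sketch that turns a small hCard fragment into JSON. Note that this is not how Optimus works: Optimus uses XSLT, while this sketch swaps in Python's standard HTML parser. The sample markup, the HCARD_FIELDS subset and the function names are assumptions made for illustration only.

```python
import json
from html.parser import HTMLParser

# Hypothetical hCard fragment; the names and values are made up for illustration.
HCARD = """
<div class="vcard">
  <span class="fn">Jane Example</span>
  <span class="org">Example University</span>
  <a class="url" href="http://example.org/~jane">home page</a>
</div>
"""

# A small subset of hCard property names; the real format defines many more.
HCARD_FIELDS = {"fn", "org", "url"}


class HCardSketchParser(HTMLParser):
    """Collects a flat dictionary of hCard properties from class attributes."""

    def __init__(self):
        super().__init__()
        self.card = {}
        self.pending = None   # field name waiting for its text content

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = set((attrs.get("class") or "").split())
        for field in sorted(classes & HCARD_FIELDS):
            if field == "url" and attrs.get("href"):
                # For links, the value is the href rather than the link text.
                self.card["url"] = attrs["href"]
            else:
                self.pending = field

    def handle_data(self, data):
        text = data.strip()
        if self.pending and text:
            self.card[self.pending] = text
            self.pending = None


def hcard_to_json(html):
    """Return the hCard properties found in the HTML string as a JSON object."""
    parser = HCardSketchParser()
    parser.feed(html)
    parser.close()
    return json.dumps(parser.card, sort_keys=True)


if __name__ == "__main__":
    print(hcard_to_json(HCARD))
```

An XSLT stylesheet does the same job declaratively, by matching elements on their class attribute, which is probably why the real implementation can stay so small.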

The Semantic Web status check

The Economist has published an article on the Semantic Web. Not too technical, it provides a quick overview of what has happened and what could happen.

SOME new ideas take wing spontaneously. Others struggle to be born. The “semantic web” is definitely in the latter category. But it may have found its midwife in Reuters, a business-information company.

Reuters is not alone, of course. Yahoo!, desperate to gain a technological edge over its rival Google, recently endorsed a set of machine-readable formats that will make better sense of the information streaming through the vast universe of web sites it searches.

Radar Networks, based in San Francisco, is one example. Radar has launched a service called Twine, into which users can stuff any link, document or e-mail message they want and hope for some organising principle to emerge. If Twine fails (and reviews of the usefulness of its experimental “beta” version have been mixed) other small firms such as Powerset and Metaweb (also both based in San Francisco) and Hakia and Adaptive Blue (both from New York) stand ready to fill the breach.

Teach students GIS using Geonames

Geospatial Web and Semantic Web are two major discussion topics of the Social Web Technologies course. In the past few classes, we talked about GIS, Google Maps API, geotagging and Geonames.

When introducing Geonames to the students, I decided to do a little experiment. I used Geonames as a tool to teach students the basics of GIS and to give them an opportunity to experience a “social-able” Geospatial Web.

An Annotation Competition

The experiment was relatively simple. I spent a few minutes introducing Geonames to the students. Then I asked them to play a game. The class was divided into two teams: Team1 and Team2. Using Geonames, the teams competed with each other in identifying landmarks, buildings and roads located in the close vicinity of the UMBC campus. Each student signed up for a free Geonames user account. Using the wiki-style annotation tool provided by Geonames, students tried to annotate as many spatial features as they could in 10 minutes. The team that produced the most annotated features would win.

To keep track of the features that each team had annotated, students were asked to tag their features using their team ID: “team1” and “team2”. Using the Geonames search tool, I displayed the real-time progress in front of the class.

Lessons Learned
  • It’s fun to use Geonames in a collaborative environment. Students enjoyed the process of creating annotations while chatting with each other and arguing about the location of a specific landmark. It was a social-able experience.
  • Geonames has a relatively open policy for user contributions: whatever the user enters, Geonames stores. In general, this is a good thing. However, this policy can also lead to the unintended creation of duplicate data. For example, because students were entering data simultaneously, we frequently saw multiple annotations of the same location, each with different coordinate values.
  • It seems that using the Web as a platform can encourage non-GIS experts (e.g., students) to do GIS tasks (e.g., annotation). I’m not sure if this is an inherent feature of the Web or just because the UI of Geonames is well designed.
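
The duplicate problem in the second lesson could be caught with a simple proximity check. The sketch below is only an illustration under stated assumptions: the sample annotations and their coordinates are made up, the 100-meter threshold is arbitrary, and none of this reflects how Geonames itself stores or checks data.

```python
import math

# Hypothetical annotations; the IDs, names and coordinates are made up.
SAMPLE = [
    {"id": 1, "name": "Library", "lat": 39.2565, "lon": -76.7115, "tag": "team1"},
    {"id": 2, "name": "library ", "lat": 39.2567, "lon": -76.7113, "tag": "team2"},
    {"id": 3, "name": "Stadium", "lat": 39.2520, "lon": -76.7080, "tag": "team1"},
]


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two latitude/longitude points."""
    radius_m = 6371000.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * radius_m * math.asin(math.sqrt(a))


def likely_duplicates(annotations, max_m=100.0):
    """Return ID pairs whose names match (ignoring case) and that lie within max_m meters."""
    pairs = []
    for i, a in enumerate(annotations):
        for b in annotations[i + 1:]:
            same_name = a["name"].strip().lower() == b["name"].strip().lower()
            close = haversine_m(a["lat"], a["lon"], b["lat"], b["lon"]) <= max_m
            if same_name and close:
                pairs.append((a["id"], b["id"]))
    return pairs


if __name__ == "__main__":
    print(likely_duplicates(SAMPLE))
```

Comparing every pair is fine for a classroom-sized data set; a larger one would need a spatial index.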

Use Oracle technology for spatial RDF graph query

I came across a paper by Matthew Perry and his colleagues, in which they describe a system implementation that builds on Oracle DB technology to enable the storage and querying of spatial and temporal RDF data. There are many Semantic Web tools (e.g., Jena and Sesame) that provide RDF data stores and support queries. But none of them provides native support for spatial and temporal queries. The work described in Perry’s paper aims to address this problem.

In general, there are two different approaches to this problem. First, we can design and implement a whole new RDF storage engine to support efficient spatial and temporal operations. Second, we can transform the problem into a typical relational database problem and solve it using a mixed bag of RDF and RDBMS technology.
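
As a rough illustration of the second approach, the Python sketch below stores RDF triples in one relational table and point geometries in another, then answers a spatial question with an ordinary SQL join. It is not the system from Perry’s paper and does not use Oracle: SQLite and a plain bounding-box filter are swapped in, and the table names, ex: identifiers and coordinates are all made up.

```python
import sqlite3


def build_store():
    """Create an in-memory store with a triples table and a side table of points."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE triples (s TEXT, p TEXT, o TEXT);
        CREATE TABLE points (s TEXT PRIMARY KEY, lat REAL, lon REAL);
    """)
    # Hypothetical data: two resources with a type, a label and a location.
    conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", [
        ("ex:lib", "rdf:type", "ex:Building"),
        ("ex:lib", "rdfs:label", "Library"),
        ("ex:park", "rdf:type", "ex:Park"),
        ("ex:park", "rdfs:label", "City Park"),
    ])
    conn.executemany("INSERT INTO points VALUES (?, ?, ?)", [
        ("ex:lib", 39.2565, -76.7115),
        ("ex:park", 40.0000, -75.0000),
    ])
    return conn


def labels_in_bbox(conn, south, west, north, east):
    """Return labels of resources whose point falls inside the bounding box."""
    rows = conn.execute("""
        SELECT t.o FROM triples AS t
        JOIN points AS g ON g.s = t.s
        WHERE t.p = 'rdfs:label'
          AND g.lat BETWEEN ? AND ?
          AND g.lon BETWEEN ? AND ?
        ORDER BY t.o
    """, (south, north, west, east))
    return [row[0] for row in rows]


if __name__ == "__main__":
    store = build_store()
    print(labels_in_bbox(store, 39.0, -77.0, 39.5, -76.5))
```

A production system would replace the bounding-box filter with the database’s own spatial index and operators, but the shape of the query, graph pattern plus spatial predicate in one join, stays the same.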


A solution to the RDF publishing dilemma

The Semantic Web is a vision of the Web in which computer programs can process not only the syntactic structure but also the semantic meaning of Web pages. To achieve this, we invented knowledge representation languages like RDF and OWL. The idea is that we will use these languages to describe the meta-data about a Web page and the semantic meaning of its content. One question remains unanswered: how should we publish RDF documents on the Web?

I call this the RDF publishing dilemma. In the Semantic Web, should a content creator publish an explicit semantic description of an HTML page in a separate RDF document, or should the semantic information be embedded within the HTML page itself? There are pros and cons associated with either approach.

If the semantic information is described in a separate RDF document, it simplifies the editing and the management of documents. RDF documents will be treated like other Web documents (e.g., a unique URL for each document and no messy syntax mashup between RDF and HTML). However, it has some disadvantages. Version control becomes a bit more complex because we need to keep the information in an RDF document and an HTML document consistent. Also, it discourages Web designers from adopting the Semantic Web idea because many see the creation of RDF documents as an extra task that gives no immediate benefit.

On the other hand, if semantic information is embedded within the HTML pages, it simplifies version control and lowers the barrier for Web designers to create semantic documents. Adding semantics to an HTML page is simply a matter of adding new tags to the existing page. But this approach has its own problem. Because the semantic information is embedded in the Web page (e.g., RDF + HTML), it imposes significant overhead and challenges on computer programs that process the document: extra logic needs to be implemented to parse and extract the RDF description from the HTML pages.

I came across a post on Ivan’s blog that describes a simple solution to the RDF publishing dilemma. You can read about the details in Ivan’s post. The basic idea is that Web publishers will use RDFa to describe semantic information in an HTML page. Instead of requiring computer programs to parse and extract RDF information from the page, the web server is configured to serve an RDF version of the HTML page by exploiting an RDFa-to-RDF translator and some Apache Rewrite rules.
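
The decision that such a server setup makes can be sketched in a few lines of Python. This is only a guess at the general shape, not Ivan’s actual configuration: it assumes the choice is driven by the HTTP Accept header, the translator address is a made-up placeholder, and q-values in the header are ignored for simplicity.

```python
from urllib.parse import quote

# Hypothetical translator endpoint; the real service and its parameters may differ.
TRANSLATOR = "http://translator.example.org/rdfa2rdf?uri="


def choose_response(page_url, accept_header):
    """Return ("html", page_url) or ("redirect", translator_url).

    Clients that list application/rdf+xml in their Accept header are sent to
    the translator, which would read the RDFa in the page and return RDF.
    Everyone else gets the ordinary HTML page.
    """
    wanted = [part.split(";")[0].strip().lower() for part in accept_header.split(",")]
    if "application/rdf+xml" in wanted:
        return ("redirect", TRANSLATOR + quote(page_url, safe=""))
    return ("html", page_url)


if __name__ == "__main__":
    print(choose_response("http://example.org/page.html", "text/html"))
    print(choose_response("http://example.org/page.html", "application/rdf+xml"))
```

The publisher maintains a single HTML page with RDFa, and RDF-aware clients never have to parse HTML themselves.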
