On the cross-fertilization of geospatial and semantic web technology

WordPress SIOC Import

Uldis Bojars developed a new WordPress plugin to demonstrate the use of the SIOC ontology. Like the built-in WordPress Import/Export function, the plugin lets users import posts from another WordPress blog and export posts for backup. It differs in the data format: the plugin uses the SIOC ontology, whereas the built-in function uses WordPress eXtended RSS (WXR).

There are several advantages to using SIOC as opposed to WXR.

  1. Users can import any blog-like content into a WordPress site. As long as the import data source is expressed using the SIOC ontology, the plugin can construct WordPress blog posts from SIOC post objects (e.g., import messages from public forums as WP posts).
  2. Introducing new vocabularies into SIOC data files will not break the plugin implementation. Because SIOC data files are RDF documents, new namespaces and vocabularies can be added to them without breaking the core function of the plugin. This capability is essential if developers want to extend SIOC data files while preserving the existing SIOC plugin implementation.
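The second point can be illustrated with a minimal sketch. It is not the plugin's code: the triples are plain Python tuples rather than parsed RDF, the function name `extract_posts` and the sample URLs are hypothetical, and the choice of `dc:title` and `sioc:content` as the imported fields is an assumption. The idea is only that an importer which looks up the predicates it knows will skip statements from any other vocabulary.

```python
# Illustrative sketch only; not the SIOC plugin's implementation.
# Triples are modelled as (subject, predicate, object) tuples.
SIOC = "http://rdfs.org/sioc/ns#"
DC = "http://purl.org/dc/elements/1.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
GEO = "http://www.w3.org/2003/01/geo/wgs84_pos#"  # a vocabulary the importer does not know

# Predicates the hypothetical importer understands, mapped to post fields.
KNOWN_PREDICATES = {DC + "title": "title", SIOC + "content": "content"}


def extract_posts(triples):
    """Build one dict per sioc:Post, ignoring predicates that are not known."""
    posts = {s: {} for s, p, o in triples if p == RDF_TYPE and o == SIOC + "Post"}
    for s, p, o in triples:
        if s in posts and p in KNOWN_PREDICATES:
            posts[s][KNOWN_PREDICATES[p]] = o
    return posts


# Hypothetical sample data: one post, extended with geo properties.
SAMPLE_TRIPLES = [
    ("http://example.org/post/1", RDF_TYPE, SIOC + "Post"),
    ("http://example.org/post/1", DC + "title", "Hello SIOC"),
    ("http://example.org/post/1", SIOC + "content", "First post."),
    ("http://example.org/post/1", GEO + "lat", "39.25"),
    ("http://example.org/post/1", GEO + "long", "-76.71"),
]

print(extract_posts(SAMPLE_TRIPLES))
```

The geo statements are simply skipped, so the same data file can serve both this importer and a geo-aware consumer.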

Geonames integrates hotel data

Geonames recently added 70,000 geocoded hotel records from three major hotel booking sites: hotels.com, diytravel and laterooms. This is part of geonames’ latest initiative to include more point-of-interest data in its open source geonames database.

Integrating data from different providers is not without its challenges.

The challenge in this task was to integrate and match data from various data providers. Names and addresses of hotels as well as data quality may vary dramatically among providers and it is often difficult to figure out whether two hotels are actually the same hotel or not.
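To make the matching problem concrete, here is a rough sketch of one common heuristic: treat two records as the same hotel when they are geographically close and their normalized names are similar. This is not how geonames does it. The record layout, the function names and both thresholds (200 m, 0.8 similarity) are assumptions chosen for illustration.

```python
import math
from difflib import SequenceMatcher


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometres."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def normalize(name):
    """Lower-case, expand '&' and collapse whitespace."""
    return " ".join(name.lower().replace("&", " and ").split())


def same_hotel(a, b, max_km=0.2, min_name_ratio=0.8):
    """Heuristic match: nearby coordinates AND similar names (thresholds are guesses)."""
    close = haversine_km(a["lat"], a["lon"], b["lat"], b["lon"]) <= max_km
    ratio = SequenceMatcher(None, normalize(a["name"]), normalize(b["name"])).ratio()
    return close and ratio >= min_name_ratio


# Hypothetical records from two providers.
provider_a = {"name": "Grand Hotel & Spa", "lat": 48.8566, "lon": 2.3522}
provider_b = {"name": "grand hotel and spa", "lat": 48.8567, "lon": 2.3523}
print(same_hotel(provider_a, provider_b))
```

A heuristic like this will still misfire on hotels that share a building or on providers with poor geocoding, which is part of why the problem is hard.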

Thinking about this data integration problem, I first assumed it could be easily solved if all data providers shared a common hotel ontology. I later realized it’s not that simple: building such an ontology is not straightforward, to say the least.

Here are some data modeling issues that must be considered:
Read the rest of this entry »

Deep tagging

Today’s conventional tagging mechanism, found on flickr, del.icio.us and youtube, allows users to associate web resources (e.g., documents, photos, video and audio) with a set of keywords (i.e., tags). This mechanism works well for tagging resources that have a directly accessible URL. However, it falls short as a tool for labeling the “deep” content of those resources, e.g., a partial clip from a youtube video.

To solve this problem, startups are building products around a new idea called deep tagging. Deep tagging is extremely useful for organizing and sharing lengthy video and audio files online. Here is how deep tagging works in Veotag:

  1. Users upload a video or audio file to Veotag.
  2. Using a web-based deep tagging tool, users can assign tags (veotags) to specific clips from the file.
  3. When other users are viewing this file, they have the option to play the whole file from the beginning to the end or jump back and forth between different veotagged clips.
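The workflow above suggests a simple data model: a deep tag is a label attached to a time range inside a media file rather than to the file as a whole. The sketch below is a guess at that model, not Veotag's actual design; the class name `DeepTag`, its fields and the two helper functions are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeepTag:
    """A label attached to a clip, given as a time range in seconds."""
    label: str
    start_s: float
    end_s: float


def clips_for(tags, label):
    """Return the (start, end) ranges carrying a label, in playback order."""
    return sorted((t.start_s, t.end_s) for t in tags if t.label == label)


def tags_at(tags, position_s):
    """Return the labels whose clip covers the given playback position."""
    return [t.label for t in tags if t.start_s <= position_s < t.end_s]


# Hypothetical tags on one uploaded video.
video_tags = [
    DeepTag("intro", 0, 30),
    DeepTag("demo", 30, 120),
    DeepTag("demo", 200, 260),
]
print(clips_for(video_tags, "demo"))
print(tags_at(video_tags, 45))
```

Jumping between tagged clips then amounts to seeking the player to the start of one of the ranges returned by `clips_for`.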

See this Veotag video for more information.

I’m intrigued by this new deep tagging idea.

Read the rest of this entry »

Springer book: The Geospatial Web

The Geospatial Web is an edited volume that summarizes the latest research on the Geospatial Web.

This book presents the state-of-the-art in geospatial Web technology. It gradually exposes the reader to the technical foundations of the Geospatial Web, and to new interface technologies and their implications for human-computer interaction. Several chapters deal with the semantic enrichment of electronic resources, a process that yields extensive archives of Web documents, multimedia data, individual user profiles and social network data.

Taking a quick look at the table of contents, I find the book interesting from two perspectives. First, it covers a broad range of Geospatial Web topics, from basic research to real-world applications. Second, it includes several chapters that cover the cross-fertilization of geospatial and Semantic Web technology. Topics that I find especially interesting: location-based Web search, extracting geospatial semantics from documents, geospatial communities and ubiquitous cartography.

This book will be shipped on May 18, 2007. You can pre-order it on Amazon.

102 alternative search engines

Read/WriteWeb compiled its list of 100 alternative search engines (as of April 2007), i.e., search engines other than Google, MSN, Yahoo! and Ask. While it’s pretty amazing to see so many categories of search engines (people search, video search, cluster search, social search etc.), the list is missing one important category: Semantic Web search engines.

Search engines of the Semantic Web should be added to the list:

  • Swoogle: triple and ontology search
  • SWSE: combines typical web search with RDF query filters

So, will Semantic Web search engines change the way we search the Web? Yes, they will. However, this new wave of Web 3.0 search engines will not replace the existing search engines. Like the 100 alternative search engines listed by Read/WriteWeb, they will compete with the current giants for market share in various specialized search categories: videos, blogs, triples, metadata etc.

Here is my speculation.

It’s clear that Google, Yahoo!, MSN, and Ask are the current champions of general-purpose Web search. However, it’s unclear if they can also be the champions of domain-specific search. My guess is that they can’t.

Given that there are at least 100 + 2 other search engines in the competition, a few are likely to overpower the existing giants in specialized web search. Of course, this is only true if cash-rich companies like Google and Microsoft don’t buy out the new startups before they get a chance to become successful.