On the cross-fertilization of geospatial and semantic web technology

Google’s new Digg-like search page

Google is experimenting with a new search interface that is very similar to Digg’s. Users can vote and comment on search results (watch this video). I like the idea that users can customize and influence how results are displayed, but I’m also very worried about the privacy issues that arise from this new feature.

Allowing individual users to influence search results can improve the quality of search. Voting and commenting on links harnesses the power of crowdsourcing, making the results produced by machine algorithms more relevant to humans.

However, the benefits of a new technology rarely come free of problems. Privacy is the key issue that worries me. First, there is the problem of who gets to see what I have done in Google: the links I have voted on and the comments I have written. Second, there is the problem of someone bad-mouthing my work. What if someone really dislikes me and wants me to fail at every stage of my life? That person can go through all the links related to me in Google and spread false information about me. Unless there is a way for me to control what gets displayed on those links, I will have no way to prevent my reputation from being destroyed. Some people may argue that this type of attack can already be carried out on the current web. But I think this new tool will make those attacks a lot easier to create and harder to prevent.

In summary, I’m happy to see a Digg-like interface in Google, but I think there are serious privacy issues that Google must address. We can’t fully control all the information on the web that is about us, but this new interface makes it too easy for people to make effective personal attacks.


Eye-Fi: the ultimate SD memory card

I dreamed of designing a new digital camera that knows my location when I take pictures. Since then, a few such products have shown up on the market. But none of them is as sexy as this new product called Eye-Fi.

It’s an SD memory card with built-in Wi-Fi capability that also does geotagging. It requires no special hardware modules. It works with any digital camera that supports SD memory cards. How much? $129.

Technical details:

  • Supports 802.11b/g/n
  • Geotagging is built on the Skyhook technology (not GPS)
  • Can upload photos to the Web without connecting to a computer

Skyhook is a Wi-Fi-based geolocation technology. It doesn’t rely on GPS signals to determine a device’s current location. Instead, it exploits the signal strength of nearby Wi-Fi access points.

500 full-time Skyhook employees have spent the last five years driving every road, lane and highway in every major American city —and, lately, European and Asian cities. Its equipment measures all those Wi-Fi signals leaking out of homes and stores and offices, and marries that information with the car’s G.P.S. location as it drives.
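Neither this post nor the quote says how Skyhook turns signal strengths into a position, and the real algorithm is proprietary. Purely as an illustration of the general idea, here is a minimal Python sketch of one textbook approach, a signal-weighted centroid. It assumes we already have a survey database that maps each visible access point to the coordinates where it was recorded. The `AccessPoint` class, the `estimate_position` function and the sample readings are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AccessPoint:
    """One access point seen by the device (hypothetical structure)."""
    lat: float        # surveyed latitude of the access point
    lon: float        # surveyed longitude of the access point
    rssi_dbm: float   # received signal strength, e.g. -40 (strong) to -90 (weak)


def estimate_position(observations):
    """Estimate (lat, lon) as a centroid weighted by signal power.

    This is an illustrative technique, not Skyhook's actual method.
    Averaging raw lat/lon is only reasonable over short distances.
    """
    if not observations:
        raise ValueError("need at least one observed access point")
    # Convert dBm to linear power so that stronger signals weigh more.
    weights = [10 ** (ap.rssi_dbm / 10) for ap in observations]
    total = sum(weights)
    lat = sum(w * ap.lat for w, ap in zip(weights, observations)) / total
    lon = sum(w * ap.lon for w, ap in zip(weights, observations)) / total
    return lat, lon


if __name__ == "__main__":
    seen = [
        AccessPoint(39.2860, -76.6130, -45.0),  # strong signal, probably close
        AccessPoint(39.2870, -76.6140, -80.0),  # weak signal, probably far
    ]
    print(estimate_position(seen))
```

The estimate lands very close to the first access point because its signal is much stronger. A real system would also have to deal with access points that move, noisy readings and gaps in the survey data.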

Read more about Eye-Fi in this NY Times article.


Search engine paradigm shift

Search engines are essential to the success of the Web. Without search engines, our Internet experience would be crippled. Recently Ian Hendry wrote a blog post that speculates about the usefulness of search engines in the near future. He thinks that search engines are facing extinction.

Paradigm Shift

Ian’s premise is an interesting one: the emergence of specialized web sites will draw many search queries away from general-purpose search engines like Google and Yahoo!. For example, if you want to look up the educational background of a colleague, you would go to LinkedIn or Facebook, and if you want to find out facts about John Locke, you would go to Wikipedia or Freebase.

People can often bypass search engines because they know exactly which web sites to go to for the information they are after. The use of search engine plugins in Firefox is a good example. If a person uses the IMDB search engine plugin to look up the actors in a movie, then that person bypasses Google and Yahoo!.

What Causes the Paradigm Shift

Let’s come back to Ian’s theory that search engines are facing extinction. I think the word “extinction” is a bit too extreme. It’s unlikely that search engines will disappear from the Web. However, they will no longer be the one-stop location for people to find information.

In the past, without specialized web sites, people had no choice but to rely on search engines to find information. Today, the Web is gradually becoming a collection of independent islands of information (YouTube for videos, Facebook for people, Wikipedia for facts, etc.). People have choices in deciding where to send their search queries. “Not all search query are belong to Google”.

What’s Next for the Search Engine Companies

If my analysis is correct, then search engine companies that rely on ad revenue to operate will ask one question: how can we drive more traffic to our search engines? There are a few different solutions to this problem. First, a company can try to monopolize the general-purpose search engine space. Second, it can change the way people use search engines.

The first approach is easy to understand. Let’s think a bit about the second approach.

We use search engines because we want to find information. A typical flow of the process is the following: (1) We send a query. (2) A list of results is displayed. (3) We go through the first few pages at the top of the list and try to find the information that we are after.

I think we spend most of our time in Step 3. If we can’t find the information we want in the first few pages, we repeat the process all over again.

Smarter Search Results

A solution to this problem is to make the result list “smarter”. For example, based on the intent behind the user’s query, display the most relevant information at the top of the result page. When you search for “John Locke” in Google, the first few links point to books by or about John Locke. This is a good start, but we can do better.

SearchMonkey is a better solution to this problem. Third-party developers can introduce new ways to format search results and influence the search experience of the users. My favorite SearchMonkey use case is the LinkedIn public profile.

Concluding Remarks

I think Ian is right about the emergence of a paradigm shift in how we use search engines on the Web. As we become more familiar with specialized web sites and develop trust in them, we can bypass the general-purpose search engines to find the information that we are after. To keep up with this paradigm shift, search engines must reinvent themselves. Google’s mashup search results and Yahoo!’s SearchMonkey are on the cutting edge of this new movement.


Geohash for spatial index and search

Geohash is a new algorithm for encoding latitude and longitude coordinates. Given a pair of lat/long coordinates (e.g., 42.6 -5.6) as the input, the algorithm produces a string output that is the Geohash encoding of the coordinates (e.g., ezs42).

Why is this interesting? In many Web applications, we often need to create unique identifiers for locations. It’s easy to use the Geohash algorithm to construct a URL (or URI) that identifies a “point” location (i.e., a location defined by a pair of lat/long coordinates).

Here is an example:

  • Location: Baltimore, MD
  • Lat/long coordinates: 39.286534 -76.613558

If we feed these coordinates to the Geohash algorithm, we get this output string:

  • dqcx2xgswswx

To create a unique URL (or URI) that identifies “Baltimore, MD”, we simply append the string produced to a URL prefix (e.g., http://geohash.org):

  • http://geohash.org/dqcx2xgswswx
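The encoding step can be sketched in a few lines of Python. This is a minimal version of the commonly described Geohash procedure, not the code behind geohash.org: it repeatedly halves the longitude and latitude ranges, interleaves the resulting bits starting with longitude, and packs every five bits into one character of the Geohash base-32 alphabet. The function names `geohash_encode` and `geohash_url` are my own.

```python
# Geohash base-32 alphabet (digits plus lowercase letters without a, i, l, o).
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat, lon, precision=12):
    """Encode a lat/long pair as a Geohash string of `precision` characters."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits = 0          # bits collected for the current character
    bit_count = 0
    use_lon = True    # bits alternate between longitude and latitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:            # five bits make one base-32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


def geohash_url(lat, lon, prefix="http://geohash.org/"):
    """Build a URL that identifies a point location (hypothetical helper)."""
    return prefix + geohash_encode(lat, lon)


if __name__ == "__main__":
    print(geohash_encode(42.6, -5.6, precision=5))
    print(geohash_url(39.286534, -76.613558))
```

Calling `geohash_encode(42.6, -5.6, precision=5)` gives `ezs42`, the example from the start of this post. The `precision` argument controls how many characters are produced, and therefore how small the encoded cell is.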

What are the advantages of using Geohash? Geohash seems to be useful for building a simple spatial index and spatial search. In a typical geospatial application, we often rely on spatial databases (e.g., MySQL Spatial and Oracle Spatial) to provide spatial indexing and search. However, this imposes a significant amount of overhead during development and deployment. This is where Geohash can help. If the spatial operations in the application are relatively simple (e.g., they only involve points), Geohash provides an easy way to build an index of locations and allow them to be searched.

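How do we do this? The encoding function of the Geohash algorithm has an interesting property. Each additional character narrows the encoded cell, so two pairs of coordinates that describe roughly the same location, even at different precision, usually produce strings that share a common prefix. The longer the shared prefix, the closer the two points are. (The converse does not always hold: two nearby points that fall on opposite sides of a cell boundary may share only a short prefix.)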

For example:

Given two points,

  • Point A (39.286534 -76.613558), and
  • Point B (39.286 -76.613)

The corresponding Geohash encodings of these two points are

  • dqcx2xgswswx
  • dqcx2xu1

Notice these two strings share a common prefix string: ‘dqcx2x’.

Given this property, we can find locations that are near each other, or deduce whether different locations refer to the same physical place.
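To make the prefix idea concrete, here is a small Python sketch of a prefix-based index. It buckets locations by the first few characters of their Geohash strings, so that a lookup only has to compare strings. The `GeohashIndex` class, its `prefix_len` parameter and the place names are hypothetical and chosen for illustration. The Geohash strings are the ones shown earlier in this post.

```python
from collections import defaultdict


def common_prefix(a, b):
    """Return the longest prefix shared by two Geohash strings."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return a[:n]


class GeohashIndex:
    """Toy spatial index: locations are bucketed by a fixed-length prefix."""

    def __init__(self, prefix_len=6):
        self.prefix_len = prefix_len
        self._buckets = defaultdict(list)

    def add(self, name, geohash):
        self._buckets[geohash[:self.prefix_len]].append((name, geohash))

    def nearby(self, geohash):
        """Names of all indexed locations in the same prefix cell."""
        bucket = self._buckets.get(geohash[:self.prefix_len], [])
        return [name for name, _ in bucket]


if __name__ == "__main__":
    index = GeohashIndex(prefix_len=6)
    index.add("Point A", "dqcx2xgswswx")
    index.add("Point B", "dqcx2xu1")
    index.add("Somewhere else", "ezs42")
    print(common_prefix("dqcx2xgswswx", "dqcx2xu1"))
    print(index.nearby("dqcx2xgswswx"))
```

A shorter `prefix_len` means larger cells and more candidate matches. Because two nearby points can sit on opposite sides of a cell boundary, a more careful search would also look in the neighbouring cells; this sketch does not.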

You can read more about Geohash on Wikipedia and Geohash.org.


Using Wii Fit to cruise around Google Earth

Mattieu Deru and Simon Bergweiler at DFKI demonstrate the use of the Wii Balance Board as an input device for Google Earth. The application they developed is written in C#.

It’s pretty amazing what one can do with Wii controllers.


Structured data representation of financial data

The U.S. Securities and Exchange Commission recently proposed a timetable requiring 500 of the largest public companies to begin filing their financial data using XBRL (Extensible Business Reporting Language).

Why XBRL? Instead of publishing financial data in tables and charts, an XBRL representation allows financial data to be described in a more structured form that is suitable for machine processing. Based on this format, researchers will be able to develop software programs to verify data and detect errors.

I think this is good news for the Semantic Web community. It will create a massive amount of free, real-world data for research. Also, for those who want to play with financial data in a semantic representation, it should be relatively easy to map XBRL into RDF.
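As a rough illustration of what such a mapping could look like, here is a Python sketch that reads the facts from a tiny XBRL-style instance document and prints one RDF triple per fact in N-Triples-like form. Everything specific in it is an assumption made for the example: the `ex:` taxonomy namespace, the company and the figures are made up, and using the reporting context as the subject of each triple is just one possible modelling choice, not an established XBRL-to-RDF mapping.

```python
import xml.etree.ElementTree as ET

# A made-up, heavily simplified instance document (not real filing data).
SAMPLE = """<xbrli:xbrl xmlns:xbrli="http://www.xbrl.org/2003/instance"
    xmlns:iso4217="http://www.xbrl.org/2003/iso4217"
    xmlns:ex="http://example.org/taxonomy#">
  <xbrli:context id="FY2008">
    <xbrli:entity>
      <xbrli:identifier scheme="http://example.org/companies">ACME</xbrli:identifier>
    </xbrli:entity>
    <xbrli:period><xbrli:instant>2008-12-31</xbrli:instant></xbrli:period>
  </xbrli:context>
  <xbrli:unit id="USD"><xbrli:measure>iso4217:USD</xbrli:measure></xbrli:unit>
  <ex:Revenues contextRef="FY2008" unitRef="USD" decimals="0">1000000</ex:Revenues>
  <ex:NetIncome contextRef="FY2008" unitRef="USD" decimals="0">250000</ex:NetIncome>
</xbrli:xbrl>"""


def xbrl_facts_to_triples(xml_text, base="http://example.org/report#"):
    """Turn each fact into a (subject, predicate, object) triple.

    Subject: the fact's reporting context; predicate: the fact's element
    name as a URI; object: the reported value as a plain literal.
    """
    root = ET.fromstring(xml_text)
    triples = []
    for element in root:
        context = element.get("contextRef")
        if context is None:
            continue  # contexts and units carry no contextRef; skip them
        # ElementTree spells qualified names as "{namespace}local".
        namespace, _, local = element.tag[1:].partition("}")
        subject = f"<{base}{context}>"
        predicate = f"<{namespace}{local}>"
        literal = f'"{element.text.strip()}"'
        triples.append((subject, predicate, literal))
    return triples


if __name__ == "__main__":
    for s, p, o in xbrl_facts_to_triples(SAMPLE):
        print(f"{s} {p} {o} .")
```

A fuller mapping would also carry over the entity, period and unit information from the contexts and units, which this sketch ignores.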

Read the full story.


MySpace plans to support the Semantic Web

According to the Web User News, MySpace plans to support Semantic Web technologies. From the article, it’s unclear exactly how the world’s largest social networking site plans to support RDF, RDFa or microformats. It may still be too early to celebrate.

At SocialDevCamp East, we talked about the importance of Semantic Web technologies in the context of social media. One question that didn’t get answered is “Who is going to publish semantic descriptions of social media data on the Web?”

Given that leaders like Digg and MySpace have plans to support the Semantic Web, I’m optimistic that the unanswered question will be answered soon.

Previously there was no way of linking this data, but the semantic web is able to retrieve and collate it. This means that it can now input your personal information should you join another social network.

MySpace users will still maintain complete control over what information they share and who gets to see it, but it will make sharing information across different platforms easier and quicker.

Yahoo announced earlier this year its plans to utilise the semantic web with a more efficient tagging system to give better search results.

MySpace’s DeWolfe said he hoped other networking sites, including Facebook, would sign up to the agreement.


Reflecting on the news Digg embraces RDFa

Digg, one of the most popular social news web sites, announced that it will begin to support RDFa, a standard for embedding RDF statements in XML documents. Here is a screenshot of Digg’s RDFa in action.

Although it’s unclear at the moment how this new feature will help Digg expand its market share, the downstream consequence is definitely positive. Technologies like RDFa and Microformats are crucial to the success of the Semantic Web.

My speculation is that HTML will continue to dominate the market of web publishing. People will continue to publish information in HTML because it’s the best markup language for displaying human-readable information in browsers. It’s the lowest common denominator for cross-platform information display. All desktop computers can run browsers to display HTML. Just about every mobile device on the market today supports some form of HTML rendering. In addition, there is little incentive to introduce other formats because HTML content displays well in mobile browsers like Opera Mobile.

If HTML is here to stay, then from the Semantic Web development point of view, we must figure out how to publish semantic data alongside HTML. In general, there are two approaches: (1) publish the semantic data of each HTML page in a separate document, or (2) embed the semantic description in the HTML page itself. RDFa and Microformats are technologies of the latter kind.

There are pros and cons associated with both approaches. For this reason, I think in the near future we will see web applications supporting both. However, if you ask which approach is more likely to attract web developers to share data, my answer is the latter (i.e., RDFa and Microformats).

First, they require less overhead in Web development. Adding a few extra HTML attributes to existing template pages is relatively easy. Creating separate full-blown RDF documents, by contrast, would require a completely different set of business logic and template pages.

Second, RDFa and Microformats can take advantage of existing techniques for optimizing Web publishing. For example, caching is a common technique used by many web sites to improve performance. If semantic data is embedded in HTML, then it can also be cached without much re-implementation.

Third, embedding semantic data in HTML gives web developers a sense of familiarity. People like to work with what they are familiar with, and many of them are reluctant to change. In an early stage of the Semantic Web movement, some web developers may show signs of resistance to RDF document publishing. But, convincing them to use RDFa and Microformats should be easy.
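To illustrate the first point, here is a small Python sketch of a page fragment rendered with and without RDFa. It is not Digg’s actual markup, which I have not inspected. It assumes the Dublin Core vocabulary for the title and author, and the `render_story` function and its parameters are hypothetical. The only difference between the two outputs is a handful of attributes (`xmlns:dc`, `about`, `property`) on elements the template would emit anyway.

```python
from html import escape

# Dublin Core element set namespace, used here as an example vocabulary.
DC_NS = "http://purl.org/dc/elements/1.1/"


def render_story(url, title, submitter, with_rdfa=True):
    """Render one story as an HTML fragment, optionally annotated with RDFa."""
    safe_url = escape(url, quote=True)
    if with_rdfa:
        div_attrs = f' xmlns:dc="{DC_NS}" about="{safe_url}"'
        title_attrs = ' property="dc:title"'
        creator_attrs = ' property="dc:creator"'
    else:
        div_attrs = title_attrs = creator_attrs = ""
    return (
        f"<div{div_attrs}>\n"
        f'  <a href="{safe_url}"{title_attrs}>{escape(title)}</a>\n'
        f"  <span{creator_attrs}>{escape(submitter)}</span>\n"
        f"</div>"
    )


if __name__ == "__main__":
    print(render_story("http://example.org/story/1", "Geohash & friends", "alice"))
    print(render_story("http://example.org/story/1", "Geohash & friends", "alice",
                       with_rdfa=False))
```

Because the annotated fragment is still ordinary HTML, it passes through the same templates and caches as the plain version.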

I’m happy to see that RDFa has been adopted by Digg, and I hope that more news sites will follow. I’m thinking that in the next release of gnizr, I will introduce the publishing of semantic data in RDFa or Microformats; some editing of the existing Freemarker template pages should do the trick.

