Today at the Computer Vision and Pattern Recognition (CVPR) conference in Miami, Google released a new research paper that looks at building a web-scale landmark-recognition engine. This could lay the groundwork for some interesting advancements in image search.
“To be clear up front, this is a research paper, not a new Google product, but we still think it’s cool,” says Jay Yagnik, Head of Computer Vision Research.
The goal is to get computers to recognize landmarks (for example, the Eiffel Tower, the Lincoln Memorial, or Google’s own example, the Acropolis). This is no easy task when the engine has to rely on photos of the landmarks, which vary widely in angle, lighting, and quality.
Google says it has managed to achieve 80% accuracy on over 50,000 landmarks. Google demonstrated how it did this with the Acropolis. Essentially, the researchers began with an unnamed, untagged picture of it and entered its web address into the recognition engine, and the computer identified it as “Recognized Landmark: Acropolis, Athens, Greece.”
To make a long and technical story short, Google generated a list of landmarks based on GPS-tagged photos from Picasa and Panoramio, and online tour guide webpages. Google then found “candidate images” for each landmark using those resources as well as Google Image Search. These images were “pruned using efficient image matching and unsupervised clustering techniques,” as Yagnik explains. Then Google developed an indexing system for fast image recognition.
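To give a rough feel for the first step, here is a minimal sketch of how GPS-tagged photos might be grouped into landmark candidates. This is not Google’s method: the greedy clustering rule, the function names (`haversine_km`, `cluster_photos`), the thresholds (`radius_km`, `min_photos`), and the sample coordinates are all assumptions made for illustration.

```python
import math


def haversine_km(a, b):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def cluster_photos(photos, radius_km=0.5, min_photos=3):
    """Greedily group photo coordinates into candidate landmarks.

    Each photo joins the first cluster whose centroid lies within
    radius_km; otherwise it starts a new cluster. Clusters with fewer
    than min_photos photos are discarded as noise. Both thresholds are
    hypothetical values chosen for this example.
    """
    clusters = []
    for point in photos:
        for cluster in clusters:
            if haversine_km(point, cluster["centroid"]) <= radius_km:
                cluster["points"].append(point)
                n = len(cluster["points"])
                # Recompute the centroid as the mean of the member points.
                cluster["centroid"] = (
                    sum(p[0] for p in cluster["points"]) / n,
                    sum(p[1] for p in cluster["points"]) / n,
                )
                break
        else:
            # No existing cluster is close enough, so start a new one.
            clusters.append({"points": [point], "centroid": point})
    return [c for c in clusters if len(c["points"]) >= min_photos]


# Illustrative (lat, lon) pairs: three photos near the Acropolis,
# three near the Eiffel Tower, and one stray photo elsewhere.
photos = [
    (37.9715, 23.7257), (37.9717, 23.7262), (37.9712, 23.7250),
    (48.8584, 2.2945), (48.8580, 2.2950), (48.8588, 2.2940),
    (40.0, 10.0),
]
landmarks = cluster_photos(photos)
```

With this sample data, two clusters of three photos each survive and the stray photo is dropped. A real system would still need the later steps the article describes, such as visually matching the images in each cluster and pruning the ones that don’t belong.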
Here’s what the clustered recognition model looks like:
“Here, we build a world-scale landmark recognition engine, which organizes, models and recognizes the landmarks on the scale of the entire planet Earth,” concludes Google’s research paper (PDF). “Constructing such an engine is, in essence, a multi-source and multi-modal data mining task.”
The research paper doesn’t really mention how such an engine could improve Google Image Search, but given that search and “organizing the world’s information” are essentially what Google is all about, it’s not hard to imagine this technology being incorporated into how Google retrieves Image Search results in the future.