But there's been a good bit of discussion in the last week about Google's "Image Labeler" game, and observers I respect may be missing the point, so I'll break my silence to offer a contrasting view. Battelle goes on about the vocabulary, taking issue with Google's choice of the term "label" instead of "tag":
I just wish Google would use the terminology the rest of the web has already settled upon. It's not a label. It's a tag. "Tag" means something - an intentional attribute given to an object on the web. That's what we are doing here.
Who cares what they call it? Certainly not me. Battelle implies that Google is simply doing what others like Flickr have done: assigning text attributes to help with searching. But there are signs that tagging isn't Google's objective here, no matter what you want to call it.
Philipp also sees the Image Labeler as simply a tagging exercise, using humans to annotate the Web:
More than a game, for Google this is a way to tag images using human brain power... to improve their image search results. I wonder if Google can reach critical mass with this game – enough players participating long enough to label many images – to ever make this relevant for their main image search. The idea of this approach isn’t new, but scaling it with the web will be tough.
But if we look at Google's longstanding aversion to human tagging, and join that with the recent acquisition of Neven Vision, you can see that there's likely a lot more to this than just getting people to label the Web for free.
Luis Von Ahn is the brain behind the ESP Game, which Google licensed to create the Image Labeler. Von Ahn talked to a Google audience about the ESP Game and other "games with a purpose" he has developed to assign meaning to words and images on the Web. You can watch the entire one-hour talk on Google Video and see if you don't come away with the same impression that I did: it's not about tagging or labeling; it's about machine learning.
Google was born out of a rebellion against human-edited directories of the Web that could not keep up with the Web's exponential growth, and were routinely manipulated by special interests willing to pay to subvert information for financial gain. Page and Brin put their faith in the computer algorithm, and to this day believe that the algorithm is faster, more comprehensive, and less biased than human editors.
Page and Brin may love the concept of the ESP Game and Google's Image Labeler because the games take the good aspects of the social web and eliminate the crap that usually comes with user-generated content. For example, the ESP Game self-corrects because two players who cannot communicate must independently arrive at the same label before it counts. And even if a large group were to conspire to subvert the results (e.g., hordes of Slashdot users all tagging every image “monkey”), the game automatically inserts images with known labels into the stream, and if players don't produce the known-good labels, the labels from those players are discarded. These "symmetric validation" and similar "asymmetric validation" methods produce extraordinarily reliable results.
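The two safeguards described above, independent agreement and seeded known images, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Von Ahn's or Google's actual code: `agreed_labels`, `passes_known_image_check`, and `collect_labels` are hypothetical names, a "session" is assumed to be one pair's guesses for each image, and a pair is assumed to be dropped entirely if it misses any known image it was shown.

```python
from collections import Counter


def agreed_labels(guesses_a, guesses_b):
    # Output agreement: a label counts only if both players, who cannot
    # communicate, typed it independently for the same image.
    return set(guesses_a) & set(guesses_b)


def passes_known_image_check(agreements, known_labels):
    # For each seeded image with known-good labels that this pair saw,
    # the pair must have agreed on at least one of those labels.
    return all(
        agreements[image_id] & good_labels
        for image_id, good_labels in known_labels.items()
        if image_id in agreements
    )


def collect_labels(sessions, known_labels):
    # Hypothetical pipeline: each session maps image_id -> (guesses_a, guesses_b).
    votes = Counter()
    for session in sessions:
        agreements = {img: agreed_labels(a, b) for img, (a, b) in session.items()}
        if not passes_known_image_check(agreements, known_labels):
            continue  # discard every label this pair produced
        for img, labels in agreements.items():
            if img in known_labels:
                continue  # seeded images are only used for checking
            for label in labels:
                votes[(img, label)] += 1
    return votes


known = {"k1": {"dog", "puppy"}}
honest = {"k1": (["dog", "brown"], ["puppy", "dog"]), "img7": (["car", "red"], ["red", "road"])}
spammers = {"k1": (["monkey"], ["monkey"]), "img7": (["monkey"], ["monkey"])}
print(collect_labels([honest, spammers], known))  # Counter({('img7', 'red'): 1})
```

Requiring a known-good answer on the seeded image is what lets the honest pair's "red" through while the "monkey" pair contributes nothing; tallying votes across many pairs would then let a system keep only labels that several independent pairs agree on.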
Von Ahn points out that there is a lot of unused human capacity to do this grunt work. One slide (shown below) says that if people diverted the time they spend playing solitaire to construction projects, they could build 1,285 Empire State Buildings or 450 Panama Canals every year. Why not have them label the Web?
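The slide's comparison is plain division: hours spent on solitaire divided by the labor each project required. The inputs below are assumptions, not figures stated in this post (about 9 billion human-hours of solitaire a year, roughly 7 million human-hours for the Empire State Building and 20 million for the Panama Canal), chosen because they reproduce the slide's numbers.

```python
# Assumed inputs in human-hours (not stated in the post): yearly solitaire
# play and the labor each landmark took.
SOLITAIRE_HOURS_PER_YEAR = 9_000_000_000
EMPIRE_STATE_BUILDING_HOURS = 7_000_000
PANAMA_CANAL_HOURS = 20_000_000

# Whole projects that one year of solitaire time could pay for.
empire_states = SOLITAIRE_HOURS_PER_YEAR // EMPIRE_STATE_BUILDING_HOURS
panama_canals = SOLITAIRE_HOURS_PER_YEAR // PANAMA_CANAL_HOURS
print(empire_states, panama_canals)  # 1285 450
```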

Of course, this is all just a means to an end. And while other observers may think the end is a set of reliably tagged images, I think that's just an intermediate step. Once Google has a set of images labeled with a high signal-to-noise ratio, it would be an ideal corpus for its newly acquired "machine vision" technology to train on. And once Google's computers are trained to interpret images and video, Google's algorithms can do for images and video what they've done for text: organize them and make them universally accessible and useful.

Indeed, Von Ahn alludes to that possibility in the video:
If we had this information for a lot of images, we could use this information for training computer vision algorithms....one of the major stumbling blocks [to effective computer vision algorithms] was a lack of training data...
And it's not just images that could benefit from this technique. The long-promised "semantic web" has been slow to materialize because it's so difficult to generate all that metadata. As Von Ahn says in the video when talking about another game under development to infer "common-sense facts" about language:
Research projects have failed due to a lack of training data, since it is so tedious to key in all those common-sense facts. But if there was a fun game that could do it ...
Google used a large corpus of training documents to teach its computers to translate Arabic. Until now, there was no similar monster data set of reliable metadata to train Google's computers to interpret images or to infer meaning. With these games, Google may soon have a reliable training corpus for these purposes, too.
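To make the "training corpus" idea concrete, here is a toy nearest-centroid classifier, one of the simplest ways to learn from labeled examples. It is an illustration only, not Neven Vision's or Google's technique: the two-number "feature vectors" are made up and stand in for real image features, and `train_centroids` and `predict` are hypothetical names.

```python
import math
from collections import defaultdict


def train_centroids(examples):
    # examples: list of (feature_vector, label) pairs, e.g. image features
    # paired with a label that players agreed on. Averages the vectors per label.
    grouped = defaultdict(list)
    for features, label in examples:
        grouped[label].append(features)
    return {
        label: [sum(column) / len(vectors) for column in zip(*vectors)]
        for label, vectors in grouped.items()
    }


def predict(centroids, features):
    # Return the label whose centroid is nearest in Euclidean distance.
    return min(centroids, key=lambda label: math.dist(centroids[label], features))


# Made-up two-number "features" standing in for real image features.
labeled = [
    ([0.9, 0.1], "dog"), ([0.8, 0.2], "dog"),
    ([0.1, 0.9], "car"), ([0.2, 0.8], "car"),
]
centroids = train_centroids(labeled)
print(predict(centroids, [0.85, 0.15]))  # dog
```

Real computer vision needs far richer features and models, but the dependency is the same: a learner trained this way can only be as good as its labels, which is why a reliably labeled corpus would matter.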
These are ingenious games with a purpose. But while many observers would point to the obvious purpose of "tagging," that's just a means to an end. The ultimate purpose is to feed the Google AI.
