buygoogle  
 Google's prospects from a Google user and independent investor   
 
    



There's no delete button on the Web - 9/06/2006 05:47:01 PM

Google grudgingly added a delete button to Gmail a while ago, but they continue to prove that there's no delete button for the Web.

Today Google released one of its most powerful (and under-hyped) recent products, an archive of news articles going back to the 1700s.  Did you know that the first use of the term 'email' was in 1772?  Oh, wait, maybe all those articles on a technology not invented until the late 20th century are just OCR errors from scans of old printed newspapers.  In any case, articles that had long been forgotten and left for dead have now been resurrected and given a new lease on life, a boon for content owners and citizens alike.

Scott Adams, creator of the Dilbert cartoons, wrote a snarky blog post about the death of "Crocodile Hunter" Steve Irwin.  Adams found out today that the Web has no delete button: he deleted the post from his blog, but it proliferated across the Web anyway.  As of this post it's still in Google's cache.  And if Google removes it from their cache, plenty of bloggers have preserved Adams' sentiments for eternity - just search for [ crikey "crazy seal"].

I remember when Google themselves tried to hit the delete button to flush an inconvenient item down the memory hole, but until Google adds a delete button to the Web, that's gonna be difficult.


Games with a purpose - 9/04/2006 06:11:00 PM

I haven't posted in a while, since I didn't have much new to say. I hoped that the Buygoogle site would bring a new perspective on the company that might make the biggest difference in the first decade of the twenty-first century, but I haven't had many new perspectives lately. Hence the silence.

But there's been a good bit of discussion in the last week about Google's "Image Labeler" game, and the observers whom I respect may be missing the point -- so I'll break my silence for a contrasting view. Battelle goes on about the vocabulary, arguing with Google's choice of the term "label" instead of "tag":
"I just wish Google would use the terminology the rest of the web has already settled upon. It's not a label. It's a tag. 'Tag' means something - an intentional attribute given to an object on the web. That's what we are doing here."
Who cares what they call it? Certainly not me. Battelle infers that Google is just doing what others like Flickr have done: assigning text attributes to help with searching. But there are signs that tagging isn't Google's objective here, no matter what you want to call it.

Philipp also sees the Image Labeler as simply a tagging exercise, using humans to annotate the Web:
"More than a game, for Google this is a way to tag images using human brain power... to improve their image search results. I wonder if Google can reach critical mass with this game – enough players participating long enough to label many images – to ever make this relevant for their main image search. The idea of this approach isn’t new, but scaling it with the web will be tough."
But if we look at Google's longstanding aversion to human tagging, and join that with the recent acquisition of Neven Vision, you can see that there's likely a lot more to this than just getting people to label the Web for free.

Luis Von Ahn is the brain behind the ESP Game, which Google licensed to create the Image Labeler. Von Ahn talked to a Google audience about the ESP Game and other "games with a purpose" that he's developed to assign meaning to words and images on the Web. You can watch the entire 1-hour talk on Google Video, and see if you don't come away with the same impression that I did -- it's not about tagging or labeling, it's about machine learning.

Google was born out of a rebellion against human-edited directories of the Web that could not keep up with the Web's exponential growth, and were routinely manipulated by special interests willing to pay to subvert information for financial gain. Page and Brin put their faith in the computer algorithm, and to this day believe that the algorithm is faster, more comprehensive, and less biased than human editors.

Page and Brin may love the concept of the ESP Game and Google's Image Labeler because they take the good aspects of the social web and eliminate the crap that usually comes with user-generated content. For example, the ESP Game auto-corrects because two players must independently agree on a label before it counts. And even if a large group were to conspire to subvert results (e.g. hordes of Slashdot users all tagging every image “monkey”), the game automatically inserts known images into the stream, and if players don't guess known-good labels, the labels from those players are discarded. These "symmetric validation" and similar "asymmetric validation" methods produce extraordinarily reliable results.
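To make those two checks concrete, here's a rough sketch in Python. It is not Google's or Von Ahn's actual code; the function names, the data layout, and the 50% hit-rate threshold are all my own assumptions, chosen only to illustrate the idea of "agreement between independent players" plus "known images planted in the stream."

```python
def agreed_labels(guesses_a, guesses_b):
    """Keep only labels that both players typed independently.

    Hypothetical helper: guesses are any iterables of strings.
    """
    def normalize(words):
        return {w.strip().lower() for w in words}

    return normalize(guesses_a) & normalize(guesses_b)


def trusted_players(known_good, player_guesses, min_hit_rate=0.5):
    """Return players who match known-good labels on planted images.

    known_good:     {image_id: set of known-good labels}
    player_guesses: {player: {image_id: set of lower-case guesses}}
    min_hit_rate:   assumed threshold, not taken from the talk
    """
    trusted = set()
    for player, guesses in player_guesses.items():
        planted = [img for img in guesses if img in known_good]
        if not planted:
            continue  # assumption: no planted image seen yet, so not trusted yet
        hits = sum(1 for img in planted if guesses[img] & known_good[img])
        if hits / len(planted) >= min_hit_rate:
            trusted.add(player)
    return trusted


# Made-up example data: one planted image and one new image.
known_good = {"img_known": {"dog"}}
guesses = {
    "alice": {"img_known": {"dog", "puppy"}, "img_new": {"beach", "sea"}},
    "bob": {"img_known": {"dog"}, "img_new": {"sea", "sand"}},
    "troll": {"img_known": {"monkey"}, "img_new": {"monkey"}},
}

trusted = trusted_players(known_good, guesses)
labels = agreed_labels(guesses["alice"]["img_new"], guesses["bob"]["img_new"])
print(sorted(trusted), sorted(labels))
```

In this toy run the "monkey" player fails the planted image and is dropped, and only the label that both remaining players typed survives for the new image.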

Von Ahn points out that there is a lot of unused human capacity to do this grunt work. One slide (shown below) says that if people diverted the time they play solitaire to construction projects, they could build 1285 Empire State Buildings, or 450 Panama Canals every year. Why not have them label the Web?

[Image: Idle time]
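The slide's numbers are easy to sanity-check. The inputs below are not stated in this post; they are the round figures I recall from the talk (about 9 billion human-hours of solitaire in a year, about 7 million human-hours to build the Empire State Building, about 20 million for the Panama Canal), so treat them as assumptions.

```python
# Assumed inputs, recalled from the talk rather than quoted in this post.
SOLITAIRE_HOURS_PER_YEAR = 9_000_000_000
EMPIRE_STATE_BUILDING_HOURS = 7_000_000
PANAMA_CANAL_HOURS = 20_000_000


def projects_per_year(available_hours, hours_per_project):
    """Whole projects that fit into the available human-hours."""
    return available_hours // hours_per_project


empire_state_buildings = projects_per_year(
    SOLITAIRE_HOURS_PER_YEAR, EMPIRE_STATE_BUILDING_HOURS
)
panama_canals = projects_per_year(SOLITAIRE_HOURS_PER_YEAR, PANAMA_CANAL_HOURS)

print(empire_state_buildings, panama_canals)  # 1285 450
```

With those inputs the division lands exactly on the 1285 and 450 quoted above.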

Of course, this is all just a means to an end. And while other observers may think the end is a set of reliably tagged images, I think that's just an intermediate step. Because once Google has a set of images that are labeled with a high signal-to-noise ratio, this would be an ideal corpus of material for their newly acquired "machine vision" technology to train on. And once Google's computers are trained to interpret images and video, then Google's algorithms can do for images and video what they've done for text -- organize it and make it universally accessible and useful.

[Image: Computer vision research]

Indeed, Von Ahn alludes to that possibility in the video:
"If we had this information for a lot of images, we could use this information for training computer vision algorithms....one of the major stumbling blocks [to effective computer vision algorithms] was a lack of training data..."

And it's not just images that could benefit from this technique. The long-promised "semantic web" has been slow to materialize because it's so difficult to generate all that meta-data. As Von Ahn says in the video when talking about another game under development to infer "common-sense facts" about language:
"Research projects have failed due to a lack of training data, since it is so tedious to key in all those common-sense facts. But if there was a fun game that could do it ..."
Google used a large corpus of training documents to teach their computers to translate Arabic. Until now, there was no similar monster data set of reliable meta-data to train their computers to interpret images, or to infer meaning. With these games, Google may soon have a reliable training corpus for these purposes, too.

These are ingenious games with a purpose. But while many observers would point to the obvious purpose of "tagging," that's just a means to an end. The ultimate purpose is to feed the Google AI.

