Sunday, October 28, 2007

In human grid, we're the cogs

Images from a research shopping trip with GroZi, a Grocery Shopping Assistant for the Visually Impaired developed by UC San Diego computer science professor Serge Belongie. On October 15, 2007, Belongie presented a paper at an interactive computer vision conference and described how people posting comments on blogs could provide data critical for this project. Credit: Serge Belongie. Usage Restrictions: Mandatory credit: Serge Belongie, UC San Diego.

Human computation placed in a grid, for a greater good

Before you can post a comment to most blogs, you have to type in a series of distorted letters and numbers (a CAPTCHA) to prove that you are a person and not a computer attempting to add comment spam to the blog.

What if – instead of wasting your time and energy typing something meaningless like SGO9DXG – you could label an image or perform some other quick task that will help someone who is visually impaired do their grocery shopping?

In a position paper presented at Interactive Computer Vision (ICV) 2007 on October 15 in Rio de Janeiro, computer scientists from UC San Diego led by professor Serge Belongie outline a grid system that would allow CAPTCHAs to be used for this purpose – and an endless number of other good causes.

Structure of the SOYLENT GRID presented by UC San Diego computer scientist Serge Belongie at ICV 2007, a computer vision conference, on Oct. 15, 2007. The users benefiting from the grid can be of two kinds: researchers (who need some information analyzed) and commercial clients (who simply use the Turing test generation service). These providers impose their constraints on the back-end MySQL server by supplying their datasets and describing the tasks to be performed by the end users. Next, when a participant requests a CAPTCHA (Turing test), the Java front end interacts with the server to get a Turing test and also tests the validity of the provided answers. Any information input by the participant (like the answer itself or the time taken to answer) is also sent back to the server for statistical purposes. Credit: Serge Belongie. Usage Restrictions: Mandatory credit: Serge Belongie / UC San Diego.
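
The flow described in the caption can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the researchers' implementation: the paper describes a Java front end and a MySQL back end, while the in-memory `GridServer` class, its method names, the idea of a provider-supplied reference answer, and the sample task are all hypothetical.

```python
import time
from dataclasses import dataclass, field


@dataclass
class Task:
    task_id: int
    prompt: str
    known_answer: str  # assumption: the provider supplies a reference answer


@dataclass
class GridServer:
    """Hypothetical in-memory stand-in for the grid's back-end server."""
    tasks: dict = field(default_factory=dict)
    log: list = field(default_factory=list)

    def add_task(self, task):
        # A provider (researcher or commercial client) registers a task.
        self.tasks[task.task_id] = task

    def request_test(self, task_id):
        # The front end fetches a Turing test to show to a participant.
        return self.tasks[task_id].prompt

    def submit_answer(self, task_id, answer, seconds_taken):
        # Check the answer, then record it with the time taken for statistics.
        task = self.tasks[task_id]
        passed = answer.strip().lower() == task.known_answer.lower()
        self.log.append({
            "task_id": task_id,
            "answer": answer,
            "seconds": seconds_taken,
            "passed": passed,
        })
        return passed


server = GridServer()
server.add_task(Task(1, "Name the product in the image", "peanut butter"))

started = time.monotonic()
prompt = server.request_test(1)
result = server.submit_answer(1, "Peanut Butter", time.monotonic() - started)
```

Here a blog's comment form would play the role of the front end: it asks for a test, shows the prompt, and lets the comment through only if `submit_answer` returns `True`.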

“One of the application areas for my research is assistive technology for the blind. For example, there is an enormous amount of data that needs to be labeled for our grocery shopping aid to work. We are developing a wearable computer with a camera that can lead a visually impaired user to a desired product in a grocery store by analyzing the video stream. Our paper describes a way that people who are looking to prove that they are humans and not computers can help label still shots from video streams in real time,” said Belongie.

The researchers call their system a “Soylent grid” which is a reference to the 1973 film Soylent Green (see more on this reference at the end of the article).

“The degree to which human beings could participate in the system (as remote sighted guides) ranges from none at all to virtually unlimited. If no human user is involved in the loop, only computer vision algorithms solve the identification problem. But in principle, if there were an unlimited number of humans in the loop, all the video frames could be submitted to a SOYLENT GRID, be solved immediately and sent back to the device to guide the user,” the authors write in their paper.

From the front end, users who want to post a comment on a blog would be asked to perform a variety of tasks, instead of typing in a string of misshapen letters and numbers.

Calit2 researcher John Miller (UC San Diego Ph.D. '03, electrical engineering), who is blind, tests a GroZi shopping prototype at a grocery store near the UC San Diego campus. On Oct. 15, 2007, Belongie presented a paper at an interactive computer vision conference and described how people posting comments on blogs could provide data critical for this project. Credit: John Miller. Usage Restrictions: Mandatory credit: John Miller / UC San Diego.

“You might be asked to click on the peanut butter jar or click the Cheetos bag in an image,” said Belongie. “This would be one of the so-called ‘Where’s Waldo’ object detection tasks.”

The task list also includes “Name that Thing” (object recognition), “Trace This” (image segmentation) and “Hot or Not” (choosing visually pleasing images).

“Our research on the personal shopper for the visually impaired – called Grozi – is a big motivation for this project. When we started the Grozi project, one of the students, Michele Merler – who is now working on a Ph.D. at Columbia University – captured 45 minutes of video footage from the campus grocery store and then endured weeks of manually intensive labor, drawing bounding boxes and identifying the 120 products we focused on. This is work the soylent grid could do,” said Belongie.

From the back end, researchers and others who need images labeled would interact with clients (like a blog hosting company) that need to take advantage of the CAPTCHA and spam filtering capabilities of the grid.

“Getting this done is going to take an innovative collaboration between academia and industry. Calit2 could be uniquely instrumental in this project,” said Belongie. “Right now we are working on a proposal that will outline exactly what we need – access to X number of CAPTCHA requests in one week, for example. With this, we’ll do a case study and demonstrate just how much data can be labeled with 99 percent reliability through the soylent grid. I’m hoping for people to say, ‘Wow, I didn’t know that kind of computation was available.’”
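
One plausible way to approach a reliability target like the one Belongie mentions is to show the same image to several participants and accept a label only when enough of them agree. The sketch below assumes simple majority voting with an agreement threshold. The article does not say how the grid would measure reliability, so the function name, the threshold value, and the sample answers are illustrative assumptions.

```python
from collections import Counter


def consensus_label(answers, min_agreement=0.8):
    """Return the most common answer if enough participants agree, else None.

    Hypothetical helper: min_agreement is the fraction of answers that must
    match the most common one before the label is accepted.
    """
    if not answers:
        return None
    # Normalize case and whitespace so "Cheetos" and "cheetos " count together.
    normalized = [a.strip().lower() for a in answers]
    label, votes = Counter(normalized).most_common(1)[0]
    if votes / len(normalized) >= min_agreement:
        return label
    return None


# Four of five participants agree, so the label is accepted.
accepted = consensus_label(["Cheetos", "cheetos", "cheetos ", "Cheetos", "chips"])

# No answer reaches the threshold, so the image would be shown to more people.
rejected = consensus_label(["cheetos", "chips", "crackers"])
```

Raising `min_agreement` or the number of participants per image trades labeling speed for confidence, which is the kind of trade-off a case study on CAPTCHA request volume would have to quantify.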

This work incorporates recent work from a variety of researchers, including computer scientist Luis von Ahn from Carnegie Mellon University. His reCAPTCHA project uses CAPTCHAs to digitize books.

Explanation of the name of the grid and title of the paper:

The researchers call their system a “Soylent grid” and titled their paper “Soylent Grid: it’s Made of People!” Both the grid name and paper name are references to the 1973 cult classic film Soylent Green, a dystopian science fiction film set in an overpopulated world in which the masses are reduced to eating different varieties of “soylent” – a synthetic food whose name suggests both soybeans and lentils.

The line from the movie that inspired the title of this paper is delivered when someone discovers that soylent green is actually made of cadavers from a government-sponsored euthanasia program – prompting the phrase “Soylent green, it’s made of people!” The computer scientists are playing off this famous phrase with their title: “Soylent Grid: it’s Made of People!” The idea is that people all over the world need to jump through anti-spam hoops such as CAPTCHAs, and the power of these people can be harnessed through a grid structure to do some good in the world.

Author contact: Serge Belongie. sjb@cs.ucsd.edu

Paper citation: “Soylent Grid: it’s Made of People!” by Stephan Steinbach (Calit2), Vincent Rabaud and Serge Belongie (Department of Computer Science and Engineering), University of California, San Diego, La Jolla, CA 92093, USA.
vision.ucsd.edu and grozi.calit2.net/

Paper presented at ICV 2007 (Interactive Computer Vision) in conjunction with ICCV 2007, Rio de Janeiro, Brazil, October 2007. Download the paper in PDF format at: cs.ucsd.edu/icv2007.pdf

Contact: Daniel Kane. dbkane@ucsd.edu. 858-534-3262. University of California - San Diego
