Building a Linux Media Network, one step at a time

Monday, January 28, 2008

Google/Flickr Image Scraper

I just received my XO laptop through the Give-1-Get-1 program. I don't have a lot of plans for it yet, but one thing I want to do is set up some photo albums for my 3-year-old son. His interests include helicopters, airplanes, trucks, and trains. The usual!

I wrote a small app to grab pictures from Google Images based on a query string. It was a short exercise in concurrent programming, more than anything else. Lesson learned: the Java 1.5 concurrency APIs don't make the producer/consumer design pattern as easy as you might think, particularly when it comes to producer shutdown.
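For the curious, here is a minimal sketch of one common way to handle producer shutdown with the Java 1.5 APIs: the "poison pill" approach, where the producer queues one sentinel per consumer once it has nothing left to produce. This is not the googlor source. The class name PoisonPillDemo, the run method, and the "fetched:" marker are made up for illustration, and the actual image download is replaced by a simple string append.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class PoisonPillDemo {
    // Hypothetical sentinel: compared by identity, never treated as a real URL.
    private static final String POISON_PILL = new String("POISON_PILL");

    public static List<String> run(final List<String> urls, final int consumerCount)
            throws InterruptedException {
        // Bounded queue so the producer blocks when consumers fall behind.
        final BlockingQueue<String> queue = new LinkedBlockingQueue<String>(10);
        final List<String> fetched =
                Collections.synchronizedList(new ArrayList<String>());
        ExecutorService pool = Executors.newFixedThreadPool(consumerCount + 1);

        // Producer: queue every URL, then one pill per consumer.
        pool.execute(new Runnable() {
            public void run() {
                try {
                    for (String url : urls) {
                        queue.put(url);
                    }
                    for (int i = 0; i < consumerCount; i++) {
                        queue.put(POISON_PILL);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });

        // Consumers: take work until a pill arrives, then exit.
        for (int i = 0; i < consumerCount; i++) {
            pool.execute(new Runnable() {
                public void run() {
                    try {
                        while (true) {
                            String url = queue.take();
                            if (url == POISON_PILL) {
                                return;
                            }
                            // Stand-in for the real download step.
                            fetched.add("fetched:" + url);
                        }
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                }
            });
        }

        // No new tasks; wait for the producer and consumers to finish.
        pool.shutdown();
        if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
            pool.shutdownNow(); // interrupt anything still blocked
        }
        return new ArrayList<String>(fetched);
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> out = run(Arrays.asList("a.jpg", "b.jpg", "c.jpg"), 2);
        System.out.println(out.size() + " images handled");
    }
}
```

The awkward part is that the queue itself has no notion of "done", so the producer has to say so explicitly, and it has to say so once per consumer or some of them will block on take() forever.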

After I wrapped up the Google Images scraper, my friend Gwilli pointed out that Flickr is a much better resource for this sort of thing. D'oh! Fortunately it was a very small change to scrape Flickr, too.

Instructions:
  1. Make sure you have Java 1.5 or Java 6. Download googlor.jar.
  2. Run java -jar googlor.jar (on OS X, just double-click the jar file).
  3. The fields are pretty straightforward. By default images will go into the images/ subdirectory of the current working directory.
  4. When images are presented, press 'j' to junk them or 'k' to keep them. That's it.
I'm happy to make the source available to anyone who's interested.

(I should point out that the images grabbed from Flickr are low-res, at most 500x500 pixels. That's a good fit for the XO's screen.)