Building a Linux Media Network, one step at a time

Tuesday, November 13, 2007

Data Processing and the Gravel Biz

My Dad's in the gravel business. Ever watch The Flintstones? He's like Mr. Slate. His job is to get as much sand and gravel as possible out of a mountain and onto barges. From there, it floats down a river to a depot where (presumably) people are willing to pay for it.

The trick about the gravel business, as with any other commodity industry I guess, is that you pretty much live or die based on 2 factors:
  • How often you touch the product.
  • How much it costs you each time you touch it.

In my Dad's case, they can't drive loaded trucks down the steep hill of their quarry to get to the river bank. The road is too narrow to allow the trucks to pass each other. So this is what they do:
  • A dump truck pulls up to a big excavator, which is scraping away at the side of the mountain pretty much non-stop.
  • The excavator fills the dump truck. Takes somewhere around 5-6 scoops, I think.
  • The dump truck backs into position at the top of the cliff and waits for the all-clear to dump its load. You don't want to rush that task or you wind up with a lot of expensive metal at the bottom of a cliff.
  • While that dump truck is getting ready to dump its load, another dump truck (from the pool, see where I'm going with this?) pulls up to the excavator and begins to receive its load.

It's not a perfect setup: there is a finite amount of room on the plateau where the excavation takes place, so you can't fit an unlimited number of dump trucks in there. Sometimes a dump truck is forced to wait while the excavator fills up another truck. We would not call this problem "embarrassingly parallel," but there is definitely a producer-consumer pattern here.
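For the programmers in the audience, here is a rough sketch of that plateau as a bounded producer-consumer queue. Everything in it is my own illustration, not how the quarry (or anyone's real code) works: the names `excavator`, `dump_truck`, and `run`, the plateau capacity of 2, and the pool of 3 trucks are all made-up assumptions. The mapping is loose, too: here the excavator produces loads, the plateau is the limited buffer, and the trucks consume loads by hauling them to the cliff.

```python
import queue
import threading

DONE = object()  # sentinel telling a truck there is no more work


def excavator(plateau, loads, trucks):
    """Producer: digs out loads and parks them on the plateau."""
    for load_id in range(loads):
        # put() blocks when the plateau is full, which models the
        # excavator waiting because there is no room for another load.
        plateau.put(load_id)
    for _ in range(trucks):
        plateau.put(DONE)  # one stop signal per truck


def dump_truck(plateau, dumped):
    """Consumer: takes a load from the plateau and dumps it off the cliff."""
    while True:
        load = plateau.get()  # blocks while the plateau is empty
        if load is DONE:
            break
        dumped.append(load)


def run(loads=10, capacity=2, trucks=3):
    """Move `loads` loads across a plateau that holds `capacity` at once."""
    plateau = queue.Queue(maxsize=capacity)  # hypothetical plateau size
    dumped = []
    workers = [
        threading.Thread(target=dump_truck, args=(plateau, dumped))
        for _ in range(trucks)
    ]
    digger = threading.Thread(target=excavator, args=(plateau, loads, trucks))
    for t in workers:
        t.start()
    digger.start()
    digger.join()
    for t in workers:
        t.join()
    return dumped


if __name__ == "__main__":
    result = run()
    print(len(result), "loads dumped")
```

The order in which loads get dumped can vary from run to run, since whichever truck is free grabs the next load. The bounded `queue.Queue` is what keeps the excavator from piling up more loads than the plateau can hold.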

But a similar pattern plays itself out at the bottom of the cliff: loaders scoop up the dumped gravel and deposit it on a conveyor belt, where it is fed into a crusher. From there it goes into another gravel pile, then onto a conveyor belt and a barge, and then to the sales facility, where another loader unloads the barge. All told, I think they handle the product 4 times.

The bottleneck here, of course, is the road. At some point as the production capacity up on the plateau expands, the capital and operational expense of widening the road will become less than the cost of handling all that sand and gravel one extra time.

What I find interesting about this problem is that the cost of handling a single piece of gravel is infinitesimally small. But when you multiply that cost by several trillion, it adds up to real dollars and cents. And so it is with the Wide Finder. Handling a single byte's worth of data, or a line's worth of data, is so "cheap" we hardly ever think about it. But handling a GB's worth of data, or 10 million lines' worth, now you're talking real money. Because processing time, especially in a batch environment like this, is money.
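To put the "how often you touch it, and what each touch costs" idea into arithmetic, here is a back-of-the-envelope sketch. The one microsecond per touch is a number I made up purely for illustration, not a measurement of any real program, and `handling_cost_us` is a hypothetical helper. Only the 10 million lines and the 4 touches come from the story above.

```python
def handling_cost_us(items, touches, cost_per_touch_us):
    """Total cost in microseconds: items x touches x cost of one touch."""
    return items * touches * cost_per_touch_us


LINES = 10_000_000      # 10 million lines, as in the post
COST_PER_TOUCH_US = 1   # assumed: 1 microsecond per touch (made up)

one_pass = handling_cost_us(LINES, 1, COST_PER_TOUCH_US)
four_passes = handling_cost_us(LINES, 4, COST_PER_TOUCH_US)

if __name__ == "__main__":
    print("1 touch per line: ", one_pass / 1_000_000, "seconds")
    print("4 touches per line:", four_passes / 1_000_000, "seconds")
```

Under those assumed numbers, a cost nobody would notice on one line turns into whole seconds across the file, and every extra touch multiplies it again.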

Meh, some neat parallels there, is all I'm saying.

