April 13th, 2005


gearman demo; distributed map in Perl

We've been talking about writing a distributed job system for a while now. I'm pretty aware of what's out there, but you know me. (hate everything, picky, etc....)

So Whitaker and I spec'd this out last week, and I got to hacking on it today. Even with interrupted, on-and-off hacking time all day, it works:

-- Multiple job servers track clients and workers. (scales out)

-- Clients hash requests to the "right" job server, falling back to whichever one is alive. There's no truly "right" job server, but a job server can merge duplicate requests if both clients allow it, so it's beneficial to map the same jobs to the same job servers. That is, if two callers both ask for job "slow_operation_and_big_result" to be done, it's only done once, even if the second client comes in 2 seconds into the computation. (A sketch of the hashing follows this list.)

-- Workers, on connect, announce what job types they're willing to do. They can also unown abilities later.

-- Workers poll all job servers with a "grab_job" operation. If there are no jobs, a worker announces it's going to sleep, then selects on a "noop" wake-up packet from the server, which gets sent when a server receives something the worker can do. (The job server can't just hand out the job with the wake-up, lest there be races between multiple job servers, and you want low latency: it'd be bad for two job servers to send a request to the same worker while another worker sat idle. So all job servers can do is wake workers up. There's a sketch of the worker loop after this list.)

-- Clients can submit lots of jobs, get handles for them, and wait for their results (and status updates) in parallel.

-- A client can submit a "background" job, where a handle is returned but the client isn't subscribed to status notifications. The client just wants it done sometime soon, and then it's going away. (The client side is sketched after this list too.)

-- The job server makes no promises about things eventually getting done, durability, etc. That's all handled at different layers. (For instance: for non-background jobs, the client module can be told that the operation is idempotent, so on failure it's retried and the failure is hidden from the client.)
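
Here's roughly what I mean by the hashing bit. String::CRC32 is just a convenient hash, and server_alive() is a stand-in for whatever liveness check the client module ends up doing; none of this is final code:

use String::CRC32;

sub server_alive { 1 }   # stand-in: pretend everybody's up

# Same job name + uniqueness key always hashes to the same server,
# so duplicate requests land where they can be merged. Walk forward
# from the preferred slot if that server happens to be down.
sub pick_job_server {
    my ($job_name, $uniq, @servers) = @_;
    my $start = crc32("$job_name\0$uniq") % @servers;
    for my $i (0 .. $#servers) {
        my $js = $servers[($start + $i) % @servers];
        return $js if server_alive($js);
    }
    return undef;   # nobody's alive
}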
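
And the worker loop, sketched with a line-based wire format I'm making up for illustration (the port number and all the command spellings besides "grab_job" and "noop" are assumptions):

#!/usr/bin/perl
use strict;
use IO::Socket::INET;
use IO::Select;

my @job_servers = ("sammy:7003", "kenny:7003");   # port is made up

# Connect to every job server and announce our abilities on connect.
my @socks;
for my $js (@job_servers) {
    my $sock = IO::Socket::INET->new(PeerAddr => $js)
        or die "can't reach $js: $!";
    print $sock "can_do dmap\n";
    push @socks, $sock;
}
my $sel = IO::Select->new(@socks);

while (1) {
    my $did_work = 0;
    for my $sock (@socks) {
        print $sock "grab_job\n";
        chomp(my $resp = <$sock> || "");
        next if $resp eq "" or $resp eq "no_job";
        # imaginary response format: "job <handle> <argument>"
        my ($handle, $arg) = $resp =~ /^job (\S+) (.*)/;
        print $sock "work_complete $handle " . uc($arg) . "\n";
        $did_work = 1;
    }
    next if $did_work;

    # No work anywhere: tell each server we're going to sleep, then
    # block until any of them sends its "noop" wake-up, and loop.
    print $_ "pre_sleep\n" for @socks;
    $sel->can_read;
}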
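
The client side I'm imagining looks something like this; the module and method names here are purely illustrative, not a real API yet:

use DMap::Client;   # hypothetical module name

my $client = DMap::Client->new(job_servers => ["sammy", "kenny"]);

# Fire off a bunch of jobs, keep the handles, wait on them together.
my @handles = map {
    $client->submit_job("slow_operation_and_big_result", $_)
} qw(a b c);
my @results = $client->wait_all(@handles);   # parallel wait, in order

# Or fire-and-forget: a handle comes back, but we're not subscribed
# to status notifications and the server owes us nothing after this.
$client->submit_background_job("rebuild_cache", "user=57");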

Anyway... I didn't want to go into all those details, because they're poorly explained. But it works, and one worker process I just wrote registers with the job server and announces it can help do a distributed map for Perl, using Storable.pm's CODE serialization via B::Deparse.
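
The CODE serialization is stock Storable, for what it's worth: flip on $Storable::Deparse on the sending side (it runs the coderef through B::Deparse to get source text) and $Storable::Eval on the receiving side (it evals the text back into a sub). The mechanism in isolation, no job servers involved:

use Storable qw(freeze thaw);

{
    # Opt in; Storable refuses to (de)serialize CODE refs otherwise.
    local $Storable::Deparse = 1;   # client: coderef -> source text
    local $Storable::Eval    = 1;   # worker: source text -> coderef

    my $code   = sub { "$_[0] = " . `hostname` };
    my $frozen = freeze([$code]);      # what a client ships to a worker
    my $thawed = thaw($frozen)->[0];   # what the worker reconstitutes
    print $thawed->(5);                # runs the shipped block locally
}

The one gotcha: B::Deparse gives you the code but not the variables it closed over, so the blocks you dmap should only depend on their arguments and $_.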

Observe:
sammy:server $ cat dmap.pl
#!/usr/bin/perl

use strict;
use DMap;
DMap::set_job_servers("sammy", "kenny");

my @foo = dmap { "$_ = " . `hostname` } (1..10);

print "dmap says:\n @foo";

sammy:server $ ./dmap.pl
dmap says:
 1 = sammy
 2 = papag
 3 = sammy
 4 = papag
 5 = sammy
 6 = papag
 7 = sammy
 8 = papag
 9 = sammy
 10 = papag

So it's like Perl's map, but the computations are spread out all over and recombined in order.

I can't wait for this to be production-quality so we can do things like parallel DB queries. (I especially love the request merging.)
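
For the record, the parallel-DB-queries thing would just be dmap over a list of hosts. A totally hypothetical sketch (the DSNs, credentials, and schema are all invented):

use DMap;
DMap::set_job_servers("sammy", "kenny");

# One dmap job per database host; each worker runs the query against
# its own host, and the counts come back in host order.
my @counts = dmap {
    require DBI;
    my $dbh = DBI->connect("DBI:mysql:database=lj;host=db$_",
                           "user", "pass") or die DBI->errstr;
    my ($count) = $dbh->selectrow_array("SELECT COUNT(*) FROM comment");
    $count;
} (1..8);

my $total = 0;
$total += $_ for @counts;
print "total comments: $total\n";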