busy busy - brad's life — LiveJournal
Brad Fitzpatrick

[ website | bradfitz.com ]

busy busy [Oct. 15th, 2004|12:31 am]
Brad Fitzpatrick

Work's been interesting lately. A dozen things going on in all directions.

I'll try to list some of them:

-- Mahlon's investigating using MySQL Cluster. He has 4 nodes up (2 sets of 2 redundant nodes) so we can learn how to admin it, monitor it, benchmark it, understand its limitations, etc. Fun stuff. This is the future.

-- We've been running an increasing number of async jobs distributed. The problem with that is: how do the workers get their work? They can all independently query what needs to be done, randomize the list, and iterate over it with proper locking on each work item, but that gets to be a waste of time. We're moving it all to a new model (called a "message board" or "Linda tuple space") where a job server is the hub between workers and populators, all working in parallel, and the job server atomically hands out jobs (the "grab" operation). This is speeding up async work a lot, and we're only just starting to use it in more places.
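The grab model can be sketched in a few lines of Python (an illustration with made-up names, not the actual job-server code): the board atomically hands each job to exactly one worker, and refuses to re-insert anything still pending or in flight.

```python
# Minimal sketch of a "message board" with an atomic grab operation.
# All class and method names here are hypothetical, for illustration only.
import queue
import threading

class JobBoard:
    def __init__(self):
        self._jobs = queue.Queue()   # jobs waiting to be grabbed
        self._pending = set()        # jobs waiting OR in flight
        self._lock = threading.Lock()

    def insert(self, job):
        """Populators add work; duplicates of pending/in-flight jobs are refused."""
        with self._lock:
            if job in self._pending:
                return False
            self._pending.add(job)
        self._jobs.put(job)
        return True

    def grab(self, timeout=None):
        """Workers atomically take one job; Queue.get ensures no job is shared."""
        return self._jobs.get(timeout=timeout)

    def complete(self, job):
        """Worker reports completion; only now may the job be inserted again."""
        with self._lock:
            self._pending.discard(job)

board = JobBoard()
board.insert("resize-userpic:42")
job = board.grab()
board.complete(job)
```

Populators can then re-scan for outstanding work on a timer; the board's refusal of duplicates makes that re-scan safe.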

-- We've moved all userpics and phoneposts to MogileFS, stressing it in a way I didn't anticipate: ephemeral port exhaustion. 65k ports should be enough for anybody! (well, more like half that) Anyway, came up with some fixes for that, some done, some pending. (also got a bunch of new hardware tonight that'll help quite a bit... we'd been running pretty low)
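For a rough sense of why "more like half" of 64k ports runs out: each closed outbound connection parks its source port in TIME_WAIT for a while, so the usable port range divided by the TIME_WAIT duration bounds the sustained rate of new connections between one client and one server. A back-of-the-envelope sketch, using illustrative Linux defaults (not our actual settings):

```python
# Back-of-the-envelope bound on outbound connection rate before
# ephemeral port exhaustion. Values are illustrative Linux defaults.
low, high = 32768, 61000        # default net.ipv4.ip_local_port_range
ephemeral_ports = high - low    # ~28k usable ports ("more like half" of 65k)
time_wait_secs = 60             # roughly how long a closed socket holds its port
max_rate = ephemeral_ports / time_wait_secs
print(f"~{max_rate:.0f} new connections/sec before exhaustion")
```

Persistent connections sidestep the whole problem by not burning a fresh port per request, which is the fix described in the next item.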

-- Ephemeral port exhaustion led us to work on persistent connections for Perlbal from the client side. Previously it only did persistent connections between Perlbal and backends. But because we can do that funky reproxy (internal redirect) thing, those reproxied connections weren't cached and were wasting ephemeral ports pretty quickly. Junior got the webserver (mogstored, which is built on Perlbal's libraries) doing persistent connections; now we need to make the internal-redirect HTTP client stay persistent too.

-- Minor perl profiling / optimization work with Perlbal. You could waste days and weeks on this... best to limit yourself and not get carried away. Makes for good work when everything else is frustrating you though.

-- Been getting lots of new hardware in and set up. Ordered 8 new dual-proc 3.0 GHz Xeons with 1 MB cache. Also 2 dual Opteron 246s w/ 8 GB of memory. Fought some NUMA bugs where Linux was unreliable in certain topologies; lkml people blamed the motherboard for even allowing it, or for being unstable with it. Got two new Itanium machines coming soon.

-- Jesse's investigating OpenLaszlo, which I can't wait to see be magic. Also more JavaScript stuff, which finally feels almost stable enough as a platform to do reliable things. Almost stable. I watch the crap he puts up with and am glad I'm not a JavaScript programmer.

-- FotoBilder's new protocol is coming along. Some more of us need to fully review it and officially publish it to spur some client development.

-- Been doing a lot of late-night database maintenance, so I sit here while operations are pending, writing long-winded posts.

From: insom
2004-10-15 03:16 am (UTC)

Tuple Spaces

I use a tuple space for distributing message sending tasks in an SMS app, and it's absolutely fantastic.

One issue, however, is that you don't want one task given to two workers, so the grab has to be transactional (easy).

The other issue is that if a task fails, it needs to be re-inserted into the space. By "fail" we could mean the worker dies, throws a fatal exception, or gets unplugged from the network. We need (and actually don't have) a timeout mechanism to show that a tuple is really "done", or else we re-insert it.
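The timeout mechanism described here is often implemented as a lease (sometimes called a visibility timeout): a grabbed tuple must be acknowledged within its lease, or it silently reappears in the space. A minimal Python sketch of that idea, with hypothetical names:

```python
# Sketch of a tuple space with leases: unacknowledged tuples are
# re-inserted after a timeout, covering crashed or partitioned workers.
# Names and structure are illustrative, not any real implementation.
import time

class LeasedSpace:
    def __init__(self, lease_secs=30.0):
        self.lease_secs = lease_secs
        self.available = []     # tuples waiting to be grabbed
        self.leased = {}        # tuple -> lease deadline

    def insert(self, t):
        self.available.append(t)

    def grab(self):
        self._reap()            # first reclaim any expired leases
        if not self.available:
            return None
        t = self.available.pop(0)
        self.leased[t] = time.monotonic() + self.lease_secs
        return t

    def ack(self, t):
        """Worker confirms completion before its lease expires."""
        self.leased.pop(t, None)

    def _reap(self):
        now = time.monotonic()
        for t in [t for t, deadline in self.leased.items() if deadline < now]:
            del self.leased[t]
            self.available.append(t)   # worker presumed dead: re-insert
```

The lease length is a trade-off: too short and slow-but-alive workers get their tuples duplicated, too long and dead workers' tuples sit idle.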
From: brad
2004-10-15 07:57 am (UTC)

Re: Tuple Spaces

Our tuple space doesn't allow you to reinsert a job that's either pending or being worked on by somebody, so we'll just run our populators (to find work that needs to be done) every 30 seconds or so, as needed, to keep enough work populated on the message board.

If something is handed out and the worker later dies, its connection to the server will die, we'll mark the job as no longer being worked on, and a populator will be able to put it back in. (And to make sure the job isn't still running on a partitioned network, out of touch with the job server, we require the worker to contact the job server and confirm the job is still there before it commits its work right at the end.)
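That last-moment check can be sketched like this (toy code with hypothetical names, not the actual job-server protocol): the job server stays authoritative about ownership, so a worker that was cut off and whose job got re-assigned cannot commit stale work.

```python
# Sketch of commit-time ownership confirmation against a job server.
# JobServer and all method names here are illustrative stand-ins.
class JobServer:
    def __init__(self):
        self.owner = {}                 # job_id -> worker_id currently assigned

    def hand_out(self, job_id, worker_id):
        self.owner[job_id] = worker_id  # re-assignment overwrites old owner

    def still_owns(self, worker_id, job_id):
        return self.owner.get(job_id) == worker_id

    def mark_done(self, job_id):
        self.owner.pop(job_id, None)

def commit_work(server, worker_id, job_id, commit):
    # Re-confirm ownership right before committing, so a worker that was
    # partitioned away (and whose job was handed to someone else) aborts.
    if not server.still_owns(worker_id, job_id):
        return False
    commit()
    server.mark_done(job_id)
    return True
```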
(Deleted comment)
From: mart
2004-10-15 06:15 am (UTC)

No, but eventually the goal is to migrate to Python.

From: brad
2004-10-15 07:58 am (UTC)
One day in the ever-distant future, yes. Parrot's promise is, uh, promising. Running Perl 5, Perl 6, Python, Ruby, etc, all side-by-side, mix-and-matching libraries... should be neat.
From: agreg
2004-10-15 11:45 am (UTC)

New hardware

Are the Xeons you've ordered the 64-bit variety?
From: brad
2004-10-15 11:49 am (UTC)

Re: New hardware

Don't think so. We just needed a bunch of dumb, fast web nodes. Only 2GB of memory in each.

We did order 4 of the 64-bit Xeons, but the Lindenhurst chipset that's not the -VS model is flaky, as are the SuperMicro boards that use them. And other boards just aren't available.
From: agreg
2004-10-15 12:02 pm (UTC)

Re: New hardware

Ah right. I was wondering if they were going to be web or database servers.

How are things progressing with the new 64-bit database servers? Are any of them ready, or approaching readiness, to go into production?
From: brad
2004-10-15 12:36 pm (UTC)

Re: New hardware

The two Opterons and two Itaniums are all about ready.
From: sahrie
2004-10-15 06:33 pm (UTC)
this has too much text. I can't be bothered to read it all now. it's probably something all computer related anyway.... soo...........

just thought I'd say hello.... so hello rad.