Log in

No account? Create an account
brad's life [entries|archive|friends|userinfo]
Brad Fitzpatrick

[ website | bradfitz.com ]
[ userinfo | livejournal userinfo ]
[ archive | journal archive ]

NFS blows, so.... FUSE! [Jul. 7th, 2004|03:16 pm]
Brad Fitzpatrick
[Tags|, ]

Because NFS blows in more ways than I have fingers, I've started down the road of making MogileFS (our distributed filesystem currently in use on picpix.com/pics.livejournal.com) a true[r] filesystem. NFS seems to not reclaim its nfs_inode_cache (previously discussed here), it has problems with file corruption, it can't export (until v4) hierarchies made of different devices to wildcard hosts, and filehandles go stale.

Using FUSE, I got a dummy filesystem working from Perl, after fixing up the bitrotted Perl bindings for FUSE.

Check this:

bini:/home/bradfitz/proj/fuse/mnt# ls -l
total 544618828
-rwxr-xr-x 1 bradfitz bradfitz 1234 Jul 7 2004 Hello_World_at_1089237655
bini:/home/bradfitz/proj/fuse/mnt# ls -l
total 544618828
-rwxr-xr-x 1 bradfitz bradfitz 1234 Jul 7 2004 Hello_World_at_1089237656
bini:/home/bradfitz/proj/fuse/mnt# ls -l
total 544618829
-rwxr-xr-x 1 bradfitz bradfitz 1234 Jul 7 2004 Hello_World_at_1089237657
bini:/home/bradfitz/proj/fuse/mnt# cat 5+34
bini:/home/bradfitz/proj/fuse/mnt# cat 5*5

There are no files here. The directory listing just shows "Hello_World_at_TIME" and doing a read on any math-looking expression returns the evaluation of that expression.

The way it works is:

-- load the FUSE kernel module
-- start a userspace daemon, which talks to FUSE's device node, telling the kernel to place a FUSE mount somewhere
-- all VFS (virtual filesystem) operations the kernel receives are given a unique number and hurled back at the userspace daemon through the device node
-- the userspace daemon (C, Perl, or Python) has an event loop waiting on the kernel to send it work, and it dispatches subs to handle those calls.

Presumably I can reply out of order, too, so the next step is to rewrite FUSE's event loop and integrate it into Danga::Socket's event loop, or hope it's designed to easily plug into master event loops like Linux::AIO was.

After that, we kill NFS and make a light-weight file serving daemon (think Perlbal but without HTTP header parsing, still doing sendfile and such, so it's quick), and make the LUFS daemon get its files from the network and feed to the kernel.

It's kinda lame that we're going to be doing so many copies: network in to LUFS, LUFS to kernel, kernel to user via sendfile. But it shouldn't be too bad.

Fun stuff.

Now, who to recruit to work on this with me? Unfortunately I think everybody has a dozen projects already.

[User Picture]From: mart
2004-07-07 03:36 pm (UTC)

This FUSE thing sounds like fun. I might have to take a look at it myself… sometime.

How many wacky daemons does Danga have now? I remember memcached (obviously), ddlockd, mogilefsd, mailgated… does currently-available free software suck so much that everything has to be made from scratch, or are you doing it for fun?

It's weird that LJ is the first web application that needed all this stuff. Cool that you guys are the ones making it, though.

(Reply) (Thread)
[User Picture]From: brad
2004-07-07 03:55 pm (UTC)
perlbal: mod_proxy kinda sucked. not very flexible, either.

memcached: nothing existed.

ddlockd: couple options existed, but one didn't quite do what i wanted, and another wasn't open source at the time, and was incredibly large and overkill. plus ddlockd is tiny.

mailgated: just mailgate running forever, unspooling stuff.

phonepostd: same as mailgated pretty much. they should be merged at some point in time.

(Reply) (Parent) (Thread)