Log in

No account? Create an account
NFS blows, so.... FUSE! - brad's life — LiveJournal [entries|archive|friends|userinfo]
Brad Fitzpatrick

[ website | bradfitz.com ]
[ userinfo | livejournal userinfo ]
[ archive | journal archive ]

NFS blows, so.... FUSE! [Jul. 7th, 2004|03:16 pm]
Brad Fitzpatrick
[Tags|, ]

Because NFS blows in more ways than I have fingers, I've started down the road of making MogileFS (our distributed filesystem currently in use on picpix.com/pics.livejournal.com) a true[r] filesystem. NFS seems to not reclaim its nfs_inode_cache (previously discussed here), it has problems with file corruption, it can't export (until v4) hierarchies made of different devices to wildcard hosts, and filehandles go stale.

Using FUSE, I got a dummy filesystem working from Perl, after fixing up the bitrotted Perl bindings for FUSE.

Check this:

bini:/home/bradfitz/proj/fuse/mnt# ls -l
total 544618828
-rwxr-xr-x 1 bradfitz bradfitz 1234 Jul 7 2004 Hello_World_at_1089237655
bini:/home/bradfitz/proj/fuse/mnt# ls -l
total 544618828
-rwxr-xr-x 1 bradfitz bradfitz 1234 Jul 7 2004 Hello_World_at_1089237656
bini:/home/bradfitz/proj/fuse/mnt# ls -l
total 544618829
-rwxr-xr-x 1 bradfitz bradfitz 1234 Jul 7 2004 Hello_World_at_1089237657
bini:/home/bradfitz/proj/fuse/mnt# cat 5+34
bini:/home/bradfitz/proj/fuse/mnt# cat 5*5

There are no files here. The directory listing just shows "Hello_World_at_TIME" and doing a read on any math-looking expression returns the evaluation of that expression.

The way it works is:

-- load the FUSE kernel module
-- start a userspace daemon, which talks to FUSE's device node, telling the kernel to place a FUSE mount somewhere
-- all VFS (virtual filesystem) operations the kernel receives are given a unique number and hurled back at the userspace daemon through the device node
-- the userspace daemon (C, Perl, or Python) has an event loop waiting on the kernel to send it work, and it dispatches subs to handle those calls.

Presumably I can reply out of order, too, so the next step is to rewrite FUSE's event loop and integrate it into Danga::Socket's event loop, or hope it's designed to easily plug into master event loops like Linux::AIO was.

After that, we kill NFS and make a light-weight file serving daemon (think Perlbal but without HTTP header parsing, still doing sendfile and such, so it's quick), and make the LUFS daemon get its files from the network and feed to the kernel.

It's kinda lame that we're going to be doing so many copies: network in to LUFS, LUFS to kernel, kernel to user via sendfile. But it shouldn't be too bad.

Fun stuff.

Now, who to recruit to work on this with me? Unfortunately I think everybody has a dozen projects already.

[User Picture]From: brad
2004-07-07 09:40 pm (UTC)
I think it happen{ed|s} only during load when there wasn't free memory to allocate for that NFS page. But what should've happened is an error, not a silent corruption. We rebooted the offending client with a kernel upgrade and it "fixed" it, but I'm not sure it's a result of the kernel upgrade or the corrupted caches being reset, or more memory being available.
(Reply) (Parent) (Thread)