readahead / blocking sendfile - brad's life - LiveJournal
Brad Fitzpatrick


readahead / blocking sendfile [Jun. 5th, 2006|11:34 pm]

I've had a known inefficiency in Perlbal for ages now and finally broke down and fixed it. The inefficiency is that sendfile can block even if the destination fd is a non-blocking socket, because the source fd (a disk-based file) can force a disk read if the data isn't already in the pagecache.

FreeBSD has a fancy sendfile that lets you request it not block, but Linux doesn't.

The solution on Linux is to do a readahead() call first in another thread, or just sendfile() in another thread, either of which IO::AIO can do. I wanted to test the theory changing as little code as possible, so I went with the async readahead.
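
For readers who want to see the shape of this, here is a minimal sketch in Python rather than Perl, assuming Linux. It is not Perlbal's code. A worker thread does a throwaway blocking read of the byte range, which stands in for the async readahead by pulling the pages into the pagecache, and only afterwards does the event loop call sendfile. The names warm_pagecache, sendfile_when_warm, _wait_writable and the WARM_CHUNK size are all made up for illustration.

```python
import asyncio
import os

WARM_CHUNK = 1 << 20  # hypothetical: bytes read per pread() while warming


def warm_pagecache(fd, offset, length):
    """Blocking read of [offset, offset+length); the data is discarded.

    Meant to run in a worker thread so the disk wait never hits the event loop.
    """
    end = offset + length
    while offset < end:
        data = os.pread(fd, min(WARM_CHUNK, end - offset), offset)
        if not data:  # reached EOF before the requested length
            break
        offset += len(data)


def _wait_writable(loop, sock):
    """Return a future that resolves once the socket is writable again."""
    fut = loop.create_future()

    def on_writable():
        loop.remove_writer(sock.fileno())
        if not fut.done():
            fut.set_result(None)

    loop.add_writer(sock.fileno(), on_writable)
    return fut


async def sendfile_when_warm(sock, fd, offset, length):
    """Warm the range in the default thread pool, then sendfile it."""
    loop = asyncio.get_running_loop()
    await loop.run_in_executor(None, warm_pagecache, fd, offset, length)
    sent = 0
    while sent < length:
        try:
            n = os.sendfile(sock.fileno(), fd, offset + sent, length - sent)
        except BlockingIOError:
            # Socket buffer is full: yield until it drains.
            await _wait_writable(loop, sock)
            continue
        if n == 0:  # file is shorter than expected
            break
        sent += n
    return sent
```

This is best effort only: nothing stops the kernel from evicting the pages between the warm-up and the sendfile, in which case sendfile blocks on disk just as before.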

Before I did that, though, I wrote a test case.

The test case runs two processes in parallel. The first fetches 3 small hot files over and over again, measuring the mean time of 100 requests. The second is there just to mess with the first one; it doesn't actually output anything. It either fetches the same 3 small files or, with the "big" parameter, fetches seven 100 MB files in a loop, more than this Xen instance's 512 MB of memory. The idea is to see if the disk reads serving the big files stall the event loop and increase turn-around time.
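
The measuring half of a test like this can be sketched in a few lines of Python. This is not parallel.pl. It assumes each request is timed individually and that the sample standard deviation is wanted; neither detail is stated above. The fetch argument is a placeholder for whatever issues one request, and time_requests is a made-up name.

```python
import statistics
import time


def time_requests(fetch, count=100):
    """Call fetch() `count` times; return (mean, stddev) of wall-clock seconds."""
    samples = []
    for _ in range(count):
        start = time.perf_counter()
        fetch()  # placeholder: one HTTP GET of a small hot file
        samples.append(time.perf_counter() - start)
    # Assumption: sample standard deviation (needs count >= 2).
    return statistics.mean(samples), statistics.stdev(samples)
```

The background process would simply run its own fetch loop at the same time and discard the results.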


lj@LJ_web:~$ ./parallel.pl small; ./parallel.pl small; ./parallel.pl small; ./parallel.pl big;./parallel.pl big;./parallel.pl big;
mean: 0.287987213134766, stddev: 0.0829109309255669
mean: 0.279777903556824, stddev: 0.0957734761804354
mean: 0.238886480331421, stddev: 0.0949280425469577

mean: 0.351436612606049, stddev: 0.0791952577383974
mean: 0.361295075416565, stddev: 0.0863025646086743
mean: 0.3904807305336, stddev: 0.173639453608837

The first set of three lines is the time to serve small files with other small files being served in the background. The second set of three is serving small files with big files being served in the background.

With the async readahead() call added, and the sendfile done in the callback (once the data to be sendfile'd is in the pagecache), the results even out a bunch:

lj@LJ_web:~$ ./parallel.pl small; ./parallel.pl small; ./parallel.pl small; ./parallel.pl big;./parallel.pl big;./parallel.pl big;
mean: 0.296060967445374, stddev: 0.0586433388625736
mean: 0.262518639564514, stddev: 0.0726501212827927
mean: 0.285000162124634, stddev: 0.0321991597111094

mean: 0.302280473709106, stddev: 0.0811435349061447
mean: 0.303003549575806, stddev: 0.0787071540895621
mean: 0.298841729164124, stddev: 0.0953137343692458

Probably some more work to be done, but promising.

From: evan
2006-06-06 08:10 am (UTC)
Would readability (as determined by poll or whatever) on the input fd also work for knowing whether sendfile would block? Or is it unrelated?

From: brad
2006-06-06 04:02 pm (UTC)
Yeah, except you can't.

You can't set non-blocking on a file-backed fd, and you can't select/poll/epoll on a file-backed fd.
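
To make that concrete, here is a small Python probe, assuming Linux (the function name is made up). select() doesn't fail on a regular file, it just always reports it ready, and epoll refuses to register one at all with EPERM. Either way, readiness says nothing about whether the data is in the pagecache.

```python
import select
import tempfile


def probe_regular_file():
    """Return (select_says_readable, epoll_outcome) for an empty regular file."""
    with tempfile.TemporaryFile() as f:
        fd = f.fileno()
        # Zero timeout: just ask whether the fd is "ready" right now.
        readable, _, _ = select.select([fd], [], [], 0)
        select_ready = fd in readable

        epoll_outcome = "epoll unavailable"  # e.g. not running on Linux
        if hasattr(select, "epoll"):
            ep = select.epoll()
            try:
                ep.register(fd, select.EPOLLIN)
                epoll_outcome = "registered"
            except PermissionError:
                # epoll_ctl() returns EPERM for files that don't support polling.
                epoll_outcome = "EPERM"
            finally:
                ep.close()
        return select_ready, epoll_outcome
```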

There's been talk of better AIO in Linux for years now (it was hot a couple years ago at OLS) but nothing ever seems to happen with it. It seems the few people who need it (databases) just do it themselves in threads, and they know more about what's going on anyway, so they can be a bit smarter than the kernel in some ways. OTOH, that's not always the case: somebody ported InnoDB to use Linux AIO and got a good speed-up. [link].

From: dan_erat
2006-06-06 02:05 pm (UTC)
What do you do once you know that sendfile would block, or what triggers the other thread calling readahead? Do you answer other, faster requests in the meantime?

From: dan_erat
2006-06-06 02:16 pm (UTC)
Uh, just answered my own question by re-reading and seeing "doing the sendfile in the callback". Sorry.

From: dan_erat
2006-06-06 03:03 pm (UTC)
So my readahead(2) says that it blocks until the data has been read into the page cache. Does IO::AIO do something different, or do you have a pool of readahead threads, or ... ?

From: brad
2006-06-06 03:58 pm (UTC)
Yeah, IO::AIO sets up a pool of threads to do blocking operations. AIO read/write/sendfile/readahead/stat/etc....

From: josuah
2006-06-06 05:23 pm (UTC)
Thanks for posting this. This will actually help us out over here with a problem we're working on. Thought I'd let you know. :)

From: brad
2006-06-06 05:29 pm (UTC)
Cool. When you get it solved, share your story. :)

From: pos_le_terrible
2006-06-07 09:00 pm (UTC)
I had a discussion with Marc Lehmann on that subject some time ago (I asked him to add this async sendfile support in IO::AIO :). At that time he told me that async reads/writes with small file chunks would give better results.
In your test you are doing the sendfile when the readahead is done, but that would be of no benefit for really big files that don't fit in the page cache, or when sending a lot of not-so-big files simultaneously. Plus it adds some latency before the real sendfile begins. Maybe setting a maximum readahead length (say 100 MB) would be a solution, or doing multiple readahead/sendfile calls with small chunks of the file?...
Anyway, setting the fd as blocking and doing the whole sendfile in the thread is simpler in my opinion, and should give better results. That is the option I am currently using, but I haven't done any benchmarks.

The ultimate solution would be to have some sort of mincore functionality, or maybe to keep some heuristics about which small files were recently requested and should still be held in the page cache?...
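
The "multiple readahead/sendfile calls with small chunks" idea might look roughly like the Python sketch below, assuming Linux. All names and the 256 KB default are invented, and nobody in this thread has benchmarked it. While one chunk is being sent, the next one is warmed in a worker thread. To keep it short, the sketch waits on the future and assumes out_fd is a blocking socket; an event-loop server would use a callback instead.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def chunk_ranges(size, chunk):
    """Yield (offset, length) pairs covering [0, size)."""
    offset = 0
    while offset < size:
        length = min(chunk, size - offset)
        yield offset, length
        offset += length


def warm(fd, offset, length):
    """Throwaway blocking read so the chunk lands in the page cache."""
    end = offset + length
    while offset < end:
        data = os.pread(fd, end - offset, offset)
        if not data:  # EOF
            break
        offset += len(data)


def send_chunked(pool, out_fd, in_fd, size, chunk=256 * 1024):
    """Send `size` bytes of in_fd to out_fd, warming one chunk ahead."""
    ranges = list(chunk_ranges(size, chunk))
    if not ranges:
        return 0
    pending = pool.submit(warm, in_fd, *ranges[0])
    total = 0
    for i, (offset, length) in enumerate(ranges):
        pending.result()  # this chunk is warm now (sketch: blocks here)
        if i + 1 < len(ranges):
            # Start warming the next chunk while this one is being sent.
            pending = pool.submit(warm, in_fd, *ranges[i + 1])
        sent = 0
        while sent < length:
            n = os.sendfile(out_fd, in_fd, offset + sent, length - sent)
            if n == 0:  # file shrank underneath us
                return total
            sent += n
            total += n
    return total
```

This bounds the extra latency to one chunk's worth of disk read and never asks the page cache to hold more than about two chunks per transfer.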

From: brad
2006-06-07 10:12 pm (UTC)
The cache of small/hot files that don't get the extra readahead latency is the approach I've been going with. So only new unknown/rare/big files get the paranoid treatment.
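
A rough Python sketch of what such a hot-file cache could look like. This is not Perlbal's implementation. The class name, the size limit, the TTL and the entry cap are all invented for illustration. It only guesses that a file is still in the page cache; it never asks the kernel.

```python
import time
from collections import OrderedDict


class HotFileCache:
    """Remember small files served recently; guess those are still cached."""

    def __init__(self, max_entries=1000, max_file_size=64 * 1024,
                 ttl=30.0, clock=time.monotonic):
        self._seen = OrderedDict()  # path -> time it was last served
        self.max_entries = max_entries
        self.max_file_size = max_file_size  # hypothetical "small" cutoff
        self.ttl = ttl                      # hypothetical freshness window
        self.clock = clock

    def note_served(self, path, size):
        """Record that a file was just served (big files are never tracked)."""
        if size > self.max_file_size:
            return
        self._seen[path] = self.clock()
        self._seen.move_to_end(path)
        while len(self._seen) > self.max_entries:
            self._seen.popitem(last=False)  # drop the least recently served

    def probably_cached(self, path, size):
        """True if the file is small and was served within the last `ttl` seconds."""
        if size > self.max_file_size:
            return False
        last = self._seen.get(path)
        return last is not None and self.clock() - last <= self.ttl
```

A server would call probably_cached() before each response: on True it goes straight to sendfile, on False it takes the async readahead path first, and in both cases it calls note_served() afterwards.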