Badger Project - IO priorities - brad's life (LiveJournal)
Brad Fitzpatrick

Website: bradfitz.com

Badger Project - IO priorities [Apr. 16th, 2004|05:01 am]

My favorite talk yesterday was from Christoffer Hall-Frederiksen of the Badger Project.

The basic gist of his talk was that a modern database sits on top of too many abstraction layers, and those layers don't communicate with each other well enough, which costs performance.

Where the stack used to be closer to DB : Disk, we now have DB : OS : Controller : Disk, and each layer does its own caching, scheduling, etc., often accidentally hurting another layer.

So he started off by showing how Linux and a single disk react (both latency and throughput) to sequential and random IO with anywhere from 0 to 4k outstanding requests, using Linux async IO.
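
A minimal Python sketch of that kind of queue-depth experiment follows. It is a stand-in under stated assumptions, not his benchmark: worker threads doing blocking os.pread() calls take the place of Linux async IO, the 4 KB request size is assumed, and run_benchmark is a hypothetical helper name. The thread pool size is what caps the number of outstanding requests.

```python
import os
import random
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor

BLOCK = 4096  # request size in bytes (an assumption, not from the talk)


def run_benchmark(path, depth, n_requests, sequential=True, seed=0):
    """Read n_requests blocks of BLOCK bytes from `path`, keeping at most
    `depth` reads outstanding at once (depth and n_requests must be >= 1).

    Returns (requests per second, mean per-request latency in seconds).
    Worker threads doing blocking os.pread() stand in for Linux async IO.
    """
    n_blocks = os.path.getsize(path) // BLOCK
    if n_blocks == 0:
        raise ValueError("file is smaller than one block")
    if sequential:
        offsets = [(i % n_blocks) * BLOCK for i in range(n_requests)]
    else:
        rng = random.Random(seed)
        offsets = [rng.randrange(n_blocks) * BLOCK for _ in range(n_requests)]

    fd = os.open(path, os.O_RDONLY)
    try:
        def timed_read(offset):
            start = time.perf_counter()
            os.pread(fd, BLOCK, offset)  # blocks while the request is in flight
            return time.perf_counter() - start

        t0 = time.perf_counter()
        # The pool size caps the number of reads in flight: the "queue depth".
        with ThreadPoolExecutor(max_workers=depth) as pool:
            latencies = list(pool.map(timed_read, offsets))
        elapsed = time.perf_counter() - t0
    finally:
        os.close(fd)

    return n_requests / elapsed, sum(latencies) / len(latencies)


if __name__ == "__main__":
    # Demo on a small temp file. It will sit in the page cache, so these
    # numbers say nothing about the disk; point `path` at a large file on
    # the device under test (and drop caches) for a real measurement.
    with tempfile.NamedTemporaryFile() as f:
        f.write(os.urandom(BLOCK * 1024))
        f.flush()
        for depth in (1, 4, 16, 64):
            for seq in (True, False):
                tput, lat = run_benchmark(f.name, depth, 2000, sequential=seq)
                kind = "seq" if seq else "rand"
                print(f"{kind:4} depth={depth:3}  {tput:10.0f} req/s  "
                      f"{lat * 1e6:8.1f} us/req")
```

Threads are used only because Python's standard library has no async IO submission interface; the shape of the experiment (fixed request count, varying depth, sequential vs. random offsets) is the part that carries over.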

The results, as you might imagine, are:

Sequential: throughput stays constant regardless of the number of outstanding requests, but latency increases with more requests.

Random: throughput increases with more outstanding requests, but latency increases as well (and by quite a bit more).

So there's a sweet spot where you get better throughput without unacceptable latency. The key is keeping enough requests outstanding that the operating system's block scheduler (which has a halfway accurate view of the disk geometry) can make smart batching decisions.
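
Why more outstanding random requests buy throughput can be shown with a toy Python model. The assumptions are loud ones: seek cost is linear in track distance, rotational latency is ignored, each batch of `depth` requests arrives at once and is reordered as a whole, and the LOOK-style elevator ordering is a generic textbook policy, not the actual Linux scheduler code. All function names are hypothetical.

```python
import random


def travel(head, order):
    """Total head movement (in tracks) to serve requests in the given order."""
    total = 0
    for track in order:
        total += abs(track - head)
        head = track
    return total


def elevator_order(head, requests):
    """One elevator (LOOK-style) sweep: serve everything at or above the
    head in ascending order, then sweep back down for the rest."""
    up = sorted(t for t in requests if t >= head)
    down = sorted((t for t in requests if t < head), reverse=True)
    return up + down


def mean_travel_per_request(depth, n_tracks=10_000, trials=500, seed=1):
    """Average seek distance per request, FIFO vs. elevator, when `depth`
    random requests are outstanding at once."""
    rng = random.Random(seed)
    fifo_total = elevator_total = 0
    for _ in range(trials):
        head = rng.randrange(n_tracks)
        batch = [rng.randrange(n_tracks) for _ in range(depth)]
        fifo_total += travel(head, batch)
        elevator_total += travel(head, elevator_order(head, batch))
    n = trials * depth
    return fifo_total / n, elevator_total / n


if __name__ == "__main__":
    for depth in (1, 2, 8, 32, 128, 512):
        fifo, elev = mean_travel_per_request(depth)
        print(f"depth={depth:4}  FIFO {fifo:7.0f}  elevator {elev:7.0f} tracks/request")
```

With one outstanding request there is nothing to reorder, so both policies travel exactly the same distance. As depth grows, per-request travel under the elevator ordering shrinks (less seeking, more throughput), while any individual request may wait behind more of the sweep, which is the latency cost. Sequential requests are already adjacent, so reordering them gains nothing, which fits the flat sequential throughput.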

Next he compared MySQL (InnoDB) against Oracle and showed that InnoDB wasn't aggressive enough at submitting IO to keep the operating system and the disk busy: often the disk was sitting idle.

So how do you improve InnoDB? InnoDB writes in two places:

-- synchronous requests: log writes (sequential) on transaction commit
-- asynchronous requests: page writeback

But if writeback were more aggressive, it would increase latency on the synchronous log writes, since they'd be queued behind more writeback IO.

In conclusion, he's working to bring back IO priorities, which existed back in the mainframe days (and are in the POSIX async IO interfaces, as the aio_reqprio field of struct aiocb) but go unused in Linux. So he's modifying Linux 2.6's deadline scheduler to give the different async IO priorities different deadlines, then making InnoDB write more aggressively while having the async page writeback threads submit low-priority IO.
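
A toy Python model of that idea, to show the mechanism rather than his patch. Everything here is a simplifying assumption: one queue instead of the Linux deadline scheduler's separate read/write FIFOs and batching, two made-up priority levels with made-up 10 ms / 1000 ms expiries, and a flat 5 ms service time. The class and function names are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical expiry times per priority, in ms (not numbers from the talk).
EXPIRY_MS = {"high": 10, "low": 1000}


@dataclass
class Request:
    deadline: int
    sector: int
    name: str


class PriorityDeadlineScheduler:
    """Toy deadline-style scheduler: serve requests in ascending sector order
    from the current head position (wrapping around), except that a request
    whose deadline has passed is served immediately. Each priority level
    gets its own expiry time."""

    def __init__(self):
        self.pending = []
        self.head = 0

    def submit(self, now, sector, priority, name):
        self.pending.append(Request(now + EXPIRY_MS[priority], sector, name))

    def dispatch(self, now):
        if not self.pending:
            return None
        earliest = min(self.pending, key=lambda r: r.deadline)
        if earliest.deadline <= now:
            chosen = earliest  # expired: serve it regardless of position
        else:
            ahead = [r for r in self.pending if r.sector >= self.head]
            chosen = min(ahead or self.pending, key=lambda r: r.sector)
        self.pending.remove(chosen)
        self.head = chosen.sector
        return chosen


def simulate(requests, service_ms=5):
    """Submit (sector, priority, name) requests at t=0, dispatch one every
    service_ms, and return {name: dispatch time in ms}."""
    sched = PriorityDeadlineScheduler()
    for sector, priority, name in requests:
        sched.submit(0, sector, priority, name)
    now, started = 0, {}
    while (req := sched.dispatch(now)) is not None:
        started[req.name] = now
        now += service_ms
    return started


if __name__ == "__main__":
    # 50 low-priority page writebacks, plus one log write far out on the disk.
    writeback = [(100 * (i + 1), "low", f"page{i}") for i in range(50)]
    with_prio = simulate(writeback + [(9_999, "high", "log")])
    no_prio = simulate(writeback + [(9_999, "low", "log")])
    print("log write starts at", with_prio["log"], "ms with priorities,",
          no_prio["log"], "ms without")
```

In this run the log write starts at 10 ms when it gets the short high-priority deadline, versus 250 ms when it shares the writeback's deadline and has to wait for the whole sector-sorted sweep. The writeback still gets done in sector order around it, which is the point: InnoDB can push writeback harder without commits paying for it.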

Fucking cool, eh?

From: gregw
2004-04-16 10:07 am (UTC)
Chalk up another one for open source. Instead of a third-party company pleading with a monolithic OS company to tweak their OS so your app runs better (which takes away from all the time they spend tweaking their OS so their competing apps run better), you can just modify the OS kernel yourself to make your own app better.
From: jeffr
2004-04-16 12:19 pm (UTC)
I wrote two IO priority subsystems for FreeBSD, one commercial and closed-source, the other open-source. They weren't targeted at database systems, but databases are a possible consumer.

The only problem with IO prioritization is dealing with priority inversion and potential deadlocks. That turns out to be the really hard problem, unless you ignore it, which probably works for most applications.
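
To make the inversion concrete, a toy Python model (all names and priority levels hypothetical): a high-priority commit is blocked until a low-priority page flush completes, but the flush sits behind medium-priority reads in a strict-priority IO queue. The mitigation shown is priority inheritance, a standard technique borrowed from CPU scheduling; the comment doesn't say what either FreeBSD subsystem actually did, and this sketch does nothing about the deadlock half of the problem.

```python
import heapq
import itertools


def dispatch_order(requests, inherited=None):
    """Strict-priority IO dispatch: lower number = more urgent, FIFO within
    a level. `inherited` maps a request name to the priority of a task that
    is blocked waiting on it (priority inheritance); a request runs at the
    more urgent of its own and its inherited priority."""
    inherited = inherited or {}
    seq = itertools.count()  # tie-breaker keeps FIFO order within a level
    heap = []
    for prio, name in requests:
        effective = min(prio, inherited.get(name, prio))
        heapq.heappush(heap, (effective, next(seq), name))
    return [heapq.heappop(heap)[2] for _ in range(len(heap))]


if __name__ == "__main__":
    # A high-priority (0) commit is blocked until a low-priority (2) page
    # flush finishes, while 20 medium-priority (1) reads are queued.
    queue = [(2, "page_flush")] + [(1, f"read{i}") for i in range(20)]
    plain = dispatch_order(queue)
    boosted = dispatch_order(queue, inherited={"page_flush": 0})
    print("flush position without inheritance:", plain.index("page_flush"))
    print("flush position with inheritance:   ", boosted.index("page_flush"))
```

Without inheritance the flush, and therefore the commit waiting on it, comes last; with a steady stream of medium-priority reads instead of a fixed 20 it could be starved indefinitely. Boosting the flush to its waiter's priority moves it to the front.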