brad's life - diskchecker.pl
Brad Fitzpatrick


diskchecker.pl [May. 9th, 2005|08:49 pm]

Dear Slashdot comment flood: I didn't write the summary in the Slashdot story. The submitter did. I know the disks themselves don't handle an fsync().

I know fsync() only tells the operating system to flush. This script tests whether fsync works end-to-end. A database in userspace can only fsync... it can't send special IDE or SCSI "flush your buffers" commands to the disks. So that's what I care about and what I want to test: that a database can be Durable.
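To make the "a database in userspace can only fsync" point concrete, here is a minimal sketch in Python of the only durability step an application really has. The helper name `durable_write` is made up for illustration and is not part of diskchecker.pl. Whether the bytes actually reach the platters after `os.fsync` returns is exactly what the rest of this post is about testing.

```python
import os


def durable_write(path: str, data: bytes) -> None:
    """Write data to path, then ask the OS to flush it to stable storage.

    Hypothetical helper for illustration. fsync is the only request a
    userspace program can make; it cannot talk to the disk cache directly.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:
            # os.write may write fewer bytes than requested, so loop.
            written = os.write(fd, view)
            view = view[written:]
        # Returns once the OS claims the data is on stable storage.
        os.fsync(fd)
    finally:
        os.close(fd)
```

If any layer below (filesystem, driver, RAID card, disk) acknowledges the flush without doing it, this function still returns normally, which is why an end-to-end test is needed.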

The problem is manufacturers shipping SCSI disks with a dangerous option (hardware write-caching) enabled by default. It makes sense for consumer ATA stuff, but for SCSI disks that already have reliable TCQ, there's much less point. And any respectable RAID card should just disable write-back caching on the disks if the RAID card has its own NVRAM-backed cache, but LSI cards don't anymore (they used to).

And I'm glad Linux is finally starting to tell the disks to flush on an fsync. But months ago I was stuck with databases that couldn't survive power outages and needed a way to test whether everything from the filesystem to the block device driver to the disks themselves was doing what the database expected when it did an fsync. That's no component's job alone, so I needed to test everything together.


Remember my disk-checker program I wrote about before? I'd never released it because it was too hard to use, but now it's dead simple, so here it is:

http://code.sixapart.com/svn/tools/trunk/diskchecker.pl Edit: now https://gist.github.com/3172656

Run it and be amazed how much your disks/RAID/OS lie. ("lie" = an fsync doesn't work)

It seems everything from PATA consumer disks to high-end server-class SCSI disks lie like crazy. Yes, that includes SATA there in the middle. I'll discuss fixing your storage components in a second.

In a nutshell, run it like this:

Tester machine (machine that won't crash):
$ diskchecker.pl -l

And then just let it chill. (The -l is for listen.) This program will listen (on port 5400 if no number follows -l) and will write one tiny file per host to /tmp/. It can be run as any user.

Machine being tested (machine you're going to pull the power cable on)
$ diskchecker.pl -s TESTERMACHINE create test_file 500

That creates a 500 MB file named "test_file" and reports everything it is about to do, and everything it has done, to TESTERMACHINE (which can be an IP or hostname).

Now, pull the power cable on the machine being tested. Don't turn it off nicely. Don't just control-C the program. Wait a couple seconds and plug your testee machine back in and reboot it. When it's back up, do:

$ diskchecker.pl -s TESTERMACHINE verify test_file

If the server process is still running, the machine you just killed will connect to the server and fetch the information about what's supposed to be where. The client will then verify the file against that and produce an error report.
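The create/verify idea can be sketched roughly like this in Python. This is not the actual diskchecker.pl code or its wire protocol: the block size, the CRC32 checksum, and the `report` callback (standing in for "tell the tester machine over the network") are all assumptions made for illustration. The one thing that matters is the ordering: a block is only reported after fsync has returned, so every reported block is one the storage stack promised was durable.

```python
import os
import zlib

BLOCK_SIZE = 4096  # assumed block size, chosen only for this sketch


def write_block_durably(fd, block_index, payload, report):
    """Write one block, fsync, and only then report it as durable.

    report(block_index, checksum) is a hypothetical stand-in for sending
    a message to the tester machine.
    """
    if len(payload) != BLOCK_SIZE:
        raise ValueError("payload must be exactly one block")
    offset = block_index * BLOCK_SIZE
    view = memoryview(payload)
    while view:
        written = os.pwrite(fd, view, offset)
        view = view[written:]
        offset += written
    os.fsync(fd)  # the storage stack now claims the block is safe
    report(block_index, zlib.crc32(payload))


def verify_blocks(fd, reported):
    """Return the sorted indices of reported blocks that do not match on disk.

    reported maps block index -> checksum, as remembered by the tester.
    """
    bad = []
    for block_index, checksum in reported.items():
        data = os.pread(fd, BLOCK_SIZE, block_index * BLOCK_SIZE)
        if zlib.crc32(data) != checksum:
            bad.append(block_index)
    return sorted(bad)
```

After a power pull, any index returned by `verify_blocks` is a write that was acknowledged as flushed and then lost, which is the "lie" being measured.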

What you should see is:

Total errors: 0.

But you probably won't. You'll probably see an error count and a histogram of errors per seconds-before-crash. Most RAID cards lie (especially LSI ones), some OSes lie (rare), and most disks lie (doesn't matter how expensive or cheap they are). They lie because their competitors do and they figure it's more important to look competitive because the magazines only print speed numbers, not reliability stats. They must figure people who care about their disks working know how to test/fix their disks.
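For a feel of what "a histogram of errors per seconds-before-crash" means, here is a small Python sketch. It is an illustration, not the script's real reporting code: it assumes the tester recorded a timestamp for each reported block and treats the last report it received as the approximate crash time.

```python
from collections import Counter


def errors_by_seconds_before_crash(report_times, bad_blocks):
    """Bucket failed blocks by how long before the crash they were reported.

    report_times: block index -> time in seconds when the block was
                  reported durable (assumed to be kept by the tester).
    bad_blocks:   indices that failed verification after the reboot.
    The last report seen is used as an approximation of the crash time.
    """
    if not report_times:
        return {}
    crash_time = max(report_times.values())
    histogram = Counter()
    for block_index in bad_blocks:
        seconds_before = int(crash_time - report_times[block_index])
        histogram[seconds_before] += 1
    return dict(sorted(histogram.items()))
```

Errors clustered in the last second or two before the crash are what you would expect from a volatile write cache that was acknowledging writes it had not yet committed.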

Ways to maybe fix your disk:

hdparm -W 0 /dev/hda -- worked on a crap office PATA disk (and it failed otherwise)
scsirastools -- need this on lots of SCSI disks. You'll probably have to remove your SCSI disks from your RAID card and fix the disks directly, since RAID cards very often won't disable it for you
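To check what the kernel thinks the cache setting is before and after a fix like the ones above, something along these lines may help. It assumes a reasonably recent Linux kernel that exposes a `queue/write_cache` attribute under `/sys/block/<device>/`, which did not exist when this post was written; the function name is made up. Treat the result as the kernel's view only, since the whole point of the power-pull test is that a layer may not be honest.

```python
from pathlib import Path
from typing import Optional


def write_cache_mode(device: str, sysfs_root: str = "/sys/block") -> Optional[str]:
    """Return the kernel's reported cache mode for a block device.

    On kernels that have the attribute, the value is expected to be
    "write back" or "write through". Returns None if the attribute is
    missing or unreadable (older kernel, wrong device name, non-Linux).
    """
    attr = Path(sysfs_root) / device / "queue" / "write_cache"
    try:
        return attr.read_text().strip()
    except OSError:
        return None
```

For example, `write_cache_mode("sda")` reporting "write back" suggests the volatile cache is still enabled and the disk is a candidate for failing the power-pull test.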

Anybody have anything else to add?

Enjoy.

Comments:
From: valiskeogh
2005-05-10 05:35 am (UTC)

meh, in that case, not worth the trouble, as interesting as the results might be.

this comp has a raid 0 I put in for speed

one comp in back has raid 1 for data security

another server in back has a 500 gig raid 5 array for storage.

would be nice to know the results of the test
From: brad
2005-05-10 05:42 am (UTC)

Is your desktop so IO-bound that you see any noticeable improvement from a striped disk array? I couldn't stand the noise. I don't even have a single disk in my desktop machine.

And if downloading ActivePerl is so much trouble, how important is your data to you? But in fairness, it probably doesn't matter to a fileserver... fsync working is mostly important for databases, mail servers, news servers, etc.
From: valiskeogh
2005-05-10 05:50 am (UTC)

YES actually, becoming increasingly annoying, the slowdown these days is in disk I/O not cpu, i put the raid 0 in expressly for quickpar and newsgroups, when you routinely download files ~a gig in size, unrar them, check them for consistency, it actually does help quite a bit, especially when doing multiples at the same time.

the comp itself is p4 northwood 3.2 gigahertz (haven't o'clocked yet, probably soon), so disk is really my only real limitation when doing that type of stuff.

i can probably dl activeperl on a secondary machine, i try to keep this one as lean as possible.

oh and on noise, hehehe, the previous incarnation of this comp didn't have a case at ALL, all parts laid out on the hutch of my desk, it wasn't until this latest upgrade that i actually put a case on it. even with the 12 total fans i have in this thing it's still almost SILENT compared to a single "not so great" cpu fan totally exposed. 5 of the fans i've routed to an independent power supply, and can vary their speed as i see fit, it's actually quite silent!
From: brad
2005-05-10 05:54 am (UTC)

even with the 12 total fans i have in this thing it's still almost SILENT

We have different definitions of silent, or you've been to one too many rock concerts.
From: valiskeogh
2005-05-10 05:56 am (UTC)

i have a 4 year old (single dad with full custody), there are times during which the dull roar of an approaching tornado could be called "silent"...
;)
From: loganb
2005-05-10 06:40 pm (UTC)

What did you end up using to do your disk-less client? NFS? NBD? iSCSI? NFS held the most hope for me, but I gave up when NFS-locking would cause a kernel panic on boot and no locking would screw up other things.