Brad Fitzpatrick


Drive tester [Feb. 17th, 2005|03:47 pm]
Brad Fitzpatrick
After the big Internap power failure recently, we no longer trust any storage product to work as advertised.

I wrote a program to test disks/RAIDs and Matthew's been running it, finding out that, indeed, disks and RAIDs lie.

The test works like this:

-- Matthew goes to storage vendor with his laptop and crossover-connects his laptop and the server to be tested.

-- Matthew runs the server half of my disk tester on his laptop.

-- Matthew then runs the spewer client on a raw(8)-ified disk partition on the server being tested. The client picks a random 16kB-aligned offset on the partition and a random 32-bit number, which it writes in hex (%08x) over a 16kB range. It reports to the spewserver both BEFORE and AFTER each disk write.

-- The spewserver notes what the client said it was about to do and what it reported doing.

-- Let it run for a while....

-- Pull the power...

-- The spewserver notices the client hasn't sent anything in 3 seconds and quits, writing out a map of what 32-bit number pattern should be at each sector.

-- Power the tested server back on.

-- Copy the map file from the laptop (spewserver) to the server and run spewclient in verify mode. It dumps a histogram of errors per seconds-before-powerloss:

Histogram of seconds before end:
3 31
4 7
5 1
65 1

Well, the histogram starts at 3 seconds because the "end" is considered to be the time AFTER the 3-second timeout, so that's kinda a bug. That should read 0, 1, 2 seconds before, not 3, 4, 5. But see how there are 31 regions that are bogus at t=0, 7 at t=-1, and 1 at t=-2?
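The off-by-timeout fix described above can be sketched like this. It's a minimal illustration in Python, not the actual tester's code: the function name, its arguments, and the idea of passing plain timestamps are all assumptions made for the example. The only thing taken from the post is that "end" currently includes the 3-second timeout and shouldn't.

```python
from collections import Counter

TIMEOUT_SECONDS = 3  # the spewserver's silence timeout from the post


def error_histogram(bad_write_times, end_time, timeout=TIMEOUT_SECONDS):
    """Bucket bogus regions by whole seconds before the client's last report.

    bad_write_times: timestamps (seconds) of reported writes that later
                     failed verification (hypothetical input format).
    end_time:        when the spewserver gave up, i.e. last report + timeout.
    """
    # Subtract the timeout so bucket 0 means "the final second of activity".
    last_report = end_time - timeout
    hist = Counter(int(last_report - t) for t in bad_write_times)
    return dict(sorted(hist.items()))


# The histogram from the post, re-bucketed: 3,4,5,65 become 0,1,2,62.
times = [97] * 31 + [96] * 7 + [95] + [35]
print(error_histogram(times, end_time=100))
```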

That means something was lying, and we don't buy that hardware until we get it configured so it doesn't lie.
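For anyone who wants to picture the write and verify steps, here is a rough sketch in Python. It is not the real spewer: every function name is invented for illustration, the "disk" is just any byte buffer, and it assumes the 8 hex characters are simply repeated to fill the 16kB range. It only checks writes the client reported AFTER completing, since those are the ones the storage claimed were done.

```python
import random

BLOCK_SIZE = 16 * 1024  # 16kB regions, as in the post


def make_block(value):
    """Fill one 16kB block with a 32-bit value formatted as %08x."""
    pattern = b"%08x" % value  # 8 ASCII hex digits
    return pattern * (BLOCK_SIZE // len(pattern))


def pick_write(partition_size, rng=random):
    """Pick a random 16kB-aligned offset and a random 32-bit number."""
    offset = rng.randrange(partition_size // BLOCK_SIZE) * BLOCK_SIZE
    value = rng.getrandbits(32)
    return offset, value


def verify(disk, confirmed):
    """Return offsets whose contents differ from what was reported written.

    disk:      bytes-like view of the partition (hypothetical stand-in).
    confirmed: {offset: value} for writes reported as finished.
    """
    return [
        offset
        for offset, value in sorted(confirmed.items())
        if bytes(disk[offset:offset + BLOCK_SIZE]) != make_block(value)
    ]


# Simulate a device that acknowledged two writes but only kept one.
disk = bytearray(4 * BLOCK_SIZE)
disk[0:BLOCK_SIZE] = make_block(0xDEADBEEF)
print(verify(disk, {0: 0xDEADBEEF, BLOCK_SIZE: 0x12345678}))
```

Any offset that `verify` returns is a region where the hardware said "done" and the data isn't there after the power came back.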

From: brentdax
2005-02-17 11:59 pm (UTC)
From what I understand:
  1. Connect hard drive/storage system/blah to computer A.
  2. Connect computer A to computer B.
  3. Make computer B tell computer A what to write where on the drive/system/blah.
  4. At some point, pull the plug on computer A.
  5. Compare what should have been written (which is recorded on computer B) to what actually was written (on computer A).
  6. ...
  7. Profit!
From: brad
2005-02-18 12:01 am (UTC)
6 = angsty teens put bad poetry on disks
From: ericjay
2005-02-18 12:35 am (UTC)
Wow, funniest comment reply all day.
From: evan
2005-02-18 01:31 am (UTC)
From: alpha
2005-02-18 01:48 am (UTC)
Ahahaha. Yeah it is.