Will this script work on Windows (with ActivePerl), or will it fall over on me?
Or is the only way to find out to test it?
From: brad 2005-05-10 05:20 am (UTC)
Should work. I use nothing fancy. All pure, base Perl.
There's far too little emphasis on reliability in benchmarks and other comparisons.
Is this Linux only, or can it be run in a Windows environment with a perl shell running in Windows?
From: brad 2005-05-10 05:20 am (UTC)
Re: Linux only?
Should work on Windows unmodified.
um, yes, an executable or com would be nice for us non linux freaks :)
It's perl. You can't expect many perl developers who give stuff away for free to build a Windows executable for you. Try ActivePerl. It's free, but they probably send you a bunch of crap if you give them a real e-mail address and don't unsubscribe.
Maybe send a link to this off to the folks that review storage for us normal folk? It would be interesting to see Cnet or PC Mag include information like this along with the speed and volume specs.
From: brad 2005-05-10 05:39 am (UTC)
Go for it.
Isn't this... caching? By pulling the plug, you kill whatever data is pending write in the drive's volatile RAM cache, and that's what they have UPSes and stupidly expensive battery-backup-capable RAID controllers for.
From: brad 2005-05-10 05:58 am (UTC)
Re-read my post. I'm testing whether the entire storage stack respects the fsync() system call. All the way from the OS to the drivers to the raid array to the disks themselves.
The fsync() system call says: "Stop caching, it's very important that everything I've given you now must be on disk, and don't return to me with an answer until it has."
So this program tests that your fsync() works as advertised and that no part of the storage stack is faking the fsync. (It's usually the disks themselves, against spec and unknown to the operating system, which thinks the disks are behaving.)
Otherwise caching's just fine and it's done all over. My complaint is when it's done when you tell it not to.
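To make the fsync() contract concrete, here is a minimal sketch in Python (not the Perl that diskchecker.pl uses; the function name and the demo file name are made up for illustration). It appends a record and returns only after os.fsync() has returned, which is the point at which the storage stack claims the data is on disk. Whether that claim is true is exactly what a pull-the-plug test checks.

```python
import os
import tempfile

def write_durably(path, data):
    """Append data to path and return only after fsync() has returned."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        view = memoryview(data)
        while view:
            # os.write may write fewer bytes than asked; loop until done.
            written = os.write(fd, view)
            view = view[written:]
        # Ask the OS to push everything for this file down to the device.
        os.fsync(fd)
    finally:
        os.close(fd)
    return len(data)

if __name__ == "__main__":
    demo = os.path.join(tempfile.gettempdir(), "fsync_demo.dat")
    n = write_durably(demo, b"record 1\n")
    print("fsync returned after", n, "bytes")
```

If a layer below the OS acknowledges the flush while the data is still in volatile cache, this function returns normally and the application cannot tell the difference.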
From: ydna 2005-05-10 07:34 am (UTC)
Sweet, Brad. Perfect timing. I've got an ATA-over-Ethernet evaluation unit from Coraid on its way. I'm planning several tests on this equipment for a demo and presentation up here in June (background: see my rant and LUG discussion). It'll be fun to add this test to the mix (and pull the plug at different places in the connection to see which part does most of the lying). Thankee much.
From: brad 2005-05-10 07:52 am (UTC)
I still don't really get AoE. Who's it supposed to be for?
BTW, you mentioned in one of those posts capturing ethernet frames in userspace to make an AoE server. Look at "tap". In the kernel source, read:
./Documentation/networking/tuntap.txt
And docs for CONFIG_FILTER=y and CONFIG_PACKET=y. Between the three, you could start to write an AoE server.
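As a rough starting point for the userspace side, here is a Python sketch of picking AoE frames out of raw Ethernet traffic. It assumes a Linux packet socket (what CONFIG_PACKET provides) and root privileges for the socket part; the function names are hypothetical, and it stops at the Ethernet header without parsing any AoE fields.

```python
import socket
import struct

AOE_ETHERTYPE = 0x88A2  # EtherType registered for ATA over Ethernet

def parse_ethernet_header(frame):
    """Split a raw Ethernet frame into (dst, src, ethertype, payload)."""
    if len(frame) < 14:
        raise ValueError("frame shorter than an Ethernet header")
    dst, src, ethertype = struct.unpack("!6s6sH", frame[:14])
    return dst, src, ethertype, frame[14:]

def is_aoe(frame):
    """True if the frame's EtherType marks it as ATA over Ethernet."""
    return parse_ethernet_header(frame)[2] == AOE_ETHERTYPE

def open_aoe_socket(ifname):
    """Open a packet socket for AoE frames on ifname (Linux only, needs root)."""
    sock = socket.socket(socket.AF_PACKET, socket.SOCK_RAW,
                         socket.htons(AOE_ETHERTYPE))
    sock.bind((ifname, 0))
    return sock
```

A server loop would call recv() on that socket, pass each frame through parse_ethernet_header(), and hand the payload to whatever decodes the AoE command.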
Some Windows-related details of disk caching that I've found out about (and may be useful to those of you using Windows).
Under Windows NT (that's 2k, XP, 2k3 as well - remember those are NT 5.x), you can disable disk caching by going to the properties for the hard disk and then going to the Settings/Properties/Policies tab (it is named differently under different versions but does the same thing). By default write cache is enabled except on disks containing the Active Directory. It is also disabled by default for removable disks (USB, memory cards (but not all!), iPods, etc.) and apparently anything it thinks is SCSI (this includes some IDE/SATA controllers). I think Windows tells the disk itself to explicitly disable the disk cache as well.
When using CreateFile(), you can set the FILE_FLAG_WRITE_THROUGH flag, which tells Windows to write data through to the disk instead of leaving it in the cache for a lazy flush, or you can set FILE_FLAG_NO_BUFFERING, which bypasses all Windows caches but has some restrictions (you have to be careful with memory alignment, and you must read and write in multiples of the sector size). These are set in the dwFlagsAndAttributes parameter.
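The size restriction is mostly arithmetic. Here is a small Python sketch of it; the helper names are made up, and the 512-byte sector size is only an assumption (real code should ask the volume for its sector size). It covers offsets and lengths only, not the memory-address alignment of the buffer.

```python
def align_up(n, sector_size=512):
    """Round n up to the next multiple of sector_size."""
    return (n + sector_size - 1) // sector_size * sector_size

def is_aligned(offset, length, sector_size=512):
    """True if both the file offset and the transfer length are sector multiples."""
    return offset % sector_size == 0 and length % sector_size == 0

def pad_to_sector(data, sector_size=512):
    """Zero-pad data so its length is a whole number of sectors."""
    return data + b"\0" * (align_up(len(data), sector_size) - len(data))

if __name__ == "__main__":
    record = pad_to_sector(b"hello")
    print(len(record), is_aligned(0, len(record)))
```

An unbuffered writer would pad each record like this and only ever seek to offsets for which is_aligned() holds.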
5.8.1's Getopt::Long ignores this, but 5.6.1's chokes on it.
--- diskchecker.pl?rev=HEAD 2005-05-09 20:28:18.000000000 -0400
+++ diskchecker.pl 2005-05-10 10:23:39.647304453 -0400
@@ -22,3 +22,3 @@
usage() unless GetOptions('server=s' => \$server,
- 'listen:5400' => \$listen);
+ 'listen:i' => \$listen);
usage() unless $server || $listen;
Hah, no, that's not enough! You and your little ands and ors.
--- diskchecker.pl?rev=HEAD 2005-05-09 20:28:18.000000000 -0400
+++ diskchecker.pl 2005-05-10 10:33:01.437798948 -0400
@@ -21,10 +21,11 @@
my $listen;
usage() unless GetOptions('server=s' => \$server,
- 'listen:5400' => \$listen);
-usage() unless $server || $listen;
-usage() if $server && $listen;
+ 'listen:i' => \$listen);
+usage() unless $server || defined $listen;
+usage() if $server && defined $listen;
# LISTEN MODE:
-listen_mode($listen) if $listen;
+my $port = $listen ? $listen : 5400;
+listen_mode($port) if defined $listen;
# CLIENT MODE:
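For anyone who finds the option logic easier to follow outside of Getopt::Long, here is the same idea sketched in Python's argparse (an illustration only, not part of diskchecker.pl; the program name is made up). Exactly one of --server or --listen must be given, and --listen takes an optional port that falls back to 5400.

```python
import argparse

DEFAULT_PORT = 5400  # same default port the patch above falls back to

def parse_args(argv):
    """Parse options: exactly one of --server HOST or --listen [PORT]."""
    parser = argparse.ArgumentParser(prog="diskchecker-sketch")
    mode = parser.add_mutually_exclusive_group(required=True)
    mode.add_argument("--server", metavar="HOST")
    # nargs="?" makes the value optional; const is used when it is omitted.
    mode.add_argument("--listen", nargs="?", type=int,
                      const=DEFAULT_PORT, default=None, metavar="PORT")
    return parser.parse_args(argv)

if __name__ == "__main__":
    print(parse_args(["--listen"]).listen)
    print(parse_args(["--listen", "6000"]).listen)
```

One difference from the Perl patch: there, a bare --listen yields 0 and the script maps that to 5400 afterwards, whereas here the default is filled in by the parser, so "listen is None" plays the role of "not defined".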
I sent you via e-mail a 1.5 meg PDF file regarding hard drives that you might be interested in. Or maybe not. But this post made me think you might.
Do you mind if I mention this tool elsewhere?
does it work on windows????? :P
From: pne 2005-05-11 04:16 pm (UTC)
It should, as several previous comments mentioned. You'd need a Perl interpreter, of course, such as ActivePerl, IndigoPerl, or Cygwin's perl.
so the scsirastools are used to disable write caching on the drives themselves? Going to run this later today with a freshly minted dell running freebsd 5.4, I wonder if there are any similar tools for it.
Doesn't this hdparm -W option cause severe performance loss?
From: (Anonymous) 2005-05-13 08:04 am (UTC)
We need a "name and shame" list or wiki
This is very interesting, but we can only check the disks that we have physical access to.
What we need is a public "name and shame" list of disk models, so folks who *require* reliable data integrity can make informed choices.
Anyone interested in setting up an online database or Wiki?
Matthew
From: (Anonymous) 2005-05-13 03:20 pm (UTC)
Re: We need a "name and shame" list or wiki
Matthew,
I'll talk to my boss and see if he'll allow me to create a small space on our webserver for a "name and shame" type thing that you're talking about (if no one else has done it, that is). It might not be a Wiki (just a static webpage at first) but it might be a good excuse for me to work on getting the company Wiki up and running 80).
S. Garcia SLM Industries ~NOSPAM~ steven &DOT& garcia %AT% slmindustries ~DOT~ com ^SPAMISEVIL^
From: (Anonymous) 2005-05-13 08:15 am (UTC)
Nothing strange here, it's your assumptions that are wrong
It is strange that after finding out that fsync() never functions the way you expect, you didn't doubt your assumptions or your code! Your assumption is wrong! fsync() is not required to flush the hard-disk cache. Its function is just to flush all buffers inside the OS and the device driver to the hard disk, and it does that perfectly. As you see, these are two different things: flushing to the device and flushing to the disk. fsync() flushes to the device but you expect it to flush to the disk (which it does not). A flush to the disk can usually only be done in a system-level program (like a device driver) and not a user program.
From: brad 2005-05-13 02:24 pm (UTC)
Re: Nothing strange here, it's your assumptions that are wrong
Yes, but that's all a userspace program like a database can do. And while I know that fsync() says it can't promise it makes it to disk if write-caching is enabled, this tool is a great way to see if your disk's write-caching is indeed on.
The old version of this script used the raw(8) interface to bypass all filesystems and kernel buffers, but it produced the same results as the fsync version, so I made it just use fsync so it was easier to use. (don't need a spare block device on the disk(s) being tested handy....)
From: error10 2005-05-13 08:29 am (UTC)
hdparm -W 0 /dev/hda
I needed this two months ago! I suffered massive filesystem corruption due to exactly this problem. Well, this and the fact that ext3 still sucks. And why doesn't Red Hat do reiserfs anyway?
From: (Anonymous) 2005-05-13 09:45 am (UTC)
OS shutdown?
Does anyone know how the disk knows to do the write during an OS shutdown?
There must be a trick otherwise we'd be losing data at every reboot..
Did you disable writeback cache on both the controller and drive? I only care whether that fails to ensure fsync is writing to stable storage.
From: brad 2005-05-13 02:26 pm (UTC)
Yes, we do. And that's what I wrote the script for: to verify it all works. Because we've been bitten by certain hardware (LSI RAID cards) where it doesn't always work.
I have deep suspicions that device mapper is not very good at making sure fsync() calls actually hit disk.
In response to iSCSI, if you are going to push a lot of data the cost of iSCSI (Cisco) gear is higher than low end fcal gear, and there is a big benefit with fcal which is, fibre doesn't carry electric shocks :-).
From: (Anonymous) 2005-05-13 11:37 am (UTC)
Apparent
I didn't know this until I was asked to supply an identical disk for a recovery job after a RAID-0 crash. I had two 80 GB WD disks, and one of them had crashed, making the whole RAID array unreadable. When we bought a spare disk of *exactly* the same factory type, but from a later period, it turned out that the new disk (of the exact same user size) had media that was a tiny bit smaller (it had fewer so-called 'spare blocks', which the disk uses to park data that may be damaged sooner or later). The end result was that the raw data block of the old disk didn't fit on the new disk. In the end we ended up using two Maxtor drives of 120 GB and praying, but it worked.
So yes, most modern hard disks seem to have hardware-driven relocation and repair strategies, moving blocks into spare space when an error is detected (or maybe predicted) in a block.
From: (Anonymous) 2005-05-13 11:46 am (UTC)
What makes you sure it's the HW?
I know for a fact that bugs in the ext3 file system (on Linux of course) will cause an fsync() operation to succeed even if the disk itself is turned off (it was an FC external disk).
The problem was tracked to a bug in ext3 (since then fixed I believe, but still found in RHEL3 at the very least).
In short - yes, fsync() lies - but what makes you sure it's the hardware's fault?
Gilad Ben-Yossef http://benyossef.com
From: brad 2005-05-13 02:30 pm (UTC)
Re: What makes you sure it's the HW?
Because the script produces the same results when it's done via the raw(8) interface[1], bypassing the filesystem and kernel buffers. And it produces near identical results with any filesystem.
But once I fix the disk's write-caching, all tests work, filesystem or not.
[1] You should see the ugly Perl required to do aligned writes.
From: (Anonymous) 2005-05-13 11:58 am (UTC)
Do you have write cache enabled?
Maybe this /. post applies here. http://hardware.slashdot.org/comments.pl?sid=149349&cid=12517596
excerpt:
From Linux --
NOTES In case the hard disk has write cache enabled, the data may not really be on permanent storage when fsync/fdatasync return.
From: brad 2005-05-13 02:32 pm (UTC)
Re: Do you have write cache enabled?
Read my fucking post. Yes, I know. This whole script is used to detect if write caching is enabled. It should never be, though. The kernel does caching for you, so there's no reason for your disk to be doing it. Unless you want your applications not to be able to fsync.
From: (Anonymous) 2005-05-13 12:38 pm (UTC)
Official release of the tool? Any experience with Spew?
Brad,
I just added to the InnoDB Configuration (http://dev.mysql.com/doc/mysql/en/innodb-configuration.html) section of the MySQL manual (http://dev.mysql.com/doc/mysql/en/) a warning message about fsync() that I had prepared some time ago. I would like to mention your tool there, to give users an easy tool for pull-the-plug tests. However, I would like to link to a more official address than a CVS gateway.
Does anyone have any experience with Spew (http://spew.berlios.de/)? Could it be used in place of Brad's script?
Unexpected shutdowns are not the only source of InnoDB corruption. Sometimes the operating system or hardware writes data to the wrong offset in the file, or only writes part of the data, or the memory gets somehow corrupted, especially on some fairly recent 64-bit systems.
Marko Mäkelä Senior Software Engineer Innobase Oy
From: desh 2005-05-13 01:21 pm (UTC)
Congrats, Brad Fitzgerald, on making Slashdot with this. So what if they don't know your actual name...
From: (Anonymous) 2005-05-13 04:49 pm (UTC)
MacOS X
This darwin-dev article may also shed some light:
http://lists.apple.com/archives/darwin-dev/2005/Feb/msg00072.html
MacOS X provides an F_FULLFSYNC fcntl to provide the best possible chance of getting data written to disk. Is there any way you could test with that turned on, to see if that behaves better?
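For what it's worth, trying F_FULLFSYNC from a test program could look roughly like this. It is a Python sketch, not a patch to the Perl script, and the function name is made up. It uses F_FULLFSYNC when the platform's fcntl module exposes it and falls back to a plain fsync() otherwise, or when the filesystem rejects the request.

```python
import os

try:
    import fcntl  # not available on Windows
except ImportError:
    fcntl = None

def full_sync(fd):
    """Flush fd as hard as the platform allows; return the method used."""
    full = getattr(fcntl, "F_FULLFSYNC", None) if fcntl else None
    if full is not None:
        try:
            # Ask the drive itself to flush its cache, where supported.
            fcntl.fcntl(fd, full)
            return "F_FULLFSYNC"
        except OSError:
            pass  # the filesystem rejected the request; fall back below
    os.fsync(fd)
    return "fsync"
```

Returning the method name makes it easy to log which kind of flush a given pull-the-plug run actually exercised.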