September 24th, 2004

belize

comedy of errors

Before I forget, I shall now chronicle the story of our database servers:

August 12th: we order two new identical database servers: dual 64-bit Xeon (EM64T), 8GB RAM for now (because 2GB sticks aren't available for a couple more months), and 12 15k SCSI disks... 2x RAID 1, 8x RAID 10, 2 hot spares (1 per SCSI channel).

August 13th-29th: I'm in Germany. I expect the servers to be back, racked, and with an operating system right before I return. I plan to set up MySQL and go. But I come back and they're not ready. Apparently Linux won't run on them... it hangs during "checking processor flags".

-- SuperMicro gets us a BIOS fix

-- Now it boots, but only one NIC works.

-- SuperMicro gets us a fix, flashing the NIC's firmware

-- Everything gets working, but then they discover the motherboard's BIOS battery leaks, and if the machine is unplugged from the wall for 3 seconds or so, all the BIOS settings clear.

-- SuperMicro discovers some bad diodes were used in that board, but they can't get us a corrected board because Intel's chipset has been recalled and Intel's not making more because of a bug with PCI-Express or something.

-- SuperMicro offers to take our machines back and fix them by hand, soldering on new diodes. Since we don't need the PCI-E fixed, we can use the buggy Intel chipset still.

-- SuperMicro ships the boards back, and they work now on a table, but not in the case. (??) Kinda fuzzy on that communication. But then they find the BIOS would get corrupted too and the clock wouldn't even tick. So something was shorting.

-- At this point, a new motherboard is available, so we order that one. The boards arrive, but the memory doesn't. (These new boards take DDR2, not DDR.)

-- The memory was supposed to arrive at the same time, but then it's bumped, and bumped, and bumped....

-- It was supposed to arrive today (Friday), but it got bumped again. Apparently it was still in Korea on its way from Samsung to ATP?

We're now promised Monday, but ... yeah.

A vote!

Poll #355784 Crap servers

Think the memory will arrive Monday?

Yes, you were promised!
14 (10.9%)
Bet not, dude.
115 (89.1%)

Think the server will work once it's all put together?

Probably.
43 (33.3%)
So unlikely.
86 (66.7%)

Would you trust these two servers in production?

If they pass burn-in, why not?
58 (44.6%)
No way, they're cursed!
72 (55.4%)