Bugs, Meetings, Testing
[Jul. 10th, 2005|11:46 pm]
So like four of us teamed up to contribute seemingly benign patches which in combination produced a bug that was caught and fixed within hours of release.
It was suggested that we have a post-mortem to discuss it, but I, being stubborn, said I wouldn't be attending, since we'd already had a suitable (I thought) post-mortem with the developers involved the night of the bug. We'd also recently had a long post-mortem for another, unrelated issue that ended up covering tons of stuff, so I figured nothing new could come of this third post-mortem that wasn't already covered in the first two, short of playing the blame game or something.
Ah, but I was wrong.
While I wanted to avoid a meeting and perhaps hack instead, my time-savings argument proved fruitless: I've likely spent more time reading and writing emails about the meeting and the issue than the proposed meeting would've taken had I just attended.
So I admit defeat, and in addition to having spent time and my precious wrists/fingers writing emails, I will also be attending the meeting, if only to cut my losses and not type anymore.
But it might be fun as I'd love to discuss writing a test suite to cover the entirety of LiveJournal. Historically I've shunned tests, mostly because anything non-trivial I work with is distributed on lots of machines, deals with timing, and is just generally a bitch to test accurately. Lately, however, I've had success writing test suites of pretty complicated things, like LWPx::ParanoidAgent, OpenID, and just this weekend with Ben, Gearman (which Ben pretty much did).
So I'm warming up to automated testing, especially considering it'd be something the sysadmins could run first to feel better about code being pushed, and there'd be proof in the code repo that a test was or was not written (which policy would then require).
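The kind of thing I mean could be as simple as a Test::More script living in the repo — a minimal sketch, where the function and its rules are made up for illustration, not actual LiveJournal code:

```perl
#!/usr/bin/perl
# Hypothetical example of a small regression test a sysadmin could run
# before pushing code. validate_username and its rules are invented here
# purely for illustration.
use strict;
use warnings;
use Test::More tests => 3;

# A stand-in for some small piece of site logic under test.
sub validate_username {
    my ($name) = @_;
    return ( defined $name && $name =~ /^\w{1,15}$/ ) ? 1 : 0;
}

ok(  validate_username("brad"),   "plain username accepted" );
ok( !validate_username(""),       "empty username rejected" );
ok( !validate_username("x" x 16), "overlong username rejected" );
```

Run it under prove (or plain perl) and it either passes or it doesn't — which is exactly the yes/no answer you'd want before a push.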
In conclusion: fun, fun.
2005-07-11 06:54 am (UTC)
CSI: Production Website Failure
I hate post-mortems, and particularly post-mortems about the failure of a public product or service. I've yet to attend one that didn't become the blame game, no matter how good the intentions of the people who called the meeting.
There are sometimes good results (changes in process that prevent further outages, etc.) but I can't escape the feeling each time that those results could have been accomplished without the Ceremonial Humiliation in Conference Room 3.
For me it always boiled down to the need for better QA. YMMV, but in the case of my own job we didn't have these meetings so much after we got a committed and meticulous QA team.
2005-07-11 07:00 am (UTC)
Don't forget that the S2 compiler has had a regression test suite for a long time! It's perhaps not the most complete regression test suite in the world, but it was one you wrote a long time ago and it wouldn't be too hard to beef it up a little to test more cases.
Still, I'm not yet convinced that something like LiveJournal can really be tested as a whole. It's too big. The individual components could perhaps be tested, but given that there are so many of these the interactions between them are… many. Writing a test suite might be easier if some of the code were refactored to avoid the duplication that's present in a few places, such as sending HTML email and so on, but then without a test suite how would you test the refactored code? Fun, fun.
2005-07-11 07:40 am (UTC)
Don't forget that the S2 compiler has had a regression test suite for a long time!
I quite enjoyed writing that one, back in the day. So much easier than, say, testing Perlbal.
"Ah, but I was wrong."
It's really cool to hear you say that, and then I think the problem is, you only hear really *good* coders say that about anything. People who suck never think they're wrong.
I typically think I'm always right in "real life" - discussions with people, and so on.
However, in my code, I tend to look at my own code first for problems. I definitely know that I'm not perfect and have made, and still make, a lot of mistakes.
Wonder if that makes me a good coder, or just overly self-critical. Or neither.
Testing Is Hard, as anyone who's written a CPAN module for release can tell you. (Of course, you have several CPAN modules under your belt, so I suppose you know that.)
On the other hand, testing can work wonders. Witness the incredible progress Pugs has made with a "test first and implement it later" model. Out of 6,245 tests, 883 (14%) are currently marked TODO; but I suspect that practically every test in the suite was TODO at some point, and got implemented sometime in the last six months.
I think my resistance to automated testing is starting to wane as well... It never made much sense to me in my head, but we just hired a kid who graduated from RIT's Soft. Engr. program and of course, like every recent college graduate, he is anxious to apply all the stuff he learned in school...
We assigned him some maintenance work for one of his first projects here, and he wrote some JUnit tests for some of the code he had to work with, and his results were surprising. His tests identified where the bugs we assigned him to fix were located, and he was able to fix them pretty quickly...
While I think my "just hack it" background is still making it hard for me to make friends with automated/unit testing, I was impressed... I'm starting a new project here at work in a few weeks, and I think I'm going to give it a shot and maintain a test suite for the whole thing.
WHY DON'T YOU ANSWER YOUR EMAILS.??
2005-07-12 07:16 am (UTC)
If they're cased and punctuated as wonderfully as this comment, that might provide a hint.
2005-07-12 08:05 am (UTC)
Brad's too busy ;)
Brad is always too busy hacking on the LiveJournal code!!! :)