So marksmith went home one weekend and busted out an XS module that was a drop-in replacement for the existing Perl code, and could even be enabled/disabled at runtime.
It works great, drastically reducing CPU, but only for somewhere between 20 minutes and 2 hours, and then it crashes.
Mad props and a permanent account or so if you want to bug hunt for us and find the problem:
Four or five times we've seen it crash in getReconstructed(), and the this pointer is always messed up... usually with a value like "18" instead of a real memory address.
Seems easy to find, right?
We ran it under valgrind with a debug perl built to not use Perl's malloc or Perl's slabs, but didn't find anything. But we didn't run it in production on the site, and it only crashes in production and only after millions of requests. So it's some weird corner case.
I don't pretend to understand XS that well, in particular the implicit refcount changes coming in/out of XS code. Or the typemap stuff.
Update: Oh, and before Mark tells me --- I'm sure he'll want you to know that this is his first XS code, so don't hate too much.