Brad Fitzpatrick (brad) wrote,

lazyweb: buggy XS code

We profiled Perlbal a couple months back and found the only offensively slow part was parsing the incoming HTTP headers. The rest was just system-level stuff like read()/write()/sendfile() that was pretty fast.

So marksmith went home one weekend and busted out an XS module that was a drop-in replacement for the existing Perl code, and could even be enabled/disabled at runtime.

It works great, drastically reducing CPU usage, but only for somewhere between about 20 minutes and 2 hours, and then it crashes.

Mad props and a permanent account or so if you want to bug hunt for us and find the problem:

Four or five times we've seen it crash in getReconstructed(), and the this pointer is always messed up... usually with a value like 18 instead of a real memory address.

Seems easy to find, right?

We ran it under valgrind with a debug perl built to not use Perl's malloc or Perl's slabs, but didn't find anything. That run wasn't in production on the site, though, and it only crashes in production, and only after millions of requests. So it's some weird corner case.

I don't pretend to understand XS that well, in particular the implicit refcount changes coming in/out of XS code. Or the typemap stuff.

Update: Oh, and before Mark tells me to mention it, I'm sure he'll want you to know that this is his first XS code, so don't hate too much.
Tags: lazyweb, perl, tech