Brad Fitzpatrick (brad) wrote,

I'm looking for something... oprofile?

I can't for the life of me figure out this sporadic bottleneck I'm seeing on LJ's servers. When it happens I try to hunt it down, but by the time I've started looking around, it's gone. And it seems to be happening more frequently lately.

DBs, memcaches, CPU, network... all seem fine.

So I have no clue why web processes are stacking up. Straced a few... see nothing odd.

What I'd like to do is measure the total wall time the process spends blocked on each file descriptor (which would require the tool to know the different syscalls, and to keep track of which fds were which, or just look that up as it goes...).
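A rough sketch of that measurement, not a finished tool: it assumes the trace comes from strace run with -T (which appends the time spent in each syscall as <seconds>), without timestamps or -y, and it only counts syscalls whose first argument is a file descriptor. The function name blocked_time_per_fd and the FD_SYSCALLS list are made up for illustration. It does not track fd numbers being closed and reused, and it ignores select/poll/epoll_wait, since those block on several fds at once.

```python
import re
from collections import defaultdict

# Matches strace -T lines such as:
#   read(7, "..."..., 4096) = 120 <0.250113>
#   [pid  1234] recvfrom(9, "", 100, 0, NULL, NULL) = 0 <1.000000>
LINE_RE = re.compile(
    r'^(?:\[pid\s+\d+\]\s+|\d+\s+)?'    # optional pid prefix added by -f
    r'(?P<name>\w+)\((?P<fd>\d+)[,)]'   # syscall name and a numeric first argument
    r'.*<(?P<secs>\d+\.\d+)>\s*$'       # trailing <seconds> added by -T
)

# Assumption: an illustrative, incomplete set of syscalls whose first
# argument is the fd being waited on.
FD_SYSCALLS = {
    "read", "write", "readv", "writev",
    "recv", "recvfrom", "recvmsg",
    "send", "sendto", "sendmsg",
    "connect", "accept", "fsync", "flock",
}


def blocked_time_per_fd(lines):
    """Sum the wall time reported by strace -T, keyed by fd number."""
    totals = defaultdict(float)
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m or m.group("name") not in FD_SYSCALLS:
            continue  # not an fd-first syscall, or not a complete -T line
        totals[int(m.group("fd"))] += float(m.group("secs"))
    return dict(totals)
```

To try it, save a trace with something like strace -T -o trace.txt -p <pid>, then call blocked_time_per_fd(open("trace.txt")) and sort the result by value to see which fds account for the most time. Mapping an fd number back to a socket or file would still have to be done separately, for example by looking at /proc/<pid>/fd while the process is running.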

Can oprofile produce reports for that? I don't want to measure where CPU is used... I want to figure out where blocking is happening.

