brad's life
Brad Fitzpatrick

[ website | bradfitz.com ]

I'm looking for something... oprofile? [Feb. 22nd, 2004|07:38 pm]
I can't for the life of me figure out this sporadic bottleneck I'm seeing on LJ's servers. When it happens I try to hunt it down, but by the time I've started looking around, it's gone. It seems to be happening more frequently lately, though.

DBs, memcaches, CPU, network... all seem fine.

So I have no clue why web processes are stacking up. Straced a few... see nothing odd.

What I'd like to do is measure the total wall time the process spends blocked on each file descriptor. (That would require the tool to know the different syscalls and keep track of which fds were which, or just look it up as it goes....)

Can oprofile produce reports for that? I don't want to measure where CPU is used... I want to figure out where blocking is happening.
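
For illustration, the per-fd measurement described above could be roughed out by post-processing `strace -T` output, which appends the time spent in each syscall as `<seconds>` at the end of the line. This is only a sketch under assumptions: the function name `blocked_time_per_fd` and the `FD_CALLS` set are made up for this example, only syscalls whose first argument is a plain fd are counted (so `select` and `poll` are ignored), `<unfinished ...>`/`resumed` line pairs are skipped, and fd numbers are not tracked across `close`/`open`, so a reused fd gets lumped together.

```python
import re
from collections import defaultdict

# Matches lines such as:  read(5, "..."..., 4096) = 512 <0.203114>
# An optional "[pid N] " or "N " prefix (as printed with strace -f) is skipped.
LINE_RE = re.compile(
    r'^(?:\[pid\s+\d+\]\s+|\d+\s+)?'      # optional pid prefix
    r'(?P<call>\w+)\((?P<fd>\d+)[,)]'     # syscall name and numeric first argument
    r'.*<(?P<secs>\d+\.\d+)>\s*$'         # elapsed time appended by -T
)

# Assumption: an illustrative, incomplete set of syscalls whose first
# argument is a single file descriptor.
FD_CALLS = {
    "read", "write", "readv", "writev", "recv", "recvfrom",
    "send", "sendto", "connect", "accept", "fsync", "flock",
}

def blocked_time_per_fd(lines):
    """Sum the wall time reported by `strace -T` per file descriptor."""
    totals = defaultdict(float)
    for line in lines:
        m = LINE_RE.match(line)
        if m and m.group("call") in FD_CALLS:
            totals[int(m.group("fd"))] += float(m.group("secs"))
    return dict(totals)

if __name__ == "__main__":
    import sys
    # Example: strace -T -p <pid> 2>&1 | python this_script.py
    totals = blocked_time_per_fd(sys.stdin)
    for fd, secs in sorted(totals.items(), key=lambda kv: -kv[1]):
        print(f"fd {fd}: {secs:.6f}s")
```

The fd numbers would still have to be mapped to sockets or files by hand, for example by looking at `/proc/<pid>/fd` while the process is alive.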

From: scsi
2004-02-22 08:14 pm (UTC)
I'm sorta seeing the same thing too, once in a great while. The DB/memcache/etc. are all fine, yet the mod_perls are stuck on something on one or two webslaves. If I restart apache, everything is fine again. But it also appears to resolve itself in a few minutes or so. Ghosts?
From: eli
2004-02-22 08:48 pm (UTC)


"live" ?
From: brad
2004-02-22 09:01 pm (UTC)

Re: Heh...

I'd hope that was a typo. Seems plausible, at least.
From: tytso
2004-02-24 05:14 am (UTC)
Unfortunately, no, oprofile won't help with that. It will give you CPU utilization across both kernel and userspace, so it's great for certain problems, but not for measuring where a process might be blocked. Are your servers SMP machines? If so, either oprofile or a more specialized tool, lockmeter, can check whether you have some kind of spinlock contention.

If you really think you're blocked, have you simply done a "ps alx" listing and looked at WCHAN, so you can at least see which kernel function your processes might be stuck waiting in?
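
As a rough sketch of the WCHAN check above, the snippet below tallies how many processes are sleeping in each kernel function, so a pile-up in one wait channel stands out. It assumes the input is the text output of `ps -eo pid,wchan:32,comm` (procps syntax, chosen here because it is easier to parse than `ps alx`); the function name `tally_wchan` and the sample values are made up for the example.

```python
from collections import Counter

def tally_wchan(ps_output, command_filter=None):
    """Count processes per WCHAN value.

    Expects three whitespace-separated columns per line: PID, WCHAN, COMMAND.
    If command_filter is given, only commands containing it are counted.
    """
    counts = Counter()
    for line in ps_output.splitlines():
        parts = line.split(None, 2)
        if len(parts) < 3 or parts[0] == "PID":
            continue  # skip blank lines and the header row
        _pid, wchan, comm = parts
        if command_filter and command_filter not in comm:
            continue
        counts[wchan] += 1
    return counts

if __name__ == "__main__":
    # Made-up sample data; real WCHAN names depend on the kernel version.
    sample = """
      PID WCHAN                            COMMAND
        1 -                                init
      202 semop                            httpd
      203 semop                            httpd
      204 sk_wait_data                     httpd
    """
    for wchan, n in tally_wchan(sample, "httpd").most_common():
        print(f"{n:4d}  {wchan}")
```

On a live box the input could come from something like `subprocess.run(["ps", "-eo", "pid,wchan:32,comm"], capture_output=True, text=True).stdout`, sampled a few times while the web processes are stacking up.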