Then Michael and I noticed waves in the system calls while watching strace. My fear had come true: the synchronous file opens/stats (on NFS, which is even worse) in all the async network code were just killing it.
After fixing up the CPAN module Linux::AIO to compile in Perl 5.8 and Linux 2.6, I plugged it in to make all codepaths async and now...
29 Mbps. One process. One machine. So damn cool. (Our total bandwidth usage is like 170 Mbps, so this is a big piece.....)
And over 10x fewer active connections, much more CPU usage, etc.
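
For anyone wondering what "make all codepaths async" looks like in practice, here's a rough sketch of the general idea in Python. To be clear, this is not Linux::AIO or its API, and it's not the actual code. It just shows the same trick of pushing blocking stat/open/read calls off the event loop, here by handing them to the default thread pool executor. The names `stat_and_read`, `heartbeat`, and `demo` are made up for illustration, and `heartbeat` is only a stand-in for the network code that would otherwise stall.

```python
import asyncio
import os
import tempfile


async def stat_and_read(path):
    """Stat and read a file without blocking the event loop."""
    loop = asyncio.get_running_loop()

    # os.stat blocks (badly, on NFS), so run it in the default thread pool
    # and let the loop keep servicing other connections in the meantime.
    st = await loop.run_in_executor(None, os.stat, path)

    def _read():
        # open() and read() are blocking too, so they go to a worker thread.
        with open(path, "rb") as fh:
            return fh.read()

    data = await loop.run_in_executor(None, _read)
    return st.st_size, data


async def heartbeat(ticks):
    # Hypothetical stand-in for the async network code sharing the loop.
    for _ in range(3):
        ticks.append(1)
        await asyncio.sleep(0)


async def demo():
    # Write a small temp file so the example is self-contained.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"hello")
        path = tmp.name
    try:
        ticks = []
        (size, data), _ = await asyncio.gather(
            stat_and_read(path), heartbeat(ticks)
        )
        return size, data, len(ticks)
    finally:
        os.unlink(path)


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The point is just that the event loop never sits inside a file syscall itself, so one slow open or stat can't hold up every other connection.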