thundering herd; bleh
[Jul. 3rd, 2007|06:06 pm]
Imagine the following scenario:
- Parent process opens listening socket, but will never accept() on it. Opening in parent so future forked child processes will inherit the fd, and because it's a low port (perhaps) and we want to open it before we drop root.
- Child processes (one per CPU) inherit listening fd, but they're event-based, handling many connections each, so they can't do a blocking accept() on it...
See where this is going? :-)
- So if each child has readability of that listening socket fd in its select/epoll/queue set, guess what? Thundering herd. The kernel wakes up all processes, but only one accepts the new client connection.
So instead I guess I have to...
- have a socketpair between parent and each child for communication,
- watch readability of the listening fd in parent process,
- on readability, send a message to "best" child (for some definition of best) to accept on the listening socket which it has, but isn't watching readability on.
- perhaps have children tell the parent process when connections go away, so parent knows approximate load on each child.
Feels kinda lame that I have to do this.
Any better suggestions?
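
For what it's worth, here is a rough Python sketch of the workaround listed above, just to show the moving parts. Everything named in it is made up for illustration: the one-byte messages (`a` = accept, `k` = accepted, `d` = a connection went away, `q` = quit), the functions `child_loop`, `spawn_children`, `dispatch_once` and `demo`, and "best" meaning "fewest open connections". One detail the list doesn't spell out: with a level-triggered select() the listening fd stays readable until somebody accepts, so in this sketch the parent waits for the child's ack before it looks at the listening fd again.

```python
import os
import select
import socket


def child_loop(index, listener, control):
    """Event-based child: it accept()s only when the parent orders it to."""
    conns = []
    while True:
        readable, _, _ = select.select([control] + conns, [], [])
        for sock in readable:
            if sock is control:
                if control.recv(1) != b"a":    # b"q" or EOF: shut down
                    return
                conn, _ = listener.accept()    # parent saw a queued connection
                conn.sendall(str(index).encode())
                conns.append(conn)
                control.sendall(b"k")          # ack: accepted
            else:
                sock.recv(4096)                # sketch: any input or EOF ends it
                sock.close()
                conns.remove(sock)
                control.sendall(b"d")          # report: one connection went away


def spawn_children(listener, count):
    children = []                              # (pid, parent's end of socketpair)
    for index in range(count):
        parent_end, child_end = socket.socketpair()
        pid = os.fork()
        if pid == 0:
            try:
                parent_end.close()
                for _, other in children:      # drop ends meant for siblings
                    other.close()
                child_loop(index, listener, child_end)
            finally:
                os._exit(0)                    # never fall back into parent code
        child_end.close()
        children.append((pid, parent_end))
    return children


def dispatch_once(listener, children, load):
    """Wait for a pending connection, then order the least-loaded child to accept."""
    select.select([listener], [], [])
    controls = [control for _, control in children]
    ready, _, _ = select.select(controls, [], [], 0)
    for control in ready:                      # apply queued "went away" reports
        load[controls.index(control)] -= control.recv(64).count(b"d")
    best = load.index(min(load))               # "best" here: fewest open connections
    controls[best].sendall(b"a")
    while True:                                # block until the child has accepted, so
        msg = controls[best].recv(1)           # one connection isn't handed out twice
        if msg == b"k":
            break
        if msg == b"d":
            load[best] -= 1
        elif not msg:
            raise RuntimeError("child exited before accepting")
    load[best] += 1
    return best


def demo(num_children=2, num_clients=4):
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))            # port 0: let the OS pick a free port
    listener.listen(16)
    children = spawn_children(listener, num_children)
    load = [0] * num_children
    clients, served_by = [], []
    try:
        for _ in range(num_clients):
            clients.append(socket.create_connection(listener.getsockname()))
            dispatch_once(listener, children, load)
            served_by.append(int(clients[-1].recv(16)))
    finally:
        for client in clients:
            client.close()
        for pid, control in children:
            control.sendall(b"q")
            control.close()
            os.waitpid(pid, 0)
        listener.close()
    return served_by, load
```

`demo()` keeps all its client connections open until the end, so with the defaults the parent alternates between the two children as their load counts climb. A real server would call `dispatch_once` from its main loop and would also need to cope with children dying.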
2007-07-04 10:44 am (UTC)
We actually moved to the model you described in SpamAssassin 3.1.0.
Previously we did the "preforked pool of servers all doing blocking accept"
thing, but that didn't allow scaling of the pool size to deal with demand. So
instead, I sat down with some of the Apache httpd guys, went over how Apache 2
preforking and pool scaling works, and came up with a pure-perl implementation.
That's what we now use. Here's the key bits:
- Parent shares the accept fd with kids, and socketpairs between parent and each child, as you describe.
- Parent process maintains pool size to always include N idle children, scales up/down children as the number of idle kids increases/decreases with load (Apache-style preforking).
- Parent selects on the accept socket.
- When a new incoming connection appears, it picks the lowest-numbered idle child and orders it to accept.
- The children report back state ("idle", "busy") over the socketpair as they become idle, or are about to accept a connection.
- The child waits for orders from the parent. If the parent orders a child to accept, it reports itself as "busy", accepts the conn, deals with it, then reports itself as "idle".
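
To make the parent's bookkeeping concrete, here is a small Python sketch of the state table those points imply. It is not the SpamAssassin code (that is Perl); the class `ChildPool` and its methods are hypothetical names, and the single `min_idle` target is a simplification of "always include N idle children".

```python
class ChildPool:
    """Parent-side view of the pool: child number -> "idle" or "busy"."""

    def __init__(self, min_idle):
        self.min_idle = min_idle   # the "N idle children" target
        self.state = {}

    def report(self, child, state):
        """Record a state report received over a child's socketpair."""
        if state not in ("idle", "busy"):
            raise ValueError("unknown state: %r" % (state,))
        self.state[child] = state

    def remove(self, child):
        """Forget a child that has exited."""
        self.state.pop(child, None)

    def pick(self):
        """Lowest-numbered idle child, or None if every child is busy."""
        idle = [child for child, state in self.state.items() if state == "idle"]
        return min(idle) if idle else None

    def adjustment(self):
        """Positive: children to spawn. Negative: surplus idle children."""
        idle = sum(1 for state in self.state.values() if state == "idle")
        return self.min_idle - idle
```

The parent would call `pick()` when the accept socket becomes readable, send the accept order to that child, and consult `adjustment()` after each state report to decide whether to fork or retire children.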
Note, btw, the use of lowest-numbered idle child; that's an easy way to keep
the same kid processes "hot". Apache httpd does the same thing (iirc). Since
the communication is therefore generally between processes that are swapped in,
and no swapping is required, this was a key benefit that makes this a little
faster than the traditional "preforked blocking accept" style, at least for
most casual users. (Of course a well monitored setup where the admin is
careful to ensure swap is never hit would probably be more efficient using
the traditional "blocking accept" model, so we still offer that; but most
people aren't that careful.)
We had a nasty bug that went on for a while on some loaded servers, but
eventually we got it fixed (deleting an entry from a hash in a SIGCHLD signal
handler is unsafe, doh!). Nowadays it seems to be quite stable.
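
One common way to avoid that class of bug is to have the handler do nothing but set a flag, and to touch the child table only from the main loop. The Python sketch below shows that pattern; it is an illustration, not the actual SpamAssassin fix, and the names `children`, `child_exited`, `on_sigchld`, `reap_children` and `demo` are all hypothetical. (Python runs signal handlers between bytecodes, so the failure mode differs from Perl's, but mutating a table behind the main loop's back is still asking for trouble.)

```python
import os
import signal
import time

children = {}          # pid -> label; only the main loop mutates this
child_exited = False   # the only thing the signal handler touches


def on_sigchld(signum, frame):
    """Signal handler: record that something happened, nothing more."""
    global child_exited
    child_exited = True


def reap_children():
    """Called from the main loop: reap exited children and update the table."""
    global child_exited
    child_exited = False
    reaped = []
    for pid in list(children):                 # copy: we delete while iterating
        try:
            done, _status = os.waitpid(pid, os.WNOHANG)
        except ChildProcessError:
            done = pid                         # already reaped elsewhere
        if done == pid:
            del children[pid]                  # safe here, outside the handler
            reaped.append(pid)
    return reaped


def demo():
    previous = signal.signal(signal.SIGCHLD, on_sigchld)
    try:
        pid = os.fork()
        if pid == 0:
            os._exit(0)                        # child: exit immediately
        children[pid] = "worker"
        deadline = time.monotonic() + 5.0
        while not child_exited and time.monotonic() < deadline:
            time.sleep(0.01)                   # stand-in for the real select loop
        return pid, reap_children()
    finally:
        signal.signal(signal.SIGCHLD,
                      previous if previous is not None else signal.SIG_DFL)
```

In a select-based parent, the flag check would sit at the top of each loop iteration, right before the pool is inspected.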
The code's in our
SVN: lib/Mail/SpamAssassin/SpamdForkScaling.pm and