thundering herd; bleh - brad's life
Brad Fitzpatrick

thundering herd; bleh [Jul. 3rd, 2007|06:06 pm]

Imagine the following scenario:
  • Parent process opens the listening socket, but will never accept() on it. It's opened in the parent so that future forked child processes inherit the fd, and because it's (perhaps) a low port that we want to bind before we drop root.
  • Child processes (one per CPU) inherit listening fd, but they're event-based, handling many connections each, so they can't do a blocking accept() on it...
See where this is going? :-)
  • So if each child has readability of that listening socket fd in its select/epoll/kqueue set, guess what? Thundering herd. The kernel wakes up all processes, but only one accepts the new client connection.
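A rough Python sketch of what each woken child ends up doing in that setup, assuming the shared listening fd is non-blocking. The names try_accept and simulate_herd are made up for illustration, and the simulation fakes the N children inside a single process instead of forking, so it only shows the wasted wakeups, not real scheduler behavior.

```python
import select
import socket


def try_accept(listener):
    """What a woken child does: returns a connection, or None if a sibling won."""
    try:
        conn, _addr = listener.accept()
        return conn
    except BlockingIOError:        # someone else already took the connection
        return None


def simulate_herd(num_watchers):
    """Stand-in for num_watchers children all polling one listening socket."""
    listener = socket.socket()
    listener.bind(("127.0.0.1", 0))            # any free port on loopback
    listener.listen(8)
    listener.setblocking(False)
    client = socket.create_connection(listener.getsockname())  # one new client

    # Every watcher polls before anyone accepts, so all of them see "readable".
    woken = 0
    for _watcher in range(num_watchers):
        readable = select.select([listener], [], [], 1.0)[0]
        woken += bool(readable)

    # Now each woken watcher races to accept; only one can succeed.
    conns = [try_accept(listener) for _watcher in range(woken)]
    winners = [c for c in conns if c is not None]

    for sock in winners + [client, listener]:
        sock.close()
    return woken, len(winners)
```

Calling simulate_herd(4) reports how many watchers saw the socket as readable and how many actually got the connection; every wakeup beyond the first is wasted work.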
So instead I guess I have to...
  • have a socketpair between parent and each child for communication,
  • watch readability of the listening fd in parent process,
  • on readability, send a message to the "best" child (for some definition of best) telling it to accept() on the listening socket, which it holds but isn't watching for readability.
  • perhaps have children tell the parent process when connections go away, so parent knows approximate load on each child.
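The plan above could be sketched in Python roughly as follows. This is an illustration under several assumptions, not a tested server: the one-byte messages (b"A" for "accept", b"K" for "accepted", b"D" for "done") and the names child_loop and serve are invented here, the children handle one connection at a time instead of running an event loop, and "best" is taken to mean "fewest open connections". The parent also waits for the child's ack before polling the listener again, so it doesn't order a second child to accept the same pending connection.

```python
import os
import select
import socket


def child_loop(listener, chan):
    """Child: never polls the listener; accepts only on the parent's order."""
    while chan.recv(1) == b"A":            # b"A" = "accept one connection"
        conn, _addr = listener.accept()
        chan.sendall(b"K")                 # ack: parent may poll the listener again
        with conn:
            conn.sendall(b"served by %d\n" % os.getpid())
        chan.sendall(b"D")                 # done: this connection went away


def serve(listener, num_children, max_conns):
    """Parent: dispatch max_conns connections, then shut the children down."""
    chans, load = {}, {}                   # pid -> socketpair end / open conns
    for _ in range(num_children):
        parent_end, child_end = socket.socketpair()
        pid = os.fork()
        if pid == 0:                       # in the child
            try:
                parent_end.close()
                for other in chans.values():   # ends inherited from earlier forks
                    other.close()
                child_loop(listener, child_end)
            finally:
                os._exit(0)
        child_end.close()
        chans[pid], load[pid] = parent_end, 0

    served = []
    while len(served) < max_conns or any(load.values()):
        watch = list(chans.values())
        if len(served) < max_conns:
            watch.append(listener)
        readable, _, _ = select.select(watch, [], [])
        for pid, chan in chans.items():    # drain "done" reports first
            if chan in readable and chan.recv(1) == b"D":
                load[pid] -= 1
        if listener in readable:
            pid = min(load, key=load.get)  # "best" = fewest open connections
            chans[pid].sendall(b"A")
            load[pid] += 1
            while True:                    # wait for the ack, counting older "D"s
                msg = chans[pid].recv(1)
                if msg == b"K":
                    break
                if not msg:
                    raise RuntimeError("child %d went away" % pid)
                load[pid] -= 1
            served.append(pid)

    for pid, chan in chans.items():
        chan.close()                       # EOF makes child_loop return
        os.waitpid(pid, 0)
    return served
```

To try it, bind and listen() on a socket in the parent, then call serve(listener, 2, 3); it returns the pids that were ordered to accept, in dispatch order. An event-based child would instead keep its socketpair end in its own select/epoll set next to its client connections.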
Feels kinda lame that I have to do this.

Any better suggestions?

From: jmason
2007-07-04 10:44 am (UTC)

SA preforking

We actually moved to the model you described in SpamAssassin 3.1.0.

Previously we did the "preforked pool of servers all doing blocking accept" thing, but that didn't allow scaling of the pool size to deal with demand. So instead, I sat down with some of the Apache httpd guys, went over how Apache 2 preforking and pool scaling works, and came up with a pure-perl implementation. That's what we now use. Here are the key bits:

  • Parent shares the accept fd with kids, and socketpairs between parent and each child, as you describe.
  • Parent process maintains pool size to always include N idle children, scales up/down children as the number of idle kids increases/decreases with load (Apache-style preforking).
  • Parent selects on the accept socket.
  • When a new incoming connection appears, it picks the lowest-numbered idle child and orders it to accept.
  • The children report back state ("idle", "busy") over the socketpair as they become idle, or are about to accept a connection.
  • The child waits for orders from the parent. If the parent orders a child to accept, it reports itself as "busy", accepts the conn, deals with it, then reports itself as "idle".
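The parent's bookkeeping in the list above might look something like this in Python. It is only a sketch of the idea, not a port of the SpamAssassin code: the Pool class, its method names, and the single spare_target number (standing in for "N idle children") are all assumptions made for illustration, and a real pool would likely add hysteresis so it doesn't spawn and retire children on every report.

```python
IDLE, BUSY = "idle", "busy"


class Pool:
    """Parent-side view of the children (hypothetical names throughout)."""

    def __init__(self, spare_target):
        self.spare_target = spare_target   # the "N idle children" to maintain
        self.states = {}                   # child number -> IDLE or BUSY

    def on_report(self, child, state):
        """Record a state report that arrived over a child's socketpair."""
        if state not in (IDLE, BUSY):
            raise ValueError("unknown state: %r" % (state,))
        self.states[child] = state

    def child_to_order(self):
        """Lowest-numbered idle child, or None if every child is busy."""
        idle = [n for n, s in self.states.items() if s == IDLE]
        return min(idle) if idle else None

    def scaling_delta(self):
        """Positive: spawn that many children. Negative: retire that many idle ones."""
        idle = sum(1 for s in self.states.values() if s == IDLE)
        return self.spare_target - idle
```

For example, with spare_target=2 and three idle children, child_to_order() picks child 0 and scaling_delta() is -1; once all three report "busy", child_to_order() returns None and scaling_delta() asks for two more children.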

Note, btw, the use of the lowest-numbered idle child; that's an easy way to keep the same kid processes "hot". Apache httpd does the same thing (iirc). The parent then mostly talks to processes that are already swapped in, so nothing has to be swapped in to handle a request. That was a key benefit, and it makes this a little faster than the traditional "preforked blocking accept" style, at least for most casual users. (Of course a well monitored setup where the admin is careful to ensure swap is never hit would probably be more efficient using the traditional "blocking accept" model, so we still offer that; but most people aren't that careful.)
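A toy simulation makes the "hot" effect visible. It assumes one request arrives per tick and each takes two ticks to serve, so at most two children are ever busy at once; the function name, the pool size of 8, and the round-robin policy used for comparison are all made up for this illustration.

```python
def children_used(policy, pool_size=8, requests=100, service_ticks=2):
    """Count how many distinct children ever get work under a dispatch policy."""
    busy_until = [0] * pool_size       # tick at which each child is idle again
    used = set()
    next_rr = 0                        # rotating pointer for round-robin
    for tick in range(requests):       # one request arrives per tick
        idle = [n for n in range(pool_size) if busy_until[n] <= tick]
        if policy == "lowest":
            child = idle[0]            # lowest-numbered idle child
        else:
            # round-robin: first idle child at or after the rotating pointer
            child = min(idle, key=lambda n: (n - next_rr) % pool_size)
            next_rr = (child + 1) % pool_size
        busy_until[child] = tick + service_ticks
        used.add(child)
    return len(used)
```

Under these assumptions the lowest-numbered policy keeps reusing children 0 and 1, while round-robin cycles through the whole pool, so every child has to stay resident.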

We had a nasty bug that went on for a while on some loaded servers, but eventually we got it fixed (deleting an entry from a hash in a SIGCHLD signal handler is unsafe, doh!). Nowadays it seems to be quite stable.
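The usual way around that class of bug is to do nothing in the signal handler except set a flag, and to touch the table of children only from the main loop. Here is a minimal Python sketch of that pattern; the ChildTable class and its method names are hypothetical, and this is not the fix that went into SpamAssassin. Python runs its signal handlers between bytecode instructions, so the failure mode differs from Perl's, but the deferral pattern is the same.

```python
import os
import signal


class ChildTable:
    """Child bookkeeping that is only modified from the main loop."""

    def __init__(self):
        self.states = {}           # pid -> state, e.g. "idle" or "busy"
        self.exit_pending = False  # set by the handler, cleared by reap()

    def on_sigchld(self, signum, frame):
        # Handler does the bare minimum: no waitpid(), no dict changes.
        self.exit_pending = True

    def reap(self):
        """Call from the main loop when exit_pending is set."""
        self.exit_pending = False
        reaped = []
        while True:
            try:
                pid, _status = os.waitpid(-1, os.WNOHANG)
            except ChildProcessError:   # no children at all
                break
            if pid == 0:                # children exist, none has exited yet
                break
            self.states.pop(pid, None)  # safe: we are not inside a handler
            reaped.append(pid)
        return reaped
```

The parent would install it with signal.signal(signal.SIGCHLD, table.on_sigchld) and call table.reap() from its select loop whenever table.exit_pending is true.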

The code's in our SVN: lib/Mail/SpamAssassin/SpamdForkScaling.pm and lib/Mail/SpamAssassin/SubProcBackChannel.pm.
