I made myself a little table for the different cases and my game plan:
               HTTP/1.0   HTTP/1.1
               --------   ----------
Default Conn   close      keep-alive
Chunked?       no         yes
Pipelining?    no         yes

Do only 1.1 connections to the backend nodes.

Client sends   Perlbal gets            Send via    Find end via
------------   ---------------------   ---------   -------------
1.0 request    1.1 response (w/len)    dumb copy   count bytes
1.0 request    1.1 response (chunk)    dechunk     watch chunks
1.1 request    1.1 response (w/len)    dumb copy   count bytes
1.1 request    1.1 response (chunk)    dumb copy   watch chunks
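That second table maps pretty directly to code. Here's a rough sketch in Perl (not Perlbal's actual internals; relay_plan is just a name I made up) of picking the strategy from the client's HTTP version and whether the backend response is chunked:

    # Sketch only: decide how to relay a backend response to the client.
    sub relay_plan {
        my ($client_ver, $backend_chunked) = @_;   # e.g. ("1.0", 1)

        if (!$backend_chunked) {
            # Backend gave a Content-Length: dumb copy, stop after N bytes.
            return { send => 'dumb copy', end => 'count bytes' };
        }
        if ($client_ver eq '1.0') {
            # 1.0 client can't take chunked: strip the chunking as we go.
            return { send => 'dechunk', end => 'watch chunks' };
        }
        # 1.1 client: pass chunks through untouched, stop at the 0-size chunk.
        return { send => 'dumb copy', end => 'watch chunks' };
    }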
I figure the only CPU-heavy case will be 1.0 requests (rare) hitting chunked replies (rare), so that combination is really rare, and I don't need to optimize for it... I just need to do it correctly, which is a lot easier.
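For that rare case, "correctly" just means parsing the chunk framing: hex size line, that many bytes, CRLF, repeat until a zero-size chunk. Something like this (a toy dechunk sub for illustration, not what Perlbal does; it assumes the whole chunked body is already buffered, where a real proxy would do it incrementally as data arrives) covers it:

    # Sketch only: turn a buffered chunked body into a plain body.
    # Returns the dechunked payload, or undef if incomplete/malformed.
    sub dechunk {
        my ($buf) = @_;
        my $out = '';
        while (1) {
            # chunk-size line: hex length, optional ;extensions, then CRLF
            return undef unless $buf =~ s/^([0-9A-Fa-f]+)[^\r\n]*\r\n//;
            my $len = hex($1);
            return $out if $len == 0;          # last-chunk marker; trailers ignored
            return undef if length($buf) < $len + 2;
            $out .= substr($buf, 0, $len);
            substr($buf, 0, $len + 2) = '';    # drop chunk data + trailing CRLF
        }
    }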