today rocks - brad's life — LiveJournal
Brad Fitzpatrick

today rocks [Apr. 19th, 2004|07:31 pm]

Good things about today that aren't computer-related:

-- Music (Modest Mouse and Aesop Rock)
-- Weather (pouring rain)
-- Bike (picked up my Trek Fuel 90 2004 w/ disc brakes)

But the main cool thing: I worked from 8:30am - 6:30pm on my web server/proxy and the time just flew. In summary, it is/has:

-- written in Perl, but using Linux epoll and sendfile (IO::Epoll, IO::SendFile)
-- all async I/O
-- all event-driven (no need for synchronization! so easy to program!)
-- beautiful code & class hierarchy which the event framework calls into
-- class for parsing/manipulating HTTP headers, both response and request, including merging headers, but not Set-Cookie headers in HTTP responses.
-- beautiful infrastructure for read/write buffering
-- proxy can read-ahead the backend node's response into a configurably-sized buffer (default 250k) and close down the backend's connection while the buffer is fed to the client. if page is 280k, 250k will be buffered, and once client has 30k, the 250k buffer will now contain the end, and the backend will shut down.
-- supports reproxying: the backend web node (say, a huge mod_perl process) can return the HTTP response headers (content-length, mime-type, etc) but tell the proxy to return the file body from somewhere else (say, some NFS path), and that's what uses sendfile (which means I not only avoid having to do async IO to files (which you can't do easily(?) in Linux yet), but I also avoid copies to/from userspace). this is what we'll need for photo hosting on LJ, so modem users don't tie up mod_perls while they download many-MB files.
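
The 280k/250k read-ahead example in the list above can be checked with a small simulation. This is only a sketch, written in Python rather than the Perl of the actual server; the function name and the 1 KB client chunk size are made up for illustration, and it assumes the backend is always fast enough to refill the buffer as soon as there is room.

```python
def client_bytes_when_backend_released(response_size, buffer_size, chunk=1024):
    """Return how many bytes the client has received at the moment the
    backend has handed over its whole response and can be released."""
    from_backend = 0   # bytes read from the backend so far
    to_client = 0      # bytes written to the client so far
    while from_backend < response_size:
        buffered = from_backend - to_client
        room = buffer_size - buffered
        if room > 0:
            # The backend is fast: top the buffer up as far as it will go.
            from_backend += min(room, response_size - from_backend)
        else:
            # Buffer full: wait for the slow client to drain one chunk.
            to_client += min(chunk, buffered)
    return to_client


KB = 1024
print(client_bytes_when_backend_released(280 * KB, 250 * KB) // KB)  # 30
print(client_bytes_when_backend_released(100 * KB, 250 * KB) // KB)  # 0
```

A response that fits entirely in the buffer frees the backend before the client has received anything, which is the case the default buffer size is presumably tuned for.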

I love Perl and how easy it is to bang stuff like the above out in record time (12 hours so far). Admittedly I've given this all a lot of thought, but it's still cool. During the process I compared parts to the C# equivalent and C# was incredibly verbose. I don't think I can productively use a language without built-in complex datatypes, or at least type inference. (Although a few times today I got bitten by misnamed hash keys as class attributes.... :-/ I used accessor (getter/setter) functions between classes, but raw hash keys within a class implementation....)

From: j7xz49br3m93xrr
2004-04-19 07:38 pm (UTC)
Haha, that's why Perl is so fun to code in :-) I've seen you mention this proxying idea before, but I don't recall you saying why you wanted to make it, or what ultimate motivation there was. Is it for a specific use, or just to say 'I've written X'? (Yeah, I have a lot of things I wanna write just to say I have too ;-))
From: brad
2004-04-19 07:40 pm (UTC)
I thought my post made it pretty clear why I needed to write this.
From: adamthebastard
2004-04-19 08:36 pm (UTC)
"this is what we'll need for photo hosting on LJ, so modem users don't tie up mod_perls while they download many-MB files."

He's writing it so LJ can offer more services at a better speed. Cos that's the cool kinda guy he is.
From: j7xz49br3m93xrr
2004-04-19 09:10 pm (UTC)
Yeah, and now I feel like a real 'plonker', because in my speed-reading efforts, I seem to have missed that whole section. D'oh! More coffee for me ;-) Thanks for pointing it out, guys.
From: edm
2004-04-19 09:37 pm (UTC)

Why proxy?

The proxy is useful because (a) modems are very slow (56 kbps at best), (b) modems are very common (alas), (c) the process size of a webserver that can do dynamic content generation is fairly large (30MB or more is not uncommon, so you can only fit so many on a machine), (d) the dynamic web processes typically tie up a database connection (or several) and you can only have so many of those.

Combine (a) and (b) (modems are slow and common) with (c) and (d) (processes generating dynamic content are large), and pretty soon you realise that you'd really rather not have those dynamic content generation processes tied up waiting for the modem to receive the rest of the data. This is especially so if you're sending out 100 kB or more of data, where the time to download on a 56 kbps modem is typically measured in 10s of seconds.
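
The "10s of seconds" figure is easy to check with back-of-the-envelope arithmetic. The sketch below takes 1 kB as 1000 bytes and ignores protocol overhead and modem compression, so it is a rough best-case estimate; the 100 Mbps internal link speed is an assumption added for comparison, not a number from this thread.

```python
def download_seconds(size_bytes, link_bits_per_second):
    """Best-case transfer time: payload bits divided by the link speed."""
    return size_bytes * 8 / link_bits_per_second


modem = download_seconds(100_000, 56_000)        # 100 kB over a 56 kbps modem
lan = download_seconds(100_000, 100_000_000)     # same page over an assumed 100 Mbps LAN

print(round(modem, 1))  # 14.3
print(round(lan, 3))    # 0.008
```

On those assumptions a dynamic content process feeding the modem directly is held for about 14 seconds, while one feeding a proxy on the local network is free again in milliseconds.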

So how do you solve this problem? Buy more server hardware? Nah, too expensive. Ban modem users? Attractive, but it'd never fly. Instead, you put in a proxy between the slow modem users and the fast dynamic content generation processes -- it sucks up all the data from the dynamic content generation processes quickly, freeing them up for the next user, and the slow modem user can download their data over 10s of seconds without tying up many resources on the servers.

Hence Brad's proxy software. There are various other things around which try to do the same thing, but many of them (eg, Apache in reverse proxy mode) have limitations on buffers -- and as soon as you fill the buffer you get the same problem of tying up (large) dynamic content generation processes.

Brad: the "redirect" to a file available via NFS and then using sendfile() is a cute touch; I've not seen that particular one before. (Although I've done redirects to files served off, eg, thttpd in the past. But as you've said before that won't work in the LJ situation due to a combination of requiring authentication and requiring stable URLs for caching.)

From: brad
2004-04-19 11:28 pm (UTC)

Re: Why proxy?

Cable users aren't much better than modem users... they're still a ton slower than LJ's internal network. So proxying is always good.

Yeah, the internal redirect ("reproxy") stuff is sweet. Lets me have my cake (mod_perl auth, DB lookup of URIs) and eat it too (serve static files easily).
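
One way to picture the reproxy decision is as a small function on the proxy side. This is a hedged sketch in Python, not the actual implementation: the header name `x-reproxy-file`, the function name and the example path are all hypothetical, since the post does not say how the backend signals the redirect.

```python
REPROXY_HEADER = "x-reproxy-file"  # hypothetical header name, not from the post


def plan_response(backend_headers, backend_body):
    """Decide where the body sent to the client should come from.

    backend_headers: dict mapping lower-cased header names to values.
    Returns (headers_for_client, source), where source is either
    ("file", path) or ("bytes", body).
    """
    headers = dict(backend_headers)
    # Strip the internal header so the file path never reaches the client.
    path = headers.pop(REPROXY_HEADER, None)
    if path is not None:
        return headers, ("file", path)
    return headers, ("bytes", backend_body)


headers, source = plan_response(
    {
        "content-type": "image/jpeg",
        "content-length": "5242880",
        REPROXY_HEADER: "/mnt/photos/example.jpg",  # made-up path
    },
    b"",
)
print(source)  # ('file', '/mnt/photos/example.jpg')
```

The backend still does the auth and the lookup, but it is free as soon as it has sent the headers; the proxy then serves the file itself.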
From: scosol
2004-04-20 12:55 pm (UTC)

Re: Why proxy?

I would add that the buffering is the key: proxying without buffering wouldn't really change anything.
Some of the commercial "accelerator" boxes additionally do neato stuff like request aggregation, where multiple incoming requests for stuff get queued and aggregated so that you can do things like serve 5 frontend requests for the same object from one backend request (buffering it like you say), or just pipeline stuff so you make a few large transactions instead of lots of small ones.

In a model like this memory becomes your limiting factor, so the request aggregation can *really* help (and the gains are even more if you're talking about slow transfers)

Oh and I just got my new bike too! yay! :)
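
The request aggregation described above (five frontend requests served from one backend request) can be sketched roughly as follows. This is an illustrative Python/asyncio version, not how any particular accelerator box works; `Coalescer`, `fake_backend` and the `/logo.gif` key are names invented for the example.

```python
import asyncio


class Coalescer:
    """Share one backend fetch among all concurrent requests for a key."""

    def __init__(self, fetch):
        self._fetch = fetch      # async callable: key -> bytes
        self._inflight = {}      # key -> Task for the fetch in progress

    async def get(self, key):
        task = self._inflight.get(key)
        if task is None:
            # First request for this key: start the one real backend fetch.
            task = asyncio.ensure_future(self._fetch(key))
            self._inflight[key] = task
            task.add_done_callback(lambda _: self._inflight.pop(key, None))
        # Later requests just wait on the fetch that is already running.
        return await task


async def demo():
    backend_hits = 0

    async def fake_backend(key):   # stand-in for a real backend request
        nonlocal backend_hits
        backend_hits += 1
        await asyncio.sleep(0.01)
        return b"body of " + key.encode()

    coalescer = Coalescer(fake_backend)
    bodies = await asyncio.gather(*(coalescer.get("/logo.gif") for _ in range(5)))
    return backend_hits, bodies


hits, bodies = asyncio.run(demo())
print(hits, len(bodies))  # 1 5
```

Keying on the URL alone only works when every client may see the same bytes, so this fits static objects and not per-user pages.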
From: brad
2004-04-20 01:10 pm (UTC)

Re: Why proxy?

That aggregation would help with static content, but this is all going to be for dynamic stuff, so we couldn't find enough dups to merge, since everybody's cookies/sessions matter.
From: scosol
2004-04-20 02:54 pm (UTC)

Re: Why proxy?

Oh- so you're sticking this *before* the app-logic- yeah, aggregation can't do anything in front of that stuff hahah
From: xaosenkosmos
2004-04-19 08:04 pm (UTC)
Perl really wants C#'s handy-dandy accessor method magic (get/set). It's so damned convenient.

Before you complain about the C# stuff, always remember the painful nature of network I/O in C.
From: evan
2004-04-20 09:37 am (UTC)
it's the right idea, but i think they got the implementation of it ugly.
From: billiam
2004-04-19 08:18 pm (UTC)
yah, well, i love perl jam
From: eli
2004-04-19 09:43 pm (UTC)
Damn, dude....disc brakes. You'd better represent now :-)
From: insom
2004-04-20 01:37 am (UTC)
Why use async IO for files again? I mean if you're worried about the disk blocking because it's slow, does sendfile() not suffer from that too?
From: brad
2004-04-20 09:02 am (UTC)
Not sendfile to an O_NONBLOCK socket.
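
For anyone who hasn't used it, this is roughly what sendfile() on a non-blocking socket looks like. It is a Linux-only Python sketch rather than the Perl in question; `send_some`, the 1 MB temp file and the socketpair standing in for a client connection are all made up for the demo. It only shows the socket side: the call returns instead of waiting when the socket buffer is full. It does not measure whether the disk read inside sendfile() can stall.

```python
import os
import socket
import tempfile


def send_some(sock, fileobj, offset, remaining):
    """Try to push file bytes to a non-blocking socket with sendfile().

    Returns the number of bytes the kernel accepted. 0 means the socket
    buffer is full and the caller should wait for writability (for
    example via epoll) before trying again.
    """
    try:
        return os.sendfile(sock.fileno(), fileobj.fileno(), offset, remaining)
    except BlockingIOError:
        return 0


SIZE = 1 << 20                      # 1 MB test file
a, b = socket.socketpair()          # a = proxy side, b = pretend slow client
a.setblocking(False)

sent = received = stalls = 0
with tempfile.TemporaryFile() as f:
    f.write(b"x" * SIZE)
    f.flush()
    while sent < SIZE:
        n = send_some(a, f, sent, SIZE - sent)
        if n == 0:
            # Socket buffer full: we got control back instead of blocking.
            stalls += 1
            received += len(b.recv(65536))   # let the client drain a little
        sent += n

while received < SIZE:              # client reads whatever is still queued
    received += len(b.recv(65536))

a.close()
b.close()
print(sent, received)
```

In a real event loop the `n == 0` branch would register the socket for a writable event and move on to other connections instead of reading from the peer.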
From: chadhorn
2004-04-20 03:58 am (UTC)

PHP with Passion

Whilst Perl was once a twinkle in my eye, I have moved on to bigger and better things (PHP for instance) in my life of programming... :)
From: brad
2004-04-20 09:00 am (UTC)

Re: PHP with Passion

From: melusina_shadow
2004-04-20 11:58 am (UTC)
The photo stuff sounds very exciting...I'm a mere geek in training so I don't know what you are talking about half the time but it's still interesting to read. :)
From: (Anonymous)
2004-04-28 04:39 am (UTC)

Why not Apache2 + mod_proxy with big buffers?


Did you think about using an apache2 frontend with mod_proxy in accelerated mode, and mod_rewrite to pass GIFs and static content directly from apache2?

I've been doing it with big mod_perl processes in the backend, and it seems to work great.

I think it does everything you talk about.
From: brad
2004-04-28 09:25 am (UTC)

Re: Why not Apache2 + mod_proxy with big buffers?

I've done that in the past. In fact, most of LiveJournal is load-balanced like that.

The problem is you can't make the buffers big enough for huge files, and it ends up copying the files around a lot, both between machines (or across a socket on the local machine) and into buffers in user/kernel space.