brad's life
Brad Fitzpatrick

Reverse proxies, multiple backends, access controls, ... [Sep. 10th, 2003|10:49 pm]
Brad Fitzpatrick
evan, avva, and I have been having fun with mod_rewrite's external rewrite engine support to do some wacky stuff. Combined with mod_proxy, using mod_rewrite's [P] support, we're going to be switching to a new type of load balancing for LiveJournal that's much smarter than our existing stuff. We'll still use the BIG-IP in front of it all to hit a random alive proxy, but we won't use the BIG-IP for internal load balancing, where load and free connections fluctuate too quickly for the BIG-IP to make a good decision.

That'll all be open source in a couple days, but I've been thinking about something else:

Say we were to serve large files requiring complex access control. We'd need mod_perl to do authentication and authorization, but after that point, we don't want mod_perl wasting its resources feeding a slow client. Of course, we normally put mod_proxy in front of it to buffer and close the backend connection, but if the content's over a couple hundred kb, we'll end up wasting a lot of memory on the front-ends, or locking up the backends.

Now, imagine the large file's on some filesystem already, accessible by a lean, mean webserver like thttpd or TUX, if only it had the correct path (which is invariably going to be incredibly ugly due to load balancing and filesystem directory hashing to avoid large directories).

What I'd like to do is get the request with mod_proxy, send it to backend_slow (mod_perl), have backend_slow do authz/auth checks, then tell mod_proxy, not the client, to do a redirect and get the resource from another backend server, the fast one (thttpd/TUX). Because the fast backend won't have connection-limit problems (or rather, its limits are hundreds of times higher), the proxy won't need a lot of memory, and mod_perl won't be locked up.

So, the question is: is there an existing way to get mod_proxy (or Pound? or Squid? or any reverse proxy?) to do a redirect itself, instead of passing it along to the client? mod_proxy's ProxyPassReverse just rewrites the Location header that gets sent to the client. That's not a solution. We can't ever expose a URL going directly to the fast backend, bypassing access controls.

I imagine we could just hack mod_proxy/Pound/Squid, adding a new internal HTTP status code, but I'd like to think somebody else has done this.
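
A minimal sketch of that flow, written in Python rather than as a patch to mod_proxy, Pound, or Squid. The status code 399, the use of the `Location` header to carry the internal path, and all function names are assumptions made for illustration; none of them come from an existing proxy.

```python
from typing import Callable, Dict, Tuple

# (status, headers, body) as returned by a backend fetch.
Response = Tuple[int, Dict[str, str], bytes]

# Hypothetical status code reserved for proxy-internal redirects.
# It is not a registered HTTP status; any unused code would do.
INTERNAL_REDIRECT = 399


def handle_request(
    path: str,
    fetch_auth_backend: Callable[[str], Response],
    fetch_fast_backend: Callable[[str], Response],
) -> Response:
    """Ask the auth backend first, then follow its internal redirect ourselves."""
    status, headers, body = fetch_auth_backend(path)
    if status != INTERNAL_REDIRECT:
        # Denials, errors, and ordinary pages go to the client unchanged.
        return status, headers, body

    # The auth backend has finished its work and is free for the next request.
    internal_path = headers["Location"]
    status, headers, body = fetch_fast_backend(internal_path)

    # Never leak an internal location to the client.
    headers = {k: v for k, v in headers.items() if k.lower() != "location"}
    return status, headers, body
```

The two fetch callables stand in for real HTTP requests to the mod_perl and thttpd/TUX backends. The client only ever sees the final response, so the ugly internal path stays private.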

Jumping a little ahead, something akin to LVS direct routing would be the best. Not sure how that'd interact, though: go right to mod_perl, and then have mod_perl tinker with some routing/headers and change the path through the system? I'd have to think about that longer... it's definitely more challenging than just hacking a reverse proxy.

Comments:
From: edm
2003-09-11 02:10 am (UTC)

Local redirects

A few thoughts about this approach. The first is that if you return a relative URL in a redirect, many servers in many situations will do a local redirect and fetch the resource themselves. I've not tried with Apache/mod_proxy, but at the very least it'd be worth investigating; if it doesn't do it already, it may be a quick point to hack something in.

Secondly when I last needed to do something like this (Apache/mod_perl site, serve large files to slow (modem) clients -- in this case mp3 files on a music sale site (I don't know if they still use the same technique)), what I did was have the Apache/mod_perl site generate a URL for the file on a thin HTTP server which:

  • Specified what file
  • Included some salt (eg, random text)
  • Included a crypto checksum of the file to retrieve, the salt, a shared secret, some time/date stuff, etc.

Then on the thin HTTP server there was a tiny extension (CGI in this case, but you could probably hack it into the source) which validated that the URL was a "good" redirect (i.e., recalculated the crypto checksum), and if it was, spooled the file out.

That kept the load on the Apache/mod_perl site down to creating the validation token, and allowed the thin HTTP server to validate access with a simple calculation (eg, no database access required).

IIRC we did client side redirects for this, but because there was a thin access control done on the thin HTTP server, it didn't matter -- the redirect URL was only good for the time that we allowed (a few minutes; you could probably make it shorter). Bypassing the access controls would require brute forcing the authentication calculation algorithm being used, which is one reason for including the salt to make it harder.
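
A rough sketch of the token scheme described above, assuming HMAC-SHA256 as the "crypto checksum" (the original checksum algorithm isn't stated) and signing the filename rather than the file contents. The secret value, query parameter names, and function names are all made up for illustration.

```python
import hashlib
import hmac
import secrets
import time
from typing import Optional
from urllib.parse import quote, urlencode

# Assumption: the same secret is configured on both servers.
SHARED_SECRET = b"replace-with-a-real-secret"


def _signature(filename: str, salt: str, expires: int) -> str:
    """Checksum over filename, salt, and expiry, keyed with the shared secret."""
    message = f"{filename}|{salt}|{expires}".encode()
    return hmac.new(SHARED_SECRET, message, hashlib.sha256).hexdigest()


def make_signed_url(filename: str, ttl_seconds: int = 300,
                    now: Optional[float] = None) -> str:
    """Run on the Apache/mod_perl side once the access checks have passed."""
    now = time.time() if now is None else now
    expires = int(now) + ttl_seconds
    salt = secrets.token_hex(8)  # random text, different for every URL
    query = urlencode({
        "salt": salt,
        "expires": expires,
        "sig": _signature(filename, salt, expires),
    })
    return f"/{quote(filename)}?{query}"


def is_valid(filename: str, salt: str, expires: int, sig: str,
             now: Optional[float] = None) -> bool:
    """Run on the thin server: recompute the checksum, no database needed."""
    now = time.time() if now is None else now
    if now > expires:
        return False  # the URL is only good for a few minutes
    expected = _signature(filename, salt, expires)
    # Constant-time comparison, so timing doesn't reveal the correct value.
    return hmac.compare_digest(expected.encode(), sig.encode())
```

The thin server would parse `salt`, `expires`, and `sig` out of the query string and call `is_valid` before spooling the file.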

If I were doing it again now and really cared about the content getting out I'd probably consider using public key crypto instead of a shared secret; four years ago (when I did this) it wasn't as widely available/easily usable, and it didn't seem worth the extra CPU time.

Ewen

From: edm
2003-09-11 02:24 am (UTC)

Re: Local redirects

One more thing I should have mentioned. If you send the right type of redirect code, even a client side redirect won't result in a changed URL displayed by the client. A 302 (temporary) or 303 (get content here) redirect would probably be suitable for this purpose. (If you're embedding the large files in a page it probably doesn't even matter.) You'll want a non-cached redirect in this situation anyway, as you'll need to regenerate the authentication URL after a few minutes.

Also, where I say "the checksum of the file to retrieve,..." I of course mean of the filename (combined with the other stuff). The idea being that it should be trivial to validate, but difficult to generate without knowing some secret information. (Kind of like an NP-hard problem.)

Ewen

From: insom
2003-09-11 03:42 am (UTC)

Bypass mod_perl?

Is there any way you could replace the heavy-weight mod_perl+Apache that you're using for auth[z], with a simple C module? Then have separate frontends just for doing this simple task, and leave full mod_perl ones for more important tasks?

Also, I don't get what you mean about ProxyPassReverse not being good enough. We're using it to front some JSP servers without native Apache connectors, and it doesn't disclose the URLs; in fact, the JSP servers do not have publicly routable addresses.
From: brad
2003-09-11 02:20 pm (UTC)

Re: Bypass mod_perl?

It'd be too much work to do the authz in C. It's very dependent on Perl libraries.

I know ProxyPass[Reverse] work, but I don't think you understood my post. I want the backend to tell mod_proxy to go to another host entirely, without exposing that host or HTTP redirect to the end client.
From: mart
2003-09-11 04:37 am (UTC)

Just invent your own wacky HTTP response code (say, 309 Wacky LJ Redirect) and then hack mod_proxy! You can probably also make it reject the “magic” URLs unless they came back in a Wacky LJ Redirect, thus avoiding the chance that anyone will be able to bypass authentication through accidental exposure of the URL.

I realise “just hack mod_proxy” is not as easy as saying it, but I can't imagine it'd be incredibly hard since it'd just involve creating a subrequest just like any Apache internal redirect, I assume.
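
The "reject magic URLs" rule could look something like this. It's only a sketch of the check, not an Apache subrequest: the 309 code is the made-up one from above, and the `/_internal/` prefix is an assumed naming convention for URLs that only the proxy may fetch.

```python
from typing import Optional

# Hypothetical values; neither is part of any HTTP standard.
WACKY_REDIRECT = 309
MAGIC_PREFIX = "/_internal/"  # assumed convention for internal-only URLs


def may_fetch(path: str, via_backend_status: Optional[int]) -> bool:
    """Decide whether the proxy may fetch `path`.

    `via_backend_status` is None when a client asked for the path directly,
    or the status code of the backend response that pointed at it.
    """
    if not path.startswith(MAGIC_PREFIX):
        return True  # ordinary URLs are always allowed through
    # Magic URLs are only reachable by following a Wacky LJ Redirect.
    return via_backend_status == WACKY_REDIRECT
```

With this in place, a leaked magic URL is useless to a client: `may_fetch("/_internal/ab/cd/file", None)` is False.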

From: nbarkas
2003-09-11 11:14 am (UTC)
Would it be practical to have the proxy do any authentication? I was thinking the client could be given some kind of cookie, and when it tries to grab files from the stripped down fast servers, the proxy between it and them could do some cryptographic challenge stuff with the client and ask mod_perl again if that cookie was allowed to get the path it wanted. Or maybe it could take care of authenticating the request itself if it was a smart enough proxy.

Also, Squid does rewrites as a reverse proxy.
From: brad
2003-09-11 12:51 pm (UTC)
Squid doesn't seem to do what I want. Those docs you linked are the same as Apache's ProxyPassReverse.

Cryptographic hashes in ugly, temporary URLs aren't good. Kills client-side caching.