brad's life
Brad Fitzpatrick

LWPx::ParanoidAgent [May. 18th, 2005|11:55 pm]

This afternoon/evening I learned LWP inside and out, in order to make LWPx::ParanoidAgent, a subclass of LWP::UserAgent that protects you from evil. In particular:

-- won't connect to private, loopback, or multicast IPs, including on redirects

-- configurable blacklist of hostnames or hostname regexps

-- avoids a malicious or accidental tarpitting webserver that sends 1 byte every 9 seconds, which slips past LWP::UserAgent's timeout parameter if you set it at, say, 10 seconds, because that timeout only fires after that many seconds with no activity. (My ParanoidAgent has a global timeout that covers all redirects too.)
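The first two checks are simple to sketch. What follows is a minimal plain-Perl illustration, not ParanoidAgent's actual code: the helper names `is_bad_ipv4` and `host_is_blocked` are hypothetical, only IPv4 dotted quads are handled, and a real agent would have to run the IP check on every address a hostname resolves to, and again on every redirect hop.

```perl
use strict;
use warnings;

# Hypothetical helper: true if a dotted-quad IPv4 address is private
# (RFC 1918), loopback, or multicast. Anything unparseable counts as bad.
sub is_bad_ipv4 {
    my ($ip) = @_;
    my @o = $ip =~ /^([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\z/
        or return 1;
    return 1 if grep { $_ > 255 } @o;
    return 1 if $o[0] == 10;                                  # 10.0.0.0/8
    return 1 if $o[0] == 172 && $o[1] >= 16 && $o[1] <= 31;   # 172.16.0.0/12
    return 1 if $o[0] == 192 && $o[1] == 168;                 # 192.168.0.0/16
    return 1 if $o[0] == 127;                                 # 127.0.0.0/8 loopback
    return 1 if $o[0] >= 224 && $o[0] <= 239;                 # 224.0.0.0/4 multicast
    return 0;
}

# Hypothetical helper: each rule is either a plain hostname, compared
# case-insensitively, or a qr// regexp matched against the host.
sub host_is_blocked {
    my ($host, @rules) = @_;
    for my $rule (@rules) {
        if (ref($rule) eq 'Regexp') {
            return 1 if $host =~ $rule;
        } else {
            return 1 if lc($host) eq lc($rule);
        }
    }
    return 0;
}
```

With these, `host_is_blocked("brad.lj", qr/\.lj$/)` is true while `brad.lj.sixapart.com` passes, and `is_bad_ipv4("192.168.64.35")` is true, mirroring the first few cases of the test suite below.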

I ended up having to fork LWP::Protocol::http, since it's pretty much just one huge function and I needed to change it. So LWPx::ParanoidAgent is a subclass, and it allows only two schemes, http and https, which map to the protocols LWPx::Protocol::http_timelimit and https_timelimit. (The https code is about 20 lines: it just calls the http code after a different socket is made.) Proxy support is explicitly removed; if you want to use a proxy, do your paranoia on the proxy.
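The scheme whitelist and the proxy refusal boil down to a small lookup. The sketch below is hypothetical plain Perl that stays independent of LWP (real LWP registers protocol classes through `LWP::Protocol::implementor`); the helper `protocol_class_for` and its error messages are made up, and only the two class names come from the paragraph above.

```perl
use strict;
use warnings;

# Only these two schemes are allowed; each maps to a time-limited
# protocol class (class names from the post, mapping code hypothetical).
my %PROTOCOL_FOR = (
    http  => 'LWPx::Protocol::http_timelimit',
    https => 'LWPx::Protocol::https_timelimit',
);

# Hypothetical helper: returns the protocol class for a URL's scheme,
# or dies if the scheme isn't allowed or a proxy is configured.
sub protocol_class_for {
    my ($url, $proxy) = @_;
    die "proxies not supported; be paranoid on the proxy instead\n"
        if defined $proxy;
    my ($scheme) = $url =~ m!^([A-Za-z][A-Za-z0-9+.-]*):!
        or die "no scheme in URL\n";
    my $class = $PROTOCOL_FOR{ lc $scheme }
        or die "scheme '$scheme' not allowed\n";
    return $class;
}
```

In this sketch anything other than http or https, such as `ftp://` or `file://`, is rejected outright, so local-file URLs never reach a protocol handler.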

This will be released on CPAN at about the same time as OpenID::Consumer, which lets you specify your own LWP::UserAgent subclass. Currently LiveJournal uses a version of this called SafeAgent.pm, but it has-a LWP::UserAgent inside it rather than being a subclass, so it's annoying to use since it doesn't always work everywhere. It also uses alarm(), which always sucks and isn't portable. ParanoidAgent instead just tracks the time remaining and sets the select() timeouts accordingly.
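The "time remaining" approach can be sketched with core Perl alone. This is a minimal illustration under assumptions, not ParanoidAgent's code: `read_until_deadline` is a hypothetical helper that takes an absolute deadline instead of a per-read timeout, so each `select()` (here via `IO::Select`'s `can_read`) waits only for whatever time is left, and a peer trickling one byte every few seconds can't stretch the total.

```perl
use strict;
use warnings;
use IO::Select;
use Time::HiRes qw(time);

# Hypothetical helper: read from $fh until EOF or until the absolute
# $deadline (epoch seconds) passes. Returns ($data, $reason).
sub read_until_deadline {
    my ($fh, $deadline) = @_;
    my $sel = IO::Select->new($fh);
    my $buf = '';
    while (1) {
        my $remaining = $deadline - time();
        return ($buf, 'timeout') if $remaining <= 0;
        # Wait at most the time left; an empty list means nothing arrived
        # (or select was interrupted), so loop and recompute the remainder.
        my @ready = $sel->can_read($remaining);
        next unless @ready;
        my $n = sysread($fh, my $chunk, 4096);
        return ($buf, "error: $!") unless defined $n;
        return ($buf, 'eof') if $n == 0;
        $buf .= $chunk;
    }
}
```

Computing `$deadline = time() + $timeout` once per request and passing that same value to every redirect hop gives a global timeout rather than a per-read one.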

I also had to write a little httpd (run from xinetd to keep it easy) that lets me specify redirects and timeouts in the URL.
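A helper like that is small enough to sketch. The URL scheme here is inferred from the paths in the test suite below, and the tarpit meaning of `/1.5` (five bytes, one per second) is a guess that matches the test's "5 second tarpit" comment; `plan_for_path` and `serve_one` are hypothetical names, and the real helper may work differently.

```perl
use strict;
use warnings;
use IO::Handle;

# Hypothetical URL scheme, inferred from the test suite's paths:
#   /redir/URL     redirect to URL immediately
#   /redir-N/URL   sleep N seconds, then redirect to URL
#   /D.N           tarpit: send N bytes, one every D seconds (assumed)
sub plan_for_path {
    my ($path) = @_;
    if ($path =~ m!^/redir(?:-(\d+))?/(.+)\z!) {
        return { action => 'redirect', delay => $1 || 0, location => $2 };
    }
    if ($path =~ m!^/(\d+)\.(\d+)\z!) {
        return { action => 'tarpit', interval => $1, count => $2 };
    }
    return { action => 'ok' };
}

# Handle one request inetd-style: read the request line from $in and
# write the whole response to $out.
sub serve_one {
    my ($in, $out) = @_;
    my $request_line = <$in>;
    $request_line = '' unless defined $request_line;
    my ($path) = $request_line =~ m!^GET\s+(\S+)!;
    $path = '/' unless defined $path;
    my $plan = plan_for_path($path);
    $out->autoflush(1);    # so tarpit bytes go out one at a time
    if ($plan->{action} eq 'redirect') {
        sleep $plan->{delay};
        print $out "HTTP/1.0 302 Found\r\nLocation: $plan->{location}\r\n\r\n";
    } elsif ($plan->{action} eq 'tarpit') {
        print $out "HTTP/1.0 200 OK\r\nContent-Type: text/plain\r\n\r\n";
        for (1 .. $plan->{count}) {
            sleep $plan->{interval};
            print $out "x";
        }
    } else {
        print $out "HTTP/1.0 200 OK\r\nContent-Type: text/plain\r\n\r\nok\n";
    }
}
```

Under xinetd, stdin and stdout are the client socket, so the entry point would be `serve_one(\*STDIN, \*STDOUT)`. Because `.+` is greedy, a chained path like `/redir-1/http://host/redir-1/...` redirects to everything after the first prefix, which is what lets the test stack several slow redirects.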

The test suite was fun:

#!/usr/bin/perl
use strict;
use warnings;
use LWPx::ParanoidAgent;
use Time::HiRes qw(time);
use Test::More 'no_plan';

my ($t1, $td);
my $delta = sub { printf " %.03f secs\n", $td; };

my $ua = LWPx::ParanoidAgent->new;
ok((ref $ua) =~ /LWPx::ParanoidAgent/);

my $HELPER_SERVER = "http://brad.lj.sixapart.com:8001";
$ua->blocked_hosts(
    qr/\.lj$/,
);

my $res;

# black-listed via blocked_hosts
$res = $ua->get("http://brad.lj/");
print $res->status_line, "\n";
ok(! $res->is_success);

# checking that port isn't affected
$res = $ua->get("http://brad.lj:80/");
print $res->status_line, "\n";
ok(! $res->is_success);

# this domain is okay. sixapart.com isn't blocked
$res = $ua->get("http://brad.lj.sixapart.com/");
print $res->status_line, "\n";
ok( $res->is_success);

# internal. bad. blocked by default by module.
$res = $ua->get("http://192.168.64.35/");
print $res->status_line, "\n";
ok(! $res->is_success);

# okay
$res = $ua->get("http://danga.com/temp/");
print $res->status_line, "\n";
ok( $res->is_success);

# localhost is blocked, case insensitive
$res = $ua->get("http://LOCALhost/temp/");
print $res->status_line, "\n";
ok(! $res->is_success);

# redirecting to invalid host
$res = $ua->get("$HELPER_SERVER/redir/http://192.168.64.35/");
print $res->status_line, "\n";
ok(! $res->is_success);

# redirect with tarpitting
print "4 second redirect tarpit (tolerance 2)...\n";
$ua->timeout(2);
$res = $ua->get("$HELPER_SERVER/redir-4/http://www.danga.com/");
ok(! $res->is_success);

# lots of slow redirects adding up to a lot of time
print "Three 1-second redirect tarpits (tolerance 2)...\n";
$ua->timeout(2);
$t1 = time();
$res = $ua->get("$HELPER_SERVER/redir-1/$HELPER_SERVER/redir-1/$HELPER_SERVER/redir-1/http://www.danga.com/");
$td = time() - $t1;
$delta->();
ok($td < 2.5);
ok(! $res->is_success);

# redirecting a bunch and getting the final good host
$res = $ua->get("$HELPER_SERVER/redir/$HELPER_SERVER/redir/$HELPER_SERVER/redir/http://www.danga.com/");
ok( $res->is_success && $res->request->uri->host eq "www.danga.com");

# dying in a tarpit
print "5 second tarpit (tolerance 2)...\n";
$ua->timeout(2);
$res = $ua->get("$HELPER_SERVER/1.5");
ok(! $res->is_success);

# making it out of a tarpit.
print "3 second tarpit (tolerance 4)...\n";
$ua->timeout(4);
$res = $ua->get("$HELPER_SERVER/1.3");
ok( $res->is_success);

Comments:
From: evan
2005-05-19 08:35 am (UTC)
isn't has-a fine for perl as long as you provide all of the same methods? just wondering.
From: mart
2005-05-19 08:57 am (UTC)

It is, but SafeAgent doesn't. (or at least didn't, last I looked.) Perl binds on the name without caring if it's overridden or what.

Also, I seem to remember that SafeAgent works slightly differently, so it doesn't mimic LWP::UserAgent's interface.
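The has-a versus is-a point in this thread can be shown with two made-up classes: `Agent` stands in for LWP::UserAgent and `HasAgent` is a has-a wrapper that forwards every method it doesn't define through AUTOLOAD. This is a hypothetical sketch, not SafeAgent's code. Forwarded calls like `get` work, but `isa` and `can` checks, which code expecting an LWP::UserAgent subclass may rely on, fail.

```perl
use strict;
use warnings;

# Stand-in for LWP::UserAgent (hypothetical class).
package Agent;
sub new { my $class = shift; return bless {}, $class }
sub get { my ($self, $url) = @_; return "fetched $url" }

# Has-a wrapper (hypothetical): holds an Agent and forwards unknown
# method calls to it via AUTOLOAD. It is not a subclass of Agent.
package HasAgent;
our $AUTOLOAD;
sub new { my $class = shift; return bless { ua => Agent->new }, $class }
sub DESTROY { }    # don't forward object destruction
sub AUTOLOAD {
    my $self = shift;
    (my $name = $AUTOLOAD) =~ s/.*:://;
    return $self->{ua}->$name(@_);
}

package main;
```

Here `HasAgent->new->get(...)` behaves like the wrapped agent, while `HasAgent->new->isa('Agent')` and `->can('get')` both return false, since method lookup never consults AUTOLOAD for those checks.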

From: pos_le_terrible
2005-05-19 12:14 pm (UTC)
"Perl binds on the name without caring if it's overridden or what."

You can use the interface.pm pragma for that. It's a very simple yet really useful module.

This is one of the two modules I use all the time when doing OO stuff in Perl, the other one being enum::fields (much more efficient and saner than the fields pragma).
From: torgo_x
2005-05-19 09:02 am (UTC)

Nice!

I'm glad to see I'm not the only person who's poked around inside LWP and survived.
From: pos_le_terrible
2005-05-19 12:34 pm (UTC)
"And it uses alarm(), which just always sucks"

Why? (other than not being portable)
Does it have performance issues?
Are you planning to use select() and wait for readability instead?
From: brad
2005-05-19 05:16 pm (UTC)
Well, it uses signals, and signals suck in Perl... with unsafe signals, Perl crashes. With safe signals, alarm fucks up the return values of long, blocking system calls and doesn't really work. (See the Sys::SigAction manpage)
From: pos_le_terrible
2005-05-19 06:15 pm (UTC)
The old behavior (unsafe signals) can be turned back on since perl 5.8.1, by setting PERL_SIGNALS=unsafe. But, thinking about it, it's true that I tend to do too many things in my signal handlers. That's a risky habit!

Thanks for the link.