brad's life
Brad Fitzpatrick

[ website | bradfitz.com ]

Naming twins in Python & Perl [Jan. 6th, 2008|02:16 pm]

Last night at Beau's party, one of Beau's guests mentioned he's expecting twins shortly, which is why his wife wasn't at the party.

I drunkenly suggested he name his kids two names that were anagrams of each other. I then wandered off downstairs to find such suitable names.

Because I'm supposed to be working in Python these days, not Perl, I gathered what little Python knowledge I had and wrote out:
#!/usr/bin/python

by_anagram = {}

names_file = open("dist.male.first")
for line in names_file:
    # lines in file look like:
    # WILLIAM        2.451 14.812      5
    # we want just the first field.
    name = (line.split())[0]
    letters = [letter for letter in name]
    letters.sort()
    sorted_name = "".join(letters)
    if not sorted_name in by_anagram:
        by_anagram[sorted_name] = []
    by_anagram[sorted_name].append(name)

for sorted_name in by_anagram:
    if len(by_anagram[sorted_name]) < 2:
        continue

    print by_anagram[sorted_name]
Not so happy with it, but it worked, and I printed out the results and brought them up to the guy:
['TROY', 'TORY']
['CLAY', 'LACY']
['JEFFREY', 'JEFFERY']
['ANGEL', 'GALEN']
['FOREST', 'FOSTER']
['ANDRE', 'DAREN', 'ARDEN']
['LEROY', 'ELROY']
['LEONARD', 'RENALDO', 'LEANDRO']
['ARIEL', 'ARLIE']
['BRENDAN', 'BRANDEN']
['WARREN', 'WARNER']
['DEAN', 'DANE']
['CHRISTOPER', 'CRISTOPHER']
['COLE', 'CLEO']
['CARLO', 'CAROL']
['ELMER', 'MERLE']
['REUBEN', 'RUEBEN']
['JOSEPH', 'JOESPH', 'JOSPEH']
['COREY', 'ROYCE']
['JASON', 'JONAS']
['RAMON', 'ROMAN']
['JAMIE', 'JAIME']
['CARMELO', 'MARCELO']
['BYRON', 'BRYON']
['LEON', 'NOEL', 'OLEN']
['NEAL', 'LANE']
['MICHAEL', 'MICHEAL', 'MICHALE']
['KEITH', 'KIETH']
['BERT', 'BRET']
['BRIAN', 'BRAIN']
['OLIN', 'LINO']
['DION', 'DINO']
['DANA', 'ADAN']
['RONALD', 'ROLAND', 'ARNOLD']
['ISRAEL', 'ISREAL']
['DARNELL', 'RANDELL']
['ANTOINE', 'ANTIONE']
['ORLANDO', 'ROLANDO', 'ARNOLDO']
Just now, I was wondering what the equivalent would look like in Perl, and wrote:
#!/usr/bin/perl

use strict;
open (my $fh, "dist.male.first") or die;
my %by_anagram;
while (<$fh>) {
    chomp;
    s/\s.*//;
    my $name = $_;
    my $sorted_name = join('', sort split //, $name);
    push @{$by_anagram{$sorted_name}}, $name;
}

foreach my $sn (grep { @{$by_anagram{$_}} > 1 } keys %by_anagram) {
    print "@{$by_anagram{$sn}}\n";
}
In particular, I like about Python that errors-are-exceptions is the norm. But I like that regexps are built into Perl. (also hate Python's general hating on functional programming, unrelated to this post) I'm sure my Python could be way shorter, too. Anybody want to post either their short Python version, or their more-idiomatic Python version?
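
For what it's worth, here's one shot at the more-idiomatic version asked for above. It's a sketch written for Python 3, not the script from this post: `group_anagrams`, `first_fields`, and `print_anagram_groups` are names made up for this example, and it leans on `collections.defaultdict` and `sorted()` to replace the manual membership check and list sorting.

```python
from collections import defaultdict


def group_anagrams(names):
    """Group names that are anagrams of each other, keeping groups of 2+."""
    by_anagram = defaultdict(list)
    for name in names:
        # Sorting the letters gives every anagram of a name the same key.
        by_anagram["".join(sorted(name))].append(name)
    return [group for group in by_anagram.values() if len(group) > 1]


def first_fields(lines):
    """Yield the first whitespace-separated field of each non-blank line."""
    for line in lines:
        fields = line.split()
        if fields:
            yield fields[0]


def print_anagram_groups(path):
    """Print one anagram group per line for a names file like dist.male.first."""
    with open(path) as names_file:
        for group in group_anagrams(first_fields(names_file)):
            print(group)
```

Calling `print_anagram_groups("dist.male.first")` should find the same groups as the scripts above, though not necessarily in the same order.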

Also interesting: (fastest of 3 consecutive runs each)
sammy$ time ./ananames.pl > /dev/null

real	0m0.026s
user	0m0.024s
sys	0m0.004s

sammy$ time ./ananames.py > /dev/null

real	0m0.043s
user	0m0.036s
sys	0m0.008s

RT is so slow [Sep. 29th, 2007|12:16 pm]

Dear Jesse Vincent,

rt.cpan.org is still unusably slow. I don't care about the specs of the box, how it's configured, or who runs it. It reflects poorly on your company and your software and I'd be embarrassed if I were you to have rt.cpan.org still running.

Please host this yourself, or buy perl.org a new box for this, and maintain it for them. Or take your name off the front page of rt.cpan.org to distance yourself from it. Either way.

(just grumpy because people are still filing tickets in it, most of which I want to close with WONTFIX or REJECTED, but the system's too slow to even let me do that....)

So note to everybody else: I will ignore your rt.cpan.org tickets and patches. Use the mailing lists. My inbox is my ticketing system.

Lovingly always,
Brad

rejecting residential zombies with qpsmtpd [Aug. 14th, 2007|03:29 pm]

Qpsmtpd continues to be fun. For the past week or so, danga.com's mail has been running with qpsmtpd in front, instead of Postfix, and qpsmtpd now handles:

-- normal logging to syslog
-- logging all connections in structured/indexed way to mysql, many columns per connection
-- SMTP AUTH, letting me/friends use danga.com as our outgoing mail server
-- check_earlytalker (reject clients who speak before spoken to, as a lot of spammers do)
-- counting bad commands (reject clients who HTTP POST to port 25 via open web proxy)
-- DNS RBL checks
-- rejecting non-members from posting to GNU Mailman lists
-- not letting people send mail as @danga.com, @bradfitz.com, etc, etc without SMTP-AUTH
-- rejecting mails to non-existent users (no bounces later, 5xx immediately)
-- spam checks via spamassassin
-- virus (and phishing) checks via ClamAV's clamd
-- queuing to postfix (which then delivers to local users via .forward/.procmailrc/Maildir, whatever)

My Postfix config, meanwhile, has shrunk to barely anything.

But after analyzing my logs in MySQL, I still wasn't happy. I saw that almost all my spam (~98%) came from residential DHCP/DSL/Cable connections, with stupid-looking, easily-recognizable hostnames, like:
+---------------------------------------------------+
| remotehost                                        |
+---------------------------------------------------+
| pool-71-187-1-147.nwrknj.fios.verizon.net         | 
| APointe-a-Pitre-103-1-66-27.w80-8.abo.wanadoo.fr  | 
| 80-219-140-217.dclient.hispeed.ch                 | 
| dsl-189-133-79-229.prod-infinitum.com.mx          | 
| c-71-196-65-212.hsd1.fl.comcast.net               | 
| dsl-189-174-204-42.prod-infinitum.com.mx          | 
| 150.218.111.218.klj03-home.tm.net.my              | 
| 201-25-211-207.fnsce702.dsl.brasiltelecom.net.br  | 
| d83-186-127-150.cust.tele2.be                     | 
| 24-159-245-214.dhcp.mdsn.wi.charter.com           | 
| aolclient-68-202-241-155.aol.cfl.res.rr.com       | 
| c-71-61-26-51.hsd1.pa.comcast.net                 | 
| i05v-87-90-179-109.d4.club-internet.fr            | 
| 85.137.89.143.dyn.user.ono.com                    | 
| c-76-100-225-155.hsd1.md.comcast.net              | 
| dslb-088-074-000-186.pools.arcor-ip.net           | 
| pD955ED51.dip.t-dialin.net                        | 
| 183.42.33.65.cfl.res.rr.com                       | 
| 82-42-29-1.cable.ubr02.knor.blueyonder.co.uk      | 
| client-200.121.44.138.speedy.net.pe               | 
| c-68-58-108-131.hsd1.in.comcast.net               | 
| host81-154-148-241.range81-154.btcentralplus.com  | 
| 216-199-78-234.atl.fdn.com                        | 
| 218.206.98-84.rev.gaoland.net                     | 
| cpe-65-28-157-135.bak.res.rr.com                  | 
| 82-45-185-97.cable.ubr01.chel.blueyonder.co.uk    | 
| KH222-156-64-178.adsl.dynamic.apol.com.tw         | 
| dslb-084-056-113-198.pools.arcor-ip.net           | 
| 195.64.189.72.cfl.res.rr.com                      | 
| dsl-189-179-129-154.prod-infinitum.com.mx         | 
| cpe-24-175-188-97.stx.res.rr.com                  | 
| 78-56-100-93.ip.zebra.lt                          | 
| 75-131-161-195.dhcp.spbg.sc.charter.com           | 
| host-81-190-121-194.gdynia.mm.pl                  | 
| c9112907.rjo.virtua.com.br                        | 
| ppp-124.120.110.148.revip2.asianet.co.th          | 
| dsl-189-144-11-85.prod-infinitum.com.mx           | 
| 201-95-47-232.dsl.telesp.net.br                   | 
| 202.31.101-84.rev.gaoland.net                     | 
| i05v-212-194-122-1.d4.club-internet.fr            | 
| 201-88-68-211.cbace700.dsl.brasiltelecom.net.br   | 
| dslb-082-083-049-211.pools.arcor-ip.net           | 
| 89.140.22.68.static.user.ono.com                  | 
| cpe-76-170-82-98.socal.res.rr.com                 | 
| 112.75.128.219.broad.fs.gd.dynamic.163data.com.cn | 
| S01060015e97b2781.du.shawcable.net                | 
| c-24-8-151-233.hsd1.co.comcast.net                | 
| 20150032076.user.veloxzone.com.br                 | 
| host-69-221-111-24.midco.net                      | 
| 67-130-61-6.dia.static.qwest.net                  | 
+---------------------------------------------------+

So I wrote a little Perl module (to be released) that incorporates a few dozen tests and regexps, looking for the IP address (or part of it) encoded in the hostname in any number of totally f'ed up ways, and also double-checking that the user isn't some home Linux user running their own mail server... if their HELO hostname resolves to their source IP, I allow an ugly reverse DNS hostname. You often can't control your reverse. And forward DNS is free, so it's not an unreasonable barrier to entry.
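
To make the idea concrete, here is a rough Python sketch of the "IP encoded in the hostname" test. It is not the Perl module mentioned above (which has a few dozen tests); it only checks a handful of encodings, and `ip_encodings`, `looks_residential`, and `should_reject` are hypothetical names. The HELO lookup is assumed to be done by the caller, which passes in whatever addresses the HELO hostname resolved to.

```python
def ip_encodings(ip):
    """Return a few ways a dotted-quad IPv4 address shows up inside rDNS names."""
    octets = ip.split(".")
    padded = [octet.zfill(3) for octet in octets]
    encodings = set()
    for parts in (octets, padded):
        for sep in (".", "-"):
            encodings.add(sep.join(parts))            # e.g. 71-187-1-147
            encodings.add(sep.join(reversed(parts)))  # e.g. 147.1.187.71
    encodings.add("".join(padded))                    # e.g. 071187001147
    encodings.add("".join("%02x" % int(octet) for octet in octets))  # hex form
    return encodings


def looks_residential(ip, reverse_dns):
    """True if the reverse DNS name embeds the client's own IP address."""
    host = reverse_dns.lower()
    return any(encoding in host for encoding in ip_encodings(ip))


def should_reject(ip, reverse_dns, helo_resolved_ips):
    """Reject IP-in-hostname clients unless their HELO name resolves back to them.

    helo_resolved_ips: set of addresses the HELO hostname resolved to
    (looked up by the caller; this sketch does no DNS itself).
    """
    if ip in helo_resolved_ips:
        return False  # forward DNS matches: allow the ugly reverse name
    return looks_residential(ip, reverse_dns)
```

For example, `looks_residential("71.187.1.147", "pool-71-187-1-147.nwrknj.fios.verizon.net")` is true, where the IP is simply read back out of a hostname from the table above.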

Check out this table:
http://bradfitz.com/hacks/antispam/lj-example-table.txt

In conclusion, a lot can be learned from the (IP, reverse DNS of that IP, declared HELO host) tuple. I'm now rejecting 50% of incoming connections at the MAIL FROM stage (have to give them a chance to AUTH after HELO), based on the (IP, reverse DNS, HELO host) alone. I reject half of the remaining 50% (25% of total) via spamassassin, check_earlytalker, and dnsbl, and then I queue/deliver the last 25% of the total email to users on danga.com.

Yay qpsmtpd!

Random Tokyo Observations [Apr. 5th, 2007|09:18 pm]

This list is by no means complete, and true to the <ul> tag's name, it's an unordered list. So... random observations on Tokyo:
  • YAPC::Asia was fun.
  • Here are my slides. And in Japanese too! Thanks, translators! (it's less of a regurgitation of previous slides than normal... lot of new stuff at the end, and early stuff is condensed a lot)
  • Cask rocks. Whitaker, you'd have loooooved it. And at only ~$20/drink, what a deal! :P
  • While I'm not one of those people that's all into Asian girls, I'm not against Asian girls... but what I really respect is the whole presentation: fancy clothes, makeup, accessories, tall socks, shiny big boots, skirts, elaborate hair... these girls really get into it. It's actually quite impressive. Makes me feel like a slob, even when I'm what-I-feel-is dressed up.
  • I know maybe a dozen Japanese words now, but my favorite is definitely 出口. I don't know how you say it, but I say "double trident walking box". It means "exit" and I see it everywhere. It got me successfully out of a bathroom that had two matching doors. I think the other one was a maintenance closet.
  • They drive on the left side? They also walk on the left side. Never knew that.
  • Good service.
  • Being surrounded by Japanese writing that I can't even pronounce or recognize or even attempt to understand is somewhat demoralizing. Fortunately the subway is in Rōmaji. Phew.
  • Everything on little trays. Bills, money, passports. Trays, trays, trays.
  • The whole business card exchange formality fanciness.
  • Bowing. (which I thought I'd have trouble adjusting to, but came without even thinking about it... weird.)
  • Everybody in suits & ties.
  • Face masks. If you're sick, you wear a face mask. Otherwise you're just an asshole that should be shunned or something. Because seriously, tons of face masks... at any one time, you can look around and see several of them.
  • Their phones suck. Big, boxy, plastic, not-shiny, large-lines. They look cheap, not sexy/sleek. This place is the future in so many regards (especially timezone!), so why are the phones so crap looking? Mystery.
  • My hotel is the future.
  • Except my hotel's TV turns on every time I enter the room. So then I have to walk over and turn off the game show or whatever's on. I don't know how to disable this feature of my hotel room. I don't want the TV on whenever I enter.
  • What are all those buttons on the toilet? I'm afraid to press them.
  • The Engrish cracks my shit up. This might deserve its own post, with photos. I'll tease you with one: "Please shut the door when you take a bath. Because hot air of the bathroom makes a fire alarm ring." With a picture of a shower, not a bath. And arguably there are two fire alarm sensors here, so their use of the indefinite article "a" could be construed as correct, but there's just one alarm... or the system is one... or it's just colloquial to treat all fire alarms as singular. Like everybody in my Russian class messed up on a recent test thinking одежда (clothes) is plural, even though we could see the ending and know it's not. But I guess "clothing" is singular. In any case, I find myself over-analyzing all the Engrish.
  • I notice white people. Sometimes we pass on the street and make understanding eye contact with each other. ("What's your story?") Makes me think of this comic kinda. Actually hover your mouse on that comic. That alttext inspired me to start up a conversation recently (in Belize) in which I totally got shot down, but it was oh-so-fun and worth it for the story, so I totally don't regret it. <3 xkcd.
  • Vending machines.
  • Vending machines take $100 bills (10,000 Yen).
  • $5 coins, $1 coins, etc... Coins are worth stuff.
  • Japanese addresses are pure chaos. Craziness. Especially when you can't read Japanese, which would only make it slightly more sane.
  • ......
This post is long enough. I'll stop.

Conference is now done. It's 10 pm... should I go back out? Not sure what I'd do, and I'm already fading pretty quick. I think I'll wake up early and go to the fish market which keeps getting recommended to me.

Detecting equivalence of audio files [Dec. 29th, 2006|01:44 pm]

After several days of hacking on MogileFS, I decided to switch gears today and work on personal stuff. I boiled my personal problems down to:
  • getting all my DVDs ripped
  • consolidating three computers' mp3s into one unified, tagged collection
  • finishing brackup
Somehow I decided that the answer to all the above was more disk space, so I bought the Norco DS-1220 which [info]gholam had recommended (after deciding that the Port Multiplier support in Linux was basically mature). And researched the best disks to populate it with, but can get those later (Western Digital 500GB WD-5000YS)

After all that, I remembered I had a 300 GB external drive that I could use for mp3 consolidation scratch space. So on to that project....

Previously, my canonical location for mp3s was on my home server. Then my laptop kinda became the new place (and where I had everything auto-tagged, which I wrote on an airplane). Then I have some mp3s on my desktop at home. In other words: fucking gross.

Problem statement: How to merge all my music together?

Sub-problem: How to tell if two files are the same, if their paths are different, and their checksums are different (because, say, one's been tagged)?

Answer: decode the mp3/ogg to stdout, rather than a soundcard, and checksum the audio stream! (source: audmd5)

Demo:

$ md5sum "sammy/Weezer/02 Pinkerton/01 Tired of Sex.mp3"
b0298cdf1c2135f13d788863cb221ca3
$ md5sum "laptop/Weezer/1996-Pinkerton/Weezer - Pinkerton - 01 - Tired of Sex.mp3"
aa637f841945da67c2aad1f8c2b4ce16
$ audmd5 "sammy/Weezer/02 Pinkerton/01 Tired of Sex.mp3"
8c0952de1e8d13c3ab079adc4a21a400
$ audmd5 "laptop/Weezer/1996-Pinkerton/Weezer - Pinkerton - 01 - Tired of Sex.mp3"
8c0952de1e8d13c3ab079adc4a21a400

... one more tool in my arsenal to fight my mp3 collection.
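
The trick can be sketched in a few lines of Python. This is not the audmd5 source: it assumes `ffmpeg` is installed and uses it as the decoder (raw signed 16-bit PCM written to stdout), and the function names are made up for this example. Checksums are only comparable when both files go through the same decoder and settings.

```python
import hashlib
import subprocess


def md5_of_chunks(chunks):
    """MD5 an iterable of byte strings without holding them all in memory."""
    digest = hashlib.md5()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def read_chunks(stream, size=64 * 1024):
    """Yield successive blocks from a binary stream until it is exhausted."""
    while True:
        chunk = stream.read(size)
        if not chunk:
            break
        yield chunk


def audio_md5(path):
    """Checksum the decoded audio, ignoring tags and container differences.

    Assumption: ffmpeg is on PATH; any decoder that writes PCM to stdout works.
    """
    cmd = ["ffmpeg", "-v", "error", "-i", path, "-f", "s16le", "-"]
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)
    try:
        checksum = md5_of_chunks(read_chunks(proc.stdout))
    finally:
        proc.stdout.close()
        returncode = proc.wait()
    if returncode != 0:
        raise RuntimeError("decoder failed for %r" % path)
    return checksum
```

Two files with matching `audio_md5()` results but different `md5sum` output are candidates for being the same recording with different tags.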

LJ Talk activity (or, ejabberd vs djabberd) [Oct. 16th, 2006|12:40 pm]

I've been watching our LJ Talk ganglia stats and also comparing them to the Jabber.org status (which runs ejabberd).

Our memory usage, even with a known memory leak, is way better. ejabberd seems to take 184 kB/connection, while djabberd is currently using 34 kB/connection. (which includes leaked data .... when it starts it's closer to 5 kB/connection)

In a week or two it looks like our connected clients will overtake jabber.org's too, at least with our current rate of growth. They currently peak at ~10,000 users. Our peak, currently at 4,000 users, keeps climbing each day, from a peak of just 1,000 a few days ago.

At least it's really easy now to track down memory leaks, using Devel::Gladiator and $^P |= 0x200, and Devel::Peek::CvGV .....

All objects in memory ...
( Large dump.... )

So that'll give me something to do on the plane, too.

greylisting 4xx patterns [Aug. 17th, 2006|04:01 pm]

We're building a list of error messages as given out by greylisting email servers so we can pattern-match on it and re-schedule the email exactly when we're told it's okay to.

Here's the patterns we've seen so far:
451 Greylisting enabled, try again in 1 minutes
451 4.7.1 Greylisting in action, please come back in 00:09:00
451 4.7.1 Greylisting in action, please come back later
450 <xxx@xxx.com>: Recipient address rejected: Greylisted for 181 seconds
450 4.7.1 <xxx@xxx.com>: Recipient address rejected: Greylisted for 300 seconds (see http://isg.ee.ethz.ch/tools/postgrey/help/xxxxxx.html)
450 <livejournal.com[204.9.177.18]>: Client host rejected: Policy Rejection- GreyList learning. Please try later.
450 <xxx@xxx.com>: Recipient address rejected: Policy Rejection- Hotkey Greylisting in progress ... Please try again after 2 minutes
451 sender/recip/ip triad greylisted; retry AFTER A DECENT INTERVAL will succeed
450 <xxx@xxx.com>: Recipient address rejected: Greylisting in action. Please try delivery again in 240 seconds.
451 4.3.0 Temporarily greylisted as anti-spam measure.  Please try again.
451 <xxx@xxx.com>: Recipient address rejected: Service is greylisted.  Waiting for retransmit.
etc, etc.

Think I need to write a CPAN module just to return the number of seconds to retry given a string.
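
As a rough idea of what such a module would do, here is a Python sketch (not a CPAN module, and `retry_seconds` is a made-up name). It only understands the delay formats that appear in the list above, an HH:MM:SS value or a number followed by a time unit, and returns None when the message gives no explicit delay.

```python
import re

# "come back in 00:09:00"
_HMS = re.compile(r"\b(\d{1,2}):(\d{2}):(\d{2})\b")
# "181 seconds", "1 minutes", "2 minutes"
_AMOUNT = re.compile(r"\b(\d+)\s*(second|sec|minute|min|hour)s?\b", re.IGNORECASE)
_UNIT_SECONDS = {"second": 1, "sec": 1, "minute": 60, "min": 60, "hour": 3600}


def retry_seconds(message):
    """Return how many seconds the server asked us to wait, or None if unstated."""
    match = _HMS.search(message)
    if match:
        hours, minutes, seconds = (int(group) for group in match.groups())
        return hours * 3600 + minutes * 60 + seconds
    match = _AMOUNT.search(message)
    if match:
        return int(match.group(1)) * _UNIT_SECONDS[match.group(2).lower()]
    return None
```

Messages like "please come back later" return None, so the caller still needs a default retry interval for those.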

Djabberd connections, continued... [Jun. 27th, 2006|02:30 pm]

Just did 174,700 connections with 606 MB of memory. That's 3.5kB/connection, inclusive of the initial 14 MB startup size.
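
The arithmetic, spelled out as a quick Python check. It assumes decimal units (1 MB = 1000 kB), which is a guess about how the figure was rounded; `kb_per_connection` is just a helper name for this example.

```python
def kb_per_connection(total_mb, connections, startup_mb=0.0):
    """Average kB per connection, using decimal units (1 MB = 1000 kB)."""
    return (total_mb - startup_mb) * 1000.0 / connections


# Figures from the post: 606 MB total, 174,700 connections, 14 MB at startup.
inclusive = kb_per_connection(606, 174_700)
excluding_startup = kb_per_connection(606, 174_700, startup_mb=14)
print(f"{inclusive:.2f} kB/connection including startup size")
print(f"{excluding_startup:.2f} kB/connection excluding startup size")
```

The inclusive number is the one that rounds to the 3.5kB quoted above.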

djabberd: c10k? hah! [Jun. 26th, 2006|10:09 pm]

DJabberd just did 25,200 (fully setup) connections with 97 MB of RAM before my Xen instance ran out of memory. It's now 3.4kB of overhead per connection (contrast to 30kB this morning) but there's still obvious ways to trim it down. Should be able to get it down to 2kB. The big win was when I implemented a [forget design pattern name] system where libxml parsers are shared, returned, kept on a freelist, etc.

From what Artur and I can tell, this is better than most/all the other jabber servers out there.

It means with 1GB of ram we can do 300k connections per process. (8GB of RAM boxes, 2x 2x core)

<3 epoll.

readahead / blocking sendfile [Jun. 5th, 2006|11:34 pm]

I've had a known inefficiency in Perlbal for ages now and finally broke down and fixed it. The inefficiency is that sendfile can block, even if the destination fd is a non-blocking socket, because the source fd (a disk-based file) can force a disk read if it's not already in pagecache.

FreeBSD has a fancy sendfile that lets you request it not block, but Linux doesn't.

The solution on Linux is to do a readahead() call first in another thread, or just sendfile() in another thread, either of which IO::AIO can do. I wanted to test the theory changing as little code as possible, so I went with the async readahead.
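
The shape of that fix can be sketched in Python with asyncio. This is an illustration, not Perlbal's code and not IO::AIO: a plain blocking `os.pread()` on a worker thread stands in for the async readahead(), and the function names are made up for the example. It assumes a Unix event loop and a non-blocking socket.

```python
import asyncio
import os


def warm_page_cache(fd, offset, length, block_size=1 << 20):
    """Blocking: read the byte range and discard it so the pages end up cached."""
    end = offset + length
    while offset < end:
        data = os.pread(fd, min(block_size, end - offset), offset)
        if not data:  # hit end of file early
            break
        offset += len(data)


async def wait_until_writable(loop, sock):
    """Suspend the calling coroutine until the socket can take more data."""
    writable = loop.create_future()

    def mark_writable():
        if not writable.done():
            writable.set_result(None)

    loop.add_writer(sock.fileno(), mark_writable)
    try:
        await writable
    finally:
        loop.remove_writer(sock.fileno())


async def send_file_range(sock, fd, offset, length):
    """Send `length` bytes of `fd` over a non-blocking socket; return bytes sent."""
    loop = asyncio.get_running_loop()
    # The potentially slow disk read happens on a worker thread, so other
    # coroutines on the event loop keep running in the meantime.
    await loop.run_in_executor(None, warm_page_cache, fd, offset, length)
    sent = 0
    while sent < length:
        try:
            # The data should now be in pagecache, so this is unlikely to hit disk.
            count = os.sendfile(sock.fileno(), fd, offset + sent, length - sent)
        except BlockingIOError:
            await wait_until_writable(loop, sock)
            continue
        if count == 0:  # end of file
            break
        sent += count
    return sent
```

Nothing here guarantees the pages stay cached between the warm-up and the sendfile call; under memory pressure the send can still touch disk, just far less often.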

Before I did that, though, I wrote a test case.

The test case runs two processes in parallel: one fetches 3 small hot files over and over again, measuring the mean speed of 100 requests. The other process is there just to mess with the first one: it doesn't actually output anything. The second process either fetches the same 3 small files or, with the "big" parameter, fetches seven 100MB files in a loop, more than this Xen instance's 512 MB of memory. The idea is to see if the disk reads serving the big files stall the event loop and increase turn-around time.

Yup:

lj@LJ_web:~$ ./parallel.pl small; ./parallel.pl small; ./parallel.pl small; ./parallel.pl big;./parallel.pl big;./parallel.pl big;
mean: 0.287987213134766, stddev: 0.0829109309255669
mean: 0.279777903556824, stddev: 0.0957734761804354
mean: 0.238886480331421, stddev: 0.0949280425469577

mean: 0.351436612606049, stddev: 0.0791952577383974
mean: 0.361295075416565, stddev: 0.0863025646086743
mean: 0.3904807305336, stddev: 0.173639453608837

The first set of three lines is the time to serve small files with other small files being served in the background. The second set of three is serving small files with big files being served in the background.

Add in the async readahead() call, doing the sendfile in the callback (once the data to be sendfile'd is in the pagecache), and the results even out a bunch:

lj@LJ_web:~$ ./parallel.pl small; ./parallel.pl small; ./parallel.pl small; ./parallel.pl big;./parallel.pl big;./parallel.pl big;
mean: 0.296060967445374, stddev: 0.0586433388625736
mean: 0.262518639564514, stddev: 0.0726501212827927
mean: 0.285000162124634, stddev: 0.0321991597111094

mean: 0.302280473709106, stddev: 0.0811435349061447
mean: 0.303003549575806, stddev: 0.0787071540895621
mean: 0.298841729164124, stddev: 0.0953137343692458

Probably some more work to be done, but promising.

Short-circuiting the invocant [May. 6th, 2006|10:43 pm]

Before it was necessarily refactored, I had the honor of making Artur say "Whoa, will that work? Oh, of course it will. Hah." at the following config parsing code:
($plugin || $vhost || $server)->set_config_option($key, $value);
I'm sad it had to go away. :-)
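
The same trick works in Python, since `or` also returns the first truthy operand rather than a boolean. A small sketch with a hypothetical `Scope` class standing in for the plugin/vhost/server objects (none of this is the original code):

```python
class Scope:
    """Hypothetical config scope; stands in for a plugin, vhost, or server."""

    def __init__(self, name):
        self.name = name
        self.options = {}

    def set_config_option(self, key, value):
        self.options[key] = value
        return self.name  # returned only so the example can show who handled it


def set_option(key, value, server, vhost=None, plugin=None):
    # `or` short-circuits to the first operand that isn't None/falsy, so the
    # method is invoked on the innermost scope that currently exists.
    return (plugin or vhost or server).set_config_option(key, value)
```

One caveat in Python: an object that defines `__bool__` or `__len__` can be falsy even when it exists, in which case it would be skipped.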

DJabberd rules. [Apr. 11th, 2006|11:33 pm]

DJabberd now does roster/presence stuff kick-ass, except for a few details.

It's now so kick-ass that it gets its own buzzword-compliant website:

http://danga.com/djabberd/

bling.

Okay, more later.

DJabberd Status Update [Apr. 9th, 2006|11:54 pm]

Tons of work on roster and presence in DJabberd tonight. This is where it's getting really fun. Roster stuff was easy/tedious, but required. Also lots of boring refactoring.

But the end result is that I can subscribe/ack presence requests now, have my roster update (SQLite plugin), and see my account on jabber.org change status now from my other Jabber client connected to DJabberd, subscribed to the jabber.org presence. Fun fun.

BTW, this will all run on POE once [info]hachi bridges the Danga::Socket and POE worlds. So DJabberd plugins (which are all async by design) can use POE components.

splice() [Mar. 30th, 2006|10:38 am]
[Current Mood | excited]

Is anybody else excited about the in-development splice() system call?

I've wanted this for, like, ever.

I need to add support to Sys::Syscall so Perlbal can use it, avoiding copies to/from userspace to/from sockets.

$good->meets($evil) [Mar. 25th, 2006|04:36 pm]

Dear Graph Theorists and Wanna-Be Graph Theorists (like me),

Please help.

My fun Saturday morning project was writing a test harness so people like you could show me up, and show each other up, by writing an awesome graph analysis algorithm.

What does it have to do?

Separate out the "good nodes" from the "bad nodes".

Restrictions:
  • You implement one function, "calc", which takes a node, and returns a trust value 0.0-1.0 (inclusive), or undef if you don't have a solid answer.

  • Your calc function will run hundreds or thousands of times for each node, and in no particular order. The trust value of each node is constantly changing.

  • You can't keep state between invocations. If you try, it's cheating, and won't work in reality anyway when dozens/hundreds of processes are running independently.

  • All you can get from a provided node object is its current trust level, all nodes connected by inbound edges, and their trust levels. You can keep going deeper and deeper if you want, but:

  • I won't let you go more than two levels deep. You can't recurse the graph forever searching for stuff. That's why I also don't let you walk the graph forward.
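
To show what fits inside those restrictions, here is a deliberately naive calc function sketched in Python. The real harness is Perl, and this `Node` class is a hypothetical stand-in, not the TrustGraph API. The function keeps no state, looks only one level up the inbound edges, and returns None (the undef case) when it has nothing to go on.

```python
class Node:
    """Hypothetical stand-in for the harness's node object."""

    def __init__(self, trust=None, is_root=False):
        self.is_root = is_root
        self.trust = 1.0 if is_root else trust  # roots have a fixed trust of 1.0
        self.inbound = []  # nodes with an edge pointing at this one

    def link_to(self, other):
        """Add an edge self -> other."""
        other.inbound.append(self)


def calc(node, damping=0.85):
    """Return a trust value in 0.0-1.0, or None without a solid answer.

    Baseline only: the damped average trust of the inbound neighbors that
    already have a trust value. `damping` is an arbitrary illustrative choice.
    """
    known = [source.trust for source in node.inbound if source.trust is not None]
    if not known:
        return None
    return damping * sum(known) / len(known)
```

Because trust only flows along inbound edges, an isolated evil island never gets a value from this function; the trouble starts once confused good nodes link to it, which is the part a better algorithm has to handle.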

The test harness starts out like this:
my $good = TrustGraph::Graph->load(file => "$Bin/data/good-network.dat.gz");

Wherein we load the first 100,000 LJ users' relationships amongst each other.

Then this:
my $evil = TrustGraph::Graph->generate(nodes => 10_000,
                           edges => 30_000);

That makes a 10,000 node graph of "evil" entities, with 30,000 edges amongst themselves.

Then we merge them:
my $all = $good->meets($evil);
But they're not linked yet. They just exist in the same graph, with two major islands.

We mark some as root nodes (in the good graph), to seed it all. They have a fixed trust of 1.0:
# mark certain prominent members as trusted root nodes
foreach my $goodid (qw(2 10 14 15 1571)) {
    $good->node($goodid)->mark_as_root;
}
Now, we take a bunch of confused losers who accidentally befriend evil people:
for (1..$confused) {
    $good->random_node->links($evil->random_node);
}
Then we run your calc function on the collective graph:
$all->reset_trust;
$all->mass_compute(
                   iterations => 750_000,
                   func => $calc_func,
                   every => [10_000, $dump_stats],
                   );

As it goes it dumps stats, so you can abort it early if it's looking like it's sucking:

$ ./simulate --module=Brad --confused=2000

PROGRESS: 150000 / 750000 (20.00%)
$VAR1 = {
          'good' => {
                      'coverage' => '0.564841498559078',
                      'avg' => '0.813768195517867',
                    },
          'evil' => {
                      'avg' => '0.345016327849719',
                    }
        };
GOOD:
0.00 - 0.05: 
0.05 - 0.10: 
0.10 - 0.15: 
0.15 - 0.20: 
0.20 - 0.25: 
0.25 - 0.30: 
0.30 - 0.35: #
0.35 - 0.40: #
0.40 - 0.45: #
0.45 - 0.50: ##
0.50 - 0.55: 
0.55 - 0.60: ###
0.60 - 0.65: ###
0.65 - 0.70: 
0.70 - 0.75: #############
0.75 - 0.80: 
0.80 - 0.85: ##########
0.85 - 0.90: 
0.90 - 0.95: ########
0.95 - 1.00: ########################################

BAD:
0.00 - 0.05: ###################
0.05 - 0.10: ##################################
0.10 - 0.15: ###########################
0.15 - 0.20: ########################################
0.20 - 0.25: #################################
0.25 - 0.30: ########################
0.30 - 0.35: ###############################
0.35 - 0.40: #################################
0.40 - 0.45: #######################
0.45 - 0.50: ################
0.50 - 0.55: ############
0.55 - 0.60: ######################
0.60 - 0.65: #################
0.65 - 0.70: 
0.70 - 0.75: #####################
0.75 - 0.80: 
0.80 - 0.85: ###########
0.85 - 0.90: 
0.90 - 0.95: ####
0.95 - 1.00: ##

TODO:
  • Prettier output, use GD for graphs
  • more attack models (notably all evil nodes linking to good nodes) Update: done
  • better harness for running/evaluating lots of algorithms
  • .....
Want to write a calc function?

$ svn co http://code.sixapart.com/svn/TrustGraph/trunk TrustGraph

If you write a good one, tease others in the comments here with the output.

svk [Mar. 22nd, 2006|05:11 pm]

I've been using svk as of today. Do I get cool points now?

Overall, svn+svk = very nice.

Also,
-- Released Brackup to CPAN.
-- wrote a little Net::LiveJournal module (LJ client) and put it on CPAN

Really, svn/svk just totally makes me want to hack.

Brackup: call for testers [Mar. 21st, 2006|06:20 pm]

So, Brackup pretty much works now.

Go snag a copy:

$ svn co http://code.sixapart.com/svn/brackup/trunk brackup

Install any prereqs:

$ sudo apt-get install libdbd-sqlite3-perl gnupg

(gnupg is optional)

Run it to build a config template for you:

$ cd brackup; ./brackup --help
$ $EDITOR ~/.brackup.conf

Then do a backup:

$ ./brackup --from=projects --to=amazon

Except the Brackup::Target::Amazon is kinda broken. It depends on Net::Amazon::S3, which isn't entirely robust yet. (notably the return values on errors for all the operations are weird and inconsistent.... how do I know a key was created successfully?) Also, it seems Amazon S3 isn't entirely robust yet: after creating my account, I was only getting good-auth requests through once every 5 requests or so. Seems all their servers don't know I exist yet.

But the filesystem driver works. Try that.

And at this point you can start hacking more Brackup::Target subclasses and sending them to me for inclusion. I won't be doing anything but Filesystem and Amazon.

Brackup Status [Mar. 21st, 2006|12:39 am]

Work on brackup continues a bit. Didn't get to hack on it hardly at all today, but there are some good docs now. Check out the svn repo.

The Filesystem target works now, and the DigestDatabase works. Haven't yet done Amazon, but Leon ([info]cudddly) did the hard work yesterday, so I just have to plop in Net::Amazon::S3 and I'm good to go.

Last part is serializing the metafile (the finished backup file you need to do a restore). Was going to use YAML, but YAML in Perl sucks, then maybe XML because Net::Amazon::S3 uses LibXML, but XML, really? Bleh. So now I'm leaning towards old-skool RFC-822 format. If it's good enough for Debian, right?

I should've finished tonight. Want to get back to DJabberd.

Brackup -- encrypted, over-the-net, multi-versioned backup [Mar. 19th, 2006|11:46 pm]

I've renamed wsbackup to "Brackup". [info]dina suggested "ass that back up" or just "assthat", which we later evolved into "Back that NAS up" but it was getting complicated. So Brackup.

It's not done, but it's damn close. Here's svn (props to Artur for setting it up):

http://code.sixapart.com/svn/brackup/trunk/

Here's my ~/.brackup.conf:
sammy:trunk $ cat ~/.brackup.conf
[TARGET:raidbackups]
type = Filesystem
path = /raid/backup/brackup

[SOURCE:proj]
path = /raid/bradfitz/proj/
chunk_size = 5m
gpg_recipient = 5E1B3EC5

[SOURCE:bradhome]
chunk_size = 64MB
path = /raid/bradfitz/
ignore = ^\.thumbnails/
ignore = ^\.kde/share/thumbnails/
ignore = ^\.ee/minis/
ignore = ^build/
ignore = ^(gqview|nautilus)/thumbnails/
You define backup sources and targets, then do:

$ ./brackup --from=proj --to=raidbackups

The "type" parameter on a [TARGET:...] is the subclass of Brackup::Target to use for storage.
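
For anyone poking at the format, here is a rough Python sketch of reading a config shaped like the one above. It is not Brackup's parser (that lives in Brackup::Config, in Perl). It assumes repeated keys such as `ignore` accumulate into a list, and that sizes like `5m` and `64MB` mean binary megabytes, neither of which is confirmed here. `parse_config` and `parse_size` are made-up names.

```python
import re

_SECTION = re.compile(r"^\[(TARGET|SOURCE):([^\]]+)\]$")
_SIZE = re.compile(r"^(\d+)\s*([kmg]?)b?$", re.IGNORECASE)
_MULTIPLIER = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def parse_size(text):
    """Turn '5m' or '64MB' into a byte count (assumes binary multiples)."""
    match = _SIZE.match(text.strip())
    if not match:
        raise ValueError("bad size: %r" % text)
    return int(match.group(1)) * _MULTIPLIER[match.group(2).lower()]


def parse_config(text):
    """Parse into {'TARGET': {name: {key: [values]}}, 'SOURCE': {...}}."""
    config = {"TARGET": {}, "SOURCE": {}}
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        header = _SECTION.match(line)
        if header:
            section = config[header.group(1)].setdefault(header.group(2), {})
            continue
        if section is None or "=" not in line:
            raise ValueError("unexpected line: %r" % raw)
        key, value = (part.strip() for part in line.split("=", 1))
        # Keep every value: keys like `ignore` legitimately repeat.
        section.setdefault(key, []).append(value)
    return config
```

A hand-rolled parser is used because Python's `configparser` either rejects duplicate keys or keeps only the last one, which would lose all but one `ignore` pattern.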

Classes:

./lib/Brackup/DigestCache.pm
./lib/Brackup/Backup.pm
./lib/Brackup/Target/Amazon.pm
./lib/Brackup/Target/Filesystem.pm
./lib/Brackup/Target.pm
./lib/Brackup/File.pm
./lib/Brackup/Config.pm
./lib/Brackup/Root.pm
./lib/Brackup/Chunk.pm

The main backup routine is simple, see Brackup::Backup's 'backup' method.

I'll post again when it works. For now you'll probably want to stay away. Everything's subject to change, so please delay writing new Target subclasses.

wsbackup -- encrypted, over-the-net, multi-versioned backup [Mar. 19th, 2006|12:50 pm]

There are lots of ways to store files on the net lately:

-- Amazon S3 is the most interesting,
-- Google's rumored GDrive is surely coming soon
-- Apple has .Mac

I want to back up to them. And more than one. So first off, abstract out net-wide storage.... my backup tool (wsbackup) isn't targeting one. They're all just providers.

Also, don't trust sending my data in cleartext, and having it stored in cleartext, so public key encryption is a must. Then I can run automated backups from many hosts, without much fear of keys being compromised.

Don't want people being able to do size-analysis, and huge files are a pain anyway, so big files are cut into chunks.

Files stored on Amazon/Google are of form:

-- meta files: backup_rootname-yyyymmddnn.meta, encrypted (YAML?) file mapping relative paths from backup directory root to the stat() information, original SHA1, and array of chunk keys (SHA1s of encrypted chunks) that comprise the file.

-- [sha1ofencryptedchunk].chunk -- content being a chunk of encrypted data, <= (say) 20MB.

Then every night different hosts/laptops recurse directory trees, consult a stat() cache (on, say, inode number, mtime, size, whatever) and do SHA1 calculations on changed files, look up the rest from cache, build the metafile, upload any new chunks, encrypt the metafile, and upload the metafile.
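
The chunk-and-metafile part of that plan, sketched in Python. This is not wsbackup's code: `backup_file` and `iter_chunks` are made-up names, the "store" is a plain dict standing in for the remote provider, the stat() cache is left out, and the `fake_encrypt` placeholder is NOT encryption (a real version would plug in public-key encryption there).

```python
import hashlib
import io


def iter_chunks(stream, chunk_size):
    """Yield successive chunk_size-byte blocks from a binary stream."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return
        yield chunk


def backup_file(relative_path, stream, chunk_size, encrypt, store):
    """Upload a file's encrypted chunks and return its metafile entry.

    encrypt: callable bytes -> bytes (assumption: supplied by the caller)
    store:   dict-like mapping '<sha1>.chunk' -> encrypted bytes
    """
    original = hashlib.sha1()
    chunk_keys = []
    for chunk in iter_chunks(stream, chunk_size):
        original.update(chunk)
        encrypted = encrypt(chunk)
        key = hashlib.sha1(encrypted).hexdigest()  # SHA1 of the encrypted chunk
        name = key + ".chunk"
        if name not in store:  # only upload chunks the provider lacks
            store[name] = encrypted
        chunk_keys.append(key)
    return {"path": relative_path, "sha1": original.hexdigest(), "chunks": chunk_keys}


store = {}
fake_encrypt = lambda chunk: chunk[::-1]  # placeholder only, NOT encryption
entry = backup_file("notes.txt", io.BytesIO(b"a" * 50), 20, fake_encrypt, store)
```

The returned entry is what would go into the (encrypted) metafile, alongside the stat() information for the path.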

Result:

-- I can restore any host from any point in time, with Amazon/Google storing all my data, and only paying $0.15/GB-month.

Nice.

I'm partway through writing it. Will open source it soon. Ideally tonight.
