March 19th, 2006


DJabberd Status: Rosters

Artur and I worked on rosters tonight. Nice, clean hooks and abstractions now....

Data structures:

DJabberd::Roster
DJabberd::RosterItem

Plugins:

DJabberd::RosterStorage -- abstract base class (see the sketch below)
DJabberd::RosterStorage::SQLite -- functional
DJabberd::RosterStorage::Dummy -- example
DJabberd::RosterStorage::LiveJournal -- LJ integration

All in cvs. Check it out.
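
To give a feel for the shape of it, here's a rough sketch of what a storage plugin might look like. This is illustrative only -- the method and constructor names are guesses based on the class names above, not necessarily the real DJabberd API:

package DJabberd::RosterStorage::InMemory;
# Hypothetical example plugin; method names are guesses, not the real API.
use strict;
use warnings;
use base 'DJabberd::RosterStorage';

my %store;   # bare JID string => arrayref of [ contact JID, nickname ]

sub get_roster {
    my ($self, $cb, $jid) = @_;   # $cb is a callback, since hooks are async
    my $roster = DJabberd::Roster->new;
    for my $row (@{ $store{$jid->as_bare_string} || [] }) {
        $roster->add(DJabberd::RosterItem->new(
            jid  => $row->[0],
            name => $row->[1],
        ));
    }
    $cb->set_roster($roster);
}

1;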

Lots more to do yet:
-- finish Roster API (adding/removing items)
-- do roster pushes to all connected clients (easy locally, more fun with a cluster)
-- presence (big one, but roster pushes will get it a lot of the way there)
-- an easy way for plugin authors to mark certain functions as blocking and run them in a child thread that won't stall the event loop (Jonathan Steinert's working on some Gearman modifications to make this transparent and easy, whether it's in a local process or anywhere else on the network...)
-- finish LJ integration

wsbackup -- encrypted, over-the-net, multi-versioned backup

There are lots of ways to store files on the net lately:

-- Amazon S3 is the most interesting
-- Google's rumored GDrive is surely coming soon
-- Apple has .Mac

I want to back up to them, and to more than one. So first off, abstract out the net-wide storage: my backup tool (wsbackup) isn't targeting any one of them. They're all just providers.

Also, I don't trust sending my data in cleartext or having it stored in cleartext, so public key encryption is a must. Since the backup hosts only need the public key, I can run automated backups from many hosts without much fear of keys being compromised.
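
For example (assuming GnuPG; the chunk filename below is just illustrative, and the key ID is the one from the config later in this post), each chunk could be encrypted to a recipient's public key like this, so only the machine holding the private key can decrypt it:

  gpg --encrypt --recipient 5E1B3EC5 --output chunk.0001.gpg chunk.0001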

I don't want people being able to do size analysis, and huge files are a pain anyway, so big files are cut into chunks.

Files stored on Amazon/Google are of the form:

-- meta files: backup_rootname-yyyymmddnn.meta -- an encrypted (YAML?) file mapping relative paths from the backup directory root to their stat() information, the original file's SHA1, and the array of chunk keys (SHA1s of the encrypted chunks) that make up the file.

-- [sha1ofencryptedchunk].chunk -- a chunk of encrypted data, each at most, say, 20MB.
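
Concretely, the decrypted contents of a meta file might look something like this, shown as the Perl structure that would get YAML-dumped. The paths, digests, and field names here are illustrative only, not a final format:

# Hypothetical decrypted meta file contents, as a Perl structure.
my $meta = {
    "photos/2006/beach.jpg" => {
        size   => 1_482_113,            # from stat()
        mtime  => 1142726400,
        mode   => "0644",
        sha1   => "da39a3ee5e6b...",    # SHA1 of the original file
        chunks => [                     # SHA1s of the encrypted chunks
            "f572d396fae9...",
            "0beec7b5ea3f...",
        ],
    },
    # ... one entry per file under the backup root
};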

Then every night, different hosts/laptops recurse their directory trees, consult a stat() cache (keyed on, say, inode number, mtime, and size), do SHA1 calculations on changed files (looking the rest up from the cache), build the metafile, upload any new chunks, encrypt the metafile, and upload the metafile.
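
In code, that nightly flow is roughly the following. This is just a sketch, not the actual wsbackup code; the cache handling is simplified and the chunking, encryption, and upload subs are placeholder stubs:

#!/usr/bin/perl
# Sketch of the nightly backup flow described above (not wsbackup itself).
use strict;
use warnings;
use File::Find;
use File::Spec;
use Digest::SHA1;

my $root = "/raid/bradfitz";
my %stat_cache;   # "inode,mtime,size" => sha1 (persisted to disk in reality)
my %meta;         # relative path => { stat info, sha1, chunk keys }

find(sub {
    return unless -f $_;
    my @st  = stat($_);
    my $key = join(",", $st[1], $st[9], $st[7]);   # inode, mtime, size

    # Only re-hash files whose stat() signature changed; reuse the cache otherwise.
    my $sha1 = $stat_cache{$key} ||= do {
        open(my $fh, "<", $_) or die "open $File::Find::name: $!";
        Digest::SHA1->new->addfile($fh)->hexdigest;
    };

    $meta{ File::Spec->abs2rel($File::Find::name, $root) } = {
        size   => $st[7],
        mtime  => $st[9],
        sha1   => $sha1,
        chunks => upload_new_chunks($File::Find::name),
    };
}, $root);

upload_metafile(encrypt(serialize(\%meta)));

# Placeholder stubs standing in for the real work:
sub upload_new_chunks { [] }      # would split, encrypt, and upload chunks
sub serialize         { "" }      # would dump %meta as YAML
sub encrypt           { $_[0] }   # would gpg-encrypt to the public key
sub upload_metafile   { 1 }       # would PUT the metafile to the provider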

Result:

-- I can restore any host from any point in time, with Amazon/Google storing all my data, and I'm only paying $0.15/GB-month.

Nice.

I'm partway through writing it. Will open source it soon. Ideally tonight.

Brackup -- encrypted, over-the-net, multi-versioned backup

I've renamed wsbackup to "Brackup". Dina suggested "ass that back up" or just "assthat", which we later evolved into "Back that NAS up", but it was getting complicated. So Brackup.

It's not done, but it's damn close. Here's svn (props to Artur for setting it up):

http://code.sixapart.com/svn/brackup/trunk/

Here's my ~/.brackup.conf:
sammy:trunk $ cat ~/.brackup.conf
[TARGET:raidbackups]
type = Filesystem
path = /raid/backup/brackup

[SOURCE:proj]
path = /raid/bradfitz/proj/
chunk_size = 5m
gpg_recipient = 5E1B3EC5

[SOURCE:bradhome]
chunk_size = 64MB
path = /raid/bradfitz/
ignore = ^\.thumbnails/
ignore = ^\.kde/share/thumbnails/
ignore = ^\.ee/minis/
ignore = ^build/
ignore = ^(gqview|nautilus)/thumbnails/

You define backup sources and targets, then do:

$ ./brackup --from=proj --to=raidbackups

The "type" parameter on a [TARGET:...] is the subclass of Brackup::Target to use for storage.

Classes:

./lib/Brackup/DigestCache.pm
./lib/Brackup/Backup.pm
./lib/Brackup/Target/Amazon.pm
./lib/Brackup/Target/Filesystem.pm
./lib/Brackup/Target.pm
./lib/Brackup/File.pm
./lib/Brackup/Config.pm
./lib/Brackup/Root.pm
./lib/Brackup/Chunk.pm

The main backup routine is simple; see Brackup::Backup's 'backup' method.

I'll post again when it works. For now you'll probably want to stay away. Everything's subject to change, so please delay writing new Target subclasses.