Brad Fitzpatrick (brad) wrote,

Project Idea: Distributed, Encrypted Backup

I've been thinking about writing a backup system lately.

Think MogileFS + git + GnuPG. Definitely GnuPG, likely parts of git, and perhaps only MogileFS in concept. We'll see.


-- client/server. client would be pretty dumb (basically: "give me stuff to store!") and would most definitely have to run on Windows (as well as Unix), so either Perl w/ ActiveState on Windows, or C# w/ Mono on Linux. it'd also be able to report what it has, verify the integrity of what it has (the server tells it the SHA1 each file must keep matching over time), and manage its disk quota.
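A rough sketch of what the client's integrity check could look like, in Python only for illustration (the post has Perl or C# in mind). The names `sha1_of_file` and `verify_store`, and the idea that the server hands over a plain mapping of blob name to expected SHA1, are assumptions and not part of any existing tool.

```python
import hashlib
from pathlib import Path


def sha1_of_file(path, bufsize=1 << 20):
    """Hash a file in fixed-size reads so large blobs never sit fully in memory."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(bufsize), b""):
            h.update(block)
    return h.hexdigest()


def verify_store(store_dir, expected):
    """Compare local blobs against the digests the server expects.

    `expected` maps blob name -> SHA1 hex digest (hypothetical wire format).
    Returns three sorted lists: (ok, corrupt, missing).
    """
    ok, corrupt, missing = [], [], []
    for name, digest in sorted(expected.items()):
        path = Path(store_dir) / name
        if not path.is_file():
            missing.append(name)
        elif sha1_of_file(path) == digest:
            ok.append(name)
        else:
            corrupt.append(name)
    return ok, corrupt, missing
```

The client would report the `corrupt` and `missing` lists back to the server, which can then decide what to re-send.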

-- client machines (friends' and family's computers) would throw the backup client into their "Scheduled Tasks" on Windows, or cron on Unix, and it'd connect out to the backup server, getting incremental updates and throttling its bandwidth. it'd also be able to delete backup files that are older than the backup policy says to keep (e.g. if the client has too many old revisions of a file).
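The pruning step might look something like this. It is a sketch under assumptions: the client only ever sees an opaque group id per backed-up file (never a filename), and the policy is a simple "keep the newest N revisions". `blobs_to_prune` and the tuple layout are hypothetical.

```python
from collections import defaultdict


def blobs_to_prune(revisions, keep_last):
    """Pick the blobs that fall outside a keep-the-newest-N policy.

    `revisions` is an iterable of (group_id, stored_at, blob_name) tuples,
    where group_id is an opaque id tying together revisions of one file.
    Returns blob names to delete, newest-first within each group.
    """
    by_group = defaultdict(list)
    for group_id, stored_at, blob in revisions:
        by_group[group_id].append((stored_at, blob))

    doomed = []
    for group_id in sorted(by_group):
        newest_first = sorted(by_group[group_id], reverse=True)
        # Everything past the first keep_last entries is too old to keep.
        doomed.extend(blob for _, blob in newest_first[keep_last:])
    return doomed
```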

-- server would be perl for sure. server would have to keep track of new/updated files/trees (the "git" part), and encrypt them, and keep track of what copies all the clients have (the "MogileFS" part).
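For the "MogileFS" part, the server's bookkeeping could be as small as a map from blob to the set of clients holding it. This is an illustrative Python sketch, not MogileFS's actual schema; `ReplicaMap` and `min_copies` are made-up names.

```python
from collections import defaultdict


class ReplicaMap:
    """Tracks which clients are known to hold which encrypted blobs."""

    def __init__(self):
        self._holders = defaultdict(set)

    def record(self, client, blob):
        """Note that `client` has confirmed it stores `blob`."""
        self._holders[blob].add(client)

    def forget(self, client, blob):
        """Note that `client` lost or deleted `blob`."""
        self._holders[blob].discard(client)

    def under_replicated(self, all_blobs, min_copies):
        """Return the blobs held by fewer than `min_copies` clients, sorted."""
        return sorted(
            b for b in all_blobs if len(self._holders.get(b, ())) < min_copies
        )
```

Anything `under_replicated` returns is what the server would hand out the next time a client connects asking for stuff to store.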

-- clients should be assumed to be controlled by hostile parties, or at least incredibly prone to being owned. (again: think windows boxes admined by your family). as such:

* files are encrypted
* shouldn't even get to see filenames (files are stored named by their hashes)
* clients know nothing about encryption. they get access to no keys.
* deniability: it shouldn't be possible to say a backup contains a certain file. the filenames aren't the hash of the cleartext content, but of the encrypted file itself. also, contents aren't encrypted w/ a public (or at least publicly known) key.
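The naming rule from the list above, sketched in Python. The ciphertext is assumed to come out of GnuPG on the server already; no encryption happens here, and `blob_name` / `store_blob` are hypothetical helpers.

```python
import hashlib


def blob_name(ciphertext: bytes) -> str:
    """Name a blob by the SHA1 of its *encrypted* bytes, never the cleartext."""
    return hashlib.sha1(ciphertext).hexdigest()


def store_blob(store: dict, ciphertext: bytes) -> str:
    """Put an opaque blob into a name -> bytes store and return its name."""
    name = blob_name(ciphertext)
    store[name] = ciphertext
    return name
```

Because the name is derived from the ciphertext, someone holding a copy of a known cleartext file can't hash it and look for a match among the stored blobs.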

-- server might have to split up huge files into manageable chunks. for instance I have vmware images I'd want backed up. because they're block devices, each incremental backup wouldn't spray 5GB around the network, just the dirty chunks.
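A minimal sketch of dirty-chunk detection with fixed-size chunks, in Python. The 4 MB chunk size is an arbitrary assumption, and a real version would stream from disk rather than take the whole image as one `bytes` value.

```python
import hashlib

CHUNK_SIZE = 4 * 1024 * 1024  # assumed chunk size; the post doesn't pick one


def chunk_digests(data: bytes, chunk_size: int = CHUNK_SIZE):
    """Return the SHA1 of each fixed-size chunk, in order."""
    return [
        hashlib.sha1(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    ]


def dirty_chunks(old_digests, new_digests):
    """Indexes of chunks that changed or were appended since the last backup."""
    return [
        i for i, digest in enumerate(new_digests)
        if i >= len(old_digests) or old_digests[i] != digest
    ]
```

Fixed offsets suit block-device images, where writes land in place; a file that has bytes inserted in the middle would shift every later chunk and look entirely dirty.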

-- config file should be able to let me define multiple repositories w/ different properties: notably retention policies, but also what directories are included/excluded in that repository.
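One way the per-repository settings could be modeled once parsed, as a Python sketch. The field names (`keep_revisions`, `include`, `exclude`) and the example paths are invented for illustration; the post doesn't define a config format.

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath


@dataclass
class Repository:
    """One backup repository with its own retention and directory rules."""
    name: str
    keep_revisions: int                      # retention policy (assumed form)
    include: list                            # directories to back up
    exclude: list = field(default_factory=list)

    def covers(self, path: str) -> bool:
        """True if `path` is under an included dir and not under an excluded one."""
        p = PurePosixPath(path)

        def under(root: str) -> bool:
            r = PurePosixPath(root)
            return p == r or r in p.parents

        return any(under(d) for d in self.include) and not any(
            under(d) for d in self.exclude
        )


# Hypothetical example: two repositories with different retention.
repos = [
    Repository("home", keep_revisions=10,
               include=["/home/user"], exclude=["/home/user/tmp"]),
    Repository("vm-images", keep_revisions=2, include=["/var/vm"]),
]
```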

-- in case of restore, client connects (at regularly scheduled time, or when you call up your family to run it by hand) and server asks it to send certain files.

I'd imagine friends partnering with each other to automatically store their opaque backup blobs. If I'd written this months ago, evan wouldn't have lost so much stuff.

First off, does this exist?
Tags: perl, tech
