I suppose you've seen the famous quote from the author of make, about how it wasn't changed to accept spaces instead of a tab a zillion years ago because at the time it had 12! whole! users! or something like that.
(I think there's another semi-well-known one along those lines, but I can't remember it.)
And then there's the one about Larry Wall and perl regexp: how, when he changed parentheses and braces and such to being live metacharacters back in prehistoric times, it broke a few people's scripts, and they lamented, but just think of how many backslash keys have been saved from early deaths as a result.
IOW, despite being a non-user of Brackup, I feel that history supports the “change it now, while the pain is still relatively slight” approach.
A couple of comments here.
Why not make the first or last block of the backup a superblock that contains the digests etc. for the rest of the data? That way you can rebuild your digest database if it does get lost. Might be worth doing just to make the digest-cache rebuild faster.
Consistent encryption like you suggest still isn't doable: an attacker with knowledge of your random_seed contents (which you would HAVE to be storing somewhere, and encrypting them would break the Brackup policy of not needing human input for backups) could decrypt your backups.
In short, the goals of public-key encryption are incompatible with consistent encryption. Even GPG's public-key encryption of large chunks of data really isn't public-key: it's public-key around a session key, and then symmetric encryption using that session key.
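That hybrid shape can be sketched in a few lines. This is a toy illustration only (the XOR "cipher" and function names are made up for this sketch, are NOT real crypto, and are certainly not GPG's actual internals): because the session key is freshly random each time, encrypting the same chunk twice yields different ciphertexts, which is why a stable chunk name can't be derived from the ciphertext alone.

```python
import hashlib
import secrets

def toy_stream(key: bytes, n: int) -> bytes:
    """Toy keystream via iterated SHA-256 -- NOT real crypto, purely illustrative."""
    out, block = b"", key
    while len(out) < n:
        block = hashlib.sha256(block).digest()
        out += block
    return out[:n]

def hybrid_encrypt(plaintext: bytes) -> tuple[bytes, bytes]:
    # Like GPG's hybrid scheme: a fresh random session key per message.
    # In real hybrid encryption the session key would then itself be
    # public-key-encrypted and shipped alongside the ciphertext.
    session_key = secrets.token_bytes(32)
    keystream = toy_stream(session_key, len(plaintext))
    return session_key, bytes(a ^ b for a, b in zip(plaintext, keystream))

def hybrid_decrypt(session_key: bytes, ciphertext: bytes) -> bytes:
    keystream = toy_stream(session_key, len(ciphertext))
    return bytes(a ^ b for a, b in zip(ciphertext, keystream))

msg = b"same chunk contents"
k1, c1 = hybrid_encrypt(msg)
k2, c2 = hybrid_encrypt(msg)
# Same plaintext, different ciphertext each run: no deterministic name possible.
```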
One problem here:
HMAC_DIGEST(DIGEST(gpg-secretkey), unencrypted-contents) + ".encrypted"
This would require decoding the secret key, so I don't see how that fits in with unattended public-key-encrypted backups.
In spite of the recent advances in hash attacks, why not DIGEST(unencrypted-contents + "some public per-user data chunk")?
They'd have to be really good to recover DIGEST(unencrypted-contents) from that.
The per-user data chunk could be their Key ID. That's pretty public, and definitely available to Brackup.
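For concreteness, the two naming schemes being compared might look like this as a sketch (all values here are hypothetical, and SHA-1 is just a placeholder; Brackup's actual digest algorithm and key handling may differ):

```python
import hashlib
import hmac

# Hypothetical values for illustration only.
chunk = b"...chunk contents..."
secret_key_material = b"(bytes of the decoded gpg secret key)"  # requires the secret key
public_key_id = b"5C48BE6D"                                     # public, always available

# Scheme from the post: HMAC keyed by a digest of the secret key.
# Problem: computing this needs the secret key decoded at backup time.
hmac_name = hmac.new(hashlib.sha1(secret_key_material).digest(),
                     chunk, hashlib.sha1).hexdigest() + ".encrypted"

# Suggested alternative: plain digest salted with a public per-user value.
# No secret material needed, so unattended backups still work.
salted_name = hashlib.sha1(chunk + public_key_id).hexdigest() + ".encrypted"
```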
2006-10-02 07:08 am (UTC)
2006-10-02 07:18 am (UTC)
Re: superblocks, etc.: that goes against the spirit of keeping the data structures as understandable as possible. I want people to understand it enough to trust it. Simple is more verifiable, too.
Consistent encryption, yeah, just had to mention it.
"This would require decoding the secret key"
Oh yeah. On my machine I was playing with, I just happened to have my secret key also installed.
I like your last idea, along with Max's idea of public fingerprint.
[note: I'm not a brackup user]
Is DIGEST(unencrypted-contents + publicly-known-value) + ".encrypted" any different from DIGEST(unencrypted-contents) + ".encrypted"?
With this, you are still 'exposing to others/authorities/etc. that you have the file with those "unencrypted-contents" backed up', as you said in the top post. I'm not sure what level of security you're trying to achieve, but the public Key ID acts as mere salting here. It does help with 'quickly find everyone who has a copy of illegal.mp3', but not with 'find out if brad has a copy of illegal.mp3'.
As you said: "You want to mix in something to the key that others don't have." (emphasis mine, obviously ;-)
At least, if I understand the discussion correctly. I'm not familiar with gpg at all.
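The targeted check being described can be made concrete with a small sketch (names and values are hypothetical; the point is only that everything the checker needs is public):

```python
import hashlib

def chunk_name(contents: bytes, public_salt: bytes) -> str:
    # The salted-digest naming scheme under discussion (illustrative only).
    return hashlib.sha1(contents + public_salt).hexdigest() + ".encrypted"

# Someone who has the file *and* knows brad's public Key ID can
# recompute the stored name and test for its presence on the target.
brads_key_id = b"5C48BE6D"                 # hypothetical; key IDs are public
suspect_file = b"contents of illegal.mp3"  # hypothetical file contents

probe = chunk_name(suspect_file, brads_key_id)
stored_names = {probe, "deadbeefdeadbeefdeadbeefdeadbeefdeadbeef.encrypted"}
# The targeted check succeeds despite the salt: the salt only stops
# bulk "map every index key to a known file" sweeps.
```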
2006-10-02 07:14 pm (UTC)
Yeah, I'm still not that happy about it which is why I haven't implemented it.
It does at least reduce the exposure down to having to test for a specific file, rather than presenting authorities with a list of index keys they could quickly map to known files.
So now I'm going back to thinking about parallel, encrypted ".meta" files on each key, which describe that encrypted chunk's original contents, with the .meta encrypted itself. Then if you lose the digest db, you can re-download all the meta files and rebuild it ... but then you need your private key around, so it can't be automated!
Tradeoffs, tradeoffs .....
Just haven't decided which route to go.
I think you were/are aiming for something impossible:
- you want to identify a file (through a digest) with only public information
- you don't want anybody else to be able to do the same thing
It seems storing the database on the server is the only way out of that, although you already thought of that 20/3/2006 and rejected it due to 'race conditions if multiple backups are running' and the pain of implementing it.
About not being automated: true, but if you lost that file on your computer, certainly typing your gpg passphrase is a small price to pay? Or as you said before: 'restores may prompt for user input ("What's your Amazon S3 password?" and "Enter your GPG passphrase."), because they won't be automated or common'.
2006-10-02 10:52 pm (UTC)
Yeah, like I said: tradeoffs.
I'll have to sacrifice some ideals in some situations, but I have to decide which I care about and which situations are most likely and should be optimized for.
Couldn't you just leave in a backwards-compatibility mode for restores? Perhaps just leave the old digest mode as a plugin, or something, that can be activated with a command-line option.
2006-10-02 07:57 am (UTC)
Someone using brackup probably has some version of $clue. I'd say go for it.
Of course, the backwards-compatible restore is an option too, and might be nice if anybody's running a combined nightly auto-update of their CPAN modules plus an automated Brackup restore, which would break under said new scheme. I don't think the odds of that are very high yet; I'm not sure Brackup has been adopted very widely.
Stupid brackup question: Are backups always incremental? I don't see anything that ever removes a chunk.
I think checkpointing to a certain date is just a matter of reading through the metafiles from that date to the present and then removing any chunks not listed in any of them, but I'm still getting my head around everything.
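The checkpoint idea described above boils down to a set difference (the data shapes here are hypothetical, not Brackup's actual metafile format):

```python
# Keep only chunks referenced by the metafiles (snapshots) you still
# care about; everything else is garbage and can be removed.

def collect_referenced(metafiles: list[list[str]]) -> set[str]:
    """Union of chunk names referenced by any kept snapshot's metafile."""
    referenced: set[str] = set()
    for chunk_list in metafiles:
        referenced.update(chunk_list)
    return referenced

def chunks_to_remove(stored: set[str], metafiles: list[list[str]]) -> set[str]:
    """Stored chunks no kept snapshot references."""
    return stored - collect_referenced(metafiles)

# Example: two kept snapshots; chunk "c3" is no longer referenced.
kept = [["c1", "c2"], ["c2", "c4"]]
stored = {"c1", "c2", "c3", "c4"}
garbage = chunks_to_remove(stored, kept)
```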
If all goes well I should have at least a basic Brackup::Target::SFTP soon.
2006-10-06 01:47 am (UTC)
Yes, always incremental. I haven't done the code to remove chunks that don't exist in the set of backups (snapshots) you care about ... but it's basically like you describe.
Let me know if you want svn commit access, or if you want to do separate cpan releases of your target, that works too.
One more "I'm not on crack" confirmation: When restoring, does brackup-restore expect you to have already retrieved the metafile from the backup yourself?
Also, have you done a big encrypted restore? The volume of passphrase prompts I'm getting is... impractical. Enough so that I'm wondering if that's not what's meant to happen.
(And svn access is good -- separate CPAN releases are just an inconvenience for all involved. I'm still a cvs-head, though, so tell me what credentials you need from me to set me up. For that matter I can just fire you off the module if you want once it's shiny.)
2006-10-06 04:09 am (UTC)
Good questions!

"When restoring, does brackup-restore expect you to have already retrieved the metafile from the backup yourself?"

Currently, yes, but only because I forgot we store it on the target too. So I imagine a future mode to list the meta files on the server and retrieve them would be good.

"The volume of passphrase prompts I'm getting is... impractical."

I figured there was a gpg-agent thing like ssh-agent. I don't know gpg, though, so I punted on that. So no, I haven't done a big encrypted restore. That's a TODO item. (Maybe you want to fix it?)
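For the passphrase-prompt problem, something along these lines with gpg-agent may work (illustrative only; this is circa-GnuPG-1.x usage, and exact flags vary by version):

```shell
# Start an agent and export its environment for child processes
# (modern GnuPG starts the agent automatically, making this step moot).
eval "$(gpg-agent --daemon)"

# Tell gpg to consult the agent for passphrase caching.
echo "use-agent" >> ~/.gnupg/gpg.conf

# Subsequent gpg invocations during a restore should prompt once
# and then hit the agent's cache instead of re-prompting per chunk.
```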
As for svn, email email@example.com w/ the output of "htdigest" for realm "Danga" and the svn username you want. And that's it. Then use "svn" instead of "cvs" and it pretty much just works the same, except it doesn't suck. ;-)
BTW, if you haven't found it, the trunk is: http://code.sixapart.com/svn/brackup/trunk/
I'd love to hack on this with you, so lay on the questions.