brad's life - Brackup; gpg salting
Brad Fitzpatrick

Brackup; gpg salting [Mar. 20th, 2006|08:00 pm]

GPG "salts" (or whatever it's called) encrypted files, so if you encrypt a file multiple times using the same public key, the resultant encrypted file is always different.

For my purposes, this is a pain. Is there a way to disable it? I probably shouldn't, but I'm just curious. I realize that if I disable it, it's possible for others to prove the existence of a file on my machine, if they have my public key (easy) and the original file they want to check for.

So yes, I really want to keep it enabled, but I didn't plan for it, and now I have to go back and re-design a bit.

I wanted to keep my digest caches as simple caches, with no harm beyond extra network latency and CPU if a digest cache is lost, but now I need to factor in the fact that losing the local machine's digest cache will cause extra allocation of chunks on the server (because the newly encrypted file differs), which you pay for (in disk usage).

Which implies I need to store the digest cache on the server as well. Encrypted.
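
(Roughly what the digest cache buys, as a sketch; the names here are made up, not Brackup's actual code:)

    import hashlib

    # Map the digest of a raw chunk to the name it's stored under on
    # the server. Lose the map and an unchanged chunk gets re-encrypted
    # to a brand-new ciphertext, i.e. a brand-new paid-for chunk.
    class DigestCache:
        def __init__(self):
            self._map = {}

        def stored_name(self, raw_chunk: bytes):
            # None means "encrypt and upload, then call note_upload()"
            return self._map.get(hashlib.sha1(raw_chunk).hexdigest())

        def note_upload(self, raw_chunk: bytes, name: str):
            self._map[hashlib.sha1(raw_chunk).hexdigest()] = name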

But that doesn't scale, since it'll only get bigger with time, and the uploads will suck more and more each day, out of proportion to the files that actually changed.

Which implies storing iterative/delta digest caches, and having like a digest cache master index that lists all the digest caches on the server. But then there are race conditions if multiple backups are running.

Or I could just rely on the storage target to support enumeration of like objects. Like, "give me all digest cache files". But that removes the ability of a storage target to be purely key-value. So for the Filesystem, FTP, Scp, and Amazon targets this is no problem, but it might be a problem for other targets. MogileFS? Well, mogile at least lets you enumerate keys by prefix, so I guess it's possible, if you design for it.
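
(Interface-wise, something like this hypothetical sketch, not the real Brackup target API:)

    from abc import ABC, abstractmethod
    from typing import Iterator

    # A pure key-value storage target: put and get, nothing else.
    class Target(ABC):
        @abstractmethod
        def put(self, key: str, data: bytes) -> None: ...

        @abstractmethod
        def get(self, key: str) -> bytes: ...

    # The extra, non-key-value capability some targets could offer:
    class EnumerableTarget(Target):
        @abstractmethod
        def keys_with_prefix(self, prefix: str) -> Iterator[str]:
            """e.g. keys_with_prefix("digestcache.") -> all cache files."""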

Okay, so that's fine. I'll go with that.

But I'm just annoyed that I have to go implement it all now.

Fuck all this: I'll just rename the "Digest Cache" to the "Digest Database" and make it required, documenting that you can't delete it. Just store it on the same filesystem as the files you're backing up. If you lose it, well... you probably lost all your other files anyway and need to restore. Okay, the show (er, hacking) goes on.... s/cache/database/.
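
For the record, the rough shape (a Python sketch, assuming a dbm-style local file; Brackup itself is Perl, so this is illustration only):

    import dbm

    # Required, not a cache: a persistent local digest database kept
    # next to the files being backed up. Documented as "don't delete".
    class DigestDatabase:
        def __init__(self, path: str):
            self._db = dbm.open(path, "c")  # create if missing

        def get(self, raw_digest: str):
            v = self._db.get(raw_digest.encode())
            return v.decode() if v is not None else None

        def set(self, raw_digest: str, stored_name: str) -> None:
            self._db[raw_digest.encode()] = stored_name.encode()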

Comments:
From: ckd
2006-03-21 04:55 am (UTC)

The way GPG and similar tools do public-key crypto is that they generate a session key, encrypt the message with a regular symmetric cipher (3DES, IDEA, etc.), and then do the expensive public-key operation on the session key only. (This is also how you do multi-recipient messages: one session key, n encrypted copies of the key, one per recipient key.)

The problem, therefore, is that if you reuse the session key you are more likely to have it cracked/exposed/etc.
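
(A toy version of that scheme in Python, using the "cryptography" package; the real OpenPGP message format differs in detail:)

    import os
    from cryptography.hazmat.primitives import hashes
    from cryptography.hazmat.primitives.asymmetric import padding
    from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

    # One random session key encrypts the bulk data with a symmetric
    # cipher (AES-CTR here); only the small session key goes through
    # the expensive public-key step, once per recipient.
    def hybrid_encrypt(plaintext: bytes, recipient_public_keys: list):
        session_key = os.urandom(32)  # fresh randomness every call
        nonce = os.urandom(16)
        enc = Cipher(algorithms.AES(session_key), modes.CTR(nonce)).encryptor()
        body = enc.update(plaintext) + enc.finalize()
        oaep = padding.OAEP(mgf=padding.MGF1(hashes.SHA256()),
                            algorithm=hashes.SHA256(), label=None)
        wrapped = [pk.encrypt(session_key, oaep)
                   for pk in recipient_public_keys]
        return nonce, body, wrapped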
From: brad
2006-03-21 05:13 am (UTC)

Makes sense. I just wasn't thinking when I started.
From: photwenny
2006-03-21 05:54 am (UTC)

Dude, you rock...

Thinking is overrated.

(Not to pointlessly boost your ego or anything, but if it happens it's totally deserved.)

You are such an inspiration to me. I dig reading about all your projects, and even more so dig that you actually implement them. (I have kind of a serious mental stumbling block with moving from the "design phase" to actually starting something: I end up just designing and designing and designing, anticipating bugs and designing around them, but never actually implementing anything. But I digress...)

Now that I'm "old" it's fun to watch progress happening. One of these days I'll put my fingers to typing something more useful than blog comments, and when that happens I'll have you, among others, to thank.

So thanks, man!
From: ciphergoth
2006-03-21 07:59 am (UTC)

No, that's not the problem. A cipher like AES is designed to be secure even if the attacker has all but a few words of the entire codebook; there's no problem with re-using the session key that way. The problem is that the definition of semantic security means that if you encrypt the same thing twice, the attacker should be unable to discover that you have done so.

Once you break that guarantee, the next-most-secure thing you can have is a variable length super-pseudorandom-permutation (SPRP), but they are relatively expensive to calculate on big files compared to a stream cipher like CTR mode AES.

Things are even more hairy when it comes to public-key encryption. These days every definition of security still used for PK encryption depends on it being probabilistic; I don't think there's a concept equivalent to SPRP for PK encryption. And even if there were, it would have to directly encrypt large files, without using hybrid encryption, because as soon as you use hybrid encryption the randomness is back. Creating such a primitive and proving it secure is probably an open research problem.

I think the "fuck all this" solution is the right one.
From: taral
2006-03-21 08:02 am (UTC)

Hash the plaintext before encrypting, then use the hash to determine equivalence.
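
(Something like this sketch, presumably:)

    import hashlib

    # Digest the plaintext before encryption; equivalence checks then
    # compare digests and never touch the (randomized) ciphertext.
    stored = {}  # plaintext digest -> name of encrypted blob on server

    def digest(chunk: bytes) -> str:
        return hashlib.sha1(chunk).hexdigest()

    def already_uploaded(chunk: bytes) -> bool:
        return digest(chunk) in stored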
From: taral
2006-03-21 08:05 am (UTC)

Okay, I just worked out that this is what your "digest database" does. You basically have an encrypted content-addressable store.
From: brad
2006-03-21 08:06 am (UTC)

There are docs in svn as of about 15 minutes ago.