Tonight - brad's life
Brad Fitzpatrick
[ website | bradfitz.com ]

Tonight [Apr. 5th, 2008|01:32 am]
Brad Fitzpatrick
[Tags|, , ]

I had two options for this Friday evening:
  • go to Tahoe & ski
  • go to Monterey & dive
I did neither.

Skiing would've required both days, I'd already committed to diving back when my friends did their certification check-out dives, and I'd kinda hurt my knee the other day playing Ultimate.

But scuba fell through too: I couldn't find a scuba partner for tomorrow morning, and everybody else will be in classes, sitting on the bottom of the ocean tossing their regulators and finding them again.

So I sat at home, cleaned up my music collection some more, and did this hack: smart MP3-aware file chunking for Brackup. Now I can retag music at will and iterative backups in the future won't re-backup the music bytes, just the ID3 tags. MP3 files, when smart chunking is enabled, now have 1, 2, or 3 chunks, depending on the number and type of ID3 tags.
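The splitting idea is roughly this: an ID3v2 tag sits at the front of the file (its header declares its own length), an ID3v1 tag is a fixed 128-byte trailer, and everything in between is the audio. A minimal Python sketch of that boundary logic — Brackup itself is Perl, and `split_mp3`/`id3v2_size` are my own illustrative names, not the real API:

```python
def id3v2_size(data: bytes) -> int:
    # ID3v2 header: "ID3", 2 version bytes, 1 flags byte, then a 4-byte
    # "syncsafe" length (7 bits per byte), plus the 10 header bytes themselves.
    if data[:3] != b"ID3" or len(data) < 10:
        return 0
    b = data[6:10]
    return 10 + ((b[0] << 21) | (b[1] << 14) | (b[2] << 7) | b[3])

def split_mp3(data: bytes) -> list:
    """Split an MP3 into up to 3 chunks: ID3v2 tag, audio bytes, ID3v1 tag.
    Retagging only changes the small first/last chunks; the big middle
    chunk's bytes (and so its digest) stay stable across retags."""
    head = id3v2_size(data)
    # ID3v1 is a fixed 128-byte trailer starting with "TAG".
    tail = 128 if len(data) >= head + 128 and data[-128:-125] == b"TAG" else 0
    chunks = []
    if head:
        chunks.append(data[:head])
    chunks.append(data[head:len(data) - tail])
    if tail:
        chunks.append(data[len(data) - tail:])
    return chunks
```

A file with both tag types yields 3 chunks, one with neither yields a single chunk — which is where the "1, 2, or 3 chunks" above comes from.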

As for fun stuff: I'll drive to Monterey tomorrow and just pay for one night's hotel, party it up in the hotel bar/pool, then dive Sunday morning with Erin, Whitaker, Tessa, Dan, Julie, and not Ojan.

Update: (oh, I also got iSCSI working from OS X to Linux. what a frickin' eventful evening, apparently.)

[User Picture]From: mart
2008-04-05 01:10 pm (UTC)

Is it acceptable for the $file in a returned PositionedChunk to be some file outside of the backup source path? I ask because I have a bunch of files that need special treatment to copy them safely -- databases, for example -- and with this "smart chunking" it could perhaps do the "hot copy" step into a temporary directory and use that file in place of the original file.

(I'd also like to do some "smart chunking" for SQLite databases and the like to avoid re-uploading the entire database whenever something changes, but that's not really bothering me enough to do it yet since my SQLite databases are small.)

[User Picture]From: brad
2008-04-05 05:57 pm (UTC)
Perverse, but that could work, I think.
[User Picture]From: taral
2008-04-05 06:35 pm (UTC)
It's a shame you can't do some kind of smart delta. But that would require having a local copy of the latest backed-up version...
[User Picture]From: brad
2008-04-05 08:59 pm (UTC)
One of us is confused. And I think you. :)

I don't need the latest backed-up version. If the file doesn't exist already on the remote data store, I cut it into chunks (smartly) and see which of those chunks already exist on the remote store. In the retagging case, the 5M+ chunk is already there (likely from a different offset, but the remote store knows nothing of offsets or file lengths, just digests... it's content-addressable), and then I upload the changed chunks (the ID3 tag metadata).

So it _is_ a smart delta, in a sense.

Before, I could already rearrange all my files and backups would still be instant. Now I can even change parts of files and only the changed parts (regardless of offset in the file) need to be backed up.

With mart's work, we can also now mount any backup, so we essentially have a snapshotting filesystem. If I don't touch my files for a week, but I do daily snapshots, those snapshots are free, re-using the same chunks. The storage cost is only proportional to changes.
[User Picture]From: taral
2008-04-05 09:39 pm (UTC)
*laugh* No, neither of us is confused. With the latest backed-up version, you could simply store the bsdiff of the new version against the old one, effectively getting the smart chunking you talk about.
From: evan
2008-04-06 03:48 am (UTC)
(Note that re-using chunks is a design tradeoff -- if a chunk gets corrupted, then all of your backups that use that chunk are corrupted.)
[User Picture]From: brad
2008-04-06 06:44 am (UTC)
Yeah, but that's a pretty minor problem: when asking the remote store if it has a chunk, it can double-check it to see if its bytes still match its digest. Or it can be done in the background, with two copies of each unique chunk stored, and then the corrupt one replaced with the good one. etc, etc.

Mostly I'm optimizing for not having to think (backup daily, automated!) and for my sucky upstream bandwidth. Everything else can be engineered away.
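The double-check mentioned above — trusting a stored chunk only if its bytes still hash to its key — is tiny to model. A Python sketch, with `has_chunk` and the plain dict store being my own stand-ins:

```python
import hashlib

def has_chunk(store: dict, digest: str) -> bool:
    """Report a chunk as present only if its bytes still hash to its
    digest key; a bit-rotted chunk is treated as missing, so the next
    backup re-uploads a good copy instead of silently reusing it."""
    data = store.get(digest)
    return data is not None and hashlib.sha1(data).hexdigest() == digest
```

That turns Evan's corruption concern into a re-upload cost rather than silently corrupted backups.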
[User Picture]From: wumbawoman
2008-04-07 08:40 pm (UTC)
Hello. Just wanted to let you know that I added you. I see that you scuba dive and so do I. Just want to read up on your dives.

If you want me to unfriend you, please let me know!

[User Picture]From: calebegg
2008-04-10 07:53 am (UTC)
So, I don't think this is the best place to ask this, but I don't know where I should be asking, so here goes:

I started using Brackup to back up to an external drive on Vista. It works (with some minor hacking) for non-encrypted targets, but if I add an encryption key it writes one chunk and then just exits with no error message. Any ideas at all as to what could be happening?

I also get this error (warning?) every time I run brackup, but it doesn't seem to affect anything:

"Use of uninitialized value in concatenation (.) or string at C:/Users/Caleb/Programs/Brackup/lib/Brackup/Target.pm line 17"

[User Picture]From: brad
2008-04-10 03:02 pm (UTC)
I'd ask the mailing list. Somebody there (mart?) would know the answer for Windows. I sure don't.

[User Picture]From: ciphergoth
2008-05-22 08:04 am (UTC)
There's no brackup tag on this entry...