rsync, but supporting directory re-org - brad's life — LiveJournal
Brad Fitzpatrick

[ website | bradfitz.com ]

rsync, but supporting directory re-org [Dec. 4th, 2003|10:23 am]
Brad Fitzpatrick
I re-organize my mp3 directories at home a lot. Doing an rsync from the office all the time would be too slow. If I just rename or move some directories, I don't want to redownload everything.

I'm thinking of a daemon that walks the directory tree and gives the client the full path of every file along with its MD5 (which the daemon could cache). The client also keeps a mapping of MD5 to files, so it can quickly move things around.
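
A rough sketch of that idea, nothing more: index both trees by MD5, then work out which server paths the client can satisfy from a file it already holds. The function names (`build_index`, `plan_sync`) are made up for illustration, there is no daemon or caching here, and both trees are assumed to be readable as local directories.

```python
import hashlib
import os


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_index(root):
    """Walk root and map each relative path to the MD5 of its contents."""
    index = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            index[os.path.relpath(full, root)] = file_md5(full)
    return index


def plan_sync(server_index, client_index):
    """Return (reuse, downloads) needed to make the client match the server.

    reuse is a list of (existing_relpath, wanted_relpath) pairs the client
    can satisfy from a file it already has; downloads lists the relative
    paths whose content the client lacks entirely.
    """
    by_hash = {}
    for path, digest in client_index.items():
        by_hash.setdefault(digest, []).append(path)
    reuse, downloads = [], []
    for path, digest in sorted(server_index.items()):
        if client_index.get(path) == digest:
            continue  # already in place with the right content
        candidates = by_hash.get(digest)
        if candidates:
            reuse.append((candidates[0], path))
        else:
            downloads.append(path)
    return reuse, downloads
```

Whether each `reuse` pair becomes a rename or a copy is left to the caller, since the existing path may still be wanted on the server side too.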

If the server deleted something, the client can move it to some 'local' directory with the same path as before. 'local' would still be indexed, but never deleted from.
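
The 'local' stash could look something like this. It is only a sketch under assumptions: both indexes are plain dicts of relative path to content hash, the helper name `stash_orphans` is invented, and "deleted on the server" is taken to mean "this hash no longer appears anywhere on the server".

```python
import os
import shutil


def stash_orphans(client_root, client_index, server_index, local_name="local"):
    """Move client files the server no longer has into client_root/local_name.

    Both indexes map relative path -> content hash. A file is an orphan when
    its hash appears nowhere on the server. Files already under local_name
    are skipped, so nothing is ever removed from that directory.
    """
    server_hashes = set(server_index.values())
    stashed = []
    for relpath, digest in sorted(client_index.items()):
        parts = relpath.split(os.sep)
        if parts[0] == local_name or digest in server_hashes:
            continue
        # Keep the same relative path as before, just rooted under 'local'.
        target = os.path.join(client_root, local_name, relpath)
        os.makedirs(os.path.dirname(target), exist_ok=True)
        shutil.move(os.path.join(client_root, relpath), target)
        stashed.append(relpath)
    return stashed
```

A real version would also need to decide what happens when a file with the same path is already sitting in 'local'.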

Now, id3/ogg tags also change. So each file should really have two hashes: one for the entire file, and one for the music segment. I could just rsync again for that part, but since we know the region that changed, might as well just add that to the protocol too.
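
For the mp3 case, the two hashes might be computed roughly like this. This is a sketch that assumes the whole file fits in memory as `bytes`, handles only a leading ID3v2 tag and a trailing 128-byte ID3v1 tag, and ignores Ogg comments entirely; `audio_span` and `two_hashes` are illustrative names.

```python
import hashlib


def audio_span(data):
    """Return (start, end) offsets of data with ID3v2/ID3v1 tags excluded."""
    start, end = 0, len(data)
    if len(data) >= 10 and data[:3] == b"ID3":
        # The ID3v2 size field is four "synchsafe" bytes, 7 useful bits each,
        # and does not count the 10-byte header itself.
        size = 0
        for byte in data[6:10]:
            size = (size << 7) | (byte & 0x7F)
        start = 10 + size
        if data[5] & 0x10:  # footer-present flag adds another 10 bytes
            start += 10
        start = min(start, end)
    # An ID3v1 tag is the last 128 bytes and begins with b"TAG".
    if end - start >= 128 and data[end - 128:end - 125] == b"TAG":
        end -= 128
    return start, end


def two_hashes(data):
    """Return (whole_file_md5, audio_only_md5) for the given file bytes."""
    start, end = audio_span(data)
    whole = hashlib.md5(data).hexdigest()
    audio = hashlib.md5(data[start:end]).hexdigest()
    return whole, audio
```

If the audio hashes match but the whole-file hashes differ, only the bytes outside `audio_span` would need to be transferred.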

Before I write this, is there something out there already I can use?

I'm reluctant to use AFS/Coda/Intermezzo, but maybe?

From: jon
2003-12-04 10:26 am (UTC)


I think you want something along the lines of Marimba, but open source.
(Reply) (Thread)
From: brad
2003-12-04 11:05 am (UTC)

Re: Marimba

That looks like overkill, I think.
(Reply) (Parent) (Thread)
From: jon
2003-12-04 11:35 am (UTC)
Yes, for you, it would be.

It's nifty because it computes an MD5 hash for each entry in the file system in such a way as to allow you to move files around (i.e. between directories). When a client synchronizes a relocated file, it knows enough to just move the existing file instead of deleting and then refetching the file in its new location.
(Reply) (Parent) (Thread)
From: taral
2003-12-04 11:30 am (UTC)

AFS requires kerberos, which sucks. Oh, and AFS sucks too.
Coda is broken.
Intermezzo is broken.

Kind of leaves you with few (if any) options... I'd love a proper disconnected-operation filesystem myself.
(Reply) (Thread)
From: evan
2003-12-04 12:45 pm (UTC)
I want the same thing, to keep my laptop and desktop in sync.
(Reply) (Thread)
From: brad
2003-12-04 12:54 pm (UTC)
Well, how about I write it and give you a copy, eh?

I'll only charge a $1/file/year license fee, with a max fee of $250,000/year. Sound good?

Don't make your own. I have patents pending.
(Reply) (Parent) (Thread)
From: diez
2003-12-04 06:28 pm (UTC)
I'll buy it, but lower the price a bit eh? Haha.

$1 a file is a little bit over the edge, but I think that sounds like a bit of a joke about Napster & iTunes.

Oh, yeah, since you like Christmas lights a lot, check out the 11-foot fiberglass/Lexan candy canes I made, haha (Lots of Lights).

(Reply) (Parent) (Thread)
From: mart
2003-12-04 07:46 pm (UTC)

Since I got my laptop I've been fumbling with the idea of having some kind of checkout/checkin system for files on my “fixed” systems (desktop machine and server) to my laptop.

My laptop only has a small hard disk (6GB) so I don't want the data mirrored, but if I'm going to be away from my main systems for a while I generally want to have some portion of my filesystem on the laptop. What I want, then, is to “check out” files and directories to my laptop, and set them read-only on the fixed system until they are checked back in or explicitly unlocked (in the case where the checked out copy is now junk, or missing). I also want the alternative of checking out files read-only, in which case they remain read/write on the main system and are read-only on the laptop.

This way there's only one “most recent copy” of each file, and I avoid a situation that has arisen before. I edited some source code on my laptop while I was out, then later opened it on my main system and wondered where my changes went, forgetting I'd made them on the laptop. I re-implemented them a little differently and went on working, only to later discover the “parallel universe” version on my laptop, notice that it was better, and have to hack the newer one to include the stuff from the “fork”.

The other thing I'd like, if I can make it happen, is for the filesystem on the laptop to look the same as it does on the LAN, where the other machines' filesystems are mounted directly. I want the directories and files to appear but not be readable or writable unless they were checked out. That way I stay conscious of the filesystem layout, I don't have to change my routine when I'm away (using a temporary mirror directory with a different path instead of the mounted network directory), and MRU lists in apps keep working, etc.

I'm not really sure how best to go about this, though, especially since I now have two systems to worry about. When I originally came up with this, I didn't have my file/dev server so I was just mirroring bits of the filesystem on my desktop system in a directory on my laptop and trying to remember to resync it later. I don't know much about this stuff, but I suspect the only way to do this completely transparently would be to write a filesystem-type driver which sits between userspace and the real underlying driver and communicates with userspace apps to do the checkin/checkout and keep the cached directory/file layout in sync. Not really something I'm currently qualified to write, sadly. :(

(Reply) (Parent) (Thread)
From: mart
2003-12-04 07:47 pm (UTC)

Heh… that came out much longer than I originally intended.

(Reply) (Parent) (Thread)
From: nymec
2003-12-04 03:58 pm (UTC)
This might be a bit ugly, but why not keep all the mp3 files in one directory, with symlinks to the real files in your organized structure? Then rsync just tosses around symlinks and the occasional new mp3 file.
(Reply) (Thread)
From: billemon
2003-12-04 04:38 pm (UTC)
Sounds like that should be a big RFE for rsync. Since it computes checksums anyway, or at least in some cases ...

Wouldn't work if you moved the file *and* changed the tag, though.

Have you considered leaving the files in one place and using either hard or symbolic links? That won't confuse rsync so much ...

(Reply) (Thread)
From: toast
2003-12-04 09:30 pm (UTC)
If you want a simple solution for this one task, and assuming the to-be-synced systems are identical at the start, couldn't you just log the changes (the mv's and such) as you make them and then run those changes again on the system you want to sync up?
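
Something like the following is what replaying such a log could look like. It is a minimal sketch: the log is assumed to be an ordered list of (old, new) relative paths recorded however you like, and `replay_moves` is a made-up name.

```python
import os


def replay_moves(root, move_log):
    """Apply (old_relpath, new_relpath) renames, in order, beneath root.

    Returns the number of renames applied. Entries whose source is missing
    are skipped, since that means the replica has diverged from the log.
    """
    applied = 0
    for old, new in move_log:
        src = os.path.join(root, old)
        dst = os.path.join(root, new)
        if not os.path.exists(src):
            continue  # leave this one for a full sync to sort out
        os.makedirs(os.path.dirname(dst), exist_ok=True)
        os.rename(src, dst)  # works for single files and whole directories
        applied += 1
    return applied
```

Order matters here: a directory rename early in the log changes the paths that later entries refer to, so the log has to be replayed exactly as recorded.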
(Reply) (Thread)
From: toast0
2003-12-05 12:05 am (UTC)
Hmmm... perhaps you could journal the changes? Maybe it wouldn't be terribly difficult to hack one of the journaled filesystems so it doesn't erase or overwrite the journal after the changes have definitely been written to disk.
(Reply) (Parent) (Thread)
From: muerte
2003-12-04 09:53 pm (UTC)
I thought rsync was smart enough to say:

I see we both have file foo.txt that's 500 megs, and they're identical except for the last meg so I'll just redownload that last part.

Doesn't it do subhashing of large files that have changed only slightly?
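
The "subhashing" idea can be sketched like this. Note this is a deliberately simplified stand-in, not rsync's actual algorithm: it only compares blocks at the same offset, whereas rsync pairs a rolling weak checksum with a stronger hash so it can also match blocks that have shifted position. `changed_blocks` is an illustrative name.

```python
import hashlib


def block_hashes(data, block_size):
    """Return the MD5 of each fixed-size block of data, in order."""
    return [hashlib.md5(data[i:i + block_size]).hexdigest()
            for i in range(0, len(data), block_size)]


def changed_blocks(old, new, block_size=4096):
    """Return indexes of blocks in new that differ from old at that offset."""
    old_hashes = block_hashes(old, block_size)
    new_hashes = block_hashes(new, block_size)
    return [i for i, digest in enumerate(new_hashes)
            if i >= len(old_hashes) or old_hashes[i] != digest]
```

With this, a file that differs only in its last block yields a single index, and only that block would need to be re-sent.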
(Reply) (Thread)
From: demo
2003-12-05 07:59 am (UTC)


You might check out Unison; someone pointed me to it a few weeks ago as a random cool util. It's a synchronizer based on the 'rsync algorithm'. Based on the following entry in the changelog, it looks like it does what you want.


# File movement hack: Unison now tries to use local copy instead of transfer for moved or copied files. It is controled by a boolean option ``xferbycopying''.

(Reply) (Thread)