Treearrange: a complement to rsync [Aug. 27th, 2006|10:59 pm]
Brad Fitzpatrick

My quick hack of the evening is treearrange, which rearranges a directory tree to match a description of another directory tree. The tool generates that description, too.

What problem does this solve? Here's my typical photo-uploading workflow:

-- bring GBs of unorganized photos to work
-- upload GBs of photos from work to my personal server at 100 Mbps.
-- go home
-- rsync down from my server all photos at 6 Mbps. still pretty fast.
-- rearrange/rename. instead of DCIM/nnnCANON/, I rearrange into, say, "Day5-Paris/".

Now, how do I get my photos online? Two choices:

1) upload them from home.
2) upload them from my server (not my home server)

But the problems with the above are, respectively:

1) slow upstream. Not 100 Mbps. More like 1. GBs would take forever.
2) the files aren't in the right places on the server. I only rearranged them locally.

Rsync won't do. Rsync doesn't deal with files changing directories.

Enter treearrange:

Here's my server, where it all begins:
bradfitz@personal_web:~/honeymoon_pics$ find -type d
.
./DCIM
./DCIM/179CANON
./DCIM/180CANON
./DCIM/181CANON
./DCIM/182CANON
./DCIM/183CANON
./DCIM/184CANON
./DCIM/185CANON
./DCIM/186CANON
./Elph
./Elph/DCIM
./Elph/DCIM/135CANON
./Elph/DCIM/136CANON
./Elph/DCIM/CANONMSC
I rsync them down to my house (pretty fast), and rearrange them:
sammy:Sorted $ find -type d
.
./Barcelona_Airport_Hell
./Barcelona-1
./Barcelona-2
./Boat
./Lisa
./Malta
./Marseille
./Midnight_Buffet
./Naples-Vesuvius-Pompeii
./Palma_de_Mallorca
./Rome
./Stockholm-1
Now, using treearrange, I snapshot where the files are supposed to live:
sammy:Sorted $ ./treearrange --to=arrange.dat

$ head arrange.dat 
945fc334853b4c5edfca34c9908258eacfc86823        Barcelona_Airport_Hell/IMG_8675.JPG
fe5551ad173e425c1c8f40c4f06e72389df7c2ab        Barcelona_Airport_Hell/IMG_8676.JPG
c9f0589a24a8de4a65e8670b8bbb4f570a4452ca        Barcelona_Airport_Hell/IMG_8677.JPG
b244692481c84857d2e7824ec310ca074eee5e6c        Barcelona_Airport_Hell/IMG_8678.JPG
20c6dd346021689b32702c28ec62cde6a2c3a7be        Barcelona_Airport_Hell/IMG_8679.JPG
f1fdd495d10aee11a1cb96019b7b6c0a11e5465f        Barcelona_Airport_Hell/IMG_8680.JPG
f429010fe9a906c8bf513016e03e371ee711f3f6        Barcelona_Airport_Hell/IMG_8681.JPG
7885c7b71cd21a28c11985a591e69e81a12ee316        Barcelona_Airport_Hell/IMG_8683.JPG
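
For the curious, here's roughly what the snapshot step could look like in Python. This is a sketch, not the actual treearrange source: the 40-hex-character digests above are consistent with SHA-1, so I assume SHA-1 here, and I assume the two columns are separated by a tab. The names file_sha1, snapshot and write_manifest are made up for illustration.

```python
import hashlib
import os


def file_sha1(path, chunk_size=1 << 20):
    """Return the hex SHA-1 of a file, reading in chunks to bound memory use.

    SHA-1 is an assumption based on the 40-character digests in arrange.dat.
    """
    digest = hashlib.sha1()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def snapshot(root):
    """Yield (digest, path relative to root) for every file under root."""
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit subdirectories in a stable order
        for name in sorted(filenames):
            full = os.path.join(dirpath, name)
            yield file_sha1(full), os.path.relpath(full, root)


def write_manifest(root, manifest_path):
    """Write one 'digest<TAB>relative/path' line per file (tab is assumed)."""
    # Hash everything before opening the output, so a manifest that lives
    # inside root is never hashed while it is being written.
    rows = list(snapshot(root))
    with open(manifest_path, "w", encoding="utf-8") as out:
        for digest, rel in rows:
            out.write(f"{digest}\t{rel}\n")
```

Calling write_manifest("Sorted", "arrange.dat") would then play the role of the --to=arrange.dat run above. The point of keying on the content digest is that the file's identity survives any rename or move.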
Next I upload the arrange.dat and treearrange to my server, and do the opposite:
bradfitz@personal_web:~/honeymoon_pics$  ./treearrange --from=arrange.dat
file 1 / 738...
file 2 / 738...
  DCIM/179CANON/IMG_7977.JPG -> Barcelona-1/IMG_7977.JPG
file 3 / 738...
  DCIM/179CANON/IMG_7978.JPG -> Barcelona-1/IMG_7978.JPG
file 4 / 738...
  DCIM/179CANON/IMG_7979.JPG -> Barcelona-1/IMG_7979.JPG
file 5 / 738...
  DCIM/179CANON/IMG_7980.JPG -> Barcelona-1/IMG_7980.JPG
file 6 / 738...
  DCIM/179CANON/IMG_7981.JPG -> Barcelona-1/IMG_7981.JPG
.....

bradfitz@personal_web:~/honeymoon_pics$ find -type d
.
./Barcelona-1
./Boat
./Marseille
./Lisa
./Rome
./Naples-Vesuvius-Pompeii
./Malta
./Midnight_Buffet
./Palma_de_Mallorca
./Barcelona-2
./Barcelona_Airport_Hell
./Stockholm-1
Tada!

(then I can rsync and get any rotations/adjustments/etc that I did locally which weren't just a directory move...)
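
And the server-side half, again as a hedged Python sketch rather than the real tool: hash every file that's currently in the tree, look the digest up in the manifest, and move the file if the manifest wants it somewhere else. The function names, the tab-separated manifest format and the SHA-1 choice are all assumptions for illustration. The sketch never overwrites an existing file, and if two files have identical content only one manifest path per digest is kept.

```python
import hashlib
import os


def file_sha1(path, chunk_size=1 << 20):
    """Hex SHA-1 of a file (SHA-1 is assumed from the 40-char digests)."""
    digest = hashlib.sha1()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def load_manifest(manifest_path):
    """Map digest -> desired relative path (tab-separated lines assumed)."""
    wanted = {}
    with open(manifest_path, encoding="utf-8") as fh:
        for line in fh:
            line = line.rstrip("\n")
            if line:
                digest, rel = line.split("\t", 1)
                wanted[digest] = rel  # duplicate content: last path wins
    return wanted


def rearrange(root, manifest_path, dry_run=False):
    """Move files under root to where the manifest says; return the moves."""
    wanted = load_manifest(manifest_path)
    # List the files up front so moving them doesn't disturb the walk.
    current = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            current.append(os.path.relpath(os.path.join(dirpath, name), root))
    moves = []
    for rel in sorted(current):
        target = wanted.get(file_sha1(os.path.join(root, rel)))
        if target is None or target == rel:
            continue  # unknown file, or already in the right place
        dest = os.path.join(root, target)
        if os.path.exists(dest):
            continue  # never overwrite an existing file
        moves.append((rel, target))
        if not dry_run:
            os.makedirs(os.path.dirname(dest) or ".", exist_ok=True)
            os.rename(os.path.join(root, rel), dest)
    return moves


def prune_empty_dirs(root):
    """Remove directories left empty by the moves, deepest first."""
    for dirpath, _dirnames, _filenames in os.walk(root, topdown=False):
        if dirpath != root and not os.listdir(dirpath):
            os.rmdir(dirpath)
```

Running rearrange(".", "arrange.dat", dry_run=True) first gives the list of (old, new) pairs without touching anything, which is a cheap sanity check before moving GBs of photos around.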

Comments:
From: gaal
2006-08-28 06:55 am (UTC)
Oh, excellent, I've been wanting something like this for music. I wonder though if this can't be made better by knowing more about tags. The problem is syncing files several ways, when sometimes the updates are to metadata. Unfortunately the filenames can sometimes change too, so there's no key for this!
From: evan
2006-08-28 07:03 am (UTC)
I have exactly this problem! I guess you could fingerprint the files minus the tags -- the one part that doesn't change is the music data itself.
From: gaal
2006-08-28 07:24 am (UTC)
But then how can resume work?

I start syncing by pulling a new file from remotehost to my localhost. Then the download is interrupted, and I resume it. What identifies the partial file on localhost?
From: gaal
2006-08-28 07:26 am (UTC)
Hm, maybe the syncer should pre-tag all files with their own fingerprint and make sure that gets transmitted early?
From: brad
2006-08-28 05:14 pm (UTC)
Sure there is. The digest of the non-tag part of the file is the key. Screw audio fingerprinting.... ignoring the ID3 stuff of the mp3/ogg when digesting is easy enough. Then just fix the ID3 up on the other side, if the modtime is older.
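
A rough Python sketch of the "digest of the non-tag part" idea from this thread, under some loud assumptions: it only handles MP3 files with an ID3v2 tag at the start and/or an ID3v1 tag at the end, it does nothing for Ogg (whose comments live inside the stream), and the names audio_span and audio_digest are invented for illustration. SHA-1 is likewise just an assumed choice of digest.

```python
import hashlib


def audio_span(data):
    """Return (start, end) of the bytes that are not ID3v1/ID3v2 tag data."""
    start, end = 0, len(data)
    # ID3v2 header: "ID3", 2 version bytes, 1 flags byte, 4-byte synchsafe
    # size (7 significant bits per byte) that excludes the 10-byte header.
    if len(data) >= 10 and data[:3] == b"ID3":
        size = 0
        for byte in data[6:10]:
            size = (size << 7) | (byte & 0x7F)
        start = 10 + size
        if data[5] & 0x10:  # footer flag (ID3v2.4) adds another 10 bytes
            start += 10
        start = min(start, end)  # guard against a truncated file
    # ID3v1: fixed 128-byte trailer that begins with "TAG".
    if end - start >= 128 and data[end - 128:end - 125] == b"TAG":
        end -= 128
    return start, end


def audio_digest(path):
    """Hex SHA-1 of a file's contents with ID3 tags skipped."""
    with open(path, "rb") as fh:
        data = fh.read()  # reads the whole file; fine for a sketch
    start, end = audio_span(data)
    return hashlib.sha1(data[start:end]).hexdigest()
```

Two copies of a track whose only difference is the tags would get the same audio_digest, so the digest can act as the key even when both the tags and the filename have changed.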