Treearrange: a complement to rsync - brad's life (LiveJournal)
Brad Fitzpatrick


Treearrange: a complement to rsync [Aug. 27th, 2006|10:59 pm]
Brad Fitzpatrick

My quick hack of the evening is treearrange, which rearranges a directory tree to match a description of another directory tree. The tool generates that description, too.

What problem does this solve? Here's my typical photo-uploading workflow:

-- bring GBs of unorganized photos to work
-- upload GBs of photos from work to my personal server at 100 Mbps.
-- go home
-- rsync down from my server all photos at 6 Mbps. still pretty fast.
-- rearrange/rename. instead of DCIM/nnnCANON/, I rearrange into, say, "Day5-Paris/".

Now, how do I get my photos online? Two choices:

1) upload them from home.
2) upload them from my server (not my home server)

But the problems with the above are, respectively:

1) slow upstream. Not 100 Mbps. More like 1. GBs would take forever.
2) the files aren't in the right places on the server. I only rearranged them locally.

Rsync won't do. Rsync doesn't notice that a file has merely changed directories, so it would upload every moved photo all over again.

Enter treearrange:

Here's my server, where it all begins:
bradfitz@personal_web:~/honeymoon_pics$ find -type d
.
./DCIM
./DCIM/179CANON
./DCIM/180CANON
./DCIM/181CANON
./DCIM/182CANON
./DCIM/183CANON
./DCIM/184CANON
./DCIM/185CANON
./DCIM/186CANON
./Elph
./Elph/DCIM
./Elph/DCIM/135CANON
./Elph/DCIM/136CANON
./Elph/DCIM/CANONMSC
I rsync them down to my house (pretty fast), and rearrange them:
sammy:Sorted $ find -type d
.
./Barcelona_Airport_Hell
./Barcelona-1
./Barcelona-2
./Boat
./Lisa
./Malta
./Marseille
./Midnight_Buffet
./Naples-Vesuvius-Pompeii
./Palma_de_Mallorca
./Rome
./Stockholm-1
Now, using treearrange, I snapshot where the files are supposed to live:
sammy:Sorted $ ./treearrange --to=arrange.dat

$ head arrange.dat 
945fc334853b4c5edfca34c9908258eacfc86823        Barcelona_Airport_Hell/IMG_8675.JPG
fe5551ad173e425c1c8f40c4f06e72389df7c2ab        Barcelona_Airport_Hell/IMG_8676.JPG
c9f0589a24a8de4a65e8670b8bbb4f570a4452ca        Barcelona_Airport_Hell/IMG_8677.JPG
b244692481c84857d2e7824ec310ca074eee5e6c        Barcelona_Airport_Hell/IMG_8678.JPG
20c6dd346021689b32702c28ec62cde6a2c3a7be        Barcelona_Airport_Hell/IMG_8679.JPG
f1fdd495d10aee11a1cb96019b7b6c0a11e5465f        Barcelona_Airport_Hell/IMG_8680.JPG
f429010fe9a906c8bf513016e03e371ee711f3f6        Barcelona_Airport_Hell/IMG_8681.JPG
7885c7b71cd21a28c11985a591e69e81a12ee316        Barcelona_Airport_Hell/IMG_8683.JPG
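For illustration, here is a minimal Python sketch of what the snapshot step could look like. It is not the treearrange source. It assumes the 40-character fingerprints are SHA-1 digests of file contents (the length fits, but the post doesn't say) and that each line is a digest, whitespace, then a path relative to the tree root. The names `file_digest` and `snapshot` are made up for this example.

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-1 hex digest of a file's contents (SHA-1 is an assumption)."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def snapshot(root, manifest_path):
    """Write one 'digest<TAB>relative/path' line per file under root.

    Returns the number of files recorded.
    """
    manifest_abs = os.path.abspath(manifest_path)
    entries = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.abspath(full) == manifest_abs:
                continue  # don't fingerprint an older copy of the manifest
            entries.append((os.path.relpath(full, root), file_digest(full)))
    entries.sort()  # stable, path-ordered output
    with open(manifest_path, "w", encoding="utf-8") as out:
        for rel, digest in entries:
            out.write(f"{digest}\t{rel}\n")
    return len(entries)
```

Calling `snapshot(".", "arrange.dat")` from inside the sorted tree would produce a file shaped like the listing above. Hashing contents rather than trusting file names is what lets two files with the same name stay distinct.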
Next I upload the arrange.dat and treearrange to my server, and do the opposite:
bradfitz@personal_web:~/honeymoon_pics$  ./treearrange --from=arrange.dat
file 1 / 738...
file 2 / 738...
  DCIM/179CANON/IMG_7977.JPG -> Barcelona-1/IMG_7977.JPG
file 3 / 738...
  DCIM/179CANON/IMG_7978.JPG -> Barcelona-1/IMG_7978.JPG
file 4 / 738...
  DCIM/179CANON/IMG_7979.JPG -> Barcelona-1/IMG_7979.JPG
file 5 / 738...
  DCIM/179CANON/IMG_7980.JPG -> Barcelona-1/IMG_7980.JPG
file 6 / 738...
  DCIM/179CANON/IMG_7981.JPG -> Barcelona-1/IMG_7981.JPG
.....

bradfitz@personal_web:~/honeymoon_pics$ find -type d
.
./Barcelona-1
./Boat
./Marseille
./Lisa
./Rome
./Naples-Vesuvius-Pompeii
./Malta
./Midnight_Buffet
./Palma_de_Mallorca
./Barcelona-2
./Barcelona_Airport_Hell
./Stockholm-1
Tada!

(Then I can rsync to pick up any rotations, adjustments, etc. that I made locally and that weren't just a directory move.)
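The "do the opposite" step can be sketched the same way. Again, this is a hedged Python illustration and not the real tool: it assumes SHA-1 content digests and a "digest, whitespace, relative path" manifest, and the names `load_manifest`, `rearrange`, and `prune_empty_dirs` are invented here. It fingerprints every file currently in the tree, looks the fingerprint up in the manifest, and moves the file if it isn't already where the manifest says it belongs.

```python
import hashlib
import os
import shutil


def file_digest(path, chunk_size=1 << 20):
    """SHA-1 hex digest of a file's contents (SHA-1 is an assumption)."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def load_manifest(manifest_path):
    """Map digest -> desired relative path. Duplicate contents: last line wins."""
    wanted = {}
    with open(manifest_path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if line:
                digest, rel = line.split(None, 1)
                wanted[digest] = rel
    return wanted


def rearrange(root, manifest_path, dry_run=False):
    """Move files under root to the paths the manifest names for their contents."""
    wanted = load_manifest(manifest_path)
    manifest_abs = os.path.abspath(manifest_path)
    # List files up front so moving them cannot disturb the directory walk.
    current = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.abspath(full) != manifest_abs:
                current.append(full)
    moves = []
    for full in sorted(current):
        rel = os.path.relpath(full, root)
        target = wanted.get(file_digest(full))
        if target is None or target == rel:
            continue  # content not in the manifest, or already in place
        dest = os.path.join(root, target)
        if os.path.exists(dest):
            continue  # never overwrite an existing file
        moves.append((rel, target))
        if not dry_run:
            os.makedirs(os.path.dirname(dest) or ".", exist_ok=True)
            shutil.move(full, dest)
    return moves


def prune_empty_dirs(root):
    """Remove directories left empty by the moves, deepest first."""
    for dirpath, _dirnames, _filenames in os.walk(root, topdown=False):
        if dirpath != root and not os.listdir(dirpath):
            os.rmdir(dirpath)
```

Running `rearrange(".", "arrange.dat", dry_run=True)` first returns the planned `(old, new)` pairs without touching anything. Files whose contents changed locally (rotations and the like) won't match any digest on the server and are left alone, which is why the follow-up rsync is still needed.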

Comments:
From: jwz
2006-08-28 08:20 am (UTC)
You know the file names are unique, so you don't need to hash: I do this kind of thing on the fly with a keyboard macro that generates "mv" commands...
From: iamo
2006-08-28 08:29 am (UTC)
However, using a fingerprint makes it somewhat more flexible. It'll work even when the original structure of the two trees was not the same.
From: ciphergoth
2006-08-28 09:18 am (UTC)
I would say that makes it less flexible, as well as somewhat slower.

I'd rather have something that was the same "shape" as rsync (i.e., runs on both ends at once) and that tries to move files around to make "rsync" work, based on a number of heuristics (file name, size, last modified date, first bytes, last bytes...) applied in order.
From: brad
2006-08-28 04:32 pm (UTC)
Yeah, that's what I wanted too, but I realized it was only a few minutes' work if I did the minimal version first.

Later I can add an rsync-ish interface, maybe just ssh'ing to the remote host, running Perl, and piping the original script into it, so the other side doesn't even need treearrange.
From: brad
2006-08-28 04:32 pm (UTC)
Not necessarily. I was shooting with two cameras. Both Canons.