
His last trick - compressed file transfer without intermediate state:

  tar cz folder/ | ssh server "tar xz"
Can be pulled off with two flags to scp - and you get to see progress as a benefit!

  scp -Cr folder server:dest/


Tar can transfer more filetypes and attributes than scp can (even using the -p option). `scp -p` only transfers mode, mtime, and atime; you lose ownership, extended attributes, symlinks, and hardlinks.
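As a sketch of what that looks like in practice (assuming GNU tar on both ends, and root if you want ownership restored; "server" and /dest are placeholders):

```shell
# -p keeps permissions and ownership; --xattrs and --acls carry
# extended attributes and ACLs; symlinks and hardlinks are preserved
# by tar as a matter of course.
tar cpzf - --xattrs --acls folder/ | ssh server 'tar xpzf - --xattrs --acls -C /dest'
```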

You will also get better compression with tar (or rsync), since it compresses the files directly rather than just the ssh stream (scp's -C is simply passed through to ssh).

I did the tests years ago, but a quick google found someone who tried to test the various combinations: http://www.spikelab.org/transfer-largedata-scp-tarssh-tarnc-...


In particular, scp is mindblowingly slow on lots of small files. I independently rediscovered the tar-pipe trick while sitting there watching scp laboriously copy thousands of 100-byte files so slowly that I could count the files as they went by. That should not be possible, even at modem speeds. Fine for moving one file, OK for directories of very large files, not suitable for general usage where you might encounter a significant number of smaller files.


Absolutely. Connection latency hits you the hardest, since each file is sent serially and requires 2 (or 3 with -p) round trips in the protocol, on top of an ssh tunnel with its own overhead. I can't remember what my tests showed, but I have a nagging feeling that tar over ssh was far faster than rsync for an initial load, since no round trips are required; you do lose some of rsync's benefits, though, like resumability and checksums.


If my first tar attempt fails for some reason, but it made a lot of progress, I switch to rsync. Best of both worlds. This hasn't come up often enough for me to script it.
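If it did need scripting, a minimal sketch might look like this ("server" and /dest are placeholders; --partial keeps partially transferred files so the retry can pick up where it left off):

```shell
# Fast initial copy via the tar pipe; if it fails partway, fall back
# to rsync, which only re-sends what is missing or incomplete.
tar cz folder | ssh server 'tar xz -C /dest' \
  || rsync -az --partial folder server:/dest/
```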


If security isn't a big consideration (read: you control both machines and the network), you can go even faster with netcat.

On the receiving machine, in its destination directory:

    nc -l 6789 | tar xvf -
And on the sending machine, from its source directory:

    tar cvf - . | nc receiving-machine 6789
netcat varies a bit from distro to distro, so you may need to adjust these command lines a bit to get it to work.


Add pv (available in many standard repos these days, from http://www.ivarch.com/programs/pv.shtml if not) into the mix and you get a handy progress bar too.


Unless pv has gotten way more magical since the last time I used it, you also need to tell it how many bytes to expect if you want a progress bar.

If it doesn't know how many bytes there will be, it just gives you a "throbber" (which is better than nothing, though).


It depends on how you call it.

    cat file | pv | nc ...
and

    gzip < file | pv | nc ...
and so forth will result in a throbber as it can't query the pipe for a length.

If you demoggify the first example to:

    pv file | nc ...
you get a progress bar on the sending end without manually specifying a size.

Even without a proper % progress bar, the display is still useful: you can at least see the total sent so far (so if you know the approximate final size, you can judge completeness in your head) and the current rate (which gives you an early indication of a problem, such as an unexpectedly slow network connection, rather than just noticing it is taking too long).
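When the input is a pipe, you can also feed pv the expected size yourself with its -s option. A sketch, assuming GNU du (for the -b byte total) and the same placeholder host and port as above:

```shell
# Total size of the tree in bytes (GNU du; -s summarizes, -b counts bytes).
SIZE=$(du -sb folder/ | cut -f1)

# With -s, pv can draw a real percentage bar even though it is
# reading from a pipe rather than a regular file.
tar cf - folder/ | pv -s "$SIZE" | nc receiving-machine 6789
```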


    rsync -av folder $USER@$SERVER:/destination/path

scp won't copy certain file types correctly. Rsync really is better here, and not just because of that.


You need -z to activate compression. But nonetheless, please always use rsync when copying host to host.


rsync is slow if the data is not already on the destination. tar over ssh is fast, and tar over socketpipe is even faster but not encrypted.

I'm not aware of any attributes that tar doesn't preserve.


How so?

Also, I find that if I'm going to copy the data once, I'm often going to copy it twice, or wish to get a more up-to-date version of it later. Rsync clearly wins in those cases.

Finally, from the description of rsync's compress flag: "Note that this option typically achieves better compression ratios than can be achieved by using a compressing remote shell or a compressing transport because it takes advantage of the implicit information in the matching data blocks that are not explicitly sent over the connection."


Rsync is brilliant and useful but gets very slow when you apply it outside of its sweet-spot.

Remember: Rsync trades CPU and disk i/o (lots of disk i/o) for network bandwidth.

In the pathological case "thousands of tiny files over a fast network" it can easily be orders of magnitude slower than a straight tar.


Seems like -W disables the delta transfer.


Exactly right. In my use cases, it's best to tar over ssh initially. Then, if I ever want to update the copy, rsync.


There's cryptcat if you need encryption. "bar" is also a nice little program if you like to see an ETA (c.f. http://clpbar.sourceforge.net/ )


I don't think your method preserves ownership, file perms, etc.


[deleted]


-p only preserves modification times, access times, and modes.



