When you want to copy files from one machine to another, you might
think about using scp to copy them. You might think about using
rsync. If, however, you’re
trying to copy a large amount of data between two machines, here's a
better, quicker way to do it using netcat.
On the receiving machine, run:
# cd /dest/dir && nc -l -p 12345 | tar -xf -
On the sending machine you can now run:
# cd /src/dir && tar -cf - . | nc -q 0 remote-server 12345
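One caveat: netcat flags vary between implementations, so treat the
exact options as a sketch. Not every nc understands -q 0 (quit once
stdin hits EOF); the OpenBSD variant, for instance, uses -N for the same
job and takes the listening port as a plain argument, so the rough
equivalent there would be:
# cd /dest/dir && nc -l 12345 | tar -xf -
on the receiver, and:
# cd /src/dir && tar -cf - . | nc -N remote-server 12345
on the sender. Check your local nc man page if in doubt.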
You should find that everything works nicely, and a lot quicker. If
bandwidth is more constrained than CPU, you can add “z” or “j” to the
tar options (“tar -czf -” when creating, “tar -xzf -” when extracting)
to compress the data before it goes over the network. If you’re on
gigabit, I wouldn’t bother with the compression.
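On a slower link, the compressed version of the pipeline would look
something like this (assuming gzip is available on both machines):
# cd /dest/dir && nc -l -p 12345 | tar -xzf -
on the receiving machine, and:
# cd /src/dir && tar -czf - . | nc -q 0 remote-server 12345
on the sending machine.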
If the transfer dies, you’ll have to start from the beginning, but at
that point you might find you can get away with using rsync if you’ve
already copied enough. It’s also worth pointing out that the receiving
netcat will die as soon as the connection closes, so you’ll need to
restart it if you want to copy the data again using this method.
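If a transfer does die part-way, a plain rsync over ssh will only
re-send whatever is missing or incomplete, so something along these
lines (using the same paths as above) should finish the job:
# rsync -av /src/dir/ remote-server:/dest/dir/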
It’s worth pointing out that this does not have the security that scp or
rsync-over-ssh has, so make sure you trust the end points and everything
in between if you don’t want anyone else to see the data.
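If you do need the data protected in transit, one middle ground is to
run the same tar pipeline over ssh instead of netcat; you give up some
of the speed advantage, but something like this should work:
# cd /src/dir && tar -cf - . | ssh remote-server 'cd /dest/dir && tar -xf -'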
Why not use scp? Because it’s incredibly slow in comparison. God knows
what scp is doing, but it doesn’t copy data at wire speed. It ain’t the
encryption and decryption, because that’d just use CPU, and when I’ve
done it, it hasn’t been CPU bound. I can only assume that the scp
process has a lot of handshaking and ssh protocol overhead.
Why not rsync? Rsync doesn’t really buy you that much on the first copy.
It’s only the subsequent runs where rsync really shines. However, rsync
requires the source to send a complete file list to the destination
before it starts copying any data. If you’ve got a filesystem with a
large number of files, that’s an awfully large overhead, especially as
the destination host has to hold it in memory.