The question came up today about the relative speeds of scp, tar and rsync (the latter two using ssh as the transport mechanism). While anecdotes and rumors are great for defining security policy (think TSA), I wanted some more concrete numbers, so I ran a test.
I set up a script to copy a directory 5 times from my laptop to a server on the same subnet. I routinely pull 3MB/s from that server (over wifi), so bandwidth wasn't an issue. I used /var/lib/dpkg as my source directory; it weighed in at 57MB and contained 6896 files. Because rsync compares source and destination and only transfers the differences, I made sure to nuke the directory off the server after every run.
Method              scp        rsync+ssh    tar+ssh
Average time        269.75s    33.6s        24.43s
Bandwidth (Mbps)    1.69       13.57        18.66
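For the curious, the bandwidth column is just the 57MB payload converted to bits and divided by the average time. A quick sanity check with bc (assuming bc is installed):

# 57 MB * 8 = 456 megabits, divided by the average seconds per run
echo "scale=2; 57*8/269.75" | bc    # 1.69  (scp)
echo "scale=2; 57*8/33.6"   | bc    # 13.57 (rsync+ssh)
echo "scale=2; 57*8/24.43"  | bc    # 18.66 (tar+ssh)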
The results are what I expected, at least as far as scp is concerned. It does not do well with large numbers of small files: it copied each file over completely before it started on the next one. Tar, of course, streamed the whole tree down the pipe as a single archive. Rsync built its file list first, compared it against the server, and then shipped everything in one go. Apparently there were some significant I/O savings to be had that way.
One other important item of note is that scp did not handle symlinks the way tar and rsync did. It dereferenced each symlink and copied the file it pointed to rather than copying the link itself. That was a problem because I had picked some self-referential directories before I settled on /var/lib/dpkg.
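If you want to see the symlink difference for yourself, a minimal sketch along these lines will do it (/tmp/linktest and the via-* paths are placeholders I made up, not part of the original test):

# build a tiny test tree containing one symlink
mkdir -p /tmp/linktest && echo data > /tmp/linktest/file && ln -s file /tmp/linktest/link
# copy it both ways and ask the server what arrived
scp -qrp /tmp/linktest [server]:/tmp/via-scp
rsync -ae ssh /tmp/linktest/ [server]:/tmp/via-rsync/
ssh [server] ls -l /tmp/via-scp/link /tmp/via-rsync/link
# scp delivers a regular file; rsync (like tar) delivers the symlink itself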
For your reference, here are the commands I ran to test:
for i in 1 2 3 4 5; do time scp -qrp /var/lib/dpkg [server]:/tmp; ssh [server] rm -fr /tmp/dpkg; done
for i in 1 2 3 4 5; do time rsync -ae ssh /var/lib/dpkg [server]:/tmp; ssh [server] rm -fr /tmp/dpkg; done
for i in 1 2 3 4 5; do time tar -cf - /var/lib/dpkg | ssh [server] tar -C /tmp -xf - ; ssh [server] rm -fr /tmp/dpkg; done
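And if you'd rather have the averages computed for you than eyeball five runs of time, a rough wrapper like this works (bash with GNU date assumed; copytest is just a name I made up, not part of the original script):

copytest() {
  # $1 is a label, the rest is the copy command to time
  label=$1; shift
  total=0
  for i in 1 2 3 4 5; do
    start=$(date +%s.%N)
    "$@" > /dev/null
    end=$(date +%s.%N)
    total=$(echo "$total + $end - $start" | bc)
    ssh [server] rm -fr /tmp/dpkg
  done
  echo "$label average: $(echo "scale=2; $total/5" | bc)s"
}
copytest scp   scp -qrp /var/lib/dpkg [server]:/tmp
copytest rsync rsync -ae ssh /var/lib/dpkg [server]:/tmp
copytest tar   bash -c 'tar -cf - /var/lib/dpkg | ssh [server] tar -C /tmp -xf -'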