Rsync is a file transfer program for UNIX- or POSIX-based operating systems. It uses the 'rsync algorithm', which provides a very fast method for bringing remote files into sync. It does this by sending just the differences in the files across the link, without requiring that both sets of files are present at one of the ends of the link beforehand.

Before complete CD- or DVD-sized ISO images were made available, rsync was used with the Debian GNU/Linux pseudo-image kit to get the ISO image. The kit worked by downloading the packages from the Debian FTP servers, assembling them into a pseudo-image, and then using rsync to turn that into a burnable ISO image by fetching only the parts that differed. Rsync tends to work better under UNIX/Linux than under MS-DOS, but oh well.

rsync was written by the now legendary Andrew Tridgell, best known as the creator of Samba. Andrew wrote rsync for his PhD at the Australian National University.

In his PhD thesis, titled Efficient Algorithms for Sorting and Synchronization, Andrew discusses some very advanced methods of sorting very large amounts of data. His method: run a number of fast (but inaccurate) sorts over the same data, and then run an (also fast) routine to clean up their mistakes. This approach yields better results than running a single sort with error handling built into it.

The second part of his thesis discusses rsync. rsync was developed as a way to remotely sync source code trees over a low-bandwidth, high-latency link. In designing rsync, Dr. Tridgell wanted an algorithm that would transfer only the changed files, and only the changed parts of each file.

On the first pass, the rsync client and server work out which files differ. By default rsync compares file size and modification time; with -c it compares a whole-file checksum instead (Andrew calls it a signature). On the second pass, rsync dives into each file that didn't match during the first pass and generates a signature for each block of data in the file (the block length can be set with -B; otherwise rsync picks one itself). The side holding the new version uses those block signatures to find the parts the other side already has, so only the blocks that don't match are sent across the link. The receiving side rebuilds the file from its old blocks plus the new data, then checks a whole-file checksum again. All going well, the file can be considered synced and rsync moves on to the next file.
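The block-signature idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not rsync's actual code: the function names (block_signatures, make_delta, apply_delta) are made up for this example, zlib.adler32 and MD5 stand in for rsync's own weak and strong checksums, and the weak checksum is recomputed at every offset, whereas rsync rolls it forward one byte at a time to keep the search cheap.

```python
import hashlib
import zlib


def block_signatures(old: bytes, block_size: int) -> dict:
    """Map weak checksum -> [(strong digest, block index)] for each block of the old file."""
    sigs = {}
    for index, start in enumerate(range(0, len(old), block_size)):
        block = old[start:start + block_size]
        entry = (hashlib.md5(block).digest(), index)
        sigs.setdefault(zlib.adler32(block), []).append(entry)
    return sigs


def make_delta(new: bytes, sigs: dict, block_size: int) -> list:
    """Describe the new file as ("copy", block_index) and ("data", literal_bytes) steps."""
    delta, literal, pos = [], bytearray(), 0
    while pos < len(new):
        window = new[pos:pos + block_size]
        match = None
        # Cheap weak checksum first; confirm with the strong digest on a hit.
        for strong, index in sigs.get(zlib.adler32(window), []):
            if strong == hashlib.md5(window).digest():
                match = index
                break
        if match is None:
            # No known block starts here: keep one literal byte and slide on.
            literal.append(new[pos])
            pos += 1
        else:
            if literal:
                delta.append(("data", bytes(literal)))
                literal = bytearray()
            delta.append(("copy", match))
            pos += len(window)
    if literal:
        delta.append(("data", bytes(literal)))
    return delta


def apply_delta(old: bytes, delta: list, block_size: int) -> bytes:
    """Rebuild the new file from the old file's blocks plus the literal data."""
    out = bytearray()
    for kind, value in delta:
        if kind == "copy":
            out += old[value * block_size:(value + 1) * block_size]
        else:
            out += value
    return bytes(out)


if __name__ == "__main__":
    old = b"the quick brown fox jumps over the lazy dog"
    new = b"the quick red fox jumps over the lazy dog!"
    delta = make_delta(new, block_signatures(old, 4), 4)
    assert apply_delta(old, delta, 4) == new
    print(delta)
```

Only the "data" steps carry file contents; the "copy" steps are just block numbers, which is why so little needs to cross the link when most of a file is unchanged. The block size of 4 is chosen purely to make the example readable.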

rsync itself is very well documented (it was, after all, written for someone's PhD). The simplest use of rsync is:

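rsync -au rsync://rsync.example.org/Data /mnt/Data

which syncs the local /mnt/Data directory with the Data share on the remote rsync server. The -a flag recurses into directories and preserves times and permissions (without it or -r, rsync skips directories); -u skips any file that is newer on the local side.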

Andrew Tridgell's home page is http://samba.org/~tridge, which has links to his PhD thesis, available as a PDF. rsync's home page is http://rsync.samba.org.
