Synchronising a large directory tree with unison

Unison can synchronise your computer with a remote server. Large directory trees can be synchronised without problems if they are built up gradually. However, if a large directory tree already exists when unison runs for the first time, the program might crash due to a memory overflow.

Here is my solution to this problem: a script that starts unison individually for each directory of the large directory tree. This has to be done only once. After that, the unison cache has been created and unison can be used as usual.

The method involves two steps. First, unison is configured to run without asking for a password, which is convenient anyway. Then, unison is run once for each directory of the tree.
The method I describe was developed to synchronise a local Windows XP client with a remote Linux computer with unison 2.9.1.

1. Use PuTTY to avoid entering a password

If you are using unison locally, you don't need a password anyway. If the remote server is contacted via SSH, a password is usually required unless SSH certificates (public/private key pairs) are used. The configuration of unison to use the PuTTY SSH client was described at spurtle.net. The PuTTY package contains the program PuTTYgen, which creates an SSH certificate, and the program Pageant, which stores certificates. Pageant is particularly useful for certificates that are protected by a passphrase, which is of course recommended.

Step 1.1

Create and save a connection to the remote server with PuTTY. Use a certificate so that the connection is established automatically, without entering a password. You will need to configure the remote server to accept the certificate. On Linux, add the public certificate to the file .ssh/authorized_keys2 in your home directory on the server.
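On the Linux side, this amounts to appending the public key to that file and keeping the file writable only by you. Here is a minimal sketch, to be run on the remote machine. The function name install_pubkey and the key file name in the usage line are hypothetical, and the sketch assumes the key is in the one-line OpenSSH format (the text PuTTYgen shows for pasting into an authorized_keys file). The target file name follows the text above; on other setups it may be .ssh/authorized_keys instead.

```shell
# Hypothetical helper: append a one-line OpenSSH public key to the
# authorized-keys file, skipping a key that is already listed.
install_pubkey() {
    local keyfile="$1" sshdir="${2:-$HOME/.ssh}" target
    target="$sshdir/authorized_keys2"   # file name as used in this article
    # sshd usually ignores key files that other users could write to
    mkdir -p "$sshdir" && chmod 700 "$sshdir" || return 1
    touch "$target" && chmod 600 "$target" || return 1
    # -F fixed string, -x whole line, -q quiet, -f read the pattern from file
    if ! grep -qxF -f "$keyfile" "$target"; then
        cat "$keyfile" >> "$target"
    fi
}
```

Usage, assuming the key was copied to the server as mykey.pub: install_pubkey mykey.pub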

Step 1.2

Create a file, launcher.bat, containing one line:
@"C:\Program Files\PuTTY\plink.exe" -load your-putty-session-name unison -server
Adjust the path to plink.exe and replace your-putty-session-name with the name of the session you created in PuTTY. Put launcher.bat into C:\WINDOWS (for example).

Step 1.3

Edit your unison default profile and set the entry for sshcmd to
sshcmd = C:\WINDOWS\launcher.bat

Adjust the path settings in the default profile; choose a small directory that does not lead to a memory overflow. Run unison and make sure it runs without asking for a password.

2. Run unison for each directory

Windows XP

Now we are ready to go. Open a Command Prompt window. Set the variables LROOT and RROOT to the local and remote root directory, as in the unison profile, e.g.
SET LROOT=C:\Documents and Settings\MyName\My Documents
SET RROOT=/home/myname
Set SYNCPATH to the subdirectory you want to synchronise, e.g.
SET SYNCPATH=.

Save the remote directory tree to a local file (all commands on one line):
"C:\Program Files\PuTTY\plink.exe" -load your-putty-session-name cd %RROOT% ; find %SYNCPATH% -type d > dirs.txt

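Create the directories locally (the FOR commands here are meant to be typed at the prompt; inside a batch file, write %%d instead of %d):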
FOR /F "delims=" %d IN (dirs.txt) DO IF NOT EXIST "%LROOT%\%d" md "%LROOT%\%d"

Sort dirs.txt in reverse order, so that subdirectories are listed before their parent directories:
sort /r dirs.txt /o dirs.txt

Run unison once for each directory in the list:
FOR /F "delims=" %d IN (dirs.txt) DO unison default -batch -ui text -path "%d"

After unison has run through the directory tree, it has created a cache and should no longer run into the memory problem. Edit your profile so that the complete directory tree is synchronised, as originally desired.

Linux

If both machines are running Linux, the following commands can be used in a bash shell:

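lroot=/home/myname
rroot=/home/myname
syncpath=.
cd "$lroot"
# save the remote directory tree to a local file
ssh myname@myserver.de "cd '$rroot' && find '$syncpath' -type d" > dirs.txt
# create the directories locally
while IFS= read -r d; do mkdir -p "$d"; done < dirs.txt
# reverse order: subdirectories come before their parents
sort -r dirs.txt -o dirs.txt
# run unison once per directory; </dev/null keeps it from reading dirs.txt
while IFS= read -r d; do unison default -batch -ui text -path "$d" < /dev/null; done < dirs.txt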
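
With a long directory list it is easy to miss a directory for which unison failed. The following variant of the loop above is only a sketch: the function name sync_each_dir, the variable UNISON_CMD and the failure log are my own assumptions, not part of unison. It processes the list in reverse order and writes every path for which the command exits with a non-zero status to a second file, so that those directories can be retried later.

```shell
# Hypothetical wrapper around the per-directory loop.
# UNISON_CMD may be overridden, e.g. with "echo" for a dry run.
UNISON_CMD="${UNISON_CMD:-unison}"

sync_each_dir() {
    local list="$1" failed="$2" d
    : > "$failed"                       # start with an empty failure log
    # Byte-wise reverse sort: "./a/b" is listed before "./a"
    LC_ALL=C sort -r "$list" | while IFS= read -r d; do
        # </dev/null keeps the command from consuming the directory list
        if ! "$UNISON_CMD" default -batch -ui text -path "$d" < /dev/null; then
            printf '%s\n' "$d" >> "$failed"
        fi
    done
}
```

Usage: sync_each_dir dirs.txt failed.txt runs through the whole list. Setting UNISON_CMD=echo first only prints the arguments that would be passed, without contacting the server.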

Created on 10 October 2004 by Konrad Büssow.