After my little experiment with the Linux shell and rsync, I felt it was necessary to build on the idea and write a decent script with some safety measures built in. As I mentioned, I'm extremely paranoid when it comes to data, especially after having lost too much of it in the past through stupidity or laziness.
I started with the idea of having the previous script simply check whether the current computer is the primary or the secondary location. Needless to say, this matters a lot for data safety.
So I added some lines that check whether a certain file exists. If it doesn't, the script asks the user which location this is and writes a hidden token file to the user's home directory.
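The location check boils down to looking for one of two hidden token files. Below is a minimal sketch of that lookup. It assumes the token names used by staffetta.sh below (.staffetta-token.pri and .staffetta-token.sec) and reads them from $HOME instead of /home/$USER. detect_location and its return codes are hypothetical and not part of the script:

```shell
#!/bin/bash
# Hypothetical helper (not in staffetta.sh): report which location this
# computer is configured as, based on the hidden token files in $HOME.
detect_location() {
    local pri="$HOME/.staffetta-token.pri"
    local sec="$HOME/.staffetta-token.sec"
    if [ -f "$pri" ] && [ -f "$sec" ]; then
        # Both tokens present: refuse to guess, this has to be fixed by hand
        echo "ERROR: both primary and secondary tokens found" >&2
        return 2
    elif [ -f "$pri" ]; then
        echo "primary"
    elif [ -f "$sec" ]; then
        echo "secondary"
    else
        # No token yet: this is where the user would be asked which location this is
        echo "none"
        return 1
    fi
}
```

Calling it as location=$(detect_location) yields primary or secondary. A non-zero status (1 for no token, 2 for two tokens) is the point to ask the user or stop.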
Additional safety routines
The token concept spawned a lot of other safety checks. These would be easier to show in a few diagrams, but for now I'll settle for a list :-)
Check | How it works | Files on the portable drive
Sync state | Controlled by two files on the portable drive, and only one of them should exist. When the lock file is found, the sync aborts. | .staffetta-state.ready, .staffetta-state.lock
Last place of sync | Controlled by two files on the portable drive, and only one of them should exist. After determining the token, the script looks up the last place of sync. If the last sync was already made at the current location, the primary side asks whether to update again, and the secondary side stops because there is nothing to do. | .staffetta-last.pri, .staffetta-last.sec
Version check | Two datestamped files that are compared before every sync and updated after a sync finishes. If they do not match, the sync aborts and the base must first be rebuilt at the secondary location. | .staffetta-base_date.{datestamp}, .staffetta-base_build_date.{datestamp}
Empty files | The empty files are counted at the primary side before every sync and when building a sync base at the secondary location. Their number may not exceed the MAX_EMPTY_FILES value set in staffetta.sh. After each primary sync the count is stored in a file name. After a sync at the secondary location, the count there is compared with it, and a warning is printed if they differ. | .staffetta-empty_files.{datestamp}.{amount}
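Only one of the two state files should ever exist, so switching state comes down to renaming one into the other. Below is a minimal sketch of that toggle. It assumes the drive's mount point is held in a hypothetical DRIVE variable. set_state_lock and set_state_ready are illustrative names; the script itself uses the sync_state_lock and sync_state_ready command strings:

```shell
#!/bin/bash
# Assumption: DRIVE is the mount point of the portable drive.
DRIVE="${DRIVE:-/run/media/$USER/TRANSPORT}"

# Hypothetical helper: go from "ready" to "lock", refusing if a lock is left over.
set_state_lock() {
    if [ -f "$DRIVE/.staffetta-state.lock" ]; then
        echo "ERROR: last rsync job did not finish" >&2
        return 1
    fi
    if [ -f "$DRIVE/.staffetta-state.ready" ]; then
        mv "$DRIVE/.staffetta-state.ready" "$DRIVE/.staffetta-state.lock"
    else
        # First run on this drive: no state file yet, create the lock directly
        touch "$DRIVE/.staffetta-state.lock"
    fi
}

# Hypothetical helper: go back from "lock" to "ready" once the job has finished.
set_state_ready() {
    mv "$DRIVE/.staffetta-state.lock" "$DRIVE/.staffetta-state.ready"
}
```

Refusing to lock while a lock file is still present turns an interrupted run into a hard stop. Without that check, a second sync could run on top of a half-finished one.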
Staffetta
You might already have noticed from the file names above that my script has a name. When I thought of all the data I'm constantly moving back and forth with my portable drive, it reminded me of a relay race. In Dutch this discipline is called 'estafette', but I settled on the Italian word 'staffetta'. I chose the Italian version partly for the phonetics, but more so because the origins of the sport play a big part in Italian and Greek history.
In the early days of Italy (when it was not even a country yet and the Greeks had many settlements there), life was concentrated at or near the sea. There were hardly any roads, so travelling inland to other cities was only possible on foot. When an urgent message had to reach another city, the fastest way was often to send specially trained long-distance runners. Sometimes the distance was too great and several runners had to be used: a 'staffetta'.
The script
Well, I have to admit I was quite impressed by how much the Linux shell can do. I certainly didn't plan to write the program in shell, but it was just too convenient. No extra dependencies are needed, only rsync and the shell. Neat.
That said, I'm not a hardcore programmer, merely a hacker if you will. Shell scripting is well documented on the net and tons of examples are available, which made the shell a clear winner. Drawing on previous experience with the shell and on the work of others, I came up with the following script:
#!/bin/bash
# staffetta.sh
# Version: 1.0
# Author: Jochum Döring, jochum (dot) doring (at) gmail (dot) com
# License: CC BY-SA
# About: This script will mirror data to one or several locations
# Usage: Setup configuration parameters below and run this script
# URL: https://sites.google.com/site/joochdoesnotcompute/software/syncing-servers-offline
## CONFIGURATION ##
BASEDIR_PRI="mnt/DATA" # The base directory on the primary side where the folder to sync is located. Don't add slashes at the start or end!
BASEDIR_SEC="mnt/DATA" # The base directory on the secondary side where the folder to sync is located. Don't add slashes at the start or end!
SOURCEDIR="Muziek" # The name of the directory you want to synchronize
DRIVENAME="TRANSPORT" # The name of your portable drive, you can check this after mounting your disk with: ls /run/media/$USER
SYNCDIR="SYNC" # The name of the directory that will be used on your portable drive for synchronizing
MAX_EMPTY_FILES="10" # The maximum number of empty files allowed in the source directory, set to 0 to disable the check
RSYNC_OPTIONS="-r -t -v -q -l --stats --delete --ignore-existing --modify-window=1"
## END OF CONFIGURATION ##
date=$(date +"%y%m%d%H%M") # datestamp as yymmddHHMM
sync_state_lock='mv /run/media/'$USER'/'$DRIVENAME'/.staffetta-state.ready /run/media/'$USER'/'$DRIVENAME'/.staffetta-state.lock'
sync_state_ready='mv /run/media/'$USER'/'$DRIVENAME'/.staffetta-state.lock /run/media/'$USER'/'$DRIVENAME'/.staffetta-state.ready'
sync_last_pri='mv /run/media/'$USER'/'$DRIVENAME'/.staffetta-last.sec /run/media/'$USER'/'$DRIVENAME'/.staffetta-last.pri'
sync_last_sec='mv /run/media/'$USER'/'$DRIVENAME'/.staffetta-last.pri /run/media/'$USER'/'$DRIVENAME'/.staffetta-last.sec'
sync_base_build_date=$(find /run/media/$USER/$DRIVENAME/.staffetta-base_build_date.* 2>/dev/null | cut -d "." -f 3)
sync_base_date=$(find /run/media/$USER/$DRIVENAME/.staffetta-base_date.* 2>/dev/null | cut -d "." -f 3)
empty_files_pri=$(find /run/media/$USER/$DRIVENAME/.staffetta-empty_files.*.* 2>/dev/null | cut -d "." -f 4)
find_empty_files_pri=$(find /$BASEDIR_PRI/$SOURCEDIR/ -type f -name "*.*" -empty | wc -l)
find_empty_files_sec=$(find /$BASEDIR_SEC/$SOURCEDIR/ -type f -name "*.*" -empty | wc -l)
find_token=$(find /home/$USER/.staffetta-token.* 2>/dev/null -maxdepth 1 | wc -l)
PS3='Select location type: '
options=("Primary" "Secondary" "Quit")
function yes_or_no {
while true; do
read -p "$* [y/n]: " yn
case $yn in
[Yy]*) return 0 ;;
[Nn]*) echo "Aborted" ; exit 1 ;;
esac
done
}
function yes_or_no_1 {
while true; do
read -p "$* [y/n]: " yn
case $yn in
[Yy]*) return 1 ;;
[Nn]*) echo "Aborted" ; exit 1 ;;
esac
done
}
function choose_token {
select opt in "${options[@]}"
do
case $opt in
"Primary")
if [ -f /run/media/$USER/$DRIVENAME/.staffetta-pri-token-generated ]; then
echo "ERROR: Primary token already generated, can't have two!" ; return 1
else
touch /home/$USER/.staffetta-token.pri
touch /run/media/$USER/$DRIVENAME/.staffetta-pri-token-generated
echo "Created /home/$USER/.staffetta-token.pri"
fi
# No restart here: the caller (choose_token && ./staffetta.sh) already restarts the script
break
;;
"Secondary")
echo "Created /home/$USER/.staffetta-token.sec"
touch /home/$USER/.staffetta-token.sec
sleep 2
break
;;
"Quit")
echo "Aborting." ; exit 1
break
;;
*) echo invalid option;;
esac
done
}
function build_sync_base {
# Only a secondary location may (re)build the sync base
if ! test -f "/home/$USER/.staffetta-token.sec"; then
echo "ERROR: No secondary token found, aborting build." ; exit 1
fi
if [ -d "/run/media/$USER/$DRIVENAME/$SYNCDIR/" ]; then
# Ask before anything in the SYNC directory is destroyed
yes_or_no "$msg SYNC directory found, about to rebuild base. All data in SYNC directory will be destroyed! Continue?"
echo "Cleaning sync directory"
rm -rf "/run/media/$USER/$DRIVENAME/${SYNCDIR:?}/"*
echo "Done"
else
yes_or_no "$msg SYNC directory does not exist! Create it?"
mkdir "/run/media/$USER/$DRIVENAME/$SYNCDIR"
echo "Done"
fi
echo "Now building sync base..."
# Abort when there are too many empty files (MAX_EMPTY_FILES=0 disables the check)
[[ $MAX_EMPTY_FILES -gt 0 && $find_empty_files_pri -gt $MAX_EMPTY_FILES ]] && echo "ERROR: Too many empty files, aborting" && exit 1 || true
cp -a --attributes-only /$BASEDIR_SEC/$SOURCEDIR /run/media/$USER/$DRIVENAME/$SYNCDIR/
echo "Updating build date..."
rm /run/media/$USER/$DRIVENAME/.staffetta-base_build_date.* 2>/dev/null
touch /run/media/$USER/$DRIVENAME/.staffetta-base_build_date.$date
echo "Finished!"
}
if grep -qs '/run/media/'$USER'/'$DRIVENAME'' /proc/mounts; then
if [[ find_token -gt 1 ]]; then
yes_or_no "$msg ERROR: Multiple tokens found! Remove and setup token?" &&
echo "Removing all tokens from /home/$USER/"
rm /home/$USER/.staffetta-token.pri
rm /home/$USER/.staffetta-token.sec
./staffetta.sh
else
[[ $1 = "-r" ]] && build_sync_base ||
if test -f '/home/'$USER'/'.staffetta-token.pri''; then
yes_or_no "$msg Token found, configured as primary location. Ready to update sync?" &&
[ ! -f /run/media/$USER/$DRIVENAME/.staffetta-base_build_date.* ] && echo "ERROR: Sync base build date not found! Please sync secondary location first." && exit 1 || true
[ -f /run/media/$USER/$DRIVENAME/.staffetta-state.lock ] && echo "ERROR: Last rsync job did not finish! Please sync secondary location first." && exit 1 ||
[ -f /run/media/$USER/$DRIVENAME/.staffetta-last.pri ] && yes_or_no_1 "$msg Last change was already made by the primary location, update again?" ||
echo "Setting synchronizing state."
[ -f /run/media/$USER/$DRIVENAME/.staffetta-state.ready ] && $sync_state_lock || touch /run/media/$USER/$DRIVENAME/.staffetta-state.lock
[[ $MAX_EMPTY_FILES -gt 0 && $find_empty_files_pri -gt $MAX_EMPTY_FILES ]] && echo "ERROR: Too many empty files, aborting" && exit 1 ||
echo "Now running rsync, please wait..."
rsync $RSYNC_OPTIONS --log-file=/run/media/$USER/$DRIVENAME/.staffetta-sync-pri_$sync_base_build_date.log /$BASEDIR_PRI/$SOURCEDIR /run/media/$USER/$DRIVENAME/$SYNCDIR
echo "Updating sync information..."
rm /run/media/$USER/$DRIVENAME/.staffetta-base_date.* 2>/dev/null
touch /run/media/$USER/$DRIVENAME/.staffetta-base_date.$sync_base_build_date
rm /run/media/$USER/$DRIVENAME/.staffetta-empty_files.* 2>/dev/null
touch /run/media/$USER/$DRIVENAME/.staffetta-empty_files.$sync_base_build_date.$find_empty_files_pri
echo "Clearing synchronizing state..."
$sync_state_ready
[ -f /run/media/$USER/$DRIVENAME/.staffetta-last.sec ] && $sync_last_pri || touch /run/media/$USER/$DRIVENAME/.staffetta-last.pri
echo "Finished! Ready to synchronize data to secondary location."
else
if test -f '/home/'$USER'/'.staffetta-token.sec''; then
yes_or_no "$msg Token found, configured as secondary location. Ready to start sync?" &&
[ ! -f /run/media/$USER/$DRIVENAME/.staffetta-base_build_date.* ] && yes_or_no "$msg ERROR: Sync build date not found! Rebuild sync base?" && build_sync_base || true
[ ! "$sync_base_date" == "$sync_base_build_date" ] && yes_or_no "$msg ERROR: Sync dates differ! Rebuild sync base?" && build_sync_base || true
[ -f /run/media/$USER/$DRIVENAME/.staffetta-state.lock ] && yes_or_no "$msg ERROR: Last rsync job did not finish! Rebuild sync base?" && build_sync_base || true
[ -f /run/media/$USER/$DRIVENAME/.staffetta-last.sec ] && echo "Last change was made by a secondary location, nothing to do." && exit 1 || false
echo "Setting synchronizing state."
[ -f /run/media/$USER/$DRIVENAME/.staffetta-state.ready ] && $sync_state_lock || touch /run/media/$USER/$DRIVENAME/.staffetta-state.lock
echo "Now running rsync, please wait..."
rsync $RSYNC_OPTIONS --log-file=/run/media/$USER/$DRIVENAME/.staffetta-sync-sec_$sync_base_build_date.log /run/media/$USER/$DRIVENAME/$SYNCDIR/$SOURCEDIR /$BASEDIR_SEC/
echo "Updating build information..."
rm /run/media/$USER/$DRIVENAME/.staffetta-base_build_date.* 2>/dev/null
touch /run/media/$USER/$DRIVENAME/.staffetta-base_build_date.$date
echo "Clearing synchronizing state..."
[ -f /run/media/$USER/$DRIVENAME/.staffetta-last.pri ] && $sync_last_sec || touch /run/media/$USER/$DRIVENAME/.staffetta-last.sec
$sync_state_ready
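find_empty_files_sec=$(find /$BASEDIR_SEC/$SOURCEDIR/ -type f -name "*.*" -empty | wc -l) # recount now that rsync has run
[ ! "$find_empty_files_sec" == "$empty_files_pri" ] && echo "WARNING: Number of empty files differs from primary data!" && echo "Finished with errors." && exit 1 || true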
echo "Finished! Ready to synchronize primary data."
else
yes_or_no "$msg No token found, would you like to create one now?" &&
choose_token &&
./staffetta.sh
fi
fi
fi
else
echo "Sync drive not found, exiting."
fi
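The sync base built with the -r option hinges on cp --attributes-only. This recreates the directory tree on the portable drive with every file present but empty. As far as I can tell from the rsync options used, the later --ignore-existing run then skips every file that already has a placeholder and only carries new files across. Below is a minimal sketch of that step on throwaway directories. It assumes GNU coreutils (--attributes-only is a GNU cp option), and the directory and file names are made up:

```shell
#!/bin/bash
# Sketch of the sync-base step from build_sync_base, on throwaway directories.
work=$(mktemp -d)
mkdir -p "$work/DATA/Muziek/album" "$work/SYNC"
printf 'not really an mp3\n' > "$work/DATA/Muziek/album/track01.mp3"

# Copy the tree with its attributes (mode, timestamps, ...) but without file data
cp -a --attributes-only "$work/DATA/Muziek" "$work/SYNC/"

# The placeholder exists in the "drive" copy but is zero bytes long
ls -l "$work/SYNC/Muziek/album/"
```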
Some coding annoyances
If you are cooking up your own shell script: one of the hardest things for me to figure out was the difference between these two lines.
This fails:
[ -f /run/media/$USER/$DRIVENAME/.some_file.* ] && || echo "ERROR: File not found!" && do_something
and this works:
[ ! -f /run/media/$USER/$DRIVENAME/.some_file.* ] && echo "ERROR: File not found!" && do_something || true
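The first line is not valid shell: && must be followed by a command, so && || is a syntax error. The shell has no way to "skip" the success case and act only further along the line. It took me a lot of time to figure that out and find a solution: putting a ! symbol right after the [ symbol. This reverses the test, so the command you want to run can sit directly after &&. The trailing || true makes sure the line as a whole still succeeds when the file does exist. One caveat: a wildcard inside [ -f ... ] only behaves when it matches at most one file. With more matches, [ stops with an error.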
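The working line from above can be checked with a small throwaway test. The sketch below wraps it in a hypothetical report_missing function and puts an equivalent if block next to it for comparison. The file names are made up:

```shell
#!/bin/bash
# Negated test: the message is printed only when the file is missing,
# and "|| true" keeps the exit status at 0 when the file does exist.
report_missing() {
    [ ! -f "$1" ] && echo "ERROR: File not found!" || true
}

# The same logic as an if block, which avoids the && / || puzzle entirely
report_missing_if() {
    if [ ! -f "$1" ]; then
        echo "ERROR: File not found!"
    fi
}
```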
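Also, it can be unclear when to use
; do_something
or
&& do_something
Short answer: in an if statement you use ; (as in if [ -f file ]; then), because the if itself decides what runs next. In a chain of tests like the one above you use the double && symbol. The ; always runs the next command, whatever the test returned, while && only runs it when the previous command succeeded.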
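The difference is easy to see side by side. In this sketch, guard_semicolon and guard_and are hypothetical names, and echo "stopped" stands in for exit 1 so the example can run. The ; version reaches the stop even when the test fails:

```shell
#!/bin/bash
# Broken guard: ';' ends the && list, so the "stop" runs whatever the test said
guard_semicolon() { [ "$1" -gt 10 ] && echo "too many" ; echo "stopped"; }

# Working guard: the "stop" only runs when the test succeeded
guard_and() { [ "$1" -gt 10 ] && echo "too many" && echo "stopped"; }
```

With 3 as the argument, guard_semicolon still prints "stopped" while guard_and prints nothing. With 42, both report the problem.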
And last but not least: suppose you run a function inside an if statement, and that function uses something like return 0 at some point (for example after a yes or no question, as yes_or_no does). That will work fine. But if you do this in a chain of tests that is already inside an if statement, you will find that it does not behave as you would expect. Instead, use a variant that returns 1, like yes_or_no_1 in the script.
I can't tell you exactly why this is. It was merely a hunch I had while debugging my script, and it worked :-)
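A likely explanation lies in how && and || chain together. In test && question || next, the part after || only runs when question returns non-zero. The sketch below replaces the interactive prompts with hypothetical stubs that always answer yes. One returns 0 like yes_or_no, the other returns 1 like yes_or_no_1:

```shell
#!/bin/bash
# Stand-ins for the script's yes/no prompts, both always answering "yes"
answer_yes_0() { return 0; }   # behaves like yes_or_no
answer_yes_1() { return 1; }   # behaves like yes_or_no_1

# The command after || only runs when everything before it ended non-zero
chain_0() { true && answer_yes_0 || echo "next step"; }
chain_1() { true && answer_yes_1 || echo "next step"; }
```

With the 0-returning stub the next step is silently skipped, which matches the misbehaviour described above. The flip side is that a chain which should continue with && after a yes (such as ... && yes_or_no "Rebuild sync base?" && build_sync_base) needs the 0-returning variant.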
Conclusion
While this script still leaves much to be desired, it does the job well. It has enough safety measures built in to start thinking about fully automating the process with the cron daemon. Automation is what we (or at least I) really want: it makes daily life easier, without having to worry about losing data.
There is one more safety measure I might add in the next stage, though: mounting the primary side in read-only mode before syncing. That would make data loss on the primary side nearly impossible, completing the paranoia wish list and ensuring a carefree sleep :-)