June 25, 2017 8:38 pm
Published by Manuel Dewald
Ever lost data stored on a USB drive because the drive stopped working and you had no backup? How often have you promised yourself to set up a backup system so this would not happen again, only to forget about it a few days later? You are not alone; so did I. That changed a few months ago, when I decided to store my data on my own NAS, run by a Raspberry Pi 3 and ownCloud, to give me the feeling of being in control over where my data is physically stored: on a USB drive below my desk, without a popup reminding me that my Dropbox is running out of space.
As hard drives tend to fail, I decided to put a backup system in place so the data stays safe as long as only one of the two hard drives fails at a time. This turned out to be quite easy, so I want to share the simple bash scripts I use to create incremental backups of my data.
First, here is the backup strategy I implemented:
– For the last 5 years, I want to keep the first backup of each year, showing the data as it was at the beginning of that year
– For the last year, I want to keep the first backup of each month
– For the last month, I want to keep the backup of every Monday
– For the last week (7 days), I want to keep the backup of every day
That sounds like a lot of data to store. It is not, though, if you use the rsync argument --link-dest <folder>. With it, rsync does not copy files that are unchanged compared to <folder>; instead it creates hard links in the target folder pointing to the files already stored there. So every new backup needs only a bit more space than the data that actually changed since the previous one (which is exactly the data we want to back up), plus some overhead for folders and the hard links.
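To make the hard-link idea more tangible, here is a minimal sketch that does not involve rsync at all. It runs in a throwaway directory, and the folder and file names are made up for the demonstration. It shows that a hard link is just a second name for the same data, and that removing the older name does not remove the data, which is why old backups can later be deleted without damaging newer ones.

```shell
#!/bin/bash
# Throwaway demonstration; all paths and names below are made up.
demo=$(mktemp -d)
mkdir "$demo/monday" "$demo/tuesday"
echo "holiday photos" > "$demo/monday/photo.txt"

# Create a second name for the same data instead of copying it
ln "$demo/monday/photo.txt" "$demo/tuesday/photo.txt"

# -ef is true when both names refer to the same file (same device and inode)
if [ "$demo/monday/photo.txt" -ef "$demo/tuesday/photo.txt" ]; then
    same_file=yes
else
    same_file=no
fi
echo "same file: $same_file"

# Deleting the older name leaves the data reachable through the newer one
rm -r "$demo/monday"
remaining=$(cat "$demo/tuesday/photo.txt")
echo "still readable: $remaining"

# Clean up the demonstration directory
rm -r "$demo"
```

The data itself is only freed once the last name pointing to it is gone.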
Here is the command we can use to create such incremental backups with rsync:
    rsync -a --delete --link-dest ${LASTDAYPATH} ${DATADIR} ${TODAYPATH}
This command backs up ${DATADIR} to ${TODAYPATH}. Files that are unchanged compared to the previous backup in ${LASTDAYPATH} are hard-linked to it instead of being copied again.
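The effect can be checked on a small scale. The following sketch assumes rsync is installed and uses a temporary directory with made-up file names (a.txt, b.txt, day1, day2); it is only an illustration, not part of my backup scripts. It takes two backups in a row and then tests which files of the second backup share their data with the first.

```shell
#!/bin/bash
# Throwaway demonstration; directory and file names are made up.
demo=$(mktemp -d)
mkdir -p "$demo/data" "$demo/backup"
echo "unchanged" > "$demo/data/a.txt"
echo "version 1" > "$demo/data/b.txt"

# First backup: a full copy, there is nothing to link against yet
rsync -a --delete "$demo/data/" "$demo/backup/day1"

# Change one file, then back up again, linking against day1.
# An absolute path is used for --link-dest because a relative one
# is interpreted relative to the destination directory.
echo "version 2, now longer" > "$demo/data/b.txt"
rsync -a --delete --link-dest "$demo/backup/day1" "$demo/data/" "$demo/backup/day2"

# -ef is true when both names refer to the same file (hard link)
if [ "$demo/backup/day1/a.txt" -ef "$demo/backup/day2/a.txt" ]; then
    a_linked=yes
else
    a_linked=no
fi
if [ "$demo/backup/day1/b.txt" -ef "$demo/backup/day2/b.txt" ]; then
    b_linked=yes
else
    b_linked=no
fi
echo "a.txt hard-linked: $a_linked"
echo "b.txt hard-linked: $b_linked"

# Clean up the demonstration directory
rm -r "$demo"
```

Only the modified file takes up new space in the second backup; the unchanged one is a hard link into the first.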
The Scripts
Such a command should now be executed every night using a cron job.
    #!/bin/bash

    TODAY=$(date +%Y-%m-%d)
    BACKUPDIR=/nas/backup/daily/
    SCRIPTDIR=/nas/data/backup_scripts
    DATADIR=/nas/data/
    # Most recent existing backup; the date-named folders sort chronologically
    LASTDAYPATH=${BACKUPDIR}/$(ls "${BACKUPDIR}" | tail -n 1)
    TODAYPATH=${BACKUPDIR}/${TODAY}

    if [[ ! -e "${TODAYPATH}" ]]; then
        mkdir -p "${TODAYPATH}"
    fi

    # Extra arguments given to this script are passed on to rsync
    rsync -a --delete --link-dest "${LASTDAYPATH}" "${DATADIR}" "${TODAYPATH}" "$@"
    "${SCRIPTDIR}/deleteOldBackups.sh"
The data hard drive is mounted to /nas/data, the backup hard drive is mounted to /nas/backup. Every day the backup script creates a backup of the data drive on the backup drive (in the folder daily, which might be a misleading name as we store all the backups in it).
At the end of the script, we trigger a second script, deleteOldBackups.sh, which deletes all old backups that are no longer needed according to the backup strategy above.
    #!/bin/bash

    BACKUPDIR=/nas/backup/daily/

    # First backup of the current year and of each of the five years before
    function listYearlyBackups() {
        for i in 0 1 2 3 4 5
        do
            ls "${BACKUPDIR}" | grep -E "$(date +%Y -d "${i} year ago")-[0-9]{2}-[0-9]{2}" | sort -u | head -n 1
        done
    }

    # First backup of the current month and of each of the twelve months before
    function listMonthlyBackups() {
        for i in 0 1 2 3 4 5 6 7 8 9 10 11 12
        do
            ls "${BACKUPDIR}" | grep -E "$(date +%Y-%m -d "${i} month ago")-[0-9]{2}" | sort -u | head -n 1
        done
    }

    # Backups of the most recent Mondays
    function listWeeklyBackups() {
        for i in 0 1 2 3 4
        do
            ls "${BACKUPDIR}" | grep "$(date +%Y-%m-%d -d "last monday -${i} weeks")"
        done
    }

    # Backups of the last seven days, today included
    function listDailyBackups() {
        for i in 0 1 2 3 4 5 6
        do
            ls "${BACKUPDIR}" | grep "$(date +%Y-%m-%d -d "-${i} day")"
        done
    }

    function getAllBackups() {
        listYearlyBackups
        listMonthlyBackups
        listWeeklyBackups
        listDailyBackups
    }

    function listUniqueBackups() {
        getAllBackups | sort -u
    }

    # Join the names to keep into one extended regex (name1|name2|...)
    # and list every backup that does not match it
    function listBackupsToDelete() {
        ls "${BACKUPDIR}" | grep -v -E -e "$(echo -n $(listUniqueBackups) | sed 's/ /|/g')"
    }

    # Stop if the backup folder is not reachable, so nothing is deleted elsewhere
    cd "${BACKUPDIR}" || exit 1
    listBackupsToDelete | while read -r file_to_delete; do
        # Each backup is a directory, so it has to be removed recursively
        rm -rf -- "${file_to_delete}"
    done
The idea of this script is to first list all the backups that should be kept, according to our strategy, and afterwards invert this selection to find out the ones to delete.
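To see the "keep list first, then invert" idea in isolation, here is a small sketch on made-up data. It is not taken from my scripts: the list of backup names and the fixed reference date are assumptions chosen so the result is reproducible, only the daily rule is applied, and the inversion uses grep -vxFf with a pattern file instead of the sed-built pattern. Like the script, it relies on GNU date.

```shell
#!/bin/bash
# Hypothetical sample data: backup folder names as `ls` would print them.
all_backups="2017-06-01 2017-06-12 2017-06-17 2017-06-19 2017-06-22 2017-06-25"

# Keep list for the daily rule only, relative to a fixed reference date
# instead of "today" (GNU date syntax).
reference=2017-06-25
keep_file=$(mktemp)
for i in 0 1 2 3 4 5 6; do
    date -u -d "${reference} ${i} days ago" +%Y-%m-%d
done > "$keep_file"

# Invert the selection: -v drops matching lines, -x compares whole lines,
# -F takes the patterns literally, -f reads them from the keep list.
# An empty pattern file matches nothing, so with -v every backup would be
# listed for deletion; the -s check guards against that case.
if [ -s "$keep_file" ]; then
    to_delete=$(printf '%s\n' $all_backups | grep -vxFf "$keep_file")
else
    to_delete=""
fi
rm -f "$keep_file"

echo "would delete:" $to_delete
```

With this sample data, the three backups older than seven days end up on the delete list. In the real script, the yearly, monthly and weekly rules would add more names to the keep list before the selection is inverted.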
And that’s it! There is not much magic in creating incremental backups without needing too much space. My NAS has been running these scripts every night for 10 months now, currently backing up 607 gigabytes of data. The backups currently take up 630 gigabytes. Find the current version of my simple bash scripts in this GitHub repository: https://github.com/NautiluX/backup_scripts