Backup to Object Storage with Duplicity

Introduction

This is a basic overview showing how to create server backups in OpenStack Swift object storage using Duplicity.

Assumptions

  • You are familiar with the Linux command line and the OpenStack CLI tools.
  • You have installed the OpenStack command line tools and sourced an openrc file, as explained in Command line interface (CLI).

What is Duplicity

Duplicity is a bandwidth-efficient backup utility capable of providing encrypted, digitally signed, versioned, remote backups in a space-efficient manner.

Duplicity creates an initial archive that is a full backup. All subsequent backups are incremental and save only the changes made since the latest (full or incremental) backup. A full backup and its corresponding series of incremental backups can be restored to any point in time covered by those backups. If an incremental backup is missing from the chain, none of the incremental backups that follow it can be restored.

Duplicity is released under the terms of the GNU General Public License (GPL), and as such is free software.

Prerequisites

If you’re using a major Linux distribution, you should be able to find a pre-compiled package in its repositories. If not, a tar file is available at Duplicity. For example, on Debian or Ubuntu:

sudo apt-get update
sudo apt-get install duplicity

Because we are going to authenticate against Keystone, it is also necessary to install python-keystoneclient.

sudo apt-get install python-keystoneclient

or

pip install python-keystoneclient

If you intend to create encrypted backups you will also require a GPG key. The gpg --gen-key command can create a local one for you; see (GnuPG) for more information.

Duplicity requires certain environment variables to be set. One option is to source a simple bash script like the one below. The values for these variables can be obtained from your OpenStack RC file.

#!/bin/bash

# Swift credentials for Duplicity
export SWIFT_USERNAME="somebody@example.com"
export SWIFT_TENANTNAME="mycloudtenant"
export SWIFT_AUTHURL="https://api.cloud.catalyst.net.nz:5000/v2.0"
export SWIFT_AUTHVERSION="2"

# With Keystone you pass the keystone password.
echo "Please enter your OpenStack Password: "
read -sr PASSWORD_INPUT
export SWIFT_PASSWORD="${PASSWORD_INPUT}"

To source this file, run the following from the command line:

source <filename.sh>

This will need to be done before each Duplicity run if the variables are not already set.
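
One way to confirm that the variables are in place before a run is to check the environment. The following is a minimal sketch rather than part of Duplicity: the function name check_swift_env is hypothetical, and it assumes the printenv utility is available on the system.

```shell
#!/bin/bash
# Hypothetical helper (not part of Duplicity): confirm that the Swift
# credentials from the script above are exported in the environment.
check_swift_env() {
    local missing=0
    local name
    for name in SWIFT_USERNAME SWIFT_TENANTNAME SWIFT_AUTHURL \
                SWIFT_AUTHVERSION SWIFT_PASSWORD; do
        # printenv only sees exported variables, which is what a
        # child process such as duplicity will actually receive
        if [ -z "$(printenv "${name}")" ]; then
            echo "Missing environment variable: ${name}" >&2
            missing=1
        fi
    done
    return "${missing}"
}
```

Calling check_swift_env before a Duplicity run returns a non-zero status and lists any variable that still needs to be set.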

A simple backup example

First, let's check our connectivity to the object store. If we run the following against an existing empty container, in this case ‘first-container’, we should see something like this:

$ duplicity collection-status swift://first-container
Local and Remote metadata are synchronized, no sync needed.
Last full backup date: none
Collection Status
-----------------
Connecting with backend: BackendWrapper
Archive dir: /home/ubuntu/.cache/duplicity/cd3fc2f113a80b76b6827a6f7b16aee5

Found 0 secondary backup chains.
No backup chains with active signatures found
No orphaned or incomplete backup sets found.

Now we can run our first backup. For this example we will use a single local file called foo.sh.

Note

If you do not have a valid GPG key you will need to add --no-encryption to your duplicity commands.


$ duplicity foo.sh swift://first-container
Local and Remote metadata are synchronized, no sync needed.
Last full backup date: none
GnuPG passphrase for decryption:
Retype passphrase for decryption to confirm:
No signatures found, switching to full backup.
--------------[ Backup Statistics ]--------------
StartTime 1484012914.11 (Tue Jan 10 01:48:34 2017)
EndTime 1484012914.11 (Tue Jan 10 01:48:34 2017)
ElapsedTime 0.01 (0.01 seconds)
SourceFiles 1
SourceFileSize 44 (44 bytes)
NewFiles 1
NewFileSize 44 (44 bytes)
DeletedFiles 0
ChangedFiles 0
ChangedFileSize 0 (0 bytes)
ChangedDeltaSize 0 (0 bytes)
DeltaEntries 1
RawDeltaSize 44 (44 bytes)
TotalDestinationSizeChange 231 (231 bytes)
Errors 0
-------------------------------------------------
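
The choice between an encrypted and an unencrypted backup mentioned in the note above could be wrapped in a small function. This is only a sketch under stated assumptions: GPG_KEY_ID, DUPLICITY_BIN and run_backup are hypothetical names introduced for illustration, while --encrypt-key and --no-encryption are existing Duplicity options.

```shell
#!/bin/bash
# Hypothetical wrapper: encrypt to a GPG key when GPG_KEY_ID is set,
# otherwise fall back to an unencrypted backup.
# DUPLICITY_BIN is an assumption added so the command can be swapped
# out (for example with "echo") to preview what would be run.
run_backup() {
    local src="$1"
    local dest="$2"
    if [ -n "${GPG_KEY_ID:-}" ]; then
        set -- --encrypt-key "${GPG_KEY_ID}" "${src}" "${dest}"
    else
        set -- --no-encryption "${src}" "${dest}"
    fi
    "${DUPLICITY_BIN:-duplicity}" "$@"
}

# Example:
# run_backup foo.sh swift://first-container
```

Setting DUPLICITY_BIN=echo before calling run_backup prints the arguments instead of starting a backup, which is a cheap way to check the options first.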

We can verify the state of our backups with:

$ duplicity collection-status swift://first-container
Local and Remote metadata are synchronized, no sync needed.
Last full backup date: Tue Jan 10 01:48:25 2017
Collection Status
-----------------
Connecting with backend: BackendWrapper
Archive dir: /home/ubuntu/.cache/duplicity/cd3fc2f113a80b76b6827a6f7b16aee5

Found 0 secondary backup chains.

Found primary backup chain with matching signature chain:
-------------------------
Chain start time: Tue Jan 10 01:48:25 2017
Chain end time: Tue Jan 10 01:48:25 2017
Number of contained backup sets: 1
Total number of contained volumes: 1
 Type of backup set:                            Time:      Num volumes:
                Full         Tue Jan 10 01:48:25 2017                 1
-------------------------
No orphaned or incomplete backup sets found.

We can also check whether any local files differ from what has been backed up by running:

$ duplicity verify swift://first-container .
Local and Remote metadata are synchronized, no sync needed.
Last full backup date: Tue Jan 10 01:48:25 2017
GnuPG passphrase for decryption:
Verify complete: 595 files compared, 0 differences found.
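
Because a backup chain can be recovered to any point in time it covers, a restore can ask for the state of the data at an earlier moment. The sketch below assumes the container used in this example; restore_at_time and DUPLICITY_BIN are hypothetical names, and the target path is a placeholder. It is safest to restore into a new location rather than over live data.

```shell
#!/bin/bash
# Sketch only. restore_at_time and DUPLICITY_BIN are hypothetical names;
# setting DUPLICITY_BIN=echo prints the command instead of running it.
restore_at_time() {
    local when="$1"     # a Duplicity time string, e.g. 3D for three days ago
    local url="$2"      # backup location, e.g. swift://first-container
    local target="$3"   # local path to restore into
    "${DUPLICITY_BIN:-duplicity}" restore --time "${when}" "${url}" "${target}"
}

# Example: restore the backup as it was three days ago
# restore_at_time 3D swift://first-container /tmp/restore
```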

Warning

If you wish to back up the root ‘/’ directory, it is advisable to add --exclude /proc, because the virtual files under /proc can cause Duplicity to crash.
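
A root backup with exclusions might look like the sketch below. Only /proc comes from the warning above; the other excluded paths (/sys, /dev, /run and /tmp) are assumptions about what is usually not worth backing up and should be adjusted to suit the server. backup_root and DUPLICITY_BIN are hypothetical names.

```shell
#!/bin/bash
# Sketch only: back up "/" while skipping virtual and temporary
# filesystems. Apart from /proc, the exclude list is an assumption.
# DUPLICITY_BIN=echo prints the command instead of running it.
backup_root() {
    local dest="$1"
    "${DUPLICITY_BIN:-duplicity}" \
        --exclude /proc \
        --exclude /sys \
        --exclude /dev \
        --exclude /run \
        --exclude /tmp \
        / "${dest}"
}

# Example:
# backup_root swift://first-container
```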

Automating backup tasks with cron

To make this process more useful, we can automate our backup tasks by creating a cron job that runs the Duplicity backups on a regular basis by means of a shell script.

It is also best practice to create a separate backup user account in your cloud project that is only given rights to access object storage. The main justification is that, for scripts to run unattended, plaintext password information must be embedded in them; restricting the account to object storage limits what that password can be used for if it is exposed.

Creating the backup user

To create a new user account, go to Management -> Project Users in the left-hand menu of the dashboard, then click on the +Invite User button.

Fill in the Invite User form as shown, making sure the only Role selected is Object Storage.

../_images/invite_object_user.png

Once you receive the invite, complete the sign-in process as the new user. There should now be a new user with Object Storage as their only available role.

../_images/object_user.png

You can then download a copy of the backup user's OpenStack RC file (see Source an OpenStack RC file), which will provide the credential information for the following section.

Creating the backup scripts

Now we can create our backup process. This will consist of:

  • the backup script itself
  • the variables file to control the backup script and provide authentication information
  • the cron job to run the backup task

Here is a basic script to manage the running of the Duplicity backups. Typically this would be placed somewhere like /usr/local/bin; the cron job at the end of this guide expects it at /usr/local/bin/duplicity-backup.sh.

#!/bin/bash

# Source SWIFT access variables required by duplicity
source /etc/duplicity/duplicity.vars
BACKUP_DEFINITIONS_DIR="/etc/duplicity/backup_sources.d"
BACKUP_CONFIG="${1}"

if [ -z "${BACKUP_CONFIG}" ]; then
   BACKUP_CONFIG='*'
fi

# Run backups defined in BACKUP_DEFINITIONS_DIR or only the one specified as $1
# The BACKUP_* variables must NOT be double-quoted here, otherwise the shell will not expand the '*' wildcard
for BACKUP_DEFINITION_FILE in ${BACKUP_DEFINITIONS_DIR}/${BACKUP_CONFIG}.conf
do
   # Make sure we don't have any leftover variables set before next loop run
   unset SRC
   unset DEST
   unset PRE_BACKUP_CMD
   unset POST_BACKUP_CMD
   unset DUPLICITY_BACKUP_RETENTION
   unset DUPLICITY_BACKUP_CYCLE
   unset DUPLICITY_VOLSIZE
   unset DUPLICITY_NUM_RETRIES

   # Source variables used on each loop run
   if [ ! -f "${BACKUP_DEFINITION_FILE}" ]; then
      INFO="No backups defined in ${BACKUP_DEFINITIONS_DIR}/ or ${BACKUP_DEFINITION_FILE} is not a file"
      echo "${INFO}"
      continue
   fi
   # Source the main config file again as we overwrite some variables in backup definitions
   source /etc/duplicity/duplicity.vars
   source "${BACKUP_DEFINITION_FILE}"

   # Check if the src and dest backup vars are not empty
   if [ ! -z "${SRC}" ] && [ ! -z "${DEST}" ]; then

      # Run defined tasks before doing the backup
      if [ ! -z "${PRE_BACKUP_CMD}" ]; then
         eval "${PRE_BACKUP_CMD}"
         rc=$?
         if [ ${rc} -gt 0 ]
         then
            # Error handling
            INFO="Pre backup command failed with rc = ${rc}"
            echo $INFO
            continue
         fi
      fi

      # Run backup
      duplicity --verbosity Notice \
                --full-if-older-than "${DUPLICITY_BACKUP_CYCLE}" \
                --num-retries "${DUPLICITY_NUM_RETRIES}" \
                --asynchronous-upload \
                --no-encryption \
                --volsize "${DUPLICITY_VOLSIZE}" \
                "${SRC}" "${DEST}"
      rc=$?
      if [ ${rc} -gt 0 ]
      then
         # Error handling
         INFO="Backup failed with rc = ${rc}"
         echo $INFO
         continue
      fi

      # Duplicity cleanups
      duplicity remove-older-than "${DUPLICITY_BACKUP_RETENTION}" --verbosity notice --force "${DEST}"
      rc=$?
      if [ ${rc} -gt 0 ]
      then
         # Error handling
         INFO="Deleting old backups failed with rc = ${rc}"
         echo $INFO
         continue
      fi

      # Duplicity collection status summary
      duplicity collection-status "${DEST}"
      rc=$?
      if [ ${rc} -gt 0 ]
      then
         # Error handling
         INFO="Collection status failed with rc = ${rc}"
         echo $INFO
         continue
      fi

      # Run a command after doing the backup
      if [ ! -z "${POST_BACKUP_CMD}" ]; then
         eval "${POST_BACKUP_CMD}"
         rc=$?
         if [ ${rc} -gt 0 ]
         then
            # Error handling
            INFO="Post backup command failed with rc = ${rc}"
            echo $INFO
            continue
         fi
      fi

   else
      INFO="No backup source or destination defined in ${BACKUP_DEFINITION_FILE}"
      echo $INFO
      continue
   fi

   # If the script reached this point, all backup steps succeeded, so we can report that (for example to a monitoring system such as Icinga)
   INFO="Backup succeeded"
   echo $INFO

done
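
The way the script above selects definition files can be tried in isolation. The following sketch mirrors the loop's pattern; the function name list_backup_definitions and its optional directory argument are hypothetical, added only so the behaviour can be tested against a scratch directory. Like the script, it assumes the directory path contains no spaces.

```shell
#!/bin/bash
# Sketch of how the backup script chooses definition files.
# list_backup_definitions and its second argument are hypothetical.
list_backup_definitions() {
    local config="${1:-*}"   # no argument means "all definitions"
    local dir="${2:-/etc/duplicity/backup_sources.d}"
    local file
    # ${dir} and ${config} are left unquoted so that '*' is expanded
    for file in ${dir}/${config}.conf; do
        # An unmatched pattern stays literal, so skip non-files
        [ -f "${file}" ] && echo "${file}"
    done
    return 0
}
```

In the same way, running /usr/local/bin/duplicity-backup.sh gitlab would process only gitlab.conf, while running it with no argument processes every .conf file in the definitions directory.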

The following variables file defines the control parameters, such as retention and frequency, for the backup tasks and provides the authentication information for object storage. The backup script above expects to find it at /etc/duplicity/duplicity.vars.

#!/bin/bash

# Variables used by the backup script

# Duplicity specific variables
export DUPLICITY_BACKUP_CYCLE='7D' #7 days
export DUPLICITY_BACKUP_RETENTION='14D' #14 days
export DUPLICITY_VOLSIZE='512' #volume (object chunk) size in MB
export DUPLICITY_NUM_RETRIES='3'

# Catalyst Cloud object storage credential information
export SWIFT_USERNAME='<your-backup-user>@<your-project-name>'
export SWIFT_REGIONNAME='nz_wlg_2'
export SWIFT_TENANTNAME='<your-project-name>'
export SWIFT_PASSWORD='<your-openrc-password>'
export SWIFT_AUTHURL='https://api.cloud.catalyst.net.nz:5000/v2.0'
export SWIFT_AUTHVERSION='2'
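
Since this file holds a plaintext password, it is worth making it readable by its owner only. The sketch below is one possible way to do that; restrict_vars_file is a hypothetical name, and the default path is the one the backup script sources. As the cron job runs as root, the file would normally be owned by root.

```shell
#!/bin/bash
# Hypothetical helper: make the credentials file readable and
# writable by its owner only (mode 600).
restrict_vars_file() {
    local vars_file="${1:-/etc/duplicity/duplicity.vars}"
    chmod 600 "${vars_file}"
}

# Example (run as the file's owner, typically root):
# restrict_vars_file /etc/duplicity/duplicity.vars
```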

Next we need to create the backup definitions. Create a file with a name relevant to the backup task and a .conf extension (the backup script only reads *.conf files) in /etc/duplicity/backup_sources.d, and add at least the following two entries:

SRC="/path/to/files/"
DEST="swift://<container-name>"

Depending on what you wish to back up, you may also need to include a pre-backup command such as the one shown below. This ensures that the data you wish to capture, in this case the contents of a GitLab repository, has been written to disk before the backup task runs.

PRE_BACKUP_CMD="CRON=1 /opt/gitlab/bin/gitlab-rake gitlab:backup:create"
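
Before relying on a new definition file, it can help to confirm that it sets both required entries. The following is a minimal sketch, with validate_backup_definition as a hypothetical name. Note that it sources the file, so any commands in the file are executed; only run it on definition files you trust.

```shell
#!/bin/bash
# Hypothetical helper: check that a definition file sets SRC and DEST.
validate_backup_definition() {
    local file="$1"
    if [ ! -f "${file}" ]; then
        echo "Not a file: ${file}" >&2
        return 1
    fi
    # Source in a subshell so the definition's variables do not leak
    # into the calling shell
    (
        unset SRC DEST
        . "${file}"
        [ -n "${SRC:-}" ] && [ -n "${DEST:-}" ]
    ) || { echo "SRC or DEST missing in ${file}" >&2; return 1; }
    echo "OK: ${file}"
}
```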

Finally, we create a new file called duplicity-backup-cron in /etc/cron.d/. This is the cron job responsible for running the backups; the entry below runs the backup script as root every day at 02:35. See (cron) for more information.

#
35 2 * * * root /usr/local/bin/duplicity-backup.sh >> /var/log/backup/duplicity.log 2>&1
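
The cron entry appends its output to /var/log/backup/duplicity.log, and the shell redirection will fail if that directory does not exist. A small sketch for creating it ahead of time follows; ensure_log_dir is a hypothetical name, and the default path is taken from the cron entry above.

```shell
#!/bin/bash
# Hypothetical helper: create the directory that will hold the log
# file if it is not already there.
ensure_log_dir() {
    local log_file="${1:-/var/log/backup/duplicity.log}"
    mkdir -p "$(dirname "${log_file}")"
}

# Example (run as root before the first scheduled backup):
# ensure_log_dir /var/log/backup/duplicity.log
```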