How to Back Up Your Data to Amazon Glacier

All things are only transitory.

Johann Wolfgang von Goethe

Most users store indispensable information on their computing devices. This can range from personal documents and family photos to financial records. A backup and recovery policy is a must to prevent the loss of this data.

Personal backup options are flooding the market. The website About.com reviewed a staggering 41 backup services. The sheer number of offerings can stop users in their tracks. What is a consumer to do?

Amazon Glacier provides a unique solution for personal backups. Glacier allows greater control over user data than competitors. It is low cost at just $0.01 per GB each month. It is secure, supporting encryption both at rest and in transit. Finally, Glacier is simple to use. Anyone with an elementary understanding of git can use the steps given in this article to back up their important data.

In this article, I will introduce a backup procedure that sets up Glacier as a storage backend for a git add-on called git-annex. git-annex allows users to manage files with git without adding their contents to git. As a result, git-annex offers users already familiar with git an easy way to put their own backup policy in place.

Backing up

  1. Install git-annex.
  2. Install glacier-cli.
  3. Sign up for an AWS account (if you do not already have one) and obtain your access key id and secret access key.
  4. Assign your AWS credentials to the appropriate environment variables:

    $ export AWS_ACCESS_KEY_ID=<aws-access-key-id>
    $ export AWS_SECRET_ACCESS_KEY=<aws-secret-key>  
    
  5. Create a git repository on a git server. You will likely want this to be a non-public repository. I recommend Bitbucket as a good provider for this purpose. The URI of this repository will be referred to as <git-repo> for the remainder of the article.

  6. Inside the directory you wish to back up, initialize a git repository, add <git-repo> as its remote, and enable git-annex and Glacier support:

    $ git init
    $ git remote add origin <git-repo>
    $ git annex init
    $ git annex initremote glacier type=glacier
    

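    The command git annex initremote glacier type=glacier creates what git-annex calls a special remote: a storage location that holds file contents but is not itself a git repository. Depending on your git-annex version, initremote may also insist on an explicit encryption setting (for example encryption=none or encryption=shared); check the git-annex encryption documentation before choosing one.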

  7. Now choose the files you wish to back up:

    $ git annex add <file>
    $ git annex copy --to glacier <file>
    

    A vault is AWS's name for a storage container in Glacier. When you run the command git annex copy --to glacier <file>, git-annex pushes <file> to a Glacier vault whose name is derived from a generated UUID. This UUID appears to the left of the Glacier remote in the output of git annex info. For example, in the printout below, the vault name is likely glacier-813460b3-e736-41c0-8946-fe12edb5d0c3:

    $ git annex info
    repository mode: indirect
    trusted repositories: 0
    semitrusted repositories: 5
        00000000-0000-0000-0000-000000000001 -- web
        813460b3-e736-41c0-8946-fe12edb5d0c3 -- glacier
        e9a37655-52f5-42b8-a707-c61dfa4a4158 -- here (corey@host:~/tmp)
    untrusted repositories: 0
    
  8. Sync. This pushes your changes (the git metadata that records where each file is stored, not the file contents) to your git remote:

    $ git annex sync
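
As a rough sketch, steps 4, 7, and 8 above could be wrapped in a small shell helper. The function names run, require_aws_env, and backup_files, and the DRY_RUN variable, are my own illustrative choices and not part of git-annex or glacier-cli. The sketch assumes the repository and the glacier remote have already been set up as in step 6.

```shell
#!/bin/sh
# Hypothetical helper wrapping backup steps 4, 7 and 8; names are illustrative.

# Print the command when DRY_RUN=1; otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Fail early if either AWS credential variable is unset or empty.
require_aws_env() {
    if [ -z "${AWS_ACCESS_KEY_ID:-}" ] || [ -z "${AWS_SECRET_ACCESS_KEY:-}" ]; then
        echo "error: AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY must be set" >&2
        return 1
    fi
}

# Add each file to the annex, copy it to the glacier remote, then sync.
backup_files() {
    require_aws_env || return 1
    for file in "$@"; do
        run git annex add "$file" || return 1
        run git annex copy --to glacier "$file" || return 1
    done
    run git annex sync
}
```

After sourcing these functions, setting DRY_RUN=1 before calling backup_files with one or more file names prints the commands without running them, which is a cheap way to check the list of files before anything is uploaded.
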
    

Restoring

For restoration, we assume you are starting on a machine with no existing git-annex installation.

  1. Install git-annex.
  2. Install glacier-cli.
  3. Inside the directory you wish to restore to, fetch the contents of your git repository <git-repo>:

    $ git init
    $ git remote add origin <git-repo>
    $ git pull origin master
    
  4. Assign your AWS credentials to the appropriate environment variables:

    $ export AWS_ACCESS_KEY_ID=<aws-access-key-id>
    $ export AWS_SECRET_ACCESS_KEY=<aws-secret-key>
    
  5. Initialize the repository for git-annex and enable the Glacier remote:

    $ git annex init
    $ git annex enableremote glacier
    
  6. Update glacier-cli's local cache of the Glacier vault where the backup is stored. Check the list of your Glacier vaults by running:

    $ glacier vault list
    vault1
    vault2
    glacier-ac10d287-747c-4578-801d-50d951ad596b
    glacier-dc112fe5-4d03-4ef9-884a-0f0288db37bb
    

    Compare these vault names to the output of git annex info. If none of the listed vaults carries the Glacier remote's UUID, specify the vault yourself: open .git/config and set the value of the vault key under the [remote "glacier"] heading.

    Once the vault name is determined, run:

    $ glacier vault sync <vault-name>
    

    Wait for the sync job to complete (approximately 4 hours). Then rerun:

    $ glacier vault sync <vault-name>
    
  7. Now restore any files you wish. With Glacier it is important to check the retrieval cost first, for example with the calculator at http://liangzan.net/aws-glacier-calculator/:

    $ git annex get --from glacier <file>
    

    The command will initially return an exit status of 1. This is because a job has been queued to Glacier but awaits completion. After the job has completed (approximately 4 hours), rerun the command:

    $ git annex get --from glacier <file>
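
Two parts of the restore procedure lend themselves to a small script: working out the likely vault name in step 6, and rerunning a command until the Glacier job has finished in step 7. The sketch below is one possible way to do both. The function names likely_vault_name and retry are my own, the parser assumes the "UUID -- glacier" line format shown in the printout earlier in this article (other git-annex versions may print it differently), and the retry loop cannot tell a queued job apart from a genuine failure because both surface as a non-zero exit status. I have not checked how repeated attempts interact with Glacier's job queue or billing, so treat the interval as something to tune.

```shell
#!/bin/sh
# Hypothetical helpers for restore steps 6 and 7; names are illustrative.

# Read `git annex info` output on stdin and print the likely vault name of
# the remote labelled "glacier". Assumes the "<UUID> -- glacier" line format
# and the "glacier-<UUID>" naming pattern described in this article.
likely_vault_name() {
    awk '$2 == "--" && $3 == "glacier" { print "glacier-" $1; exit }'
}

# Rerun a command until it succeeds or the attempts run out.
# Usage: retry <max-attempts> <sleep-seconds> <command> [args...]
retry() {
    max=$1
    delay=$2
    shift 2
    attempt=1
    while [ "$attempt" -le "$max" ]; do
        if "$@"; then
            return 0
        fi
        echo "attempt $attempt of $max failed" >&2
        if [ "$attempt" -lt "$max" ]; then
            sleep "$delay"
        fi
        attempt=$((attempt + 1))
    done
    return 1
}

# Example (not run here): try once an hour, for up to six attempts.
#   vault=$(git annex info | likely_vault_name)
#   retry 6 3600 git annex get --from glacier photo.jpg
```

The file name photo.jpg in the commented example is a placeholder. If the derived vault name does not appear in the output of glacier vault list, fall back to the manual .git/config approach described in step 6.
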
    

Appendix

Installing git-annex

  1. Navigate to http://git-annex.branchable.com/install/. Follow the directions for your operating system.

Installing glacier-cli

  1. Navigate to https://github.com/basak/glacier-cli and follow the instructions provided in the README to install.

Signing up for AWS

  1. Navigate to http://aws.amazon.com/ and click the Sign Up button. Follow the directions to complete account registration.