Automatic Offline File Backups with Bash and Rsync

Not backing up your files could lead to data loss in the future. Here's how to do offline backups with a few lines of Bash.

Quick Jump: Why Not Just Use Cloud Storage? | Backing up to an External USB HD | Creating the Bash Script for Backups | How Fast / Efficient Is It? | Scheduling the Script to Run Automatically

How long have you been using a computer for?

I have files on my current computer dating all the way back to 2000. That’s 20 years of computing history. Everything from notes, to source code, to video course source files, to business ideas, to pictures and so much more.

Imagine waking up one day to a hard drive crash that results in total data loss. You would lose your entire life’s work in the blink of an eye.

Not backing up your important files is asking for trouble. Your HD is going to die; it’s only a matter of when. I personally feel like most insurance is a total scam, but unlike most things insurance covers, data loss without backups is a sure thing.

Why Not Just Use Cloud Storage?

You could say I have digital trust issues. I have nothing to hide but I just don’t like the idea of some company harvesting my private files for profit.

I’m not crazy enough to believe Dropbox and its competitors are sitting there waiting for me to upload files and then a team just gets notified with “WAKE UP, Nick just updated some source code. Go steal his ideas!”. It’s not like that at all haha.

I just feel like it’s not their business to know my business.

Backing up to an External USB HD

Nowadays you can grab a 1TB USB 3.0 external HD for around $55. Personally, I’ve been using the 1TB WD Passport for a few years and it’s been great.

For comparison, Dropbox would cost $99 per year for the same amount of storage, or $150 / year for 2TB, whereas the 2TB WD Passport is a one time cost of $69.

But who cares about prices, this isn’t meant to be a post on saving money. Having a portable and reliable external HD for offline backups is what we’re after here.

Also keep in mind, the steps below will work the same with an internal HD too.

Creating the Bash Script for Backups

To follow along, you’ll need to be able to run Bash on your system along with rsync. If you’re on Windows like me, you’ll want to do all of this within WSL.

On Ubuntu / Ubuntu WSL you can install rsync with sudo apt-get install rsync. If you’re on macOS, you can run brew install rsync.
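
For copy / paste convenience, those commands are:

# Ubuntu / Ubuntu WSL
sudo apt-get install rsync

# macOS (with Homebrew installed)
brew install rsync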

Next up, create a file called backup anywhere you want and run chmod +x backup to make it executable. The last part is important because without it you won’t be able to run the script directly.
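
Here’s a quick sketch of those steps. I’m using ~/src/scripts as the location since that’s where I keep mine, but any path works:

mkdir -p ~/src/scripts
cd ~/src/scripts
touch backup
chmod +x backup

# After you paste in the script below you can run it with:
./backup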

Source code for the Bash script:

Paste the contents of the code below in your backup file.

#!/bin/bash

# This would be the path to your external HD or wherever you're backing up
# your files. If you're on WSL, all of your drives can be found in /mnt, but
# if you follow my above blog post on setting up WSL, you can have them
# mounted directly to /e or /f rather than /mnt/e or /mnt/f.
target_path="/f/backup"
target_dirname="$(dirname "${target_path}")"

# Exit early if the target is not mounted. This could happen if you forgot to
# mount your external drive. This will save you from writing potentially 100s
# of gigs to the wrong drive. Technically mountpoint can be used instead of
# piping mount into grep but mountpoint isn't available on macOS by default.
if ! mount | grep -q "on ${target_dirname}"; then
  echo "${target_dirname} is not mounted. Verify this by running: mount"
  exit 1
fi

# Create the target path if it doesn't exist. This command is smart enough to
# not do anything if it already exists, which is important for later because
# we'll be running this script on an automated schedule.
mkdir -p "${target_path}"

# A list of absolute paths to backup. In the case of WSL, ${HOME} is inside of
# the WSL file system. This is where most of your dotfiles would be located.
#
# The /d paths happen to be on an internal HD I use to store all of my data.
include_paths=(
  "${HOME}/.aws"
  "${HOME}/.bash_history"
  "${HOME}/.docker"
  "${HOME}/.gitconfig"
  "${HOME}/.gnupg"
  "${HOME}/.npmrc"
  "${HOME}/.password-store"
  "${HOME}/.pypirc"
  "${HOME}/.ssh"
  "${HOME}/notes"
  "${HOME}/src"
  "/d/business"
  "/d/books"
  "/d/courses"
  "/d/media"
  "/d/music"
  "/d/podcasts"
  "/d/research"
  "/d/tweets"
  "/d/twitch"
  "/d/youtube"
)

# A list of folder names and files to exclude. There's no point backing up
# massive folders such as node_modules, plus you'll likely run into max path
# length errors because npm nests directories so deeply it breaks Windows.
exclude_paths=(
  ".asset-cache"
  ".bundle"
  ".jekyll-cache"
  ".pyest_cache"
  ".tweet-cache"
  ".vagrant"
  "_site"
  "__pycache__"
  "node_modules"
  "tmp"
  "[._]*.s[a-v][a-z]"
  "[._]*.sw[a-p]"
  "[._]s[a-rt-v][a-z]"
  "[._]ss[a-gi-z]"
  "[._]sw[a-p]"
  "*~"
  "[._]*.un~"
)

# rsync allows you to exclude certain paths. We're just looping over all of the
# excluded items and building up a separate --exclude flag for each one.
for item in "${exclude_paths[@]}"
do
  exclude_flags="${exclude_flags} --exclude=${item}"
done

# rsync allows you to pass in a list of paths to copy. It expects a space
# separated string, so that's what we're building up here.
for item in "${include_paths[@]}"
do
  include_args="${include_args} ${item}"
done

# Finally, we just run rsync with a few flags:
#  -a is archive mode, so it preserves things like permissions and modified times.
#  -v is verbose mode to get a bit of extra output (useful for debugging).
#  -R is relative mode. It ensures the included paths get created on the target.
#  --dry-run ensures nothing gets written to the target (for testing purposes).
rsync -avR --dry-run ${exclude_flags} ${include_args} ${target_path}

What Do You Need to Change?

In your real script, you’ll want to remove --dry-run because otherwise nothing will get copied. I included it here by default because I don’t want you to just copy / paste this script and run it blindly without testing it first.

You’ll likely want to change the target_path and include_paths variables too.
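
For reference, once --dry-run is removed, the last line of the script becomes:

rsync -avR ${exclude_flags} ${include_args} ${target_path}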

How Fast / Efficient Is It?

I transferred about 182GB of data across 89,617 files at 88MB / sec and it finished in about 39 minutes, so I would say it’s frikken fast! That was using WSL 1 too. With WSL 2 it’s a lot slower because mounted drives have poor performance, but in due time Microsoft should fix that.

The second time I ran the script it finished in 11 seconds and did very little work. It only copied over a few IRC log files that changed.

That’s how rsync works. It will only copy over what changed, which makes it incredibly useful for backing up files (much better than scp -r for this use case).

Scheduling the Script to Run Automatically

Once you have your script fully working and tested, you can automate running it on whatever interval works best for you.

Automated Backups on Linux

We have access to the cron utility, so this will be simple. Run crontab -e, pick an editor (nano is fine) if you haven’t done so already, then add this line to the bottom and save it:

0 2 * * * /path/to/your/backup/script

This would run once a day at 2am. You’ll want to replace /path/to/your/backup/script with wherever you saved the backup script. I keep mine in ~/src/scripts/backup.
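
If you also want a record of what each run copied, you could optionally redirect the script’s output to a log file. The log path here is just an example, put it wherever you like:

0 2 * * * /path/to/your/backup/script >> "${HOME}/backup.log" 2>&1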

Automated Backups on Windows / WSL

The steps below expect you to be using WSL to run the script. If you don’t use WSL, you’ll want to adjust steps 8 / 9 to call the script however you’re running it instead.

  1. Search for “Computer Management” and run it
  2. Click “System Tools -> Task Scheduler” in the sidebar
  3. Click “Create Task” in the action bar on the right
  4. Name it “System-Backup”
  5. Make sure the security option is set to “Run only when user is logged on”
  6. Change “Configure for” (dropdown box) to “Windows 10”
  7. Go to the “Actions” tab and click “New”
  8. Enter C:\Windows\System32\bash.exe as the Program/script
  9. Add the arguments -c "/path/to/your/backup/script" and hit OK to anything it says
  10. Go to the “Triggers” tab and click “New”
  11. Change the settings to “On idle”
  12. Go to the “Conditions” tab and set it to happen after idling for 10 minutes
  13. Click OK and then click OK again for Create Task

For step 9, you’ll want to replace /path/to/your/backup/script with wherever you saved the backup script. I keep mine in ~/src/scripts/backup.
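
Before depending on the scheduled task, it’s worth sanity checking the exact command from a regular Command Prompt first. For example, with my script location that would be:

C:\Windows\System32\bash.exe -c "~/src/scripts/backup"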

It’s also worth noting that for whatever reason, WSL commands don’t work correctly if your user is logged out (even if you set the task to work for logged out users).

If your user gets logged out (such as a lock screen after your screensaver comes on) then bash.exe will end up not being found when the task executes. You can confirm this by looking at the task’s history; it’ll exit with code 2, which means “the system cannot find the file specified”.

With that said, my lock screen kicks in after 30 minutes, so I set the idle time to 10 minutes. Apparently there’s some weirdness with how Windows checks for idle time. Their documentation says it only checks every 15 minutes.

Thus, if we have a 10 minute idle time and get hit with the worst case scenario for the idle check (15 minutes), the task will kick in after at most 25 minutes. That’s 5 minutes before the lock screen comes on, which ensures we’ll still be logged in.

Using the above strategy should result in a few backups per day, depending on how often you go AFK. Of course, you’ll need to adjust your screensaver to 30 minutes or more to make this strategy work.

I know, it would be way easier to just say “backup at 2am” but I haven’t figured out how to make that work. If you’re a Windows Task Scheduler guru and know the answer, please let me know in the comments.

Also, if the above is too complicated, you could always just run the backup script by hand every day. That wouldn’t be the end of the world either, but personally I do have the above set up with the idle timer and it works well enough.

Automated Backups on macOS

I don’t use a Mac but here’s how you could set it up with launchd, which is macOS’s default scheduler and is meant to replace cron.

Create the launchd schedule file:

sudo touch /Library/LaunchDaemons/SystemBackup.plist

Add this content to the above file and save it:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
          "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>SystemBackup</string>
    <key>Program</key>
    <string>/path/to/your/backup/script</string>
    <key>StartCalendarInterval</key>
    <dict>
        <key>Hour</key>
        <integer>2</integer>
        <key>Minute</key>
        <integer>0</integer>
    </dict>
</dict>
</plist>

This would run once a day at 2am. You’ll want to replace /path/to/your/backup/script with wherever you saved the backup script. I keep mine in ~/src/scripts/backup.

Load the task into launchd:

launchctl load -w /Library/LaunchDaemons/SystemBackup.plist
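
To double check it loaded, and to optionally trigger a run right away instead of waiting until 2am, you should be able to run these (you may need sudo depending on how it was loaded):

launchctl list | grep SystemBackup
launchctl start SystemBackup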

So that’s it. You now have automated offline backups. Congrats!

When was the last time you backed up your files? Let me know below.
