Backups are a necessity, whether they are done via the database's native backup technology, a snapshot, or dumping to tape. Here's the difference between some common solutions, and tips on how to get started.
What is a Snapshot?
A snapshot captures the state of all the data on a disk or, in the case of ZFS and similar systems, of a specific dataset, at a single moment in time. If a database was running on that disk when the snapshot was taken, restoring the snapshot puts the database files back exactly as they were at that instant. To the database, starting up from a restored snapshot looks like starting up after a sudden power loss: it runs its normal crash recovery and comes back in a consistent state, provided the snapshot covers all of the database's files at the same instant. Compare that with other methods, like using tar to back up the files. If a tarball is taken of the database while it's running, or some other backup solution copies data at a file level, different files are captured at different moments, so the backup holds an inconsistent view of the data and the database may not be restorable from it when needed.
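To make the difference concrete, here is a minimal sketch in Python. It is a toy model, not a real database: the two "files" (data and index) and the helper names are made up for illustration, and the only rule is that the index count must match the number of rows.

```python
import copy

# Toy "database" made of two files that must always agree:
# index["count"] has to equal len(data["rows"]).
def new_db():
    return {"data": {"rows": ["a", "b"]}, "index": {"count": 2}}

def insert_row(db, row):
    # A single write touches both files.
    db["data"]["rows"].append(row)
    db["index"]["count"] += 1

def is_consistent(db):
    return db["index"]["count"] == len(db["data"]["rows"])

def file_level_copy(db, write_between_files):
    # Copies one file at a time, the way tar would.
    backup = {"data": copy.deepcopy(db["data"])}
    write_between_files(db)  # the live database keeps writing mid-copy
    backup["index"] = copy.deepcopy(db["index"])
    return backup

def snapshot(db):
    # Captures every file at the same instant.
    return copy.deepcopy(db)

live = new_db()
torn = file_level_copy(live, lambda db: insert_row(db, "c"))
print(is_consistent(torn))  # False: old data file, new index file

live = new_db()
snap = snapshot(live)
insert_row(live, "c")       # writes after the snapshot do not affect it
print(is_consistent(snap))  # True
```

The file-level copy ends up with a data file from before the write and an index file from after it, which is exactly the kind of mismatch that can stop a real database from starting.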
Native Backup Technology vs Snapshots
Many databases incorporate some mechanism for doing streaming replication, a hot backup, or some kind of incremental backup of the database itself. For frequent backups, the database's native backup technology is often more efficient than repeated whole-disk snapshots. The native mechanism can run often and quickly, and it produces incremental views of the data in a consistent state because the database server itself is responsible for making those backups.
The ZODB, the database for Zope and Plone, can also make incremental backups that export only the changed or new objects in the system rather than the whole Data.fs at once. What's nice is that standard backup software that does file-level or even block-level backups can then back up the exported or incremental database files instead of the live database files, which avoids the risk of an inconsistent backup.
The benefit here is that the whole disk doesn't have to be backed up; only the data that has been changed will be backed up. Of course we recommend the backup be kept offsite, at a minimum, and optimally in a couple different locations as part of the overall backup strategy.
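The idea of exporting only what changed can be sketched in a few lines of Python. This is an illustration of the concept only, not how ZODB or any particular database implements it; the object IDs, the fingerprinting approach, and the function names are all assumptions for the example, and deleted objects are ignored to keep it short.

```python
import hashlib

def fingerprint(value: bytes) -> str:
    # Content hash used to detect whether an object changed.
    return hashlib.sha256(value).hexdigest()

def incremental_export(objects, previous_fingerprints):
    """Return (changed_objects, current_fingerprints).

    Only objects that are new or differ from the previous run are exported.
    """
    changed = {}
    current = {}
    for oid, value in objects.items():
        digest = fingerprint(value)
        current[oid] = digest
        if previous_fingerprints.get(oid) != digest:
            changed[oid] = value
    return changed, current

store = {"doc1": b"hello", "doc2": b"world"}

# First run: nothing seen before, so everything is exported (a full backup).
full, seen = incremental_export(store, {})

# Later: one object is modified and one is added.
store["doc2"] = b"world!"
store["doc3"] = b"new"

# Second run: only the modified and new objects are exported.
delta, seen = incremental_export(store, seen)
print(sorted(delta))  # ['doc2', 'doc3']
```

The small delta is what the regular file-level backup software then picks up, instead of the whole live database file.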
Restoring from a Hardware Failure
The quickest option to restore from depends on the backup technology in use. For example, on a cloud-based system like Amazon's AWS, restoring a snapshot may be really quick, but may not be as granular. On the other hand, a database's native backup solution can take incrementals every hour or every fifteen minutes, which enables a much more granular rollback to a more recent state of the database.
The whole state of the server should be stored in a way that a developer or operations person can basically create a new virtual machine using a configuration management tool like Salt. Just run one command, and the state should be exactly where it was before, except for the data, which would then be restored from either a snapshot or from the native tools of the database server.
Restoring from a Software Failure
In the case of database corruption, developers and operations staff want to be able to roll back to some point in time before the corruption happened. With incremental backups, you typically have the ability to restore to a given timestamp: you specify the time you want the database restored to, whether that is just before the problem or much further back. This is also a great way to recover from ransomware attacks against your data. It won't prevent the attack, but snapshots taken at a very granular interval don't take much space and may save your bacon if your data gets encrypted against your will.
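Picking the restore point is simple to sketch: given the times at which backups were taken, choose the latest one at or before the moment things went wrong. The function name and the sample schedule below are made up for illustration; a real database's restore tooling would do this selection for you.

```python
from bisect import bisect_right
from datetime import datetime

def pick_restore_point(backup_times, target):
    """Return the latest backup taken at or before target, or None."""
    ordered = sorted(backup_times)
    position = bisect_right(ordered, target)
    return ordered[position - 1] if position else None

# Hypothetical schedule: an incremental every fifteen minutes from 09:00.
backups = [datetime(2024, 1, 1, hour, minute)
           for hour in (9, 10)
           for minute in (0, 15, 30, 45)]

corrupted_at = datetime(2024, 1, 1, 10, 20)
print(pick_restore_point(backups, corrupted_at))  # 2024-01-01 10:15:00
```

With fifteen-minute incrementals the most you lose in this example is five minutes of changes; with a single daily snapshot it could be most of a day.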
Cloud-Based Backups
If you are running databases on EC2, you will want to export the database backups using the native backup software for the database solution. Once this is done, you can synchronize the backup files over to an S3 bucket, especially since storage there is very inexpensive. With the standard storage class, S3 keeps the data redundantly across multiple availability zones within a region, and it can also copy it to another region if you configure cross-region replication. So if the database server needed to be brought up in another availability zone, or in another region you replicate to, that could be done pretty quickly by synchronizing the data back from the S3 bucket.
Utilizing the database backup technology in this way simplifies the restore process from S3 because you are not managing a bunch of snapshots: instead, you benefit from a database-managed version of the backups through the database software itself.
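As a rough sketch of that synchronization step, the Python below wraps the AWS CLI's `aws s3 sync` command. It assumes the AWS CLI is installed and has credentials configured; the local directory, bucket name, and prefix are placeholders, and the helper names are made up for this example. By default it only returns the command so you can inspect it before running anything.

```python
import subprocess

def s3_sync_command(local_dir, bucket, prefix):
    # "aws s3 sync" copies only files that are new or changed.
    return ["aws", "s3", "sync", local_dir, f"s3://{bucket}/{prefix}"]

def push_backups(local_dir, bucket, prefix, dry_run=True):
    command = s3_sync_command(local_dir, bucket, prefix)
    if not dry_run:
        # Raises CalledProcessError if the sync fails, so a failed
        # upload is not silently ignored.
        subprocess.run(command, check=True)
    return command

# Placeholder paths and names; replace them with your own.
print(push_backups("/var/backups/db", "example-db-backups", "prod"))
```

Restoring is the same command with the source and destination swapped, which is what makes bringing the data back onto a new server straightforward.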
Which Backup Solution is Best?
It all depends on your needs, your risks, and how often your data changes. There's no one-size-fits-all backup strategy.
If data doesn't change very often, daily backups may be sufficient.
In that case, snapshots may be the easier way to go. They have lower overhead to set up and get going because whatever cloud platform is being used has a push-button mechanism, or some way of scheduling snapshots. The trade-off is that snapshots will not have the granularity of incremental backups or streaming replication done with the native tools of the database server.
Specific Backup Options
When planning your backup strategy, I recommend you take a look at ZFS snapshots and replication because that's a nice one-stop shop for doing block-level replication and snapshotting. ZFS snapshots are very efficient: they only take up space for blocks that have changed since the snapshot was taken, and an incremental send between two snapshots only transfers those changes, so it is very quick to move over the wire.
If you are using the AWS Cloud, you may want to look at S3 backups and snapshots.
If you are using Postgres, MySQL, or ZODB, I recommend you evaluate their own techniques for dumping out backups or streaming replication.
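For the ZFS option, the snapshot-and-replicate cycle can be sketched in Python by wrapping the standard `zfs snapshot`, `zfs send -i`, and `zfs receive` commands. The dataset names, snapshot names, and host below are placeholders, the helper names are made up for this example, and it assumes ZFS on both machines plus SSH access between them. Nothing is executed unless you call `replicate` yourself.

```python
import subprocess

def snapshot_command(dataset, name):
    return ["zfs", "snapshot", f"{dataset}@{name}"]

def send_command(dataset, previous, current):
    # -i sends only the changes between the two snapshots.
    return ["zfs", "send", "-i", f"{dataset}@{previous}", f"{dataset}@{current}"]

def receive_command(host, target_dataset):
    return ["ssh", host, "zfs", "receive", target_dataset]

def replicate(dataset, previous, current, host, target_dataset):
    # Take the new snapshot, then pipe the incremental stream to the remote.
    subprocess.run(snapshot_command(dataset, current), check=True)
    sender = subprocess.Popen(send_command(dataset, previous, current),
                              stdout=subprocess.PIPE)
    subprocess.run(receive_command(host, target_dataset),
                   stdin=sender.stdout, check=True)
    sender.stdout.close()
    if sender.wait() != 0:
        raise RuntimeError("zfs send failed")

# Placeholder names; printing the commands is safe on any machine.
print(snapshot_command("tank/db", "2024-01-02"))
print(send_command("tank/db", "2024-01-01", "2024-01-02"))
print(receive_command("backup.example.com", "backup/db"))
```

The very first replication has no previous snapshot, so it would need a full `zfs send` of a single snapshot rather than the incremental form shown here.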
Tips for Beginners
First of all, stop procrastinating. Don't think you can get away without backups because things *will* fail.
Next, evaluate your needs. Only put in place as much infrastructure and complexity as needed. Don't go overboard. If you put in too much complexity, things will typically break over time or get into a state of disrepair, and then you're going to spend a lot of time maintaining it or not have backups when you really need them.
Go with the simplest solution that fits your risk model. If you're okay with losing a day's worth of data in your application because things don't change very often, then maybe daily snapshots using the provider's snapshot technology are fine. If you need backups that are current down to the millisecond, it's going to require a lot more time and planning to actually be able to handle that.
Are you using these techniques to back up your system? Would you like to learn more about how to back up your system? We are more than happy to talk to you!