Thursday, September 19, 2024

Why not differential backups?

I get this question frequently. It’s usually triggered either because the tape device can’t hold an entire backup set or because the time required for backup interferes with productive work. Most of the time this can be easily remedied by a larger or faster storage device, but someone is bound to bring up the idea of differential backups.

The idea is that you create a full backup that has everything, and from then on, you only back up the files that have changed. Presumably that’s a smaller set of files, and therefore this solves the space or time problems. Usually the full backup is refreshed on some schedule and the process starts again. There are variants on the theme; for example, the differential may include all files that have changed since the last full backup rather than just those that have changed since the last differential. That sort of scheme eventually ends up with the differential containing any and all files that ever change, no matter how infrequently; the full backup is the source of everything else.

Often the term “Incremental” is used to describe what I call true differential. I’ll use that term for the rest of this article. Remember that a Differential will always have everything that has changed since the last complete backup; an Incremental will only have files that have changed since the previous Incremental backup. Right after the full backup, an Incremental and a Differential would be exactly the same; after that they will probably contain different files. An Incremental CAN be smaller than a Differential but could never be larger.
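To make the distinction concrete, here is a minimal sketch in Python. It is only an illustration: the file names, the timestamps, and the `select_files` helper are all hypothetical, and a real backup program would read modification times from the filesystem rather than from a dictionary.

```python
def select_files(mtimes, since):
    """Return the set of paths modified strictly after the time `since`."""
    return {path for path, mtime in mtimes.items() if mtime > since}

# Hypothetical timeline: full backup at t=0, a first partial backup at t=1,
# and we are now deciding what goes on the second partial backup.
mtimes = {
    "/bin/ls": -10.0,          # untouched since long before the full backup
    "/etc/passwd": 0.5,        # changed after the full, before the first partial
    "/home/a/report": 1.5,     # changed after the first partial
}
last_full = 0.0
last_backup = 1.0

differential = select_files(mtimes, last_full)    # everything since the full
incremental = select_files(mtimes, last_backup)   # only since the previous backup

print(sorted(differential))
print(sorted(incremental))
```

Because the previous backup can never be older than the last full backup, every file the Incremental picks up is also in the Differential, which is why the Incremental can be smaller but never larger.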

Differential or Incremental backups always seem like a great idea to people who haven’t experienced the negative aspects. Admittedly, there can be circumstances where you have no other choice, but consider these points:

  • Nowadays, this may be a futile effort. The unchanging Operating System files aren’t what is exceeding your space or time capacity; in most situations it’s surely your data. So any style of differential backup is still likely to be more data than you want, because the OS files are often a puny and insignificant part of your data set.
  • Differential backups complicate off-site storage. The whole point of moving backups off site is to provide safety in the event of a fire or other complete physical loss. If you have complete backups, most small companies rotate the media in and out daily: Wednesday night’s backup goes off site Thursday night, and Tuesday’s is brought back in Friday morning. This is simple.

    With differentials, it’s more difficult. You need to keep a master off site, and if you are doing Incrementals (not Differentials), you need to keep ALL of those off site. That makes it inconvenient if you need occasional access to the tapes on site, and it may also mean that you need to make TWO full backups each time you reach that point in your cycle, which is very time consuming and can use a lot of media.

  • Incrementals (which are often the only method that will solve the time or space constraints) introduce another problem if it becomes necessary to restore. You start with the most recent full backup, and then restore each Incremental in order. More than once I’ve seen people run out of disk space doing this because of temporary files. Each Incremental will include temporary or transient files that may have been removed before the next Incremental, but those files will be restored faithfully just the same. You have to be very careful about excluding temporary files with this scheme.

    More sophisticated backup programs can avoid this by deleting files that are not present on the next tape; however, that depends on the integrity of the set, and simple backups like tar or cpio cannot do this at all.

    Worse news: damaged or lost media in the middle of an Incremental set like this can mean disaster. If a file happens to exist on only one piece of media because it is modified infrequently, the modifications may be lost forever.

  • Differentials give more redundancy than Incrementals to the changing data, but often have little or no redundancy for the full backup. Because system files are often modified very infrequently, loss of a full backup (media damage or physical loss) can be quite serious.
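The two Incremental restore problems described above (resurrected temporary files and a lost tape in the middle of the set) can be sketched with a toy model. This is an illustration under simple assumptions, not a real restore tool: each backup is modeled as a Python dictionary of path to contents, and the file names and the `restore` helper are hypothetical.

```python
def restore(full, incrementals):
    """Naive restore: lay down the full backup, then each Incremental in order.

    Nothing is ever deleted, which mirrors a simple archive-by-archive restore.
    """
    disk = dict(full)
    for inc in incrementals:
        disk.update(inc)
    return disk

# Hypothetical backup set.
full = {"data.db": "v1", "app.conf": "v1"}
inc_mon = {"data.db": "v2", "tmp/sort.1234": "scratch"}  # temp file existed Monday night
inc_tue = {"data.db": "v3", "app.conf": "v2"}            # rarely edited file changed Tuesday
inc_wed = {"data.db": "v4"}

# What was actually on the disk when Wednesday's backup ran.
live_on_wed = {"data.db": "v4", "app.conf": "v2"}

# Problem 1: the restore brings back a temporary file that was long gone.
restored = restore(full, [inc_mon, inc_tue, inc_wed])
leftovers = set(restored) - set(live_on_wed)
print("resurrected files:", sorted(leftovers))

# Problem 2: Tuesday's tape is lost or damaged, so the only copy of the
# app.conf change is gone and the restore silently falls back to the old version.
damaged = restore(full, [inc_mon, inc_wed])
print("app.conf after restore without Tuesday:", damaged["app.conf"])
```

In this model a Differential taken on Wednesday would hold both `data.db` and the Tuesday change to `app.conf`, so losing an earlier Differential would not matter; losing the full backup still would.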

Wherever possible, doing a complete, full backup every day is easiest and gives the most data redundancy. If you absolutely cannot do that, then the Differential (everything modified since the last full backup) is better than true Incrementals. However, don’t neglect having multiple full backups in either case.

By the way, my aversion to differential or incremental backups is based on many years of painful field experience. Although it is rare nowadays, not too many years ago I would be involved with drive failures about once a month: I have seen these problems for myself. I STRONGLY RECOMMEND FULL BACKUPS IF AT ALL POSSIBLE. Backup media gets larger and faster and cheaper every year, so most people CAN do complete backups, and should.

What about Network Backups to another hard drive?

While attractive in principle, the time element isn’t all that good and you also lose several important capabilities:

  • The ability to take media off site.
  • The ability to restore completely to a fresh drive from the media without reinstalling the OS (see Supertars).
  • “Deep” backup stretching as far back in time as you need. You can simulate that with a large enough drive at the receiving end, but then all your backups are in one mechanical device: if that device fails, you lose every backup.

Removable media is still the intelligent choice for backup and will remain so until solid state, non-volatile disk drives are common, and I’m not even sure if it’s a bad idea then.


A.P. Lawrence provides SCO Unix and Linux consulting services http://www.pcunix.com
