Aron Smetana: 10 common backup mistakes

Good guidelines to keep in mind when designing your backup schema.

Despite their best intentions, IT pros sometimes fall short when it comes to implementing a reliable backup solution. See whether these any of these mistakes sound familiar.

All of us in IT have been taught from Day One that performing regular backups is critical to an organization’s well being. Yet even seasoned pros sometimes make certain mistakes. Here are a few of the common mistakes I’ve encountered.

Note: This article is also available as a PDF download.

1: Not making system state backups often enough

In Windows environments, system state backups have a shelf life. For domain controllers, the shelf life is equal to the maximum tombstone age (60 days by default). After that, the backup becomes null and void. Even for non domain controllers, the age of the backup is an issue.

Each computer on a Windows network has a corresponding computer account in the Active Directory. Like a user account, the computer account has an associated password. The difference is that the password is assigned, and periodically changed, by Windows. If you try to restore a system state backup that is too old, the computer account password that is stored in the backup will no longer match the password that is bound to the computer account in the Active Directory, so the machine won’t be able to participate in the domain. There are workarounds, but it is usually easier to just make frequent system state backups of your servers.

2: Failing to adequately test backups

We all know that we should test our backups once in a while, but testing often seems to be one of those tasks that either falls by the wayside or that isn’t done thoroughly. Remember that making the backup is only the first step -if you can’t restore from them, you’re dead in the water. You need to ensure that those backups will work if and when you need them.

3: Not using an application-aware backup application

For some applications, a file-level backup is insufficient. A classic example of this is Microsoft Exchange, which requires an Exchange-aware backup application. Failure to use such a backup application causes the data that has been backed up to be in an inconsistent (and often unrestorable) state. It is therefore important to know which applications reside on your servers and to make note of any special application-specific backup requirements.

4: Shipping backup tapes offsite too quickly

One of the companies I used to work for used a courier service to ship backup tapes offsite. Each morning at 8:00, the previous night’s backup tapes were transported to an offsite storage facility. One morning, we had a server failure at about 9:30. Unfortunately, we couldn’t perform an immediate restoration because the tape had been shipped offsite. It was almost 4:00 before the tape could be located and returned to us. By that time, the server had been down all day. While you should keep backups off site, consider waiting until the end of the business day to remove the previous night’s tapes from the building.

5: Having a single point of failure

Always remember that your backups are your safety net. If a server fails, your backups are the primary (and sometimes the only) mechanism for returning the server to a functional state. Because backups are so critically important, you should construct your backup architecture in a way that avoids (at least as much as possible) having a single point of failure. If possible, have a backup for your backups. You never want to find yourself in a situation in which you did not get a backup the night before and you are just praying that the server doesn’t fail that day because you have nothing to fall back on.

6: Forgetting to plan for the future

Years ago, I managed the IT department for a large organization. While I was on vacation, some of my staff decided to surprise me by cleaning out a storage room that had become badly cluttered. In doing so, they threw out some obsolete computer equipment.

While this initially seemed harmless, some of the old equipment was in the storage room for a reason. Each quarter, the organization made a special backup that was kept as a permanent archive. Over time, though, backup technology changed. Although the company had decided at one point to switch to a newer tape format, I had kept the organization’s old tape drives and an old computer that had a copy of the backup software installed on it, just in case we should ever have to read any of the data from an archive tape. The lesson to be learned is that although change is inevitable, you should always make sure that you have the necessary hardware and software to read your oldest backup tapes.

7: Not considering the consequences of using backup security mechanisms

For most organizations, IT security is a high priority. But sometimes, security can be a bad thing. I have seen real-world situations in which a backup could not be restored because nobody knew the password that had been used on the backup tape. I also once saw a situation in which an organization used hardware-level encryption and then upgraded to a new tape drive that didn’t support the previously used encryption (which meant that old backups could not be restored).

There is no denying that it is important to secure your backups, but it is equally important to consider the consequences of your security measures. If you find yourself having to restore a backup after a major system failure, the last thing you need is an ill-conceived security mechanism standing in the way of the recovery.

8: Backing up only data

I once had someone tell me that I should be backing up only my data, as opposed to performing full backups that included the server’s operating system and applications. His rationale was that data-only backups complete more quickly and consume fewer tapes. While these are valid points, I completely disagree with the overall philosophy.

If an organization has a server failure and needs to perform a full recovery, it is usually possible to reinstall the operating system and the applications and then restore any data. However, time is of the essence when trying to recover from a crash. It is much faster to restore everything from backup than it is to manually install an operating system and a set of applications. More important, it is often difficult to manually configure a server so that it matches its pervious configuration. Backing up the entire server ensures that its configuration will be exactly as it was before the crash.

9: Relying solely on a disk-to-disk backup solution

Disk-to-disk backup solutions offer many advantages over traditional tape backups. Even so, a disk-to-disk backup solution should not be an organization’s only backup, because the backup server is prone to the same risks as the servers it protects. A hurricane, lightning strike, fire, or flood could wipe out your backup server along with your other servers. For this reason, it is important to dump the contents of your disk based backups to tape on a frequent basis.

10: Using a tape rotation scheme that’s too short

One organization I worked for used a two-week tape rotation. This seemed to work fairly well, but we found out the hard way that two weeks just weren’t enough. The organization had an Exchange server fail because of corruption within the information store. When we tried to restore a backup, we found that we had backed up corrupt data. The corruption had existed for some time and had grown progressively worse. Every one of the backup tapes contained corrupt data, so the server could not be restored. This is a perfect argument for periodically testing your backups, but it also underscores the importance of using a long rotation scheme or at least keeping some of your backup tapes as long-term archives.

(Via 10 Things.)

Aron Smetana

Tuesday, November 17, 2009

10 common backup mistakes