Oracle Backup Best Practices
Overview
A well-developed database protection plan and its implementation are an Oracle DBA's greatest responsibility. The topic of backups and recovery is covered here from multiple aspects to better ensure that the solutions you choose, and how they are implemented, will be met with success.
According to Donald K. Burleson, "It can't be stressed enough that backup and recovery operations should be fully proceduralized and these procedures tested frequently, especially after major database or equipment changes. An untested backup and recovery plan is no plan at all."
What is important for a recovery plan?
The essential questions for recovery are:
- What are the recovery time objectives (RTO)?
- What is the tolerance for data loss?
- What are the backup retention policies?
Recovery Time Objective (RTO)
When considering the business requirements of a data protection plan, it is important to assess the impact of system downtime on the mission, the company's reputation and bottom-line revenue. The recovery time objective is the maximum period of time it will take to perform a restore operation of the database. At a high level, recovery time consists of the time it takes to:
- Repair hardware or storage failures.
- Restore backups from disk and/or tape.
- Recover the database by applying incremental backups and archived logs.
- Open the database for user access.
Media recovery is needed if a database file is physically damaged and requires a restore and recovery operation. Instance recovery is needed when the database crashes, and usually just requires a recover command (no restore operation).
It is also imperative that your data protection plan include tested, documented procedures for restoring and recovering the data, in scenarios that can range from minor corruption to full scale site disaster.
Tolerance for Data Loss
The OS platform, disk/tape I/O thresholds, and other hardware options are significant factors in recovery time. The question boils down to: "What is the business/mission cost of downtime and data loss versus the hardware and software cost of reducing such downtime and data loss?"
Here are some considerations in designing a backup and recovery strategy that will help determine the trade-off between required service levels and cost: What is the tolerance for system downtime, e.g. x days, y hours? Consider all scenarios where media loss or corruption, caused by either hardware or software failures, can lead to system downtime. In light of the required service levels, evaluate I/O performance and cost of disk versus tape as the backup media. Consider backing up to disk, fast recovery, and periodically moving backups to tape for archival purposes. Are there different recovery service levels for different databases? Do service levels vary in relation to granularity of data, e.g. table, datafile, tablespace, whole database? In the event that the production database server suffers an outage, is there a requirement for failover within seconds or minutes to a standby database to continue handling the workload?
Backup Retention Policy
The backup retention policy governs how long backups should be kept, or how many versions of backups are to be kept at any time. This policy should be set in accordance with company mandated and/or government regulated record retention periods.
In RMAN, this policy is set using the CONFIGURE command, and applies to all backups taken during the RMAN client session for the specified target database. If a long term backup is needed, outside of the normal retention policy, this can be made using the KEEP option on backup.
There are two types of retention policies, and only one can be set at any time:
- Recovery window: This policy establishes the number of days within which point-in-time recovery must be possible. For RMAN disk based backups I recommend this method.
- Redundancy: Establishes a fixed number of backups that must be kept. Backups that are in excess of this can be deleted. The default retention policy is redundancy 1.
Note: When using a third party media manager, RMAN cannot implement an automatic retention policy if the media manager deletes backups. The media manager retention policy should always be longer than the RMAN retention policy to prevent expiration of tape backups that are still listed as available in the RMAN repository.
Tools of the Trade
It is important to use the backup tools in a manner that yields success. As already briefly discussed, the main Oracle tool for database backup and recovery is Recovery Manager (RMAN). RMAN is a superb tool for both backing up and restoring your database. You should be using this because no one knows how to backup and restore an Oracle database better than Oracle. Plus it is free.
In the real world however, the pretty Oracle GUI interface may not be available, or adequate, in all situations concerning backups and restores. Furthermore, RMAN does not backup everything required to survive a disaster. For these reasons being versed in the RMAN command line tool is essential.
As mentioned previously: SIMPLE = MATURE. Your backup tools and architecture need to be incredibly simple. The successful components detailed here can be implemented and maintained by any DBA. One does not require an extreme Oracle skill set. If you feel your environment warrants Oracle Management Server (OMS) or other complex forms of implementing Oracle backups, I would direct you to the manuals written on RMAN that detail every aspect of its usage.
It is imperative that you do not rely on just one backup method. In addition to using RMAN you need to also perform routine full database exports too. This is an industry best practice.
For Novell file servers, for instance, the native SBACKUP was used in addition to a vendor backup solution. In the Windows world circa NT4, for example, a good system administrator performed a native NT Backup in addition to vendor backup solutions. Vendor backup solutions and/or hardware have been known to fail at the most inopportune times. I purposely referenced these older systems because the underlying best practices don't change over time. Just the mask or surface software of the day changes.
Disasters do happen. They will happen in a way that effects you. Using more than one method is nothing more than insurance.
Components of a Successful Oracle Data Protection Plan
Modify these to meet your customer's RTO:
- Your plan needs to exist!
- It needs to be approved by your customer.
- Keep your database architectures simple and easy to duplicate.
- Ensure all your databases are in Archive Log Mode.
- Perform a full (level 0) backup once a week and on all other days perform an incremental backup (level 1).
- Perform a monthly test restore (validate) on your RMAN backups.
- Perform a full database export once a week (or more).
- If your RMAN backups go to disk, work with your system administrator to ensure that your backup set files are also backed up to tape weekly (if not daily).
- Ensure your tape backups are routinely sent offsite to a location that will not be effected by any disaster that can effect your physical location (hurricane etc.).
- Ensure your RMAN backups are retained for at least 30 days in your control file/catalog.
If these seem obvious, then you are right on! Simple is a sign of maturity in the implementation of enterprise systems and strategies.