2007/09/05

HOWTO: Disaster Recovery: Recovering the RMAN catalog database using RMAN

What's the first step in a DR scenario? Recovering the RMAN catalog database of course!

Can you use RMAN to recover RMAN? Yes, you can!

For my money, the best feature of RMAN is that even though the catalog database itself will not startup without recovery, it can still be used. Which is great, because otherwise we have one of those dreaded Star Trek space-time paradoxes (what do you do to recover your recovery system?!), and those make my head hurt. Even better, it's really pretty easy if you know what to do.

My scenario is:
I walk into work one morning and the production box that my RMAN catalog is running on got hosed-up somehow over night (i'm not naming any names... *cough* sysadmin *cough*) and now all of the databases on the box are down and in need of recovery as well as the RMAN database.

Note that in this particular case, the RMANDB database needed archive logs that were not on the system in order to be recovered. This is what necessitates the use of Step 5 below to retrieve the file, otherwise, this step could be skipped.

Here's how we fix the RMAN catalog database:
  1. Set your Oracle environment variables as normal (I use "oraenv")
    1. $ . oraenv
  2. Start rman
    1. $ rman
  3. Connect to the target DB and then the catalog (I use the "rman" user as my catalog owner and i've named my RMAN catalog database "rmandb")
    1. RMAN> connect target
    2. RMAN> connect catalog rman@rmandb
      1. RMAN-00571: ===========================================================
        RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
        RMAN-00571: ===========================================================
        RMAN-04004: error from recovery catalog database: ORA-01034: ORACLE not available
        ORA-27101: shared memory realm does not exist
        IBM AIX RISC System/6000 Error: 2: No such file or directory
  4. Recover the database
    1. RMAN> startup force nomount;
      1. Oracle instance started
        Total System Global Area 2164260864 bytes
        Fixed Size 2074256 bytes
        Variable Size 704645488 bytes
        Database Buffers 1442840576 bytes
        Redo Buffers 14700544 bytes
    2. RMAN> alter database mount;
      1. database mounted
  5. Get the missing archive logs back from tape/disk/TSM/whatever
    1. NOTE: When I first tried the "startup" command the database it told me it need archive log 241 and up, so that is where the value below came from (you can always use "restore archivelog all")
    2. RMAN> run {
      allocate channel oem_restore type sbt parms 'ENV=(TDPO_OPTFILE=/usr/tivoli/tsm/client/oracle/bin64/tdpo.opt)';
      allocate channel oem_restore_disk type disk;
      restore archivelog from logseq=241 thread=1;
      }
  6. Open the database
    1. RMAN> alter database open resetlogs;
Having gone through this exercise just a couple of months ago, I can tell you that I was absolutely thrilled when this worked!!

1 comment:

Kintanon said...

Talk about a lifesaver. This entry was PRECISELY what I needed today. You = Awesome.

Extra awesome points because you picked the same blog layout I did...