2007/10/04

HowTo: Remove parameters from the spfile

Here's a question that comes up often:
  • How do you remove an SPFILE parameter (not change the value of, but actually purge it outright)?
Answer:
  • Use "ALTER SYSTEM RESET ..." (For database versions 9i and up)
Syntax:
  • ALTER SYSTEM RESET parameter_name SCOPE=SPFILE SID='sid|*';
  • ALTER SYSTEM RESET "_TRACE_FILES_PUBLIC" SCOPE=SPFILE SID='*';
NOTE: The "SID='sid|*'" argument is REQUIRED!

2007/09/24

HowTo: Clearing Alerts in OEM 10g Grid Control

I was browsing the Oracle Forums earlier today and found this post with a bit of SQL to clear OEM alerts from mnazim, who always has good advice to offer:
To clear the alert:

1. Run a SELECT * FROM MGMT_CURRENT_SEVERITY and note the TARGET_GUID, METRIC_GUID and KEY_VALUE for the alert you want to clear.

2. Connect to the database as user SYSMAN and execute:
exec EM_SEVERITY.delete_current_severity(TARGET_GUID, METRIC_GUID, KEY_VALUE);

for example:
exec EM_SEVERITY.delete_current_severity('0DEB8E6980695B7548CF98871084AD10', 'F95BA0D95585002889E1ABF92B2DA7C3', 'SYS');

2007/09/10

What Do All 10g Flashback Features Rely on and what are their Limitations?

There's a great MetaLink article on Flashback - check it out!

What Do All 10g Flashback Features Rely on and what are their Limitations?
Doc ID: Note:435998.1
  • What Do All Flashback Features Rely on ?
  • Differences Between New Oracle 10g Flashback Technologies
  • Limitations of Flashback Database
  • Limitations and Restrictions on Flashback Drop
  • Limitations and Restrictions on Flashback Tables
Here are two gems that were news to me:
  • To be able to flashback a table, you must enable row movement for the table
  • Because DML operations are used to bring the table back to its former state, the ROWIDs in the table change. As a result, Flashback Table is not a viable option for applications that depend on the table’s ROWIDs to remain constant
Note that altering a table to enable row movement will cause dependent PL/SQL objects to become invalid!!!
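Putting those two gems together, a table flashback might look like this (the table name and time window are hypothetical):

```sql
-- Required first (and note: this invalidates dependent PL/SQL!)
ALTER TABLE my_table ENABLE ROW MOVEMENT;

-- Flash the table back 15 minutes; this runs DML under the covers,
-- so the table's ROWIDs will change
FLASHBACK TABLE my_table TO TIMESTAMP SYSTIMESTAMP - INTERVAL '15' MINUTE;
```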

2007/09/05

HOWTO: Disaster Recovery: Recovering the RMAN catalog database using RMAN

What's the first step in a DR scenario? Recovering the RMAN catalog database of course!

Can you use RMAN to recover RMAN? Yes, you can!

For my money, the best feature of RMAN is that even though the catalog database itself will not start up without recovery, it can still be used. Which is great, because otherwise we'd have one of those dreaded Star Trek space-time paradoxes (what do you use to recover your recovery system?!), and those make my head hurt. Even better, it's really pretty easy if you know what to do.

My scenario is:
I walk into work one morning and find that the production box my RMAN catalog runs on got hosed up somehow overnight (I'm not naming any names... *cough* sysadmin *cough*), and now all of the databases on the box are down and in need of recovery, as is the RMAN database itself.

Note that in this particular case, the RMANDB database needed archive logs that were no longer on the system in order to be recovered. This is what necessitates Step 5 below to retrieve them; otherwise, that step could be skipped.

Here's how we fix the RMAN catalog database:
  1. Set your Oracle environment variables as normal (I use "oraenv")
    1. $ . oraenv
  2. Start rman
    1. $ rman
  3. Connect to the target DB and then the catalog (I use the "rman" user as my catalog owner and I've named my RMAN catalog database "rmandb")
    1. RMAN> connect target
    2. RMAN> connect catalog rman@rmandb
      1. RMAN-00571: ===========================================================
        RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
        RMAN-00571: ===========================================================
        RMAN-04004: error from recovery catalog database: ORA-01034: ORACLE not available
        ORA-27101: shared memory realm does not exist
        IBM AIX RISC System/6000 Error: 2: No such file or directory
  4. Recover the database
    1. RMAN> startup force nomount;
      1. Oracle instance started
        Total System Global Area 2164260864 bytes
        Fixed Size 2074256 bytes
        Variable Size 704645488 bytes
        Database Buffers 1442840576 bytes
        Redo Buffers 14700544 bytes
    2. RMAN> alter database mount;
      1. database mounted
  5. Get the missing archive logs back from tape/disk/TSM/whatever
    1. NOTE: When I first tried the "startup" command, the database told me it needed archive log 241 and up, so that is where the value below came from (you can always use "restore archivelog all")
    2. RMAN> run {
      allocate channel oem_restore type sbt parms 'ENV=(TDPO_OPTFILE=/usr/tivoli/tsm/client/oracle/bin64/tdpo.opt)';
      allocate channel oem_restore_disk type disk;
      restore archivelog from logseq=241 thread=1;
      }
  6. Open the database
    1. RMAN> alter database open resetlogs;
Having gone through this exercise just a couple of months ago, I can tell you that I was absolutely thrilled when this worked!!

2007/08/27

Problem: ORA-10631 errors encountered during "SHRINK SPACE" operation

OEM was nice the other day and offered some Segment Adviser Recommendations to shrink a few tables that had become fragmented. However, when I ran the shrink job, it failed.

PROBLEM:
  • "ALTER TABLE my_test_table SHRINK SPACE;" results in "ORA-10631: SHRINK clause should not be specified for this object"
SOLUTION:
  • Tables that have function-based indexes cannot be compacted using the SHRINK clause
  • Apparently, someone forgot to inform OEM of this fact (thanks for the bogus recommendations! :-)
  • See MetaLink Bug No.:5968539
Here's a SQL statement that shows all the tables in a given schema and whether they are candidates for the SHRINK clause:

SELECT   dt.owner, dt.table_name,
         (CASE
            WHEN NVL(ind.cnt, 0) < 1 THEN 'Y'
            ELSE 'N'
         END) AS can_shrink
    FROM dba_tables dt,
         (SELECT   owner, table_name, COUNT(*) cnt
              FROM dba_indexes
             WHERE index_type LIKE 'FUNCTION-BASED%'
          GROUP BY owner, table_name) ind
   WHERE dt.owner = ind.owner(+)
     AND dt.table_name = ind.table_name(+)
     AND dt.table_name NOT LIKE 'AQ$%'
     AND dt.table_name NOT LIKE 'BIN$%'
     AND dt.owner = 'BRENT'
ORDER BY 1, 2;
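For the tables that can be shrunk, the usual two-step sequence looks like this (the table name is hypothetical, and row movement must be enabled first):

```sql
ALTER TABLE my_test_table ENABLE ROW MOVEMENT;

-- Optional first pass: compact the segment without moving the high-water mark
ALTER TABLE my_test_table SHRINK SPACE COMPACT;

-- Final pass: release the space and adjust the high-water mark
ALTER TABLE my_test_table SHRINK SPACE;
```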

Problem: Unable to copy data files when 10g database is in hot backup mode

So, our customers are FINALLY making the transition to 10gR2 databases this summer and we're having all kinds of great "Oops!" moments. Here's a good example:

PROBLEM:
  • A tablespace's physical files cannot be copied even when the tablespace is in hot backup mode.
  • We created a pretty standard 10.2.0.3 database on an AIX 5L box, and when we put the tablespaces into hot backup mode, we were unable to copy the data files to a backup location as AIX kept reporting that they were locked.
Of course our first reaction was "HUH?!", and after much head scratching, sysadmin pestering, and SAR opening, we finally got the solution from Oracle.

SOLUTION:
  • There is a bug with the FILESYSTEMIO_OPTIONS init parameter in 10g that prevents the "setall" value from working correctly.
  • We were able to copy the data files once we made the following change: filesystemio_options=ASYNCH
  • See MetaLink Note:434385.1
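The workaround boils down to a one-line parameter change (this assumes the instance uses an spfile, and it must be bounced for the change to take effect):

```sql
ALTER SYSTEM SET filesystemio_options = 'ASYNCH' SCOPE=SPFILE;
-- restart the instance, then retry the hot backup copy
```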

2007/08/24

HowTo: Setup User Defined Metrics in OEM Grid Control

User Defined Metrics are a fantastic feature of OEM Grid Control. Do you use them? Well, you should! They are very easy to set up and are just as reliable as any other OEM alerts.

As an example, let's say you want to be notified whenever a certain schema has any invalid objects in it. Here's how to set up such an alert:
  1. Navigate to the database instance "Home" page
  2. Click the "User Defined Metrics" link at the bottom of the page
  3. Click the "Create" button to create a new UDM
  4. Complete all of the form fields
    1. SQL query you would like to run (don't use a semi-colon at the end!)
    2. Credentials
    3. Thresholds
    4. Schedule
    5. Note that you can customize the subject line of the notification emails
    6. Click the "Test" button to make sure the results are what you want
    7. Click "Ok" to save this UDM
  5. Now, if you merely stop here, no notifications will ever be sent! Why? Because there is no Notification Rule defined, so let's do that next...
  6. Click "Preferences" in the top right corner of any Grid Control page
  7. Click "Rules" in the left nav bar
  8. Create a new Notification Rule

  9. Voila! Now email notifications will be sent whenever your threshold is exceeded, and this metric automatically integrates with all of the Alert display boxes throughout OEM.
Cool huh?
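For the invalid-objects example above, the UDM query might look something like this (the schema name is a placeholder, and note there's no trailing semicolon, per step 4.1):

```sql
SELECT COUNT(*)
  FROM dba_objects
 WHERE owner = 'MYSCHEMA'
   AND status = 'INVALID'
```

Then set your warning/critical thresholds against the returned count (e.g. warning at 1).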

11g New Features for DBAs

Arup Nanda is already hard at work detailing the new 11g features for DBAs!

Check it out.

2007/08/09

Diagnosing Oracle I/O problems in AIX

Check out this great post by Matt Fleming.

I was searching for how to create a Physical Volume (PV) to Logical Volume (LV) LUN mapping spreadsheet for my AIX system. I didn't find anything as good as this, so H/T to my buddy Mark for sending me this link.

Now I know how to get this information, plus I saved the AIX I/O Stack diagram for future reference.

Very nice!

2007/08/07

HowTo: Reorg physical disks under ASM

Let's say I have one or more disks/mounts in an ASM disk group and I want to get all of my data off of this disk and then remove it from the disk group. How is this done?

Answer:

When I first built my lone ASM instance (+ASM), I had the SysAdmins create two 300GB RAW mounts and made them into a single disk group to be used by a single instance.

This setup led to very poor ASM throughput of about 2MB/sec (and this is off of a 200MB/sec EMC SAN!). This is on a 64-bit AIX 5L box with each mount having only a single LUN (which they did not bother to disclose to me until AFTER I reported the slowness!).

So what I decided to do was to start again.

I went back to the SAs and ordered twenty 25GB mounts with a unique LUN for each mountpoint. After the space became available, I added it to my existing disk group (DGROUP1) and let ASM rebalance itself overnight.

Now, here's the genius part: All you have to do to get your data off of a disk and remove it is to go into the OEM Grid Control ASM Disk Group page, select the disk, and click the "Remove" button.

ASM AUTOMATICALLY rebalances the data off of this one disk onto the remaining disks and then drops the disk! How great is that?

Once you've selected a disk and clicked "Remove", you will see that the disk now has a status of "Dropping" (see screen-cap below).



You can also monitor the progress of the drop operation by using the "Pending Operations" link under the General section heading on the Disk Group page (see screen cap below).
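If you'd rather do the same thing from SQL*Plus on the ASM instance, the equivalent is something like this (the disk group and disk names are hypothetical; check V$ASM_DISK for your actual disk names):

```sql
-- Drop the disk; ASM rebalances its data onto the remaining disks,
-- then drops it once the rebalance completes
ALTER DISKGROUP dgroup1 DROP DISK dgroup1_0001;

-- Watch the rebalance/drop progress
SELECT group_number, operation, state, est_minutes
  FROM v$asm_operation;
```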

2007/07/12

An insanely great trick to recover a cold backup database that starts needing recovery

Why would a 10g database restored from cold backup need recovery? That's messed up.

So for the second time now, we took a small (~50GB) database the other day, shut it down using "shutdown immediate", started it back up, switched logs, and then shut it down again.

We then cloned it (took a cold backup of it and brought it up under a different name), and lo and behold, it complains that it needs recovery. WTF?

Here's the upside though: There's a way to easily recover from this and bring the database up.

Check out MetaLink Note 175015.1. It's somewhat cryptic, but essentially all you need to do is copy the most recent REDO log to the archivelog destination, naming it whatever archivelog name Oracle suggests in the "recover database" command, then just hit (Return) to accept the filename and begin the recovery process. Next thing you know, boom, everything is up and running.

Now to the question of why this is happening...

Oracle MetaLink search plugin now available for Firefox

Sweet! Check it out.

Thanks to the good folks at Oracle AppsLab!

2007/06/20

Issues with RMAN on AIX using TDP for Oracle

So our group has thankfully decided to go with RMAN for database backups, and you know I'm first in line to take this task on.

Here's my setup at a high level:
  • 10gR2 database (rmandb - snappy name huh? :)
  • IBM's TDP 5.3 for Oracle
  • AIX 5.2
Steps:
  1. My internal IT department set up my TSM domain, etc.
  2. Sysadmin installed the TDP fileset and then generated the password file for me
  3. I used the TSM Quick Start Guide to get all of my config files and directories set up
  4. I used OEM GC R3 to set the RMAN preferences and the Tape parameters: ENV=(TDPO_OPTFILE=/usr/tivoli/tsm/client/oracle/bin64/tdpo.opt)
Once all of this was done and "$ tdpoconf SHOWENVironment" looked good, I went for my first test drive.

I logged into RMAN, connected to target, and then ran a connection test:
run {
allocate channel oem_sbt_backup type 'sbt_tape' format '%U' parms 'ENV=(TDPO_OPTFILE=/usr/tivoli/tsm/client/oracle/bin64/tdpo.opt)';
release channel oem_sbt_backup;
}
And it worked fine. Woot!

Next, I used the "Test Backup" button on the Backup Setting OEM page. It's a simple test consisting of:

run {
allocate channel oem_sbt_backup type 'sbt_tape' format '%U' parms 'ENV=(TDPO_OPTFILE=/usr/tivoli/tsm/client/oracle/bin64/tdpo.opt)';
backup as BACKUPSET current controlfile tag '05252007020711';
restore controlfile validate from tag '05252007020711';
release channel oem_sbt_backup;
}
And... it fails with this lovely error output:
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03009: failure of backup command on ORA_SBT_TAPE_1 channel at 05/22/2007 16:28:09
ORA-19506: failed to create sequential file, name="0aiiceoi_1_2", parms=""
ORA-27028: skgfqcre: sbtbackup returned error
ORA-19511: Error received from media manager layer, error text:
ANS0231E (RC2025) The management class name was not found.
Nice huh?? What the heck is "ANS0231E (RC2025) The management class name was not found."?

If you think that's depressing, try entering "ANS0231E" into Google. There are fewer than 30 results worldwide! I hate when that happens...

After working with IS and IBM for 3 weeks (3 weeks!!!), we finally worked our way up to a SEV-1 analyst on a conference call and he figures it out.

Wait for it...

My tdpo.opt file had no stanza for "TDPO_MGMT_CLASS_2" (or _3, or _4) because I was only allocating one channel (these values are only used when more than one channel is opened to TSM; otherwise, it uses the management class appropriately named "DEFAULT").

Here's the gotcha: I setup RMAN to do 'duplex' copies, e.g.: create 2 copies of all datafiles and 2 copies of all control files.

The problem is RMAN opens 2 channels when duplex is set!!!

Bottom Line: (I had IBM add this to their KB)
If you receive the dreaded "ANS0231E (RC2025) The management class name was not found." error, the first question to ask yourself is, "Am I using duplex copies?" If so, I'll bet you don't have a value for "TDPO_MGMT_CLASS_2" in your tdpo.opt file.
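In other words, a tdpo.opt supporting duplex copies needs a management class entry for each extra channel, something like this (the management class names are site-specific placeholders your TSM admin would provide):

```
* Added so RMAN duplexing (which opens a 2nd channel) can find a management class
TDPO_MGMT_CLASS_2    MCLASS2
* Add _3 / _4 as well if you open more channels
TDPO_MGMT_CLASS_3    MCLASS3
```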

This was a hard lesson learned... :-)

2007/06/14

Upgrading EM Grid Control from 10.2.0.1 to 10.2.0.3

Okay, so I finally bit the bullet and upgraded EM Grid Control from the initial install version of 10.2.0.1 to 10.2.0.3, and upgraded the EM repository (cleverly named EMREP) from 10.1.0.5 to 10.2.0.1.

Here are a few observations about the upgrade:
  1. The EMREP page in GC still showed the old 10.1 home instead of the new 10.2 home.
    1. This stems from the fact that I had 2 different ORACLE_HOME installs: /orabase/product/ora10g and /orabase/OracleHomes/db10g.
    2. Upgrading meant moving from one home to another and GC did not pick up on this change
    3. Solution: Remove EMREP from monitoring and re-add it (BTW, refreshing the host configuration did not work!)
  2. It took much longer than I had planned.
    1. As part of the 10.2.0.3 patch, a repository upgrade was necessary - this step alone took about 45 minutes - yikes!
  3. There are about 8 new security policies that come with GC 10.2.0.3 ("Password Complexity Verification Function Usage", etc.)
    1. Once you've upgraded and logged in to GC, you'll see a whole bunch of new security policy warnings.
    2. Sadly, these have to be fixed by going to your Monitoring Templates (you do use templates right??) and using the "Add Policies To Template" button to add the offending policies and then set them appropriately.
One last thing:
  • They finally fixed that god-forsaken bug that kept the Database->Administration->Tablespaces page on 9i databases from rendering in under an hour! Jeez, that was annoying as hell! :-)

2007/05/31

SQL of the Day: How to view progress of DML operations

So, you're running an INSERT/UPDATE/DELETE statement and it's just sitting there processing. Then your boss comes by and asks, "When will this be done?" Well, now you can tell them.

Run this SQL to show the progress of DML operations:
SELECT SUBSTR(sql_text, 1, 60) "SQL Text",
       rows_processed "Total Rows Processed",
       ROUND((SYSDATE - TO_DATE(first_load_time, 'yyyy-mm-dd hh24:mi:ss')) * 24 * 60, 1) "Total Time (Min)",
       TRUNC(rows_processed /((SYSDATE - TO_DATE(first_load_time, 'yyyy-mm-dd hh24:mi:ss')) * 24 * 60)) "Rows/Min",
       TRUNC(rows_processed /((SYSDATE - TO_DATE(first_load_time, 'yyyy-mm-dd hh24:mi:ss')) * 24 * 60 * 60)) "Rows/Sec"
  FROM SYS.v_$sqlarea
 WHERE sql_text LIKE 'INSERT INTO some_table%'
   AND open_versions > 0
   AND rows_processed > 0;

2007/05/24

D'oh! of the Day: Creating a tablespace with 16k blocksize in a database with 8k blocksize default

So, this should have been a no-brainer.

I'm trying to improve the performance of a DSS database (10gR2 with ASM), so I figure, "Hey, I'll create a new tablespace with a 16k blocksize and reorg two big tables into it to see if that helps."
SQL> CREATE SMALLFILE TABLESPACE "DATA_16K"
DATAFILE '+DGROUP1(DATAFILE)' SIZE 1M
LOGGING EXTENT MANAGEMENT LOCAL SEGMENT SPACE MANAGEMENT AUTO
BLOCKSIZE 16K;
ORA-27035: logical block size is invalid
Huh??

Here's the solution (the D'oh! moment): I never created a buffer pool for 16k blocks!
SQL> select id, name, block_size, current_size, buffers from v$buffer_pool;

        ID NAME                 BLOCK_SIZE CURRENT_SIZE    BUFFERS
---------- -------------------- ---------- ------------ ----------
         3 DEFAULT                    8192         2560     317439

SQL> alter system set db_16k_cache_size=2m scope=both;
System altered.

SQL> select id, name, block_size, current_size, buffers from v$buffer_pool;

        ID NAME                 BLOCK_SIZE CURRENT_SIZE    BUFFERS
---------- -------------------- ---------- ------------ ----------
         3 DEFAULT                    8192         2528     313471
         7 DEFAULT                   16384           32       2014

SQL> CREATE SMALLFILE TABLESPACE "DATA_16K"
DATAFILE '+DGROUP1(DATAFILE)' SIZE 1M
LOGGING EXTENT MANAGEMENT LOCAL SEGMENT SPACE MANAGEMENT AUTO
BLOCKSIZE 16K;

Tablespace created.
Yes, I'm easily amused some days.

2007/05/14

Having issues using DBCA with ASM on 10gR2?

So, I'm feeling particularly pleased when I finally get ASM up and running last week on a new reporting server. It's an IBM AIX box with a 600GB disk group (DGROUP1), using the usual "+ASM" instance name. Also note that I used my 10gR2 (10.2.0.1 EE) home as my ASM home, since I'm only using ASM for a single database on this box.

Once I verify that ASM is working and that it's hooked into EM Grid Control, I cd to my Oracle 10gR2 home, set up my environment using oraenv, and then run dbca.

Everything is going great until I get to the storage page, where I select "ASM" and click Next. Within a couple of seconds, I see this gem of an error message:
"DBCA could not startup the ASM instance configured on this node. To proceed with the database creation using ASM, ASM instance needs to be up and running. Do you want to recreate ASM instance on this node?"
Hmm. Not what I wanted to see, that's for sure. :-(

So it's off to MetaLink to see what Oracle has to say. In short order, I find about 6 separate issues (some with their own accompanying bug numbers) all reporting the exact same error, and all with existing ASM instances.

The only note that had a real workaround involved changing some OAS (Oracle Advanced Security) settings that DBCA did not play well with. That's all well and good, but I'm not using OAS. My sqlnet.ora file has one line!

Next, I did what all good DBAs do when faced with this type of situation: go to Google. Unfortunately, I was still unable to find a solution. The OTN forums were not much help either. Oddly enough, some of my searches even resulted in googlewhacks.

The one interesting result I found on MetaLink said something like, "I managed to use dbca to set up a database with ASM before, but now that I try it a second time, it does not work!" Interesting. To me, that meant there HAD TO BE a solution here. There's just no way that dbca is unable to create a database with ASM. Come on!!

Here's what I'm doing: I have 2 terminal windows open. One is running dbca; the other is my working terminal, the one I used to change listener settings, run tnsping, etc.

After 2 hours of a hapless tango between clicking Next>, seeing the same error, and changing settings, I finally throw in the towel at 4:30 and close the terminal window running dbca.

Just for fun, I run dbca from the OTHER terminal window where I've been continually futzing with settings. I get to Storage, select ASM and click Next... NOW IT WORKS!!! WTF?!

That's why it works for some people sometimes and then stops the next time: we've got some environment variable or something set wrong!

SOLUTION: If you get this error, check your settings, open another window, re-run oraenv, keep trying!

As soon as dbca completed, I captured my 'env' output. Here are the items of interest:
  • ORACLE_SID=+ASM
  • ORACLE_HOME=/orabase/product/ora10g
  • LIBPATH=/orabase/product/ora10g/lib:/orabase/product/ora10g/lib32:/usr/lib:/lib
  • JAVA_HOME=/usr/java5/bin

2007/05/11

HowTo: Force re-evaluation of Policy Violations in OEM Grid Control

Don't you hate it when you fix a Security/Policy Violation in OEM and it won't re-evaluate for another 24 hours?

Well, there are 2 things you can try:
  1. Apply your Monitoring Template to the object (this works especially well for Hosts), this will force a re-evaluation and clear outstanding alerts
  2. Use an undocumented command for the EM Agent (10.2.0.1+): "emctl clearstate agent". This will cause the Agent to re-evaluate the instances on its host. Note: this can take a couple of hours to run and update in the background!