TroubleShooting

Health Checks

<- TroubleShooting

Daily

Are the OS and system hardware functioning OK (SAN/RAID, CPU, Network, RAM)?
Is the database up? Was it down for any reason in the last 24 hours?
Can client apps connect to the database?
When was the last RMAN incremental backup run and what were the results?
Are there any database errors that you need to know about?
- Alert Log
  SQL To determine Alert Log location:
  SELECT value FROM v$diag_info WHERE name='Diag Trace';
  alias alog='tail -f /u01/app/oracle/diag/rdbms/mydb/mydb/trace/alert_mydb.log'
  Common Path:
  %ORACLE_BASE%\diag\rdbms\{DBNAME}\{DBNAME}\trace\alert_{DBNAME}.log
- Listener Log
  To determine location:
  OS> lsnrctl status
  Common Path:
  %ORACLE_BASE%\diag\tnslsnr\{DBNAME}\listener\trace\listener.log
Is there ample disk space for the datafiles, archive logs, export files and RMAN files?
Did the daily scheduled jobs run OK?
Review any security provided monitoring reports for inappropriate changes (Tripwire etc.).

IMPORTANT: Never open a live listener log file from an editor!
Use tail -f or copy it first and view the copy.

Weekly

When was the last RMAN full backup run and what were the results?
When was the last RMAN database validation run and what were the results?
When was the last full database export and what were the results?
How often are redo log switches occurring?
Are the number of connections increasing? If so by how much?
Are all the blocks in your datafiles OK (not just the ones with data)?
Is any partition maintenance required (extending ranges etc.)?
Does any of your documentation need to be updated?
Did the weekly scheduled jobs run OK?

Monthly

Are all your database patches up-to-date?
Are your indexes OK or do they need to be rebuilt?
Which tablespaces are growing, and by how much?
Is the size of any schema increasing? If so by how much?
Is your SGA and PGA usage increasing? If so by how much?
What SQL is being run most often and can it be made more efficient?
Is there a significant increase in DML (INSERTS, UPDATES and DELETES)?
Are the segments in your tablespaces OK or do they need to be optimized?
Can the maintenance of any the databases be simplified?
Did the monthly scheduled jobs run OK?

Appendix: Using SQL to Check Alert Log (11g and later)

SELECT inst_id, originating_timestamp, message_text
FROM TABLE(gv$(cursor(SELECT inst_id, originating_timestamp, message_text 
                      FROM v$diag_alert_ext
                      WHERE originating_timestamp > (sysdate - 1)
                      AND   message_text LIKE '%ORA-%')))
ORDER BY inst_id, originating_timestamp;

<- TroubleShooting