ORA-00494 Error enqueue [CF] held for too long causing database hung

Continued, I got another problem of “ORA-00494: enqueue [CF] held for too long (more than 900 seconds)” and it looked like caused database hung.

In alert log, the typical information below occurred quite frequently.

Errors in file /oracle/diag/rdbms/orcl/ORCL/trace/ORCL_mmon_11013.trc  (incident=149730):
ORA-00494: enqueue [CF] held for too long (more than 900 seconds) by 'inst 1, osid 11750'

Oracle Note 753290.1 indicates that it could be problem of setting redo log file.  The typical symptom is the high frequency of redo log files switch. And, it provides a possible solution for that.

Solution#1:
We usually suggest to configure the redo log switches to be done every 20~30 min to reduce the contention on the control files.
You can use the V$INSTANCE_RECOVERY view column OPTIMAL_LOGFILE_SIZE to determine a recommended size for your online redo logs. This field shows the redo log file size in megabytes that is considered optimal based on the current setting of  FAST_START_MTTR_TARGET. If this field consistently shows a value greater than the size of your smallest online log, then you should configure all your online logs to be at least this size.

To apply this solution, issue

SQL> select OPTIMAL_LOGFILE_SIZE from v$instance_recovery;
OPTIMAL_LOGFILE_SIZE
--------------------

Note: Without setting the FAST_START_MTTR_TARGET, you will get nothing back.

SQL> show parameter FAST_START_MTTR_TARGET
NAME                                 TYPE        VALUE
----------------------------         --------    ------
fast_start_mttr_target               integer     0

SQL> alter system set FAST_START_MTTR_TARGET = 60;
System altered.

SQL> select OPTIMAL_LOGFILE_SIZE from v$instance_recovery;
OPTIMAL_LOGFILE_SIZE
--------------------
49

In my case, the size of redo log file is 10 M and, obviously, it’s necessary to be changed to optimal size. To do that, I just simply follow the excellent post “Resizing / Recreating Online Redo Log Files” (pdf).

After that’s done, I’m hoping the issue was solved.

Posted in My Reference, Oracle Case Study, Oracle Point | Tagged , | Leave a comment

“WARNING: oradism did not start up correctly” (oracle database)

Recently, we experienced an issue on our oracle database. The database is hung and the following error keeps occurring in alert file.

WARNING: oradism did not start up correctly.
  Dynamic ISM can not be locked.----------------------------------------
oradism creation failed for unknown reasons 0 8 500

My research hits the oracle document ID 1074905.1 “WARNING: Oradism did not start up correctly – Reported in the Alert.log”. This document recommends to correct the mount points. After checking the system error log, we didn’t find typical error, for example,

[ID 809163 kern.info] NOTICE: oradism, uid 303: setuid execution not allowed, dev=13b00001770

Another document “When Will DISM Start On Oracle Database? (Doc ID 778777.1)” talks about “Oracle will use DISM instead of ISM if SGA_MAX_SIZE is set larger than the total of the database buffers (in particular, db_cache_size dynamic SGA resizing is not supported with the older db_block_buffers parameter), the shared pool, the redo buffers, the large pool, the Java pool, the fixed size.  Just simply followed this suggestion, the error message “WARNING: oradism did not start up correctly” is gone.

PROBLEM SOLVED!

When Will DISM Start On Oracle Database? (Doc ID 778777.1)

Dynamic intimate shared memory (DISM) is a unique feature of Solaris. It achieves dynamic memory management. On other OSs, When you specify SGA_MAX_SIZE, then the complete size of SGA_MAX_SIZE is reserved from your memory (reserved from Swap not from physical RAM) at instance startup. Then you can dynamically increase SGA_TARGET size until you reach SGA_MAX_SIZE. On Solaris, however, and due to DISM feature, SGA_MAX_SIZE is not reserved from the memory at instance startup but it is allocated when needed. This presents a great value since you do not reserve memory you are not using.

The use of DISM requires that the ora_dism process starts. The process will not start unless SGA_MAX_SIZE is greater than sum total of all SGA components. This means that in case you use SGA_TARGET. you specify size of SGA_MAX_SIZE greater than SGA_TARGET.

**For 11g only**
Oracle Database also uses DISM if it is available on the system and MEMORY_TARGET or MEMORY_MAX_TARGET  is set by the user.

By default, SGA_MAX_SIZE is not set during database installation and hence it takes the same value as SGA_TARGET. The use of SGA_TARGET is not a necessity to use DISM. Oracle will use DISM instead of ISM if SGA_MAX_SIZE is set larger than the total of the database buffers (in particular, db_cache_size dynamic SGA resizing is not supported with the older db_block_buffers parameter), the shared pool, the redo buffers, the large pool, the Java pool, the fixed size. In this case you will need to check the total of following parameters:

* db_cache_size
* java_pool_size
* large_pool_size
* log_buffer
* shared_pool_size
* streams_pool_size
* db_keep_cache_size
* db_recycle_cache_size

The difference between the total and the SGA_MAX_SIZE is so small that it can be approximated in calculations related to Granule size, so DISM should not be started, otherwise the ora_dism process will be started. .

Please note that to use DISM, you must have ISM enabled (which is the default case on Oracle 10g/11g). You will need always to check parameter:
*._use_ism= TRUE

Also done following suggestion.

“11g: ORA-609 TNS-12537 and TNS-12547 or TNS-12170 in 11g Alert.log (Doc ID 1116960.1)”

For that reason, we often recommend increasing the values for INBOUND_CONNECT_TIMEOUT at both listener and server side sqlnet.ora file as a preventive measure.  If the problem  is due to connection timeouts, an increase in the following parameters should eliminate or reduce the occurrence of the ORA-609s.

e.g.
Sqlnet.ora: SQLNET.INBOUND_CONNECT_TIMEOUT=180
Listener.ora: INBOUND_CONNECT_TIMEOUT_listener_name=120

These settings are in seconds.  Again, the default is 60.  A setting of 120 seconds should resolve most if not all ORA-609 errors.

As explained in the Cause section, The ORA-609 error is thrown when a client connection of any kind failed to complete or aborted the connection process, so ORA-609 and TNS- errors are expected while shutting down the database – as client connection will fail to complete or aborted the connection process before the connection/authentication process was complete. So please ignore the ORA-609 and TNS- errors, if these are appear while database instance is shutting down.

Posted in My Reference, Oracle Point | Tagged , | Leave a comment

Error ORA-16724 Raised While Oracle Standby Database Tried to Catch up Redo Log Received from Primary Database

For what ever reason, if error ORA-16724 raised while you try to have oracle standby database to catch up the redo log received from primary database, you might need to use incremental backup to achieve your goal. It’s for sure you could simply drop standby database or and create a new one. But, that’s not always a feasible way.

Over the past weekend, I experienced such error while I was working on moving database to different Unix system. It’s very good to me to come across the good posting regarding to this topic. That’s “Resolving Gaps in Data Guard Apply Using Incremental RMAN Backup” by author Arup Nanda.

One thing I realized  is that it’s better to pick earlier SCN number from which to create incremental backup. For example, if current SCN on standby is 1301571, it’s no harm to do incremental backup starting from SCN 1300000. The reason I’m doing this is because the log file for SCN 1301571 on standby site is probably corrupted and starting from earlier SCN could re-generate redo log file for SCN 1301571.

This posting provided step-by-step approach to solve this issue. The only step I added is

SQL Stby> alter database recover managed standby database cancel;

The reason I’m doing for is because I experienced error message below when I tried to recover database via RMAN by using incremental backup files generated from primary database.

RMAN> recover database;

Starting recover at 26-JAN-15
using channel ORA_DISK_1
using channel ORA_DISK_2
channel ORA_DISK_1: starting incremental datafile backup set restore
channel ORA_DISK_1: specifying datafile(s) to restore from backup set
destination for restore of datafile 00003: /dbfiles/STBY/undotbs.dbf
destination for restore of datafile 00008: /dbfiles/STBY/development1.dbf
destination for restore of datafile 00009: /dbfiles/STBY/indexes.dbf
channel ORA_DISK_1: reading from backup piece /dbbackup/incre_back/5tptmjaj_1_1.rmb
channel ORA_DISK_2: starting incremental datafile backup set restore
channel ORA_DISK_2: specifying datafile(s) to restore from backup set
destination for restore of datafile 00001: /dbfiles/STBY/system.dbf
channel ORA_DISK_2: reading from backup piece /dbbackup/incre_back/5uptmji8_1_1.rmb
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of recover command at 01/26/2015 13:43:05
ORA-19870: error while restoring backup piece /dbbackup/incre_back/5tptmjaj_1_1.rmb
ORA-19573: cannot obtain exclusive enqueue for datafile 8

As standby database is going to apply redo log files received from primary database once it’s mounted, the datafiles are consequently locked. Issuing command above will stop the redo log apply and release lock on the the datafile.

Once I followed the steps in Arup’s posting, the ORA-16724 was solved and the standby database could be able to apply the redo log.

Furthermore, I got one warning message from data guard manager, saying “ORA-16826: apply service state is inconsistent with the DelayMins property”. To solve this, I just simply follow the posting authored by Navneet Upneja.  The workaround is quite straightforward.

SQL Stby> alter database recover managed standby database cancel;

Database altered.

SQL Stby> alter database recover managed standby database using current logfile disconnect;

Database altered.

The logic is to stop redo log apply first and then start redo log apply in real time mode.

Solution_for_Ora_16724_Standby_database (author: Arup Nanda)

Solution_for_Ora-16826_Standby_Database (author: Navneet Upneja)

Posted in My Reference | Tagged , , , | Leave a comment