Windows Virtual PC on Windows 7

Compared to Oracle VirtualBox, Windows Virtual PC is an excellent alternative if you work on the Windows 7 platform daily.

It’s very straightforward to get Windows Virtual PC installed:

1. Download and install Windows Virtual PC.

2. Install and use Windows XP Mode in Windows 7.

However, running multiple virtual PCs on Windows 7 is not as easy. The good news is that it can still be done.

Posted in My Reference | Leave a comment

Speed up Indexing of Splunk Enterprise

Splunk is a very good enterprise-level log analysis product, and the trial license is good for 60 days. While using the trial version to analyze different kinds of logs, I found that the indexing speed was rather slow. After some research, the following tip proved very helpful for speeding it up.

In the default configuration file $SPLUNK_HOME/etc/system/default/indexes.conf, change the following two parameters:

  • frozenTimePeriodInSecs = 31449600 (note: only index data from the past year; 31449600 seconds is exactly 52 weeks)
  • maxMemMB = 500 (enlarge the memory allocation for Splunk)
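These two settings can be sketched as a small override file. Splunk's documented convention is to place overrides in etc/system/local rather than editing default/ directly, so that upgrades do not overwrite the change; the demo path below is an assumption for illustration only.

```shell
# Hedged sketch: write the two overrides to Splunk's local config layer.
# /tmp/splunk-demo stands in for $SPLUNK_HOME here.
SPLUNK_LOCAL=/tmp/splunk-demo/etc/system/local
mkdir -p "$SPLUNK_LOCAL"
cat > "$SPLUNK_LOCAL/indexes.conf" <<'EOF'
[default]
# only keep/index data newer than one year (52 weeks = 31449600 seconds)
frozenTimePeriodInSecs = 31449600
# enlarge the memory allocation for indexing
maxMemMB = 500
EOF
```

As a sanity check on the number: 52 weeks × 7 days × 86400 seconds = 31449600.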
Posted in Oracle Utilities | Tagged | Leave a comment

Oracle Database Exceptional Termination by ARC1

Unexpected database downtime is never good, but it happens from time to time. What we encountered this time was “Instance terminated by ARC1, pid = 10353”, which brought down one of our production databases.

The alert log shows:

ORA-1092 : opitsk aborting process 
Instance terminated by ARC1, pid = 10353
......
System state dump requested by (instance=1, osid=29144 (ARC1)), summary=[abnormal instance termination]. 
System State dumped to trace file /oracle11/diag/rdbms/db1/DB1/trace/DB1_diag_28895.trc 
Instance terminated by ARC1, pid = 29144
......
Errors in file /oracle11/diag/rdbms/db1/DB1/trace/DB1_m000_16420.trc:
ORA-12751: cpu time or run time policy violation
WARNING: aiowait timed out 1 times
ERROR: Unable to normalize symbol name for the following short stack (at offset 139):
dbgexProcessError()+176<-dbgePostErrorKGE()+1348<-dbkePostKGE_kgsf()+48<-kgeade()+640<-kgerelv()+240<-kgerec4()+80<-kjdgpstackdmp()+892<-_$c1A.kjdglblkrdmpint()+216<-ksikblkrdmpi()+240<-ksqgtlctx()+9760<-ksqgelctx()+800<-kcc_get_enqueue()+544<-kccocx()+716<-kcc_begin_txn_internal()+76<-krsa_cftxn_begin()+2752<-krse_arc_complete()+540<-krse_arc_driver_core()+4940<-krse_arc_driver()+1292<-kcrrwkx()+21916<-kcrrwk()+1560<-ksbabs()+1348<-ksbrdp()+1616<-opirip()+1680<-opidrv()+748<-sou2o()+88<-opimai_real()+276<-ssthrdmain()+316<-main()+316<-_start()+380
Errors in file /oracle11/diag/rdbms/db1/DB1/trace/DB1_arc1_10353.trc  (incident=77612):
ORA-00494: enqueue [CF] held for too long (more than 900 seconds) by 'inst 1, osid 9876'
Incident details in: /oracle11/diag/rdbms/db1/DB1/incident/incdir_77612/DB1_arc1_10353_i77612.trc
Errors in file /oracle11/diag/rdbms/db1/DB1/trace/DB1_ora_10256.trc  (incident=77940):
ORA-00494: enqueue [CF] held for too long (more than 900 seconds) by 'inst 1, osid 9876'
Incident details in: /oracle11/diag/rdbms/db1/DB1/incident/incdir_77940/DB1_ora_10256_i77940.trc
Killing enqueue blocker (pid=9876) on resource CF-00000000-00000000 by (pid=10353)
 by killing session 438.1
Killing enqueue blocker (pid=9876) on resource CF-00000000-00000000 by (pid=10353)
 by terminating the process
ARC1 (ospid: 10353): terminating the instance due to error 2103

DB1_m000_16420.trc shows:

DDE: Problem Key 'ORA 12751' was flood controlled (0x2) (no incident)
ORA-12751: cpu time or run time policy violation
ORA-12751: cpu time or run time policy violation
KEBM: MMON slave action policy violation. krammonsl_; viol=1; err=12751

The above findings give us several messages and clues:

  1. ORA-00494: enqueue [CF] held for too long (more than 900 seconds) by ‘inst 1, osid 9876’
  2. ORA-12751: cpu time or run time policy violation
  3. KEBM: MMON slave action policy violation. krammonsl_; viol=1; err=12751
  4. Killing enqueue blocker (pid=9876) on resource CF-00000000-00000000 by (pid=10353) by terminating the process

Finding no. 1 simply guides us to adjust the size of the redo log files to avoid contention and waiting, as described in my previous post “ORA-00494 Error enqueue [CF] held for too long causing database hung”.

Findings no. 2 and no. 3 led us to Oracle Support document “DOC ID 1671412.1: Alert Log Shows ‘ORA-12751: cpu time or run time policy violation’ and Associated MMON Trace Shows ‘KEBM: MMON action policy violation. ‘Block Cleanout Optim, Undo Segment Scan’ viol=1; err=12751’”. The symptoms in our system are exactly what this document describes.

From this document: “This is due to Bug 9040676 MMON ACTION POLICY VIOLATION. ‘BLOCK CLEANOUT OPTIM, UNDO SEGMENT SCAN’.

Sometimes when there are long and large transactions in the database, MMON starts scanning undo tablespace aggressively, causing the errors and AWR not being generated. The MMON Process is directly related to the AWR; this process is responsible for the collection of statistics for the Automatic Workload Repository (AWR).  MMON may suspend actions where there was a large queue of background tasks waiting for service or in a case of server resource exhaustion. This explains why AWR report could not be generated in that period of time.”

The workaround is to issue the command below to disable the cleanout optimization routine:

SQL> alter system set "_smu_debug_mode"=134217728 scope=both;

Finding no. 4 is a little trickier. It looks like killing the enqueue blocker caused the instance termination. Oracle Support document “Database Crashes With ORA-00494 (Doc ID 753290.1)” discusses one possible cause that matches the symptoms found in our system.

Cause#1: The lgwr has killed the ckpt process, causing the instance to crash.
From the alert.log we can see:

    The database has waited too long for a CF enqueue, so the next error is reported:
    ORA-00494: enqueue [CF] held for too long (more than 900 seconds) by 'inst 1, osid 38356'

    Then the LGWR killed the blocker, which was in this case the CKPT process which then causes the instance to crash.

Checking the alert.log further we can see that the frequency of redo log files switch is very high (almost every 1 min).
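As a side note, a common way to gauge the switch frequency yourself (my own addition, not part of the support note) is to count log switches per hour from v$log_history:

```sql
-- Count redo log switches per hour over the last day.
SELECT TO_CHAR(first_time, 'YYYY-MM-DD HH24') AS hour,
       COUNT(*) AS switches
  FROM v$log_history
 WHERE first_time > SYSDATE - 1
 GROUP BY TO_CHAR(first_time, 'YYYY-MM-DD HH24')
 ORDER BY 1;
```

A rule of thumb often cited is a switch every 15-20 minutes or so; switching almost every minute, as here, points to undersized redo logs.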

Also, this document offers three solutions:

  1. Resize the redo log files to reduce contention on the control files
  2. Check the storage hosting the database, as the collected data points to an I/O issue
  3. Adjust the hidden init parameter _kill_enqueue_blocker

We applied solution no. 1 and also tried solution no. 3. I include solution no. 3 here because it explains the mechanism well.

Solution#3:

This kill blocker interface / ORA-494 was introduced in 10.2.0.4. This new mechanism will kill *any* kind of blocking process, non-background or background.

  • The difference will be that if the enqueue holder is a non-background process, even if it is killed, the instance can function without it.
  • In case the holder is a background process, for example the LGWR, the kill of the holder leads to instance crash.

If you want to avoid the kill of the blocker (background or non-background process), you can set

_kill_controlfile_enqueue_blocker=false.

This means that no type of blocker will be killed anymore, although the resolution to this problem should focus on why the process is holding the enqueue for so long. Also, you may prefer to avoid killing only background processes, since they are vital to the instance, while still allowing non-background blockers to be killed.

This has been addressed in a secondary bug – unpublished Bug 7914003 ‘KILL BLOCKER AFTER ORA-494 LEADS TO FATAL BG PROCESS BEING KILLED’ which was closed as Not a bug.

In order to prevent a background blocker from being killed, you can set the following init.ora parameter to 1 (default is 3).

_kill_enqueue_blocker=1

With this parameter, if the enqueue holder is a background process, then it will not be killed, therefore the instance will not crash. If the enqueue holder is not a background process, the new 10.2.0.4 mechanism will still try to kill it.

The reason why the blocker interface with ORA-494 is kept is that, in most cases, customers would prefer crashing the instance to having a cluster-wide hang.

_kill_enqueue_blocker = { 0 | 1 | 2 | 3 }

    0. Disables this mechanism; no foreground or background blocker process in enqueue will be killed.
    1. Enables this mechanism but kills only foreground blocker processes in enqueue; background processes are not affected.
    2. Enables this mechanism but kills only background blocker processes in enqueue.
    3. Default value. Enables this mechanism and kills any blocker process in enqueue.
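For reference, setting the parameter could look like the following (my own sketch, not from the support note; this assumes an spfile install, while a pfile install would instead add the line _kill_enqueue_blocker=1 to init.ora, and a restart is needed either way):

```sql
SQL> alter system set "_kill_enqueue_blocker"=1 scope=spfile;
SQL> shutdown immediate
SQL> startup
```

As always with hidden (underscore) parameters, it is safest to confirm the change with Oracle Support before applying it to production.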

Posted in Oracle Point | Leave a comment

Installing Java JDK and Tomcat server on Linux

1. Java installation and setup

  • Download JDK 1.7.0_80 and extract it to /usr/java
  • Add the following lines to /etc/profile and save:
export JAVA_HOME=/usr/java/jdk1.7.0_80
export CLASSPATH=/usr/java/jdk1.7.0_80/lib
export PATH=$JAVA_HOME/bin:$PATH
  • Make the environment settings effective:
-bash-4.2$ source /etc/profile
  • Check that Java is set up properly:
-bash-4.2$ java -version
java version "1.7.0_80"
Java(TM) SE Runtime Environment (build 1.7.0_80-b15)
Java HotSpot(TM) 64-Bit Server VM (build 24.80-b11, mixed mode)

2. Installing the Tomcat server

  • Download Tomcat 7.0.62
  • Unzip it to /apache-tomcat
  • Start the Tomcat server (run the command below):
-bash-4.2$ sh /apache-tomcat/bin/startup.sh

Using CATALINA_BASE:   /apache-tomcat
Using CATALINA_HOME:   /apache-tomcat
Using CATALINA_TMPDIR: /apache-tomcat/temp
Using JRE_HOME:        /usr/java/jdk1.7.0_80
Using CLASSPATH:       /apache-tomcat/bin/bootstrap.jar:/apache-tomcat/bin/tomcat-juli.jar
Tomcat started.

NOTE: For this version, there is no need to set the environment variables CATALINA_HOME, CATALINA_BASE, CATALINA_TMPDIR, or CLASSPATH. When startup.sh is called to start the Tomcat server, the script's parent folder is used as CATALINA_HOME and CATALINA_BASE.
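The derivation described in the note can be sketched as a tiny shell function (a simplified imitation of what Tomcat's scripts do, not the actual catalina.sh code): the home directory is resolved from the location of the startup script itself.

```shell
# Simplified sketch: derive CATALINA_HOME from the startup script's path,
# mimicking how Tomcat defaults it to the script's parent folder.
resolve_catalina_home() {
  prgdir=$(dirname "$1")               # e.g. /apache-tomcat/bin
  (cd "$prgdir/.." >/dev/null && pwd)  # e.g. /apache-tomcat
}
```

For example, `resolve_catalina_home /apache-tomcat/bin/startup.sh` would print /apache-tomcat (assuming the directory exists, since the function resolves it with cd).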

  • Check the output file /apache-tomcat/logs/catalina.out:
-bash-4.2$ tail -10 /apache-tomcat/logs/catalina.out
Jul 02, 2015 9:23:08 AM org.apache.catalina.startup.TldConfig execute
INFO: At least one JAR was scanned for TLDs yet contained no TLDs. Enable debug logging for this logger for a complete list of JARs that were scanned but no TLDs were found in them. Skipping unneeded JARs during scanning can improve startup time and JSP compilation time.
Jul 02, 2015 9:23:08 AM org.apache.catalina.startup.HostConfig deployDirectory
INFO: Deployment of web application directory /apache-tomcat/webapps/manager has finished in 171 ms
Jul 02, 2015 9:23:08 AM org.apache.coyote.AbstractProtocol start
INFO: Starting ProtocolHandler ["http-bio-8080"]
Jul 02, 2015 9:23:08 AM org.apache.coyote.AbstractProtocol start
INFO: Starting ProtocolHandler ["ajp-bio-8009"]
Jul 02, 2015 9:23:08 AM org.apache.catalina.startup.Catalina start
INFO: Server startup in 197497 ms


Posted in My Reference, Technology Misc | Tagged | Leave a comment

Oracle OEM Agent 12C High Usage of CPU on Solaris

High CPU usage and spikes caused by the Oracle OEM Agent 12c on Solaris is a very common issue; I have experienced it a couple of times. Oracle Support document “12c EM Agent CPU spiking and High Usage on Solaris with many Database and Listener Targets (Doc ID 1536871.1)” offers a nearly complete solution. I include it here for quick reference.


12c EM Agent CPU spiking and High Usage on Solaris with many Database and Listener Targets (Doc ID 1536871.1)



APPLIES TO:

Enterprise Manager for Oracle Database – Version 12.1.0.2.0 and later
Oracle Solaris on SPARC (32-bit)
Oracle Solaris on SPARC (64-bit)

SYMPTOMS

12c EM agents are observed to consume high CPU and create high spikes on Solaris boxes when many database and listener targets are managed by the same EM agent. The CPU consumption and spikes are directly proportional to the number of CPUs in the physical server (the more CPUs there are, the higher the agent CPU usage).


CAUSE

This was investigated under the following BUG:
Bug 15953286 – agent high cpu usage when scheduling metrics for many targets at once

The CPU consumption is generated by several factors:

– a known issue described in Note 1427773.1 – The Cloud Control Agent 12c Status Timeout / Connection refused / “security.pkcs11.P11SecureRandom.implNextBytes”
– the EM agent scheduler does not spread the Response metric (or other metrics) across the schedule interval and tries to start all of them at the same time (creating the spikes)
– the EM agent uses UCP (Universal Connection Pool) for managing database connections, which suffers from BUG 10203435, where about 24 threads are started for each created pool (with the BUG patch only 4 are created)
– a JDBC leak, BUG 13583799
– too many garbage collector threads defined by default for the agent’s JVM
– a default behavior of the Solaris HotSpot JVM implementation where a more expensive thread control mechanism (LWP) is used instead of direct kernel calls

SOLUTION

1. Ensure the default values are set in agent_inst/sysman/config/emd.properties as:
     agentJavaDefines=-Xmx128M -XX:MaxPermSize=96M
2. Implement the Java security fix as per note 1427773.1
3. Set the agent DEBUG level to WARN (higher tracing levels add CPU overhead).
4. Include the following lines in agent_inst/sysman/config/s_jvm_options.opt file.

-XX:ParallelGCThreads=4
-XX:-UseLWPSynchronization

(or apply patch 16398691 on the EM agent, which is implementing these changes.)
5. Apply JDBC leak patch 17591700 on the EM agent ORACLE_HOME.

6. Sym-link the libmtmalloc.so.1 library (as root): ln -s /usr/lib/sparcv9/libmtmalloc.so.1 /usr/lib/secure/sparcv9/libmtmalloc.so.1
7. Run: export LD_PRELOAD_64=libmtmalloc.so.1
8. Install patch 10203435 for the UCP thread leak BUG on the Agent’s ORACLE_HOME
9. Install the MLR patch 16303155, which contains the fixes for bug 16072609 and bug 16298267, spreading the metric schedules over wider intervals
10. Modify the following property in agent_inst/sysman/config/emd.properties file:
from
_ResponseDelay=30
to
_ResponseDelay=60
11. If it does not exist, also add the parameter _scheduleSpecSeed=300 in emd.properties

12. Restart the EM agent after all the above operations.
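Steps 10 and 11 amount to a one-line substitution and a guarded append in emd.properties. A hedged sketch (using a demo copy of the file for illustration; the real file is agent_inst/sysman/config/emd.properties):

```shell
# Demo stand-in for agent_inst/sysman/config/emd.properties
EMD=/tmp/emd.properties.demo
printf '_ResponseDelay=30\n' > "$EMD"

# Step 10: raise _ResponseDelay from 30 to 60
sed -i 's/^_ResponseDelay=30$/_ResponseDelay=60/' "$EMD"

# Step 11: add _scheduleSpecSeed=300 only if it is not already present
grep -q '^_scheduleSpecSeed=' "$EMD" || echo '_scheduleSpecSeed=300' >> "$EMD"
```

The grep guard keeps the append idempotent, so re-running the script will not duplicate the line.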

REFERENCES

BUG:15953286 – AGENT HIGH CPU USAGE WHEN SCHEDULING METRICS FOR MANY TARGETS AT ONCE

Posted in My Reference | Tagged | Leave a comment