OEM Agent on Solaris 10 at 100% CPU Utilization After Discovering Databases
Right after upgrading their database to 10gR2, one of our clients found that the OEM agent on Solaris 10 stayed at 100% CPU utilization once it had discovered the target databases. Stopping the OEM agent dramatically reduced CPU utilization to a low percentage.
To address the problem, I wrote an email to this client identifying the cause and offering a solution. My email is included below.
“To solve the OEM problem on M4K, I suggest reading the following two Metalink articles:
- Note: 556998.1 “Problem: Agent on Solaris At 100% CPU After Discovering Database”
- Note: 578631.1 “OMS GOES DOWN WITH ‘TOO MANY OPEN FILES’ ERRORS”
Both articles describe an OEM problem similar to ours. I checked the trace file emoms.trc on M4K and found a large number of metric collection timeouts, as described in these two articles.
The solution these articles offer is to adjust the OS parameter NOFILES, which may require a server reboot to take effect.
I’d also like to share my thoughts on why the problem appeared only after migrating from XPROD to PROD. The key difference is the number of users. If I understand correctly, only a small portion of Banner users took part in testing on XPROD, not all of them (this is the “workload test” we missed). Once you migrated to PROD, a significant increase in users was expected, and that can cause contention for resources if the OS resource settings are configured improperly.
Similarly, the COBOL problem that occurred last week also seems related to OS limits, because the error message mentioned the word “Semaphore”. As we know, semaphore settings on Solaris can be adjusted by editing /etc/system. For more details, please consult Homa and James.
Please see the attached “Pre-Install checks report for 10.2.x oracle database on Solaris Platform”, which I generated against server M4K with the Oracle utility “RDA4 - Health Check/Validation Engine” (Oracle Metalink Note: 250262.1). The report clearly shows that the “ulimit test” failed (Rule 190) and that our current settings, such as NOFILES and STACK, are far below Oracle’s recommendations.
If we are going to adjust these OS settings, it would be best to consult Ricky (Unix Administrator) first.”
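The “Too many open files” error in Note 578631.1 is the standard message for EMFILE, which a process receives when it reaches its per-process open-file limit (the NOFILES value that `ulimit -n` reports). The Python sketch below is only an illustration under that assumption: it reproduces the failure on a throwaway process by temporarily lowering its own soft limit, opening a scratch file repeatedly until the OS returns EMFILE, and then restoring everything. The function name `count_opens_until_emfile` is hypothetical; the sketch does not touch the OEM agent or any system-wide setting, which remain a task for the Unix administrator.

```python
import errno
import os
import resource
import tempfile


def count_opens_until_emfile(soft_limit):
    """Hypothetical demo: set this process's soft open-file limit (NOFILES)
    to soft_limit, open one file repeatedly until the OS refuses with EMFILE,
    and return how many descriptors were obtained. All descriptors and the
    original limits are restored before returning."""
    old_soft, old_hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    opened = []
    # Create the probe file before lowering the limit so this step cannot fail.
    with tempfile.NamedTemporaryFile() as probe:
        try:
            # An unprivileged process may set its soft limit anywhere up to the hard limit.
            resource.setrlimit(resource.RLIMIT_NOFILE, (soft_limit, old_hard))
            while True:
                try:
                    opened.append(os.open(probe.name, os.O_RDONLY))
                except OSError as exc:
                    if exc.errno == errno.EMFILE:  # "Too many open files"
                        return len(opened)
                    raise
        finally:
            # Release every descriptor and put the original limits back.
            for fd in opened:
                os.close(fd)
            resource.setrlimit(resource.RLIMIT_NOFILE, (old_soft, old_hard))
```

Calling `count_opens_until_emfile(64)` returns somewhat less than 64, because stdin, stdout, stderr and the probe file already occupy descriptor slots. A long-running process that holds many connections and trace files can hit the same wall when NOFILES is set too low.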
With “RDA4 - Health Check/Validation Engine”, it’s easy to create a pre-install check for specific Oracle products on multiple platforms. Attached is a sample pre-check report for Oracle 10gR2 on Solaris 10.
RDA_HCVE_A201DB10R2_sol_res.htm
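RDA's Rule 190 itself is not reproduced here. As a rough, hypothetical stand-in for its “ulimit test”, the Python sketch below reads the current soft limits for open files (NOFILES) and stack size (STACK) with the standard `resource` module and compares them with minimum values. The thresholds in `REQUIRED_MINIMUMS` are placeholders, not Oracle's recommendations; take the real values from the RDA report or the Oracle installation guide for your platform. Because limits are inherited per process, run it from the same login environment that starts the database and the agent.

```python
import resource

# Hypothetical placeholder minimums, NOT Oracle's published values:
# substitute the numbers from your RDA report or installation guide.
REQUIRED_MINIMUMS = {
    "NOFILES": (resource.RLIMIT_NOFILE, 4096),             # descriptors
    "STACK": (resource.RLIMIT_STACK, 10 * 1024 * 1024),   # bytes (ulimit -s shows KB)
}


def check_ulimits(minimums=REQUIRED_MINIMUMS):
    """Return {name: (soft_limit, required, passed)} for each limit checked."""
    results = {}
    for name, (rlimit, required) in minimums.items():
        soft, _hard = resource.getrlimit(rlimit)
        # RLIM_INFINITY means "unlimited", which satisfies any minimum.
        passed = soft == resource.RLIM_INFINITY or soft >= required
        results[name] = (soft, required, passed)
    return results


if __name__ == "__main__":
    for name, (soft, required, passed) in check_ulimits().items():
        shown = "unlimited" if soft == resource.RLIM_INFINITY else soft
        print(f"{name:8} soft={shown} required>={required} {'PASS' if passed else 'FAIL'}")
```

A FAIL line here points at the same kind of setting the RDA report flagged; the fix itself (for example through /etc/system or Solaris 10 project resource controls) is still for the Unix administrator to decide.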
Reference:
- Note: 375509.1 Understanding OEM 10g Agent Resource Consumption
- Note: 317257.1 Running Oracle Database in Solaris 10 Containers Best Practices
- Note: 188149.1 How to Display and Change Unix Process Resource Limits
- Note: 429191.1 Kernel setup for Solaris 10 using project files