Oracle Point, Oracle Life.


July 22, 2008

OEM agent on Solaris 10 At 100% CPU utilization after discovering databases

Filed under: OraclePoint — R.Wang @ 5:02 pm

Right after a database upgrade to 10gR2, one of our clients found that the OEM agent on Solaris 10 stayed at 100% CPU utilization after discovering the target databases. Stopping the OEM agent dropped CPU utilization to a low percentage.

To address the problem, I wrote an email to the client identifying the likely cause and offering a solution. The email is included below.

“To solve the OEM problem on M4K, I’d like to suggest that you read the following two Metalink articles:

  • Note: 556998.1 “Problem: Agent on Solaris At 100% CPU After Discovering Database”
  • Note: 578631.1 “OMS GOES DOWN WITH ‘TOO MANY OPEN FILES’ ERRORS”

Those two articles describe a problem similar to the one we have with OEM. I’ve checked the agent trace file emoms.trc on M4K and found a huge number of metric collection timeouts, as described in these two articles.

The solution offered in these two articles is to adjust the OS parameter NOFILES, which may require a server reboot to take effect.

I’d like to share my thoughts on what differed in the migration from XPROD to PROD: the number of users increased. If my understanding is right, only a small portion of the banner users took part in testing on XPROD, not all of them (this is the “workload test” that we missed). Once you migrated to PROD, a significant increase in users was expected, and that can cause resource contention if the OS resource limits are set improperly.

Similarly, the COBOL problem that occurred last week also seems related to OS limits, because the error message mentioned the word “Semaphore”. As we know, semaphore settings can be adjusted by editing /etc/system on Solaris. For more details, please consult Homa and James.

Please see the attached “Pre-Install checks report for 10.2.x oracle database on Solaris Platform”. I ran it against server M4K with the Oracle utility “RDA4 - Health Check/Validation Engine” (Oracle Metalink Note: 250262.1). The report clearly shows that the “ulimit test” failed (Rule 190) and that our current settings, such as NOFILES and STACK, are far below Oracle’s recommendations.

It would be best to consult Ricky (Unix Administrator) about these OS settings if we are going to adjust them.”
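
The ulimit comparison mentioned in the email can be roughly approximated with a few lines of Python. This is only a sketch and not the RDA rule itself: the minimum values below are placeholders I made up for illustration, and they should be replaced with the figures from the RDA report or the Oracle note that applies to your release.

```python
import resource

# Hypothetical minimum soft limits (placeholders, NOT Oracle's official
# figures); take the real values from the RDA/HCVE report.
ASSUMED_MINIMUMS = {
    "NOFILES": (resource.RLIMIT_NOFILE, 4096),
    "STACK": (resource.RLIMIT_STACK, 10 * 1024 * 1024),
}


def is_below(current, required):
    """Return True when a soft limit is lower than the required value."""
    if current == resource.RLIM_INFINITY:
        return False  # "unlimited" always satisfies the requirement
    return current < required


def check_limits(minimums):
    """Return (name, soft_limit, required) for every limit that falls short."""
    shortfalls = []
    for name, (limit_id, required) in minimums.items():
        soft, _hard = resource.getrlimit(limit_id)
        if is_below(soft, required):
            shortfalls.append((name, soft, required))
    return shortfalls


if __name__ == "__main__":
    for name, soft, required in check_limits(ASSUMED_MINIMUMS):
        print(f"{name}: soft limit {soft} is below assumed minimum {required}")
```

The limits reported are those of the process running the script, so it should be run as the same OS user, and from the same kind of shell, that starts the agent.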

Using “RDA4 - Health Check/Validation Engine”, it’s easy to run pre-install checks for specific Oracle products on multiple platforms. Attached is a sample pre-check report for Oracle 10gR2 on Solaris 10.

RDA_HCVE_A201DB10R2_sol_res.htm

Reference: 

  • Note: 375509.1 Understanding OEM 10g Agent Resource Consumption
  • Note: 317257.1 Running Oracle Database in Solaris 10 Containers Best Practices
  • Note: 188149.1 How to Display and Change Unix Process Resource Limits
  • Note: 429191.1 Kernel setup for Solaris 10 using project files


July 3, 2008

Database infrastructure at eBay

Filed under: OraclePoint — R.Wang @ 6:09 pm

The article “Scalability Best Practices: Lessons from eBay” presents 7 best practices for application infrastructure at eBay. The points cover not only enterprise structure but also the database and application server tiers. As a DBA, I found the parts relating to database design instructive for database infrastructure design.

Point 1: Partition by function

“There is no single monolithic database at eBay. Instead there is a set of database hosts for user data, a set for item data, a set for purchase data, etc. - 1000 logical databases in all, on 400 physical hosts. Again, this approach allows us to scale the database infrastructure for each type of data independently of the others.”   — Citation 1

Note: Partitioning has proven to be a viable technology in Oracle for heavy-traffic, enterprise-level systems, especially when coupled with RAC. It’s good practice to partition by function, not only at the database object level but especially at the database server level.
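
To make the idea concrete, here is a minimal sketch of routing by function. It is not how eBay implements it; the host names and the helper functions are hypothetical and only illustrate that each functional area owns its own pool of hosts and can grow independently.

```python
# Hypothetical host pools, one per functional area; the names are placeholders.
HOST_POOLS = {
    "user": ["userdb01", "userdb02"],
    "item": ["itemdb01", "itemdb02", "itemdb03"],
    "purchase": ["purchdb01"],
}


def pool_for(data_type):
    """Return the host pool that serves one functional area."""
    try:
        return HOST_POOLS[data_type]
    except KeyError:
        raise ValueError(f"no host pool configured for {data_type!r}") from None


def add_host(data_type, host):
    """Scale one functional area without touching the others."""
    pool_for(data_type).append(host)
```

Calling add_host("item", "itemdb04") grows only the item pool; the user and purchase pools stay as they were.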

Point 2: Split Horizontally

“The more challenging problem arises at the database tier, since data is stateful by definition. Here we split (or "shard") the data horizontally along its primary access path. User data, for example, is currently divided over 20 hosts, with each host containing 1/20 of the users. As our numbers of users grow, and as the data we store for each user grows, we add more hosts, and subdivide the users further. Regardless of the details of the partitioning scheme, though, the general idea is that an infrastructure which supports partitioning and repartitioning of data will be far more scalable than one which does not.” – Citation 2

Note: Even after partitioning by function, heavy traffic within a single function can still create a performance bottleneck. Splitting the data horizontally is therefore a viable way to meet application demand.
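
As a rough illustration of splitting along the primary access path, the sketch below maps a user id to one of 20 shards with a hash and a modulo. The article does not describe eBay's actual scheme, so the function name and the hashing choice are my own assumptions.

```python
import hashlib

SHARD_COUNT = 20  # the quote mentions 20 hosts for user data


def shard_for_user(user_id, shard_count=SHARD_COUNT):
    """Map a user id to a shard index in the range [0, shard_count)."""
    # A stable hash keeps the mapping identical across processes and restarts.
    digest = hashlib.sha256(str(user_id).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count
```

One caveat with plain modulo hashing: changing the shard count remaps most users, so a scheme meant to be repartitioned often would more likely use a lookup directory or consistent hashing instead.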

