Tuesday, September 8, 2015

3-node Oracle RAC: cluster waits due to remastering of objects

In our production environment, we run a three-node Oracle RAC cluster with a database of just over 1 TB that handles a medium OLTP workload. The application is used by a financial company in India with around 4,500 branches across the country. The Oracle version is 11.2.0.2.0 (11gR2).

Recently we started observing high cluster waits during peak load hours, and we also experienced node evictions a few times. We could not find any misbehavior on the network side, especially on the interconnect, and none of the monitoring tools helped us find any interconnect issues. We then analyzed the OEM results in depth and found that the cluster wait events occur while GC remastering happens. By default, DRM (Dynamic Remastering) is enabled, and the spikes occurred at 10-minute intervals after a node rejoined or started. On further analysis we found that the _gc_policy_time parameter, which controls how often the queue is checked to see whether remastering must be triggered, has a default value of 10. We debated whether to disable DRM by setting this parameter to 0. But since that requires a complete restart of Oracle, we were a bit hesitant, mainly because of the cluster waits the system shows after a restart.
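For anyone who wants to check the same things on their own cluster, here is a minimal sketch of the kind of queries that can be used. It assumes a SYSDBA connection, since the X$ fixed tables are visible only to SYS. The column list for GV$DYNAMIC_REMASTER_STATS is what I would expect on 11.2, so please verify it against your own release before relying on it.

```sql
-- Run as SYS: hidden (underscore) parameters do not appear in V$PARAMETER
-- unless they have been set explicitly, so read them from the X$ tables.
SELECT i.ksppinm  AS name,
       v.ksppstvl AS value,
       v.ksppstdf AS is_default,
       i.ksppdesc AS description
FROM   x$ksppi  i
JOIN   x$ksppcv v ON v.indx = i.indx
WHERE  i.ksppinm LIKE '\_gc\_policy\_%' ESCAPE '\';

-- Cumulative remastering activity per instance since instance startup.
SELECT inst_id, remaster_ops, remaster_time, remastered_objects
FROM   gv$dynamic_remaster_stats
ORDER  BY inst_id;
```

Running the second query a few minutes apart during peak load gives a rough idea of whether remaster operations are increasing while the cluster waits are spiking.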

On checking further, we found another parameter that controls the remastering behavior: _gc_policy_minimum. It is defined as the “minimum amount of dynamic affinity activity per minute” for an object to be a candidate for remastering. It defaults to 1500, which we felt is too low for a busy environment like ours. The best part is that changing this value does not require a restart, so we decided to increase it to 20000. After setting the value, we found that the cluster wait events had almost disappeared and the system was back to the state it was in earlier.
Since this is not a fully documented parameter, we couldn't get much expert opinion on it. But since we got a definite benefit from setting it, I thought I would share this on my blog, hoping someone else may benefit from it.
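The change described above can be sketched as follows. This is only an illustration: SCOPE = BOTH assumes the instances run from an spfile, and SID = '*' assumes you want the same value on all three nodes. Adjust both to your own setup.

```sql
-- Raise the remastering threshold on all instances without a restart.
-- Hidden parameter names must be enclosed in double quotes.
-- Assumption: an spfile is in use, so SCOPE = BOTH is allowed.
ALTER SYSTEM SET "_gc_policy_minimum" = 20000 SCOPE = BOTH SID = '*';

-- To go back to the default value later:
-- ALTER SYSTEM SET "_gc_policy_minimum" = 1500 SCOPE = BOTH SID = '*';
```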

Below is the OEM display after and before setting the parameter.



Note: This parameter is undocumented by Oracle, so you may want to contact Oracle Support before changing its value.