Prior to Oracle Clusterware 11g Release 2, when the voting disk became unresponsive or inaccessible, the host was rebooted to avoid split brain, without waiting for the I/O processes to complete. Such a reboot was usually triggered either by a network communication failure with the other nodes in the cluster or by an I/O overload that caused the host to hang. In Oracle Clusterware 11g Release 2, enhancements have been made to avoid rebooting a node when it has to be I/O fenced.
Oracle Clusterware uses the industry-standard STONITH (shoot the other node in the head) algorithm to fence nodes when required. With Oracle Clusterware 11g Release 2, however, Clusterware first attempts a clean shutdown of all processes in a hung scenario and force-kills the I/O processes if required. Once all resources have been shut down, the newly introduced Oracle High Availability Services daemon (OHASD) automatically restarts them, returning them to the state they were in before the cleanup.
The same mechanism also allows failed RAC sub-components to be restarted individually, without rebooting the node or restarting all Clusterware processes.
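The escalation described above can be pictured as a small state machine. The sketch below is a toy Python model, not Oracle's implementation: the `Node` and `Process` classes, the `fence()` function, the process names, and the fallback to a reboot when a force kill fails are all hypothetical, chosen only to illustrate the order of steps (clean stop, force kill of I/O-capable processes, restart to the prior state).

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class State(Enum):
    ONLINE = "ONLINE"
    OFFLINE = "OFFLINE"


@dataclass
class Process:
    name: str                 # hypothetical process/resource name
    does_io: bool             # True if the process can issue I/O to shared storage
    responds_to_stop: bool    # False models a hung process that ignores a clean stop
    state: State = State.ONLINE


@dataclass
class Node:
    name: str
    processes: list[Process]
    kill_succeeds: bool = True  # False models a process stuck in uninterruptible I/O
    rebooted: bool = False
    log: list[str] = field(default_factory=list)


def fence(node: Node) -> None:
    """Toy model of reboot-less fencing; not Oracle's actual algorithm."""
    # Remember which processes were online so they can be restored afterwards.
    prior = {p.name: p.state for p in node.processes}

    # Step 1: attempt a clean shutdown of every process.
    for p in node.processes:
        if p.responds_to_stop:
            p.state = State.OFFLINE
            node.log.append(f"stopped {p.name}")

    # Step 2: force-kill any I/O-capable process that is still running.
    for p in node.processes:
        if p.state is State.ONLINE and p.does_io:
            if node.kill_succeeds:
                p.state = State.OFFLINE
                node.log.append(f"force-killed {p.name}")
            else:
                # Assumed fallback: I/O cannot be stopped, so reboot as before.
                node.rebooted = True
                node.log.append("reboot")
                return

    # Step 3: I/O is fenced without a reboot; restart what was online before
    # (the role the text assigns to OHASD).
    for p in node.processes:
        if prior[p.name] is State.ONLINE and p.state is State.OFFLINE:
            p.state = State.ONLINE
            node.log.append(f"restarted {p.name}")


if __name__ == "__main__":
    node = Node("rac1", [
        Process("asm_instance", does_io=True, responds_to_stop=True),
        Process("db_writer", does_io=True, responds_to_stop=False),  # hung
        Process("listener", does_io=False, responds_to_stop=True),
    ])
    fence(node)
    print(node.rebooted)  # False
    print(node.log)
```

In this model the hung `db_writer` is force-killed rather than triggering a reboot, and all three processes end up back online. Setting `kill_succeeds=False` models the case where a process cannot be stopped, and the sketch then falls back to the pre-11g Release 2 behavior of rebooting the host.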
These enhancements are particularly beneficial in environments with a large number of nodes, or where other applications run on the database server nodes. Avoiding a full reboot improves manageability and uptime for the host and the applications running on it.