In 11gR2 Oracle Grid Infrastructure, Oracle has introduced a new mechanism to remove an unresponsive node from the RAC cluster. This mechanism is a failure isolation process whereby the CRS daemon sends reboot instructions to the unresponsive node via an external mechanism, an Intelligent Platform Management Interface (IPMI) device. This method removes the dependency on the failing node's CRS process or kernel to carry out the reboot. To support this, the Clusterware is configured to send the reboot instruction over the LAN, in an authenticated session, directly to that node's IPMI device.
To implement this, the following prerequisites must be met.
I. Each node requires a Baseboard Management Controller (BMC) device, running firmware compatible with IPMI version 1.5 or greater, supporting IPMI over LAN, and configured for remote control over the LAN.
II. An IPMI driver installed on each node, compatible with the IPMI version 1.5 firmware.
III. The IPMI device placed on a management network, preferably a dedicated one. The devices can be configured to use either a static or a dynamically assigned IP address.
IV. Each node in the cluster connected to the management network.
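The prerequisites above can be spot-checked from the operating system with ipmitool, a common open-source IPMI client (not an Oracle tool). As a sketch, assuming the IPMI driver is loaded and the BMC uses LAN channel 1 (the channel number varies by hardware):

```shell
# Check that the BMC responds and report its firmware and IPMI version
# (the IPMI version shown should be 1.5 or greater)
ipmitool mc info

# Confirm the LAN channel configuration: IP address source
# (static vs. DHCP), current IP address, and MAC address
ipmitool lan print 1
```

If either command fails, the IPMI driver or the BMC's LAN configuration needs attention before continuing with the Clusterware setup.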
This IPMI configuration can be set up either during the Oracle Grid Infrastructure installation or after the install has been completed. The process involves storing the authentication username/password information in the Oracle wallet and the configuration information locally in the Oracle Local Registry (OLR) of each node.
During the Oracle Grid Infrastructure installation, choose “Use Intelligent Platform Management Interface (IPMI)”.
If setup is being performed after the Grid Infrastructure has been installed, the configuration steps using CRSCTL are as below.
a. When IPMI is configured to obtain its IP address via DHCP, it may be necessary to reset IPMI or restart the node for the address to be obtained. After obtaining the dynamic or static address, crsctl is used to store the information in the OLR of each node.
crsctl set css ipmiaddr 192.168.17.105
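As a sketch, the BMC's current address could first be read back with ipmitool and then recorded; the address 192.168.17.105 is just the example value from above, and each node stores its own BMC address:

```shell
# Query the BMC for the LAN address it currently holds (channel 1 assumed)
ipmitool lan print 1 | grep "IP Address"

# Store that address in the OLR; run this on each node in turn
crsctl set css ipmiaddr 192.168.17.105

# Read the stored value back to confirm it was recorded
crsctl get css ipmiaddr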
b. Authentication information is also stored, using the crsctl set css ipmiadmin command and supplying the credentials when prompted.
crsctl set css ipmiadmin administrator_name
IPMI BMC password: provide password
This command validates the credentials and succeeds only if the IPMI configuration is correct across all nodes in the cluster.
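Once both values are stored, the setup can be sanity-checked with crsctl, which reports whether the IPMI device and driver are present and usable on the local node:

```shell
# Confirm the local IPMI device/driver is accessible to Clusterware
crsctl query css ipmidevice

# Should IPMI fencing later need to be removed, the stored settings
# can be cleared again (this prompts for the ipmiadmin password):
# crsctl unset css ipmiconfig
```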
With this, IPMI-based failure isolation should be functional!