Tuesday, March 12, 2013

Issue: A two-node Windows 2003 cluster on which the Cluster service fails to start. One of the nodes has already been evicted, and the Cluster service is now failing on the remaining node.

 Sample Troubleshooting

  • We started the cluster service with the /FQ switch. It started without any errors, and we were able to bring all the resources online.
  • While navigating through the resources, we found that the disk resources are provided by VERITAS Volume Manager.
  • Because the disk resources were having trouble coming online, we asked to engage Symantec on the call.
  • The following cluster log snippet shows the failure of the disk resource of type "Volume Manager Disk Group":

#########################################
000004b4.0000174c::2012/06/19-15:38:13.729 ERR  Volume Manager Disk Group : LDM_RESOnlineThread: CheckQuorumPath() failed! dwStatus = 1008.
000004b4.0000174c::2012/06/19-15:38:13.729 INFO Volume Manager Disk Group : LDM_RESOnlineThread: RESOURCE IS PUT ONLINE SUCCESSFULLY
000006c0.000007b4::2012/06/19-15:38:13.901 INFO [FM] FmpCleanupGroupsPhase1: Group is not quiet, wait
000004b4.0000174c::2012/06/19-15:38:14.119 ERR  [RM] Exception. Code = 0xc0000005, Address = 0x0000000077EE4AD2
000004b4.0000174c::2012/06/19-15:38:14.119 ERR  [RM] Exception parameters: 0, 46, 0, 11c
000004b4.0000174c::2012/06/19-15:38:14.119 INFO [RM] GenerateMemoryDump: Start memory dump to file C:\WINDOWS\Cluster\resrcmon.dmp
#########################################
  • Symantec article http://www.symantec.com/business/support/index?page=content&id=TECH126532 helped us resolve the issue.
  • After we installed the hotfix and rebooted the cluster node, the cluster service started without any issues.
  • We reconfigured the cluster to use the new disk group that Ian created.
  • We moved the cluster core resources to the newly created group.
  • We added the second node back to the cluster and tested a couple of failovers, which were successful.
  • The application team also tested a couple of failovers, and they were successful.
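When the cluster log is long, pulling out only the ERR lines (like the "CheckQuorumPath() failed!" and "[RM] Exception" entries in the snippet above) makes the failing resource easier to spot. Below is a minimal Python sketch, assuming every line follows the layout seen in that snippet: a hex process id and thread id separated by a dot, "::", a timestamp, a level such as ERR or INFO, and the message. The helper names parse_line and error_entries are hypothetical and are not part of any Microsoft or Symantec tool.

```python
import re
from dataclasses import dataclass

# Assumed layout, taken from the snippet above:
# <pid hex>.<tid hex>::<YYYY/MM/DD-HH:MM:SS.mmm> <LEVEL> <message>
LINE_RE = re.compile(
    r"^(?P<pid>[0-9a-fA-F]{8})\.(?P<tid>[0-9a-fA-F]{8})::"
    r"(?P<ts>\d{4}/\d{2}/\d{2}-\d{2}:\d{2}:\d{2}\.\d{3})\s+"
    r"(?P<level>[A-Z]+)\s+(?P<msg>.*)$"
)


@dataclass
class LogEntry:
    pid: int        # process id, parsed from hex
    tid: int        # thread id, parsed from hex
    timestamp: str  # kept as text, e.g. "2012/06/19-15:38:13.729"
    level: str      # e.g. "ERR" or "INFO"
    message: str


def parse_line(line):
    """Parse one log line; return None if it does not match the assumed layout."""
    m = LINE_RE.match(line.rstrip("\r\n"))
    if m is None:
        return None
    return LogEntry(int(m["pid"], 16), int(m["tid"], 16),
                    m["ts"], m["level"], m["msg"])


def error_entries(lines, pid=None):
    """Return the ERR entries, optionally limited to a single process id."""
    result = []
    for line in lines:
        entry = parse_line(line)
        if entry is None or entry.level != "ERR":
            continue
        if pid is not None and entry.pid != pid:
            continue
        result.append(entry)
    return result


# First four lines of the snippet above, used as sample input.
SAMPLE = """\
000004b4.0000174c::2012/06/19-15:38:13.729 ERR  Volume Manager Disk Group : LDM_RESOnlineThread: CheckQuorumPath() failed! dwStatus = 1008.
000004b4.0000174c::2012/06/19-15:38:13.729 INFO Volume Manager Disk Group : LDM_RESOnlineThread: RESOURCE IS PUT ONLINE SUCCESSFULLY
000006c0.000007b4::2012/06/19-15:38:13.901 INFO [FM] FmpCleanupGroupsPhase1: Group is not quiet, wait
000004b4.0000174c::2012/06/19-15:38:14.119 ERR  [RM] Exception. Code = 0xc0000005, Address = 0x0000000077EE4AD2
"""

if __name__ == "__main__":
    for e in error_entries(SAMPLE.splitlines()):
        print(f"{e.timestamp} pid=0x{e.pid:x} tid=0x{e.tid:x} {e.message}")
```

On the sample, both ERR lines come from the same process and thread (0x4b4.0x174c), which ties the disk group's online failure to the resource monitor exception that follows. For reference, 0xc0000005 is the Windows STATUS_ACCESS_VIOLATION code, consistent with the resource monitor writing a memory dump.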

