ASM disk groups support three redundancy types: external, normal, and high. The disk group restored here uses normal redundancy. This walkthrough simulates the OCR disks or voting disks becoming unavailable, shows how the RAC behaves, and goes through the whole fault-location process. In 11.2.0.3 the voting disk data is kept together with the OCR (in this environment both live in the OCR_VOT disk group), so the two experiments, OCR disk unavailable and voting disk unavailable, are done together.

In 11.2.0.3 the OCR can also be exported by hand, but such an export is not listed among the backups:

ocrconfig -export /u01/ocr.exp

Check which OCR backups exist:

[root@rac1 ~]# ocrconfig -showbackup
rac1     2013/07/22 05:39:51     /u01/grid/crs/cdata/rac/backup00.ocr
rac1     2013/07/22 01:39:51     /u01/grid/crs/cdata/rac/backup01.ocr
rac1     2013/07/21 21:39:50     /u01/grid/crs/cdata/rac/backup02.ocr
rac2     2013/07/21 01:52:54     /u01/grid/crs/cdata/rac/day.ocr
rac2     2013/07/09 01:52:25     /u01/grid/crs/cdata/rac/week.ocr
PROT-25: Manual backups for the Oracle Cluster Registry are not available

Note: PROT-25 is Oracle reporting that no manual backup is available, so the recovery below relies on the automatic backups listed above.

Check the voting disk information:

[root@rac1 ~]# crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   745716af7e5b4faebfc8d948d096aa55 (/dev/oracleasm/disks/OCR_VOT1) [OCR_VOT]
 2. ONLINE   7092079f66c04f9dbf65974d0dcc611a (/dev/oracleasm/disks/OCR_VOT2) [OCR_VOT]
 3. ONLINE   6510631353284f5fbf3d4c8839822dbd (/dev/oracleasm/disks/OCR_VOT3) [OCR_VOT]
Located 3 voting disk(s).
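The two listings above lend themselves to a scripted pre-check before the disks are touched. The following is a minimal sketch and not part of the original procedure: `latest_ocr_backup` and `count_online_votedisks` are hypothetical helper names, and both assume the whitespace-separated layout that `ocrconfig -showbackup` and `crsctl query css votedisk` print in this environment.

```shell
#!/bin/sh
# Hypothetical helpers (illustrative names, not Oracle tools).
# Both read command output on stdin, so they can be tried on saved text.

# Print the path of the most recent backup in `ocrconfig -showbackup` output.
# Assumed line shape: <node> <YYYY/MM/DD> <HH:MM:SS> <path ending in .ocr>
latest_ocr_backup() {
    awk 'NF == 4 && $4 ~ /\.ocr$/ {
             stamp = $2 " " $3              # date and time compare correctly as a string
             if (stamp > best) { best = stamp; path = $4 }
         }
         END { if (path != "") print path }'
}

# Count the voting files reported ONLINE by `crsctl query css votedisk`.
# Assumed line shape: <n>. ONLINE <id> (<path>) [<diskgroup>]
count_online_votedisks() {
    awk '$1 ~ /^[0-9]+\.$/ && $2 == "ONLINE" { n++ } END { print n + 0 }'
}
```

Typical use would be `ocrconfig -showbackup | latest_ocr_backup` to pick the file for a later restore, and `crsctl query css votedisk | count_online_votedisks` to compare against the expected number of voting files. Lines that do not match the assumed shape, such as the PROT-25 message, are ignored.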
# Stop the database:
[root@rac1 ~]# srvctl stop database -d orcl -o immediate

# Stop the cluster:
[root@rac1 ~]# crsctl stop cluster -all -f

# Destroy the OCR and voting disks:
[root@rac1 ~]# dd if=/dev/zero of=/dev/mapper/mpathap1 bs=1024K count=1
1+0 records in
1+0 records out
1048576 bytes (1.0 MB) copied, 0.0160613 s, 65.3 MB/s
[root@rac1 ~]# dd if=/dev/zero of=/dev/mapper/mpathap2 bs=1024K count=1
1+0 records in
1+0 records out
1048576 bytes (1.0 MB) copied, 0.00800275 s, 131 MB/s
[root@rac1 ~]# dd if=/dev/zero of=/dev/mapper/mpathap3 bs=1024K count=1
1+0 records in
1+0 records out
1048576 bytes (1.0 MB) copied, 0.00927389 s, 113 MB/s

Note: after the disks are destroyed, the services on every node still look completely normal:

[root@rac1 ~]# crs_stat -t
Name            Type            Target     State      Host
------------------------------------------------------------
ora.DATA.dg     ora.up.type     ONLINE     ONLINE     rac1
ora.FRA.dg      ora.up.type     ONLINE     ONLINE     rac1
ora.ER.lsnr     ora.er.type     ONLINE     ONLINE     rac1
ora.N1.lsnr     ora.er.type     ONLINE     ONLINE     rac1
ora.OCR_VOT.dg  ora.up.type     ONLINE     ONLINE     rac1
ora.asm         ora.asm.type    ONLINE     ONLINE     rac1
ora.orcl.db     ora.se.type     ONLINE     ONLINE     rac1
ora.cvu         ora.cvu.type    ONLINE     ONLINE     rac1
ora.SM1.asm     application     ONLINE     ONLINE     rac1
ora.C1.lsnr     application     ONLINE     ONLINE     rac1
ora.ac1.gsd     application     OFFLINE    OFFLINE
ora.ac1.ons     application     ONLINE     ONLINE     rac1
ora.ac1.vip     ora.t1.type     ONLINE     ONLINE     rac1
ora.SM2.asm     application     ONLINE     ONLINE     rac2
ora.C2.lsnr     application     ONLINE     ONLINE     rac2
ora.ac2.gsd     application     OFFLINE    OFFLINE
ora.ac2.ons     application     ONLINE     ONLINE     rac2
ora.ac2.vip     ora.t1.type     ONLINE     ONLINE     rac2
ora.gsd         ora.gsd.type    OFFLINE    OFFLINE
ora.network     ora.rk.type     ONLINE     ONLINE     rac1
ora.oc4j        ora.oc4j.type   ONLINE     ONLINE     rac1
ora.ons         ora.ons.type    ONLINE     ONLINE     rac1
ora.ry.acfs     ora.fs.type     ONLINE     ONLINE     rac1
ora.scan1.vip   ora.ip.type     ONLINE     ONLINE     rac1

Once the operating system has been rebooted on all nodes, the cluster services no longer come up:

[root@rac1 ~]# reboot

If the cluster services are only stopped, recreating the ASM disk group later fails; after an operating system reboot it succeeds.

Check CRS:

[grid@rac1 ~]$ crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4530: Communications failure contacting Cluster Synchronization Services daemon
CRS-4534: Cannot communicate with Event Manager

Start the cluster services:

[root@rac1 ~]# crsctl start cluster -all
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'rac1'
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'rac2'
CRS-2676: Start of 'ora.cssdmonitor' on 'rac1' succeeded
CRS-2676: Start of 'ora.cssdmonitor' on 'rac2' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'rac1'
CRS-2672: Attempting to start 'ora.diskmon' on 'rac1'
CRS-2672: Attempting to start 'ora.cssd' on 'rac2'
CRS-2672: Attempting to start 'ora.diskmon' on 'rac2'
CRS-2676: Start of 'ora.diskmon' on 'rac1' succeeded
CRS-2676: Start of 'ora.diskmon' on 'rac2' succeeded
# The command hangs here indefinitely.

From another terminal, try a different command to start the cluster services:

[root@rac1 ~]# crsctl start crs
CRS-4640: Oracle High Availability Services is already active
CRS-4000: Command Start failed, or completed with errors.

Neither the operating system log nor the CRS log shows anything particularly useful:

[root@rac1 ~]# vi /var/log/messages
[grid@rac1 ~]$ vi $ORACLE_HOME/log/rac1/crsd/crsd.log

The ocssd log gives the hint:

vi $ORACLE_HOME/log/rac1/cssd/ocssd.log
2013-07-21 21:15:08.550: [CSSD][1095031104]clssnmvFindInitialConfigs: No voting files found

Some of the ASM disks turn out to be gone:

[root@rac1 ~]# /etc/init.d/oracleasm scandisks
Scanning the system for Oracle ASMLib disks: [ OK ]
[root@rac1 ~]# /etc/init.d/oracleasm listdisks
DATA
FRA

Recreate the ASM disks as described in the RAC installation document:

[root@rac1 ~]# /etc/init.d/oracleasm createdisk OCR_VOT1 /dev/mapper/mpathap1
Marking disk "OCR_VOT1" as an ASM disk: [ OK ]
[root@rac1 ~]# /etc/init.d/oracleasm createdisk OCR_VOT2 /dev/mapper/mpathap2
Marking disk "OCR_VOT2" as an ASM disk: [ OK ]
[root@rac1 ~]# /etc/init.d/oracleasm createdisk OCR_VOT3 /dev/mapper/mpathap3
Marking disk "OCR_VOT3" as an ASM disk: [ OK ]

Stop the cluster services. Add -f, otherwise the stop can be very slow:

[root@rac1 ~]# crsctl stop crs -f
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'rac1'
CRS-2673: Attempting to stop 'ora.mdnsd' on 'rac1'
CRS-2673: Attempting to stop 'ora.crf' on 'rac1'
CRS-2677: Stop of 'ora.mdnsd' on 'rac1' succeeded
CRS-2677: Stop of 'ora.crf' on 'rac1' succeeded
CRS-2673: Attempting to stop 'ora.gipcd' on 'rac1'
CRS-2677: Stop of 'ora.gipcd' on 'rac1' succeeded
CRS-2673: Attempting to stop 'ora.gpnpd' on 'rac1'
CRS-2677: Stop of 'ora.gpnpd' on 'rac1' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'rac1' has completed
CRS-4133: Oracle High Availability Services has been stopped.

Start the cluster with -excl -nocrs, which starts the ASM instance but not CRS:

[root@rac1 ~]# crsctl start crs -excl -nocrs
CRS-4123: Oracle High Availability Services has been started.
CRS-2672: Attempting to start 'ora.mdnsd' on 'rac1'
CRS-2676: Start of 'ora.mdnsd' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.gpnpd' on 'rac1'
CRS-2676: Start of 'ora.gpnpd' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'rac1'
CRS-2672: Attempting to start 'ora.gipcd' on 'rac1'
CRS-2676: Start of 'ora.cssdmonitor' on 'rac1' succeeded
CRS-2676: Start of 'ora.gipcd' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'rac1'
CRS-2672: Attempting to start 'ora.diskmon' on 'rac1'
CRS-2676: Start of 'ora.diskmon' on 'rac1' succeeded
CRS-2676: Start of 'ora.cssd' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.drivers.acfs' on 'rac1'
CRS-2679: Attempting to clean 'ora.cluster_interconnect.haip' on 'rac1'
CRS-2672: Attempting to start 'ora.ctssd' on 'rac1'
CRS-2681: Clean of 'ora.cluster_interconnect.haip' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.cluster_interconnect.haip' on 'rac1'
CRS-2676: Start of 'ora.drivers.acfs' on 'rac1' succeeded
CRS-2676: Start of 'ora.ctssd' on 'rac1' succeeded
CRS-2676: Start of 'ora.cluster_interconnect.haip' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.asm' on 'rac1'
CRS-2676: Start of 'ora.asm' on 'rac1' succeeded

CRS still reports errors at this point:

[root@rac1 ~]# crs_stat -t
CRS-0184: Cannot communicate with the CRS daemon.
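Because CRS itself is deliberately not running in this mode, the startup messages are the evidence that ASM came up. A minimal sketch for checking that from captured output, assuming the CRS-2676 message format shown in the transcript; `resource_started` is a hypothetical helper name, not a crsctl feature.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the crsctl output on stdin contains a
# "CRS-2676: Start of '<resource>' on '<node>' succeeded" line for $1.
resource_started() {
    # \047 is a single quote; index() does a literal (non-regex) match
    awk -v res="$1" '
        index($0, "CRS-2676: Start of \047" res "\047 on ") && /succeeded/ { found = 1 }
        END { exit !found }'
}
```

For example, the output of the exclusive-mode start could be saved to a file and tested with `resource_started ora.asm < start.log` before moving on to recreate the disk group (`start.log` is an illustrative file name).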
[root@rac1 ~]# crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4530: Communications failure contacting Cluster Synchronization Services daemon
CRS-4534: Cannot communicate with Event Manager

Recreate the disk group that originally held the OCR and voting disks. Note: this is done as the grid user.

SQL> col path for a50
SQL> set lines 300
SQL> select path, header_status from v$asm_disk;
SQL> create diskgroup OCR_VOT normal redundancy
     disk '/dev/oracleasm/disks/OCR_VOT1',
          '/dev/oracleasm/disks/OCR_VOT2',
          '/dev/oracleasm/disks/OCR_VOT3'
     attribute 'compatible.rdbms'='11.2', 'compatible.asm'='11.2';

Of the three ASM redundancy types (external, normal, high), this disk group previously used normal, so it is recreated with normal redundancy.

Restore the OCR from the OCR backup. On each node, as the grid user (if ocrconfig reports insufficient permission, run it as root):

cd $ORACLE_HOME/cdata/rac
ocrconfig -restore /u01/grid/crs/cdata/rac/backup00.ocr

Preparation for restoring the voting disks:

show parameter asm_diskstring

If asm_diskstring has no value, ASM is using its default disk discovery path. Change it to the actual ASM disk discovery path:

alter system set asm_diskstring='/dev/oracleasm/disks/*';

Restore the voting disks:

[root@rac1 ~]# crsctl replace votedisk +OCR_VOT
Successful addition of voting disk 4ad2b9cc0a754fffbf1515281199a78f.
Successful addition of voting disk 9f8dc1c013df4f39bfd85c64051a0bc1.
Successful addition of voting disk a4aea7a1aa434fb3bff161f6ea8ce102.
Successfully replaced voting disk group with +OCR_VOT.
CRS-4266: Voting file(s) successfully replaced

With the OCR and voting disks restored, CRS and the other services are able to come up again. In the check below CSS is already online; CRS and the Event Manager are still unreachable at this stage, which is why the stack is restarted in the last step.

[root@rac1 ~]# crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4529: Cluster Synchronization Services is online
CRS-4534: Cannot communicate with Event Manager

[root@rac1 ~]# crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   4ad2b9cc0a754fffbf1515281199a78f (/dev/oracleasm/disks/OCR_VOT1) [OCR_VOT]
 2. ONLINE   9f8dc1c013df4f39bfd85c64051a0bc1 (/dev/oracleasm/disks/OCR_VOT2) [OCR_VOT]
 3. ONLINE   a4aea7a1aa434fb3bff161f6ea8ce102 (/dev/oracleasm/disks/OCR_VOT3) [OCR_VOT]
Located 3 voting disk(s).

Restart the cluster services and check whether everything is back to normal:

[root@rac1 ~]# crsctl stop crs
[root@rac1 ~]# crsctl start crs
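The final health check can also be scripted. This is a minimal sketch that relies only on the message texts quoted in this article: `crs_check_ok` is a hypothetical helper that reads `crsctl check crs` output and succeeds only when OHAS is reported online (CRS-4638) and no "Cannot communicate" or "Communications failure" line is present.

```shell
#!/bin/sh
# Hypothetical helper: exit 0 if `crsctl check crs` output on stdin looks
# healthy, non-zero otherwise. The matching rules are assumptions based on
# the messages seen in this walkthrough.
crs_check_ok() {
    awk '/CRS-4638/ { ohas = 1 }                                  # OHAS is online
         /Cannot communicate|Communications failure/ { bad = 1 }  # some daemon unreachable
         END { exit !(ohas && !bad) }'
}
```

After the restart it could be used as `crsctl check crs | crs_check_ok && echo "stack is up"`, repeated after a short wait if the daemons are still starting.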