
    Recovery after an ASM disk was added to another disk group

    Posted by 惜分飞 on 2023-09-05 06:18:46

    Contact: phone/WeChat (+86 17813235971), QQ (107644445)

    Title: Recovery after an ASM disk was added to another disk group

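    Author: 惜分飞 © All rights reserved. [Reproduction in any form without the author's permission is prohibited; the author reserves the right to pursue legal action.]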

    On AIX, a customer extended an ASM disk group belonging to one RAC cluster by adding a new disk.


    Immediately afterwards, the DATA disk group of a different RAC cluster was dismounted:

    Wed Aug 23 12:44:02 2023
    NOTE: SMON starting instance recovery for group DATA domain 2 (mounted)
    NOTE: F1X0 found on disk 0 au 2 fcn 0.128808679
    NOTE: SMON skipping disk 7 - no header
    NOTE: cache initiating offline of disk 7 group DATA
    NOTE: process _smon_+asm1 (1770932) initiating offline of disk 7.3422955792 (DATA_0007) with mask 0x7e in group 2
    NOTE: initiating PST update: grp = 2, dsk = 7/0xcc062910, mask = 0x6a, op = clear
    Wed Aug 23 12:44:02 2023
    GMON updating disk modes for group 2 at 7 for pid 17, osid 1770932
    ERROR: Disk 7 cannot be offlined, since diskgroup has external redundancy.
    ERROR: too many offline disks in PST (grp 2)
    Wed Aug 23 12:44:02 2023
    NOTE: cache dismounting (not clean) group 2/0x7FE6D808 (DATA) 
    WARNING: Offline for disk DATA_0007 in mode 0x7f failed.
    Wed Aug 23 12:44:02 2023
    NOTE: halting all I/Os to diskgroup 2 (DATA)
    ERROR: No disks with F1X0 found on disk group DATA
    NOTE: aborting instance recovery of domain 2 due to diskgroup dismount
    NOTE: SMON skipping lock domain (2) validation because diskgroup being dismounted
    Abort recovery for domain 2
    Wed Aug 23 12:44:02 2023
    ERROR: ORA-15130 in COD recovery for diskgroup 2/0x7fe6d808 (DATA)
    ERROR: ORA-15130 thrown in RBAL for group number 2
    Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_rbal_2360526.trc:
    ORA-15130: diskgroup "DATA" is being dismounted
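    This log shows two connected events. SMON finds that disk 7 no longer has an ASM header. ASM then refuses to offline that disk because the group uses external redundancy, so it dismounts the whole group instead. The sketch below scans alert-log text for exactly those messages and summarizes them. The function name and the shape of the returned dict are hypothetical. Only the message formats are copied from the excerpt above.

```python
import re

# Message formats copied from the ASM alert log excerpt above.
NO_HEADER = re.compile(r"SMON skipping disk (\d+) - no header")
CANNOT_OFFLINE = re.compile(
    r"Disk (\d+) cannot be offlined, since diskgroup has external redundancy"
)
DISMOUNTING = re.compile(
    r"cache dismounting \((?:not )?clean\) group (\d+)/\S+ \((\w+)\)"
)


def explain_dismount(log_text: str) -> dict:
    """Hypothetical helper: summarize why an ASM disk group was dismounted."""
    no_header = {int(m.group(1)) for m in NO_HEADER.finditer(log_text)}
    not_offlinable = {int(m.group(1)) for m in CANNOT_OFFLINE.finditer(log_text)}
    dismounted = [(int(g), name) for g, name in DISMOUNTING.findall(log_text)]
    return {
        # Disks whose ASM header could no longer be read.
        "disks_without_header": sorted(no_header),
        # External redundancy keeps no mirror copy, so ASM cannot take such a
        # disk offline; the group ends up dismounted instead.
        "forced_dismount": sorted(no_header & not_offlinable),
        # (group number, group name) for every dismount message found.
        "dismounted_groups": dismounted,
    }


if __name__ == "__main__":
    sample = "\n".join([
        "NOTE: SMON skipping disk 7 - no header",
        "ERROR: Disk 7 cannot be offlined, since diskgroup has external redundancy.",
        "NOTE: cache dismounting (not clean) group 2/0x7FE6D808 (DATA) ",
    ])
    print(explain_dismount(sample))
```

    The helper only restates what the log already says. With a normal- or high-redundancy group, the same missing header would normally lead to the disk being offlined rather than to a dismount.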

    A second attempt to mount the disk group failed with ORA-15042 and ORA-15038:

    SQL> alter diskgroup data mount 
    NOTE: cache registered group DATA number=2 incarn=0x79e6d861
    NOTE: cache began mount (first) of group DATA number=2 incarn=0x79e6d861
    NOTE: Assigning number (2,0) to disk (/dev/rhdisk31)
    NOTE: Assigning number (2,3) to disk (/dev/rhdisk33)
    NOTE: Assigning number (2,4) to disk (/dev/rhdisk34)
    NOTE: Assigning number (2,5) to disk (/dev/rhdisk35)
    NOTE: Assigning number (2,6) to disk (/dev/rhdisk36)
    NOTE: Assigning number (2,9) to disk (/dev/rhdisk39)
    NOTE: Assigning number (2,1) to disk (/dev/rhdisk8)
    NOTE: Assigning number (2,2) to disk (/dev/rhdisk9)
    Wed Aug 23 12:58:46 2023
    NOTE: GMON heartbeating for grp 2
    GMON querying group 2 at 11 for pid 27, osid 3736034
    NOTE: Assigning number (2,7) to disk ()
    NOTE: Assigning number (2,8) to disk ()
    GMON querying group 2 at 12 for pid 27, osid 3736034
    NOTE: cache dismounting (clean) group 2/0x79E6D861 (DATA) 
    NOTE: messaging CKPT to quiesce pins Unix process pid: 3736034, image: oracle@hbbz01 (TNS V1-V3)
    NOTE: dbwr not being msg'd to dismount
    NOTE: lgwr not being msg'd to dismount
    NOTE: cache dismounted group 2/0x79E6D861 (DATA) 
    NOTE: cache ending mount (fail) of group DATA number=2 incarn=0x79e6d861
    NOTE: cache deleting context for group DATA 2/0x79e6d861
    GMON dismounting group 2 at 13 for pid 27, osid 3736034
    NOTE: Disk DATA_0000 in mode 0x7f marked for de-assignment
    NOTE: Disk DATA_0001 in mode 0x7f marked for de-assignment
    NOTE: Disk DATA_0002 in mode 0x7f marked for de-assignment
    NOTE: Disk DATA_0003 in mode 0x7f marked for de-assignment
    NOTE: Disk DATA_0004 in mode 0x7f marked for de-assignment
    NOTE: Disk DATA_0005 in mode 0x7f marked for de-assignment
    NOTE: Disk DATA_0006 in mode 0x7f marked for de-assignment
    NOTE: Disk  in mode 0x7f marked for de-assignment
    NOTE: Disk  in mode 0x7f marked for de-assignment
    NOTE: Disk DATA_0009 in mode 0x7f marked for de-assignment
    ERROR: diskgroup DATA was not mounted
    ORA-15032: not all alterations performed
    ORA-15040: diskgroup is incomplete
    ORA-15042: ASM disk "8" is missing from group number "2" 
    ORA-15042: ASM disk "7" is missing from group number "2" 
    ORA-15038: disk '/dev/rhdisk37' mismatch on 'Time Stamp' with target disk group [2129689239] [2062898314]
    ERROR: alter diskgroup data mount
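    The mount failure comes down to two facts. Disks 7 and 8 are missing from group 2. /dev/rhdisk37 is visible, but its header timestamp does not match the DATA disk group. The hypothetical helper below extracts both facts from the error stack with regular expressions based on the ORA-15042 and ORA-15038 lines shown above. The function and field names are assumptions for illustration. The two bracketed ORA-15038 numbers are returned as-is, without guessing which one belongs to the disk and which to the group.

```python
import re

# Patterns based on the ORA-15042 / ORA-15038 lines in the mount log above.
MISSING = re.compile(r'ORA-15042: ASM disk "(\d+)" is missing from group number "(\d+)"')
MISMATCH = re.compile(
    r"ORA-15038: disk '([^']+)' mismatch on '([^']+)' "
    r"with target disk group \[(\d+)\] \[(\d+)\]"
)


def summarize_mount_failure(log_text: str) -> dict:
    """Hypothetical helper: list missing ASM disks and header mismatches."""
    # (group_number, disk_number) pairs, sorted for stable output.
    missing = sorted({(int(g), int(d)) for d, g in MISSING.findall(log_text)})
    mismatches = [
        {"path": path, "field": field, "values": (int(a), int(b))}
        for path, field, a, b in MISMATCH.findall(log_text)
    ]
    return {"missing": missing, "mismatches": mismatches}


if __name__ == "__main__":
    sample = "\n".join([
        'ORA-15042: ASM disk "8" is missing from group number "2" ',
        'ORA-15042: ASM disk "7" is missing from group number "2" ',
        "ORA-15038: disk '/dev/rhdisk37' mismatch on 'Time Stamp' "
        "with target disk group [2129689239] [2062898314]",
    ])
    print(summarize_mount_failure(sample))
```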
    

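    We suspected that rhdisk37, a member of the failing disk group, had also been added to the ASM of the other RAC cluster. In other words, the two ASM environments were using the same physical disk. Analysis at the AIX operating-system level confirmed this: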

    --- On the host where the ASM disk group was extended
    # lscfg -vpl hdisk15
      hdisk15          U78C5.001.DQD076A-P2-C4-T1-W200C00A098BC9A83-L0  MPIO NetApp FCP Default PCM Disk
    
            Manufacturer................NETAPP  
            Machine Type and Model......LUN C-Mode      
            ROS Level and ID............9000
            Serial Number...............80DYz]L/OpCA
            Device Specific.(Z0)........FAS8020         
    
    
      PLATFORM SPECIFIC
    
      Name:  disk
        Node:  disk
        Device Type:  block
    
    --- On the host whose disk group was dismounted
    # lscfg -vpl hdisk37      
      hdisk37          U5802.001.9K87776-P1-C1-T1-W200500A098BC9A83-L0  MPIO NetApp FCP Default PCM Disk
    
            Manufacturer................NETAPP  
            Machine Type and Model......LUN C-Mode      
            ROS Level and ID............9000
            Serial Number...............80DYz]L/OpCA
            Device Specific.(Z0)........FAS8020         
    
    
      PLATFORM SPECIFIC
    
      Name:  disk
        Node:  disk
        Device Type:  block
    

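    The lscfg output confirms that the two RAC clusters were using the same LUN: both devices report serial number 80DYz]L/OpCA. This is why one of the disk groups failed. A check on the host where the disk had just been added showed how much damage had been done. Because of the rebalance, roughly 380 GB of data had already been written to the new disk. This means the old disk group has lost at least 380 GB of its data on that disk.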
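    In ASM terms, the ~380 GB figure can be converted into a count of overwritten allocation units (AUs). The post does not state the disk group's AU size; the real value is the ALLOCATION_UNIT_SIZE column (in bytes) of V$ASM_DISKGROUP. The sketch below therefore treats the AU size as a parameter and shows 1 MB and 4 MB only as example values. It also assumes binary units (1 GB = 1024 MB).

```python
import math


def overwritten_aus(written_gb: float, au_size_mb: int) -> int:
    """Approximate number of AUs covered by `written_gb` of rebalanced data.

    Assumes binary units (1 GB = 1024 MB) and rounds a partial AU up.
    This is only a count; it says nothing about *which* AUs were hit.
    """
    return math.ceil(written_gb * 1024 / au_size_mb)


if __name__ == "__main__":
    for au_mb in (1, 4):  # example AU sizes, not taken from the incident
        print(f"AU size {au_mb} MB -> {overwritten_aus(380, au_mb)} AUs overwritten")
```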


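    In this situation the dismounted disk group uses external redundancy, so it cannot simply be mounted again. The only option is the approach used in these earlier, similar cases: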
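    Recovery from a completely destroyed ASM disk header
    Recovery after an ASM disk was added to a VG
    Recovery after an ASM disk was overwritten with dd
    Recovery after part of an ASM disk was wiped
    Another case: recovery after an ASM disk was mistakenly added to a VG and the LV was extended
    Database recovery after fdisk partitioning destroyed an ASM disk
    Another recovery after an ASM disk was formatted as an ext3 file system
    Oracle ASM disk format recovery: formatted as an ext4 file system
    Oracle ASM disk format recovery: formatted as an NTFS file system
    Recovery from ORA-15063: ASM discovered an insufficient number of disks for diskgroup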
    Through low-level processing, we recovered the data held in the blocks that had not been overwritten.
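    Conceptually, the low-level step is a set difference. An extent of the old disk group is recoverable only if its AU on the shared disk was not reused by the other disk group. Because the group has external redundancy, a reused AU has no mirror copy to fall back on. The sketch below models this with plain Python structures. The extent-map format, the disk number 7, the AU numbers and the file names are all hypothetical. In a real recovery the inputs would come from low-level reads of ASM metadata: the old group's extent maps and the new group's allocation table. This code does not perform those reads.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical extent map of the old disk group:
# file name -> ordered list of (disk_number, au_number), one entry per extent.
ExtentMap = Dict[str, List[Tuple[int, int]]]


def classify_extents(extent_map: ExtentMap, shared_disk: int,
                     overwritten_aus: Set[int]) -> Dict[str, Dict[str, List[int]]]:
    """Split each file's extent indexes into recoverable and lost ones.

    An extent counts as lost only if it lives on the shared disk AND its AU
    was reused by the other disk group (the new group's own metadata AUs,
    such as AU 0 holding its disk header, belong in `overwritten_aus` too).
    With external redundancy there is no mirror, so lost means unrecoverable.
    """
    result: Dict[str, Dict[str, List[int]]] = {}
    for name, extents in extent_map.items():
        ok: List[int] = []
        lost: List[int] = []
        for idx, (disk, au) in enumerate(extents):
            if disk == shared_disk and au in overwritten_aus:
                lost.append(idx)
            else:
                ok.append(idx)
        result[name] = {"recoverable": ok, "lost": lost}
    return result


if __name__ == "__main__":
    emap = {"users01.dbf": [(0, 100), (7, 50), (7, 51), (3, 20)]}
    print(classify_extents(emap, shared_disk=7, overwritten_aus={0, 50, 51}))
```

    Files with no lost extents can be extracted intact. Files with lost extents can still yield the blocks in their surviving extents, which matches the partial recovery described in this post.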

    Finally, DUL was used to extract the data, which completed the recovery of the core data for this incident.


