
    Bug 21915719 Database hang or may fail to OPEN in 12c IBM AIX or HPUX Itanium – ORA-742, DEADLOCK or ORA-600 [kcrfrgv_nextlwn_scn] ORA-600 [krr_process_read_error_2]

    Posted by 惜分飞 on 2025-01-21 11:13:42

    Contact: phone/WeChat (+86 17813235971), QQ (107644445)


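    Author: 惜分飞 © All rights reserved [Reproduction in any form without the author's consent is prohibited; the author reserves the right to pursue legal action.]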

    Applies to:
    Oracle Database – Enterprise Edition – Version 12.1.0.1 to 12.1.0.2 [Release 12.1]
    Oracle Database Cloud Schema Service – Version N/A and later
    Gen 1 Exadata Cloud at Customer (Oracle Exadata Database Cloud Machine) – Version N/A and later
    Oracle Cloud Infrastructure – Database Service – Version N/A and later
    Oracle Database Cloud Exadata Service – Version N/A and later
    IBM AIX on POWER Systems (64-bit)

    Description
    Oracle 12c enables multiple LGWR processes by default. On IBM AIX, and potentially on HPUX Itanium 64bit, this new feature may lead to a DEADLOCK / database hang, ORA-742 “Log read detects lost write”, ORA-600 [kcrfrgv_nextlwn_scn] during instance OPEN, or ORA-600 [krr_process_read_error_2] during recovery.
    The database may become unusable and fail to OPEN.

    Occurrence
    This issue is specific to RDBMS version 12c (12.1.0.1 or 12.1.0.2), which introduced multiple LGWRs as a new default feature.
    It affects databases on IBM AIX and potentially on HPUX Itanium 64bit.
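
The occurrence criteria above can be turned into a quick inventory check. The following is a minimal sketch, not part of the original note: the function names and the platform substrings are assumptions chosen for illustration, and the version range is taken directly from the "Applies to" section.

```python
import re

# Releases named in the note as affected (12.1.0.1 through 12.1.0.2).
AFFECTED_MIN = (12, 1, 0, 1)
AFFECTED_MAX = (12, 1, 0, 2)

# Assumption: substrings used to recognise the platforms named in the note.
# Adjust them to whatever platform label your inventory actually records.
CONFIRMED_PLATFORM_HINTS = ("aix",)
POTENTIAL_PLATFORM_HINTS = ("hp-ux ia", "hpux itanium", "hp-ux itanium")


def parse_version(version):
    """Return the first four numeric components of a version string."""
    parts = [int(p) for p in re.findall(r"\d+", version)[:4]]
    if len(parts) < 4:
        raise ValueError(f"expected at least four version components: {version!r}")
    return tuple(parts)


def exposure(version, platform):
    """Classify a database as 'affected', 'potentially affected' or 'not listed'."""
    if not (AFFECTED_MIN <= parse_version(version) <= AFFECTED_MAX):
        return "not listed"
    name = platform.lower()
    if any(hint in name for hint in CONFIRMED_PLATFORM_HINTS):
        return "affected"
    if any(hint in name for hint in POTENTIAL_PLATFORM_HINTS):
        return "potentially affected"
    return "not listed"


if __name__ == "__main__":
    print(exposure("12.1.0.2.0", "IBM AIX on POWER Systems (64-bit)"))
```

"not listed" only means the combination is not named in this note; it is not a statement that the database is free of the bug.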

    Symptoms
    ORACLE on IBM AIX or HPUX Itanium 64bit with RDBMS Version 12c.

    DEADLOCK or ORA-742 “Log read detects lost write” or ORA-600 [kcrfrgv_nextlwn_scn] during instance OPEN or ORA-600 [krr_process_read_error_2] during Recovery caused by bug 21915719.

    PMON may terminate the instance while extensive block recovery is being performed.

    One DEADLOCK pattern is two LG0[n] processes waiting on ‘LGWR worker group ordering’ while blocking each other. Example from a System State Dump trace file:

    PROCESS 18: LG01
    SO: 0x7000101f95ad720, type: 4, owner: 0x7000101f84195f8, flag: INIT/-/-/0x00
    if: 0x3 c: 0x3
       proc=0x7000101f84195f8, name=session, file=ksu.h LINE:13590 ID:, pg=0
    conuid=0
      (session) sid: 865 ser: 1 trans: 0x0, creator: 0x7000101f84195f
     Current Wait Stack:
       0: waiting for 'LGWR worker group ordering'
          lwn_id=0x58, phase=0x1, =0x0
          wait_id=4947 seq_num=4948 snap_id=1
          wait times: snap=13 min 21 sec, exc=13 min 21 sec, total=13 min 21 sec
          wait times: max=infinite, heur=13 min 21 sec
          wait counts: calls=1 os=267
          in_wait=1 iflags=0x5a0
      There is at least one session blocking this session.
        Dumping 1 direct blocker(s):
          inst: 1, sid: 817, ser: 1
        Dumping final blocker:
          inst: 1, sid: 817, ser: 1
      There are 730 sessions blocked by this session.
    .
    .
    PROCESS 17: LG00
    SO: 0x7000101f85bcc60, type: 4, owner: 0x7000101f93eeb20, flag: INIT/-/-/0x00
    if: 0x3 c: 0x3
       proc=0x7000101f93eeb20, name=session, file=ksu.h LINE:13590 ID:, pg=0
    conuid=0
      (session) sid: 817 ser: 1 trans: 0x0, creator: 0x7000101f93eeb20
      ksuxds FALSE at location: 0
      service name: SYS$BACKGROUND
      Current Wait Stack:
       0: waiting for 'LGWR worker group ordering'
          lwn_id=0x56, phase=0x1, =0x0
          wait_id=1630680 seq_num=57841 snap_id=1
          wait times: snap=13 min 21 sec, exc=13 min 21 sec, total=13 min 21 sec
          wait times: max=infinite, heur=13 min 21 sec
          wait counts: calls=2 os=268
          in_wait=1 iflags=0x15a0
      There is at least one session blocking this session.
        Dumping 1 direct blocker(s):
          inst: 1, sid: 865, ser: 1
        Dumping final blocker:
          inst: 1, sid: 865, ser: 1
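
The mutual blocking in the excerpt above (LG01 blocked by sid 817, LG00 blocked by sid 865) can be spotted mechanically. Below is a minimal sketch that assumes the dump uses exactly the line shapes shown in this excerpt; real System State Dumps are far larger and their layout varies between versions, so the regular expressions and function names here are illustrative assumptions, not an Oracle-provided tool. Only the first direct blocker of each session is recorded.

```python
import re

PROCESS_RE = re.compile(r"^\s*PROCESS\s+\d+:\s*(\S+)")
SESSION_RE = re.compile(r"\(session\)\s+sid:\s*(\d+)")
BLOCKER_RE = re.compile(r"inst:\s*\d+,\s*sid:\s*(\d+)")


def parse_blockers(dump_text):
    """Map each session's sid to (process name, sid of its first direct blocker or None)."""
    sessions = {}
    name = sid = None
    in_direct = False
    for line in dump_text.splitlines():
        m = PROCESS_RE.match(line)
        if m:  # a new PROCESS section starts
            name, sid, in_direct = m.group(1), None, False
            continue
        m = SESSION_RE.search(line)
        if m and name is not None:
            sid = int(m.group(1))
            sessions[sid] = (name, None)
            continue
        if "direct blocker(s)" in line:
            in_direct = True
            continue
        m = BLOCKER_RE.search(line)
        if m and in_direct and sid is not None:
            sessions[sid] = (name, int(m.group(1)))
            in_direct = False  # ignore the "final blocker" lines that follow
    return sessions


def find_cycle(sessions, start):
    """Follow direct-blocker links from start; return the cycle as a list of sids, or []."""
    path = []
    sid = start
    while sid in sessions and sid not in path:
        path.append(sid)
        sid = sessions[sid][1]
    return path[path.index(sid):] if sid in path else []


# Trimmed copy of the trace excerpt shown above.
SAMPLE = """\
PROCESS 18: LG01
  (session) sid: 865 ser: 1 trans: 0x0, creator: 0x7000101f84195f
   0: waiting for 'LGWR worker group ordering'
    Dumping 1 direct blocker(s):
      inst: 1, sid: 817, ser: 1
    Dumping final blocker:
      inst: 1, sid: 817, ser: 1
PROCESS 17: LG00
  (session) sid: 817 ser: 1 trans: 0x0, creator: 0x7000101f93eeb20
   0: waiting for 'LGWR worker group ordering'
    Dumping 1 direct blocker(s):
      inst: 1, sid: 865, ser: 1
"""

if __name__ == "__main__":
    sessions = parse_blockers(SAMPLE)
    cycle = find_cycle(sessions, 865)
    # prints: LG01 (sid 865) -> LG00 (sid 817)
    print(" -> ".join(f"{sessions[s][0]} (sid {s})" for s in cycle))
```

A non-empty cycle made up only of LG0[n] processes matches the deadlock signature described in this note; an empty list means no loop was found starting from that session.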
     

    The instance may fail to OPEN with errors ORA-600 [kcrfrgv_nextlwn_scn] and/or ORA-600 [krr_process_read_error_2]:

    Recovery Session Failed with:
    
    ORA-00283: recovery session canceled due to errors
    ORA-00600: internal error code, arguments: [krr_process_read_error_2],
    
    
    Alter database open fails with:
    
    ORA-00600: internal error code, arguments: [kcrfrgv_nextlwn_scn] .....
    ORA-600 signalled during: ALTER DATABASE OPEN...
    

    Workaround
    Disable the new feature of multiple LGWR worker processes by proactively setting _use_single_log_writer=true.

    Setting _use_single_log_writer = true is a safe workaround; it restores the pre-12c behavior, when multiple LGWR worker groups were not available.

    ALTER SYSTEM SET "_use_single_log_writer"=TRUE SID='*' SCOPE=SPFILE;
    -- Restart the database or all instances of the RAC database
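
To confirm that the workaround has actually been recorded for every instance, one option is to inspect a text parameter file. The sketch below assumes a text pfile (for example one written by CREATE PFILE FROM SPFILE) whose entries have the form sid.name=value, with '*' meaning all instances; the function names are hypothetical and the parsing is deliberately simple.

```python
import re

PARAM = "_use_single_log_writer"

# Assumption: entries look like  sid.name=value  and '#' starts a comment.
ENTRY_RE = re.compile(
    r"^\s*(?:(?P<sid>[^.=\s#]+)\.)?(?P<name>[^=\s#]+)\s*=\s*(?P<value>[^#]*)"
)


def single_log_writer_settings(pfile_text):
    """Return {sid: True/False} for every entry that sets the parameter."""
    settings = {}
    for line in pfile_text.splitlines():
        m = ENTRY_RE.match(line)
        if not m or m.group("name").strip('"').lower() != PARAM:
            continue
        sid = m.group("sid") or "*"  # no SID prefix is treated as all instances
        value = m.group("value").strip().strip("'\"").upper()
        settings[sid] = value == "TRUE"
    return settings


def is_set_for_all(settings):
    """True only if '*' is TRUE and no SID-specific entry overrides it with FALSE."""
    return bool(settings.get("*", False)) and all(settings.values())


if __name__ == "__main__":
    sample = "*.db_name='orcl'\n*._use_single_log_writer=TRUE\n"
    print(is_set_for_all(single_log_writer_settings(sample)))
```

This only shows what is stored in the parameter file. As the comment after the ALTER SYSTEM statement says, the database (or all RAC instances) must be restarted before the setting takes effect.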
    

    Note that as long as _use_single_log_writer=true is not set, ORA-600 [kcrfrgv_nextlwn_scn] may be raised and prevent the database from opening. The parameter works by preventing the redo log inconsistencies that cause this error from being introduced, so once the problem has already occurred, setting _use_single_log_writer=true may not fix it.
    If the parameter does not help because the problem was introduced before _use_single_log_writer=true was proactively set, then Point in Time Recovery (PITR) or Flashback Database are the options to recover from this situation.
    Reference: ALERT: Bug 21915719 Database hang or may fail to OPEN in 12c IBM AIX or HPUX Itanium – ORA-742, DEADLOCK or ORA-600 [kcrfrgv_nextlwn_scn] ORA-600 [krr_process_read_error_2] (Doc ID 1957710.1)
