
    Aftermath of a database recovery following a storage failure: handling the ORA-600 [kafspa:columnBuffer1] error

    Posted by 惜分飞 on 2015-08-13 16:22:56

    Contact: mobile (13429648788), QQ (107644445)


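    Author: 惜分飞 © All rights reserved. [This article may be reproduced, but the source address must be credited with a link; otherwise legal action will be pursued.]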

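    Some background first. This database had previously suffered a storage failure: the hardware vendor rebuilt the RAID array, and I then recovered the data. Once the data was recovered, the application vendor validated it, filled in the missing data, and migrated it to another machine for production use. Nobody had looked closely at the database since. During a recent check I found ORA-600 [kafspa:columnBuffer1] errors, which I resolved by deleting the corrupt row.
    Database alert log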

    Mon Aug 10 00:00:21 2015
    LNS: Standby redo logfile selected for thread 1 sequence 617 for destination LOG_ARCHIVE_DEST_2
    Mon Aug 10 00:00:33 2015
    Errors in file D:\APP\ADMINISTRATOR\diag\rdbms\xff\xff\trace\xff_j002_6900.trc  (incident=146517):
    ORA-00600: internal error code, arguments: [kafspa:columnBuffer1], [2883], [1], [], [], [], [], [], [], [], [], []
    Incident details in: D:\APP\ADMINISTRATOR\diag\rdbms\xff\xff\incident\incdir_146517\xff_j002_6900_i146517.trc
    Use ADRCI or Support Workbench to package the incident.
    See Note 411.1 at My Oracle Support for error and packaging details.
    Errors in file D:\APP\ADMINISTRATOR\diag\rdbms\xff\xff\trace\xff_j002_6900.trc:
    ORA-12012: error on auto execute of job "SYS"."ORA$AT_OS_OPT_SY_280"
    ORA-20011: Approximate NDV failed: 
    ORA-00600: internal error code, arguments: [kafspa:columnBuffer1], [2883], [1], [], [], [], [], [], [], [], [], []
    ORA-06512: at "SYS.DBMS_STATS", line 31228
    

    The trace file shows

    *** 2015-07-19 06:00:30.231
    *** SESSION ID:(578.751) 2015-07-19 06:00:30.231
    *** CLIENT ID:() 2015-07-19 06:00:30.231
    *** SERVICE NAME:(SYS$USERS) 2015-07-19 06:00:30.231
    *** MODULE NAME:(DBMS_SCHEDULER) 2015-07-19 06:00:30.231
    *** ACTION NAME:(ORA$AT_OS_OPT_SY_220) 2015-07-19 06:00:30.231
     
    Dump continued from file: D:\APP\ADMINISTRATOR\diag\rdbms\xff\xff\trace\xff_j001_4444.trc
    ORA-00600: internal error code, arguments: [kafspa:columnBuffer1], [2883], [1], [], [], [], [], [], [], [], [], []
    
    ========= Dump for incident 146142 (ORA 600 [kafspa:columnBuffer1]) ========
    
    *** 2015-07-19 06:00:30.231
    dbkedDefDump(): Starting incident default dumps (flags=0x2, level=3, mask=0x0)
    ----- Current SQL Statement for this session (sql_id=g0q33k8qtbcpd) -----
    /* SQL Analyze(1) */ select /*+  full(t)    no_parallel(t) no_parallel_index(t)
     dbms_stats cursor_sharing_exact use_weak_name_resl 
    dynamic_sampling(0) no_monitoring no_substrb_pad 
    …………
    to_char(substrb(dump(max("LIST_NO"),16,0,32),1,120)) from "CHF"."T_XIFENFEI" t 
    …………
    

    Gathering statistics on the table manually

    
    SQL> EXEC DBMS_STATS.gather_table_stats('CHF','T_XIFENFEI',CASCADE=>TRUE);
    BEGIN DBMS_STATS.gather_table_stats('CHF','T_XIFENFEI',CASCADE=>TRUE); END;
    
    *
    ERROR at line 1:
    ORA-20011: Approximate NDV failed: ORA-00600: internal error code, arguments:
    [kafspa:columnBuffer1], [2883], [1], [], [], [], [], [], [], [], [], []
    ORA-06512: at "SYS.DBMS_STATS", line 24232
    ORA-06512: at "SYS.DBMS_STATS", line 24332
    ORA-06512: at line 1
    
    
    SQL> desc "CHF"."T_XIFENFEI"
     Name                                      Null?    Type
     ----------------------------------------- -------- -----------------
    
     VISIT_DATE                                         DATE
    …………
     GETDRUG_FLAG                                       VARCHAR2(2)
    …………
    

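    From the alert log, the trace file, and the manual statistics run above, we can be fairly sure the error is raised when the automatic statistics job gathers statistics on "CHF"."T_XIFENFEI". A search of MOS (My Oracle Support) shows that this kind of problem is mainly caused by a VARCHAR2 column holding data longer than the length defined for it in the table.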

    Verifying Oracle's explanation

    C:\Users\Administrator>exp "'/ as sysdba'" tables="CHF"."T_XIFENFEI" file=y:/1.dmp log=y:/1.log
    
    Export: Release 11.2.0.4.0 - Production on Thu Aug 13 11:03:22 2015
    
    Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.
    
    
    Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
    With the Partitioning, OLAP, Data Mining and Real Application Testing options
    Export done in ZHS16GBK character set and AL16UTF16 NCHAR character set
    
    About to export specified tables via Conventional Path ...
    Current user changed to CHF
    . . exporting table                    T_XIFENFEI
    EXP-00015: error on row 1339552 of table T_XIFENFEI, column GETDRUG_FLAG, datatype 1
    EXP-00001: data field truncation - column length=2, buffer size=2 actual size=17
    Errors in file :
    OCI-21500: internal error code, arguments: [kghfrempty:ds], [0x00652FCC8], [], [], [], [], [], []
    
    
    ----- Call Stack Trace -----
    calling              call     entry                argument values in hex
    location             type     point                (? means dubious value)
    -------------------- -------- -------------------- ----------------------------
    kgerinv_internal()+  CALL???  skgudmp()            000000000 006447680 000000000
    
    139                                                006447680
    kgerinv()+49         CALL???  kgerinv_internal()   000000001 000676B4D 0064985B0
    
                                                       000000000
    kgerin()+49          CALL???  kgerinv()            000000018 000799612 000072000
    
                                                       000000000
    kghnerror()+294      CALL???  kgerin()             006447680 00645092C 006447680
    
                                                       000000001
    kghfrempty()+639     CALL???  kghnerror()          0000001F0 000000000
                                                       BE019800000000 7E01960000
    kghgex()+1433        CALL???  kghfrempty()+368     000000000 00652CAD8 000000000
    
                                                       000000000
    kghfnd()+808         CALL???  kghgex()             001004000 000000000 001BEDD10
    
                                                       001A7131C
    kghalo()+610         CALL???  kghfnd()             00012C450 00012C4A0 000000000
    
                                                       006446FD0
    kghgex()+445         CALL???  kghalo()             006494848 000000000 001BEDD10
    
                                                       00190A575
    kghfnd()+808         CALL???  kghgex()             000000001 0000001A0 000000000
    
                                                       006493D68
    kghalo()+610         CALL???  kghfnd()             000000000 006447680 0FFFFFFFF
    
                                                       006447680
    kpuhhalo()+358       CALL???  kghalo()             000000000 000000178 07FFFFFFF
    
                                                       000000001
    kpuertb_reallocTemp  CALL???  kpuhhalo()           00652C498 000003E84 001C0EA44
    
    Buf()+192                                          000000000
    kpuex_reallocTempBu  CALL???  kpuertb_reallocTemp  000004007 0018BA3BF 00012CAB0
    
    f()+67                        Buf()                001AB296F
    kpudefn()+347        CALL???  kpuex_reallocTempBu  00012CC38 001004000 001BEDD44
    
                                  f()                  000000004
    kpudfn()+1506        CALL???  kpudefn()            00012F3D0 000000004 006520044
    
                                                       000000000
    OCIDefineByPos()+10  CALL???  kpudfn()             004327570 000000000 00012F3D0
    
    2                                                  000000004
    00000001400116E5     CALL???  OCIDefineByPos()     1043B9300 0043B92C0 0044002B8
    
                                                       004401394
    000000014004AFC7     CALL???  00000001400113BA     00012F380 00012F0E0 000000068
    
                                                       14004B2B6
    000000014001E784     CALL???  000000014004A37E     000013F30 140095A71 140097520
    
                                                       14009F540
    00000001400027A7     CALL???  000000014001E39F     14009F838 00012FB5C 140097520
    
                                                       14009F540
    000000014000102C     CALL???  0000000140001E2C     000000005 004327570
                                                       1D0D5749D21764D 000000000
    000000014006BEF0     CALL???  000000014000100E     000130000 1AFBFE2D0D8
                                                       000000000 000000000
    000000007748652D     CALL???  000000014006BDD0     000000000 000000000 000000000
    
                                                       000000000
    00000000775BC521     CALL???  0000000077486520     000000000 000000000 000000000
    
                                                       000000000
    
    
    call stack performance statistics:
    total                  : 0.778000 sec
    setup                  : 0.350000 sec
    stack unwind           : 0.099000 sec
    symbol translation     : 0.021000 sec
    printing the call stack: 0.304000 sec
    printing frame data    : 0.000000 sec
    printing argument data : 0.000000 sec
    
    
    ----- End of Call Stack Trace -----
    

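    The exp run confirms that the data in the GETDRUG_FLAG column is abnormal: the column is defined with a length of 2, yet the actual data length is 17, which clearly does not match.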

    Locating the rowid of the bad row with PL/SQL
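    The PL/SQL block for this step writes into two helper tables, system.had_rows and CHF.T_XIFENFEI_new, that the post does not show being created. A minimal sketch of what they might look like, assuming a three-column log table that matches the VALUES list in the block. Only the ROW_ID column name is confirmed by the later query; TABLE_NAME, ERROR_CODE and the column sizes are made up for illustration.

```sql
-- Hypothetical layout of the error-log table used by the PL/SQL block.
-- Only ROW_ID appears in the post; the other names and sizes are assumptions.
CREATE TABLE system.had_rows (
  table_name  VARCHAR2(100),  -- 'CHF.T_XIFENFEI' in this case
  row_id      ROWID,          -- rowid of the row that could not be copied
  error_code  NUMBER          -- value of SQL%BULK_EXCEPTIONS(n).ERROR_CODE
);

-- Empty copy of the source table to receive the rows that can still be read.
-- The WHERE 1 = 0 predicate copies the column definitions without any rows.
CREATE TABLE CHF.T_XIFENFEI_new AS
  SELECT * FROM CHF.T_XIFENFEI WHERE 1 = 0;
```

    With both tables in place, the rows that copy cleanly end up in T_XIFENFEI_new and the ones that fail are listed in had_rows.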

    SQL> set serveroutput on
    SQL> DECLARE
      TYPE RowIDTab IS TABLE OF ROWID INDEX BY BINARY_INTEGER;
      -- Collect the rowids through the hinted index rather than from the table itself.
      CURSOR c1 IS  select /*+index(t PK_T_XIFENFEI_BAK_NEW)*/ rowid from CHF.T_XIFENFEI t;
      r RowIDTab;
      rows  NATURAL := 20000;   -- rowids fetched per batch
      bad_rows number := 0 ;
      errors number;
      error_code number;
      myrowid rowid;
    BEGIN
      OPEN c1;
      LOOP
        FETCH  c1 BULK COLLECT INTO r LIMIT rows;
        EXIT WHEN r.count=0;
        BEGIN
          -- Copy each row by rowid; SAVE EXCEPTIONS lets the batch continue past failing rows.
          FORALL i IN r.FIRST..r.LAST SAVE EXCEPTIONS
            insert into CHF.T_XIFENFEI_new
            select /*+ ROWID(A) */ *
            from CHF.T_XIFENFEI A where rowid = r(i);
        EXCEPTION
        when OTHERS then
          BEGIN
            -- Log every rowid that could not be copied, together with its error code.
            errors := SQL%BULK_EXCEPTIONS.COUNT;
            FOR err1 IN 1..errors LOOP
              error_code := SQL%BULK_EXCEPTIONS(err1).ERROR_CODE;
              myrowid := r(SQL%BULK_EXCEPTIONS(err1).ERROR_INDEX);
              bad_rows := bad_rows + 1;
              insert into system.had_rows values('CHF.T_XIFENFEI',myrowid, error_code);
            END LOOP;
          END;
        END;
        commit;
      END LOOP;
      commit;
      CLOSE c1;
      dbms_output.put_line('Total Bad Rows: '||bad_rows);
    END;
    /
    Total Bad Rows: 1
    
    PL/SQL procedure successfully completed.
    
    
    SQL> SELECT row_id FROM  system.had_rows ;
    
     ROW_ID
     ------------------
    
     AAAT8wAAEAAAM29AAX
    
    SQL> select * from  CHF.T_XIFENFEI WHERE ROWID='AAAT8wAAEAAAM29AAX';
    select * from  CHF.T_XIFENFEI WHERE ROWID='AAAT8wAAEAAAM29AAX'
                            *
    ERROR at line 1:
    ORA-00600: internal error code, arguments: [kafspa:columnBuffer1], [2883], [1], [], [], [], [], [], [], [], [], []
    
    

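    This pins the problem down to the row with that rowid. After talking to the business side, we confirmed that the row could be deleted (it cannot be read anyway, so keeping it would serve no purpose).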
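    Since the damage goes back to the earlier storage failure, it can be handy to know which datafile and block the bad row sits in. This was not part of the original session. It is only a sketch using the standard DBMS_ROWID package to split the rowid into its components, and the column aliases are my own.

```sql
-- Break the extended rowid of the bad row into its components.
-- CHARTOROWID makes the conversion from the string literal explicit.
SELECT dbms_rowid.rowid_object(CHARTOROWID('AAAT8wAAEAAAM29AAX'))       AS data_object_id,
       dbms_rowid.rowid_relative_fno(CHARTOROWID('AAAT8wAAEAAAM29AAX')) AS relative_fno,
       dbms_rowid.rowid_block_number(CHARTOROWID('AAAT8wAAEAAAM29AAX')) AS block_no,
       dbms_rowid.rowid_row_number(CHARTOROWID('AAAT8wAAEAAAM29AAX'))   AS row_no
FROM   dual;
```

    The query only decodes the rowid value and does not read the table, so it should be safe to run even while the row itself is unreadable.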

    Deleting the bad row

    SQL> delete from CHF.T_XIFENFEI WHERE ROWID='AAAT8wAAEAAAM29AAX';
    
    1 row deleted.
    
    SQL> commit;
    
    Commit complete.
    

    Gathering statistics again

    SQL> EXEC DBMS_STATS.gather_table_stats('CHF','T_XIFENFEI',CASCADE=>TRUE);
    
    PL/SQL procedure successfully completed.
    

    With the bad row removed, the database gathers statistics normally and no longer reports ORA-00600 [kafspa:columnBuffer1]. The fault is resolved fairly cleanly.
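    One way to double-check that the statistics really were refreshed is to look at LAST_ANALYZED in the DBA_TABLES dictionary view. This is offered as a sketch and was not part of the original session.

```sql
-- Confirm that the table statistics carry a fresh timestamp after the gather.
SELECT table_name,
       num_rows,
       TO_CHAR(last_analyzed, 'YYYY-MM-DD HH24:MI:SS') AS last_analyzed
FROM   dba_tables
WHERE  owner = 'CHF'
AND    table_name = 'T_XIFENFEI';
```

    A LAST_ANALYZED value close to the time of the manual gather suggests the run completed for this table.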

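    A few additional observations
    1. analyze table "CHF"."T_XIFENFEI" estimate statistics; completes normally, while gathering with dbms_stats fails (dbms_stats effectively scans every column, whereas analyze presumably does not).
    2. While ORA-00600 [kafspa:columnBuffer1] is being raised, CTAS still succeeds but a conventional insert does not (CTAS is effectively an append operation). In some situations append therefore needs to be used with caution, especially when there is logical block corruption.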

