今天,客户端某数据库遭遇大量library cache lock 和cursor pin S wait on X的问题,经诊断发现是由于他们底层的ASM存储是多个库共享的,故障时间段别的库占用大量的IO,导致paitsm库的IO响应变慢,db file sequential read从平时的20多毫秒,到了故障时的400多毫秒。且这个库大量的sql使用了SPM,库中有近5000条baseline,连select 1 from dual这样的简单SQL也使用了SPM。在这2个因素的共同作用下,导致了大量library cache lock 和cursor pin S wait on X。分析过程如下:(一)9:23 数据库中出现大量library cache lock,大约有1400个session处于该等待。去War Room之前,做了 hangeanalyze,从hanganalyze的文件分析,存在多种堵塞链条:117 'db file sequential read'<='readbyothersession'<='librarycachelock' 13 'dbfilesequentialread'<='readbyothersession'<='librarycachelock'<='cursor:pinS' 13 'controlfilesequentialread'<='logfileswitchcompletion' 12 'librarycachelock'<='librarycachelock'<='librarycachelock'...<='librarycachelock'<=... 5 'dbfilesequentialread'<='readbyothersession' 1 <='cursor:pinSwaitonX'1 'librarycachelock'<='cursor:pinSwaitonX'可以看到,大部分的阻塞链还是’db file sequential read’<='read by other session'<='library cache lock'。也就是说,db file sequential导致了library cache lock。看上去还是非常像IO慢的问题导致library cache lock的。如:sample_id 140056627时, sid 3654(select 1 from dual,library cache lock) is blocked by=> sid 1171(select 1 from dual,read by other session, obj为SQLOBJ$_PKEY) is blocked by=> sid 1018(SELECT NVL(MAX(MYDATE), TO_DATE(’2007-09-01′, ‘YYYY-MM-DD’)) FROM TAB_ABC_99 WHERE AKSBBCCD_ID = :B1,db file sequential read, obj为SQLOBJ$_PKEY)等待library cache lock的sql,和阻塞library cache lock的sql是一样的,都是select 1 from dual,但是最终的holder的sql,是另外的SQL。注:在MOS中,虽然没有关于SQLOBJ$PKEY的详细说明,但是能查询的大部分资料都和spm有关(二)dba kill library cache lock(三)10:08 ,发现再次出现大量cursor: pin S wait on X,final blocking session是sid为2475的session,处于db file scattered read。在做process dump之后,kill os process。我们通过process dump,我们可以看到的阻塞链是:db file scattered read<= library cache lock,即dump的时候,该session处于db file scattered read。在process dump中,我们看到,2745的session大约堵塞了35个session,其中一个session是3032,处于library cache lock,请求44ab09e98的handle,而往下查时,我们可以看到,2745的该session是以X模式hold住了44ab09e98的handle,所以3032的session就无法获得,只能等2745的session释放。process dump:Current Wait Stack: 0: waiting for 'db file scattered read'<<<<< file#=0x2,block#=0x8388,blocks=0x8 wait_id=13067seq_num=13352snap_id=1 waittimes:snap=0.222181sec,exc=0.222181sec,total=0.222181sec waittimes:max=infinite,heur=0.222181sec waitcounts:calls=0os=0 in_wait=1iflags=0x5a0 Thereare35sessionsblockedbythissession. <<<<< Dumpingonewaiter: <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< inst:1,sid:3032,ser:46627 waitevent:'librarycachelock' <<<<<<<<<<<<<<<<<< p1:'handleaddress'=0x44ab09e98<<<<<<<<< p2:'lockaddress'=0x45a527f50 p3:'100*mode+namespace'=0x310002 row_wait_obj#:4294967295,block#:0,row#:0,file#0 min_blocked_time:1581secs,waiter_cache_ver:20549 WaitState: fixed_waits=0flags=0x2aboundary=0x4b9549e48/0 SessionWaitHistory: ... SO:0x4006de370,type:78,owner:0x4b26741e8,flag:INIT/-/-/0x00if:0x3c:0x3 proc=0x4bcd810b8,name=LIBRARYOBJECTPIN,file=kgl.hLINE:8480,pg=0 LibraryObjectPin:Address=0x4006de370Handle=0x44ab09e98<<<<<<<<<<=library cache lock<=cursor: pin S wait on X如:sample_id 140059277时, sid 2515 (sql_id 535a6y4mx7gn1,cursor: pin S wait on X) is blocked by=> sid 3032 (sql_id 535a6y4mx7gn1,library cache lock) is blocked by=> sid 2745 (sql_id 535a6y4mx7gn1,db file sequential read, obj也还是SQLOBJ$_PKEY)注:最后所读的对象,还是SQLOBJ$_PKEY。我们在process dump中,也可以看到,在db file scattered read之前,确实处于db file sequential read:Sampled Session History of session 2745 serial 11423…… The history is displayed in reverse chronological order. sample interval: 1 sec, max history 120 sec --------------------------------------------------- [1 sample, 10:08:46] waited for 'db file sequential read', seq_num: 13344 p1: 'file#'=0x2 p2: 'block#'=0x35fbd p3: 'blocks'=0x1 time_waited: >= 0 sec (still in wait) [1 sample, 10:08:45] waited for 'db file sequential read', seq_num: 13342 p1: 'file#'=0x2 p2: 'block#'=0x4be90 p3: 'blocks'=0x1 time_waited: 0.958431 sec (sample interval: 0 sec) [1 sample, 10:08:44] waited for 'db file sequential read', seq_num: 13335 p1: 'file#'=0x2 p2: 'block#'=0x108c39 p3: 'blocks'=0x1 time_waited: 0.860525 sec (sample interval: 0 sec) [1 sample, 10:08:43] waited for 'db file sequential read', seq_num: 13327 p1: 'file#'=0x2 p2: 'block#'=0xda9ae p3: 'blocks'=0x1 time_waited: 0.828370 sec (sample interval: 0 sec) [1 sample, 10:08:42] waited for 'db file sequential read', seq_num: 13321 p1: 'file#'=0x2 p2: 'block#'=0x93975 p3: 'blocks'=0x1 time_waited: 0.858617 sec (sample interval: 0 sec)……(四)11:29 再次去War Room,此时再次做 hanganalyze,看到的阻塞链为:382 'resmgr:cpu quantum'<='librarycachelock' 47 'resmgr:cpuquantum'<='librarycachelock'<='cursor:pinSwaitonX'2 'logfileparallelwrite'<='logfilesync'如:sample_id 140063747时, sid 32 (sql_id 535a6y4mx7gn1,cursor: pin S wait on X) is blocked by=> sid 1968 (sql_id 535a6y4mx7gn1,library cache lock) is blocked by=> sid 1172 (sql_id 535a6y4mx7gn1,resmgr:cpu quantum)综上4个时间点,我们可以看出,阻塞链为:‘db file sequential read ‘<='library cache lock'<='cursor: pin S wait on X'或者'resmgr:cpu quantum'<='library cache lock'<='cursor: pin S wait on X'Db file sequential read 的对象都是SQLOBJ$_PKEY。由于resource manager的作用,导致处理排队,从而引发library cache lock比较好理解,但是IO慢为何导致了library cache lock?我们看到library cache handle的holder持有了比较长的时间,一直在处于读SQLOBJ$_PKEY,这是因为SQL在做解析时,需要持有handle,而且需要等解析完,才会释放handle。而由于我们的sql是有SPM固定的,在解析的时候,需要去访问SQLOBJ$_PKEY,而SQLOBJ$_PKEY的对象,由于IO差,访问的慢,或者read by other session正在被别的session访问,而别的session也遭受着IO差的问题。因此,耗费了较多的时间在访问SQLOBJ$_PKEY上。再对于SQLOBJ$_PKEY的访问没有结束前,解析不算完成,因此也就不能释放handle。因此别的session就处于library cache lock了。另外,cursor pin S wait on X被library cache lock堵塞,我们用3个session来说明原因:Session A,由于IO慢,hold住了handle,释放的慢。Session B,需要获得handle,但是被session A hold着,所以处于library cache lock等待。而这个session,在执行select 1 from dual的时候,在解析时会生成一个child cursor,并以exclusive模式pin住这个child cursor。Session C,由于和session B执行的是一个sql,需要以shared模式共享上面的child cursor,但是由于session B还没有完成解析,还没生成完整的child cursor,所以也无法共享这个child cursor。所以处于cursor pin Swait on X的等待。Session C: ---------------------------------------- SO: 0x4b8cf13f0, type: 2, owner: (nil), flag: IN ... Current Wait Stack: 0: waiting for 'cursor: pin S wait on X' idn=0x27d3be81, value=0x7b000000000, where=0x500000000 wait_id=106 seq_num=107 snap_id=1 wait times: snap=42 min 55 sec, exc=42 min 55 sec, total=42 min 55 sec wait times: max=infinite, heur=42 min 55 sec wait counts: calls=0 os=0 in_wait=1 iflags=0x5a2 There is at least one session blocking this session. Dumping 1 direct blocker(s): inst: 1, sid: 1968, ser: 18357 Dumping final blocker: inst: 1, sid: 3702, ser: 34891 Wait State: fixed_waits=0 flags=0x22 boundary=(nil)/-1 Session Wait History: ... ---------------------------------------- SO: 0x2c903d538, type: 77, owner: 0x4b521c698, flag: INIT/-/-/0x00 if: 0x3 c: 0x3 proc=0x4b8cf13f0, name=LIBRARY OBJECT LOCK, file=kgl.h LINE:8476, pg=0 LibraryObjectLock: Address=0x2c903d538 Handle=0x476a55410 Mode=N CanBeBrokenCount=1 Incarnation=1 ExecutionCount=0 ClusterLock=0x448fa74e0 Context=0x2b6f6bc43078 User=0x4b521c698 Session=0x4b521c698 ReferenceCount=1 Flags=CBK/[0020] SavepointNum=0 LibraryHandle: Address=0x476a55410 Hash=0 LockMode=N PinMode=X LoadLockMode=0 Status=VALD Name: Namespace=SQL AREA(00) Type=CURSOR(00) Statistics: InvalidationCount=0 ExecutionCount=0 LoadCount=1 ActiveLocks=51 TotalLockCount=51 TotalPinCount=52 Counters: BrokenCount=1 RevocablePointer=1 KeepDependency=0 BucketInUse=0 HandleInUse=0 HandleReferenceCount=0 Concurrency: DependencyMutex=0x476a554c0(0, 0, 0, 0) Mutex=0x48bdf4e68(0, 602180, 1294, 0) Flags=RON/PIN/PN0/EXP/CHD/[10012111] WaitersLists: Lock=0x476a554a0[0x476a554a0,0x476a554a0] Pin=0x476a55480[0x476a55480,0x476a55480] LibraryObject: Address=0x292c0e140 HeapMask=0000-0001-0001-0000 Flags=EXS[0000] Flags2=[0000] PublicFlags=[0000] DataBlocks: Block: #='0' name=KGLH0^27d3be81 pins=0 Change=NONE Heap=0x476a55358 Pointer=0x292c0e1e0 Extent=0x292c0e0c0 Flags=I/-/P/A/-/- FreedLocation=0 Alloc=3.148438 Size=3.976562 LoadTime=9484515620 Block: #='6' name=SQLA^27d3be81 pins=0 Change=NONE Heap=0x413820d28 Pointer=0x292c0cd28 Extent=0x292c0c0c0 Flags=I/-/P/A/-/E FreedLocation=0 Alloc=0.000000 Size=0.000000 LoadTime=0 NamespaceDump: Child Cursor: Heap0=0x292c0e1e0 Heap6=0x292c0cd28 Heap0 Load Time=06-25-2015 10:56:38 Heap6 Load Time=06-25-2015 10:56:38 ---------------------------------------- KGX Atomic Operation Log 0x448fa74e0 Mutex 0x413820c80(1968, 0) idn 27d3be81 oper GET_SHRD<<<<<<<<<<<<<<<<< CursorPinuid1185efd0whr5slp5774 opr=2pso=0x2c903d538flg=0 pcs=0x413820bf0nxt=(nil)flg=35cld=6hd=0x476a55410par=0x48bdf3d68 ct=1fhsh=0unp=(nil)unn=0hvl=5bc57b98nhv=1ses=0x4b93927b8 hep=0x413820c80flg=80ld=1ob=0x292c0e140ptr=0x292c0cd28fex=0x292c0c0c0 SessionB: ---------------------------------------- SO:0x4b8db8470,type:2,owner:(nil),flag:INIT/-/-/0x00if:0x3c:0x3 proc=0x4b8db8470,name=process,file=ksu.hLINE:12451,pg=0 (process)Oraclepid:1010,ser:17,callscur/top:0x2d4aee7a0/0x3b5259db0 flags:(0x0)- flags2:(0x0), flags3:(0x0) intrerror:0,callerror:0,sesserror:0,txnerror0 intrqueue:empty ksudlpFALSEatlocation:0 (postinfo)lastpostreceived:0026 ... CurrentWaitStack: 0:waitingfor'librarycachelock' handleaddress=0x44ab09e98, <<<<<<<<<<<<,100*mode+namespace=0x310002 wait_id=281seq_num=282snap_id=1 waittimes:snap=68min27sec,exc=68min27sec,total=68min27sec waittimes:max=infinite,heur=68min27sec waitcounts:calls=1379os=1379 in_wait=1iflags=0x15a2 Thereisatleastonesessionblockingthissession. Dumping1directblocker(s): inst:1,sid:1172,ser:47967 Dumpingfinalblocker: inst:1,sid:1210,ser:61171 Thereare50sessionsblockedbythissession. Dumpingonewaiter:... SO:0x2f9db9f48,type:77,owner:0x2d4aee7a0,flag:INIT/-/-/0x00if:0x3c:0x3 proc=0x4b8db8470,name=LIBRARYOBJECTLOCK,file=kgl.hLINE:8476,pg=0 LibraryObjectLock: Address=0x2f9db9f48Handle=0x44ab09e98<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< CursorPinuid1968efd0whr1slp0 opr=3pso=0x3bd2c7330flg=0 pcs=0x413820bf0nxt=(nil)flg=35cld=6hd=0x476a55410par=0x48bdf3d68 ct=1fhsh=0unp=(nil)unn=0hvl=5bc57b98nhv=1ses=0x4b93927b8 ---------------------------------------- SO:0x2f9db9f48,type:77,owner:0x2d4aee7a0,flag:INIT/-/-/0x00if:0x3c:0x3 proc=0x4b8db8470,name=LIBRARYOBJECTLOCK,file=kgl.hLINE:8476,pg=0 LibraryObjectLock: Address=0x2f9db9f48Handle=0x44ab09e98RequestMode=S<<<<<<<<<<<<<<<<<<<<<<