一个ORA-4031的处理案例
客户的一个9207 RAC因为ora-4031导致了宕机。分析4031问题,如果是事后,那么TRACE文件是最好的分析工具,我们来看4031 TRACE里各种对象的情况:
===============================
Memory Utilization of Subpool 1
===============================
Allocation Name Size
_________________________ __________
"free memory " 32773512
"miscellaneous " 15862600
"partitioning i " 0
"sim trace entries " 245760
"gcs resources " 21925568
"PL/SQL DIANA " 401496
"trigger defini " 0
"KQR M PO " 49688
"krvxrr " 253056
"trigger inform " 0
"FileOpenBlock " 6028960
"transaction " 470104
"fixed allocation callback" 376
"replication session stats" 269040
"ges enqueues " 4222088
"KGK heap " 552
"trigger source " 0
"PL/SQL MPCODE " 26448
"errors " 0
"pl/sql source " 0
"sim memory hea " 515944
"ges resources " 85367744
"dictionary cache " 1071104
"PL/SQL PPCODE " 0
"Temporary Tables State Ob" 209968
"parameters " 0
"db_block_hash_buckets " 3924384
"library cache " 2541616
"db_handles " 928000
"sql area " 157400
"sessions " 597584
"KGLS heap " 80272
"gcs shadows " 2519272
"branch " 316616
"event statistics per sess" 2507024
"Checkpoint queue " 1283200
"table definiti " 0
"partitioning d " 0
===============================
Memory Utilization of Subpool 2
===============================
Allocation Name Size
_________________________ __________
"free memory " 33796496
"miscellaneous " 17424112
"KGLS heap " 134008
"transaction " 405408
"trigger defini " 0
"errors " 0
"ges enqueues " 4138672
"PL/SQL MPCODE " 0
"Checkpoint queue " 1283200
"ksm_file2sga region " 370496
"KCL name table " 1471944
"PL/SQL DIANA " 0
"gcs resources " 5824744
"KCL extra lock elements " 691200
"sim memory hea " 515944
"trigger inform " 648
"parameters " 0
"session param values " 1577760
"ges resources " 85415480
"partitioning d " 0
"gcs shadows " 2517168
"ges regular msg buffers " 893048
"dictionary cache " 33792
"KQR S SO " 8192
"partitioning i " 0
"KQR M PO " 210328
"trigger source " 0
"library cache " 3063304
"processes " 1056000
"sql area " 795296
"sessions " 597584
"MTTR advisory " 179472
"event statistics per sess" 2518368
"fixed allocation callback" 424
"KGK heap " 6448
"table definiti " 0
"db_block_hash_buckets " 352256
"ges resource hash table " 2490368
===============================
Memory Utilization of Subpool 3
===============================
Allocation Name Size
_________________________ __________
"free memory " 28924264
"miscellaneous " 15446808
"KQR M PO " 135704
"gcs resources " 5824744
"parameters " 0
"KGLS heap " 16456
"ges big msg buffers " 3835688
"Checkpoint queue " 1283200
"trigger defini " 0
"PLS non-lib hp " 2088
"ges resources " 72820856
"KQR L PO " 131176
"PL/SQL MPCODE " 37392
"joxs heap init " 4240
"enqueue " 1646960
"gcs shadows " 18621616
"MTTR advisory " 149216
"KSXR receive buffers " 1034000
"table definiti " 0
"ges process array " 1057320
"trigger source " 0
"sim memory hea " 520184
"dictionary cache " 1065728
"db_block_hash_buckets " 3924400
"partitioning i " 0
"1M buffer " 1049600
"PL/SQL DIANA " 0
"library cache " 2219312
"partitioning d " 0
"sql area " 128528
"trigger inform " 0
"sessions " 600288
"event statistics per sess" 2507024
"ges enqueues " 4264344
"PL/SQL PPCODE " 0
"errors " 0
"fixed allocation callback" 336
"ktlbk state objects " 521528
===============================
Memory Utilization of Subpool 4
===============================
Allocation Name Size
_________________________ __________
"free memory " 35164632
"miscellaneous " 16038016
"fixed allocation callback" 376
"partitioning i " 0
"table definiti " 224
"KGLS heap " 197648
"KQR L PO " 343440
"gcs resources " 5822416
"PL/SQL MPCODE " 2488
"PX subheap " 41008
"ges reserved msg buffers " 2096008
"gcs resource hash table " 524288
"PL/SQL PPCODE " 0
"transaction " 404520
"gcs shadows " 2519272
"partitioning d " 0
"trigger inform " 0
"1M buffer " 1049600
"message pool freequeue " 324464
"KSXR pending messages que" 853952
"PL/SQL SOURCE " 0
"ges enqueues " 4100120
"dictionary cache " 2103808
"ges resources " 85545768
"enqueue resources " 535104
"trigger defini " 0
"errors " 0
"library cache " 3616560
"PL/SQL DIANA " 44672
"sql area " 691424
"sessions " 597584
"KQR M PO " 28232
"event statistics per sess" 2507024
"parameters " 0
"DML lock " 820368
"sim memory hea " 515944
"Checkpoint queue " 1283200
"trigger source " 0
服务器有16个CPU,因此共享池被分为4个子池。每个子池128M。从上面看,当时的空闲空间超过100M。在分配一个2268字节的对象的时候报错。说明共享池的碎片化十分严重。什么导致共享池的碎片呢?我们注意到一个对象:ges resources,在每个子池中总计超过300M。大多数共享池空间都被GES RESOURCES占用了。而GES RESOURCE是可以动态扩展的PERMENT OBJECT,每个资源占用484个字节。在RAC环境中,由于GES RESOURCE的大量动态扩展,会导致共享池中碎片大规模出现,最终导致系统不可用。在这个案例中,处理这个问题的方法很多,最简单的情况,是内存充足的情况下,直接加大共享池的大小就可以解决问题了。如果要找到GES RESOURCES为何如此巨大的原因,要复杂的多,要配合应用一起来分析。这里就不做讨论了。
发表于 koko 在 2008年10月10日, 04:53 下午 CST #