Here I will sum some information about what I have learned from the Tanel Poder's video about the ORA-4031 error.
To see the informations about the sharedd pool, you can query the x$ksmsp table. It's not safe to use this table in the production database, because qurying will take the latch on the shared pool and than goes through the whole shared pool. that can mean that the instance database will be frozen for some time. In very active oltp or in the system with large shared pool this freeze can take several seconds or minutes (In case of a bug it can take hours or forever). Whe shared poll is locked, you cant connect even as sysdba. Only possibility is to kill -9 the process which is holding the lock. PMON will then cleanup the session and release the latch.
The x$ksmsp means the kernel shared memory management shared pool.
The number of chunks in the shared pool:
SYS@SODS1 SQL> select count(*) from x$ksmsp;
COUNT(*)
----------
53116
To know the number of chunks is not eneough. So we will query the chunk class (KSMCHCLS)
SYS@SODS1 SQL> select KSMCHCLS,count(*) from x$ksmsp group by KSMCHCLS;
KSMCHCLS COUNT(*)
-------- ----------
recr 25762
freeabl 23154
R-freea 83
perm 54
R-free 37
R-perm 2
free 4010
Here we can see that:
There are 4010 free chunks, 37 free chunks in the reserved pool. 23154 chunks are freable by the same process (kernel module) which has allocated it, 25762 are recreatable by any process. 54 chunks are used for permanent subpool (this is used for several v$ tables).
The reserved area in the shared pool is used for the bigger allocations. Normaly ihe maximum size of the small allocation is 4096 bytes. When needed, oracle can splilt the cursor or execution plan in more chunks. (works only for sql and not pl/sql). For the bigger allocations, the reserved area will be used. To see tme minimum allocation size which can use the reserved area is:
SYS@SODS1 SQL> @pd min_alloc
Show all parameters and session values from x$ksppi/x$ksppcv...
INDX I_HEX NAME VALUE DESCRIPTION
---------- ----- -------------------------------------------------- ------------------------------ ----------------------------------------------------------------------
182 B6 _shared_pool_reserved_min_alloc 4400 minimum allocation size in bytes for reserved area of shared pool
The space allocated for big allocations:
SYS@SODS1 SQL> @pd shared%reserved
Show all parameters and session values from x$ksppi/x$ksppcv...
INDX I_HEX NAME VALUE DESCRIPTION
---------- ----- -------------------------------------------------- ------------------------------ ----------------------------------------------------------------------
180 B4 shared_pool_reserved_size 29360128 size in bytes of reserved area of shared pool
181 B5 _shared_pool_reserved_pct 5 percentage memory of the shared pool allocated for the reserved area
182 B6 _shared_pool_reserved_min_alloc 4400 minimum allocation size in bytes for reserved area of shared pool
To see the sum of space used in each class:
SYS@SODS1 SQL> select KSMCHCLS,count(*),sum(KSMCHSIZ) from x$ksmsp group by KSMCHCLS;
KSMCHCLS COUNT(*) SUM(KSMCHSIZ)
-------- ---------- -------------
recr 25787 50374760
freeabl 23170 74217040
R-freea 83 126200
perm 54 364235864
R-free 37 28426184
R-perm 2 33547040
free 3984 53049520
Here we can see that there are cca 50MB of free space in the shared pool.
The shared pool is allways fragmented. Oracle can allocate chunks of 1KB, 2KB, 4KB near to each other. When we try to allocate new chunk of 4KB and there is no free chunk in the freelist, oracle tries to free not used chunks. After some time, there can be a gap between the chunks. If you free 3KB chunk, you cant fit the 4KB in int. On the other side, when oracle frees 2 chunks near to each other he will merge them. Oracel flushes 8 chunks at a time. If he is not able to to find large enough continuos space, it will flush another 8.
When you issue the "alter system flush shared_pool;" oracle will try to free the chunks and merge them. Oracle cant flush used cursor. So not all the space will be freed.
See the min and max size of the chunk:
SYS@SODS1 SQL> select KSMCHCLS,count(*),sum(KSMCHSIZ),min(KSMCHSIZ),max(KSMCHSIZ) from x$ksmsp group by KSMCHCLS;
KSMCHCLS COUNT(*) SUM(KSMCHSIZ) MIN(KSMCHSIZ) MAX(KSMCHSIZ)
-------- ---------- ------------- ------------- -------------
recr 25798 50399648 56 4190376
freeabl 23258 74414328 32 2097176
R-freea 83 126200 48 30816
perm 54 364235864 80 15937536
R-free 37 28426184 2184 839496
R-perm 2 33547040 16772192 16774848
free 3907 52827344 40 7802296
We have cca 52MB of free space and maximal size of the chunk is 7MB.
Let's try the flush shared_pool:
SYS@SODS1 SQL> alter system flush shared_pool;
System altered.
SYS@SODS1 SQL> select KSMCHCLS,count(*),sum(KSMCHSIZ),min(KSMCHSIZ),max(KSMCHSIZ) from x$ksmsp group by KSMCHCLS;
KSMCHCLS COUNT(*) SUM(KSMCHSIZ) MIN(KSMCHSIZ) MAX(KSMCHSIZ)
-------- ---------- ------------- ------------- -------------
recr 6647 13250080 56 4190376
freeabl 6702 30201016 32 2097176
R-freea 73 23960 48 20504
perm 54 364235864 80 15937536
R-free 36 28528424 2184 839496
R-perm 2 33547040 16772192 16774848
free 2833 134190224 40 9897960
We can see, that the number of chunks is smaller after the flush. That is because oracle has merged the free chunks near to each other. The free space went up from 7MB to 9 MB.
Shared pool uses the LRU mechanism (least recently used). That means, that it tries to flush the least used chunks first.
The LRU has 2 parts: TRANSIENT end and RECURRENT end. Whe you execute the query, then the sql is pinned twice. First time when you parse and thane when you execute it. If you use the sql next time, it's pinned third time and oracle will move it to the transient end. Whe oracle tries to allocate a new chunk and there is no free chunk in the freelist( shared pool is full) it will try to find out if there is any hidden memory. If not it will flush the chunks in the transient end first. When no beig enough chunks are in the transient end, it will look in the reccurent end.
Metrics in v$systat which are showing that we can have problem with shared pool allocations:
sql area evicted - somebody has thrown the execution plan of the shared pool
CCursor + sql area evicted - somebody thrown away cursor or sql from pool
hard parses - soft parse occres every time (softer soft parse can occur when statement is found in session cursor cache).
To monitor hard parses:
@snapper stats,gather=s,sinclude=parse.*count|evicted 5 1 all <- monitor hard parses
------------------------------------------------------------------------------------------------------------------------------------
SID @INST, USERNAME , TYPE, STATISTIC , DELTA, HDELTA/SEC, %TIME, GRAPH
------------------------------------------------------------------------------------------------------------------------------------
52 @1, EVM_BYPASS, STAT, sql area evicted , 1, .16, ,
52 @1, EVM_BYPASS, STAT, CCursor + sql area evicted , 1, .16, ,
52 @1, EVM_BYPASS, STAT, parse count (total) , 4, .63, ,
52 @1, EVM_BYPASS, STAT, parse count (hard) , 4, .63, ,
102 @1, MNISCAK , STAT, sql area evicted , 884, 139.45, ,
102 @1, MNISCAK , STAT, CCursor + sql area evicted , 861, 135.82, ,
102 @1, MNISCAK , STAT, parse count (total) , 2724, 429.7, ,
102 @1, MNISCAK , STAT, parse count (hard) , 907, 143.08, ,
You can see, that my user did 2724 parses during the 5 seconds interval. 907 of the were hard parses.
Info about the activity in the shared pool can be see in the x$kghlu table
The x$ksmsp means the kernel shared memory management shared pool.
The number of chunks in the shared pool:
SYS@SODS1 SQL> select count(*) from x$ksmsp;
COUNT(*)
----------
53116
To know the number of chunks is not eneough. So we will query the chunk class (KSMCHCLS)
SYS@SODS1 SQL> select KSMCHCLS,count(*) from x$ksmsp group by KSMCHCLS;
KSMCHCLS COUNT(*)
-------- ----------
recr 25762
freeabl 23154
R-freea 83
perm 54
R-free 37
R-perm 2
free 4010
Here we can see that:
There are 4010 free chunks, 37 free chunks in the reserved pool. 23154 chunks are freable by the same process (kernel module) which has allocated it, 25762 are recreatable by any process. 54 chunks are used for permanent subpool (this is used for several v$ tables).
The reserved area in the shared pool is used for the bigger allocations. Normaly ihe maximum size of the small allocation is 4096 bytes. When needed, oracle can splilt the cursor or execution plan in more chunks. (works only for sql and not pl/sql). For the bigger allocations, the reserved area will be used. To see tme minimum allocation size which can use the reserved area is:
SYS@SODS1 SQL> @pd min_alloc
Show all parameters and session values from x$ksppi/x$ksppcv...
INDX I_HEX NAME VALUE DESCRIPTION
---------- ----- -------------------------------------------------- ------------------------------ ----------------------------------------------------------------------
182 B6 _shared_pool_reserved_min_alloc 4400 minimum allocation size in bytes for reserved area of shared pool
The space allocated for big allocations:
SYS@SODS1 SQL> @pd shared%reserved
Show all parameters and session values from x$ksppi/x$ksppcv...
INDX I_HEX NAME VALUE DESCRIPTION
---------- ----- -------------------------------------------------- ------------------------------ ----------------------------------------------------------------------
180 B4 shared_pool_reserved_size 29360128 size in bytes of reserved area of shared pool
181 B5 _shared_pool_reserved_pct 5 percentage memory of the shared pool allocated for the reserved area
182 B6 _shared_pool_reserved_min_alloc 4400 minimum allocation size in bytes for reserved area of shared pool
To see the sum of space used in each class:
SYS@SODS1 SQL> select KSMCHCLS,count(*),sum(KSMCHSIZ) from x$ksmsp group by KSMCHCLS;
KSMCHCLS COUNT(*) SUM(KSMCHSIZ)
-------- ---------- -------------
recr 25787 50374760
freeabl 23170 74217040
R-freea 83 126200
perm 54 364235864
R-free 37 28426184
R-perm 2 33547040
free 3984 53049520
Here we can see that there are cca 50MB of free space in the shared pool.
The shared pool is allways fragmented. Oracle can allocate chunks of 1KB, 2KB, 4KB near to each other. When we try to allocate new chunk of 4KB and there is no free chunk in the freelist, oracle tries to free not used chunks. After some time, there can be a gap between the chunks. If you free 3KB chunk, you cant fit the 4KB in int. On the other side, when oracle frees 2 chunks near to each other he will merge them. Oracel flushes 8 chunks at a time. If he is not able to to find large enough continuos space, it will flush another 8.
When you issue the "alter system flush shared_pool;" oracle will try to free the chunks and merge them. Oracle cant flush used cursor. So not all the space will be freed.
See the min and max size of the chunk:
SYS@SODS1 SQL> select KSMCHCLS,count(*),sum(KSMCHSIZ),min(KSMCHSIZ),max(KSMCHSIZ) from x$ksmsp group by KSMCHCLS;
KSMCHCLS COUNT(*) SUM(KSMCHSIZ) MIN(KSMCHSIZ) MAX(KSMCHSIZ)
-------- ---------- ------------- ------------- -------------
recr 25798 50399648 56 4190376
freeabl 23258 74414328 32 2097176
R-freea 83 126200 48 30816
perm 54 364235864 80 15937536
R-free 37 28426184 2184 839496
R-perm 2 33547040 16772192 16774848
free 3907 52827344 40 7802296
We have cca 52MB of free space and maximal size of the chunk is 7MB.
Let's try the flush shared_pool:
SYS@SODS1 SQL> alter system flush shared_pool;
System altered.
SYS@SODS1 SQL> select KSMCHCLS,count(*),sum(KSMCHSIZ),min(KSMCHSIZ),max(KSMCHSIZ) from x$ksmsp group by KSMCHCLS;
KSMCHCLS COUNT(*) SUM(KSMCHSIZ) MIN(KSMCHSIZ) MAX(KSMCHSIZ)
-------- ---------- ------------- ------------- -------------
recr 6647 13250080 56 4190376
freeabl 6702 30201016 32 2097176
R-freea 73 23960 48 20504
perm 54 364235864 80 15937536
R-free 36 28528424 2184 839496
R-perm 2 33547040 16772192 16774848
free 2833 134190224 40 9897960
We can see, that the number of chunks is smaller after the flush. That is because oracle has merged the free chunks near to each other. The free space went up from 7MB to 9 MB.
Shared pool uses the LRU mechanism (least recently used). That means, that it tries to flush the least used chunks first.
The LRU has 2 parts: TRANSIENT end and RECURRENT end. Whe you execute the query, then the sql is pinned twice. First time when you parse and thane when you execute it. If you use the sql next time, it's pinned third time and oracle will move it to the transient end. Whe oracle tries to allocate a new chunk and there is no free chunk in the freelist( shared pool is full) it will try to find out if there is any hidden memory. If not it will flush the chunks in the transient end first. When no beig enough chunks are in the transient end, it will look in the reccurent end.
Metrics in v$systat which are showing that we can have problem with shared pool allocations:
sql area evicted - somebody has thrown the execution plan of the shared pool
CCursor + sql area evicted - somebody thrown away cursor or sql from pool
hard parses - soft parse occres every time (softer soft parse can occur when statement is found in session cursor cache).
To monitor hard parses:
@snapper stats,gather=s,sinclude=parse.*count|evicted 5 1 all <- monitor hard parses
------------------------------------------------------------------------------------------------------------------------------------
SID @INST, USERNAME , TYPE, STATISTIC , DELTA, HDELTA/SEC, %TIME, GRAPH
------------------------------------------------------------------------------------------------------------------------------------
52 @1, EVM_BYPASS, STAT, sql area evicted , 1, .16, ,
52 @1, EVM_BYPASS, STAT, CCursor + sql area evicted , 1, .16, ,
52 @1, EVM_BYPASS, STAT, parse count (total) , 4, .63, ,
52 @1, EVM_BYPASS, STAT, parse count (hard) , 4, .63, ,
102 @1, MNISCAK , STAT, sql area evicted , 884, 139.45, ,
102 @1, MNISCAK , STAT, CCursor + sql area evicted , 861, 135.82, ,
102 @1, MNISCAK , STAT, parse count (total) , 2724, 429.7, ,
102 @1, MNISCAK , STAT, parse count (hard) , 907, 143.08, ,
You can see, that my user did 2724 parses during the 5 seconds interval. 907 of the were hard parses.
Info about the activity in the shared pool can be see in the x$kghlu table
FREE UNPIN
UNSUCCESS <--- ora-04031 We were not able to find free chunk after the flush.
LAST FRUNP
UNSUCC SIZE <-- the size of the unsuccessful allocation
If you cant allocate the chunk of 4KB, it's really a bad fragmentation.
Shared poll (like every other pool) works with granules. The granule size depends on the size of sga. In my exaple wit h1GB of SGA the size of the granule is 4MB:
SYS@MNTMP1 SQL> @pd ksmg
Show all parameters and session values from x$ksppi/x$ksppcv...
INDX I_HEX NAME VALUE DESCRIPTION
---------- ----- -------------------------------------------------- ------------------------------ ----------------------------------------------------------------------
436 1B4 _ksmg_granule_size 4194304 granule size in bytes
The size of the granule and size of the each pool can be get from v$sga_dynamic_components.
The size of reserved pool:
SYS@MNTMP1 SQL> show parameter reserved
PARAMETER_NAME TYPE VALUE
------------------------------------------------------------ --------------------------------------------------------------------------- ----------------------------------------------------------------------------------------------------
shared_pool_reserved_size big integer 15309209
Parameter _shared_pool_reserved_pct sets the number of % of the shared pool which will be reserved for the reserved area.
Parameter _kghdsidx_count sets thu number of subpools in the shared pool. Depends on the size of the shared pool. According to Tanel, there is no need to have more than 3 subpools (This parameter should be mentioned on the MOS). you can see the number of the subpools in the x$kghlu table.
To see the most flushers in the database, you can query the x$ksmlru table:
SYS@MNTMP1 SQL> @ksmlru
IDX DUR FLUSHED ALLOC_COMMENT ALLOC_SIZE OBJECT_NAME HASH_VALUE SES_ADDR
--- --- ---------- -------------------- ---------- -------------------------------------------------- ---------- ----------------
1 0 2720 parameter handle 3152 0 000000009F1E0C98
1 0 1808 parameter table blo 2072 0 000000009F1E0C98
1 0 744 KQR X PO 1176 select location_name, user#,... 1018695163 000000009ED53260
1 0 408 KQR L PO 1072 select location_name, user#,... 1018695163 000000009ED53260
1 0 328 KGLHD 1040 0 000000009EEC1B60
1 0 280 KGLHD 1024 V$DB_CACHE_ADVICE 2832437830 000000009EDAECA0
1 0 272 KGLHD 1032 GV$DB_CACHE_ADVICE 1364340235 000000009EDAECA0
1 0 176 kgllk 2136 SELECT inst_id, KTTEFINFOTSN... 2413039534 000000009F1E0C98
1 0 112 KGLHD 912 GV$JAVA_POOL_ADVICE 4100145568 000000009EDAECA0
1 0 8 kglHeapInitialize:t 4096 0 000000009F352678
The x$ksmlru will reset itself after queriing. So when you need to troubeshoot it, store the output in some table with timestamp.
Example of the error:
create or replace procedure bigp as
*
ERROR at line 1:
ORA-04031: unable to allocate 4316944 bytes of shared memory ("shared pool","unknown object","PLDIA^5247287d","PTG: Node")
Here you can see that we were trying to allocate 4316944 bytes in the shared pool. The "unknown objects" means, that it was anonymous PL/SQL block. Otherwise the sql text would be there. Other thing you can try is to flush shared pool. It's better to try the flush two times in a row. It will flush more chunks and hopefully it can merge some to fit your query.
When you hit this error in the PROD, what you can do is to enlarge the shared pool. there is no need to add 1GB, but try to add 100MB and see if it will help.
UNSUCCESS <--- ora-04031 We were not able to find free chunk after the flush.
LAST FRUNP
UNSUCC SIZE <-- the size of the unsuccessful allocation
If you cant allocate the chunk of 4KB, it's really a bad fragmentation.
Shared poll (like every other pool) works with granules. The granule size depends on the size of sga. In my exaple wit h1GB of SGA the size of the granule is 4MB:
SYS@MNTMP1 SQL> @pd ksmg
Show all parameters and session values from x$ksppi/x$ksppcv...
INDX I_HEX NAME VALUE DESCRIPTION
---------- ----- -------------------------------------------------- ------------------------------ ----------------------------------------------------------------------
436 1B4 _ksmg_granule_size 4194304 granule size in bytes
The size of the granule and size of the each pool can be get from v$sga_dynamic_components.
The size of reserved pool:
SYS@MNTMP1 SQL> show parameter reserved
PARAMETER_NAME TYPE VALUE
------------------------------------------------------------ --------------------------------------------------------------------------- ----------------------------------------------------------------------------------------------------
shared_pool_reserved_size big integer 15309209
Parameter _shared_pool_reserved_pct sets the number of % of the shared pool which will be reserved for the reserved area.
Parameter _kghdsidx_count sets thu number of subpools in the shared pool. Depends on the size of the shared pool. According to Tanel, there is no need to have more than 3 subpools (This parameter should be mentioned on the MOS). you can see the number of the subpools in the x$kghlu table.
To see the most flushers in the database, you can query the x$ksmlru table:
SYS@MNTMP1 SQL> @ksmlru
IDX DUR FLUSHED ALLOC_COMMENT ALLOC_SIZE OBJECT_NAME HASH_VALUE SES_ADDR
--- --- ---------- -------------------- ---------- -------------------------------------------------- ---------- ----------------
1 0 2720 parameter handle 3152 0 000000009F1E0C98
1 0 1808 parameter table blo 2072 0 000000009F1E0C98
1 0 744 KQR X PO 1176 select location_name, user#,... 1018695163 000000009ED53260
1 0 408 KQR L PO 1072 select location_name, user#,... 1018695163 000000009ED53260
1 0 328 KGLHD 1040 0 000000009EEC1B60
1 0 280 KGLHD 1024 V$DB_CACHE_ADVICE 2832437830 000000009EDAECA0
1 0 272 KGLHD 1032 GV$DB_CACHE_ADVICE 1364340235 000000009EDAECA0
1 0 176 kgllk 2136 SELECT inst_id, KTTEFINFOTSN... 2413039534 000000009F1E0C98
1 0 112 KGLHD 912 GV$JAVA_POOL_ADVICE 4100145568 000000009EDAECA0
1 0 8 kglHeapInitialize:t 4096 0 000000009F352678
The x$ksmlru will reset itself after queriing. So when you need to troubeshoot it, store the output in some table with timestamp.
Example of the error:
create or replace procedure bigp as
*
ERROR at line 1:
ORA-04031: unable to allocate 4316944 bytes of shared memory ("shared pool","unknown object","PLDIA^5247287d","PTG: Node")
Here you can see that we were trying to allocate 4316944 bytes in the shared pool. The "unknown objects" means, that it was anonymous PL/SQL block. Otherwise the sql text would be there. Other thing you can try is to flush shared pool. It's better to try the flush two times in a row. It will flush more chunks and hopefully it can merge some to fit your query.
When you hit this error in the PROD, what you can do is to enlarge the shared pool. there is no need to add 1GB, but try to add 100MB and see if it will help.