Skip to main content

Command Palette

Search for a command to run...

How direct-path inserts and one extra row per loop bloated 18 MB into 8 GB

Updated
10 min readView as Markdown
U
DBA Team Lead and Oracle ACE Pro. Building database tools and helping teams manage Oracle infrastructure.

I was investigating the code that looked something like this:

truncate table t1;
for i in 1..300 loop
   insert /*+ append */ into t1
      select 1 from dual connect by level <= p_connect_by_level;      
   commit;
end loop;

On first glance, it might look like it makes sense, the use of /*+ append */ suggests we're expecting a lot of data to be loaded using given select. But, what if the opposite is true: the loop runs many times and each time it inserts rather little data? What if p_connect_by_level is small; say, for example, somewhere around 3.5k rows?

Notice the segment size change if we insert only one row per iteration more:

p_connect_by_level segment size
3285 ~ 18 MB
3286 ~ 8GB

Why?

direct-path load

Experienced readers already noticed the many INSERT /*+ APPEND */ calls (300x), which is the culprit here.

Every table has a High Water Mark (HWM), which is the highest point in its allocated space that has ever been used. Blocks below the HWM are not necessarily full, in fact, they may be empty, e.g. after deletes, or blocks skipped by direct loads (which is the case here).

When you use /*+ APPEND */ hint, Oracle does its best to use a direct-path load, which won't search for free space in existing blocks, instead, it writes above the HWM. What "above the HWM" means exactly depends on whether or not this is the first direct-path insert to given table in a transaction.

If this is the first direct-path insert in given transaction, then "above the HWM" means it won't use existing extents. It will allocate new extents and write there.

On the other hand, if this is the subsequent insert in same transaction, it will reuse previously allocated extents (within same transaction). But still - it will write above the HWM.

Regardless, HWM always points to a block. So, even if you insert one tiny single row and leave plenty of space in a block, such insert will start with the next block (and waste whatever is free in blocks below HWM even if this is subsequent insert in same transaction).

(My assumption: the load only writes into space whose bitmaps it created itself, and that's why it skips extents left behind by earlier transactions.)

Unrestricted Direct Loads
Some might remember the old days, when you couldn't do a second insert on the same table after a direct-path load in the same transaction, without committing first. Luckily, the feature called Unrestricted Direct Loads was introduced in 23c, which allows to do more than one such insert without getting ORA-12838: cannot read/modify an object after modifying it in parallel.

What can we measure?

Before trying to deduce how the extent allocation algorithm works, let's discuss what can we measure after each insert/commit. Or, at least, what I measured while trying to figure this out.

dba_extents

It's a view, which has a list of all extents allocated to a single segment. Using simple aggregate functions, we can use it, for example, to figure out how many extents of which sizes are allocated.

select blocks, bytes/1024/1024 mb, count(*)
   from dba_extents 
   where owner = c_owner
      and segment_name = c_segment
   group by blocks, bytes;

Here are the results for our example of p_connect_by_level=3286 :

    BLOCKS         MB   COUNT(*)
---------- ---------- ----------
         8      .0625        308
       128          1         60
      1024          8        119
      8192         64        110
      7936         62          4

So, there is 110 extents with size of 64MB allocated.

A bit less obvious thing we can gather from this output is that, given our example of 300 inserts and commits; this amounts to 601 extents. Almost exactly half (308) of them are 64KB in size:

  • 300 for each opening 64k extent per insert

  • 1 for initial extent created at table-creation time

  • 7 for the early "overflow" extents, from when the segment was still under 1MB in size.

dbms_space

Surely, not all ~ 8 GB of space is fully utilized, right? We can use dbms_space.space_usage and dbms_space.unused_space to see how many blocks are:

  • unformatted

  • 0-25% full

  • 25-50% full

  • 50%-75% full

  • 75%-100% full

  • totally full

  • unused

Here's a quick example of dbms_space.space_usage for our example:

   procedure print_usage is
      l_spc_rec space_debug_tab%rowtype;
   begin
       dbms_space.space_usage(
         segment_owner => c_owner,
         segment_name => c_segment,
         segment_type => 'TABLE',
         unformatted_blocks => l_spc_rec.unformatted_blocks,
         unformatted_bytes => l_spc_rec.unformatted_bytes,
         fs1_blocks => l_spc_rec.fs1_blocks,
         fs1_bytes => l_spc_rec.fs1_bytes,
         fs2_blocks => l_spc_rec.fs2_blocks,
         fs2_bytes => l_spc_rec.fs2_bytes,
         fs3_blocks => l_spc_rec.fs3_blocks,
         fs3_bytes => l_spc_rec.fs3_bytes,
         fs4_blocks => l_spc_rec.fs4_blocks,
         fs4_bytes => l_spc_rec.fs4_bytes,
         full_blocks => l_spc_rec.full_blocks,
         full_bytes => l_spc_rec.full_bytes);
       dbms_space.unused_space(
         segment_owner => c_owner,
         segment_name => c_segment,
         segment_type => 'TABLE',
         total_blocks => l_spc_rec.total_blocks,
         total_bytes => l_spc_rec.total_bytes,
         unused_blocks => l_spc_rec.unused_blocks,
         unused_bytes => l_spc_rec.unused_bytes,
         last_used_extent_file_id => l_spc_rec.last_used_extent_file_id,
         last_used_extent_block_id => l_spc_rec.last_used_extent_block_id,
         last_used_block => l_spc_rec.last_used_block);
         
      dbms_output.put_line(rpad('_', 30)                  || '|' || lpad('Blocks', 15)                     || '|' || lpad('Bytes', 15));
      dbms_output.put_line(rpad('Unformatted', 30)        || '|' || lpad(l_spc_rec.unformatted_blocks ,15) || '|' || lpad(l_spc_rec.unformatted_bytes, 15));
      dbms_output.put_line(rpad('0 to 25% used', 30)      || '|' || lpad(l_spc_rec.fs4_blocks ,15)         || '|' || lpad(l_spc_rec.fs4_bytes, 15));
      dbms_output.put_line(rpad('25 to 50% used', 30)     || '|' || lpad(l_spc_rec.fs3_blocks ,15)         || '|' || lpad(l_spc_rec.fs3_bytes, 15));
      dbms_output.put_line(rpad('50 to 75% used', 30)     || '|' || lpad(l_spc_rec.fs2_blocks ,15)         || '|' || lpad(l_spc_rec.fs2_bytes, 15));
      dbms_output.put_line(rpad('75 to <100% used', 30)   || '|' || lpad(l_spc_rec.fs1_blocks ,15)         || '|' || lpad(l_spc_rec.fs1_bytes, 15));
      dbms_output.put_line(rpad('Full', 30)               || '|' || lpad(l_spc_rec.full_blocks ,15)        || '|' || lpad(l_spc_rec.full_bytes, 15));
      dbms_output.put_line(rpad('Unused', 30)             || '|' || lpad(l_spc_rec.unused_blocks ,15)      || '|' || lpad(l_spc_rec.unused_bytes, 15));
      dbms_output.put_line(rpad('Total', 30)              || '|' || lpad(l_spc_rec.total_blocks ,15)       || '|' || lpad(l_spc_rec.total_bytes, 15));
   end;

And for our testcase it prints the following numbers:

_                             |         Blocks|          Bytes
Unformatted                   |              0|              0
0 to 25% used                 |        1036614|     8491941888
25 to 50% used                |              0|              0
50 to 75% used                |              0|              0
75 to <100% used              |              0|              0
Full                          |           1800|       14745600
Unused                        |           8183|       67035136
Total                         |        1048992|     8593342464

Note that there is no "Empty" block row, only 0 to 25% used. Most of the blocks referred to there are completely empty.

block dump

When all else fails, we can inspect every single block using:

alter system dump datafile <file_id> block <block_number>;

We'll use this later to check if a specific block contains metadata or data.

Proof / Test

Using previously discussed methods, we can do only one iteration (instead of 300) and measure it. Here are the numbers:

num of rows extent count blocks marked full
3285 2 5
3286 3 6

First extent is skipped/wasted as explained earlier. Thus, extent count is > 1 in both cases (3285 rows and 3286 rows). 3285 rows fit into 5 blocks and 3286 rows did not, so an additional extent was needed to accommodate one extra row/block.

Each of those extents in this case is 8 blocks in size.

The first 3 blocks of each opening extent are metadata. I've confirmed this using block dump as explained earlier. So, 5 blocks remain available for rows to be inserted. To verify this, we can look at the Dump of First Level Bitmap (which refers to both extents, 16 blocks, 8 per extent):

 EXTENT_ID    FILE_ID   BLOCK_ID
---------- ---------- ----------
         0         16     524304
         1         16   14417920
         2         16   14417928
         
SQL> alter system dump datafile 16 block min 14417920 block max 14417927;        
...
  --------------------------------------------------------
  DBA Ranges :
  --------------------------------------------------------
   0x00dc0000  Length: 8      Offset: 0
   0x00dc0008  Length: 8      Offset: 8

   0:Metadata   1:Metadata   2:Metadata   3:FULL
   4:FULL   5:FULL   6:FULL   7:FULL
   8:FULL   9:100% free   10:100% free   11:100% free
   12:100% free   13:100% free   14:100% free   15:100% free
...

Reading the bitmap against my testcase: blocks 3–7 are the 3285 rows that fit in the opening extent. Block 8 - the first block of the "overflow" extent - holds the single 3286th row, yet is marked FULL: a direct load marks every block it writes as FULL, regardless of how empty it is. Blocks 9–15 are formatted, empty, and will be skipped by every subsequent direct load in our testcase. This is the space that accumulates into gigabytes once the "overflow" extents grow to 8 MB and 64 MB.

Visual Representation

.. of what we've just discussed.

  • M means block containing metadata,

  • F means a block marked as full,

  • . means an empty block

3285 rows case:

  Extent 0: initial, created at table creation, skipped by the load)
  Extent 1: M | M | M | F | F | F | F | F | (3 meta, 5 full)

3286 rows case:

  Extent 0: (initial, created at table creation, skipped by the load)
  Extent 1: M | M | M | F | F | F | F | F | (3 meta, 5 full)
  Extent 2: F | . | . | . | . | . | . | . | (1 containing 1 row, 7 empty)

Autoallocate

This whole post assumes that tablespace used for our experiment has AUTOALLOCATE set.

In case where 3285 rows fit one 64K extent, this means that after 300x such insert+commit we have 301 extents (one for each insert and one "wasted" extent, created at table creation time), each 64K in size. To confirm:

select bytes, count(*)
   from dba_extents
   where segment_name='T1' and owner='CHEMISTRY'
   group by bytes;
     BYTES   COUNT(*)
---------- ----------
     65536        301

With 3286 rows, it's a bit more interesting. We get twice as many extents:

  COUNT(*)
----------
       601

but, they're of different sizes (due to AUTOALLOCATE). I was unable to find exact documentation on how autoallocate works, but, according to my testcase, it goes like this:

Each insert (after a commit) first allocates 64KB. Then it allocates new "overflow" extent (for one single row that doesn't fit). But, this "overflow" extent size is determined according to the current segment size. Bigger the segment, bigger that "overflow" extent. According to the data I gathered:

select cur.max_blocks * 8 / 1024   as new_extent_mb,
       prev.sum_blocks * 8 / 1024  as segment_mb_before
   from space_debug_tab cur
   join space_debug_tab prev on prev.testcase = cur.testcase
      and prev.iteration = cur.iteration - 1
   where cur.testcase = 'B'
      and cur.max_blocks > prev.max_blocks
   order by cur.iteration;
NEW_EXTENT_MB SEGMENT_MB_BEFORE
------------- -----------------
            1             .9375
            8           64.6875
           64          1024.125

So, when the segment reaches 1GB in size, each "overflow" extent size is 64 MB - but in our case, it only holds 1 single row. Those almost empty "overflow" extents count towards segment size, so, at the end, we have ~8 GB segment size with ~14 MB of actual data.

What to do about it

A few very obvious quick tips on how to avoid wasting space:

  • batch small appends into one transaction (you can remove 'commit' from given testcase and you'll end up with 18x 64KB + 14x 1MB extents instead of ~8GB of space wasted)

  • use one large insert instead of many small ones,

  • or simply use conventional insert instead of direct-path insert

Final Notes

I've run the described testcase on Oracle Database 23.26.2. Standard Edition in OCI. You can download the full PL/SQL package i used for testing.

Here's a few links with additional info on the topics discussed in this post: