Dontcheff

Archive for August, 2022|Monthly archive page

How to calculate the size of a database?

In Data, Databases, DBA on August 29, 2022 at 10:03

Often even DBAs have misalignment on what is meant by “database size”. As data and databases grow with a rather high rate, it is important to understand the size of the database estate. Worldwide data is expected to hit 175 zettabytes by 2025, representing a 61% CAGR. Also, 51% of the data will be in data centers and 49% will be in the public cloud. So, when moving to cloud, it is important to know how much storage is needed for the databases.

Ever wondered which is the biggest database in the world? Here is an interesting link: 10 Largest Databases in the World.Really? Sure about that? How do you actually know if it’s true? Who is the source of information, and can we trust it?” I was wondering exactly the same when checking the list.

And also: how should all the data be stored? According to Data: A Lenovo Solutions Perspective, the answer is that within the core (i.e. data center or cloud) the classic relational database will still be the dominant approach for many years. Thus, one more reason to properly estimate and calculate the size of your databases.

So, how how to calculate the size of say an Oracle database (check the links at the end for other database brands)? The most common way is to simply calculate the space which the database files physically consume on disk:

select sum(bytes)/1024/1024/1024 size_in_GB from dba_data_files;

But how about the other files? Like temp, control and redolog files? Are the undo files really part of the database? Do we always want to include the files from SYSTEM and SYSAUX too?

Also, not all this space in the files listed in dba_data_files is necessarily allocated. There could be sections of these files that are not used. Then, we can go with this method:

select sum(bytes)/1024/1024/1024 size_in_GB from dba_segments;

I often prefer using the query which ignores the system, temp and undo data:

select nvl(ceil(sum(bytes)/(1024*1024*1024)),0) size_in_GB
from dba_extents
where tablespace_name not in ('SYSTEM','TEMP','SYSAUX') and tablespace_name not like '%UNDO%';

Another way used is to only calculate the size of the real data from each schema in the database – the methods above include the indexes too. This requires recent analyze (not estimate but rather compute 100%) for all tables.

select owner, nvl(ceil(sum(num_rows*avg_row_len)/(1024*1024*1024)),0) size_in_GB
from dba_tables
where owner not in ('SYSTEM','OUTLN','SYS') group by owner;

However, checking the overall size including TEMP and REDO is perhaps the best approach. There is a MOS note, How to Calculate the Size of the Database (Doc ID 1360446.1) which is also worth reading. Here is the suggested method to calculate the total database size:

select a.data_size+b.temp_size+c.redo_size+d.cont_size "total_size"
from ( select sum(bytes) data_size
       from dba_data_files ) a,
     ( select nvl(sum(bytes),0) temp_size
       from dba_temp_files ) b,
     ( select sum(bytes) redo_size
       from sys.v_$logfile lf, sys.v_$log l
       where lf.group# = l.group#) c,
     ( select sum(block_size*file_size_blks) cont_size
       from v$controlfile ) d;

If you have enabled block change tracking and want to be really pedantic with the database size, you should also add the BCT file.

Here are some additional links and a query which will help you find indexes which are bigger than tables.

Database Size in Oracle by Bijay Kumar Sahoo

How to calculate current DB size from asktom

Estimate the Size of a Database in SQL Server

How to calculate total size of the database

How to calculate the size of a MySQL database

select owner "TABLE_OWNER", tablename "TABLE_NAME", tablesize "TABLE SIZE (GB)", indexname "INDEX_NAME", indexsize "INDEX SIZE (GB)", indexsize/tablesize "INDEX/TABLE" from
(
with
tabs as (select owner, segment_name tablename,sum(bytes/1024/1024/1024) tablesize from dba_segments where segment_type='TABLE' group by owner, segment_name),
inds as (select i.owner, i.index_name indexname, i.table_name tablename, sum(s.bytes/1024/1024/1024) indexsize from dba_indexes i join dba_segments s on (i.owner=s.owner and i.index_name=s.segment_name) group by i.owner, i.index_name, i.table_name)
select * from tabs natural join inds where indexsize > tablesize and indexsize>1
)
order by indexsize/tablesize desc;

Finally, there is a view in Oracle, called dba_hist_tbspc_space_usage, which displays historical tablespace usage statistics. So, you can retrieve historical growth of the tablespaces and thus the database as a whole. Check this blog post: How to retrieve growth history for Oracle tablespaces. Note that the history is available for as far back as AWR data is retained. It is considered good practice to keep track of the database size on say weekly basis. You will be always able to answer questions on capacity needs, trends, growth, size, etc.

P.S. DBA_HIGH_WATER_MARK_STATISTICS contains a column called DB_SIZE which is dependent on the internal package DBMS_SWRF_REPORT_INTERNAL. The definition of DBA_HIGH_WATER_MARK_STATISTICS is:

select dbid, hwm.name, version, highwater, last_value, description
from wri$_dbu_high_water_mark hwm, wri$_dbu_hwm_metadata mt
where hwm.name = mt.name and
hwm.name not like '_HWM_TEST%' and /* filter out test hwm */
bitand(mt.method, 4) != 4          /* filter out disabled hwm */

The table WRI$_DBU_HIGH_WATER_MARK is mostly likely populated using DBMS_SWRF_REPORT_INTERNAL. The view DBA_HIGH_WATER_MARK_STATISTICS (existing since 10g!) is useful as it tells you the maximum size of the database for each version of the lifespan of the database.

Time to rebuild indexes in the Oracle database?

In Databases, DBA, Oracle database on August 19, 2022 at 14:13

One of the most controversial topics among DBAs is whether to rebuild indexes or not.

But it is so generic that the answer to the question depends on the database version, the index type and the reasons behind doing it.

In most cases, we consider b-tree indexes for rebuilt. Note that b stands not for “binary” but for “balanced”. We create indexes mostly for performance reasons. If performance is the real concern, we need to first understand at least at high level how those indexes are organized.

The MOS note Lists All Indexes that Benefit from a Rebuild (Doc ID 122008.1) suggests that indexes considered for rebuilt are indexes for which:

– deleted entries represent 20% or more of the current entries
– the index depth is more then 4 levels

And possible candidate for bitmap index were considered when the distinctiveness was more than 99%:

if ( height > 5 ) OR ( (del_lf_rows/lf_rows) > 0.2 ) then
    dbms_output.put_line (rpad(r_indx.owner,16,' ') || rpad(r_indx.index_name,40,' ') || 
                          lpad(round((del_lf_rows/lf_rows)*100,3),17,' ') || 
                          lpad(height-1,7,' ') || lpad(round((lf_rows-distinct_keys)*100/lf_rows,3),16,' '));
end if;

However, it is a good question if this applies to version 11g and above. I have personally seen good benefits of rebuilding indexes satisfying those conditions in lower versions of the database.

Another MOS article, Script to investigate a b-tree index structure (Doc ID 989186.1), provides a script which is quite handy as it verifies the structure of a b-tree index based on the existing table and index statistics.  The script calculates the following items:

– Estimate the size the index should be as optimal packing can be specified

– The index layout

This script keeps a history of the the data gathered in the INDEX_HIST table. This can be useful to prevent a pattern in index rebuilding. The history log can be user defined.

Yes another MOS article, Index Rebuild, the Need vs the Implications (Doc ID 989093.1), specifies that the most common justifications given for rebuilding an index are:
– index becomes fragmented
– index grows and grows – deleted space is not re-used
– index clustering factor becomes out of sync

However, as noted, the impact of rebuilding the index can be quite significant:

  • Most scripts around depend on the index_stats dynamic table
  • Redo activity may increase and general performance might be impacted as a direct result of rebuilding an index

An index coalesce is often preferred instead of an index rebuild. It has the following advantages:

  • does not require approximately 2 times the disk storage
  • always online
  • does not restructure the index, but combines index leaf blocks as much as possible, avoiding system overhead

Due to the reasons listed above, it is strongly advised not to rebuild indexes on a regular basis but instead use proper diagnostics.

Next, here is my personal experience from rebuilding indexes: either drop and create or simply rebuild. I have seen 3 main use cases:

  1. For performance reasons: because of too many deleted rows or because of split index nodes. With every new release of Oracle, it looks like that we need less and less time on performing such a rebuild operation.
  2. Indexes get fragmented over time and occupy too much space. There was a mission critical production database where data was about 500MB and the indexes all together were about 4TB. You can release sometimes quite a lot of space.
  3. Indexes should be in my opinion in a separate tablespace. Not where the data is. If something gets wrong with the index tablespace: logical or physical corruption, or deleting files unintentionally, then it is just a matter of time to recreate the indexes again.

In Oracle 21c, there is a new feature called Automatic Index Optimization. The optimization process includes 3 actions:

• Compress: Compresses portions of the key values in an index segment (~3 times)

• Shrink: Merges the contents of index blocks where possible to free blocks for reuse

• Rebuild: Rebuilds an index to improve space usage and access speed

For a very long time, both DBAs and Developers, have been struggling (really struggling) with what indexes should be created, what type of indexes they should be created as and what indexes should be dropped from the database. By far, the most interesting new feature of Oracle Database 19c is Automatic Index creation (AI Creation). In the long run, this is to be one of the most important features in the Oracle database. Note that you cannot rebuild an auto index! Nice and useful capability for AI is that Oracle automatically rebuilds indexes that are marked as “Unusable”.

For more, check:

Automatic Indexes: Automatically Rebuild Unusable Indexes Part I by Richard Foote

Oracle B-Tree Index Internals: Rebuilding The Truth:

“Generally rebuild index when the clustering factor exceeds eight times the number of dirty blocks in the base table, when the levels exceed two or when there are excessive brown nodes in the index”.

When an index should be rebuilt? by Gouranga

SQL Server: Reorganize and Rebuild Indexes in the Database

“Microsoft recommends fixing index fragmentation issues by rebuilding the index if the fragmentation percentage of the index exceeds 30%, where it recommends fixing the index fragmentation issue by reorganizing the index if the index fragmentation percentage exceeds 5% and less than 30%”

How Often Do You Rebuild Indexes?

Rebuilding Indexes by Jonathan Lewis