An old Chinese proverb says: “When planning for a year, plant corn. When planning for a decade, plant trees. When planning for life, train and educate people.”
When new IT applications are created, how often does someone plan for the future of the data: its growth, its reorganization, what is being logged, how often unnecessary data is purged and deleted, what is being audited and how? I guess we all know the answer. Here is a picture from The Data Affordability GAP Widens – Year by Year
Building the Knowledge! That is the starting point. Andy Flower, the president of IOUG, wrote an excellent article in the January/February 2012 issue of Oracle Magazine entitled “Building the Knowledgebase”.
Andy Flower says: “30 percent of organizations have seen data grow at an annual rate of more than 25 percent, with 10 percent seeing greater than 50 percent growth. 27 percent of respondents currently have more than 100 TB of data. Nearly 1 out of 10 sites now has data stores in the petabyte range.”
1. Unnecessary data is seldom deleted, purged or archived. I have often (rather quite often) seen databases where less than 10% of the data stored is used on a daily basis. Oracle AWR reports can show you where the physical and logical reads come from, and in what proportion. Here is a simple example:
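Outside of an AWR report, a rough view of the same thing can be had from V$SEGMENT_STATISTICS. The query below is only a sketch: it ranks segments by logical reads since instance startup and shows each segment's share of the total. The top-20 cutoff is an arbitrary choice of mine; swap in 'physical reads' to see the disk side.

```sql
-- Top 20 segments by logical reads since instance startup,
-- with each segment's percentage of all logical reads.
-- The cutoff of 20 rows is an arbitrary choice for illustration.
SELECT *
FROM  (SELECT owner,
              object_name,
              object_type,
              SUM(value) AS logical_reads,
              ROUND(100 * RATIO_TO_REPORT(SUM(value)) OVER (), 2) AS pct_of_total
       FROM   v$segment_statistics
       WHERE  statistic_name = 'logical reads'
       GROUP  BY owner, object_name, object_type
       ORDER  BY logical_reads DESC)
WHERE ROWNUM <= 20;
```

If a handful of segments account for nearly all the reads, the remaining ones are the first candidates to look at for archiving or purging.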
2. Databases often get fragmented, and they need regular reorganization. An interesting example is a global database which was 7TB in size (now bigger I guess) with real data less than 1TB. If that’s not a classical example of database fragmentation, then please give me a better one. Datafiles had so much white space! You may use this query to check for datafile fragmentation:
set lines 256
set pages 999
col "File Name" for A47
column file_name format a40;
column highwater format 9999999999;

SELECT /*+ RULE */
       df.File_id,
       Substr(df.file_name,1,47) "File Name",
       Round(df.bytes/1024/1024,2) "Size (M)",
       Round(e.used_bytes/1024/1024) "Used (M)",
       Round(f.free_bytes/1024/1024) "Free (M)",
       round((b.maximum+c.blocks-1)*d.db_block_size/(1024*1024)) "HWM (M)"
FROM   dba_data_files df,
       (SELECT file_id, Sum(Decode(bytes,NULL,0,bytes)) used_bytes
        FROM dba_extents
        GROUP by file_id) e,
       (SELECT Max(bytes) free_bytes, file_id
        FROM dba_free_space
        GROUP BY file_id) f,
       (SELECT file_id, max(block_id) maximum
        from dba_extents
        group by file_id) b,
       dba_extents c,
       (SELECT value db_block_size
        from v$parameter
        where name='db_block_size') d
WHERE  e.file_id (+) = df.file_id
AND    df.file_id = f.file_id (+)
AND    df.file_id = b.file_id
and    c.file_id = b.file_id
and    c.block_id = b.maximum
ORDER BY df.tablespace_name, df.file_name
/
I am sorry about the RULE hint but it really runs faster with RBO.
Two key benefits of regular database reorganization are:
- better performance, as data and indexes are spread over fewer blocks and thus less I/O and CPU are needed to access the data (and less memory, of course)
- backups take less time and occupy less storage and tape (accordingly restore & recovery will be faster)
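What a reorganization looks like depends on the case, but here is a minimal sketch of one common route: shrink a sparse segment, then try to give the freed space back by resizing the datafile. The table APP_OWNER.ORDER_HISTORY, the datafile path and the target size are placeholders I made up, and SHRINK SPACE assumes the table lives in an ASSM tablespace.

```sql
-- Hypothetical names: APP_OWNER.ORDER_HISTORY and the datafile path
-- are placeholders; replace them with your own objects.

-- Row movement must be enabled before a segment can be shrunk.
ALTER TABLE app_owner.order_history ENABLE ROW MOVEMENT;

-- Compact the rows and lower the segment high water mark;
-- CASCADE also shrinks the dependent indexes.
ALTER TABLE app_owner.order_history SHRINK SPACE CASCADE;

ALTER TABLE app_owner.order_history DISABLE ROW MOVEMENT;

-- Try to release the white space at the end of the datafile.
-- This raises ORA-03297 if any extent still lies beyond the target size.
ALTER DATABASE DATAFILE '/u01/oradata/ORCL/users01.dbf' RESIZE 2048M;
```

The "HWM (M)" column of the query above tells you how far a datafile can be resized. If the resize fails, the segments sitting at the end of the file have to be moved first.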
3. Databases are copied one-to-one from one environment to another when a data refresh is needed. Just like that. Copy and Paste. Not really, but often with RMAN’s command “duplicate target database to ..” Have a look at what Oracle has to offer these days: the Oracle Test Data Management Pack for Oracle and non-Oracle Databases. The key benefits of the pack are:
- Reduce application storage costs dramatically through data subsetting
- Increase developer productivity through right-sized production copies
- Eliminate labor-intensive error-prone manual process to create test systems by automating application discovery, data relationship modeling, dynamic subset rules and parameter-driven execution
One really doesn’t need all the data from Production in the Development or the Test environment!
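To make the idea of subsetting concrete, here is a hand-rolled sketch. It is not how the Test Data Management Pack works, just plain CTAS over a database link. PROD_LINK, ORDERS, ORDER_ITEMS, the column names and the three-month window are all assumptions for illustration.

```sql
-- Hypothetical names: PROD_LINK is a database link to production;
-- ORDERS and ORDER_ITEMS (and their columns) are placeholder tables.

-- Copy only the last three months of the parent table.
CREATE TABLE orders AS
SELECT *
FROM   orders@prod_link
WHERE  order_date >= ADD_MONTHS(TRUNC(SYSDATE), -3);

-- Copy only the child rows that belong to the parents copied above,
-- so the subset stays referentially consistent.
CREATE TABLE order_items AS
SELECT i.*
FROM   order_items@prod_link i
WHERE  i.order_id IN (SELECT o.order_id FROM orders o);
```

Note that CREATE TABLE AS SELECT does not bring indexes or primary and foreign key constraints along, so those have to be recreated afterwards. Keeping every parent-child relationship consistent by hand is exactly the labor-intensive part the pack claims to automate.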
4. Data growth is often not monitored, and one seldom knows which data can be deleted. I have just 3 words for this: “Oracle Cloud Control”.
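Where Cloud Control is not at hand, even a home-made snapshot table gives a first picture of growth. The following is only a sketch: SEGMENT_SIZE_HISTORY is a name I invented, the snapshot is assumed to be taken once a day by some scheduler, and the 30-day comparison window is arbitrary.

```sql
-- Hypothetical table: SEGMENT_SIZE_HISTORY is an invented name.
-- The first snapshot also creates the table.
CREATE TABLE segment_size_history AS
SELECT TRUNC(SYSDATE) AS snap_date,
       owner, segment_name, segment_type,
       SUM(bytes) AS bytes          -- sums all partitions of a segment
FROM   dba_segments
GROUP  BY owner, segment_name, segment_type;

-- Run once a day afterwards (for example from a scheduler job).
INSERT INTO segment_size_history
SELECT TRUNC(SYSDATE), owner, segment_name, segment_type, SUM(bytes)
FROM   dba_segments
GROUP  BY owner, segment_name, segment_type;
COMMIT;

-- Growth per segment over the last 30 days, biggest growers first.
-- Segments with no snapshot 30 days ago count as entirely new growth.
SELECT now_s.owner,
       now_s.segment_name,
       ROUND((now_s.bytes - NVL(prev_s.bytes, 0)) / 1024 / 1024) AS growth_mb
FROM   segment_size_history now_s
LEFT   JOIN segment_size_history prev_s
       ON  prev_s.owner        = now_s.owner
       AND prev_s.segment_name = now_s.segment_name
       AND prev_s.segment_type = now_s.segment_type
       AND prev_s.snap_date    = now_s.snap_date - 30
WHERE  now_s.snap_date = TRUNC(SYSDATE)
ORDER  BY growth_mb DESC;
```

This only tells you what grows, not what can be deleted. That second question still needs someone who knows the application.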
IBM gives a very detailed explanation of the problem in Control application data growth before it controls your business. Let me quote part of the executive summary:
“What are these effects? The most obvious are complexity and risk, along with storage and management costs. Data that has accumulated at a significant rate is highly distributed over multiple applications, databases and platforms creating complicated data relationships that are difficult to define, understand, control and house. Also, managing years of historical data in production databases can impact service levels and disaster recovery initiatives. Expanding online and batch processing windows, along with routine maintenance tasks, takes much more time and can reduce application availability significantly.”
Wonder why people all of a sudden started talking about Big Data?