Dontcheff

Archive for the ‘DBA’ Category

What else besides Automatic Indexing is new in Oracle Database 19c?

In DBA, Init.ora, New features, Oracle database, Security and auditing, SQL on April 23, 2019 at 10:53

Automatic Indexing (AI) is probably the most important new feature of Oracle Database 19c, and AI is arguably one of the best examples of AI in the IT industry. But there is much more that came along with 19c.

Here is my choice of the top 10 least known (for now at least) new features of Oracle Database 19.2 (in no order whatsoever):

1. Security: if the database password file name or location has been changed and the metadata cache needs to be refreshed with the details of the updated database password file, you can do it simply with the following command:

SQL> alter system flush passwordfile_metadata_cache;

System altered.

In 19c, most of the Oracle Database supplied schema-only accounts now have their passwords removed to prevent users from authenticating to these accounts. But DBAs can still assign passwords to the default schema-only accounts.
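For illustration, here is a minimal sketch of assigning a password to a schema-only account and then making it schema-only again (DBSNMP is used just as an example account):

SQL> alter user dbsnmp identified by "Str0ng#Passw0rd19";

User altered.

SQL> alter user dbsnmp no authentication;

User altered.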

2. SQL: the LISTAGG aggregate function now supports duplicate elimination by using the new DISTINCT keyword.
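A minimal sketch, assuming an EMP-like table where several employees per department share the same job:

SQL> select DEPTNO,
     LISTAGG(distinct JOB, ', ') within group (order by JOB) JOBS
     from EMP
     group by DEPTNO;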

There are also new bit-vector SQL operators that can be used to speed up COUNT(DISTINCT) operations within a SQL query.

3. Active Data Guard DML redirection: You can now run DML on the Active Data Guard standby database. When run on the standby side, the update is passed to the Primary database where it is executed and the resulting redo of the transaction will update the standby after which control will be returned to the application.
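DML redirection is controlled by the new ADG_REDIRECT_DML parameter and can also be enabled per session. A minimal sketch, run on the standby:

SQL> alter session enable adg_redirect_dml;

or, for all sessions:

SQL> alter system set adg_redirect_dml=true;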

4. There are 6 new init.ora parameters in 19c. DBAs can now tune the Data Guard wait times by using DATA_GUARD_MAX_IO_TIME and DATA_GUARD_MAX_LONGIO_TIME. You can check the details for all six:

ADG_REDIRECT_DML
DATA_GUARD_MAX_IO_TIME
DATA_GUARD_MAX_LONGIO_TIME
ENABLE_IMC_WITH_MIRA
LOB_SIGNATURE_ENABLE
MAX_DATAPUMP_PARALLEL_PER_JOB

5. You can now create Materialized Views containing bitmap-based COUNT(DISTINCT) functions. This means that the MVs are based on SQL aggregate functions that use bitmap representation to express the computation of COUNT(DISTINCT) operations.

SQL> create materialized view MV_SALES as
select CLIENT, 
BITMAP_CONSTRUCT_AGG(BITMAP_BIT_POSITION(PRICE),'RAW') bm_price
from SALES
group by CLIENT,BITMAP_BUCKET_NUMBER(PRICE);  

Materialized view created.
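With the materialized view in place, COUNT(DISTINCT PRICE) per client can be answered from the bitmaps. Here is a sketch, assuming the companion 19c functions BITMAP_OR_AGG and BITMAP_COUNT:

SQL> select CLIENT,
     BITMAP_COUNT(BITMAP_OR_AGG(bm_price,'RAW')) distinct_prices
     from MV_SALES
     group by CLIENT;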

6. Looks like there is now automatic resolution of SQL plan regressions: “SQL plan management searches for SQL statements in the Automatic Workload Repository (AWR). Prioritizing by highest load, it looks for alternative plans in all available sources, adding better-performing plans to the SQL plan baseline. Oracle Database also provides a plan comparison facility and improved hint reporting.”

7. Real-Time Statistics is also a new cool feature. “Oracle automatically gathers online statistics during conventional DML operations. Statistics can go stale between execution of DBMS_STATS statistics gathering jobs. By gathering some statistics automatically during DML operations, the database augments the statistics gathered by DBMS_STATS. Fresh statistics enable the optimizer to produce more optimal plans.”

8. Hybrid Partitioned Tables: now large portions of a table can reside in external partitions, for example in the Oracle Cloud. With this new feature, you can also easily move non-active partitions to external files, such as Oracle Data Pump files, for a cheaper storage solution. Hybrid partitioned tables support all existing external table types for external partitions: ORACLE_DATAPUMP, ORACLE_LOADER, ORACLE_HDFS, ORACLE_HIVE.
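Here is a minimal sketch of the new syntax (the table, directory and file names are hypothetical):

SQL> create table ORDERS_HYBRID
     ( order_id number, order_year number )
     external partition attributes (
       type ORACLE_LOADER
       default directory DATA_DIR
       access parameters ( fields terminated by ',' )
       reject limit unlimited )
     partition by range (order_year)
     ( partition P_ARCHIVE values less than (2018)
         external location ('orders_archive.csv'),
       partition P_CURRENT values less than (MAXVALUE) );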

9. Data Pump:

– Oracle Data Pump allows tablespaces to stay read-only during TTS import.
– Oracle Data Pump can work in test mode for transportable tablespaces.
– Oracle Data Pump supports resource usage limitations with the introduction of two new parameters: MAX_DATAPUMP_JOBS_PER_PDB and MAX_DATAPUMP_PARALLEL_PER_JOB (see the sketch after this list).
– Data Pump no longer enables secure, password-protected roles by default. Beginning with 19c, you must explicitly enable password-protected roles for an individual export or import job, using the new command-line parameter ENABLE_SECURE_ROLES=YES|NO.
– The new Data Pump command-line parameter CREDENTIAL enables secure import into a managed service from dump files in the Oracle Object Store Service.
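For the two resource-limit parameters above, here is a sketch of setting them (the values are examples only):

SQL> alter system set max_datapump_jobs_per_pdb=3;
SQL> alter system set max_datapump_parallel_per_job=8;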

10. SQL Quarantine: “SQL statements that are terminated by Oracle Database Resource Manager due to their excessive consumption of CPU and I/O resources can be automatically quarantined. The execution plans associated with the terminated SQL statements are quarantined to prevent them from being executed again.”

Check out the new Oracle package DBMS_SQLQ – cool stuff!
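For example, here is a sketch of quarantining one statement and lowering its CPU-time threshold (the SQL_ID is hypothetical):

DECLARE
  l_quarantine VARCHAR2(128);
BEGIN
  -- create a quarantine configuration for the plans of this SQL_ID
  l_quarantine := DBMS_SQLQ.CREATE_QUARANTINE_BY_SQL_ID(sql_id => '8vu7s907prbgr');
  -- quarantine the statement once it consumes more than 5 seconds of CPU
  DBMS_SQLQ.ALTER_QUARANTINE(
    quarantine_name => l_quarantine,
    parameter_name  => 'CPU_TIME',
    parameter_value => '5');
END;
/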


An Interview with Kerry Osborne

In DBA on March 19, 2019 at 15:28

No introductions needed, the one and only Kerry Osborne:

Tell us what you remember about your first experiences with Oracle 2

I was just out of college – my first job. The first day they handed me a set of Oracle manuals (I think there was only one book actually). Maybe a half an inch thick. It was running on Vax/VMS and it stack dumped on a pretty regular basis. I figured out a couple of sure-fire ways to make it crash. 🙂 The company was already using it and had written a bunch of Fortran OCI stuff that was pretty cool. (I had actually worked on some of the Fortran code while I was still in college – that’s how I got the job). That system had a fairly large subroutine library in object file format only. They had somehow lost the source code. Fortunately we had a little bit of documentation that at least showed the input and output parameters of each subroutine. It was my first experience with having to reverse engineer software, which came in handy later in my career when I started trying to figure out how the Oracle optimizer worked. 🙂

Your ponytail and cap, Kerry, are in a way trademarked by you. How did it all start?

I don’t really know. I just started wearing a cap most of the time after I got out of college and got used to it and now it’s just like putting on my pants in the morning. No I don’t sleep with my cap on (unless I’m laying on my back and need something to block out the light). 🙂 Tim Hall wrote a very funny blog post about my cap at one point. https://oracle-base.com/blog/2014/09/26/an-interview-with-kerry-osborne/

The ponytail thing is a bit of a nod to my Native American heritage. My great grandmother was a Cherokee (just a handed-down story, but I believe it to be true). But the real reason I let my hair grow is that my brain wants to optimize pretty much every process that I encounter. So at one point I went to get a haircut and as I was sitting in the barber chair, I started thinking about how to optimize that process. I actually wrote about it on my blog here: http://kerryosborne.oracle-guy.com/2009/05/shopping-trip/ But the bottom line is that I calculated that over the course of my life I would lose almost a year by repeatedly going to get my hair cut. And since the quickest way to do anything is not to do it at all, I stopped.

Which is your favorite Oracle version?

I don’t really have a favorite. (I guess 11.2.0.4 is my favorite of the ones commonly in use at the moment)

Arguably you are the most famous DBA worldwide. How many presentations have you given in total at conferences, meetings, webcasts, etc.?

Nah – I’m definitely not! I don’t really know how many talks I’ve delivered. I don’t keep up with that stuff very well (that’s why I’m not an active Oracle Ace Director anymore actually – too time consuming to keep up with that and fill out the required paperwork). Oddly enough, when I was in my 20’s, public speaking was my greatest fear. In fact, the first paper I submitted to IOUW (the predecessor to OOW) got accepted, but I backed out on doing the presentation because I was too scared. I had to convince one of my co-workers to go and deliver the paper for me. I worked hard for a few years to try to rid myself of that irrational fear, but the thing that really allowed me to get over it was when I started teaching week long classes. There’s nothing like repetitive practice, and standing in front of a class 8 hours a day for a week at a time provided plenty of that. So I still have a little pit in my stomach before every talk, but I actually enjoy it now. 🙂

How do you see the future of the Oracle database both on premise and in the cloud?

I think Oracle will continue to be a vital platform in enterprises. I believe they have the best SQL engine on the planet. They are great at managing transactions, consistency and concurrency. So for those reasons, and the fact that there are tons of existing applications built on their platform, I believe that they will be around and highly valuable for the foreseeable future. I think that many systems will eventually run in public clouds but I don’t believe that the whole landscape will migrate to the cloud. I believe most companies will end up with a hybrid cloud strategy, combining on-prem with multiple public clouds. However, there are a number of other tools available that are making a major push into the Data Warehouse / Reporting / Analytics space. So this is the area where I see the most competition for Oracle.

Do you think the Autonomous Database puts at risk the jobs of many DBAs around the world?

Not really. I do think that the job of a DBA is changing. There are various flavors of DBAs and some of them (maybe all of them) will need to change their focus. By that I mean that they’ll need to be more conversant with development. They need to be automating repetitive tasks. They need to be solving higher level problems. But I don’t think Oracle’s Autonomous Database offerings will eliminate a lot of jobs. It will just change them. That said, I think the biggest area of change will be for the people that are doing tasks that are repetitive. Adding storage, installing databases, patching databases, etc… These types of activities will be automated. So start learning how to automate them yourself instead of waiting for someone else to do it. That sets you up with the skill to automate other tasks and moves you into a space that will have more demand. And by the way, that’s basically a development task, so at a very high level, I guess I’m saying that I think people should be moving away from infrastructure type jobs and towards development oriented jobs.

How did you manage to create Enkitec as an exquisite top skill group? Such talent is seldom concentrated in a single company!

The word exquisite is a bit over the top. 🙂 But I am very proud of the team that came together at Enkitec. I did a short 10 min TED Talk-style presentation at OakTable World on recruiting world class talent a few years ago. A couple of the key points were that there are some basic requirements – team members need to be smart, have a good technical background, have worked for a reasonably long period of time (10+ years), and have excellent communication skills. But they need 2 other things that are even more important. They need to have the ability to apply their skills to creatively solve tough problems. This is not a given based on having the basic requirements covered. It’s a unique characteristic, which we can improve on, but which I believe is at least partly an innate ability. The 2nd important requirement is that they need to NOT have a big ego. The ability to work together as a team and share knowledge requires people who don’t view themselves as smarter/better than the people around them, and with a view that everyone around them has value and something to offer, something they can learn from. This was one of the keys to building the team that we built. Nothing is perfect, but I am very proud of the results and the team we built. There is way too much to say in a brief answer, so maybe it’s better to just point to that video: https://www.youtube.com/watch?v=vI9do1DEm08

You have probably given more keynotes than anyone else. Which are your recent and upcoming topics of interest?

I don’t think that’s true, but I do a few here and there. I just did one called “How to Stay Relevant” at the Rocky Mountain User Group Conference in February. I’m also speaking at the UKOUG’s conference in Dublin in April and I am hoping to do another keynote at the Bulgarian User Group meeting this summer. The “How to Stay Relevant” talk is about the disruption that’s currently going on in our industry and it focuses on the effects on those of us who work with data (and Oracle specifically). In terms of what I’m interested in these days, distributed systems are high on the list. Any platform that moves the work to the data and that allows access via a SQL interface is of interest to me.

What is your take on ML, AI and Big Data? Are these just new trends in IT that come and go?

No, I don’t think they are just buzz words! Unfortunately the hype cycle can quickly cause over-used terms to become a turn off. But the general idea of artificial intelligence is not going to just fade away as some trendy ideas do. So with regards to AI (and the subset of Machine Learning) I think we are still early in the adoption and will see more and more of it. We’ll also see a lot of things that are labeled as AI, which are really not. So we’ll have to learn to be educated consumers. Oracle’s optimizer does some things that could be viewed as AI for example. The ability to recognize that it’s made a mistake in its determination of a plan is very smart, but also very narrowly focused and does not really use “AI techniques” to make those determinations. But I have no doubt that the kernel developers are thinking about ways to forge ahead into that space. And there are a lot of successes already, specifically with narrowly focused applications (so called ANI). The champions of most intellectual games are AI engines now, just as an example. Artificial General Intelligence is still a long way off in my view, but the Law of Accelerating Returns could kick in and blow us all away much more rapidly than we expect.

Big Data is also a trendy term that has been around a long time and that has been overused. But when I think of big data these days, I think more about how we deal with the ever increasing volumes (often produced by machines) that we are called on to manage. We also have increasingly complex kinds of analytics and more advanced processing (AI/ML) that we need to do on the large data sets. This requires highly scalable, distributed computing platforms that are capable of dealing with the volume in a predictable (scalable) way as well as in an economical way. Hadoop is somewhat synonymous with the term Big Data because it was the first (and still is the most popular) such framework in the open source community. As with most technology swings though, Hadoop has gone through a lot of hype (when people thought it would solve all of our technological problems), then a period of disillusionment (when everyone realized this technology, like all others, has its limitations/tradeoffs), and is now entering the phase of reasonable expectations, which has led to widespread adoption. This is the classic hype cycle that Gartner uses so heavily. But my view is that Big Data (Hadoop or other proprietary or public cloud based offerings) is definitely not going away, and in fact I believe it will continue to be a huge force of change in our industry.

What single piece of advice would you give to the new DBAs entering the database profession?

Focus on the data, not the database. …

Hard to leave it at just that: focus on supporting or becoming a crossover developer. Focus on architecture and optimization topics. … But the one liner is very quotable, right?

Some DBAs come from sysadmin background and some from apps/applications. How did you become a DBA?

I started as an application developer. Then I started getting frustrated with the lack of control (power) to do what I wanted at the database level and I moved into a DBA role. Then I got frustrated with having to ask sysadmins for help and I became a sysadmin. Then I started doing jobs that combined all 3 aspects. I eventually got focused on optimization of existing systems, which combined all of the jobs in my background, and I started to learn about the internals of Oracle. And the more complicated the optimizer became, the more important it was to know how it made its decisions. I never was the absolute best at any of these jobs btw, but I think being competent in all these areas gave me a lot of advantages that I wouldn’t have had otherwise. Probably the main advantage was that it made it possible for me to look at problems from multiple angles. I’ve worked with a ton of DBAs over the years and some of the very best are the guys and gals with multiple skill sets (Network Admin/DBA, Storage Admin/DBA, Unix Admin/DBA, Developer/DBA, …) And of course having experience building software is a huge plus for anyone in the “computer” business. It gives you the ability to think to yourself, “Now if I had been the developer on this, how would I have coded it?”. This is an extremely powerful view into problems when trying to resolve software issues.

If you can add one new feature to the Oracle database, what would that be?

There are too many features already. Focus on fixing the bugs!

Tell us about Gluent and its flagship product?

I thought you’d never ask. …

We founded Gluent to address one of the biggest issues we’ve seen over the last decade or so in our work with large databases in the enterprise space. Fundamentally, Gluent allows existing applications to take advantage of big data platforms, and here’s the most important part, without re-writing the applications. I think the reason for founding the company is best illustrated by a story about a large medical company that reportedly had one of the largest Teradata clusters in the world. The whispered cost was $40M a year to maintain that system, and there was a desire to migrate off that platform for cost reasons. A 6 month project was undertaken to see if it was feasible without degrading the performance of the system. The results were pretty amazing. The performance was increased on average over 1000X across a huge number of queries (all the queries from a day’s worth of work). The annual cost of the proposed solution was about $1M. Nevertheless, the company decided not to re-platform the system. The reason? They were calculating a 3 year ROI, and despite the fact that the projection showed a saving of nearly $120M, their assessment of the risk and cost to re-write the very complex application outweighed the savings. This is an extreme example of a story that plays out every day in the enterprise space and is the reason why Gluent was initially formed.

We have a few core beliefs:

We believe data volumes are exploding
We believe that legacy apps are not going away
We believe that data will continue to be born in silos but will need to be joined with other data
We believe that most new apps will be built on distributed back-ends and on data that is stored in open data formats
We believe that legacy skills need to be transitioned, not thrown away
We believe that most data will migrate to a centralized storage/compute platform
We believe that companies need to be able to change the wings on the plane while it is in the air

After an implementation of Gluent, the resulting system has a hybrid architecture which blends a distributed back-end with a relational database. The goal is for both components to do what they are best at: the relational database is really good at managing transactions, concurrency and consistency, while the distributed back-end excels at throughput and scalability, and does so economically. It also allows hybrid cloud deployments where the database component can remain on-prem while the distributed back-end can be deployed in the cloud with some or all of the data in relatively low cost object storage. This sort of architecture allows a great deal of flexibility around data placement between some in object storage, some local to a distributed cluster, and some local to the database tier. It also provides flexibility with regards to security, as some (sensitive) data can be contained strictly on-prem, while less sensitive data is moved out to the cloud, as an example.

The ultimate goal is to connect any applications to any data and become a data sharing platform. We believe that data will continue to be born in silos, but that it should be synchronized (once) to a distributed backend and then presented to any engine/system that wants to access it (and has permissions to do so). We also have, as a gleam in our eye, the concept of Gluent as a Service, which would remove the need to manage the distributed backend. We are already certified on S3/EMR for example (Azure and Google coming in the next couple of months), which allows us to spin up a cluster on demand as needed and shut it down when not in use. We are not quite there yet with the complete Gluent as a Service offering, but we are getting close. 🙂

I could go on (and on and on …). But hopefully you get the idea. We believe that our approach allows existing applications to take advantage of these magical new platforms, but without breaking anything. That’s our secret sauce.

Which is the biggest mistake you have ever made as a DBA?

Ha – that made me laugh! Early in my career, when I first gained sysadmin privileges, I was still doing DBA and development work. So one night I was working remotely via a 1200 baud modem on a program of some sort (probably Fortran OCI accessing Oracle V3). Anyway, each test run was taking several minutes and I noticed that there were several other processes running night time batch jobs on the system and they were burning a lot of CPU. So I decided to make use of my newly gained sysadmin privileges and I upped my priority so that my test runs wouldn’t have to compete so much for CPU time. Well in those days, the operating systems weren’t as smart about prioritization as they are now. So my process was effectively getting all the CPU time until it completed (and was much faster – yay!). Unfortunately, at some point I made a change that threw my code into an infinite loop. I tried to break out after a few minutes and unfortunately the modem connection dropped. Since my original process was getting all the available CPU time, there wasn’t even enough left to get logged back into another session to kill the rogue process. The program finally died when it had used up all the disk space. The real sysadmins were not happy with me the next morning!

If you can ask yourself a question, and not answer it, what would that be?

What do you want to be when you grow up?
I still haven’t figured that out!

AI in AI: Artificial Intelligence in Automatic Indexing

In DBA on March 7, 2019 at 17:30

“By far, the greatest danger of Artificial Intelligence is that people conclude too early that they understand it.” — Eliezer Yudkowsky

CNBC: 40% of A.I. start-ups in Europe have almost nothing to do with A.I.

Oracle 19c brings one key feature which does not exist in other database systems: Automatic Indexing. Something very similar does exist in Azure SQL Database, but with some limitations.

For a very long time, both DBAs and developers have been struggling (really struggling) with which indexes should be created, what type of indexes to create them as, and which indexes should be dropped from the database. Automatic Index creation (AI Creation) means the automatic creation of new indexes and also the dropping of existing, unused indexes without human intervention.

In the long run, this will arguably be one of the most important features in the Oracle database. I have already covered the basics in a previous blog post entitled Automatic Indexing in 19c. The expert system works in the following way, passing through the stages of identification, verification and decision making:

Based on the captured workload, Oracle’s expert system identifies the index candidates which are created first as UNUSABLE & INVISIBLE (metadata only):

Then, there is the verification process. Some indexes will become VALID (physical segments are created) but will still stay INVISIBLE to the optimizer.

Later, Oracle decides if some of these indexes can become VISIBLE and this happens based on how the performance increases and how these new indexes affect other activities in the database.

Look for possible errors using this query:

select EX.execution_type, EX.execution_name,F.message
from DBA_ADVISOR_FINDINGS F, DBA_ADVISOR_EXECUTIONS EX 
WHERE F.EXECUTION_NAME = EX.EXECUTION_NAME AND F.TYPE = 'ERROR';

If you need a detailed report from (say) the last 30 days, here is how to obtain it:

spool report
select dbms_auto_index.report_activity(sysdate-30,null,'text','all','all') report from dual;
spool off

A sample report shows, besides the index candidates, space used and fatal errors, also the overall improvement factor and the improvement factor per SQL statement:

When using/implementing the feature, keep in mind the following:

– AUTO_INDEX_MODE must be set in every PDB: even when set at container level, it does not cascade to the pluggable databases
– Manually created indexes are not dropped by default; you need to separately set AUTO_INDEX_RETENTION_FOR_MANUAL (see the sketch after this list)
– Follow the expert system runs from CDB_AUTO_INDEX_EXECUTIONS
– Hint for an INVISIBLE VALID index (for example /*+ index(clients SYS_AI_64uvm6wb5168u) */): I have seen how the index becomes VISIBLE in a second (if really useful)
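Here is a sketch of both settings (the retention value is in days and only an example):

EXEC DBMS_AUTO_INDEX.CONFIGURE('AUTO_INDEX_MODE','IMPLEMENT');
EXEC DBMS_AUTO_INDEX.CONFIGURE('AUTO_INDEX_RETENTION_FOR_MANUAL','90');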

For more details, check the recent blog post 19c Auto Index: the dictionary views by Franck Pachot

Automatic Indexing is by far one of the best examples of Artificial Intelligence and Machine Learning in the IT Industry. Really! I still remember a 5TB Oracle database I used to administer (mission critical one, a 24×7 system) where the indexes were almost 4.5TB in size while the real data was like half a TB only.

Automatic Indexing in 19c

In Autonomous, Database tuning, Databases, DBA, Oracle database on February 18, 2019 at 17:38

One of the most impressive new features of Oracle Database 19c is Automatic Indexing. Arguably, this is the most interesting innovation in the database world for a rather long time.

I remember some years ago when a DBA asked me at an Oracle conference: “Julian, why are half of the presentations at Oracle database conferences only about performance tuning? Is the Oracle database performing that badly that people should tune it all the time?” Sigh…

With 19c and ADB (Oracle Autonomous Database), things look very different now, don’t they? Automatic Indexing provides what database systems need: continuous optimization of the database workload, stable & solid performance and almost no human interaction. Let me share some of my early experience with Automatic Indexing and where human interaction is needed.

For now (February 18th, 2019), Oracle 19c is only available on Exadata (Linux 7.4) and in order to enable Automatic Indexing you need to do the following:

EXEC DBMS_AUTO_INDEX.CONFIGURE('AUTO_INDEX_MODE','IMPLEMENT');

As you can guess from the picture above, the so-called expert system of Automatic Indexing runs every 15 minutes and can run for as long as one hour. Note that I disabled the job from 4:43 till 5:56. The Resource Manager plan limits the task to 1 CPU only, and the next run is skipped if the job has not been completed within the 15 minutes.

Here are details on how Automatic Indexing works but what is most important to remember is as follows:

– The auto index candidates are created as invisible auto indexes
– If the performance of SQL statements is not improved from the auto indexes, then the indexes are marked as unusable and the corresponding SQL statements are blacklisted
– Auto indexes cannot be used for any first time SQL run against the database
– Auto indexes are created as single-column indexes, concatenated indexes or function-based indexes, and they all use advanced low compression
– The unused auto indexes are deleted after 373 days (can be changed)
– The unused non-auto indexes (manual indexes) are never deleted by the automatic indexing process, but they can be deleted automatically if you configure AUTO_INDEX_RETENTION_FOR_MANUAL

Auto Indexing can be disabled at any time, or it can be set to reporting mode (new auto indexes are created as invisible indexes, so that they cannot be used in SQL) with the following commands:

EXEC DBMS_AUTO_INDEX.CONFIGURE('AUTO_INDEX_MODE','OFF');

EXEC DBMS_AUTO_INDEX.CONFIGURE('AUTO_INDEX_MODE','REPORT ONLY');

Here is a way to ask Oracle to create new auto indexes in a separate tablespace called AUTO_INDEX_TS:

SQL> EXEC DBMS_AUTO_INDEX.CONFIGURE('AUTO_INDEX_DEFAULT_TABLESPACE','AUTO_INDEX_TS');

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.01

You can easily check the configuration for Automatic Indexing for the root container and the PDBs from CDB_AUTO_INDEX_CONFIG.

If you need a report of what happened during the expert system activity (either last 3 days or during the last activity), here is a way to generate it:

set long 300000
select DBMS_AUTO_INDEX.REPORT_ACTIVITY(SYSTIMESTAMP-3,SYSTIMESTAMP,'TEXT','ALL','ALL') from dual;
select DBMS_AUTO_INDEX.REPORT_LAST_ACTIVITY('TEXT','ALL','ALL') from dual;

These are the most important views about Auto Indexing:

DBA_AUTO_INDEX_EXECUTIONS: history of execution of automatic indexing tasks
DBA_AUTO_INDEX_STATISTICS: statistics related to auto indexes
DBA_AUTO_INDEX_IND_ACTIONS: actions performed on auto indexes
DBA_AUTO_INDEX_SQL_ACTIONS: actions performed on SQL statements for verifying auto indexes
DBA_AUTO_INDEX_CONFIG: configuration settings related to auto indexes
DBA_AUTO_INDEX_VERIFICATIONS: stats about PLAN_HASH_VALUE, AUTO_INDEX_BUFFER_GETS, etc.

The new package DBMS_AUTO_INDEX can be used for 3 main things:

1. Configuration of the parameters related to Auto Indexing
2. Drop *all* the indexes except the ones used for constraints
3. Report the activity of the “expert system”:
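Point 2, for example, maps to the DROP_SECONDARY_INDEXES procedure. A sketch at schema level, assuming the schema-level overload (the schema name is hypothetical, and this drops all of its non-constraint indexes, so treat it with care):

EXEC DBMS_AUTO_INDEX.DROP_SECONDARY_INDEXES('SALES_OWNER');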

Finally, here are some additional resources:

Automatic Indexing in Oracle Database 19c
Oracle Database 19c is now available!
Managing Auto Indexes

How to check if I have any auto indexes in my database:

select AUTO, count(*) from DBA_INDEXES group by AUTO;

Few interesting facts about Oracle ADB, Redshift and Snowflake

In Autonomous, Data Warehouse, Databases, DBA on January 14, 2019 at 16:17

Building a new data warehouse in the cloud or migrating an existing one to cloud requires careful consideration and the answer to the question “Which cloud should I use?” is often “It depends”.

An interesting comparison of system properties comparing Amazon Redshift vs. Oracle vs. Snowflake can be found on db-engines.com

There are several other options too: Azure SQL Data Warehouse, Presto, Google BigQuery, etc.

An interesting benchmark paper called “Data Warehouse Benchmark: Redshift, Snowflake, Azure, Presto and BigQuery” by Fivetran is worth reading!

Another comparison called Interactive Analytics: Redshift vs Snowflake vs BigQuery is already more than 2 years old but still interesting.

Recently, things have changed. Oracle’s Autonomous Data Warehouse Cloud has been in GA for almost 1 year (since March 2018). For enterprise loads and mission-critical systems, ADW is arguably the best solution right now.

Viscosity compared both Oracle Autonomous and Amazon Redshift. The result? Check it here: Amazon vs Oracle: Data Warehouse Services, How do They Compare?

In short, the conclusion of the research above is:

– Oracle’s ADW was able to achieve data retrieval at the lowest latencies and achieved the highest volume of queries per hour, in terms of both serial query execution and multi-user query throughput.
– Oracle’s ADW consistently outperformed Redshift by a factor of 4x in both sets of tests.

And do not ignore the db-engines ranking! Only one of the three is in the Top 10.

What is interesting to know, on top of all the papers above, are these 10 differences, or let us call them less-known technical facts (in no order of importance), between Oracle Autonomous, Amazon Redshift and Snowflake:

1. Snowflake compute usage is billed on a per-second basis, with a minimum of 60 seconds. Amazon Redshift is based on PostgreSQL 8.0.2 and is built on top of technology from the MPP data warehousing company ParAccel. Oracle Autonomous Database is based on Exadata and 18c.

2. In Oracle Autonomous Cloud, you can provision up to 128 CPUs and 128TB directly from the cloud console but you can provision more if needed.

3. Snowflake manages all aspects of how data is stored in S3 including data organization, file sizes, structure, compression, and statistics.

4. The only things needed for BYOL in Oracle Autonomous Database are Multitenant and RAC (only when using more than sixteen OCPUs). The standby option (not yet available) will require Active Data Guard as well.

5. Snowflake does not disclose the information about processing power and memory. Oracle do disclose the information via internal views but you cannot directly define the SGA or PGA size.

6. Redshift is not built as a high-concurrency database with many concurrently running queries: AWS recommends that you execute no more than 15 queries at a time. The number of concurrent user connections that can be made to a cluster is 500.

7. Oracle ADW and ATP allow you to partition both indexes and tables. In Snowflake partitioning is handled internally. Amazon Redshift does not support tablespaces, table partitioning, inheritance, and even certain constraints. Amazon Redshift Spectrum supports table partitioning using the CREATE EXTERNAL TABLE command.

8. The maximum number of tables in Amazon Redshift is 9,900 for large and xlarge cluster node types and 20,000 for 8xlarge cluster node types. The limit includes temporary tables. An Oracle database does not have a limit for the number of tables.

9. Oracle automatically applies all security updates (and online!) to ensure data is not vulnerable to known attack vectors. Additional in-database features like Virtual Private Database and Data Redaction are also available.

10. There is no operation in Snowflake for collecting database statistics. It is handled by the engine. In Oracle, database statistics collection is allowed. Both Oracle Autonomous and Amazon Redshift monitor changes to your workload and automatically update statistics in the background.

Finally, here are official URLs of all three products:

Oracle Autonomous Database
Amazon Redshift
Snowflake Database

Autonomous Data Warehouse, Autonomous Transaction Processing or Something Else?

In DBA on November 30, 2018 at 14:37

First things first: there is nothing else. Let me explain why.

Both Forbes and the Wall Street Journal wrote about the top 5 industry early adopters of Autonomous Systems.

According to the article, “in the IT industry, the pioneering product is Oracle’s Autonomous Data Warehouse Cloud, a cloud-based database that configures, optimizes and patches itself with minimal human intervention. Oracle Executive Chairman and CTO Larry Ellison says the machine learning technology that underpins the company’s autonomous data warehouse, as well as autonomous integration, developer, mobile and other platform services that will follow, is as revolutionary as the internet.”

To make it clear, the new Autonomous Data Warehouse and the Autonomous Transaction Processing databases are not based on newly written software. It is the same Oracle database with a lot of automation and mathematical algorithms embedded into the original database software. Think of machine learning and computer intelligence.

If you are looking for something similar among other database brands – good luck! Finding all areas of Self-Securing, Self-Automation and Self-Repairing outside Oracle Autonomous Database Cloud is mission impossible. And here are the areas:

Four Areas of Self-Securing of Autonomous Databases:

1. Self-securing starts with the security of the Oracle Cloud infrastructure and database service. Security patches are automatically applied every quarter or as needed, narrowing the window of vulnerability. Patching includes the full stack: firmware, operating system [OS], clusterware, and database. There are no steps required from the customer side.

2. Oracle encrypt customer data everywhere: in motion, at rest, and in backups. The encryption keys are managed automatically, without requiring any customer intervention. And encryption cannot be turned off.

3. Administrator activity on Oracle Autonomous Data Warehouse Cloud is logged centrally and monitored for any abnormal activities. Oracle have enabled database auditing using predefined policies so that customers can view logs for any abnormal access: UNIFIED_AUDIT_TRAIL

4. Built upon Oracle Database Vault, unique to Oracle Cloud, operations personnel have privilege to do all administrative tasks without any ability to ever see any customer data.

Four Areas of Self-Automation of Autonomous Databases:

1. Self-Automation: automatic provisioning of pluggable databases and automatic scaling – PDB resource manager.

2. Automatic tuning: SQL Plan Management, Adaptive Plans, SQL Tuning Advisor – Automatic SQL Tuning, Storage Indexes, Automatic Storage Management, Automatic detection and correction of regressions due to plan changes, Automatically tune memory, process, sessions.

3. Automatic Fault Tolerant Failover: RAC and Data Guard. Automatically kill run-away transactions and SQL. Automatically kill inactive session.

4. Automatic Backup and Recovery: RMAN, Flashback.

Seven Areas of Self-Repairing of Autonomous Databases:

Both Maria Colgan and Franck Pachot wrote on the differences between ADW and ATP:

How does Autonomous Transaction Processing differ from the Autonomous Data Warehouse? by Maria Colgan

ATP vs ADW – the Autonomous Database lockdown profiles by Franck Pachot

But here, in short, are the four main areas of differences between ADW and ATP:

1. Settings and parameters:
– In ADW: the majority of the memory is allocated to the PGA – joins, aggregations in memory
– In ATP: the majority of the memory is allocated to the SGA – minimize I/O

For DBAs: ADW runs on 94 non-default parameters, out of which 35 are underscore parameters. In ATP, the corresponding numbers are 94 and 36. Not the same 94 though! And these numbers may vary slightly.

2. Data formats:
– In ADW: data is stored in a columnar format as that’s the best format for analytics processing – ADW uses DBIM option features like in-memory columnar flash cache under the covers
– In ATP: data is stored in a row format

3. Statistics/CBO:
– In ADW: statistics are automatically maintained as part of bulk load and DBMS_CLOUD activities
– In ATP: statistics are automatically gathered when the volume of data changes significantly enough to make a difference to the statistics

4. Client services/connections:
– In ADW: only the LOW service runs SQL statements serially, while the other services run them in parallel
– In ATP: the PARALLEL service no longer exists (as of 12.11.2018)

FAQ for Oracle Autonomous Database

In order to show the other side of the coin, here are two perspectives, from an IBM and an SAP point of view:

Oracle Autonomous Database – is it truly self-driving? by Danny Arnold

How Real is The Oracle Automated Database? by Shaun Snapp

But if you prefer more neutral reading check Oracle’s next chapter: The Autonomous Database and the DBA and Will Autonomous Database Entice Big Business To The Cloud?

Bottom line: if you need extreme high reliability, top-level security, 100% automation of DBA routine tasks and no funny surprises – start testing and using the Oracle Autonomous Database. Really!

Amazon’s Aurora and Oracle’s Autonomous ATP

In Autonomous, Cloud, DBA, PostgreSQL on August 29, 2018 at 09:26

Databases are very much like wine, cheese and trees: they get better as they age.

Amazon Aurora has existed since 2015. The word aurora comes from Latin and means dawn. The name was borne by the Roman mythological goddess of dawn and by the princess in the fairy tale Sleeping Beauty.

Both Amazon’s “dawn” Aurora and Oracle’s ATP are typical cloud OLTP systems.

The question is: what are their differences, which one is better and meant exactly for my needs?

Oracle ATP is based on Oracle’s database and Exadata, here are all the innovations adopted from both systems:

Amazon’s Aurora has 2 flavors: Amazon Aurora MySQL and Amazon Aurora PostgreSQL.

Amazon Aurora MySQL is compatible with MySQL 5.6 using the InnoDB storage engine. Certain MySQL features like the MyISAM storage engine are not available with Amazon Aurora. Amazon Aurora PostgreSQL is compatible with PostgreSQL 9.6. The storage layer is virtualized and sits on a proprietary virtualized storage system backed up by SSD. And you pay $0.20 per 1 million IO requests.

Oracle’s Autonomous database comes also in 2 flavors: Oracle ADW and Oracle ATP. Check Franck Pachot’s article ATP vs ADW – the Autonomous Database lockdown profiles to see the differences of both cloud databases.

In general, one can compare Oracle ADW with Amazon Redshift and Oracle ATP with Amazon Aurora.

One way to compare is to look at the ranking provided by DB-Engines: Amazon Aurora vs. Oracle. No-brainer who the leader is: score of 1300 vs score of 5 in favor of Oracle.

Another interesting comparison comes from Amalgam Insights. Check how Oracle Autonomous Transaction Processing lowers barriers to entry for data-driven business. Check out the DBA labor cost involved: 5 times less in favor of Oracle ATP compared to Amazon! All the routine DBA tasks have been totally eliminated.

The message from them is very clear: “Oracle ATP could reduce the cost of cloud-based transactional database hosting by 65%. Companies seeking to build net-new transactional databases to support Internet of Things, messaging, and other new data-driven businesses should consider Oracle ATP and should do due diligence on Oracle Autonomous Database Cloud for reducing long-term Total Cost of Ownership.”

This month (August 2018), there was an interesting article by Den Howlett entitled Oracle introduces autonomous transaction processing database – pounds on AWS. Here are 2 interesting and probably correct statements/quotes from there:

1. It really is hard to get off an established database, even one that can be as expensive as Oracle can turn out to be.
2. Some of the very largest workloads will not go to the public cloud anytime soon. Maybe never which in internet years is after 2030.

As a kind of proof of how reliable and fast Oracle’s Autonomous Transaction Processing database is, consider the following OLTP workload running non-stop in a balanced way, without any major spikes and without a single queued statement!

No human labor, no human error, and no manual performance tuning!

Migrating Amazon Redshift to Autonomous Data Warehouse Cloud

In Autonomous, Data Warehouse, DBA, Exadata, PostgreSQL on July 4, 2018 at 18:34

“Big Data wins games but Data Warehousing wins championships” says Michael Jordan. Data Scientists create the algorithm, but as Todd Goldman says, if there is no data engineer to put it into production for use by the business, does it have any value?

If you google for Amazon Redshift vs Oracle, you will find lots of articles on how to migrate Oracle to Redshift. Is it worth it? Perhaps in some cases before Oracle Autonomous Data Warehouse Cloud existed.

Now, things look quite different. “Oracle Autonomous Data Warehouse processes data 8-14 times faster than AWS Redshift. In addition, Autonomous Data Warehouse Cloud costs 5 to 8x less than AWS Redshift. Oracle performs in an hour what Redshift does in 10 hours.” At least according to the Oracle Autonomous Data Warehouse Cloud white paper. And I have nothing but great experiences with ADWC, for the past half a year or so.

But, what are the major issues and problems reported by Redshift users?

One of the most common complaints involves how Amazon Redshift handles large updates. In particular, the process of moving massive data sets across the internet requires substantial bandwidth. While Redshift is set up for high performance with large data sets, “there have been some reports of less than optimal performance,” for the largest data sets. An article by Alan R. Earls entitled Amazon Redshift review reveals quirks, frustrations claims that reviewers want more from the big data service. So:

Why migrate from Amazon Redshift to Autonomous Data Warehouse Cloud?

1. Amazon Redshift is ranked 2nd in Cloud Data Warehouse with 14 reviews vs Oracle Exadata which is ranked 1st in Data Warehouse with 55 reviews.

The top reviewer of Amazon Redshift writes “It processes petabytes of data and supports many file formats. Restoring huge snapshots takes too long”. The top reviewer of Oracle Exadata writes “Thanks to smart scans, the amount of data transferred from storage to database nodes significantly decreases”.

2. Oracle Autonomous dominates in features and capabilities:

DB-engines shows an excellent system properties comparison of Amazon Redshift vs. Oracle.

In addition, reading through these thoughts on using Amazon Redshift as a replacement for an Oracle Data Warehouse can be worthwhile. It shows how Amazon Redshift compares with a more traditional DW approach. But Enterprises have some Redshift concerns, including:

– The difference between versions of PostgreSQL and the version Amazon uses with Redshift
– The scalability of very large data volume is limited and performance suffers
– The query interface is not modern and is a bit behind
– Redshift needs more flexibility to create user-defined functions
– Access to the underlying operating system and certain database functions and capabilities aren’t available
– Starting sizes may be too large for some use cases
– Redshift also resides in a single AWS availability zone

3. Amazon Redshift has several limitations: Limits in Amazon Redshift. On the other hand, you can hardly find a database feature not yet implemented by Oracle.

4. But the most important reason to migrate to ADWC is that the Oracle Autonomous Database Cloud offers total automation based on machine learning and eliminates human labor, human error, and manual tuning.

How to migrate from Amazon Redshift to Autonomous Data Warehouse Cloud?

Use the SQL Developer Amazon Redshift Migration Assistant which is available with SQL Developer 17.4. It provides easy migration of Amazon Redshift environments on a per-schema basis.

Here are the 5 steps on how to migrate from Amazon Redshift to Autonomous Data Warehouse Cloud:

1. Connect to Amazon Redshift
2. Start the Cloud Migration Wizard
3. Review and Finish the Amazon Redshift Migration
4. Use the Generated Amazon Redshift Migration Scripts
5. Perform the Post Migration Tasks

Check out what Paul Way says about why Oracle thinks Autonomous IT can ultimately win the Cloud War.

Finally, here is what Amazon CTO Werner Vogels is saying: Our cloud offers any database you need. And I agree with him that a one size fits all database doesn’t fit anyone. But mission and business critical enterprise systems with huge requirements and resource needs deserve only the best.

The DBA profession beyond autonomous: a database without a DBA is like a tree without roots

In Autonomous, Cloud, Databases, DBA on May 30, 2018 at 19:41

“To make a vehicle autonomous, you need to gather massive streams of data from loads of sensors and cameras and process that data on the fly so that the car can ‘see’ what’s around it” Daniel Lyons

Let me add that the data must be stored somewhere, analyzed by some software, monitored and backed up by someone, and so on and so on…

Top 5 Industry Early Adopters Of Autonomous Systems are: (1) Information Technology: Oracle’s Autonomous Data Warehouse Cloud, (2) Automotive, (3) Manufacturing, (4) Retail and (5) Healthcare.

Being an early adopter of ADWC, I must say that it is probably the best product created by Oracle Corporation. For sure part of Top Five.

This month (May 2018), ComputerWeekly published an article quoting Oracle CEO Mark Hurd that the long-term future of database administrators could be at risk if every enterprise adopts the Oracle 18c autonomous database.

“Hurd said it could take almost a year to get on-premise databases patched, whereas patching was instant with the autonomous version. If everyone had the autonomous database, that would change to instantaneous.”

So where does that leave Oracle DBAs around the world? Possibly in the unemployment queue, at least according to Hurd.

“There are hundreds of thousands of DBAs managing Oracle databases. If all of that moved to the autonomous database, the number would change to zero,” Hurd said at an Oracle media event in Redwood Shores, California.

If you are interested in more detail on this subject, I suggest you read the following articles in the order below:

The Robots are coming by James Anthony: “But surely we’ve been here before? Indeed, a quick Google search brings up the following examples of white papers by Oracle with a reference to the database being self-managing all the way back to 2003.”

Oracle Autonomous Database and the Death of the DBA by Tim Hall: “Myself and many others have been talking about this for over a decade. ”

Death of the DBA, Long Live the DBA by Kellyn Pot’Vin-Gorman: “With DBAs that have been around a while, we know the idea that you don’t need a DBA has been around since Oracle 7, the self-healing database.”

No DBA Required? by Tim Hall: “It will be interesting to see what Oracle actually come up with at the end of all this…”

Self-Driving Databases are Coming: What Next for DBAs? by Maria Colgan: “It’s also important for DBAs to remember that the transition to an autonomous environment is not something that will occur overnight.”

Death of the Oracle DBA (again) by Jonathan Stuart: “Twenty years later I run Claremont’s Managed Services practice and the DBA group is our largest delivery team.”

Don’t Fall For The “Autonomous Database” Distraction by Greg McStravick: a totally different point of view on autonomous databases.

Now, “a picture is worth a thousand words”. Here are 5 screenshots from the Autonomous Data Warehouse Cloud documentation:

1. Who will be creating external tables using the DBMS_CLOUD package?

2. Who will run “alter database property set.. ” in order to create credentials for the Oracle Cloud Infrastructure?

3. Who will restore and recover the database in case of any type of failure? Or failures never happen, right?

4. Who will manage runaway SQL with cs_resource_manager and run “alter system kill session”?

5. Who will manage the CBO statistics and add hints?

As of today, we have 4 Exadata choices, with Autonomous being by far the best (for data warehouse loads, for now). As explained by Alan Zeichick, Autonomous Capabilities Will Make Data Warehouses — And DBAs — More Valuable. “No need for a resume writer: DBAs will still have plenty of work to do.”

So still: a database without a DBA is like a tree without roots.

P.S. Check out the book Human + Machine: Reimagining Work in the Age of AI by Paul R. Daugherty and H. James (Jim) Wilson.

DBA Internals of the Oracle Autonomous Database

In Cloud, DBA, Oracle database, Oracle internals on March 28, 2018 at 07:11

First things first: the word autonomous comes from the Greek word autónomos, which means “with laws of one’s own, independent”.

After starting to use the Autonomous Data Warehouse Cloud, I must say I am pleasantly surprised to see something totally new, simple, uncomplicated and effortless, with no additional tuning or re-architecting of the Oracle databases needed – the underlying Oracle Cloud Infrastructure is super fast and highly reliable.

1. You may connect to ADWC by either using the web interface as you can see above, or as a client (I use SQL Developer 17.4), but for the client connection type choose Cloud PDB and not TNS. Your configuration file is a zip file and not the plain text file DBAs are used to.

2. You cannot create indexes on columns, you cannot partition tables, you cannot create materialized views, etc. Not even database links. You will get an error message: “ORA-00439: feature not enabled: Partitioning” or “ORA-01031: insufficient privileges”.

ADWC lets you create primary keys, unique keys and foreign key constraints in RELY DISABLE NOVALIDATE mode, which means that they are not enforced. These constraints can also be created in enforced mode, so technically you can create constraints as in a non-autonomous Oracle database.

Note that in execution plans primary keys and unique keys will only be used for single table lookups by the optimizer, they will not be used for joins.
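Here is a sketch of such a non-enforced key (the table and column names are hypothetical):

SQL> alter table CLIENTS add constraint CLIENTS_PK
     primary key (CLIENT_ID) rely disable novalidate;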

But … you can run alter system kill session!

3. The Oracle Autonomous Data Warehouse interface contains all the necessary capabilities for a non-professional database user to create their own data marts and run analytical reports on the data. You can even run AWR reports.

4. You do not have full DBA control as Oracle (in my opinion) uses lockdown profiles in order to make the database autonomous. As ADMIN user, you have 25 roles including the new DWROLE which you would normally grant to all ADWC users created by you. Among those 25 roles, you have GATHER_SYSTEM_STATISTICS, SELECT_CATALOG_ROLE, CONSOLE_ADMIN, etc. You have access to most DBA_ and GV_$ views. Not to mention the 211 system privileges.

5. ADWC configures the database initialization parameters based on the compute and storage capacity you provision. ADWC runs on dozens of non-default init.ora parameters. For example:

parallel_degree_policy = AUTO
optimizer_ignore_parallel_hints = TRUE
result_cache_mode = FORCE
inmemory_size = 1G

You are allowed to change almost no init.ora parameters, except a few NLS_ and PLSQL_ parameters.

And the DB block size is 8K!

6. I can see 31 underscore parameters that do not have default values; here are a few:

_max_io_size = 33554432 (default is 1048576)
_sqlmon_max_plan = 4000 (default is 0)
_enable_parallel_dml = TRUE (default is FALSE)
_optimizer_answering_query_using_stats = TRUE (default is FALSE)

One of the few alter session commands you can run is “alter session disable parallel dml;”

7. Monitoring SQL is easy:

But there is no Oracle Tuning Pack: you did not expect to have that in an autonomous database, did you? There is no RAT, Data Masking and Subsetting Pack, Cloud Management Pack, Text, Java in DB, Oracle XML DB, APEX, Multimedia, etc.

8. Note that this is (for now) a data warehousing platform. However, DML is surprisingly fast too. I managed to insert more than half a billion records in just about 3 minutes:

Do not try to create nested tables, media or spatial types, or use LONG datatype: not supported. Compression is enabled by default. ADWC uses HCC for all tables by default, changing the compression method is not allowed.

9. The new Machine Learning interface is easy and simple:

You can create Notebooks, which give you a place for data discovery and analytics. Commands are run in a SQL Query Scratchpad.

10. Users of the Oracle Autonomous database are allowed to analyze the tables and thus influence the Cost Based Optimizer and hence the performance. I think end users should not be able to influence the laws (“νόμος, nomos”) of the database.

Conclusion: The Autonomous Database is one of the best things Oracle have ever made. And they have quite a portfolio of products….

Finally, here is a live demo of the Oracle Autonomous Data Warehouse Cloud: