Dontcheff


An Interview with Kerry Osborne

In DBA on March 19, 2019 at 15:28

No introduction needed, the one and only Kerry Osborne:

Tell us what you remember about your first experiences with Oracle 2.

I was just out of college – my first job. The first day they handed me a set of Oracle manuals (I think there was only one book actually). Maybe a half an inch thick. It was running on VAX/VMS and it stack dumped on a pretty regular basis. I figured out a couple of surefire ways to make it crash. 🙂 The company was already using it and had written a bunch of Fortran OCI stuff that was pretty cool. (I had actually worked on some of the Fortran code while I was still in college – that’s how I got the job.) That system had a fairly large subroutine library in object file format only. They had somehow lost the source code. Fortunately we had a little bit of documentation that at least showed the input and output parameters of each subroutine. It was my first experience with having to reverse engineer software, which came in handy later in my career when I started trying to figure out how the Oracle optimizer worked. 🙂

Your ponytail and cap, Kerry, are in a way your trademark. How did it all start?

I don’t really know. I just started wearing a cap most of the time after I got out of college and got used to it, and now it’s just like putting on my pants in the morning. No, I don’t sleep with my cap on (unless I’m lying on my back and need something to block out the light). 🙂 Tim Hall wrote a very funny blog post about my cap at one point. https://oracle-base.com/blog/2014/09/26/an-interview-with-kerry-osborne/

The ponytail thing is a bit of a nod to my Native American heritage. My great grandmother was a Cherokee (just a handed-down story, but I believe it to be true). But the real reason I let my hair grow is that my brain wants to optimize pretty much every process that I encounter. So at one point I went to get a haircut and, as I was sitting in the barber chair, I started thinking about how to optimize that process. I actually wrote about it on my blog here: http://kerryosborne.oracle-guy.com/2009/05/shopping-trip/ But the bottom line is that I calculated that over the course of my life I would lose almost a year by repeatedly going to get my hair cut. And since the quickest way to do anything is not to do it at all, I stopped.

Which is your favorite Oracle version?

I don’t really have a favorite. (I guess 11.2.0.4 is my favorite of the ones commonly in use at the moment)

Arguably you are the most famous DBA worldwide. How many presentations have you given in total at conferences, meetings, webcasts, etc.?

Nah – I’m definitely not! I don’t really know how many talks I’ve delivered. I don’t keep up with that stuff very well (that’s why I’m not an active Oracle ACE Director anymore actually – too time-consuming to keep up with that and fill out the required paperwork). Oddly enough, when I was in my 20’s, public speaking was my greatest fear. In fact, the first paper I submitted to IOUW (the predecessor to OOW) got accepted, but I backed out on doing the presentation because I was too scared. I had to convince one of my co-workers to go and deliver the paper for me. I worked hard for a few years to try to rid myself of that irrational fear, but the thing that really allowed me to get over it was when I started teaching week-long classes. There’s nothing like repetitive practice, and standing in front of a class 8 hours a day for a week at a time provided plenty of that. So I still get a little pit in my stomach before every talk, but I actually enjoy it now. 🙂

How do you see the future of the Oracle database, both on-premises and in the cloud?

I think Oracle will continue to be a vital platform in enterprises. I believe they have the best SQL engine on the planet. They are great at managing transactions, consistency and concurrency. So for those reasons, and the fact that there are tons of existing applications built on their platform, I believe that they will be around and highly valuable for the foreseeable future. I think that many systems will eventually run in public clouds but I don’t believe that the whole landscape will migrate to the cloud. I believe most companies will end up with a hybrid cloud strategy, combining on-prem with multiple public clouds. However, there are a number of other tools available that are making a major push into the Data Warehouse / Reporting / Analytics space. So this is the area where I see the most competition for Oracle.

Do you think the Autonomous Database puts at risk the jobs of many DBAs around the world?

Not really. I do think that the job of a DBA is changing. There are various flavors of DBAs, and some of them (maybe all of them) will need to change their focus. By that I mean that they’ll need to be more conversant with development. They need to be automating repetitive tasks. They need to be solving higher level problems. But I don’t think Oracle’s Autonomous Database offerings will eliminate a lot of jobs. They will just change them. That said, I think the biggest area of change will be for the people that are doing tasks that are repetitive. Adding storage, installing databases, patching databases, etc… These types of activities will be automated. So start learning how to automate them yourself instead of waiting for someone else to do it. That sets you up with the skill to automate other tasks and moves you into a space that will have more demand. And by the way, that’s basically a development task, so at a very high level, I guess I’m saying that I think people should be moving away from infrastructure type jobs and towards development oriented jobs.

How did you manage to build Enkitec into such an exquisite, top-skill group? Such talent is seldom concentrated in a single company!

The word exquisite is a bit over the top. 🙂 But I am very proud of the team that came together at Enkitec. I did a short 10-minute TED Talk-style presentation at OakTable World on recruiting world class talent a few years ago. A couple of the key points were that there are some basic requirements – team members need to be smart, have a good technical background, have worked for a reasonably long period of time (10+ years), and have excellent communication skills. But they need two other things that are even more important. They need to have the ability to apply their skills to creatively solve tough problems. This is not a given based on having the basic requirements covered. It’s a unique characteristic, which we can improve on, but which I believe is at least partly an innate ability. The second important requirement is that they need to NOT have a big ego. The ability to work together as a team and share knowledge requires people who don’t view themselves as smarter or better than the people around them, people who believe that everyone around them has value and something to offer, something they can learn from. This was one of the keys to building the team that we built. Nothing is perfect, but I am very proud of the results and the team we built. There is way too much to say in a brief answer, so maybe it’s better to just point to that video: https://www.youtube.com/watch?v=vI9do1DEm08

You have probably given more keynotes than anyone else. Which are your recent and upcoming topics of interest?

I don’t think that’s true, but I do a few here and there. I just did one called “How to Stay Relevant” at the Rocky Mountain User Group Conference in February. I’m also speaking at the UKOUG’s conference in Dublin in April and I am hoping to do another keynote at the Bulgarian User Group meeting this summer. The “How to Stay Relevant” talk is about the disruption that’s currently going on in our industry and it focuses on the effects on those of us who work with data (and Oracle specifically). In terms of what I’m interested in these days, distributed systems are high on the list. Any platform that moves the work to the data and that allows access via a SQL interface is of interest to me.

What is your take on ML, AI and Big Data? Are these just new trends in IT that come and go?

No, I don’t think they are just buzzwords! Unfortunately the hype cycle can quickly cause over-used terms to become a turn off. But the general idea of artificial intelligence is not going to just fade away as some trendy ideas do. So with regards to AI (and the subset of Machine Learning) I think we are still early in the adoption and will see more and more of it. We’ll also see a lot of things that are labeled as AI which are really not. So we’ll have to learn to be educated consumers. Oracle’s optimizer does some things that could be viewed as AI, for example. The ability to recognize that it’s made a mistake in its determination of a plan is very smart, but also very narrowly focused, and it does not really use “AI techniques” to make those determinations. But I have no doubt that the kernel developers are thinking about ways to forge ahead into that space. And there are a lot of successes already, specifically with narrowly focused applications (so-called ANI). The champions of most intellectual games are AI engines now, just as an example. Artificial General Intelligence is still a long way off in my view, but the Law of Accelerating Returns could kick in and blow us all away much more rapidly than we expect.

Big Data is also a trendy term that has been around a long time and that has been overused. But when I think of big data these days, I think more about how we deal with the ever increasing volumes (often produced by machines) that we are called on to manage. We also have increasingly complex kinds of analytics and more advanced processing (AI/ML) that we need to do on the large data sets. This requires highly scalable, distributed computing platforms that are capable of dealing with the volume in a predictable (scalable) way as well as in an economical way. Hadoop is somewhat synonymous with the term Big Data because it was the first (and still is the most popular) such framework in the open source community. As with most technology swings though, Hadoop has gone through a lot of hype (when people thought it would solve all of our technological problems), then a period of disillusionment (when everyone realized this technology, like all others, has its limitations/tradeoffs), and is now entering the phase of reasonable expectations, which has led to widespread adoption. This is the classic hype cycle that Gartner uses so heavily. But my view is that Big Data (Hadoop or other proprietary or public cloud-based offerings) is definitely not going away, and in fact I believe it will continue to be a huge force of change in our industry.

What single piece of advice would you give to the new DBAs entering the database profession?

Focus on the data, not the database. …

Hard to leave it at just that: focus on supporting, or becoming, a crossover developer. Focus on architecture and optimization topics. … But the one-liner is very quotable, right?

Some DBAs come from a sysadmin background and some from apps/applications. How did you become a DBA?

I started as an application developer. Then I started getting frustrated with the lack of control (power) to do what I wanted at the database level and I moved into a DBA role. Then I got frustrated with having to ask sysadmins for help and I became a sysadmin. Then I started doing jobs that combined all 3 aspects. I eventually got focused on optimization of existing systems, which combined all of the jobs in my background, and I started to learn about the internals of Oracle. And the more complicated the optimizer became, the more important it was to know how it made its decisions. I never was the absolute best at any of these jobs btw, but I think being competent in all these areas gave me a lot of advantages that I wouldn’t have had otherwise. Probably the main advantage was that it made it possible for me to look at problems from multiple angles. I’ve worked with a ton of DBAs over the years and some of the very best are the guys and gals with multiple skill sets (Network Admin/DBA, Storage Admin/DBA, Unix Admin/DBA, Developer/DBA, …). And of course having experience building software is a huge plus for anyone in the “computer” business. It gives you the ability to think to yourself, “Now if I had been the developer on this, how would I have coded it?”. This is an extremely powerful view into problems when trying to resolve software issues.

If you can add one new feature to the Oracle database, what would that be?

There are too many features already. Focus on fixing the bugs!

Tell us about Gluent and its flagship product.

I thought you’d never ask. …

We founded Gluent to address one of the biggest issues we’ve seen over the last decade or so in our work with large databases in the enterprise space. Fundamentally, Gluent allows existing applications to take advantage of big data platforms, and here’s the most important part, without re-writing the applications. I think the reason for founding the company is best illustrated by a story about a large medical company that reportedly had one of the largest Teradata clusters in the world. The whispered cost of maintaining that system was $40M a year, and there was a desire to migrate off that platform for cost reasons. A six-month project was undertaken to see if that was feasible without degrading the performance of the system. The results were pretty amazing. Performance increased on average over 1000X across a huge number of queries (all the queries from a day’s worth of work). The annual cost of the proposed solution was about $1M. Nevertheless, the company decided not to re-platform the system. The reason? They were calculating a 3 year ROI, and despite the fact that the projection showed a saving of nearly $120M, their assessment of the risk and cost to re-write the very complex application outweighed the savings. This is an extreme example of a story that plays out every day in the enterprise space and is the reason why Gluent was initially formed.

We have a few core beliefs:

We believe data volumes are exploding
We believe that legacy apps are not going away
We believe that data will continue to be born in silos but will need to be joined with other data
We believe that most new apps will be built on distributed back-ends and on data that is stored in open data formats
We believe that legacy skills need to be transitioned, not thrown away
We believe that most data will migrate to a centralized storage/compute platform
We believe that companies need to be able to change the wings on the plane while it is in the air

After an implementation of Gluent, the resulting system has a hybrid architecture which blends a distributed back-end with a relational database, the goal being for both components to do what they are best at: the relational database is really good at managing transactions, concurrency and consistency, while the distributed back-end excels at throughput and scalability, and does so economically. It also allows hybrid cloud deployments, where the database component can remain on-prem while the distributed back-end can be deployed in the cloud with some or all of the data in relatively low cost object storage. This sort of architecture allows a great deal of flexibility around data placement, between some in object storage, some local to a distributed cluster, and some local to the database tier. It also provides flexibility with regards to security, as some (sensitive) data can be contained strictly on-prem, while less sensitive data is moved out to the cloud, as an example.

The ultimate goal is to connect any applications to any data and become a data sharing platform. We believe that data will continue to be born in silos, but that it should be synchronized (once) to a distributed back-end and then presented to any engine/system that wants to access it (and has permissions to do so). We also have, as a gleam in our eye, the concept of Gluent as a Service, which would remove the need to manage the distributed back-end. We are already certified on S3/EMR for example (Azure and Google coming in the next couple of months), which allows us to spin up a cluster on demand as needed and shut it down when not in use. We are not quite there yet with the complete Gluent as a Service offering, but we are getting close. 🙂

I could go on (and on and on …). But hopefully you get the idea. We believe that our approach allows existing applications to take advantage of these magical new platforms, but without breaking anything. That’s our secret sauce.

What is the biggest mistake you have ever made as a DBA?

Ha – that made me laugh! Early in my career, when I first gained sysadmin privileges, I was still doing DBA and development work. So one night I was working remotely via a 1200 baud modem on a program of some sort (probably Fortran OCI accessing Oracle V3). Anyway, each test run was taking several minutes, and I noticed that there were several other processes running night time batch jobs on the system and they were burning a lot of CPU. So I decided to make use of my newly gained sysadmin privileges and upped my priority so that my test runs wouldn’t have to compete so much for CPU time. Well, in those days, the operating systems weren’t as smart about prioritization as they are now. So my process was effectively getting all the CPU time until it completed (and was much faster – yay!). Unfortunately, at some point I made a change that threw my code into an infinite loop. I tried to break out after a few minutes and unfortunately the modem connection dropped. Since my original process was getting all the available CPU time, there wasn’t even enough left to get logged back into another session to kill the rogue process. The program finally died when it had used up all the disk space. The real sysadmins were not happy with me the next morning!

If you could ask yourself a question and not answer it, what would it be?

What do you want to be when you grow up?
I still haven’t figured that out!


AI in AI: Artificial Intelligence in Automatic Indexing

In DBA on March 7, 2019 at 17:30

“By far, the greatest danger of Artificial Intelligence is that people conclude too early that they understand it.” — Eliezer Yudkowsky

CNBC: 40% of A.I. start-ups in Europe have almost nothing to do with A.I.

Oracle 19c brings one key feature which does not exist in other database systems: Automatic Indexing. Something very similar does exist in Azure SQL Database, but with some limitations.

For a very long time, both DBAs and developers have been struggling (really struggling) with which indexes should be created, what type of indexes they should be created as, and which indexes should be dropped from the database. Automatic Index creation (AI Creation) means the creation of new indexes, and also the dropping of existing, unused indexes, without human intervention.

In the long run, this is arguably going to be one of the most important features in the Oracle database. I have already covered the basics in a previous blog post entitled Automatic Indexing in 19c. The expert system works in the following way, passing through the stages of identification, verification and decision making:

Based on the captured workload, Oracle’s expert system identifies the index candidates, which are created first as UNUSABLE & INVISIBLE (metadata only).

Then, there is the verification process. Some indexes will become VALID (physical segments are created) but will still stay INVISIBLE to the optimizer.

Later, Oracle decides if some of these indexes can become VISIBLE and this happens based on how the performance increases and how these new indexes affect other activities in the database.
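
You can follow a candidate through these states directly from the dictionary; here is a minimal sketch, relying on the AUTO column of DBA_INDEXES which marks auto indexes in 19c:

-- Follow auto index candidates through the UNUSABLE/INVISIBLE,
-- VALID/INVISIBLE and VISIBLE states
SELECT index_name, status, visibility
FROM dba_indexes
WHERE auto = 'YES'
ORDER BY index_name;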

Look for possible errors using this query:

SELECT ex.execution_type, ex.execution_name, f.message
FROM dba_advisor_findings f, dba_advisor_executions ex
WHERE f.execution_name = ex.execution_name AND f.type = 'ERROR';

If you need a detailed report from (say) the last 30 days, here is how to obtain it:

set long 300000
spool report
select dbms_auto_index.report_activity(sysdate-30,null,'text','all','all') report from dual;
spool off

A sample report shows, besides the index candidates, the space used and any fatal errors, also the overall improvement factor and the improvement factor per SQL statement.

When using/implementing the feature, keep in mind the following (see the sketch after this list):

– AUTO_INDEX_MODE must be set in every PDB: even when set at container level, it does not cascade to the pluggable databases
– Manually created indexes are not dropped by default; you need to set AUTO_INDEX_RETENTION_FOR_MANUAL separately
– Follow the expert system runs from CDB_AUTO_INDEX_EXECUTIONS
– Hint for an INVISIBLE VALID index (for example /*+ index(clients SYS_AI_64uvm6wb5168u) */): I have seen the index become VISIBLE in a second (if really useful)
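
As an illustration of the first two points, a minimal sketch (the PDB name and the 90-day retention value are made up for the example):

-- Run inside each PDB: AUTO_INDEX_MODE does not cascade from the container
ALTER SESSION SET CONTAINER = MYPDB;  -- hypothetical PDB name
EXEC DBMS_AUTO_INDEX.CONFIGURE('AUTO_INDEX_MODE','IMPLEMENT');
-- Allow dropping of manual indexes unused for 90 days (example value)
EXEC DBMS_AUTO_INDEX.CONFIGURE('AUTO_INDEX_RETENTION_FOR_MANUAL','90');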

For more details, check the recent blog post 19c Auto Index: the dictionary views by Franck Pachot.

Automatic Indexing is by far one of the best examples of Artificial Intelligence and Machine Learning in the IT industry. Really! I still remember a 5TB Oracle database I used to administer (a mission critical, 24×7 system) where the indexes were almost 4.5TB in size while the real data was only about half a TB.

Automatic Indexing in 19c

In Autonomous, Database tuning, Databases, DBA, Oracle database on February 18, 2019 at 17:38

One of the most impressive new features of Oracle Database 19c is Automatic Indexing. Arguably, this is the most interesting innovation in the database world for a rather long time.

I remember some years ago when a DBA asked me at an Oracle conference: “Julian, why are half of the presentations at Oracle database conferences only about performance tuning? Is the Oracle database performing that badly that people should tune it all the time?” Sigh…

With 19c and ADB (Oracle Autonomous Database), things look very different now, don’t they? Automatic Indexing provides what database systems need: continuous optimization of the database workload, stable & solid performance and almost no human interaction. Let me share some of my early experience with Automatic Indexing and where human interaction is needed.

For now (February 18th, 2019), Oracle 19c is only available on Exadata (Linux 7.4) and in order to enable Automatic Indexing you need to do the following:

EXEC DBMS_AUTO_INDEX.CONFIGURE('AUTO_INDEX_MODE','IMPLEMENT');

The so-called expert system of Automatic Indexing runs every 15 minutes, for as long as one hour at a time. Note that I disabled the job from 4:43 till 5:56. The Resource Manager plan limits the task to 1 CPU only, and the next run is skipped if the current one has not completed within the 15 minutes.
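
If you want to verify the cadence yourself, the runs are recorded in the advisor views; a sketch, assuming the standard task name SYS_AUTO_INDEX_TASK:

-- Recent runs of the automatic indexing task
SELECT execution_name, execution_start, execution_end, status
FROM dba_advisor_executions
WHERE task_name = 'SYS_AUTO_INDEX_TASK'
ORDER BY execution_start DESC;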

Here are details on how Automatic Indexing works, but what is most important to remember is as follows:

– The auto index candidates are created as invisible auto indexes
– If the performance of SQL statements is not improved by the auto indexes, then the indexes are marked as unusable and the corresponding SQL statements are blacklisted
– Auto indexes cannot be used for any first-time SQL run against the database
– Auto indexes are created as either single-column, concatenated or function-based indexes, and they all use advanced low compression
– Unused auto indexes are deleted after 373 days (this can be changed)
– Unused non-auto indexes (manual indexes) are not deleted by the automatic indexing process by default, but can be configured to be deleted automatically if needed

Auto Indexing can be disabled at any time, or can be set to reporting mode (new auto indexes are created as invisible indexes, so that they cannot be used in SQL), with the following commands:

EXEC DBMS_AUTO_INDEX.CONFIGURE('AUTO_INDEX_MODE','OFF');
EXEC DBMS_AUTO_INDEX.CONFIGURE('AUTO_INDEX_MODE','REPORT ONLY');

Here is a way to ask Oracle to create new auto indexes in a separate tablespace called AUTO_INDEX_TS:

SQL> EXEC DBMS_AUTO_INDEX.CONFIGURE('AUTO_INDEX_DEFAULT_TABLESPACE','AUTO_INDEX_TS');

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.01

You can easily check the configuration of Automatic Indexing for the root container and the PDBs from CDB_AUTO_INDEX_CONFIG.
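
For example, a minimal sketch of such a check:

-- Current Automatic Indexing settings across all containers
SELECT con_id, parameter_name, parameter_value
FROM cdb_auto_index_config
ORDER BY con_id, parameter_name;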

If you need a report of what happened during the expert system activity (either for the last 3 days or for the last run), here is a way to generate it:

set long 300000
select DBMS_AUTO_INDEX.REPORT_ACTIVITY(SYSTIMESTAMP-3,SYSTIMESTAMP,'TEXT','ALL','ALL') from dual;
select DBMS_AUTO_INDEX.REPORT_LAST_ACTIVITY('TEXT','ALL','ALL') from dual;

These are the most important views about Auto Indexing:

DBA_AUTO_INDEX_EXECUTIONS: history of execution of automatic indexing tasks
DBA_AUTO_INDEX_STATISTICS: statistics related to auto indexes
DBA_AUTO_INDEX_IND_ACTIONS: actions performed on auto indexes
DBA_AUTO_INDEX_SQL_ACTIONS: actions performed on SQL statements for verifying auto indexes
DBA_AUTO_INDEX_CONFIG: configuration settings related to auto indexes
DBA_AUTO_INDEX_VERIFICATIONS: stats about PLAN_HASH_VALUE, AUTO_INDEX_BUFFER_GETS, etc.
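
As a quick usage example (a sketch, assuming the EXECUTION_NAME, STAT_NAME and VALUE columns of DBA_AUTO_INDEX_STATISTICS):

-- What the expert system has achieved per run
SELECT execution_name, stat_name, value
FROM dba_auto_index_statistics
ORDER BY execution_name, stat_name;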

The new package DBMS_AUTO_INDEX can be used for 3 main things:

1. Configuring the parameters related to Auto Indexing
2. Dropping *all* the indexes except the ones used for constraints
3. Reporting the activity of the “expert system”
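
For the second one, the package provides the DROP_SECONDARY_INDEXES procedure; a hedged sketch (the no-argument call is an assumption, check the exact overloads in the documentation first, as this is quite destructive):

-- Drops ALL indexes except the ones used for constraints – use with care!
EXEC DBMS_AUTO_INDEX.DROP_SECONDARY_INDEXES;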

Finally, here are some additional resources:

Automatic Indexing in Oracle Database 19c
Oracle Database 19c is now available!
Managing Auto Indexes

How to check if you have any auto indexes in your database:

select auto, count(*) from dba_indexes group by auto;

A few interesting facts about Oracle ADB, Redshift and Snowflake

In Autonomous, Data Warehouse, Databases, DBA on January 14, 2019 at 16:17

Building a new data warehouse in the cloud, or migrating an existing one to the cloud, requires careful consideration, and the answer to the question “Which cloud should I use?” is often “It depends”.

An interesting comparison of system properties comparing Amazon Redshift vs. Oracle vs. Snowflake can be found on db-engines.com

There are several other options too: Azure SQL Data Warehouse, Presto, Google BigQuery, etc.

An interesting benchmark paper called “Data Warehouse Benchmark: Redshift, Snowflake, Azure, Presto and BigQuery” by Fivetran is worth reading!

Another comparison called Interactive Analytics: Redshift vs Snowflake vs BigQuery is already more than 2 years old but still interesting.

Recently, things have changed. Oracle’s Autonomous Data Warehouse Cloud has been in GA for almost a year (since March 2018). For enterprise loads and mission critical systems, ADW is arguably the best solution right now.

Viscosity compared both Oracle Autonomous and Amazon Redshift. The result? Check it here: Amazon vs Oracle: Data Warehouse Services, How do They Compare?

In short, the conclusion of the research above is:

– Oracle’s ADW was able to achieve data retrieval at the lowest latencies and achieved the highest volume of queries per hour, in terms of both serial query execution and multi-user query throughput.
– Oracle’s ADW consistently outperformed Redshift by a factor of 4x in both sets of tests.

And do not ignore the db-engines ranking! Only one of the three is in the Top 10.

What is interesting to know, on top of all the papers above, are these 10 differences, or let us call them less-known technical facts (in no particular order), between Oracle Autonomous, Amazon Redshift and Snowflake:

1. Snowflake compute usage is billed on a per-second basis, with a minimum of 60 seconds. Amazon Redshift is based on PostgreSQL 8.0.2 and is built on top of technology from the MPP data warehousing company ParAccel. Oracle Autonomous Database is based on Exadata and 18c.

2. In Oracle Autonomous Cloud, you can provision up to 128 CPUs and 128TB directly from the cloud console but you can provision more if needed.

3. Snowflake manages all aspects of how data is stored in S3 including data organization, file sizes, structure, compression, and statistics.

4. The only things needed for BYOL in Oracle Autonomous Database are Multitenant and RAC (only when using more than sixteen OCPUs). The standby option (not yet available) will require Active Data Guard as well.

5. Snowflake does not disclose information about processing power and memory. Oracle does disclose this information via internal views, but you cannot directly define the SGA or PGA size.

6. Redshift is not built as a high-concurrency database with many concurrently running queries, and AWS recommends that you execute no more than 15 queries at a time. The number of concurrent user connections that can be made to a cluster is 500.

7. Oracle ADW and ATP allow you to partition both indexes and tables. In Snowflake partitioning is handled internally. Amazon Redshift does not support tablespaces, table partitioning, inheritance, and even certain constraints. Amazon Redshift Spectrum supports table partitioning using the CREATE EXTERNAL TABLE command.

8. The maximum number of tables in Amazon Redshift is 9,900 for large and xlarge cluster node types and 20,000 for 8xlarge cluster node types. The limit includes temporary tables. An Oracle database does not have a limit for the number of tables.

9. Oracle automatically applies all security updates (and online!) to ensure data is not vulnerable to known attack vectors. Additional in-database features like Virtual Private Database and Data Redaction are also available.

10. There is no operation in Snowflake for collecting database statistics; this is handled by the engine. In Oracle, manual database statistics collection is allowed. Both Oracle Autonomous and Amazon Redshift monitor changes to your workload and automatically update statistics in the background.
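
To illustrate the Oracle side of point 10, a small sketch (the schema and table names are made up):

-- Manual statistics collection remains available in Oracle
EXEC DBMS_STATS.GATHER_TABLE_STATS(ownname => 'SALES', tabname => 'ORDERS');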

Finally, here are official URLs of all three products:

Oracle Autonomous Database
Amazon Redshift
Snowflake Database