Dontcheff

Archive for March, 2019|Monthly archive page

An Interview with Kerry Osborne

In DBA on March 19, 2019 at 15:28

No introductions needed, the one and only Kerry Osborne:

Tell us what you remember about your first experiences with Oracle 2

I was just out of college – my first job. The first day they handed me a set of Oracle manuals (I think there was only one book actually). Maybe a half an inch thick. It was running on Vax/VMS and it stack dumped on a pretty regular basis. I figured out a couple sure fire ways to make it it crash. 🙂 The company was already using it and had written a bunch of Fortran OCI stuff that was pretty cool. (I had actually worked on some of the Fortran code while I was still in college – that’s how I got the job). That system had a fairly large subroutine library in object file format only. They had some how lost the source code. Fortunately we had a little bit of documentation that at least showed the input and output parameters of each subroutine. It was my first experience with having to reverse engineer software, which came in handy later in my career when I started trying to figure out how the Oracle optimizer worked. 🙂

Your ponytail and cap Kerry are in a way trademarked by you. How did it all start?

I don’t really know. I just started wearing a cap most of the time after I got out of college and got used to it and now it’s just like putting on my pants in the morning. No I don’t sleep with my cap on (unless I’m laying on my back and need something to block out the light). 🙂 Tim Hall wrote a very funny blog post about my cap at one point. https://oracle-base.com/blog/2014/09/26/an-interview-with-kerry-osborne/

The pony tail thing is a bit of a nod to my native American heritage. My great grandmother was a Cherokee (just a handed down story, but I believe it to be true). But the real reason I let my hair grow is that my brain wants to optimize pretty much every process that I encounter. So at one point I went to get a haircut and as I was sitting in the barber chair, I started thinking about how to optimize that process. I actually wrote about it on my blog here: http://kerryosborne.oracle-guy.com/2009/05/shopping-trip/ But the bottom line is that I calculated that over the course of my life I would loose almost a year by repeatedly going to get my hair cut. And since the quickest way to do anything is not to do it at all. I stopped.

Which is your favorite Oracle version?

I don’t really have a favorite. (I guess 11.2.0.4 is my favorite of the ones commonly in use at the moment)

Arguably you are the most famous DBA worldwide, how many presentations have you given in total at conferences, meetings, webcast, etc.

Nah – I’m definitely not! I don’t really know how many talks I’ve delivered. I don’t keep up with that stuff very well (that’s why I’m not an active Oracle Ace Director anymore actually – too time consuming to keep up with that and fill out the required paperwork). Oddly enough, when I was in my 20’s, public speaking was my greatest fear. In fact, the first paper I submitted to IOUW (the predecessor to OOW) got accepted, but I backed out on doing the presentation because I was too scared. I had to convince one of my co-workers to go and deliver the paper for me. I worked hard for a few years to try to rid myself of that irrational fear, but the thing that really allowed me to get over it was when I started teaching week long classes. There’s nothing like repetitive practice, and standing in front of a class 8 hours a day for a week at a time provided plenty of that. So I still have a little pit in my stomach before every talk, but I actually enjoy it now. 🙂

How do you see the future of the Oracle database both on premise and in the cloud?

I think Oracle will continue to be a vital platform in enterprises. I believe they have the best SQL engine on the planet. They are great at managing transactions, consistency and concurrency. So for those reasons, and the fact that there are tons of existing applications built on their platform, I believe that they will be around and highly valuable for the foreseeable future. I think that many systems will eventually run in public clouds but I don’t believe that the whole landscape will migrate to the cloud. I believe most companies will end up with a hybrid cloud strategy, combining on-prem with multiple public clouds. However, there are a number of other tools available that are making a major push into the Data Warehouse / Reporting / Analytics space. So this is the area where I see the most competition for Oracle.

Do you think the Autonomous Database puts at risk the jobs of many DBAs around the world?

Not really. I do think that the job of a DBA is changing. There are various flavors of DBA’s and some of them (maybe all of them) will need to change their focus. By that I mean, that they’ll need to be more conversant with development. They need to be automating repetitive tasks. They need to be solving higher level problems. But I don’t think Oracle’s Autonomous Database offerings will eliminate a lot of jobs. It will just change them. That said, I think the biggest area of change will be for the people that are doing tasks that are repetitive. Adding storage, installing databases, patching databases, etc… These types of activities will be automated. So start learning how to automate them yourself instead of waiting for someone else to do it. That sets you up with the skill to automate other tasks and moves you into a space that will have more demand. And by the way, that’s basically a development task, so at a very high level, I guess I’m saying that I think people should be moving away from infrastructure type jobs and towards development oriented jobs.

How did you manage to create Enkitec as an exquisite top skill group? Such a talent is seldom concentrated in one single company!

The word exquisite is a bit over the top. 🙂 But I am very proud of the team that came together at Enkitec. I did a short 10 min Ted Talk styled presentation at OakTable World on recruiting world class talent a few years ago. A couple of the key points were that there are some basic requirements – team members need to be smart, have a good technical background, have worked for a reasonably long period of time (10+ years), and have excellent communication skills. But they need 2 other things that are even more important. They need to have the ability to apply their skills to creatively solve tough problems. This is not a given based on having the basic requirements covered. It’s a unique characteristic, which we can improve on, but which I believe is at least partly an innate ability. The 2nd important requirement is that they need to NOT have a big ego. The ability to work together as a team and share knowledge requires people that don’t view themselves as smarter/better than the people around them and with a view that everyone around them has value and something to offer, something they can learn from. This was one of the keys to building the team that we built. Nothing is perfect, but I am very proud of the results and the team we built. There is way too much to say in a brief answer, so maybe it’s better to just point to that video: https://www.youtube.com/watch?v=vI9do1DEm08

You have probably given more keynotes than anyone else. Which are you recent and upcoming topics of interest?

I don’t think that’s true, but I do a few here and there. I just did one called “How to Stay Relevant” at the Rocky Mountain User Group Conference in February. I’m also speaking at the UKOUG’s conference in Dublin in April and I am hoping to do another keynote at the Bulgarian User Group meeting this summer. The “How to Stay Relevant” talk is about the disruption that’s currently going on in our industry and it focuses on the effects on those of us who work with data (and Oracle specifically). In terms of what I’m interested in these days, distributed systems are high on the list. Any platform that moves the work to the data and that allows access via a SQL interface is of interest to me.

What is your take on ML, AI and Big Data? Are these just new trends in IT that come and go?

No, I don’t think they are just buzz words! Unfortunately the hype cycle can quickly cause over-used terms to become a turn off. But the general idea of artificial intelligence is not going to just fade away as some trendy ideas do. So with regards to AI (and the subset of Machine Learning) I think we are still early in the adoption and will see more and more of it. We’ll also see a lot of things that are labeled as AI, which are really not. So we’ll have to learn to be educated consumers. Oracle’s optimizer does some things that could be viewed as AI for example. The ability to recognize that it’s made a mistake in it’s determination of a plan is very smart, but also very narrowly focused and does not really use “AI techniques” to make those determinations. But I have no doubt that the kernel developers are thinking about ways to forge ahead into that space. And there are a lot of successes already, specifically with narrowly focused applications (so called ANI). The champions of most intellectual games are AI engines now, just as an example. Artificial General Intelligence is still a long way off in my view, but the Law of Accelerating Returns could kick in and blow us all away much more rapidly than we expect.

Big Data is also a trendy term that has been around a long time and that has been over used. But when I think of big data these days, I think more about how we deal with the ever increasing volumes (often produced by machines) that we are called on to manage. We also have increasingly complex kinds of analytics and more advanced processing (AI/ML) that we need to do on the large data sets. This requires highly scalable, distributed computing platforms that are capable of dealing with the volume in a predictable (scalable) way as well as in an economical way. Hadoop is somewhat synonymous with the term Big Data because it was the first (and still is the most popular) such framework in the open source community. As with most technology swings though, Hadoop has gone through a lot of hype (when people thought it would solve all of our technological problems), then a period of disillusionment (when everyone realized this technology, like all others, has it’s limitations/tradeoffs), and is now entering the phase of reasonable expectations, which has led to widespread adoption. This is the classic hype cycle that Gartner uses so heavily. But my view is that Big Data (Hadoop or other proprietary or public cloud based offerings) is definitely not going away, and in fact I believe it will continue to be a huge force of change in our industry.

What single piece of advice would you give to the new DBAs entering the database profession?

Focus on the data, not the database. …

Hard to just leave it at just that: Focus on supporting or becoming a cross over developer. Focus on architecture and optimization topics. … But the one liner is very quotable, right?

Some DBAs come from sysadmin background and some from apps/applications. How did you become a DBA?

I started as an application developer. Then I started getting frustrated with the lack of control (power) to do what I wanted at the database level and I moved into a DBA role. Then I got frustrated with having to ask sysadmins for help and I became a sysadmin. Then I started doing jobs that combined all 3 aspects. I eventually got focused on optimization of existing systems which combined all of the jobs in my background and I started to learn about the internals of Oracle. And the more complicated the optimizer became, the more important it was to know how it made it’s decisions. I never was the absolute best at any of these jobs btw, but I think being competent in all these areas gave me a lot of advantages that I wouldn’t have had otherwise. Probably the main advantage was that it made it possible for me to look at problems from multiple angles. I’ve worked with a ton of DBAs over the years and some of the very best are the guys and gals with multiple skills sets (Network Admin/DBA, Storage Admin/DBA, Unix Admin/DBA, Developer/DBA, …) And of course having experience building software is a huge plus for anyone in the “computer” business. It gives you the ability to think to yourself, “Now if I had been the developer on this, how would I have coded it?”. This is an extremely powerful view into problems when trying to resolve software issues.

If you can add one new feature to the Oracle database, what would that be?

There are too many features already. Focus on fixing the bugs!

Tell us about Gluent and its flagship product?

I thought you’d never ask. …

We founded Gluent to address one of the biggest issues we’ve seen over the last decade or so in our work with large databases in the enterprise space. Fundamentally, Gluent allows existing applications to take advantage of big data platforms, and here’s the most important part, without re-writing the applications. I think the reason for founding the company is best illustrated by a story about a large medical company that reportedly had one of the largest Teradata clusters in the world. The whispered cost was $40M a year maintaining that system and there was a desire to migrate off that platform for cost reasons. A 6 month project was undertaken to see if it was feasible without degrading the performance of the system. The results were pretty amazing. The performance was increased on average over 1000X across a huge number of queries (all the queries from a days worth of work). The annual cost of the proposed solution was about $1M. Nevertheless, the company decided not to re-platform the system. The reason? They were calculating a 3 year ROI, and despite the fact that the projection showed a saving of nearly $120M, their assessment of the risk and cost to re-write the very complex application out weighed the savings. This is an extreme example of a story that plays out every day in the enterprise space and is the reason why Gluent was initially formed.

We have a few core beliefs:

We believe data volumes are exploding
We believe that legacy apps are not going away
We believe that data will continue to be born in silos but will need to joined with other data
We believe that most new apps will be built on distributed back-ends and on data that is stored in open data formats
We believe that legacy skills need to be transitioned, not thrown away
We believe that most data will migrate to a centralized storage/compute platform
We believe that companies need to be able to change the wings on the plane while it is in the air

After an implementation of Gluent, the resulting system has a hybrid architecture which blends a distributed back-end with a relational database. The goal being for both components to do what they are best at. The relational database being really good at managing transactions, concurrency and consistency and the distributed back-end excelling at throughput, scalability, and doing so economically. It also allows hybrid cloud deployments where the database component can remain on-prem while the distributed back-end can be deployed in the cloud with some or all of the data in relatively low cost object storage. This sort of architecture allows a great deal of flexibility around data placement between some in object storage, some local to a distributed cluster, and some local to the database tier. It also provides flexibility with regards to security, as some (sensitive) data can be contained strictly on-prem, while less sensitive data is moved out to the cloud, as an example.

The ultimate goal is to connect any applications to any data and become a data sharing platform. We believe that data will continue to be born in silos, but that it should be synchronized (once) to a distributed backend and then presented to any engine/system that wants to access it (and has permissions to do so). We also have as a gleam in our eye, the concept of Gluent as a Service. Which would remove the need to manage the distributed backend. We are already certified on S3/EMR for example (Azure and Google coming in the next couple of months) which allows up to spin up a cluster on demand as needed and shut it down when not in use. We are not quite there yet with the complete Gluent as a Service offering, but we are getting close. 🙂

I could go on (and on and on …). But hopefully you get the idea. We believe that our approach allows existing applications to take advantage of these magical new platforms, but without breaking anything. That’s our secret sauce.

Which is the biggest mistake you have ever made as a DBA?

Ha – that made me laugh! Early in my career, when I first gained sysadmin privileges, I was still doing DBA and development work. So one night I was working remotely via a 1200 baud modem on a program of some sort (probably Fortran OCI accessing Oracle V3). Anyway, each test run was taking several minutes and I noticed that there were several other processes running night time batch jobs on the system and they were burning a lot of CPU. So I decided that make use of my newly gained sysadmin privileges and I upped my priority so that my test runs wouldn’t have to compete so much for CPU time. Well in those days, the operating systems weren’t as smart about prioritization as they are now. So my process was effectively getting all the CPUI time until it completed (and was much faster – yay!). Unfortunately, at some point I made a change that threw my code into an infinite loop. I tried to break out after a few minutes and unfortunately the modem connection dropped. Since my original process was getting all the available CPU time, there wasn’t even enough left to get logged back into another session to kill the rogue process. The program finally died when it had used up all the disk space. The real sysadmins were not happy with me the next morning!

If you can ask yourself a question, and not answer it, what would that be?

What do you want to be when you grow up?
I still haven’t figured that out!

AI in AI: Artificial Intelligence in Automatic Indexing

In DBA on March 7, 2019 at 17:30

“By far, the greatest danger of Artificial Intelligence is that people conclude too early that they understand it.” — Eliezer Yudkowsky

CNBC: 40% of A.I. start-ups in Europe have almost nothing to do with A.I.

Oracle 19c brings one key feature which does not exist in database systems: Automatic Indexing. Something very similar does exit in Azure SQL Database but with some limitations.

For a very long time, both DBAs and Developers, have been struggling (really struggling) with what indexes should be created, what type of indexes they should be created as and what indexes should be dropped from the database. Automatic Index creation (AI Creation) means the explicit creation of new indexes and also dropping existing, unused indexes without human intervention.

In the long run, this is to be arguably one of the most important features in the Oracle database. I have already covered the basics in a previous blog post entitled Automatic Indexing in 19c. The expert system works in the following way passing through the stages of identification, verification and decision making:

Based on the captured workload, Oracle’s expert system identifies the index candidates which are created first as UNUSABLE & INVISIBLE (metadata only):

Then, there is the verification process. Some indexes will become VALID (physical segments are created) but will still stay INVISIBLE to the optimizer.

Later, Oracle decides if some of these indexes can become VISIBLE and this happens based on how the performance increases and how these new indexes affect other activities in the database.

Look for possible error using this query:

select EX.execution_type, EX.execution_name,F.message
from DBA_ADVISOR_FINDINGS F, DBA_ADVISOR_EXECUTIONS EX 
WHERE F.EXECUTION_NAME = EX.EXECUTION_NAME AND F.TYPE = 'ERROR';

If you need a detailed report from (say) the last 30 days, here is how to obtain it:

spool report
select dbms_auto_index.report_activity(sysdate-30,null,'text','all','all') report from dual;
spool off

A sample report shows beside the index candidates, space used, fatal errors also the overall improvement factor and also the SQL statement improvement factor:

When using/implementing the feature, have in mind the following:

– AUTO_INDEX_MODE must be set in every PDB: even set on container level it is not cascading to the pluggable databases
– Manually created indexes are nor dropped by default, you need to set separately AUTO_INDEX_RETENTION_FOR_MANUAL
– Follow the expert system runs from CDB_AUTO_INDEX_EXECUTIONS
– Hint for an INVISIBLE VALID index (for example /*+ index(clients SYS_AI_64uvm6wb5168u) */): I have seen how the index becomes VISIBLE in a second (if really useful)

For more details, check the recent bog post 19c Auto Index: the dictionary views by Franck Pachot

Automatic Indexing is by far one of the best examples of Artificial Intelligence and Machine Learning in the IT Industry. Really! I still remember a 5TB Oracle database I used to administer (mission critical one, a 24×7 system) where the indexes were almost 4.5TB in size while the real data was like half a TB only.