Dontcheff

Archive for March, 2011|Monthly archive page

Do not go beyond this point: on the “obvious advantages” of Database Shards

In DBA, Oracle database on March 28, 2011 at 11:22

Caution! Database Sharding is like the anti-consolidation of databases. It is splitting the database into many small databases. You spend years and years on trying to unify and gather together databases and all of a sudden you are told that there is an application managed scaling technique using hundreds of independent databases. Tricky, right?

Sometimes, when planning database solutions in terms of scalability and massiveness, going beyond a certain point might be risky. This is the case when database shards may be of huge help (big website used globally). The word shard may sometimes refer to a piece of glass, a sea glass that can be found almost everywhere, for example at the beaches near San Francisco.

This is my friend Pasi Jaakkola. I took the picture in 2006 during Oracle Open World 2006. We brought however no databases, no glass shards from Frisco, not literally at least. But we both often dare “cross the point” and take the extra mile.

Here are 3 sources on database sharding:

1. Wei Hu’s excellent white paper Global Scale Web 2.0 Data Platforms Sharding with Oracle Database

2. Tom Kyte’s discussion on Shared Nothing

3. The Rise of Database Sharding on CodeFutures.

Let us have a look at number 3 and the 3 claims there about the several advantages of smaller databases:

Claim 1: Smaller databases are easier to manage
Claim 2: Smaller databases are faster
Claim 3: Database Sharding can reduce costs

Having a quick glance at the three statements above most people would say yes, true. But people who had been involved in the DBA work on data center level (meaning working with 100s of databases) might tend to disagree. Why? Here we go:

“Smaller databases are easier to manage”: if you have to break a databases into smaller databases, then you will have more monitoring configurations for Grid Control, thus the number of possible problems will increase. You will have to configure more backups and more problems on the way. Naturally. If you have to recover the data one day probably consistency of the data will be a rather important thing to consider, so you might have to recover not just one database to the timestamp needed. So what is easier to manage for a smaller database? The claim “Routine table and index optimizations can stretch to hours or days, in some cases making regular maintenance infeasible” is unclear to me. What is table optimization? Gathering statistics for the table? Online reorganization? Beats me 🙂

Next, “Smaller databases are faster”: yeah right. I have seen databases less than 1G in size having SQL statements running longer than the complete trilogy of the Godfather. I don’t even want to get into that. If a database is fast or not has nothing to do with the size. Check this claim out: “By hosting each shard database on its own server, the ratio between memory and data on disk is greatly improved, thereby reducing disk I/O.” Great! Nobody cares about the extra computers, the software licenses.. Let’s have a look at the other statement: “This results in less contention for resources, greater join performance, faster index searches, and fewer database locks.” So the smaller the database, the less locks you have? At least this is what I understand from the quotes sentence. And we all know that locks are due to clumsy SQLs and logical structure of the database, not due to the growing size of the database.

And now the most bizarre claim: “Database Sharding can reduce costs”. Right, cost! That is what all we want to hear. Byte the hook! Can but cannot. It can rather increase the cost. More hardware, more license fees, more databases to manage.

The real benefits of Database Sharding can be found in the Oracle’s white paper above. It is written by Wei Hu and it is exceptionally good! And good results with Sharding can be achieved with Oracle by the way. Forget about MySQL here. But decide for yourself after you read the white paper.

These are frosted shards of glass:

What is easier: to break the glass into these small pieces or do build the glass out of all these shards?

And the questions asked in the CodeFutures arcticle: “If Database Sharding is highly scalable, less costly, and improves performance, why hasn’t adoption of the technology been more widespread?” Probably, because not every Sharding is highly scalable, less costly, and improves performance. Really.

Conclusion: the real benefit of Database Sharding is often neglected. Main reason being the concept is wrongly comprehended. As Wei Hu says: “Sharding across many databases multiplies Oracle’s advantages a-thousand-fold”!

On the Total Cost of Database Administration

In DB2 database, DBA, MySQL, Oracle database, SQL Server on March 25, 2011 at 10:42

Have you ever read those “white papers” distributed over the internet on total costs of database administration? One claiming that “Microsoft SQL Server required significantly less effort to install and maintain than Oracle”, another one saying that “DBAs can perform typical administrative functions in 41 percent less time when using Oracle Database compared to Microsoft SQL Server”.

You can just stare in disbelief in their “facts” and “arguments”. The former quote above is from ComputerWorld “The Total Cost of Administration Champ: Microsoft SQL Server 2005 or Oracle Database 10g?”, the latter is from TechRepublic “Oracle Database 11g vs.
Microsoft SQL Server 2008”
.

If Oracle is the Lexus of relational databases, is then your Oracle DBA cost as high as the service cost of an expensive car? Should you consider switching to SQL Server? Nobody is saying that IBM’s DB2 is cheap but very often the comparison is between Oracle and SQL Server.

You might have seen Don Burleson’s article Oracle 10g v. SQL Server. I agree with Don that reducing the number of expensive DBAs is hardly an area of opportunity for the IT manager to save money. But when you often show managers figures like the ones below:

Managers will sooner or later start believing some of those “research figures”. Some of them at least. You see, somebody claims that one DBA can administer 9.9 Oracle databases/31.2 Microsoft databases , another one mentions 15.2 Oracle databases/65.4 Microsoft databases. Probably the figures are not far from reality although I have seen that with some automation the numbers can grow much higher (100-150 databases per DBA) although I can’t take seriously studies giving me numbers about users supported by DBA. Sorry, what users? Internet users, database users/schemas? Come on..

It is not up to the number of databases a DBA handles or the gigabytes per DBAs or even the brand. It is up to the skills and experience of person doing the database administrations, it is up to environment and the DBA team handling the work. It is much about the set-up of the databases and their complexity. Of course the automation involved in the DBA work is a significant factor as well. But generalizations like the ones above and below are just mildly insane. If someone tries to convince you that you will save a lot of money by putting your databases in “the Cloud”, just turn and run 🙂

About 5 years ago, Forrester did the following forecasts, decide for yourself if predictions were TRUE or FALSE. At least I disagree with the performance tuning curve.

1. DBA to databases ratio:

2. Database administration cost is declining:

3. Database administration challenges:

Database administration has been and will remain the most complex part of IT. The new features added by vendors like Oracle, IBM and Microsoft to the relation software only make administration more difficult, complex and hard to learn and master from scratch.

P.S. There is a book written in 1954 called “How to Lie with Statistics”. You can get it now for 5.99 from amazon.com

Larry Ellison at the Churchill Club

In DBA, Grid Control, Oracle database, PL/SQL on March 21, 2011 at 11:02

Today is exactly one and half years after Larry Ellison was interviewed by Ed Zander at the Churchill Club.

Ben Rockwood wrote a must-read article about the event and I would not like to repeat what he said but just highlight few things looking retrospectively into this probably most interesting and flamboyant LJE interview ever.

Here are 5 interesting IBM related quotes:

1. The company that does more Oracle installations than any other company in the world: that will be IBM.
2. IBM is who we’re targeting, its who we feel we need to compete with to be successful ourselves. We are happy to compete with IBM on level plain field.
3. I’d like IBM to explain what they meant when they said they took 250 customers away from Sun. I don’t think there is a single example of any Sun customer who’s replaced all of their machines with IBM computers.
4. I will be very clear: Solaris is way better than AIX.
5. Sun machines run faster than IBM machines and cost less.

How do you like the part on Cloud Computing? I just can’t disagree with him 🙂

Do you remember also this one?

And finally the part from last year’s OOW (2010):

Poll: which concept do you least trust, which one do you find most fake?

11.2.0.2: Top 10 new features for DBAs

In DBA, Oracle database on March 19, 2011 at 23:18

New features, new features.. What new can you expect from a relational database after more than 10 major versions? In fact, if we consider as different versions the three releases of 8i, the two releases of 9i, 10g and 11g and take into account that there never was version 1, we are now into Oracle version 16. Or is it version 17 as 11.2.0.2 comes as separate totally new installation with several new features? Something we have not witnessed before.

Have in mind that you will have to download about 5G of zipped files:

So, here are the Top 10 DBA features as I prefer them and as documented by Oracle. Feel free to disagree 🙂

1. The new PL/SQL package called DBMS_AUTO_SQLTUNE provides more restrictive access to the Automatic SQL Tuning feature. Which is as many know my favorite one in 11g along with EBR. With this package, access to Automatic SQL Tuning can be restricted to DBAs so that only they can change its configuration settings that effect run-time behavior of the query optimizer, such as enabling or disabling automatic SQL profile creation.

2. Edition-Based Redefinition: The EDITION attribute of a database service specifies the initial session edition for a session that is started using that service. If the program that creates a new session does not specify the initial session, then the edition name specified by the service is used. If the service does not specify the edition name, then the initial session edition is the database default edition. DBAs should work together with Developers to achieve proper implementation

3. Redundant Interconnect Usage: Oracle RAC requires a dedicated network connection between the servers of the Oracle RAC cluster. The dedicated network connection, called interconnect, is crucial for the communication in the cluster. Using redundant network connections for load balancing and for failure protection is recommended. While in previous releases, technologies like bonding or trunking had to be used to make use of redundant networks for the interconnect, Oracle Grid Infrastructure for a Cluster and Oracle RAC now provide a native way of using redundant network interfaces in order to ensure optimal communication in the cluster. Using redundant interconnects optimizes the stability, reliability, and scalability of an Oracle RAC cluster. At least that is what Oracle claim in the documentation.

4. Oracle Enterprise Manager DB Control provides support for Oracle RAC One Node databases. Additionally, support has been added in this release to Oracle Database Configuration Assistant (DBCA) to create an Oracle Real Application Clusters One Node (Oracle RAC One Node) database as part of the database creation process.

5. Control File updates can be disabled during NOLOGGING operations using DB_UNRECOVERABLE_SCN_TRACKING (set it to true or false)

6. Enhanced TRUNCATE functionality: While truncating a table or partition, you can now specify whether or not to keep any segments. Truncating a table or partition with the new extended syntax removes all segments and does not use any space until new data is inserted. All allocated space in a database can now be reclaimed by truncating tables or partitions with the new extended syntax, optimizing the space foot print of any application. Here is an example of the new syntax:

TRUNCATE TABLE emp DROP ALL STORAGE;

7. Columnar compression is now supported with Oracle Streams and XStream. This feature enables logical replication of tables compressed using Exadata Hybrid Columnar Compression. Is anyone using Streams with Exadata 🙂 ?

8. Important: the default size of the first extent of any new segment for a partitioned table is now 8M instead of 64K. The goal according to Oracle is to improve I/O performance. However, under certain circumstances, loading a table will take significantly more disk space.

9. Using binary XML with SecureFiles as the XMLType default storage: in this release, the default storage model has changed for XMLType from STORE AS CLOB to STORE AS SECURE FILE BINARY XML. This affects the storage used when an explicit STORE AS clause is not supplied when creating an XMLType table or column. Not specifying a STORE AS CLAUSE indicates that it is left to the database to determine what the optimal storage model should be.

10. Support for in-place upgrade of clients: Both in-place and out-of-place upgrades are supported for client installations. You now have the option of doing in-place client upgrades reducing the need for extra storage and simplifying the installation process.

Note that there are some interesting (underscore) parameters such as:

_DATAFILE_WRITE_ERRORS_CRASH_INSTANCE
_MEMORY_IMM_MODE_WITHOUT_AUTOSGA
_FIX_CONTROL
CURSOR_BIND_CAPTURE_DESTINATION

Beware the new tcp_ and udp_ kernel parameters for Solaris 10 in order to install 11.2.0.2.

Finally, here are the platforms for which Oracle 11.2.0.2 is available as of 19.03.2011:

Scorecards for ranking relational databases

In DB2 database, DBA, Oracle database, SQL on March 18, 2011 at 18:10

In June 1970, Edgar “Ted” F. Codd publicly publishes the following paper: A Relational Model of Data for Large Shared Data Banks. IBM refused to implement his model to preserve revenues of IMS/DB. Customers pressured IBM to build it (System-R project) and a relational language SEQUEL: Structured English Query Language, later SQL, evoked.

Codd’s produced 12 rules, which in fact is a set of thirteen rules (numbered zero to twelve), that define a relational database.

Depending on your brand preferences, one can play with the words in the definition of a relational database and for example get Oracle on top as most “relational”. Or get IBM or even 3-4 other brands above Oracle. Some give one point for each rule being fulfilled, thus 12 points is the maximum. Some give 0.5 points for partially fulfilled rules in which case Oracle would be rated as 11.5/12 as (R6) is not satisfied (and will never be).

Oracle have come closer than all other brands to solving the update view issue using MVs (materialized views). DB2 can also support the update view fully, if the integrity between a view and its base table can be maintained. Microsoft SQL Server has to solve the view problem as they are behind.

Some people claim that there is no RDBMS that supports totally the update views rule (R6), the management of null values rule (R3) and and logical data independence (R9). My claim is that Oracle satisfies (R3) & (R9). For (R6) , I give Oracle 0.5 points.

You can see rather obscure statements like on http://www.firstsql.com that “only Ingres and FirstSQL among the RDBMS vendors have a solid foundation in their engine for future improvements.” Additionally they say that “Oracle would rate about 60% compliance. Others are lower except Ingres and FirstSQL which rate about 65% compliance. Interestingly enough, Xbase products rate 0% in compliance with the relational model.” Yeah, right 🙂

Overall, regardless of the question is Oracle, Microsoft or IBM most relational in their database products, we should remember than nowadays it doesn’t really matter, does it? What Codd (and Date) produced was just a model and they missed (or didn’t bother to wait for) the business train. Larry Ellison spotted the business idea and built a database empire. With all the limitations faced throughout the years.

Today, the technology does not drive the business. It is vice versa. We develop technology to make business move forward. That is why I don’t care at all if DB2, Oracle or SQL Server is 100% relational. It doesn’t matter any longer! What would you rather be: right or rich?

And finally, here is for your reference Chris Date’s version of the Codd’s Rules and note that in Version 8, Oracle was 8.5/12 RDBMS compliant 🙂

0. (Yes, there is a Rule 0!)
For a system to qualify as a RELATIONAL, DATABASE, MANAGEMENT system, that system must use its RELATIONAL facilities (exclusively) to MANAGE the DATABASE.

1. The information rule:
The information rule simply requires all information in the database to be represented in one and only one way, Namely by values in column positions within rows of tables.

2. The guaranteed access rule:
This rule is essentially a restatement of the fundamental requirement for primary keys. It says that every individual scalar value in the database must be logically addressable by specifying the mane of the containing table, the name of the containing column and the primary key value of the containing row.

3. Systematic treatment of null values:
The DBMS is required to support a representation of “missing information and inapplicable information” that is systematic, distinct from all regular values (for example, “distinct from zero or any other number,” in the case of numeric values),
and independent of data type. It is also implied that such representations must be manipulated by the DBMS in a systematic way.

4. Active online catalog based on the relational model:
The system is required to support an online, inline, relational catalog that is accessible to authorized users by means of their regular query language.

5. The comprehensive data sublanguage rule:
The system must support a least one relational language that
(a) has a linear syntax,
(b) can be used both interactively and within application programs,
(c) supports data definition operations (including view definitions), data manipulation operations (update as well as retrieval),
security and integrity constraints, and transaction management operations (begin, commit, and rollback).

6. The view updating rule:
All views that are theoretically updatable must be updatable by the system.

7. High-level insert, update, and delete:
The system must support set-at-a-time INSERT, UPDATE, and DELETE operators.

8. Physical data independence:
Self-explanatory.

9. Logical data independence:
Self-explanatory

10. Integrity independence:
Integrity constraints must be specified separately from application programs and stored in the catalog. It must be possible to change such constraints as and when appropriate without unnecessarily affecting existing applications.

11. Distribution independence:
Existing applications should continue to operate successfully
(a) when a distributed version of the DBMS is first introduced;
(b) when existing distributed data is redistributed around the system.

12. The nonsubversion rule:
If the system provides a low-level (record-at-a-time) interface, then that interface cannot be used to subvert the system (e.g.) bypassing a relational security or integrity constraint.

P.S. I have not mentioned anything about MySQL for in my opinion MySQL is rather an SDK (Software Development Kit) than a database.