Dontcheff

Do not go beyond this point: on the “obvious advantages” of Database Shards

In DBA, Oracle database on March 28, 2011 at 11:22

Caution! Database Sharding is like the anti-consolidation of databases. It is splitting the database into many small databases. You spend years and years on trying to unify and gather together databases and all of a sudden you are told that there is an application managed scaling technique using hundreds of independent databases. Tricky, right?

Sometimes, when planning database solutions in terms of scalability and massiveness, going beyond a certain point might be risky. This is the case when database shards may be of huge help (big website used globally). The word shard may sometimes refer to a piece of glass, a sea glass that can be found almost everywhere, for example at the beaches near San Francisco.

This is my friend Pasi Jaakkola. I took the picture in 2006 during Oracle Open World 2006. We brought however no databases, no glass shards from Frisco, not literally at least. But we both often dare “cross the point” and take the extra mile.

Here are 3 sources on database sharding:

1. Wei Hu’s excellent white paper Global Scale Web 2.0 Data Platforms Sharding with Oracle Database

2. Tom Kyte’s discussion on Shared Nothing

3. The Rise of Database Sharding on CodeFutures.

Let us have a look at number 3 and the 3 claims there about the several advantages of smaller databases:

Claim 1: Smaller databases are easier to manage
Claim 2: Smaller databases are faster
Claim 3: Database Sharding can reduce costs

Having a quick glance at the three statements above most people would say yes, true. But people who had been involved in the DBA work on data center level (meaning working with 100s of databases) might tend to disagree. Why? Here we go:

“Smaller databases are easier to manage”: if you have to break a databases into smaller databases, then you will have more monitoring configurations for Grid Control, thus the number of possible problems will increase. You will have to configure more backups and more problems on the way. Naturally. If you have to recover the data one day probably consistency of the data will be a rather important thing to consider, so you might have to recover not just one database to the timestamp needed. So what is easier to manage for a smaller database? The claim “Routine table and index optimizations can stretch to hours or days, in some cases making regular maintenance infeasible” is unclear to me. What is table optimization? Gathering statistics for the table? Online reorganization? Beats me 🙂

Next, “Smaller databases are faster”: yeah right. I have seen databases less than 1G in size having SQL statements running longer than the complete trilogy of the Godfather. I don’t even want to get into that. If a database is fast or not has nothing to do with the size. Check this claim out: “By hosting each shard database on its own server, the ratio between memory and data on disk is greatly improved, thereby reducing disk I/O.” Great! Nobody cares about the extra computers, the software licenses.. Let’s have a look at the other statement: “This results in less contention for resources, greater join performance, faster index searches, and fewer database locks.” So the smaller the database, the less locks you have? At least this is what I understand from the quotes sentence. And we all know that locks are due to clumsy SQLs and logical structure of the database, not due to the growing size of the database.

And now the most bizarre claim: “Database Sharding can reduce costs”. Right, cost! That is what all we want to hear. Byte the hook! Can but cannot. It can rather increase the cost. More hardware, more license fees, more databases to manage.

The real benefits of Database Sharding can be found in the Oracle’s white paper above. It is written by Wei Hu and it is exceptionally good! And good results with Sharding can be achieved with Oracle by the way. Forget about MySQL here. But decide for yourself after you read the white paper.

These are frosted shards of glass:

What is easier: to break the glass into these small pieces or do build the glass out of all these shards?

And the questions asked in the CodeFutures arcticle: “If Database Sharding is highly scalable, less costly, and improves performance, why hasn’t adoption of the technology been more widespread?” Probably, because not every Sharding is highly scalable, less costly, and improves performance. Really.

Conclusion: the real benefit of Database Sharding is often neglected. Main reason being the concept is wrongly comprehended. As Wei Hu says: “Sharding across many databases multiplies Oracle’s advantages a-thousand-fold”!

Advertisements
  1. Hi Julian, great post about database sharding 🙂

    Regarding the point 3 (“you could save costs with sharding”):
    – If you assume that all shards are build up of Oracle Enterprise Edition instances, you are absolutely right, not much chance to save costs
    – but if we assume that the standard edition is enough (or even standard one) for a shard but not enough for a central one , then the world looks different (especially if we take into account the licensing policy for the enterprise edition that is driving your licensing costs higher with every new core packed into each new processor in half a year base)
    – Similar if you assume that you need Oracle EE for a central DB but you could build your shards out of e.g. the typical mysql farms (1 master, 1 shadow master, N readonly replicas).

    you will probably ask: “OK, but how can you compensate the lack of HA features …” : Just one example would be distributed caching. On a well design and scaled system you would be even able to shut your database down for some minutes and the in memory cluster would run further without any interruption. Yes, of course that costs time for development, testing and management but thanks to Oracle, the potential save of license costs is tremendous (just take an example of a commodity server with 2×8 core CPUs , the list price of Oracle EE + RAC + partitioning + the standard EM packages would be around half a million $ and now assume you would need 10 of them 😉 )

    Best regards,
    Alex

  2. […] On more global terms, you can you 11g XE for Database Sharding. 11 gigabytes of user data is quite sufficient for one shard! When it grows above say 10G, you will […]

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: