Dontcheff

Archive for the ‘Data Warehouse’ Category

Few interesting facts about Oracle ADB, Redshift and Snowflake

In Autonomous, Data Warehouse, Databases, DBA on January 14, 2019 at 16:17

Building a new data warehouse in the cloud or migrating an existing one to cloud requires careful consideration and the answer to the question “Which cloud should I use?” is often “It depends”.

An interesting comparison of system properties comparing Amazon Redshift vs. Oracle vs. Snowflake can be found on db-engines.com

There are several other options too: Azure SQL Data Warehouse, Presto, Google BigQuery, etc.

An interesting benchmark paper called “Data Warehouse Benchmark: Redshift, Snowflake, Azure, Presto and BigQuery” by Fivetran is worth reading!

Another comparison called Interactive Analytics: Redshift vs Snowflake vs BigQuery is already more than 2 years old but still interesting.

Recently, things have changed. Oracle’s Autonomous Data Warehouse Cloud has been in GA for almost 1 year (since March 2018). ADW is for enterprise loads and mission critical systems arguably the best solution right now.

Viscosity compared both Oracle Autonomous and Amazon Redshift. The result? Check it here: Amazon vs Oracle: Data Warehouse Services, How do They Compare?

In short, the conclusion of the research above is:

– Oracle’s ADW was able to achieve data retrieval at the lowest latencies, and achieved the highest volume of queries per hour. In terms of serial query execution and multi-user query throughput.
– Oracle’s ADW consistently outperformed Redshift by a factor of 4x in both sets of tests.

And do not ignore the db-engines ranking! Only one of the three is in the Top 10.

What is interesting to know on top of all papers above are these 10 differences or let us call them less known technical facts (in no order of importance) between Oracle Autonomous, Amazon Redshift and Snowflake:

1. Snowflake compute usage is billed on a per-second basis, with a minimum of 60 seconds. Amazon Redshift is based on PostgreSQL 8.0.2 and is built on top of technology from the MPP data warehousing company ParAccel. Oracle Autonomous Database is based on Exadata and 18c.

2. In Oracle Autonomous Cloud, you can provision up to 128 CPUs and 128TB directly from the cloud console but you can provision more if needed.

3. Snowflake manages all aspects of how data is stored in S3 including data organization, file sizes, structure, compression, and statistics.

4. The only things needed for BYOL in Oracle Autonomous Database are Multitenant and RAC (only when using more than sixteen OCPUs). The standby option (not yet available) will require Active Data Guard as well.

5. Snowflake does not disclose the information about processing power and memory. Oracle do disclose the information via internal views but you cannot directly define the SGA or PGA size.

6. Redshift is not built as a high-concurrency database with several concurrent running queries and AWS recommends that you execute no more than 15 queries at a time. The number of concurrent user connections that can be made to a cluster is 500.

7. Oracle ADW and ATP allow you to partition both indexes and tables. In Snowflake partitioning is handled internally. Amazon Redshift does not support tablespaces, table partitioning, inheritance, and even certain constraints. Amazon Redshift Spectrum supports table partitioning using the CREATE EXTERNAL TABLE command.

8. The maximum number of tables in Amazon Redshift is 9,900 for large and xlarge cluster node types and 20,000 for 8xlarge cluster node types. The limit includes temporary tables. An Oracle database does not have a limit for the number of tables.

9. Oracle automatically applies all security updates (and online!) to ensure data is not vulnerable to known attack vectors. Additional in-database features like Virtual Private Database and Data Redaction are also available.

10. There is no operation in Snowflake for collecting database statistics. It is handled by the engine. In Oracle, database statistics collection is allowed. Both Oracle Autonomous and Amazon Redshift monitor changes to your workload and automatically update statistics in the background.

Finally, here are official URLs of all three products:

Oracle Autonomous Database
Amazon Redshift
Snowflake Database

Advertisements

Migrating Amazon Redshift to Autonomous Data Warehouse Cloud

In Autonomous, Data Warehouse, DBA, Exadata, PostgreSQL on July 4, 2018 at 18:34

“Big Data wins games but Data Warehousing wins championships” says Michael Jordan. Data Scientists create the algorithm, but as Todd Goldman says, if there is no data engineer to put it into production for use by the business, does it have any value?

If you google for Amazon Redshift vs Oracle, you will find lots of articles on how to migrate Oracle to Redshift. Is it worth it? Perhaps in some cases before Oracle Autonomous Data Warehouse Cloud existed.

Now, things look quite different. “Oracle Autonomous Data Warehouse processes data 8-14 times faster than AWS Redshift. In addition, Autonomous Data Warehouse Cloud costs 5 to 8x less than AWS Redshift. Oracle performs in an hour what Redshift does in 10 hours.” At least according to Oracle Autonomous Data Warehouse Cloud white paper. And I have nothing but great experiences with ADWC. For the past half an year or so.

But, what are the major issues and problems reported by Redshift users?

One of the most common complaints involves how Amazon Redshift handles large updates. In particular, the process of moving massive data sets across the internet requires substantial bandwidth. While Redshift is set up for high performance with large data sets, “there have been some reports of less than optimal performance,” for the largest data sets. An article by Alan R. Earls entitled Amazon Redshift review reveals quirks, frustrations claims that reviewers want more from the big data service. So:

Why to migrate from Amazon Redshift to Autonomous Data Warehouse Cloud?

1. Amazon Redshift is ranked 2nd in Cloud Data Warehouse with 14 reviews vs Oracle Exadata which is ranked 1st in Data Warehouse with 55 reviews.

The top reviewer of Amazon Redshift writes “It processes petabytes of data and supports many file formats. Restoring huge snapshots takes too long”. The top reviewer of Oracle Exadata writes “Thanks to smart scans, the amount of data transferred from storage to database nodes significantly decreases”.

2. Oracle Autonomous dominates in features and capabilities:

DB-engines shows an excellent system properties comparison of Amazon Redshift vs. Oracle.

In addition, reading through these thoughts on using Amazon Redshift as a replacement for an Oracle Data Warehouse can be worthwhile. It shows how Amazon Redshift compares with a more traditional DW approach. But Enterprises have some Redshift concerns, including:

– The difference between versions of PostgreSQL and the version Amazon uses with Redshift
– The scalability of very large data volume is limited and performance suffers
– The query interface is not modern, interface is a bit behind
– Redshift needs more flexibility to create user-defined functions
– Access to the underlying operating system and certain database functions and capabilities aren’t available
– Starting sizes may be too large for some use cases
– Redshift also resides in a single AWS availability zone

3. Amazon Redshift has several limitation: Limits in Amazon Redshift. On the other hand, you can hardly find a database feature not yet implemented by Oracle.

4. But the most important reason why to migrate to ADWC is that the Oracle Autonomous Database Cloud offers total automation based on machine learning and eliminates human labor, human error, and manual tuning.

How to migrate from Amazon Redshift to Autonomous Data Warehouse Cloud?

Use the SQL Developer Amazon Redshift Migration Assistant which is available with SQL Developer 17.4. It provides easy migration of Amazon Redshift environments on a per-schema basis.

Here are the 5 steps on how to migarte from Amazon Redshift to Autonomous Data Warehouse Cloud:

1. Connect to Amazon Redshift
2. Start the Cloud Migration Wizard
3. Review and Finish the Amazon Redshift Migration
4. Use the Generated Amazon Redshift Migration Scripts
5. Perform the Post Migration Tasks

Check out what Paul Way says about why Oracle thinks Autonomous IT can ultimately win the Cloud War.

Finally, here is what Amazon CTO Werner Vogels is saying: Our cloud offers any database you need. And I agree with him that a one size fits all database doesn’t fit anyone. But mission and business critical enterprise systems with huge requirements and resource needs deserve only the best.