Dontcheff

Archive for January, 2019|Monthly archive page

Few interesting facts about Oracle ADB, Redshift and Snowflake

In Autonomous, Data Warehouse, Databases, DBA on January 14, 2019 at 16:17

Building a new data warehouse in the cloud or migrating an existing one to cloud requires careful consideration and the answer to the question “Which cloud should I use?” is often “It depends”.

An interesting comparison of system properties comparing Amazon Redshift vs. Oracle vs. Snowflake can be found on db-engines.com

There are several other options too: Azure SQL Data Warehouse, Presto, Google BigQuery, etc.

An interesting benchmark paper called “Data Warehouse Benchmark: Redshift, Snowflake, Azure, Presto and BigQuery” by Fivetran is worth reading!

Another comparison called Interactive Analytics: Redshift vs Snowflake vs BigQuery is already more than 2 years old but still interesting.

Recently, things have changed. Oracle’s Autonomous Data Warehouse Cloud has been in GA for almost 1 year (since March 2018). ADW is for enterprise loads and mission critical systems arguably the best solution right now.

Viscosity compared both Oracle Autonomous and Amazon Redshift. The result? Check it here: Amazon vs Oracle: Data Warehouse Services, How do They Compare?

In short, the conclusion of the research above is:

– Oracle’s ADW was able to achieve data retrieval at the lowest latencies, and achieved the highest volume of queries per hour. In terms of serial query execution and multi-user query throughput.
– Oracle’s ADW consistently outperformed Redshift by a factor of 4x in both sets of tests.

And do not ignore the db-engines ranking! Only one of the three is in the Top 10.

What is interesting to know on top of all papers above are these 10 differences or let us call them less known technical facts (in no order of importance) between Oracle Autonomous, Amazon Redshift and Snowflake:

1. Snowflake compute usage is billed on a per-second basis, with a minimum of 60 seconds. Amazon Redshift is based on PostgreSQL 8.0.2 and is built on top of technology from the MPP data warehousing company ParAccel. Oracle Autonomous Database is based on Exadata and 18c.

2. In Oracle Autonomous Cloud, you can provision up to 128 CPUs and 128TB directly from the cloud console but you can provision more if needed.

3. Snowflake manages all aspects of how data is stored in S3 including data organization, file sizes, structure, compression, and statistics.

4. The only things needed for BYOL in Oracle Autonomous Database are Multitenant and RAC (only when using more than sixteen OCPUs). The standby option (not yet available) will require Active Data Guard as well.

5. Snowflake does not disclose the information about processing power and memory. Oracle do disclose the information via internal views but you cannot directly define the SGA or PGA size.

6. Redshift is not built as a high-concurrency database with several concurrent running queries and AWS recommends that you execute no more than 15 queries at a time. The number of concurrent user connections that can be made to a cluster is 500.

7. Oracle ADW and ATP allow you to partition both indexes and tables. In Snowflake partitioning is handled internally. Amazon Redshift does not support tablespaces, table partitioning, inheritance, and even certain constraints. Amazon Redshift Spectrum supports table partitioning using the CREATE EXTERNAL TABLE command.

8. The maximum number of tables in Amazon Redshift is 9,900 for large and xlarge cluster node types and 20,000 for 8xlarge cluster node types. The limit includes temporary tables. An Oracle database does not have a limit for the number of tables.

9. Oracle automatically applies all security updates (and online!) to ensure data is not vulnerable to known attack vectors. Additional in-database features like Virtual Private Database and Data Redaction are also available.

10. There is no operation in Snowflake for collecting database statistics. It is handled by the engine. In Oracle, database statistics collection is allowed. Both Oracle Autonomous and Amazon Redshift monitor changes to your workload and automatically update statistics in the background.

Finally, here are official URLs of all three products:

Oracle Autonomous Database
Amazon Redshift
Snowflake Database