Dontcheff

Bloom filters for DBAs

In Database tuning, DBA, Exadata, Oracle database, RAC on August 28, 2012 at 19:59

Lady Bird Johnson said: Where flowers bloom so does hope. I would re-phrase: Where Bloom filters bloom, so does database performance.

A Bloom filter is a probabilistic algorithm for doing existence tests in less memory than a full list of keys would require. In other words, a Bloom filter is a method for representing a set of n elements (also called keys) to support membership queries. John Rose wrote a very detailed blog article about Bloom filters. Another good one is by Christian Antognini entitled simply Bloom Filters.

These underscore parameters are all related to Bloom filters:

The Oracle database makes use of Bloom filters in the following 4 situations:

- To reduce data communication between slave processes in parallel joins: mostly in RAC
– To implement join-filter pruning: in partition pruning, the optimizer analyzes FROM and WHERE clauses in SQL statements to eliminate unneeded partitions when building the partition access list
– To support result caches: when you run a query, Oracle will first see if the results of that query have already been computed and cached by some session or user, and if so, it will retrieve the answer from the server result cache instead of gathering all of the database blocks
– To filter members in different cells in Exadata: Exadata performs joins between large tables and small lookup tables, a very common scenario for data warehouses with star schemas. This is implemented using Bloom filters as to determine whether a row is a member of the desired result set.

You can identify a bloom pruning in a plan when you see :BF0000 in the Pstart and Pstop columns of the execution plan and PART JOIN FILTER CREATE in the operations column:

Two hints can be used: px_join_filter and no_px_join_filter; and there are a couple of views for monitoring Bloom filters:

- v$sql_join_filter: information about Bloom filters; number of rows filtered out and probed by Bloom filters
– v$pq_tqstat: check reduced communication due to usage of Bloom filters

There are two undocumented Oracle functions: SYS_OP_BLOOM_FILTER and SYS_OP_BLOOM_FILTER_LIST:

In 11gR2 execution plans you may see :BF0000 replaced by KEY(AP). This is due to new functionality that performs partition pruning based on AND (= And Pruning) multiple pruning descriptors.

In 11.2.0.2 and later a SQL query on a table might not deliver any results. Unfortunately, it might not deliver any error message either. The solution is to disable Bloom filters:

alter system set “_bloom_filter_enabled” = false scope=both;

Restrictions:

- For Exadata customers, I would suggest to set a low degree of parallelism on small tables and indexes in order to take advantage of Bloom filters.
– Bloom filters are not supported in partition-wise joins. This is documented as Bug 14325392 but will be fixed in 12.1.0.
– About small tables with indexes: If Oracle goes for an index range scan of that small index, then it will be done in serial and not parallel and thus no Bloom filter will be used (just drop those small indexes).
– CURRENT_DATE does not use Bloom filters while SYSDATE does.
– Bloom filters are not used in DML statements: Bug 13801198 says that a simple two table join shows use of Bloom filter/pruning when running as a query and not when running as part of a INSERT as SELECT. The performance delta is 4:24 vs 0:24.

About these ads
  1. Good topic, Julian.

    If you readers are interested there is an archived video presentation I offered to IOUG that has some Bloom filter information. The video webcast is called Oracle Exadata Technical Deep Dive. Part I and can be downloaded here:

    http://kevinclosson.wordpress.com/written-works-and-presentations/

    To the best of my knowledge bloom filters are still 512KB. Unless joining a really small table to a really huge table the false positives really reduce the effectiveness of storage (Exadata) payload reduction.

    As you point out, Bloom filters have been useful for years in reducing message payload between intra-node parallel query slaves.

    Finally, can you please elaborate on what you mean in this phrase: “To filter members in different cells in Exadata”

  2. [...] Julian Dontcheff throws some light on not-much discussed Bloom Filters. [...]

  3. “Bloom filters for DBAs « Julian Dontcheff’s Database Blog” certainly got myself simply addicted on your webpage! I personallywill certainly be back even more frequently. Many thanks -Estela

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

Join 257 other followers

%d bloggers like this: