Dontcheff

Archive for February, 2024|Monthly archive page

Performance Tuning in the Oracle Autonomous Database: Creating Indexes for JSON Collections

In Autonomous, Cloud, Database tuning, DBA, JSON, Oracle database, Performance on February 14, 2024 at 20:22

The Oracle Database and in particular the Oracle Autonomous Database offers multiple technologies to accelerate queries and improve database performance over JSON data, including indexes, materialized views, in-memory column storage, and Exadata storage-cell pushdown.

Simple Oracle Document Access (SODA) is a NoSQL query language for storing, editing and deleting JSON documents. A SODA collection is a set of documents (in particular JSON) that is backed by an Oracle Database table or view. By default, creating a SODA document collection creates the following in the Oracle Database:

  • Persistent default collection metadata.
  • table for storing the collection, in the database schema to which your SODA client is connected.

If the init.ora parameter compatible is at least 20, then SODA uses JSON data type by default for JSON content, and the version method is UUID. If the init.ora parameter compatible is less than 20, then SODA uses BLOB textual data by default for JSON content and the data is character data. 

Collection metadata is composed of multiple components. The kind of database you use determines the collection metadata that is used by default, and which of its field values you can modify for custom metadata.

Let us consider a collection of movies within the ADMIN schema in a 19c Oracle Autonomous JSON Database (think of one JSON file per movie). After opening JSON from Database Actions, we will create an index on the MOVIES colection:

Creating an index for a JSON collection requires the name and the type. The different options are Functional, Spatial and Search.

Before creating a functional index on the runtime field (we will be running order-by queries against it), let us see what are the field needed (fields to enter):

  • For a functional type index, the fields to enter are:
    • Unique: make all indexed values unique
    • Index Nulls: use the index in order-by queries
    • Path Required: the path must select a scalar value, even a JSON null value
    • Properties: select the property that you want to index on, or easier is to just type * to display all available document properties in the collection
    • Composite Index: use more than one property
    • Advanced: change the storage properties of the indexed property
  • For search index, the options are:
    • Dataguide off-on: create JSON data guide for collection
    • Text Search off-on: index all properties in documents to support full-text search based on string equality
    • Range Search off-on: choose on to support range search when string-range search or temporal search (equality or range) is required
  • Spatial index is used to index GeoJSON geographic data. The selected property should be of GeoJSON type. See Using GeoJSON Geographic Data. For spatial index, the options are:
    • Path Required: the path must select a value, even if it is a JSON null value
    • Lax: the targeted field does not need to be present or does not have a GeoJSON geometry object as its value

The properties of the selected index appear in JSON format below the listed indexes:

We can also index a singleton scalar field using SQL, here is how:

CREATE INDEX YEAR_NDX ON movies (json_value(data, '$.year.number()' ERROR ON ERROR));

Item method numberOnly() is used in the path expression that identifies the field to index, to ensure that the field value is numeric. As I have the year field in one of the JSON files as a string, I am getting the following error if I use the method numberOnly(): ORA-01722: invalid number. Method numberOnly() is used instead of method number(), because number() allows also for conversion of non-numeric fields to numbers. Clearly as some moview are produced during the same year I cannot make the index unique: ORA-01452: cannot CREATE UNIQUE INDEX; duplicate keys found.

Creating a collection relational views of JSON documents is simple, select the columns we need, say title, year and runtime:

In SQL, we can use the view and order the movies by runtime:

Regardless of your database release you can create whatever Oracle Database indexes you need directly, using:

(1) the JSON Page of Using Oracle Database Actions, check Creating Indexes for JSON Collections

(2) Simple Oracle Document Access (SODA)

(3) SQL — see Indexes for JSON Data in Oracle Database JSON Developer’s Guide.

Using the JSON page in ADB is perhaps the easiest approach to indexing JSON data.

The static data dictionary views USER_SODA_COLLECTIONS lists the basic features of all of your SODA collections. Correspondently, you have also DBA_SODA_COLLECTIONS and ALL_SODA_COLLECTIONS. Collection metadata, expressed in JavaScript Object Notation (JSON) can be obtained from the last column, called JSON_DESCRIPTOR.

Users will typically work with JSON collections using native language drivers, for example, SODA for Java or SODA for Python. SODA native language drivers generally provide more throughput (operations per second) than the REST driver (SODA for REST). It is recommended to configure the SODA drivers as follows:

  • Enable SODA Metadata Cache: The SODA driver needs to know the metadata of each JSON collection (the column names, types, etc.). By enabling the metadata cache, roundtrips to the database can be saved, improving latency and throughput.
  • Enable Statement Cache: Statement caching improves performance by caching executable statements that are used repeatedly, such as in a loop or in a method that is called repeatedly. For Java, the statement cache is enabled using JDBC.
  • For load-balanced systems: turn off DNS caching: Load balancing allows to distribute SODA operations across different nodes. If DNS caching is turned on, then all connections are likely to use the same node and nullifying the load balancing. For Java, the following system property should be set: inet.addr.ttl=0

The database performance tuning techniques also apply to SODA: for example, SODA collections can be partitioned or sharded, and queries can be accelerated using indexes and/or materialized views.

How to monitor all that? You can turn on performance monitoring of the SQL operations that underlie a SODA read or write operation, by adding a SQL hint to the SODA operation.

Use only hint MONITOR (turn on monitoring) or NO_MONITOR (turn off monitoring). You can use this to pass any SQL hints, but MONITOR and NO_MONITOR are the useful ones for SODA, and an inappropriate hint can cause the optimizer to produce a suboptimal query plan.

Note in addition that when using SODA with Autonomous Database the following restrictions apply:

  • Automatic indexing is not supported for SQL and PL/SQL code that uses the SQL/JSON function json_exists. See SQL/JSON Condition JSON_EXISTS for more information but note that all the collection APIs use JSON_EXISTS for filtering so … the benefits might be negligible.
  • Automatic indexing is not supported for SODA query-by-example (QBE)

Final note: since Auto Indexing is disabled by default in the Autonomous Database, you can enable it by running: EXEC DBMS_AUTO_INDEX.CONFIGURE('AUTO_INDEX_MODE','IMPLEMENT');