
Archive for the ‘Data’ Category

On Kafka, JSON, PL/SQL and Advanced Queuing in Oracle Database 20c

In Autonomous, Cloud, Data, DBA, New features, PL/SQL on June 1, 2020 at 05:50

Oracle Corp. starts today (June 1st, 2020) a new fiscal year as the Oracle Database is slowly moving towards version 20c (still in preview mode only) which comes with almost 500 init.ora parameters and 5326 hidden/underscore parameters. There is a new one for 20c called kafka_config_file. But let us first take a step back and see its connection with Advanced Queuing.


Advanced Queuing is available in all editions of Oracle database, including XE. Since Oracle Advanced Queuing is implemented in database tables, all the operational benefits of high availability, scalability, and reliability are applicable to queue data.

Advanced Queuing can be accessed through several interfaces: PL/SQL, Visual Basic, Java, Java Message Service, JDBC, ODP.NET, OCI (do not read Oracle Cloud Infrastructure – think of the C language), etc.

Using PL/SQL is probably the most common way to access Oracle Advanced Queuing: the package DBMS_AQADM covers administration (creating and managing queues), while DBMS_AQ covers the operational side (enqueue and dequeue).
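
To make this concrete, here is a minimal, hypothetical sketch (the queue, queue table and payload are made up) of creating and starting a queue with DBMS_AQADM and enqueuing a RAW message with DBMS_AQ:

-- Administration: create a queue table, a queue, and start the queue (DBMS_AQADM)
BEGIN
  DBMS_AQADM.CREATE_QUEUE_TABLE(
    queue_table        => 'demo_qt',
    queue_payload_type => 'RAW');
  DBMS_AQADM.CREATE_QUEUE(
    queue_name  => 'demo_q',
    queue_table => 'demo_qt');
  DBMS_AQADM.START_QUEUE(queue_name => 'demo_q');
END;
/

-- Operations: enqueue a single message (DBMS_AQ)
DECLARE
  l_enq_opts  DBMS_AQ.ENQUEUE_OPTIONS_T;
  l_msg_props DBMS_AQ.MESSAGE_PROPERTIES_T;
  l_msgid     RAW(16);
BEGIN
  DBMS_AQ.ENQUEUE(
    queue_name         => 'demo_q',
    enqueue_options    => l_enq_opts,
    message_properties => l_msg_props,
    payload            => UTL_RAW.CAST_TO_RAW('hello from AQ'),
    msgid              => l_msgid);
  COMMIT;
END;
/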

Reading all that, one might think of Kafka. Kafka is a distributed, partitioned, replicated commit log service providing the functionality of a messaging system, but with a unique design. The aim here is not to position Kafka and Oracle AQ against each other, but to show what is new in Oracle Database 20c that brings Kafka and Oracle together and what is important from a DBA point of view.

Todd Sharp explained recently how to use Kafka Connect with Oracle Streaming Service and Autonomous Database. Let us now see what is new in Oracle Database 20c:

Oracle Database 20c introduces Transactional Event Queues (TEQ), which are partitioned message queues that combine the best of messaging, streaming, direct messages, and publish/subscribe. TEQ operates at scale on the Oracle database: it provides transactional event streaming and runs in the database at a scale of 10s to 100s of billions of messages per day on 2-node to 8-node Oracle RAC databases, both on-premises and in the cloud. TEQ has Kafka client compatibility, which means that Kafka producers and consumers can use TEQ in the Oracle database instead of a Kafka broker. Check Advanced Queuing in 20c for more details.

1. Advanced Queuing: Kafka Java Client for Transactional Event Queues

Kafka Java Client for Transactional Event Queues (TEQ) enables Kafka application compatibility with the Oracle database, thus providing easy migration of Kafka applications to TEQ.

You do not have to manage a separate Kafka infrastructure, and this new feature simplifies the event-driven application architecture with an Oracle converged database that now also includes event data. Starting from Oracle Database 20c, Kafka Java APIs can connect to the Oracle database server and use Transactional Event Queues (TEQ) as a messaging platform. Developers can migrate an existing Java application that uses Kafka to the Oracle database. A client-side library allows Kafka applications to connect to the Oracle database instead of a Kafka cluster and to use the TEQ messaging platform transparently.

Two levels of configuration are required to migrate Kafka application to TEQ messaging platform:

– Database level configuration
– Application level configuration

A Kafka application needs to set certain properties that allow the OKafka library to locate the Oracle Database. This is analogous to how a Kafka application provides ZooKeeper information. These connection properties can be set in the following two ways:

– using a database user and password provided in plain text
– using a JDBC wallet

The following are the prerequisites for configuring and running Kafka Java client for TEQ in an Oracle Database. Create a database user. Grant the following user privileges:

grant connect, resource to user;
grant execute on dbms_aq to user;
grant execute on dbms_aqadm to user;
grant execute on dbms_aqin to user;
grant execute on dbms_aqjms to user;
grant select_catalog_role to user;

Next, set the correct database init.ora parameter to use TEQ:

streams_pool_size=512M

Set the local listener too:

LOCAL_LISTENER= (ADDRESS=(PROTOCOL=TCP)(HOST= )(PORT=))
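
For example (a minimal sketch; the host and port below are placeholders), both parameters can be set dynamically with ALTER SYSTEM:

ALTER SYSTEM SET streams_pool_size=512M SCOPE=BOTH;
ALTER SYSTEM SET local_listener='(ADDRESS=(PROTOCOL=TCP)(HOST=dbhost.example.com)(PORT=1521))' SCOPE=BOTH;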

2. Advanced Queuing Support for JSON Data Type and PL/SQL

Oracle Database Advanced Queuing now supports the JSON data type.

Many client applications and microservices that use Advanced Queuing for messaging get better performance if they use the JSON data type to handle JavaScript Object Notation (JSON) messages.

To support this, PUBLIC is granted the EXECUTE privilege on all these types:

AQ$_AGENT, AQ$_AGENT_LIST_T, AQ$_DESCRIPTOR, AQ$_NTFN_DESCRIPTOR, AQ$_NTFN_MSGID_ARRAY, AQ$_POST_INFO, AQ$_POST_INFO_LIST, AQ$_PURGE_OPTIONS_T, AQ$_RECIPIENT_LIST_T,
AQ$_REG_INFO, AQ$_REG_INFO_LIST, AQ$_SUBSCRIBER_LIST_T, DEQUEUE_OPTIONS_T, ENQUEUE_OPTIONS_T, QUEUE_PROPS_T, SEEK_INPUT_T, SEEK_OUTPUT_T, SYS.MSG_PROP_T, MESSAGE_PROPERTIES_T, MESSAGE_PROPERTIES_ARRAY_T, MSGID_ARRAY_T
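
As a hedged illustration (the queue name is made up, and the code assumes the 20c JSON payload support in DBMS_AQADM/DBMS_AQ and the PL/SQL JSON data type behave as documented), enqueuing a JSON message could look like this:

-- Assumes a queue JSON_MSG_Q created beforehand with DBMS_AQADM.CREATE_QUEUE_TABLE
-- using queue_payload_type => 'JSON'
DECLARE
  l_enq_opts  DBMS_AQ.ENQUEUE_OPTIONS_T;
  l_msg_props DBMS_AQ.MESSAGE_PROPERTIES_T;
  l_msgid     RAW(16);
  l_payload   JSON;
BEGIN
  SELECT JSON('{"event":"order_created","order_id":42}')
    INTO l_payload
    FROM dual;
  DBMS_AQ.ENQUEUE(
    queue_name         => 'json_msg_q',
    enqueue_options    => l_enq_opts,
    message_properties => l_msg_props,
    payload            => l_payload,
    msgid              => l_msgid);
  COMMIT;
END;
/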

Regarding the new features of PL/SQL in 20c, check first the interesting example by Steven Feuerstein on the extension of loop iterators.

Check the New Features in Release 20c for Oracle Database PL/SQL for more details on PL/SQL extended iterators, PL/SQL qualified expressions enhancements, SQL macros, the new JSON data type and the new pragma SUPPRESSES_WARNING_6009.

3. Advanced Queuing: PL/SQL Enqueue and Dequeue Support

The following features are new in this release:

– Kafka Java Client for Transactional Event Queues (TEQ) which enables Kafka application compatibility with Oracle Database and thus providing easy migration of Kafka applications to TEQ
– PL/SQL Enqueue and Dequeue Support for JMS Payload and non-JMS Payload in Transactional Event Queues
– Transactional Event Queues for Performance and Scalability
– Simplified Metadata and Schema in Transactional Event Queues
– Support for Message Retention and Seekable Subscribers
– Advanced Queuing Support for JSON Data Type

For all the details, check the Changes in Oracle Database Advanced Queuing Release 20c.

In terms of performance and scalability, Oracle Transactional Event Queues have their Queue tables partitioned in 20c into multiple Event Streams which are distributed across multiple RAC nodes for high throughput messaging and streaming of events.

Remember that in 10.1, AQ was integrated into Oracle Streams and thus Oracle AQ was called “Oracle Streams AQ”. But in 12.1, Oracle Streams got deprecated and AQ was again named just “Oracle AQ”.

And finally: here is the 546-page-long Transactional Event Queues and Advanced Queuing User’s Guide along with a few good additional articles:

Oracle + Kafka = Better Architecture by Jonathan Wallace
Streaming data from Oracle into Kafka by Robin Moffatt
Extending Oracle Streaming with Kafka Compatibility by Somnath Lahiri

Automatic Zone Maps in the Oracle Database

In Data, Database tuning, Databases, DBA, New features, Oracle database on May 18, 2020 at 06:18

A zone is a set of contiguous data blocks on disk.

A zone map is an index-like structure built on a table and stores information about the zones of that table.

There are 2 major differences between indexes and zone maps:

– A zone map stores information per zone instead of per row which makes it much more compact than an index
– A zone map is not actively managed the way an index is kept in sync with the DML on the table

Zone maps are closer as a concept to Exadata’s storage indexes than to B-tree indexes.

Before going into how Automatic Zone Maps work in Oracle 20c, let me explain the concept with an example. Consider a small table containing basic information about some relational databases from db-engines.com (rank, score, initial and last release, cloud based):

The RDBMS_BRANDS segment has 6 data blocks with 2 rows per block:

Let us now create the zonemap on the RDBMS_BRANDS table (on 3 columns only):

 
SQL> CREATE MATERIALIZED ZONEMAP rdbms_zmap ON 
rdbms_brands (db_engines_rank, db_engines_score, initial_release); 

Materialized zonemap RDBMS_ZMAP created.

We have now 3 zones and each zone contains two blocks and stores the minimum and maximum of db_engines_rank, db_engines_score and initial_release:

Next, let us run a query returning all RDBMS brands with ranking score more than 1000:
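
The query itself is straightforward (checking the cursor afterwards with DBMS_XPLAN shows the pruning):

SELECT * FROM rdbms_brands WHERE db_engines_score > 1000;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(format => 'BASIC +PREDICATE'));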

Looking at the execution plan below, we see that Oracle is scanning only Zone 1, as the maximum score in all other zones is smaller than 1000:

That is how zone maps work … but what is new in Oracle 20c?

We can now enable automatic creation and maintenance of basic zone maps for both partitioned and non-partitioned tables. But for now, the creation is not available for join zone maps, IOTs, external tables or temporary tables!

In 20c, you can use the new package DBMS_AUTO_ZONEMAP to enable Automatic Zone Maps in the database. Automatic zone map creation is turned off by default.

These four values are allowed for the parameter AUTO_ZONEMAP_MODE:

ON: Turns the auto zone map feature on completely, for both foreground and background zone map creation and maintenance
OFF: Turns the auto zone map feature off completely, for both foreground and background zone map creation and maintenance
FOREGROUND: Turns it on only for foreground zone map creation and maintenance
BACKGROUND: Turns it on only for background zone map creation and maintenance
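
For example, turning the feature on completely is a one-liner (a sketch based on the CONFIGURE procedure of DBMS_AUTO_ZONEMAP):

EXEC DBMS_AUTO_ZONEMAP.CONFIGURE('AUTO_ZONEMAP_MODE', 'ON');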

You may use the ACTIVITY_REPORT function to view auto zone map activity for a given time window. Note that the background job that performs automatic zone map processing starts once per hour and each run may last up to three hours.

 
SET LONG 100000
SELECT dbms_auto_zonemap.activity_report() report FROM dual;

These two zone map related views show the most important information DBAs need:

DBA_ZONEMAPS displays all zone maps in the database
DBA_ZONEMAP_MEASURES displays the measures for all zone maps in the database

On a final note: Automatic Zone Maps are for now available only on Exadata and require the Oracle Partitioning option.

Memoptimized Rowstore Fast Ingest in Oracle Database 19c

In Data, Database tuning, Databases, DBA, Oracle database, Oracle internals on November 25, 2019 at 15:53

“A filmmaker should never assume he’s superior to his subject. I often find that even the simplest topic remains an enigma. The best film portraits not only evoke that enigma but ingest it in a process that renders what’s invisible visible.” – Damian Pettigrew, a documentary SME

The same statement above holds for database SMEs using the new fast ingest process, because their data might at some point be neither visible nor invisible. From a DBA point of view, I mean…

Memoptimized Rowstore fast ingest is a new feature of Oracle 19c and is all about high-performance data streaming. Here is an example from Oracle which is ultra fast: arguably 25 million inserts per second on a two-socket server:

Memoptimized Rowstore was introduced in Oracle 18c and here are a few good blog posts on the subject:

Oracle Database 18c: Introduction to Memoptimized RowStore (Memory of a Free Festival) by Richard Foote
MemOptimized RowStore in Oracle Database 18c by Philipp Salvisberg

The fast ingest functionality of Memoptimized Rowstore enables fast data inserts into an Oracle Database from applications that ingest small but high volume transactions with almost no transactional overhead. The inserted data is temporarily buffered in the large pool before writing it to disk in bulk in a deferred, asynchronous manner.

Memoptimized Rowstore provides the following two functionalities:

Fast ingest:
– Fast ingest optimizes the processing of high-frequency, single-row data inserts into a database
– Fast ingest uses the large pool for buffering the inserts before writing them to disk, so as to improve data insert performance

Fast lookup:
– Fast lookup enables fast retrieval of data for high-frequency queries
– Fast lookup uses a separate memory area in the SGA called the memoptimize pool for buffering the data queried from tables
– For using fast lookup, you must allocate an appropriate amount of memory to the memoptimize pool using the MEMOPTIMIZE_POOL_SIZE initialization parameter

The memoptimize pool contains two parts: (1) the memoptimize buffer area which occupies 75% of the memoptimize pool and (2) a hash index that is a non-persistent segment data structure having several units containing a number of hash buckets. The hash index occupies 25% of the memoptimize pool.

Memoptimize buffer area

There are two steps for using fast ingest for inserting data:

1. Enable a table for fast ingest: specify the MEMOPTIMIZE FOR WRITE clause in the CREATE TABLE or ALTER TABLE statement: Example on how to enable a table for Fast Ingest

2. Specify a hint for using fast ingest for an insert operation by specifying the MEMOPTIMIZE_WRITE hint in the INSERT statement: Example on specifying a hint for using Fast Ingest for data inserts
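
A minimal sketch combining both steps (the table name and columns are made up):

-- Step 1: enable the table for fast ingest
CREATE TABLE sensor_readings (
  sensor_id NUMBER,
  reading   NUMBER,
  read_time TIMESTAMP
) MEMOPTIMIZE FOR WRITE;

-- Step 2: route the insert through the fast ingest path with the hint
INSERT /*+ MEMOPTIMIZE_WRITE */ INTO sensor_readings
VALUES (1, 36.6, SYSTIMESTAMP);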

Note that a table can be configured for using both fast ingest and fast lookup but these 4 points should be always considered:

– Because fast ingest buffers data in the large pool, there is a possibility of data loss in the event of a system failure. To avoid data loss, a client must keep a local copy of the data after performing inserts, so that it can replay the inserts in the event of a system failure before the data is written to disk. A client can use the DBMS_MEMOPTIMIZE package subprograms to track the durability of the inserts. After inserts are written to disk, a client can destroy its local copy of the inserted data.

Use the GET_APPLY_HWM_SEQID function to find the low high-water mark of sequence numbers of the records that have been successfully written to disk across all the sessions:

select DBMS_MEMOPTIMIZE.GET_APPLY_HWM_SEQID from dual;

Equivalently, use the GET_WRITE_HWM_SEQID function to return the high-water mark sequence number of the record that has been written to the large pool for the current session.

select DBMS_MEMOPTIMIZE.GET_WRITE_HWM_SEQID from dual;

– Queries do not read data from the large pool, hence data inserted using fast ingest cannot be queried until it is written to disk.

If you need to flush all the fast ingest data from the large pool to disk for the current session, just run:

exec DBMS_MEMOPTIMIZE.WRITE_END;

– Parent-child transactions must be synchronized to avoid errors. For example, foreign key inserts and updates of rows inserted into the large pool can return errors, if the parent data is not yet written to disk.

– Index operations are supported by fast ingest similarly to regular inserts. However, for fast ingest, the database performs index operations while writing data to disk, and not while writing data into the large pool.

You can disable a table for fast ingest by specifying the NO MEMOPTIMIZE FOR WRITE clause in the ALTER TABLE statement. You can view the fast ingest data in the large pool using the V$MEMOPTIMIZE_WRITE_AREA view. You can also view and control the fast ingest data in the large pool using the subprograms of the packages DBMS_MEMOPTIMIZE and DBMS_MEMOPTIMIZE_ADMIN.

The DBMS_MEMOPTIMIZE_ADMIN package has only one procedure WRITES_FLUSH which does not accept any input or output parameters.

exec DBMS_MEMOPTIMIZE_ADMIN.WRITES_FLUSH;

Here is one more example from Oracle:

As you see, you can lose data using this new technique. Use it with caution unless you can afford that or re-insert the data.

If you need to remove or populate fast lookup data for a certain table from the memoptimize pool, use DBMS_MEMOPTIMIZE.DROP_OBJECT and DBMS_MEMOPTIMIZE.POPULATE respectively.

Check MemOptimized RowStore in Oracle Database 19c by Philipp Salvisberg for a good performance comparison between PL/SQL and Java Thin.

The Power of Autonomous Database Security

In Autonomous, Cloud, Data, DBA, Security and auditing on September 9, 2019 at 13:24

“The most secure database is the one having no users connected.” You can quote me on that, I have learned it the hard way – from experience.

Database security in short means protecting the data. Markus Strauss’ article on Traditional Database Security reveals exactly the same.

Let us look into 3 recent examples of where data was not very well protected (to put it mildly), covered by CNN and CNBC:

An entire nation just got hacked: records of more than 5 million Bulgarians got stolen by hackers from the country’s tax revenue office.

A hacker gained access to 100 million Capital One credit card applications and accounts: in one of the biggest data breaches ever, a hacker gained access to more than 100 million Capital One customers’ accounts and credit card applications earlier this year.

Marriott says its Starwood database was hacked for approximately 500 million guests: “The Marriott just revealing a massive data breach involving guest reservation database at its Starwood brand. Marriott says it was unauthorized access since 2014. This includes up to 500 million guests… For approximately 327 million of these guests, the information that was exposed includes a combination of name, mailing address, phone number, email address, you ready for this – passport number… Starwood preferred guest account information, date of birth, gender, arrival and departure information… including reservation dates and communication preference. As you know, when you go to a hotel, especially internationally, they take your passport. Often times, they take a copy of your passport.”

So, granted traditional database security does not protect data well, how about looking into something new, innovative and at the same time something which has been developed and improved for more than 40 years? The Oracle Autonomous Database might be the answer (arguably is the answer). Tom Haunert’s interview with Vipin Samar (SVP of Database Security) gives an excellent overview of what Autonomous Database Security is all about.

Here is a list of 10 security benefits of Oracle’s Autonomous Database (benefits over any other database for all it matters):

1. There is no DBA access, no root access, no operating system access… You can still create users, roles, etc. just as before, but certain commands are blacklisted.
2. There are no customer-managed keys: Oracle manages the keys.
3. Oracle automatically applies all security updates/patches to ensure data is not vulnerable to known attack vectors.
4. All data is encrypted using transparent data encryption.
5. Database security features such as Virtual Private Database and Data Redaction are still available.
6. Network connections from clients to the Autonomous Database are also encrypted using the client credentials wallet.
7. Data is encrypted everywhere: SQL*Net traffic, data in tablespaces and data in backups.
8. It is now possible to specify an access control list that blocks all IP addresses that are not in the list from accessing the database.
9. Oracle has been engaging with external assessment entities and independent auditors to meet a broad set of international and industry-specific compliance standards for service deployments in Oracle Cloud such as ISO 27001, SOC1, SOC2, PCI DSS, HIPAA/HITECH, and FedRAMP.
10. All operations are being audited.

The first one above is a rather controversial point of debate in the DBA community. In order to ensure the security and the performance of the Autonomous Database, some SQL commands are restricted: ADMINISTER KEY MANAGEMENT, ALTER PROFILE, ALTER TABLESPACE, CREATE DATABASE LINK, CREATE PROFILE, CREATE TABLESPACE, DROP TABLESPACE. For DB links, you should use DBMS_CLOUD_ADMIN.CREATE_DATABASE_LINK to create database links in ADB.

Several DBA statements are restricted: ALTER PLUGGABLE DATABASE, ALTER DATABASE, ALTER SYSTEM, ALTER SESSION, ALTER USER, ALTER TABLE, CREATE TABLE and CREATE USER. To ensure the security and the performance of Autonomous Database, some Oracle XML DB features are also restricted. Same holds for Oracle Text, Oracle Spatial and Graph and APEX.

Oracle ADB is a database which is both Autonomous and Secure and as Mike Faden says: from self-securing database cloud services to the new cloud perimeter, Oracle technology protects your most valuable investment—your data.

And here are the 4 Areas of Self-Securing of Autonomous Database:

– Self-securing starts with the security of the Oracle Cloud infrastructure and database service. Security patches are automatically applied every quarter or as needed, narrowing the window of vulnerability. Patching includes the full stack: firmware, operating system [OS], clusterware, and database. There are no steps required from the customer side.
– Oracle encrypts customer data everywhere: in motion, at rest, and in backups. The encryption keys are managed automatically, without requiring any customer intervention. And encryption cannot be turned off.
– Administrator activity on Oracle Autonomous Data Warehouse Cloud is logged centrally and monitored for any abnormal activities. Oracle has enabled database auditing using predefined policies so that customers can view logs for any abnormal access in UNIFIED_AUDIT_TRAIL (see the query sketch after this list).
– Built upon Oracle Database Vault, unique to Oracle Cloud, operations personnel have the privileges to perform all administrative tasks without any ability to ever see any customer data.
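
A simple sketch of such a check against the unified audit trail (the one-day filter is arbitrary) could be:

SELECT event_timestamp, dbusername, action_name, object_name, return_code
  FROM unified_audit_trail
 WHERE event_timestamp > SYSTIMESTAMP - INTERVAL '1' DAY
 ORDER BY event_timestamp DESC;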

And finally something about the passwords in the Oracle Autonomous Database. They still have to be carefully chosen. Because as people say, “passwords are like underwear: make them personal, make them exotic, and change them on a regular basis.”

Artificial stupidity as a DBA limitation of artificial intelligence

In Data, Database tuning, Databases, DBA on December 6, 2017 at 07:47

“Artificial intelligence is no match for natural stupidity” ― Albert Einstein

What about introducing Artificial Intelligence into the database to an extent it tunes itself into all possible dimensions?

You have probably either seen the question above or have already asked yourself if that was at all possible. On Ask Tom, John from Guildford wrote the following:

As for Artificial Intelligence, well Artificial Stupidity is more likely to be true. Humanity is not privy to the algorithm for intelligence. Anyone who’s had the pleasure of dealing with machine generated code knows that software is no more capable of writing a cohesive system than it is of becoming self-aware.

Provided you’re not trying to be a cheap alternative to an automaton you just need to think. That one function alone differentiates us from computers, so do more of it. The most sublime software on the planet has an IQ of zero, so outdoing it shouldn’t be all that hard.

Stephen Hawking thinks computers may surpass human intelligence and take over the world. Fear artificial stupidity, not artificial intelligence!

Einstein is credited with saying (but it was probably Alexandre Dumas or Elbert Hubbard who deserve the recognition): “The difference between genius and stupidity is that genius has its limits.”

Explore artificial stupidity (AS) and/or read Charles Wheelan’s book Naked Statistics to understand this kind of AI danger. By the way, according to Woody Allen, 94.5% of all statistics are made up!

So what are the limitations of AI? Jay Liebowitz argues that “if intelligence and stupidity naturally exist, and if AI is said to exist, then is there something that might be called “artificial stupidity?” According to him three of these limitations are:

  • Ability to possess and use common sense
  • Development of deep reasoning systems
  • Ability to easily acquire and update knowledge

But does artificial intelligence use a database in order to be an artificial intelligence? A few very interesting answers to that question are given by Douglas Green, Jordan Miller and Ramon Morales; here is a summary:

    Although AI could be built without a database, it would probably be more powerful if a database were added. AI and databases are currently not very well integrated. The database is just a standard tool that the AI uses. However, as AI becomes more advanced, it may become more a part of the database itself.

    I don’t believe you can have an effective Artificial Intelligence without a database or memory structure of some kind.

    While it is theoretically possible to have an artificial intelligence without using a database, it makes things a LOT easier if you can store what the AI knows somewhere convenient.

    As Demystifying Artificial Intelligence explains, AI has been embedded into some of the most fundamental aspects of data management, making those critical data-driven processes more celeritous and manageable.

    Amazon Mechanical Turk is worth looking into and Oracle are also ready for business with AI.

    Matt Johnson, a Navy pilot turned AI researcher, said at a conference this summer that one of the places we are not making a lot of advances is in that teaming, in that interaction (of humans and AI) – Artificial Stupidity: When Artificial Intelligence + Human = Disaster

    Bottom line: if AI uses a database, then the intelligent database should be at least autonomous and have most tasks automated but not relying on artificial stupidity as a DBA limitation of artificial intelligence. Whatever it means… I do not want to curb your enthusiasm but we need to first fill in the skills gap: we need data engineers who understand databases and data warehouses, infrastructure and tools that span data cleaning, ingestion, security, predictions. And in this aspect Cloud is critical and a big differentiator.

    P.S. Is Artificial Intelligence Progress a Bubble? was published 4 days after this blog post.

    Blockchain for DBAs

    In Data, Databases, DBA on October 30, 2017 at 09:25

    “Instead of putting the taxi driver out of a job, blockchain puts Uber out of a job and lets the taxi driver work with the customer directly.” – Vitalik Buterin

    A blockchain database consists of two kinds of records: transactions and blocks. Blocks contain the lists of the transactions that are hashed and encoded into a hash (Merkle) tree. The linked blocks form a chain as every block holds the hash pointer to the previous block.

    The blockchain can be stored in a flat file or in a database. For example, the Bitcoin core client stores the blockchain metadata using LevelDB (based on Google’s Bigtable database system).

    The diagram above can be used to create the schema in PostgreSQL. “As far as what DBMS you should put it in”, says Ali Razeghi, “that’s up to your use case. If you want to analyze the transactions/wallet IDs to see some patterns or do BI work I would recommend a relational DB. If you want to setup a live ingest with multiple cryptocoins I would recommend something that doesn’t need the transaction log so a MongoDB solution would be good.”

    If you want to set up a MySQL database: here are 8 easy steps.

    But what is the structure of the block, what does it look like?

    The block has 4 fields:

    1. Block Size: The size of the block in bytes
    2. Block Header: Six fields in the block header
    3. Transaction Counter: How many transactions follow
    4. Transactions: The transactions recorded in this block

    The block header has 6 fields:

    1. Version: A version number to track software/protocol upgrades
    2. Previous Block Hash: A reference to the hash of the previous (parent) block in the chain
    3. Merkle Root: A hash of the root of the merkle tree of this block’s transactions
    4. Timestamp: The approximate creation time of this block (seconds from Unix Epoch)
    5. Difficulty Target: The proof-of-work algorithm difficulty target for this block
    6. Nonce: A counter used for the proof-of-work algorithm
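
    Purely as an illustration (this is not any particular product's schema), the two record kinds and the header fields above map naturally onto two relational tables:

    CREATE TABLE blocks (
      block_hash        VARCHAR2(64) PRIMARY KEY,     -- hash of this block's header
      block_size        NUMBER NOT NULL,              -- size of the block in bytes
      version           NUMBER NOT NULL,              -- protocol version
      prev_block_hash   VARCHAR2(64) REFERENCES blocks(block_hash),
      merkle_root       VARCHAR2(64) NOT NULL,
      block_timestamp   NUMBER NOT NULL,              -- seconds from Unix Epoch
      difficulty_target NUMBER NOT NULL,
      nonce             NUMBER NOT NULL,
      tx_count          NUMBER NOT NULL               -- transaction counter
    );

    CREATE TABLE block_transactions (
      tx_hash    VARCHAR2(64) PRIMARY KEY,
      block_hash VARCHAR2(64) NOT NULL REFERENCES blocks(block_hash),
      tx_data    BLOB                                 -- the raw transaction itself
    );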

    More details, like for example details on block header hash and block height, can be found here.

    But how about blockchain vs. relational database: Which is right for your application? As you can see, because the term “blockchain” is not clearly defined, you could argue that almost any IT project could be described as using a blockchain.

    It is worth reading Guy Harrison’s article Sealing MongoDB documents on the blockchain. Here is a nice quote: “As a database administrator in the early 1990s, I remember the shock I felt when I realized that the contents of the database files were plain text; I’d just assumed they were encrypted and could only be modified by the database engine acting on behalf of a validated user.”

    The Blockchain technology is a very special kind of a distributed database. Sebastien Meunier’s post concludes that ironically, there is no consensus on the definition of what blockchain technology is.

    I particularly like his last question: Is a private blockchain without token really more efficient than a centralized system? And I would add: private blockchain, really?

    But once more, what is blockchain? Rockford Lhotka gives a very good DBA-friendly definition/characteristics of blockchain:

    1. A linked list where each node contains data
    2. Immutable:
    – Each new node is cryptographically linked to the previous node
    – The list and the data in each node are therefore immutable; tampering breaks the cryptography
    3. Append-only
    – New nodes can be added to the list, though existing nodes can’t be altered
    4. Persistent
    – Hence it is a data store – the list and nodes of data are persisted
    5. Distributed
    – Copies of the list exist on many physical devices/servers
    – Failure of 1+ physical devices has no impact on the integrity of the data
    – The physical devices form a type of networked cluster and work together
    – New nodes are only appended to the list if some quorum of physical devices agree with the cryptography and validity of the node via consistent algorithms running on all devices.

    Kevin Ford’s reply is a good one to conclude with: “Based on this description (above) it really sounds like your (Rockford Lhotka’s) earlier comparison to the hype around XML is spot on. It sounds like in and of itself it isn’t particularly anything except a low level technology until you structure it to meet a particular problem.”

    The nature of blockchain technology makes it difficult to work with high transactional volumes.

    But DBAs can have a look at (1) BigchainDB, a database with several blockchain characteristics added: high-transaction, decentralized database, immutability & native support for assets and (2) at Chainfrog if interested in connecting legacy databases together. As far as I know, they support as of now at least MySQL and SQL Server.

    Twelve new features for Cyber Security DBAs

    In Cloud, Data, DBA, Security and auditing on June 2, 2017 at 08:32

    In the early years of Oracle, Larry Ellison was asked if clients ever ask for their money back. “Nobody’s asked for their money back yet – he replied – a few have asked for their data back though!”

    A relatively new Wells Fargo Insurance Cyber Security study shows that companies are more concerned with private data loss than with hackers:

    Thus, one of the main roles of the cyber security DBA is to protect and secure the data.

    Here is what the latest Oracle release 12cR2 is offering us:

    1. A Fully Encrypted Database

    To encrypt an entire database, you must encrypt all the tablespaces within this database, including the Oracle-supplied SYSTEM, SYSAUX, UNDO, and TEMP tablespaces (which is now possible in 12.2). For a temporary tablespace, drop it and then recreate it as encrypted – do not specify an algorithm. Oracle recommends that you encrypt the Oracle-supplied tablespaces by using the default tablespace encryption algorithm, AES128. Here is how you do it:

    ALTER TABLESPACE system ENCRYPTION ONLINE ENCRYPT 
    FILE_NAME_CONVERT=('system01.dbf','system01_enc.dbf'); 
    

    2. TDE Tablespace Live Conversion

    You can now encrypt, decrypt, and rekey existing tablespaces with Transparent Data Encryption (TDE) tablespace live conversion. The feature performs initial cryptographic migration for TDE tablespace encryption on the tablespace data in the background so that the tablespace can continue servicing SQL and DML statements like insert, delete, select, merge, and so on. Ensure that you have enough auxiliary space to complete the encryption and run (for example):

    ALTER TABLESPACE users ENCRYPTION ONLINE USING 'AES192' ENCRYPT 
    FILE_NAME_CONVERT = ('users.dbf', 'users_enc.dbf'); 
    

    3. Support for ARIA, SEED, and GOST algorithms

    By default, Transparent Data Encryption (TDE) Column encryption uses the Advanced Encryption Standard with a 192-bit length cipher key (AES192), and tablespace and database encryption use the 128–bit length cipher key (AES128). 12.2 provides advanced security Transparent Data Encryption (TDE) support for these encryption algorithms:

    – SEED (Korea Information Security Agency (KISA)) for South Korea
    – ARIA (Academia, Research Institute, and Agency) for South Korea
    – GOST (GOsudarstvennyy STandart) for Russia

    ALTER TABLE clients REKEY USING 'GOST256'; 
    

    4. TDE Tablespace Offline Conversion

    12.2 introduces new SQL commands to encrypt tablespace files in place with no storage overhead. You can do this on multiple instances across multiple cores. Using this feature requires downtime, because you must take the tablespace temporarily offline. With Data Guard configurations, you can either encrypt the physical standby first and switchover, or encrypt the primary database, one tablespace at a time. This feature provides fast offline conversion of existing clear data to TDE encrypted tablespaces. Use the following syntax:

    ALTER TABLESPACE users ENCRYPTION OFFLINE ENCRYPT; 
    

    5. Setting Future Tablespaces to be Encrypted

    ALTER SYSTEM SET ENCRYPT_NEW_TABLESPACES = CLOUD_ONLY; 
    

    CLOUD_ONLY transparently encrypts the tablespace in the Cloud using the AES128 algorithm if you do not specify the ENCRYPTION clause of the CREATE TABLESPACE SQL statement: it applies only to an Oracle Cloud environment. ALWAYS automatically encrypts the tablespace using the AES128 algorithm if you omit the ENCRYPTION clause of CREATE TABLESPACE, for both the Cloud and premises scenarios.

    6. Role-Based Conditional Auditing

    Role-based conditional auditing provides the ability to define unified audit policies that conditionally audit users based on a role in addition to the current capability to audit by users. This feature enables more powerful policy-based conditional auditing by using database roles as the condition for auditing. For example, auditing for new users with the DBA role would begin automatically when they are granted the role:

    CREATE AUDIT POLICY role_dba_audit_pol ROLES DBA CONTAINER = ALL; 
    AUDIT POLICY role_dba_audit_pol;
    

    7. Strong Password Verifiers by Default and Minimum Authentication Protocols

    The newer verifiers use salted hashes, modern SHA-1 and SHA-2 hashing algorithms, and mixed-case passwords.

    The allowed_logon_version_server in the sqlnet.ora file is used to specify the minimum authentication protocol allowed when connecting to Oracle Database instances. 
    Oracle notes that the term “version” in the allowed_logon_version_server parameter name refers to the version of the authentication protocol.  It does NOT refer to the Oracle release version.

    – SQLNET.ALLOWED_LOGON_VERSION_SERVER=8 generates all three password versions 10g, 11g, and 12c
    – SQLNET.ALLOWED_LOGON_VERSION_SERVER=12 generates both 11g and 12c password versions, and removes the 10g password version
    – SQLNET.ALLOWED_LOGON_VERSION_SERVER=12a generates only the 12c password version

    8. New init.ora parameter called OUTBOUND_DBLINK_PROTOCOLS

    Due to direct SQL*Net Access Over Oracle Cloud, existing applications can now use Oracle Cloud without any code changes. We can easily control the outbound database link options:

    – OUTBOUND_DBLINK_PROTOCOLS specifies the allowed network protocols for outbound database link connections: this can be used to restrict database links to use secure protocols
    – ALLOW_GLOBAL_DBLINKS allows or disallows global database links, which look up LDAP by default
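
    For example, to restrict outbound database links to secure protocols only (a sketch; it assumes TCPS is among the values accepted by the parameter):

    ALTER SYSTEM SET outbound_dblink_protocols='TCPS' SCOPE=SPFILE;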

    9. SYSRAC – Separation of Duty in a RAC

    SYSRAC is a new role for Oracle Real Application Clusters (Oracle RAC) management. This administrative privilege is the default mode for connecting to the database by the clusterware agent on behalf of the Oracle RAC utilities such as srvctl. For example, we can now create a named administrative account and grant only the administrative privileges needed such as SYSRAC and SYSDG to manage both Oracle RAC and Oracle Data Guard configurations.
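
    A minimal sketch of such a separation-of-duty setup (the account name and password are made up):

    CREATE USER rac_admin IDENTIFIED BY "Str0ng#Passw0rd1";
    GRANT SYSRAC TO rac_admin;
    GRANT SYSDG TO rac_admin;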

    10. Automatic Locking of Inactive User Accounts

    CREATE PROFILE time_limit LIMIT INACTIVE_ACCOUNT_TIME 60;
    

    Within a user profile, the INACTIVE_ACCOUNT_TIME parameter controls the maximum time that an account can remain unused. The account is automatically locked if a log in does not occur in the specified number of days. Locking inactive user accounts prevents attackers from using them to gain access to the database. The minimum setting is 15 and the maximum is 24855. The default for INACTIVE_ACCOUNT_TIME is UNLIMITED.
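
    The profile from the example above is then assigned to an account (the user name is made up) in the usual way:

    ALTER USER app_batch PROFILE time_limit;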

    11. Kerberos-Based Authentication for Direct NFS

    Oracle Database now supports Kerberos implementation with Direct NFS communication. This feature solves the problem of authentication, message integrity, and optional encryption over unsecured networks for data exchange between Oracle Database and NFS servers using Direct NFS protocols.

    12. Lockdown Profiles

    A lockdown profile is a mechanism used to restrict the operations that can be performed by connections to a given PDB, for both cloud and non-cloud deployments.

    There are three functionalities that you can disable:

    Feature: it lets us enable or disable database features for say junior DBAs (or cowboy DBAs)
    Option: for now, the two options we can enable/disable are “DATABASE QUEUING” and “PARTITIONING”
    Statement: we can either enable or disable the statements “ALTER DATABASE”, “ALTER PLUGGABLE DATABASE”, “ALTER SESSION”, and “ALTER SYSTEM”. In addition, we can specify granular options along with these statements. Example:

    ALTER LOCKDOWN PROFILE junior_dba_prof STATEMENT = ('ALTER SYSTEM') 
    CLAUSE = ('SET')  OPTION= ('OPTIMIZER_INDEX_COST_ADJ');
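
    Before that statement can take effect, the lockdown profile has to exist and be assigned to the PDB, for example:

    CREATE LOCKDOWN PROFILE junior_dba_prof;
    ALTER SYSTEM SET PDB_LOCKDOWN = junior_dba_prof;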
    

    But .. the most secure database is the database with no users connected to it.

    The James Bond of Database Administration

    In Data, DBA, Golden Gate, Oracle database, Oracle Engineered Systems on October 27, 2015 at 07:23

    “Defending our systems needs to be as sexy as attacking others. There’s really only one solution: Bond.”

    That is what ‘The Guardian’ wrote recently in an article entitled “The Man with the Golden Mouse: why data security needs a James Bond“.

    Attending the annual Oracle ACE Director Briefing at Oracle HQ sparked an interesting debate on the following question: What will happen in the near future with the DBA profession? Who is now the James Bond of Database Administration?


    According to TechTarget, big data tools are changing data architectures in many companies. The effect on the skill sets required by database administrators may be moderate, but some new IT tricks are likely to be needed. GoldenGate is the new Streams, Exadata is the new RAC, Sharding the new Partitioning, Big Data is the new data (Texas is an exception), you name it…

    Having the privilege to work throughout the years with some of the best database experts in the world has, for all it matters, proved to me that Double-O-Sevens are in fact more like Double-O-six-hundreds. Meaning that there are 100s of DBAs that qualify with no hesitation whatsoever as the James Bonds of Database Administration. I have learned so much from my ex Nokia colleagues, from my current Enkitec and Accenture colleagues. Not to mention friends from companies like eDBA, Pythian, Miracle, etc.

    A DBA needs to have so many skills. Look for instance at Craig S. Mullins’ suggested 17 skills required of a DBA. Kyle Hunter’s article The evolution of the DBA and the Data Architect is clearly pointing to the emerging skillsets in the Data Revolution.

    In the IT business, and in database administration in particular, it is not that important how well you know the old stuff, it is more important how fast you can learn the new things. Here are some of the tools that help every modern Oracle DBA:

    Oracle Enterprise Manager 12c
    ORAchk
    Metalink/MOS
    Developer Tools
    Oracle Application Express (APEX)
    SQL Developer
    Oracle JDeveloper
    SQL Developer Data Modeler
    And last but not least SQL*Plus®


    These additional Metalink tools might be often of great help:

    Diagnostic Tools Catalog – Note ID 559339.1
    OS Watcher (Support Tool) – Note 301137.1
    LTOM (Support Tool) – Note 352363.1
    HANGFG (Support Tool) – Note 362094.1
    SQLT (Support Tool) – Note 215187.1
    PLSQL Profiler (Support Script) – Note 243755.1
    MSRDT for the Oracle Lite Repository – Note 458350.1
    Trace Analyzer TRCANLZR – Note 224270.1
    ORA-600/ORA-7445 Error Look-up Tool – Note 153788.1
    Statspack (causing more problems than help in 12c)

    The Man with the Golden Mouse is the James Bond of Database Administration. The best DBA tools are still knowledge and experience.


    DBA skills are among the Top 10 IT skills for 2014

    In Data, DBA on February 24, 2014 at 15:27

    Recently, IT Business Knowledge and Global Knowledge published the Top 10 IT skills for 2014. In survey order, they are: Programming and Application Development, Help Desk and Technical Support, Networking, Mobile Applications and Device Management, Project Management, Database Administration, Security, Business Intelligence/Analytics, Cloud and Interpersonal.


    “Though database administration missed the 2013 list, it will be quite hot in 2014. That stems from the growing interest in big data. Organizations have been gathering huge amounts of information from websites, social media, and third-party arrangements. Now they want to be able to use that data to make better decisions on products, services, and their customers.”

    Although the above quote is somewhat correct, there are definitely other factors that put the DBA profession on top of the lists of hiring and salary surveys.

    According to an estimate, 90% of the data in the world today has been created in the last two years alone. I think this is based on the estimate that every day we create 2.5 quintillion bytes of data. It is more likely that (on top of all the social media and unstructured data) we have created IT systems with logging and auditing processes that generate a significant amount of that data.

    Several colleagues of mine, including me, are fascinated by how much unnecessary data is being logged into corporate databases. I would say more than 80% is usually either audited/logged data or data that is not being accessed/used. However, according to Red Hat CEO Jim Whitehurst, the advice is to capture everything you can. It is a very interesting view on big data’s chicken-and-egg question and I do recommend it.

    Of course, we cannot capture it all. For example, there is no computer/storage in the world that can record the processes that take place in the human body within 1 second. Estimates of the human brain’s memory capacity vary wildly from 1 to 1,000 terabytes. For comparison, all the 19 million volumes in the US Library of Congress represent about 10 terabytes of data.


    Data!

    The growing amount of data is just a consequence of the complex data and database administration that comes with new technologies and software options, which rather increase the complexity of IT systems. Increasing business and legal demands require companies to implement them. Keeping unstructured data outside the database (as the easier option) also increases the complexity. It takes extra skills and persuasion by data architects and DBAs to convince the IT organization to seriously consider keeping unstructured data within (relational) databases. I do recommend Marcelle Kratochvil’s book Managing Multimedia and Unstructured Data in the Oracle Database.

    Merely taking Audit Vault and Total Recall into use in an Oracle database might often cause exponential data growth. When such options are implemented in conjunction with other data and database mechanisms such as compression, partitioning, deduplication, redaction and encryption, one really needs a very experienced DBA (rather a team of DBAs) to support and maintain this growing complexity. Not to mention when such databases are replicated, have standby copies, are patched or are part of an EXA environment. On top of that, the constantly growing requirement of infinite availability makes DBA skills very, very hot.

    Database Administrator Average Starting Salaries are growing year by year, the 2 year increase is often more than 10%:


    Add 5% for IBM DB2 database skills, 9% for Oracle database skills and 10% for Microsoft SQL Server database skills.

    In addition, 2 from the Top 10 highest paying IT certifications for 2014 are database related.

    Recently Forbes published the Top Jobs for 2014. Database Administrators are among the Top 10 with:

    Total employment in 2013: 119,676 jobs
    Jobs added between 2010 and 2013: 11,241 (up 10%)
    Median hourly earnings: $37.39

    Last but not least, according to Sweet Top 10, the Database Administrators are among the top 10 highest paying jobs in 2014. They also note that the rise of big data and the collection of massive amounts of data greatly increases the demand for database administrators.


    Big Year, Big Data, Big Databases

    In Data, DBA on December 29, 2012 at 14:47

    In birding, a Big Year is seeing or hearing as many different species of birds as possible in a calendar year. In databases, Big Data is something that cannot be defined that clearly.


    You can read and try to understand the definition of Wikipedia. Up to you. What I like most is what Christo Kutrovsky says: “Big Data is actually a license issue. It is partially a license issue – Oracle Database is expensive and MySQL isn’t good at data warehouse stuff. It is partially a storage and network issue of scaling large volumes of data, locality of data is becoming more critical.”

    Gwen Shapira wrote a very interesting article entitled Oracle Database or Hadoop?. What she (generally) says is not only interesting and a joy to read but most of all true: whenever I tell an experienced Oracle DBA about Hadoop and what companies are doing with it, the immediate response is “But I can do this in Oracle”.

    And she goes on: “Just because it is possible to do something, doesn’t mean you should.” Yes, and the reason I would say, is that databases are not always in good shape. And why so, one might ask. This is the reason of this blog post.

    Reason #1: Databases are very seldom reorganized. It is a very hard task to explain to non-DBAs why this is necessary. Only core/system DBAs comprehend the real benefits of that reorganization. It is an extremely important task that is so often neglected.

    Reason #2: Old data is never removed or archived. I dare say that most of the data (more than 80%) in all databases I have seen is rarely touched. Call it junk data, big data, whatever you like.

    Reason #3: Databases are not upgraded as new databases. Meaning, I seldom see that a new 11g database is created and the database is imported from the old database. Often migration scripts are being run (easy to do so and minimal downtime) and what you are left with is a fake 11g database. It almost has the Data Dictionary of 11g but modified with scripts. Such databases underperform in most cases.

    Reason #4: Patching the databases to the latest possible level. How often I am asked this question: “But Julian, how can one digit at the end matter?”. Well, it matters a lot in most cases.

    Reason #5: Automation of several critical tasks is not used at all. One very good example is AST (Automatic SQL Tuning).

    So, what is the way ahead if we ignore all of the above? You can decide for yourself whether the following reasons from Future DBA? 5 reasons to learn NOSQL are valid ones to learn NoSQL:

    1. Big Data and Scaling
    2. Crashing servers, not a problem.
    3. Changes aren’t as stressful
    4. Be NoSQL Pioneer
    5. Work less

    Do you believe in Tall Stories? Big Data, Big Problems, Little Talent!

    Is Big Data just a trend or is it just another way to solve the big problems we have created over time? Should we concentrate our efforts on “smart use of data” or on “use of big data”?

    “Depending on whom you ask, ‘big data’ is either:

    A. Bullshit (Brad Feld)
    B. No substitute for judgment (David Friend)
    C. The marriage of corporate data with external data (Chris Lynch)
    D. Data that’s growing faster than Moore’s law (Richard Dale).

    After this week, my answer would be E, all of the above.”–Gregory T. Huang, Xconomy

    Try the following: type in Google “Big data is” and see if you get what I did:


    What happened to machine learning, computational intelligence and data mining?

    It is interesting to note that the database conference with highest H-Index is the one on Very Large Databases.