Dontcheff

Archive for the ‘Data’ Category

Artificial stupidity as a DBA limitation of artificial intelligence

In Data, Database tuning, Databases, DBA on December 6, 2017 at 07:47

“Artificial intelligence is no match for natural stupidity” ― Albert Einstein

What about introducing Artificial Intelligence into the database to an extent it tunes itself into all possible dimensions?

You have probably either seen the question above or have already asked yourself if that was at all possible. On Ask Tom, John from Guildford wrote the following:

As for Artificial Intelligence, well Artificial Stupidity is more likely to be true. Humanity is not privy to the algorithm for intelligence. Anyone who’s had the pleasure of dealing with machine generated code knows that software is no more capable of writing a cohesive system than it is of becoming self-aware.

Provided you’re not trying to be a cheap alternative to an automaton you just need to think. That one function alone differentiates us from computers, so do more of it. The most sublime software on the planet has an IQ of zero, so outdoing it shouldn’t be all that hard.

Stephen Hawking thinks computers may surpass human intelligence and take over the world. Fear artificial stupidity, not artificial intelligence!

Einstein is credited with saying (but it was probably Alexandre Dumas or Elbert Hubbard who deserve the recognition): “The difference between genius and stupidity is that genius has its limits.”

Explore artificial stupidity (AS) and/or read Charles Wheelan’s book Naked Statistics to understand this kind of AI danger. By the way, according to Woody Allen, 94.5% of all statistics are made up!

So what are the limitations of AI? Jay Liebowitz argues that “if intelligence and stupidity naturally exist, and if AI is said to exist, then is there something that might be called ‘artificial stupidity’?” According to him, three of these limitations are:

  • Ability to possess and use common sense
  • Development of deep reasoning systems
  • Ability to easily acquire and update knowledge
But does artificial intelligence use a database in order to be an artificial intelligence? A few very interesting answers to that question are given by Douglas Green, Jordan Miller and Ramon Morales; here is a summary:

    Although AI could be built without a database, it would probably be more powerful if a database were added. AI and databases are currently not very well integrated. The database is just a standard tool that the AI uses. However, as AI becomes more advanced, it may become more a part of the database itself.

    I don’t believe you can have an effective Artificial Intelligence without a database or memory structure of some kind.

    While it is theoretically possible to have an artificial intelligence without using a database, it makes things a LOT easier if you can store what the AI knows somewhere convenient.

As Demystifying Artificial Intelligence explains, AI has been embedded into some of the most fundamental aspects of data management, making those critical data-driven processes more celeritous and manageable.

    Amazon Mechanical Turk is worth looking into and Oracle are also ready for business with AI.

Matt Johnson, a Navy pilot turned AI researcher, put it well at a conference this summer by saying that one of the places we are not making a lot of advances is in that teaming, in that interaction (of humans and AI) – Artificial Stupidity: When Artificial Intelligence + Human = Disaster

Bottom line: if AI uses a database, then the intelligent database should be at least autonomous and have most tasks automated, but not rely on artificial stupidity as a DBA limitation of artificial intelligence. Whatever that means… I do not want to curb your enthusiasm, but we need to first fill the skills gap: we need data engineers who understand databases and data warehouses, infrastructure, and tools that span data cleaning, ingestion, security and predictions. And in this aspect the Cloud is critical and a big differentiator.

    P.S. Is Artificial Intelligence Progress a Bubble? was published 4 days after this blog post.


    Blockchain for DBAs

    In Data, Databases, DBA on October 30, 2017 at 09:25

“Instead of putting the taxi driver out of a job, blockchain puts Uber out of a job and lets the taxi driver work with the customer directly.” – Vitalik Buterin

    A blockchain database consists of two kinds of records: transactions and blocks. Blocks contain the lists of the transactions that are hashed and encoded into a hash (Merkle) tree. The linked blocks form a chain as every block holds the hash pointer to the previous block.

    The blockchain can be stored in a flat file or in a database. For example, the Bitcoin core client stores the blockchain metadata using LevelDB (based on Google’s Bigtable database system).

    The diagram above can be used to create the schema in PostgreSQL. “As far as what DBMS you should put it in”, says Ali Razeghi, “that’s up to your use case. If you want to analyze the transactions/wallet IDs to see some patterns or do BI work I would recommend a relational DB. If you want to setup a live ingest with multiple cryptocoins I would recommend something that doesn’t need the transaction log so a MongoDB solution would be good.”
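As a rough illustration of such a schema, the two record types could be mapped to tables like these (a minimal sketch only; the table and column names are made up for illustration and are not taken from the diagram or from any particular client):

-- Sketch: one table per record type, linked by the block hash
CREATE TABLE blocks (
  block_hash       CHAR(64) PRIMARY KEY,                    -- hash of the block header
  prev_block_hash  CHAR(64) REFERENCES blocks(block_hash),  -- NULL for the genesis block
  merkle_root      CHAR(64) NOT NULL,                       -- root of the transaction Merkle tree
  block_timestamp  TIMESTAMP NOT NULL
);

CREATE TABLE transactions (
  tx_hash     CHAR(64) PRIMARY KEY,
  block_hash  CHAR(64) REFERENCES blocks(block_hash),       -- the block this transaction belongs to
  tx_data     TEXT
);

A relational copy like this is only a convenience for analysis and BI work, as Ali Razeghi points out; it is not the blockchain itself.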

If you want to set up a MySQL database, here are 8 easy steps.

    But what is the structure of the block, what does it look like?

    The block has 4 fields:

    1. Block Size: The size of the block in bytes
    2. Block Header: Six fields in the block header
    3. Transaction Counter: How many transactions follow
    4. Transactions: The transactions recorded in this block

    The block header has 6 fields:

    1. Version: A version number to track software/protocol upgrades
    2. Previous Block Hash: A reference to the hash of the previous (parent) block in the chain
    3. Merkle Root: A hash of the root of the merkle tree of this block’s transactions
    4. Timestamp: The approximate creation time of this block (seconds from Unix Epoch)
    5. Difficulty Target: The proof-of-work algorithm difficulty target for this block
    6. Nonce: A counter used for the proof-of-work algorithm

More details, for example on the block header hash and block height, can be found here.
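To make the structure concrete for a DBA, the six header fields map naturally onto columns (again only a sketch; the column types are approximate):

-- Sketch: the six block header fields as table columns
CREATE TABLE block_headers (
  version          INTEGER,     -- software/protocol version
  prev_block_hash  CHAR(64),    -- hash of the parent block
  merkle_root      CHAR(64),    -- root of the Merkle tree of the block's transactions
  block_timestamp  BIGINT,      -- seconds since the Unix Epoch
  difficulty_bits  INTEGER,     -- proof-of-work difficulty target
  nonce            BIGINT       -- proof-of-work counter
);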

    But how about blockchain vs. relational database: Which is right for your application? As you can see, because the term “blockchain” is not clearly defined, you could argue that almost any IT project could be described as using a blockchain.

    It is worth reading Guy Harrison’s article Sealing MongoDB documents on the blockchain. Here is a nice quote: “As a database administrator in the early 1990s, I remember the shock I felt when I realized that the contents of the database files were plain text; I’d just assumed they were encrypted and could only be modified by the database engine acting on behalf of a validated user.”

The Blockchain technology is a very special kind of a distributed database. Sebastien Meunier’s post concludes that, ironically, there is no consensus on the definition of what blockchain technology is.

I particularly like his last question: Is a private blockchain without token really more efficient than a centralized system? And I would add: private blockchain, really?

    But once more, what is blockchain? Rockford Lhotka gives a very good DBA-friendly definition/characteristics of blockchain:

    1. A linked list where each node contains data
    2. Immutable:
    – Each new node is cryptographically linked to the previous node
    – The list and the data in each node is therefore immutable, tampering breaks the cryptography
    3. Append-only
    – New nodes can be added to the list, though existing nodes can’t be altered
    4. Persistent
    – Hence it is a data store – the list and nodes of data are persisted
    5. Distributed
    – Copies of the list exist on many physical devices/servers
    – Failure of 1+ physical devices has no impact on the integrity of the data
    – The physical devices form a type of networked cluster and work together
    – New nodes are only appended to the list if some quorum of physical devices agree with the cryptography and validity of the node via consistent algorithms running on all devices.
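Seen through DBA eyes, point 1 (a linked list where each node contains data) is simply a self-referencing table, and walking the chain is a recursive query. A sketch against the hypothetical blocks table sketched earlier:

-- Sketch: walk the chain backwards from a given block
WITH RECURSIVE chain AS (
  SELECT block_hash, prev_block_hash, 0 AS depth
  FROM   blocks
  WHERE  block_hash = 'tip-of-chain-hash'        -- placeholder for a real block hash
  UNION ALL
  SELECT b.block_hash, b.prev_block_hash, c.depth + 1
  FROM   blocks b
  JOIN   chain  c ON b.block_hash = c.prev_block_hash
)
SELECT * FROM chain;

What a query like this cannot give you, of course, are the immutability and distribution properties in points 2 to 5; those come from the cryptography and the consensus, not from the storage.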

    Kevin Ford’s reply is a good one to conclude with: “Based on this description (above) it really sounds like your (Rockford Lhotka’s) earlier comparison to the hype around XML is spot on. It sounds like in and of itself it isn’t particularly anything except a low level technology until you structure it to meet a particular problem.”

The nature of blockchain technology makes it difficult to work with high transactional volumes.

    But DBAs can have a look at (1) BigchainDB, a database with several blockchain characteristics added: high-transaction, decentralized database, immutability & native support for assets and (2) at Chainfrog if interested in connecting legacy databases together. As far as I know, they support as of now at least MySQL and SQL Server.

    Twelve new features for Cyber Security DBAs

    In Cloud, Data, DBA, Security and auditing on June 2, 2017 at 08:32

In the early years of Oracle, Larry Ellison was asked if clients ever ask for their money back. “Nobody’s asked for their money back yet,” he replied, “a few have asked for their data back though!”

A relatively new Wells Fargo Insurance Cyber Security study shows that companies are more concerned with private data loss than with hackers.

    Thus, one of the main roles of the cyber security DBA is to protect and secure the data.

    Here is what the latest Oracle release 12cR2 is offering us:

    1. A Fully Encrypted Database

    To encrypt an entire database, you must encrypt all the tablespaces within this database, including the Oracle-supplied SYSTEM, SYSAUX, UNDO, and TEMP tablespaces (which is now possible in 12.2). For a temporary tablespace, drop it and then recreate it as encrypted – do not specify an algorithm. Oracle recommends that you encrypt the Oracle-supplied tablespaces by using the default tablespace encryption algorithm, AES128. Here is how you do it:

    ALTER TABLESPACE system ENCRYPTION ONLINE ENCRYPT 
    FILE_NAME_CONVERT=('system01.dbf','system01_enc.dbf'); 
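For the temporary tablespace mentioned above, the drop-and-recreate approach could look like this (a sketch only; file names and sizes are placeholders, and no algorithm is specified, as recommended):

-- Sketch: recreate TEMP as encrypted, then drop the old one
CREATE TEMPORARY TABLESPACE temp_enc
  TEMPFILE 'temp_enc01.dbf' SIZE 1G
  ENCRYPTION ENCRYPT;
ALTER DATABASE DEFAULT TEMPORARY TABLESPACE temp_enc;   -- the default temporary tablespace cannot be dropped
DROP TABLESPACE temp INCLUDING CONTENTS AND DATAFILES;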
    

    2. TDE Tablespace Live Conversion

    You can now encrypt, decrypt, and rekey existing tablespaces with Transparent Data Encryption (TDE) tablespace live conversion. The feature performs initial cryptographic migration for TDE tablespace encryption on the tablespace data in the background so that the tablespace can continue servicing SQL and DML statements like insert, delete, select, merge, and so on. Ensure that you have enough auxiliary space to complete the encryption and run (for example):

    ALTER TABLESPACE users ENCRYPTION ONLINE USING 'AES192' ENCRYPT 
    FILE_NAME_CONVERT = ('users.dbf', 'users_enc.dbf'); 
    

    3. Support for ARIA, SEED, and GOST algorithms

    By default, Transparent Data Encryption (TDE) Column encryption uses the Advanced Encryption Standard with a 192-bit length cipher key (AES192), and tablespace and database encryption use the 128–bit length cipher key (AES128). 12.2 provides advanced security Transparent Data Encryption (TDE) support for these encryption algorithms:

– SEED (Korea Information Security Agency, KISA) for South Korea
    – ARIA (Academia, Research Institute, and Agency) for South Korea
    – GOST (GOsudarstvennyy STandart) for Russia

    ALTER TABLE clients REKEY USING 'GOST256'; 
    

    4. TDE Tablespace Offline Conversion

    12.2 introduces new SQL commands to encrypt tablespace files in place with no storage overhead. You can do this on multiple instances across multiple cores. Using this feature requires downtime, because you must take the tablespace temporarily offline. With Data Guard configurations, you can either encrypt the physical standby first and switchover, or encrypt the primary database, one tablespace at a time. This feature provides fast offline conversion of existing clear data to TDE encrypted tablespaces. Use the following syntax:

    ALTER TABLESPACE users ENCRYPTION OFFLINE ENCRYPT; 
    

    5. Setting Future Tablespaces to be Encrypted

    ALTER SYSTEM SET ENCRYPT_NEW_TABLESPACES = CLOUD_ONLY; 
    

CLOUD_ONLY transparently encrypts the tablespace in the Cloud using the AES128 algorithm if you do not specify the ENCRYPTION clause of the CREATE TABLESPACE SQL statement: it applies only to an Oracle Cloud environment. ALWAYS automatically encrypts the tablespace using the AES128 algorithm if you omit the ENCRYPTION clause of CREATE TABLESPACE, for both the Cloud and on-premises scenarios.

    6. Role-Based Conditional Auditing

    Role-based conditional auditing provides the ability to define unified audit policies that conditionally audit users based on a role in addition to the current capability to audit by users. This feature enables more powerful policy-based conditional auditing by using database roles as the condition for auditing. For example, auditing for new users with the DBA role would begin automatically when they are granted the role:

    CREATE AUDIT POLICY role_dba_audit_pol ROLES DBA CONTAINER = ALL; 
    AUDIT POLICY role_dba_audit_pol;
    

    7. Strong Password Verifiers by Default and Minimum Authentication Protocols

    The newer verifiers use salted hashes, modern SHA-1 and SHA-2 hashing algorithms, and mixed-case passwords.

    The allowed_logon_version_server in the sqlnet.ora file is used to specify the minimum authentication protocol allowed when connecting to Oracle Database instances. 
    Oracle notes that the term “version” in the allowed_logon_version_server parameter name refers to the version of the authentication protocol.  It does NOT refer to the Oracle release version.

    – SQLNET.ALLOWED_LOGON_VERSION_SERVER=8 generates all three password versions 10g, 11g, and 12c
    – SQLNET.ALLOWED_LOGON_VERSION_SERVER=12 generates both 11g and 12c password versions, and removes the 10g password version
    – SQLNET.ALLOWED_LOGON_VERSION_SERVER=12a generates only the 12c password version
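A quick way to see where you stand is to check the password versions recorded for each account; accounts that carry only the old 10G verifier will need a password reset once the minimum protocol is raised. A sketch (assumes SELECT access on DBA_USERS):

-- sqlnet.ora (server side), excerpt: keep only the 11g and 12c verifiers
--   SQLNET.ALLOWED_LOGON_VERSION_SERVER = 12

-- Sketch: accounts that still have only the old 10G password version
SELECT username, password_versions
FROM   dba_users
WHERE  password_versions LIKE '%10G%';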

8. New init.ora parameter called OUTBOUND_DBLINK_PROTOCOLS

    Due to direct SQL*Net Access Over Oracle Cloud, existing applications can now use Oracle Cloud without any code changes. We can easily control the outbound database link options:

    – OUTBOUND_DBLINK_PROTOCOLS specifies the allowed network protocols for outbound database link connections: this can be used to restrict database links to use secure protocols
– ALL_GLOBAL_DBLINKS allows or disallows global database links, which are looked up in LDAP by default
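As a sketch (the parameter names come from the list above, but the values here are only illustrative and assume the listener is already configured for TCPS):

-- Sketch: restrict outbound database links to encrypted connections
ALTER SYSTEM SET OUTBOUND_DBLINK_PROTOCOLS = 'TCPS' SCOPE = SPFILE;
-- Sketch: disallow global (LDAP-resolved) database links
ALTER SYSTEM SET ALL_GLOBAL_DBLINKS = FALSE SCOPE = SPFILE;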

    9. SYSRAC – Separation of Duty in a RAC

    SYSRAC is a new role for Oracle Real Application Clusters (Oracle RAC) management. This administrative privilege is the default mode for connecting to the database by the clusterware agent on behalf of the Oracle RAC utilities such as srvctl. For example, we can now create a named administrative account and grant only the administrative privileges needed such as SYSRAC and SYSDG to manage both Oracle RAC and Oracle Data Guard configurations.
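For example, such a named administrative account could be created like this (a sketch; the user name and password are made up):

-- Sketch: a named admin account with only the privileges it needs
CREATE USER rac_admin IDENTIFIED BY "Chang3_Me_Please";
GRANT SYSRAC TO rac_admin;
GRANT SYSDG  TO rac_admin;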

    10. Automatic Locking of Inactive User Accounts

    CREATE PROFILE time_limit LIMIT INACTIVE_ACCOUNT_TIME 60;
    

Within a user profile, the INACTIVE_ACCOUNT_TIME parameter controls the maximum time that an account can remain unused. The account is automatically locked if a login does not occur in the specified number of days. Locking inactive user accounts prevents attackers from using them to gain access to the database. The minimum setting is 15 and the maximum is 24855. The default for INACTIVE_ACCOUNT_TIME is UNLIMITED.

    11. Kerberos-Based Authentication for Direct NFS

    Oracle Database now supports Kerberos implementation with Direct NFS communication. This feature solves the problem of authentication, message integrity, and optional encryption over unsecured networks for data exchange between Oracle Database and NFS servers using Direct NFS protocols.

    12. Lockdown Profiles

A lockdown profile is a mechanism used to restrict the operations that can be performed by connections to a given PDB, both in the cloud and on premises.

There are three things that you can enable or disable:

Feature: lets us enable or disable database features for, say, junior DBAs (or cowboy DBAs)
Option: for now, the two options we can enable/disable are “DATABASE QUEUING” and “PARTITIONING”
Statement: we can either enable or disable the statements “ALTER DATABASE”, “ALTER PLUGGABLE DATABASE”, “ALTER SESSION”, and “ALTER SYSTEM”. In addition, we can specify granular options along with these statements. Example:

ALTER LOCKDOWN PROFILE junior_dba_prof DISABLE STATEMENT = ('ALTER SYSTEM')
CLAUSE = ('SET') OPTION = ('OPTIMIZER_INDEX_COST_ADJ');
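Note that for such a restriction to take effect, the lockdown profile must first exist and be assigned to the PDB, for example (a sketch):

-- Sketch: create the profile (in the CDB root) and assign it to the PDB
CREATE LOCKDOWN PROFILE junior_dba_prof;
ALTER SYSTEM SET PDB_LOCKDOWN = junior_dba_prof;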
    

But… the most secure database is the database with no users connected to it.

    The James Bond of Database Administration

    In Data, DBA, Golden Gate, Oracle database, Oracle Engineered Systems on October 27, 2015 at 07:23

    “Defending our systems needs to be as sexy as attacking others. There’s really only one solution: Bond.”

    That is what ‘The Guardian’ wrote recently in an article entitled “The Man with the Golden Mouse: why data security needs a James Bond“.

Attending the annual Oracle ACE Director Briefing at Oracle HQ sparked an interesting debate on the following question: What will happen in the near future with the DBA profession? Who is now the James Bond of Database Administration?


    According to TechTarget, big data tools are changing data architectures in many companies. The effect on the skill sets required by database administrators may be moderate, but some new IT tricks are likely to be needed. GoldenGate is the new Streams, Exadata is the new RAC, Sharding the new Partitioning, Big Data is the new data (Texas is an exception), you name it…

Having had the privilege of working throughout the years with some of the best database experts in the world has proved to me that Double-O-Sevens are in fact more like Double-O-Six-Hundreds. Meaning that there are hundreds of DBAs who qualify, with no hesitation whatsoever, as the James Bonds of Database Administration. I have learned so much from my ex-Nokia colleagues and from my current Enkitec and Accenture colleagues. Not to mention friends from companies like eDBA, Pythian, Miracle, etc.

    A DBA needs to have so many skills. Look for instance at Craig S. Mullins’ suggested 17 skills required of a DBA. Kyle Hunter’s article The evolution of the DBA and the Data Architect is clearly pointing to the emerging skillsets in the Data Revolution.

    In the IT business, and in database administration in particular, it is not that important how well you know the old stuff, it is more important how fast you can learn the new things. Here are some of the tools that help every modern Oracle DBA:

    Oracle Enterprise Manager 12c
    ORAchk
    Metalink/MOS
    Developer Tools
    Oracle Application Express (APEX)
    SQL Developer
    Oracle JDeveloper
    SQL Developer Data Modeler
    And last but not least SQL*Plus®


These additional Metalink tools might often be of great help:

    Diagnostic Tools Catalog – Note ID 559339.1
    OS Watcher (Support Tool) – Note 301137.1
    LTOM (Support Tool) – Note 352363.1
    HANGFG (Support Tool) – Note 362094.1
    SQLT (Support Tool) – Note 215187.1
    PLSQL Profiler (Support Script) – Note 243755.1
    MSRDT for the Oracle Lite Repository – Note 458350.1
    Trace Analyzer TRCANLZR – Note 224270.1
    ORA-600/ORA-7445 Error Look-up Tool – Note 153788.1
    Statspack (causing more problems than help in 12c)

    The Man with the Golden Mouse is the James Bond of Database Administration. The best DBA tools are still knowledge and experience.


    DBA skills are among the Top 10 IT skills for 2014

    In Data, DBA on February 24, 2014 at 15:27

    Recently, IT Business Knowledge and Global Knowledge published the Top 10 IT skills for 2014. In survey order, they are: Programming and Application Development, Help Desk and Technical Support, Networking, Mobile Applications and Device Management, Project Management, Database Administration, Security, Business Intelligence/Analytics, Cloud and Interpersonal.


    “Though database administration missed the 2013 list, it will be quite hot in 2014. That stems from the growing interest in big data. Organizations have been gathering huge amounts of information from websites, social media, and third-party arrangements. Now they want to be able to use that data to make better decisions on products, services, and their customers.”

Although the above quote is somewhat correct, there are definitely other factors that keep the DBA profession at the top of hiring and salary surveys.

According to an estimate, 90% of the data in the world today has been created in the last two years alone. I think this is based on the estimate that every day we create 2.5 quintillion bytes of data. It is more likely that (on top of all the social media and unstructured data) we have created IT systems with logging and auditing processes that generate a significant part of that data.

Several colleagues of mine, myself included, are fascinated by how much unnecessary data is being logged into corporate databases. I would say more than 80% is usually either audited/logged data or data that is not being accessed/used. However, according to Red Hat CEO Jim Whitehurst, the advice is to capture everything you can. It is a very interesting view on big data’s chicken-and-egg question; I do recommend it.

Of course, we cannot capture it all. For example, there is no computer/storage in the world that can record the processes that take place in the human body within 1 second. Estimates of the human brain’s memory capacity vary wildly from 1 to 1,000 terabytes. For comparison, all the 19 million volumes in the US Library of Congress represent about 10 terabytes of data.


    Data!

The growing amount of data is also a consequence of new technologies and software options that increase the complexity of IT systems, and of the data and database administration that comes with them. Increasing business and legal demands require companies to implement them. Keeping unstructured data outside the database (as the easier option) also increases the complexity. It takes extra skills and persuasion by data architects and DBAs to convince an IT organization to take seriously the concept of keeping unstructured data within (relational) databases. I do recommend Marcelle Kratochvil’s book Managing Multimedia and Unstructured Data in the Oracle Database.

Merely taking Audit Vault and Total Recall into use in an Oracle database can often cause exponential data growth. When such options are implemented in conjunction with other data and database mechanisms such as compression, partitioning, deduplication, redaction and encryption, one really needs a very experienced DBA (or rather a team of DBAs) to support and maintain this growing complexity. Not to mention when such databases are replicated, have standby copies, are patched, or are part of an EXA environment. On top of that, the constantly growing requirement of infinite availability makes DBA skills very, very hot.

Database Administrator average starting salaries are growing year by year; the two-year increase is often more than 10%.


Add 5% for IBM DB2 database skills, 9% for Oracle database skills and 10% for Microsoft SQL Server database skills.

In addition, 2 of the Top 10 highest paying IT certifications for 2014 are database related.

    Recently Forbes published the Top Jobs for 2014. Database Administrators are among the Top 10 with:

    Total employment in 2013: 119,676 jobs
    Jobs added between 2010 and 2013: 11,241 (up 10%)
    Median hourly earnings: $37.39

    Last but not least, according to Sweet Top 10, the Database Administrators are among the top 10 highest paying jobs in 2014. They also note that the rise of big data and the collection of massive amounts of data greatly increases the demand for database administrators.


    Big Year, Big Data, Big Databases

    In Data, DBA on December 29, 2012 at 14:47

    In birding, a Big Year is seeing or hearing as many different species of birds as possible in a calendar year. In databases, Big Data is something that cannot be defined that clearly.


    You can read and try to understand the definition of Wikipedia. Up to you. What I like most is what Christo Kutrovsky says: “Big Data is actually a license issue. It is partially a license issue – Oracle Database is expensive and MySQL isn’t good at data warehouse stuff. It is partially a storage and network issue of scaling large volumes of data, locality of data is becoming more critical.”

Gwen Shapira wrote a very interesting article entitled Oracle Database or Hadoop?. What she (generally) says is not only interesting and a joy to read but most of all true: whenever I tell an experienced Oracle DBA about Hadoop and what companies are doing with it, the immediate response is “But I can do this in Oracle”.

    And she goes on: “Just because it is possible to do something, doesn’t mean you should.” Yes, and the reason I would say, is that databases are not always in good shape. And why so, one might ask. This is the reason of this blog post.

Reason #1: Databases are very seldom reorganized. It is a very hard task to explain to non-DBAs why this is necessary. Only core/system DBAs comprehend the real benefits of that reorganization. It is an extremely important task that is so often neglected.

Reason #2: Old data is never removed or archived. I dare say that most of the data (more than 80%) in all databases I have seen is rarely touched. Call it junk data, big data, whatever you like.

Reason #3: Databases are not upgraded as new databases. Meaning, I seldom see that a new 11g database is created and the data is imported from the old database. Often migration scripts are run (easy to do and with minimal downtime) and what you are left with is a fake 11g database. It almost has the Data Dictionary of 11g, but modified with scripts. Such databases underperform in most cases.

Reason #4: Databases are not patched to the latest possible level. How often I am asked this question: “But Julian, how can one digit at the end matter?”. Well, it matters a lot in most cases.

    Reason #5: Automation of several critical tasks is not used at all. One very good example is AST (Automatic SQL Tuning).

So, what is the way ahead if all of the above is ignored? You can decide for yourself whether the following, from Future DBA? 5 reasons to learn NOSQL, are valid reasons to learn NoSQL:

    1. Big Data and Scaling
    2. Crashing servers, not a problem.
    3. Changes aren’t as stressful
    4. Be NoSQL Pioneer
    5. Work less

    Do you believe in Tall Stories? Big Data, Big Problems, Little Talent!

    Is Big Data just a trend or is it just another way to solve the big problems we have created over time? Should we concentrate our efforts on “smart use of data” or on “use of big data”?

    “Depending on whom you ask, ‘big data’ is either:

    A. Bullshit (Brad Feld)
    B. No substitute for judgment (David Friend)
    C. The marriage of corporate data with external data (Chris Lynch)
    D. Data that’s growing faster than Moore’s law (Richard Dale).

    After this week, my answer would be E, all of the above.”–Gregory T. Huang, Xconomy

Try the following: type “Big data is” into Google and see if you get what I did.


    What happened to machine learning, computational intelligence and data mining?

It is interesting to note that the database conference with the highest H-Index is the one on Very Large Databases.