Showing posts with label SQL. Show all posts

Wednesday, March 29, 2017

OUG Ireland 2017 Presentation

Here are the slides from my presentation at OUG Ireland 2017. All about running R using SQL.

Thursday, January 26, 2017

Formatting results from ORE script in a SELECT statement

This blog post looks at how to format the output returned from an Oracle R Enterprise (ORE) user defined R function that is run using a SELECT statement in SQL.

Sometimes this can be a bit of a challenge to work out, but it can be relatively easy once you have figured out how to do it. The following examples work through some scenarios of different result sets from a user defined R function that is stored in the Oracle Database.

To run that user defined R function using a SELECT statement I can use one of the following ORE SQL functions.

  • rqEval
  • rqTableEval
  • "rqGroupEval"
  • rqRowEval

For simplicity we will just use the first of these ORE SQL functions to illustrate the problem and how to go about solving it. The rqEval ORE SQL function is a general purpose function to call a user defined R script stored in the database. The function does not require any input data set, but it will return some data. You could use this to generate some dummy/test data or to find some information in the database. Here is a noddy example that returns my name.

BEGIN
   --sys.rqScriptDrop('GET_NAME');
   sys.rqScriptCreate('GET_NAME',
      'function() {
         res<-data.frame("Brendan")
         res
         } ');
END;

To call this user defined R function I can use the following SQL.

select *
from table(rqEval(null,
                  'select cast(''a'' as varchar2(50))  from dual',
                  'GET_NAME') );  

For text strings returned you need to cast the returned value giving a size.

If we have a numeric value being returned we don't have to use the cast and can instead use a numeric literal such as 1, as shown in the following example. This second example extends our user defined R function to return my name and a number.

BEGIN
   sys.rqScriptDrop('GET_NAME');
   sys.rqScriptCreate('GET_NAME',
      'function() {
         res<-data.frame(NAME="Brendan", YEAR=2017)
         res
         } ');
END;

To call the updated GET_NAME function we now have to process two returned columns. The first is the character string and the second is a numeric.

select *
from table(rqEval(null,
                  'select cast(''a'' as varchar2(50)) as "NAME", 1 AS YEAR  from dual',
                  'GET_NAME') );                  

These examples illustrate how you can process character strings and numerics being returned by the user defined R script.

The key to setting up the format of the returned values is knowing the structure of the data frame being returned by the user defined R script. Once you know that, the rest is (in theory) easy.
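If you are not yet sure what structure the data frame has, one option is to ask for the result as XML instead of describing the columns yourself. The following is a minimal sketch, assuming the 'XML' output option of rqEval is available in your ORE version. It reuses the GET_NAME script created above.

```sql
-- Return the result of GET_NAME as XML so that its structure can be inspected
-- before writing the SELECT that describes the output columns.
-- Assumption: the 'XML' output option is supported by your ORE installation.
select *
from table(rqEval(null,
                  'XML',
                  'GET_NAME') );
```

Once the column names and data types are visible in the XML, the second parameter can be replaced with a SELECT from dual that matches them, as in the examples above.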

Monday, January 16, 2017

Explicit Semantic Analysis setup using SQL and PL/SQL

In my previous blog post I introduced the new Explicit Semantic Analysis (ESA) algorithm and gave an example of how you can build an ESA model and use it. Check out this link for that blog post.

In this blog post I will show you how you can manually create an ESA model. The reason that I'm showing you this way is that the workflow (in ODMr and its scheduler) may not be for everyone. You may want to automate the creation or recreation of the ESA model from time to time based on certain business requirements.

In my previous blog post I showed how you can set up a training data set. This comes with ODMr 4.2 but you may need to expand this data set or to use an alternative data set that is more in keeping with your domain.

Setup the ODM Settings table

As with all ODM algorithms we need to create a settings table. This settings table allows us to store the various parameters, and their values, that will be used by the algorithm.

-- Create the settings table
CREATE TABLE ESA_settings (
    setting_name VARCHAR2(30),
    setting_value VARCHAR2(30));

-- Populate the settings table
-- Specify ESA. By default, Naive Bayes is used for classification.
-- Specify ADP. By default, ADP is not used. Need to turn this on.
BEGIN
    INSERT INTO ESA_settings (setting_name, setting_value)
    VALUES (dbms_data_mining.algo_name,       
           dbms_data_mining.algo_explicit_semantic_analys);
   
    INSERT INTO ESA_settings (setting_name, setting_value)
    VALUES (dbms_data_mining.prep_auto,dbms_data_mining.prep_auto_on);
  
    INSERT INTO ESA_settings (setting_name, setting_value)
    VALUES (dbms_data_mining.odms_sampling, dbms_data_mining.odms_sampling_disable);
  
    commit;
END; 

These are the minimum parameter settings needed to run the ESA algorithm. The other ESA algorithm settings include:

NewImage

Setup the Oracle Text Policy

You also need to set up an Oracle Text Policy and a lexer for the Stopwords.

DECLARE
   v_policy_name    varchar2(30);
   v_lexer_name     varchar2(30);
   v_stoplist_name  varchar2(100);
BEGIN
    v_policy_name  := 'ESA_TEXT_POLICY';
    v_lexer_name   := 'ESA_LEXER';
    -- create the lexer preference used by the policy
    ctx_ddl.create_preference(v_lexer_name, 'BASIC_LEXER');
    v_stoplist_name := 'CTXSYS.DEFAULT_STOPLIST';  -- default stop list
    ctx_ddl.create_policy(policy_name => v_policy_name, lexer => v_lexer_name, stoplist => v_stoplist_name);
END;
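
If the block above is run a second time, the preference and the policy will already exist. The following is a minimal sketch of a clean-up block, assuming the same names as above and that your schema has execute privilege on CTX_DDL.

```sql
-- Drop the text policy first, then the lexer preference it uses.
-- Assumption: both were created with the names used in the block above.
BEGIN
    ctx_ddl.drop_policy('ESA_TEXT_POLICY');
    ctx_ddl.drop_preference('ESA_LEXER');
END;
```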

Create the ESA model

Once we have the settings table created with the parameter values set for the algorithm and the Oracle Text policy created, we can now create the model.

To ensure that the Oracle Text Policy is applied to the text we want to analyse we need to create a transformation list and add the Text Policy to it.

We can then pass the text transformation list as a parameter to the CREATE_MODEL procedure.

DECLARE
   v_xlst              dbms_data_mining_transform.TRANSFORM_LIST;
   v_policy_name       VARCHAR2(130) := 'ESA_TEXT_POLICY';
   v_model_name        varchar2(50) := 'ESA_MODEL_DEMO_2';
BEGIN
   v_xlst := dbms_data_mining_transform.TRANSFORM_LIST();
   DBMS_DATA_MINING_TRANSFORM.SET_TRANSFORM(v_xlst, '"TEXT"', NULL, '"TEXT"', '"TEXT"', 'TEXT(POLICY_NAME:'||v_policy_name||')(MAX_FEATURES:3000)(MIN_DOCUMENTS:1)(TOKEN_TYPE:NORMAL)');

    DBMS_DATA_MINING.DROP_MODEL(v_model_name, TRUE);
    DBMS_DATA_MINING.CREATE_MODEL(
        model_name          => v_model_name,
        mining_function     => DBMS_DATA_MINING.FEATURE_EXTRACTION,
        data_table_name     => 'WIKISAMPLE',
        case_id_column_name => 'TITLE',
        target_column_name  => NULL,
        settings_table_name => 'ESA_SETTINGS',
        xform_list          => v_xlst);
END;

NOTE: Yes we could have merged all of the above code into one PL/SQL block.
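
To confirm that the model was built with the intended settings, the data dictionary views can be queried. This is a sketch, assuming the model name used in the block above.

```sql
-- Confirm the model exists and see which algorithm was used
SELECT model_name, mining_function, algorithm, creation_date
FROM   user_mining_models
WHERE  model_name = 'ESA_MODEL_DEMO_2';

-- List the settings the model was actually built with
-- (values from the settings table plus any defaults)
SELECT setting_name, setting_value, setting_type
FROM   user_mining_model_settings
WHERE  model_name = 'ESA_MODEL_DEMO_2'
ORDER  BY setting_name;
```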

Use the ESA model

We can now use the FEATURE_COMPARE function to use the model we just created, just like I did in my previous blog post.

SELECT FEATURE_COMPARE(ESA_MODEL_DEMO_2
               USING 'Oracle Database is the best available for managing your data' text 
               AND USING 'The SQL language is the one language that all databases have in common' text) similarity 
FROM DUAL;

Go give the ESA algorithm a go and see where you could apply it within your applications.

Wednesday, January 4, 2017

Explicit Semantic Analysis in Oracle 12.2c Database

A new Oracle Data Mining algorithm in the Oracle 12.2c Database is called Explicit Semantic Analysis.

[The following examples are built using Oracle Data Miner 4.2 (SQL Developer 4.2) and the Oracle 12.2 Database cloud service (extreme edition) ]

The Explicit Semantic Analysis algorithm is an unsupervised algorithm used for feature extraction. ESA does not discover latent features but instead uses explicit features based on an existing knowledge base. There is no setup or install necessary to use this algorithm. All you need is a licence for the Advanced Analytics Option for the database. The output from the algorithm is a distance measure that indicates how similar or dissimilar the input texts are, using the ESA model (and the training data set used). Let us look at an example.

Setup training data for ESA Algorithm

Oracle Data Miner 4.2 (that comes with SQL Developer 4.2) has a Wiki data set from 2005. This contains over 200,000 features. To locate the file go to:

...\sqldeveloper\dataminer\scripts\instWikiSampleData.sql

This file contains the DDL and the insert statements for the Wiki data set.

NewImage

After you run this script a new table called WIKISAMPLE exists and contains the records.

NewImage

This gives us the base/seed data set to feed into the ESA algorithm.

Create the ESA Model using ODMr

To create the ESA model we have 2 ways of doing this. In this blog post I'll show you the easiest way by using the Oracle Data Miner (ODMr) tool. I'll have another blog post that will show you the SQL needed to create the model.

In an ODMr workflow create a new Data Source node. Then set this node to have the WIKISAMPLE table as its data source.

Next you need to create the ESA node on the workflow. This node can be found in the Models section of the Workflow Editor. The node is called Explicit Feature Extraction. Click on this node in the Models section, then move your mouse to your workflow and click again. The ESA node will be created.

Join the Data Node to the ESA node by right clicking on the data node and then clicking on the ESA node.

Double click on the ESA node to edit the properties of the node and the algorithm.

NewImage

Explore the ESA Model and ESA Model Features

After the model node has finished you can now explore the results generated by the ESA model. Right click on the model node and select 'View Model'. The model properties window opens and it has 2 main tabs. The first of these is the coefficients tab. Here you can select a particular topic (click on the search icon beside the Feature ID) and select it from the list. The attributes and their coefficient values will be displayed.

NewImage

Next you can examine the second tab that is labeled as Features. In this table we can select a particular record and have a tag cloud and coefficients displayed. The tag cloud is a great way to see visually what words are important.

NewImage

How to use the ESA model to Compare new data using SQL

Now that we have the ESA model created, we can now use the model to compare other similar sets of documents.

You will need to use the FEATURE_COMPARE SQL function to evaluate the input texts, using the ESA model to compare for similarity. For example,

SELECT FEATURE_COMPARE(feat_esa_1_1
          USING 'Oracle Database is the best available for managing your data' text 
          AND USING 'The SQL language is the one language that all databases have in common' text) similarity 
FROM DUAL;

NewImage

The result we get is 0.7629.

The result generated by the query is a distance measure. The FEATURE_COMPARE function returns a comparison number in the range 0 to 1, where 0 indicates that the texts are not similar or related. If a 1 is returned then that indicates that the texts are very similar or very related.

You can use this returned value to make a decision on what happens next. For example, it can be used to decide what the next step should be in your workflow and you can easily write application logic to manage this.
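
As a minimal sketch of that kind of decision logic, the similarity value can be wrapped in a CASE expression. The 0.7 threshold and the labels are assumptions chosen only for illustration, and the model name is the one used in the query above.

```sql
-- Turn the similarity score into a simple decision.
-- Assumption: 0.7 is an arbitrary cut-off, tune it for your own data.
SELECT similarity,
       CASE
          WHEN similarity >= 0.7 THEN 'RELATED'
          ELSE 'NOT RELATED'
       END AS decision
FROM  (SELECT FEATURE_COMPARE(feat_esa_1_1
                 USING 'Oracle Database is the best available for managing your data' text
                 AND USING 'The SQL language is the one language that all databases have in common' text) similarity
       FROM DUAL);
```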

The examples given here are for general text. In the real world you would probably need a bigger data set. But if you were to use this approach in other domains, such as legal, banking, insurance, etc., then you would need to create a training data set based on the typical language that is used in each of those domains. This will then allow you to compare documents within each domain with greater accuracy.

[The above examples are built using Oracle Data Miner 4.2 (SQL Developer 4.2) and the Oracle 12.2 Database cloud service (extreme edition) ]

Monday, December 19, 2016

Auditing Oracle Data Mining model usage

In a previous blog post I talked about how you can rename and comment your Oracle Data Mining models. This is to allow you to easily see and understand the intended use of the data mining model.

Another feature available to you is to audit the usage of the data mining models. As your data mining environment grows to many 10s or more typically 100s of models, you will need to have some way of tracking their usage. This can allow you to discover which models are being used frequently and which are being used infrequently. You can then use this information to investigate if there are any issues. Or, in some companies, I've seen an internal charging scheme in place for each time the models are used.

The following outlines the steps required to setup the auditing of your models and how to inspect the usage.

Note: You will need the AUDIT_ADMIN role to audit the models.

First create an audit policy for the data mining model in a particular schema.

CREATE AUDIT POLICY oaa_odm_audit_usage 
ACTIONS ALL 
ON MINING MODEL dmuser.high_value_churn_clas_svm;

This creates a policy that monitors all activity on the data mining model HIGH_VALUE_CHURN_CLAS_SVM in the DMUSER schema.

Now we need to enable the policy and allow it to track all activity on the model.

AUDIT POLICY oaa_odm_audit_usage BY oaa_model_user;

This will track all usage of the data mining model by the schema called OAA_MODEL_USER. We can then use the following query to search for the audit records for the OAA_MODEL_USER schema.

SELECT dbusername,
       action_name, 
       system_privilege_used, 
       return_code,
       object_schema, 
       object_name, 
       sql_text
FROM  unified_audit_trail
WHERE object_name = 'HIGH_VALUE_CHURN_CLAS_SVM';
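
To see how often the model is being used, and by whom, the audit records can be aggregated. This is a sketch that assumes the same model name as above and uses only standard columns of the UNIFIED_AUDIT_TRAIL view.

```sql
-- Count the audited actions on the model per user and action,
-- and show when each was last recorded.
SELECT dbusername,
       action_name,
       COUNT(*)             AS num_uses,
       MAX(event_timestamp) AS last_used
FROM   unified_audit_trail
WHERE  object_name = 'HIGH_VALUE_CHURN_CLAS_SVM'
GROUP  BY dbusername, action_name
ORDER  BY num_uses DESC;
```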

But there is a little problem with using what I've just shown you above. The problem is that it will track all activity on the data mining model. Perhaps this isn't what we really want. Perhaps we want to track only certain activity on the data mining model. Instead of creating the policy using 'ACTIONS ALL', we can list out the actions or operations we want to track. For example, we may want to track when it is used in a SELECT. The following shows how you can set this up for just SELECT.

CREATE AUDIT POLICY oaa_odm_audit_select 
ACTIONS SELECT 
ON MINING MODEL dmuser.high_value_churn_clas_svm;

AUDIT POLICY oaa_odm_audit_select BY oaa_model_user;

The list of individual audit events you can use includes:

  • AUDIT
  • COMMENT
  • GRANT
  • RENAME
  • SELECT

A policy can be set up to track one or more of these events. For example, if we wanted a policy to track SELECT and GRANT, we would list each event separated by a comma.

CREATE AUDIT POLICY oaa_odm_audit_select_grant 
ACTIONS SELECT 
ON MINING MODEL dmuser.high_value_churn_clas_svm,
GRANT 
ON MINING MODEL dmuser.high_value_churn_clas_svm;

AUDIT POLICY oaa_odm_audit_select_grant BY oaa_model_user;
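
When a policy is no longer needed it can be switched off and removed. The following sketch assumes the policy and user names from the examples above. A policy has to be disabled before it can be dropped.

```sql
-- Check which of the policies are currently enabled
SELECT *
FROM   audit_unified_enabled_policies
WHERE  policy_name LIKE 'OAA_ODM_AUDIT%';

-- Disable the policy for the user, then drop it
NOAUDIT POLICY oaa_odm_audit_select_grant BY oaa_model_user;
DROP AUDIT POLICY oaa_odm_audit_select_grant;
```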

Monday, December 12, 2016

Renaming & Commenting Oracle Data Mining Models

As your company evolves with their data mining projects, the number of models produced and in use in production will increase dramatically.

Care needs to be taken when it comes to managing these. This includes using meaningful names, adding descriptions of what the model is about or for, and being able to track their usage, etc.

I will look at tracking the usage of the models in another blog post, but the following gives examples of how to rename Oracle Data Mining models and how to add comments or descriptions to these models. This is particularly useful when your data analytics team has a constant turnover, or when it has been many months since you last worked on a model and you want a quick idea of what the purpose of the model was.

If you have been using the Oracle Data Mining tool (part of SQL Developer) you will see your models being created with some sort of sequencing numbers. For example, a Support Vector Machine (SVM) model for classification might be labelled:

CLAS_SVM_5_22

While you are working on this project you will know and understand what it was about and why it is being used. But afterward you may forget as you will be dealing with many hundreds of models. Yes you could check your documentation for the purpose of this model but that can take some time.

What if you could run a SQL query to find out?

But first we need to rename the model.

BEGIN
   DBMS_DATA_MINING.RENAME_MODEL('CLAS_SVM_5_22', 'HIGH_VALUE_CHURN_CLAS_SVM');
END;

Next we will want to add a longer description of what the model is about. We can do this by adding a comment to the model.

COMMENT ON MINING MODEL high_value_churn_clas_svm IS
'Classification Model to Predict High Value Customers most likely to Churn';

We can now see these updated details when we query the Oracle Data Mining models in a user schema.

SELECT model_name, mining_function, algorithm, comments 
FROM user_mining_models;

These are two very useful commands.
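
As a small follow-on sketch, the same dictionary view can be used to find the models that still have no description. It assumes only the standard columns of USER_MINING_MODELS.

```sql
-- List the models in the current schema that have no comment yet,
-- oldest first, as candidates for renaming and documenting.
SELECT model_name, mining_function, algorithm, creation_date
FROM   user_mining_models
WHERE  comments IS NULL
ORDER  BY creation_date;
```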

Monday, December 5, 2016

Evaluating Cluster Dispersion in Oracle Data Mining

When working with the Clustering algorithms, and particularly k-Means, in the Oracle Data Miner tool there is no way of seeing how compact or dispersed the data is within a cluster.

There are a number of measures typically used in various tools and algorithms, but with Oracle Data Miner we are not presented with any of this information.

But if we flip from using the Oracle Data Miner tool to using SQL we can get to see some more details of the clusters produced by the k-Means algorithm along with some additional and useful information.

As I said there are a number of different measures used to evaluate clusters. The one that Oracle uses is called Dispersion. Now there are a few different definitions of what this could be and I haven't been able to locate what is Oracle's own definition of it in any of the documentation.

We can use the Dispersion value as a measure of how compact or how spread out the data is within a cluster. The Dispersion value is a number greater than 0. The lower the value, the more compact the cluster is, i.e. the data points are close to the centroid of the cluster. The larger the value, the more dispersed or spread out the data points are.

The DBMS_DATA_MINING PL/SQL package comes with a function called GET_MODEL_DETAILS_KM. This function returns a collection of type DM_CLUSTERS, where each record has the following form.

(id                   NUMBER,
 cluster_id           VARCHAR2(4000),
 record_count         NUMBER,
 parent               NUMBER,
 tree_level           NUMBER,
 dispersion           NUMBER,
 split_predicate      DM_PREDICATES,
 child                DM_CHILDREN,
 centroid             DM_CENTROIDS,
 histogram            DM_HISTOGRAMS,
 rule                 DM_RULE)

We can now use the following query to get the Dispersion value for each of the clusters from an ODM cluster model.

SELECT cluster_id,
       record_count,
       parent,
       tree_level,
       dispersion
FROM  table(dbms_data_mining.get_model_details_km('CLUS_KM_3_2'));

NewImage
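
To pick out the clusters that are the most spread out, the same function call can be ordered by the Dispersion value. This is a small sketch, assuming the same model name as in the query above.

```sql
-- Show the clusters with the largest Dispersion (most spread out) first
SELECT cluster_id,
       record_count,
       tree_level,
       dispersion
FROM   table(dbms_data_mining.get_model_details_km('CLUS_KM_3_2'))
ORDER  BY dispersion DESC;
```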

Monday, November 14, 2016

Using the Identity column for Oracle Data Miner

If you are a user of the Oracle Data Miner tool (the workflow data mining tool that is part of SQL Developer), then you will have noticed that for many of the algorithms you can specify a Case Id attribute along with, say, the target attribute.

NewImage

The idea is that you have one attribute that is a unique identifier for each case record. This may or may not be the case in your data model and you may have a multiple attribute primary key or case record identifier.

But what is the Case Id field used for in Oracle Data Miner?

Based on the documentation this field does not need to have a value. But it is recommended that you do identify an attribute for the Case Id, as this will allow for reproducible results. What this means is that if we run our workflow today and again in a few days time, on the exact same data, we should get the same results. So the Case Id allows this to happen. But how? Well it looks like the attribute used or specified for the Case Id is used as part of the Hashing algorithm to partition the data into a train and test data set, for classification problems.

So if you don't have a single attribute case identifier in your data set, then you need to create one. There are a few options open to you to do this.

  • Create one: write some code that will generate a unique identifier for each of your case records based on some defined rule.
  • Use a sequence: and update the records to use this sequence.
  • Use ROWID: use the unique row identifier value. You can write some code to populate this value into an attribute. Or create a view on the table containing the case records and add a new attribute that will use the ROWID. But if you move the data, then the next time you use the view then you will be getting different ROWIDs and that in turn will mean we may have different case records going into our test and training data sets. So our workflows will generate different results. Not what we want.
  • Use ROWNUM: This is kind of like using the ROWID. Again we can have a view that will select ROWNUM for each record. Again we may have the same issues but if we have our data ordered in a way that ensures we get the records returned in the same order then this approach is OK to use.
  • Use Identity Column: In Oracle 12c we have a new feature called Identity Column. This kind of acts like a sequence, but we can define an attribute in a table to be an Identity Column, and as records are inserted into the table (in our scenario our case table) this column will automatically generate a unique number for each record. Again, if we need to repopulate the case table, you will need to drop and recreate the table to get the Identity Column to reset, otherwise the newly inserted records will start with the next number of the Identity Column.
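
For the ROWNUM option in the list above, a minimal sketch of such a view might look like the following. The table name CASE_SOURCE, the view name CASE_SOURCE_V and the ordering column CUST_ID are hypothetical and only used for illustration.

```sql
-- Generate a case identifier with ROWNUM over an ordered inline view,
-- so that the same record gets the same number each time the view is queried
-- (as long as the underlying data and the ordering column do not change).
-- CASE_SOURCE and CUST_ID are hypothetical names.
CREATE OR REPLACE VIEW case_source_v AS
SELECT ROWNUM AS case_id,
       t.*
FROM  (SELECT *
       FROM   case_source
       ORDER  BY cust_id) t;
```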

Here is an example of using the Identity Column in a case table.

CREATE TABLE case_table (
id_column	NUMBER GENERATED ALWAYS AS IDENTITY,
affinity_card 	NUMBER,
age		NUMBER,
cust_gender	VARCHAR2(5),
country_name	VARCHAR2(20)
...
);

You can now use this Identity Column as the Case Id in your Oracle Data Miner workflows.
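
To get a feel for why a stable Case Id gives reproducible train and test data sets, the following sketch hashes the identifier into buckets. This is only an illustration of the hashing idea mentioned earlier and is not how Oracle Data Miner implements its split. The 60/40 split is an assumption.

```sql
-- ORA_HASH(expr, 99) returns a bucket number between 0 and 99.
-- The same id_column value always lands in the same bucket,
-- so each record is assigned to the same data set on every run.
SELECT id_column,
       CASE
          WHEN ORA_HASH(id_column, 99) < 60 THEN 'TRAIN'
          ELSE 'TEST'
       END AS data_set
FROM   case_table;
```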

NewImage

Friday, July 29, 2016

Checking out the Oracle Reserved Words using V$RESERVED_WORDS

When working with SQL or PL/SQL we all know there are some words we cannot use in our code or to label various parts of it. These languages have a number of reserved words that form the language.

Sometimes it can be a challenge to know what is or isn't a reserved word. Yes, we can check the Oracle documentation for the SQL reserved words and the PL/SQL reserved words. There are other references and lists in the Oracle documentation listing the reserved and key words.

But we also have the concept of Key Words (as opposed to reserved words). In the SQL documentation these are not listed. In the PL/SQL documentation most are listed.

What is a Key Word in Oracle?

Oracle SQL keywords are not reserved. BUT Oracle uses them internally in specific ways. If you use these words as names for objects and object parts, then your SQL statements may be more difficult to read and may lead to unpredictable results.

But if we didn't have access to the documentation (or Google), how can we find out what the key words are? You can use the data dictionary view called V$RESERVED_WORDS.

NewImage

But this view isn't available to everyone. So if you want to get your hands on it you will need the SYS user. Alternatively, if you are a DBA, you could share this with all your developers.

When we query this view we get 2,175 entries (for 12.1.0.2 Oracle Database).

NewImage
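
The following is a sketch of how a DBA might share the view and how a developer could then check individual words. The account name DEV_USER is hypothetical, and the three words checked are just examples.

```sql
-- Run as a suitably privileged user (for example SYS).
-- DEV_USER is a hypothetical developer account.
GRANT SELECT ON sys.v_$reserved_words TO dev_user;

-- Check whether particular words are reserved words or just key words.
-- RESERVED = 'Y' means the word cannot be used as an identifier.
SELECT keyword, reserved, res_type, res_attr, res_semi, duplicate
FROM   v$reserved_words
WHERE  keyword IN ('SELECT', 'YEAR', 'NAME')
ORDER  BY keyword;
```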

Monday, July 11, 2016

Creating ggplot2 graphics using SQL

Did you read the title of this blog post! Read it again.

Yes, Yes, I know what you are saying, "SQL cannot produce graphics or charts and particularly not ggplot2 graphics".

You are correct to a certain extent. SQL is rubbish at creating graphics (and I'm being polite).

But with Oracle R Enterprise you can now produce graphics on your data using the embedded R execution feature of Oracle R Enterprise using SQL. In this blog post I will show you how.

1. Pre-requisites

You need to have installed Oracle R Enterprise on your Oracle Database Server. Plus you need to install the ggplot2 R package.

In your R session you will need to set up an ORE connection to your Oracle schema.

2. Write and Test your R code to produce the graphic

It is always a good idea to write and test your R code before you go near using it in a user defined function.

For our (first) example we are going to create a bar chart using the ggplot2 R package. This is a basic example and the aim is to illustrate the steps you need to go through to call and produce this graphic using SQL.

The following code uses the CLAIMS data set that is available with/for Oracle Advanced Analytics. The first step is to pull the data from the table in your Oracle schema to your R session. This is because ggplot2 cannot work with data referenced by an ore.frame object.

data.subset <- ore.pull(CLAIMS) 

Next we need to aggregate the data. Here we are counting the number of records for each Make of car.

aggdata2 <- aggregate(data.subset$POLICYNUMBER,
                      by = list(MAKE = data.subset$MAKE),
                      FUN = length)

Now load the ggplot2 R package and use it to build the bar chart.

library(ggplot2)

ggplot(data=aggdata2, aes(x=MAKE, y=x, fill=MAKE)) + 
       geom_bar(color="black", stat="identity") +
       xlab("Make of Car") + 
       ylab("Num of Accidents") + 
       ggtitle("Accidents by Make of Car")

The following is the graphic that our call to ggplot2 produces in R.

NewImage

At this point we have written and tested our R code and know that it works.

3. Create a user defined R function and store it in the Oracle Database

Our next step in the process is to create an in-database user defined R function. This is where we store R code in our Oracle Database and make it available as an R function. To create the user defined R function we can use some PL/SQL to define it, and then take our R code (see above) and place it in the function body.

BEGIN
   -- sys.rqScriptDrop('demo_ggpplot');
   sys.rqScriptCreate('demo_ggpplot', 
      'function(dat) {
         library(ggplot2)
         
         aggdata2 <- aggregate(dat$POLICYNUMBER,
                      by = list(MAKE = dat$MAKE),
                      FUN = length)

        g <-ggplot(data=aggdata2, aes(x=MAKE, y=x, fill=MAKE)) + geom_bar(color="black", stat="identity") +
                   xlab("Make of Car") + ylab("Num of Accidents") + ggtitle("Accidents by Make of Car")

        plot(g)
   }');
END;

We have to make a small addition to our R code. We need to include a call to the plot function so that the image can be returned as a BLOB object. If you do not do this then the SQL query in step 4 will return no rows.

4. Write the SQL to call it

To call our defined R function we will need to use one of the ORE SQL API functions. In the following example we are using the rqTableEval function. The first parameter for this function passes in the data to be processed. In our case this is the data from the CLAIMS table. The second parameter is set to null. The third parameter is set to the output format and in our case we want this to be PNG. The fourth parameter is the name of the user defined R function.

select *
from table(rqTableEval( cursor(select * from claims),
                        null,
                        'PNG',
                        'demo_ggpplot'));                        

5. How to view the results

The SQL query in Step 4 above will return one row and this row will contain a column with a BLOB data type.

NewImage

The easiest way to view the graphic that is produced is to use SQL Developer. It has an inbuilt feature that allows you to display BLOB objects. All you need to do is to double click on the BLOB cell (under the column labeled IMAGE). A window will open called 'View Value'. In this window click the 'View As Image' check box in the top right hand corner of the window. When you do, the R ggplot2 graphic will be displayed.

NewImage

Yes the image is not 100% the same as the image produced in our R session. I will have another blog post that deals with this at a later date.

But now you have written a SQL query that calls R code to produce an R graphic (using ggplot2) of our data.

6. Now you can enhance the graphics (without changing your SQL)

What if you get bored with the bar chart and you want to change it to a different type of graphic? All you need to do is to change the relevant code in the user defined R function.

For example, say we want to change the graphic to a polar plot. The following is the PL/SQL code that re-defines the user defined R script.

BEGIN
   sys.rqScriptDrop('demo_ggpplot');
   sys.rqScriptCreate('demo_ggpplot', 
      'function(dat) {
         library(ggplot2)
         
         aggdata2 <- aggregate(dat$POLICYNUMBER,
                      by = list(MAKE = dat$MAKE),
                      FUN = length)

         n <- nrow(aggdata2)
         degrees <- 360/n

        aggdata2$MAKE_ID <- 1:nrow(aggdata2)

        g<- ggplot(data=aggdata2, aes(x=MAKE, y=x, fill=MAKE)) + geom_bar(color="black", stat="identity") +
               xlab("Make of Car") + ylab("Num of Accidents") + ggtitle("Accidents by Make of Car") + coord_polar(theta="x") 
        plot(g)
   }');
END;

We can use the exact same SQL query we defined in Step 4 above to call the next graphic.

NewImage

All done.

Now that was easy! Right?

It kind of is easy once you have been shown. There are a few challenges when working with in-database user defined R functions and writing the SQL to call them. Most of the challenges are around the formatting of R code in the function and the syntax of the SQL statement to call it. With a bit of practice it does get easier.

7. Where/How can you use these graphics ?

Any application or program that can call and process a BLOB data type can display these images. For example, I've been able to include these graphics in applications developed in APEX.
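
For an application you will typically want just the BLOB and not the other columns. The following sketch assumes the IMAGE column name mentioned in step 5 and reuses the query from step 4.

```sql
-- Return only the BLOB that holds the PNG image,
-- ready to be consumed by an application such as an APEX page.
-- Assumption: the PNG output exposes the image in a column called IMAGE.
select image
from table(rqTableEval( cursor(select * from claims),
                        null,
                        'PNG',
                        'demo_ggpplot'));
```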

Monday, May 30, 2016

PREDICTION_DETAILS function in Oracle

When building predictive models the data scientist can spend a large amount of time examining the models produced and how they work and perform on their hold out sample data sets. They do this to understand is the model gives a good general representation of the data and can identify/predict many different scenarios. When the "best" model has been selected then this is typically deployed is some sort of reporting environment, where a list is produced. This is typical deployment method but is far from being ideal. A more ideal deployment method is that the predictive models are build into the everyday applications that the company uses. For example, it is build into the call centre application, so that the staff have live and real-time feedback and predictions as they are talking to the customer.

But what kind of live and real-time feedback and predictions are possible. Again if we look at what is traditionally done in these applications they will get a predicted outcome (will they be a good customer or a bad customer) or some indication of their value (maybe lifetime value, possible claim payout value) etc.

But can we get anymore information? Information like what was reason for the prediction. This is sometimes called prediction insight. Can we get some details of what the prediction model used to decide on the predicted value. In more predictive analytics products this is not possible, as all you are told is the final out come.

What would be useful is to know some of the thinking that the predictive model used to make its thinking. The reasons when one customer may be a "bad customer" might be different to that of another customer. Knowing this kind of information can be very useful to the staff who are dealing with the customers. For those who design the workflows etc can then build more advanced workflows to support the staff when dealing with the customers.

Oracle as a unique feature that allows us to see some of the details that the prediction model used to make the prediction. This functions (based on using the Oracle Advanced Analytics option and Oracle Data Mining to build your predictive model) is called PREDICTION_DETAILS.

When you go to use PREDICTION_DETAILS you need to be careful, as it works differently in the 11.2 and 12c versions of the Oracle Database (Enterprise Edition). In Oracle Database 11.2 the PREDICTION_DETAILS function only works for Decision Tree models. But in 12c (and above) it has been opened up to include details for models created using all the classification algorithms, all the regression algorithms and also for anomaly detection.

The following gives an example of using the PREDICTION_DETAILS function.

select cust_id, 
       prediction(clas_svm_1_27 using *) pred_value,
       prediction_probability(clas_svm_1_27 using *) pred_prob,
       prediction_details(clas_svm_1_27 using *) pred_details
from mining_data_apply_v;

The PREDICTION_DETAILS function produces its output in XML, and this consists of the attributes used and their values that determined why a record had the predicted value. The following gives some examples of the XML produced for some of the records.

NewImage
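The XML is easier to work with once it is broken out into rows and columns. The following is a minimal sketch of how this might be done with XMLTABLE. It assumes the 12c layout of the output, that is a Details root element containing one Attribute element per attribute, each with name, actualValue, weight and rank attributes, and it assumes the function returns an XMLType value. The column aliases are my own choice for illustration.

```sql
-- Sketch: turn each <Attribute> element of the PREDICTION_DETAILS output into a row.
-- Assumed XML shape: <Details ...><Attribute name=".." actualValue=".." weight=".." rank=".."/></Details>
SELECT s.cust_id,
       x.attribute_name,
       x.actual_value,
       x.weight,
       x.attribute_rank
FROM  (SELECT cust_id,
              prediction_details(clas_svm_1_27 USING *) AS pred_details
       FROM   mining_data_apply_v) s,
      XMLTABLE('/Details/Attribute'
               PASSING s.pred_details
               COLUMNS attribute_name VARCHAR2(30)   PATH '@name',
                       actual_value   VARCHAR2(4000) PATH '@actualValue',
                       weight         NUMBER         PATH '@weight',
                       attribute_rank NUMBER         PATH '@rank') x
ORDER  BY s.cust_id, x.attribute_rank;
```

If your version returns the details as a string rather than an XMLType, wrapping the call in XMLTYPE() before passing it to XMLTABLE should be enough.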

I've used this particular function in lots of my projects, particularly when building the applications for a particular business unit. Oracle too has built this functionality into many of their applications. The images below are from the HCM application, where you can examine the details of why an employee may or may not leave/churn. You can then perform real-time what-if analysis by changing some of the attribute values to see if the predicted outcome changes.

NewImage

Saturday, December 12, 2015

KScope 2016 Acceptances

I've never been to KScope. Yes never.

I've always wanted to. Each year you hear of all of these stories about how much people really enjoy KScope and how much they learn.

So back in October I decided to submit 5 presentations to KScope. 4 of these presentations are solo presentations and 1 joint presentation.

This week I have received the happy news that 2 of my solo presentations have been accepted, plus my joint presentation with Kim Berg Hansen.

So at the end of June 2016 I will be making my way to Chicago for a week of Oracle geekie fun at KScope.

My presentations will be:

  • Is Oracle SQL the best language for Statistics?
  • Running R in your Oracle Database using Oracle R Enterprise

and my joint presentation is called

Forecasting in Oracle using the Power of SQL (this will talk about ROracle, Forecasting in R, Using Oracle R Enterprise and SQL)

I was really hoping that one of my rejected presentations would have been accepted. I really enjoy this presentation and I get to share stories about some of my predictive analytics projects. Ah well, maybe in 2017.

The last time I was in Chicago was over 15 years ago when I spent 5 days at Cellular One (the brand was sold to Trilogy Partners by AT&T in 2008, shortly after AT&T had completed its acquisition of Dobson Communications). I was there to kick off a project to build them a data warehouse and to build their first customer churn predictive model. I stayed in a hotel across the road from their office, which was famous because a certain person had stayed in it while on the run. Unfortunately I didn't get time to visit downtown Chicago.

Monday, October 12, 2015

SQL and PL/SQL icons and stickers

Over the past couple of weeks I've been preparing my slides and presentations for Oracle Open World (2015).

One thing that occurred to me was that there was no icon or image to represent Oracle SQL and PL/SQL. I needed something that I could include in my presentations to represent these.

After a bit of Tweeting it turns out that there are no (official) icons or images for Oracle SQL and Oracle PL/SQL.

So I created some and here they are.

SQL icon sm PLSQL icon sm

and there are these

SQL 2 sm PLSQL 2 sm

Feel free to use these in your presentations and share around. All I ask is that you give me an odd acknowledgement from time to time.

Stickers

If you would like to get these as stickers and put them on your laptop, notebooks, or anywhere really, you can order them on Stickermule.

NewImage

NB: It is important to note that these are in no way approved or acknowledged or endorsed or anything else by Oracle.

Thursday, July 30, 2015

Check out What Sauron is saying about Oracle

Over the past year we have been (hopefully) hearing about Oracle Big Data SQL.

This is a new(-ish) option from Oracle that allows us to run our SQL queries not just on the data in our Oracle Database but also against NoSQL databases and Hadoop. No extra coding is needed, no extra formatting is needed, etc.

All the hard work of connecting to the data in these systems, translating our SQL into executable code on these systems, executing it, capturing the results and presenting them back to us, sitting in our schema in our Oracle Database, is done for us.

How cool is that.

To learn more about Oracle Big Data SQL check out their webpage.

But let us get back to the title of this blog post, 'What Sauron is saying about Oracle'. I used these back at one of my presentations at BIWA Summit in January 2015 and I've been meaning to post them since.

If you have read the books or watched the movies you will remember the phrase.

NewImage

We can apply this phrase to Oracle SQL now.

NewImage

or maybe my alternative version might be better.

NewImage

Thursday, April 30, 2015

Viewing Models Details for Decision Trees using SQL

When you are working with and developing Decision Trees by far the easiest way to visualise these is by using the Oracle Data Miner (ODMr) tool that is part of SQL Developer.
Developing your Decision Tree models using the ODMr allows you to explore the decision tree produced, to drill in on each of the nodes of the tree and to see all the statistics etc that relate to each node and branch of the tree.
But when you are working with the DBMS_DATA_MINING PL/SQL package and with the SQL commands for Oracle Data Mining you don't have the same luxury of the graphical tool that we have in ODMr. For example, here is an image of part of a Decision Tree that I developed using ODMr.
Blog dt 1
What if we are not using the ODMr tool? In that case you will be using SQL and PL/SQL. When using these you do not have the luxury of viewing the Decision Tree.
So what can you see of the Decision Tree? Most of the model details can be used by a variety of functions that can apply the model to your data. I've covered many of these over the years on this blog.
For most of the data mining algorithms there is a PL/SQL function available in the DBMS_DATA_MINING package that allows you to see inside the models to find out the settings, rules, etc. Most of these functions have a name something like GET_MODEL_DETAILS_XXXX, where XXXX is the name of the algorithm. For example, GET_MODEL_DETAILS_NB will get the details of a Naive Bayes model. But when you look through the list there doesn't seem to be one for Decision Trees.
Actually there is and it is called GET_MODEL_DETAILS_XML. This function takes one parameter, the name of the Decision Tree model and produces an XML formatted output that contains the attributes used by the model, the overall model settings, then for each node and branch the attributes and the values used and the other statistical measures required for each node/branch.
The following SQL uses this PL/SQL function to get the Decision Tree details for a model called CLAS_DT_1_59.
SELECT dbms_data_mining.get_model_details_xml('CLAS_DT_1_59')
FROM dual;

If you are using SQL Developer you will need to double click on the output column and click on the pencil icon to view the full listing.
Blog dt 2
Nothing too fancy like what we get in ODMr, but it is something that we can work with.
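If you are not using SQL Developer, or you want the XML laid out a little more readably, one option is to serialise it with indentation. This is only a sketch and it assumes the function returns an XMLType value, which is what allows it to be passed to XMLSERIALIZE. The column alias is my own.

```sql
-- Sketch: serialise the XML returned for the model as an indented CLOB
SELECT XMLSERIALIZE(DOCUMENT dbms_data_mining.get_model_details_xml('CLAS_DT_1_59')
                    AS CLOB INDENT SIZE = 2) AS model_xml
FROM   dual;
```

In SQL*Plus you will probably also need to increase the LONG setting (for example SET LONG 100000) to see the full output.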
If you examine the XML output you will see references to PMML. This refers to the Predictive Model Markup Language (PMML) and this is defined by the Data Mining Group (www.dmg.org). I will discuss the PMML in another blog post and how you can use it with Oracle Data Mining.

Friday, March 6, 2015

RIP SQL*Plus & hello SQL Command Line

Over the past couple of months Oracle has been releasing some EA (Early Adopter) versions of a new tool that is currently called SQL Command Line.

The team behind this new tool is the SQL Developer development team and they have been working on creating a new command line SQL tool that is based on some of the technology that is included in SQL Developer.

SQL Command Line is a stand-alone tool and all you need to do is to download and unzip the file.

What I want to show in this blog post is some of the new features that are available and that I have found particularly useful. But before we get onto those commands let us first have a look at how you can get set up and running with SQL Command Line.

Download & Setup

The current download of SQL Command Line can be found under the SQL Developer 4.1 EA Download page. I'm assuming when 4.1 is formally released the download for SQL Command line will be on the main SQL Developer Download web page.

SQL CL 1

After you have downloaded the file, all you need to do is to unzip the file and then copy the unzipped directory to where you want the software to be located on your client.

Now you are ready to get started with using SQL Command Line.

Connecting to your Oracle Schema

(That) Jeff Smith and Barry McGillin have a couple of good blog posts on the different connection methods and some setup or configuration you might need to consider. Check out these links for more details.

For me I did not have to do any additional setup or configuration. I was able to use the TNS Names and the EZConnect methods without any problems.

The following shows how to connect to my (DMUSER) schema using the EZConnect method. With this method we pass in the username, password, the host name, port number and the service name. Just like this

> sql dmuser/dmuser@localhost:1521/pdb12c

We can now have a look at the JDBC connection details.

SQL> show jdbc
-- Database Info --
Database Product Name: Oracle
Database Product Version: Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options
Database Major Version: 12
Database Minor Version: 1
-- Driver Info --
Driver Name: Oracle JDBC driver
Driver Version: 12.1.0.2.0
Driver Major Version: 12
Driver Minor Version: 1
Driver URL: jdbc:oracle:thin:@localhost:1521/pdb12c
SQL>


If we have a TNSNAMES.ORA file on our computer and the directory that it is in, is on the search PATH, then we can use the service names defined in the TNSNAMES.ORA file. The following example shows you how to use this in two ways. The first shows how to enter all the details when you are starting SQL CL and the other is when SQL CL prompts you for each parameter.

> sql dmuser/dmuser@pdb12c

and when we are prompted to enter the parameters, we get the following.

> sql
SQLcl: Release 4.1.0 Beta on Thu Mar 05 15:16:12 2015
Copyright (c) 1982, 2015, Oracle. All rights reserved.
SQLcl: Release 4.1.0 Beta on Thu Mar 05 15:16:14 2015
Copyright (c) 1982, 2015, Oracle. All rights reserved.
Username? (''?) dmuser
Password? (**********?) ******
Database? (''?) pdb12c
Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
SQL>


As you can see these work in the same way as when we use SQL*Plus.


Now that you are connected to your schema, what else can you do? The following sections cover some useful commands.

Commands & Help

The following list of commands is by no means a complete list of the commands available in SQL Command Line. Theoretically, everything you can currently do in SQL*Plus you can also do in SQL Command Line. But the commands I give examples of below are some of my favourites (so far).

You can get the list of commands by typing help at the SQL prompt.

SQL> help

Then to get help on a specific command you can just add the command after the help.

SQL> help cd
CD
---
Changes path to look for script at after startup.
(show SQLPATH shows the full search path currently:
- CD current directory setting set by last cd command
- baseURL (url for subscripts)
- topURL (top most url when starting script)
- Last Node opened (i.e. file in worksheet)
- Where last script started
- Last opened on sqlplus path related file chooser
- SQLPATH setting
- "." if in SQLDeveloper UI (included in SQLPATH in command line (sdsql))
).
SQL>

Some work is still needed on the help documentation and what is listed for each command, as the current version is missing some important details.

Alias

This is by far my favourite new feature. This allows us to take some of our most common SQL statements and to create shortcuts for them.

Very soon I will not be using Oracle SQL but I will be using My SQL, as I will have created my own personalised version of SQL.

To list what aliases you have defined in your schema you can type

SQL> alias

Oracle will have a few aliases already defined in SQL CL. By having a look at some of these you can see some of what they can do and get ideas for what you might want to do with them. To list the contents of an alias, you can use the following command.

alias list {alias name}

for example

SQL> alias list tables

This command lists the query that is used for the 'tables' alias that comes with SQL CL.

I use Oracle Data Miner a lot and when you use this tool it can create a number of tables with a variety of names in your schema. Most of these you will never need to look at. So what I do is create an alias that excludes these from the list of tables in my schema.

SQL> alias tables2=select table_name from user_tables where table_name not like 'ODMR$%' and table_name not like 'DM$%' and table_name not like 'SYS_IOT%';

So now, to list only my important data tables (and exclude all the Oracle Data Miner tables), all I need to do is run my alias 'tables2'.

SQL> tables2

You will quickly build up a suite of commands using aliases.

info and info+

info and info+ are the new commands to replace the DESC command.

The difference between info and info+ is that info+ gives you some statistical information about the table and the attributes in the table. This is illustrated in the following examples.

Example using 'info'

Sqlcl 2

Example using 'info+'

Sqlcl 3

CTAS & DDL

If you want to get the DDL script to create a copy of a table you have two options open to you. The first of these is the DDL command. This creates a DDL statement based on the meta data for the table, just like in the following

Sqlcl 4

An alternative to this is to use the CTAS command, which gives a slightly different output to the DDL command. With CTAS we also get the CREATE TABLE .. AS SELECT ...

History

In SQL*Plus we had a limited ability to scroll through our previous commands. The same kind of scrolling is available in SQL CL, but we can also see all our previous commands using the 'history' command. The following illustrates how you can list all your previous commands. I'm sure the list is limited to a certain number of commands (or will be), otherwise it will become a very, very long list.

SQL> history

To find out how often each command has been run you can run

SQL> history usage

and to find out how long the query took to run the last time it was run

SQL> history time


There is lots more that I could show, but this post is way, way too long as it is. What I suggest you do is go and download SQL CL (Command Line) and start using it today.

Wednesday, November 12, 2014

Approximate Count Distinct (12.1.0.2 new feature)

With the release of Oracle Database 12.1.0.2 there were a number of new features and options. Most of the publicity has been around the In-Memory option. But there were lots of other features for the DBA and a few for the developer.

One of the new SQL functions is APPROX_COUNT_DISTINCT(). This function is different to the traditional count distinct, COUNT(DISTINCT expression), in that it performs an approximate count distinct. The theory is that this approximate count is a lot more efficient than performing the full count distinct.

The APPROX_COUNT_DISTINCT() function is really only suitable when you are processing very large volumes of data and when the data set contains a large number of distinct values.

The general syntax of the function is:

... APPROX_COUNT_DISTINCT(expression) ...

and returns a Number.

The function returns the approximate number of distinct values of the expression across the records.

SELECT approx_count_distinct(cust_id)
FROM mining_data_build_v;

The APPROX_COUNT_DISTINCT() function ignores records that contain a null value for the expression. Plus it performs less work on the sorting and aggregations. Just run an Explain Plan and you can see the differences.

According to some of the material from Oracle, the APPROX_COUNT_DISTINCT() function can be 5x to 50x (or more) faster. But it depends on the number of distinct values and the complexity of the SQL query.

The value returned by the function may not be 100% accurate. Oracle says that the function has an accuracy of >97% (with 95% confidence).
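To get a feel for the accuracy on your own data, you can run the exact and the approximate counts side by side. The following is a small sketch of this. The column aliases and the percentage calculation are my own additions for illustration.

```sql
-- Sketch: compare the exact and approximate distinct counts on the same column
SELECT exact_cnt,
       approx_cnt,
       ROUND(100 * ABS(exact_cnt - approx_cnt) / NULLIF(exact_cnt, 0), 2) AS pct_difference
FROM  (SELECT COUNT(DISTINCT cust_id)        AS exact_cnt,
              APPROX_COUNT_DISTINCT(cust_id) AS approx_cnt
       FROM   mining_data_build_v);
```

On a small data set like this sample view you should not expect to see any performance benefit. The comparison is only useful for checking how close the approximation is.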

The function cannot be used on the following data types: BFILE, BLOB, CLOB, LONG, LONG RAW and NCLOB

Wednesday, October 29, 2014

Something new in 12c: FETCH FIRST x ROWS

In this post I want to show some examples of using a new feature in 12c for selecting the first X number of records from the results set of a query.

See the bottom of this post for the background and some of the reasons for this post.

Before we had the 12c Database if we only wanted to see a subset or the initial set of records from the results of a query we could add something like the following to our query

...

AND ROWNUM <= 5;

We could use the pseudo column ROWNUM to restrict the number of records that would be displayed. This was particularly useful when the results contained many 10s, 100s, or millions of records. It allowed us to quickly see a subset and to see if the results were what we expected.
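One thing to watch out for with ROWNUM is that it is assigned to the records before any ORDER BY is applied. So if you want the "top 5" records by some ordering, the sort has to be done in an inline view first. The following sketch illustrates this, assuming the AGE column exists in the view.

```sql
-- ROWNUM is assigned before ORDER BY is applied, so sort in an inline view first
SELECT cust_id, age
FROM  (SELECT cust_id, age
       FROM   mining_data_build_v
       ORDER  BY age DESC)
WHERE  ROWNUM <= 5;
```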

In my book (Predictive Analytics Using Oracle Data Miner) I had lots of examples of using ROWNUM.

What I wasn't aware of when I was writing my book was that there was a new way of doing this in 12c. We now have something like the following:

...

FETCH FIRST x ROWS ONLY;

Here is an example:

SELECT * FROM mining_data_build_v
FETCH FIRST 10 ROWS ONLY;

Fetch first 1

There are a number of different ways you can use the row limiting feature. Here is the syntax for it:

[ OFFSET offset { ROW | ROWS } ]
[ FETCH { FIRST | NEXT } [ { rowcount | percent PERCENT } ]
{ ROW | ROWS } { ONLY | WITH TIES } ]

In most cases you will probably use the number of rows. But there may be cases where you might want to use PERCENT. In previous versions of the database you would have used SAMPLE to bring back a certain percentage of records.

select CUST_GENDER from mining_data_build_v
FETCH FIRST 2 PERCENT ROWS ONLY;

This will return the first 2 percent of the records.
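The syntax also allows WITH TIES instead of ONLY. This needs an ORDER BY clause, and it will return any additional records that have the same sort value as the last record fetched. Here is a small sketch, again assuming the AGE column exists in the view.

```sql
-- Returns the 5 oldest customers, plus any others with the same age as the 5th
SELECT cust_id, age
FROM   mining_data_build_v
ORDER  BY age DESC
FETCH FIRST 5 ROWS WITH TIES;
```

Without the ORDER BY, WITH TIES does not return any extra records, so it behaves just like ONLY.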

You can also decide from what point in the result set you want the records to be displayed. In the previous examples the results displayed begin with the first record. In the following example the first 60 records of the results set are skipped and then the next 5 records are selected and displayed. These will be records 61, 62, 63, 64 and 65. So the first record returned will be the OFFSET record + 1.

select CUST_GENDER from mining_data_build_v
OFFSET 60 ROWS FETCH FIRST 5 ROWS ONLY;

You can also combine OFFSET with PERCENT, for example.

select CUST_GENDER from mining_data_build_v
OFFSET 60 ROWS FETCH FIRST 2 PERCENT ROWS ONLY;

This query will go to record 61 and return the next 2 percent of the records.
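A common use of OFFSET together with FETCH is paging through a results set. The following sketch shows what the third page might look like with 10 records per page. The page size and the choice of CUST_ID for the ordering are just my assumptions for the example. The ORDER BY is what makes the pages repeatable from one run to the next.

```sql
-- Page 3 of the results set, with 10 records per page (records 21 to 30)
SELECT cust_id, cust_gender
FROM   mining_data_build_v
ORDER  BY cust_id
OFFSET 20 ROWS FETCH NEXT 10 ROWS ONLY;
```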


The background to this post

There are a number of reasons that I really love attending Oracle User Group conferences. One of the challenges I set myself is to go to presentations on topics that I think I know or know very well. I can list many, many reasons for this but there are 2 main points. The first is that you are getting someone else's perspective on the topic and hence you might understand it better. The second is that you might actually learn something new, like some new command, parameter setting or something else like that.

At Oracle Open World recently I attended the EMEA 12 things about 12c set of presentations that Debra Lilley arranged during the User Group Forum on the Sunday. During these sessions Alex Nuijten gave an overview of some 12c new SQL features. One of these was the command FETCH FIRST x ROWS. This blog post illustrates some of the different ways of using this command.

Wednesday, July 30, 2014

Tokenizing a String : Using Regular Expressions

In my previous blog post I gave some PL/SQL that performed the tokenising of a string. Check out this blog post here.

Thanks also to the people who sent me links to examples of how to tokenise a string using the MODEL clause. Yes, there are lots of examples of this out there on the internet.

While performing the various searches on the internet I did come across some examples of using Regular Expressions to extract the tokens. The following example is thanks to a blog post by Tanel Poder.

I've made some minor changes to it to remove any of the special characters we want to remove.

column token format a40
define separator=" "
define mystring="$My OTN LA Tour (2014?) will consist of Panama, CostRica and Mexico."
define myremove="\?|\#|\$|\.|\,|\;|\:|\&|\(|\)|\-";

SELECT regexp_replace(REGEXP_REPLACE(
                 REGEXP_SUBSTR( '&mystring'||'&separator', '(.*?)&separator', 1, LEVEL )
                           , '&separator$', ''), '&myremove', '') TOKEN
FROM
    DUAL
CONNECT BY
    REGEXP_INSTR( '&mystring'||'&separator', '(.*?)&separator', 1, LEVEL ) > 0
ORDER BY
    LEVEL ASC
/

When we run this code we get the following output.

Token fun 3
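If you don't want to rely on SQL*Plus substitution variables, a similar result can be had with plain SQL. The following is my own simplified sketch and not the code from the blog post mentioned above. It splits on a single space and uses the [[:punct:]] character class to strip the punctuation, which removes more characters than the list used above.

```sql
-- Sketch: split the string on spaces and strip punctuation from each token
SELECT LEVEL AS token_no,
       REGEXP_REPLACE(REGEXP_SUBSTR(str, '[^ ]+', 1, LEVEL), '[[:punct:]]', '') AS token
FROM  (SELECT '$My OTN LA Tour (2014?) will consist of Panama, CostRica and Mexico.' AS str
       FROM   dual)
CONNECT BY LEVEL <= REGEXP_COUNT(str, '[^ ]+')
ORDER  BY token_no;
```

This form only works as written when the inline view returns a single record. For a table with many records the CONNECT BY would need extra conditions to keep the records apart.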

So we have a number of options open to us to tokenise strings using SQL and PL/SQL, using a number of approaches including substring-ing, using pipelined functions, using the Model clause and also using Regular Expressions.