Showing posts with label Oracle. Show all posts
Showing posts with label Oracle. Show all posts

Monday, March 12, 2018

Oracle Code Presentation March 2018

Last week I was presenting at Oracle Code in New York. I've presented at a few Oracle Code events over the past 12 months and it is always interesting to meet and talk with developers from around the World.

The title of my presentation this time was 'SQL: The one language to rule all your data'.

I've given this presentation a few times at different events (POUG, OOW, Oracle Code). I take the contents of this presentation for granted and that most people know these things. But the opposite is true. Well a lot of people do know these things, but a magnitude more do not seem to know.

For example, at last weeks Oracle Code event, I had about 100 people in the room. I started out by asking the attendees 'How many of you write SQL every day?'. About 90% put up their hand. Then a few minutes later after I start talking about various statistical functions in the database, I then ask them to 'Count how many statistical functions they have used?' I then asked them to raise their hands if they use over five statistical functions. About eight people put up their hands. Then I asked how many people use over ten functions. To my surprise only one (yes one) person put up their hand.

The feedback from the attendees was fantastic and they were very eager to go back to their day jobs and start implementing better SQL code and to learn more about the database. All they have to do is to send me 15% of their pay rises (a bit of a joke during the presentation. You had to be there ...)

The first half of the presentation talks about statistical, analytical and machine learning in the database.

The second half covers some (not all) of the various data types and locations of data that can be accessed from the database.

The presentation then concludes with the title of the presentation about SQL being the one language to rule all your data.

Based on last weeks experience, it looks like a lot more people need to hear it !

Hopefully I'll get the chance to share this presentation with other events and Oracle User Group conferences.

Two of the key take away messages are:

  • Google makes us stupid
  • We need to RTFM more often

Here is a link to the slides on SlideShare

And I recorded a short video about the presentation with Bob from OTN/ODC.

Monday, March 5, 2018

Python and Oracle : Fetching records and setting buffer size

If you used other languages, including Oracle PL/SQL, more than likely you will have experienced having to play buffering the number of records that are returned from a cursor. Typically this is needed when you are processing more than a few hundred records. The default buffering size is relatively small and by increasing the size of the number of records to be buffered can dramatically improve the performance of your code.

As with all things in coding and IT, the phrase "It Depends" applies here and changing the buffering size may not be what you need and my not help you to gain optimal performance for your code.

There are lots and lots of examples of how to test this in PL/SQL and other languages, but what I'm going to show you here in this blog post is to change the buffering size when using Python to process data in an Oracle Database using the Oracle Python library cx_Oracle.

Let us begin with taking the defaults and seeing what happens. In this first scenario the default buffering is used. Here we execute a query and the process the records in a FOR loop (yes these is a row-by-row, slow-by-slow approach.

import time

i = 0
# define a cursor to use with the connection
cur2 = con.cursor()
# execute a query returning the results to the cursor
print("Starting cursor at", time.ctime())
cur2.execute('select * from sh.customers')
print("Finished cursor at", time.ctime())

# for each row returned to the cursor, print the record
print("Starting for loop", time.ctime())
t0 = time.time()
for row in cur2:
    i = i+1
    if (i%10000) == 0:
        print(i,"records processed", time.ctime())

t1 = time.time()
print("Finished for loop at", time.ctime())
print("Number of records counted = ", i)

ttime = t1 - t0
print("in ", ttime, "seconds.")

This gives us the following output.

Starting cursor at  10:11:43
Finished cursor at  10:11:43
Starting for loop  10:11:43
10000 records processed  10:11:49
20000 records processed  10:11:54
30000 records processed  10:11:59
40000 records processed  10:12:05
50000 records processed  10:12:09
Finished for loop at  10:12:11 
Number of records counted =  55500
in  28.398550033569336 seconds.

Processing the data this way takes approx. 28 seconds and this corresponds to the buffering of approx 50-75 records at a time. This involves many, many, many round trips to the the database to retrieve this data. This default processing might be fine when our query is only retrieving a small number of records, but as our data set or results set from the query increases so does the time it takes to process the query.

But we have a simple way of reducing the time taken, as the number of records in our results set increases. We can do this by increasing the number of records that are buffered. This can be done by changing the size of the 'arrysize' for the cursor definition. This reduces the number of "roundtrips" made to the database, often reducing networks load and reducing the number of context switches on the database server.

The following gives an example of same code with one additional line.

cur2.arraysize = 500

Here is the full code example.

# Test : Change the arraysize and see what impact that has
import time

i = 0
# define a cursor to use with the connection
cur2 = con.cursor()
cur2.arraysize = 500
# execute a query returning the results to the cursor
print("Starting cursor at", time.ctime())
cur2.execute('select * from sh.customers')
print("Finished cursor at", time.ctime())

# for each row returned to the cursor, print the record
print("Starting for loop", time.ctime())
t0 = time.time()
for row in cur2:
    i = i+1
    if (i%10000) == 0:
        print(i,"records processed", time.ctime())

t1 = time.time()
print("Finished for loop at", time.ctime())
print("Number of records counted = ", i)

ttime = t1 - t0
print("in ", ttime, "seconds.")

Now the response time to process all the records is.

Starting cursor at 10:13:02 Finished cursor at 10:13:02 Starting for loop 10:13:02 10000 records processed 10:13:04 20000 records processed 10:13:06 30000 records processed 10:13:08 40000 records processed 10:13:10 50000 records processed 10:13:12 Finished for loop at 10:13:13 Number of records counted = 55500 in 11.780734777450562 seconds.

All done in just under 12 seconds, compared to 28 seconds previously.

Here is another alternative way of processing the data and retrieves the entire results set, using the 'fetchall' command, and stores it located in 'res'.

# Test : Change the arraysize and see what impact that has
import time

i = 0
# define a cursor to use with the connection
cur2 = con.cursor()
cur2.arraysize = 500
# execute a query returning the results to the cursor
print("Starting cursor at", time.ctime())
cur2.execute('select * from sh.customers')

t0 = time.time()
print("starting FetchAll at", time.ctime())
res = cur2.fetchall()
t1 = time.time()
print("finished FetchAll at", time.ctime())

ttime = t1 - t0
print("in ", ttime, "seconds.")

Friday, March 2, 2018

Oracle 18c DBaaS Cloud Setup

The 18c Oracle DBaaS is now available. This is the only place that Oracle 18c will be available until later in 2018. So if you want to try it out, then you are going to need to get some Oracle Cloud credits, or you may already have a paying account for Oracle Cloud.


The following outlines the steps you need to go through to gets Oracle 18c setup.

1. Log into your Oracle Cloud

Log into your Oracle Cloud environment. Depending on your access path you will get to your dashboard.

Select Create Instance from the dashboard.


2. Create a new Database

From the list of services to create, select Database.


3. Click 'Create Instance'


4. Enter the Database Instance details

Enter the details for your new Oracle 18c Database. I've called mine 'db18c'.

Then for the Software Release dropdown list, select 'Oracle Database 18c'.

Next select the Software Edition from the dropdown list.


5. Fill in the Instance Details

Fill in the details for 'DB Name', 'PDB Name', 'Administration Password', 'Confirm Password', setup the SSH Public Key, and then decide if you need the Backup and Recovery option.


6. Create the DBaaS

Double check everything and when ready click on the 'Create' button.


7. Wait for Everything to be Create

Now is the time to be patient and wait while your cloud service is created.


I've created two different version of the 18c Oracle DBaaS. The Enterprise Edition to 30 minutes to complete and the High Performance service too 47 minutes.

No it's time to go play.

Sunday, February 18, 2018

18c is now available (but only on the Cloud)

On Friday afternoon (16th February) we started to see tweets and blog posts from people in Oracle saying that Oracle 18c was now available. But is only available on Oracle Cloud and Engineered Systems.

It looks like we will have to wait until the Autumn before we can install it ourselves on our own servers :-(

Here is the link to the official announcement for Oracle 18c.

Oracle 18c is really Oracle The next full new release of the Oracle database is expected to be Oracle 19.

The new features and incremental enhancements in Oracle 18c are:

  • Multitenant
  • In-Memory
  • Sharding
  • Memory Optimized Fetches
  • Exadata RAC Optimizations
  • High Availability
  • Security
  • Online Partition Merge
  • Improved Machine Learning (OAA)
  • Polymorphic Table Functions
  • Spatial and Graph
  • More JSON improvements
  • Private Temporary Tablespaces
  • New mode for Connection Manager

And now the all important links to the documentation.

Oracle 18c Documentation

Oracle 18c New Features

Oracle 18c Data Warehousing

To give Oracle 18c a try you will need to go to and select Database from the drop down list from the Platform menu. Yes you are going to need an Oracle Cloud account and some money or some free credit. Go and get some free cloud credits at the upcoming Oracle Code events.

If you want a 'free' way of trying out Oracle 18c, you can use Oracle Live SQL. They have setup some examples of the new features for you to try.


NOTE: Oracle 18c is not Autonomous. Check out Tim Hall's blog posts about this. The Autonomous Oracle Database is something different, and we will be hearing more about this going forward.

Tuesday, February 13, 2018

Oracle and Python setup with cx_Oracle

Is Python the new R?

Maybe, maybe not, but that I'm finding in recent months is more companies are asking me to use Python instead of R for some of my work.

In this blog post I will walk through the steps of setting up the Oracle driver for Python, called cx_Oracle. The documentation for this drive is good and detailed with plenty of examples available on GitHub. Hopefully there isn't anything new in this post, but it is my experiences and what I did.

1. Install Oracle Client
The Python driver requires Oracle Client software to be installed. Go here, download and install. It's a straightforward install. Make sure the directories are added to the search path.

2. Download and install cx_Oracle
You can use pip3 to do this.
pip3 install cx_Oracle

Collecting cx_Oracle
  Downloading cx_Oracle-6.1.tar.gz (232kB)
    100% |████████████████████████████████| 235kB 679kB/s
Building wheels for collected packages: cx-Oracle
  Running bdist_wheel for cx-Oracle ... done
  Stored in directory: /Users/brendan.tierney/Library/Caches/pip/wheels/0d/c4/b5/5a4d976432f3b045c3f019cbf6b5ba202b1cc4a36406c6c453
Successfully built cx-Oracle
Installing collected packages: cx-Oracle
Successfully installed cx-Oracle-6.1  

3. Create a connection in Python
Now we can create a connection. When you see some text enclosed in angled brackets <>, you will need to enter your detailed for your schema and database server.
# import the Oracle Python library
import cx_Oracle

# define the login details
p_username = ""
p_password = ""
p_host = ""
p_service = ""
p_port = "1521"

# create the connection
con = cx_Oracle.connect(user=p_username, password=p_password, dsn=p_host+"/"+p_service+":"+p_port)

# an alternative way to create the connection
# con = cx_Oracle.connect('/@/:1521')

# print some details about the connection and the library
print("Database version:", con.version)
print("Oracle Python version:", cx_Oracle.version)

Database version:
Oracle Python version: 6.1

4. Query some data and return results to Python
In this example the query returns the list of tables in the schema.
# define a cursor to use with the connection
cur = con.cursor()
# execute a query returning the results to the cursor
cur.execute('select table_name from user_tables')
# for each row returned to the cursor, print the record
for row in cur:
    print("Table: ", row)

Table:  ('ODMR_CARS_DATA',)

Now list the Views available in the schema.
# define a second cursor
cur2 = con.cursor()
# return the list of Views in the schema to the cursor
cur2.execute('select view_name from user_views')
# display the list of Views
for result_name in cur2:
    print("View: ", result_name)


5. Query some data and return to a Panda in Python
Pandas are commonly used for storing, structuring and processing data in Python, using a data frame format. The following returns the results from a query and stores the results in a panda.
# in this example the results of a query are loaded into a Panda
# load the pandas library
import pandas as pd

# execute the query and return results into the panda called df
df = pd.read_sql_query("SELECT * from INSUR_CUST_LTV_SAMPLE", con)

# print the records returned by query and stored in panda

0     CU13388     LEIF   ARNOLD    MI    Midwest   M        PROF-2   
1     CU13386     ALVA   VERNON    OK    Midwest   M       PROF-18   
2      CU6607   HECTOR  SUMMERS    MI    Midwest   M  Veterinarian   
3      CU7331  PATRICK  GARRETT    CA       West   M       PROF-46   
4      CU2624  CAITLYN     LOVE    NY  NorthEast   F      Clerical   

0            No   70             0   ...                          0   
1            No   24             0   ...                          9   
2            No   30             1   ...                          2   
3            No   43             0   ...                          4   
4            No   27             1   ...                          4   

0                0            3            0               0   
1             3000            4            1               1   
2              980            4            1               3   
3                0            2            0               1   
4             5000            4            1               2   

0               2500              1                 0  17621.00   MEDIUM  
1               2500              1               450  22183.00     HIGH  
2                500              1               250  18805.25   MEDIUM  
3                800              1                 0  22574.75     HIGH  
4               3000              2              1500  17217.25   MEDIUM  

[5 rows x 31 columns]

6. Wrapping it up and closing things
Finally we need to wrap thing up and close our cursors and our connection to the database.
# close the cursors

# close the connection to the database

Useful links
cx_Oracle website
cx_Oracle documentation
cx_Oracle examples on GitHub

Watch out for more blog posts on using Python with Oracle, Oracle Data Mining and Oracle R Enterprise.

Monday, December 11, 2017

Oracle Code Online December 2017

This week Oracle Code will be having an online event consisting of 5 tracks and with 3 presentations on each track. This online Oracle Code event will be given in 3 different geographic regions on 12th, 13th and 14th December. NewImage I've been selected to give one of these talks, and I've given this talk at some live Oracle Code events and at JavaOne back in October. The present is pre-recorded and I recorded this video back in September. I hope to be online at the end of some of these presentations to answer any questions, but unfortunately due to changes with my work commitments I may not be able to be online for all of them. The moderator for these events will take your questions (or you can send them to me here) and I will write a blog post answering all your questions.

Tuesday, September 19, 2017

My Oracle Open World 2017 Presentations

Oracle Open World 2017 will be happening very soon (1st-5th October). Still lots to do before I can get on that plane to San Francisco.

This year I'll be giving 2 presentations (see table below). One on the Sunday during the User Groups Sunday sessions. I've been accepted on the EMEA track. I then get a few days off to enjoy and experience OOW until Thursday when I have my second presentation that is part of JavaOne (I think!)

My OOW kicks off on Friday 29th September with the ACE Director briefing at Oracle HQ, after flying to SFO on Thursday 28th. This year it is only for one day instead of two days. I really enjoy this event as we get to learn and see what Oracle will be announcing at OOW as well as some things that will be coming out during the following few months.

Sunday13:45-14:30SQL: One Language to Rule All Your Data [OOW SUN1238]

SQL is a very powerful language that has been in use for almost 40 years. SQL comes with many powerful techniques for analyzing your data, and you can analyze data outside the database using SQL as well. Using the new Oracle Big Data SQL it is now possible to analyze data that is stored in a database, in Hadoop, and in NoSQL all at the same time. This session explores the capabilities in Oracle Database that allow you to work with all your data. Discover how SQL really is the unified language for processing all your data, allowing you to analyze, process, run machine learning, and protect all your data.

Hopefully this presentation will be a bit of Fun! For those who have been working with the database for a long time, we can sometimes forget what we can really do. For those starting out in the career may not realise what the database can do. The presentation delivers an important message while having a laugh or two (probably at me).

Marriott Marquis (Golden Gate Level) - Golden Gate C1/C2
Thursday13:45-14:30Is SQL the Best Language for Statistics and Machine Learning?

[OOW and JavaOne CON7350]

Did you know that Oracle Database comes with more than 300 statistical functions? And most of these statistical functions are available in all versions of Oracle Database? Most people do not seem to know this. When we hear about people performing statistical analytics, we hear them talking about Excel and R, but what if we could do statistical analysis in the database without having to extract any data onto client machines? This presentation explores the various statistical areas available in Oracle Database and gives several demonstrations. We can also greatly expand our statistical capabilities by using Oracle R Enterprise with the embedded capabilities in SQL.

This presentation is just one of the 14 presentations that are scheduled for the Thursday! I believe this session is already fully booked, but you can still add yourself to the wait list.

Marriott Marquis (Golden Gate Level) - Golden Gate B

My flights and hotel have been paid by OTN as part of the Oracle ACE Director program. Yes this costs a lot of money and there is no way I'd be able to pay these costs. Thank you.

My diary for OOW is really full. No it is completely over booked. It is just mental. Between attending conference session, meeting with various product teams (we only get to meet at OOW), attending various community meet-ups, this year I get to attend some events for OUG leaders (representing UKOUG), spending some time on the EMEA User Group booth, various meetings with people to discuss how they can help or contribute to the UKOUG, then there is Oak Table World, trying to check out the exhibition hall, spend some time at the OTN/ODC hangout area, getting a few OTN t-shirts, doing some book promotions at the Oracle Press shop, etc., etc., etc. I'm exhausted just thinking about it. Mosts days start at 7am and then finish around 10pm.

I'll need a holiday when I get home! but it will be straight back to work :-(

If you are at OOW and want to chat then contact me via DM on Twitter or WhatsApp (these two are best) or via email (this will be the slowest way).

I'll have another blog post listing the presentations from various people and partners from the Republic of Ireland who are speaking at OOW.

Tuesday, July 5, 2016

Cluster Distance using SQL with Oracle Data Mining - Part 4

This is the fourth and last blog post in a series that looks at how you can examine the details of predicted clusters using Oracle Data Mining. In the previous blog posts I looked at how to use CLUSER_ID, CLUSTER_PROBABILITY and CLUSTER_SET.

In this blog post we will look at CLUSTER_DISTANCE. We can use the function to determine how close a record is to the centroid of the cluster. Perhaps we can use this to determine what customers etc we might want to focus on most. The customers who are closest to the centroid are one we want to focus on first. So we can use it as a way to prioritise our workflows, particularly when it is used in combination with the value for CLUSTER_PROBABILITY.

Here is an example of using CLUSTER_DISTANCE to list all the records that belong to Cluster 14 and the results are ordered based on closeness to the centroid of this cluster.

SELECT customer_id, 
       cluster_probability(clus_km_1_37 USING *) as cluster_Prob,
       cluster_distance(clus_km_1_37 USING *) as cluster_Distance
FROM   insur_cust_ltv_sample
WHERE   cluster_id(clus_km_1_37 USING *) = 14
order by cluster_Distance asc;

Here is a subset of the results from this query.


When you examine the results you may notice that the records that is listed first and closest record to the centre of cluster 14 has a very low probability. You need to remember that we are working in a N-dimensional space here. Although this first record is closest to the centre of cluster 14 it has a really low probability and if we examine this record in more detail we will find that it is at an overlapping point between a number of clusters.

This is why we need to use the CLUSTER_DISTANCE and CLUSTER_PROBABILITY functions together in our workflows and applications to determine how we need to process records like these.

Thursday, June 23, 2016

Cluster Sets using SQL with Oracle Data Mining - Part 3

This is the third blog post on my series on examining the Clusters that were predicted by an Oracle Data Mining model. Check out the previous blog posts.

In the previous posts we were able to list the predicted cluster for each record in our data set. This is the cluster that the records belonged to the most. I also mentioned that a record could belong to many clusters.

So how can you list all the clusters that the a record belongs to?

You can use the CLUSTER_SET SQL function. This will list the Cluster Id and a probability measure for each cluster. This function returns a array consisting of the set of all clusters that the record belongs to.

The following example illustrates how to use the CLUSTER_SET function for a particular cluster model.

SELECT t.customer_id, s.cluster_id, s.probability
FROM   (select customer_id, cluster_set(clus_km_1_37 USING *) as Cluster_Set
        from   insur_cust_ltv_sample 
        WHERE  customer_id in ('CU13386', 'CU100')) T,
      TABLE(T.cluster_set) S
order by t.customer_id, s.probability desc; 

The output from this query will be an ordered data set based on the customer id and then the clusters listed in descending order of probability. The cluster with the highest probability is what would be returned by the CLUSTER_ID function. The output from the above query is shown below.


If you would like to see the details of each of the clusters and to examine the differences between these clusters then you will need to use the CLUSTER_DETAILS function (see previous blog post).

You can specify topN and cutoff to limit the number of clusters returned by the function. By default, both topN and cutoff are null and all clusters are returned.

- topN is the N most probable clusters. If multiple clusters share the Nth probability, then the function chooses one of them.

- cutoff is a probability threshold. Only clusters with probability greater than or equal to cutoff are returned. To filter by cutoff only, specify NULL for topN.

You may want to use these individually or combined together if you have a large number of customers. To return up to the N most probable clusters that are greater than or equal to cutoff, specify both topN and cutoff.

The following example illustrates using the topN value to return the top 4 clusters.

SELECT t.customer_id, s.cluster_id, s.probability
FROM   (select customer_id, cluster_set(clus_km_1_37, 4, null USING *) as Cluster_Set
        from   insur_cust_ltv_sample 
        WHERE  customer_id in ('CU13386', 'CU100')) T,
      TABLE(T.cluster_set) S
order by t.customer_id, s.probability desc;

and the output from this query shows only 4 clusters displayed for each record.


Alternatively you can select the clusters based on a cut off value for the probability. In the following example this is set to 0.05.

SELECT t.customer_id, s.cluster_id, s.probability
FROM   (select customer_id, cluster_set(clus_km_1_37, NULL, 0.05 USING *) as Cluster_Set
        from   insur_cust_ltv_sample 
        WHERE  customer_id in ('CU13386', 'CU100')) T,
      TABLE(T.cluster_set) S
order by t.customer_id, s.probability desc;

and the output this time looks a bit different.


Finally, yes you can combine these two parameters to work together.

SELECT t.customer_id, s.cluster_id, s.probability FROM (select customer_id, cluster_set(clus_km_1_37, 2, 0.05 USING *) as Cluster_Set from insur_cust_ltv_sample WHERE customer_id in ('CU13386', 'CU100')) T, TABLE(T.cluster_set) S order by t.customer_id, s.probability desc;

Thursday, June 16, 2016

Cluster Details with Oracle Data Mining - Part 2

This is the second blog post of my series on examining the clusters that are predicted for by an Oracle Data Mining model for your data. In my previous blog post I should you how to use CLUSTER_ID and CLUSTER_PROBABILITY functions. These are the core of what you will be used when working with clusters and automating the process.

In this blog post I will look at what details are used by the clustering model to make the prediction. The function that you can use is called CLUSTER_DETAILS. I had an earlier blog post on using PREDICTION_DETAILS to see some of the details that are produced when performing classification.

CLUSTER_DETAILS returns the cluster details for each row in the selection. The return value is an XML string that describes the attributes of the highest probability cluster.

Here is an example of using the CLUSTER_DETAILS function in a SELECT statement.

select cluster_details(clus_km_1_37, 14 USING *) as Cluster_Details
from   insur_cust_ltv_sample 
where  customer_id = 'CU13386';

The output is an XML string and the easiest way to view this is in SQL Developer. It will list the top 5 highest weighted attributes for the cluster centroid.

NewImage NewImage

The returned attributes are ordered by weight. The weight of an attribute expresses its positive or negative impact on cluster assignment. A positive weight indicates an increased likelihood of assignment. A negative weight indicates a decreased likelihood of assignment. By default, CLUSTER_DETAILS returns the attributes with the highest positive weights in defending order.

Tuesday, June 7, 2016

Examining predicted Clusters and Cluster details using SQL

In a previous blog post I gave some details of how you can examine some of the details behind a prediction made using a classification model. This seemed to spark a lot of interest. But before I come back to looking at classification prediction details and other information, this blog post is the first in a 4 part blog post on examining the details of Clusters, as identified by a cluster model created using Oracle Data Mining.

The 4 blog posts will consist of:

  • 1 - (this blog post) will look at how to determine the predicted cluster and cluster probability for your record.
  • 2 - will show you how to examine the details behind and used to predict the cluster.
  • 3 - A record could belong to many clusters. In this blog post we will look at how you can determine what clusters a record can belong to.
  • 4 - Cluster distance is a measure of how far the record is from the cluster centroid. As a data point or record can belong to many clusters, it can be useful to know the distances as you can build logic to perform different actions based on the cluster distances and cluster probabilities.

Right. Let's have a look at the first set of these closer functions. These are CLUSTER_ID and CLUSTER_PROBABILITY.

CLUSER_ID : Returns the number of the cluster that the record most closely belongs to. This is measured by the cluster distance to the centroid of the cluster. A data point or record can belong or be part of many clusters. So the CLUSTER_ID is the cluster number that the data point or record most closely belongs too.

CLUSTER_PROBABILITY : Is a probability measure of the likelihood of the data point or record belongs to a cluster. The cluster with the highest probability score is the cluster that is returned by the CLUSTER_ID function.

Now let us have a quick look at the SQL for these two functions. This first query returns the cluster number that each record most strong belongs too.

SELECT customer_id, 
       cluster_id(clus_km_1_37 USING *) as Cluster_Id, 
FROM   insur_cust_ltv_sample
WHERE  customer_id in ('CU13386', 'CU6607', 'CU100');


Now let us add in the cluster probability function.

SELECT customer_id, 
       cluster_id(clus_km_1_37 USING *) as Cluster_Id,
       cluster_probability(clus_km_1_37 USING *) as cluster_Prob       
FROM   insur_cust_ltv_sample
WHERE  customer_id in ('CU13386', 'CU6607', 'CU100');


These functions gives us some insights into what the cluster predictive model is doing. In the remaining blog posts in this series I will look at how you can delve deeper into the predictions that the cluster algorithm is make.

Friday, May 6, 2016

Advanced Analytics in Oracle Data Visualization Desktop

Oracle Data Visualisation Desktop has the feature of being able to include some advanced analytics. In a previous blog post I showed you how to go about installing Oracle R Distribution on your desktop/client machine. This will allow you to make use of some of the advanced analytics features of Oracle Data Visualization Desktop.

The best way to get started with using the advanced analytics features of Oracle Data Visualization Desktop, is to ignore that these features exist. Start with creating your typical analytics, charts etc. Only then you can really look at adding some of the advanced analytics features.

To access the advanced analytics features you can select the icon from the menu bar for advanced analytics. It is the icon with the magnifying glass.


When you have listed on this icon the advanced analytics menu opens displaying the 5 advanced analytics options available to you.

With your chart/graphic already created then you can click on one of the advanced analytics options and drag it onto your char or onto the palette for the chart. For example in the following diagram the Outlier option was selected and dragged into the Color section. This will then mark Outlier data on your chart with a different color.


You can follow a similar approach with all the other advanced analytics options. Click and drag. It is that simple. As you add each advanced analytics option, the chart will be updated automatically for you.

An alternative to clicking and dragging from the chart options palette, you can right click on the chart (or click on the wheel on the top right hand corner of the chart window), and then select the advanced analytics feature you want from the menu.


or what I prefer doing is to select Properties from the menu above. When you do this you get a new window opening and when you click on the icon with the magnifying glass you get to add and customize the advanced analytics features.



I would urge caution when you are reading other demonstrations about Oracle Visualization Desktop that are showing examples of predictive analytics. There are a few blog posts out there and also some videos too.

What they are actually showing you is the embedded R execution feature of Oracle R Enterprise. Oracle R Enterprise is part of the Oracle Advanced Analytics Option, which is a licensed option.

So if you follow these blog posts and videos, thinking that you can do this kind of advanced analytics, you could be getting into license issues. This confusion is not helped with comments like the following on the Oracle website.

"Predictive Analytics: Analytics has progressed from providing oversight to offering insight, and now to enabling foresight. Oracle Data Visualization supports that progression, delivering embedded predictive capabilities that enable anyone to see trend lines and other visuals with a click, and extend their analysis using a free R download."

Personally I find this a bit confusing. Yes you can perform some advanced and predictive analytics with Oracle Data Visualization, but you need to ensure that you are using the client side R installation, for your analytics.

As with all licensing questions, you should discuss them with your Oracle Sales representative.

Tuesday, May 3, 2016

Oracle Data Visualisation Desktop : Enabling Advanced Analytics (R)

Oracle Data Visualization comes with all the typical features you have with Visual Analyzer that is part of BICS, DVCS and OBIEE.

An additional install you may want to do is to install the R language for Oracle Data Visualization Desktop. This is required to enable the Advanced Analytics feature of the tool.


After installing Data Visualisation Desktop when you open the Advanced Analytics section and try to add one of the Advanced Analytics graphing option you will get an errors message as, shown below.


In Windows, click on the Start button, then go to Programs and then Oracle. In there you will see a menu item called install Advanced Analytics i.e. install Oracle R Distribution on your machine.


When you click on this menu option a new command line window will open and will proceed with the installation of Oracle R Distribution (in this case version 3.1.1, which is not the current version of Oracle R Distribution).

NewImage NewImage

By accepting the defaults and clicking next, Oracle R Distribution will be installed. The following images will step you through the installation.

NewImage NewImage

NewImage NewImage

NewImage NewImage

NewImage NewImage


The final part of the installation is download and install lots and lots of supporting R packages.


When these supporting R packages have been installed, you can now use the Advanced Analytics features of Oracle Data Visualisation Desktop.

If you had the tool open during this installation you will need to close/shutdown the tool and restart it.

Saturday, April 30, 2016

Oracle Data Visualization Desktop - now available

After a bit of a long wait Oracle have finally release Oracle Data Visualization for the desktop. The desktop version of this tool is only available for Windows desktops at the moment. I'm sure Oracle will be bringing out versions of other OS soon (I hope).

To get you hands on the Oracle Data Visualization to to the following OTN webpage (click on this image)


After downloading has finished, you can run the installer.

When the Oracle Installer opens you will be prompted to enter the required details or to accept the defaults, as outlined below.


  • Installation Location : Decide where you are going to have the Oracle Data Visualization tool installed on your desktop. The default location is C:\Program Files\Oracle Data Visualization Desktop . Click Next
  • Options : There are 2 check boxes for 'Create desktop shortcut' and 'Deploy samples'. Leave both of these checked, as you will probably want these. Click Next.
  • Summary : Lists a summary of the installation. There is nothing really for you to do here, so on the Install button.
  • NewImage
  • Progress : You can ten sit back and monitor the progress of the installation. The installation tool about 4 minutes on my small Windows VM
  • NewImage

When the installation is complete you can now fire up Oracle Data Visualization and enjoy. If you have just installed the tool it will automatically be started for you.


When the tool has finished all the configurations that it needs to do, the tool will open with the following window and shows a sample projects for you to get an idea of some of the things that are possible.


For more details on the tool and on the Oracle Cloud hosted version click on the following image to get to the Oracle webpage for the product.


Wednesday, October 21, 2015

People from Ireland who are Speaking at #OOW15

Oracle Open World 2015 will be kicking off in a few days time. There will be over 50K people attending this event, with a couple of hundred speakers.

I'm one of the lucky ones to have been selected to speak at Oracle Open World. This will be my third or fourth year in a row that I'm speaking at Oracle Open World.

Am I the only person from Ireland who is speaking at Oracle Open World ? No, I'm not, but there is a small number of us this year. There are basically 2, yes 2 other people from Ireland speaking at Oracle Open World. These are Richard Corbridge from the HSE, and Debra Lilley.

If you are interesting in attending these sessions here is the schedule.


Please share this information to allow your colleagues and help spread the news about he Irish Speakers.

If you are attending OOW, then please tweet, blog, Facebook, etc about these presentations.

Hopefully I will see you are one or both of my presentations on the Sunday.

And I will be having a draw/raffle for a copy of my book at my second presentation on the Sunday at 15:30 in Moscone South - 307


Monday, October 19, 2015

My schedule for OOW15

It seems to be a things that people blog about their schedule for Oracle Open World and talk about how busy they will be.

So to join the club this is my current schedule.


The boxes that are (a kind of) orange with red text are when I have MY presentations.

The purple boxes indicate some fun event and entertainment.

When attending conferences sometime you get to hear about a good presentation or some other events just happen. So this schedule is tentative and will probably only reflect 40%-60% of what I will really attend/do. There is so much going on each day.

You will see I have some Hands-on-Labs booked. These are a great way to try out some product for an hour. I highly recommend you doing some too.

By far my favourite part of my OOW trip is the Oracle ACE Director briefing that OTN arranges for us. This briefing will happen on the Thursday (22nd) and Friday (23rd) before OOW, and this is held in Oracle HQ in Redwood Shores.

We have been given a tentative schedule for the ACED Briefing and for me I highlight of this briefing is when Thomas Kurian comes in an very openly talks to us, telling us lots and lots and lots about what is happening across the Oracle product set. I would lover to be able to share what we are told but we are all under NDA. After Thomas we have some Cloud updates, then we get onto some updates on Oracle Development Tools, lots of updates on what is coming next in the Oracle Database. So we get the likes of Penny Avril, Andy Mendelsohn, Tom Michelini, Steven Feurerstein, Roland Smart and Wim Coekaerts, among others. So lots and lots of EVPs, Sr VPs and VPs.

After the briefing is over we then hop on a bus and head to our hotel in download San Francisco, near Union Square, where we will be based for the OOW.

Thanks to all in OTN and to those who run the Oracle ACE programme for arranging and paying for my flights, hotels and transportation.

Monday, October 12, 2015

SQL and PL/SQL icons and stickers

Over the past couple of weeks I've been preparing my slides and presentations for Oracle Open World (2015).

One thing that occurred to me was that there was no icon or image to represent Oracle SQL and PL/SQL. I needed something that I could include in my presentations to represent these.

After a bit of Tweeting it turns out that there is no (official) icons or images for Oracle SQL and Oracle PL/SQL.

So I created some and here they are.

SQL icon sm PLSQL icon sm

and there are these

SQL 2 sm PLSQL 2 sm

Feel free to use these in your presentations and share around. All I ask is that you give me an odd acknowledgement from time to time.


If you would like to get these as stickers and put them on your laptop, notebooks, or anywhere really, you can order them on Stickermule.


NB: It is important to note that these are in no way approved or acknowledged or endorsed or anything else by Oracle.