Forever Learning

Forever learning and helping machines do the same.

Big Data is Big


We happen to have one sat in the next building over. Would you guys like to see it?

Oh, boy! Would we!

About twenty other Oracle employees and I are attending a Cloudera training on Hadoop in the Oracle Reading office. Five days packed with information covering a whole new ecosystem filled with some pretty crazy beasts.

Our heads are spinning like a room full of network-attached storage and our pens are humming like a data center cooling system as we attempt to map and reduce every little piece of data they throw at us.

During one of the breaks, we get the opportunity to go see the Oracle Big Data Appliance. Standing in front of this enormous machine, it finally dawns on me what a massive bulk of raw power this really is. A seemingly countless number of disks are mounted in a box taller and wider than I am. Each disk can hold three terabytes of data.

Big Data is Big!

Written by Lukas Vermeer

February 23, 2012 at 19:24

Facet Based Predictions in Oracle Real-Time Decisions


[ Crossposting from the Oracle Real-Time Decisions Blog. ]

The analytical models method detailed in a previous post is not only extremely valuable for reporting, but can also be used to predict likelihoods for things other than regular choices. We can for instance generate predictions based on statistics for an attribute of a choice, rather than the choice itself. We use the term facet based prediction to describe this advanced form of generating predictions.

This novel approach to modeling can be applied to significantly improve predictive accuracy and model quality. It can also facilitate the rapid transfer of existing learnings to newly created choices based on their facet values. These capabilities can be of use to practically all implementations, but they are of utmost importance in cases where the number of choices is very high or individual choices have a short shelf life. In these instances, there might simply not be enough time or data to be able to predict likelihoods for individual choices. We could, however, predict likelihoods for certain facets of our choices, as long as their cardinality remains relatively low.

Consider the following example in which we recommend products based on the acceptance of other products in the same category. In our ILS, Oracle Real-Time Decisions will be used to recommend a single product based on a single performance goal: Likelihood.

Choice Groups Setup

Products that may be recommended are stored in a choice group Products (we will use static choices, but this approach could be implemented for dynamic choices also). Product choices have an attribute Category which will contain a category name. We will use a second and separate dynamic choice group Categories to record acceptance of the different product categories.

Choice Group Setup

Note that we never intend to return any choices from the Categories choice group to a client. It is configured using a dummy source and will not contain any actual choices. This group is only used within the ILS for predicting likelihoods. Statistics for this group may however be viewed in decision center reports.

Recording Events

Similar to the example for analytical models, we will record events against a dynamically generated choice representing a facet value rather than against the actual choice. In this example, both the actual choice and the event to record will be passed through a request represented as Strings.

// create a new choice to represent the category facet
CategoriesChoice c = new CategoriesChoice(Categories.getPrototype());
// set properties of the choice (SDOId should be of the form "{ChoiceGroupId}${ChoiceLabel}")
// the group id must match the one used for prediction, i.e. "Categories"
c.setSDOId("Categories" + "$" + Products.getChoice(request.getChoice()).getCategory());
// record event in model (catching an exception just in case)
try {
    c.recordEvent(request.getEvent());
} catch (Exception e) {
    logError("Exception: " + e);
}

Model Setup
Our model setup is practically identical to before, but this time we’ll enable “Use for prediction”.

Model Setup

Predicting Likelihoods

A function PredictLikelihood will be used to predict likelihoods for our products. The function takes a Products choice and an Event (String) as parameters and returns a Double value representing the predicted likelihood.

// get instance of the model used for predicting Category Events
CategoryEvents m = CategoryEvents.getInstance();
// return the likelihood based on the generated SDOId and the event passed in (e.g. "Accepted")
return m.getChoiceEventLikelihood("Categories$" + product.getCategory(), event);

Prediction Function

Choice Group Scores Setup

On the scores tab for the Products choice group we configure the Likelihood performance goal to be populated by the PredictLikelihood function using parameters this and “Accepted”. The keyword this refers to the particular choice being scored and will ensure each choice is scored according to its category facet.

Scoring Setup

That is all that is required to score choices against a facet. We can now create decisions and advisors that use these predictions to recommend products based on their categories.

In this example, we have predicted likelihoods based on a single product facet. As a result, products in the same category will be scored the same. In practical implementations this will rarely be an issue, because there will presumably be multiple performance goals. Also, likelihoods may be mixed with product-specific attributes like price or cost, resulting in score differentiation between products regardless of equality in likelihoods.
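To make the idea concrete outside of RTD, here is a minimal sketch of facet based scoring as a plain frequency count per category. It is not how RTD computes likelihoods, and every name in it (FacetModelSketch, record, likelihood) is hypothetical. It only illustrates why products that share a category receive the same score, and why a brand new product can be scored immediately.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a facet model; none of these names are RTD APIs.
class FacetModelSketch {
    // counts[0] = times the category was presented, counts[1] = times it was accepted
    private final Map<String, int[]> countsByCategory = new HashMap<>();

    // Record one presentation of a product, keyed by its category facet only.
    public void record(String category, boolean accepted) {
        int[] counts = countsByCategory.computeIfAbsent(category, k -> new int[2]);
        counts[0]++;
        if (accepted) {
            counts[1]++;
        }
    }

    // Naive likelihood: accepted / presented, or 0.0 for a category never seen.
    public double likelihood(String category) {
        int[] counts = countsByCategory.get(category);
        if (counts == null || counts[0] == 0) {
            return 0.0;
        }
        return (double) counts[1] / counts[0];
    }

    public static void main(String[] args) {
        FacetModelSketch model = new FacetModelSketch();
        model.record("Books", true);
        model.record("Books", false);
        model.record("Games", false);
        // Every product in "Books" gets this score, including one added a second ago.
        System.out.println("Books: " + model.likelihood("Books"));
        System.out.println("Games: " + model.likelihood("Games"));
    }
}
```

Because the statistics live on the category rather than on the product, the number of things to learn about stays as small as the number of categories.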

In a later post, we will discuss how we can expand on this to include multiple product facets in our likelihood prediction.

Written by Lukas Vermeer

February 17, 2012 at 12:15

Marketing Personalization and the Uncanny Valley


Dear [prospect.first_name],

Following our last discussion on [prospect.last_contact_date] concerning [prospect.subject_area] I think the following article would be of particular interest to you.

Seth Godin writes:

Sure, it’s easy to grab a first name from a database or glean some info from a profile.

But when you pretend to know me, you’ve already started our relationship with a lie. You’ve cheapened the tools we use to recognize each other and you’ve tricked me, at least a little.

Increased familiarity begets heightened expectations. Personalization has its own uncanny valley.

The uncanny valley is a hypothesis in the field of robotics and 3D computer animation, which holds that when human replicas look and act almost, but not perfectly, like actual human beings, it causes a response of revulsion among human observers.

When you treat your customers as though you know them personally they will be personally offended if you do not. Beware of the eerie hollow of broken promise.

Written by Lukas Vermeer

February 2, 2012 at 15:33

Analytical Models in Oracle Real-Time Decisions


[ Crossposting from the Oracle Real-Time Decisions Blog. ]

As explained in a previous post, we can record events against unsourced dynamic choices created on-the-fly using the getPrototype method. Choices instantiated in this fashion, and the events recorded against them, will be visible in decision center reports.

This enables us to create extensive reporting based on arbitrary input from different sources without the need to specify all the possible choice values upfront. Creating so-called analytical models can be very useful for analysis.

Recording Client Input

Consider the following example which shows how this approach can be used to create an analytical model based on informant input. In our ILS, Oracle Real-Time Decisions will be used to find and report on correlations between a regular session attribute and arbitrary codes passed through an informant.

Choice Group Setup

A choice group Reason is used to store codes passed through the informant. During initialization, the choice group will attempt to grab choices from the ReasonEntityArray, but the array is a dummy entity that will always return nothing, because we’ve not defined a value for it.

Reason choice group configuration, dynamic choices tab.

Reason choice group configuration, group attributes tab.

Informant Setup

When invoked, a RecordReason informant will record an event for the ReasonCode input parameter. The logic for this informant is pretty straightforward.

// create a new choice based on the request attribute (a string that describes the reason)
ReasonChoice c = new ReasonChoice(Reason.getPrototype());
// set properties of the choice (SDOId should be of the form "{ChoiceGroupId}${ChoiceLabel}")
c.setSDOId("Reason" + "$" + request.getReasonCode());
// record choice in model (catching an exception just in case)
try { c.recordChoice(); } catch (Exception e) { logTrace("Exception: " + e); }
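Since the SDOId is assembled by string concatenation wherever an event is recorded or a likelihood is looked up, a typo in the group id would silently split the statistics. A small helper could keep the format in one place. The sketch below is an assumption of mine, not part of the RTD API; SdoIds and of are hypothetical names.

```java
// Hypothetical helper; SdoIds is not part of the RTD API.
final class SdoIds {
    private SdoIds() {
        // static helper, no instances
    }

    // Builds "{ChoiceGroupId}${ChoiceLabel}" and rejects missing parts early.
    public static String of(String choiceGroupId, String choiceLabel) {
        if (choiceGroupId == null || choiceGroupId.isEmpty()) {
            throw new IllegalArgumentException("choiceGroupId must not be empty");
        }
        if (choiceLabel == null || choiceLabel.isEmpty()) {
            throw new IllegalArgumentException("choiceLabel must not be empty");
        }
        return choiceGroupId + "$" + choiceLabel;
    }
}
```

With such a helper, the informant logic above would call something like c.setSDOId(SdoIds.of("Reason", request.getReasonCode())).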

Model Setup

In order to actually find and report on correlations, we will need to define at least one event model on our choice group. For this example, we’ll keep things as simple as possible.

Reasons choice event model configuration.

Reports

The reports in decision center will show the reason codes sent to the informant as if they were dynamic choices and calculate statistics and correlations against session attributes.

(In this example, an Oracle Real-Time Decisions Load Balancer script was used to send four different codes to the ILS with a severe bias towards certain age groups.)

Decision center report for Reason choice, analysis tab.

This approach enables us to generate detailed reporting and analysis of more than just regular choices in the familiar decision center environment. In this example we were using informant input, but this technique can also be applied using the attributes of other choices to gain additional insight into the correlations between session attributes and choice attributes like product group or category (rather than individual choices).

This method can also be used in conjunction with predictive models. We will explore this possibility and its applications in future posts.

Written by Lukas Vermeer

January 24, 2012 at 11:48

All You Need Is a Good Brainwashing


Classical conditioning is underrated. Too many bad spy movies have taught us that ‘brainwashing’ is bad.

But conditioning can be a powerful tool for self-improvement. I’ve deliberately been playing the Brian Eno song Thursday Afternoon every time I felt myself immersed in ‘the zone’. In my mind, the track and the mental state have now become inextricably linked. This is so much the case that I can now descend into productivity Valhalla simply by listening to my personal work anthem.

In effect, I’ve brainwashed myself to work better in response to a particular tune.


There is nothing special about this trick. Anyone can do it and almost no real effort is required.

A few guidelines.

  • Choose a song that is long. Not a two-minute ditty. This will also help for the next prerequisite.
  • Choose a song that can stand to be repeated. You’ll want to be productive for longer than one play.
  • Choose a song without lyrics. This is more personal. To me, words and melody are distracting.
  • Choose a song that is timeless. Something you wouldn’t mind listening to in a few years’ time.
  • Choose a song that is not a classic. Classics are played on the radio. That is not what you want.
  • Carry your song with you always. You need to be ready. Productivity can strike at any moment.
  • Play your song every time you are in the zone. Especially initially you want the bonding to be strong.
  • Play your song without interruptions. Interruptions kill productivity. Interruptions break the spell.
  • Never play your song when you are not in the zone. That would break the spell. Don’t do it.
  • Don’t overuse. There are limits to how productive you can be. This trick does not fix that.
  • Don’t expect magic. The song will not always work. If it doesn’t work, stop listening right away.

Have I missed anything important? Feel free to add your tips and tricks in the comments below.

Written by Lukas Vermeer

January 23, 2012 at 17:52

Waste Of Search


Analyzing website traffic can lead to unexpected insights. This incoming search term caught my eye.

I’m not entirely sure why someone would even search for something like this, but I’m pretty confident this person did not find what he or she was looking for.

Oracle Real-Time Decisions is definitely not a waste of money.

Written by Lukas Vermeer

January 18, 2012 at 16:35

Posted in Meta, Oracle, RTD


Conversion Rates


In my mind, I write a blog post almost every day. But when I sit down to write, I am at a loss for words.

I have a brain bursting with thoughts, an inbox full of interesting ideas and a stack of draft posts containing loads of links and random ramblings on disparate topics. All this pondering without publication is paralyzing.

Analysis without action is a crippling affliction that affects individuals as well as businesses. Wasting time or wasting money; we all need to be mindful of our conversion rates.

Time to stop thinking and start typing.


Written by Lukas Vermeer

January 17, 2012 at 18:01

Recording Events


[ Crossposting here from the Oracle Real-Time Decisions Blog where I have been invited to contribute. ]

There is always more than one way to skin a cat; it’s just that some ways to excoriate a feline are more efficient than others. This is certainly true for the way in which we can record events against choices in RTD, and the differences in performance can be striking.

getChoice

A common approach to record events against static choices is to use the getChoice API.

Choice ch = MyChoiceGroup.getChoice("MyChoiceId");
ch.recordEvent("Clicked");

On the first line, we are asking RTD to go through the list of all choices in MyChoiceGroup and retrieve one particular choice keyed MyChoiceId. On the second, we record a Clicked event against the returned choice.

Because we are asking RTD to go through the list of all choices, we require that RTD has a full list of choices available; even if we only end up using a single choice. If we use the same code for dynamic choices, RTD will thus have to fetch and instantiate the full list of dynamic choices in order to find the single one we are interested in. This may be an expensive operation, as it may require accessing an external database or web service; execution of complex custom code; and/or instantiating a large number of dynamic choices.

[ Note that dynamic choices do not cache at the choice level. They cache at the entity level. These entities hold the data for the dynamic choices but not the choices themselves. The RTD API could perhaps be optimized differently, but the current implementation will instantiate all the dynamic choices and then find the one the API call is looking for. Consider also that the actual source data used for the dynamic choices could come from anywhere and need not come from cached entities at all; it might even be generated on the fly through custom functions. This flexibility in sourcing dynamic choices makes retrieving a single choice non-trivial from the RTD API perspective. ]

This approach will certainly work, but may waste precious time and resources retrieving and instantiating dynamic choices that are never used.

getPrototype

For recording an event, it is sufficient that we have a choice that has the desired SDOId; all other choice attributes are irrelevant for this purpose. A more efficient way to record events against dynamic choices is therefore to create a new empty dynamic choice; assign it an SDOId; and record the event against that, rather than retrieving the ‘actual’ dynamic choice. The results in terms of statistics and learning are the same, but in this approach there is no need for RTD to retrieve any choices at all.

We can instantiate an empty dynamic choice using the getPrototype API.

Choice ch = new MyDynamicGroupChoice(MyDynamicGroup.getPrototype());
ch.setSDOId("MyDynamicGroup" + "$" + "MyChoiceId");
ch.recordEvent("Clicked");

This approach can provide significant performance improvements in implementations with a large number of dynamic choices; or in implementations where retrieving dynamic choices is a non-trivial complex operation and entity caching proves insufficient.
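The difference can be illustrated with a toy simulation that counts how many choice objects each approach has to build. This is only a sketch of the shape of the two code paths as described above; LookupCostSketch, SketchChoice, findByScanning and createDirectly are hypothetical names, and nothing here calls or measures RTD itself.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation; it only mimics the two approaches described in this post.
class LookupCostSketch {
    // Counts how many choice objects have been built so far.
    static int instantiated = 0;

    static class SketchChoice {
        final String id;

        SketchChoice(String id) {
            this.id = id;
            instantiated++;
        }
    }

    // Mimics the getChoice path: instantiate every choice, then search the list.
    static SketchChoice findByScanning(List<String> sourceIds, String wanted) {
        List<SketchChoice> all = new ArrayList<>();
        for (String id : sourceIds) {
            all.add(new SketchChoice(id));
        }
        for (SketchChoice choice : all) {
            if (choice.id.equals(wanted)) {
                return choice;
            }
        }
        return null;
    }

    // Mimics the getPrototype path: build a single choice and give it the id.
    static SketchChoice createDirectly(String wanted) {
        return new SketchChoice(wanted);
    }

    public static void main(String[] args) {
        List<String> sourceIds = new ArrayList<>();
        for (int i = 0; i < 10000; i++) {
            sourceIds.add("Choice" + i);
        }

        instantiated = 0;
        findByScanning(sourceIds, "Choice9999");
        System.out.println("Scanning built " + instantiated + " choices");

        instantiated = 0;
        createDirectly("Choice9999");
        System.out.println("Direct creation built " + instantiated + " choice");
    }
}
```

In a real deployment the cost per choice would also include whatever database calls or custom code the dynamic choice source requires, which this simulation leaves out.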

Being able to record events on choices that are not retrievable through the dynamic list is an additional advantage that has several interesting applications which we will explore in future posts.

[ Special thanks to Michel Adar for bringing this to my attention and providing an initial draft of this article. ]

Written by Lukas Vermeer

January 10, 2012 at 17:37

Placebo Punctuation


I know it’s almost never necessary to use a semicolon, but I like semicolons; they make me feel smart.

Written by Lukas Vermeer

January 5, 2012 at 10:54

Posted in Meta, Psychology


Why Metrics Matter


Daring Fireball quotes some interesting research findings related to what Barry Schwartz dubbed The Paradox Of Choice.

About 60% of the people stopped when we had 24 jams on display and then at the times when we had 6 different flavors of jam out on display only 40% of the people actually stopped, so more people were clearly attracted to the larger varieties of options, but then when it came down to buying, so the second thing we looked at is in what case were people more likely to buy a jar of jam.

What we found was that of the people who stopped when there were 24 different flavors of jam out on display only 3% of them actually bought a jar of jam whereas of the people who stopped when there were 6 different flavors of jam 30% of them actually bought a jar of jam.  So, if you do the math, people were actually 6 times more likely to buy a jar of jam if they had encountered 6 than if they encountered 24, so what we learned from this study was that while people were more attracted to having more options, that’s what sort of got them in the door or got them to think about jam, when it came to choosing time they were actually less likely to make a choice if they had more to choose from than if they had fewer to choose from.

A fascinating psychological effect with clear implications for display advertising, but there is a lesson here for online marketeers and analysts as well.

In this study, fewer people stopped when there was less choice, but more people actually bought something. If we were only measuring the former (i.e. attention), and not the latter (i.e. sales), we would be led to think more choice would be about 50% more effective at bringing in customers. And boy, would we be wrong!
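The arithmetic is easy to check using the percentages from the quote above. The sketch below multiplies the stop rate by the buy rate to get the share of all passers-by who actually bought; JamFunnel and overallConversion are names I made up for illustration.

```java
// Worked arithmetic for the jam study figures quoted above.
class JamFunnel {
    // Share of all passers-by who end up buying: stop rate times buy rate.
    static double overallConversion(double stopRate, double buyRate) {
        return stopRate * buyRate;
    }

    public static void main(String[] args) {
        double manyJams = overallConversion(0.60, 0.03); // 24 flavors on display
        double fewJams = overallConversion(0.40, 0.30);  // 6 flavors on display

        // Attention alone favors the large display by a factor of about 1.5.
        System.out.println(String.format("Attention ratio (24 vs 6): %.2f", 0.60 / 0.40));
        // Sales favor the small display by a factor of more than 6.
        System.out.println(String.format("Sales ratio (6 vs 24): %.2f", fewJams / manyJams));
    }
}
```

The same funnel, measured at two different points, gives opposite answers about which display is better.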

Metrics matter; especially when you are using a system which can automatically optimize your process in order to maximize those metrics.

Don’t get yourself in a jam; remember this next time you decide to measure click acceptance instead of actual sales to drive your online marketing effort. Clickthrough rates are useful as a measure by proxy, but they can be misleading.

Written by Lukas Vermeer

January 3, 2012 at 16:58
