My girlfriend has been struggling with an interesting little problem lately. She was asked to determine the optimal distribution of medicine boxes and bottles over a set of adaptable cabinets, under volume as well as weight constraints. Not an easy task for a computer scientist; much less for a hospital pharmacist in training.
After describing the problem to me last night I (unhelpfully) mumbled that “this sounds like a variable sized bin packing problem to me, you can’t solve that kind of thing in Excel, you probably need an LP solver”.
Apparently I was wrong. It already seemed obvious to me that Excel suffers from a severe case of feature bloat, but this is just absurd.
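For the curious, even a crude heuristic gets you surprisingly far on problems like this. Below is a minimal sketch of first-fit decreasing under two capacities (volume and weight). All numbers and names are invented for illustration; this is not how Excel’s Solver works, and a real instance would of course use the pharmacy’s actual data.

```python
# Illustrative only: first-fit decreasing for packing items (boxes/bottles)
# into cabinets, each with a volume limit and a weight limit.

def first_fit_decreasing(items, cabinets):
    """items: list of (volume, weight); cabinets: list of (max_volume, max_weight).
    Returns one item list per cabinet, or None if some item cannot be placed."""
    remaining = list(cabinets)
    assignment = [[] for _ in cabinets]
    # Place the bulkiest items first; this tends to waste less space.
    for item in sorted(items, reverse=True):
        for i, (free_v, free_w) in enumerate(remaining):
            if item[0] <= free_v and item[1] <= free_w:
                assignment[i].append(item)
                remaining[i] = (free_v - item[0], free_w - item[1])
                break
        else:
            return None  # no cabinet can hold this item
    return assignment

# Made-up data: four items, two identical cabinets.
packing = first_fit_decreasing(
    items=[(3, 2), (5, 1), (2, 4), (4, 3)],
    cabinets=[(8, 8), (8, 8)],
)
```

A proper variable sized bin packing formulation (which is NP-hard in general) would hand this to an integer programming solver instead; the heuristic merely shows the shape of the problem.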
As the beautiful old car cruised in almost perfect silence under the guidance of its automatic controls, Duncan tried to see something of the terrain through which he was passing. The spaceport was fifty kilometers from the city—no one had yet invented a noiseless rocket—and the four-lane highway bore a surprising amount of traffic. Duncan could count at least twenty vehicles of various types, and even though they were all moving in the same direction, the spectacle was somewhat alarming.
“I hope all those other cars are on automatic,” he said anxiously.
Washington looked a little shocked. “Of course,” he said. “It’s been a criminal offence for—oh, at least a hundred years—to drive manually on a public highway. Though we still have occasional psychopaths who kill themselves and other people.”
The future sounds fascinating, but I want my Google Driverless Car now.
Derek Jones posits that “success does not require understanding”.
In my line of work I am constantly trying to understand what is going on (the purpose of this understanding is to control and make things better), and I consider anybody who uses machine learning as being clueless, dim-witted or just plain lazy; the problem with machine learning is that it gives answers without explanations (OK, decision trees do provide some insights).
Problem solving versus solving problems.
As one who specializes in using machine learning, I obviously resent being called “clueless, dim witted or just plain lazy”. However, I feel a larger point should be made here. Success does most definitely require understanding, but not necessarily of how one particular instance of a solution came about.
To be successful in any machine learning effort, one needs to have intricate understanding of what the problem is and how techniques can be applied to find solutions. This is a more general form of understanding which puts more emphasis on the process of finding workable models, rather than on applying these models to individual instances of a problem. Comprehension of problem solving over understanding a particular solution.
Driving a black box.
Consider the following example. To me, the engine of my car is a black box; I have very little idea how it works. My mechanic does know how engines work in general, but he is unable to know the exact internal state of the engine in my car as I am cruising down the highway at 100 miles per hour. None of this “lack of understanding” prevents me from getting from A to B. I turn the wheel, I push the pedal and off we go.
In essence, my mechanic and I have different levels of understanding of my car. But importantly, at different levels of precision, the thing becomes a black box to each of us; in the sense that there is a point where our otherwise perfectly practical models break down and no longer are able to reflect reality. In the end, it’s black boxes all the way down.
Models are merely tools to help you navigate a vastly complex world. Very much like a machine learning model, a scientific model may work well in many cases without being true; Newton’s law of universal gravitation works splendidly in practice, yet we know for a fact that that particular model is definitely wrong. And I sincerely hope many others are just as incorrect.
There will always be limits to our understanding. The fact that we have a model that can help us predict does not necessarily mean we have correctly understood the nature of the universe. All models are wrong, but some are useful.
Reality is simply much too complicated to be captured in a manageable set of rules, but even incomplete (or incorrect) models can provide insight and help us better navigate this world. Machine learning is successful, precisely because it can help us find such models.
[ Peter Norvig has written an excellent piece on this subject in relation to language models. ]
Marketing catchphrases like “recommended by experts” (an appeal to authority), “world-renowned bestseller” (candidly claiming consensus) and “limited supply only” (suggesting scarcity) are widely used to promote many different types of products. To a marketeer, these persuasion tactics are like universally coaxing super supplements that can make just about any offer seem more enticing.
But not all these advertising additives are created equal; and neither are apparently all consumers.
In a fascinating (at least, to scientific advertising geeks like me) study titled “Heterogeneity in the Effects of Online Persuasion”, social scientists Maurits Kaptein and Dean Eckles looked at the differences in susceptibility to varying influence tactics between individuals. What they found may change the way we think about recommendation engines and marketing personalization in general.
It is striking how large the heterogeneity is relative to the average effects of each of the influence strategies. Even though the overall effects of both the authority and consensus strategies were significantly positive, the estimates of the effects of these strategies was negative for many participants. [...] Employing the “wrong” strategy for an individual can have negative effects compared with no strategy at all; and the present results suggest there are many people for whom the included strategies have negative effects.
Our advertising additives can have adverse side-effects. Some people don’t respond well to authority; others don’t feel much for the majority rule. If you pick the wrong strategy for a particular individual, you may actually hurt your marketing efforts; independent of what product you are actually trying to sell.
Kaptein also collaborated on another publication, “Means Based Adaptive Persuasive Systems”, which looked at the combined effects of multiple persuasion strategies.
Contrary to intuition, having multiple sources of advice agree on the recommendation had not only no positive impact on compliance levels but actually had a slightly negative effect when compared to the preferred strategy. This is a fascinating discovery since one would assume two agreeing opinions would be stronger than one.
As strange as it may seem, in the case of combined cajolery, the whole is not only less than the sum of its parts; it is less than the single best bit.
1 + 3 = 2
Eckles and Kaptein conclude that personalization is key; and I couldn’t agree more.
To use the results presented above influencers will have to create implementations of distinct influence strategies to support product representations or customer calls to action. As in the two studies presented above, multiple implementations of influence strategies can be created and presented separately. Thus, one can support a product presentation on an e-commerce website by an implementation of the scarcity strategy (“This product is almost out of stock”) or by an implementation of the consensus strategy (“Over a million copies sold”). If technically one is able to represent these different strategies together with the product presentations, identify distinct customers, and measure the effect of the influence strategy on the customer, then one can dynamically select an influence strategy for each customer.
The good news is that we can do this today. Using Oracle Real-Time Decisions, choosing the best influence strategy for a particular customer can easily be implemented as a separate decision to be optimized for conversion. Alternatively, these strategies could simply be considered as another facet of your assets in RTD; similar to the way we would utilize product category metadata to share learnings across promotions.
Personalization is about more than just deciding what you want to sell. This research clearly shows that a recommendation engine that can only select the “best” product is simply not good enough.
Because conversion sometimes requires a little persuasion.
It almost seems like everyone has their head in the cloud these days. And it’s not all just hot air and water vapor. Infrastructure as a service (IaaS), platform as a service (PaaS) and software as a service (SaaS) are truly revolutionizing the corporate computing industry.
That is why, for the past few months, my good friend Matt Feigal and I have been collaborating with budding startup Cloudular to bring you the next logical evolutionary step in cloud computing. Inspired by the skyward ascent of hardware, middleware and software, we are proud to bring you vaporized wetware; or “artificial intelligence as a service” (AIaaS).
I’m mostly kidding, of course, but here is something I cooked up over the weekend. A web service that plays connect four (based on an earlier post) and is looking for worthy sparring partners.
If you think you can code a better connect four algorithm (and you probably can, especially since I’ve deliberately lobotomized this particular version of my implementation), head on over to github and build your own to compete against mine. All the code and an interface description are available there. The service itself is available on Google App Engine.
I’ve got (part of) my head in the cloud, what about you?
In a series of posts on this blog we have already described a flexible approach to recording events, a technique to create analytical models for reporting, a method that uses the same principles to generate extremely powerful facet based predictions and a waterfall strategy that can be used to blend multiple (possibly facet based) models for increased accuracy.
This latest, and also last, addition to this sequence of increasing modeling complexity will illustrate an advanced approach to amalgamate models, taking us to a whole new level of predictive modeling and analytical insights; combination models predicting likelihoods using multiple child models.
The method described here is far from trivial. We therefore would not recommend you apply these techniques in an initial implementation of Oracle Real-Time Decisions. In most cases, basic RTD models or the approaches described before will provide more than enough predictive accuracy and analytical insight. The following is intended as an example of how more advanced models could be constructed when results warrant the increased design and implementation effort. Keep it simple!
Combined Likelihood Models
Because facet based predictions are based on metadata attributes of the choices selected, it is possible to generate such predictions for more than one attribute of a choice. We can predict the likelihood of acceptance for a particular product based on the product category (e.g. ‘toys’), as well as based on the color of the product (e.g. ‘pink’).
Of course, these two predictions may be completely different (the customer may well prefer toys, but dislike pink products) and we will have to somehow combine these two separate predictions to determine an overall likelihood of acceptance for the choice.
Perhaps the simplest way to combine multiple predicted likelihoods into one is to calculate the average (or perhaps maximum or minimum) likelihood. However, this would completely ignore the fact that some facets may have a far more pronounced effect on the overall likelihood than others (e.g. customers may consider the product category more important than its color).
We could opt for calculating some sort of weighted average, but this would require us to specify up front the relative importance of the different facets involved. This approach would also be unresponsive to changing consumer behavior in these preferences (e.g. product price bracket may become more important to consumers as a result of economic shifts).
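To make the difference concrete, here is what these naive combiners look like in a few lines of Python. The facet names and scores are invented purely for illustration; nothing here is RTD code.

```python
# Three facet based likelihoods for one hypothetical product.
facet_scores = {"category": 0.30, "price_bracket": 0.10, "color": 0.05}

# The simplest combiners: average, minimum, maximum.
avg = sum(facet_scores.values()) / len(facet_scores)   # ≈ 0.15
low = min(facet_scores.values())                       # 0.05
high = max(facet_scores.values())                      # 0.30

# A weighted average can express that, say, category matters most,
# but the weights must be chosen up front and then stay fixed.
weights = {"category": 0.6, "price_bracket": 0.3, "color": 0.1}
weighted = sum(weights[f] * s for f, s in facet_scores.items())  # ≈ 0.215
```

The weighted variant encodes relative importance, but because the weights are frozen at design time it cannot track shifting consumer preferences; that is exactly the limitation we want to avoid.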
Preferably, we would want Oracle Real-Time Decisions to learn, act upon, and tell us about the correlations between the different facet models and the overall likelihood of acceptance. This additional level of predictive modeling, where a single supermodel (no pun intended) combines the output of several (facet based) models into a single prediction, is what we call a combined likelihood model.
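Conceptually, such a supermodel is a form of model stacking: a second-level model trained on the outputs of the first-level models. As an RTD-independent sketch (pure Python, invented simulated data, and plain logistic regression rather than whatever algorithm RTD uses internally), learning the relative importance of facet scores could look like this:

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(observations, epochs=200, lr=0.1):
    """Fit a logistic regression by SGD.
    observations: list of (facet_scores, accepted), facet_scores a 3-tuple."""
    w = [0.0, 0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for scores, accepted in observations:
            p = sigmoid(sum(wi * s for wi, s in zip(w, scores)) + b)
            err = p - (1.0 if accepted else 0.0)
            w = [wi - lr * err * s for wi, s in zip(w, scores)]
            b -= lr * err
    return w, b

# Simulated history where only the first facet (say, category) drives acceptance.
random.seed(42)
history = []
for _ in range(500):
    scores = tuple(random.random() for _ in range(3))
    accepted = random.random() < scores[0]  # category alone really matters
    history.append((scores, accepted))

w, b = train(history)
# The learned weight for the category facet should dwarf the other two.
```

The point of the sketch is only the shape of the idea: the combiner sees nothing but the facet scores, and it learns (and keeps re-learning) their relative importance from observed outcomes instead of having it fixed up front.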
Facet Based Scores
As an example, we have implemented three different facet based models (as described earlier) in a simple RTD inline service. These models will allow us to generate predictions for likelihood of acceptance for each product based on three different metadata fields: Category, Price Bracket and Product Color. We will use an Analytical Scores entity to store these different scores so we can easily pass them between different functions.
A simple function, creatively named Compute Analytical Scores, will compute for each choice the different facet scores and return an Analytical Scores entity that is stored on the choice itself. For each score, a choice attribute referring to this entity is also added to be returned to the client to facilitate testing.
One Offer To Predict Them All
In order to combine the different facet based predictions into one single likelihood for each product, we will need a supermodel which can predict the likelihood of acceptance, based on the outcomes of the facet models. This model will not need to consider any of the attributes of the session, because they are already represented in the outcomes of the underlying facet models.
For the same reason, the supermodel will not need to learn separately for each product, because the specific combination of facets for this product is also already represented in the output of the underlying models. In other words, instead of learning how session attributes influence acceptance of a particular product, we will learn how the outcomes of facet based models for a particular product influence acceptance at a higher level.
We will therefore be using a single All Offers choice to represent all offers in our combined likelihood predictions. This choice has no attribute values configured, no scores and not a single eligibility rule; nor is it ever intended to be returned to a client. The All Offers choice is to be used exclusively by the Combined Likelihood Acceptance model to predict the likelihood of acceptance for all choices; based solely on the output of the facet based models defined earlier.
In Oracle Real-Time Decisions, models can only learn based on attributes stored on the session. Therefore, just before generating a combined prediction for a given choice, we will temporarily copy the facet based scores—stored on the choice earlier as an Analytical Scores entity—to the session. The code for the Predict Combined Likelihood Event function is outlined below.
// Set session attribute to contain facet based scores
// (this is the only input for the combined model).
session().setAnalyticalScores(choice.getAnalyticalScores());

// Predict likelihood of acceptance for the All Offers choice.
CombinedLikelihoodChoice c = CombinedLikelihood.getChoice("AllOffers");
Double la = CombinedLikelihoodAcceptance.getChoiceEventLikelihoods(c, "Accepted");

// Clear session attribute of facet based scores.
session().setAnalyticalScores(null);

// Return the combined likelihood.
return la;
This sleight of hand will allow the Combined Likelihood Acceptance model to predict the likelihood of acceptance for the All Offers choice using these choice specific scores. After the prediction is made, we will clear the Analytical Scores session attribute to ensure it does not pollute any of the other (facet) models.
To guarantee our combined likelihood model will learn based on the facet based scores—and is not distracted by the other session attributes—we will configure the model to exclude any other inputs, save for the instance of the Analytical Scores session attribute, on the model attributes tab.
In order for the combined likelihood model to learn correctly, we must ensure that the Analytical Scores session attribute is set correctly at the moment RTD records any events related to a particular choice. We apply essentially the same switching technique as before in a Record Combined Likelihood Event function.
// Set session attribute to contain facet based scores
// (this is the only input for the combined model).
session().setAnalyticalScores(choice.getAnalyticalScores());

// Record input event against the All Offers choice.
CombinedLikelihood.getChoice("AllOffers").recordEvent(event);

// Force learn at this moment using the Internal Learn entry point.
Application.getPredictor().learn(InternalLearn.modelArray, session(), session(), Application.currentTimeMillis());

// Clear session attribute of facet based scores.
session().setAnalyticalScores(null);
In this example, Internal Learn is a special informant configured as the learn location for the combined likelihood model. The informant itself has no particular configuration and does nothing in itself; it is used only to force the model to learn at the exact instant we have set the Analytical Scores session attribute to the correct values.
After running a few thousand (artificially skewed) simulated sessions on our ILS, the Decision Center reporting shows some interesting results. In this case, these results reflect perfectly the bias we ourselves had introduced in our tests. In practice, we would obviously use a wider range of customer attributes and expect to see some more unexpected outcomes.
The facetted model for categories has clearly picked up on the fact that our simulated youngsters have little interest in purchasing the one red-hot vehicle our ILS had on offer.
Also, it would seem that customer age is an excellent predictor for the acceptance of pink products.
Looking at the key drivers for the All Offers choice we can see the relative importance of the different facets to the prediction of overall likelihood.
The comparative importance of the category facet for overall prediction might, in part, be explained by the clear preference of younger customers for toys over other product types; as is evident from the report on the predictiveness of customer age for offer category acceptance.
Oracle Real-Time Decisions’ flexible decisioning framework allows for the construction of exceptionally elaborate prediction models that facilitate powerful targeting, but nonetheless provide insightful reporting. Although few customers will have a direct need for such a sophisticated solution architecture, it is encouraging to see that this lies within the realm of the possible with RTD; and this with limited configuration and customization required.
There are obviously numerous other ways in which the predictive and reporting capabilities of Oracle Real-Time Decisions can be expanded upon to tailor to individual customers’ needs. We will not be able to elaborate on them all on this blog; and finding the right approach for any given problem is often more difficult than implementing the solution. Nevertheless, we hope that these last few posts have given you enough of an understanding of the power of the RTD framework and its models; so that you can take some of these ideas and improve upon your own strategy.
As always, if you have any questions about the above—or any Oracle Real-Time Decisions design challenges you might face—please do not hesitate to contact us; via the comments below, social media or directly at Oracle. We are completely multi-channel and would be more than glad to help.
Update (19th September, 2012): Please note that the above is intended only as an example. The implementation details shown here have been simplified so as to keep the application comprehensible within a single blog post. Most notably, a shortcut that we did not highlight has been taken with regards to the way feedback events are recorded.
When implementing combined likelihood models in production systems, care must be taken to ensure that the analytical scores used when recording feedback for a choice are exactly identical to the scores used when this particular choice was recommended. In real world applications, this will require that all scores for recommended choices are stored separately on the session for later retrieval, rather than recalculating the scores when feedback occurs (which is what the example above will do).
The method described here is far from trivial. We would therefore recommend that you do not apply these techniques in an initial implementation of Oracle Real-Time Decisions, and that you enlist the help of experienced RTD resources to ensure the resulting implementation is correct.