Archive for the ‘Statistics’ Category
Extrapolation
There are two types of people in this world:
- Those who can extrapolate from incomplete data.
[ Via @professorkitteh. Original source unknown. ]
Why Metrics Matter
Daring Fireball quotes some interesting research findings related to what Barry Schwartz dubbed The Paradox Of Choice.
About 60% of the people stopped when we had 24 jams on display and then at the times when we had 6 different flavors of jam out on display only 40% of the people actually stopped, so more people were clearly attracted to the larger varieties of options, but then when it came down to buying, so the second thing we looked at is in what case were people more likely to buy a jar of jam.
What we found was that of the people who stopped when there were 24 different flavors of jam out on display only 3% of them actually bought a jar of jam whereas of the people who stopped when there were 6 different flavors of jam 30% of them actually bought a jar of jam. So, if you do the math, people were actually 6 times more likely to buy a jar of jam if they had encountered 6 than if they encountered 24, so what we learned from this study was that while people were more attracted to having more options, that’s what sort of got them in the door or got them to think about jam, when it came to choosing time they were actually less likely to make a choice if they had more to choose from than if they had fewer to choose from.
A fascinating psychological effect with clear implications for display advertising, but there is a lesson here for online marketeers and analysts as well.
In this study, fewer people stopped when there was less choice, but more people actually bought something. If we were only measuring the former (i.e. attention), and not the latter (i.e. sales), we would be led to think more choice would be about 50% more effective at bringing in customers. And boy, would we be wrong!
Metrics matter; especially when you are using a system which can automatically optimize your process in order to maximize those metrics.
Don’t get yourself in a jam; remember this next time you decide to measure click acceptance instead of actual sales to drive your online marketing effort. Clickthrough rates are useful as a measure by proxy, but they can be misleading.
Predicting Pi
A few weeks ago, I showed a colleague* my little visual demo of my favorite algorithm for estimating the number Pi. His immediate response was so obvious I am almost ashamed it did not occur to me before he mentioned it.
“RTD could do that!”
Turns out, he is of course right. Although Oracle Real-Time Decisions was certainly not designed for the task, it does a pretty good job of predicting Pi, given the right inputs and some mathemagic.
To reiterate, the idea behind the original algorithm is that we throw a bunch (well actually quite a lot) of darts at a square board (uniformly distributed, of course). We then count the number of darts that land in the largest circle we can fit in this square. The ratio between the darts in the circle and the total number of darts thrown, multiplied by four happens to approximate Pi for reasons explained in an earlier post.
The key thing to understand when using RTD to implement this algorithm is that the ratio described above also represents the odds that a single dart will land in the circle. More easily put, the number of darts that land in the circle divided by the total number of darts thrown equals the chance of a dart landing in the circle. If we can predict the odds of hitting the circle we can predict Pi; and RTD is pretty good at making predictions.
The Setup.
I’ll run through the RTD setup briefly. If you’re interested in a more detailed explanation how I did this, feel free to drop me a line.
RTD can only predict the likelihoods related to choices, so we’ll need a couple of those. For this experiment, we’ll pretend we have a choice of landing the dart inside or outside the circle. We’ll then diligently record whichever of those two happened to be the case each time we throw a dart so that RTD will learn to predict how likely either outcome is.
[ You can click on the screenshots below for a closer look. ]
As you can see, I’ve already included the likelihood as a choice attribute to be returned to the calling application. This attribute will be populated by a model aptly named Pi.
The Pi model could hardly be simpler. It will predict the likelihood of mutually exclusive choices (hence the checkbox at the bottom of the configuration) from the choice group Choices.
Note that Pi is configured as a simple so-called Choice Model, not the more common Choice Event Model. The latter, more complex model is more frequently used in practice, because it allows RTD to predict multi-step likelihoods (customer clicked an add, then put the product in his or her shopping basket, then proceeded to checkout, then completed the payment) for positive as well as negative events (e.g. the customer actively declines the offer).
We’ll use a client application similar to the one used in the earlier Pi experiments to simulate the throwing of darts. This application will also show the predicted value of Pi. For this client to tell RTD about darts thrown and RTD to respond with the predicted value we will use a single advisor. The RTD server will generate a web service based on this configuration.
Each request to this advisor will tell RTD about a single dart thrown. The Dart Was In Circle boolean input parameter seems pretty self-explanatory. To process this input and feed the model we will need some simple logic.
This is all that is needed to allow RTD to build a model that can predict the likelihoods for the darts landing inside or outside the circle.
[ Of course we also need so performance goal and decisions to allow the advisor request to return the two choices created, but we're not really interested in that here. As long as the request returns the In Circle choice and its Likelihood attribute we're happy. I'll leave this part of the configuration as an exercise for the reader. ]
The Result.
It takes RTD a while (about a thousand darts or so) to come up with a prediction at all. But when it does, it is pretty much spot on.
The prediction RTD makes seems to be less sensitive to fluctuations resulting from the randomized input. At times, RTD’s guess was even closer to the actual number Pi than the value calculated mathematical function I’d used in previous experiments (and of course, I couldn’t resist taking a screenshot when it was).
RTD not only good for 986% ROI, but also for predicting Pi.
And this cake is no lie.
[* I forget who exactly suggested this. Either Alan, Simon or Tim. ]
Scientific Advertising on Steroids
It seems that my little rant against the apparent lack of scientific rigor and the use of data to analyse performance in the world of advertising is nothing new. Scientific Advertising, written in 1923 by advertising icon Claude C. Hopkins, lays out a convincing argument in favor of the use of an empirical and results-oriented approach in all marketing.
The bottom line of this argument is the same as the true bottom line for every company. It’s all about the money.
Scientific advertising is impossible without [knowing your results]. So is safe advertising. So is maximum profit.
Groping in the dark in this field has probably cost enough money to pay the national debt. That is what has filled the advertising graveyards. That is what has discouraged thousands who could profit in this field. And the dawn of knowledge is what is bringing a new day in the advertising world.
Hopkins pioneered the use of keyed coupons to track the success of different campaigns and ads. He believed that the only purpose of advertising was to sell more products and that the effects of such efforts should be measurable and those responsible be held accountable. New ideas and concepts should be tried in a small, controlled and safe setting so that their (monetary) results could be measured and analyzed.
Only when a new approach proved to be successful in a number of trails could it be trusted to be applied at larger scale. Take this passage from his autobiography My Life in Advertising.
How have I been able to win from this situation so many great successes? Simply because I made so many mistakes in a small way, and learned something from each. I made no mistake twice. Every once in a while I developed some great advertising principle. That endured.
The technology of the time allowed Hopkins et al. to try new things and make mistakes only on a per town basis. Results had to be analyzed manually and each iteration required significant effort and some investment. Still, what knowledge could be gleaned from these relatively small scale ventures proved key to Hopkins’ success in advertising.
Judging by his own accounts it never occurred to Hopkins that different ads would have different results for different towns or different people. He was simply empirically searching for the perfect ad; one town at a time. Once found, this super ad would be unleashed upon the entire nation.
In that light, Oracle Real-Time Decisions (RTD) is like the traditional scientific advertising method on steroids. Not only does it apply the concepts of empirically testing success and failing in small doses and learns from those mistakes automatically; it is also able to segment the respondents into an seemingly infinite number of sub-groups and find a super ad for each.
Computing Science has taken Scientific Advertising to the next level; and you cannot afford not to follow.
[As a side note, I think it important to realize that Scientific Advertising was written before Edward Bernays took his uncle's ideas and used them to revolutionize the field of advertising. Thanks to Hopkins's scientific and empirical approach most of the facts and results cited still hold water, but some of the explanations and conclusions he puts forward are terribly outdated. If you want to know more, I can highly recommend Adam Curtis's award-winning 2002 documentary for the BBC, The Century of Self.]
More Pi
In a previous post I’ve explained how we can estimate the number Pi using Monte Carlo Methods (basically throwing darts at a circle). I personally think this method is quite elegant, but it is not particularly efficient. It might take a very, very, very long time for it to converge to anything close to the actual number. Still, pretty awesome.
This week I’ve compiled a simple, interactive and more visual example using the jQuery library and SVG graphics. You can find it here. Let me know what you think.
Update (april 2nd 2011): Code for this project is now available on Github.
a Coccyx in the Brain
You might not realize it, but your brain is running some pretty wonky legacy code. Stuff that worked fine back in the day, but is not very well suited for the challenges of modern life. Let me give you an example.
Leonard Mlodinow writes in The Drunkard’s Walk:
Which is greater: the number of six-letter English words having n as their fifth letter or the number of six-letter English words ending in ing? Most people choose the group of words ending in ing. Why? Because words ending in ing are easier to think of than generic six-letter words having n as their fifth letter.
If you have no difficulty coming up with six-letter English words that have an n as their fifth letter, then you’ve probably been playing too much Words with Friends (I am certainly guilty as charged on that count). Anyway, Mlodinow continues:
But you don’t have to survey the Oxford English Dictionary—or even know how to count—to prove that guess wrong: the group of six-letter words having n as their fifth letter includes all six-letter words ending in ing.
Just like the “Linda is a bank teller” example, one answer encompasses the whole of the other answer.
[Note that Mlodinow implicitly (and correctly) assumes that there exist six-letter English words that have an n as their fifth letter but do not end in ing. Otherwise the two groups would have been equal in size.]
Mlodinow explains why we guessed wrong:
Psychologists call that type of mistake the availability bias because in reconstructing the past, we give unwarranted importance to memories that are most vivid and hence most available for retrieval.
In other words: if it’s easy to think of an example, we assume it’s pretty probable.
Good approximation heuristic for a caveman, maybe not so great for modern man. A coccyx in the brain.
In his book The Science of Fear author Daniel Gardner talks about this availability bias (he calls it the ‘example rule’ and Kahneman and Tversky called it the ‘availability heuristic’, but they all mean the same thing) in the context of (perceived) security.
But car crashed aren’t like terrorist hijackings. They aren’t covered live on CNN. They aren’t discussed endlessly by pundits. They don’t inpsire Hollywood movies and television shows. They aren’t fodder for campaigning politicians.
As a consequence of all this public attention, we have no trouble at all thinking of examples of terrorism. Conversely, we have more difficulty thinking of examples of car crashes. Our prehistoric wetware falsely intuits that the former is more likely than the latter; even when we know this to be untrue.
Caveman fears ending with a bang more than the end of the road, because bangs make better stories.
I think it’s about time for a patch. Caveman needs an upgrade. Until then, please be wary of the bugs in your brain.
[Just so you know, the word 'coccyx' is worth a whopping 4+1+4+4+3+8 = 24 points (excluding bonuses) in Words with Friends. However, as there are only two c's available in each game, you would need both c's and a joker (or two jokers and a c) in order to be able to play it for twenty points (sixteen with double jokers).]
Blogging with the Stars
I started a blog because I wanted to improve my writing skills; and thought I had something to say. You, as a reader can help me figure out how I am doing by rating each post (preferably after you’ve actually read it) on a five star scale.
To give you some guidance I’ve compiled a summary of how I interpret each star rating.
- Fail.
- tl;dr.
- I like turtles! (meh)
- Orly? Om nom nom!
- Double Rainbow / Over 9000!!!1111one
Please take the time to consider how you feel about my past and future posts and use the star ratings to let me know. Thanks!
Also, feel free to come up with your own interpretation of the five star scale and share it in the comments.
the Moral of a Good Story
I love reading. I love reading about statistics. I love reading about psychology. But most of all, I love reading quite a few books about statistics and psychology. I might just be a little weird, but statistics and psychology do make up a large part of my job.
One study that fascinates me is Extension versus intuitive reasoning: The conjunction fallacy in probability judgment by Daniel Kahneman and Amos Tversky (and I am not the only one, this study appears in many books by different authors and Kahneman was rewarded the Nobel Memorial Prize in Economics for his efforts in 2002). In this study they asked participants (among other things) the following question.
Linda is 31 years old, single, outspoken, and very bright. She majored in philosophy. As a student, she was deeply concerned with issues of discrimination and social justice, and also participated in anti-nuclear demonstrations.
Which is more probable?
- Linda is a bank teller.
- Linda is a bank teller and is active in the feminist movement.
If you are like the vast majority (~85%) of the participants, you feel the second option is more probable. You will also, like that same vast majority, be wrong. Wikipedia does a pretty good job explaining why this is a conjunction fallacy. How can A + B be more probable than just A (and possibly B, but perhaps not B)? Kahneman and Tversky found that this effect does not occur when the questions seem unrelated to what the participants already knew about Linda.
The perceived probability, it seems, depends on how well the answer fits in our internalized story of Linda, not logic or statistics. A good story is viewed as more likely than a poorly framed fact. This is important, if your job is helping people make decisions based on numbers and statistics.
What do you think is more likely: that you understand the implications of this study or that you understand the implications of this study because I’ve provided you with a vivid (and self-referential) example of its application?
[do you see what I did there?]
Benford’s Law Revised
After posting my previous article I started wondering why my transactional data contained a disproportional amount of twos. Something must be distorting the natural spread of first digits that Benford predicts. Later, while discussing this aberration with a friend, I finally figured it out.
It’s me. I distort the data. In fact, I distort the data every time I draw cash from an ATM.
Like most people I have habits. One of them is that when I use an ATM I almost always withdraw twenty euros (I find cash a nuisance and prefer plastic, so I withdraw as little as I can manage). This, almost deterministic behavior, is the reason why there are a disproportionate amount of twos in the data. It seems my previous conclusion applies to the data in more ways than one.
Real-life data is obviously not random data, and when you think about it there are perfectly logical explanations for this result.
If I remove ATM transactions from the original dataset the graph looks different. Still not exactly matching Benford’s prediction, but certainly a lot closer.
Update (april 2nd 2011): Code for this project is now available on Github.
Benford’s Law
In a (not so) recent episode of Radio Lab (one of my favorite podcasts) I was introduced to Benford’s Law. I’d never heard of this phenomenon, which in hindsight is rather strange, because it’s effects are so profound. Let me explain.
Take a some numerical data from a real-life source, bank transaction amounts for instance, and tally the number of transactions whose amount starts with a one (i.e. whose first digit is a ’1′ and not ’2′ or ’9′). What percentage of the transactions would you expect to match this criterion? What about the number of transactions whose amount starts with a three?
I’d never really though about this before. Implicitly I had always assumed that in large sets of data the distribution of first digits was equal amongst the numbers one through nine. This would result in the answer to the two questions above being “eleven percent” in both cases (zero is not considered, so there are nine possible first digits).
Benford’s Law predicts that I was wrong, and it turns out he is right.
The answers according to Benford’s Law are “probably about thirty percent” and “probably about twelve percent”. And using my own bank transaction amounts and some javascript gimmickry we can see that he is closer to the actual numbers than I was.
Real-life data is obviously not random data, and when you think about it there are perfectly logical explanations for this result. Still, for me, this is something I had to see in order to believe.
I encourage you to try it out yourself on your own data. I’m curious to know what you find.
Update (april 2nd 2011): Code for this project is now available on Github.







