Forever Learning

Forever learning and helping machines do the same.

Benford’s Law Revised

leave a comment »

After posting my previous article I started wondering why my transactional data contained a disproportional amount of twos. Something must be distorting the natural spread of first digits that Benford predicts. Later, while discussing this aberration with a friend, I finally figured it out.

It’s me. I distort the data. In fact, I distort the data every time I draw cash from an ATM.

Like most people I have habits. One of them is that when I use an ATM I almost always withdraw twenty euros (I find cash a nuisance and prefer plastic, so I withdraw as little as I can manage). This, almost deterministic behavior, is the reason why there are a disproportionate amount of twos in the data. It seems my previous conclusion applies to the data in more ways than one.

Real-life data is obviously not random data, and when you think about it there are perfectly logical explanations for this result.

If I remove ATM transactions from the original dataset the graph looks different. Still not exactly matching Benford’s prediction, but certainly a lot closer.

Tally of the first digits of bank transaction amounts excluding ATM transactions

Tally of the first digits of bank transaction amounts excluding ATM transactions


Update (april 2nd 2011): Code for this project is now available on Github.

Advertisements

Written by Lukas Vermeer

June 16, 2010 at 21:23

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: