Forever Learning

Forever learning and helping machines do the same.

Archive for the ‘Datamining’ Category

The Middle Way


James Taylor is spot-on.

Too many analytic professionals think that only the data speaks and that business rules are, as someone once said to me, “for people too stupid to analyze their data”. Similarly too many IT professionals think that everything can be reduced to business rules or to code using explicit analysis. The reality for most decisions is somewhere in between.

In order to truly achieve business transcendence one must follow the Middle Way.

Written by Lukas Vermeer

May 2, 2012 at 14:59

Extrapolation


There are two types of people in this world:

  1. Those who can extrapolate from incomplete data.

[ Via @professorkitteh. Original source unknown. ]

Written by Lukas Vermeer

April 12, 2012 at 11:15

the Same Old Song


[I've tweeted about this before.]

A few months ago my friend and neighbor Olav was fiddling around with a dataset of movie plot descriptions he downloaded from the Internet Movie Database (IMDb). If I recall correctly, he was taking a stab at the Netflix Prize. We discussed this for a while over coffee, but (as usual) our conversations were all over the place; and somewhere along the line we wondered what songs are used most often in movies.


What is that song they always play? The one that goes like ‘#dun dun dun dun dudun dun dun duuuuun#‘. You know?

The IMDb site offers lots of different datasets for download, and we quickly found that one of them contains soundtrack listings (the aptly named file soundtracks.list.gz). Now it was just a matter of filtering out the unnecessary contextual data and counting songs. Olav, who does datamining for a living, quickly got all this done using spiffy point-and-click tools. I proceeded to ask Twitter what people thought the answer would be.
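For anyone who prefers a script to point-and-click tools, the filter-and-count step could be sketched roughly like this in Python. This is not what Olav actually did, and the line format is an assumption on my part: I am assuming each song entry is a line that starts with a dash followed by the title in double quotes, and that everything else (movie headers, credits) can be ignored. The sample lines and movie names below are made up for illustration.

```python
import re
from collections import Counter

# Assumed shape of a song entry: a dash, then the title in double quotes.
# Any other line (movie header, credit, blank) is treated as context and skipped.
SONG_LINE = re.compile(r'^-\s*"(?P<title>[^"]+)"')


def count_songs(lines):
    """Count how often each quoted song title appears in the given lines."""
    counts = Counter()
    for line in lines:
        match = SONG_LINE.match(line)
        if match:
            counts[match.group("title").strip()] += 1
    return counts


# Hypothetical sample data, not taken from the real file.
sample = [
    "# Hypothetical Movie A (1990)",
    '- "Jingle Bells"',
    "  Performed by somebody",
    "",
    "# Hypothetical Movie B (1995)",
    '- "Jingle Bells"',
    '- "Auld Lang Syne"',
]

top = count_songs(sample).most_common(2)
print(top)  # [('Jingle Bells', 2), ('Auld Lang Syne', 1)]
```

To run this against the real download, you could pass `count_songs` a file handle from `gzip.open("soundtracks.list.gz", "rt", encoding="latin-1")`. The encoding is a guess, and the pattern may need adjusting once you see the actual file.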

The top five results turned out to be a collection of classics. The songs played most often in movies (according to the IMDb data) are as follows.

  1. “Jingle Bells” (220x)
  2. “William Tell Overture” (204x)
  3. “Home Sweet Home” (160x)
  4. “Auld Lang Syne” (149x)
  5. “Rock-a-Bye Baby” (140x)

Not at all what we were expecting, but quite obvious when you think about how many Christmas movies are out there. Data mining is very often like that. You find answers that were unexpected, but also unsurprisingly obvious.

It’s the same song, but it never gets old.

[Much later, a friend (can't remember exactly who) noted that the song that is played most often in theaters is probably not listed in the data set the IMDb provides. It's the 20th Century Fox intro.]

Written by Lukas Vermeer

July 16, 2010 at 16:53

Posted in Datamining
