Forever Learning

Forever learning and helping machines do the same.

Archive for the ‘Big Data’ Category

Snake Oil and Tiger Repellant

leave a comment »

The Wall Street Journal has an interesting article explaining how companies are starting to use (big) data to support their recruiting efforts. It provides a good example of the more general trend in businesses towards evidence-based decisioning and data science, but it also shows how some crucial aspects of these techniques are easily overlooked or oversimplified.

My big-data-science-bogus-alarm started ringing upon reading the last sentence in this short paragraph.

Applicants for the job take a 30-minute test that screens them for personality traits and puts them through scenarios they might encounter on the job. Then the program spits out a score: red for low potential, yellow for medium potential or green for high potential. Xerox accepts some yellows if it thinks it can train them, but mostly hires greens.

Sounds smart, right? Well, maybe.

If Xerox never hires any “reds” and only very few “yellows”, how will they know the program is actually working? How will they know that all that complicated math is doing something more than simply returning random colour values? An evidence-based approach should always include some form of scientific control. If it doesn’t, it might as well be snake oil.

Of course, this is probably just a simple journalistic crime of omission of a trivial implementation detail, but it reminded me of that old chestnut “the tiger repellant”. For your convenience, this blogpost has been equipped with some very strong Tiger Repellant tonic. If you do not see any tigers around you right now, you will know it is working.

See? No tigers?

Proven to work like a charm. Order yours today! Great prices! Limited availability! Now taking applications in the comments.

[ Disclaimer: Tiger Repellant is not certified for use in South-East Asia or zoological parks. Tiger Repellant inc. and its employees and subsidiaries cannot be held liable for any damage caused to your person in the event of being eaten by a tiger. ]

Written by Lukas Vermeer

September 26, 2012 at 14:34

Big Data is Big

leave a comment »

We happen to have one sat in the next building over. Would you guys like to see it?

Oh, boy! Would we!

Myself and about twenty other Oracle employees are attending a Cloudera training on Hadoop in the Oracle Reading office. Five days packed with information covering a whole new ecosystem filled with some pretty crazy beasts.

Our heads are spinning like a room full of network-attached storage and our pens are humming like a data center cooling system as we attempt to map and reduce every little piece of data they throw at us.

During one of the breaks, we get the opportunity to go see the Oracle Big Data Appliance. Standing in front of this enormous machine, it finally dawns on me what a massive bulk of raw power this really is. A seemingly countless number of disks are mounted in a box higher and wider than myself. Each disk can hold three terabytes of data.

Big Data is Big!

Written by Lukas Vermeer

February 23, 2012 at 19:24

%d bloggers like this: