Forever Learning

Forever learning and helping machines do the same.

Archive for the ‘Data Science’ Category

Predictive Analytics World London

In October I’ll speak at Predictive Analytics World in London. Once again, I’ll be talking about Data Science.

You can register for the event on the site. Slides are already available online.

Written by Lukas Vermeer

September 9, 2013 at 17:24

Posted in Data Science, Meta

Simulating Repeated Significance Testing

My colleague Mats has an excellent piece on the topic of repeated significance testing on his blog.

To demonstrate how much [repeated significance testing] matters, I’ve run a simulation of how much impact you should expect repeat testing errors to have on your success rate.

The simulation simply runs a series of A/A conversion experiments (i.e. there is no difference in conversion between the two variants being compared) and shows how many experiments ended with a significant difference, as well as how many were ever significant somewhere along the course of the experiment. To correct for wild swings at the start of the experiment (when only a few visitors have been simulated), a cutoff point (minimum sample size) is defined before which no significance testing is performed.

Although the post includes a link to the Perl code used for the simulation, I figured that for many people downloading and tweaking a script would be too much of a hassle, so I’ve ported the simulation to a simple web-based implementation.

You can tweak the variables and run your own simulation in your browser here, or fork the code yourself on GitHub.
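For readers who would rather see the idea in a few lines, here is a rough Python sketch of the kind of simulation described above. It is neither the original Perl script nor the web version. The pooled two-proportion z-test, the 5% significance level, and every name and default value (`simulate`, `cutoff`, `check_every`, and so on) are my own assumptions, chosen only for illustration.

```python
import math
import random


def z_test_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value of a pooled two-proportion z-test (an assumed test choice)."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0  # no conversions (or only conversions) so far: nothing to test
    z = (conv_a / n_a - conv_b / n_b) / se
    return math.erfc(abs(z) / math.sqrt(2))


def run_aa_experiment(rng, visitors, rate, cutoff, check_every, alpha):
    """Simulate one A/A experiment.

    Returns (significant_at_end, ever_significant). Both variants share the
    same true conversion rate, so any significant result is a false positive.
    """
    conv_a = conv_b = 0
    ever_significant = False
    p_value = 1.0
    for n in range(1, visitors + 1):
        conv_a += rng.random() < rate
        conv_b += rng.random() < rate
        # No testing before the cutoff; afterwards peek every `check_every`
        # visitors, and always once more at the very end.
        if n >= cutoff and (n % check_every == 0 or n == visitors):
            p_value = z_test_p_value(conv_a, n, conv_b, n)
            ever_significant = ever_significant or p_value < alpha
    return p_value < alpha, ever_significant


def simulate(experiments=200, visitors=2000, rate=0.1, cutoff=200,
             check_every=50, alpha=0.05, seed=1):
    """Return the fraction of experiments significant at the end, and ever."""
    rng = random.Random(seed)
    ended = ever = 0
    for _ in range(experiments):
        at_end, at_any_point = run_aa_experiment(
            rng, visitors, rate, cutoff, check_every, alpha)
        ended += at_end
        ever += at_any_point
    return ended / experiments, ever / experiments
```

Calling `simulate()` returns two fractions: the share of A/A experiments that ended with a significant difference, and the share that were significant at any check along the way. The second can never be smaller than the first, and the gap between the two is the cost of stopping at the first significant peek.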

Written by Lukas Vermeer

August 23, 2013 at 15:47

Data Science: for Fun and for Profit

In the next few weeks I’ll be giving two talks on the topic of Data Science at Xebicon and another event affiliated with Xebia. There is an abstract of my spiel available on the Xebicon site.

Data Science is one of the most exciting developing fields in technology today. Ever-expanding data sets and increasing computing power allow statisticians and computing scientists to explore new business opportunities that were simply not possible just a few years ago. Although their applications are new, the ideas and techniques that form the underpinnings for this evidence-oriented discipline have a solid foundation in hundreds of years of scientific development. In order then to understand the new science of data, one must first understand the science of science.

The Scientific Method, the unintended effects of repeated significance testing and Simpson’s paradox: this talk will focus on the practical applications of the theoretical constructs that lie at the heart of Data Science, and expand on some potential pitfalls of statistical analysis that you are likely to encounter when venturing into the field.

If you’re interested, feel free to sign up for either event. I’ll also post slides and additional thoughts here afterwards.

Written by Lukas Vermeer

May 17, 2013 at 10:37

Posted in Data Science, Meta
