<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>Forever Learning</title>
	<atom:link href="http://lukasvermeer.wordpress.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://lukasvermeer.wordpress.com</link>
	<description>Forever learning and helping machines do the same.</description>
	<lastBuildDate>Thu, 06 Jun 2013 07:36:05 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<cloud domain='lukasvermeer.wordpress.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://1.gravatar.com/blavatar/99ad764cfc587d8dd9fc72725309c005?s=96&#038;d=http%3A%2F%2Fs2.wp.com%2Fi%2Fbuttonw-com.png</url>
		<title>Forever Learning</title>
		<link>http://lukasvermeer.wordpress.com</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://lukasvermeer.wordpress.com/osd.xml" title="Forever Learning" />
	<atom:link rel='hub' href='http://lukasvermeer.wordpress.com/?pushpress=hub'/>
		<item>
		<title>Data Science: for Fun and for Profit</title>
		<link>http://lukasvermeer.wordpress.com/2013/05/17/data-science-for-fun-and-for-profit/</link>
		<comments>http://lukasvermeer.wordpress.com/2013/05/17/data-science-for-fun-and-for-profit/#comments</comments>
		<pubDate>Fri, 17 May 2013 08:37:36 +0000</pubDate>
		<dc:creator>Lukas Vermeer</dc:creator>
				<category><![CDATA[Data Science]]></category>
		<category><![CDATA[Meta]]></category>
		<category><![CDATA[data science]]></category>

		<guid isPermaLink="false">http://lukasvermeer.wordpress.com/?p=1666</guid>
		<description><![CDATA[In the next few weeks I&#8217;ll be giving two talks on the topic of Data Science at Xebicon and another event affiliated with Xebia. There is an abstract of my spiel available on the Xebicon site. Data Science is one of the most exciting developing fields in technology today. Ever expanding data sets and increasing computing power [&#8230;]]]></description>
				<content:encoded><![CDATA[<p>In the next few weeks I&#8217;ll be giving two talks on the topic of Data Science at <a href="http://www.xebicon.nl/programma">Xebicon</a> and <a href="http://datafun.eventbrite.com/">another event</a> affiliated with <a href="http://xebia.com/">Xebia</a>. There is an abstract of my spiel available on the Xebicon site.</p>
<blockquote><p>Data Science is one of the most exciting developing fields in technology today. Ever expanding data sets and increasing computing power allow statisticians and computing scientists to explore new business opportunities that were simply not possible merely a few years ago. Although their applications are new, the ideas and techniques that form the underpinnings for this evidence-oriented discipline have a solid foundation in hundreds of years of scientific development. In order then to understand the new science of data, one must first understand the science of science.</p>
<p>The Scientific Method, the unintended effects of repeated significance testing and Simpson&#8217;s paradox: this talk will focus on the practical applications of the theoretical constructs that lie at the heart of Data Science; and expand on some potential pitfalls of statistical analysis that you are likely to encounter when venturing into the field.</p></blockquote>
<p>If you&#8217;re interested, feel free to sign up for either event. I&#8217;ll also post slides and additional thoughts here afterwards.</p>
]]></content:encoded>
			<wfw:commentRss>http://lukasvermeer.wordpress.com/2013/05/17/data-science-for-fun-and-for-profit/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/da492bd819224723b7d81bb6ae5cae78?s=96&#38;d=http%3A%2F%2F1.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96&#38;r=G" medium="image">
			<media:title type="html">lukasvermeer</media:title>
		</media:content>
	</item>
		<item>
		<title>A New Kind Of Science?</title>
		<link>http://lukasvermeer.wordpress.com/2013/05/02/a-new-kind-of-science/</link>
		<comments>http://lukasvermeer.wordpress.com/2013/05/02/a-new-kind-of-science/#comments</comments>
		<pubDate>Thu, 02 May 2013 15:20:48 +0000</pubDate>
		<dc:creator>Lukas Vermeer</dc:creator>
				<category><![CDATA[Meta]]></category>
		<category><![CDATA[data science]]></category>

		<guid isPermaLink="false">http://lukasvermeer.wordpress.com/?p=1654</guid>
		<description><![CDATA[I am no longer a Corporate Ninja. As of a few weeks ago I can now call myself &#8220;Data Scientist at Booking.com&#8221;. Although I am really excited about the new challenges and opportunities that await me in the sexiest job of the 21st century, I must say this new title bothers me ever so slightly. It somehow seems so [&#8230;]]]></description>
				<content:encoded><![CDATA[<p>I am no longer a <a href="http://lukasvermeer.wordpress.com/2010/05/07/i-am-a-business-ninja/">Corporate Ninja</a>. As of a few weeks ago I can now call myself &#8220;<a href="http://www.linkedin.com/in/lukasvermeer">Data Scientist</a> at <a href="http://www.booking.com/">Booking.com</a>&#8221;.</p>
<p>Although I am really excited about the new challenges and opportunities that await me in <a href="http://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century/">the sexiest job of the 21st century</a>, I must say this new title bothers me ever so slightly. It somehow seems so redundant.</p>
<p><strong>If you&#8217;re not using data, is it really <a href="http://en.wikipedia.org/wiki/Science">science</a>?</strong></p>
]]></content:encoded>
			<wfw:commentRss>http://lukasvermeer.wordpress.com/2013/05/02/a-new-kind-of-science/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/da492bd819224723b7d81bb6ae5cae78?s=96&#38;d=http%3A%2F%2F1.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96&#38;r=G" medium="image">
			<media:title type="html">lukasvermeer</media:title>
		</media:content>
	</item>
		<item>
		<title>Restaurant Reviews and the Availability Heuristic</title>
		<link>http://lukasvermeer.wordpress.com/2013/03/22/restaurant-reviews-and-the-availability-heuristic/</link>
		<comments>http://lukasvermeer.wordpress.com/2013/03/22/restaurant-reviews-and-the-availability-heuristic/#comments</comments>
		<pubDate>Fri, 22 Mar 2013 16:11:15 +0000</pubDate>
		<dc:creator>Lukas Vermeer</dc:creator>
				<category><![CDATA[Psychology]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[Statistics]]></category>
		<category><![CDATA[availability heuristic]]></category>
		<category><![CDATA[statistics]]></category>

		<guid isPermaLink="false">http://lukasvermeer.wordpress.com/?p=1613</guid>
		<description><![CDATA[You could say fine dining is a bit of a hobby of mine; and as I&#8217;ve mentioned before, I&#8217;ve composed quite a few restaurant reviews over the years. I enjoy writing about food almost as much as I love eating it. Whilst fantasising about fancy food with a colleague the other day, we wondered whether there is [&#8230;]]]></description>
				<content:encoded><![CDATA[<p>You could say fine dining is a bit of a hobby of mine; and as I&#8217;ve <a href="http://lukasvermeer.wordpress.com/2012/01/02/from-restaurant-advice-to-recommending-associates/">mentioned before</a>, I&#8217;ve composed quite a few <a href="https://plus.google.com/106518479241050228896/reviews">restaurant reviews</a> over the years. I enjoy writing about food almost as much as I love eating it.</p>
<p>Whilst fantasising about fancy food with a colleague the other day, we wondered whether there is any relation between the length of my reviews and the scores I give. In some strange way it made intuitive sense to me that I would devote more words to describing why a particular restaurant did not live up to my expectations.</p>
<p>Thinking about this, the first negative review that came to my mind was one I wrote for <a href="https://plus.google.com/106395365263529906125/about">&#8220;The Good View&#8221; in Chiang Mai, Thailand</a>.</p>
<blockquote><p>If service were any slower than it already is, cobwebs would certainly overrun the place. When food and drinks eventually do arrive they&#8217;re hardly worth the wait.</p>
<p>Fruit juice contained more sugar than a Banglamphu brothel and cocktails had less alcohol in them than a Buddhist monk. The mixed Northern specialties appetizer revealed itself to be three kinds of sausage and some raw chillies; very special indeed.</p>
<p>The spicy papaya salad probably tasted alright, but I was unable to tell because my taste buds were destroyed on the first bite. (Yes, I see the irony in complaining a spicy papaya salad was too spicy, but in my mind there&#8217;s a difference between spicy food and napalm.)</p>
<p>Also, the view is terribly overrated.</p></blockquote>
<p>Conversely, the first positive review that popped into my brain was this rather terse piece for <a href="https://plus.google.com/116496292311938777959/about">&#8220;Opium&#8221; in Utrecht, the Netherlands</a>.</p>
<blockquote><p>Om nom nom.</p></blockquote>
<p>Judging by this tiny sample there might indeed be something to the hypothesis that review length and review score are negatively correlated. To test my hunch, I decided to load my reviews into <a href="http://www.r-project.org/">R</a> for a proper statistical analysis.</p>
<pre class="brush: r; title: ; notranslate">
&gt; cor.test(nn_reviews$char_count, nn_reviews$score)

Pearson's product-moment correlation
data: nn_reviews$char_count and nn_reviews$score
t = 0.2246, df = 121, p-value = 0.8227
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
-0.1571892 0.1967366

sample estimates:
   cor
0.02041319
</pre>
<p>To my surprise, the analysis shows practically no linear relation between length and score (r &#8776; 0.02, p &#8776; 0.82). Contrary to what the two reviews above seem to suggest, I do not need more characters to describe an unpleasant dining experience than a pleasant one.</p>
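<p>As a sanity check, the t statistic and confidence interval reported by R can be reproduced from the correlation and the degrees of freedom alone. The following is a minimal sketch in Python rather than R. It assumes the usual formulas behind a Pearson correlation test (a t statistic with n - 2 degrees of freedom and a Fisher z interval); the function names are made up for illustration.</p>

```python
import math
from statistics import NormalDist

def t_statistic(r, df):
    """t statistic for testing that a Pearson correlation r is zero."""
    return r * math.sqrt(df / (1.0 - r * r))

def fisher_interval(r, n, level=0.95):
    """Approximate confidence interval for r via the Fisher z-transform."""
    z = math.atanh(r)
    half_width = NormalDist().inv_cdf(0.5 + level / 2.0) / math.sqrt(n - 3)
    return math.tanh(z - half_width), math.tanh(z + half_width)

r = 0.02041319       # sample estimate reported by cor.test
df = 121             # degrees of freedom, so n = df + 2 reviews
t = t_statistic(r, df)
low, high = fisher_interval(r, df + 2)

print(round(t, 4))                    # 0.2246
print(round(low, 3), round(high, 3))  # -0.157 0.197
```

<p>Both values agree with the R output, and the interval comfortably includes zero.</p>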
<p>A simple plot of the two variables gives some insight into a possible cause for my misconception.</p>
<div id="attachment_1616" class="wp-caption aligncenter" style="width: 660px"><img class="size-full wp-image-1616" alt="Review scores vs review length" src="http://lukasvermeer.files.wordpress.com/2013/03/review_scores_vs_length.jpeg?w=700"   /><p class="wp-caption-text">Review scores vs review length</p></div>
<p>The outlier in the bottom right happens to represent my review for the Good View. All my other reviews are much shorter in length and seem to be quite evenly distributed over the different scores.</p>
<p>My misjudgement is an excellent example of the <a href="http://en.wikipedia.org/wiki/Availability_heuristic">availability heuristic</a>. The pair of examples that presented themselves to me upon initial reflection were not representative of the complete set, but that did not stop me from drawing overarching, and incorrect, conclusions based on a sample of two.</p>
<p>This is why I use statistics: because I am a fallible human being, just like everyone else.</p>
]]></content:encoded>
			<wfw:commentRss>http://lukasvermeer.wordpress.com/2013/03/22/restaurant-reviews-and-the-availability-heuristic/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/da492bd819224723b7d81bb6ae5cae78?s=96&#38;d=http%3A%2F%2F1.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96&#38;r=G" medium="image">
			<media:title type="html">lukasvermeer</media:title>
		</media:content>

		<media:content url="http://lukasvermeer.files.wordpress.com/2013/03/review_scores_vs_length.jpeg" medium="image">
			<media:title type="html">Review scores vs review length</media:title>
		</media:content>
	</item>
		<item>
		<title>Evidence-Based Everything</title>
		<link>http://lukasvermeer.wordpress.com/2012/12/04/evidence-based-everything/</link>
		<comments>http://lukasvermeer.wordpress.com/2012/12/04/evidence-based-everything/#comments</comments>
		<pubDate>Tue, 04 Dec 2012 11:14:18 +0000</pubDate>
		<dc:creator>Lukas Vermeer</dc:creator>
				<category><![CDATA[Meta]]></category>
		<category><![CDATA[Psychology]]></category>
		<category><![CDATA[evidence]]></category>
		<category><![CDATA[evidence-based]]></category>
		<category><![CDATA[facts]]></category>

		<guid isPermaLink="false">http://lukasvermeer.wordpress.com/?p=1570</guid>
		<description><![CDATA[I&#8217;m not really interested in an exposition of your facts. I don&#8217;t very much care to learn about your reasons. First, show me your evidence. Once we&#8217;ve established what you think you&#8217;ve seen, we can talk about what you think it means. Supported by sufficient proof, your theories and derived truths should be much easier [&#8230;]]]></description>
				<content:encoded><![CDATA[<p>I&#8217;m not really interested in an exposition of your facts. I don&#8217;t very much care to learn about your reasons.</p>
<p>First, show me your evidence.</p>
<p>Once we&#8217;ve established what you think you&#8217;ve seen, we can talk about what you think it means. Supported by sufficient proof, your theories and derived truths should be much easier to express; sometimes they may even be self-evident.</p>
<p>Then we can both decide what to believe.</p>
]]></content:encoded>
			<wfw:commentRss>http://lukasvermeer.wordpress.com/2012/12/04/evidence-based-everything/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/da492bd819224723b7d81bb6ae5cae78?s=96&#38;d=http%3A%2F%2F1.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96&#38;r=G" medium="image">
			<media:title type="html">lukasvermeer</media:title>
		</media:content>
	</item>
		<item>
		<title>A/B Testing XXL</title>
		<link>http://lukasvermeer.wordpress.com/2012/11/30/ab-testing-xxl/</link>
		<comments>http://lukasvermeer.wordpress.com/2012/11/30/ab-testing-xxl/#comments</comments>
		<pubDate>Fri, 30 Nov 2012 16:05:23 +0000</pubDate>
		<dc:creator>Lukas Vermeer</dc:creator>
				<category><![CDATA[Marketing]]></category>
		<category><![CDATA[Psychology]]></category>
		<category><![CDATA[marketing]]></category>
		<category><![CDATA[optimisation]]></category>
		<category><![CDATA[testing]]></category>

		<guid isPermaLink="false">http://lukasvermeer.wordpress.com/?p=1580</guid>
		<description><![CDATA[[I've tweeted about this before.] If fashion stores believed in A/B testing, they would probably only sell white XXL shirts. Most customers would fit tent-sized garments; most colours go well with white. Giant colourless shirts would presumably have the better sales conversion rate by far. But of course this would be far from optimal. Customers come in different shapes [&#8230;]]]></description>
				<content:encoded><![CDATA[<p><span style="color:#bbb;">[I've <a href="http://twitter.com/lukasvermeer/status/273360780538281985">tweeted about this before</a>.]</span></p>
<p>If fashion stores believed in A/B testing, they would probably only sell white XXL shirts. Most customers would fit tent-sized garments; most colours go well with white. Giant colourless shirts would presumably have the better sales conversion rate by far.</p>
<p>But of course this would be far from optimal.</p>
<p>Customers come in different shapes and sizes. If you really want to maximise conversion, you will have to tailor to their specific needs and personal preferences. A/B testing might be the latest fashion, but the truth is that some customers will have a taste for B even though the majority might fancy A. This is why <a href="http://stevehanov.ca/blog/index.php?id=132">these 20 lines of code will beat A/B testing every time</a>.</p>
<p>The trick is not to figure out whether A is better than B, but when A is better than B; and for whom.</p>
<p>Marketing should not be one-size-fits-all.</p>
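<p>To make the point concrete, here is a small sketch in Python. The segments, variants and counts are entirely made up for illustration; they are not real data. It picks a winner the way a plain A/B test would, and then again per customer segment.</p>

```python
# Hypothetical results: (segment, variant) mapped to (visitors, purchases).
results = {
    ("tall", "A"): (800, 80),
    ("tall", "B"): (800, 40),
    ("petite", "A"): (200, 4),
    ("petite", "B"): (200, 30),
}

def conversion_rates(rows):
    """Pool (variant, visitors, purchases) rows into a rate per variant."""
    totals = {}
    for variant, visitors, purchases in rows:
        seen, bought = totals.get(variant, (0, 0))
        totals[variant] = (seen + visitors, bought + purchases)
    return {v: bought / seen for v, (seen, bought) in totals.items()}

def winner(rates):
    """Variant with the highest conversion rate."""
    return max(rates, key=rates.get)

# One-size-fits-all: ignore segments entirely.
overall = conversion_rates((v, n, k) for (_, v), (n, k) in results.items())

# Tailored: pick a winner for each segment separately.
per_segment = {
    segment: winner(conversion_rates(
        (v, n, k) for (s, v), (n, k) in results.items() if s == segment))
    for segment in {s for s, _ in results}
}

print(winner(overall))              # A
print(sorted(per_segment.items()))  # [('petite', 'B'), ('tall', 'A')]
```

<p>Note that this is not the epsilon-greedy method from the linked article; it only shows how a single overall winner can hide a segment that prefers the alternative.</p>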
]]></content:encoded>
			<wfw:commentRss>http://lukasvermeer.wordpress.com/2012/11/30/ab-testing-xxl/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/da492bd819224723b7d81bb6ae5cae78?s=96&#38;d=http%3A%2F%2F1.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96&#38;r=G" medium="image">
			<media:title type="html">lukasvermeer</media:title>
		</media:content>
	</item>
		<item>
		<title>Actionable Predictive Analytics with Oracle Data Mining</title>
		<link>http://lukasvermeer.wordpress.com/2012/10/03/actionable-predictive-analytics-with-oracle-data-mining/</link>
		<comments>http://lukasvermeer.wordpress.com/2012/10/03/actionable-predictive-analytics-with-oracle-data-mining/#comments</comments>
		<pubDate>Wed, 03 Oct 2012 14:59:09 +0000</pubDate>
		<dc:creator>Lukas Vermeer</dc:creator>
				<category><![CDATA[BI]]></category>
		<category><![CDATA[Code]]></category>
		<category><![CDATA[Database]]></category>
		<category><![CDATA[Datamining]]></category>
		<category><![CDATA[Oracle]]></category>
		<category><![CDATA[RTD]]></category>
		<category><![CDATA[SQL]]></category>
		<category><![CDATA[analytics]]></category>
		<category><![CDATA[data mining]]></category>

		<guid isPermaLink="false">http://lukasvermeer.wordpress.com/?p=1496</guid>
		<description><![CDATA[Oracle Data Mining (ODM) provides powerful data mining functionality as native SQL functions within the Oracle Database. This Oracle By Example Tutorial gives a good overview of the GUI. While being able to build predictive models on mountains of data without moving it out of the database is pretty cool in itself, I feel analysis [&#8230;]]]></description>
				<content:encoded><![CDATA[<p><a href="http://www.oracle.com/technetwork/database/options/advanced-analytics/odm/index.html">Oracle Data Mining</a> (ODM) provides powerful data mining functionality as native SQL functions within the Oracle Database. This <a href="http://apex.oracle.com/pls/apex/f?p=44785:24:5873518276883::NO:24:P24_CONTENT_ID,P24_PREV_PAGE:5271,2">Oracle By Example Tutorial</a> gives a good overview of the GUI.</p>
<p>While being able to build predictive models on mountains of data without moving it out of the database is pretty cool in itself, I feel analysis without action is pretty much pointless. <a href="http://www.tomdavenport.com">Tom Davenport</a> describes this common data mining conundrum in <a href="http://www.amazon.com/Competing-Analytics-New-Science-Winning/dp/1422103323">Competing on Analytics</a>.</p>
<blockquote><p>Many firms are able to segment their customers and determine which ones are most profitable or which are most likely to defect. However, they are reluctant to treat different customers differently—out of tradition or egalitarianism or whatever. With such compunctions, they will have a very difficult time becoming successful analytical competitors—yet it is surprising how often companies initiate analyses without ever acting on them. <strong>The &#8220;action&#8221; stage of any analytical effort is, of course, the only one that ultimately counts.</strong></p></blockquote>
<p>The OBE tutorial describes a scenario in which a business wants to identify customers who are most likely to purchase insurance. Through a set of simple steps, a (decision tree) classification model is built that can be used to predict whether a particular customer is likely to purchase based on historic data.</p>
<p>In a classical data mining approach, the predictions of this model would be written to some OUTPUT_TABLE where they would be available for subsequent processing. Growing staler every minute—and soon forgotten when its newer sibling OUTPUT_TABLE_NEW_FINAL_2 is inevitably created—our precious business intelligence slowly withers away in a disregarded section of the database until ultimately dropped by a careless DBA.</p>
<p>Output tables are where analytical insight goes to die.</p>
<p>If all we were interested in was building models, we&#8217;d be better off <a href="http://www.marklin.com">glueing choo-choos</a>. It is the new ways in which we can utilise these database-resident models that make this technology really interesting. With a few simple additional steps, this same model can be used in real time to provide inline predictions based on up-to-date customer data, as well as for new customers.</p>
<p>All we need is a view<del> and a join</del>.</p>
<p><strong>Update (October 3rd, 2012)</strong>: as Marcos points out in the comments, I was making things far too complicated. No need for a separate join; simply select the output columns you need and pass everything directly to the view.</p>
<p><img class="aligncenter size-full wp-image-1566" title="Oracle Data Mining Model Columns in a View" src="http://lukasvermeer.files.wordpress.com/2012/10/odm_view.jpg?w=700" alt=""   /></p>
<p><del>The join operations glues the original data and the prediction models together;</del> The view allows us to look at the harmonised results directly. When a customer record is selected from the view the source data for this record is passed to the model to generate the predicted values in real-time. When source data changes so does the prediction. When new source records are added they are automatically processed in the same way.</p>
<pre class="brush: sql; title: ; notranslate">
-- Create a new customer.
INSERT INTO INSUR_CUST_LTV_SAMPLE (CUSTOMER_ID, LAST, FIRST) VALUES ('CU123', 'VERMEER', 'LUKAS');
1 rows inserted.
Elapsed: 00:00:00.003

-- Get prediction and probability for the new customer.
SELECT CUSTOMER_ID, insur_pred, insur_prob FROM insur_cust_ltv_prediction WHERE CUSTOMER_ID = 'CU123';
CUSTOMER_ID INSUR_PRED INSUR_PROB
----------- ---------- ----------
CU123       No         0.7262813
Elapsed: 00:00:00.004

-- Update customer data.
UPDATE INSUR_CUST_LTV_SAMPLE SET bank_funds = 500, checking_amount = 100 WHERE CUSTOMER_ID = 'CU123';
1 rows updated.
Elapsed: 00:00:00.003

-- Get prediction and probability for the updated customer.
SELECT CUSTOMER_ID, insur_pred, insur_prob FROM insur_cust_ltv_prediction WHERE CUSTOMER_ID = 'CU123';
CUSTOMER_ID INSUR_PRED INSUR_PROB
----------- ---------- ----------
CU123       Yes        0.6261398
Elapsed: 00:00:00.004
</pre>
<p>Seamless. Any system that can read data from an Oracle database can now utilise Oracle Data Mining models. No need to move your data. No need to build new applications.</p>
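<p>The pattern itself is not specific to Oracle. Below is a rough sketch using Python&#8217;s built-in sqlite3 module, in which a hand-written CASE rule stands in for the decision tree. The table, view and column names and the threshold of 300 are assumptions for illustration only; in ODM the view would call the mining model rather than a hard-coded rule.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (
        customer_id TEXT PRIMARY KEY,
        bank_funds  REAL DEFAULT 0
    );
    -- Stand-in "model": a single hard-coded split instead of a trained tree.
    CREATE VIEW customer_predictions AS
    SELECT customer_id,
           CASE WHEN bank_funds >= 300 THEN 'Yes' ELSE 'No' END AS insur_pred
    FROM customers;
""")

def prediction_for(customer_id):
    """Read the prediction from the view; it is computed at query time."""
    row = conn.execute(
        "SELECT insur_pred FROM customer_predictions WHERE customer_id = ?",
        (customer_id,),
    ).fetchone()
    return row[0]

# A new customer gets a prediction without any scoring job being run.
conn.execute("INSERT INTO customers (customer_id) VALUES ('CU123')")
before = prediction_for("CU123")

# Changing the source data changes the prediction on the next read.
conn.execute("UPDATE customers SET bank_funds = 500 WHERE customer_id = 'CU123'")
after = prediction_for("CU123")

print(before, after)  # No Yes
```

<p>Nothing is ever written to an output table: the prediction only exists for as long as somebody is looking at it.</p>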
<p>Applications reading data from the view need never know the difference between the original source data and machine generated predictions. <a href="http://www.oracle.com/us/solutions/business-analytics/business-intelligence/publisher/overview/index.html">Oracle Business Intelligence Publisher</a> can easily display this data in forecasting reports; or use it to power pro-active alerts. In <a href="http://www.oracle.com/us/solutions/business-analytics/business-intelligence/real-time-decisions/overview/index.html">Oracle Real-Time Decisions</a>, rules can be built around the outcomes of these models; or predictions from multiple sources can be fed into <a href="https://blogs.oracle.com/rtd/en/entry/combined_likelihood_models">combined likelihood models</a> for increased accuracy.</p>
<p>This is huge. Trust me. Stop over-analysing and start taking action. After all, that&#8217;s the only step that ultimately counts.</p>
]]></content:encoded>
			<wfw:commentRss>http://lukasvermeer.wordpress.com/2012/10/03/actionable-predictive-analytics-with-oracle-data-mining/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/da492bd819224723b7d81bb6ae5cae78?s=96&#38;d=http%3A%2F%2F1.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96&#38;r=G" medium="image">
			<media:title type="html">lukasvermeer</media:title>
		</media:content>

		<media:content url="http://lukasvermeer.files.wordpress.com/2012/10/odm_view.jpg" medium="image">
			<media:title type="html">Oracle Data Mining Model Columns in a View</media:title>
		</media:content>
	</item>
		<item>
		<title>Snake Oil and Tiger Repellant</title>
		<link>http://lukasvermeer.wordpress.com/2012/09/26/snake-oil-and-tiger-repellant/</link>
		<comments>http://lukasvermeer.wordpress.com/2012/09/26/snake-oil-and-tiger-repellant/#comments</comments>
		<pubDate>Wed, 26 Sep 2012 12:34:53 +0000</pubDate>
		<dc:creator>Lukas Vermeer</dc:creator>
				<category><![CDATA[Big Data]]></category>
		<category><![CDATA[Datamining]]></category>
		<category><![CDATA[Statistics]]></category>
		<category><![CDATA[big data]]></category>
		<category><![CDATA[data science]]></category>
		<category><![CDATA[scientific control]]></category>

		<guid isPermaLink="false">http://lukasvermeer.wordpress.com/?p=1479</guid>
		<description><![CDATA[The Wall Street Journal has an interesting article explaining how companies are starting to use (big) data to support their recruiting efforts. It provides a good example of the more general trend in businesses towards evidence-based decisioning and data science, but it also shows how some crucial aspects of these techniques are easily overlooked or oversimplified. [&#8230;]]]></description>
				<content:encoded><![CDATA[<p>The Wall Street Journal has <a href="http://online.wsj.com/article/SB10000872396390443890304578006252019616768.html">an interesting article</a> explaining how companies are starting to use (big) data to support their recruiting efforts. It provides a good example of the more general trend in businesses towards evidence-based decisioning and <a href="http://radar.oreilly.com/2010/06/what-is-data-science.html">data science</a>, but it also shows how some crucial aspects of these techniques are easily overlooked or oversimplified.</p>
<p>My big-data-science-bogus-alarm started ringing upon reading the last sentence in this short paragraph.</p>
<blockquote><p>Applicants for the job take a 30-minute test that screens them for personality traits and puts them through scenarios they might encounter on the job. Then the program spits out a score: red for low potential, yellow for medium potential or green for high potential. Xerox accepts some yellows if it thinks it can train them, but mostly hires greens.</p></blockquote>
<p>Sounds smart, right? Well, maybe.</p>
<p>If Xerox never hires any &#8220;reds&#8221; and only very few &#8220;yellows&#8221;, how will they know the program is actually working? How will they know that all that complicated math is doing something more than simply returning random colour values? An evidence-based approach should always include some form of <a href="http://en.wikipedia.org/wiki/Scientific_control">scientific control</a>. If it doesn&#8217;t, it might as well be <a href="http://en.wikipedia.org/wiki/Snake_oil">snake oil</a>.</p>
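<p>A toy simulation illustrates why. In this Python sketch every number is invented: applicants get a random colour, the success probabilities are made up, and one in ten non-green applicants is hired anyway as a control group. Only that control group lets us tell a useful score from a random one.</p>

```python
import random

def success_rates(score_is_informative, n_applicants=60000, seed=1):
    """Simulate hiring; return (green success rate, control success rate)."""
    rng = random.Random(seed)
    tallies = {"green": [0, 0], "control": [0, 0]}  # [successes, hires]
    for _ in range(n_applicants):
        colour = rng.choice(["red", "yellow", "green"])
        if colour == "green":
            group = "green"
        elif 0.1 > rng.random():
            group = "control"   # hired despite the score, as a control
        else:
            continue            # rejected, so no outcome is ever observed
        if score_is_informative:
            p_success = {"red": 0.2, "yellow": 0.5, "green": 0.8}[colour]
        else:
            p_success = 0.5     # the colour says nothing about the applicant
        tallies[group][0] += int(p_success > rng.random())
        tallies[group][1] += 1
    return tuple(wins / hires for wins, hires in tallies.values())

useless = success_rates(score_is_informative=False)
useful = success_rates(score_is_informative=True)
print("random score: green %.2f vs control %.2f" % useless)
print("useful score: green %.2f vs control %.2f" % useful)
```

<p>A success rate for the greens on its own cannot be interpreted; it only means something next to the rate for people the program would have rejected.</p>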
<p>Of course, this is probably just a simple journalistic crime of omission of a trivial implementation detail, but it reminded me of that old chestnut &#8220;the tiger repellant&#8221;. For your convenience, this blog post has been equipped with some very strong Tiger Repellant tonic. If you do not see any tigers around you right now, you will know it is working.</p>
<p>See? No tigers?</p>
<p>Proven to work like a charm. Order yours today! Great prices! Limited availability! Now taking applications in the comments.</p>
<p><span style="color:#c0c0c0;">[ Disclaimer: Tiger Repellant is not certified for use in South-East Asia or zoological parks. Tiger Repellant inc. and its employees and subsidiaries cannot be held liable for any damage caused to your person in the event of being eaten by a tiger. ]</span></p>
<br />  <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gocomments/lukasvermeer.wordpress.com/1479/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/comments/lukasvermeer.wordpress.com/1479/" /></a> <img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=lukasvermeer.wordpress.com&#038;blog=13469739&#038;post=1479&#038;subd=lukasvermeer&#038;ref=&#038;feed=1" width="1" height="1" />]]></content:encoded>
			<wfw:commentRss>http://lukasvermeer.wordpress.com/2012/09/26/snake-oil-and-tiger-repellant/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/da492bd819224723b7d81bb6ae5cae78?s=96&#38;d=http%3A%2F%2F1.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96&#38;r=G" medium="image">
			<media:title type="html">lukasvermeer</media:title>
		</media:content>
	</item>
		<item>
		<title>Bin Packing Too Many Features</title>
		<link>http://lukasvermeer.wordpress.com/2012/09/19/bin-packing-too-many-features/</link>
		<comments>http://lukasvermeer.wordpress.com/2012/09/19/bin-packing-too-many-features/#comments</comments>
		<pubDate>Wed, 19 Sep 2012 15:19:00 +0000</pubDate>
		<dc:creator>Lukas Vermeer</dc:creator>
				<category><![CDATA[Code]]></category>
		<category><![CDATA[Mathematics]]></category>
		<category><![CDATA[excel]]></category>
		<category><![CDATA[lp]]></category>
		<category><![CDATA[solver]]></category>

		<guid isPermaLink="false">http://lukasvermeer.wordpress.com/?p=1450</guid>
		<description><![CDATA[My girlfriend has been struggling with an interesting little problem lately. She was asked to determine the optimal distribution of medicine boxes and bottles over a set of adaptable cabinets; under volume as well as weight constraints. Not an easy task for a computer scientist; much less for a hospital pharmacist in training. After describing [&#8230;]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=lukasvermeer.wordpress.com&#038;blog=13469739&#038;post=1450&#038;subd=lukasvermeer&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
				<content:encoded><![CDATA[<p><a href="http://www.flickr.com/photos/lukasvermeer/2795354345/"><img class="alignright size-medium wp-image-1472" title="Packed" src="http://lukasvermeer.files.wordpress.com/2012/09/2795354345_78c50a4af2_n.jpg?w=199&#038;h=300" alt="Packed Motorbike" width="199" height="300" /></a>My <a href="http://lisannekrens.nl/">girlfriend</a> has been struggling with an interesting little problem lately. She was asked to determine the optimal distribution of medicine boxes and bottles over a set of adaptable cabinets, under volume as well as weight constraints. Not an easy task for a computer scientist, let alone for a hospital pharmacist in training.</p>
<p>After she described the problem to me last night, I (unhelpfully) mumbled that &#8220;this sounds like a <a href="http://en.wikipedia.org/wiki/Bin_packing_problem">variable sized bin packing problem</a> to me, you can&#8217;t solve this kind of thing in Excel, you probably need an LP solver&#8221;.</p>
<p>Apparently I was wrong. It already seemed obvious to me that Excel suffers from a severe case of feature bloat, but <a href="http://www.codeproject.com/Articles/257027/Excel-Solver-and-Liner-Programming">this</a> is just absurd.</p>
<br />  <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gocomments/lukasvermeer.wordpress.com/1450/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/comments/lukasvermeer.wordpress.com/1450/" /></a> <img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=lukasvermeer.wordpress.com&#038;blog=13469739&#038;post=1450&#038;subd=lukasvermeer&#038;ref=&#038;feed=1" width="1" height="1" />]]></content:encoded>
			<wfw:commentRss>http://lukasvermeer.wordpress.com/2012/09/19/bin-packing-too-many-features/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/da492bd819224723b7d81bb6ae5cae78?s=96&#38;d=http%3A%2F%2F1.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96&#38;r=G" medium="image">
			<media:title type="html">lukasvermeer</media:title>
		</media:content>

		<media:content url="http://lukasvermeer.files.wordpress.com/2012/09/2795354345_78c50a4af2_n.jpg?w=199" medium="image">
			<media:title type="html">Packed</media:title>
		</media:content>
	</item>
		<item>
		<title>Future Felony</title>
		<link>http://lukasvermeer.wordpress.com/2012/09/17/1433/</link>
		<comments>http://lukasvermeer.wordpress.com/2012/09/17/1433/#comments</comments>
		<pubDate>Mon, 17 Sep 2012 14:51:51 +0000</pubDate>
		<dc:creator>Lukas Vermeer</dc:creator>
				<category><![CDATA[Artificial Intelligence]]></category>
		<category><![CDATA[driverless car]]></category>
		<category><![CDATA[future]]></category>
		<category><![CDATA[science fiction]]></category>

		<guid isPermaLink="false">http://lukasvermeer.wordpress.com/?p=1433</guid>
		<description><![CDATA[Written by Arthur C. Clarke in 1976, Imperial Earth is set in faraway 2276. As the beautiful old car cruised in almost perfect silence under the guidance of its automatic controls, Duncan tried to see something of the terrain through which he was passing. The spaceport was fifty kilometers from the city—no one had yet [&#8230;]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=lukasvermeer.wordpress.com&#038;blog=13469739&#038;post=1433&#038;subd=lukasvermeer&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
				<content:encoded><![CDATA[<p>Written by <a href="http://en.wikipedia.org/wiki/Arthur_C._Clarke">Arthur C. Clarke</a> in 1976, <a href="http://en.wikipedia.org/wiki/Imperial_Earth">Imperial Earth</a> is set in faraway 2276.</p>
<blockquote><p>As the beautiful old car cruised in almost perfect silence under the guidance of its automatic controls, Duncan tried to see something of the terrain through which he was passing. The spaceport was fifty kilometers from the city—no one had yet invented a noiseless rocket—and the four-lane highway bore a surprising amount of traffic. Duncan could count at least twenty vehicles of various types, and even though they were all moving in the same direction, the spectacle was somewhat alarming.</p>
<p>&#8220;I hope all those other cars are on automatic,&#8221; he said anxiously.</p>
<p>Washington looked a little shocked. &#8220;Of course,&#8221; he said &#8220;It&#8217;s been a criminal offence for—oh, at least a hundred years—to drive manually on a public highway. Though we still have occasional psychopaths who kill themselves and other people.&#8221;</p></blockquote>
<p>The future sounds fascinating, but I want my <a href="http://en.wikipedia.org/wiki/Google_driverless_car">Google Driverless Car</a> now.</p>
<br />  <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gocomments/lukasvermeer.wordpress.com/1433/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/comments/lukasvermeer.wordpress.com/1433/" /></a> <img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=lukasvermeer.wordpress.com&#038;blog=13469739&#038;post=1433&#038;subd=lukasvermeer&#038;ref=&#038;feed=1" width="1" height="1" />]]></content:encoded>
			<wfw:commentRss>http://lukasvermeer.wordpress.com/2012/09/17/1433/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/da492bd819224723b7d81bb6ae5cae78?s=96&#38;d=http%3A%2F%2F1.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96&#38;r=G" medium="image">
			<media:title type="html">lukasvermeer</media:title>
		</media:content>
	</item>
		<item>
		<title>Understanding</title>
		<link>http://lukasvermeer.wordpress.com/2012/08/15/understanding/</link>
		<comments>http://lukasvermeer.wordpress.com/2012/08/15/understanding/#comments</comments>
		<pubDate>Wed, 15 Aug 2012 09:01:15 +0000</pubDate>
		<dc:creator>Lukas Vermeer</dc:creator>
				<category><![CDATA[Artificial Intelligence]]></category>
		<category><![CDATA[Datamining]]></category>
		<category><![CDATA[Statistics]]></category>
		<category><![CDATA[black box]]></category>
		<category><![CDATA[models]]></category>
		<category><![CDATA[problem solving]]></category>

		<guid isPermaLink="false">http://lukasvermeer.wordpress.com/?p=1384</guid>
		<description><![CDATA[Derek Jones posits that &#8220;success does not require understanding&#8221;. In my line of work I am constantly trying to understand what is going on (the purpose of this understanding is to control and make things better) and consider anybody who uses machine learning as being clueless, dim witted or just plain lazy; the problem with [&#8230;]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=lukasvermeer.wordpress.com&#038;blog=13469739&#038;post=1384&#038;subd=lukasvermeer&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
				<content:encoded><![CDATA[<p>Derek Jones posits that &#8220;<a href="http://shape-of-code.coding-guidelines.com/2012/07/23/success-does-not-require-understanding/">success does not require understanding</a>&#8221;.</p>
<blockquote><p>In my line of work I am constantly trying to understand what is going on (the purpose of this understanding is to control and make things better) and consider anybody who uses machine learning as being clueless, dim witted or just plain lazy; the problem with machine learning is that it gives answers without explanations (ok decision trees do provide some insights).</p></blockquote>
<p><strong>Problem solving versus solving problems.</strong></p>
<p>As one who specializes in using machine learning, I obviously resent being called &#8220;clueless, dim witted or just plain lazy&#8221;. However, I feel a larger point should be made here. Success <em>does</em> most definitely require understanding, but not necessarily of how one particular instance of a solution came about.</p>
<p>To be successful in any machine learning effort, one needs a thorough understanding of what the problem is and how techniques can be applied to find solutions. This is a more general form of understanding, which puts more emphasis on the process of finding workable models than on applying these models to individual instances of a problem. In short: comprehension of problem solving over understanding of a particular solution.</p>
<p><strong>Driving a black box.</strong></p>
<p>Consider the following example. To me, the engine of my car is a black box; I have very little idea how it works. My mechanic does know how engines work in general, but he cannot know the exact internal state of the engine in my car as I am cruising down the highway at 100 miles per hour. None of this “lack of understanding” prevents me from getting from A to B. I turn the wheel, I push the pedal and off we go.</p>
<p>In essence, my mechanic and I have different levels of understanding of my car. But importantly, at different levels of precision, the thing becomes a black box to each of us, in the sense that there is a point where our otherwise perfectly practical models break down and are no longer able to reflect reality. In the end, it&#8217;s black boxes <a href="http://en.wikipedia.org/wiki/Turtles_all_the_way_down">all the way down</a>.</p>
<p><strong><a href="http://en.wikipedia.org/wiki/Allegory_of_the_Cave">Chasing shadows</a>.</strong></p>
<p>Models are merely tools to help you navigate a vastly complex world. Very much like a machine learning model, a scientific model might work in many cases without being correct; <a href="http://en.wikipedia.org/wiki/Newton's_law_of_universal_gravitation">Newton’s law of universal gravitation</a> is one example. We know for a fact that that particular model is wrong, and I <a href="https://twitter.com/zachweiner/status/235031757156667393">sincerely hope many others are just as incorrect</a>.</p>
<p>There will always be limits to our understanding. The fact that we have a model that can help us predict does not necessarily mean we have correctly understood the nature of the universe. <a href="http://www.wired.com/science/discoveries/magazine/16-07/pb_theory">All models are wrong, but some are useful</a>.</p>
<p>Reality is simply much too complicated to be captured in a manageable set of rules, but even incomplete (or incorrect) models can provide insight and help us better navigate this world. Machine learning is successful, precisely because it can help us find such models.</p>
<p><span style="color:#c0c0c0;">[ Peter Norvig has written an <a href="http://norvig.com/chomsky.html"><span style="color:#c0c0c0;">excellent piece</span></a> on this subject in relation to language models. ]</span></p>
<br />  <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gocomments/lukasvermeer.wordpress.com/1384/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/comments/lukasvermeer.wordpress.com/1384/" /></a> <img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=lukasvermeer.wordpress.com&#038;blog=13469739&#038;post=1384&#038;subd=lukasvermeer&#038;ref=&#038;feed=1" width="1" height="1" />]]></content:encoded>
			<wfw:commentRss>http://lukasvermeer.wordpress.com/2012/08/15/understanding/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/da492bd819224723b7d81bb6ae5cae78?s=96&#38;d=http%3A%2F%2F1.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96&#38;r=G" medium="image">
			<media:title type="html">lukasvermeer</media:title>
		</media:content>
	</item>
	</channel>
</rss>
