Forever Learning

Forever learning and helping machines do the same.

Restaurant Reviews and the Availability Heuristic

You could say fine dining is a bit of a hobby of mine, and as I’ve mentioned before, I’ve composed quite a few restaurant reviews over the years. I enjoy writing about food almost as much as I love eating it.

Whilst a colleague and I were fantasising about fancy food the other day, we wondered whether there is any relation between the length of my reviews and the scores I give. In some strange way it made intuitive sense to me that I would devote more words to describing why a particular restaurant did not live up to my expectations.

Thinking about this, the first negative review that came to my mind was one I wrote for “The Good View” in Chiang Mai, Thailand.

If service were any slower than it already is, cobwebs would certainly overrun the place. When food and drinks eventually do arrive they’re hardly worth the wait.

Fruit juice contained more sugar than a Banglamphu brothel and cocktails had less alcohol in them than a Buddhist monk. The mixed Northern specialties appetizer revealed itself to be three kinds of sausage and some raw chillies; very special indeed.

The spicy papaya salad probably tasted alright, but I was unable to tell because my taste buds were destroyed on the first bite. (Yes, I see the irony in complaining a spicy papaya salad was too spicy, but in my mind there’s a difference between spicy food and napalm.)

Also, the view is terribly overrated.

Conversely, the first positive review that popped into my brain was this rather terse piece for “Opium” in Utrecht, the Netherlands.

Om nom nom.

Judging by this tiny sample there might indeed be something to the hypothesis that review length and review score are negatively correlated. To confirm my hunch, I decided to load my reviews into R for a proper statistical analysis.

> cor.test(nn_reviews$char_count, nn_reviews$score)

Pearson's product-moment correlation
data: nn_reviews$char_count and nn_reviews$score
t = 0.2246, df = 121, p-value = 0.8227
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
-0.1571892 0.1967366

sample estimates:
   cor
0.02041319

To my surprise, the analysis shows practically no linear relation between length and score: the estimated correlation is about 0.02, and the 95 percent confidence interval comfortably includes zero. Contrary to what the two reviews above seem to suggest, I do not need more characters to describe an unpleasant dining experience than a pleasant one.
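For anyone who wants to try this on their own reviews, here is a minimal sketch of how a data frame like nn_reviews could be put together. Only the column names char_count and score come from the analysis above; the text column and the rows themselves are invented for illustration, and measuring length with nchar is an assumption about how char_count was computed.

```r
# Toy stand-in for the real review data; these rows are invented for illustration.
toy_reviews <- data.frame(
  text = c(
    "Slow service and bland food.",
    "Lovely tasting menu with attentive staff and a great wine list.",
    "Fine.",
    "Overpriced and noisy, but the dessert made up for some of it.",
    "Would happily come back."
  ),
  score = c(2, 5, 3, 2, 4),
  stringsAsFactors = FALSE
)

# Review length measured in characters (assumed definition of char_count).
toy_reviews$char_count <- nchar(toy_reviews$text)

# Pearson correlation test between length and score.
result <- cor.test(toy_reviews$char_count, toy_reviews$score)

print(result$estimate)  # sample correlation
print(result$conf.int)  # 95 percent confidence interval
print(result$p.value)   # p-value for the null hypothesis of zero correlation
```

With only five rows the interval will be very wide, which is rather the point of collecting more than a handful of reviews before drawing conclusions.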

A simple plot of the two variables gives some insight into a possible cause for my misconception.

[Figure: Review scores vs review length]

The outlier in the bottom right happens to represent my review for the Good View. All my other reviews are much shorter in length and seem to be quite evenly distributed over the different scores.
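A plot like this, plus a quick check of how much one long review influences the result, could be sketched as follows. The data here is simulated, not my actual reviews: thirty short reviews with random scores and one very long, low-scoring review standing in for the outlier.

```r
set.seed(1)

# Simulated data: 30 ordinary reviews plus one long, low-scoring outlier.
toy <- data.frame(
  char_count = c(sample(50:400, 30, replace = TRUE), 2500),
  score      = c(sample(1:5, 30, replace = TRUE), 1)
)

# Scatter plot with review length on the x-axis and score on the y-axis.
plot(toy$char_count, toy$score,
     xlab = "Review length (characters)",
     ylab = "Score",
     main = "Review scores vs review length")

# Find the longest review and highlight it in red.
longest <- which.max(toy$char_count)
points(toy$char_count[longest], toy$score[longest], col = "red", pch = 19)

# Compare the correlation with and without the longest review.
cor_all     <- cor(toy$char_count, toy$score)
cor_trimmed <- cor(toy$char_count[-longest], toy$score[-longest])
print(c(all = cor_all, without_longest = cor_trimmed))
```

If the two numbers differ a lot, a single review is doing most of the work, and the plot is usually the fastest way to spot that.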

My misjudgement is an excellent example of the availability heuristic. The pair of examples that presented themselves to me upon initial reflection were not representative of the complete set, but that did not stop me from drawing overarching, and incorrect, conclusions based on a sample of two.
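How misleading a sample of two can be is easy to demonstrate. Two distinct points always lie on a straight line, so their Pearson correlation is exactly +1 or -1, whatever the true relation is. The simulation below is only an illustration: it draws pairs of "lengths" and "scores" that are independent by construction and computes the correlation for each pair.

```r
set.seed(42)

# Draw 1000 samples of size two from two independent variables
# and compute the Pearson correlation within each sample.
two_point_cor <- replicate(1000, {
  len   <- rnorm(2)  # stand-in for review length
  score <- rnorm(2)  # stand-in for review score, independent of length
  cor(len, score)
})

# Every sample reports a perfect correlation, positive or negative.
print(table(round(two_point_cor)))
```

Every one of the thousand pairs shows a "perfect" relation, even though none exists in the underlying data.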

This is why I use statistics: I am a fallible human being, just like everyone else.

Written by Lukas Vermeer

March 22, 2013 at 18:11

3 Responses

  1. But what can you do when there’s a small correlation, and the data set is tiny, like here: http://www.nytimes.com/interactive/2013/03/11/us/politics/small-state-advantage.html ? Draw a quirky infograph to make your point?

    michiel

    March 22, 2013 at 19:30

    • The data set is indeed too small to be conclusive. Perhaps I could write more reviews to make it bigger? I would have to write quite a few! Also, I would argue that anything I write after today will be biased, and the relation between length and score might become a self-fulfilling prophecy.

      If only I worked for a company that provided me with access to hundreds of thousands of reviews of some kind and sufficient computing power to test this hypothesis on a grander scale.

      Oh. Wait … ;-)

      Lukas Vermeer

      March 23, 2013 at 12:47

    • > cor.test(data_nn$length_total, data_nn$hotel_average_score)

      Pearson's product-moment correlation
      data: data_nn$length_total and data_nn$hotel_average_score
      t = -255.9257, df = 4359151, p-value < 2.2e-16
      alternative hypothesis: true correlation is not equal to 0
      95 percent confidence interval:
      -0.1225922 -0.1207425
      sample estimates:
      cor
      -0.1216675

      Lukas Vermeer

      April 18, 2013 at 11:52
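A note on reading the two outputs side by side: for a Pearson test, the t statistic follows from the correlation and the degrees of freedom as t = r * sqrt(df) / sqrt(1 - r^2). The sketch below plugs in the numbers printed above. The helper name t_from_r is made up for this example. It suggests that the tiny p-value in the larger data set reflects the sheer number of observations more than a strong relation, since the squared correlation is only about 0.015.

```r
# Hypothetical helper: t statistic of a Pearson correlation test,
# given the sample correlation r and the degrees of freedom (n - 2).
t_from_r <- function(r, df) {
  r * sqrt(df) / sqrt(1 - r^2)
}

# Values copied from the two outputs in this post.
t_small <- t_from_r(0.02041319, 121)      # my restaurant reviews
t_large <- t_from_r(-0.1216675, 4359151)  # the larger data set in the comment

# Share of variance in score that a linear relation with length would explain.
r_squared_small <- 0.02041319^2
r_squared_large <- (-0.1216675)^2

print(c(t_small = t_small, t_large = t_large))
print(c(r_squared_small = r_squared_small, r_squared_large = r_squared_large))
```

Both t values agree with the printed ones up to rounding, so the difference in significance comes from going from 123 observations to more than four million, while the correlation itself stays weak.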

