Forever Learning

Forever learning and helping machines do the same.

a Database is not a Box

with 3 comments

I’ve seen people using and building upon databases in lots of different ways; many of them surprisingly original. However, it seems to me that some people do not fully grasp the concept of a database. Using a database is clearly not the same as understanding how one works.

Not a Database

Not a Database (image by Jared Tarbell)

I think we need to take a step back and consider the basic question ”what is a database”. You might think you know the answer. You might be one of those people who thinks the answer is not important as long as you know how to use it. I disagree, and I’m going to explain why. If you are one of those people, this article is for you.

A database is not a box of data.

A database is not a file drawer you put stuff in for later retrieval. A database is not a spreadsheet. A database is not a collection of arrays. Of course you know this, but many programmers forget this when they start coding. They treat the data in the database as if they need to manage it; as if it were their responsibility.

Surprise! It’s not.

A better metaphor

A better metaphor (image from The U.S. National Archives)

A database is more like a library. Access to data (books) is the most important thing for a library; it is its reason for existence. And although the purpose of a library is to give you easy access to books, you are not expected to know exactly where each book is stored, nor are you expected to put them back on a shelve when you return them. You are not, in other words, the one who decides how the books are organized. That’s the librarians job.

A database also has a librarian. We call her ”the Optimizer”.

Databases, just like libraries, are built to facilitate fast and easy access to data. In later posts I will discuss some of the techniques used to improve performance. For now, let us assume that the database is quite different from a normal array or any other basic data structure. Just like a librarian in a library, the objective of the optimizer is thus to help you find what you are looking for faster. It does so by creating a plan to find the data every time you run a query.

As the Oracle9i Database Performance Tuning Guide and Reference puts it:

The optimizer determines the most efficient way to execute a SQL statement after considering many factors related to the objects referenced and the conditions specified in the query. This determination is an important step in the processing of any SQL statement and can greatly affect execution time.

Every query you send to the database goes through this process. There is no way to go around it; and you should not want to. The trick is to work together with the optimizer to get the best results.

I hope that working together will be easier now you know she exists. Next time, I will discuss some of the decisions the optimizer can take.

[This article is based on a presentation I prepared for Ortec, the company where I completed my Master thesis.]


Written by Lukas Vermeer

May 14, 2010 at 15:16

Posted in Database, Oracle

3 Responses

Subscribe to comments with RSS.

  1. Ik werk de laatste tijd meer met een database zonder SQL (CouchDB) dan met relationele databases (voornamelijk PostgreSQL en MySQL). Wat een genot is dat zeg. Lekker simpel, lekker inzichtelijk. Het is uiteraard niet de gouden kogel voor elk probleem, maar het is me wel duidelijk geworden dat relationele databases dat ook niet zijn, zeker niet voor gebruik op het web.

    Nils Breunese

    May 15, 2010 at 12:15

    • I’ve heard lots of good things about CouchDB (mainly from you :D), but I haven’t had the chance yet to work with it. Here’s hoping I get to try my hands at it soon.

      You are right to point out that I am mostly focussed on relational databases, but the general lesson, that you as a programmer are not always fully in control and should work with and learn to trust the systems you are building upon, should also apply to CouchDB and many other systems.

      Why generalize the whole web as unfit for DBMS? Certainly there must be some web applications where a relational database is the preferred solution?


      May 15, 2010 at 10:49

      • I didn’t say a RDBMS is not a good fit for all of the web. (And as history shows, by throwing enough hardware, time and skills at a problem, it can always be solved using a RDBMS.) Like I said, neither is the one-size-fits-all solution. I’m just glad I discovered that there is more to databases than the relational ones and I can now decide when to use what. The latest project I’m working on even uses both a relational database and a CouchDB database.

        It’s like discovering functional programming after some years of doing everything the OO way. It’s not like you’ll want to suddenly code everything using a functional language (at least, not me), but it’s good to know about it and just have an extra tool in your toolbox.

        Nils Breunese

        May 15, 2010 at 11:02

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: