Saturday 17 December 2011

Data, Information and Knowledge

As a result of a project I am doing in work, and some study I am doing for college, I have been thinking about the distinction between data, information and knowledge. I have come up with an example which helps to distinguish them for me and I would like to share it.

Data is the lowest level of these three concepts. Data consists of simple “facts and figures”. For the purpose of this example, the following is a piece of data: 914 baby boys who were born in Ireland in 2010 were given the name Jack. While this fact might (or might not!) be interesting, it is not very informative.

Aside: Intellectually I know that data is the plural of datum. This means that you should use phrases such as “the data tell us”, “the data indicate” and “the data are”. Many phrases such as “entering the data” and “validating the data” are unaffected by this distinction. But I myself find it much more natural to write “once the data has been validated” rather than the correct version of “once the data have been validated”. But maybe that’s just me!

The next level up is Information. Information is gleaned from data by asking questions of it.

Aside: I am reminded of the part of The Hitchhiker's Guide to the Galaxy books where a supercomputer (“Deep Thought”) is built to answer “the Ultimate Question of Life, the Universe and Everything”. The answer turns out (after 7.5 million years of computation) to be 42, and then the people have to build an even bigger supercomputer to discover the question. So they were looking for knowledge (ultimate knowledge in fact) and they ended up with a datum!

In the case of boys’ names, you might want to know where Jack ranked in terms of popularity during 2010 (particularly if you wished to avoid using one of the most popular names). The answer to this question is that Jack ranked number one. A supplemental question might reveal that the second most popular name (Sean) had 812 occurrences. This is over 100 fewer, which indicates that Jack was the clear favourite. This information would be a red flag for parents who didn’t want to send their son to school with 2, 3 or 4 other Jacks.
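For readers who like to see this in code, the step from data to information can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionary holds just the two figures quoted above, and the names `name_counts` and `rank_names` are made up for the example rather than taken from any official source or tool.

```python
# Raw data: counts of boys' names in Ireland in 2010.
# Only the two figures quoted in the post are included here;
# a real dataset would list every registered name.
name_counts = {"Sean": 812, "Jack": 914}


def rank_names(counts):
    """Return (name, count) pairs sorted from most to least popular."""
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)


# Asking questions of the data turns it into information.
ranking = rank_names(name_counts)
top_name, top_count = ranking[0]
runner_up, runner_up_count = ranking[1]
lead = top_count - runner_up_count

print(f"{top_name} ranks first with {top_count} occurrences")
print(f"{runner_up} ranks second, {lead} behind")
```

The dictionary on its own is just data. The ranking and the size of the gap are the information, because they answer a question somebody actually wanted to ask.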

So in our example data has fed into information. But does knowing this information equate with having knowledge? I would argue that this is not knowledge because the value of this piece of information decays over time. The information will be less relevant with each passing year. In 5 or 6 years’ time this information (about 2010) may have very little value to people selecting names.

Knowledge is the level above information because it has a much longer shelf-life. Rather than facts and figures, it concerns itself with truth and understanding.

Steven D. Levitt and Stephen J. Dubner wrote an entertaining and thought-provoking book called Freakonomics: A Rogue Economist Explores the Hidden Side of Everything in which they try to extract knowledge from data by asking questions of it (or should I say “them”!).

In relation to baby names, they conclude that the evidence indicates that names move through the population from a higher socioeconomic level to a lower level. They further conclude that when a particular name has been widely adopted, the “high-end parents begin to abandon it,” and the whole process starts again with a new crop of names.

You can read the opening of this chapter online on their website here.

Another interesting piece of nascent knowledge in relation to baby names is described on their website. This is that the rate at which a name gains popularity will be mirrored by the rate at which it loses it. You can read it here.

Postscript


Douglas Adams was a genius and the Hitchhiker books are hilarious, full of fun and invention.
But my favourite single quote from Adams is this:
I love deadlines. I like the whooshing sound they make as they fly by.