Perspectives on Data, Information and Knowledge
July 01, 2010
There are some structures that seem impossible to escape. The Data – Information – Knowledge hierarchy seems to be one of them. This representational structure is one that depicts information standing above data and knowledge standing on top of both. Sometimes it is finished with wisdom as a mystical cherry-on-top but I generally prefer to avoid such heart-warming embellishments.
Encounters with this Data – Information – Knowledge pyramid, as I sometimes term it, generally fall into one of two categories. It is often invoked as a given – something that is self-evidently true and of obvious utility. Just as often, especially of late, it is invoked as an object of ridicule – something to be derided and chased from the consulting stage. One of the interesting aspects of this phenomenon is that no matter how frequently it is attacked, the Data – Information – Knowledge pyramid continues to show remarkable resilience. Some refuse to give it up while others would miss having it as a foil for their arguments. To my mind, if the Data – Information – Knowledge pyramid is going to be around then it’s probably best to think about how it can be understood so that it can also be useful.
And now for a confession. I have been using the Data – Information – Knowledge pyramid for many years. Twenty years in fact. My first company, Euclid Consulting Group (aside from terrifying prospective customers with memories of High School geometry) explicitly invoked the relationship between data, information and knowledge with the implied message that we could help customers shuffle resources from one level, data, to the highest level, knowledge. The logo for this consulting group layered triangles so that data, information, knowledge transmuted into a form capable of flight. I cannot, therefore, be considered an unbiased spectator in the debates about the merits or shortcomings of the data, information and knowledge pyramid.
So the question that surfaces, at least for me, is why some people, me included, have found the Data – Information – Knowledge pyramid to be a worthwhile tool for organizing lessons from projects and for integrating disparate insights. Some might be quick to say that this is because I could be classified, without undue violence, as a technologist. There is a side to my recurrent use of the pyramid that can rightfully be seen as trying to understand data, information and knowledge in a way that gives technology a useful role to play and that allows us to couple any of these terms with the word “management”. On this, I am guilty as charged.
When I review the positions taken on the subject by others I note that by far the most thoughtful tend to be criticisms of the data – information – knowledge hierarchy. See David Weinberger on “The Problem with the Data-Information-Knowledge-Wisdom Hierarchy” and Patrick Lambe on “From Data, with Love”. For a brief overview of the hierarchy see Wikipedia on the DIKW Pyramid. One of the recurrent themes that appears is that the hierarchy was constructed by the advocates of a nascent computer industry as it sought to make its case to management stakeholders. Computers, the argument went, could be deployed to sift information from data and knowledge from information. I distinctly recall hearing this story spun on more than one occasion in the late 1980s and early 1990s when I worked among computer scientists who often wore lab coats to highlight their role as scientists exploring the frontiers of knowledge. But even as a technologist, I have to say that I was never able to bend my mind in a way that allowed me to imagine what this “sifting” might be. In fact, I have difficulty mounting an attack on the filtering model because I cannot form a sufficiently firm conception of it to allow the criticism to begin. The analogy that seems to apply likens the process that connects data to information and information to knowledge to panning for gold in a stream filled with sand and rocks. Or perhaps a better analogy would be taken from medieval alchemy, with its incantations, toxic brews and burnt fingers deployed so as to transmute lead into gold. Neither of these seems to help.
If this sifting model does not seem to work, it does not then follow, to my mind, that the hierarchy of data, information and knowledge should itself be jettisoned. Preferring to re-conceptualize what already exists and does not seem to want to die, I have tended to posit a different connective process as the underpinning of the data – information – knowledge pyramid. Rather than a sifting of one level from another, I see the process as closer to “construction”, wherein building blocks are fashioned so as to be useful within a social context (data), then used within communication transactions (information), and then from out of the latter activity emerges a shared understanding (knowledge) that undergoes continuous “validation” (or at least application providing feedback on utility). All of these I prefer to see as artifacts of communication – as existing between people instead of being a cognitive process occurring inside a given agent. (I understand that there is a mathematical attractiveness to generalizing agents so as to cover everything between cells and civilizations but I find that such universalizing tends to stumble across some rather important categorical differences.) I further posit that these artifacts of communication tend to figure very prominently as formative influences on what others sometimes see as the province of inner and private cognitive processes. Under this model, you can see the pyramid of data, information and knowledge not so much as a value hierarchy as a depiction of how the three are inter-related and co-dependent. An ounce of knowledge will be accompanied by a pound of information and a ton of data. In practice, you cannot have one without the others. Volumetrically, there is a sense that knowledge is the most overwhelming of the artifacts precisely because it will be manifest amid such vast quantities of data and information.
We can see the contending camps in the debate over the data – information – knowledge hierarchy exposed when we look at the differing conceptualizations of what we mean by the term “knowledge”.
The camp that tends to attack the pyramid will often position knowledge as a psychological state upon which an agent is ready to act. This can be pursued to the point where anything that provides a precursor to a decision by an agent can be taken to be knowledge. This then leads to a definition of knowledge as the set of influences drawn upon in decision making – learned behaviors, intuitions, experience, feelings and so on become integral to what knowledge is. I sometimes observe that the only thing missing from some of these definitions is “mint jelly” – so visceral are the components being assembled as a definition of knowledge. This approach also tends to make knowledge into something that is profoundly personal. This explains why some people can react with horror to any suggestion that knowledge, effectively their inner private universes, might be something that can be subjected to management. As you can see, I am not much of a fan of this approach to defining knowledge. My main objection is that it is not very useful when we consider the full range of domains where it is worthwhile to have a practical working definition for “knowledge”.
On the other side, we find some who apply the most rigid conception of knowledge as a logical assertion. This definition is most commonly found within the computing community and more specifically among those working in the field of artificial intelligence and its various offspring. I have encountered practitioners from this community who have even gone so far as to define knowledge explicitly as what can be reduced into a logical form amenable to processing by computers. This is clearly the polar opposite of the camp we considered above. There can be no rapprochement between these two camps. By the way, I am not a fan of this camp either, finding their definition of “knowledge” indistinguishable from what I would define as “data”, or the meaningful representation of experience.
Clearly, we are circling a large topic here and it is not possible to lay the debates to rest at this time. I have approached this topic on a number of occasions and I have found myself to have been remarkably consistent despite having encountered a wide range of alternatives. (I am hoping that this is more indicative of the merit of my re-conception of the pyramid than of any intransigence on my part.) Of these alternatives, by far the most cogent and worthy of consideration is the position sketched out by Max H. Boisot in Information Space and elaborated on in two successive books (Knowledge Assets and Explorations in Information Space). His work in some ways leverages aspects of both of the above camps and, most delightfully, progresses towards an expansion of considerations from the microcosms of decision-making agents to the larger planes of economic markets and historical change. I am inclined, still, to make further distinctions between phenomena (more rooted in the perceptions of individual cognitive agents) and data (emerging as socially constructed representation schemes), and between knowledge (a socially constructed and continuously evolving understanding) and the basis of decisions (occurring again at the level of cognitive agents and based on a wide range of influencing factors), than are evidenced in Mr. Boisot’s gloriously detailed framework. Perhaps with time, I will find that these distinctions are unnecessary but that is a work in progress.
Returning to the subject at hand, whether there is any merit in continuing to use the pyramidal hierarchy of data, information and knowledge, I am suggesting that there is merit provided that we adopt and refine definitions for the constituent terms that allow us to use the model in useful ways. As the example of Mr. Boisot’s work underscores, if we make sure that we are weighing our options within a broad enough context then the limitations of some interpretations of the data, information and knowledge hierarchy will be illuminated and, with that, some of the more fierce attachments exhibited by the different camps can be left behind.
I agree with your distinctions of data vs phenomena and knowledge vs basis of decisions. It adds the much needed layers of subjectivity vs objectivity to the model, thus making the model stronger.
Posted by: Alex | May 18, 2012 at 12:10 PM