
Putting Content in its Place

Forest Erratic Boulder

A question that comes up from time to time, and indeed one I pose to myself quite regularly, is how does the investment in industry event participation pay off. The first answer is usually that it is a way to generate business leads. This is true but it is also superficial. If that is the only goal then there are definitely more effective ways to proceed which are also less taxing. So what then might be another potential benefit that we could use to justify the investment of time and energy that is called for? At the recent Intelligent Content Conference (San Jose, February 2014), I encountered one of these other benefits in the form of a really good question.

Intelligent Content 2014

My presentation was entitled "So You Want to be a Content Engineer" and, as usual, it included a handful of painfully precise definitions and then followed a somewhat unforgiving logic to wind up with "Do You Still Want to be a Content Engineer?" I will return to the topic of Content Engineering at another time. What I wanted to highlight was one of the questions that was put to me once I was done. It was posed by Tatiana Batova from Arizona State University (@TatianaBatova) and Rebekka Andersen from the University of California, Davis (@RebekkaUCD). The question stuck in my mind for a few weeks until, sitting in a coffee shop in Vancouver, I finally hit upon an answer. 

So what was this vexing question? It was a simple one, as the hardest questions usually are. "How does your definition of Content as Potential Information fit into the commonly used Data - Information - Knowledge hierarchy?" There could simply be no better question to distract me. Squirrel! (I think that they did this on purpose.)

I have danced around an answer to this question on a few occasions. Below are some of the more germane posts that addressed aspects of this question. But none of them ultimately tackled this question directly and not finding a way to integrate these different elements into a single story was a sore point. So this question hit its mark to be sure. 

I will try to save my valiant readers the effort and aggravation of digging through these past posts and summarize the key parts here. Essentially, there has been a long-standing model called the Data - Information - Knowledge - Wisdom hierarchy referred to in short as the DIKW pyramid. In Perspectives on Data, Information & Knowledge, I tried to reset the foundation of this hierarchy so it could be more practically useful.

Data Information Knowledge Pyramid Hierarchy

In The Anatomy of Knowledge, I took up some of the key concepts feeding into a practical definition of knowledge and this led me to make a clear distinction between the potential inherent in knowledge and the real-world decisions and actions that become possible as a result. I was separating potentials from actuals and in doing so I was following a venerable line of reasoning running from Plato through to Sir Francis Bacon and beyond. (Of course, many of my colleagues in the Knowledge Management industry found this logic to be a little too harsh, preferring more of a "mint jelly" definition of knowledge that simply tastes better.) I illustrated the dichotomy between potentials (blue) and actuals (green) in what I dubbed The Knowledge Dynamic.

The Knowledge Dynamic

In The Truth about Content, I set out to address a core weakness that afflicted the content management industry - namely that there was no consistent or particularly illuminating definition for the core concept of "content". Following the potential / actual dichotomy seen in the Knowledge Dynamic, I chose to define content as "potential information" - as something we manage in order to execute highly effective information actions. This definition of content suffers from being a little too abstract for the tastes of many content professionals but it does have some serious legs (it does, for example, flow naturally from the Latin etymology behind the word "content") and it can come to grow on you over time because it does help when we come around to "managing content".

So to get back to our core question here - how does content, when defined as potential information, fit into the Data, Information and Knowledge Hierarchy? It turns out that the answer was already present in the definition of content as potential information that, once transacted, becomes the content that the information transaction carries into the world for interpretation and application. So our Data-Information-Knowledge pyramid and the core of my knowledge dynamic become:

Data Information Content Knowledge

It turns out that there are several interesting implications of making content the connective tissue that links information transactions to data resources and accumulates these transactions into the social construction of grounded knowledge that, in turn, serves as the basis for sound judgment and effective action. In effect, the placement of content in the role of intermediary in these models underlines why content is so important as a concept and why there is, and should be, a burgeoning industry focused on designing, managing and publishing content as expeditiously as possible.

And I think that this is ultimately what I have been trying to do over all these years: to identify exactly how content practices and technologies concretely fit into the big picture of what it is that enterprises are doing and perhaps even into what they figure that they are doing.

Without exhausting the many pathways I could take from this model, we can still explore a few avenues. For example, the above model might stand as a form of ideal for an enterprise that must draw upon a massive and growing base of high quality data in order to perform effectively and in order to build up a market-leading understanding of the domains of knowledge that are germane to its ongoing evolution. We might think of an enterprise that designs and builds complex engineered systems as falling squarely into this high-end scenario.

There are other scenarios that we can consider. For example, how about an enterprise that operates in a well understood area, say government operations or mass produced consumer products, where a big part of the success of the operation will depend on streamlining how much data and knowledge its business activities demand? It is a common mistake to think of data, information and knowledge as undiluted goods that must be universally maximized. In reality, an effective business will acquire, manage and leverage only the data, information and knowledge that it really needs. Anything more than that is overhead and often a contingent liability.

Data Information Content Knowledge

And then there are businesses that are all but allergic to data, information and knowledge, and their success frequently depends on how formulaic their operations can be made. Think of an auto dealership or a call center. In these scenarios, the range of information transactions that are supported is incredibly narrow and keeping it so is central to maintaining their margins.

Data Information Content Knowledge

What we see in this case is a winnowing down of the intellectual resources that an enterprise depends upon. The goal is increased efficiency. And depending on the type of business this strategy can be more than just advisable. It might be a matter of survival. But consider how and when this particular strategy will break down. What happens, for example, if the industry landscape changes violently? Will a streamlined enterprise such as this be able to adapt? Without access to deeper stores of knowledge and data, these types of streamlined enterprises frequently perish in the face of major environmental change. But once stability returns, newly streamlined enterprises will spring up and once again choke out those competitors that carry around too much knowledge. It is the circle of institutional life.

All of this helps to encourage us to continue exploring and elaborating a suitably robust concept of content. It turns out that by following this line of reasoning we advance our understanding of content quite substantially and this advancement still fits neatly within our core definition of content as potential information. Our model here helps us to see that inherent to the concept of content is its grounding in data resources and in established (and evolving) knowledge assets. Authority, established through fully contextualized references, becomes something that is difficult to separate from the notion that content is the potential upon which an enterprise can effectively execute many different information transactions and do so in many different contexts.

There is merit, I would submit, in exploring these implications and possibilities. There are many enterprises that really do depend on extensive and expanding arrays of knowledge assets and data resources and our work in the "content business" is a key piece in the overall puzzle of their future success. And there is merit in developing and deploying a more robust definition for content than might seem appropriate in more streamlined and selective scenarios. At this lower end of the spectrum, it is conceivably possible to get away with defining content as "words and pictures" or "what we put into the template fields". As we move up the spectrum, we quickly find that an overly simple definition of content leaves us high and dry when we have more important problems to solve. Understanding content as potential information, and seeing it as the connective tissue enlivening the data, information and knowledge dynamic, gives us a full range of possibilities. And this is what we want of our content.

And besides, putting content into its place within the hierarchy of data, information, content and knowledge provides us with a much more entertaining acronym than we had previously.



Account Deleted

Hi Joe,

I haven't had the opportunity to see you present, though I've been aware of your work for quite a while. A few years ago I was doing a lot of work with XML and architecting an ecommerce publishing solution built around XSLT. These days I'm doing something different: Managing the creation of online content for a public TV station. Perhaps inevitably my work is taking me back into the broad Information Architecture realm again.

My answer to the question "So Do You Still Want to be a Content Engineer?" asked on the last slide of your presentation "So You Want to be a Content Engineer" is "yes." Perhaps I'd have a different answer if I'd actually been at the Intelligent Content Conference and not merely seen the slides on Slideshare but my first reaction to the term Content Engineer was "Finally there's a label that describes how I think."

That said, I'm definitely coming at the content definition problem from a different perspective than you are. I started thinking about this when I was coming to grips with the Document Object Model in XML and HTML contexts. My view is definitely very colored by the fact that my job has always ultimately been defined by my ability to deliver effective user experiences, usually without the time and resources to build coherent domain models.

So, with that introduction and preface out of the way here's the way I think about content...

Content is not a thing. It is a human experience. Content is presentation and meaning combined.

Information is structured data. It has context and definition. It is the pupal stage between data's caterpillar and content's butterfly.

Data is, well, data. It is not structured per se, but it contains keys that can be used to create structure during input, output, or management operations.

So, my conceptual model is Data - Information - Content, which doesn't seem that different from your Data - Information - Knowledge pyramid, except for my concept of Content as a temporal experience.

My idealized content lifecycle is an hourglass shape. On the input side, creation is a human content experience that generates structured information, which is then stored as data. On the output side, stored data is retrieved and structured into information (say as semantic HTML), and then made into content by applying experience in the form of styling and behavior.

Maybe my way of looking at it is motivated in part by the feeling that applying the term "Knowledge" to many of the messages I must deliver as content experience is to give them weight they don't deserve. It's also arguable that I'm really talking about Model - View - Controller software architecture, but I'd counter that the difference is I'm setting Content, and my definition of it as a human experience, into a position of primacy that the term "Controller" can't express.

Thanks for an interesting read and a chance to explore my own thinking.

-Brook Ellingwood


It's nice to see respectful disagreement about a topic and keep the tone thoughtful. Joe and I have been through this dance before, and I'm still of the mind that there is a slightly different paradigm.

Data is, well, data. It can be quite important (think cost) but without any context, it's useless. Example: 50. 14.
Content brings context. Example: Pay only 50% of the parking ticket if you pay within 14 days.

So content is what makes data actionable. But content itself is not information. Information is defined by the Oxford Dictionary as something "provided or learned" or "what is conveyed". Using this basis, information needs to be communicated to an audience and absorbed or learned. In other words, information is learning a fact from experiencing ("reading" in the broadest sense of the word) content.

Knowledge is a whole other beast. Again, going back to the basic definition, it is the assimilation of information to be able to understand and apply it. Continuing my example from above, the knowledge could be: I'd better pay that parking ticket now, to get the discount.

Given the graphic where content is subsumed into information, I'm not sure I agree. I believe that the graphic should show knowledge, information, and content (with data being a subset of content). When I think about this in the last few contracts, and particularly now, which is squarely in ecommerce, the role of data and content is front and centre in my mind - and reinforces my position. It does mess up the acronym, but hey, we all have to make sacrifices.

Joe Gollner

Thanks Brook and Rahel for contributing to the mix. And I will have to thank you both as well for sending me off on another "thought-circuit".

It is to be expected that a term such as "content" which is so central to the professional lives, or product investments, of so many people, should be a bit of a work in progress. I salute anyone who ventures beyond "words and stuff". I also sympathize with those who hang back and say "that shark infested lagoon doesn't look like a lot of fun".

The thought-circuit that I am now embarking on is how our different perspectives may in fact be reconciled or at least seen in a way where the differences in definition are fully understandable once the right lens is set. Some of my earlier efforts on the term "knowledge" were necessitated because, as a form of rough karma, I found myself in two communities (one made up of KM practitioners and the other made up of "hard scientists") and I needed to reconcile the apparently irreconcilable definitions of knowledge that were in use. Of course, I probably managed to satisfy neither camp.

At root, one of the key issues to be tackled, or at least situated, is where to place the "locus of definition" (a point of reference for all subsequent coordinates). Underlying my entire project is the not so subtle conceit that it is better to center our definitions of knowledge, information, data and thus content outside the cognitive behaviour of any agent that may be on the receiving side of such external resources. This is a big shift for some and one that cannot be made by those who, for different reasons, prefer to place the locus inside the cognitive behaviour of agents (people or autonomous agents for my AI friends). I find that centering the locus of these definitions outside of cognition places them in the domain of communication and this in fact enriches the landscape because it permits other constructs and strategies to be leveraged in the effort to tackle the inner workings of cognitive agents. This way the complexities on both sides, and there are many, as well as their interactions, can be excavated and scrutinized. In working with Max Boisot's information space, for example, I found it better to make a sharp distinction between perception and representation, with the latter being associated with "data".

I tend to argue that once we place the locus of definition outside of the agent's cognition and decision, then it becomes possible to genuinely talk about managing knowledge, information or content. I will confess that I saw this as a worthwhile step.

So with that disclosure out in the open, you can see how indeed my rubric becomes inapplicable if we place part, or all, of it inside the cognitive behaviour of an agent. Content as a temporal experience would do this. Information defined by its receipt as learning, or knowledge defined by its display in action, would do this as well.

Now in wrestling with terminology, the goal is not really about winning. It's about exploring. And you have both raised points that I need to think a lot more about. Stressing that content is an experience, and both for the recipient and the creator (and stressing both of these is frequently not done and this is a problem), does resurface things within my rubric. For example, my definitional approach, and in many ways the entire "management project" applied to content, runs the risk of making things "cold". One of the challenges on content management projects is to avoid severing the vital link between creators and users. (Somewhere on my blog I follow Jack Kerouac into this topic http://www.gollner.ca/2009/09/connecting-with-content.html)

There is also the matter of remembering how important formatting really is - again something lost in many CMS projects where people get it into their heads that the goal is to abstract content away from format, with the unintended consequence that all renditions subsequently underwhelm. The reason we abstract content from format (or perhaps some would prefer to say, structure from format) is so that we can deliver better formatted content, not worse.

With all that said and done, there is more to be explored here - which was the real message in my post. I am thrilled that you have taken up that thread and sent me off in new directions. I would submit though that the larger point I have raised here about the placement of the locus of definition does a lot to explain, if not resolve, our differences. I would also submit that the model that I have sketched out can be used to accommodate and explain the specific details you have identified as needing sound handling - whether they be eCommerce data or content experiences.

Joe Gollner

One more thing. In truth I suspect that there might not be too much difference between Rahel and me on the definition of content after all. I remember thinking that the first time I saw her present her definition.

In my rubric, information is the meaningful organization of data, communicated in a specific context and with the purpose of informing others and thereby influencing their actions. This definitely picks up on the transactional dimension that I am forever emphasizing (a la Speech Acts). If we pull away the transaction part of this definition we would presumably be left with "content" which is potential information (aka what precedes the transaction or transactions). So another way to define content, in a way that is less obtuse (something I excel at, if you haven't noticed), would be to say that "content is the meaningful organization of data" which is not so different from "content is contextualized data". Not so different at all. And the transactional wrap-up in my definition of information does try to connect us to the world of the recipients who are "informed" by the receipt.

All this to say, that even with my specialized placement of the locus of definition, there may not be a whole lot separating some of the objects that the different perspectives are zeroing in on.

And yes, my diagram could be redrawn with content being the band between Data and Information. Definitely. But what then of my acronym? And how would I showcase my "skills" with Visio? You have to leave me something...

Marcia Riefer Johnston

Such thoughtful analyses by all parties. Here's my less-nuanced way of thinking about the main distinction: content is to knowledge what food is to energy. But then, as the acronym goes, I clearly don't know D-I-...

Geoff Dutton

Just discovered your stimulating blog after reading a bio that Stilo posted on an advert for a webinar. I'm a simple technical writer (OK, communicator; I also design diagrams and other graphics) working in a software factory. I would never call myself an information architect, though some of my peers do. I do call myself an author, and write for fun when not writing for profit.

I believe I understand the various distinctions that you and your guests make about data, information, content, and knowledge; some I find intuitive, others seem a bit labored. But at the risk of seeming to demean your life's work and appearing to be a publishing Philistine, I cannot help but wonder aloud why such distinctions really matter to someone like me, or to people who read what I create.

Consider a relational or other database that contains numbers and text, and functions and procedures to process them into snippets of content that can be assembled into documents. Now add sets of rules so that the docs can generate automatically based on user requests and the page's mission. Sounds like what happens on DHTML web pages.

Through the lens of a DBA, the contents of the database are pretty much just data. "Ah no," says the content manager, "this is a store of information and knowledge." People who wordsmith text might regard the database as content. People who lay out the web pages and orchestrate interactions with them may think of it as information design. A visitor to the page might think "Some of this information is useful", "the information I'm looking for is too hard to find", "the presentation is attractive/unattractive", etc.

They all have their jobs to do, and will do them well or not based on their skills, tools, objectives, and motivation. Does how they categorize "stuff" actually help or hinder them?

As an aside, in the end, the user's experience is the contextualization that really matters. All the machinery and activity to contextualize data on the publishing side may or may not create a satisfying experience if it is not congruent to what the user expects, desires, or needs.

Discussions of these terms and their distinctions remind me of the blind-men-and-elephant parable, and I come off feeling that whether we label something data, content, information, or knowledge is rather immaterial. If the distinctions are material to doing my job or to the creative writing process, I'll need some help to understand why I should commit further effort to thinking about them. Thanks.
