
Well now that I have thrown down the gauntlet in my previous post about the Trials and Tribulations of Content Management, I feel that I should take the next step and posit some core definitions for terms such as content, information, data, publishing and even knowledge. Hopefully, others will contribute superior alternatives or hack these contributions to pieces so as to expose something better.
And why not start with the term content? There is an entire industry dedicated to its management, so we would hope that it is a term for which we have a ready definition. In reality this does not seem to be the case. So let’s toss one out.
Content is potential information.
Content is the raw material from which information is fashioned. It encompasses all of the constituent pieces that, when assembled together, constitute a specific information transaction. So it is that when we undertake a content analysis we are in effect rewinding the communication process that produces the information product. Working backwards from the envisioned audience and the intended outcome, we itemize the many pieces that will need to come together to forge an effective information event. Many of these pieces are recognizable to practitioners in the content management and publishing field – text components, media assets, assembly maps, governing models, processing rules, audience profiles, personalization filters, formatting stylesheets, metadata structures, relationship links, security controls and probably others. All must come together to form an information output which, in being delivered, becomes an information transaction.
This leads us to a point where a second definition is needed – one for information. In many environments, the terms content and information are used almost interchangeably but not here because to do so would be to conceal an all-important difference.
Information is the meaningful organization of data, communicated in a specific context and with the purpose of informing others and thereby influencing their actions.
In other words, information is transactional. It is an action. A given information product will therefore have specific goals, take on a specific physical form, occur in a specific context, and be exchanged between specific participants. Information is also authoritative, meaning that someone is responsible for it just as someone is responsible for an action. This fact explains why information management has been receiving so much attention in this time of heightened accountability. Information, it is important to note, is formatted and physical – it has been fashioned into a form that is fit for its purpose whether that be as a book, a text message, a voice prompt or a web page. The transactional unit of information, the artifact that is ultimately encountered, is known as a document.
Once enacted, information, as we well know, persists and accumulates - in some ways taking on a life of its own quite separate from the context in which it originally occurred. Past information transactions, as an illustration of this process, become inputs to, and reference points for, content creators and information actors. As information events are actions they will give rise to consequences and this experience can become part of what is known about each information transaction (its outcome or results). In this way, past information events become a source of content that can inform, and potentially improve the effectiveness of, future information transactions.
So how does content become information? How does it change from being merely the potential for action to being a real event? By what mechanism do all the constituent pieces assemble together to form an information transaction that will suit a specific circumstance? This is the role of publishing.
Publishing is the process of transforming content into information, of converting the latent assets into concrete actions that will inform people, impact performance and influence outcomes.
As a sweeping generalization, I would be inclined to say that publishing technology stands as the most challenging aspect of the entire content management domain. The variety and sophistication of the transformations needed to support the increasingly wide range of information transactions that organizations seem to need today represent a very demanding requirement, especially when everyone seems to want access to relevant information at the touch of a button.
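For readers who think more readily in code, the definition of publishing can be sketched very roughly in Python. This is only an illustration under my own assumptions, not a description of any real content management system: the names `Content`, `AudienceProfile`, `Document` and `publish` are hypothetical, and the sketch uses just a few of the constituent pieces listed earlier (text components, an assembly map, a formatting stylesheet and a personalization filter).

```python
from dataclasses import dataclass
from string import Template


@dataclass(frozen=True)
class Content:
    """Potential information: the pieces, not yet assembled for anyone."""
    text_components: dict[str, str]  # named text fragments
    assembly_map: list[str]          # order in which fragments are assembled
    stylesheet: Template             # formatting rule applied to each fragment


@dataclass(frozen=True)
class AudienceProfile:
    """Hypothetical audience description with a simple personalization filter."""
    name: str
    excluded: frozenset[str] = frozenset()  # components this audience should not receive


@dataclass(frozen=True)
class Document:
    """The transactional unit of information: formatted, addressed and owned."""
    audience: str
    responsible_party: str
    body: str


def publish(content: Content, audience: AudienceProfile, responsible_party: str) -> Document:
    """Transform content into one information transaction for one audience."""
    # Apply the personalization filter while following the assembly map.
    selected = [key for key in content.assembly_map if key not in audience.excluded]
    # Apply the formatting stylesheet to each selected text component.
    rendered = [
        content.stylesheet.substitute(text=content.text_components[key])
        for key in selected
    ]
    return Document(audience.name, responsible_party, "\n".join(rendered))


# The same content yields different documents for different audiences.
content = Content(
    text_components={
        "summary": "Content is potential information.",
        "detail": "Publishing makes it actual.",
    },
    assembly_map=["summary", "detail"],
    stylesheet=Template("* $text"),
)
executive = AudienceProfile("executive", excluded=frozenset({"detail"}))
practitioner = AudienceProfile("practitioner")

brief = publish(content, executive, responsible_party="author")
full = publish(content, practitioner, responsible_party="author")
```

The point of the sketch is simply that `content` on its own informs no one. Only the call to `publish`, made for a specific audience and with a named responsible party, produces a document, and the same content can give rise to many different documents.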
In our posited definition for information, we see another term deserving of attention – data.
Data is the meaningful representation of experience.
More specifically, data is the method of encoding that we use to represent the basic building blocks of communication, with these providing the substance for what we call content. It would be a distraction to dig too deeply into the subject of data now but suffice it to say that the various components that make up content will each be associated with a form of representation for which rules have been specified. One such component would be a formatting stylesheet which will, in addition to a basic character encoding, possess a specific syntax that governs how formatting instructions are to be framed as well as how they are to be applied and to what. So not only are the details arrayed in a table in a final document considered data, so should the formatting stylesheet that governs its rendition.
It is at this point that it is worth pausing to consider the subject of openness and portability. These considerations are effectively determined at the level of data representation and it is here that the adoption of, and adherence to, open standards will establish whether content components will themselves be open and portable or not. It is also here that the degree of intelligence that can be exhibited in the content components will be decided. How effectively and efficiently information transactions can traverse boundaries associated with different media and different devices is, in turn, governed by the openness, portability and intelligence of the content that stands behind them.
Most recently, I have come to think of content as a transformation layer that separates data representations from information transactions – the layer in which we, as the content owners and information actors, plan the many different ways in which we may want to engage audiences and to lay out our resources and processes accordingly.
Finally, the relationships between these concepts are important because they illuminate the mechanisms whereby organizations and societies express, share and evolve their knowledge. So as a final step, I offer a somewhat controversial definition of knowledge itself. It is controversial because it emphasizes the physicality of knowledge (as emerging from the meaningful interplay and organization of information transactions) and its standing as an evolving understanding that is publicly communicated and therefore subject to testing, refutation and even validation. This definition explicitly links knowledge to our definition of information, and thereby to our definitions of data, content and publishing.
Knowledge is the meaningful organization of information, expressing an evolving understanding of a subject and establishing a justified basis for judgment and the potential for effective action.

As we might expect, content is what is contained in an information transaction, within a document. When we look at the lifecycle that leads up to the formation of information actions, we see that it is content, in its rich physicality, that is being fashioned, evaluated, assembled, endorsed and eventually incorporated into the transactions. We also see that the content of past transactions can be referenced, reused or revised in the framing of new information events. Data resources, of varying types, can be seen in evidence in each and every content component and, likewise, the imprint of background knowledge, whether acknowledged or not, can be seen in the representational data schemes and in the structural patterns underlying each information product. And knowledge itself achieves a degree of persistence and portability when it is instantiated in networks of inter-related information transactions that trace out, over time, the emergence and evolution of a shared understanding of a subject.
This overall system, or content ecology, becomes, in no time, almost unfathomably complex, and this is one of the reasons why content management and publishing solutions have, historically, struggled, to put it politely. When we really think about it, it should come as no surprise that the management of content, together with its publication and cyclical evolution, is phenomenally difficult. The one mistake that must be avoided is assuming, as all too many technology implementations have, that content and its associated processes are simple.

This is an awful lot to pack into a relatively short declaration but I believe we have managed to float definitions for content, information, documents, publishing, data and knowledge. And without sound working definitions for these concepts, content management, as an industry and as a technology domain, cannot hope to be successful. As this is something that we, as a community of content management practitioners, have a stake in, I am hopeful that others will weigh in to challenge, correct, change or confirm these definitions.
Now to make the context for this specific post, this information transaction, explicit, I will say that the notes for it were prepared in the above-pictured quadrangle at Pembroke College, Oxford. It so happened that at the time organ music was pouring out from the small chapel that would be off to the right of the above photo. The blog entry was finalized and posted from my well-appointed room and desk as shown below. It is noteworthy that perhaps Pembroke's most famous alumnus is none other than Samuel Johnson, who among many other things produced a highly influential dictionary of the English language. Perhaps it is his spirit that spurred me into a fit of definitions and inclined me to sweeping generalizations and serpentine sentences. To what extent this context might colour the meaning of this post will be left open to debate. Of course, it might also have been the wine.
