Believe it or not, there was a time when we did not talk about content. At least not in the way we do today. To some ears this will sound decidedly odd. To others it might even sound outrageous. But it is neither. I would like to suggest that the concept of content that we now associate with management and publishing has been shifting under our feet and that these changes should help us to define the term more precisely and to wield it more effectively.
We can start by turning the clock back a few decades and considering how we once used the word content. Whether we were confronted with bewildering stacks of military manuals or unending shelves of historical texts, we would talk about digging into the contents of this or that publication. Even while we fussed over the pagination or the method of printing that had been used, we were certain that the real value lay elsewhere – somewhere within the content being expressed. In these cases, we thought of content as the meaning being communicated by the publication – its intrinsic value. More recently, in an organization buried under mountains of legal dossiers, a colleague of mine declared that what we were trying to do was to break the documents open and unlock the content that lay within.
This use of the word content is completely in line with its historical sense, coming as it does from the Latin term contentum or that which is contained. Content is what we seek to extract from within document containers, and then to apply in a given situation or to reuse in another context. We are in fact highly adept at finding the content within published documents. With remarkable facility, we scan documents, interpret the supplemental meaning provided by the layout, and identify what content is relevant to our needs.
I would like to argue that we need to define the term content more precisely. Specifically, we need to start using the term content in a way that considers content separately from its many potential delivery containers – that thoughtfully abstracts content away from the organization and format of any one publication. I would also argue that it is only by enforcing this separation that we can design and deploy content management and publishing solutions that are genuinely effective, scalable, and sustainable.
Now let’s move the clock forward to the very early days of the web. One thing that stands out about the creation of early web sites is how it forced us to see content in a new light. Once again we were digging into these documents in order to grapple with their contents. But this time it was different. This time we were not only trying to interpret what the documents meant. This time we were trying to extract that meaning so we could make it accessible in a new medium. On the most basic level, we became very conscious of lots of nitty-gritty minutiae like character encoding so that it was possible to display the files correctly on computers over which we had no control. On a higher level, we immediately set to work on finding ways to present the content in compelling and effective ways through the relatively new phenomenon of web browsers. It was an exciting time because we had a chance to explore a new publishing medium and what promised to be a new way of doing business.
Now during the times we were working on these early web sites, we continued to have a community of users, and indeed a very large community of users, for whom the printed output was the most important deliverable. So it became necessary to maintain legacy publishing processes even while we were bringing into service new web delivery channels.
We were not entirely happy about this circumstance. After many years (decades actually and sometimes even longer) of progressive refinement, the organization and format, or as we will call it the layout, of printed publications had become subject to layer upon layer of onerous control. It was not lost on us at the time that one of the things that made the web so attractive was the chance to break away from the stifling regime of print documents that had come to prevail in many organizations. We were at the head of the line when it came to championing the new freedom that the web seemed to offer. The old paper regime had breathed its last and by bringing web servers online, usually by side-stepping obstructions erected by old school information technology groups, we would usher in a brave new world of promiscuous collaboration and unbridled innovation. To lean on a gratuitous reference to The Lord of the Rings, we were all of us deceived. The power of the paper regime, and the empire of documents, could not be so easily undone.
It did not take long for us to discover why many of those print publishing rules had existed in the first place. In all too many circumstances, these legacy layout rules had evolved over years of experience and with the intent of serving very real needs. Often the layout rules were designed to help save precious time for people accessing or handling the publications. In some cases, the rules had evolved in order to ensure user safety. In still others, the rules had become subject to exacting legal requirements whether through judicial precedent or through legislation. While it is true that these publishing rules had become overgrown and top-heavy, there was an infuriatingly large number of them that simply could not be discarded. And this was not an isolated occurrence, limited to just one industry sector. This was as true for academic reference texts as it was for military manuals.
So the realization sank in that, no matter what else we set out to achieve, we would need to maintain a print publishing capability that would sustain the types of print deliverables that legacy business processes demanded. It also became clear that the same pressures that had produced the print publishing rules would come to bear upon our web publishing efforts. And this is exactly how things turned out. What was most surprising, now that we can look back, was the speed and ferocity with which organizational obligations began to collect around web publishing and how quickly we responded with our own bevy of guidelines and publishing rules for the web.
We knew what we needed to do but, of course, we immediately tried to find some shortcuts. Perhaps there was a way to migrate content directly from the print-oriented representation into a web-ready form. Perhaps doing so would allow us to import, holus-bolus, layout conventions from the print world and thereby help us to get around needing to define and validate new publishing rules for the web that would meet the same business objectives. We knew better but we had to try anyway. There was also a lot of pressure to maintain the print update cycles even while we brought new channels online. And naturally there was a desire to see that anything delivered via the web would be synchronized with what had been released in paper. All this meant that time was of the essence and a shortcut for streaming content from print onto the web, if it worked, would have been the answer to many prayers. These experiments turned out pretty much as expected in that they failed miserably. So we returned to what we knew was the necessary answer. We were forced to turn our attention to true multi-channel publishing from a single, authoritative source of content. We were forced to look at the content as content.
In the latter part of the 1980s, there were a number of industry sectors (including the military as well as academic and legal publishing) where the march toward multi-channel publishing had begun a few years earlier with the adoption of the Standard Generalized Markup Language (SGML). In these sectors, publishers had been forced by the escalating complexity of their business to radically change how they prepared, managed and published their content. In a predictable number of cases, a key driver for the adoption of SGML was enabling the distribution of publications through multiple channels with an emphasis on electronic delivery. This move towards SGML was also driven by a need to establish searchable stores of content as a more practical way to handle swelling volumes and long-term access requirements. The people working in this field at the time did have a sense that, in their various experiments, they were exploring the future of publishing.
The introduction of SGML into publishing processes, it turned out, necessitated a lot of changes in how we viewed and handled publications. The most profound of these changes was the fact that applying SGML, if done correctly, would force us to thoughtfully abstract the content away from the delivery formats that were appropriate to any one publishing channel. As an illustration of how difficult making this change was, a disturbing number of the early SGML applications were in fact typesetting specifications re-expressed using angle brackets. Even in these cases, however, there was a growing appreciation for the fact that the way content is prepared and maintained need not be identical to how it is delivered for use, and that automation could be deployed to transform content assets into useful, high quality information products. Fortunately there were also SGML projects that really did pursue the new goal of representing the content as an asset that existed separately from its containers, and that maintained an arm's-length relationship to the organization and formatting associated with various publications. Invariably, these projects also discovered that content assets needed to be managed in a modular way so as to facilitate reuse and referencing across collections that would quickly escalate in size, volatility, and complexity. I would argue that it is within this latter type of SGML project that we first got a glimpse of the true nature of content.
It is worth recalling that an early SGML-based publishing environment, and in fact the system that was used to publish the SGML standard itself in 1986, operated at the high energy physics laboratory at CERN where Tim Berners-Lee was, in parallel, hatching the web. This is why the Hypertext Markup Language (HTML) was framed as an SGML application. This is also why HTML exhibited the common compromise seen in early SGML projects in that it was largely a format-oriented application of SGML. Far from being a problem, the ability for just about anyone to create a web page using simple formatting markup turned out to be one of the key reasons the web took off as it did. It seems a little odd to say, but part of the web's success stems from the fact that it ignored the content and provided instead a new delivery channel for any content that could be poured into a web page and given basic formatting tags. But the fact that HTML draws its roots from SGML also points to a latent capability that lies buried within the tissue of the web – the capability to handle content separately from its presentation layout and thereby to enable, for businesses, far more scalable publishing models and, for users, more responsive web experiences.
Through this series of events one very important thing was becoming clear. For a number of interrelated reasons we were being forced to think about content differently than we had in the past. With the emergence of the web in particular, we started to think about content as the conceptual material that we would plan, prepare and manage in a way that was different from any one of the forms in which it would be delivered to users. This content, if done correctly according to this line of reasoning, would allow us to efficiently produce all of the publication types that would be needed – including those of which we were as yet unaware. This lesson, if it needed reinforcing, has been driven home by the new demands being introduced by the mobile revolution that is unfolding around us.
Somewhere along the line in this journey from print publications to mobile devices, I began to define content in a somewhat unusual way. I came to use the term content to mean “potential information” and, apart from being wilfully idiosyncratic, this definition allowed me to make a sharp, but essential, distinction between the reusable content that we would want to manage as a long term asset and the many transactional forms that the information might take as it is printed, served to a web browser, or delivered to a mobile device. Information, under this rubric, is understood as an action and one that should be judged on whether or not it is effective. Information transactions become the venue where the potential value of content assets is realized.
Looking more deeply into the content, we started to build on our appreciation for the true nature of content. By studying the publishing rules associated with legacy publications, and coming to grips with the business objectives and obligations tied to those rules, we came to understand that in order to fully abstract content away from the organization and formatting of publications we needed to locate, understand, and represent the supplemental meaning that was being expressed in those publishing rules. If we were going to establish appropriate layout behaviour in radically different channels then it was not just the text and media assets that we needed to manage. We needed to manage the relationships between the text and media assets that reflected and respected the logic that governed how content assets would interact during the delivery of effective information experiences. Essentially everything that will be necessary to facilitate effective information events would need to be managed as the content. This is what “potential information” really means. And this is why there is a unique and inescapable role for content technologies - for tools that handle content as content, and that can transform content assets into information products.
Warnings in technical manuals provide a useful illustration. Anyone who has had to open a manual in order to fix something will recall seeing “warnings” that caution people to avoid doing certain things. Within the military, our favorite off-color example was a warning that read “Do not look into the laser with remaining good eye!” As this example showcases, the placement and timing of a warning is important. There is a logical connection between a particular warning and a set of steps in a procedure. This connection must be made explicit because the warning needs to be made prominently visible (and even audible) before someone starts a task. It is also important that the warning remain visible until the relevant steps are completed and the danger has passed. The presentational design criteria that apply will be different for a printed loose-leaf technical manual than they will be for a portable maintenance application that runs on a tablet. But the goal of safety and the logical connections between a warning and a set of tasks will remain constant.
The logic behind publishing rules, such as those governing these warnings, is in fact an intrinsic part of the content and this logic must be incorporated into any representation of the content that hopes to be able to credibly reproduce effective publications in all of the channels being addressed. Hopefully the example of safety warnings helps to illustrate why thoughtfully abstracting content away from the publication layout is not as simple as might initially appear. And hopefully this example helps to explain why thoughtfully abstracting content away from any one publication is essential if we wish to achieve our strategic business goals by effectively addressing emergent publishing channels.
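For readers who think in code, the warning example can be sketched in a few lines of Python. This is only an illustration under assumptions of my own: the class names (Step, SafetyWarning, Procedure), the applies_to field, and the two rendering functions are all hypothetical and do not come from any standard or product. The point is simply that the link between a warning and its steps lives in the content, while each channel decides for itself how to honour that link.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Step:
    step_id: str
    text: str


@dataclass
class SafetyWarning:
    text: str
    applies_to: list[str]  # ids of the steps this warning governs (hypothetical field)


@dataclass
class Procedure:
    title: str
    steps: list[Step]
    warnings: list[SafetyWarning] = field(default_factory=list)


def render_print(proc: Procedure) -> list[str]:
    """Print channel: show each warning once, just before the first step it governs."""
    lines = [proc.title]
    shown: set[int] = set()
    for number, step in enumerate(proc.steps, start=1):
        for index, warning in enumerate(proc.warnings):
            if step.step_id in warning.applies_to and index not in shown:
                lines.append(f"WARNING: {warning.text}")
                shown.add(index)
        lines.append(f"{number}. {step.text}")
    return lines


def render_tablet(proc: Procedure) -> list[dict]:
    """Tablet channel: one screen per step, with every applicable warning pinned to it."""
    screens = []
    for step in proc.steps:
        active = [w.text for w in proc.warnings if step.step_id in w.applies_to]
        screens.append({"step": step.text, "pinned_warnings": active})
    return screens


# Illustrative content asset: the warning is tied to steps s2 and s3 only.
procedure = Procedure(
    title="Align the laser",
    steps=[
        Step("s1", "Remove the housing cover."),
        Step("s2", "Power on the laser."),
        Step("s3", "Adjust the mirror, then power off the laser."),
        Step("s4", "Replace the housing cover."),
    ],
    warnings=[SafetyWarning("Do not look into the beam.", applies_to=["s2", "s3"])],
)

print_lines = render_print(procedure)
tablet_screens = render_tablet(procedure)
```

In this sketch the same Procedure feeds both channels. The print rendering places the warning ahead of the first dangerous step, while the tablet rendering keeps it pinned for as long as the danger lasts, and neither rule had to be written into the content itself.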
Although the definition of content as potential information, and this sharp distinction between content assets and information products, has earned me more than a few rolling eyes over the years, it has proven to be far more indispensable than I would have ever imagined. With this distinction in hand, it becomes possible, even natural, to think about content in two separate, but obviously interrelated, ways. One way asks about how best to design and manage the content assets so as to be ready to support many different publishing channels. The second way directs attention to the processes whereby content assets will be assembled into, and published as, information products that achieve both organizational goals and individual needs.
In fact, over the last 25 years, it has been my experience that truly successful content management and publishing solutions have only been possible when we have rigorously applied this distinction between content assets (as potential information) and information products (as contextualized information events). And whenever affordability, sustainability, and adaptability have been given any attention at all, this approach is the only one that consistently delivers the desired results. By staging content assets in a way that is thoughtfully independent of all publishing channels and that makes explicit the logical connections between content assets, we have been able to optimize how we acquire and manage that content while simultaneously optimizing how that content is published across channels and in response to always-changing user needs. We have literally been able to have our cake and eat it too.
At the Best Practices conference for the Center for Information Development Management (CIDM), in Saint Petersburg, Florida, I gave a short TEDTalk-style presentation on this subject. The short talk did not dig into all of the details addressed in this post, although it did devote more attention to some themes. Specifically, the presentation did a better job of approaching, and illuminating, how escalating complexity drives the move towards content modularity, and how content technologies have emerged as a distinct technology discipline that really does set out to handle content as content.