Content Integration
The Consolations of Content Management

Poetry in the Machine

Urizen by William Blake

The question sometimes comes up of what the value of an arts education might be in the modern world. Acknowledging that this is quite a sprawling question and that the better answers will be the least welcome ones, we can choose instead to focus on a much smaller version of the question. What value, if any, can be realized by projects that seek to apply modern technology to the handling and analysis of literary and historical materials? What value might there be in what is often called the "digital humanities"?

There are two sides to this question. One is fairly obvious and uncontroversial. The other is markedly less so but also much more interesting.

The uncontroversial rendition essentially asks whether the application of technology can assist researchers in the humanities to advance their work. The answer is quite obviously yes. As more and more historical and literary materials find their way online, and are joined there by more and more published research, researchers can access and connect these materials with ever greater speed. Steven Pinker, in a recent talk in Ottawa about his latest book (The Better Angels of Our Nature: Why Violence Has Declined), specifically highlighted the fact that a work of such sprawling synthesis could not have been completed in a reasonable time if the vast majority of the relevant materials had not been available online.

The Web of Linked Data

And this points us towards what is probably the most important way that technology can be leveraged to improve how work in the humanities is pursued. The use being made of open standards to make materials accessible is something that is accelerating and evolving at quite a pace. The open standards in question are largely those surrounding the open encoding of research materials themselves using the Extensible Markup Language (XML) and the open exchange of data resources using the Resource Description Framework (RDF). The consequence of leveraging these standards is that the full range of the content becomes available for linking and processing using automated mechanisms. This, in turn, can be used to discover and document new connections that would have remained hidden without these ways to expand the researchers' view.
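To make the idea of automated linking a little more concrete, here is a minimal sketch in Python. It assumes two tiny XML-encoded fragments whose element names, attribute names and identifiers (letter, essay, name, place, ref, "letter-1", "essay-7") are all invented for illustration, and it reduces each marked-up reference to a simple subject, predicate, object triple in the spirit of RDF rather than using a real RDF library or vocabulary.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML-encoded fragments. The element names, the "ref" attribute
# and the document identifiers are assumptions made for this sketch only.
DOCS = {
    "letter-1": (
        '<letter><p>Dined with <name ref="blake">Mr. Blake</name> in '
        '<place ref="london">London</place>.</p></letter>'
    ),
    "essay-7": (
        '<essay><p>The engravings of <name ref="blake">William Blake</name> '
        'reward close study.</p></essay>'
    ),
}


def extract_triples(doc_id, xml_text):
    """Turn every element carrying a ref attribute into a (document, 'mentions', entity) triple."""
    root = ET.fromstring(xml_text)
    triples = []
    for element in root.iter():
        ref = element.get("ref")
        if ref is not None:
            triples.append((doc_id, "mentions", ref))
    return triples


def shared_mentions(triples):
    """Group documents by the entity they mention and keep entities found in more than one document."""
    by_entity = {}
    for subject, _predicate, obj in triples:
        by_entity.setdefault(obj, set()).add(subject)
    return {entity: sorted(docs) for entity, docs in by_entity.items() if len(docs) > 1}


all_triples = [
    triple
    for doc_id, text in DOCS.items()
    for triple in extract_triples(doc_id, text)
]
links = shared_mentions(all_triples)
print(links)
```

The point of the sketch is only that once the references are encoded explicitly, the connection between the letter and the essay can be found mechanically rather than by a reader who happens to know both.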

Additional examples could be put forward exemplifying how technology can assist humanities research but all of them will have a similar air of obviousness about them. If there are other angles to explore when considering this question then that is really where our attention should go. For example, are there benefits for the field of information technology that might be realized through digital humanities projects? The answer to this question is also a firm "yes" and in fact evidence of this can be found all around us.

Ghost in the Machine

As should surprise no one, literary and historical materials exhibit a bewildering range of features. This is true of the materials themselves and it is true of the publishing, dissemination and revision patterns that trace each unit's trajectory through history. And this is true whether we are talking about volumes of poetry, picaresque novels, journalistic exchanges, or encyclopedic reference tomes. It follows that every attempt to apply technology to the handling of these materials forces us to wrestle with a plethora of complex challenges. Specifically, in wrestling with literary and historical materials we are forced to think about patterns in human communication that we would be unlikely to encounter in the conduct of routine business. And this has the consequence that we will, through these endeavours, develop and test technologies that are fundamentally more attuned to the unpredictable ways in which people behave than we would ever otherwise imagine. Projects in the digital humanities, if pursued wholeheartedly, will push the boundaries of technology. In fact, this is, in no small way, how we arrived at the internet of today.

Although it is neither well known nor widely acknowledged, among the most interesting things about the above-mentioned open standards (XML, RDF, and their fellow travellers) is the fact that they have emerged in large part to give us better ways to handle the things that people actually need to do with the content they create, share, squirrel away, retrieve, reuse, publish, forget, recover and so on. Historically, computers have evolved to do a reasonably good job at handling a very limited set of the transactions that people undertake. Specifically, computers have established a real strength around capturing, storing and processing data such as what we find in inventories and financial ledgers. Computers have, until recently, been far less successful at handling "content", which is a much more complex amalgam of data prepared and transacted in ways that are primarily geared to consumption by people. As we see with social media, however, there is a massive, and growing, interest in applying computers to helping people create, manage, share and evolve content as part of their social networks. It turns out that one of the best ways to learn about what content is like and what people tend to do with it is to look at literary and historical materials. The patterns we encounter in poetry, drama, encyclopedias, commonplace books, correspondence, newspapers and so on exemplify exactly the range of structures we find ourselves wrestling with today. And so it is that there are numerous precedents where we see how vanguard work on digital humanities projects has provided lessons and working models that have helped us to tackle content as it surges online.

A Matrix of Content

One example is a rather grand one. The Standard Generalized Markup Language (SGML) was ratified as an ISO standard (ISO 8879) in 1986. While the standard traces its roots back to challenges like handling legal material on mainframe computers, the team that helped finalize it included people with an interesting array of experiences, including at least one doctorate in German Philology. As a consequence of this diversity, the standard embodied a grammar for representing the patterns found in human communication, one that spanned a disturbingly broad range and definitely reflected an acute familiarity with literature. Moving the dial forward, we find a reformed physicist working at CERN and trying to bring to life something called the World Wide Web. By one of recent history's better turns of good fortune, the publishing team working at CERN at the time was populated with expert users of the SGML standard (ironically one of the world's few locations with a high concentration of this knowledge). And so it was that the HyperText Markup Language (HTML) accidentally leveraged SGML and with it a wide range of features that made it something that could be extended easily to accommodate a universe of content prepared by people working in many fields and in many countries.

There are other examples that are much more explicit. In an obscure area of the academy, work had progressed on a standard music description language and this effort had confronted the issues with representing time sequences in a way that could be interpreted by different software programs. When the people working on making technical information for military systems more effective stumbled upon this work, they immediately recognized that however useful these models might be for representing Mozart to a computer, they were going to be excellent for representing multimedia training resources for a battleship. And this is precisely the path that the US Navy decided to follow, although they understandably felt inclined to keep quiet about their sources. In another example, in this case with the Canadian military and a solution that was then escalated to a number of NATO projects, an architectural strategy was taken up that had been originally developed by the Text Encoding Initiative (TEI) for encoding and exchanging "headers" (bibliographic profiles for literary and historical resources). In the legal environment, it was found that the models expertly developed by the Text Encoding Initiative for marking up drama texts applied directly and completely to the encoding needs of court reporting systems. Court cases, it turned out, are pure theatre and the applicability of the drama model surprised no one who was working on the project. In other cases, strategies for encoding and exchanging complex indices for artifacts such as encyclopedias blazed the trail that in many ways leads us to the open data standards that are so topical today in discussions of open and linked data (RDF and OWL, the Web Ontology Language).
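The fit between drama markup and court reporting is easy to picture with a small sketch. The fragment below borrows the TEI drama elements for a speech (sp, with a who attribute), a speaker label (speaker) and a stage direction (stage), but it is a simplified, namespace-free approximation; the transcript text, the identifiers "#judge" and "#counsel", and the helper function are all invented for illustration and do not reproduce any actual court reporting system.

```python
import xml.etree.ElementTree as ET

# A hypothetical hearing transcript encoded with drama-style elements.
# Simplified for the sketch: no TEI namespace, no header, invented content.
TRANSCRIPT = """
<div type="hearing">
  <stage>The court is called to order.</stage>
  <sp who="#judge"><speaker>THE COURT</speaker><p>Counsel, you may proceed.</p></sp>
  <sp who="#counsel"><speaker>MS. SMITH</speaker><p>Thank you, Your Honour.</p></sp>
  <sp who="#judge"><speaker>THE COURT</speaker><p>Call your first witness.</p></sp>
</div>
"""


def speaking_turns(xml_text):
    """Return (who, speaker label, spoken text) for each speech, in document order."""
    root = ET.fromstring(xml_text.strip())
    turns = []
    for sp in root.iter("sp"):
        speaker = sp.findtext("speaker", default="")
        text = " ".join(p.text or "" for p in sp.findall("p"))
        turns.append((sp.get("who"), speaker, text))
    return turns


turns = speaking_turns(TRANSCRIPT)
for who, speaker, text in turns:
    print(f"{speaker}: {text}")
```

Exactly the same function would list the speeches in a scene from a play, which is the sense in which the drama model carries over: a speaker, a turn, and the occasional direction about what is happening in the room.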


It is interesting that of the two lead editors of the XML recommendation (W3C 1998), one wielded a doctorate in Comparative Literature and the other had cut his teeth as a professional software developer working on the project to digitize and index the Oxford English Dictionary. I have written more about the evolution of these open standards, and how they have enabled the online revolution occurring around us. See the whitepaper The Emergence of Intelligent Content: The Evolution of Open Content Standards and their Significance. For more on the nature of content, see The Truth about Content and its related posts.

And so it is that many of the open standards that underpin the web and the burgeoning world of social media exhibit a variety of connections to the world of digital humanities. While it is something that would benefit from some systematic research, it is difficult not to conclude that these connections, and their roots in the world of literary and historical research, have contributed a lot to the subsequent success of these standards in enabling a brave new world of digital communication. It follows then that there will be ongoing benefits to be realized by returning to the font and continuing to explore how leading-edge technologies can be applied to yet more literary and historical challenges. Current efforts in this area are in fact stirring up new opportunities to advance technology toward becoming more adept at handling the content that people actually produce and use. Indeed, this is an important avenue to pursue because it will be through these efforts that we will ultimately give our technology the human face it needs.

Text Silhouette
See Screen (2002-2005)

