What Structured Content Can Do for You (and what it can’t)

I gave a talk about causes of structured content management project failure at the CMS/DITA Europe conference last month. Two common causes of failure are lack of success criteria and no business case. Decision makers often want to buy tools such as CMSs without having a clear idea of how they will use them or what the benefits will be. Perhaps they hear that competitors are using those tools, or they come away from conferences with the vague impression that structured content is a good thing. But without a clear idea of what benefits can be achieved and how, any such project is doomed to inefficiency or complete failure.

Slide on Kappelman et al.’s 12 early warning signs of IT project failure, from my presentation at CMS/DITA Europe 2014. From a photo by Keith Schengili-Roberts

To begin with, we should be clear what we mean by the term structured content. In common usage, it refers to content that has metadata applied to its subcomponents. Metadata is “information about information”, and in this case it tells you more about the content it’s applied to. For example, a library cataloging system contains descriptive metadata that helps library users to find books by keyword, author, title, and so on. It also contains administrative metadata that helps library staff manage and preserve resources. However, most books themselves don’t count as structured content, because the metadata isn’t applied to their internal structure — for example the sections, chapters, pages, lists, diagrams, and terms they contain. In structured content, metadata is attached to these kinds of subcomponents to indicate either:

  • their type (for example a section could be an intro, a methodology description, or a conclusion), or
  • the real-world entities they refer to (for example the concept or person that a particular page, paragraph, or even an individual phrase describes). For more on this kind of metadata see my post Is Structured Content Missing a Trick.

Because structured content contains extra information, it takes extra planning and work to produce. The tools used to manage it can be costly, but the human changes needed to work with it can be even more expensive (see my post on preparing for such changes in the context of the DITA XML architecture). It’s essential that before starting a structured content implementation, stakeholders in the organization understand the benefits that they can achieve through it, as well as those they can’t.

I believe that the five kinds of benefit I list below are comprehensive (although please do comment if you feel I’ve missed any). Often, a project starts by prioritizing more output platforms or increased efficiency — the first two kinds — and as the implementation becomes more mature, the other three kinds become more important. In any case, it is helpful to understand each category, since even a small-scale project with limited goals can benefit from an eye to the long-term potential and the desirability of keeping options open in the future.

Note that there are various technical approaches to storing structured content, for example as various flavors of XML, or within a relational database. There are also different kinds of systems to create and manage it, including integrated Web content management and delivery systems, dedicated XML component-based systems, and barebones implementations using basic version control. For the purposes of this post, the differences between technical approaches and systems are not important.

1. Deliver content to various platforms and audiences

A common driver for a move to structure is the need for digital content that works well no matter what size screen, operating system, or application it is viewed with. For technical content there may also be legal or contractual requirements to produce a print version, or at least a standalone package of content such as a PDF. And it may be necessary to tailor content for different customers (e.g. branding requirements), audiences (e.g. information for a particular role), and even for different individuals (providing different information according to their needs, experience, interests, and preferences).

By labeling content subcomponents with descriptive metadata according to the information they contain rather than how they should look, different presentational styles can be applied for different viewing contexts. In a responsive web context, where screen width allows, a description can be floated next to a figure it describes, but where the screen is too narrow, the description can fit below the figure. In print or PDF output, rules can be applied to ensure that the description and the figure are always on the same page. And headings, lists, and tables can be formatted appropriately for each viewing context. Where different audiences require a different look and feel, for example customer-specific branding, this is straightforward too.
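As a rough illustration of the paragraph above, the sketch below labels the parts of a small content chunk by what they are (a figure and its description) and leaves the layout decision to a renderer for each viewing context. The field names, the `render_web` and `render_print` functions, and the 600-pixel breakpoint are all assumptions made for this example, not features of any particular system.

```python
# Hypothetical content chunk: parts are labeled by role, not by appearance.
chunk = {
    "type": "figure-with-description",
    "figure": "battery-cover.png",
    "description": "Slide the cover down to remove it.",
}

def render_web(chunk, screen_width_px):
    """Return a layout hint for the web; the 600px breakpoint is an assumed value."""
    layout = "side-by-side" if screen_width_px >= 600 else "stacked"
    return {"layout": layout, "parts": [chunk["figure"], chunk["description"]]}

def render_print(chunk):
    """Return a layout hint for print/PDF: figure and description stay on one page."""
    return {"layout": "keep-together", "parts": [chunk["figure"], chunk["description"]]}
```

The source chunk never changes; only the renderer chosen for the viewing context does.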

HTC User Education content on various platforms

However, an organization may require different versions of a particular piece of content to be displayed on different platforms or in different contexts. For example, if an article has a short description and a long description available, the short description could be displayed in a “mobile” context — narrower screens or mobile apps / browsers — whereas the long description could be used in a desktop or print context. Alternatively, one description could be displayed for viewers arriving at the article via an external search, and another displayed for those coming from another page on the same site.
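A minimal sketch of the short/long description example, assuming an article is stored with both variants and that the delivery layer knows which context it is serving. The context names and the `VARIANT_BY_CONTEXT` mapping are hypothetical; a real mapping would come from the organization's own requirements.

```python
# Hypothetical article with two description variants stored side by side.
article = {
    "title": "Replace the battery",
    "descriptions": {
        "short": "How to swap the battery.",
        "long": "Step-by-step instructions for safely removing the old battery "
                "and fitting a new one.",
    },
}

# Assumed mapping from viewing context to the variant shown there.
VARIANT_BY_CONTEXT = {"mobile": "short", "desktop": "long", "print": "long"}

def description_for(article, context):
    """Pick the description variant for a context, defaulting to the long one."""
    variant = VARIANT_BY_CONTEXT.get(context, "long")
    return article["descriptions"][variant]
```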

At a larger scale, an organization may want to include or omit whole topic areas or types for particular contexts or audiences (for example if it is felt that users who are traveling do not want to view reference material, or if different roles have different tasks). This can be achieved by building a different collection of topics for each context, either dynamically based on metadata associated with the topic chunks, or manually if a more curated approach is required. The principle is the same as described in the paragraph above, but the chunks are bigger. Where the delivery platform has access to data on individuals (for example via CRM integration), dynamic filtering can also be used to tailor information to those individuals’ needs, experiences, and preferences.
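The dynamic approach described above might look something like the following sketch, in which each topic carries a type and an optional set of target roles, and a collection is assembled by filtering on that metadata. The `Topic` class, its fields, and `build_collection` are invented for illustration only.

```python
from dataclasses import dataclass, field

@dataclass
class Topic:
    title: str
    topic_type: str                           # e.g. "task" or "reference"
    roles: set = field(default_factory=set)   # empty set means "for everyone"

topics = [
    Topic("Install the app", "task", {"admin"}),
    Topic("Pair a device", "task"),
    Topic("API reference", "reference", {"developer"}),
]

def build_collection(topics, role, exclude_types=()):
    """Return titles of topics aimed at this role, skipping excluded topic types."""
    return [
        t.title
        for t in topics
        if t.topic_type not in exclude_types and (not t.roles or role in t.roles)
    ]
```

For example, `build_collection(topics, role="admin", exclude_types=("reference",))` keeps the two task topics and drops the reference material.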

While there are well-trodden technical paths to selectively present different content in different viewing contexts, we should not assume that we understand enough about users’ varying needs in those different contexts to make this approach useful or worthwhile. As Karen McGrane writes in Content Strategy for Mobile, to take one example:

If we want objective and accurate data about how people engage with mobile devices, we first need to get all of our content on mobile. Only then will we be able to get real facts and glean meaningful insights about what people want, when they want it, and how they want it presented.

Where requirements for selective / adaptive presentation are very clear, or where they are externally mandated, we can go ahead confidently. Otherwise, we should be cautious and err on the side of letting users filter their own content.

2. Create content more efficiently

The ability to deliver multiple outputs from a single source can be seen as an instant productivity increase compared to the cost of tailoring such outputs using traditional presentation-oriented approaches to content creation. Even where a content team produces only a single output type, automated rules-based publishing of structured content can eliminate effort spent manually formatting outputs, for example tweaking figure layouts, tables, and page breaks. It bears mentioning, however, that the initial outlay and ongoing maintenance of such publishing solutions depend to some extent on the consistency of the source content. If the source structure is controlled and predictable, automation is far easier.

Further savings can be made by working with reusable content components. If the structured content implementation is done in such a way that chunks of content can not only be labelled descriptively but also managed independently from each other, it is no longer necessary to push complete deliverables (publications, or collections of content) through the workflow. This is of course nothing new for teams working primarily with individual pages of web content or articles, but for those producing books or collections of content, it is a significant departure from the traditional way of working. If a publication is being updated, authors, reviewers, and translators can focus on the new or updated chunks of content without having to handle the unchanged parts.
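One possible way to find the new or updated chunks is to compare a fingerprint of each chunk's current text with the fingerprint recorded when it was last approved. This is only a sketch under that assumption; the chunk ids and the `chunks_needing_review` helper are hypothetical, and a real system might rely on its own version history instead.

```python
import hashlib

def fingerprint(text):
    """Return a stable SHA-256 fingerprint of a chunk's text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def chunks_needing_review(approved, current):
    """List ids of chunks that are new or whose text differs from the approved version.

    approved maps chunk id -> fingerprint of the last approved text;
    current maps chunk id -> the text as it stands now.
    """
    return sorted(
        chunk_id
        for chunk_id, text in current.items()
        if approved.get(chunk_id) != fingerprint(text)
    )
```

Only the ids returned need to go to reviewers and translators; unchanged chunks stay out of the workflow.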

3. Improve the technical quality of content

There is no point in publishing to more platforms or producing greater quantities of information if that information is of low quality. Some aspects of information quality that structured content can help with are:

  • accuracy — the information that an organization produces should be factually correct, or reflective of the organization’s viewpoints and approach. By applying review workflow rules to independent chunks or modules, and by reusing approved modules wherever appropriate, maintaining accuracy becomes easier and less error-prone. Another aspect of accuracy is correct terminology. Where external systems store canonical lists of terms (for example a database of software UI strings), these can be integrated into a structured content system so that the correct, up-to-date term is always used.
  • conformance to contractual, regulatory, or other legal requirements — certain information in a certain format may need to be included, for example warnings in technical documentation, or correct use of trademarked terms. Specific review processes may be needed for chunks of content that fall into this category. In a structured environment, metadata can be applied to content chunks to indicate what specific requirements apply to them. Alternatively, a central bank of legal or safety text can be maintained, which authors can draw on whenever needed.
  • extent and balance of subject matter coverage — when metadata describes what content modules are about, it is possible to query an entire collection of modules to find out what it includes and to what extent those subject areas are covered. When data on user needs or common search queries is available, this can be analyzed alongside the subject coverage data to reveal where there are information gaps or where there is duplicated, redundant, or otherwise unnecessary information. (Of course, the subsequent corrective action can help to increase efficiency as well as content quality. There are no hard dividing lines between the various benefits that can be achieved with structured content.)
  • consistent sequence of elements — it can help readers if particular types of content are always found on the same area of a page or in the same navigational arrangement. For example, Wikipedia articles nearly all feature a lead section/intro at the top, and “References” and “External links” sections at the bottom. A structured content environment can strongly encourage the use of standard structures, for example by means of template placeholders, form fields, or the direct control of block elements by means of an XML schema.
The FontoXML web-based XML editor (http://www.fontoxml.com/solutions/fontoxml-editor/)
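Two of the checks in the list above, subject coverage and consistent sequence, can be sketched in a few lines. The module records, subject labels, and search-demand figures below are made-up sample data, and the function names are hypothetical; the point is only that such queries become simple once the metadata exists.

```python
from collections import Counter

# Made-up sample data: each module lists its subjects and its section sequence.
modules = [
    {"id": "m1", "subjects": ["pairing", "bluetooth"], "sections": ["intro", "steps", "references"]},
    {"id": "m2", "subjects": ["pairing"], "sections": ["references", "intro"]},
    {"id": "m3", "subjects": ["wifi"], "sections": ["intro", "steps"]},
]
search_demand = {"pairing": 120, "battery": 300, "wifi": 80}  # assumed query counts

def coverage(modules):
    """Count how many modules cover each subject."""
    return Counter(s for m in modules for s in m["subjects"])

def gaps(modules, demand):
    """Subjects that users search for but no module covers."""
    covered = coverage(modules)
    return sorted(s for s in demand if covered[s] == 0)

def possibly_duplicated(modules):
    """Subjects covered by more than one module (candidates for a closer look)."""
    return sorted(s for s, n in coverage(modules).items() if n > 1)

def follows_sequence(sections, required_order):
    """True if every required section is present and in the required relative order."""
    if any(s not in sections for s in required_order):
        return False
    positions = [sections.index(s) for s in required_order]
    return positions == sorted(positions)
```

Here `gaps(modules, search_demand)` flags "battery" as searched for but uncovered, and module m2 fails `follows_sequence` for the order intro, then references.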

4. Enable readers to discover information

The uses of metadata that we’ve looked at so far are all internal; that is, they enable content producers to customize outputs, manage content better, and improve its quality. These uses are well-established and are often the only ones that an organization considers when starting an implementation. A less-explored use, but one with huge potential, is to surface aspects of the source metadata to improve the experience of end users: the people who read the information.

Some examples of this potential are:

  • taxonomy-based navigation — when metadata describes the subject matter of content modules (e.g. pages or articles), it can be used in a faceted search application to help readers find what they’re looking for. They select categories of interest to browse content, or search for a term first and use the categories to filter the search results.
  • automated content recommendations — taxonomical metadata can be used to generate links to content that’s related to the currently viewed page. This can be very helpful when, as is often the case, users land on a page from an external search result, and it is close to what they are looking for, but not quite there. Suggested links can take them to the exact piece of content that they need. (Automated textual analysis, “topic modeling”, can also be used to enhance related links, though this is not in itself dependent on structured content.) Suggested links can also be tailored to individuals’ needs, experiences, and interests, where data exists on those individuals (for example with CRM integration). The difference from the pre-emptive personalized filtering described above is that with content suggestions, users always have the choice to view them or not. This is generally a better approach unless we are completely confident in our judgements of what information our users need.
  • term definitions and related links — when phrases in content such as names of organizations or scientific/technical terms are labeled with metadata that indicates what they refer to, links to definitions of those terms or related content can be generated automatically. In this demo application built with Mekon DITAweb, each highlighted term is clickable to reveal related definitions and other links.
A customized Web application based on Congility DITAweb
  • exposing metadata in Web content to enhance external searches — we can publish our descriptive metadata directly with our Web content, improving results from external search engines such as Google and allowing any external organization to build applications which connect to the information we publish. This is possible through standard, extendable vocabularies such as Schema.org and formats such as RDFa and JSON-LD.
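To make some of the examples above concrete, the sketch below uses one set of subject tags for three purposes: faceted filtering, related-link suggestions, and a JSON-LD description using the Schema.org vocabulary. The pages, tags, and example.com URLs are invented, the ranking (number of shared tags) is just one simple assumption, and the Schema.org properties used (`headline`, `keywords`, `url` on an `Article`) are only a small subset of what might be published.

```python
import json

# Invented sample pages, each tagged with subject metadata.
pages = [
    {"url": "https://example.com/help/pair-headset", "title": "Pair a headset", "tags": {"bluetooth", "audio"}},
    {"url": "https://example.com/help/bluetooth-off", "title": "Turn Bluetooth off", "tags": {"bluetooth", "battery"}},
    {"url": "https://example.com/help/save-battery", "title": "Save battery", "tags": {"battery"}},
]

def facet_filter(pages, selected_tags):
    """Faceted navigation: keep pages that carry every selected tag."""
    return [p for p in pages if selected_tags <= p["tags"]]

def related(pages, current, limit=3):
    """Suggest other pages that share tags with the current one, most overlap first."""
    scored = [(len(current["tags"] & p["tags"]), p["url"]) for p in pages if p is not current]
    ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: (-s[0], s[1]))
    return [url for _, url in ranked[:limit]]

def to_json_ld(page):
    """Describe a page as a Schema.org Article in JSON-LD."""
    return json.dumps(
        {
            "@context": "https://schema.org",
            "@type": "Article",
            "headline": page["title"],
            "keywords": sorted(page["tags"]),
            "url": page["url"],
        },
        indent=2,
    )
```

The string returned by `to_json_ld` could be embedded in the published page inside a `script` element of type `application/ld+json`, which is the usual way of shipping JSON-LD with Web content.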

5. Give your content a life beyond the system it’s currently held in

The information we create is a valuable asset and can remain valuable for many years. For many organizations, it is important that the information can still be accessed and used even if the system it is managed in changes — for example if a new CMS is adopted or even if the underlying hardware platform changes. The metadata that is stored with structured content allows the content to be exported with the meaning intact, transformed into other structured formats if necessary, and brought into other systems.
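As a small sketch of what exporting with the meaning intact can involve, the example below reads a topic stored as XML and re-expresses it as a plain dictionary that could be serialized to JSON for another system. The element and attribute names are made up for this example (they are not taken from DITA or any other real schema), and `topic_to_dict` is a hypothetical helper.

```python
import json
import xml.etree.ElementTree as ET

# Made-up XML vocabulary, used only for this illustration.
source = """<topic id="t1" type="task">
  <title>Replace the battery</title>
  <step>Remove the cover.</step>
  <step>Swap the battery.</step>
</topic>"""

def topic_to_dict(xml_text):
    """Convert the XML topic into a dictionary, preserving type and step labels."""
    root = ET.fromstring(xml_text)
    return {
        "id": root.get("id"),
        "type": root.get("type"),
        "title": root.findtext("title"),
        "steps": [step.text for step in root.findall("step")],
    }

exported = json.dumps(topic_to_dict(source), indent=2)
```

Because the source labels what each part is, the target format can keep those labels rather than guessing them from formatting.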

This also allows collaborative working between various groups using various tools in the same or different organizations. Note that it is not necessary to store content in a single, interchangeable format for this to work. While it can be cheaper to do so, there may be legitimate business reasons to work with structured content in several different formats and to automatically transform it between the various formats.

In contrast, a hallmark of non-structured content is that it is hard to work with outside of the specific tool with which it is created. While non-structured tools exist that can provide some of the benefits listed above, such as modular content management, it is almost always expensive or time-consuming to get the created content out of the tool and into another one.

What structured content can’t do: substitute for good writing skills

In Does Your Writing Tool Leave Space to Build a Story? I wrote:

All writing, bar the driest reference material or the most avant-garde literature, must tell a story. For example, a market research report (at least one that’s any good) weaves the data into a coherent plot, with an actionable denouement. A help page starts with a goal or problem and finishes with one or more solutions. And email should be the haiku of business communication. A sentence or two to frame the topic; some more for the details; and then an unambiguous statement of the necessary action. But it’s difficult to wrestle a wriggling mass of ideas into a coherent structure.

For sure, the guidance that a structured template provides can help authors to avoid missing important or mandatory information, as can the automated checks on coverage discussed above. But valuable information requires much more than a “paint-by-numbers” approach to writing. It requires writers to exercise imagination and empathy to put themselves in readers’ shoes; to provide sufficient background information and orientation and lead readers through a line of argument or discussion; in other words to avoid the “curse of knowledge”, to use Steven Pinker’s catchy phrase.

With structured content, particularly when dealing with reusable content modules, it is easy to focus on the technical aspects and ignore the reading experience: the need for context and flow. A mature approach to a structured content implementation has a clear idea of the benefits to be achieved, and doesn’t fall into the trap of thinking that information quality will automatically improve without continued investment in learning and development for skilled writers.