Since the nineties, bold promises have been made about XML and what it could make possible in the realm of content reuse and automated publishing. In some areas such as tech pubs this has happened, but when it comes to mainstream content creation by business users and knowledge workers, XML is no more prevalent than it was 20 years ago.

Hopefully the term Smart Content causes you to think, “Something is different here. I know what content is, but what is Smart Content?” That’s our aim because it’s time to think differently about how we create content.

What is Smart Content?

At a 30,000-foot view, Smart Content is Quark’s name for the next generation of XML-driven authoring and automated publishing of high-value communications. It reflects the fact that, while XML may be critical as an underlying technology, XML must be relegated to the background when it comes to content creation and reuse.

At a very granular level, Smart Content is Quark’s open, customer-configurable XML-based content schema. Quark’s enterprise products are focused on understanding and implementing Smart Content so that customers see a return on the investment of adopting dynamic publishing as quickly as possible.

Smart Content is best deployed when the content type has characteristics that include one or more of the following:

  • High volume of similar documents
  • High volume of revisions
  • Frequently repeated creation processes
  • Government or corporate regulated documents
  • High possibility to reuse content across multiple documents
  • Integration of data into the content
  • Translated to multiple languages
  • Delivered in multiple formats
  • Delivered with multiple different presentation styles

Since the core of Smart Content is an XML schema definition, it’s almost impossible to describe Smart Content without getting into the technical details.

Who Else Is Using XML for Document Production?

There are many XML schemas for authoring and publishing in the marketplace. Some are very generic and some are industry specific. Interestingly, even HTML has an XML form: XHTML is a reformulation of HTML4 as an XML application. Other popular XML schemas include:

DITA – Darwin Information Typing Architecture
One of the most popular XML schemas for technical document authoring and publishing. It was originally developed at IBM and moved to OASIS as an industry standard for technical publications. More on DITA later.

DocBook
A precursor to DITA that is used heavily in technical publications and reference books.

Pubs-XML
Used in the United States, Australia, and other partner countries for capturing and sharing intelligence research at the Department of Homeland Security and across nearly every department of the US government. Supported for authoring through Quark Pubs-XML Accelerator.

SPL – Structured Product Labeling
Used in the United States for submitting drug labeling information to the FDA for approval prior to releasing a new drug or packaging to market. Supported for authoring through the SPL Accelerator for Quark XML Author.

And there are many more. Some companies even define their own custom schema from scratch, which is a lot of work and is difficult and expensive to do well.

If there are many XML document schemas available, why did Quark create a new Smart Content schema?

What’s Wrong With XML?

XML for document production was first adopted by the technical publications industry. It is heavily used in computer software and hardware documentation, complex discrete manufacturing, and some process manufacturing, where the content is ultimately published as print and PDF, HTML, and several Help system formats such as HTMLHelp, MSHelp, EclipseHelp, and WebHelp, as well as other output types. The most widely used document XML schemas were created by and for the technical publications industry, including the very popular DITA schema.

The result is that these schemas are extremely powerful tools, but are also extremely complex. To steal a quote from a Quark professional services partner, “DITA is great if your authors can think like programmers.” That’s perfect for technical authors who are, by nature of their jobs, highly technical and well trained. They are also full-time authors.


But for high-value communications, for example, documents written by financial and legal analysts or product marketing teams, it is unreasonable to think that these part-time authors can or want to “think like programmers.”

What makes these authoring schemas hard? They are often overly restrictive. At Quark, many of our early adopters who used one of these schemas complained that the simple task of cutting and pasting content from one area of a document to another was blocked by the application. Why was it blocked? Take the following simple example of a title and a paragraph (we’re showing the XML tags, but remember that most XML authoring tools try to hide the tags):

<title>How to Make</title>
<para>Begin with the ingredients from the <keyword>Thanksgiving Recipe</keyword>.</para>

If the user selects and copies the phrase ‘the <keyword>Thanksgiving Recipe</keyword>.’ and pastes it after “Make” in the <title>, the authoring tool might block the paste because the controlling schema doesn’t allow <keyword> inside a <title> element. That’s frustrating and, worse, the reason for the failed paste is often hidden from the user: they can’t figure out why it’s blocked, so they think the tool is broken.
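To make the blocked paste concrete, here is a minimal Python sketch of the kind of check an authoring tool might run before accepting a paste. The ALLOWED_CHILDREN table and the can_paste function are assumptions made up for this illustration; a real schema expresses its content model in a DTD, XSD, or similar, and no particular tool is known to work this way.

```python
import xml.etree.ElementTree as ET

# Hypothetical content model: which child elements each parent may contain.
# A real schema would express far richer rules than this lookup table.
ALLOWED_CHILDREN = {
    "title": set(),        # text only, no inline elements
    "para": {"keyword"},   # paragraphs may contain keywords
}


def can_paste(target, fragment):
    """Return (allowed, reason) for pasting `fragment` inside `target`."""
    allowed = ALLOWED_CHILDREN.get(target.tag, set())
    if fragment.tag in allowed:
        return True, "ok"
    return False, f"<{fragment.tag}> is not allowed inside <{target.tag}>"


title = ET.fromstring("<title>How to Make</title>")
para = ET.fromstring(
    "<para>Begin with the ingredients from the "
    "<keyword>Thanksgiving Recipe</keyword>.</para>"
)
keyword = para.find("keyword")

title_result = can_paste(title, keyword)  # the paste described above
para_result = can_paste(para, keyword)    # the same fragment in a paragraph
```

Returning a reason alongside the yes/no answer is the point of the sketch: a tool that surfaces the reason to the author avoids the "the tool is broken" reaction.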

Of course, a trained, full-time technical author would have a good idea what happened, would turn on “show tags” in their tool of choice, and would select only the text they wanted, skipping the keyword tag. This is a simple example, but many similar use cases exist. It’s a problem the Quark team refers to as “gross-edits,” and it is a significant issue when it blocks a business user from authoring with the ease they are used to.

This example highlights one of the major challenges for any XML authoring tool vendor, and especially for Quark, which is targeting non-technical authors. The challenge is one of trying to impose rules and restrictions on users who have years of experience using free-form tools. Additionally, creating a user experience that manages and exposes those rules and restrictions to the user – without making the tool overly complex – is extremely difficult. That’s why the user experience of so many XML authoring software products is more similar to a programmer’s integrated development environment than to a word processing tool.

Remember, the reason this challenge is worth tackling is the value of applying automation to the high-value, multi-channel communications process. Generally, the automation value proposition is relatively simple to describe:

  • Automation lowers costs, improves quality, and shortens time-to-market.
  • For automation to succeed, it requires that the inputs are valid and expected: “Garbage in, Garbage out,” as the saying goes.

So, for Publishing Automation to succeed, the input – which is the authored, narrative content – must be expected and validated. That’s where XML is powerful, because it is easy to validate and forces authors to only create what is expected. But it is also where XML authoring tools cause the most problems, because they only allow what is expected and content which can be validated.


Business users, part-time authors, and subject matter experts who have used a free-form word processing tool such as Microsoft Word or Google Docs for their entire careers have expectations about how fast they can write and how much freedom (often total freedom) they have in how they write their documents. Switching these types of authors to a controlled, “structured” content authoring tool that limits what they can do presents a significant challenge to the authors. The more prescriptive and restrictive the XML schema is, the bigger the gap between the author’s expectations and their experience with authoring XML. Resolving that challenge is what led Quark to develop the Smart Content schema.

Smart Content Schema Details

For the XML savvy, the Smart Content schema borrows ideas from many other XML implementations including, importantly, the idea of content types – sometimes called content classes or information architectural forms. The core idea is relatively simple: there is a set of fundamental types of content, and all other content can be described as belonging to one of these root classes. For those familiar with DITA, another way to describe this would be “specialization” of one of those root classes. The concept of root classes and class hierarchies is common in computer programming, biology, physics, mathematics and more.

The value of root classes and class hierarchies is that a system that knows how to process the root element can provide basic processing of any specialization of that root without previously knowing anything about the specific specialization.

This is less complicated than you might think. As a simple example, if the system knows that all <para> elements should be presented with a blank line above and a blank line below, then when the system processes content that includes <para type="blockquote"> it will at least get right that a Block Quote should have a blank line above and below. There are many other processing rules, presentation rules and user interactions that can be applied to all content of similar types. The “specialization” is created because a system could also add new and unique processing, such as right and left indents for presenting a Block Quote.
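The fallback idea can be sketched in a few lines of Python. Everything named here (ROOT_RULES, TYPE_RULES, render, and the "pullquote" type) is a hypothetical stand-in chosen for illustration, not Quark’s implementation: rules are looked up by element name first, and any rules registered for a specific type value are layered on top.

```python
import xml.etree.ElementTree as ET

# Rules keyed by root element name: every <para> gets a blank line
# above and below, whatever its type attribute says.
ROOT_RULES = {"para": {"blank_above": True, "blank_below": True, "indent": 0}}

# Extra rules keyed by (element, type): the "specialization".
TYPE_RULES = {("para", "blockquote"): {"indent": 4}}


def rules_for(elem):
    """Start from the root-class rules, then layer on type-specific ones."""
    rules = dict(ROOT_RULES.get(elem.tag, {}))
    rules.update(TYPE_RULES.get((elem.tag, elem.get("type")), {}))
    return rules


def render(elem):
    """Render an element as plain text using whatever rules apply."""
    rules = rules_for(elem)
    text = " " * rules.get("indent", 0) + "".join(elem.itertext()).strip()
    lines = []
    if rules.get("blank_above"):
        lines.append("")
    lines.append(text)
    if rules.get("blank_below"):
        lines.append("")
    return "\n".join(lines)


known = render(ET.fromstring('<para type="blockquote">Quoted text</para>'))
unknown = render(ET.fromstring('<para type="pullquote">Pulled text</para>'))
```

The second call uses a type the system has never seen, yet it still gets the blank lines that every paragraph gets; only the indent is missing.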

What are some of these root classes? Smart Content represents these in different categories, and here is a table that compares some of the terminology that Smart Content, HTML and DITA use:

Content Type | Smart Content     | HTML                      | DITA
In-lines     | tag               | b, i, u, etc.             | phrase
Lists        | ul, ol            | ul, ol                    | list type="type"
Media        | Media             | video, object             | object
Metadata     | XML meta fragment | tag attribute = "value"   | tag attribute = "value"

How specialization of these root content types is handled in each markup language is one of the important differences:

In HTML, specialization of a root HTML tag is usually done to drive the CSS formatting or to trigger tag-specific JavaScript, and it is most often encoded using a ‘class’ attribute such as:

<div class="Navigation">…</div>

However, in HTML, there are very few rules about where and how you can use <div> and there are no rules on the value of the “class” attribute, so HTML is actually very freeform and not useful for high-value communications content authoring – though it is great for presentation in a web page or mobile application.

In DITA, specialization of a root DITA element such as <topic> is encoded like this:

<concept class="- topic/topic concept/concept">…</concept>

It’s beyond the scope of this document to explain why the class attribute has such an apparently redundant value, but it’s easy to identify the goal, which is that the element “concept” is of the class “topic” and therefore should be treated as a topic except where specific processing for concept has been defined.
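As a rough illustration of that goal, the Python sketch below reads a DITA-style class value and walks from the most specific element name back to the most general until it finds one it knows how to process. The HANDLERS table and both function names are assumptions made up for this example, and the parsing is deliberately simplified compared with what a real DITA processor does.

```python
# Hypothetical table of processing the system already knows about.
# Note there is no entry for "concept".
HANDLERS = {"topic": "process as a generic topic"}


def class_chain(class_attr):
    """Return element names from most specific to most general.

    A value such as "- topic/topic concept/concept" starts with a marker
    token, followed by module/element pairs ordered general to specific.
    """
    tokens = class_attr.split()[1:]  # drop the leading marker token
    names = [token.split("/", 1)[1] for token in tokens]
    return list(reversed(names))


def pick_handler(class_attr):
    """Use the most specific handler available, falling back up the chain."""
    for name in class_chain(class_attr):
        if name in HANDLERS:
            return HANDLERS[name]
    return None


chain = class_chain("- topic/topic concept/concept")
handler = pick_handler("- topic/topic concept/concept")
```

Because no handler is registered for "concept", the lookup falls back to the one for "topic", which is exactly the "treat it as a topic unless told otherwise" behavior described above.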

In Smart Content, specialization is encoded like this:

<section type="purpose">

This is very similar to the HTML method for specialization, but it has very specific implementation rules. For example, the schema for a Standard Operating Procedure document can limit each document to one and only one “Purpose” and require that the Purpose come after the title of the document. HTML doesn’t limit the use of class attributes or even validate their values.
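A rule like the Standard Operating Procedure one can be checked mechanically. The Python sketch below is only an illustration under stated assumptions: the <document> root element, the "scope" type, and the check_purpose function are invented for the example, and the actual Smart Content schema may express the rule quite differently.

```python
import xml.etree.ElementTree as ET


def check_purpose(doc):
    """Return a list of problems with the document's Purpose section."""
    children = list(doc)
    purposes = [
        index for index, child in enumerate(children)
        if child.tag == "section" and child.get("type") == "purpose"
    ]
    titles = [
        index for index, child in enumerate(children) if child.tag == "title"
    ]
    problems = []
    if len(purposes) != 1:
        problems.append(
            f"expected exactly one purpose section, found {len(purposes)}"
        )
    if purposes and (not titles or purposes[0] < titles[0]):
        problems.append("purpose section must come after the title")
    return problems


good = ET.fromstring(
    '<document><title>SOP</title>'
    '<section type="purpose"/><section type="scope"/></document>'
)
doubled = ET.fromstring(
    '<document><title>SOP</title>'
    '<section type="purpose"/><section type="purpose"/></document>'
)
misplaced = ET.fromstring(
    '<document><section type="purpose"/><title>SOP</title></document>'
)

good_problems = check_purpose(good)
doubled_problems = check_purpose(doubled)
misplaced_problems = check_purpose(misplaced)
```

Note that the check keys on the type attribute value, which is the part HTML leaves unconstrained.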

It’s worth highlighting that in HTML and Smart Content, the element name is always the root of the class. It is:

<section type="mySection">; it is not <mySection class="section">.

DITA users and other XML experts might ask, “Why not use the DITA method for defining specializations?” The full answer is complex, but the simple answer is directly related to the difficulties described earlier in providing good authoring usability including support for gross-edits by cut and paste across one or more documents.

Nearly all available XML parsing tools validate the structure of a document based on element names (a valid structure means that all the elements used are allowed by the schema and appear in a valid order), and attribute values play no part in that structural check. By using the HTML style of element specialization, Smart Content can enable gross-edits with a positive user experience. The user can cut and paste an element and, after the paste, added processing can either silently correct the type attribute or, if there is more than one choice that could be made, provide the author with a user experience that allows them to make a valid type choice.
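The paste-then-correct behavior might look something like the following Python sketch. It is a guess at the general shape, not a description of how Quark’s products work: the VALID_TYPES table, the element and type names in it, and the paste function with its choose callback are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical table: the type values allowed for a child element in a
# given parent. Element names alone already make the paste structurally valid.
VALID_TYPES = {
    ("document", "section"): ["purpose", "scope", "procedure"],
    ("appendix", "section"): ["reference"],
}


def paste(parent, fragment, choose=None):
    """Append `fragment` to `parent`, fixing its type attribute if needed.

    `choose` stands in for the user interface: it receives the list of
    valid types and returns the one the author picked.
    """
    valid = VALID_TYPES.get((parent.tag, fragment.tag), [])
    if valid and fragment.get("type") not in valid:
        if len(valid) == 1:
            fragment.set("type", valid[0])       # silent correction
        elif choose is not None:
            fragment.set("type", choose(valid))  # ask the author
        else:
            raise ValueError("author must choose a valid type")
    parent.append(fragment)
    return fragment.get("type")


appendix = ET.fromstring("<appendix/>")
document = ET.fromstring("<document/>")

# Only one valid type in an appendix, so the correction is silent.
silent = paste(appendix, ET.fromstring('<section type="purpose"/>'))

# Several valid types in a document, so the author is asked to pick one.
chosen = paste(
    document,
    ET.fromstring('<section type="reference"/>'),
    choose=lambda options: options[0],
)
```

In both cases the paste itself succeeds; only the type value is adjusted afterward, which is what keeps the author from hitting a blocked edit.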

While there are many other reasons for how the Smart Content schema is architected, this ability to fall back to processing based on the root class is one of the biggest and most valuable.

Even though the Smart Content Schema is relatively new in XML schema terms, its development has been grounded in years of XML, content authoring and publishing expertise by Quark and our customers and partners. The schema is being used by a number of customers in different industries with great success. We welcome feedback on the schema and plan in the future to make the specifications widely available for other companies to use.

For the full details and background on Smart Content, why not read our Beginner’s Guide to Smart Content?

To find out more about implementing a Smart Content solution, check out Quark Author. Quark Author is the Web-based content creation software that, together with Quark Publishing Platform, offers subject matter experts and non-technical writers an intuitive online authoring experience for rapidly creating, previewing, publishing and reusing content.