Cooking with XML:

From Document Design to Delivery


Presented to Bay Area Publications Managers Forum

by Paul Tyson

Tuesday 2002-02-26 18:30/20:00

© 2002 by Paul Tyson

paul@precisiondocuments.com

  1. Introductory remarks
  2. How to XML
  3. Managing XML

Introductory remarks

You are publications managers and editors who want to know more about how XML can help you produce publications. You've heard the sales talks and the media buzz, but need to see how it really works.

I worked in aircraft technical publications for eighteen years. For the last ten years I have worked with SGML systems for the creation and delivery of technical publications.

Most of the benefits of XML come from improvements in data quality. Higher quality data leads to process and product improvements. Historically, quality in publications is determined by the appearance and usefulness of the final documents. But the visual rendition of a document is derived from electronic data, so it is an inescapable fact that characteristics of the data will affect the appearance and behavior of the documents in some way.

A good introduction to data quality is Data Quality for the Information Age, by Thomas C. Redman.

Document design

What & why of document design

Every document has some identifiable structure. The structure is explicit or implicit. Every document is an instance of a general class of documents (possibly the only instance). In other words, we can always imagine another document similar to the one in front of us that exhibits the same features.

The details of the structure don't matter. You select characteristics to define document structure based on your needs. Only by planning and analysis can you identify the required components of your document structure.

Deliverables of document design

A DTD is a set of declarations, using a specified syntax, that identifies the elements in the document. It also specifies the allowable relationships among elements.
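
As a rough illustration of what "allowable relationships among elements" means, the sketch below checks a document against a tiny content model written as a Python dictionary. This is not DTD syntax and not a real validator; the element names (manual, chapter, title, para) and the function name are hypothetical, and a real DTD also constrains order, repetition, and attributes.

```python
import xml.etree.ElementTree as ET

# Hypothetical content model: element name -> child elements it may contain.
# A real DTD also constrains order, repetition, and attributes.
ALLOWED_CHILDREN = {
    "manual": {"title", "chapter"},
    "chapter": {"title", "para"},
    "title": set(),
    "para": set(),
}

def find_violations(element, path=""):
    """Return messages for elements that break the content model."""
    here = f"{path}/{element.tag}"
    allowed = ALLOWED_CHILDREN.get(element.tag)
    if allowed is None:
        return [f"{here}: undeclared element"]
    problems = []
    for child in element:
        # Report declared elements that appear in the wrong parent.
        if child.tag in ALLOWED_CHILDREN and child.tag not in allowed:
            problems.append(f"{here}: <{child.tag}> not allowed here")
        problems.extend(find_violations(child, here))
    return problems

good = ET.fromstring(
    "<manual><title>T</title><chapter><title>C</title><para>p</para></chapter></manual>"
)
bad = ET.fromstring("<manual><para>stray</para></manual>")

good_report = find_violations(good)
bad_report = find_violations(bad)
```

The first document conforms to the model, so its report is empty; the second places a para directly inside manual, which the model does not allow.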

Document design is much more an exercise in thinking than in writing code. A DTD does look vaguely like computer code--and, in reality, it is--but that's what it takes to make ourselves understood to the machine. Just as importantly, we must make ourselves understood to our coworkers, supervisors, customers, implementors, and successors. For that audience, plain old words and pictures are invaluable.

Document design methodology

The document design process is driven by two imperatives:

  1. Distinguish things that are different.
  2. Connect things that are related.

The things to be distinguished and connected are document components. You should keep a very broad notion of "components" while designing documents, because the things you can describe with structural markup go beyond what you can actually see in a document.

There are three ways to indicate relationships using structural markup. In other words, you can connect things by:

  1. Hierarchy
  2. Sequence
  3. Semantics

Hierarchy is the most definite and robust indication of relationship; sequence is the easiest but weakest; semantics is the most complicated and flexible method.
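
The sketch below shows one way each kind of relationship might appear in a small document. The element and attribute names are hypothetical, and it assumes that "semantics" covers links expressed through attribute values, such as a reference that names the ID of another element.

```python
import xml.etree.ElementTree as ET

# Hypothetical procedure fragment; element and attribute names are illustrative.
DOC = """
<procedure>
  <tool id="t1">Torque wrench</tool>
  <step>Remove the access panel.</step>
  <step>Tighten the bolt with the <toolref ref="t1"/>.</step>
</procedure>
"""

root = ET.fromstring(DOC)

# Hierarchy: the steps belong to the procedure because it contains them.
steps = root.findall("step")

# Sequence: the order of the step elements carries meaning.
first_step = steps[0].text

# Semantics (assumed here to mean attribute-based links): the ref value
# connects the reference to the tool that carries the same id.
tools_by_id = {tool.get("id"): tool.text for tool in root.findall("tool")}
ref = root.find(".//toolref").get("ref")
linked_tool = tools_by_id[ref]
```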

Visual document design

Document creation

Converting non-XML data

Pros
  • Less disruptive to writers
  • Lower software costs
  • Appropriate for large-scale one-time conversion of legacy data
  • Appropriate for database records such as parts lists and tables

Cons
  • Requires highly consistent input
  • Costly to develop automation rules for complex or irregular data
  • For recurring process, must find and fix defects in XML after each batch
  • Does not significantly improve document data quality
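
Regular, record-like data is the easy case for conversion. A minimal sketch, assuming a parts list exported as CSV; the column names, part numbers, and output element names are all hypothetical:

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical parts list export; column names and values are made up.
CSV_DATA = """part_no,description,qty
P-1001,Washer,4
P-1002,Nut,2
"""

def parts_csv_to_xml(text):
    """Build a <partslist> element with one <part> per CSV row."""
    partslist = ET.Element("partslist")
    for row in csv.DictReader(io.StringIO(text)):
        part = ET.SubElement(partslist, "part", number=row["part_no"])
        ET.SubElement(part, "description").text = row["description"]
        ET.SubElement(part, "quantity").text = row["qty"]
    return partslist

parts = parts_csv_to_xml(CSV_DATA)
xml_text = ET.tostring(parts, encoding="unicode")
```

The conversion works only because every row has the same shape. Irregular input would need extra rules for each exception, which is where the development cost listed above comes from.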

Interactively creating XML documents

Selecting an XML editor

Using XML to represent your documents lets you become more document-centric than application-centric. When documents, rather than applications, become the center of your universe, you will have more options and capabilities.

Applying Style

The need for style

Style methods

                              Direct association   Transformation     Embedded formatting
Standard (ISO, MIL, or W3C)   CSS                  DSSSL              XSL-FO
                              FOSI                 XSLT
Nonstandard or proprietary*   Frame+SGML           Balise             MIF
                              WordPerfect+SGML     OmniMark           RTF
                              XyEnterprise XPP     ________________   TeX
                              ________________                        ________________

* Specifications for these languages and formats were not produced by a recognized standards-making organization.

Standard style languages do not produce results any different from those of proprietary or nonstandard ones. They can, however, simplify the process of specifying and applying style. Styles and transformations expressed in a standard language are usually more portable and reusable than others.

Style by direct association

CSS and FOSI

     CSS                        FOSI
1    para {                     <e-i-c gi="para">
2                                <charlist>
3                                 <font
4     font-size: 12pt;                size="12pt"
5     font-weight: normal;            weight="medium">
6     margin-left: 0.5in;         <indent leftind="0.5in">
7     margin-top: 6pt;            <presp nominal="6pt">
8     display: block;             <textbrk startln="1">
9    }                          </e-i-c>

  1. Select this rule for elements named para.
  2. Begin a charlist element (FOSI only).
  3. Begin a font element (FOSI only).
  4. Set the font size to 12 points.
  5. Set the font weight to normal. In FOSI, close the font element begin tag.
  6. Set the start indent to 0.5 inches.
  7. Set the space before this block to 6 points.
  8. Make this a block-like formatting object.
  9. End the style rule.
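
The idea behind direct association can be sketched as a simple lookup: each element name maps to one fixed set of formatting properties. The rule table and function name below are hypothetical and do not represent a CSS or FOSI engine.

```python
import xml.etree.ElementTree as ET

# Hypothetical stylesheet: one fixed set of properties per element name.
STYLE_RULES = {
    "para": {"font-size": "12pt", "font-weight": "normal", "display": "block"},
    "title": {"font-size": "18pt", "font-weight": "bold", "display": "block"},
}

def styled_elements(root):
    """Pair every element, in document order, with the rule for its name."""
    return [(el.tag, STYLE_RULES.get(el.tag, {})) for el in root.iter()]

doc = ET.fromstring("<section><title>Scope</title><para>Text.</para></section>")
result = styled_elements(doc)
```

In this sketch the formatted result always follows the order and nesting of the source document; the lookup can decorate elements, but it cannot rearrange them.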

Limitations of direct association


If you view a formatted document as a tree-like structure, you can solve the problem using a general method of tree transformation.

Style by transformation

DSSSL and XSLT

     DSSSL                          XSLT
1    (element para                  <xsl:template match="para">
2      (make paragraph                <fo:block
3        font-size: 12pt                  font-size="12pt"
4        font-weight: 'normal             font-weight="normal"
5        start-indent: 0.5in              start-indent="0.5in"
6        space-before: 6pt                space-before="6pt">
7        (process-children)             <xsl:apply-templates/>
8      )                              </fo:block>
9    )                              </xsl:template>

  1. Select this rule for elements named para.
  2. Create a block formatting object.
  3. Set the font size to 12 points.
  4. Set the font weight to normal.
  5. Set the starting indent to 0.5 inches.
  6. Set the space before this block to 6 points. Close the fo:block begin tag.
  7. Process the children of the current element. Formatting objects produced by processing the children can inherit certain characteristics assigned by this transformation rule.
  8. Close the block formatting object.
  9. Close this node transformation rule.

The key to the power of this method is shown on line 7 of this example. This gives the stylesheet author full control over processing at each node in the source document tree. Normally, you would process the children of the current element. But you can choose instead to process selected children, or you can process any other element in the document in any way you want. You can process nodes based on the content or structure of the source document.

It is hard to overstate the power of this technique. There are few document processing problems that cannot be solved with a tree transformation specification.
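
To make the control flow concrete, here is a minimal sketch of tree transformation in Python rather than DSSSL or XSLT. The function names, the template table, and the source element names are hypothetical; process_children plays the role of (process-children) or xsl:apply-templates, and the section rule shows a template choosing which children to process and in what order.

```python
import xml.etree.ElementTree as ET

FO_NS = "http://www.w3.org/1999/XSL/Format"
ET.register_namespace("fo", FO_NS)

def process_children(source, out_parent, templates):
    """Apply the matching template to each child, in source order."""
    for child in source:
        transform(child, out_parent, templates)

def transform(source, out_parent, templates):
    """Dispatch one source node to its template, or fall through to its children."""
    template = templates.get(source.tag)
    if template is None:
        process_children(source, out_parent, templates)
    else:
        template(source, out_parent, templates)

def para_template(source, out_parent, templates):
    # Create a block, then let the children be processed inside it.
    # Text that follows a child element (its tail) is ignored in this sketch.
    block = ET.SubElement(out_parent, f"{{{FO_NS}}}block",
                          {"font-size": "12pt", "space-before": "6pt"})
    block.text = source.text
    process_children(source, block, templates)

def title_template(source, out_parent, templates):
    block = ET.SubElement(out_parent, f"{{{FO_NS}}}block", {"font-weight": "bold"})
    block.text = source.text

def section_template(source, out_parent, templates):
    # The rule decides what to process: titles first, then paragraphs,
    # regardless of where they appear in the source.
    for selected in source.findall("title") + source.findall("para"):
        transform(selected, out_parent, templates)

TEMPLATES = {
    "section": section_template,
    "title": title_template,
    "para": para_template,
}

source = ET.fromstring("<section><para>Body text.</para><title>Scope</title></section>")
output = ET.Element(f"{{{FO_NS}}}flow")  # bare container, not a complete FO document
transform(source, output, TEMPLATES)
block_texts = [block.text for block in output]
```

Although the title follows the paragraph in the source, the section rule emits it first, which a pure element-to-style lookup cannot do.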

Embedded formatting specifications

XSL-FO

     XSL-FO
1    <fo:block
2        font-size="12pt"
3        font-weight="normal"
4        start-indent="0.5in"
5        space-before="6pt">
6      [element content]
7    </fo:block>

  1. Create a block formatting object.
  2. Set the font size to 12 points.
  3. Set the font weight to normal.
  4. Set the starting indent to 0.5 inches.
  5. Set the space before this block to 6 points. Close the fo:block begin tag.
  6. The content of this element will be formatted according to the specified characteristics (including any inherited characteristics).
  7. Close the block formatting object.

Old SGML hands learned to rigorously separate logical structure from formatting information. Now we have an entire document type devoted to expressing formatting information!

But an even more important principle of SGML is that you can design a document type to describe anything you want--including formatting specifications. Even if XSL-FO doesn't include all the formatting characteristics you need, you can easily extend it.

Delivery options

XML + CSS in a browser

XML+CSS
Advantages
  • Simple--easy to develop and implement
  • Flexible: use in standalone, file-based system or server-controlled system

Disadvantages
  • Simple--doesn't support complex formatting requirements or dynamic processing
  • Not all browsers consistently support XML and CSS

XML + XSLT to HTML

XML+XSLT=HTML
Advantages
  • Use built-in HTML semantics
  • Moderately easy to develop and implement
  • Full transformation capabilities of XSLT

Disadvantages
  • Limited by HTML
  • Not all browsers consistently support HTML and CSS
  • Server-based or other external process must be used for just-in-time transformations

XML + XSLT to multiple outputs

XML+XSLT=nonXML
Advantages
  • Single-source documents and style information
  • Flexible--works with any production or delivery method
  • Easy to maintain, modify, and troubleshoot

Disadvantages
  • Requires higher investments in document design, data creation, and transformation specifications
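
A small sketch of the single-source idea, with Python functions standing in for XSLT stylesheets. The source document, element names, and function names are hypothetical; the point is only that one XML source can feed more than one output format.

```python
import xml.etree.ElementTree as ET

# Hypothetical single-source document.
SOURCE = "<section><title>Scope</title><para>This manual covers removal.</para></section>"

def to_html(section):
    """Render the section as an HTML fragment."""
    div = ET.Element("div")
    ET.SubElement(div, "h1").text = section.findtext("title")
    for para in section.findall("para"):
        ET.SubElement(div, "p").text = para.text
    return ET.tostring(div, encoding="unicode")

def to_plain_text(section):
    """Render the same section as plain text, a non-XML output."""
    title = section.findtext("title")
    lines = [title, "=" * len(title), ""]
    lines.extend(para.text for para in section.findall("para"))
    return "\n".join(lines)

section = ET.fromstring(SOURCE)
html_output = to_html(section)
text_output = to_plain_text(section)
```

A correction to the source reaches both outputs on the next run, because neither output is edited by hand.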

Managing XML

Required skill sets

Deciding on architecture

Cost of implementation

See $GML: The Billion Dollar Secret, by Chet Ensign, for case studies of successful SGML implementations.

Thank you!

Paul Tyson

paul@precisiondocuments.com

408-375-8851


Precision Documents

www.precisiondocuments.com