IBM : developerWorks : XML education

Part 1: Transforming XML into HTML

Doug Tidwell
Cyber Evangelist, developerWorks
Published: November 1999
Updated: November 2000

Contents:
 Our contestants
 Process
 Technology
 Common
 XML into HTML
 Summary
 Resources
 About the author
 Appendix A
 Appendix B
 Appendix C
 Part 2: XML to SVG
 Part 3: XML to PDF

Part 1 of this Transforming XML documents tutorial shows you how to transform XML documents into HTML.

Let’s meet our contestants
For our transformations, we’ll use six source documents: a Shakespearean sonnet, a business letter, definitions for several technical terms, some spreadsheet data, a section from a technical manual, and a short section from Henry Fielding’s 18th-century British novel Tom Jones. This will give us a wide range of document types to transform.

As we discuss our different transformations, some of these documents will be more relevant than others. For example, converting spreadsheet data into a pie chart would be useful to many people; converting a sonnet into a pie chart, while an intriguing exercise, probably has a narrower appeal. We’ll present a simple rendering of all our documents here. If you’d like to see the XML source for the documents and their DTDs, check out Appendix A – Sample XML documents.

A Shakespearean sonnet
This document is rendered line-by-line, with an indication of the rhyme scheme at the start of each line.

A business letter
This is a block-style business letter.

A small glossary
Here’s a short glossary with a few technical terms:

Sales figures
Here is our sales figures document, rendered as a table:

An excerpt from a technical manual
Here’s a short section of a technical manual:

An excerpt from Tom Jones
Here’s a chapter from the novel:

The transformation process
When we first started working with transformations, we assumed the process would work like this:

[Our thanks to David Epstein of IBM’s T.J. Watson Research Center for this schematic.]

In reality, though, a couple of factors break up this neat model:

With these things in mind, we present these revised diagrams, any or all of which may describe a development effort:

As you begin your own transformations, we hope your experience resembles our original, idealized process as much as possible.

Technologies for transformations
There are two basic technologies we’ll use for transforming documents. The first is XSL Transformations, better known as XSLT (or often simply XSL). The second is Java code that uses the methods of the Document Object Model (DOM). Most of the time, it’s simpler to write an XSL stylesheet, but there may be times when a stylesheet can’t do what you want. In general, any time you’re transforming a document from one XML vocabulary to another, XSL is probably the best way to go.

On the other hand, if you’re transforming an XML document into something that isn’t a markup language, you’ll almost certainly want to write Java code instead. Writing Java code is more difficult, but it gives you complete control over the transformation. In our examples, we’ll use the simplest technology, whatever that happens to be.

XSL stylesheets
We’ll do many of our transformations with XSL stylesheets. An XSL stylesheet contains some number of templates, each of which describes how to transform a given element in the source document. In this tutorial, we’ll discuss the XSL templates we use in our sample stylesheets. If you’d like a more complete introduction to XSL, we highly recommend Ken Holman’s tutorial, Practical transformations using XSLT and XPath. (See Resources.)

Appendix B contains sample stylesheets for HTML transformations used in this tutorial.
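To experiment with the stylesheets in this tutorial from Java, one option is the JAXP transformation API (javax.xml.transform) that ships with current JDKs. This is our choice of processor, not necessarily the one the tutorial was written against. The sketch below is our own illustration rather than part of the appendixes: XslRunner and TITLE_XSL are hypothetical names, and the one-template stylesheet is a made-up example that uses <xsl:value-of> to pull out a sonnet’s title.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Illustrative helper: applies an XSLT stylesheet to an XML document,
// both supplied as strings, using the JAXP API bundled with the JDK.
class XslRunner {
    // A tiny example stylesheet that extracts the <title> of a <sonnet>.
    static final String TITLE_XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/'>"
        + "<xsl:text>Title: </xsl:text>"
        + "<xsl:value-of select='sonnet/title'/>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    static String transform(String stylesheet, String xml) {
        try {
            // Compile the stylesheet into a reusable Transformer.
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(stylesheet)));
            // Run it against the source document and collect the output as a string.
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<sonnet><title>Sonnet 130</title></sonnet>";
        System.out.println(transform(TITLE_XSL, xml)); // Title: Sonnet 130
    }
}
```

The same transform method works for any of the stylesheets in Appendix B if you read the stylesheet and source document from files instead of string literals.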

Java and the Document Object Model (DOM)
If you want or need complete control over your transformations, you can write Java code that uses the methods and interfaces of the DOM. The good news is that you can do anything you want; the bad news is that you have to do everything yourself. You can find more information about this kind of programming in our tutorial, XML Programming in Java.

Appendix C contains sample Java code to transform XML documents.

Common transformations
Before we get into the gory details of transforming our documents, let’s talk about what’s involved in doing the transformations. In general, we’ll start by looking at what information we have in the source document, and considering how we want to display that information in the new format. For our examples here, we’ll determine the output format ourselves; in the real world, the output format will often be imposed on us.

There are a number of common tasks involved in transforming an XML document. We’ll discuss those in general here, with more complete examples in our sample transformations.

Moving text
Very often you’ll need to change the order in which elements appear in your output document. For example, an XML document with names and addresses might be displayed last name first in one view and first name first in another.

Moving text with XSL
To move items from one place to another with XSL, you can use <xsl:value-of> elements to insert the text of an element wherever you want. Here are excerpts from two example templates: Example 1 displays the last name first, and Example 2 displays the first name first.

<!-- Example 1: last name first -->
<xsl:text>Name: </xsl:text>
<xsl:value-of select="last-name"/>
<xsl:text>, </xsl:text>
<xsl:value-of select="first-name"/>

<!-- Example 2: first name first -->
<xsl:text>Name: </xsl:text>
<xsl:value-of select="first-name"/>
<xsl:text> </xsl:text>
<xsl:value-of select="last-name"/>

Moving text with Java and the DOM
You can use Java and the DOM to change the order in which elements appear. Because you’re writing the code, you can search for, find, and manipulate any part of the document you want.
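As a rough sketch of the DOM approach, the following Java reproduces the two XSL examples above by reading the <first-name> and <last-name> elements and concatenating them in either order. The <person> wrapper element and the class and method names are assumptions made for illustration, and the code assumes both child elements are present.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Sketch: the DOM counterpart of the two XSL name templates.
class NameOrder {
    // Parses an XML string and returns its document element.
    static Element parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Text of the first descendant element with the given name, trimmed.
    // Assumes the element exists (item(0) would be null otherwise).
    static String textOf(Element parent, String tagName) {
        return parent.getElementsByTagName(tagName).item(0).getTextContent().trim();
    }

    // Mirrors Example 1: last name first.
    static String lastNameFirst(Element person) {
        return "Name: " + textOf(person, "last-name") + ", " + textOf(person, "first-name");
    }

    // Mirrors Example 2: first name first.
    static String firstNameFirst(Element person) {
        return "Name: " + textOf(person, "first-name") + " " + textOf(person, "last-name");
    }

    public static void main(String[] args) {
        Element person = parse(
            "<person><first-name>William</first-name><last-name>Shakespeare</last-name></person>");
        System.out.println(lastNameFirst(person));  // Name: Shakespeare, William
        System.out.println(firstNameFirst(person)); // Name: William Shakespeare
    }
}
```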

Sorting
Another common task is sorting elements. If you have an XML document that contains a number of data items, you could generate different views of that document by sorting on different elements. For example, an XML document containing names and addresses could be sorted by last name, city, state, or zip code.

Sorting with XSL
Sorting text with XSL is simple. Here’s part of an XSL template that sorts all of the <zip> elements within an XML document:

<xsl:for-each select="//zip">
   <xsl:sort order="ascending" select="."/>
   <xsl:value-of select="."/>
</xsl:for-each>

Sorting with Java and the DOM
Sorting data with Java is just like any other sorting task: You have to get the data (extracting it from the various elements in the DOM tree), sort the data, then output the data in sorted order. You can find a complete example of this in our XML Programming in Java tutorial. Sorting elements with Java code is a lot more work than sorting with XSL.
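The three steps described above (extract, sort, output) might look like this in Java. This is a minimal sketch, not the example from the XML Programming in Java tutorial; the <addresses> and <address> elements are assumed, and Collections.sort uses plain string order, whereas <xsl:sort> may apply a language-sensitive collation.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Sketch: sort the values of every <zip> element in a document.
class ZipSorter {
    static List<String> sortedZips(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        // Step 1: extract the text of every <zip>, wherever it appears (like //zip).
        NodeList zips = doc.getElementsByTagName("zip");
        List<String> values = new ArrayList<>();
        for (int i = 0; i < zips.getLength(); i++) {
            values.add(zips.item(i).getTextContent().trim());
        }
        // Step 2: sort in ascending string order.
        Collections.sort(values);
        // Step 3: the caller outputs the values in this order.
        return values;
    }

    public static void main(String[] args) {
        String xml = "<addresses>"
            + "<address><zip>30602</zip></address>"
            + "<address><zip>10001</zip></address>"
            + "<address><zip>27709</zip></address>"
            + "</addresses>";
        for (String zip : sortedZips(xml)) {
            System.out.println(zip);
        }
    }
}
```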

Generating text
There are many times when you want to insert text into your output document. For example, you might want to insert the text “Total: ” before you display an actual total. We’ll assume for the sake of this example that the word “Total” doesn’t appear in the input document, so you can’t just extract it from an element.

Generating text with XSL
If you want to use XSLT to insert text into your output stream, simply use an <xsl:text> element in your template. Here’s an element that inserts the text “Total: ” into the output document:

<xsl:text>Total: </xsl:text>

It’s as simple as that!

Generating text with Java and the DOM
If you want to insert text into a DOM tree, you’ll need to create a Text node and insert it into the tree. Here’s some sample code:

x = 7;

[It’s as lame as a three-legged racehorse, but I love that joke.] Now here’s some relevant sample code:

// doc is an existing org.w3c.dom.Document
Element containingElement = doc.createElement("xyz");
containingElement.appendChild(doc.createTextNode("Total: "));

This sample creates an <xyz> element and adds a text node to it; you would still need to append the <xyz> element itself somewhere in the tree. Again, our XML Programming in Java tutorial has complete details on building DOM trees and inserting things into them.
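For context, here is one way the sample above could fit into a complete program: it creates a new Document, attaches <xyz> to the tree, adds the generated text, and serializes the result with an identity Transformer from the JDK’s javax.xml.transform API. The amount parameter and the class name are our own additions for illustration.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Sketch: build <xyz>Total: ...</xyz> in a new DOM tree and serialize it.
class TextNodeDemo {
    static String buildTotal(String amount) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
            // Create the element and attach it to the tree as the document element.
            Element containingElement = doc.createElement("xyz");
            doc.appendChild(containingElement);
            // "Total: " is generated text; it is not copied from any input element.
            containingElement.appendChild(doc.createTextNode("Total: "));
            containingElement.appendChild(doc.createTextNode(amount));
            // An identity Transformer writes the DOM tree out as markup.
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            identity.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | TransformerException e) {
            throw new IllegalStateException("Could not build the document", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(buildTotal("42")); // <xyz>Total: 42</xyz>
    }
}
```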

Numbering items
Many documents, particularly technical publications and legal documents, number certain items. You could create numbering like this for headings and subheadings, for example:

1. Quite a big heading

1.1. A less large heading

1.2. The penultimate heading

1.2.1. Most likely not a significant heading

Numbering items with XSL
To number items with XSL, use the <xsl:number> element. This allows you to number things in a variety of ways, and handles multiple levels of numbering if you wish.

<xsl:for-each select="//section">
   <h2>
     <xsl:number level="multiple" count="section" format="1.1.1. " />
     <xsl:value-of select="./title"/>
   </h2>
   <xsl:apply-templates select="./para"/>
</xsl:for-each>

In this example, the numbers are automatically generated by the stylesheet, with the level="multiple" attribute specifying that the numbers be calculated for multiple levels. Each <section> element is numbered automatically. If you add or delete <section> elements in the source document, the stylesheet automatically renumbers everything for you.

Numbering items with Java and the DOM
To number items using Java, you’ll have to do all the numbering yourself. In our current example, you would probably use a recursive algorithm so that you could handle any level of nested headings and subheadings. Each call would print the current heading, then process all of the current heading’s children, then move on to the next sibling of the current heading. As with most of our examples here, the stylesheet is much simpler than the Java code.
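A sketch of that recursive algorithm might look like this, assuming each <section> has a <title> child and that sections nest directly inside one another (the structure used in this tutorial’s technical manual). The class and method names and the <manual> root element are hypothetical; the sample headings are the ones listed earlier.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Sketch: multi-level numbering for nested <section> elements.
class SectionNumberer {
    static String numberHeadings(String xml) {
        Element root;
        try {
            root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml))).getDocumentElement();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        StringBuilder out = new StringBuilder();
        number(root, "", out);
        return out.toString();
    }

    // Numbers the <section> children of parent; prefix is the parent's number ("" at the top).
    static void number(Element parent, String prefix, StringBuilder out) {
        int count = 0;
        for (Node child = parent.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (isElement(child, "section")) {
                count++;
                String label = prefix + count + ".";
                Element section = (Element) child;
                // Print the current heading...
                out.append(label).append(' ').append(titleOf(section)).append('\n');
                // ...then its child headings, before the loop moves on to the next sibling.
                number(section, label, out);
            }
        }
    }

    // Text of the section's own <title> child, or "" if there is none.
    static String titleOf(Element section) {
        for (Node child = section.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (isElement(child, "title")) {
                return child.getTextContent().trim();
            }
        }
        return "";
    }

    static boolean isElement(Node node, String name) {
        return node.getNodeType() == Node.ELEMENT_NODE && name.equals(node.getNodeName());
    }

    public static void main(String[] args) {
        String xml = "<manual>"
            + "<section><title>Quite a big heading</title>"
            + "<section><title>A less large heading</title></section>"
            + "<section><title>The penultimate heading</title>"
            + "<section><title>Most likely not a significant heading</title></section>"
            + "</section></section></manual>";
        System.out.print(numberHeadings(xml));
    }
}
```

Run against the sample, this prints the four headings numbered 1., 1.1., 1.2. and 1.2.1., matching the numbering shown at the start of this section.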

Performing calculations
There may be times when you want to perform some sort of calculation based on the contents or structure of a document. For example, in our spreadsheet data, it would be useful to calculate the total sales per product or per region.

Performing calculations with XSL
There are a few basic number functions defined in the XPath specification. The function we’ll use in our example is sum, which takes a set of nodes, converts the value of each node to a number, then returns the sum of those values. Here’s an example that calculates the sum of the values of all the <product> elements that are children of the current element:

<xsl:text>Total: </xsl:text>
<xsl:value-of select="sum(product)"/>

As you’d expect, this is much easier than writing the Java code to find all of these elements and do the math yourself.

Performing calculations with Java and the DOM
If you’d like to duplicate the functions of the template above in Java, you’d need to write code that found all the <product> elements you wanted to add together. Once you had a NodeList of those elements, you’d need to convert their values to numbers (handling any exceptions that might occur) and add them together yourself. While this isn’t the world’s most difficult programming task, it’s much more tedious than writing the XSL template in the previous section.
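Here is a minimal Java sketch of that task, assuming the <product> elements are direct children of the element being totaled, as with sum(product). One design choice differs from XPath: values that can’t be parsed are skipped with a warning, whereas sum() would convert them to NaN and make the whole total NaN. The class and method names are illustrative.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Sketch: the Java equivalent of <xsl:value-of select="sum(product)"/>.
class ProductTotal {
    static double totalSales(String xml) {
        Element root;
        try {
            root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml))).getDocumentElement();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        return sumOfChildren(root, "product");
    }

    // Adds the numeric values of the parent's child elements with the given name.
    static double sumOfChildren(Element parent, String name) {
        double total = 0;
        for (Node child = parent.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child.getNodeType() == Node.ELEMENT_NODE && name.equals(child.getNodeName())) {
                String text = child.getTextContent().trim();
                try {
                    total += Double.parseDouble(text);
                } catch (NumberFormatException e) {
                    // Design choice: skip non-numeric values instead of returning NaN.
                    System.err.println("Skipping non-numeric value: " + text);
                }
            }
        }
        return total;
    }

    public static void main(String[] args) {
        String region = "<region><name>East</name><product>100</product><product>250.5</product></region>";
        System.out.println(totalSales(region)); // 350.5
    }
}
```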

XML into HTML
The most common transformation task is converting an XML document into HTML. Because XML and HTML are both markup languages, we’ll write XSL stylesheets to transform our documents.

Transformation #1 – A Shakespearean sonnet
To transform this document, we’ll consider our original document and the output we want. Based on our desired output, we’ll need rules that generate the following HTML content:

[Begin HTML document]
[Output information about the title of the sonnet, the author, and the author’s background]
<table>
    [For each line in the sonnet:]
    <tr>
    <td>[rhyme scheme]</td>
    <td>[the actual line from the sonnet]</td>
    </tr>
</table>
[End HTML document]

Now let’s look at our original document and see how these rules apply.

First of all, we’ll create the template for the root of our XML document. Because our document element is the <sonnet> element, that’s the element we’ll select in our root template.

<xsl:template match="/">
   <xsl:apply-templates select="sonnet"/>
</xsl:template>

The entire HTML document is contained in an <html> element. Because our original source document is contained in the <sonnet> element, the transformation rule for the <sonnet> element generates the <html> element and everything inside it.

<xsl:template match="sonnet">
  <html>

   <head>
    <title>
     <xsl:value-of select="title"/>
    </title>
   </head>
   <body>
    <h3>
     <xsl:value-of select="title"/>
    </h3>

We use the <xsl:value-of> element to insert the text of the <title> element. Notice that by using this element, we can insert the text of this element anywhere we want.

<p>
  <xsl:text>Author: </xsl:text>
  <xsl:value-of select="author/first-name"/>
  <xsl:text> </xsl:text>
  <xsl:value-of select="author/last-name"/>
  <xsl:text> (</xsl:text>
  <i>
    <xsl:value-of select="author/nationality"/>
    <xsl:text>, </xsl:text>
    <xsl:value-of select="author/year-of-birth"/>
    <xsl:text>-</xsl:text>
    <xsl:value-of select="author/year-of-death"/>
  </i>
  <xsl:text>)</xsl:text>
</p>

In this section, we create a paragraph that contains the author’s name, nationality, and life span. Whenever we need to insert a literal character, such as a space, a dash, or a parenthesis, we use <xsl:text> elements.

<table border="0" colspec="30 *">
<xsl:apply-templates select="//line" />
</table>
</body>
</html>
</xsl:template>

Finally, we insert a <table> tag that contains all of the lines of the sonnet. The <line> elements in our original source document contain the lines of the sonnet, so we reference those lines here.

Within the table we just created, we need to generate a <tr> for each line of the sonnet. To do this, we’ll write one template for each rhyme, matching all the lines that share that rhyme. The rhyme scheme of a Shakespearean sonnet is abab cdcd efef gg, so we’ll have seven different rules for all the lines in the sonnet. Here are the rules for the first, second, third, and fourth lines of the sonnet:

<xsl:template match="line[1]|line[3]">
  <tr>
    <td align="right"><b><i>A</i></b></td>
    <td>
      <font color="green">
        <xsl:value-of select="." />
      </font>
    </td>
  </tr>
</xsl:template>

<xsl:template match="line[2]|line[4]">
  <tr>
    <td align="right">
      <b><i>B</i></b>
    </td>
    <td>
      <font color="purple">
        <xsl:value-of select="." />
      </font>
    </td>
  </tr>
</xsl:template>

The templates for the other lines of the sonnet are similar.
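To see why seven templates are enough, the rhyme scheme can be computed directly. This small Java sketch (our own illustration, not part of the tutorial’s stylesheets) maps each 1-based line number to the rhyme letter the templates above emit, so lines 1 and 3 map to A and lines 2 and 4 map to B.

```java
// Sketch: rhyme letter for each line of a Shakespearean sonnet (abab cdcd efef gg).
class RhymeScheme {
    static char rhymeFor(int lineNumber) {
        if (lineNumber < 1 || lineNumber > 14) {
            throw new IllegalArgumentException("A sonnet has 14 lines: " + lineNumber);
        }
        if (lineNumber > 12) {
            return 'G';                          // closing couplet
        }
        int quatrain = (lineNumber - 1) / 4;     // 0, 1 or 2
        int offset = (lineNumber - 1) % 2;       // 0 for odd-numbered lines, 1 for even
        return (char) ('A' + 2 * quatrain + offset);
    }

    // The whole scheme, with a space after each quatrain.
    static String scheme() {
        StringBuilder sb = new StringBuilder();
        for (int line = 1; line <= 14; line++) {
            sb.append(rhymeFor(line));
            if (line % 4 == 0 && line < 14) {
                sb.append(' ');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(scheme()); // ABAB CDCD EFEF GG
        long distinct = scheme().replace(" ", "").chars().distinct().count();
        System.out.println(distinct + " distinct rhymes"); // 7 distinct rhymes
    }
}
```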

Transformation #1A – Mangling a Shakespearean sonnet
As a special added bonus, we’ll include another stylesheet that sorts the lines of our Shakespearean sonnet. We did this as an exercise in our XML Programming in Java tutorial, although in that tutorial we used Java code to sort the sonnet. Doing the same task with a stylesheet is much simpler; we simply use a template that sorts all of the <line> elements of the document:

<xsl:for-each select="//line">
  <xsl:sort order="ascending" select="."/>
  <xsl:value-of select="."/>
  <br name="x"/>
</xsl:for-each>

If you take a look at the sample Java code from our earlier tutorial, you’ll agree that this is much easier to write. Finally, a word about the <br name="x"/> tag above: we inserted the name="x" attribute so that the <br> tag itself is correctly processed by Netscape. It’s a kludge, but it works.

Also note that our select statement here uses the XPath notation //line. This tells the XSL processor to select all <line> elements, regardless of where they occur in the document. This notation can simplify the design of your templates, assuming that you want to process all the <line> elements as a group.

Although the output of our stylesheet is a correctly-sorted sonnet, it can’t be said that we’ve done much for the cause of poetry:

Sorted version of Sonnet 130

And in some perfumes is there more delight
And yet, by Heaven, I think my love as rare
As any she belied with false compare.
But no such roses see I in her cheeks.
Coral is far more red than her lips red.
If hairs be wires, black wires grow on her head.
If snow be white, why then her breasts are dun,
I grant I never saw a goddess go,
I have seen roses damasked, red and white,
I love to hear her speak, yet well I know
My mistress' eyes are nothing like the sun,
My mistress when she walks, treads on the ground.
Than in the breath that from my mistress reeks.
That music hath a far more pleasing sound

Transformation #2 – A business letter
Now that we’ve gotten our feet wet, we’ll do a much simpler transformation: converting a business letter into HTML. The biggest technical challenge here is how to handle optional elements. In our DTD, items such as <subject> and <attention> are optional. If those elements exist, we want to output their contents to the HTML document. If not, we don’t want to output anything. To handle this correctly, we use the <xsl:apply-templates> element:

<xsl:apply-templates select="subject"/>
<xsl:apply-templates select="attention"/>
<tr>
  <td colspan="2">
    <br x="7"/>
    <xsl:value-of select="salutation"/>
    <xsl:text>,</xsl:text>
  </td>
</tr>

In this example, if no <subject> or <attention> elements exist, the templates are never processed, and nothing is output. A template like this would not give us what we want:

<xsl:text>Subject: </xsl:text>
<xsl:value-of select="subject"/>

This doesn’t work because the text Subject: is always output, whether a <subject> element exists or not. The template for the <subject> element looks like this:

<xsl:template match="subject">
  <tr>
    <td colspan="2">
      <br x="7"/>
      <xsl:text>Subject: </xsl:text>
      <xsl:value-of select="."/>
    </td>
  </tr>
</xsl:template>

Notice that we put the <tr> and <td> tags inside the <subject> template. If no <subject> element exists, we don’t want those tags in the output.
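A cut-down version of the letter stylesheet makes this behavior easy to check. In this sketch, which uses the JDK’s javax.xml.transform API, the <letter> structure is simplified to a <subject> and a <salutation>, and the class name is hypothetical; the subject row appears only when the source document contains a <subject> element.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Sketch: an optional <subject> produces a table row only when it exists.
class OptionalSubjectDemo {
    static final String LETTER_XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        + "<xsl:template match='/letter'>"
        + "<table>"
        + "<xsl:apply-templates select='subject'/>"   // selects nothing if <subject> is absent
        + "<tr><td colspan='2'><xsl:value-of select='salutation'/><xsl:text>,</xsl:text></td></tr>"
        + "</table>"
        + "</xsl:template>"
        + "<xsl:template match='subject'>"           // the <tr> lives inside this template
        + "<tr><td colspan='2'><xsl:text>Subject: </xsl:text><xsl:value-of select='.'/></td></tr>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    static String transform(String stylesheet, String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(stylesheet)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String withSubject = "<letter><subject>Sales figures</subject>"
            + "<salutation>Dear Ms. Smith</salutation></letter>";
        String withoutSubject = "<letter><salutation>Dear Ms. Smith</salutation></letter>";
        System.out.println(transform(LETTER_XSL, withSubject));
        System.out.println(transform(LETTER_XSL, withoutSubject));
    }
}
```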

Transformation #3 – A technical glossary
Formatting the information in our XML glossary file is fairly straightforward. The main complication here is that we want to create hyperlinks between terms. To do this, we need to create HTML <a> tags with the appropriate attributes, which we do with the <xsl:element> and <xsl:attribute> elements. Here’s an example:

<xsl:element name="a">
  <xsl:attribute name="name">
    <xsl:value-of select="./@id"/>
  </xsl:attribute>
</xsl:element>

In this template, we create an HTML <a> tag that contains a name attribute whose value is equal to the id attribute of the original <glentry> tag. Any text that we generate inside the <xsl:attribute> tag becomes the value of the attribute we’re creating, and each <xsl:attribute> tag adds an attribute to the element created by the enclosing <xsl:element>.

Another feature of this stylesheet is that we use the last() function when generating the title. The last() function returns the number of nodes in the current context, so the predicate [last()] selects the last node in a given set of nodes. In our case, the title of the HTML document contains the first term (glentry[1]/term) as well as the last (glentry[last()]/term). The output from the template below is “Glossary Listing: applet - zombie process”.

<title>
  <xsl:text>Glossary Listing: </xsl:text>
  <xsl:value-of select="glentry[1]/term"/>
  <xsl:text> - </xsl:text>
  <xsl:value-of select="glentry[last()]/term"/>
</title>

The final complication in generating the HTML text is that we need to retrieve the xreftext of the referenced glossary item. Our XML source looks like this:

<glentry id="GLE-applet">
  <term id="GLT-applet" xreftext="applet">
    applet
  </term>

  ...

<glentry id="GLE-servlet">
  <term id="GLT-servlet" xreftext="servlet">
    servlet
  </term>
  <defn id="GLD-servlet-001">

    ...

    Contrast with <xref refid="GLT-applet" />.
  </defn>
</glentry>

What we want is for the sentence that begins “Contrast with...” to contain the text retrieved from the referenced item. To perform this bit of magic, we'll use the key() function together with a key declared by an <xsl:key> tag. There are two steps to setting up a key. The first is to declare the key itself:

<xsl:key name="termref" match="/glossary/glentry/term" use="@id"/>

This tag has three attributes. The first, name, defines the name of the key. When we're ready to use this key to look up the text of a reference, we'll refer to the key by this name. Next, the match attribute defines the nodes that are part of the key. Finally, the use attribute defines exactly which part of those nodes is used as the key value. So in our example above, we'll use the name termref whenever we need to look up a reference, the nodes we'll look at are all the <term> elements inside the <glentry> elements inside the <glossary> element, and the value we'll be looking for is the id attribute of the <term> tag.

Our key indexes the <term> elements by their id attributes. Whenever we want to find one, we pass the key() function the name of the key (termref) and the refid value we're looking for. The key() function returns the matching node (strictly speaking, a node-set), from which we'll select the xreftext attribute. The value of that attribute is the text of the HTML hyperlink that appears in the browser.

Now that we've defined the key function, we can create our template to retrieve the text for the cross-reference:

<xsl:template match="xref">
  <xsl:element name="a">
   <xsl:attribute name="href">
    <xsl:text>#</xsl:text>
    <xsl:value-of select="@refid"/>
   </xsl:attribute>
   <xsl:value-of select="key('termref', @refid)/@xreftext"/>
  </xsl:element>
</xsl:template>

We refer to our key function in the select attribute of <xsl:value-of>. Notice that the function returns a node, from which we then extract the xreftext attribute. This stylesheet allows us to build hyperlinks automatically, and it makes it easy to manage the text of those links. Because the text of the link is built from the referenced element, changing the text of the link in one place changes it throughout our document. The key function adds a level of complexity to our stylesheet, but it is very powerful and flexible.
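The glossary techniques can be exercised together on a small sample. In this sketch the glossary contents are made up (only the applet and servlet entries echo the excerpt above), the class name is hypothetical, and a <defn> template is added so that the <xref> inside each definition gets processed. The title is built with last(), and the hyperlink text comes from key().

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Sketch: last() in the title plus key()-based cross-reference links.
class GlossaryLinks {
    static final String GLOSSARY_XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        + "<xsl:key name='termref' match='/glossary/glentry/term' use='@id'/>"
        + "<xsl:template match='/glossary'>"
        + "<html><head><title>"
        + "<xsl:text>Glossary Listing: </xsl:text>"
        + "<xsl:value-of select='glentry[1]/term'/>"
        + "<xsl:text> - </xsl:text>"
        + "<xsl:value-of select='glentry[last()]/term'/>"
        + "</title></head><body>"
        + "<xsl:apply-templates select='glentry/defn'/>"
        + "</body></html>"
        + "</xsl:template>"
        + "<xsl:template match='defn'><p><xsl:apply-templates/></p></xsl:template>"
        + "<xsl:template match='xref'>"
        + "<xsl:element name='a'>"
        + "<xsl:attribute name='href'><xsl:text>#</xsl:text><xsl:value-of select='@refid'/></xsl:attribute>"
        // Look up the referenced <term> by id and use its xreftext as the link text.
        + "<xsl:value-of select=\"key('termref', @refid)/@xreftext\"/>"
        + "</xsl:element>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Made-up sample data in the shape of the excerpt above.
    static final String SAMPLE_GLOSSARY =
        "<glossary>"
        + "<glentry id='GLE-applet'><term id='GLT-applet' xreftext='applet'>applet</term></glentry>"
        + "<glentry id='GLE-servlet'><term id='GLT-servlet' xreftext='servlet'>servlet</term>"
        + "<defn id='GLD-servlet-001'>Contrast with <xref refid='GLT-applet'/>.</defn></glentry>"
        + "<glentry id='GLE-zombie'><term id='GLT-zombie' xreftext='zombie process'>zombie process</term></glentry>"
        + "</glossary>";

    static String transform(String stylesheet, String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(stylesheet)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(GLOSSARY_XSL, SAMPLE_GLOSSARY));
    }
}
```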

Transformation #4 – A sales report
To create our sales report, we need to total the sales per region, as well as the total sales for the entire company. To do this, we’ll use two functions: sum, defined in the XPath specification, and format-number, defined by XSLT. Here is the template that handles each region:

<xsl:template match="region">
  <tr>
    <td rowspan="6" valign="middle" align="right" width="300">
      <font size="+2">
        <xsl:value-of select="name"/>
      </font>
    </td>
    <td align="right" width="150">
      <xsl:apply-templates select="product[1]"/>
    </td>
  </tr>
  ...
  <tr>
    <td align="right" width="150">
      <font color="green" size="+2">
        <b>
          <xsl:text>Total: </xsl:text>
          <xsl:value-of select="sum(product)"/>
        </b>
      </font>
    </td>
  </tr>
</xsl:template>

The first <tr> created by the template contains the name of the region in column one, followed by the list of all product sales figures for that region. Once all of the product sales figures for a particular region are output, the last <tr> contains the total sales for that region.

Notice that the first cell has a rowspan of 6; this assumes that there are five <product> tags within each <region>. You could modify this template so that it would correctly handle any number of <product> tags; we’ll leave this as an exercise for the reader. A simpler approach would be to create a new report format to avoid the problem altogether.

Once all of the <region> tags have been processed, we use the sum function a final time to print the total sales for all <product> tags in all <region>s:

<tr>
  <td colspan="2" align="right">
    <font color="green" size="+3">
      <xsl:text>Total sales for all regions: </xsl:text>
      <xsl:value-of
        select="format-number(sum(//product), '$#,##0.0')"/>
    </font>
  </td>
</tr>

The XPath notation //product tells the XSL processor to select all <product> elements, regardless of where they appear in the document. The sum function adds up the values of all those <product> elements, and the format-number function formats the total with the requested currency symbol, thousands separators, and decimal place.
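To check what sum and format-number produce, here is a sketch that runs a stripped-down report stylesheet through the JDK’s javax.xml.transform API. The <sales>/<region>/<name>/<product> structure and the figures are invented sample data, the class name is hypothetical, and the stylesheet emits plain text instead of the HTML table above.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Sketch: per-region totals with sum(product), grand total with format-number().
class SalesReportDemo {
    static final String SALES_XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/'>"
        + "<xsl:for-each select='sales/region'>"
        + "<xsl:value-of select='name'/><xsl:text>: </xsl:text>"
        + "<xsl:value-of select='sum(product)'/>"   // children of this region only
        + "<xsl:text>&#10;</xsl:text>"
        + "</xsl:for-each>"
        + "<xsl:text>Total sales for all regions: </xsl:text>"
        + "<xsl:value-of select=\"format-number(sum(//product), '$#,##0.0')\"/>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Invented sample figures.
    static final String SAMPLE_SALES =
        "<sales>"
        + "<region><name>East</name><product>1000.5</product><product>2000</product></region>"
        + "<region><name>West</name><product>250</product></region>"
        + "</sales>";

    static String transform(String stylesheet, String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(stylesheet)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(SALES_XSL, SAMPLE_SALES));
    }
}
```

With this sample, the East region totals 3000.5, the West region 250, and the formatted grand total is $3,250.5: the pattern asks for one decimal place, so a pattern such as '$#,##0.00' would be needed for cents.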

Transformation #5 – A technical manual
The main point of interest here is that our stylesheet uses the <xsl:number> element to automatically number the sections of the document. There are a couple of important points about the way we built our sample document and our stylesheet.

First of all, our XML source document uses a single tag, <section>, to indicate the sections of the document. Any <section> element can contain any number of nested <section> elements, and they can be nested to any level of depth.

Second, the title of each <section> element is defined in its <title> child. This means that as we’re generating the heading for a given section, we need the number generated by the <xsl:number> element, followed by the text of the <title> element inside the section.

Here is the template at the heart of the stylesheet:


<xsl:for-each select="//section">
  <h2>
    <xsl:number level="multiple" count="section" 
                format="1.1.1. " />
    <xsl:value-of select="./title"/>
  </h2>
  <xsl:apply-templates select="./para"/>
</xsl:for-each>

After outputting the heading for the section, we call the template for the <para> elements. As with our sorting transformation, the stylesheet is much simpler than the Java code necessary to do the same thing.

Notice that this template puts all of the headings inside an <h2> tag. If you wanted to use different tags for different levels (for example, use an <h1> tag for the heading of a first-level section, an <h2> tag for the heading of a second-level section, etc.), the stylesheet would be more complicated.

Transformation #6 – An excerpt from a novel
This stylesheet is fairly straightforward; the main technical complication here is that we invoke certain templates by name. We do this with the <xsl:call-template> element. This allows us to invoke a particular template in a particular situation. Here’s an example:

<xsl:template match="chapter">
  <xsl:copy>
    <center>
      <xsl:call-template name="h3"/>
      <xsl:apply-templates select="caption"/>
    </center>
    <xsl:apply-templates select="body/para"/>
  </xsl:copy>
</xsl:template>

<xsl:template name="h3">
  <h3>
    <xsl:text>Chapter </xsl:text>
    <xsl:value-of select="@name"/>
  </h3>
</xsl:template>

The template named h3 creates a heading based on the name attribute of the <chapter> tag. We used <xsl:call-template> for educational purposes only; we could replace the two templates above with the following:

<xsl:template match="chapter">
  <xsl:copy>
    <center>
      <h3>
        <xsl:text>Chapter </xsl:text>
        <xsl:value-of select="./@name"/>
      </h3>
      <xsl:apply-templates select="caption"/>
    </center>
    <xsl:apply-templates select="body/para"/>
  </xsl:copy>
</xsl:template>
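A runnable sketch of the named-template version follows, assuming a simplified <chapter> with a name attribute, a <caption>, and a <body> of <para> elements; the caption and para templates, the sample text, and the class name are our own placeholders. Note that <xsl:copy> copies the <chapter> element itself (without its attributes) into the output, so the HTML ends up wrapped in a <chapter> element.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Sketch: invoking a template by name with <xsl:call-template>.
class ChapterDemo {
    static final String CHAPTER_XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        + "<xsl:template match='chapter'>"
        + "<xsl:copy>"
        + "<center>"
        + "<xsl:call-template name='h3'/>"
        + "<xsl:apply-templates select='caption'/>"
        + "</center>"
        + "<xsl:apply-templates select='body/para'/>"
        + "</xsl:copy>"
        + "</xsl:template>"
        + "<xsl:template name='h3'>"   // runs with <chapter> still as the context node
        + "<h3><xsl:text>Chapter </xsl:text><xsl:value-of select='@name'/></h3>"
        + "</xsl:template>"
        // Placeholder templates for the caption and paragraphs.
        + "<xsl:template match='caption'><i><xsl:value-of select='.'/></i></xsl:template>"
        + "<xsl:template match='para'><p><xsl:value-of select='.'/></p></xsl:template>"
        + "</xsl:stylesheet>";

    static String transform(String stylesheet, String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(stylesheet)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<chapter name='1'><caption>A caption</caption>"
            + "<body><para>A paragraph.</para></body></chapter>";
        System.out.println(transform(CHAPTER_XSL, xml));
    }
}
```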

Summary
Well, by this point, we’ve done just about everything you’ll commonly do with XSLT. Although none of these examples is terribly complicated, they should give you a good idea of XSLT’s capabilities. Transforming documents into HTML is just the beginning, though; we’ll expand this tutorial to include several other transformations:

  • XML into Scalable Vector Graphics (SVG): Convert our XML documents into pie charts and other graphical formats defined with the W3C’s SVG markup language.
  • XML into PDF: Convert our XML documents into PDF files. We’ll use James Tauber’s FOP (Formatting Objects to PDF) tool for this.

If there are other transformations you’d like to see, let us know. As always, we’d love to hear your comments, questions, complaints, and suggestions about this tutorial.


Resources

About the author
Doug Tidwell is a Senior Programmer and Cyber Evangelist at IBM. He has more than a seventh of a century of programming experience and has been working with XML-like applications for several years. His work as a Cyber Evangelist is basically to look busy, and to help customers evaluate and implement XML technology. Using a specially designed pair of zircon-encrusted tweezers, he holds a Master’s degree in Computer Science from Vanderbilt University and a Bachelor’s degree in English from the University of Georgia. He can be reached at dtidwell@us.ibm.com.

   
 