Part 1: Transforming XML into HTML
Doug Tidwell
Cyber Evangelist, developerWorks
Published: November 1999
Updated: November 2000
Part 1 of this Transforming XML documents tutorial shows you how to transform XML documents into HTML.
Let’s meet our contestants
For
our transformations, we’ll use six source documents: a Shakespearean sonnet,
a business letter, definitions for several technical terms, some spreadsheet
data, a section from a technical manual, and a short section from Henry Fielding’s
18th-century British novel Tom Jones. This will give us a wide range of document types to transform.
As
we discuss our different transformations, some of these documents will be
more relevant than others. For example, converting spreadsheet data into
a pie chart would be useful to many people; converting a sonnet into a pie
chart, while an intriguing exercise, probably has a narrower appeal. We’ll
present a simple rendering of all our documents here. If you’d like to see
the XML source for the documents and their DTDs, check out Appendix A – Sample XML documents.
A Shakespearean sonnet
This document is rendered line-by-line, with an indication of the rhyme scheme at the start of each line.

A business letter
This is a block-style business letter.

A small glossary
Here’s a short glossary with a few technical terms:

Sales figures
Here is our sales figures document, rendered as a table:

An excerpt from a technical manual
Here’s a short section of a technical manual:

An excerpt from Tom Jones
Here’s a chapter from the novel:

The transformation process
When we first started working with transformations, we assumed the process would work like this:

[Our thanks to David Epstein of IBM’s T.J. Watson Research Center for this schematic.]
In reality, though, a couple of factors break up this neat model:
With these things in mind, we present these revised diagrams, any or all of which may describe a development effort:

As you begin your own transformations, we hope your experience resembles our original, idealized process as much as possible.
Technologies for transformations
There
are two basic technologies we’ll use for transforming documents. The first
is the Extensible Stylesheet Language for Transformations, better known as
XSL or XSLT. The second is Java code that uses the methods of the Document
Object Model (DOM). Most of the time, it’s simpler to write an XSL stylesheet,
but there may be times when a stylesheet can’t do what you want. In general,
any time you’re transforming a document from one XML vocabulary to another,
XSL is probably the best way to go.
On
the other hand, if you’re transforming an XML document into something that
isn’t a markup language, you’ll almost certainly want to write Java code
instead. Writing Java code is more difficult, but it gives you complete control
over the transformation. In our examples, we’ll use the simplest technology,
whatever that happens to be.
XSL stylesheets
We’ll
do many of our transformations with XSL stylesheets. An XSL stylesheet contains
some number of templates, each of which describes how to transform a given
element in the source document. In this tutorial, we’ll discuss the XSL templates
we use in our sample stylesheets. If you’d like a more complete introduction
to XSL, we highly recommend Ken Holman’s tutorial, Practical transformations using XSLT and XPath. (See Resources.)
Appendix B contains sample stylesheets for HTML transformations used in this tutorial.
Java and the Document Object Model (DOM)
If
you want or need complete control over your transformations, you can write
Java code that uses the methods and interfaces of the DOM. The good news
is that you can do anything you want; the bad news is that you have to do
everything yourself. You can find more information about this kind of programming
in our tutorial, XML Programming in Java.
Appendix C contains sample Java code to transform XML documents.
Common transformations
Before
we get into the gory details of transforming our documents, let’s talk about
what’s involved in doing the transformations. In general, we’ll start by
looking at what information we have in the source document, and considering
how we want to display that information in the new format. For our examples
here, we’ll determine the output format ourselves; in the real world, the
output format might be imposed upon us in many cases.
There
are a number of common tasks involved in transforming an XML document. We’ll
discuss those in general here, with more complete examples in our sample
transformations.
Moving text
Very
often you’ll need to change the order in which elements appear in your output
document. For example, an XML document with names and addresses might be
displayed with the last name or first name first at various times.
Moving text with XSL
To
move items from one place to another with XSL, you can use <xsl:value-of>
elements to insert the text of an element wherever you want. Here are excerpts
from two example templates: the first displays the last name first, and the
second displays the first name first.
<!-- Example 1: last name first -->
<xsl:text>Name: </xsl:text>
<xsl:value-of select="last-name"/>
<xsl:text>, </xsl:text>
<xsl:value-of select="first-name"/>
<!-- Example 2: first name first -->
<xsl:text>Name: </xsl:text>
<xsl:value-of select="first-name"/>
<xsl:text> </xsl:text>
<xsl:value-of select="last-name"/>
Moving text with Java and the DOM
You
can use Java and the DOM to change the order in which elements appear. Because
you’re writing the code, you can search for, find, and manipulate any part
of the document you want.
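As a rough sketch of what that looks like (this is not the code from Appendix C), the class below reproduces the two XSL excerpts above with the DOM. The MoveText class, its helper methods, and the <person> wrapper element are our own assumptions for illustration; only the <first-name> and <last-name> element names come from the XSL example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class MoveText {
    // Parses an XML string into a DOM tree. Checked exceptions are wrapped
    // so the helper is easy to call from anywhere.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Returns the trimmed text of the first descendant element with the
    // given tag name, or an empty string if no such element exists.
    static String textOf(Element parent, String tagName) {
        NodeList matches = parent.getElementsByTagName(tagName);
        return matches.getLength() == 0 ? "" : matches.item(0).getTextContent().trim();
    }

    // Equivalent of Example 1: last name first.
    static String lastNameFirst(Element person) {
        return "Name: " + textOf(person, "last-name") + ", " + textOf(person, "first-name");
    }

    // Equivalent of Example 2: first name first.
    static String firstNameFirst(Element person) {
        return "Name: " + textOf(person, "first-name") + " " + textOf(person, "last-name");
    }

    public static void main(String[] args) {
        // Hypothetical input; the <person> wrapper is an assumption.
        String xml = "<person><first-name>Henry</first-name>"
                   + "<last-name>Fielding</last-name></person>";
        Element person = parse(xml).getDocumentElement();
        System.out.println(lastNameFirst(person));
        System.out.println(firstNameFirst(person));
    }
}
```

Because the code pulls each value out of the tree by name, the order of the elements in the source document has no effect on the order of the output.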
Sorting
Another
common task is sorting elements. If you have an XML document that contains
a number of data items, you could generate different views of that document
by sorting on different elements. For example, an XML document containing
names and addresses could be sorted by last name, city, state, or zip code.
Sorting with XSL
Sorting
text with XSL is simple. Here’s part of an XSL template that sorts all of
the <zip> elements within an XML document:
<xsl:for-each select="//zip">
<xsl:sort order="ascending" select="."/>
<xsl:value-of select="."/>
</xsl:for-each>
Sorting with Java and the DOM
Sorting
data with Java is just like any other sorting task: You have to get the data
(extracting it from the various elements in the DOM tree), sort the data,
then output the data in sorted order. You can find a complete example of
this in our XML Programming in Java tutorial. Sorting elements with Java code is a lot more work than sorting with XSL.
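To give a feel for the extra work, here is a minimal sketch of the get, sort, and output steps for the <zip> elements. It is not the example from the XML Programming in Java tutorial; the SortZips class and the sample address document are assumptions made for illustration. Note that Collections.sort compares strings by character value, which approximates but is not guaranteed to match the text ordering an XSL processor uses.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class SortZips {
    // Collects the text of every <zip> element in the document and
    // returns the values in ascending string order.
    static List<String> sortedZips(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }

        // Step 1: get the data. getElementsByTagName finds <zip> elements
        // anywhere in the tree, much like the XPath expression //zip.
        NodeList zips = doc.getElementsByTagName("zip");
        List<String> values = new ArrayList<>();
        for (int i = 0; i < zips.getLength(); i++) {
            values.add(zips.item(i).getTextContent().trim());
        }

        // Step 2: sort the data.
        Collections.sort(values);
        return values;
    }

    public static void main(String[] args) {
        // Hypothetical input document.
        String xml = "<addresses>"
                   + "<address><zip>27709</zip></address>"
                   + "<address><zip>10504</zip></address>"
                   + "<address><zip>30602</zip></address>"
                   + "</addresses>";
        // Step 3: output the data in sorted order.
        for (String zip : sortedZips(xml)) {
            System.out.println(zip);
        }
    }
}
```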
Generating text
There
are many times when you want to insert text into your output document. For
example, you might want to insert the text “Total: ” before you display an
actual total. We’ll assume for the sake of this example that the word “Total”
doesn’t appear in the input document, so you can’t just extract it from an
element.
Generating text with XSL
If
you want to use XSLT to insert text into your output stream, simply use an
<xsl:text> element in your template. Here’s an element that inserts
the text “Total: ” into the output document:
<xsl:text>Total: </xsl:text>
It’s as simple as that!
Generating text with Java and the DOM
If
you want to insert text into a DOM tree, you’ll need to create a Text node
and insert it into the tree. Here’s some sample code:
x = 7;
[It’s as lame as a three-legged racehorse, but I love that joke.] Now here’s some relevant sample code:
Element containingElement = doc.createElement("xyz");
containingElement.appendChild(doc.createTextNode("Total: "));
This sample creates an <xyz> element and adds a text node to it. Again, our XML Programming in Java tutorial has complete details on building DOM trees and inserting things into them.
Numbering items
Many
documents, particularly technical publications and legal documents, number
certain items. You could create numbering like this for headings and subheadings,
for example:
1. Quite a big heading
1.1. A less large heading
1.2. The penultimate heading
1.2.1. Most likely not a significant heading
Numbering items with XSL
To
number items with XSL, use the <xsl:number> element. This allows you
to number things in a variety of ways, and handles multiple levels of numbering
if you wish.
<xsl:for-each select="//section">
<h2>
<xsl:number level="multiple" count="section" format="1.1.1. " />
<xsl:value-of select="./title"/>
</h2>
<xsl:apply-templates select="./para"/>
</xsl:for-each>
In
this example, the numbers are automatically generated by the stylesheet,
with the level="multiple" attribute specifying that the numbers be calculated
for multiple levels. Each <section> element is numbered automatically.
If you add or delete <section> elements in the source document, the
stylesheet automatically renumbers everything for you.
Numbering items with Java and the DOM
To
number items using Java, you’ll have to do all the numbering yourself. In
our current example, you would probably use a recursive algorithm so that
you could handle any level of nested headings and subheadings. Each iteration
of the algorithm would print the current heading, then all of the current
heading’s children, then yield to the next sibling of the current heading.
As with most of our examples here, the stylesheet is much simpler than the
Java code.
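A minimal sketch of that recursive approach might look like the following. The NumberSections class and the <manual> root element are assumptions for illustration; the sketch only numbers <section> elements that are direct children of the root or of another <section>, which is a narrower case than <xsl:number> handles.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class NumberSections {
    // Returns one numbered heading per <section>, in document order.
    static List<String> headings(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
        List<String> out = new ArrayList<>();
        number(doc.getDocumentElement(), "", out);
        return out;
    }

    // Walks the direct children of parent. Each <section> child gets the
    // next number at this level, then its own children are numbered
    // recursively using that number as the prefix.
    private static void number(Element parent, String prefix, List<String> out) {
        int counter = 0;
        for (Node child = parent.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child instanceof Element && "section".equals(child.getNodeName())) {
                counter++;
                String label = prefix + counter + ".";
                out.add(label + " " + titleOf((Element) child));
                number((Element) child, label, out);
            }
        }
    }

    // Returns the text of the section's own <title> child, ignoring the
    // titles of any nested sections.
    private static String titleOf(Element section) {
        for (Node child = section.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child instanceof Element && "title".equals(child.getNodeName())) {
                return child.getTextContent().trim();
            }
        }
        return "";
    }

    public static void main(String[] args) {
        // Hypothetical input that mirrors the headings listed earlier.
        String xml = "<manual><section><title>Quite a big heading</title>"
                   + "<section><title>A less large heading</title></section>"
                   + "<section><title>The penultimate heading</title>"
                   + "<section><title>Most likely not a significant heading</title></section>"
                   + "</section></section></manual>";
        for (String heading : headings(xml)) {
            System.out.println(heading);
        }
    }
}
```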
Performing calculations
There
may be times when you want to perform some sort of calculation based on the
contents or structure of a document. For example, in our spreadsheet data,
it would be useful to calculate the total sales per product or per region.
Performing calculations with XSL
There
are a few basic number functions defined in the XPath specification. The
function we’ll use in our example is sum, which takes a group of elements,
converts their values to numbers, then returns the sum of those values. Here’s
an example that calculates the sum of all the values of all the <product>
elements:
<xsl:text>Total: </xsl:text>
<xsl:value-of select="sum(product)"/>
As you’d expect, this is much easier than writing the Java code to find all of these elements and do the math yourself.
Performing calculations with Java and the DOM
If
you’d like to duplicate the functions of the template above in Java, you’d
need to write code that found all the <product> elements you wanted
to add together. Once you had a NodeList of those elements, you’d need to
convert their values to numbers (handling any exceptions that might occur)
and add them together yourself. While this isn’t the world’s most difficult
programming task, it’s much more tedious than writing the XSL template in
the previous section.
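For comparison, here is a sketch of the Java side, assuming the <product> elements are direct children of the document's root element (as they would be relative to the context node in sum(product)). The SumProducts class and the sample <region> document are hypothetical. Like the XPath sum function, the sketch returns NaN if any value is not a number, although Double.parseDouble accepts a few forms that XPath does not.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class SumProducts {
    // Adds up the numeric values of the <product> children of the
    // document's root element. Returns NaN if any value is not a number.
    static double sumProducts(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }

        double total = 0;
        Element root = doc.getDocumentElement();
        for (Node child = root.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child instanceof Element && "product".equals(child.getNodeName())) {
                try {
                    total += Double.parseDouble(child.getTextContent().trim());
                } catch (NumberFormatException e) {
                    // One bad value spoils the whole sum, as it does in XPath.
                    return Double.NaN;
                }
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical input document.
        String xml = "<region><name>Southeast</name>"
                   + "<product>1000.50</product>"
                   + "<product>250</product>"
                   + "<product>49.5</product></region>";
        System.out.println("Total: " + sumProducts(xml));
    }
}
```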
XML into HTML
The
most common transformation task is converting an XML document into HTML.
Because XML and HTML are both markup languages, we’ll write XSL stylesheets
to transform our documents.
Transformation #1 – A Shakespearean sonnet
To
transform this document, we’ll consider our original document and the output
we want. Based on our desired output, we’ll need rules that generate the
following HTML content:
[Begin HTML document]
[Output information about the title of the sonnet, the author, and the author’s background]
<table>
[For each line in the sonnet:]
<tr>
<td>[rhyme scheme]</td>
<td>[the actual line from the sonnet]</td>
</tr>
</table>
[End HTML document]
Now let’s look at our original document and see how these rules apply.
First
of all, we’ll create the template for the root element of our XML document.
Because our document root is the <sonnet> element, that’s the element
we’ll select in our root element template.
<xsl:template match="/">
<xsl:apply-templates select="sonnet"/>
</xsl:template>
The
entire HTML document is contained in an <html> element. Because our
original source document is contained in the <sonnet> element, the
transformation rule for the <sonnet> element generates the <html>
element and everything inside it.
<xsl:template match="sonnet">
<html>
<head>
<title>
<xsl:value-of select="title"/>
</title>
</head>
<body>
<h3>
<xsl:value-of select="title"/>
</h3>
We
use the <xsl:value-of> element to insert the text of the <title>
element. Notice that by using this element, we can insert the text of this
element anywhere we want.
<p>
<xsl:text>Author: </xsl:text>
<xsl:value-of select="author/first-name"/>
<xsl:text> </xsl:text>
<xsl:value-of select="author/last-name"/>
<xsl:text> (</xsl:text>
<i>
<xsl:value-of select="author/nationality"/>
<xsl:text>, </xsl:text>
<xsl:value-of select="author/year-of-birth"/>
<xsl:text>-</xsl:text>
<xsl:value-of select="author/year-of-death"/>
</i>
<xsl:text>)</xsl:text>
</p>
In
this section, we create a paragraph that contains the author’s name, nationality,
and life span. Whenever we need to insert a literal character, such as a
space, a dash, or a parenthesis, we use <xsl:text> elements.
<table border="0" colspec="30 *">
<xsl:apply-templates select="//line" />
</table>
</body>
</html>
</xsl:template>
Finally,
we insert a <table> tag that contains all of the lines of the sonnet.
The <line> elements in our original source document contain the lines
of the sonnet, so we reference those lines here.
Within
the table we just created, we need to generate a <tr> from each line
of the sonnet. To do this, we’ll have a rule for each similar line. The rhyme
scheme of a Shakespearean sonnet is abab cdcd efef gg, so we’ll have seven
different rules for all the lines in the sonnet. Here are the rules for the
first, second, third, and fourth lines of the sonnet:
<xsl:template match="line[1]|line[3]">
<tr>
<td align="right"><b><i>A</i></b></td>
<td>
<font color="green">
<xsl:value-of select="." />
</font>
</td>
</tr>
</xsl:template>
<xsl:template match="line[2]|line[4]">
<tr>
<td align="right">
<b><i>B</i></b>
</td>
<td>
<font color="purple">
<xsl:value-of select="." />
</font>
</td>
</tr>
</xsl:template>
The templates for the other lines of the sonnet are similar.
Transformation #1A – Mangling a Shakespearean sonnet
As
a special added bonus, we’ll include another stylesheet that sorts the lines
of our Shakespearean sonnet. We did this as an exercise in our XML Programming in Java
tutorial, although in that tutorial we used Java code to sort the sonnet.
Doing the same task with a stylesheet is much simpler; we simply use a template
that sorts all of the <line> elements of the document:
<xsl:for-each select="//line">
<xsl:sort order="ascending" select="."/>
<xsl:value-of select="."/>
<br name="x"/>
</xsl:for-each>
If
you take a look at the sample Java code from our earlier tutorial, you’ll
agree that this is much easier to write. Finally, a word about the <br
name="x"/> tag above: we inserted the name="x" attribute so that the <br>
tag itself is correctly processed by Netscape. It’s a kludge, but it works.
Also
note that our select statement here uses the XPath notation //line. This
tells the XSL processor to select all <line> elements, regardless of
where they occur in the document. This notation can simplify the design of
your templates, assuming that you want to process all the <line> elements
as a group.
Although
the output of our stylesheet is a correctly-sorted sonnet, it can’t be said
that we’ve done much for the cause of poetry:
Sorted version of Sonnet 130
And in some perfumes is there more delight
And yet, by Heaven, I think my love as rare
As any she belied with false compare.
But no such roses see I in her cheeks.
Coral is far more red than her lips red.
If hairs be wires, black wires grow on her head.
If snow be white, why then her breasts are dun,
I grant I never saw a goddess go,
I have seen roses damasked, red and white,
I love to hear her speak, yet well I know
My mistress' eyes are nothing like the sun,
My mistress when she walks, treads on the ground.
Than in the breath that from my mistress reeks.
That music hath a far more pleasing sound.
Transformation #2 – A business letter
Now
that we’ve gotten our feet wet, we’ll do a much simpler transformation: Converting
a business letter into HTML. The biggest technical challenge here is how
to handle optional elements. In our DTD, items such as <subject> and
<attention> are optional. If those elements exist, we want to output
their contents to the HTML document. If not, we don’t want to output anything.
To handle this correctly, we use the <xsl:apply-templates> element:
<xsl:apply-templates select="subject"/>
<xsl:apply-templates select="attention"/>
<tr>
<td colspan="2">
<br x="7"/>
<xsl:value-of select="salutation"/>
<xsl:text>,</xsl:text>
</td>
</tr>
In
this example, if no <subject> or <attention> elements exist,
the templates are never processed, and nothing is output. A template like
this would not give us what we want:
<xsl:text>Subject: </xsl:text>
<xsl:value-of select="subject"/>
This
doesn’t work because the text Subject: is always output, whether a <subject>
element exists or not. The template for the <subject> element looks
like this:
<xsl:template match="subject">
<tr>
<td colspan="2">
<br x="7"/>
<xsl:text>Subject: </xsl:text>
<xsl:value-of select="."/>
</td>
</tr>
</xsl:template>
Notice
that we put the <tr> and <td> tags inside the <subject>
template. If no <subject> element exists, we don’t want those tags
in the output.
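If you were handling the same optional element with Java and the DOM, you would have to make the existence check explicit. The sketch below is only an illustration: the OptionalSubject class and the sample <letter> documents are assumptions, the <br x="7"/> tag is left out for brevity, and the text is not HTML-escaped as production code would need.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class OptionalSubject {
    // Returns the table row for the <subject> element, or an empty
    // string if the letter has no <subject> element at all.
    static String subjectRow(String letterXml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(letterXml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }

        NodeList subjects = doc.getElementsByTagName("subject");
        if (subjects.getLength() == 0) {
            // No element, so no <tr>, no <td>, and no "Subject: " label.
            return "";
        }
        String text = subjects.item(0).getTextContent().trim();
        return "<tr><td colspan=\"2\">Subject: " + text + "</td></tr>";
    }

    public static void main(String[] args) {
        // Hypothetical letters, one with a subject and one without.
        System.out.println(subjectRow(
            "<letter><subject>Order status</subject><salutation>Dear Sir</salutation></letter>"));
        System.out.println(subjectRow(
            "<letter><salutation>Dear Sir</salutation></letter>"));
    }
}
```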
Transformation #3 – A technical glossary
Formatting
the information in our XML glossary file is fairly straightforward. The main
complication here is that we want to create hyperlinks between terms. To
do this, we need to create HTML <a> tags with the appropriate attributes.
The <xsl:element> and <xsl:attribute> element tags are used to
do this. Here’s an example:
<xsl:element name="a">
<xsl:attribute name="name">
<xsl:value-of select="./@id"/>
</xsl:attribute>
</xsl:element>
In
this template, we create an HTML <a> tag that contains a name attribute
whose value is equal to the id attribute of the original <glentry>
tag. Any text that we generate inside the <xsl:attribute> tag becomes
part of the attribute we’re creating; any <xsl:attribute> tags become
attributes of the <xsl:element> tag that contains them.
Another
feature of this stylesheet is that we use the last() function when generating
the title. This function returns the position of the last node in the current
set of nodes, so the predicate [last()] selects the final node in that set.
In our case, the title of the HTML document contains the first term (glentry[1]/term)
as well as the last (glentry[last()]/term). The output from the template
below is “Glossary Listing: applet - zombie process.”
<title>
<xsl:text>Glossary Listing: </xsl:text>
<xsl:value-of select="glentry[1]/term"/>
<xsl:text> - </xsl:text>
<xsl:value-of select="glentry[last()]/term"/>
</title>
The
final complication in generating the HTML text is that we need to retrieve
the xreftext of the referenced glossary item. Our XML source looks like this:
<glentry id="GLE-applet">
<term id="GLT-applet" xreftext="applet">
applet
</term>
...
<glentry id="GLE-servlet">
<term id="GLT-servlet" xreftext="servlet">
servlet
</term>
<defn id="GLD-servlet-001">
...
Contrast with <xref refid="GLT-applet" />.
</defn>
</glentry>
What
we want is for the sentence that begins “Contrast with...” to contain the
text retrieved from the referenced item. To perform this bit of magic, we'll
use the key function, defined with an <xsl:key> tag. There are two
steps to setting up a key. The first is to define the function itself:
<xsl:key name="termref" match="/glossary/glentry/term" use="@id"/>
This
tag has three attributes. The first, name, defines the name of the key function.
When we're ready to use this key to look up the text of a reference, we'll
refer to the key by this name. Next, the match attribute defines the nodes
that are part of the key. Finally, the use attribute defines exactly which
part of those nodes is used as the key value. So in our example above, we'll
use the name termref whenever we need to look up a reference, the nodes we'll
look at are all the <term> elements inside the <glentry> elements
inside the <glossary> elements, and the value we'll be looking for
is the id attribute of the <term> tag.
Our
key function has defined a group of <term> elements. Whenever we want
to find one, we will pass the key function the refid value we're looking
for. The key function returns a node, from which we'll select the xreftext
attribute. The value of that attribute is the text of the HTML hyperlink
that appears in the browser.
Now that we've defined the key function, we can create our template to retrieve the text for the cross-reference:
<xsl:template match="xref">
<xsl:element name="a">
<xsl:attribute name="href">
<xsl:text>#</xsl:text>
<xsl:value-of select="@refid"/>
</xsl:attribute>
<xsl:value-of select="key('termref', @refid)/@xreftext"/>
</xsl:element>
</xsl:template>
We
refer to our key function in the select attribute of <xsl:value-of>.
Notice that the function returns a node, from which we then extract the
xreftext attribute. This stylesheet allows us to build hyperlinks automatically,
and it makes it easy to manage the text of those links. Because the text
of the link is built from the referenced element, changing the text of the
link in one place changes it throughout our document. The key function adds
a level of complexity to our stylesheet, but it is very powerful and flexible.
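If you'd like to watch the key lookup work in isolation, one option is to run a cut-down stylesheet through the javax.xml.transform API that ships with the JDK. This is a sketch under several assumptions: the KeyLookupDemo class is hypothetical, the stylesheet contains only the <xsl:key> definition and the xref template from above plus a root template that jumps straight to the <xref> elements, and the glossary is trimmed to two entries.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class KeyLookupDemo {
    // Cut-down stylesheet: the key definition and the xref template from
    // the text, plus a root template that processes only <xref> elements.
    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='html' indent='no'/>"
      + "<xsl:key name='termref' match='/glossary/glentry/term' use='@id'/>"
      + "<xsl:template match='/'><xsl:apply-templates select='//xref'/></xsl:template>"
      + "<xsl:template match='xref'>"
      + "<xsl:element name='a'>"
      + "<xsl:attribute name='href'>"
      + "<xsl:text>#</xsl:text><xsl:value-of select='@refid'/>"
      + "</xsl:attribute>"
      + "<xsl:value-of select=\"key('termref', @refid)/@xreftext\"/>"
      + "</xsl:element>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Trimmed version of the glossary source shown earlier.
    static final String GLOSSARY =
        "<glossary>"
      + "<glentry id='GLE-applet'>"
      + "<term id='GLT-applet' xreftext='applet'>applet</term>"
      + "</glentry>"
      + "<glentry id='GLE-servlet'>"
      + "<term id='GLT-servlet' xreftext='servlet'>servlet</term>"
      + "<defn id='GLD-servlet-001'>Contrast with <xref refid='GLT-applet'/>.</defn>"
      + "</glentry>"
      + "</glossary>";

    // Applies STYLESHEET to the given XML and returns the result as a string.
    static String transform(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        // The link text comes from the xreftext attribute of the <term>
        // whose id matches the refid of the <xref>.
        System.out.println(transform(GLOSSARY));
    }
}
```

Changing the xreftext value on the <term> element changes the text of every link that refers to it, without touching the <xref> elements themselves.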
Transformation #4 – A sales report
To
create our sales report, we need to total the sales per region, as well as
the total sales for the entire company. To do this, we’ll use two functions:
sum, defined in the XPath specification, and format-number, defined by XSLT.
Here are the two templates that define this information:
<xsl:template match="region">
<tr>
<td rowspan="6" valign="middle" align="right" width="300">
<font size="+2">
<xsl:value-of select="name"/>
</font>
</td>
<td align="right" width="150">
<xsl:apply-templates select="product[1]"/>
</td>
</tr>
...
<tr>
<td align="right" width="150">
<font color="green" size="+2">
<b>
<xsl:text>Total: </xsl:text>
<xsl:value-of select="sum(product)"/>
</b>
</font>
</td>
</tr>
</xsl:template>
The
first <tr> created by the template contains the name of the region
in column one, followed by the list of all product sales figures for that
region. Once all of the product sales figures for a particular region are
output, the last <tr> contains the total sales for that region.
Notice
that the first cell has a rowspan of 6; this assumes that there are five
<product> tags within each <region>. You could modify this template
so that it would correctly handle any number of <product> tags; we’ll
leave this as an exercise for the reader. A simpler approach would be to
create a new report format to avoid the problem altogether.
Once
all of the <region> tags have been processed, we use the sum function
a final time to print the total sales for all <product> tags in all
<region>s:
<tr>
<td colspan="2" align="right">
<font color="green" size="+3">
<xsl:text>Total sales for all regions: </xsl:text>
<xsl:value-of
select="format-number(sum(//product), '$#,##0.0')"/>
</font>
</td>
</tr>
The
XPath notation //product tells the XSL processor to select all <product>
elements, regardless of where they appear in the document. The sum function
adds up all of the <product> elements, and the format-number function
formats the actual total with the requested currency symbols, commas, and
decimal places.
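The pattern syntax used by format-number is defined by XSLT in terms of the Java DecimalFormat class, so you can try out a pattern in a few lines of Java before putting it in a stylesheet. The FormatTotal class below is a hypothetical helper, and it pins the symbols to the US locale as an assumption so that the grouping and decimal separators match the pattern as written.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

class FormatTotal {
    // Formats a total with the same pattern the stylesheet passes to
    // format-number: a literal dollar sign, grouping commas, and exactly
    // one digit after the decimal point.
    static String formatTotal(double total) {
        DecimalFormat format =
            new DecimalFormat("$#,##0.0", DecimalFormatSymbols.getInstance(Locale.US));
        return format.format(total);
    }

    public static void main(String[] args) {
        System.out.println(formatTotal(1234567.5)); // prints $1,234,567.5
        System.out.println(formatTotal(950));       // prints $950.0
    }
}
```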
Transformation #5 – A technical manual
The
significant topic here is that our stylesheet uses the <xsl:number>
element to automatically number the sections of the document. There are a
couple of important points about the way we built our sample document and
our stylesheet.
First
of all, our XML source document uses a single tag, <section>, to indicate
the sections of the document. Any <section> element can contain any
number of nested <section> elements, and they can be nested to any
level of depth.
Secondly,
the title of each <section> element is defined in the <title>
tag. This means that as we’re generating the heading for a given section,
we need both the value of the <xsl:number> element, followed by the
text of the <title> element inside the section.
Here is the template at the heart of the stylesheet:
<xsl:for-each select="//section">
<h2>
<xsl:number level="multiple" count="section"
format="1.1.1. " />
<xsl:value-of select="./title"/>
</h2>
<xsl:apply-templates select="./para"/>
</xsl:for-each>
After
outputting the heading for the section, we call the template for the <para>
elements. As with our sorting transformation, the stylesheet is much simpler
than the Java code necessary to do the same thing.
Notice
that this template puts all of the headings inside an <h2> tag. If
you wanted to use different tags for different levels (for example, use an
<h1> tag for the heading of a first-level section, an <h2> tag
for the heading of a second-level section, etc.), the stylesheet would be
more complicated.
Transformation #6 – An excerpt from a novel
This
stylesheet is fairly straightforward; the main technical complication here
is that we invoke certain templates by name. We do this with the <xsl:call-template>
element. This allows us to invoke a particular template in a particular situation.
Here’s an example:
<xsl:template match="chapter">
<xsl:copy>
<center>
<xsl:call-template name="h3"/>
<xsl:apply-templates select="caption"/>
</center>
<xsl:apply-templates select="body/para"/>
</xsl:copy>
</xsl:template>
<xsl:template name="h3">
<h3>
<xsl:text>Chapter </xsl:text>
<xsl:value-of select="@name"/>
</h3>
</xsl:template>
The
template named h3 creates a heading based on the name attribute of the <chapter>
tag. We used <xsl:call-template> for educational purposes only; we could
replace the two templates above with the following:
<xsl:template match="chapter">
<center>
<h3>
<xsl:text>Chapter </xsl:text>
<xsl:value-of select="./@name"/>
</h3>
<xsl:apply-templates select="caption"/>
</center>
<xsl:apply-templates select="body/para"/>
</xsl:template>
Summary
Well,
by this point, we’ve done just about everything you’ll commonly do with XSLT.
Although none of these examples is terribly complicated, they should give
you a good idea of XSLT’s capabilities. Transforming documents into HTML
is just the beginning, though; we’ll expand this tutorial to include several
other transformations:
- XML into Scalable Vector Graphics (SVG): Convert our XML documents into pie charts and other graphical formats defined with the W3C’s SVG markup language.
- XML into PDF: Convert our XML documents into PDF files. We’ll use James Tauber’s FOP (Formatting Objects to PDF) tool for this.
If
there are other transformations you’d like to see, let us know. As always,
we’d love to hear your comments, questions, complaints, and suggestions about
this tutorial.
Resources
About the author
Doug Tidwell is a Senior Programmer and Cyber Evangelist at IBM. He has
more than a seventh of a century of programming experience and has been working
with XML-like applications for several years. His work as a Cyber Evangelist
is basically to look busy, and to help customers evaluate and implement XML
technology. Using a specially designed pair of zircon-encrusted tweezers,
he holds a Masters Degree in Computer Science from Vanderbilt University
and a Bachelors Degree in English from the University of Georgia. He can
be reached at dtidwell@us.ibm.com.