Printing from XML: An Introduction to XSL-FO
Dave Pawson is the author of XSL-FO: Making XML Look Good in Print
One of the issues many users face when introduced to the production of print from XML is that of page layout. Without having the page layout right, it's unlikely that much progress will be made. By way of introducing the W3C XSL Formatting Objects recommendation, I want to present a simplified approach that will enable a new user to gain a foothold with page layout.
The aim of this article is to produce that first page of output -- call it the "Hello World" program -- with enough information to allow a user to move on to more useful things. I'll introduce the most straightforward of page layouts for XSL-FO, using as few of the elements needed as I can to obtain reasonable output.
One of the problems is that, unlike the production of an HTML document from an XML source using XSLT, the processing of the children of the root element is not a simple xsl:apply-templates placed inside an output root element. Much more initial output is required in order to enable the formatter to generate the pages.
Let's look at the processing necessary to get from your XML document to a PDF printable document. First, the XML must be fed to an XSLT processor with an appropriate stylesheet (developed below) in order to produce another XML document which uses the XSL-FO namespace and is intended for an XSL-FO formatter. The second stage is to feed the output of the first stage to the XSL-FO formatter, which can then produce the end product: a printable document, styled for visual presentation.
    XML       ->  XSLT    ->  XSL-FO    ->  XSL-FO     ->  printable
    document      engine      document      formatter      document
                    ^
                    |
                  XSLT
                stylesheet
This approach has the advantage that the XML source document is still format neutral and may be used with other XSLT stylesheets to produce other media.
We need to be aware of the initial target of the XSLT transformation, the XSL-FO document. The document you are producing, which is fed to the XSL-FO formatter, contains a small number of elements:
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <fo:layout-master-set>
    <fo:simple-page-master master-name="simple">
      <fo:region-body/>
    </fo:simple-page-master>
  </fo:layout-master-set>
  <fo:page-sequence master-reference="simple">
    <fo:flow flow-name="xsl-region-body">
      content
    </fo:flow>
  </fo:page-sequence>
</fo:root>
Let's look at each of the identified elements in turn.
In order to lay out content on a page, the formatter needs to know what sizes it has to deal with. The layout-master-set contains the simple-page-master, which holds this information, e.g. whether you use a European A4 page size or an American US-letter size. The simple-page-master also contains the region-body element, which may be seen as the main body of the page layout.
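As a rough illustration of the page-size choice, the fragment below sketches two page masters, one for A4 and one for US letter. The master names and margin values are assumptions picked for this example; page-width, page-height and the margin properties are standard XSL-FO properties.

```xml
<fo:layout-master-set xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <!-- A4: 21cm x 29.7cm (the master-name is an arbitrary label) -->
  <fo:simple-page-master master-name="a4"
                         page-width="21cm" page-height="29.7cm"
                         margin-left="2.5cm" margin-right="2.5cm">
    <fo:region-body/>
  </fo:simple-page-master>
  <!-- US letter: 8.5in x 11in (margin values are illustrative) -->
  <fo:simple-page-master master-name="us-letter"
                         page-width="8.5in" page-height="11in"
                         margin-left="1in" margin-right="1in">
    <fo:region-body/>
  </fo:simple-page-master>
</fo:layout-master-set>
```

A page-sequence then selects one of these by name through its master-reference attribute.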
In order to support complex pagination, the page-sequence element is used. For a simple page layout, very little content is required here, other than to refer back to a particular page definition (the master-reference attribute).
Also within the page-sequence element is a flow element. The idea of a flow may or may not be familiar to you. I came across it using desktop publishing packages, where I poured text into page areas to build up columns for a college magazine; the content flowed into page areas.
Identifying which region of the page to pour the text into is the rationale for the flow-name attribute, here set to xsl-region-body. This differentiates the body of the page from the outer areas (margins, header, footer etc.) of the page. Finally, there is some content, which is a child of the main flow. Simple text cannot be inserted here, since the formatter would have to guess what you wanted to do with it, so the real content for the flow would take the form of <fo:block>content</fo:block>, which defines a block of text (rectangular in shape, big as you like, taking a full list of defaults for everything) which will be placed as the first item on the page.
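Putting those pieces together, a "Hello World" XSL-FO document might look like the sketch below. The explicit A4 page size and the margin values are assumptions added for this example, since the page size a formatter falls back to when none is given can differ between formatters.

```xml
<?xml version="1.0" encoding="utf-8"?>
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <fo:layout-master-set>
    <!-- Page geometry: A4 with side margins (assumed values) -->
    <fo:simple-page-master master-name="simple"
                           page-height="29.7cm" page-width="21cm"
                           margin-left="2.5cm" margin-right="2.5cm">
      <fo:region-body margin-top="3cm"/>
    </fo:simple-page-master>
  </fo:layout-master-set>
  <!-- master-reference must match the master-name above -->
  <fo:page-sequence master-reference="simple">
    <!-- Pour the content into the body region of the page -->
    <fo:flow flow-name="xsl-region-body">
      <fo:block>Hello World</fo:block>
    </fo:flow>
  </fo:page-sequence>
</fo:root>
```

Saved to a file, a document of this shape can be handed directly to an XSL-FO formatter, skipping the XSLT stage while you experiment with the layout.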
In order to get a better grasp of all this, let's fill out, minimally, how it might fit into a stylesheet whose task is to take a simple XML document and produce another XML document, which is then fed to an XSL-FO formatter.
A basic XSLT stylesheet to produce XSL-FO is shown below.
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:fo="http://www.w3.org/1999/XSL/Format"
                version="1.0">
  <xsl:output method="xml"/>
  <xsl:template match="/">
    <!-- .... -->
  </xsl:template>
  <!-- Other templates go here. -->
</xsl:stylesheet>
The two xmlns declarations on the xsl:stylesheet element give the namespaces, respectively, of the XSLT and FO content in this document, which differentiates transformation instructions from output content.
If the XSLT engine sees content in the FO namespace, it simply writes it to the output, which is exactly what we want. The xsl:output element says that we want the output document to be XML, which is just what an XSL-FO document is. The template matching "/" is the root template, which fires first; hence this is the point at which we add the essential outline content mentioned above.
Finally, at the point marked "Other templates go here", we can start to add useful processing. We can now combine the two snippets above. What we have below is a complete XSLT stylesheet, which is used by the XSLT engine to produce a valid XSL-FO document.
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:fo="http://www.w3.org/1999/XSL/Format"
                version="1.0">
  <xsl:output method="xml"/>

  <!-- Root template: emits the page layout, then processes the input -->
  <xsl:template match="/">
    <fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
      <fo:layout-master-set>
        <fo:simple-page-master master-name="simple"
                               page-height="29.7cm" page-width="21cm"
                               margin-left="2.5cm" margin-right="2.5cm">
          <fo:region-body margin-top="3cm"/>
        </fo:simple-page-master>
      </fo:layout-master-set>
      <fo:page-sequence master-reference="simple">
        <fo:flow flow-name="xsl-region-body">
          <xsl:apply-templates/>
        </fo:flow>
      </fo:page-sequence>
    </fo:root>
  </xsl:template>

  <xsl:template match="document">
    <fo:block>
      <xsl:apply-templates/>
    </fo:block>
  </xsl:template>

  <xsl:template match="head">
    <fo:block>
      <xsl:apply-templates/>
    </fo:block>
  </xsl:template>

  <xsl:template match="para">
    <fo:block>
      <xsl:apply-templates/>
    </fo:block>
  </xsl:template>

  <xsl:template match="em">
    <fo:inline font-style="italic">
      <xsl:apply-templates/>
    </fo:inline>
  </xsl:template>

  <!-- Catch-all: highlights any element without its own template -->
  <xsl:template match="*">
    <fo:block background-color="red">
      <xsl:apply-templates/>
    </fo:block>
  </xsl:template>
</xsl:stylesheet>
Before explaining the structure, I should mention the source document for which we are designing this stylesheet. I'm assuming a feed from a document class which has four elements, with the structure shown below. I've kept it simple because it represents the vast majority of XML content meant for an XSL-FO document. It contains only two block items (head and para) and a single inline item (em). Our document is contained in an outer document element and holds a mix of head and para elements which contain some emphasis:
<document>
  <head>My very first xsl-fo document</head>
  <para>has an <em>important</em> paragraph inside it</para>
</document>
A page size is specified on the simple-page-master, using European (A4) sizes. Change these to your local paper size if it's different. I've added margins, since content which extends to the edges of the page is unsightly.
On the region-body I've added a top margin to the main region of the page. The page-sequence and flow are as before. Inside the flow we have a crucial difference: at this point, where previously I simply said "content", I now use xsl:apply-templates to instruct the XSLT engine to process the input document. In the template matching document, the XSLT engine processes the document element of the input XML file by outputting an fo:block element, inside which all remaining content is placed. Since blocks can be nested quite happily in XSL-FO, this isn't a problem. What it does do is ensure that any content which leaks -- that is, isn't handled explicitly by the stylesheet -- is still in a block.
With the templates for head, para, and em I'm back in the normal world of XML and XSLT: matching a source document element and outputting an appropriate element from the XSL-FO vocabulary. The first two are identical and just need decorating; the last is slightly different in that it is an inline formatting object and produces italic output.
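To make the mapping concrete, here is roughly what the XSLT stage should hand to the formatter for the sample document. This is a hand-derived sketch rather than captured processor output: whitespace and namespace placement vary between XSLT processors, and the comments are annotations added here, not something the stylesheet emits.

```xml
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <fo:layout-master-set>
    <fo:simple-page-master master-name="simple"
                           page-height="29.7cm" page-width="21cm"
                           margin-left="2.5cm" margin-right="2.5cm">
      <fo:region-body margin-top="3cm"/>
    </fo:simple-page-master>
  </fo:layout-master-set>
  <fo:page-sequence master-reference="simple">
    <fo:flow flow-name="xsl-region-body">
      <!-- outer block: produced for the document element -->
      <fo:block>
        <!-- produced for head -->
        <fo:block>My very first xsl-fo document</fo:block>
        <!-- produced for para; em has become an italic fo:inline -->
        <fo:block>has an <fo:inline font-style="italic">important</fo:inline> paragraph inside it</fo:block>
      </fo:block>
    </fo:flow>
  </fo:page-sequence>
</fo:root>
```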
The final template, which matches *, is a catch-all to show (in the output) which elements, if any, are not styled. Once styling is applied to all elements, nothing will be processed by this template. It's good as a debugging option during development.
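One possible refinement of the catch-all, sketched below, also prints the name of the offending element so it is easier to find in the source. The bracketed label text is an invention for this example; name() is a standard XPath function. The namespace declarations appear on the template only to keep the fragment self-contained; inside the stylesheet they are inherited from xsl:stylesheet.

```xml
<!-- Debugging catch-all: flags any unstyled element and names it -->
<xsl:template match="*"
              xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
              xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <fo:block background-color="red">
    <!-- Label text is illustrative; name() gives the element's name -->
    <xsl:text>[unstyled: </xsl:text>
    <xsl:value-of select="name()"/>
    <xsl:text>] </xsl:text>
    <xsl:apply-templates/>
  </fo:block>
</xsl:template>
```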
This stylesheet introduces two new elements. The first is the fo:block element, used for many elements in the stylesheet. This is the basic layout element which is used to wrap content; think of it as a p element in HTML. The second, the fo:inline element, is a container for inline elements in XSL-FO. Each of these two elements has a whole range of properties, expressed syntactically as attributes, which are used to decorate the content that they wrap.
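As a small sketch of what such decoration can look like, the para template below sets a handful of properties on its block. The property names are standard XSL-FO; the particular values are assumptions chosen for illustration, not recommendations.

```xml
<!-- para rendered as justified 11pt serif text with a gap below it -->
<xsl:template match="para"
              xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
              xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <fo:block font-family="serif" font-size="11pt"
            text-align="justify" space-after="6pt">
    <xsl:apply-templates/>
  </fo:block>
</xsl:template>
```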
Let's extend the source document structure to include a section which should have a new page start point. So now the document might look like this:
<document>
  <section>
    <head>My very first xsl-fo document</head>
    <para>has an <em>important</em> paragraph inside it</para>
  </section>
  <section>
    <head>The second section, starting on a new page </head>
    <para>Some content in the second section</para>
  </section>
</document>
Now I need to style this addition, using one of the available properties of a block.
<xsl:template match="section">
  <fo:block break-before="page">
    <xsl:apply-templates/>
  </fo:block>
</xsl:template>
This tells the XSL-FO formatting engine to start a new page when it hits a section. All the content of that section is processed within that block. To make the head element stand out, I'll also improve its appearance by choosing a larger, bold font and by adding a little space after the content.
<xsl:template match="head">
  <fo:block font-size="14pt" font-weight="bold"
            space-after="1cm" space-after.conditionality="retain">
    <xsl:apply-templates/>
  </fo:block>
</xsl:template>
That's it. To review: processing is, at its simplest, a two-stage process. Give your source document and the above XSLT stylesheet to an XSLT processor, and the output should be a valid XSL-FO document. This can then be fed to an XSL-FO engine -- RenderX or Antenna House (both commercial, with trial options), or PassiveTeX or FOP (non-commercial offerings).
You can download the files developed in this article here: xsl-fo-assets.zip.
XML.com Copyright © 2000 O'Reilly & Associates, Inc.