Contents: Warning Introduction Advantages Disadvantages Examples Tips Conclusion Footer

Site technology


Warning: NoScript

The NoScript plugin for Firefox now disables XSLT by default. You can fix this either by allowing the site, or else by setting noscript.forbidXSLT to false in about:config.

Introduction to XML+XSLT

Extensible Markup Language and Extensible Stylesheet Language Transformations are two technologies which, like HTML, are administered by the W3C.

Rather like the way that CSS stylesheets can be applied to HTML documents in order to provide style information, XSLT documents can be applied to XML documents. But in the case of XSLT, the effect isn't just to define the physical appearance of the document: it's a complete transformation of the XML. In the case of the pages on this site, the aim is to transform the XML into XHTML (which, as it happens, is then styled with external CSS documents).

In the same way that most of the HTML pages on this site use the same CSS stylesheet, the XML pages all refer to the same XSLT document and thus are all transformed according to the same rules. The main purpose of this is so that I can run the XSLT offline in order to generate HTML files which are put on my server. But since IE6+, Firefox, and all versions of Mozilla from (mumble mumble) will perform XSLT transforms in the browser, there's no particular reason not to put my raw XML online too. The link to get from an HTML page to its XML equivalent (or back again) is in the footer at the end of the page.

Advantages

The main advantage of using XSLT in the first place, from my point of view, is to avoid repetitions of the same HTML in lots of different places. Just as shared CSS can be used to make changes to the physical appearance of a whole site full of pages by modifying a single file, I can modify my XSLT in order to affect the content of a whole bunch of pages. The concept is similar to something like Dreamweaver templates, but XSLT is free, and it assumes that you are a programmer rather than a web designer. This can be quite a profitable assumption if you actually are a programmer.

The HTML added to my pages by XSLT is:

  • The "head" element, including links to the CSS stylesheets.
  • An "id" attribute added to the "body" element (to act as a "CSS signature" so that other people can easily apply their own CSS just to my site. Everyone should have one).
  • The page footer, which is a "div" element containing the "hr" at the bottom of the page and everything below it. The contents depend on the page, for example the relative url to "Eschatonic University Home" depends on the position of the file in my directory structure.
  • The "div" elements which, once styled with CSS, produce the boxes around each section of each page, with the funky cutout headers. Also the contents at the top (yep, that index bar is generated client-side, from the "id" attributes of the "block" elements, so it always matches the actual content. All the fun of JavaScript without the browser incompatibilities).
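
As a rough illustration of the last item, a contents bar could be built from the "id" attributes along these lines. This is a sketch, not the code from esch-site.xsl: the "contents" class, the use of the id as the link text, and the assumption that the "block" elements sit directly inside "maincontent" are all mine.

```xml
<!-- Hypothetical sketch: one link per "block" that carries an "id" attribute.
     Meant to sit inside the template that writes the "body". -->
<div class="contents">
  <xsl:text>Contents: </xsl:text>
  <xsl:for-each select="page/maincontent/block[@id]">
    <!-- In this sketch the id doubles as link target and link text -->
    <a href="#{@id}"><xsl:value-of select="@id" /></a>
    <!-- Put a space between links, but not after the last one -->
    <xsl:if test="position() != last()">
      <xsl:text> </xsl:text>
    </xsl:if>
  </xsl:for-each>
</div>
```

For the links to land anywhere, the output also needs an element carrying each of those ids.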

If you want to reduce your site bandwidth, the most useful function would probably be the page footer. My XML files are usually only 2K smaller than the corresponding HTML files, because that's the size of the "head" boilerplate and the footer, although my big list o' books contains a lot of repeated HTML, so is (at time of writing) 53K of XML compared with 167K of HTML. If you had a lot of navigation on each page, then you'd avoid having to send it along with every page, and that could easily save a few kilobytes per hit, because the browser doesn't have to download the XSLT file again for each page. So you get the benefits of faster download, without the disadvantages of frames. The same thing is often achieved with an external JavaScript file, which I think must remain the solution of choice for the time being, since XSLT isn't widely enough supported for serious use.

For the "markup should be used to indicate the meaning of the text" extremists, the ones who wept with delight when CSS arrived, this is also a useful technique. The semantics of HTML elements are at once too general for your particular site, and too idiosyncratic to cover all bases. XML lets you say in tags exactly what you mean, and then decide later what the best way is to express that in HTML. Displaying the XML directly in the browser means that your human users can look at the same document that their third-generation semantic-web buzzword-compatible web applications and crawlers are looking at, which is nice. You could use bare XML+CSS for this (in which case your XML is still going to end up using the same basic layout thingummies as HTML), but HTML gives you a fully-functional (over-general, idiosyncratic) default stylesheet for free, which is pretty useful if what you're writing is quite page-like to start with.

Disadvantages

I am not aware of an HTML validator which will apply XSLT before validating. So the "Validate HTML" link which appears at the bottom of each HTML page is not available in my XML versions. And even if there were such a validator, it wouldn't necessarily be using the same XSLT library as your browser, so there would still be no guarantee that your browser is seeing valid XHTML. I'm not sure whether there's a really good solution to this.

The "file signatures" are also missing from the XML pages. It wouldn't be very difficult to add them in, but for my own convenience I'm not bothering at the moment.

The major disadvantage, though, is the lack of browser support. Opera doesn't seem to have any current plans to support XSLT, Konqueror doesn't support it, and (hence or otherwise) neither does Safari. So you can't replace your HTML pages with XML+XSLT unless you know which browsers your users have.

Examples

Do "view source" on the XML version of this page. You should see something like this:

<?xml version="1.0" encoding="ISO-8859-1"?>
<?xml-stylesheet type="text/xsl" href="esch-site.xsl"?>
<page>
  <root>./</root>
  <thispage>tech.xml</thispage>
  <issigned />
  <title>Site technology at Eschatonic University</title>
  <headers>
    <style type="text/css" >
      pre b { color: darkred; }
    </style>
  </headers>
  <maincontent>
    ... stuff that looks a bit more like HTML ...
  </maincontent>
</page>

It's obviously not HTML, it's XML using elements which I happen to find convenient. So what does it mean? It means whatever the XSLT document "esch-site.xsl" thinks it means. Here are some examples of code from that document (this isn't intended to be an XSLT tutorial, but I'll try to explain things as I go along. It's possible that some of these could become out of date if I change the .xsl and don't update this file):

<!-- The main bit -->
  <xsl:template match="/">
    <html>
      <head>

That little fragment, containing an "xsl:template" element with attribute "match" set to /, defines the output. In this case, it says that the output document should contain an "html" element with a "head" element as the first thing in it. This is pretty essential considering that we want the output to be a valid XHTML document. If you look further down the .xsl file, you'll see that the "head" is closed quite soon, and the "html" is closed as the last thing in the template, so that it's the last thing closed in the output.

"But hang on", you're probably saying, "the HTML version of this file has a 'DOCTYPE' as the first thing in it, not a 'head' element". Well spotted. This is because of the "xsl:output" element in the transform, which defines what kind of output we want (XML, HTML, plain text), and a bunch of other stuff including the doctype. Since you're so very observant, you can probably find that for yourself. The point is just that the XML document doesn't need to contain that stuff, because the XSLT stylesheet knows. On with the "head":

      <head>
        <meta name="generator" content="{$program} {$version} with esch-site.xsl" />
        <link rel="stylesheet" type="text/css" title="plain" href="{$style-plain}" />
        <link rel="alternate stylesheet" type="text/css" title="sidebar" href="{$style-plain}" />
        <link rel="alternate stylesheet" type="text/css" title="sidebar" href="{$style-sidebar}" />

        <title><xsl:value-of select="page/title" /></title>
        <xsl:apply-templates select="page/headers/node()" />
      </head>

The first thing in the template that isn't just HTML is "{$program}", and that's where we start using XSLT seriously. "program" is a variable (strictly speaking, it's a parameter) which has a value set up earlier in the file (can you find where, boys and girls?). If you look at the corresponding HTML in the "head", you'll see something like:

<head>
  <meta name="generator" content=
  "HTML Tidy for Cygwin (vers 1st September 2004), see www.w3.org" />
  <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
  <meta name="generator" content=
  "xsltproc/libxslt 10114-CVS1011 with esch-site.xsl" />
  <link rel="stylesheet" type="text/css" title="plain" href=
  "./style/plain.css" />
  <link rel="alternate stylesheet" type="text/css" title="sidebar" href=
  "./style/plain.css" />
  <link rel="alternate stylesheet" type="text/css" title="sidebar" href=
  "./style/sidebar.css" />

  <title>Site technology at Eschatonic University</title>
  <style type="text/css">
  /*<![CDATA[*/
    pre b { color: darkred; }
  /*]]>*/
  </style>
</head>

This tells you firstly that I ran HTML Tidy on it after XSLT (that's the "generator" not mentioned in the XSLT), secondly that "program" had the value "xsltproc/libxslt" and "version" the value "10114-CVS1011", thirdly that "style-plain" and "style-sidebar" had values that were the urls of the relevant stylesheets, and finally that <xsl:value-of select="page/title" /> turned itself into "Site technology at Eschatonic University", which just so happens to be the contents of the "title" element of the "page" element in the XML. So that's one way XSLT uses the XML input. If this was an XSLT tutorial, I'd have to start talking about "XPath" now, but you can look it up yourself. It's basically a way of writing expressions which evaluate to something which depends on a DOM tree.
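
To make that slightly more concrete without turning this into a tutorial, here are a few XPath expressions as they might be evaluated against the XML shown above, from inside the template that matches "/". Only the first is taken from the fragments of esch-site.xsl quoted here; the rest are illustrations, and the last one assumes the "block" elements sit directly inside "maincontent".

```xml
<!-- String value of the "title" child of "page" -->
<xsl:value-of select="page/title" />

<!-- String value of "thispage", which is "tech.xml" for this page -->
<xsl:value-of select="page/thispage" />

<!-- A node-set used as a test is true when it is non-empty,
     so this fires when the empty "issigned" element is present -->
<xsl:if test="page/issigned">...</xsl:if>

<!-- Number of "block" children of "maincontent" (assumed layout) -->
<xsl:value-of select="count(page/maincontent/block)" />
```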

There is one other main way that this particular XSLT is using the input. In the "body" you'll see:

        <div class="main">
          <xsl:apply-templates select="page/maincontent/node()" />
        </div>

This means "find every node (element, text or otherwise) which is a child of 'maincontent' in 'page', run its template, and stick the results into this 'div' in the output". There was something very similar in the "head", searching "page/headers", which found my "style" element. Couldn't be simpler, especially since there happens to be a template in the XSLT which applies to any element that doesn't have its own template and just copies the element across to the output. So that's why all the HTML inside the XML input ends up in the HTML output where it belongs.
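
The usual way to write such a catch-all is the identity template. The version below is the standard idiom rather than a quote from esch-site.xsl, so treat it as a sketch of what mine might look like; note that it matches attributes, text and comments as well as elements.

```xml
<!-- Standard identity template: copy the current node, then recurse
     into its attributes and children. A more specific template, such as
     one with match="block", takes priority over it. -->
<xsl:template match="@*|node()">
  <xsl:copy>
    <xsl:apply-templates select="@*|node()" />
  </xsl:copy>
</xsl:template>
```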

Finally, those exciting section blocks are generated by this XSLT snippet:

  <xsl:template match="block">
    <div class="blocksection">
      <xsl:for-each select="*[1]">
        <xsl:copy>
          <xsl:attribute name="class">blockheader <xsl:value-of select="@class"/></xsl:attribute>
          <xsl:apply-templates select="@*[not(name()='class')]|node()"/>
        </xsl:copy>
      </xsl:for-each>
      <div class="blockcontent">
        <xsl:apply-templates select="*[position()&gt;1]"/>
      </div>
    </div>
  </xsl:template>

Here "*[1]" means "the first element inside the 'block'", and if you look again at the XML source you'll see that this is always an "h2" on this particular page. This "h2" gets copied into the output, then there's a bit of complexity to add "blockheader" to its "class" attribute, so that the CSS can do the rest, but the XML can specify other classes if it wants to. The content of the first element is also copied across by applying templates, excluding the "class" attribute because we've already done it.

"*[position()&gt;1]" means "every child element except the first element", and those things just get processed the same way as other HTML (or "block", or whatever they happen to be) and shoved into another "div", which again the CSS is responsible for dealing with.

Nowhere in there does the XSLT say that the "block" itself should be copied to the output, which is quite important because it isn't valid HTML. This allows me to define my own tags, meaning whatever I want them to mean, and just use them in my documents. How many HTML developers have never wanted to do that? All I need is to write a bit of XSLT which expresses what they should be transformed into in the target language (XHTML) so that the browser can display them.
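
As a worked example, take a made-up "block" (the "fancy" class and the text are invented for illustration) and assume that every other element is simply copied across unchanged. The "block" template above should then turn the first fragment into the second:

```xml
<!-- Input: hypothetical fragment of a page's "maincontent" -->
<block>
  <h2 class="fancy">Introduction</h2>
  <p>Some text.</p>
</block>

<!-- Output of the "block" template (whitespace may differ) -->
<div class="blocksection">
  <h2 class="blockheader fancy">Introduction</h2>
  <div class="blockcontent">
    <p>Some text.</p>
  </div>
</div>
```

If the "h2" had no "class" of its own, the attribute would come out as "blockheader " with a trailing space, which makes no difference to the CSS.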

The meaning of the rest of the XSLT can probably be divined by clever guessing or use of an XSLT reference.
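
Two of the pieces left as an exercise above, the "xsl:output" element and the place where the parameters get their values, are top-level declarations. A sketch of what such declarations can look like follows; every value in it is an assumption rather than a quote from esch-site.xsl, including the choice of doctype and the way the stylesheet urls are built from the "root" element.

```xml
<!-- Hypothetical top-level declarations; every value below is an assumption -->

<!-- Output type and doctype: this is where a DOCTYPE line comes from -->
<xsl:output method="xml"
            encoding="ISO-8859-1"
            doctype-public="-//W3C//DTD XHTML 1.0 Strict//EN"
            doctype-system="http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd" />

<!-- Parameters with default values; whoever runs the transform
     can override them from outside -->
<xsl:param name="program" select="system-property('xsl:vendor')" />
<xsl:param name="version" select="system-property('xsl:version')" />

<!-- Stylesheet urls relative to the page's "root" element ("./" here) -->
<xsl:param name="style-plain" select="concat(page/root, 'style/plain.css')" />
<xsl:param name="style-sidebar" select="concat(page/root, 'style/sidebar.css')" />
```

Declaring these as "xsl:param" rather than "xsl:variable" is what lets an offline processor pass in its own values while the browser falls back to the defaults.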

Development tip

While developing, you can preview your XML+XSLT directly in your browser. This is as convenient as developing HTML with a text editor and a browser window, as God and nature intended, and it may even give you meaningful error messages when you mess up. But it won't show you the actual HTML, which can be a right pain when your mistake is at all subtle, and as mentioned above it doesn't let you validate the HTML (unless your browser does that, in which case good for it). So even if there isn't going to be an HTML-only version of your pages, you'll probably need a separate XSLT processor. I use xsltproc on Windows.

Conclusion

If XSLT in the browser was a good enough idea to keep me busy for most of a weekend, it might just be good enough for you too. The technique is not yet ready for the big time, and by the time it is ready it might not be XHTML you want to transform into: you might want to go straight for XSL-FO, plus some SVG and whatever else comes along. But you'll probably be needing XSLT for that too, so there's no harm getting the hang of it now.