<?xml version="1.0" encoding="ISO-8859-1"?>
<?xml-stylesheet type="text/xsl" href="esch-site.xsl"?>

<!-- $Header: /cygdrive/c/Zdata/CVSrepos/www.eschatonic.org/tech.xml,v 1.4 2008/07/28 15:48:05 Steve Exp $ -->

<page>
	<root>./</root>
	<thispage>tech.xml</thispage>
	<issigned />
	<public />
	<title>Site technology at Eschatonic University</title>
	<headers>
		<style type="text/css">
			pre b { color: darkred; }
		</style>
	</headers>
	<maincontent>
		<index-spacer />
		<h1>Site technology</h1>
		<hr class="separator" />

		<block id="Introduction">
			<h2>Introduction to XML+XSLT</h2>

			<p>
				E<b>x</b>tensible <b>M</b>arkup <b>L</b>anguage and
				E<b>x</b>tensible <b>S</b>tylesheet <b>L</b>anguage <b>T</b>ransformations
				are two technologies which, like HTML, are administered by the W3C.
			</p>
			<p>
				Rather like the way that CSS stylesheets can be applied to HTML
				documents in order to provide style information, XSLT documents can
				be applied to XML documents. But in the case of XSLT, the effect isn't
				just to define the physical appearance of the document: it's a complete
				transformation of the XML. In the case of the pages on this site, the
				aim is to transform the XML into XHTML (which, as it happens, is then
				styled with external CSS documents).
			</p>
			<p>
				In the same way that most of the HTML pages on this site use the same
				CSS stylesheet, the XML pages all refer to the same XSLT document and
				thus are all transformed according to the same rules. The main purpose
				of this is so that I can run the XSLT offline in order to generate HTML
				files which are put on my server. But since IE6+, Firefox, and all versions
				of Mozilla from (mumble mumble) will perform XSLT transforms in the browser,
				there's no particular reason not to put my raw XML online too. The link
				to get from an HTML page to its XML equivalent (or back again) is in the
				<a href="#footer">footer</a> at the end of the page.
			</p>
		</block>

		<block id="Advantages">
			<h2>Advantages</h2>

			<p>
				The main advantage of using XSLT in the first place, from my point of
				view, is to avoid repetitions of the same HTML in lots of different places.
				Just as shared CSS can be used to make changes to the physical appearance
				of a whole site full of pages by modifying a single file, I can modify my
				XSLT in order to affect the content of a whole bunch of pages. The concept
				is similar to something like Dreamweaver templates, but XSLT is free, and it
				assumes that you are a programmer rather than a web designer. This can be
				quite a profitable assumption if you actually are a programmer.
			</p>

			<p>
				The HTML added to my pages by XSLT is:
			</p>
			<ul>
				<li>
					The "head" element, including links to the CSS stylesheets.
				</li>
				<li>
					An "id" attribute added to the "body" element (to act as a "CSS signature"
					so that other people can easily apply their own CSS just to my site.
					Everyone should have one).
				</li>
				<li>
					The page footer, which is a "div" element containing the "hr" at
					the bottom of the page and everything below it. The contents depend
					on the page, for example the relative url to "Eschatonic University Home"
					depends on the position of the file in my directory structure.
				</li>
				<li>
					The "div" elements which, once styled with CSS, produce the boxes around
					each section of each page, with the funky cutout headers. Also the contents
					at the top (yep, that index bar is generated client-side, from
					the "id" attributes of the "block" elements, so it
					always matches the actual content. All the fun
					of Javascript without the browser incompatibilities).
				</li>
			</ul>

			<p>
				If you want to reduce your site bandwidth, the most useful feature would
				probably be the generated page footer.
				My XML files are usually only 2K smaller than the corresponding HTML files, because
				that's the size of the "head" boilerplate and the footer, although my
				<a href="lib/books.xml">big list o' books</a> contains a lot of repeated
				HTML, so is (at time of writing) 53K of XML compared with 167K of HTML.
				If you had a lot
				of navigation on each page, then you'd avoid having to send it along with
				every page, and that could easily save a few kilobytes per hit, because the
				browser can cache the XSLT file instead of downloading it every time. So you get the
				benefits of faster download, without the disadvantages of frames. The same
				thing is often achieved with an external Javascript file, which I think must remain
				the solution of choice for the time being, since XSLT isn't widely enough
				supported for serious use.
			</p>
			<p>
				For the "markup should be used to indicate the meaning of the text"
				extremists, the
				ones who wept with delight when CSS arrived, this is also a useful
				technique. The semantics of HTML elements are at once too general for
				your particular site, and too idiosyncratic to cover all bases. XML lets
				you say in tags exactly what you mean, and then decide later what the
				best way is to express that in HTML. Displaying the XML directly in the
				browser means that your human users can look at the same document that their
				third-generation semantic-web buzzword-compatible web applications and
				crawlers are looking at, which is nice. You could use bare XML+CSS for this
				(in which case your XML is still going to end up using the same basic layout
				thingummies as HTML),
				but HTML gives you a fully-functional (over-general, idiosyncratic)
				default stylesheet for free, which is
				pretty useful if what you're writing is quite page-like to start with.
			</p>

		</block>

		<block id="Disadvantages">
			<h2>Disadvantages</h2>

			<p>
				I am not aware of an HTML validator which will apply XSLT before
				validating. So the "Validate HTML" link which appears at the bottom
				of each HTML page is not available in my XML versions. And even if
				there were such a validator, it wouldn't necessarily be using the same XSLT
				library as your browser, so there would still be no guarantee that
				your browser is seeing valid XHTML. I'm not sure whether there's a
				really good solution to this.
			</p>

			<p>
				The "file signatures" are also missing from the XML pages. It wouldn't
				be very difficult to add them in, but for my own convenience I'm not
				bothering at the moment.
			</p>

			<p>
				The major disadvantage, though, is the lack of browser support. Opera
				doesn't seem to have any current plans to support XSLT, Konqueror doesn't
				support it, and (hence or otherwise) neither does Safari. So you can't
				replace your HTML pages with XML ones unless you know which browsers your users have.
			</p>
		</block>

		<block id="Examples">
			<h2>Examples</h2>

			<p>
				Do "view source" on <a href="tech.xml">the XML version</a> of this page.
				You should see something like this:
			</p>

			<pre>&lt;?xml version="1.0" encoding="ISO-8859-1"?&gt;
&lt;?xml-stylesheet type="text/xsl" href="<b>esch-site.xsl</b>"?&gt;
&lt;page&gt;
  &lt;root&gt;./&lt;/root&gt;
  &lt;thispage&gt;tech.xml&lt;/thispage&gt;
  &lt;issigned /&gt;
  &lt;title&gt;Site technology at Eschatonic University&lt;/title&gt;
  &lt;headers&gt;
    &lt;style type="text/css" &gt;
      pre b { color: darkred; }
    &lt;/style&gt;
  &lt;/headers&gt;
  &lt;maincontent&gt;
    ... stuff that looks a bit more like HTML ...
  &lt;/maincontent&gt;
&lt;/page&gt;</pre>

			<p>
				It's obviously not HTML, it's XML using elements which I happen to
				find convenient. So what does it mean? It means whatever the XSLT
				document "<code><a href="esch-site.xsl">esch-site.xsl</a></code>"
				thinks it means. Here are some examples of code from that document
				(this isn't intended to be an XSLT tutorial, but I'll try to explain
				things as I go along. It's possible that some of these could
				become out of date if I change the .xsl and don't update this file):
			</p>

			<pre>&lt;!-- The main bit --&gt;
  &lt;xsl:template match="/"&gt;
    &lt;html&gt;
      &lt;head&gt;</pre>

			<p>
				That little fragment, containing an "xsl:template" element with attribute
				"match" set to "/", defines the output. In this case, it says that the
				output document should contain
				an "html" element with a "head" element as the first thing in it. This is
				pretty essential
				considering that we want the output to be a valid XHTML document. If you
				look further down the .xsl file, you'll see that the "head" is closed quite
				soon, and the "html" is closed as the last thing in the template, so that
				it's the last thing closed in the output.
			</p>

			<p>
				"But hang on", you're probably saying, "the <a href="tech.html">HTML version</a>
				of this file has a 'DOCTYPE' as the first thing in it, not an 'html' element".
				Well spotted. This is because of the "xsl:output" element in the transform,
				which defines what kind of output we want (XML, HTML, plain text), and a bunch
				of other stuff including the doctype. Since you're so very observant, you can
				probably find that for yourself. The point is just that the XML document
				doesn't need to contain that stuff, because the XSLT stylesheet knows.
				On with the "head":
			</p>
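
			<p>
				(If you'd rather not go hunting, an "xsl:output" along the following lines
				would do the job. This is a sketch rather than a copy of what's in
				esch-site.xsl: the attribute names are standard XSLT 1.0, but the XHTML 1.0
				Transitional doctype and the "indent" setting are assumptions made purely
				for illustration.)
			</p>

			<pre>&lt;!-- Sketch only: the doctype and indent setting are assumptions --&gt;
&lt;xsl:output method="xml" encoding="ISO-8859-1" indent="yes"
    doctype-public="-//W3C//DTD XHTML 1.0 Transitional//EN"
    doctype-system="http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd" /&gt;</pre>

			<p>
				Anyway, here is that "head":
			</p>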

			<pre>      &lt;head&gt;
        &lt;meta name="generator" content="<b>{$program} {$version}</b> with esch-site.xsl" /&gt;
        &lt;link rel="stylesheet" type="text/css" title="plain" href="{$style-plain}" /&gt;
        &lt;link rel="alternate stylesheet" type="text/css" title="sidebar" href="{$style-plain}" /&gt;
        &lt;link rel="alternate stylesheet" type="text/css" title="sidebar" href="{$style-sidebar}" /&gt;

        &lt;title&gt;&lt;xsl:value-of select="page/title" /&gt;&lt;/title&gt;
        &lt;xsl:apply-templates select="page/headers/node()" /&gt;
      &lt;/head&gt;</pre>

			<p>
				The first thing in the template that isn't just HTML is "<code>{$program}</code>",
				and that's where we start using XSLT seriously. "program" is a <cite>variable</cite>
				(strictly speaking, it's a <cite>parameter</cite>) which has a value set up earlier
				in the file (can you find where, boys and girls?). If you look at the corresponding
				HTML in the "head", you'll see something like:
			</p>

			<pre>&lt;head&gt;
  <b>&lt;meta name="generator" content=
  "HTML Tidy for Cygwin (vers 1st September 2004), see www.w3.org" /&gt;</b>
  &lt;meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" /&gt;
  <b>&lt;meta name="generator" content=
  "xsltproc/libxslt 10114-CVS1011 with esch-site.xsl" /&gt;</b>
  &lt;link rel="stylesheet" type="text/css" title="plain" href=
  "<b>./style/plain.css</b>" /&gt;
  &lt;link rel="alternate stylesheet" type="text/css" title="sidebar" href=
  "<b>./style/plain.css</b>" /&gt;
  &lt;link rel="alternate stylesheet" type="text/css" title="sidebar" href=
  "<b>./style/sidebar.css</b>" /&gt;

  &lt;title&gt;<b>Site technology at Eschatonic University</b>&lt;/title&gt;
  &lt;style type="text/css"&gt;
  /*&lt;![CDATA[*/
    pre b { color: darkred; }
  /*]]&gt;*/
  &lt;/style&gt;
&lt;/head&gt;</pre>

			<p>
				This tells you firstly that I ran HTML Tidy on it after XSLT (that's
				the "generator" not mentioned in the XSLT), secondly that
				"program" had the value "xsltproc/libxslt" and "version" the value
				"10114-CVS1011", thirdly that "style-plain" and "style-sidebar" had
				values that were the urls of the relevant stylesheets, and finally
				that <code>&lt;xsl:value-of select="page/title" /&gt;</code>
				turned itself into "Site technology at Eschatonic University", which
				just so happens to be the contents of the "title" element of the "page"
				element in the XML. So that's one way XSLT uses the XML input.
				If this were an XSLT tutorial, I'd have to start talking about "XPath" now,
				but you can look it up yourself. It's basically a way of writing expressions
				which evaluate to something which depends on a DOM tree.
			</p>
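
			<p>
				For the curious, here's roughly how values like "program" and "style-plain"
				could be set up near the top of a stylesheet. It's a sketch under assumptions,
				not the contents of esch-site.xsl: the default values are placeholders, and
				building the stylesheet urls from the "root" element is only a guess that
				happens to fit the "./" in this page's XML. The curly brackets are an
				<cite>attribute value template</cite>, which is shorthand for the longer
				"xsl:attribute" form shown at the end.
			</p>

			<pre>&lt;!-- Parameters: placeholder defaults, which the caller may override --&gt;
&lt;xsl:param name="program" select="'unknown'" /&gt;
&lt;xsl:param name="version" select="'unknown'" /&gt;

&lt;!-- Variables computed from the input document (an assumption) --&gt;
&lt;xsl:variable name="style-plain" select="concat(/page/root, 'style/plain.css')" /&gt;
&lt;xsl:variable name="style-sidebar" select="concat(/page/root, 'style/sidebar.css')" /&gt;

&lt;!-- Inside a template, the next two forms produce the same "meta" element --&gt;
&lt;meta name="generator" content="{$program} {$version} with esch-site.xsl" /&gt;

&lt;meta name="generator"&gt;
  &lt;xsl:attribute name="content"&gt;
    &lt;xsl:value-of select="concat($program, ' ', $version, ' with esch-site.xsl')" /&gt;
  &lt;/xsl:attribute&gt;
&lt;/meta&gt;</pre>

			<p>
				With xsltproc, parameters declared this way can be overridden from the
				command line using its "--stringparam" option, which is one way the
				"generator" values could end up naming the processor.
			</p>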

			<p>
				There is one other main way that this particular XSLT is using
				the input. In the "body" you'll see:
			</p>

			<pre>        &lt;div class="main"&gt;
          &lt;xsl:apply-templates select="page/maincontent/node()" /&gt;
        &lt;/div&gt;</pre>

			<p>
				This means "find every node (element, text or comment) which is a child of 'maincontent' in 'page',
				run its template, and stick the results into this 'div' in the output". There
				was something very similar in the "head", searching "page/headers", and
				which found my "style" element.
				Couldn't be simpler, especially since there happens to be a template in the
				XSLT which applies to anything that doesn't have its own template and just
				copies it across to the output. So that's why all the HTML inside
				the XML input ends up in the HTML output where it belongs.
			</p>
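
			<p>
				A copy-everything template of that kind is usually written as below. This is
				the standard "identity template" idiom, offered as a sketch; the version in
				esch-site.xsl may well be spelled differently.
			</p>

			<pre>&lt;!-- Matches any attribute or node that has no more specific template --&gt;
&lt;xsl:template match="@*|node()"&gt;
  &lt;xsl:copy&gt;
    &lt;!-- Recurse, so that children with their own templates still get them --&gt;
    &lt;xsl:apply-templates select="@*|node()" /&gt;
  &lt;/xsl:copy&gt;
&lt;/xsl:template&gt;</pre>

			<p>
				Because a named pattern such as match="block" has a higher default priority
				than "node()", custom elements still get their own treatment however deeply
				they are nested inside copied HTML.
			</p>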

			<p>
				Finally, those exciting section blocks are generated by this XSLT snippet:
			</p>

			<pre>  &lt;xsl:template match="block"&gt;
    &lt;div class="blocksection"&gt;
      &lt;xsl:for-each select="<b>*[1]</b>"&gt;
        &lt;xsl:copy&gt;
          &lt;xsl:attribute name="class"&gt;blockheader &lt;xsl:value-of select="@class"/&gt;&lt;/xsl:attribute&gt;
          &lt;xsl:apply-templates select="@*[not(name()='class')]|node()"/&gt;
        &lt;/xsl:copy&gt;
      &lt;/xsl:for-each&gt;
      &lt;div class="blockcontent"&gt;
        &lt;xsl:apply-templates select="<b>*[position()&amp;gt;1]</b>"/&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/xsl:template&gt;</pre>

			<p>
				Here "<code>*[1]</code>" means "the first element inside the 'block'",
				and if you look again at the XML source you'll see that this is always
				an "h2" on this particular page. This "h2" gets copied into the output, then
				there's a bit of complexity to add "blockheader" to its "class" attribute, so
				that the CSS can do the rest, but the XML can specify other classes
				if it wants to. The content of the first element is also copied across by
				applying templates, excluding the "class" attribute because we've already done it.
			</p>
			<p>
				"<code>*[position()&amp;gt;1]</code>" means "every child element except
				the first element", and those things just get processed the same way as other
				HTML (or "block", or whatever they happen to be) and shoved into another "div",
				which again the CSS is responsible for dealing with.
			</p>
			<p>
				Nowhere in there does the XSLT say that the "block" itself should be copied to
				the output, which is quite important because it isn't valid HTML. This allows
				me to define my own tags, meaning whatever I want them to mean, and just use
				them in my documents. How many HTML developers have never wanted to do that?
				All I need is to write a bit of XSLT which expresses what they
				should be transformed into in the target language (XHTML) so that the browser
				can display them.
			</p>
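
			<p>
				To make that concrete, here's what the "block" template above should do to the
				"Tips" block from this page, with the paragraph text abbreviated and the
				indentation added by hand. It assumes that a copy-everything template
				is handling the contents of the "h2" and the "p".
			</p>

			<pre>&lt;!-- Input --&gt;
&lt;block id="Tips"&gt;
  &lt;h2&gt;Development tip&lt;/h2&gt;
  &lt;p&gt;While developing, ...&lt;/p&gt;
&lt;/block&gt;

&lt;!-- Output: no "block", and no "id" either, since that attribute is never copied --&gt;
&lt;div class="blocksection"&gt;
  &lt;h2 class="blockheader "&gt;Development tip&lt;/h2&gt;
  &lt;div class="blockcontent"&gt;
    &lt;p&gt;While developing, ...&lt;/p&gt;
  &lt;/div&gt;
&lt;/div&gt;</pre>

			<p>
				The trailing space in "blockheader " is what the template literally produces
				when the "h2" has no "class" of its own. It's harmless in a class list, and
				a later tidying step may strip it.
			</p>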

			<p>
				The meaning of the rest of the XSLT can probably be divined by clever guessing
				or use of an XSLT reference.
			</p>
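
			<p>
				As an illustration of the sort of thing you'd be guessing at, the index bar
				mentioned under Advantages could be built along these lines. This is a guess
				at the technique, not a copy of esch-site.xsl: the template name "index-bar"
				and the class "index" are made up for the example, and the links only work
				if something else writes matching "id" attributes or anchors into the output,
				which the "block" template shown above doesn't do.
			</p>

			<pre>&lt;!-- Hypothetical: one link per "block" that has an "id" --&gt;
&lt;xsl:template name="index-bar"&gt;
  &lt;div class="index"&gt;
    &lt;xsl:for-each select="/page/maincontent/block[@id]"&gt;
      &lt;a href="#{@id}"&gt;&lt;xsl:value-of select="@id" /&gt;&lt;/a&gt;
      &lt;!-- Separator between links, but not after the last one --&gt;
      &lt;xsl:if test="position() != last()"&gt; | &lt;/xsl:if&gt;
    &lt;/xsl:for-each&gt;
  &lt;/div&gt;
&lt;/xsl:template&gt;</pre>

			<p>
				Since the list is computed from the document every time the transform runs,
				adding or renaming a "block" updates the index without any further editing.
			</p>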
		</block>

		<block id="Tips">
			<h2>Development tip</h2>
			<p>
				While developing, you can preview your XML+XSLT directly in your
				browser. This is as convenient as developing HTML with a text editor and
				a browser window, as God and nature intended, and it may even give you
				meaningful error messages when you mess up.
				But it won't show you the actual HTML, which can be
				a right pain when your mistake is at all subtle, and as mentioned above
				it doesn't let you validate the HTML (unless your browser does that, in which
				case good for it). So even if there isn't going to
				be an HTML-only version of your
				pages, you'll probably need a separate XSLT processor. I use
				<a href="http://xmlsoft.org/XSLT/">xsltproc</a> on Windows.
			</p>
		</block>

		<block id="Conclusion">
			<h2>Conclusion</h2>

			<p>
				If XSLT in the browser was a good enough idea to keep me busy for most
				of a weekend,
				it might just be good enough for you too. The technique is not yet
				ready for the big time, and by the time it is ready it might not be
				XHTML you want to transform into; you might want to go straight for
				XSL-FO, plus some SVG and whatever else comes along. But you'll probably
				be needing XSLT for that too, so there's no harm getting the hang of it
				now.
			</p>
		</block>

	</maincontent>
</page>

