Using XSLT to create JSON output (Saxon-B 9.0 for Java)

I recently needed to make data available to JavaScript via XSLT. The easiest way (from an XSLT perspective) would have been to generate XML and let JavaScript parse it and extract the necessary information. But I thought it would be very cool if I could generate JSON directly via XSLT. Of course, <xsl:output method="text" /> and some templates would do the trick, but that would be quite ugly.

I came up with the idea that an additional output method would be very useful. Fortunately, the XSLT 2.0 standard allows additional, implementation-defined output methods to be specified. Even more fortunately, Saxon (the XSLT processor I'm using) allows custom Java classes to be plugged in through that mechanism. The downside is that the documentation for that interface is scarce (the source distribution comes with extensibility/output-filters.html, which describes the general idea), so I had to figure out quite a bit myself by reading the Saxon source code.

Creating a new output method in Saxon

To create a new output method in Saxon, one has to write a class that implements either the net.sf.saxon.event.Receiver or the org.xml.sax.ContentHandler interface. The Receiver interface is proprietary to Saxon, whereas ContentHandler is a standard SAX interface. In theory, it would therefore be better to implement ContentHandler and get a generic serialization method.

In practice, this does not work, for the simple reason that ContentHandler was originally designed for parsing XML data, so the interface has no provision for specifying an output stream (or writer). That is no problem when parsing an XML document with SAX, since one creates the handler object oneself and passes it to the SAX parser. As a serializer for Saxon, however, it does not work: Saxon creates the object automatically and supplies it with nothing but the raw SAX events. The fact that Saxon accepts the ContentHandler interface is therefore useless for creating a custom serializer.

Note: This is not true if you call Saxon directly through the API and supply the SAX ContentHandler wrapped in a SAXResult as the output. In that case, the caller controls how the handler object is initialized. This, however, does not help with the goal of creating a new output method that works without any changes to the calling mechanism.
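The API route mentioned in the note can be sketched with plain JAXP, which Saxon also implements. This is only an illustration under assumptions, not code from JSONEmitter: ElementNameHandler and SaxResultSketch are hypothetical names, and the handler merely records element names. The point is that the caller constructs the handler and can therefore hand it a Writer, which is not possible when Saxon instantiates the class named in the method attribute.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical handler: writes the name of every element it sees to a caller-supplied Writer. */
final class ElementNameHandler extends DefaultHandler {
    private final Writer out;

    // The output target is passed in by the caller; the ContentHandler
    // interface itself offers no way to set it.
    ElementNameHandler(Writer out) {
        this.out = out;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        try {
            out.write(qName.isEmpty() ? localName : qName);
            out.write('\n');
        } catch (IOException e) {
            throw new SAXException(e);
        }
    }
}

final class SaxResultSketch {
    /** Runs an identity transformation and returns the element names, one per line. */
    static String elementNames(String xml) {
        StringWriter buffer = new StringWriter();
        try {
            // newTransformer() without a stylesheet yields an identity transformation
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.transform(new StreamSource(new StringReader(xml)),
                               new SAXResult(new ElementNameHandler(buffer)));
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
        return buffer.toString();
    }
}
```

A real stylesheet would be loaded with newTransformer(Source) instead of the identity transformer used here; the SAXResult part stays the same.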

Because of this, the Receiver interface is implemented instead. Saxon already provides a base class called Emitter that covers some of the necessary methods. The Receiver interface defines methods very similar to those of the SAX ContentHandler. The main difference is that, to keep the tree memory-efficient, Saxon passes integers as name codes instead of the real names of the generated XML elements, so the actual names first have to be looked up in the Saxon name pool.

The file JSONEmitter.java defines an emitter that creates JSON output when it receives a certain XML structure through these SAX-like events. It is part of the package org.selfhtml.xslt, since I developed it for the SELFHTML build system (link in German), which uses XSLT to generate the actual output documents. Since the character output handling portions of the class are based on Saxon's XMLEmitter, which is released under the Mozilla Public License 1.0, the JSONEmitter class itself is released under that license, too.

Using the output method

To use the output method, the class must be on the Java class path. The full class name must be specified as the local part of the identifier passed to the method attribute. The namespace may be any namespace, since Saxon simply ignores it, but specifying no namespace at all causes Saxon to bail out with an error.

<xsl:output method="sdml:org.selfhtml.xslt.JSONEmitter" xmlns:sdml="http://www.selfhtml.org/sdml" />

The JSONEmitter processes a specific XML structure that defines the resulting JSON output. Any deviation from that structure results in an unrecoverable exception.

The structure defines the following elements (namespaces are currently completely ignored):

array
Defines a JSON array. May contain any other element; the order of the elements defines the order in which they appear inside the array.
object
Defines a JSON object. May contain any other element. Each child element must carry a key attribute, which names the object property whose value the element provides. Note that the key attribute must not be set on elements that are not children of object.
string
Defines a JSON string. Must contain the literal string contents.
number
Defines a JSON number. Must contain the literal number.
true
Defines the JSON boolean value true. Must be empty.
false
Defines the JSON boolean value false. Must be empty.
null
Defines the JSON special value null. Must be empty.

Please note that only object and array elements may be the outermost elements; this is due to constraints of the JSON specification. Any other element will cause an error!
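The rules above can be made concrete with a small stand-alone sketch. It is not the JSONEmitter source: it works on a DOM tree built with the JDK's parser instead of Saxon's Receiver events, the class name JsonStructureSketch is hypothetical, and it assumes unprefixed element names. It also writes numbers exactly as given, so it does not reproduce the number formatting of JSONEmitter.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Hypothetical illustration of the element rules; not the JSONEmitter implementation. */
final class JsonStructureSketch {

    /** Parses the XML structure and returns the JSON text it describes. */
    static String toJson(String xml) {
        Element root;
        try {
            root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml))).getDocumentElement();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("input is not well-formed XML", e);
        }
        String name = root.getTagName();
        if (!name.equals("object") && !name.equals("array")) {
            throw new IllegalArgumentException("outermost element must be object or array");
        }
        StringBuilder out = new StringBuilder();
        write(root, false, out);
        return out.toString();
    }

    private static void write(Element e, boolean insideObject, StringBuilder out) {
        // key is required on children of object and forbidden everywhere else
        if (e.hasAttribute("key") != insideObject) {
            throw new IllegalArgumentException("missing or misplaced key on " + e.getTagName());
        }
        if (insideObject) {
            quote(e.getAttribute("key"), out);
            out.append(':');
        }
        String name = e.getTagName();
        switch (name) {
            case "object":
            case "array": {
                boolean isObject = name.equals("object");
                out.append(isObject ? '{' : '[');
                boolean first = true;
                for (Node n = e.getFirstChild(); n != null; n = n.getNextSibling()) {
                    if (n.getNodeType() != Node.ELEMENT_NODE) continue; // skip whitespace text
                    if (!first) out.append(',');
                    first = false;
                    write((Element) n, isObject, out);
                }
                out.append(isObject ? '}' : ']');
                break;
            }
            case "string":
                quote(e.getTextContent(), out);
                break;
            case "number": {
                String text = e.getTextContent().trim();
                // JSON number grammar; the literal is copied through unchanged
                if (!text.matches("-?(0|[1-9]\\d*)(\\.\\d+)?([eE][+-]?\\d+)?")) {
                    throw new IllegalArgumentException("not a JSON number: " + text);
                }
                out.append(text);
                break;
            }
            case "true":
            case "false":
            case "null":
                if (e.hasChildNodes()) {
                    throw new IllegalArgumentException(name + " must be empty");
                }
                out.append(name);
                break;
            default:
                throw new IllegalArgumentException("unknown element: " + name);
        }
    }

    /** Appends s as a JSON string literal, escaping quotes, backslashes and control characters. */
    private static void quote(String s, StringBuilder out) {
        out.append('"');
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '"' || c == '\\') {
                out.append('\\').append(c);
            } else if (c < 0x20) {
                out.append(String.format("\\u%04x", (int) c)); // four-digit hex escape
            } else {
                out.append(c);
            }
        }
        out.append('"');
    }
}
```

Calling JsonStructureSketch.toJson with a string, number, true, false or null element as the outermost element throws an IllegalArgumentException, mirroring the restriction described above.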

Example

The following example shows what the XML structure has to look like when it is generated by a stylesheet:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0" xmlns:sdml="http://selfhtml.org/sdml" exclude-result-prefixes="sdml">
  <xsl:output method="sdml:org.selfhtml.xslt.JSONEmitter"/>
  <xsl:template match="/">
    <object>
      <string key="hello">world!</string>
      <number key="answer">42</number>
      <number key="lightspeed">3e8</number>
      <array key="urls">
        <string>http://example.com/</string>
        <string>http://example.org/</string>
        <string>http://example.net/</string>
      </array>
    </object>
  </xsl:template>
</xsl:stylesheet>

This will generate the following JSON string:

{"hello":"world!","answer":42.0,"lightspeed":3.0E8,"urls":["http://example.com/","http://example.org/","http://example.net/"]}

TODO / Bugs

Creating a custom XPath function in Saxon that outputs JSON

A new output method that creates JSON output may not be sufficient for every use of JSON in XSLT stylesheets. Sometimes it is necessary to produce the JSON text directly in order to use it inside other templates. This is actually quite easy to achieve: Saxon allows calling public static methods of arbitrary classes by declaring a namespace prefix for the class (the namespace URI must be java: + the full class name) and then using prefix:method as an XPath function.
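As a minimal sketch of that mechanism, the following class offers one static method that could be bound this way. The class JsonText and its method quoteString are hypothetical and not part of the SELFHTML code. The class is shown without a package and without the public modifier so that the snippet compiles in any file; for Saxon to call it, it would have to be a public class on the class path.

```java
/** Hypothetical helper whose static method could be called from XPath. */
final class JsonText {
    private JsonText() {}

    // Assumed XSLT binding (class in the default package here, hence "java:JsonText"):
    //   <xsl:value-of select="jt:quote-string('say hi')" xmlns:jt="java:JsonText"/>

    /** Returns s as a JSON string literal. */
    public static String quoteString(String s) {
        StringBuilder out = new StringBuilder("\"");
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '"' || c == '\\') {
                out.append('\\').append(c);
            } else if (c < 0x20) {
                out.append(String.format("\\u%04x", (int) c)); // four-digit hex escape
            } else {
                out.append(c);
            }
        }
        return out.append('"').toString();
    }
}
```

The method takes and returns java.lang.String, so an XPath string argument can be passed to it directly.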

The file Utils.java defines a class Utils (also in the org.selfhtml.xslt package) that contains a single static method, toJson, which accepts a DOM Node and returns a string. The method simply wires together the classes needed to reuse the JSONEmitter code. It consists of essentially only eleven lines of code.

Example

The following XSLT stylesheet contains the same JSON object as shown above, but this time inside an <xsl:variable> element. The variable is then passed to the utils:to-json XPath function (Saxon maps the hyphenated name to-json to the Java method toJson), and the return value is used in the resulting XML document:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0" xmlns:sdml="http://selfhtml.org/sdml" xmlns:utils="java:org.selfhtml.xslt.Utils" exclude-result-prefixes="sdml utils">
<xsl:output indent="yes" />
<xsl:template match="/">
  <json>
    <xsl:text>var myObject = </xsl:text>
    <xsl:variable name="var">
      <object>
        <string key="hello">world!</string>
        <number key="books">42</number>
        <number key="lightspeed">3e8</number>
        <array key="urls">
          <string>http://example.com/</string>
          <string>http://example.org/</string>
          <string>http://example.net/</string>
        </array>
      </object>
    </xsl:variable>
    <xsl:value-of select="utils:to-json($var)" />
    <xsl:text>;</xsl:text>
  </json>
</xsl:template>
</xsl:stylesheet>

This stylesheet generates the following XML output:

<?xml version="1.0" encoding="UTF-8"?>
<json>var myObject = {"hello":"world!","books":42.0,"lightspeed":3.0E8,"urls":["http://example.com/","http://example.org/","http://example.net/"]};</json>

TODO / Bugs

Download


Last Change: 2008-03-21, Christian Seiler
