About This Page

This report was written in early 2002. It includes a general comparison of XQuery and XSLT, as well as an overview of the tools that Nalleli Lopez wrote at the time: an XQuery parser, an XQuery to XSLT converter, and an XQuery processor. These tools were based on the December 2001 drafts.

Abstract

The goal of this project was to make the following contributions:

1) Create a JavaCC parser based on the latest available XQuery Working draft (presently 20th December 2001).

Benefits:

  • Provide a starting point for other XQuery processor implementers. To the best of my knowledge, no other publicly available JavaCC files exist that implement the latest XQuery grammar.

2) Create a Java tool that converts XQuery into XSLT (and vice versa in the future).

Benefits:

  • Act as a façade for XML tools and databases to be able to support XQuery using an XSLT processor under the hood, improving interoperability.
  • Enable developers to use robust existing XSLT tools to process XQueries, taking advantage of all the research and development invested into those tools.
  • Provide a learning tool for those who know XQuery and want to learn XSLT and vice versa.
  • Convert existing XQueries into XSLT stylesheets and vice-versa as needed.
  • Enable those who are only comfortable or familiar with one technology to work with the other one.

To the best of my knowledge no such other tool presently exists, nor is there any other effort to create one.

3) Create an XQuery Processor that works by using the XQuery to XSLT converter.

Benefits:

  • Provide a much needed Java open source XQuery Processor with programmable interfaces and based on the latest specification.
  • Demonstrate a possible use for the XQuery to XSLT converter.
  • Supply sample code for using the APIs.

Required Background

Understanding the concepts discussed in this report requires knowledge of:

  • XML
  • XQuery
  • XSLT

Additionally, understanding the code requires knowledge of:

  • Java
  • JavaCC
  • JJTree
  • DOM

Additionally, testing the parser requires:

  • JUnit

Additionally, running and compiling the code optionally requires:

  • Ant (strongly recommended)

Introduction

Introduction to XQuery

XQuery, while still in working draft stage, is the much-awaited query language for XML. XQuery gives XML functionality equivalent to what SQL provides for relational databases*. It offers a convenient, efficient, and straightforward method to extract information from XML documents. This information may be filtered using arbitrary criteria and represented in any format chosen by the user.

XQuery is a declarative language. It considers the versatility of XML and is therefore flexible enough to be used for either document or data oriented XML. It offers two different syntaxes, one that is in XML format, and one that is easily read and written by humans. In addition, XQuery is a strongly typed language that supports namespaces and XML Schema datatypes. It can query multiple documents and has transformation capabilities.

* Unfortunately, version 1.0 will likely not provide a mechanism for updating an XML document.

Evan Lenz: "XQuery: Reinventing the Wheel?"

Evan Lenz (now a member of the XSL working group) wrote an article based on the first public XQuery working draft, where he demonstrated an XSLT answer for many of the XQuery sample solutions given in the document. The main goal of the article was to introduce the viability of XSLT as a query language. He questioned the need for a separate W3C effort to create a query language, given that the differences in functionality between XQuery and XSLT were, in his opinion, too minute.

He argued that the benefits of using XSLT as a query language instead of having a separate specification would include concentrated specification efforts, simpler use of XML technology and training of XML technicians, more focused and widely available vendor support, fewer tools to manage, and increased interoperability of products. He also encouraged the W3C to reuse and promote its presently available technologies.

As for the limitations of XSLT as a query language, Evan suggested improving the specification as needed. He pointed out that XSLT 2.0 and XPath 2.0 will resolve some of the issues, such as support for XML Schema datatypes. To improve efficiency, he mentioned the need to identify a core subset of XPath that does not require random access to the source tree. In conclusion, he urged readers not to discard using XSLT for querying XML so flippantly, and called it the "most widely used and implemented XML query language yet".

Rebuttal: The Need for XQuery

Evan Lenz's writings force us to answer the following questions: Do we really need XQuery? Do we need seven more complicated specifications added to the ever more complex world of XML? Are we trying to reinvent the wheel?

While it is true that XSLT can act in many ways as a query language, and that there is an overlap in functionalities between the two technologies, particularly due to their sharing of XPath, many compelling reasons exist for continuing efforts on XQuery.

To start with, XQuery is simpler to use. It is more straightforward and less error prone when written by hand. It also is much less verbose, which makes it easier to read and understand. Additionally, those with an SQL background find XQuery more intuitive since the reasoning behind the queries is very familiar to them. XQuery even provides a mechanism for effortlessly performing join operations and filtering the results based on given criteria. Where the overlap with XSLT arises, it too may count as an advantage of XQuery, as it will make XML transformations more accessible to a wider user base.

Besides being easier to use, XQuery is specifically designed to achieve more efficient queries. The syntax and algebra were designed from the beginning to allow queries to be optimized. Unlike XSLT, XQuery does not require random access to the source tree (for example, there is no ancestor axis). Because the whole trees of large or multiple XML documents need not be loaded into memory, XQuery gains a performance boost that makes it extremely scalable. All these reasons make XQuery appealing and necessary.

Finally, unlike XSLT, XQuery has the ability to work over multiple documents and is strongly typed according to the XML Schema datatypes. XQuery is able to do everything that XSLT is capable of and more, while omitting superfluous functionality that can slow down performance. Daniela Florescu, member of the XQuery working group and author of over fifty query language optimization research papers, has been quoted going as far as predicting that an extension of XQuery will eventually replace both SQL and Java.

The Future

The forecast for these two technologies is that both will gain enormous prominence as XML's popularity continues increasing. The effect, regardless of the valid arguments presented by either side of the XSLT vs. XQuery controversy, is that XML developers will be obligated to learn at least one of the two technologies.

The near future will hold two sets of people: those who will know and feel comfortable with XQuery and those who will prefer XSLT. Whatever their preference, however, these developers may often be confronted with the need to be able to read, learn, or even convert to the other one.

On the vendor side, XML products will most likely need to support both technologies, though some may support only one. Initially, many will support only XSLT, since XQuery is still in draft stage and query optimization is a complex endeavor.

These factors, among others, will result in numerous XSLT stylesheets written as well as numerous XQueries. These materials may need to be converted into the other technology at times, often in large amounts.

The increasingly complex and versatile world of XML will continue offering new challenges for students, users, and implementers.

A Possible Approach

One approach to tackle (and welcome!) the future of these two rival technologies is to create a tool that converts XQuery into XSLT and vice versa. Once the conversion has taken place, one can simply study the output, or pass it through the appropriate processor to produce the desired result. The following sections consider the disadvantages and advantages of such an approach.

Disadvantages of Conversion Approach

The conversion approach obviously has disadvantages, particularly when converting from XQuery to XSLT (which is what my project focuses on). Most of these stem from the fact that the conversion nullifies the advantages of using XQuery instead of XSLT, ignoring the reasons previously listed for needing XQuery in the first place.

Such a converting tool does not take advantage of the optimization achievable by using XQuery. Using XSLT to process an XQuery requires unnecessarily loading the whole tree into memory. In addition, performance also suffers from time and resources devoted to the conversion of one technology into the other. Therefore, this approach is not recommended for situations where large amounts of data must be processed.

Furthermore, using the converter tool limits XQuery to what XSLT can do, while XQuery is much broader. For example, without extra efforts by the converting and processing tools, no datatype support and no queries over multiple documents can be expected from the aforementioned tool.

Advantages of Conversion Approach

As we have explored, powerful reasons exist for not using the conversion approach for heavy loads of data. Nevertheless, the XML community can benefit greatly from such a tool.

As XQuery tools are still in their infancy, the converter enables developers to use existing robust XSLT tools to process XQueries. Numerous XSLT resources and implementations are presently available, and they benefit from all the research and development carried out during the two years that XSLT has been available.

Using an XSLT implementation under the hood, XML tools and databases can support XQuery by using the converter tool as a façade, while actually processing the query as a stylesheet. On the user side, if a tool offers no XQuery support, the customer may choose to write an XQuery and request the appropriate information from the XML tool using the XSLT output of the converter.

In the future, as the quantity of both written XQueries and XSLT stylesheets explodes, the need will arise to convert from one format to the other. Performing this task manually is a laughable proposal. With the converter tool this process can be automated, saving XML developers (and paying employers) valuable time that would otherwise be wasted on an absurd task.

Additionally, the tool will facilitate use for those who are only comfortable or familiar with one technology. If they are temporarily forced to work with the other technology, they can use the converter as an aid.

Finally, the tool also offers pedagogical value. Those who are familiar with only one of the technologies can experiment and compare the two as they learn at their own pace. It will also aid with debugging their XQueries or stylesheets if they have misunderstood a concept in one of the technologies.

Goals of Project

As XQuery is still in its working draft stage, only a few pioneers are presently exploring its implementation and exploitation. Even fewer are directing their resources into open source projects.

The goals of this project focused on creating publicly available tools for this promising emerging field. My contributions include the following:

Create a JavaCC parser based on the latest available XQuery Working draft.

The intent is to provide a starting tool for implementers of XQuery. The JJTree parser will be used by the converting tool, but is also meant to be a starting point for other XQuery processors. Implementers will probably edit it and make improvements, but building upon an already existing parser will save developers many hours of work.

To the best of my knowledge, no other publicly available JavaCC files exist that implement the latest XQuery grammar. Personally, I would have benefited greatly if one had existed when I began this project.

Create a Java tool to convert from XQuery to XSLT.

This tool will provide the benefits explored in previous sections. No such other tool presently exists, nor is there, as far as I know, any other effort to create one.

Create an XQuery Processor.

The XQuery community is sorely missing XQuery processors, particularly free, open source Java processors with available APIs. Microsoft offers XQuery APIs, but these are only available for the .NET framework. Software AG offers a wonderful downloadable Java tool, but only a graphical user interface is provided; there are no APIs, at least for the free version. The only Java tool that I know of with programming interfaces is Fatdog's XML Query Engine, but this is presently outdated and is not open source.

Besides its intrinsic value, my Java-based XQuery processor, XQueryP, demonstrates a possible use for the converter, and provides sample code showing how to use the APIs.

Implementation Details

Architecture

The project contains three main parts: the parser, the converter, and the processor. Additionally, a GUI is provided to demonstrate these components.

The parser reads the XQuery, validates it to a certain extent, and creates a corresponding tree. Once this tree is in memory, the converter analyzes the query and constructs an equivalent XSLT DOM node. The processor takes the node and uses the XSLT processor provided with JDK 1.4 to process the query on the given document.

The following sequence diagram illustrates how a query is processed.

These tools offer no public classes, only interfaces. The implementations use a singleton abstract factory pattern to provide the user with actual implementations.
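As a rough illustration of the "interfaces only, implementations behind a singleton factory" arrangement described above, the following sketch shows one way such an API could be laid out in Java. All names in it (QueryParser, QueryParserFactory, DefaultFactory, SimpleParser) are hypothetical and are not the project's actual classes; the parser body is a placeholder that only checks for non-empty text.

```java
// Hypothetical sketch: the names below are illustrative, not the project's real API.
public class FactorySketch {

    /** Public contract: callers only ever see this interface. */
    public interface QueryParser {
        /** Placeholder check: true if the text is a non-empty candidate query. */
        boolean accepts(String query);
    }

    /** Abstract factory exposing one shared instance. */
    public abstract static class QueryParserFactory {
        private static final QueryParserFactory INSTANCE = new DefaultFactory();

        /** Always returns the same factory object. */
        public static QueryParserFactory getInstance() {
            return INSTANCE;
        }

        /** Hands out an implementation without revealing its class. */
        public abstract QueryParser newParser();
    }

    /** Hidden concrete factory. */
    private static final class DefaultFactory extends QueryParserFactory {
        @Override
        public QueryParser newParser() {
            return new SimpleParser();
        }
    }

    /** Hidden implementation; a real one would wrap the generated parser. */
    private static final class SimpleParser implements QueryParser {
        @Override
        public boolean accepts(String query) {
            return query != null && !query.trim().isEmpty();
        }
    }

    public static void main(String[] args) {
        QueryParser parser = QueryParserFactory.getInstance().newParser();
        System.out.println(parser.accepts("for $b in //bib/book return $b"));
    }
}
```

With this layout, client code depends only on the interface and the factory, so an implementation can be replaced without touching callers.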

Parser

The parser was implemented using JJTree. The grammar uses the same names for the tokens and non-terminals as the December 2001 specification, with a few adjustments for efficiency, clarity, and avoidance of ambiguity. The XqueryParserFactory returns a singleton instance.

Converter

The converter processes the tree returned by the parser. As it encounters different elements, it creates a DOM node to represent an XSLT sheet that produces equivalent output to that which would be expected from the input XQuery. The converter is capable of returning the XSLT as a DOM node, as a string, or it may save it to a specified file.
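The three output forms mentioned above (DOM node, string, file) can all be derived from the DOM node with the standard JAXP identity transformer. The following is a minimal sketch under that assumption; the class and method names (StylesheetOutputSketch, toXmlString, saveTo, emptyStylesheet) are hypothetical and do not come from the converter's interface.

```java
import java.io.File;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical sketch: names are illustrative, not the converter's real API.
public class StylesheetOutputSketch {
    static final String XSL_NS = "http://www.w3.org/1999/XSL/Transform";
    static final String XMLNS_NS = "http://www.w3.org/2000/xmlns/";

    /** Serializes a DOM node to text using an identity transformation. */
    public static String toXmlString(Node node) throws Exception {
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        identity.transform(new DOMSource(node), new StreamResult(out));
        return out.toString();
    }

    /** Writes a DOM node to the given file using an identity transformation. */
    public static void saveTo(Node node, File file) throws Exception {
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.transform(new DOMSource(node), new StreamResult(file));
    }

    /** Builds a bare xsl:stylesheet element to have something to serialize. */
    public static Document emptyStylesheet() throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        Document doc = dbf.newDocumentBuilder().newDocument();
        Element root = doc.createElementNS(XSL_NS, "xsl:stylesheet");
        root.setAttributeNS(XMLNS_NS, "xmlns:xsl", XSL_NS);
        root.setAttribute("version", "1.0");
        doc.appendChild(root);
        return doc;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(toXmlString(emptyStylesheet()));
    }
}
```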

The creation of the XSLT Document object is achieved using a quasi chain of responsibility pattern in which methods do their part and then delegate responsibilities they can't handle to other methods. For example, when a query is received, the processQuery method is called. This method creates the skeleton XSLT sheet and analyzes the root of the query. If this is a FLWR expression, it then calls the processFLWR method. This does whatever it needs to and then calls processForClause, processLetClause, processWhereClause, and processReturnClause in the order in which the clauses appear. These in turn may contain embedded XQueries (such as a path expression), so they may call processQuery again, which may call processPathExpression, and so on. All these methods add DOM nodes to the XSLT Document as needed until the whole query tree has been traversed.
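The delegation described above can be sketched in a few lines of Java. This is only an illustration under simplifying assumptions: the Flwr class is a hypothetical stand-in for the parser's tree and holds a single FOR clause and a RETURN clause, and the method bodies are my own reduction, not the converter's code. The method names follow the ones mentioned in the text.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical sketch of the delegation chain; not the converter's actual code.
public class ConverterSketch {
    static final String XSL_NS = "http://www.w3.org/1999/XSL/Transform";

    /** Stand-in for the parsed query: "for $v in path return $r". */
    public static final class Flwr {
        final String forVariable;
        final String forPath;
        final String returnVariable;

        public Flwr(String forVariable, String forPath, String returnVariable) {
            this.forVariable = forVariable;
            this.forPath = forPath;
            this.returnVariable = returnVariable;
        }
    }

    /** Creates the skeleton stylesheet, then delegates the FLWR expression. */
    public static Document processQuery(Flwr query) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        Document doc = dbf.newDocumentBuilder().newDocument();
        Element stylesheet = doc.createElementNS(XSL_NS, "xsl:stylesheet");
        stylesheet.setAttribute("version", "1.0");
        doc.appendChild(stylesheet);
        Element template = doc.createElementNS(XSL_NS, "xsl:template");
        template.setAttribute("match", "/");
        stylesheet.appendChild(template);
        processFLWR(query, template);
        return doc;
    }

    /** Handles the expression as a whole and delegates each clause in order. */
    static void processFLWR(Flwr query, Element parent) {
        Element loop = processForClause(query, parent);
        processReturnClause(query, loop);
    }

    /** FOR becomes xsl:for-each plus a variable bound to the current node. */
    static Element processForClause(Flwr query, Element parent) {
        Document doc = parent.getOwnerDocument();
        Element forEach = doc.createElementNS(XSL_NS, "xsl:for-each");
        forEach.setAttribute("select", query.forPath);
        parent.appendChild(forEach);
        Element variable = doc.createElementNS(XSL_NS, "xsl:variable");
        variable.setAttribute("name", query.forVariable);
        variable.setAttribute("select", ".");
        forEach.appendChild(variable);
        return forEach;
    }

    /** RETURN becomes xsl:copy-of on the returned variable. */
    static void processReturnClause(Flwr query, Element parent) {
        Element copyOf = parent.getOwnerDocument().createElementNS(XSL_NS, "xsl:copy-of");
        copyOf.setAttribute("select", "$" + query.returnVariable);
        parent.appendChild(copyOf);
    }

    public static void main(String[] args) throws Exception {
        Document sheet = processQuery(new Flwr("b", "//bib/book", "b"));
        Element forEach = (Element) sheet.getElementsByTagNameNS(XSL_NS, "for-each").item(0);
        System.out.println(forEach.getAttribute("select"));
    }
}
```

Each method adds only the nodes it is responsible for and hands the rest to the next method, which is the property the paragraph above describes.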

At the moment, the approaches used by the Converter are very naïve. For example, consider this query:

for $b in //bib/book
return $b

NOTE: This FLWR expression is only meant to be a simple example. Notice that the query could simply be written as:

//bib/book

The equivalent XSLT for this query is:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<xsl:for-each select="//bib/book">
<xsl:copy-of select="."/>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>

In this case, a simple xsl:for-each instruction is sufficient to represent the FOR part of a FLWR expression. There is no need to declare an xsl:variable to represent $b. The converter, however, presently represents this query like this:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<xsl:for-each select="//bib/book">
<xsl:variable name="b" select="."/>
<xsl:copy-of select="$b"/>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>

This is obviously unnecessary in many cases and the converter needs to be modified to omit the variable declaration whenever appropriate.
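To check that the variable really is redundant in this case, both stylesheets can be run against the same input with the standard JAXP API and their outputs compared. The sketch below does this; the sample input document and the class and method names (EquivalenceSketch, run) are assumptions made for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical sketch: compares the hand-written and the converter-style stylesheet.
public class EquivalenceSketch {
    static final String HEAD =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:template match=\"/\"><xsl:for-each select=\"//bib/book\">";
    static final String TAIL = "</xsl:for-each></xsl:template></xsl:stylesheet>";

    /** The simpler form: copy the current node directly. */
    public static final String DIRECT = HEAD + "<xsl:copy-of select=\".\"/>" + TAIL;

    /** The form the converter presently emits: bind $b first, then copy it. */
    public static final String WITH_VARIABLE =
        HEAD + "<xsl:variable name=\"b\" select=\".\"/><xsl:copy-of select=\"$b\"/>" + TAIL;

    /** Made-up sample input, used only for this comparison. */
    public static final String INPUT =
        "<bib><book><title>A</title></book><book><title>B</title></book></bib>";

    /** Applies a stylesheet given as text to an XML document given as text. */
    public static String run(String stylesheet, String xml) throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(stylesheet)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Prints whether the two stylesheets produced identical output.
        System.out.println(run(DIRECT, INPUT).equals(run(WITH_VARIABLE, INPUT)));
    }
}
```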

Another defect of the converter is also based on the use of xsl:variables to aid in representing FOR clauses. This is due to handling of result tree fragments in XSLT 1.0.

Consider this query (again, simple for argument's sake):

for $b in //book[publisher='Addison-Wesley'], $a in $b/author return $a

The Converter would translate this as:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<xsl:for-each select="//book[publisher='Addison-Wesley']">
<xsl:variable name="b" select="."/>
<xsl:for-each select="$b/author">
<xsl:variable name="a" select="."/>
<xsl:copy-of select="$a"/>
</xsl:for-each>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>

Notice that the fifth line evaluates "$b/author". Since $b is an XSLT result tree fragment, this operation is prohibited by the recommendation. Xalan and some other XSLT processors support it anyway, but strictly spec-compliant processors shouldn't. Therefore, this XSLT translation may not work with all XSLT processors and isn't truly portable. In addition, the use of result tree fragments limits the use of the parent axis. In this example, as in the last one, the declaration of the xsl:variable "$b" is unnecessary.

In the future the Converter interface will also include methods for converting XSLT stylesheets to XQueries, making it a more complete and valuable tool.

Processor

Although extremely valuable, the processor is the simplest of the three components. To process a query, it only needs to receive the query, obtain an equivalent XSLT stylesheet from the converter, and ask the XSLT processor to finish the work. XQueryP is an example of the façade pattern: it reduces complexity by providing a simple interface that shields the user from the individual components and reduces coupling. The processor accepts the input XML to be queried in the form of either a DOM node, a filename, or a String.
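The façade idea can be sketched with the standard JAXP transformation API. The following is a minimal sketch, not XQueryP itself: the Converter interface, the STUB converter (which ignores the query text and always returns a fixed stylesheet), and the names ProcessorSketch, parse, and process are all hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical sketch of a processor façade; not the actual XQueryP code.
public class ProcessorSketch {

    /** Assumed converter contract: query text in, XSLT stylesheet node out. */
    public interface Converter {
        Node toStylesheet(String query) throws Exception;
    }

    /** Parses XML text into a namespace-aware DOM document. */
    public static Document parse(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        return dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    /** Façade: obtain the stylesheet, then let the XSLT processor do the work. */
    public static String process(Converter converter, String query, Node input) throws Exception {
        Node stylesheet = converter.toStylesheet(query);
        Transformer transformer =
            TransformerFactory.newInstance().newTransformer(new DOMSource(stylesheet));
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(input), new StreamResult(out));
        return out.toString();
    }

    /** Stub standing in for the real converter: ignores the query text. */
    public static final Converter STUB = query -> parse(
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:template match=\"/\"><xsl:for-each select=\"//bib/book\">"
        + "<xsl:copy-of select=\".\"/></xsl:for-each></xsl:template></xsl:stylesheet>");

    public static void main(String[] args) throws Exception {
        Document input = parse("<bib><book><title>A</title></book></bib>");
        System.out.println(process(STUB, "for $b in //bib/book return $b", input));
    }
}
```

Accepting a filename or a String as input, as the processor does, would only require parsing that input into a DOM node (or wrapping it in a stream source) before calling the same method.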

Challenges Faced

Some of the challenges faced during the creation of the tool:

Lack of Available References and Materials

Implementing anything related to a technology that is still in working draft stages and has no established precedents poses many challenges. Among those faced was a lack of supporting material. I found only one article that was helpful. No books had yet been written on XQuery. Newsgroup discussions that started on XQuery -- for instance, the announcement of a new draft -- always veered to unrelated topics.

Changing Working Drafts

Another challenge was working with the drafts. As mentioned, no supporting material was available. All information about XQuery had to come from the specifications, which were verbose and extremely boring. The subject is interesting and practical, but the documentation can be extremely dull, especially if the specification is a developer's first introduction to the technology.

Besides lacking in thrilling drama, the drafts, as their name indicates, are simply sketches of possible ways the technology will go. They haven't been perfected. They have many unresolved issues. They haven't been thoroughly tested and criticized. To the advantage of future users, but the dismay of present implementers, they are constantly evolving, and staying abreast of all the developments can be difficult. Most of the changes introduced are welcomed by developers, but some don't seem to make sense to those who've gotten used to older specifications.

Inconsistent Specifications

With nine related working drafts, inconsistencies are bound to occur. As an implementer, one is forced to make educated guesses, which may often be mistaken, or which may make the tool appear flawed.

Uncertainty

Unlike working drafts of technologies that have already reached recommendation status, the XQuery drafts need not be backward compatible in any way, and can change drastically without apology or explanation.

It is difficult to invest so much time and effort into a project, not knowing when a new draft will appear, and whether it will render one's tool completely obsolete. Ironically, on the same day I wrote the previous sentence, a new draft was released. One of the biggest selling points of this project was that the tools were based on the latest specification. Unfortunately, as of today, this is no longer true. This makes one question whether there is still value to the work performed, whether one should spend more time updating it, and whether one should just wait for the actual recommendation before investing more effort into it.

Limited Resources

Finally, an obvious limitation to most independent projects is time constraints. Creation of a superior tool requires much time and research. The tools I created are only in the first iteration of development, and will have to be slowly improved. They are now at a point, however, where small tweaks can result in huge improvements and added functionalities. These improvements must be completed in a race against time so that the tools will indeed represent a valuable asset for the XML community.

Future Work

Parser

  • Update to April 30, 2002 draft.
  • Extend it to support full grammar.
  • Extend it to support old computed element constructor syntax while maintaining support for the most recent.
  • Make parser more efficient (e.g. void useless nodes).
  • Make grammar more restrictive, e.g. disallow: let $b := document("bib.xml")//book return $weirdvariable
  • Make approaches used within classes consistent.
  • Modify grammar to keep significant whitespace.
  • Allow attributes inside regular element constructors.

Converter

  • Extend interface and provide implementation for XSLT to XQuery conversion methods.
  • Explore better alternatives to perform conversion to XSLT.
  • Reduce coupling. Make converter less aware of tree structure.
  • Correct scope of variables inside of FLWR expressions.
  • Fix buggy implementation of Let clauses.
  • Fix For expressions so that they don't use XSLT variables unless necessary.
  • Implement conversion of XQuery-specific functions (e.g. the avg function) into XSLT. (Or wait for XSLT 2.0?)
  • Allow queries on multiple documents?
  • Allow users to select more settings, for example, xsl:output-method settings.
  • Revise to make sure it's compliant with the April 30th 2002 draft.

Processor

  • Allow users to select more settings, for example, whether the xml declaration is excluded in the output.
  • Make possible a query without providing an input XML document (For example, allow the query <hello/> ).

Other

  • Write a tool to convert XQuery into XML XQuery syntax.

Resources

W3C XML Query Working Drafts

Early Adopter XQuery

Early Adopter XQuery. Kurt Cagle, Mark Fussel, Nalleli Lopez, Dan Maharry, Rogerio Saran. Birmingham, UK: Wrox Press Ltd, 2002.

The first book written about XQuery.

Chapters 1 - 3 introduce XQuery and the W3C working drafts.

Chapter 4 explains XQuery from the point of view of those who are familiar with SQL.

Chapter 5 explains XQuery from the point of view of those who are familiar with XSLT.

Chapter 6 shows how to use the .NET XQuery implementation.

Chapter 7 shows how to use Fatdog's Java XQuery implementation.

Evan Lenz's article on XQuery and XSLT

XQuery: Reinventing the Wheel?

XQuery vs. XSLT debate

XML-dev thread

Michael Kay at Forum XML on the Differences between XQuery and XSLT

XSLT et XQuery: une différence de culture.

Daniela Florescu on the future of XQuery, SQL, Java, and databases

Database Future Debated

XQuery website

I will probably post information contained in this report, as well as updated code, an online demonstration, XQuery tutorials, and other related XQuery information.

xquery.xmldevelopment.net