PROFESSIONAL XML PART 2 - THE ORIGINS AND STRUCTURE OF SAX


The history of SAX is unusually well documented, because all the discussion took place on the public XML-DEV mailing list, whose archives are available at http://www.lists.ic.ac.uk/hypermail/xml-dev/. David Megginson has also summarized its history at http://www.megginson.com/SAX/history.html.


This free tutorial is a sample from the book Professional XML.


The process started late in 1997 as a result of pressure from XML users such as Peter Murray-Rust, who was developing XML applications and struggling with the needless incompatibility of different parsers. Suppliers of early XML parsers, including Tim Bray, David Megginson, and James Clark, contributed to the discussion, and many other members of the list commented on the various drafts. David Megginson devised a process, rather in the spirit of the original Internet "Request for Comments", whereby comments and suggestions could be handled promptly yet fairly, and he eventually declared the specification frozen on 11 May 1998.

One of the major reasons for the success of SAX was that along with the initial specification, Megginson supplied front-end drivers for several popular XML parsers, including his own Ælfred, Tim Bray's Lark, and Microsoft's MSXML. Once SAX was established in this way, other parser writers such as IBM, Sun, and Oracle were quick to incorporate native SAX interfaces into their own parsers, so that existing SAX applications could run with their products.

The definitive SAX specification is written in terms of Java interfaces. It has been adapted to other languages, though the only one we know of that is actively supported is an interface for the Python language, produced by Lars Marius Garshol (see http://www.stud.ifi.uio.no/~larsga/download/python/xml/saxlib.html ). Of course, the Java interfaces can also be used from other languages that interoperate with Java, for example through Microsoft's Java VM, which bridges Java and COM. In this chapter, however, we'll stick to the original Java.

The Structure of SAX

SAX is structured as a number of Java interfaces. It's very important to understand the difference between an interface and a class:

  • An interface says what methods there are, and what kind of parameters they expect. It is purely a specification; it doesn't provide any code to execute when the methods are called. But it is a concrete specification, not just a scrap of paper, and the Java compiler will check that a class that claims to implement an interface does so correctly.
  • A class provides executable methods, including public methods that can be called by the code in other classes.
  • A class may implement one or more interfaces. In many cases SAX specifies several interfaces which could theoretically be implemented by separate classes, but which in practice are often implemented in combination by a single class. To implement an interface, a class must supply code for each of the methods defined in the interface.
  • Several classes may implement the same interface. Of course this is the whole point of the SAX exercise - there are lots of implementations of the SAX Parser interface for you to choose from, and because they all implement the same interface, your application doesn't care which one it is using.
Some of the interfaces in SAX are implemented by classes within the parser, and some must be implemented by classes within the application. There are some classes supplied with SAX itself, though you don't have to use these. And there are some classes (such as the error handling classes) which the parser must provide, but which your application can override if it wishes.
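To make the point about several classes implementing one interface concrete, here is a minimal sketch. It assumes a modern JDK, which (in the versions we are aware of) still ships the deprecated SAX 1 types org.xml.sax.Parser, HandlerBase and AttributeList. The method countElements is written only against the Parser interface; it is then handed two different implementing classes, one obtained from JAXP's SAXParser.getParser() and one built by wrapping a SAX 2 XMLReader in org.xml.sax.helpers.XMLReaderAdapter. The names SameInterfaceDemo, countElements, jaxpParser and adaptedParser are hypothetical, chosen for this illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.AttributeList;
import org.xml.sax.HandlerBase;
import org.xml.sax.InputSource;
import org.xml.sax.Parser;
import org.xml.sax.helpers.XMLReaderAdapter;

// Hypothetical demo class: SAX 1 types are deprecated in modern JDKs, hence the annotation.
@SuppressWarnings("deprecation")
class SameInterfaceDemo {

    // Written purely against the org.xml.sax.Parser interface:
    // it neither knows nor cares which class actually implements it.
    static int countElements(Parser p, String xml) throws Exception {
        final int[] count = {0};
        p.setDocumentHandler(new HandlerBase() {
            @Override
            public void startElement(String name, AttributeList atts) {
                count[0]++; // one call per start tag (or empty-element tag)
            }
        });
        p.parse(new InputSource(new StringReader(xml)));
        return count[0];
    }

    // Implementation 1: the SAX 1 view of the JDK's built-in JAXP parser.
    static Parser jaxpParser() throws Exception {
        return SAXParserFactory.newInstance().newSAXParser().getParser();
    }

    // Implementation 2: an adapter that presents a SAX 2 XMLReader as a SAX 1 Parser.
    static Parser adaptedParser() throws Exception {
        return new XMLReaderAdapter(
                SAXParserFactory.newInstance().newSAXParser().getXMLReader());
    }

    public static void main(String[] args) throws Exception {
        String xml = "<library><book/><book/></library>";
        for (Parser p : new Parser[] {jaxpParser(), adaptedParser()}) {
            System.out.println(p.getClass().getName() + ": " + countElements(p, xml));
        }
    }
}
```

Both parsers should report the same count (three elements, including the <library> root); only the printed class names differ, which is exactly what the application does not need to care about.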

The Basic Structure

The components of a simple SAX application are shown in the diagram below.

In the diagram:

  • The Application is the "main program": the code that you write to start the whole process off.
  • The Document Handler is code that you write to process the contents of the document.
  • The Parser is an XML Parser that conforms to the SAX standard.

The job of the application is to:

  • create a parser (more technically, instantiate a class that implements the org.xml.sax.Parser interface);
  • create a document handler (by instantiating a class that implements the org.xml.sax.DocumentHandler interface);
  • tell the parser what document handler to use (by calling the parser's setDocumentHandler() method);
  • tell the parser to start processing a particular input document (by calling the parse() method of the parser).

The job of the parser is to notify the document handler of all the interesting things it finds in the document, such as element start tags and end tags.

The job of the document handler is to process these notifications to achieve whatever the application requires.
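The division of labour described above can be sketched with the application and the document handler in separate classes. This is a minimal sketch, not code from the book: it assumes the JDK's built-in parser (reached through JAXP's SAXParser.getParser()) in place of a specific SAX 1 driver, and the names LoggingApp and TagLogger are hypothetical. TagLogger implements DocumentHandler directly, so it must supply all eight of the interface's methods, even those it ignores.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.AttributeList;
import org.xml.sax.DocumentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.Parser;

// Hypothetical application class: performs the four jobs of the application.
@SuppressWarnings("deprecation")
class LoggingApp {
    static List<String> run(String xml) throws Exception {
        Parser parser = SAXParserFactory.newInstance().newSAXParser().getParser(); // create a parser
        TagLogger handler = new TagLogger();                                      // create a document handler
        parser.setDocumentHandler(handler);                                       // tell the parser which handler to use
        parser.parse(new InputSource(new StringReader(xml)));                     // start processing a document
        return handler.events;
    }

    public static void main(String[] args) throws Exception {
        run("<books><book>XML</book></books>").forEach(System.out::println);
    }
}

// Hypothetical document handler: records the events it is notified of.
@SuppressWarnings("deprecation")
class TagLogger implements DocumentHandler {
    final List<String> events = new ArrayList<>();

    public void setDocumentLocator(Locator locator) { }                          // ignored
    public void startDocument() { events.add("startDocument"); }
    public void endDocument() { events.add("endDocument"); }
    public void startElement(String name, AttributeList atts) { events.add("start " + name); }
    public void endElement(String name) { events.add("end " + name); }
    public void characters(char[] ch, int start, int length) { }                 // ignored
    public void ignorableWhitespace(char[] ch, int start, int length) { }        // ignored
    public void processingInstruction(String target, String data) { }           // ignored
}
```

Running LoggingApp should print the start and end of the document and of each element, in document order. Note how many of TagLogger's methods are empty; that tedium is what SAX's HandlerBase class exists to remove.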

A Simple Example

Let's look at a very simple application: one that simply counts how many <book> elements there are in the supplied XML file (shown later).

In this example we will simplify the structure shown in the diagram above by using the same class to act as both the application and the document handler. The reason we can do this is that one Java class can implement several interfaces, so it can perform several roles at once.

The first thing the application must do is to create a parser:

import org.xml.sax.*;

...

Parser p = new com.jclark.xml.sax.Driver();

This is the only time you need to say which particular SAX parser you are using. We have chosen the XP parser produced by James Clark, available from http://www.jclark.com. Like any other Java class you use, it must of course be on the Java classpath.

The chosen parser must implement the SAX Parser interface org.xml.sax.Parser (if it doesn't, the Java compiler will reject the assignment), so it can be assigned to a variable of type Parser. Because of the import statement at the top, Parser is shorthand for org.xml.sax.Parser.

So you need to know the relevant class name of your chosen parser. Oddly, many of the available SAX parsers don't advertise their parser class name in bright lights. So here is a list of some of the more popular parsers, with the class name you need to use to instantiate them. (Note however that this may change with later versions of the products.)
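Since the class name is the only parser-specific detail, one option is to supply it at run time rather than compile it in. SAX 1 includes a helper for this, org.xml.sax.helpers.ParserFactory; the minimal sketch below uses plain Java reflection instead, and falls back to the JDK's built-in parser (via JAXP) if the named class can't be loaded or doesn't implement Parser. ParserChooser and makeParser are hypothetical names, and the silent fallback is a design choice you may not want, because it can hide a misspelled class name.

```java
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Parser;

// Hypothetical helper: choose a SAX 1 parser by class name at run time.
@SuppressWarnings("deprecation")
class ParserChooser {
    // Class name of James Clark's XP SAX driver, as used in this chapter.
    static final String XP_DRIVER = "com.jclark.xml.sax.Driver";

    // Instantiate the named class through its no-argument constructor; if the class
    // is missing or is not a Parser, fall back to the JDK's built-in parser.
    static Parser makeParser(String className) throws Exception {
        try {
            Object candidate = Class.forName(className).getDeclaredConstructor().newInstance();
            return (Parser) candidate; // ClassCastException if it isn't a SAX 1 Parser
        } catch (ReflectiveOperationException | ClassCastException e) {
            return SAXParserFactory.newInstance().newSAXParser().getParser();
        }
    }

    public static void main(String[] args) throws Exception {
        String name = args.length > 0 ? args[0] : XP_DRIVER;
        Parser p = makeParser(name);
        System.out.println("Using " + p.getClass().getName());
    }
}
```

Passing a class name as the first argument selects that parser; with no argument the sketch tries XP's driver first.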

So, you've created a parser. Now you can start telling it what to do.

First you need to tell the parser what document handler to call when events occur. This can be any class that implements the SAX org.xml.sax.DocumentHandler interface. The simplest and most common approach is to make your application itself act as the document handler.

DocumentHandler itself is an interface defined in SAX. You could make your application program implement this interface directly, in which case you would have to provide code for all the different methods required by that interface. In our example, however, we want to ignore most of the events, so it would be rather tedious to define lots of methods that do nothing. Fortunately SAX supplies an implementation of DocumentHandler that does nothing, HandlerBase, and we can make our application extend this, so it inherits all the "do nothing" methods. Let's do this:

import org.xml.sax.*;

...

public class BookCounter extends HandlerBase
{
    public void countBooks()
    {
        Parser p = new com.jclark.xml.sax.Driver();
        p.setDocumentHandler(this);
    }
}

The call to setDocumentHandler() tells the parser that "this" object (an instance of your application class) is to receive notification of events. The class is an implementation of org.xml.sax.DocumentHandler because it inherits from org.xml.sax.HandlerBase, which in turn implements DocumentHandler.
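To see what inheriting the "do nothing" methods buys, here is a minimal sketch of a handler that overrides just startElement() and endElement() to find how deeply the elements are nested; every other event is absorbed by HandlerBase's empty methods. It assumes the JDK's built-in parser (via JAXP) in place of XP, and DepthFinder and maxDepth are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.AttributeList;
import org.xml.sax.HandlerBase;
import org.xml.sax.InputSource;
import org.xml.sax.Parser;

// Hypothetical handler: only two callbacks are overridden; characters(),
// endDocument() and the rest fall through to HandlerBase's empty implementations.
@SuppressWarnings("deprecation")
class DepthFinder extends HandlerBase {
    private int depth = 0;
    private int maxDepth = 0;

    @Override
    public void startElement(String name, AttributeList atts) {
        depth++;
        maxDepth = Math.max(maxDepth, depth);
    }

    @Override
    public void endElement(String name) {
        depth--;
    }

    static int maxDepth(String xml) throws Exception {
        DepthFinder finder = new DepthFinder();
        Parser p = SAXParserFactory.newInstance().newSAXParser().getParser();
        p.setDocumentHandler(finder); // accepted because HandlerBase implements DocumentHandler
        p.parse(new InputSource(new StringReader(xml)));
        return finder.maxDepth;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(maxDepth("<books><book><title>XML</title></book></books>")); // prints 3
    }
}
```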

The program is now almost ready to go: the parser still needs a document to parse, and the class needs a Java main method so that it can run as a standalone program. Let's give the parser a file to parse first:

import org.xml.sax.*;
...

public class BookCounter extends HandlerBase
{
    public void countBooks() throws Exception
    {
        Parser p = new com.jclark.xml.sax.Driver();
        p.setDocumentHandler(this);
        p.parse("file:///C:/data/books.xml");
    }
}

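Note that the argument to parse() is a URL, supplied as a string. We'll show you later how to supply a filename rather than a URL. Because parse() can fail (it declares the checked exceptions SAXException, thrown for example when the document is not well-formed, and IOException, thrown for example when the file can't be read), we must also add "throws Exception" to the countBooks method so that these errors are passed back to the caller.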
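Declaring "throws Exception" is the simplest option. If you would rather report failures yourself, a minimal sketch might catch each kind separately: a well-formedness error arrives as SAXParseException, a subclass of SAXException that carries a line number, while a missing or unreadable file surfaces as IOException. The sketch also turns a local filename into the URL string that parse() expects, using File.toURI(); that is just one possible approach, not necessarily the one this chapter uses later. It assumes the JDK's built-in parser (via JAXP) in place of XP, and CheckedParse, check and toUrl are hypothetical names.

```java
import java.io.File;
import java.io.IOException;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.HandlerBase;
import org.xml.sax.Parser;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical class: parses a file and reports failures instead of rethrowing them.
@SuppressWarnings("deprecation")
class CheckedParse extends HandlerBase {

    // parse() expects a URL string; File.toURI() turns a local filename into one.
    static String toUrl(String filename) {
        return new File(filename).toURI().toString();
    }

    static String check(String filename) {
        try {
            Parser p = SAXParserFactory.newInstance().newSAXParser().getParser();
            CheckedParse handler = new CheckedParse();
            p.setDocumentHandler(handler);
            p.setErrorHandler(handler);     // HandlerBase.fatalError() rethrows the SAXParseException
            p.parse(toUrl(filename));
            return "ok";
        } catch (SAXParseException e) {     // not well-formed: the location is available
            return "not well-formed at line " + e.getLineNumber() + ": " + e.getMessage();
        } catch (SAXException e) {          // any other SAX-level failure
            return "SAX error: " + e.getMessage();
        } catch (IOException e) {           // file missing or unreadable
            return "I/O error: " + e.getMessage();
        } catch (ParserConfigurationException e) { // JAXP could not build a parser
            return "configuration error: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(check(args.length > 0 ? args[0] : "books.xml"));
    }
}
```

The catch clauses are ordered from most to least specific, since SAXParseException must be caught before its superclass SAXException.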

We need to make one more addition to get the program to run as a standalone application: the Java main method. In the main method we create an instance of the class with new BookCounter(), and then call that object's countBooks method. The main method also declares "throws Exception", so any error from countBooks is passed on to the Java runtime, which reports it and stops the program. Our code should then look like this:

import org.xml.sax.*;
...

public class BookCounter extends HandlerBase
{
    public static void main(String[] args) throws Exception
    {
        (new BookCounter()).countBooks();
    }

    public void countBooks() throws Exception
    {
        Parser p = new com.jclark.xml.sax.Driver();
        p.setDocumentHandler(this);
        p.parse("file:///C:/data/books.xml");
    }
}

The program can now be run: it will parse the document and run to completion (assuming, of course, that the document is there to be parsed).
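The program above doesn't count anything yet: it inherits only HandlerBase's empty methods. Purely as an illustration, and not as the book's own continuation, here is a minimal sketch of one way to finish it by overriding startElement() and counting elements named book. It assumes the JDK's built-in parser (via JAXP) in place of XP, and it parses an InputSource built from a string so that the example runs without a file; BookCounterSketch is a hypothetical name.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.AttributeList;
import org.xml.sax.HandlerBase;
import org.xml.sax.InputSource;
import org.xml.sax.Parser;

// Hypothetical completion of the book-counting example.
@SuppressWarnings("deprecation")
class BookCounterSketch extends HandlerBase {
    private int count = 0;

    // Called once per start tag; all other events use HandlerBase's no-op methods.
    @Override
    public void startElement(String name, AttributeList atts) {
        if (name.equals("book")) {
            count++;
        }
    }

    int countBooks(InputSource source) throws Exception {
        count = 0;
        Parser p = SAXParserFactory.newInstance().newSAXParser().getParser(); // instead of XP's Driver
        p.setDocumentHandler(this);
        p.parse(source);
        return count;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<books><book><title>Professional XML</title></book><book/></books>";
        System.out.println((new BookCounterSketch()).countBooks(new InputSource(new StringReader(xml)))); // prints 2
    }
}
```

The comparison uses the element name exactly as the parser reports it, so a namespace-prefixed element such as <lib:book> would not be counted.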

Continued...


Thursday 8th January 2009  © COPYRIGHT 2009 - VISUALSOFT