Wednesday, April 8, 2009

Differences between DOM, SAX and StAX

Today XML is widely used to exchange data. When you receive an XML resource, you need to parse it (read it) to get the data you need.

However, there are different approaches to parsing an XML source, and you should select the one that fits your needs. You may choose one of these:
  • DOM - Document Object Model,
  • SAX - Simple API for XML,
  • StAX - Streaming API for XML
Let’s discuss each one.

Parsing with DOM:

If you prefer this technique, you should know that the whole XML document will be loaded into memory. The advantage is that you can navigate to and read any node. You can append, delete or update a child node because the data is available in memory. However, if the XML contains a large amount of data, loading it into memory will be very expensive. The whole document is also loaded even if you are only looking for one particular piece of it.

You should consider using this technique when you need to alter the XML structure and you are sure that memory consumption is not going to be a problem. It is also the only choice that lets you navigate to parent and child elements, which makes it easier to use.
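To make the navigate-and-modify point concrete, here is a minimal Java sketch using the DOM parser that ships with the JDK (JAXP). The class name DomExample, its helper methods and the small orders document are made up for this illustration, so treat it as a sketch rather than a recommended design.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class DomExample {

    // Hypothetical sample document, invented for this illustration.
    public static final String XML =
        "<orders>"
      + "<order id=\"1\"><amount>10.5</amount></order>"
      + "<order id=\"2\"><amount>20.0</amount></order>"
      + "</orders>";

    // Loads the whole document into an in-memory tree.
    public static Document parse(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    // Jumps straight to the <amount> at the given index, then walks up
    // to its parent <order> and returns that order's id attribute.
    public static String orderIdOfAmount(Document doc, int index) {
        Element amount = (Element) doc.getElementsByTagName("amount").item(index);
        Element order = (Element) amount.getParentNode();
        return order.getAttribute("id");
    }

    // Modifies the tree in place: removes the first <order>
    // and returns how many <order> elements remain.
    public static int removeFirstOrder(Document doc) {
        Element root = doc.getDocumentElement();
        Node first = doc.getElementsByTagName("order").item(0);
        root.removeChild(first);
        return doc.getElementsByTagName("order").getLength();
    }

    public static void main(String[] args) {
        Document doc = parse(XML);
        System.out.println("parent order id: " + orderIdOfAmount(doc, 1));
        System.out.println("orders left: " + removeFirstOrder(doc));
    }
}
```

Both the upward navigation (getParentNode) and the in-place removal work only because the complete tree is sitting in memory, which is exactly the cost described above.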

If you are creating an XML document (which is not big!) you should use this technique. However, if you are going to export data from a database to XML (where you do not need to navigate the XML and/or the data is huge), then you should consider other approaches.

The DOM API is standardized by the W3C.

Parsing with SAX:

SAX takes a completely different approach. It reads the XML document from beginning to end, but it does not keep anything in memory. Instead it fires events, and you add event handlers depending on your requirements.

Your event handler will be called, for example, when an element begins or ends, or when processing of the document begins or ends. For all events please follow this link.
So you register a handler (or more than one handler), and those handlers are called when an event occurs.
Here is sample code from a site that calculates the total amount from this XML.
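As a rough illustration of the same idea (not the code from that site), the sketch below sums every amount element with the SAX parser bundled in the JDK. The class name SaxTotal, the handler and the orders document are assumptions made up for this example.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class SaxTotal {

    // Hypothetical sample document, invented for this illustration.
    public static final String XML =
        "<orders>"
      + "<order><amount>10.5</amount></order>"
      + "<order><amount>20.0</amount></order>"
      + "</orders>";

    // Event handler: the parser calls these methods while it reads.
    static class TotalHandler extends DefaultHandler {
        double total = 0;
        private StringBuilder text; // non-null only while inside <amount>

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes attributes) {
            if ("amount".equals(qName)) {
                text = new StringBuilder();
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // The parser may deliver one text node in several chunks.
            if (text != null) {
                text.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if ("amount".equals(qName)) {
                total += Double.parseDouble(text.toString().trim());
                text = null;
            }
        }
    }

    // Parses the document once and returns the sum of all <amount> values.
    public static double total(String xml) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            TotalHandler handler = new TotalHandler();
            parser.parse(new InputSource(new StringReader(xml)), handler);
            return handler.total;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("total = " + total(XML));
    }
}
```

Notice that the handler has to remember for itself whether it is currently inside an amount element; the parser keeps no tree that it could ask.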


With SAX, first of all, you do not need to worry about memory consumption. If performance is the criterion (and you are only reading the XML, not modifying it), SAX is a much better choice than DOM. However, you do not get a tree structure in which you can ask for parent or child elements, so you have to keep track of where you are in the document yourself.

Parsing with StAX:
StAX is a newer technology than the other two, and it is the only one of them that was specified as a JSR (JSR-173).

Parsing with StAX looks like parsing with SAX. Again, StAX does not keep anything in memory, and the document is read once from beginning to end.

However, in SAX your event handler is called by the parser when an event occurs (push). In StAX, you ask the parser to move on to the next event (pull).

You can use StAX in two ways: the “cursor model” (XMLStreamReader) and the “iterator model” (XMLEventReader).

Here is a simple code fragment I found on Google. The “cursor model” looks like this:
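A minimal sketch of the cursor model with the StAX API bundled in the JDK could look like the following. It is not the original fragment: the class name StaxCursor and the orders document are assumptions invented for this example, and the reader is deliberately named parser.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class StaxCursor {

    // Hypothetical sample document, invented for this illustration.
    public static final String XML =
        "<orders>"
      + "<order><amount>10.5</amount></order>"
      + "<order><amount>20.0</amount></order>"
      + "</orders>";

    // Pulls events one by one and sums the text of every <amount> element.
    public static double total(String xml) {
        double total = 0;
        try {
            XMLStreamReader parser = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
            while (parser.hasNext()) {
                // We decide when to advance; the result is just an int code.
                int event = parser.next();
                if (event == XMLStreamConstants.START_ELEMENT
                        && "amount".equals(parser.getLocalName())) {
                    // Reads the text up to the matching end tag.
                    total += Double.parseDouble(parser.getElementText().trim());
                }
            }
            parser.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println("total = " + total(XML));
    }
}
```

In this model the reader itself is the cursor: after each call to parser.next() you query the same parser object for the current name or text.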

As you can see above, we request the next event ourselves (parser.next();). In the “iterator model” the logic is the same, but while iterating you receive an object that contains information about the current event, like this:
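A comparable sketch of the iterator model, again using the StAX API from the JDK, is shown here. The class name StaxIterator and the orders document are made up for this example; it is an illustration under those assumptions, not the original fragment.

```java
import java.io.StringReader;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

public class StaxIterator {

    // Hypothetical sample document, invented for this illustration.
    public static final String XML =
        "<orders>"
      + "<order><amount>10.5</amount></order>"
      + "<order><amount>20.0</amount></order>"
      + "</orders>";

    // Iterates over event objects and sums the text of every <amount> element.
    public static double total(String xml) {
        double total = 0;
        try {
            XMLEventReader reader = XMLInputFactory.newInstance()
                .createXMLEventReader(new StringReader(xml));
            while (reader.hasNext()) {
                // Each step hands back an object describing the event.
                XMLEvent event = reader.nextEvent();
                if (event.isStartElement()
                        && "amount".equals(
                            event.asStartElement().getName().getLocalPart())) {
                    // Reads the text up to the matching end tag.
                    total += Double.parseDouble(reader.getElementText().trim());
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println("total = " + total(XML));
    }
}
```

The difference from the cursor model is that the information lives in the XMLEvent object you receive, so it can be passed around or stored after the reader has moved on.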



Those were the technologies; there are also implementations.

After choosing your technology you can choose an implementation. There are different DOM, SAX and StAX implementations.

Check these:

http://java.sun.com/j2se/1.5.0/docs/guide/xml/jaxp/index.html
http://xerces.apache.org/xerces2-j/
http://stax.codehaus.org/Home

Or better, here is a link that has collected them all: http://java-source.net/open-source/xml-parsers

4 comments:

  1. There is also vtd-xml that is better than all the options above, for heavy duty xml processing (http://sdiwc.us/digitlib/journal_paper.php?paper=00000582.pdf)
