When using SAX-style XML parsers, the flow of execution depends entirely on the data being read sequentially from a file or stream. This inversion of control means that tracing the thread of execution requires a stack to keep track of passive calls to callback functions. Furthermore, our code (overrides of virtual functions) will be called by parsing code inside the Qt library.
Invoking the parser involves creating a reader and a handler, hooking them up, and calling parse(), as shown in Example 15.4.
Example 15.4. src/xml/sax1/tagreader.cpp
#include "myhandler.h"
#include <QFile>
#include <QXmlInputSource>
#include <QXmlSimpleReader>
#include <QDebug>

int main( int argc, char **argv ) {
    if ( argc < 2 ) {
        qDebug() << QString("Usage: %1 <xmlfiles>").arg(argv[0]);
        return 1;
    }
    MyHandler handler;
    QXmlSimpleReader reader;
    // Hook the handler up to the reader so that it receives parse events.
    reader.setContentHandler( &handler );
    // Parse each file named on the command line with the same reader.
    for ( int i=1; i < argc; ++i ) {
        QFile xmlFile( argv[i] );
        QXmlInputSource source( &xmlFile );
        reader.parse( source );
    }
    return 0;
}
The interface for parsing XML is described in the abstract base class QXmlContentHandler. We call this a passive interface because it is not our own code that calls MyHandler methods. A QXmlSimpleReader object reads an XML file and generates parse events, to which it then responds by calling MyHandler methods. Figure 15.1 shows the main classes involved.
For the XML reader to provide any useful information, it needs an object to receive parse events. This object, a parse event handler, must implement the interface specified by its abstract base class, so it can "plug into" the parser, as shown in Figure 15.2.
The handler derives (directly or indirectly) from QXmlContentHandler.
The virtual methods get called by the parser when it encounters various elements of the XML file during parsing.
This is event-driven parsing: You do not call these functions directly.
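The plug-in relationship described above can be sketched without Qt. The following is a minimal illustration in standard C++, not the Qt implementation: ContentHandler, TinyReader, and EventLog are hypothetical names, and the toy reader understands only plain start and end tags (no attributes, comments, or entities). It shows the reader owning the control flow and the application code being called back.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a content-handler interface (not the Qt class).
class ContentHandler {
public:
    virtual ~ContentHandler() {}
    virtual void startElement(const std::string& name) = 0;
    virtual void characters(const std::string& text) = 0;
    virtual void endElement(const std::string& name) = 0;
};

// Toy reader: it owns the control flow and calls the handler as it scans.
class TinyReader {
public:
    void setContentHandler(ContentHandler* h) { handler = h; }

    // Returns false if no handler is set or a tag is not terminated.
    bool parse(const std::string& xml) {
        if (!handler) return false;
        std::size_t pos = 0;
        while (pos < xml.size()) {
            if (xml[pos] == '<') {
                std::size_t close = xml.find('>', pos);
                if (close == std::string::npos) return false;
                std::string tag = xml.substr(pos + 1, close - pos - 1);
                if (!tag.empty() && tag[0] == '/')
                    handler->endElement(tag.substr(1));
                else
                    handler->startElement(tag);
                pos = close + 1;
            } else {
                // Everything up to the next '<' is character data.
                std::size_t next = xml.find('<', pos);
                if (next == std::string::npos) next = xml.size();
                handler->characters(xml.substr(pos, next - pos));
                pos = next;
            }
        }
        return true;
    }

private:
    ContentHandler* handler = nullptr;
};

// Application code: records which callbacks arrived, in order.
class EventLog : public ContentHandler {
public:
    std::vector<std::string> events;
    void startElement(const std::string& name) override { events.push_back("start:" + name); }
    void characters(const std::string& text) override { events.push_back("text:" + text); }
    void endElement(const std::string& name) override { events.push_back("end:" + name); }
};
```

Client code would create an EventLog, pass its address to setContentHandler(), and call parse() on a string; none of the three handler functions is ever called directly by the client.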
Example 15.5 shows a class that extends the default handler so that it can respond to parse events in the particular way required by our application.
Example 15.5. src/xml/sax1/myhandler.h
[ . . . . ]
#include <QXmlDefaultHandler>
class QString;

class MyHandler : public QXmlDefaultHandler {
  public:
    bool startDocument();
    bool startElement( const QString & namespaceURI,
                       const QString & localName,
                       const QString & qName,
                       const QXmlAttributes & atts);
    bool characters(const QString& text);
    bool endElement( const QString & namespaceURI,
                     const QString & localName,
                     const QString & qName );
  private:
    QString indent;
};
[ . . . . ]
Functions that are called passively are often referred to as callbacks.
They respond to events generated by the parser.
The client code for MyHandler is the QXmlSimpleReader class, inside the Qt XML Module.
If you do not override a handler method that your application relies on, or if the signature of your override does not match the base class version exactly, the corresponding QXmlDefaultHandler method, which does nothing, is called instead. In the body of a handler function, you can:
Store the parse results in a data structure.
Create objects according to certain rules.
Print or transform the data in a different format.
Do other useful things.
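Two of the points above, the do-nothing defaults and storing parse results in a data structure, can be sketched in standard C++. This is an illustration under stated assumptions rather than Qt code: DefaultHandler, ElementCounter, and MistypedHandler are hypothetical names, and the callbacks take a simplified single argument. The C++11 override keyword is one way to have the compiler catch a signature mismatch.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for a default handler: every callback is a no-op
// that returns true, so subclasses override only what they need.
class DefaultHandler {
public:
    virtual ~DefaultHandler() {}
    virtual bool startElement(const std::string& /*name*/) { return true; }
    virtual bool characters(const std::string& /*text*/) { return true; }
    virtual bool endElement(const std::string& /*name*/) { return true; }
};

// Stores parse results in a data structure: a count per element name.
class ElementCounter : public DefaultHandler {
public:
    // 'override' makes the compiler reject this declaration if its
    // signature does not match a virtual function in the base class.
    bool startElement(const std::string& name) override {
        ++counts[name];
        return true;
    }
    std::map<std::string, int> counts;
};

// A signature that differs from the base (here, an extra parameter)
// declares a new, unrelated function. Calls made through the base
// class still reach the do-nothing default.
class MistypedHandler : public DefaultHandler {
public:
    bool characters(const std::string& text, bool /*trim*/) {
        seen += text;
        return true;
    }
    std::string seen;
};
```

Because a parser calls handlers through the base class interface, MistypedHandler::characters() would never run during a parse, and no error would be reported. Marking it override would turn that silent failure into a compile error.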
Example 15.6 contains the definition of a concrete event handler.
Example 15.6. src/xml/sax1/myhandler.cpp
[ . . . . ]
QTextStream cout(stdout);

bool MyHandler::startDocument() {
    indent = "";
    return true;
}

bool MyHandler::characters(const QString& text) {
    QString t = text;
    cout << t.remove('\n');
    return true;
}

bool MyHandler::startElement( const QString&,
                              const QString&,
                              const QString& qName,
                              const QXmlAttributes& atts) {
    QString str = QString("\n%1\\%2").arg(indent).arg(qName);
    cout << str;
    if (atts.length()>0) {
        // Only the first attribute is printed.
        QString fieldName = atts.qName(0);
        QString fieldValue = atts.value(0);
        cout << QString("(%2=%3)").arg(fieldName).arg(fieldValue);
    }
    cout << "{";
    // Assumed to be four spaces, matching remove( 0, 4 ) in endElement().
    indent += "    ";
    return true;
}

bool MyHandler::endElement( const QString&,
                            const QString& ,
                            const QString& ) {
    indent.remove( 0, 4 );
    cout << "}";
    return true;
}
[ . . . . ]
The QXmlAttributes object passed into the startElement() function holds the name="value" attribute pairs that appeared in the element's start tag. Its entries can be accessed by index or looked up by name.
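The handler in Example 15.6 prints only the first attribute of each element. As a rough sketch of how all of them could be visited, the following standard C++ code models an attribute collection as an ordered list of pairs. The Attributes class and formatAttributes() are hypothetical stand-ins written for this illustration; they borrow the names length(), qName(), and value() but are not the Qt class.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an attribute collection: an ordered list of
// (name, value) pairs that supports access by index and lookup by name.
class Attributes {
public:
    void append(const std::string& name, const std::string& value) {
        pairs.push_back(std::make_pair(name, value));
    }
    std::size_t length() const { return pairs.size(); }
    const std::string& qName(std::size_t i) const { return pairs.at(i).first; }
    const std::string& value(std::size_t i) const { return pairs.at(i).second; }

    // Returns the value for 'name', or an empty string if it is absent.
    std::string value(const std::string& name) const {
        for (std::size_t i = 0; i < pairs.size(); ++i)
            if (pairs[i].first == name) return pairs[i].second;
        return std::string();
    }

private:
    std::vector<std::pair<std::string, std::string> > pairs;
};

// Formats every attribute as (name=value), in document order.
std::string formatAttributes(const Attributes& atts) {
    std::string out;
    for (std::size_t i = 0; i < atts.length(); ++i)
        out += "(" + atts.qName(i) + "=" + atts.value(i) + ")";
    return out;
}
```

The same loop shape, running from index 0 up to length(), would apply inside a startElement() override that needs every attribute rather than just the first.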
As it processes the file, the parse() function calls characters(), startElement(), and endElement() as these events are encountered in the file.
Whenever a string of ordinary characters is found between the start tag and the end tag of an element, it is passed to the characters() function.
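Because each callback sees only one event, a handler that needs context (which element is this text inside?) has to remember it between calls, typically with a stack. The following is a minimal sketch in standard C++, assuming simplified single-argument callbacks; PathTracker is a hypothetical class, not part of Qt. It also appends text in characters() instead of assigning it, on the assumption that a reader may deliver one run of text in more than one chunk.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical handler (not a Qt class) that tracks context across callbacks.
class PathTracker {
public:
    void startElement(const std::string& name) {
        open.push_back(name);   // remember the newly opened element
        text.clear();           // simplification: mixed content is not kept
    }

    // May be called several times for one run of text, so append.
    void characters(const std::string& chunk) { text += chunk; }

    // Returns false if the end tag does not match the innermost open element.
    bool endElement(const std::string& name) {
        if (open.empty() || open.back() != name) return false;
        if (!text.empty()) results.push_back(path() + "=" + text);
        open.pop_back();
        text.clear();
        return true;
    }

    // Slash-separated names of the currently open elements, outermost first.
    std::string path() const {
        std::string p;
        for (std::size_t i = 0; i < open.size(); ++i) p += "/" + open[i];
        return p;
    }

    std::vector<std::string> results;

private:
    std::vector<std::string> open;  // stack of open element names
    std::string text;               // text collected for the current element
};
```

The indent string in Example 15.6 plays a similar role in a lighter form: it grows on every start tag and shrinks on every end tag, so its length encodes the current nesting depth.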
We ran this program on Example 15.3, and it transformed that document into Example 15.7, which looks a little like LaTeX, another document format.
Example 15.7. src/xml/sax1/tagreader-output.txt
\section(id=xmlintro){
    \title{ Intro to XML }
    \para{ This is a paragraph }
    \ul{
        \li{ This is an unordered list item. }
        \li(c=textbook){ This only shows up in the textbook } }
    \p{ Look at this example code below: }
    \include(src=xmlsamplecode.cpp){}}
Generated: 2012-03-02 | © 2012 Alan Ezust and Paul Ezust. |