com.thebuzzmedia.sjxp
Class XMLParser<T>

java.lang.Object
  extended by com.thebuzzmedia.sjxp.XMLParser<T>
Type Parameters:
T - The class type of any user-supplied object that the caller wishes to be passed through from one of the XMLParser's parse methods directly to the handler when an IRule matches. This is typically a data storage mechanism like a DAO or cache used to store the parsed value in some valuable way, but it can ultimately be anything. If you do not need to make use of the user object, there is no need to parameterize the class.

public class XMLParser<T>
extends Object

Class used to define a parser that combines the performance of an XML Pull Parser with the ease of XPath-like expressions.

Thread Safety

This class is not thread-safe; however, an XMLParser instance can safely be reused to parse multiple files once the previous parse operation has finished.

Author:
Riyad Kalla (software@thebuzzmedia.com)

Nested Class Summary
(package private)  class XMLParser.Location
          Simple and fast class used to mock the behavior of a stack in the form of a string for the purposes of "pushing" and "popping" the parser's current location within an XML document as it processes START and END_TAG events.
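The string-backed location stack described above can be sketched roughly as follows. This is a minimal illustration, not the library's actual Location class; the name LocationSketch and its push/pop methods are hypothetical. It assumes the path is built by appending "/" plus the element name on START_TAG and truncated back on END_TAG.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch: the parser's location "stack" kept as a single path string.
class LocationSketch {
    private final StringBuilder path = new StringBuilder();
    // Path length before each push, so pop can truncate without rescanning the string.
    private final Deque<Integer> lengths = new ArrayDeque<>();

    // START_TAG: append "/name" to the current path.
    void push(String name) {
        lengths.push(path.length());
        path.append('/').append(name);
    }

    // END_TAG: remove the most recently pushed element name (no-op when empty).
    void pop() {
        if (!lengths.isEmpty()) {
            path.setLength(lengths.pop());
        }
    }

    // Current location, e.g. "/library/book/title", to compare against rule paths.
    @Override
    public String toString() {
        return path.toString();
    }
}
```

Remembering lengths instead of searching for the last "/" matters once namespaces are involved: a namespace-qualified name such as [http://w3.org/text]book itself contains slashes.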
 
Field Summary
static Boolean DEBUG
          Flag used to indicate if debugging output has been enabled by setting the "sjxp.debug" system property to true.
static Boolean ENABLE_NAMESPACES
          Flag used to indicate if this parser should be namespace-aware by setting the "sjxp.namespaces" system property to true.
static Boolean ENABLE_VALIDATION
          Flag used to indicate if this parser should validate the parsed XML against the referenced DTD or XML Schema by setting the "sjxp.validation" system property to true.
static String LOG_MESSAGE_PREFIX
          Prefix to every log message this library logs.
static org.xmlpull.v1.XmlPullParserFactory XPP_FACTORY
          Singleton XmlPullParserFactory instance used to create new underlying XmlPullParser instances for each instance of XMLParser.
 
Constructor Summary
XMLParser(IRule<T>... rules)
          Create a new parser that uses the given IRules when parsing any XML content.
 
Method Summary
protected  void doEndDocument(T userObject)
          Used to process a XmlPullParser.END_DOCUMENT event.
protected  void doEndTag(T userObject)
          Used to process a XmlPullParser.END_TAG event.
protected  void doParse(T userObject)
          Uses the underlying XmlPullParser to begin parsing through the XML content from the given stream.
protected  void doStartTag(T userObject)
          Used to process a XmlPullParser.START_TAG event.
protected  void doText(T userObject)
          Used to process a XmlPullParser.TEXT event.
protected  void initRules(IRule<T>... rules)
           
protected static void log(String message, Object... params)
          Helper method that checks whether a message is loggable before logging it and prepends a universal prefix to every log message generated by this library, making the log entries easy to parse visually or programmatically.
 void parse(InputStream source)
          Parse the XML out of the given stream matching the IRules provided when the XMLParser was instantiated.
 void parse(InputStream source, String encoding)
          Parse the XML out of the given stream (producing content matching the given encoding) matching the IRules provided when the XMLParser was instantiated.
 void parse(InputStream source, String encoding, T userObject)
          Parse the XML out of the given stream (producing content matching the given encoding) matching the IRules provided when the XMLParser was instantiated.
 void parse(InputStream source, T userObject)
          Parse the XML out of the given stream matching the IRules provided when the XMLParser was instantiated.
 void stop()
          Used to indicate to the parser that you would like it to stop parsing.
 String toString()
          Overridden to provide a nicely formatted representation of the parser for easy debugging.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

DEBUG

public static final Boolean DEBUG
Flag used to indicate if debugging output has been enabled by setting the "sjxp.debug" system property to true. This value will be false if the "sjxp.debug" system property is undefined or set to false.

This system property can be set on startup with:
-Dsjxp.debug=true or by calling System.setProperty(String, String) before this class is loaded.

This is false by default.
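As a rough sketch of how a default-false flag like DEBUG can be read from a system property: Boolean.getBoolean(String) returns true only when the named property exists and equals "true" (ignoring case). Because the result is captured in a static final field, it is fixed once the class is initialized, which is why the property must be set before the class is loaded. The holder class DebugFlagSketch is hypothetical; a default-false flag such as ENABLE_VALIDATION could be read the same way.

```java
// Hypothetical holder class; only the property name "sjxp.debug" comes from the docs above.
final class DebugFlagSketch {
    // Evaluated once, during class initialization; later System.setProperty calls have no effect.
    static final Boolean DEBUG = Boolean.getBoolean("sjxp.debug");

    private DebugFlagSketch() {}
}
```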


ENABLE_NAMESPACES

public static final Boolean ENABLE_NAMESPACES
Flag used to indicate if this parser should be namespace-aware by setting the "sjxp.namespaces" system property to true. This value will be true if the "sjxp.namespaces" system property is undefined. Namespace awareness can only be disabled by setting this system property to false.

NOTE: If you intentionally disable namespace awareness, any IRule you provide that uses namespace qualified values (e.g. [http://w3.org/text]book) will fail to match as the parser can no longer see namespace URIs.

This system property can be set on startup with:
-Dsjxp.namespaces=false (to disable namespace awareness) or by calling System.setProperty(String, String) before this class is loaded.

This is true by default.
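Unlike DEBUG, this flag defaults to true, so only an explicit "false" should turn it off. A minimal sketch of that rule follows; the class and method names are hypothetical, and treating any value other than "false" as true is an assumption. The library keeps the value in a static final field; a method is used here only so the behavior is easy to try.

```java
// Hypothetical sketch of a default-true flag: only an explicit "false" disables it.
final class NamespaceFlagSketch {
    static boolean readNamespaceFlag() {
        String value = System.getProperty("sjxp.namespaces");
        // Undefined (null) or any value other than "false" keeps namespace awareness on.
        return value == null || !value.trim().equalsIgnoreCase("false");
    }

    private NamespaceFlagSketch() {}
}
```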


ENABLE_VALIDATION

public static final Boolean ENABLE_VALIDATION
Flag used to indicate if this parser should validate the parsed XML against the referenced DTD or XML Schema by setting the "sjxp.validation" system property to true. This value will be false if the "sjxp.validation" system property is undefined or set to false.

This system property can be set on startup with:
-Dsjxp.validation=true or by calling System.setProperty(String, String) before this class is loaded.

This is false by default.


LOG_MESSAGE_PREFIX

public static final String LOG_MESSAGE_PREFIX
Prefix to every log message this library logs. Using a well-defined prefix helps make it easier both visually and programmatically to scan log files for messages produced by this library.

The value is "[sjxp] " (including the space).

See Also:
Constant Field Values

XPP_FACTORY

public static final org.xmlpull.v1.XmlPullParserFactory XPP_FACTORY
Singleton XmlPullParserFactory instance used to create new underlying XmlPullParser instances for each instance of XMLParser.

Constructor Detail

XMLParser

public XMLParser(IRule<T>... rules)
          throws IllegalArgumentException,
                 XMLParserException
Create a new parser that uses the given IRules when parsing any XML content.

Parameters:
rules - The rules applied to any parsed content.
Throws:
IllegalArgumentException - if rules is null or empty.
XMLParserException - if the XPP_FACTORY is unable to create a new XmlPullParser instance and throws an exception.
Method Detail

log

protected static void log(String message,
                          Object... params)
Helper method that checks whether a message is loggable before logging it and prepends a universal prefix to every log message generated by this library, making the log entries easy to parse visually or programmatically.

If a message cannot be logged (logging is disabled), this method returns immediately.

NOTE: Because Java auto-boxes primitive arguments into Objects when building the params array, avoid calling this method with primitive values unless DEBUG is true; otherwise the VM spends time on unnecessary auto-boxing.

Parameters:
message - The log message in format string syntax that will be logged.
params - The parameters that will be substituted into the placeholders in the original message before it is logged.
See Also:
LOG_MESSAGE_PREFIX
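A minimal sketch of the check-then-prefix behavior described above, assuming String.format-style placeholders and a simple debug switch. LogSketch, its debug field and its format helper are hypothetical; only the "[sjxp] " prefix value comes from LOG_MESSAGE_PREFIX.

```java
// Hypothetical logging helper; "[sjxp] " is the documented LOG_MESSAGE_PREFIX value.
final class LogSketch {
    static final String LOG_MESSAGE_PREFIX = "[sjxp] ";
    // Assumption: a debug switch decides whether messages are loggable at all.
    static boolean debug = false;

    // Builds the prefixed message; params fill the format-string placeholders.
    static String format(String message, Object... params) {
        return LOG_MESSAGE_PREFIX + String.format(message, params);
    }

    // Returns immediately when logging is disabled, before any formatting work.
    static void log(String message, Object... params) {
        if (!debug) {
            return;
        }
        System.out.println(format(message, params));
    }

    private LogSketch() {}
}
```

The early return skips formatting, but it cannot undo boxing: an int argument is boxed at the call site before log runs. That is why the NOTE above suggests guarding calls with primitive arguments, e.g. if (DEBUG) log("Parsed %d tags", count).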

toString

public String toString()
Overridden to provide a nicely formatted representation of the parser for easy debugging.

As an added bonus, since XMLParsers are intended to be immutable, the result of toString is computed on the first call and cached; every later call returns the cached String instead of re-computing it.

Overrides:
toString in class Object
Returns:
a nicely formatted representation of the parser for easy debugging.
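The lazy caching described above can be sketched like this. CachedToStringSketch is a hypothetical stand-in whose only state is a fixed list of rule paths; the cache is safe to reuse only because that state never changes after construction.

```java
import java.util.Arrays;

// Hypothetical immutable stand-in that caches its toString result.
final class CachedToStringSketch {
    private final String[] rulePaths;
    // Built on the first toString call, then reused.
    private String cachedToString;

    CachedToStringSketch(String... rulePaths) {
        this.rulePaths = rulePaths.clone(); // defensive copy keeps the object immutable
    }

    @Override
    public String toString() {
        if (cachedToString == null) {
            cachedToString = getClass().getSimpleName() + "[rules=" + Arrays.toString(rulePaths) + "]";
        }
        return cachedToString;
    }
}
```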

stop

public void stop()
Used to indicate to the parser that you would like it to stop parsing.

Internally the parser uses a simple boolean to indicate if it should keep parsing. A call to this method sets that boolean to false, which the parser checks at the next parse event before stopping.

This is a safe operation that simply flips a flag, telling the underlying XmlPullParser to stop working once it has finished its current parse event and to return from whichever parse method was called.
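A minimal sketch of the flag-based stop described above, including the reset that the parse methods perform on each new call. StoppableLoopSketch is hypothetical: a list of strings stands in for pull-parser events, and a BiConsumer stands in for a rule handler that may call stop().

```java
import java.util.List;
import java.util.function.BiConsumer;

// Hypothetical event loop illustrating a stop flag; not the library's implementation.
class StoppableLoopSketch {
    // A plain boolean: stop() is expected to be called by a handler on the parsing thread.
    private boolean continueParsing;

    void stop() {
        continueParsing = false;
    }

    // Handles events until they run out or stop() is called; returns how many were handled.
    int parse(List<String> events, BiConsumer<StoppableLoopSketch, String> handler) {
        continueParsing = true; // each new parse call resets the stopped state
        int handled = 0;
        for (String event : events) {
            if (!continueParsing) {
                break; // flag is checked before each next event
            }
            handler.accept(this, event);
            handled++;
        }
        return handled;
    }
}
```

A handler that calls stop() while processing the second event lets that event finish, and the loop ends before the third.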


parse

public void parse(InputStream source)
           throws IllegalArgumentException,
                  XMLParserException
Parse the XML out of the given stream matching the IRules provided when the XMLParser was instantiated.

The underlying XmlPullParser will attempt to determine the stream's encoding based on the pull parser spec or fall back to a default of UTF-8.

This class makes no attempt to close the given InputStream; the caller must take care to clean up that resource.

Stopping Parsing

Parsing can be safely stopped by calling stop(). This gives IRule implementations control over stopping parsing, for example, if an arbitrary threshold is hit. A follow-up call to any of the parse methods will reset the stopped state.

Parameters:
source - The stream that XML content will be read out of.
Throws:
IllegalArgumentException - if source is null.
XMLParserException - if any error occurs with the underlying stream during parsing or if the XML content itself is malformed and the underlying pull parser cannot parse it.
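Because the parse methods leave the stream open, the caller is responsible for closing it. A sketch of that ownership pattern with try-with-resources follows. The consume method is a hypothetical stand-in for a parse call (with a real parser it would be parser.parse(source)), and TrackingStream exists only so the example can observe that close() ran.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

final class CallerClosesSketch {
    // Hypothetical stand-in for a parse method that, like XMLParser.parse, never closes its input.
    static int consume(InputStream source) throws IOException {
        int count = 0;
        while (source.read() != -1) {
            count++;
        }
        return count; // source is deliberately left open
    }

    // Wraps a stream so the example can observe whether close() was called.
    static final class TrackingStream extends FilterInputStream {
        boolean closed;

        TrackingStream(InputStream in) {
            super(in);
        }

        @Override
        public void close() throws IOException {
            closed = true;
            super.close();
        }
    }

    // The caller owns the stream, so try-with-resources closes it even if parsing throws.
    static boolean parseAndClose(byte[] xml) throws IOException {
        TrackingStream in = new TrackingStream(new ByteArrayInputStream(xml));
        try (InputStream source = in) {
            consume(source); // e.g. parser.parse(source) with a real XMLParser
        }
        return in.closed;
    }

    private CallerClosesSketch() {}
}
```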

parse

public void parse(InputStream source,
                  T userObject)
           throws IllegalArgumentException,
                  XMLParserException
Parse the XML out of the given stream matching the IRules provided when the XMLParser was instantiated.

The underlying XmlPullParser will attempt to determine the stream's encoding based on the pull parser spec or fall back to a default of UTF-8.

This class makes no attempt to close the given InputStream; the caller must take care to clean up that resource.

Stopping Parsing

Parsing can be safely stopped by calling stop(). This gives IRule implementations control over stopping parsing, for example, if an arbitrary threshold is hit. A follow-up call to any of the parse methods will reset the stopped state.

Parameters:
source - The stream that XML content will be read out of.
userObject - The user-supplied object passed through from this parse method to the matching IRule's handleXXX method when a match is found, or null if no user object is needed. Passing a user object through is simply a convenience that gives the handler methods on the IRules access to objects, such as DAOs, that can be used to persist or process parsed data.
Throws:
IllegalArgumentException - if source is null.
XMLParserException - if any error occurs with the underlying stream during parsing or if the XML content itself is malformed and the underlying pull parser cannot parse it.
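The user-object pass-through can be illustrated with simplified, hypothetical stand-ins: TextRuleSketch is not the real IRule interface (which has its own handleXXX methods), and a List plays the role of a DAO. The point is that the parser never inspects the object; it only hands it to each handler.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified rule callback; not the real IRule interface.
interface TextRuleSketch<T> {
    void handleText(String text, T userObject);
}

final class PassThroughParserSketch<T> {
    private final List<TextRuleSketch<T>> rules = new ArrayList<>();

    @SafeVarargs
    PassThroughParserSketch(TextRuleSketch<T>... rules) {
        for (TextRuleSketch<T> rule : rules) {
            this.rules.add(rule);
        }
    }

    // The userObject is not inspected here; it is simply handed to every rule.
    void parse(List<String> texts, T userObject) {
        for (String text : texts) {
            for (TextRuleSketch<T> rule : rules) {
                rule.handleText(text, userObject);
            }
        }
    }
}
```

Usage: with T = List<String>, a rule such as (text, store) -> store.add(text) collects every parsed value into the list passed to parse, without the rule holding any reference of its own.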

parse

public void parse(InputStream source,
                  String encoding)
           throws IllegalArgumentException,
                  UnsupportedEncodingException,
                  XMLParserException
Parse the XML out of the given stream (producing content matching the given encoding) matching the IRules provided when the XMLParser was instantiated.

This class makes no attempt to close the given InputStream; the caller must take care to clean up that resource.

Stopping Parsing

Parsing can be safely stopped by calling stop(). This gives IRule implementations control over stopping parsing, for example, if an arbitrary threshold is hit. A follow-up call to any of the parse methods will reset the stopped state.

Parameters:
source - The stream that XML content will be read out of.
encoding - The character encoding (e.g. "UTF-8") of the data from the given stream. If the encoding is not known, passing null or calling parse(InputStream) instead allows the underlying XmlPullParser to try to determine the encoding automatically.
Throws:
IllegalArgumentException - if source is null.
UnsupportedEncodingException - if encoding is an encoding name that is not recognized by Charset.isSupported(String).
XMLParserException - if any error occurs with the underlying stream during parsing or if the XML content itself is malformed and the underlying pull parser cannot parse it.
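The encoding contract above (null means auto-detect, otherwise the name must pass Charset.isSupported) could be checked up front roughly as follows. This is a sketch, not the library's code; EncodingCheckSketch is hypothetical. Treating a syntactically illegal name (for which Charset.isSupported throws IllegalCharsetNameException) as unsupported is an assumption made here.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;

// Hypothetical pre-check mirroring the documented contract.
final class EncodingCheckSketch {
    // Returns normally for null (auto-detect) or a supported charset name.
    static void checkEncoding(String encoding) throws UnsupportedEncodingException {
        if (encoding == null) {
            return; // let the pull parser determine the encoding itself
        }
        boolean supported;
        try {
            supported = Charset.isSupported(encoding);
        } catch (IllegalCharsetNameException e) {
            supported = false; // assumption: treat illegal names as unsupported
        }
        if (!supported) {
            throw new UnsupportedEncodingException(encoding);
        }
    }

    private EncodingCheckSketch() {}
}
```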

parse

public void parse(InputStream source,
                  String encoding,
                  T userObject)
           throws IllegalArgumentException,
                  UnsupportedEncodingException,
                  XMLParserException
Parse the XML out of the given stream (producing content matching the given encoding) matching the IRules provided when the XMLParser was instantiated.

This class makes no attempt to close the given InputStream; the caller must take care to clean up that resource.

Stopping Parsing

Parsing can be safely stopped by calling stop(). This gives IRule implementations control over stopping parsing, for example, if an arbitrary threshold is hit. A follow-up call to any of the parse methods will reset the stopped state.

Parameters:
source - The stream that XML content will be read out of.
encoding - The character encoding (e.g. "UTF-8") of the data from the given stream. If the encoding is not known, passing null or calling parse(InputStream) instead allows the underlying XmlPullParser to try to determine the encoding automatically.
userObject - The user-supplied object passed through from this parse method to the matching IRule's handleXXX method when a match is found, or null if no user object is needed. Passing a user object through is simply a convenience that gives the handler methods on the IRules access to objects, such as DAOs, that can be used to persist or process parsed data.
Throws:
IllegalArgumentException - if source is null.
UnsupportedEncodingException - if encoding is an encoding name that is not recognized by Charset.isSupported(String).
XMLParserException - if any error occurs with the underlying stream during parsing or if the XML content itself is malformed and the underlying pull parser cannot parse it.

initRules

protected void initRules(IRule<T>... rules)

doParse

protected void doParse(T userObject)
                throws IOException,
                       org.xmlpull.v1.XmlPullParserException
Uses the underlying XmlPullParser to begin parsing through the XML content from the given stream. This method's implementation is simple, acting like a traffic-cop responding to XmlPullParser.START_TAG, XmlPullParser.TEXT, XmlPullParser.END_TAG and XmlPullParser.END_DOCUMENT events by calling the appropriate doXXX methods.

Developers creating a subclass of XMLParser are meant to override one or more of the doStartTag(Object), doText(Object), doEndTag(Object) and doEndDocument(Object) methods to add custom behavior, without necessarily overriding this central method.

Stopping Parsing

Parsing can be safely stopped by calling stop(). This gives IRule implementations control over stopping parsing, for example, if an arbitrary threshold is hit. A follow-up call to any of the parse methods will reset the stopped state.

Parameters:
userObject - The user-supplied object passed through from this parse method to the matching IRule's handleXXX method when a match is found, or null if no user object is needed. Passing a user object through is simply a convenience that gives the handler methods on the IRules access to objects, such as DAOs, that can be used to persist or process parsed data.
Throws:
IOException - if an error occurs with reading from the underlying InputStream given to one of the public parse methods.
org.xmlpull.v1.XmlPullParserException - if an error occurs while parsing the XML content from the underlying stream; typically resulting from malformed or invalid XML.
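The traffic-cop structure described above is a template-method pattern: the loop only routes events, and behavior lives in overridable hooks. A sketch under simplifying assumptions follows; EventSketch is a hypothetical enum standing in for XmlPullParser's START_TAG, TEXT, END_TAG and END_DOCUMENT event constants, and the class names are hypothetical too.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for XmlPullParser's event-type constants.
enum EventSketch { START_TAG, TEXT, END_TAG, END_DOCUMENT }

// Base class: doParse only routes events; behavior lives in the doXXX hooks.
class DispatchingParserSketch<T> {
    protected final List<String> trace = new ArrayList<>();

    protected void doParse(List<EventSketch> events, T userObject) {
        for (EventSketch event : events) {
            switch (event) {
                case START_TAG:    doStartTag(userObject); break;
                case TEXT:         doText(userObject); break;
                case END_TAG:      doEndTag(userObject); break;
                case END_DOCUMENT: doEndDocument(userObject); return; // nothing follows END_DOCUMENT
            }
        }
    }

    protected void doStartTag(T userObject) { trace.add("start"); }
    protected void doText(T userObject) { trace.add("text"); }
    protected void doEndTag(T userObject) { trace.add("end"); }
    protected void doEndDocument(T userObject) { /* default: stub for subclasses */ }
}

// A subclass customizes one hook instead of rewriting doParse.
class CountingParserSketch extends DispatchingParserSketch<int[]> {
    @Override
    protected void doEndDocument(int[] counter) {
        counter[0] = trace.size(); // events dispatched before the end of the document
    }
}
```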

doStartTag

protected void doStartTag(T userObject)
Used to process a XmlPullParser.START_TAG event.

By default this updates the internal location state of the parser, processes all IRules of type IRule.Type.TAG and processes all IRules of type IRule.Type.ATTRIBUTE that match the parser's current location.

Parameters:
userObject - The user-supplied object passed through from this parse method to the matching IRule's handleXXX method when a match is found, or null if no user object is needed. Passing a user object through is simply a convenience that gives the handler methods on the IRules access to objects, such as DAOs, that can be used to persist or process parsed data.

doText

protected void doText(T userObject)
Used to process a XmlPullParser.TEXT event.

By default this processes all IRules of type IRule.Type.CHARACTER that match the parser's current location.

Parameters:
userObject - The user-supplied object passed through from this parse method to the matching IRule's handleXXX method when a match is found, or null if no user object is needed. Passing a user object through is simply a convenience that gives the handler methods on the IRules access to objects, such as DAOs, that can be used to persist or process parsed data.
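The "rules of type X that match the parser's current location" selection used by doStartTag and doText can be sketched as a filter on rule type and location path. The model below is hypothetical: RuleTypeSketch mirrors the three IRule.Type values named above, and the linear scan with exact string comparison is an assumption; a real implementation might index rules by path instead.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical mirror of the IRule.Type values named above.
enum RuleTypeSketch { TAG, ATTRIBUTE, CHARACTER }

final class RuleSketch {
    final RuleTypeSketch type;
    final String locationPath;

    RuleSketch(RuleTypeSketch type, String locationPath) {
        this.type = type;
        this.locationPath = locationPath;
    }
}

final class RuleMatcherSketch {
    // Rules of the requested type whose path equals the current location.
    static List<RuleSketch> matching(List<RuleSketch> rules, RuleTypeSketch type, String location) {
        List<RuleSketch> result = new ArrayList<>();
        for (RuleSketch rule : rules) {
            if (rule.type == type && rule.locationPath.equals(location)) {
                result.add(rule);
            }
        }
        return result;
    }

    private RuleMatcherSketch() {}
}
```

On a START_TAG at /library/book, the TAG and ATTRIBUTE selections would fire; on a TEXT event inside /library/book/title, only the CHARACTER selection would.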

doEndTag

protected void doEndTag(T userObject)
Used to process a XmlPullParser.END_TAG event.

Parameters:
userObject - The user-supplied object passed through from this parse method to the matching IRule's handleXXX method when a match is found, or null if no user object is needed. Passing a user object through is simply a convenience that gives the handler methods on the IRules access to objects, such as DAOs, that can be used to persist or process parsed data.

doEndDocument

protected void doEndDocument(T userObject)
Used to process a XmlPullParser.END_DOCUMENT event.

By default this method simply logs a debug statement if debugging is enabled, but this stub is provided to make overriding the default behavior easier if desired.

Parameters:
userObject - The user-supplied object passed through from this parse method to the matching IRule's handleXXX method when a match is found, or null if no user object is needed. Passing a user object through is simply a convenience that gives the handler methods on the IRules access to objects, such as DAOs, that can be used to persist or process parsed data.

Copyright 2011 The Buzz Media, LLC