com.thebuzzmedia.sjxp.rule
Interface IRule<T>

Type Parameters:
T - The class type of any user-supplied object that the caller wishes to be passed through from one of the XMLParser's parse methods directly to the handler when an IRule matches. This is typically a data storage mechanism like a DAO or cache used to store the parsed value in some valuable way, but it can ultimately be anything. If you do not need to make use of the user object, there is no need to parameterize the class.
All Known Implementing Classes:
DefaultRule

public interface IRule<T>

Interface used to describe a "rule" in SJXP.

The most important part of a rule is its locationPath; this literal String value is how the XMLParser matches its current position inside an XML document with any IRules that want information from that location.

The type of the IRule indicates to the executing XMLParser when the rule should be queried for a match against its current position.

All implementors must provide an implementation for the handler method (handleParsedXXX or handleTag) matching the type of rule they have created. More specifically, if you are creating an IRule.Type.ATTRIBUTE rule, you need to implement the handleParsedAttribute(XMLParser, int, String, Object) method; if you are creating an IRule.Type.CHARACTER rule, you need to implement the handleParsedCharacters(XMLParser, String, Object) method; and if you are creating an IRule.Type.TAG rule, you need to implement the handleTag(XMLParser, boolean, Object) method.

Rule Matching

Rules execute every single time they match an element in an XML document. There is no XPath-like expression system to tell them to get you only the first, the 10th, or every other value from a document; you must implement that logic yourself inside the handleParsedXXX handlers.
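
As an illustration of implementing "first match only" yourself, here is a minimal sketch. It does not use the SJXP classes: TitleSink is a hypothetical user object and onTitle stands in for the body of a handleParsedCharacters implementation. The counter is kept in the user object rather than in the rule, on the assumption that you want the rule itself to stay stateless.

```java
import java.util.ArrayList;
import java.util.List;

class FirstMatchOnly {
    /** Hypothetical user object: holds the results and the per-parse match count. */
    static final class TitleSink {
        final List<String> titles = new ArrayList<>();
        int seen = 0;
    }

    /**
     * Stand-in for the body of handleParsedCharacters(parser, text, userObject).
     * The rule fires for every matching element, so the filtering happens here.
     */
    static void onTitle(String text, TitleSink sink) {
        sink.seen++;
        if (sink.seen == 1) {
            sink.titles.add(text); // keep only the first match
        }
    }

    public static void main(String[] args) {
        TitleSink sink = new TitleSink();
        // Simulate the parser calling the handler once per matching element.
        for (String title : new String[] {"Dune", "Emma", "Ulysses"}) {
            onTitle(title, sink);
        }
        System.out.println(sink.titles); // prints [Dune]
    }
}
```

Changing the condition to sink.seen == 10 or sink.seen % 2 == 1 would select the 10th or every other match in the same way.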

Instance Reuse

Instances of IRule are meant to be immutable and to maintain no internal state, which makes them safe for reuse among multiple instances of XMLParser.

Rule Format

The format of a location path is like a simple XPath rule with no expressions, for example:
 /library/book/title
 
would point to the "title" element inside the "book" element, which is inside the "library" element. If you are after a specific attribute of that element, simply provide its name in the list returned by getAttributeNames().
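
To make the idea of literal path matching concrete, the sketch below tracks the current element path while streaming through a document and compares it, as a plain String, with a location path. It uses the JDK's StAX API (javax.xml.stream), not SJXP, and is not a description of how XMLParser is implemented; the class and method names are made up for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class LocationPathSketch {
    /** Returns the text of every element whose path equals locationPath exactly. */
    static List<String> textAt(String xml, String locationPath) {
        List<String> matches = new ArrayList<>();
        StringBuilder path = new StringBuilder(); // e.g. "/library/book/title"
        StringBuilder text = new StringBuilder(); // text of the innermost open element
        try {
            XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                switch (reader.next()) {
                    case XMLStreamConstants.START_ELEMENT:
                        path.append('/').append(reader.getLocalName());
                        text.setLength(0);
                        break;
                    case XMLStreamConstants.CHARACTERS:
                        text.append(reader.getText()); // CDATA handling omitted for brevity
                        break;
                    case XMLStreamConstants.END_ELEMENT:
                        if (path.toString().equals(locationPath)) {
                            matches.add(text.toString());
                        }
                        path.setLength(path.lastIndexOf("/")); // pop the last segment
                        text.setLength(0);
                        break;
                    default:
                        break;
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Malformed XML", e);
        }
        return matches;
    }

    public static void main(String[] args) {
        String xml = "<library><book><title>Dune</title></book>"
                   + "<book><title>Emma</title></book></library>";
        System.out.println(textAt(xml, "/library/book/title")); // prints [Dune, Emma]
    }
}
```

Because the comparison is a plain String equality check, a path such as /library/title matches nothing in this document.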

Rule Format - Namespaces

Referring to a namespace-qualified element in an XML doc is easy; whether it is part of the location path or an attribute name, all you have to do is prefix the local name of the element with brackets ([]) and the full namespace URI within the brackets, like:
 /library/[http://w3.org/texts]book/title
 
In the example above, the "book" element is from a namespace defined by "http://w3.org/texts". Inside the actual XML markup, it is likely written with a friendly prefix that is declared at the top of the file, and would look more like this: <txt:book>. Prefixes are not exact, however, as they can change from document to document, so SJXP requires that you reference the namespace using the URI itself, and not a prefix.

In the case where the attribute itself is namespace-qualified, like <item rdf:about="blah" />, you use the same notation for the attribute name. In this case (assuming the official RDF namespace), the attribute name you would actually return looks like this:

  [http://www.w3.org/1999/02/22-rdf-syntax-ns#]about
 
It can look a little confusing, but it is exact and won't lead to impossible-to-debug scenarios.
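
If you build many of these strings, a small helper can keep the bracket notation consistent. The following is a sketch only; QualifiedNames, qualify and path are hypothetical names for this example and are not part of SJXP.

```java
class QualifiedNames {
    /** Prefixes localName with "[namespaceUri]"; returns localName unchanged if there is no namespace. */
    static String qualify(String namespaceUri, String localName) {
        if (namespaceUri == null || namespaceUri.isEmpty()) {
            return localName;
        }
        return "[" + namespaceUri + "]" + localName;
    }

    /** Joins already-qualified segments into a location path such as /library/book/title. */
    static String path(String... segments) {
        StringBuilder sb = new StringBuilder();
        for (String segment : segments) {
            sb.append('/').append(segment);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Location path with a namespace-qualified "book" element.
        System.out.println(path("library", qualify("http://w3.org/texts", "book"), "title"));
        // Namespace-qualified attribute name.
        System.out.println(qualify("http://www.w3.org/1999/02/22-rdf-syntax-ns#", "about"));
    }
}
```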

Rule Format - Default Namespaces

Some XML files define a default namespace using the xmlns attribute, by itself, near the top of the file (typically on the root element). If your document does this, any tag in the document that is not written with a namespace prefix has to be referenced with the default namespace, because that is how the XML file is technically defined.

An example of this is Slashdot's RDF feed (http://rss.slashdot.org/Slashdot/slashdot); a default namespace of "http://purl.org/rss/1.0/" is defined, so all un-prefixed tags in the document (like <title>, <link> or <description>) need to be qualified with that default URI, looking like this:

  [http://purl.org/rss/1.0/]title
 
when you define the location path for those parse elements.

It is important to be aware of this aspect of XML files; otherwise you will run into scenarios where you can't understand why the parsed value isn't being passed to you.
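
One way to see which namespace each element really belongs to is to ask a namespace-aware parser. The sketch below uses the JDK's StAX API (javax.xml.stream) rather than SJXP and prints every element name in the bracket notation described above; the sample document is a made-up fragment that only imitates the shape of an RSS 1.0 feed.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class DefaultNamespaceSketch {
    /** Lists each start tag as "[namespaceUri]localName", or just "localName" when unqualified. */
    static List<String> qualifiedElementNames(String xml) {
        List<String> names = new ArrayList<>();
        try {
            XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    String uri = reader.getNamespaceURI();
                    String local = reader.getLocalName();
                    names.add(uri == null || uri.isEmpty() ? local : "[" + uri + "]" + local);
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Malformed XML", e);
        }
        return names;
    }

    public static void main(String[] args) {
        String xml =
            "<rdf:RDF xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\""
          + " xmlns=\"http://purl.org/rss/1.0/\">"
          + "<item rdf:about=\"blah\"><title>Hello</title></item></rdf:RDF>";
        // The un-prefixed <item> and <title> tags report the default namespace URI.
        for (String name : qualifiedElementNames(xml)) {
            System.out.println(name);
        }
    }
}
```

Running something like this against a problem document is a quick way to find out which names need a URI in your location paths.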

Location Path & Attribute Name Strictness

The implementation of SJXP is based entirely on strict name and namespace URI matching. If you do not specify a namespace URI for your element or attribute names, then only non-namespace-qualified elements will be looked for and matched, and vice versa.

If the XML content you are parsing is sloppy and you aren't sure if the values will be qualified correctly in every case, you will need to define two IRules: one for non-namespace-qualified values and one for namespace-qualified values.

The SJXP library was purposefully designed to be pedantic to avoid "fuzzy" behavior that becomes maddening to debug in edge-case scenarios where you can't figure out why it is working one minute and breaking the next.

Given the need for XML parsing in everything from video games to banking applications, SJXP had to take a very conservative approach and be as pedantic as possible so as not to hide any behavior from the caller.

Author:
Riyad Kalla (software@thebuzzmedia.com)

Nested Class Summary
static class IRule.Type
          Used to describe the type of the parse rule.
 
Method Summary
 String[] getAttributeNames()
          Used to get a list of attribute names that are to be parsed from the element located at getLocationPath().
 String getLocationPath()
          Used to get the location path of the element inside the XML document that this rule is interested in.
 IRule.Type getType()
          Used to get the type of the rule.
 void handleParsedAttribute(XMLParser<T> parser, int index, String value, T userObject)
          Handler method called by the XMLParser when an IRule of type IRule.Type.ATTRIBUTE matches the parser's current location in the document.
 void handleParsedCharacters(XMLParser<T> parser, String text, T userObject)
          Handler method called by the XMLParser when an IRule of type IRule.Type.CHARACTER matches the parser's current location in the document.
 void handleTag(XMLParser<T> parser, boolean isStartTag, T userObject)
          Handler method called by the XMLParser when an IRule of type IRule.Type.TAG matches the parser's current location in the document.
 

Method Detail

getType

IRule.Type getType()
Used to get the type of the rule.

The XMLParser uses this value to decide when to call this rule to see if it matches the current position inside the doc and how to parse out the values the rule wants.

Returns:
the type of the rule.

getLocationPath

String getLocationPath()
Used to get the location path of the element inside the XML document that this rule is interested in.

This value is compared literally against the internal path state of the XMLParser to see if they match before processing the rule. If you have a rule that isn't executing, chances are your location path is incorrect or mistyped. It is also possible that your location path is correct but you have implemented the wrong handler method, so the default no-op one in DefaultRule is getting called.

Namespaces

Please refer to the class notes on the correct format used to define a path element that is namespace-qualified by using brackets.

Namespace qualifiers can be specified for both element paths and attribute names.

Returns:
the location path of the element inside the XML document that this rule is interested in.

getAttributeNames

String[] getAttributeNames()
Used to get a list of attribute names that are to be parsed from the element located at getLocationPath().

If the rule type is IRule.Type.CHARACTER, the attribute name list should be ignored.

Namespaces

Please refer to the class notes on the correct format used to define a path element that is namespace-qualified by using brackets.

Namespace qualifiers can be specified for both element paths and attribute names.

Returns:
a list of attribute names that are to be parsed from the element located at getLocationPath().

handleTag

void handleTag(XMLParser<T> parser,
               boolean isStartTag,
               T userObject)
Handler method called by the XMLParser when an IRule of type IRule.Type.TAG matches the parser's current location in the document.

This is a notification-style method; no data is parsed from the underlying document. The handler is merely called to give custom handling code a chance to respond to the matching open or close tag.

Parameters:
parser - The source XMLParser currently executing this rule. Providing access to the originating parser is handy if the rule wants to stop parsing by calling XMLParser.stop().
isStartTag - Used to indicate if this notification is being made because the START_TAG (true) was encountered or the END_TAG (false) was encountered.
userObject - The user-supplied object passed through from the XMLParser's parse method directly to this handler. This is typically a data storage mechanism like a DAO or cache used to hold parsed data, or null if you do not need to make use of this pass-through mechanism and passed nothing to the XMLParser when you initiated the parse.
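
A common use of start and end notifications is to group values that belong to the same parent element. The sketch below shows that idea without the SJXP classes: Library is a hypothetical user object and onBookTag stands in for the body of a handleTag implementation on a TAG rule for /library/book.

```java
import java.util.ArrayList;
import java.util.List;

class TagBoundarySketch {
    /** Hypothetical user object that collects values per <book> element. */
    static final class Library {
        final List<List<String>> books = new ArrayList<>();
        private List<String> current; // null while no <book> is open

        /** Called by other handlers (e.g. a CHARACTER rule) while a <book> is open. */
        void add(String value) {
            if (current != null) {
                current.add(value);
            }
        }
    }

    /** Stand-in for the body of handleTag(parser, isStartTag, userObject). */
    static void onBookTag(boolean isStartTag, Library library) {
        if (isStartTag) {
            library.current = new ArrayList<>(); // open tag: start a new record
        } else if (library.current != null) {
            library.books.add(library.current);  // close tag: commit the record
            library.current = null;
        }
    }

    public static void main(String[] args) {
        Library library = new Library();
        // Simulate the call sequence for two <book> elements with one title each.
        onBookTag(true, library);
        library.add("Dune");
        onBookTag(false, library);
        onBookTag(true, library);
        library.add("Emma");
        onBookTag(false, library);
        System.out.println(library.books); // prints [[Dune], [Emma]]
    }
}
```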

handleParsedAttribute

void handleParsedAttribute(XMLParser<T> parser,
                           int index,
                           String value,
                           T userObject)
Handler method called by the XMLParser when an IRule of type IRule.Type.ATTRIBUTE matches the parser's current location in the document.

Parameters:
parser - The source XMLParser currently executing this rule. Providing access to the originating parser is handy if the rule wants to stop parsing by calling XMLParser.stop().
index - The index of the attribute name (from getAttributeNames()) that this value belongs to.
value - The value for the given attribute.
userObject - The user-supplied object passed through from the XMLParser's parse method directly to this handler. This is typically a data storage mechanism like a DAO or cache used to hold parsed data, or null if you do not need to make use of this pass-through mechanism and passed nothing to the XMLParser when you initiated the parse.
See Also:
getLocationPath(), getAttributeNames()
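
The index parameter is what ties a value back to a name in the getAttributeNames() array. A minimal sketch of dispatching on it follows; it omits the parser parameter and uses a plain Map as a hypothetical user object so that it compiles without SJXP on the classpath. The "isbn" attribute is an assumption made for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class AttributeIndexSketch {
    /** What getAttributeNames() would return for this hypothetical rule. */
    static final String[] ATTRIBUTE_NAMES = {
        "isbn",                                               // index 0
        "[http://www.w3.org/1999/02/22-rdf-syntax-ns#]about"  // index 1
    };

    /** Stand-in for the body of handleParsedAttribute(parser, index, value, userObject). */
    static void onAttribute(int index, String value, Map<String, String> userObject) {
        switch (index) {
            case 0:
                userObject.put("isbn", value);
                break;
            case 1:
                userObject.put("about", value);
                break;
            default:
                throw new IllegalArgumentException("Unexpected attribute index: " + index);
        }
    }

    public static void main(String[] args) {
        Map<String, String> record = new LinkedHashMap<>();
        // Simulate the parser reporting both attributes of one element.
        onAttribute(0, "978-0441013593", record);
        onAttribute(1, "blah", record);
        System.out.println(record); // prints {isbn=978-0441013593, about=blah}
    }
}
```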

handleParsedCharacters

void handleParsedCharacters(XMLParser<T> parser,
                            String text,
                            T userObject)
Handler method called by the XMLParser when an IRule of type IRule.Type.CHARACTER matches the parser's current location in the document.

This method is not called by the XMLParser until all the character data has been coalesced together into a single String. You don't need to worry about re-combining chunked text elements.

Parameters:
parser - The source XMLParser currently executing this rule. Providing access to the originating parser is handy if the rule wants to stop parsing by calling XMLParser.stop().
text - The character data contained between the open and close tags described by getLocationPath().
userObject - The user-supplied object passed through from the XMLParser's parse method directly to this handler. This is typically a data storage mechanism like a DAO or cache used to hold parsed data, or null if you do not need to make use of this pass-through mechanism and passed nothing to the XMLParser when you initiated the parse.
See Also:
getLocationPath()
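
For context on what coalescing saves you from: a streaming parser may report one element's text as several separate events (plain text, replaced entities, CDATA sections). The sketch below re-combines them by hand using the JDK's StAX API (javax.xml.stream). It is only an illustration of the chore and makes no claim about how SJXP does it internally.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class CoalesceSketch {
    /** Concatenates every text-like event in the document into one String. */
    static String allText(String xml) {
        StringBuilder text = new StringBuilder();
        try {
            XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.CHARACTERS
                        || event == XMLStreamConstants.CDATA
                        || event == XMLStreamConstants.SPACE) {
                    text.append(reader.getText()); // may be only one chunk of the full text
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Malformed XML", e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        String xml = "<title>Tom &amp; Jerry<![CDATA[ <uncut>]]></title>";
        System.out.println(allText(xml)); // prints Tom & Jerry <uncut>
    }
}
```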

Copyright 2011 The Buzz Media, LLC