/**
 * Copyright 2011 The Buzz Media, LLC
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *   http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package com.thebuzzmedia.sjxp.rule;

import com.thebuzzmedia.sjxp.XMLParser;

/**
 * Interface used to describe a "rule" in SJXP.
 * <p/>
 * The most important part of a rule is its <code>locationPath</code>; this
 * literal {@link String} value is how the {@link XMLParser} matches up its
 * current position inside of an XML doc with any {@link IRule}s that want
 * information from that location.
 * <p/>
 * The <code>type</code> of the {@link IRule} indicates to the executing
 * {@link XMLParser} when the rule should be queried for a match against its
 * current position.
 * <p/>
 * All implementors must provide an implementation for the handler method
 * (<code>handleTag</code> or <code>handleParsedXXX</code>) matching the
 * <code>type</code> of rule they have created. More specifically, if you are
 * creating a {@link Type#TAG} rule, you need to implement the
 * {@link #handleTag(XMLParser, boolean, Object)} method; if you are creating a
 * {@link Type#ATTRIBUTE} rule, you need to implement the
 * {@link #handleParsedAttribute(XMLParser, int, String, Object)} method; if you
 * are creating a {@link Type#CHARACTER} rule, you need to implement the
 * {@link #handleParsedCharacters(XMLParser, String, Object)} method.
 * <h3>Rule Matching</h3>
 * Rules will execute every single time they match an element in an XML
 * document. There is no XPath-like expression system to tell them to only get
 * you the first, the 10th or every other value from a document; you must
 * implement that logic yourself inside of the handler methods.
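 * <p/>
 * A minimal sketch of such logic, keeping only the first matched value. The
 * class name <code>FirstTitleRule</code> is hypothetical, the user object is
 * assumed to be a <code>java.util.List</code> of strings, and returning an
 * empty array from <code>getAttributeNames()</code> for a non-attribute rule
 * is an assumption. The "first match only" state lives in the user object,
 * so the rule itself keeps no state:
 *
 * <pre>{@code
 * public class FirstTitleRule implements IRule<List<String>> {
 *     public Type getType() { return Type.CHARACTER; }
 *     public String getLocationPath() { return "/library/book/title"; }
 *     public String[] getAttributeNames() { return new String[0]; }
 *
 *     // Not used by a CHARACTER rule; left as a no-op.
 *     public void handleTag(XMLParser<List<String>> parser,
 *             boolean isStartTag, List<String> titles) { }
 *
 *     // Not used by a CHARACTER rule; left as a no-op.
 *     public void handleParsedAttribute(XMLParser<List<String>> parser,
 *             int index, String value, List<String> titles) { }
 *
 *     public void handleParsedCharacters(XMLParser<List<String>> parser,
 *             String text, List<String> titles) {
 *         // Store the first title only; ignore every later match.
 *         if (titles.isEmpty()) {
 *             titles.add(text);
 *         }
 *     }
 * }
 * }</pre>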
 * <h3>Instance Reuse</h3>
 * Instances of {@link IRule} are meant to be immutable and maintain no internal
 * state, which makes them safe for reuse among multiple instances of
 * {@link XMLParser}.
 * <h3>Rule Format</h3>
 * The format of a location path is like a simple XPath rule with no
 * expressions, for example:
 *
 * <pre>
 * /library/book/title
 * </pre>
 *
 * would point to the "title" element inside of the "book" element, which is
 * inside the "library" element. If you are after a specific attribute of that
 * element, simply provide its name as an attribute argument.
 * <h3>Rule Format - Namespaces</h3>
 * Referring to a namespace-qualified element in an XML doc is easy; whether it
 * is part of the location path or an attribute name, all you have to do is
 * prefix the local name of the element with brackets ([]) and the full
 * namespace URI within the brackets, like:
 *
 * <pre>
 * /library/[http://w3.org/texts]book/title
 * </pre>
 *
 * In the example above, the "book" element is from a namespace defined by
 * "http://w3.org/texts". Inside the actual XML markup, it is likely written
 * with a friendly prefix that is bound to the URI at the top of the file, and
 * would look more like this: <em>&lt;txt:book&gt;</em>. Prefixes are not
 * exact, however, as they can change from document to document, so SJXP
 * requires that you reference the namespace using the URI itself, and not a
 * prefix.
 * <p/>
 * In the case where the attribute itself is namespace-qualified, like
 * <em>&lt;item rdf:about="blah" /&gt;</em>, you use the same notation for the
 * attribute name; in this case (assuming the official RDF namespace) the
 * attribute name you would actually return would look like this:
 *
 * <pre>
 *  [http://www.w3.org/1999/02/22-rdf-syntax-ns#]about
 * </pre>
 *
 * It can look a little confusing, but it is exact and won't lead to
 * impossible-to-debug scenarios.
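 * <p/>
 * As an illustrative sketch, a {@link Type#ATTRIBUTE} rule for that case
 * could return values like the following from its getters. The location path
 * is hypothetical and assumes an un-namespaced <code>item</code> element
 * inside a <code>feed</code> root element:
 *
 * <pre>
 * public String getLocationPath() {
 *     return "/feed/item";
 * }
 *
 * public String[] getAttributeNames() {
 *     // Namespace URI in brackets, followed by the attribute's local name.
 *     return new String[] { "[http://www.w3.org/1999/02/22-rdf-syntax-ns#]about" };
 * }
 * </pre>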
 * <h3>Rule Format - Default Namespaces</h3>
 * Some XML files define a default namespace using the <code>xmlns</code>
 * attribute, by itself (with no prefix), typically on the root element. If
 * your document does this, any tag in the document that isn't written with a
 * namespace prefix has to be referenced with the default namespace, because
 * that is how the XML file is technically defined.
 * <p/>
 * An example of this is Slashdot's RDF feed
 * (http://rss.slashdot.org/Slashdot/slashdot); a default namespace of
 * "http://purl.org/rss/1.0/" is defined, so all un-prefixed tags in the
 * document (like &lt;title&gt;, &lt;link&gt; or &lt;description&gt;) need to
 * be qualified with that default URI, looking like this:
 *
 * <pre>
 *  [http://purl.org/rss/1.0/]title
 * </pre>
 *
 * when you define the location path for those parse elements.
 * <p/>
 * It is important to be aware of this aspect of XML files; otherwise you will
 * run into scenarios where you can't understand why the parsed value isn't
 * being passed to you.
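 * <p/>
 * For instance, assuming the feed follows the standard RSS 1.0 layout (an
 * <code>rdf:RDF</code> root element containing <code>item</code> elements), a
 * complete location path for each item's title would qualify every segment
 * with its namespace URI. This is a sketch under that assumption, not a path
 * verified against the live feed:
 *
 * <pre>
 * /[http://www.w3.org/1999/02/22-rdf-syntax-ns#]RDF/[http://purl.org/rss/1.0/]item/[http://purl.org/rss/1.0/]title
 * </pre>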
 * <h3>Location Path &amp; Attribute Name Strictness</h3>
 * The implementation of SJXP is all based around strict name and namespace URI
 * matching. If you do not specify a namespace URI for your element or attribute
 * names, then only non-namespace-qualified elements will be looked for and
 * matched, and vice versa.
 * <p/>
 * If the XML content you are parsing is sloppy and you aren't sure if the
 * values will be qualified correctly in every case, you will need to define
 * two {@link IRule}s: one for non-namespace-qualified values and one for
 * namespace-qualified values.
 * <p/>
 * The SJXP library was purposefully designed to be pedantic to avoid "fuzzy"
 * behavior that becomes maddening to debug in edge-case scenarios where you
 * can't figure out why it is working one minute and breaking the next.
 * <p/>
 * Given the need for XML parsing in everything from video games to banking
 * applications, SJXP had to take a very conservative approach and be as
 * pedantic as possible so as not to hide any behavior from the caller.
 *
 * @param <T>
 *            The class type of any user-supplied object that the caller wishes
 *            to be passed through from one of the {@link XMLParser}'s
 *            <code>parse</code> methods directly to the handler when an
 *            {@link IRule} matches. This is typically a data storage mechanism
 *            like a DAO or cache used to store the parsed value in some
 *            valuable way, but it can ultimately be anything. If you do not
 *            need to make use of the user object, there is no need to
 *            parameterize the class.
 *
 * @author Riyad Kalla (software@thebuzzmedia.com)
 */
public interface IRule<T> {
        /**
         * Used to describe the type of the parse rule.
         */
        public static enum Type {
                /**
                 * Type used to indicate a rule interested in START_TAG and END_TAG
                 * events for the matching location path.
                 * <p/>
                 * This can be handy when no parsed data is needed from the underlying
                 * XML, but rather a simple notification that the location path existed
                 * in the XML (e.g. counting element occurrences).
                 */
                TAG,
                /**
                 * Type used to indicate that this rule describes 1 or more attribute
                 * values that the caller wants parsed.
                 */
                ATTRIBUTE,
                /**
                 * Type used to indicate that this rule describes the character data
                 * between an open and close tag that the caller wants parsed.
                 */
                CHARACTER;
        }

        /**
         * Used to get the type of the rule.
         * <p/>
         * The {@link XMLParser} uses this value to decide when to call this rule to
         * see if it matches the current position inside the doc and how to parse
         * out the values the rule wants.
         *
         * @return the type of the rule.
         */
        public Type getType();

        /**
         * Used to get the location path of the element inside the XML document that
         * this rule is interested in.
         * <p/>
         * This value is compared literally against the internal path state of the
         * {@link XMLParser} to see if they match before processing the rule. If you
         * have a rule that isn't executing, chances are your location path is
         * incorrect or mistyped. It is also possible that your location path is
         * correct but you have implemented the wrong handler method, so the
         * default no-op one in {@link DefaultRule} is getting called.
         * <h3>Namespaces</h3>
         * Please refer to the class notes on the correct format used to define a
         * path element that is namespace-qualified by using brackets.
         * <p/>
         * Namespace qualifiers can be specified for both element paths and
         * attribute names.
         *
         * @return the location path of the element inside the XML document that
         *         this rule is interested in.
         */
        public String getLocationPath();

        /**
         * Used to get a list of attribute names that are to be parsed from the
         * element located at {@link #getLocationPath()}.
         * <p/>
         * If the rule type is {@link Type#CHARACTER}, the attribute name list
         * should be ignored.
         * <h3>Namespaces</h3>
         * Please refer to the class notes on the correct format used to define
         * an attribute name that is namespace-qualified by using brackets.
         * <p/>
         * Namespace qualifiers can be specified for both element paths and
         * attribute names.
         *
         * @return a list of attribute names that are to be parsed from the element
         *         located at {@link #getLocationPath()}.
         */
        public String[] getAttributeNames();

        /**
         * Handler method called by the {@link XMLParser} when an {@link IRule} of
         * type {@link Type#TAG} matches the parser's current location in the
         * document.
         * <p/>
         * This is a notification-style method; no data is parsed from the
         * underlying document, the handler is merely called to give custom handling
         * code a chance to respond to the matching open or close tag.
         *
         * @param parser
         *            The source {@link XMLParser} currently executing this rule.
         *            Providing access to the originating parser is handy if the
         *            rule wants to stop parsing by calling
         *            {@link XMLParser#stop()}.
         * @param isStartTag
         *            Used to indicate if this notification is being made because
         *            the START_TAG (<code>true</code>) was encountered or the
         *            END_TAG (<code>false</code>) was encountered.
         * @param userObject
         *            The user-supplied object passed through from the
         *            {@link XMLParser}'s <code>parse</code> method directly to this
         *            handler. This is typically a data storage mechanism like a DAO
         *            or cache used to hold parsed data, or <code>null</code> if you
         *            do not need to make use of this pass-through mechanism and
         *            passed nothing to the {@link XMLParser} when you initiated the
         *            parse.
         */
        public void handleTag(XMLParser<T> parser, boolean isStartTag, T userObject);

        /**
         * Handler method called by the {@link XMLParser} when an {@link IRule} of
         * type {@link Type#ATTRIBUTE} matches the parser's current location in the
         * document.
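         * <p/>
         * As a rough sketch (not taken from the library itself), assuming the
         * user object is a <code>java.util.Map</code> of strings, the
         * <code>index</code> can be mapped back to the attribute name it
         * belongs to like this:
         *
         * <pre>{@code
         * public void handleParsedAttribute(XMLParser<Map<String, String>> parser,
         *         int index, String value, Map<String, String> userObject) {
         *     // index points into the array returned by getAttributeNames()
         *     userObject.put(getAttributeNames()[index], value);
         * }
         * }</pre>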
         *
         * @param parser
         *            The source {@link XMLParser} currently executing this rule.
         *            Providing access to the originating parser is handy if the
         *            rule wants to stop parsing by calling
         *            {@link XMLParser#stop()}.
         * @param index
         *            The index of the attribute name (from
         *            {@link #getAttributeNames()}) that this value belongs to.
         * @param value
         *            The value for the given attribute.
         * @param userObject
         *            The user-supplied object passed through from the
         *            {@link XMLParser}'s <code>parse</code> method directly to this
         *            handler. This is typically a data storage mechanism like a DAO
         *            or cache used to hold parsed data, or <code>null</code> if you
         *            do not need to make use of this pass-through mechanism and
         *            passed nothing to the {@link XMLParser} when you initiated the
         *            parse.
         *
         * @see #getLocationPath()
         * @see #getAttributeNames()
         */
        public void handleParsedAttribute(XMLParser<T> parser, int index,
                        String value, T userObject);

        /**
         * Handler method called by the {@link XMLParser} when an {@link IRule} of
         * type {@link Type#CHARACTER} matches the parser's current location in the
         * document.
         * <p/>
         * This method is not called by the {@link XMLParser} until all the
         * character data has been coalesced together into a single {@link String}.
         * You don't need to worry about re-combining chunked text elements.
         *
         * @param parser
         *            The source {@link XMLParser} currently executing this rule.
         *            Providing access to the originating parser is handy if the
         *            rule wants to stop parsing by calling
         *            {@link XMLParser#stop()}.
         * @param text
         *            The character data contained between the open and close tags
         *            described by {@link #getLocationPath()}.
         * @param userObject
         *            The user-supplied object passed through from the
         *            {@link XMLParser}'s <code>parse</code> method directly to this
         *            handler. This is typically a data storage mechanism like a DAO
         *            or cache used to hold parsed data, or <code>null</code> if you
         *            do not need to make use of this pass-through mechanism and
         *            passed nothing to the {@link XMLParser} when you initiated the
         *            parse.
         *
         * @see #getLocationPath()
         */
        public void handleParsedCharacters(XMLParser<T> parser, String text,
                        T userObject);
}