org.cdlib.xtf.textEngine
Class QueryRequestParser

Object
  extended by QueryRequestParser

public class QueryRequestParser
extends Object

Processes URL parameters into a Lucene query, using a stylesheet to perform the heavy lifting.

Author:
Martin Haye

Nested Class Summary
private static class QueryRequestParser.QueryEntry
          Keeps track of all the queries for a given field
 class QueryRequestParser.QueryFormatError
          Exception class used to report errors from the query generator.
 
Field Summary
private  File baseDir
          Keeps track of the servlet base directory, used to map relative file paths.
private  Configuration config
          Configuration object used when building trees (only created if necessary).
private static int DEFAULT_MAX_SNIPPETS
          Default value for maxSnippets, so we can recognize the difference between the default and a user-specified value.
private  Vector groupSpecs
          Accumulated list of grouping specifications
private  QueryRequest req
          Partially parsed request in progress
private  HashSet specifiedGlobalAttrs
          Global attributes that were actually specified in the query
private  NodeInfo topNode
          The top-level source node
 
Constructor Summary
QueryRequestParser()
           
 
Method Summary
private  Query createMultiFieldQuery(EasyNode parent, String[] fields, float[] boosts, SpanQuery[] spanQueries, Vector<Query> notVec, int slop, int maxMetaSnippets, int maxTextSnippets)
          Does the work of creating the guts of a keyword query.
static Query deChunk(Query q)
          Ensures that the given query, if it is a span query on the "text" field, is wrapped by a de-chunking query.
private  void error(String message)
          Convenience function to throw a QueryGenException with the given message.
 File getBaseDir()
          Get the base directory from which relative paths are resolved
 Source getSource()
          Get an XML source suitable for re-creating this query
private  String getText(EasyNode el)
          Ensures that the element has only a single child node (ignoring attributes), and that it's a text node.
private  boolean isWildcardTerm(Term term)
          Determines if the term contains a wildcard character ('*' or '?')
(package private)  Query makeProxQuery(EasyNode parent, int slop, String field, int maxSnippets)
          Generate a proximity query on a field.
private  int onceOnlyAttrib(int oldVal, EasyNode el, String attribName)
          Like parseIntAttrib(), but adds processing to ensure that global parameters are specified only once (or, if specified multiple times, that the same value is used each time).
private  String onceOnlyAttrib(String oldVal, EasyNode el, String attribName)
          Like parseStringAttrib(), but adds processing to ensure that global parameters are specified only once (or, if specified multiple times, that the same value is used each time).
private  String onceOnlyPath(String oldVal, EasyNode el, String attribName)
          Like onceOnlyAttrib(), but also ensures that the given file can actually be resolved as a path that can be read.
private  boolean parseBooleanAttrib(EasyNode el, String attribName)
          Locate the named attribute and retrieve its value as a boolean.
private  boolean parseBooleanAttrib(EasyNode el, String attribName, boolean defaultVal)
          Locate the named attribute and retrieve its value as a boolean.
private  boolean parseBooleanAttrib(EasyNode el, String attribName, boolean useDefault, boolean defaultVal)
          Locate the named attribute and retrieve its value as a boolean.
(package private)  void parseFacetSpec(EasyNode el)
          Parses a 'facet' element and adds a FacetSpec to the query.
private  String parseField(EasyNode el, String parentField)
          If the given element has a 'field' attribute, return its value; otherwise return 'parentField'.
private  float[] parseFieldBoosts(EasyNode parent, String attrName)
          Parse a list of field boosts.
private  String[] parseFieldNames(EasyNode parent, String attrName)
          Parse a list of field names.
private  float parseFloatAttrib(EasyNode el, String attribName)
          Locate the named attribute and retrieve its value as a float.
private  float parseFloatAttrib(EasyNode el, String attribName, boolean useDefault, float defaultVal)
          Locate the named attribute and retrieve its value as a float.
private  float parseFloatAttrib(EasyNode el, String attribName, float defaultVal)
          Locate the named attribute and retrieve its value as a float.
private  int parseIntAttrib(EasyNode el, String attribName)
          Locate the named attribute and retrieve its value as an integer.
private  int parseIntAttrib(EasyNode el, String attribName, boolean useDefault, int defaultVal)
          Locate the named attribute and retrieve its value as an integer.
private  int parseIntAttrib(EasyNode el, String attribName, int defaultVal)
          Locate the named attribute and retrieve its value as an integer.
(package private)  void parseMainAttrib(EasyNode el, String attrName, String val)
          Parse an attribute on the main query element (or, for backward compatibility, on its immediate children).
private  Query parseMoreLike(EasyNode parent, String field, int maxSnippets)
          Parses a "more like this" query.
private  Query parseMultiFieldQuery(EasyNode parent, String field, int maxSnippets)
          Parse a 'keyword' query, known internally as a multi-field AND.
private  void parseOutput(EasyNode main)
          Processes the main query node, turning it into a Lucene query.
private  void parseOutputTop(EasyNode output)
          Processes the output of the generator stylesheet, turning it into a Lucene query.
private  Query parseQuery(EasyNode parent, String field, int maxSnippets)
          Recursively parse a query.
private  Query parseQuery2(EasyNode parent, String name, String field, int maxSnippets)
          Main work of recursively parsing a query.
private  Query parseRange(EasyNode parent, String field, int maxSnippets)
          Parse a range query.
 QueryRequest parseRequest(Source queryDoc, File baseDir)
          Produce a Lucene query from the intermediate format that is normally produced by the formatting stylesheet.
 QueryRequest parseRequest(Source queryDoc, File baseDir, String defaultIndexPath)
          Produce a Lucene query from the intermediate format that is normally produced by the formatting stylesheet.
private  SpanQuery parseSectionType(EasyNode parent, String field, int maxSnippets)
          Parse a 'sectionType' query element, if one is present.
(package private)  void parseSpellcheck(EasyNode el)
          Parses a 'spellcheck' element and adds a SpellcheckParams to the query.
private  String parseStringAttrib(EasyNode el, String attribName)
          Locate the named attribute and retrieve its value as a string.
private  String parseStringAttrib(EasyNode el, String attribName, boolean useDefault, String defaultVal)
          Locate the named attribute and retrieve its value as a string.
private  String parseStringAttrib(EasyNode el, String attribName, String defaultVal)
          Locate the named attribute and retrieve its value as a string.
private  SpanQuery parseSubDocument(EasyNode parent, String field, int maxSnippets)
          Parse a 'subDocument' query element, if one is present.
private  Term parseTerm(EasyNode parent, String field, String expectedName)
          Parses a 'term' element.
private  SpanQuery processSpanJoin(String name, Vector subVec, Vector notVec, int maxSnippets)
          Joins a number of span queries together using a span query.
(package private)  SpanQuery processSpanNots(SpanQuery query, Vector notClauses, int maxSnippets)
          If any 'not' clauses are present, this builds a query that filters them out of the main query.
private  Query simplifyBooleanQuery(BooleanQuery bq)
          Simplify a BooleanQuery that contains other BooleanQuery instances with the same type of clauses.
 
Methods inherited from class Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

req

private QueryRequest req
Partially parsed request in progress


baseDir

private File baseDir
Keeps track of the servlet base directory, used to map relative file paths.


config

private Configuration config
Configuration object used when building trees (only created if necessary).


topNode

private NodeInfo topNode
The top-level source node


specifiedGlobalAttrs

private HashSet specifiedGlobalAttrs
Global attributes that were actually specified in the query


groupSpecs

private Vector groupSpecs
Accumulated list of grouping specifications


DEFAULT_MAX_SNIPPETS

private static final int DEFAULT_MAX_SNIPPETS
Default value for maxSnippets, so we can recognize the difference between the default and a user-specified value.

See Also:
Constant Field Values
Constructor Detail

QueryRequestParser

public QueryRequestParser()
Method Detail

parseRequest

public QueryRequest parseRequest(Source queryDoc,
                                 File baseDir,
                                 String defaultIndexPath)
                          throws QueryGenException,
                                 QueryRequestParser.QueryFormatError
Produce a Lucene query from the intermediate format that is normally produced by the formatting stylesheet. Includes setting a default indexPath, so the query doesn't have to contain one internally.

Parameters:
queryDoc - A document containing the query.
Throws:
QueryGenException
QueryRequestParser.QueryFormatError

parseRequest

public QueryRequest parseRequest(Source queryDoc,
                                 File baseDir)
                          throws QueryGenException,
                                 QueryRequestParser.QueryFormatError
Produce a Lucene query from the intermediate format that is normally produced by the formatting stylesheet.

Parameters:
queryDoc - A document containing the query.
Throws:
QueryGenException
QueryRequestParser.QueryFormatError

getSource

public Source getSource()
Get an XML source suitable for re-creating this query


getBaseDir

public File getBaseDir()
Get the base directory from which relative paths are resolved


error

private void error(String message)
            throws QueryGenException
Convenience function to throw a QueryGenException with the given message.

Throws:
QueryGenException

parseOutputTop

private void parseOutputTop(EasyNode output)
                     throws QueryGenException,
                            QueryRequestParser.QueryFormatError
Processes the output of the generator stylesheet, turning it into a Lucene query.

Parameters:
output - The stylesheet output, whose first (and only) child should be a 'query' element.
Throws:
QueryGenException
QueryRequestParser.QueryFormatError

parseOutput

private void parseOutput(EasyNode main)
Processes the main query node, turning it into a Lucene query.

Parameters:
main - The 'query' element

parseFacetSpec

void parseFacetSpec(EasyNode el)
Parses a 'facet' element and adds a FacetSpec to the query.

Parameters:
el - The 'facet' element to parse

parseSpellcheck

void parseSpellcheck(EasyNode el)
Parses a 'spellcheck' element and adds a SpellcheckParams to the query.

Parameters:
el - The 'spellcheck' element to parse

parseQuery

private Query parseQuery(EasyNode parent,
                         String field,
                         int maxSnippets)
                  throws QueryGenException
Recursively parse a query.

Throws:
QueryGenException

parseQuery2

private Query parseQuery2(EasyNode parent,
                          String name,
                          String field,
                          int maxSnippets)
                   throws QueryGenException
Main work of recursively parsing a query.

Throws:
QueryGenException

parseMultiFieldQuery

private Query parseMultiFieldQuery(EasyNode parent,
                                   String field,
                                   int maxSnippets)
Parse a 'keyword' query, known internally as a multi-field AND.


createMultiFieldQuery

private Query createMultiFieldQuery(EasyNode parent,
                                    String[] fields,
                                    float[] boosts,
                                    SpanQuery[] spanQueries,
                                    Vector<Query> notVec,
                                    int slop,
                                    int maxMetaSnippets,
                                    int maxTextSnippets)
Does the work of creating the guts of a keyword query.


simplifyBooleanQuery

private Query simplifyBooleanQuery(BooleanQuery bq)
Simplify a BooleanQuery that contains other BooleanQuery instances with the same type of clauses. If any boosting is involved, the optimization is skipped.
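
As an illustration of the flattening described above, the following sketch merges a nested group into its parent when both use the same operator and the nested group carries no boost. It is a simplified model under stated assumptions, not the XTF code: the Node class, its op/term/boost fields, and the simplify method are hypothetical stand-ins for Lucene's BooleanQuery and its clauses, and treating a boost of 1.0 as "no boosting" is likewise an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model for illustration only; not Lucene's BooleanQuery.
final class BoolFlattenSketch {
    static final class Node {
        final String op;      // "AND" or "OR" for a group, null for a leaf
        final String term;    // leaf text, null for a group
        final float boost;    // 1.0f is assumed to mean "no boost"
        final List<Node> kids = new ArrayList<>();

        Node(String op, String term, float boost) {
            this.op = op;
            this.term = term;
            this.boost = boost;
        }
    }

    static Node leaf(String term) {
        return new Node(null, term, 1.0f);
    }

    static Node group(String op, float boost, Node... kids) {
        Node n = new Node(op, null, boost);
        for (Node k : kids) n.kids.add(k);
        return n;
    }

    // Pulls the children of an unboosted, same-operator sub-group up into the parent.
    static Node simplify(Node n) {
        if (n.op == null) return n; // leaves are returned unchanged
        Node out = new Node(n.op, null, n.boost);
        for (Node kid : n.kids) {
            Node k = simplify(kid);
            boolean sameType = n.op.equals(k.op);
            boolean unboosted = k.boost == 1.0f;
            if (sameType && unboosted) {
                out.kids.addAll(k.kids);
            } else {
                out.kids.add(k); // different operator or boosted: keep nested
            }
        }
        return out;
    }
}
```

In this model, an AND group nested inside an AND group collapses into a single level, while a boosted sub-group stays nested so that its boost still applies to exactly the clauses it covered.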


parseMainAttrib

void parseMainAttrib(EasyNode el,
                     String attrName,
                     String val)
Parse an attribute on the main query element (or, for backward compatibility, on its immediate children). If the attribute isn't recognized, an error exception is thrown.


parseSectionType

private SpanQuery parseSectionType(EasyNode parent,
                                   String field,
                                   int maxSnippets)
                            throws QueryGenException
Parse a 'sectionType' query element, if one is present. If not, simply returns null.

Throws:
QueryGenException

parseSubDocument

private SpanQuery parseSubDocument(EasyNode parent,
                                   String field,
                                   int maxSnippets)
                            throws QueryGenException
Parse a 'subDocument' query element, if one is present. If not, simply returns null.

Throws:
QueryGenException

parseField

private String parseField(EasyNode el,
                          String parentField)
                   throws QueryGenException
If the given element has a 'field' attribute, return its value; otherwise return 'parentField'. Also checks that a 'field' attribute is not specified when parentField has already been set.
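
The inheritance rule can be sketched as follows. This is a minimal illustration, not the XTF implementation: the class FieldInheritSketch and the method resolveField are hypothetical, the element's own 'field' attribute is passed in as a plain string (null when absent), and IllegalArgumentException stands in for QueryGenException.

```java
// Hypothetical sketch of the field inheritance check; not the actual XTF code.
final class FieldInheritSketch {
    // ownField: value of the element's 'field' attribute, or null if it has none.
    // parentField: field already established by an enclosing element, or null.
    static String resolveField(String ownField, String parentField) {
        if (ownField == null) {
            return parentField; // nothing specified here: inherit from the parent
        }
        if (parentField != null) {
            // Assumption: any second specification is rejected, even an identical one.
            throw new IllegalArgumentException(
                "'field' attribute cannot be specified when a parent has already set it");
        }
        return ownField;
    }
}
```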

Throws:
QueryGenException

processSpanJoin

private SpanQuery processSpanJoin(String name,
                                  Vector subVec,
                                  Vector notVec,
                                  int maxSnippets)
Joins a number of span queries together using a span query.

Parameters:
name - 'and', 'or', 'near', etc.
subVec - Vector of sub-clauses
notVec - Vector of not clauses (may be empty)
Returns:
A new Span query joining the sub-clauses.

deChunk

public static Query deChunk(Query q)
Ensures that the given query, if it is a span query on the "text" field, is wrapped by a de-chunking query.


isWildcardTerm

private boolean isWildcardTerm(Term term)
Determines if the term contains a wildcard character ('*' or '?')
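
A minimal sketch of the check, assuming the term's text is available as a plain string (the real method receives a Lucene Term). The class WildcardSketch and the method hasWildcard are hypothetical names used only for illustration.

```java
// Hypothetical sketch; operates on the term text rather than a Lucene Term.
final class WildcardSketch {
    // Returns true if the text contains '*' or '?'.
    static boolean hasWildcard(String termText) {
        return termText.indexOf('*') >= 0 || termText.indexOf('?') >= 0;
    }
}
```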


parseRange

private Query parseRange(EasyNode parent,
                         String field,
                         int maxSnippets)
                  throws QueryGenException
Parse a range query.

Throws:
QueryGenException

processSpanNots

SpanQuery processSpanNots(SpanQuery query,
                          Vector notClauses,
                          int maxSnippets)
If any 'not' clauses are present, this builds a query that filters them out of the main query.


makeProxQuery

Query makeProxQuery(EasyNode parent,
                    int slop,
                    String field,
                    int maxSnippets)
              throws QueryGenException
Generate a proximity query on a field. This uses the de-duplicating span system.

Parameters:
parent - The element containing the field name and terms.
Throws:
QueryGenException

parseMoreLike

private Query parseMoreLike(EasyNode parent,
                            String field,
                            int maxSnippets)
Parses a "more like this" query.


parseFieldNames

private String[] parseFieldNames(EasyNode parent,
                                 String attrName)
Parse a list of field names. They can be separated by spaces, tabs, commas, semicolons, or pipe symbols.

Parameters:
parent - Node to look at
attrName - Attribute to get the list from
Returns:
Array of field names, or null if none.
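
The separator rule above can be sketched in a few lines. This is an illustration under stated assumptions, not the XTF implementation: the class FieldNameListSketch and the method splitFieldNames are hypothetical, and the method takes the raw attribute string rather than an EasyNode and attribute name.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not the actual XTF code.
final class FieldNameListSketch {
    // Splits a raw attribute value on spaces, tabs, commas, semicolons, or pipes.
    // Returns null when the value is missing or contains no names.
    static String[] splitFieldNames(String raw) {
        if (raw == null) return null;
        List<String> names = new ArrayList<>();
        for (String tok : raw.split("[ \\t,;|]+")) {
            if (!tok.isEmpty()) names.add(tok); // skip the empty token a leading separator produces
        }
        return names.isEmpty() ? null : names.toArray(new String[0]);
    }
}
```

Treating a run of mixed separators as a single delimiter means that a value such as "title, creator" yields two names rather than two names plus an empty one.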

parseFieldBoosts

private float[] parseFieldBoosts(EasyNode parent,
                                 String attrName)
Parse a list of field boosts. They can be separated by spaces, tabs, commas, semicolons, or pipe symbols.

Parameters:
parent - Node to look at
attrName - Attribute to get the list from
Returns:
Array of field boosts, or null if none.
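
A boost list can be handled the same way, with each token converted to a float. The sketch below is an assumption-based illustration: FieldBoostListSketch and splitFieldBoosts are hypothetical names, the input is the raw attribute string, and letting Float.parseFloat raise NumberFormatException on a malformed token is a choice made here, not documented XTF behavior.

```java
import java.util.Arrays;

// Hypothetical sketch; not the actual XTF code.
final class FieldBoostListSketch {
    // Parses boosts separated by spaces, tabs, commas, semicolons, or pipes.
    // Returns null when the value is missing or contains no numbers.
    static float[] splitFieldBoosts(String raw) {
        if (raw == null) return null;
        String[] toks = raw.split("[ \\t,;|]+");
        float[] tmp = new float[toks.length];
        int n = 0;
        for (String tok : toks) {
            if (tok.isEmpty()) continue;      // produced by a leading separator
            tmp[n++] = Float.parseFloat(tok); // throws NumberFormatException on bad input
        }
        return n == 0 ? null : Arrays.copyOf(tmp, n);
    }
}
```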

parseTerm

private Term parseTerm(EasyNode parent,
                       String field,
                       String expectedName)
                throws QueryGenException
Parses a 'term' element. If the element is not named as expected, an exception is thrown.

Parameters:
parent - The element to parse
Throws:
QueryGenException

getText

private String getText(EasyNode el)
                throws QueryGenException
Ensures that the element has only a single child node (ignoring attributes), and that it's a text node.

Parameters:
el - The element to get the text of
Returns:
The string value of the text
Throws:
QueryGenException

onceOnlyAttrib

private int onceOnlyAttrib(int oldVal,
                           EasyNode el,
                           String attribName)
Like parseIntAttrib(), but adds processing to ensure that global parameters are specified only once (or, if specified multiple times, that the same value is used each time).

Parameters:
oldVal - Current value of the global parameter
el - Element to get the attribute from
attribName - Name of the attribute
Returns:
New value for the parameter

onceOnlyAttrib

private String onceOnlyAttrib(String oldVal,
                              EasyNode el,
                              String attribName)
Like parseStringAttrib(), but adds processing to ensure that global parameters are specified only once (or, if specified multiple times, that the same value is used each time).

Parameters:
oldVal - Current value of the global parameter
el - Element to get the attribute from
attribName - Name of the attribute
Returns:
New value for the parameter
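
The once-only rule shared by both onceOnlyAttrib() overloads can be sketched as below. This is a hedged illustration rather than the XTF code: OnceOnlySketch and its onceOnly method are hypothetical, the new value is passed in directly instead of being read from an EasyNode, the set plays the role that specifiedGlobalAttrs appears to play in this class, and IllegalStateException is an assumed stand-in for the real error type.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of "specify once, or repeat the same value".
final class OnceOnlySketch {
    // Names of global attributes seen so far.
    private final Set<String> specified = new HashSet<>();

    // oldVal: current value of the global parameter; newVal: value found on the element.
    String onceOnly(String oldVal, String attribName, String newVal) {
        if (specified.contains(attribName) && !newVal.equals(oldVal)) {
            throw new IllegalStateException(
                "'" + attribName + "' was specified more than once with different values");
        }
        specified.add(attribName);
        return newVal;
    }
}
```

Tracking which attributes were explicitly specified, rather than comparing against a default, lets a repeated value pass while a conflicting one is rejected.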

onceOnlyPath

private String onceOnlyPath(String oldVal,
                            EasyNode el,
                            String attribName)
Like onceOnlyAttrib(), but also ensures that the given file can actually be resolved as a path that can be read.

Parameters:
oldVal - Current value of the global parameter
el - Element to get the attribute from
attribName - Name of the attribute
Returns:
New value for the parameter
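
The path check might look something like the following sketch, which resolves a relative path against a base directory and verifies that the result is readable. The names PathResolveSketch and resolveReadable are hypothetical, the once-only bookkeeping is omitted, and the use of IllegalArgumentException and of an absolute path as the return value are assumptions.

```java
import java.io.File;

// Hypothetical sketch; not the actual XTF code.
final class PathResolveSketch {
    // Resolves 'path' against 'baseDir' when it is relative, then checks readability.
    static String resolveReadable(File baseDir, String path) {
        File f = new File(path);
        if (!f.isAbsolute()) {
            f = new File(baseDir, path);
        }
        if (!f.canRead()) {
            throw new IllegalArgumentException(
                "Path '" + path + "' cannot be read (resolved to " + f.getAbsolutePath() + ")");
        }
        return f.getAbsolutePath();
    }
}
```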

parseIntAttrib

private int parseIntAttrib(EasyNode el,
                           String attribName)
                    throws QueryGenException
Locate the named attribute and retrieve its value as an integer. If not found, an error exception is thrown.

Parameters:
el - Element to search
attribName - Attribute to find
Throws:
QueryGenException

parseIntAttrib

private int parseIntAttrib(EasyNode el,
                           String attribName,
                           int defaultVal)
                    throws QueryGenException
Locate the named attribute and retrieve its value as an integer. If not found, return a default value.

Parameters:
el - EasyNode to search
attribName - Attribute to find
defaultVal - If not found, return this value.
Throws:
QueryGenException

parseIntAttrib

private int parseIntAttrib(EasyNode el,
                           String attribName,
                           boolean useDefault,
                           int defaultVal)
                    throws QueryGenException
Locate the named attribute and retrieve its value as an integer. Handles default processing if requested.

Parameters:
el - EasyNode to search
attribName - Attribute to find
useDefault - true to supply a default value if none found, false to throw an exception if not found.
defaultVal - If not found and useDefault is true, return this value.
Throws:
QueryGenException
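
The three parseIntAttrib() overloads suggest a common pattern in which the two shorter forms delegate to the four-argument form. The sketch below shows that pattern under stated assumptions: IntAttribSketch is a hypothetical class, a Map of attribute names to values stands in for the EasyNode, IllegalArgumentException stands in for QueryGenException, and the delegation itself is a guess at the structure rather than confirmed XTF behavior.

```java
import java.util.Map;

// Hypothetical sketch of required/optional attribute parsing.
final class IntAttribSketch {
    // Required attribute: missing value is an error.
    static int parseIntAttrib(Map<String, String> attrs, String name) {
        return parseIntAttrib(attrs, name, false, 0);
    }

    // Optional attribute: missing value yields defaultVal.
    static int parseIntAttrib(Map<String, String> attrs, String name, int defaultVal) {
        return parseIntAttrib(attrs, name, true, defaultVal);
    }

    // Shared worker handling both cases.
    static int parseIntAttrib(Map<String, String> attrs, String name,
                              boolean useDefault, int defaultVal) {
        String raw = attrs.get(name);
        if (raw == null) {
            if (useDefault) return defaultVal;
            throw new IllegalArgumentException("Required attribute '" + name + "' not found");
        }
        try {
            return Integer.parseInt(raw.trim());
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException(
                "Attribute '" + name + "' is not a valid integer: " + raw, e);
        }
    }
}
```

The float, boolean, and string variants documented below follow the same three-overload shape, so a similar delegation would plausibly apply to them.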

parseFloatAttrib

private float parseFloatAttrib(EasyNode el,
                               String attribName)
                        throws QueryGenException
Locate the named attribute and retrieve its value as a float. If not found, an error exception is thrown.

Parameters:
el - Element to search
attribName - Attribute to find
Throws:
QueryGenException

parseFloatAttrib

private float parseFloatAttrib(EasyNode el,
                               String attribName,
                               float defaultVal)
                        throws QueryGenException
Locate the named attribute and retrieve its value as a float. If not found, return a default value.

Parameters:
el - EasyNode to search
attribName - Attribute to find
defaultVal - If not found, return this value.
Throws:
QueryGenException

parseFloatAttrib

private float parseFloatAttrib(EasyNode el,
                               String attribName,
                               boolean useDefault,
                               float defaultVal)
                        throws QueryGenException
Locate the named attribute and retrieve its value as a float. Negative values are not allowed. Handles default processing if requested.

Parameters:
el - EasyNode to search
attribName - Attribute to find
useDefault - true to supply a default value if none found, false to throw an exception if not found.
defaultVal - If not found and useDefault is true, return this value.
Throws:
QueryGenException

parseBooleanAttrib

private boolean parseBooleanAttrib(EasyNode el,
                                   String attribName)
                            throws QueryGenException
Locate the named attribute and retrieve its value as a boolean. If not found, an error exception is thrown.

Parameters:
el - Element to search
attribName - Attribute to find
Throws:
QueryGenException

parseBooleanAttrib

private boolean parseBooleanAttrib(EasyNode el,
                                   String attribName,
                                   boolean defaultVal)
                            throws QueryGenException
Locate the named attribute and retrieve its value as a boolean. If not found, return a default value.

Parameters:
el - EasyNode to search
attribName - Attribute to find
defaultVal - If not found, return this value.
Throws:
QueryGenException

parseBooleanAttrib

private boolean parseBooleanAttrib(EasyNode el,
                                   String attribName,
                                   boolean useDefault,
                                   boolean defaultVal)
                            throws QueryGenException
Locate the named attribute and retrieve its value as a boolean. Handles default processing if requested.

Parameters:
el - EasyNode to search
attribName - Attribute to find
useDefault - true to supply a default value if none found, false to throw an exception if not found.
defaultVal - If not found and useDefault is true, return this value.
Throws:
QueryGenException

parseStringAttrib

private String parseStringAttrib(EasyNode el,
                                 String attribName)
                          throws QueryGenException
Locate the named attribute and retrieve its value as a string. If not found, an error exception is thrown.

Parameters:
el - EasyNode to search
attribName - Attribute to find
Throws:
QueryGenException

parseStringAttrib

private String parseStringAttrib(EasyNode el,
                                 String attribName,
                                 String defaultVal)
                          throws QueryGenException
Locate the named attribute and retrieve its value as a string. If not found, return a default value.

Parameters:
el - EasyNode to search
attribName - Attribute to find
defaultVal - If not found, return this value.
Throws:
QueryGenException

parseStringAttrib

private String parseStringAttrib(EasyNode el,
                                 String attribName,
                                 boolean useDefault,
                                 String defaultVal)
                          throws QueryGenException
Locate the named attribute and retrieve its value as a string. Handles default processing if requested.

Parameters:
el - EasyNode to search
attribName - Attribute to find
useDefault - true to supply a default value if none found, false to throw an exception if not found.
defaultVal - If not found and useDefault is true, return this value.
Throws:
QueryGenException