org.cdlib.xtf.textEngine
Class DefaultQueryProcessor

Object
  extended by QueryProcessor
      extended by DefaultQueryProcessor

public class DefaultQueryProcessor
extends QueryProcessor

Takes a QueryRequest, rewrites the queries if necessary to remove stop-words and form bi-grams, then consults the index(es) and produces a QueryResult.
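The rewrite step can be illustrated with a small sketch. It assumes that each stop-word is dropped as a standalone term and replaced by bi-grams joining it to its neighbours. The "~" joiner, the exact pairing rule and the `StopWordBigrams` class are hypothetical. They are not XTF's documented token format.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: the "~" joiner and the pairing rule are assumptions,
// not XTF's documented token format.
class StopWordBigrams {
    /** Replaces each stop-word with bi-grams joining it to its neighbours. */
    static List<String> rewrite(List<String> words, Set<String> stopSet) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < words.size(); i++) {
            String w = words.get(i);
            if (!stopSet.contains(w)) {
                out.add(w); // ordinary word: kept as-is
                continue;
            }
            // Stop-word: never searched alone, only as part of a bi-gram.
            if (i > 0) out.add(words.get(i - 1) + "~" + w);
            if (i < words.size() - 1) out.add(w + "~" + words.get(i + 1));
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> stopSet = new HashSet<>(Arrays.asList("the", "a", "and", "of"));
        // Prints [man, man~of, of~war, war]
        System.out.println(rewrite(Arrays.asList("man", "of", "war"), stopSet));
    }
}
```

With this rule, a phrase like "man of war" still matches on the stop-word without indexing "of" as a standalone term.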

Author:
Martin Haye

Nested Class Summary
private static class DefaultQueryProcessor.DocHitMakerImpl
           
private static class DefaultQueryProcessor.HitQueueMakerImpl
           
 
Field Summary
private  CharMap accentMap
          Mapping of accented chars to chars without diacritics
private  int chunkOverlap
          Number of words a chunk shares with its successor
private  int chunkSize
          Max size of a chunk (in words)
private static FlippableStringComparator compactStringComparator
          Comparator used for sorting strings in "compact" indexes
private  DocNumMap docNumMap
          Keeps track of which chunks belong to which documents
private  float docScoreNorm
          Document normalization factor (calculated from maxDocScore)
private  IndexReader indexReader
          Lucene reader from which to read index data
private  IndexWarmer indexWarmer
          Used to warm up indexes prior to use
private  boolean isSparse
          Whether the index is "sparse" (i.e. more than 5 chunks per doc)
private  float maxDocScore
          Maximum document score (used to normalize scores)
private  int nDocsHit
          Total number of documents hit (not just those that scored high)
private  WordMap pluralMap
          Mapping of plural words to singular words
private static HashMap searchers
          Map of all XtfSearchers, so we can re-use them
private static SparseStringComparator sparseStringComparator
          Comparator used for sorting strings in "sparse" indexes
private  SpellReader spellReader
          Fetches spelling suggestions
private  Set stopSet
          Stop-words to remove (e.g. "the", "a", "and", etc.)
private  Set tokFields
          Names of fields that are tokenized in this index
private static TotalHitsComparator totalHitsComparator
          Comparator used to sort by total number of hits
 
Constructor Summary
DefaultQueryProcessor()
           
 
Method Summary
private  float applyBoost(int doc, float score, BoostSet boostSet, QueryRequest req)
          If a boost set was specified, boost the given document's score according to the set.
private  GroupData createDynamicGroup(IndexReader indexReader, String field)
          Create a dynamic group based on a field specification.
private static PriorityQueue createHitQueue(IndexReader reader, int inSize, String sortFields, boolean isSparse)
          Creates either a standard score-sorting hit queue, or a field-sorting hit queue, depending on whether the query is to be sorted.
private  void finishGroup(ResultGroup group, SnippetMaker snippetMaker, QueryRequest req, Weight weight, BoostSet boostSet)
          Finishes DocHits within a single group, and also processes all of its descendant groups.
private  LinkedHashMap gatherKeywords(Query query, Set desiredFields)
          Make a list of all the terms present in the given query, grouped by field set.
private  GroupCounts[] prepGroups(QueryRequest req, BoostSet boostSet, RecordingSearcher searcher, Query query)
          Create the GroupCounts objects for the given query request.
 QueryResult processRequest(QueryRequest req)
          This is the main entry point.
 void resetCache()
          QueryProcessor maintains a static cache of Lucene searchers, one for each index directory.
 void setIndexWarmer(IndexWarmer warmer)
          Record an index warmer to use for background warming.
private  void spellCheck(QueryRequest req, QueryResult res, Set tokFields)
          Checks spelling of query terms, if spelling suggestion is enabled and the result falls below the cutoff thresholds.
private  boolean spellingImprovesResults(QueryRequest origReq, QueryResult origRes, Set spellFieldSet, LinkedHashMap suggs)
          Re-runs the original query, except with terms replaced by their suggestions.
 
Methods inherited from class QueryProcessor
setXtfHome
 
Methods inherited from class Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

searchers

private static HashMap searchers
Map of all XtfSearchers, so we can re-use them


indexReader

private IndexReader indexReader
Lucene reader from which to read index data


spellReader

private SpellReader spellReader
Fetches spelling suggestions


docNumMap

private DocNumMap docNumMap
Keeps track of which chunks belong to which documents


chunkSize

private int chunkSize
Max size of a chunk (in words)


chunkOverlap

private int chunkOverlap
Number of words a chunk shares with its successor
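The interaction between chunkSize and chunkOverlap can be sketched as follows. The sketch assumes that each chunk starts (chunkSize - chunkOverlap) words after the previous one, which follows from the two field descriptions. The `ChunkLayout` class and its methods are hypothetical. This is not the XTF indexer's code.

```java
// Hypothetical sketch: chunk k starts k * (chunkSize - chunkOverlap) words
// into the document. Illustrates the field descriptions only.
class ChunkLayout {
    /** Number of chunks needed to cover nWords words. */
    static int chunkCount(int nWords, int chunkSize, int chunkOverlap) {
        if (chunkOverlap >= chunkSize)
            throw new IllegalArgumentException("overlap must be smaller than chunk size");
        if (nWords <= chunkSize) return 1;
        int stride = chunkSize - chunkOverlap;
        // Ceiling division over the words left after the first chunk.
        return (nWords - chunkSize + stride - 1) / stride + 1;
    }

    /** Index of the first word in chunk number k. */
    static int chunkStart(int k, int chunkSize, int chunkOverlap) {
        return k * (chunkSize - chunkOverlap);
    }

    public static void main(String[] args) {
        // 250 words, 100-word chunks overlapping by 50: starts 0, 50, 100, 150.
        int n = chunkCount(250, 100, 50);
        for (int k = 0; k < n; k++) {
            int start = chunkStart(k, 100, 50);
            // End index is exclusive.
            System.out.println("chunk " + k + ": words " + start + ".." + Math.min(start + 100, 250));
        }
    }
}
```

The overlap lets a phrase that straddles a chunk boundary still fall entirely inside at least one chunk, provided the phrase is no longer than the overlap.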


stopSet

private Set stopSet
Stop-words to remove (e.g. "the", "a", "and", etc.)


pluralMap

private WordMap pluralMap
Mapping of plural words to singular words


accentMap

private CharMap accentMap
Mapping of accented chars to chars without diacritics
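Here is a sketch of the same effect (mapping accented characters to their unaccented base letters). Note that it swaps in a different technique. XTF's CharMap is a character mapping table. This sketch instead uses Unicode NFD decomposition from `java.text.Normalizer`, followed by removal of combining marks. The `AccentStripper` class is hypothetical.

```java
import java.text.Normalizer;

// Alternative technique, not XTF's CharMap: Unicode decomposition (NFD)
// followed by removal of combining marks.
class AccentStripper {
    static String strip(String s) {
        // NFD splits an e-acute into "e" plus a combining acute accent.
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        // \p{M} matches every combining mark; removing them leaves base letters.
        return decomposed.replaceAll("\\p{M}", "");
    }

    public static void main(String[] args) {
        System.out.println(strip("caf\u00e9 na\u00efve")); // prints "cafe naive"
    }
}
```

Characters with no canonical decomposition, such as "ø" or "ß", pass through unchanged. That is one reason an explicit table such as CharMap may be preferred.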


isSparse

private boolean isSparse
Whether the index is "sparse" (i.e. more than 5 chunks per doc)


tokFields

private Set tokFields
Names of fields that are tokenized in this index


nDocsHit

private int nDocsHit
Total number of documents hit (not just those that scored high)


maxDocScore

private float maxDocScore
Maximum document score (used to normalize scores)


docScoreNorm

private float docScoreNorm
Document normalization factor (calculated from maxDocScore)
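The source only says that docScoreNorm is "calculated from maxDocScore". The following sketch assumes the simplest such calculation: a factor of 1 / maxDocScore, so that the best document scores exactly 1.0. The formula and the `ScoreNormalizer` class are assumptions, not XTF's confirmed code.

```java
// Hypothetical sketch: assumes the factor is 1 / maxDocScore so the best
// document scores exactly 1.0; XTF's actual formula may differ.
class ScoreNormalizer {
    static float normFactor(float maxDocScore) {
        // An empty result set has no maximum; leave scores unchanged.
        return maxDocScore > 0 ? 1.0f / maxDocScore : 1.0f;
    }

    public static void main(String[] args) {
        float[] scores = {2.0f, 1.0f, 0.5f};
        float maxDocScore = 0;
        for (float s : scores) maxDocScore = Math.max(maxDocScore, s);
        float docScoreNorm = normFactor(maxDocScore);
        for (float s : scores)
            System.out.println(s * docScoreNorm); // 1.0, 0.5, 0.25
    }
}
```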


indexWarmer

private IndexWarmer indexWarmer
Used to warm up indexes prior to use


sparseStringComparator

private static final SparseStringComparator sparseStringComparator
Comparator used for sorting strings in "sparse" indexes


compactStringComparator

private static final FlippableStringComparator compactStringComparator
Comparator used for sorting strings in "compact" indexes


totalHitsComparator

private static final TotalHitsComparator totalHitsComparator
Comparator used to sort by total number of hits

Constructor Detail

DefaultQueryProcessor

public DefaultQueryProcessor()
Method Detail

setIndexWarmer

public void setIndexWarmer(IndexWarmer warmer)
Record an index warmer to use for background warming.

Overrides:
setIndexWarmer in class QueryProcessor

processRequest

public QueryResult processRequest(QueryRequest req)
                           throws IOException
This is the main entry point. Takes a pre-parsed query request, searches the index, and forms the results.
This method is synchronized because it uses two instance variables; concurrent access by multiple threads would result in incorrect counting. For maximum efficiency, each thread should use its own instance.

Specified by:
processRequest in class QueryProcessor
Parameters:
req - The pre-parsed request to process
Returns:
Zero or more document hits
Throws:
IOException
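The advice to give each thread its own instance can be followed with a `ThreadLocal`. The real DefaultQueryProcessor needs an XTF index to run, so in this minimal sketch a hypothetical `StubProcessor` stands in for it. The stub keeps per-request state in an instance field, just as the real class is described as doing.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class PerThreadProcessors {
    // Hypothetical stand-in for DefaultQueryProcessor: a synchronized method
    // that keeps per-request state in an instance field.
    static class StubProcessor {
        private int nDocsHit;

        synchronized int processRequest(int hits) {
            nDocsHit = 0;
            for (int i = 0; i < hits; i++) nDocsHit++;
            return nDocsHit;
        }
    }

    // Each worker thread lazily creates its own processor, so calls never
    // wait on another thread's lock.
    static final ThreadLocal<StubProcessor> PROCESSOR =
        ThreadLocal.withInitial(StubProcessor::new);

    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int i = 0; i < 8; i++) {
            final int hits = i;
            pool.submit(() -> System.out.println(PROCESSOR.get().processRequest(hits)));
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
    }
}
```

The synchronized method stays correct if an instance is ever shared. With one instance per thread, however, the lock is never contended.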

spellCheck

private void spellCheck(QueryRequest req,
                        QueryResult res,
                        Set tokFields)
                 throws IOException
Checks spelling of query terms, if spelling suggestion is enabled and the result falls below the cutoff thresholds.

Parameters:
req - Original query request
res - Results of the query
tokFields - Set of tokenized fields (in case no field list was specified in the query request).
Throws:
IOException

spellingImprovesResults

private boolean spellingImprovesResults(QueryRequest origReq,
                                        QueryResult origRes,
                                        Set spellFieldSet,
                                        LinkedHashMap suggs)
                                 throws IOException
Re-runs the original query, except with terms replaced by their suggestions. Checks whether the results improve; at present, that means there are more of them and their maximum score is higher.

Parameters:
origReq - Original query request
origRes - Results of the original query
spellFieldSet - Set of fields to rewrite terms within
suggs - Map of terms to their suggested replacements
Returns:
true if the suggestions improve the results.
Throws:
IOException
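The acceptance rule described above can be sketched as a pure predicate. In this sketch, plain counts and scores stand in for the two QueryResult objects. The `SpellingAcceptance` class and its parameter names are hypothetical.

```java
// Hypothetical sketch of the acceptance test described above: the suggested
// query must return more documents AND reach a higher maximum score.
class SpellingAcceptance {
    static boolean improves(int origDocs, float origMaxScore,
                            int suggDocs, float suggMaxScore) {
        return suggDocs > origDocs && suggMaxScore > origMaxScore;
    }

    public static void main(String[] args) {
        System.out.println(improves(3, 0.40f, 12, 0.85f)); // true
        System.out.println(improves(3, 0.40f, 12, 0.30f)); // false: lower max score
    }
}
```

Both conditions must hold. A suggestion that merely broadens the match set, but only with weaker hits, is rejected.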

gatherKeywords

private LinkedHashMap gatherKeywords(Query query,
                                     Set desiredFields)
Make a list of all the terms present in the given query, grouped by field set.

Parameters:
query - The query to traverse
desiredFields - The set of fields to limit to. If null, all fields are considered.
Returns:
An ordered map consisting of entries of a key and a value. The key is an ordered set of field names. The value is an ordered set of words.
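The return shape described above (an ordered set of field names mapped to an ordered set of words) can be built with `LinkedHashMap` and `LinkedHashSet`. In this minimal sketch, the `KeywordGrouping` class and its helper methods are hypothetical. The real method walks a Lucene Query instead of receiving terms directly.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch of the map shape returned by gatherKeywords.
class KeywordGrouping {
    /** Adds one term, creating the entry for its field set on first use. */
    static void addTerm(LinkedHashMap<Set<String>, Set<String>> map,
                        Set<String> fields, String word) {
        map.computeIfAbsent(fields, k -> new LinkedHashSet<>()).add(word);
    }

    /** Builds an insertion-ordered set of field names. */
    static Set<String> fields(String... names) {
        return new LinkedHashSet<>(Arrays.asList(names));
    }

    public static void main(String[] args) {
        LinkedHashMap<Set<String>, Set<String>> map = new LinkedHashMap<>();
        addTerm(map, fields("text"), "whale");
        addTerm(map, fields("title", "subject"), "moby");
        addTerm(map, fields("text"), "ahab");
        addTerm(map, fields("text"), "whale"); // duplicate word is ignored
        // Prints {[text]=[whale, ahab], [title, subject]=[moby]}
        System.out.println(map);
    }
}
```

Set keys compare by contents, so two separately built field sets with the same names share one entry. Such keys must not be modified after insertion.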

prepGroups

private GroupCounts[] prepGroups(QueryRequest req,
                                 BoostSet boostSet,
                                 RecordingSearcher searcher,
                                 Query query)
                          throws IOException
Create the GroupCounts objects for the given query request. Also handles creating the proper hit queue for each one.

Parameters:
req - query request containing group specs
boostSet - boost set for dynamic groups
searcher - searcher for dynamic groups
query - query to use to form dynamic groups
Throws:
IOException

createDynamicGroup

private GroupData createDynamicGroup(IndexReader indexReader,
                                     String field)
                              throws IOException
Create a dynamic group based on a field specification.

Parameters:
indexReader - Where to get the data from
field - Special field name starting with "java:"
Returns:
Dynamic group data
Throws:
IOException

finishGroup

private void finishGroup(ResultGroup group,
                         SnippetMaker snippetMaker,
                         QueryRequest req,
                         Weight weight,
                         BoostSet boostSet)
                  throws IOException
Finishes DocHits within a single group, and also processes all of its descendant groups.

Parameters:
group - Group to finish
snippetMaker - Used to make snippets for any DocHits inside the group.
req - Determines whether to finish with 'explain' or not
weight - Used for score explanations
boostSet - Used for score explanations
Throws:
IOException
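The "also processes all its descendant groups" behaviour amounts to a depth-first recursion over the group tree. In this sketch, a hypothetical `Group` class stands in for ResultGroup, and a log entry stands in for the snippet and explanation work that the real method performs on each DocHit.

```java
import java.util.ArrayList;
import java.util.List;

class GroupFinisher {
    // Hypothetical stand-in for ResultGroup: a list of hits plus subgroups.
    static class Group {
        final String name;
        final List<String> hits = new ArrayList<>();
        final List<Group> subGroups = new ArrayList<>();
        Group(String name) { this.name = name; }
    }

    /** Finishes this group's hits, then every descendant group, depth-first. */
    static int finish(Group g, List<String> log) {
        int count = 0;
        for (String hit : g.hits) {
            log.add(g.name + "/" + hit); // placeholder for snippet/explain work
            count++;
        }
        for (Group sub : g.subGroups)
            count += finish(sub, log);
        return count;
    }

    public static void main(String[] args) {
        Group root = new Group("all");
        root.hits.add("doc1");
        Group child = new Group("1990s");
        child.hits.add("doc2");
        child.hits.add("doc3");
        root.subGroups.add(child);
        List<String> log = new ArrayList<>();
        // Prints 3 [all/doc1, 1990s/doc2, 1990s/doc3]
        System.out.println(finish(root, log) + " " + log);
    }
}
```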

resetCache

public void resetCache()
QueryProcessor maintains a static cache of Lucene searchers, one for each index directory. Normally, changes to an index are not recognized until the next periodic check (every 30 seconds). Calling this method forces new changes to an index to be recognized immediately.

Overrides:
resetCache in class QueryProcessor
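The caching behaviour described above (one searcher per index directory, with a freshness check at most every 30 seconds and a reset that forces reopening) can be sketched generically. The type parameter S stands in for a searcher such as XtfSearcher. The `SearcherCache` class is hypothetical, and so are its opener and staleness callbacks. The current time is passed in explicitly so that the behaviour is easy to test.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical sketch of a per-directory searcher cache with a periodic
// freshness check; S stands in for a searcher type such as XtfSearcher.
class SearcherCache<S> {
    static final long CHECK_INTERVAL_MS = 30_000; // "every 30 seconds"

    private static class Entry<T> {
        T searcher;
        long lastCheckMs;
    }

    private final Map<String, Entry<S>> cache = new HashMap<>();
    private final Function<String, S> opener;  // opens a searcher on an index dir
    private final Predicate<S> isStale;        // has the index changed on disk?

    SearcherCache(Function<String, S> opener, Predicate<S> isStale) {
        this.opener = opener;
        this.isStale = isStale;
    }

    synchronized S get(String indexDir, long nowMs) {
        Entry<S> e = cache.get(indexDir);
        if (e == null) {
            e = new Entry<>();
            e.searcher = opener.apply(indexDir);
            e.lastCheckMs = nowMs;
            cache.put(indexDir, e);
        } else if (nowMs - e.lastCheckMs >= CHECK_INTERVAL_MS) {
            // Only look for changes once per interval, then reopen if needed.
            e.lastCheckMs = nowMs;
            if (isStale.test(e.searcher)) e.searcher = opener.apply(indexDir);
        }
        return e.searcher;
    }

    /** Analogue of resetCache(): the next get() reopens every index. */
    synchronized void reset() {
        cache.clear();
    }

    public static void main(String[] args) {
        int[] opens = {0};
        SearcherCache<String> c =
            new SearcherCache<>(dir -> dir + "#" + (++opens[0]), s -> true);
        System.out.println(c.get("idx", 0));      // idx#1 (opened)
        System.out.println(c.get("idx", 10_000)); // idx#1 (too soon to check)
        c.reset();
        System.out.println(c.get("idx", 10_001)); // idx#2 (reopened after reset)
    }
}
```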

applyBoost

private float applyBoost(int doc,
                         float score,
                         BoostSet boostSet,
                         QueryRequest req)
If a boost set was specified, boost the given document's score according to the set.

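The summary does not say how a boost set is applied. This minimal sketch assumes that the set maps document numbers to multiplicative factors, and that documents not listed in the set keep their original score. The map representation and the `BoostSketch` class are hypothetical. XTF's BoostSet class may work differently.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: models the boost set as a map from document number to
// a multiplicative factor. XTF's BoostSet class may work differently.
class BoostSketch {
    static float applyBoost(int doc, float score, Map<Integer, Float> boostSet) {
        if (boostSet == null) return score;             // no boost set requested
        Float factor = boostSet.get(doc);
        return factor == null ? score : score * factor; // unlisted docs unchanged
    }

    public static void main(String[] args) {
        Map<Integer, Float> boosts = new HashMap<>();
        boosts.put(7, 2.0f);
        System.out.println(applyBoost(7, 0.5f, boosts)); // 1.0
        System.out.println(applyBoost(8, 0.5f, boosts)); // 0.5
    }
}
```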

createHitQueue

private static PriorityQueue createHitQueue(IndexReader reader,
                                            int inSize,
                                            String sortFields,
                                            boolean isSparse)
                                     throws IOException
Creates either a standard score-sorting hit queue, or a field-sorting hit queue, depending on whether the query is to be sorted.

Parameters:
reader - will be used to read the field contents
inSize - size of the queue (typically startDoc + maxDocs). If this number is >= 999999, an unbounded, automatically resizing queue is created.
sortFields - space or comma delimited list of fields to sort by
isSparse - true if the index is sparse (i.e. more than 5 chunks per doc)
Returns:
an appropriate hit queue
Throws:
IOException
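The queue-selection logic can be sketched using `java.util.PriorityQueue` rather than Lucene's queue or XTF's comparators. The sketch parses a space- or comma-delimited sortFields string, orders hits by score when no sort fields are given and by field values otherwise, and keeps only the best inSize hits unless inSize is at least 999999. The `Hit` class is hypothetical, and so is `HitQueueSketch`. The isSparse switch, which chooses between comparator implementations, is omitted. Java 9 or later is assumed for `List.of` and `Map.of`.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical sketch using java.util.PriorityQueue, not Lucene's queue or
// XTF's comparators; the isSparse switch is omitted.
class HitQueueSketch {
    static final int UNBOUNDED_SIZE = 999999;

    static class Hit {
        final String id;
        final float score;
        final Map<String, String> fields;
        Hit(String id, float score, Map<String, String> fields) {
            this.id = id; this.score = score; this.fields = fields;
        }
        @Override public String toString() { return id; }
    }

    /** Splits a space- or comma-delimited field list, dropping empty entries. */
    static List<String> parseSortFields(String sortFields) {
        List<String> out = new ArrayList<>();
        if (sortFields != null)
            for (String f : sortFields.split("[\\s,]+"))
                if (!f.isEmpty()) out.add(f);
        return out;
    }

    /** Orders hits best-first: by score, or by the sort fields if any. */
    static Comparator<Hit> bestFirst(List<String> sortFields) {
        if (sortFields.isEmpty())
            return (a, b) -> Float.compare(b.score, a.score); // higher score first
        return (a, b) -> {
            for (String f : sortFields) {
                int c = a.fields.getOrDefault(f, "").compareTo(b.fields.getOrDefault(f, ""));
                if (c != 0) return c;
            }
            return Float.compare(b.score, a.score); // tie-break on score
        };
    }

    /** Keeps the best inSize hits, or every hit once inSize >= 999999. */
    static List<Hit> topHits(List<Hit> hits, int inSize, String sortFields) {
        Comparator<Hit> better = bestFirst(parseSortFields(sortFields));
        boolean unbounded = inSize >= UNBOUNDED_SIZE;
        // Worst hit sits at the head, so poll() evicts it when over capacity.
        PriorityQueue<Hit> queue = new PriorityQueue<>(better.reversed());
        for (Hit h : hits) {
            queue.add(h);
            if (!unbounded && queue.size() > inSize) queue.poll();
        }
        List<Hit> result = new ArrayList<>(queue);
        result.sort(better);
        return result;
    }

    public static void main(String[] args) {
        List<Hit> hits = List.of(
            new Hit("a", 0.9f, Map.of("title", "Moby Dick")),
            new Hit("b", 0.5f, Map.of("title", "Emma")),
            new Hit("c", 0.7f, Map.of("title", "Walden")));
        System.out.println(topHits(hits, 2, ""));      // [a, c]
        System.out.println(topHits(hits, 2, "title")); // [b, a]
    }
}
```

Keeping the worst hit at the head of the queue means each insertion costs O(log inSize), and the whole result set never has to be held in memory when the queue is bounded.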