org.cdlib.xtf.textEngine
Class XtfBigramQueryRewriter

Object
  extended by QueryRewriter
      extended by BigramQueryRewriter
          extended by XtfBigramQueryRewriter

public class XtfBigramQueryRewriter
extends BigramQueryRewriter

Rewrites a query to eliminate stop words by combining them with adjacent non-stop-words, forming "bi-grams". The process is fairly involved, because bi-gramming across NEAR and OR queries is complex.
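
The basic idea of bi-gramming a simple phrase can be sketched as below. This is an illustration only, not the code in this class: the class name BigramSketch, the method bigram, and the "~" joiner are assumptions made for the example, and the sketch handles only a flat list of terms, not NEAR or OR queries.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration; not part of org.cdlib.xtf.textEngine.
class BigramSketch {
    /**
     * Joins each stop word to its neighbor so that no stop word stands alone.
     * The "~" separator is an assumption made for this sketch.
     */
    static List<String> bigram(List<String> terms, Set<String> stopSet) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < terms.size(); i++) {
            String cur = terms.get(i);
            boolean curStop = stopSet.contains(cur);
            boolean hasNext = i + 1 < terms.size();
            boolean nextStop = hasNext && stopSet.contains(terms.get(i + 1));
            if (hasNext && (curStop || nextStop)) {
                // At least one term of this adjacent pair is a stop word: emit a bi-gram.
                out.add(cur + "~" + terms.get(i + 1));
            } else if (!curStop) {
                // A plain word is kept alone unless the previous bi-gram already covers it.
                boolean prevStop = i > 0 && stopSet.contains(terms.get(i - 1));
                if (!prevStop) {
                    out.add(cur);
                }
            }
            // A stop word with no following term is simply dropped.
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> stops = new HashSet<>(Arrays.asList("of", "the"));
        // prints [man~of, of~the, the~world]
        System.out.println(bigram(Arrays.asList("man", "of", "the", "world"), stops));
        // prints [quick, brown, fox]
        System.out.println(bigram(Arrays.asList("quick", "brown", "fox"), stops));
    }
}
```

A phrase without stop words passes through unchanged, which mirrors the "removed or bi-grammed" wording used for rewriteQuery.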


Nested Class Summary
 
Nested classes/interfaces inherited from class QueryRewriter
QueryRewriter.SpanClauseJoiner
 
Field Summary
static Tester tester
          Basic regression test
private  Set tokenizedFields
           
 
Fields inherited from class BigramQueryRewriter
maxSlop, removedTerms, stopSet
 
Constructor Summary
XtfBigramQueryRewriter(Set stopSet, int maxSlop, Set tokFields)
          Constructs a rewriter using the given stopword set.
 
Method Summary
protected  Query rewrite(MoreLikeThisQuery mlt)
          Rewrite a "more like this" query.
protected  Query rewrite(NumericRangeQuery nrq)
          Rewrite a numeric range query.
protected  Query rewrite(SpanExactQuery q)
          Rewrite a span EXACT query.
protected  Query rewrite(SpanSectionTypeQuery stq)
          Rewrite a section type query.
 Query rewriteQuery(Query q)
          Rewrite a query of any supported type.
 
Methods inherited from class BigramQueryRewriter
bigramQueries, bigramTermsExact, bigramTermsInexact, convertToSpanQuery, extractTerm, extractTermText, glomInside, glomInside, glomInside, glomQueries, isBigram, makeStopSet, newTerm, reduceBoost, rewrite, rewrite, rewrite, rewrite, rewriteClauses
 
Methods inherited from class QueryRewriter
combineBoost, copyBoost, copyBoost, forceRewrite, rewrite, rewrite, rewrite, rewrite, rewrite, rewrite, rewrite, rewrite, rewriteClauses
 
Methods inherited from class Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

tokenizedFields

private Set tokenizedFields

tester

public static final Tester tester
Basic regression test

Constructor Detail

XtfBigramQueryRewriter

public XtfBigramQueryRewriter(Set stopSet,
                              int maxSlop,
                              Set tokFields)
Constructs a rewriter using the given stopword set.

Parameters:
stopSet - Set of stopwords to remove or bi-gram. This can be constructed easily by calling BigramQueryRewriter.makeStopSet(String).
maxSlop - Maximum slop to allow in a query, based on the index being queried.
tokFields - Set of fields that are tokenized. Queries on non-tokenized fields are not rewritten.
Method Detail

rewriteQuery

public Query rewriteQuery(Query q)
Rewrite a query of any supported type. Stop words will either be removed or bi-grammed. Skips all queries for un-tokenized fields.

Overrides:
rewriteQuery in class QueryRewriter
Parameters:
q - Query to rewrite
Returns:
A new query, or 'q' unchanged if no change was needed.
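
The contract described above (skip non-tokenized fields, and return the original object when nothing changes) can be modeled with a small stand-in. This is a sketch under stated assumptions: FieldQuery and TokenizedFieldSketch are hypothetical types invented for the example, and the sketch only removes stop words rather than bi-gramming them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration; not part of org.cdlib.xtf.textEngine.
class TokenizedFieldSketch {
    /** Hypothetical stand-in for a query: a field name plus its terms. */
    static final class FieldQuery {
        final String field;
        final List<String> terms;

        FieldQuery(String field, List<String> terms) {
            this.field = field;
            this.terms = terms;
        }
    }

    /**
     * Removes stop words from queries on tokenized fields only.
     * Returns q itself (same instance) when no change was needed.
     */
    static FieldQuery rewriteQuery(FieldQuery q, Set<String> stopSet, Set<String> tokFields) {
        if (!tokFields.contains(q.field)) {
            return q; // non-tokenized field: leave the query alone
        }
        List<String> kept = new ArrayList<>();
        for (String term : q.terms) {
            if (!stopSet.contains(term)) {
                kept.add(term);
            }
        }
        if (kept.size() == q.terms.size()) {
            return q; // nothing was removed: hand back the original
        }
        return new FieldQuery(q.field, kept);
    }

    public static void main(String[] args) {
        Set<String> stops = new HashSet<>(Arrays.asList("the"));
        Set<String> tokFields = new HashSet<>(Arrays.asList("text"));
        FieldQuery onId = new FieldQuery("id", Arrays.asList("the", "cat"));
        FieldQuery onText = new FieldQuery("text", Arrays.asList("the", "cat"));
        // prints true: "id" is not a tokenized field, so the query is returned as-is
        System.out.println(rewriteQuery(onId, stops, tokFields) == onId);
        // prints [cat]
        System.out.println(rewriteQuery(onText, stops, tokFields).terms);
    }
}
```

Because an unchanged query comes back as the same instance, a caller can use a reference comparison to detect whether any rewriting took place.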

rewrite

protected Query rewrite(SpanSectionTypeQuery stq)
Rewrite a section type query. This is very simple: just rewrite the sub-queries.

Parameters:
stq - The query to rewrite
Returns:
Rewritten version, or 'stq' unchanged if no change was needed.

rewrite

protected Query rewrite(SpanExactQuery q)
Rewrite a span EXACT query. Stop words will be bi-grammed into adjacent terms.

Parameters:
q - The query to rewrite
Returns:
Rewritten version, or 'q' unchanged if no change was needed.

rewrite

protected Query rewrite(MoreLikeThisQuery mlt)
Rewrite a "more like this" query.


rewrite

protected Query rewrite(NumericRangeQuery nrq)
Rewrite a numeric range query.