org.cdlib.xtf.textEngine
Class UnicodeNormalizingRewriter

Object
  extended by QueryRewriter
      extended by XtfQueryRewriter
          extended by UnicodeNormalizingRewriter

public class UnicodeNormalizingRewriter
extends XtfQueryRewriter

Rewrites a Lucene query to replace all non-normalized words (i.e. those not encoded in Unicode Normalization Form C, NFC) with normalized ones. For instance, many diacritics should be encoded as a single precomposed character together with their base letter, rather than as a base letter followed by a separate combining mark.
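To illustrate the kind of mapping described above, here is a minimal sketch using the JDK's java.text.Normalizer. It is an assumption that this is comparable to what the rewriter does internally; this page does not say which normalizer XTF uses, and the class and method names below (NfcExample, toNfc) are hypothetical.

```java
import java.text.Normalizer;

public class NfcExample {
    /** Returns the NFC form of a word, or the same instance if it is already normalized. */
    static String toNfc(String word) {
        if (Normalizer.isNormalized(word, Normalizer.Form.NFC)) {
            return word; // nothing to change
        }
        return Normalizer.normalize(word, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        // 'e' followed by U+0301 COMBINING ACUTE ACCENT (decomposed form)
        String decomposed = "cafe\u0301";
        String composed = toNfc(decomposed);

        System.out.println(decomposed.length());          // 5
        System.out.println(composed.length());            // 4
        System.out.println(composed.equals("caf\u00e9")); // true
    }
}
```

Both strings display identically, but only the composed form would match an index whose terms were stored in NFC.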

Author:
Martin Haye

Nested Class Summary
 
Nested classes/interfaces inherited from class QueryRewriter
QueryRewriter.SpanClauseJoiner
 
Field Summary
private  FastCache<String,String> cache
          Cache of the lookups performed so far
private static int CACHE_SIZE
          How many recent mappings to maintain
private  Set tokenizedFields
          Set of fields that are tokenized in the index
 
Constructor Summary
UnicodeNormalizingRewriter(Set tokFields)
          Construct a new rewriter.
 
Method Summary
protected  Query rewrite(SpanTermQuery q)
          Rewrite a span term query.
protected  Query rewrite(SpanWildcardQuery q)
          Rewrite a wildcard term query.
protected  Query rewrite(TermQuery q)
          Rewrite a term query.
 
Methods inherited from class XtfQueryRewriter
rewrite, rewrite, rewrite, rewrite, rewriteQuery
 
Methods inherited from class QueryRewriter
combineBoost, copyBoost, copyBoost, forceRewrite, rewrite, rewrite, rewrite, rewrite, rewrite, rewrite, rewrite, rewrite, rewrite, rewriteClauses
 
Methods inherited from class Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

CACHE_SIZE

private static final int CACHE_SIZE
How many recent mappings to maintain

See Also:
Constant Field Values

cache

private FastCache<String,String> cache
Cache of the lookups performed so far
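The idea of a bounded cache of word-to-normalized-word mappings can be sketched as follows. This is not XTF's FastCache, whose eviction policy is not documented on this page; it is a stand-in built on LinkedHashMap with least-recently-used eviction, and the names NormalizationCache, normalize and misses are hypothetical.

```java
import java.text.Normalizer;
import java.util.LinkedHashMap;
import java.util.Map;

public class NormalizationCache {
    private final Map<String, String> cache;
    int misses = 0; // counts how often normalization actually ran

    NormalizationCache(final int maxSize) {
        // accessOrder=true makes iteration order least-recently-used first
        this.cache = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > maxSize; // drop the LRU entry once over capacity
            }
        };
    }

    /** Returns the NFC form of the word, consulting the cache first. */
    String normalize(String word) {
        String hit = cache.get(word);
        if (hit != null) {
            return hit;
        }
        misses++;
        String result = Normalizer.normalize(word, Normalizer.Form.NFC);
        cache.put(word, result);
        return result;
    }
}
```

This sketch is not thread-safe; a rewriter instance shared between threads would need external synchronization or a concurrent cache.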


tokenizedFields

private Set tokenizedFields
Set of fields that are tokenized in the index

Constructor Detail

UnicodeNormalizingRewriter

public UnicodeNormalizingRewriter(Set tokFields)
Constructs a new rewriter that operates only on tokenized fields.
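The restriction to tokenized fields might look roughly like the sketch below. It works on plain strings rather than Lucene query objects to stay self-contained, assumes java.text.Normalizer for the normalization step, and uses hypothetical names (TokenizedFieldGate, rewriteTerm); the actual rewriter's logic is not shown on this page.

```java
import java.text.Normalizer;
import java.util.Set;

public class TokenizedFieldGate {
    /**
     * Returns the term text to search for. The input instance is returned
     * unchanged when the field is not tokenized or the text is already NFC.
     */
    static String rewriteTerm(String field, String text, Set<String> tokenizedFields) {
        if (!tokenizedFields.contains(field)) {
            return text; // untokenized fields are left alone
        }
        if (Normalizer.isNormalized(text, Normalizer.Form.NFC)) {
            return text; // no change needed
        }
        return Normalizer.normalize(text, Normalizer.Form.NFC);
    }
}
```

Returning the original instance when nothing changes mirrors the "unchanged if no change is needed" contract of the rewrite methods, and lets a caller detect a no-op with a simple reference comparison.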

Method Detail

rewrite

protected Query rewrite(TermQuery q)
Rewrite a term query. This is only called for artificial queries introduced by the XTF system itself, so no normalization is applied here.

Overrides:
rewrite in class QueryRewriter
Parameters:
q - The query to rewrite
Returns:
Rewritten version, or 'q' unchanged if no change is needed.

rewrite

protected Query rewrite(SpanTermQuery q)
Rewrite a span term query. Normalizes Unicode to NFC.

Overrides:
rewrite in class QueryRewriter
Parameters:
q - The query to rewrite
Returns:
Rewritten version, or 'q' unchanged if no change is needed.

rewrite

protected Query rewrite(SpanWildcardQuery q)
Rewrite a wildcard term query. Normalizes Unicode encoding to NFC in all words.

Overrides:
rewrite in class QueryRewriter
Parameters:
q - The query to rewrite
Returns:
Rewritten version, or 'q' unchanged if no change is needed.