org.cdlib.xtf.textEngine
Class XtfSearcher

Object
  extended by XtfSearcher

public class XtfSearcher
extends Object

Keeps a searcher, an index reader, and a doc-num-map that are consistent with each other and up to date with the index on disk.

Author:
Martin Haye

Field Summary
private  CharMap accentMap
          Map of accented characters from which diacritics are removed
private  int chunkOverlap
          Amount of overlap, in words, between adjacent chunks
private  int chunkSize
          Maximum number of words in a chunk
private  long curVersion
          Version number of the index in memory
private  Directory directory
          The index directory to read from
private  DocNumMap docNumMap
          Keeps track of which chunks belong to which documents
private  Set indexedFields
          Set of all indexed fields in the index
private  String indexPath
          Path to the index directory
private  IndexReader indexReader
          Reader used to access the index
private  boolean isSparse
          Whether this index is "sparse" (i.e. more than 5 chunks per doc)
private  long lastCheckTime
          Time of the last check for an out-of-date index
private  long newVersion
          Version number of index on disk
private  WordMap pluralMap
          Map of plural words to singular words
private  SpellReader spellReader
          Reader used to fetch spelling suggestions
private  Set stopSet
          Stop-words associated with the index (e.g. "the", "a", "and", etc.)
private  Set tokenizedFields
          Set of all fields which are tokenized in the index
private  long updatePeriod
          How often to check for an out-of-date directory
 
Constructor Summary
XtfSearcher(String indexPath, Directory dir, int updateCheckSeconds)
          Construct a searcher set on the given directory.
XtfSearcher(String indexPath, int updateCheckSeconds)
          Construct a searcher set on the given directory.
 
Method Summary
 CharMap accentMap()
          Find out the accent mapping, or null if none.
 int chunkOverlap()
          Find out how many words adjacent chunks can overlap.
 int chunkSize()
          Find out how many words (max) are in a chunk.
 void close()
          Close down the searcher and all its dependencies.
 DocNumMap docNumMap()
          Gets a map for translating chunk IDs to document IDs (and vice-versa)
 Set indexedFields()
          Gets the set of all fields that have been indexed.
 IndexReader indexReader()
          Gets the reader this searcher is using to read indexes.
 boolean isSparse()
          Find out if the index is sparse (i.e. more than 5 chunks per doc)
 boolean isUpToDate()
          Check if the version we have in memory is up-to-date relative to that on disk.
 WordMap pluralMap()
          Find out the plural mapping, or null if none.
static LinkedHashSet readTokenizedFields(String indexPath, IndexReader indexReader)
          Read in the list of fields that are tokenized in this index.
 SpellReader spellReader()
          Gets the reader used to fetch spelling suggestions.
 Set stopSet()
          Find out the set of stop words, or null if none.
 Set tokenizedFields()
          Get the list of all tokenized fields.
 void update()
          Ensures that this searcher is up to date with regard to the index on disk.
 
Methods inherited from class Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

indexPath

private String indexPath
Path to the index directory


directory

private Directory directory
The index directory to read from


updatePeriod

private long updatePeriod
How often to check for an out-of-date directory


lastCheckTime

private long lastCheckTime
Time of the last check for an out-of-date index


curVersion

private long curVersion
Version number of the index in memory


newVersion

private long newVersion
Version number of index on disk


indexReader

private IndexReader indexReader
Reader used to access the index


docNumMap

private DocNumMap docNumMap
Keeps track of which chunks belong to which documents


spellReader

private SpellReader spellReader
Reader used to fetch spelling suggestions


chunkSize

private int chunkSize
Maximum number of words in a chunk


chunkOverlap

private int chunkOverlap
Amount of overlap, in words, between adjacent chunks


stopSet

private Set stopSet
Stop-words associated with the index (e.g. "the", "a", "and", etc.)


pluralMap

private WordMap pluralMap
Map of plural words to singular words


accentMap

private CharMap accentMap
Map of accented characters from which diacritics are removed


indexedFields

private Set indexedFields
Set of all indexed fields in the index


tokenizedFields

private Set tokenizedFields
Set of all fields which are tokenized in the index


isSparse

private boolean isSparse
Whether this index is "sparse" (i.e. more than 5 chunks per doc)

Constructor Detail

XtfSearcher

public XtfSearcher(String indexPath,
                   int updateCheckSeconds)
            throws IOException
Construct a searcher set on the given directory.

Parameters:
indexPath - Directory to load index data from
updateCheckSeconds - How often to check for an updated index
Throws:
IOException

XtfSearcher

public XtfSearcher(String indexPath,
                   Directory dir,
                   int updateCheckSeconds)
            throws IOException
Construct a searcher set on the given directory.

Parameters:
indexPath - Path to index directory
dir - Lucene Directory object for the index directory
updateCheckSeconds - How often to check for an updated index
Throws:
IOException

Method Detail

isUpToDate

public boolean isUpToDate()
                   throws IOException
Check if the version we have in memory is up-to-date relative to that on disk.

Throws:
IOException

update

public void update()
            throws IOException
Ensures that this searcher is up to date with regard to the index on disk.

Throws:
IOException
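
The fields updatePeriod, lastCheckTime, curVersion, and newVersion suggest that the on-disk version is re-read only once per update period. The following is a minimal sketch of that pattern, not the XTF implementation. The class ThrottledVersionCheck, the two LongSupplier parameters, and the currentVersion() accessor are hypothetical names introduced for illustration, and the assumption that the constructor argument in seconds is converted to milliseconds is not confirmed by this page.

```java
import java.util.function.LongSupplier;

/** Hypothetical stand-in for the version bookkeeping; not the XTF code. */
final class ThrottledVersionCheck {
    private final LongSupplier diskVersion; // assumed source of the on-disk version number
    private final LongSupplier clock;       // injected clock (milliseconds), so timing is testable
    private final long updatePeriod;        // minimum time between disk checks, in milliseconds
    private long lastCheckTime;             // time of the last disk check
    private long curVersion;                // version currently held in memory
    private long newVersion;                // version last seen on disk

    ThrottledVersionCheck(LongSupplier diskVersion, LongSupplier clock, int updateCheckSeconds) {
        this.diskVersion = diskVersion;
        this.clock = clock;
        this.updatePeriod = updateCheckSeconds * 1000L;
        this.curVersion = diskVersion.getAsLong();
        this.newVersion = curVersion;
        this.lastCheckTime = clock.getAsLong();
    }

    /** Re-reads the disk version at most once per update period. */
    boolean isUpToDate() {
        long now = clock.getAsLong();
        if (now - lastCheckTime >= updatePeriod) {
            lastCheckTime = now;
            newVersion = diskVersion.getAsLong();
        }
        return newVersion == curVersion;
    }

    /** Adopts the newer version if one was detected. */
    void update() {
        if (!isUpToDate()) {
            // A real searcher would reopen its reader and rebuild its maps here.
            curVersion = newVersion;
        }
    }

    long currentVersion() {
        return curVersion;
    }
}
```

With this design a caller can invoke update() before every query: between checks the call costs only a clock read and a comparison.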

readTokenizedFields

public static LinkedHashSet readTokenizedFields(String indexPath,
                                                IndexReader indexReader)
                                         throws IOException
Read in the list of fields that are tokenized in this index.

Throws:
IOException

tokenizedFields

public Set tokenizedFields()
Get the list of all tokenized fields.


indexReader

public IndexReader indexReader()
Gets the reader this searcher is using to read indexes.


indexedFields

public Set indexedFields()
Gets the set of all fields that have been indexed.


docNumMap

public DocNumMap docNumMap()
Gets a map for translating chunk IDs to document IDs (and vice-versa)
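
One way such a two-way mapping could work is sketched below. It assumes, without confirmation from this page, that each document has a single record whose number follows the numbers of its chunks, so a sorted array of record numbers is enough to answer both questions. SimpleDocNumMap and its method names are hypothetical and are not the DocNumMap interface.

```java
import java.util.Arrays;

/** Hypothetical chunk/document mapping; not the XTF DocNumMap. */
final class SimpleDocNumMap {
    private final int[] docNums; // sorted numbers of the per-document records

    SimpleDocNumMap(int[] sortedDocNums) {
        this.docNums = sortedDocNums.clone();
    }

    /** Returns the document record that owns the chunk, or -1 if there is none. */
    int getDocNum(int chunkNum) {
        if (chunkNum < 0) return -1;
        int i = Arrays.binarySearch(docNums, chunkNum);
        if (i < 0) i = -i - 1; // insertion point: first record at or after the chunk
        return i < docNums.length ? docNums[i] : -1;
    }

    /** Returns the first chunk of the document, or -1 if docNum is not a document record. */
    int getFirstChunk(int docNum) {
        int i = Arrays.binarySearch(docNums, docNum);
        if (i < 0) return -1;
        // Chunks start right after the previous document's record (or at 0).
        return i == 0 ? 0 : docNums[i - 1] + 1;
    }
}
```

For records numbered 3, 7, and 8, chunks 0 to 2 map to document 3 and chunks 4 to 6 map to document 7. Document 8 has no chunks of its own, so getFirstChunk(8) returns 8 itself.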


chunkSize

public int chunkSize()
Find out how many words (max) are in a chunk.


chunkOverlap

public int chunkOverlap()
Find out how many words adjacent chunks can overlap.
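
To illustrate how chunkSize and chunkOverlap relate, the sketch below lays out chunk boundaries over a text of a given word count. It assumes each chunk starts chunkSize - chunkOverlap words after the previous one. That stride is an assumption made for illustration, and ChunkLayout is a hypothetical helper, not part of XTF.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper showing overlapping chunk ranges; not the XTF indexer. */
final class ChunkLayout {
    /** Returns [start, end) word ranges, assuming a stride of chunkSize - chunkOverlap. */
    static List<int[]> ranges(int wordCount, int chunkSize, int chunkOverlap) {
        if (chunkSize <= 0 || chunkOverlap < 0 || chunkOverlap >= chunkSize) {
            throw new IllegalArgumentException("need 0 <= chunkOverlap < chunkSize");
        }
        List<int[]> out = new ArrayList<>();
        int stride = chunkSize - chunkOverlap;
        for (int start = 0; start < wordCount; start += stride) {
            int end = Math.min(start + chunkSize, wordCount);
            out.add(new int[] {start, end});
            if (end == wordCount) break; // this chunk already reaches the end of the text
        }
        return out;
    }
}
```

For 10 words with chunkSize 4 and chunkOverlap 1, this yields the ranges [0, 4), [3, 7), and [6, 10), so each adjacent pair shares exactly one word.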


stopSet

public Set stopSet()
Find out the set of stop words, or null if none.


pluralMap

public WordMap pluralMap()
Find out the plural mapping, or null if none.


accentMap

public CharMap accentMap()
Find out the accent mapping, or null if none.
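
Because stopSet(), pluralMap(), and accentMap() may each return null, callers need to guard every lookup. The sketch below shows one possible query-term normalization with such guards. It uses plain java.util collections and java.text.Normalizer in place of XTF's WordMap and CharMap types, and the order of the steps (accents, then plurals, then stop words) is an assumption. TermNormalizer is a hypothetical name.

```java
import java.text.Normalizer;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical term normalizer with null-safe lookups; not the XTF query path. */
final class TermNormalizer {
    /** Returns the normalized term, or null if the term is a stop word. */
    static String normalize(String term,
                            Set<String> stopSet,           // may be null: no stop words
                            Map<String, String> pluralMap, // may be null: no plural mapping
                            boolean stripAccents) {        // stands in for a non-null accent map
        String t = term.toLowerCase(Locale.ROOT);
        if (stripAccents) {
            // Decompose accented letters, then drop the combining marks.
            t = Normalizer.normalize(t, Normalizer.Form.NFD).replaceAll("\\p{M}+", "");
        }
        if (pluralMap != null) {
            t = pluralMap.getOrDefault(t, t);
        }
        if (stopSet != null && stopSet.contains(t)) {
            return null;
        }
        return t;
    }
}
```

Passing null for a map or set simply skips that step, which mirrors the "or null if none" contract of the accessors.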


spellReader

public SpellReader spellReader()
Gets the reader used to fetch spelling suggestions.


isSparse

public boolean isSparse()
Find out if the index is sparse (i.e. more than 5 chunks per doc)


close

public void close()
           throws IOException
Close down the searcher and all its dependencies.

Throws:
IOException