Package org.cdlib.xtf.textEngine

The XTF Text Engine is responsible for parsing and executing queries against the Lucene index.


Class Summary
AccentFoldingRewriter Rewrites a Lucene query to replace all accented words with the same word minus diacritics.
BoostSet Holds a set of boost factors to apply to individual documents in the document set.
BoostSetParams Various parameters for applying a boost set to query results.
BoundedMarkPos Helps with marking fields that contain bump markers.
BoundedWordIter Just like a BasicWordIter, except that it enforces "soft" boundaries if the source text contains XTF "bump markers" of a certain size.
ConfigCache Used to maintain a simple cache of config files, so we don't have to keep loading the same one over and over.
ConfigCache.ConfigCacheKey A key in the ConfigCache.
Constants Holds global constants for the XTF text system.
DefaultQueryProcessor Takes a QueryRequest, rewrites the queries if necessary to remove stop-words and form bi-grams, then consults the index(es), and produces a QueryResult.
DefaultQueryProcessor.DocHitMakerImpl  
DefaultQueryProcessor.HitQueueMakerImpl  
DocHit Represents a query hit at the document level.
DocHitImpl Represents a query hit at the document level.
FlippingDirectory Represents a Lucene directory in every way except that it allows the underlying directory pointer to be flipped.
HitQueue  
IndexUtil This class provides methods related to, but not always part of, a text index.
IndexValidator This class performs the validation steps for a specified index, checking that the results are acceptable.
IndexWarmer Handles background warming of new (or changed) indexes, so that servlets can continue serving using their existing index, and switch quickly to the new one when it is ready.
IndexWarmer.BgThread Thread that sits in the background and periodically checks if there are indexes in need of warming, and warms them.
IndexWarmer.Entry An entry mapping indexPath to XtfSearcher.
MoreLikeThisQuery Processes the sub-query and uses the first document as the "target".
MoreLikeThisQuery.Flt Used for scores and to avoid renewing Floats.
MoreLikeThisQuery.Int Used for frequencies and to avoid renewing Integers.
MoreLikeThisQuery.QueryWord  
MoreLikeThisQuery.QueryWordQueue PriorityQueue that orders query words by score.
NativeFSDirectory  
NumericFieldData Holds numeric data for a field from a Lucene index.
NumericRangeQuery A query that implements efficient range searching on numeric data.
PluralFoldingRewriter Rewrites a Lucene query to replace all plural words with their singular equivalents.
QueryContext Tracks the context in which a query was executed.
QueryProcessor Takes a QueryRequest, rewrites the queries if necessary to remove stop-words and form bi-grams, then consults the index(es), and produces a QueryResult.
QueryRequest Stores a single query request to be processed by the XTF text engine.
QueryRequestParser Processes URL parameters into a Lucene query, using a stylesheet to perform the heavy lifting.
QueryRequestParser.QueryEntry Keeps track of all the queries for a given field.
QueryResult Represents the results of a query.
RefieldingQueryRewriter This class swaps the current field of every sub-query to the specified field.
SlopFixupRewriter Fix up all the "infinite" slop entries to be actually limited to the chunk overlap size.
Snippet Holds all the information regarding a specific text snippet within a document.
SnippetMaker Does the heavy lifting of interpreting span hits using the actual document text stored in the index.
SpanExactQuery Just like a SpanNearQuery with slop set to zero, except that it also looks for the special 'start-of-field' and 'end-of-field' tokens inserted by the text indexer.
SpanSectionTypeQuery Supports sectionType filtering of text chunks.
SpellcheckParams Various parameters that affect spell-checking of query terms.
SpellingSuggestion Contains one or more suggestions for a specific term in a query.
SpellSuggRewriter Rewrites a Lucene query to replace all misspelled words with their suggested replacements.
StdTermFilter Performs standard tokenization activities for terms, such as mapping to lowercase, removing apostrophes, etc.
StdTermRewriter Rewrites a Lucene query to perform standard tokenization actions on each term, such as converting them to lowercase, removing apostrophes, etc.
TotalHitsComparator  
UnicodeNormalizingRewriter Rewrites a Lucene query to replace all non-normalized words (i.e. words not encoded in Unicode Normalization Form C) with normalized ones.
UnspanningQueryRewriter This class converts some common span queries to their faster, non-span equivalents.
XtfBigramQueryRewriter Rewrites a query to eliminate stop words by combining them with adjacent non-stop-words, forming "bi-grams".
XtfChunk Keeps track of the tokens for a chunk, plus node and word offsets.
XtfChunkedWordIter Handles iterating over XTF's tokenized documents, including special tracking of node numbers and word offsets.
XtfChunkMarkPos Extends ChunkMarkPos by adding node number, word offset, and section type information.
XtfChunkSource Performs special loading duties for our XTF chunks.
XtfDocNumMap Used to map chunk indexes to the corresponding document index, and vice-versa.
XtfLimIndexReader Just like a LimIndexReader, except that it also periodically checks whether the request has taken too long and should be aborted.
XtfQueryRewriter Utility class for performing external rewriting, or transformation, tasks on Lucene queries.
XtfQueryTraverser Utility class for performing external rewriting, or transformation, tasks on Lucene queries.
XtfSearcher Used to keep a set of searcher, reader, and doc-num-map that are consistent with each other and also up-to-date.
XtfSpanRangeQuery Matches spans containing terms within a specified range.
XtfSpanWildcardQuery Matches spans containing a wildcard term.
XtfWordEquiv Used for eliminating redundant spelling suggestions.
 

Exception Summary
BoundedMarkPos.UnmarkableException Exception thrown if asked to mark past XML elements or attributes.
HitLoadException Thrown if a problem (most likely an I/O error) occurs while loading a hit.
IndexValidator.ValidationError Internal exception for quickly passing errors up the call chain.
QueryGenException Exception class used to report errors from the query parser stylesheet.
 

Package org.cdlib.xtf.textEngine Description

The XTF Text Engine is responsible for parsing and executing queries against the Lucene index. The actual work of building an index is done by the textIndexer tool.

Here's a breakdown of the Text Engine's major functions, and the classes associated with each function:

Query Parsing

Takes care of calling the queryParser stylesheet to transform a URL query request into a strictly structured, XML-formatted query. Also accumulates a list of terms present in the query (useful later for term highlighting).
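
As a rough illustration of the stylesheet-driven step described above, the sketch below runs a tiny XSLT over an XML rendering of the URL parameters, using the JDK's standard javax.xml.transform API. The element names (parameters, param, query, term), the inline stylesheet, and the class QueryParsingSketch are assumptions made for this example; they are not taken from XTF's actual queryParser stylesheet or its output format.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical sketch: not part of org.cdlib.xtf.textEngine.
class QueryParsingSketch {
    // Assumed, minimal stylesheet: turns each <param> into a <term> inside <query>.
    static final String STYLESHEET =
        "<xsl:stylesheet version=\"1.0\""
        + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>"
        + "<xsl:template match=\"/parameters\">"
        + "<query><xsl:for-each select=\"param\">"
        + "<term field=\"{@name}\"><xsl:value-of select=\"@value\"/></term>"
        + "</xsl:for-each></query>"
        + "</xsl:template></xsl:stylesheet>";

    /** Applies the stylesheet to an XML document describing the URL parameters. */
    static String toStructuredQuery(String paramsXml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(paramsXml)),
                                  new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            // Wrap so callers do not need to handle a checked exception.
            throw new IllegalStateException("query stylesheet failed", e);
        }
    }

    public static void main(String[] args) {
        // Assumed XML form of a request such as "?text=whale".
        String params = "<parameters><param name=\"text\" value=\"whale\"/></parameters>";
        System.out.println(toStructuredQuery(params));
    }
}
```

Keeping the URL-to-query mapping in a stylesheet means the query syntax can be changed without recompiling any Java code, which is presumably why the package delegates this step.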

Query Processing

These classes implement the main logic of the package, taking a structured query and applying it to the Lucene indexes. They handle stop-word (n-gram) query pre-processing, as well as scoring and sorting the hits.
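
The stop-word pre-processing can be pictured with a simplified sketch: each stop word is fused with an adjacent non-stop word into a single "bi-gram" term, so the stop word never has to be searched on its own. The class BigramSketch, the "~" separator, and the exact pairing rules below are assumptions for illustration; XtfBigramQueryRewriter operates on Lucene query objects and may differ in detail.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: works on plain strings rather than Lucene queries.
class BigramSketch {
    /**
     * Emits a joined pair for every adjacent pair of terms in which exactly
     * one term is a stop word. Non-stop words with no stop-word neighbor are
     * kept as they are; stop words next to other stop words are dropped.
     */
    static List<String> rewrite(List<String> terms, Set<String> stopWords) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < terms.size(); i++) {
            String cur = terms.get(i);
            boolean curStop = stopWords.contains(cur);
            boolean prevStop = i > 0 && stopWords.contains(terms.get(i - 1));
            boolean hasNext = i + 1 < terms.size();
            boolean nextStop = hasNext && stopWords.contains(terms.get(i + 1));

            if (hasNext && curStop != nextStop) {
                // One of (cur, next) is a stop word: fuse them into a bi-gram.
                out.add(cur + "~" + terms.get(i + 1));
            } else if (!curStop && !prevStop) {
                // Ordinary word not already covered by a preceding bi-gram.
                out.add(cur);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> stopWords = new HashSet<>(Arrays.asList("of", "the", "and"));
        System.out.println(rewrite(Arrays.asList("man", "of", "war"), stopWords));
    }
}
```

Under these assumed rules, the phrase "man of war" becomes the two terms "man~of" and "of~war".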

Limiting Work Performed by a Query

To help ease server load problems due to unwitting or malicious queries, XTF has added an extensive layer of work limiting on top of Lucene.
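
One way to picture such a limiting layer is a small counter that is "ticked" for every unit of work and aborts the query once a work budget or a time budget is exhausted. The sketch below is an assumption-laden illustration only: WorkLimitSketch, LimitExceededException, and the check interval are invented names and values, not XTF's implementation.

```java
import java.util.function.LongSupplier;

// Hypothetical sketch: not the limiting layer XTF actually ships.
class WorkLimitSketch {
    /** Hypothetical exception type, named for this example only. */
    static class LimitExceededException extends RuntimeException {
        LimitExceededException(String message) { super(message); }
    }

    // Reading the clock is comparatively costly, so only do it periodically.
    static final int CLOCK_CHECK_INTERVAL = 256;

    private final int workLimit;       // max units of work; 0 means unlimited
    private final long deadline;       // clock value after which the query is too old
    private final LongSupplier clock;  // injected so the limiter can be tested
    private int workDone = 0;

    WorkLimitSketch(int workLimit, long timeoutMillis, LongSupplier clock) {
        this.workLimit = workLimit;
        this.clock = clock;
        this.deadline = clock.getAsLong() + timeoutMillis;
    }

    /** Call once per unit of work, for example once per document examined. */
    void tick() {
        workDone++;
        if (workLimit > 0 && workDone > workLimit) {
            throw new LimitExceededException("work limit of " + workLimit + " exceeded");
        }
        if (workDone % CLOCK_CHECK_INTERVAL == 0 && clock.getAsLong() > deadline) {
            throw new LimitExceededException("query took too long");
        }
    }

    public static void main(String[] args) {
        WorkLimitSketch limiter = new WorkLimitSketch(1000, 5000, System::currentTimeMillis);
        try {
            for (int doc = 0; doc < 5000; doc++) {
                limiter.tick();  // stand-in for examining one document
            }
        } catch (LimitExceededException e) {
            System.out.println("Query aborted: " + e.getMessage());
        }
    }
}
```

Throwing an unchecked exception from deep inside the search loop lets the caller abandon the whole query in one place, at the cost of making the limit invisible in method signatures.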

Retrieving Query Results

Once a query has been performed, the following classes provide access to the document hits, and to text snippets within each document.
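
To make the shape of such results concrete, here is a minimal sketch of document-level hits that each carry a score and a list of text snippets, with a bounded priority queue keeping only the best-scoring hits. The Hit class, its fields, and topHits are hypothetical stand-ins invented for this example; they do not reproduce the DocHit, Snippet, or HitQueue APIs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch: a stand-in data model, not XTF's result classes.
class ResultSketch {
    /** One document-level hit with its matching snippets. */
    static class Hit {
        final String docKey;
        final float score;
        final List<String> snippets;

        Hit(String docKey, float score, List<String> snippets) {
            this.docKey = docKey;
            this.score = score;
            this.snippets = snippets;
        }
    }

    /** Returns at most maxHits hits, best score first. */
    static List<Hit> topHits(List<Hit> candidates, int maxHits) {
        Comparator<Hit> byScore = Comparator.comparingDouble(h -> h.score);
        // Min-heap: the weakest retained hit sits at the head and is evicted first.
        PriorityQueue<Hit> queue = new PriorityQueue<>(byScore);
        for (Hit hit : candidates) {
            queue.add(hit);
            if (queue.size() > maxHits) {
                queue.poll();
            }
        }
        List<Hit> ranked = new ArrayList<>(queue);
        ranked.sort(byScore.reversed());
        return ranked;
    }

    public static void main(String[] args) {
        List<Hit> hits = Arrays.asList(
            new Hit("doc-a", 0.2f, Arrays.asList("...a white whale...")),
            new Hit("doc-b", 0.9f, Arrays.asList("...the whale rose...", "...whale oil...")),
            new Hit("doc-c", 0.5f, Arrays.asList("...whale song...")));
        for (Hit hit : topHits(hits, 2)) {
            System.out.println(hit.docKey + " score=" + hit.score);
            for (String snippet : hit.snippets) {
                System.out.println("    " + snippet);
            }
        }
    }
}
```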

Utility Classes

These classes don't fit into any other category.