Class | Description |
---|---|
AccentFoldingFilter | Improves query results by converting accented characters to their unaccented equivalents by removing diacritics. |
CrimsonBugWorkaround | Works around a very nasty bug in the Apache Crimson XML parser. |
CrimsonBugWorkaround.BlockEnum | Presents the input stream as a series of blocks of data. |
DocSelCache | This class represents the contents of the Document Selector Cache maintained by the indexer. |
DocSelCache.Entry | One entry in the docSelector cache. |
FacetTokenizer | Performs special tokenization for facet fields. |
HTMLIndexSource | Transforms an HTML file to a single-record XML file. |
HTMLToString | This class provides a single static `convert()` method that converts an HTML file into an XML string that can be pre-filtered and added to a Lucene database by the `XMLTextProcessor` class. |
IdxTreeCleaner | This class purges "incomplete" documents from a Lucene index. |
IdxTreeCuller | This class provides a simple mechanism for removing documents from an index when the source text no longer exists in the document library. |
IdxTreeDictMaker | This class provides a simple mechanism for generating a spelling correction dictionary after new documents have been added or updated. |
IdxTreeOptimizer | This class provides a simple mechanism for optimizing Lucene indices after new documents have been added, updated, or removed. |
IndexDump | This class dumps the contents of user-selected fields from an XTF text index. |
IndexerConfig | This class records configuration information about the current state of the TextIndexer application. |
IndexInfo | This class maintains configuration information about the current index that the TextIndexer program is processing. |
IndexMerge | This class merges the contents of two or more XTF indexes, with certain caveats. |
IndexMerge.DirInfo | |
IndexRecord | A single record within an `IndexSource`. |
IndexSource | Represents a single source of data for an XTF index. |
IndexStats | This class calculates and prints out some useful statistics about an existing index, such as number of documents, size, etc. |
IndexSync | Takes care of copying the differences between a source index and a destination index to make them exactly equal. |
MARCIndexSource | Supplies MARC data to an XTF index, breaking it up into individual MARCXML records. |
MSWordIndexSource | Transforms a Microsoft Word file to a single-record XML file. |
PDFIndexSource | Transforms a PDF file to a single-record XML file. |
PDFToString | This class provides a single static `convert()` method that converts the text in a PDF file into an XML string that can be pre-filtered and added to a Lucene database by the `XMLTextProcessor` class. |
PluralFoldingFilter | Improves query results by converting plural words to their singular forms. |
SectionInfo | This class maintains information about the current section in a text document that the TextIndexer program is processing. |
SectionInfoStack | This class maintains information about the current nesting of sections in a text document that the TextIndexer program is processing. |
SpellWritingFilter | Adds words from the token stream to a SpellWriter. |
SrcTreeProcessor | This class is the main processing shell for files in the source text tree. |
StartEndFilter | Ensures that the tokens at the start and end of the stream are indexed both with and without the special start-of-field/end-of-field markers. |
StructuredFileProxy | Used to put off actually creating a structured store until it is needed. |
TagFilter | Spots XML elements in a token stream and marks them specially. |
TextIndexer | This class is the main class for the TextIndexer program. |
TextIndexSource | Transforms a plain-text file to a single-record XML file. |
UnicodeNormalizingFilter | Applies Unicode normalization to the tokens. |
XMLConfigParser | This class parses TextIndexer configuration XML files. |
XMLIndexSource | Supplies a single file containing a single record to the `XMLTextProcessor`. |
XMLTextProcessor | This class performs the actual parsing of the XML source text files and generates index information for them. |
XtfSpecialTokensFilter | The `XtfSpecialTokensFilter` class is used by the `XTFTextAnalyzer` class to convert special "bump" count values in text chunks to actual position increments for words prior to adding them to a Lucene index. |
XTFTextAnalyzer | The `XTFTextAnalyzer` class performs the task of breaking up a contiguous chunk of text into a list of separate words (tokens, in Lucene parlance). |
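
To make the accent-folding idea listed for AccentFoldingFilter concrete, here is a minimal sketch using only the JDK. It is not the filter's actual implementation, which this summary does not show; the class name `AccentFoldingSketch` and the method `foldAccents` are hypothetical. The sketch decomposes each character into a base letter plus combining marks, then drops the marks.

```java
import java.text.Normalizer;

// Hypothetical illustration only; not part of the textIndexer package.
public class AccentFoldingSketch {

    /**
     * Removes diacritics by decomposing the term (NFD) and then
     * deleting every Unicode combining mark (category M).
     */
    public static String foldAccents(String term) {
        String decomposed = Normalizer.normalize(term, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        // "r\u00e9sum\u00e9" is "resume" written with two acute accents.
        System.out.println(foldAccents("r\u00e9sum\u00e9")); // prints "resume"
    }
}
```

Letters that have no canonical decomposition, such as "ø", pass through this sketch unchanged, so a production filter would likely need an explicit mapping table in addition.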
Exception | Description |
---|---|
TextIndexerException | This exception is thrown by classes related to the textIndexer tool. |
Contains all the classes that make up the textIndexer tool.
The TextIndexer class is the main command-line interface, while XMLTextProcessor does most of the heavy lifting (scanning documents, breaking them into chunks, and passing the chunks to Lucene).
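
The chunking step mentioned above can be pictured with a small stand-alone sketch. Everything in it is an assumption made for illustration: the class `ChunkingSketch`, the `chunkSize` and `overlap` parameters (both counted in words), and the idea that adjacent chunks overlap are not taken from this summary, and the code does not call Lucene or XMLTextProcessor at all.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not the XMLTextProcessor algorithm.
public class ChunkingSketch {

    /**
     * Splits text into chunks of at most chunkSize words, where each chunk
     * repeats the last `overlap` words of the previous one.
     */
    public static List<String> chunk(String text, int chunkSize, int overlap) {
        if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
            throw new IllegalArgumentException("need 0 <= overlap < chunkSize");
        }
        List<String> chunks = new ArrayList<>();
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return chunks; // nothing to index
        }
        String[] words = trimmed.split("\\s+");
        int step = chunkSize - overlap;
        for (int start = 0; start < words.length; start += step) {
            int end = Math.min(start + chunkSize, words.length);
            chunks.add(String.join(" ", Arrays.copyOfRange(words, start, end)));
            if (end == words.length) {
                break; // the final words are already covered
            }
        }
        return chunks;
    }

    public static void main(String[] args) {
        // In a real indexer each chunk would be handed to the index writer;
        // here the chunks are simply printed.
        for (String c : chunk("a b c d e f g", 4, 2)) {
            System.out.println(c);
        }
    }
}
```

With a chunk size of 4 and an overlap of 2, the input `a b c d e f g` yields the chunks `a b c d`, `c d e f`, and `e f g`. Overlap of this kind is one common way to let phrase and proximity matches span a chunk boundary.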