org.cdlib.xtf.textIndexer
Class TextIndexer
Object
TextIndexer
public class TextIndexer
- extends Object
This class is the main class for the TextIndexer program.
Internally, this class retrieves command line arguments, and processes them
in order to index source XML files into one or more Lucene databases. The
command line arguments required by the TextIndexer program are as follows:
TextIndexer -config CfgFilePath
{ {-clean|-incremental}?
{-trace errors|warnings|info|debug}?
-index IndexName }+
The -config argument identifies an XML configuration file that
defines one or more indices to be created, updated, or deleted. This argument
must be the first argument passed, and it must be passed only once. For a
complete description of the contents of the configuration file, see the
XMLConfigParser class.
The -clean / -incremental argument is an optional
argument that specifies whether Lucene indices should be rebuilt from scratch
(-clean) or should be updated (-incremental). If
this argument is not specified, the default behavior is incremental.
The -buildlazy / -nobuildlazy argument is an
optional argument that specifies whether the indexer should build a
persistent ("lazy") version of each document during the indexing process.
The lazy files are stored in the index directory, and they speed up later
dynaXML access. If this argument is not specified, the default behavior is
to build lazy versions of the documents.
The -optimize / -nooptimize argument is an optional
argument that specifies whether the indexer should optimize the indexes after
they are built. Optimization improves query speed, but can take a very long
time to complete depending on the index size. If this argument is not
specified, the default behavior is to optimize.
The -trace argument is an optional argument that sets the level
of output displayed by the text indexer. The output levels are defined as
follows:
errors - Only error messages are displayed.
warnings - Both error and warning messages are displayed.
info - Error, warning, and informational messages are displayed.
debug - Low level debug output is displayed in addition to
error, warning and informational messages.
If this argument is not specified, the TextIndexer defaults to displaying
informational (info) level messages.
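The nesting of the four trace levels can be illustrated with a small Java sketch. This is not the indexer's own code: the TraceLevel enum and its parse and shows methods are hypothetical names introduced here, and the sketch only assumes that each level displays everything the previous level displays, as the list above describes.

```java
import java.util.Locale;

// Hypothetical illustration only; these names are not part of the XTF API.
enum TraceLevel {
    // Declared from least to most verbose, so ordinal() encodes verbosity.
    ERRORS, WARNINGS, INFO, DEBUG;

    /** Level used when -trace is not given (info, per the class description). */
    static final TraceLevel DEFAULT = INFO;

    /** Parses the word that follows -trace, e.g. "warnings". */
    static TraceLevel parse(String value) {
        return valueOf(value.toUpperCase(Locale.ROOT));
    }

    /** True if a message of the given severity is displayed at this level. */
    boolean shows(TraceLevel messageSeverity) {
        return messageSeverity.ordinal() <= this.ordinal();
    }
}
```

With this ordering, a level of warnings shows error and warning messages but suppresses informational and debug output.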
The -index argument identifies the name of the index to be
created/updated. The name must be one of the index names contained in the
configuration file specified as the first parameter. As mentioned above,
the -config parameter must be specified first. After that,
the remaining arguments may be repeated one or more times to update a single
index or multiple indices.
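The ordering rules described above (-config first and only once, followed by one or more -index arguments) could be checked with a sketch like the following. It is a simplified illustration, not the parser the TextIndexer actually uses; the ArgOrderSketch class and its indexNames method are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the TextIndexer's real argument parser.
final class ArgOrderSketch {
    /** Returns the requested index names in order, or throws if a layout rule is broken. */
    static List<String> indexNames(String[] args) {
        // Rule 1: -config and its file path must come first.
        if (args.length < 2 || !args[0].equals("-config")) {
            throw new IllegalArgumentException("-config must be the first argument");
        }
        List<String> names = new ArrayList<>();
        for (int i = 2; i < args.length; i++) {
            // Rule 2: -config may be passed only once.
            if (args[i].equals("-config")) {
                throw new IllegalArgumentException("-config must be passed only once");
            }
            if (args[i].equals("-index")) {
                if (i + 1 >= args.length) {
                    throw new IllegalArgumentException("-index requires an index name");
                }
                names.add(args[++i]); // consume the name that follows -index
            }
            // Other flags (-clean, -trace, ...) are ignored by this sketch.
        }
        // Rule 3: at least one index must be named.
        if (names.isEmpty()) {
            throw new IllegalArgumentException("at least one -index is required");
        }
        return names;
    }
}
```

The sketch does not verify that each name actually appears in the configuration file; that check would require reading the file itself.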
A simple example of command line parameters for the TextIndexer might
look like this:
TextIndexer -config IdxConfig.xml -clean -index AllText
This example assumes that the config file is called IdxConfig.xml,
that the config file contains an entry for an index called AllText, and
that the user wants the index to be rebuilt from scratch (because of the
-clean argument).
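When the indexer is driven from other Java code, the same argument sequence can be assembled programmatically. The following is a minimal sketch under stated assumptions: IndexerArgsSketch and build are hypothetical names, and only the -config, -clean / -incremental, and -index arguments are covered.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; not part of the XTF distribution.
final class IndexerArgsSketch {
    /** Builds an argument array with -config first, then one group per index. */
    static String[] build(String configPath, boolean clean, String... indexNames) {
        List<String> args = new ArrayList<>();
        args.add("-config");            // must be first and appear only once
        args.add(configPath);
        for (String name : indexNames) {
            args.add(clean ? "-clean" : "-incremental");
            args.add("-index");
            args.add(name);
        }
        return args.toArray(new String[0]);
    }
}
```

Calling build("IdxConfig.xml", true, "AllText") yields the same sequence as the example command line: -config IdxConfig.xml -clean -index AllText. With the XTF classes on the classpath, the resulting array could be passed to TextIndexer.main(String[] args).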
Field Summary
static String CURRENT_VERSION - The version of the text indexer (placed into any indexes created).
static String REQUIRED_VERSION - The minimum index version that we can read.
Method Summary
static void main(String[] args) - Main entry-point for the Text Indexer.
Methods inherited from class Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
CURRENT_VERSION
public static final String CURRENT_VERSION
- The version of the text indexer (placed into any indexes created).
- See Also:
- Constant Field Values
REQUIRED_VERSION
public static final String REQUIRED_VERSION
- The minimum index version that we can read
- See Also:
- Constant Field Values
TextIndexer
public TextIndexer()
main
public static void main(String[] args)
- Main entry-point for the Text Indexer.
This function takes the command line arguments passed and uses them to
create or update the specified indices with the specified source text.
- Parameters:
args - Command line arguments to process. The command line
arguments required by the TextIndexer program are as follows:
TextIndexer -config CfgFilePath
{ {-clean|-incremental}?
{-trace errors|warnings|info|debug}?
-index IndexName }+
For a complete description of each command line argument, see the
TextIndexer class description.