textIndexer Programming
Introduction
The purpose of the textIndexer tool is to create or update a document search index whenever documents are added to, updated in, or removed from the document library. If we isolate and zoom in on the textIndexer portion of the XTF Overview Diagram shown in the Introduction, we see something like this:

The diagram shows that the textIndexer uses a Document Selector stylesheet to select which files in the document library need to be indexed. For non-XML document files, the text to index is extracted and converted to XML. This base XML is then processed by the Document Pre-Filter stylesheet, which adds meta-data and/or sectioning information to the text. The resulting filtered XML is passed on to the actual Text Indexer Engine, which breaks the text into smaller overlapping chunks and adds them to a Lucene-based word index. The crossQuery servlet can then use the index to quickly locate files in the document library that contain the text requested by the user. Optionally, the dynaXML servlet can also use the index to highlight matches in the context of their original XML documents.
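To make the idea of "smaller overlapping chunks" concrete, here is a minimal sketch in Python. It is not the Text Indexer Engine's actual code: the function name `overlapping_chunks` and the `chunk_size` and `overlap` parameters are hypothetical, chosen only for illustration.

```python
def overlapping_chunks(words, chunk_size, overlap):
    """Split a list of words into chunks of up to `chunk_size` words,
    where each chunk repeats the last `overlap` words of the previous one.

    Illustrative only: the names and parameters are assumptions, not
    taken from the XTF source.
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")

    step = chunk_size - overlap  # how far each new chunk advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(words[start:start + chunk_size])
        # Stop once a chunk has reached the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks
```

With `chunk_size=4` and `overlap=2`, the seven words `a b c d e f g` become the chunks `a b c d`, `c d e f`, and `e f g`. One effect of the overlap is that a short phrase straddling a chunk boundary, such as `d e`, still appears whole in at least one chunk.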
The textIndexer can handle many documents of various types, which may be filtered in different ways. The following diagram shows how these decisions are made.
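The per-document flow described so far (select, convert non-XML input to XML, pre-filter, index) can be sketched roughly as follows. This is a simplified Python model, not the textIndexer's implementation: in XTF the selection and filtering are done by XSLT stylesheets, and every name below (`index_document`, `selector`, `converters`, `prefilters`, `add_to_index`) is hypothetical. The sketch also assumes that the selector's decision names both the document format and the pre-filter to apply.

```python
def index_document(doc, selector, converters, prefilters, add_to_index):
    """Run one document through a simplified select/convert/filter/index flow.

    doc          -- dict with "path" and "content" keys (hypothetical shape)
    selector     -- callable: path -> (format, prefilter_name), or None to skip
    converters   -- dict: non-XML format name -> callable producing XML text
    prefilters   -- dict: prefilter name -> callable transforming XML text
    add_to_index -- callable: (path, filtered_xml) -> None

    Returns True if the document was indexed, False if it was skipped.
    """
    decision = selector(doc["path"])
    if decision is None:
        return False  # the selector chose not to index this file

    fmt, prefilter_name = decision

    # Non-XML documents are first converted to a base XML form.
    if fmt == "XML":
        base_xml = doc["content"]
    else:
        base_xml = converters[fmt](doc["content"])

    # The pre-filter adds meta-data and/or sectioning information.
    filtered_xml = prefilters[prefilter_name](base_xml)

    # The filtered XML is handed to the indexing step.
    add_to_index(doc["path"], filtered_xml)
    return True
```

Passing the selector, converters, and pre-filters in as arguments mirrors the way the real tool keeps these decisions outside the indexing engine, so that they can be changed without touching the engine itself.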

The textIndexer.conf file, the Document Selector stylesheet, and the Pre-Filter stylesheet together define how the textIndexer performs the document indexing process. A complete discussion of the textIndexer.conf file appears in the XTF Deployment Guide. The next two subsections discuss the inner workings of the Document Selector and Pre-Filter stylesheets.