org.cdlib.xtf.textIndexer
Class XMLConfigParser

Object
  extended by DefaultHandler
      extended by XMLConfigParser
All Implemented Interfaces:
ContentHandler, DTDHandler, EntityResolver, ErrorHandler

public class XMLConfigParser
extends DefaultHandler

This class parses TextIndexer configuration XML files.

The TextIndexer uses a configuration file that describes one or more named indexes. Each index description identifies the source text directory and the Lucene database directory associated with the index, as well as the chunk size and chunk overlap for the index.

The format of the configuration file is as follows:

<?xml version="1.0" encoding="utf-8"?>
<textIndexer-config>

<index name="IndexName">
<db path="LuceneIndexPath"/>
<src path="XMLSourcePath"/>
<chunk size="ChunkSize" overlap="ChunkOverlap"/>
<skip files="*.xxx*, *.yyy, ..."/>
<inputfilter path="XSLPreFilterFile"/>

</index>

</textIndexer-config>
Each element should appear at most once for each index specified. If an element is specified more than once for an index, the last instance is used.

A simple example of a TextIndexer config file might look as follows:

<?xml version="1.0" encoding="utf-8"?>
<textIndexer-config>
<index name="AllText">
<db path="./IndexDBs"/>
<src path="./SourceText"/>
<chunk size="100" overlap="50"/>
<skip files="*.mets*, *AuthMech*"/>
<inputfilter path="./BasicFilter.xsl"/>
</index>

</textIndexer-config>

Notes:
This class is derived from the SAX DefaultHandler class so that its startElement() and endElement() methods can be called internally from the Java SAXParser class.
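
The callback pattern described in the note above can be sketched with a small, self-contained handler. This is an illustration only, not the XTF source: the class name ConfigHandlerSketch, the parse() helper, and the "element.attribute" key scheme are all assumptions made for the example. The three boolean flags borrow the names of this class's private fields, but how the real class uses them is not documented here.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical stand-in for illustration; not the XTF XMLConfigParser source.
class ConfigHandlerSketch extends DefaultHandler {
    private final String wantedIndex;   // name of the <index> block to read (assumed input)
    private boolean isConfigFile;       // the <textIndexer-config> root has been seen
    private boolean indexNameFound;     // an <index> with the wanted name has been seen
    private boolean inNamedIndexBlock;  // the parser is currently inside that <index>
    private final Map<String, String> settings = new HashMap<>();

    ConfigHandlerSketch(String wantedIndex) {
        this.wantedIndex = wantedIndex;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (qName.equals("textIndexer-config")) {
            isConfigFile = true;
            return;
        }
        if (!isConfigFile) return;
        if (qName.equals("index")) {
            inNamedIndexBlock = wantedIndex.equals(atts.getValue("name"));
            indexNameFound |= inNamedIndexBlock;
            return;
        }
        if (!inNamedIndexBlock) return;
        // Store each attribute under "element.attribute". A repeated element
        // overwrites the earlier value, so the last instance wins.
        for (int i = 0; i < atts.getLength(); i++) {
            settings.put(qName + "." + atts.getQName(i), atts.getValue(i));
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equals("index")) inNamedIndexBlock = false;
    }

    // Parses the XML text and returns the settings of the named index.
    static Map<String, String> parse(String xml, String indexName) {
        ConfigHandlerSketch handler = new ConfigHandlerSketch(indexName);
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse config", e);
        }
        if (!handler.indexNameFound) {
            throw new IllegalArgumentException("index not found: " + indexName);
        }
        return handler.settings;
    }
}
```

With the simple example config shown earlier, ConfigHandlerSketch.parse(xml, "AllText") would return a map in which "db.path" is "./IndexDBs" and "chunk.size" is "100". The sketch relies on the default SAXParserFactory, which is not namespace aware, so element names arrive in qName.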

To use this class, create an instance and then call its configure() method.


Field Summary
private  IndexerConfig configInfo
           
private  boolean indexNameFound
           
private  boolean inNamedIndexBlock
           
private  boolean isConfigFile
           
 
Constructor Summary
XMLConfigParser()
           
 
Method Summary
 int configure(IndexerConfig cfgInfo)
          This method parses a config file and stores the resulting parameters in a config info structure.
 void endElement(String uri, String localName, String qName)
          Method called when an end tag is encountered in the config file.
 void startElement(String uri, String localName, String qName, Attributes atts)
          Method called when a start tag is encountered in the config file.
 
Methods inherited from class DefaultHandler
characters, endDocument, endPrefixMapping, error, fatalError, ignorableWhitespace, notationDecl, processingInstruction, resolveEntity, setDocumentLocator, skippedEntity, startDocument, startPrefixMapping, unparsedEntityDecl, warning
 
Methods inherited from class Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

isConfigFile

private boolean isConfigFile

indexNameFound

private boolean indexNameFound

inNamedIndexBlock

private boolean inNamedIndexBlock

configInfo

private IndexerConfig configInfo
Constructor Detail

XMLConfigParser

public XMLConfigParser()
Method Detail

configure

public int configure(IndexerConfig cfgInfo)
              throws Exception
This method parses a config file and stores the resulting parameters in a config info structure.

To read indexing configuration info, create an instance of this class and call this method with a config structure whose cfgFilePath field holds the path/name of the config file to read.

Parameters:
cfgInfo - Upon entry, a config structure with the path/name of the config file to read in the cfgFilePath field.

Upon return, the same config structure with parameter values from the config file stored in their respective fields.

Throws:
Exception - Any internal exceptions generated while parsing the configuration file.

Notes:
The format of the XML file is explained in greater detail in the description for the XMLConfigParser class.
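
The in/out contract described above (a config structure carries the file path in and the parsed values out) can be sketched as follows. Everything here is a hypothetical stand-in: SketchConfig and SketchParser are not XTF classes, the indexName, dbPath, and srcPath fields are assumed, and the 0 / -1 return convention is an assumption, since the meaning of the real method's int result is not documented here. Only the cfgFilePath field name comes from the description above.

```java
import java.io.File;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical stand-in for IndexerConfig.
class SketchConfig {
    String cfgFilePath;  // in:  path/name of the config file to read
    String indexName;    // in:  which <index> block to read (assumed field)
    String dbPath;       // out: value of <db path="..."/> (assumed field)
    String srcPath;      // out: value of <src path="..."/> (assumed field)
}

// Hypothetical stand-in for the parser; not the XTF implementation.
class SketchParser {
    // Fills cfg from the file named in cfg.cfgFilePath.
    // Returns 0 if the named index was found, -1 otherwise (assumed convention).
    static int configure(SketchConfig cfg) throws Exception {
        // state[0]: currently inside the wanted <index>; state[1]: wanted index seen
        boolean[] state = new boolean[2];
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (qName.equals("index")) {
                    state[0] = cfg.indexName.equals(atts.getValue("name"));
                    state[1] |= state[0];
                } else if (state[0] && qName.equals("db")) {
                    cfg.dbPath = atts.getValue("path");
                } else if (state[0] && qName.equals("src")) {
                    cfg.srcPath = atts.getValue("path");
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (qName.equals("index")) state[0] = false;
            }
        };
        SAXParserFactory.newInstance().newSAXParser().parse(new File(cfg.cfgFilePath), handler);
        return state[1] ? 0 : -1;
    }
}
```

A caller would set cfgFilePath and indexName on a SketchConfig, call SketchParser.configure(cfg), and then read dbPath and srcPath from the same object.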


startElement

public void startElement(String uri,
                         String localName,
                         String qName,
                         Attributes atts)
                  throws SAXException
Method called when a start tag is encountered in the config file.

This class is derived from the SAX DefaultHandler class so that the parser can call this method each time a start tag is encountered in the XML config file.

Specified by:
startElement in interface ContentHandler
Overrides:
startElement in class DefaultHandler
Parameters:
uri - The current namespace URI in use.
localName - The local name (i.e., without prefix) of the current element, or the empty string if namespace processing is disabled.
qName - The qualified name (i.e., with prefix) for the current element, or the empty string if qualified names are disabled.
atts - The specified or defaulted attributes for the current element. These consist of any xxx="yyy" style attributes that appear between the < and > of the element's start tag.

Throws:
SAXException - Any internal exceptions generated due to syntax problems in the element.

Notes:
For an explanation of the config file format, see the main description for the XMLConfigParser class.
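
To make the atts parameter more concrete, the following sketch shows how a startElement() override can walk the SAX Attributes object using its getLength(), getQName(int), and getValue(int) methods. The class name AttributeDump and the attributesOf() helper are hypothetical and exist only for this example; they are not part of XTF.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper for illustration; not part of XTF.
class AttributeDump {
    // Collects the attributes of every element named elementName.
    // If the element occurs more than once, later values overwrite earlier ones.
    static Map<String, String> attributesOf(String xml, String elementName) {
        Map<String, String> found = new LinkedHashMap<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (!qName.equals(elementName)) return;
                for (int i = 0; i < atts.getLength(); i++) {
                    // getQName(i) is the attribute name, getValue(i) its string value
                    found.put(atts.getQName(i), atts.getValue(i));
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
        return found;
    }
}
```

For a <chunk size="100" overlap="50"/> element, attributesOf(xml, "chunk") would yield the two entries size and overlap. Attribute values always arrive as strings, so numeric settings such as the chunk size need to be converted by the handler.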


endElement

public void endElement(String uri,
                       String localName,
                       String qName)
                throws SAXException
Method called when an end tag is encountered in the config file.

This class is derived from the SAX DefaultHandler class so that the parser can call this method each time an end tag is encountered in the XML config file.

Specified by:
endElement in interface ContentHandler
Overrides:
endElement in class DefaultHandler
Parameters:
uri - The current namespace URI in use.
localName - The local name (i.e., without prefix) of the current element, or the empty string if namespace processing is disabled.
qName - The qualified name (i.e., with prefix) for the current element, or the empty string if qualified names are disabled.
Throws:
SAXException - Any internal exceptions generated due to syntax problems in the element.

Notes:
For an explanation of the config file format, see the main description for the XMLConfigParser class.