org.cdlib.xtf.textEngine
Class IndexUtil

Object
  extended by IndexUtil

public class IndexUtil
extends Object

This class provides methods related to, but not always part of, a text index. For instance, there are methods to calculate document keys (as used in an index), or lazy file paths. It also maintains a publicly accessible cache of index info entries read from the index config file(s).

Author:
Martin Haye

Field Summary
private static ConfigCache configCache
           
private static SAXParserFactory saxParserFactory
           
private static TransformerFactory transformerFactory
           
 
Constructor Summary
IndexUtil()
           
 
Method Summary
static void applyPreFilters(Templates[] prefilterStylesheets, XMLReader reader, InputSource xmlSource, Result ultimateResult)
          Apply one or more prefilter stylesheets to an XML input source.
static String calcDocKey(File xtfHome, File idxConfigFile, String idxName, File srcTextFile)
          Given an index within a config file and the path to the source XML text of a document, this method infers the correct document key that should be stored in the index.
static String calcDocKey(File xtfHomeFile, IndexInfo idxInfo, File srcTextFile)
          Given the configuration info for an index and the path to the source XML text of a document, this method infers the correct document key that should be stored in the index.
static File calcLazyPath(File xtfHome, File idxConfigFile, String idxName, File srcTextFile, boolean createDir)
          Given an index within a config file and the path to the source XML text of a document, this method infers the correct path to the lazy version of that source document.
static File calcLazyPath(File xtfHome, IndexInfo idxInfo, File srcTextFile, boolean createDir)
          Given the configuration info for an index and the path to the source XML text of a document, this method infers the correct path to the lazy version of that source document.
static SAXParser createSAXParser()
          Create a SAX parser using the best implementation we can find.
static Transformer createTransformer()
          Create a Saxon transformer.
static XMLReader createXMLReader()
          Create an XML reader using the best implementation we can find.
static InputStream filterXMLDocument(InputStream inStream, boolean applyCrimsonWorkaround, boolean removeDoctypeDecl)
          Applies the standard set of filters for an XML document.
static InputStream filterXMLDocument(InputStream inStream, SAXParser saxParser, boolean removeDoctypeDecl)
          Applies the standard set of filters for an XML document.
static IndexInfo getIndexInfo(File idxConfigFile, String idxName)
          Given an index configuration file and the name of an index within that file, fetch the configuration info.
private static TransformerFactory getTransformerFactory()
          Get a TransformerFactory.
 
Methods inherited from class Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

configCache

private static ConfigCache configCache

saxParserFactory

private static SAXParserFactory saxParserFactory

transformerFactory

private static TransformerFactory transformerFactory

Constructor Detail

IndexUtil

public IndexUtil()

Method Detail

getIndexInfo

public static IndexInfo getIndexInfo(File idxConfigFile,
                                     String idxName)
                              throws Exception
Given an index configuration file and the name of an index within that file, fetch the configuration info. This method is memoized: the info for a given index is cached after the first request, so it is loaded only once.

Parameters:
idxConfigFile - Index configuration file to read
idxName - Name of the index within that file
Returns:
Information for the specified index.
Throws:
Exception - If there is a problem reading the config file.
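
The memoizing behavior described above can be sketched with a small cache built on the standard library. This is only an illustration under stated assumptions, not the actual XTF ConfigCache: the class name MemoizedIndexLookup, the loader function, and the choice of cache key (config file path plus index name) are all hypothetical.

```java
import java.io.File;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiFunction;

/** Hypothetical memoizing lookup; not the actual XTF ConfigCache. */
final class MemoizedIndexLookup<V> {
    private final Map<String, V> cache = new ConcurrentHashMap<>();
    private final BiFunction<File, String, V> loader;

    /** @param loader assumed function that reads one index entry from a config file */
    MemoizedIndexLookup(BiFunction<File, String, V> loader) {
        this.loader = loader;
    }

    /** Returns the cached entry, invoking the loader only on the first request for a key. */
    V get(File configFile, String indexName) {
        // Assumption: key on both the config file and the index name, so two
        // config files that define the same index name do not collide.
        String key = configFile.getAbsolutePath() + "|" + indexName;
        return cache.computeIfAbsent(key, k -> loader.apply(configFile, indexName));
    }
}
```

With a cache like this, repeated calls for the same file and index name return the stored value without touching the config file again, while a different index name triggers one additional load.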

calcLazyPath

public static File calcLazyPath(File xtfHome,
                                File idxConfigFile,
                                String idxName,
                                File srcTextFile,
                                boolean createDir)
                         throws IOException
Given an index within a config file and the path to the source XML text of a document, this method infers the correct path to the lazy version of that source document. The lazy version will be somewhere within the index's directory.

Parameters:
xtfHome - File at the root of the XTF directory tree
idxConfigFile - File to load index configuration from
idxName - Index name within the config
srcTextFile - Source text file of interest
createDir - true to create the directory for the lazy file if it doesn't exist; false to never create the directory.
Returns:
Expected location of the lazy version of the source file
Throws:
IOException

calcLazyPath

public static File calcLazyPath(File xtfHome,
                                IndexInfo idxInfo,
                                File srcTextFile,
                                boolean createDir)
                         throws IOException
Given the configuration info for an index and the path to the source XML text of a document, this method infers the correct path to the lazy version of that source document. The lazy version will be somewhere within the index's directory.

Parameters:
xtfHome - File at the root of the XTF directory tree
idxInfo - Configuration info for the index in question.
srcTextFile - Source text file of interest
createDir - true to create the directory for the lazy file if it doesn't exist; false to never create the directory.
Returns:
Expected location of the lazy version of the source file
Throws:
IOException
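
The documentation only states that the lazy file lives somewhere within the index's directory. As a rough sketch of how such a path could be derived, the code below mirrors the source file's relative path under the index directory. The layout (a "lazy" subdirectory and a ".lazy" suffix), the sourceRoot parameter, and the names LazyPathSketch and lazyPathFor are assumptions made for illustration, not the documented XTF scheme.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical lazy-path calculation; the directory layout is an assumption. */
final class LazyPathSketch {
    /**
     * Assumed layout: indexDir/lazy/(source path relative to sourceRoot).lazy
     *
     * @param createDir true to create the parent directory of the lazy file if missing
     */
    static File lazyPathFor(File indexDir, File sourceRoot, File srcTextFile, boolean createDir)
            throws IOException {
        Path root = sourceRoot.toPath().toAbsolutePath().normalize();
        Path src = srcTextFile.toPath().toAbsolutePath().normalize();
        if (!src.startsWith(root)) {
            throw new IOException("Source file is not under the source root: " + src);
        }
        // Mirror the source file's relative location inside the index directory.
        Path rel = root.relativize(src);
        Path lazy = indexDir.toPath().resolve("lazy").resolve(rel.toString() + ".lazy");
        if (createDir) {
            Files.createDirectories(lazy.getParent());
        }
        return lazy.toFile();
    }
}
```

Passing false for createDir makes the call a pure path computation, which suits callers that only want to check whether a lazy file already exists.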

calcDocKey

public static String calcDocKey(File xtfHome,
                                File idxConfigFile,
                                String idxName,
                                File srcTextFile)
                         throws IOException
Given an index within a config file and the path to the source XML text of a document, this method infers the correct document key that should be stored in the index.

Parameters:
xtfHome - The XTF_HOME directory
idxConfigFile - File to load index configuration from
idxName - Index name within the config
srcTextFile - Source text file of interest
Returns:
Document key to store or look for in the index
Throws:
IOException

calcDocKey

public static String calcDocKey(File xtfHomeFile,
                                IndexInfo idxInfo,
                                File srcTextFile)
                         throws IOException
Given the configuration info for an index and the path to the source XML text of a document, this method infers the correct document key that should be stored in the index.

Parameters:
xtfHomeFile - The XTF_HOME directory
idxInfo - Configuration info for the index in question.
srcTextFile - Source text file of interest
Returns:
Document key to store or look for in the index
Throws:
IOException
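
The key format itself is not specified here. To make the idea concrete, the sketch below builds a key from the index name and the source file's path relative to a source root, using '/' as separator so the key is the same on every platform. The format "indexName:relative/path", the sourceRoot parameter, and the names DocKeySketch and docKeyFor are assumptions for illustration only.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Path;

/** Hypothetical document-key calculation; the key format is an assumption. */
final class DocKeySketch {
    /** Assumed format: indexName + ":" + source path relative to sourceRoot, '/'-separated. */
    static String docKeyFor(String indexName, File sourceRoot, File srcTextFile)
            throws IOException {
        Path root = sourceRoot.toPath().toAbsolutePath().normalize();
        Path src = srcTextFile.toPath().toAbsolutePath().normalize();
        if (!src.startsWith(root)) {
            throw new IOException("Source file is not under the source root: " + src);
        }
        // Join the path elements with '/' regardless of the platform separator.
        StringBuilder rel = new StringBuilder();
        for (Path part : root.relativize(src)) {
            if (rel.length() > 0) {
                rel.append('/');
            }
            rel.append(part.toString());
        }
        return indexName + ":" + rel;
    }
}
```

Normalizing both paths first means that equivalent spellings of the same file (for example, with redundant ".." segments) map to a single key.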

createSAXParser

public static SAXParser createSAXParser()
Create a SAX parser using the best implementation we can find. We prefer the new parser supplied by Java 1.5. Failing that, we try for the Crimson parser, and if that's not found, we try the default.


createXMLReader

public static XMLReader createXMLReader()
Create an XML reader using the best implementation we can find. We prefer the new parser supplied by Java 1.5. Failing that, we try for the Crimson parser, and if that's not found, we try the default.
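
The "preferred implementation first, default last" strategy described for createSAXParser and createXMLReader can be sketched with the standard JAXP factory API. The class and method names below are hypothetical, and the list of preferred factory class names is supplied by the caller rather than hard-coded, since which implementations are present depends on the runtime.

```java
import java.util.List;
import javax.xml.parsers.FactoryConfigurationError;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;

/** Hypothetical fallback chain for obtaining a SAX parser. */
final class ParserFallbackSketch {
    /**
     * Tries each factory class name in order and returns a parser from the
     * first one that can be loaded; falls back to the platform default.
     */
    static SAXParser createParser(List<String> preferredFactories) {
        for (String className : preferredFactories) {
            try {
                // A null class loader means the thread context class loader is used.
                return SAXParserFactory.newInstance(className, null).newSAXParser();
            } catch (FactoryConfigurationError | ParserConfigurationException | SAXException e) {
                // This implementation is unavailable or unusable; try the next one.
            }
        }
        try {
            return SAXParserFactory.newInstance().newSAXParser();
        } catch (ParserConfigurationException | SAXException e) {
            throw new IllegalStateException("No usable SAX parser found", e);
        }
    }
}
```

An XMLReader can then be obtained from the returned parser with getXMLReader(), so the same fallback order serves both methods.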


getTransformerFactory

private static TransformerFactory getTransformerFactory()
Get a TransformerFactory.


createTransformer

public static Transformer createTransformer()
Create a Saxon transformer.


filterXMLDocument

public static InputStream filterXMLDocument(InputStream inStream,
                                            boolean applyCrimsonWorkaround,
                                            boolean removeDoctypeDecl)
Applies the standard set of filters for an XML document. In our case, this involves removing document type declarations, and working around a bug in the Apache Crimson parser.

Parameters:
inStream - Document stream to filter
applyCrimsonWorkaround - true to apply the workaround for the 8193-byte bug in the Crimson XML parser.
removeDoctypeDecl - true to remove DOCTYPE declarations; false to leave them alone.
Returns:
Filtered input stream

filterXMLDocument

public static InputStream filterXMLDocument(InputStream inStream,
                                            SAXParser saxParser,
                                            boolean removeDoctypeDecl)
Applies the standard set of filters for an XML document. In our case, this involves removing document type declarations, and working around a bug in the Apache Crimson parser.

Parameters:
inStream - Document stream to filter
saxParser - Parser that will be used to parse the document; used to determine whether or not to apply the Crimson parser workaround.
removeDoctypeDecl - true to remove DOCTYPE declarations; false to leave them alone.
Returns:
Filtered input stream
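
To illustrate what removing a DOCTYPE declaration from a stream involves, here is a deliberately simplified sketch. It buffers the whole document in memory and assumes an ASCII-compatible encoding such as UTF-8 or ISO-8859-1; it also does not handle a '>' or ']' inside quoted literals, or the text "<!DOCTYPE" appearing inside a comment. The names DoctypeFilterSketch, stripDoctype, and removeDoctype are hypothetical, and the Crimson workaround is not covered.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical, simplified DOCTYPE remover; buffers the whole document. */
final class DoctypeFilterSketch {
    /** Removes the first DOCTYPE declaration, including any internal subset. */
    static String stripDoctype(String text) {
        int start = text.indexOf("<!DOCTYPE");
        if (start < 0) {
            return text;
        }
        int depth = 0; // nesting level of the [ ... ] internal subset
        for (int i = start; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '[') {
                depth++;
            } else if (c == ']') {
                depth--;
            } else if (c == '>' && depth == 0) {
                // Found the '>' that closes the declaration itself.
                return text.substring(0, start) + text.substring(i + 1);
            }
        }
        return text; // unterminated declaration: leave the input untouched
    }

    /** Stream wrapper around stripDoctype. */
    static InputStream removeDoctype(InputStream in) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        for (int n; (n = in.read(chunk)) != -1; ) {
            buf.write(chunk, 0, n);
        }
        // ISO-8859-1 maps every byte to one char, so the round trip is lossless.
        String text = new String(buf.toByteArray(), StandardCharsets.ISO_8859_1);
        return new ByteArrayInputStream(stripDoctype(text).getBytes(StandardCharsets.ISO_8859_1));
    }
}
```

A production filter would more likely work incrementally on the stream instead of buffering it, which matters for large documents.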

applyPreFilters

public static void applyPreFilters(Templates[] prefilterStylesheets,
                                   XMLReader reader,
                                   InputSource xmlSource,
                                   Result ultimateResult)
                            throws SAXException,
                                   TransformerException,
                                   TransformerConfigurationException
Apply one or more prefilter stylesheets to an XML input source. Pass the filtered data to the specified Result.

Parameters:
prefilterStylesheets - Stylesheets to process
reader - Reader to use for parsing the input XML
xmlSource - Source of XML data
ultimateResult - Where to send the output
Throws:
SAXException
TransformerException
TransformerConfigurationException
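
One common way to apply several stylesheets in sequence with standard JAXP is to chain TransformerHandler instances so that each stylesheet's output feeds the next as SAX events. The sketch below shows that technique; it is not necessarily how applyPreFilters is implemented. The names PrefilterChainSketch and applyChain are hypothetical, and the code assumes the Templates objects were compiled by the same XSLT implementation that TransformerFactory.newInstance() returns, and that the reader is namespace-aware.

```java
import java.io.IOException;
import javax.xml.transform.Result;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXResult;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;

/** Hypothetical chaining of prefilter stylesheets via SAX handlers. */
final class PrefilterChainSketch {
    static void applyChain(Templates[] stylesheets, XMLReader reader,
                           InputSource source, Result finalResult)
            throws SAXException, IOException, TransformerConfigurationException {
        if (stylesheets.length == 0) {
            throw new IllegalArgumentException("At least one stylesheet is required");
        }
        TransformerFactory tf = TransformerFactory.newInstance();
        if (!tf.getFeature(SAXTransformerFactory.FEATURE)) {
            throw new TransformerConfigurationException("SAX transformer handlers not supported");
        }
        SAXTransformerFactory stf = (SAXTransformerFactory) tf;

        // One handler per stylesheet, in application order.
        TransformerHandler[] handlers = new TransformerHandler[stylesheets.length];
        for (int i = 0; i < stylesheets.length; i++) {
            handlers[i] = stf.newTransformerHandler(stylesheets[i]);
        }
        // Each handler sends its output to the next; the last writes the final result.
        for (int i = 0; i < handlers.length - 1; i++) {
            handlers[i].setResult(new SAXResult(handlers[i + 1]));
        }
        handlers[handlers.length - 1].setResult(finalResult);

        // Parsing the source drives the whole chain.
        reader.setContentHandler(handlers[0]);
        reader.parse(source);
    }
}
```

Chaining through SAX events avoids serializing and re-parsing the document between stylesheets.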