org.cdlib.xtf.servletBase
Class TextServlet

Object
  extended by GenericServlet
      extended by HttpServlet
          extended by TextServlet
All Implemented Interfaces:
Serializable, Servlet, ServletConfig
Direct Known Subclasses:
CrossQuery, DynaXML

public abstract class TextServlet
extends HttpServlet

Base class for the crossQuery and dynaXML servlets. Handles first-time initialization, config file loading and some parsing, error handling, and a few utility methods.

See Also:
Serialized Form

Nested Class Summary
private  class TextServlet.RequestWrapper
          Wraps a servlet request, substituting a different parameter set that allows ';' in addition to '&' as a separator.
private  class TextServlet.ResponseWrapper
          Wraps a servlet response, substituting a different output stream. Note: Some deprecated methods are included in HttpServletResponseWrapper.
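
The separator behavior described for TextServlet.RequestWrapper can be illustrated with a minimal sketch. QuerySplitSketch and its parse method are hypothetical names, not part of XTF, and the sketch skips URL decoding; it only shows splitting on ';' as well as '&'.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper (not part of XTF): splits a query string on '&' or ';'. */
class QuerySplitSketch {
    static Map<String, List<String>> parse(String query) {
        Map<String, List<String>> params = new LinkedHashMap<>();
        if (query == null || query.isEmpty()) {
            return params;
        }
        // Either separator ends a name=value pair.
        for (String pair : query.split("[&;]")) {
            if (pair.isEmpty()) {
                continue; // tolerate doubled separators such as "a=1;;b=2"
            }
            int eq = pair.indexOf('=');
            String name = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            // Repeated names accumulate their values in order of appearance.
            params.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
        }
        return params;
    }
}
```

With this sketch, parse("a=1;b=2&a=3") maps a to [1, 3] and b to [2].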
 
Field Summary
private  String baseDir
          Base directory specified in servlet config (if any)
private  long configFileLastModified
          Last modification time of the configuration file, so we can decide when we need to re-initialize the servlet.
private static ThreadLocal curRequest
          Keeps track, per thread, of the HTTP servlet request being processed
private static ThreadLocal curResponse
          Keeps track, per thread, of the HTTP servlet response
private static ThreadLocal curServlet
          Keeps track, per thread, of the servlet performing a request
private  ThreadLocal<String> errorGenSheet
          The error generator stylesheet to use
private static HashMap<String,IndexWarmer> indexWarmers
          Used for warming up indexes in the background
private  boolean isInitted
          Flag to discern whether class has been initialized yet
private static String SAVE_WILD_QMARK
          During tokenization, the '?' wildcard has to be changed to a word to keep it from being removed.
private static String SAVE_WILD_STAR
          During tokenization, the '*' wildcard has to be changed to a word to keep it from being removed.
private  ServletContext staticContext
          Context useful for mapping partial paths to full paths
 StylesheetCache stylesheetCache
          Caches stylesheets (based on their URL)
 
Constructor Summary
TextServlet()
           
 
Method Summary
protected  void addParam(XMLFormatter fmt, String name, String val, Map tokenizerMap)
          Adds the tokenized and un-tokenized version of the attribute to the given formatter.
protected  void addToken(XMLFormatter fmt, String str, boolean isWord)
          Adds a token element to a parameter node.
protected  void addTokens(char inQuote, XMLFormatter fmt, String str)
          Adds one or more token elements to a parameter node.
 void buildParamBlock(AttribList atts, XMLFormatter fmt, Map tokenizerMap, String extra)
          Creates a document containing tokenized and untokenized versions of each parameter.
protected static String calcMimeType(Templates stylesheet)
          Given a stylesheet, determine what the Mime type of the servlet response should be.
static String convertUTF8inURL(String value)
          Although not completely standardized yet, most modern browsers encode Unicode characters above U+007F to UTF8 in the URL.
protected  void cqlTokenize(XMLFormatter fmt, String name, String val)
          Parse 'val' as a CQL query, and add the resulting XCQL to the parameter.
 Result createFilteredReceiver(Transformer trans, HttpServletRequest req, HttpServletResponse res)
          Makes a Saxon Receiver that will transparently add session IDs to URLs if they match the servlet URL, or other patterns configured in the conf file.
 QueryProcessor createQueryProcessor()
          Create a QueryProcessor.
static String decodeURL(String value)
          Certain methods of HttpServletRequest do not decode escaped characters in the URL.
protected  void defaultTokenize(XMLFormatter fmt, String name, String val)
          Break 'val' up into its component tokens and add elements for them.
 void destroy()
          Called by the servlet container to indicate this servlet is being taken out of service.
abstract  void doGet(HttpServletRequest req, HttpServletResponse res)
          Derived classes must supply this method.
 void doPost(HttpServletRequest req, HttpServletResponse resp)
          Derived classes may optionally supply this method.
private  void firstTimeInit(boolean forceInit)
          Ensures that the servlet has been properly initialized.
protected  void genErrorPage(HttpServletRequest req, HttpServletResponse res, Exception exc)
          Generate an error page based on the given exception.
abstract  TextConfig getConfig()
          Derived classes must supply this method.
abstract  String getConfigName()
          Derived classes must supply this method.
static HttpServletRequest getCurRequest()
          Get the HTTP servlet request that is currently being processed by this thread, or null if none is being processed by this thread.
static HttpServletResponse getCurResponse()
          Get the HTTP servlet response that is currently being generated by this thread, or null if no request is being processed.
static TextServlet getCurServlet()
          Get the servlet that is currently executing a request in this thread, or null if no request is being processed by this thread.
 String getRealPath(String partialPath)
          Translate a partial filesystem path to a full path.
static String getRequestURL(HttpServletRequest req)
          Gets the full URL, including query parameters, from an HTTP request.
static String getRequestURL(HttpServletRequest req, boolean raw)
          Gets the full URL, including query parameters, from an HTTP request.
static String getText(EasyNode element)
          Extracts all of the text data from a tree element node.
static boolean isEmpty(String s)
          Utility function - check if string is null or ""
 boolean isSessionTrackingEnabled()
          Tells whether session tracking was enabled in the config file
protected  AttribList makeAttribList(HttpServletRequest req)
          Generate an AttribList from the parameters in a servlet request.
static String makeHtmlString(String s)
          Translates any HTML-special characters (like quote, ampersand, etc.) into the corresponding code (like &quot;)
static String makeHtmlString(String s, boolean keepTags)
          Translates any HTML-special characters (like quote, ampersand, etc.) into the corresponding code (like &quot;)
protected  void rawTokenize(XMLFormatter fmt, String name, String val)
          Interpret 'val' as a raw XML element, and output it.
protected  void readBranding(String path, HttpServletRequest req, Transformer targetTrans)
          Reads a brand profile (a simple stylesheet) and stuffs all the output tags into the specified transformer as parameters.
protected abstract  TextConfig readConfig(String path)
          Derived classes must supply this method.
static void requireOrElse(String value, String descrip)
          Utility function - if the value is null, throws an exception.
protected static String restoreWildcards(String s)
          Restores wildcards saved by saveWildcards(String).
protected static String saveWildcards(String s)
          Converts wildcard characters into word-looking bits that would never occur in real text, so the standard tokenizer will keep them part of words.
protected  void service(HttpServletRequest req, HttpServletResponse res)
          General service method.
 void setErrorGenSheet(String newPath)
          Switch to using a different error generator stylesheet than the default.
protected  void setupTrace(TextConfig config)
          Sets up the trace facility for servlet operation.
static void stuffAttribs(Transformer trans, AttribList list)
          Adds all the attributes in the list to the transformer as parameters that can be used by the stylesheet.
 void stuffAttribs(Transformer trans, HttpServletRequest req)
          Adds all URL attributes from the request into a transformer.
 void stuffSpecialAttribs(HttpServletRequest req, Transformer trans)
          Calculates and adds the "servlet.path" and "root.path" attributes to the given transformer.
 
Methods inherited from class HttpServlet
doDelete, doHead, doOptions, doPut, doTrace, getLastModified, service
 
Methods inherited from class GenericServlet
getInitParameter, getInitParameterNames, getServletConfig, getServletContext, getServletInfo, getServletName, init, init, log, log
 
Methods inherited from class Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

stylesheetCache

public StylesheetCache stylesheetCache
Caches stylesheets (based on their URL)


staticContext

private ServletContext staticContext
Context useful for mapping partial paths to full paths


baseDir

private String baseDir
Base directory specified in servlet config (if any)


isInitted

private boolean isInitted
Flag to discern whether class has been initialized yet


errorGenSheet

private ThreadLocal<String> errorGenSheet
The error generator stylesheet to use


configFileLastModified

private long configFileLastModified
Last modification time of the configuration file, so we can decide when we need to re-initialize the servlet.


curServlet

private static ThreadLocal curServlet
Keeps track, per thread, of the servlet performing a request


curRequest

private static ThreadLocal curRequest
Keeps track, per thread, of the HTTP servlet request being processed


curResponse

private static ThreadLocal curResponse
Keeps track, per thread, of the HTTP servlet response


indexWarmers

private static HashMap<String,IndexWarmer> indexWarmers
Used for warming up indexes in the background


SAVE_WILD_STAR

private static final String SAVE_WILD_STAR
During tokenization, the '*' wildcard has to be changed to a word to keep it from being removed.

See Also:
Constant Field Values

SAVE_WILD_QMARK

private static final String SAVE_WILD_QMARK
During tokenization, the '?' wildcard has to be changed to a word to keep it from being removed.

See Also:
Constant Field Values
Constructor Detail

TextServlet

public TextServlet()
Method Detail

getText

public static String getText(EasyNode element)
Extracts all of the text data from a tree element node.

Parameters:
element - element to get text from
Returns:
Concatenated text from the element.

getRealPath

public String getRealPath(String partialPath)
Translate a partial filesystem path to a full path.

Parameters:
partialPath - A partial (or full) path
Returns:
The full path
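
A minimal sketch of resolving a partial path against a base directory, using only java.nio.file. RealPathSketch and its two-argument method are hypothetical; the real method presumably consults the servlet context and the configured baseDir, which are not reproduced here.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical stand-in: resolves a partial path against a base directory. */
class RealPathSketch {
    static String getRealPath(String baseDir, String partialPath) {
        Path base = Paths.get(baseDir);
        // Path.resolve returns the argument unchanged when it is already
        // absolute, so full paths pass through untouched.
        return base.resolve(partialPath).normalize().toString();
    }
}
```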

isEmpty

public static boolean isEmpty(String s)
Utility function - check if string is null or ""

Parameters:
s - String to check
Returns:
true if the string is null or the empty string ("")

requireOrElse

public static void requireOrElse(String value,
                                 String descrip)
                          throws GeneralException
Utility function - if the value is null, throws an exception.

Parameters:
value - The value to check for null
descrip - If exception is thrown, this will be the message.
Throws:
GeneralException - Only if the value is null
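
The contracts of isEmpty and requireOrElse differ slightly: the first treats null and "" alike, while the second objects only to null. A minimal sketch, assuming an unchecked stand-in exception (MissingValueException is hypothetical; the definition of GeneralException is not shown on this page):

```java
/** Hypothetical stand-ins for the two utility methods documented above. */
class RequireSketch {
    /** Stand-in for XTF's GeneralException, whose definition is not shown here. */
    static class MissingValueException extends RuntimeException {
        MissingValueException(String message) {
            super(message);
        }
    }

    /** True for null or the empty string. */
    static boolean isEmpty(String s) {
        return s == null || s.isEmpty();
    }

    /** Throws only for null; an empty string is accepted. */
    static void requireOrElse(String value, String descrip) {
        if (value == null) {
            throw new MissingValueException(descrip);
        }
    }
}
```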

getCurServlet

public static TextServlet getCurServlet()
Get the servlet that is currently executing a request in this thread, or null if no request is being processed by this thread.


getCurRequest

public static HttpServletRequest getCurRequest()
Get the HTTP servlet request that is currently being processed by this thread, or null if none is being processed by this thread.


getCurResponse

public static HttpServletResponse getCurResponse()
Get the HTTP servlet response that is currently being generated by this thread, or null if no request is being processed.
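
The per-thread tracking behind getCurServlet, getCurRequest and getCurResponse can be sketched with a plain ThreadLocal. This is an illustration of the pattern only: CurrentRequestSketch is hypothetical and uses a String in place of HttpServletRequest so that it runs without the servlet API.

```java
/** Hypothetical illustration of per-thread request tracking. */
class CurrentRequestSketch {
    // One slot per thread; threads never see each other's value.
    private static final ThreadLocal<String> curRequest = new ThreadLocal<>();

    /** Returns the request being handled by this thread, or null if none. */
    static String getCurRequest() {
        return curRequest.get();
    }

    /** Records the request for the duration of the handler, then clears it. */
    static void service(String request, Runnable handler) {
        curRequest.set(request);
        try {
            handler.run();
        } finally {
            // Clearing matters in containers that reuse pooled threads.
            curRequest.remove();
        }
    }
}
```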


destroy

public void destroy()
Called by the servlet container to indicate this servlet is being taken out of service. We clean up all resources we can.

Specified by:
destroy in interface Servlet
Overrides:
destroy in class GenericServlet

firstTimeInit

private final void firstTimeInit(boolean forceInit)
Ensures that the servlet has been properly initialized. If init() hasn't been called yet, or if the config file changes, then this method reads the config file, then calls derivedInit().
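
The reload decision described here can be sketched as a comparison against the stored modification time. ReloadSketch is hypothetical; it reuses the field names isInitted and configFileLastModified from this class, and a counter stands in for the actual config parsing.

```java
import java.io.File;

/** Hypothetical sketch of "initialize once, re-initialize when the config changes". */
class ReloadSketch {
    private boolean isInitted = false;
    private long configFileLastModified = 0L;
    int loadCount = 0; // counts simulated config loads

    synchronized void firstTimeInit(File configFile, boolean forceInit) {
        long modified = configFile.lastModified();
        boolean unchanged = isInitted && modified == configFileLastModified;
        if (unchanged && !forceInit) {
            return; // already initialized with the current file contents
        }
        loadCount++; // placeholder for reading and parsing the config file
        configFileLastModified = modified;
        isInitted = true;
    }
}
```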


setupTrace

protected void setupTrace(TextConfig config)
Sets up the trace facility for servlet operation:
1. Print timestamps with each line
2. Flush output immediately rather than buffering until end of line
3. Output level from config
4. Log a message that we're restarting the servlet


service

protected void service(HttpServletRequest req,
                       HttpServletResponse res)
                throws ServletException,
                       IOException
General service method. We set a watch on each request in case it becomes a "runaway", and institute various filters.

Overrides:
service in class HttpServlet
Throws:
ServletException
IOException

setErrorGenSheet

public void setErrorGenSheet(String newPath)
Switch to using a different error generator stylesheet than the default.


doGet

public abstract void doGet(HttpServletRequest req,
                           HttpServletResponse res)
                    throws IOException
Derived classes must supply this method. It is the main entry point for processing an HTTP request.

Overrides:
doGet in class HttpServlet
Throws:
IOException

doPost

public void doPost(HttpServletRequest req,
                   HttpServletResponse resp)
            throws IOException,
                   ServletException
Derived classes may optionally supply this method. If not supplied, doGet(HttpServletRequest, HttpServletResponse) is called and the parameters are decoded automatically by the HttpServletRequest, assuming they're URL encoded.

Overrides:
doPost in class HttpServlet
Throws:
IOException
ServletException

getConfigName

public abstract String getConfigName()
Derived classes must supply this method. Simply returns the relative path name of the configuration file (e.g. "conf/dynaXml.conf").


readConfig

protected abstract TextConfig readConfig(String path)
Derived classes must supply this method. It reads in the servlet's configuration file, and performs any derived class initialization as necessary.

Parameters:
path - Path to the configuration file
Returns:
Parsed config information

getConfig

public abstract TextConfig getConfig()
Derived classes must supply this method. It simply returns the configuration info that was read previously by readConfig()


isSessionTrackingEnabled

public boolean isSessionTrackingEnabled()
Tells whether session tracking was enabled in the config file


getRequestURL

public static String getRequestURL(HttpServletRequest req)
Gets the full URL, including query parameters, from an HTTP request. This is a bit tricky since different servlet containers return slightly different info.


getRequestURL

public static String getRequestURL(HttpServletRequest req,
                                   boolean raw)
Gets the full URL, including query parameters, from an HTTP request. This is a bit tricky since different servlet containers return slightly different info.

Parameters:
raw - true to suppress un-escaping of % codes and probable utf-8 coding in the URL.

stuffAttribs

public void stuffAttribs(Transformer trans,
                         HttpServletRequest req)
Adds all URL attributes from the request into a transformer.

Parameters:
trans - The transformer to stuff the parameters in
req - The request containing the parameters

stuffAttribs

public static void stuffAttribs(Transformer trans,
                                AttribList list)
Adds all the attributes in the list to the transformer as parameters that can be used by the stylesheet.

Parameters:
trans - The transformer to stuff the parameters in
list - The list containing attributes to stuff
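
Passing attributes to a stylesheet comes down to calling Transformer.setParameter once per attribute. A minimal sketch, assuming a Map<String, String> as a stand-in for AttribList (whose API is not shown on this page); StuffAttribsSketch is a hypothetical name.

```java
import java.util.Map;
import javax.xml.transform.Transformer;

/** Hypothetical sketch: copies name/value pairs into a transformer as parameters. */
class StuffAttribsSketch {
    static void stuffAttribs(Transformer trans, Map<String, String> attribs) {
        for (Map.Entry<String, String> e : attribs.entrySet()) {
            // Each pair becomes visible to the stylesheet as an <xsl:param>.
            trans.setParameter(e.getKey(), e.getValue());
        }
    }
}
```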

stuffSpecialAttribs

public void stuffSpecialAttribs(HttpServletRequest req,
                                Transformer trans)
Calculates and adds the "servlet.path" and "root.path" attributes to the given transformer. Also adds "xtf.home" based on the servlet root directory.


readBranding

protected void readBranding(String path,
                            HttpServletRequest req,
                            Transformer targetTrans)
                     throws Exception
Reads a brand profile (a simple stylesheet) and stuffs all the output tags into the specified transformer as parameters.

Parameters:
path - Filesystem path to the brand profile
req - HTTP servlet request containing URL parameters
targetTrans - Where to stuff the attributes into
Throws:
Exception - If an error occurs loading or parsing the profile.

createFilteredReceiver

public Result createFilteredReceiver(Transformer trans,
                                     HttpServletRequest req,
                                     HttpServletResponse res)
                              throws XPathException,
                                     IOException
Makes a Saxon Receiver that will transparently add session IDs to URLs if they match the servlet URL, or other patterns configured in the conf file.

Parameters:
trans - The transformer that will do the work
req - The servlet request being processed
res - The servlet response to output to
Returns:
A Receiver suitable for the target of the transform
Throws:
XPathException
IOException

makeHtmlString

public static String makeHtmlString(String s)
Translates any HTML-special characters (like quote, ampersand, etc.) into the corresponding code (like &quot;)

Parameters:
s - The string to transform

makeHtmlString

public static String makeHtmlString(String s,
                                    boolean keepTags)
Translates any HTML-special characters (like quote, ampersand, etc.) into the corresponding code (like &quot;)

Parameters:
s - The string to transform
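
A minimal sketch of the single-argument translation. The exact set of characters the real method handles is not listed on this page, so the four cases below are an assumption; HtmlEscapeSketch is a hypothetical name, and the keepTags variant is not sketched because its behavior is undocumented here.

```java
/** Hypothetical sketch of HTML-special character translation. */
class HtmlEscapeSketch {
    static String makeHtmlString(String s) {
        StringBuilder out = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                default: out.append(c); // everything else passes through
            }
        }
        return out.toString();
    }
}
```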

createQueryProcessor

public QueryProcessor createQueryProcessor()
Create a QueryProcessor. Checks the system property "org.cdlib.xtf.QueryProcessorClass" to see if there is a user-supplied implementation. If not, a DefaultQueryProcessor is created.
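
The property-driven lookup can be sketched with reflection. Everything except the property name is a stand-in: Processor, DefaultProcessor and CustomProcessor are hypothetical types used so that the sketch compiles without XTF, and the error handling is an assumption.

```java
/** Hypothetical sketch of choosing an implementation through a system property. */
class QueryProcessorFactorySketch {
    static final String PROPERTY = "org.cdlib.xtf.QueryProcessorClass";

    /** Stand-in for QueryProcessor. */
    interface Processor {
        String name();
    }

    /** Stand-in for DefaultQueryProcessor. */
    static class DefaultProcessor implements Processor {
        public String name() { return "default"; }
    }

    /** Example of a user-supplied implementation. */
    static class CustomProcessor implements Processor {
        public String name() { return "custom"; }
    }

    static Processor create() {
        String className = System.getProperty(PROPERTY);
        if (className == null || className.isEmpty()) {
            return new DefaultProcessor(); // no override configured
        }
        try {
            // The named class needs an accessible no-argument constructor.
            return (Processor) Class.forName(className)
                                    .getDeclaredConstructor()
                                    .newInstance();
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalStateException("Cannot create " + className, e);
        }
    }
}
```

In a deployment the property would typically be set on the JVM command line with -D, so that the choice is made without recompiling the servlet.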


decodeURL

public static String decodeURL(String value)
Certain methods of HttpServletRequest do not decode escaped characters in the URL. This method decodes them, and also translates UTF-8 byte sequences into normal characters.


convertUTF8inURL

public static String convertUTF8inURL(String value)
Although not completely standardized yet, most modern browsers encode Unicode characters above U+007F to UTF8 in the URL. This method looks for probable UTF8 encodings and converts them back to normal Unicode characters. One might ask why this is necessary... doesn't URLDecoder handle it? Well, some servlet containers seem to partially decode URLs; they decode the escapes, but then they don't do the UTF-8 conversion.

Parameters:
value - value to convert
Returns:
equivalent value with UTF8 decoded to Unicode
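
One way to detect "probable" UTF-8 is to treat each character as a byte and attempt a strict decode. The sketch below takes that whole-string approach, which is an assumption and may differ from the real method (which could work sequence by sequence); Utf8InUrlSketch is a hypothetical name.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch: re-interprets byte-valued chars as UTF-8 when they form valid sequences. */
class Utf8InUrlSketch {
    static String convertUTF8inURL(String value) {
        boolean sawHighByte = false;
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (c > 0xFF) {
                return value; // already contains real Unicode; leave alone
            }
            if (c >= 0x80) {
                sawHighByte = true;
            }
        }
        if (!sawHighByte) {
            return value; // pure ASCII needs no conversion
        }
        // Each char is in 0..255, so ISO-8859-1 maps it back to the raw byte.
        byte[] bytes = value.getBytes(StandardCharsets.ISO_8859_1);
        CharsetDecoder strict = StandardCharsets.UTF_8.newDecoder()
            .onMalformedInput(CodingErrorAction.REPORT)
            .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            return strict.decode(ByteBuffer.wrap(bytes)).toString();
        } catch (CharacterCodingException e) {
            return value; // not valid UTF-8, so assume it was Latin-1 text
        }
    }
}
```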

buildParamBlock

public void buildParamBlock(AttribList atts,
                            XMLFormatter fmt,
                            Map tokenizerMap,
                            String extra)
Creates a document containing tokenized and untokenized versions of each parameter.


addParam

protected void addParam(XMLFormatter fmt,
                        String name,
                        String val,
                        Map tokenizerMap)
Adds the tokenized and un-tokenized version of the attribute to the given formatter.

Parameters:
fmt - formatter to add to
name - Name of the URL parameter
val - String value of the URL parameter
tokenizerMap - tells which parameters to tokenize, and how

rawTokenize

protected void rawTokenize(XMLFormatter fmt,
                           String name,
                           String val)
Interpret 'val' as a raw XML element, and output it.

Parameters:
fmt - formatter to add to
name - Name of the URL parameter
val - value to tokenize
Throws:
TransformerException

defaultTokenize

protected void defaultTokenize(XMLFormatter fmt,
                               String name,
                               String val)
Break 'val' up into its component tokens and add elements for them.

Parameters:
fmt - formatter to add to
name - Name of the URL parameter
val - value to tokenize

cqlTokenize

protected void cqlTokenize(XMLFormatter fmt,
                           String name,
                           String val)
Parse 'val' as a CQL query, and add the resulting XCQL to the parameter.

Parameters:
fmt - formatter to add to
name - Name of the URL parameter
val - value to tokenize

addTokens

protected void addTokens(char inQuote,
                         XMLFormatter fmt,
                         String str)
Adds one or more token elements to a parameter node. Also handles phrase nodes.

Parameters:
inQuote - Non-zero means this is a quoted phrase, in which case the element will be 'phrase' instead of 'token', and it will be given sub-token elements.
fmt - formatter to add to
str - The token value

addToken

protected void addToken(XMLFormatter fmt,
                        String str,
                        boolean isWord)
Adds a token element to a parameter node.

Parameters:
fmt - formatter to add to
str - The token value
isWord - true if token is a real word, false if only punctuation
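
The word versus punctuation distinction carried by isWord can be sketched with a small splitter. TokenSketch and its Token type are hypothetical, the letter-or-digit rule is an assumption rather than the real tokenizer's rule, and no XML is emitted; a quoted phrase would wrap the same tokens in a 'phrase' element.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: splits text into word tokens and punctuation tokens. */
class TokenSketch {
    /** One token plus the isWord flag described for addToken. */
    static final class Token {
        final String text;
        final boolean isWord;

        Token(String text, boolean isWord) {
            this.text = text;
            this.isWord = isWord;
        }
    }

    static List<Token> split(String str) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < str.length()) {
            char c = str.charAt(i);
            if (Character.isWhitespace(c)) {
                i++; // whitespace only separates tokens
                continue;
            }
            int start = i;
            boolean isWord = Character.isLetterOrDigit(c);
            if (isWord) {
                // A word is a maximal run of letters and digits.
                while (i < str.length() && Character.isLetterOrDigit(str.charAt(i))) {
                    i++;
                }
            } else {
                i++; // each punctuation mark is its own token
            }
            tokens.add(new Token(str.substring(start, i), isWord));
        }
        return tokens;
    }
}
```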

saveWildcards

protected static String saveWildcards(String s)
Converts wildcard characters into word-looking bits that would never occur in real text, so the standard tokenizer will keep them part of words. Resurrect using restoreWildcards(String).


restoreWildcards

protected static String restoreWildcards(String s)
Restores wildcards saved by saveWildcards(String).
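
The save and restore round trip can be sketched with plain string replacement. The placeholder values below are hypothetical (the real values of SAVE_WILD_STAR and SAVE_WILD_QMARK are not shown on this page); any letter-only strings that never occur in real text would serve.

```java
/** Hypothetical sketch of protecting wildcards from a tokenizer. */
class WildcardSketch {
    // Assumed placeholder values, chosen only for illustration.
    private static final String SAVE_WILD_STAR = "jwxbkn";
    private static final String SAVE_WILD_QMARK = "vkyqxw";

    /** Replaces '*' and '?' with word-like placeholders. */
    static String saveWildcards(String s) {
        return s.replace("*", SAVE_WILD_STAR).replace("?", SAVE_WILD_QMARK);
    }

    /** Reverses saveWildcards. */
    static String restoreWildcards(String s) {
        return s.replace(SAVE_WILD_STAR, "*").replace(SAVE_WILD_QMARK, "?");
    }
}
```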


calcMimeType

protected static String calcMimeType(Templates stylesheet)
Given a stylesheet, determine what the Mime type of the servlet response should be.
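
A compiled stylesheet exposes its xsl:output settings through Templates.getOutputProperties(), so one plausible approach is to read the media-type property and fall back on the output method. The sketch below works on the returned Properties directly; MimeTypeSketch and the fallback mapping are assumptions, not the confirmed behavior of calcMimeType.

```java
import java.util.Properties;
import javax.xml.transform.OutputKeys;

/** Hypothetical sketch: derives a MIME type from stylesheet output properties. */
class MimeTypeSketch {
    static String calcMimeType(Properties outputProps) {
        // An explicit media-type on <xsl:output> wins.
        String mediaType = outputProps.getProperty(OutputKeys.MEDIA_TYPE);
        if (mediaType != null && !mediaType.isEmpty()) {
            return mediaType;
        }
        // Otherwise guess from the output method (assumed mapping).
        String method = outputProps.getProperty(OutputKeys.METHOD, "xml");
        switch (method) {
            case "html": return "text/html";
            case "text": return "text/plain";
            default: return "text/xml";
        }
    }
}
```

A caller holding a Templates object would pass stylesheet.getOutputProperties() to this method.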


genErrorPage

protected void genErrorPage(HttpServletRequest req,
                            HttpServletResponse res,
                            Exception exc)
Generate an error page based on the given exception. Utilizes the system error stylesheet to produce a nicely formatted HTML page.

Parameters:
req - The HTTP request we're responding to
res - The HTTP result to write to
exc - The exception producing the error. If it's a DynaXMLException, the attributes will be passed to the error stylesheet.

makeAttribList

protected AttribList makeAttribList(HttpServletRequest req)
Generate an AttribList from the parameters in a servlet request. Deals with some URL encoding issues introduced by many browsers.

Parameters:
req - Request to scan for attributes
Returns:
An AttribList containing the parameter names and values