org.cdlib.xtf.lazyTree
Class LazyDocument

Object
  extended by NodeImpl
      extended by ParentNodeImpl
          extended by LazyDocument
All Implemented Interfaces:
Source, SourceLocator, DocumentInfo, FingerprintedNode, Item, NodeInfo, ValueRepresentation, PersistentTree
Direct Known Subclasses:
SearchTree

public class LazyDocument
extends ParentNodeImpl
implements DocumentInfo, PersistentTree

LazyDocument accesses the binary persistent disk file created by LazyTreeBuilder, loading nodes on demand rather than holding all of them in RAM.

This class should never be instantiated directly, but rather loaded by LazyTreeBuilder.

Once loaded, a soft reference to the node is kept in RAM; if memory runs low, these soft references will be thrown away. This behavior can be defeated by calling setAllPermanent(boolean).
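
The caching behavior described above can be pictured with a minimal, self-contained sketch. It is not the XTF implementation: SoftNodeCache, its loader function, and getLoadCount() are hypothetical names introduced only for illustration, and the loader stands in for reading a node from the disk file.

```java
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntFunction;

// Hypothetical stand-in for a lazy node cache; not the XTF implementation.
class SoftNodeCache<N> {
    private final Map<Integer, SoftReference<N>> softRefs = new HashMap<>();
    private final Map<Integer, N> pinned = new HashMap<>();
    private final IntFunction<N> loader; // assumption: stands in for a disk read
    private boolean allPermanent = false;
    private int loadCount = 0;

    SoftNodeCache(IntFunction<N> loader) { this.loader = loader; }

    // When true, every node returned from now on is also held strongly.
    void setAllPermanent(boolean flag) { allPermanent = flag; }

    int getLoadCount() { return loadCount; }

    N getNode(int num) {
        // Strongly held nodes win; otherwise try the soft reference.
        N node = pinned.get(num);
        if (node == null) {
            SoftReference<N> ref = softRefs.get(num);
            if (ref != null) node = ref.get(); // null if the collector cleared it
        }
        if (node == null) {
            // Cache miss: load the node and remember it softly.
            node = loader.apply(num);
            loadCount++;
            softRefs.put(num, new SoftReference<>(node));
        }
        if (allPermanent) pinned.put(num, node);
        return node;
    }

    public static void main(String[] args) {
        SoftNodeCache<String> cache = new SoftNodeCache<>(n -> "node-" + n);
        cache.setAllPermanent(true);
        String first = cache.getNode(7);
        String second = cache.getNode(7);
        System.out.println(first == second);      // same cached instance
        System.out.println(cache.getLoadCount()); // loaded only once
    }
}
```

With the permanent flag off, a node may be reloaded after the garbage collector clears its soft reference, which trades extra disk reads for a bounded memory footprint.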

Author:
Martin Haye

Field Summary
(package private)  boolean allPermanent
          True if nodes in the cache should be permanent, false for soft references
protected  PackedByteBuf attrBuf
          Buffer for unpacking attributes
protected  byte[] attrBytes
          Byte buffer for reading attributes
protected  SubStoreReader attrFile
          Contains all the attributes
protected  Configuration config
          Saxon configuration info
protected  boolean debug
          Flag denoting whether to print debug output when key indexes are created
protected  int documentNumber
          Unique number assigned to each document
private  int killCheckCounter
          Counter to govern periodic checking for thread time limit
protected  StructuredStore mainStore
          The structured file that contains all our subfiles
protected  int maxAttrSize
          The max size of any attribute block
protected  int maxNodeSize
          The size of the largest node entry on disk
(package private)  int[] nameNumToCode
          Maps name numbers in the file to namecodes in the current NamePool
protected  NamePool namePool
          Name pool used to look up namecodes
 int[] namespaceCode
          namespaceCode is the namespace code used by the name pool: the top half is the prefix code, the bottom half the URI code
 int[] namespaceParent
          namespaceParent is the index of the element node owning the namespace declaration
protected static int NODE_FILE_HEADER_SIZE
          Size of the header on the node file
protected  PackedByteBuf nodeBuf
          Buffer for unpacking nodes
protected  byte[] nodeBytes
          Byte buffer for reading nodes
(package private)  HashMap nodeCache
          Caches nodes in memory so they only have to be loaded once.
protected  SubStoreReader nodeFile
          Contains all the nodes
 int numberOfNamespaces
          Number of namespaces currently declared
protected  int numberOfNodes
          How many nodes, excluding attributes and namespaces.
private  ProfilingListener profileListener
          Notified of profile-related events
protected  int rootNodeNum
          This structure supports trees whose root is an element node rather than a document node.
 SystemIdMap systemIdMap
          Maps system IDs to nodes in the tree
protected  SubStoreReader textFile
          Contains all the text, processing instructions, and comments
protected  boolean usesNamespaces
          Determines whether this document is using namespaces.
 
Fields inherited from class ParentNodeImpl
childNum
 
Fields inherited from class NodeImpl
document, nameCode, nextSibNum, NODE_LETTER, nodeNum, parentNum, prevSibNum
 
Fields inherited from interface NodeInfo
ALL_NAMESPACES, EMPTY_NAMESPACE_LIST, IS_DTD_TYPE, IS_NILLED, LOCAL_NAMESPACES, NO_NAMESPACES
 
Fields inherited from interface ValueRepresentation
EMPTY_VALUE_ARRAY
 
Constructor Summary
LazyDocument(Configuration config)
          Construct a new (empty) document.
 
Method Summary
protected  NodeImpl checkCache(int num)
          Checks to see if we've already loaded the node corresponding with the given number.
 void close()
          Closes all disk files opened by the document.
 void copy(Receiver out, int whichNamespaces, boolean copyAnnotations, int locationId)
          Copy this node to a given outputter
protected  NodeImpl createElementNode()
          Create an element node.
protected  NodeImpl createTextNode()
          Create a text node.
 String generateId()
          Get a character string that uniquely identifies this node
 void generateId(FastStringBuffer buffer)
          Get a character string that uniquely identifies this node
protected  AxisIterator getAllElements(int fingerprint)
          Get a list of all elements with a given name.
 String getBaseURI()
          Get the base URI of this root node.
 Configuration getConfiguration()
          Get the configuration previously set using setConfiguration
 boolean getDebug()
          Find out whether debug lines are printed during key index creation
 int getDocumentNumber()
          Get the unique document number
 DocumentInfo getDocumentRoot()
          Get the root (document) node
 DiskHashReader getIndex(String indexName)
          Access a disk-based xsl:key index stored by putIndex().
 int getItemType()
          Return the type of node.
 int getLineNumber()
          Get the line number of this root node.
protected  int getLineNumber(int sequence)
          Get the line number for an element.
 NamePool getNamePool()
          Get the name pool used for the names in this document
 NodeInfo getNextSibling()
          Get next sibling - always null
 NodeImpl getNode(int num)
          Get a node by its node number, loading it from disk if necessary.
 int getNodeKind()
          Get the type of node this document is, i.e. a document node.
 NodeInfo getPreviousSibling()
          Get previous sibling - always null
 NodeInfo getRoot()
          Get the root node
 long getSequenceNumber()
          Get the node sequence number (in document order).
 String getSystemId()
          Get the system id of this root node
protected  String getSystemId(int seq)
          Get the system id of an element in the document
protected  int getTypeAnnotation(int nodeNum)
          Get the type annotation of a node
 String[] getUnparsedEntity(String name)
          Get the unparsed entity with a given name, or null if there is none.
 void init(NamePool pool, StructuredStore store)
          Open a lazy tree and read in the root node.
protected  boolean isUsingNamespaces()
          Determine whether this document uses namespaces
 void printProfile()
          Print out the profile (if one was collected)
 void putIndex(String indexName, Map index)
          Writes a disk-based version of an xsl:key index.
private  void readNames(SubStoreReader in)
          Fetches the name list from a sub-file in the persistent disk file.
 NodeInfo selectID(String id)
          Get the element with a given ID.
 void setAllPermanent(boolean flag)
          If 'flag' is true, all loaded nodes will be cached until the tree goes away, instead of being held by soft references.
 void setDebug(boolean flag)
          Establish whether to print out debugging statements when key indexes are created.
protected  void setElementAnnotation(int nodeNum, int typeCode)
          Set the type annotation of an element node
protected  void setLineNumber(int sequence, int line)
          Set the line number for an element.
 void setLineNumbering()
          Set line numbering on
 void setRootNode(NodeInfo root)
          Set the root node.
protected  void setSystemId(int seq, String uri)
          Set the system id of an element in the document
 void setSystemId(String uri)
          Set the system id of this node
 
Methods inherited from class ParentNodeImpl
enumerateChildren, getFirstChild, getLastChild, getStringValue, getStringValueCS, hasChildNodes, iterateAxis, iterateAxis
 
Methods inherited from class NodeImpl
atomize, compareOrder, equals, getAttributeValue, getColumnNumber, getDeclaredNamespaces, getDisplayName, getFingerprint, getLocalPart, getNameCode, getNextInDocument, getParent, getPrefix, getPreviousInDocument, getPublicId, getTypeAnnotation, getTypedValue, getURI, hashCode, init, isSameNodeInfo, sendNamespaceDeclarations
 
Methods inherited from class Object
clone, finalize, getClass, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface NodeInfo
atomize, compareOrder, equals, getAttributeValue, getDeclaredNamespaces, getDisplayName, getFingerprint, getLocalPart, getNameCode, getParent, getPrefix, getStringValue, getTypeAnnotation, getURI, hasChildNodes, hashCode, isSameNodeInfo, iterateAxis, iterateAxis, sendNamespaceDeclarations
 
Methods inherited from interface Item
getStringValueCS, getTypedValue
 

Field Detail

config

protected Configuration config
Saxon configuration info


namePool

protected NamePool namePool
Name pool used to look up namecodes


documentNumber

protected int documentNumber
Unique number assigned to each document


usesNamespaces

protected boolean usesNamespaces
Determines whether this document is using namespaces. Not sure why this works when false, but it does.


rootNodeNum

protected int rootNodeNum
This structure supports trees whose root is an element node rather than a document node. The document node still exists, for implementation reasons, but it is not regarded as part of the tree. The field rootNodeNum identifies the actual root of the tree, which is the document node by default.


debug

protected boolean debug
Flag denoting whether to print debug output when key indexes are created


mainStore

protected StructuredStore mainStore
The structured file that contains all our subfiles


textFile

protected SubStoreReader textFile
Contains all the text, processing instructions, and comments


nodeFile

protected SubStoreReader nodeFile
Contains all the nodes


numberOfNodes

protected int numberOfNodes
How many nodes, excluding attributes and namespaces.


NODE_FILE_HEADER_SIZE

protected static final int NODE_FILE_HEADER_SIZE
Size of the header on the node file

See Also:
Constant Field Values

maxNodeSize

protected int maxNodeSize
The size of the largest node entry on disk


nodeBytes

protected byte[] nodeBytes
Byte buffer for reading nodes


nodeBuf

protected PackedByteBuf nodeBuf
Buffer for unpacking nodes


attrFile

protected SubStoreReader attrFile
Contains all the attributes


maxAttrSize

protected int maxAttrSize
The max size of any attribute block


attrBytes

protected byte[] attrBytes
Byte buffer for reading attributes


attrBuf

protected PackedByteBuf attrBuf
Buffer for unpacking attributes


numberOfNamespaces

public int numberOfNamespaces
Number of namespaces currently declared


namespaceParent

public int[] namespaceParent
namespaceParent is the index of the element node owning the namespace declaration


namespaceCode

public int[] namespaceCode
namespaceCode is the namespace code used by the name pool: the top half is the prefix code, the bottom half the URI code
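
As an illustration of the packing described above, the following sketch splits a namespace code into its two halves. It assumes that "half" means 16 bits of a 32-bit Java int; the helper names pack, prefixCode, and uriCode are hypothetical and are not part of this class.

```java
// Sketch of the packed layout, assuming 16 bits per half of a 32-bit int.
class NamespaceCodeDemo {
    // Combine a prefix code (top half) and a URI code (bottom half).
    static int pack(int prefixCode, int uriCode) {
        return (prefixCode << 16) | (uriCode & 0xffff);
    }

    // Extract the top 16 bits.
    static int prefixCode(int nsCode) {
        return (nsCode >> 16) & 0xffff;
    }

    // Extract the bottom 16 bits.
    static int uriCode(int nsCode) {
        return nsCode & 0xffff;
    }

    public static void main(String[] args) {
        int code = pack(5, 12);
        System.out.println(prefixCode(code) + " " + uriCode(code)); // prints "5 12"
    }
}
```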


systemIdMap

public SystemIdMap systemIdMap
Maps system IDs to nodes in the tree


nameNumToCode

int[] nameNumToCode
Maps name numbers in the file to namecodes in the current NamePool


nodeCache

HashMap nodeCache
Caches nodes in memory so they only have to be loaded once.


allPermanent

boolean allPermanent
True if nodes in the cache should be permanent, false for soft references


profileListener

private ProfilingListener profileListener
Notified of profile-related events


killCheckCounter

private int killCheckCounter
Counter to govern periodic checking for thread time limit

Constructor Detail

LazyDocument

public LazyDocument(Configuration config)
Construct a new (empty) document. Should call init(NamePool, StructuredStore) afterward.

Method Detail

init

public void init(NamePool pool,
                 StructuredStore store)
          throws IOException
Open a lazy tree and read in the root node.

Parameters:
pool - The name pool to map namecodes with
store - The file to open
Throws:
IOException

setAllPermanent

public void setAllPermanent(boolean flag)
If 'flag' is true, all loaded nodes will be cached until the tree goes away, instead of being held by soft references.

Specified by:
setAllPermanent in interface PersistentTree
Parameters:
flag - True to hold nodes for the life of the tree, false to hold only soft references to them.

setDebug

public void setDebug(boolean flag)
Establish whether to print out debugging statements when key indexes are created.


getDebug

public boolean getDebug()
Find out whether debug lines are printed during key index creation


printProfile

public void printProfile()
                  throws IOException
Print out the profile (if one was collected)

Specified by:
printProfile in interface PersistentTree
Throws:
IOException

close

public void close()
Closes all disk files opened by the document. While this will theoretically be done when the LazyDocument is garbage collected, it's a good idea to conserve file handles by closing them as soon as the tree is no longer needed.

Specified by:
close in interface PersistentTree
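
One common way to make sure close() runs promptly is a try/finally block. The sketch below uses a hypothetical FakeTree stand-in rather than a real LazyDocument, so that it is self-contained; the read method and closed flag are assumptions made only for the example.

```java
// Sketch of prompt cleanup with try/finally, using a hypothetical stand-in tree.
class CloseDemo {
    // Stand-in for a tree that holds open file handles; not an XTF class.
    static class FakeTree {
        boolean closed = false;

        String read(int num) {
            if (num < 0) throw new IllegalArgumentException("bad node number");
            return "node-" + num;
        }

        void close() { closed = true; }
    }

    // Reads one node, then releases the tree whether or not the read failed.
    static String readAndClose(FakeTree tree, int num) {
        try {
            return tree.read(num);
        } finally {
            tree.close();
        }
    }

    public static void main(String[] args) {
        FakeTree tree = new FakeTree();
        System.out.println(readAndClose(tree, 4));
        System.out.println(tree.closed);
    }
}
```

The finally clause runs even when the read throws, so file handles are released without waiting for garbage collection.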

readNames

private void readNames(SubStoreReader in)
                throws IOException
Fetches the name list from a sub-file in the persistent disk file.

Parameters:
in - The subfile to load from
Throws:
IOException

putIndex

public void putIndex(String indexName,
                     Map index)
              throws IOException
Writes a disk-based version of an xsl:key index. Use getIndex() later to read it back.

Parameters:
indexName - Uniquely computed name
index - HashMap mapping String -> ArrayList[NodeImpl]
Throws:
IOException

getIndex

public DiskHashReader getIndex(String indexName)
Access a disk-based xsl:key index stored by putIndex(). Note that the entire index isn't loaded, just the header. Individual entries will be loaded as needed by the DiskHashReader.

Parameters:
indexName - Name of the index to load
Returns:
Reader to access the index with.

getConfiguration

public Configuration getConfiguration()
Get the configuration previously set using setConfiguration

Specified by:
getConfiguration in interface NodeInfo
Overrides:
getConfiguration in class NodeImpl

getNamePool

public NamePool getNamePool()
Get the name pool used for the names in this document

Specified by:
getNamePool in interface NodeInfo
Overrides:
getNamePool in class NodeImpl

getDocumentNumber

public int getDocumentNumber()
Get the unique document number

Specified by:
getDocumentNumber in interface NodeInfo
Overrides:
getDocumentNumber in class NodeImpl

setRootNode

public void setRootNode(NodeInfo root)
Set the root node. Parentless elements are implemented using a full tree structure containing a document node, but the document node is not regarded as part of the tree


setElementAnnotation

protected void setElementAnnotation(int nodeNum,
                                    int typeCode)
Set the type annotation of an element node


getTypeAnnotation

protected int getTypeAnnotation(int nodeNum)
Get the type annotation of a node. -1 if there is no type annotation


getNodeKind

public int getNodeKind()
Get the type of node this document is, i.e. a document node.

Specified by:
getNodeKind in interface NodeInfo

getNode

public NodeImpl getNode(int num)
Get a node by its node number, loading it from disk if necessary.

Parameters:
num - The number to get
Returns:
A node, or null if the number is invalid.

checkCache

protected NodeImpl checkCache(int num)
Checks to see if we've already loaded the node corresponding with the given number. If so, return it, else null.


createElementNode

protected NodeImpl createElementNode()
Create an element node. Derived classes can override this to provide their own element implementation.


createTextNode

protected NodeImpl createTextNode()
Create a text node. Derived classes can override this to provide their own text implementation.


getSequenceNumber

public long getSequenceNumber()
Get the node sequence number (in document order). Sequence numbers are monotonic but not consecutive.

Overrides:
getSequenceNumber in class NodeImpl
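
Because sequence numbers are monotonic but not consecutive, they can be compared to decide document order, but the gap between two numbers says nothing about how many nodes lie between them. A minimal sketch, with a hypothetical compareOrder helper and made-up sequence values:

```java
import java.util.Arrays;

// Sketch: ordering by monotonic, non-consecutive sequence numbers.
class SequenceOrderDemo {
    // Negative if a precedes b in document order, zero if equal, positive otherwise.
    static int compareOrder(long seqA, long seqB) {
        return Long.compare(seqA, seqB);
    }

    public static void main(String[] args) {
        long[] seqs = {40L, 7L, 19L}; // made-up values with gaps between them
        Arrays.sort(seqs);
        System.out.println(Arrays.toString(seqs)); // prints "[7, 19, 40]"
        System.out.println(compareOrder(7L, 19L) < 0); // prints "true"
    }
}
```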

getNextSibling

public final NodeInfo getNextSibling()
Get next sibling - always null

Overrides:
getNextSibling in class NodeImpl
Returns:
null

getPreviousSibling

public final NodeInfo getPreviousSibling()
Get previous sibling - always null

Overrides:
getPreviousSibling in class NodeImpl
Returns:
null

generateId

public void generateId(FastStringBuffer buffer)
Get a character string that uniquely identifies this node

Specified by:
generateId in interface NodeInfo
Overrides:
generateId in class NodeImpl
Parameters:
buffer - a buffer into which will be placed a string based on the document number

isUsingNamespaces

protected boolean isUsingNamespaces()
Determine whether this document uses namespaces


setSystemId

public void setSystemId(String uri)
Set the system id of this node

Specified by:
setSystemId in interface Source
Overrides:
setSystemId in class NodeImpl

getSystemId

public String getSystemId()
Get the system id of this root node

Specified by:
getSystemId in interface Source
Specified by:
getSystemId in interface SourceLocator
Specified by:
getSystemId in interface NodeInfo
Overrides:
getSystemId in class NodeImpl

getBaseURI

public String getBaseURI()
Get the base URI of this root node. For a root node the base URI is the same as the System ID.

Specified by:
getBaseURI in interface NodeInfo
Overrides:
getBaseURI in class NodeImpl

setSystemId

protected void setSystemId(int seq,
                           String uri)
Set the system id of an element in the document


getSystemId

protected String getSystemId(int seq)
Get the system id of an element in the document


setLineNumbering

public void setLineNumbering()
Set line numbering on


setLineNumber

protected void setLineNumber(int sequence,
                             int line)
Set the line number for an element. Ignored if line numbering is off.


getLineNumber

protected int getLineNumber(int sequence)
Get the line number for an element. Return -1 if line numbering is off.


getLineNumber

public int getLineNumber()
Get the line number of this root node.

Specified by:
getLineNumber in interface SourceLocator
Specified by:
getLineNumber in interface NodeInfo
Overrides:
getLineNumber in class NodeImpl
Returns:
0 always

getItemType

public final int getItemType()
Return the type of node.

Returns:
Type.DOCUMENT (always)

getRoot

public NodeInfo getRoot()
Get the root node

Specified by:
getRoot in interface NodeInfo
Overrides:
getRoot in class NodeImpl
Returns:
the NodeInfo that is the root of the tree - not necessarily a document node

getDocumentRoot

public DocumentInfo getDocumentRoot()
Get the root (document) node

Specified by:
getDocumentRoot in interface NodeInfo
Overrides:
getDocumentRoot in class NodeImpl
Returns:
the DocumentInfo representing the document node, or null if the root of the tree is not a document node

generateId

public String generateId()
Get a character string that uniquely identifies this node

Returns:
an identifier based on the document number

getAllElements

protected AxisIterator getAllElements(int fingerprint)
Get a list of all elements with a given name. This is implemented as a memo function: the first time it is called for a particular element type, it remembers the result for next time.
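
The memo-function idea can be sketched as follows. This is an illustration under assumptions, not the actual implementation: the document is modelled as a flat array of element fingerprints, and the class name, the scans counter, and the List return type are hypothetical (the real method returns an AxisIterator).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of a memoized element lookup over a hypothetical flat document.
class ElementMemoDemo {
    private final int[] fingerprints; // assumption: one fingerprint per element
    private final Map<Integer, List<Integer>> memo = new HashMap<>();
    int scans = 0; // counts full scans, to show the memo at work

    ElementMemoDemo(int[] fingerprints) { this.fingerprints = fingerprints; }

    // Returns the positions of all elements with the given fingerprint.
    List<Integer> getAllElements(int fingerprint) {
        List<Integer> hit = memo.get(fingerprint);
        if (hit != null) return hit; // remembered from an earlier call
        scans++;
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < fingerprints.length; i++) {
            if (fingerprints[i] == fingerprint) result.add(i);
        }
        memo.put(fingerprint, result);
        return result;
    }

    public static void main(String[] args) {
        ElementMemoDemo doc = new ElementMemoDemo(new int[] {10, 20, 10, 30});
        System.out.println(doc.getAllElements(10)); // prints "[0, 2]"
        System.out.println(doc.getAllElements(10)); // served from the memo
        System.out.println(doc.scans);              // prints "1"
    }
}
```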


selectID

public NodeInfo selectID(String id)
Get the element with a given ID.

Specified by:
selectID in interface DocumentInfo
Parameters:
id - The unique ID of the required element, previously registered using registerID()
Returns:
The NodeInfo (always an Element) for the given ID if one has been registered, otherwise null.

getUnparsedEntity

public String[] getUnparsedEntity(String name)
Get the unparsed entity with a given name, or null if there is none.

Specified by:
getUnparsedEntity in interface DocumentInfo
Parameters:
name - the name of the entity
Returns:
if the entity exists, return an array of two Strings, the first holding the system ID of the entity, the second holding the public ID; otherwise null

copy

public void copy(Receiver out,
                 int whichNamespaces,
                 boolean copyAnnotations,
                 int locationId)
          throws XPathException
Copy this node to a given outputter

Specified by:
copy in interface NodeInfo
Throws:
XPathException