org.cdlib.xtf.textIndexer
Class SectionInfoStack

Object
  extended by SectionInfoStack

public class SectionInfoStack
extends Object

This class maintains information about the current nesting of sections in a text document that the TextIndexer program is processing.

On-line documents are stored as "nodes" in XML files that contain information about the document and the document text itself. The nodes usually form a hierarchical tree structure, with the outermost nodes recording various bits of information about the text within. Inside the outer nodes are additional nodes that record the organization of the text itself, including things like section, chapter, and paragraph information. To the text indexer program and search engine, sections have special significance. Text in two adjacent sections that have different names is not considered to be "near", so proximity searches will not produce results that span two or more sections.

Since sections can be nested inside one another, the text indexer needs to maintain a stack of the current nesting level while a document is being processed. Doing so does two things:

- It allows unnamed inner sections to inherit properties from the parent sections that contain them.
- When the end of a named section has been reached, the text indexer can return to using the parent section's properties and continue processing.

The SectionInfoStack class is used to maintain the current state of nested sections encountered in a document by the text indexer, while the SectionInfo class holds the section attributes for each entry in the current stack.
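The two behaviors listed above can be illustrated with a minimal sketch. The class `NestingSketch` and its methods `enter`, `leave`, and `current` are hypothetical names invented for this illustration, and only the section type is tracked; this is not the XTF implementation.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical stand-in for the nesting behavior described above; not the XTF source.
class NestingSketch {
    private final Deque<String> types = new ArrayDeque<>();

    // Entering a section: an empty name inherits the enclosing section's type.
    void enter(String sectionType) {
        String parent = types.isEmpty() ? "" : types.peek();
        types.push(sectionType.isEmpty() ? parent : sectionType);
    }

    // Leaving a section: the parent's type becomes current again.
    void leave() {
        if (!types.isEmpty()) {
            types.pop();
        }
    }

    // Type of the innermost open section, or "" when no section is open.
    String current() {
        return types.isEmpty() ? "" : types.peek();
    }

    public static void main(String[] args) {
        NestingSketch s = new NestingSketch();
        s.enter("chapter");
        s.enter("");                      // unnamed inner section
        System.out.println(s.current());  // prints "chapter"
        s.enter("footnote");
        System.out.println(s.current());  // prints "footnote"
        s.leave();
        System.out.println(s.current());  // prints "chapter"
    }
}
```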


Field Summary
private  Stack infoStack
          Actual generic stack that holds the SectionInfo objects.
 
Constructor Summary
SectionInfoStack()
           
 
Method Summary
 int depth()
          Return the current depth of the top section on the nesting stack.
 int indexFlag()
          Return the index flag for the top section on the nesting stack.
 boolean isEmpty()
          Query method to determine if there are any nested sections currently on the nesting stack.
 SectionInfo peek()
          Return a copy of the section currently at the top of the nesting stack without popping the stack.
 void pop()
          Section de-stacking operator.
 void push()
          Implicit depth-push operator.
 void push(int indexFlag, String sectionType, int sectionBump, float wordBoost, int sentenceBump, int spellFlag)
          Explicit section push operator.
private  void push(SectionInfo info)
          Push a SectionInfo instance onto the top of the section stack.
 int sectionBump()
          Return the section bump value for the top section on the nesting stack.
 String sectionType()
          Return the section type name for the top section on the nesting stack.
 int sentenceBump()
          Return the sentence bump value for the top entry in the stack.
 int setSectionBump(int newBump)
          This function sets the section bump value for the top entry in the stack.
 int spellFlag()
          Return the spell flag for the top section on the nesting stack.
private  SectionInfo top()
          Return a reference to the top entry in the section stack, if any.
 int useSectionBump()
          Use and clear the section bump value for the top section on the nesting stack.
 boolean valuesChanged(int indexFlag, String sectionType, int sectionBump, float wordBoost, int sentenceBump, int spellFlag)
          Query method to determine if the passed set of section attributes differs from the section at the top of the nesting stack.
 float wordBoost()
          Return the word boost value for the top entry in the stack.
 
Methods inherited from class Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

infoStack

private Stack infoStack
Actual generic stack that holds the SectionInfo objects.

Constructor Detail

SectionInfoStack

public SectionInfoStack()
Method Detail

push

public void push(int indexFlag,
                 String sectionType,
                 int sectionBump,
                 float wordBoost,
                 int sentenceBump,
                 int spellFlag)
Explicit section push operator.

Call this method to push a new section onto the stack with explicitly specified values for the section's attributes.

Parameters:
indexFlag - A flag indicating whether or not the current section should be indexed. Valid values are parentIndex, index, noIndex.

sectionType - The type name for the section being pushed. This may be either a caller-defined string or an empty string (""). Note that if an empty string is passed, the section name is inherited from the parent section (if defined).

sectionBump - The offset (in words) of the current section from the previous section. Used to lower the relevance of (or completely avoid) proximity matches that span two sections. This value is typically set to zero (for no de-emphasis of proximity matches across adjacent sections), or a value greater than or equal to the chunk overlap used by the index (to completely avoid proximity matches across adjacent sections.)

wordBoost - Boost factor to apply to words in this section. Values greater than 1.0 make the words found in this section more relevant in a search, while values less than 1.0 make words in the section less relevant.

sentenceBump - The offset (in words) for this section between the start of a new sentence and the end of the previous one. Like the section bump, this value is used to adjust the relevance of proximity matches made across sentence boundaries. Typical values are one (for no de-emphasis of proximity matches across sentence boundaries), a value between one and the chunk overlap for the index (for partial de-emphasis of proximity matches across sentence boundaries), or a value greater than or equal to the chunk size (to completely avoid proximity matches across sentence boundaries).

spellFlag - A flag indicating whether or not words in the current section should be added to the spelling correction dictionary. Valid values are parentSpell, spell, noSpell.

Notes:
This method compares the passed attributes to the section currently at the top of the stack (if any.) If the attributes are identical, the depth-push method is called to save space. Otherwise, the new section entry with the passed attributes is created and placed on the stack.

For a more complete description of the above listed attributes, see the SectionInfo class.
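The compare-then-push behavior described in the notes can be sketched as follows. `ExplicitPushSketch` and its nested `Entry` class are hypothetical stand-ins for SectionInfoStack and SectionInfo. The sketch assumes a new entry starts at depth 1, uses arbitrary flag values, and leaves out the inheritance of parent values (empty section type, parentIndex, parentSpell).

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch of the compare-then-push logic; not the XTF implementation.
class ExplicitPushSketch {

    // Stand-in for SectionInfo: six attributes plus a repeat counter.
    static final class Entry {
        final int indexFlag;
        final String sectionType;
        final int sectionBump;
        final float wordBoost;
        final int sentenceBump;
        final int spellFlag;
        int depth = 1; // assumption: a new entry starts at depth 1

        Entry(int indexFlag, String sectionType, int sectionBump,
              float wordBoost, int sentenceBump, int spellFlag) {
            this.indexFlag = indexFlag;
            this.sectionType = sectionType;
            this.sectionBump = sectionBump;
            this.wordBoost = wordBoost;
            this.sentenceBump = sentenceBump;
            this.spellFlag = spellFlag;
        }

        boolean matches(int indexFlag, String sectionType, int sectionBump,
                        float wordBoost, int sentenceBump, int spellFlag) {
            return this.indexFlag == indexFlag
                && this.sectionType.equals(sectionType)
                && this.sectionBump == sectionBump
                && Float.compare(this.wordBoost, wordBoost) == 0
                && this.sentenceBump == sentenceBump
                && this.spellFlag == spellFlag;
        }
    }

    private final Deque<Entry> stack = new ArrayDeque<>();

    void push(int indexFlag, String sectionType, int sectionBump,
              float wordBoost, int sentenceBump, int spellFlag) {
        Entry top = stack.peek();
        if (top != null && top.matches(indexFlag, sectionType, sectionBump,
                                       wordBoost, sentenceBump, spellFlag)) {
            top.depth++; // identical attributes: count the nesting, store nothing new
        } else {
            stack.push(new Entry(indexFlag, sectionType, sectionBump,
                                 wordBoost, sentenceBump, spellFlag));
        }
    }

    int entryCount() { return stack.size(); }

    int depth() { return stack.isEmpty() ? -1 : stack.peek().depth; }

    public static void main(String[] args) {
        ExplicitPushSketch s = new ExplicitPushSketch();
        s.push(1, "chapter", 0, 1.0f, 1, 1);
        s.push(1, "chapter", 0, 1.0f, 1, 1); // identical: reuses the entry
        s.push(1, "chapter", 0, 2.0f, 1, 1); // boost differs: new entry
        System.out.println(s.entryCount() + " entries, top depth " + s.depth());
    }
}
```

Three nested sections end up as two stored entries, which is the space saving the notes refer to.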


push

public void push()
Implicit depth-push operator.

Call this method to push a section onto the stack with all the same attributes as the previous section. This method uses the depth field of the SectionInfo class to maintain the correct depth for nested sections with identical attributes while avoiding pushing entire duplicate entries.

Notes:
Use the valuesChanged() method to determine if your attributes for a new section are identical to the section currently at the top of the stack before calling this method. Alternatively, you can simply pass your new attributes to the explicit section-push operator, which performs the same check internally and calls this method as needed.


pop

public void pop()
Section de-stacking operator.

Call this method to pop a section off the nesting stack.

Notes:
Internally, this method decrements the depth of the topmost entry in the stack, and if the depth goes to zero, it removes the topmost entry from the stack.

This method does nothing if the nesting stack is empty.
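The interplay between the depth-push and the pop described in the notes might look like the sketch below. `DepthPopSketch`, `Entry`, and `pushNew` are hypothetical names. The sketch assumes a new entry starts at depth 1 and that a depth-push on an empty stack does nothing; the source does not say what the real method does in that case.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch of depth counting; not the XTF implementation.
class DepthPopSketch {

    // Stand-in entry: only the repeat counter matters here.
    static final class Entry {
        final String sectionType;
        int depth = 1; // assumption: a new entry starts at depth 1

        Entry(String sectionType) { this.sectionType = sectionType; }
    }

    private final Deque<Entry> stack = new ArrayDeque<>();

    // Push an entry with new attributes (simplified to a type name).
    void pushNew(String sectionType) { stack.push(new Entry(sectionType)); }

    // Implicit depth-push: same attributes as before, so only bump the counter.
    void push() {
        Entry top = stack.peek();
        if (top != null) {
            top.depth++;
        }
    }

    // Pop: decrement the counter and drop the entry once it reaches zero.
    void pop() {
        Entry top = stack.peek();
        if (top == null) {
            return; // empty stack: nothing to do
        }
        if (--top.depth == 0) {
            stack.pop();
        }
    }

    boolean isEmpty() { return stack.isEmpty(); }

    int depth() { return stack.isEmpty() ? -1 : stack.peek().depth; }

    public static void main(String[] args) {
        DepthPopSketch s = new DepthPopSketch();
        s.pushNew("chapter");
        s.push();                          // nested section, same attributes
        s.pop();
        System.out.println(s.isEmpty());   // prints "false"
        s.pop();
        System.out.println(s.isEmpty());   // prints "true"
    }
}
```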

peek

public SectionInfo peek()
Return a copy of the section currently at the top of the nesting stack without popping the stack.

Returns:
A copy of the top entry in the nesting stack.

isEmpty

public boolean isEmpty()
Query method to determine if there are any nested sections currently on the nesting stack.

Returns:
true - No nested sections currently on the stack.
false - One or more nested sections are currently on the stack.


valuesChanged

public boolean valuesChanged(int indexFlag,
                             String sectionType,
                             int sectionBump,
                             float wordBoost,
                             int sentenceBump,
                             int spellFlag)
Query method to determine if the passed set of section attributes differs from the section at the top of the nesting stack.

Returns:
true - One or more of the passed attributes do not match the attributes for the section currently at the top of the stack.
false - The passed attributes are identical to those for the section currently at the top of the stack.

Notes:
If the stack is empty when this method is called, the value true is returned.
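A minimal sketch of the comparison, including the empty-stack case, is shown below. `ValuesChangedSketch` is a hypothetical class; storing the six attributes in a boxed array is a shortcut chosen for brevity and is not how SectionInfo holds them.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

// Hypothetical sketch of the attribute comparison; not the XTF implementation.
class ValuesChangedSketch {

    // Each entry is the six attributes boxed into an array (a shortcut for this sketch).
    private final Deque<Object[]> stack = new ArrayDeque<>();

    boolean valuesChanged(int indexFlag, String sectionType, int sectionBump,
                          float wordBoost, int sentenceBump, int spellFlag) {
        if (stack.isEmpty()) {
            return true; // nothing on the stack to match against
        }
        Object[] candidate = {indexFlag, sectionType, sectionBump,
                              wordBoost, sentenceBump, spellFlag};
        return !Arrays.equals(stack.peek(), candidate);
    }

    void push(int indexFlag, String sectionType, int sectionBump,
              float wordBoost, int sentenceBump, int spellFlag) {
        stack.push(new Object[] {indexFlag, sectionType, sectionBump,
                                 wordBoost, sentenceBump, spellFlag});
    }

    public static void main(String[] args) {
        ValuesChangedSketch s = new ValuesChangedSketch();
        System.out.println(s.valuesChanged(1, "chapter", 0, 1.0f, 1, 1)); // prints "true"
        s.push(1, "chapter", 0, 1.0f, 1, 1);
        System.out.println(s.valuesChanged(1, "chapter", 0, 1.0f, 1, 1)); // prints "false"
        System.out.println(s.valuesChanged(1, "note", 0, 1.0f, 1, 1));    // prints "true"
    }
}
```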


depth

public int depth()
Return the current depth of the top section on the nesting stack.

Returns:
The current depth of the entry at the top of the section stack, or -1 if the stack is empty.

indexFlag

public int indexFlag()
Return the index flag for the top section on the nesting stack.

Returns:
Returns index or noIndex.

Notes:
This function will never return parentIndex. That value is only used as an argument when calling the explicit section-push operator to force the new section to adopt its parent's index flag.

For a complete explanation of the indexFlag attribute, see the indexFlag field in the SectionInfo class.
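One way to guarantee that parentIndex is never returned is to resolve it when the section is pushed, as in the sketch below. The class `IndexFlagSketch`, the constant names, and their numeric values are all hypothetical; the real constants are defined in SectionInfo. The sketch also assumes that a section with no parent on the stack is indexed.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch of resolving an inherited flag at push time; not the XTF implementation.
class IndexFlagSketch {
    // Hypothetical constant values; the real ones are defined in SectionInfo.
    static final int PARENT_INDEX = 0;
    static final int INDEX = 1;
    static final int NO_INDEX = 2;

    private final Deque<Integer> flags = new ArrayDeque<>();

    // Assumption: with no parent on the stack, a section is indexed.
    int indexFlag() {
        return flags.isEmpty() ? INDEX : flags.peek();
    }

    void push(int requestedFlag) {
        // Resolve "inherit from parent" now, so the stored flag is always concrete.
        flags.push(requestedFlag == PARENT_INDEX ? indexFlag() : requestedFlag);
    }

    void pop() {
        if (!flags.isEmpty()) {
            flags.pop();
        }
    }

    public static void main(String[] args) {
        IndexFlagSketch s = new IndexFlagSketch();
        s.push(NO_INDEX);
        s.push(PARENT_INDEX); // inherits NO_INDEX from the enclosing section
        System.out.println(s.indexFlag() == NO_INDEX); // prints "true"
    }
}
```

The same approach would apply to the spell flag and parentSpell.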


spellFlag

public int spellFlag()
Return the spell flag for the top section on the nesting stack.

Returns:
Returns spell or noSpell.

Notes:
This function will never return parentSpell. That value is only used as an argument when calling the explicit section-push operator to force the new section to adopt its parent's spell flag.

For a complete explanation of the spellFlag attribute, see the spellFlag field in the SectionInfo class.


sectionType

public String sectionType()
Return the section type name for the top section on the nesting stack.

Returns:
Returns the name of the top section entry on the stack (if any) or an empty string if no type name is assigned or the stack is empty.

Notes:
For a complete explanation of the sectionType attribute, see the sectionType field in the SectionInfo class.


sectionBump

public int sectionBump()
Return the section bump value for the top section on the nesting stack.

Returns:
Returns the bump value for the top section entry on the stack (if any), or the defaultSectionBump value if the stack is empty.

Notes:
For a complete explanation of the sectionBump attribute, see the sectionBump field in the SectionInfo class.
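To illustrate why a section bump keeps proximity matches from spanning sections, the sketch below assigns word positions and adds the bump as extra distance at each section boundary. This position model, along with the names `SectionBumpSketch`, `positions`, and `within`, is an assumption made for illustration and may differ from how the indexer actually numbers words.

```java
import java.util.Arrays;

// Hypothetical model of word positions with a section bump; not the XTF implementation.
class SectionBumpSketch {

    // sectionOfWord[i] is the section id of word i. Crossing into a different
    // section adds sectionBump extra positions (assumed model).
    static int[] positions(int[] sectionOfWord, int sectionBump) {
        int[] pos = new int[sectionOfWord.length];
        int next = 0;
        for (int i = 0; i < sectionOfWord.length; i++) {
            if (i > 0 && sectionOfWord[i] != sectionOfWord[i - 1]) {
                next += sectionBump;
            }
            pos[i] = next++;
        }
        return pos;
    }

    // True when words i and j are no more than maxDistance positions apart.
    static boolean within(int[] pos, int i, int j, int maxDistance) {
        return Math.abs(pos[i] - pos[j]) <= maxDistance;
    }

    public static void main(String[] args) {
        int[] sections = {0, 0, 0, 1, 1}; // three words in one section, two in the next
        int[] noBump = positions(sections, 0);
        int[] bumped = positions(sections, 5);
        System.out.println(Arrays.toString(noBump)); // prints "[0, 1, 2, 3, 4]"
        System.out.println(Arrays.toString(bumped)); // prints "[0, 1, 2, 8, 9]"
        System.out.println(within(noBump, 2, 3, 3)); // prints "true"
        System.out.println(within(bumped, 2, 3, 3)); // prints "false"
    }
}
```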


useSectionBump

public int useSectionBump()
Use and clear the section bump value for the top section on the nesting stack.

Returns:
Returns the bump value for the top section entry on the stack (if any), or the defaultSectionBump value if the stack is empty.

Notes:
"Using" the section bump at the top of the stack consists of retrieving its value and resetting its field to zero. This is done so that any accumulated bump from nested sections is used only once. After the reset, subsequent calls to this function will return zero, thus preventing any unwanted repeat bumping.

For a complete explanation of the sectionBump attribute, see the sectionBump field in the SectionInfo class.


setSectionBump

public int setSectionBump(int newBump)
This function sets the section bump value for the top entry in the stack.

Parameters:
newBump - New bump value to set for top entry.

Returns:
The bump value set for the top entry in the stack just before this call was made.

Notes:
For a complete explanation of the sectionBump attribute, see the sectionBump field in the SectionInfo class.
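The return-the-previous-value behavior of setSectionBump(), and the use-and-clear behavior of useSectionBump(), can be sketched with a single field. `BumpFieldSketch` is a hypothetical class that models only the top entry; it ignores the empty-stack case and the defaultSectionBump value.

```java
// Hypothetical sketch of set-and-return-previous and use-and-clear; not the XTF implementation.
class BumpFieldSketch {
    private int sectionBump; // stands in for the top entry's sectionBump field

    // Store a new bump and hand back the one it replaced.
    int setSectionBump(int newBump) {
        int previous = sectionBump;
        sectionBump = newBump;
        return previous;
    }

    // Read the bump and reset it to zero so that it is applied only once.
    int useSectionBump() {
        return setSectionBump(0);
    }

    public static void main(String[] args) {
        BumpFieldSketch s = new BumpFieldSketch();
        s.setSectionBump(7);
        System.out.println(s.useSectionBump()); // prints "7"
        System.out.println(s.useSectionBump()); // prints "0"
    }
}
```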


wordBoost

public float wordBoost()
Return the word boost value for the top entry in the stack.

Returns:
If the stack is empty, this function returns SectionInfo.defaultWordBoost. Otherwise, it returns the word boost for the section currently at the top of the stack.

Notes:
For a complete explanation of the wordBoost attribute, see the wordBoost field in the SectionInfo class.


sentenceBump

public int sentenceBump()
Return the sentence bump value for the top entry in the stack.

Returns:
If the stack is empty, this function returns SectionInfo.defaultSentenceBump. Otherwise, it returns the sentence bump for the section currently at the top of the stack.

Notes:
For a complete explanation of the sentenceBump attribute, see the sentenceBump field in the SectionInfo class.


push

private void push(SectionInfo info)
Push a SectionInfo instance onto the top of the section stack.

Notes:
This method is a convenience function that does the necessary down-casting to have the generic stack object take a SectionInfo instance.

top

private SectionInfo top()
Return a reference to the top entry in the section stack, if any.

Returns:
A reference to the top item in the section info stack, or null if the stack is empty.

Notes:
This method is a convenience function that does the necessary up-casting to have the generic stack object return a SectionInfo instance.
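The two private helpers can be pictured as thin wrappers around an untyped stack, as in the sketch below. `CastHelperSketch`, `Info`, `add`, and `topType` are hypothetical names, and a `Stack<Object>` is assumed here to play the role of the generic infoStack. Note that `java.util.Stack.peek()` throws `EmptyStackException` on an empty stack, so the emptiness check is what makes returning null possible.

```java
import java.util.Stack;

// Hypothetical sketch of cast-hiding helpers around an untyped stack; not the XTF implementation.
class CastHelperSketch {

    // Stand-in for SectionInfo.
    static final class Info {
        final String sectionType;

        Info(String sectionType) { this.sectionType = sectionType; }
    }

    // Holds plain Objects, so callers would otherwise have to cast on every read.
    private final Stack<Object> infoStack = new Stack<>();

    // Typed push: the stack only ever receives Info instances.
    private void push(Info info) {
        infoStack.push(info);
    }

    // Typed top: casts the stored Object back to Info, or returns null when empty.
    private Info top() {
        return infoStack.isEmpty() ? null : (Info) infoStack.peek();
    }

    // Package-private wrappers so the helpers can be exercised from outside.
    void add(String sectionType) { push(new Info(sectionType)); }

    String topType() {
        Info info = top();
        return info == null ? null : info.sectionType;
    }

    public static void main(String[] args) {
        CastHelperSketch s = new CastHelperSketch();
        System.out.println(s.topType()); // prints "null"
        s.add("chapter");
        System.out.println(s.topType()); // prints "chapter"
    }
}
```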