org.cdlib.xtf.textIndexer
Class SectionInfo

Object
  extended by SectionInfo

public class SectionInfo
extends Object

This class maintains information about the current section in a text document that the TextIndexer program is processing.

On-line documents are stored as "nodes" in XML files that contain information about the document, and the document text itself. The nodes usually form a hierarchical tree structure, with the outermost nodes recording various bits of information about the text within. Inside the outer nodes are additional nodes that record the organization of the text itself, including things like section, chapter, and paragraph information. To the text indexer program and search engine, sections have special significance. Text in two adjacent sections that have different names is considered not to be "near" one another, so proximity searches will not produce results that span two or more sections.

Since sections can be nested inside one another, the text indexer needs to maintain a stack that tracks the current nesting level while a document is being processed. Doing so does two things:

- It allows unnamed inner sections to inherit properties from the parent sections that contain them.
- When the end of a named section has been reached, the text indexer can return to using the parent section's properties and continue processing.
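
The two points above can be illustrated with a small stack. This is a minimal sketch, not the XTF implementation: the NestingSketch class and its push, pop, and current methods are hypothetical names, and only the section type is tracked.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical stand-in for a section stack; it is not the XTF SectionInfoStack.
class NestingSketch {
    private final Deque<String> types = new ArrayDeque<>();

    // Push a section. An empty name means "unnamed", so the parent's
    // type (if any) is inherited.
    void push(String sectionType) {
        if (sectionType.isEmpty() && !types.isEmpty()) {
            sectionType = types.peek();
        }
        types.push(sectionType);
    }

    // Leave the current section and fall back to the parent's properties.
    void pop() {
        types.pop();
    }

    // Type in effect for text at the current position ("" outside any section).
    String current() {
        return types.isEmpty() ? "" : types.peek();
    }
}
```

Pushing "chapter" and then an unnamed section leaves "chapter" in effect; popping a nested "footnote" section returns to "chapter" as well.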

Information recorded for each section consists of the following:

- The type name of the current section.
- The repeat depth, if the section name is the same as the parent's.
- The number of words by which this section should be offset from the previous one.
- The previous word bump for this section, if any.
- The word bump to apply at the end of each sentence.
- The relevance boost to apply to words in this section.

This class is then used as the entry for a SectionInfoStack that maintains the current stacking order within the source text being processed.


Field Summary
static int defaultDepth
          Default depth for a section: Value = 0.
static int defaultIndexFlag
          Default state for Index/No-Index Flag.
static int defaultSectionBump
          Default word bump for a section: Value = 0.
static String defaultSectionType
          Default section type name: Value = "".
static int defaultSentenceBump
          Default sentence bump for a section: Value = 5.
static int defaultSpellFlag
          Default state for Spell/No-Spell Flag.
static float defaultWordBoost
          Default word boost for a section: Value = 1.0f.
 int depth
          Depth count for a section.
static int index
          Index/No-Index Flag Value: Index the current section.
 int indexFlag
          Index flag for a section.
static int noIndex
          Index/No-Index Flag Value: Do not index the current section.
static int noSpell
          No-Spell Flag Value: Do not add words from the current section to the spelling correction dictionary.
static int parentIndex
          Index/No-Index Flag Value: Use parent section index/no-index state.
static int parentSectionBump
          Special Section Bump: Value = Use parent's section bump.
static int parentSpell
          Spell/No-Spell Flag Value: Use parent section spell/no-spell state.
 int prevSectionBump
          Previous section bump for this section.
 int sectionBump
          Word bump to add for a section.
 String sectionType
          Type name for a section.
 int sentenceBump
          Sentence bump value for this section.
static int spell
          Spell Flag Value: Add words from the current section to the spelling correction dictionary.
 int spellFlag
          Spell flag for a section.
 float wordBoost
          Word boost value for this section.
 
Constructor Summary
SectionInfo()
          Default Constructor.
SectionInfo(int depth, int indexFlag, String sectionType, int sectionBump, float wordBoost, int sentenceBump, int spellFlag)
          Explicit Constructor.
 
Method Summary
 void restoreSectionBump()
          Restore a previously saved section bump value.
 int saveSectionBump()
          Saves the section bump value for later restore.
 
Methods inherited from class Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

parentIndex

public static final int parentIndex
Index/No-Index Flag Value: Use parent section index/no-index state.

See Also:
Constant Field Values
Notes:
This index flag value is never actually stored in the index flag attribute for a SectionInfo instance. It is only passed as an argument to the explicit section push method defined by the SectionInfoStack class. That method in turn uses the parent section's index flag value, which will be either index or noIndex.
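
The resolution step described in the note can be sketched as follows. This is an illustration only: the IndexFlagSketch class, its resolve method, and the numeric constant values are assumptions, since the actual values are listed under Constant Field Values and the real logic lives in SectionInfoStack.

```java
// Hypothetical sketch; the constant values are assumed for illustration only.
class IndexFlagSketch {
    static final int PARENT_INDEX = -1; // assumed value
    static final int NO_INDEX = 0;      // assumed value
    static final int INDEX = 1;         // assumed value

    // Resolve the flag requested for a new section against its parent's
    // already resolved flag, so the sentinel itself is never stored.
    static int resolve(int requested, int parentFlag) {
        return requested == PARENT_INDEX ? parentFlag : requested;
    }
}
```

Because the parent's flag was resolved the same way when it was pushed, the result is always one of the two concrete states.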


noIndex

public static final int noIndex
Index/No-Index Flag Value: Do not index the current section.

This value is used for the indexFlag field to indicate that the current section should not be indexed.

See Also:
Constant Field Values

index

public static final int index
Index/No-Index Flag Value: Index the current section.

This value is used for the indexFlag field to indicate that the current section should be indexed.

See Also:
Constant Field Values

parentSectionBump

public static final int parentSectionBump
Special Section Bump: Value = Use parent's section bump.

This special value when used for the sectionBump field indicates that the parent's section bump value should be used.

See Also:
Constant Field Values
Notes:
This section bump value is never actually stored in the section bump attribute for a SectionInfo instance. It is only passed as an argument to the explicit section push method defined by the SectionInfoStack class. That method in turn uses the parent section's bump value for the new entry on the stack.


defaultIndexFlag

public static final int defaultIndexFlag
Default state for Index/No-Index Flag. Value = index.

This is the default value applied to the indexFlag field whenever a SectionInfo instance is created with the default constructor.

See Also:
Constant Field Values

defaultSectionType

public static final String defaultSectionType
Default section type name: Value = "".

This is the default value applied to the sectionType field whenever a SectionInfo instance is created with the default constructor.

See Also:
Constant Field Values

defaultSectionBump

public static final int defaultSectionBump
Default word bump for a section: Value = 0.

This is the default value applied to the sectionBump field whenever a SectionInfo instance is created with the default constructor.

See Also:
Constant Field Values

defaultWordBoost

public static final float defaultWordBoost
Default word boost for a section: Value = 1.0f.

This is the default value applied to the wordBoost field whenever a SectionInfo instance is created with the default constructor.

See Also:
Constant Field Values

defaultSentenceBump

public static final int defaultSentenceBump
Default sentence bump for a section: Value = 5.

This is the default value applied to the sentenceBump field whenever a SectionInfo instance is created with the default constructor.

See Also:
Constant Field Values

defaultDepth

public static final int defaultDepth
Default depth for a section: Value = 0.

This is the default value applied to the depth field whenever a SectionInfo instance is created with the default constructor.

See Also:
Constant Field Values

parentSpell

public static final int parentSpell
Spell/No-Spell Flag Value: Use parent section spell/no-spell state.

See Also:
Constant Field Values
Notes:
This spell flag value is never actually stored in the spell flag attribute for a SectionInfo instance. It is only passed as an argument to the explicit section push method defined by the SectionInfoStack class. That method in turn uses the parent section's spell flag value, which will be either spell or noSpell.


noSpell

public static final int noSpell
No-Spell Flag Value: Do not add words from the current section to the spelling correction dictionary.

This value is used for the spellFlag field to indicate that words from the current section should not be added to the spelling correction dictionary.

See Also:
Constant Field Values

spell

public static final int spell
Spell Flag Value: Add words from the current section to the spelling correction dictionary.

This value is used for the spellFlag field to indicate that words from the current section should be added to the spelling correction dictionary.

See Also:
Constant Field Values

defaultSpellFlag

public static final int defaultSpellFlag
Default state for Spell/No-Spell Flag. Value = spell.

This is the default value applied to the spellFlag field whenever a SectionInfo instance is created with the default constructor.

See Also:
Constant Field Values

depth

public int depth
Depth count for a section.

This field is used to count the depth of a section when more than one section with the same attributes nests inside another. Using a depth count saves having to add an entire duplicate entry to the stack.
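
The saving can be sketched with a stack that counts repeats instead of pushing duplicates. This is a simplified illustration, assuming that "same attributes" can be reduced to comparing the section type; DepthSketch and Entry are hypothetical names and do not reproduce the SectionInfoStack logic.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch of depth counting; not the XTF SectionInfoStack.
class DepthSketch {
    // One stack entry: a section type plus a repeat count that starts at 0.
    static final class Entry {
        final String type;
        int depth = 0;
        Entry(String type) { this.type = type; }
    }

    private final Deque<Entry> stack = new ArrayDeque<>();

    void push(String type) {
        Entry top = stack.peek();
        if (top != null && top.type.equals(type)) {
            top.depth++;                 // same attributes: count it, no new entry
        } else {
            stack.push(new Entry(type));
        }
    }

    void pop() {
        Entry top = stack.peek();
        if (top == null) return;         // nothing to leave
        if (top.depth > 0) top.depth--;  // leave one nested repeat
        else stack.pop();                // leave the outermost occurrence
    }

    int entries() { return stack.size(); }
}
```

Three nested sections of the same type occupy a single entry, and that entry is removed only when the outermost one ends.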


indexFlag

public int indexFlag
Index flag for a section.

This field indicates whether the associated section should be indexed or not. There are three valid values for this flag: parentIndex, noIndex, and index.

Notes:
The value parentIndex is never actually stored in the index flag attribute for a SectionInfo instance. It is only passed as an argument to the explicit section push method defined by the SectionInfoStack class. That method in turn uses the parent section's index flag value, which will be either index or noIndex.


sectionType

public String sectionType
Type name for a section.

This field identifies the name of the associated section. It can be either an arbitrary string or an empty string (""), in which case the parent section name (if any) is inherited.


sectionBump

public int sectionBump
Word bump to add for a section.

This field specifies how far, in words, a section is from the previous or containing section. It is used to adjust the likelihood of a proximity match being found across section boundaries as compared to within a single section.


prevSectionBump

public int prevSectionBump
Previous section bump for this section.

This field is used to correctly accumulate section bump values when multiple nested section starts are encountered with no intervening text.

Notes:
The value parentSectionBump is never actually stored in the sectionBump attribute for a SectionInfo instance. It is only passed as an argument to the explicit section push method defined by the SectionInfoStack class. That method in turn uses the parent section's bump value.


wordBoost

public float wordBoost
Word boost value for this section.

This field identifies a relevance multiplier for words found in this section. If greater than 1.0, words in this section are considered better matches for searches when added to the index. If less than 1.0, words in this section are considered poorer matches.


sentenceBump

public int sentenceBump
Sentence bump value for this section.

This field identifies the distance (in number of words) that occurs between the end of one sentence and the beginning of the next. This value is used to adjust the likelihood that a proximity match is found across multiple sentences as compared to within a single sentence.
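
The effect on word positions can be sketched as follows. This is an illustration under assumptions, not the indexer's actual tokenization: it assumes each word normally advances the position by one, that a word ending in "." closes a sentence, and that the bump is added on top of the normal step. The SentenceBumpSketch class and its positions method are hypothetical.

```java
// Hypothetical sketch of how a sentence bump could spread out word positions.
class SentenceBumpSketch {
    // Assign a position to each word; the first word after a sentence end
    // is pushed sentenceBump positions further away than usual.
    static int[] positions(String[] words, int sentenceBump) {
        int[] pos = new int[words.length];
        int next = 0;
        for (int i = 0; i < words.length; i++) {
            pos[i] = next;
            next += 1;                       // normal step to the next word
            if (words[i].endsWith(".")) {
                next += sentenceBump;        // extra gap at a sentence boundary
            }
        }
        return pos;
    }
}
```

With a bump of 5, the words "It", "rained.", "We", "left." land at positions 0, 1, 7, and 8, so a proximity query with a small window matches within either sentence but not across the boundary.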



spellFlag

public int spellFlag
Spell flag for a section.

This field indicates whether words from the associated section should be added to the spelling correction dictionary or not. There are three valid values for this flag: parentSpell, noSpell, and spell.

Notes:
The value parentSpell is never actually stored in the spell flag attribute for a SectionInfo instance. It is only passed as an argument to the explicit section push method defined by the SectionInfoStack class. That method in turn uses the parent section's spell flag value, which will be either spell or noSpell.

Constructor Detail

SectionInfo

public SectionInfo()
Default Constructor.

Initializes all the fields in a SectionInfo instance to reasonable default values.

Notes:
See the defaultDepth, defaultIndexFlag, defaultSectionType, defaultSectionBump, defaultWordBoost, defaultSentenceBump, and defaultSpellFlag constants for more on the actual values set.
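
For orientation, the documented defaults can be collected in one place. This is a hypothetical mirror of the values stated in the Field Detail section, not the XTF class itself; the numeric values chosen for the index and spell flags are assumptions, since only their symbolic defaults (index and spell) are documented here.

```java
// Hypothetical mirror of the documented defaults; not the XTF class itself.
class DefaultsSketch {
    static final int INDEX = 1;  // assumed numeric value for "index"
    static final int SPELL = 1;  // assumed numeric value for "spell"

    int depth = 0;               // defaultDepth
    int indexFlag = INDEX;       // defaultIndexFlag (index)
    String sectionType = "";     // defaultSectionType
    int sectionBump = 0;         // defaultSectionBump
    float wordBoost = 1.0f;      // defaultWordBoost
    int sentenceBump = 5;        // defaultSentenceBump
    int spellFlag = SPELL;       // defaultSpellFlag (spell)
}
```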


SectionInfo

public SectionInfo(int depth,
                   int indexFlag,
                   String sectionType,
                   int sectionBump,
                   float wordBoost,
                   int sentenceBump,
                   int spellFlag)
Explicit Constructor.

Initializes all the fields in a SectionInfo instance to values passed by the caller.

Method Detail

saveSectionBump

public int saveSectionBump()
Saves the section bump value for later restore.

This method is used to save the specific bump value assigned to a section when accumulating nested section bumps with no intervening text.

Returns:
The previous section bump value saved.

Notes:
Once saved, the sectionBump field is reset to zero in anticipation of accumulating bump values from previous sections.


restoreSectionBump

public void restoreSectionBump()
Restore a previously saved section bump value.

This method is a convenience method for restoring the section bump value previously saved via saveSectionBump().
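
The save and restore pair can be sketched from the behaviour documented above. This is a minimal stand-in, not the XTF source: BumpSketch is a hypothetical class that keeps only the two bump fields, and whether the real restoreSectionBump() does anything beyond copying the saved value back is not stated here.

```java
// Hypothetical stand-in that mirrors only the documented behaviour.
class BumpSketch {
    int sectionBump;
    int prevSectionBump;

    BumpSketch(int sectionBump) {
        this.sectionBump = sectionBump;
    }

    // Save the current bump, reset it to zero, and return the saved value.
    int saveSectionBump() {
        prevSectionBump = sectionBump;
        sectionBump = 0;
        return prevSectionBump;
    }

    // Put the previously saved bump back into sectionBump.
    void restoreSectionBump() {
        sectionBump = prevSectionBump;
    }
}
```

A caller might save the bump when a nested section starts before any text has been seen, accumulate bumps while sectionBump is zero, and restore the saved value afterwards.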