org.cdlib.xtf.textEngine
Class BoundedWordIter

Object
  extended by BasicWordIter
      extended by BoundedWordIter
All Implemented Interfaces:
Cloneable, WordIter

class BoundedWordIter
extends BasicWordIter

Just like a BasicWordIter, except that it enforces "soft" boundaries wherever the source text contains XTF "bump markers" whose size meets or exceeds boundSize. In practice, this prevents snippets from spanning section boundaries, or the boundaries between different fields of the same name.

Author:
Martin Haye
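To make the idea of a soft boundary concrete, here is a minimal sketch of where boundaries fall. It assumes, as the constructor description states, that a bump marker shows up as a position offset between consecutive tokens, and that an offset meeting or exceeding boundSize marks a boundary. SoftBoundaries and boundaryStarts are hypothetical names, and a plain int array of position increments stands in for the TokenStream the real class consumes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of XTF): finds the token indices that begin
// a new soft-bounded region.
final class SoftBoundaries {
    private SoftBoundaries() {}

    /**
     * @param posIncrements position increment of each token relative to the
     *                      previous one (1 for adjacent words; a bump marker
     *                      is assumed to produce a larger gap)
     * @param boundSize     gaps at or above this size count as soft boundaries
     * @return indices of tokens that start a new bounded region
     */
    static List<Integer> boundaryStarts(int[] posIncrements, int boundSize) {
        List<Integer> starts = new ArrayList<>();
        // Token 0 has no predecessor, so no boundary can precede it.
        for (int i = 1; i < posIncrements.length; i++) {
            if (posIncrements[i] >= boundSize) {
                starts.add(i);
            }
        }
        return starts;
    }

    public static void main(String[] args) {
        // "alpha beta" | bump of 100 | "gamma delta"
        int[] incrs = {1, 1, 100, 1};
        System.out.println(boundaryStarts(incrs, 100)); // prints [2]
    }
}
```

A snippet that starts at "alpha" can grow to "beta" but should stop before "gamma", because the gap of 100 reaches boundSize.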

Field Summary
(package private)  int boundSize
           
 
Fields inherited from class BasicWordIter
maxWordPos, text, tokens, tokNum, wordPos
 
Fields inherited from interface WordIter
FIELD_END, FIELD_START, TERM_END, TERM_END_PLUS, TERM_START
 
Constructor Summary
BoundedWordIter(String text, TokenStream stream, int boundSize)
          Construct a bounded word iterator on the given text.
 
Method Summary
 MarkPos getPos(int startOrEnd)
          Create a new MarkPos holding position information.
 void getPos(MarkPos pos, int startOrEnd)
          Get the position of the current word, at the point selected by startOrEnd.
 boolean next(boolean force)
          Advance to the next token.
 boolean prev(boolean force)
          Go to the previous token.
 
Methods inherited from class BasicWordIter
clone, seekFirst, seekLast, term
 
Methods inherited from class Object
equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

boundSize

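int boundSize
The minimum position offset (bump size) at which next() and prev() stop at a soft boundary unless forced.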
Constructor Detail

BoundedWordIter

public BoundedWordIter(String text,
                       TokenStream stream,
                       int boundSize)
                throws IOException
Construct a bounded word iterator on the given text. The tokens from the stream must refer to the same text. The skip() method works as normal, but next() and prev() enforce a soft boundary at any token whose position offset meets or exceeds boundSize.

Throws:
IOException
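The constructor's two preconditions (tokens must describe the same text, and boundSize decides which gaps count as boundaries) can be sketched as follows. This is a hypothetical stand-in, not the XTF constructor: Tok and BoundedIterSketch are invented names, a List replaces the TokenStream, and the offset check is one plausible way to enforce "refer to the same text".

```java
import java.util.List;

// Hypothetical token: character offsets into the source text plus the
// position increment from the previous token (a bump marker gives a large one).
final class Tok {
    final int start, end, posIncr;

    Tok(int start, int end, int posIncr) {
        this.start = start;
        this.end = end;
        this.posIncr = posIncr;
    }
}

// Hypothetical sketch of the constructor's bookkeeping; not the XTF code.
final class BoundedIterSketch {
    final String text;
    final List<Tok> tokens;
    final int boundSize;

    BoundedIterSketch(String text, List<Tok> tokens, int boundSize) {
        // The tokens must refer to this same text, so every offset has to fit inside it.
        for (Tok t : tokens) {
            if (t.start < 0 || t.start > t.end || t.end > text.length()) {
                throw new IllegalArgumentException(
                    "token offsets outside text: " + t.start + ".." + t.end);
            }
        }
        this.text = text;
        this.tokens = tokens;
        this.boundSize = boundSize;
    }

    /** True if a soft boundary lies just before token i. */
    boolean boundaryBefore(int i) {
        return i > 0 && tokens.get(i).posIncr >= boundSize;
    }
}
```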
Method Detail

next

public final boolean next(boolean force)
Advance to the next token.

Specified by:
next in interface WordIter
Overrides:
next in class BasicWordIter
Parameters:
force - true to ignore section boundaries
Returns:
true if the iterator advanced; false if there are no more tokens, or (when force is false) if a soft boundary was reached.
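The forward-movement rule can be modeled in a few lines. This sketch assumes that the gap between token i-1 and token i is stored as posIncr[i] and that an unforced next() refuses to step onto a token whose gap meets or exceeds boundSize; NextSketch is a hypothetical class, not the XTF implementation.

```java
// Hypothetical model of next(force); tokNum indexes the current token.
final class NextSketch {
    private final int[] posIncr;   // posIncr[i] = gap between token i-1 and token i
    private final int boundSize;
    int tokNum = 0;

    NextSketch(int[] posIncr, int boundSize) {
        this.posIncr = posIncr;
        this.boundSize = boundSize;
    }

    boolean next(boolean force) {
        int nxt = tokNum + 1;
        if (nxt >= posIncr.length) {
            return false;                       // no more tokens
        }
        if (!force && posIncr[nxt] >= boundSize) {
            return false;                       // soft boundary ahead; stay put
        }
        tokNum = nxt;
        return true;
    }
}
```

With increments {1, 1, 100, 1} and boundSize 100, an unforced next() moves from token 0 to token 1 and then stops; next(true) crosses onto token 2.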

prev

public final boolean prev(boolean force)
Go to the previous token.

Specified by:
prev in interface WordIter
Overrides:
prev in class BasicWordIter
Parameters:
force - true to ignore section boundaries
Returns:
true if the iterator moved back; false if there are no previous tokens, or (when force is false) if a soft boundary was reached.
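Moving backward mirrors the forward case, except that the gap to check belongs to the current token (the one being left), not the destination. The sketch below assumes the same layout as before, posIncr[i] holding the gap before token i; PrevSketch is a hypothetical class.

```java
// Hypothetical model of prev(force); tokNum indexes the current token.
final class PrevSketch {
    private final int[] posIncr;   // posIncr[i] = gap between token i-1 and token i
    private final int boundSize;
    int tokNum;

    PrevSketch(int[] posIncr, int boundSize, int start) {
        this.posIncr = posIncr;
        this.boundSize = boundSize;
        this.tokNum = start;
    }

    boolean prev(boolean force) {
        if (tokNum == 0) {
            return false;                       // already at the first token
        }
        // The gap before the current token is stored on the current token itself.
        if (!force && posIncr[tokNum] >= boundSize) {
            return false;                       // soft boundary behind; stay put
        }
        tokNum--;
        return true;
    }
}
```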

getPos

public MarkPos getPos(int startOrEnd)
Create a new MarkPos holding position information.

Specified by:
getPos in interface WordIter
Overrides:
getPos in class BasicWordIter
Parameters:
startOrEnd - FIELD_START for the very start of the field; TERM_START for the first character of the word; TERM_END for the last character of the word; TERM_END_PLUS for the last character plus any trailing punctuation and/or spaces; FIELD_END for the very end of the field.
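The five startOrEnd selectors map a word to different character offsets. The sketch below is an illustration under stated assumptions: the Where enum only mirrors the WordIter constant names (the real constants and their values may differ), TERM_END is taken as an exclusive end offset, and TERM_END_PLUS is assumed to stop at the start of the next word.

```java
// Hypothetical selectors mirroring the WordIter constant names.
enum Where { FIELD_START, TERM_START, TERM_END, TERM_END_PLUS, FIELD_END }

final class PosSketch {
    private PosSketch() {}

    // Character offset in `text` for the word [wordStart, wordEnd) selected by `where`.
    // nextWordStart is the start of the following word, or text.length() if none.
    static int offset(String text, int wordStart, int wordEnd, int nextWordStart, Where where) {
        switch (where) {
            case FIELD_START:
                return 0;
            case TERM_START:
                return wordStart;
            case TERM_END:
                return wordEnd;                 // exclusive end (assumption)
            case TERM_END_PLUS: {
                // Absorb trailing punctuation and spaces, stopping at the next word.
                int p = wordEnd;
                while (p < nextWordStart && !Character.isLetterOrDigit(text.charAt(p))) {
                    p++;
                }
                return p;
            }
            case FIELD_END:
                return text.length();
            default:
                throw new AssertionError(where);
        }
    }
}
```

For the text "Hello, world." and the word "Hello" at [0, 5), TERM_END gives 5 while TERM_END_PLUS also takes in the comma and space and gives 7.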

getPos

public void getPos(MarkPos pos,
                   int startOrEnd)
Get the position of the current word, at the point selected by startOrEnd.

Specified by:
getPos in interface WordIter
Overrides:
getPos in class BasicWordIter
Parameters:
pos - the MarkPos to fill in with the position
startOrEnd - FIELD_START for the very start of the field; TERM_START for the first character of the word; TERM_END for the last character of the word; TERM_END_PLUS for the last character plus any trailing punctuation and/or spaces; FIELD_END for the very end of the field.
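The two getPos overloads suggest a common pattern: one fills in a caller-supplied holder, so a loop can reuse a single object, and the other allocates a fresh one. The sketch below assumes the allocating version simply delegates to the fill-in version; Mark and FillInSketch are hypothetical stand-ins for MarkPos and the iterator, and the startOrEnd selector is omitted for brevity.

```java
// Hypothetical mutable holder standing in for MarkPos.
final class Mark {
    int wordPos = -1;
    int charPos = -1;
}

// Hypothetical iterator fragment showing the fill-in vs. allocating overloads.
final class FillInSketch {
    private final int[] wordStarts;   // character offset of each word
    private int cur;                  // index of the current word

    FillInSketch(int[] wordStarts) {
        this.wordStarts = wordStarts;
    }

    void moveTo(int i) {
        cur = i;
    }

    // Fill-in variant: writes into the caller's holder, allocating nothing.
    void getPos(Mark pos) {
        pos.wordPos = cur;
        pos.charPos = wordStarts[cur];
    }

    // Allocating variant: creates a new holder and delegates to the fill-in version.
    Mark getPos() {
        Mark m = new Mark();
        getPos(m);
        return m;
    }
}
```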