org.apache.lucene.spelt
Class SpellTestCmdLine.TextRipper

Object
  extended by SpellTestCmdLine.TextRipper
All Implemented Interfaces:
Iterator
Enclosing class:
SpellTestCmdLine

private static class SpellTestCmdLine.TextRipper
extends Object
implements Iterator

Scans a directory for files and rips the text from all of them. The words are accessible through an Iterator.
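Below is a minimal, hypothetical sketch of how a directory-scanning word iterator like this could be built. It is not the Spelt source. The class name WordRipperSketch, the word and markup regexes, the use of generics, and the choice to replace tags with a space are all assumptions. The sketch keeps a Stack of pending files (mirroring fileStack) and reads one line at a time. It also holds a one-word lookahead so that hasNext() stays cheap.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.Stack;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of a directory-scanning word iterator (not the Lucene/Spelt code). */
class WordRipperSketch implements Iterator<String> {
    // Assumed patterns: runs of letters/apostrophes are words; <...> spans are markup.
    private static final Pattern WORD_PAT = Pattern.compile("[A-Za-z']+");
    private static final Pattern XML_PAT = Pattern.compile("<[^>]*>");

    private final Stack<File> fileStack = new Stack<>();
    private BufferedReader reader;   // reader for the current file, or null
    private Matcher words;           // word matcher over the current line, or null
    private String nextWord;         // lookahead; null once everything is exhausted

    WordRipperSketch(String dir) throws IOException {
        fileStack.push(new File(dir));
        advance();                   // prime the lookahead
    }

    /** Pops entries until a regular file is opened; directories push their children. */
    private boolean nextFile() throws IOException {
        if (reader != null) { reader.close(); reader = null; }
        while (!fileStack.isEmpty()) {
            File f = fileStack.pop();
            if (f.isDirectory()) {
                File[] children = f.listFiles();
                if (children != null) for (File c : children) fileStack.push(c);
            } else if (f.isFile()) {
                reader = new BufferedReader(new FileReader(f));
                return true;
            }
        }
        return false;
    }

    /** Moves the lookahead to the next word, crossing line and file boundaries. */
    private void advance() throws IOException {
        while (true) {
            if (words != null) {
                if (words.find()) { nextWord = words.group(); return; }
                words = null;                        // current line exhausted
            }
            String line = (reader == null) ? null : reader.readLine();
            if (line != null) {
                words = WORD_PAT.matcher(stripXML(line));
            } else if (!nextFile()) {
                nextWord = null;                     // no files left
                return;
            }
        }
    }

    String stripXML(String line) {
        return XML_PAT.matcher(line).replaceAll(" ");
    }

    @Override public boolean hasNext() { return nextWord != null; }

    @Override public String next() {
        if (nextWord == null) throw new NoSuchElementException();
        String w = nextWord;
        try { advance(); } catch (IOException e) { throw new UncheckedIOException(e); }
        return w;
    }
}
```

Because hasNext() and next() cannot throw a checked IOException, the sketch reports read errors from next() as UncheckedIOException. How the original handles such errors is not documented here.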


Field Summary
(package private)  Stack fileStack
           
(package private)  String line
           
(package private)  boolean more
           
(package private)  BufferedReader reader
           
(package private)  Pattern wordPat
          Pattern for matching words
(package private)  Matcher words
           
(package private)  Pattern xmlPat
          Pattern for matching XML and HTML elements
 
Constructor Summary
SpellTestCmdLine.TextRipper(String dir)
           
 
Method Summary
(package private)  void advance()
          Advance to the next word in the current file, or the next file if at the end of the current one.
 boolean hasNext()
          Check if there's another word to get
 Object next()
          Get the next word in the sequence
(package private)  boolean nextFile()
          Scan to the next file in the sequence and open it.
 void remove()
          Not implemented
(package private)  String stripXML(String line)
          Try to strip XML and HTML elements from a line
 
Methods inherited from class Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

fileStack

Stack fileStack

reader

BufferedReader reader

more

boolean more

line

String line

words

Matcher words

wordPat

Pattern wordPat
Pattern for matching words
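The words Matcher is presumably created from wordPat for the current line, so that each call can resume where the previous match ended. The hypothetical sketch below shows that find()/group() loop. The regex [A-Za-z']+ is an assumption, since the actual wordPat is not documented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordMatchSketch {
    // Assumed word pattern; the real wordPat's regex is not given in this page.
    static final Pattern WORD_PAT = Pattern.compile("[A-Za-z']+");

    /** Collects every word in a line by repeatedly calling find() on one Matcher. */
    static List<String> wordsIn(String line) {
        List<String> out = new ArrayList<>();
        Matcher words = WORD_PAT.matcher(line);
        while (words.find()) {
            out.add(words.group());   // group() is the text of the current match
        }
        return out;
    }
}
```

For example, wordsIn("Don't panic, 42 times!") yields Don't, panic and times. Digits and punctuation fall outside the assumed pattern.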


xmlPat

final Pattern xmlPat
Pattern for matching XML and HTML elements

Constructor Detail

SpellTestCmdLine.TextRipper

SpellTestCmdLine.TextRipper(String dir)
                      throws IOException
Throws:
IOException

Method Detail

nextFile

boolean nextFile()
           throws IOException
Scan to the next file in the sequence and open it.

Returns:
true if there was a file to open
Throws:
IOException
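The following hypothetical sketch shows the stack-based scan described here. It assumes that fileStack holds java.io.File entries and starts with the root directory. Popping a directory pushes its children, so the traversal is depth-first. The visiting order depends on File.listFiles(), which guarantees no particular order. The current field is an illustrative addition, not part of the documented class.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.Stack;

class FileStackSketch {
    private final Stack<File> fileStack = new Stack<>();
    BufferedReader reader;   // currently open file, or null
    File current;            // hypothetical: remembers which file is open

    FileStackSketch(String dir) { fileStack.push(new File(dir)); }

    /** Opens the next regular file, expanding directories depth-first; false when none remain. */
    boolean nextFile() throws IOException {
        if (reader != null) { reader.close(); reader = null; }
        while (!fileStack.isEmpty()) {
            File f = fileStack.pop();
            if (f.isDirectory()) {
                File[] children = f.listFiles();   // null if the directory cannot be read
                if (children != null) {
                    for (File child : children) fileStack.push(child);
                }
            } else if (f.isFile()) {
                current = f;
                reader = new BufferedReader(new FileReader(f));
                return true;
            }
        }
        current = null;
        return false;
    }
}
```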

advance

void advance()
       throws IOException
Advance to the next word in the current file, or the next file if at the end of the current one.

Throws:
IOException
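Here is a sketch of the advance() loop under stated assumptions. An in-memory queue of StringReaders stands in for fileStack and nextFile(), the regex is assumed, and word is a hypothetical field holding the result. The loop tries the current line's matcher first, then reads the next line. It only moves to the next file when the current reader returns null. The more flag goes false once no file remains.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AdvanceSketch {
    static final Pattern WORD_PAT = Pattern.compile("[A-Za-z']+");   // assumed regex
    private final Deque<BufferedReader> files = new ArrayDeque<>();  // stands in for fileStack
    private BufferedReader reader;
    private Matcher words;
    boolean more = true;   // false once every "file" is exhausted
    String word;           // hypothetical: the word advance() just found

    AdvanceSketch(String... contents) {
        for (String c : contents) files.add(new BufferedReader(new StringReader(c)));
    }

    /** Simplified nextFile(): closes the current reader and takes the next queued one. */
    private boolean nextFile() throws IOException {
        if (reader != null) reader.close();
        reader = files.poll();
        return reader != null;
    }

    /** Finds the next word in the current line, then in later lines, then in later files. */
    void advance() throws IOException {
        while (true) {
            if (words != null && words.find()) { word = words.group(); return; }
            words = null;                                   // line exhausted (or none yet)
            String line = (reader == null) ? null : reader.readLine();
            if (line != null) {
                words = WORD_PAT.matcher(line);
            } else if (!nextFile()) {
                more = false;
                word = null;
                return;
            }
        }
    }
}
```

Empty lines and empty files are skipped naturally: their matcher finds nothing, so the loop simply reads on.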

stripXML

String stripXML(String line)
Try to strip XML and HTML elements from a line
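One plausible implementation is sketched below. It assumes that xmlPat matches a '<' through the next '>'. Replacing each tag with a space, rather than the empty string, keeps words on either side of a tag from fusing, as in Hello<br/>world. Because the method works one line at a time, this sketch does not match a tag that spans a line break.

```java
import java.util.regex.Pattern;

class StripXmlSketch {
    // Assumed pattern: anything from '<' to the next '>' on the same line.
    static final Pattern XML_PAT = Pattern.compile("<[^>]*>");

    /** Replaces each tag with a space so adjacent words stay separate. */
    static String stripXML(String line) {
        return XML_PAT.matcher(line).replaceAll(" ");
    }
}
```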


hasNext

public boolean hasNext()
Check if there's another word to get

Specified by:
hasNext in interface Iterator

next

public Object next()
Get the next word in the sequence

Specified by:
next in interface Iterator
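The hasNext()/next() pair reads naturally as a one-item lookahead, which the more field suggests, though this is an assumption. In the minimal sketch below, a hypothetical Iterator<String> source stands in for the file scanning. The next word is fetched ahead of time, so hasNext() is a side-effect-free flag check. next() returns the buffered word and then fetches the following one.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

class LookaheadSketch implements Iterator<String> {
    private final Iterator<String> source;  // hypothetical stand-in for the file/line scanning
    private boolean more;                   // true while 'word' holds an unreturned value
    private String word;

    LookaheadSketch(Iterator<String> source) {
        this.source = source;
        advance();                          // prime the lookahead, as a constructor would
    }

    private void advance() {
        more = source.hasNext();
        word = more ? source.next() : null;
    }

    @Override public boolean hasNext() { return more; }

    @Override public String next() {
        if (!more) throw new NoSuchElementException();
        String result = word;
        advance();                          // fetch the following word before returning
        return result;
    }
}
```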

remove

public void remove()
Not implemented

Specified by:
remove in interface Iterator
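The documentation only says that remove() is not implemented. The conventional Java way to signal this is to throw UnsupportedOperationException, as sketched below on a hypothetical one-element iterator. Since Java 8, Iterator.remove() has a default implementation that throws this exception, so a modern class could simply omit the method.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

class ReadOnlyIteratorSketch implements Iterator<String> {
    private boolean done;

    @Override public boolean hasNext() { return !done; }

    @Override public String next() {
        if (done) throw new NoSuchElementException();
        done = true;
        return "only";
    }

    // Explicit form; omitting remove() inherits Java 8's default, which throws the same exception.
    @Override public void remove() {
        throw new UnsupportedOperationException("remove");
    }
}
```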