org.cdlib.xtf.textIndexer
Class PDFToString

java.lang.Object
  extended by org.cdlib.xtf.textIndexer.PDFToString

public class PDFToString
extends Object

This class provides a single static convert() method that converts the text of a PDF file into an XML string. The XMLTextProcessor class can then pre-filter that string and add it to a Lucene index.

Internally, the text of the PDF file is extracted using the PDFBox library.


Field Summary
(package private) static boolean mustConfigureLogger
           
(package private) static PDFTextStripper stripper
          PDFBox text stripper.
 
Constructor Summary
PDFToString()
           
 
Method Summary
(package private) static String convert(InputStream PDFInputStream)
          Convert a PDF file into an XML string.
 
Methods inherited from class Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

mustConfigureLogger

static boolean mustConfigureLogger

stripper

static PDFTextStripper stripper
PDFBox text stripper. Created once to save time.
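
The "created once" note above describes a shared static instance that is reused across conversions. A minimal sketch of that pattern follows. It assumes lazy creation on first use, which the documentation does not confirm, and StandInStripper and getStripper() are hypothetical names standing in for the PDFBox PDFTextStripper so the example runs without PDFBox on the classpath.

```java
// Sketch only: illustrates a "create once, reuse" static field.
// StandInStripper is a hypothetical placeholder for PDFTextStripper.
class SharedStripperSketch {

    static final class StandInStripper {
        // Counts constructions so the reuse is observable.
        static int instancesCreated = 0;

        StandInStripper() {
            instancesCreated++;
        }

        // Placeholder for text extraction: just trims the input.
        String strip(String raw) {
            return raw.trim();
        }
    }

    // Package-private static field, mirroring the documented declaration.
    static StandInStripper stripper;

    // Assumption: the instance is created lazily on first use.
    // synchronized keeps the check-and-create step safe across threads.
    static synchronized StandInStripper getStripper() {
        if (stripper == null) {
            stripper = new StandInStripper();
        }
        return stripper;
    }
}
```

Every call to getStripper() after the first returns the same object, so construction cost is paid once per JVM rather than once per document.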

Constructor Detail

PDFToString

public PDFToString()

Method Detail

convert

static String convert(InputStream PDFInputStream)
               throws IOException
Convert a PDF file into an XML string.

Parameters:
PDFInputStream - The stream of PDF data to convert to an XML string.
Returns:
A string containing the XML equivalent of the source PDF file if the conversion succeeds, or null if an error occurs.
Throws:
IOException
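
Because convert() is documented both to return null on error and to throw IOException, a caller has to handle both outcomes. The sketch below shows one way to do that. It is not XTF code: convertStandIn() is a hypothetical stand-in with the same signature shape (the real method is package-private and depends on PDFBox), and its "%PDF" header check and "<doc/>" result are invented purely to make the example runnable.

```java
import java.io.IOException;
import java.io.InputStream;

// Sketch only: shows a caller handling both failure modes in the
// documented contract (null return and IOException).
class ConvertCallerSketch {

    // Hypothetical stand-in for PDFToString.convert(InputStream).
    // Returns null when the data does not start with "%PDF";
    // propagates IOException from the underlying stream.
    static String convertStandIn(InputStream in) throws IOException {
        byte[] header = new byte[4];
        int n = in.read(header);
        boolean looksLikePdf = n == 4
                && header[0] == '%' && header[1] == 'P'
                && header[2] == 'D' && header[3] == 'F';
        return looksLikePdf ? "<doc/>" : null;
    }

    // Collapses both failure modes into a single fallback value.
    static String convertOrFallback(InputStream in, String fallback) {
        try {
            String xml = convertStandIn(in);
            return xml != null ? xml : fallback;  // null signals a conversion error
        } catch (IOException e) {
            return fallback;                      // stream-level failure
        }
    }
}
```

A real caller would likely log the two cases differently, since a null result points at the document while an IOException points at the stream.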