public class ContextMarker
extends Object
Created: Dec 26, 2004
Modifier and Type | Field and Description |
---|---|
private MarkCollector | collector - Client instance which receives the resulting marks |
private String | field - Field name (for debugging) |
private WordIter | iter0 - Iterator used for locating the start of the hit/context |
private WordIter | iter1 - Iterator used for locating the end of the hit/context |
static int | MARK_ALL_TERMS - See MARK_NO_TERMS |
static int | MARK_CONTEXT_TERMS - See MARK_NO_TERMS |
static int | MARK_NO_TERMS - The following modes can be used for term marking: MARK_NO_TERMS: Terms are not marked. MARK_SPAN_TERMS: Search terms are marked only within span hits. |
static int | MARK_SPAN_TERMS - See MARK_NO_TERMS |
private int | maxContext - Target size (in chars) of the context surrounding each hit |
private int | prevEndWord - End of the previous context |
private Set | stopSet - Set of stop-words to avoid marking outside of hits |
private int | termMode - Whether to mark terms inside/outside hits, context, etc. |
private Set | terms - Set of search terms to mark |
private int | termsMarkedPos - Word position up to which we've marked all terms |
private MarkPos | tmpPos - Used for temporary position storage |
Constructor and Description |
---|
ContextMarker(int maxContext, int termMode, Set terms, Set stopSet, WordIter wordIter, MarkCollector collector, String field) - Construct a new marker |
Modifier and Type | Method and Description |
---|---|
(package private) void | emitMarks(Span posSpan, MarkPos contextStart, MarkPos contextEnd) - Emit all the marks for the given hit. |
(package private) void | findContext(Span posSpan, Span nextSpan, MarkPos contextStart, MarkPos contextEnd) - Locate the start and end of context for the given hit. |
void | mark(Span[] posOrderSpans, int maxContext) - Mark a series of spans. |
static void | markField(FieldSpans fieldSpans, String field, WordIter iter, int maxContext, int termMode, Set stopSet, MarkCollector collector) - Mark context, spans, and terms within a field of data. |
void | markField(String field, FieldSpans fieldSpans, MarkCollector collector) - Mark context, spans, and terms within the given field of this document. |
void | markField(String field, FieldSpans fieldSpans, WordIter iter, int maxContext, int termMode, Set stopSet, MarkCollector collector) - Mark context, spans, and terms within the given field of this document. |
private void | markTerms(WordIter iter, int fromPos, int toPos, boolean markStopWords) - Mark terms up to (but not including) 'wordPos' |
public static final int MARK_NO_TERMS
MARK_NO_TERMS: Terms are not marked
MARK_SPAN_TERMS: Search terms are marked only within span hits.
MARK_CONTEXT_TERMS: Search terms are marked within span hits and, if found, within the context surrounding those hits.
MARK_ALL_TERMS: Search terms are marked wherever they are found.
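The four modes above can be read as a decision rule applied to each word. The following is only a minimal sketch of that rule, not the class's actual implementation: the class `TermModeSketch`, the method `shouldMark`, and the numeric constant values are hypothetical, since this page does not state them. The stop-word check reflects the stopSet description (stop words are not marked outside hits).

```java
import java.util.Set;

/** Hypothetical stand-in used for illustration; not the real ContextMarker. */
final class TermModeSketch {
    // Assumed values: this page does not give the real constants.
    static final int MARK_NO_TERMS = 0;
    static final int MARK_SPAN_TERMS = 1;
    static final int MARK_CONTEXT_TERMS = 2;
    static final int MARK_ALL_TERMS = 3;

    /**
     * Decide whether a word should be marked as a search term.
     *
     * @param inSpan    true if the word lies inside a span hit
     * @param inContext true if the word lies in the context around a hit
     */
    static boolean shouldMark(int termMode, Set<String> terms, Set<String> stopSet,
                              String word, boolean inSpan, boolean inContext) {
        if (!terms.contains(word)) {
            return false; // not a search term at all
        }
        if (!inSpan && stopSet.contains(word)) {
            return false; // stop words are skipped outside hits
        }
        switch (termMode) {
            case MARK_SPAN_TERMS:    return inSpan;
            case MARK_CONTEXT_TERMS: return inSpan || inContext;
            case MARK_ALL_TERMS:     return true;
            default:                 return false; // MARK_NO_TERMS
        }
    }

    public static void main(String[] args) {
        Set<String> terms = Set.of("whale");
        Set<String> stops = Set.of("the");
        // A term found in the context around a hit, but outside the hit itself:
        System.out.println(shouldMark(MARK_SPAN_TERMS, terms, stops, "whale", false, true));    // prints false
        System.out.println(shouldMark(MARK_CONTEXT_TERMS, terms, stops, "whale", false, true)); // prints true
    }
}
```

Each mode is a superset of the previous one, which is why the sketch can express the choice as a single switch over the mode.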
public static final int MARK_SPAN_TERMS
See MARK_NO_TERMS
public static final int MARK_CONTEXT_TERMS
See MARK_NO_TERMS
public static final int MARK_ALL_TERMS
See MARK_NO_TERMS
private int maxContext
private WordIter iter0
private WordIter iter1
private MarkCollector collector
private Set terms
private Set stopSet
private int termMode
See MARK_SPAN_TERMS, etc.
private int termsMarkedPos
private MarkPos tmpPos
private int prevEndWord
private String field
public ContextMarker(int maxContext, int termMode, Set terms, Set stopSet, WordIter wordIter, MarkCollector collector, String field)
public void markField(String field, FieldSpans fieldSpans, MarkCollector collector)
field - field name to mark
fieldSpans - spans to mark with
collector - collector to receive the marks
public void markField(String field, FieldSpans fieldSpans, WordIter iter, int maxContext, int termMode, Set stopSet, MarkCollector collector)
field - field name to mark
iter - iterator over the words in the field
maxContext - target number of characters for context around each hit (including the text of the hit itself). 80 is often a good choice. Specify zero to turn off context marking.
termMode - where to mark terms; see MARK_NO_TERMS.
stopSet - set of stop words to avoid marking outside hits
collector - collector to receive the marks
public static void markField(FieldSpans fieldSpans, String field, WordIter iter, int maxContext, int termMode, Set stopSet, MarkCollector collector)
field - field name to mark
iter - iterator over the words in the field
maxContext - target number of characters for context around each hit (including the text of the hit itself). 80 is often a good choice. Specify zero to turn off context marking.
termMode - where to mark terms; see MARK_NO_TERMS.
stopSet - set of stop words to avoid marking outside hits
collector - collector to receive the marks
public void mark(Span[] posOrderSpans, int maxContext)
posOrderSpans - spans to mark, in ascending position order
maxContext - target number of characters for context around hits (0 for none)
void findContext(Span posSpan, Span nextSpan, MarkPos contextStart, MarkPos contextEnd)
posSpan - hit for which to find context
nextSpan - following hit (or null if none)
contextStart - OUT: start of context
contextEnd - OUT: end of context
void emitMarks(Span posSpan, MarkPos contextStart, MarkPos contextEnd)
posSpan - hit for which to emit marks
contextStart - start of context (or null if context disabled)
contextEnd - end of context (or null if context disabled)
private void markTerms(WordIter iter, int fromPos, int toPos, boolean markStopWords)
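To make the maxContext parameter of mark and findContext more concrete, here is a rough sketch of one way a context window could be grown around a hit until a target size in characters is reached. It is an assumption-laden illustration, not the algorithm this class uses: the class `ContextWindowSketch`, its methods, the use of a plain word array instead of WordIter, and the alternating left/right expansion are all hypothetical. The `minStart` and `maxEnd` bounds stand in for the roles that prevEndWord and nextSpan appear to play, namely keeping one hit's context from running into its neighbours.

```java
/** Hypothetical stand-in used for illustration; not the real ContextMarker. */
final class ContextWindowSketch {

    /**
     * Grow a window of words around the hit [hitStart, hitEnd) until adding
     * another word would exceed maxContext characters. The size counts the
     * hit text itself plus one space between adjacent words.
     *
     * @param minStart lowest word index the window may reach (for example,
     *                 the end of the previous context)
     * @param maxEnd   highest exclusive word index the window may reach (for
     *                 example, the start of the following hit)
     * @return {startWord, endWord}, with endWord exclusive
     */
    static int[] findContext(String[] words, int hitStart, int hitEnd,
                             int minStart, int maxEnd, int maxContext) {
        int start = hitStart;
        int end = hitEnd;
        int size = length(words, start, end);
        boolean grew = true;
        while (grew) {
            grew = false;
            // Try to take one more word on the left.
            if (start > minStart && size + words[start - 1].length() + 1 <= maxContext) {
                start--;
                size += words[start].length() + 1;
                grew = true;
            }
            // Try to take one more word on the right.
            if (end < maxEnd && size + words[end].length() + 1 <= maxContext) {
                size += words[end].length() + 1;
                end++;
                grew = true;
            }
        }
        return new int[] { start, end };
    }

    /** Characters in words[from..to), joined by single spaces. */
    static int length(String[] words, int from, int to) {
        int n = 0;
        for (int i = from; i < to; i++) {
            n += words[i].length();
        }
        return n + Math.max(0, to - from - 1);
    }

    public static void main(String[] args) {
        String[] words = "the quick brown fox jumps over the lazy dog".split(" ");
        // The hit is the single word "fox" at index 3; target 15 characters.
        int[] ctx = findContext(words, 3, 4, 0, words.length, 15);
        System.out.println(String.join(" ", java.util.Arrays.copyOfRange(words, ctx[0], ctx[1])));
        // prints "brown fox jumps"
    }
}
```

In this sketch a maxContext of zero simply leaves the window equal to the hit, whereas the documented methods treat zero as turning context marking off.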