Object
  Query
    MoreLikeThisQuery
public class MoreLikeThisQuery extends Query
Processes the sub-query and uses the first document as the "target". Then we determine the most "interesting" terms in the target document, and finally perform a query on those terms to find more like the target. The target document itself will NOT be included in the results.
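The steps described above can be sketched without Lucene. The class below is a hypothetical illustration, not this class's implementation: documents are plain token lists, the scoring formula (tf * idf with a natural logarithm) is an assumption, and the names MoreLikeThisSketch and moreLike are invented for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

/** Hypothetical, Lucene-free sketch of the "more like this" idea. */
final class MoreLikeThisSketch {

    /** Returns indexes of documents similar to docs.get(target), best first, never the target. */
    static List<Integer> moreLike(List<List<String>> docs, int target, int maxQueryTerms) {
        // Document frequency: in how many documents does each term occur?
        Map<String, Integer> docFreq = new HashMap<>();
        for (List<String> doc : docs) {
            for (String term : new HashSet<>(doc)) {
                docFreq.merge(term, 1, Integer::sum);
            }
        }

        // Score the target's terms with tf * idf (an assumed scoring formula).
        Map<String, Double> score = new HashMap<>();
        for (String term : docs.get(target)) {
            double idf = Math.log((double) docs.size() / docFreq.get(term)) + 1.0;
            score.merge(term, idf, Double::sum); // adding idf once per occurrence gives tf * idf
        }

        // Keep only the maxQueryTerms most "interesting" terms (ties broken alphabetically).
        List<String> terms = new ArrayList<>(score.keySet());
        terms.sort((a, b) -> {
            int c = Double.compare(score.get(b), score.get(a));
            return c != 0 ? c : a.compareTo(b);
        });
        List<String> queryTerms = terms.subList(0, Math.min(maxQueryTerms, terms.size()));

        // Rank every other document by the summed scores of the query terms it contains.
        Map<Integer, Double> hits = new HashMap<>();
        for (int i = 0; i < docs.size(); i++) {
            if (i == target) continue; // the target itself is excluded from the results
            double total = 0.0;
            for (String term : queryTerms) {
                if (docs.get(i).contains(term)) total += score.get(term);
            }
            if (total > 0.0) hits.put(i, total);
        }
        List<Integer> result = new ArrayList<>(hits.keySet());
        result.sort((a, b) -> {
            int c = Double.compare(hits.get(b), hits.get(a));
            return c != 0 ? c : Integer.compare(a, b);
        });
        return result;
    }
}
```

Calling moreLike(docs, 0, 3) treats document 0 as the target, selects its three highest-scoring terms, and returns the other documents that share them.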
Nested Class Summary | |
---|---|
private static class |
MoreLikeThisQuery.Flt
Used for scores and to avoid renewing Floats. |
private static class |
MoreLikeThisQuery.Int
Used for frequencies and to avoid renewing Integers. |
class |
MoreLikeThisQuery.MoreLikeWrapper
Exclude the target document from the set. |
private static class |
MoreLikeThisQuery.QueryWord
|
private static class |
MoreLikeThisQuery.QueryWordQueue
PriorityQueue that orders query words by score. |
Field Summary | |
---|---|
private CharMap |
accentMap
|
private boolean |
boost
Should we apply a boost to the Query based on the scores? |
private Map |
boostMap
Boost values for the fields |
private float[] |
fieldBoosts
Boost value per field. |
private String[] |
fieldNames
Field name(s) we'll analyze. |
private int |
maxDocFreq
Ignore words which occur in at least this many docs. |
private int |
maxNumTokensParsed
The maximum number of tokens to parse in each example doc field that is not stored with TermVector support |
private int |
maxQueryTerms
Don't return a query longer than this. |
private int |
maxWordLen
Ignore words longer than this length. |
private int |
minDocFreq
Ignore words which do not occur in at least this many docs. |
private int |
minTermFreq
Ignore words less frequent than this. |
private int |
minWordLen
Ignore words shorter than this length. |
private WordMap |
pluralMap
|
private Similarity |
similarity
For idf() calculations. |
private Set |
stopSet
|
private Query |
subQuery
|
private int |
targetDoc
|
Constructor Summary | |
---|---|
MoreLikeThisQuery(Query subQuery)
Constructs a more-like-this query whose target is the first document matched by subQuery. |
Method Summary | |
---|---|
private void |
addTermFrequencies(TokenStream tokens,
String field,
Map termFreqMap)
Adds term frequencies found by reading the tokens stream into termFreqMap. |
private Map |
condenseTerms(IndexReader indexReader,
Map words)
Condense the same term in multiple fields into a single term with a total score. |
private Query |
createQuery(IndexReader indexReader,
PriorityQueue q)
Create the More like query from a PriorityQueue |
private PriorityQueue |
createQueue(IndexReader indexReader,
Map words)
Create a PriorityQueue from a word->tf map. |
float[] |
getFieldBoosts()
|
String[] |
getFieldNames()
|
Query |
getSubQuery()
Retrieve the sub-query |
protected boolean |
isNoiseWord(String term)
Determines if the passed term is likely to be of interest in "more like" comparisons |
private PriorityQueue |
retrieveTerms(IndexReader indexReader,
int docNum,
Analyzer analyzer)
Find words for a more-like-this query former. |
Query |
rewrite(IndexReader reader)
Generate a query that will produce "more documents like" the first in the sub-query. |
void |
setAccentMap(CharMap map)
Establish the accent map in use |
void |
setBoost(boolean boost)
Should we apply a boost to the Query based on the scores? |
void |
setFieldBoosts(float[] fieldBoosts)
Boost value per field |
void |
setFieldNames(String[] fieldNames)
Field name(s) we'll analyze. |
void |
setMaxDocFreq(int maxDocFreq)
Ignore words which occur in at least this many docs. |
void |
setMaxNumTokensParsed(int maxNumTokensParsed)
The maximum number of tokens to parse in each example doc field that is not stored with TermVector support |
void |
setMaxQueryTerms(int maxQueryTerms)
Don't return a query longer than this. |
void |
setMaxWordLen(int maxWordLen)
Ignore words longer than this length. |
void |
setMinDocFreq(int minDocFreq)
Ignore words which do not occur in at least this many docs. |
void |
setMinTermFreq(int minTermFreq)
Ignore words less frequent than this. |
void |
setMinWordLen(int minWordLen)
Ignore words shorter than this length. |
void |
setPluralMap(WordMap map)
Establish the plural map in use |
void |
setStopWords(Set set)
Establish the set of stop words to ignore |
void |
setSubQuery(Query subQuery)
Set the sub-query |
String |
toString(String field)
Prints a user-readable version of this query. |
Methods inherited from class Query |
---|
clone, combine, createWeight, extractTerms, getBoost, getSimilarity, mergeBooleanQueries, setBoost, toString, weight |
Methods inherited from class Object |
---|
equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait |
Field Detail |
---|
private Query subQuery
private int targetDoc
private Set stopSet
private WordMap pluralMap
private CharMap accentMap
private int minTermFreq
private int minDocFreq
private int maxDocFreq
private boolean boost
private String[] fieldNames
private float[] fieldBoosts
private Map boostMap
private int maxNumTokensParsed
private int minWordLen
private int maxWordLen
private int maxQueryTerms
private Similarity similarity
Constructor Detail |
---|
public MoreLikeThisQuery(Query subQuery)
Constructs a more-like-this query. The first document matched by subQuery serves as the target.
Parameters:
subQuery - the sub-query whose first matching document is the target
Method Detail |
---|
public Query getSubQuery()
public void setSubQuery(Query subQuery)
public void setStopWords(Set set)
public void setPluralMap(WordMap map)
public void setAccentMap(CharMap map)
public void setMaxDocFreq(int maxDocFreq)
public void setFieldNames(String[] fieldNames)
public String[] getFieldNames()
public void setFieldBoosts(float[] fieldBoosts)
public float[] getFieldBoosts()
public void setMaxNumTokensParsed(int maxNumTokensParsed)
public void setMaxQueryTerms(int maxQueryTerms)
public void setMaxWordLen(int maxWordLen)
public void setMinDocFreq(int minDocFreq)
public void setMinTermFreq(int minTermFreq)
public void setMinWordLen(int minWordLen)
public void setBoost(boolean boost)
public Query rewrite(IndexReader reader) throws IOException
Overrides:
rewrite in class Query
Throws:
IOException
private Query createQuery(IndexReader indexReader, PriorityQueue q) throws IOException
Throws:
IOException
private PriorityQueue createQueue(IndexReader indexReader, Map words) throws IOException
Parameters:
words - a map of words keyed on the word (String) with Int objects as the values.
Throws:
IOException
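As a rough illustration of turning a word-to-frequency map into a score-ordered queue, the sketch below uses java.util.PriorityQueue. It is an assumption-laden stand-in: the ScoredWord class, the explicit docFreq map (in place of an IndexReader lookup), and the idf formula log(numDocs / (docFreq + 1)) + 1 are all hypothetical choices for the example, and only the minTermFreq and minDocFreq thresholds are applied.

```java
import java.util.Map;
import java.util.PriorityQueue;

/** Hypothetical sketch: build a highest-score-first queue from a word -> tf map. */
final class QueueSketch {

    /** A word paired with its score (invented shape, for illustration only). */
    static final class ScoredWord {
        final String word;
        final float score;

        ScoredWord(String word, float score) {
            this.word = word;
            this.score = score;
        }
    }

    static PriorityQueue<ScoredWord> createQueue(Map<String, Integer> termFreq,
                                                 Map<String, Integer> docFreq,
                                                 int numDocs,
                                                 int minTermFreq,
                                                 int minDocFreq) {
        // Comparator is reversed so that poll() returns the highest score first.
        PriorityQueue<ScoredWord> queue =
            new PriorityQueue<>((a, b) -> Float.compare(b.score, a.score));

        for (Map.Entry<String, Integer> e : termFreq.entrySet()) {
            int tf = e.getValue();
            if (tf < minTermFreq) continue; // too rare in the target document

            int df = docFreq.getOrDefault(e.getKey(), 0);
            if (df < minDocFreq) continue;  // occurs in too few documents

            // Assumed idf formula; the real Similarity in use may differ.
            float idf = (float) (Math.log(numDocs / (double) (df + 1)) + 1.0);
            queue.add(new ScoredWord(e.getKey(), tf * idf));
        }
        return queue;
    }
}
```

Polling the queue repeatedly yields words from most to least interesting, which makes it straightforward to stop after maxQueryTerms entries.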
private Map condenseTerms(IndexReader indexReader, Map words) throws IOException
Parameters:
words - a map of words keyed on the word (String) with Int objects as the values.
Throws:
IOException
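One way to picture condensing the same term across several fields is shown below. This is only a sketch under stated assumptions: the input shape (field name mapped to per-term scores), the use of a boost map keyed by field name, and the rule "multiply by the field boost, then sum" are guesses made for illustration and are not taken from this class's source.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: merge per-field term scores into one total score per term. */
final class CondenseSketch {

    static Map<String, Float> condense(Map<String, Map<String, Float>> scoresByField,
                                       Map<String, Float> fieldBoosts) {
        Map<String, Float> totals = new TreeMap<>();
        for (Map.Entry<String, Map<String, Float>> field : scoresByField.entrySet()) {
            // Fields without an explicit boost are assumed to have a boost of 1.
            float boost = fieldBoosts.getOrDefault(field.getKey(), 1.0f);
            for (Map.Entry<String, Float> term : field.getValue().entrySet()) {
                // The same term seen in another field adds to the running total.
                totals.merge(term.getKey(), term.getValue() * boost, Float::sum);
            }
        }
        return totals;
    }
}
```

With this shape, a term that appears in both a boosted title field and an unboosted text field ends up as a single entry whose score reflects both occurrences.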
private PriorityQueue retrieveTerms(IndexReader indexReader, int docNum, Analyzer analyzer) throws IOException
Parameters:
docNum - the id of the Lucene document from which to find terms
Throws:
IOException
private void addTermFrequencies(TokenStream tokens, String field, Map termFreqMap) throws IOException
Parameters:
tokens - a source of tokens
field - specifies the field being tokenized
termFreqMap - a Map of terms and their frequencies
Throws:
IOException
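The counting step can be sketched as follows. An Iterable of strings stands in for the TokenStream, and the small mutable Int holder is a guess at why a dedicated counter class exists (incrementing in place avoids allocating a new Integer per occurrence). The cutoff mirrors the documented purpose of maxNumTokensParsed; the exact boundary behavior is an assumption.

```java
import java.util.Map;

/** Hypothetical sketch: count token frequencies, stopping after a token limit. */
final class TermFreqSketch {

    /** Mutable counter, so repeated words update in place instead of re-boxing. */
    static final class Int {
        int x = 1;
    }

    static void addTermFrequencies(Iterable<String> tokens,
                                   Map<String, Int> termFreqMap,
                                   int maxNumTokensParsed) {
        int tokenCount = 0;
        for (String word : tokens) {
            tokenCount++;
            if (tokenCount > maxNumTokensParsed) {
                break; // ignore everything past the limit
            }
            Int count = termFreqMap.get(word);
            if (count == null) {
                termFreqMap.put(word, new Int()); // first occurrence starts at 1
            } else {
                count.x++;
            }
        }
    }
}
```

Because the map is passed in, the same map can accumulate counts across several calls, one per field or stream.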
protected boolean isNoiseWord(String term)
Parameters:
term - the word being considered
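A noise-word check built from the length limits and the stop set might look like the sketch below. The parameters are passed explicitly so the example is self-contained; treating a limit of 0 as "no limit" and a null stop set as "no stop words" are assumptions, not documented behavior of this class.

```java
import java.util.Set;

/** Hypothetical sketch: decide whether a term should be ignored as noise. */
final class NoiseWordSketch {

    /** Returns true when the term should be ignored. */
    static boolean isNoiseWord(String term, int minWordLen, int maxWordLen, Set<String> stopSet) {
        int len = term.length();
        if (minWordLen > 0 && len < minWordLen) {
            return true; // too short
        }
        if (maxWordLen > 0 && len > maxWordLen) {
            return true; // too long
        }
        // Anything listed as a stop word is also noise.
        return stopSet != null && stopSet.contains(term);
    }
}
```

Since the method is protected in this class, a subclass could override it to apply a different notion of noise, for example one that also rejects purely numeric terms.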
public String toString(String field)
Overrides:
toString in class Query