
Hit Scoring

Table of Contents

Hit Scoring
Individual Hit Scoring
Text Hit Scoring
Meta-data Hit Scoring
Combined Document Score

XTF uses and extends Lucene's built-in scoring mechanism to provide a relevance score for each hit, and to return hits in ranked order (i.e., highest score first). This section briefly describes how the Text Engine determines the score for hits in the full text and for hits on meta-data fields.

You can observe the scoring engine in action by enabling the explainScores attribute on the <query> element produced by your Query Parser stylesheet. See the Tag Reference for more information on how to enable this.

The following sections break down XTF's scoring calculation in three steps: first the aspects shared by meta-data and text chunk scoring, then the differences between the two, and finally how the combined score for a document is computed.

Individual Hit Scoring

Whether a hit (another name for a single match) is in a meta-data field or within the full text of a document, the scoring for that particular hit is the same. How the scores are combined differs, and those differences are covered in later sections.

For those intimately familiar with Lucene, it will be helpful to know that XTF makes extensive use of Lucene's "span" queries, to enable the exact identification of particular matches within a large document. XTF's implementation of spans includes enhancements that calculate the score of each span in addition to its "slop".
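
To make the notion of "slop" concrete, here is a minimal Python sketch. It assumes stock Lucene conventions: slop is the number of unmatched positions inside a span of single-term matches, and the default proximity factor (Lucene's sloppyFreq) is 1 / (distance + 1). The function names are hypothetical, and XTF's enhanced span scoring may differ in detail.

```python
def span_slop(positions):
    """Return the number of positions inside a span that no matched term occupies.

    `positions` holds the distinct token position of each matched term,
    assuming every matched term is exactly one token long.
    """
    if not positions:
        raise ValueError("a span needs at least one matched term")
    span_length = max(positions) - min(positions) + 1
    return span_length - len(positions)


def sloppy_freq(distance):
    """Proximity factor in the style of Lucene's default: tighter spans score higher."""
    return 1.0 / (distance + 1)
```

For example, terms matched at positions 3 and 6 leave two unmatched positions between them, so the span has a slop of 2 and, under these assumptions, a proximity factor of 1/3. An exact phrase match has a slop of 0 and a factor of 1.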

Queries on the contents of an XTF index are scored using an enhanced version of Lucene's standard formula. The structure of the scoring formula is fixed, but one can override the calculation of the various factors by providing a Java implementation of the Similarity interface.
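
The idea of a fixed formula with replaceable factors can be sketched in Python as follows. This is an illustration only, not XTF's Java code: the factor definitions are those of classic Lucene's default similarity (tf, idf, lengthNorm, coord), the per-term product follows Lucene's standard formula with boosts and query normalization left out, and the class and function names are hypothetical. XTF's enhanced formula may weight or combine the factors differently.

```python
import math


class DefaultFactors:
    """Factor definitions modeled on classic Lucene's default similarity."""

    def tf(self, freq):
        # Term frequency: grows with the square root of the match count.
        return math.sqrt(freq)

    def idf(self, doc_freq, num_docs):
        # Inverse document frequency: rarer terms weigh more.
        return math.log(num_docs / (doc_freq + 1)) + 1.0

    def length_norm(self, num_terms):
        # Shorter fields weigh more than longer ones.
        return 1.0 / math.sqrt(num_terms)

    def coord(self, overlap, max_overlap):
        # Fraction of the query's terms that were actually matched.
        return overlap / max_overlap


class BinaryTfFactors(DefaultFactors):
    """Overrides a single factor; the formula structure below is unchanged."""

    def tf(self, freq):
        return 1.0 if freq > 0 else 0.0


def term_score(factors, freq, doc_freq, num_docs, num_terms):
    # Fixed structure (classic Lucene, simplified): tf * idf^2 * lengthNorm.
    idf = factors.idf(doc_freq, num_docs)
    return factors.tf(freq) * idf * idf * factors.length_norm(num_terms)
```

Swapping DefaultFactors for BinaryTfFactors changes how term frequency is weighted without touching term_score, which mirrors how a custom Similarity implementation changes the factors while the overall formula stays fixed.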

Plain English

Mathematical Details
For a given query q, the score for a matching span s consisting of terms t, in field (or text chunk) f of document d, is calculated as follows:
where

Text Hit Scoring

The full text of a document might contain thousands of individual matching spans, each of which will be scored according to the method above. How are these scores combined into a single score for the text?

Plain English

Mathematical Details
For a given query q, the score for all matching spans s in all text chunks of document d is calculated as follows:
where

Meta-data Hit Scoring

The scores for multiple hits within a single meta-data field are combined in a manner similar to that used for text hits, described above.

Plain English

Mathematical Details
For a given query q, the score for all matching spans s in field f of document d is calculated as follows:
where

Combined Document Score

The final type of scoring XTF performs is to combine the scores of all text hits within a document with that document's meta-data scores, to form the final score for that document. Again, the structure of this computation is fixed, but the calculations can be overridden by providing a Java implementation of the Similarity interface.

Plain English

Mathematical Details
For a given query q consisting of meta-data queries qmf on fields f, and a text query qt, the score for a specific document d is as follows:
where metaScore and textScore are computed as outlined in the previous two sections.