Result Formatter Programming
The last stage in the crossQuery data flow is formatting the results. Recall that the URL parameters were parsed into an XTF-compatible query by the Query Parser stylesheet, and that the Text Engine then ran that query against the indexed data, producing a list of matching documents. The final task is to put a pretty face on things, and that's where the Result Formatter stylesheet comes in. It transforms the XML list of documents into an easy-to-use HTML result page.
How does XTF know which stylesheet to use? Simple: the
Query Parser tells it. The
<query> tag it outputs specifies a
style attribute, which points at the
Result Formatter stylesheet that you want XTF to run. Thus, it is quite possible -- and often useful -- to have multiple result formatters for different purposes or display modes, and program the
Query Parser to decide which formatter to run based on a URL parameter. But for simplicity we'll assume for now that you only have one formatter.
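The selection logic can be modelled outside XSLT to make the idea concrete. The following is only a sketch in Python, not part of XTF: the real decision would live in the Query Parser stylesheet. The `mode` URL parameter and the `compact` stylesheet path are hypothetical names invented for illustration; only the default path comes from the sample query shown later in this section.

```python
import xml.etree.ElementTree as ET

# Path taken from the sample <query> shown in this document.
DEFAULT_STYLE = "style/crossQuery/resultFormatter/default/resultFormatter.xsl"

# Hypothetical display modes and stylesheet paths, for illustration only.
FORMATTERS = {
    "compact": "style/crossQuery/resultFormatter/compact/resultFormatter.xsl",
}


def build_query(params):
    """Return a <query> element whose style attribute names the formatter."""
    mode = params.get("mode", "")  # "mode" is an assumed URL parameter name
    style = FORMATTERS.get(mode, DEFAULT_STYLE)  # unknown modes fall back
    return ET.Element("query", {"indexPath": "index", "style": style})


print(ET.tostring(build_query({"mode": "compact"}), encoding="unicode"))
```

Falling back to the default path for unknown modes means a mistyped parameter still produces a usable result page.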
To accomplish its work, the
Result Formatter receives three pieces of data:
- First, it receives the same <parameters> block that was passed to the Query Parser. This contains parsed versions of all the URL parameters, in case the Result Formatter wants to act on these as well.
- Next, it also receives a copy of the full <query> element that was produced by the Query Parser.
- Finally, and most importantly, it receives a list of documents that matched the query. Each <docHit> element contains meta-data as well as snippets of matching text from the main body of the document.
It's easy to view the XML that
crossQuery sends to the
Result Formatter. Simply append
;raw=1 to the URL, and the servlet will bypass the formatter completely and display the raw XML directly in your browser. A great way to plan your stylesheet is to run some sample queries and look at the raw XML, then try to envision how you want it to look in HTML.
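If you look at raw output often, a small helper can build the URL for you. This is a minimal sketch in Python, not part of XTF. The host, port, and servlet path in the example are assumptions about a local installation, and the handling of a URL with no query string (using "?" instead of ";") is also an assumption, since the text above only covers appending ;raw=1.

```python
from urllib.request import urlopen


def raw_url(search_url: str) -> str:
    """Return search_url with raw=1 appended."""
    # ";" separates parameters once a query string exists. Starting a query
    # string with "?" is an assumption for URLs that have no parameters yet.
    separator = ";" if "?" in search_url else "?"
    return f"{search_url}{separator}raw=1"


def fetch_raw_xml(search_url: str) -> str:
    """Download the raw crossQuery XML (needs a running XTF instance)."""
    with urlopen(raw_url(search_url)) as response:
        # UTF-8 is assumed here; check the encoding your server reports.
        return response.read().decode("utf-8")


# Hypothetical local installation; adjust host, port, and path to your own.
print(raw_url("http://localhost:8080/xtf/search?text=man+war"))
```

Saving the fetched XML to a file gives you a fixed input to test a stylesheet against while you develop it.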
Here's a real-life sample of
Result Formatter input, coming from a query for the words "man" and "war". Much of the repetitive information has been snipped out so you can get a quick idea of the structure without getting bogged down in details.
<crossQueryResult queryTime="0.32" totalDocs="8" startDoc="1" endDoc="8">
  <parameters>
    <param name="text" value="man war">
      <token value="man" isWord="yes"/>
      <token value="war" isWord="yes"/>
    </param>
    <!-- ...additional URL parameters here... -->
  </parameters>
  <query indexPath="index" termLimit="1000" workLimit="1000000"
         style="style/crossQuery/resultFormatter/default/resultFormatter.xsl"
         startDoc="1" maxDocs="10">
    <and field="text" maxSnippets="3" maxContext="100">
      <term>man</term>
      <term>war</term>
    </and>
  </query>
  <docHit rank="1" path="default:r1/ft2s2004r1/ft2s2004r1.xml" score="100" totalHits="3">
    <meta>
      <title>Asylia: Territorial Inviolability in the Hellenistic World</title>
      <creator>Kent J. Rigsby</creator>
      <!-- ...more meta-data here... -->
    </meta>
    <snippet rank="1" score="100">inspoliatus : [Sall. ] Resp . 1.2.7, in the civil <hit>
        <term>war</term>
        <term>men</term>
      </hit> fled to Pompey "as debtors use a sacred</snippet>
    <snippet rank="2" score="53">he explains, will win the favor of gods and <hit>
        <term>men</term>, and just <term>wars</term>
      </hit> are defensive. The locus classicus is</snippet>
    <snippet rank="3" score="53">the Roman peace, which ended the state of <hit>
        <term>war</term> among <term>men</term>
      </hit>). More generally, legend told of various</snippet>
  </docHit>
  <docHit rank="2" path="default:7d/ft7w10087d/ft7w10087d.xml" score="76" totalHits="6">
    <meta> ...meta-data here... </meta>
    <snippet rank="1" score="100">the mother of shields." Kunu refers to the <hit>
        <term>war</term> shields <term>men</term>
      </hit> used to fashion from lighter bark, some</snippet>
    <!-- ...more snippets here... -->
  </docHit>
  <docHit rank="3" path="default:pf/ft7r29p1pf/ft7r29p1pf.xml" score="76" totalHits="2">
    <!-- ...meta-data and snippets here... -->
  </docHit>
  <!-- ...additional document hits here... -->
</crossQueryResult>
Essentially, each matching document will have a corresponding
<docHit> tag, and these will be sorted in some order, generally by descending score (relevance). Each document hit contains corresponding meta-data within a
<meta> sub-tag. Hits on the full text of the document will have
<snippet> tags, each with its own
<hit> tag inside it.
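To get a feel for this structure before writing any XSLT, it can help to walk the raw XML with a short script. The sketch below uses Python's standard xml.etree.ElementTree module on a trimmed copy of the sample above. It is an inspection aid only, not part of XTF, and the `summarize` function name is made up for this example.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample result above (one document hit, one snippet).
SAMPLE = """<crossQueryResult queryTime="0.32" totalDocs="8" startDoc="1" endDoc="8">
  <docHit rank="1" path="default:r1/ft2s2004r1/ft2s2004r1.xml" score="100" totalHits="3">
    <meta>
      <title>Asylia: Territorial Inviolability in the Hellenistic World</title>
      <creator>Kent J. Rigsby</creator>
    </meta>
    <snippet rank="1" score="100">in the civil <hit><term>war</term>
      <term>men</term></hit> fled to Pompey</snippet>
  </docHit>
</crossQueryResult>"""


def summarize(xml_text):
    """Return (rank, score, title, snippet count) for each <docHit>."""
    root = ET.fromstring(xml_text)
    rows = []
    for doc_hit in root.findall("docHit"):
        title = doc_hit.findtext("meta/title", default="(untitled)")
        snippets = doc_hit.findall("snippet")
        rows.append((int(doc_hit.get("rank")), int(doc_hit.get("score")),
                     title, len(snippets)))
    return rows


for rank, score, title, count in summarize(SAMPLE):
    print(f"{rank}. {title} (score {score}, {count} snippet(s))")
```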
A little more formally, the result formatter receives a
<crossQueryResult> tag that looks like this:
<crossQueryResult queryTime = "TimeInSeconds"
                  totalDocs = "NumberOfDocs"
                  startDoc = "FirstDocNumber"
                  endDoc = "LastDocNumber">
    Parameters
    Query
    DocumentHit
    DocumentHit
    …
</crossQueryResult>
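The four attributes on the top-level tag are enough to build the usual "showing X to Y of Z" line on a result page. Here is a small Python sketch of that calculation, offered as an illustration rather than as XTF code. The wording of the summary and the `result_summary` name are assumptions. A real Result Formatter would read the same attributes from XSLT.

```python
import xml.etree.ElementTree as ET


def result_summary(xml_text):
    """Build a one-line summary from the <crossQueryResult> attributes."""
    root = ET.fromstring(xml_text)
    total = int(root.get("totalDocs"))
    if total == 0:
        return "No documents matched."
    return (f"Documents {root.get('startDoc')}-{root.get('endDoc')} "
            f"of {total} ({root.get('queryTime')} seconds)")


print(result_summary(
    '<crossQueryResult queryTime="0.32" totalDocs="8" startDoc="1" endDoc="8"/>'
))
```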
Note that, depending on the query and the size of the document repository, there might be thousands of matching documents, and thus thousands of
<docHit> tags. Suppose you only want to display the first page of hits, say ten of them. It would be easy to write a
Result Formatter that picked the first ten and ignored the rest, but that would be very inefficient, because the XSLT processor would still have to parse and process
all of the document hits. A much more efficient way to handle paging is to modify the
Query Parser to specify
maxDocs="10" in the
<query> element; then only the first ten document hits will be passed to the
Result Formatter and the user interface will be much more responsive.
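The arithmetic behind paging is simple enough to sketch. The Python below is an illustration only, not XTF code. It assumes that startDoc is 1-based, as the sample query (startDoc="1") suggests, and that a page number arrives as a URL parameter, which is a hypothetical convention for this example. In practice the Query Parser stylesheet would compute the same values.

```python
def page_window(page, docs_per_page=10):
    """Return the startDoc/maxDocs pair for a 1-based page number."""
    if page < 1 or docs_per_page < 1:
        raise ValueError("page and docs_per_page must be at least 1")
    # Assumes startDoc is 1-based, matching startDoc="1" in the sample query.
    return {"startDoc": (page - 1) * docs_per_page + 1,
            "maxDocs": docs_per_page}


def page_count(total_docs, docs_per_page=10):
    """Number of pages needed to show total_docs hits (ceiling division)."""
    return -(-total_docs // docs_per_page)


print(page_window(3))   # window for the third page of ten
print(page_count(8))    # the sample result set fits on one page
```

The totalDocs attribute of the result tells the formatter how many page links to draw, even though only one page of document hits is present in its input.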
Each Document Hit looks like this:
<docHit rank="DocRelevanceRank" path="DocumentLocation" score="DocRelevanceScore">
    <meta>
        Meta-data defined by index Pre-Filter stylesheet
    </meta>
    Snippet
    Snippet
    …
</docHit>
The meta-data is copied directly from the tags in the input document marked by the index Pre-Filter stylesheet using the
xtf:meta="yes" attribute. If the query targets meta-data fields, these may have
<snippet> and/or
<hit> tags embedded within them, marking the exact location of the matching terms.
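Because a meta-data field may contain embedded markup, code that reads a field should not assume it holds plain text only. The Python sketch below shows one way to recover both the plain value and the matched terms. The `<meta>` block is a hypothetical example constructed from the description above, not real XTF output, and the function names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical <meta> block: a title whose match is marked up as described.
META = """<meta>
  <title>A History of <snippet rank="1" score="100"><hit><term>War</term></hit></snippet></title>
  <creator>Example Author</creator>
</meta>"""


def plain_text(field):
    """Concatenate all text inside a field, ignoring embedded tags."""
    return " ".join("".join(field.itertext()).split())


def matched_terms(field):
    """List the text of every <term> found anywhere inside the field."""
    return [term.text for term in field.iter("term")]


meta = ET.fromstring(META)
title = meta.find("title")
print(plain_text(title))
print(matched_terms(title))
```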
If the query targets the "text" field -- that is, the full document text -- then the
<docHit> tag will have one or more
<snippet> tags containing the matching text and some surrounding context:
<snippet rank="MatchRelevanceRank" score="MatchRelevanceScore">
    Hit Text (and context text, if any)
</snippet>
Within each snippet will appear a
<hit> tag with one or more
<term> tags marking the exact matching terms.
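Snippets are mixed content: plain context text is interleaved with the <hit> and <term> tags, so a formatter has to keep the text that precedes and follows each tag. The Python sketch below illustrates the transformation by wrapping each matched term in a <b> element. Bold highlighting is just an assumed presentation choice, and `snippet_to_html` is a made-up name. The real Result Formatter would do the equivalent with XSLT templates.

```python
import xml.etree.ElementTree as ET
from html import escape


def snippet_to_html(snippet):
    """Render a <snippet> element as HTML text with <term>s in bold."""
    def render(node):
        parts = [escape(node.text or "")]          # text before first child
        for child in node:
            inner = render(child)
            if child.tag == "term":
                inner = f"<b>{inner}</b>"          # highlight matched term
            parts.append(inner)
            parts.append(escape(child.tail or ""))  # text after the child
        return "".join(parts)

    # Collapse line breaks and runs of spaces left over from the raw XML.
    return " ".join(render(snippet).split())


snippet = ET.fromstring(
    '<snippet rank="2" score="53">favor of gods and <hit>'
    '<term>men</term>, and just <term>wars</term></hit> are defensive.</snippet>'
)
print(snippet_to_html(snippet))
```

Escaping the text before inserting it into HTML matters here, since document text can contain characters such as < and & that would otherwise break the page.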
The bulk of the Result Formatter's work will be in transforming all these
<docHit>,
<meta>,
<snippet>,
<hit>, and
<term> XML tags into meaningful HTML output. Writing XSLT is beyond the scope of this document, but a good way to learn is to begin modifying the sample
Result Formatter stylesheet. The stylesheet is included with the XTF distribution in the style/crossQuery/resultFormatter directory.
It should also be noted that the various input tags have bells and whistles not mentioned in this short tutorial. For a full specification, please refer to the
Result Formatter Tag Reference.