Version 1.7 Changes
- Change log now contains item numbers from the SourceForge trackers ("Feature Requests" or "Bugs") which can be referenced for more detailed information.
- Added new front end to crossQuery servlet. The new "query router" stylesheet allows the use of multiple query parsers. Those just starting out, or who only need one parser, can use the default queryRouter.xsl without change. [Feature Req 1470967]
- textIndexer now allows "deep" section type indexing. A new attribute "sectionTypeAdd" can be inserted by the prefilter stylesheet. This causes the text in that section to inherit its parent's sectionType and add the specified text. This allows simple processing of hierarchical sections without complex prefilter code. [Feature Req 1491315]
- Many users have expressed confusion over the way document IDs were handled in dynaXML, and observed that much CDL-centric code is present in the default stylesheets. These have been refactored, and document IDs are now simply the path from the data directory to each document, instead of a strict 10-character code. [Feature Req 1499142]
- XTF now allows stylesheets to track data on a per-user-session basis. A simple API is provided to get and set state data. The session identifier is tracked using cookies or, if the user has cookies disabled, through URL rewriting. [Feature Req 1470973]
- Default stylesheets now expose "Book Bag" and "More Like This" functionality. The former is based on the session state API, the latter on the new <moreLike> query operator. These also demonstrate an AJAX style of programming, updating pages on the fly. [Feature Req 1470975]
- New "exact" query operator added. To match, the field must contain exactly the query phrase; no more, no less. [Feature Req 1120263]
- Added new "moreLike" query operator which uses a simple index-based algorithm to locate additional documents that resemble a specified document. This feature is considered experimental and subject to improvement/change. [Feature Req 1470968]
- Made minor changes to the experimental "boost set" facility.
- Fixed bug in phrase query if stop-words appeared at start or end of a meta-data field. [Bug 1470978]
- Fixed bug where apostrophes and other combined words at the start or end of a meta-data field would cause queries not to match. [Bug 1437031]
- Fixed bug causing boost values to have no effect on an <or> query. [Bug 1471061]
- Refactored Lucene integration. The result is more modular, which will help in upgrading to Lucene 1.9 and 2.0. Back-ported selected classes to improve span processing on indexes with millions of records. [Feature Req 1470982]
- Config file parameters are now case insensitive. Also, boolean parameters all uniformly accept "true", "yes", and "1" as synonyms, and "false", "no", and "0" as synonyms. [Feature Req 1471004]
- Added ability to display non-normalized (or raw) scores in crossQuery. [Feature Req 1471009]
- Added optional "score explanation" in crossQuery, to give a very detailed description of how each document's score was computed. [Feature Req 1471015]
- Made several changes and fixes to the experimental 'facet' feature.
- Multiple index prefilters may now be specified for one document by the docSelector stylesheet. The prefilters will be run in a chain. [Feature Req 1471018]
- Added support for parsing MARC21 data files. The indexer will break them into records, convert them to MARCXML format, and pass each converted record to the prefilter(s). Very large files are supported, and the indexer will try to skip bad records and recover. [Feature Req 1471020]
- Fixed null pointer exception in dynaXML when an empty query was specified. [Bug 1471022]
- Servlets now allow ";" to separate URL parameters. This is handier than "&", which requires special escaping in stylesheets. Both are now supported interchangeably. [Feature Req 1471023]
- All references to "ngrams" have been changed to the more specific term "bigrams".
- Improved efficiency of span collection in the Text Engine.
- Vastly reduced memory usage of cached sorting arrays for indexes that contain only meta-data.
- All servlets now pass a "servlet.dir" parameter to stylesheets. This is the home directory of the XTF installation, and can be used by stylesheets to locate data files or for other purposes. [Feature Req 1397346]
- crossQueryResult input to resultFormatter stylesheet now contains the original parsed URL parameters, and the query that resulted from the queryParser stylesheet. Both of these can be quite useful in result formatting. [Feature Req 1471062]
- Queries output from queryParser stylesheet may now optionally contain <resultData> elements. These are ignored by the Text Engine, but passed on to the result formatter stylesheet. They're a handy way for the query parser to pass data directly to the result formatter. [Feature Req 1471063]
- Meta-data fields can now be marked in index prefilter as xtf:store="no", which prevents them from showing up in query results. The field is still indexed, just not stored or displayed. [Feature Req 1471065]
- Similarly, the index prefilter can mark a field with xtf:index="no", causing it to not be indexed (and thus not searchable) but still be stored and displayed. [Feature Req 1471065]
- Improved efficiency of textIndexer's culling phase. In particular, it no longer runs out of memory and crashes on indexes with millions of documents. [Bug 1471067]
- 'indexStats' tool is now much faster, and attempts to provide as much information as possible early in the process. It also no longer crashes on large indexes. [Feature Req 1291547]
- Added new 'indexDump' tool, which can dump selected meta-data fields from all documents in an index. [Feature Req 1471070]
- Fixed bug where indexer would occasionally crash when trying to create a lazy tree file without creating its directory first. [Bug 1471071]
- Fixed bug that caused XML namespace declarations to be dropped from the beginning of lazy tree files. [Bug 1397341]
- textIndexer now tracks and displays the elapsed time of each indexing run. [Feature Req 1471072]
- crossQuery wasn't paying attention to the MIME type specified by the Result Formatter stylesheet's output specification. Now the default (text/html) is used only if none is specified. [Bug 1499137]
- Fixed assertion failure when a <not> clause appeared within a <near> query. [Bug 1489230]
- Fixed a bug in the internal simplification of boolean queries that caused an assertion failure when searching for "the". [Bug 1482066]
- Fixed bug in dynaXML that gave an unenlightening error message if the source file specified by the docReqParser is actually a directory. [Bug 1499148]
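
Two of the behavior changes listed above, the boolean synonyms accepted in config files [Feature Req 1471004] and the interchangeable ";" and "&" URL parameter separators [Feature Req 1471023], can be illustrated with a short sketch. This is not XTF's code (XTF is written in Java); the function names parse_bool and split_params are hypothetical, and the sketch assumes that case insensitivity also applies to boolean values and that parameters are percent-decoded in the usual way.

```python
from urllib.parse import unquote_plus

# Hypothetical helpers for illustration only; XTF's real implementation
# is in Java and may differ in its details.
TRUE_WORDS = {"true", "yes", "1"}
FALSE_WORDS = {"false", "no", "0"}


def parse_bool(value):
    """Interpret a config value using the synonyms listed in the change log."""
    # Assumption: values are compared without regard to case.
    word = value.strip().lower()
    if word in TRUE_WORDS:
        return True
    if word in FALSE_WORDS:
        return False
    raise ValueError(f"not a boolean config value: {value!r}")


def split_params(query):
    """Split a URL query string on either '&' or ';' into (name, value) pairs."""
    pairs = []
    for chunk in query.replace(";", "&").split("&"):
        if not chunk:
            continue  # skip empty segments such as a trailing separator
        name, _, value = chunk.partition("=")
        pairs.append((unquote_plus(name), unquote_plus(value)))
    return pairs
```

With these helpers, parse_bool("Yes") and parse_bool("1") both yield True, and the query strings "text=apple;sort=title" and "text=apple&sort=title" split into the same list of pairs, which is why a stylesheet can build links with ";" and avoid escaping "&".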