[VUFIND-546] use of trie based date field instead of solr.DateField Created: 03/Apr/12  Updated: 18/Sep/13  Resolved: 18/Sep/13

Status: Resolved
Project: VuFind®
Components: None
Affects versions: None
Fix versions: 2.2

Type: Improvement Priority: Trivial
Reporter: Tuan Nguyen Assignee: Demian Katz
Resolution: Fixed Votes: 1
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original estimate: Not Specified

Attachments: Text File schema.xml.patch    

 Description   
A common boost function for favoring newer documents, recip(ms(NOW,indexedDateField),3.16e-11,1,1),
requires that indexedDateField be Trie-based.
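The Python sketch below shows what this boost computes. It models Solr's documented function-query formulas: recip(x,m,a,b) = a/(m*x+b), and ms(a,b) = a - b in milliseconds. It is only a model for reasoning about the numbers, not Solr code, and the function names are hypothetical. Because 3.16e-11 is roughly the inverse of the number of milliseconds in a year, a document indexed one year ago gets a boost of about 0.5.

```python
from datetime import datetime, timedelta, timezone


# Hypothetical Python models of two Solr function-query functions.
def recip(x: float, m: float, a: float, b: float) -> float:
    """Solr's recip(x, m, a, b) is documented as a / (m*x + b)."""
    return a / (m * x + b)


def ms(a: datetime, b: datetime) -> float:
    """Solr's ms(a, b) returns a - b in milliseconds."""
    return (a - b).total_seconds() * 1000.0


def recency_boost(now: datetime, indexed: datetime) -> float:
    """Model of recip(ms(NOW, indexedDateField), 3.16e-11, 1, 1)."""
    return recip(ms(now, indexed), 3.16e-11, 1.0, 1.0)


if __name__ == "__main__":
    now = datetime(2012, 4, 3, tzinfo=timezone.utc)
    for years in (0, 1, 2, 5):
        indexed = now - timedelta(days=365 * years)
        print(f"{years} year(s) old: boost = {recency_boost(now, indexed):.3f}")
```

Running it prints boosts of about 1.0, 0.5, 0.33 and 0.17. The boost decays smoothly with age and never reaches zero, so older records are demoted rather than excluded.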

VuFind has two fields, first_indexed and last_indexed, that could be used to boost newly indexed records. However, because they are declared as solr.DateField, the function above cannot be used with them.

One could try dynamic fields instead, but that does not work either, since VuFind's dynamic field definition for the date type is not Trie-based.
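If you want to check a local schema.xml for this problem, here is a minimal Python sketch. It lists every field and dynamic field whose type is declared with class solr.DateField. The helper name and the inline schema (including the *_date dynamic field) are hypothetical and greatly simplified. They are not copied from VuFind's actual schema.

```python
import xml.etree.ElementTree as ET
from typing import List

LEGACY_DATE_CLASS = "solr.DateField"

# Hypothetical, heavily simplified schema; not VuFind's real schema.xml.
EXAMPLE_SCHEMA = """
<schema name="example" version="1.4">
  <types>
    <fieldType name="date" class="solr.DateField" sortMissingLast="true" omitNorms="true"/>
    <fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
  </types>
  <fields>
    <field name="id" type="string" indexed="true" stored="true"/>
    <field name="first_indexed" type="date" indexed="true" stored="true"/>
    <field name="last_indexed" type="date" indexed="true" stored="true"/>
    <dynamicField name="*_date" type="date" indexed="true" stored="true"/>
  </fields>
</schema>
"""


def find_legacy_date_fields(schema_xml: str) -> List[str]:
    """Return names of fields/dynamic fields whose type uses solr.DateField."""
    root = ET.fromstring(schema_xml.strip())
    # Older schemas may spell the element "fieldtype", so compare case-insensitively.
    legacy_types = {
        el.get("name")
        for el in root.iter()
        if el.tag.lower() == "fieldtype" and el.get("class") == LEGACY_DATE_CLASS
    }
    # Both regular and dynamic fields can reference a legacy date type.
    return [
        el.get("name")
        for el in root.iter()
        if el.tag in ("field", "dynamicField") and el.get("type") in legacy_types
    ]


if __name__ == "__main__":
    print(find_legacy_date_fields(EXAMPLE_SCHEMA))
```

For the example schema, this reports first_indexed, last_indexed and the *_date dynamic field, in document order.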

I suggest we use a Trie-based date type for all date fields; Blacklight appears to do this already:

https://github.com/projectblacklight/blacklight-jetty/blob/master/solr/conf/schema.xml#L134

This would let people implement such a boost function without having to modify schema.xml.

The attached patch includes definitions of the date/tdate field types, lifted from Blacklight's schema.xml.

 Comments   
Comment by Demian Katz [ 13/Apr/12 ]
This patch seems to be trying to do two different things at the same time, and we probably want to choose one or the other -- either:

a) Make the existing date field Trie-based

or

b) Add a new Trie-based type with dynamic field capabilities.

If we go with option a, I'm not sure why we still need separate date/tdate types. I see that the two definitions use different precisionStep values, but I'm not sure what the distinction is. I gather that precisionStep affects the accuracy of search and the size of the index, but I don't see why we would want two different values.

If we go with option b, the patch should be revised a bit -- right now, all of the dynamic fields are using the "date" type rather than "tdate."

In any case, once we sort out these issues, I'm certainly open to revising the default schema -- this seems like a step in the right direction, though it will unfortunately require reindexing.
Comment by Tuan Nguyen [ 13/Apr/12 ]
I'm not quite sure about precisionStep; that's why I copied both date types.
Comment by Demian Katz [ 18/Sep/13 ]
From what I gather, the precisionStep value will not impact the accuracy of the dates in the index, merely some performance/storage characteristics. Thus, for now, it seems sensible to start with a value of "6" which is widely cited as an accepted default. We can adjust later as needed.
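This simplified Python model of trie encoding shows why precisionStep changes storage and speed but not accuracy. Each value is indexed once at full precision. It is also indexed once more per coarser level, and each level drops another precisionStep low-order bits. Real Lucene/Solr encoding differs in detail (sign-bit flipping, prefix-coded terms), and the function name is hypothetical. The sketch only illustrates term counts.

```python
from datetime import datetime, timezone
from typing import List, Tuple


def trie_terms(value: int, precision_step: int, bits: int = 64) -> List[Tuple[int, int]]:
    """Simplified trie encoding: one (shift, prefix) term per precision level.

    Assumes a non-negative value; real Lucene encoding also flips the sign bit
    and prefix-codes each term, which this sketch ignores.
    """
    if not 0 < precision_step < bits:
        raise ValueError("precision_step must be between 1 and bits - 1")
    return [(shift, value >> shift) for shift in range(0, bits, precision_step)]


if __name__ == "__main__":
    # TrieDateField stores a date as milliseconds since the epoch (a 64-bit long).
    millis = int(datetime(2013, 9, 18, tzinfo=timezone.utc).timestamp() * 1000)
    for step in (4, 6, 8):
        terms = trie_terms(millis, step)
        print(f"precisionStep={step}: {len(terms)} terms, exact value kept: {terms[0][1] == millis}")
```

With a 64-bit timestamp, precisionStep 6 yields 11 terms per value and 8 yields 8. A smaller step means more terms per value (a larger index) but fewer terms to enumerate for a range query. The shift-0 term always holds the exact value, so stored dates are equally precise for any step.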

I have also done some testing to see how Solr behaves if you change the schema from DateField to TrieDateField without reindexing. Not surprisingly, this breaks date-based searches and filters, but it doesn't kill the whole index, and reindexing records seems to clean things up. It's probably safest to wipe the index and start clean after making such a significant schema change, but it is encouraging to see that failing to do this does not appear to cause a disaster.
Comment by Demian Katz [ 18/Sep/13 ]
As of this writing (and in time for the 2.2 release), all DateFields are now TrieDateFields.
Generated at Thu Apr 25 19:38:34 UTC 2024 using Jira 1001.0.0-SNAPSHOT#100251-rev:2d0d695520e7095763476433152508933e579798.