[VUFIND-546] use of trie based date field instead of solr.DateField
Created: 03/Apr/12  Updated: 18/Sep/13  Resolved: 18/Sep/13

Status: Resolved
Project: VuFind®
Components: None
Affects versions: None
Fix versions: 2.2
Type: Improvement
Priority: Trivial
Reporter: Tuan Nguyen
Assignee: Demian Katz
Resolution: Fixed
Votes: 1
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original estimate: Not Specified
Attachments: schema.xml.patch
Description
A common function for boosting newer documents, recip(ms(NOW,indexedDateField),3.16e-11,1,1), requires that indexedDateField be Trie-based. VuFind has two fields, first_indexed and last_indexed, that could be used to boost newly indexed records. However, because they are declared as solr.DateField, the function above cannot be used with them. Dynamic fields are no workaround either, since VuFind's dynamic field definition for the date type is not Trie-based. I suggest we use a Trie-based type for all date fields; Blacklight appears to do this already (see https://github.com/projectblacklight/blacklight-jetty/blob/master/solr/conf/schema.xml#L134). This would let people implement such a boost function without having to modify schema.xml. The attached patch includes date/tdate field type definitions taken from the Blacklight schema.xml.
Comments
Comment by Demian Katz [ 13/Apr/12 ]
This patch seems to be trying to do two different things at once, and we probably want to choose one or the other: either a) make the existing date field Trie-based, or b) add a new Trie-based type with dynamic field capabilities. If we go with option a, I'm not sure why we still need separate date/tdate types. I see that the two definitions use different precisionStep values, but I'm not sure what the distinction is; I gather that precisionStep affects the accuracy of search and the size of the index, but I don't see why we want two different values. If we go with option b, the patch needs some revision: right now, all of the dynamic fields use the "date" type rather than "tdate". In any case, once we sort out these issues, I'm certainly open to revising the default schema. This seems like a step in the right direction, though it will unfortunately require reindexing.
Comment by Tuan Nguyen [ 13/Apr/12 ]
I'm not quite sure about precisionStep either; that's why I copied both date types.
Comment by Demian Katz [ 18/Sep/13 ]
From what I gather, the precisionStep value does not affect the accuracy of the dates in the index, only performance and storage characteristics. For now, it seems sensible to start with a value of 6, which is widely cited as a reasonable default. We can adjust it later as needed. I have also tested how Solr behaves if you change the schema from DateField to TrieDateField without reindexing. Not surprisingly, this breaks date-based searches and filters, but it doesn't corrupt the whole index, and reindexing the records seems to clean things up. It's probably safest to wipe the index and start clean after making such a significant schema change, but it is encouraging that skipping this step does not appear to cause a disaster.
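To illustrate why precisionStep changes index size and range-query performance rather than date accuracy, here is a simplified Python model of the trie encoding behind TrieDateField: each value is indexed once at full precision plus at progressively coarser prefixes, one every precisionStep bits. This is a sketch under stated assumptions, not Lucene's actual byte-level term format (which also encodes the shift into each term); the function names are hypothetical, and treating a step of 0 as "full precision only" is assumed to mirror Solr's handling. Smaller steps mean more terms per value (a larger index) but fewer terms for a range query to visit.

```python
from datetime import datetime, timezone

BITS = 64  # dates are modeled as 64-bit millisecond timestamps
MASK = (1 << BITS) - 1
SIGN = 1 << (BITS - 1)


def to_sortable(value: int) -> int:
    """Flip the sign bit so signed values sort correctly as unsigned bit strings."""
    return (value & MASK) ^ SIGN


def from_sortable(sortable: int) -> int:
    """Inverse of to_sortable: recover the original signed value."""
    value = sortable ^ SIGN
    return value - (1 << BITS) if value >= SIGN else value


def trie_terms(value: int, precision_step: int) -> list[tuple[int, int]]:
    """Return the (shift, prefix) pairs a simplified trie index stores for one value.

    Each extra term drops precision_step more low-order bits, giving coarser
    prefixes that a range query can match in bulk. A step of 0 is treated
    (by assumption) as "index full precision only".
    """
    sortable = to_sortable(value)
    if precision_step <= 0:
        return [(0, sortable)]
    return [(shift, sortable >> shift) for shift in range(0, BITS, precision_step)]


if __name__ == "__main__":
    ms = int(datetime(2013, 9, 18, tzinfo=timezone.utc).timestamp() * 1000)
    for step in (0, 4, 6, 8):
        terms = trie_terms(ms, step)
        exact = from_sortable(terms[0][1])  # the shift-0 term keeps every bit
        print(f"precisionStep={step}: {len(terms)} terms, exact value {exact}")
```

Whatever the step, the shift-0 term decodes to the original timestamp, which matches the observation that precisionStep only trades index size against query work.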
Comment by Demian Katz [ 18/Sep/13 ]
As of this writing (and in time for the 2.2 release), all DateFields are now TrieDateFields.