VuFind / VUFIND-546

Use Trie-based date fields instead of solr.DateField

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Trivial
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.2
    • Component/s: None
    • Labels: None

      Description

      A common function used to boost newer documents, recip(ms(NOW,indexedDateField),3.16e-11,1,1),
      requires that indexedDateField be Trie-based.
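
      For illustration: Solr's recip(x,m,a,b) evaluates to a/(m*x+b), and ms(NOW,field) is the document's age in milliseconds. With m = 3.16e-11 (roughly 1 divided by the number of milliseconds in a year), the boost is 1 for a brand-new record and about 0.5 after one year. The Python sketch below only reproduces that arithmetic outside Solr; the names recency_boost and MS_PER_YEAR are made up for this example.

```python
# Milliseconds in an average year (about 3.156e10), so 3.16e-11 is roughly 1/year.
MS_PER_YEAR = 365.25 * 24 * 60 * 60 * 1000


def recip(x, m, a, b):
    """Same formula as Solr's recip function query: a / (m * x + b)."""
    return a / (m * x + b)


def recency_boost(age_ms):
    """Boost for a document whose indexed date is age_ms milliseconds old.

    Mirrors recip(ms(NOW,indexedDateField),3.16e-11,1,1); hypothetical helper.
    """
    return recip(age_ms, 3.16e-11, 1, 1)


boost_new = recency_boost(0)                       # just indexed
boost_one_year = recency_boost(MS_PER_YEAR)        # one year old
boost_ten_years = recency_boost(10 * MS_PER_YEAR)  # ten years old

for label, value in [("new", boost_new),
                     ("1 year", boost_one_year),
                     ("10 years", boost_ten_years)]:
    print(f"{label:>8}: {value:.3f}")
```

      The curve falls off smoothly rather than cutting old records out, so older documents are still returned but rank lower when relevance is otherwise equal.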

      VuFind has two fields, first_indexed and last_indexed, which can be used for boosting newly indexed records. However, because they are declared as solr.DateField, the function above cannot be used with them.

      One could try to use dynamic fields instead, but that does not work either, because VuFind's dynamic field definition for the date type is not Trie-based.

      I suggest we use a Trie-based date field type for all date fields. Blacklight appears to be doing that already:

      https://github.com/projectblacklight/blacklight-jetty/blob/master/solr/conf/schema.xml#L134

      This would make it simpler for people who want to implement such a boost function without having to modify schema.xml.

      A patch is included here with definitions of the date/tdate field types, lifted from the Blacklight schema.xml.

        Activity

        Demian Katz added a comment -
        This patch seems to be trying to do two different things at the same time, and we probably want to choose one or the other -- either:

        a) Make the existing date field Trie-based

        or

        b) Add a new Trie-based type with dynamic field capabilities.

        If we go with option a, I'm not sure why we still need separate date/tdate fields (I see there is a different precisionStep value in the two field definitions, but I'm not sure what the distinction is -- I gather that precisionStep affects the accuracy of search and the size of the index, but I'm not sure why we want two different values).

        If we go with option b, the patch should be revised a bit -- right now, all of the dynamic fields are using the "date" type rather than "tdate."

        In any case, once we sort out these issues, I'm certainly open to revising the default schema -- this seems like a step in the right direction, though it will unfortunately require reindexing.
        Tuan Nguyen added a comment -
        I'm not quite sure about precisionStep; that's why I copied both date types.
        Demian Katz added a comment -
        From what I gather, the precisionStep value will not impact the accuracy of the dates in the index, merely some performance/storage characteristics. Thus, for now, it seems sensible to start with a value of "6" which is widely cited as an accepted default. We can adjust later as needed.

        I have also done some testing to see how Solr behaves if you change the schema from DateField to TrieDateField without reindexing. Not surprisingly, this breaks date-based searches and filters, but it doesn't kill the whole index, and reindexing records seems to clean things up. It's probably safest to wipe the index and start clean after making such a significant schema change, but it is encouraging to see that failing to do this does not appear to cause a disaster.
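
        The storage side of the precisionStep trade-off mentioned above can be sketched with a little arithmetic. As far as I understand Lucene's trie encoding, each 64-bit date value is indexed once at full precision and then again at progressively coarser precisions, dropping precisionStep bits each time. The full-precision term is always present, which is why accuracy is unaffected. The Python sketch below just counts those terms; terms_per_value is a hypothetical helper, not a Solr API.

```python
def terms_per_value(precision_step, value_bits=64):
    """Count the index terms generated for one numeric value.

    Assumes one term per shift 0, step, 2*step, ... below value_bits.
    A step of 0 (or one at least as large as the value width) is
    treated as "no extra precisions", i.e. a single full-precision term.
    """
    if precision_step <= 0 or precision_step >= value_bits:
        return 1
    return len(range(0, value_bits, precision_step))


# Smaller steps mean more terms per date (bigger index) but fewer
# terms to visit when evaluating a range query.
for step in (0, 4, 6, 8):
    print(f"precisionStep={step}: {terms_per_value(step)} term(s) per value")
```

        Under these assumptions a precisionStep of 6 stores 11 terms per date, compared with 8 terms for a step of 8 and a single term when the trie levels are disabled.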
        Demian Katz added a comment -
        As of this writing (and in time for the 2.2 release), all DateFields are now TrieDateFields.

          People

          • Assignee: Demian Katz
          • Reporter: Tuan Nguyen
          • Votes: 1
          • Watchers: 3
