[VUFIND-657] Normalized LC Callnumber field Created: 13/Sep/12  Updated: 29/Sep/15  Resolved: 29/Sep/15

Status: Resolved
Project: VuFind®
Components: Import Tools
Affects versions: None
Fix versions: 2.4

Type: Improvement Priority: Trivial
Reporter: Luke O'Sullivan Assignee: Unassigned
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original estimate: Not Specified

Attachments: File LCCNormalizeFilter.class     Java Source File LCCNormalizeFilter.java     File LCCNormalizeFilterFactory.class     Java Source File LCCNormalizeFilterFactory.java     Java Archive File MarcImporter.jar     Java Archive File SolrMarc.jar     Java Archive File VuFindTools.jar     Java Archive File VufindTools.jar     Java Archive File VufindTools.jar     File callnumberRange.patch     Java Archive File lucene-analyzers-common-4.2.1.jar     Java Archive File marc4j-2.6.0.jar    

 Description   
Using the Normalization script available in the SolrMarc tools, it's possible to create a normalization plugin for Solr. This will produce a Normalized LC Callnumber which can then be used for accurate callnumber sorting and range searches.
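
To illustrate why a normalized form is needed for sorting, here is a minimal sketch. It is not the SolrMarc normalization routine or the attached LCCNormalizeFilter; the class name LccSortSketch and the padding scheme (class letters padded to three characters, class number zero-padded to four digits) are assumptions chosen only to show the effect. Plain string comparison puts DS763 before DS8 because '7' sorts before '8', while the padded keys compare in shelf order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; NOT the SolrMarc / LCCNormalizeFilter algorithm.
final class LccSortSketch {
    // 1-3 class letters, then the integer part of the class number, then the rest.
    private static final Pattern LCC =
        Pattern.compile("^\\s*([A-Za-z]{1,3})\\s*(\\d{1,4})(.*)$");

    // Build a sort key: letters padded to width 3, number zero-padded to width 4.
    static String normalize(String raw) {
        Matcher m = LCC.matcher(raw);
        if (!m.matches()) {
            // Not recognizably LC-shaped: fall back to a trimmed, upper-cased value.
            return raw.trim().toUpperCase(Locale.ROOT);
        }
        String letters = m.group(1).toUpperCase(Locale.ROOT);
        int number = Integer.parseInt(m.group(2));
        String rest = m.group(3).trim().toUpperCase(Locale.ROOT);
        return String.format(Locale.ROOT, "%-3s%04d%s", letters, number, rest);
    }

    // Return a copy of the list ordered by the normalized key.
    static List<String> sortByNormalized(List<String> raw) {
        List<String> copy = new ArrayList<>(raw);
        copy.sort(Comparator.comparing(LccSortSketch::normalize));
        return copy;
    }

    public static void main(String[] args) {
        List<String> raw = Arrays.asList("DS8", "DS763", "D16", "DS80");
        List<String> plain = new ArrayList<>(raw);
        plain.sort(Comparator.naturalOrder());
        System.out.println("raw order:        " + plain);
        System.out.println("normalized order: " + sortByNormalized(raw));
    }
}
```

With the four sample values, the raw ordering is D16, DS763, DS8, DS80, while ordering by the padded key gives D16, DS8, DS80, DS763.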

Instructions:

1) Drop the VuFindTools.jar file into solr/lib

2) Edit schema.xml

<fieldType name="LCNormalized" class="solr.TextField" sortMissingLast="true" omitNorms="true">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="org.vufind.solr.analysis.LCCNormalizeFilterFactory"/>
  </analyzer>
</fieldType>

<field name="callnumber-normalized" type="LCNormalized" indexed="true" stored="false" />

3a) Edit callnumber.bsh

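/**
 * Extract the raw call number from a record
 * @param record    MARC record to read
 * @param fieldSpec Colon-separated field/subfield specs to check, in order
 * @return First matching call number value
 */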
public String getRawFullCallNumber(Record record, String fieldSpec) {

    String val = indexer.getFirstFieldVal(record, fieldSpec);
    return val;
}

3b) Edit marc_local.properties

callnumber-normalized = script(callnumber.bsh), getRawFullCallNumber(099ab:090ab:050ab)

4) Edit searches.ini

[sorting]
callnumber = callnumber-normalized

5) Re-index

6) VuFind is not set up to perform range searches on anything other than publication date. To see the call number range search in action, search under All Fields for:

callnumber-normalized:[AD400 TO CG8000]

 Comments   
Comment by Luke O'Sullivan [ 14/Sep/12 ]
New version of the filter which resolves Java memory leaks
Comment by Luke O'Sullivan [ 17/Sep/12 ]
Though the sorting works as expected, the range searches do not. It appears that Solr does not support alphanumeric ranges. Can anyone confirm this?
Comment by Luke O'Sullivan [ 20/Mar/13 ]
See VUFIND-598 also

It is possible that Solr 3.6+ might resolve the issues with ranged searches
http://wiki.apache.org/solr/MultitermQueryAnalysis
Comment by Luke O'Sullivan [ 03/Apr/13 ]
I have tried this with VuFind 2.0 and the ranged search is still not working as expected

callnumber-normalized=[DS+TO+FE] will correctly list items with callnumbers between DS and FE, starting with DT* and finishing with FD*

callnumber-normalized=[DS763+TO+FE] incorrectly starts at DT* and finishes with FD*

Comment by Luke O'Sullivan [ 03/Apr/13 ]
Upgrading to Solr 3.6.2 and using the multiterm facility fixes the problems

Use the following in the schema:

<fieldType name="LCNormalized" class="solr.TextField" sortMissingLast="true" omitNorms="true">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="org.vufind.solr.analysis.LCCNormalizeFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="(\s+)" replacement="" replace="all" />
  </analyzer>
  <analyzer type="multiterm">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="org.vufind.solr.analysis.LCCNormalizeFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="(\s+)" replacement="" replace="all" />
  </analyzer>
</fieldType>
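
A rough sketch of why running the range endpoints through the same normalization matters. This is one plausible reading of the behavior reported above, not a confirmed diagnosis, and the padding scheme and the class name LccRangeSketch are assumptions for illustration rather than the real filter's output format. If the indexed terms are normalized but a bound such as DS763 is compared as typed, every DS term can fall below the lower bound, so results start at DT.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; NOT the SolrMarc / LCCNormalizeFilter algorithm.
final class LccRangeSketch {
    private static final Pattern LCC =
        Pattern.compile("^\\s*([A-Za-z]{1,3})\\s*(\\d{1,4})(.*)$");

    // Assumed normalization: letters padded to width 3, number zero-padded to width 4.
    static String normalize(String raw) {
        Matcher m = LCC.matcher(raw);
        if (!m.matches()) {
            return raw.trim().toUpperCase(Locale.ROOT);
        }
        String letters = m.group(1).toUpperCase(Locale.ROOT);
        int number = Integer.parseInt(m.group(2));
        String rest = m.group(3).trim().toUpperCase(Locale.ROOT);
        return String.format(Locale.ROOT, "%-3s%04d%s", letters, number, rest);
    }

    // Inclusive range test using plain string order, as for terms in an index.
    static boolean inRange(String term, String lower, String upper) {
        return term.compareTo(lower) >= 0 && term.compareTo(upper) <= 0;
    }

    public static void main(String[] args) {
        String ds800 = normalize("DS800"); // what the index would hold
        String dt100 = normalize("DT100");

        // Bounds used as typed: the DS item is missed, the DT item is found.
        System.out.println(inRange(ds800, "DS763", "FE"));
        System.out.println(inRange(dt100, "DS763", "FE"));

        // Bounds normalized like the indexed terms: the DS item is found.
        System.out.println(inRange(ds800, normalize("DS763"), normalize("FE")));
    }
}
```

In this sketch the first check is false and the other two are true, which matches the pattern of a [DS763 TO FE] search skipping DS items until both bounds are normalized.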

Also, rather than editing callnumber.bsh, you could just use:

callnumber-normalized = custom, getFullCallNumber(852khi:099ab:090ab:050ab)

in your marc_local.properties file
Comment by Luke O'Sullivan [ 17/Oct/13 ]
The attached patch and VufindTools.jar file (updated for Solr 4.2.1) should allow users to add an LCC call number range facet to search results.

Comment by Demian Katz [ 13/Dec/13 ]
A couple of questions:

1.) If you have this analyzer installed, is there any reason to process the call numbers at index time? Can't we just omit the getFullCallNumberNormalized() indexing routine, copyField the callnumber field into the callnumber-normalized field and rely on the LCCNormalizeFilterFactory to do the work for us?

2.) Do you have the source available for the VufindTools.jar file? If we're going to incorporate this into VuFind, it seems that we should have the source somewhere so we can recompile it as new versions of Solr come along.

We should probably discuss on a future dev call whether we want to do anything more with VUFIND-598; does this ticket completely supersede that one, or are there cases where the less robust solution is still preferable because it doesn't rely on a custom Solr plug-in?
Comment by Demian Katz [ 13/Dec/13 ]
Regarding VUFIND-598, it occurred to me that I might as well just resolve it by getting the normalized routines into the standard callnumber.bsh/SolrMarc VuFind indexer. It doesn't hurt to have the options available. We just need to figure out what changes, if any, ought to be made to the default configuration as part of the resolution of this ticket.
Comment by Demian Katz [ 13/Dec/13 ]
See also VUFIND-332.
Comment by Luke O'Sullivan [ 16/Dec/13 ]
I have attached the LCC Java Files as requested

They basically mirror the index script, so they should be updated to match it whenever that script changes.
Comment by Demian Katz [ 20/Mar/14 ]
I have adapted the range logic from the attached patch into https://github.com/vufind-org/vufind/pull/115 (with some additions/improvements).
Comment by Demian Katz [ 04/Apr/14 ]
Now that VUFIND-919 is implemented, we may want to regenerate patches here to use the new configurable generic range functionality.
Comment by Luke O'Sullivan [ 17/Jun/14 ]
I've updated the VuFindTools.jar file to work with Solr 4.2.1 and to use the methods found in SolrMarc. Unfortunately, because Solr/Lucene have changed things around and SolrMarc requires certain libraries to work, this has increased the number of dependencies the filter requires. In addition to VuFindTools.jar, you will also need to add SolrMarc.jar, MarcImporter.jar, marc4j-2.6.0.jar and lucene-analyzers-common-4.2.1.jar to the lib directory.

I'm sure someone with even a modicum of Java experience could put them all together in one package (and select the individual classes required) - I really am just hacking things together.

Initial tests seem promising though - Sorting and range searching appear to be working

 

Comment by Demian Katz [ 29/Sep/15 ]
Call number normalization issues were addressed in release 2.4.