[VUFIND-657] Normalized LC Callnumber field Created: 13/Sep/12 Updated: 29/Sep/15 Resolved: 29/Sep/15 |
|
Status: | Resolved |
Project: | VuFind® |
Components: | Import Tools |
Affects versions: | None |
Fix versions: | 2.4 |
Type: | Improvement | Priority: | Trivial |
Reporter: | Luke O'Sullivan | Assignee: | Unassigned |
Resolution: | Fixed | Votes: | 0 |
Labels: | None | ||
Remaining Estimate: | Not Specified | ||
Time Spent: | Not Specified | ||
Original estimate: | Not Specified |
Attachments: | LCCNormalizeFilter.class LCCNormalizeFilter.java LCCNormalizeFilterFactory.class LCCNormalizeFilterFactory.java MarcImporter.jar SolrMarc.jar VuFindTools.jar VufindTools.jar VufindTools.jar callnumberRange.patch lucene-analyzers-common-4.2.1.jar marc4j-2.6.0.jar |
Description |
Using the normalization script available in the SolrMarc tools, it's possible to create a normalization plugin for Solr. This produces a normalized LC call number which can then be used for accurate call number sorting and range searches.

Instructions:

1) Drop the VuFindTools.jar file into solr/lib

2) Edit schema.xml

    <fieldType name="LCNormalized" class="solr.TextField" sortMissingLast="true" omitNorms="true">
      <analyzer>
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="org.vufind.solr.analysis.LCCNormalizeFilterFactory"/>
      </analyzer>
    </fieldType>
    <field name="callnumber-normalized" type="LCNormalized" indexed="true" stored="false" />

3a) Edit callnumber.bsh

    /**
     * Extract the raw call number from a record
     * @param record
     * @param fieldSpec
     * @return Call number
     */
    public String getRawFullCallNumber(Record record, String fieldSpec) {
        String val = indexer.getFirstFieldVal(record, fieldSpec);
        return val;
    }

3b) Edit marc_local.properties

    callnumber-normalized = script(callnumber.bsh), getRawFullCallNumber(099ab:090ab:050ab)

4) Edit searches.ini

    [sorting]
    callnumber = callnumber-normalized

5) Re-index

6) VuFind is not set up to perform ranged searches on fields other than published date. To see the call number range search in action, search under All Fields for:

    callnumber-normalized:[AD400 TO CG8000] |
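To illustrate why a normalized key matters for sorting, here is a minimal Java sketch. It is not the attached LCCNormalizeFilter and does not reproduce the SolrMarc normalization; the class name LccSortSketch, the padding widths and the key format are all assumptions made for illustration. It only shows that plain string order puts "DS763" before "DS8", while a padded key restores shelf order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, simplified normalizer; NOT the attached LCCNormalizeFilter.
class LccSortSketch {
    // One to three class letters followed by a whole class number, e.g. "DS763".
    private static final Pattern LCC =
        Pattern.compile("^\\s*([A-Za-z]{1,3})\\s*(\\d{1,4})(.*)$");

    // Builds a key whose plain string order follows the letters-plus-number
    // prefix. Input that does not match the pattern is trimmed and upper-cased.
    static String normalize(String raw) {
        Matcher m = LCC.matcher(raw);
        if (!m.matches()) {
            return raw.trim().toUpperCase();
        }
        // Assumed widths: letters padded to 3 characters, number to 4 digits.
        String letters = String.format("%-3s", m.group(1).toUpperCase());
        String number = String.format("%04d", Integer.parseInt(m.group(2)));
        return letters + number + m.group(3).trim().toUpperCase();
    }

    // Returns a copy of the raw call numbers ordered by their normalized keys.
    static List<String> sortByKey(List<String> rawCallNumbers) {
        List<String> sorted = new ArrayList<>(rawCallNumbers);
        sorted.sort((a, b) -> normalize(a).compareTo(normalize(b)));
        return sorted;
    }
}
```

With this sketch, sortByKey applied to DS763, DS8, D16 returns D16, DS8, DS763, whereas sorting the raw strings would place DS763 ahead of DS8.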
Comments |
Comment by Luke O'Sullivan [ 14/Sep/12 ] |
New version of the filter which resolves Java memory leaks |
Comment by Luke O'Sullivan [ 17/Sep/12 ] |
Though the sorting works as expected, the range searches do not. It appears that Solr does not support alphanumeric ranges. Can anyone confirm this? |
Comment by Luke O'Sullivan [ 20/Mar/13 ] |
It is possible that Solr 3.6+ might resolve the issues with ranged searches. See http://wiki.apache.org/solr/MultitermQueryAnalysis |
Comment by Luke O'Sullivan [ 03/Apr/13 ] |
I have tried this with VuFind 2.0 and the ranged search is still not working as expected.

callnumber-normalized=[DS+TO+FE] correctly lists items with call numbers between DS and FE, starting with DT* and finishing with FD*.

callnumber-normalized=[DS763+TO+FE] incorrectly starts at DT* and finishes with FD*. |
Comment by Luke O'Sullivan [ 03/Apr/13 ] |
Upgrading to Solr 3.6.2 and using the multiterm facility fixes the problems.

Use the following in the schema:

    <fieldType name="LCNormalized" class="solr.TextField" sortMissingLast="true" omitNorms="true">
      <analyzer>
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="org.vufind.solr.analysis.LCCNormalizeFilterFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.PatternReplaceFilterFactory" pattern="(\s+)" replacement="" replace="all" />
      </analyzer>
      <analyzer type="multiterm">
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="org.vufind.solr.analysis.LCCNormalizeFilterFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.PatternReplaceFilterFactory" pattern="(\s+)" replacement="" replace="all" />
      </analyzer>
    </fieldType>

Also, rather than editing callnumber.bsh, you could just use:

    callnumber-normalized = custom, getFullCallNumber(852khi:099ab:090ab:050ab)

in your marc_local.properties file. |
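One plausible reading of why the multiterm analyzer helps is that range endpoints are then normalized the same way as the indexed terms. The Java sketch below illustrates that idea only; it is not a confirmed diagnosis. The class name LccRangeSketch and the sample term format (lower-cased, zero-padded, no whitespace) are assumptions, not the actual output of LCCNormalizeFilter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of an inclusive string range over indexed terms.
class LccRangeSketch {
    // Keeps the terms that fall between lower and upper (inclusive) under
    // plain string comparison, which for ASCII matches term byte order.
    static List<String> inRange(List<String> indexedTerms, String lower, String upper) {
        List<String> hits = new ArrayList<>();
        for (String term : indexedTerms) {
            if (term.compareTo(lower) >= 0 && term.compareTo(upper) <= 0) {
                hits.add(term);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Assumed normalized form of DS700, DS800, DT100 and FD50.
        List<String> terms = Arrays.asList("ds0700", "ds0800", "dt0100", "fd0050");

        // Raw lower bound: every "ds0..." term sorts below "ds763" and is skipped.
        System.out.println(inRange(terms, "ds763", "fe"));

        // Lower bound normalized like the indexed terms: "ds0800" is included.
        System.out.println(inRange(terms, "ds0763", "fe"));
    }
}
```

In this sketch the raw bound yields only the DT and FD terms, which resembles the behaviour reported for [DS763 TO FE], while the normalized bound also returns the DS term above 763.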
Comment by Luke O'Sullivan [ 17/Oct/13 ] |
The attached patch and VufindTools.jar file (updated for Solr 4.2.1) should allow users to add an LCC call number range facet to search results. |
Comment by Demian Katz [ 13/Dec/13 ] |
A couple of questions: 1.) If you have this analyzer installed, is there any reason to process the call numbers at index time? Can't we just omit the getFullCallNumberNormalized() indexing routine, copyField the callnumber field into the callnumber-normalized field and rely on the LCCNormalizeFilterFactory to do the work for us? 2.) Do you have the source available for the VufindTools.jar file? If we're going to incorporate this into VuFind, it seems that we should have the source somewhere so we can recompile it as new versions of Solr come along. We should probably discuss on a future dev call whether we want to do anything more with |
Comment by Demian Katz [ 13/Dec/13 ] |
Regarding |
Comment by Demian Katz [ 13/Dec/13 ] |
See also VUFIND-332. |
Comment by Luke O'Sullivan [ 16/Dec/13 ] |
I have attached the LCC Java files as requested. They basically mirror the index script, so they should be updated to match it. |
Comment by Demian Katz [ 20/Mar/14 ] |
I have adapted the range logic from the attached patch into https://github.com/vufind-org/vufind/pull/115 (with some additions/improvements). |
Comment by Demian Katz [ 04/Apr/14 ] |
Now that |
Comment by Luke O'Sullivan [ 17/Jun/14 ] |
I've updated the VuFindTools.jar file to work with version 4.2.1 and to use the methods found in SolrMarc. Unfortunately, as Solr/Lucene have changed things around and SolrMarc requires certain libraries to work, this has increased the dependencies required for the filter. In addition to VuFindTools.jar, you will also need to add SolrMarc.jar, MarcImporter.jar, marc4j-2.6.0.jar and lucene-analyzers-common-4.2.1.jar to the lib directory. I'm sure someone with even a modicum of Java experience could put them all together in one package (and select the individual classes required) - I really am just hacking things together. Initial tests seem promising though: sorting and range searching appear to be working. |
Comment by Demian Katz [ 29/Sep/15 ] |
Call number normalization issues were addressed in release 2.4. |