Remote MARC Records
Swap MARC-fullrecord to external service using RecordDriver SolrMarcRemote
IMPORTANT: This page refers to a feature added in VuFind 2.5.
Introduction
We (the finc team at Leipzig University Library) struggled with a very large index for quite a while and mitigated the problem by moving the index's MARC fullrecord field to a dedicated binary server, which reduced the index size by ~25%.
As the fullrecord field is not indexed, we concluded that it could be removed from the index and made available through a binary server instead. Our setup stores the indexed .mrc files in a simple folder structure whose two-digit folder names are derived from each .mrc file's record id (our record ids are sequential and digits-only):
e.g. /00/00/34/23/34.mrc
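The mapping from record id to file path can be sketched as a small shell function. This is only an illustration: the name `id_to_path` is hypothetical, and the zero-padding of ids shorter than 10 digits is an assumption inferred from the example path above.

```shell
# id_to_path ID
# Hypothetical helper: map a digits-only record id to its relative .mrc path.
# Assumption: ids shorter than 10 digits are left-padded with zeros.
id_to_path() {
  id=$1
  # reject ids that are empty or contain anything other than digits
  case $id in
    ''|*[!0-9]*) echo "invalid record id: $id" >&2; return 1 ;;
  esac
  # reject ids that do not fit the five two-digit chunks
  [ "${#id}" -le 10 ] || { echo "record id too long: $id" >&2; return 1; }
  # left-pad with zeros to 10 digits
  while [ "${#id}" -lt 10 ]; do id="0$id"; done
  # insert a slash after every two digits, then turn the trailing slash into .mrc
  printf '%s\n' "$id" | sed -e 's#\(..\)#\1/#g' -e 's#/$#.mrc#'
}

id_to_path 342334   # prints 00/00/34/23/34.mrc
```

The result is relative, so it can be appended to whatever base directory the binary server uses.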
The .mrc files are served via HTTP GET requests. We extended the SolrMarc RecordDriver with a method that fetches the binary .mrc file from a URL configured in the [Record] section of config.ini if the fullrecord field is empty or not set in the current index. All the stock methods of the SolrMarc RecordDriver for parsing binary MARC data still work and are used by the new RecordDriver SolrMarcRemote.
Therefore, if you need to reduce the size of your Solr index, moving the MARC fullrecord to a remote service might be a solution for you.
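The fallback behaviour described above can be illustrated with a rough shell sketch. This is not the RecordDriver's actual code (which is part of VuFind); the function names `fetch_remote` and `load_marc` are hypothetical and only mirror the decision "use the index field if present, otherwise fetch over HTTP".

```shell
# fetch_remote URL
# Hypothetical fetcher: download one binary MARC record with an HTTP GET request.
fetch_remote() {
  curl --fail --silent --show-error "$1"
}

# load_marc FULLRECORD URL
# Print the MARC data: the value from the index if it is non-empty,
# otherwise the copy served by the remote MARC server.
load_marc() {
  if [ -n "$1" ]; then
    printf '%s' "$1"
  else
    fetch_remote "$2"
  fi
}
```

A call such as `load_marc "$fullrecord" "http://your.marc.record.server/0000342334"` would only contact the server when `$fullrecord` is empty.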
Prerequisites
In this guide it is assumed that
- your server operating system is Linux
- you have an additional http-server running which will be used for serving the MARC-files (e.g. nginx)
- you have direct access to the MARC-files serving http-server's filesystem during index-time
- your MARC-records reside in one single .mrc file
- your unique identifier for the records (and for the MARC-files) consists of 10 digits (those will be sliced into chunks of two digits being used for the folder-structure) - if your unique identifier for your MARC-records differs from ours you will need to adjust the folder-structure and slicing logic accordingly
Setup-Guide
1. Set up the remote service providing MARC-files
We use nginx as our http-server; our nginx config looks like this:
user nginx;
worker_processes 4;
location / {
    rewrite "^/([0-9]{2})([0-9]{2})([0-9]{2})([0-9]{2})([0-9]{2})" /$1/$2/$3/$4/$5.mrc;
    root /var/marc;
}
The corresponding files will be placed in /var/marc/$1/$2/$3/$4/$5.mrc
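To check which file a given request should resolve to, the rewrite rule can be mirrored with sed. This is a sketch for illustration only: `uri_to_file` is a hypothetical helper, it assumes the `root /var/marc` setting shown above as its default, and it approximates rather than reproduces nginx's own rewrite processing.

```shell
# uri_to_file URI [ROOT]
# Hypothetical helper: approximate the nginx rewrite rule with sed and
# print the file the server would look for. ROOT defaults to /var/marc.
uri_to_file() {
  root=${2:-/var/marc}
  # the trailing .* mimics nginx replacing the whole URI once the regex matches;
  # URIs that do not start with 10 digits pass through unchanged
  path=$(printf '%s\n' "$1" |
    sed -E 's#^/([0-9]{2})([0-9]{2})([0-9]{2})([0-9]{2})([0-9]{2}).*#/\1/\2/\3/\4/\5.mrc#')
  printf '%s%s\n' "$root" "$path"
}

uri_to_file /0000342334   # prints /var/marc/00/00/34/23/34.mrc
```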
2. Modify marc_local.properties
Set in your marc_local.properties:
fullrecord = ""
record_format = "marcremote"
In VuFind 5.x and earlier, use “recordtype” in place of “record_format.”
This will prevent import-marc.sh from loading the MARC-Record into the Solr-field fullrecord and mark the Solr-records as the type “marcremote” in order to load the correct RecordDriver in VuFind.
3. Populate remote service with MARC-files during indexing
The following shell-script (linux bash) is a proof-of-concept that needs to be adapted to your needs (e.g. your unique identifier/folder-structure) and should be executed before/after/during (as you wish) import-marc.sh:
#!/bin/bash
tmpfix="/tmp/singlemarcprefix_$(date +%F)_"
# split the input file ($1) into one file per record
yaz-marcdump -s "$tmpfix" "$1" > /dev/null
for source in "$tmpfix"*
do
    # extract MARC 001 (pos 4+), insert slash after every other char (..), replace trailing slash by extension .mrc
    target=$(yaz-marcdump "$source" | grep ^001 | sed -e 's/^....//' -e 's#\(..\)#\1/#g' -e 's#/$#.mrc#')
    # create target directory path (up to last slash)
    mkdir -p "$(echo "$target" | sed 's#/[^/]*$##')"
    # rename/move marc file
    mv "$source" "$target"
done
This script
- uses yaz-marcdump to extract your MARC-records from the single MARC-file containing several records
- uses sed, mkdir and mv to create the folder-structure
- does not check whether the id found in MARC-001 is appropriate for the folder-structure
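Since the proof-of-concept does not validate the id, a check along the following lines could be added before the file is moved. This is a minimal sketch: `is_valid_marc_id` is a hypothetical name, and the rule "exactly 10 digits" follows the prerequisites of this guide, so it needs adjusting if your identifiers differ.

```shell
# is_valid_marc_id ID
# Hypothetical check: succeed only if ID consists of exactly 10 digits,
# i.e. it can be sliced into five two-digit folder/file name chunks.
is_valid_marc_id() {
  case $1 in
    *[!0-9]*) return 1 ;;   # contains a non-digit character
  esac
  [ "${#1}" -eq 10 ]        # also rejects empty, too short and too long ids
}

id="0000342334"
if is_valid_marc_id "$id"; then
  echo "ok: $id"
else
  echo "skipping record with unusable id: $id" >&2
fi
```

Inside the loop of the script above, the id could be obtained with the same pipeline up to the first sed expression (`yaz-marcdump "$source" | grep ^001 | sed 's/^....//'`) and records failing the check could be skipped or logged.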
4. Configure RecordDriver SolrMarcRemote
Turn on the appropriate setting in the [Record] section of config.ini:
remote_marc_url = http://your.marc.record.server/%s
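To verify the configured pattern by hand, the URL for a single record can be built and requested from the command line. The sketch below assumes that the `%s` placeholder stands for the record id; `marc_url` and `check_remote_record` are hypothetical helper names, and the host is the placeholder from the setting above.

```shell
# marc_url PATTERN ID
# Hypothetical helper: substitute the record id for %s in the URL pattern.
# The pattern is used as a printf format string, so it should contain
# exactly one %s and no other % characters.
marc_url() {
  printf "$1\n" "$2"
}

# check_remote_record URL
# Succeeds if the server answers the GET request with a non-error status.
check_remote_record() {
  curl --fail --silent --show-error --output /dev/null "$1"
}

marc_url 'http://your.marc.record.server/%s' 0000342334
# prints http://your.marc.record.server/0000342334
```

With a running MARC server, `check_remote_record "$(marc_url "$pattern" "$id")"` can then be used to confirm that a given record is actually reachable.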
Conclusion
This setup should have reduced the size of your Solr index and populated your additional http-server with the raw MARC-files. VuFind should load all records flawlessly, pulling the raw MARC-files from the additional http-server whenever they are needed. If you have questions regarding this setup, please feel free to contact us: team@finc.info