Remote MARC Records

Move the MARC fullrecord field out of the Solr index to an external service using the RecordDriver SolrMarcRemote

IMPORTANT: This page refers to a feature added in VuFind 2.5.

Introduction

We (finc-team at Leipzig University Library) struggled with a huge index for quite a while and mitigated the problem by moving the index's MARC-fullrecord field to a dedicated binary-server. This reduced the index's size by ~25%.

As the fullrecord-field is not indexed, we concluded that it could be removed from the index and made available through a binary-server instead. Our setup stores the indexed .mrc-files in a simple folder-structure of two-digit folder-names derived from the record-id of each .mrc-file (our record-ids are sequential and digits-only):

e.g. /00/00/34/23/34.mrc
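
The mapping from record-id to file path can be sketched as a small shell function. This is only an illustration, assuming a digits-only id of even length; the function name id_to_path is hypothetical, and the sed expressions are the same ones used by the proof-of-concept script in step 3 of the Setup-Guide. The result is a relative path without the leading slash.

```shell
#!/bin/sh
# Hypothetical helper: turn a record-id into the relative path of its .mrc-file.
# Assumes a digits-only id with an even number of digits (10 in our setup).
id_to_path() {
    # insert a slash after every two characters,
    # then replace the trailing slash by the extension .mrc
    printf '%s\n' "$1" | sed -e 's#\(..\)#\1/#g' -e 's#/$#.mrc#'
}

id_to_path 0000342334
```

For the record-id 0000342334 this prints 00/00/34/23/34.mrc, which matches the example path above.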

The .mrc-files are served via HTTP GET requests. We extended the SolrMarc-RecordDriver with a method that fetches the binary .mrc-file from a URL configured in the [Record] section of config.ini if the fullrecord-field is empty or not set in the current index. All the stock methods of the SolrMarc-RecordDriver for parsing binary MARC-data still work and are used by the new RecordDriver SolrMarcRemote.

Therefore, if you need to reduce the size of your Solr-Index, moving the MARC-fullrecord to a remote service might be a solution for you.
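
The fallback described above can be illustrated with a minimal shell sketch. It is not the code of SolrMarcRemote (which is part of VuFind and written in PHP), only a model of the decision it makes. The function names get_marc and fetch_remote are hypothetical, and the sketch assumes that the %s placeholder in the configured URL is replaced by the record-id.

```shell
#!/bin/sh
# Hypothetical fetcher: download the raw MARC-file from the given URL.
# -f makes curl fail on HTTP errors, -sS hides progress but keeps error messages.
fetch_remote() {
    curl -fsS "$1"
}

# Print the MARC-data for one record.
#   $1 = content of the fullrecord-field (may be empty)
#   $2 = record-id
#   $3 = URL template containing %s (assumption: %s stands for the record-id)
get_marc() {
    if [ -n "$1" ]; then
        # the index still holds the full record, use it directly
        printf '%s' "$1"
    else
        # fullrecord is empty or not set, fall back to the remote service
        fetch_remote "$(printf "$3" "$2")"
    fi
}
```

A call such as get_marc "" 0000342334 "http://your.marc.record.server/%s" would request http://your.marc.record.server/0000342334, while a non-empty first argument is returned unchanged without any HTTP request.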

Prerequisites

In this guide it is assumed that

  • your server operating system is Linux
  • you have an additional http-server running which will be used for serving the MARC-files (e.g. nginx)
  • you have direct access to the filesystem of the http-server serving the MARC-files at index-time
  • your MARC-records reside in one single .mrc file
  • your unique identifier for the records (and for the MARC-files) consists of 10 digits (those will be sliced into chunks of two digits being used for the folder-structure) - if your unique identifier for your MARC-records differs from ours you will need to adjust the folder-structure and slicing logic accordingly

Setup-Guide

1. Setup the remote service providing MARC-files

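We use nginx as our http-server. The relevant parts of our nginx config look like this (excerpt only: user and worker_processes are top-level directives, while the location block sits inside a server block):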

user              nginx;
worker_processes  4;
location / {
  rewrite "^/([0-9]{2})([0-9]{2})([0-9]{2})([0-9]{2})([0-9]{2})" /$1/$2/$3/$4/$5.mrc;
  root   /var/marc;
}

The corresponding files must be placed at /var/marc/$1/$2/$3/$4/$5.mrc, e.g. /var/marc/00/00/34/23/34.mrc for a request to /0000342334.

2. Modify marc_local.properties

Set in your marc_local.properties:

fullrecord = ""
record_format = "marcremote"

NOTE: In VuFind 5.x and earlier, use “recordtype” in place of “record_format.”

This will prevent import-marc.sh from loading the MARC-Record into the Solr-field fullrecord and mark the Solr-records as the type “marcremote” in order to load the correct RecordDriver in VuFind.

3. Populate remote service with MARC-files during indexing

The following shell-script (linux bash) is a proof-of-concept that needs to be adapted to your needs (e.g. your unique identifier/folder-structure) and should be executed before, after or during (as you wish) import-marc.sh:

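#!/bin/bash
# usage: pass the MARC-file containing all records as $1
# the target paths are relative, so run this from the directory that should hold the folder-structure
tmpfix="/tmp/singlemarcprefix_$(date +%F)_"
# split the input into one file per record, each file name starting with $tmpfix
yaz-marcdump -s "$tmpfix" "$1" > /dev/null
for source in "$tmpfix"*
do
    # skip the unexpanded pattern if no split files were created
    [ -e "$source" ] || continue
    # extract MARC 001 (pos 4+), insert slash after every other char (..), replace trailing slash by extension .mrc
    target=$(yaz-marcdump "$source" | grep '^001' | sed -e 's/^....//' -e 's#\(..\)#\1/#g' -e 's#/$#.mrc#')
    # create target directory path (up to last slash)
    mkdir -p "$(dirname "$target")"
    # rename/move marc file
    mv "$source" "$target"
done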

This script

  • uses yaz-marcdump to extract your MARC-records from the single MARC-file containing several records
  • uses sed, mkdir and mv to create the folder-structure
  • does not check whether the id found in MARC-001 is appropriate for the folder-structure
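
If you want to add the missing check, a guard along the following lines could be called on the id before the target path is built. It is a minimal sketch assuming the 10-digit identifier from the prerequisites; the function name is_valid_id is hypothetical and the pattern has to be adjusted if your identifiers look different.

```shell
#!/bin/sh
# Hypothetical guard: succeed only if the id consists of exactly 10 digits,
# which is what the two-digit folder-structure in this guide expects.
is_valid_id() {
    printf '%s\n' "$1" | grep -Eq '^[0-9]{10}$'
}

# Example: report ids that do not fit instead of creating odd paths for them.
for id in 0000342334 12345 00003423AB
do
    if is_valid_id "$id"; then
        echo "ok: $id"
    else
        echo "skipped: $id" >&2
    fi
done
```

Inside the loop of the script above, records failing such a check could be logged and skipped rather than moved.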

4. Configure RecordDriver SolrMarcRemote

Turn on the appropriate setting in the [Record] section of config.ini:

remote_marc_url = http://your.marc.record.server/%s

Conclusion

This setup should have reduced the size of your Solr index and populated your additional http-server with the raw MARC-files. VuFind should load all records without problems, pulling the raw MARC-files from the additional http-server whenever they are needed. If you have questions regarding this setup, please feel free to contact us: team@finc.info

configuration/remote_marc_records.txt · Last modified: 2019/02/04 20:36 by demiankatz