====== Remote MARC Records ======

Swap the MARC fullrecord out to an external service using the RecordDriver SolrMarcRemote.

===== Introduction =====

We (the finc team at Leipzig University Library) struggled with a huge index for quite a while and mitigated the problem by moving the index's MARC fullrecord field to a dedicated binary server, which reduced the index size by ~25%. Because the fullrecord field is not indexed, we concluded that it could be removed from the index and made available through a binary server instead.

In our setup, the indexed .mrc files are stored in a simple folder structure made up of two-digit folder names derived from each .mrc file's record ID (our record IDs are sequential and digits-only), e.g. /00/00/34/23/34.mrc. The .mrc files are served via HTTP GET requests.

We extended the SolrMarc RecordDriver with a method that fetches the binary .mrc file from a URL configured in the [Record] section of config.ini whenever the fullrecord field is empty or not set in the current index. All stock methods of the SolrMarc RecordDriver for parsing binary MARC data still work and are used by the new RecordDriver SolrMarcRemote.

So if you need to reduce the size of your Solr index, moving the MARC fullrecord to a remote service might be a solution for you.

===== Prerequisites =====

This guide assumes that

  * your server operating system is Linux
  * you have an additional HTTP server running which will be used for serving the MARC files (e.g. nginx)
  * you have direct access to the filesystem of that HTTP server at index time
  * your MARC records reside in one single .mrc file
  * your unique identifier for the records (and for the MARC files) consists of 10 digits, which are sliced into chunks of two digits that form the folder structure. If your identifier differs from ours, you will need to adjust the folder structure and the slicing logic accordingly.

===== Setup Guide =====

==== 1. Set up the remote service providing MARC files ====

We use nginx as our HTTP server. The relevant parts of our nginx config look like this (an excerpt; the location block belongs inside a server block):

<code>
user nginx;
worker_processes 4;

location / {
    rewrite "^/([0-9]{2})([0-9]{2})([0-9]{2})([0-9]{2})([0-9]{2})" /$1/$2/$3/$4/$5.mrc;
    root /var/marc;
}
</code>

The corresponding files are placed in /var/marc/$1/$2/$3/$4/$5.mrc, so a request for /0000342334 is answered with /var/marc/00/00/34/23/34.mrc.

==== 2. Modify marc_local.properties ====

Set the following in your marc_local.properties:

<code>
fullrecord = ""
record_format = "marcremote"
</code>

This prevents import-marc.sh from loading the MARC record into the Solr field fullrecord, and it marks the Solr records as type "marcremote" so that VuFind® loads the correct RecordDriver.

==== 3. Populate the remote service with MARC files during indexing ====

The following shell script (Linux bash) is a proof of concept that needs to be adapted to your needs (e.g. your unique identifier and folder structure). Run it before, after or during import-marc.sh, as you wish:

<code bash>
#!/bin/bash
# $1 is the single .mrc file containing all records.
# Target paths are relative to the current directory, so run this from the
# folder the HTTP server uses as root (/var/marc in step 1).
tmpfix="/tmp/singlemarcprefix_$(date +%F)_"

# split the input into one file per record
yaz-marcdump -s "$tmpfix" "$1" > /dev/null

for source in "$tmpfix"*
do
    # extract MARC 001 (pos 4+), insert slash after every other char (..),
    # replace trailing slash by extension .mrc
    target=$(yaz-marcdump "$source" | grep '^001' | sed -e 's/^....//' -e 's#\(..\)#\1/#g' -e 's#/$#.mrc#')
    # create target directory path (up to last slash)
    mkdir -p "$(dirname "$target")"
    # rename/move marc file
    mv "$source" "$target"
done
</code>

This script

  * uses yaz-marcdump to split the single MARC file containing several records into one file per record
  * uses sed, mkdir and mv to create the folder structure
  * does not check whether the ID found in MARC 001 is appropriate for the folder structure

==== 4. Configure RecordDriver SolrMarcRemote ====

Turn on the appropriate setting in the [Record] section of [[configuration:files:config.ini]]:

<code>
remote_marc_url = http://your.marc.record.server/%s
</code>

===== Conclusion =====

This setup should have reduced the size of your Solr index and populated your additional HTTP server with the raw MARC files. VuFind® should load all records as before, fetching the raw MARC files from the additional HTTP server whenever they are needed.

If you have questions regarding this setup, please feel free to contact us:
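
===== Appendix: Checking record IDs before moving files =====

The script in step 3 does not check whether the ID found in MARC 001 fits the folder structure. The following is a minimal sketch of how such a check might look, assuming the 10-digit IDs described in the prerequisites. The function name ''marc_path_for_id'' is hypothetical and not part of the original script or of VuFind®; adapt the pattern and the slicing if your identifiers differ.

```shell
#!/bin/bash
# Hypothetical helper (an illustration, not part of the original script):
# map a record ID to its relative .mrc path and reject IDs that do not
# fit the assumed 10-digit layout.
marc_path_for_id() {
    local id="$1"
    # Assumption: IDs consist of exactly 10 digits (see Prerequisites).
    if [[ ! "$id" =~ ^[0-9]{10}$ ]]; then
        echo "unsuitable record id: '$id'" >&2
        return 1
    fi
    # Slice the ID into five two-digit chunks; the last chunk is the file name.
    echo "${id:0:2}/${id:2:2}/${id:4:2}/${id:6:2}/${id:8:2}.mrc"
}
```

Called as ''marc_path_for_id 0000342334'', the function prints ''00/00/34/23/34.mrc'', which matches the example path from the introduction. For any other input it prints a message to stderr and returns a non-zero status, so a loop like the one in step 3 could skip the record instead of moving it to an unexpected location.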