Warning: This page has not been updated in over a year and may be outdated or deprecated.
configuration:remote_marc_records

remote_marc_records [2015/05/13 13:02] – created demiankatz
configuration:remote_marc_records [2019/02/04 20:36] – [2. Modify marc_local.properties] demiankatz
 ====== Remote MARC Records ======
 +Move the MARC fullrecord field to an external service using the SolrMarcRemote RecordDriver.
 +
 +// IMPORTANT: This page refers to a feature added in VuFind 2.5. //
 +
 +===== Introduction =====
 +
 +We (the finc team at Leipzig University Library) struggled with a huge index for quite a while. We mitigated the problem by moving the index's MARC fullrecord field to a dedicated binary server, which reduced the index size by about 25%.
 +
 +As the fullrecord field is not indexed, we concluded that it could be removed from the index and served by a binary server instead. Our setup stores the indexed .mrc files in a simple folder structure of two-digit folder names derived from each file's record ID (our record IDs are sequential and consist of digits only):
 +
 +e.g. the record with the ID 0000342334 is stored as /00/00/34/23/34.mrc
 +
 +The .mrc files are served via HTTP GET requests. We extended the SolrMarc RecordDriver with a method that fetches the binary .mrc file from a URL configured in the [Record] section of config.ini whenever the fullrecord field is empty or not set in the current index.
 +All the stock methods of the SolrMarc RecordDriver for parsing binary MARC data still work and are used by the new SolrMarcRemote RecordDriver.
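The fallback described above can be sketched in a few lines of shell. This is only an illustration of the decision logic, not the code of the SolrMarcRemote RecordDriver: the function name `load_marc` and its two arguments are assumptions made for this example.

```shell
#!/bin/bash
# Hypothetical helper (not part of VuFind): print a record's raw MARC data.
#   $1 = value of the Solr fullrecord field (may be empty)
#   $2 = URL of the record's .mrc file on the remote MARC server
load_marc() {
    local fullrecord="$1"
    local url="$2"
    if [ -n "$fullrecord" ]; then
        # the index still carries the record, so no remote request is needed
        printf '%s' "$fullrecord"
    else
        # fall back to an HTTP GET against the remote MARC server;
        # -f fails on HTTP errors, -sS hides progress but keeps error messages
        curl -fsS "$url"
    fi
}
```

Called with an empty first argument, the helper requests the file from the given URL; with a non-empty first argument it never touches the network.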
 +
 +Therefore, if you need to reduce the size of your Solr index, moving the MARC fullrecord to a remote service might be a solution for you.
 +
 +===== Prerequisites =====
 +
 +This guide assumes that
 +
 +  * your server operating system is Linux
 +  * you run an additional HTTP server that will serve the MARC files (e.g. nginx)
 +  * you have direct access to that HTTP server's filesystem at index time
 +  * your MARC records reside in one single .mrc file
 +  * the unique identifier of your records (and of the MARC files) consists of 10 digits, which are sliced into two-digit chunks that form the folder structure; if your identifiers differ from ours, you will need to adjust the folder structure and slicing logic accordingly
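The slicing mentioned in the last assumption can be sketched as a small shell function. The name `id_to_path` is hypothetical and the ten-digit check reflects our identifier format only; adapt both if your identifiers look different.

```shell
#!/bin/bash
# Hypothetical helper: turn a ten-digit record ID into its relative .mrc path.
id_to_path() {
    local id="$1"
    # reject anything that is not exactly ten digits
    case "$id" in
        ''|*[!0-9]*)
            echo "unsupported record ID: $id" >&2
            return 1 ;;
    esac
    if [ "${#id}" -ne 10 ]; then
        echo "unsupported record ID: $id" >&2
        return 1
    fi
    # insert a slash after every two characters, then turn the trailing
    # slash into the .mrc extension
    printf '%s\n' "$id" | sed -e 's#\(..\)#\1/#g' -e 's#/$#.mrc#'
}
```

For the ID 0000342334 this prints 00/00/34/23/34.mrc, i.e. the path from the introduction without the leading slash.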
 +
 +===== Setup-Guide =====
 +
 +==== 1. Set up the remote service providing MARC-files ====
 +
 +We use nginx as our HTTP server; the relevant part of our nginx configuration looks like this:
 +
 +<code>
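 +user              nginx;
 +worker_processes  4;
 +events {}
 +http {
 +  server {
 +    location / {
 +      # map e.g. /0000342334 to /00/00/34/23/34.mrc below the document root
 +      rewrite "^/([0-9]{2})([0-9]{2})([0-9]{2})([0-9]{2})([0-9]{2})" /$1/$2/$3/$4/$5.mrc;
 +      root   /var/marc;
 +    }
 +  }
 +}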
 +</code>
 +
 +The corresponding files must therefore be placed at /var/marc/$1/$2/$3/$4/$5.mrc, where $1 to $5 are the five two-digit chunks of the record ID.
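To preview which file a request will be mapped to without touching the server, the rewrite rule can be imitated with sed. This is a rough sketch: `preview_rewrite` is a hypothetical helper, the default root /var/marc is taken from the configuration above, and the trailing `.*` imitates the fact that nginx replaces the whole URI once the pattern matches.

```shell
#!/bin/bash
# Hypothetical helper: print the file path nginx would serve for a request URI.
#   $1 = request URI, e.g. /0000342334
#   $2 = document root (optional, defaults to /var/marc)
preview_rewrite() {
    local uri="$1"
    local root="${2:-/var/marc}"
    local rewritten
    # same pattern as the nginx rewrite; URIs that do not match stay unchanged
    rewritten=$(printf '%s\n' "$uri" |
        sed -E 's#^/([0-9]{2})([0-9]{2})([0-9]{2})([0-9]{2})([0-9]{2}).*#/\1/\2/\3/\4/\5.mrc#')
    printf '%s%s\n' "$root" "$rewritten"
}
```

Running `preview_rewrite /0000342334` prints /var/marc/00/00/34/23/34.mrc, which is where the file for that record has to be stored.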
 +
 +==== 2. Modify marc_local.properties ====
 +
 +Set in your marc_local.properties:
 +
 +<code properties>
 +fullrecord = ""
 +record_format = "marcremote"
 +</code>
 +
 +:!: In VuFind 5.0 and earlier, use "recordtype" in place of "record_format".
 +
 +This prevents import-marc.sh from loading the MARC record into the Solr field fullrecord and marks the Solr records as type "marcremote", so that VuFind loads the correct RecordDriver.
 +
 +==== 3. Populate remote service with MARC-files during indexing ====
 +
 +The following shell script (Linux bash) is a proof of concept that needs to be adapted to your needs (e.g. your unique identifier and folder structure). It can be executed before, after or during import-marc.sh, as you wish:
 +
 +<code bash>
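 +#!/bin/bash
 +# $1: MARC file containing all records. Target paths are relative,
 +# so run this from the directory your HTTP server serves (e.g. /var/marc).
 +tmpfix="/tmp/singlemarcprefix_$(date +%F)_"
 +# split the input into temporary per-record files
 +yaz-marcdump -s "$tmpfix" "$1" > /dev/null
 +for source in "$tmpfix"*
 +do
 +    # skip the unexpanded pattern if no file was produced
 +    [ -e "$source" ] || continue
 +    # extract MARC 001 (pos 4+), insert slash after every other char (..), replace trailing slash by extension .mrc
 +    # (the blank in '^001 ' keeps grep from matching a leader line that starts with 001)
 +    target=$(yaz-marcdump "$source" | grep '^001 ' | sed -e 's/^....//' -e 's#\(..\)#\1/#g' -e 's#/$#.mrc#')
 +    # create target directory path (up to last slash)
 +    mkdir -p "$(dirname "$target")"
 +    # rename/move marc file
 +    mv "$source" "$target"
 +done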
 +</code>
 +
 +This script
 +
 +  * uses yaz-marcdump to extract your MARC records from the single MARC file containing several records
 +  * uses sed, mkdir and mv to create the folder structure and move each record into place
 +  * does not check whether the ID found in MARC 001 is appropriate for the folder structure
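The missing check from the last point could be added along the following lines. This is a sketch under assumptions: `valid_id_from_dump` is a hypothetical helper, it expects yaz-marcdump's line output on stdin with the ID on a line of the form `001 <id>` (as the sed expression in the script implies), and it accepts ten-digit IDs only.

```shell
#!/bin/bash
# Hypothetical helper: read yaz-marcdump's line output on stdin and print the
# 001 value, but only if it consists of exactly ten digits.
valid_id_from_dump() {
    local id
    # take the first 001 line and drop the tag plus the following blank
    id=$(grep '^001 ' | head -n 1 | sed 's/^....//')
    case "$id" in
        [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9])
            printf '%s\n' "$id" ;;
        *)
            echo "skipping record with unexpected 001 value: '$id'" >&2
            return 1 ;;
    esac
}
```

Inside the loop it might be used as `id=$(yaz-marcdump "$source" | valid_id_from_dump) || continue`, so that records with unsuitable IDs stay in /tmp for inspection instead of being moved to a broken path.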
 +
 +==== 4. Configure RecordDriver SolrMarcRemote ====
 +
 +Set remote_marc_url in the [Record] section of [[configuration:files:config.ini]]; the %s placeholder is replaced by the record ID:
 +
 +<code ini>
 +remote_marc_url = http://your.marc.record.server/%s
 +</code>
 +
 +===== Conclusion =====
 +
 +With this setup, your Solr index should be smaller and your additional HTTP server should hold the raw MARC files. VuFind should load all records as before, pulling the raw MARC files from the additional HTTP server whenever they are needed.
 +If you have questions regarding this setup, please feel free to contact us: <team@finc.info>
 +
configuration/remote_marc_records.txt · Last modified: 2023/11/09 19:13 by demiankatz