VuFind: The library OPAC meets Web 2.0


 

Automation

Once VuFind is running, it still takes some work to keep it up to date and stable. The exact details of VuFind automation will vary significantly based on your ILS and operating system, but the common goals are:

  • Update VuFind's index with the latest changed and added records from your ILS. See the MARC Export Notes for help with this.
  • Remove deleted and suppressed records from VuFind's index. VuFind is packaged with tools in the util directory that can help with this, but functionality may be limited depending on the capabilities of your ILS.
  • Make sure the Solr index is regularly optimized for minimal space usage and improved performance. The provided tools for dealing with deleted and suppressed records should help with this, plus there is a stand-alone optimize tool in the util directory if you choose not to use the other utilities.
  • Dump the MySQL database for backup purposes.
  • Periodically restart Solr – if it runs for extremely long periods of time, like any complex piece of software, it may become unstable.

The remainder of this page shows specific solutions to these goals. Feel free to add your own ideas to this page if you wish. You may not be able to use these solutions exactly as-is, but hopefully the ideas included will help you get things running the way you want.

General Advice

Using cron

If you need to automate tasks under Linux, you will want to become familiar with the cron process. This allows you to schedule programs to run at specific intervals, either as root or as a particular user. The details of using cron are beyond the scope of this document, but there are many helpful resources available elsewhere. The Wikipedia page on the subject is probably a good starting point.

cron-friendly VuFind script

If you try to run the standard vufind.sh script to restart VuFind from within cron, you may run into problems. The context in which VuFind runs may be wrong, and the attempted output to the TTY may cause unexpected failures. If you have trouble, try adding this script (you can call it vufind_cron.sh) to your VuFind home directory:

#!/bin/sh
 
# Disable JETTY_CONSOLE output -- it causes problems when run by cron:
export JETTY_CONSOLE=/dev/null
 
# Pass parameters along to vufind.sh:
CURRENTPATH=$(cd "$(dirname "$0")" && pwd)
cd "$CURRENTPATH" || exit 1
"$CURRENTPATH/vufind.sh" "$@"

Important: Note that because of the way CURRENTPATH is determined, this script only works if it is in the same directory as vufind.sh!
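To tie cron and the wrapper script together, here is a minimal sketch of a helper that prints a sample crontab. The job script names (mysql_backup.sh, update_index.sh), the cron subdirectory, and the times are hypothetical placeholders, not part of VuFind. Only vufind_cron.sh comes from this page.

```shell
#!/bin/sh
# Print a sample crontab for the jobs described on this page.
# All paths, script names and times are hypothetical placeholders.
# $1 = VuFind home directory (defaults to /usr/local/vufind)
print_vufind_crontab() {
    vufind_home=${1:-/usr/local/vufind}
    cat <<EOF
# minute hour day-of-month month day-of-week command
# 01:00 nightly: back up the MySQL database
0 1 * * * $vufind_home/cron/mysql_backup.sh
# 03:00 nightly: fetch and import the latest MARC records
0 3 * * * $vufind_home/cron/update_index.sh
# 05:00 every Sunday: restart VuFind
0 5 * * 0 $vufind_home/vufind_cron.sh restart
EOF
}
```

You could run print_vufind_crontab, review the output, and paste the lines you want into the editor opened by crontab -e. Be careful with installing the output directly through the crontab command, since that replaces the user's existing crontab.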

Why restart VuFind?

The main reason to regularly restart VuFind is to ensure system stability. Due to the way Java garbage collection works, Solr can eventually run out of memory and stop responding; periodic restarts are an easy way to avoid this problem. If you want to automatically detect garbage collection problems in a more sophisticated way, you can try this solution suggested by Erik Mitchell:

1. Modify the JAVA_OPTIONS to include Garbage Collector logging (Xloggc)

JAVA_OPTIONS="-server -Xms1024m -Xmx3800m -XX:+UseParallelGC -XX:NewRatio=5 -Xloggc:/var/log/vufind/gc.log"

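2. Create a script that searches this log for the word Full, and schedule it with cron. If the word appears, the JVM has had to run a full garbage collection, which this approach treats as a sign that memory is running out and VuFind will eventually stop responding.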

#!/bin/bash
# Restart VuFind if the GC log contains the word "Full":
if grep -q 'Full' /var/log/vufind/gc.log; then
    /usr/local/vufind/vufind_cron.sh restart >> /home/vufind/cron.log 2>&1
fi

(File paths may need to be adjusted to match your own system).

Using mysqldump

If you want to back up VuFind's MySQL database, the mysqldump tool that comes with MySQL itself makes this pretty simple. You can see the tool's man page for all the details. Here is a sample script for nightly backups via cron:

#!/bin/sh
WORKDIR=/usr/local/vufind/mysqldump
DATE=`date '+%y%m%d'`
MYSQLUSER=root
MYSQLPASS=password
VUFIND_DB=vufind
 
# Dump MySQL database to disk
/usr/bin/mysqldump -l --default-character-set=utf8 --password=$MYSQLPASS -u$MYSQLUSER $VUFIND_DB >$WORKDIR/vufind_mysql_dump.$DATE
 
# Compress the dump to save space
gzip -f $WORKDIR/vufind_mysql_dump.$DATE
exit

Important: The backup will cause your database to lock while it is running, so you should schedule it for a time of low activity.
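A backup is only useful if the file is intact. As a minimal sketch, the hypothetical helper below (verify_dump is not part of VuFind or MySQL) checks that a compressed dump exists, is not empty, and passes gzip's built-in integrity test. It does not prove that the SQL inside is complete.

```shell
#!/bin/sh
# Return 0 if the given compressed dump exists, is non-empty and passes
# gzip's integrity test (gzip -t); return 1 otherwise.
# $1 = path to the .gz file produced by the backup script
verify_dump() {
    dump=$1
    if [ ! -s "$dump" ]; then
        echo "Backup missing or empty: $dump" >&2
        return 1
    fi
    if ! gzip -t "$dump" 2>/dev/null; then
        echo "Backup is not a valid gzip file: $dump" >&2
        return 1
    fi
    return 0
}
```

In the sample script above, one option would be to call verify_dump "$WORKDIR/vufind_mysql_dump.$DATE.gz" || exit 1 just before the final exit, so that cron reports a failure when the dump is damaged.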

Automation with Voyager and Linux

The following steps list a simplified version of the process used by Villanova to maintain its VuFind installation. These are intended as guidelines only, and you will want to customize the provided scripts for your own needs. In particular, better error checking is important so you can figure out when things go wrong. You may wish to add code to send emails when conditions are not met. You may wish to add “echo” statements at various points to output status messages and then redirect the output of the scripts to a log file for later analysis. None of this logic is included here since your needs may vary, and the point is to show the general processes needed by VuFind.

Also note that some of these cron jobs will put a heavy load on your VuFind server. You should schedule them to run at times of low usage if at all possible!
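The status messages and error checking suggested above can be sketched as two small helper functions. This is only an illustration: the names log and run_step and the LOGFILE default are assumptions, not something VuFind provides.

```shell
#!/bin/sh
# LOGFILE is a hypothetical location; change it to suit your system.
LOGFILE=${LOGFILE:-/tmp/vufind_cron.log}

# Append a timestamped message to the log file.
log() {
    echo "$(date '+%Y-%m-%d %H:%M:%S') $*" >> "$LOGFILE"
}

# Run a command, capture its output in the log, record whether it
# succeeded, and return the command's exit status.
# $1 = description of the step, remaining arguments = the command
run_step() {
    description=$1
    shift
    log "START: $description"
    if "$@" >> "$LOGFILE" 2>&1; then
        log "OK: $description"
        return 0
    else
        status=$?
        log "FAILED ($status): $description"
        return "$status"
    fi
}
```

A script could then wrap each stage, for example run_step "remove suppressed records" /usr/bin/php suppressed.php || exit 1. If your server has a working mail command, the failure branch is also a natural place to send an alert to an administrator.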

Updating VuFind's Index with the Latest MARC Records

Getting MARC records from Voyager into VuFind requires configuration on two servers – your Voyager server AND your VuFind server. The Voyager processes dump out the changed MARC records, while the VuFind processes read them in and update VuFind's index.

Voyager Configuration

You will want to set up this script to run regularly by cron:

#!/bin/sh
 
# Note: The path below will vary from system to system:
exporter=/m1/voyager/yourdb/sbin/Pmarcexport
file=catalog_INCR
PROG="$exporter -o $file -rB -mB -t"
YEST=`/usr/local/bin/gdate -d yesterday +%F`
TODAY=`/usr/local/bin/gdate +%F`
 
# Dump the latest MARC records
$PROG$YEST:$TODAY
 
# Wait a second for things to settle down
/usr/bin/sleep 1
 
# Move the file into position to be retrieved by the VuFind server
chmod 777 $file
mv $file /scratch/$file-`date +%Y%m%d`.mrc
 
# now exit gracefully...
exit 0

VuFind Configuration

The following script needs to be run by cron on the VuFind server, at a later time than the Voyager script so that the records have had enough time to export. It should be run as the user that controls VuFind.

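Note that you will need to customize some variables in the script to match your environment. The script also relies on the VUFIND_HOME environment variable; cron jobs usually run with a minimal environment, so make sure it is defined in the script or in your crontab. You may also wish to add some error checks and e-mail alerts to administrators to detect unexpected failures in the process.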

In order for this script to work, it is also necessary to set up SSH keys so that the VuFind server can complete the SFTP operation without being prompted for a password. (This article may help).

#!/bin/bash
 
# This is the directory on the Voyager server where MARC records may be found:
DIR="/scratch/"
 
# This is the filename of the MARC records on the Voyager server:
VOYRECS=catalog_INCR-`date +%Y%m%d`.mrc
 
# This is the directory on the VuFind server where MARC records will be downloaded:
LDIR="/tmp/marc_data"
 
# This is the user and server name for the Voyager server:
VOYUSER=vufind
VOYSERVER=voyager.myinstitution.edu
 
# Open sftp session and start the transfer
/usr/bin/sftp $VOYUSER@$VOYSERVER <<EOF>/dev/null
		cd $DIR
		lcd $LDIR
		mget $VOYRECS
		quit
 
EOF
# done
 
# Import the records to VuFind:
IMPORT_SCRIPT=import-marc.sh
$VUFIND_HOME/$IMPORT_SCRIPT $LDIR/$VOYRECS
 
# Remove suppressed records and optimize index:
cd $VUFIND_HOME/util
/usr/bin/php suppressed.php
 
exit
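One error check the script above leaves out is confirming that the download actually produced something before the import starts. A minimal sketch follows; the function name import_ready is hypothetical, and it assumes import-marc.sh lives directly in the VuFind home directory, as in the script above.

```shell
#!/bin/sh
# Return 0 only if the import can reasonably be attempted.
# $1 = VuFind home directory, $2 = downloaded MARC file
import_ready() {
    vufind_home=$1
    marc_file=$2
    if [ -z "$vufind_home" ] || [ ! -f "$vufind_home/import-marc.sh" ]; then
        echo "import-marc.sh not found in '$vufind_home' (is VUFIND_HOME set?)" >&2
        return 1
    fi
    if [ ! -s "$marc_file" ]; then
        echo "MARC file missing or empty: $marc_file" >&2
        return 1
    fi
    return 0
}
```

In the script above, this could be called as import_ready "$VUFIND_HOME" "$LDIR/$VOYRECS" || exit 1 immediately before the import line. An empty file may simply mean no records changed that day, so you may prefer to exit quietly in that case rather than treat it as an error.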

Removing Deleted and Suppressed Records

Suppressed records are already taken care of by the scripts above. Deleted records require a bit more scripting. Voyager keeps a MARC file of all deleted records, and VuFind comes with a utility that reads a MARC file and deletes every record listed in it. All that remains is to automate downloading the deleted record list and running the utility on it. A sample script that can be run by cron on your VuFind server follows.

Notes:

  • As with the previous script, you need to have SSH keys set up so that SFTP can transfer files automatically.
  • You'll need to customize some variables to match your local setup.
  • Be sure you run this script a significant amount of time AFTER the update script listed above. Both the update's suppressed cleanup and this script will optimize your Solr index. You don't want to have two optimize actions going on at the same time!

#!/bin/bash
 
# Directory and filename on the Voyager server containing your deleted records file:
DIR="/m1/voyager/yourdb/rpt"
FILE="deleted.bib.marc"
 
# Directory on your VuFind server to download the file:
LDIR="/tmp/marc_data"
 
# This is the user and server name for the Voyager server:
VOYUSER=vufind
VOYSERVER=voyager.myinstitution.edu
 
# Download the deleted records:
cd $LDIR
/usr/bin/sftp $VOYUSER@$VOYSERVER <<EOF>/dev/null
                cd $DIR
                get $FILE
                quit
EOF
 
# Remove deleted records from VuFind's index:
cd $VUFIND_HOME/util
/usr/bin/php deletes.php $LDIR/$FILE
 
exit
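Spacing the two jobs apart in cron works as long as neither runs long. If you want a stronger guarantee that two optimize actions never overlap, one common approach is a shared lock. The sketch below uses a lock directory, because mkdir fails if the directory already exists. The names LOCKDIR, acquire_lock and release_lock are assumptions for illustration only.

```shell
#!/bin/sh
# LOCKDIR is a hypothetical location; both scripts must use the same path.
LOCKDIR=${LOCKDIR:-/tmp/vufind_index.lock}

# Try to take the lock. mkdir is atomic: it fails if the directory
# already exists, which means another job currently holds the lock.
acquire_lock() {
    mkdir "$LOCKDIR" 2>/dev/null
}

# Release the lock so the next job can run.
release_lock() {
    rmdir "$LOCKDIR" 2>/dev/null
}

# Typical use near the top of each indexing script:
#   acquire_lock || { echo "Another index job is running" >&2; exit 1; }
#   trap release_lock EXIT
```

If a script is killed before it can release the lock, the directory is left behind and has to be removed by hand before the next run.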
 
automation.txt · Last modified: 2011/05/20 08:25 by demiankatz
 