VuFind: The library OPAC meets Web 2.0


 

MARC Records

VuFind was initially designed with the MARC bibliographic record format in mind, though additional formats are supported through the use of Record Drivers starting with version 1.0. For general information on MARC, see Understanding MARC Bibliographic from the Library of Congress. The Code4Lib Working with MARC page also provides some useful tools.

Importing Records

VuFind comes packaged with the SolrMarc tool for importing MARC records. Follow these steps to take advantage of it.

1. Export the Records

Before you can load the records into VuFind, you need to get them out of your Integrated Library System (ILS). If you are just testing VuFind, you can also download sample records from sources listed lower on this page.

Every ILS has a different procedure for exporting records, and detailing all of them is beyond the scope of this document. Check your ILS documentation or talk to your vendor if you need help. You can also check the MARC Export Notes page to see if there are notes specific to your ILS; please consider adding to the page if you have knowledge to share. If you still need help, you can always ask on the mailing lists on the Support page – the VuFind community is always happy to help when it can.

Keep these notes in mind to ensure that your records can be imported without any problems:

  • Export your records in binary (ISO2709) MARC format, not human-readable ASCII. If for some reason you cannot export the records in binary form, you can use a tool like yaz-marcdump from the YAZ toolkit to convert one MARC format to another.
  • Make sure your resulting file has a ".mrc" extension. Most versions of SolrMarc require this extension, so it is good practice to use it to be on the safe side.
  • Each exported record must contain a unique identifier so that VuFind can tell it apart from the others. We recommend including your ILS's bibliographic record ID in the exported data for this purpose; you may need to add a special configuration option to your ILS's exporter to make this happen. VuFind's importer expects to find the unique ID in the 001 field, but you can customize this by editing the marc.properties file (for more details, see Customizing Import Mappings).
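
Some of the checks above can be run as a quick pre-flight test before importing. The following is a minimal sketch and is not part of VuFind: the function name check_marc_file is hypothetical, and the check relies only on the ISO 2709 convention that every binary record ends with a record terminator byte (octal 035). It confirms the .mrc extension and counts the records; it does not verify that each record carries a unique ID in the 001 field.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with VuFind): pre-flight check for an
# exported file. Prints the number of binary MARC records found, or
# fails with a message on standard error.
check_marc_file() {
    file=$1

    # SolrMarc generally expects a .mrc extension.
    case "$file" in
        *.mrc) ;;
        *) echo "error: $file does not end in .mrc" >&2; return 1 ;;
    esac

    [ -r "$file" ] || { echo "error: cannot read $file" >&2; return 1; }

    # Every ISO 2709 record ends with a record terminator (octal 035),
    # so counting that byte gives the number of records in the file.
    count=$(LC_ALL=C tr -cd '\035' < "$file" | wc -c | tr -d ' ')

    if [ "$count" -eq 0 ]; then
        echo "error: no record terminators in $file; it may not be binary MARC" >&2
        return 1
    fi
    echo "$count"
}
```

Calling check_marc_file your_records_file.mrc prints the record count, which can be compared with the number of records your ILS reported exporting.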

2. Configure the Importer

The import tool relies on settings in import/import.properties. If this is the first time you are indexing, make sure that file paths and URLs in this file are correct for your setup. For more details on what everything means, see the SolrMarc documentation.

3. Import the Records

To begin an import, follow the platform-specific instructions listed below. This may take hours or days for very large data sets!

Linux Method

Switch to your VuFind installation directory and run:

./import-marc.sh your_records_file.mrc

Note: In versions of VuFind prior to 1.0RC2, import-marc.sh was named import.sh.

Windows Method

Switch to your VuFind installation directory and run:

import-marc.bat your_records_file.mrc

Note: prior to VuFind 1.0RC2, import-marc.bat was not available, and it was necessary to run SolrMarc manually:

java -Xms256m -Xmx256m -Dsolr.core.name=biblio -Dsolrmarc.path=c:/vufind/import -Dsolr.path=c:/vufind/solr -Dmarc.path=c:/vufind/import/catdump.mrc -jar c:/vufind/import/dist/MarcImporter.jar c:/vufind/import/import.properties

(thanks to mike_beccaria)

Advanced Options

The following optional feature was introduced after the release of VuFind 1.0.1. If you want to take advantage of it without upgrading the rest of VuFind, you can download updated scripts from the trunk here.

In both Linux and Windows, you can use the optional "-p" switch to override SolrMarc's default import.properties file with a different file. For example:

./import-marc.sh -p /usr/local/vufind/import/custom.properties your_records.mrc

This may be useful if you need to import different sets of records using different mappings.
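
If several record sets each need their own mappings, the -p switch can be applied in a loop. This is a minimal sketch under a hypothetical naming convention in which each MARC file sits next to a properties file with the same base name (books.mrc beside books.properties). The function name and the IMPORTER override variable are illustrative and are not part of VuFind.

```shell
#!/bin/sh
# Hypothetical wrapper: import each .mrc file in a directory using the
# same-named .properties file beside it (books.mrc -> books.properties).
# IMPORTER can be pointed at a stub command for a dry run.
import_record_sets() {
    dir=$1
    importer=${IMPORTER:-./import-marc.sh}

    for marc in "$dir"/*.mrc; do
        [ -e "$marc" ] || continue      # the glob matched no files
        props="${marc%.mrc}.properties"
        if [ -f "$props" ]; then
            # Stop at the first failed import so problems are not buried.
            "$importer" -p "$props" "$marc" || return 1
        else
            echo "skipping $marc: no $props found" >&2
        fi
    done
}
```

Because the default importer path is relative, the function should be called from your VuFind installation directory, for example import_record_sets /data/exports (the directory name here is only an example).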

Importing Authority Records

Starting with VuFind 1.1, it is also possible to import authority records into VuFind's separate authority index (see the authority control page for more details). A special tool (import-marc-auth.sh under Linux, import-marc-auth.bat under Windows) is provided for this purpose. It works like the standard import-marc script, with three differences: the SolrMarc settings are found in import/import_auth.properties; the default MARC mappings are found in import/marc_auth.properties; and you may provide a second parameter after the MARC filename to specify a set of additional MARC mappings that override the defaults in marc_auth.properties.

Authority data is currently used in two ways: it can be searched through the simple Authority module (found at http://your_server/vufind/Authority/Home), and it provides “see also” and “use instead” references within the index generated by the Alphabetical Heading Browse feature. Additionally, you can activate the Authority Recommend module, which provides search recommendations based on a search of the authority index for the user's current search terms. For example, if a user searches for a known pseudonym, the Authority Recommend module will suggest searching for the registered heading instead.

Troubleshooting Under Windows

If you have trouble importing authority records under Windows, it may have to do with the classpath settings in some of the .bsh files found in the import/index_scripts subdirectory of your VuFind installation. Try changing the addClassPath("../import"); lines to addClassPath("c:/vufind/import"); where "c:/vufind/import" is the path to the import subdirectory of your VuFind installation. Note the use of forward slashes: this is acceptable and simplifies escaping issues, even in the Windows environment.

4. Restart VuFind

If the imported records do not show up in VuFind immediately, you will have to restart the program as described here.

5. Optimize Your Index

For improved performance (and, if applicable, correct spellchecker behavior), it is a good idea to optimize your Solr index after you import records.

Customizing SolrMarc

See the SolrMarc page for more details on how you can customize the behavior of the import process to meet your needs.

Indexing Full Text

Starting with VuFind 1.2, it is possible to harvest full text from URLs found in MARC records. This requires that you first install a full-text extraction tool and then uncomment the appropriate fulltext line in import/marc_local.properties. Comments in the property file explain exactly how the functionality works. Full text indexing is disabled by default.

Sources for Sample Records

This section lists sources of binary MARC records that are useful for testing if you want to try VuFind without using your own records:

XML Records

If the data you want to import is not available in MARC format, chances are that you can access it in some flavor of XML. Fortunately, loading XML into VuFind's index is straightforward if you are familiar with the XSLT language – you simply need to translate from the XML format you have available into Solr's XML Message Format, then post the result to the Solr server.
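
To make the target format concrete, the sketch below prints a small Solr XML add message by hand. It illustrates the message format only and is not how VuFind's XSLT tool works internally. The function names are hypothetical, and the field names "id" and "title" and the Solr URL in the closing comments are assumptions to be checked against your own schema and server.

```shell
#!/bin/sh
# Escape the characters that are special in XML text content.
xml_escape() {
    printf '%s' "$1" | sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g'
}

# Print a Solr <add> message containing one document.
# The field names "id" and "title" are assumptions; use the fields
# defined in your own Solr schema.
solr_add_doc() {
    printf '<add>\n  <doc>\n'
    printf '    <field name="id">%s</field>\n' "$(xml_escape "$1")"
    printf '    <field name="title">%s</field>\n' "$(xml_escape "$2")"
    printf '  </doc>\n</add>\n'
}

# Possible usage (the URL is an assumption about your installation):
#   solr_add_doc "demo-1" "Cats & Dogs" > record.xml
#   curl -H 'Content-Type: text/xml' --data-binary @record.xml \
#        'http://localhost:8080/solr/biblio/update?commit=true'
```

In practice an XSLT stylesheet produces this same structure from your source XML, one field element per indexed value.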

Importing with XSLT

IMPORTANT: The XSLT tool described in this section was added in VuFind 1.1. If you are using an earlier version, you will have to upgrade.

VuFind's XSLT tool is designed to make posting XSLT-transformed documents to the Solr index simple while offering flexibility for extending XSLT and applying local customizations.

The Basics

The XSLT tool is driven by a properties file which provides a few key pieces of information:

  • The name of the XSLT file to use.
  • The names of custom PHP functions and classes that will be called by the XSLT file.
  • Any custom values you want to pass in as parameters to the XSLT file (e.g. local institution names or ID prefixes).

You can see an example properties file here. The comments in this example file explain the available settings.

Once a properties file is set up, you can import an XML file by switching to the import subdirectory of your VuFind installation and typing:

php import-xsl.php myFile.xml mySettings.properties

(substituting the appropriate XML and properties files as needed).

Full Text

VuFind's XSLT tool includes support for extracting full text from external documents (PDF, Word, etc.). In order to take advantage of this, you need to install and configure a full-text extraction tool.

For an example of full text extraction in action in VuFind, see the full text settings near the bottom of the VuDL Sample XSLT File.

Batch Importing

If you need to load a number of XML files at once, you can load them into a subdirectory under the harvest subdirectory of your VuFind installation and use the batch-import-xsl.sh script to load them all. This is commonly used in combination with OAI-PMH harvesting (described below).

OAI-PMH Harvesting

Starting with VuFind 1.0.1, a simple tool is included for harvesting records using the OAI-PMH protocol.

Setting up OAI-PMH

To set up OAI-PMH harvesting, simply edit the oai.ini file in the harvest subdirectory of your VuFind installation. You can set up one or more OAI-PMH repositories here – details are included in comments within the file.

Harvest Workflow

Once OAI-PMH is configured, you can follow these steps to get documents from an OAI-PMH repository into your VuFind index:

  • Run the harvester by switching to the harvest subdirectory of your VuFind installation and running “php harvest_oai.php”. If you configured multiple repositories and want to harvest from just one, add the name of the repository (the section header you gave it in oai.ini) as a parameter to limit your harvesting.
  • For each OAI-PMH repository you harvested, a number of files will have been created in a subdirectory of harvest whose name matches the appropriate section of the oai.ini configuration file. This subdirectory will be found under $VUFIND_HOME/harvest in VuFind 1.x; in VuFind 2.x, it may be found under $VUFIND_LOCAL_DIR/harvest if the $VUFIND_LOCAL_DIR environment variable is set.
  • Run the ./batch-delete.sh file (with a harvest subdirectory name as a parameter) to remove any records from your index that have been reported as deleted by the OAI-PMH server.
  • Run the ./batch-import-marc.sh file (with a harvest subdirectory name as a parameter) to index all MARC records harvested from an OAI-PMH server. If you are harvesting non-MARC data, you may wish to use ./batch-import-xsl.sh instead – see notes on XSLT above.
  • After all deleted and new records have been processed, the records retrieved from the OAI-PMH server will have been moved to a “processed” subdirectory of their containing directory. You can periodically clear out this directory if you no longer feel you need to retain records. However, it may be useful to keep them, since you can always move them back up a directory level and re-run the batch processing scripts in order to reindex everything.
  • A “last_harvest.txt” file is created in each OAI-PMH harvest directory to keep track of the most recent harvest. This allows subsequent harvest operations to pick up where previous ones left off. To reindex all records, you can simply delete this file. Note that it is normal for some duplicate records to be retrieved on subsequent harvests – new harvests overlap slightly with the previous set in order to ensure that nothing is missed.

It should be possible to automate this process using a top-level script and cron job in order to do a nightly harvest/index operation.
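
One possible shape for such a top-level script is sketched below. It is not shipped with VuFind. It assumes that VUFIND_HOME points at your installation, that the batch scripts are run from the harvest subdirectory as in the steps above, and that the harvested data is MARC. The function names and the DRY_RUN switch are hypothetical.

```shell
#!/bin/sh
# Hypothetical nightly job: harvest one OAI-PMH repository, then apply
# deletions and index the new MARC records.
# Set DRY_RUN=1 to print the commands instead of executing them.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

harvest_and_index() {
    repo=$1    # section name from oai.ini; also the harvest subdirectory name
    (
        # Work inside a subshell so the caller's directory is unchanged.
        cd "$VUFIND_HOME/harvest" || exit 1
        # Each step runs only if the previous one succeeded.
        run php harvest_oai.php "$repo" &&
        run ./batch-delete.sh "$repo" &&
        run ./batch-import-marc.sh "$repo"
    )
}

# If this file ends with a call such as "harvest_and_index my_repo",
# a crontab line like the following would run it nightly at 02:00
# (the script path and repository name are assumptions):
#   0 2 * * * /usr/local/vufind/nightly-harvest.sh
```

For non-MARC data, the last step would call ./batch-import-xsl.sh instead. Running once with DRY_RUN=1 is a cheap way to confirm the command sequence before scheduling it.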

Important notes

  • Processing a large number of MARC files is currently very slow, since records are processed one file at a time. It may be worth developing a new tool to merge all the MARC records into a single file as an intermediate step before indexing them.
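
For binary MARC files, the merge step suggested above can be as simple as concatenation, because each ISO 2709 record is self-delimiting. The following is a minimal sketch with a hypothetical function name. It assumes the files are binary .mrc files; it is not valid for MARCXML, which cannot be combined by plain concatenation.

```shell
#!/bin/sh
# Hypothetical helper: concatenate every binary MARC file in a directory
# into a single file for one import run.
# Write the output OUTSIDE the source directory so that a later run does
# not pick up the merged file as input.
merge_marc_files() {
    src_dir=$1
    out_file=$2

    # Replace the function's positional parameters with the matching files.
    set -- "$src_dir"/*.mrc
    [ -e "$1" ] || { echo "no .mrc files in $src_dir" >&2; return 1; }

    cat "$@" > "$out_file"
}
```

The merged file can then be loaded with a single import-marc.sh call instead of one call per file.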

Specific Examples

DSpace

EPrints

Greenstone Digital Library (GSDL)

Related Pages

  • Automation - Notes on automating VuFind, including how to regularly load the latest records.
  • Open Data Sources - A list of potential sources for additional records to add to your index.
  • Re-indexing - How to clear out your index if you want to start over.

User-Provided Notes

importing_records.txt · Last modified: 2013/01/23 13:48 by demiankatz
 