XML Records

If the data you want to import is not available in MARC format, chances are that you can access it in some flavor of XML. Fortunately, loading XML into VuFind's index is straightforward if you are familiar with the XSLT language – you simply need to translate from the XML format you have available into Solr's XML Message Format, then post the result to the Solr server.
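To make the target format concrete, here is a minimal Python sketch of what a Solr XML add message looks like: an add element wrapping one doc element per record, with one field element per value. In VuFind this output would normally come from your XSLT rather than from Python. The function name and the sample field names and values are illustrative assumptions; the fields you actually emit must match your Solr schema.

```python
import xml.etree.ElementTree as ET


def build_solr_add_message(records):
    """Serialize a list of {field: value-or-list} dicts as a Solr <add> message."""
    add = ET.Element("add")
    for record in records:
        doc = ET.SubElement(add, "doc")
        for name, value in record.items():
            # A multi-valued field is written as one <field> element per value.
            values = value if isinstance(value, list) else [value]
            for item in values:
                field = ET.SubElement(doc, "field", name=name)
                field.text = str(item)
    return ET.tostring(add, encoding="unicode")


# Hypothetical sample record; the field names are assumptions for illustration.
message = build_solr_add_message([
    {"id": "example-1", "title": "Sample record", "topic": ["XML", "Indexing"]},
])
print(message)
```

An XSLT written for the import tool has the same job: map each element of your source XML to the appropriate field element inside a doc.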

Importing with XSLT

The XSLT tool described in this section was added in VuFind 1.1.

VuFind's XSLT tool is designed to make posting XSLT-transformed documents to the Solr index simple while offering flexibility for extending XSLT and applying local customizations.

The Basics

The XSLT tool is driven by a properties file which provides a few key pieces of information:

  • The name of the XSLT file to use.
  • The names of custom PHP functions and classes that will be called by the XSLT file.
  • Any custom values you want to pass in as parameters to the XSLT file (e.g. local institution names, ID prefixes, etc.)

You can see an example properties file here. The comments in this example file explain the available settings.

Once a properties file is set up, you can import an XML file by switching to the import subdirectory of your VuFind installation and typing:

php import-xsl.php myFile.xml mySettings.properties

(substituting the appropriate XML and properties files as needed).

Full Text

VuFind's XSLT tool includes support for extracting full text from external documents (PDF, Word, etc.). To take advantage of this, you need to install and configure a full-text extraction tool.

For an example of full text extraction in action in VuFind, see the full text settings near the bottom of the VuDL Sample XSLT File.

Batch Importing

If you need to load a number of XML files at once, you can place them in a subdirectory under the harvest subdirectory of your VuFind installation and use the batch-import-xsl.sh script to import them all. This is commonly used in combination with OAI-PMH harvesting.
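Conceptually, a batch import applies the single-file command to every XML file in a directory. The Python sketch below only builds those command lines, so you can preview what would be run. It is not a reimplementation of batch-import-xsl.sh, whose actual behavior may differ; the function name, the directory name my_source, and the properties file name are hypothetical.

```python
from pathlib import Path


def build_import_commands(xml_dir, properties_file):
    """Return one import-xsl.php argument list per *.xml file in xml_dir, sorted by name."""
    commands = []
    for xml_file in sorted(Path(xml_dir).glob("*.xml")):
        # Same argument order as the single-file command: XML file, then properties file.
        commands.append(["php", "import-xsl.php", str(xml_file), properties_file])
    return commands


if __name__ == "__main__":
    # Hypothetical paths, assuming the script is run from the import subdirectory.
    for command in build_import_commands("../harvest/my_source", "mySettings.properties"):
        print(" ".join(command))
```

Each printed line has the same shape as the single-file command shown earlier, so a preview like this can help confirm which files a batch run would pick up.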

indexing/xml.txt · Last modified: 2018/12/19 14:04 by demiankatz