Video 6: OAI-PMH Server and Harvest Functionality

The sixth VuFind instructional video provides a brief overview of VuFind's use of the OAI-PMH protocol for sharing and harvesting metadata.

Video is available as an mp4 download or through YouTube.

Transcript

This is a raw machine-generated transcript; it has been partially cleaned up, but more work needs to be done on the later parts of the text.

Hello and welcome to this VuFind tutorial video, in which I am going to talk about how VuFind uses the OAI-PMH protocol to both share and receive records.

OAI-PMH is the Open Archives Initiative Protocol for Metadata Harvesting, and it is a well-supported and widely used method of sharing XML metadata between systems. It supports not just harvesting entire collections of metadata but also doing incremental harvests, so you can get only the things that have changed since your prior harvest. It can also address deleted records, so you can find out what has been removed from an upstream system. The protocol always supports Dublin Core metadata, but it can support any other kind of XML format as well. The server and client are both able to deal with the same standard.
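To make the idea of an incremental harvest concrete, here is a minimal Python sketch that builds a ListRecords request. The verb, metadataPrefix, from and set parameter names come from the OAI-PMH specification; the base URL and the date are placeholders chosen for illustration, not values from the video.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix, since=None, set_spec=None):
    """Build an OAI-PMH ListRecords request URL.

    `since` maps to the protocol's `from` argument: when it is given, the
    server is asked only for records changed on or after that datestamp,
    which is what makes a harvest incremental.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if since is not None:
        params["from"] = since
    if set_spec is not None:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


# Hypothetical endpoint, used only for illustration.
BASE_URL = "https://example.org/oai"

full_harvest = list_records_url(BASE_URL, "oai_dc")
incremental = list_records_url(BASE_URL, "oai_dc", since="2020-12-01")
print(full_harvest)
print(incremental)
```

The first URL asks for the whole collection; the second asks only for what has changed since the given date.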

First of all I am going to show you how you can turn on VuFind's OAI-PMH server. I'm going to the command line and I'm going to edit my local config.ini file and you'll see that in the default configuration that comes with VuFind the entire [OAI] section is commented out, so by deleting this semicolon and uncommenting the section header I have now activated my OAI-PMH server.

That's all I need to do to turn on the basic functionality, but there are a few things here that I would probably also want to do, like giving the repository a name. You can also set a separate administrative email for your OAI server; otherwise it will use the default email address.
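The effect of that leading semicolon can be sketched in a few lines of Python. This is only an illustration: the key names repository_name and admin_email, and the [Site] email fallback, are assumptions made for the example, so check the comments in your own config.ini for the real names. Python's configparser happens to treat lines starting with a semicolon as comments, which mirrors the behavior described here.

```python
import configparser

# Default state: the whole section is commented out with semicolons.
COMMENTED_OUT = """
[Site]
email = support@example.edu

;[OAI]
;repository_name = Demian's repo
;admin_email = oai@example.edu
"""

# After uncommenting the header and the name, but not the admin email.
ACTIVATED = """
[Site]
email = support@example.edu

[OAI]
repository_name = Demian's repo
;admin_email = oai@example.edu
"""


def oai_settings(ini_text):
    """Return the effective OAI settings, or None when the section is disabled."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    if not parser.has_section("OAI"):
        return None
    # Assumed fallback: reuse the site-wide address when no OAI email is set.
    default_email = parser.get("Site", "email", fallback=None)
    return {
        "repository_name": parser.get("OAI", "repository_name", fallback=""),
        "admin_email": parser.get("OAI", "admin_email", fallback=default_email),
    }


print(oai_settings(COMMENTED_OUT))
print(oai_settings(ACTIVATED))
```

With the section commented out there are no settings at all; once the header is active, the name comes through and the email falls back to the site-wide default.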

There are also some settings related to sets since OAI servers can divide a collection into specific sets. You can use a Solr field like a facet for defining sets or you can specify particular named sets with particular queries associated with them if you want to allow people to harvest specific subsets of your collection, but if you just leave all this stuff commented out then set functionality will be disabled and people will only be able to harvest your entire collection.

There is another important step though that you have to take before you can use OAI-PMH server capabilities in VuFind, and that is to turn on record change tracking because the OAI-PMH protocol needs to know the history of when everything in your system was created or changed so that it can do incremental updates. VuFind needs to track more information at index time so that the server has the information that it needs. By default, VuFind does not track record change information because doing so makes the index process slower, but if you do turn this on you not only get the benefit of being able to use the OAI-PMH server but you also gain access to some other functionality that otherwise won't work including RSS feeds that are sorted based on actual record creation times and the ability to use Solr-based new record searching where you can actually limit your search by how recently records were added to the index.

To turn this on you just need to uncomment a couple of lines in the default marc_local.properties file, so I'm going to bring that up. This is the same file that we've worked on. You can see here near the top there are two lines, first_indexed and last_indexed, and just by uncommenting these I turn on change tracking. The difference between these two fields is that the first_indexed field will contain the date of the first time a particular record ID was indexed into the system and the last_indexed date will contain the most recent time that record changed, so when you index a record for the first time first_indexed and last_indexed will be set, but if that record gets revised over time last_indexed will change to reflect those changes but first_indexed will always stay the same so you know the age of the overall record as well as the date of its most recent change and this is sort of the minimum amount of information needed to implement OAI-PMH.
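The relationship between the two fields can be illustrated with a small Python simulation. This is a sketch of the concept only, not VuFind's indexing code: the track and changed_since helpers are hypothetical, and detecting a revision by comparing record content is an assumption made for the example.

```python
from datetime import datetime


def track(history, record_id, content, now):
    """Record one indexing pass for a record and return its tracking entry."""
    entry = history.get(record_id)
    if entry is None:
        # First time this ID is seen: both dates are set together.
        history[record_id] = {
            "first_indexed": now,
            "last_indexed": now,
            "content": content,
        }
    elif entry["content"] != content:
        # The record was revised: only last_indexed moves forward.
        entry["last_indexed"] = now
        entry["content"] = content
    return history[record_id]


def changed_since(history, cutoff):
    """IDs an incremental harvest would need to fetch after `cutoff`."""
    return sorted(
        rid for rid, entry in history.items() if entry["last_indexed"] > cutoff
    )


history = {}
january = datetime(2020, 1, 1)
june = datetime(2020, 6, 1)
december = datetime(2020, 12, 1)

track(history, "rec1", "original title", january)
track(history, "rec2", "another record", january)
track(history, "rec1", "corrected title", december)

print(history["rec1"]["first_indexed"], history["rec1"]["last_indexed"])
print(changed_since(history, june))
```

Only the revised record shows up when asking for changes after June, which is exactly the question an incremental OAI-PMH harvest asks.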

Of course, simply making a change to my marc_local.properties file is not enough. I also need to index all of my records, and just in keeping with past demos I'm going to index three of the sample MARC record files included with VuFind: journals.mrc, geo.mrc and authoritybibs.mrc.

So far I've shown you how to turn on change tracking for MARC records. At some point in the future we'll also index XML, and when we get that far you can turn on change tracking there too; it's just done in a different way. For now we've got our index updated the way we need it to be, and we have the OAI server functionality turned on in config.ini, so I'm going to switch over to a web browser and show you how this works.

If you go to your VuFind URL with /oai on the end of it, you will get to a convenient page that shows you all of the verbs supported by the OAI-PMH protocol and lets you test them out on your instance. For example, the simplest thing you can do is just say “Identify”, which will dump out basic information about the server, and as you can see, the “Demian's repo” repository name I put into config.ini comes through here.
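For readers who want to inspect an Identify response outside the browser, here is a minimal Python sketch. The XML below is a hand-written sample shaped like an OAI-PMH 2.0 Identify response, not output captured from the video; the URL and email address in it are placeholders.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the OAI-PMH 2.0 specification.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Hand-written sample response, for illustration only.
SAMPLE_IDENTIFY = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <request verb="Identify">http://localhost/vufind/OAI/Server</request>
  <Identify>
    <repositoryName>Demian's repo</repositoryName>
    <baseURL>http://localhost/vufind/OAI/Server</baseURL>
    <protocolVersion>2.0</protocolVersion>
    <adminEmail>oai@example.edu</adminEmail>
  </Identify>
</OAI-PMH>"""


def summarize_identify(xml_text):
    """Return the child elements of <Identify> as a plain dictionary."""
    root = ET.fromstring(xml_text)
    identify = root.find("oai:Identify", OAI_NS)
    if identify is None:
        raise ValueError("not an Identify response")
    # Tags look like "{namespace}repositoryName"; keep only the local name.
    return {
        child.tag.split("}", 1)[1]: (child.text or "").strip()
        for child in identify
    }


summary = summarize_identify(SAMPLE_IDENTIFY)
print(summary["repositoryName"])
```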

Of course, much more interesting is finding out what kind of metadata formats are supported by an OAI-PMH server. As I mentioned before, they always support Dublin Core, but different formats may be supported by different servers. In VuFind's case, I'm just going to give it the ID of one of the records in the index and find out what formats are supported. Here is oai_dc, which is Dublin Core, but you'll also see there's a marc21 format supported, so MARCXML can be harvested as well.

If we wanted to actually see some records, we can use the ListRecords verb, which at a bare minimum requires that we give it a metadata format. I'm going to give it one and hit Go, and there's my response. As you can see, there's some MARCXML getting dumped out here. So by turning on this functionality, you can share all of the records in your VuFind index with other systems: union catalogs, participating in projects like the Digital Public Library of America, and also actually indexing things into VuFind.

Now that we've shown how OAI-PMH server functionality works, let's show what VuFind can do as an OAI-PMH client and actually make it harvest itself as an example. Going back to the command line, there is a folder we haven't looked at yet called harvest in the VuFind directory, and like just about everything in VuFind, you can override things from the harvest directory inside the local/harvest directory. One of the important files under harvest is called oai.ini, which is just an ini file that you can use to set up OAI harvesting. I'm going to copy harvest/oai.ini into local/harvest so that I have a local copy in my local settings directory.

oai.ini has lots of comments at the top explaining the many, many settings that are supported by this file; read through those at your convenience. At a bare minimum, all you need to do to perform an OAI harvest is to create a section. I'll name mine VuFind because we are harvesting VuFind. The main purpose of the section name is that records that are harvested will be saved in a directory whose name matches the section, so when I perform a harvest I will end up with a local/harvest/VuFind directory filled with XML files.

Now I need to give it the base URL of an OAI server; in this case that's going to be http://localhost/vufind/OAI/Server. This is the URL that you would share with others who want to harvest from you, though of course in a real-life scenario the host name would be something other than localhost. I also have to provide a metadata prefix telling it what metadata format to harvest, and in this example I actually just want to see the Dublin Core, so that is oai_dc. Then I save this file.

Once you have your oai.ini set up, there is a PHP script called harvest/harvest_oai.php. When you run that, it will loop through oai.ini and harvest every section it finds, or you can tell it the name of a specific section and it will harvest just that one repository. I'll do that; I'll tell it to harvest VuFind. There we go: it just downloaded 250 Dublin Core records in just a couple of seconds. Now if I go into my local/harvest/VuFind directory and list my files, I have lots and lots of XML files, and if I look at one of them, there is a little bit of Dublin Core with a title, a creator and an identifier.

So that's really all I wanted to show this month. This will become much more interesting when we talk about ingesting XML, because you can harvest with OAI and then load a whole directory of records into VuFind. We will look at that next time.

In the meantime, I also just wanted to quickly mention that if you want to do this OAI-PMH harvesting without having to install all of VuFind, it has actually been split out into a separate project called VuFindHarvest. You can just check out VuFindHarvest and run a simplified version of the script without having to carry the whole weight of VuFind around with you. I will include a link to that project in the notes with the video. That's all for now. Thank you, and I will provide more information next month.
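As a recap of the oai.ini setup described in the transcript, the following Python sketch shows how a section name, base URL and metadata prefix translate into a request and an output directory. It does not contact any server and is not the harvest script itself; the harvest_plan helper is hypothetical, and the url and metadataPrefix key names are assumptions to verify against the comments in your own oai.ini.

```python
import configparser
from pathlib import PurePosixPath
from urllib.parse import urlencode

# Minimal configuration matching the example in the transcript.
OAI_INI = """
[VuFind]
url = http://localhost/vufind/OAI/Server
metadataPrefix = oai_dc
"""


def harvest_plan(ini_text, harvest_root="local/harvest", only=None):
    """List what a harvest would request and where the records would be saved.

    Every section is included unless `only` names a single section.
    """
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep key case, e.g. "metadataPrefix"
    parser.read_string(ini_text)
    plan = []
    for section in parser.sections():
        if only is not None and section != only:
            continue
        query = urlencode(
            {
                "verb": "ListRecords",
                "metadataPrefix": parser[section]["metadataPrefix"],
            }
        )
        plan.append(
            {
                "section": section,
                "request": parser[section]["url"] + "?" + query,
                # Harvested records land in a directory named after the section.
                "directory": str(PurePosixPath(harvest_root) / section),
            }
        )
    return plan


plan = harvest_plan(OAI_INI, only="VuFind")
for step in plan:
    print(step["section"], step["request"], step["directory"])
```

Passing only="VuFind" corresponds to naming a specific section on the command line; leaving it out corresponds to looping through every section in the file.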

videos/oai-pmh_server_and_harvest_functionality.txt · Last modified: 2020/12/23 13:08 by demiankatz