indexing:oai-pmh [2015/12/14 15:29] – created demiankatz | indexing:oai-pmh [2020/09/22 14:35] (current) – demiankatz |
---|
====== OAI-PMH Harvesting ====== | ====== OAI-PMH Harvesting ====== |
| |
Starting with VuFind 1.0.1, a simple tool is included for harvesting records using the [[http://www.openarchives.org/pmh/|OAI-PMH]] protocol. | ===== About OAI-PMH ===== |
| |
| Open Archives Initiative Protocol for Metadata Harvesting ([[http://www.openarchives.org/pmh/|OAI-PMH]]) is a low-barrier mechanism for repository interoperability. Data Providers are repositories that expose structured metadata via OAI-PMH. Service Providers then make OAI-PMH service requests to harvest that metadata. OAI-PMH is a set of six verbs or services that are invoked via HTTP. |
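To make "verbs invoked via HTTP" concrete, here is a minimal sketch of how an OAI-PMH request URL is put together. The helper name ''oai_url'' and the base URL ''http://example.org/oai'' are hypothetical and are not part of VuFind; the six verb names come from the OAI-PMH specification. The sketch does not URL-encode argument values, which a real client would need to do.

```shell
#!/bin/sh
# Hypothetical helper: build an OAI-PMH request URL from a base URL,
# a verb, and zero or more key=value arguments.
oai_url() {
    base="$1"
    verb="$2"
    shift 2

    # Accept only the six verbs defined by the OAI-PMH specification.
    case "$verb" in
        Identify|ListMetadataFormats|ListSets|ListIdentifiers|ListRecords|GetRecord) ;;
        *) echo "unknown OAI-PMH verb: $verb" >&2; return 1 ;;
    esac

    # The verb is always passed as the "verb" query parameter;
    # any remaining arguments are appended as-is (no URL encoding here).
    url="${base}?verb=${verb}"
    for arg in "$@"; do
        url="${url}&${arg}"
    done
    echo "$url"
}
```

For example, ''oai_url http://example.org/oai ListRecords metadataPrefix=oai_dc'' prints a URL that could be passed to any HTTP client to request Dublin Core records from that (hypothetical) repository.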
| |
| VuFind has been able to act as a Service Provider since release 1.0.1, which introduced a simple tool for harvesting records using the protocol. This tool is also available as a standalone project called [[https://github.com/vufind-org/vufindharvest|VuFindHarvest]]. |
| |
| |
===== Setting up OAI-PMH ===== | ===== Setting up OAI-PMH ===== |
| |
To set up OAI-PMH harvesting, simply edit the [[https://github.com/vufind-org/vufind/blob/master/harvest/oai.ini|oai.ini]] file in the harvest subdirectory of your VuFind installation (or better still, edit a copy of it inside the harvest subdirectory of your [[vufind2:local_settings_directory|local settings directory]]). | To set up OAI-PMH harvesting, simply edit the [[https://github.com/vufind-org/vufind/blob/dev/harvest/oai.ini|oai.ini]] file in the harvest subdirectory of your VuFind installation (or better still, edit a copy of it inside the harvest subdirectory of your [[configuration:local_settings_directory|local settings directory]]). |
| |
You can set up one or more OAI-PMH repositories in the configuration -- details are included in comments within the file. | You can set up one or more OAI-PMH repositories in the configuration -- details are included in comments within the file. |
| |
* Run the harvester by switching to the harvest subdirectory of your VuFind installation and running "php harvest_oai.php". If you configured multiple repositories and want to harvest from just one, you can add the name of the repository (as specified as a section header in oai.ini) as a parameter to limit your harvesting. | * Run the harvester by switching to the harvest subdirectory of your VuFind installation and running "php harvest_oai.php". If you configured multiple repositories and want to harvest from just one, you can add the name of the repository (as specified as a section header in oai.ini) as a parameter to limit your harvesting. |
* For each OAI-PMH repository you harvested, a number of files will have been created in a subdirectory of harvest whose name matches the appropriate section of the oai.ini configuration file. // This subdirectory will be found under $VUFIND_HOME/harvest in VuFind 1.x; in VuFind 2.x, it may be found under $VUFIND_LOCAL_DIR/harvest if the $VUFIND_LOCAL_DIR environment variable is set. // | * For each OAI-PMH repository you harvested, a number of files will have been created in a subdirectory of harvest whose name matches the appropriate section of the oai.ini configuration file. // This subdirectory will be found under $VUFIND_LOCAL_DIR/harvest or $VUFIND_HOME/harvest depending on whether the $VUFIND_LOCAL_DIR environment variable is set. // |
* Run the ./batch-delete.sh file (with a harvest subdirectory name as a parameter) to remove any records from your index that have been reported as deleted by the OAI-PMH server. | * Run the ./batch-delete.sh file (with a harvest subdirectory name as a parameter) to remove any records from your index that have been reported as deleted by the OAI-PMH server. |
* Run the ./batch-import-marc.sh file (with a harvest subdirectory name as a parameter) to index all MARC records harvested from an OAI-PMH server. If you are harvesting non-MARC data, you may wish to use ./batch-import-xsl.sh instead -- see notes on XSLT above. | * Run the ./batch-import-marc.sh file (with a harvest subdirectory name as a parameter) to index all MARC records harvested from an OAI-PMH server. If you are harvesting non-MARC data, you may wish to use ./batch-import-xsl.sh instead -- see notes on XSLT above. |
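The steps above can be strung together in a small wrapper. The following is only a sketch under stated assumptions: the function names ''run'' and ''harvest_and_index'' and the ''DRY_RUN'' switch are hypothetical, the repository name must match a section header in your oai.ini, and the harvested data is assumed to be MARC (substitute ./batch-import-xsl.sh otherwise). It also assumes $VUFIND_HOME points at your VuFind installation.

```shell
#!/bin/sh
# Sketch of the harvest-then-index sequence for a single repository.
# All function names and the DRY_RUN variable are illustrative only.

# Run a command, or just print it when DRY_RUN=1 is set.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

harvest_and_index() {
    repo="$1"   # must match a section header in oai.ini

    # The scripts are run from the harvest subdirectory of the installation.
    run cd "${VUFIND_HOME:?VUFIND_HOME must be set}/harvest" || return 1

    # 1. Harvest records from the named repository only.
    run php harvest_oai.php "$repo" || return 1

    # 2. Remove records the OAI-PMH server has reported as deleted.
    run ./batch-delete.sh "$repo" || return 1

    # 3. Index the harvested MARC records.
    run ./batch-import-marc.sh "$repo" || return 1
}
```

With ''DRY_RUN=1'' set in the environment, calling ''harvest_and_index my_repo'' (where ''my_repo'' is a placeholder) only prints the commands, which is a cheap way to check the sequence before running it against a live index.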
===== Important notes ===== | ===== Important notes ===== |
| |
* Processing a large number of MARC files is currently very slow, since records are processed one file at a time. It may be worth developing a new tool to merge all the MARC records into a single file as an intermediate step before indexing them. | * Processing a large number of MARC files using default settings can be very slow, since records are processed one file at a time. The "combineRecords" and "combineRecordsTag" settings in oai.ini can be used to counteract this problem. These settings were introduced in VuFind 2.4. |
| |
| ===== Related Video ===== |
| |
| You can learn more about VuFind's OAI-PMH functionality in the [[https://vufind.org/wiki/videos:oai-pmh_server_and_harvest_functionality|OAI-PMH Server and Harvest Functionality]] tutorial video. |
| |