Warning: This page has not been updated in over a year and may be outdated or deprecated.
indexing:dspace

Current revision: 2023/08/16 19:52 by demiankatz (previous revision: 2017/04/21 11:48 by demiankatz)

====== How to index DSpace with VuFind® ======
  
===== 1. Make sure OAI-PMH is turned on and properly indexed in DSpace =====
  
To retrieve records from DSpace, you will need to use the [[indexing:oai-pmh|OAI-PMH]] protocol.
  
In newer versions of DSpace, OAI-PMH should be enabled by default; however, in DSpace 3.x and earlier, some [[indexing:dspace:enable_oai|additional configuration]] was needed.
  
You can check whether the service is enabled by visiting <nowiki>{DSpace Base URL}/oai</nowiki> in your web browser. If the service is turned on, you should see something similar to this:
  
{{ :indexing:dspace-oai.jpg?600 |Default Context. This is the default context of the DSpace OAI-PMH data provider. Identify. List Sets. List Metadata Formats. List Identifiers. List Records. }}
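
The same check can be made from a shell, which is convenient on a server without a browser. The sketch below is an illustration only: it assumes the OAI-PMH endpoint is <nowiki>{DSpace Base URL}/oai/request</nowiki> (the path used in the oai.ini example in step 2) and that curl is installed, and both function names are hypothetical helpers, not part of DSpace or VuFind®. "Identify" is a standard OAI-PMH verb, so a working endpoint should answer with an <Identify> element.

<code bash>
#!/bin/bash
# Hypothetical helpers for checking a DSpace OAI-PMH endpoint from a shell.

# Build an OAI-PMH request URL from the DSpace OAI base URL and a verb.
# Assumes the endpoint is <base>/request, as in the oai.ini example.
oai_request_url() {
    local base=${1%/}   # strip a single trailing slash, if any
    local verb=$2
    printf '%s/request?verb=%s\n' "$base" "$verb"
}

# Succeed only if the Identify response contains an <Identify> element
# (assumes the element name is not namespace-prefixed).
oai_identify_ok() {
    curl --silent --show-error --fail "$(oai_request_url "$1" Identify)" \
        | grep -q '<Identify>'
}

# Example:
# oai_identify_ok "http://yourdspacehostname/oai" && echo "OAI-PMH is enabled"
</code>

Only the URL-building function works offline; the second one needs a reachable DSpace server.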
  
Before harvesting data from DSpace into VuFind®, you should check that DSpace's OAI (Open Archives Initiative) Solr index is working correctly. To do this, click the List Records button on the screen above; you should see something similar to:
  
{{ :indexing:dspace-records.jpg?600 |List of Records. Results fetched: 1. Identifier oai:localhost:123456789/3. Last Modified 2020-09-12 08:49:34. Sets. Metadata. }}
  
In the above image, clicking Metadata should return the record's metadata in oai_dc format.
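
If you repeat this check regularly, the empty-index case can be detected from a shell instead of by eye. OAI-PMH defines the error code "noRecordsMatch" for a ListRecords request that has nothing to return, and that error is likely what produces the "No matches for the query" message described in the next section. A minimal sketch, assuming the endpoint is <nowiki>{DSpace Base URL}/oai/request</nowiki> and that curl is installed; both function names are hypothetical.

<code bash>
#!/bin/bash
# Hypothetical helpers for spotting an empty DSpace OAI-PMH index.

# Read an OAI-PMH response on stdin; succeed if it carries the standard
# "noRecordsMatch" error code (attribute may use single or double quotes).
oai_response_is_empty() {
    grep -Eq "code=[\"']noRecordsMatch[\"']"
}

# Fetch the first page of ListRecords in oai_dc format.
# Assumes the endpoint is <base>/request, as in the oai.ini example.
oai_list_records() {
    local base=${1%/}
    curl --silent --show-error --fail \
        "${base}/request?verb=ListRecords&metadataPrefix=oai_dc"
}

# Example:
# if oai_list_records "http://yourdspacehostname/oai" | oai_response_is_empty; then
#     echo "No records returned; the DSpace OAI index may need rebuilding."
# fi
</code>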
  
==== Fixing Missing Indexing ====

If metadata does not display correctly through the server (for example, if you receive an "Error No matches for the query" message), DSpace's OAI Solr index has probably not been updated. That index is not updated automatically when metadata is added through the manual submission process or batch-imported using SIP or AIP packages. To overcome this limitation, the Data Provider Repository (DPR) administrator must update the index from the command line. First, switch to the "bin" subdirectory of your DSpace installation, then run:

<code bash>
./dspace oai import -o -c
</code>

The -o and -c parameters have the following meanings:

  * -o: optimize the index after indexing
  * -c: clear the Solr index before indexing (all items will be imported again)

===== 2. Import records into VuFind® using OAI-PMH harvest =====

These steps use VuFind®'s [[indexing:oai-pmh|OAI-PMH harvest tool]] and [[indexing:xml|XML indexing tool]]. Follow the links to learn more about each tool.

  - Modify **$VUFIND_LOCAL_DIR/harvest/oai.ini**<code>[DSpace]
url=http://yourdspacehostname/oai/request
metadataPrefix=oai_dc
...
dateGranularity=auto
harvestedIdLog=harvest.log
combineRecords=true</code>
  - Run these commands:<code>cd $VUFIND_HOME/harvest
php harvest_oai.php
./batch-import-xsl.sh DSpace dspace.properties</code>
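
Before harvesting, it can be useful to confirm which endpoint the [DSpace] section actually points to, especially when oai.ini contains several sections. The helper below is a hypothetical sketch (not a VuFind® tool) that reads one key from one section with awk; it only understands simple key=value lines like those shown above, so it is not a full INI parser.

<code bash>
#!/bin/bash
# Hypothetical helper: print the value of KEY from [SECTION] of a simple INI file.
# Handles plain key=value lines only (no quoting or continuation lines);
# spaces and tabs around the key and the value are trimmed.
ini_get() {
    local file=$1 section=$2 key=$3
    awk -v section="[$section]" -v key="$key" '
        /^[ \t]*\[/ {                       # a section header line
            line = $0
            gsub(/^[ \t]+|[ \t]+$/, "", line)
            in_section = (line == section)
            next
        }
        in_section && index($0, "=") > 0 {  # a key=value line in our section
            k = substr($0, 1, index($0, "=") - 1)
            v = substr($0, index($0, "=") + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", k)
            gsub(/^[ \t]+|[ \t]+$/, "", v)
            if (k == key) { print v; exit }
        }
    ' "$file"
}

# Example: show the endpoint the harvester will contact.
# ini_get "$VUFIND_LOCAL_DIR/harvest/oai.ini" DSpace url
</code>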

==== Troubleshooting ====
  
If you receive an error message during harvesting, you may need to rebuild the OAI indexes on your DSpace server. Log into that system and run these commands:
  
<code>
<path to dspace directory>/bin/dspace oai clean-cache
<path to dspace directory>/bin/dspace oai import -c
</code>
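
On the DSpace server, the two commands can be wrapped so that the import only runs after the cache has been cleared successfully. This is a minimal sketch: DSPACE_DIR and DRY_RUN are hypothetical variables (DSPACE_DIR stands in for <path to dspace directory>, and DRY_RUN=1 prints the commands instead of running them), and any extra arguments, such as the -o optimize flag described under Fixing Missing Indexing, are passed on to the import.

<code bash>
#!/bin/bash
# Hypothetical wrapper around the DSpace OAI maintenance commands.
# DSPACE_DIR stands in for <path to dspace directory>.
# With DRY_RUN=1, commands are printed instead of executed.
run_or_print() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

rebuild_dspace_oai() {
    if [ -z "${DSPACE_DIR:-}" ]; then
        echo "Set DSPACE_DIR to your DSpace installation directory" >&2
        return 1
    fi
    local dspace="$DSPACE_DIR/bin/dspace"
    # Clear the OAI cache first; do not import if this step fails.
    run_or_print "$dspace" oai clean-cache || return 1
    # Rebuild the OAI Solr index from scratch (-c); extra arguments such as
    # -o (optimize) are passed through unchanged.
    run_or_print "$dspace" oai import -c "$@"
}

# Example (on the DSpace server):
# DSPACE_DIR=/path/to/dspace DRY_RUN=1 rebuild_dspace_oai -o
</code>

Running it once with DRY_RUN=1 shows exactly which commands would be executed before anything touches the index.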
  
After that processing completes, retry the harvest process on the VuFind® server as described above.
  
===== 3. Customize Import Rules (optional) =====
  
If you wish to customize the way your records are ingested, see the [[indexing:xml|indexing XML]] page for details. The instructions above use the example [[https://github.com/vufind-org/vufind/blob/dev/import/dspace.properties|dspace.properties]] and [[https://github.com/vufind-org/vufind/blob/dev/import/xsl/dspace.xsl|dspace.xsl]] files that ship with VuFind®. You can copy these into appropriate subdirectories of your [[configuration:local_settings_directory|$VUFIND_LOCAL_DIR directory]] and modify them as needed to change the way data is indexed.
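
Copying the two example files might look like the sketch below. It is an illustration built on assumptions: it takes the source paths from the GitHub links above (import/dspace.properties and import/xsl/dspace.xsl under $VUFIND_HOME), assumes the same layout under $VUFIND_LOCAL_DIR/import, and uses a hypothetical function name. Existing local copies are left untouched so earlier customizations are not overwritten.

<code bash>
#!/bin/bash
# Hypothetical helper: copy VuFind's example DSpace import rules into the
# local settings directory. Assumes $VUFIND_LOCAL_DIR/import mirrors the
# import/ layout of $VUFIND_HOME. Existing local copies are never overwritten.
copy_dspace_import_rules() {
    if [ -z "${VUFIND_HOME:-}" ] || [ -z "${VUFIND_LOCAL_DIR:-}" ]; then
        echo "VUFIND_HOME and VUFIND_LOCAL_DIR must be set" >&2
        return 1
    fi
    local file src dest
    for file in dspace.properties xsl/dspace.xsl; do
        src="$VUFIND_HOME/import/$file"
        dest="$VUFIND_LOCAL_DIR/import/$file"
        if [ -e "$dest" ]; then
            echo "Keeping existing $dest" >&2
            continue
        fi
        mkdir -p "$(dirname "$dest")" || return 1
        cp "$src" "$dest" || return 1
    done
}
</code>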
  
:!: If you change import rules, note that you will need to remove your $VUFIND_LOCAL_DIR/harvest/DSpace directory, re-harvest the records, and repeat the indexing process in step 2 above.
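
The reset described in this note can also be scripted. In this minimal sketch (hypothetical function name), only the $VUFIND_LOCAL_DIR/harvest/DSpace directory is deleted, the function refuses to run unless VUFIND_HOME and VUFIND_LOCAL_DIR are set, and the step 2 commands then run in order, stopping at the first failure.

<code bash>
#!/bin/bash
# Hypothetical helper: discard previously harvested DSpace records and repeat
# the step 2 harvest and import after import rules have changed.
reharvest_dspace() {
    if [ -z "${VUFIND_HOME:-}" ] || [ -z "${VUFIND_LOCAL_DIR:-}" ]; then
        echo "VUFIND_HOME and VUFIND_LOCAL_DIR must be set" >&2
        return 1
    fi
    # Remove only the DSpace harvest directory named in the note.
    rm -rf -- "$VUFIND_LOCAL_DIR/harvest/DSpace" || return 1
    # Run step 2 in a subshell so the caller's working directory is unchanged;
    # stop at the first failing command.
    (
        cd "$VUFIND_HOME/harvest" &&
            php harvest_oai.php &&
            ./batch-import-xsl.sh DSpace dspace.properties
    )
}
</code>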
  
===== 4. Customize Record Display (optional) =====
  
By default, VuFind® does not include any DSpace-specific display logic; records indexed from DSpace are displayed using the standard "SolrDefault" record driver and templates. However, the default import setup marks DSpace records with a record_format value of "dspace", which means that you can create a custom record driver named SolrDspace to provide DSpace-only display options. See [[development:howtos:displaying_a_custom_field|displaying a custom field]] for some examples of record display customization.
  