Warning: This page has not been updated in over a year and may be outdated or deprecated.
Differences

This shows you the differences between two versions of the page.

legacy:indexing:dspace [2017/04/21 11:47] – created demiankatz
legacy:indexing:dspace [2018/12/19 17:21] (current) demiankatz
Line 1:
 ====== How to index DSpace with VuFind ======
  
-These are the instructions used by the Naval Postgraduate School in Monterey, California to index DSpace records in VuFind.
+// This outdated page has been deleted to prevent confusion; for current documentation, see [[indexing:dspace|this page]]. To view old content for historical interest, see the "Old Revisions" list below. //
- +
-:!: ** These instructions were written for VuFind 1.x; see [[indexing:dspace|this page]] for VuFind 2.x or newer. ** +
- +
-===== Steps ===== +
- +
-OAI must be enabled on the DSpace repository first: +
-  - Modify the DSpace server config in **nginx.conf** on the DSpace server:<code>location /oai/ { +
-    proxy_set_header X-Forwarded-Host $host; +
-    proxy_set_header X-Forwarded-Server $host; +
-    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for; +
-    +
-    proxy_pass http://yourdspacehostname:8080/oai/; +
-    proxy_redirect http://yourdspacehostname:8080/oai/  http://yourdspacehostname/oai; +
- +
-    proxy_buffering off; +
-    proxy_store off; +
- +
-    proxy_connect_timeout 120; +
-    proxy_send_timeout 120; +
-    proxy_read_timeout 120; +
-} +
-</code> Comparable configuration in Apache makes use of [[http://httpd.apache.org/docs/2.2/mod/mod_proxy.html|mod_proxy]]. Note that the proxy configuration is only necessary if you are unable to open port 8080 to your VuFind instance. If you are not limited by such restrictions, feel free to use your full DSpace hostname appended with ":8080" and skip the above proxy configuration. +
-  - Modify the **server.xml** for the appropriate DSpace Tomcat instance in the **Host** block:<code><Context path="/oai" docBase="/path_to_dspace/webapps/oai" debug="0" +
-    reloadable="true" cachingAllowed="false" +
-    allowLinking="true" /></code> +
-  - Modify the **dspace.conf** config file for the appropriate DSpace instance:<code>... +
-harvest.includerestricted.oai = true +
-harvester.autoStart = true +
-...</code> +
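
One way to confirm that the DSpace steps above took effect is to send the standard OAI-PMH ''Identify'' request to the endpoint. The Python sketch below is only an illustration: the ''/oai/request'' path is taken from the oai.ini example on this page, the hostname is a placeholder, and the ''identify_url'' and ''oai_is_up'' helpers are hypothetical names, not part of DSpace or VuFind.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def identify_url(base_url: str) -> str:
    """Build an OAI-PMH Identify request URL for the given endpoint."""
    return f"{base_url}?{urlencode({'verb': 'Identify'})}"


def oai_is_up(base_url: str, timeout: float = 10.0) -> bool:
    """Return True if the endpoint answers with an Identify response.

    Performs a real HTTP request, so it is defined here but not called.
    """
    with urlopen(identify_url(base_url), timeout=timeout) as response:
        body = response.read().decode("utf-8", errors="replace")
    return "<Identify>" in body


# Placeholder hostname from this page; replace it with your own.
print(identify_url("http://yourdspacehostname/oai/request"))
```

If the request fails or returns an error page, the proxy, Tomcat context, or dspace.conf settings are the first places to look.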
- +
-Then you may proceed to import the OAI feed into VuFind: +
-  - Modify **$VUFIND_HOME/harvest/oai.ini** as per [[#oaiini|oai.ini]] below +
-  - Modify **$VUFIND_HOME/import/dspace.properties** as per [[#dspaceproperties|dspace.properties]] below +
-  - Modify **$VUFIND_HOME/import/xsl/dspace.xsl** as per [[#dspacexsl|dspace.xsl]] below +
-  - Modify **$VUFIND_HOME/web/RecordDrivers/DspaceRecord.php** as per [[#dspacerecordphp|DspaceRecord.php]] below +
-  - cd $VUFIND_HOME/harvest +
-  - php harvest_oai.php +
-  - sh batch-import-xsl.sh ./DSpace ../import/dspace.properties +
-  - ../vufind.sh restart +
- +
-===== Required Files ===== +
- +
-==== oai.ini ==== +
-<code> +
-[DSpace] +
-url=http://yourdspacehostname/oai/request +
-metadataPrefix=oai_dc +
-idSearch[]="/^oai:yourdspacehostname:/" +
-idReplace[]="ir-" +
-idSearch[]="/\//" +
-idReplace[]="-" +
-injectDate="datestamp" +
-injectId="identifier" +
-dateGranularity=auto +
-harvestedIdLog=harvest.log +
-</code> +
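
To make the effect of the ''idSearch[]'' / ''idReplace[]'' pairs above more concrete, the Python sketch below applies the same two regular expressions in order to a sample identifier. It assumes the harvester applies the pairs sequentially; the sample identifier and the ''rewrite_id'' helper are hypothetical, and the actual harvester is written in PHP and may differ in detail.

```python
import re

# (pattern, replacement) pairs mirroring idSearch[]/idReplace[] in oai.ini.
# The PHP-style /.../ delimiters are dropped for Python's re module.
ID_RULES = [
    (r"^oai:yourdspacehostname:", "ir-"),  # strip the OAI prefix, add "ir-"
    (r"/", "-"),                           # replace slashes in the handle
]


def rewrite_id(identifier: str) -> str:
    """Apply each search/replace rule in order (hypothetical helper)."""
    for pattern, replacement in ID_RULES:
        identifier = re.sub(pattern, replacement, identifier)
    return identifier


# Hypothetical identifier in the form a DSpace OAI feed might return.
print(rewrite_id("oai:yourdspacehostname:10945/1234"))
```

The point of the rewrite is to turn the OAI identifier into a value without colons or slashes before it is injected as the record ID.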
- +
- +
-==== dspace.properties ==== +
-<code> +
-[General] +
-xslt = dspace.xsl +
-custom_class[] = VuFind +
- +
-[Parameters] +
-institution = "Library" +
-collection = "DSpace" +
-</code> +
- +
-==== dspace.xsl ==== +
-<code> +
-<!-- available fields are defined in solr/biblio/conf/schema.xml --> +
-<xsl:stylesheet version="1.0" +
-    xmlns:xsl="http://www.w3.org/1999/XSL/Transform" +
-    xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" +
-    xmlns:dc="http://purl.org/dc/elements/1.1/" +
-    xmlns:php="http://php.net/xsl" +
-    xmlns:xlink="http://www.w3.org/2001/XMLSchema-instance"> +
-    <xsl:output method="xml" indent="yes" encoding="utf-8"/> +
-    <xsl:param name="institution">My University</xsl:param> +
-    <xsl:param name="collection">DSpace</xsl:param> +
-    <xsl:param name="urlPrefix">http</xsl:param> +
-    <xsl:template match="oai_dc:dc"> +
-        <add> +
-            <doc> +
-                <!-- ID --> +
-                <!-- Important: This relies on an <identifier> tag being injected by the OAI-PMH harvester. --> +
-                <field name="id"> +
-                    <xsl:value-of select="//identifier"/> +
-                </field> +
- +
-                <!-- RECORDTYPE --> +
-                <field name="recordtype">dspace</field> +
- +
-                <!-- FULLRECORD --> +
-                <!-- disabled for now; records are so large that they cause memory problems! +
-                <field name="fullrecord"> +
-                    <xsl:copy-of select="php:function('VuFind::xmlAsText', //oai_dc:dc)"/> +
-                </field> +
-                  --> +
- +
-                <!-- ALLFIELDS --> +
-                <field name="allfields"> +
-                    <xsl:value-of select="normalize-space(string(//oai_dc:dc))"/> +
-                </field> +
- +
-                <!-- INSTITUTION --> +
-                <field name="institution"> +
-                    <xsl:value-of select="$institution" /> +
-                </field> +
- +
-                <!-- COLLECTION --> +
-                <field name="collection"> +
-                    <xsl:value-of select="$collection" /> +
-                </field> +
- +
-                <!-- LANGUAGE --> +
-                <xsl:if test="//dc:language"> +
-                    <xsl:for-each select="//dc:language"> +
-                        <xsl:if test="string-length() > 0"> +
-                            <field name="language"> +
-                                <xsl:value-of select="php:function('VuFind::mapString', normalize-space(string(.)), 'language_map_iso639-1.properties')"/> +
-                            </field> +
-                        </xsl:if> +
-                    </xsl:for-each> +
-                </xsl:if> +
- +
-                <!-- FORMAT --> +
-                <!-- populating the format field with dc.type instead, see TYPE below. +
-                     if you like, you can uncomment this to add a hard-coded format +
-                     in addition to the dynamic ones extracted from the record. +
-                <field name="format">Online</field> +
-                --> +
- +
-                <!-- SUBJECT --> +
-                <xsl:if test="//dc:subject"> +
-                    <xsl:for-each select="//dc:subject"> +
-                        <xsl:if test="string-length() > 0"> +
-                            <field name="topic"> +
-                                <xsl:value-of select="normalize-space()"/> +
-                            </field> +
-                        </xsl:if> +
-                    </xsl:for-each> +
-                </xsl:if> +
- +
-                <!-- DESCRIPTION --> +
-                <xsl:if test="//dc:description"> +
-                    <field name="description"> +
-                        <xsl:value-of select="//dc:description" /> +
-                    </field> +
-                </xsl:if> +
- +
-                <!-- ADVISOR / CONTRIBUTOR --> +
-                <xsl:if test="//dc:contributor[normalize-space()]"> +
-                    <field name="author_additional"> +
-                        <xsl:value-of select="//dc:contributor[normalize-space()]" /> +
-                    </field> +
-                </xsl:if> +
-                 +
-                <!-- TYPE --> +
-                <xsl:if test="//dc:type"> +
-                    <field name="format"> +
-                        <xsl:value-of select="//dc:type" /> +
-                    </field> +
-                </xsl:if> +
- +
-                <!-- AUTHOR --> +
-                <xsl:if test="//dc:creator"> +
-                    <xsl:for-each select="//dc:creator"> +
-                        <xsl:if test="normalize-space()"> +
-                            <!-- author is not a multi-valued field, so we'll put +
-                                 first value there and subsequent values in author2. +
-                             --> +
-                            <xsl:if test="position()=1"> +
-                                <field name="author"> +
-                                    <xsl:value-of select="normalize-space()"/> +
-                                </field> +
-                                <field name="author-letter"> +
-                                    <xsl:value-of select="normalize-space()"/> +
-                                </field> +
-                            </xsl:if> +
-                            <xsl:if test="position()>1"> +
-                                <field name="author2"> +
-                                    <xsl:value-of select="normalize-space()"/> +
-                                </field> +
-                            </xsl:if> +
-                        </xsl:if> +
-                    </xsl:for-each> +
-                </xsl:if> +
- +
-                <!-- TITLE --> +
-                <xsl:if test="//dc:title[normalize-space()]"> +
-                    <field name="title"> +
-                        <xsl:value-of select="//dc:title[normalize-space()]"/> +
-                    </field> +
-                    <field name="title_short"> +
-                        <xsl:value-of select="//dc:title[normalize-space()]"/> +
-                    </field> +
-                    <field name="title_full"> +
-                        <xsl:value-of select="//dc:title[normalize-space()]"/> +
-                    </field> +
-                    <field name="title_sort"> +
-                        <xsl:value-of select="php:function('VuFind::stripArticles', string(//dc:title[normalize-space()]))"/> +
-                    </field> +
-                </xsl:if> +
- +
-                <!-- PUBLISHER --> +
-                <xsl:if test="//dc:publisher[normalize-space()]"> +
-                    <field name="publisher"> +
-                        <xsl:value-of select="//dc:publisher[normalize-space()]"/> +
-                    </field> +
-                </xsl:if> +
- +
-                <!-- PUBLISHDATE --> +
-                <xsl:if test="//dc:date"> +
-                    <field name="publishDate"> +
-                        <xsl:value-of select="substring(//dc:date, 1, 4)"/> +
-                    </field> +
-                    <field name="publishDateSort"> +
-                        <xsl:value-of select="substring(//dc:date, 1, 4)"/> +
-                    </field> +
-                </xsl:if> +
- +
-                <!-- URL --> +
-               <xsl:for-each select="//dc:identifier"> +
-                   <xsl:if test="substring(., 1, string-length($urlPrefix)) = $urlPrefix"> +
-                       <field name="url"> +
-                           <xsl:value-of select="." /> +
-                       </field> +
-                   </xsl:if> +
-               </xsl:for-each> +
-            </doc> +
-        </add> +
-    </xsl:template> +
-</xsl:stylesheet> +
-</code> +
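
The mapping rules in dspace.xsl for authors, publication date, and URLs can be hard to follow in XSLT. The Python sketch below restates just those three rules on a plain dictionary. It is an illustration under assumptions, not the import code: ''map_record'' is a hypothetical helper and the sample record is invented.

```python
def map_record(dc: dict, url_prefix: str = "http") -> dict:
    """Mimic the author, publishDate, and url rules of dspace.xsl."""
    doc = {}

    # First creator goes to "author", later ones to "author2".
    # Position counts every creator, including blank ones, as in the XSLT.
    for position, name in enumerate(dc.get("creator", []), start=1):
        name = " ".join(name.split())  # rough equivalent of normalize-space()
        if not name:
            continue
        if position == 1:
            doc["author"] = name
        else:
            doc.setdefault("author2", []).append(name)

    # Only the first four characters of the first date are kept.
    dates = dc.get("date", [])
    if dates:
        doc["publishDate"] = dates[0][:4]

    # Identifiers that start with the URL prefix become url fields.
    doc["url"] = [i for i in dc.get("identifier", []) if i.startswith(url_prefix)]
    return doc


# Invented sample record for illustration only.
sample = {
    "creator": ["Smith, Jane", "   ", "Doe,  John"],
    "date": ["2012-06-01"],
    "identifier": ["http://hdl.handle.net/10945/1234", "ISBN 123"],
}
print(map_record(sample))
```

Note that a blank first creator leaves ''author'' unset, because the position test is applied to the original list rather than to the non-empty names.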
- +
-==== DspaceRecord.php ==== +
-<code> +
-<?php +
-require_once 'RecordDrivers/IndexRecord.php'; +
-class DspaceRecord extends IndexRecord { +
-     public function getSearchResult ($view = "list") { +
-          global $interface; +
-          $template = parent :: getSearchResult ($view); +
-          $interface -> assign ('summAjaxStatus', false); //Don't show Callnumber and Location +
-          $interface -> assign ('summDate', false); //Don't show date +
-          $interface -> assign ('summPublisher', $this->getPublishers()); //Show publisher name +
-          $interface -> assign ('summNotes', false); //Don't show any general notes +
-          return $template; +
-     } +
-} +
-?> +
-</code> +
- +
legacy/indexing/dspace.1492775227.txt.gz · Last modified: 2017/04/21 11:47 by demiankatz