
--- videos:indexing_xml_records [2023/04/25 18:56] – [Transcript] crhallberg
+++ videos:indexing_xml_records [2023/04/26 13:35] (current) – crhallberg
@@ Line 1: / Line 1: @@
 ====== Video 7: Indexing XML Records ======
-The seventh VuFind instructional video explains how to import XML records using XSLT, with an emphasis on records that were harvested via OAI-PMH.
+The seventh VuFind® instructional video explains how to import XML records using XSLT, with an emphasis on records that were harvested via OAI-PMH.
 Video is available as an [[https://vufind.org/video/Ingesting_XML.mp4|mp4 download]] or through [[https://www.youtube.com/watch?v=qzY5nC9PLLQ&feature=youtu.be|YouTube]].
@@ Line 56: / Line 56: @@
 So, this is a really important feature of VuFind's harvester that enables you to harvest just about anything and reliably index it in Solr with a unique ID. But the IDs that you get back from OAI-PMH are often extremely verbose, and they would make for ugly and unreadable URLs. So, we also have some settings called ''idSearch'' and ''idReplace'', which let us use regular expressions to transform the identifiers at the same time that we're injecting them.
-So in the case of OJS, the IDs have a long prefix: ''/oai:ojs.pkp.sfu.ca:/''. We don't want to show that to our users, so we're going to replace it with ''expositions-''. This way, everything that we index from Expositions will have a distinctive prefix on the ID, so we don't have to worry about Expositions records clashing with records from other sources. The other thing about this is that there are several slashes in some of the IDs, and slashes in IDs can create problems because slashes have a special meaning in URLs, and it requires extra configuration of your web server to make things work nicely. So let's just get rid of all the slashes as well. We're going to say ''isSearch[] = '|/|''' and ''isReplace[] = '-'''.
+So in the case of OJS, the IDs have a long prefix: ''/oai:ojs.pkp.sfu.ca:/''. We don't want to show that to our users, so we're going to replace it with ''expositions-''. This way, everything that we index from Expositions will have a distinctive prefix on the ID, so we don't have to worry about Expositions records clashing with records from other sources. The other issue is that some of the IDs contain several slashes, and slashes in IDs can create problems: they have a special meaning in URLs, and making them work nicely requires extra configuration of your web server. So let's just get rid of all the slashes as well. We're going to say ''idSearch[] = '|/|' '' and ''idReplace[] = '-' ''.
 Let me explain all of this as a whole now that I've typed it all in. ''idSearch'' and ''idReplace'' are repeatable settings in the file. You can have as many pairs of search and replace as you need to transform your IDs. You just have to be sure to include the brackets on the end of ''idSearch'' and ''idReplace'', so that when the configuration is read, the multiple values are processed correctly. ''idSearch'', as I mentioned, is a regular expression. It uses the Perl-style regular expressions supported in PHP, and those regular expressions require you to start and end the pattern you're matching with the same delimiter character. So, in this first example, where we're getting rid of the OAI OJS prefix, I surrounded the pattern with matching forward slashes because that is a fairly common convention for regular expressions.
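As a rough illustration of how the search-and-replace pairs described in the transcript transform an identifier, the following Python sketch mimics them with ''re.sub''. It is not VuFind's code (VuFind applies these patterns in PHP), and it rests on several assumptions: the pairs run in the order they are listed, the sample identifier ''oai:ojs.pkp.sfu.ca:article/1234'' is made up for illustration, and the hypothetical helper ''strip_delimiters'' only handles the same-character delimiters mentioned in the transcript (no trailing modifiers such as ''i'', and no bracket-style delimiters).

```python
import re

# Hypothetical mirror of the search/replace pairs from the transcript.
# Assumption: pairs are applied one after another, in list order.
ID_RULES = [
    ("/oai:ojs.pkp.sfu.ca:/", "expositions-"),  # drop the verbose OAI prefix
    ("|/|", "-"),                               # turn remaining slashes into dashes
]


def strip_delimiters(pattern: str) -> str:
    """Remove the matching start/end delimiter from a PHP-style pattern.

    Simplified sketch: assumes the pattern has no trailing modifiers
    and uses the same character at both ends.
    """
    if len(pattern) < 2 or pattern[0] != pattern[-1]:
        raise ValueError(f"pattern must start and end with the same character: {pattern!r}")
    return pattern[1:-1]


def transform_id(raw_id: str, rules=ID_RULES) -> str:
    """Apply each (search, replace) pair in turn to the raw OAI identifier."""
    for search, replace in rules:
        raw_id = re.sub(strip_delimiters(search), replace, raw_id)
    return raw_id


if __name__ == "__main__":
    print(transform_id("oai:ojs.pkp.sfu.ca:article/1234"))
```

Running it prints ''expositions-article-1234'': the first pair replaces the prefix, and the second then replaces the slash left in the remainder of the ID. Because the second pair runs after the first, it only has to deal with the slashes that survive the prefix replacement.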