Warning: This page has not been updated in over a year and may be outdated or deprecated.
videos:indexing_xml_records
Differences
This shows you the differences between two versions of the page.
videos:indexing_xml_records [2020/05/25 15:30] – [Transcript] demiankatz | videos:indexing_xml_records [2023/04/26 13:35] (current) – crhallberg | ||
---|---|---|---|
Line 1: | Line 1: | ||
====== Video 7: Indexing XML Records ====== | ====== Video 7: Indexing XML Records ====== | ||
- | The seventh | + | The seventh |
Video is available as an [[https:// | Video is available as an [[https:// | ||
Line 8: | Line 8: | ||
- [[indexing: | - [[indexing: | ||
+ | |||
+ | ===== Update Notes ===== | ||
+ | |||
+ | :!: This video was recorded using VuFind 6.1. In VuFind 8.0, changes were made which impact the content of this video: | ||
+ | |||
+ | * The ojs-multirecord.xsl file has been removed, and the standard ojs.xsl file has been updated to handle both the single-record and multi-record cases. All of the information in this video about the advantages and disadvantages of each technique still applies, but it is no longer necessary to make changes to ojs.properties in order to support the multi-record case. The only changes you need to make are in oai.ini, to control how records are harvested. | ||
+ | * All of the other example XSLT files have been adjusted to support multi-record indexing, so you can apply this technique to records harvested from other systems as well. | ||
===== Transcript ===== | ===== Transcript ===== | ||
- | // This is a raw machine-generated transcript; it will be cleaned up in the near future. // | + | Welcome |
- | | + | |
- | tutorial video this is a continuation of | + | The first thing that I should emphasize is that MARC XML is a special exception. You can use VuFind' |
- | last month' | + | |
- | we learned how to harvest | + | So, I've already mentioned XSLT, so that was a bit of a spoiler. VuFind |
- | using tools bundled with view find this | + | |
- | month we are going to look at what to do | + | There are several versions of XSLT. I believe the language is up to version 3.0 right now, but PHP' |
- | with those records once we have them and | + | |
- | to talk about indexing XML generally | + | It's perhaps a little unfortunate that PHP doesn' |
- | first thing that I should emphasize is | + | |
- | that Mark XML is a special exception | + | For today' |
- | can use view find standard | + | |
- | tools which we talked about several | + | So I'm going to go to the command line where I'm in my VuFind |
- | months ago to import binary | + | |
- | Mark XML and that is much easier than | + | We also have OJS multi-record which I will show you a little later, so stay tuned for that. But to get things started, I'm just going to show you what the '' |
- | trying to use the tools we talked about | + | |
- | today to index mark which is of course | + | So within an XSLT, anything that you see that's prefixed with XSL colon is an XSLT command, and anything else is actually output that the XSLT is going to create. The XSLTs in VuFind |
- | extremely complex | + | |
- | your work by trying to load mark using | + | So the XSLTs are mostly defining |
- | XSLT there are other tools available but | + | |
- | for everything else what we talked about | + | We have an all fields field to index all of the text within the XML document, which uses some XSLT functions to extract that text. We use variables which XSLT supports to pass in institution and collection values. I will show you momentarily how these variables get set. XSLT supports looping for multi-values. So for example, this code here populates |
- | today should be helpful | + | |
- | mentioned XSLT so that was a bit of a | + | It calls a PHP function which translates the strings from two-letter or three-letter codes into all textual representations. Again, |
- | spoiler | + | |
- | loading | + | So, the XSLT is only part of what VuFind |
- | should talk a little bit about what XSLT | + | |
- | is it's short for extensible stylesheet | + | Let's look at a properties file that goes with that XSLT. As I just showed you, all of the import properties files live in the import |
- | language transformations and it's a | + | |
- | declarative programming language where | + | The multi-record is much faster. It just requires some extra work when you harvest, and I'm going to show you how to use both of these today. We'll start one at a time, and we'll work our way up to multi-record. You also can expose specific PHP functions directly into the XSLT by just creating a list of functions here. |
- | you build an XML document that tells the | + | |
- | XSLT engine how to transform one XML | + | By default, none of the package configurations do this, but it is a possibility if you want to make PHP functions available to your XSLT. You can also create a class full of custom functions and expose all of them to your XSLT. Most of VuFind' |
- | document into another XML document | + | |
- | there are several versions of XSLT I | + | So I showed you earlier that the institution and collection fields in the Solr index are getting set to variables, and the variables are set here. So by default, you're going to get institution |
- | believe the language is up to version | + | |
- | 3.0 right now but PHP is built-in XSLT | + | So, when I run the harvest, all my records go to the directory called |
- | processor only supports version 1.0 of | + | |
- | the language | + | So, this is a really important feature of VuFind' |
- | teach you XSLT today in five minutes | + | |
- | it's a bit of a project to learn so if | + | So in the case of OJS, the IDs have a long prefix: ''/ |
- | you do go off and read a tutorial about | + | |
- | it be sure you find one about the | + | Let me explain all of this in whole now that I've typed it all in. ID search and ID replace |
- | original version of the language and not | + | |
- | the later ones that add a lot of | + | But for the second pair, where we want to turn forward slashes into dashes, I can' |
- | additional features | + | |
- | it's perhaps a little unfortunate that | + | With all of that in place, we're now ready to harvest |
- | PHP doesn' | + | |
- | but this is compensated for quite a bit | + | So now we're ready to put all these pieces together. We have a directory full of XML files in Dublin Core format. We have an XSLT and a properties file. There is a command-line tool that comes with VuFind |
- | by the fact that there are bindings | + | |
- | between XSLT and PHP so you can write | + | So I chose "local/harvest/expositions/ |
- | custom functions in PHP and use them in | + | |
- | your XSLT so whenever | + | So I'm going to run this command and it outputs a Solr document which is created |
- | functionality | + | |
- | cover that gap with the PHP function | + | The XML import does not immediately commit changes to Solr. If you run the command and search for a record, it won't show up instantly. To ensure that Solr is up to date, run the util/ |
- | view find comes packed with a number of | + | |
- | example functions for common needs and | + | We have more than 200 of these records, and we don't want to have to index them by hand one at a time. Fortunately, |
- | lots of examples of XSLT as well so for | + | |
- | today' | + | The batch process is smart enough that if anything should go wrong during the index, it will not move files that failed to import correctly. So, if I had one bad record in this batch, all the good ones would get successfully indexed and moved into the process |
- | ojs journal called | + | |
- | hosted at Villanova | + | Now the index process has completed, and if I do a directory listing of local harvester |
- | journal system and open source journal | + | |
- | hosting platform | + | So what I'm going to do is: remove the whole local harvester |
- | so this is a good example of a real | + | |
- | world system that you can harvest from | + | First, |
- | an index interview find and if you find | + | |
- | includes some sample configurations and | + | The other thing we need to do is set up the '' |
- | an XSLT for harvesting from evade ojs | + | |
- | and indexing the resulting data so again | + | Let's just take a quick look at that other XSLT to see what the differences are. So I' |
- | it' | + | |
- | world example | + | It just matches within the scope of a single |
- | command line where I'm in my view find | + | |
- | home directory and just show a couple of | + | In any case, I've now showed |
- | files to give you a taste of what this | + | |
- | all looks like so all of you find sample | + | And now, if I were to run the single file import XSL.php script in test only mode on one of these files, you' |
- | XSLT sheets in the import / XSL | + | |
- | subdirectory | + | So it goes on and on and on. But the advantage of this is you remember how long it took to batch import the expositions |
- | can see we actually have three different | + | |
- | flavors of ojs XSLT s we have n LM o j | + | The only disadvantage to doing things this way that I can see is that, as I mentioned, the import script will skip files that fail the import. So if I had one corrupted record in this OJS instance and I ran this batch import, one of these three files would fail, and I would know there was a problem with one of the hundred records within that file, but it would be hard to figure out which one had caused the problem. So doing single-record importing may be valuable for troubleshooting purposes if nothing else, and I would suggest that if you do a batch import and you run into trouble, try doing a single import that will probably help you pinpoint the causes of your problems. |
- | sx SL which uses the National Library of | + | |
- | medicines | + | I should also note that, as I said, most of the example |
- | bit richer than the default | + | |
- | dublin core data but for today' | + | So that's it for this month. Thank you for listening, and we'll have more next time. |
- | demonstration | + | |
- | s XSL which indexes the | + | |
- | dublin core we also have a je s | + | |
- | multi-record which i will show you a | + | |
- | little later so stay tuned for that but | + | |
- | to get things started I'm just going to | + | |
- | show you what the OJS | + | |
- | XSL looks like as I mentioned an XSLT is | + | |
- | just an XML document and it really works | + | |
- | by pattern matching using XPath which is | + | |
- | a way of specifying particular locations | + | |
- | within an XML document | + | |
- | anything that you see that's prefixed | + | |
- | with XSL : is an XSLT command and | + | |
- | anything else is actually output that | + | |
- | the XSLT is going to create | + | |
- | is in view find are all designed to | + | |
- | create | + | |
- | which always have a top-level | + | |
- | that contains | + | |
- | fields that need to be added to solar | + | |
- | documents so the XSL T' | + | |
- | defining | + | |
- | rules using XSLT to fill those fields | + | |
- | with the appropriate data so for example | + | |
- | to get our unique ID we're pulling from | + | |
- | in XSL tab I mean an XML tag called | + | |
- | identifier | + | |
- | format so this is just putting this | + | |
- | literal value into every record which | + | |
- | would enable us to create an OG a | + | |
- | specific | + | |
- | we have an all fields field to index all | + | |
- | of the text within the XML document | + | |
- | which uses some XSLT functions to | + | |
- | extract that text | + | |
- | we use variables which XSLT supports to | + | |
- | pass in institution and collection | + | |
- | values I will show you momentarily how | + | |
- | these variables get set and XSLT | + | |
- | supports looping for multi values | + | |
- | example this code here populates | + | |
- | finds language field by looping through | + | |
- | every dublin | + | |
- | document and for any non-empty values | + | |
- | calls a PHP function which translates | + | |
- | the strings from two letter or | + | |
- | three-letter codes into all textual | + | |
- | representations | + | |
- | go into great depth about all how all of | + | |
- | this works here but hopefully this this | + | |
- | gives you a little taste and if you go | + | |
- | off and read an XSLT tutorial or two it | + | |
- | should make even more sense | + | |
- | so the XSLT is only part of what view | + | |
- | find needs to do XML indexing | + | |
- | part being a properties file for the | + | |
- | import tool which tells it not only | + | |
- | which XSLT to use but also what custom | + | |
- | PHP functions to make available and what | + | |
- | values to set for any custom | + | |
- | that are used within the XSLT so let's | + | |
- | look at a s dot properties file that | + | |
- | goes with that XSLT i just showed you | + | |
- | and all of the import properties files | + | |
- | live in the import | + | |
- | contain lots of comments explaining in | + | |
- | detail what all of the settings mean but | + | |
- | just to go through the highlights of | + | |
- | course there' | + | |
- | tells us which XSLT to use and as I | + | |
- | teased earlier you see with ojs you | + | |
- | actually have a choice of the regular | + | |
- | ojs XSL which will index one a dublin | + | |
- | core record at a time or the OJS multi | + | |
- | record | + | |
- | dublin core records all | + | |
- | in one file the multi-record is much | + | |
- | faster | + | |
- | when you harvest and I'm going to show | + | |
- | you how to use both of these today we'll | + | |
- | start one at a time and we'll work our | + | |
- | way up to multi-record | + | |
- | expose specific PHP functions directly | + | |
- | into the XSLT by just creating a list of | + | |
- | functions here by default none of the | + | |
- | package configurations do this but it is | + | |
- | a possibility if you want to make PHP | + | |
- | functions available to your XSLT you can | + | |
- | also create a class full of custom | + | |
- | functions and expose all of them to your | + | |
- | X and sub t and most of view finds | + | |
- | examples just to use a view find XSLT | + | |
- | import | + | |
- | functions for exposing custom behavior | + | |
- | like that string mapping I showed you in | + | |
- | the language import | + | |
- | there' | + | |
- | classes to XSLT using their fully | + | |
- | qualified names with the namespace but | + | |
- | all of you finds configurations truncate | + | |
- | off the namespace and just expose the | + | |
- | base class name which makes the XSLT a | + | |
- | little shorter and more readable | + | |
- | every time I call the view find function | + | |
- | I just say you find : : function name | + | |
- | instead of having to type you find slash | + | |
- | XSLT slashing or slash to be fun | + | |
- | so best truncate | + | |
- | there' | + | |
- | where you set the values that are | + | |
- | exposed as variables to the XSLT so I | + | |
- | showed you earlier that the institution | + | |
- | and collection fields in the solar index | + | |
- | are getting set to variables and the | + | |
- | variables are set here so by default | + | |
- | you're going to get institution and | + | |
- | collection set to ojs | + | |
- | so before | + | |
- | actual importing process we're going to | + | |
- | need some records to play with so let me | + | |
- | set up the oai-pmh | + | |
- | expositions | + | |
- | I'm going to edit my local harvest | + | |
- | ini file which we set up on last month' | + | |
- | video and just go to the bottom and | + | |
- | create a new section I'm going to call | + | |
- | it expositions | + | |
- | harvest all of my records | + | |
- | directory called expositions under my | + | |
- | local harvest directory | + | |
- | expositions journals villanova | + | |
- | metadata | + | |
- | core and now some new settings that i | + | |
- | didn't show you last time first of all | + | |
- | inject ID equals identifier | + | |
- | mentioned when we talked about oai-pmh | + | |
- | when we harvest using that protocol we | + | |
- | get both records and header data if you | + | |
- | find needs a unique identifier for | + | |
- | everything | + | |
- | get back from a IP mhm doesn' | + | |
- | necessarily have any kind of identifier | + | |
- | in it but the oai-pmh | + | |
- | always have a unique ID for every record | + | |
- | so by setting inject ID equals | + | |
- | identifier here we're telling the | + | |
- | harvester take the ID from the oai-pmh | + | |
- | header create an identifier tag inside | + | |
- | the XML that you're going to harvest and | + | |
- | save to disk put the ID value in there | + | |
- | and this is how the XSLT I showed you | + | |
- | earlier was able to pull an ID from the | + | |
- | identifier tag and use it in the index | + | |
- | so this is a really important feature of | + | |
- | view files harvester that enables you to | + | |
- | harvest just about anything and reliably | + | |
- | be able to index it in solar with a | + | |
- | unique ID | + | |
- | the IDS that you get back from oai-pmh | + | |
- | are often extremely verbose and they | + | |
- | would make for ugly and unreadable URLs | + | |
- | so we also have some settings called ID | + | |
- | search and ID replace which let us use | + | |
- | regular expressions to transform the | + | |
- | identifiers at the same time that we're | + | |
- | injecting them so in the case of ojs the | + | |
- | ids have this long prefix oai : o JSP KP | + | |
- | dot s fu CA : we don't want to show that | + | |
- | to our users so we're going to replace | + | |
- | it with expositions - so this way | + | |
- | everything that we index from | + | |
- | expositions | + | |
- | prefix on the ID so we don't have to | + | |
- | worry about expositions | + | |
- | with records from other sources | + | |
- | other thing about this is that the there | + | |
- | are several slashes in some of the IDs | + | |
- | and slashes in IDs can create problems | + | |
- | because slashes have a special meaning | + | |
- | in URLs and it requires extra | + | |
- | configuration of your webserver | + | |
- | things | + | |
- | rid of all the slashes as well so we're | + | |
- | gonna say ID search bracket bracket | + | |
- | equals type slash pipe ID replace | + | |
- | bracket bracket equals - so let me | + | |
- | explain all of this in whole now that | + | |
- | I've typed it all in I do search and ID | + | |
- | replace | + | |
- | file you can have as many pairs of | + | |
- | search and replace as you need to | + | |
- | transform your IDs you just have to be | + | |
- | sure the brackets on the end of ID | + | |
- | search and ID replace so that when the | + | |
- | configuration is read the multiple | + | |
- | values are processed correctly | + | |
- | search as I mentioned is a regular | + | |
- | expression | + | |
- | regular expressions supported in PHP and | + | |
- | those regular expressions require you to | + | |
- | start and end the expression | + | |
- | for the pattern you're matching with the | + | |
- | same character | + | |
- | where we're getting rid of the oai ojs | + | |
- | prefix | + | |
- | because that is a fairly common | + | |
- | convention for regular expressions | + | |
- | for the second pair where we want to | + | |
- | turn forward slashes into dashes I can | + | |
- | surround the forward | + | |
- | slashes that would confuse the regular | + | |
- | expression engine | + | |
- | characters instead so that it has | + | |
- | matching characters on the beginning and | + | |
- | end of the expression that don't | + | |
- | conflict with the internal part I could | + | |
- | have chosen a different character here | + | |
- | it doesn' | + | |
- | just look pretty | + | |
- | so with all of that in place we're now | + | |
- | ready to harvest | + | |
- | just need to run of view finds oai-pmh | + | |
- | harvests | + | |
- | so I run PHP harvest | + | |
- | and I tell it I want to harvest | + | |
- | expositions | + | |
- | down a whole bunch of records 285 | + | |
- | records one for each record in | + | |
- | expositions | + | |
- | and they are all in my local harvest | + | |
- | expositions directory | + | |
- | to put all these pieces together | + | |
- | a directory full of XML files in dublin | + | |
- | core format | + | |
- | a properties file so there is a | + | |
- | command-line tool that comes with you | + | |
- | find called | + | |
- | import / import - XSLT HP and this has a | + | |
- | nice - - test - only mode that you can | + | |
- | use if you want to see what it does | + | |
- | without actually writing anything into | + | |
- | solar so I'm going to use that for the | + | |
- | first run here just to demonstrate | + | |
- | what's happened so the first parameter | + | |
- | to this command | + | |
- | is the name of an XML file so I'm going | + | |
- | to choose just one is these files more | + | |
- | or less at random | + | |
- | so I chose local harvest expositions | + | |
- | five eight eight six eight five one nine | + | |
- | two expositions article 2486 that big | + | |
- | number at the front is actually just a | + | |
- | timestamp the the harvester | + | |
- | time of harvest on every file downloads | + | |
- | the second parameter is the name of the | + | |
- | properties file I've configured to do | + | |
- | the import and I don't need to tell it | + | |
- | the path to that file I just need to | + | |
- | tell it the file name because like many | + | |
- | things in view find what it's going to | + | |
- | do is its first going to look in view | + | |
- | find local dur slash import to see if we | + | |
- | have a local customized properties file | + | |
- | if it doesn' | + | |
- | to fall back and look in if you find | + | |
- | home slash import and use the default | + | |
- | one so since I haven' | + | |
- | anything yet it's just going to to go | + | |
- | for the defaults | + | |
- | this command and it outputs a solar | + | |
- | document which itted by transforming the | + | |
- | input so as you can see like in all | + | |
- | fields it's just a whole bunch of text | + | |
- | it extracted all the free text from the | + | |
- | XML taking the tags off of it there's | + | |
- | that hard-coded record format of ojs the | + | |
- | ID is that identifier that we injected | + | |
- | and as you can see it's prefixed with | + | |
- | expositions like we told it to be and | + | |
- | the slash that would have been here has | + | |
- | become a dash so all my regular | + | |
- | expressions worked and here's my | + | |
- | University | + | |
- | those variables that were set in the | + | |
- | properties file and a whole bunch of | + | |
- | other stuff so let's repeat that command | + | |
- | but just take the test only off to | + | |
- | actually index it into solar the exit | + | |
- | import does not immediately commit | + | |
- | changes to solar so if you just run this | + | |
- | command and try to search for a record | + | |
- | it won't show up instantly | + | |
- | ensure that solar is all the way up to | + | |
- | date is to run the utility MIT PHP | + | |
- | script to send a solar commit | + | |
- | gonna do that just so I can demonstrate | + | |
- | that this actually | + | |
- | to my browser I loaded up this search | + | |
- | for all records prior to indexing | + | |
- | you can see there there were 250 records | + | |
- | at that time but if I repeat the search | + | |
- | now there are now 251 and as you can see | + | |
- | in the institution facet we have one | + | |
- | from my University | + | |
- | from that Oh Jas properties file so if I | + | |
- | click on that to filter down here is the | + | |
- | nonviolence | + | |
- | the XML so that's really great but we | + | |
- | have more than 200 of these records we | + | |
- | don't want to have to index them by hand | + | |
- | one at a time | + | |
- | fortunately | + | |
- | this slash batch import excess LSH which | + | |
- | will take the name of a directory under | + | |
- | your local harvest path and the name of | + | |
- | a property file and it will loop through | + | |
- | and index every single file in that | + | |
- | directory using that configuration | + | |
- | saving you lots and lots of typing | + | |
- | as it does the indexing | + | |
- | a subdirectory of your harvest directory | + | |
- | called processed and it moves those | + | |
- | files into the process directory | + | |
- | the end of this process after all two | + | |
- | hundred-plus files have been indexed I | + | |
- | should have an empty expositions | + | |
- | directory with a process subdirectory | + | |
- | containing all the hundreds of records | + | |
- | that got indexed the batch process is | + | |
- | also smart enough that if anything | + | |
- | should go | + | |
- | during the index it will not move files | + | |
- | that failed to import correctly | + | |
- | had one bad record in this batch all the | + | |
- | good ones would get successfully indexed | + | |
- | and moved into the processed | + | |
- | but the bad one would stay there | + | |
- | and I could then for example | + | |
- | test mode I showed you on the one record | + | |
- | to see exactly what the error message is | + | |
- | that was preventing the transformation | + | |
- | or to see oh there' | + | |
- | field or something to troubleshoot | + | |
- | and fix it the other thing that will be | + | |
- | left in the expositions directory is a | + | |
- | file called | + | |
- | will just contain the date of the last | + | |
- | time we ran the oai-pmh | + | |
- | allows incremental updates | + | |
- | believe I mentioned last time but that | + | |
- | means that if I ran the harvest again | + | |
- | tomorrow | + | |
- | and two new records had been added to | + | |
- | ojs it would only harvest those two and | + | |
- | then I could index those and I wouldn' | + | |
- | have to reenact the other 200 so now the | + | |
- | the index process has completed and if I | + | |
- | just do a directory listing of local | + | |
- | artists | + | |
- | not lying to you all that's left here is | + | |
- | the last harvest text and a processed | + | |
- | directory | + | |
- | finder | + | |
- | and sure enough | + | |
- | they' | + | |
- | links back to ojs to read the full | + | |
- | article | + | |
- | you may have noticed I had to ramble for | + | |
- | quite a lot of time while those 200 | + | |
- | records indexed | + | |
- | one at a time actually takes quite a | + | |
- | while and if you have thousands or tens | + | |
- | of thousands of records it's even worse | + | |
- | and that is why the multi record | + | |
- | function I talked about is really handy | + | |
- | so what I'm going to do is I'm going to | + | |
- | remove the whole local harvest | + | |
- | expositions directory so we can start | + | |
- | over and I can show you how much faster | + | |
- | this is if we do records and batches | + | |
- | instead of wanting to talk so first I'm | + | |
- | going to edit my oai harvesting | + | |
- | configuration in local harvest I died | + | |
- | and I all I need to do is add one more | + | |
- | setting at the bottom of this called | + | |
- | combined | + | |
- | that is going to do is tell the | + | |
- | harvester instead of writing one dublin | + | |
- | core record into each file you want to | + | |
- | create one file for every batch of | + | |
- | records that come back over IP MH and | + | |
- | you're going to wrap them in a tag | + | |
- | called collection | + | |
- | different tag name there' | + | |
- | setting you can use for that but for | + | |
- | this example just turning on combined | + | |
- | records and accepting the default tag of | + | |
- | collection is good enough | + | |
- | thing we need to do is set up the OJS | + | |
- | properties file to use the combined XSLT | + | |
- | sign so let's copy the default import | + | |
- | Jo stop properties into local import | + | |
- | because as with everything else files | + | |
- | inside local are going to override | + | |
- | defaults in the core code and let's edit | + | |
- | local import | + | |
- | I'm just going to comment out a JSX SL | + | |
- | and uncomment | + | |
- | so let's just take a quick look at that | + | |
- | other XSLT to see what the differences | + | |
- | are so I' | + | |
- | ojs multi-record | + | |
- | template matching | + | |
- | the top-level collection tag and then | + | |
- | it's going to loop through the | + | |
- | collection looking for a IDC and apply | + | |
- | templates to each of them in turn and | + | |
- | then there' | + | |
- | this code is quite similar to the single | + | |
- | record code it just matches within the | + | |
- | scope of a single | + | |
- | globally looking for particular tags and | + | |
- | this is really probably a better way to | + | |
- | approach all XSLT writing | + | |
- | between multi record and single record | + | |
- | is that I wrote the single record one | + | |
- | when I didn't know what I was doing and | + | |
- | somebody else he's better at XSLT than | + | |
- | me wrote the multi record one | + | |
- | so welcome contributions of multi record | + | |
- | import scripts for other metadata | + | |
- | formats as well but I do offer the | + | |
- | single and multi record options because | + | |
- | there are scenarios where each can be | + | |
- | useful | + | |
- | more momentarily | + | |
- | in any case I've now shown you the multi | + | |
- | record XSLT I've reconfigured the | + | |
- | oai-pmh | + | |
- | and I've configured | + | |
- | use the multi record XSLT so everything | + | |
- | should be aligned correctly | + | |
- | the oai-pmh | + | |
- | harvest underscore oai dot PHP so | + | |
- | harvest expositions and the harvest | + | |
- | should take the same amount of time | + | |
- | we're still harvesting the same 285 | + | |
- | records | + | |
- | but if I look inside | + | |
- | expositions this time there are only | + | |
- | three files there because the oai server | + | |
- | provided us with three batches of | + | |
- | records and each of those got saved to a | + | |
- | single file | + | |
- | and now if I were to run single file | + | |
- | import XSL dot PHP script in test only | + | |
- | mode on one of these files you will see | + | |
- | that the output is much longer than | + | |
- | before | + | |
- | because now instead of just having one | + | |
- | record | + | |
- | a whole collection of records of them to | + | |
- | be precise | + | |
- | advantage of this is you remember how | + | |
- | long it took to batch import the | + | |
- | expositions | + | |
- | only one record | + | |
- | faster it is when there are only three | + | |
- | files containing | + | |
- | harvest | + | |
- | physicians | + | |
- | file one two three word up | + | |
- | so that was a dramatic improvement in | + | |
- | performance | + | |
- | doing things this way that I can see is | + | |
- | that as I mentioned the import script | + | |
- | will skip files that fail the import | + | |
- | if I had one corrupted record in this | + | |
- | ajs instance and I ran this batch import | + | |
- | one of these three files would fail and | + | |
- | I would know there was a problem with | + | |
- | one of the hundred records within that | + | |
- | file but it would be hard to figure out | + | |
- | which one had caused the problem | + | |
- | doing single record importing may be | + | |
- | valuable for troubleshooting purposes if | + | |
- | nothing else and I would suggest that if | + | |
- | you do a batch import and you run into | + | |
- | trouble try doing a single import that | + | |
- | will probably help you pinpoint the | + | |
- | causes of your problems I should also | + | |
- | note that as I said most of the example | + | |
- | XSL teas are things I wrote that are | + | |
- | designed for a single record at a time | + | |
- | there's still some work to be done | + | |
- | creating batch import | + | |
- | the format' | + | |
- | that's one where this work has already | + | |
- | been done if anyone needs multi record | + | |
- | import for another format that's | + | |
- | something I would welcome contributions | + | |
- | of so that it could be shared with | + | |
- | everyone else using the project and I | + | |
- | expect that over time our repertoire | + | |
- | will expand and improve | + | |
- | this month | + | |
- | thank you for listening and we'll have | + | |
- | more next time | + | |
+ | //This is an edited version of an automated transcript. Apologies for any errors.// | ||
---- struct data ---- | ---- struct data ---- | ||
+ | properties.Page Owner : | ||
---- | ---- | ||
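
The transcript describes two paired search-and-replace rules applied to each harvested identifier: one swaps the long OAI prefix for "expositions-", and one turns forward slashes into dashes. The sketch below illustrates that idea in Python rather than in VuFind's own PHP code. The function name `rewrite_id`, the `ID_RULES` list, the `^` anchor on the first pattern, and the sample identifier are all assumptions made for illustration; the prefix itself is taken from the spoken transcript. PHP regular expressions need matching delimiter characters around each pattern (which is why the video uses pipes around the slash), while Python's `re` module takes the bare pattern.

```python
import re

# Hypothetical (pattern, replacement) pairs, applied in order, standing in
# for the paired search/replace settings described in the transcript.
ID_RULES = [
    (r"^oai:ojs\.pkp\.sfu\.ca:", "expositions-"),  # swap the long prefix
    (r"/", "-"),                                   # slashes are awkward in URLs
]


def rewrite_id(raw_id, rules=ID_RULES):
    """Apply each search/replace pair to the identifier, in order."""
    for pattern, replacement in rules:
        raw_id = re.sub(pattern, replacement, raw_id)
    return raw_id


if __name__ == "__main__":
    # Illustrative identifier only; real OAI-PMH headers may differ.
    print(rewrite_id("oai:ojs.pkp.sfu.ca:article/2486"))
```

Because the rules run in order, the prefix is replaced before the slash rule sees the identifier, so the new "expositions-" prefix is left untouched by the second rule.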
videos/indexing_xml_records.1590420647.txt.gz · Last modified: 2020/05/25 15:30 by demiankatz