Video 13: XML Change Tracking and Email Alerts
The thirteenth VuFind® instructional video discusses VuFind®'s email alert / selective dissemination of information (SDI) functionality and builds upon some concepts introduced in the earlier OAI-PMH and XML Indexing videos.
This video was recorded using VuFind 7. In VuFind 8.0, a change was made which impacts the content of this video:
- All XML import configuration examples now include an easy mechanism for turning on change tracking. It is no longer necessary to customize the XSLT; instead, simply uncomment the relevant line in the matching properties file.
This is a raw machine-generated transcript; it will be cleaned up as time permits.
Hello. In this month's VuFind video, I'm going to tell you about VuFind's email alert feature, and along the way I'm going to build on some concepts that were previously discussed in the videos about OAI-PMH and indexing XML records. If you haven't already watched those videos, you might want to review them before proceeding.

If you're ready to go forward, I'll start by pointing out that email alerts are a comparatively recent feature in VuFind, having been introduced in release 6.1. Also known as selective dissemination of information, or SDI if you want to be technical about it, the email alerts allow a user to subscribe to a search query and receive emails notifying them of any new results matching that query as they're added to your index.

Obviously, if we want to be able to notify users about new records, we need to be able to tell which records are new, and this requires us to use VuFind's change tracking system, which we first talked about in the OAI-PMH video, where we showed you how to set up change tracking for MARC records. Since that time, we've also gone through how to index XML records, and so in this video I'm going to show you how to set up change tracking for XML records, so that you can set up email alerts regardless of what type of record you're working with.

Just to very quickly recap: the point of VuFind's change tracking system is to identify, at index time, which records are new to the index and which records have changed since the last time they were indexed. This uses a couple of Solr fields called first_indexed and last_indexed to keep track of dates, and in order to function properly it needs to know the last change date of each record that's being indexed. Fortunately, using our OAI-PMH harvesting tools, it's not too difficult to do this.

So I'm going to go to the terminal and switch into my VuFind home directory. In the XML indexing video, as an example, we indexed the Expositions open access journal hosted at Villanova, and so today I'm going to proceed with
that example, but just make some modifications to enable change tracking.

The first thing that we need to do is, as I mentioned, know the modification dates of records, so that the change tracking system can properly judge what has changed and what hasn't when we re-index things. Fortunately, OAI-PMH provides date information with every record; it just doesn't include it inside the record metadata itself. It's part of the OAI-PMH package that's used to deliver the metadata. But VuFind's OAI harvester includes a setting called injectDate, which we can use to take the modification date out of the OAI-PMH header and insert it into the metadata of the record, so that once it's been harvested we can take advantage of it.

This is pretty simple. I'm just going to edit my local/harvest/oai.ini file, which we set up in that earlier video, go down to the Expositions section, and add the line injectDate = datestamp. So in every Dublin Core record that we harvest from Expositions, this will cause the harvester to add a datestamp tag, which will contain the modification date of that record. This works very similarly to the injectId mechanism that we were already using to get IDs into our harvested records.

So now that I've changed my harvesting rules, I need to actually re-harvest all of the records. If I were to simply run the harvester now, I wouldn't get everything, because OAI-PMH harvesting is incremental: every time I run the harvest, it will only download the records that have been added or changed since my last harvest. However, in this case, because I've actually changed the way I want to harvest, I really want to get everything again; I don't want just an incremental update. Fortunately, it's really easy to force VuFind to do a full re-harvest: we just need to delete the directory of harvested records and start fresh. So I'm going to run rm -rf local/harvest/expositions to recursively and forcibly delete the Expositions harvest directory, and
now I can repeat my harvest by running php harvest_oai.php and telling it to harvest the Expositions repository. In just a few seconds, I will have my 293 records downloaded again. Just to show you that this really worked, let me open up one of these files that we just harvested; if I scroll to the end of the first line, I will find that it has added a datestamp tag with a modification date for this record.

So now that I have harvested data with dates in it, I just need to index it and take advantage of those dates so that VuFind can track changes. Unfortunately, VuFind's example XSLT sheets for indexing XML do not, as of this recording, include modification tracking support. I have just opened a JIRA ticket, VUFIND-1461, which I will link to in the notes accompanying this video, so that a future release of VuFind will include examples to make this a little bit more convenient. But for today, I'm going to take advantage of this shortcoming to give an example of customizing an XSLT, and I'll just add the feature to the OJS example, which we're using for harvesting Expositions.

As with just about everything else in VuFind that you might want to customize, we can customize an XSLT by copying it from the core code into our local custom directory. In this instance, the existing XSLT lives in the import/xsl directory, and so I need to create a local equivalent of that so I have a place to put my custom version. So I'm just going to create the directory local/import/xsl, and then I'm going to copy import/xsl/ojs-multirecord.xsl into local/import/xsl, since ojs-multirecord.xsl is the style sheet we're using for transforming records harvested from OJS in this example.

Then I just need to edit the newly created copy of the XSLT, and I'm just going to go down to the bottom and put some new fields following this URL here. In XSLT, we're generating an XML document; in the case of indexing, we're creating the Solr document that adds a record to the Solr
index. So I can just put in some Solr XML, field name equals first_indexed, in order to create the first_indexed value that we want to store. I'm going to close that field tag, and to fill in the actual value of the first indexed date, VuFind provides a convenient helper function called getFirstIndexed, which takes a few parameters and uses them to calculate the first indexed date.

So here is where I need to do a little bit of XSLT. I use the xsl:value-of tag to tell the XSLT processor I want to insert a dynamic value here, and for the select value I'm going to use the php:function call, which is how we run custom PHP code from inside an XSLT. The first parameter to this is the name of the PHP function, which in this case is VuFind::getFirstIndexed, and then we follow this with all of the parameters that the actual PHP function requires. In this case, it takes three. First, the name of the Solr core that we are indexing into, which in this case is biblio. Next, it takes the ID of the record that we want to look up, and this we can get from the identifier field of our harvested records, so I'm just going to say normalize-space, string, identifier, which will take the contents of the identifier tag, presented as a string, which is the data type that VuFind expects, and normalize all the space. Finally, the third parameter is the last change date of the record that we're looking up. This is the reason we added that datestamp tag to our harvested records. So again, I'm going to normalize space and convert to string the datestamp, and that is my function call. Then I can just close my value-of tag, and that should do the job.

Of course, I realize that if you're not particularly familiar with XSLT or PHP, this could be a bit intimidating, but even if you don't fully understand it, you can copy and paste this out of the JIRA ticket, and in a future VuFind release this can all be configured through your properties file instead of having to do this work.

Now, as I mentioned, there are actually two fields
we need, not just first_indexed but also last_indexed, but the code is extremely similar for both of them. So I'm just going to copy and paste what I wrote for first_indexed and change every instance of the word "first" to "last," and that should do the job.

Now it all comes down to whether I typed that correctly. Let's do a quick test to see if I did. I'm just going to run the XML import tool, import/import-xsl.php, in test-only mode to look at some of my recently harvested records using this new set of changes. If I did it correctly, after a moment I see the generated Solr XML, and sure enough, it includes first_indexed and last_indexed dates showing the present time. That means it worked, because as far as VuFind is concerned, all of these records have been indexed for the first time today, since this is the first time I've indexed these records with change tracking turned on. If I were to re-index them in the future without the records themselves having changed at all, these dates would stay the same. That's how VuFind can detect what changes. Similarly, if I index new copies of the records that have been revised and have newer modification dates, then that last_indexed field is going to reflect the date of the change.

So having demonstrated in test mode that this all works correctly, now all that's left to do is to actually index all these records. I'm going to do that with the harvest/batch-import-xsl.sh script. I just tell it that I want to index the expositions directory using the ojs.properties configuration, and in just a few seconds all of my records should be stored in the index with appropriate dates.

So that's it for change tracking XML. We now have everything we need to turn on the email alert feature, and fortunately this is pretty simple; in fact, much simpler than editing an XSLT. If we look at our local/config/vufind/config.ini file, there's an Account section, which I'm going to jump down to, and inside there, there are a few settings related to email
alerts. The first is called schedule_searches, which is turned off by default. Until you turn it on, all of the functionality around email alerts is hidden, so I'm just going to turn it on.

The next setting is called force_first_scheduled_email. The purpose of this is that, by default, if a user subscribes to a search, they won't get an email until there are actually some new records added to the search results. But if you turn this on, you can change the behavior so that the first night after they've subscribed, they will receive an email, which sort of proves that the system is working. I'm going to turn that on for today's demo, because otherwise we would have to wait some period of time for things to change before we could see the system sending a message.

The final setting here in config.ini is scheduled_search_frequencies, which allows you to customize the frequency at which notifications are sent. By default, the user can choose between never being notified, being notified every day if there's a change, or being notified every week if there's a change. But if you want to have different lengths of time between notifications, you can customize this here. I'm going to leave it at the defaults for now.

Also, while I'm in config.ini anyway, I should mention email configuration, since obviously VuFind is only going to be able to send notification emails if it knows how to send email, and that may require some additional configuration. So I just want to bring your attention to the fact that there's a section in config.ini called Mail, which is where you can set up SMTP (Simple Mail Transfer Protocol) settings to tell VuFind how to send mail. By default, it assumes that there is an SMTP process running on the VuFind server, but if there is not, or if you have an SMTP service that requires authentication, you can adjust these settings to point to the right place. There are a number of services that will allow you to send emails using SMTP, in some cases
without cost if your volume is low, or you can run your own server. It's also worth noting that while VuFind currently assumes you're using SMTP, the Laminas Mail module that it relies on for sending messages is quite flexible and can support other methods as well. So with a little bit of custom code, it's possible to do things like use the local sendmail program; it would just require a custom factory for your mail service. And if there's something that you think would be useful for the community, please reach out and suggest it. We can always add more configuration options in the future.

The other possibility here is that you can set your mail to test-only mode, which will cause it to pretend to send emails but not actually send them. Obviously, in a real-world situation you never want to do that, but for this demo, since I don't have an SMTP server to show you right now, I'm just going to put this in test mode so it will appear to be working even though nothing will really be happening.

So that's it: we've configured email alerts. Now let me show you what this looks like in the user interface. I'm going to go to my catalog and search for "test." Having performed this search, it's now in my search history, so if I scroll down to search options and go to the search history screen, you'll see that there's an alert schedule drop-down that was not here before. This is how users are able to get alerts. So I'm going to set myself up for daily alerts; I want to know if there are new tests. And of course I have to log in, because it needs to know my email address, which requires me to have an account, so it prompts me to do that. I log in, and now, as you can see, the search has moved from the recent searches list into my saved searches list, and it's scheduled for daily alerts. I could easily change this to unsubscribe or change the frequency if I wanted to, but for now I'll just leave it here.

So how do the alerts get sent? Simple: there is a command line tool which does the
job, and I will run that right now with php public/index.php scheduledsearch/notify. As you can see, it was processing one saved search and found my search. Because I had that setting to always send an email the first time someone subscribes, it treats this as new, so it sent a virtual message, and now it's done.

So if you want to use this feature, all you need to do is add the scheduledsearch/notify command to a cron job that runs, probably, on a daily basis, and then your server will send out emails as things change. If you already have a script set up to index records on a regular basis, popping in this notification at the end of that script would make a lot of sense, to ensure that the latest information gets sent out at the moment that it is fresh.

And that's all there is to email alerts. Thanks for watching, and as always, feel free to reach out with questions or suggestions. See you in the next video.
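
The change tracking behavior described in this video can be illustrated with a small sketch. This is a toy in-memory model in Python, not VuFind's actual PHP implementation: the class and function names (`ChangeTrackerSketch`, `TrackedRecord`, `is_new_since`) are hypothetical, and the sketch assumes that last_indexed moves to the time of the indexing run whenever the harvested datestamp is newer than the one seen before, and that an alert only looks at records whose first_indexed date is later than the previous notification.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, Optional


@dataclass
class TrackedRecord:
    """Hypothetical stand-in for the tracking state kept for one record."""
    first_indexed: datetime
    last_indexed: datetime
    last_record_change: datetime


class ChangeTrackerSketch:
    """Toy in-memory tracker keyed by record ID (illustration only)."""

    def __init__(self) -> None:
        self._rows: Dict[str, TrackedRecord] = {}

    def index(self, record_id: str, change_date: datetime,
              now: Optional[datetime] = None) -> TrackedRecord:
        """Record one indexing run and return the dates to store in Solr."""
        if now is None:
            now = datetime.now(timezone.utc)
        row = self._rows.get(record_id)
        if row is None:
            # Never seen before: both dates are the time of this run.
            row = TrackedRecord(now, now, change_date)
            self._rows[record_id] = row
        elif change_date > row.last_record_change:
            # Assumption: a newer datestamp means the record was revised,
            # so only last_indexed moves forward.
            row.last_indexed = now
            row.last_record_change = change_date
        # Otherwise the record is unchanged and both dates stay as they were.
        return row


def is_new_since(row: TrackedRecord, last_alert: datetime) -> bool:
    """Assumption: an alert reports records first indexed after the last email."""
    return row.first_indexed > last_alert


if __name__ == "__main__":
    tracker = ChangeTrackerSketch()
    stamp = datetime(2020, 12, 1, tzinfo=timezone.utc)
    run1 = datetime(2021, 1, 1, tzinfo=timezone.utc)
    run2 = datetime(2021, 1, 2, tzinfo=timezone.utc)
    print(tracker.index("rec1", stamp, now=run1))
    # Re-indexing an unchanged record leaves both dates alone.
    print(tracker.index("rec1", stamp, now=run2))
```

Passing `now` explicitly keeps the example deterministic; a real indexing run would use the current time. The model also shows why the harvested datestamp matters: without a reliable last change date, there would be no way to tell a revised record from an unchanged one on re-index.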