Video 13: XML Change Tracking and Email Alerts
The thirteenth VuFind® instructional video discusses VuFind®'s email alert / selective dissemination of information (SDI) functionality and builds upon some concepts introduced in the earlier OAI-PMH and XML Indexing videos.
This video was recorded using VuFind 7. In VuFind 8.0, a change was made which impacts the content of this video:
- All XML import configuration examples now include an easy mechanism for turning on change tracking; it is no longer necessary to customize the XSLT, but simply to uncomment the relevant line in the matching properties file.
Hello. In this month's VuFind video, I'm going to tell you about VuFind's email alert feature, and along the way I'm going to build on some concepts that were previously discussed in the videos about OAI-PMH and indexing XML records. If you haven't already watched those videos, you might want to review them before proceeding.
If you're ready to go forward, I'll start by pointing out that email alerts are a comparatively recent feature in VuFind, having been introduced in release 6.1. Also known as selective dissemination of information, or SDI if you want to be technical about it, email alerts allow a user to subscribe to a search query and receive emails notifying them of any new results matching that query as they're added to your index. Obviously, if we want to be able to notify users about new records, we need to be able to tell which records are new, and this requires us to use VuFind's change tracking system, which we first talked about in the OAI-PMH video, where we showed you how to set up change tracking for MARC records. Since that time, we've also gone through how to index XML records, and so in this video I'm going to show you how to set up change tracking for XML records so that you can set up email alerts regardless of what type of record you're working with.
Just to very quickly recap, the point of VuFind's change tracking system is to identify at index time which records are new to the index and which records have changed since the last time they were indexed. This uses a couple of Solr fields called first_indexed and last_indexed to keep track of dates, and in order to function properly, it needs to know the last change date of each record that's being indexed. Fortunately, using our OAI-PMH harvesting tools, it's not too difficult to do this. So I'm going to go to the terminal and switch into my VuFind home directory.
So in the XML indexing video, as an example, we indexed the Expositions Open Access Journal hosted at Villanova and so today I'm going to proceed with that example but just make some modifications to enable change tracking. The first thing that we need to do is, as I mentioned, we need to know the modification dates of records so that the change tracking system can properly judge what has changed and what hasn't when we reindex things.
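To make the role of the modification date concrete, here is a minimal Python sketch of the change tracking idea. It is an illustration only, not VuFind's actual code: the class and method names are hypothetical, and the rule used (update the last indexed date only when a newer modification date arrives) is an assumption based on the behavior described in this video.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class TrackedRecord:
    first_indexed: datetime       # when the record first entered the index
    last_indexed: datetime        # when a changed version was last indexed
    last_record_change: datetime  # newest modification date seen so far


class ChangeTracker:
    """Hypothetical in-memory stand-in for a persistent change tracking store."""

    def __init__(self):
        self._rows = {}

    def index(self, core, record_id, change_date, now):
        key = (core, record_id)
        row = self._rows.get(key)
        if row is None:
            # Never seen before: both dates start at the current time.
            row = TrackedRecord(now, now, change_date)
            self._rows[key] = row
        elif change_date > row.last_record_change:
            # The record itself changed since we last saw it.
            row.last_indexed = now
            row.last_record_change = change_date
        # Otherwise the record is unchanged and the stored dates are kept.
        return row


tracker = ChangeTracker()
day1 = datetime(2020, 6, 1, tzinfo=timezone.utc)
day2 = datetime(2020, 6, 2, tzinfo=timezone.utc)
modified = datetime(2020, 5, 1, tzinfo=timezone.utc)

tracker.index("biblio", "rec1", modified, now=day1)
again = tracker.index("biblio", "rec1", modified, now=day2)
print(again.first_indexed, again.last_indexed)
```

Without a reliable modification date for each record, the comparison in the middle branch would have nothing to work with, which is why the harvested records need to carry that date.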
Fortunately, OAI-PMH provides date information with every record. It just doesn't include it inside the record metadata itself; it's part of the OAI-PMH envelope that's used to deliver the metadata. But VuFind's OAI harvester includes a setting called injectDate, which we can use to take the modification date out of the OAI-PMH header and insert it into the metadata of the record so that once it's been harvested, we can take advantage of it.
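The idea of moving the date from the OAI-PMH header into the metadata can be sketched in a few lines of Python. This is a simplified illustration rather than the harvester's real implementation: the sample record and the function name are made up for the example, and only the standard library is used.

```python
import xml.etree.ElementTree as ET

OAI_NS = "http://www.openarchives.org/OAI/2.0/"

# A made-up OAI-PMH record: the datestamp lives in the header, not the metadata.
SAMPLE_RECORD = """<record xmlns="http://www.openarchives.org/OAI/2.0/">
  <header>
    <identifier>oai:example.org:article/1</identifier>
    <datestamp>2020-05-01T12:00:00Z</datestamp>
  </header>
  <metadata>
    <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
               xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:title>Sample article</dc:title>
    </oai_dc:dc>
  </metadata>
</record>"""


def inject_header_fields(record_xml, id_tag="identifier", date_tag="datestamp"):
    """Copy the header identifier and datestamp into the metadata payload."""
    record = ET.fromstring(record_xml)
    header = record.find(f"{{{OAI_NS}}}header")
    metadata = record.find(f"{{{OAI_NS}}}metadata")
    payload = metadata[0]  # root element of the delivered metadata
    for source, target in (("identifier", id_tag), ("datestamp", date_tag)):
        value = header.findtext(f"{{{OAI_NS}}}{source}")
        ET.SubElement(payload, target).text = value
    return payload


payload = inject_header_fields(SAMPLE_RECORD)
print(ET.tostring(payload, encoding="unicode"))
```

After this step the metadata is self-contained: the indexing stage can read the ID and the modification date without needing the OAI-PMH wrapper at all.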
This is pretty simple. I'm just going to edit my local/harvest/oai.ini file, which we set up in that earlier video, go down to the Expositions section, and add the line injectDate = datestamp. So in every Dublin Core record that we harvest from Expositions, this will cause the harvester to add a datestamp tag which will contain the modification date of that record. This works very similarly to the injectId mechanism that we were already using to get IDs into our harvested records.
So now that I've changed my harvesting rules, I need to actually re-harvest all of the records. And if I were to simply run the harvester now, I wouldn't get everything, because OAI-PMH harvesting is incremental. Every time I run the harvest, it will only download the records that have been added or changed since my last harvest. However, in this case, because I've actually changed the way I want to harvest, I really want to get everything again. I don't want just an incremental update. Fortunately, it's really easy to force VuFind to do a full re-harvest: we just need to delete the directory of harvested records and start fresh.
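Why does deleting the directory force a full harvest? A plausible explanation, sketched below in Python, is that the harvester keeps its "last harvested" date alongside the harvested files, so removing the directory removes that memory as well. The state file name and both function names here are hypothetical and chosen only for illustration.

```python
import shutil
import tempfile
from pathlib import Path

STATE_FILE = "last_harvest.txt"  # hypothetical name for the harvester's state file


def harvest_start_date(harvest_dir):
    """Return the date to harvest from, or None to request a full harvest."""
    state = Path(harvest_dir) / STATE_FILE
    return state.read_text().strip() if state.is_file() else None


def record_harvest(harvest_dir, harvest_date):
    """Remember when the last successful harvest happened."""
    directory = Path(harvest_dir)
    directory.mkdir(parents=True, exist_ok=True)
    (directory / STATE_FILE).write_text(harvest_date)


base = Path(tempfile.mkdtemp())
target = base / "Expositions"

record_harvest(target, "2020-06-01")
print(harvest_start_date(target))  # incremental: start from the stored date

shutil.rmtree(target)              # the equivalent of deleting the harvest directory
print(harvest_start_date(target))  # no stored date left, so harvest everything

shutil.rmtree(base)
```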
So I'm going to run rm -rf local/harvest/Expositions to recursively force the deletion of the Expositions harvest directory. And now I can repeat my harvest by running php harvest/harvest_oai.php and telling it to harvest the Expositions repository. And now, in just a few seconds, I will have my 293 records downloaded again. And just to show you that this really worked, let me open up one of these files that we just harvested. If I scroll to the end of the first line, I will find that it has added a datestamp tag with a modification date for this record.
So now that I have harvested data with dates in it, I just need to index it and take advantage of those dates so that VuFind can track changes. Unfortunately, VuFind's example XSLT sheets for indexing XML do not, as of this recording, include modification tracking support. I have just opened a JIRA ticket, VUFIND-1461, which I will link to in the notes accompanying this video, so that a future release of VuFind will include examples to make this a little bit more convenient. But for today, I'm going to take advantage of this shortcoming to give an example of customizing an XSLT, and I'll just add the feature to the OJS example, which we're using for harvesting Expositions.
So as with just about everything else in VuFind that you might want to customize, we can customize an XSLT by copying it from the core code into our local custom directory. In this instance, the existing XSLT lives in the import/xsl directory, so I need to create a local equivalent of that so I have a place to put my custom version. So I'm just going to create the directory local/import/xsl, and then I'm going to copy import/xsl/ojs-multirecord.xsl into local/import/xsl, since ojs-multirecord.xsl is the style sheet we're using for transforming records harvested from OJS in this example.
Then I just need to edit the newly created copy of the XSLT, and I'm just going to go down to the bottom and put some new fields following this URL here. In XSLT, we're generating an XML document; in the case of indexing, we're creating the Solr document that adds a record to the Solr index. So I can just put in some Solr XML: a field tag whose name is first_indexed, in order to create the first_indexed value that we want to store, and I'm going to close that field tag. To fill in the actual value of the first indexed date, VuFind provides a convenient helper function called getFirstIndexed, which takes a few parameters and uses them to calculate the first indexed date.
So here is where I need to do a little bit of XSLT. I use the xsl:value-of tag to tell the XSLT processor that I want to insert a dynamic value here. And for the select value, I'm going to use the php:function call, which is how we run custom PHP code from inside an XSLT. The first parameter to this is the name of the PHP function, which in this case is VuFind::getFirstIndexed. And then we follow this with all of the parameters that the actual PHP function requires. In this case, it takes three. First, the name of the Solr core that we are indexing into, which in this case is biblio. Next, it takes the ID of the record that we want to look up, and this we can get from the identifier field of our harvested records. So I'm just going to say normalize-space, string, identifier, which will take the contents of the identifier tag, presented as a string (which is the data type that VuFind expects), and normalize all the whitespace. Finally, the third parameter is the last change date of the record that we're looking up. This is the reason we added that datestamp tag to our harvested records. So again, I'm going to normalize space and convert the datestamp to a string. And that is my function call, so I can just close my value-of tag, and that should do the job.
Of course, I realize that if you're not particularly familiar with XSLT or PHP, this could be a bit intimidating. But even if you don't fully understand it, you can copy and paste this out of the JIRA ticket. And in a future VuFind release, this can all be configured through your properties file instead of having to do this work.
Now, as I mentioned, there are actually two fields we need: not just first_indexed, but also last_indexed. But the code is extremely similar for both of them, so I'm just going to copy and paste what I wrote for first_indexed and change every instance of the word first to last. And that should do the job. Now, it all comes down to whether I typed that correctly. Let's do a quick test to see if I did.
I'm just going to run the XML import tool, import/import-xsl.php, in test-only mode to look at some of my recently harvested records using this new set of changes. And if I did it correctly, after a moment, I see the generated Solr XML. And sure enough, it includes first_indexed and last_indexed dates showing the present time, which means it worked right, because as far as VuFind is concerned, all of these records have been indexed for the first time today, since this is the first time I've indexed these records with change tracking turned on. If I were to reindex them in the future without the records themselves having changed at all, these dates would stay the same. That's how VuFind can detect what changes. Similarly, if I index new copies of the records that have been revised and have newer modification dates, then that last_indexed field is going to reflect the date of the change.
So having demonstrated in test mode that this all works correctly, now all that's left to do is to actually index all these records. I'm going to do that with the harvest/batch-import-xsl.sh script. I just tell it that I want to index the Expositions directory using the ojs.properties configuration. And in just a few seconds, all of my records should be stored in the index with appropriate dates. So that's it for change tracking XML.
We now have everything we need to turn on the email alert feature. And fortunately, this is pretty simple; in fact, much simpler than editing an XSLT. If we look at our local/config/vufind/config.ini file, there's an Account section, which I'm going to jump down to. And inside there, there are a few settings related to email alerts. The first is called schedule_searches, which is turned off by default. Until you turn it on, all of the functionality around email alerts is hidden. So I'm just going to turn it on. The next setting is called force_first_scheduled_email. The purpose of this is that by default, if a user subscribes to a search, they won't get an email until there are actually some new records added to those search results.
But if you turn this on, you can change the behavior so that the first night after they've subscribed, they will receive an email, which sort of proves that the system is working.
I'm going to turn that on for today's demo, because otherwise we would have to wait some period of time for things to change before we could see the system sending a message.
The final setting here in config.ini is scheduled_search_frequencies, which allows you to customize the frequency at which notifications are sent. By default, the user can choose between never being notified, being notified every day if there's a change, or being notified every week if there's a change. If you want to have different lengths of time between notifications, you can customize this here, but I'm going to leave it at the default for now.
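The way these three settings interact can be summarized with a small decision function. The Python below is a sketch of the behavior as described in this video, not VuFind's actual logic: the function name is hypothetical, and expressing frequencies as a number of days (0 for never, 1 for daily, 7 for weekly) is an assumption made for the example.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def should_notify(frequency_days: int, last_sent: Optional[datetime],
                  now: datetime, new_records: int,
                  force_first: bool = False) -> bool:
    """Decide whether a subscribed search should trigger an email right now."""
    if frequency_days <= 0:
        return False  # the user chose never to be notified
    if last_sent is None:
        # No email has gone out yet: optionally force one to prove it works.
        return force_first or new_records > 0
    if now - last_sent < timedelta(days=frequency_days):
        return False  # too soon since the previous notification
    return new_records > 0


now = datetime(2020, 6, 8, tzinfo=timezone.utc)
yesterday = now - timedelta(days=1)

print(should_notify(1, None, now, new_records=0, force_first=True))
print(should_notify(7, yesterday, now, new_records=5))
```

The first call models a brand new daily subscription with the forced first email turned on; the second models a weekly subscription that was notified only a day ago, so it waits even though new records exist.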
Also, while I'm in config.ini anyway, I should mention email configuration, since obviously VuFind is only going to be able to send notification emails if it knows how to send email, and that may require some additional configuration. So I just want to bring your attention to the fact that there's a section in config.ini called Mail, which is where you can set up SMTP (Simple Mail Transfer Protocol) settings to tell VuFind how to send mail. By default, it assumes that there is an SMTP process running on the VuFind server, but if there is not, or if you have an SMTP service that requires authentication, you can adjust these settings to point to the right place. There are a number of services that will allow you to send emails using SMTP, in some cases without cost if your volume is low, or you can run your own server.
It's also worth noting that while VuFind currently assumes you're using SMTP, the Laminas mail module that it relies on for sending messages is quite flexible and can support other methods as well. So with a little bit of custom code, it's possible to do things like use the local sendmail program. It would just require a custom factory for your mail service. And if there's something that you think would be useful for the community, please reach out and suggest it. We can always add more configuration options in the future.
The other possibility here is that you can set your mail to test only, which will cause it to pretend to send emails, but not actually send them. Obviously, in a real world situation, you never want to do that.
But for this demo, since I don't have an SMTP server to show you right now, I'm just going to put this in test mode. So it will appear to be working, even though nothing will really be happening. So that's it. We've configured email alerts.
So now, let me show you what this looks like in the user interface. I'm going to go to my catalog and search for test. Having performed this search, it's now in my search history. So if I scroll down to search options and go to the search history screen, you'll see that there's an alert schedule drop-down that did not used to be here. This is how users are able to get alerts. So I'm going to set myself up for daily alerts; I want to know if there are new tests. And of course, I have to log in, because it needs to know my email address, which requires me to have an account. So it prompts me to do that. I log in. And now, as you can see, the search has moved from the recent searches list into my saved searches list, and it's scheduled for daily alerts. I can easily change this to unsubscribe or change the frequency if I want to, but for now, I'll just leave it here.
So how do the alerts get sent? There is a command line tool which does the job, and I will run that right now with php public/index.php scheduledsearch/notify. As you can see, it processed one saved search. It found my search, and because I had that setting to always send an email the first time someone subscribes, it treats this as new. So it sent a virtual message, and now it's done.
So if you want to use this feature, all you need to do is add the scheduledsearch/notify command to a cron job that runs, probably, on a daily basis. And then your server will send out emails as things change. If you already have a script set up to index records on a regular basis, popping in this notification at the end of that script would make a lot of sense, to ensure that the latest information gets sent out at the moment that it is fresh.
And that's all there is to email alerts. Thanks for watching. And as always, feel free to reach out with questions or suggestions. See you in the next video.
This is an edited version of an automated transcript. Apologies for any errors.