Video 21: Search System: An Overview and Update
The twenty-first VuFind® instructional video provides an overview of VuFind®'s search architecture as well as a discussion of significant changes introduced to the system in release 8.0. This is a fairly technical discussion, designed to be most useful and relevant to developers working with VuFind®'s search code.
The video is available as an mp4 download or through YouTube.
Related Resources
Transcript
This is a machine-generated transcript and will be corrected as time permits.
Hello, and welcome to this month's VuFind video. This time around we are covering a more technical and less hands-on topic, but I thought this was important and timely and might come in handy, so I hope you enjoy it.

We have a few goals this month: first, to provide a quick overview of VuFind's internal search system, the code that actually brings up your search results when you use the search function; second, to explain some changes to the search system that were introduced in the most recent major release, VuFind 8; and third, to provide guidance so that, if you have done local customizations related to the search system, you can accommodate our new changes and be aware of some things that are coming up in the future in VuFind.

First, a little bit of background. The current search system in VuFind has been in place, more or less unchanged, since the 2.0 release in 2013. It was designed to replace the logic that was in VuFind 1, which was much harder to understand and less extensible. Of course, nothing is perfect, and that's why the most recent release, 8.0, addressed some shortcomings of the existing system.

So first of all, let's talk a little bit about how the system is designed. There are a few key design principles that were used to build this. The first is separation of concerns. As I mentioned, in the very early versions of VuFind the search code was quite monolithic and hard to follow, and so this system was designed in direct response to that. Rather than one big class that does lots of things, we have many smaller classes with distinct responsibilities that interact with one another: one class to parse user parameters and create a search query, another one to connect to the actual API that does the searching, and so on. We also make a distinction between general-purpose code, which is stored in the VuFindSearch namespace (all one word) and which is the most reusable and most abstract part of the system, versus application
specific integration code, which is in the VuFind\Search namespace, which is where the VuFind application actually makes use of the searching system.

Another main design principle here is parameter aggregation. There's a class called ParamBag, which is a container for parameters, and when you do a Solr search, for example, a number of pieces of code add parameters to the ParamBag, and these all get sent to Solr to perform the search. The idea here is that we build up our search query with different components working on different aspects of it. Obviously the biggest part is building the actual search query language that retrieves records, but there are other things in Solr, like spell checking and recommendations, and we can have different pieces of code all adding parameters to the ParamBag and doing their distinct functions. So parameter aggregation helps with separation of concerns. If you've worked with any middleware-based frameworks, you might be familiar with this kind of approach.

The final key piece of the design is the idea of events as extension points. All of the search system code is event-driven. There are three major events that get triggered by the search service at different points in time, and these events can be hooked by code which then responds to the events and does things. The three key events are the resolve event, which is how the search system figures out which search backend to use for searching (generally you wouldn't be customizing this, but you could if you wanted to override things in a particular way), and the two more important events, pre and post. The pre event is triggered before we interact with the search backend, and this is an opportunity to do more parameter aggregation or change parameters, processing data before the search occurs. And of course post is the mirror image of that: this event triggers after the search has occurred and is an opportunity to manipulate and format the search response. So all of
our separate components hook these events in order to use parameter aggregation to build a request and process a response, and through all of that, VuFind searching occurs. Because of the way it's designed, these individual components can be overridden or extended, so if you want to customize something, it's possible to focus in fairly narrowly on what you want to customize without having to touch all the many moving parts of this quite complex system.

There are a few key classes and interfaces that it's good to know about if you're working with the search system. Of course, the biggest one is the top-level VuFindSearch\Service. This is the class that manages the whole search process; this is what triggers the events and makes it all happen. So when you perform any action with the search system, you're doing it through the search service.

Also important is the VuFindSearch\Backend\BackendInterface. I've used the word "backend" several times already, and this just refers to the code that interfaces with a particular search system. VuFind is designed to aggregate search results from a number of different platforms, and that's all made possible by the flexibility of the search system and the fact that we have backends implemented for multiple services. Whether that be Solr or various third-party APIs, they all work through the search service using the same mechanisms and the same interface.

Another key class is the query builder. While there is no rule that every backend needs to have its own query builder, it's a pattern that you'll find in the code that most of them do. The query builder is simply a piece of code which takes a VuFindSearch\Query\QueryInterface object and turns it into a ParamBag. The query interface just covers the classes used to represent user searches: essentially a container for the search terms and the field being searched, or for a more complex combination of these if an advanced search is being performed, and
so it makes sense to have code that takes this abstract representation of a search and turns it into concrete parameters that can be sent to an API to actually perform a search.

Another detail that you'll find in all of the backends, though there's not a strict rule about how it needs to be designed, is a connector, which is just the class that handles the actual communication. In theory you could put everything directly in the backend and that would be fine, but again, for separation of concerns, it's often helpful to design the backend as glue code between the search service and some more abstract code that handles actual over-the-wire communication. In most cases the connector is something that's part of VuFind itself, but in a couple of cases we use third-party libraries as connectors, for example in the case of the Summon API, where a PHP connector is provided as a third-party project.

That is, of course, a very quick summary of the search system. If you want a deeper dive, including a step-by-step walkthrough of how a search takes place, you might want to watch my video from the 2013 VuFind Summit, where the search system was introduced. That goes into a bit more depth than I've been able to here, and it is still very relevant even with the changes I am about to discuss. I'll include that link in the video notes.

As I say, nothing is perfect, so what are the problems with this search system? The first is that its scope is fairly rigid. It was designed really with two specific actions in mind: searching for a set of records and retrieving specific records. But there are lots of things that our search backends can potentially do. For example, there's a service called BrowZine, which is able to look up information about particular DOIs. This is a very narrowly focused, specific service that perhaps doesn't even have an equivalent in any other backend, but if I want to access that through the search service, where do I put it? So
that's one major issue with the pre-8.0 design. The bottom line is that adding new features to VuFind that involved the search service required us to do one of two things. One option was to add a new method to the VuFindSearch\Service itself so that we had a way to call the new functionality. This results in a bloated search service, especially if we're adding methods for really specific things that are only used narrowly in particular backends. We don't really want the search service to have to know about the internal details of every possible backend that could ever exist.

Alternatively, we could just bypass the search service entirely and directly access the backend code that the search service calls, and in past code this is often what we did. It was too complicated to use the search service, so we just worked around it by directly accessing objects in the service manager. But this isn't good either, because it is inconsistent. As I mentioned, a lot of the search service's benefits have to do with event-driven functionality. If we bypass the search service and access backends directly, then the search service isn't able to fire events, we can't use any hooks to extend those events, and we lose a lot of flexibility.

So what's the solution to this problem? The solution is the command pattern. Credit where it is due: David Maus was a developer who worked on the VuFind project for a few years around the time that version 2.0 was being developed, and he did the vast majority of the design work on our current search system. We clearly still owe him a debt of gratitude for that, because it has worked out quite well. During the original design phase he actually proposed using the command pattern, because I think he recognized the problems that we eventually encountered, but there wasn't time to implement it. Time flew by, and it wasn't until release 8.0 that we actually implemented this original idea, thanks in large part to Aleksi
Peebles from the National Library of Finland, who raised some concerns, noticed some of these issues, and took the time to help address them. So thank you to both of those people for all of their work supporting the project.

To explain what the command pattern actually is: the idea is that instead of having a method on the search service for every action we could possibly perform, we just have one method on the search service. The method is called invoke, and it accepts an object representing a command as its parameter. Thus the search service doesn't have to know about every possible command you could ever issue; it just needs to be able to deal with these command objects. Command classes are generally constructed using the parameters of the command, and they contain code that actually interacts with the backend to execute the command. The command class also acts as a container for the results of the command.

So the workflow here is: you construct the command with all the parameters for what you need it to do. For example, for a search command, you'll pass it the search query and other parameters like page number and page size and so forth. Then you pass that constructed object to the search service's invoke method, and the search service does all of the event handling, which ultimately results in the backend object getting passed to the command object so that the command object can execute the command on the backend. Then the command object stores the results of that command internally and is done. The search service then returns the same command object that was passed into it, which now contains the results of the command, and the calling code can act on them. So it's all pretty seamless, but it has a lot of flexibility.

So what are the benefits of this? Well, obviously this moves more logic into the backend and out of the search service. The search service is now more of a separated concern. It does event processing and command routing; it
doesn't really care about what the commands are doing. That's the job of the commands. That makes it a lot more focused.

It also reduces the amount of plumbing needed to add new search features. Previously you might potentially have to add a new method to a connector, then add a new method to the backend to call that connector, then add a new method to the search service, and then call the search service method. I wouldn't say that using commands actually reduces the amount of code that you're creating, because you still have to deal with backends and connectors, and you have to write a command class to process the command, but it just feels cleaner to me that the command objects are somewhat standalone and we're not wiring everything explicitly through the search service.

It also streamlines and centralizes parameter processing in event handling code. If you're trying to act on an event, you create listener code that responds to the event, and previously, depending on what action you were listening for, you might have any number of different parameters in the event object that you would have to extract and work with. With the command pattern, all we really need to put into the event is the command object, and then your listener code can interact directly with the command. You can look at the class of the command to figure out what command is being passed, so you can narrowly target specific events. And since the command object is a class, it can have any kind of methods that you need, so you can create functionality there for changing, adding, and removing parameters specific to the needs of the command that you're executing. It just makes things a little bit more targeted and a little more contained.

Of course, a theme here is that nothing is perfect, and the command pattern itself is not perfect. There are some challenges here that we need to think about. One of the biggest ones is that it is easy and tempting to put too much logic into commands. If you're
writing a new command, you really need to think about the intent of the command pattern. The command class is just meant to be a carrier of parameters and responses. It's not intended to do a lot of logic, but it could, because when your command executes, it's given access to the backend. So you can theoretically write a command that has a whole lot of processing logic and does a whole lot of work in it, and that would function, but I think logically that gets you into some troublesome territory. I think the goal in designing commands should be to put most of the logic in your search backend and limit your command to simply passing parameters and executing a backend method. This, again, is just about separation of concerns: the backend is where the backend processing logic is intended to live; the command is just a carrier.

Of course, another challenge here is that commands make this already complex and abstract system a little more complex and abstract, and perhaps a little bit harder to learn, which is why I'm recording a video talking about it in an effort to help with some of that. We'll see if that's actually helpful, but I hope it is.

Finally, we need to make decisions about where the command classes we create live within VuFind's modules and namespaces. As I alluded to earlier, there are some commands that are very backend-specific and probably will only ever be used with one particular system, whereas there are others that are quite global and could apply in many places. So if you're introducing a new command, you have to think about whether it is something universal that belongs in a high-level namespace, or whether it belongs inside a particular backend because that is the only place where it will ever be used. I anticipate that over time some commands may begin life as backend-specific and then end up getting moved up as we discover that there are in fact commonalities between multiple systems.

So let's get into just a few more specifics about
command classes. There are a few interfaces and abstract classes that you should be aware of if you're looking to add a new command to the system. First of all, all commands need to conform to VuFindSearch\Command\CommandInterface, and this is just the interface for the obvious basic functionality of a command: it needs to know whether it has executed yet, it needs to have a way of storing the response that is the result of the command, and so on. It's pretty lightweight.

There's also a VuFindSearch\Command\AbstractBase class, which implements useful defaults for some of the methods defined in the command interface. You can save yourself a little bit of time by using this.

One more useful abstract class is VuFindSearch\Command\CallMethodCommand. This extends AbstractBase with some logic that's built around the assumption that your command just needs to call a method on the backend. Most of the commands that ship with VuFind are actually subclasses of CallMethodCommand. The idea here is that you specify in your constructor what method is going to be called and some rules around how it's going to get called. Then your subclass of CallMethodCommand is mainly interested in managing the parameters to that command and can rely on the parent class for actually doing the work of calling the backend, storing the results, etc. This results in some very simple, lightweight classes. If you want to add a method to the backend and call it, just extend CallMethodCommand and you have a quick way to get your command object created.

There are also a couple of useful traits: there's the VuFindSearch\Command\Feature\QueryOffsetLimitTrait and, in the same namespace, the RecordIdentifierTrait. These are both traits for dealing with commonly used parameters, so if you have something with an offset and limit, or if you have something that relies on a record ID, you can use these traits, and they just give you getters and setters so you don't have to define them over
and over again. If you're interested in an example, I would suggest taking a look at VuFindSearch\Command\SearchCommand. This is the command that is used for performing searches, the most common function of the search backend. It's a subclass of CallMethodCommand, and if you look at it, I think you'll get some idea of how all of these things work.

I've talked in the abstract about what has changed in VuFind, but let's go into a little bit more specific detail, since this may impact your local custom code. First of all, most of the public methods on the VuFindSearch\Service have been deprecated in favor of using command objects with the invoke method. Previously the search service had lots of methods like search and retrieve and so forth. All of those are deprecated and have corresponding commands. In VuFind 8 these methods are still available, because we're looking to smooth the transition and not break too much custom code all at once, but they are going to go away in VuFind 9, so be prepared to stop using them in the future.

Additionally, event parameters and targets have changed. Again, we've introduced some legacy compatibility logic so that if you call the old deprecated individual search service methods, they will set up the events very similarly to VuFind 7 and earlier, which should allow most custom listeners to continue to function without any change. But as I say, in VuFind 9 things are going to change more dramatically, so this legacy compatibility logic is here just to smooth the transition. Old code will continue to work, but new-style code will also work, so this is an opportunity to start making the shift if you have custom listeners.

Another important change is that all code in VuFind which bypassed the search service and directly accessed the backend objects through the backend manager has been refactored to use commands, so you can now more reliably use events to
hook actions, and all of our cheating has been removed.

So what do you need to do? A couple of things. First of all, revise your custom listeners to interact with the command object, which, as I mentioned, is in the event's command parameter, instead of using legacy event parameters. Right now, in VuFind 8, all of the events are going to have the legacy parameters and the command parameter, but all of the information in the legacy event parameters is also available through the command object. So you should be able to refactor your code to replace all the legacy parameters with direct interaction with the command, and then you're good to go; when we upgrade to VuFind 9 and take away the legacy parameters, your code won't need to be touched again. We're just giving you a window of time where this upgrade is not absolutely essential; it's just highly recommended.

The other thing to do is avoid that anti-pattern of directly accessing search backends instead of sending commands. Just use a command to access a method on the backend. If you're not sure whether you're doing this or not, I recommend searching your custom code for the BackendManager, because the easiest way to cheat and get directly to a backend is to pull the backend manager out of the service manager and then pull backends out of it. That is how the core code used to do this cheating. So if you're not referencing the backend manager anywhere, you should be in good shape and you don't need to change anything. But if you are, look at that code and find a better way. Hopefully there's not a whole lot of custom search code out there, so I don't think a lot of people are going to be affected by this, but if you are, please be aware.

That is really everything I wanted to share, but if you have any questions about this, please feel free to reach out to me. My email is demian.katz@villanova.edu, or you can go directly to VuFind's various community support channels: mailing lists, Slack, etc. Send your questions; we will be happy
to help. As I say, I know that the search system is complex and has a lot of abstraction in it. It may not be easy to get your head around it at first, but we're happy to talk to you and help you along, and once you understand it, it is really powerful and useful, so it's worth the investment. That's all I have, so thank you again for your time, and I will talk to you again soon.
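
The command workflow described in the transcript (construct a command, pass it to the service's invoke method, let listeners see the command, read the result back from the same object) can be sketched in a few lines of PHP. This is only an illustration under stated assumptions: every class below is a simplified stand-in written for this example, not the real VuFindSearch code, and the names DemoBackend, LookupDoiCommand, lookupDoi, addBackend and attach are hypothetical. The real service uses an event manager rather than plain callables.

```php
<?php
// Illustrative stand-ins only: these are NOT the real VuFindSearch classes.
interface BackendInterface
{
    public function getIdentifier(): string;
}

interface CommandInterface
{
    public function getTargetIdentifier(): string;
    public function execute(BackendInterface $backend): CommandInterface;
    public function getResult();
}

// Hypothetical backend offering one narrowly focused action.
class DemoBackend implements BackendInterface
{
    public function getIdentifier(): string
    {
        return 'Demo';
    }

    public function lookupDoi(string $doi): array
    {
        return ['doi' => $doi, 'found' => true];
    }
}

// The command carries parameters in and the result out; the work stays in the backend.
class LookupDoiCommand implements CommandInterface
{
    private $backendId;
    private $doi;
    private $result = null;

    public function __construct(string $backendId, string $doi)
    {
        $this->backendId = $backendId;
        $this->doi = $doi;
    }

    public function getTargetIdentifier(): string
    {
        return $this->backendId;
    }

    public function execute(BackendInterface $backend): CommandInterface
    {
        $this->result = $backend->lookupDoi($this->doi);
        return $this;
    }

    public function getResult()
    {
        return $this->result;
    }
}

// Simplified service: a single invoke() method with pre/post listener hooks.
class Service
{
    private $backends = [];
    private $listeners = ['pre' => [], 'post' => []];

    public function addBackend(BackendInterface $backend): void
    {
        $this->backends[$backend->getIdentifier()] = $backend;
    }

    public function attach(string $event, callable $listener): void
    {
        $this->listeners[$event][] = $listener;
    }

    public function invoke(CommandInterface $command): CommandInterface
    {
        // "Resolve" step: find the backend the command is aimed at.
        $backend = $this->backends[$command->getTargetIdentifier()];
        foreach ($this->listeners['pre'] as $listener) {
            $listener($command);
        }
        $command->execute($backend);
        foreach ($this->listeners['post'] as $listener) {
            $listener($command);
        }
        return $command;
    }
}

$service = new Service();
$service->addBackend(new DemoBackend());
// A listener receives the command object itself and can inspect its class.
$log = [];
$service->attach('pre', function (CommandInterface $command) use (&$log) {
    $log[] = 'pre: ' . get_class($command);
});

$command = new LookupDoiCommand('Demo', '10.1000/example');
$result = $service->invoke($command)->getResult();
```

Note that the service in this sketch never needs to know what lookupDoi does; adding another action means adding another command class and backend method, not another service method. In VuFind 8 itself, the analogous step would be to construct one of the shipped commands, such as VuFindSearch\Command\SearchCommand, and pass it to the search service's invoke method; check that class's constructor for the exact arguments.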