

Warning: This page has not been updated in over a year and may be outdated or deprecated.

Video 4: Configuring Search and Facet Settings

The fourth VuFind® instructional video provides a tour of three of VuFind's most important configuration files: searches.ini, facets.ini and searchspecs.yaml. Using these three files, you can control many options presented to users for navigating and refining search results, and you can also change the way those search results are retrieved.

Video is available as an mp4 download or through YouTube.

Transcript

Hello and welcome to the fourth in the VuFind instructional video series! Today we are going to look at three of the most important configuration files in VuFind for adjusting the way search and faceting behave. Before we dive in, I want to set the stage for those who have been following along. I didn't want today's examples to look too weird, so I have gone back to the standard default VuFind theme. I have also re-indexed the data using default settings instead of the customizations we demonstrated back in Video 2. If you're coming in for the first time, you'll see fairly normal things. I've also indexed a few more of VuFind's sample MARC records to give us a bigger set of search results to play with. If you're keeping score, the files I indexed are “journals.mrc”, “geo.mrc”, and “authoritybibs.mrc”.

So, with all that out of the way, let's start looking at configuration files. This is also a good time for yet another reminder to always use your local settings directory when changing configuration files. Let's start with searches.ini, which is sort of the top-level configuration for VuFind's search options. I'm currently in the VuFind home directory, so I will copy “config/vufind/searches.ini” into “local/config/vufind/”. Now we have a local copy of the configuration file that we can edit without touching the core files of the VuFind distribution. I'm going to bring up VSCode, which is just one of many tools you can use to edit files. I'll open searches.ini and go to the top of the file.
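
If you prefer to script this step, here is a minimal Python sketch of the copy described above. It assumes the default layout, in which the local directory is `local/` under the VuFind home directory. The function name `copy_to_local` is hypothetical. The sketch deliberately refuses to overwrite a local copy, because that copy may already hold your edits.

```python
import shutil
from pathlib import Path


def copy_to_local(vufind_home: Path, filename: str) -> Path:
    """Copy config/vufind/<filename> into local/config/vufind/.

    Assumes the default layout (local settings under <home>/local). An
    existing local copy is left untouched so earlier edits are not lost.
    """
    source = vufind_home / "config" / "vufind" / filename
    target_dir = vufind_home / "local" / "config" / "vufind"
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / filename
    if not target.exists():
        shutil.copy2(source, target)  # copy contents and file metadata
    return target
```

For example, `copy_to_local(Path("/usr/local/vufind"), "searches.ini")`, where the home path is only an illustration.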

The first thing I should explain is that this file contains references to search handlers. “Search handler” is VuFind terminology for one of the options in the drop-down menu next to the search box, such as title, author, or subject. A search handler is just a group of settings that tells VuFind how to do a particular kind of search. We'll look at how those are built a little later. For starters, let's talk about the ones that come pre-configured out of the box, which include the common options I just mentioned: title, author, and subject.

At the very top of the file you will see a default handler setting, “AllFields”. This is a very broad search that, as the name says, searches across the majority of the fields indexed in VuFind. There is very little reason to ever change this default handler, so I recommend leaving it alone. A few lines down, however, there is a setting called “empty_search_relevance_override”. It is turned off by default because it causes VuFind to do some extra work, but many people may actually want to turn it on. Let me show you what it does.

Here I am in my local VuFind demo. If I run a blank search, it shows all of the records in the index, sorted by relevance by default. Relevance sorting means that VuFind uses the user's search terms to figure out which things in the result set are most important. A blank search has no search terms to rank by, though, so you get everything back in a completely arbitrary order. That's where this setting comes in. If I go back and uncomment empty_search_relevance_override, VuFind will use a sort option other than relevance, but only for empty searches, and we get a more meaningful-looking result set. If I refresh my search results now, I've got “A” at the top.
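
The decision this setting adds can be sketched as a small Python function. It illustrates the behavior described in the video and is not VuFind's PHP code. The function name is hypothetical. `override` stands in for the value of empty_search_relevance_override, with None meaning the line is commented out, and the sort names are only examples.

```python
from typing import Optional


def effective_sort(query: str, requested_sort: str, override: Optional[str]) -> str:
    """Choose the sort to send to Solr (hypothetical helper).

    Relevance cannot rank a blank query, so when the override is configured
    and the user asked for relevance on an empty search, use the override.
    """
    if override and requested_sort == "relevance" and not query.strip():
        return override
    return requested_sort
```

Non-empty searches and explicit non-relevance sorts pass through unchanged, which matches the "only for empty searches" behavior described above.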

Moving down a little further, you will see a couple of major sections in this file that control which search handlers are available to your users. [Basic_Searches] controls the drop-down list next to the search box, which appears on practically every screen of VuFind. [Advanced_Searches] controls the options available in the boxes on the advanced search screen. You'll notice that by default advanced search offers more options. We expect users there to be doing more specific things and combining fields in more complex ways, whereas the basic search is really just the basics.

There's also one special case here: the tag search, which searches the tags that users have added to records. It works a little differently than everything else, which is why it's only available in the basic list. It doesn't play well with other types of searches, so it won't work properly if you try to put it in an advanced search.

These lists let you do a couple of things. First of all, they let you control the order of the options. For example, if I want the author option to appear above the title option, I can cut and paste the author line above the title line. If I then refresh my VuFind screen and look at the drop-down menu, the order of the fields matches what's in the configuration file. So if the presentation of options is important to you, you have full control over it.

The other thing you'll notice is that some of these entries look a little cryptic, like author equals “adv_search_author”. The values on the left side of the equals sign are the names of search handlers, which are configured in another file that we'll look at shortly. The values on the right side of the equals sign are translation strings. VuFind has an internationalization system that looks up keys and then displays values in the user's currently chosen language. All of these values go through that translation system before they are shown to the user.

So if you decide to customize these labels, you can put anything you like here. For example, I could change author to “Person who Created Stuff”. If I go back to VuFind and refresh the page, I now have the option of a Person who Created Stuff search. The important thing to remember is that if your VuFind instance presents the interface in multiple languages, you'll have to create mappings for these strings in every language file you need, so that they translate correctly when users switch languages. Internationalization is probably a good topic for a future video, but for now I just wanted to make you aware that it exists.
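
The two ideas above can be modeled in a short Python sketch: file order drives menu order, and right-hand values are looked up as translation keys. The INI text is an illustrative excerpt in the style of searches.ini, not a verbatim copy. The French dictionary is a hypothetical stand-in for a language file. Python's `configparser` is only an approximation of how VuFind's PHP reads INI files.

```python
import configparser

# Illustrative excerpt in the style of searches.ini (not a verbatim copy).
SAMPLE_INI = """
[Basic_Searches]
AllFields = "All Fields"
Author = Author
Title = Title
;JournalTitle = "Journal Title"
"""

# Hypothetical stand-in for one language file.
FRENCH = {"All Fields": "Tous les champs", "Title": "Titre"}


def search_options(ini_text: str, section: str, translations: dict) -> list:
    """Return (handler, label) pairs in file order, labels translated."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep handler names case-sensitive
    parser.read_string(ini_text)
    options = []
    for handler, raw_label in parser.items(section):
        key = raw_label.strip('"')  # PHP's INI parser drops these quotes
        # A key with no translation is displayed as-is, like a custom label.
        options.append((handler, translations.get(key, key)))
    return options
```

The commented-out JournalTitle line is skipped, and "Author" has no French entry in this sample, so it is shown untranslated. That mirrors why a custom label like “Person who Created Stuff” needs entries in each language file.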

You'll also notice that a couple of searches have semicolons in front of them. That means they are commented out, i.e. turned off. They are useful in some cases, but not everyone needs them, so we leave them off by default. For example, the coordinate search can be useful if you're using VuFind's optional geographic functionality but is meaningless otherwise. The journal title search is just like a title search, except that it filters down to journals. Not everyone indexes journal data, so we don't enable it by default, but just for fun let's turn it on here by removing the semicolon. If I refresh my browser yet again, a journal title search now shows up under title. A blank journal title search brings back just ten results, which are all of the journals in my sample data.

So that is how you control, reorganize, and re-label your search handlers. Similarly, the [Sorting] section of the file controls your sorting options. As with the search handlers, some options are on by default and some are commented out for your reference. By default VuFind assumes that you're using Library of Congress call numbers. If you're using Dewey instead, you can flip-flop these: comment out “callnumber-sort”, which is the Library of Congress field, and uncomment “dewey-sort”.

In this section, the values on the left are the names of fields in the Solr index that VuFind can use for sorting, and the values on the right are labels, which are once again translation strings. You can reorganize this list, re-label it, or add different Solr fields to create a custom sort. The one thing to keep in mind with custom fields is that Solr can only sort on fields that contain one value per record. If a field contains multiple values, Solr doesn't know how to sort on it, because it needs exactly one value to place each record in an ordered sequence. Trying to sort on a multi-valued field causes an error. That's a common source of confusion, and we can get into it further when we talk about Solr in a future video.
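
The single-value rule can be illustrated with a toy Python model. It uses plain dicts in place of Solr documents and a Python list to represent a multi-valued field. The field names `title_sort` and `topic` are only examples. The sketch mirrors the constraint described above, not Solr's actual implementation.

```python
def sort_records(records: list, field: str) -> list:
    """Sort dict 'records' on one field, as Solr requires: one value each.

    A list value models a multi-valued Solr field, which cannot be sorted
    on; records missing the field are placed last.
    """
    def sort_key(record):
        value = record.get(field)
        if isinstance(value, list):
            raise ValueError(f"cannot sort on multi-valued field {field!r}")
        return (value is None, "" if value is None else value)

    return sorted(records, key=sort_key)
```

A record with two topics has no single position in the ordering, so the sketch raises an error instead of guessing. Solr likewise returns an error rather than picking one of the values.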

Another useful option, back up near the top of the file, lets you control the page size of results. When you first install VuFind, you get 20 records per page and no way to change that. This configuration file lets you adjust both the page size and whether users can switch it. The default limit here is set to 20; suppose I want to change it to 40. We'll have to close our web browser before the change takes effect, because VuFind remembers your choices in cookies and reapplies them for you. This is usually convenient: if you've made a choice and then do a new search, you want that choice to stay in effect. When you're editing your configuration files, though, it can make a change look as if it didn't work, because VuFind is remembering what you were doing before you made it. Now I've closed Firefox and opened it again, which cleared out my session-level cookies. When I perform a blank search, I get 40 results on the page instead of 20.

Now, if I want to give users a choice of how many records to view at once, I can uncomment this limit options setting. It's just a comma-separated list of numbers, and all of those numbers are offered in a drop-down menu. If I refresh the page, I now have a results-per-page control, and I can set it to 10 or 100 or whatever. You are free to put any number you like in this configuration. Keep in mind, though, that there are limits to how much data Solr can process and pass to VuFind at once. If you set your page size to 10,000, you can expect memory problems, performance problems, or other issues. So configure this, but keep it within reason.
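
Here is a Python sketch of how a comma-separated list like this could become menu choices. It assumes the setting is named limit_options; check the comments in your own searches.ini. The `max_limit` cap is a hypothetical safeguard that reflects the advice above and is not a VuFind option.

```python
def parse_limit_options(raw: str, max_limit: int = 100) -> list:
    """Turn a value such as "10,20,40,100" into a list of page sizes.

    Blank entries and duplicates are skipped, and sizes above the
    hypothetical max_limit are dropped to keep Solr responses small.
    """
    sizes = []
    for part in raw.split(","):
        part = part.strip()
        if not part:
            continue
        size = int(part)
        if 1 <= size <= max_limit and size not in sizes:
            sizes.append(size)
    return sizes
```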

Another very important feature of VuFind is what we call the recommendation module system. This is a modular way of plugging custom code into the search results screen. You can position a recommendation module in three different regions, and several of the things VuFind displays by default are actually recommendation modules:

- The “suggested topics within your search” control at the top is a “top” recommendation module.
- The standard faceting controls are a “side” recommendation module.
- The items that appear when a search yields no results, like this blue box suggesting ways to find more results, are “no results” recommendation modules.

So you have full control over what shows up when a search yields no results, and over what shows up above and beside normal search results. All of that is configured in searches.ini. Fairly near the top of the file, you'll see default_top_recommend, default_side_recommend, and default_noresults_recommend settings. These tell VuFind which recommendation modules to load when no more specific settings are available, because each search handler can also have its own set of recommendation modules. For example, an author search shows some author-specific tips and details, which this recommendation configuration makes possible.

To see the other piece of this besides the defaults, scroll down to the sections called [SideRecommendations], [TopRecommendations], and [NoResultsRecommendations]. In each of these sections you give the name of a search handler followed by “[]”, which lets you repeat the setting to give it multiple values. Then you list the one or more recommendation modules you want to display. As I said, the author search has some custom faceting that's made possible through this. Similarly, call number searches behave a little differently when they don't find any results.
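
The fallback from handler-specific sections to the defaults can be sketched in Python. This is a model of the behavior described, not VuFind's code. Splitting a spec such as `TopFacets:ResultsTop` at the first colon into a module name and a parameter string is an assumption based on that example line. `HypotheticalAuthorModule` is an invented name, and the sample default values are illustrative.

```python
def split_spec(spec: str) -> tuple:
    """Split "Module:params" at the first colon into (module, params)."""
    name, _, params = spec.partition(":")
    return name, params


def recommendations_for(handler: str, region: str, specific: dict, defaults: dict) -> list:
    """Choose recommendation modules for one handler in one screen region.

    `specific` models sections such as [SideRecommendations]
    (region -> handler -> list of specs); `defaults` models the
    default_*_recommend[] settings (region -> list of specs).
    """
    modules = specific.get(region, {}).get(handler)
    if modules is None:  # no handler-specific entry: fall back to defaults
        modules = defaults.get(region, [])
    return [split_spec(spec) for spec in modules]
```

With `defaults = {"top": ["TopFacets:ResultsTop"]}`, a Title search gets `[("TopFacets", "ResultsTop")]` in the top region. Removing that entry, as done later in this video by commenting out the line, leaves the region empty.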

There are a lot of recommendation modules available, and if you're comfortable with coding you can also write your own. Each one is just a PHP class plus a corresponding template to display things. Building new ones is beyond the scope of this video, though we may cover it in the future. If you just want to see all of the available options, look at the comments just above the [SideRecommendations] section of this configuration file. There you will find a substantial list of modules, grouped by the section each is recommended for. I encourage you to read through it and see if anything catches your eye. Most of them are fairly easy to configure: you put the name of the module, followed by any parameters it takes, which are described in the documentation here.

For now, here's a really quick example. I'm going back up to the default recommendation settings. I don't know about you, but I'm personally not a huge fan of the facet controls above the search results, so I often turn them off when I'm setting up VuFind. Getting rid of them is as easy as commenting out this line by putting a semicolon at the beginning of it. So I've commented out the default_top_recommend[] = TopFacets:ResultsTop line. If I go back to my browser and refresh, my search results now start at the top of the page instead of having those facets above them.

One last thing in searches.ini worth pointing out is that it sets up all of the autocomplete functionality. The [Autocomplete] section lets you turn autocomplete off completely if you don't want it. As with recommendation modules, there's a default handler that generates suggestions while users type their queries. It can be overridden handler by handler, so that different search handlers use different rules for looking up suggestions. Most people probably won't need to change this, but if you build your own custom search handler, it's nice to know you can also control how suggestions are provided for it. The majority of these use the Solr autocomplete handler. It accepts the name of a search handler that the user's current text will be searched against, followed by a prioritized list of the Solr field or fields to draw suggestions from. As with the recommendation modules, all of the autocomplete handlers are documented in comments above the section, with a bit more detail about how this works. Again, most people don't need to change this, but if you need to, it's here.

I think that's plenty on searches.ini. Another configuration file you will probably want to play with is facets.ini, which controls how VuFind offers faceting options for narrowing searches. Once again I'll go to the terminal, into my VuFind home directory, and copy “config/vufind/facets.ini” into “local/config/vufind”. That gives me my own local copy of the file, which I can modify without touching the core. Then I'll go back to VSCode and open my new facets.ini.

At the very top of the file there's a section called [Results], which controls the facets displayed beside the regular search results. As an aside, a recommendation module setting in searches.ini tells VuFind to look here, so in theory you could store this anywhere. By default, though, the [Results] section of facets.ini controls your default facet options. Just like the other configurations I've shown you, this list can be reordered, and the facets will be displayed in the VuFind interface in the new order. You can also change the labels on the right side of the equals signs to whatever you want, and they will pass through the translation system. As with the sort options, the values on the left side of the equals signs are names of Solr fields, and that's where the facet values are retrieved from. These defaults should be reasonable, but you can also add your own. For example, if you index a custom field with SolrMarc, you can expose it here as a facet. When we talk more about Solr in the future, you'll learn that different fields behave differently and not all fields work well as facets, so keep that in mind if you try something and get weird, unexpected results.

As a quick example, let's say we think language is the most important facet, so we want it to appear first. Our demo instance also has the same institution and building value in every record, so there's no point in showing those. We'll put semicolons in front of them to disable them. Now if I save this, go back, and refresh my results, the language facet is at the top, those other two have disappeared, and all the other fields show up as they always do.
A little further down, there's a [ResultsTop] section, which controls which facet fields show up in the top facets area that I just disabled. Subject faceting is probably still useful, so let's move that line out of [ResultsTop] and into [Results], and save. If I refresh the page again and scroll down, the topic facet values that used to appear above the search results are now in the sidebar.

If we scroll a little further down in the file, you'll notice some sections related to labeling. VuFind uses [FacetLabels] and [ExtraFacetLabels] to work out how to label the filters you apply to your search results. For example, if I click on “Call Number” > “General Works”, you can see in the filter list that VuFind labels this filter as “Call Number”. It can do that because it has a list of configuration sections where it knows to look for labels for different Solr fields. If it can't find a label, it puts the word “Other” in front of the filter. So these sections just tell VuFind where to find human-readable labels for fields. You shouldn't have to touch them. If you ever see filters labeled as “Other”, though, it probably means you've made a customization somewhere and haven't reflected it in these settings. You can fix that either by listing the relevant section or by putting an explicit Solr-field-to-label mapping in [ExtraFacetLabels].
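
The label lookup described above can be modeled in Python. This is a simplified sketch, not VuFind's code. The field names and labels in the sample data are examples, and the `"Other"` fallback follows the behavior described in the video.

```python
def filter_label(field: str, label_sections: list, extra_labels: dict) -> str:
    """Find a readable label for a Solr field used as a filter.

    label_sections: facet sections (field -> label) searched in order, like
    the sections listed under [FacetLabels]; extra_labels: explicit
    mappings, like [ExtraFacetLabels]. Falls back to "Other".
    """
    for section in label_sections:
        if field in section:
            return section[field]
    return extra_labels.get(field, "Other")
```

A custom field that appears in none of the listed sections comes out as "Other" until it is added to a section or given an explicit mapping.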

There's also a [SpecialFacets] section, which lets you turn on special handling for certain fields. In the default configuration, it marks the publish date field as a “dateRange”, which is what makes this slider control appear. There are a few different kinds of ranges you can turn on if you need them, and they're all explained in the comments in the file.

Also interesting is the [CheckboxFacets] section. If you have a setting that's essentially an on/off switch, you can put a Solr query and a label here, and VuFind will present it as a checkbox for the user to toggle. For example, suppose I want to filter to only records that have no author. I can use the Solr query “-author:*”, which is the syntax for excluding anything with a value in the author field, and set it equal to “No Author”. If I save this, go back, and refresh my search results, I now have a No Author checkbox. When I click it, I end up with 10 records that don't have authors. That's a fairly contrived example, but in real-world situations these checkbox controls can be valuable.
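
To make the meaning of `-author:*` concrete, here is a toy Python model of the filter. It works on plain dict records, which are hypothetical, and handles only the `field:*` and `-field:*` forms. It does not reproduce Solr's query parser or its indexing rules.

```python
def parse_exists_query(query: str) -> tuple:
    """Parse "field:*" or "-field:*" into (field, negated)."""
    negated = query.startswith("-")
    body = query[1:] if negated else query
    field, sep, value = body.partition(":")
    if not field or sep != ":" or value != "*":
        raise ValueError("this sketch only handles field:* and -field:*")
    return field, negated


def apply_exists_filter(records: list, query: str) -> list:
    """Keep records with (or, if negated, without) a value in the field."""
    field, negated = parse_exists_query(query)

    def has_value(record):
        return record.get(field) not in (None, [])

    return [r for r in records if has_value(r) != negated]
```

`field:*` keeps records that have any value in the field, and the leading minus sign inverts that. This is why the No Author checkbox shows only records with an empty author field.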

Below the [CheckboxFacets] section is the [Results_Settings] section. Here you can control how many values are displayed in the sidebar, how many show before the “more” link (sort of above the fold), and what kind of behavior, if any, we offer for exploring deep into the facet set. By default we show 6 values above the fold, expanding to 30 values. We also offer a “see all” control, which pops up a lightbox where you can do more work with the facet values. If you want to change any of that, say to show only two values above the fold and only ten in the complete list, just change “showMore” and “facet_limit”. You can also vary these settings field by field. For example, to show a lot of formats but very few of anything else, we can set “showMore[*] = 2” as a default and then override it with “showMore['format'] = 10”. Now if I start over and do a blank search, all of the format values show. We don't actually have 10 distinct formats in this index, but we're seeing all of them, and every other facet shows at most 2 with an expand control.
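
The precedence between the `*` default and a field-specific override can be sketched in Python. The dict is an assumed, already-parsed view of lines like those quoted above; the sketch does not parse the INI syntax itself. The function name and the `fallback` argument are hypothetical.

```python
def per_field_setting(settings: dict, field: str, fallback):
    """Resolve a per-field setting such as showMore for one facet field.

    `settings` is an already-parsed view of lines like showMore[*] = 2 and
    showMore['format'] = 10: an exact field entry wins, then "*", then
    the fallback.
    """
    if field in settings:
        return settings[field]
    return settings.get("*", fallback)
```

With `{"*": 2, "format": 10}`, format resolves to 10 and every other field resolves to 2, matching the demo.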

There are also settings for controlling the dimensions of the top area, if you want to customize that. Another setting controls exclude links, which let users remove values from a result set rather than filtering down to only those values. If we uncomment “exclude=*”, exclude buttons appear on every single facet field. For example, if we don't want journals, I can click this X, and all the journals are removed from the search. This can be quite useful.

We can also change facets so that selecting more values widens the results instead of narrowing them. That's the orFacets setting. The * applies it to every field, but you can limit it to certain fields. Say I only want it for format: I set it on the format field, and when I refresh my screen I now have little checkboxes. If I click on “Conference Proceedings”, I've narrowed down to only conference proceedings, but I can still see the other two values that existed in the result set, and I can widen my search again by checking more of them. Depending on what kinds of values you're working with, this option can be valuable.
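
The difference between exclude links, the default AND behavior, and OR facets can be shown with a toy Python filter over one multi-valued field. The records, the format values, and the function name are hypothetical. Real filtering happens in Solr, not in code like this.

```python
def filter_records(records: list, field: str, selected=(), excluded=(), or_mode=False) -> list:
    """Apply facet selections on one multi-valued field (plain-dict sketch)."""
    result = []
    for record in records:
        values = set(record.get(field, []))
        if excluded and values & set(excluded):
            continue  # exclude links remove any record carrying the value
        if selected:
            if or_mode:
                keep = bool(values & set(selected))  # OR: any selected value
            else:
                keep = set(selected) <= values  # AND: every selected value
            if not keep:
                continue
        result.append(record)
    return result
```

In AND mode each extra selection can only shrink the result set. In OR mode each extra selection can only grow it, which is why OR facets let users widen a search again by checking more values.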

You can also control whether facets are expanded or collapsed by default, and again you can set that field by field. If you turn collapsing on for everything, all non-selected facet regions are collapsed when the screen loads, and the user has to expand them to see values.

The [Advanced] section controls which facets are used as filters in the select boxes on the advanced search screen. To show you that quickly: if I follow the advanced link, we have call number, language, and format filters, which let you pre-select some values when you're doing an advanced search. The [Advanced] section controls which boxes appear here. [Advanced_Settings] also lets us limit how many values can appear in each of those boxes, along with some other behavioral controls. You'll notice that by default the advanced facets are OR'd together. Selecting several values therefore returns records matching any of them, rather than only records that match every selected value. There are also some special facets that can be turned on for the advanced screen.

Further down, there's a setting that applies to all of VuFind, not just the advanced screen: translated_facets. You can use it to run facet values through VuFind's translation system. By default, most values in the Solr index are presented as-is. So even though VuFind's interface supports many languages, those values display in whatever form they were indexed in. Sometimes a field holds a controlled vocabulary that you want to translate into many languages, and this setting provides the mechanism for doing that. You'll notice that format is translated by default, because VuFind comes with translations for all of the format values it uses. The callnumber-first field, which contains the top-level Library of Congress call number category, is also translated. You'll see a :CallNumberFirst after the name of that facet field, which specifies the language file where its translations are found. We'll go into that in more depth when we talk about internationalization in the future. For now, just be aware that if you want to translate your facet values, you can set that up here.
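
Here is a Python sketch of how entries such as `format` and `callnumber-first:CallNumberFirst` could be interpreted. The part after the colon selects a separate language file, as described above. The catalogs are hypothetical stand-ins for language files, and None is used here to mean "the default language file". This is not VuFind's implementation.

```python
def parse_translated_facets(entries: list) -> dict:
    """Map facet field -> text domain (None means the default language file)."""
    mapping = {}
    for entry in entries:
        field, _, domain = entry.partition(":")
        mapping[field] = domain or None
    return mapping


def display_value(field: str, value: str, translated: dict, catalogs: dict) -> str:
    """Translate a facet value if its field is listed, else show it as-is."""
    if field not in translated:
        return value  # untranslated fields display the indexed value
    catalog = catalogs.get(translated[field], {})
    return catalog.get(value, value)  # missing entries also fall through
```

For example, with a hypothetical French catalog `{None: {"Book": "Livre"}}`, a format value of "Book" would display as "Livre". A language value, whose field is not listed, would display exactly as indexed.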

Finally, the [HomePage] and [HomePage_Settings] sections control the top-level browse categories on the front page of VuFind. You can display any facet field there, and you can control how many values to show by adjusting these sections of facets.ini. Also note that after making changes there, or after reindexing your records, you may need to clear VuFind's internal cache if the changes don't appear right away.

That covers the highlights of facets.ini. Now we'll look at one last file, the most complex configuration file in VuFind: “searchspecs.yaml”, which contains the specifications for all of VuFind's search handlers.

Once again I'll copy “config/vufind/searchspecs.yaml” into my “local/config/vufind” directory so I have a local copy to tinker with, and then open it in VSCode. searchspecs.yaml is, as the name says, a YAML file, which is a particular way of structuring data for machine readability. The data here is complex and deeply nested, so YAML was chosen early in VuFind's development as a workable format for representing it. YAML is not the most user-friendly format, so be careful with it. If you put a space in the wrong place so that settings don't line up with one another, the parser can get confused and have trouble reading the file. So proceed with caution. Today we'll just take a quick, high-level tour of this file, so that you are aware of its existence and can do more with it later once you understand more about Solr.

First of all, the file begins with a long, long series of comments that describe all of the settings, explain how they work, and provide some examples. Below all of that are the actual search handler definitions. Each one starts with the name of the handler and a colon, followed by some data indented underneath it: here is the author handler, here is the ISN handler, here is the subject handler, and so on.

Without getting too deep into Solr, VuFind uses two main modes of searching. The first is called “edismax”. It takes a list of fields plus some relevance ranking numbers that set how important those fields are relative to one another. Solr then searches the user's query across all of those fields and returns relevance-ranked results based on your preferences. “Author” is a perfect example of an edismax search. VuFind has a number of fields that contain author data in different formats and from different parts of the incoming records. Here we're saying the primary author is the most important. The fuller version of the primary author's name, if present, may also be quite important. So we give the primary author a 100-point boost and author_fuller a 50-point boost. All the other author fields (secondary authors, corporate authors, variant forms of names, etc.) are included with equal, unboosted relevance. It's important to note that these numbers mean nothing in and of themselves; they only matter relative to one another. Here we're saying that a primary author match is a hundred times more important than a secondary author match. When you want to change the order of your results, it often helps to experiment with these numbers to find the right balance, and there's as much art as science in that process.

Let's scroll down and try a real example of relevance ranking. I mentioned earlier that the default search handler in VuFind is AllFields, and sure enough, here's its definition. It really does cover all fields. A bunch of title fields are boosted quite a lot, series fields a little, authors a fair amount, and topics almost as heavily as titles, plus a few other bits and pieces. Notice the field called “allfields”. This catch-all field grabs all of the text of each record during indexing. So searching allfields guarantees that we find the words we're searching for if they're anywhere in the record, while the more granular boosting prioritizes matches in more meaningful places.

Let's see this in action. In my example index, if I search for the word “finds” and reset all my filters, I get two different kinds of matches. A couple of title matches are at the top, because Solr has decided that my search for “finds” matches the word “fine” in “fine arts”. Solr is configured to be tolerant of word endings and to match rather than not match when given the choice. A couple of other records contain the name “Greg Fines”. Suppose that in my particular situation, I think people will search for names more often than for titles. I can go into my searchspecs.yaml file and make the numbers on the author fields bigger. I'll make authors 10 times more important than the default by changing the 300-point boost on author to 3000 and the 150-point boost on author_fuller to 1500. Now if I go back to VuFind and refresh my search results, the author matches are at the top of the list. This kind of search tuning can be tricky and complicated, because you're always dealing with competing concerns, and a change that fixes one scenario may break another. So approach it with caution, but keep in mind that you have a lot of power and flexibility for making these kinds of tweaks.
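
The two points above can be checked with a toy Python model of dismax scoring. First, boosts matter only relative to one another. Second, raising one field's boost can reorder results. The model follows the dismax rule: take the best-matching field's score, plus `tie` times the other matching fields' scores. Each field's score is simplified to its boost, so it ignores term frequency, field length, stemming, and everything else Solr actually weighs. Field names, boost values other than the 300/3000 author figures above, and the records are made up for illustration.

```python
def dismax_score(record: dict, terms: list, boosts: dict, tie: float = 0.0) -> float:
    """Toy dismax: best matching field's boost plus tie * the others."""
    field_scores = []
    for field, boost in boosts.items():
        words = record.get(field, "").lower().split()
        if any(term.lower() in words for term in terms):
            field_scores.append(boost)  # toy per-field score: just the boost
    if not field_scores:
        return 0.0
    best = max(field_scores)
    return best + tie * (sum(field_scores) - best)


def rank(records: list, terms: list, boosts: dict, tie: float = 0.0) -> list:
    """Return record ids, highest toy score first (ties keep input order)."""
    scored = sorted(records, key=lambda r: dismax_score(r, terms, boosts, tie), reverse=True)
    return [r["id"] for r in scored]
```

Multiplying every boost by the same factor leaves the order unchanged. Raising only the author boost from 300 to 3000 moves the author match above the title match, as in the demo.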

Anyway, back to what I said earlier about there being two types of searches. I've now covered the edismax style of search fairly extensively, but Solr also supports a more detailed query syntax that uses boolean operators, specific matching syntax, and specific fields.

This is often called Lucene syntax, because Solr is built on an indexing engine called Lucene. Long ago, before the dismax functionality was fully developed, VuFind relied much more heavily on basic Lucene searching. searchspecs.yaml includes some fairly rich capabilities for building really complicated queries across multiple fields, originally an effort to do what dismax now does for free. Most of those complicated configurations have been removed, because dismax does the same thing more easily. There are still some scenarios, though, where the control of Lucene syntax is valuable.

For example, at the very bottom of the file there is an OCLC number search. OCLC numbers are a commonly used identifier for records, and they turn up in a lot of places. VuFind indexes OCLC numbers into one of its Solr fields, but we don't want to treat them like ordinary keywords. As you'll know if you've seen OCLC numbers, they sometimes have odd alphanumeric prefixes and sometimes have leading zeros, and VuFind standardizes them to a particular format when indexing. When a user searches for an OCLC number, we don't know exactly what they'll type. So we want to normalize their input the same way VuFind normalizes the values stored in its index, making what the user types more likely to match. That's what this custom section does. We define a custom munge (“munge” being programmer slang for data manipulation) and name it oclc_num. It uses regular expressions, which are just a way of transforming text, to say “get rid of anything that's not a digit, and get rid of any zeros at the beginning of the string.” Then, further down in the query fields, we tell VuFind to search the oclc_num Solr field using the oclc_num custom munge. The result is that users can type pretty much whatever prefix nonsense they want. VuFind will clean it up, search on just the numeric part, and find a match if there is one.
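
The two regular-expression steps described for the oclc_num munge can be translated into Python as a sketch. VuFind performs this inside searchspecs.yaml with PHP regular expressions; this function, its name, and the sample inputs are illustrative only.

```python
import re


def munge_oclc(user_input: str) -> str:
    """Normalize a typed OCLC number the way the oclc_num munge is described.

    Step 1 removes every character that is not a digit; step 2 strips
    leading zeros, so prefixes such as "(OCoLC)" or "ocm" do not matter.
    """
    digits_only = re.sub(r"[^0-9]", "", user_input)
    return re.sub(r"^0+", "", digits_only)
```

Inputs like "ocm00012345" and "12345" both reduce to "12345", so either form matches the same normalized value in the oclc_num Solr field.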

If you want to look at something even more complicated (I'm not going to walk through it right now), the call number search does more complex matching and also some ranking. It has one munge that tries to find an exact match on a call number and another munge that tries to find a looser match. It gives exact matches greater relevance, so that results closer to what the user typed rise to the top. We're running short on time, but to bring this all full circle, I'll go back to searches.ini and find the [Basic_Searches] section. I'll add “oclc_num = 'OCLC Number'” to it, which exposes the custom OCLC search from searchspecs.yaml that I showed you earlier. Now if I go to VuFind and refresh it to get the new option, I can do an OCLC number search for a ridiculously over-prefixed number that I happen to know is in the system. Even though I typed all this nonsense, it finds a match on the number and brings back the matching record.

So, that's our time for today! I know that was a lot, but now you've seen some of the many configuration options available in VuFind, and hopefully you understand how they work a little better. We'll do more next month. In the meantime, feel free to reach out on Slack, the mailing lists, or elsewhere if you have questions or if we can help in any other way. Thank you!

videos/configuring_search_and_facet_settings.txt · Last modified: 2023/04/26 13:34 by crhallberg