Warning: This page has not been updated in over a year and may be outdated or deprecated.

This is an old revision of the document!


Video 4: Configuring Search and Facet Settings

The fourth VuFind instructional video provides a tour of three of VuFind's most important configuration files: searches.ini, facets.ini and searchspecs.yaml. Using these three files, you can control many options presented to users for navigating and refining search results, and you can also change the way those search results are retrieved.

Video is available as an mp4 download or through YouTube.

Transcript

:!: This transcript still needs to be cleaned up and edited.

Hello, and welcome to the fourth in the VuFind instructional video series. Today we are going to look at three of the most important configuration files in VuFind for adjusting the way search and faceting behave.

But before we dive into that, I just wanted to set the stage for those who have been following along. I didn't want the examples today to look too weird, so I have gone back to the standard default VuFind theme, and I have reindexed data using default settings instead of the customizations we demonstrated back in video 2. So if you're coming in for the first time, you'll see fairly normal things. I've also indexed a few more of VuFind's sample MARC records, just to give us a bigger set of search results to play with. If you're keeping score, the files I indexed are journals.mrc, geo.mrc and authoritybibs.mrc.

With all that out of the way, let's start looking at configuration files. This is also a good time to put in yet another reminder about always using your local settings directory when changing configuration files. So let's start with searches.ini, which is the top-level configuration for VuFind's search options. I'm currently in the VuFind directory, so we will copy config/vufind/searches.ini into local/config/vufind. Now we have a local copy of the configuration file that we can edit without touching the core files of the VuFind distribution. I'm going to bring up Visual Studio Code, which is just one of many tools that you can use to edit files, and I will open searches.ini and go to the top of the file.

The first thing I should explain is that in this file you will see references to search handlers. "Search handler" is VuFind terminology for one of the options in the drop-down menu next to the search box, like Title or Author or Subject. It's just a group of settings that tell VuFind how to do a particular kind of search. We're going to look at how those are built
a little later, but for starters let's just talk about the ones that are preconfigured for you out of the box in VuFind, which are all those common things I mentioned, like Title and Author and Subject.

At the very top of the file, you will see that there's a default handler set, which is AllFields. This is a very broad search that, as it says, is going to search across the majority of the fields indexed in VuFind. There is really very little reason to ever change this default handler, so I recommend leaving that alone.

However, there is a setting just a few lines down called empty_search_relevance_override, which is turned off by default because it causes VuFind to do some extra work, but this is something that a lot of people may actually want to turn on. Let me show you what it does. Here I am in my local VuFind demo, and if I run a blank search, this will show us all of the records in the index. You'll see that by default things are sorted by relevance, which means that VuFind tries to use the user's search terms to figure out which things in the result set are most important. But if you do a blank search, there are no search terms to use for ranking, and you just get everything back in a completely arbitrary order. That's where this setting comes in. If I go back and uncomment empty_search_relevance_override, VuFind will use a sort option other than relevance, only for empty searches, and then we will get perhaps a more meaningful-looking result set when I refresh my search results.

Moving down a little further, you will see that there are a couple of major sections in this file that control which search handlers are available to your user. There is Basic_Searches, which controls the drop-down list next to the search box, which is on practically every screen of VuFind, and then there's Advanced_Searches, which controls what options are available in the boxes on the advanced search screen. You'll notice that by
default there are more options available for advanced search, because we are expecting users to be trying to do more specific things and combining fields in more complex ways when they're doing advanced searches, whereas the basic search is really just the basics. There's also one special case here: the tag search, which is a way of searching for tags that users have created on records. It works a little bit differently than everything else, which is why it's only available in the basic list. It doesn't play well with other types of searches, so it doesn't work properly if you try to put it in an advanced search.

All of these lists let you do a couple of things. First of all, they let you control the order of the options. For example, if I want the Author option to appear above the Title option, I can just cut and paste to put the Author line above the Title line, and then if I refresh my VuFind screen and look at the drop-down menu, the order of the fields matches what's in the configuration file. So if presentation of options is important to you, you have full control over that.

The other thing that you'll notice here is that some of these things look a little bit cryptic, like Author = adv_search_author. What's going on here is that the values on the left side of the equals sign are the names of search handlers, which are configured in another file that we'll look at shortly. The values on the right side of the equals sign are translation strings. VuFind has an internationalization system which looks up keys and then displays values in the user's currently chosen language, and all of these values are going to go through that translation system and get translated for the user. If you decide to customize these labels, you can really put anything you like here. For example, I could change Author to "a person who created stuff", and then if I go back to VuFind and refresh the page, I now see that I have the option of the person who
created stuff search. But the important thing to remember is that if your particular VuFind instance presents the interface in multiple languages, you're going to have to create mappings for these strings in all of the language files you need, so that they translate correctly when users switch languages. Internationalization is probably a good topic for a future video, but for now I just wanted to make you aware that it exists.

You'll also notice that there are a couple of searches that have semicolons in front of them, meaning that they are commented out; they're turned off. That is because they are useful options in some cases, but not everyone needs them, so we leave them off by default. For example, we have this Coordinate search, which can be useful if you're using VuFind's optional geographic functionality but is meaningless otherwise, and we have this JournalTitle search, which is just like a title search but filters down to journals. Again, not everyone is indexing journal data, so we don't provide it by default. But just for fun, let's turn it on here by removing the semicolon. Now if I refresh my browser yet again, I have a Journal Title search showing up under Title, and if I just do a blank Journal Title search, that brings back just ten results, which is all of the journals that were indexed in my sample data. So that is how you control, reorganize and relabel your search handlers.

Similarly, there is a section in the file which controls your sorting options. It's under the heading Sorting. As with the search handlers, there are some options that are on by default and some options that are commented out but are there for your reference. In this instance, VuFind by default assumes that you're going to be using Library of Congress style call numbers, but if instead you're using Dewey, you can just flip-flop these by commenting out callnumber-sort, which is the LC field, and uncommenting dewey-sort. In this section, the values on the
left are the names of fields in the Solr index that VuFind can use for sorting, and the values on the right are the labels, which are once again translation strings. So you can reorganize this list, you can relabel this list, and you can add different Solr fields if you want to create some kind of a custom sort. The one thing that you have to keep in mind if you're trying to use custom fields here is that Solr sorting must be done on fields that contain only one value per record. If a field contains multiple values, Solr doesn't know how to sort on it, because it needs just one value to put the record in an ordered sequence. If you try to do anything else here, it will cause an error, and that's a common source of confusion. We can get into that further when we talk about Solr in a future video.

Another useful option, back up near the top of the file, is the ability to control the page size of results. You'll notice when you first install VuFind that you have 20 records per page and no ability to change that, but in this configuration file you can adjust both the page size and the ability to switch the page size. There's a default_limit, which you can see here is set to 20. Suppose I want to change this to 40. We are going to have to close our web browser before we can see the change take effect, because VuFind remembers all of your default choices in cookies and reinforces them for you. This is usually convenient, because if you've made a choice and you do a new search, you want that choice to remain in effect. But when you're editing your configuration files and you change something, it can appear that your change didn't work, because VuFind is remembering what you were doing before you made the change. So now I've closed Firefox and opened it back up again, which has cleared out my session-level cookies, and when I perform a blank search, I have 40 results on the page instead of 20.

Now, if I want to give the user the choice of how
many records to view at once, I can uncomment this limit_options setting, which is just a comma-separated list of numbers, and all of those numbers are going to be provided as a drop-down menu. If I refresh the page, I now have a results-per-page control, and I can set that to 10 or 100 or whatever. You are free to put any number you like in this configuration, but do keep in mind that there are limits to how much data Solr can process and pass to VuFind at once, so if you try to set your page size to 10,000, you might expect to run into memory problems, performance problems or other issues. Configure this, but try to keep it within reason.

Another very important feature of VuFind is what we call the recommendation module system, which is just a modular way of plugging custom code into the search results screens. There are three different regions where you can position a recommendation module, and by default several of the things that VuFind uses are actually recommendation modules. That includes this "Suggested Topics within your search" control at the top, which is a top recommendation module. It includes the standard faceting controls, which are a side recommendation module. And it includes a variety of things that show up if you do a search that doesn't yield any results, like this blue box suggesting ways to find more results; these are no-results recommendation modules. So you have full control over what shows up when a user performs a search that yields no results, and also over what shows up above and to the side of normal search results, and that is all done through searches.ini.

If you look fairly near the top of the file, you're going to see that there are default_top_recommend settings, default_side_recommend settings and default_noresults_recommend settings. What these do is tell VuFind which recommendation modules to load if there are no more specific settings available, because you can configure each search handler
to have its own set of recommendation modules. For example, when you do an author search, you're going to see some author-specific tips and details, and that's made possible by this recommendation configuration. To see the other piece of this besides the defaults, scroll down: there is a section called SideRecommendations, a section called TopRecommendations and a section called NoResultsRecommendations. In each of these sections you can specify the name of a search handler, then brackets (which allow you to repeat that setting to give it multiple values), and then the one or more recommendation modules that you want to display. As I said, Author has some custom faceting that's made possible through this. Similarly, call number searches behave a little bit differently if you do a call number search and don't find any results.

There are a lot of recommendation modules available, and if you're comfortable with coding, you can also write your own. Each one of them is just one PHP class and one corresponding template to display things. Obviously, building new ones is beyond the scope of this video, though we may do that in the future. But if you just want to see all of the available options, scroll up above the SideRecommendations section of this configuration file and you will find a substantial list of options, divided according to which section each is recommended for. I would encourage you to read through this and see if there's anything that catches your eye. Most of them are fairly easy to configure: you just put the name of the module and then, if it's a module that has special options, a number of parameters, which are described in the documentation here.

For now, just a really quick example. I'm going to go back up to the default recommendation settings. I don't know about you, but I'm personally not a huge fan of the facet controls above the search results, so this is something that I often turn off when I'm setting up VuFind, and getting
rid of that is as easy as commenting out this line by putting a semicolon at the beginning of it. So I've removed the default_top_recommend[] = TopFacets:ResultsTop line, and now if I go back to my browser and refresh, my search results just start at the top of the page instead of having those facets above them.

One last thing in searches.ini that's worth pointing out is that this is where all of the autocomplete functionality is set up. There's an Autocomplete section, which you can use to turn autocomplete off completely if you don't want to use it. Similar to recommendation modules, there's a default handler that is used for generating autocomplete suggestions while users type their search queries, but that's overridden on a handler-by-handler basis to configure different rules for how those suggestions are looked up. Most people are probably not going to need to change this, but if you are planning on building your own custom handler, it's nice to know that you can also control how suggestions are provided for that handler. The majority of these work by using the Solr autocomplete handler, which accepts the name of a search handler that the user's current text will be searched against, and then a prioritized field list, which says which Solr field or fields to use to make the suggestions. As with the recommendation modules, all of the different autocomplete handlers are documented above the section in comments, which have a little bit more detail about how this works. Again, this is something most people don't need to change, but if you need to, it's here.

I think that's probably plenty on searches.ini, but there's another configuration file that you will probably want to play with, which is facets.ini, and this is what controls how VuFind provides faceting options for narrowing searches. So once again I'm going to go to the terminal in my VuFind home directory, and from config/vufind I'm going to copy
facets.ini into local/config/vufind, so that I have my own local copy of the file that I can modify without touching the core. Then I'm going to go back to VS Code and open my new file, facets.ini.

At the very top of the file there's a section called Results, and this is what controls the facets that are displayed on the side of the regular search results. Just as an aside, there's actually a recommendation module configuration in searches.ini that tells VuFind that this is where to find them, so you can theoretically store this anywhere, but by default the Results section of facets.ini is going to control your default facet options. Just like all the other configurations I've shown you, this list can be reordered, and the order changes will be reflected in the order that the facets are displayed in the VuFind interface. You can also change all of these labels on the right side of the equals signs to whatever you want, and they will pass through the translation system. Similar to the sort options I showed you, the values on the left side of the equals signs are all names of Solr fields, and this is where the facet values are going to be retrieved from. Again, these defaults should be reasonable, but if you're doing custom things (for example, if you index a custom field with SolrMarc), you can expose it here as a facet if you want to. When we talk more about Solr in the future, you will learn that different fields behave differently and not all fields work well as facets, so keep that in mind if you try something and get weird-looking, unexpected results.

Just as a quick example, let's say that we think language is the most important facet, so we want to make that appear first. And because our demo instance has the same value for every record in institution and building, there's really no point in showing those, so we're going to put semicolons in front of them to comment them out and disable them. So now if I save this
and go back and refresh my results, I now have my language facet at the top, those other two things have disappeared, and all the other fields are showing up as they always do.

If we look a little further down, there's a ResultsTop section, and this controls which facet fields show up in that top facets area which I just disabled. It's probably useful to still have subject faceting, so let's move that out of ResultsTop and into Results, save that, and refresh the page again. Now when I scroll down, those topic facet values that used to be showing above the search results are showing in the sidebar.

If we scroll down a little bit further in the file, you'll notice that there are some sections related to labeling: FacetLabels and ExtraFacetLabels. These are used because VuFind needs to know how to label the filters that you apply to your search results. For example, if I just click on, say, the call number value General Works, you can see in the filter list that VuFind labels this filter as Call Number. It does that because it has this list of configuration sections where it knows it can find labels that apply to different Solr fields. If it can't find a label, it's going to put the word Other in front of the facet. So this section is just here to enable VuFind to find all the configuration that will give it human-readable labels for fields. You shouldn't have to touch this, but if you ever see filters being labeled as Other, it probably means that you've made a customization somewhere and haven't reflected it in these configurations, either by providing a section or by just putting an explicit Solr field to label mapping in ExtraFacetLabels.

There's also a SpecialFacets section here, which allows you to turn on special handling for certain fields. In the default configuration, this is used to indicate that the publishDate field is a date range, which is what enables this slider control to appear. There are a few different kinds of ranges you can
turn on if you need them; it's all explained in the comments in the file.

Also interesting is this CheckboxFacets section. If you have a setting that's essentially a flip-flop switch, you can put a Solr query and a label in here, and VuFind will present that as a checkbox for the user to toggle. For example, suppose I want to be able to filter to only records that have no author in them. I can make a Solr query of -author:*, which is the syntax for excluding anything with a value in the author field, and I can say equals "No Author". If I save this and go back and refresh my search results, I now have a No Author checkbox available, and if I click that, I end up with 10 records showing that don't have authors. That's a fairly contrived example, but there are real-world situations where having these checkbox controls can be valuable.

Right under the CheckboxFacets section is the Results_Settings section, which is where you can control how many values are displayed in the sidebar, how many show above the fold of that expand link, and also what kind of behavior, if any, we present for exploring deep into the facet set. By default we show 6 values above the fold, and that expands out to 30 values, and we offer a "see all" control which pops up a lightbox where we can do more work on the facet values if we want to. If you want to change any of that (say you only want to show two above the fold and only ten in the complete list), you can do that by just changing showMore and facet_limit. You can also switch these out on a field-by-field basis. For example, if we want to show a lot of formats but very few of anything else, we can say showMore[*] = 2, which sets a default, and then override that default with showMore[format] = 10. Now if I start all over here and do a blank search, all of the format values show (we don't actually have 10 distinct values in this index, but we're seeing all of them) and everything else shows at most two, with an expand control.

There are
also settings for controlling the dimensions of the top area, if you want to customize that, and for whether we want to have exclude links, which let users wipe values out of a search set rather than filtering to include only those values. If we just uncomment this exclude = * line, that will put exclude buttons on every single facet field. So, for example, if we don't want journals, I can click this X and it removes all the journals from the search. This can be quite useful.

We can also change facets so that by clicking more values the user sees more options rather than narrowing. That's the orFacets setting, and it can be set to work on only certain fields. If you use the star, it will apply to everything, but say I only want this to apply to format: I can set that on the format field, and then when I refresh my screen I have little checkboxes. If I click on Conference Proceedings, I have narrowed down to only conference proceedings, but I can still see the other two values that existed in the result set, and I can widen my search again by checking off more of them. Depending on what kinds of values you're working with, having this option available can be valuable.

You can also control whether facets are visible by default or collapsed, and again you can specify that on a field-by-field basis. If you just turn it on for everything, then when you load the screen all non-selected facet regions are collapsed, and the user has to expand them to see values.

The Advanced section controls which facets are used as filters in the select boxes on the advanced search screen. Just to show you that quickly: if I go to the Advanced link, we have these call number, language and format filters, which allow you to preselect some values when you're doing an advanced search. We can control which boxes appear here through this Advanced section, and Advanced_Settings also lets us limit how many values can appear in each of those boxes, along with some other behavioral controls. You'll
notice that by default the advanced facets are ORed together, so that multiple selections include all of those values rather than only including records that match every single selected thing. There are also some special facets that can be turned on for the advanced screen.

Further down (and this is actually a setting that applies to all of VuFind, not just the advanced screen) there is a translated_facets setting, which you can use to run facet values through VuFind's translation system. By default, the majority of values in the Solr index are just presented as is, which means that even though VuFind's interface supports many languages, the values that are coming through are just going to display in whatever form they were indexed in. But sometimes you may have a controlled vocabulary in a field that you want to translate into many languages, and this provides the mechanism for doing that. You'll notice that format is translated by default, because VuFind comes with translations for all of the format values it uses. Additionally, the callnumber-first field, which contains the top-level Library of Congress call number category, is translated, and you'll see that there's a :CallNumberFirst after the name of the facet field that's being translated. This specifies which language file the translations are found in. Again, we'll go into that in more depth when we talk about internationalization in the future, but for now just be aware that if you want to translate your facet values, you can set that up here.

Finally, there are HomePage and HomePage_Settings sections, and these control the top-level browse categories on the front page of VuFind. You can set any facet field to display here, and you can control how many values to show, just by changing that HomePage section of facets.ini. Also note that if you do make changes there, or if you reindex your records, you may wish to clear VuFind's internal cache to make those changes visible if you don't see them right away.
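
The facets.ini edits demonstrated in this part of the video can be gathered into a single fragment. What follows is only a sketch of a local/config/vufind/facets.ini, not a copy of the presenter's file. The section and setting names follow what is spoken in the video; the exact spelling of each key, the Solr field names and the labels are assumptions based on a default installation and may differ between VuFind versions, so compare them with the comments in your own copy of the file.

```ini
; Sketch only: key spellings, field names and labels are assumptions.

[Results]
; Language moved to the top of the sidebar.
language     = Language
; Same value on every demo record, so these two are disabled.
;institution = Institution
;building    = Library
format       = Format
; Moved here from [ResultsTop] so topics appear in the sidebar.
topic_facet  = "Suggested Topics"

[CheckboxFacets]
; Solr query on the left, checkbox label on the right.
-author:* = "No Author"

[Results_Settings]
; Maximum number of values in the expanded list.
facet_limit = 30
; Show two values above the fold by default, but ten for format.
showMore[*] = 2
showMore[format] = 10
; Offer exclude links on every facet field.
exclude = *
; Treat multiple selections within format as OR instead of AND.
orFacets = format
```

Because every line here lives in the local settings directory, the core file under config/vufind stays untouched and the changes can be undone by deleting or commenting out individual lines.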
So that covers all of the highlights of facets.ini, and now we are going to look at one last thing, which is the most complex configuration file in VuFind: searchspecs.yaml, the file that contains specifications for all of VuFind's search handlers. I'm once again going to copy config/vufind/searchspecs.yaml into my local/config/vufind directory, so I have a local copy I can tinker with, and I'm going to go into Visual Studio Code and look at that.

searchspecs.yaml is, as the name says, a YAML file, which is a particular way of structuring data for machine readability. Because of the complex nested nature of the data we're using here, YAML was chosen early in VuFind's development as a workable format for representing all of the things we need to do. YAML is not the most user-friendly format, so be careful with it: if you have a space in the wrong place, so that settings don't align with one another, the parser can get confused and have trouble reading the file. So proceed with caution. For today, we're just going to do a quick high-level browse of this file, so that you are aware of its existence and can do more with it later when you understand more about Solr.

First of all, the file begins with a long series of comments describing all of the settings and how they work and providing some examples. Under all of that is the actual set of search handler definitions. You'll see that each of these starts with the name of the handler and a colon, and then some data indented underneath it. So, for example, here is the Author handler, here is the ISN handler, here is the Subject handler, and so on.

Again, without getting too deep into Solr, there are essentially two different modes of searching that VuFind uses. One is called DisMax, and this method works by just taking a list of fields and some relevance ranking numbers that establish the relative importance of those fields to one another. Then Solr will
just take the user's search, run it across all of those fields, and then return a relevance-ranked result based on your preferences. Author is a perfect example of one of these DisMax-type searches. VuFind has a number of different fields that contain author data in different formats and from different parts of the incoming records. So here we're saying the primary author is the most important, and the fuller version of that primary author's name, if it's present, may also be quite important. We're giving the primary author a 100-point boost, we're giving author_fuller a 50-point boost, and then we're including all the other author fields as being of equal relevance: secondary authors, corporate authors, variant forms of names and so on. It's important to note that these numbers in and of themselves don't actually mean anything; they're just relative to one another. So we're saying that an author is a hundred times more important than a secondary author. It is often useful to tweak and experiment with the numbers if you're trying to change the order of your results, to find the balance that works right, and there's as much art as science in that process.

Let's scroll down and do an actual example of relevance ranking. I mentioned earlier that the default search handler in VuFind is AllFields, and sure enough, here's the definition for that. It really is all fields, because we have here a bunch of title fields that are boosted quite a lot, series fields boosted a little, authors boosted a fair amount, topics boosted almost as heavily as titles, and then a few other bits and pieces. Notice that there's this field called allfields, which is a catch-all field that during indexing grabs all of the text of the records that you're indexing. So when we search AllFields, we are sure to find the words we're searching for if they're in the records somewhere, but we have all this more granular relevance boosting to try to prioritize matches in more meaningful places. So let's just
show an example of this. If, in my example index here, I search for the word finds and reset all my filters, I get two different kinds of matches. I have a couple of title matches up at the top, because Solr has decided that my search of finds matches the word fine in fine arts. This is because Solr is configured to be tolerant of word endings and to match rather than not match if given the choice. There are also a couple of records where the name Greg Finds is present. So suppose that in my particular situation, I think people are going to be searching for names more often than they are searching for titles. What I can do is go into my searchspecs.yaml file and just make the numbers on the author fields bigger. I'm going to say authors are 10 times more important than the default, so I'm changing the 300-point boost on author to 3000 points and the 150-point boost on author_fuller to 1500. Now if I go back to VuFind and refresh my search results, I see that the author matches are at the top of the list. Obviously, this kind of search tuning can be really tricky and complicated, because you're always going to be dealing with competing concerns, and a change that fixes one scenario may break a different one. So approach it with caution, but keep in mind that you have a lot of power and flexibility for making these kinds of tweaks.

Getting back to what I was saying earlier about there being two types of searches: I've shown the DisMax-style search fairly extensively now, but there's also a more detailed query syntax that Solr supports, which uses Boolean operators, specific matching syntax and specific fields. This is often referred to as the Lucene syntax, because Solr is built on an indexing engine called Lucene. Long ago, before the DisMax functionality had been fully developed, VuFind relied much more heavily on basic Lucene searching, and searchspecs.yaml includes some fairly rich capabilities for constructing really
complicated queries across multiple fields, in an effort to do what DisMax now just does for free. Most of that has been eliminated, because it was very complicated and you can do the same thing more easily, but there are still some scenarios where having the control of the Lucene syntax is valuable.

So, for example, if we go to the very bottom of the file here, there is an OCLC number search. OCLC numbers are just a commonly used identifier for records; they're found in a lot of places. VuFind indexes OCLC numbers into one of the Solr fields, but we don't want to treat that like just any other keyword, because if you've seen OCLC numbers, you'll know that sometimes they have weird alphanumeric prefixes on them and sometimes they have leading zeros on them. VuFind standardizes them to a particular format. So what we want to be able to do is this: if a user says they're searching for an OCLC number, we don't know what they're going to type, but we want to be able to manipulate their input to normalize it the same way that VuFind normalizes the values that it stores in its index, so that what the user types is more likely to match what we have in our index. And so that's what this custom section is doing: we define a custom munge ("munge" being computer programmer slang for data manipulation), we name it oclc_num, and we do some regular expression matching, which is just a way of transforming text, to say "get rid of anything that's not a number, and get rid of any zeros at the beginning of the string." Then down here in QueryFields, we tell VuFind that we want to search the oclc_num Solr field using the oclc_num custom munge, and the end result of that is the user can type pretty much whatever prefix nonsense they want; VuFind will clean it up, do the search on just the numeric part, and if there's a match, it will be found.

If you want to look at something even more complicated, and I'm not going to walk through it right now, the call number search does some even more complex
matching, and it also does some ranking: it has one munge which tries to find an exact match on a call number and another munge that tries to find a looser match, and it gives greater relevance ranking to exact matches, to try to boost things that are closer to what the user has typed.

We are running short on time, but just to bring this all full circle, I'm going to go back to my searches.ini, and I'm going to find the Basic_Searches section. I'm going to add oclc_num = "OCLC Number" to this, to expose that custom OCLC option from searchspecs.yaml that I was showing you earlier. And now if I go to VuFind, refresh it so I get more options, and do an OCLC number search, I can do a search for a ridiculously over-prefixed number that I happen to know is in the system, and even though I put all this nonsense here, it finds a match on the number and brings back the record that matches it.

So that's our time for today. I know that was a lot, but now you've seen some of the many configuration options that are available in VuFind, and hopefully understand how they work a little bit better. We'll do more next month, and in the meantime, feel free to reach out on Slack, the mailing lists, or elsewhere if you have questions or if we can help in any other way. Thank you!
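
As a rough illustration of the OCLC number normalization described in the transcript, the Python sketch below applies the same two steps: remove everything that is not a digit, then remove leading zeros. This is not VuFind's code (the video describes these steps as regular expression rules in searchspecs.yaml); the function name `normalize_oclc` and the sample input are hypothetical.

```python
import re


def normalize_oclc(user_input: str) -> str:
    """Hypothetical helper mirroring the two munge steps described above."""
    # Step 1: get rid of anything that is not a number.
    digits_only = re.sub(r"[^0-9]", "", user_input)
    # Step 2: get rid of any zeros at the beginning of the string.
    return re.sub(r"^0+", "", digits_only)


# A made-up, heavily prefixed input is reduced to its numeric part.
print(normalize_oclc("(OCoLC)ocm00012345"))  # prints 12345
```

Because the same normalization is applied to the stored values and to the user's input, differently prefixed forms of one number end up comparing as equal.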

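The point made in the transcript that boost numbers only matter relative to one another can be sketched with a toy model. The Python below is a deliberate simplification and not Solr's scoring: it assumes a document's score is simply the largest boost among the fields containing the search term, ignoring term statistics and DisMax's tie-breaking. The records, field names and boost values are all hypothetical.

```python
def toy_score(doc, term, boosts):
    """Toy DisMax-like score: the highest boost among fields containing the term."""
    return max(
        (boost for field, boost in boosts.items()
         if term in doc.get(field, "").lower().split()),
        default=0,
    )


def rank(docs, term, boosts):
    """Return documents ordered from highest to lowest toy score."""
    return sorted(docs, key=lambda d: toy_score(d, term, boosts), reverse=True)


# Hypothetical records: one matches in the title, the other in the author.
docs = [
    {"id": "title-match", "title": "Fine arts", "author": "Jane Doe"},
    {"id": "author-match", "title": "Another book", "author": "Greg Fine"},
]

default_boosts = {"title": 500, "author": 300}   # assumed values
tuned_boosts = {"title": 500, "author": 3000}    # author boost multiplied by 10
scaled_boosts = {f: b * 10 for f, b in default_boosts.items()}

print([d["id"] for d in rank(docs, "fine", default_boosts)])  # title match first
print([d["id"] for d in rank(docs, "fine", tuned_boosts)])    # author match first
print([d["id"] for d in rank(docs, "fine", scaled_boosts)])   # same order as default
```

Multiplying every boost by the same factor leaves the order unchanged, while raising only the author boost moves the author match to the top, which is the effect demonstrated in the video.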
videos/configuring_search_and_facet_settings.1581961578.txt.gz · Last modified: 2020/02/17 17:46 by demiankatz