These directions are based on the excellent documentation provided by the SolrMarc project; below is a simplified version. For more detailed information about custom indexing and translation, see the SolrMarc wiki.
Adding a facet to the Narrow Search box is a relatively straightforward procedure. Because facets rely on an index, these instructions assume we are beginning with one of three possible scenarios:
1. The index to be used already exists
2. No index currently exists; the data in the MARC record will be indexed directly (i.e. data is already a text string with no need or desire for normalization)
3. No index currently exists; data is encoded and will need to be translated into text strings first
Index already exists
Example: Adding the date of publication as a facet
First, verify that the index exists and contains the data you want. /import/marc.properties contains the basic mapping of MARC fields to the Solr index. Each line of the file defines a single index. The name of the Solr index comes before the equal sign. Immediately following the equal sign are one or more combinations of a three-digit MARC tag and any subfields that are indexed together. A colon separates fields or subfields that are to be indexed separately. A number in brackets indicates the byte position(s) to be indexed. The keyword “first” at the end of the line means that only the first matching field will be indexed.
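To make the line format concrete, here is a minimal Python sketch that splits a marc.properties line into the parts described above. It illustrates the format only and is not SolrMarc's own parser: the names parse_mapping, FieldSpec and Mapping are hypothetical, it assumes a single-position bracket such as [7] is also allowed, and it does not handle custom-routine lines such as the one behind publishDate.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Matches e.g. "655ab" or "048a[0-1]": tag, subfield codes, optional byte range.
SPEC_RE = re.compile(r"^(\d{3})([a-z0-9]*)(?:\[(\d+)(?:-(\d+))?\])?$")


@dataclass
class FieldSpec:
    tag: str                               # three-digit MARC tag
    subfields: str                         # subfield codes indexed together ("" if none)
    byte_range: Optional[Tuple[int, int]]  # inclusive (start, end), if bracketed


@dataclass
class Mapping:
    name: str               # Solr index name (left of '=')
    specs: List[FieldSpec]  # colon-separated field specs
    first_only: bool        # True if the line ends with "first"
    options: List[str]      # other comma-separated items, e.g. a translation map file


def parse_mapping(line: str) -> Mapping:
    name, _, rhs = line.partition("=")
    parts = [p.strip() for p in rhs.split(",")]
    field_part, extras = parts[0], parts[1:]
    specs = []
    for chunk in field_part.split(":"):
        m = SPEC_RE.match(chunk.strip())
        if m is None:
            raise ValueError(f"unsupported field spec: {chunk!r}")
        tag, subfields, start, end = m.groups()
        byte_range = None
        if start is not None:
            # "[0-1]" means positions 0 through 1; "[7]" (assumed) means position 7 only.
            byte_range = (int(start), int(end if end is not None else start))
        specs.append(FieldSpec(tag, subfields, byte_range))
    first_only = any(e.lower() == "first" for e in extras)
    options = [e for e in extras if e.lower() != "first"]
    return Mapping(name.strip(), specs, first_only, options)


print(parse_mapping("instrument_facet = 048a[0-1], instrument_map.properties"))
```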
In our example, the publishDate index is a custom index not pulled directly from a MARC tag. The DateOfPublication custom indexing routine provides the date to the indexer, but we do not need to know how it does that to proceed. We have confirmed that the index is called publishDate, which is all we need for now.
The file facets.ini contains the list of facets to be displayed. Each line defines one facet, in the form:
SolrIndexName = Facet Display Name
In a clean installation, the list begins with two facets, Institution and Library, which may not be needed; they can be commented out by inserting a semicolon at the beginning of the line. To add publication date as a facet, simply add the following line to the file:
publishDate = Publication Year
Note that facets are displayed in the order in which they are listed; if publishDate is added at the bottom of the list, it will appear at the bottom of the Narrow Search box, below the rest.
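The ordering rule is easy to check with a short script. This is a hypothetical Python helper, not part of VuFind: it reads lines in the SolrIndexName = Facet Display Name form, skips semicolon comments (and, as an assumption, any [Section] headers), and keeps the file order. The format entry in the sample is made up for illustration.

```python
def read_facets(ini_text):
    """Return (solr_field, display_name) pairs in the order they appear."""
    facets = []
    for raw in ini_text.splitlines():
        line = raw.strip()
        # Skip blank lines, ';' comments, and any [Section] headers.
        if not line or line.startswith((";", "[")):
            continue
        field, sep, label = line.partition("=")
        if not sep:
            continue
        # Strip surrounding double quotes in case the display name is quoted.
        facets.append((field.strip(), label.strip().strip('"')))
    return facets


sample = """
; institution = Institution
format = Format
publishDate = Publication Year
"""

for field, label in read_facets(sample):
    print(f"{label} ({field})")
```

The commented-out entry is ignored, and publishDate comes last because it is the last active line.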
That's it. Changes are immediately reflected on the search results page.
If a non-existent index is added to the list (because of a typo, for example), the Narrow Search box will break completely and no facets will be displayed.
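Because a single bad entry hides every facet, it can be worth checking facet names against schema.xml before reloading. The sketch below is a hypothetical Python helper: it only looks at explicit field elements, so a facet relying on a dynamic field would be reported as unknown, and the schema excerpt is a made-up sample.

```python
import xml.etree.ElementTree as ET


def schema_field_names(schema_xml):
    """Collect the name attribute of every <field> element in a schema document."""
    root = ET.fromstring(schema_xml)
    return {el.get("name") for el in root.iter("field")}


def unknown_facets(facet_fields, known_fields):
    """Return facet fields (in order) that the schema does not define."""
    return [name for name in facet_fields if name not in known_fields]


schema = """<schema>
  <fields>
    <field name="format" type="string" indexed="true" stored="true"/>
    <field name="publishDate" type="string" indexed="true" stored="true"/>
  </fields>
</schema>"""

known = schema_field_names(schema)
print(unknown_facets(["format", "pubishDate"], known))  # ['pubishDate']
```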
Index does not exist, no translation needed
Example: a customized genre facet based on the genre heading fields and subfields
If the desired facet does not correspond to an existing index, we must first create the index. This requires re-indexing the entire database.
The file /import/marc.properties maps MARC fields and subfields to an index. For a more detailed description of the file format than the one provided above, see the SolrMarc documentation.
In this example, we want to use the strings found in 655 subfields a and b, and the subfield v data from the 650, 651, and 600 fields. The index will be called “allgenre”. To do this, we add the following line to marc.properties:
allgenre = 655ab:650v:600v:651v
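The following Python sketch simulates what the allgenre line might collect from a toy record. It is not SolrMarc code: the record structure is invented for illustration, and its behaviour rests on assumptions about SolrMarc that should be checked against its documentation. Several subfield codes on one tag (655ab) are joined with a space into one value, a single code (650v) yields each occurrence as its own value, and duplicate values are dropped.

```python
def extract(record, spec):
    """Apply a colon-separated spec such as '655ab:650v' to a toy record.

    record is a list of (tag, [(subfield_code, value), ...]) pairs.
    """
    values = []
    for chunk in spec.split(":"):
        tag, codes = chunk[:3], chunk[3:]
        for field_tag, subfields in record:
            if field_tag != tag:
                continue
            matched = [value for code, value in subfields if code in codes]
            if len(codes) > 1:
                # Assumption: several codes are joined into one value per field.
                if matched:
                    values.append(" ".join(matched))
            else:
                # Assumption: a single code yields each occurrence separately.
                values.extend(matched)
    # Assumption: duplicates are dropped, keeping first-seen order.
    return list(dict.fromkeys(values))


record = [
    ("655", [("a", "Science fiction")]),
    ("650", [("a", "Space flight"), ("v", "Fiction")]),
    ("651", [("a", "Mars (Planet)"), ("v", "Juvenile fiction")]),
    ("600", [("a", "Verne, Jules"), ("v", "Biography")]),
]
print(extract(record, "655ab:650v:600v:651v"))
```

Only the 655 text and the $v subdivisions reach the index; topical headings such as “Space flight” do not.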
We now need to tell Solr what to do with the new index. The file /solr/biblio/conf/schema.xml defines Solr's fields. In the <fields> section, we add:
<field name="allgenre" type="textFacet" indexed="true" stored="true" multiValued="true" termVectors="true"/>
After the database has been re-indexed, the allgenre index will exist. The facet can be added to the facets.ini file as described above.
A Useful Shortcut -- Dynamic Fields
If you are using VuFind 1.3 or newer, you can take advantage of dynamic Solr fields to avoid modifying your schema. VuFind is configured to recognize certain field suffixes and treat them as new fields without requiring explicit definitions. In the example above, you could use the field name allgenre_txtF_mv instead of allgenre in marc.properties and skip the schema.xml step.
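Solr matches dynamic field names against patterns with a leading or trailing *, preferring the longest matching pattern. The sketch below mimics that lookup in Python. The DYNAMIC_FIELDS table is hypothetical: only the _txtF_mv suffix comes from this page, and the attributes attached to it (a multi-valued textFacet field) are an assumption.

```python
import fnmatch

# Hypothetical excerpt of dynamic-field declarations. Only the *_txtF_mv suffix
# comes from the text; the attributes shown for it are an assumption.
DYNAMIC_FIELDS = {
    "*_txtF_mv": {"type": "textFacet", "multiValued": True},
}


def resolve_dynamic(field_name, patterns=DYNAMIC_FIELDS):
    """Return (pattern, attributes) for the dynamic field that would handle
    field_name, or None if no pattern matches."""
    matches = [p for p in patterns if fnmatch.fnmatchcase(field_name, p)]
    if not matches:
        return None
    # When several patterns match, Solr prefers the longest one
    # (max() keeps the first in declaration order on a tie).
    best = max(matches, key=len)
    return best, patterns[best]


print(resolve_dynamic("allgenre_txtF_mv"))
print(resolve_dynamic("allgenre"))  # None: needs an explicit <field> in schema.xml
```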
See this page for all of the available suffixes.
Index does not exist, translation needed
Example: Instrument types for music
For encoded data (such as data found in the 007, 008, or several 04X fields), we must first map the data to text strings. Luckily, the MARC format is well documented, and lists of what each code means are readily available in the MARC Code Lists and on OCLC's Formats and Standards page.
Create a text file in import/solrmarc with a .properties extension (for example, “something.properties”). In this case, I have created the file import/solrmarc/instrument_map.properties to hold the mapping. It translates the two-letter codes used in the MARC 048 field into readable text. Each line of the file contains a single code and its translation. Example:
ka = Piano
kb = Organ
kc = Harpsichord
kd = Clavichord
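A translation map is a plain properties file, so reading it back is straightforward. This Python sketch (the name load_translation_map is hypothetical) applies a simplified subset of the .properties rules: one code = label pair per line, with # or ! comments. Escapes, line continuations and : separators are not handled.

```python
def load_translation_map(text):
    """Parse 'code = label' lines into a dict (simplified .properties rules)."""
    mapping = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Java .properties files treat lines starting with '#' or '!' as comments.
        if not line or line[0] in "#!":
            continue
        code, sep, label = line.partition("=")
        if sep:
            mapping[code.strip()] = label.strip()
    return mapping


instrument_map = load_translation_map("""
# keyboard instruments
ka = Piano
kb = Organ
kc = Harpsichord
kd = Clavichord
""")
print(instrument_map["kc"])  # Harpsichord
```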
In import/marc.properties (or the local MARC mappings file, if you want to separate your local changes from the defaults provided by VuFind), we add the following line:
instrument_facet = 048a[0-1], instrument_map.properties
Note: the numbers in brackets tell the indexer to look only at the first two bytes of 048 subfield a (0-1 means position 0 through position 1, inclusive). A comma separates the field information from the name of the file used to translate the data, in this case instrument_map.properties.
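The byte range and the translation step can be traced with a small Python function. 048 subfield a values consist of a two-character performer or ensemble code followed by a two-digit number of parts (for example ka01), which is why only positions 0 to 1 are kept. The function name is hypothetical, and dropping unmapped codes and keeping each label only once are assumptions rather than documented SolrMarc behaviour.

```python
def instrument_facet(subfield_a_values, translation):
    """Rough emulation of 'instrument_facet = 048a[0-1], instrument_map.properties'."""
    labels = []
    for value in subfield_a_values:
        code = value[0:2]  # bytes 0-1 (inclusive) are the Python slice [0:2]
        label = translation.get(code)
        # Assumption: codes missing from the map are dropped; each label is kept once.
        if label is not None and label not in labels:
            labels.append(label)
    return labels


translation = {"ka": "Piano", "kb": "Organ", "kc": "Harpsichord", "kd": "Clavichord"}
print(instrument_facet(["ka01", "kb02", "ka02", "zz01"], translation))
# ['Piano', 'Organ']
```

Here "zz" is a made-up code that is absent from the map.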
As in the previous example, a line defining the new index must be added to Solr's schema.xml file (found in solr/biblio/conf under your VuFind installation), and a line for the new facet must be added to facets.ini (see above for instructions).
String vs. text fields
If you set up a facet field and see individual words instead of complete facet values, you have most likely faceted on an analyzed field (usually of type “text” in VuFind's Solr schema). Solr faceting displays the terms stored in the index, not the original raw text provided at index time. Thus, if you facet on an analyzed field that tokenizes and manipulates strings, strange facet values may appear to the end user. Most of the time, you will want to facet only on simple string fields to avoid this problem. This is why the default schema includes some apparently duplicate fields: it is generally necessary to use different fields for search-oriented and facet-oriented tasks.
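A quick way to see the problem is to count facet values both ways. In the Python sketch below, analyzed_terms is a crude stand-in for a Solr text analyzer (it only lowercases and splits on whitespace; real analyzers do considerably more), and the genre values are made-up sample data with one value per record.

```python
from collections import Counter


def analyzed_terms(value):
    # Crude stand-in for a Solr text analyzer: lowercase and split on whitespace.
    return value.lower().split()


def facet_counts(values, analyze=None):
    """Count how many records carry each facet term (one value per record)."""
    counts = Counter()
    for value in values:
        terms = analyze(value) if analyze else [value]
        counts.update(set(terms))  # a record counts once per term
    return counts


genres = ["Science fiction", "Juvenile fiction", "Science fiction"]
print(facet_counts(genres))                  # whole strings, as a string field facets
print(facet_counts(genres, analyzed_terms))  # single words, as an analyzed field facets
```

The second count shows "fiction" as the most common facet value, which is exactly the kind of fragment that confuses end users.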