Using Solr Shards
IMPORTANT: This page refers to features that were added in VuFind 1.1. If you are using an earlier version, you will have to upgrade.
Solr is capable of combining results from multiple indexes on different servers.
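As a minimal sketch of what this looks like at the Solr level: a distributed query is an ordinary select request that carries a shards parameter listing the cores to combine. The host names, core names and query below are hypothetical placeholders, and VuFind builds the equivalent request for you from its own configuration.

```python
from urllib.parse import urlencode


def build_sharded_query_url(base_url, shards, query, rows=10):
    """Build a Solr select URL asking one core to combine results from several shards."""
    params = {
        "q": query,
        "rows": rows,
        # Solr expects a comma-separated list of host:port/path entries (no scheme)
        "shards": ",".join(shards),
    }
    return f"{base_url}/select?{urlencode(params)}"


# Hypothetical shard locations, for illustration only
shards = [
    "solr1.example.org:8080/solr/biblio",
    "solr2.example.org:8080/solr/biblio",
]
url = build_sharded_query_url(
    "http://solr1.example.org:8080/solr/biblio", shards, "title:history"
)
print(url)
```

The core that receives the request forwards the query to every listed shard and merges the responses, which is why the limitations described below apply to the combined result.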
Solr sharding may be useful under several circumstances:
- You have such a large index that you need more than one server to handle it (the best reason to use shards)
- You have multiple VuFind instances for different specialized purposes, and you want to create an additional “meta-instance” to search all of them at once (not recommended due to feature degradation and relevance ranking problems, but possible)
Before you use shards in VuFind, you should be aware of some problems.
Some features of Solr will not work:
- More like this
For a complete and current list of the operations and handlers supported with sharding, consult http://wiki.apache.org/solr/DistributedSearch#Distributed_Searching_Limitations
VuFind does not currently use Solr's elevation feature by default, but it does use More Like This. The “More like this” results will therefore be lost when you use sharding.
Side effects can occur when cores have different schemas (even slight differences are enough). If one index has a field that another sharded index lacks, and that field is used in searchspecs.yaml or as a facet (in facets.ini), any query will fail and return no results. One solution is to use the StripFields option in searches.ini (or to remove the facet from facets.ini). With StripFields, you can remove fields defined in searchspecs.yaml from your query whenever a certain shard is being used. Be warned: the results will differ from those of a query against a single shard, because the stripped field is not used in the query. One further complication: this only applies to extended searches (i.e. when truncation or special search operations prevent VuFind from using Dismax). Dismax currently does not care about missing fields and differing schemas.
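The effect of stripping fields can be illustrated with a small sketch. This is not VuFind's implementation; the shard names, field names, boosts and data layout are all hypothetical. It only shows the idea: when a shard that lacks certain fields is included in a search, those fields are dropped from the search specification before the query is built, which is also why the results differ from a single-shard query.

```python
def strip_fields(search_fields, strip_rules, active_shards):
    """Return the search fields minus any field that an active shard lacks.

    search_fields: dict of field name -> boost (hypothetical search spec)
    strip_rules:   dict of shard name -> list of fields missing from that shard
    active_shards: names of the shards included in the current search
    """
    to_strip = set()
    for shard in active_shards:
        to_strip.update(strip_rules.get(shard, []))
    return {name: boost for name, boost in search_fields.items() if name not in to_strip}


# Hypothetical configuration: the "Website" shard has no language or topic field
search_fields = {"title": 500, "topic": 300, "language": 50}
strip_rules = {"Website": ["language", "topic"]}

catalog_only = strip_fields(search_fields, strip_rules, ["Library Catalog"])
with_website = strip_fields(search_fields, strip_rules, ["Library Catalog", "Website"])
print(catalog_only)
print(with_website)
```

In this sketch, a search that includes the "Website" shard matches on title only, so records that would have matched through topic or language are no longer found.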
To avoid this kind of trouble, consider using sharding in a different way: split shard results into separate tabs, with each tab containing results from only one shard. Alternatively, give all indexes involved in sharding an identical structure. One useful strategy for allowing flexibility without creating incompatible schemas is to use dynamic field definitions for custom fields. As of version 1.3, VuFind includes several dynamic field types by default (see VUFIND-480).
Be especially careful that any fields used for sorting are present in all shards. Attempting to sort using an unsupported field will cause problems.
- The indexes being combined must have identical (or at least very similar) schemas.
- The indexes being combined MUST NOT have overlapping record IDs.
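Both requirements can be checked before indexes are combined. The sketch below assumes you have already exported each shard's field names and record IDs by some means of your own; the shard names and sample data are hypothetical.

```python
from itertools import combinations


def fields_missing_per_shard(shard_fields):
    """For each shard, list fields that exist in some other shard but not in this one."""
    all_fields = set().union(*shard_fields.values())
    return {
        name: sorted(all_fields - set(fields))
        for name, fields in shard_fields.items()
    }


def overlapping_ids(shard_ids):
    """Return record IDs that appear in more than one shard, keyed by shard pair."""
    overlaps = {}
    for (name_a, ids_a), (name_b, ids_b) in combinations(shard_ids.items(), 2):
        shared = set(ids_a) & set(ids_b)
        if shared:
            overlaps[(name_a, name_b)] = sorted(shared)
    return overlaps


# Hypothetical exports from two shards
shard_fields = {
    "catalog": ["id", "title", "language", "title_sort"],
    "website": ["id", "title"],
}
shard_ids = {
    "catalog": ["1", "2", "3"],
    "website": ["web-1", "3"],
}

missing = fields_missing_per_shard(shard_fields)
overlaps = overlapping_ids(shard_ids)
print(missing)
print(overlaps)
```

Any field reported as missing that is also used for searching, faceting or sorting is a candidate for the problems described above, and any reported ID overlap needs to be resolved before the shards are searched together.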
Beginning with VuFind 2.0, all shard-related configurations can be found in searches.ini. Comments within the configuration file explain how they work.