Using Solr Shards
Solr is capable of combining results from multiple indexes on different servers.
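Solr does this through its distributed search feature. A request sent to one core carries a `shards` parameter that lists every index to consult, and the receiving core merges the results. Here is a minimal Python sketch that builds such a request URL. The host names, core names and the `distributed_select_url` helper are placeholders for illustration, not values or functions taken from VuFind®.

```python
from urllib.parse import urlencode


def distributed_select_url(base_url: str, shards: list[str], query: str, rows: int = 20) -> str:
    """Build a Solr /select URL that fans one query out to several shards.

    base_url is the core that receives the request and merges the results;
    shards lists every index to search, in Solr's host:port/path form
    (without the http:// prefix).
    """
    params = {
        "q": query,
        "rows": rows,
        "wt": "json",
        # The receiving core only aggregates: its own documents are searched
        # only if it also appears in this comma-separated list.
        "shards": ",".join(shards),
    }
    return f"{base_url.rstrip('/')}/select?{urlencode(params)}"


if __name__ == "__main__":
    # Placeholder hosts and core names.
    print(distributed_select_url(
        "http://solr1.example.org:8983/solr/biblio",
        ["solr1.example.org:8983/solr/biblio", "solr2.example.org:8983/solr/biblio"],
        "title:history",
    ))
```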
Use Cases
Solr sharding may be useful under several circumstances:
- You have such a large index that you need more than one server to handle it (the best reason to use shards, though SolrCloud is a more modern solution to this problem; see Fault Tolerance and Load Balancing)
- You have multiple VuFind® instances for different specialized purposes, and you want to create an additional “meta-instance” to search all of them at once (not recommended due to feature degradation and relevance ranking problems, but possible)
Pitfalls
If you are considering using shards in VuFind®, you should keep the following problems in mind.
Feature degradation
Some Solr features do not work with sharding:
- Elevation
- More like this
- Joins
For a complete and current list of the operations and handlers supported with sharding, consult http://wiki.apache.org/solr/DistributedSearch#Distributed_Searching_Limitations
VuFind® does not currently use Elevation by default, but it does use More Like This, so the “More like this” results will be lost when you use sharding.
StripFields
Cores with different schemas can cause side effects, and even slight differences are enough. If one index has a field that another sharded index lacks, AND that field is used in searchspecs.yaml or as a facet (in facets.ini), any query will fail and return no results. One solution is to use the StripFields option in searches.ini (or to remove the facet from facets.ini). StripFields lets you remove fields defined in searchspecs.yaml from your query whenever a particular shard is being used. Be warned, however: the results will differ from those of a query against a single shard, because the stripped field is not used in the query. A further complication: all of this applies only to extended searches (i.e. when truncation or special search operators prevent VuFind® from using Dismax). Dismax currently does not care about missing fields or differing schemas.
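The effect of StripFields can be illustrated with a short Python sketch. This is not VuFind®'s own (PHP) implementation. The `strip_fields` and `build_query` helpers, the shard names, the field names and the boosts are all hypothetical, and the real option is configured in searches.ini rather than in code. The sketch shows why results change: a field is dropped from the query whenever any active shard is configured to strip it.

```python
def strip_fields(query_fields: dict[str, float], active_shards: list[str],
                 strip_map: dict[str, list[str]]) -> dict[str, float]:
    """Return the query fields (field -> boost) minus every field that some
    active shard is configured to strip (hypothetical stand-in for StripFields)."""
    stripped: set[str] = set()
    for shard in active_shards:
        stripped.update(strip_map.get(shard, []))
    return {field: boost for field, boost in query_fields.items() if field not in stripped}


def build_query(terms: str, query_fields: dict[str, float]) -> str:
    """Hypothetical rendering: OR together one boosted clause per field."""
    return " OR ".join(f"{field}:({terms})^{boost:g}" for field, boost in query_fields.items())


if __name__ == "__main__":
    fields = {"title": 500, "callnumber": 100, "allfields": 1}
    # Assume the "Website" shard has no callnumber field.
    strip_map = {"Website": ["callnumber"]}
    print(build_query("smith", strip_fields(fields, ["Library Catalog"], strip_map)))
    print(build_query("smith", strip_fields(fields, ["Library Catalog", "Website"], strip_map)))
```

With only the catalog selected, the call number clause is kept. As soon as the website shard is added, it disappears from the query for every shard, including the catalog.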
To avoid this kind of trouble, you could consider using sharding differently and splitting shard results into separate tabs, with each tab showing results from only one shard. Alternatively, you could give all indexes involved in sharding exactly the same structure. One useful strategy for allowing flexibility without creating incompatible schemas is to use dynamic field definitions for custom fields.
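Dynamic fields help because Solr accepts any field name matching a pattern with a leading or trailing `*` (such as `*_txt`). Two shards that declare the same patterns therefore both accept custom fields that only one of them actually populates. The Python sketch below is a simplified model of that matching. It ignores Solr's precedence rules between overlapping patterns. The schemas, field names and helper functions are illustrative assumptions, not VuFind®'s actual schema.

```python
def field_is_defined(name: str, static_fields: set[str], dynamic_patterns: list[str]) -> bool:
    """Return True if a schema with these static fields and dynamic patterns
    (each with a single leading or trailing '*') would accept the field name."""
    if name in static_fields:
        return True
    for pattern in dynamic_patterns:
        if pattern.startswith("*") and name.endswith(pattern[1:]):
            return True
        if pattern.endswith("*") and name.startswith(pattern[:-1]):
            return True
    return False


def shards_lacking(name: str, schemas: dict[str, tuple[set[str], list[str]]]) -> list[str]:
    """List the shards whose schema would reject the given field name."""
    return [shard for shard, (static, dynamic) in schemas.items()
            if not field_is_defined(name, static, dynamic)]


# Illustrative schemas: only the catalog declares a static "local_note"
# field, but both declare the same "*_txt" dynamic pattern.
SCHEMAS = {
    "catalog": ({"id", "title", "local_note"}, ["*_txt"]),
    "website": ({"id", "title"}, ["*_txt"]),
}

if __name__ == "__main__":
    print(shards_lacking("local_note", SCHEMAS))
    print(shards_lacking("local_note_txt", SCHEMAS))
```

The static `local_note` field is unknown to the website shard. The dynamically matched `local_note_txt` is valid in both.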
Sorting
Be especially careful that any fields used for sorting are present in all shards. Attempting to sort on a field that is missing from one or more shards will cause problems.
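Before enabling a sort option, you can cross-check its fields against every shard. The Python sketch below does this on hard-coded field sets. In practice they might come from each core's Schema API (`GET /solr/<core>/schema/fields`). The field names, the shard names and the `missing_sort_fields` helper are illustrative assumptions. For brevity the sketch ignores dynamic fields and function-based sorts.

```python
def missing_sort_fields(sort_spec: str, shard_fields: dict[str, set[str]]) -> dict[str, list[str]]:
    """Map each shard to the sort fields it lacks; shards with none are omitted."""
    # A Solr sort spec is a comma-separated list of "field direction" clauses.
    sort_fields = [clause.split()[0] for clause in sort_spec.split(",") if clause.strip()]
    # "score" is a built-in pseudo-field, so it is never reported as missing.
    sort_fields = [f for f in sort_fields if f != "score"]
    problems: dict[str, list[str]] = {}
    for shard, fields in shard_fields.items():
        missing = [f for f in sort_fields if f not in fields]
        if missing:
            problems[shard] = missing
    return problems


if __name__ == "__main__":
    shards = {
        "catalog": {"id", "title_sort", "publishDateSort"},
        "website": {"id", "title_sort"},
    }
    print(missing_sort_fields("score desc, publishDateSort desc", shards))
```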
Requirements
- The indexes being combined must have identical (or at least very similar) schemas.
- The indexes being combined MUST NOT have overlapping record IDs.
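You can check the second requirement before enabling sharding by comparing the record IDs of the shards. The Python sketch below assumes you have already exported each shard's IDs, for example from the schema's unique key field. The shard names, IDs and the `overlapping_ids` helper are hypothetical, not part of VuFind®.

```python
from collections import defaultdict


def overlapping_ids(shard_ids: dict[str, list[str]]) -> dict[str, list[str]]:
    """Return every record ID found in more than one shard, mapped to those shards."""
    seen: dict[str, list[str]] = defaultdict(list)
    for shard, ids in shard_ids.items():
        # Deduplicate within a shard so only cross-shard collisions are reported.
        for record_id in set(ids):
            seen[record_id].append(shard)
    return {rid: shards for rid, shards in seen.items() if len(shards) > 1}


if __name__ == "__main__":
    print(overlapping_ids({
        "catalog": ["rec1", "rec2"],
        "website": ["rec2", "page7"],
    }))
```

An empty result means the shards can be combined without ID collisions.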
Configuration
All shard-related settings can be found in searches.ini, and comments within that file explain how they work.
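As an illustration, the following Python sketch reads shard definitions with the standard `configparser` module and joins them into a value for Solr's `shards` parameter. It assumes that shards are listed in an `[IndexShards]` section as `Label = host:port/solr/core` lines. Verify the section name and format against the comments in your own searches.ini before relying on it. The sample values and the `read_shards` helper are hypothetical.

```python
import configparser

# Hypothetical sample; the section name and line format are assumptions.
SAMPLE_INI = """
[IndexShards]
Library Catalog = localhost:8983/solr/biblio
Website = localhost:8983/solr/website
"""


def read_shards(ini_text: str, section: str = "IndexShards") -> dict[str, str]:
    """Return {label: shard address} from the given INI section, or {} if absent."""
    # Only '=' separates key and value, so the ':' in host:port stays in the value.
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    # Keep labels such as "Library Catalog" in their original case.
    parser.optionxform = str
    parser.read_string(ini_text)
    if not parser.has_section(section):
        return {}
    # Strip optional double quotes around values.
    return {label: value.strip('"') for label, value in parser.items(section)}


if __name__ == "__main__":
    shards = read_shards(SAMPLE_INI)
    print(",".join(shards.values()))
```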