This is an old revision of the document!

Fault Tolerance and Load Balancing

This page is by no means complete, but it gives some hints and guidelines for creating a fault tolerant and load balanced VuFind service. Configuration of a load balancer is out of the scope of this document. A hardware load-balancer could be used as well as a software-based solution such as HAProxy.

There are of course many ways to improve fault tolerance of a service, but this page describes one way to implement a fault-tolerant and load balanced VuFind. The basic idea is to replicate a single VuFind server into at least three separate servers. Three is an important number in that it allows clustered services to always have a successful vote and understanding of the leader etc. This is often called having a quorum. It aims to avoid the so-called split-brain situation where two groups of servers are having trouble communicating with each other but continue to serve users' requests. Having at least three servers also allows one to fail without the total capacity suffering as much as with only two servers.

While it's easy to replicate VuFind's PHP-based front-end on multiple servers, more effort is required to replicate also the most important underlying services, MySQL/MariaDB database and Solr index.

Required Software Components

All of the components below will have an instance running on each server node.

VuFind, of course
MariaDB Galera Cluster
- This replaces the standard MySQL or MariaDB installation and allows VuFind to store searches, sessions, user data etc. in a consistent way on any of the nodes.
- See https://mariadb.com/kb/en/mariadb/getting-started-with-mariadb-galera-cluster/ for instructions on how to get started with a database cluster.
SolrCloud
- This replaces the single Solr instance VuFind comes with.
- See https://cwiki.apache.org/confluence/display/solr/SolrCloud for information on setting up SolrCloud.
- There's also a stand-alone Solr 5.x installation maintained by the National Library of Finland with a VuFind-compatible schema. It has some custom fields but could be used as a base for the SolrCloud nodes. See https://github.com/NatLibFi/NDL-VuFind-Solr for more information.

Load Balancer and Apache Setup

To enable the front-end servers' Apaches to get the user's original IP address, configure the load balancer to set the X-Forwarded-For header and install the mod_rpaf extension to Apache. We use the following configuration:

LoadModule        rpaf_module modules/mod_rpaf.so
RPAF_Enable       On
RPAF_ProxyIPs     [proxy IP address]
RPAF_Header       X-Forwarded-For
RPAF_SetHostName  On
RPAF_SetHTTPS     On
RPAF_SetPort      On

VuFind Setup

Here it is assumed that the MariaDB Galera Cluster and SolrCloud are already up and running on each node. Basic setup of VuFind is described elsewhere, and it is recommended to test it out on a single node first to see that everything works properly. The servers are assumed to have IP addresses from 10.0.0.1 to 10.0.0.3.

Configure VuFind to connect to the MariaDB node on localhost.
Configure VuFind to connect to all the SolrCloud nodes by specifying their addresses as an array in config.ini like this so that it will automatically try other Solr nodes if the local one is unavailable:

[Index]
url[] = http://10.0.0.1:port/solr
url[] = http://10.0.0.2:port/solr
url[] = http://10.0.0.3:port/solr

Configure VuFind to use database-stored sessions in config.ini as this allows the sessions to be available on all nodes so that no sticky sessions are required.:

[Session]
type = Database

Set up the load balancer to use a health probe to check server status from VuFind so that it knows whether VuFind is available. From version 2.5 VuFind has an API that can be used to check the status at http://10.0.0.x/AJAX/SystemStatus. It returns a simple string “OK” with status code 200 if everything is working properly. Otherwise it returns a message describing the issue starting with “ERROR”.

Set up scheduled tasks like removal of expired searches to run on only one of the servers.

Set up AlphaBrowse index creation (if use use it) to run on ALL servers. AlphaBrowse index files are not automatically distributed to other nodes by Solr.

Special Considerations

Static File Timestamps

It is important to make sure all the static files served by VuFind have identical timestamps on all the servers. There are multiple ways to achieve this, such as:

Deploy the files using a .zip package so that timestamps are preserved.
Deploy from a git repository and use git-restore-mtime-core to set timestamps to the latest commit.
Use the touch command to set all timestamps to a manually defined moment.

In any case, make sure the timestamps are advanced if a file is changed.

Apache ETag

ETag is used between a web server and a browser in addition to the last modification time to check if a file has been changed. The server sends an ETag header along the file, and every time the browser requests the file from the server it includes the ETag in the request. If the tags match (and the file is not newer than If-Modified-Since header in the request), the file is returned from the server. Otherwise the server can optimize the transfer by returning just a 304 “Not Modified” status code without returning the whole file.

By default Apache uses the inode timestamp in addition to file's modification time when creating the ETag, which will cause trouble with caching when the file may be served from any of three servers. Therefore it's important to configure Apache to use only the file modification time and size with the following directive:

FileETag MTime Size

Now the ETag of a file will be identical on all servers as long as the modification time and file size are identical.

Additional Implementation Notes

If you use Shibboleth, configure it to use ODBC to store its data so that it's available on all the server nodes. If you're running RHEL/CentOS 6.x, check out https://issues.shibboleth.net/jira/browse/SSPCPP-473 first.
At the time of writing this it's not recommended to run Piwik in a load balancer environment like this. A [[http://piwik.org/faq/new-to-piwik/faq_134/|Piwik FAQ entry] suggests using a single database server. In any case statistics tracking could be considered less critical so it could reside on a single server.

VuFind Documentation

Table of Contents