Warning: This page has not been updated in over a year and may be outdated or deprecated.
Current revision: 2023/03/30 19:31 by cmurdoch (Update image location in sitemap tidy-up); previous revision: 2017/04/11 06:35 by emaijala ([Required Software Components]).
====== Fault Tolerance and Load Balancing ======
  
This page contains strategies, examples and notes on achieving high availability with VuFind®.
  
===== High-Availability Strategies =====
==== Load-Balancing ====
Load-balancing distributes requests across multiple servers, or VuFind® nodes. If configured correctly, a load-balanced configuration can provide a degree of high availability, allowing individual nodes to be taken down for maintenance (or to fail) while the remaining nodes keep serving requests.
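The idea of spreading requests across nodes while tolerating per-node downtime can be sketched in a few lines of Python. This is a hypothetical, simplified round-robin balancer: the class and method names are invented for illustration, and real load balancers such as HAProxy implement this internally with many more options. It simply skips nodes that have been marked down, for example for maintenance.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hypothetical round-robin balancer that skips nodes marked as down."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.healthy = set(self.nodes)   # nodes currently accepting traffic
        self._ring = cycle(self.nodes)   # endless iterator over the node list

    def mark_down(self, node):
        # e.g. the node failed a health check or is under maintenance
        self.healthy.discard(node)

    def mark_up(self, node):
        if node in self.nodes:
            self.healthy.add(node)

    def pick(self):
        # Try each node at most once per call, skipping unhealthy ones.
        for _ in range(len(self.nodes)):
            node = next(self._ring)
            if node in self.healthy:
                return node
        raise RuntimeError("no healthy nodes available")
```

For example, with a fresh balancer over the nodes 10.0.0.1 to 10.0.0.3 used in the example configuration and 10.0.0.2 marked down, successive calls to pick() alternate between 10.0.0.1 and 10.0.0.3, so users keep being served while that node is out.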
  
If possible, the load balancers themselves should also be redundant. Many commercial load balancers (F5, Barracuda, Kemp, etc.) and open-source solutions (Zen, HAProxy, etc.) document how to run them in a highly available configuration.
  
Many installations separate the VuFind® front-end (Apache, MySQL, etc.) from the Solr back-end to minimize the risk associated with downtime on a particular server and to keep service management simple.
  
Below are two example configurations:
  
{{:administration:vufind-ha.png|}}
=== Front-End Sessions ===
When configuring a load balancer, keep in mind how user sessions will be affected. If a user logs in, their session may exist on only one of the nodes that can serve requests, so they may have to log in again every time a request is routed to a different node (a bad thing).
  
There are two primary strategies for dealing with this issue:
Most load balancers can be configured so that a user always hits the same node during their session. This is often referred to as sticky sessions. The strategies for achieving this depend on the load balancer and range from using cookies to tracking client IP addresses. Consult the load balancer's documentation to learn which strategies are available.
  
The drawback of this strategy is that if a front-end node goes down, all the users on that node lose their sessions and must re-authenticate when served by another node. Additionally, depending on the strategy used, load may not be evenly distributed across the nodes.
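One way a load balancer can implement sticky sessions is to hash a client key (such as a session cookie value or the client IP address) to a node. The Python sketch below uses rendezvous (highest-random-weight) hashing, chosen here purely for illustration; the function names are hypothetical and your load balancer may use a different method. It also demonstrates the drawback described above: clients whose node goes down are moved to another node, where their node-local session does not exist, while all other clients stay where they were.

```python
import hashlib


def _score(client_key: str, node: str) -> int:
    # Deterministic pseudo-random weight for a (client, node) pair.
    digest = hashlib.sha256(f"{client_key}|{node}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def sticky_node(client_key: str, healthy_nodes: list[str]) -> str:
    """Rendezvous hashing: the same client maps to the same node for as long
    as that node stays in the healthy list."""
    if not healthy_nodes:
        raise RuntimeError("no healthy nodes available")
    # The node with the highest score for this client wins.
    return max(healthy_nodes, key=lambda node: _score(client_key, node))
```

Removing a node from the healthy list only remaps the clients that were assigned to it; those users must log in again unless sessions are shared between the nodes.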
  
== Session Distribution ==
There are two primary options for keeping multiple Solr nodes synchronized:
  * SolrCloud: [[https://cwiki.apache.org/confluence/display/solr/SolrCloud]] (more complex, but more reliable)
  * Traditional Replication: [[administration:solr_replication|Solr Replication]] (less complex, but less flexible)
  
===== Example Configuration =====
  
This example is by no means complete, but it gives some hints and guidelines for creating a fault-tolerant and load-balanced VuFind® service. Configuring the load balancer itself is outside the scope of this document; either a hardware load balancer or a software-based solution such as HAProxy could be used.
  
There are many ways to improve the fault tolerance of a service; this example describes one way to implement a fault-tolerant, load-balanced VuFind®. The basic idea is to replicate a single VuFind® server into at least three separate servers. Three matters because it allows clustered services to always reach a majority vote, for example when agreeing on a leader. This is called having a quorum. It aims to avoid the so-called split-brain situation, where two groups of servers cannot communicate with each other but both continue to serve users' requests. Having at least three servers also means that losing one of them reduces total capacity by a third rather than by half.
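The arithmetic behind the three-server recommendation can be checked with a small Python sketch. It assumes a simple majority vote in which every node has one equal vote (actual cluster software may weight votes differently) and nodes of identical capacity; the function names are illustrative only.

```python
def quorum_size(n: int) -> int:
    # Strict majority of n equally weighted voting nodes.
    return n // 2 + 1


def tolerated_failures(n: int) -> int:
    # Nodes that can fail while the survivors still form a majority.
    return n - quorum_size(n)


def remaining_capacity(n: int, failed: int = 1) -> float:
    # Fraction of total capacity left, assuming identical nodes.
    return (n - failed) / n


for n in (2, 3, 4, 5):
    print(f"{n} nodes: quorum={quorum_size(n)}, "
          f"tolerates {tolerated_failures(n)} failure(s), "
          f"capacity after one failure={remaining_capacity(n):.0%}")
# 2 nodes: quorum=2, tolerates 0 failure(s), capacity after one failure=50%
# 3 nodes: quorum=2, tolerates 1 failure(s), capacity after one failure=67%
# 4 nodes: quorum=3, tolerates 1 failure(s), capacity after one failure=75%
# 5 nodes: quorum=3, tolerates 2 failure(s), capacity after one failure=80%
```

Two nodes cannot lose a member and still hold a majority, so a two-node cluster must either stop serving or risk split-brain when the nodes lose contact; three nodes tolerate one failure.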
  
While it's easy to replicate VuFind®'s PHP-based front-end on multiple servers, more effort is required to also replicate the most important underlying services: the MySQL/MariaDB database and the Solr index.
  
==== Required Software Components ====
All of the components below will have an instance running on each server node.
  
  * VuFind®, of course
  * MariaDB Galera Cluster
    * This replaces the standard MySQL or MariaDB installation and allows VuFind® to store searches, sessions, user data, etc. consistently on any of the nodes.
    * See https://mariadb.com/kb/en/mariadb/getting-started-with-mariadb-galera-cluster/ for instructions on how to get started with a database cluster.
  * SolrCloud
    * This replaces the single Solr instance that ships with VuFind®.
    * See https://cwiki.apache.org/confluence/display/solr/SolrCloud for information on setting up SolrCloud.
    * The National Library of Finland also maintains a stand-alone Solr 5.x installation with a VuFind®-compatible schema. It has some custom fields but could be used as a base for the SolrCloud nodes. See https://github.com/NatLibFi/NDL-VuFind-Solr for more information.
    
=== Load Balancer and Apache Setup ===
  RPAF_SetPort      On
    
=== VuFind® Setup ===
  
Here it is assumed that the MariaDB Galera Cluster and SolrCloud are already up and running on each node. Basic VuFind® setup is described elsewhere; it is recommended to test it on a single node first to make sure everything works properly. The servers are assumed to have IP addresses 10.0.0.1 through 10.0.0.3.
  
  * Configure VuFind® to connect to the MariaDB node on localhost.
  * Configure VuFind® to connect to all the SolrCloud nodes by listing their addresses as an array in config.ini, so that it automatically tries the other Solr nodes if the local one is unavailable:
  
  [Index]
  url[] = http://10.0.0.3:port/solr
  
  * Configure VuFind® to use database-stored sessions in config.ini. This makes sessions available on all nodes, so no sticky sessions are required:
  
  [Session]
  type = Database
  
  * Set up the load balancer to use a health probe that checks VuFind®'s status, so that it knows whether VuFind® is available on each node. Since version 2.5, VuFind® provides an API for this at http://10.0.0.x/AJAX/SystemStatus. It returns the plain string "OK" with status code 200 if everything is working properly; otherwise it returns a message describing the issue, starting with "ERROR".
  
  * Set up scheduled tasks, such as removal of expired searches, to run on only one of the servers. Because the database is shared by all nodes, running them on one node is sufficient.
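A health check along the lines of the SystemStatus item above can be sketched in Python. is_healthy() encodes the rule described there (status code 200 with the body "OK"), and probe() is a hypothetical helper that fetches the status URL from one node using only the standard library. Treating every other response, connection error or timeout as a failure is an assumption of this sketch, and if VuFind® is installed under a path prefix the URL needs to be adjusted.

```python
import urllib.request


def is_healthy(status_code: int, body: str) -> bool:
    """HTTP 200 with the body "OK" means healthy; anything else (such as a
    message starting with "ERROR") is treated as unhealthy."""
    return status_code == 200 and body.strip() == "OK"


def probe(node: str, timeout: float = 5.0) -> bool:
    # Hypothetical helper: query http://<node>/AJAX/SystemStatus.
    url = f"http://{node}/AJAX/SystemStatus"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
            return is_healthy(resp.status, body)
    except OSError:
        # HTTPError (non-2xx status), URLError (connection refused, DNS
        # failure) and timeouts are all subclasses of OSError.
        return False
```

For example, probe("10.0.0.1") returns True only if that node answers "OK". In practice the load balancer performs this check itself (for example with HAProxy's HTTP health checks), so a script like this is mainly useful for monitoring or for testing the endpoint by hand.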
== Static File Timestamps ==
  
It is important to make sure that all the static files served by VuFind® have identical timestamps on all the servers. There are multiple ways to achieve this, such as:
  * Deploy the files using a .zip package so that timestamps are preserved.
  * Deploy from a git repository and use [[https://github.com/MestreLion/git-tools/blob/master/git-restore-mtime-core|git-restore-mtime-core]] to set each file's timestamp to the date of the latest commit that modified it.
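To verify that a deployment really produced identical timestamps, one option is to build a manifest of file modification times on each server and compare the results. The Python sketch below is hypothetical: the function names are invented, which directories to check (for example the theme directories) is up to you, and it compares modification times at whole-second resolution.

```python
from pathlib import Path


def mtime_manifest(root: str) -> dict[str, int]:
    """Map each file's path (relative to root) to its mtime in whole seconds."""
    base = Path(root)
    return {
        str(path.relative_to(base)): int(path.stat().st_mtime)
        for path in sorted(base.rglob("*"))
        if path.is_file()
    }


def mtime_differences(a: dict[str, int], b: dict[str, int]) -> list[str]:
    """Paths that are missing from either manifest or whose mtimes differ."""
    return sorted(p for p in a.keys() | b.keys() if a.get(p) != b.get(p))
```

For example, run mtime_manifest() against the same directory on 10.0.0.1 and 10.0.0.2, copy one result to the other server (e.g. serialized as JSON), and pass both to mtime_differences(); an empty list means the timestamps match.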
== Asset Pipeline and other shared files ==
  
Load balancing and VuFind®'s asset pipeline are currently not compatible. The problem is that the asset files are generated by the server that handles the page request, but the client's subsequent request for those files may be served by a different server, where they do not exist. Without software modifications, there are at least the following options:
  * Use a shared disk for all the load-balanced servers. This might have performance and reliability implications.
  * Use sticky sessions in the load balancer. This has its own downsides: future requests from a client keep going to the same server as before, which could cause an imbalance between the servers, especially when new ones are added.
  * At the time of writing, it is not recommended to run Piwik in a load-balanced environment like this. A [[http://piwik.org/faq/new-to-piwik/faq_134/|Piwik FAQ entry]] suggests using a single database server. In any case, statistics tracking could be considered less critical, so it could reside on a single server.
  
Last modified: 2017/04/11 06:35 by emaijala