Performance

This page contains notes on improving the performance of VuFind in various ways. See the table of contents for a quick overview of topics.

Java Tuning

After the initial setup and import of records, you may notice a significant degradation in performance. This means it's time to do some JVM tuning. If you look around the Internet, you'll quickly realize that performance tuning for Java applications is a bit of a black art. Nevertheless, this section provides some basic settings to help improve your performance, as well as resources for more in-depth tuning.

Note that these recommendations assume a “server-class” machine with at least 2 GB of RAM and 2 processors. Limits imposed by your architecture may require you to adjust the sizes given in the settings.

Setting Options

This section refers to VuFind 3.x and later. For instructions on earlier releases, see the change history of the page and read an earlier version.

It's recommended that you run the most recent OpenJDK.

Heap Space

The heap space in Java is the amount of memory allocated to Java for objects. One of the more common issues with a full index of records is a java.lang.OutOfMemoryError exception caused by insufficient heap space. By default, VuFind sets the heap size to 1 GB. You can modify this using the SOLR_HEAP environment variable; for example:

SOLR_HEAP=3800M $VUFIND_HOME/solr.sh start

Note: For very large indices (> 4 million documents) you will need more than 4 GB of RAM, and a 64-bit OS to utilize it.

Please note that simply increasing the heap size to a very large value is not recommended, since it may lead to unnecessarily long garbage collection pauses. With very large heaps there's also a caveat: at 32 GB Java switches to a larger pointer size, which means that pointers use up more memory than they do at smaller heap sizes. It is therefore suggested to either keep the heap below 32 GB or increase it well past that point (see this page for more information).
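As a rough illustration of the guidance above, the following shell sketch caps a requested heap size so that it stays under the 32 GB threshold. The function name cap_heap_mb and the 31 GB ceiling are assumptions chosen for this example; they are not VuFind or Solr defaults.

```shell
# cap_heap_mb REQUESTED_MB
# Prints a heap size suitable for SOLR_HEAP (e.g. "3800M").
# Hypothetical helper: the 31 GB ceiling is an illustrative choice that
# keeps the heap below the 32 GB pointer-size threshold.
cap_heap_mb() {
  requested_mb=$1
  limit_mb=$((31 * 1024))
  if [ "$requested_mb" -gt "$limit_mb" ]; then
    echo "${limit_mb}M"
  else
    echo "${requested_mb}M"
  fi
}
```

The result could then be passed to the startup script in the same way as the example above:

SOLR_HEAP=$(cap_heap_mb 40000) $VUFIND_HOME/solr.sh start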

Garbage Collection

Since version 5 (included in VuFind 3), Solr has done some garbage collection auto-tuning by default; the suggestions below are probably not necessary for these releases in most situations, and you should probably not tinker with settings unless you have a known problem you are trying to solve.

Garbage collection settings control how aggressively the JVM clears out unneeded objects. You can read the Java manual's Tuning Garbage Collection section for more in-depth information.

Collecting Data

If you want to learn more about your garbage collection performance, you can add the -Xlog:gc:[filename] parameter to your SOLR_ADDITIONAL_JVM_OPTIONS variable to create a log file containing data on Java's behavior. For example, if you want to store logs in your existing Jetty log folder and use the current date as part of the filename, you could start Solr like this:

SOLR_ADDITIONAL_JVM_OPTIONS="-Xlog:gc:$VUFIND_HOME/solr/vufind/logs/gc-`/bin/date +%F-%H-%M`.log" ./solr.sh start

Once you have a log file, you can use the gcviewer tool to get a visual representation of its contents. This is very helpful in measuring whether your configuration changes have had a positive impact!

Note that as of VuFind 5.0 (and possibly earlier), the default Solr configuration in VuFind creates a garbage collection log in $VUFIND_HOME/solr/vufind/logs/solr_gc.log, so it may be unnecessary to add extra parameters. However, the example above logs in the specific format expected by GCViewer and shows a way you can control the log filenames, so it is retained as potentially useful for reference.

Note that in older versions of Java (8 and earlier), -Xlog:gc was called -Xloggc.
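If the same startup script has to work with both older and newer Java releases, the choice between the two spellings can be made in the shell. The sketch below is only an illustration: the function name gc_log_option is hypothetical, and it assumes you already know your Java major version (it does not try to detect it).

```shell
# gc_log_option JAVA_MAJOR LOG_FILE
# Prints the GC logging option matching the given Java major version:
# -Xlog:gc:FILE for Java 9 and later, -Xloggc:FILE for Java 8 and earlier.
# Hypothetical helper for illustration; the version is supplied by the caller.
gc_log_option() {
  java_major=$1
  log_file=$2
  if [ "$java_major" -ge 9 ]; then
    echo "-Xlog:gc:$log_file"
  else
    echo "-Xloggc:$log_file"
  fi
}
```

For example, on Java 8:

SOLR_ADDITIONAL_JVM_OPTIONS="$(gc_log_option 8 $VUFIND_HOME/solr/vufind/logs/gc.log)" ./solr.sh start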

Setting SOLR_ADDITIONAL_JVM_OPTIONS

As noted above, if you need to send extra options to your Java startup command line, you can use the SOLR_ADDITIONAL_JVM_OPTIONS environment variable. You can export this to your environment, override it at the beginning of the command line you use to start the solr.sh script, or you can hard-code changes into the solr.sh script, depending on your preferences.
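When the variable is exported to the environment, it is easy to overwrite options that were set earlier. The following sketch shows one way to append options instead; the function name add_jvm_option is hypothetical, and the options used in the usage lines are only examples of standard JVM flags, not VuFind recommendations.

```shell
# add_jvm_option OPTION
# Appends OPTION to SOLR_ADDITIONAL_JVM_OPTIONS (space-separated) and
# exports the variable, preserving anything that was already set.
# Hypothetical helper for illustration only.
add_jvm_option() {
  if [ -n "${SOLR_ADDITIONAL_JVM_OPTIONS:-}" ]; then
    SOLR_ADDITIONAL_JVM_OPTIONS="$SOLR_ADDITIONAL_JVM_OPTIONS $1"
  else
    SOLR_ADDITIONAL_JVM_OPTIONS="$1"
  fi
  export SOLR_ADDITIONAL_JVM_OPTIONS
}
```

Usage might look like this:

add_jvm_option "-Xlog:gc:/tmp/gc.log"
add_jvm_option "-XX:+HeapDumpOnOutOfMemoryError"
$VUFIND_HOME/solr.sh start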

Best practices for Java tuning have evolved over time and with new releases of Java; try a search for “java tuning” in the search engine of your choice to learn more about current trends.

Resources

Here's a list of websites on Java performance tuning; note that some of these are now quite old and may be less relevant to newer JVMs:

Tuning Garbage Collection Outline - older information on the 1.4.2 garbage collection, but still applicable for 1.5+.
Java Tuning White Paper
Java Garbage Collection Boot Camp
Java Tuning Made Easier
Use Lucene’s MMapDirectory on 64bit platforms, please! - discussion of the MMapDirectory setting, available in Solr 3.x, which may help performance on 64-bit systems

PHP Tuning

On PHP 5.4, installing a PHP cache like APC can significantly improve performance by reducing the time PHP spends parsing and compiling code. PHP 5.5 and later include the built-in OPcache, which is slightly faster than APC. When upgrading, remember to disable the APC extension. OPcache should be configured with 64 MB of memory (the opcache.memory_consumption setting in php.ini).

Asset Pipeline

Starting with VuFind 3.1, there is an optional “asset pipeline” which can be used to combine Javascript and/or CSS files together to reduce the number of HTTP requests necessary to load VuFind pages. This is turned off by default but can be activated using the asset_pipeline setting in config.ini.

Theme Compiler

Starting with VuFind 4.1, there is a command-line tool known as the “theme compiler” which can be run to flatten a hierarchy of themes into a single flat theme, reducing the amount of file searching VuFind needs to do in order to find assets and templates.

To use a compiled theme, follow these steps:

1. Run “php $VUFIND_HOME/public/index.php compile theme [your theme] [your compiled theme]” at the command line.

2. Update your config.ini file to use [your compiled theme] instead of [your theme] in the theme setting.

Note: NEVER EDIT THE COMPILED THEME. Instead, edit the source theme and recompile it when you need to make changes. You must use the --force switch of the compiler if you need to overwrite your compiled theme with a new version.

Session Handling

When using database sessions, deleting expired sessions may be a surprisingly heavy process. On a busy site it may be beneficial to turn off PHP's session garbage collection (set session.gc_probability to 0 in php.ini) and run VuFind's expire_sessions utility regularly. This makes sure that garbage collection is done outside Apache that serves user requests and with a method that makes it possible to handle session deletion in a large table.

Apache Tuning

GZIP Compression

For a production environment, you should always enable GZIP compression using mod_deflate. This reduces the amount of data transferred by about 70% for text-based files such as HTML and JS. To enable it, run:

sudo a2enmod deflate

You should now notice smaller transferred file sizes in Firefox Developer Tools. You can configure which file types are compressed in

/etc/apache2/mods-enabled/deflate.conf

Note that the “application/json” type is not usually compressed by default, but turning on GZIP compression for it can significantly improve performance, especially if you use hierarchies and collections.
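Besides the browser's developer tools, compression can be checked from the command line by looking at the response headers. The sketch below is an illustration only: the function name is_gzipped is hypothetical, and the URL in the usage line is an assumed local address that you would replace with your own.

```shell
# is_gzipped
# Reads HTTP response headers on standard input and succeeds (exit 0)
# if a Content-Encoding header mentioning gzip is present.
# Hypothetical helper for illustration only.
is_gzipped() {
  grep -qi '^content-encoding:.*gzip'
}
```

With curl, the headers of a response can be piped into the function:

curl -s -o /dev/null -D - -H 'Accept-Encoding: gzip' http://localhost/vufind/ | is_gzipped && echo "compressed"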

Minification

You should also minify your JS and CSS files, which removes whitespace and line breaks. This can be done by your IDE; NetBeans, for example, has a plugin called JS CSS Minify Compress. The minified files must be configured in theme.config.php and, in some cases, in the theme files: just search for “.js” and replace it with “.min.js”. If you use LESS to create your CSS files, the --clean-css option optimizes your CSS as well.

Manual minification should not be necessary if you turn on VuFind's built-in asset pipeline (see above).

Solr Tuning

Cache Settings

VuFind's facet feature relies upon Solr's facet support, which requires a properly sized Solr “filterCache”. The solrconfig.xml file configures Solr's filterCache: see the filterCache section of the Solr documentation for more details. The size of your filterCache should be larger than the number of unique facet values for optimal performance.

Sorting large lists can also use a lot of cache space. You may consider disabling unwanted sort options in order to save memory. Refer to the Solr caching documentation for more details.

Starting with VuFind 2.3, VuFind's default cache sizes have been substantially reduced. The very high settings in earlier versions were known to cause stability and performance problems. It is strongly recommended that you either upgrade or adjust your own Solr configurations. The changes can be viewed in this GitHub commit.

Warming

Solr can be configured to perform one or more searches upon startup in order to pre-populate caches and thus allow faster responses. These “warming” searches are configured in solrconfig.xml – look for the <listener> entries for the “firstSearcher” and “newSearcher” events. VuFind's default configuration has some warming searches configured, but you should really customize these for local needs to get better performance. Ideally, the warming searches should exercise all of your facet (and possibly also sort) options. Your base search query can be *:*, or you might want to run a few commonly-requested, known-slow queries. For example, here is Villanova's firstSearcher configuration (and the newSearcher is essentially identical):

    <listener event="firstSearcher" class="solr.QuerySenderListener">
      <arr name="queries">
        <lst>
          <str name="q">*:*</str>
          <str name="start">0</str>
          <str name="rows">10</str>
          <str name="facet">true</str>
          <str name="facet.mincount">1</str>
          <str name="facet.field">collection</str>
          <str name="facet.field">format</str>
          <str name="facet.field">publishDate</str>
          <str name="facet.field">callnumber-first</str>
          <str name="facet.field">topic_facet</str>
          <str name="facet.field">authorStr</str>
          <str name="facet.field">language</str>
          <str name="facet.field">genre_facet</str>
          <str name="facet.field">era_facet</str>
          <str name="facet.field">geographic_facet</str>
        </lst>
      </arr>
    </listener>

Exploiting the System Cache

On Linux, you can sometimes improve performance by loading your entire index into the system cache. This will speed up searches as long as your system has sufficient resources to cache the index.

cat /usr/local/vufind/solr/biblio/index/* > /dev/null

Thanks to Tuan Nguyen for the tip.
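Because this trick only helps when the index actually fits in memory, it can be worth comparing the two sizes first. The sketch below is a rough sanity check under stated assumptions: the function name fits_in_cache is hypothetical, sizes are compared in kilobytes, and it ignores the memory that Solr's own heap and other processes will need.

```shell
# fits_in_cache INDEX_KB AVAILABLE_KB
# Succeeds (exit 0) if the index size is smaller than the available memory.
# Hypothetical helper; a deliberately simple comparison for illustration.
fits_in_cache() {
  index_kb=$1
  available_kb=$2
  [ "$index_kb" -lt "$available_kb" ]
}
```

On a reasonably recent Linux kernel, the inputs could be gathered like this:

index_kb=$(du -sk /usr/local/vufind/solr/biblio/index | cut -f1)
available_kb=$(awk '/^MemAvailable:/ {print $2}' /proc/meminfo)
fits_in_cache "$index_kb" "$available_kb" && cat /usr/local/vufind/solr/biblio/index/* > /dev/null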

Index Optimization

Whenever you update the Solr index, it is a good idea to optimize it for better search performance. If you are using spell checking in VuFind 1.0RC2 or later, the optimize operation is also necessary to update your spellchecker index.

Note: Optimizing the index can take a lot of server resources, so you should schedule your index updates and optimizations for non-peak times when possible.

Starting with VuFind 1.0RC2, a simple command-line PHP script is available to optimize the Solr index. Simply switch to the util folder under your VuFind installation and run:

  php optimize.php

By default the script will optimize the biblio Solr index. For those looking to optimize another index, include the name of the index as an argument:

  php optimize.php authority

The optimize action is triggered by posting a simple XML command to the Solr server, so there are many ways to achieve this manually if you do not wish to use the PHP script.

For example, Linux users can take advantage of the SolrOperationsTools (found in the src/scripts folder of the public Solr distribution, but not included with VuFind by default).
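As another illustration of triggering the optimize manually, the XML command can be posted with curl. The sketch below only builds the update URL; the function name solr_update_url is hypothetical, and the host, port (8983) and URL layout in the usage line are assumptions about a typical local Solr setup that you should check against your own installation.

```shell
# solr_update_url HOST PORT CORE
# Prints the update handler URL for the given Solr core.
# Hypothetical helper; the /solr/CORE/update layout is assumed here.
solr_update_url() {
  echo "http://$1:$2/solr/$3/update"
}
```

The optimize command itself could then be sent like this:

curl -H 'Content-Type: text/xml' --data-binary '<optimize/>' "$(solr_update_url localhost 8983 biblio)"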

Offloading MARC Records

See the Remote MARC Records page for details on reducing index size by storing MARC records externally to your Solr index. (Note: requires VuFind 2.5 or newer).

Limits

Certain operating system limits can impact Solr performance, and starting with release 7.3.1, the software will warn you if you are below the recommended thresholds. See the Linux startup instructions for notes on how to correct this problem.
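To see where you stand before starting Solr, the current limits can be compared against a threshold in the shell. This is a sketch under stated assumptions: the function name below_threshold is hypothetical, and the figure of 65000 used in the usage lines is assumed to match the value Solr's warning mentions; check the actual warning text on your system.

```shell
# below_threshold CURRENT MINIMUM
# Succeeds (exit 0) if CURRENT is a number lower than MINIMUM.
# A value of "unlimited" (as printed by ulimit) is never below the threshold.
# Hypothetical helper for illustration only.
below_threshold() {
  [ "$1" != "unlimited" ] && [ "$1" -lt "$2" ]
}
```

For example, in bash, for the user that runs Solr:

below_threshold "$(ulimit -n)" 65000 && echo "open file limit is low"
below_threshold "$(ulimit -u)" 65000 && echo "max processes limit is low"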

Restricting Robots

Search engine crawlers can sometimes put a heavy load on your server, causing performance issues for actual users. The behavior of search engine robots can be controlled with the help of a robots.txt file. See the robots.txt page for more details.

Testing Performance

See the Testing Performance page for notes on measuring the performance of your VuFind instance.
