This page contains notes on improving the performance of VuFind in various ways. See the table of contents for a quick overview of topics.
After the initial setup and importing of records, you may notice a significant degradation of performance. This means it's time to do some JVM tuning. If you look around the Internet, you'll quickly realize that performance tuning for Java applications is a bit of a black art. However, we'll provide some basic settings to help improve your performance, as well as resources for more in-depth tuning.
As a short side-note, these recommendations are for a “server-class” machine with at least 2 GB RAM and 2 processors. There are limits on the architecture that may require editing the sizes provided in the settings.
If you are using the included distribution of Jetty, the tuning options are set in the JAVA_OPTIONS environment variable. You can either set this for the user running the instance of Jetty (recommended) or in /etc/profile.
It's recommended that you run the most recent Sun JDK, because it provides an important switch (-server) that is not available in the JRE.
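As a minimal sketch, assuming a Bourne-style shell, the variable could be set in the profile of the account that runs Jetty. The value shown is only a placeholder; the sections below discuss which switches to actually use.

```shell
# Add to the shell profile (for example ~/.profile) of the account that runs Jetty.
# The value here is only a placeholder, not a recommended final setting.
export JAVA_OPTIONS="-server"

# Confirm that child processes, such as the Jetty start script, can see the variable.
sh -c 'echo "JAVA_OPTIONS=$JAVA_OPTIONS"'
```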
The heap space in Java is the amount of memory allocated to Java for objects. One of the more common issues with a full index of records is a java.lang.OutOfMemoryError exception due to the heap space size. By default, the lower limit is set to 1/64th of the server's physical memory and the max is set at 1/4 the physical memory with a max of 1GB.
It's a good idea to set both of these to the same value as follows (assumes 4GB of RAM):
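A minimal sketch of such a setting, assuming the 3800 MB heap used in the base settings later on this page. HEAP_MB and HEAP_OPTIONS are hypothetical helper variables for illustration, not something VuFind reads.

```shell
# Heap size in megabytes (assumption: slightly under 4GB of physical RAM).
HEAP_MB=3800

# Use the same value for the initial (-Xms) and maximum (-Xmx) heap size.
HEAP_OPTIONS="-Xms${HEAP_MB}m -Xmx${HEAP_MB}m"

echo "$HEAP_OPTIONS"   # prints: -Xms3800m -Xmx3800m
```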
Note: For very large indices (> 4 million documents) you will need more than 4GB of RAM, and a 64-bit OS to utilize it. You must also tell Java to use the 64-bit “memory model”; the setting would then look like:
-d64 -Xms8192m -Xmx8192m
Failure to use the “-d64” flag will result in an error message: “Invalid initial heap size: -Xms8192M The specified size exceeds the maximum representable size. Could not create the Java virtual machine.”
Garbage collection settings control how aggressively the JVM clears out unneeded objects. For the purposes of VuFind, parallel garbage collection should work nicely, but read Sun's Tuning Garbage Collection for more in-depth information.
The young generation is the region of the heap where new objects are allocated. It has three object spaces: the new object space (Eden) and two survivor spaces. New objects are created in Eden, while longer-lived objects are eventually moved to the old (tenured) generation. The young generation is cleared frequently using fast copying garbage collection, while full garbage collections of the tenured space, which are slower, happen less often. There are a few switches that will help with this.
Note: if you use -XX:+UseParNewGC, don't also use -XX:+UseParallelGC.
If you want to learn more about your garbage collection performance, you can add the -Xloggc:[filename] parameter to create a log file containing data on Java's behavior. For example, if you want to store logs in your existing Jetty log folder and use the current date as part of the filename, you could do this:
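One way this could look, as a sketch: the log folder defaults to the JETTY_LOG path that appears in the sample output on this page, and GC_LOG is a hypothetical helper variable. Adjust both to your installation.

```shell
# Assumed log folder; override JETTY_LOG if your installation differs.
JETTY_LOG="${JETTY_LOG:-/usr/local/vufind/solr/jetty/logs}"

# Build a dated file name of the form gc-YYYY-MM-DD.log.
GC_LOG="$JETTY_LOG/gc-$(date +%Y-%m-%d).log"

# Append the logging switch to whatever options are already set.
JAVA_OPTIONS="${JAVA_OPTIONS:+$JAVA_OPTIONS }-Xloggc:$GC_LOG"
export JAVA_OPTIONS

echo "$JAVA_OPTIONS"
```

Because the date is evaluated when the variable is set, a new log file is started only when Jetty is restarted on a later day.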
Once you have a log file, you can use the gcviewer or PMAT tool to get a visual representation of its contents. This is very helpful in measuring whether your configuration changes have had a positive impact!
The following is a good base setting for JAVA_OPTIONS to begin with:
JAVA_OPTIONS="-server -Xmx3800m -Xms3800m -XX:+UseParallelGC -XX:+AggressiveOpts -XX:NewRatio=5"
For more information on Java Tuning, see Java Tuning White Paper and experiment with the settings.
NOTE: The -server switch needs to be the first argument in the string in order for the JDK to pick up the setting.
Be sure to check that the settings are getting applied by running
If all went well, you should see output along the lines of
Checking arguments to VuFind:
VUFIND_HOME = /usr/local/vufind
SOLR_HOME = /usr/local/vufind/solr
SOLR_DATA_DIR = /usr/local/vufind/solr/data
JETTY_HOME = /usr/local/vufind/solr/jetty
JETTY_LOG = /usr/local/vufind/solr/jetty/logs
JETTY_CONF =
JETTY_RUN = /tmp
JETTY_PID = /tmp/vufind.pid
JETTY_CONSOLE = /dev/tty
JETTY_PORT =
CONFIGS = /usr/local/vufind/solr/jetty/etc/jetty.xml
JAVA_OPTIONS = -server -Xmx3800 -Xms3800 -XX:+UseParallelGC -XX:+AggressiveOpts -Dsolr.solr.home=/usr/local/vufind/solr -Dsolr.data.dir=/usr/local/vufind/solr/data -Djetty.logs=/usr/local/vufind/solr/jetty/logs -Djetty.home=/usr/local/vufind/solr/jetty
JAVA = /usr/lib/jvm/java-6-sun/bin/java
CLASSPATH =
RUN_CMD = /usr/lib/jvm/java-6-sun/bin/java -server -Xmx3800 -Xms3800 -XX:+UseParallelGC -XX:+AggressiveOpts -Dsolr.solr.home=/usr/local/vufind/solr -Dsolr.data.dir=/usr/local/vufind/solr/data -Djetty.logs=/usr/local/vufind/solr/jetty/logs -Djetty.home=/usr/local/vufind/solr/jetty -jar /usr/local/vufind/solr/jetty/start.jar /usr/local/vufind/solr/jetty/etc/jetty.xml
User Contributed Settings
William and Mary: Dell Precision 2950, 2×2 dual core Xeon 3.2 GHz processors, 4GB RAM
JAVA_OPTIONS="-server -Xmx3800m -Xms3800m -XX:+UseParallelGC -XX:+AggressiveOpts -XX:NewRatio=5"
Here's a list of useful websites on Java performance tuning:
- Tuning Garbage Collection Outline - older information on the 1.4.2 garbage collection, but still applicable for 1.5+.
- Use Lucene’s MMapDirectory on 64bit platforms, please! - discussion of the MMapDirectory setting, available in Solr 3.x, which may help performance on 64-bit systems
Installing a PHP cache like APC can significantly improve performance by reducing the amount of time spent by PHP parsing and compiling code.
There have been some reports of VuFind errors when running APC. Excerpt from the vufind-tech mailing list (courtesy of Graham Seaman):
Inserting the line in index.php
immediately before requiring the session handler, seems to fix the problem with APC. Since this is the result of trawling the web and not of any deep understanding on my part, I can't guarantee this won't have side effects (I'll report back if I find any), or that it will always work.
Note: As of VuFind 1.4, the default VuFind code will register the shutdown function, so this modification should no longer be necessary.
VuFind's facet feature relies upon Solr's facet support, which requires a properly sized Solr “filterCache”. The solrconfig.xml file configures Solr's filterCache: see the filterCache section of the Solr documentation for more details. The size of your filterCache should be larger than the number of unique facet values for optimal performance.
Sorting large lists can also use a lot of cache space. You may consider disabling unwanted sort options in order to save memory. Refer to the Solr caching documentation for more details.
Starting with VuFind 2.3, VuFind's default cache sizes have been substantially reduced. The very high settings in earlier versions were known to cause stability and performance problems. It is strongly recommended that you either upgrade or adjust your own Solr configurations. The changes can be viewed in this GitHub commit.
Solr can be configured to perform one or more searches upon startup in order to pre-populate caches and thus allow faster responses. These “warming” searches are configured in solrconfig.xml – look for the <listener> entries for the “firstSearcher” and “newSearcher” events. VuFind's default configuration has some warming searches configured, but you should really customize these for local needs to get better performance. Ideally, the warming searches should exercise all of your facet (and possibly also sort) options. Your base search query can be *:*, or you might want to run a few commonly-requested, known-slow queries. For example, here is Villanova's firstSearcher configuration (and the newSearcher is essentially identical):
<listener event="firstSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">*:*</str>
      <str name="start">0</str>
      <str name="rows">10</str>
      <str name="facet">true</str>
      <str name="facet.mincount">1</str>
      <str name="facet.field">collection</str>
      <str name="facet.field">format</str>
      <str name="facet.field">publishDate</str>
      <str name="facet.field">callnumber-first</str>
      <str name="facet.field">topic_facet</str>
      <str name="facet.field">authorStr</str>
      <str name="facet.field">language</str>
      <str name="facet.field">genre_facet</str>
      <str name="facet.field">era_facet</str>
      <str name="facet.field">geographic_facet</str>
    </lst>
  </arr>
</listener>
Exploiting the System Cache
On Linux, you can sometimes improve performance by loading your entire index into the system cache. This will speed up searches as long as your system has sufficient resources to cache the index.
cat /usr/local/vufind/solr/biblio/index/* > /dev/null
Thanks to Tuan Nguyen for the tip.
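Before warming the cache, it may be worth checking that the index can plausibly fit in memory. The following is a rough sketch, not part of VuFind: index_fits_in_memory and INDEX_DIR are hypothetical names, and comparing against total RAM is only a crude upper bound, since the JVM heap and other processes also need memory.

```shell
# Succeed when an index of $1 KB is smaller than $2 KB of memory.
index_fits_in_memory() {
    [ "$1" -lt "$2" ]
}

# Assumed index location, matching the cat command above; adjust as needed.
INDEX_DIR="${INDEX_DIR:-/usr/local/vufind/solr/biblio/index}"

# Only run on systems where both the index and /proc/meminfo exist (Linux).
if [ -d "$INDEX_DIR" ] && [ -r /proc/meminfo ]; then
    index_kb=$(du -sk "$INDEX_DIR" | cut -f1)
    mem_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
    if index_fits_in_memory "$index_kb" "$mem_kb"; then
        # Read every index file once so the kernel keeps it in the page cache.
        cat "$INDEX_DIR"/* > /dev/null
    else
        echo "Index (${index_kb} KB) exceeds RAM (${mem_kb} KB); skipping warm-up." >&2
    fi
fi
```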
Whenever you update the Solr index, it is a good idea to optimize it for better search performance. If you are using spell checking in VuFind 1.0RC2 or later, the optimize operation is also necessary to update your spellchecker index.
Note: Optimizing the index can take a lot of server resources, so you should schedule your index updates and optimizations for non-peak times when possible.
Starting with VuFind 1.0RC2, a simple command-line PHP script is available to optimize the Solr index. Simply switch to the util folder under your VuFind installation and run:
php optimize.php
By default the script will optimize the biblio Solr index. For those looking to optimize another index, include the name of the index as an argument:
php optimize.php authority
The optimize action is triggered by posting a simple XML command to the Solr server, so there are many ways to achieve this manually if you do not wish to use the PHP script.
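For instance, the command could be posted with curl. This is a sketch under assumptions: the base URL (localhost, port 8080) should be checked against your own Solr setup, and solr_update_url and solr_optimize are hypothetical helper functions, not part of VuFind.

```shell
# Assumed Solr base URL; override SOLR_BASE_URL to match your installation.
SOLR_BASE_URL="${SOLR_BASE_URL:-http://localhost:8080/solr}"

# Print the update URL for the given index name (defaults to biblio).
solr_update_url() {
    echo "$SOLR_BASE_URL/${1:-biblio}/update"
}

# Post the <optimize/> XML command to the given index.
solr_optimize() {
    curl --silent --show-error \
         -H 'Content-Type: text/xml' \
         --data-binary '<optimize/>' \
         "$(solr_update_url "$1")"
}

# Usage (requires a running Solr instance):
#   solr_optimize biblio
#   solr_optimize authority
```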
Offloading MARC Records
See the Remote MARC Records page for details on reducing index size by storing MARC records externally to your Solr index. (Note: requires VuFind 2.5 or newer).
- Solr Replication - Basic information about making Solr highly available with replication in the context of VuFind.
- Solr Wiki - Performance Factors - Pros and cons of various Solr configuration options; also linked to other helpful wiki pages.
- solrmarc-tech indexing time thread - A discussion which goes into detail on several significant Solr settings.
Search engine crawlers can sometimes put a heavy load on your server, causing performance issues for actual users. The behavior of search engine robots can be controlled with the help of a robots.txt file. See the robots.txt page for more details.
See the Testing Performance page for notes on measuring the performance of your VuFind instance.