About Features Downloads Getting Started Documentation Events Support

Site Tools


administration:performance

Performance

This page contains notes on improving the performance of VuFind in various ways. See the table of contents for a quick overview of topics.

Java Tuning

After the initial setup and importing of records, you may notice a significant degradation of performance. This means its time to do some JVM tuning. However, if you look around the Internet, you'll quickly realize performance tuning for Java applications is a bit of a black art. However, we'll provide some basic settings to help improve your performance, as well as resources for more in-depth tuning settings.

As a short side-note, these recommendations are for a “server-class” machine with at least 2 GB RAM and 2 processors. There are limits on the architecture that may require editing the sizes provided in the settings.

JAVA_OPTIONS

:!: This section refers to VuFind 2.x and earlier, which include Solr 4 or earlier. Starting with VuFind 3.0, Solr 5 is bundled with VuFind, and it uses a different startup routine. To adjust the heap memory size in these versions, simply change the SOLR_HEAP environment variable.

If you are using the included distribution of Jetty, the tuning options are set in the JAVA_OPTIONS environmental variable. You can either set this for the user running the instance of Jetty (recommended) or in /etc/profile.

It's recommended that you run the most recent Sun JDK as there is an important switch (-server) that you use that is not in the JRE.

Heap Space

The heap space in Java is the amount of memory allocated to Java for objects. One of the more common issues with a full index of records is an java.lang.OutOfMemoryError exception due to the heap space size. By default, the lower limit is set to 1/64th of the server's physical memory and the max is set at 1/4 the physical memory with a max of 1GB.

It's a good idea to set both of these to the same value as follows (assumes 4GB of RAM):

-Xms3800m -Xmx3800m

Note: For very large indices (> 4 million documents) you will need more than 4GB of RAM, and a 64 bit OS to utilize it. You must also tell Java to use the 64 bit “memory model” the setting would then look like:

-d64 -Xms8192m -Xmx8192m

Failure to use the “-d64” flag will result in an error message: “Invalid initial heap size: -Xms8192M The specified size exceeds the maximum representable size. Could not create the Java virtual machine.”

Garbage Collection

Garbage collection is how aggressive the JVM is with clearing out unneeded objects. For the purposes of Vufind, parallel garbage collection should work nicely, but read Sun's Tuning Garbage Collection for more in-depth information.

-XX:+UseParallelGC

Young Generation

Young generation is a type of garbage collection that has three object spaces, the new object space (Eden) and two survivor spaces. Newer objects are created in Eden, while longer lived objects are moved to the old generation survivor (tenured) spaces. Young generation uses fast copying garbage collection for more frequent clearing of Eden, and more spaced out full garbage collections in the tenured space (which is slower). There are a few switches that will help with this.

-XX:NewRatio=5 -XX:+UseParNewGC

Note: if you use the -XX:+UseParNewGC, don't use -XX:+UseParallelGC.

Collecting Data

If you want to learn more about your garbage collection performance, you can add the -Xloggc:[filename] parameter to create a log file containing data on Java's behavior. For example, if you want to store logs in your existing Jetty log folder and use the current date as part of the filename, you could do this:

-Xloggc:$VUFIND_HOME/solr/jetty/logs/gc-`/bin/date +%F-%H-%M`.log

Once you have a log file, you can use the gcviewer or PMAT tool to get a visual representation of its contents. This is very helpful in measuring whether your configuration changes have had a positive impact!

Setting JAVA_OPTIONS

For a good base setting for JAVA_OPTIONS, this is a good set to begin with:

JAVA_OPTIONS="-server -Xmx3800m -Xms3800m -XX:+UseParallelGC -XX:+AggressiveOpts -XX:NewRatio=5"

For more information on Java Tuning, see Java Tuning White Paper and experiment with the settings.

NOTE: The -server switch needs to be the first argument in the string in order for the JDK to pick up the setting.

Reminders

Be sure to check that the settings are getting applied by running

./vufind.sh check

If all went well, you should see output along the lines of

Checking arguments to VuFind: 
VUFIND_HOME    =  /usr/local/vufind
SOLR_HOME      =  /usr/local/vufind/solr
SOLR_DATA_DIR  =  /usr/local/vufind/solr/data
JETTY_HOME     =  /usr/local/vufind/solr/jetty
JETTY_LOG      =  /usr/local/vufind/solr/jetty/logs
JETTY_CONF     =  
JETTY_RUN      =  /tmp
JETTY_PID      =  /tmp/vufind.pid
JETTY_CONSOLE  =  /dev/tty
JETTY_PORT     =  
CONFIGS        =  /usr/local/vufind/solr/jetty/etc/jetty.xml
JAVA_OPTIONS   =  -server -Xmx3800 -Xms3800 -XX:+UseParallelGC -XX:+AggressiveOpts -Dsolr.solr.home=/usr/local/vufind/solr -Dsolr.data.dir=/usr/local/vufind/solr/data -Djetty.logs=/usr/local/vufind/solr/jetty/logs -Djetty.home=/usr/local/vufind/solr/jetty 
JAVA           =  /usr/lib/jvm/java-6-sun/bin/java
CLASSPATH      =  
RUN_CMD        =  /usr/lib/jvm/java-6-sun/bin/java -server -Xmx3800 -Xms3800 -XX:+UseParallelGC -XX:+AggressiveOpts -Dsolr.solr.home=/usr/local/vufind/solr -Dsolr.data.dir=/usr/local/vufind/solr/data -Djetty.logs=/usr/local/vufind/solr/jetty/logs -Djetty.home=/usr/local/vufind/solr/jetty  -jar /usr/local/vufind/solr/jetty/start.jar  /usr/local/vufind/solr/jetty/etc/jetty.xml

User Contributed Settings

William and Mary: Dell Precision 2950, 2×2 dual core Xeon 3.2 GHz processors, 4GB RAM

JAVA_OPTIONS="-server -Xmx3800 -Xms3800 -XX:+UseParallelGC -XX:+AggressiveOpts -XX:NewRatio=5"

Resources

Here's a list of useful websites on Java performance tuning:

PHP Tuning

On PHP 5.4, installing a PHP cache like APC can significantly improve performance by reducing the amount of time spent by PHP parsing and compiling code. PHP 5.6 comes with build-In OpCache, which is slightly faster than APC. When upgrading, remember to disable the Apache APC module. OpCache should be configured to 64 MB.

If you created a custom module, remember to create a autoloader_classmap.php by calling

../../vendor/zendframework/zendframework/bin/classmap_generator.php -w

from module directory.

Troubleshooting

There have been some reports of VuFind errors when running APC. Excerpt from the vufind-tech mailing list (courtesy of Graham Seaman):

Inserting the line in index.php

register_shutdown_function('session_write_close');

immediately before requiring the session handler, seems to fix the problem with APC. Since this is the result of trawling the web and not of any deep understanding on my part, I can't guarantee this won't have sideffects (I'll report back if I find any), or that it will always work.

Note: As of VuFind 1.4, the default VuFind code will register the shutdown function, so this modification should no longer be necessary.

Apache Tuning

For a productive environment, you should always enable GZIP compression. This reduces a amount of data transferred by 70% for text based files like html, js, … To do so,

sudo a2enmod deflate

You should now notice a shrinked transmitted file size in Firefox Developer Tools. You can configure which files are compressed in

/etc/apache2/mods-enabled/deflate.conf

One thing you should also do is minifying JS and CSS files. This removes whitespaces and line breaks. It can by done by your IDE. Netbeans for example has a plugin Js CSS Minify Compress. The minified files must be configured in theme.config.php and in some cases in the theme files. Just search for “.js” and replace with “.min.js”. If you use LESS to create your CSS files, there is an option –clean-css which optimizes your CSS, too.

Solr Tuning

Cache Settings

VuFind's facet feature relies upon Solr's facet support which requires a properly sized Solr “fieldCache”. The solrconfig.xml file configures Solr's fieldCache: see the filtercache section of the Solr documentation for more details. The size of your filterCache should be larger than the number of unique facet values for optimal performance.

Sorting large lists can also use a lot of cache space. You may consider disabling unwanted sort options in order to save memory. Refer to the Solr caching documentation for more details.

:!: Starting with VuFind 2.3, VuFind's default cache sizes have been substantially reduced. The very high settings in earlier versions were known to cause stability and performance problems. It is strongly recommended that you either upgrade or adjust your own Solr configurations. The changes can be viewed in this GitHub commit.

Warming

Solr can be configured to perform one or more searches upon startup in order to pre-populate caches and thus allow faster responses. These “warming” searches are configured in solrconfig.xml – look for the <listener> entries for the “firstSearcher” and “newSearcher” events. VuFind's default configuration has some warming searches configured, but you should really customize these for local needs to get better performance. Ideally, the warming searches should exercise all of your facet (and possibly also sort) options. Your base search query can be *:*, or you might want to run a few commonly-requested, known-slow queries. For example, here is Villanova's firstSearcher configuration (and the newSearcher is essentially identical):

    <listener event="firstSearcher" class="solr.QuerySenderListener">
      <arr name="queries">
        <lst>
          <str name="q">*:*</str>
          <str name="start">0</str>
          <str name="rows">10</str>
          <str name="facet">true</str>
          <str name="facet.mincount">1</str>
          <str name="facet.field">collection</str>
          <str name="facet.field">format</str>
          <str name="facet.field">publishDate</str>
          <str name="facet.field">callnumber-first</str>
          <str name="facet.field">topic_facet</str>
          <str name="facet.field">authorStr</str>
          <str name="facet.field">language</str>
          <str name="facet.field">genre_facet</str>
          <str name="facet.field">era_facet</str>
          <str name="facet.field">geographic_facet</str>
        </lst>
      </arr>
    </listener>

Exploiting the System Cache

On Linux, you can sometimes improve performance by loading your entire index into the system cache. This will speed up searches as long as your system has sufficient resources to cache the index.

cat /usr/local/vufind/solr/biblio/index/* > /dev/null

Thanks to Tuan Nguyen for the tip.

Index Optimization

Whenever you update the Solr index, it is a good idea to optimize it for better search performance. If you are using spell checking in VuFind 1.0RC2 or later, the optimize operation is also necessary to update your spellchecker index.

Note: Optimizing the index can take a lot of server resources, so you should schedule your index updates and optimizations for non-peak times when possible.

Starting with VuFind 1.0RC2, a simple command-line PHP script is available to optimize the Solr index. Simply switch to the util folder under your VuFind installation and run:

  php optimize.php

By default the script will optimize the biblio Solr index. For those looking to optimize another index, include the name of the index as an argument:

  php optimize.php authority

The optimize action is triggered by posting a simple XML command to the Solr server, so there are many ways to achieve this manually if you do not wish to use the PHP script.

For example, Linux users can take advantage of the SolrOperationsTools (found in the src/scripts folder of the public Solr distribution, but not included with VuFind by default).

Offloading MARC Records

See the Remote MARC Records page for details on reducing index size by storing MARC records externally to your Solr index. (Note: requires VuFind 2.5 or newer).

Further Reading

Restricting Robots

Search engine crawlers can sometimes put a heavy load on your server, causing performance issues for actual users. The behavior of search engine robots can be controlled with the help of a robots.txt file. See the robots.txt page for more details.

Testing Performance

See the Testing Performance page for notes on measuring the performance of your VuFind instance.

administration/performance.txt · Last modified: 2016/04/25 11:11 by demiankatz