Search Engine Optimization
In some cases, you may wish to make sure that the contents of your VuFind system are visible beyond your local site. This page provides tips and tools for increasing and controlling search engine visibility.
This feature was added after the release of VuFind 1.0.1.
Several search engines look for sitemap files which list all of the pages of your site and ensure that no content is missed during indexing. For example, sitemaps are useful in combination with Google Search Console.
VuFind comes with a tool for generating sitemaps. To use it, follow these steps:
1. Edit the sitemap.ini file under your VuFind local directory to specify the location of the generated sitemaps, what to include and some other settings. All options are explained by comments within the file.
2. Run the sitemap generator:
This is an example command. See also Command Line Utilities for more information on local modules etc.
cd $VUFIND_HOME
VUFIND_LOCAL_DIR=`pwd`/local php public/index.php util/sitemap -v [additional parameters]
This may take some time. When it is complete, your sitemap file(s) have been generated.
You may wish to automate this so that sitemaps are built on a regular basis; see the Automation page for tips on automating VuFind-related tasks.
Since sitemap generation takes some time, it is recommended to generate the sitemap files in a temporary directory and move them into place only once generation has completed. Otherwise crawlers may try to read incomplete sitemap information while generation is still in progress. Make sure to clean up any old sitemap files from the temporary directory before generating new ones, and from the public directory when copying the new ones over.
3. Include a reference to the sitemap in robots.txt:
This is an incomplete excerpt to illustrate the Sitemap option. The Sitemap URL must be fully-qualified.
User-agent: *
Disallow: /AJAX
Sitemap: http://www.example.com/sitemap.xml
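The recommendation in step 2 to build sitemaps in a temporary directory and move them into place afterwards could be scripted roughly as follows. This is only a sketch: the function name publish_sitemaps is hypothetical, the sitemap*.xml pattern assumes the generated files start with "sitemap", and the staging directory is assumed to be the output location you configured in sitemap.ini.

```shell
# Hypothetical helper (not part of VuFind): replaces the published sitemap
# files with freshly generated ones from a staging directory.
#   $1 = staging directory the generator wrote to (assumed to match sitemap.ini)
#   $2 = public directory served by the web server
publish_sitemaps() {
    staging_dir=$1
    public_dir=$2

    # Refuse to publish if the generator produced nothing, so that a failed
    # run does not wipe the currently published sitemaps.
    found=no
    for f in "$staging_dir"/sitemap*.xml; do
        if [ -e "$f" ]; then found=yes; fi
    done
    if [ "$found" = no ]; then
        echo "no sitemap files found in $staging_dir" >&2
        return 1
    fi

    # Remove old published files first so stale numbered sitemaps do not linger,
    # then move the new files into place.
    rm -f "$public_dir"/sitemap*.xml
    mv "$staging_dir"/sitemap*.xml "$public_dir"/
}

# Example usage (paths are assumptions; adjust to your installation):
#   rm -f /tmp/vufind-sitemaps/sitemap*.xml      # clean the staging directory
#   cd $VUFIND_HOME
#   VUFIND_LOCAL_DIR=`pwd`/local php public/index.php util/sitemap -v
#   publish_sitemaps /tmp/vufind-sitemaps "$VUFIND_HOME/public"
```

Checking for generated files before deleting anything keeps the previously published sitemaps in place if a generation run fails.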
Exposing Static / Multi-lingual Content
If you have important static content pages (landing pages, texts about services/institutions, etc.), you may wish to include these in a base sitemap as a complement to the record list built by VuFind's sitemap generator. If your VuFind instance serves content in multiple languages, you may also wish to take advantage of the ?lng= GET parameter to provide multiple language-specific versions of the link in the sitemap. See RelBib's baseSitemap.xml for an example.
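As a rough illustration of the idea, a base sitemap with one entry per static page and language could be produced with a small script like the one below. The function name print_base_sitemap, the example.com URL, the /Content/about path and the language codes are all placeholders, and the output follows the generic sitemap protocol rather than reproducing the RelBib file.

```shell
# Hypothetical helper: prints a base sitemap with one <url> entry
# per static page and language, using the ?lng= parameter.
#   $1 = site base URL (no trailing slash)
#   $2 = space-separated language codes
#   remaining arguments = page paths, each starting with "/"
print_base_sitemap() {
    base_url=$1
    languages=$2
    shift 2

    printf '%s\n' '<?xml version="1.0" encoding="UTF-8"?>'
    printf '%s\n' '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    for page in "$@"; do
        # $languages is intentionally unquoted so it splits on spaces.
        for lang in $languages; do
            printf '  <url><loc>%s%s?lng=%s</loc></url>\n' "$base_url" "$page" "$lang"
        done
    done
    printf '%s\n' '</urlset>'
}

# Example usage (placeholder values):
#   print_base_sitemap "https://www.example.com" "en de" "/Content/about" > baseSitemap.xml
```

If a page URL already carries other query parameters, the separator would have to be written as &amp; in the XML; this sketch assumes plain paths.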
Understand Crawling Budgets
A VuFind site can easily have hundreds of thousands, or even millions, of pages in its sitemaps – it's all dependent on the number of records in your index. Search engines will take a significant amount of time to crawl all of these pages, and even more time to detect changes. Be aware that publishing a sitemap will not instantly lead to full visibility of all of your content.
Using Search Engine Tools
Creating a sitemap is only half the battle; the rest is informing search engines about it. Tools like Google's Search Console are important for publishing your sitemaps and managing how your site is crawled.
Early versions of VuFind made use of the standard Solr TermsComponent for extracting identifiers. This is still available as a configurable option, since it is very fast, but it does not account for certain configuration options such as hidden filters and may not properly represent your site. Starting with VuFind 5.1, the default behavior was changed from TermsComponent to a slower but more universally compatible search-based approach. The configuration can be changed via the retrievalMode setting in sitemap.ini.
Search engine optimization is challenging to maintain, because search engines are constantly changing their rules for crawling and ranking. Google, for example, can update its procedures hundreds of times every year (see History of Google Algorithm Updates for details). News sites like Search Engine Land can provide some help in learning about recent changes and trends.
robots.txt - Recommendations for restricting search engine access to avoid confusing results and/or unnecessary server load.