Indexing a Website

Starting with release 2.1, VuFind can be used to create a website index separate from your main search index. Results from this index can then be used on their own or merged with catalog results using the combined search tools.

Getting Started

  1. Make sure that you have a full text extraction tool installed and configured.
  2. Enable the website core by editing solr/solr.xml and uncommenting the appropriate line.
  3. Copy config/vufind/webcrawl.ini into the config/vufind subdirectory of your local settings directory and edit the file to specify where your website's XML sitemap lives.
  4. Run the import/webcrawl.php tool to load your website's data into the index (this may take a long time).
  5. When crawling is done, go to http://vufind_server/vufind/Web/Results and enter a search in the box there.
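
To make step 3 more concrete, the following is a minimal Python sketch of how page URLs can be read out of an XML sitemap that follows the sitemaps.org protocol. It is not VuFind's own code (webcrawl.php is a PHP tool), and the function name `sitemap_urls` and the sample URLs are hypothetical, chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace used by sitemap files that follow the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text):
    """Return the <loc> value of every <url> entry in a sitemap document."""
    root = ET.fromstring(xml_text)
    urls = []
    for loc in root.findall("sm:url/sm:loc", SITEMAP_NS):
        if loc.text and loc.text.strip():
            urls.append(loc.text.strip())
    return urls


# Hypothetical sitemap content, for illustration only.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://library.example.edu/</loc></url>
  <url><loc>http://library.example.edu/hours</loc></url>
</urlset>"""

if __name__ == "__main__":
    for url in sitemap_urls(SAMPLE):
        print(url)
```

Running a check like this against your own sitemap before a long crawl is a quick way to confirm that the file is well formed and lists the pages you expect.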

With the help of your local settings directory, you can adjust several aspects of web search behavior and appearance:

  • You can customize the way web pages are indexed by creating a custom version of import/xsl/sitemap.xsl and/or import/sitemap.properties.
  • You can customize search behavior and options through config/vufind/website.ini and config/vufind/websearchspecs.yaml.
  • You can customize display behavior through the VuFind\RecordDriver\SolrWeb record driver and corresponding templates.

Notes

  • The current webcrawl.php tool works very much by brute force; we may want to build a more intelligent, flexible crawler at some point in the future.
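
As one illustration of what a less brute-force crawl might look like, the sketch below selects only sitemap entries whose optional `<lastmod>` value is on or after the date of the previous crawl. This is an assumption about a possible approach, not a description of webcrawl.php or of any planned VuFind feature; the function name `changed_since` and the sample data are hypothetical.

```python
from datetime import date
import xml.etree.ElementTree as ET

# Namespace used by sitemap files that follow the sitemaps.org protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def changed_since(xml_text, last_crawl):
    """Return URLs modified on or after last_crawl.

    Entries with a missing or unparseable <lastmod> are always included,
    since there is no way to tell whether they changed.
    """
    root = ET.fromstring(xml_text)
    selected = []
    for entry in root.findall("sm:url", NS):
        loc = entry.findtext("sm:loc", default="", namespaces=NS).strip()
        lastmod = entry.findtext("sm:lastmod", default="", namespaces=NS).strip()
        if not loc:
            continue
        try:
            # Compare on the YYYY-MM-DD part of the timestamp only.
            modified = date.fromisoformat(lastmod[:10])
        except ValueError:
            modified = None
        if modified is None or modified >= last_crawl:
            selected.append(loc)
    return selected


# Hypothetical sitemap content, for illustration only.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://library.example.edu/</loc><lastmod>2015-12-01</lastmod></url>
  <url><loc>http://library.example.edu/hours</loc><lastmod>2015-10-01T08:00:00+00:00</lastmod></url>
  <url><loc>http://library.example.edu/staff</loc></url>
</urlset>"""

if __name__ == "__main__":
    for url in changed_since(SAMPLE, date(2015, 11, 1)):
        print(url)
```

This only helps when the sitemap reports accurate `<lastmod>` values; pages without one are re-crawled every time.
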
indexing/websites.txt · Last modified: 2015/12/14 15:16 by demiankatz