Warning: This page has not been updated in over a year and may be outdated or deprecated.

Differences

This shows you the differences between two versions of the page.

indexing:full_text_tools [2015/12/14 17:14] – ↷ Page moved and renamed from aperture to indexing:full_text_tools demiankatz
indexing:full_text_tools [2024/03/13 11:58] (current) demiankatz
Line 1:
 ====== Full Text Extraction Tools ======
  
-VuFind's import tools include support for using external software to extract full text from external documents (PDF, Word, etc.).  In order to take advantage of this, you need to install an appropriate tool (see options below) and point VuFind to the software by editing the [[https://vufind.svn.sourceforge.net/svnroot/vufind/trunk/web/conf/fulltext.ini|fulltext.ini]] file (in web/conf for VuFind 1.x or config/vufind within your [[vufind2:local settings directory]] for VuFind 2.x).
+VuFind®'s import tools include support for using external software to extract full text from external documents (PDF, Word, etc.).  In order to take advantage of this, you need to install an appropriate tool (see options below) and point VuFind® to the software by editing the [[https://github.com/vufind-org/vufind/blob/dev/config/vufind/fulltext.ini|fulltext.ini]] file within your [[configuration:local_settings_directory|local settings directory]].
  
-For more details of using full text in VuFind, see the import instructions for [[indexing:marc#indexing_full_text|MARC]] and [[indexing:xml#full_text|XML]] documents.
+For more details of using full text in VuFind®, see the import instructions for [[indexing:marc#indexing_full_text|MARC]] and [[indexing:xml#full_text|XML]] documents.
  
-===== Aperture =====
+===== Tika =====
  
-Aperture was the first tool that VuFind supported for full-text extraction, and it is the only option for use with versions 1.3 and earlier.  Unfortunately, Aperture is no longer in active development, so users with VuFind 1.4 or later are encouraged to use Tika instead (see below).
+Tika is the recommended full-text extraction tool for use with VuFind®.
  
-==== Downloading Aperture ====
+==== Downloading Tika ====
  
-  * [[http://aperture.sourceforge.net/|Aperture official site]]
+  * [[http://tika.apache.org/|Tika official site]]
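Before pointing VuFind® at a downloaded Tika jar, it can help to confirm that the jar extracts text when run by hand. The following is a minimal sketch, not part of VuFind® itself: it assumes Java is on the PATH and that the Tika application jar has been saved somewhere readable. The function names and any paths passed to them are hypothetical; the only Tika feature relied on is the application jar's ''--text'' command-line option.

```python
import subprocess


def tika_text_command(jar_path, document):
    """Build the command line that asks the Tika app jar for plain text.

    jar_path and document are caller-supplied (hypothetical) locations;
    --text is the Tika app option that outputs plain text content.
    """
    return ["java", "-jar", str(jar_path), "--text", str(document)]


def extract_text(jar_path, document, timeout=120):
    """Run Tika on one document and return the extracted text.

    Raises subprocess.CalledProcessError if Java or Tika exits with an
    error, and subprocess.TimeoutExpired if extraction takes too long.
    """
    result = subprocess.run(
        tika_text_command(jar_path, document),
        capture_output=True,  # collect stdout/stderr instead of printing
        text=True,            # decode output to str
        check=True,           # fail loudly on a non-zero exit code
        timeout=timeout,
    )
    return result.stdout
```

If ''extract_text'' returns readable text for a sample PDF, the same jar location should be a reasonable value to record in fulltext.ini.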
  
-==== Troubleshooting Aperture ====
+===== Aperture =====
  
-Under Linux, there is a known bug with version 1.5.0 which prevents Aperture from running on the command line correctly.  See [[http://sourceforge.net/tracker/index.php?func=detail&aid=3094429&group_id=150969&atid=779500|this page]] for details on the fix (it just involves minor edits to the lcp.sh file).
+:!: Aperture is supported as an alternative to Tika.  However, Aperture is no longer in active development, so users are strongly encouraged to use Tika instead (see above).
  
-===== Tika =====
+==== Downloading Aperture ====
  
-Tika is supported by VuFind 1.4 and later, and is the recommended full-text extraction tool when supported.
+  * [[http://aperture.sourceforge.net/|Aperture official site]]
  
-==== Downloading Tika ====
+===== Related Video =====
  
-  * [[http://tika.apache.org/|Tika official site]]
+The [[videos:sitemaps_and_web_indexing|Sitemaps and Web Indexing]] video includes a demonstration of setting up a full text extraction tool.
 ---- struct data ----
 +properties.Page Owner : 
 ----
  
indexing/full_text_tools.1450113288.txt.gz · Last modified: 2015/12/14 17:14 by demiankatz