Warning: This page has not been updated in over a year and may be outdated or deprecated.
indexing:deduplication

Differences

This shows you the differences between two versions of the page.

indexing:deduplication [2015/12/14 17:57] – ↷ Page moved from deduplication to indexing:deduplication demiankatz
indexing:deduplication [2023/03/20 16:31] – [RecordManager] demiankatz
Line 1: Line 1:
 ====== Support for Deduplication ======
  
-// The features described on this page are available starting with VuFind 2.3. //
+// The features described on this page are available starting with VuFind® 2.3. //
 + 
 +===== Introduction ===== 
 + 
 +The deduplication feature allows records from multiple sources to be combined and displayed as a single search result. It requires some extra setup and external tools to fully implement. Users who choose not to use deduplication may instead wish to configure the [[configuration:record_versions|Record Versions]] feature, which offers a different method for associating related records. (The Deduplication and Record Versions features can also be used to complement one another.)
  
 ===== Solr Setup =====
  
-VuFind includes support for displaying deduplicated records. This requires that records are deduplicated before indexing into Solr, and that a so-called merged record is created for each dedup group (group of original duplicate records) alongside the original records. [[https://github.com/KDK-Alli/RecordManager|RecordManager]] can be used for deduplication as it has built-in support for VuFind-compatible deduplication, but VuFind doesn't require RecordManager to be used, just some index fields and the merged record to be present.
+VuFind® includes support for displaying deduplicated records. This requires that records are deduplicated before indexing into Solr, and that a so-called merged record is created for each dedup group (group of original duplicate records) alongside the original records.
 + 
 +==== RecordManager ==== 
 + 
 +[[https://github.com/KDK-Alli/RecordManager|RecordManager]] can be used for deduplication as it has built-in support for VuFind®-compatible deduplication, but VuFind® doesn't require RecordManager to be used, just some index fields and the merged record to be present. 
 + 
 +Using RecordManager does offer some specific advantages:
  
 +  - RecordManager can find the best record among the deduplicated records to use as the base record when creating a merged record.
 +  - There's control over how single-valued and multi-valued fields are merged into the merged record, as well as the possibility of handling first_indexed and last_indexed data.
 +  - The records belonging to a dedup group can also be enriched with data from the merged record, so enrichment can be achieved in both directions.
 +  - The mechanism can ensure, for example, that two records from the same data source never get deduplicated. RecordManager's default deduplication algorithm has the built-in assumption that records in a single source are already distinct.
 ==== Required Solr Fields in Merged Records ====
  
Line 77: Line 91:
 ===== Configuration =====
  
-The following settings in the Records section of [[searches.ini]] affect deduplication:
+The following settings in the Records section of [[configuration:files:searches.ini]] affect deduplication:
  
   * **deduplication** Whether support for deduplicated records is enabled
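
As a minimal sketch of the setting above, and assuming it takes a boolean value (the value syntax is not shown in this excerpt), enabling deduplication in the Records section of searches.ini might look like this:

<code ini>
; searches.ini
; Assumption: "true" is the accepted value for switching the feature on.
[Records]
deduplication = true
</code>

The other settings in the same section are not shown here; check the full page before relying on this fragment alone.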
Line 98: Line 112:
 </code>
 ---- struct data ----
 +properties.Page Owner : emaijala
 ----
  
indexing/deduplication.txt · Last modified: 2023/03/20 16:33 by demiankatz