Warning: This page has not been updated in over a year and may be outdated or deprecated.
indexing:deduplication

Differences

This shows you the differences between two versions of the page.

Old revision: indexing:deduplication [2023/03/20 16:30] – [Solr Setup] demiankatz
New revision: indexing:deduplication [2023/03/20 16:31] – [RecordManager] demiankatz
Line 17:
 Using RecordManager does offer some specific advantages:
  
-- RecordManager can find the best record among the deduplicated records to use as the base record when creating a merged record.
+  - RecordManager can find the best record among the deduplicated records to use as the base record when creating a merged record.
-- There's control over how single-valued and multi-valued fields are merged into the merged record, as well as the possibility of handling first_indexed and last_indexed data.
+  - There's control over how single-valued and multi-valued fields are merged into the merged record, as well as the possibility of handling first_indexed and last_indexed data.
-- The records belonging to a dedup group can also be enriched with data from the merged record, so enrichment can be achieved in both directions.
+  - The records belonging to a dedup group can also be enriched with data from the merged record, so enrichment can be achieved in both directions.
-- The mechanism can ensure e.g. that two records from the same data source never get deduplicated. This is a built-in assumption in RecordManager's default deduplication algorithm that records in a single source should already be distinct ones.
+  - The mechanism can ensure e.g. that two records from the same data source never get deduplicated. This is a built-in assumption in RecordManager's default deduplication algorithm that records in a single source should already be distinct ones.
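The advantages listed above can be illustrated with a small sketch. This is not RecordManager code (RecordManager is a separate PHP application) and it does not reproduce its actual algorithm. The `Record` class, the `can_join`, `pick_base` and `merge` helpers, the "most populated fields" heuristic, and the `topic` field name are all assumptions made up for this example. It only shows the three ideas in miniature: never grouping two records from the same source, choosing a base record, and merging single-valued and multi-valued fields differently.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    # Hypothetical minimal record; not RecordManager's real data model.
    id: str
    source: str
    fields: dict = field(default_factory=dict)


def can_join(group, candidate):
    """Allow a candidate into a dedup group only if no member shares its source."""
    return all(member.source != candidate.source for member in group)


def pick_base(group):
    """Assumed heuristic: the record with the most populated fields is the base."""
    return max(group, key=lambda r: sum(1 for v in r.fields.values() if v))


def merge(group, multi_valued=("topic",)):
    """Single-valued fields come from the base record; each multi-valued field
    becomes the order-preserving union of its values across the group."""
    merged = dict(pick_base(group).fields)
    for name in multi_valued:
        seen = []
        for rec in group:
            for value in rec.fields.get(name, []):
                if value not in seen:
                    seen.append(value)
        merged[name] = seen
    return merged


a = Record("a1", "lib_a", {"title": "Dune", "topic": ["sf"]})
b = Record("b1", "lib_b", {"title": "Dune", "author": "Herbert", "topic": ["sf", "desert"]})
c = Record("a2", "lib_a", {"title": "Dune"})

group = [a]
if can_join(group, b):
    group.append(b)               # different source, so it may join
same_source_ok = can_join(group, c)  # False: lib_a is already in the group
merged = merge(group)
```

Here `b1` becomes the base record because it has more populated fields, and `a2` is kept out of the group because `a1` comes from the same source.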
 ==== Required Solr Fields in Merged Records ====
  
indexing/deduplication.txt · Last modified: 2023/03/20 16:33 by demiankatz