[VUFIND-1277] Deduplicate URLs in record drivers Created: 06/Mar/18  Updated: 30/Sep/22

Status: Resolved
Project: VuFind®
Components: Record
Affects versions: None
Fix versions: 9.0

Type: Improvement
Priority: Minor
Reporter: Demian Katz
Assignee: Unassigned
Resolution: Unresolved
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original estimate: Not Specified


 Description   
Currently, the SolrDefault / SolrMarc record drivers make no effort to deduplicate URLs; they simply return whatever they encounter. Making this behavior smarter might be beneficial, but it is also somewhat complex because one URL might be accompanied by multiple labels.

A pull request was opened at https://github.com/vufind-org/vufind/pull/434/files to approach this problem, but it needed further work and was eventually closed due to inactivity. That might serve as a starting point if somebody wishes to revisit this issue.

 Comments   
Comment by Demian Katz [ 28/Sep/22 ]
See work in progress: https://github.com/vufind-org/vufind/pull/2561
Comment by Demian Katz [ 30/Sep/22 ]
The aforementioned work in progress has been finalized and merged. It takes a simple approach of removing only full duplicates (where URL + description are identical) and does not address more complex cases (e.g. same URL, different descriptions). However, it does create a specific overrideable method for deduplication, so there’s a possibility for growth and local customization in the future. I’m marking this as resolved for now, but if more complex algorithms are needed in the future, new tickets are welcomed.
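
For illustration only, the "full duplicates" rule described above can be sketched as follows. This is a minimal, language-neutral sketch in Python, not the merged VuFind code (which is PHP); the function name deduplicate_urls and the "url" / "desc" keys are assumptions chosen for the example.

```python
from typing import Dict, List


def deduplicate_urls(urls: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Drop entries whose URL and description both match an earlier entry.

    Hypothetical helper: the "url" and "desc" keys are assumed for this
    sketch and are not taken from the VuFind source.
    """
    seen = set()
    unique = []
    for entry in urls:
        # Only an exact match on BOTH fields counts as a duplicate.
        key = (entry.get("url"), entry.get("desc"))
        if key not in seen:
            seen.add(key)
            unique.append(entry)
    return unique


links = [
    {"url": "https://example.org/a", "desc": "Full text"},
    {"url": "https://example.org/a", "desc": "Full text"},
    {"url": "https://example.org/a", "desc": "Table of contents"},
]
result = deduplicate_urls(links)
```

Here the second entry is dropped as a full duplicate, while the third is kept because its description differs. That is the "same URL, different descriptions" case the simple approach deliberately leaves alone; a local override could, under the same assumptions, key on the URL alone or merge the descriptions instead.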