[VUFIND-1277] Deduplicate URLs in record drivers Created: 06/Mar/18 Updated: 30/Sep/22 |
|
Status: | Resolved |
Project: | VuFind® |
Components: | Record |
Affects versions: | None |
Fix versions: | 9.0 |
Type: | Improvement | Priority: | Minor |
Reporter: | Demian Katz | Assignee: | Unassigned |
Resolution: | Unresolved | Votes: | 0 |
Labels: | None | ||
Remaining Estimate: | Not Specified | ||
Time Spent: | Not Specified | ||
Original estimate: | Not Specified |
Description |
Currently, the SolrDefault / SolrMarc record drivers make no effort to deduplicate URLs; they simply return every URL they encounter. Smarter behavior could be beneficial, but it is somewhat complex because a single URL may be accompanied by multiple labels. A pull request opened at https://github.com/vufind-org/vufind/pull/434/files approached this problem, but it needed further work and was eventually closed due to inactivity. It might serve as a starting point if somebody wishes to revisit this issue. |
Comments |
Comment by Demian Katz [ 28/Sep/22 ] |
See work in progress: https://github.com/vufind-org/vufind/pull/2561 |
Comment by Demian Katz [ 30/Sep/22 ] |
The aforementioned work in progress has been finalized and merged. It takes a simple approach: it removes only full duplicates (where both URL and description are identical) and does not address more complex cases (e.g. same URL, different descriptions). However, it does create a dedicated overrideable method for deduplication, which leaves room for future growth and local customization. I’m marking this as resolved for now, but if more complex algorithms are needed in the future, new tickets are welcome. |
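To illustrate the distinction between removing full duplicates and handling the harder "same URL, different descriptions" case, here is a minimal sketch in Python. It is not VuFind's actual PHP code. It assumes each link is a dict with `url` and `desc` keys, and every class and method name (`RecordDriverSketch`, `deduplicate_urls`, `UrlOnlyDriverSketch`) is hypothetical. The base class removes only entries whose URL and description both match. The subclass shows how a local override of the deduplication hook might collapse entries that share a URL, keeping the first label.

```python
from typing import Dict, List

# Hypothetical link representation: a dict with "url" and "desc" keys.
Link = Dict[str, str]


class RecordDriverSketch:
    """Hypothetical stand-in for a record driver (not VuFind's PHP class)."""

    def __init__(self, raw_links: List[Link]) -> None:
        self.raw_links = raw_links

    def get_urls(self) -> List[Link]:
        # Pass the raw links through the overrideable deduplication hook.
        return self.deduplicate_urls(self.raw_links)

    def deduplicate_urls(self, links: List[Link]) -> List[Link]:
        # Remove only full duplicates (same URL AND same description),
        # keeping the first occurrence and preserving the original order.
        seen = set()
        result: List[Link] = []
        for link in links:
            key = (link.get("url", ""), link.get("desc", ""))
            if key not in seen:
                seen.add(key)
                result.append(link)
        return result


class UrlOnlyDriverSketch(RecordDriverSketch):
    """Hypothetical local override: collapse entries that share a URL."""

    def deduplicate_urls(self, links: List[Link]) -> List[Link]:
        # Keep the first entry seen for each URL; later labels are dropped.
        seen = set()
        result: List[Link] = []
        for link in links:
            url = link.get("url", "")
            if url not in seen:
                seen.add(url)
                result.append(link)
        return result


links = [
    {"url": "https://example.com/a", "desc": "Full text"},
    {"url": "https://example.com/a", "desc": "Full text"},
    {"url": "https://example.com/a", "desc": "Table of contents"},
    {"url": "https://example.com/b", "desc": "Cover image"},
]

print(len(RecordDriverSketch(links).get_urls()))   # 3
print(len(UrlOnlyDriverSketch(links).get_urls()))  # 2
```

The URL-only override silently discards the "Table of contents" label. This trade-off is why the more complex cases were left for future tickets rather than folded into the default behavior.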