[VUFIND-629] Improvement to results retrieved by Wikipedia Created: 13/Jul/12  Updated: 06/Aug/13  Resolved: 16/Nov/12

Status: Resolved
Project: VuFind®
Components: None
Affects versions: None
Fix versions: 2.0RC1

Type: Improvement Priority: Minor
Reporter: Ronan McHugh Assignee: Demian Katz
Resolution: Fixed Votes: 2
Labels: wiki
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original estimate: Not Specified

Attachments: File wikiPatch - 13-07-12.patch    

 Description   
This patch improves the default Wikipedia behaviour in the Author module by using a third-party service (VIAF) to check for a direct link to an author's Wikipedia page. To do this, it searches the authority index for a matching author, retrieves that author's raw LCCN, and feeds it to VIAF. VIAF returns a JSON blob containing the name of the author's Wikipedia page, if available, which is then passed to the default Wikipedia request. It was necessary to add a raw LCCN field to the auth import properties file and a method to IndexRecord.php to grab it. A config setting specifies whether snippets from articles are shown only when an authorised link is available, or whether the default "best guess" behaviour is allowed when no link is available.
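The lookup flow described above can be sketched roughly as follows. This is an illustrative Python sketch, not the patch itself (which is PHP). The URL pattern is taken from the example URL in this ticket; the "WKP" key in the VIAF response and all function names are assumptions made for the example.

```python
import json
from urllib.parse import quote

# URL prefix taken from the example URL quoted in this ticket.
VIAF_BASE = "http://viaf.org/viaf/sourceID/LC|"


def viaf_justlinks_url(raw_lccn):
    """Build the justlinks URL for a raw LCCN (internal spaces are significant)."""
    return VIAF_BASE + quote(raw_lccn, safe="") + "/justlinks.json"


def wikipedia_page_from_viaf(body):
    """Return the Wikipedia page name from a VIAF response body, or None.

    ASSUMPTION: the page name lives under a "WKP" key, either as a string
    or as a list whose first entry is used.
    """
    try:
        details = json.loads(body)
    except ValueError:
        return None  # undecodable response: treat as "no direct link"
    pages = details.get("WKP") if isinstance(details, dict) else None
    if isinstance(pages, str) and pages:
        return pages
    if isinstance(pages, list) and pages:
        return pages[0]
    return None


def choose_wikipedia_title(author_name, viaf_body, direct_links_only):
    """Prefer the VIAF link; fall back to a best guess only if allowed."""
    page = wikipedia_page_from_viaf(viaf_body) if viaf_body is not None else None
    if page:
        return page
    return None if direct_links_only else author_name
```

With `direct_links_only` set, an author without a VIAF link gets no Wikipedia snippet at all; without it, the author name is passed through as the "best guess" title.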

Some issues:

The raw LCCN method should perhaps not be in IndexRecord, but since there wasn't a specific Authority Record class I decided this was the most appropriate place for it.

In one case encountered during testing (http://viaf.org/viaf/sourceID/LC|n%20%2079056824/justlinks.json), VIAF returned a JSON blob which PHP was unable to decode, so no Wikipedia article was shown when there should have been one. If anyone can figure out what is going on here, I would much appreciate it. My best guess is some sort of illegal character in the response for this author, but this isn't particularly plausible. See http://stackoverflow.com/questions/689185/json-decode-returns-null-php for some discussion.

I was unable to get the normal Proxy class working properly for the VIAF request and so was forced to use a rather hacky workaround. I think this had something to do with the encoding of the request, but all the different combinations I tried returned either nothing or a 400 error. If any smarter heads than mine can fix this, it would be much appreciated.

 Comments   
Comment by Ronan McHugh [ 16/Jul/12 ]
Eoghan solved the JSON problem: it is caused by the way one specific provider presents its IDs, which breaks json_decode: "IT\ICCU\CFIV\006780". I've written to VIAF about this; if they don't fix it, I'll insert something into the code to strip out the offending characters.
UPDATE - VIAF have said that they'll fix it, so I'll leave the code as is.
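For anyone who hits the same failure before an upstream fix lands, here is a rough sketch of the kind of fallback mentioned above. It is written in Python purely for illustration (the patch is PHP), and it escapes the stray backslashes rather than stripping them. The "ICCU" and "WKP" key names in the usage example are assumptions; only the ID value comes from this ticket.

```python
import json
import re

# Characters that may legally follow a backslash inside a JSON string.
_VALID_ESCAPES = set('"\\/bfnrtu')


def _escape_stray_backslash(match):
    """Keep legal escape pairs as-is; double the backslash of illegal ones."""
    following = match.group(1)
    if following in _VALID_ESCAPES:
        return match.group(0)
    return "\\\\" + following


def decode_lenient(body):
    """Decode JSON strictly first; on failure, repair backslashes and retry."""
    try:
        return json.loads(body)
    except ValueError:
        # Walk the text one backslash pair at a time so that an already
        # escaped backslash ("\\\\") is consumed as a unit and left alone.
        repaired = re.sub(r"\\(.?)", _escape_stray_backslash, body, flags=re.DOTALL)
        return json.loads(repaired)


# Usage: an ID like the one above contains "\I", "\C" and "\0", none of which
# is a legal JSON escape, so a strict decoder rejects the whole response.
broken = '{"ICCU": ["IT\\ICCU\\CFIV\\006780"], "WKP": ["Some_Page"]}'
links = decode_lenient(broken)
```

The repair only runs after a strict decode has already failed, so well-formed responses are never altered.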
Comment by Demian Katz [ 16/Nov/12 ]
Since this is a fairly specialized feature, I decided not to spend time cleaning up the 1.x patch; instead, I reimplemented this in 2.0 in a slightly different form:

https://github.com/vufind-org/vufind/commit/8aa41842ee0a8a63231c22f0fcbb53e7c9c8dcf7

This is now part of the AuthorInfo recommendation module (a 2.0-specific feature), and VIAF lookups can be turned on or off via the recommendation configuration. I also pull raw LCCNs directly from the raw MARC to avoid the need to index extra fields, and I look in 700|0 as well as 010|a so that the OCLC FAST data can be used to achieve this effect in the absence of locally-indexed LCNAF records.
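The field fallback described here might look roughly like the sketch below. It is Python for illustration only and is not the committed VuFind code: the record is modelled as a plain list of dicts rather than with a real MARC library, the function names are hypothetical, and any normalisation of 700|0 values is deliberately left out.

```python
def first_subfield(fields, tag, code):
    """Return the first non-blank value of subfield `code` in fields tagged `tag`."""
    for field in fields:
        if field["tag"] != tag:
            continue
        for sub_code, value in field["subfields"]:
            if sub_code == code and value.strip():
                return value  # returned unmodified: raw LCCN spacing matters
    return None


def raw_lccn(fields):
    """Prefer 010|a; fall back to 700|0 when no 010|a is present."""
    return first_subfield(fields, "010", "a") or first_subfield(fields, "700", "0")


# Usage with a hypothetical record that carries only a 700 field.
record = [
    {"tag": "245", "subfields": [("a", "Example title")]},
    {"tag": "700", "subfields": [("a", "Example, Author"), ("0", "n  79056824")]},
]
lccn = raw_lccn(record)
```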

Of course, if anyone needs this in 1.x, they're welcome to apply Ronan's original patch. Upgrading from a 1.x installation with Ronan's patch to a 2.0 installation with my official committed feature is simply a matter of removing the unused directLinksOnly setting from the [Content] section of config.ini and turning on the [use_viaf] parameter of the AuthorInfo recommendation setting in searches.ini.
Generated at Tue Apr 23 13:14:05 UTC 2024 using Jira 1001.0.0-SNAPSHOT#100251-rev:1799943389865e673b0a2c8607653e705b66f09c.