[VUFIND-513] IDs containing slashes cause problems Created: 22/Feb/12  Updated: 20/Feb/18  Resolved: 29/Aug/13

Status: Resolved
Project: VuFind®
Components: Record, Search
Affects versions: None
Fix versions: 2.0

Type: Bug Priority: Minor
Reporter: Demian Katz Assignee: Demian Katz
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original estimate: Not Specified


 Description   
Although VUFIND-508 solves problems with most special characters in Solr IDs, slashes still seem to cause issues (in both VuFind 1.x and 2.x). This is the result of standard Apache functionality; it can be worked around by adding "AllowEncodedSlashes on" to the <VirtualHost> section used by VuFind.
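
For illustration, a minimal sketch of where the directive goes. The ServerName and port below are placeholders rather than values from a real installation, and the rest of the virtual host configuration is omitted.

```apache
# Hypothetical virtual host; ServerName and port are placeholders.
<VirtualHost *:80>
    ServerName vufind.example.org

    # Without this, Apache answers any URL whose path contains %2F with a 404
    # before VuFind ever sees the request.
    AllowEncodedSlashes On
</VirtualHost>
```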

 Comments   
Comment by Demian Katz [ 25/Jul/12 ]
This article explains the cause of this problem:

http://www.jampmark.com/web-scripting/5-solutions-to-url-encoded-slashes-problem-in-apache.html

It sounds like Apache's "AllowEncodedSlashes" directive may be the most straightforward solution.
Comment by Demian Katz [ 27/Jul/12 ]
Thanks to Thomas Schwaerzler for actually experimenting with this. He found that after adding the "AllowEncodedSlashes NoDecode" directive to his Apache configuration, VuFind would receive the IDs. However, some modifications to web/services/Record/Record.php were necessary to decode the slash:

$_id = str_replace('%2F', '/', $_REQUEST['id']);

if (!($record = $this->db->getRecord($_id))) { // ...

This is a step in the right direction, but it unfortunately sets up the (very unlikely but possible) situation that VuFind can't tell between an ID containing the string "%2F" and an ID containing a slash.

Perhaps there is a way to further refine the Apache and mod_rewrite configuration to make this work properly... it would be nice to find a simple solution that doesn't set up any weird special cases.
Comment by Thomas Schwaerzler [ 19/Oct/12 ]
Since I had the problem again with another partner, I chose this quite comfortable workaround: replacing the "/" in marc.properties like this:

id = 001, (pattern_map.id_remove_slash), first
# replace one occurrence of "/" with "_"
pattern_map.id_remove_slash.pattern_0 = (.+)/(.+)=>$1_$2

Certainly, for multiple occurrences of "/" the expression would have to be modified. To also cover trailing or leading slashes, maybe something like this would be needed:
# untested
pattern_map.id_remove_slash.pattern_0 = (.+)?/(.+)?=>$1_$2
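
To see what the two expressions above do to different IDs, here is a rough Python sketch. It assumes the pattern has to match the whole ID before the replacement is applied, which may not be exactly how the indexer applies pattern maps; apply_pattern is a made-up helper, and Python writes the replacement as \1_\2 instead of $1_$2.

```python
import re


def apply_pattern(record_id: str, pattern: str, replacement: str) -> str:
    """Rewrite record_id if the whole ID matches pattern, else return it unchanged.

    Assumption: whole-string matching; the real indexer may apply maps differently.
    """
    match = re.fullmatch(pattern, record_id)
    if match is None:
        return record_id
    return match.expand(replacement)


ORIGINAL = r"(.+)/(.+)"      # pattern from the workaround
RELAXED = r"(.+)?/(.+)?"     # proposed variant with optional groups
REPLACEMENT = r"\1_\2"       # Python spelling of $1_$2

for record_id in ("ab/cd", "a/b/c", "/abc", "abc/"):
    print(
        record_id,
        apply_pattern(record_id, ORIGINAL, REPLACEMENT),
        apply_pattern(record_id, RELAXED, REPLACEMENT),
    )
```

Because the first group is greedy, an ID with several slashes only has its last slash replaced by either expression, so covering every slash needs a different rule. Note also that after the rewrite, an ID such as "ab/cd" becomes indistinguishable from an ID that was "ab_cd" to begin with.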

Comment by Demian Katz [ 28/Aug/13 ]
The 27/Jul/12 comment applies to 1.x code, but I can think of two possible solutions in VuFind 2.x:

Solution a) Add an event to the HathiTrust Solr back-end to decode slashes when requesting records.

Solution b) Customize your indexing rules to translate slashes into %2F at index-time.

I'd still love to find a more seamless solution!
Comment by David Maus [ 28/Aug/13 ]
"Solution a) Add an event to the HathiTrust Solr back-end to decode slashes when requesting records."

Hm. I don't think this will work. A search event cannot change the function arguments, only the ParamBag.

"This is a step in the right direction, but it unfortunately sets up the (very unlikely but possible) situation that VuFind can't tell between an ID containing the string "%2F" and an ID containing a slash."

An ID containing the literal sequence "%2F" would be encoded as "%252F" as a query parameter, while an ID containing "/" would be encoded as "%2F". In theory there shouldn't be a problem.

Request parameter => urldecode() => --- => urlencode() => Solr => --- => urlencode() => Create Link
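
A small Python sketch of this round trip, using the standard library's quote/unquote as stand-ins for PHP's urlencode()/urldecode(). The two helper names and the sample IDs are made up for illustration.

```python
from urllib.parse import quote, unquote


def to_url_segment(record_id: str) -> str:
    """Percent-encode an ID for use in a URL, including any '/' and '%'."""
    return quote(record_id, safe="")


def from_url_segment(segment: str) -> str:
    """Decode a percent-encoded segment exactly once."""
    return unquote(segment)


for record_id in ("abc/123", "abc%2F123"):
    segment = to_url_segment(record_id)
    # As long as every encode is paired with exactly one decode,
    # the two IDs stay distinct and round-trip cleanly.
    assert from_url_segment(segment) == record_id
    print(repr(record_id), "->", repr(segment))
```

The two IDs only become confusable if some layer in between decodes one more (or one fewer) time than it should.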
Comment by Demian Katz [ 28/Aug/13 ]
True -- I guess you would have to build a custom subclass of the Connector to perform the translation and then call the parent method.
Comment by Demian Katz [ 28/Aug/13 ]
Regarding the %2F problem I mention: if memory serves, the problem is that "AllowEncodedSlashes NoDecode" means that Apache does not decode slashes but does decode everything else... so an encoded / remains %2F, but %252F also resolves to "%2F". It is possible that I am mistaken about this -- it is a very confusing situation and it has been several months since I actually ran tests myself -- but that is my current recollection.
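
To make that recollection concrete, here is a Python model of the behavior as I remember it. This is not verified against Apache itself; decode_except_slash is a hypothetical stand-in for what the server would hand to VuFind under NoDecode, and legacy_fix mirrors the str_replace from the 27/Jul/12 comment.

```python
import re
from urllib.parse import unquote


def decode_except_slash(raw_path: str) -> str:
    """Model (unverified): decode every escape in raw_path except %2F."""
    # Splitting with a capturing group keeps the %2F markers as their own parts.
    parts = re.split(r"(%2[Ff])", raw_path)
    return "".join(
        part if re.fullmatch(r"%2[Ff]", part) else unquote(part)
        for part in parts
    )


def legacy_fix(value: str) -> str:
    """Mirror of the 1.x workaround: str_replace('%2F', '/', $id)."""
    return value.replace("%2F", "/")


slash_id = decode_except_slash("abc%2F123")      # ID "abc/123", URL-encoded once
literal_id = decode_except_slash("abc%252F123")  # ID "abc%2F123", URL-encoded once

# Under this model both requests look identical by the time VuFind sees them,
# so the workaround turns both into the same ID.
print(slash_id, literal_id, legacy_fix(slash_id), legacy_fix(literal_id))
```

If the model is right, the ambiguity is introduced by the partial decoding step, not by the encoding scheme.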
Comment by Demian Katz [ 28/Aug/13 ]
I've just received a report that VF2 works with "AllowEncodedSlashes on" instead of "AllowEncodedSlashes NoDecode" with no code changes needed. It's possible that the NoDecode solution was only necessary for 1.x because of the more complicated mod_rewrite rules in the old version. I'll have to do some testing of my own, but perhaps we can just put AllowEncodedSlashes on in httpd-vufind.conf (if it's allowed at that level) and close the ticket.

EDIT: I tried to test this out, putting the directive both in httpd-vufind.conf and in my top-level Apache configuration, but I got 404 errors in both cases. I will see if I can find out more information on the successful configuration.
Comment by David Maus [ 28/Aug/13 ]
The AllowEncodedSlashes directive can only be applied in the virtual host or server config context: https://httpd.apache.org/docs/2.2/mod/core.html#AllowEncodedSlashes
Comment by Demian Katz [ 29/Aug/13 ]
Thanks, David and Joe -- I have confirmed that everything works properly when AllowEncodedSlashes on is added within the appropriate <VirtualHost> section of the Apache configuration. This is not something we can resolve from within the VuFind code, but I have added a note to the installation documentation, so I believe that we can now close this issue.
Comment by Demian Katz [ 20/Feb/18 ]
See some related conversation at https://github.com/vufind-org/vufind/pull/1118