VuFind / VUFIND-513

IDs containing slashes cause problems

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.0
    • Component/s: Record, Search
    • Labels:
      None

      Description

      Although VUFIND-508 solves problems with most special characters in Solr IDs, slashes still cause issues (in both VuFind 1.x and 2.x). This is the result of standard Apache behavior: by default, Apache answers any URL whose path contains an encoded slash (%2F) with a 404 error. It can be worked around by adding "AllowEncodedSlashes on" to the <VirtualHost> section used by VuFind.
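The difference is easy to see by percent-encoding an ID the way a record URL needs it. A minimal Python sketch, with `urllib.parse` standing in for whatever encoding VuFind actually performs and `abc/123` as a hypothetical ID:

```python
from urllib.parse import quote, unquote

# Hypothetical record ID containing a slash.
record_id = "abc/123"

# By default quote() treats "/" as safe, so the ID would split the URL path.
print(quote(record_id))   # abc/123

# With safe="" the slash becomes %2F. Apache, with its default
# "AllowEncodedSlashes Off", refuses such paths with a 404.
encoded = quote(record_id, safe="")
print(encoded)            # abc%2F123

# Decoding restores the original ID.
print(unquote(encoded))   # abc/123
```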

        Activity

        Demian Katz added a comment -
        This article explains the cause of this problem:

        http://www.jampmark.com/web-scripting/5-solutions-to-url-encoded-slashes-problem-in-apache.html

        It sounds like Apache's "AllowEncodedSlashes" directive may be the most straightforward solution.
        Demian Katz added a comment -
        Thanks to Thomas Schwaerzler for actually experimenting with this. He found that, after adding the "AllowEncodedSlashes NoDecode" directive to his Apache configuration, VuFind would receive the IDs. However, some modifications to web/services/Record/Record.php were necessary to decode the slash:

        $_id = str_replace('%2F', '/', $_REQUEST['id']);

        if (!($record = $this->db->getRecord($_id))) { // ...

        This is a step in the right direction, but it unfortunately sets up the (very unlikely but possible) situation in which VuFind can't distinguish between an ID containing the string "%2F" and an ID containing a slash.

        Perhaps there is a way to further refine the Apache and mod_rewrite configuration to make this work properly... it would be nice to find a simple solution that doesn't set up any weird special cases.
        Thomas Schwaerzler added a comment (edited) -
        Since I had the problem again with another partner, I chose this fairly convenient workaround: replacing the "/" in marc.properties like this:

        id = 001, (pattern_map.id_remove_slash), first
        # replace one "/" with "_" (the last one, since the first (.+) is greedy)
        pattern_map.id_remove_slash.pattern_0 = (.+)/(.+)=>$1_$2
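A caveat on this rule: because the first `(.+)` is greedy, the pattern matches the last slash in the ID, not the first. The sketch below reproduces the substitution with Python's `re` module, assuming SolrMarc applies the pattern as an ordinary regular-expression replacement over the whole ID (Java and Python treat this pattern the same way; `\1`/`\2` are Python's spelling of `$1`/`$2`). The function name `remove_slash` is hypothetical.

```python
import re

# Same pattern and replacement as the marc.properties rule, in Python syntax.
PATTERN = r"(.+)/(.+)"
REPLACEMENT = r"\1_\2"


def remove_slash(record_id):
    """Hypothetical helper: apply the pattern to the whole ID."""
    return re.sub(PATTERN, REPLACEMENT, record_id)


print(remove_slash("abc/123"))    # abc_123
# The greedy first group swallows earlier slashes, so only the LAST changes.
print(remove_slash("a/b/c"))      # a/b_c

# Replacing every slash needs a global replacement instead.
print("a/b/c".replace("/", "_"))  # a_b_c
```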

        For IDs with multiple occurrences of "/", the expression would certainly have to be modified. To also cover leading or trailing slashes, something like this (untested) might be needed:
        pattern_map.id_remove_slash.pattern_0 = (.+)?/(.+)?=>$1_$2
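A quick check of the untested variant in Python, under the same assumption that SolrMarc applies the pattern as a plain regex replacement. Python replaces an unmatched group with an empty string (since Python 3.5); whether SolrMarc's Java substitution does the same for `$1`/`$2` has not been verified here. The function name is hypothetical.

```python
import re

# Both groups are optional, so a slash at the very start or end can match.
PATTERN = r"(.+)?/(.+)?"


def remove_slash(record_id):
    """Hypothetical helper; unmatched groups become "" in Python 3.5+."""
    return re.sub(PATTERN, r"\1_\2", record_id)


print(remove_slash("/abc"))   # _abc
print(remove_slash("abc/"))   # abc_
print(remove_slash("a/b"))    # a_b
print(remove_slash("a/b/c"))  # a/b_c  (still only the last slash)
```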

        Demian Katz added a comment -
        The 27/Jul/12 comment applies to 1.x code, but I can think of two possible solutions in VuFind 2.x:

        Solution a) Add an event to the HathiTrust Solr back-end to decode slashes when requesting records.

        Solution b) Customize your indexing rules to translate slashes into %2F at index-time.

        I'd still love to find a more seamless solution!
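Solution b) could look roughly like the following Python sketch. Both function names are hypothetical, and real indexing would happen in the indexing rules rather than in Python. The sketch also escapes "%" before "/", which goes beyond the suggestion above: without that step, an ID that already contains the literal text "%2F" would again be indistinguishable from one containing a slash. Because the stored ID contains no "/", its URL form never needs an encoded slash.

```python
from urllib.parse import quote


def encode_id_for_index(raw_id):
    """Hypothetical index-time transform for Solution b).

    "%" is escaped first so that a literal "%2F" in an ID stays
    distinguishable from a real slash.
    """
    return raw_id.replace("%", "%25").replace("/", "%2F")


def decode_indexed_id(stored_id):
    """Inverse of encode_id_for_index(), e.g. to display the original ID."""
    return stored_id.replace("%2F", "/").replace("%25", "%")


for raw_id in ["abc/123", "abc%2F123"]:
    stored = encode_id_for_index(raw_id)
    # The stored ID has no "/", so quote() never produces %2F for it.
    url_segment = quote(stored, safe="")
    assert decode_indexed_id(stored) == raw_id
    print(raw_id, "->", stored, "->", url_segment)
```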
        David Maus added a comment (edited) -
        Solution a) Add an event to the HathiTrust Solr back-end to decode slashes when requesting records.

        Hm. I don't think this will work. A search event cannot change the function arguments, only the ParamBag.

        "This is a step in the right direction, but it unfortunately sets up the (very unlikely but possible) situation that VuFind can't tell between an ID containing the string "%2F" and an ID containing a slash."

        An ID containing the literal sequence "%2F" would be encoded as "%252F" as a query parameter, while an ID containing "/" would be encoded as "%2F". In theory there shouldn't be a problem.

        Request parameter => urldecode() => --- => urlencode() => Solr => --- => urlencode() => Create Link
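The round trip can be checked with a small Python sketch, where `quote()`/`unquote()` stand in for PHP's `urlencode()`/`urldecode()` (both encode "/" as %2F and "%" as %25). The two IDs are hypothetical. The argument holds only if nothing between the browser and VuFind decodes part of the URL on its own.

```python
from urllib.parse import quote, unquote

slash_id = "abc/123"      # hypothetical ID containing a slash
literal_id = "abc%2F123"  # hypothetical ID containing the text "%2F"

# Encode each ID as one URL component; safe="" makes "/" become %2F too.
slash_param = quote(slash_id, safe="")
literal_param = quote(literal_id, safe="")
print(slash_param, literal_param)

# A single decode step recovers each ID exactly, so the two never collide.
assert unquote(slash_param) == slash_id
assert unquote(literal_param) == literal_id
```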
        Demian Katz added a comment -
        True -- I guess you would have to build a custom subclass of the Connector to perform the translation and then call the parent method.
        Demian Katz added a comment (edited) -
        Regarding the %2F problem I mentioned: if memory serves, "AllowEncodedSlashes NoDecode" means that Apache does not decode slashes but does decode everything else, so an encoded "/" remains "%2F", but "%252F" also resolves to "%2F". It is possible that I am mistaken about this -- it is a very confusing situation and it has been several months since I actually ran tests myself -- but that is my current recollection.
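The collision described in this recollection can be modelled in Python. `apache_decode()` is a hypothetical, simplified stand-in for Apache's path decoding that follows the recollection; it is not how Apache is implemented, and real behavior (including mod_rewrite) may differ. Under the NoDecode model, both IDs end up as "abc/123" once the str_replace() workaround is applied, while under the "On" model they stay distinct.

```python
import re
from urllib.parse import quote

# Matches one percent-escape such as %2F or %25.
ESCAPE = re.compile(r"%([0-9A-Fa-f]{2})")


def apache_decode(path, no_decode):
    """Hypothetical, simplified model of Apache's path decoding.

    no_decode=True:  "AllowEncodedSlashes NoDecode" as recalled above;
                     every escape is decoded except %2F.
    no_decode=False: "AllowEncodedSlashes On"; %2F is decoded too.
    Each escape is decoded once, as a single character (enough for ASCII IDs).
    """
    def replace(match):
        if no_decode and match.group(1).upper() == "2F":
            return match.group(0)  # leave the encoded slash untouched
        return chr(int(match.group(1), 16))

    return ESCAPE.sub(replace, path)


for record_id in ["abc/123", "abc%2F123"]:
    segment = quote(record_id, safe="")
    nodecode = apache_decode(segment, no_decode=True)
    workaround = nodecode.replace("%2F", "/")  # the str_replace() fix
    decode_on = apache_decode(segment, no_decode=False)
    print(f"{record_id!r}: NoDecode+workaround={workaround!r}, On={decode_on!r}")
```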
        Demian Katz added a comment (edited) -
        I've just received a report that VF2 works with "AllowEncodedSlashes on" instead of "AllowEncodedSlashes NoDecode", with no code changes needed. It's possible that the NoDecode solution was only necessary for 1.x because of the more complicated mod_rewrite rules in the old version. I'll have to do some testing of my own, but perhaps we can just put "AllowEncodedSlashes on" in httpd-vufind.conf (if it's allowed at that level) and close the ticket.

        EDIT: I tried to test this out, both putting the directive in httpd-vufind.conf and in my top-level Apache configuration, but I got 404 errors in both cases. I will see if I can find out more information on the successful configuration.
        David Maus added a comment -
        The AllowEncodedSlashes directive can only be applied in the virtual host or server config context: https://httpd.apache.org/docs/2.2/mod/core.html#AllowEncodedSlashes
        Demian Katz added a comment -
        Thanks, David and Joe -- I have confirmed that everything works properly when AllowEncodedSlashes on is added within the appropriate <VirtualHost> section of the Apache configuration. This is not something we can resolve from within the VuFind code, but I have added a note to the installation documentation, so I believe that we can now close this issue.

          People

          • Assignee:
            Demian Katz
            Reporter:
            Demian Katz
          • Votes:
            0
            Watchers:
            3
