VuFind / VUFIND-630

Wikipedia module crashes for certain articles on Windows platform

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.3
    • Fix Version/s: 2.0
    • Component/s: Author
    • Labels:
      None
    • Environment:
      Windows XP, XAMPP for Windows

      Description

      When retrieving and parsing the Wikipedia article for James_Joyce, the Wikipedia module crashes on a Windows platform. The offending code is: preg_match_all("/".$open.$recursive_match.$close."/Us", $body2, $new_matches);

      The error returned is not a PHP error but a connection error: "The connection to the server was reset while the page was loading."

      We speculate that this has something to do with the Apache heap size under Windows, but we don't know.

      Attached is a test script which demonstrates this issue. There are two possible inputs for the preg_match_all: $body1 is an article head which is parsed correctly, $body2 is the one which causes the error. Just change the variable in the preg_match_all statement to see the difference.

        Activity

        Ronan McHugh (Inactive) added a comment -
        The bug seems related to the following section of code. Try running the preg_match_all on a new variable, defined as follows:

        $body3 = "[[File:Revolutionary Joyce Better Contrast.jpg|thumb|alt=Half-length portrait of man in his thirties. He looks to his right so that his face is in profile. He has a mustache, a thin beard, and medium-length hair slicked back, and wears a pince-nez and a plain dark greatcoat, looking vaguely like a Russian revolutionary.|[[File:James Joyce signature.svg|200px]]<br /><center>Joyce in [[Zurich]], {{circa|1918}}</center>]]";

        So there is obviously something in this extract that is causing the regex problems. If we can identify this, we might be able to sanitise the input somewhat before parsing.
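        Whichever part of the extract turns out to be responsible, one way to stop the request from dying outright is to cap PCRE's recursion depth and check whether the match failed. The sketch below is only an illustration: safeMatchAll() is a hypothetical helper, not part of VuFind, and the limit of 500 is an arbitrary assumption that would need tuning against the stack actually available. It relies on the standard pcre.recursion_limit setting and on preg_last_error().

```php
<?php
/**
 * Hypothetical helper (not part of VuFind): run preg_match_all() with a
 * reduced PCRE recursion limit so that a deeply recursive pattern fails
 * with an error code instead of exhausting the thread stack.
 *
 * @param string $pattern        Regular expression, including delimiters
 * @param string $subject        Text to search
 * @param int    $recursionLimit Assumed safe depth; 500 is illustrative only
 *
 * @return array|null Matches in PREG_PATTERN_ORDER, or null on any PCRE error
 */
function safeMatchAll($pattern, $subject, $recursionLimit = 500)
{
    // ini_set() returns the previous value on success, false on failure.
    $previous = ini_set('pcre.recursion_limit', (string) $recursionLimit);

    $matches = array();
    $count = preg_match_all($pattern, $subject, $matches);
    $error = preg_last_error();

    // Restore the previous limit so other code is unaffected.
    if ($previous !== false) {
        ini_set('pcre.recursion_limit', $previous);
    }

    if ($count === false || $error !== PREG_NO_ERROR) {
        // The caller can skip the Wikipedia snippet rather than crash.
        return null;
    }
    return $matches;
}
```

        A caller could then treat a null result as "no article text available" and carry on rendering the rest of the page.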
        Demian Katz added a comment -
        Your theory is essentially correct, although the resource being exhausted is the thread stack rather than the heap. This thread describes the reason for the problem in more detail:

        http://stackoverflow.com/questions/7620910/regexp-in-preg-match-function-returning-browser-error

        I am able to reproduce the problem using your test script in Windows under Apache 2.2.15 and PHP 5.3.14. The same problem is not reproducible using command-line PHP 5.3.14 under Windows. This is because CLI PHP has a larger stack by default than PHP running inside Apache.

        I reconfigured Apache to have a larger thread stack size by adding this to httpd.conf:

        <IfModule mpm_winnt_module>
           ThreadStackSize 8388608
        </IfModule>

        This solved the problem -- no more regex failures in test.php.

        So there are three possible resolutions to this ticket:

        1.) Close the ticket with no action; assume that users having problems can manually adjust their ThreadStackSize setting, and hope that the default settings under Windows eventually become more reasonable.

        2.) Add a higher ThreadStackSize setting to httpd-vufind.conf, so that the default VuFind configuration has more memory available under Windows (presumably the <IfModule> statement above will prevent this from breaking anything under other platforms).

        3.) Find a less memory-intensive solution to the problem currently being solved by recursive regex matching.

        Obviously #3 is the nicest solution -- but I don't have time to invest in trying to optimize this area of the code. I don't really like option #2, since it may cause more problems than it solves. Thus I favor #1 for the moment, though I'm not especially happy about it.

        Any other thoughts?
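        As a rough illustration of what option 3 might look like, the sketch below removes [[File:...]] and [[Image:...]] blocks, including any [[...]] links nested inside them, by counting bracket depth in a single pass rather than using a recursive pattern. stripFileLinks() is a hypothetical name and this is not VuFind's actual parsing logic; it assumes that dropping such blocks entirely is acceptable and that file blocks are never nested inside ordinary links.

```php
<?php
/**
 * Hypothetical helper (not part of VuFind): strip [[File:...]] and
 * [[Image:...]] blocks from wiki markup without recursive regex matching.
 * Ordinary links such as [[Zurich]] are kept unchanged.
 *
 * @param string $text Raw wiki markup
 *
 * @return string Markup with file/image blocks removed
 */
function stripFileLinks($text)
{
    $out = '';
    $len = strlen($text);
    $pos = 0;
    while ($pos < $len) {
        // Locate the next top-level "[[".
        $start = strpos($text, '[[', $pos);
        if ($start === false) {
            $out .= substr($text, $pos);
            break;
        }
        // Copy the plain text that precedes it.
        $out .= substr($text, $pos, $start - $pos);

        // Walk forward, tracking nesting depth, until the block closes.
        $depth = 0;
        $end = false;
        $i = $start;
        while ($i < $len - 1) {
            $pair = substr($text, $i, 2);
            if ($pair === '[[') {
                $depth++;
                $i += 2;
            } elseif ($pair === ']]') {
                $depth--;
                $i += 2;
                if ($depth === 0) {
                    $end = $i;
                    break;
                }
            } else {
                $i++;
            }
        }
        if ($end === false) {
            // Unbalanced brackets: leave the remainder untouched.
            $out .= substr($text, $start);
            break;
        }

        // Keep the block only if it is not a file/image block.
        $block = substr($text, $start, $end - $start);
        if (!preg_match('/^\[\[(File|Image):/i', $block)) {
            $out .= $block;
        }
        $pos = $end;
    }
    return $out;
}
```

        Because the scan uses a simple counter instead of recursion, its stack usage does not grow with the nesting depth or length of the article text. Applied to the $body3 string quoted earlier, which is a single [[File:...]] block, it should return an empty string.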
        Eoghan Ó Carragáin added a comment -
        How about #1b: close the ticket, but reference VUFIND-630 in the Windows install documentation and in the [Content] section of config.ini?

        Is it something that could be presented as an option/warning to Windows users as part of the automated 2.x install?

        Cheers
        Demian Katz added a comment -
        That sounds like a reasonable option to me -- I'll try to find time to implement it after the next dev call unless there is any dissent there.

        On a related note, I wonder if Wikipedia should be off by default so that administrators would be more likely to notice the warning when going in to turn it on. Since it's a somewhat controversial feature, I'm not sure if the current "on by default" is the best choice -- though we should get some more input before making such a change.
        Demian Katz added a comment -
        Comments on a different but seemingly related issue have been removed; see VUFIND-739 for more details.
        Demian Katz added a comment - edited
        Note added to config here:

        https://github.com/vufind-org/vufind/commit/11018575124f40e621248d485ad604d18a54c4c3

        I also added a "general notes" appendix to the Windows install pages in the wiki, containing a similar comment.

          People

          • Assignee:
            Demian Katz
            Reporter:
            Ronan McHugh (Inactive)
          • Votes:
            0
            Watchers:
            5