[VUFIND-630] Wikipedia module crashes for certain articles on Windows platform
Created: 20/Jul/12 | Updated: 06/Aug/13 | Resolved: 10/May/13 |
Status: | Resolved |
Project: | VuFind® |
Components: | Author |
Affects versions: | 1.3 |
Fix versions: | 2.0 |
Type: | Bug | Priority: | Minor |
Reporter: | Ronan McHugh | Assignee: | Demian Katz |
Resolution: | Fixed | Votes: | 0 |
Labels: | None |
Remaining Estimate: | Not Specified |
Time Spent: | Not Specified |
Original estimate: | Not Specified |
Environment: | Windows XP, XAMPP for Windows |
Attachments: | test.php |
Description |
When retrieving and parsing the Wikipedia article for James_Joyce, the Wikipedia module crashes on a Windows platform. The culprit is this call: preg_match_all("/".$open.$recursive_match.$close."/Us", $body2, $new_matches); No PHP error is reported; instead the browser shows a connection error: "The connection to the server was reset while the page was loading." We suspect this is related to the Apache heap size under Windows, but have not confirmed it. The attached test script (test.php) demonstrates the issue. It defines two possible inputs for preg_match_all(): $body1, an article head that is parsed correctly, and $body2, which triggers the error. Change the variable passed to preg_match_all() to see the difference. |
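Because the crash happens inside PCRE before preg_match_all() returns, the calling code has nothing to catch. A commonly used mitigation, sketched below, is to lower the pcre.recursion_limit setting so that PCRE gives up before it exhausts the thread stack; preg_match_all() then returns false and preg_last_error() reports why. This assumes the classic (non-JIT) PCRE matcher, as in the PHP 5.3 builds discussed here. The helper name safeMatchAll() and the limit of 10000 are illustrative assumptions, not VuFind code.

```php
<?php
// Illustrative value only: intended to be low enough that PCRE stops
// recursing before it runs out of a small thread stack. It is not applied
// when PCRE's JIT compiler (pcre.jit) is used, which has its own stack limit.
ini_set('pcre.recursion_limit', '10000');

/**
 * Hypothetical wrapper around preg_match_all() that refuses to treat a
 * PCRE failure (return value false) as "no matches".
 */
function safeMatchAll($pattern, $subject)
{
    $matches = array();
    if (preg_match_all($pattern, $subject, $matches) === false) {
        $code = preg_last_error();
        $reason = ($code === PREG_RECURSION_LIMIT_ERROR)
            ? 'recursion limit reached'
            : 'PCRE error code ' . $code;
        throw new RuntimeException('preg_match_all failed: ' . $reason);
    }
    return $matches;
}
```

A caller could catch the RuntimeException and skip the Wikipedia content for that one article, rather than losing the whole page to a dropped connection.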
Comments |
Comment by Ronan McHugh [ 20/Jul/12 ] |
The problem appears to be triggered by a specific fragment of the article markup. Try running preg_match_all() on a new variable defined as follows: $body3 = "[[File:Revolutionary Joyce Better Contrast.jpg|thumb|alt=Half-length portrait of man in his thirties. He looks to his right so that his face is in profile. He has a mustache, a thin beard, and medium-length hair slicked back, and wears a pince-nez and a plain dark greatcoat, looking vaguely like a Russian revolutionary.|[[File:James Joyce signature.svg|200px]]<br /><center>Joyce in [[Zurich]], {{circa|1918}}</center>]]"; This extract clearly contains something that causes the regex problem. If we can identify it, we may be able to sanitise the input before parsing. |
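One hypothetical way to sanitise along these lines is to remove [[File:...]] embeds, which carry the nested links and long captions seen in $body3, before any regex sees the text. The sketch below does this with a single left-to-right scan that counts [[ and ]] pairs instead of using a recursive pattern. The function name stripFileEmbeds() and the decision to drop File embeds entirely are assumptions for illustration; whether this alone would avoid the crash has not been tested.

```php
<?php
/**
 * Hypothetical sanitiser: remove [[File:...]] embeds (and anything nested
 * inside them) using one left-to-right scan with a depth counter.
 * An embed that is never closed swallows the rest of the text.
 */
function stripFileEmbeds($text)
{
    $out = '';
    $len = strlen($text);
    $i = 0;
    while ($i < $len) {
        // Only start skipping at an embed that opens with "[[File:".
        if (substr($text, $i, 7) === '[[File:') {
            $depth = 0;
            // Walk forward until the matching "]]" closes the embed.
            while ($i < $len) {
                $pair = substr($text, $i, 2);
                if ($pair === '[[') {
                    $depth++;
                    $i += 2;
                } elseif ($pair === ']]') {
                    $depth--;
                    $i += 2;
                    if ($depth === 0) {
                        break;
                    }
                } else {
                    $i++;
                }
            }
            continue;
        }
        // Outside an embed: copy the character unchanged.
        $out .= $text[$i];
        $i++;
    }
    return $out;
}
```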
Comment by Demian Katz [ 14/Jan/13 ] |
Your theory is essentially correct, although the limit involved is Apache's thread stack size rather than its heap. This thread explains the problem in more detail: http://stackoverflow.com/questions/7620910/regexp-in-preg-match-function-returning-browser-error
I can reproduce the problem with your test script on Windows under Apache 2.2.15 and PHP 5.3.14. It does not occur with command-line PHP 5.3.14 on Windows, because CLI PHP has a larger stack by default than PHP running inside Apache. I reconfigured Apache with a larger thread stack (8388608 bytes, i.e. 8 MB) by adding this to httpd.conf:
<IfModule mpm_winnt_module>
ThreadStackSize 8388608
</IfModule>
This solved the problem: test.php no longer shows any regex failures. So there are three possible resolutions to this ticket:
1.) Close the ticket with no action; assume that users having problems can manually adjust their ThreadStackSize setting, and hope that the default settings under Windows eventually become more reasonable.
2.) Add a higher ThreadStackSize setting to httpd-vufind.conf, so that the default VuFind configuration has more memory available under Windows (presumably the <IfModule> statement above will prevent this from breaking anything on other platforms).
3.) Find a less memory-intensive solution to the problem currently being solved by recursive regex matching.
Obviously #3 is the nicest solution, but I don't have time to invest in optimizing this area of the code. I don't really like option #2, since it may cause more problems than it solves. Thus I favor #1 for the moment, though I'm not especially happy about it. Any other thoughts? |
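Option #3 would replace the recursive pattern with something whose memory use does not grow with nesting depth. A minimal sketch, assuming the goal is to extract top-level spans bounded by a pair of delimiter strings (in the spirit of the $open and $close variables in the original call, whose actual values are not shown in this ticket): walk the text once and track nesting with an integer counter. The function name findBalanced() is hypothetical, and it does not necessarily reproduce everything the existing regex matches.

```php
<?php
/**
 * Hypothetical iterative replacement for a recursive regex: return every
 * top-level substring that starts with $open and ends at its matching
 * $close. Nesting is tracked with a single integer, not with recursion.
 */
function findBalanced($text, $open, $close)
{
    $found = array();
    $depth = 0;
    $start = 0;
    $pos = 0;
    $len = strlen($text);
    $openLen = strlen($open);
    $closeLen = strlen($close);
    while ($pos < $len) {
        if (substr($text, $pos, $openLen) === $open) {
            if ($depth === 0) {
                $start = $pos; // remember where this top-level span begins
            }
            $depth++;
            $pos += $openLen;
        } elseif ($depth > 0 && substr($text, $pos, $closeLen) === $close) {
            $depth--;
            $pos += $closeLen;
            if ($depth === 0) {
                $found[] = substr($text, $start, $pos - $start);
            }
        } else {
            $pos++;
        }
    }
    // A span still open at the end of the text is unbalanced and ignored;
    // a stray $close outside any span is skipped.
    return $found;
}
```

Because the loop keeps only a counter and a start offset, deeply nested input costs no extra stack, which is the property the recursive regex lacks.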
Comment by Eoghan Ó Carragáin [ 15/Jan/13 ] |
How about #1b: close the ticket but reference Is it something that could be presented as an option/warning to Windows users as part of the automated 2.x install? Cheers |
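The installer warning suggested here could be as simple as an operating-system check. The sketch below is hypothetical (the function name and message wording are not from VuFind); it relies on PHP's built-in PHP_OS constant, which holds the name of the operating system PHP was built for, such as "WINNT" on Windows.

```php
<?php
/**
 * Hypothetical installer helper: return a ThreadStackSize warning when the
 * given OS name looks like Windows, or null otherwise.
 */
function windowsStackWarning($osName = PHP_OS)
{
    // Compare only the first three characters: "Darwin" also contains "win".
    if (strtoupper(substr($osName, 0, 3)) !== 'WIN') {
        return null;
    }
    return "Warning: Apache on Windows may need a larger thread stack for the "
        . "Wikipedia module. Consider adding this to httpd.conf:\n"
        . "<IfModule mpm_winnt_module>\n"
        . "  ThreadStackSize 8388608\n"
        . "</IfModule>\n";
}

// Usage: print the warning only on Windows.
$warning = windowsStackWarning();
if ($warning !== null) {
    echo $warning;
}
```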
Comment by Demian Katz [ 15/Jan/13 ] |
That sounds like a reasonable option to me; I'll try to find time to implement it after the next dev call unless anyone objects there. On a related note, I wonder whether Wikipedia should be off by default, so that administrators would be more likely to notice the warning when they go in to turn it on. Since it's a somewhat controversial feature, I'm not sure the current "on by default" setting is the best choice, though we should get more input before making such a change. |
Comment by Demian Katz [ 23/Jan/13 ] |
Comments on a different but seemingly related issue have been removed; see |
Comment by Demian Katz [ 10/May/13 ] |
A note was added to the config in this commit: https://github.com/vufind-org/vufind/commit/11018575124f40e621248d485ad604d18a54c4c3
I also added a "general notes" appendix containing a similar comment to the Windows installation pages in the wiki. |