preg_replace with /u (unicode/UTF-8) and many hits = very bad performance
| Bug #44336 | preg_replace with /u (unicode/UTF-8) and many hits = very bad performance | | |
|---|---|---|---|
| Submitted: | 2008-03-05 14:10 UTC | Modified: | 2009-01-13 19:23 UTC |
| From: | frode at coretrek dot com | Assigned: | nlopess |
| Status: | Closed | Package: | PCRE related |
| PHP Version: | 5.2.6RC1 | OS: | Debian GNU/Linux 4.0r3 |
| Private report: | No | CVE-ID: | None |
[2008-03-05 14:10 UTC] frode at coretrek dot com
Description:
------------
The "/u" modifier with preg_replace() yields extraordinarily poor performance when there are a lot of matches.
I ran PHP through valgrind/callgrind and kcachegrind, and most of the time appears to be spent in "php__pcre_valid_utf8()". Perhaps this function is (unnecessarily) called over and over again, once for each substring match/replace?
This happens in at least PHP 5.2.5 and PHP 5.2.6RC1.
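To see why per-match revalidation would explain the numbers, here is a rough cost model. It assumes, as the callgrind profile suggests but does not prove, that every match triggers a full UTF-8 validity scan of the subject string. The function name estimateValidatedBytes() is made up for illustration and is not part of PHP or PCRE.

```php
<?php
// Hypothetical cost model, NOT PCRE's actual code: assume each match
// causes one UTF-8 validity scan over the whole subject string.
function estimateValidatedBytes($subjectLength, $matchCount, $revalidatePerMatch)
{
    // A /u pattern needs at least one validity scan, even with no matches.
    $scans = $revalidatePerMatch ? max(1, $matchCount) : 1;
    return $scans * $subjectLength;
}

$length  = strlen(str_repeat('Test ', 50000)); // 250000 bytes
$matches = 50000;                              // one whitespace run per "Test "

// Validate once: 250000 bytes scanned.
echo estimateValidatedBytes($length, $matches, false), "\n";
// Validate on every match: 12500000000 bytes scanned (on 64-bit PHP).
echo estimateValidatedBytes($length, $matches, true), "\n";
```

Under that assumption the work grows with (subject length x number of matches), which would fit a run that takes seconds instead of milliseconds only when both /u and many matches are present.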
Reproduce code:
---------------
<?php
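// No whitespace at all: the pattern never matches.
$goodstring = str_repeat('Test', 50000);
// A space after every word: the pattern matches 50000 times.
$badstring = str_repeat('Test ', 50000);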
$t = microtime(true);
preg_replace('/\\s+/', ' ', $goodstring);
echo "matches: NO unicode: NO ".(microtime(true)-$t)." sec\n";
$t = microtime(true);
preg_replace('/\\s+/u', ' ', $goodstring);
echo "matches: NO unicode: YES ".(microtime(true)-$t)." sec\n";
$t = microtime(true);
preg_replace('/\\s+/', ' ', $badstring);
echo "matches: YES unicode: NO ".(microtime(true)-$t)." sec\n";
$t = microtime(true);
preg_replace('/\\s+/u', ' ', $badstring);
echo "matches: YES unicode: YES ".(microtime(true)-$t)." sec\n";
Expected result:
----------------
Similar performance for all runs (less than 1 second)
Actual result:
--------------
matches: NO unicode: NO 0.020231962204 sec
matches: NO unicode: YES 0.0206818580627 sec
matches: YES unicode: NO 0.0361981391907 sec
matches: YES unicode: YES 27.6555769444 sec
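Since only the "matches: YES unicode: YES" run is slow, one possible workaround for this particular pattern is to drop /u and match ASCII whitespace bytes explicitly. This is only a sketch under stated assumptions: collapseAsciiWhitespace() is a hypothetical helper, not something from this report, and it relies on the fact that bytes below 0x80 never occur inside a multi-byte UTF-8 sequence. It does not collapse non-ASCII whitespace such as U+00A0, and it does not help patterns that really need /u.

```php
<?php
// Hypothetical helper: collapse runs of ASCII whitespace without the /u
// modifier, so no UTF-8 validity check of the subject is requested.
function collapseAsciiWhitespace($subject)
{
    // Explicit byte class (space, tab, LF, CR, FF, VT) instead of \s,
    // so the result does not depend on locale-specific character tables.
    return preg_replace('/[ \t\n\r\f\x0B]+/', ' ', $subject);
}

// Multi-byte characters pass through untouched because none of their
// bytes fall in the ASCII range.
echo collapseAsciiWhitespace("Test \t\n T\xC3\xABst"), "\n";
```

The subject is not validated as UTF-8 here, so input that must be checked should be validated once separately.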
History
[2008-03-05 15:44 UTC] frode at coretrek dot com
[2008-03-05 15:46 UTC] frode at coretrek dot com
[2008-03-05 16:34 UTC] felipe@php.net
[2008-03-08 12:04 UTC] nlopess@php.net