PHP :: Request #63588 :: Duplicate implementation of php_next_utf8_char

Request #63588 Duplicate implementation of php_next_utf8_char
Submitted: 2012-11-23 13:27 UTC
Modified: 2012-11-24 13:04 UTC
From: remi@php.net
Assigned: remi
Status: Closed
Package: json (PECL)
PHP Version: 5.4.9
OS: GNU/Linux (Fedora 18)
Private report: No
CVE-ID: None

 [2012-11-23 13:27 UTC] remi@php.net

Description:
------------
The json extension provides a duplicate implementation of php_next_utf8_char.

This is also related to Bug #63520.

The attached patch uses the php_next_utf8_char function and allows the not-really-free "utf8_to_utf16.*" files to be dropped from the PHP sources.

All the json unit tests succeed with the patch applied.

There also seems to be a small performance gain (~5% on very large json_encode calls).


Patches

php-5.4.9-json.patch (last revision 2012-11-23 14:14 UTC by remi@php.net)
json.patch (last revision 2012-11-23 13:27 UTC by remi)

History

 [2012-11-24 13:04 UTC] pajoye@php.net

-Status: Open
+Status: Assigned
-Assigned To:
+Assigned To: remi

 [2012-11-25 07:20 UTC] remi@php.net

-Status: Assigned
+Status: Closed

 [2012-11-25 12:14 UTC] yoram dot b at zend dot com

The patch looks fine, except that it tests utf16 for a true value in each iteration. That might be the cause of the performance degradation if you compiled without proper optimizations.

 [2012-11-26 09:29 UTC] remi@php.net

@yoram: the previous implementation has the same problem.

    for (;;) {
    ...
            if (w) {
    ...

For now, there is no performance degradation, only a small improvement (~5% according to my quick benchmark).
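To illustrate the point about the per-iteration test, here is a minimal sketch under stated assumptions. The function names widen_checked and widen_hoisted are hypothetical, and the loop body only widens bytes to 16-bit units; it does not decode UTF-8 and is not the code from the patch. It only shows where the NULL test on the output pointer sits in each variant.

```c
#include <stddef.h>

/* Variant 1 (hypothetical): tests `out` on every iteration, the pattern
 * discussed above. With out == NULL it only counts the units. */
static size_t widen_checked(unsigned short *out, const unsigned char *in, size_t len)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        if (out) {
            out[n] = in[i];   /* store only when a buffer was supplied */
        }
        n++;
    }
    return n;
}

/* Variant 2 (hypothetical): the loop-invariant test is hoisted out of the loop. */
static size_t widen_hoisted(unsigned short *out, const unsigned char *in, size_t len)
{
    if (!out) {
        return len;           /* count-only mode: nothing to store */
    }
    for (size_t i = 0; i < len; i++) {
        out[i] = in[i];
    }
    return len;
}
```

Both variants return the same count and fill the buffer identically. An optimizing compiler may perform this hoisting itself, which would be consistent with the cost only showing up in builds without proper optimizations.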

Of course we can optimize this (and probably also the "bits" handling in json_escape_string, to avoid the REVERSE code and make it more readable):

   smart_str_appendl(buf, "\\u", 2);
   smart_str_appendc(buf, digits[(us & 0xf000) >> 12]);
   smart_str_appendc(buf, digits[(us & 0xf00)  >> 8]);
   smart_str_appendc(buf, digits[(us & 0xf0)   >> 4]);
   smart_str_appendc(buf, digits[(us & 0xf)]);
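
For readers without the PHP sources at hand, the same unrolled escape can be sketched on a plain char buffer. This is an illustration only: the helper name append_u_escape and the fixed 7-byte output buffer are assumptions that stand in for smart_str, and the lowercase digits table is assumed to match the one used by the extension.

```c
#include <stddef.h>

/* Hypothetical helper: writes the six-character JSON escape \uXXXX for the
 * UTF-16 code unit `us` into `out` and NUL-terminates it.
 * `out` must have room for at least 7 bytes. */
static void append_u_escape(char *out, unsigned short us)
{
    static const char digits[] = "0123456789abcdef";

    out[0] = '\\';
    out[1] = 'u';
    out[2] = digits[(us & 0xf000) >> 12];  /* highest nibble first */
    out[3] = digits[(us & 0xf00)  >> 8];
    out[4] = digits[(us & 0xf0)   >> 4];
    out[5] = digits[(us & 0xf)];           /* lowest nibble last */
    out[6] = '\0';
}
```

Because the nibbles are emitted from most significant to least significant, no reversal step is needed afterwards. For example, passing 0x20ac (the euro sign) fills the buffer with the six characters \u20ac.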