json_encode silently cuts non-UTF8 strings
| Bug #43941 | json_encode silently cuts non-UTF8 strings |
|---|---|
| Submitted: | 2008-01-25 22:19 UTC |
| Modified: | 2008-01-30 16:12 UTC |
| From: | stas at zend dot com |
| Assigned: | |
| Status: | Closed |
| Package: | JSON related |
| PHP Version: | 5.3CVS-2008-01-25 (CVS) |
| OS: | * |
| Private report: | No |
| CVE-ID: | None |
[2008-01-25 22:19 UTC] stas at zend dot com
Description:
------------
Right now, if json_encode encounters invalid UTF-8 data, it cuts the string off at the invalid sequence: no error is returned and no message is produced.
I don't think silently truncating the data is a good idea. In fact, I think it is a bug caused by this code in ext/json/utf8_to_utf16.c:
    if (c < 0) {
        return UTF8_END ? the_index : UTF8_ERROR;
    }
which inherited this bug from code published on json.org. It should be:
    if (c < 0) {
        return (c == UTF8_END) ? the_index : UTF8_ERROR;
    }
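To make the difference between the two conditions concrete, here is a minimal standalone C sketch. The values -1 and -2 for UTF8_END and UTF8_ERROR are assumptions made for this example (any nonzero UTF8_END behaves the same way), and classify_buggy / classify_fixed are hypothetical helper names, not functions from ext/json.

```c
/* Assumed values for illustration only; what matters is that UTF8_END is nonzero. */
#define UTF8_END   (-1)
#define UTF8_ERROR (-2)

/* Original condition: it tests the constant UTF8_END, not the value in c.
 * A nonzero constant is always true, so the error branch is unreachable. */
static int classify_buggy(int c, int the_index)
{
    (void)c; /* c is never inspected, which is exactly the bug */
    return UTF8_END ? the_index : UTF8_ERROR;
}

/* Proposed condition: compare the decoder's return value against UTF8_END,
 * so a decoding error is reported instead of the partial length. */
static int classify_fixed(int c, int the_index)
{
    return (c == UTF8_END) ? the_index : UTF8_ERROR;
}
```

With two units already converted, classify_buggy(UTF8_ERROR, 2) returns 2, as if the input had simply ended, while classify_fixed(UTF8_ERROR, 2) returns UTF8_ERROR.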
Reproduce code:
---------------
var_dump(json_encode("ab\xE0"));
var_dump(json_encode("ab\xE0\""));
Expected result:
----------------
Some error message
Actual result:
--------------
Just:
""ab""
""ab""
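The truncation to "ab" can be reproduced outside PHP with a simplified conversion loop. This is only a sketch under stated assumptions: next_code is a hypothetical stand-in decoder that accepts ASCII and treats every byte of 0x80 or above as malformed (it is not the json.org decoder), convert is a hypothetical name, the `fixed` flag exists purely to compare both conditions side by side, and the -1 / -2 constant values are assumed.

```c
#include <stddef.h>

/* Assumed values for illustration only. */
#define UTF8_END   (-1)
#define UTF8_ERROR (-2)

/* Hypothetical stand-in decoder: returns the next ASCII byte, UTF8_END at the
 * end of input, or UTF8_ERROR for any byte >= 0x80 (a deliberate simplification). */
static int next_code(const unsigned char *s, size_t len, size_t *pos)
{
    if (*pos >= len) {
        return UTF8_END;
    }
    unsigned char b = s[(*pos)++];
    return (b < 0x80) ? (int)b : UTF8_ERROR;
}

/* Copies decoded units into out (which must hold at least len entries).
 * Returns the number of units written, or UTF8_ERROR when `fixed` is
 * nonzero and the input is malformed. */
static int convert(unsigned short *out, const unsigned char *s, size_t len, int fixed)
{
    size_t pos = 0;
    int the_index = 0;

    for (;;) {
        int c = next_code(s, len, &pos);
        if (c < 0) {
            if (fixed) {
                return (c == UTF8_END) ? the_index : UTF8_ERROR;
            }
            /* Buggy variant: always reports the partial length. */
            return UTF8_END ? the_index : UTF8_ERROR;
        }
        out[the_index++] = (unsigned short)c;
    }
}
```

For the three input bytes of "ab\xE0", the buggy variant returns 2, so a caller sees a valid two-unit string "ab"; the fixed variant returns UTF8_ERROR, which gives the caller a chance to raise an error instead.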
History
[2008-01-30 03:22 UTC] stas@php.net
[2008-01-30 16:12 UTC] rasmus@php.net