htmlentities() uses obsolete mapping table for character entity references
| Request #46478 | htmlentities() uses obsolete mapping table for character entity references | ||||
|---|---|---|---|---|---|
| Submitted: | 2008-11-04 12:56 UTC | Modified: | 2009-12-22 05:50 UTC | ||
| From: | for-bugs at hnw dot jp | Assigned: | moriyoshi (profile) | ||
| Status: | Closed | Package: | Feature/Change Request | ||
| PHP Version: | 5.2.6 | OS: | * | ||
| Private report: | No | CVE-ID: | None | ||
[2008-11-04 12:56 UTC] for-bugs at hnw dot jp
Description:
------------
ext/standard/html.c contains an incorrect mapping table, which htmlentities() uses. The table in html.c is based on http://www.unicode.org/Public/MAPPINGS/OBSOLETE/UNI2SGML.TXT, but that mapping is obsolete and not compatible with HTML 4.0 or XHTML 1.0.

For example, U+2235 (encoded as "\xe2\x88\xb5" in UTF-8) is not defined in http://www.w3.org/TR/xhtml1/DTD/xhtml-symbol.ent, yet htmlentities() returns "&becaus;" for it. U+226A (≪) and U+226B (≫) are similar cases.

Reproduce code:
---------------
<?php
var_dump(htmlentities("\xe2\x88\xb5", ENT_QUOTES, "utf-8"));

Expected result:
----------------
string(3) "∵"

Actual result:
--------------
string(8) "&becaus;"
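
To see which of the characters mentioned above have a named entity in the table PHP actually uses, the translation table can be inspected directly. The following is a minimal sketch, not part of the original report: it assumes PHP 5.4 or later (for the ENT_HTML401 flag and the encoding argument of get_html_translation_table()), and named_entity_for() is a hypothetical helper introduced only for illustration. U+2234 is included as a control, since there4 is defined in the HTML 4 symbol entity set.

```php
<?php
// Hypothetical helper: look up the named entity PHP would emit for a
// single UTF-8 character, or null if the HTML 4.01 table has none.
// Assumes PHP 5.4+ (ENT_HTML401 and the $encoding parameter).
function named_entity_for($char)
{
    $table = get_html_translation_table(
        HTML_ENTITIES,
        ENT_QUOTES | ENT_HTML401,
        'UTF-8'
    );
    return isset($table[$char]) ? $table[$char] : null;
}

$samples = array(
    'U+2234' => "\xe2\x88\xb4", // THEREFORE (control: there4 exists in HTML 4)
    'U+2235' => "\xe2\x88\xb5", // BECAUSE (the character from this report)
    'U+226A' => "\xe2\x89\xaa", // MUCH LESS-THAN
    'U+226B' => "\xe2\x89\xab", // MUCH GREATER-THAN
);

foreach ($samples as $label => $char) {
    $entity = named_entity_for($char);
    echo $label, ': ', $entity === null ? '(no named entity)' : $entity, "\n";
}
```

On a build whose table follows HTML 4.01, only the U+2234 control should report a named entity. A build still using the UNI2SGML-derived table would be expected to report names for the other three as well.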
[2008-11-09 16:39 UTC] moriyoshi@php.net
[2009-12-22 05:50 UTC] moriyoshi@php.net