unicode causes xml_parser to misbehave
| Bug #38427 | unicode causes xml_parser to misbehave | | |
|---|---|---|---|
| Submitted: | 2006-08-11 11:57 UTC | Modified: | 2006-08-15 22:49 UTC |
| From: | codje2 at lboro dot ac dot uk | Assigned: | |
| Status: | Closed | Package: | XML related |
| PHP Version: | 5.1.4 | OS: | OS X |
| Private report: | No | CVE-ID: | None |
[2006-08-11 11:57 UTC] codje2 at lboro dot ac dot uk
Description:
------------
When xml_parse_into_struct() encounters a non-ASCII Unicode
character (e.g. an umlaut) inside a cdata array element, it splits
the text across two elements, the second of which starts with that
character.
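One way to make code independent of how the parser chunks text is to skip xml_parse_into_struct() and collect character data with a handler that appends each chunk to a buffer. PHP's XML extension is built on Expat (or a libxml2-based compatibility layer), and Expat's documentation notes that a single contiguous block of text may be reported through several character-data callbacks, so this sketch does not assume one callback per text run. It is a minimal illustration written for PHP 7.1 or later, not the fix applied for this report; `collect_text_runs()` is a hypothetical helper name.

```php
<?php
// Sketch (assumption: PHP 7.1+): gather the text between tags with a
// character-data handler instead of xml_parse_into_struct(). Chunks are
// appended to a buffer that is flushed only when an element starts or
// ends, so it does not matter how many callbacks the parser makes.

/**
 * Hypothetical helper: returns each run of character data found between
 * two consecutive tags, joined from however many chunks the parser delivered.
 */
function collect_text_runs(string $xml): array
{
    $runs = [];
    $buffer = '';

    // Move the buffered text (if any) into $runs and reset the buffer.
    $flush = function () use (&$runs, &$buffer): void {
        if ($buffer !== '') {
            $runs[] = $buffer;
            $buffer = '';
        }
    };

    $parser = xml_parser_create('UTF-8');

    // Any start or end tag ends the current text run.
    xml_set_element_handler(
        $parser,
        function ($p, string $name, array $attrs) use ($flush): void { $flush(); },
        function ($p, string $name) use ($flush): void { $flush(); }
    );

    // May be called several times for one run of text: just accumulate.
    xml_set_character_data_handler(
        $parser,
        function ($p, string $data) use (&$buffer): void { $buffer .= $data; }
    );

    if (!xml_parse($parser, $xml, true)) {
        $error = xml_error_string(xml_get_error_code($parser));
        throw new RuntimeException("XML parse error: $error");
    }

    $flush(); // keep any trailing text after the last tag
    return $runs;
}
```

With this approach the text after `</b>` in the report's input comes back as one string, whether the parser reported it in one chunk or two.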
Reproduce code:
---------------
<?php
header('Content-type: text/html; charset=utf-8');
$input=<<<END
<p>s?me p tag <b>s?me text</b> a bit more ?f the p t?g</p>
END;
print "<html><body><pre><xmp>";
//Either save this example as UTF-8, or enable the following line:
//$input=utf8_encode($input);
$parser = xml_parser_create('utf-8');
xml_parse_into_struct($parser, $input, $vals, $index);
print_r($vals);
print "</xmp></pre></body></html>";
?>
Expected result:
----------------
Array
(
[0] => Array
(
[tag] => P
[type] => open
[level] => 1
[value] => s?me p tag
)
[1] => Array
(
[tag] => B
[type] => complete
[level] => 2
[value] => s?me text
)
[2] => Array
(
[tag] => P
[value] => a bit more ?f the p t?g
[type] => cdata
[level] => 1
)
[3] => Array
(
[tag] => P
[type] => close
[level] => 1
)
)
Actual result:
--------------
Array
(
[0] => Array
(
[tag] => P
[type] => open
[level] => 1
[value] => s?me p tag
)
[1] => Array
(
[tag] => B
[type] => complete
[level] => 2
[value] => s?me text
)
[2] => Array
(
[tag] => P
[value] => a bit more
[type] => cdata
[level] => 1
)
[3] => Array
(
[tag] => P
[value] => ?f the p t?g
[type] => cdata
[level] => 1
)
[4] => Array
(
[tag] => P
[type] => close
[level] => 1
)
)
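If the array layout of xml_parse_into_struct() is still wanted, the split entries can be re-joined after parsing. The following is a minimal userland sketch, assuming that consecutive `cdata` entries at the same level always belong to one text run; `merge_cdata_entries()` is a hypothetical name, not part of PHP.

```php
<?php
// Sketch of a userland workaround (an assumption, not an official fix):
// collapse consecutive "cdata" entries at the same level, as produced by
// xml_parse_into_struct(), into a single entry.

/**
 * Hypothetical helper: returns a re-indexed copy of $vals in which each
 * run of adjacent same-level cdata entries is merged into one.
 */
function merge_cdata_entries(array $vals): array
{
    $merged = [];
    foreach ($vals as $entry) {
        $last = count($merged) - 1;
        if (
            $entry['type'] === 'cdata'
            && $last >= 0
            && $merged[$last]['type'] === 'cdata'
            && $merged[$last]['level'] === $entry['level']
        ) {
            // Same text run split by the parser: append to the previous entry.
            $merged[$last]['value'] .= $entry['value'] ?? '';
            continue;
        }
        $merged[] = $entry;
    }
    return $merged;
}
```

Applied to the actual result above, entries [2] and [3] collapse into a single cdata entry whose value is ` a bit more ?f the p t?g`. Because entries are re-numbered, the `$index` array returned alongside `$vals` still refers to the original positions and would have to be rebuilt from the merged array.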
[2006-08-15 22:49 UTC] rrichards@php.net