CSV fields incorrectly split if escape char followed by UTF chars
| Bug #72330 | CSV fields incorrectly split if escape char followed by UTF chars | | |
|---|---|---|---|
| Submitted: | 2016-06-03 16:00 UTC | Modified: | 2018-04-10 17:01 UTC |
| From: | cronfy at gmail dot com | Assigned: | cmb |
| Status: | Closed | Package: | Strings related |
| PHP Version: | Irrelevant | OS: | Linux Mint 17.1 Rebecca |
| Private report: | No | CVE-ID: | None |
[2016-06-03 16:00 UTC] cronfy at gmail dot com
Description:
------------
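When the escape character passed to str_getcsv() is followed by certain UTF-8 characters, the string is parsed incorrectly: the closing enclosure is not recognized and the following field is merged into the current one.
I tested this on PHP 5.4, 5.5, 5.6 and 7.0; the behavior is the same.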
Test script:
---------------
<?php
$utf_1 = chr(0xD1) . chr(0x81); // U+0441 (Cyrillic small letter es)
$utf_2 = chr(0xD8) . chr(0x80); // U+0600
$string = '"first #' . $utf_1 . $utf_2 . '";"second one"';
// Arguments: delimiter ';', enclosure '"', escape character '#'
$d = str_getcsv($string, ';', '"', "#");
print_r($d);
Expected result:
----------------
Array
(
[0] => first #с
[1] => second one
)
Actual result:
--------------
Array
(
[0] => first #с";second one"
)
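For comparison, here is a minimal baseline sketch that is not part of the original report. It assumes the same delimiter, enclosure and escape arguments as the test script, but puts a plain ASCII letter after the escape character. The sample string is made up for illustration. With ASCII input the fields split as expected, which suggests the merge above is tied to the multibyte sequence rather than to the escape character alone.

```php
<?php
// Baseline sketch: same field layout as the test script, but the character
// after the escape character '#' is plain ASCII (illustrative sample data).
$string = '"first #a";"second one"';

// Delimiter ';', enclosure '"', escape character '#', as in the report.
$fields = str_getcsv($string, ';', '"', '#');

// The escape character is kept in the parsed value; it is not stripped.
print_r($fields);
// Array
// (
//     [0] => first #a
//     [1] => second one
// )
```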
History
[2016-06-14 13:21 UTC] cmb@php.net
-Status: Open +Status: Feedback
[2016-06-15 00:08 UTC] cmb@php.net
-Assigned To: +Assigned To: cmb
[2016-06-26 04:22 UTC] php-bugs at lists dot php dot net
[2016-07-17 11:07 UTC] cronfy at gmail dot com
-Status: No Feedback +Status: Closed
[2016-07-17 11:07 UTC] cronfy at gmail dot com
[2016-07-17 11:08 UTC] cronfy at gmail dot com
[2016-07-17 11:45 UTC] cmb@php.net
-Status: Closed +Status: Re-Opened
[2016-07-21 16:13 UTC] cmb@php.net
-Summary: str_getcsv() splits fields incorrectly if escape char flollowed by UTF chars +Summary: CSV fields incorrectly split if escape char followed by UTF chars
[2016-07-21 16:13 UTC] cmb@php.net
[2016-07-21 17:13 UTC] cmb@php.net
-Status: Re-Opened +Status: Closed
[2018-04-09 18:21 UTC] ganlvtech at qq dot com
str_getcsv does not work correctly with quoted multibyte characters
PHP version: 7.2.2
Operating system: Windows 10 zh-CN
Description:
------------
str_getcsv() does not work correctly with quoted multibyte characters. When the multibyte values are simply separated by commas, everything seems fine. If a value contains a quotation mark ("), I need to escape it by doubling it ("") and enclose the value in a pair of quotation marks. When I then decode the CSV string with str_getcsv(), this value is combined with the next value (I lose a column and get two values together in one column). There is more than one type of wrong result, but I think every one of them is caused by the escaped quotation mark.
Bug #72330: CSV fields incorrectly split if escape char followed by UTF chars
Test script:
---------------
<?php
// Test 1
$data = [
    "\xE4\xBD\xA0\xE5\xA5\xBD", // 你好
    "\xE4\xB8\x96\xE7\x95\x8C", // 世界
];
$encoded = implode(',', array_map(function ($value) {
    return '"' . str_replace('"', '""', $value) . '"';
}, $data));
var_dump(str_getcsv($encoded) === $data);
// Test 2
$data = [
    "\"\xE5\x95\x8A", // "啊
];
$encoded = str_putcsv($data);
var_dump(str_getcsv($encoded) === $data);
/** @link https://bugs.php.net/bug.php?id=64183 */
function str_putcsv($fields, $delimiter = ',', $enclosure = '"', $escape_char = '\\')
{
    $stream = fopen('php://memory', 'w+');
    fputcsv($stream, $fields, $delimiter, $enclosure, $escape_char);
    rewind($stream);
    return stream_get_contents($stream);
}
Expected result:
----------------
bool(true)
bool(true)
Actual result:
--------------
bool(false)
bool(false)
[2018-04-09 18:21 UTC] ganlvtech at qq dot com
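As a point of reference, the doubled-quote encoding used in Test 1 can be checked against ASCII-only data. The sketch below is an illustration under stated assumptions, not part of the original comment: `quote_field()` is a hypothetical helper name and the sample values are invented. The round trip succeeds for ASCII, so the encoding scheme itself is sound and the failure reported above appears only once multibyte bytes are involved.

```php
<?php
// Hypothetical helper (not from the report): enclose one field in quotes and
// double any embedded quotation marks, the same scheme Test 1 applies.
function quote_field(string $value): string
{
    return '"' . str_replace('"', '""', $value) . '"';
}

$data = ['say "hi"', 'world'];   // ASCII-only sample values (illustrative)
$encoded = implode(',', array_map('quote_field', $data));

// Delimiter, enclosure and escape are the usual defaults, passed explicitly.
$decoded = str_getcsv($encoded, ',', '"', '\\');

var_dump($encoded);              // string(20) ""say ""hi""","world""
var_dump($decoded === $data);    // bool(true)
```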
[2018-04-09 21:30 UTC] cmb@php.net
-Status: Closed +Status: Re-Opened
[2018-04-10 10:56 UTC] cmb@php.net
-Status: Re-Opened +Status: Feedback
[2018-04-10 11:44 UTC] ganlvtech at qq dot com
[2018-04-10 16:48 UTC] cmb@php.net
-Status: Feedback +Status: Re-Opened
[2018-04-10 16:48 UTC] cmb@php.net
[2018-04-10 17:01 UTC] cmb@php.net
-Status: Re-Opened +Status: Closed
[2018-04-10 17:20 UTC] ganlvtech at qq dot com