bpo-24214: Fixed the UTF-8 incremental decoder. (GH-12603) · python/cpython@7a465cb

3 files changed

Original file line number	Diff line number	Diff line change
`@@ -406,6 +406,15 @@ def test_lone_surrogates(self):`
`406`	`406`	`        self.assertEqual(test_sequence.decode(self.encoding, "backslashreplace"),`
`407`	`407`	`                         before + backslashreplace + after)`
`408`	`408`
	`409`	`+    def test_incremental_surrogatepass(self):`
	`410`	`+        # Test incremental decoder for surrogatepass handler:`
	`411`	`+        # see issue #24214`
	`412`	`+        data = '\uD901'.encode(self.encoding, 'surrogatepass')`
	`413`	`+        for i in range(1, len(data)):`
	`414`	`+            dec = codecs.getincrementaldecoder(self.encoding)('surrogatepass')`
	`415`	`+            self.assertEqual(dec.decode(data[:i]), '')`
	`416`	`+            self.assertEqual(dec.decode(data[i:], True), '\uD901')`
	`417`	`+`
`409`	`418`
`410`	`419`	`class UTF32Test(ReadTest, unittest.TestCase):`
`411`	`420`	`    encoding = "utf-32"`

Original file line number	Diff line number	Diff line change
`@@ -0,0 +1,2 @@`
	`1`	`+Fixed support of the surrogatepass error handler in the UTF-8 incremental`
	`2`	`+decoder.`
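
The behavior described by this NEWS entry can be reproduced outside the test suite. The following is a minimal sketch that mirrors the added test; the helper name `decode_in_chunks` is hypothetical and not part of the commit. It assumes an interpreter that includes this fix. On an interpreter without it, the first `decode()` call may raise `UnicodeDecodeError` for some split points instead of buffering the partial sequence.

```python
import codecs


def decode_in_chunks(data, split, errors="surrogatepass"):
    """Decode `data` as UTF-8 in two chunks, split at byte offset `split`.

    Hypothetical helper for illustration only.
    """
    dec = codecs.getincrementaldecoder("utf-8")(errors)
    # First chunk: an incomplete sequence should be buffered, yielding ''.
    head = dec.decode(data[:split])
    # Second chunk: final=True tells the decoder that no more input follows.
    tail = dec.decode(data[split:], True)
    return head, tail


# A lone surrogate can only be encoded with the surrogatepass handler.
data = "\ud901".encode("utf-8", "surrogatepass")

# Try every split point that leaves the sequence incomplete after chunk one.
results = [decode_in_chunks(data, i) for i in range(1, len(data))]
for head, tail in results:
    print(repr(head), repr(tail))
```

With the fix in place, each split should yield an empty string for the first chunk and the lone surrogate for the second, which is what the added test asserts.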

Original file line number	Diff line number	Diff line change
`@@ -4883,6 +4883,9 @@ PyUnicode_DecodeUTF8Stateful(const char *s,`
`4883`	`4883`	`        case 2:`
`4884`	`4884`	`        case 3:`
`4885`	`4885`	`        case 4:`
	`4886`	`+            if (s == end || consumed) {`
	`4887`	`+                goto End;`
	`4888`	`+            }`
`4886`	`4889`	`            errmsg = "invalid continuation byte";`
`4887`	`4890`	`            startinpos = s - starts;`
`4888`	`4891`	`            endinpos = startinpos + ch - 1;`