Issue 23231: Fix codecs.iterencode/decode() by allowing data parameter to be omitted

Issue23231

Created on 2015-01-13 12:48 by martin.panter, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Files
File name	Uploaded	Description	Edit
final-no-object.patch	martin.panter, 2015-01-13 12:48		review
final-no-object.ignore-space.diff	martin.panter, 2015-01-13 12:50	diff --ignore-all-space	review
iter-unsupported.patch	martin.panter, 2016-08-20 09:22		review

Messages (8)
msg233932 - (view)	Author: Martin Panter (martin.panter) *	Date: 2015-01-13 12:48
As mentioned in Issue 20132, iterencode() and iterdecode() only work on text-to-byte codecs, because they assume particular data types when finalizing the incremental codecs. This patch changes the signature of the IncrementalEncoder and IncrementalDecoder methods from

    IncrementalEncoder.encode(object[, final])
    IncrementalDecoder.decode(object[, final])

to

    IncrementalEncoder.encode([object,] [final])
    IncrementalDecoder.decode([object,] [final])

so that iterencode() and iterdecode(), and perhaps in the future StreamWriter and StreamReader, can operate the incremental codec without knowing what kind of data should be processed.
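The finalization problem described above can be reproduced with a short script. This is a minimal sketch, not part of the patch: `try_iterencode` is a hypothetical helper name, and it assumes the standard `zlib_codec` bytes-to-bytes codec is available.

```python
import codecs

def try_iterencode(chunks, encoding):
    """Run codecs.iterencode() and return (outputs, error) instead of raising."""
    outputs = []
    try:
        for piece in codecs.iterencode(chunks, encoding):
            outputs.append(piece)
    except TypeError as exc:
        # Raised when the incremental encoder is finalized with an empty
        # str although the codec expects bytes-like input.
        return outputs, exc
    return outputs, None

# Text-to-bytes codec: finalizing with an empty str is harmless.
text_out, text_err = try_iterencode(["a", "b"], "utf-8")

# Bytes-to-bytes codec: the chunks are accepted, but finalization fails.
bytes_out, bytes_err = try_iterencode([b"a", b"b"], "zlib_codec")

print(text_out, text_err)
print(type(bytes_err).__name__)
```

The second call fails only at the last step, after every input chunk has been consumed, which is why the failure depends on the data type assumed for finalization rather than on the chunks themselves.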
msg233933 - (view)	Author: Martin Panter (martin.panter) *	Date: 2015-01-13 12:50
Original patch has lots of whitespace changes, probably due to generated codec code not being regenerated for a long time. This diff ignores the space changes, so should be easier to review.
msg234206 - (view)	Author: Martin Panter (martin.panter) *	Date: 2015-01-18 00:19
Another idea that doesn’t involve changing the incremental codec APIs is roughly described in <https://bugs.python.org/issue7475#msg145986>: add format parameters to iterencode() and iterdecode(), which would allow them to determine the right data type to finalize the codecs with.
msg256746 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2015-12-19 23:50
The patch changes the public interface, which breaks compatibility with third-party codecs that implement it. We have to find another solution to the iterencode/iterdecode problem. For example, we can buffer the iterated values and encode with a one-step delay:

    prev = sentinel = object()
    for input in iterator:
        if prev is not sentinel:
            output = encoder.encode(prev)
            if output:
                yield output
        prev = input
    if prev is not sentinel:
        output = encoder.encode(prev, True)
        if output:
            yield output

Or remember the previous value and use it to calculate the empty value at the end (this works only if the input type supports slicing):

    prev = sentinel = object()
    for input in iterator:
        output = encoder.encode(input)
        if output:
            yield output
        prev = input
    if prev is not sentinel:
        output = encoder.encode(prev[:0], True)
        if output:
            yield output
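For experimentation, the two loops above can be wrapped into standalone generators. This is only a sketch under stated assumptions: the function names `iterencode_delayed` and `iterencode_empty_slice` are hypothetical, and `zlib_codec` is used merely as an example of a bytes-to-bytes codec.

```python
import codecs
import zlib

def iterencode_delayed(iterator, encoding, errors="strict"):
    """One-step delay: the last real chunk is encoded with final=True."""
    encoder = codecs.getincrementalencoder(encoding)(errors)
    prev = sentinel = object()
    for chunk in iterator:
        if prev is not sentinel:
            output = encoder.encode(prev)
            if output:
                yield output
        prev = chunk
    if prev is not sentinel:
        output = encoder.encode(prev, True)
        if output:
            yield output

def iterencode_empty_slice(iterator, encoding, errors="strict"):
    """Finalize with an empty value of the same type as the last chunk."""
    encoder = codecs.getincrementalencoder(encoding)(errors)
    prev = sentinel = object()
    for chunk in iterator:
        output = encoder.encode(chunk)
        if output:
            yield output
        prev = chunk
    if prev is not sentinel:
        # prev[:0] is b"" for bytes input and "" for str input.
        output = encoder.encode(prev[:0], True)
        if output:
            yield output

chunks = [b"hello ", b"world"]
delayed = b"".join(iterencode_delayed(chunks, "zlib_codec"))
sliced = b"".join(iterencode_empty_slice(chunks, "zlib_codec"))
print(zlib.decompress(delayed), zlib.decompress(sliced))
```

Neither generator ever passes a hard-coded "" or b"" to the encoder, so both avoid guessing the data type. Note that both skip the encoder entirely when the iterator is empty.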
msg273101 - (view)	Author: Martin Panter (martin.panter) *	Date: 2016-08-19 09:06
Serhiy’s two proposals won’t work for codecs that include non-zero output for zero input:

>>> tuple(iterencode((), "utf-8-sig"))
(b'\xef\xbb\xbf',)
>>> encode(b"", "uu")
b'begin 666 <data>\n \nend\n'
>>> encode(b"", "zlib")
b'x\x9c\x03\x00\x00\x00\x00\x01'

However, I agree that changing the incremental codec APIs is not ideal. Since nobody seems to care that much, it might be simpler to document that:

* iterencode() only works where text str objects can be encoded, so base64-codec is not supported, but rot13-codec is supported
* iterdecode() only works where bytes objects can be decoded, so rot13-codec is not supported, but base64-codec should be supported (pending other aspects of Issue 20132)
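The behaviour proposed for documentation can be checked with a small script. This is a hedged sketch: `works` is a hypothetical helper, the codec names `rot_13` and `base64_codec` are the standard library spellings assumed here, and the base64 input is a single complete chunk so the partial-input problems of Issue 20132 do not come into play.

```python
import codecs

def works(func, chunks, encoding):
    """Return True if func(chunks, encoding) can be consumed without TypeError."""
    try:
        list(func(chunks, encoding))
    except TypeError:
        return False
    return True

# Empty input still produces output for utf-8-sig, because the
# encoder is finalized even though no chunk was ever supplied.
bom_only = list(codecs.iterencode((), "utf-8-sig"))

support = {
    ("iterencode", "rot_13"): works(codecs.iterencode, ["abc"], "rot_13"),
    ("iterencode", "base64_codec"): works(codecs.iterencode, [b"abc"], "base64_codec"),
    ("iterdecode", "rot_13"): works(codecs.iterdecode, ["nop"], "rot_13"),
    ("iterdecode", "base64_codec"): works(codecs.iterdecode, [b"YWJj\n"], "base64_codec"),
}

print(bom_only)
for key, ok in support.items():
    print(key, "supported" if ok else "not supported")
```

A loop that only calls the encoder when at least one chunk was seen would return nothing for the empty utf-8-sig case, which is the gap pointed out at the start of this message.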
msg273198 - (view)	Author: Martin Panter (martin.panter) *	Date: 2016-08-20 09:22
Here is my documentation proposal.
msg273203 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2016-08-20 10:25
> it might be simpler to document that

Agreed.
msg278678 - (view)	Author: Roundup Robot (python-dev)	Date: 2016-10-15 01:05
New changeset 402eba63650c by Martin Panter in branch '3.5':
Issue #23231: Document codecs.iterencode(), iterdecode() shortcomings
https://hg.python.org/cpython/rev/402eba63650c

New changeset 0837940bcb9f by Martin Panter in branch '3.6':
Issue #23231: Merge codecs doc from 3.5 into 3.6
https://hg.python.org/cpython/rev/0837940bcb9f

New changeset 1955dcc27332 by Martin Panter in branch 'default':
Issue #23231: Merge codecs doc from 3.6
https://hg.python.org/cpython/rev/1955dcc27332

History
Date	User	Action	Args
2022-04-11 14:58:11	admin	set	github: 67420
2016-10-15 01:37:36	martin.panter	set	status: open -> closed stage: patch review -> resolved resolution: fixed versions: + Python 3.7
2016-10-15 01:05:11	python-dev	set	nosy: + python-dev messages: + msg278678
2016-08-20 10:25:41	serhiy.storchaka	set	assignee: serhiy.storchaka -> martin.panter messages: + msg273203 nosy: + r.david.murray
2016-08-20 09:22:54	martin.panter	set	files: + iter-unsupported.patch versions: + Python 3.5 messages: + msg273198 components: + Documentation, - Library (Lib) stage: patch review
2016-08-19 09:06:30	martin.panter	set	messages: + msg273101
2015-12-20 05:30:29	r.david.murray	set	nosy: - Ruel Net1400
2015-12-20 05:30:11	r.david.murray	set	messages: - msg256747
2015-12-20 01:03:26	Ruel Net1400	set	nosy: + Ruel Net1400 messages: + msg256747
2015-12-19 23:50:07	serhiy.storchaka	set	nosy: + lemburg, doerwalter messages: + msg256746 versions: + Python 3.6, - Python 3.5
2015-07-23 01:54:38	martin.panter	link	issue20132 dependencies
2015-07-16 02:08:14	martin.panter	link	issue13881 dependencies
2015-02-28 13:37:26	serhiy.storchaka	set	assignee: serhiy.storchaka nosy: + serhiy.storchaka
2015-01-18 00:19:04	martin.panter	set	messages: + msg234206
2015-01-13 12:50:06	martin.panter	set	files: + final-no-object.ignore-space.diff messages: + msg233933
2015-01-13 12:48:19	martin.panter	create