Issue23231
Created on 2015-01-13 12:48 by martin.panter, last changed 2022-04-11 14:58 by admin. This issue is now closed.
| Files | ||||
|---|---|---|---|---|
| File name | Uploaded | Description | | |
| final-no-object.patch | martin.panter, 2015-01-13 12:48 | | | |
| final-no-object.ignore-space.diff | martin.panter, 2015-01-13 12:50 | diff --ignore-all-space | | |
| iter-unsupported.patch | martin.panter, 2016-08-20 09:22 | | | |
| Messages (8) | |||
|---|---|---|---|
| msg233932 | Author: Martin Panter (martin.panter) |
Date: 2015-01-13 12:48 | |
As mentioned in Issue 20132, iterencode() and iterdecode() only work on text-to-byte codecs, because they assume particular data types when finalizing the incremental codecs. This patch changes the signatures of the IncrementalEncoder and IncrementalDecoder methods from

    IncrementalEncoder.encode(object[, final])
    IncrementalDecoder.decode(object[, final])

to

    IncrementalEncoder.encode([object,] [final])
    IncrementalDecoder.decode([object,] [final])

so that iterencode() and iterdecode(), and perhaps in the future StreamWriter and StreamReader, can operate the incremental codec without knowing what kind of data should be processed. |
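The type assumption described above can be reproduced with a short sketch. It assumes a CPython 3 release in which iterencode() finalizes the encoder with an empty str and iterdecode() finalizes the decoder with an empty bytes object; the helper name `fails_with_type_error` is made up for this illustration.

```python
import codecs

# Text-to-bytes codec: finalizing with an empty str is harmless.
utf8_chunks = list(codecs.iterencode(["ab", "c"], "utf-8"))


def fails_with_type_error(make_iterator):
    """Consume the iterator fully and report whether TypeError was raised.

    Hypothetical helper, only used to keep the demonstration compact.
    """
    try:
        list(make_iterator())
    except TypeError:
        return True
    return False


# Bytes-to-bytes codec: the encoder is finalized with "" (a str),
# which the base64 codec is not expected to accept.
base64_encode_fails = fails_with_type_error(
    lambda: codecs.iterencode([b"abc"], "base64_codec"))

# Text-to-text codec: the decoder is finalized with b"" (bytes),
# which the rot13 codec is not expected to accept.
rot13_decode_fails = fails_with_type_error(
    lambda: codecs.iterdecode(["nop"], "rot_13"))

print(utf8_chunks, base64_encode_fails, rot13_decode_fails)
```

In both failing cases the real chunks are processed first; the error only appears at the finalization step, which is where the hard-coded empty value is passed in.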
|||
| msg233933 | Author: Martin Panter (martin.panter) |
Date: 2015-01-13 12:50 | |
The original patch has lots of whitespace changes, probably because the generated codec code has not been regenerated for a long time. This diff ignores the whitespace changes, so it should be easier to review. |
|||
| msg234206 | Author: Martin Panter (martin.panter) |
Date: 2015-01-18 00:19 | |
Another idea that doesn’t involve changing the incremental codec APIs is kind of described in <https://bugs.python.org/issue7475#msg145986>: add format parameters to iterencode() and iterdecode(), which would allow them to determine the right data type to finalize the codecs with. |
|||
| msg256746 | Author: Serhiy Storchaka (serhiy.storchaka) |
Date: 2015-12-19 23:50 | |
The patch changes a public interface. This breaks compatibility with third-party codecs that implement it.
We could find another solution to the iterencode/iterdecode problem. For example, we can buffer the iterated values and encode with a one-step delay:

    prev = sentinel = object()
    for input in iterator:
        if prev is not sentinel:
            output = encoder.encode(prev)
            if output:
                yield output
        prev = input
    if prev is not sentinel:
        output = encoder.encode(prev, True)
        if output:
            yield output

Or remember the previous value and use it to compute the empty value at the end (this works only if the input type supports slicing):

    prev = sentinel = object()
    for input in iterator:
        output = encoder.encode(input)
        if output:
            yield output
        prev = input
    if prev is not sentinel:
        output = encoder.encode(prev[:0], True)
        if output:
            yield output

|
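For readers who want to try the one-step-delay idea, here is a minimal runnable sketch of it. The function name `iterencode_delayed` and its parameters are assumptions made for this illustration; nothing like it exists in the codecs module, and this is not what was eventually committed.

```python
import codecs


def iterencode_delayed(iterator, encoding, errors="strict"):
    """Hypothetical variant of codecs.iterencode() using a one-step delay.

    The last real chunk is passed with final=True, so no empty value of a
    guessed type is ever handed to the incremental encoder.
    """
    encoder = codecs.getincrementalencoder(encoding)(errors)
    prev = sentinel = object()
    for chunk in iterator:
        if prev is not sentinel:
            output = encoder.encode(prev)
            if output:
                yield output
        prev = chunk
    if prev is not sentinel:
        # Finalize with the buffered last chunk instead of "" or b"".
        output = encoder.encode(prev, True)
        if output:
            yield output


# Works for a text-to-bytes codec ...
text_chunks = list(iterencode_delayed(["ab", "c"], "utf-8"))
# ... and for a bytes-to-bytes codec, since no str is passed at the end.
bytes_chunks = list(iterencode_delayed([b"abc"], "base64_codec"))

print(text_chunks, bytes_chunks)
```

Note that when the iterator is empty this sketch never calls the encoder at all, so it yields nothing.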
|||
| msg273101 | Author: Martin Panter (martin.panter) |
Date: 2016-08-19 09:06 | |
Serhiy’s two proposals won’t work for codecs that produce non-empty output for empty input:

    >>> tuple(iterencode((), "utf-8-sig"))
    (b'\xef\xbb\xbf',)
    >>> encode(b"", "uu")
    b'begin 666 <data>\n \nend\n'
    >>> encode(b"", "zlib")
    b'x\x9c\x03\x00\x00\x00\x00\x01'

However, I agree that changing the incremental codec APIs is not ideal. Since nobody seems to care that much, it might be simpler to document that:

* iterencode() only works where text str objects can be encoded, so base64-codec is not supported, but rot13-codec is supported
* iterdecode() only works where bytes objects can be decoded, so rot13-codec is not supported, but base64-codec should be supported (pending other aspects of Issue 20132) |
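The objection and the proposed documentation wording can be checked with a few calls. This is a sketch that assumes the stock CPython codecs under their module names (`rot_13`, `base64_codec`, `zlib_codec`); the variable names are only for illustration.

```python
import codecs

# Empty input still produces output for some codecs, so an approach that
# only finalizes with a previously seen chunk would emit nothing here.
bom_only = tuple(codecs.iterencode((), "utf-8-sig"))
zlib_empty = codecs.encode(b"", "zlib_codec")

# Supported direction for iterencode(): str input, finalized with "".
rot13_encoded = list(codecs.iterencode(["abc"], "rot_13"))

# Supported direction for iterdecode(): bytes input, finalized with b"".
# A single complete chunk is used, because chunk boundaries inside base64
# data are one of the other aspects tracked in Issue 20132.
base64_decoded = list(codecs.iterdecode([b"YWJj\n"], "base64_codec"))

print(bom_only, zlib_empty, rot13_encoded, base64_decoded)
```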
|||
| msg273198 | Author: Martin Panter (martin.panter) |
Date: 2016-08-20 09:22 | |
Here is my documentation proposal. |
|||
| msg273203 | Author: Serhiy Storchaka (serhiy.storchaka) |
Date: 2016-08-20 10:25 | |
> it might be simpler to document that

Agreed. |
|||
| msg278678 | Author: Roundup Robot (python-dev) |
Date: 2016-10-15 01:05 | |
New changeset 402eba63650c by Martin Panter in branch '3.5':
Issue #23231: Document codecs.iterencode(), iterdecode() shortcomings
https://hg.python.org/cpython/rev/402eba63650c

New changeset 0837940bcb9f by Martin Panter in branch '3.6':
Issue #23231: Merge codecs doc from 3.5 into 3.6
https://hg.python.org/cpython/rev/0837940bcb9f

New changeset 1955dcc27332 by Martin Panter in branch 'default':
Issue #23231: Merge codecs doc from 3.6
https://hg.python.org/cpython/rev/1955dcc27332 |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:58:11 | admin | set | github: 67420 |
| 2016-10-15 01:37:36 | martin.panter | set | status: open -> closed; stage: patch review -> resolved; resolution: fixed; versions: + Python 3.7 |
| 2016-10-15 01:05:11 | python-dev | set | nosy: + python-dev; messages: + msg278678 |
| 2016-08-20 10:25:41 | serhiy.storchaka | set | assignee: serhiy.storchaka -> martin.panter; messages: + msg273203 |
| 2016-08-20 09:22:54 | martin.panter | set | files: + iter-unsupported.patch; versions: + Python 3.5; messages: + msg273198; components: + Documentation, - Library (Lib) |
| 2016-08-19 09:06:30 | martin.panter | set | messages: + msg273101 |
| 2015-12-20 05:30:29 | r.david.murray | set | nosy: - Ruel Net1400 |
| 2015-12-20 05:30:11 | r.david.murray | set | messages: - msg256747 |
| 2015-12-20 01:03:26 | Ruel Net1400 | set | nosy: + Ruel Net1400; messages: + msg256747 |
| 2015-12-19 23:50:07 | serhiy.storchaka | set | nosy: + lemburg, doerwalter; messages: + msg256746 |
| 2015-07-23 01:54:38 | martin.panter | link | issue20132 dependencies |
| 2015-07-16 02:08:14 | martin.panter | link | issue13881 dependencies |
| 2015-02-28 13:37:26 | serhiy.storchaka | set | assignee: serhiy.storchaka; nosy: + serhiy.storchaka |
| 2015-01-18 00:19:04 | martin.panter | set | messages: + msg234206 |
| 2015-01-13 12:50:06 | martin.panter | set | files: + final-no-object.ignore-space.diff; messages: + msg233933 |
| 2015-01-13 12:48:19 | martin.panter | create | |

