Issue16349
Created on 2012-10-28 12:43 by takluyver, last changed 2022-04-11 14:57 by admin.
| Files | | |
|---|---|---|
| File name | Uploaded | Description |
| format-bytes.patch | martin.panter, 2014-12-18 05:55 | |
| Messages (13) | |||
|---|---|---|---|
| msg174042 - (view) | Author: Thomas Kluyver (takluyver) * | Date: 2012-10-28 12:43 | |
At least in CPython, format strings can be given as bytes, as an alternative to str. E.g.
>>> struct.unpack(b'>hhl', b'\x00\x01\x00\x02\x00\x00\x00\x03')
(1, 2, 3)
Looking at the source code [1], this appears to be consciously accounted for. But it doesn't seem to be mentioned in the documentation. I think the docs should either say it's a possibility, or warn that it's an implementation detail.
[1] http://hg.python.org/cpython/file/cde4b66699fe/Modules/_struct.c#l1340 |
|||
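A minimal sketch of the behaviour reported above, assuming CPython 3: the same format is passed once as str and once as bytes, with an explicit `>` prefix so the result does not depend on the host. It only illustrates the observation; whether the bytes form is a supported feature is the open question of this issue.

```python
import struct

data = b'\x00\x01\x00\x02\x00\x00\x00\x03'

# Same big-endian format, once as text and once as bytes.
from_str = struct.unpack('>hhl', data)
from_bytes = struct.unpack(b'>hhl', data)

print(from_str)    # (1, 2, 3)
print(from_bytes)  # (1, 2, 3)
```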
| msg174083 - (view) | Author: Martin Panter (martin.panter) * | Date: 2012-10-28 22:36 | |
Also it would be nice to clarify if struct.Struct.format is meant to be a byte string. Reading the documentation and examples I expected a character string. It was an issue for me when embedding one structure within another:
HSF_VOL_DESC = Struct("< B 5s B")
# Python 3.2.3's "Struct.format" is actually a byte string
NSR_DESC = Struct(HSF_VOL_DESC.format.decode() + "B")
|
|||
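One way to embed one structure's format in another without depending on the type of `Struct.format` is to normalise it to text first. This is only a sketch: `format_as_text` is a hypothetical helper, not part of the struct module, and it assumes the format contains only ASCII characters.

```python
from struct import Struct


def format_as_text(s):
    """Return s.format as str, whether the interpreter stores it as bytes or str."""
    fmt = s.format
    # Hypothetical normalisation step: decode only when the attribute is bytes.
    return fmt.decode("ascii") if isinstance(fmt, bytes) else fmt


HSF_VOL_DESC = Struct("< B 5s B")
# Append one more unsigned byte to the embedded format.
NSR_DESC = Struct(format_as_text(HSF_VOL_DESC) + "B")

print(HSF_VOL_DESC.size)  # 7
print(NSR_DESC.size)      # 8
```

Whitespace between format characters is ignored, so concatenating `"< B 5s B"` and `"B"` gives a valid format.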
| msg174584 - (view) | Author: Terry J. Reedy (terry.reedy) * | Date: 2012-11-02 21:28 | |
For 3.3, I verified that adding a b prefix to the first three doc examples gives the same output as without, but also discovered that example outputs are wrong, at least on Windows, because of byte ordering issues.
>>> pack('hhl', 1, 2, 3)
b'\x01\x00\x02\x00\x03\x00\x00\x00'
>>> pack(b'hhl', 1, 2, 3)
b'\x01\x00\x02\x00\x03\x00\x00\x00'
>>> unpack(b'hhl', b'\x00\x01\x00\x02\x00\x00\x00\x03')
(256, 512, 50331648)
>>> unpack('hhl', b'\x00\x01\x00\x02\x00\x00\x00\x03')
(256, 512, 50331648)
|
|||
| msg174680 - (view) | Author: Mark Dickinson (mark.dickinson) * | Date: 2012-11-03 19:35 | |
> but also discovered that example outputs are wrong
That's documented to some extent: there's a line in the docs that says: "All examples assume a native byte order, size, and alignment with a big-endian machine". Given that little-endian machines are much more common than big-endian ones these days, it may be worth rewriting the examples for little-endian machines. |
|||
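To see why the unprefixed doc examples depend on the machine, the following sketch compares native mode with standard mode. The variable names are illustrative only, and the native size is deliberately not stated because it varies by platform.

```python
import struct
import sys

print(sys.byteorder)  # 'little' or 'big', depending on the host

# No prefix: native byte order, size and alignment (platform dependent).
native_size = struct.calcsize('hhl')
# '=' prefix: native byte order, but standard sizes and no padding (2 + 2 + 4).
standard_size = struct.calcsize('=hhl')

# '=' follows the host byte order, so it matches '<' or '>' accordingly.
prefix = '<' if sys.byteorder == 'little' else '>'
matches_host_order = (
    struct.pack('=hhl', 1, 2, 3) == struct.pack(prefix + 'hhl', 1, 2, 3)
)

print(native_size, standard_size, matches_host_order)
```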
| msg174682 - (view) | Author: Mark Dickinson (mark.dickinson) * | Date: 2012-11-03 19:40 | |
> Also it would be nice to clarify if struct.Struct.format is meant to be
> a byte string.
Hmm. That seems wrong to me. After all, the format string is supposed to be a piece of human-readable text rather than a collection of bytes. I think it's borderline acceptable to allow a bytes instance to be passed in for the format (practicality beats purity and all that), but I'd say that the output format should definitely be unicode. |
|||
| msg174711 - (view) | Author: Terry J. Reedy (terry.reedy) * | Date: 2012-11-03 22:13 | |
I think the example should be switched *and* the formats should specify the endianness so the examples work on all systems. |
|||
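A sketch of what examples with an explicit byte order could look like; this is an illustration of the suggestion above, not the wording that ended up in the docs. With `<` or `>` the sizes are standard and the output is the same on every platform.

```python
from struct import pack, unpack

big = pack('>hhl', 1, 2, 3)
little = pack('<hhl', 1, 2, 3)

print(big)     # b'\x00\x01\x00\x02\x00\x00\x00\x03'
print(little)  # b'\x01\x00\x02\x00\x03\x00\x00\x00'

# Reading big-endian data with the wrong byte order reproduces
# the "wrong" values seen on Windows earlier in this thread.
print(unpack('>hhl', big))  # (1, 2, 3)
print(unpack('<hhl', big))  # (256, 512, 50331648)
```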
| msg176681 - (view) | Author: Thomas Kluyver (takluyver) * | Date: 2012-11-30 11:04 | |
I'm happy to put together a docs patch, but I don't have any indication of the right answer (is it a safe feature to use, or an implementation detail?) Is there another venue where I should raise the question? |
|||
| msg176701 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * | Date: 2012-11-30 18:55 | |
Python 2 supports only str. Support for unicode objects was added in r59687 (merged with other unrelated changes in changeset 13aabc23cf2e). Maybe Raymond can explain why bytes, rather than unicode, was chosen as the type of Struct.format. |
|||
| msg176702 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * | Date: 2012-11-30 19:05 | |
No, this is not r59687. I can't find which revision in 59680-59695 it came from. |
|||
| msg216656 - (view) | Author: Martin Panter (martin.panter) * | Date: 2014-04-17 05:11 | |
The issue of Struct.format being a byte string has been raised separately in Issue 21071. |
|||
| msg232767 - (view) | Author: Martin Panter (martin.panter) * | Date: 2014-12-16 22:34 | |
Actually the “struct” module doc string seems to already hint that format strings can be byte strings: “Python bytes objects are used to hold the data representing the C struct and also as format strings . . .” |
|||
| msg232858 - (view) | Author: Martin Panter (martin.panter) * | Date: 2014-12-18 05:55 | |
Assuming it is intended to support byte strings, here is a patch that documents them being allowed, and adds a test case. |
|||
| msg292554 - (view) | Author: Martin Panter (martin.panter) * | Date: 2017-04-29 02:46 | |
I think the direction to take for this depends on the outcome of Issue 21071. First we have to decide if the “format” attribute is blessed as a byte string (and not deprecated), or whether it is deprecated or changed to a text string. Serhiy pointed out that it is not entirely “safe” because mixing equivalent byte and text formats can generate BytesWarning. |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:57:37 | admin | set | github: 60553 |
| 2017-09-14 03:25:33 | xiang.zhang | unlink | issue19985 dependencies |
| 2017-04-29 02:46:51 | martin.panter | set | dependencies: + struct.Struct.format is bytes, but should be str, - Document whether it's safe to use bytes for struct format string |
| 2017-04-29 02:46:51 | martin.panter | unlink | issue16349 dependencies |
| 2017-04-29 02:46:14 | martin.panter | set | dependencies: + Document whether it's safe to use bytes for struct format string; messages: + msg292554 |
| 2017-04-29 02:46:14 | martin.panter | link | issue16349 dependencies |
| 2016-04-15 04:00:33 | martin.panter | link | issue19985 dependencies |
| 2014-12-19 00:17:39 | Arfrever | set | nosy: + Arfrever |
| 2014-12-18 05:55:28 | martin.panter | set | files: + format-bytes.patch; keywords: + patch; messages: + msg232858 |
| 2014-12-16 22:34:55 | martin.panter | set | messages: + msg232767 |
| 2014-04-17 05:11:19 | martin.panter | set | messages: + msg216656 |
| 2012-11-30 19:05:18 | serhiy.storchaka | set | nosy: + christian.heimes; messages: + msg176702 |
| 2012-11-30 18:55:08 | serhiy.storchaka | set | versions: + Python 3.2, Python 3.3, Python 3.4; nosy: + rhettinger, serhiy.storchaka; messages: + msg176701; components: + Extension Modules, - Library (Lib) |
| 2012-11-30 11:04:26 | takluyver | set | messages: + msg176681 |
| 2012-11-03 22:13:58 | terry.reedy | set | messages: + msg174711 |
| 2012-11-03 19:40:04 | mark.dickinson | set | messages: + msg174682 |
| 2012-11-03 19:35:49 | mark.dickinson | set | messages: + msg174680 |
| 2012-11-02 21:28:35 | terry.reedy | set | nosy: + mark.dickinson, meador.inge, terry.reedy; messages: + msg174584 |
| 2012-10-28 22:36:28 | martin.panter | set | nosy: + martin.panter; messages: + msg174083 |
| 2012-10-28 12:43:42 | takluyver | create | |
