Issue 16473
Created on 2012-11-14 21:22 by aleperalta, last changed 2022-04-11 14:57 by admin.
Files

| File name | Uploaded | Description |
|---|---|---|
| test_quopri.diff | aleperalta, 2012-11-14 21:22 | |
| codec-impl.patch | martin.panter, 2015-01-19 05:46 | Document and test quotetabs=True for quopri-codec |

Messages (13)

msg175593 - Author: Alejandro Javier Peralta Frías (aleperalta) - Date: 2012-11-14 21:22
I'm new to python-dev. I grabbed a beginner task, "increase test coverage", and decided to add coverage for this bit of code in the quopri module:
```python
# quopri.py, lines 138-139
while n > 0 and line[n-1:n] in b" \t\r":
    n = n-1
```
As far as I understand, to enter that while loop the line being decoded must end with whitespace (a space, tab or carriage return) immediately before the final "\n", for example " \t\r\n".
So I added the following test:
```python
def test_decodestring_badly_encoded(self):
    # Trailing whitespace before the newline should be dropped when decoding
    e = b"hello \t\r\n"
    p = b"hello\n"
    s = self.module.decodestring(e)
    self.assertEqual(s, p)
```
But that test only passes when the module doesn't use binascii. In fact, I changed test_quopri to use support.import_fresh_module to disable binascii, and removed a decorator that was previously used.
The decoded text when binascii is used is:

```pycon
>>> quopri.decodestring("hello \t\r\n")
'hello \t\r\n'
```

which differs from:

```pycon
>>> quopri.a2b_qp = None
>>> quopri.b2a_qp = None
>>> quopri.decodestring("hello \t\r\n")
'hello\n'
```
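A Python 3 version of the comparison above can be sketched as follows. It assumes CPython's quopri module, which binds `a2b_qp` to `None` and falls back to its pure-Python decoder when binascii is unavailable. Patching the name with `unittest.mock.patch.object`, instead of assigning to it as in the session above, restores the binascii path afterwards; `decode_pure_python` is a hypothetical helper name.

```python
import quopri
from unittest import mock


def decode_pure_python(data: bytes) -> bytes:
    # Hypothetical helper: temporarily set quopri.a2b_qp to None so that
    # quopri.decodestring() takes its pure-Python branch.
    with mock.patch.object(quopri, "a2b_qp", None):
        return quopri.decodestring(data)


data = b"hello \t\r\n"
print(decode_pure_python(data))   # b'hello\n'
print(quopri.decodestring(data))  # binascii path; compare with the line above
```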
And what's the deal with this?

```pycon
>>> import quopri
>>> quopri.encodestring("hello \t\r")
'hello \t\r'
>>> "hello \t\r".encode("quopri")
'hello=20=09\r'
```

msg175594 - Author: R. David Murray (r.david.murray) - Date: 2012-11-14 21:32
I think I can answer your last question. There are two quopri algorithms, one where spaces are allowed (message body) and one where they aren't (email headers). For the rest, I'd have to take a closer look than I have time for right now.
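The body/header distinction corresponds to the `header` flag of `quopri.encodestring()` and `quopri.decodestring()`. A minimal sketch, assuming Python 3 (bytes in, bytes out): in header mode a space is written as "_", the "Q" encoding of RFC 1522, and decoding needs the same flag to turn "_" back into a space.

```python
import quopri

text = b"a b"

# Body mode (default): an interior space passes through unchanged.
body = quopri.encodestring(text)

# Header mode: spaces are written as underscores.
header = quopri.encodestring(text, header=True)

print(body)    # b'a b'
print(header)  # b'a_b'

# Decoding must use the same mode to map "_" back to a space.
print(quopri.decodestring(header, header=True))  # b'a b'
```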

msg175595 - Author: Alejandro Javier Peralta Frías (aleperalta) - Date: 2012-11-14 21:35
> I think I can answer your last question. There are two quopri algorithms,
> one where spaces are allowed (message body) and one where they aren't
> (email headers).

OK, thank you. Good to know.

msg179744 - Author: Jesús Cea Avión (jcea) - Date: 2013-01-11 23:14
Ping.

msg222121 - Author: Mark Lawrence (BreamoreBoy) - Date: 2014-07-02 20:11
I'll take this on if I can. Is binascii available on all platforms? If it is, the quopri code could be simplified slightly, along with the test code.

msg222122 - Author: R. David Murray (r.david.murray) - Date: 2014-07-02 20:26
The first problem is determining the "best" error recovery algorithms by reading through the RFCs and considering use cases.

msg234300 - Author: Martin Panter (martin.panter) - Date: 2015-01-19 05:46
Three slightly different points here:

1. Decoding trailing whitespace: My understanding is that quoted-printable encoding aims to be tolerant of whitespace being added to and removed from the end of encoded lines. So I assume the “binascii” module is wrong to leave trailing whitespace in the decoded output, and the native “quopri” implementation is correct to ignore it.
2. CRLF handling: See Issue 20121. It seems CRLF newlines should be valid, and I have added a patch to that issue to make the native Python implementation handle CRLF newlines.
3. Whitespace encoding: The quopri-codec actually sets quotetabs=True. Here is a patch to document and test that, as well as to correct the functions the documentation lists for the other codecs.
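Point 3 can be seen directly. A minimal sketch, assuming Python 3.4+ where the "quopri" codec is reachable through `codecs.encode()`: the codec output matches `quopri.encodestring(..., quotetabs=True)`, while the default `quotetabs=False` leaves interior spaces and tabs unencoded.

```python
import codecs
import quopri

data = b"a b\tc"

# The "quopri" codec encodes with quotetabs=True, so spaces and tabs are escaped.
via_codec = codecs.encode(data, "quopri")

# quopri.encodestring() defaults to quotetabs=False: interior whitespace passes through.
default = quopri.encodestring(data)
quoted = quopri.encodestring(data, quotetabs=True)

print(via_codec)  # b'a=20b=09c'
print(default)    # b'a b\tc'
print(quoted)     # b'a=20b=09c'
```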

msg234304 - Author: Martin Panter (martin.panter) - Date: 2015-01-19 06:26
Regarding decoding trailing whitespace, <https://tools.ietf.org/html/rfc1521.html#section-5.1> rule #3 says: “When decoding a Quoted-Printable body, any trailing white space on a line must be deleted, as it will necessarily have been added by intermediate transport agents.”
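Rule #3 could be applied as a preprocessing step in front of either decoder. The sketch below assumes that approach; `decode_stripping_trailing_ws` is a hypothetical helper, not part of quopri. It strips trailing space, tab and CR from every newline-terminated line, mirroring the pure-Python decoder, and then calls `quopri.decodestring()`. Whitespace after a soft line break "=" is removed too, so the break is still recognized, and CRLF line endings come out as "\n", as in the pure-Python decoder.

```python
import quopri


def decode_stripping_trailing_ws(data: bytes) -> bytes:
    # Hypothetical helper: apply RFC 1521 rule #3, then decode.
    lines = data.split(b"\n")
    # Every element except the last was followed by "\n" in the input,
    # so strip trailing space/tab/CR from those lines.
    cleaned = [line.rstrip(b" \t\r") for line in lines[:-1]]
    # Leave a final fragment without "\n" untouched, like quopri's own decoder.
    cleaned.append(lines[-1])
    return quopri.decodestring(b"\n".join(cleaned))


print(decode_stripping_trailing_ws(b"hello \t\r\n"))      # b'hello\n'
print(decode_stripping_trailing_ws(b"soft= \nbreak\n"))   # b'softbreak\n'
```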

msg250506 - Author: Martin Panter (martin.panter) - Date: 2015-09-12 00:50
Will commit a slightly modified version of my doc patch to 3.4+, since mentioning the wrong functions is confusing. But I think we still need to fix the “binascii” decoding, and have a look at Alejandro’s test suite patch.

msg250508 - Author: Roundup Robot (python-dev) - Date: 2015-09-12 01:44
New changeset de82f41d6669 by Martin Panter <vadmium> in branch '3.4': Issue #16473: Fix byte transform codec documentation; test quotetabs=True https://hg.python.org/cpython/rev/de82f41d6669

New changeset 28cd11dc2915 by Martin Panter <vadmium> in branch '3.5': Issue #16473: Merge codecs doc and test from 3.4 into 3.5 https://hg.python.org/cpython/rev/28cd11dc2915

New changeset 3ecb5766ba15 by Martin Panter <vadmium> in branch 'default': Issue #16473: Merge codecs doc and test from 3.5 https://hg.python.org/cpython/rev/3ecb5766ba15

msg250509 - Author: Roundup Robot (python-dev) - Date: 2015-09-12 02:56
New changeset cfb0481c89d7 by Martin Panter <vadmium> in branch '2.7': Issue #16473: Fix byte transform codec documentation; test quotetabs=True https://hg.python.org/cpython/rev/cfb0481c89d7

msg250514 - Author: Serhiy Storchaka (serhiy.storchaka) - Date: 2015-09-12 08:13
The mentioned functions are not exact equivalents of the codecs. They are the preferable way to obtain similar output (apart from minor details).

msg250520 - Author: Martin Panter (martin.panter) - Date: 2015-09-12 12:16
The list of functions was added in Issue 17844. I made the change today because I forgot that the listed functions weren’t exactly equivalent when investigating Issue 25075.

Base64-codec encodes to multiple lines, but b64encode() returns the raw encoding without line breaks. I see that base64.encodebytes() is listed as a “legacy interface”, but as far as I can tell nothing outside the legacy interface does any line splitting.

Hex-codec encodes to lowercase, but b16encode() returns uppercase, following RFC 4648.

Quopri-codec encodes all whitespace, but quopri.encodestring() lets most whitespace through verbatim by default. In this case I think it would be reasonable to change back to encodestring() if we say that quotetabs=True is passed in.
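The base64 and hex differences described above can be checked with a short sketch, assuming Python 3.4+ where the bytes-to-bytes codecs are available through `codecs.encode()` under the aliases "hex" and "base64". The input values are arbitrary examples.

```python
import base64
import codecs

data = b"\xab\xcd"

# hex codec: lowercase hex digits; base64.b16encode(): uppercase (RFC 4648)
hex_codec = codecs.encode(data, "hex")
b16 = base64.b16encode(data)

# base64 codec: output split into lines, each ending in a newline
long_input = b"x" * 60
b64_codec = codecs.encode(long_input, "base64")
b64_raw = base64.b64encode(long_input)

print(hex_codec, b16)          # b'abcd' b'ABCD'
print(b64_codec.count(b"\n"))  # 2 (a 76-character line plus a 4-character line)
print(b"\n" in b64_raw)        # False
# The codec output matches the "legacy" base64.encodebytes()
print(b64_codec == base64.encodebytes(long_input))  # True
```

Stripping the newlines from the codec output gives the same bytes as `b64encode()`, so the difference is only the line splitting.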

History

| Date | User | Action | Args |
|---|---|---|---|
| 2022-04-11 14:57:38 | admin | set | github: 60677 |
| 2019-02-24 22:39:40 | BreamoreBoy | set | nosy: - BreamoreBoy |
| 2015-09-12 12:16:04 | martin.panter | set | messages: + msg250520 |
| 2015-09-12 08:13:33 | serhiy.storchaka | set | nosy: + serhiy.storchaka, ncoghlan; messages: + msg250514 |
| 2015-09-12 02:56:31 | python-dev | set | messages: + msg250509 |
| 2015-09-12 01:44:43 | python-dev | set | nosy: + python-dev; messages: + msg250508 |
| 2015-09-12 00:50:28 | martin.panter | set | versions: + Python 2.7, Python 3.4, Python 3.5, Python 3.6, - Python 3.3; nosy: + berker.peksag; messages: + msg250506; type: behavior |
| 2015-07-23 01:54:38 | martin.panter | link | issue20132 dependencies |
| 2015-01-19 06:26:53 | martin.panter | set | messages: + msg234304 |
| 2015-01-19 05:46:36 | martin.panter | set | files: + codec-impl.patch; assignee: docs@python; messages: + msg234300 |
| 2014-07-02 20:26:57 | r.david.murray | set | messages: + msg222122 |
| 2014-07-02 20:11:58 | BreamoreBoy | set | nosy: + BreamoreBoy; messages: + msg222121 |
| 2013-01-11 23:14:44 | jcea | set | messages: + msg179744 |
| 2012-11-14 22:00:37 | jcea | set | nosy: + jcea |
| 2012-11-14 21:35:10 | aleperalta | set | messages: + msg175595 |
| 2012-11-14 21:32:36 | r.david.murray | set | nosy: + barry, r.david.murray; messages: + msg175594; components: + email |
| 2012-11-14 21:22:57 | aleperalta | set | nosy: + brett.cannon |
| 2012-11-14 21:22:06 | aleperalta | create | |

