Issue 16473: quopri module differences in quoted-printable text with whitespace

Status:	open	Resolution:
Dependencies:		Superseder:
Assigned To:	docs@python	Nosy List:	aleperalta, barry, berker.peksag, brett.cannon, docs@python, jcea, martin.panter, ncoghlan, python-dev, r.david.murray, serhiy.storchaka
Priority:	normal	Keywords:	patch

Created on 2012-11-14 21:22 by aleperalta, last changed 2022-04-11 14:57 by admin.

Files
File name	Uploaded	Description	Edit
test_quopri.diff	aleperalta, 2012-11-14 21:22		review
codec-impl.patch	martin.panter, 2015-01-19 05:46	Document and test quotetabs=True for quopri-codec	review

Messages (13)
msg175593 - (view)	Author: Alejandro Javier Peralta Frías (aleperalta)	Date: 2012-11-14 21:22
New to python-dev; I grabbed a beginner task, "increase test coverage", and decided to add coverage to this bit of code in the quopri module:

    # quopri.py, lines 138-139
    while n > 0 and line[n-1:n] in b" \t\r":
        n = n-1

As far as I understand, to get into that while-loop the line to decode should end in " \t\r\n". So I added the following test:

    def test_decodestring_badly_enconded(self):
        e = b"hello \t\r\n"
        p = b"hello\n"
        s = self.module.decodestring(e)
        self.assertEqual(s, p)

but that only passes when the module doesn't use binascii. In fact, I changed test_quopri to use support.import_fresh_module to disable binascii and removed a decorator that was used.

The decoded text when binascii is used is:

    >>> quopri.decodestring("hello \t\r\n")
    'hello \t\r\n'

which differs from

    >>> quopri.a2b_qp = None
    >>> quopri.b2a_qp = None
    >>> quopri.decodestring("hello \t\r\n")
    'hello\n'

And what's the deal with:

    >>> import quopri
    >>> quopri.encodestring("hello \t\r")
    'hello \t\r'
    >>> "hello \t\r".encode("quopri")
    'hello=20=09\r'
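The comparison in this report can be repeated on Python 3 with a small helper. This is only a sketch: `compare_decoders` is a hypothetical name, and it assumes `quopri` still exposes the module-level names `a2b_qp` and `b2a_qp` that the report patches by hand (an implementation detail, not a documented API). What the binascii-backed path returns may differ between Python versions, so it is printed rather than asserted.

```python
import quopri
from unittest import mock


def compare_decoders(data: bytes) -> tuple[bytes, bytes]:
    """Return (binascii-backed result, pure-Python result) for the same input."""
    # Default path: quopri delegates to binascii.a2b_qp when it was importable.
    accelerated = quopri.decodestring(data)
    # Fallback path: temporarily hide the binascii helpers, as in the report.
    # mock.patch.object restores the original attributes on exit.
    with mock.patch.object(quopri, "a2b_qp", None), \
         mock.patch.object(quopri, "b2a_qp", None):
        pure = quopri.decodestring(data)
    return accelerated, pure


accelerated, pure = compare_decoders(b"hello \t\r\n")
print("binascii path:   ", accelerated)
print("pure-Python path:", pure)  # the loop quoted above strips " \t\r" before "\n"
```

Using `mock.patch.object` instead of assigning `None` directly keeps the change local, so other code that imports `quopri` afterwards is not affected.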
msg175594 - (view)	Author: R. David Murray (r.david.murray) *	Date: 2012-11-14 21:32
I think I can answer your last question. There are two quopri algorithms, one where spaces are allowed (message body) and one where they aren't (email headers). For the rest, I'd have to take a closer look than I have time for right now.
msg175595 - (view)	Author: Alejandro Javier Peralta Frías (aleperalta)	Date: 2012-11-14 21:35
> I think I can answer your last question. There are two quopri algorithms,
> one where spaces are allowed (message body) and one where they aren't
> (email headers).

OK, thank you. Good to know.
msg179744 - (view)	Author: Jesús Cea Avión (jcea) *	Date: 2013-01-11 23:14
Ping.
msg222121 - (view)	Author: Mark Lawrence (BreamoreBoy) *	Date: 2014-07-02 20:11
I'll take this on if I can. Is binascii available on all platforms? If it is, the quopri code could be simplified slightly, along with the test code.
msg222122 - (view)	Author: R. David Murray (r.david.murray) *	Date: 2014-07-02 20:26
The first problem is determining the "best" error recovery algorithms by reading through the RFCs and considering use cases.
msg234300 - (view)	Author: Martin Panter (martin.panter) *	Date: 2015-01-19 05:46
Three slightly different points here:

1. Decoding trailing whitespace: My understanding is that quoted-printable encoding aims to be tolerant of whitespace being added to and removed from the end of encoded lines. So I assume the “binascii” module is wrong to leave trailing whitespace in the decoded output, and the native “quopri” implementation is correct to ignore it.

2. CRLF handling: See Issue 20121. It seems CRLF newlines should be valid, and I have added a patch to that issue to make the native Python implementation handle CRLF newlines.

3. Whitespace encoding: The quopri-codec actually sets quotetabs=True. Here is a patch to document and test that, as well as correct the functions used by other codecs.
msg234304 - (view)	Author: Martin Panter (martin.panter) *	Date: 2015-01-19 06:26
Regarding decoding trailing whitespace, <https://tools.ietf.org/html/rfc1521.html#section-5.1> rule #3 says: “When decoding a Quoted-Printable body, any trailing white space on a line must be deleted, as it will necessarily have been added by intermediate transport agents.”
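The rule quoted above can be sketched as a preprocessing step that runs before decoding. This is an illustration only, not the patch discussed in this issue: `strip_trailing_whitespace` and `decode_tolerant` are hypothetical helpers, and unlike the native quopri loop, which also drops "\r", this version keeps each line ending as it was.

```python
import binascii
import re

# One or more spaces/tabs that sit directly before a line ending or end of input.
_TRAILING_WS = re.compile(rb"[ \t]+(?=\r?\n|\Z)")


def strip_trailing_whitespace(encoded: bytes) -> bytes:
    """Delete trailing spaces and tabs from every encoded line."""
    return _TRAILING_WS.sub(b"", encoded)


def decode_tolerant(encoded: bytes) -> bytes:
    """Strip transport padding first, then decode with binascii."""
    return binascii.a2b_qp(strip_trailing_whitespace(encoded))


print(strip_trailing_whitespace(b"hello \t\r\nworld  \n"))  # b'hello\r\nworld\n'
```

Whitespace that the sender really wanted at the end of a line has to be encoded as =20 or =09, so removing literal trailing whitespace before decoding should not lose intended data.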
msg250506 - (view)	Author: Martin Panter (martin.panter) *	Date: 2015-09-12 00:50
Will commit a slightly modified version of my doc patch to 3.4+, since mentioning the wrong functions is confusing. But I think we still need to fix the “binascii” decoding, and have a look at Alejandro’s test suite patch.
msg250508 - (view)	Author: Roundup Robot (python-dev)	Date: 2015-09-12 01:44
New changeset de82f41d6669 by Martin Panter <vadmium> in branch '3.4':
Issue #16473: Fix byte transform codec documentation; test quotetabs=True
https://hg.python.org/cpython/rev/de82f41d6669

New changeset 28cd11dc2915 by Martin Panter <vadmium> in branch '3.5':
Issue #16473: Merge codecs doc and test from 3.4 into 3.5
https://hg.python.org/cpython/rev/28cd11dc2915

New changeset 3ecb5766ba15 by Martin Panter <vadmium> in branch 'default':
Issue #16473: Merge codecs doc and test from 3.5
https://hg.python.org/cpython/rev/3ecb5766ba15
msg250509 - (view)	Author: Roundup Robot (python-dev)	Date: 2015-09-12 02:56
New changeset cfb0481c89d7 by Martin Panter <vadmium> in branch '2.7':
Issue #16473: Fix byte transform codec documentation; test quotetabs=True
https://hg.python.org/cpython/rev/cfb0481c89d7
msg250514 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2015-09-12 08:13
The mentioned functions are not exact equivalents of the codecs. They are the preferable way to obtain similar (apart from minor details) output.
msg250520 - (view)	Author: Martin Panter (martin.panter) *	Date: 2015-09-12 12:16
The list of functions was added in Issue 17844. I made the change today because, when investigating Issue 25075, I forgot that the listed functions weren’t exactly equivalent.

Base64-codec encodes to multiple lines, but b64encode() returns the raw encoding without line breaks. I see that base64.encodebytes() is listed as a “legacy interface”, but as far as I can tell nothing outside the legacy interface does any line splitting.

Hex-codec encodes to lowercase, but b16encode() returns uppercase, following RFC 4648.

Quopri-codec encodes all whitespace, but quopri.encodestring() lets most whitespace through verbatim by default. In this case I think it would be reasonable to change back to encodestring() if we say that quotetabs=True is passed in.
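The three differences listed above can be seen side by side with a short Python 3 sketch. `codec_vs_function` is a hypothetical helper written for this illustration, and the sample input is arbitrary; the outputs are printed rather than stated here.

```python
import base64
import codecs
import quopri


def codec_vs_function(data: bytes) -> dict[str, tuple[bytes, bytes]]:
    """Pair each bytes-to-bytes codec with the function once listed as its equivalent."""
    return {
        # Codec splits output into lines; b64encode() returns one unbroken run.
        "base64": (codecs.encode(data, "base64_codec"), base64.b64encode(data)),
        # Codec emits lowercase hex digits; b16encode() emits uppercase.
        "hex": (codecs.encode(data, "hex_codec"), base64.b16encode(data)),
        # Codec passes quotetabs=True; encodestring() defaults to quotetabs=False.
        "quopri": (codecs.encode(data, "quopri_codec"), quopri.encodestring(data)),
    }


sample = b"hello \t world" * 7  # long enough for base64 to need a second line
for name, (from_codec, from_function) in codec_vs_function(sample).items():
    print(name)
    print("  codec:   ", from_codec)
    print("  function:", from_function)
```

Passing quotetabs=True to quopri.encodestring() should reproduce the codec's output, which matches the suggestion above of documenting encodestring() with that argument.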

History
Date	User	Action	Args
2022-04-11 14:57:38	admin	set	github: 60677
2019-02-24 22:39:40	BreamoreBoy	set	nosy: - BreamoreBoy
2015-09-12 12:16:04	martin.panter	set	messages: + msg250520
2015-09-12 08:13:33	serhiy.storchaka	set	nosy: + serhiy.storchaka, ncoghlan messages: + msg250514
2015-09-12 02:56:31	python-dev	set	messages: + msg250509
2015-09-12 01:44:43	python-dev	set	nosy: + python-dev messages: + msg250508
2015-09-12 00:50:28	martin.panter	set	versions: + Python 2.7, Python 3.4, Python 3.5, Python 3.6, - Python 3.3 nosy: + berker.peksag messages: + msg250506 type: behavior stage: needs patch
2015-07-23 01:54:38	martin.panter	link	issue20132 dependencies
2015-01-19 06:26:53	martin.panter	set	messages: + msg234304
2015-01-19 05:46:36	martin.panter	set	files: + codec-impl.patch assignee: docs@python components: + Documentation title: quopri module minor difference in decoding quoted-printable text -> quopri module differences in quoted-printable text with whitespace nosy: + docs@python, martin.panter messages: + msg234300
2014-07-02 20:26:57	r.david.murray	set	messages: + msg222122
2014-07-02 20:11:58	BreamoreBoy	set	nosy: + BreamoreBoy messages: + msg222121 title: Minor difference in decoding quoted-printable text -> quopri module minor difference in decoding quoted-printable text
2013-01-11 23:14:44	jcea	set	messages: + msg179744
2012-11-14 22:00:37	jcea	set	nosy: + jcea
2012-11-14 21:35:10	aleperalta	set	messages: + msg175595
2012-11-14 21:32:36	r.david.murray	set	nosy: + barry, r.david.murray messages: + msg175594 components: + email
2012-11-14 21:22:57	aleperalta	set	nosy: + brett.cannon
2012-11-14 21:22:06	aleperalta	create