Message 303612 - Python tracker

Author	methane
Recipients	ezio.melotti, methane, mrabarnett
Date	2017-10-03.12:58:01
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<1507035482.04.0.213398074469.issue31677@psf.upfronthosting.co.za>
In-reply-to

Content

email.header has this pattern:

https://github.com/python/cpython/blob/85c0b8941f0c8ef3ed787c9d504712c6ad3eb5d3/Lib/email/header.py#L34-L43

# Match encoded-word strings in the form =?charset?q?Hello_World?=
ecre = re.compile(r'''
  =\?                   # literal =?
  (?P<charset>[^?]*?)   # non-greedy up to the next ? is the charset
  \?                    # literal ?
  (?P<encoding>[qb])    # either a "q" or a "b", case insensitive
  \?                    # literal ?
  (?P<encoded>.*?)      # non-greedy up to the next ?= is the encoded string
  \?=                   # literal ?=
  ''', re.VERBOSE | re.IGNORECASE | re.MULTILINE)


With re.IGNORECASE and no re.ASCII, only 's' and 'i' match additional non-ASCII lowercase characters, and the [qb] class contains neither, so this is not a real bug.
But using re.ASCII would be safer.
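
The point about 'i' and 's' can be checked with a short script. This is only a sketch: non_ascii_matches and ecre_ascii are names made up for this illustration (they are not part of the email package), and the scan is limited to the Basic Multilingual Plane to keep it quick.

```python
import re


def non_ascii_matches(pattern_text, flags):
    """Return the non-ASCII BMP characters that fully match pattern_text.

    Hypothetical helper, written only for this illustration.
    """
    pattern = re.compile(pattern_text, flags)
    return [chr(cp) for cp in range(0x80, 0x10000) if pattern.fullmatch(chr(cp))]


# A class containing 'i' or 's' picks up extra non-ASCII characters
# when re.IGNORECASE is used without re.ASCII.
unicode_is = non_ascii_matches(r'[is]', re.IGNORECASE)
ascii_is = non_ascii_matches(r'[is]', re.IGNORECASE | re.ASCII)

# The class actually used in the pattern, [qb], is checked the same way.
unicode_qb = non_ascii_matches(r'[qb]', re.IGNORECASE)

print('[is] without re.ASCII:', [hex(ord(c)) for c in unicode_is])
print('[is] with re.ASCII:   ', [hex(ord(c)) for c in ascii_is])
print('[qb] without re.ASCII:', [hex(ord(c)) for c in unicode_qb])

# The same pattern with re.ASCII added (the change suggested here).
ecre_ascii = re.compile(r'''
  =\?                   # literal =?
  (?P<charset>[^?]*?)   # non-greedy up to the next ? is the charset
  \?                    # literal ?
  (?P<encoding>[qb])    # either a "q" or a "b", case insensitive
  \?                    # literal ?
  (?P<encoded>.*?)      # non-greedy up to the next ?= is the encoded string
  \?=                   # literal ?=
  ''', re.VERBOSE | re.IGNORECASE | re.MULTILINE | re.ASCII)

m = ecre_ascii.search('Subject: =?utf-8?Q?Hello_World?=')
print(m.group('charset'), m.group('encoding'), m.group('encoded'))
```

Adding re.ASCII leaves matching of ordinary encoded-words unchanged; it only makes the case-insensitive comparison independent of Unicode case rules.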

Additionally, email.utils has carried a copy of the same pattern for about 10 years, and nothing uses it.
It should be removed.

History
Date	User	Action	Args
2017-10-03 12:58:02	methane	set	recipients: + methane, ezio.melotti, mrabarnett
2017-10-03 12:58:02	methane	set	messageid: <1507035482.04.0.213398074469.issue31677@psf.upfronthosting.co.za>
2017-10-03 12:58:02	methane	link	issue31677 messages
2017-10-03 12:58:01	methane	create