Issue 34360: urllib.parse doesn't fully comply to RFC 3986


Created on 2018-08-08 16:02 by The Compiler, last changed 2022-04-11 14:59 by admin.

Messages (2)
msg323292 - Author: Florian Bruhin (The Compiler) - Date: 2018-08-08 16:02
Since bpo-29651, the urllib.parse docs say:

> Unmatched square brackets in the netloc attribute will raise a ValueError.

However, when there is at least one [ and one ] but they don't match, no ValueError is raised and the behavior is somewhat inconsistent:

>>> urllib.parse.urlparse('http://[::1]]').hostname
'::1'
>>> urllib.parse.urlparse('http://[[::1]').hostname
'[::1'
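Both netlocs above contain a `[` and a `]`, so a check that only asks whether each bracket character is present lets them through. The sketch below contrasts such a presence-only test (an assumption about what the documented ValueError check amounts to, not a copy of CPython's source) with a stricter, hypothetical `brackets_well_formed` helper. That helper requires exactly one `[` as the first character of the host and exactly one `]` followed by nothing or a `:port`. Neither function is part of `urllib.parse`.

```python
def presence_only_flags(netloc: str) -> bool:
    """Flag a netloc only when exactly one kind of square bracket is present.

    Illustrative presence-only test: '[::1]]' and '[[::1]' both contain
    '[' and ']', so neither is flagged.
    """
    return ("[" in netloc) != ("]" in netloc)


def brackets_well_formed(netloc: str) -> bool:
    """Hypothetical stricter check on the host[:port] part of a netloc.

    Accepts either no brackets at all, or exactly one '[' as the first
    character and exactly one ']' followed by nothing or ':port'.
    """
    host_port = netloc.rpartition("@")[2]  # drop any userinfo
    opens, closes = host_port.count("["), host_port.count("]")
    if opens == 0 and closes == 0:
        return True  # ordinary host name or IPv4 address, no brackets involved
    if opens != 1 or closes != 1 or not host_port.startswith("["):
        return False
    rest = host_port[host_port.index("]") + 1:]
    return rest == "" or rest.startswith(":")


for netloc in ("[::1]]", "[[::1]", "[::1]:8080", "example.com"):
    print(f"{netloc!r}: presence-only flags={presence_only_flags(netloc)}, "
          f"well-formed={brackets_well_formed(netloc)}")
```

Counting brackets and checking their positions is what distinguishes the two netlocs from the report from a valid `[::1]:8080`. A presence test alone cannot tell them apart.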
msg323362 - Author: Ivan Pozdeev (Ivan.Pozdeev) - Date: 2018-08-10 09:28
I can confirm that this violates https://tools.ietf.org/html/rfc3986#section-3.2.2.

URLs are now covered by RFC 3986, which obsoletes RFC 1808, the RFC that urllib's documentation refers to.

This newer RFC adds "[" and "]" to the reserved characters, so their presence unquoted anywhere reserved characters are not allowed should be treated as a parsing error.
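For reference, RFC 3986 section 3.2.2 only uses brackets as the delimiters of an IP-literal (`IP-literal = "[" ( IPv6address / IPvFuture ) "]"`), while `reg-name`, which syntactically also covers IPv4 addresses, admits neither `[` nor `]`. Below is a minimal sketch of a host validator following that grammar. `is_valid_rfc3986_host` is a hypothetical helper, not part of `urllib.parse`. It assumes userinfo and port have already been stripped and relies on the standard `ipaddress` module to approximate the `IPv6address` rule. It also rejects `%` zone IDs, which RFC 3986 does not define (RFC 6874 added them later).

```python
import ipaddress
import re

# Character classes transcribed from RFC 3986, section 2.
_UNRESERVED = r"A-Za-z0-9\-._~"
_SUB_DELIMS = r"!$&'()*+,;="
# reg-name = *( unreserved / pct-encoded / sub-delims ); note: no "[" or "]".
_REG_NAME = re.compile(rf"(?:[{_UNRESERVED}{_SUB_DELIMS}]|%[0-9A-Fa-f]{{2}})*")
# IPvFuture = "v" 1*HEXDIG "." 1*( unreserved / sub-delims / ":" )
_IPV_FUTURE = re.compile(rf"[vV][0-9A-Fa-f]+\.[{_UNRESERVED}{_SUB_DELIMS}:]+")


def is_valid_rfc3986_host(host: str) -> bool:
    """Return True if `host` matches the RFC 3986 `host` rule (sketch)."""
    if host.startswith("["):
        # IP-literal: must close with "]" and contain no further brackets.
        if not host.endswith("]"):
            return False
        inner = host[1:-1]
        if "[" in inner or "]" in inner:
            return False
        if _IPV_FUTURE.fullmatch(inner):
            return True
        if "%" in inner:
            # Zone IDs are not part of RFC 3986; recent ipaddress versions
            # accept them, so reject them explicitly before parsing.
            return False
        try:
            ipaddress.IPv6Address(inner)
        except ValueError:
            return False
        return True
    # IPv4address is a syntactic subset of reg-name, so one check covers both;
    # any stray "[" or "]" outside an IP-literal fails here.
    return _REG_NAME.fullmatch(host) is not None


for host in ("[::1]", "[::1]]", "[[::1]", "example.com", "[v1.fe80::a+en1]"):
    print(host, is_valid_rfc3986_host(host))
```

Under this grammar both hosts from the original report are rejected, because the surplus bracket is never legal outside the single pair that delimits an IP-literal.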
History
Date                 User                Action  Args
2022-04-11 14:59:04  admin               set     github: 78541
2020-08-07 06:49:54  Jeffrey.Kintscher   set     nosy: + Jeffrey.Kintscher
2020-06-29 17:13:36  RJ722               set     nosy: + RJ722
2018-08-10 09:28:00  Ivan.Pozdeev        set     versions: + Python 3.6, Python 3.7, Python 3.8
                                                 nosy: + Ivan.Pozdeev
                                                 title: urllib.parse doesn't fail with multiple unmatching square brackets -> urllib.parse doesn't fully comply to RFC 3986
                                                 messages: + msg323362
2018-08-09 07:45:55  xtreak              set     nosy: + xtreak
2018-08-08 16:02:35  The Compiler        create