Issue29651
Created on 2017-02-25 19:45 by vfaronov, last changed 2022-04-11 14:58 by admin. This issue is now closed.
| Pull Requests | |||
|---|---|---|---|
| URL | Status | Linked | Edit |
| PR 1128 | merged | python-dev, 2017-04-14 05:38 | |
| PR 1596 | merged | orsenthil, 2017-05-16 05:01 | |
| PR 1597 | merged | orsenthil, 2017-05-16 05:09 | |
| Messages (6) | |||
|---|---|---|---|
| msg288577 | Author: Vasiliy Faronov (vfaronov) | Date: 2017-02-25 19:45 | |
There is a problem with the standard library's urlsplit and urlparse functions, in Python 2.7 (module urlparse) and 3.2+ (module urllib.parse).
The documentation for these functions [1] does not explain how they behave when given an invalid URL.
One could try invoking them manually and conclude that they tolerate anything thrown at them:
>>> urlparse('http:////::\\\\!!::!!++///')
ParseResult(scheme='http', netloc='', path='//::\\\\!!::!!++///',
params='', query='', fragment='')
>>> urlparse(os.urandom(32).decode('latin-1'))
ParseResult(scheme='', netloc='', path='\x7f¼â1gdä»6\x82', params='',
query='', fragment='\n\xadJ\x18+fli\x9cÛ\x9ak*ÄÅ\x02³F\x85Ç\x18')
Without studying the source code, it is impossible to know that there is a very narrow class of inputs on which they raise ValueError [2]:
>>> urlparse('http://[')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.5/urllib/parse.py", line 295, in urlparse
    splitresult = urlsplit(url, scheme, allow_fragments)
  File "/usr/lib/python3.5/urllib/parse.py", line 345, in urlsplit
    raise ValueError("Invalid IPv6 URL")
ValueError: Invalid IPv6 URL
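A small probe makes it easier to see which inputs trip this check. This is a sketch: the sample URLs and the `probe` helper are illustrative assumptions, not part of urllib.parse. In the source linked in [2], the check rejects a netloc that contains one square bracket without the other. Later Python releases added further netloc validation, so a newer interpreter may reject more inputs than this list shows.

```python
from urllib.parse import urlsplit

# Illustrative inputs (assumed for this sketch): two accepted, two rejected.
samples = [
    "http:////::\\\\!!::!!++///",  # odd-looking, but accepted (empty netloc)
    "http://[::1]/",                # balanced brackets around an IPv6 literal
    "http://[",                     # '[' without a matching ']'
    "http://]",                     # ']' without a matching '['
]


def probe(url: str) -> str:
    """Hypothetical helper: 'ok' if urlsplit accepts url, else the error text."""
    try:
        urlsplit(url)
        return "ok"
    except ValueError as exc:
        return f"ValueError: {exc}"


for url in samples:
    print(repr(url), "->", probe(url))
```

Because urlparse calls urlsplit internally (as the traceback shows), the same inputs make urlparse raise as well.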
This could be viewed as a documentation issue. But it could also be viewed as an implementation issue. Instead of raising ValueError on those square brackets, urlsplit could simply consider them *invalid* parts of an RFC 3986 reg-name, and lump them into netloc, as it already does with other *invalid* characters:
>>> urlparse('http://\0\0æí\n/')
ParseResult(scheme='http', netloc='\x00\x00æí\n', path='/', params='',
query='', fragment='')
Note that the raising behavior was introduced in Python 2.7/3.2.
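Because this behavior has existed since 2.7/3.2, code that parses untrusted URLs may prefer to handle the exception explicitly. This is a minimal sketch, assuming that mapping an unparseable URL to None suits the caller; `safe_urlsplit` is a hypothetical helper, not an API of urllib.parse.

```python
from typing import Optional
from urllib.parse import SplitResult, urlsplit


def safe_urlsplit(url: str) -> Optional[SplitResult]:
    """Hypothetical helper: like urlsplit, but returns None instead of raising.

    urlsplit raises ValueError on some malformed netlocs, for example
    one containing an unbalanced '[' or ']'.
    """
    try:
        return urlsplit(url)
    except ValueError:
        return None


print(safe_urlsplit("http://["))              # None
print(safe_urlsplit("http://example.com/a"))  # a SplitResult with netloc 'example.com'
```

Returning None makes the failure case explicit at the call site. Raising an application-specific exception instead would be an equally reasonable design.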
See also issue 8721 [3].
[1] https://docs.python.org/3/library/urllib.parse.html
[2] https://github.com/python/cpython/blob/e32ec93/Lib/urllib/parse.py#L406-L408
[3] http://bugs.python.org/issue8721
|
|||
| msg288959 | Author: Raymond Hettinger (rhettinger) * | Date: 2017-03-04 05:15 | |
A note in the docs would be useful. This API is far too well established to make any behavioral changes at this point. |
|||
| msg291640 | Author: Howie Benefiel (Howie Benefiel) * | Date: 2017-04-14 05:07 | |
I'm going to make a note in the documentation. I should have a PR for it in about 1 day. |
|||
| msg293748 | Author: Senthil Kumaran (orsenthil) * | Date: 2017-05-16 04:48 | |
New changeset f6e863d868a621594df2a8abe072b5d4766e7137 by Senthil Kumaran (Howie Benefiel) in branch 'master': bpo-29651 - Cover edge case of square brackets in urllib docs (#1128) https://github.com/python/cpython/commit/f6e863d868a621594df2a8abe072b5d4766e7137 |
|||
| msg293750 | Author: Senthil Kumaran (orsenthil) * | Date: 2017-05-16 05:41 | |
New changeset 72e5aa1ef812358b3b113e784e7365fec13dfd69 by Senthil Kumaran in branch '3.5': bpo-29651 - Cover edge case of square brackets in urllib docs (#1128) (#1597) https://github.com/python/cpython/commit/72e5aa1ef812358b3b113e784e7365fec13dfd69 |
|||
| msg293751 | Author: Senthil Kumaran (orsenthil) * | Date: 2017-05-16 05:41 | |
New changeset 75b8a54bcad70806d9dcbbe20786f4d9092ab39c by Senthil Kumaran in branch '3.6': bpo-29651 - Cover edge case of square brackets in urllib docs (#1128) (#1596) https://github.com/python/cpython/commit/75b8a54bcad70806d9dcbbe20786f4d9092ab39c |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:58:43 | admin | set | github: 73837 |
| 2017-05-16 05:50:14 | orsenthil | set | status: open -> closed resolution: fixed stage: patch review -> resolved |
| 2017-05-16 05:41:10 | orsenthil | set | messages: + msg293751 |
| 2017-05-16 05:41:05 | orsenthil | set | messages: + msg293750 |
| 2017-05-16 05:09:39 | orsenthil | set | pull_requests: + pull_request1691 |
| 2017-05-16 05:01:30 | orsenthil | set | pull_requests: + pull_request1690 |
| 2017-05-16 04:48:18 | orsenthil | set | messages: + msg293748 |
| 2017-04-15 00:23:13 | berker.peksag | set | stage: needs patch -> patch review versions: + Python 3.5 |
| 2017-04-14 05:38:02 | python-dev | set | pull_requests: + pull_request1263 |
| 2017-04-14 05:07:23 | Howie Benefiel | set | nosy: + Howie Benefiel messages: + msg291640 |
| 2017-03-04 05:15:17 | rhettinger | set | nosy: + rhettinger messages: + msg288959 |
| 2017-03-03 21:21:09 | terry.reedy | set | nosy: + orsenthil stage: needs patch versions: - Python 3.3, Python 3.4, Python 3.5 |
| 2017-02-25 19:45:27 | vfaronov | create | |
