Issue19094
Created on 2013-09-25 20:55 by jaraco, last changed 2022-04-11 14:57 by admin.
| Files | ||||
|---|---|---|---|---|
| File name | Uploaded | Description | Edit | |
| urljoin_throws_type_error.patch | vajrasky, 2013-09-26 10:12 | review | ||
| urljoin_throws_type_error_v2.patch | vajrasky, 2013-09-28 09:02 | review | ||
| urljoin_throws_type_error_v3.patch | vajrasky, 2013-10-01 14:11 | review | ||
| Pull Requests | |||
|---|---|---|---|
| URL | Status | Linked | Edit |
| PR 26687 | open | jacobtylerwalls, 2021-06-12 01:47 | |
| Messages (12) | |||
|---|---|---|---|
| msg198418 - (view) | Author: Jason R. Coombs (jaraco) * ![]() |
Date: 2013-09-25 20:55 | |
>>> import urllib.parse
>>> urllib.parse.urljoin('foo', [])
'foo'
That should raise a TypeError, not succeed quietly as if an empty string were passed.
|
|||
| msg198435 - (view) | Author: Vajrasky Kok (vajrasky) * | Date: 2013-09-26 10:12 | |
This is the preliminary patch to raise TypeError in that situation. |
|||
| msg198455 - (view) | Author: Jason R. Coombs (jaraco) * ![]() |
Date: 2013-09-26 22:59 | |
Thanks Vajraski for the patch (especially the tests). A colleague reminded me of an aphorism by Raymond Hettinger from the recent PyCon paraphrased: duck typing is superior to isinstance. Maybe instead consider something like this: diff -r 7f13d5ecf71f Lib/urllib/parse.py --- a/Lib/urllib/parse.py Sun Sep 22 09:33:45 2013 -0400 +++ b/Lib/urllib/parse.py Thu Sep 26 18:52:18 2013 -0400 @@ -405,10 +405,9 @@ def urljoin(base, url, allow_fragments=True): """Join a base URL and a possibly relative URL to form an absolute interpretation of the latter.""" - if not base: - return url - if not url: - return base + if not base or not url: + # one of the inputs is empty, so simply concatenate + return base + url base, url, _coerce_result = _coerce_args(base, url) bscheme, bnetloc, bpath, bparams, bquery, bfragment = \ urlparse(base, '', allow_fragments) This implementation is not only shorter and raises TypeErrors if the types aren't compatible, but those errors are triggered by the underlying implementation. Furthermore, if either the url or base were to be a duck-type that met the necessary interface, it need not be a string subclass. I don't feel strongly either way, but I'm slightly more inclined to accept the simpler implementation. I'm interested in the everyone's thoughts on either approach. |
|||
| msg198462 - (view) | Author: Vajrasky Kok (vajrasky) * | Date: 2013-09-27 05:21 | |
What about this one? >>> urljoin(['a'], []) ['a'] >>> urljoin(['a'], ['b']) ..... omitted ...... AttributeError: 'list' object has no attribute 'decode' Is this desirable? Both patches missed this case. Should we only accept str and bytes? |
|||
| msg198493 - (view) | Author: Jason R. Coombs (jaraco) * ![]() |
Date: 2013-09-27 17:40 | |
I don't mind the AttributeError - that's a reasonable exception when passing invalid types in, and that's in fact the current behavior. The example of (['a'], []) does bother me though. Those inputs are also seemingly invalid, though somewhat more compatible from a 'urljoin' perspective than two different types. You've given me reason to re-consider your patch. |
|||
| msg198510 - (view) | Author: Vajrasky Kok (vajrasky) * | Date: 2013-09-28 09:02 | |
I modified the patch to handle the last case using your way as well.
Anyway, I found out that urlsplit and urlparse got the same issue as well.
>>> urlparse('python.org', b'http://')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/sky/Code/python/programming_language/cpython/Lib/urllib/parse.py", line 292, in urlparse
url, scheme, _coerce_result = _coerce_args(url, scheme)
File "/home/sky/Code/python/programming_language/cpython/Lib/urllib/parse.py", line 109, in _coerce_args
raise TypeError("Cannot mix str and non-str arguments")
TypeError: Cannot mix str and non-str arguments
>>> urlparse('python.org', b'')
ParseResult(scheme=b'', netloc='', path='python.org', params='', query='', fragment='')
>>> urlparse('python.org', 0)
ParseResult(scheme=0, netloc='', path='python.org', params='', query='', fragment='')
Same thing happens in urlsplit. Fortunately, urlunsplit and urlunparse don't have this issue.
|
|||
| msg198750 - (view) | Author: Senthil Kumaran (orsenthil) * ![]() |
Date: 2013-10-01 05:07 | |
I have provided my comments in the review tool. Please check them out. |
|||
| msg198784 - (view) | Author: Vajrasky Kok (vajrasky) * | Date: 2013-10-01 14:11 | |
Okay, attached the patch based on your comment, Senthil Kumaran. Thanks.
The reason I use isinstance to check the type is because I want to support the inheritance as much as possible.
>>> class new_str(str):
... pass
>>> urljoin(new_str('http://python.org') + new_str('hehe'))
will not work with the newest patch.
Also, in Lib/urllib/parse.py, most of the time, we use isinstance(x, str) not type(x) == str.
But I don't know. Maybe it does not matter.
|
|||
| msg198808 - (view) | Author: Jason R. Coombs (jaraco) * ![]() |
Date: 2013-10-01 21:14 | |
Vajrasky, you're right. Comparing against type(obj) is an anti-pattern. isinstance is better. Duck typing is even better (in many cases). |
|||
| msg230742 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * ![]() |
Date: 2014-11-06 13:50 | |
About acceptable behavior with wrong arguments types see discussions in issue22766 and http://comments.gmane.org/gmane.comp.python.devel/150073 . |
|||
| msg386875 - (view) | Author: Irit Katriel (iritkatriel) * ![]() |
Date: 2021-02-12 20:01 | |
Still broken in 3.10:
Running Release|x64 interpreter...
Python 3.10.0a5+ (heads/master:bf2e7e55d7, Feb 11 2021, 23:09:25) [MSC v.1928 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import urllib.parse
>>> urllib.parse.urljoin('foo', [])
'foo'
>>>
|
|||
| msg395598 - (view) | Author: Jacob Walls (jacobtylerwalls) * | Date: 2021-06-11 03:57 | |
Hi vajrasky, do you have any interest in converting your patch to a GitHub PR? If not I can see about doing so myself. Cheers. |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:57:51 | admin | set | github: 63293 |
| 2021-06-12 01:47:56 | jacobtylerwalls | set | pull_requests: + pull_request25276 |
| 2021-06-11 10:49:11 | iritkatriel | set | versions: + Python 3.11, - Python 3.8 |
| 2021-06-11 03:57:18 | jacobtylerwalls | set | nosy:
+ jacobtylerwalls messages: + msg395598 |
| 2021-02-12 20:01:14 | iritkatriel | set | nosy:
+ iritkatriel messages:
+ msg386875 |
| 2014-11-06 13:50:51 | serhiy.storchaka | set | nosy:
+ serhiy.storchaka messages:
+ msg230742 |
| 2013-10-01 21:14:06 | jaraco | set | messages: + msg198808 |
| 2013-10-01 14:11:39 | vajrasky | set | files:
+ urljoin_throws_type_error_v3.patch messages: + msg198784 |
| 2013-10-01 05:07:45 | orsenthil | set | messages: + msg198750 |
| 2013-09-28 21:28:10 | terry.reedy | set | nosy:
+ orsenthil type: behavior |
| 2013-09-28 09:02:46 | vajrasky | set | files:
+ urljoin_throws_type_error_v2.patch messages: + msg198510 |
| 2013-09-27 17:40:02 | jaraco | set | messages: + msg198493 |
| 2013-09-27 05:21:49 | vajrasky | set | messages: + msg198462 |
| 2013-09-26 22:59:51 | jaraco | set | messages: + msg198455 |
| 2013-09-26 10:12:03 | vajrasky | set | files:
+ urljoin_throws_type_error.patch nosy:
+ vajrasky keywords: + patch |
| 2013-09-26 08:36:24 | berker.peksag | set | nosy:
+ berker.peksag |
| 2013-09-25 20:55:41 | jaraco | create | |
