I'm changing the name to better describe the problem and suggesting a better solution.
The urlparse.urlsplit() and urlparse.urlunsplit() functions currently don't validate the scheme, if one is given. According to RFC 1738:
Scheme names consist of a sequence of characters. The lower case
letters "a"--"z", digits, and the characters plus ("+"), period
("."), and hyphen ("-") are allowed. For resiliency, programs
interpreting URLs should treat upper case letters as equivalent to
lower case in scheme names (e.g., allow "HTTP" as well as "http").
https://www.ietf.org/rfc/rfc1738.txt
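A quick way to see the missing validation is the snippet below. It is a sketch that uses Python 3's urllib.parse, where the urlparse functions now live; the 'example.com' values are just illustrative inputs. A scheme passed with its "://" separator is accepted unchanged, and urlunsplit() then adds its own separator after it.

```python
from urllib.parse import urlsplit, urlunsplit

# The URL has no scheme of its own, so the (malformed) default is used as-is.
parts = urlsplit("example.com/path", scheme="http://")
print(parts.scheme)  # http://

# urlunsplit() puts its own ":" and "//" after the bogus scheme.
joined = urlunsplit(("http://", "example.com", "/", "", ""))
print(joined)  # http://://example.com/
```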
If a scheme is given, I suggest validating it and normalising it to lowercase, something like this:
    # untested
    if scheme:
        # scheme_chars is already defined in the module
        badchars = set(scheme) - set(scheme_chars)
        if badchars:
            raise ValueError('"%c" is invalid in URL schemes' % badchars.pop())
        # schemes are case-insensitive (RFC 1738), so normalise to lowercase
        scheme = scheme.lower()
This would catch mistakes such as passing 'http://' as the scheme, since ':' and '/' are not valid scheme characters.
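To show the proposal end to end outside the module, here is a standalone sketch on top of Python 3's urllib.parse. The helper names normalise_scheme() and checked_urlunsplit() are hypothetical, and SCHEME_CHARS is a local stand-in for the module's scheme_chars, built from the characters RFC 1738 allows (plus upper case, which is lowercased afterwards).

```python
import string
from urllib.parse import urlunsplit

# Characters RFC 1738 allows in a scheme name; upper case is accepted
# because the scheme is normalised to lower case afterwards.
SCHEME_CHARS = frozenset(string.ascii_letters + string.digits + "+-.")


def normalise_scheme(scheme):
    """Validate a URL scheme and return it in lower case (hypothetical helper)."""
    if not scheme:
        return scheme  # an empty scheme means "no scheme", so leave it alone
    badchars = set(scheme) - SCHEME_CHARS
    if badchars:
        # Report the first offending character so the message is deterministic.
        first_bad = next(c for c in scheme if c in badchars)
        raise ValueError("%r is invalid in URL schemes" % first_bad)
    return scheme.lower()


def checked_urlunsplit(components):
    """urlunsplit() with the proposed scheme check applied first (hypothetical)."""
    scheme, netloc, path, query, fragment = components
    return urlunsplit((normalise_scheme(scheme), netloc, path, query, fragment))


print(checked_urlunsplit(("HTTP", "example.com", "/", "", "")))  # http://example.com/

try:
    checked_urlunsplit(("http://", "example.com", "/", "", ""))
except ValueError as exc:
    print(exc)  # ':' is invalid in URL schemes
```

Reporting the first bad character in input order, rather than an arbitrary set.pop() result, keeps the error message stable for the same input.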