Issue 4718: wsgiref package totally broken
Created on 2008-12-22 11:30 by hdima, last changed 2022-04-11 14:56 by admin. This issue is now closed.
Messages (45)
msg78171 - (view)
Author: Dmitry Vasiliev (hdima)
Date: 2008-12-22 11:30
Date: 2008-12-22 11:40
Date: 2008-12-22 15:01
Date: 2008-12-22 15:23
Date: 2008-12-22 15:37
Date: 2008-12-22 22:32
Date: 2008-12-22 22:39
Date: 2008-12-23 12:57
Date: 2008-12-23 13:43
Date: 2008-12-23 14:26
Date: 2008-12-23 16:08
Date: 2008-12-24 20:40
Date: 2008-12-26 17:47
Date: 2008-12-27 10:33
Date: 2008-12-27 11:14
Date: 2008-12-27 11:42
Date: 2008-12-31 15:15
Date: 2009-01-03 19:03
Date: 2009-01-03 19:56
Date: 2009-01-03 20:08
Date: 2009-01-03 20:26
Date: 2009-01-03 21:07
Date: 2009-01-03 23:01
Date: 2009-01-31 23:30
It seems the wsgiref package was copied from Python 2.* without any modifications. There are already 3 issues about that but they only describe a part of the problem so I decided to start a new one. The issues was: http://bugs.python.org/issue3348 http://bugs.python.org/issue3795 http://bugs.python.org/issue4522 The attached patch fix the problem with the following changes: - Fixed headers handling in wsgiref/simple_server.py; - Fixed encoding problems. Now WSGI applications must return iterable with bytes but start_response() allow status and headers as bytes and as strings. "wsgi.input" file-like now use BytesIO instead of StringIO; - Fixed tests; - Updated documentation examples;msg78172 - (view) Author: Antoine Pitrou (pitrou) *
Date: 2008-12-22 11:40
Phillip, do you have time to take a look at it? We really *must* fix wsgiref in py3k...msg78188 - (view) Author: Antoine Pitrou (pitrou) *
Date: 2008-12-22 15:01
FYI, instead of trying to do exhaustive type checking in _check_type(), you can just rely on duck typing and catch the TypeError: >>> str(b"a", "utf-8") 'a' >>> str(bytearray(b"a"), "utf-8") 'a' >>> str(memoryview(b"a"), "utf-8") 'a' >>> str(1, "utf-8") Traceback (most recent call last): File "<stdin>", line 1, in <module> TypeError: coercing to str: need string or buffer, int foundmsg78192 - (view) Author: PJ Eby (pje) *
Date: 2008-12-22 15:23
If you want to change to using bytes, you're going to have to take it to the Web-SIG and hash out a revision to PEP 333, which at the moment requires the use of strings, period. This has nothing to do with the desirability of bytes vs. strings; I am sure that if Python had bytes from day 1, bytes would've been the way to go with it. But simply changing the reference library is not the way to change the spec. In the meantime, as far as I'm aware, there are two other patches pending to address these issues, but I'm not in a position to assess their readiness/correctness since I have yet to even download Py3K. In principle, I approve their approaches, so if someone else can handle the code review, those fixes could in principle be put in without changing the PEP. But to put *this* patch in, the PEP would have to be changed.msg78193 - (view) Author: Antoine Pitrou (pitrou) *
Date: 2008-12-22 15:37
> If you want to change to using bytes, you're going to have to take it > to the Web-SIG and hash out a revision to PEP 333, which at the moment > requires the use of strings, period. What was called str in 2.x has become the bytes object in py3k. What was called unicode in 2.x has become str in py3k. (roughly) Given the meaning of the term "string" and its possible acceptions have dramatically changed between 2.x and py3k, how does this patch violate the PEP more than any other? Actually, the PEP says: HTTP does not directly support Unicode, and neither does this interface. All encoding/decoding must be handled by the application; all strings passed to or from the server must be standard Python *byte strings*, not Unicode objects. [emphasis mine] So, not accepting bytes in py3k is clearly a violation of the PEP!msg78194 - (view) Author: Dmitry Vasiliev (hdima) Date: 2008-12-22 16:03
Antoine Pitrou wrote: > FYI, instead of trying to do exhaustive type checking in _check_type(), > you can just rely on duck typing and catch the TypeError: Good point! I'll update the patch soon.msg78196 - (view) Author: Dmitry Vasiliev (hdima) Date: 2008-12-22 16:36
Antoine Pitrou wrote: >> If you want to change to using bytes, you're going to have to take it >> to the Web-SIG and hash out a revision to PEP 333, which at the moment >> requires the use of strings, period. > > What was called str in 2.x has become the bytes object in py3k. > What was called unicode in 2.x has become str in py3k. > (roughly) Agreed, moreover it's time for Python 3.0.1 and we need to decide - apply a patch or just remove wsgiref completely for now. Keep wsgiref just as nonworking piece of code is the worse solution which can made questionable all WSGI effort. Given that old str has been replaced by bytes in Python 3 I think the patch is a correct implementation of the PEP from the Python 3's point of view. To avoid confusion note about the meaning of the term *string* can be added to the PEP later.msg78207 - (view) Author: Dmitry Vasiliev (hdima) Date: 2008-12-22 18:36
New version of the patch: - Now only Unicode strings are allowed as status and headers because allowing bytes leads to big changes in wsgiref.validate and wsgiref.handlers;msg78214 - (view) Author: PJ Eby (pje) *
Date: 2008-12-22 22:32
At 03:37 PM 12/22/2008 +0000, Antoine Pitrou wrote: >So, not accepting bytes in py3k is clearly a violation of the PEP! On the contrary. Please read the two paragraphs *after* the one you quoted.msg78215 - (view) Author: PJ Eby (pje) *
Date: 2008-12-22 22:39
To be quite clear: this change requires discussion on the Web-SIG and an appropriate revision of the PEP. Ideally, the patch should include the necessary PEP revision. The Web-SIG discussion regarding a switch to bytes should also take into consideration the effects of running 2to3 on existing WSGI applications and/or servers. Will their code be converted to use bytes, or Unicode? The previous choice to use Unicode was based on source compatibility across Python implementations, so this shouldn't be thrown out on the basis of simple expediency.msg78229 - (view) Author: Dmitry Vasiliev (hdima) Date: 2008-12-23 11:14
OK, I've attached PEP-333 compatible fixes for wsgiref. I think there is only one problem remains: - wsgiref expects io.BytesIO as input and output streams because of http.server module. I didn't find any restrictions on data returned by read() method of the "wsgi.input" stream in the PEP. Maybe I've missed some details?msg78232 - (view) Author: Antoine Pitrou (pitrou) *
Date: 2008-12-23 12:57
> Please read the two paragraphs *after* the one you quoted. I don't see anything forbidding bytes objects in those two paragraphs.msg78235 - (view) Author: Antoine Pitrou (pitrou) *
Date: 2008-12-23 13:43
Le mardi 23 décembre 2008 à 11:15 +0000, Dmitry Vasiliev a écrit : > Dmitry Vasiliev <dima@hlabs.spb.ru> added the comment: > > OK, I've attached PEP-333 compatible fixes for wsgiref. I may be mistaken, but it seems that your patch forces iso-8859-1 encoding of http bodies.msg78236 - (view) Author: Dmitry Vasiliev (hdima) Date: 2008-12-23 14:07
Antoine Pitrou wrote:
> Le mardi 23 décembre 2008 à 11:15 +0000, Dmitry Vasiliev a écrit :
>> OK, I've attached PEP-333 compatible fixes for wsgiref.
>
> I may be mistaken, but it seems that your patch forces iso-8859-1
> encoding of http bodies.
No, just as PEP said str used as a container for binary data. For
example to return UTF-8 encoded data you can use the following code:
def app(environ, start_response):
...
return [data.encode("utf-8").decode("iso-8859-1")]
I don't like it but I guess it's strictly follow the PEP (actually I
didn't notice this sections before):
"""
On Python platforms where the str or StringType type is in fact
Unicode-based (e.g. Jython, IronPython, Python 3000, etc.), all
"strings" referred to in this specification must contain only code
points representable in ISO-8859-1 encoding (\u0000 through \u00FF,
inclusive). It is a fatal error for an application to supply strings
containing any other Unicode character or code point. Similarly, servers
and gateways must not supply strings to an application containing any
other Unicode characters.
Again, all strings referred to in this specification must be of type str
or StringType, and must not be of type unicode or UnicodeType. And, even
if a given platform allows for more than 8 bits per character in
str/StringType objects, only the lower 8 bits may be used, for any value
referred to in this specification as a "string".
"""
We definitely need to use bytes in the future but it requires PEP update
and some migration guide.
msg78237 - (view)
Author: Antoine Pitrou (pitrou) *
Date: 2008-12-23 14:26
> No, just as PEP said str used as a container for binary data.
This is clearly the wrong thing to do. The only (immutable) string-like
object adequate for binary data in py3k is bytes, not str.
I understand the desire to stick to the PEP, but the PEP was devised
even before the first py3k alphas, and it clearly wasn't written with
py3k in mind. For example the following sentence becomes nonsensical:
Again, all strings referred to in this specification must be of
type str or StringType, and must not be of type unicode or
UnicodeType.
since "str" objects *are* of type UnicodeType in py3k (and the C API is
still named PyUnicode_*)...
When a legal text becomes nonsensical wrt. reality, one has to adapt his
interpretation of the text to reality, not adapt reality to match the
nonsense.
In other words, wsgiref should accept/expose HTTP bodies as bytes, not
str. Confusing binary data with iso-8859-1 text is the kind of mess py3k
was designed to avoid.
msg78241 - (view)
Author: PJ Eby (pje) *
Date: 2008-12-23 16:08
Antoine, you have three choices here: 1. Follow the PEP, 2. Take it to the Web-SIG and get the appropriate discussion, consensus, and PEP revision, or 3. Drop wsgiref from the current release of Py3K until #2 can be done. Which would you prefer? Please note that your arguments regarding what revision should take place are irrelevant here; the correct forum for that discussion is the Web-SIG. Personally, I think they are valid arguments; WSGI simply did not have the benefit of having a sane (and standard!) "bytes" type available, and were we writing the spec today, I would absolutely have specified it in a bytes-oriented way, and treated older Pythons as the special case. However, we have to take into consideration how applications will be *migrated* to Py3K. I am not an expert in this, nor do I personally have huge volumes of code that will be migrated. Therefore, the correct forum for discussing migration impact and how best to write the spec is the Web-SIG. Making the change to bytes is not something to be undertaken on a whim: the spec must include how to handle inadvertent mixing of bytes and unicode, in order to allow unambiguous error handling and migration support. It will not be solved by the fiat of one individual: certainly not by you or I. And it has absolutely nothing to do with what is "right" in the technical sense, because it is not a technical problem. A specification is a social construct, not a technical one, so changing wsgiref by itself solves nothing. And that's why those three choices are the only available options, so far as I am aware.msg78251 - (view) Author: Graham Dumpleton (grahamd) Date: 2008-12-23 22:47
Note that the page: http://www.wsgi.org/wsgi/Amendments_1.0 contains clarifications for WSGI PEP in respect of Python 3.0. This list was previously come up with on WEB-SIG list. As another reference implementation for Python 3.0, you might look at mod_wsgi (source code from subversion trunk), as that has been updated to support Python 3.0 in line with those list of proposed clarifications for WSGI PEP.msg78261 - (view) Author: Dmitry Vasiliev (hdima) Date: 2008-12-24 14:36
Attached new WSGI 1.0+ version of the patch.msg78269 - (view) Author: PJ Eby (pje) *
Date: 2008-12-24 20:40
Graham: thanks for pointing that out; I completely forgot we already *had* the migration discussion on the Web-SIG! It just slipped my mind because I didn't have any 3.0 work on the horizon. Dmitry: A question about the new patch. Are bytearray and memoryview objects in 3.0 really the same as bytestrings? It seems to me that allowing mutable bytes objects is a mistake from a bug-finding standpoint, even if it could be a win from a performance standpoint. I think it might be better to be more restrictive to start out, and then let people lobby for supporting other types, rather than the other way around, where we'll never get to narrow the list. Apart from that, the patch looks pretty good. Thank you!msg78279 - (view) Author: Dmitry Vasiliev (hdima) Date: 2008-12-25 09:35
Phillip J. Eby wrote: > Graham: thanks for pointing that out; I completely forgot we already > *had* the migration discussion on the Web-SIG! It just slipped my > mind because I didn't have any 3.0 work on the horizon. Good to see we came to conclusion. Actually my first patch went in the right direction. :-) > Dmitry: A question about the new patch. Are bytearray and memoryview > objects in 3.0 really the same as bytestrings? It seems to me that > allowing mutable bytes objects is a mistake from a bug-finding > standpoint, even if it could be a win from a performance > standpoint. I think it might be better to be more restrictive to > start out, and then let people lobby for supporting other types, > rather than the other way around, where we'll never get to narrow the > list. Apart from that, the patch looks pretty good. Thank you! Actually I thought about functionality, not performance but I think you're right and mutable bytes objects also can open doors for unexpected side effects. I'll update the patch today.msg78280 - (view) Author: Dmitry Vasiliev (hdima) Date: 2008-12-25 10:41
Attached updated version of the patch.msg78292 - (view) Author: Graham Dumpleton (grahamd) Date: 2008-12-26 10:54
If making changes in wsgireg.validate, may be worthwhile also fixing up one area where it isn't strictly correct according to WSGI PEP. As per discussion: http://groups.google.com/group/python-web-sig/browse_frm/thread/b14b862ec4c620c0 the check for number of arguments supplied to wsgi.input.read() is wrong as it allows for an optional argument, when argument is supposed to mandatory.msg78294 - (view) Author: Dmitry Vasiliev (hdima) Date: 2008-12-26 12:47
Graham Dumpleton wrote: > the check for number of arguments supplied to wsgi.input.read() is wrong as it allows for an optional argument, > when argument is supposed to mandatory. I think it's a good idea. I'll update the patch soon.msg78298 - (view) Author: Dmitry Vasiliev (hdima) Date: 2008-12-26 13:42
Added check for wsgi.input.read() argument.msg78304 - (view) Author: Antoine Pitrou (pitrou) *
Date: 2008-12-26 17:47
Hi, The patch looks ok to me, although the tests against mutable byte-like types are probably useless.msg78348 - (view) Author: Dmitry Vasiliev (hdima) Date: 2008-12-27 10:30
Antoine Pitrou wrote: > The patch looks ok to me, although the tests against mutable byte-like > types are probably useless. Hmm, it's strange because such tests was removed two versions ago (per discussion with Phillip). But at the time they really was needed.msg78349 - (view) Author: Antoine Pitrou (pitrou) *
Date: 2008-12-27 10:33
> Hmm, it's strange because such tests was removed two versions ago (per > discussion with Phillip). But at the time they really was needed. Not a big deal anyway, let's keep them and we'll see.msg78351 - (view) Author: Dmitry Vasiliev (hdima) Date: 2008-12-27 11:12
Antoine Pitrou wrote: >> Hmm, it's strange because such tests was removed two versions ago (per >> discussion with Phillip). But at the time they really was needed. > > Not a big deal anyway, let's keep them and we'll see. I'm afraid I've lost your point here. Are you proposing to add back tests for mutable bytes-like objects?msg78352 - (view) Author: Antoine Pitrou (pitrou) *
Date: 2008-12-27 11:14
Why do you say they were removed? I see code like "assert isinstance(value, bytes)".msg78353 - (view) Author: Dmitry Vasiliev (hdima) Date: 2008-12-27 11:34
Antoine Pitrou wrote: > Why do you say they were removed? I see code like "assert > isinstance(value, bytes)". Support and tests for mutable "bytearray" and "memoryview" was removed. All subclasses of "bytes" must be immutable so isinstance() should be OK here.msg78354 - (view) Author: Antoine Pitrou (pitrou) *
Date: 2008-12-27 11:42
Ok, sorry for the misunderstanding.msg78612 - (view) Author: Antoine Pitrou (pitrou) *
Date: 2008-12-31 15:15
Philip, Graham, do you have any objections to the current patch? Otherwise I think I'm gonna commit it soon.msg78738 - (view) Author: Graham Dumpleton (grahamd) Date: 2009-01-01 23:20
One interesting thing of note that has occurred to me looking at the patch
is that although with Python <3.0 you technically could return a str as
iterable from application, ie., because iteration over str returns str for
each character, the same doesn't really apply to bytes in Python 3.0. This
is because iterating over bytes yields an int fdor each item.
Thus have odd situation where with Python 3.0, one could technically return
str as iterable, with rule that would apply would be that each str returned
would then be converted to bytes by way of latin-1 conversion, but for
bytes returned as iterable, should fail.
Not sure how this plays out in wsgiref server yet as haven't looked.
Anyway, make the validator code:
@@ -426,6 +436,6 @@
# Technically a string is legal, which is why it's a really bad
# idea, because it may cause the response to be returned
# character-by-character
- assert_(not isinstance(iterator, str),
+ assert_(not isinstance(iterator, (str, bytes)),
"You should not return a string as your application iterator, "
"instead return a single-item list containing that string.")
quite a good thing to have.
msg78787 - (view)
Author: Dmitry Vasiliev (hdima)
Date: 2009-01-02 12:22
Graham Dumpleton wrote: > Thus have odd situation where with Python 3.0, one could technically return > str as iterable, with rule that would apply would be that each str returned > would then be converted to bytes by way of latin-1 conversion, but for > bytes returned as iterable, should fail. > > Not sure how this plays out in wsgiref server yet as haven't looked. If application return bytes instead of an iterable AssertionError will be raised in handlers.BaseHandler.write().msg78990 - (view) Author: Antoine Pitrou (pitrou) *
Date: 2009-01-03 19:03
It is committed now in py3k and the 3.0 maintenance branch. Thanks all for your participation!msg78996 - (view) Author: Antoine Pitrou (pitrou) *
Date: 2009-01-03 19:56
Reopening, the patch actually produces failures when run with "python -bb", that is there are comparisons between str and bytes. See the errors at the end of http://www.python.org/dev/buildbot/3.x.stable/ppc%20Debian%20unstable%203.0/builds/26/step-test/0msg78997 - (view) Author: Antoine Pitrou (pitrou) *
Date: 2009-01-03 20:08
People, does this patch look ok to you?msg78999 - (view) Author: Dmitry Vasiliev (hdima) Date: 2009-01-03 20:24
Antoine Pitrou wrote: > People, does this patch look ok to you? Oh, didn't know about -bb. The patch looks OK for me.msg79000 - (view) Author: Antoine Pitrou (pitrou) *
Date: 2009-01-03 20:26
Le samedi 03 janvier 2009 à 20:24 +0000, Dmitry Vasiliev a écrit : > Dmitry Vasiliev <dima@hlabs.spb.ru> added the comment: > > Antoine Pitrou wrote: > > People, does this patch look ok to you? > > Oh, didn't know about -bb. Well, it's meant to catch potential bugs. str and bytes always compare unequal in py3k (i.e. "a" != b"a"), so it's good to use -bb when porting some stuff from 2.x. There's another problem in that buildbot failure with the environment variable "NO_PROXY". We'll see if it's still there after the patch.msg79002 - (view) Author: Dmitry Vasiliev (hdima) Date: 2009-01-03 20:59
Antoine Pitrou wrote: > There's another problem in that buildbot failure with the environment > variable "NO_PROXY". We'll see if it's still there after the patch. Strange error and it seems there is only part of the traceback. I've already seen such "partially displayed" Python 3 traceback and error actually can be in very different place.msg79003 - (view) Author: Antoine Pitrou (pitrou) *
Date: 2009-01-03 21:07
> Strange error and it seems there is only part of the traceback. I've > already seen such "partially displayed" Python 3 traceback and error > actually can be in very different place. If you can reproduce such a problem, please open a bug.msg79005 - (view) Author: Dmitry Vasiliev (hdima) Date: 2009-01-03 21:14
Antoine Pitrou wrote: >> Strange error and it seems there is only part of the traceback. I've >> already seen such "partially displayed" Python 3 traceback and error >> actually can be in very different place. > > If you can reproduce such a problem, please open a bug. OK, I'll try to reproduce.msg79013 - (view) Author: Dmitry Vasiliev (hdima) Date: 2009-01-03 21:57
Attached patch for test_urllib, possible source of the "NO_PROXY" problem.msg79020 - (view) Author: Antoine Pitrou (pitrou) *
Date: 2009-01-03 23:01
Nice catch! I've committed the two patches and we'll see whether it makes the buildbots feel better.msg80889 - (view) Author: Benjamin Peterson (benjamin.peterson) *
Date: 2009-01-31 23:30
I assume the buildbots were placated?