BOM should be ignored by Python
Mark Hammond
mhammond at skippinet.com.au
Mon May 1 20:21:44 EDT 2000
More information about the Python-list mailing list
"Neil Hodgson" <neilh at scintilla.org> wrote in message
news:YBoP4.9095$v85.58388 at news-server.bigpond.net.au...

Hi Neil...

> Unicode files may contain an initial Byte Order Mark to describe the way
> that the file is encoded. In UTF-8 this is the byte sequence EF BB BF. One
> current editor, the Win2K version of Notepad adds this BOM to the front of
> files saved as UTF-8. I would like to see the Python interpreter accept but
> ignore this at the start of a file. The current behaviour is to throw a
> SyntaxError.

I believe this was discussed on python-dev, and decided that Python itself
should not handle BOM markers at all - simply leave them to the app.

It would be a little painful to change the Python file read semantics to
handle this only when reading the first 2 bytes of a disk-based file.
Further, Python would need to maintain the BOM read for a particular stream,
so it can be applied to later, potentially disjointed reads of the file.

So it was decided that this is purely an application issue. The app should
open the file, read the first 2 bytes, and take whatever action it needs.
FWIW, some other MS documentation says this is the "official" way to
determine if a text file is unicode or ascii.

So it really would be a big ask to expect Python to be able to have 3 modes
for reading a file, all based on the first 2 bytes - no BOM == ascii, and
the 2 BOM values...

> In the future, the BOM could also be used to change the behaviour of the
> interpreter.

How would this work? I could see that it could change the parser (and I
guess the compiler), but how the interpreter? Read the BOM from stdin?

Mark.
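
For illustration, the "app opens the file, looks at the first bytes, and acts on them" approach might look something like the sketch below. It is only a sketch under assumptions: it is written against the present-day codecs module rather than the Python of this thread, the helper names sniff_bom and read_text are made up for the example, and it checks the 3-byte UTF-8 mark from Neil's message as well as the two 2-byte UTF-16 marks, falling back to ascii when there is no BOM.

```python
import codecs

# Known BOMs and the encoding each one implies. The UTF-8 mark is
# 3 bytes (EF BB BF); the two UTF-16 marks are 2 bytes each.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data, default="ascii"):
    """Hypothetical helper: return (encoding, bom_length) for leading bytes."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    # No BOM found: treat the file as plain ascii (an assumption).
    return default, 0


def read_text(path, default="ascii"):
    """Hypothetical helper: read a file, dropping any BOM before decoding."""
    with open(path, "rb") as f:
        raw = f.read()
    encoding, skip = sniff_bom(raw, default)
    return raw[skip:].decode(encoding)
```

Because the whole file is read in one go, nothing has to remember the BOM across later reads of the same stream; an app that reads in pieces would have to carry the detected encoding around itself.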