Guessing the encoding from a BOM
Rustom Mody
rustompmody at gmail.com
Fri Jan 17 00:08:23 EST 2014
More information about the Python-list mailing list
On Friday, January 17, 2014 7:10:05 AM UTC+5:30, Tim Chase wrote:
> On 2014-01-17 11:14, Chris Angelico wrote:
> > UTF-8 specifies the byte order
> > as part of the protocol, so you don't need to mark it.
>
> You don't need to mark it when writing, but some idiots use it
> anyway. If you're sniffing a file for purposes of reading, you need
> to look for it and remove it from the actual data that gets returned
> from the file--otherwise, your data can see it as corruption. I end
> up with lots of CSV files from customers who have polluted it with
> Notepad or had Excel insert some UTF-8 BOM when exporting. This
> means my first column-name gets the BOM prefixed onto it when the
> file is passed to csv.DictReader, grr.

And it's part of the standard: Table 2.4 here
http://www.unicode.org/versions/Unicode5.0.0/ch02.pdf
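A minimal Python sketch of the sniff-and-strip approach Tim describes, assuming the input arrives as raw bytes. The names `BOMS`, `sniff_bom` and `read_csv_rows` are hypothetical, and falling back to UTF-8 when no BOM is found is an assumption, not something the standard requires. The BOM byte strings come from the standard `codecs` module.

```python
import codecs
import csv
import io

# Hypothetical BOM table. Longer BOMs come first because the UTF-32-LE
# BOM (FF FE 00 00) starts with the UTF-16-LE BOM (FF FE).
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data, default="utf-8"):
    """Return (encoding, payload) with any leading BOM removed from payload.

    Falls back to `default` (an assumption) when no BOM is present.
    """
    for bom, encoding in BOMS:
        if data.startswith(bom):
            return encoding, data[len(bom):]
    return default, data


def read_csv_rows(data):
    """Decode raw CSV bytes, dropping any BOM, and parse with csv.DictReader."""
    encoding, payload = sniff_bom(data)
    text = payload.decode(encoding)
    # newline="" lets the csv module handle line endings itself.
    return list(csv.DictReader(io.StringIO(text, newline="")))


if __name__ == "__main__":
    sample = codecs.BOM_UTF8 + b"name,qty\r\nwidget,3\r\n"

    # Naive decode: the BOM survives as U+FEFF glued to the first header.
    naive = csv.DictReader(io.StringIO(sample.decode("utf-8"), newline=""))
    print(naive.fieldnames[0] == "\ufeffname")  # True

    # Sniffed decode: the first header is clean again.
    print(read_csv_rows(sample)[0]["name"])  # widget
```

For the UTF-8-only CSV case, the standard `utf-8-sig` codec already drops a leading BOM when decoding, so `open(path, newline="", encoding="utf-8-sig")` should be enough to keep the BOM out of the first column name.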