Finding nonprintable characters?
Steven Majewski
sdm7g at Virginia.EDU
Tue Feb 19 14:35:40 EST 2002
More information about the Python-list mailing list
On Tue, 19 Feb 2002, VanL wrote:

> I have a function
>
> isBinary(filehandle)
>
> that I'm not sure how to implement. I've decided to define binary as
> containing characters above \x80. But what is the best way to do this?
>
> 1. iterate through xreadline, so the whole thing doesn't get loaded into
> memory?

I would use file.read( bytes ) -- if it's binary, then you probably
don't need to read the whole file in. Most programs I've seen that
try to determine 'binaryness' only check the first N bytes anyway.
( I've seen some that want a certain percentage of non-printing chars
per block -- not just a single out of range char. )

> 2. String searching? If so, for what string? Searching for anything
> greater than \x7f?
>
> 3. Re searching? for what class?

How about something like:

    filter( lambda c: ord(c) > value, file.read( blocksize ) )

or, as you note, save the ord() call and use an octal or hex string
literal.

If you want to use list comprehensions it would be something like:

    [ c for c in file.read( blocksize ) if c > '\x7f' ]

but list comprehensions give you a list while filter on a string
yields a string.

You can divide the (float) length of the filtered value by the length
of the original ( blocksize ) for a ratio if you want to use that
instead of a single out of range char.

-- Steve Majewski
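
For readers on present-day Python 3, here is a minimal sketch of the same idea: read only the first block, count the bytes above \x7f, and compare the ratio to a cutoff. The names is_binary and high_byte_ratio, the 1024-byte default block size, and the threshold parameter are all assumptions made for illustration; none of them come from the post. It also assumes the file was opened in binary mode ('rb'), where iterating over the block yields integers, so neither ord() nor a string literal comparison is needed.

```python
import io


def high_byte_ratio(block):
    """Return the fraction of bytes in `block` whose value is above 0x7f."""
    if not block:
        return 0.0  # an empty block has no out-of-range bytes
    high = sum(1 for b in block if b > 0x7F)
    return high / len(block)


def is_binary(filehandle, blocksize=1024, threshold=0.0):
    """Guess 'binaryness' from the first `blocksize` bytes only.

    `filehandle` is assumed to be opened in binary mode. With the
    default threshold of 0.0, a single byte above 0x7f is enough.
    """
    block = filehandle.read(blocksize)
    return high_byte_ratio(block) > threshold


# Usage with in-memory files standing in for real ones.
ascii_only = is_binary(io.BytesIO(b"plain ascii text\n"))
one_high_byte = is_binary(io.BytesIO(b"abc\xff"))
```

Raising threshold turns the single-character test into the percentage-per-block test mentioned above; for example, threshold=0.3 would require more than 30% of the block to be out of range.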