"Newbie" questions - "unique" sorting ?
John Hunter
jdhunter at ace.bsd.uchicago.edu
Tue Jun 24 23:11:59 EDT 2003
More information about the Python-list mailing list
>>>>> "John" == John Fitzsimons <xpm4senn001 at sneakemail.com> writes:

    John> (B) I am wanting to sort words (or is that strings ?) into a
    John> list from a clipboard and/or file input and/or....

    John> (C) To sort out the list of "unique" words/strings.

The classic idiom for getting a unique list is to use a dictionary; see
http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/52560. Dictionary
keys are unique, so duplicate words collapse into a single key. If you
have enough memory to read the whole file at once, the following should
be quite efficient:

  allWords = open('myfile.dat').read().split()
  uwords = list(dict([(w, 1) for w in allWords]).keys())
  uwords.sort()
  print(uwords)

By using a list comprehension to build the dict, as above, you avoid some
of the overhead of a manual loop.

This approach favors speed over memory, but in my experience processing
text files it is the way to go. Very large text files (you mentioned
50MB) are extremely rare. For example, the entire King James Bible,
including HTML markup, is under 5MB, and the complete works of
Shakespeare, including HTML markup, are under 10MB. So I think it would
be unusual for you to need to process a single text file larger than
10MB. Unless you have a specific example where you need to process such
large files, I recommend doing as much as possible in memory.

John Hunter
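For readers on a current Python, here is a minimal sketch of the same dictionary idiom wrapped in a reusable function. The name `unique_sorted_words` is hypothetical. It takes the text as a string, so it works the same whether the text came from a file or was pasted from the clipboard. `dict.fromkeys` builds the same word-to-placeholder mapping as the list comprehension in the post (a `set` would work just as well), and `sorted()` returns the keys as a sorted list.

```python
def unique_sorted_words(text):
    """Return the distinct whitespace-separated words in text, sorted.

    Hypothetical helper: dict keys are unique, so building a dict from
    the words discards duplicates, and sorted() orders the keys.
    """
    words = text.split()
    # dict.fromkeys maps every word to None; repeated words collapse
    # into a single key, just like dict([(w, 1) for w in words]).
    unique = dict.fromkeys(words)
    return sorted(unique)


if __name__ == "__main__":
    sample = "the cat and the hat and the bat"
    print(unique_sorted_words(sample))  # ['and', 'bat', 'cat', 'hat', 'the']
```

Note that the default sort is case-sensitive, so "Zebra" sorts before "apple"; passing `key=str.lower` to `sorted()` gives a case-insensitive order.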
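If you ever did need to handle a file too large to read in one go, one possible middle ground is to read it line by line, so that only the collection of distinct words, not the whole file, is held in memory. This is a sketch under that assumption; the function name `unique_words` is hypothetical, and it uses a `set`, which applies the same uniqueness idea as the dictionary.

```python
import io


def unique_words(lines):
    """Return the sorted distinct words from an iterable of lines.

    Hypothetical helper: pass an open file object and it is consumed one
    line at a time, so memory grows with the number of distinct words
    rather than with the size of the file.
    """
    seen = set()
    for line in lines:
        # Add every word on this line; duplicates are ignored by the set.
        seen.update(line.split())
    return sorted(seen)


if __name__ == "__main__":
    # io.StringIO stands in for a real file here. With a file on disk:
    #     with open('myfile.dat') as f:
    #         print(unique_words(f))
    demo = io.StringIO("to be or\nnot to be\n")
    print(unique_words(demo))  # ['be', 'not', 'or', 'to']
```

Because the function accepts any iterable of lines, the same code works for an open file, a list of strings, or lines read from another source.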