testing for uniqueness in a large list
Alex Martelli
aleaxit at yahoo.com
Wed Oct 20 07:37:32 EDT 2004
More information about the Python-list mailing list
Lol McBride <newspost at lolmc.com> wrote:

> I'm looking for some help in developing a way to test for the uniqueness
> of an item in a large list. To illustrate what I mean, the code below is an
> indicator of what I want to do, but I'm hoping for a faster way than the
> one shown. Basically, I have a list of 20 integers and I want to create
> another list of 200000 unique subsets of 12 integers from this list. What I
> have done here is to use the sample() function from the random module
> and then compare the result to every item in the ints list to check for
> uniqueness - as you can guess this takes an interminable amount of time to
> grind through. Can someone please enlighten me as to how I can do this and
> keep the amount of time to do it to a minimum?

One word: dictionaries! Untested, but should work...:

    import random  # don't use from...import *, on general grounds

    def picks(seq=xrange(1, 21), rlen=200000, picklen=12):
        results = {}
        while len(results) < rlen:
            pick = random.sample(seq, picklen)
            pick.sort()
            pick = tuple(pick)
            results[pick] = 1
        return results.keys()

This returns a list of tuples; if you need a list of lists, end the
function with this instead:

        return [list(pick) for pick in results]

In 2.4, you could use "results = set()" instead of {}, results.add(pick)
to maybe add a new pick, and possibly "return list(results)" if you need
a list of tuples as the function's overall return value (the list
comprehension for the case in which you need a list of lists stays OK).
But that's just an issue of readability, not speed.

Slight speedups can be had by hoisting some lookups out of the while
loop, but nothing major, I think -- e.g. sample = random.sample just
before the while, and use sample(seq, picklen) within the while.

Putting all the code in a function and using local variables IS a major
speed win -- local variable setting and access is WAY faster than
globals'.

Alex
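For readers on a current Python, here is a minimal sketch of the same idea using a set and the hoisted sample lookup mentioned above. It is not the code from the message: the name unique_picks and the up-front feasibility check are assumptions added for illustration, and math.comb needs Python 3.8 or later. The check matters for the numbers in the question: there are only math.comb(20, 12) = 125970 distinct 12-element subsets of 20 items, so a request for 200000 of them could never be satisfied and the while loop would not terminate.

```python
import math
import random


def unique_picks(seq, rlen, picklen):
    """Return rlen distinct sorted picklen-tuples drawn at random from seq.

    Hypothetical helper for illustration; assumes the items of seq are
    themselves distinct, hashable, and sortable.
    """
    pool = list(seq)
    # Guard added in this sketch: refuse requests that can never be met.
    available = math.comb(len(pool), picklen)
    if rlen > available:
        raise ValueError(
            "asked for %d unique picks, only %d exist" % (rlen, available)
        )

    sample = random.sample  # hoist the attribute lookup out of the loop
    results = set()
    while len(results) < rlen:
        # Sorting makes two picks with the same members compare equal;
        # the tuple makes the pick hashable so the set can deduplicate it.
        results.add(tuple(sorted(sample(pool, picklen))))
    return list(results)


if __name__ == "__main__":
    picks = unique_picks(range(1, 21), 1000, 12)
    print(len(picks), "unique picks, for example", picks[0])
```

When rlen gets close to the total number of subsets, random retries become increasingly wasteful; in that case enumerating them with itertools.combinations and shuffling the result is probably the simpler route.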