Testing for membership speed
Larry Bates
lbates at swamisoft.com
Wed Jun 23 10:38:42 EDT 2004
More information about the Python-list mailing list
Wed Jun 23 10:38:42 EDT 2004
- Previous message (by thread): Testing for membership speed
- Next message (by thread): How to call Java from Python?
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
The dictionaries are faster because they are indexed. Lists/tuples must be searched serially from the beginning. Dictionary indexes are hashed and the hashing algorithm has more to do on a string than on an integer (see first article link below for explanation). You could check into pre-hashing your strings and using these as integer keys. It will take longer to build the dictionary, but searching should be faster. Here are some articles that might interest you: http://effbot.org/zone/python-hash.htm http://mail.python.org/pipermail/python-dev/2003-June/036556.html http://www.egenix.com/files/python/mxTextTools.html Searching list or tuple serially from the beginning should take approximately the same time. Nothing about the mutability can help the search speed. HTH, Larry Bates Syscon, Inc. <brianc at temple.edu> wrote in message news:mailman.42.1087998564.27577.python-list at python.org... > I just ran this stuff for my own knowledge. Though it might be > useful to some other people to know and maybe spark a discussion. > > I needed a fast way to test for membership, so naturally the > choices were the builtin containers: lists, dictionaries, and > tuples. The following is the test code and results: > > import timeit > > lst_i=timeit.Timer('random.randrange(10000) in l','import > random; l=range(10000)') > dct_i=timeit.Timer('l.has_key(random.randrange(10000))','import > random; l=dict([(i,None) for i in xrange(10000)])') > tup_i=timeit.Timer('random.randrange(10000) in l','import > random; l=tuple(range(10000))') > > lst_str=timeit.Timer('md5.md5(str(random.randrange(10000))).hexdigest() > in l','import random,md5; l=[md5.md5(str(i)).hexdigest() for i > in xrange(10000)]') > dct_str=timeit.Timer('l.has_key(md5.md5(str(random.randrange(10000))).hexdig est())','import > random,md5; l=dict([(md5.md5(str(i)).hexdigest(),None) for i > in xrange(10000)])') > tup_str=timeit.Timer('md5.md5(str(random.randrange(10000))).hexdigest() > in l','import random,md5; l=tuple([md5.md5(str(i)).hexdigest() > for i in xrange(10000)])') > > print 'Integer lookup' > r=lst_i.repeat(100,100); print 'list:',min(r),max(r); > r=dct_i.repeat(100,100); print 'dict:',min(r),max(r); > r=tup_i.repeat(100,100); print 'tupl:',min(r),max(r); > print 'String lookup' > r=lst_str.repeat(100,100); print 'list:',min(r),max(r); > r=dct_str.repeat(100,100); print 'dict:',min(r),max(r); > r=tup_str.repeat(100,100); print 'tupl:',min(r),max(r); > > [[[Ran on IRIX64 6.5, 24 processors, 12G Memory, 4G Swap, this > code only uses one processor at %100 the full length of the run]]] > Python 2.2.3 (#1, Nov 25 2003, 18:58:21) [C] on irix646-n32 > Type "help", "copyright", "credits" or "license" for more > information. > >>> ## working on region in file > /usr/tmp/python-119959673PMu.py... > Integer lookup > list: 0.126830816269 0.160212993622 > dict: 0.00362300872803 0.00385618209839 > tupl: 0.119297981262 0.161748170853 > String lookup > list: 0.142526865005 0.188524961472 > dict: 0.00711393356323 0.00760197639465 > tupl: 0.143892049789 0.186873912811 > >>> > > The results are conclusive, go for dictionaries. But this > surprised me a little, anyone have insight as to why? > > I was also surprised that tuples and lists scored exactly the > same. I was hoping that immutable tuples might earn it some > speed over lists. > > I will eventually need this for testing strings. So the > doubling of speed for strings over integers for dictionaries > is a little alarming. Lists and tuples only saw a modest increase. > > Thank you in advance for any clever tricks you suggest. > > -Brian >
- Previous message (by thread): Testing for membership speed
- Next message (by thread): How to call Java from Python?
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
More information about the Python-list mailing list