Message 312250 - Python tracker

Message312250

Author terry.reedy
Recipients lemburg, luis@luispedro.org, methane, rhettinger, serhiy.storchaka, terry.reedy, tim.peters, twouters, vstinner
Date 2018-02-16.19:57:30
SpamBayes Score -1.0
Marked as misclassified Yes
Message-id <1518811050.87.0.467229070634.issue32846@psf.upfronthosting.co.za>
In-reply-to
Content
Luís, what you really need for the problem you outline is an immutable set with one operation, 'in', aside from create and delete.  If specialized to strings only, it could internally stores only character sequences, rather than Python objects.  I suspect someone has written such a class (in C), but did not find anything on pypi.python.org.

Lacking that, the following experiment suggests that you might be able to restore near O(n) time by partitioning the set of keys into a collection of sets.  My 16M set took 6.48 and 1.80 seconds.  Times 4 is 25.9 and 7.2.  My 64M set took 28 and 19 seconds, but 4 16M sets take 26.3 and 7.5 seconds, only about 5% more than the x4 target.

print(timeit.Timer(
    's = [{str(n) for n in range(i*1000000, (i+16)*1000000)}'
	        ' for i in (0, 16, 32, 48)]')
      .timeit(number=1))
print(timeit.Timer(
    'del s',
    's = [{str(n) for n in range(i*1000000, (i+16)*1000000)}'
	        ' for i in (0, 16, 32, 48)]')
      .timeit(number=1))
History
Date User Action Args
2018-02-16 19:57:30terry.reedysetrecipients: + terry.reedy, lemburg, tim.peters, twouters, rhettinger, vstinner, luis@luispedro.org, methane, serhiy.storchaka
2018-02-16 19:57:30terry.reedysetmessageid: <1518811050.87.0.467229070634.issue32846@psf.upfronthosting.co.za>
2018-02-16 19:57:30terry.reedylinkissue32846 messages
2018-02-16 19:57:30terry.reedycreate