[Python-Dev] Tuning Python dicts
Reid Kleckner
rnk at mit.edu
Sat Apr 10 23:35:24 CEST 2010
More information about the Python-Dev mailing list
Sat Apr 10 23:35:24 CEST 2010
- Previous message: [Python-Dev] Tuning Python dicts
- Next message: [Python-Dev] Tuning Python dicts
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
On Sat, Apr 10, 2010 at 4:40 PM, Antoine Pitrou <solipsis at pitrou.net> wrote: > Reid Kleckner <rnk <at> mit.edu> writes: >> >> I think you're right about the number of collisions, though. CPython >> dicts use a pretty low load factor (2/3) to keep collision counts >> down. One of the major benefits cited in the paper is the ability to >> maintain performance in the face of higher load factors, so I may be >> able to bump up the load factor to save memory. This would increase >> collisions, but then that wouldn't matter, because resolving them >> would only require looking within two consecutive cache lines. > > Why wouldn't it matter? Hash collisions still involve more CPU work, even though > if you're not access memory a lot. So we know for sure that extra CPU work to avoid cache misses is a big win, but it's never clear whether or not any given memory access will be a miss. Because Python's access patterns are rarely random, it may turn out that all of the elements it accesses are in cache and the extra CPU work dominates. If you have a random access pattern over a large dataset, the dictionary will not fit in cache, and saving random memory accesses at the expense of CPU will be a win. Reid
- Previous message: [Python-Dev] Tuning Python dicts
- Next message: [Python-Dev] Tuning Python dicts
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
More information about the Python-Dev mailing list