Is 100,000 entries big for a dictionary?
rturpin at my-deja.com
Mon Jan 1 19:50:10 EST 2001
- Previous message (by thread): Is 100,000 entries big for a dictionary?
- Next message (by thread): Simulating WWW button press
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
In article <mailman.978392642.24494.python-list at python.org>,
"Tim Peters" <tim.one at home.com> wrote:

> .. Python's hash table structure is utterly vanilla,
> a contiguous vector of 2**N structs -- it's just that
> sometimes a brand new vector 2x bigger .. may get
> allocated and everything moves to the new vector. ..

An alternate method, at each split, is to allocate a new vector the
same size as the sum of all previous, and to move half the existing
entries (those whose next hash bit is 1) into the new vector. This
saves half the cost of reinsertion, but it requires a table of offsets
for the sequence of vectors. Since the vectors grow exponentially,
this table is small.

> .. All references to extensible (more often "extendible")
> hashing I've seen have a much fancier scheme in mind,
> with a directory (or index) table pointing to buckets
> that can split independently; the thrust of such schemes
> is to ensure that an item can be found with no more than
> two disk accesses (incl. one for the directory table, if
> it's paged out). I believe that's the std meaning, and
> it doesn't have anything in common with Python's scheme
> beyond hash functions and powers of 2 ..

As I view it, the core ideas of extensible hashing are (a) that a
single hash function can index different sizes of hash space by using
the first m high-order bits, and (b) that when this is done, the
memory boundaries of the different spaces align in a way that supports
a variety of strategies for merging and splitting. This can be done
for in-memory tables, as I described above, or for indexed tables on
mass storage. In the latter case, the index stores the mapping of
address space to disk, possibly collapsing some parts of the hash
space, i.e., 2^n pointers may reference the same disk bucket.

Russell

Sent via Deja.com
http://www.deja.com/
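[A small sketch of the observation behind the "alternate method" above --
this is illustrative code, not Python's actual dict implementation. With a
power-of-two table, growing from 2**n to 2**(n+1) slots means each key's
slot either stays the same or gains exactly 2**n, decided by bit n of its
hash; so a split need only move the "bit = 1" half into a freshly
allocated second vector, leaving the other half in place.]

```python
def slot(h, size):
    """Slot of hash h in a power-of-two table of the given size."""
    return h & (size - 1)

def split(keys, hash_fn, old_size):
    """Partition keys into those that stay in the old vector and
    those that move to the new one, by testing bit n of each hash,
    where old_size == 2**n."""
    stay, move = [], []
    for k in keys:
        if hash_fn(k) & old_size:   # next hash bit is 1: move out
            move.append(k)
        else:                       # next hash bit is 0: stay put
            stay.append(k)
    return stay, move

# With an identity hash, keys 0..15 split cleanly on bit 3:
keys = list(range(16))
stay, move = split(keys, lambda k: k, 8)

# Invariant: every key's new slot is its old slot, or old slot + 2**n.
for k in keys:
    assert slot(k, 16) in (slot(k, 8), slot(k, 8) + 8)
```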
More information about the Python-list mailing list