py/map: Add board option MICROPY_PY_MAP_ORDERED to make all dicts/maps ordered. by andrewleech · Pull Request #5323 · micropython/micropython
No change to normal builds, but if MICROPY_PY_MAP_ORDERED is set in the port/board config then the ordered flag is set for all new maps (dicts).
While I presume this has a performance impact, it is useful behavior for some applications, e.g. #5322
Hmm, interesting. I guess this really goes with the CPython 3.7 change that dict guarantees insertion order: https://docs.python.org/3/whatsnew/3.7.html
Although there is a subtlety: maps are not 1:1 with dicts, rather map is used to implement dict. Not sure if that really matters though.
While I presume this has a performance impact
Would be interesting to test it. It would also mean that most of the code in mp_map_lookup() is now unused.
Yes I know I kind of glossed over the terminology of maps vs dicts, I used the dict label in the title as ordering them is what I wanted to achieve, but that's done via the map code.
I just looked at mp_map_lookup() and I see what you mean: much of it is the implementation of the hash lookup, as opposed to the "brute force linear" search used for ordered maps. I could use the same define to remove the rest of that code to make it all a little smaller.
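The two code paths being discussed can be sketched in Python. This is a hypothetical model, not the actual C code in mp_map_lookup() — the function names and the table representations are my own, chosen just to show ordered linear scan vs hashed probing:

```python
# Hypothetical model of the two lookup strategies in mp_map_lookup().
# Ordered map: a dense list of (key, value) pairs in insertion order.
# Hashed map: a sparse table of slots, probed starting at hash(key).

def lookup_ordered(table, key):
    # Brute force linear scan, O(N) but with almost no setup cost.
    for k, v in table:
        if k == key:
            return v
    return None

def lookup_hashed(table, key):
    # Hash the key, then linearly probe from that slot, O(1) on average.
    alloc = len(table)
    pos = hash(key) % alloc
    start = pos
    while True:
        slot = table[pos]
        if slot is None:        # empty slot means the key is absent
            return None
        k, v = slot
        if k == key:
            return v
        pos = (pos + 1) % alloc  # linear probing on collision
        if pos == start:         # table full, key not found
            return None
```

The point of the discussion above is that when MICROPY_PY_MAP_ORDERED forces every map down the first path, the whole second path becomes dead code.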
@dpgeorge I actually think it might be slightly faster with this new ordered dict setting enabled.
I grabbed the list of dict related tests in the micropython repo and imported (ran) them all from a top level file
dict_test.py
```python
from basics import dict1
from basics import dict2
from basics import dict_clear
from basics import dict_construct
from basics import dict_copy
from basics import dict_del
from basics import dict_fixed
from basics import dict_from_iter
from basics import dict_fromkeys
from basics import dict_fromkeys2
from basics import dict_get
from basics import dict_intern
from basics import dict_iterator
from basics import dict_pop
from basics import dict_popitem
from basics import dict_setdefault
from basics import dict_specialmeth
from basics import dict_update
from basics import dict_views
from basics import namedtuple_asdict
from basics import object_dict
from basics import ordereddict1
from basics import ordereddict_eq
from micropython import heapalloc_fail_dict
from stress import dict_copy
from stress import dict_create
from stress import dict_create_max
```
Then made a quick runner to execute this in a loop
run_dict_test.py
```python
import time
import subprocess

start = time.time()
for _ in range(100):
    subprocess.run(['micropython', 'dict_test.py'],
                   stdout=subprocess.PIPE, stderr=subprocess.PIPE)
end = time.time()
print(end - start)
```
Then ran `python3 run_dict_test.py` a number of times.
I found that printing to the console slowed it down by a few orders of magnitude, hence the capture to a pipe.
My normal build of unix micropython with this new setting disabled ran in ~4.3 seconds (4.23 s to 4.6 s over 10 runs).
Recompiled with the ordered setting enabled: ~4.1 seconds (4.03 s to 4.17 s) over the same number of runs.
I switched the setting on and off a couple of times and re-ran to ensure the change is consistent, and yeah it kept coming up ~200ms faster with ordered turned on permanently.
Interesting that it's consistent. Still, it would be better to do the profiling inside the MicroPython process to rule out whatever happens when launching the process, and also change it so it doesn't take the importing into account but purely the dict operations. Even then I'm not sure how close the tests are to real-life software using dicts, so I'd also try a couple of the benchmark suites that are available on the internet and run those. Would be nice if that gives the same results!
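An in-process harness along these lines could look like the following. This is a sketch, not code from the PR — `bench_dict_ops` and the sizes are my own choices; it times only dict operations, not interpreter startup or imports, and runs on both CPython and the unix MicroPython port:

```python
# Minimal in-process dict benchmark: no subprocess launch, no imports timed.
import time

def bench_dict_ops(n_loops=100, n_keys=50):
    """Time n_loops rounds of insert/lookup/delete on an n_keys-entry dict."""
    t0 = time.time()
    for _ in range(n_loops):
        d = {}
        for i in range(n_keys):      # insert
            d[i] = 2 * i
        for i in range(n_keys):      # lookup
            assert d[i] == 2 * i
        for i in range(n_keys):      # delete
            del d[i]
    return time.time() - t0

print("%0.4f s" % bench_dict_ops())
```

Running this under both builds (ordered on/off) would isolate the dict-operation cost from process launch overhead.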
Interesting that the performance looks better... but I'm not sure this would be true for all uses of dict, since large dicts definitely benefit from a proper O(1) hashed lookup, rather than a brute force O(N) search.
Running the performance benchmark suite on PYBv1.0 I get:
diff of scores (higher is better)
N=100 M=100 orig -> ordered diff diff% (error%)
bm_chaos.py 292.07 -> 326.05 : +33.98 = +11.634% (+/-0.04%)
bm_fannkuch.py 72.32 -> 72.33 : +0.01 = +0.014% (+/-0.00%)
bm_fft.py 2388.45 -> 2394.47 : +6.02 = +0.252% (+/-0.00%)
bm_float.py 4744.28 -> 5455.48 : +711.20 = +14.991% (+/-0.00%)
bm_hexiom.py 35.07 -> 38.64 : +3.57 = +10.180% (+/-0.00%)
bm_nqueens.py 4177.52 -> 4276.91 : +99.39 = +2.379% (+/-0.01%)
bm_pidigits.py 633.88 -> 636.17 : +2.29 = +0.361% (+/-0.19%)
core_with.py 45340.42 -> 47442.55 : +2102.13 = +4.636% (+/-0.03%)
core_yield_from.py 58104.92 -> 58126.86 : +21.94 = +0.038% (+/-0.04%)
misc_aes.py 354.77 -> 374.04 : +19.27 = +5.432% (+/-0.01%)
misc_mandel.py 2926.65 -> 2996.15 : +69.50 = +2.375% (+/-0.01%)
misc_pystone.py 1913.26 -> 1798.12 : -115.14 = -6.018% (+/-0.00%)
misc_raytrace.py 299.83 -> 348.09 : +48.26 = +16.096% (+/-0.01%)
viper_call0.py 576.79 -> 576.80 : +0.01 = +0.002% (+/-0.00%)
viper_call1a.py 550.32 -> 550.34 : +0.02 = +0.004% (+/-0.00%)
viper_call1b.py 434.87 -> 434.90 : +0.03 = +0.007% (+/-0.01%)
viper_call1c.py 439.42 -> 439.45 : +0.03 = +0.007% (+/-0.01%)
viper_call2a.py 536.26 -> 536.28 : +0.02 = +0.004% (+/-0.00%)
viper_call2b.py 377.24 -> 377.26 : +0.02 = +0.005% (+/-0.01%)
So almost all benchmarks improve by a few percent. But pystone decreases, and that is one which uses maps quite a bit (in user classes, for their instance member table).
I think the reason for the improvement in performance is that qstr_hash() is not as fast as it could be, and calling it in mp_map_lookup() (for the case of hashed maps) is a bit of a bottleneck for small dicts. I.e. the qstr_hash() call takes most of the time for a dict with only a few elements. For an ordered dict there is no call to qstr_hash() because it just does a brute force linear search, and for small dicts this is pretty fast. So it's a case of algorithms C1 + O(1) vs C2 + O(N), where the constant overhead C1 is much bigger than C2 (for hashed and linear search respectively).
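A toy model makes the C1 + O(1) vs C2 + O(N) argument concrete. The constants below are invented purely for illustration (they are not measured from MicroPython); only the shape of the curves matters:

```python
# Toy cost model: hashed lookup pays a big fixed cost (the hash) then O(1);
# linear search pays a tiny fixed cost then O(N). All constants are made up.
C_HASH = 50.0    # hypothetical fixed cost of qstr_hash() + probe setup
C_LINEAR = 2.0   # hypothetical fixed cost of starting a linear scan
PER_ELEM = 3.0   # hypothetical cost per element compared in the scan

def cost_hashed(n):
    # Constant-time lookup once the (expensive) hash has been computed.
    return C_HASH + 1.0

def cost_linear(n):
    # On average half the table is scanned before the key is found.
    return C_LINEAR + PER_ELEM * n / 2

# Linear search wins for small dicts; hashing wins past the crossover.
crossover = next(n for n in range(1, 10000) if cost_linear(n) > cost_hashed(n))
print("crossover at n =", crossover)
```

With these (arbitrary) constants the crossover sits in the low tens of entries, which matches the observation that small-dict-heavy benchmarks improve while pystone, with its busier instance member tables, regresses.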
Ah yes, the qstr lookup time does make sense.
Certainly the algorithm scaling blows out the difference for larger dicts. I re-did my test by concatenating most of the dict unit tests together and running them in a x100 loop directly inside micropython. With most of the tests in place:
master: 3.3037 s
ordered: 3.0319 s
However, if I enable
```python
# copying a large dictionary
a = {i: 2 * i for i in range(1000)}
b = a.copy()
for i in range(1000):
    dprint(i, b[i])
dprint(len(b))

# create a large dictionary
d = {}
x = 1
while x < 1000:
    d[x] = x
    x += 1
dprint(d[500])
```
in the x100 loop then the difference jumps to
master: 3.7364 s
ordered: 5.2220 s
And then the big killer, running just once (on unix):
```python
# The aim with this test is to hit the maximum resize/rehash size of a dict,
# where there are no more primes in the table of growing allocation sizes.
# This value is 54907 items.
script_start = utime.time()
d = {}
try:
    for i in range(54908):
        d[i] = i
except MemoryError:
    pass
script_end = utime.time()
print("Elapsed %0.4f s" % (script_end - script_start))
```
master: 0.0086 s
ordered: 10.7578 s
So while for smaller applications the simpler brute force ordered implementation increases efficiency and could be used to chop out some code size, for many larger applications the ordered approach would likely be unusable.
I've just expanded the #defines to chop out the hash-based code that's no longer used if ordered is enabled.
It does make the #defines a little harder to visually follow, but on my build it reduces flash size by 1232 bytes.
Would there be any merit to implementing the compact/sparse array/map that CPython and PyPy adopted for Python 3.6 back in 2015? When considering how we might keep our dict API working similarly to CPython's, perhaps keeping the implementation similar makes that easier.
Would there be any merit to implementing the compact/sparse array/map that CPython and PyPy adopted for Python 3.6 back in 2015?
Maybe. I would suggest thinking about a compile-time option that selects between the current dict implementation (efficient in RAM usage and O(1) lookup, but not ordered) and one that uses more RAM with O(1) lookup and ordered.
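For reference, the compact/sparse layout being discussed can be sketched in Python. This is a hypothetical illustration — `CompactDict` and its methods are my own names, and it omits deletion and resizing — but it shows the two-array idea: a small sparse index table for O(1) lookup, plus a dense entries list that preserves insertion order:

```python
# Sketch of the CPython 3.6 / PyPy "compact dict" layout: a sparse array of
# indices into a dense, insertion-ordered entries list. No delete/resize here.

class CompactDict:
    def __init__(self, alloc=8):
        self.indices = [None] * alloc   # sparse: hash slot -> entry index
        self.entries = []               # dense: (hash, key, value) in order

    def _slot(self, key):
        # Linear probing over the sparse index array.
        alloc = len(self.indices)
        pos = hash(key) % alloc
        while True:
            idx = self.indices[pos]
            if idx is None or self.entries[idx][1] == key:
                return pos
            pos = (pos + 1) % alloc

    def __setitem__(self, key, value):
        pos = self._slot(key)
        if self.indices[pos] is None:       # new key: append to dense array
            self.indices[pos] = len(self.entries)
            self.entries.append((hash(key), key, value))
        else:                               # existing key: update in place
            self.entries[self.indices[pos]] = (hash(key), key, value)

    def __getitem__(self, key):
        idx = self.indices[self._slot(key)]
        if idx is None:
            raise KeyError(key)
        return self.entries[idx][2]

    def keys(self):
        # Insertion order falls out of the dense array for free.
        return [k for _, k, _ in self.entries]
```

This keeps O(1) lookup while being ordered, at the price of the extra index array — the RAM/speed trade-off mentioned above.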
Personally I feel improvements to the ordered dict implementation are outside the scope of this particular PR; I was just going for ordered by default, irrespective of performance.
If a new hashmap implementation is written to replace the current ordered implementation, I would think this PR is still equally relevant as-is to select between the two.
Choosing a new ordered dict implementation is a larger job as there are multiple factors to choose between. I personally would lean towards the current pypy hashmap: https://morepypy.blogspot.com/2015/01/faster-more-memory-efficient-and-more.html
which might be a more strictly ordered version of the CPython 3.6 one (https://mail.python.org/pipermail/python-dev/2016-September/146327.html):
PyPy also implements the "compact dict", but it uses further "tricks"
to preserve the order even if items are removed and then others are
added. We might also implement these tricks in CPython, so dict will
be ordered as well!
Basically I trust their benchmarks and test results more than my own abilities to implement something more efficient/suitable.
But again, I think this is a larger discussion than this PR's intention to just force ordered by default.
```c
// if the map is an ordered array then we must do a brute force linear search
#if !MICROPY_PY_MAP_ORDERED
if (map->is_ordered) {
#endif
for (mp_map_elem_t *elem = &map->table[0], *top = &map->table[map->used]; elem < top; elem++) {
```
Is this uncrustify which wants to dedent all the code? It's not great. Would it fix it to add an extra set of { } around this code, outside the #if?
Yep these were both from uncrustify breaking the build. I'll try the { }
Good call, yes the brackets appear to have fixed the formatting.
```c
// map is a hash table (not an ordered array), so do a hash lookup
if (map->alloc == 0) {
```
This code shouldn't be at column 1.
kamtom480 pushed a commit to kamtom480/micropython that referenced this pull request on Sep 10, 2021.

Closing in favour of a proper ordered dict implementation (branch: py-map-ordered, building on dpgeorge's initial work).