Issue32623
Created on 2018-01-22 16:37 by yselivanov, last changed 2022-04-11 14:58 by admin.
| Pull Requests | |||
|---|---|---|---|
| URL | Status | Linked | Edit |
| PR 5297 | closed | methane, 2018-01-24 12:57 | |
| PR 5300 | closed | methane, 2018-01-24 15:07 | |
| Messages (9) | |||
|---|---|---|---|
| msg310427 - (view) | Author: Yury Selivanov (yselivanov) * ![]() |
Date: 2018-01-22 16:37 | |
We should investigate whether we want dicts to compact themselves on del/pop operations. Currently we have to rely on workarounds to have compactable dict.copy (see issue 31179) etc. |
|||
| msg310433 - (view) | Author: STINNER Victor (vstinner) * ![]() |
Date: 2018-01-22 16:57 | |
Note: It was recently discussed if "del dict[key]" should keep the insertion order. If I understood correctly: yes, the order must be preserved on deletion. https://mail.python.org/pipermail/python-dev/2017-November/150142.html |
|||
| msg310548 - (view) | Author: Inada Naoki (methane) * ![]() |
Date: 2018-01-24 01:53 | |
For insert/pop loop, reduce table size aggressively on pop may cause performance regression. So reducing size should be conservative. So my opinion is: * When dict size become 0, make the dict shared-empty, like dict.clear() * When (dict size < dk_size/8), call insertion_resize() |
|||
| msg310570 - (view) | Author: STINNER Victor (vstinner) * ![]() |
Date: 2018-01-24 09:29 | |
I agree that an heuristic is needed to decide when a dict should be compacted. > * When (dict size < dk_size/8), call insertion_resize() In bpo-31179, I suggested to Yury to use 2/3 ratio... to avoid integer overflow :-) He first used 80%, but I dislike using the FPU in the dictobject.c. I'm not sure of the cost of switching from integers to floats, and more generally I hate rounding issues, so I prefer to use regular integers ;-) + (3) if 'mp' is non-compact ('del' operation does not resize dicts), + do fast-copy only if it has at most 1/3 non-used keys. + + The last condition (3) is important to guard against a pathalogical + case when a large dict is almost emptied with multiple del/pop + operations and copied after that. In cases like this, we defer to + PyDict_Merge, which produces a compacted copy. By the way, if dict automatically compacts itself automatically, do we still need Yury's test "is the dict compact"? |
|||
| msg310571 - (view) | Author: STINNER Victor (vstinner) * ![]() |
Date: 2018-01-24 09:31 | |
This issue is a matter of CPU vs memory tradeoff. It reminds me the PEP 393: str type doesn't make tradeoff anymore, CPython guarantee that str always use the most efficient storage (least memory) for characters. But here the tradeoff is different. A dict is mutable, whereas str is immutable ;-) |
|||
| msg310604 - (view) | Author: Inada Naoki (methane) * ![]() |
Date: 2018-01-24 16:02 | |
> * When dict size become 0, make the dict shared-empty, like dict.clear() This will cause significant performance regression for `dict[a]=None; del dict[a]` loop. del/pop shouldn't do clear(). > * When (dict size < dk_size/8), call insertion_resize() This is bad too. When ma_used=127 and dk_size=1024, new size will be 1024! It's because current GROWTH_RATE is used*2 + size/2. This GROWTH_RATE is set in issue17563. We should understand it before changing anything. |
|||
| msg310607 - (view) | Author: STINNER Victor (vstinner) * ![]() |
Date: 2018-01-24 16:20 | |
"This will cause significant performance regression for `dict[a]=None; del dict[a]` loop. del/pop shouldn't do clear()." Should we make sure that all dicts have at least MINSIZE entries? |
|||
| msg310649 - (view) | Author: Inada Naoki (methane) * ![]() |
Date: 2018-01-25 00:46 | |
> Should we make sure that all dicts have at least MINSIZE entries? I don't think so. I think "allocate on first insert" is good idea for dict.clear() And I think it's good idea for new dict too: https://github.com/python/cpython/pull/1080 |
|||
| msg310654 - (view) | Author: Inada Naoki (methane) * ![]() |
Date: 2018-01-25 02:07 | |
I think I understand #17563, and I should fix GROWTH_RATE. Before compact-ordered dict, we can avoid resizing in "the number of deletions is on a par with the number of insertions." scenario, by large GROWTH_RATE. That's because new entry can reuse dummy entries. But in compact-ordered dict, we can't do that. We need resizing always, and resize is much faster than legacy dict. I think GROWTH_RATE should be ma_used*3 for now. |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:58:56 | admin | set | github: 76804 |
| 2020-08-04 16:02:32 | vstinner | set | nosy:
- vstinner |
| 2020-08-04 04:34:20 | rhettinger | set | nosy:
+ tim.peters, rhettinger |
| 2020-08-04 03:34:42 | gvanrossum | set | nosy:
+ gvanrossum |
| 2018-04-02 11:56:14 | methane | set | dependencies: + GROWTH_RATE prevents dict shrinking |
| 2018-01-31 04:23:23 | xgdomingo | set | nosy:
+ xgdomingo |
| 2018-01-25 02:07:15 | methane | set | messages: + msg310654 |
| 2018-01-25 00:46:17 | methane | set | messages: + msg310649 |
| 2018-01-24 16:20:32 | vstinner | set | messages: + msg310607 |
| 2018-01-24 16:02:09 | methane | set | messages: + msg310604 |
| 2018-01-24 15:07:04 | methane | set | pull_requests: + pull_request5147 |
| 2018-01-24 12:57:18 | methane | set | keywords:
+ patch stage: patch review pull_requests: + pull_request5144 |
| 2018-01-24 09:31:37 | vstinner | set | messages: + msg310571 |
| 2018-01-24 09:29:40 | vstinner | set | messages: + msg310570 |
| 2018-01-24 01:53:52 | methane | set | messages: + msg310548 |
| 2018-01-22 16:57:06 | vstinner | set | messages: + msg310433 |
| 2018-01-22 16:37:38 | yselivanov | create | |
