Issue31543
Created on 2017-09-21 13:25 by vstinner, last changed 2022-04-11 14:58 by admin. This issue is now closed.
| Pull Requests | |||
|---|---|---|---|
| URL | Status | Linked | Edit |
| PR 3685 | closed | vstinner, 2017-09-21 13:27 | |
| Messages (8) | |||
|---|---|---|---|
| msg302692 - (view) | Author: STINNER Victor (vstinner) * ![]() |
Date: 2017-09-21 13:25 | |
Attached pull request adds a fastpath for wrapper descriptors to use the FASTCALL calling convention. It's a follow up of bpo-31410 and all my work on FASTCALL during Python 3.6 and 3.7 development cycles. Microbenchmark: ./python -m perf timeit -s 'import array; obj=array.array("b"); wrap=array.array.__len__' 'wrap(obj)' Result: haypo@selma$ ./python -m perf compare_to ref.json patch.json Mean +- std dev: [ref] 59.2 ns +- 0.6 ns -> [patch] 28.2 ns +- 0.9 ns: 2.10x faster (-52%) It removes 31 nanoseconds on such very fast C function, array_length(). Attached PR is still a work-in-progress. First I would like to know if it's worth it because working on polishing the actual code. |
|||
| msg302697 - (view) | Author: Antoine Pitrou (pitrou) * ![]() |
Date: 2017-09-21 15:44 | |
> First I would like to know if it's worth it I don't know. What's the point of optimizing `array.array.__len__(obj)`? People usually call `len(obj)` for that... |
|||
| msg302709 - (view) | Author: STINNER Victor (vstinner) * ![]() |
Date: 2017-09-21 16:46 | |
> I don't know. What's the point of optimizing `array.array.__len__(obj)`? People usually call `len(obj)` for that... Right, type.method(self) is less than common than self.method(). I looked at the stdlib. I found that the following method are called using wrapper descriptors: * object.__repr__(self) * object.__getattribute__(self, name) * _asyncio.Future.__del__(self) * int.__str__(obj) * etc. But it seems like such calls are rare compared to other kinds of function calls. -- By the way, _PyMethodDescr_FastCallKeywords() is only called from call_function() in Python/ceval.c. It's not used in Objects/call.c. Maybe we should use it there as well? It seems like this is a question about tracing. But maybe we can copy/paste the code from call_function()? |
|||
| msg302710 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * ![]() |
Date: 2017-09-21 16:52 | |
It optimizes the same cases as bpo-31410. bpo-31410 removed a half of the overhead for wrapper descriptors, Victor's patch removes the remaining half. Actually it makes calling the descriptor faster than calling the corresponding builtin! But bpo-31410 changes were simple, just few lines, and PR 3685 looks much more complex. Are non-fast wrappers still needed? If the patch replaces the code instead of adding it this would decrease its cost. AFAIK the only descriptors that should support non-fast calling convention are __new__, __init__ and __call__. As for practicality, this change should slightly speed up the code that directly calls __eq__, __lt__. For example classes decorated with total_ordering. Victor, can you provide microbenchmarks with more practical code? |
|||
| msg302717 - (view) | Author: Raymond Hettinger (rhettinger) * ![]() |
Date: 2017-09-21 19:40 | |
This seems like a straight-forward win. I don't think we really need to see microbenchmarks before going forward with this one. |
|||
| msg302724 - (view) | Author: STINNER Victor (vstinner) * ![]() |
Date: 2017-09-21 21:09 | |
The most complex slots are __new__, __init__ and __call__ because they accept keywords. It's hard to design a calling convention which optimize all cases because of the backward compatibility. The risk is to convert a dict to args + kwnames (tuple) and then back to a dict: two expensive and useless conversions. It's my 3rd or 4th attempt to optimize new/init/call :-) My previous attempt was to add new tp_fastXXX slots to types, but again thr backward compatibilty was a major pain. https://bugs.python.org/issue29358 Here the scope is much small and backward compatibilty shouldn't affect us. I will try to find a way to optimize these slots as well and complete my patch. |
|||
| msg302733 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * ![]() |
Date: 2017-09-21 22:01 | |
These slots are the only slots that accept keyword arguments and arbitrary number of positional parameters. Other slots can use fast calls. |
|||
| msg306865 - (view) | Author: STINNER Victor (vstinner) * ![]() |
Date: 2017-11-24 01:10 | |
It seems like these specific descriptors are rare, so the added complexity is not worth it. I close my issue as rejected. |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:58:52 | admin | set | github: 75724 |
| 2017-11-24 01:10:38 | vstinner | set | status: open -> closed resolution: rejected messages: + msg306865 stage: patch review -> resolved |
| 2017-09-21 22:01:12 | serhiy.storchaka | set | messages: + msg302733 |
| 2017-09-21 21:09:36 | vstinner | set | messages: + msg302724 |
| 2017-09-21 19:40:37 | rhettinger | set | nosy:
+ rhettinger messages: + msg302717 |
| 2017-09-21 16:52:11 | serhiy.storchaka | set | messages: + msg302710 |
| 2017-09-21 16:46:42 | vstinner | set | messages: + msg302709 |
| 2017-09-21 15:44:16 | pitrou | set | nosy:
+ pitrou messages: + msg302697 |
| 2017-09-21 13:27:17 | vstinner | set | keywords:
+ patch stage: patch review pull_requests: + pull_request3674 |
| 2017-09-21 13:25:31 | vstinner | create | |
