Issue29782
Created on 2017-03-10 10:53 by niklasf, last changed 2022-04-11 14:58 by admin. This issue is now closed.
| Files | ||||
|---|---|---|---|---|
| File name | Uploaded | Description | Edit | |
| bench_bit_length.py | vstinner, 2017-03-27 15:36 | |||
| Pull Requests | |||
|---|---|---|---|
| URL | Status | Linked | Edit |
| PR 594 | closed | niklasf, 2017-03-10 10:58 | |
| PR 20739 | merged | niklasf, 2020-06-08 18:08 | |
| Messages (19) | |||
|---|---|---|---|
| msg289353 - (view) | Author: Niklas Fiekas (niklasf) * | Date: 2017-03-10 10:53 | |
Baseline performance (9e6ac83acae): $ ./python -m timeit "12345678 == 12345678.0" 5000000 loops, best of 5: 40 nsec per loop $ ./python -m timeit "1 == 1.0" 10000000 loops, best of 5: 38.8 nsec per loop $ ./python -m timeit "(1234578987654321).bit_length()" 10000000 loops, best of 5: 39.4 nsec per loop Upcoming PR: $ ./python -m timeit "12345678 == 12345678.0" 10000000 loops, best of 5: 34.3 nsec per loop $ ./python -m timeit "1 == 1.0" 10000000 loops, best of 5: 34.4 nsec per loop $ ./python -m timeit "(1234578987654321).bit_length()" 10000000 loops, best of 5: 36.4 nsec per loop |
|||
| msg289400 - (view) | Author: Mark Dickinson (mark.dickinson) * ![]() |
Date: 2017-03-10 18:25 | |
Thanks for the idea, and the PR! To be useful, this would need a bit of tweaking: we can't assume that a digit is always `unsigned long` (in fact, I'd expect that to be rare on 64-bit non-Windows systems, where `unsigned long` typically has 64 bits, and `digit` should be using a 32-bit type), so we'd need to identify and use the most appropriate variant of clz/clzl/clzll somehow. I think it would also make sense to move the detection of the existence of `__builtin_clz` and friends into the configuration machinery: these builtins aren't just restricted to GCC (clang supports them as well), and that would allow other platforms to provide their own substitutes by modifying `pyport.h`. Ideally, the configuration machinery #defines a HAVE_CLZ variable, and then in longobject.c we do an #ifdef HAVE_CLZ ... We also need to trade-off the additional complication in the implementation against the benefits: though we do certainly care about the speed, speed at all costs isn't the target here; readability, portability and maintainability of the source counts, too. It'll probably be easier to weigh those factors once there's an implementation that addresses the above points. |
|||
| msg289401 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * ![]() |
Date: 2017-03-10 18:52 | |
I think we can assume that digit is no larger than unsigned long, otherwise PyLong_AsLong() and like wouldn't work. |
|||
| msg289404 - (view) | Author: Mark Dickinson (mark.dickinson) * ![]() |
Date: 2017-03-10 19:49 | |
True enough. It looks as though someone (*cough*) already documented that restriction, too: https://github.com/mdickinson/cpython/blob/v3.6.0/Include/longintrepr.h#L28-L30 |
|||
| msg289406 - (view) | Author: Niklas Fiekas (niklasf) * | Date: 2017-03-10 20:09 | |
Thanks for the review. I pushed a change to check if clz can be used (`sizeof(digit) <= sizeof(unsigned int)`). Otherwise use clzl. I believe the latter should be the most common, since unsigned long has 32bits. As you say unsigned long long should never be needed. Btw. mathmodule.c currently duplicates the function: https://github.com/python/cpython/blob/master/Modules/mathmodule.c#L1317. It might be worth factoring it out, but I don't know where to put it. |
|||
| msg289407 - (view) | Author: Niklas Fiekas (niklasf) * | Date: 2017-03-10 20:12 | |
Oops. Actually clz should commonly be enough. And platforms where clz and clzl are different (<=> unsigned int and unsigned long are different) should be rare. |
|||
| msg290620 - (view) | Author: STINNER Victor (vstinner) * ![]() |
Date: 2017-03-27 15:36 | |
The change is an optimization, so it requires a benchmark, think you provided. Ok. But the speedup is a few nanoseconds on a function which takes currently 40 nanoseconds. It's not really what I would call significant: $ ./python -m perf compare_to ref.json patch.json --table +---------------------+---------+-----------------------------+ | Benchmark | ref | patch | +=====================+=========+=============================+ | int==float 8 digits | 39.2 ns | 37.9 ns: 1.03x faster (-3%) | +---------------------+---------+-----------------------------+ | int==float 1 digit | 38.0 ns | 37.9 ns: 1.00x faster (-0%) | +---------------------+---------+-----------------------------+ | int.bit_length() | 42.0 ns | 41.2 ns: 1.02x faster (-2%) | +---------------------+---------+-----------------------------+ (See attached bench_bit_length.py script.) So I'm not really convinced that the change is useful. Is bit_length() used in hot loops? |
|||
| msg290621 - (view) | Author: STINNER Victor (vstinner) * ![]() |
Date: 2017-03-27 15:41 | |
> $ ./python -m timeit "12345678 == 12345678.0" > 5000000 loops, best of 5: 40 nsec per loop By the way, I check the bytecode to make sure that the compiler doesn't optimize that. I'm suprised that it's not replaced with True! Is there a reason to perform the test at runtime? |
|||
| msg290622 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * ![]() |
Date: 2017-03-27 15:45 | |
There are not good reasons to optimize this case at compile time. This is very obscure way of writing True. |
|||
| msg290973 - (view) | Author: Antoine Pitrou (pitrou) * ![]() |
Date: 2017-04-01 09:26 | |
I don't think a 3% improvement on a micro-benchmark is worth it (this will translate into 0% improvement on real-world workloads). Are there other uses of _Py_bit_length() that show bigger benefits? |
|||
| msg292090 - (view) | Author: STINNER Victor (vstinner) * ![]() |
Date: 2017-04-21 23:58 | |
I concur with Antoine and suggest to reject this issue. |
|||
| msg292105 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * ![]() |
Date: 2017-04-22 05:32 | |
In some microbenchmarks this can give up to 15%. $ ./python.patched -m perf timeit -q --compare-to=./python.default -s "a = list(map(float, range(10000)))" "12345 in a" Mean +- std dev: [python.default] 1.28 ms +- 0.11 ms -> [python.patched] 1.12 ms +- 0.07 ms: 1.15x faster (-13%) |
|||
| msg292106 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * ![]() |
Date: 2017-04-22 05:40 | |
But even 15% speed up in particular microbenchmarks looks too small to me for such complex change. |
|||
| msg292114 - (view) | Author: Mark Dickinson (mark.dickinson) * ![]() |
Date: 2017-04-22 12:14 | |
Closing. |
|||
| msg292115 - (view) | Author: Louie Lu (louielu) * | Date: 2017-04-22 12:19 | |
If this issue is closed by "not a big performance improvement", maybe the XXX in mathmoudle.c should be take off? """ /* XXX: This routine does more or less the same thing as * bits_in_digit() in Objects/longobject.c. Someday it would be nice to * consolidate them. On BSD, there's a library function called fls() * that we could use, and GCC provides __builtin_clz(). */ """ |
|||
| msg292139 - (view) | Author: STINNER Victor (vstinner) * ![]() |
Date: 2017-04-22 21:57 | |
I'm in favor of documenting in comments decisions to not micro-optimize such code. I did something similar in ceval.c for 1+1. |
|||
| msg371035 - (view) | Author: STINNER Victor (vstinner) * ![]() |
Date: 2020-06-08 18:24 | |
I reopen the issue since PR 20739 was created. |
|||
| msg371539 - (view) | Author: STINNER Victor (vstinner) * ![]() |
Date: 2020-06-15 12:33 | |
New changeset 794e7d1ab2d7afe70fe0dd87ca8174ac860413e4 by Niklas Fiekas in branch 'master': bpo-29782: Consolidate _Py_Bit_Length() (GH-20739) https://github.com/python/cpython/commit/794e7d1ab2d7afe70fe0dd87ca8174ac860413e4 |
|||
| msg371543 - (view) | Author: STINNER Victor (vstinner) * ![]() |
Date: 2020-06-15 13:40 | |
Thanks Niklas Fiekas for your tenacity :-) |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:58:44 | admin | set | github: 73968 |
| 2020-06-15 13:40:04 | vstinner | set | messages: + msg371543 |
| 2020-06-15 12:44:55 | vstinner | set | status: open -> closed resolution: fixed versions: + Python 3.10, - Python 3.7 |
| 2020-06-15 12:33:52 | vstinner | set | messages: + msg371539 |
| 2020-06-08 18:24:27 | vstinner | set | status: closed -> open resolution: rejected -> (no value) messages: + msg371035 |
| 2020-06-08 18:08:21 | niklasf | set | pull_requests: + pull_request19946 |
| 2017-04-22 21:57:14 | vstinner | set | messages: + msg292139 |
| 2017-04-22 12:19:45 | louielu | set | nosy:
+ louielu messages: + msg292115 |
| 2017-04-22 12:14:11 | mark.dickinson | set | status: open -> closed resolution: rejected messages: + msg292114 stage: resolved |
| 2017-04-22 05:40:58 | serhiy.storchaka | set | messages: + msg292106 |
| 2017-04-22 05:32:49 | serhiy.storchaka | set | messages: + msg292105 |
| 2017-04-21 23:58:49 | vstinner | set | messages: + msg292090 |
| 2017-04-01 09:26:05 | pitrou | set | nosy:
+ pitrou messages: + msg290973 |
| 2017-03-27 15:45:40 | serhiy.storchaka | set | messages: + msg290622 |
| 2017-03-27 15:41:13 | vstinner | set | messages: + msg290621 |
| 2017-03-27 15:36:58 | vstinner | set | files:
+ bench_bit_length.py nosy: + vstinner messages: + msg290620 |
| 2017-03-10 20:12:03 | niklasf | set | messages: + msg289407 |
| 2017-03-10 20:09:01 | niklasf | set | messages: + msg289406 |
| 2017-03-10 19:49:59 | mark.dickinson | set | messages: + msg289404 |
| 2017-03-10 18:52:27 | serhiy.storchaka | set | nosy:
+ serhiy.storchaka messages: + msg289401 |
| 2017-03-10 18:25:57 | mark.dickinson | set | messages: + msg289400 |
| 2017-03-10 11:10:48 | xiang.zhang | set | nosy:
+ mark.dickinson |
| 2017-03-10 10:58:59 | niklasf | set | pull_requests: + pull_request490 |
| 2017-03-10 10:53:57 | niklasf | create | |
