Issue 36560: test_functools leaks randomly 1 memory block
Created on 2019-04-08 14:22 by vstinner, last changed 2022-04-11 14:59 by admin. This issue is now closed.
Messages (21)
msg339643 - (view)
Author: STINNER Victor (vstinner) *
Date: 2019-04-08 14:22
Date: 2019-04-08 15:33
Date: 2019-04-08 17:11
Date: 2019-04-08 17:14
Date: 2019-04-09 11:20
Date: 2019-04-09 11:22
Date: 2019-04-09 11:28
Date: 2019-04-09 12:05
Date: 2019-04-09 12:23
Date: 2019-04-09 16:01
Date: 2019-04-09 16:26
Date: 2019-04-09 16:43
Date: 2019-04-09 16:56
Date: 2019-04-11 11:21
Date: 2019-04-17 10:16
Date: 2019-09-24 10:45
Date: 2020-06-11 14:23
Date: 2021-05-26 22:40
Date: 2021-08-31 14:53
Date: 2021-08-31 15:05
Date: 2021-08-31 16:05
Date: 2019-04-08 14:22
Sometimes, each run of test_functools leaks exactly 1 memory block, even when the whole test is "re-run in verbose mode". Sometimes, it doesn't leak. https://buildbot.python.org/all/#/builders/80/builds/550 test_functools leaked [1, 1, 1] memory blocks, sum=3 Re-running test 'test_functools' in verbose mode test_functools leaked [1, 1, 1] memory blocks, sum=3 Maybe the problem comes from Example on Linux: $ ./python -m test -F -r -j1 -R 3:3 test_functools Using random seed 3891892 Run tests in parallel using 1 child processes 0:00:01 load avg: 2.38 [ 1] test_functools passed beginning 6 repetitions 123456 ...... (...) 0:00:06 load avg: 2.27 [ 6] test_functools passed beginning 6 repetitions 123456 ...... 0:00:07 load avg: 2.27 [ 7/1] test_functools failed beginning 6 repetitions 123456 ...... test_functools leaked [1, 2, 1] memory blocks, sum=4 0:00:08 load avg: 2.27 [ 8/1] test_functools passed beginning 6 repetitions 123456 ...... == Tests result: FAILURE == 7 tests OK. 1 test failed: test_functools Total duration: 8 sec 333 ms Tests result: FAILUREmsg339651 - (view) Author: STINNER Victor (vstinner) *
Date: 2019-04-08 15:33
Current bisection progress: I'm able to reproduce a failure with attached test_functools.py (211 lines) using: ./python -m test -F -j5 -R 3:3 test_functools It takes between 4 and 200 runs to reproduce the failure, I don't understand what triggers the bug. I failed to identify if the bug is more likely when the system is idle or when the system is busy. It looks purely random.msg339662 - (view) Author: STINNER Victor (vstinner) *
Date: 2019-04-08 17:11
Some progress. I'm now able to reproduce the issue with attached test_functools2.py (103 lines, a single test method): $ cp (...)test_functools2.py Lib/test/ $ ./configure -C --with-pydebug CFLAGS=-O0 $ make $ ./python -m test -F -j5 -R 3:3 test_functools2 (...) 0:00:02 load avg: 0.95 [ 19/1] test_functools failed beginning 6 repetitions 123456 ...... test_functools leaked [1, 1, 1] memory blocks, sum=3 (...) Tests result: FAILURE I disabled the C accelerator _functools. I'm now trying to disable the C accelerator of abc, but I had a small issue: bpo-36565.msg339663 - (view) Author: STINNER Victor (vstinner) *
Date: 2019-04-08 17:14
The issue comes maybe from the complex code in Lib/test/libregrtest/refleak.py to handle the ABC cache: abcs = {} for abc in [getattr(collections.abc, a) for a in collections.abc.__all__]: if not isabstract(abc): continue for obj in abc.__subclasses__() + [abc]: abcs[obj] = _get_dump(obj)[0] ... # clear type cache sys._clear_type_cache() # Clear ABC registries, restoring previously saved ABC registries. abs_classes = [getattr(collections.abc, a) for a in collections.abc.__all__] abs_classes = filter(isabstract, abs_classes) for abc in abs_classes: for obj in abc.__subclasses__() + [abc]: for ref in abcs.get(obj, set()): if ref() is not None: obj.register(ref()) obj._abc_caches_clear() clear_caches()msg339731 - (view) Author: STINNER Victor (vstinner) *
Date: 2019-04-09 11:20
This issue isn't a real memory leak: if I use -R 3:10 instead of -R 3:3, the test doesn't fail anymore. But the issue is still annoying since it makes Refleaks buildbot workers fail randomly :-/ This issue remembers me the unstable multiprocessing tests: * bpo-33735: test_multiprocessing_spawn leaked [1, 2, 1] memory blocks on AMD64 Windows8.1 Refleaks 3.7 * bpo-33984: test_multiprocessing_forkserver leaked [1, 2, 1] memory blocks on x86 Gentoo Refleaks 3.x Patch to always display memory allocations differences: diff --git a/Lib/test/libregrtest/refleak.py b/Lib/test/libregrtest/refleak.py index d68ea63b5b..997be819fa 100644 --- a/Lib/test/libregrtest/refleak.py +++ b/Lib/test/libregrtest/refleak.py @@ -118,6 +118,8 @@ def dash_R(the_module, test, indirect_test, huntrleaks): print(msg, file=refrep) refrep.flush() failed = True + if not failed: + print(alloc_deltas[nwarmup:]) return failed Truncated output with the patch: vstinner@apu$ ./python -m test -F -r -j1 -R 3:10 test_functools Using random seed 4308771 Run tests in parallel using 1 child processes 0:00:04 load avg: 0.91 [ 1] test_functools passed [0, 1, 2, 0, 0, 0, 0, 0, 0, 0] ... 0:00:13 load avg: 0.92 [ 3] test_functools passed [2, 1, 0, 0, 0, 0, 0, 0, 0, 0] ... 0:00:17 load avg: 0.93 [ 4] test_functools passed [0, 3, 0, 0, 0, 0, 0, 0, 0, 0] ... 0:00:21 load avg: 0.93 [ 5] test_functools passed [0, 1, 0, 0, 2, 0, 0, 0, 0, 0] ... 0:00:26 load avg: 0.93 [ 6] test_functools passed [0, 4, 0, 0, 0, 0, 0, 0, 0, 0] ... 0:00:34 load avg: 0.87 [ 8] test_functools passed [0, 1, 0, 2, 0, 0, 0, 0, 0, 0] ... 0:01:06 load avg: 1.15 [ 15] test_functools passed [0, 1, 0, 2, 0, -1, 1, 0, 0, 0] ... 0:01:10 load avg: 1.46 [ 16] test_functools passed [0, 4, 0, 0, 0, 0, 0, 0, -1, 1] ... The maximum sum() of these list is around 5 on 10 runs: not every run leaks a memory block. It looks more like a internal cache which is "unstable" if you look at the number of allocated memory blocks.msg339732 - (view) Author: STINNER Victor (vstinner) *
Date: 2019-04-09 11:22
I wrote PR 12743 to modify _abc: use weakref.WeakSet rather than using directly a set of weak references. With this PR, calling _get_dump() should clear all broken weak references. I'm not sure if it's really useful for this specific issue, since dash_R_cleanup() calls obj._abc_caches_clear() on all abstract base classes (ABC): all caches must be removed before reading the total number of allocated memory blocks (sys.getallocatedblocks()).msg339733 - (view) Author: Serhiy Storchaka (serhiy.storchaka) *
Date: 2019-04-09 11:28
_abc was written for optimization. Your PR makes it using slower Python code. Could you please test how much this hits the performance of abc?msg339735 - (view) Author: STINNER Victor (vstinner) *
Date: 2019-04-09 12:05
If I modify libregrtest with the following patch: diff --git a/Lib/test/libregrtest/refleak.py b/Lib/test/libregrtest/refleak.py index 0bb8a0a2bf..f0225a9768 100644 --- a/Lib/test/libregrtest/refleak.py +++ b/Lib/test/libregrtest/refleak.py @@ -128,7 +128,7 @@ def dash_R(ns, the_module, test_name, test_func): failed = False for deltas, item_name, checker in [ (rc_deltas, 'references', check_rc_deltas), - (alloc_deltas, 'memory blocks', check_rc_deltas), + (alloc_deltas, 'memory blocks', check_fd_deltas), (fd_deltas, 'file descriptors', check_fd_deltas) ]: # ignore warmup runs And I add the following file Lib/test/test_noop.py: import unittest class NoopTests(unittest.TestCase): def test_noop(self): pass regrtest detects a "leak": $ ./python -m test -R 3:3 test_noop Run tests sequentially 0:00:00 load avg: 0.55 [1/1] test_noop beginning 6 repetitions 123456 ...... test_noop leaked [0, 1, 0] memory blocks, sum=1 test_noop failed == Tests result: FAILURE == 1 test failed: test_noop Total duration: 113 ms Tests result: FAILURE The issue comes from this look in Lib/test/libregrtest/refleak.py: for i in range(repcount): indirect_test() alloc_after, rc_after, fd_after = dash_R_cleanup(fs, ps, pic, zdc, abcs) print('.', end='', file=sys.stderr, flush=True) if i >= nwarmup: rc_deltas[i] = get_pooled_int(rc_after - rc_before) alloc_deltas[i] = get_pooled_int(alloc_after - alloc_before) fd_deltas[i] = get_pooled_int(fd_after - fd_before) alloc_before = alloc_after rc_before = rc_after fd_before = fd_after Because of "if i >= nwarmup:", get_pooled_int() isn't call during "warmup", whereas the purpose of the warmup is to warmup *everything*. Maybe get_pooled_int() allocates one frame object and keeps it alive in its "zombi frame". Maybe something else is allocated and kept alive. Anything, removing "if i >= nwarmup:" to always compute deltas fix this specific issue. Attached PR 12744 fix this bug.msg339738 - (view) Author: STINNER Victor (vstinner) *
Date: 2019-04-09 12:23
New changeset 5aaac94eeb44697e92b0951385cd557bc27e0f6a by Victor Stinner in branch 'master': bpo-36560: Fix reference leak hunting in regrtest (GH-12744) https://github.com/python/cpython/commit/5aaac94eeb44697e92b0951385cd557bc27e0f6amsg339768 - (view) Author: STINNER Victor (vstinner) *
Date: 2019-04-09 16:01
New changeset 9c14061a2c2df9a9b84d0aab190a50c24a0d52f4 by Victor Stinner in branch '2.7': bpo-36560: Fix reference leak hunting in regrtest (GH-12744) (GH-12745) https://github.com/python/cpython/commit/9c14061a2c2df9a9b84d0aab190a50c24a0d52f4msg339772 - (view) Author: STINNER Victor (vstinner) *
Date: 2019-04-09 16:26
New changeset bb4447897a5f141eecf42987a1191a3330c5d7ed by Victor Stinner in branch 'master': bpo-36560: regrtest: don't collect the GC twice (GH-12747) https://github.com/python/cpython/commit/bb4447897a5f141eecf42987a1191a3330c5d7edmsg339777 - (view) Author: STINNER Victor (vstinner) *
Date: 2019-04-09 16:43
> _abc was written for optimization. Your PR makes it using slower Python code. Could you please test how much this hits the performance of abc? I mostly wrote PR 12743 to debug this issue. Please see the discussion directly on the PR ;-)msg339781 - (view) Author: STINNER Victor (vstinner) *
Date: 2019-04-09 16:56
New changeset 86f0354fcb815312295b923c55e39364d85d0388 by Victor Stinner in branch '3.7': [3.7] bpo-36560: regrtest: don't collect the GC twice (GH-12747) (GH-12749) https://github.com/python/cpython/commit/86f0354fcb815312295b923c55e39364d85d0388msg339965 - (view) Author: STINNER Victor (vstinner) *
Date: 2019-04-11 11:21
Patch making check on memory block leaks stricter: diff --git a/Lib/test/libregrtest/refleak.py b/Lib/test/libregrtest/refleak.py index 235d6bfd3a..dfadabdef6 100644 --- a/Lib/test/libregrtest/refleak.py +++ b/Lib/test/libregrtest/refleak.py @@ -130,7 +130,7 @@ def dash_R(ns, the_module, test_name, test_func): failed = False for deltas, item_name, checker in [ (rc_deltas, 'references', check_rc_deltas), - (alloc_deltas, 'memory blocks', check_rc_deltas), + (alloc_deltas, 'memory blocks', check_fd_deltas), (fd_deltas, 'file descriptors', check_fd_deltas) ]: # ignore warmup runs Using this patch, at least the following tests fail: * test_asyncio * test_code * test_collections * test_contextlib * test_contextlib_async * test_ctypes * test_functools * test_multiprocessing_forkserver * test_multiprocessing_spawn * test_regrtest * test_statistics * test_typing * test_xml_etree_c I didn't analyze why yet. I guess that they are not real memory leaks, but more minor issue in the code checking for memory leaks. Sadly, it seems like such small glitch can cause a whole Refleak buildbot worker to fail :-(msg340386 - (view) Author: STINNER Victor (vstinner) *
Date: 2019-04-17 10:16
Another example: AMD64 Fedora Rawhide Refleaks 3.6 https://buildbot.python.org/all/#/builders/193/builds/5 test_contextlib leaked [8, 1, 1] memory blocks, sum=10 I cannot reproduce the issue on my ("idle") laptop. It sounds like a random failure (race condition).msg353071 - (view) Author: STINNER Victor (vstinner) *
Date: 2019-09-24 10:45
Recent fail on AMD64 Fedora Rawhide Refleaks 3.x: https://buildbot.python.org/all/#builders/189/builds/162 test_functools leaked [1, 2, 1] memory blocks, sum=4msg371286 - (view) Author: STINNER Victor (vstinner) *
Date: 2020-06-11 14:23
AMD64 Fedora Rawhide Refleaks 3.8: https://buildbot.python.org/all/#/builders/79/builds/196 test_functools leaked [1, 1, 1] memory blocks, sum=3msg394485 - (view) Author: STINNER Victor (vstinner) *
Date: 2021-05-26 22:40
I didn't see this failure recently. I close the issue.msg400739 - (view) Author: STINNER Victor (vstinner) *
Date: 2021-08-31 14:53
Recent failure on AMD64 Fedora Rawhide Refleaks 3.x: https://buildbot.python.org/all/#/builders/565/builds/131 test_typing leaked [1, 1, 1] memory blocks, sum=3 (...) 0:46:40 load avg: 0.00 Re-running test_typing in verbose mode test_typing leaked [1, 1, 1] memory blocks, sum=3 I don't reopen the issue since I failed to reproduce it.msg400742 - (view) Author: STINNER Victor (vstinner) *
Date: 2021-08-31 15:05
I can see the test_typing issue with this patch: diff --git a/Lib/test/libregrtest/refleak.py b/Lib/test/libregrtest/refleak.py index b94826a5da..49e5f03414 100644 --- a/Lib/test/libregrtest/refleak.py +++ b/Lib/test/libregrtest/refleak.py @@ -124,7 +124,7 @@ def check_rc_deltas(deltas): # # [5, 5, 6] # [10, 1, 1] - return all(delta >= 1 for delta in deltas) + return any(delta >= 1 for delta in deltas) def check_fd_deltas(deltas): return any(deltas) Examples: $ ./python -m test test_typing -R 3:20 (...) beginning 23 repetitions 12345678901234567890123 ....................... test_typing leaked [1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0] memory blocks, sum=3 test_typing failed (reference leak) $ ./python -m test test_typing -R 1:20 WARNING: Running tests with --huntrleaks/-R and less than 3 warmup repetitions can give false positives! (...) beginning 21 repetitions 123456789012345678901 ..................... test_typing leaked [6, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0] references, sum=6 test_typing leaked [4, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0] memory blocks, sum=7 test_typing failed (reference leak) (...)msg400754 - (view) Author: Ken Jin (kj) *
Date: 2021-08-31 16:05
@Victor, with your patch applied, for test_typing (locally, Windows x64): $ ./python -m test test_typing -R 3:20 Different result on every run (race condition maybe?) $ ./python -m test test_typing -R 1:20 Same result as yours on every run.