Test Order & Concurrency by paultiq · Pull Request #75 · duckdb/duckdb-python
This arose from discussion
Edit: Summary
duckdb-python tests "pollute" global state: they modify the default duckdb connection, use local file names that aren't randomized, or otherwise break if run in the "wrong" order.
This PR fixes that by:
- using the tmp_path fixture to use unique, test-local databases
- closing any pre-existing default connection before using it
- addressing issues with the query interruption test that arose while testing.
This PR also guards against new test dependencies being introduced, by randomizing test order and running tests in parallel.
PR
While running tests for free-threading, I found some test "contamination": tests that are order-dependent or lack isolation of resources.
* This PR makes no functional changes, only changes to tests / testing.
This PR addresses the individual tests and adds steps to detect such problems in the future:
- Randomizing test order, to surface any order dependencies
- Enabling multi-process parallelism for tests, which identifies dependencies and also reduces overall test time
- Using unique path fixtures or table names to eliminate conflicts across concurrent tests
- Moving a slow test (10M rows) to tests/slow
- Reworking test_query_interruption
- Adding a 10-minute pytest timeout
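To illustrate the unique path and table name idea, here is a minimal sketch using only the standard library. The helper names unique_table_name and unique_db_path are hypothetical and are not taken from this PR; the sketch only assumes that each test receives its own directory, as pytest's tmp_path fixture provides.

```python
import uuid
from pathlib import Path


def unique_table_name(prefix: str = "t") -> str:
    """Return a table name that differs on every call (hypothetical helper)."""
    # uuid4().hex is 32 lowercase hex characters, so the result is a valid
    # unquoted SQL identifier as long as the prefix starts with a letter.
    return f"{prefix}_{uuid.uuid4().hex}"


def unique_db_path(tmp_path: Path, stem: str = "test") -> Path:
    """Return a database file path inside a per-test directory (hypothetical helper)."""
    # tmp_path is assumed to be the directory handed out by pytest's
    # tmp_path fixture, which is already unique per test invocation.
    # The random suffix also keeps paths distinct within a single test.
    return tmp_path / f"{stem}_{uuid.uuid4().hex}.duckdb"
```

A test could then open its own database with duckdb.connect(str(unique_db_path(tmp_path))) and create tables named by unique_table_name, so that tests running concurrently never touch the same file or table.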
Added Plugins
pytest-xdist
Disabled by default. This PR adds -n 2 to packaging_wheels so that tests run in two parallel worker processes.
Comments:
- One concern would be overloading the test runners. I haven't found this to happen: -n auto was surprisingly fine. I believe runners on public repos have 4 cores.
- I suggest -n 2 as a first step, in case there are any issues with the more performance-intensive tests.
- When doing performance testing, don't pass -n.
pytest-randomly
Enabled by default. Randomizes the order of tests.
Comments:
- Random order is good for surfacing test dependencies and assumptions
- This identified a number of test dependencies, especially around use of a "dirty" default connection or reuse of file names
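The following is a small simulation of why random ordering surfaces hidden dependencies. It is a sketch, not how pytest-randomly itself works: the shared_state dictionary stands in for a global default connection, and the two fake tests and both runner functions are hypothetical names made up for illustration.

```python
import random

# Shared, module-level state standing in for a global default connection.
shared_state = {"tables": set()}


def fake_test_creates_table():
    # Leaves a table behind in the shared state (the "pollution").
    shared_state["tables"].add("people")


def fake_test_assumes_table_exists():
    # Silently depends on fake_test_creates_table having run first.
    assert "people" in shared_state["tables"]


def run_in_order(tests):
    """Reset the shared state, run the tests in the given order,
    and return the names of those that fail."""
    shared_state["tables"].clear()
    failures = []
    for test in tests:
        try:
            test()
        except AssertionError:
            failures.append(test.__name__)
    return failures


def run_shuffled(tests, seed):
    """Shuffle a copy of the tests with a fixed seed, then run them.

    Returns the order that was used and the list of failing test names.
    """
    order = list(tests)
    random.Random(seed).shuffle(order)
    return [t.__name__ for t in order], run_in_order(order)
```

In definition order both fake tests pass, which hides the dependency. Once the order is shuffled, any run that places fake_test_assumes_table_exists first fails, and the fixed seed makes that failing order reproducible.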