Test Order & Concurrency by paultiq · Pull Request #75 · duckdb/duckdb-python

This arose from discussion

Edit: Summary

duckdb-python tests "pollute" the global state by: modifying the default duckdb connection, using local file names that aren't randomized, or otherwise break if run in the "wrong" order.

This PR fixes that:

  • using the tmp_path fixture to use unique, test-local databases
  • closing any pre-existing default connection before using it
  • addressing issues with the query interruption test that arose while testing.

This PR also ensures that test dependencies aren't introduced by randomizing their order & parallelizing their running.

PR

While running tests for free-threading, I found some test "contamination": tests that are order-dependent or lack isolation of resources.

* This PR makes no functional changes, only changes to tests / testing.

This PR addresses the individual tests and adds steps to detect in the future:

  • Randomizing their order: to surface any order dependencies
  • Enabling multi-process parallelism to tests, identifying any dependencies and also reducing overall test time
  • Using unique path fixtures or table names to eliminate conflicts across concurrent tests
  • Moved a slow test (10M rows) to tests/slow
  • Reworked test_query_interruption.
  • added a 10 minute timeout for pytests

Added Plugins

pytest-xdist

Disabled by default. This PR adds -n 2 to packaging_wheels to run two parallel tests at a time.

Comments:

  • One concern would be overloading the test runners. I haven't found this to happen: -n auto was surprisingly fine. I believe Runners on public repos are 4 cores.
  • I suggest -n 2 as a first step in case any issues with the more performance intensive tests.
  • When doing performance testing, don't pass -n.
pytest-randomly

Enabled by default. Randomizes the order of tests.

Comments:

  • Random order is good for surfacing test dependencies and assumptions
  • This identified a number of test dependencies, especially around use of a "dirty" default connection or reusing file names