Locally built whl files are not reproducible
Pip uses a freshly created temporary directory as the build directory when it compiles system-specific libraries (e.g. psutil). The path of that directory gets embedded in the resulting *.so files, so each pip invocation produces a different *.so file.
The following repository illustrates the issue: https://github.com/ltekieli/rules_python_bug.
It uses --disk_cache to specify a common cache between runs. Each run is done in a new docker container instance.
Buggy output is:
$ ./test.sh
Extracting Bazel installation...
Starting local Bazel server and connecting to it...
INFO: Invocation ID: 4f84ccf8-7744-4871-abcd-2b1ef9209fbb
INFO: Rule 'io_bazel_rules_python' modified arguments {"shallow_since": "1546820050 -0500"}
INFO: Analysed target //:test_one (31 packages loaded, 589 targets configured).
INFO: Found 1 test target...
Target //:test_one up-to-date:
bazel-bin/test_one
INFO: Elapsed time: 53.267s, Critical Path: 1.28s
INFO: 1 process: 1 processwrapper-sandbox.
INFO: Build completed successfully, 5 total actions
//:test_one PASSED in 0.9s
Executed 1 out of 1 test: 1 test passes.
INFO: Build completed successfully, 5 total actions
Extracting Bazel installation...
Starting local Bazel server and connecting to it...
INFO: Invocation ID: f615b1f7-d36c-4f30-96fd-c4fb36d6f4e3
INFO: Rule 'io_bazel_rules_python' modified arguments {"shallow_since": "1546820050 -0500"}
INFO: Analysed target //:test_one (31 packages loaded, 590 targets configured).
INFO: Found 1 test target...
Target //:test_one up-to-date:
bazel-bin/test_one
INFO: Elapsed time: 53.548s, Critical Path: 1.05s
INFO: 1 process: 1 processwrapper-sandbox.
INFO: Build completed successfully, 5 total actions
//:test_one PASSED in 0.7s
In the second run the test result should be taken from the cache, but due to the nondeterministic whl it is not.
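To confirm which files actually differ between two runs, the wheels can be compared member by member. The following is only a sketch and not part of rules_python: the helper names `wheel_member_digests` and `differing_members` are made up for illustration, and it relies only on the fact that a wheel is a zip archive.

```python
import hashlib
import zipfile


def wheel_member_digests(wheel_path):
    """Map each file inside a wheel (a zip archive) to the SHA-256 of its contents."""
    digests = {}
    with zipfile.ZipFile(wheel_path) as wheel:
        for name in wheel.namelist():
            digests[name] = hashlib.sha256(wheel.read(name)).hexdigest()
    return digests


def differing_members(wheel_a, wheel_b):
    """Return the sorted names of members that differ or exist in only one wheel."""
    a = wheel_member_digests(wheel_a)
    b = wheel_member_digests(wheel_b)
    return sorted(name for name in a.keys() | b.keys() if a.get(name) != b.get(name))
```

Calling `differing_members()` on the psutil wheel produced by each run should list the compiled *.so files if the build directory is indeed what leaks into them. The RECORD file will probably show up as well, since it stores per-file hashes.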
Applying the following patch solves this issue and the test is properly taken from the cache.
diff --git a/rules_python/piptool.py b/rules_python/piptool.py
index f5d504a..eb688dc 100644
--- a/rules_python/piptool.py
+++ b/rules_python/piptool.py
@@ -154,7 +154,7 @@ def main():
   args = parser.parse_args()
   # https://github.com/pypa/pip/blob/9.0.1/pip/__init__.py#L209
-  if pip_main(["wheel", "-w", args.directory, "-r", args.input]):
+  if pip_main(["wheel", "-b", "/tmp/pip-build", "-w", args.directory, "-r", args.input]):
     sys.exit(1)
   # Enumerate the .whl files we downloaded.
And correct output:
$ ./test.sh
Extracting Bazel installation...
Starting local Bazel server and connecting to it...
INFO: Invocation ID: 4fe44acd-92dc-4063-8c05-f5639da00909
INFO: Rule 'io_bazel_rules_python' modified arguments {"shallow_since": "1546883993 +0100"}
INFO: Analysed target //:test_one (31 packages loaded, 591 targets configured).
INFO: Found 1 test target...
Target //:test_one up-to-date:
bazel-bin/test_one
INFO: Elapsed time: 51.196s, Critical Path: 1.06s
INFO: 1 process: 1 processwrapper-sandbox.
INFO: Build completed successfully, 5 total actions
//:test_one PASSED in 0.7s
Executed 1 out of 1 test: 1 test passes.
There were tests whose specified size is too big. Use the --test_verbose_timeout_warnings command line option to see which ones these
INFO: Build completed successfully, 5 total actions
Extracting Bazel installation...
Starting local Bazel server and connecting to it...
INFO: Invocation ID: 961fce0b-6e99-4b32-a6a8-1d5ca9640e2c
INFO: Rule 'io_bazel_rules_python' modified arguments {"shallow_since": "1546883993 +0100"}
INFO: Analysed target //:test_one (31 packages loaded, 589 targets configured).
INFO: Found 1 test target...
Target //:test_one up-to-date:
bazel-bin/test_one
INFO: Elapsed time: 47.996s, Critical Path: 0.26s
INFO: 1 process: 1 remote cache hit.
INFO: Build completed successfully, 5 total actions
//:test_one (cached) PASSED in 0.2s
Executed 0 out of 1 test: 1 test passes.
There were tests whose specified size is too big. Use the --test_verbose_timeout_warnings command line option to see which ones these
INFO: Build completed successfully, 5 total actions
The drawback of this patch is that parallel pip invocations may write to the same build directory when /tmp is not sandboxed by Bazel. I'm not sure how to solve this properly yet.
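One possible direction, offered only as an untested sketch: derive the build directory from a hash of the requirements file, so the path stays stable across runs, and guard it with a file lock so concurrent invocations with the same requirements take turns. The helpers `build_dir_for` and `exclusive_build_dir` are hypothetical, `fcntl` is POSIX-only, and I have not checked how pip behaves when the `-b` directory already exists.

```python
import fcntl
import hashlib
import os
from contextlib import contextmanager


def build_dir_for(requirements_path, root="/tmp"):
    """Derive a stable build directory path from the requirements file contents."""
    with open(requirements_path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()[:16]
    return os.path.join(root, "pip-build-" + digest)


@contextmanager
def exclusive_build_dir(requirements_path, root="/tmp"):
    """Yield the derived build directory while holding an exclusive lock on it."""
    build_dir = build_dir_for(requirements_path, root)
    # Closing the lock file at the end of the `with` block releases the lock.
    with open(build_dir + ".lock", "w") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)  # blocks until other holders release
        yield build_dir
```

In piptool.py this would wrap the existing call, e.g. `with exclusive_build_dir(args.input) as build_dir:` followed by `pip_main(["wheel", "-b", build_dir, "-w", args.directory, "-r", args.input])`. Different requirements files map to different directories, while identical ones serialize on the lock instead of clobbering each other.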