BLD: replace uses of openblas_support with openblas wheels [wheel build] by mattip · Pull Request #25505 · numpy/numpy
Use the openblas wheels instead of tools/openblas_support.py in
- wheel building
- nightly/stable openblas32 CI runs
- sdist
This will fix #25503 (unpin cibuildwheel)
Thanks Matti. This looks about right; the failures on 32-bit Windows, musllinux and macOS arm64 are all the same (per platform) - you may want to fix those on a single wheel build job on your own fork before running the whole battery of builds here.
Mess. The scipy-openblas wheels differ from the tarballs. The tarballs we currently use have, in usr/local/lib, symlinks like
libscipy_openblas64_.a -> libscipy_openblas64_p-r0.3.24.dev.a
libscipy_openblas64_.so -> libscipy_openblas64_p-r0.3.24.dev.so
libscipy_openblas64_.so.0 -> libscipy_openblas64_p-r0.3.24.dev.so
libscipy_openblas64_p-r0.3.24.dev.so
But the real file libscipy_openblas64_p-r0.3.24.dev.so has an SONAME of libscipy_openblas64_.so.0, so after linking _multiarray_umath.so with the shared object it expects to find libscipy_openblas64_.so.0 in the LD_LIBRARY_PATH (the names are changed but the idea is the same on macos). This makes dynamically linking the artifacts in the wheel useless.
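The SONAME mismatch described above can be checked directly. A minimal sketch (the file names come from the listing above; the `readelf -d` output format is assumed) that extracts a shared object's SONAME:

```python
import re
import subprocess

def parse_soname(readelf_dyn_output):
    """Extract the SONAME from `readelf -d <lib>` output, or None if absent."""
    m = re.search(r"Library soname: \[([^\]]+)\]", readelf_dyn_output)
    return m.group(1) if m else None

def soname_of(path):
    """Run readelf (Linux only) and return the library's SONAME."""
    out = subprocess.run(["readelf", "-d", path],
                         capture_output=True, text=True, check=True).stdout
    return parse_soname(out)
```

Run against the tarball's real file this would report `libscipy_openblas64_.so.0`, which is why the dynamic linker later looks for that name rather than the on-disk file name.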
I think we have to rework the scipy-openblas wheel build before we can use it here to
- ship `libscipy_openblas64_.so`, not `libscipy_openblas64_p-r0.3.24.dev.so`, and
- change the SONAME to `libscipy_openblas64_.so`
That seems like the right change to make. There's no need for names like p-r0.3.24 at all, and symlinks in wheels are bad.
@andyfaff do you know why the cirrus build sequence for wheels first builds NumPy, then calls cibuildwheel to build it again? It is failing since the cibw_before_build.sh does not overwrite $PWD/.openblas the second time it is called.
macOS with Python 3.11, 3.12 and PyPy3.9 is failing, but with Python 3.9 and 3.10 it is passing. The problem seems to be that delocate is not seeing the DYLD_LIBRARY_PATH environment variable, so delocate does not find the scipy_openblas shared object. Did something change in the way delocate processes the environment between those versions?
do you know why the cirrus build sequence for wheels first builds NumPy, then calls cibuildwheel to build it again?
It shouldn't. Can you link to a build log that shows this? When I look at e.g. the cron job on cirrus I can't see the two builds.
I did notice something wrong with the cirrus macos_arm64 build though: it shouldn't be installing gfortran from cibw_before_build, it should be using the gfortran compiler installed in the mamba environment.
A quick reminder: the wheel building occurs over a matrix of images, once for all the Pythons on Sonoma and once on Monterey. The wheels built on Sonoma should be using Accelerate for BLAS; those on Monterey are supposed to still use OpenBLAS.
So that's a second issue: the Sonoma builds for the last cron job are finding and using scipy-openblas when they should be using Accelerate. That needs to be fixed; the whole point of Accelerate is to reduce the size of the wheels. It's important to fix this before the next release happens! (@charris, @rgommers). I don't actually know where scipy-openblas is being installed for that wheel build job.
It's not commented why delocate is pinned in the cibw_before_build script. I'll have a look into these issues in the next few days. My response time will be slow because I'm busy at work this week.
Can you link to a build log that shows this.
Here is a build log with 4 steps: clone, build, install_cibuildwheel, cibuildwheel.
the Sonoma builds for the last cron job are finding and using scipy-openblas
Those builds are probably not setting INSTALL_OPENBLAS=false, so it is being set by default here
```shell
if [ -z $INSTALL_OPENBLAS ]; then
    export INSTALL_OPENBLAS=true
fi
```
which then triggers the use of OpenBLAS
clone, build, install_cibuildwheel, cibuildwheel
The naming of those build steps is a little cryptic. The first 'build' step actually fires off all the other steps; it doesn't build numpy itself. It could be called something like 'initial setup'. I'm not sure why the cibw_before_build script is executed in that step; possibly cruft that can be cleaned up. Debugging all of this is probably better done locally, using the wiki instructions I wrote.
Those builds are probably not setting INSTALL_OPENBLAS=false, so it is being set by default here
The Accelerate jobs should not have OpenBLAS installed, and that was certainly working until recently. Maybe the requirements files reshuffle broke it?
Debugging all of this is probably better done locally, using the wiki instructions I wrote.
Thanks, I will try out these instructions on a mac machine without a VM. Or did you mean the containerized one?
Rebased on top of #25763 and added a fix for cibuildwheel not setting the macOS repair-wheel environment variables as specified in pyproject.toml.
```shell
source $PROJECT_DIR/tools/wheels/gfortran_utils.sh
install_gfortran
pip install "delocate==0.10.4"
```
For reviewers: this whole stanza has been removed; gfortran will not be installed on macOS, since the scipy-openblas wheels do not need it. This means the build is faster, but also that we cannot test f2py in these builds. I think the tests can use the gfortran from the conda install.
but also that we cannot test f2py in these builds.
That's fine; those are way too slow and need their own CI jobs instead. There's already an open issue for that somewhere.
I think this is ready for review. Some follow-ups, which I will do together with fixes from review:
- delete the `tools/openblas_support.py` script
- think about compacting the build matrix for cirrus arm64 builds. Even the double `cp310 cp311` build is finishing in ~7 minutes.
think about compacting the build matrix for cirrus arm64 builds. Even the double cp310 cp311 build is finishing in ~7 minutes.
That seems very fast; it is an order of magnitude faster than the cirrus aarch64 builds and twice as fast as the linux x86_64 cp311 build. Maybe something is wrong?
By default each cirrus Mac runner has several cores, so compilation and testing are fast. That seems like reasonable timing. If I do a clean build of numpy (with Accelerate), the build step takes 17s on an M1 mac and the testing takes about 2.5 mins.
One wheel build timed out in testing. Everything else passed. I don't think the failure is connected to this PR, but maybe?
A question before I review. The TLDR of the PR seems to be: install scipy_openblasX, then create a PKG_CONFIG_PATH with a scipy-openblas.pc file that is sourced from the scipy-openblasX wheel.
It's probably been discussed before, but it's not clear to me why the build (i.e. meson) can't be scripted in such a way that if it knows it's going to be using the scipy-openblasX wheel to source openblas, then why can't it query the wheel to figure out where the pc file is, and where the shared library is? i.e. why does this PR have to jump through hoops to move the pc file into a PKG_CONFIG_PATH?
e.g. on a mac the spin cmd for build appears to work as follows (I won't have all the details correct here):
1. If no BLAS is specified, find and use Accelerate.
2. If you use the `-use-scipy-openblas` flag, use openblas from the wheel.
3. Use pkg-config to figure out the relevant BLAS, possibly having to use flag settings to change which BLAS is desired, and PKG_CONFIG_PATH to specify where the pc file is.
My point is that in step 2 the user doesn't have to fiddle about with setting PKG_CONFIG_PATH, or creating files, everything is taken care of behind the scenes. Why can't this happen in this PR, why the manual intervention by setting paths beforehand?
```yaml
          echo $pkg_config > $PKG_CONFIG_PATH/scipy-openblas.pc
      - name: Build a wheel via an sdist
        env:
          PKG_CONFIG_PATH: ${{ github.workspace }}/.openblas
```
This is set twice, probably because environment variables aren't persistent between steps? Would it be better to use GITHUB_ENV?
Like you set here
Good point.
```yaml
      - name: Install dependencies
        env:
          # Need to do this in every step since setup-python overrides PKG_CONFIG_PATH
          PKG_CONFIG_PATH: ${{ github.workspace }}/.openblas
```
Write to GITHUB_ENV to prevent having to set it repeatedly?
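The GITHUB_ENV mechanism suggested here is just appending `NAME=value` lines to the file that Actions names in the `GITHUB_ENV` variable; any later step in the same job then sees the variable. A minimal sketch (the helper name is hypothetical):

```python
import os

def persist_env(name, value, env_file=None):
    """Append NAME=value to the $GITHUB_ENV file so later workflow steps see it.

    Returns False when not running under GitHub Actions (no GITHUB_ENV set).
    """
    env_file = env_file or os.environ.get("GITHUB_ENV")
    if env_file is None:
        return False
    with open(env_file, "a") as fh:
        fh.write(f"{name}={value}\n")
    return True
```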
```shell
PKG_CONFIG_PATH=$PROJECT_DIR/.openblas
rm -rf $PKG_CONFIG_PATH
mkdir -p $PKG_CONFIG_PATH
python -m pip install scipy-openblas64
```
For maintenance branches, how do we choose which version of scipy-openblas gets installed? E.g. for a certain maintenance branch we may want to stick with a certain (older) version and only bump when we do a major release. At the moment we just get the latest version.
This should use the ci_requirements.txt file.
```shell
elif [[ $RUNNER_OS == "Windows" ]]; then
    # Install Openblas from scipy-openblas64
    if [[ "$INSTALL_OPENBLAS" = "true" ]] ; then
        echo PKG_CONFIG_PATH $PKG_CONFIG_PATH
```
So we have to install scipy-openblas here, understood. However, why is the rest needed? Can't the meson build process query scipy-openblas to figure out all the necessary info?
Meson could run python and try to import scipy_openblas32, then try to import scipy_openblas64. If either of those succeeds, it could use get_include() and get_library() to add the proper flags to the build. @rgommers thoughts?
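The probing described above could look roughly like this from the Python side. This is only a sketch of the idea, not what meson currently does; the `get_include_dir()`/`get_library()` accessors are assumed from the scipy-openblas wheel API:

```python
import importlib

def find_scipy_openblas():
    """Try the 64-bit scipy-openblas wheel, then the 32-bit one.

    Returns the imported module, or None if neither is installed.
    """
    for name in ("scipy_openblas64", "scipy_openblas32"):
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    return None

mod = find_scipy_openblas()
if mod is not None:
    # Assumed accessors exposed by the scipy-openblas wheels.
    print(mod.get_include_dir(), mod.get_library())
```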
Why can't this happen in this PR, why the manual intervention
The cibw_before_build.sh script does exactly what spin --with-scipy-openblas does: create a pc file. Using spin during the build would not be a 1:1 replacement. In wheel building, the wheel packaging step (delocate/auditwheel/delvewheel) needs to know where to find the shared objects, which must be done outside the controlled spin environment. I guess this too could be moved inside spin, with a dedicated spin repair-wheel command to abstract away the differences between the three tools and add the proper path to the shared objects.
Is that what you were thinking?
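The pc-file creation mentioned above amounts to dumping the wheel's pkg-config stanza into a directory that PKG_CONFIG_PATH points at. A hedged sketch of that step (the helper name is hypothetical; `get_pkg_config()` is assumed from the scipy-openblas wheel API):

```python
import os

def write_pc_file(openblas_module, dest_dir):
    """Write the wheel's pkg-config stanza to dest_dir/scipy-openblas.pc.

    dest_dir is the directory PKG_CONFIG_PATH should point at afterwards.
    """
    os.makedirs(dest_dir, exist_ok=True)
    pc_path = os.path.join(dest_dir, "scipy-openblas.pc")
    with open(pc_path, "w") as fh:
        fh.write(openblas_module.get_pkg_config())
    return pc_path
```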
the spin cmd for build appears to work as follows ...
Just to be more accurate: that is not spin logic, it is logic inside numpy/meson.build, which has branching to handle the various BLAS options. It could call incantations of run_command(py, ...) to replace parts of the cibw_before_build.sh script and the spin --with-scipy-openblas logic with calls to scipy_openblas32 or scipy_openblas64. I am not sure where that "magic" should happen.
Outside meson.build: meson tries very hard to not incorporate too much logic and magic inside it, and depends on the user to properly set up pkg-config or cmake files for dependency detection.
Inside meson.build: there is some merit to not repeating the pkg-config file creation twice, and to eliminating the need to externally set the PKG_CONFIG_PATH environment variable.
OK, so you're saying that the main purpose is to let delocate/auditwheel/delvewheel know where the libraries are? I don't know why they would need to be told that. For example, if I use otool on a recently built scipy dylib:
```
(dev3) 192-168-1-107:linalg andrew$ otool -L cython_lapack.cpython-310-darwin.so
cython_lapack.cpython-310-darwin.so:
	/opt/arm64-builds/lib/libopenblas.0.dylib (compatibility version 0.0.0, current version 0.0.0)
```
It tells me where the openblas library is! They should just go and grab it.
Perhaps I'm being too nitpicky. What we have so far is obviously working, so perhaps we should merge instead of making things too magical.
I don't know why they would need to be told that.
The tarball used by openblas_support is built with absolute paths burned into the shared objects (RPATH and friends) on linux and macos. Using relative paths instead, like the self-contained scipy-openblas wheel build does, has implications. You can see from the commit history that I tried many iterations until I got to this point where everything JustWorks and CI is green.
Edit: with that, I am prepared to rethink the work here and discuss alternatives.
Edit2: I think you are looking at the results of a local build of scipy. When installing scipy into a virtual environment from a wheel it looks different:
```
% otool -L /tmp/venv3python312/lib/python3.12/site-packages/scipy//linalg/cython_lapack.cpython-312-darwin.so
/tmp/venv3python312/lib/python3.12/site-packages/scipy//linalg/cython_lapack.cpython-312-darwin.so:
	@loader_path/../.dylibs/libopenblas.0.dylib (compatibility version 0.0.0, current version 0.0.0)
	/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1281.100.1)
```
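The difference between a locally linked and a repaired wheel can be checked programmatically by parsing `otool -L` output for install names that resolve relative to the loading binary. A sketch (the output format is assumed from the listings in this thread):

```python
def parse_otool_deps(otool_output):
    """Return the dependency install-names from `otool -L` output."""
    deps = []
    for line in otool_output.splitlines()[1:]:  # first line names the binary itself
        line = line.strip()
        if line:
            deps.append(line.split(" (compatibility")[0])
    return deps

def uses_loader_path(otool_output):
    """True if any dependency is resolved relative to the loading binary."""
    return any(d.startswith("@loader_path") for d in parse_otool_deps(otool_output))
```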
Ah ok. Well then since it's green we may as well merge, modulo the things that I suggested that you think should be added.
I pushed a fix commit:
- remove the tools/openblas_support.py script
- reviewed all the places PKG_CONFIG_PATH is set and try to compact it to once per CI file
- used `spin config-openblas` where possible
- add a comment about `/project` as a path
- compact the cirrus wheel build jobs from 3 to 2. I think it could even be 1 job and still not exceed an hour.
Using spin config-openblas cannot be done in a wheel build, since it prepares the build to use the scipy-openblas wheel at runtime; in the wheel-build case we want to build a stand-alone wheel that packages the shared objects into the wheel rather than getting them from the scipy-openblas wheel.
Looks good to me. A good block of work, thanks @mattip
```toml
build-verbosity = "3"
before-build = "bash {project}/tools/wheels/cibw_before_build.sh {project}"
# The build will use openblas64 everywhere, except on arm64 macOS >=14.0 (uses Accelerate)
config-settings = "setup-args=-Duse-ilp64=true setup-args=-Dallow-noblas=false build-dir=build"
```
@mattip why did you remove use-ilp64 here? It looks like it switched all the wheels to LP64. Log extracts:
```
# manylinux x86-64:
Run-time dependency scipy-openblas found: YES 0.3.26
Message: BLAS symbol suffix:

# Accelerate:
Run-time dependency accelerate found: YES
Message: BLAS symbol suffix: $NEWLAPACK
```
I moved the macros to the pkg-config file here, so it is not needed for the scipy-openblas build. I guess it is important to leave it in for diagnostics and for other BLAS builds though, right?
yes indeed
No more tarballs or openblas_support script, nice! Thanks for getting this in. Looks good overall, modulo my one comment.
mattip deleted the `replace-openblas_support` branch