Override bash executable, defaulting to Git for Windows git bash over WSL bash by emanspeaks · Pull Request #1791 · gitpython-developers/GitPython

added 4 commits

January 8, 2024 19:21

EliahKagan

@emanspeaks

@emanspeaks

@emanspeaks

@emanspeaks

EliahKagan

EliahKagan added a commit to EliahKagan/GitPython that referenced this pull request

Jan 25, 2024
+ Test a successful refresh with a relative path, which will be
  safer to do once the refresh tests restore changed state.

This is incomplete, because while it is probably not necessary
while running the test suite to preserve an old False value of
git.GIT_OK (most tests can't work if that happens anyway), the
value of Git.GIT_PYTHON_GIT_EXECUTABLE is not the only other global
state that effects the behavior of subsequently run tests and that
may be changed as a result of the refresh tests.

1. After the git.refresh function calls git.cmd.Git.refresh, it
   calls git.remote.FetchInfo.refresh, which rebinds the
   git.remote.FetchInfo._flag_map attribute.

2. Future changes to git.cmd.Git.refresh may mutate other state in
   the Git class, and ideally the coupling would be loose enough
   that the refresh tests wouldn't have to be updated for that if
   the behavior being tested does not change.

3. Future changes to git.refresh may perform other refreshing
   actions, and ideally it would be easy (and obvious) what has to
   be done to patch it back. In particular, it will likely call
   another Git method that mutates class-wide state due to gitpython-developers#1791,
   and for such state that is also of the Git class, ideally no
   further changes would have to be made to the code that restores
   state after the refresh tests.

If we assume git.refresh is working at least in the case that it is
called with no arguments, then the cleanup can just be a call to
git.refresh(). Otherwise, sufficiently general cleanup may be more
complicated.

EliahKagan added a commit to EliahKagan/GitPython that referenced this pull request

Jan 25, 2024
+ Test a successful refresh with a relative path, which will be
  safer to do once the refresh tests restore changed state.

See gitpython-developers#1811. This addresses it incompletely, because while it is
probably not necessary while running the test suite to preserve an
old False value of git.GIT_OK (most tests can't work if that
happens anyway), the value of Git.GIT_PYTHON_GIT_EXECUTABLE is not
the only other global state that effects the behavior of
subsequently run tests and that may be changed as a result of the
refresh tests.

1. After the git.refresh function calls git.cmd.Git.refresh, it
   calls git.remote.FetchInfo.refresh, which rebinds the
   git.remote.FetchInfo._flag_map attribute.

2. Future changes to git.cmd.Git.refresh may mutate other state in
   the Git class, and ideally the coupling would be loose enough
   that the refresh tests wouldn't have to be updated for that if
   the behavior being tested does not change.

3. Future changes to git.refresh may perform other refreshing
   actions, and ideally it would be easy (and obvious) what has to
   be done to patch it back. In particular, it will likely call
   another Git method that mutates class-wide state due to gitpython-developers#1791,
   and for such state that is also of the Git class, ideally no
   further changes would have to be made to the code that restores
   state after the refresh tests.

If we assume git.refresh is working at least in the case that it is
called with no arguments, then the cleanup can just be a call to
git.refresh(). Otherwise, sufficiently general cleanup may be more
complicated.

EliahKagan added a commit to EliahKagan/gitoxide that referenced this pull request

Dec 1, 2024
Rather than hard-coding `bash` on all systems as the fallback
interpreter when a fixture script cannot be run directly, this
falls back in an operating system specific manner:

- Except on Windows, always fall back to `bash`, as before.

- On Windows, run `git --exec-path` to find the `git-core`
  directory. Then check if a `bash.exe` exists at the expected
  location relative to that. In Git for Windows installations,
  this will usually work. If so, use that path (with `..`
  components resolved away).

- On Windows, if a specific `bash.exe` is not found in that way,
  then fall back to using the relative path `bash.exe`. This is to
  preserve the ability to run `bash` on Windows systems where it
  may have worked before even without `bash.exe` in an expected
  location provided by a Git for Windows installation.

(The distinction between `bash` and `bash.exe` is only slightly
significant: we check for the existence of the interpreter without
initially running it, and that check requires the full filename.
It is called `bash.exe` elsewhere for consistency both with the
checked-for executable and for consistencey with how we run most
other programs on Windows, e.g., the `git` vs. `git.exe`.)

This fixes GitoxideLabs#1359. That bug is not currently observed on CI, but
this change is verified to fix it on a local test system where it
previously always occurred when running the test suite from
PowerShell in an unmodified environment. The fix applies both with
`GIX_TEST_IGNORE_ARCHIVES` unset, in which case there are now no
failures, and with `GIX_TEST_IGNORE_ARCHIVES=1`, in which case the
failures are now limited to the 15 cases tracked in GitoxideLabs#1358.

Previously, fixture scripts had been run on Windows with whatever
`bash` was found in a `PATH` search, which had two problems:

- On most Windows systems, even if no WSL distribution is installed
  and even if WSL itself is not set up, the `System32` directory
  contains a `bash.exe` program associated with WSL. This program
  attempts to use WSL to run `bash` in an installed distribution.
  The `wsl.exe` program also provides this functionality and is
  favored for this purpose, but the `bash.exe` program is still
  present and is likely to remain for many years for compatibility.

  Even when this `bash` is usable, it is not suited for running
  most shell scripts meant to operate on the native Windows system.
  In particular, it is not suitable for running our fixture
  scripts, which need to use the native `git` to prepare fixtures
  to be used natively, among other requirements that would not be
  satisfied with WSL (except when the tests are actually running in
  WSL).

  Since some fixtures are `.gitignore`d because creating them on
  the test system (rather than another system) is part of the test,
  this has caused breakage in most Windows environments unless
  `PATH` is modified -- either explicitly or by testing in an MSYS2
  environment, such as the Git Bash environment -- whether or not
  `GIX_TEST_IGNORE_ARCHIVES` is set. This was the cause of GitoxideLabs#1359.

- Although using a Git Bash environment or otherwise adjusting the
  path *currently* works, the reasons it works are subtle and rely
  on non-guaranteed behavior of `std::process::Command` path search
  that may change without warning.

  On Windows, processes are created by calling the `CreateProcessW`
  API function. `CreateProcessW` is capable of performing a `PATH`
  search, but this `PATH` search is not secure in most uses, since
  it includes the current directory (and searches it before `PATH`
  directories) unless `NoDefaultCurrentDirectoryInExePath` is set
  in the caller's environment.

  While it is the most relevant to security, the CWD is not the
  only location `CreateProcessW` searches before searching `PATH`
  directories (and regardless of where, if anywhere, they may also
  appear in `PATH`). Another such location is the `System32`
  directory. This is to say that, even when another directory with
  `bash.exe` precedes `System32` in `PATH`, an executable search
  will still find the WSL-associated `bash.exe` in `System32`
  unless it deviates from the algorithm `CreateProcessW` uses.

  To avoid including the CWD in the search, `std::process::Command`
  performs its own path search, then passes the resolved path to
  `CreateProcessW`. The path search it performs is currently almost
  the same the algorithm `CreateProcessW` uses, other than not
  automatically including the CWD. But there are some other subtle
  differences.

  One such difference is that, when the `Command` instance is
  configured to create a modified child environment (for example,
  by `env` calls), the `PATH` for the child is searched early on.
  This precedes a search of the `System32` directory. It is done
  even if none of the customizations of the child environment
  modify its `PATH`.

  This behavior is not guaranteed, and it may change at any time.
  It is also the behavior we rely on inadvertently every time we
  run `bash` on Windows with a `std::process::Command` instance
  constructed by passing `bash` or `bash.exe` as the `program`
  argument: it so happens that we are also customizing the child
  environment, and due to implementation details in the Rust
  standard library, this manages to find a non-WSL `bash` when
  the tests are run in Git Bash, in GitHub Actions jobs, and in
  some other cases.

  If in the future this is not done, or narrowed to be done only
  when `PATH` is one of the environment variables customized for
  the child process, then putting the directory with the desired
  `bash.exe` earlier than the `System32` directory in `PATH` will
  no longer prevent `std::proces::Command` from finding the
  `bash.exe` in `System32` as `CreateProcessW` would and using it.
  Then it would be nontrivial to run the test suite on Windows.

For references and other details, see GitoxideLabs#1359 and comments including:
GitoxideLabs#1359 (comment)

On the approach of finding the Git for Windows `bash.exe` relative
to the `git-core` directory, see the GitPython pull request
gitpython-developers/GitPython#1791.

Two possible future enhancements are *not* included in this commit:

1. This only modifies how test fixture scripts are run. It only
   affects the behavior of `gix-testtools`, and not of any other
   gitoxide crates such as `gix-command`. This is because:

   - While gitoxide uses information from `git` to find out where
     it is installed, mainly so we know where to find installation
     level configuration, we cannot in assume that `git` is present
     at all. Unlike GitPython, gitoxide is usable without `git`.

   - We know our test fixture scripts are all (at least currently)
     `bash` scripts, and this seems likely for other software that
     currently uses this functionality of `gix-testtools`. But
     scripts that are run as hooks, or as custom commands, or
     filters, etc., are often written in other languages, such as
     Perl. (The fallback here does not examine leading `#!` lines.)

   - Although a `bash.exe` located at the usual place relative to
     (but outside of) the `git-core` directory is usually suitable,
     there may be scenarios where running an executable found this
     way is not safe. Limiting it to `gix-testtools` pending
     further research may help mitigate this risk.

2. As in other runs of `git` by `gix-testools`, this calls
   `git.exe`, letting `std::process::Command` do an executable
   search, but not trying any additional locations where Git is
   known sometimes to be installed. This does not find `git.exe` in
   as many situations as `gix_path::env::exe_invocation` does.

   The reasons for not (or not quite yet) including that change are:

   - It would add `gix-path` as a dependency of `gix-testtools`.

   - Finding `git` in a `std::process::Command` path search is an
     established (though not promised) approach in `gix-testtools`,
     including to run `git --exec-path` (to find `git-daemon`).

   - It is not immediately obvious that `exe_invocation` behavior
     is semantically correct for `gix-testtools`, though it most
     likely is reasonable.

     The main issue is that, in many cases where `git` itself runs
     scripts, it prepends the path to the `git-core` directory to
     the `PATH` environment variable for the script. This directory
     has a `git` (or `git.exe`) executable in it, so scripts run
     an equivalent `git` associated with the same installation.

     In contrast, when we run test fixture scripts with a
     `bash.exe` associated with a Git for Windows installation, we
     do not customize its path. Since top-level scripts written to
     use `git` but not to be used *by* `git` are usually written
     without the expectation of such an environment, prepending
     this will not necessarily be an improvement.

EliahKagan added a commit to EliahKagan/gitoxide that referenced this pull request

Dec 1, 2024
Rather than hard-coding `bash` on all systems as the fallback
interpreter when a fixture script cannot be run directly, this
falls back in an operating system specific manner:

- Except on Windows, always fall back to `bash`, as before.

- On Windows, run `git --exec-path` to find the `git-core`
  directory. Then check if a `bash.exe` exists at the expected
  location relative to that. In Git for Windows installations,
  this will usually work. If so, use that path (with `..`
  components resolved away).

- On Windows, if a specific `bash.exe` is not found in that way,
  then fall back to using the relative path `bash.exe`. This is to
  preserve the ability to run `bash` on Windows systems where it
  may have worked before even without `bash.exe` in an expected
  location provided by a Git for Windows installation.

(The distinction between `bash` and `bash.exe` is only slightly
significant: we check for the existence of the interpreter without
initially running it, and that check requires the full filename.
It is called `bash.exe` elsewhere for consistency both with the
checked-for executable and for consistencey with how we run most
other programs on Windows, e.g., the `git` vs. `git.exe`.)

This fixes GitoxideLabs#1359. That bug is not currently observed on CI, but
this change is verified to fix it on a local test system where it
previously always occurred when running the test suite from
PowerShell in an unmodified environment. The fix applies both with
`GIX_TEST_IGNORE_ARCHIVES` unset, in which case there are now no
failures, and with `GIX_TEST_IGNORE_ARCHIVES=1`, in which case the
failures are now limited to the 15 cases tracked in GitoxideLabs#1358.

Previously, fixture scripts had been run on Windows with whatever
`bash` was found in a `PATH` search, which had two problems:

- On most Windows systems, even if no WSL distribution is installed
  and even if WSL itself is not set up, the `System32` directory
  contains a `bash.exe` program associated with WSL. This program
  attempts to use WSL to run `bash` in an installed distribution.
  The `wsl.exe` program also provides this functionality and is
  favored for this purpose, but the `bash.exe` program is still
  present and is likely to remain for many years for compatibility.

  Even when this `bash` is usable, it is not suited for running
  most shell scripts meant to operate on the native Windows system.
  In particular, it is not suitable for running our fixture
  scripts, which need to use the native `git` to prepare fixtures
  to be used natively, among other requirements that would not be
  satisfied with WSL (except when the tests are actually running in
  WSL).

  Since some fixtures are `.gitignore`d because creating them on
  the test system (rather than another system) is part of the test,
  this has caused breakage in most Windows environments unless
  `PATH` is modified -- either explicitly or by testing in an MSYS2
  environment, such as the Git Bash environment -- whether or not
  `GIX_TEST_IGNORE_ARCHIVES` is set. This was the cause of GitoxideLabs#1359.

- Although using a Git Bash environment or otherwise adjusting the
  path *currently* works, the reasons it works are subtle and rely
  on non-guaranteed behavior of `std::process::Command` path search
  that may change without warning.

  On Windows, processes are created by calling the `CreateProcessW`
  API function. `CreateProcessW` is capable of performing a `PATH`
  search, but this `PATH` search is not secure in most uses, since
  it includes the current directory (and searches it before `PATH`
  directories) unless `NoDefaultCurrentDirectoryInExePath` is set
  in the caller's environment.

  While it is the most relevant to security, the CWD is not the
  only location `CreateProcessW` searches before searching `PATH`
  directories (and regardless of where, if anywhere, they may also
  appear in `PATH`). Another such location is the `System32`
  directory. This is to say that, even when another directory with
  `bash.exe` precedes `System32` in `PATH`, an executable search
  will still find the WSL-associated `bash.exe` in `System32`
  unless it deviates from the algorithm `CreateProcessW` uses.

  To avoid including the CWD in the search, `std::process::Command`
  performs its own path search, then passes the resolved path to
  `CreateProcessW`. The path search it performs is currently almost
  the same the algorithm `CreateProcessW` uses, other than not
  automatically including the CWD. But there are some other subtle
  differences.

  One such difference is that, when the `Command` instance is
  configured to create a modified child environment (for example,
  by `env` calls), the `PATH` for the child is searched early on.
  This precedes a search of the `System32` directory. It is done
  even if none of the customizations of the child environment
  modify its `PATH`.

  This behavior is not guaranteed, and it may change at any time.
  It is also the behavior we rely on inadvertently every time we
  run `bash` on Windows with a `std::process::Command` instance
  constructed by passing `bash` or `bash.exe` as the `program`
  argument: it so happens that we are also customizing the child
  environment, and due to implementation details in the Rust
  standard library, this manages to find a non-WSL `bash` when
  the tests are run in Git Bash, in GitHub Actions jobs, and in
  some other cases.

  If in the future this is not done, or narrowed to be done only
  when `PATH` is one of the environment variables customized for
  the child process, then putting the directory with the desired
  `bash.exe` earlier than the `System32` directory in `PATH` will
  no longer prevent `std::proces::Command` from finding the
  `bash.exe` in `System32` as `CreateProcessW` would and using it.
  Then it would be nontrivial to run the test suite on Windows.

For references and other details, see GitoxideLabs#1359 and comments including:
GitoxideLabs#1359 (comment)

On the approach of finding the Git for Windows `bash.exe` relative
to the `git-core` directory, see the GitPython pull request
gitpython-developers/GitPython#1791, its
comments, and the implementation of the approach by @emanspeaks:
https://github.com/gitpython-developers/GitPython/blob/f065d1fba422a528a133719350e027f1241273df/git/cmd.py#L398-L403

Two possible future enhancements are *not* included in this commit:

1. This only modifies how test fixture scripts are run. It only
   affects the behavior of `gix-testtools`, and not of any other
   gitoxide crates such as `gix-command`. This is because:

   - While gitoxide uses information from `git` to find out where
     it is installed, mainly so we know where to find installation
     level configuration, we cannot in assume that `git` is present
     at all. Unlike GitPython, gitoxide is usable without `git`.

   - We know our test fixture scripts are all (at least currently)
     `bash` scripts, and this seems likely for other software that
     currently uses this functionality of `gix-testtools`. But
     scripts that are run as hooks, or as custom commands, or
     filters, etc., are often written in other languages, such as
     Perl. (The fallback here does not examine leading `#!` lines.)

   - Although a `bash.exe` located at the usual place relative to
     (but outside of) the `git-core` directory is usually suitable,
     there may be scenarios where running an executable found this
     way is not safe. Limiting it to `gix-testtools` pending
     further research may help mitigate this risk.

2. As in other runs of `git` by `gix-testools`, this calls
   `git.exe`, letting `std::process::Command` do an executable
   search, but not trying any additional locations where Git is
   known sometimes to be installed. This does not find `git.exe` in
   as many situations as `gix_path::env::exe_invocation` does.

   The reasons for not (or not quite yet) including that change are:

   - It would add `gix-path` as a dependency of `gix-testtools`.

   - Finding `git` in a `std::process::Command` path search is an
     established (though not promised) approach in `gix-testtools`,
     including to run `git --exec-path` (to find `git-daemon`).

   - It is not immediately obvious that `exe_invocation` behavior
     is semantically correct for `gix-testtools`, though it most
     likely is reasonable.

     The main issue is that, in many cases where `git` itself runs
     scripts, it prepends the path to the `git-core` directory to
     the `PATH` environment variable for the script. This directory
     has a `git` (or `git.exe`) executable in it, so scripts run
     an equivalent `git` associated with the same installation.

     In contrast, when we run test fixture scripts with a
     `bash.exe` associated with a Git for Windows installation, we
     do not customize its path. Since top-level scripts written to
     use `git` but not to be used *by* `git` are usually written
     without the expectation of such an environment, prepending
     this will not necessarily be an improvement.