Improve scripts and tool configurations by EliahKagan · Pull Request #1693

Improve scripts and tool configurations by EliahKagan · Pull Request #1693 · gitpython-developers/GitPython

added 30 commits

October 3, 2023 14:02

This also reorders the hooks from pre-commit/pre-commit-hooks so
that the overall order of all hooks from all repositories is: lint
Python, lint non-Python, non-lint.

Its output is colorized normally, but on CI it is not colorized
without this (pre-commit's own output is, but not shellcheck's).

This suppresses ShellCheck SC2016, "Double quote to prevent
globbing and word splitting," on the command in the version check
script that expands $config_opts to build the "-c ..." arguments.

It also moves the code repsonsible for getting the latest tag,
which this is part of, into a function for that purpose, so it's
clear that building config_opts is specifically for that, and so
that the code is not made harder to read by adding the ShellCheck
suppression comment.

(The suppression applies only to the immediate next command.)

In the release building script, this changes $1 to "$1", because
$1 without quotes means to perform word splitting and globbing
(pathname expansion) on the parameter (unless otherwise disabled by
the value of $IFS or "set -f", respectively) and use the result of
doing so, which isn't the intent of the code.

This function is only used from within the script, where it is
not given values that would be changed by these additional
expansions. So this is mainly about communicating intent.

(If in the future it is intended that multiple arguments be usable,
then they should be passed as separate arguments to release_with,
which should be modified by replacing "$1" with "$@". I have not
used "$@" at this time because it is not intuitively obvious that
the arguments should be to the interpreter, rather than to the
build module, so I don't think this should be supported unless or
until a need for it determines that.)

I think this is easier to read, but this is admittedly subjective.

This commit also makes the separate change of adjusting comment
spacing for consistency within the script. (Two spaces before "#"
is not widely regarded as better than one in shell scripting, so
unlike Python where PEP-8 recommends that, it would be equally good
to have changed all the other places in the shell scrips to have
just one.)

The script is nonportable to other shells already because of its
use of trap (and other features, such as implicit assumptions made
about echo, and the function keyword), which its hashbang already
clearly conveys.

This uses:

- $(<X) in place of $(cat X), to have the shell read the file
  directly rather than using cat.

- printf -v in one place to set a variable rather than printing.
  This eliminates a command substitution that was harder to read.

Because users may have an old version of git without "git switch",
init-tests-after-clone.sh should continue to use "git checkout" to
attempt to switch to master. But without "--", this suffers from
the problem that it's ambiguous if master is a branch (so checkout
behaves like switch) or a path (so checkout behaves like restore).

There are two cases where this ambiguity can be a problem. The most
common is on a fork with no master branch but also, fortunately, no
file or directory named "master". Then the problem is just the
error message (printed just before the script proceeds to redo
the checkout with -b):

    error: pathspec 'master' did not match any file(s) known to git

The real cause of the error is the branch being absent, as happens
when a fork copies only the main branch and the upstream remote is
not also set up. Adding the "--" improves the error message:

    fatal: invalid reference: master

However, it is possible, though unlikely, for a file or directory
called "master" to exist. In that case, if there is also no master
branch, git discards unstaged changes made to the file or (much
worse!) everywhere in that directory, potentially losing work.

This commit adds "--" to the right of "master" so git never
regards it as a path. This is not needed with -b, which is always
followed by a symbolic name, so I have not added it there.

(Note that the command is still imperfect because, for example, in
rare cases there could be a master *tag*--and no master branch--in
which case, as before, HEAD would be detached there and the script
would attempt to continue.)

This had been done everywhere except in init-tests-after-clone.sh.

This translates init-tests-after-clone.sh from bash to POSIX sh,
changing the hashbang and replacing all bash-specific features, so
that it can be run on any POSIX-compatible ("Bourne style") shell,
including but not limited to bash. This allows it to be used even
on systems that don't have any version of bash installed anywhere.

Only that script is modified. The other shell scripts make greater
use of (and gain greater benefit from) bash features, and are also
only used for building and releasing. In contrast, this script is
meant to be used by most users who clone the repository.

This makes three related changes:

- Removes "git fetch --tags" from the instructions in the readme,
  because the goal of this command can be achieved in the
  init-tests-after-clone.sh script, and because this fetch command,
  as written, is probably only achieving that goal in narrow cases.
  In clones and fetches, tags on branches are fetched by default,
  and the tags needed by the tests are on branches. So the
  situations where "git fetch --tags" was helping were (a) when the
  remote recently gained the tags, and (b) when the original remote
  was cloned in an unusual way, not fetching all tags. In both
  cases, the "--tags" option is not what makes that fetch get the
  needed tags.

- Adds "git fetch --all --tags" to init-tests-after-clone.sh. The
  "--all" option causes it to fetch from all remotes, and this is
  more significant than "--tags", since the tags needed for testing
  are on fetched branches. This achieves what "git fetch --tags"
  was achieving, and it also has the benefit of getting tags from
  remotes that have been added but not fetched from, as happens
  with an upstream remote that was manually added with no further
  action. (It also gets branches from those remotes, but if master
  is on multiple remotes but at different commits then "git
  checkout master" may not be able to pick one. So do this *after*
  rather than before that.)

- Skips this extra fetch, and also the submodule cloning/updating
  step, when running on CI. CI jobs will already have taken care of
  cloning the repo with submodules recursively, and fetching all
  available tags. In forks without tags, the necessary tags for the
  test are not fetched--but the upstream remote is not set up on
  CI, so they wouldn't be obtained anyway, not even by refetching
  with "--all".

This extracts duplicated code to functions in check-version.sh.

One effect is unfortunately that the specific commands being run
are less explicitly clear when reading the script. However, small
future changes, if accidentally made to one but not the other in
either pair of "git status" commands, would create a much more
confusing situation. So I think this change is justified overall.

- Because the substitition string after the hyphen is empty,
  "${VIRTUAL_ENV:-}" and "${VIRTUAL_ENV-}" have the same effect.
  However, the latter, which this changes it to, expresses the
  correct idea that the special case being handled is when the
  variable is unset: in this case, we expand an empty field rather
  than triggering an error due to set -u. When the variable is set
  but empty, it already expands to the substitution value, and
  including that in the special case with ":" is thus misleading.

- Continuing in the vein of d18d90a (and 1e0b3f9), this removes
  another explicit newline by adding another echo command to print
  the leading blank line before the table.

This adds comments to init-tests-after-clone.sh to explain what
each of the steps is doing that is needed by some of the tests.

It also refactors some recently added logic, in a way that lends
itself to greater clarity when combined with these comments.

This is already done in the other shell scripts. Although
init-tests-after-clone.sh does not have as many places where a bug
could slip through by an inadvertently nonexistent parameter, it
does have $answer (and it may have more expansions in the future).

This is a minor improvement to the robustness of the command for
"make all", in the face of plausible future target names.

- Use [[:alpha:]] instead of [a-z] because, in POSIX BRE and ERE,
  whether [a-z] includes capital letters depends on the current
  collation order. (The goal here is greater consistency across
  systems, and for this it would also be okay to use [[:lower:]].)

- Pass -x to the command that filters out the all target itself,
  so that the entire name must be "all", because a future target
  may contain the substring "all" as part of a larger word.

This seems simpler to me, but I admit it is subjective.

- Add the psf/black-pre-commit-mirror pre-commit hook but have it
  just check and report violations rather than fixing them
  automatically (to avoid inconsistency with all the other hooks,
  and also so that the linting instructions and CI lint workflow
  can remain the same and automatically do the black check).

- Remove the "black" environment from tox.ini, since it is now part
  of the linting done in the "lint" environment. (But README.md
  remains the same, as it documented actually auto-formatting with
  black, which is still done the same way.)

- Add missing "exclude" keys for the shellcheck and black
  pre-commit hooks to explicitly avoid traversing into the
  submodules.

This splits the pre-commit hook for black into two hooks, both
using the same repo and id but separately aliased:

- black-check, which checks but does not modify files. This only
  runs when the manual stage is specified, and it is used by tox
  and on CI, with tox.ini and lint.yml modified accordingly.

- black-format, which autoformats code. This provides the behavior
  most users will expect from a pre-commit hook for black. It runs
  automatically along with other hooks. But tox.ini and lint.yml,
  in addition to enabling the black-check hook, also explicitly
  skip the black-format hook.

The effect is that in ordinary development the pre-commit hook for
black does auto-formatting, but that pre-commit continues to be
used effectively for running checks in which code should not be
changed.

In the lint workflow, passing extra-args replaced --all-files,
rather than keeping it but adding the extra arguments after it.

(But --show-diff-on-failure and --color=always were still passed.)

Now that the changes to lint.yml are fixed up, tested, and shown
to be capable of revealing formatting errors, the formatting error
I was using for testing (which is from 9fa1cee where I had
introduced it inadvertently) can be fixed.

As on CI and with tox (but not having to build and create and use a
tox environment).

This may not be the best way to do it, since currently the
project's makefiles are otherwise used only for things directly
related to building and publishing. However, this seemed like a
readily available way to run the moderately complex command that
produces the same behavior as on CI or with tox, but outside a tox
environment. It may be possible to improve on this in the future.

This adds BUILDDIR as a variable in the documentation generation
makefile that, along SPHINXOPTS, SPHINXBUILD, and PAPER, defaults
to the usual best value but can be set when invoking make.

The main use for choosing a different build output directory is to
test building without overwriting or otherwise interfering with any
files from a build one may really want to use. tox.ini is thus
modified to pass a path pointing inside its temporary directory as
BUILDDIR, so the "html" tox environment now makes no changes
outside the .tox directory. This is thus suitable for being run
automatically, so the "html" environment is now in the envlist.

As well as still checking for Travis, for backward compatibility
and because experience shows that this is safe.

The check can be much broader, and would be more accurate, with
fewer false negatives. But a false positive can result in local
data loss because the script does hard resets on CI without
prompting for confirmation. So for now, this just checks $TRAVIS
and $GITHUB_ACTIONS.

Now that GHA is included, the CI workflows no longer need to set
$TRAVIS when running the script, so that is removed.

This extends the init script so it tries harder to get version
tags:

- As before, locally, "git fetch --all --tags" is always run. (This
  also fetches other objects, and the developer experience would be
  undesirably inconsistent if that were only sometimes done.)

- On CI, run it if version tags are absent. The criterion followed
  here and in subsequent steps is that if any existing tag starts
  with a digit, or with the letter v followed by a digit, we regard
  version tags to be present. This is to balance the benefit of
  getting the tags (to make the tests work) against the risk of
  creating a very confusing situation in clones of forks that
  publish packages or otherwise use their own separate versioning
  scheme (especially if those tags later ended up being pushed).

- Both locally and on CI, after that, if version tags are absent,
  try to copy them from the original gitpython-developers/GitPython
  repo, without copying other tags or adding that repo as a remote.
  Copy only tags that start with a digit, since v+digit tags aren't
  currently used in this project (though forks may use them). This
  is a fallback option and it always displays a warning. If it
  fails, another message is issued for that. Unexpected failure to
  access the repo terminates the script with a failing exit status,
  but the absence of version tags in the fallback remote does not,
  so CI jobs can continue and reveal which tests fail as a result.

On GitHub Actions CI specifically, the Actions syntax for creating
a warning annotation on the workflow is used, but the warning is
still also written to stderr (as otherwise). GHA workflow
annotations don't support multi-line warnings, so the message is
adjusted into a single line where needed.

In the step output, the warning that produces a workflow annotation
is fully readable (and even made to stand out), so it doesn't need
to be printed in the plain way as well, which can be reserved for
when GitHub Actions is not detected.

The tox environments that are not duplicated per Python version
were set to run on Python 3.9, for consistency with Cygwin, where
3.9 is the latest available (through official Cygwin packages), so
output can be compared between Cygwin and other platforms while
troubleshooting problems.

However, this prevented them from running altogether on systems
where Python 3.9 was not installed. So I've added all the other
Python versions the project supports as fallback versions for those
tox environments to use, in one of the reasonable precedence
orders, while keeping 3.9 as the first choice for the same reasons
as before.

This changed a while ago but README.md still listed having a main
branch as a condition for automatic CI linting and testing in a
fork.

This is probably the *only* way anyone should run that script on
Windows, but I don't know of specific bad things that happen if it
is run in some other way, such as with WSL bash, aside from messing
up line endings, which users are likely to notice anyway.

This commit also clarifies the instructions by breaking up another
paragraph that really represented two separate steps.

This was referenced

Oct 7, 2023

EliahKagan added a commit to EliahKagan/GitPython that referenced this pull request

Oct 18, 2023

Since 7110bf8 (in gitpython-developers#1693), "git submodule update --init --recursive"
was not run on CI, on the mistaken grounds that the CI test
workflows would already have taken care of cloning all submodules
(ever since 4eef3ec when the "submodules: recursive" option was
added to the actions/checkout step).

This changes the init-tests-after-clone.sh script to again run that
command unconditionally, including on CI. The assumption that it
wasn't needed on CI was based on the specific content of
GitPython's own GitHub Actions workflows. But this disregarded that
the test suite is run on CI for *other* projects: specifically, for
downstream projects that package GitPython.

EliahKagan added a commit to EliahKagan/GitPython that referenced this pull request

Oct 18, 2023

Since 7110bf8 (in gitpython-developers#1693), "git submodule update --init --recursive"
was not run on CI, on the mistaken grounds that the CI test
workflows would already have taken care of cloning all submodules
(ever since 4eef3ec when the "submodules: recursive" option was
added to the actions/checkout step).

This changes the init-tests-after-clone.sh script to again run that
command unconditionally, including on CI. The assumption that it
wasn't needed on CI was based on the specific content of
GitPython's own GitHub Actions workflows. But this disregarded that
the test suite is run on CI for *other* projects: specifically, for
downstream projects that package GitPython (gitpython-developers#1713).

This also brings back the comment from fc96980 that says more about
how the tests rely on submodules being present (specifically, that
they need a submodule with a submodule). However, that is not
specifically related to the bug being fixed.

EliahKagan added a commit to EliahKagan/GitPython that referenced this pull request

Mar 12, 2024

This is to make it so simple `tox` usage has the expected property
of leaving all source code files in the working tree unchanged.

As noted in c66257e, linting how sometimes performs auto-fixes
since gitpython-developers#1862, and the pre-commit command in tox.ini, which had also
run `black --check`, will do even more file editing with gitpython-developers#1865.

The bifurcation for black into separate mutating and non-mutating
hooks, introduced in 5d8ddd9 (gitpython-developers#1693), is not carried over into Ruff
autoformatting in gitpython-developers#1865. But also it:

- Was not necessarily a good approach, and likely should not be
preserved in any form. It is an unusual and unintuitive use of
pre-commit. (It can be brought back if no better approach is
found, though.)

- Was done to avoid a situation where it was nontrivial to set up
necessary dependencies for linting in the GitPython virtual
environment itself, because flake8 and its various plugins would
have to be installed.

They were not listed in any existing or newly introduced extra
(for example, they were not added to test-requirements.txt) in
part in the hope that they would all be replaced by Ruff, which
happened in gitpython-developers#1862.

- Already does not achieve its goal since gitpython-developers#1862, since it was
(probably rightly) not extended to Ruff linting to use/omit --fix.

Now that Ruff is being used, people can run `pip install ruff` in a
virtual environment, then run the `ruff` command however they like.
This takes the place of multiple tools and plugins.

This commit *avoids* doing any of the following, even though it may
be useful to do them later:

- This does not give specific instructions in the readme for
installing and running ruff (and c66257e before this also omits
that). This can be added later and the best way to document it
may depend on some other upcoming decisions (see below).

- This does not add ruff to the test extra or as any other kind of
extra or optional dependency. Although the test extra currently
contains some packages not used for running unit tests, such as
pre-commit and mypy, adding Ruff will cause installation to take
a long time and/or or fail on some platforms like Cygwin where
Ruff has to be built from (Rust) source code. This can be solved
properly by reorganizing the extras, but that is likely to wait
until they are expressed completely inside pyproject.toml rather
than one per requirements file (see discussion in comments
in gitpython-developers#1716 for general information about possible forthcoming
changes in how the project is defined).

- This does not update tox.ini to run ruff directly, which could be
done by splitting the current lint tox environment into two or
three environments for Python linting, Python autoformatting, and
the miscellaneous other tasks performed by pre-commit hooks, only
the latter of which would use the pre-commit command. Whether and
how this should be done may depend on other forthcoming changes.

- This does not remove or update the Makefile "lint" target. That
target should perhaps not have been added, and was always meant
to be improved/replaced, since everything else in the top-level
Makefile is for building and publishing releases. See 5d15063
(gitpython-developers#1693). This is likewise not done now since it may depend on
as-yet unmerged changes and tooling decisions not yet made. It
should be feasible to do together when further updating tox.ini.

- This does not update tox.ini, Makefile, or the lint.yml GitHub
Actions workflow to omit the manual hook-stage, which will be
unused as of gitpython-developers#1865. This would complicate integration of changes,
and once it is known that it won't be needed, it can removed.

The situation with the tox "lint" environment is thus now similar
to that of the tox "html" environment when it was added in e6ec6c8
(gitpython-developers#1667), until it was improved in f094909 (gitpython-developers#1693) to run with
proper isolation.

EliahKagan added a commit to EliahKagan/GitPython that referenced this pull request

Mar 13, 2024

This is to make it so simple `tox` usage has the expected property
of leaving all source code files in the working tree unchanged.

Linting how sometimes performs auto-fixes since gitpython-developers#1862, and the
pre-commit command in tox.ini, which had also run `black --check`,
will do even more file editing due to the changes in gitpython-developers#1865.

The bifurcation for black into separate mutating and non-mutating
hooks, introduced in 5d8ddd9 (gitpython-developers#1693), was not carried over into
Ruff autoformatting in gitpython-developers#1865. But also it:

- Was not necessarily a good approach, and likely should not be
preserved in any form. It was an unusual and unintuitive use of
pre-commit. (It can be brought back if no better approach is
found, though.)

- Already did not achieve its goal as of gitpython-developers#1862, since it was
(probably rightly) not extended to Ruff linting to use/omit --fix.

Now that Ruff is being used, people can run `pip install ruff` in a
virtual environment, then run the `ruff` command however they like.
This takes the place of multiple tools and plugins.