Replace all wildcard imports with explicit imports by EliahKagan · Pull Request #1880 · gitpython-developers/GitPython

@EliahKagan

This script can be removed after imports are refactored and checked
to see that module contents are the same as before or otherwise
non-broken.

This script assumes that module contents are the same as the
contents of their dictionaries, and that all modules in the project
get imported as a consequence of importing the top-level module.
These are both the case currently for GitPython, but they do not
hold for all projects, and may not hold for GitPython at some point
in the future.

@EliahKagan

Although this situation is not inherently desirable, for backward
compatibility it cannot change at this time. It may be possible to
change it in the next major version of GitPython, but even then it
should not be changed accidentally, which can easily happen while
refactoring imports.

This tests the highest-risk accidental change (of those that are
currently known) of the kind that the temporary modattrs.py script
exists to help safeguard against. That script will be removed when
the immediately forthcoming import refactoring is complete, whereas
these test cases can be kept.

For information about the specific situation this helps ensure
isn't changed accidentally, see the new test cases' docstrings, as
well as the next commit (which will test modattrs.py and these test
cases by performing an incomplete change that would be a bug until
completed).

This commit adds three test cases. The first tests the unintuitive
aspect of the current situation:

- test_git_util_attribute_is_git_index_util

The other two test the intuitive aspects of it, i.e., they test
that changes (perhaps in an attempt to preserve the aspect needed
for backward compatibility) do not make `git.util` unusual in new
(and themselves incompatible) ways:

- test_git_index_util_attribute_is_git_index_util
- test_separate_git_util_module_exists

The latter tests should also clarify, for readers of the tests, the
limited nature of the condition the first test asserts.

@EliahKagan

This also checks if the regression tests in test_imports.py work.

This replaces wildcard imports in git.index with explicit imports,
and adds git.index.__all__. But this temporarily omits from it the
public attributes of git.index that name its Python submodules and
are thus are present as an indirect effect of importing names
*from* them (since doing so also imports them).

This partial change, until it is completed in the next commit,
represents the kind of bug that modattrs.py seeks to safeguard
against, and verifies that a diff between old and current output of
modattrs.py clearly shows it. This diff is temporarily included as
ab.diff, which will be deleted in the next commit.

The key problem that diff reveals is the changed value of the util
attribute of the top-level git module. Due to the way wildcard
imports have been used within GitPython for its own modules,
expressions like `git.util` (where `git` is the top-level git
module) and imports like `from git import util` are actually
referring to git.index.util, rather than the util Python submodule
of the git module (conceptually git.util), which can only be
accessed via `from git.util import ...` or in `sys.modules`.
Although this is not an inherently good situation, in practice it
has to be maintained at least until the next major version, because
reasonable code that uses GitPython would be broken if the
situation were changed to be more intuitive.

But the more intuitive behavior automatically happens if wildcard
imports are replaced with explicit imports of the names they had
originally intended to import (in this case, in the top-level git
module, which has not yet been done but will be), or if __all__ is
added where helpful (in this case, in git.index, which this does).
Adding the names explicitly is sufficient to restore the old
behavior that is needed for backward compatibility, and has the
further advantage that the unintuitive behavior will be explicit
once all wildcard imports are replaced and __all__ is added to each
module where it would be helpful. The modattrs.py script serves to
ensure incompatible changes are not made; this commit checks it.

The tests in test_imports.py also cover this specific anticipated
incompatibility in git.util, but not all breakages that may arise
when refactoring imports and that diff-ing modattrs.py output would
help catch.

@EliahKagan

- Add the base, fun, typ, and util Python submodules of git.index
  to git.index.__all__ to restore the old behavior.

- Import them explicitly for clarity, even though they are bound to
  the same-named attributes of git.index either way. This should
  help human readers know what the entries in __all__ are referring
  to, and may also help some automated tools.

  This is a preliminary decision and not necessarily the one that
  will ultimately be taken in this import refactoring. It *may*
  cause some kinds of mistakes, such as those arising from
  ill-advised use of wildcard imports in production code *outside*
  GitPython, somewhat worse in their effects.

- Remove the ab.diff file that showed the problem this solves.

This resolves the problem deliberately introduced in the previous
incomplete commit, which modattrs.py output and test_imports test
results confirm.

@EliahKagan

@EliahKagan

@EliahKagan

This uses explicit imports rather than wildcard imports in git.refs
for names from its submodules.

A comment suggested that the import order was deliberate. But each
of the modules being imported from defined __all__, and there was
no overlap in the names across any of them.

The other main reason to import in a specific order is an order
dependency in the side effects, but that does not appear to apply,
at least not at this time.

(In addition, at least for now, this adds explicit imports for the
Python submodules of git.refs, so it is clear that they can always
be accessed directly in git.refs without further imports, if
desired. For clarity, those appear first, and that makes the order
of the "from" imports not relevant to such side effects, due to the
"from" imports no longer causing modules to be loaded for the first
time. However, this is a much less important reason to consider the
other imports' reordering okay, because these module imports may
end up being removed later during this refactoring; their clarity
benefit might not be justified, because if production code outside
GitPython ill-advisedly uses wildcard imports, the bad effect of
doing so could be increased.)

@EliahKagan

@EliahKagan

- No need to import Repo "as Repo". Some tools only recognize this
  to give the name conceptually public status when it appears in
  type stub files (.pyi files), and listing it in the newly created
  __all__ is sufficient to let humans and all tools know it has
  that status.

- As very recently done in git.refs, this explicitly imports the
  submodules, so it is clear they are available and don't have to
  be explicitly imported. (Fundamental to the way they are used is
  that they will end up being imported in order to import Repo.)

  However, also as in git.refs, it may be that the problems this
  could cause in some inherently flawed but plausible uses are too
  greater for it to be justified. So this may be revisited.

@EliahKagan

But not yet in git.objects.submodule.* modules.

@EliahKagan

@EliahKagan

@EliahKagan

Such patching, which was introduced in 9519f18, seems no longer to
be necessary. Since git.objects.submodule.util.__all__ has existed
for a long time without including those names, they are not
conceptually public attributes of git.objects.submodule.util, so
they should not be in use by any code outside GitPython either.

The modattrs.py script shows the change, as expected, showing these
two names as no longer being in the git.objects.submodule.util
module dictionary, in output of:

    python modattrs.py >b
    git diff --no-index a b

However, because the removal is intentional, this is okay.

@EliahKagan

The submodules being made explicit here are of course Python
submodules, not git submodules.

The git.objects.submodule Python submodule (which is *about* git
submodules) is made explicit here (as are the names imported from
that Python submodule's own Python submodules) along with the other
Python submodules of git.objects.

Unlike some other submodules, but like the top-level git module
until gitpython-developers#1659, git.objects already defined __all__ but it was
dynamically constructed. As with git.__all__ previously (as noted
in gitpython-developers#1859), I have used https://github.com/EliahKagan/deltall here
to check that it is safe, sufficient, and probably necessary to
replace this dynamically constructed __all__ with a literal list of
strings in which all of the original elements are included. See:
https://gist.github.com/EliahKagan/e627603717ca5f9cafaf8de9d9f27ad7

Running the modattrs.py script, and diffing against the output from
before the current import refactoring began, reveals two changes to
module contents as a result of the change here:

- git.objects no longer contains `inspect`, which it imported just
  to build the dynamically constructed __all__. Because this was
  not itself included in that __all__ or otherwise made public or
  used out of git.objects, that is okay. This is exactly analogous
  to the situation in 8197e90, which removed it from the top-level
  git module after gitpython-developers#1659.

- The top-level git module now has has new attributes blob, commit,
  submodule, and tree, aliasing the corresponding modules. This
  has happened as a result of them being put in git.objects.__all__
  along with various names imported from them. (As noted in some
  prior commits, there are tradeoffs associated with doing this,
  and it may be that such elements of __all__ should be removed,
  here and elsewhere.)

@EliahKagan

So git.objects.util is less likely to be confused with git.util.

(The modattrs.py script reveals the change in the value of
git.objects.util.__doc__, as expected.)

@EliahKagan

This is the top-level of the git.objects.submodule subpackage, for
its own Python submodules.

This appears only not to have been done before because of a cyclic
import problem involving imports that are no longer present. Doing
it improves consistency, since all the other subpackages under the
top-level git package already do this.

The modattrs.py script reveals the four new attributes of
git.objects.submodule for the four classes obtained by the new
imports, as expected. (The explicit module imports do not change
the attribues because they are the same attributes as come into
existence due to those Python submodules being imported anywhere in
any style.)

@EliahKagan

+ Update the detailed comment about the unused import situation.

@EliahKagan

@EliahKagan

@EliahKagan

These are in the modules that are directly under the top-level git
module (and are not themselves subpackages, with their own
submodules in the Python sense), except for:

- git.util, where this change was very recently made.

- git.compat, where no improvements of this kind were needed.

- git.types, which currently has no __all__ and will only benefit from
  it if what should go in it is carefully considered (and where the
  imports themselves are grouped, sorted, and formatted already).

@EliahKagan

In git.compat and git.util.

This applies them individually to each name that is known to be an
unused import, rather than to entire import statements (except
where the entire statement fit on one line and everything it
imported is known to be unused).

Either Ruff has the ability to accept this more granular style of
suppression, or I had been unaware of it from flake8 (used before).
I have veriifed that the suppressions are not superfluous: with no
suppressions, Ruff does warn. This commit makes no change in
git.types because it looks like no suppression at all may be
needed there anymore; that will be covered in the next commit.

This also removes the old `@UnusedImport` comments, which had been
kept before because they were more granular; this specificity is
now achieved by comments the tools being used can recognize.

@EliahKagan

In git.types.

It appears Ruff, unlike flake8, recognizes "import X as X" to mean
that "X" should not be considered unused, even when it appears
outside a .pyi type stub file where that notation is more commonly
used.

Those imports in git.types may benefit from being changed in some
way that uses a syntax whose intent is clearer in context, and
depending on how that is done it may even be necessary to bring
back suppressions. If so, they can be written more specifically.
(For now, changing them would express more than is known about what
names that are only present in git.types becuase it imports them
should continue to be imported, should be considered conceptually
public, or should be condered suitable for use within GitPython.)

@EliahKagan

@EliahKagan

- Use explicit imports instead of * imports.
- Remove now-unneeded linter suppressions.
- Alphabetize inside the try-block, though this will be undone.

This currently fails to import due to a cyclic import error, so the
third change, alphabetizing the imports, will have to be undone (at
least in the absence of other changes). It is not merely that they
should not be reordered in a way that makes them cross into or out
of the try-block, but that within the try block not all orders will
work.

There will be more to do for backward compatibility, but the
modattrs.py script, which imports the top-level git module, cannot
be run until the cyclic import problem is fixed.

@EliahKagan

This still uses all explicit rather than wildcard imports (and
still omits suppressions that are no longer needed due to wildcard
imports not being used). But it brings back the old relative order
of the `from ... import ...` statements inside the try-block.

Since this fixes the circular import problem, it is possible to run
the modattrs.py script to check for changes. New changes since
replacing wildcard imports, which are probably undesirable, are the
removal of these attributes pointing to indirect Python submodules
of the git module:

    base -> git.index.base
    fun -> git.index.fun
    head -> git.refs.head
    log -> git.refs.log
    reference -> git.refs.reference
    symbolic -> git.refs.symbolic
    tag -> git.refs.tag
    typ -> git.index.typ

@EliahKagan

These should definitely never be used by code inside or outside of
GitPython, as they have never been public, having even been omitted
by the dynamic (and expansive) technique used to build git.__all__
in the past (which omitted modules intentionally).

However, to ease compatibility, for now they are back. This is so
that the change of making all imports explicit rather than using
wildcards does not break anything. However, code using these names
could never be relied on to work, and these should be considered
eligible for removal, at least from the perspective of code outside
GitPython.

That is for the indirect submodules whose absence as a same-named
direct submodule or attribute listed in __all__ was readily
discoverable.

The more difficult case is util. git.util is a module, git/util.py,
which is how it is treated when it appears immediately after the
"from" keyword, or as a key in sys.modules. However, the expression
`git.util`, and a `from git import util` import, instead access the
separate git.index.util module, which due to a wildcard import has
been an attribute of the top-level git module for a long time.

It may not be safe to change this, because although any code
anywhere is better off not relying on this, this situation hasn't
been (and isn't) immediately clear. To help with it somewhat, this
also includes a detailed comment over where util is imported from
git.index, explaining the situation.

@EliahKagan

@EliahKagan

Since they are listed in __all__. (They are imported either way
because names are imported from them, and this caused them to be
present with the same effect.)

Though they are proably about to be removed along with the
corresponding entries in __all__.

@EliahKagan

This is for non-toplevel __all__, as git.__all__ already did not
do this.

As noted in some of the previous commit messags that added them,
omitting them might be a bit safer in terms of the impact of bugs
bugs in code using GitPython, in that unexpected modules, some of
which have the same name as other modules within GitPython, won't
be imported due to wildcard imports from intermediate subpackages
(those that are below the top-level git package/module but collect
non-subpackage modules). Though it is hard to know, since some of
these would have been imported before, when an __all__ was not
defined at that level.

However, a separate benefit of omitting them is consistency with
git.__all__, which does not list the direct Python submodules of
the git module.

This does not affect the output of the modattrs.py script, because
the attributes exist with the same objects as their values as a
result of those Python submodules being imported (in "from" imports
and otherwise), including when importing the top-level git module.

@EliahKagan

This makes all __all__ everywhere in the git package lists. Before,
roughly half were lists and half were tuples. There are reasonable
theoretical arguments for both, and in practice all tools today
support both. Traditionally using a list is far more common, and it
remains at least somewhat more common. Furthermore, git/util.py
uses a list and is currently written to append an element to it
that is conditionally defined on Windows (though it would probably
be fine for that to be the only list, since it reflects an actual
relevant difference about it).

The goal here is just to remove inconsistency that does not signify
anything and is the result of drift over time. If a reason (even
preference) arises to make them all tuples in the future, then that
is also probably fine.