Replace all wildcard imports with explicit imports by EliahKagan · Pull Request #1880 · gitpython-developers/GitPython
This script can be removed after imports are refactored and checked to see that module contents are the same as before or otherwise non-broken. This script assumes that module contents are the same as the contents of their dictionaries, and that all modules in the project get imported as a consequence of importing the top-level module. These are both the case currently for GitPython, but they do not hold for all projects, and may not hold for GitPython at some point in the future.
Although this situation is not inherently desirable, for backward compatibility it cannot change at this time. It may be possible to change it in the next major version of GitPython, but even then it should not be changed accidentally, which can easily happen while refactoring imports. This tests the highest-risk accidental change (of those that are currently known) of the kind that the temporary modattrs.py script exists to help safeguard against. That script will be removed when the immediately forthcoming import refactoring is complete, whereas these test cases can be kept. For information about the specific situation this helps ensure isn't changed accidentally, see the new test cases' docstrings, as well as the next commit (which will test modattrs.py and these test cases by performing an incomplete change that would be a bug until completed). This commit adds three test cases. The first tests the unintuitive aspect of the current situation: - test_git_util_attribute_is_git_index_util The other two test the intuitive aspects of it, i.e., they test that changes (perhaps in an attempt to preserve the aspect needed for backward compatibility) do not make `git.util` unusual in new (and themselves incompatible) ways: - test_git_index_util_attribute_is_git_index_util - test_separate_git_util_module_exists The latter tests should also clarify, for readers of the tests, the limited nature of the condition the first test asserts.
This also checks if the regression tests in test_imports.py work. This replaces wildcard imports in git.index with explicit imports, and adds git.index.__all__. But this temporarily omits from it the public attributes of git.index that name its Python submodules and are thus are present as an indirect effect of importing names *from* them (since doing so also imports them). This partial change, until it is completed in the next commit, represents the kind of bug that modattrs.py seeks to safeguard against, and verifies that a diff between old and current output of modattrs.py clearly shows it. This diff is temporarily included as ab.diff, which will be deleted in the next commit. The key problem that diff reveals is the changed value of the util attribute of the top-level git module. Due to the way wildcard imports have been used within GitPython for its own modules, expressions like `git.util` (where `git` is the top-level git module) and imports like `from git import util` are actually referring to git.index.util, rather than the util Python submodule of the git module (conceptually git.util), which can only be accessed via `from git.util import ...` or in `sys.modules`. Although this is not an inherently good situation, in practice it has to be maintained at least until the next major version, because reasonable code that uses GitPython would be broken if the situation were changed to be more intuitive. But the more intuitive behavior automatically happens if wildcard imports are replaced with explicit imports of the names they had originally intended to import (in this case, in the top-level git module, which has not yet been done but will be), or if __all__ is added where helpful (in this case, in git.index, which this does). Adding the names explicitly is sufficient to restore the old behavior that is needed for backward compatibility, and has the further advantage that the unintuitive behavior will be explicit once all wildcard imports are replaced and __all__ is added to each module where it would be helpful. The modattrs.py script serves to ensure incompatible changes are not made; this commit checks it. The tests in test_imports.py also cover this specific anticipated incompatibility in git.util, but not all breakages that may arise when refactoring imports and that diff-ing modattrs.py output would help catch.
- Add the base, fun, typ, and util Python submodules of git.index to git.index.__all__ to restore the old behavior. - Import them explicitly for clarity, even though they are bound to the same-named attributes of git.index either way. This should help human readers know what the entries in __all__ are referring to, and may also help some automated tools. This is a preliminary decision and not necessarily the one that will ultimately be taken in this import refactoring. It *may* cause some kinds of mistakes, such as those arising from ill-advised use of wildcard imports in production code *outside* GitPython, somewhat worse in their effects. - Remove the ab.diff file that showed the problem this solves. This resolves the problem deliberately introduced in the previous incomplete commit, which modattrs.py output and test_imports test results confirm.
This uses explicit imports rather than wildcard imports in git.refs for names from its submodules. A comment suggested that the import order was deliberate. But each of the modules being imported from defined __all__, and there was no overlap in the names across any of them. The other main reason to import in a specific order is an order dependency in the side effects, but that does not appear to apply, at least not at this time. (In addition, at least for now, this adds explicit imports for the Python submodules of git.refs, so it is clear that they can always be accessed directly in git.refs without further imports, if desired. For clarity, those appear first, and that makes the order of the "from" imports not relevant to such side effects, due to the "from" imports no longer causing modules to be loaded for the first time. However, this is a much less important reason to consider the other imports' reordering okay, because these module imports may end up being removed later during this refactoring; their clarity benefit might not be justified, because if production code outside GitPython ill-advisedly uses wildcard imports, the bad effect of doing so could be increased.)
- No need to import Repo "as Repo". Some tools only recognize this to give the name conceptually public status when it appears in type stub files (.pyi files), and listing it in the newly created __all__ is sufficient to let humans and all tools know it has that status. - As very recently done in git.refs, this explicitly imports the submodules, so it is clear they are available and don't have to be explicitly imported. (Fundamental to the way they are used is that they will end up being imported in order to import Repo.) However, also as in git.refs, it may be that the problems this could cause in some inherently flawed but plausible uses are too greater for it to be justified. So this may be revisited.
Such patching, which was introduced in 9519f18, seems no longer to be necessary. Since git.objects.submodule.util.__all__ has existed for a long time without including those names, they are not conceptually public attributes of git.objects.submodule.util, so they should not be in use by any code outside GitPython either. The modattrs.py script shows the change, as expected, showing these two names as no longer being in the git.objects.submodule.util module dictionary, in output of: python modattrs.py >b git diff --no-index a b However, because the removal is intentional, this is okay.
The submodules being made explicit here are of course Python submodules, not git submodules. The git.objects.submodule Python submodule (which is *about* git submodules) is made explicit here (as are the names imported from that Python submodule's own Python submodules) along with the other Python submodules of git.objects. Unlike some other submodules, but like the top-level git module until gitpython-developers#1659, git.objects already defined __all__ but it was dynamically constructed. As with git.__all__ previously (as noted in gitpython-developers#1859), I have used https://github.com/EliahKagan/deltall here to check that it is safe, sufficient, and probably necessary to replace this dynamically constructed __all__ with a literal list of strings in which all of the original elements are included. See: https://gist.github.com/EliahKagan/e627603717ca5f9cafaf8de9d9f27ad7 Running the modattrs.py script, and diffing against the output from before the current import refactoring began, reveals two changes to module contents as a result of the change here: - git.objects no longer contains `inspect`, which it imported just to build the dynamically constructed __all__. Because this was not itself included in that __all__ or otherwise made public or used out of git.objects, that is okay. This is exactly analogous to the situation in 8197e90, which removed it from the top-level git module after gitpython-developers#1659. - The top-level git module now has has new attributes blob, commit, submodule, and tree, aliasing the corresponding modules. This has happened as a result of them being put in git.objects.__all__ along with various names imported from them. (As noted in some prior commits, there are tradeoffs associated with doing this, and it may be that such elements of __all__ should be removed, here and elsewhere.)
So git.objects.util is less likely to be confused with git.util. (The modattrs.py script reveals the change in the value of git.objects.util.__doc__, as expected.)
This is the top-level of the git.objects.submodule subpackage, for its own Python submodules. This appears only not to have been done before because of a cyclic import problem involving imports that are no longer present. Doing it improves consistency, since all the other subpackages under the top-level git package already do this. The modattrs.py script reveals the four new attributes of git.objects.submodule for the four classes obtained by the new imports, as expected. (The explicit module imports do not change the attribues because they are the same attributes as come into existence due to those Python submodules being imported anywhere in any style.)
These are in the modules that are directly under the top-level git module (and are not themselves subpackages, with their own submodules in the Python sense), except for: - git.util, where this change was very recently made. - git.compat, where no improvements of this kind were needed. - git.types, which currently has no __all__ and will only benefit from it if what should go in it is carefully considered (and where the imports themselves are grouped, sorted, and formatted already).
In git.compat and git.util. This applies them individually to each name that is known to be an unused import, rather than to entire import statements (except where the entire statement fit on one line and everything it imported is known to be unused). Either Ruff has the ability to accept this more granular style of suppression, or I had been unaware of it from flake8 (used before). I have veriifed that the suppressions are not superfluous: with no suppressions, Ruff does warn. This commit makes no change in git.types because it looks like no suppression at all may be needed there anymore; that will be covered in the next commit. This also removes the old `@UnusedImport` comments, which had been kept before because they were more granular; this specificity is now achieved by comments the tools being used can recognize.
In git.types. It appears Ruff, unlike flake8, recognizes "import X as X" to mean that "X" should not be considered unused, even when it appears outside a .pyi type stub file where that notation is more commonly used. Those imports in git.types may benefit from being changed in some way that uses a syntax whose intent is clearer in context, and depending on how that is done it may even be necessary to bring back suppressions. If so, they can be written more specifically. (For now, changing them would express more than is known about what names that are only present in git.types becuase it imports them should continue to be imported, should be considered conceptually public, or should be condered suitable for use within GitPython.)
- Use explicit imports instead of * imports. - Remove now-unneeded linter suppressions. - Alphabetize inside the try-block, though this will be undone. This currently fails to import due to a cyclic import error, so the third change, alphabetizing the imports, will have to be undone (at least in the absence of other changes). It is not merely that they should not be reordered in a way that makes them cross into or out of the try-block, but that within the try block not all orders will work. There will be more to do for backward compatibility, but the modattrs.py script, which imports the top-level git module, cannot be run until the cyclic import problem is fixed.
This still uses all explicit rather than wildcard imports (and
still omits suppressions that are no longer needed due to wildcard
imports not being used). But it brings back the old relative order
of the `from ... import ...` statements inside the try-block.
Since this fixes the circular import problem, it is possible to run
the modattrs.py script to check for changes. New changes since
replacing wildcard imports, which are probably undesirable, are the
removal of these attributes pointing to indirect Python submodules
of the git module:
base -> git.index.base
fun -> git.index.fun
head -> git.refs.head
log -> git.refs.log
reference -> git.refs.reference
symbolic -> git.refs.symbolic
tag -> git.refs.tag
typ -> git.index.typ
These should definitely never be used by code inside or outside of GitPython, as they have never been public, having even been omitted by the dynamic (and expansive) technique used to build git.__all__ in the past (which omitted modules intentionally). However, to ease compatibility, for now they are back. This is so that the change of making all imports explicit rather than using wildcards does not break anything. However, code using these names could never be relied on to work, and these should be considered eligible for removal, at least from the perspective of code outside GitPython. That is for the indirect submodules whose absence as a same-named direct submodule or attribute listed in __all__ was readily discoverable. The more difficult case is util. git.util is a module, git/util.py, which is how it is treated when it appears immediately after the "from" keyword, or as a key in sys.modules. However, the expression `git.util`, and a `from git import util` import, instead access the separate git.index.util module, which due to a wildcard import has been an attribute of the top-level git module for a long time. It may not be safe to change this, because although any code anywhere is better off not relying on this, this situation hasn't been (and isn't) immediately clear. To help with it somewhat, this also includes a detailed comment over where util is imported from git.index, explaining the situation.
Since they are listed in __all__. (They are imported either way because names are imported from them, and this caused them to be present with the same effect.) Though they are proably about to be removed along with the corresponding entries in __all__.
This is for non-toplevel __all__, as git.__all__ already did not do this. As noted in some of the previous commit messags that added them, omitting them might be a bit safer in terms of the impact of bugs bugs in code using GitPython, in that unexpected modules, some of which have the same name as other modules within GitPython, won't be imported due to wildcard imports from intermediate subpackages (those that are below the top-level git package/module but collect non-subpackage modules). Though it is hard to know, since some of these would have been imported before, when an __all__ was not defined at that level. However, a separate benefit of omitting them is consistency with git.__all__, which does not list the direct Python submodules of the git module. This does not affect the output of the modattrs.py script, because the attributes exist with the same objects as their values as a result of those Python submodules being imported (in "from" imports and otherwise), including when importing the top-level git module.
This makes all __all__ everywhere in the git package lists. Before, roughly half were lists and half were tuples. There are reasonable theoretical arguments for both, and in practice all tools today support both. Traditionally using a list is far more common, and it remains at least somewhat more common. Furthermore, git/util.py uses a list and is currently written to append an element to it that is conditionally defined on Windows (though it would probably be fine for that to be the only list, since it reflects an actual relevant difference about it). The goal here is just to remove inconsistency that does not signify anything and is the result of drift over time. If a reason (even preference) arises to make them all tuples in the future, then that is also probably fine.