Issue41354
Created on 2020-07-21 07:29 by chanke, last changed 2022-04-11 14:59 by admin. This issue is now closed.
| Pull Requests | |||
|---|---|---|---|
| URL | Status | Linked | Edit |
| PR 21580 | closed | chanke, 2020-07-21 09:26 | |
| PR 30120 | closed | python-dev, 2021-12-15 13:59 | |
| Messages (6) | |||
|---|---|---|---|
| msg374054 - (view) | Author: Christof Hanke (chanke) * | Date: 2020-07-21 07:29 | |
help(filecmp.cmp) says:
"""
cmp(f1, f2, shallow=True)
Compare two files.
Arguments:
f1 -- First file name
f2 -- Second file name
shallow -- Just check stat signature (do not read the files).
defaults to True.
Return value:
True if the files are the same, False otherwise.
This function uses a cache for past comparisons and the results,
with cache entries invalidated if their stat information
changes. The cache may be cleared by calling clear_cache().
"""
However, looking at the code, the shallow-argument is taken only into account if the signatures are the same:
"""
s1 = _sig(os.stat(f1))
s2 = _sig(os.stat(f2))
if s1[0] != stat.S_IFREG or s2[0] != stat.S_IFREG:
return False
if shallow and s1 == s2:
return True
if s1[1] != s2[1]:
return False
outcome = _cache.get((f1, f2, s1, s2))
if outcome is None:
outcome = _do_cmp(f1, f2)
if len(_cache) > 100: # limit the maximum size of the cache
clear_cache()
_cache[f1, f2, s1, s2] = outcome
return outcome
"""
Therefore, if I call cmp with shallow=True and the stat-signatures differ,
cmp actually does a "deep" compare.
This "deep" compare however does not check the stat-signatures.
Thus I propose follwing patch:
cmp always checks the "full" signature.
return True if shallow and above test passed.
It does not make sense to me that when doing a "deep" compare, that only the size
is compared, but not the mtime.
--- filecmp.py.orig 2020-07-16 12:00:57.000000000 +0200
+++ filecmp.py 2020-07-16 12:00:30.000000000 +0200
@@ -52,10 +52,10 @@
s2 = _sig(os.stat(f2))
if s1[0] != stat.S_IFREG or s2[0] != stat.S_IFREG:
return False
- if shallow and s1 == s2:
- return True
- if s1[1] != s2[1]:
+ if s1 != s2:
return False
+ if shallow:
+ return True
outcome = _cache.get((f1, f2, s1, s2))
if outcome is None:
|
|||
| msg383191 - (view) | Author: Scott (FreeSandwiches) | Date: 2020-12-16 17:46 | |
I suggest changing the documentation rather than the code. The mix up is in the wording. Documentation below "If shallow is true, files with identical os.stat() signatures are taken to be equal. Otherwise, the contents of the files are compared." The "Otherwise" appears to be referencing the "If shallow is true" cause, when it should be referring to the equality of the _sig()s. Proposed Change "If shallow is true, files with identical os.stat() signatures are taken to be equal. If they are not equal, the contents of the files are compared." |
|||
| msg383885 - (view) | Author: Christof Hanke (chanke) * | Date: 2020-12-28 07:45 | |
I understand that you are reluctant to change existing code. But for me as a sysadmin, the current behavior doesn't make sense for two reasons: * st.st_size is part of _sig. why would you do a deep compare if the two files have a different length ? * comparing thousands of files, a proper shallow-only compare is required, since it takes a long time to compare large files (especially when they are migrated to a tape-backend), so a silent-fallback to a deep-compare is not good. |
|||
| msg397272 - (view) | Author: Andrei Kulakov (andrei.avk) * ![]() |
Date: 2021-07-12 01:31 | |
Christof: I've left one comment on the PR. Generally your reasoning (on why unexpected deep cmp is not ideal), makes sense to me. However, this is a backwards incompatible change that will probably affect a large number of a lot of scripts that quietly run in the background and might silently break (i.e. just by affecting different sets of files, with no reported errors or warnings) with this change. Can you find other instances where people complained about this behavior e.g. here on BPO or SO? What do you think about making this change while preserving backwards compatibility e.g. via a new arg? |
|||
| msg397318 - (view) | Author: Christof Hanke (chanke) * | Date: 2021-07-12 15:04 | |
Andrei, See https://bugs.python.org/issue42958 Someone else stumbled over this topic. Maybe you can merge these two requests? Otherwise, I'm fine with a new arg. Christof |
|||
| msg398933 - (view) | Author: Ćukasz Langa (lukasz.langa) * ![]() |
Date: 2021-08-04 18:51 | |
Closed in favor of BPO-42958. Thanks for your report, Christof! |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:59:33 | admin | set | github: 85526 |
| 2021-12-15 13:59:10 | python-dev | set | nosy:
+ python-dev pull_requests: + pull_request28340 |
| 2021-08-04 18:51:38 | lukasz.langa | set | status: open -> closed superseder: filecmp.cmp(shallow=True) isn't actually shallow when only mtime differs nosy:
+ lukasz.langa |
| 2021-07-12 15:04:28 | chanke | set | messages: + msg397318 |
| 2021-07-12 01:31:37 | andrei.avk | set | nosy:
+ andrei.avk messages: + msg397272 |
| 2020-12-28 07:45:44 | chanke | set | messages: + msg383885 |
| 2020-12-16 17:46:24 | FreeSandwiches | set | nosy:
+ FreeSandwiches messages: + msg383191 |
| 2020-07-21 09:26:27 | chanke | set | keywords:
+ patch stage: patch review pull_requests: + pull_request20722 |
| 2020-07-21 07:29:03 | chanke | create | |

