Issue42958
Created on 2021-01-18 20:24 by AlexVndnblcke, last changed 2022-04-11 14:59 by admin. This issue is now closed.
| Pull Requests | |||
|---|---|---|---|
| URL | Status | Linked | Edit |
| PR 24246 | closed | AlexVndnblcke, 2021-01-18 20:31 | |
| PR 27166 | merged | andrei.avk, 2021-07-15 15:21 | |
| PR 27607 | merged | miss-islington, 2021-08-04 19:39 | |
| PR 27608 | merged | miss-islington, 2021-08-04 19:39 | |
| Messages (14) | |||
|---|---|---|---|
| msg385225 - (view) | Author: Alexander Vandenbulcke (AlexVndnblcke) * | Date: 2021-01-18 20:24 | |
Consider the case where 2 files are shallow compared where only the mtime differs (i.e. the mode and size is identical). With filecmp.cmp(f1, f2, shallow=True) a deep compare would be performed behind the scenes since the guard clauses fell through. This discrepancy is either a problem in the docstring or a problem in the comparison itself. Two options remain: - the documentation is altered and describes that, in case only the mtime differs (i.e. mode and size are equal) a deep compare is performed - the behaviour is restricted in that effectively only a shallow compare is performed - a shallow compare becomes more restrictive (do not consider the mtime during the compare) My preference is to adjust the function to safeguard the meaning of 'shallow' in this context. |
|||
| msg385304 - (view) | Author: Josh Rosenberg (josh.r) * ![]() |
Date: 2021-01-19 22:40 | |
This is a problem with the docstring. The actual docs for it are a bit more clear, https://docs.python.org/3/library/filecmp.html#filecmp.cmp : "If shallow is true, files with identical os.stat() signatures are taken to be equal. Otherwise, the contents of the files are compared." Your patch can't be used because it changes longstanding documented behavior. If you'd like to submit a patch to fix the docstring, that's fine, but we're not going to break existing code to make the function less accurate. The patch should probably make the documentation more clear while it's at it. 1. The original wording could be misinterpreted as having the "Otherwise" apply to shallow=False only, not to the case where shallow=T rue but os.stat doesn't match. 2. The existing wording isn't clear on what an os.stat() "signature" is, which can leave the impression that the entirety of os.stat is compared (which would only match for hardlinks of the same file), when in fact it's just the file type (stat.S_IFMT(st.st_mode), file vs. directory vs. symlink, etc.), size and mtime. Proposed rewording of main docs would be: "If shallow is true, files with identical os.stat() signatures (file type, size, and modification time) are taken to be equal. When shallow is false, or the file signatures are identical, the contents of the files are compared." A similar wording (or at least, a shorter version of the above, rather than a strictly wrong description of the shallow parameter) could be applied to the docstring. |
|||
| msg397320 - (view) | Author: Christof Hanke (chanke) * | Date: 2021-07-12 15:05 | |
Hi, this is also discussed in https://bugs.python.org/issue41354. Ciao, Christof |
|||
| msg397340 - (view) | Author: Andrei Kulakov (andrei.avk) * ![]() |
Date: 2021-07-12 17:35 | |
Christof and Alexander, as you both said your preference is to have some way of enforcing "only shallow" compare, I want to clarify -- it seems if you do that, you will get the answer of True if the files are most likely the same (sig is equal), but if the sig is not equal, I'm not sure what answer you expect or why. The sig may be different because of mtime, and then the files may be different or the same, it's anyone's guess. I wonder if both of you expect the same behavior, and if so, for the same use case or not? |
|||
| msg397356 - (view) | Author: Andrei Kulakov (andrei.avk) * ![]() |
Date: 2021-07-12 21:04 | |
Also see this 16 years old issue: https://bugs.python.org/issue1234674 |
|||
| msg397363 - (view) | Author: Christof Hanke (chanke) * | Date: 2021-07-12 22:30 | |
Hi Andrei,
I would follow rsync.
From the man page:
"""
[...]
-c, --checksum
This changes the way rsync checks if the files have been changed and are in need of a transfer. Without this option, rsync
uses a "quick check" that (by default) checks if each file’s size and time of last modification match between the sender and
receiver.
[...]
"""
so, yes you can have false positives with a shallow comparison of size + mtime only. But that's usually ok for e.g. incremental backups.
Wow, the bug is that old...
|
|||
| msg397364 - (view) | Author: Andrei Kulakov (andrei.avk) * ![]() |
Date: 2021-07-12 22:49 | |
But rsync is a quite more specific tool. For example, unix cmp command does not guess based on any part of `stat` sig. That's a much closer command in scope to 'filecmp'. |
|||
| msg397381 - (view) | Author: Christof Hanke (chanke) * | Date: 2021-07-13 05:44 | |
Andrei, cmp is the deep-compare part of filecmp. I thought we were taking about the shallow one. Thus, - shallow like rsync "quick": size + mtime. - deep like cmp |
|||
| msg397560 - (view) | Author: Andrei Kulakov (andrei.avk) * ![]() |
Date: 2021-07-15 15:37 | |
I've put up the doc update PR here: https://github.com/python/cpython/pull/27166 I've also reviewed a few dozen SO questions about filecmp.cmp and shallow arg, many of them find the docs confusing or lacking, so I thought updating docs is definitely very useful. I haven't seen any of them asking for "force shallow" behavior, but there's many of them and I haven't read all; searching for "force shallow" doesn't find any. I think it might be useful to also add a new 'force shallow' arg, but I hope more people ask for it or compile a list of SO questions asking for it, otherwise it might not be worth additional complexity. |
|||
| msg397561 - (view) | Author: Andrei Kulakov (andrei.avk) * ![]() |
Date: 2021-07-15 15:45 | |
Alexander: sorry, I didn't notice your PR because I was first working on a different issue, and then found this as a duplicate, and only noticed there's already an open PR here. If you prefer, we can close my PR and work on updating yours. |
|||
| msg398944 - (view) | Author: Łukasz Langa (lukasz.langa) * ![]() |
Date: 2021-08-04 19:39 | |
New changeset a8dc4893d2b28827e82447326ea47759c161a722 by andrei kulakov in branch 'main': bpo-42958: Improve description of shallow= in filecmp.cmp docs (GH-27166) https://github.com/python/cpython/commit/a8dc4893d2b28827e82447326ea47759c161a722 |
|||
| msg398949 - (view) | Author: miss-islington (miss-islington) | Date: 2021-08-04 20:03 | |
New changeset c2593b4d06712cefcdeae93b32f88faa4772bc3a by Miss Islington (bot) in branch '3.10': bpo-42958: Improve description of shallow= in filecmp.cmp docs (GH-27166) https://github.com/python/cpython/commit/c2593b4d06712cefcdeae93b32f88faa4772bc3a |
|||
| msg398950 - (view) | Author: Łukasz Langa (lukasz.langa) * ![]() |
Date: 2021-08-04 20:09 | |
New changeset 7dad0337518a0d599caf8f803a5bf45db67cbe9b by Miss Islington (bot) in branch '3.9': bpo-42958: Improve description of shallow= in filecmp.cmp docs (GH-27166) (GH-27608) https://github.com/python/cpython/commit/7dad0337518a0d599caf8f803a5bf45db67cbe9b |
|||
| msg398951 - (view) | Author: Łukasz Langa (lukasz.langa) * ![]() |
Date: 2021-08-04 20:11 | |
Thanks all for your effort on improving this! ✨ 🍰 ✨ |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:59:40 | admin | set | github: 87124 |
| 2021-08-16 23:30:26 | ned.deily | link | issue33896 superseder |
| 2021-08-04 20:11:39 | lukasz.langa | set | status: open -> closed assignee: docs@python messages:
+ msg398951 |
| 2021-08-04 20:09:59 | lukasz.langa | set | messages: + msg398950 |
| 2021-08-04 20:03:40 | miss-islington | set | messages: + msg398949 |
| 2021-08-04 19:39:58 | miss-islington | set | pull_requests: + pull_request26103 |
| 2021-08-04 19:39:54 | miss-islington | set | nosy:
+ miss-islington pull_requests: + pull_request26102 |
| 2021-08-04 19:39:54 | lukasz.langa | set | nosy:
+ lukasz.langa messages: + msg398944 |
| 2021-08-04 18:51:38 | lukasz.langa | link | issue41354 superseder |
| 2021-07-15 15:45:58 | andrei.avk | set | messages: + msg397561 |
| 2021-07-15 15:37:24 | andrei.avk | set | messages: + msg397560 |
| 2021-07-15 15:21:52 | andrei.avk | set | pull_requests: + pull_request25702 |
| 2021-07-13 05:44:46 | chanke | set | messages: + msg397381 |
| 2021-07-12 22:49:21 | andrei.avk | set | messages: + msg397364 |
| 2021-07-12 22:30:15 | chanke | set | messages: + msg397363 |
| 2021-07-12 21:04:32 | andrei.avk | set | messages: + msg397356 |
| 2021-07-12 17:35:27 | andrei.avk | set | nosy:
+ andrei.avk messages: + msg397340 |
| 2021-07-12 15:05:08 | chanke | set | nosy:
+ chanke messages: + msg397320 |
| 2021-01-19 22:40:57 | josh.r | set | nosy:
+ josh.r messages: + msg385304 |
| 2021-01-18 20:31:56 | AlexVndnblcke | set | keywords:
+ patch stage: patch review pull_requests: + pull_request23067 |
| 2021-01-18 20:24:48 | AlexVndnblcke | create | |

