Yes, file copy (open() + read() + write()) is of course more expensive than just "reading" a tree (os.walk(), glob()) or deleting it (rmtree()) and the "pure file copy" time adds up to the benchmark. And indeed it's not an coincidence that #33671 (which replaced read() + write() with sendfile()) shaved off a 5% gain from the benchmark I posted initially for Linux.
Still, in a 8k small-files-tree scenario we're seeing ~9% gain on Linux, 20% on Windows and 30% on a SMB share on localhost vs. VirtualBox. I do not consider this a "hardly noticeable gain" as you imply: it is noticeable, exponential and measurable, even with cache being involved (as it is).
Note that the number of stat() syscalls per file is being reduced from 6 to 1 (or more if follow_symlinks=False), and that is the real gist here. That *does* make a difference on a regular Windows fs and makes a huge difference with network filesystems in general, as a simple stat() call implies access to the network, not the disk. |