Message 228866 - Python tracker

Author	benhoyt
Recipients	abacabadabacaba, akira, benhoyt, giampaolo.rodola, pitrou, socketpair, tim.golden, vstinner
Date	2014-10-09.12:35:40
Message-id	<1412858141.51.0.983204385418.issue22524@psf.upfronthosting.co.za>
In-reply-to

Content

Thanks, Victor and Antoine. I'm somewhat surprised at the 2-3x numbers you're seeing, as I was consistently getting 4-5x in the Linux tests I did. But it does depend quite a bit on what file system you're running, what hardware, whether you're running in a VM, etc. Still, 2-3x faster is a good speedup!

The numbers are significantly better on Windows, as you can see. Even the smallest speedups I've seen there with "--scandir os" are in the 12x range.
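For anyone who wants a rough feel for how much these numbers vary by machine and file system, here is a minimal timing sketch. It is not the benchmark.py discussed in this issue; the function names (count_dirs_listdir, count_dirs_scandir, best_time) are made up for illustration, and it assumes a Python that already has os.scandir (3.5+). It only counts subdirectories of one directory, so treat the ratio as indicative at best.

```python
import os
import time


def count_dirs_listdir(path):
    """Count subdirectories using listdir() plus one stat call per entry."""
    count = 0
    for name in os.listdir(path):
        # os.path.isdir() issues a stat() system call for every entry.
        if os.path.isdir(os.path.join(path, name)):
            count += 1
    return count


def count_dirs_scandir(path):
    """Count subdirectories using scandir(), which can often skip the stat."""
    count = 0
    for entry in os.scandir(path):
        # is_dir() usually uses the file type returned by the directory scan.
        if entry.is_dir():
            count += 1
    return count


def best_time(func, path, repeat=5):
    """Return the best wall-clock time in seconds over `repeat` runs."""
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        func(path)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    target = "."  # assumption: point this at a large directory to see a difference
    t_listdir = best_time(count_dirs_listdir, target)
    t_scandir = best_time(count_dirs_scandir, target)
    print("listdir+stat: %.6fs  scandir: %.6fs" % (t_listdir, t_scandir))
```

Taking the best of several runs reduces noise from the OS cache warming up, but results on a small directory will mostly measure call overhead.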

In any case, Victor's last tests are "right" -- I presume we'll have *some* C, so what we want to be comparing is "benchmark.py --scandir c" versus "benchmark.py --scandir os": the some-C version versus the all-C version in the attached CPython 3.5 patch.

BTW, Victor, "Generic" isn't really useful. I just used it as a test case that calls listdir() and os.stat() to implement the scandir/DirEntry interface. So it's going to be strictly slower than listdir + stat due to using listdir and creating all those DirEntry objects.
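To make the "Generic" idea concrete, here is a rough sketch of what a listdir()-plus-os.stat() implementation of the scandir/DirEntry interface can look like. This is not the code from the scandir module or the patch; GenericDirEntry and generic_scandir are hypothetical names, and the caching details are an assumption. It shows why this approach can't beat plain listdir + stat: it does the same system calls and allocates an extra object per entry on top.

```python
import os
import stat


class GenericDirEntry:
    """Hypothetical DirEntry-like object that falls back to os.stat()."""

    def __init__(self, dirpath, name):
        self.name = name
        self.path = os.path.join(dirpath, name)
        self._stat = None   # cached os.stat() result (follows symlinks)
        self._lstat = None  # cached os.lstat() result (does not follow)

    def stat(self, follow_symlinks=True):
        # No file-type info comes back from listdir(), so a real stat
        # call is needed the first time anything is asked about the entry.
        if follow_symlinks:
            if self._stat is None:
                self._stat = os.stat(self.path)
            return self._stat
        if self._lstat is None:
            self._lstat = os.lstat(self.path)
        return self._lstat

    def _mode(self, follow_symlinks):
        try:
            return self.stat(follow_symlinks=follow_symlinks).st_mode
        except OSError:
            return None  # e.g. broken symlink or entry removed meanwhile

    def is_dir(self, follow_symlinks=True):
        mode = self._mode(follow_symlinks)
        return mode is not None and stat.S_ISDIR(mode)

    def is_file(self, follow_symlinks=True):
        mode = self._mode(follow_symlinks)
        return mode is not None and stat.S_ISREG(mode)

    def is_symlink(self):
        mode = self._mode(False)
        return mode is not None and stat.S_ISLNK(mode)


def generic_scandir(path="."):
    """Yield one GenericDirEntry per name returned by os.listdir()."""
    for name in os.listdir(path):
        yield GenericDirEntry(path, name)
```

Usage mirrors os.scandir: `for entry in generic_scandir(some_dir): ...`, with `entry.is_dir()` costing a stat call on first use.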

Anyway, where to from here? Are we agreed, given the numbers, that -- especially on Linux -- it makes good performance sense to use an all-C approach?

History
Date	User	Action	Args
2014-10-09 12:35:41	benhoyt	set	recipients: + benhoyt, pitrou, vstinner, giampaolo.rodola, tim.golden, abacabadabacaba, akira, socketpair
2014-10-09 12:35:41	benhoyt	set	messageid: <1412858141.51.0.983204385418.issue22524@psf.upfronthosting.co.za>
2014-10-09 12:35:41	benhoyt	link	issue22524 messages
2014-10-09 12:35:40	benhoyt	create