Allow -E to take a block size argument so errors cause a skip to the next block
Date: Tue, 04 Oct 2016 21:10:36 -0400
From: Anthony DeRobertis
To: Debian Bug Tracking System
Subject: Bug#839796: Ability to specify size for --skip-errors
Package: pv
Version: 1.6.0-1
Severity: wishlist
pv's --skip-errors behavior currently tries skipping 1 byte a few
times, then 2 bytes a few times, then 1 byte a few more (because it was
at 14 bytes), then 4 a few, and so on, until it finally finishes by
skipping 512 bytes 7 times; all in all, I counted 25 attempts,
totaling, of course, 4k.
That's because it's a disk with 4k sectors: bad blocks on the disk
fundamentally cannot be smaller than 4k. (Nor larger, really, but of
course you can have two in a row.)
This wouldn't be a huge deal, except that disks take forever attempting
to read bad sectors. 30 seconds per failed read is quite possible (some
take longer, actually, and Linux times out, then you get to wait through
a reset cycle, though strictly speaking that's admin error in failing to
raise the Linux timeout). At 30 seconds per failed read, those 25
attempts work out to roughly 12.5 minutes spent on a single bad 4k region.
I'm not sure if there is any storage device which can have a 1-byte
error. Wikipedia tells me there were once floppy formats with 128-byte
sectors. It was apparently common enough with 8-inch and even 5¼-inch
floppies. (Did floppies have error-detecting codes? No idea.)
Also, I think without O_DIRECT, you can't actually get a bad block
smaller than Linux's page size of 4k.
So really it seems like -E ought to just skip 4k at once.
Or at least there ought to be an option to specify the skip size. I'd
suggest -E should take an optional argument, the skip size, and default
to 4k.
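(Purely for illustration, a minimal sketch of how such an optional
skip-size argument might be parsed with a 4k default; this is not pv's
actual option handling, and parse_skip_size is an invented name.)

    /* Hypothetical sketch only -- not pv's actual option handling.
     * Parse an optional skip-size argument such as "4k" or "512",
     * defaulting to 4096 bytes when -E is given with no argument. */
    #include <stdlib.h>

    static long parse_skip_size(const char *arg)
    {
        char *end;
        long size;

        if (arg == NULL || *arg == '\0')
            return 4096;                    /* proposed default: 4k */

        size = strtol(arg, &end, 10);
        if (size <= 0)
            return -1;                      /* reject nonsense */

        if (*end == 'k' || *end == 'K')
            size *= 1024;                   /* allow a "k" suffix */
        else if (*end == 'm' || *end == 'M')
            size *= 1024 * 1024;
        else if (*end != '\0')
            return -1;                      /* unknown suffix */

        return size;
    }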
-- System Information:
Debian Release: stretch/sid
APT prefers unstable-debug
APT policy: (500, 'unstable-debug'), (500, 'testing'), (500, 'stable'), (130, 'unstable'), (120, 'experimental'), (1, 'experimental-debug')
Architecture: amd64 (x86_64)
Foreign Architectures: i386
Kernel: Linux 4.6.0-1-amd64 (SMP w/8 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/bash
Init: systemd (via /run/systemd/system)
Versions of packages pv depends on:
ii libc6 2.24-3
pv recommends no packages.
Versions of packages pv suggests:
ii doc-base 0.10.7
-- no debconf information
----
Date: Thu, 20 Oct 2016 21:47:52 +0100
From: Andrew Wood
To: Anthony DeRobertis
Subject: Re: Bug#839796: Ability to specify size for --skip-errors
I agree that this sounds like a good idea but there are some complications.
What if we get an error while half-way through a block, because the disk is
failing but not wholly failed? Do we skip a whole block, to half-way
through the next block, or only skip to the start of the next block? How do
we know where the next block is? Can we safely assume that all files on all
filesystems always start at the beginning of a block?
So perhaps if -E took an argument, we would explicitly state in the
documentation that on error, PV would skip to the start of the next block,
and would assume that the file starts at the beginning of a block. So with
"-E 4k", an error at 2k would skip to 4k, an error at 4k would skip to 8k,
and so on.
Does this seem reasonable?
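(A minimal sketch of the arithmetic this implies, assuming the file
starts at a block boundary; next_block and block_size are illustrative
names, not anything in pv.)

    /* Illustrative only: on a read error at 'offset', resume at the
     * start of the next 'block_size'-aligned block.  With -E 4k, an
     * error at 2k resumes at 4k and an error at 4k resumes at 8k. */
    #include <sys/types.h>

    static off_t next_block(off_t offset, off_t block_size)
    {
        return ((offset / block_size) + 1) * block_size;
    }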
----
Date: Fri, 9 Dec 2016 16:11:48 -0500
From: Anthony DeRobertis
To: Andrew Wood
Subject: Re: Bug#839796: Ability to specify size for --skip-errors
I lost track of this email, sorry for the very late response.
On Thu, Oct 20, 2016 at 09:47:52PM +0100, Andrew Wood wrote:
> I agree that this sounds like a good idea but there are some complications.
> What if we get an error while half-way through a block, because the disk is
> failing but not wholly failed?
BTW: If -E is given the actual disk sector size, that can't happen. Also,
unless using O_DIRECT, I believe it can't happen on Linux if it's given
the page size (even if disk sectors are smaller).
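(For illustration only, and not part of pv: one way to look up the
sizes in question on Linux -- the kernel page size and a block device's
logical and physical sector sizes -- when choosing a skip size. Run it
against e.g. /dev/sda; it needs read permission on the device.)

    /* Sketch: query the kernel page size and, for a block device,
     * its logical and physical sector sizes. */
    #include <stdio.h>
    #include <unistd.h>
    #include <fcntl.h>
    #include <sys/ioctl.h>
    #include <linux/fs.h>

    int main(int argc, char **argv)
    {
        long page = sysconf(_SC_PAGESIZE);    /* typically 4096 */
        printf("page size: %ld\n", page);

        if (argc > 1) {
            int fd = open(argv[1], O_RDONLY);
            if (fd >= 0) {
                int logical = 0;
                unsigned int physical = 0;
                ioctl(fd, BLKSSZGET, &logical);   /* logical sector size  */
                ioctl(fd, BLKPBSZGET, &physical); /* physical sector size */
                printf("logical: %d, physical: %u bytes\n", logical, physical);
                close(fd);
            }
        }
        return 0;
    }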
> Do we skip a whole block, to half-way
> through the next block, or only skip to the start of the next block? How do
> we know where the next block is? Can we safely assume that all files on all
> filesystems always start at the beginning of a block?
On almost every filesystem, yes. At least for files of non-trivial size.
The performance of partial-sector writes is bad (forces an extra read
first), so filesystems avoid that. The only exceptions you're likely to
see are filesystems designed for tiny flash (embedded) and FUSE stuff
(e.g., read-only access to archives).
> So perhaps if -E took an argument, we would explicitly state in the
> documentation that on error, PV would skip to the start of the next block,
> and would assume that the file starts at the beginning of a block. So with
> "-E 4k", an error at 2k would skip to 4k, an error at 4k would skip to 8k,
> and so on.
>
> Does this seem reasonable?
That sounds reasonable. Sensible, even.