* What would happen if the block device driver/firmware found some block of a bio is corrupted?
From: Qu Wenruo @ 2023-01-24 2:31 UTC (permalink / raw)
To: linux-block@vger.kernel.org, Linux FS Devel
Hi,
I'm wondering what would happen if we submit a read bio containing
multiple sectors, while the block device driver/firmware has an internal
checksum and finds just one sector is corrupted (mismatching its
internal csum)?
For example, we submit a 16KiB read bio, and the device has a 4K
sector size (like most modern HDDs/SSDs).
The corruption happens at the 2nd sector of the 16KiB.
My instinct says it would be one of the following:
A) Mark the whole 16KiB bio as BLK_STS_IOERR
This means even though we have 3 good sectors, we have to treat them
all as errors.
B) Ignore the error and mark the bio as BLK_STS_OK
This means the higher layers must have extra ways to verify the contents.
But my concern is, if we go with path A), then after a read bio failure
we should retry the read with a much smaller block size, until we hit a
failure on a single sector.
IIRC the VFS would do some retries, but otherwise if the FS/driver layer
does some internal work and hits an error, it needs to do the
split-and-retry manually.
On the other hand, path B) seems more straightforward, but the problem
is also obvious: corrupted data would be returned as if it were good.
Thankfully most fses already checksum at least their metadata.
So what's the common solution in the real world for device
drivers/firmware? Path A/B, or something else?
And should the upper layers do the extra split-and-retry by themselves?
I know the btrfs scrub and repair code does such split-and-retry, but
I'm not 100% sure whether it's really needed or helpful in the real world.
Thanks,
Qu
* Re: What would happen if the block device driver/firmware found some block of a bio is corrupted?
From: Keith Busch @ 2023-01-24 4:44 UTC (permalink / raw)
To: Qu Wenruo; +Cc: linux-block@vger.kernel.org, Linux FS Devel
On Tue, Jan 24, 2023 at 10:31:47AM +0800, Qu Wenruo wrote:
> I'm wondering what would happen if we submit a read bio containing multiple
> sectors, while the block device driver/firmware has an internal checksum and
> finds just one sector is corrupted (mismatching its internal csum)?
>
> For example, we submit a 16KiB read bio, and the device has a 4K
> sector size (like most modern HDDs/SSDs).
> The corruption happens at the 2nd sector of the 16KiB.
>
> My instinct says it would be one of the following:
>
> A) Mark the whole 16KiB bio as BLK_STS_IOERR
> This means even though we have 3 good sectors, we have to treat them
> all as errors.
I believe BLK_STS_MEDIUM is the appropriate status for this scenario,
not IOERR. The MEDIUM status is propagated up as the ENODATA errno.
Finding the specific failed sectors makes sense if part of the
originally requested data is useful. That's application specific, so the
retry logic should probably be driven at a higher level than the block
layer, based on seeing a MEDIUM error.
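
For example, a submitter's completion handler could key its retry logic
off that status. A rough sketch (my_retry_work() is a made-up
placeholder for whatever retry mechanism the caller has, not an
existing kernel helper):

#include <linux/bio.h>

static void my_retry_work(void *ctx);	/* hypothetical retry hook */

/*
 * Sketch of a read bio completion handler that treats a media error
 * (unreadable sectors) as worth a split-and-retry, while other
 * failures are simply reported.
 */
static void my_read_end_io(struct bio *bio)
{
	if (bio->bi_status == BLK_STS_MEDIUM) {
		/* blk_status_to_errno(BLK_STS_MEDIUM) == -ENODATA */
		my_retry_work(bio->bi_private);
	} else if (bio->bi_status != BLK_STS_OK) {
		/* some other failure (transport, target, etc.) */
		pr_err("read failed: %d\n",
		       blk_status_to_errno(bio->bi_status));
	}
	bio_put(bio);
}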
Some protocols can report partial transfers, so if you trust the device,
you can learn the first unreadable sector and retry from there.
Some protocols like NVMe optionally support querying which sectors are
not readable. We're not making use of that in the kernel, but these
kinds of features exist if you need to know which LBAs to exclude from
future retries.
Outside of that, you could search for the specific unrecoverable LBAs
with split retries until you find them all, divide-and-conquer.
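
That last approach is easy to sketch from userspace against a raw
block device (assuming a 4K logical sector size and a power-of-two
range; the sizes here are only illustrative):

#include <stdio.h>
#include <unistd.h>

#define SECTOR_SIZE 4096	/* assumed logical sector size */

/*
 * Divide-and-conquer over a failed range: retry each half until the
 * failures are narrowed down to single sectors.  Good halves are
 * salvaged into 'buf' along the way.
 */
static void find_bad_sectors(int fd, off_t start, size_t len, char *buf)
{
	if (pread(fd, buf, len, start) == (ssize_t)len)
		return;				/* whole range is fine */

	if (len == SECTOR_SIZE) {		/* cannot split further */
		fprintf(stderr, "unreadable sector at offset %lld\n",
			(long long)start);
		return;
	}

	find_bad_sectors(fd, start, len / 2, buf);
	find_bad_sectors(fd, start + len / 2, len / 2, buf + len / 2);
}

For the 16KiB example above, find_bad_sectors(fd, off, 16384, buf)
would isolate the single bad 4K sector in a handful of retries instead
of discarding all four sectors.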
* Re: What would happen if the block device driver/firmware found some block of a bio is corrupted?
From: Qu Wenruo @ 2023-01-24 5:38 UTC (permalink / raw)
To: Keith Busch; +Cc: linux-block@vger.kernel.org, Linux FS Devel
On 2023/1/24 12:44, Keith Busch wrote:
> On Tue, Jan 24, 2023 at 10:31:47AM +0800, Qu Wenruo wrote:
>> I'm wondering what would happen if we submit a read bio containing multiple
>> sectors, while the block device driver/firmware has an internal checksum and
>> finds just one sector is corrupted (mismatching its internal csum)?
>>
>> For example, we submit a 16KiB read bio, and the device has a 4K
>> sector size (like most modern HDDs/SSDs).
>> The corruption happens at the 2nd sector of the 16KiB.
>>
>> My instinct says it would be one of the following:
>>
>> A) Mark the whole 16KiB bio as BLK_STS_IOERR
>> This means even though we have 3 good sectors, we have to treat them
>> all as errors.
>
> I believe BLK_STS_MEDIUM is the appropriate status for this scenario,
> not IOERR. The MEDIUM status is propagated up as the ENODATA errno.
>
> Finding the specific failed sectors makes sense if part of the
> originally requested data is useful. That's application specific, so the
> retry logic should probably be driven at a higher level than the block
> layer, based on seeing a MEDIUM error.
Thanks a lot, that indeed makes more sense.
The retry for file reads is indeed triggered inside the VFS, not the
fs/block/dm layer itself.
Thanks,
Qu
>
> Some protocols can report partial transfers, so if you trust the device,
> you can learn the first unreadable sector and retry from there.
>
> Some protocols like NVMe optionally support querying which sectors are
> not readable. We're not making use of that in the kernel, but these
> kinds of features exist if you need to know which LBAs to exclude from
> future retries.
>
> Outside of that, you could search for the specific unrecoverable LBAs
> with split retries until you find them all, divide-and-conquer.
* Re: What would happen if the block device driver/firmware found some block of a bio is corrupted?
From: Christoph Hellwig @ 2023-01-24 6:32 UTC (permalink / raw)
To: Qu Wenruo; +Cc: Keith Busch, linux-block@vger.kernel.org, Linux FS Devel
On Tue, Jan 24, 2023 at 01:38:41PM +0800, Qu Wenruo wrote:
> The retry for file reads is indeed triggered inside the VFS, not the
> fs/block/dm layer itself.
Well, it's really MM code. If ->readahead fails, we eventually fall
back to a single-page ->read_folio. That might still be more than one
sector in some cases, but at least it nicely narrows down the range.
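
Conceptually it looks something like this (a simplified sketch, not
the actual mm/filemap.c code paths):

#include <linux/fs.h>
#include <linux/pagemap.h>

/*
 * Simplified sketch, not the real filemap code: a large ->readahead
 * may fail as a whole, but the page that is actually needed then gets
 * retried with an individual ->read_folio() call, which narrows the
 * failure down to one page.
 */
static int retry_after_failed_readahead(struct file *file,
					struct folio *folio)
{
	struct address_space *mapping = folio->mapping;

	if (folio_test_uptodate(folio))
		return 0;	/* readahead got this one after all */

	/* single-folio retry; may still cover multiple sectors */
	return mapping->a_ops->read_folio(file, folio);
}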
* Re: What would happen if the block device driver/firmware found some block of a bio is corrupted?
From: Qu Wenruo @ 2023-01-24 7:52 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Keith Busch, linux-block@vger.kernel.org, Linux FS Devel
On 2023/1/24 14:32, Christoph Hellwig wrote:
> On Tue, Jan 24, 2023 at 01:38:41PM +0800, Qu Wenruo wrote:
>> The retry for file reads is indeed triggered inside the VFS, not the
>> fs/block/dm layer itself.
>
> Well, it's really MM code. If ->readahead fails, we eventually fall
> back to a single-page ->read_folio. That might still be more than one
> sector in some cases, but at least it nicely narrows down the range.
This also means that if some internal work (like btrfs scrub) is not
triggered by the MM, then we have to do the splitting all by ourselves...
Thanks,
Qu
* Re: What would happen if the block device driver/firmware found some block of a bio is corrupted?
From: Christoph Hellwig @ 2023-01-24 7:54 UTC (permalink / raw)
To: Qu Wenruo
Cc: Christoph Hellwig, Keith Busch, linux-block@vger.kernel.org,
Linux FS Devel
On Tue, Jan 24, 2023 at 03:52:38PM +0800, Qu Wenruo wrote:
> This also means that if some internal work (like btrfs scrub) is not
> triggered by the MM, then we have to do the splitting all by ourselves...
Yes.
* Re: What would happen if the block device driver/firmware found some block of a bio is corrupted?
From: Matthew Wilcox @ 2023-01-24 16:08 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Qu Wenruo, Keith Busch, linux-block@vger.kernel.org,
Linux FS Devel
On Mon, Jan 23, 2023 at 10:32:06PM -0800, Christoph Hellwig wrote:
> On Tue, Jan 24, 2023 at 01:38:41PM +0800, Qu Wenruo wrote:
> > The retry for file reads is indeed triggered inside the VFS, not the
> > fs/block/dm layer itself.
>
> Well, it's really MM code. If ->readahead fails, we eventually fall
> back to a single-page ->read_folio. That might still be more than one
> sector in some cases, but at least it nicely narrows down the range.
I had code to split a large folio into single-page folios at one point,
but I don't believe that ever got merged. At least, I can't find any
trace of it in filemap.c or iomap/buffered_io.c. So I think we just
retry the ->read_folio() call each time.