* [PATCH] block: Allow REQ_FUA|REQ_READ
@ 2025-03-11 13:35 Kent Overstreet
2025-03-11 14:53 ` Niklas Cassel
2025-03-11 15:02 ` Martin K. Petersen
0 siblings, 2 replies; 5+ messages in thread
From: Kent Overstreet @ 2025-03-11 13:35 UTC (permalink / raw)
Cc: Kent Overstreet, Jens Axboe, linux-block
REQ_FUA|REQ_READ means "do a read that bypasses the controller cache",
the same as writes.
This is useful for when the filesystem gets a checksum error, it's
possible that a bit was flipped in the controller cache, and when we
retry we want to retry the entire IO, not just from cache.
The nvme driver already passes through REQ_FUA for reads, not just
writes, so disabling the warning is sufficient to start using it, and
bcachefs is implementing additional retries for checksum errors so can
immediately use it.
Cc: Jens Axboe <axboe@kernel.dk>
Cc: linux-block@vger.kernel.org
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
---
block/blk-core.c | 19 ++++++++++---------
1 file changed, 10 insertions(+), 9 deletions(-)
diff --git a/block/blk-core.c b/block/blk-core.c
index d6c4fa3943b5..7b1103eb877d 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -793,20 +793,21 @@ void submit_bio_noacct(struct bio *bio)
goto end_io;
}
+ if (WARN_ON_ONCE((bio->bi_opf & REQ_PREFLUSH) &&
+ bio_op(bio) != REQ_OP_WRITE &&
+ bio_op(bio) != REQ_OP_ZONE_APPEND))
+ goto end_io;
+
/*
* Filter flush bio's early so that bio based drivers without flush
* support don't have to worry about them.
*/
- if (op_is_flush(bio->bi_opf)) {
- if (WARN_ON_ONCE(bio_op(bio) != REQ_OP_WRITE &&
- bio_op(bio) != REQ_OP_ZONE_APPEND))
+ if (op_is_flush(bio->bi_opf) &&
+ !bdev_write_cache(bdev)) {
+ bio->bi_opf &= ~(REQ_PREFLUSH | REQ_FUA);
+ if (!bio_sectors(bio)) {
+ status = BLK_STS_OK;
goto end_io;
- if (!bdev_write_cache(bdev)) {
- bio->bi_opf &= ~(REQ_PREFLUSH | REQ_FUA);
- if (!bio_sectors(bio)) {
- status = BLK_STS_OK;
- goto end_io;
- }
}
}
--
2.47.2
^ permalink raw reply related [flat|nested] 5+ messages in thread* Re: [PATCH] block: Allow REQ_FUA|REQ_READ
2025-03-11 13:35 [PATCH] block: Allow REQ_FUA|REQ_READ Kent Overstreet
@ 2025-03-11 14:53 ` Niklas Cassel
2025-03-11 15:09 ` Kent Overstreet
2025-03-11 15:13 ` Keith Busch
2025-03-11 15:02 ` Martin K. Petersen
1 sibling, 2 replies; 5+ messages in thread
From: Niklas Cassel @ 2025-03-11 14:53 UTC (permalink / raw)
To: Kent Overstreet; +Cc: Jens Axboe, linux-block, Damien Le Moal
Hello Kent,
On Tue, Mar 11, 2025 at 09:35:16AM -0400, Kent Overstreet wrote:
> REQ_FUA|REQ_READ means "do a read that bypasses the controller cache",
> the same as writes.
>
> This is useful for when the filesystem gets a checksum error, it's
> possible that a bit was flipped in the controller cache, and when we
> retry we want to retry the entire IO, not just from cache.
Looking at ATA Command Set - 6 (ACS-6),
7.23 READ FPDMA QUEUED - 60h
"""
If the Forced Unit Access (FUA) bit is set to one, the device shall retrieve
the data from the non-volatile media regardless of whether the device holds
the requested information in the volatile cache.
If the device holds a modified copy of the requested data as a result of
having volatile cached writes, the modified data shall be written to the
non-volatile media before being retrieved from the non-volatile media as
part of this operation.
"""
So IIUC, at least for ATA, if something is corrupted in the volatile
write cache, setting the FUA bit will ensure that the corruption will
get propagated to the non-volatile media.
Kind regards,
Niklas
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] block: Allow REQ_FUA|REQ_READ
2025-03-11 14:53 ` Niklas Cassel
@ 2025-03-11 15:09 ` Kent Overstreet
2025-03-11 15:13 ` Keith Busch
1 sibling, 0 replies; 5+ messages in thread
From: Kent Overstreet @ 2025-03-11 15:09 UTC (permalink / raw)
To: Niklas Cassel; +Cc: Jens Axboe, linux-block, Damien Le Moal
On Tue, Mar 11, 2025 at 03:53:51PM +0100, Niklas Cassel wrote:
> Hello Kent,
>
> On Tue, Mar 11, 2025 at 09:35:16AM -0400, Kent Overstreet wrote:
> > REQ_FUA|REQ_READ means "do a read that bypasses the controller cache",
> > the same as writes.
> >
> > This is useful for when the filesystem gets a checksum error, it's
> > possible that a bit was flipped in the controller cache, and when we
> > retry we want to retry the entire IO, not just from cache.
>
> Looking at ATA Command Set - 6 (ACS-6),
> 7.23 READ FPDMA QUEUED - 60h
>
> """
> If the Forced Unit Access (FUA) bit is set to one, the device shall retrieve
> the data from the non-volatile media regardless of whether the device holds
> the requested information in the volatile cache.
>
> If the device holds a modified copy of the requested data as a result of
> having volatile cached writes, the modified data shall be written to the
> non-volatile media before being retrieved from the non-volatile media as
> part of this operation.
> """
>
> So IIUC, at least for ATA, if something is corrupted in the volatile
> write cache, setting the FUA bit will ensure that the corruption will
> get propagated to the non-volatile media.
Corrupted in the volatile writeback cache is not the expected case - we
don't usually read data we've just written.
Generally, the data will be in the device's cache only because we just
read it, and we don't want to retry the read from the device cache.
It's possible that either the device's cache was faulty, or there was a
transient error in reading from flash (SSD ec algorithms are effectively
probabilistic these days), so in either case retrying with a FUA read
has a much better chance of clearing a transient error and correctly
reading the bad data.
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] block: Allow REQ_FUA|REQ_READ
2025-03-11 14:53 ` Niklas Cassel
2025-03-11 15:09 ` Kent Overstreet
@ 2025-03-11 15:13 ` Keith Busch
1 sibling, 0 replies; 5+ messages in thread
From: Keith Busch @ 2025-03-11 15:13 UTC (permalink / raw)
To: Niklas Cassel; +Cc: Kent Overstreet, Jens Axboe, linux-block, Damien Le Moal
On Tue, Mar 11, 2025 at 03:53:51PM +0100, Niklas Cassel wrote:
> So IIUC, at least for ATA, if something is corrupted in the volatile
> write cache, setting the FUA bit will ensure that the corruption will
> get propagated to the non-volatile media.
It's not necessarily going to corrupt persistent media if the volatile
write cache is corrupted. It should just apply only to data written to
the volatile write cache that hasn't been flushed to the media. If the
device reads persisted media data into a corrupted cache, the FUA here
should catch that and see good data.
But if you haven't flushed previous writes from a corrupted volatile
write cache, then I believe you're right: the read FUA will presist that
corruption. Same is true for NVMe.
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] block: Allow REQ_FUA|REQ_READ
2025-03-11 13:35 [PATCH] block: Allow REQ_FUA|REQ_READ Kent Overstreet
2025-03-11 14:53 ` Niklas Cassel
@ 2025-03-11 15:02 ` Martin K. Petersen
1 sibling, 0 replies; 5+ messages in thread
From: Martin K. Petersen @ 2025-03-11 15:02 UTC (permalink / raw)
To: Kent Overstreet; +Cc: Jens Axboe, linux-block
Kent,
> REQ_FUA|REQ_READ means "do a read that bypasses the controller cache",
> the same as writes.
FUA does not bypass anything. On the contrary, A FUA READ causes data in
the volatile cache (if any) to be flushed to non-volatile storage
(either cache or media or both). So I'm afraid it has the exact opposite
effect of what you are looking for.
--
Martin K. Petersen Oracle Linux Engineering
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2025-03-11 15:13 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-03-11 13:35 [PATCH] block: Allow REQ_FUA|REQ_READ Kent Overstreet
2025-03-11 14:53 ` Niklas Cassel
2025-03-11 15:09 ` Kent Overstreet
2025-03-11 15:13 ` Keith Busch
2025-03-11 15:02 ` Martin K. Petersen
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox