From: "Martin K. Petersen" <martin.petersen@oracle.com>
To: Kent Overstreet <kent.overstreet@linux.dev>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>,
Keith Busch <kbusch@kernel.org>,
Christoph Hellwig <hch@infradead.org>,
Jens Axboe <axboe@kernel.dk>,
linux-bcachefs@vger.kernel.org, linux-block@vger.kernel.org,
Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: [PATCH 13/14] block: Allow REQ_FUA|REQ_READ
Date: Mon, 17 Mar 2025 13:57:53 -0400
Message-ID: <yq1bjtzfyen.fsf@ca-mkp.ca.oracle.com>
In-Reply-To: <qhc7tpttpt57meqqyxrfuvvfaqg7hgrpivtwa5yxkvv22ubyia@ga3scmjr5kti> (Kent Overstreet's message of "Mon, 17 Mar 2025 11:43:46 -0400")

Kent,

>> At least for SCSI, given how FUA is usually implemented, I consider
>> it quite unlikely that two read operations back to back would somehow
>> cause different data to be transferred. Regardless of which flags you
>> use.
>
> Based on what, exactly?

Based on the fact that many devices will either blindly flush on FUA or
they'll do the equivalent of a media verify operation. In neither case
will you get different data returned. The emphasis for FUA is on media
durability, not caching.

In most implementations the cache isn't an optional memory buffer thingy
that can be sidestepped. It is the only access mechanism that exists
between the media and the host interface. Working memory, if you will. So
bypassing the device cache is not really a good way to think about it.

The purpose of FUA is to ensure durability for future reads; it is a
media management flag. As such, any effect FUA may have on the device
cache is incidental.
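
A toy model may make this concrete. Everything in the sketch below is hypothetical (class and method names included) and is not taken from any real device firmware. It models the two FUA behaviors described above, a blind flush and a media verify, with the verify variant simplified to "write back this block, then compare media to cache". In both variants, a plain read followed by a FUA read returns the same bytes; what FUA changes is what has reached the media.

```python
# Hypothetical toy model, not a real device: the cache is the only path
# between the media and the host interface.
class ToyDevice:
    def __init__(self, nblocks, fua_mode="flush"):
        self.media = [bytes(8)] * nblocks
        self.cache = {}           # lba -> data; every transfer goes through here
        self.dirty = set()        # lbas written to cache but not yet to media
        self.fua_mode = fua_mode  # "flush" or "verify" (both simplified assumptions)

    def write(self, lba, data):
        self.cache[lba] = data    # writes land in the cache first
        self.dirty.add(lba)

    def _writeback(self, lba):
        self.media[lba] = self.cache[lba]
        self.dirty.discard(lba)

    def read(self, lba, fua=False):
        if fua and self.fua_mode == "flush":
            # Variant 1: blindly flush every dirty cache line.
            for dirty_lba in sorted(self.dirty):
                self._writeback(dirty_lba)
        if lba not in self.cache:
            self.cache[lba] = self.media[lba]  # stage media data into the cache
        if fua and self.fua_mode == "verify":
            # Variant 2: make this block durable, then verify media vs. cache.
            if lba in self.dirty:
                self._writeback(lba)
            if self.media[lba] != self.cache[lba]:
                raise OSError("media verify failed")  # an error, not random data
        return self.cache[lba]    # the host only ever sees cache contents


for mode in ("flush", "verify"):
    dev = ToyDevice(4, fua_mode=mode)
    dev.write(1, b"payload!")
    plain = dev.read(1)
    forced = dev.read(1, fua=True)
    print(mode, plain == forced, dev.media[1] == b"payload!")
```

Running it prints `flush True True` and `verify True True`: the same data is returned both times, and the block is on the media after the FUA read.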
For SCSI there is a different flag to specify caching behavior. That
flag is orthogonal to FUA and did not get carried over to NVMe.
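
As an illustration only: assuming the SCSI caching flag meant here is DPO (Disable Page Out), which is an interpretation rather than something stated above, the READ(10) CDB carries DPO and FUA as separate bits in byte 1, so either can be set without the other. The NVMe Read command's CDW12 has a FUA bit (bit 30) but no DPO counterpart. The helper names are hypothetical; the field layouts follow SBC and the NVMe base specification.

```python
import struct

READ_10 = 0x28
DPO = 0x10  # READ(10) byte 1, bit 4: Disable Page Out (cache retention hint)
FUA = 0x08  # READ(10) byte 1, bit 3: Force Unit Access


def build_read10(lba, nblocks, fua=False, dpo=False):
    """Hypothetical helper: build a SCSI READ(10) CDB with independent FUA/DPO bits."""
    if not 0 <= lba <= 0xFFFFFFFF or not 0 <= nblocks <= 0xFFFF:
        raise ValueError("LBA or transfer length out of range for READ(10)")
    flags = (FUA if fua else 0) | (DPO if dpo else 0)
    # opcode, flags, LBA (32-bit big-endian), group number,
    # transfer length (16-bit big-endian), control
    return struct.pack(">BBIBHB", READ_10, flags, lba, 0, nblocks, 0)


NVME_RW_FUA = 1 << 30  # NVMe Read/Write CDW12 bit 30


def nvme_read_cdw12(nblocks, fua=False):
    """Hypothetical helper: NVMe Read CDW12 with FUA and a 0-based block count."""
    if not 1 <= nblocks <= 0x10000:
        raise ValueError("NLB out of range")
    # There is no DPO-like cache retention bit to set here.
    return (NVME_RW_FUA if fua else 0) | (nblocks - 1)
```

For example, `build_read10(0x1234, 8, fua=True).hex()` gives `28080000123400000800`.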
> We _know_ devices are not perfect, and your claim that "it's quite
> unlikely that two reads back to back would return different data"
> amounts to claiming that there are no bugs in a good chunk of the IO
> path and all that is implemented perfectly.

I'm not saying that devices are perfect or that the standards make
sense. I'm just saying that your desired behavior does not match the
reality of how a large number of these devices are actually implemented.

The specs are largely written by device vendors and therefore
deliberately ambiguous. Many of the explicit cache management bits and
bobs have been removed from SCSI or are defined as hints because device
vendors don't want the OS to interfere with how they manage resources,
including caching. I get what your objective is. I just don't think FUA
offers sufficient guarantees in that department.

Also, given the amount of hardware checking done at the device level, my
experience tells me that you are way more likely to have undetected
corruption problems on the host side than inside the storage device. In
general, storage devices implement very extensive checking on both
control and data paths. And they will return an error if there is a
mismatch (as opposed to returning random data).
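
A minimal sketch of that behavior, using CRC-32 from Python's zlib as a stand-in for whatever protection a real device uses (an assumption; actual devices rely on ECC, internal CRCs, and similar mechanisms). The class name is hypothetical. Each block is stored with a checksum, and a read that fails verification raises an error instead of handing back the corrupted bytes.

```python
import zlib


class ChecksummedStore:
    """Hypothetical store: each block is kept alongside its CRC-32."""

    def __init__(self):
        self.blocks = {}  # lba -> (data, crc)

    def write(self, lba, data):
        self.blocks[lba] = (data, zlib.crc32(data))

    def read(self, lba):
        data, crc = self.blocks[lba]
        if zlib.crc32(data) != crc:
            # Report the mismatch rather than returning bad data.
            raise OSError(f"checksum mismatch at LBA {lba}")
        return data


store = ChecksummedStore()
store.write(0, b"good data")
_, saved_crc = store.blocks[0]
store.blocks[0] = (b"g00d data", saved_crc)  # simulate corruption behind the checksum
try:
    store.read(0)
except OSError as err:
    print(err)  # checksum mismatch at LBA 0
```

CRC-32 is guaranteed to catch any error burst of 32 bits or fewer, so the two-byte corruption above is always detected.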
--
Martin K. Petersen Oracle Linux Engineering
Thread overview: 63+ messages
2025-03-11 20:15 [PATCH 00/14] better handling of checksum errors/bitrot Kent Overstreet
2025-03-11 20:15 ` [PATCH 01/14] bcachefs: Convert read path to standard error codes Kent Overstreet
2025-03-11 20:15 ` [PATCH 02/14] bcachefs: Fix BCH_ERR_data_read_csum_err_maybe_userspace in retry path Kent Overstreet
2025-03-11 20:15 ` [PATCH 03/14] bcachefs: Read error message now indicates if it was for an internal move Kent Overstreet
2025-03-11 20:15 ` [PATCH 04/14] bcachefs: BCH_ERR_data_read_buffer_too_small Kent Overstreet
2025-03-11 20:15 ` [PATCH 05/14] bcachefs: Return errors to top level bch2_rbio_retry() Kent Overstreet
2025-03-11 20:15 ` [PATCH 06/14] bcachefs: Print message on successful read retry Kent Overstreet
2025-03-11 20:15 ` [PATCH 07/14] bcachefs: Don't create bch_io_failures unless it's needed Kent Overstreet
2025-03-11 20:15 ` [PATCH 08/14] bcachefs: Checksum errors get additional retries Kent Overstreet
2025-03-11 20:15 ` [PATCH 09/14] bcachefs: __bch2_read() now takes a btree_trans Kent Overstreet
2025-03-11 20:15 ` [PATCH 10/14] bcachefs: Poison extents that can't be read due to checksum errors Kent Overstreet
2025-03-11 20:15 ` [PATCH 11/14] bcachefs: Data move can read from poisoned extents Kent Overstreet
2025-03-11 20:15 ` [PATCH 12/14] bcachefs: Debug params for data corruption injection Kent Overstreet
2025-03-11 20:15 ` [PATCH 13/14] block: Allow REQ_FUA|REQ_READ Kent Overstreet
2025-03-15 16:47 ` Jens Axboe
2025-03-15 17:01 ` Kent Overstreet
2025-03-15 17:03 ` Jens Axboe
2025-03-15 17:27 ` Kent Overstreet
2025-03-15 17:43 ` Jens Axboe
2025-03-15 18:07 ` Kent Overstreet
2025-03-15 18:32 ` Jens Axboe
2025-03-15 18:41 ` Kent Overstreet
2025-03-17 6:00 ` Christoph Hellwig
2025-03-17 12:15 ` Kent Overstreet
2025-03-17 14:13 ` Keith Busch
2025-03-17 14:49 ` Kent Overstreet
2025-03-17 15:15 ` Keith Busch
2025-03-17 15:22 ` Kent Overstreet
2025-03-17 15:30 ` Martin K. Petersen
2025-03-17 15:43 ` Kent Overstreet
2025-03-17 17:57 ` Martin K. Petersen [this message]
2025-03-17 18:21 ` Kent Overstreet
2025-03-17 19:24 ` Keith Busch
2025-03-17 19:40 ` Kent Overstreet
2025-03-17 20:39 ` Keith Busch
2025-03-17 21:13 ` Bart Van Assche
2025-03-18 1:06 ` Kent Overstreet
2025-03-18 6:16 ` Christoph Hellwig
2025-03-18 17:49 ` Bart Van Assche
2025-03-18 18:00 ` Kent Overstreet
2025-03-18 18:10 ` Keith Busch
2025-03-18 18:13 ` Kent Overstreet
2025-03-20 5:40 ` Christoph Hellwig
2025-03-20 10:28 ` Kent Overstreet
2025-03-18 0:27 ` Kent Overstreet
2025-03-18 6:11 ` Christoph Hellwig
2025-03-18 21:33 ` Kent Overstreet
2025-03-17 17:32 ` Keith Busch
2025-03-18 6:19 ` Christoph Hellwig
2025-03-18 6:01 ` Christoph Hellwig
2025-03-11 20:15 ` [PATCH 14/14] bcachefs: Read retries are after checksum errors now REQ_FUA Kent Overstreet
2025-03-17 20:55 ` [PATCH 00/14] better handling of checksum errors/bitrot John Stoffel
2025-03-17 21:12 ` errors compiling bcachefs-tools v1.20.0 on debian 12 John Stoffel
2025-03-17 21:48 ` Malte Schröder
2025-03-17 23:10 ` John Stoffel
2025-03-18 21:04 ` John Stoffel
2025-03-18 21:32 ` Malte Schröder
2025-03-19 14:16 ` John Stoffel
2025-03-24 15:25 ` Krzysztof Hajdamowicz
2025-03-26 13:45 ` John Stoffel
2025-03-18 1:15 ` [PATCH 00/14] better handling of checksum errors/bitrot Kent Overstreet
2025-03-18 14:47 ` John Stoffel
2025-03-20 17:15 ` Kent Overstreet