From: Keith Busch <kbusch@kernel.org>
To: Kent Overstreet <kent.overstreet@linux.dev>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>,
Christoph Hellwig <hch@infradead.org>,
Jens Axboe <axboe@kernel.dk>,
linux-bcachefs@vger.kernel.org, linux-block@vger.kernel.org,
Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: [PATCH 13/14] block: Allow REQ_FUA|REQ_READ
Date: Mon, 17 Mar 2025 14:39:07 -0600
Message-ID: <Z9iIa770ySTgKgp0@kbusch-mbp.dhcp.thefacebook.com>
In-Reply-To: <ysvt4npanz4qfoxm5cv627cq2ommq6rxpka6pkvl3l2crcatmc@ic7tn5tt7aw4>
On Mon, Mar 17, 2025 at 03:40:10PM -0400, Kent Overstreet wrote:
> On Mon, Mar 17, 2025 at 01:24:23PM -0600, Keith Busch wrote:
> > This is a bit dangerous to assume. I don't find text anywhere in any
> > NVMe specification (also checked T10 SBC) saying anything similar
> > to "bypass" in relation to the cache for FUA reads. I am reasonably
> > confident some vendors, especially ones developing active-active
> > controllers, will fight you to their win on the spec committee for
> > this if you want to take it up in those forums.
>
> "Read will come direct from media" reads pretty clear to me.
>
> But even if it's not supported the way we want, I'm not seeing anything
> dangerous about using it this way. Worst case, our retries aren't as
> good as we want them to be, and it'll be an item to work on in the
> future.
I don't think you're appreciating the complications that active-active
and multi-host bring to the scenario. Those are why this is not the
forum to solve it. The specs need to be clear on the guarantees, and
what they currently guarantee might provide some overlap with what
you're seeking in specific scenarios, but I really think (and I believe
Martin agrees) your use is outside its targeted use case.
> As long as drives aren't barfing when we give them a read fua (and so
> far they haven't when running this code), we're fine for now.
In this specific regard, I think it's safe to assume the devices will
remain operational.
> > > If devices don't support the behaviour we want today, then nudging the
> > > drive manufacturers to support it is infinitely saner than getting a
> > > whole nother bit plumbed through the NVME standard, especially given
> > > that the letter of the spec does describe exactly what we want.
> >
> > In my experience, the storage standards committees are more aligned to
> > accommodate appliance vendors than anything Linux specific. Your desired
> > read behavior would almost certainly be a new TPAR in NVMe to get spec
> > defined behavior. It's not impossible, but I'll just say it is an uphill
> > battle and the end result may or may not look like what you have in
> > mind.
>
> I'm not so sure.
>
> If there are users out there depending on a different meaning of read
> fua, then yes, absolutely (and it sounds like Martin might have been
> alluding to that - but why wouldn't the write have been done fua? I'd
> want to hear more about that)
As I mentioned, Read FUA provides an optimization opportunity that
can be used instead of Write + Flush or Write FUA when the host isn't
sure about the persistence needs at the time of the initial Write: it
can be used as a checkpoint on a specific block range that you may have
written and overwritten. This kind of "read" command provides a
well-defined persistence barrier. Thinking of Read FUA as a barrier is
better aligned with how the standards and device makers intended it to
be used.
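To make the barrier reading concrete, here is a minimal toy sketch of it.
Everything in it is an assumption made for illustration: ToyDrive and its
methods are hypothetical names, and the model is not taken from any NVMe
or SCSI specification or from real firmware. It only shows a Read FUA that
persists a dirty cached range before completing, while the data itself may
still be returned from the cache.

```python
class ToyDrive:
    """Hypothetical toy model of a drive with a volatile write cache."""

    def __init__(self):
        self.media = {}  # persistent storage: lba -> data
        self.cache = {}  # volatile cache: lba -> (data, dirty)

    def write(self, lba, data, fua=False):
        if fua:
            # Write FUA: data reaches media before the command completes.
            self.media[lba] = data
            self.cache[lba] = (data, False)
        else:
            # Plain write: may complete from the volatile cache alone.
            self.cache[lba] = (data, True)

    def read(self, lba, fua=False):
        if fua and lba in self.cache:
            data, dirty = self.cache[lba]
            if dirty:
                # Barrier interpretation: persist the dirty range first.
                self.media[lba] = data
                self.cache[lba] = (data, False)
        if lba in self.cache:
            # The data may still be served from cache, not from media.
            return self.cache[lba][0]
        return self.media.get(lba)

    def power_loss(self):
        # Anything held only in the volatile cache is lost.
        self.cache.clear()


drive = ToyDrive()
drive.write(7, b"v1")
drive.write(7, b"v2")      # overwrite, still only in the cache
drive.read(7, fua=True)    # checkpoint: the range is now on media
drive.power_loss()
assert drive.read(7) == b"v2"
```

In this model the Read FUA guarantees persistence of the range, but it
does not force the returned data to come from media, which is the gap
between the barrier reading and a "direct from media" reading.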
> If, OTOH, this is just something that hasn't come up before - the
> language in the spec is already there, so once code is out there with
> enough users and a demonstrated use case then it might be a pretty
> simple nudge - "shoot down this range of the cache, don't just flush it"
> is a pretty simple code change, as far as these things go.
So you're telling me you've never written SSD firmware then waited for
the manufacturer to release it to your users? Yes, I jest, and maybe
YMMV; but relying on that process is putting your destiny in the wrong
hands.