linux-block.vger.kernel.org archive mirror
* What should we do about the nvme atomics mess?
@ 2025-07-07 14:18 Christoph Hellwig
  2025-07-07 14:24 ` Keith Busch
                   ` (4 more replies)
  0 siblings, 5 replies; 25+ messages in thread
From: Christoph Hellwig @ 2025-07-07 14:18 UTC (permalink / raw)
  To: Alan Adamson, John Garry, Keith Busch, Martin K. Petersen,
	Jens Axboe
  Cc: linux-nvme, linux-block

Hi all,

I'm a bit lost on what to do about the sad state of NVMe atomic writes.

As a short reminder the main issues are:

 1) there is no flag on a command to request atomic (aka non-torn)
    behavior, instead writes adhering to the atomicity requirements will
    never be torn, and writes not adhering to them can be torn any time.
    This differs from SCSI where atomic writes have to be explicitly
    requested and fail when they can't be satisfied
 2) the original way to indicate the main atomicity limit is the AWUPF
    field, which is in Identify Controller, but specified in logical
    blocks which only exist at a namespace layer.  This a) led to
    various problems because the limit is a mess when namespaces have
    different logical block sizes, and it b) also causes additional
    issues because NVMe allows it to be different for different
    controllers in the same subsystem.
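
To make issues 1 and 2a concrete, here is a minimal sketch.  It is not
kernel code, and the helper names are made up for illustration.  It
relies on AWUPF being a 0's based count of logical blocks, and it
ignores any atomic boundary rules:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * AWUPF is a 0's based count of logical blocks, so the limit in bytes
 * depends on the logical block size of whichever namespace it is
 * applied to (issue 2a).
 */
static uint64_t awupf_to_bytes(uint16_t awupf, uint32_t lba_size)
{
	return ((uint64_t)awupf + 1) * lba_size;
}

/*
 * Issue 1: nothing in the command requests atomicity.  A write that
 * fits the limit is never torn, anything larger may be torn.
 * Simplified: only the length is checked here.
 */
static bool write_is_implicitly_atomic(uint64_t len_bytes, uint16_t awupf,
				       uint32_t lba_size)
{
	return len_bytes <= awupf_to_bytes(awupf, lba_size);
}
```

With the same raw AWUPF of 7, a namespace with 512-byte blocks gets a
4096-byte limit while one with 4096-byte blocks gets 32768 bytes, so a
16 KiB write is covered on one namespace and not on the other.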

Commit 8695f060a029 added some sanity checks to deal with issue 2b,
but we kept running into more issues with it.  Partially because
the check wasn't quite correct, but also because we've gotten
reports of controllers that change the AWUPF value when reformatting
namespaces to deal with issue 2a.
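
For illustration only, a check for issue 2b could look roughly like the
sketch below.  This is not the code from commit 8695f060a029; the
structure and function names are hypothetical:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-controller view, not a kernel structure. */
struct ctrl_info {
	uint16_t awupf;	/* raw 0's based AWUPF from Identify Controller */
};

/*
 * Returns true if every controller in the subsystem reports the same
 * raw AWUPF value as the first one.
 */
static bool subsys_awupf_consistent(const struct ctrl_info *ctrls, size_t nr)
{
	for (size_t i = 1; i < nr; i++) {
		if (ctrls[i].awupf != ctrls[0].awupf)
			return false;
	}
	return true;
}
```

A comparison like this against a previously recorded value would also
trip on a single controller that changes its AWUPF after a namespace
reformat, which is the kind of report mentioned above.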

And I'm a bit lost on what to do here.

We could:

 I.	 revert the check and the subsequent fixup.  If you really want
         to use the nvme atomics you'd better pray a lot anyway
	 due to issue 1)
 II.	 limit the check to multi-controller subsystems
 III.	 don't allow atomics on controllers that only report AWUPF and
 	 limit support to controllers that support the more sanely
	 defined NAWUPF
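
As a rough sketch, the three options can be written as one policy
switch.  Everything here is hypothetical (names, structure layout), and
it assumes a failed check simply means no atomic write limit is
advertised, which is only one possible reaction:

```c
#include <stdbool.h>
#include <stdint.h>

enum awupf_policy {
	POLICY_NO_CHECK,		/* option I */
	POLICY_CHECK_MULTI_CTRL,	/* option II */
	POLICY_NAWUPF_ONLY,		/* option III */
};

/* Hypothetical summary of what a namespace and its controller report. */
struct ns_atomic_info {
	bool has_nawupf;	/* namespace reports its own NAWUPF */
	uint16_t nawupf;	/* 0's based, in logical blocks */
	uint16_t awupf;		/* 0's based, from Identify Controller */
	uint32_t lba_size;	/* logical block size in bytes */
	bool multi_ctrl;	/* subsystem has more than one controller */
	bool awupf_consistent;	/* all controllers agree on AWUPF */
};

/* Returns the atomic write limit in bytes, or 0 for "none advertised". */
static uint64_t atomic_limit_bytes(enum awupf_policy policy,
				   const struct ns_atomic_info *ns)
{
	/* The per-namespace value is used whenever it is available. */
	if (ns->has_nawupf)
		return ((uint64_t)ns->nawupf + 1) * ns->lba_size;

	switch (policy) {
	case POLICY_NAWUPF_ONLY:
		return 0;
	case POLICY_CHECK_MULTI_CTRL:
		if (ns->multi_ctrl && !ns->awupf_consistent)
			return 0;
		break;
	case POLICY_NO_CHECK:
		break;
	}
	/* Fall back to the controller-wide AWUPF. */
	return ((uint64_t)ns->awupf + 1) * ns->lba_size;
}
```

Under option III a controller that only reports AWUPF ends up with no
atomic limit at all, while options I and II keep trusting AWUPF, either
always or whenever the controllers in the subsystem agree.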

I guess for 6.16 we are limited to I. to bring us back to the previous
state, but I have a really bad gut feeling about it given the really
bad spec language and a lot of low quality NVMe implementations we're
seeing these days.


end of thread, other threads:[~2025-12-09  8:26 UTC | newest]

Thread overview: 25+ messages
2025-07-07 14:18 What should we do about the nvme atomics mess? Christoph Hellwig
2025-07-07 14:24 ` Keith Busch
2025-07-07 15:26   ` Hannes Reinecke
2025-07-07 15:56     ` Keith Busch
2025-07-07 23:35       ` Chaitanya Kulkarni
2025-07-08  9:47       ` Christoph Hellwig
2025-07-08 15:19         ` Keith Busch
2025-07-08  1:27 ` Ming Lei
2025-07-08  2:27   ` Keith Busch
2025-07-08  2:46     ` Ming Lei
2025-07-08  2:56       ` Keith Busch
2025-07-08  3:17         ` Ming Lei
2025-07-08  9:38 ` Niklas Cassel
2025-07-08  9:48   ` Christoph Hellwig
2025-07-08 10:08 ` John Garry
2025-07-09  7:51 ` Nilay Shroff
2025-07-09 21:28   ` Keith Busch
2025-07-10  5:07     ` Nilay Shroff
2025-07-10  7:17       ` Christoph Hellwig
2025-10-20 13:42       ` John Garry
2025-10-21 15:02         ` Nilay Shroff
2025-10-22  8:50           ` John Garry
2025-10-22 15:24             ` Nilay Shroff
2025-12-08 12:11               ` Nilay Shroff
2025-12-09  8:26                 ` John Garry
