From: Niklas Cassel <cassel@kernel.org>
To: Christoph Hellwig <hch@lst.de>
Cc: Alan Adamson <alan.adamson@oracle.com>,
John Garry <john.g.garry@oracle.com>,
Keith Busch <kbusch@kernel.org>,
"Martin K. Petersen" <martin.petersen@oracle.com>,
Jens Axboe <axboe@kernel.dk>,
linux-nvme@lists.infradead.org, linux-block@vger.kernel.org
Subject: Re: What should we do about the nvme atomics mess?
Date: Tue, 8 Jul 2025 11:38:09 +0200
Message-ID: <aGznAUVxSl5_Xa3E@ryzen>
In-Reply-To: <20250707141834.GA30198@lst.de>
On Mon, Jul 07, 2025 at 04:18:34PM +0200, Christoph Hellwig wrote:
> Hi all,
>
> I'm a bit lost on what to do about the sad state of NVMe atomic writes.
>
> As a short reminder the main issues are:
>
> 1) there is no flag on a command to request atomic (aka non-torn)
> behavior, instead writes adhering to the atomicity requirements will
> never be torn, and writes not adhering to them can be torn any time.
> This differs from SCSI where atomic writes have to be explicitly
> requested and fail when they can't be satisfied
> 2) the original way to indicate the main atomicity limit is the AWUPF
> field, which is in Identify Controller, but specified in logical
> blocks which only exist at a namespace layer. This a) led to
> various problems because the limit is a mess when namespaces have
> different logical block sizes, and it b) also causes additional
> issues because NVMe allows it to be different for different
> controllers in the same subsystem.
>
> Commit 8695f060a029 added some sanity checks to deal with issue 2b,
> but we kept running into more issues with it. Partially because
> the check wasn't quite correct, but also because we've gotten
> reports of controllers that change the AWUPF value when reformatting
> namespaces to deal with issue 2a.
>
> And I'm a bit lost on what to do here.
>
> We could:
>
> I. revert the check and the subsequent fixup. If you really want
> to use the nvme atomics you already better pray a lot anyway
> due to issue 1)
> II. limit the check to multi-controller subsystems
> III. don't allow atomics on controllers that only report AWUPF and
> limit support to controllers that support the more sanely
> defined NAWUPF
I like III.
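
To spell out what III would mean in practice, here is a minimal sketch, not
the actual driver logic. The function name and the Python form are made up
for illustration. It assumes that NSFEAT bit 1 (NSABP) marks NAWUPF as
defined, that NAWUPF is a 0's based count of logical blocks, and that a
NAWUPF of 0 defers to AWUPF (that last point is my understanding and should
be checked against the spec).

```python
from typing import Optional

# Identify Namespace NSFEAT bit 1 (NSABP): NAWUN/NAWUPF/NACWU are defined.
NSFEAT_NSABP = 1 << 1


def atomic_write_unit_bytes(nsfeat: int, nawupf: int, lba_size: int) -> Optional[int]:
    """Hypothetical helper: power-fail atomic write size in bytes under option III.

    Returns None when the namespace does not report its own NAWUPF, i.e. when
    only the controller-wide AWUPF would be available.
    """
    if not (nsfeat & NSFEAT_NSABP):
        return None
    if nawupf == 0:
        # Assumption: 0 means "same as AWUPF", which option III does not trust.
        return None
    # NAWUPF is 0's based, so a raw value of 7 means 8 logical blocks.
    return (nawupf + 1) * lba_size


print(atomic_write_unit_bytes(NSFEAT_NSABP, 7, 512))
print(atomic_write_unit_bytes(0, 7, 512))
```

The point of the sketch is that the limit is computed per namespace, from
that namespace's own block size, so issues 2a and 2b do not arise.
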
But NVMe should probably push to deprecate AWUPF and introduce a new field
that is like AWUPF but specified in a fixed unit, e.g. bytes or
CAP.MPSMIN. (I'm thinking of e.g. Zone Append Size Limit (ZASL), which is also
a per-controller limit, but whose value is specified in units of CAP.MPSMIN,
just like the Maximum Data Transfer Size (MDTS).)
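
A rough illustration of the difference in units, as a sketch only. The
second function models a hypothetical replacement field encoded like
MDTS/ZASL (a power of two in units of the minimum page size,
2^(12 + CAP.MPSMIN) bytes). No such atomic-write field exists today, and
both function names are invented here.

```python
from typing import Optional


def awupf_bytes(awupf: int, lba_size: int) -> int:
    """AWUPF is a 0's based count of logical blocks, so bytes depend on the LBA size."""
    return (awupf + 1) * lba_size


def mpsmin_unit_bytes(exponent: int, mpsmin: int) -> Optional[int]:
    """MDTS/ZASL-style encoding: 2**exponent units of the minimum page size.

    A raw value of 0 has a special meaning for MDTS and ZASL, so it is not
    converted here.
    """
    if exponent == 0:
        return None
    min_page_size = 1 << (12 + mpsmin)
    return (1 << exponent) * min_page_size


# The same controller-level AWUPF value means different byte limits per namespace.
for lba_size in (512, 4096):
    print(lba_size, awupf_bytes(7, lba_size))

# A CAP.MPSMIN-based value is independent of any namespace's block size.
print(mpsmin_unit_bytes(2, 0))
```

With AWUPF = 7, a 512-byte-LBA namespace gets 4096 bytes and a 4096-byte-LBA
namespace gets 32768 bytes, whereas the CAP.MPSMIN-based value is the same
for every namespace behind the controller.
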
Kind regards,
Niklas