From: Ming Lei <ming.lei@redhat.com>
To: Keith Busch <kbusch@kernel.org>
Cc: Christoph Hellwig <hch@lst.de>,
Alan Adamson <alan.adamson@oracle.com>,
John Garry <john.g.garry@oracle.com>,
"Martin K. Petersen" <martin.petersen@oracle.com>,
Jens Axboe <axboe@kernel.dk>,
linux-nvme@lists.infradead.org, linux-block@vger.kernel.org
Subject: Re: What should we do about the nvme atomics mess?
Date: Tue, 8 Jul 2025 11:17:54 +0800
Message-ID: <aGyN4pOWyclgV6-H@fedora>
In-Reply-To: <aGyI-sl68Y0klsJn@kbusch-mbp>
On Mon, Jul 07, 2025 at 08:56:58PM -0600, Keith Busch wrote:
> On Tue, Jul 08, 2025 at 10:46:06AM +0800, Ming Lei wrote:
> > On Mon, Jul 07, 2025 at 08:27:43PM -0600, Keith Busch wrote:
> > > On Tue, Jul 08, 2025 at 09:27:06AM +0800, Ming Lei wrote:
> > > > On Mon, Jul 07, 2025 at 04:18:34PM +0200, Christoph Hellwig wrote:
> > > > > Hi all,
> > > > >
> > > > > I'm a bit lost on what to do about the sad state of NVMe atomic writes.
> > > > >
> > > > > As a short reminder the main issues are:
> > > > >
> > > > > 1) there is no flag on a command to request atomic (aka non-torn)
> > > > > behavior, instead writes adhering to the atomicity requirements will
> > > > > never be torn, and writes not adhering to them can be torn any time.
> > > > > This differs from SCSI where atomic writes have to be explicitly
> > > > > requested and fail when they can't be satisfied
> > > > > 2) the original way to indicate the main atomicity limit is the AWUPF
> > > > > field, which is in Identify Controller, but specified in logical
> > > > > blocks which only exist at a namespace layer. This a) lead to
> > > >
> > > > If controller-wide AWUPF is a must property, the length has to be aligned
> > > > with block size.
> > >
> > > What block size? The controller doesn't have one. Block sizes are
> >
> > It should be any NS format's block size.
>
> That requires an artificial reduction to a meaningless value.
Any value has to be 'block size' aligned.
>
> > > properties of namespaces, not controllers or subsystems. If you have 10
> > > namespaces with 10 different block formats, what does AWUPF mean? If the
> > > controller must report something, the only rational thing it could
> > > declare is reduced to the greatest common denominator, which is out of
> > > sync with the true value reported in the appropriately scoped NAWUPF
> > > value.
> >
> > Yes, please see the words I quoted from NVMe spec, also `6.4 Atomic Operations`
> > mentioned: `NAWUPF >= AWUPF`.
>
> The problem is when Namespace X changes its format that then alters
> Namespace Y's reported atomic size. That's unacceptable for any
> filesystem utilizing this feature.
When X changes its format, the FS has to be unmounted.
The actual length (in bytes) of the atomic write does not change for Y;
only the unit (block size) changes, at least according to Yi's report.
Thanks,
Ming
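
To make the two positions above concrete, here is a minimal Python sketch.
It is not taken from the thread or from any driver: the 16 KiB device
capability, the helper names, and the policy of reporting the smallest
block count across all namespaces are assumptions for illustration only.
The one spec fact it relies on is that AWUPF and NAWUPF are 0's based
counts of logical blocks.

```python
# Hypothetical model; every name and number here is an illustrative assumption.

def to_zero_based_blocks(atomic_bytes, block_size):
    """Encode a byte length as a 0's based logical-block count."""
    return atomic_bytes // block_size - 1


def to_bytes(zero_based_blocks, block_size):
    """Decode a 0's based logical-block count back into bytes."""
    return (zero_based_blocks + 1) * block_size


def controller_awupf(atomic_bytes, ns_block_sizes):
    """Assumed policy: report the smallest block count over all namespaces."""
    return min(to_zero_based_blocks(atomic_bytes, bs) for bs in ns_block_sizes)


ATOMIC_BYTES = 16 * 1024  # assumed true power-fail atomic capability

# Namespace X uses 4096-byte blocks, namespace Y uses 512-byte blocks.
before = controller_awupf(ATOMIC_BYTES, [4096, 512])
# X is reformatted to 512-byte blocks; Y is left untouched.
after = controller_awupf(ATOMIC_BYTES, [512, 512])

# Y's own namespace-scoped value does not depend on X's format.
y_nawupf = to_zero_based_blocks(ATOMIC_BYTES, 512)

print("Y limit via AWUPF, before X reformat:", to_bytes(before, 512))
print("Y limit via AWUPF, after X reformat: ", to_bytes(after, 512))
print("Y limit via NAWUPF (either case):    ", to_bytes(y_nawupf, 512))
```

Under these assumptions, the limit a host derives for Y from the
controller-wide AWUPF moves from 2048 to 16384 bytes when X is reformatted,
while the value derived from Y's own NAWUPF stays at 16384 bytes throughout.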