public inbox for linux-nvme@lists.infradead.org
 help / color / mirror / Atom feed
* [PATCH 0/2] NVMe Atomic Write fixes
@ 2025-04-30 17:18 Alan Adamson
  2025-04-30 17:18 ` [PATCH 1/2] nvme: multipath: atomic queue limits need to be inherited Alan Adamson
  2025-04-30 17:18 ` [PATCH 2/2] nvme: all namespaces in a subsystem must adhere to a common atomic write size Alan Adamson
  0 siblings, 2 replies; 8+ messages in thread
From: Alan Adamson @ 2025-04-30 17:18 UTC (permalink / raw)
  To: linux-nvme; +Cc: alan.adamson, kch, kbusch, hch, sagi

This patch set includes 2 fixes for NVMe Atomic Writes that are reproducable
when CMIC.MCTRS (multi-controller support) is set:

    nvme: multipath: atomic queue limits need to be inherited for multipathing
    nvme: all namespaces in a subsystem must adhere to a common atomic write size

QEMU v10.0 can be used to reproduce and validate the fixes.


nvme: multipath: atomic queue limits need to be inherited for multipathing
--------------------------------------------------------------------------
- Include BLK_FEAT_ATOMIC_WRITES feature when allocating multipath disk (nvme_mpath_alloc_disk).


QEMU Config
===========
-device nvme,id=nvme-ctrl-0,serial=nvme-1,atomic.dn=off,atomic.awun=31,atomic.awupf=15 \
    -drive file=/dev/nullb2,if=none,id=nvm-1 \
    -device nvme-ns,drive=nvm-1,bus=nvme-ctrl-0,nsid=1 \
-device nvme,id=nvme-ctrl-1,serial=nvme-2,atomic.dn=off,atomic.awun=31,atomic.awupf=7 \
    -drive file=/dev/nullb3,if=none,id=nvm-2 \
    -device nvme-ns,drive=nvm-2,bus=nvme-ctrl-1,nsid=2 \
-device nvme,id=nvme-ctrl-2,serial=nvme-3,atomic.dn=off,atomic.awun=127,atomic.awupf=63 \
    -drive file=/dev/nullb4,if=none,id=nvm-3 \
    -device nvme-ns,drive=nvm-3,bus=nvme-ctrl-2 \


Before
======
[root@localhost ~]# nvme id-ctrl /dev/nvme1n1 | grep cmic
cmic      : 0x2
[root@localhost ~]# nvme id-ctrl /dev/nvme1n1 | grep awupf
awupf     : 63
[root@localhost ~]# nvme id-ns /dev/nvme1n1 | grep nawupf
nawupf  : 0
[root@localhost ~]# cat /sys/block/nvme1n1/queue/atomic_write_max_bytes
0
[root@localhost ~]# 

AFTER
=====
[root@localhost ~]# nvme id-ctrl /dev/nvme1n1 | grep cmic
cmic      : 0x2
[root@localhost ~]# nvme id-ctrl /dev/nvme1n1 | grep awupf
awupf     : 63
[root@localhost ~]# nvme id-ns /dev/nvme1n1 | grep nawupf
nawupf  : 0
[root@localhost ~]#  cat /sys/block/nvme1n1/queue/atomic_write_max_bytes
32768
[root@localhost ~]# 






nvme: all namespaces in a subsystem must adhere to a common atomic write size
------------------------------------------------------------------------------
- Replace awupf field in nvme_subsystem struct with atomic_bs.  The atomic_bs 
value with awupf or nawupf. (nvme_update_disk_info)
- When a namespace is added, the atomic write size (from awupf or nawupf) is
checked with subsys->atomic_bs.  If they don't match, the atomic write size is
set to 512 (1U << head->lba_shift) and a message in logged. (nvme_update_disk_info)


QEMU Config
===========
-device nvme-subsys,id=subsys0 \
    -device nvme,serial=deadbeef,id=nvme0,subsys=subsys0,atomic.dn=off,atomic.awun=31,atomic.awupf=15 \
        -drive id=ns1,file=/dev/nullb1,if=none \
        -device nvme-ns,drive=ns1,bus=nvme0,nsid=1,zoned=false,shared=false \
    -device nvme,serial=deadbeef,id=nvme1,subsys=subsys0,atomic.dn=off,atomic.awun=63,atomic.awupf=31 \
        -drive id=ns2,file=/dev/nullb2,if=none \
        -device nvme-ns,drive=ns2,bus=nvme1,nsid=2,zoned=false,shared=false \


BEFORE
======
[root@localhost ~]# nvme id-ctrl /dev/nvme0n1 | grep cmic
cmic      : 0x2
[root@localhost ~]#  nvme id-ctrl /dev/nvme0n1 | grep awupf
awupf     : 15
[root@localhost ~]# nvme id-ns /dev/nvme0n1 | grep nawupf
nawupf  : 0
[root@localhost ~]# cat /sys/block/nvme0n1/queue/atomic_write_max_bytes
8192
[root@localhost ~]# nvme id-ctrl /dev/nvme0n2 | grep cmic
cmic      : 0x2
[root@localhost ~]# nvme id-ctrl /dev/nvme0n2 | grep awupf
awupf     : 31
[root@localhost ~]# nvme id-ns /dev/nvme0n2 | grep nawupf
nawupf  : 0
[root@localhost ~]# cat /sys/block/nvme0n2/queue/atomic_write_max_bytes
8192
[root@localhost ~]# 

AFTER
=====
[root@localhost ~]# nvme id-ctrl /dev/nvme0n1 | grep cmic
cmic      : 0x2
[root@localhost ~]# nvme id-ctrl /dev/nvme0n1 | grep awupf
awupf     : 15
[root@localhost ~]# nvme id-ns /dev/nvme0n1 | grep nawupf
nawupf  : 0
[root@localhost ~]# cat /sys/block/nvme0n1/queue/atomic_write_max_bytes
8192
[root@localhost ~]#  nvme id-ctrl /dev/nvme0n2 | grep cmic
cmic      : 0x2
[root@localhost ~]#  nvme id-ctrl /dev/nvme0n2 | grep awupf
awupf     : 31
[root@localhost ~]# nvme id-ns /dev/nvme0n2 | grep nawupf
nawupf  : 0
[root@localhost ~]# cat /sys/block/nvme0n2/queue/atomic_write_max_bytes
512

Console Message:
[    2.852814] nvme0c1n2: Inconsistent Atomic Write Size: Subsystem=8192 bytes, Controller/Namespace=16384 bytes




Alan Adamson (2):
  nvme: multipath: atomic queue limits need to be inherited for
    multipathing
  nvme: all namespaces in a subsystem must adhere to a common atomic
    write size

 drivers/nvme/host/core.c       | 20 +++++++++++++++++---
 drivers/nvme/host/multipath.c  |  3 ++-
 drivers/nvme/host/nvme.h       |  3 ++-
 drivers/ras/amd/atl/internal.h |  3 +++
 drivers/ras/amd/atl/umc.c      | 19 +++++++++++++++++--
 drivers/ras/amd/fmpm.c         |  9 ++++++++-
 6 files changed, 49 insertions(+), 8 deletions(-)

-- 
2.43.5



^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH 1/2] nvme: multipath: atomic queue limits need to be inherited
  2025-04-30 17:18 [PATCH 0/2] NVMe Atomic Write fixes Alan Adamson
@ 2025-04-30 17:18 ` Alan Adamson
  2025-05-01  5:47   ` John Garry
  2025-04-30 17:18 ` [PATCH 2/2] nvme: all namespaces in a subsystem must adhere to a common atomic write size Alan Adamson
  1 sibling, 1 reply; 8+ messages in thread
From: Alan Adamson @ 2025-04-30 17:18 UTC (permalink / raw)
  To: linux-nvme; +Cc: alan.adamson, kch, kbusch, hch, sagi

When a controller is attached that has the CMIC.MCTRS bit set, it indicates
the subsystem supports multiple controllers and it is possible a namespace
can be shared between those multiple controllers in a multipathed
configuration.

When a namespace of a CMIC.MCTRS enabled subsystem is allocated, a
multipath node is created.  The queue limits for this node are inherited
from the namespace being allocated. When inheriting queue limits, the
features being inherited need to be specified. The atomic write feature
(BLK_FEAT_ATOMIC_WRITES) was not specified so the atomic queue limits
were not inherited by the multipath disk node which resulted in the sysfs
atomic write attributes being zeroed. The fix is to include
BLK_FEAT_ATOMIC_WRITES in the list of features to be inherited.

Signed-off-by: Alan Adamson <alan.adamson@oracle.com>
Signed-off-by: John Garry <john.g.garry@oracle.com>
---
 drivers/nvme/host/multipath.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index 250f3da67cc9..cf0ef4745564 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -638,7 +638,8 @@ int nvme_mpath_alloc_disk(struct nvme_ctrl *ctrl, struct nvme_ns_head *head)
 
 	blk_set_stacking_limits(&lim);
 	lim.dma_alignment = 3;
-	lim.features |= BLK_FEAT_IO_STAT | BLK_FEAT_NOWAIT | BLK_FEAT_POLL;
+	lim.features |= BLK_FEAT_IO_STAT | BLK_FEAT_NOWAIT |
+		BLK_FEAT_POLL | BLK_FEAT_ATOMIC_WRITES;
 	if (head->ids.csi == NVME_CSI_ZNS)
 		lim.features |= BLK_FEAT_ZONED;
 
-- 
2.43.5



^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH 2/2] nvme: all namespaces in a subsystem must adhere to a common atomic write size
  2025-04-30 17:18 [PATCH 0/2] NVMe Atomic Write fixes Alan Adamson
  2025-04-30 17:18 ` [PATCH 1/2] nvme: multipath: atomic queue limits need to be inherited Alan Adamson
@ 2025-04-30 17:18 ` Alan Adamson
  2025-05-02  6:52   ` Christoph Hellwig
  1 sibling, 1 reply; 8+ messages in thread
From: Alan Adamson @ 2025-04-30 17:18 UTC (permalink / raw)
  To: linux-nvme; +Cc: alan.adamson, kch, kbusch, hch, sagi

The first namespace that is configured in a subsystem will define the
subsystem's atomic write size.  This will be based on either AWUPF or
NAWUPF. Configuring subsequent namespaces in the subsystem requires its
atomic write size as defined by its AWUPF or NAWUPF to match the
subsystem's atomic write size. If a namespace doesn't adhere to the
subsystem's atomic write size, its atomic queue limits will be based on an
atomic write size of 512 bytes and an error message will be logged.

Signed-off-by: Alan Adamson <alan.adamson@oracle.com>
---
 drivers/nvme/host/core.c | 20 +++++++++++++++++---
 drivers/nvme/host/nvme.h |  3 ++-
 2 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index eb6ea8acb3cc..e0fe53b919a8 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -2059,7 +2058,21 @@ static bool nvme_update_disk_info(struct nvme_ns *ns, struct nvme_id_ns *id,
 		if (id->nsfeat & NVME_NS_FEAT_ATOMICS && id->nawupf)
 			atomic_bs = (1 + le16_to_cpu(id->nawupf)) * bs;
 		else
-			atomic_bs = (1 + ns->ctrl->subsys->awupf) * bs;
+			atomic_bs = (1 + ns->ctrl->awupf) * bs;
+
+		/*
+		 * Set subsystem atomic bs.
+		 */
+		if (ns->ctrl->subsys->atomic_bs) {
+			if (atomic_bs != ns->ctrl->subsys->atomic_bs) {
+				pr_err_ratelimited("%s: Inconsistent Atomic Write Size: Subsystem=%d bytes, Controller/Namespace=%d bytes\n",
+					ns->disk ? ns->disk->disk_name : "?",
+					ns->ctrl->subsys->atomic_bs,
+					atomic_bs);
+				atomic_bs = bs;
+			}
+		} else
+			ns->ctrl->subsys->atomic_bs = atomic_bs;
 
 		nvme_update_atomic_write_disk_info(ns, id, lim, bs, atomic_bs);
 	}
@@ -3031,7 +3045,6 @@ static int nvme_init_subsystem(struct nvme_ctrl *ctrl, struct nvme_id_ctrl *id)
 		kfree(subsys);
 		return -EINVAL;
 	}
-	subsys->awupf = le16_to_cpu(id->awupf);
 	nvme_mpath_default_iopolicy(subsys);
 
 	subsys->dev.class = &nvme_subsys_class;
@@ -3441,7 +3454,7 @@ static int nvme_init_identify(struct nvme_ctrl *ctrl)
 		dev_pm_qos_expose_latency_tolerance(ctrl->device);
 	else if (!ctrl->apst_enabled && prev_apst_enabled)
 		dev_pm_qos_hide_latency_tolerance(ctrl->device);
-
+	ctrl->awupf = le16_to_cpu(id->awupf);
 out_free:
 	kfree(id);
 	return ret;
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 51e078642127..ff1d94468613 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -410,6 +410,7 @@ struct nvme_ctrl {
 
 	enum nvme_ctrl_type cntrltype;
 	enum nvme_dctype dctype;
+	u16 awupf;
 };
 
 static inline enum nvme_ctrl_state nvme_ctrl_state(struct nvme_ctrl *ctrl)
@@ -442,11 +443,11 @@ struct nvme_subsystem {
 	u8			cmic;
 	enum nvme_subsys_type	subtype;
 	u16			vendor_id;
-	u16			awupf;	/* 0's based awupf value. */
 	struct ida		ns_ida;
 #ifdef CONFIG_NVME_MULTIPATH
 	enum nvme_iopolicy	iopolicy;
 #endif
+	u32			atomic_bs;
 };
 
 /*
-- 
2.43.5



^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/2] nvme: multipath: atomic queue limits need to be inherited
  2025-04-30 17:18 ` [PATCH 1/2] nvme: multipath: atomic queue limits need to be inherited Alan Adamson
@ 2025-05-01  5:47   ` John Garry
  2025-05-01  6:06     ` John Garry
  0 siblings, 1 reply; 8+ messages in thread
From: John Garry @ 2025-05-01  5:47 UTC (permalink / raw)
  To: Alan Adamson, linux-nvme; +Cc: kch, kbusch, hch, sagi

On 30/04/2025 18:18, Alan Adamson wrote:

Thanks for doing this.

> When a controller is attached that has the CMIC.MCTRS bit set, it indicates
> the subsystem supports multiple controllers and it is possible a namespace
> can be shared between those multiple controllers in a multipathed
> configuration.
> 
> When a namespace of a CMIC.MCTRS enabled subsystem is allocated, a
> multipath node is created.  The queue limits for this node are inherited
> from the namespace being allocated. When inheriting queue limits, the
> features being inherited need to be specified. The atomic write feature
> (BLK_FEAT_ATOMIC_WRITES) was not specified so the atomic queue limits
> were not inherited by the multipath disk node which resulted in the sysfs
> atomic write attributes being zeroed. The fix is to include
> BLK_FEAT_ATOMIC_WRITES in the list of features to be inherited.

nit: is this really being inherited? It seems to be just be explicitly 
enabled.

> 
> Signed-off-by: Alan Adamson <alan.adamson@oracle.com>
> Signed-off-by: John Garry <john.g.garry@oracle.com>

Please add the following instead Signed-off-by:
Reviewed-by: John Garry <john.g.garry@oracle.com>

> ---
>   drivers/nvme/host/multipath.c | 3 ++-
>   1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
> index 250f3da67cc9..cf0ef4745564 100644
> --- a/drivers/nvme/host/multipath.c
> +++ b/drivers/nvme/host/multipath.c
> @@ -638,7 +638,8 @@ int nvme_mpath_alloc_disk(struct nvme_ctrl *ctrl, struct nvme_ns_head *head)
>   
>   	blk_set_stacking_limits(&lim);
>   	lim.dma_alignment = 3;
> -	lim.features |= BLK_FEAT_IO_STAT | BLK_FEAT_NOWAIT | BLK_FEAT_POLL;
> +	lim.features |= BLK_FEAT_IO_STAT | BLK_FEAT_NOWAIT |
> +		BLK_FEAT_POLL | BLK_FEAT_ATOMIC_WRITES;
>   	if (head->ids.csi == NVME_CSI_ZNS)
>   		lim.features |= BLK_FEAT_ZONED;
>   



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/2] nvme: multipath: atomic queue limits need to be inherited
  2025-05-01  5:47   ` John Garry
@ 2025-05-01  6:06     ` John Garry
  0 siblings, 0 replies; 8+ messages in thread
From: John Garry @ 2025-05-01  6:06 UTC (permalink / raw)
  To: Alan Adamson, linux-nvme; +Cc: kch, kbusch, hch, sagi

On 01/05/2025 06:47, John Garry wrote:
> When a controller is attached that has the CMIC.MCTRS bit set, it indicates
> the subsystem supports multiple controllers and it is possible a namespace
> can be shared between those multiple controllers in a multipathed
> configuration.
> 
> When a namespace of a CMIC.MCTRS enabled subsystem is allocated, a
> multipath node is created.  The queue limits for this node are inherited
> from the namespace being allocated. When inheriting queue limits, the
> features being inherited need to be specified. The atomic write feature
> (BLK_FEAT_ATOMIC_WRITES) was not specified so the atomic queue limits
> were not inherited by the multipath disk node which resulted in the sysfs
> atomic write attributes being zeroed. The fix is to include
> BLK_FEAT_ATOMIC_WRITES in the list of features to be inherited.

BTW, I think that it is worth repeating that latest QEMU has CMIC 
enabled for MP, so we can't use atomic writes without this change. But I 
doubt whether QEMU should enable this always.


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 2/2] nvme: all namespaces in a subsystem must adhere to a common atomic write size
  2025-04-30 17:18 ` [PATCH 2/2] nvme: all namespaces in a subsystem must adhere to a common atomic write size Alan Adamson
@ 2025-05-02  6:52   ` Christoph Hellwig
  2025-05-06 15:57     ` alan.adamson
  0 siblings, 1 reply; 8+ messages in thread
From: Christoph Hellwig @ 2025-05-02  6:52 UTC (permalink / raw)
  To: Alan Adamson; +Cc: linux-nvme, kch, kbusch, hch, sagi

On Wed, Apr 30, 2025 at 10:18:30AM -0700, Alan Adamson wrote:
> The first namespace that is configured in a subsystem will define the
> subsystem's atomic write size.  This will be based on either AWUPF or
> NAWUPF. Configuring subsequent namespaces in the subsystem requires its
> atomic write size as defined by its AWUPF or NAWUPF to match the
> subsystem's atomic write size. If a namespace doesn't adhere to the
> subsystem's atomic write size, its atomic queue limits will be based on an
> atomic write size of 512 bytes and an error message will be logged.

I don't think an error message is enough, we need to reject the probe
for this buggy configuration to prevent data integrity issues.

Otherwise the series looks good to me.



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 2/2] nvme: all namespaces in a subsystem must adhere to a common atomic write size
  2025-05-02  6:52   ` Christoph Hellwig
@ 2025-05-06 15:57     ` alan.adamson
  2025-05-07  6:46       ` Christoph Hellwig
  0 siblings, 1 reply; 8+ messages in thread
From: alan.adamson @ 2025-05-06 15:57 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-nvme, kch, kbusch, sagi


On 5/1/25 11:52 PM, Christoph Hellwig wrote:
> On Wed, Apr 30, 2025 at 10:18:30AM -0700, Alan Adamson wrote:
>> The first namespace that is configured in a subsystem will define the
>> subsystem's atomic write size.  This will be based on either AWUPF or
>> NAWUPF. Configuring subsequent namespaces in the subsystem requires its
>> atomic write size as defined by its AWUPF or NAWUPF to match the
>> subsystem's atomic write size. If a namespace doesn't adhere to the
>> subsystem's atomic write size, its atomic queue limits will be based on an
>> atomic write size of 512 bytes and an error message will be logged.

Along with the error message, we are setting atomic_write_max_bytes to 
512b which will at least allow the device to still be used.

Alan

> I don't think an error message is enough, we need to reject the probe
> for this buggy configuration to prevent data integrity issues.
>
> Otherwise the series looks good to me.
>


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 2/2] nvme: all namespaces in a subsystem must adhere to a common atomic write size
  2025-05-06 15:57     ` alan.adamson
@ 2025-05-07  6:46       ` Christoph Hellwig
  0 siblings, 0 replies; 8+ messages in thread
From: Christoph Hellwig @ 2025-05-07  6:46 UTC (permalink / raw)
  To: alan.adamson; +Cc: Christoph Hellwig, linux-nvme, kch, kbusch, sagi

On Tue, May 06, 2025 at 08:57:35AM -0700, alan.adamson@oracle.com wrote:
>
> On 5/1/25 11:52 PM, Christoph Hellwig wrote:
>> On Wed, Apr 30, 2025 at 10:18:30AM -0700, Alan Adamson wrote:
>>> The first namespace that is configured in a subsystem will define the
>>> subsystem's atomic write size.  This will be based on either AWUPF or
>>> NAWUPF. Configuring subsequent namespaces in the subsystem requires its
>>> atomic write size as defined by its AWUPF or NAWUPF to match the
>>> subsystem's atomic write size. If a namespace doesn't adhere to the
>>> subsystem's atomic write size, its atomic queue limits will be based on an
>>> atomic write size of 512 bytes and an error message will be logged.
>
> Along with the error message, we are setting atomic_write_max_bytes to 512b 
> which will at least allow the device to still be used.

But that means you change the limit when later adding a new path for
a namespace, and file systems do not expect it to change underneath
them.  This is especially fatal because nvme will not even fail
writes not fitting into the atomic boundary.

So we can't do anything but reject adding the new controller if the
limit is smaller than the existing limit.



^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2025-05-07  6:51 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-04-30 17:18 [PATCH 0/2] NVMe Atomic Write fixes Alan Adamson
2025-04-30 17:18 ` [PATCH 1/2] nvme: multipath: atomic queue limits need to be inherited Alan Adamson
2025-05-01  5:47   ` John Garry
2025-05-01  6:06     ` John Garry
2025-04-30 17:18 ` [PATCH 2/2] nvme: all namespaces in a subsystem must adhere to a common atomic write size Alan Adamson
2025-05-02  6:52   ` Christoph Hellwig
2025-05-06 15:57     ` alan.adamson
2025-05-07  6:46       ` Christoph Hellwig

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox