Linux-NVME Archive on lore.kernel.org
* [PATCH V3 0/8] nvme: Refactor and expose per-controller timeout configuration
@ 2026-04-10  7:39 Maurizio Lombardi
  2026-04-10  7:39 ` [PATCH V3 1/8] nvme: Let the blocklayer set timeouts for requests Maurizio Lombardi
                   ` (8 more replies)
  0 siblings, 9 replies; 38+ messages in thread
From: Maurizio Lombardi @ 2026-04-10  7:39 UTC (permalink / raw)
  To: kbusch
  Cc: mheyne, emilne, jmeneghi, linux-nvme, dwagner, mlombard,
	mkhalfella, chaitanyak, hare, hch

This patchset tries to address some limitations in how the NVMe driver handles
command timeouts.
Currently, the driver relies heavily on global module parameters
(NVME_IO_TIMEOUT and NVME_ADMIN_TIMEOUT), making it difficult for users to
tune timeouts for specific controllers that may have very different
characteristics. Also, in some cases, manual changes to sysfs timeout values
are ignored by the driver logic.

For example, this patchset removes the unconditional timeout assignment in
nvme_init_request. This allows the block layer to correctly apply the request
queue's timeout settings, ensuring that user-initiated changes via sysfs
are actually respected for all requests.

It introduces new sysfs attributes (admin_timeout and io_timeout) to the NVMe
controller. This allows users to configure distinct timeout requirements for
different controllers rather than relying on global module parameters.

Some examples:

Changes to the controller's io_timeout are propagated to all
the associated namespaces' queues:

# find /sys -name 'io_timeout' 
/sys/devices/virtual/nvme-fabrics/ctl/nvme0/nvme0c0n1/queue/io_timeout
/sys/devices/virtual/nvme-fabrics/ctl/nvme0/nvme0c0n2/queue/io_timeout
/sys/devices/virtual/nvme-fabrics/ctl/nvme0/nvme0c0n3/queue/io_timeout
/sys/devices/virtual/nvme-fabrics/ctl/nvme0/io_timeout

# echo 27000 > /sys/devices/virtual/nvme-fabrics/ctl/nvme0/io_timeout
# cat /sys/devices/virtual/nvme-fabrics/ctl/nvme0/nvme0c0n1/queue/io_timeout
27000
# cat /sys/devices/virtual/nvme-fabrics/ctl/nvme0/nvme0c0n2/queue/io_timeout
27000
# cat /sys/devices/virtual/nvme-fabrics/ctl/nvme0/nvme0c0n3/queue/io_timeout
27000

When adding a namespace on the target side, the io_timeout is inherited from
the controller's configured timeout:

* target side *
# nvmetcli
/> cd subsystems/test-nqn/namespaces/4 
/subsystems/t.../namespaces/4> enable
The Namespace has been enabled.

************

* Host-side *
nvme nvme0: rescanning namespaces.
# find /sys -name 'io_timeout' 
/sys/devices/virtual/nvme-fabrics/ctl/nvme0/nvme0c0n1/queue/io_timeout
/sys/devices/virtual/nvme-fabrics/ctl/nvme0/nvme0c0n2/queue/io_timeout
/sys/devices/virtual/nvme-fabrics/ctl/nvme0/nvme0c0n3/queue/io_timeout
/sys/devices/virtual/nvme-fabrics/ctl/nvme0/nvme0c0n4/queue/io_timeout <-- new namespace
/sys/devices/virtual/nvme-fabrics/ctl/nvme0/io_timeout

# cat /sys/devices/virtual/nvme-fabrics/ctl/nvme0/nvme0c0n4/queue/io_timeout
27000
***********

io_timeout and admin_timeout module parameters are used as default
values for new controllers:

# nvme connect -t tcp -a 10.37.153.138 -s 8000 -n test-nqn2
connecting to device: nvme1

# cat /sys/devices/virtual/nvme-fabrics/ctl/nvme1/nvme1c1n1/queue/io_timeout
30000
# cat /sys/devices/virtual/nvme-fabrics/ctl/nvme1/admin_timeout
60000

V3: - Rebase on top of nvme 7.1 branch
    - add an admin_timeout variable to nvme_ctrl structure
    - Wait until the controller has reached the LIVE state
      for the first time before allowing the user to modify
      the timeouts, this prevents the dereferencing of the admin_q,
      fabrics_q and connect_q queues before their initialization.
    - move blk_put_queue(fabrics_q) to nvme_free_ctrl() to align it to
      admin_q teardown
    - modify nvmet-loop to avoid deleting and re-creating the admin_q queue
      when the controller enters the resetting state.
    - add a warning if nvme_alloc_admin_tag_set() is called twice against
      the same controller

V2: - Drop the RFC tag
    - apply the timeout settings to fabrics_q and connect_q too
    - Code style fixes
    - remove unnecessary check for null admin_q in __nvme_delete_io_queues()
    - Use DEVICE_ATTR() macro

Heyne, Maximilian (1):
  nvme: Let the blocklayer set timeouts for requests

Maurizio Lombardi (7):
  nvme: add sysfs attribute to change admin timeout per nvme controller
  nvme: pci: use admin queue timeout over NVME_ADMIN_TIMEOUT
  nvme: add sysfs attribute to change IO timeout per nvme controller
  nvme: use per controller timeout waits over depending on global
    default
  nvme-core: align fabrics_q teardown with admin_q in nvme_free_ctrl
  nvmet-loop: do not alloc admin tag set during reset
  nvme-core: warn on allocating admin tag set with existing queue

 drivers/nvme/host/apple.c  |  2 +-
 drivers/nvme/host/core.c   | 23 +++++-----
 drivers/nvme/host/nvme.h   |  4 +-
 drivers/nvme/host/pci.c    |  4 +-
 drivers/nvme/host/rdma.c   |  2 +-
 drivers/nvme/host/sysfs.c  | 89 ++++++++++++++++++++++++++++++++++++++
 drivers/nvme/host/tcp.c    |  2 +-
 drivers/nvme/target/loop.c | 38 ++++++++--------
 8 files changed, 128 insertions(+), 36 deletions(-)

-- 
2.53.0



^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH V3 1/8] nvme: Let the blocklayer set timeouts for requests
  2026-04-10  7:39 [PATCH V3 0/8] nvme: Refactor and expose per-controller timeout configuration Maurizio Lombardi
@ 2026-04-10  7:39 ` Maurizio Lombardi
  2026-04-22  9:58   ` Daniel Wagner
  2026-04-24 13:38   ` Christoph Hellwig
  2026-04-10  7:39 ` [PATCH V3 2/8] nvme: add sysfs attribute to change admin timeout per nvme controller Maurizio Lombardi
                   ` (7 subsequent siblings)
  8 siblings, 2 replies; 38+ messages in thread
From: Maurizio Lombardi @ 2026-04-10  7:39 UTC (permalink / raw)
  To: kbusch
  Cc: mheyne, emilne, jmeneghi, linux-nvme, dwagner, mlombard,
	mkhalfella, chaitanyak, hare, hch

From: "Heyne, Maximilian" <mheyne@amazon.de>

When initializing an nvme request which is about to be sent to the block
layer, we do not need to initialize its timeout. If it's left
uninitialized at 0, the block layer will use the request queue's timeout
in blk_add_timer (via nvme_start_request, which is called from
nvme_*_queue_rq). These timeouts are set to either NVME_IO_TIMEOUT or
NVME_ADMIN_TIMEOUT when the request queues are created.

Because the io_timeout of the IO queues can be modified via sysfs, the
following situation can occur:

1) NVME_IO_TIMEOUT = 30 (default module parameter)
2) nvme1n1 is probed. IO queues default timeout is 30 s
3) manually change the IO timeout to 90 s
   echo 90000 > /sys/class/nvme/nvme1/nvme1n1/queue/io_timeout
4) Any call of __nvme_submit_sync_cmd on nvme1n1 to an IO queue will issue
   commands with the 30 s timeout instead of the wanted 90 s which might
   be more suitable for this device.

Commit 470e900c8036 ("nvme: refactor nvme_alloc_request") already
silently changed the behavior for ioctls because it unconditionally
overrides the request's timeout that was set in nvme_init_request. If it
was left unset by the user of the ioctl, it will be overridden with 0,
meaning the block layer will pick the request queue's IO timeout.

Following up on that, this patch further improves the consistency of IO
timeout usage. However, there are still uses of NVME_IO_TIMEOUT which
could be inconsistent with what is set in the device's request_queue by
the user.
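
For reference, the block layer fallback this patch relies on lives in
blk_add_timer() (block/blk-timeout.c); roughly:

	/* no per-request timeout set: fall back to the queue's timeout */
	if (!req->timeout)
		req->timeout = q->rq_timeout;

so a request left at 0 picks up whatever the queue's (sysfs-tunable)
timeout currently is.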

Reviewed-by: Mohamed Khalfella <mkhalfella@purestorage.com>
Signed-off-by: Maximilian Heyne <mheyne@amazon.de>
---
 drivers/nvme/host/core.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index b42d8768d297..111f9a226952 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -724,10 +724,8 @@ void nvme_init_request(struct request *req, struct nvme_command *cmd)
 		struct nvme_ns *ns = req->q->disk->private_data;
 
 		logging_enabled = ns->head->passthru_err_log_enabled;
-		req->timeout = NVME_IO_TIMEOUT;
 	} else { /* no queuedata implies admin queue */
 		logging_enabled = nr->ctrl->passthru_err_log_enabled;
-		req->timeout = NVME_ADMIN_TIMEOUT;
 	}
 
 	if (!logging_enabled)
-- 
2.53.0



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH V3 2/8] nvme: add sysfs attribute to change admin timeout per nvme controller
  2026-04-10  7:39 [PATCH V3 0/8] nvme: Refactor and expose per-controller timeout configuration Maurizio Lombardi
  2026-04-10  7:39 ` [PATCH V3 1/8] nvme: Let the blocklayer set timeouts for requests Maurizio Lombardi
@ 2026-04-10  7:39 ` Maurizio Lombardi
  2026-04-22 10:10   ` Daniel Wagner
  2026-04-10  7:39 ` [PATCH V3 3/8] nvme: pci: use admin queue timeout over NVME_ADMIN_TIMEOUT Maurizio Lombardi
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 38+ messages in thread
From: Maurizio Lombardi @ 2026-04-10  7:39 UTC (permalink / raw)
  To: kbusch
  Cc: mheyne, emilne, jmeneghi, linux-nvme, dwagner, mlombard,
	mkhalfella, chaitanyak, hare, hch

Currently, there is no way to adjust the timeout values of NVMe
admin queues on a per-controller basis.
Add an admin_timeout attribute so that nvme controllers with
different timeout requirements can have custom admin timeouts set.
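
A usage sketch (controller name and value are illustrative; the value
is in milliseconds):

# echo 30000 > /sys/class/nvme/nvme0/admin_timeout
# cat /sys/class/nvme/nvme0/admin_timeout
30000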

Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
---
 drivers/nvme/host/core.c  |  1 +
 drivers/nvme/host/nvme.h  |  1 +
 drivers/nvme/host/sysfs.c | 42 +++++++++++++++++++++++++++++++++++++++
 3 files changed, 44 insertions(+)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 111f9a226952..6dc3d273623c 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -5135,6 +5135,7 @@ int nvme_init_ctrl(struct nvme_ctrl *ctrl, struct device *dev,
 	memset(&ctrl->ka_cmd, 0, sizeof(ctrl->ka_cmd));
 	ctrl->ka_cmd.common.opcode = nvme_admin_keep_alive;
 	ctrl->ka_last_check_time = jiffies;
+	ctrl->admin_timeout = NVME_ADMIN_TIMEOUT;
 
 	BUILD_BUG_ON(NVME_DSM_MAX_RANGES * sizeof(struct nvme_dsm_range) >
 			PAGE_SIZE);
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index ccd5e05dac98..9da3ebebe9c8 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -370,6 +370,7 @@ struct nvme_ctrl {
 	u16 mtfa;
 	u32 ctrl_config;
 	u32 queue_count;
+	u32 admin_timeout;
 
 	u64 cap;
 	u32 max_hw_sectors;
diff --git a/drivers/nvme/host/sysfs.c b/drivers/nvme/host/sysfs.c
index 7bf2e972126b..000c29bd1f1f 100644
--- a/drivers/nvme/host/sysfs.c
+++ b/drivers/nvme/host/sysfs.c
@@ -623,6 +623,47 @@ static ssize_t quirks_show(struct device *dev, struct device_attribute *attr,
 }
 static DEVICE_ATTR_RO(quirks);
 
+static ssize_t nvme_admin_timeout_show(struct device *dev,
+			struct device_attribute *attr, char *buf)
+{
+	struct nvme_ctrl *ctrl = dev_get_drvdata(dev);
+
+	return sysfs_emit(buf, "%u\n",
+				jiffies_to_msecs(ctrl->admin_timeout));
+}
+
+static ssize_t nvme_admin_timeout_store(struct device *dev,
+			struct device_attribute *attr,
+			const char *buf, size_t count)
+{
+	struct nvme_ctrl *ctrl = dev_get_drvdata(dev);
+	u32 timeout;
+	int err;
+
+	/*
+	 * Wait until the controller reaches the LIVE state
+	 * to be sure that admin_q and fabrics_q are
+	 * properly initialized.
+	 */
+	if (!test_bit(NVME_CTRL_STARTED_ONCE, &ctrl->flags))
+		return -EBUSY;
+
+	err = kstrtou32(buf, 10, &timeout);
+	if (err || !timeout)
+		return -EINVAL;
+
+	ctrl->admin_timeout = msecs_to_jiffies(timeout);
+
+	blk_queue_rq_timeout(ctrl->admin_q, ctrl->admin_timeout);
+	if (ctrl->fabrics_q)
+		blk_queue_rq_timeout(ctrl->fabrics_q, ctrl->admin_timeout);
+
+	return count;
+}
+
+static DEVICE_ATTR(admin_timeout, S_IRUGO | S_IWUSR,
+	nvme_admin_timeout_show, nvme_admin_timeout_store);
+
 #ifdef CONFIG_NVME_HOST_AUTH
 static ssize_t nvme_ctrl_dhchap_secret_show(struct device *dev,
 		struct device_attribute *attr, char *buf)
@@ -765,6 +806,7 @@ static struct attribute *nvme_dev_attrs[] = {
 	&dev_attr_cntrltype.attr,
 	&dev_attr_dctype.attr,
 	&dev_attr_quirks.attr,
+	&dev_attr_admin_timeout.attr,
 #ifdef CONFIG_NVME_HOST_AUTH
 	&dev_attr_dhchap_secret.attr,
 	&dev_attr_dhchap_ctrl_secret.attr,
-- 
2.53.0



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH V3 3/8] nvme: pci: use admin queue timeout over NVME_ADMIN_TIMEOUT
  2026-04-10  7:39 [PATCH V3 0/8] nvme: Refactor and expose per-controller timeout configuration Maurizio Lombardi
  2026-04-10  7:39 ` [PATCH V3 1/8] nvme: Let the blocklayer set timeouts for requests Maurizio Lombardi
  2026-04-10  7:39 ` [PATCH V3 2/8] nvme: add sysfs attribute to change admin timeout per nvme controller Maurizio Lombardi
@ 2026-04-10  7:39 ` Maurizio Lombardi
  2026-04-27  9:05   ` Daniel Wagner
  2026-04-10  7:39 ` [PATCH V3 4/8] nvme: add sysfs attribute to change IO timeout per nvme controller Maurizio Lombardi
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 38+ messages in thread
From: Maurizio Lombardi @ 2026-04-10  7:39 UTC (permalink / raw)
  To: kbusch
  Cc: mheyne, emilne, jmeneghi, linux-nvme, dwagner, mlombard,
	mkhalfella, chaitanyak, hare, hch

While tearing down its queues, nvme-pci uses NVME_ADMIN_TIMEOUT as its
timeout target. Instead, use the configured admin queue's timeout value
to match the device's existing timeout setting.

Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
---
 drivers/nvme/host/pci.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 9aa19255b041..379c75c6d35a 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -3091,7 +3091,7 @@ static bool __nvme_delete_io_queues(struct nvme_dev *dev, u8 opcode)
 	unsigned long timeout;
 
  retry:
-	timeout = NVME_ADMIN_TIMEOUT;
+	timeout = dev->ctrl.admin_timeout;
 	while (nr_queues > 0) {
 		if (nvme_delete_queue(&dev->queues[nr_queues], opcode))
 			break;
-- 
2.53.0



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH V3 4/8] nvme: add sysfs attribute to change IO timeout per nvme controller
  2026-04-10  7:39 [PATCH V3 0/8] nvme: Refactor and expose per-controller timeout configuration Maurizio Lombardi
                   ` (2 preceding siblings ...)
  2026-04-10  7:39 ` [PATCH V3 3/8] nvme: pci: use admin queue timeout over NVME_ADMIN_TIMEOUT Maurizio Lombardi
@ 2026-04-10  7:39 ` Maurizio Lombardi
  2026-04-24 13:40   ` Christoph Hellwig
                     ` (2 more replies)
  2026-04-10  7:39 ` [PATCH V3 5/8] nvme: use per controller timeout waits over depending on global default Maurizio Lombardi
                   ` (4 subsequent siblings)
  8 siblings, 3 replies; 38+ messages in thread
From: Maurizio Lombardi @ 2026-04-10  7:39 UTC (permalink / raw)
  To: kbusch
  Cc: mheyne, emilne, jmeneghi, linux-nvme, dwagner, mlombard,
	mkhalfella, chaitanyak, hare, hch

Currently, there is no way to adjust the timeout values of NVMe
I/O queues on a per-controller basis.
Add an io_timeout attribute so that nvme controllers with different
timeout requirements can have custom I/O timeouts set.
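
The controller-level attribute complements the existing per-namespace
block layer knob; a sketch with illustrative names and values:

# echo 27000 > /sys/class/nvme/nvme1/io_timeout                (all namespaces)
# echo 90000 > /sys/class/nvme/nvme1/nvme1n1/queue/io_timeout  (one namespace)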

Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
---
 drivers/nvme/host/core.c  |  2 ++
 drivers/nvme/host/nvme.h  |  1 +
 drivers/nvme/host/sysfs.c | 47 +++++++++++++++++++++++++++++++++++++++
 3 files changed, 50 insertions(+)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 6dc3d273623c..1b676fd4d454 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -4197,6 +4197,7 @@ static void nvme_alloc_ns(struct nvme_ctrl *ctrl, struct nvme_ns_info *info)
 		mutex_unlock(&ctrl->namespaces_lock);
 		goto out_unlink_ns;
 	}
+	blk_queue_rq_timeout(ns->queue, ctrl->io_timeout);
 	nvme_ns_add_to_ctrl_list(ns);
 	mutex_unlock(&ctrl->namespaces_lock);
 	synchronize_srcu(&ctrl->srcu);
@@ -5136,6 +5137,7 @@ int nvme_init_ctrl(struct nvme_ctrl *ctrl, struct device *dev,
 	ctrl->ka_cmd.common.opcode = nvme_admin_keep_alive;
 	ctrl->ka_last_check_time = jiffies;
 	ctrl->admin_timeout = NVME_ADMIN_TIMEOUT;
+	ctrl->io_timeout = NVME_IO_TIMEOUT;
 
 	BUILD_BUG_ON(NVME_DSM_MAX_RANGES * sizeof(struct nvme_dsm_range) >
 			PAGE_SIZE);
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 9da3ebebe9c8..a6d998c2e0e5 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -371,6 +371,7 @@ struct nvme_ctrl {
 	u32 ctrl_config;
 	u32 queue_count;
 	u32 admin_timeout;
+	u32 io_timeout;
 
 	u64 cap;
 	u32 max_hw_sectors;
diff --git a/drivers/nvme/host/sysfs.c b/drivers/nvme/host/sysfs.c
index 000c29bd1f1f..1e6a0eecb30e 100644
--- a/drivers/nvme/host/sysfs.c
+++ b/drivers/nvme/host/sysfs.c
@@ -664,6 +664,52 @@ static ssize_t nvme_admin_timeout_store(struct device *dev,
 static DEVICE_ATTR(admin_timeout, S_IRUGO | S_IWUSR,
 	nvme_admin_timeout_show, nvme_admin_timeout_store);
 
+static ssize_t nvme_io_timeout_show(struct device *dev,
+			struct device_attribute *attr, char *buf)
+{
+	struct nvme_ctrl *ctrl = dev_get_drvdata(dev);
+
+	return sysfs_emit(buf, "%u\n", jiffies_to_msecs(ctrl->io_timeout));
+}
+
+static ssize_t nvme_io_timeout_store(struct device *dev,
+			struct device_attribute *attr,
+			const char *buf, size_t count)
+{
+	struct nvme_ctrl *ctrl = dev_get_drvdata(dev);
+	struct nvme_ns *ns;
+	u32 timeout;
+	int err;
+
+	/*
+	 * Wait until the controller reaches the LIVE state
+	 * to be sure that connect_q is properly initialized.
+	 */
+	if (!test_bit(NVME_CTRL_STARTED_ONCE, &ctrl->flags))
+		return -EBUSY;
+
+	err = kstrtou32(buf, 10, &timeout);
+	if (err || !timeout)
+		return -EINVAL;
+
+	/* Take the namespaces_lock to avoid racing against nvme_alloc_ns() */
+	mutex_lock(&ctrl->namespaces_lock);
+
+	ctrl->io_timeout = msecs_to_jiffies(timeout);
+	list_for_each_entry(ns, &ctrl->namespaces, list)
+		blk_queue_rq_timeout(ns->queue, ctrl->io_timeout);
+
+	if (ctrl->connect_q)
+		blk_queue_rq_timeout(ctrl->connect_q, ctrl->io_timeout);
+
+	mutex_unlock(&ctrl->namespaces_lock);
+
+	return count;
+}
+
+static DEVICE_ATTR(io_timeout, S_IRUGO | S_IWUSR,
+	nvme_io_timeout_show, nvme_io_timeout_store);
+
 #ifdef CONFIG_NVME_HOST_AUTH
 static ssize_t nvme_ctrl_dhchap_secret_show(struct device *dev,
 		struct device_attribute *attr, char *buf)
@@ -807,6 +853,7 @@ static struct attribute *nvme_dev_attrs[] = {
 	&dev_attr_dctype.attr,
 	&dev_attr_quirks.attr,
 	&dev_attr_admin_timeout.attr,
+	&dev_attr_io_timeout.attr,
 #ifdef CONFIG_NVME_HOST_AUTH
 	&dev_attr_dhchap_secret.attr,
 	&dev_attr_dhchap_ctrl_secret.attr,
-- 
2.53.0



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH V3 5/8] nvme: use per controller timeout waits over depending on global default
  2026-04-10  7:39 [PATCH V3 0/8] nvme: Refactor and expose per-controller timeout configuration Maurizio Lombardi
                   ` (3 preceding siblings ...)
  2026-04-10  7:39 ` [PATCH V3 4/8] nvme: add sysfs attribute to change IO timeout per nvme controller Maurizio Lombardi
@ 2026-04-10  7:39 ` Maurizio Lombardi
  2026-04-27  9:22   ` Daniel Wagner
  2026-04-10  7:39 ` [PATCH V3 6/8] nvme-core: align fabrics_q teardown with admin_q in nvme_free_ctrl Maurizio Lombardi
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 38+ messages in thread
From: Maurizio Lombardi @ 2026-04-10  7:39 UTC (permalink / raw)
  To: kbusch
  Cc: mheyne, emilne, jmeneghi, linux-nvme, dwagner, mlombard,
	mkhalfella, chaitanyak, hare, hch

Instead of passing NVME_IO_TIMEOUT as a parameter with every call to
nvme_wait_freeze_timeout, use the controller's preferred timeout.

Reviewed-by: Mohamed Khalfella <mkhalfella@purestorage.com>
Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
---
 drivers/nvme/host/apple.c | 2 +-
 drivers/nvme/host/core.c  | 5 +++--
 drivers/nvme/host/nvme.h  | 2 +-
 drivers/nvme/host/pci.c   | 2 +-
 drivers/nvme/host/rdma.c  | 2 +-
 drivers/nvme/host/tcp.c   | 2 +-
 6 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/drivers/nvme/host/apple.c b/drivers/nvme/host/apple.c
index ed61b97fde59..1958f39484d9 100644
--- a/drivers/nvme/host/apple.c
+++ b/drivers/nvme/host/apple.c
@@ -858,7 +858,7 @@ static void apple_nvme_disable(struct apple_nvme *anv, bool shutdown)
 	 * doing a safe shutdown.
 	 */
 	if (!dead && shutdown && freeze)
-		nvme_wait_freeze_timeout(&anv->ctrl, NVME_IO_TIMEOUT);
+		nvme_wait_freeze_timeout(&anv->ctrl);
 
 	nvme_quiesce_io_queues(&anv->ctrl);
 
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 1b676fd4d454..f801d5475cd6 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -5244,8 +5244,9 @@ void nvme_unfreeze(struct nvme_ctrl *ctrl)
 }
 EXPORT_SYMBOL_GPL(nvme_unfreeze);
 
-int nvme_wait_freeze_timeout(struct nvme_ctrl *ctrl, long timeout)
+int nvme_wait_freeze_timeout(struct nvme_ctrl *ctrl)
 {
+	unsigned long timeout = ctrl->io_timeout;
 	struct nvme_ns *ns;
 	int srcu_idx;
 
@@ -5253,7 +5254,7 @@ int nvme_wait_freeze_timeout(struct nvme_ctrl *ctrl, long timeout)
 	list_for_each_entry_srcu(ns, &ctrl->namespaces, list,
 				 srcu_read_lock_held(&ctrl->srcu)) {
 		timeout = blk_mq_freeze_queue_wait_timeout(ns->queue, timeout);
-		if (timeout <= 0)
+		if (!timeout)
 			break;
 	}
 	srcu_read_unlock(&ctrl->srcu, srcu_idx);
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index a6d998c2e0e5..9ccaed0b9dbf 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -902,7 +902,7 @@ void nvme_sync_queues(struct nvme_ctrl *ctrl);
 void nvme_sync_io_queues(struct nvme_ctrl *ctrl);
 void nvme_unfreeze(struct nvme_ctrl *ctrl);
 void nvme_wait_freeze(struct nvme_ctrl *ctrl);
-int nvme_wait_freeze_timeout(struct nvme_ctrl *ctrl, long timeout);
+int nvme_wait_freeze_timeout(struct nvme_ctrl *ctrl);
 void nvme_start_freeze(struct nvme_ctrl *ctrl);
 
 static inline enum req_op nvme_req_op(struct nvme_command *cmd)
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 379c75c6d35a..dcc554895429 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -3273,7 +3273,7 @@ static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown)
 		 * if doing a safe shutdown.
 		 */
 		if (!dead && shutdown)
-			nvme_wait_freeze_timeout(&dev->ctrl, NVME_IO_TIMEOUT);
+			nvme_wait_freeze_timeout(&dev->ctrl);
 	}
 
 	nvme_quiesce_io_queues(&dev->ctrl);
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index f77c960f7632..bf73135c1439 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -888,7 +888,7 @@ static int nvme_rdma_configure_io_queues(struct nvme_rdma_ctrl *ctrl, bool new)
 	if (!new) {
 		nvme_start_freeze(&ctrl->ctrl);
 		nvme_unquiesce_io_queues(&ctrl->ctrl);
-		if (!nvme_wait_freeze_timeout(&ctrl->ctrl, NVME_IO_TIMEOUT)) {
+		if (!nvme_wait_freeze_timeout(&ctrl->ctrl)) {
 			/*
 			 * If we timed out waiting for freeze we are likely to
 			 * be stuck.  Fail the controller initialization just
diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 02c95c32b07e..e921130ec25f 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -2194,7 +2194,7 @@ static int nvme_tcp_configure_io_queues(struct nvme_ctrl *ctrl, bool new)
 	if (!new) {
 		nvme_start_freeze(ctrl);
 		nvme_unquiesce_io_queues(ctrl);
-		if (!nvme_wait_freeze_timeout(ctrl, NVME_IO_TIMEOUT)) {
+		if (!nvme_wait_freeze_timeout(ctrl)) {
 			/*
 			 * If we timed out waiting for freeze we are likely to
 			 * be stuck.  Fail the controller initialization just
-- 
2.53.0



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH V3 6/8] nvme-core: align fabrics_q teardown with admin_q in nvme_free_ctrl
  2026-04-10  7:39 [PATCH V3 0/8] nvme: Refactor and expose per-controller timeout configuration Maurizio Lombardi
                   ` (4 preceding siblings ...)
  2026-04-10  7:39 ` [PATCH V3 5/8] nvme: use per controller timeout waits over depending on global default Maurizio Lombardi
@ 2026-04-10  7:39 ` Maurizio Lombardi
  2026-04-27  9:35   ` Daniel Wagner
  2026-04-10  7:39 ` [PATCH V3 7/8] nvmet-loop: do not alloc admin tag set during reset Maurizio Lombardi
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 38+ messages in thread
From: Maurizio Lombardi @ 2026-04-10  7:39 UTC (permalink / raw)
  To: kbusch
  Cc: mheyne, emilne, jmeneghi, linux-nvme, dwagner, mlombard,
	mkhalfella, chaitanyak, hare, hch

Currently, the final reference for the fabrics admin queue (fabrics_q)
is dropped inside nvme_remove_admin_tag_set(). However, the primary
admin queue (admin_q) defers dropping its final reference until
nvme_free_ctrl().

Move the blk_put_queue() call for fabrics_q from nvme_remove_admin_tag_set()
to nvme_free_ctrl(). This aligns the lifecycle management of both admin
queues, ensuring they are freed symmetrically when the controller is finally
torn down.

Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
---
 drivers/nvme/host/core.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index f801d5475cd6..f3aa7b29d2a2 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -4927,10 +4927,8 @@ void nvme_remove_admin_tag_set(struct nvme_ctrl *ctrl)
 	 */
 	nvme_stop_keep_alive(ctrl);
 	blk_mq_destroy_queue(ctrl->admin_q);
-	if (ctrl->ops->flags & NVME_F_FABRICS) {
+	if (ctrl->ops->flags & NVME_F_FABRICS)
 		blk_mq_destroy_queue(ctrl->fabrics_q);
-		blk_put_queue(ctrl->fabrics_q);
-	}
 	blk_mq_free_tag_set(ctrl->admin_tagset);
 }
 EXPORT_SYMBOL_GPL(nvme_remove_admin_tag_set);
@@ -5072,6 +5070,8 @@ static void nvme_free_ctrl(struct device *dev)
 
 	if (ctrl->admin_q)
 		blk_put_queue(ctrl->admin_q);
+	if (ctrl->fabrics_q)
+		blk_put_queue(ctrl->fabrics_q);
 	if (!subsys || ctrl->instance != subsys->instance)
 		ida_free(&nvme_instance_ida, ctrl->instance);
 	nvme_free_cels(ctrl);
-- 
2.53.0



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH V3 7/8] nvmet-loop: do not alloc admin tag set during reset
  2026-04-10  7:39 [PATCH V3 0/8] nvme: Refactor and expose per-controller timeout configuration Maurizio Lombardi
                   ` (5 preceding siblings ...)
  2026-04-10  7:39 ` [PATCH V3 6/8] nvme-core: align fabrics_q teardown with admin_q in nvme_free_ctrl Maurizio Lombardi
@ 2026-04-10  7:39 ` Maurizio Lombardi
  2026-04-27  9:46   ` Daniel Wagner
  2026-04-10  7:39 ` [PATCH V3 8/8] nvme-core: warn on allocating admin tag set with existing queue Maurizio Lombardi
  2026-04-13  8:12 ` [PATCH V3 0/8] nvme: Refactor and expose per-controller timeout configuration Hannes Reinecke
  8 siblings, 1 reply; 38+ messages in thread
From: Maurizio Lombardi @ 2026-04-10  7:39 UTC (permalink / raw)
  To: kbusch
  Cc: mheyne, emilne, jmeneghi, linux-nvme, dwagner, mlombard,
	mkhalfella, chaitanyak, hare, hch

Currently, resetting a loopback controller unconditionally invokes
nvme_alloc_admin_tag_set() inside nvme_loop_configure_admin_queue().
Doing so drops the old queue and allocates a new one, which reverts
the admin queue's timeout (q->rq_timeout) back to the module default
(NVME_ADMIN_TIMEOUT). This wipes out any custom timeout values the
user may have configured via sysfs, and it may also race against the
sysfs nvme_admin_timeout_store() function, which can dereference the
admin_q pointer while the controller is in the RESETTING state.

Add a 'new' boolean parameter to nvme_loop_configure_admin_queue() to
distinguish between an initial probe and a reset. Only allocate the
admin tag set during the initial controller creation and do not
remove it when resetting the controller.
This ensures the existing queues and their sysfs-configured timeouts
remain intact across controller resets.

Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
---
 drivers/nvme/target/loop.c | 38 +++++++++++++++++++++-----------------
 1 file changed, 21 insertions(+), 17 deletions(-)

diff --git a/drivers/nvme/target/loop.c b/drivers/nvme/target/loop.c
index d98d0cdc5d6f..c9cf0ddc1b85 100644
--- a/drivers/nvme/target/loop.c
+++ b/drivers/nvme/target/loop.c
@@ -261,7 +261,7 @@ static const struct blk_mq_ops nvme_loop_admin_mq_ops = {
 	.init_hctx	= nvme_loop_init_admin_hctx,
 };
 
-static void nvme_loop_destroy_admin_queue(struct nvme_loop_ctrl *ctrl)
+static void nvme_loop_destroy_admin_queue(struct nvme_loop_ctrl *ctrl, bool remove)
 {
 	if (!test_and_clear_bit(NVME_LOOP_Q_LIVE, &ctrl->queues[0].flags))
 		return;
@@ -274,7 +274,8 @@ static void nvme_loop_destroy_admin_queue(struct nvme_loop_ctrl *ctrl)
 
 	nvmet_sq_destroy(&ctrl->queues[0].nvme_sq);
 	nvmet_cq_put(&ctrl->queues[0].nvme_cq);
-	nvme_remove_admin_tag_set(&ctrl->ctrl);
+	if (remove)
+		nvme_remove_admin_tag_set(&ctrl->ctrl);
 }
 
 static void nvme_loop_free_ctrl(struct nvme_ctrl *nctrl)
@@ -361,7 +362,7 @@ static int nvme_loop_connect_io_queues(struct nvme_loop_ctrl *ctrl)
 	return 0;
 }
 
-static int nvme_loop_configure_admin_queue(struct nvme_loop_ctrl *ctrl)
+static int nvme_loop_configure_admin_queue(struct nvme_loop_ctrl *ctrl, bool new)
 {
 	int error;
 
@@ -375,12 +376,15 @@ static int nvme_loop_configure_admin_queue(struct nvme_loop_ctrl *ctrl)
 	}
 	ctrl->ctrl.queue_count = 1;
 
-	error = nvme_alloc_admin_tag_set(&ctrl->ctrl, &ctrl->admin_tag_set,
-			&nvme_loop_admin_mq_ops,
-			sizeof(struct nvme_loop_iod) +
-			NVME_INLINE_SG_CNT * sizeof(struct scatterlist));
-	if (error)
-		goto out_free_sq;
+	if (new) {
+		error = nvme_alloc_admin_tag_set(&ctrl->ctrl,
+				&ctrl->admin_tag_set,
+				&nvme_loop_admin_mq_ops,
+				sizeof(struct nvme_loop_iod) +
+				NVME_INLINE_SG_CNT * sizeof(struct scatterlist));
+		if (error)
+			goto out_free_sq;
+	}
 
 	/* reset stopped state for the fresh admin queue */
 	clear_bit(NVME_CTRL_ADMIN_Q_STOPPED, &ctrl->ctrl.flags);
@@ -415,7 +419,7 @@ static int nvme_loop_configure_admin_queue(struct nvme_loop_ctrl *ctrl)
 	return error;
 }
 
-static void nvme_loop_shutdown_ctrl(struct nvme_loop_ctrl *ctrl)
+static void nvme_loop_shutdown_ctrl(struct nvme_loop_ctrl *ctrl, bool remove)
 {
 	if (ctrl->ctrl.queue_count > 1) {
 		nvme_quiesce_io_queues(&ctrl->ctrl);
@@ -426,12 +430,12 @@ static void nvme_loop_shutdown_ctrl(struct nvme_loop_ctrl *ctrl)
 	if (nvme_ctrl_state(&ctrl->ctrl) == NVME_CTRL_LIVE)
 		nvme_disable_ctrl(&ctrl->ctrl, true);
 
-	nvme_loop_destroy_admin_queue(ctrl);
+	nvme_loop_destroy_admin_queue(ctrl, remove);
 }
 
 static void nvme_loop_delete_ctrl_host(struct nvme_ctrl *ctrl)
 {
-	nvme_loop_shutdown_ctrl(to_loop_ctrl(ctrl));
+	nvme_loop_shutdown_ctrl(to_loop_ctrl(ctrl), true);
 }
 
 static void nvme_loop_delete_ctrl(struct nvmet_ctrl *nctrl)
@@ -453,7 +457,7 @@ static void nvme_loop_reset_ctrl_work(struct work_struct *work)
 	int ret;
 
 	nvme_stop_ctrl(&ctrl->ctrl);
-	nvme_loop_shutdown_ctrl(ctrl);
+	nvme_loop_shutdown_ctrl(ctrl, false);
 
 	if (!nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_CONNECTING)) {
 		enum nvme_ctrl_state state = nvme_ctrl_state(&ctrl->ctrl);
@@ -465,7 +469,7 @@ static void nvme_loop_reset_ctrl_work(struct work_struct *work)
 		return;
 	}
 
-	ret = nvme_loop_configure_admin_queue(ctrl);
+	ret = nvme_loop_configure_admin_queue(ctrl, false);
 	if (ret)
 		goto out_disable;
 
@@ -492,7 +496,7 @@ static void nvme_loop_reset_ctrl_work(struct work_struct *work)
 out_destroy_admin:
 	nvme_quiesce_admin_queue(&ctrl->ctrl);
 	nvme_cancel_admin_tagset(&ctrl->ctrl);
-	nvme_loop_destroy_admin_queue(ctrl);
+	nvme_loop_destroy_admin_queue(ctrl, true);
 out_disable:
 	dev_warn(ctrl->ctrl.device, "Removing after reset failure\n");
 	nvme_uninit_ctrl(&ctrl->ctrl);
@@ -594,7 +598,7 @@ static struct nvme_ctrl *nvme_loop_create_ctrl(struct device *dev,
 	if (!ctrl->queues)
 		goto out_uninit_ctrl;
 
-	ret = nvme_loop_configure_admin_queue(ctrl);
+	ret = nvme_loop_configure_admin_queue(ctrl, true);
 	if (ret)
 		goto out_free_queues;
 
@@ -632,7 +636,7 @@ static struct nvme_ctrl *nvme_loop_create_ctrl(struct device *dev,
 out_remove_admin_queue:
 	nvme_quiesce_admin_queue(&ctrl->ctrl);
 	nvme_cancel_admin_tagset(&ctrl->ctrl);
-	nvme_loop_destroy_admin_queue(ctrl);
+	nvme_loop_destroy_admin_queue(ctrl, true);
 out_free_queues:
 	kfree(ctrl->queues);
 out_uninit_ctrl:
-- 
2.53.0



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH V3 8/8] nvme-core: warn on allocating admin tag set with existing queue
  2026-04-10  7:39 [PATCH V3 0/8] nvme: Refactor and expose per-controller timeout configuration Maurizio Lombardi
                   ` (6 preceding siblings ...)
  2026-04-10  7:39 ` [PATCH V3 7/8] nvmet-loop: do not alloc admin tag set during reset Maurizio Lombardi
@ 2026-04-10  7:39 ` Maurizio Lombardi
  2026-04-24 13:40   ` Christoph Hellwig
  2026-04-27  9:54   ` Daniel Wagner
  2026-04-13  8:12 ` [PATCH V3 0/8] nvme: Refactor and expose per-controller timeout configuration Hannes Reinecke
  8 siblings, 2 replies; 38+ messages in thread
From: Maurizio Lombardi @ 2026-04-10  7:39 UTC (permalink / raw)
  To: kbusch
  Cc: mheyne, emilne, jmeneghi, linux-nvme, dwagner, mlombard,
	mkhalfella, chaitanyak, hare, hch

Currently, nvme_alloc_admin_tag_set() silently drops and releases
the existing admin_q if it is called on a controller that already
has one (e.g., during a controller reset).

However, transport drivers should not reallocate the admin tag
set and queue during a reset. Dropping the old queue and allocating
a new one destroys user-configured timeouts and may race against
nvme_admin_timeout_store().

Since all transport drivers are now expected to preserve the admin queue
across resets, calling nvme_alloc_admin_tag_set() when ctrl->admin_q
is already populated is a bug.

Remove the silent cleanup and replace it with a WARN_ON_ONCE() to
explicitly catch any transport driver that violates this lifecycle rule.

Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
---
 drivers/nvme/host/core.c | 7 +------
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index f3aa7b29d2a2..ae084373a79b 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -4884,12 +4884,7 @@ int nvme_alloc_admin_tag_set(struct nvme_ctrl *ctrl, struct blk_mq_tag_set *set,
 	if (ret)
 		return ret;
 
-	/*
-	 * If a previous admin queue exists (e.g., from before a reset),
-	 * put it now before allocating a new one to avoid orphaning it.
-	 */
-	if (ctrl->admin_q)
-		blk_put_queue(ctrl->admin_q);
+	WARN_ON_ONCE(ctrl->admin_q);
 
 	ctrl->admin_q = blk_mq_alloc_queue(set, &lim, NULL);
 	if (IS_ERR(ctrl->admin_q)) {
-- 
2.53.0



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* Re: [PATCH V3 0/8] nvme: Refactor and expose per-controller timeout configuration
  2026-04-10  7:39 [PATCH V3 0/8] nvme: Refactor and expose per-controller timeout configuration Maurizio Lombardi
                   ` (7 preceding siblings ...)
  2026-04-10  7:39 ` [PATCH V3 8/8] nvme-core: warn on allocating admin tag set with existing queue Maurizio Lombardi
@ 2026-04-13  8:12 ` Hannes Reinecke
  2026-04-13  9:21   ` Maurizio Lombardi
  8 siblings, 1 reply; 38+ messages in thread
From: Hannes Reinecke @ 2026-04-13  8:12 UTC (permalink / raw)
  To: Maurizio Lombardi, kbusch
  Cc: mheyne, emilne, jmeneghi, linux-nvme, dwagner, mlombard,
	mkhalfella, chaitanyak, hare, hch

On 4/10/26 09:39, Maurizio Lombardi wrote:
> This patchset tries to address some limitations in how the NVMe driver handles
> command timeouts.
> Currently, the driver relies heavily on global module parameters
> (NVME_IO_TIMEOUT and NVME_ADMIN_TIMEOUT), making it difficult for users to
> tune timeouts for specific controllers that may have very different
> characteristics. Also, in some cases, manual changes to sysfs timeout values
> are ignored by the driver logic.
> 
> For example, this patchset removes the unconditional timeout assignment in
> nvme_init_request. This allows the block layer to correctly apply the request
> queue's timeout settings, ensuring that user-initiated changes via sysfs
> are actually respected for all requests.
> 
> It introduces new sysfs attributes (admin_timeout and io_timeout) to the NVMe
> controller. This allows users to configure distinct timeout requirements for
> different controllers rather than relying on global module parameters.
> 
> Some examples:
> 
> Changes to the controller's io_timeout are propagated to all
> the associated namespaces' queues:
> 
> # find /sys -name 'io_timeout'
> /sys/devices/virtual/nvme-fabrics/ctl/nvme0/nvme0c0n1/queue/io_timeout
> /sys/devices/virtual/nvme-fabrics/ctl/nvme0/nvme0c0n2/queue/io_timeout
> /sys/devices/virtual/nvme-fabrics/ctl/nvme0/nvme0c0n3/queue/io_timeout
> /sys/devices/virtual/nvme-fabrics/ctl/nvme0/io_timeout
> 
> # echo 27000 > /sys/devices/virtual/nvme-fabrics/ctl/nvme0/io_timeout
> # cat /sys/devices/virtual/nvme-fabrics/ctl/nvme0/nvme0c0n1/queue/io_timeout
> 27000
> # cat /sys/devices/virtual/nvme-fabrics/ctl/nvme0/nvme0c0n2/queue/io_timeout
> 27000
> # cat /sys/devices/virtual/nvme-fabrics/ctl/nvme0/nvme0c0n3/queue/io_timeout
> 27000
> 
> When adding a namespace on the target side, the io_timeout is inherited from
> the controller's configured timeout:
> 
> * target side *
> # nvmetcli
> /> cd subsystems/test-nqn/namespaces/4
> /subsystems/t.../namespaces/4> enable
> The Namespace has been enabled.
> 
> ************
> 
> * Host-side *
> nvme nvme0: rescanning namespaces.
> # find /sys -name 'io_timeout'
> /sys/devices/virtual/nvme-fabrics/ctl/nvme0/nvme0c0n1/queue/io_timeout
> /sys/devices/virtual/nvme-fabrics/ctl/nvme0/nvme0c0n2/queue/io_timeout
> /sys/devices/virtual/nvme-fabrics/ctl/nvme0/nvme0c0n3/queue/io_timeout
> /sys/devices/virtual/nvme-fabrics/ctl/nvme0/nvme0c0n4/queue/io_timeout <-- new namespace
> /sys/devices/virtual/nvme-fabrics/ctl/nvme0/io_timeout
> 
> # cat /sys/devices/virtual/nvme-fabrics/ctl/nvme0/nvme0c0n4/queue/io_timeout
> 27000
> ***********
> 
> io_timeout and admin_timeout module parameters are used as default
> values for new controllers:
> 
> # nvme connect -t tcp -a 10.37.153.138 -s 8000 -n test-nqn2
> connecting to device: nvme1
> 
> # cat /sys/devices/virtual/nvme-fabrics/ctl/nvme1/nvme1c1n1/queue/io_timeout
> 30000
> # cat /sys/devices/virtual/nvme-fabrics/ctl/nvme1/admin_timeout
> 60000
> 
> V3: - Rebase on top of nvme 7.1 branch
>      - add an admin_timeout variable to nvme_ctrl structure
>      - Wait until the controller has reached the LIVE state
>        for the first time before allowing the user to modify
>        the timeouts, this prevents the dereferencing of the admin_q,
>        fabrics_q and connect_q queues before their initialization.
>      - move blk_put_queue(fabrics_q) to nvme_free_ctrl() to align it to
>        admin_q teardown
>      - modify nvmet-loop to avoid deleting and re-creating the admin_q queue
>        when the controller enters the resetting state.
>      - add a warning if nvme_alloc_admin_tag_set() is called twice against
>        the same controller
> 
> V2: - Drop the RFC tag
>      - apply the timeout settings to fabrics_q and connect_q too
>      - Code style fixes
>      - remove unnecessary check for null admin_q in __nvme_delete_io_queues()
>      - Use DEVICE_ATTR() macro
> 
What about KATO?
With this patchset the user can set arbitrary values for the I/O timeout,
which can easily be lower than KATO.
And as per spec a KATO timeout implies a transport disruption, requiring
a controller reset.
But due to the internal design of the nvme error handling we do conflate
transport disruption and command timeout, so an _I/O_ timeout triggers
a controller reset.
Which means that a command timeout lower than KATO will result in false
positives, with the controller being reset even though the connection
is perfectly happy.
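
Just to illustrate the shape of such a guard (hypothetical, not part of
this series; ctrl->kato is in seconds, the sysfs value in milliseconds):

	if (ctrl->kato && timeout < ctrl->kato * MSEC_PER_SEC)
		return -EINVAL;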

If you were to go that way of making the I/O timeout configurable, I
strongly suggest implementing command abort first (hello John ;-),
as then you can simply cancel a command which ran into a timeout
without having to reset the controller.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                  Kernel Storage Architect
hare@suse.de                                +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH V3 0/8] nvme: Refactor and expose per-controller timeout configuration
  2026-04-13  8:12 ` [PATCH V3 0/8] nvme: Refactor and expose per-controller timeout configuration Hannes Reinecke
@ 2026-04-13  9:21   ` Maurizio Lombardi
  2026-04-14 19:14     ` John Meneghini
  0 siblings, 1 reply; 38+ messages in thread
From: Maurizio Lombardi @ 2026-04-13  9:21 UTC (permalink / raw)
  To: Hannes Reinecke, Maurizio Lombardi, kbusch
  Cc: mheyne, emilne, jmeneghi, linux-nvme, dwagner, mlombard,
	mkhalfella, chaitanyak, hare, hch

On Mon Apr 13, 2026 at 10:12 AM CEST, Hannes Reinecke wrote:
> On 4/10/26 09:39, Maurizio Lombardi wrote:
>> This patchset tries to address some limitations in how the NVMe driver handles
>> command timeouts.
>> Currently, the driver relies heavily on global module parameters
>> (NVME_IO_TIMEOUT and NVME_ADMIN_TIMEOUT), making it difficult for users to
>> tune timeouts for specific controllers that may have very different
>> characteristics. Also, in some cases, manual changes to sysfs timeout values
>> are ignored by the driver logic.
>> 
>> For example, this patchset removes the unconditional timeout assignment in
>> nvme_init_request. This allows the block layer to correctly apply the request
>> queue's timeout settings, ensuring that user-initiated changes via sysfs
>> are actually respected for all requests.
>> 
>> It introduces new sysfs attributes (admin_timeout and io_timeout) to the NVMe
>> controller. This allows users to configure distinct timeout requirements for
>> different controllers rather than relying on global module parameters.
>> 
> What about KATO?
> With this patchset the user can set arbitrary values for the I/O timeout,
> which can easily be lower than KATO.

It's worth noting that, unless I am missing something, the user can already
trigger this exact scenario today by setting an I/O timeout lower than KATO,
using the global nvme_io_timeout module parameter.

> And as per spec a KATO timeout implies a transport disruption, requiring
> a controller reset.
> But due to the internal design of the nvme error handling we do conflate
> transport disruption and command timeout, so an _I/O_ timeout triggers
> a controller reset.

This is true only for the TCP and RDMA host drivers,
because PCI and FC already support I/O aborts.

Apple doesn't support abort, but it's not clear from the comments whether
it's the driver that lacks support for it or the controller that doesn't
handle the abort command.

> Which means that a command timeout lower than KATO will result in false
> positives, with the controller being reset even though the connection
> is perfectly happy.

Right.

We could try to send abort commands in the RDMA and TCP host drivers'
timeout handlers. Maybe cancel, if supported, falling back to abort if
cancel commands are not available. I already had patches for this kind
of stuff.

Maurizio


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH V3 0/8] nvme: Refactor and expose per-controller timeout configuration
  2026-04-13  9:21   ` Maurizio Lombardi
@ 2026-04-14 19:14     ` John Meneghini
  0 siblings, 0 replies; 38+ messages in thread
From: John Meneghini @ 2026-04-14 19:14 UTC (permalink / raw)
  To: Maurizio Lombardi, Hannes Reinecke, Maurizio Lombardi, kbusch
  Cc: mheyne, emilne, linux-nvme, dwagner, mkhalfella, chaitanyak, hare,
	hch

On 4/13/26 5:21 AM, Maurizio Lombardi wrote:
> It's worth noting that, unless I am missing something, the user can already
> trigger this exact scenario today by setting an I/O timeout lower than KATO,
> using the global nvme_io_timeout module parameter.

This is true.



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH V3 1/8] nvme: Let the blocklayer set timeouts for requests
  2026-04-10  7:39 ` [PATCH V3 1/8] nvme: Let the blocklayer set timeouts for requests Maurizio Lombardi
@ 2026-04-22  9:58   ` Daniel Wagner
  2026-04-24 13:38   ` Christoph Hellwig
  1 sibling, 0 replies; 38+ messages in thread
From: Daniel Wagner @ 2026-04-22  9:58 UTC (permalink / raw)
  To: Maurizio Lombardi
  Cc: kbusch, mheyne, emilne, jmeneghi, linux-nvme, mlombard,
	mkhalfella, chaitanyak, hare, hch

On Fri, Apr 10, 2026 at 09:39:17AM +0200, Maurizio Lombardi wrote:
> From: "Heyne, Maximilian" <mheyne@amazon.de>
> 
> When initializing an nvme request which is about to be sent to the block
> layer, we do not need to initialize its timeout. If it's left
> uninitialized at 0, the block layer will use the request queue's timeout
> in blk_add_timer (via nvme_start_request, which is called from
> nvme_*_queue_rq). These timeouts are set to either NVME_IO_TIMEOUT or
> NVME_ADMIN_TIMEOUT when the request queues are created.
> 
> Because the io_timeout of the IO queues can be modified via sysfs, the
> following situation can occur:
> 
> 1) NVME_IO_TIMEOUT = 30 (default module parameter)
> 2) nvme1n1 is probed. IO queues default timeout is 30 s
> 3) manually change the IO timeout to 90 s
>    echo 90000 > /sys/class/nvme/nvme1/nvme1n1/queue/io_timeout
> 4) Any call of __nvme_submit_sync_cmd on nvme1n1 to an IO queue will issue
>    commands with the 30 s timeout instead of the wanted 90 s which might
>    be more suitable for this device.
> 
> Commit 470e900c8036 ("nvme: refactor nvme_alloc_request") already
> silently changed the behavior for ioctls because it unconditionally
> overrides the request's timeout that was set in nvme_init_request. If it
> was left unset by the user of the ioctl, it will be overridden with 0,
> meaning the block layer will pick the request queue's IO timeout.
> 
> Following up on that, this patch further improves the consistency of IO
> timeout usage. However, there are still uses of NVME_IO_TIMEOUT which
> could be inconsistent with what is set in the device's request_queue by
> the user.
> 
> Reviewed-by: Mohamed Khalfella <mkhalfella@purestorage.com>
> Signed-off-by: Maximilian Heyne <mheyne@amazon.de>

Maurizio, you need to add your SoB too.

Besides this,
Reviewed-by: Daniel Wagner <dwagner@suse.de>


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH V3 2/8] nvme: add sysfs attribute to change admin timeout per nvme controller
  2026-04-10  7:39 ` [PATCH V3 2/8] nvme: add sysfs attribute to change admin timeout per nvme controller Maurizio Lombardi
@ 2026-04-22 10:10   ` Daniel Wagner
  2026-04-22 11:08     ` Maurizio Lombardi
  0 siblings, 1 reply; 38+ messages in thread
From: Daniel Wagner @ 2026-04-22 10:10 UTC (permalink / raw)
  To: Maurizio Lombardi
  Cc: kbusch, mheyne, emilne, jmeneghi, linux-nvme, mlombard,
	mkhalfella, chaitanyak, hare, hch

On Fri, Apr 10, 2026 at 09:39:18AM +0200, Maurizio Lombardi wrote:
> +	/*
> +	 * Wait until the controller reaches the LIVE state
> +	 * to be sure that admin_q and fabrics_q are
> +	 * properly initialized.
> +	 */
> +	if (!test_bit(NVME_CTRL_STARTED_ONCE, &ctrl->flags))
> +		return -EBUSY;

I assume the idea is to use a udev rule to set the value. Though I
think this is racy:

	nvme_change_uevent(ctrl, "NVME_EVENT=connected");
	set_bit(NVME_CTRL_STARTED_ONCE, &ctrl->flags);


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH V3 2/8] nvme: add sysfs attribute to change admin timeout per nvme controller
  2026-04-22 10:10   ` Daniel Wagner
@ 2026-04-22 11:08     ` Maurizio Lombardi
  2026-04-23 12:46       ` Daniel Wagner
  0 siblings, 1 reply; 38+ messages in thread
From: Maurizio Lombardi @ 2026-04-22 11:08 UTC (permalink / raw)
  To: Daniel Wagner, Maurizio Lombardi
  Cc: kbusch, mheyne, emilne, jmeneghi, linux-nvme, mlombard,
	mkhalfella, chaitanyak, hare, hch

On Wed Apr 22, 2026 at 12:10 PM CEST, Daniel Wagner wrote:
> On Fri, Apr 10, 2026 at 09:39:18AM +0200, Maurizio Lombardi wrote:
>> +	/*
>> +	 * Wait until the controller reaches the LIVE state
>> +	 * to be sure that admin_q and fabrics_q are
>> +	 * properly initialized.
>> +	 */
>> +	if (!test_bit(NVME_CTRL_STARTED_ONCE, &ctrl->flags))
>> +		return -EBUSY;
>
> I assume the idea is to use an udev rule to set the value. Though I
> think this is racy:
>
> 	nvme_change_uevent(ctrl, "NVME_EVENT=connected");
> 	set_bit(NVME_CTRL_STARTED_ONCE, &ctrl->flags);


Not really, the idea was to prevent the user from setting the
timeouts too early during the controller's initialization.

Otherwise the user could try to change the timeout while
admin_q and fabrics_q are still being created.

If the NVME_CTRL_STARTED_ONCE bit is set, we know that the
initialization has been completed and that admin_q and fabrics_q will
exist until the controller is torn down.


Maurizio



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH V3 2/8] nvme: add sysfs attribute to change admin timeout per nvme controller
  2026-04-22 11:08     ` Maurizio Lombardi
@ 2026-04-23 12:46       ` Daniel Wagner
  2026-04-23 12:58         ` Maurizio Lombardi
  2026-04-23 13:45         ` Hannes Reinecke
  0 siblings, 2 replies; 38+ messages in thread
From: Daniel Wagner @ 2026-04-23 12:46 UTC (permalink / raw)
  To: Maurizio Lombardi
  Cc: Maurizio Lombardi, kbusch, mheyne, emilne, jmeneghi, linux-nvme,
	mkhalfella, chaitanyak, hare, hch

On Wed, Apr 22, 2026 at 01:08:44PM +0200, Maurizio Lombardi wrote:
> On Wed Apr 22, 2026 at 12:10 PM CEST, Daniel Wagner wrote:
> > On Fri, Apr 10, 2026 at 09:39:18AM +0200, Maurizio Lombardi wrote:
> >> +	/*
> >> +	 * Wait until the controller reaches the LIVE state
> >> +	 * to be sure that admin_q and fabrics_q are
> >> +	 * properly initialized.
> >> +	 */
> >> +	if (!test_bit(NVME_CTRL_STARTED_ONCE, &ctrl->flags))
> >> +		return -EBUSY;
> >
> > I assume the idea is to use a udev rule to set the value. Though I
> > think this is racy:
> >
> > 	nvme_change_uevent(ctrl, "NVME_EVENT=connected");
> > 	set_bit(NVME_CTRL_STARTED_ONCE, &ctrl->flags);
> 
> 
> Not really, the idea was to prevent the user from setting the
> timeouts too early during the controller's initialization.

Yes, but then how do you know when it is too early? Do you read/poll
the state of the controller?

> Otherwise the user could try to change the timeout while
> admin_q and fabrics_q are still being created.
> 
> If the NVME_CTRL_STARTED_ONCE bit is set, we know that the
> initialization has been completed and that admin_q and fabrics_q will
> exist until the controller is torn down.

Yes, that is all fine but I think it would be good to be able to use
udev here as we have it for other timeouts.


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH V3 2/8] nvme: add sysfs attribute to change admin timeout per nvme controller
  2026-04-23 12:46       ` Daniel Wagner
@ 2026-04-23 12:58         ` Maurizio Lombardi
  2026-04-23 17:04           ` Daniel Wagner
  2026-04-23 13:45         ` Hannes Reinecke
  1 sibling, 1 reply; 38+ messages in thread
From: Maurizio Lombardi @ 2026-04-23 12:58 UTC (permalink / raw)
  To: Daniel Wagner, Maurizio Lombardi
  Cc: Maurizio Lombardi, kbusch, mheyne, emilne, jmeneghi, linux-nvme,
	mkhalfella, chaitanyak, hare, hch

On Thu Apr 23, 2026 at 2:46 PM CEST, Daniel Wagner wrote:
> On Wed, Apr 22, 2026 at 01:08:44PM +0200, Maurizio Lombardi wrote:
>> On Wed Apr 22, 2026 at 12:10 PM CEST, Daniel Wagner wrote:
>> > On Fri, Apr 10, 2026 at 09:39:18AM +0200, Maurizio Lombardi wrote:
>> >> +	/*
>> >> +	 * Wait until the controller reaches the LIVE state
>> >> +	 * to be sure that admin_q and fabrics_q are
>> >> +	 * properly initialized.
>> >> +	 */
>> >> +	if (!test_bit(NVME_CTRL_STARTED_ONCE, &ctrl->flags))
>> >> +		return -EBUSY;
>> >
>> > I assume the idea is to use a udev rule to set the value. Though I
>> > think this is racy:
>> >
>> > 	nvme_change_uevent(ctrl, "NVME_EVENT=connected");
>> > 	set_bit(NVME_CTRL_STARTED_ONCE, &ctrl->flags);
>> 
>> 
>> Not really, the idea was to prevent the user from setting the
>> timeouts too early during the controller's initialization.
>
> Yes, but then how do you know when it is too early? Do you read/poll
> the state of the controller?
>
>> Otherwise the user could try to change the timeout while
>> admin_q and fabrics_q are still being created.
>> 
>> If the NVME_CTRL_STARTED_ONCE bit is set, we know that the
>> initialization has been completed and that admin_q and fabrics_q will
>> exist until the controller is torn down.
>
> Yes, that is all fine but I think it would be good to be able to use
> udev here as we have it for other timeouts.

Ok, I understand. What do you think if we swap the order in
nvme_start_ctrl() so the flag is set before userspace is notified?

set_bit(NVME_CTRL_STARTED_ONCE, &ctrl->flags);
nvme_change_uevent(ctrl, "NVME_EVENT=connected");

Maurizio



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH V3 2/8] nvme: add sysfs attribute to change admin timeout per nvme controller
  2026-04-23 12:46       ` Daniel Wagner
  2026-04-23 12:58         ` Maurizio Lombardi
@ 2026-04-23 13:45         ` Hannes Reinecke
  1 sibling, 0 replies; 38+ messages in thread
From: Hannes Reinecke @ 2026-04-23 13:45 UTC (permalink / raw)
  To: Daniel Wagner, Maurizio Lombardi
  Cc: Maurizio Lombardi, kbusch, mheyne, emilne, jmeneghi, linux-nvme,
	mkhalfella, chaitanyak, hare, hch

On 4/23/26 14:46, Daniel Wagner wrote:
> On Wed, Apr 22, 2026 at 01:08:44PM +0200, Maurizio Lombardi wrote:
>> On Wed Apr 22, 2026 at 12:10 PM CEST, Daniel Wagner wrote:
>>> On Fri, Apr 10, 2026 at 09:39:18AM +0200, Maurizio Lombardi wrote:
>>>> +	/*
>>>> +	 * Wait until the controller reaches the LIVE state
>>>> +	 * to be sure that admin_q and fabrics_q are
>>>> +	 * properly initialized.
>>>> +	 */
>>>> +	if (!test_bit(NVME_CTRL_STARTED_ONCE, &ctrl->flags))
>>>> +		return -EBUSY;
>>>
>>> I assume the idea is to use a udev rule to set the value. Though I
>>> think this is racy:
>>>
>>> 	nvme_change_uevent(ctrl, "NVME_EVENT=connected");
>>> 	set_bit(NVME_CTRL_STARTED_ONCE, &ctrl->flags);
>>
>>
>> Not really, the idea was to prevent the user from setting the
>> timeouts too early during the controller's initialization.
> 
> Yes, but then how do you know when it is too early? Do you read/poll
> the state of the controller?
> 
Well, a timeout less than KATO is questionable, as we cannot reliably
detect timeouts any sooner than that. So having a KATO of 20 seconds
with a command timeout of 10 seconds will be pointless.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                  Kernel Storage Architect
hare@suse.de                                +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH V3 2/8] nvme: add sysfs attribute to change admin timeout per nvme controller
  2026-04-23 12:58         ` Maurizio Lombardi
@ 2026-04-23 17:04           ` Daniel Wagner
  2026-04-23 18:35             ` Maurizio Lombardi
  0 siblings, 1 reply; 38+ messages in thread
From: Daniel Wagner @ 2026-04-23 17:04 UTC (permalink / raw)
  To: Maurizio Lombardi
  Cc: Maurizio Lombardi, kbusch, mheyne, emilne, jmeneghi, linux-nvme,
	mkhalfella, chaitanyak, hare, hch

On Thu, Apr 23, 2026 at 02:58:25PM +0200, Maurizio Lombardi wrote:
> >> If the NVME_CTRL_STARTED_ONCE bit is set, we know that the
> >> initialization has been completed and that admin_q and fabrics_q will
> >> exist until the controller is torn down.
> >
> > Yes, that is all fine but I think it would be good to be able to use
> > udev here as we have it for other timeouts.
> 
> Ok, I understand. What do you think if we swap the order in
> nvme_start_ctrl() so the flag is set before userspace is notified?
> 
> set_bit(NVME_CTRL_STARTED_ONCE, &ctrl->flags);
> nvme_change_uevent(ctrl, "NVME_EVENT=connected");

It doesn't look like there is anything else depending on this order. So
yes, that would make this race go away.


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH V3 2/8] nvme: add sysfs attribute to change admin timeout per nvme controller
  2026-04-23 17:04           ` Daniel Wagner
@ 2026-04-23 18:35             ` Maurizio Lombardi
  0 siblings, 0 replies; 38+ messages in thread
From: Maurizio Lombardi @ 2026-04-23 18:35 UTC (permalink / raw)
  To: Daniel Wagner, Maurizio Lombardi
  Cc: Maurizio Lombardi, kbusch, mheyne, emilne, jmeneghi, linux-nvme,
	mkhalfella, chaitanyak, hare, hch

On Thu Apr 23, 2026 at 7:04 PM CEST, Daniel Wagner wrote:
> On Thu, Apr 23, 2026 at 02:58:25PM +0200, Maurizio Lombardi wrote:
>> >> if the NVME_CTRL_STARTED_ONCE bit is set, we know that the
>> >> initialization has been completed and that admin_q and fabrics_q will
>> >> exist until the controller is torn down.
>> >
>> > Yes, that is all fine but I think it would be good to be able to use
>> > udev here as we have it for other timeouts.
>> 
>> Ok, I understand. What do you think if we swap the order in
>> nvme_start_ctrl() so the flag is set before userspace is notified?
>> 
>> set_bit(NVME_CTRL_STARTED_ONCE, &ctrl->flags);
>> nvme_change_uevent(ctrl, "NVME_EVENT=connected");
>
> It doesn't look like there is anything else depending on this order. So
> yes, that would make this race go away.


Noted, I will fix it in the next version.

Thanks,
Maurizio



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH V3 1/8] nvme: Let the blocklayer set timeouts for requests
  2026-04-10  7:39 ` [PATCH V3 1/8] nvme: Let the blocklayer set timeouts for requests Maurizio Lombardi
  2026-04-22  9:58   ` Daniel Wagner
@ 2026-04-24 13:38   ` Christoph Hellwig
  1 sibling, 0 replies; 38+ messages in thread
From: Christoph Hellwig @ 2026-04-24 13:38 UTC (permalink / raw)
  To: Maurizio Lombardi
  Cc: kbusch, mheyne, emilne, jmeneghi, linux-nvme, dwagner, mlombard,
	mkhalfella, chaitanyak, hare, hch

Looks good modulo the missing signoff:

Reviewed-by: Christoph Hellwig <hch@lst.de>



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH V3 4/8] nvme: add sysfs attribute to change IO timeout per nvme controller
  2026-04-10  7:39 ` [PATCH V3 4/8] nvme: add sysfs attribute to change IO timeout per nvme controller Maurizio Lombardi
@ 2026-04-24 13:40   ` Christoph Hellwig
  2026-04-27  9:11   ` Daniel Wagner
  2026-04-28 17:23   ` Mohamed Khalfella
  2 siblings, 0 replies; 38+ messages in thread
From: Christoph Hellwig @ 2026-04-24 13:40 UTC (permalink / raw)
  To: Maurizio Lombardi
  Cc: kbusch, mheyne, emilne, jmeneghi, linux-nvme, dwagner, mlombard,
	mkhalfella, chaitanyak, hare, hch

nit: the second nvme in the subject is redundant.



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH V3 8/8] nvme-core: warn on allocating admin tag set with existing queue
  2026-04-10  7:39 ` [PATCH V3 8/8] nvme-core: warn on allocating admin tag set with existing queue Maurizio Lombardi
@ 2026-04-24 13:40   ` Christoph Hellwig
  2026-04-27  9:54   ` Daniel Wagner
  1 sibling, 0 replies; 38+ messages in thread
From: Christoph Hellwig @ 2026-04-24 13:40 UTC (permalink / raw)
  To: Maurizio Lombardi
  Cc: kbusch, mheyne, emilne, jmeneghi, linux-nvme, dwagner, mlombard,
	mkhalfella, chaitanyak, hare, hch

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH V3 3/8] nvme: pci: use admin queue timeout over NVME_ADMIN_TIMEOUT
  2026-04-10  7:39 ` [PATCH V3 3/8] nvme: pci: use admin queue timeout over NVME_ADMIN_TIMEOUT Maurizio Lombardi
@ 2026-04-27  9:05   ` Daniel Wagner
  0 siblings, 0 replies; 38+ messages in thread
From: Daniel Wagner @ 2026-04-27  9:05 UTC (permalink / raw)
  To: Maurizio Lombardi
  Cc: kbusch, mheyne, emilne, jmeneghi, linux-nvme, mlombard,
	mkhalfella, chaitanyak, hare, hch

On Fri, Apr 10, 2026 at 09:39:19AM +0200, Maurizio Lombardi wrote:
> While tearing down its queues, nvme-pci uses NVME_ADMIN_TIMEOUT as its
> timeout target. Instead, use the configured admin queue's timeout value
> to match the device's existing timeout setting.
> 
> Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
> ---
>  drivers/nvme/host/pci.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index 9aa19255b041..379c75c6d35a 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -3091,7 +3091,7 @@ static bool __nvme_delete_io_queues(struct nvme_dev *dev, u8 opcode)
>  	unsigned long timeout;
>  
>   retry:
> -	timeout = NVME_ADMIN_TIMEOUT;
> +	timeout = dev->ctrl.admin_timeout;

Looks good to me. I went through the fabrics code to double-check if
there is something similar, but it seems we are fine there as well.

Reviewed-by: Daniel Wagner <dwagner@suse.de>



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH V3 4/8] nvme: add sysfs attribute to change IO timeout per nvme controller
  2026-04-10  7:39 ` [PATCH V3 4/8] nvme: add sysfs attribute to change IO timeout per nvme controller Maurizio Lombardi
  2026-04-24 13:40   ` Christoph Hellwig
@ 2026-04-27  9:11   ` Daniel Wagner
  2026-04-27  9:13     ` Maurizio Lombardi
  2026-04-28 17:23   ` Mohamed Khalfella
  2 siblings, 1 reply; 38+ messages in thread
From: Daniel Wagner @ 2026-04-27  9:11 UTC (permalink / raw)
  To: Maurizio Lombardi
  Cc: kbusch, mheyne, emilne, jmeneghi, linux-nvme, mlombard,
	mkhalfella, chaitanyak, hare, hch

On Fri, Apr 10, 2026 at 09:39:20AM +0200, Maurizio Lombardi wrote:
> Currently, there is no method to adjust the timeout values on a
> per-controller basis for nvme I/O queues.
> Add an io_timeout attribute to nvme so that different nvme controllers,
> which may have different timeout requirements, can have custom
> I/O timeouts set.
> 
> Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
> ---
>  drivers/nvme/host/core.c  |  2 ++
>  drivers/nvme/host/nvme.h  |  1 +
>  drivers/nvme/host/sysfs.c | 47 +++++++++++++++++++++++++++++++++++++++
>  3 files changed, 50 insertions(+)
> 
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index 6dc3d273623c..1b676fd4d454 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -4197,6 +4197,7 @@ static void nvme_alloc_ns(struct nvme_ctrl *ctrl, struct nvme_ns_info *info)
>  		mutex_unlock(&ctrl->namespaces_lock);
>  		goto out_unlink_ns;
>  	}
> +	blk_queue_rq_timeout(ns->queue, ctrl->io_timeout);
>  	nvme_ns_add_to_ctrl_list(ns);
>  	mutex_unlock(&ctrl->namespaces_lock);
>  	synchronize_srcu(&ctrl->srcu);
> @@ -5136,6 +5137,7 @@ int nvme_init_ctrl(struct nvme_ctrl *ctrl, struct device *dev,
>  	ctrl->ka_cmd.common.opcode = nvme_admin_keep_alive;
>  	ctrl->ka_last_check_time = jiffies;
>  	ctrl->admin_timeout = NVME_ADMIN_TIMEOUT;
> +	ctrl->io_timeout = NVME_IO_TIMEOUT;

Should this be initialized with the defaults from the module load time?


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH V3 4/8] nvme: add sysfs attribute to change IO timeout per nvme controller
  2026-04-27  9:11   ` Daniel Wagner
@ 2026-04-27  9:13     ` Maurizio Lombardi
  2026-04-27  9:19       ` Daniel Wagner
  0 siblings, 1 reply; 38+ messages in thread
From: Maurizio Lombardi @ 2026-04-27  9:13 UTC (permalink / raw)
  To: Daniel Wagner, Maurizio Lombardi
  Cc: kbusch, mheyne, emilne, jmeneghi, linux-nvme, mlombard,
	mkhalfella, chaitanyak, hare, hch

On Mon Apr 27, 2026 at 11:11 AM CEST, Daniel Wagner wrote:
> On Fri, Apr 10, 2026 at 09:39:20AM +0200, Maurizio Lombardi wrote:
>> Currently, there is no method to adjust the timeout values on a
>> per-controller basis for nvme I/O queues.
>> Add an io_timeout attribute to nvme so that different nvme controllers,
>> which may have different timeout requirements, can have custom
>> I/O timeouts set.
>> 
>> Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
>> ---
>>  drivers/nvme/host/core.c  |  2 ++
>>  drivers/nvme/host/nvme.h  |  1 +
>>  drivers/nvme/host/sysfs.c | 47 +++++++++++++++++++++++++++++++++++++++
>>  3 files changed, 50 insertions(+)
>> 
>> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
>> index 6dc3d273623c..1b676fd4d454 100644
>> --- a/drivers/nvme/host/core.c
>> +++ b/drivers/nvme/host/core.c
>> @@ -4197,6 +4197,7 @@ static void nvme_alloc_ns(struct nvme_ctrl *ctrl, struct nvme_ns_info *info)
>>  		mutex_unlock(&ctrl->namespaces_lock);
>>  		goto out_unlink_ns;
>>  	}
>> +	blk_queue_rq_timeout(ns->queue, ctrl->io_timeout);
>>  	nvme_ns_add_to_ctrl_list(ns);
>>  	mutex_unlock(&ctrl->namespaces_lock);
>>  	synchronize_srcu(&ctrl->srcu);
>> @@ -5136,6 +5137,7 @@ int nvme_init_ctrl(struct nvme_ctrl *ctrl, struct device *dev,
>>  	ctrl->ka_cmd.common.opcode = nvme_admin_keep_alive;
>>  	ctrl->ka_last_check_time = jiffies;
>>  	ctrl->admin_timeout = NVME_ADMIN_TIMEOUT;
>> +	ctrl->io_timeout = NVME_IO_TIMEOUT;
>
> Should this be initialized with the defaults from the module load time?


Yes, note that NVME_IO_TIMEOUT is a macro that depends on the
"nvme_io_timeout" module parameter set at load time.

Maurizio



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH V3 4/8] nvme: add sysfs attribute to change IO timeout per nvme controller
  2026-04-27  9:13     ` Maurizio Lombardi
@ 2026-04-27  9:19       ` Daniel Wagner
  0 siblings, 0 replies; 38+ messages in thread
From: Daniel Wagner @ 2026-04-27  9:19 UTC (permalink / raw)
  To: Maurizio Lombardi
  Cc: Maurizio Lombardi, kbusch, mheyne, emilne, jmeneghi, linux-nvme,
	mkhalfella, chaitanyak, hare, hch

On Mon, Apr 27, 2026 at 11:13:43AM +0200, Maurizio Lombardi wrote:
> >> @@ -5136,6 +5137,7 @@ int nvme_init_ctrl(struct nvme_ctrl *ctrl, struct device *dev,
> >>  	ctrl->ka_cmd.common.opcode = nvme_admin_keep_alive;
> >>  	ctrl->ka_last_check_time = jiffies;
> >>  	ctrl->admin_timeout = NVME_ADMIN_TIMEOUT;
> >> +	ctrl->io_timeout = NVME_IO_TIMEOUT;
> >
> > Should this be initialized with the defaults from the module load time?
> 
> 
> Yes, note that NVME_IO_TIMEOUT is a macro that depends on the
> "nvme_io_timeout" module parameter set at load time.

Ah yes, didn't read the definition of NVME_IO_TIMEOUT. Sorry for the
noise.

The rest looks good to me.

Reviewed-by: Daniel Wagner <dwagner@suse.de>


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH V3 5/8] nvme: use per controller timeout waits over depending on global default
  2026-04-10  7:39 ` [PATCH V3 5/8] nvme: use per controller timeout waits over depending on global default Maurizio Lombardi
@ 2026-04-27  9:22   ` Daniel Wagner
  0 siblings, 0 replies; 38+ messages in thread
From: Daniel Wagner @ 2026-04-27  9:22 UTC (permalink / raw)
  To: Maurizio Lombardi
  Cc: kbusch, mheyne, emilne, jmeneghi, linux-nvme, mlombard,
	mkhalfella, chaitanyak, hare, hch

On Fri, Apr 10, 2026 at 09:39:21AM +0200, Maurizio Lombardi wrote:
> @@ -5253,7 +5254,7 @@ int nvme_wait_freeze_timeout(struct nvme_ctrl *ctrl, long timeout)
>  	list_for_each_entry_srcu(ns, &ctrl->namespaces, list,
>  				 srcu_read_lock_held(&ctrl->srcu)) {
>  		timeout = blk_mq_freeze_queue_wait_timeout(ns->queue, timeout);
> -		if (timeout <= 0)
> +		if (!timeout)
>  			break;

I'd suggest dropping this change, as it doesn't really belong in this
patch.

Rest looks good.

Reviewed-by: Daniel Wagner <dwagner@suse.de>


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH V3 6/8] nvme-core: align fabrics_q teardown with admin_q in nvme_free_ctrl
  2026-04-10  7:39 ` [PATCH V3 6/8] nvme-core: align fabrics_q teardown with admin_q in nvme_free_ctrl Maurizio Lombardi
@ 2026-04-27  9:35   ` Daniel Wagner
  0 siblings, 0 replies; 38+ messages in thread
From: Daniel Wagner @ 2026-04-27  9:35 UTC (permalink / raw)
  To: Maurizio Lombardi
  Cc: kbusch, mheyne, emilne, jmeneghi, linux-nvme, mlombard,
	mkhalfella, chaitanyak, hare, hch

On Fri, Apr 10, 2026 at 09:39:22AM +0200, Maurizio Lombardi wrote:
> Currently, the final reference for the fabrics admin queue (fabrics_q)
> is dropped inside nvme_remove_admin_tag_set(). However, the primary
> admin queue (admin_q) defers dropping its final reference until
> nvme_free_ctrl().
> 
> Move the blk_put_queue() call for fabrics_q from nvme_remove_admin_tag_set()
> to nvme_free_ctrl(). This aligns the lifecycle management of both admin
> queues, ensuring they are freed symmetrically when the controller is finally
> torn down.

Looks good to me.

Reviewed-by: Daniel Wagner <dwagner@suse.de>


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH V3 7/8] nvmet-loop: do not alloc admin tag set during reset
  2026-04-10  7:39 ` [PATCH V3 7/8] nvmet-loop: do not alloc admin tag set during reset Maurizio Lombardi
@ 2026-04-27  9:46   ` Daniel Wagner
  2026-04-27  9:47     ` Maurizio Lombardi
  2026-04-28 11:17     ` Maurizio Lombardi
  0 siblings, 2 replies; 38+ messages in thread
From: Daniel Wagner @ 2026-04-27  9:46 UTC (permalink / raw)
  To: Maurizio Lombardi
  Cc: kbusch, mheyne, emilne, jmeneghi, linux-nvme, mlombard,
	mkhalfella, chaitanyak, hare, hch

On Fri, Apr 10, 2026 at 09:39:23AM +0200, Maurizio Lombardi wrote:
> @@ -375,12 +376,15 @@ static int nvme_loop_configure_admin_queue(struct nvme_loop_ctrl *ctrl)
>  	}
>  	ctrl->ctrl.queue_count = 1;
>  
> -	error = nvme_alloc_admin_tag_set(&ctrl->ctrl, &ctrl->admin_tag_set,
> -			&nvme_loop_admin_mq_ops,
> -			sizeof(struct nvme_loop_iod) +
> -			NVME_INLINE_SG_CNT * sizeof(struct scatterlist));
> -	if (error)
> -		goto out_free_sq;
> +	if (new) {
> +		error = nvme_alloc_admin_tag_set(&ctrl->ctrl,
> +				&ctrl->admin_tag_set,
> +				&nvme_loop_admin_mq_ops,
> +				sizeof(struct nvme_loop_iod) +
> +				NVME_INLINE_SG_CNT * sizeof(struct scatterlist));
> +		if (error)
> +			goto out_free_sq;
> +	}
>  
>  	/* reset stopped state for the fresh admin queue */
>  	clear_bit(NVME_CTRL_ADMIN_Q_STOPPED, &ctrl->ctrl.flags);
> @@ -415,7 +419,7 @@ static int nvme_loop_configure_admin_queue(struct nvme_loop_ctrl *ctrl)
>  	return error;
>  }

Personally, I am just not a big fan of conditional creation/removal of
resources controlled via a bool. If it's the cleanest solution, fine with
me, but I wonder if it wouldn't be cleaner here to allocate the tagset
outside of nvme_loop_configure_admin_queue, and obviously the same idea
for destroy.

Have you tried this approach? Does it look too ugly?


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH V3 7/8] nvmet-loop: do not alloc admin tag set during reset
  2026-04-27  9:46   ` Daniel Wagner
@ 2026-04-27  9:47     ` Maurizio Lombardi
  2026-04-28 11:17     ` Maurizio Lombardi
  1 sibling, 0 replies; 38+ messages in thread
From: Maurizio Lombardi @ 2026-04-27  9:47 UTC (permalink / raw)
  To: Daniel Wagner, Maurizio Lombardi
  Cc: kbusch, mheyne, emilne, jmeneghi, linux-nvme, mlombard,
	mkhalfella, chaitanyak, hare, hch

On Mon Apr 27, 2026 at 11:46 AM CEST, Daniel Wagner wrote:
> On Fri, Apr 10, 2026 at 09:39:23AM +0200, Maurizio Lombardi wrote:
>> @@ -375,12 +376,15 @@ static int nvme_loop_configure_admin_queue(struct nvme_loop_ctrl *ctrl)
>>  	}
>>  	ctrl->ctrl.queue_count = 1;
>>  
>> -	error = nvme_alloc_admin_tag_set(&ctrl->ctrl, &ctrl->admin_tag_set,
>> -			&nvme_loop_admin_mq_ops,
>> -			sizeof(struct nvme_loop_iod) +
>> -			NVME_INLINE_SG_CNT * sizeof(struct scatterlist));
>> -	if (error)
>> -		goto out_free_sq;
>> +	if (new) {
>> +		error = nvme_alloc_admin_tag_set(&ctrl->ctrl,
>> +				&ctrl->admin_tag_set,
>> +				&nvme_loop_admin_mq_ops,
>> +				sizeof(struct nvme_loop_iod) +
>> +				NVME_INLINE_SG_CNT * sizeof(struct scatterlist));
>> +		if (error)
>> +			goto out_free_sq;
>> +	}
>>  
>>  	/* reset stopped state for the fresh admin queue */
>>  	clear_bit(NVME_CTRL_ADMIN_Q_STOPPED, &ctrl->ctrl.flags);
>> @@ -415,7 +419,7 @@ static int nvme_loop_configure_admin_queue(struct nvme_loop_ctrl *ctrl)
>>  	return error;
>>  }
>
> Personally, I am just not a big fan of conditional creation/removal of
> resources controlled via a bool. If it's the cleanest solution, fine with
> me, but I wonder if it wouldn't be cleaner here to allocate the tagset
> outside of nvme_loop_configure_admin_queue, and obviously the same idea
> for destroy.
>
> Have you tried this approach? Does it look too ugly?

I have not tried, but I will look for an alternative approach before
submitting the next version and I will update you.

Thanks,

Maurizio



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH V3 8/8] nvme-core: warn on allocating admin tag set with existing queue
  2026-04-10  7:39 ` [PATCH V3 8/8] nvme-core: warn on allocating admin tag set with existing queue Maurizio Lombardi
  2026-04-24 13:40   ` Christoph Hellwig
@ 2026-04-27  9:54   ` Daniel Wagner
  1 sibling, 0 replies; 38+ messages in thread
From: Daniel Wagner @ 2026-04-27  9:54 UTC (permalink / raw)
  To: Maurizio Lombardi
  Cc: kbusch, mheyne, emilne, jmeneghi, linux-nvme, mlombard,
	mkhalfella, chaitanyak, hare, hch

On Fri, Apr 10, 2026 at 09:39:24AM +0200, Maurizio Lombardi wrote:
> Currently, nvme_alloc_admin_tag_set() silently drops and releases
> the existing admin_q if it is called on a controller that already
> has one (e.g., during a controller reset).
> 
> However, transport drivers should not be reallocating the admin tag
> set and queue during a reset. Dropping the old queue and allocating
> a new one destroys user-configured timeouts and may race against
> nvme_admin_timeout_store()
> 
> Since all transport drivers are now expected to preserve the admin queue
> across resets, calling nvme_alloc_admin_tag_set() when ctrl->admin_q
> is already populated is a bug.
> 
> Remove the silent cleanup and replace it with a WARN_ON_ONCE() to
> explicitly catch any transport drivers that violate this lifecycle rule.

Looks good to me.

Reviewed-by: Daniel Wagner <dwagner@suse.de>
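
The patch body is not quoted in this message; presumably the check
described above takes roughly this shape (a sketch only, where the
exact placement and return value are assumptions):

	/* Reject reallocation while an admin queue already exists,
	 * instead of silently tearing the old one down. */
	if (WARN_ON_ONCE(ctrl->admin_q))
		return -EINVAL;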


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH V3 7/8] nvmet-loop: do not alloc admin tag set during reset
  2026-04-27  9:46   ` Daniel Wagner
  2026-04-27  9:47     ` Maurizio Lombardi
@ 2026-04-28 11:17     ` Maurizio Lombardi
  2026-04-28 12:46       ` Daniel Wagner
  2026-05-07 16:23       ` Daniel Wagner
  1 sibling, 2 replies; 38+ messages in thread
From: Maurizio Lombardi @ 2026-04-28 11:17 UTC (permalink / raw)
  To: Daniel Wagner, Maurizio Lombardi
  Cc: kbusch, mheyne, emilne, jmeneghi, linux-nvme, mlombard,
	mkhalfella, chaitanyak, hare, hch

On Mon Apr 27, 2026 at 11:46 AM CEST, Daniel Wagner wrote:
> On Fri, Apr 10, 2026 at 09:39:23AM +0200, Maurizio Lombardi wrote:
>> @@ -375,12 +376,15 @@ static int nvme_loop_configure_admin_queue(struct nvme_loop_ctrl *ctrl)
>>  	}
>>  	ctrl->ctrl.queue_count = 1;
>>  
>> -	error = nvme_alloc_admin_tag_set(&ctrl->ctrl, &ctrl->admin_tag_set,
>> -			&nvme_loop_admin_mq_ops,
>> -			sizeof(struct nvme_loop_iod) +
>> -			NVME_INLINE_SG_CNT * sizeof(struct scatterlist));
>> -	if (error)
>> -		goto out_free_sq;
>> +	if (new) {
>> +		error = nvme_alloc_admin_tag_set(&ctrl->ctrl,
>> +				&ctrl->admin_tag_set,
>> +				&nvme_loop_admin_mq_ops,
>> +				sizeof(struct nvme_loop_iod) +
>> +				NVME_INLINE_SG_CNT * sizeof(struct scatterlist));
>> +		if (error)
>> +			goto out_free_sq;
>> +	}
>>  
>>  	/* reset stopped state for the fresh admin queue */
>>  	clear_bit(NVME_CTRL_ADMIN_Q_STOPPED, &ctrl->ctrl.flags);
>> @@ -415,7 +419,7 @@ static int nvme_loop_configure_admin_queue(struct nvme_loop_ctrl *ctrl)
>>  	return error;
>>  }
>
> Personally, I am just not a big fan of conditional creation/removal of
> resources controlled via a bool. If it's the cleanest solution, fine with
> me, but I wonder if it wouldn't be cleaner here to allocate the tagset
> outside of nvme_loop_configure_admin_queue, and obviously the same idea
> for destroy.
>
> Have you tried this approach? Does it look too ugly?


It would be something like this (just a compile-tested draft,
I still need to properly test it, but it should give you an idea):


nvmet-loop: do not alloc admin tag set during reset

Currently, resetting a loopback controller unconditionally invokes
nvme_alloc_admin_tag_set() inside nvme_loop_configure_admin_queue().
Doing so drops the old queue and allocates a new one. Consequently,
this reverts the admin queue's timeout (q->rq_timeout) back to the
module default (NVME_ADMIN_TIMEOUT), completely wiping out any custom
timeout values the user may have configured via sysfs and potentially
racing against the sysfs nvme_admin_timeout_store() function
that may dereference the admin_q pointer during the RESETTING state.

Decouple the admin tag set lifecycle from the admin queue
configuration and destruction paths, which are executed during resets.
Specifically:

* Move nvme_alloc_admin_tag_set() into nvme_loop_create_ctrl() so it
  is only allocated once during the initial controller creation.

* Defer the destruction of the admin tag set to
  nvme_loop_delete_ctrl_host() and to the terminal
  error-handling paths of nvme_loop_reset_ctrl_work() and
  nvme_loop_create_ctrl().

Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
---
 drivers/nvme/target/loop.c | 31 ++++++++++++++++---------------
 1 file changed, 16 insertions(+), 15 deletions(-)

diff --git a/drivers/nvme/target/loop.c b/drivers/nvme/target/loop.c
index d98d0cdc5d6f..070d16068e6b 100644
--- a/drivers/nvme/target/loop.c
+++ b/drivers/nvme/target/loop.c
@@ -274,7 +274,6 @@ static void nvme_loop_destroy_admin_queue(struct nvme_loop_ctrl *ctrl)

 	nvmet_sq_destroy(&ctrl->queues[0].nvme_sq);
 	nvmet_cq_put(&ctrl->queues[0].nvme_cq);
-	nvme_remove_admin_tag_set(&ctrl->ctrl);
 }

 static void nvme_loop_free_ctrl(struct nvme_ctrl *nctrl)
@@ -375,25 +374,18 @@ static int nvme_loop_configure_admin_queue(struct nvme_loop_ctrl *ctrl)
 	}
 	ctrl->ctrl.queue_count = 1;

-	error = nvme_alloc_admin_tag_set(&ctrl->ctrl, &ctrl->admin_tag_set,
-			&nvme_loop_admin_mq_ops,
-			sizeof(struct nvme_loop_iod) +
-			NVME_INLINE_SG_CNT * sizeof(struct scatterlist));
-	if (error)
-		goto out_free_sq;
-
 	/* reset stopped state for the fresh admin queue */
 	clear_bit(NVME_CTRL_ADMIN_Q_STOPPED, &ctrl->ctrl.flags);

 	error = nvmf_connect_admin_queue(&ctrl->ctrl);
 	if (error)
-		goto out_cleanup_tagset;
+		goto out_free_sq;

 	set_bit(NVME_LOOP_Q_LIVE, &ctrl->queues[0].flags);

 	error = nvme_enable_ctrl(&ctrl->ctrl);
 	if (error)
-		goto out_cleanup_tagset;
+		goto out_free_sq;

 	ctrl->ctrl.max_hw_sectors =
 		(NVME_LOOP_MAX_SEGMENTS - 1) << PAGE_SECTORS_SHIFT;
@@ -402,14 +394,12 @@ static int nvme_loop_configure_admin_queue(struct nvme_loop_ctrl *ctrl)

 	error = nvme_init_ctrl_finish(&ctrl->ctrl, false);
 	if (error)
-		goto out_cleanup_tagset;
+		goto out_free_sq;

 	return 0;

-out_cleanup_tagset:
-	clear_bit(NVME_LOOP_Q_LIVE, &ctrl->queues[0].flags);
-	nvme_remove_admin_tag_set(&ctrl->ctrl);
 out_free_sq:
+	clear_bit(NVME_LOOP_Q_LIVE, &ctrl->queues[0].flags);
 	nvmet_sq_destroy(&ctrl->queues[0].nvme_sq);
 	nvmet_cq_put(&ctrl->queues[0].nvme_cq);
 	return error;
@@ -432,6 +422,7 @@ static void nvme_loop_shutdown_ctrl(struct nvme_loop_ctrl *ctrl)
 static void nvme_loop_delete_ctrl_host(struct nvme_ctrl *ctrl)
 {
 	nvme_loop_shutdown_ctrl(to_loop_ctrl(ctrl));
+	nvme_remove_admin_tag_set(ctrl);
 }

 static void nvme_loop_delete_ctrl(struct nvmet_ctrl *nctrl)
@@ -494,6 +485,7 @@ static void nvme_loop_reset_ctrl_work(struct work_struct *work)
 	nvme_cancel_admin_tagset(&ctrl->ctrl);
 	nvme_loop_destroy_admin_queue(ctrl);
 out_disable:
+	nvme_remove_admin_tag_set(&ctrl->ctrl);
 	dev_warn(ctrl->ctrl.device, "Removing after reset failure\n");
 	nvme_uninit_ctrl(&ctrl->ctrl);
 }
@@ -594,10 +586,17 @@ static struct nvme_ctrl *nvme_loop_create_ctrl(struct device *dev,
 	if (!ctrl->queues)
 		goto out_uninit_ctrl;

-	ret = nvme_loop_configure_admin_queue(ctrl);
+	ret = nvme_alloc_admin_tag_set(&ctrl->ctrl, &ctrl->admin_tag_set,
+			&nvme_loop_admin_mq_ops,
+			sizeof(struct nvme_loop_iod) +
+			NVME_INLINE_SG_CNT * sizeof(struct scatterlist));
 	if (ret)
 		goto out_free_queues;

+	ret = nvme_loop_configure_admin_queue(ctrl);
+	if (ret)
+		goto out_remove_admin_tagset;
+
 	if (opts->queue_size > ctrl->ctrl.maxcmd) {
 		/* warn if maxcmd is lower than queue_size */
 		dev_warn(ctrl->ctrl.device,
@@ -633,6 +632,8 @@ static struct nvme_ctrl *nvme_loop_create_ctrl(struct device *dev,
 	nvme_quiesce_admin_queue(&ctrl->ctrl);
 	nvme_cancel_admin_tagset(&ctrl->ctrl);
 	nvme_loop_destroy_admin_queue(ctrl);
+out_remove_admin_tagset:
+	nvme_remove_admin_tag_set(&ctrl->ctrl);
 out_free_queues:
 	kfree(ctrl->queues);
 out_uninit_ctrl:
--
2.53.0



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* Re: [PATCH V3 7/8] nvmet-loop: do not alloc admin tag set during reset
  2026-04-28 11:17     ` Maurizio Lombardi
@ 2026-04-28 12:46       ` Daniel Wagner
  2026-05-07 16:23       ` Daniel Wagner
  1 sibling, 0 replies; 38+ messages in thread
From: Daniel Wagner @ 2026-04-28 12:46 UTC (permalink / raw)
  To: Maurizio Lombardi
  Cc: Maurizio Lombardi, kbusch, mheyne, emilne, jmeneghi, linux-nvme,
	mkhalfella, chaitanyak, hare, hch

On Tue, Apr 28, 2026 at 01:17:17PM +0200, Maurizio Lombardi wrote:
> It would be something like this (just a compile-tested draft,
> I still need to properly test it, but it should give you an idea):

This looks way better to me. Thanks!


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH V3 4/8] nvme: add sysfs attribute to change IO timeout per nvme controller
  2026-04-10  7:39 ` [PATCH V3 4/8] nvme: add sysfs attribute to change IO timeout per nvme controller Maurizio Lombardi
  2026-04-24 13:40   ` Christoph Hellwig
  2026-04-27  9:11   ` Daniel Wagner
@ 2026-04-28 17:23   ` Mohamed Khalfella
  2026-04-29 12:06     ` Maurizio Lombardi
  2 siblings, 1 reply; 38+ messages in thread
From: Mohamed Khalfella @ 2026-04-28 17:23 UTC (permalink / raw)
  To: Maurizio Lombardi
  Cc: kbusch, mheyne, emilne, jmeneghi, linux-nvme, dwagner, mlombard,
	chaitanyak, hare, hch

On Fri 2026-04-10 09:39:20 +0200, Maurizio Lombardi wrote:
> Currently, there is no method to adjust the timeout values on a
> per-controller basis for nvme I/O queues.
> Add an io_timeout attribute to nvme so that different nvme controllers,
> which may have different timeout requirements, can have custom
> I/O timeouts set.
> 
> Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
Reviewed-by: Mohamed Khalfella <mkhalfella@purestorage.com>

> diff --git a/drivers/nvme/host/sysfs.c b/drivers/nvme/host/sysfs.c
> index 000c29bd1f1f..1e6a0eecb30e 100644
> --- a/drivers/nvme/host/sysfs.c
> +++ b/drivers/nvme/host/sysfs.c
> @@ -664,6 +664,52 @@ static ssize_t nvme_admin_timeout_store(struct device *dev,
>  static DEVICE_ATTR(admin_timeout, S_IRUGO | S_IWUSR,
>  	nvme_admin_timeout_show, nvme_admin_timeout_store);
>  
> +static ssize_t nvme_io_timeout_show(struct device *dev,
> +			struct device_attribute *attr, char *buf)
> +{
> +	struct nvme_ctrl *ctrl = dev_get_drvdata(dev);
> +
> +	return sysfs_emit(buf, "%u\n", jiffies_to_msecs(ctrl->io_timeout));
> +}
> +
> +static ssize_t nvme_io_timeout_store(struct device *dev,
> +			struct device_attribute *attr,
> +			const char *buf, size_t count)
> +{
> +	struct nvme_ctrl *ctrl = dev_get_drvdata(dev);
> +	struct nvme_ns *ns;
> +	u32 timeout;
> +	int err;
> +
> +	/*
> +	 * Wait until the controller reaches the LIVE state
> +	 * to be sure that connect_q is properly initialized.
> +	 */
> +	if (!test_bit(NVME_CTRL_STARTED_ONCE, &ctrl->flags))
> +		return -EBUSY;
> +
> +	err = kstrtou32(buf, 10, &timeout);
> +	if (err || !timeout)
> +		return -EINVAL;
> +
> +	/* Take the namespaces_lock to avoid racing against nvme_alloc_ns() */
> +	mutex_lock(&ctrl->namespaces_lock);
> +
> +	ctrl->io_timeout = msecs_to_jiffies(timeout);
> +	list_for_each_entry(ns, &ctrl->namespaces, list)
> +		blk_queue_rq_timeout(ns->queue, ctrl->io_timeout);
> +
> +	if (ctrl->connect_q)
> +		blk_queue_rq_timeout(ctrl->connect_q, ctrl->io_timeout);
> +
> +	mutex_unlock(&ctrl->namespaces_lock);

ctrl->namespaces_lock is not needed to set the timeout for ctrl->connect_q.
Maybe move mutex_unlock() up to just after iterating over the namespaces?
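
A sketch of that narrowing, using the same calls as the quoted hunk
with only the unlock moved (whether connect_q needs any other
serialization here is left as in the original patch):

	mutex_lock(&ctrl->namespaces_lock);
	ctrl->io_timeout = msecs_to_jiffies(timeout);
	list_for_each_entry(ns, &ctrl->namespaces, list)
		blk_queue_rq_timeout(ns->queue, ctrl->io_timeout);
	mutex_unlock(&ctrl->namespaces_lock);

	if (ctrl->connect_q)
		blk_queue_rq_timeout(ctrl->connect_q, ctrl->io_timeout);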


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH V3 4/8] nvme: add sysfs attribute to change IO timeout per nvme controller
  2026-04-28 17:23   ` Mohamed Khalfella
@ 2026-04-29 12:06     ` Maurizio Lombardi
  0 siblings, 0 replies; 38+ messages in thread
From: Maurizio Lombardi @ 2026-04-29 12:06 UTC (permalink / raw)
  To: Mohamed Khalfella, Maurizio Lombardi
  Cc: kbusch, mheyne, emilne, jmeneghi, linux-nvme, dwagner, mlombard,
	chaitanyak, hare, hch

On Tue Apr 28, 2026 at 7:23 PM CEST, Mohamed Khalfella wrote:
> On Fri 2026-04-10 09:39:20 +0200, Maurizio Lombardi wrote:
>> Currently, there is no method to adjust the timeout values on a
>> per-controller basis for nvme I/O queues.
>> Add an io_timeout attribute to nvme so that different nvme controllers,
>> which may have different timeout requirements, can have custom
>> I/O timeouts set.
>> 
>> Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
> Reviewed-by: Mohamed Khalfella <mkhalfella@purestorage.com>
>
>> diff --git a/drivers/nvme/host/sysfs.c b/drivers/nvme/host/sysfs.c
>> index 000c29bd1f1f..1e6a0eecb30e 100644
>> --- a/drivers/nvme/host/sysfs.c
>> +++ b/drivers/nvme/host/sysfs.c
>> @@ -664,6 +664,52 @@ static ssize_t nvme_admin_timeout_store(struct device *dev,
>>  static DEVICE_ATTR(admin_timeout, S_IRUGO | S_IWUSR,
>>  	nvme_admin_timeout_show, nvme_admin_timeout_store);
>>  
>> +static ssize_t nvme_io_timeout_show(struct device *dev,
>> +			struct device_attribute *attr, char *buf)
>> +{
>> +	struct nvme_ctrl *ctrl = dev_get_drvdata(dev);
>> +
>> +	return sysfs_emit(buf, "%u\n", jiffies_to_msecs(ctrl->io_timeout));
>> +}
>> +
>> +static ssize_t nvme_io_timeout_store(struct device *dev,
>> +			struct device_attribute *attr,
>> +			const char *buf, size_t count)
>> +{
>> +	struct nvme_ctrl *ctrl = dev_get_drvdata(dev);
>> +	struct nvme_ns *ns;
>> +	u32 timeout;
>> +	int err;
>> +
>> +	/*
>> +	 * Wait until the controller reaches the LIVE state
>> +	 * to be sure that connect_q is properly initialized.
>> +	 */
>> +	if (!test_bit(NVME_CTRL_STARTED_ONCE, &ctrl->flags))
>> +		return -EBUSY;
>> +
>> +	err = kstrtou32(buf, 10, &timeout);
>> +	if (err || !timeout)
>> +		return -EINVAL;
>> +
>> +	/* Take the namespaces_lock to avoid racing against nvme_alloc_ns() */
>> +	mutex_lock(&ctrl->namespaces_lock);
>> +
>> +	ctrl->io_timeout = msecs_to_jiffies(timeout);
>> +	list_for_each_entry(ns, &ctrl->namespaces, list)
>> +		blk_queue_rq_timeout(ns->queue, ctrl->io_timeout);
>> +
>> +	if (ctrl->connect_q)
>> +		blk_queue_rq_timeout(ctrl->connect_q, ctrl->io_timeout);
>> +
>> +	mutex_unlock(&ctrl->namespaces_lock);
>
> ctrl->namespaces_lock is not needed to set the timeout for ctrl->connect_q.
> Maybe move mutex_unlock() up to just after iterating over the namespaces?

Yes, it can be unlocked just after scanning the namespace list.

Thanks,
Maurizio



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH V3 7/8] nvmet-loop: do not alloc admin tag set during reset
  2026-04-28 11:17     ` Maurizio Lombardi
  2026-04-28 12:46       ` Daniel Wagner
@ 2026-05-07 16:23       ` Daniel Wagner
  2026-05-08  6:14         ` Maurizio Lombardi
  1 sibling, 1 reply; 38+ messages in thread
From: Daniel Wagner @ 2026-05-07 16:23 UTC (permalink / raw)
  To: Maurizio Lombardi, Maurizio Lombardi
  Cc: kbusch, mheyne, emilne, jmeneghi, linux-nvme, mkhalfella,
	chaitanyak, hare, hch

Hi Maurizio,

On 4/28/26 1:17 PM, Maurizio Lombardi wrote:
> It would be something like this (just a compile-tested draft,
> I still need to properly test it, but it should give you an idea):
> 
> 
> nvmet-loop: do not alloc admin tag set during reset
> 
> Currently, resetting a loopback controller unconditionally invokes
> nvme_alloc_admin_tag_set() inside nvme_loop_configure_admin_queue().
> Doing so drops the old queue and allocates a new one. Consequently,
> this reverts the admin queue's timeout (q->rq_timeout) back to the
> module default (NVME_ADMIN_TIMEOUT), completely wiping out any custom
> timeout values the user may have configured via sysfs and potentially
> racing against the sysfs nvme_admin_timeout_store() function
> that may dereference the admin_q pointer during the RESETTING state.
> 
> Decouple the admin tag set lifecycle from the admin queue
> configuration and destruction paths, which are executed during resets.
> Specifically:
> 
> * Move nvme_alloc_admin_tag_set() into nvme_loop_create_ctrl() so it
>    is only allocated once during the initial controller creation.
> 
> * Defer the destruction of the admin tag set to
>    nvme_loop_delete_ctrl_host() and to the terminal
>    error-handling paths of nvme_loop_reset_ctrl_work() and
>    nvme_loop_create_ctrl().

This looks way better IMO. So yes, please use this version.

Thanks,
Daniel


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH V3 7/8] nvmet-loop: do not alloc admin tag set during reset
  2026-05-07 16:23       ` Daniel Wagner
@ 2026-05-08  6:14         ` Maurizio Lombardi
  0 siblings, 0 replies; 38+ messages in thread
From: Maurizio Lombardi @ 2026-05-08  6:14 UTC (permalink / raw)
  To: Daniel Wagner, Maurizio Lombardi, Maurizio Lombardi
  Cc: kbusch, mheyne, emilne, jmeneghi, linux-nvme, mkhalfella,
	chaitanyak, hare, hch

On Thu May 7, 2026 at 6:23 PM CEST, Daniel Wagner wrote:
> Hi Maurizio,
>
> On 4/28/26 1:17 PM, Maurizio Lombardi wrote:
>> It would be something like this (just a compile-tested draft,
>> I still need to properly test it, but it should give you an idea):
>> 
>> 
>> nvmet-loop: do not alloc admin tag set during reset
>> 
>> Currently, resetting a loopback controller unconditionally invokes
>> nvme_alloc_admin_tag_set() inside nvme_loop_configure_admin_queue().
>> Doing so drops the old queue and allocates a new one. Consequently,
>> this reverts the admin queue's timeout (q->rq_timeout) back to the
>> module default (NVME_ADMIN_TIMEOUT), completely wiping out any custom
>> timeout values the user may have configured via sysfs and potentially
>> racing against the sysfs nvme_admin_timeout_store() function
>> that may dereference the admin_q pointer during the RESETTING state.
>> 
>> Decouple the admin tag set lifecycle from the admin queue
>> configuration and destruction paths, which are executed during resets.
>> Specifically:
>> 
>> * Move nvme_alloc_admin_tag_set() into nvme_loop_create_ctrl() so it
>>    is only allocated once during the initial controller creation.
>> 
>> * Defer the destruction of the admin tag set to
>>    nvme_loop_delete_ctrl_host() and to the terminal
>>    error-handling paths of nvme_loop_reset_ctrl_work() and
>>    nvme_loop_create_ctrl().
>
> This looks way better IMO. So yes please, add use this version.

Sure, I will post a new version very soon.

Maurizio


^ permalink raw reply	[flat|nested] 38+ messages in thread

end of thread, other threads:[~2026-05-08  6:14 UTC | newest]

Thread overview: 38+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-04-10  7:39 [PATCH V3 0/8] nvme: Refactor and expose per-controller timeout configuration Maurizio Lombardi
2026-04-10  7:39 ` [PATCH V3 1/8] nvme: Let the blocklayer set timeouts for requests Maurizio Lombardi
2026-04-22  9:58   ` Daniel Wagner
2026-04-24 13:38   ` Christoph Hellwig
2026-04-10  7:39 ` [PATCH V3 2/8] nvme: add sysfs attribute to change admin timeout per nvme controller Maurizio Lombardi
2026-04-22 10:10   ` Daniel Wagner
2026-04-22 11:08     ` Maurizio Lombardi
2026-04-23 12:46       ` Daniel Wagner
2026-04-23 12:58         ` Maurizio Lombardi
2026-04-23 17:04           ` Daniel Wagner
2026-04-23 18:35             ` Maurizio Lombardi
2026-04-23 13:45         ` Hannes Reinecke
2026-04-10  7:39 ` [PATCH V3 3/8] nvme: pci: use admin queue timeout over NVME_ADMIN_TIMEOUT Maurizio Lombardi
2026-04-27  9:05   ` Daniel Wagner
2026-04-10  7:39 ` [PATCH V3 4/8] nvme: add sysfs attribute to change IO timeout per nvme controller Maurizio Lombardi
2026-04-24 13:40   ` Christoph Hellwig
2026-04-27  9:11   ` Daniel Wagner
2026-04-27  9:13     ` Maurizio Lombardi
2026-04-27  9:19       ` Daniel Wagner
2026-04-28 17:23   ` Mohamed Khalfella
2026-04-29 12:06     ` Maurizio Lombardi
2026-04-10  7:39 ` [PATCH V3 5/8] nvme: use per controller timeout waits over depending on global default Maurizio Lombardi
2026-04-27  9:22   ` Daniel Wagner
2026-04-10  7:39 ` [PATCH V3 6/8] nvme-core: align fabrics_q teardown with admin_q in nvme_free_ctrl Maurizio Lombardi
2026-04-27  9:35   ` Daniel Wagner
2026-04-10  7:39 ` [PATCH V3 7/8] nvmet-loop: do not alloc admin tag set during reset Maurizio Lombardi
2026-04-27  9:46   ` Daniel Wagner
2026-04-27  9:47     ` Maurizio Lombardi
2026-04-28 11:17     ` Maurizio Lombardi
2026-04-28 12:46       ` Daniel Wagner
2026-05-07 16:23       ` Daniel Wagner
2026-05-08  6:14         ` Maurizio Lombardi
2026-04-10  7:39 ` [PATCH V3 8/8] nvme-core: warn on allocating admin tag set with existing queue Maurizio Lombardi
2026-04-24 13:40   ` Christoph Hellwig
2026-04-27  9:54   ` Daniel Wagner
2026-04-13  8:12 ` [PATCH V3 0/8] nvme: Refactor and expose per-controller timeout configuration Hannes Reinecke
2026-04-13  9:21   ` Maurizio Lombardi
2026-04-14 19:14     ` John Meneghini
