* [PATCH V3 1/8] nvme: Let the blocklayer set timeouts for requests
2026-04-10 7:39 [PATCH V3 0/8] nvme: Refactor and expose per-controller timeout configuration Maurizio Lombardi
@ 2026-04-10 7:39 ` Maurizio Lombardi
2026-04-10 7:39 ` [PATCH V3 2/8] nvme: add sysfs attribute to change admin timeout per nvme controller Maurizio Lombardi
` (6 subsequent siblings)
7 siblings, 0 replies; 9+ messages in thread
From: Maurizio Lombardi @ 2026-04-10 7:39 UTC (permalink / raw)
To: kbusch
Cc: mheyne, emilne, jmeneghi, linux-nvme, dwagner, mlombard,
mkhalfella, chaitanyak, hare, hch
From: "Heyne, Maximilian" <mheyne@amazon.de>
When initializing an nvme request that is about to be sent to the block
layer, we do not need to initialize its timeout. If it is left at 0, the
block layer will use the request queue's timeout in blk_add_timer (via
nvme_start_request, which is called from nvme_*_queue_rq). These
timeouts were set up to either NVME_IO_TIMEOUT or NVME_ADMIN_TIMEOUT
when the request queues were created.
Because the io_timeout of the IO queues can be modified via sysfs, the
following situation can occur:
1) NVME_IO_TIMEOUT = 30 (default module parameter)
2) nvme1n1 is probed. The IO queues' default timeout is 30 s
3) manually change the IO timeout to 90 s
echo 90000 > /sys/class/nvme/nvme1/nvme1n1/queue/io_timeout
4) Any call of __submit_sync_cmd on nvme1n1 to an IO queue will issue
commands with the 30 s timeout instead of the desired 90 s, which might
be more suitable for this device.
Commit 470e900c8036 ("nvme: refactor nvme_alloc_request") already
silently changed the behavior for ioctls because it unconditionally
overrides the request's timeout that was set in nvme_init_request. If it
was left unset by the user of the ioctl, it will be overridden with 0,
meaning the block layer will pick the request queue's IO timeout.
Following up on that, this patch further improves the consistency of IO
timeout usage. However, there are still uses of NVME_IO_TIMEOUT which
could be inconsistent with what is set in the device's request_queue by
the user.
Reviewed-by: Mohamed Khalfella <mkhalfella@purestorage.com>
Signed-off-by: Maximilian Heyne <mheyne@amazon.de>
---
drivers/nvme/host/core.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index b42d8768d297..111f9a226952 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -724,10 +724,8 @@ void nvme_init_request(struct request *req, struct nvme_command *cmd)
struct nvme_ns *ns = req->q->disk->private_data;
logging_enabled = ns->head->passthru_err_log_enabled;
- req->timeout = NVME_IO_TIMEOUT;
} else { /* no queuedata implies admin queue */
logging_enabled = nr->ctrl->passthru_err_log_enabled;
- req->timeout = NVME_ADMIN_TIMEOUT;
}
if (!logging_enabled)
--
2.53.0
* [PATCH V3 2/8] nvme: add sysfs attribute to change admin timeout per nvme controller
2026-04-10 7:39 [PATCH V3 0/8] nvme: Refactor and expose per-controller timeout configuration Maurizio Lombardi
2026-04-10 7:39 ` [PATCH V3 1/8] nvme: Let the blocklayer set timeouts for requests Maurizio Lombardi
@ 2026-04-10 7:39 ` Maurizio Lombardi
2026-04-10 7:39 ` [PATCH V3 3/8] nvme: pci: use admin queue timeout over NVME_ADMIN_TIMEOUT Maurizio Lombardi
` (5 subsequent siblings)
7 siblings, 0 replies; 9+ messages in thread
From: Maurizio Lombardi @ 2026-04-10 7:39 UTC (permalink / raw)
To: kbusch
Cc: mheyne, emilne, jmeneghi, linux-nvme, dwagner, mlombard,
mkhalfella, chaitanyak, hare, hch
Currently, there is no way to adjust the admin queue timeout of an
nvme controller on a per-controller basis.
Add an admin_timeout sysfs attribute so that nvme controllers
with different timeout requirements can have custom admin
timeouts set.
Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
---
drivers/nvme/host/core.c | 1 +
drivers/nvme/host/nvme.h | 1 +
drivers/nvme/host/sysfs.c | 42 +++++++++++++++++++++++++++++++++++++++
3 files changed, 44 insertions(+)
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 111f9a226952..6dc3d273623c 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -5135,6 +5135,7 @@ int nvme_init_ctrl(struct nvme_ctrl *ctrl, struct device *dev,
memset(&ctrl->ka_cmd, 0, sizeof(ctrl->ka_cmd));
ctrl->ka_cmd.common.opcode = nvme_admin_keep_alive;
ctrl->ka_last_check_time = jiffies;
+ ctrl->admin_timeout = NVME_ADMIN_TIMEOUT;
BUILD_BUG_ON(NVME_DSM_MAX_RANGES * sizeof(struct nvme_dsm_range) >
PAGE_SIZE);
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index ccd5e05dac98..9da3ebebe9c8 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -370,6 +370,7 @@ struct nvme_ctrl {
u16 mtfa;
u32 ctrl_config;
u32 queue_count;
+ u32 admin_timeout;
u64 cap;
u32 max_hw_sectors;
diff --git a/drivers/nvme/host/sysfs.c b/drivers/nvme/host/sysfs.c
index 7bf2e972126b..000c29bd1f1f 100644
--- a/drivers/nvme/host/sysfs.c
+++ b/drivers/nvme/host/sysfs.c
@@ -623,6 +623,47 @@ static ssize_t quirks_show(struct device *dev, struct device_attribute *attr,
}
static DEVICE_ATTR_RO(quirks);
+static ssize_t nvme_admin_timeout_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct nvme_ctrl *ctrl = dev_get_drvdata(dev);
+
+ return sysfs_emit(buf, "%u\n",
+ jiffies_to_msecs(ctrl->admin_timeout));
+}
+
+static ssize_t nvme_admin_timeout_store(struct device *dev,
+ struct device_attribute *attr,
+ const char *buf, size_t count)
+{
+ struct nvme_ctrl *ctrl = dev_get_drvdata(dev);
+ u32 timeout;
+ int err;
+
+ /*
+ * Wait until the controller reaches the LIVE state
+ * to be sure that admin_q and fabrics_q are
+ * properly initialized.
+ */
+ if (!test_bit(NVME_CTRL_STARTED_ONCE, &ctrl->flags))
+ return -EBUSY;
+
+ err = kstrtou32(buf, 10, &timeout);
+ if (err || !timeout)
+ return -EINVAL;
+
+ ctrl->admin_timeout = msecs_to_jiffies(timeout);
+
+ blk_queue_rq_timeout(ctrl->admin_q, ctrl->admin_timeout);
+ if (ctrl->fabrics_q)
+ blk_queue_rq_timeout(ctrl->fabrics_q, ctrl->admin_timeout);
+
+ return count;
+}
+
+static DEVICE_ATTR(admin_timeout, S_IRUGO | S_IWUSR,
+ nvme_admin_timeout_show, nvme_admin_timeout_store);
+
#ifdef CONFIG_NVME_HOST_AUTH
static ssize_t nvme_ctrl_dhchap_secret_show(struct device *dev,
struct device_attribute *attr, char *buf)
@@ -765,6 +806,7 @@ static struct attribute *nvme_dev_attrs[] = {
&dev_attr_cntrltype.attr,
&dev_attr_dctype.attr,
&dev_attr_quirks.attr,
+ &dev_attr_admin_timeout.attr,
#ifdef CONFIG_NVME_HOST_AUTH
&dev_attr_dhchap_secret.attr,
&dev_attr_dhchap_ctrl_secret.attr,
--
2.53.0
* [PATCH V3 3/8] nvme: pci: use admin queue timeout over NVME_ADMIN_TIMEOUT
2026-04-10 7:39 [PATCH V3 0/8] nvme: Refactor and expose per-controller timeout configuration Maurizio Lombardi
2026-04-10 7:39 ` [PATCH V3 1/8] nvme: Let the blocklayer set timeouts for requests Maurizio Lombardi
2026-04-10 7:39 ` [PATCH V3 2/8] nvme: add sysfs attribute to change admin timeout per nvme controller Maurizio Lombardi
@ 2026-04-10 7:39 ` Maurizio Lombardi
2026-04-10 7:39 ` [PATCH V3 4/8] nvme: add sysfs attribute to change IO timeout per nvme controller Maurizio Lombardi
` (4 subsequent siblings)
7 siblings, 0 replies; 9+ messages in thread
From: Maurizio Lombardi @ 2026-04-10 7:39 UTC (permalink / raw)
To: kbusch
Cc: mheyne, emilne, jmeneghi, linux-nvme, dwagner, mlombard,
mkhalfella, chaitanyak, hare, hch
While tearing down its queues, nvme-pci uses NVME_ADMIN_TIMEOUT as its
timeout target. Instead, use the configured admin queue's timeout value
to match the device's existing timeout setting.
Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
---
drivers/nvme/host/pci.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 9aa19255b041..379c75c6d35a 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -3091,7 +3091,7 @@ static bool __nvme_delete_io_queues(struct nvme_dev *dev, u8 opcode)
unsigned long timeout;
retry:
- timeout = NVME_ADMIN_TIMEOUT;
+ timeout = dev->ctrl.admin_timeout;
while (nr_queues > 0) {
if (nvme_delete_queue(&dev->queues[nr_queues], opcode))
break;
--
2.53.0
* [PATCH V3 4/8] nvme: add sysfs attribute to change IO timeout per nvme controller
2026-04-10 7:39 [PATCH V3 0/8] nvme: Refactor and expose per-controller timeout configuration Maurizio Lombardi
` (2 preceding siblings ...)
2026-04-10 7:39 ` [PATCH V3 3/8] nvme: pci: use admin queue timeout over NVME_ADMIN_TIMEOUT Maurizio Lombardi
@ 2026-04-10 7:39 ` Maurizio Lombardi
2026-04-10 7:39 ` [PATCH V3 5/8] nvme: use per controller timeout waits over depending on global default Maurizio Lombardi
` (3 subsequent siblings)
7 siblings, 0 replies; 9+ messages in thread
From: Maurizio Lombardi @ 2026-04-10 7:39 UTC (permalink / raw)
To: kbusch
Cc: mheyne, emilne, jmeneghi, linux-nvme, dwagner, mlombard,
mkhalfella, chaitanyak, hare, hch
Currently, there is no way to adjust the I/O queue timeout of an
nvme controller on a per-controller basis.
Add an io_timeout sysfs attribute so that nvme controllers
with different timeout requirements can have custom
I/O timeouts set.
Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
---
drivers/nvme/host/core.c | 2 ++
drivers/nvme/host/nvme.h | 1 +
drivers/nvme/host/sysfs.c | 47 +++++++++++++++++++++++++++++++++++++++
3 files changed, 50 insertions(+)
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 6dc3d273623c..1b676fd4d454 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -4197,6 +4197,7 @@ static void nvme_alloc_ns(struct nvme_ctrl *ctrl, struct nvme_ns_info *info)
mutex_unlock(&ctrl->namespaces_lock);
goto out_unlink_ns;
}
+ blk_queue_rq_timeout(ns->queue, ctrl->io_timeout);
nvme_ns_add_to_ctrl_list(ns);
mutex_unlock(&ctrl->namespaces_lock);
synchronize_srcu(&ctrl->srcu);
@@ -5136,6 +5137,7 @@ int nvme_init_ctrl(struct nvme_ctrl *ctrl, struct device *dev,
ctrl->ka_cmd.common.opcode = nvme_admin_keep_alive;
ctrl->ka_last_check_time = jiffies;
ctrl->admin_timeout = NVME_ADMIN_TIMEOUT;
+ ctrl->io_timeout = NVME_IO_TIMEOUT;
BUILD_BUG_ON(NVME_DSM_MAX_RANGES * sizeof(struct nvme_dsm_range) >
PAGE_SIZE);
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 9da3ebebe9c8..a6d998c2e0e5 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -371,6 +371,7 @@ struct nvme_ctrl {
u32 ctrl_config;
u32 queue_count;
u32 admin_timeout;
+ u32 io_timeout;
u64 cap;
u32 max_hw_sectors;
diff --git a/drivers/nvme/host/sysfs.c b/drivers/nvme/host/sysfs.c
index 000c29bd1f1f..1e6a0eecb30e 100644
--- a/drivers/nvme/host/sysfs.c
+++ b/drivers/nvme/host/sysfs.c
@@ -664,6 +664,52 @@ static ssize_t nvme_admin_timeout_store(struct device *dev,
static DEVICE_ATTR(admin_timeout, S_IRUGO | S_IWUSR,
nvme_admin_timeout_show, nvme_admin_timeout_store);
+static ssize_t nvme_io_timeout_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct nvme_ctrl *ctrl = dev_get_drvdata(dev);
+
+ return sysfs_emit(buf, "%u\n", jiffies_to_msecs(ctrl->io_timeout));
+}
+
+static ssize_t nvme_io_timeout_store(struct device *dev,
+ struct device_attribute *attr,
+ const char *buf, size_t count)
+{
+ struct nvme_ctrl *ctrl = dev_get_drvdata(dev);
+ struct nvme_ns *ns;
+ u32 timeout;
+ int err;
+
+ /*
+ * Wait until the controller reaches the LIVE state
+ * to be sure that connect_q is properly initialized.
+ */
+ if (!test_bit(NVME_CTRL_STARTED_ONCE, &ctrl->flags))
+ return -EBUSY;
+
+ err = kstrtou32(buf, 10, &timeout);
+ if (err || !timeout)
+ return -EINVAL;
+
+ /* Take the namespaces_lock to avoid racing against nvme_alloc_ns() */
+ mutex_lock(&ctrl->namespaces_lock);
+
+ ctrl->io_timeout = msecs_to_jiffies(timeout);
+ list_for_each_entry(ns, &ctrl->namespaces, list)
+ blk_queue_rq_timeout(ns->queue, ctrl->io_timeout);
+
+ if (ctrl->connect_q)
+ blk_queue_rq_timeout(ctrl->connect_q, ctrl->io_timeout);
+
+ mutex_unlock(&ctrl->namespaces_lock);
+
+ return count;
+}
+
+static DEVICE_ATTR(io_timeout, S_IRUGO | S_IWUSR,
+ nvme_io_timeout_show, nvme_io_timeout_store);
+
#ifdef CONFIG_NVME_HOST_AUTH
static ssize_t nvme_ctrl_dhchap_secret_show(struct device *dev,
struct device_attribute *attr, char *buf)
@@ -807,6 +853,7 @@ static struct attribute *nvme_dev_attrs[] = {
&dev_attr_dctype.attr,
&dev_attr_quirks.attr,
&dev_attr_admin_timeout.attr,
+ &dev_attr_io_timeout.attr,
#ifdef CONFIG_NVME_HOST_AUTH
&dev_attr_dhchap_secret.attr,
&dev_attr_dhchap_ctrl_secret.attr,
--
2.53.0
* [PATCH V3 5/8] nvme: use per controller timeout waits over depending on global default
2026-04-10 7:39 [PATCH V3 0/8] nvme: Refactor and expose per-controller timeout configuration Maurizio Lombardi
` (3 preceding siblings ...)
2026-04-10 7:39 ` [PATCH V3 4/8] nvme: add sysfs attribute to change IO timeout per nvme controller Maurizio Lombardi
@ 2026-04-10 7:39 ` Maurizio Lombardi
2026-04-10 7:39 ` [PATCH V3 6/8] nvme-core: align fabrics_q teardown with admin_q in nvme_free_ctrl Maurizio Lombardi
` (2 subsequent siblings)
7 siblings, 0 replies; 9+ messages in thread
From: Maurizio Lombardi @ 2026-04-10 7:39 UTC (permalink / raw)
To: kbusch
Cc: mheyne, emilne, jmeneghi, linux-nvme, dwagner, mlombard,
mkhalfella, chaitanyak, hare, hch
Instead of passing NVME_IO_TIMEOUT as a parameter with every call to
nvme_wait_freeze_timeout, use the controller's preferred timeout.
Reviewed-by: Mohamed Khalfella <mkhalfella@purestorage.com>
Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
---
drivers/nvme/host/apple.c | 2 +-
drivers/nvme/host/core.c | 5 +++--
drivers/nvme/host/nvme.h | 2 +-
drivers/nvme/host/pci.c | 2 +-
drivers/nvme/host/rdma.c | 2 +-
drivers/nvme/host/tcp.c | 2 +-
6 files changed, 8 insertions(+), 7 deletions(-)
diff --git a/drivers/nvme/host/apple.c b/drivers/nvme/host/apple.c
index ed61b97fde59..1958f39484d9 100644
--- a/drivers/nvme/host/apple.c
+++ b/drivers/nvme/host/apple.c
@@ -858,7 +858,7 @@ static void apple_nvme_disable(struct apple_nvme *anv, bool shutdown)
* doing a safe shutdown.
*/
if (!dead && shutdown && freeze)
- nvme_wait_freeze_timeout(&anv->ctrl, NVME_IO_TIMEOUT);
+ nvme_wait_freeze_timeout(&anv->ctrl);
nvme_quiesce_io_queues(&anv->ctrl);
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 1b676fd4d454..f801d5475cd6 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -5244,8 +5244,9 @@ void nvme_unfreeze(struct nvme_ctrl *ctrl)
}
EXPORT_SYMBOL_GPL(nvme_unfreeze);
-int nvme_wait_freeze_timeout(struct nvme_ctrl *ctrl, long timeout)
+int nvme_wait_freeze_timeout(struct nvme_ctrl *ctrl)
{
+ unsigned long timeout = ctrl->io_timeout;
struct nvme_ns *ns;
int srcu_idx;
@@ -5253,7 +5254,7 @@ int nvme_wait_freeze_timeout(struct nvme_ctrl *ctrl, long timeout)
list_for_each_entry_srcu(ns, &ctrl->namespaces, list,
srcu_read_lock_held(&ctrl->srcu)) {
timeout = blk_mq_freeze_queue_wait_timeout(ns->queue, timeout);
- if (timeout <= 0)
+ if (!timeout)
break;
}
srcu_read_unlock(&ctrl->srcu, srcu_idx);
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index a6d998c2e0e5..9ccaed0b9dbf 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -902,7 +902,7 @@ void nvme_sync_queues(struct nvme_ctrl *ctrl);
void nvme_sync_io_queues(struct nvme_ctrl *ctrl);
void nvme_unfreeze(struct nvme_ctrl *ctrl);
void nvme_wait_freeze(struct nvme_ctrl *ctrl);
-int nvme_wait_freeze_timeout(struct nvme_ctrl *ctrl, long timeout);
+int nvme_wait_freeze_timeout(struct nvme_ctrl *ctrl);
void nvme_start_freeze(struct nvme_ctrl *ctrl);
static inline enum req_op nvme_req_op(struct nvme_command *cmd)
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 379c75c6d35a..dcc554895429 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -3273,7 +3273,7 @@ static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown)
* if doing a safe shutdown.
*/
if (!dead && shutdown)
- nvme_wait_freeze_timeout(&dev->ctrl, NVME_IO_TIMEOUT);
+ nvme_wait_freeze_timeout(&dev->ctrl);
}
nvme_quiesce_io_queues(&dev->ctrl);
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index f77c960f7632..bf73135c1439 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -888,7 +888,7 @@ static int nvme_rdma_configure_io_queues(struct nvme_rdma_ctrl *ctrl, bool new)
if (!new) {
nvme_start_freeze(&ctrl->ctrl);
nvme_unquiesce_io_queues(&ctrl->ctrl);
- if (!nvme_wait_freeze_timeout(&ctrl->ctrl, NVME_IO_TIMEOUT)) {
+ if (!nvme_wait_freeze_timeout(&ctrl->ctrl)) {
/*
* If we timed out waiting for freeze we are likely to
* be stuck. Fail the controller initialization just
diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 02c95c32b07e..e921130ec25f 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -2194,7 +2194,7 @@ static int nvme_tcp_configure_io_queues(struct nvme_ctrl *ctrl, bool new)
if (!new) {
nvme_start_freeze(ctrl);
nvme_unquiesce_io_queues(ctrl);
- if (!nvme_wait_freeze_timeout(ctrl, NVME_IO_TIMEOUT)) {
+ if (!nvme_wait_freeze_timeout(ctrl)) {
/*
* If we timed out waiting for freeze we are likely to
* be stuck. Fail the controller initialization just
--
2.53.0
* [PATCH V3 6/8] nvme-core: align fabrics_q teardown with admin_q in nvme_free_ctrl
2026-04-10 7:39 [PATCH V3 0/8] nvme: Refactor and expose per-controller timeout configuration Maurizio Lombardi
` (4 preceding siblings ...)
2026-04-10 7:39 ` [PATCH V3 5/8] nvme: use per controller timeout waits over depending on global default Maurizio Lombardi
@ 2026-04-10 7:39 ` Maurizio Lombardi
2026-04-10 7:39 ` [PATCH V3 7/8] nvmet-loop: do not alloc admin tag set during reset Maurizio Lombardi
2026-04-10 7:39 ` [PATCH V3 8/8] nvme-core: warn on allocating admin tag set with existing queue Maurizio Lombardi
7 siblings, 0 replies; 9+ messages in thread
From: Maurizio Lombardi @ 2026-04-10 7:39 UTC (permalink / raw)
To: kbusch
Cc: mheyne, emilne, jmeneghi, linux-nvme, dwagner, mlombard,
mkhalfella, chaitanyak, hare, hch
Currently, the final reference for the fabrics admin queue (fabrics_q)
is dropped inside nvme_remove_admin_tag_set(). However, the primary
admin queue (admin_q) defers dropping its final reference until
nvme_free_ctrl().
Move the blk_put_queue() call for fabrics_q from nvme_remove_admin_tag_set()
to nvme_free_ctrl(). This aligns the lifecycle management of both admin
queues, ensuring they are freed symmetrically when the controller is finally
torn down.
Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
---
drivers/nvme/host/core.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index f801d5475cd6..f3aa7b29d2a2 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -4927,10 +4927,8 @@ void nvme_remove_admin_tag_set(struct nvme_ctrl *ctrl)
*/
nvme_stop_keep_alive(ctrl);
blk_mq_destroy_queue(ctrl->admin_q);
- if (ctrl->ops->flags & NVME_F_FABRICS) {
+ if (ctrl->ops->flags & NVME_F_FABRICS)
blk_mq_destroy_queue(ctrl->fabrics_q);
- blk_put_queue(ctrl->fabrics_q);
- }
blk_mq_free_tag_set(ctrl->admin_tagset);
}
EXPORT_SYMBOL_GPL(nvme_remove_admin_tag_set);
@@ -5072,6 +5070,8 @@ static void nvme_free_ctrl(struct device *dev)
if (ctrl->admin_q)
blk_put_queue(ctrl->admin_q);
+ if (ctrl->fabrics_q)
+ blk_put_queue(ctrl->fabrics_q);
if (!subsys || ctrl->instance != subsys->instance)
ida_free(&nvme_instance_ida, ctrl->instance);
nvme_free_cels(ctrl);
--
2.53.0
* [PATCH V3 7/8] nvmet-loop: do not alloc admin tag set during reset
2026-04-10 7:39 [PATCH V3 0/8] nvme: Refactor and expose per-controller timeout configuration Maurizio Lombardi
` (5 preceding siblings ...)
2026-04-10 7:39 ` [PATCH V3 6/8] nvme-core: align fabrics_q teardown with admin_q in nvme_free_ctrl Maurizio Lombardi
@ 2026-04-10 7:39 ` Maurizio Lombardi
2026-04-10 7:39 ` [PATCH V3 8/8] nvme-core: warn on allocating admin tag set with existing queue Maurizio Lombardi
7 siblings, 0 replies; 9+ messages in thread
From: Maurizio Lombardi @ 2026-04-10 7:39 UTC (permalink / raw)
To: kbusch
Cc: mheyne, emilne, jmeneghi, linux-nvme, dwagner, mlombard,
mkhalfella, chaitanyak, hare, hch
Currently, resetting a loopback controller unconditionally invokes
nvme_alloc_admin_tag_set() inside nvme_loop_configure_admin_queue().
Doing so drops the old queue and allocates a new one. Consequently,
this reverts the admin queue's timeout (q->rq_timeout) back to the
module default (NVME_ADMIN_TIMEOUT), completely wiping out any custom
timeout values the user may have configured via sysfs and potentially
racing against the sysfs nvme_admin_timeout_store() function
that may dereference the admin_q pointer during the RESETTING state.
Add a 'new' boolean parameter to nvme_loop_configure_admin_queue() to
distinguish between an initial probe and a reset. Only allocate the
admin tag set during the initial controller creation and do not
remove it when resetting the controller.
This ensures the existing queues and their sysfs-configured timeouts
remain intact across controller resets.
Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
---
drivers/nvme/target/loop.c | 38 +++++++++++++++++++++-----------------
1 file changed, 21 insertions(+), 17 deletions(-)
diff --git a/drivers/nvme/target/loop.c b/drivers/nvme/target/loop.c
index d98d0cdc5d6f..c9cf0ddc1b85 100644
--- a/drivers/nvme/target/loop.c
+++ b/drivers/nvme/target/loop.c
@@ -261,7 +261,7 @@ static const struct blk_mq_ops nvme_loop_admin_mq_ops = {
.init_hctx = nvme_loop_init_admin_hctx,
};
-static void nvme_loop_destroy_admin_queue(struct nvme_loop_ctrl *ctrl)
+static void nvme_loop_destroy_admin_queue(struct nvme_loop_ctrl *ctrl, bool remove)
{
if (!test_and_clear_bit(NVME_LOOP_Q_LIVE, &ctrl->queues[0].flags))
return;
@@ -274,7 +274,8 @@ static void nvme_loop_destroy_admin_queue(struct nvme_loop_ctrl *ctrl)
nvmet_sq_destroy(&ctrl->queues[0].nvme_sq);
nvmet_cq_put(&ctrl->queues[0].nvme_cq);
- nvme_remove_admin_tag_set(&ctrl->ctrl);
+ if (remove)
+ nvme_remove_admin_tag_set(&ctrl->ctrl);
}
static void nvme_loop_free_ctrl(struct nvme_ctrl *nctrl)
@@ -361,7 +362,7 @@ static int nvme_loop_connect_io_queues(struct nvme_loop_ctrl *ctrl)
return 0;
}
-static int nvme_loop_configure_admin_queue(struct nvme_loop_ctrl *ctrl)
+static int nvme_loop_configure_admin_queue(struct nvme_loop_ctrl *ctrl, bool new)
{
int error;
@@ -375,12 +376,15 @@ static int nvme_loop_configure_admin_queue(struct nvme_loop_ctrl *ctrl)
}
ctrl->ctrl.queue_count = 1;
- error = nvme_alloc_admin_tag_set(&ctrl->ctrl, &ctrl->admin_tag_set,
- &nvme_loop_admin_mq_ops,
- sizeof(struct nvme_loop_iod) +
- NVME_INLINE_SG_CNT * sizeof(struct scatterlist));
- if (error)
- goto out_free_sq;
+ if (new) {
+ error = nvme_alloc_admin_tag_set(&ctrl->ctrl,
+ &ctrl->admin_tag_set,
+ &nvme_loop_admin_mq_ops,
+ sizeof(struct nvme_loop_iod) +
+ NVME_INLINE_SG_CNT * sizeof(struct scatterlist));
+ if (error)
+ goto out_free_sq;
+ }
/* reset stopped state for the fresh admin queue */
clear_bit(NVME_CTRL_ADMIN_Q_STOPPED, &ctrl->ctrl.flags);
@@ -415,7 +419,7 @@ static int nvme_loop_configure_admin_queue(struct nvme_loop_ctrl *ctrl)
return error;
}
-static void nvme_loop_shutdown_ctrl(struct nvme_loop_ctrl *ctrl)
+static void nvme_loop_shutdown_ctrl(struct nvme_loop_ctrl *ctrl, bool remove)
{
if (ctrl->ctrl.queue_count > 1) {
nvme_quiesce_io_queues(&ctrl->ctrl);
@@ -426,12 +430,12 @@ static void nvme_loop_shutdown_ctrl(struct nvme_loop_ctrl *ctrl)
if (nvme_ctrl_state(&ctrl->ctrl) == NVME_CTRL_LIVE)
nvme_disable_ctrl(&ctrl->ctrl, true);
- nvme_loop_destroy_admin_queue(ctrl);
+ nvme_loop_destroy_admin_queue(ctrl, remove);
}
static void nvme_loop_delete_ctrl_host(struct nvme_ctrl *ctrl)
{
- nvme_loop_shutdown_ctrl(to_loop_ctrl(ctrl));
+ nvme_loop_shutdown_ctrl(to_loop_ctrl(ctrl), true);
}
static void nvme_loop_delete_ctrl(struct nvmet_ctrl *nctrl)
@@ -453,7 +457,7 @@ static void nvme_loop_reset_ctrl_work(struct work_struct *work)
int ret;
nvme_stop_ctrl(&ctrl->ctrl);
- nvme_loop_shutdown_ctrl(ctrl);
+ nvme_loop_shutdown_ctrl(ctrl, false);
if (!nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_CONNECTING)) {
enum nvme_ctrl_state state = nvme_ctrl_state(&ctrl->ctrl);
@@ -465,7 +469,7 @@ static void nvme_loop_reset_ctrl_work(struct work_struct *work)
return;
}
- ret = nvme_loop_configure_admin_queue(ctrl);
+ ret = nvme_loop_configure_admin_queue(ctrl, false);
if (ret)
goto out_disable;
@@ -492,7 +496,7 @@ static void nvme_loop_reset_ctrl_work(struct work_struct *work)
out_destroy_admin:
nvme_quiesce_admin_queue(&ctrl->ctrl);
nvme_cancel_admin_tagset(&ctrl->ctrl);
- nvme_loop_destroy_admin_queue(ctrl);
+ nvme_loop_destroy_admin_queue(ctrl, true);
out_disable:
dev_warn(ctrl->ctrl.device, "Removing after reset failure\n");
nvme_uninit_ctrl(&ctrl->ctrl);
@@ -594,7 +598,7 @@ static struct nvme_ctrl *nvme_loop_create_ctrl(struct device *dev,
if (!ctrl->queues)
goto out_uninit_ctrl;
- ret = nvme_loop_configure_admin_queue(ctrl);
+ ret = nvme_loop_configure_admin_queue(ctrl, true);
if (ret)
goto out_free_queues;
@@ -632,7 +636,7 @@ static struct nvme_ctrl *nvme_loop_create_ctrl(struct device *dev,
out_remove_admin_queue:
nvme_quiesce_admin_queue(&ctrl->ctrl);
nvme_cancel_admin_tagset(&ctrl->ctrl);
- nvme_loop_destroy_admin_queue(ctrl);
+ nvme_loop_destroy_admin_queue(ctrl, true);
out_free_queues:
kfree(ctrl->queues);
out_uninit_ctrl:
--
2.53.0
* [PATCH V3 8/8] nvme-core: warn on allocating admin tag set with existing queue
2026-04-10 7:39 [PATCH V3 0/8] nvme: Refactor and expose per-controller timeout configuration Maurizio Lombardi
` (6 preceding siblings ...)
2026-04-10 7:39 ` [PATCH V3 7/8] nvmet-loop: do not alloc admin tag set during reset Maurizio Lombardi
@ 2026-04-10 7:39 ` Maurizio Lombardi
7 siblings, 0 replies; 9+ messages in thread
From: Maurizio Lombardi @ 2026-04-10 7:39 UTC (permalink / raw)
To: kbusch
Cc: mheyne, emilne, jmeneghi, linux-nvme, dwagner, mlombard,
mkhalfella, chaitanyak, hare, hch
Currently, nvme_alloc_admin_tag_set() silently drops and releases
the existing admin_q if it is called on a controller that already
has one (e.g., during a controller reset).
However, transport drivers should not be reallocating the admin tag
set and queue during a reset. Dropping the old queue and allocating
a new one destroys user-configured timeouts and may race against
nvme_admin_timeout_store().
Since all transport drivers are now expected to preserve the admin queue
across resets, calling nvme_alloc_admin_tag_set() when ctrl->admin_q
is already populated is a bug.
Remove the silent cleanup and replace it with a WARN_ON_ONCE() to
explicitly catch any transport drivers that violate this lifecycle rule.
Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
---
drivers/nvme/host/core.c | 7 +------
1 file changed, 1 insertion(+), 6 deletions(-)
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index f3aa7b29d2a2..ae084373a79b 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -4884,12 +4884,7 @@ int nvme_alloc_admin_tag_set(struct nvme_ctrl *ctrl, struct blk_mq_tag_set *set,
if (ret)
return ret;
- /*
- * If a previous admin queue exists (e.g., from before a reset),
- * put it now before allocating a new one to avoid orphaning it.
- */
- if (ctrl->admin_q)
- blk_put_queue(ctrl->admin_q);
+ WARN_ON_ONCE(ctrl->admin_q);
ctrl->admin_q = blk_mq_alloc_queue(set, &lim, NULL);
if (IS_ERR(ctrl->admin_q)) {
--
2.53.0