* [PATCH V5 0/7] nvme: Refactor and expose per-controller timeout configuration
From: Maurizio Lombardi @ 2026-05-14 8:32 UTC (permalink / raw)
To: kbusch
Cc: mheyne, emilne, jmeneghi, linux-nvme, dwagner, mlombard,
mkhalfella, chaitanyak, hare, hch
This patchset tries to address some limitations in how the NVMe driver handles
command timeouts.
Currently, the driver relies heavily on global module parameters
(NVME_IO_TIMEOUT and NVME_ADMIN_TIMEOUT), making it difficult for users to
tune timeouts for specific controllers that may have very different
characteristics. Also, in some cases, manual changes to sysfs timeout values
are ignored by the driver logic.
For example, this patchset removes the unconditional timeout assignment in
nvme_init_request. This allows the block layer to correctly apply the request
queue's timeout settings, ensuring that user-initiated changes via sysfs
are actually respected for all requests.
It introduces new sysfs attributes (admin_timeout and io_timeout) to the NVMe
controller. This allows users to configure distinct timeout requirements for
different controllers rather than relying on global module parameters.
Some examples:
Changes to the controller's io_timeout get propagated to all of the
associated namespaces' request queues:
# find /sys -name 'io_timeout'
/sys/devices/virtual/nvme-fabrics/ctl/nvme0/nvme0c0n1/queue/io_timeout
/sys/devices/virtual/nvme-fabrics/ctl/nvme0/nvme0c0n2/queue/io_timeout
/sys/devices/virtual/nvme-fabrics/ctl/nvme0/nvme0c0n3/queue/io_timeout
/sys/devices/virtual/nvme-fabrics/ctl/nvme0/io_timeout
# echo 27000 > /sys/devices/virtual/nvme-fabrics/ctl/nvme0/io_timeout
# cat /sys/devices/virtual/nvme-fabrics/ctl/nvme0/nvme0c0n1/queue/io_timeout
27000
# cat /sys/devices/virtual/nvme-fabrics/ctl/nvme0/nvme0c0n2/queue/io_timeout
27000
# cat /sys/devices/virtual/nvme-fabrics/ctl/nvme0/nvme0c0n3/queue/io_timeout
27000
When a namespace is added on the target side, its io_timeout is
inherited from the controller's currently configured timeout:
* target side *
# nvmetcli
/> cd subsystems/test-nqn/namespaces/4
/subsystems/t.../namespaces/4> enable
The Namespace has been enabled.
************
* Host-side *
nvme nvme0: rescanning namespaces.
# find /sys -name 'io_timeout'
/sys/devices/virtual/nvme-fabrics/ctl/nvme0/nvme0c0n1/queue/io_timeout
/sys/devices/virtual/nvme-fabrics/ctl/nvme0/nvme0c0n2/queue/io_timeout
/sys/devices/virtual/nvme-fabrics/ctl/nvme0/nvme0c0n3/queue/io_timeout
/sys/devices/virtual/nvme-fabrics/ctl/nvme0/nvme0c0n4/queue/io_timeout <-- new namespace
/sys/devices/virtual/nvme-fabrics/ctl/nvme0/io_timeout
# cat /sys/devices/virtual/nvme-fabrics/ctl/nvme0/nvme0c0n4/queue/io_timeout
27000
***********
io_timeout and admin_timeout module parameters are used as default
values for new controllers:
# nvme connect -t tcp -a 10.37.153.138 -s 8000 -n test-nqn2
connecting to device: nvme1
# cat /sys/devices/virtual/nvme-fabrics/ctl/nvme1/nvme1c1n1/queue/io_timeout
30000
# cat /sys/devices/virtual/nvme-fabrics/ctl/nvme1/admin_timeout
60000
V5: - A few cosmetic changes (80-char lines etc.)
    - Move the former patch 6 ("nvme: use per controller timeout waits")
      earlier in the series and change it to just drop the timeout
      parameter.
    - Former patch 4 ("nvme: pci: use admin queue timeout") has been
      eliminated; its nvme-pci changes are now part of patch 3.
    - Former patch 3 ("nvme: fix race condition between connected uevent")
      has already been merged in the linux-nvme tree.
    - Patch 7: use "if (ctrl->fabrics_q)" instead of
      "if (ctrl->ops->flags & NVME_F_FABRICS)" where it makes sense.
    - In the commit messages of patch 3 and patch 4, clarify that changing
      the timeouts of the fabrics_q and connect_q queues is intended.
V4: - Refactor the nvmet-loop patch as suggested by Daniel Wagner
- Add a patch to fix a potential race condition between
connected uevent and STARTED_ONCE controller's flag.
V3: - Rebase on top of nvme 7.1 branch
- add an admin_timeout variable to nvme_ctrl structure
- Wait until the controller has reached the LIVE state
for the first time before allowing the user to modify
the timeouts, this prevents the dereferencing of the admin_q,
fabrics_q and connect_q queues before their initialization.
- move blk_put_queue(fabrics_q) to nvme_free_ctrl() to align it to
admin_q teardown
- modify nvmet-loop to avoid deleting and re-creating the admin_q queue
when the controller enters the resetting state.
- add a warning if nvme_alloc_admin_tag_set() is called twice against
the same controller
V2: - Drop the RFC tag
- apply the timeout settings to fabrics_q and connect_q too
- Code style fixes
- remove unnecessary check for null admin_q in __nvme_delete_io_queues()
- Use DEVICE_ATTR() macro
Maurizio Lombardi (6):
nvme: remove redundant timeout argument from nvme_wait_freeze_timeout
nvme: add sysfs attribute to change admin timeout per nvme controller
nvme: add sysfs attribute to change IO timeout per controller
nvme-core: align fabrics_q teardown with admin_q in nvme_free_ctrl
nvmet-loop: do not alloc admin tag set during reset
nvme-core: warn on allocating admin tag set with existing queue
Maximilian Heyne (1):
nvme: Let the blocklayer set timeouts for requests
drivers/nvme/host/apple.c | 2 +-
drivers/nvme/host/core.c | 21 ++++-----
drivers/nvme/host/nvme.h | 4 +-
drivers/nvme/host/pci.c | 4 +-
drivers/nvme/host/rdma.c | 2 +-
drivers/nvme/host/sysfs.c | 88 ++++++++++++++++++++++++++++++++++++++
drivers/nvme/host/tcp.c | 2 +-
drivers/nvme/target/loop.c | 31 +++++++-------
8 files changed, 121 insertions(+), 33 deletions(-)
--
2.54.0
* [PATCH V5 1/7] nvme: Let the blocklayer set timeouts for requests
From: Maurizio Lombardi @ 2026-05-14 8:32 UTC (permalink / raw)
To: kbusch
Cc: mheyne, emilne, jmeneghi, linux-nvme, dwagner, mlombard,
mkhalfella, chaitanyak, hare, hch
From: Maximilian Heyne <mheyne@amazon.de>
When initializing an nvme request that is about to be sent to the block
layer, we do not need to initialize its timeout. If it is left
uninitialized at 0, the block layer will use the request queue's timeout
in blk_add_timer() (via nvme_start_request(), which is called from
nvme_*_queue_rq()). These timeouts are set up to either NVME_IO_TIMEOUT
or NVME_ADMIN_TIMEOUT when the request queues are created.
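For reference, this is roughly the fallback being relied on, modeled on
blk_add_timer() in block/blk-timeout.c (a simplified sketch, not a
verbatim quote of the upstream function):

void blk_add_timer(struct request *req)
{
	struct request_queue *q = req->q;

	/*
	 * Some drivers set req->timeout before starting the request;
	 * only if it is still 0 does the queue-wide rq_timeout (the
	 * value behind /sys/.../queue/io_timeout) take effect.
	 */
	if (!req->timeout)
		req->timeout = q->rq_timeout;

	/* ... arm the timeout timer at jiffies + req->timeout ... */
}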
Because the io_timeout of the IO queues can be modified via sysfs, the
following situation can occur:
1) NVME_IO_TIMEOUT = 30 (default module parameter)
2) nvme1n1 is probed. IO queues default timeout is 30 s
3) manually change the IO timeout to 90 s
echo 90000 > /sys/class/nvme/nvme1/nvme1n1/queue/io_timeout
4) Any call of __submit_sync_cmd on nvme1n1 to an IO queue will issue
   commands with the 30 s timeout instead of the desired 90 s, which
   might be more suitable for this device.
Commit 470e900c8036 ("nvme: refactor nvme_alloc_request") already
silently changed the behavior for ioctls because it unconditionally
overrides the request's timeout that was set in nvme_init_request. If
the timeout was left unset by the user of the ioctl, it will be
overridden with 0, meaning the block layer will pick the request
queue's IO timeout.
Following up on that, this patch further improves the consistency of IO
timeout usage. However, there are still uses of NVME_IO_TIMEOUT which
could be inconsistent with what is set in the device's request_queue by
the user.
Reviewed-by: Mohamed Khalfella <mkhalfella@purestorage.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Daniel Wagner <dwagner@suse.de>
Signed-off-by: Maximilian Heyne <mheyne@amazon.de>
Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
---
drivers/nvme/host/core.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index c3032d6ad6b1..6c75253d2866 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -729,10 +729,8 @@ void nvme_init_request(struct request *req, struct nvme_command *cmd)
struct nvme_ns *ns = req->q->disk->private_data;
logging_enabled = ns->head->passthru_err_log_enabled;
- req->timeout = NVME_IO_TIMEOUT;
} else { /* no queuedata implies admin queue */
logging_enabled = nr->ctrl->passthru_err_log_enabled;
- req->timeout = NVME_ADMIN_TIMEOUT;
}
if (!logging_enabled)
--
2.54.0
* [PATCH V5 2/7] nvme: remove redundant timeout argument from nvme_wait_freeze_timeout
From: Maurizio Lombardi @ 2026-05-14 8:32 UTC (permalink / raw)
To: kbusch
Cc: mheyne, emilne, jmeneghi, linux-nvme, dwagner, mlombard,
mkhalfella, chaitanyak, hare, hch
All callers of nvme_wait_freeze_timeout() currently pass the exact same
NVME_IO_TIMEOUT default as their timeout argument.
Remove the argument and use a local variable inside the function
instead.
Reviewed-by: Daniel Wagner <dwagner@suse.de>
Reviewed-by: Mohamed Khalfella <mkhalfella@purestorage.com>
Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
---
drivers/nvme/host/apple.c | 2 +-
drivers/nvme/host/core.c | 3 ++-
drivers/nvme/host/nvme.h | 2 +-
drivers/nvme/host/pci.c | 2 +-
drivers/nvme/host/rdma.c | 2 +-
drivers/nvme/host/tcp.c | 2 +-
6 files changed, 7 insertions(+), 6 deletions(-)
diff --git a/drivers/nvme/host/apple.c b/drivers/nvme/host/apple.c
index 423c9c628e7b..e77c47408102 100644
--- a/drivers/nvme/host/apple.c
+++ b/drivers/nvme/host/apple.c
@@ -858,7 +858,7 @@ static void apple_nvme_disable(struct apple_nvme *anv, bool shutdown)
* doing a safe shutdown.
*/
if (!dead && shutdown && freeze)
- nvme_wait_freeze_timeout(&anv->ctrl, NVME_IO_TIMEOUT);
+ nvme_wait_freeze_timeout(&anv->ctrl);
nvme_quiesce_io_queues(&anv->ctrl);
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 6c75253d2866..84f295e3bf08 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -5250,8 +5250,9 @@ void nvme_unfreeze(struct nvme_ctrl *ctrl)
}
EXPORT_SYMBOL_GPL(nvme_unfreeze);
-int nvme_wait_freeze_timeout(struct nvme_ctrl *ctrl, long timeout)
+int nvme_wait_freeze_timeout(struct nvme_ctrl *ctrl)
{
+ unsigned long timeout = NVME_IO_TIMEOUT;
struct nvme_ns *ns;
int srcu_idx;
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index ccd5e05dac98..6f9ecb4948f4 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -900,7 +900,7 @@ void nvme_sync_queues(struct nvme_ctrl *ctrl);
void nvme_sync_io_queues(struct nvme_ctrl *ctrl);
void nvme_unfreeze(struct nvme_ctrl *ctrl);
void nvme_wait_freeze(struct nvme_ctrl *ctrl);
-int nvme_wait_freeze_timeout(struct nvme_ctrl *ctrl, long timeout);
+int nvme_wait_freeze_timeout(struct nvme_ctrl *ctrl);
void nvme_start_freeze(struct nvme_ctrl *ctrl);
static inline enum req_op nvme_req_op(struct nvme_command *cmd)
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 139a10cd687f..8dc353451b21 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -3278,7 +3278,7 @@ static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown)
* if doing a safe shutdown.
*/
if (!dead && shutdown)
- nvme_wait_freeze_timeout(&dev->ctrl, NVME_IO_TIMEOUT);
+ nvme_wait_freeze_timeout(&dev->ctrl);
}
nvme_quiesce_io_queues(&dev->ctrl);
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index f77c960f7632..bf73135c1439 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -888,7 +888,7 @@ static int nvme_rdma_configure_io_queues(struct nvme_rdma_ctrl *ctrl, bool new)
if (!new) {
nvme_start_freeze(&ctrl->ctrl);
nvme_unquiesce_io_queues(&ctrl->ctrl);
- if (!nvme_wait_freeze_timeout(&ctrl->ctrl, NVME_IO_TIMEOUT)) {
+ if (!nvme_wait_freeze_timeout(&ctrl->ctrl)) {
/*
* If we timed out waiting for freeze we are likely to
* be stuck. Fail the controller initialization just
diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 15d36d6a728e..0552aa8a1150 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -2208,7 +2208,7 @@ static int nvme_tcp_configure_io_queues(struct nvme_ctrl *ctrl, bool new)
if (!new) {
nvme_start_freeze(ctrl);
nvme_unquiesce_io_queues(ctrl);
- if (!nvme_wait_freeze_timeout(ctrl, NVME_IO_TIMEOUT)) {
+ if (!nvme_wait_freeze_timeout(ctrl)) {
/*
* If we timed out waiting for freeze we are likely to
* be stuck. Fail the controller initialization just
--
2.54.0
* [PATCH V5 3/7] nvme: add sysfs attribute to change admin timeout per nvme controller
From: Maurizio Lombardi @ 2026-05-14 8:32 UTC (permalink / raw)
To: kbusch
Cc: mheyne, emilne, jmeneghi, linux-nvme, dwagner, mlombard,
mkhalfella, chaitanyak, hare, hch
Currently, there is no way to adjust the timeout value of NVMe admin
queues on a per-controller basis.
Add an admin_timeout attribute to nvme so that different nvme controllers
which may have different timeout requirements can have custom admin
timeouts set.
The admin timeout is also applied to the fabrics queue (fabrics_q).
The fabrics queue is utilized for fabric-specific administrative and
control operations, such as Connect and Property Get/Set commands.
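A note on units, since the sysfs attribute and the module parameter
differ: the default comes from the admin_timeout module parameter,
which is given in seconds, while the new attribute reads and writes
milliseconds, and ctrl->admin_timeout itself holds jiffies. Roughly,
from the existing drivers/nvme/host/core.c and nvme.h:

/* The module parameter is in seconds ... */
unsigned int admin_timeout = 60;
module_param(admin_timeout, uint, 0644);
MODULE_PARM_DESC(admin_timeout, "timeout in seconds for admin commands");

/* ... the default macro converts it to jiffies ... */
#define NVME_ADMIN_TIMEOUT	(admin_timeout * HZ)

/*
 * ... and the sysfs attribute below converts to/from milliseconds with
 * msecs_to_jiffies()/jiffies_to_msecs(), hence "60000" for the 60 s
 * default shown in the cover letter.
 */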
Reviewed-by: Daniel Wagner <dwagner@suse.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
---
drivers/nvme/host/core.c | 1 +
drivers/nvme/host/nvme.h | 1 +
drivers/nvme/host/pci.c | 2 +-
drivers/nvme/host/sysfs.c | 41 +++++++++++++++++++++++++++++++++++++++
4 files changed, 44 insertions(+), 1 deletion(-)
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 84f295e3bf08..fe6dcd19cecb 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -5144,6 +5144,7 @@ int nvme_init_ctrl(struct nvme_ctrl *ctrl, struct device *dev,
memset(&ctrl->ka_cmd, 0, sizeof(ctrl->ka_cmd));
ctrl->ka_cmd.common.opcode = nvme_admin_keep_alive;
ctrl->ka_last_check_time = jiffies;
+ ctrl->admin_timeout = NVME_ADMIN_TIMEOUT;
BUILD_BUG_ON(NVME_DSM_MAX_RANGES * sizeof(struct nvme_dsm_range) >
PAGE_SIZE);
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 6f9ecb4948f4..7923533cce00 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -370,6 +370,7 @@ struct nvme_ctrl {
u16 mtfa;
u32 ctrl_config;
u32 queue_count;
+ u32 admin_timeout;
u64 cap;
u32 max_hw_sectors;
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 8dc353451b21..80d2e517aac3 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -3096,7 +3096,7 @@ static bool __nvme_delete_io_queues(struct nvme_dev *dev, u8 opcode)
unsigned long timeout;
retry:
- timeout = NVME_ADMIN_TIMEOUT;
+ timeout = dev->ctrl.admin_timeout;
while (nr_queues > 0) {
if (nvme_delete_queue(&dev->queues[nr_queues], opcode))
break;
diff --git a/drivers/nvme/host/sysfs.c b/drivers/nvme/host/sysfs.c
index e59758616f27..3b39b64cd9da 100644
--- a/drivers/nvme/host/sysfs.c
+++ b/drivers/nvme/host/sysfs.c
@@ -623,6 +623,46 @@ static ssize_t quirks_show(struct device *dev, struct device_attribute *attr,
}
static DEVICE_ATTR_RO(quirks);
+static ssize_t nvme_admin_timeout_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct nvme_ctrl *ctrl = dev_get_drvdata(dev);
+
+ return sysfs_emit(buf, "%u\n",
+ jiffies_to_msecs(ctrl->admin_timeout));
+}
+
+static ssize_t nvme_admin_timeout_store(struct device *dev,
+ struct device_attribute *attr,
+ const char *buf, size_t count)
+{
+ struct nvme_ctrl *ctrl = dev_get_drvdata(dev);
+ u32 timeout;
+ int err;
+
+ /*
+ * Wait until the controller reaches the LIVE state to be sure that
+ * admin_q and fabrics_q are properly initialized.
+ */
+ if (!test_bit(NVME_CTRL_STARTED_ONCE, &ctrl->flags))
+ return -EBUSY;
+
+ err = kstrtou32(buf, 10, &timeout);
+ if (err || !timeout)
+ return -EINVAL;
+
+ ctrl->admin_timeout = msecs_to_jiffies(timeout);
+
+ blk_queue_rq_timeout(ctrl->admin_q, ctrl->admin_timeout);
+ if (ctrl->fabrics_q)
+ blk_queue_rq_timeout(ctrl->fabrics_q, ctrl->admin_timeout);
+
+ return count;
+}
+
+static DEVICE_ATTR(admin_timeout, S_IRUGO | S_IWUSR,
+ nvme_admin_timeout_show, nvme_admin_timeout_store);
+
#ifdef CONFIG_NVME_HOST_AUTH
static ssize_t nvme_ctrl_dhchap_secret_show(struct device *dev,
struct device_attribute *attr, char *buf)
@@ -765,6 +805,7 @@ static struct attribute *nvme_dev_attrs[] = {
&dev_attr_cntrltype.attr,
&dev_attr_dctype.attr,
&dev_attr_quirks.attr,
+ &dev_attr_admin_timeout.attr,
#ifdef CONFIG_NVME_HOST_AUTH
&dev_attr_dhchap_secret.attr,
&dev_attr_dhchap_ctrl_secret.attr,
--
2.54.0
* [PATCH V5 4/7] nvme: add sysfs attribute to change IO timeout per controller
From: Maurizio Lombardi @ 2026-05-14 8:32 UTC (permalink / raw)
To: kbusch
Cc: mheyne, emilne, jmeneghi, linux-nvme, dwagner, mlombard,
mkhalfella, chaitanyak, hare, hch
Currently, there is no way to adjust the timeout value of NVMe I/O
queues on a per-controller basis.
Add an io_timeout attribute to nvme so that different nvme controllers
which may have different timeout requirements can have custom I/O
timeouts set.
The I/O timeout is also applied to the connect queue (connect_q).
In NVMe over Fabrics, the connect queue is utilized specifically to
issue Connect commands that establish the I/O queues.
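To illustrate, here is a sketch modeled on nvmf_connect_io_queue() in
drivers/nvme/host/fabrics.c (argument lists are elided and may differ
from the current code): the Connect command for each I/O queue is a
synchronous command submitted on ctrl->connect_q, so it is timed by
that queue's rq_timeout, which this patch now derives from
ctrl->io_timeout.

int nvmf_connect_io_queue(struct nvme_ctrl *ctrl, u16 qid)
{
	struct nvme_command cmd = { };

	cmd.connect.opcode = nvme_fabrics_command;
	cmd.connect.fctype = nvme_fabrics_type_connect;
	cmd.connect.qid = cpu_to_le16(qid);
	/* ... build the connect data blob, then submit on connect_q: */
	return __nvme_submit_sync_cmd(ctrl->connect_q, &cmd,
				      /* ... remaining arguments ... */);
}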
Reviewed-by: Mohamed Khalfella <mkhalfella@purestorage.com>
Reviewed-by: Daniel Wagner <dwagner@suse.de>
Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
---
drivers/nvme/host/core.c | 4 +++-
drivers/nvme/host/nvme.h | 1 +
drivers/nvme/host/sysfs.c | 47 +++++++++++++++++++++++++++++++++++++++
3 files changed, 51 insertions(+), 1 deletion(-)
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index fe6dcd19cecb..626db6088dcd 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -4207,6 +4207,7 @@ static void nvme_alloc_ns(struct nvme_ctrl *ctrl, struct nvme_ns_info *info)
mutex_unlock(&ctrl->namespaces_lock);
goto out_unlink_ns;
}
+ blk_queue_rq_timeout(ns->queue, ctrl->io_timeout);
nvme_ns_add_to_ctrl_list(ns);
mutex_unlock(&ctrl->namespaces_lock);
synchronize_srcu(&ctrl->srcu);
@@ -5145,6 +5146,7 @@ int nvme_init_ctrl(struct nvme_ctrl *ctrl, struct device *dev,
ctrl->ka_cmd.common.opcode = nvme_admin_keep_alive;
ctrl->ka_last_check_time = jiffies;
ctrl->admin_timeout = NVME_ADMIN_TIMEOUT;
+ ctrl->io_timeout = NVME_IO_TIMEOUT;
BUILD_BUG_ON(NVME_DSM_MAX_RANGES * sizeof(struct nvme_dsm_range) >
PAGE_SIZE);
@@ -5253,7 +5255,7 @@ EXPORT_SYMBOL_GPL(nvme_unfreeze);
int nvme_wait_freeze_timeout(struct nvme_ctrl *ctrl)
{
- unsigned long timeout = NVME_IO_TIMEOUT;
+ unsigned long timeout = ctrl->io_timeout;
struct nvme_ns *ns;
int srcu_idx;
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 7923533cce00..9ccaed0b9dbf 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -371,6 +371,7 @@ struct nvme_ctrl {
u32 ctrl_config;
u32 queue_count;
u32 admin_timeout;
+ u32 io_timeout;
u64 cap;
u32 max_hw_sectors;
diff --git a/drivers/nvme/host/sysfs.c b/drivers/nvme/host/sysfs.c
index 3b39b64cd9da..b682c1a4b23f 100644
--- a/drivers/nvme/host/sysfs.c
+++ b/drivers/nvme/host/sysfs.c
@@ -663,6 +663,52 @@ static ssize_t nvme_admin_timeout_store(struct device *dev,
static DEVICE_ATTR(admin_timeout, S_IRUGO | S_IWUSR,
nvme_admin_timeout_show, nvme_admin_timeout_store);
+static ssize_t nvme_io_timeout_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct nvme_ctrl *ctrl = dev_get_drvdata(dev);
+
+ return sysfs_emit(buf, "%u\n", jiffies_to_msecs(ctrl->io_timeout));
+}
+
+static ssize_t nvme_io_timeout_store(struct device *dev,
+ struct device_attribute *attr,
+ const char *buf, size_t count)
+{
+ struct nvme_ctrl *ctrl = dev_get_drvdata(dev);
+ struct nvme_ns *ns;
+ u32 timeout;
+ int err;
+
+ /*
+ * Wait until the controller reaches the LIVE state to be sure that
+ * connect_q is properly initialized.
+ */
+ if (!test_bit(NVME_CTRL_STARTED_ONCE, &ctrl->flags))
+ return -EBUSY;
+
+ err = kstrtou32(buf, 10, &timeout);
+ if (err || !timeout)
+ return -EINVAL;
+
+ /* Take the namespaces_lock to avoid racing against nvme_alloc_ns() */
+ mutex_lock(&ctrl->namespaces_lock);
+
+ ctrl->io_timeout = msecs_to_jiffies(timeout);
+ list_for_each_entry(ns, &ctrl->namespaces, list)
+ blk_queue_rq_timeout(ns->queue, ctrl->io_timeout);
+
+ mutex_unlock(&ctrl->namespaces_lock);
+
+ if (ctrl->connect_q)
+ blk_queue_rq_timeout(ctrl->connect_q, ctrl->io_timeout);
+
+ return count;
+}
+
+static DEVICE_ATTR(io_timeout, S_IRUGO | S_IWUSR,
+ nvme_io_timeout_show, nvme_io_timeout_store);
+
#ifdef CONFIG_NVME_HOST_AUTH
static ssize_t nvme_ctrl_dhchap_secret_show(struct device *dev,
struct device_attribute *attr, char *buf)
@@ -806,6 +852,7 @@ static struct attribute *nvme_dev_attrs[] = {
&dev_attr_dctype.attr,
&dev_attr_quirks.attr,
&dev_attr_admin_timeout.attr,
+ &dev_attr_io_timeout.attr,
#ifdef CONFIG_NVME_HOST_AUTH
&dev_attr_dhchap_secret.attr,
&dev_attr_dhchap_ctrl_secret.attr,
--
2.54.0
* [PATCH V5 5/7] nvme-core: align fabrics_q teardown with admin_q in nvme_free_ctrl
From: Maurizio Lombardi @ 2026-05-14 8:32 UTC (permalink / raw)
To: kbusch
Cc: mheyne, emilne, jmeneghi, linux-nvme, dwagner, mlombard,
mkhalfella, chaitanyak, hare, hch
Currently, the final reference for the fabrics admin queue (fabrics_q)
is dropped inside nvme_remove_admin_tag_set(). However, the primary admin
queue (admin_q) defers dropping its final reference until
nvme_free_ctrl().
Move the blk_put_queue() call for fabrics_q from
nvme_remove_admin_tag_set() to nvme_free_ctrl(). This aligns the
lifecycle management of both admin queues, ensuring they are freed
symmetrically when the controller is finally torn down.
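The resulting pairing, sketched for clarity (condensed from the diff
below):

/*
 * nvme_remove_admin_tag_set():  blk_mq_destroy_queue(ctrl->admin_q);
 *                               blk_mq_destroy_queue(ctrl->fabrics_q);
 *
 * nvme_free_ctrl():             blk_put_queue(ctrl->admin_q);
 *                               blk_put_queue(ctrl->fabrics_q);
 *
 * Both queues are now destroyed together and have their final
 * references dropped together.
 */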
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Daniel Wagner <dwagner@suse.de>
Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
---
drivers/nvme/host/core.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 626db6088dcd..e6bb1f5e9657 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -4936,10 +4936,8 @@ void nvme_remove_admin_tag_set(struct nvme_ctrl *ctrl)
*/
nvme_stop_keep_alive(ctrl);
blk_mq_destroy_queue(ctrl->admin_q);
- if (ctrl->ops->flags & NVME_F_FABRICS) {
+ if (ctrl->fabrics_q)
blk_mq_destroy_queue(ctrl->fabrics_q);
- blk_put_queue(ctrl->fabrics_q);
- }
blk_mq_free_tag_set(ctrl->admin_tagset);
}
EXPORT_SYMBOL_GPL(nvme_remove_admin_tag_set);
@@ -5081,6 +5079,8 @@ static void nvme_free_ctrl(struct device *dev)
if (ctrl->admin_q)
blk_put_queue(ctrl->admin_q);
+ if (ctrl->fabrics_q)
+ blk_put_queue(ctrl->fabrics_q);
if (!subsys || ctrl->instance != subsys->instance)
ida_free(&nvme_instance_ida, ctrl->instance);
nvme_free_cels(ctrl);
--
2.54.0
* [PATCH V5 6/7] nvmet-loop: do not alloc admin tag set during reset
From: Maurizio Lombardi @ 2026-05-14 8:32 UTC (permalink / raw)
To: kbusch
Cc: mheyne, emilne, jmeneghi, linux-nvme, dwagner, mlombard,
mkhalfella, chaitanyak, hare, hch
Currently, resetting a loopback controller unconditionally invokes
nvme_alloc_admin_tag_set() inside nvme_loop_configure_admin_queue().
Doing so drops the old queue and allocates a new one. Consequently,
this reverts the admin queue's timeout (q->rq_timeout) back to the
module default (NVME_ADMIN_TIMEOUT), completely wiping out any custom
timeout values the user may have configured via sysfs and potentially
racing against the sysfs nvme_admin_timeout_store() function
that may dereference the admin_q pointer during the RESETTING state.
Decouple the admin tag set lifecycle from the admin queue
configuration and destruction paths, which are executed during resets.
Specifically (a condensed sketch of the resulting flow follows the
list):
* Move nvme_alloc_admin_tag_set() into nvme_loop_create_ctrl() so it
is only allocated once during the initial controller creation.
* Defer the destruction of the admin tag set to
nvme_loop_delete_ctrl_host() and the terminal error-handling
paths of nvme_loop_reset_ctrl_work() and
nvme_loop_create_ctrl().
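A condensed sketch of the resulting flow (not verbatim code; see the
diff below for the details):

/*
 * nvme_loop_create_ctrl()
 *     nvme_alloc_admin_tag_set()        - done once, at create time
 *     nvme_loop_configure_admin_queue() - no tag set allocation
 *
 * nvme_loop_reset_ctrl_work()
 *     nvme_loop_configure_admin_queue() - reuses the existing tag set,
 *                                         so the admin queue and its
 *                                         rq_timeout survive resets
 *
 * nvme_loop_delete_ctrl_host() / terminal error paths
 *     nvme_remove_admin_tag_set()       - tag set torn down last
 */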
Reviewed-by: Daniel Wagner <dwagner@suse.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@kernel.org>
Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
---
drivers/nvme/target/loop.c | 31 ++++++++++++++++---------------
1 file changed, 16 insertions(+), 15 deletions(-)
diff --git a/drivers/nvme/target/loop.c b/drivers/nvme/target/loop.c
index d98d0cdc5d6f..070d16068e6b 100644
--- a/drivers/nvme/target/loop.c
+++ b/drivers/nvme/target/loop.c
@@ -274,7 +274,6 @@ static void nvme_loop_destroy_admin_queue(struct nvme_loop_ctrl *ctrl)
nvmet_sq_destroy(&ctrl->queues[0].nvme_sq);
nvmet_cq_put(&ctrl->queues[0].nvme_cq);
- nvme_remove_admin_tag_set(&ctrl->ctrl);
}
static void nvme_loop_free_ctrl(struct nvme_ctrl *nctrl)
@@ -375,25 +374,18 @@ static int nvme_loop_configure_admin_queue(struct nvme_loop_ctrl *ctrl)
}
ctrl->ctrl.queue_count = 1;
- error = nvme_alloc_admin_tag_set(&ctrl->ctrl, &ctrl->admin_tag_set,
- &nvme_loop_admin_mq_ops,
- sizeof(struct nvme_loop_iod) +
- NVME_INLINE_SG_CNT * sizeof(struct scatterlist));
- if (error)
- goto out_free_sq;
-
/* reset stopped state for the fresh admin queue */
clear_bit(NVME_CTRL_ADMIN_Q_STOPPED, &ctrl->ctrl.flags);
error = nvmf_connect_admin_queue(&ctrl->ctrl);
if (error)
- goto out_cleanup_tagset;
+ goto out_free_sq;
set_bit(NVME_LOOP_Q_LIVE, &ctrl->queues[0].flags);
error = nvme_enable_ctrl(&ctrl->ctrl);
if (error)
- goto out_cleanup_tagset;
+ goto out_free_sq;
ctrl->ctrl.max_hw_sectors =
(NVME_LOOP_MAX_SEGMENTS - 1) << PAGE_SECTORS_SHIFT;
@@ -402,14 +394,12 @@ static int nvme_loop_configure_admin_queue(struct nvme_loop_ctrl *ctrl)
error = nvme_init_ctrl_finish(&ctrl->ctrl, false);
if (error)
- goto out_cleanup_tagset;
+ goto out_free_sq;
return 0;
-out_cleanup_tagset:
- clear_bit(NVME_LOOP_Q_LIVE, &ctrl->queues[0].flags);
- nvme_remove_admin_tag_set(&ctrl->ctrl);
out_free_sq:
+ clear_bit(NVME_LOOP_Q_LIVE, &ctrl->queues[0].flags);
nvmet_sq_destroy(&ctrl->queues[0].nvme_sq);
nvmet_cq_put(&ctrl->queues[0].nvme_cq);
return error;
@@ -432,6 +422,7 @@ static void nvme_loop_shutdown_ctrl(struct nvme_loop_ctrl *ctrl)
static void nvme_loop_delete_ctrl_host(struct nvme_ctrl *ctrl)
{
nvme_loop_shutdown_ctrl(to_loop_ctrl(ctrl));
+ nvme_remove_admin_tag_set(ctrl);
}
static void nvme_loop_delete_ctrl(struct nvmet_ctrl *nctrl)
@@ -494,6 +485,7 @@ static void nvme_loop_reset_ctrl_work(struct work_struct *work)
nvme_cancel_admin_tagset(&ctrl->ctrl);
nvme_loop_destroy_admin_queue(ctrl);
out_disable:
+ nvme_remove_admin_tag_set(&ctrl->ctrl);
dev_warn(ctrl->ctrl.device, "Removing after reset failure\n");
nvme_uninit_ctrl(&ctrl->ctrl);
}
@@ -594,10 +586,17 @@ static struct nvme_ctrl *nvme_loop_create_ctrl(struct device *dev,
if (!ctrl->queues)
goto out_uninit_ctrl;
- ret = nvme_loop_configure_admin_queue(ctrl);
+ ret = nvme_alloc_admin_tag_set(&ctrl->ctrl, &ctrl->admin_tag_set,
+ &nvme_loop_admin_mq_ops,
+ sizeof(struct nvme_loop_iod) +
+ NVME_INLINE_SG_CNT * sizeof(struct scatterlist));
if (ret)
goto out_free_queues;
+ ret = nvme_loop_configure_admin_queue(ctrl);
+ if (ret)
+ goto out_remove_admin_tagset;
+
if (opts->queue_size > ctrl->ctrl.maxcmd) {
/* warn if maxcmd is lower than queue_size */
dev_warn(ctrl->ctrl.device,
@@ -633,6 +632,8 @@ static struct nvme_ctrl *nvme_loop_create_ctrl(struct device *dev,
nvme_quiesce_admin_queue(&ctrl->ctrl);
nvme_cancel_admin_tagset(&ctrl->ctrl);
nvme_loop_destroy_admin_queue(ctrl);
+out_remove_admin_tagset:
+ nvme_remove_admin_tag_set(&ctrl->ctrl);
out_free_queues:
kfree(ctrl->queues);
out_uninit_ctrl:
--
2.54.0
* [PATCH V5 7/7] nvme-core: warn on allocating admin tag set with existing queue
From: Maurizio Lombardi @ 2026-05-14 8:32 UTC (permalink / raw)
To: kbusch
Cc: mheyne, emilne, jmeneghi, linux-nvme, dwagner, mlombard,
mkhalfella, chaitanyak, hare, hch
Currently, nvme_alloc_admin_tag_set() silently drops and releases
the existing admin_q if it is called on a controller that already
has one (e.g., during a controller reset).
However, transport drivers should not reallocate the admin tag
set and queue during a reset. Dropping the old queue and allocating
a new one destroys user-configured timeouts and may race against
nvme_admin_timeout_store().
Since all transport drivers are now expected to preserve the admin queue
across resets, calling nvme_alloc_admin_tag_set() when ctrl->admin_q
is already populated is a bug.
Remove the silent cleanup and replace it with a WARN_ON_ONCE() to
explicitly catch any transport driver that violates this lifecycle rule.
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Daniel Wagner <dwagner@suse.de>
Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
---
drivers/nvme/host/core.c | 7 +------
1 file changed, 1 insertion(+), 6 deletions(-)
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index e6bb1f5e9657..f23a957c6e8f 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -4893,12 +4893,7 @@ int nvme_alloc_admin_tag_set(struct nvme_ctrl *ctrl, struct blk_mq_tag_set *set,
if (ret)
return ret;
- /*
- * If a previous admin queue exists (e.g., from before a reset),
- * put it now before allocating a new one to avoid orphaning it.
- */
- if (ctrl->admin_q)
- blk_put_queue(ctrl->admin_q);
+ WARN_ON_ONCE(ctrl->admin_q);
ctrl->admin_q = blk_mq_alloc_queue(set, NULL, NULL);
if (IS_ERR(ctrl->admin_q)) {
--
2.54.0