* [PATCH v2] nvme: Let the blocklayer set timeouts for requests
@ 2025-12-04 14:11 Heyne, Maximilian
2025-12-04 16:13 ` Keith Busch
0 siblings, 1 reply; 6+ messages in thread
From: Heyne, Maximilian @ 2025-12-04 14:11 UTC (permalink / raw)
Cc: Heyne, Maximilian, Keith Busch, Jens Axboe, Christoph Hellwig,
Sagi Grimberg, linux-nvme@lists.infradead.org,
linux-kernel@vger.kernel.org
When initializing an nvme request which is about to be send to the block
layer, we do not need to initialize its timeout. If it's left
uninitialized at 0 the block layer will use the request queue's timeout
in blk_add_timer (via nvme_start_request which is called from
nvme_*_queue_rq). These timeouts are setup to either NVME_IO_TIMEOUT or
NVME_ADMIN_TIMEOUT when the request queues were created.
Because the io_timeout of the IO queues can be modified via sysfs, the
following situation can occur:
1) NVME_IO_TIMEOUT = 30 (default module parameter)
2) nvme1n1 is probed. IO queues default timeout is 30 s
3) manually change the IO timeout to 90 s
echo 90000 > /sys/class/nvme/nvme1/nvme1n1/queue/io_timeout
4) Any call of __submit_sync_cmd on nvme1n1 to an IO queue will issue
commands with the 30 s timeout instead of the wanted 90 s which might
be more suitable for this device.
Commit 470e900c8036 ("nvme: refactor nvme_alloc_request") silently
changed the behavior for ioctl's already because it unconditionally
overrides the request's timeout that was set in nvme_init_request. If it
was unset by the user of the ioctl if will be overridden with 0 meaning
the block layer will pick the request queue's IO timeout.
Following up on that, this patch further improves the consistency of IO
timeout usage. However, there are still uses of NVME_IO_TIMEOUT which
could be inconsistent with what is set in the device's request_queue by
the user.
Signed-off-by: Maximilian Heyne <mheyne@amazon.de>
---
drivers/nvme/host/core.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 7bf228df6001..b9315f0abf80 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -724,10 +724,8 @@ void nvme_init_request(struct request *req, struct nvme_command *cmd)
struct nvme_ns *ns = req->q->disk->private_data;
logging_enabled = ns->head->passthru_err_log_enabled;
- req->timeout = NVME_IO_TIMEOUT;
} else { /* no queuedata implies admin queue */
logging_enabled = nr->ctrl->passthru_err_log_enabled;
- req->timeout = NVME_ADMIN_TIMEOUT;
}
if (!logging_enabled)
--
2.47.3
Amazon Web Services Development Center Germany GmbH
Tamara-Danz-Str. 13
10243 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Christof Hellmis
Eingetragen am Amtsgericht Charlottenburg unter HRB 257764 B
Sitz: Berlin
Ust-ID: DE 365 538 597
^ permalink raw reply related [flat|nested] 6+ messages in thread* Re: [PATCH v2] nvme: Let the blocklayer set timeouts for requests 2025-12-04 14:11 [PATCH v2] nvme: Let the blocklayer set timeouts for requests Heyne, Maximilian @ 2025-12-04 16:13 ` Keith Busch 2025-12-04 19:34 ` Heyne, Maximilian 0 siblings, 1 reply; 6+ messages in thread From: Keith Busch @ 2025-12-04 16:13 UTC (permalink / raw) To: Heyne, Maximilian Cc: Jens Axboe, Christoph Hellwig, Sagi Grimberg, linux-nvme@lists.infradead.org, linux-kernel@vger.kernel.org On Thu, Dec 04, 2025 at 02:11:50PM +0000, Heyne, Maximilian wrote: > @@ -724,10 +724,8 @@ void nvme_init_request(struct request *req, struct nvme_command *cmd) > struct nvme_ns *ns = req->q->disk->private_data; > > logging_enabled = ns->head->passthru_err_log_enabled; > - req->timeout = NVME_IO_TIMEOUT; > } else { /* no queuedata implies admin queue */ > logging_enabled = nr->ctrl->passthru_err_log_enabled; > - req->timeout = NVME_ADMIN_TIMEOUT; > } I was trying to think of any in-kernel path using __submit_sync_cmd with an IO queue, and quick search shows there's just one: zns report zones. Everything else uses the admin queue, which doesn't have a sysfs tunable for its request_queue's default timeout. All we have is the nvme module parameter, which is writable after loading. Since that's the only way a user can modify the default time for that queue, I think we need to leave that req->timeout value as-is. ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH v2] nvme: Let the blocklayer set timeouts for requests 2025-12-04 16:13 ` Keith Busch @ 2025-12-04 19:34 ` Heyne, Maximilian 2025-12-05 7:33 ` Heyne, Maximilian 0 siblings, 1 reply; 6+ messages in thread From: Heyne, Maximilian @ 2025-12-04 19:34 UTC (permalink / raw) To: Keith Busch Cc: Jens Axboe, Christoph Hellwig, Sagi Grimberg, linux-nvme@lists.infradead.org, linux-kernel@vger.kernel.org On Thu, Dec 04, 2025 at 09:13:35AM -0700, Keith Busch wrote: > On Thu, Dec 04, 2025 at 02:11:50PM +0000, Heyne, Maximilian wrote: > > @@ -724,10 +724,8 @@ void nvme_init_request(struct request *req, struct nvme_command *cmd) > > struct nvme_ns *ns = req->q->disk->private_data; > > > > logging_enabled = ns->head->passthru_err_log_enabled; > > - req->timeout = NVME_IO_TIMEOUT; > > } else { /* no queuedata implies admin queue */ > > logging_enabled = nr->ctrl->passthru_err_log_enabled; > > - req->timeout = NVME_ADMIN_TIMEOUT; > > } > > I was trying to think of any in-kernel path using __submit_sync_cmd with > an IO queue, and quick search shows there's just one: zns report zones. > > Everything else uses the admin queue, which doesn't have a sysfs tunable > for its request_queue's default timeout. All we have is the nvme module > parameter, which is writable after loading. Since that's the only way a > user can modify the default time for that queue, I think we need to > leave that req->timeout value as-is. Ok sound like a v3 is needed where I only delete the line with NVME_IO_TIMEOUT but leave the NVME_ADMIN_TIMEOUT and add a comment about it. Will prepare such a patch. Thanks for you reviews. Amazon Web Services Development Center Germany GmbH Tamara-Danz-Str. 13 10243 Berlin Geschaeftsfuehrung: Christian Schlaeger, Christof Hellmis Eingetragen am Amtsgericht Charlottenburg unter HRB 257764 B Sitz: Berlin Ust-ID: DE 365 538 597 ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH v2] nvme: Let the blocklayer set timeouts for requests 2025-12-04 19:34 ` Heyne, Maximilian @ 2025-12-05 7:33 ` Heyne, Maximilian 2025-12-12 13:43 ` Sagi Grimberg 0 siblings, 1 reply; 6+ messages in thread From: Heyne, Maximilian @ 2025-12-05 7:33 UTC (permalink / raw) To: Keith Busch Cc: Jens Axboe, Christoph Hellwig, Sagi Grimberg, linux-nvme@lists.infradead.org, linux-kernel@vger.kernel.org On Thu, Dec 04, 2025 at 07:34:26PM +0000, Heyne, Maximilian wrote: > On Thu, Dec 04, 2025 at 09:13:35AM -0700, Keith Busch wrote: > > On Thu, Dec 04, 2025 at 02:11:50PM +0000, Heyne, Maximilian wrote: > > > @@ -724,10 +724,8 @@ void nvme_init_request(struct request *req, struct nvme_command *cmd) > > > struct nvme_ns *ns = req->q->disk->private_data; > > > > > > logging_enabled = ns->head->passthru_err_log_enabled; > > > - req->timeout = NVME_IO_TIMEOUT; > > > } else { /* no queuedata implies admin queue */ > > > logging_enabled = nr->ctrl->passthru_err_log_enabled; > > > - req->timeout = NVME_ADMIN_TIMEOUT; > > > } > > > > I was trying to think of any in-kernel path using __submit_sync_cmd with > > an IO queue, and quick search shows there's just one: zns report zones. > > > > Everything else uses the admin queue, which doesn't have a sysfs tunable > > for its request_queue's default timeout. All we have is the nvme module > > parameter, which is writable after loading. Since that's the only way a > > user can modify the default time for that queue, I think we need to > > leave that req->timeout value as-is. > > Ok sound like a v3 is needed where I only delete the line with > NVME_IO_TIMEOUT but leave the NVME_ADMIN_TIMEOUT and add a comment about > it. Will prepare such a patch. > I thought about this a bit more. Considering that the module parameters can be written to produces even more inconsistencies. With my proposed change we will have the following situation: 1) when a device is probed current settings for IO and admin timeout from the module parameters will be the respective default timeouts 2) almost all admin commands to the device will default to the initial admin timeout 3) almost all IO commands will adhere to whatever is set via sysfs or the initial default -> changes to the module parameters will (mostly) not affect already probed devices When we keep initializing the admin timeouts to the module parameter as you suggest (so half of my patch), we'll have the following situation: 1) same as above 2) some admin commands will use the module parameters admin timeout, some (ioctl's via nvme_submit_user_cmd) will use the admin queue's timeout set in 1). This can be made consistent if we fix nvme_submit_user_cmd. 3) same as above; IO timeout not affected by module parameter changes. -> changes to the module parameters affect only the admin commands but (mostly) not the IO commands which is inconsistent behavior. Therefore, I wonder whether people are actually making changes to the module parameters at runtime to adjust all device's admin timeouts. In that case it's at least broken for ioctl's currently. Should we make that timeout runtime configurable per-device too instead? In summary, I think my patch as-is leads to better consistency. Amazon Web Services Development Center Germany GmbH Tamara-Danz-Str. 13 10243 Berlin Geschaeftsfuehrung: Christian Schlaeger, Christof Hellmis Eingetragen am Amtsgericht Charlottenburg unter HRB 257764 B Sitz: Berlin Ust-ID: DE 365 538 597 ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH v2] nvme: Let the blocklayer set timeouts for requests 2025-12-05 7:33 ` Heyne, Maximilian @ 2025-12-12 13:43 ` Sagi Grimberg 2025-12-12 14:39 ` Maurizio Lombardi 0 siblings, 1 reply; 6+ messages in thread From: Sagi Grimberg @ 2025-12-12 13:43 UTC (permalink / raw) To: Heyne, Maximilian, Keith Busch Cc: Jens Axboe, Christoph Hellwig, linux-nvme@lists.infradead.org, linux-kernel@vger.kernel.org On 05/12/2025 9:33, Heyne, Maximilian wrote: > On Thu, Dec 04, 2025 at 07:34:26PM +0000, Heyne, Maximilian wrote: >> On Thu, Dec 04, 2025 at 09:13:35AM -0700, Keith Busch wrote: >>> On Thu, Dec 04, 2025 at 02:11:50PM +0000, Heyne, Maximilian wrote: >>>> @@ -724,10 +724,8 @@ void nvme_init_request(struct request *req, struct nvme_command *cmd) >>>> struct nvme_ns *ns = req->q->disk->private_data; >>>> >>>> logging_enabled = ns->head->passthru_err_log_enabled; >>>> - req->timeout = NVME_IO_TIMEOUT; >>>> } else { /* no queuedata implies admin queue */ >>>> logging_enabled = nr->ctrl->passthru_err_log_enabled; >>>> - req->timeout = NVME_ADMIN_TIMEOUT; >>>> } >>> I was trying to think of any in-kernel path using __submit_sync_cmd with >>> an IO queue, and quick search shows there's just one: zns report zones. >>> >>> Everything else uses the admin queue, which doesn't have a sysfs tunable >>> for its request_queue's default timeout. All we have is the nvme module >>> parameter, which is writable after loading. Since that's the only way a >>> user can modify the default time for that queue, I think we need to >>> leave that req->timeout value as-is. >> Ok sound like a v3 is needed where I only delete the line with >> NVME_IO_TIMEOUT but leave the NVME_ADMIN_TIMEOUT and add a comment about >> it. Will prepare such a patch. >> > I thought about this a bit more. Considering that the module parameters > can be written to produces even more inconsistencies. > > With my proposed change we will have the following situation: > 1) when a device is probed current settings for IO and admin timeout from > the module parameters will be the respective default timeouts > 2) almost all admin commands to the device will default to the initial > admin timeout > 3) almost all IO commands will adhere to whatever is set via sysfs or > the initial default > > -> changes to the module parameters will (mostly) not affect already > probed devices > > When we keep initializing the admin timeouts to the module parameter as > you suggest (so half of my patch), we'll have the following situation: > > 1) same as above > 2) some admin commands will use the module parameters admin timeout, > some (ioctl's via nvme_submit_user_cmd) will use the admin queue's timeout set in 1). > This can be made consistent if we fix nvme_submit_user_cmd. > 3) same as above; IO timeout not affected by module parameter changes. > > -> changes to the module parameters affect only the admin commands but > (mostly) not the IO commands which is inconsistent behavior. > > Therefore, I wonder whether people are actually making changes to the > module parameters at runtime to adjust all device's admin timeouts. In > that case it's at least broken for ioctl's currently. > Should we make that timeout runtime configurable per-device too instead? > > In summary, I think my patch as-is leads to better consistency. Perhaps you can simply add admin_timeout sysfs file for the controller that would alter the set->timeout and take the value from there in nvme_init_request ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH v2] nvme: Let the blocklayer set timeouts for requests 2025-12-12 13:43 ` Sagi Grimberg @ 2025-12-12 14:39 ` Maurizio Lombardi 0 siblings, 0 replies; 6+ messages in thread From: Maurizio Lombardi @ 2025-12-12 14:39 UTC (permalink / raw) To: Sagi Grimberg, Heyne, Maximilian, Keith Busch Cc: Jens Axboe, Christoph Hellwig, linux-nvme@lists.infradead.org, linux-kernel@vger.kernel.org On Fri Dec 12, 2025 at 2:43 PM CET, Sagi Grimberg wrote: > > Perhaps you can simply add admin_timeout sysfs file for the controller > that would alter > the set->timeout and take the value from there in nvme_init_request Curiously, this is something I was looking at recently. Could it be done by calling blk_queue_rq_timeout() in sysfs and removing "req->timeout = NVME_ADMIN_TIMEOUT;" from nvme_init_request() ? From 1a36c2fcafc2502298d25da64cf3740721560f30 Mon Sep 17 00:00:00 2001 From: Maurizio Lombardi <mlombard@redhat.com> Date: Tue, 11 Nov 2025 09:13:58 +0100 Subject: [PATCH] nvme: add sysfs attribute to change admin timeout per nvme controller Currently, there is no method to adjust the timeout values on a per controller basis with nvme admin queues. Add an admin_timeout attribute to nvme so that different nvme controllers which may have different timeout requirements can have custom admin timeouts set. Signed-off-by: Maurizio Lombardi <mlombard@redhat.com> --- drivers/nvme/host/sysfs.c | 31 +++++++++++++++++++++++++++++++ 1 file changed, 31 insertions(+) diff --git a/drivers/nvme/host/sysfs.c b/drivers/nvme/host/sysfs.c index 29430949ce2f..da83b13d2cdd 100644 --- a/drivers/nvme/host/sysfs.c +++ b/drivers/nvme/host/sysfs.c @@ -601,6 +601,36 @@ static ssize_t dctype_show(struct device *dev, } static DEVICE_ATTR_RO(dctype); +static ssize_t nvme_admin_timeout_show(struct device *dev, + struct device_attribute *attr, char *buf) +{ + struct nvme_ctrl *ctrl = dev_get_drvdata(dev); + + return sysfs_emit(buf, "%u\n", + jiffies_to_msecs(ctrl->admin_q->rq_timeout) / 1000); +} + +static ssize_t nvme_admin_timeout_store(struct device *dev, + struct device_attribute *attr, + const char *buf, size_t count) +{ + struct nvme_ctrl *ctrl = dev_get_drvdata(dev); + u16 timeout; + int err; + + err = kstrtou16(buf, 10, &timeout); + if (err || !timeout) + return -EINVAL; + + blk_queue_rq_timeout(ctrl->admin_q, secs_to_jiffies(timeout)); + + return count; +} + +static struct device_attribute dev_attr_admin_timeout = \ + __ATTR(admin_timeout, S_IRUGO | S_IWUSR, \ + nvme_admin_timeout_show, nvme_admin_timeout_store); + #ifdef CONFIG_NVME_HOST_AUTH static ssize_t nvme_ctrl_dhchap_secret_show(struct device *dev, struct device_attribute *attr, char *buf) @@ -742,6 +772,7 @@ static struct attribute *nvme_dev_attrs[] = { &dev_attr_kato.attr, &dev_attr_cntrltype.attr, &dev_attr_dctype.attr, + &dev_attr_admin_timeout.attr, #ifdef CONFIG_NVME_HOST_AUTH &dev_attr_dhchap_secret.attr, &dev_attr_dhchap_ctrl_secret.attr, -- 2.47.3 ^ permalink raw reply related [flat|nested] 6+ messages in thread
end of thread, other threads:[~2025-12-12 14:46 UTC | newest] Thread overview: 6+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2025-12-04 14:11 [PATCH v2] nvme: Let the blocklayer set timeouts for requests Heyne, Maximilian 2025-12-04 16:13 ` Keith Busch 2025-12-04 19:34 ` Heyne, Maximilian 2025-12-05 7:33 ` Heyne, Maximilian 2025-12-12 13:43 ` Sagi Grimberg 2025-12-12 14:39 ` Maurizio Lombardi
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).