* [PATCH 1/4] nvme: use queue's configured timeout not global config value for passthru
2024-09-13 13:44 [PATCH 0/4] cleaning up nvme timeout over-dependence on global values David Jeffery
@ 2024-09-13 13:44 ` David Jeffery
2024-09-13 13:44 ` [PATCH 2/4] nvme: add sysfs attribute to change admin timeout per nvme dev David Jeffery
` (2 subsequent siblings)
3 siblings, 0 replies; 8+ messages in thread
From: David Jeffery @ 2024-09-13 13:44 UTC (permalink / raw)
To: Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg
Cc: linux-nvme, David Jeffery
The nvme_init_request function overrides a request's timeout value and forces
use of the global NVME_ADMIN_TIMEOUT or NVME_IO_TIMEOUT. Remove these
overrides so that devices with different transports and characteristics are
not all forced onto the same timeout value.
Signed-off-by: David Jeffery <djeffery@redhat.com>
---
drivers/nvme/host/core.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 983909a600ad..7ad83d443101 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -708,10 +708,8 @@ void nvme_init_request(struct request *req, struct nvme_command *cmd)
struct nvme_ns *ns = req->q->disk->private_data;
logging_enabled = ns->head->passthru_err_log_enabled;
- req->timeout = NVME_IO_TIMEOUT;
} else { /* no queuedata implies admin queue */
logging_enabled = nr->ctrl->passthru_err_log_enabled;
- req->timeout = NVME_ADMIN_TIMEOUT;
}
if (!logging_enabled)
--
2.46.0
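The effect of dropping the override can be sketched in plain userspace C. The struct and function names below are invented for illustration and are not the real kernel types: the point is that a passthru request which no longer has its timeout forced now inherits the queue's configured rq_timeout, which is the fallback the block layer already provides.

```c
#include <assert.h>

/* Mock types for illustration only; not the kernel's struct request /
 * struct request_queue. With the override removed, a request that leaves
 * timeout at 0 inherits the queue's configured rq_timeout when started. */
struct mock_queue { unsigned int rq_timeout; };
struct mock_request { struct mock_queue *q; unsigned int timeout; };

/* Timeout the block layer would arm for this request. */
static unsigned int effective_timeout(const struct mock_request *req)
{
	return req->timeout ? req->timeout : req->q->rq_timeout;
}
```

So a device whose transport configured a larger queue timeout keeps it, while a timeout set explicitly on an individual request still wins.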
^ permalink raw reply related [flat|nested] 8+ messages in thread

* [PATCH 2/4] nvme: add sysfs attribute to change admin timeout per nvme dev
2024-09-13 13:44 [PATCH 0/4] cleaning up nvme timeout over-dependence on global values David Jeffery
2024-09-13 13:44 ` [PATCH 1/4] nvme: use queue's configured timeout not global config value for passthru David Jeffery
@ 2024-09-13 13:44 ` David Jeffery
2024-09-13 13:44 ` [PATCH 3/4] nvme: pci: use admin queue timeout over NVME_ADMIN_TIMEOUT David Jeffery
2024-09-13 13:44 ` [PATCH 4/4] nvme: use per device timeout waits over depending on global default value David Jeffery
3 siblings, 0 replies; 8+ messages in thread
From: David Jeffery @ 2024-09-13 13:44 UTC (permalink / raw)
To: Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg
Cc: linux-nvme, David Jeffery
Currently, there is no way to adjust the admin queue timeout of an nvme
device on a per-device basis. Add an admin_timeout sysfs attribute so that
different types of nvme devices, which may have different timeout
requirements, can have custom admin timeouts set.
Signed-off-by: David Jeffery <djeffery@redhat.com>
---
drivers/nvme/host/sysfs.c | 31 +++++++++++++++++++++++++++++++
1 file changed, 31 insertions(+)
diff --git a/drivers/nvme/host/sysfs.c b/drivers/nvme/host/sysfs.c
index ba05faaac562..1d60316274ce 100644
--- a/drivers/nvme/host/sysfs.c
+++ b/drivers/nvme/host/sysfs.c
@@ -60,6 +60,36 @@ static ssize_t nvme_adm_passthru_err_log_enabled_store(struct device *dev,
return count;
}
+static ssize_t nvme_admin_timeout_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct nvme_ctrl *ctrl = dev_get_drvdata(dev);
+
+ return sysfs_emit(buf, "%u\n", jiffies_to_msecs(
+ ctrl->admin_q->rq_timeout));
+}
+
+static ssize_t nvme_admin_timeout_store(struct device *dev,
+ struct device_attribute *attr, const char *buf, size_t count)
+{
+ struct nvme_ctrl *ctrl = dev_get_drvdata(dev);
+ unsigned int timeout;
+ int err;
+
+ err = kstrtou32(buf, 10, &timeout);
+ if (err || !timeout)
+ return -EINVAL;
+
+ blk_queue_rq_timeout(ctrl->admin_q, msecs_to_jiffies(timeout));
+
+ return count;
+}
+
+static struct device_attribute dev_attr_admin_timeout = \
+ __ATTR(admin_timeout, S_IRUGO | S_IWUSR, \
+ nvme_admin_timeout_show, nvme_admin_timeout_store);
+
+
static inline struct nvme_ns_head *dev_to_ns_head(struct device *dev)
{
struct gendisk *disk = dev_to_disk(dev);
@@ -708,6 +738,7 @@ static struct attribute *nvme_dev_attrs[] = {
&dev_attr_tls_key.attr,
#endif
&dev_attr_adm_passthru_err_log_enabled.attr,
+ &dev_attr_admin_timeout.attr,
NULL
};
--
2.46.0
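The attribute accepts and reports milliseconds while the queue stores jiffies. A userspace sketch of that round trip and of the store path's validation follows; the HZ value and all helper names here are assumptions for illustration, not the kernel's actual implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define HZ 250	/* assumed tick rate; real kernels vary (100, 250, 1000, ...) */

/* Round up, in the spirit of the kernel's msecs_to_jiffies, so a short
 * timeout never truncates to zero jiffies. */
static unsigned long ms_to_jiffies(unsigned long ms)
{
	return (ms + (1000 / HZ) - 1) / (1000 / HZ);
}

static unsigned long jiffies_to_ms(unsigned long j)
{
	return j * (1000 / HZ);
}

/* Mirror of the store path: reject non-numeric input and zero, as the
 * patch's kstrtou32 / !timeout check does. */
static int admin_timeout_store_sketch(const char *buf, unsigned long *out_jiffies)
{
	char *end;
	unsigned long ms = strtoul(buf, &end, 10);

	if (end == buf || ms == 0)
		return -EINVAL;
	*out_jiffies = ms_to_jiffies(ms);
	return 0;
}
```

One consequence of the conversion: a written value that is not a whole multiple of the tick period reads back rounded up, e.g. writing 1 ms stores one jiffy, which shows as 4 ms at an assumed HZ of 250.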
* [PATCH 3/4] nvme: pci: use admin queue timeout over NVME_ADMIN_TIMEOUT
2024-09-13 13:44 [PATCH 0/4] cleaning up nvme timeout over-dependence on global values David Jeffery
2024-09-13 13:44 ` [PATCH 1/4] nvme: use queue's configured timeout not global config value for passthru David Jeffery
2024-09-13 13:44 ` [PATCH 2/4] nvme: add sysfs attribute to change admin timeout per nvme dev David Jeffery
@ 2024-09-13 13:44 ` David Jeffery
2024-09-13 13:44 ` [PATCH 4/4] nvme: use per device timeout waits over depending on global default value David Jeffery
3 siblings, 0 replies; 8+ messages in thread
From: David Jeffery @ 2024-09-13 13:44 UTC (permalink / raw)
To: Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg
Cc: linux-nvme, David Jeffery
While tearing down its queues, nvme-pci uses NVME_ADMIN_TIMEOUT as its
timeout target. Instead, use the configured admin queue's timeout value
when available to match the device's existing timeout setting.
Signed-off-by: David Jeffery <djeffery@redhat.com>
---
drivers/nvme/host/pci.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index c0533f3f64cb..823476f4e42f 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2467,9 +2467,10 @@ static bool __nvme_delete_io_queues(struct nvme_dev *dev, u8 opcode)
{
int nr_queues = dev->online_queues - 1, sent = 0;
unsigned long timeout;
+ struct request_queue *q = dev->ctrl.admin_q;
retry:
- timeout = NVME_ADMIN_TIMEOUT;
+ timeout = q ? q->rq_timeout : NVME_ADMIN_TIMEOUT;
while (nr_queues > 0) {
if (nvme_delete_queue(&dev->queues[nr_queues], opcode))
break;
--
2.46.0
* [PATCH 4/4] nvme: use per device timeout waits over depending on global default value
2024-09-13 13:44 [PATCH 0/4] cleaning up nvme timeout over-dependence on global values David Jeffery
` (2 preceding siblings ...)
2024-09-13 13:44 ` [PATCH 3/4] nvme: pci: use admin queue timeout over NVME_ADMIN_TIMEOUT David Jeffery
@ 2024-09-13 13:44 ` David Jeffery
2024-09-13 15:17 ` Keith Busch
3 siblings, 1 reply; 8+ messages in thread
From: David Jeffery @ 2024-09-13 13:44 UTC (permalink / raw)
To: Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg
Cc: linux-nvme, David Jeffery
Instead of passing NVME_IO_TIMEOUT as a parameter with every call to
nvme_wait_freeze_timeout, use the largest timeout value from all the
namespaces for the nvme controller. This will match the wait to the
configuration of the devices instead of assuming a single value can
fulfill the needs of all devices on the system.
Signed-off-by: David Jeffery <djeffery@redhat.com>
---
drivers/nvme/host/apple.c | 2 +-
drivers/nvme/host/core.c | 10 +++++++++-
drivers/nvme/host/nvme.h | 2 +-
drivers/nvme/host/pci.c | 2 +-
drivers/nvme/host/rdma.c | 2 +-
drivers/nvme/host/tcp.c | 2 +-
6 files changed, 14 insertions(+), 6 deletions(-)
diff --git a/drivers/nvme/host/apple.c b/drivers/nvme/host/apple.c
index b1387dc459a3..37333de2accc 100644
--- a/drivers/nvme/host/apple.c
+++ b/drivers/nvme/host/apple.c
@@ -820,7 +820,7 @@ static void apple_nvme_disable(struct apple_nvme *anv, bool shutdown)
* doing a safe shutdown.
*/
if (!dead && shutdown && freeze)
- nvme_wait_freeze_timeout(&anv->ctrl, NVME_IO_TIMEOUT);
+ nvme_wait_freeze_timeout(&anv->ctrl);
nvme_quiesce_io_queues(&anv->ctrl);
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 7ad83d443101..398e1f5de33f 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -4843,12 +4843,20 @@ void nvme_unfreeze(struct nvme_ctrl *ctrl)
}
EXPORT_SYMBOL_GPL(nvme_unfreeze);
-int nvme_wait_freeze_timeout(struct nvme_ctrl *ctrl, long timeout)
+int nvme_wait_freeze_timeout(struct nvme_ctrl *ctrl)
{
struct nvme_ns *ns;
int srcu_idx;
+ long timeout = 0;
srcu_idx = srcu_read_lock(&ctrl->srcu);
+
+ /* find largest I/O timeout */
+ list_for_each_entry_rcu(ns, &ctrl->namespaces, list) {
+ if (ns->queue->rq_timeout > timeout)
+ timeout = ns->queue->rq_timeout;
+ }
+
list_for_each_entry_rcu(ns, &ctrl->namespaces, list) {
timeout = blk_mq_freeze_queue_wait_timeout(ns->queue, timeout);
if (timeout <= 0)
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index da57947130cc..859b70733b99 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -813,7 +813,7 @@ void nvme_sync_queues(struct nvme_ctrl *ctrl);
void nvme_sync_io_queues(struct nvme_ctrl *ctrl);
void nvme_unfreeze(struct nvme_ctrl *ctrl);
void nvme_wait_freeze(struct nvme_ctrl *ctrl);
-int nvme_wait_freeze_timeout(struct nvme_ctrl *ctrl, long timeout);
+int nvme_wait_freeze_timeout(struct nvme_ctrl *ctrl);
void nvme_start_freeze(struct nvme_ctrl *ctrl);
static inline enum req_op nvme_req_op(struct nvme_command *cmd)
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 823476f4e42f..43d0ac4a5065 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2646,7 +2646,7 @@ static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown)
* if doing a safe shutdown.
*/
if (!dead && shutdown)
- nvme_wait_freeze_timeout(&dev->ctrl, NVME_IO_TIMEOUT);
+ nvme_wait_freeze_timeout(&dev->ctrl);
}
nvme_quiesce_io_queues(&dev->ctrl);
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index 2eb33842f971..ac9c788ad5a2 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -888,7 +888,7 @@ static int nvme_rdma_configure_io_queues(struct nvme_rdma_ctrl *ctrl, bool new)
if (!new) {
nvme_start_freeze(&ctrl->ctrl);
nvme_unquiesce_io_queues(&ctrl->ctrl);
- if (!nvme_wait_freeze_timeout(&ctrl->ctrl, NVME_IO_TIMEOUT)) {
+ if (!nvme_wait_freeze_timeout(&ctrl->ctrl)) {
/*
* If we timed out waiting for freeze we are likely to
* be stuck. Fail the controller initialization just
diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index a2a47d3ab99f..641202eca0b5 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -2040,7 +2040,7 @@ static int nvme_tcp_configure_io_queues(struct nvme_ctrl *ctrl, bool new)
if (!new) {
nvme_start_freeze(ctrl);
nvme_unquiesce_io_queues(ctrl);
- if (!nvme_wait_freeze_timeout(ctrl, NVME_IO_TIMEOUT)) {
+ if (!nvme_wait_freeze_timeout(ctrl)) {
/*
* If we timed out waiting for freeze we are likely to
* be stuck. Fail the controller initialization just
--
2.46.0
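The timeout selection the patch adds inside nvme_wait_freeze_timeout can be sketched outside the kernel with a plain array standing in for the srcu-protected namespace list (the struct name is invented for the sketch):

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for a namespace's request queue timeout; the real code walks
 * ctrl->namespaces under srcu and reads ns->queue->rq_timeout. */
struct ns_sketch { long rq_timeout; };

/* Largest configured I/O timeout across all namespaces; 0 if none. */
static long largest_io_timeout(const struct ns_sketch *ns, size_t count)
{
	long timeout = 0;

	for (size_t i = 0; i < count; i++)
		if (ns[i].rq_timeout > timeout)
			timeout = ns[i].rq_timeout;
	return timeout;
}
```

With no namespaces the result is 0, but in the patch the subsequent freeze-wait loop then also has nothing to iterate, so no wait is attempted with a zero timeout.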
* Re: [PATCH 4/4] nvme: use per device timeout waits over depending on global default value
2024-09-13 13:44 ` [PATCH 4/4] nvme: use per device timeout waits over depending on global default value David Jeffery
@ 2024-09-13 15:17 ` Keith Busch
2024-09-13 18:36 ` David Jeffery
0 siblings, 1 reply; 8+ messages in thread
From: Keith Busch @ 2024-09-13 15:17 UTC (permalink / raw)
To: David Jeffery; +Cc: Jens Axboe, Christoph Hellwig, Sagi Grimberg, linux-nvme
On Fri, Sep 13, 2024 at 09:44:30AM -0400, David Jeffery wrote:
> Instead of passing NVME_IO_TIMEOUT as a parameter with every call to
> nvme_wait_freeze_timeout, use the largest timeout value from all the
> namespaces for the nvme controller. This will match the wait to the
> configuration of the devices instead of assuming a single value can
> fulfill the needs of all devices on the system.
The module parameter is obviously too coarse, but does IO timeout really
need to be per-namespace? I'd think setting the preferred timeout at the
controller level is sufficient.
* Re: [PATCH 4/4] nvme: use per device timeout waits over depending on global default value
2024-09-13 15:17 ` Keith Busch
@ 2024-09-13 18:36 ` David Jeffery
2024-09-16 7:17 ` Christoph Hellwig
0 siblings, 1 reply; 8+ messages in thread
From: David Jeffery @ 2024-09-13 18:36 UTC (permalink / raw)
To: Keith Busch; +Cc: Jens Axboe, Christoph Hellwig, Sagi Grimberg, linux-nvme
On Fri, Sep 13, 2024 at 11:17 AM Keith Busch <kbusch@kernel.org> wrote:
>
> On Fri, Sep 13, 2024 at 09:44:30AM -0400, David Jeffery wrote:
> > Instead of passing NVME_IO_TIMEOUT as a parameter with every call to
> > nvme_wait_freeze_timeout, use the largest timeout value from all the
> > namespaces for the nvme controller. This will match the wait to the
> > configuration of the devices instead of assuming a single value can
> > fulfill the needs of all devices on the system.
>
> The module parameter is obviously too coarse, but does IO timeout really
> need to be per-namespace? I'd think setting the preferred timeout at the
> controller level is sufficient.
>
The expansion of nvme into a remote storage interface means different
namespaces provided by the same controller can have very different
characteristics. Take an nvme-tcp target where one namespace is backed by
local, high-speed SSDs while another is backed by a slower or more finicky
storage solution such as NFS or a clustered storage product. I would expect
such cases to be rare, but they are still cases where independent timeout
values may be needed to better handle different namespaces behind the same
* Re: [PATCH 4/4] nvme: use per device timeout waits over depending on global default value
2024-09-13 18:36 ` David Jeffery
@ 2024-09-16 7:17 ` Christoph Hellwig
0 siblings, 0 replies; 8+ messages in thread
From: Christoph Hellwig @ 2024-09-16 7:17 UTC (permalink / raw)
To: David Jeffery
Cc: Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg,
linux-nvme
On Fri, Sep 13, 2024 at 02:36:45PM -0400, David Jeffery wrote:
> The expansion of nvme to be a remote storage interface means different
> namespaces provided by the same controller can have very different
> characteristics.
You can do the same over PCIe attachment. But that doesn't really
change the fact that the fundamental abstraction level relevant in NVMe
here is the controller.