* [PATCH 1/4] nvme: use queue's configured timeout not global config value for passthru
2024-09-13 13:44 [PATCH 0/4] cleaning up nvme timeout over-dependence on global values David Jeffery
@ 2024-09-13 13:44 ` David Jeffery
2024-09-13 13:44 ` [PATCH 2/4] nvme: add sysfs attribute to change admin timeout per nvme dev David Jeffery
` (2 subsequent siblings)
3 siblings, 0 replies; 8+ messages in thread
From: David Jeffery @ 2024-09-13 13:44 UTC (permalink / raw)
To: Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg
Cc: linux-nvme, David Jeffery
The nvme_init_request function overrides a request's timeout value and forces
use of the global NVME_ADMIN_TIMEOUT or NVME_IO_TIMEOUT. Remove these
overrides so that devices with different transports and characteristics are
not all forced onto the same timeout value.
Signed-off-by: David Jeffery <djeffery@redhat.com>
---
drivers/nvme/host/core.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 983909a600ad..7ad83d443101 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -708,10 +708,8 @@ void nvme_init_request(struct request *req, struct nvme_command *cmd)
struct nvme_ns *ns = req->q->disk->private_data;
logging_enabled = ns->head->passthru_err_log_enabled;
- req->timeout = NVME_IO_TIMEOUT;
} else { /* no queuedata implies admin queue */
logging_enabled = nr->ctrl->passthru_err_log_enabled;
- req->timeout = NVME_ADMIN_TIMEOUT;
}
if (!logging_enabled)
--
2.46.0
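The effect of dropping the override can be sketched in plain userspace C. The struct and function names below are invented for illustration and are not the real kernel types: the point is that a passthru request which no longer has its timeout forced now inherits the queue's configured rq_timeout, which is the fallback the block layer already provides.

```c
#include <assert.h>

/* Mock types for illustration only; not the kernel's struct request /
 * struct request_queue. With the override removed, a request that leaves
 * timeout at 0 inherits the queue's configured rq_timeout when started. */
struct mock_queue { unsigned int rq_timeout; };
struct mock_request { struct mock_queue *q; unsigned int timeout; };

/* Timeout the block layer would arm for this request. */
static unsigned int effective_timeout(const struct mock_request *req)
{
	return req->timeout ? req->timeout : req->q->rq_timeout;
}
```

So a device whose transport configured a larger queue timeout keeps it, while a timeout set explicitly on an individual request still wins.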
^ permalink raw reply related [flat|nested] 8+ messages in thread

* [PATCH 2/4] nvme: add sysfs attribute to change admin timeout per nvme dev
2024-09-13 13:44 [PATCH 0/4] cleaning up nvme timeout over-dependence on global values David Jeffery
2024-09-13 13:44 ` [PATCH 1/4] nvme: use queue's configured timeout not global config value for passthru David Jeffery
@ 2024-09-13 13:44 ` David Jeffery
2024-09-13 13:44 ` [PATCH 3/4] nvme: pci: use admin queue timeout over NVME_ADMIN_TIMEOUT David Jeffery
2024-09-13 13:44 ` [PATCH 4/4] nvme: use per device timeout waits over depending on global default value David Jeffery
3 siblings, 0 replies; 8+ messages in thread
From: David Jeffery @ 2024-09-13 13:44 UTC (permalink / raw)
To: Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg
Cc: linux-nvme, David Jeffery
Currently, there is no way to adjust the admin queue timeout of an nvme
device on a per-device basis. Add an admin_timeout sysfs attribute so that
different types of nvme devices, which may have different timeout
requirements, can have custom admin timeouts set.
Signed-off-by: David Jeffery <djeffery@redhat.com>
---
drivers/nvme/host/sysfs.c | 31 +++++++++++++++++++++++++++++++
1 file changed, 31 insertions(+)
diff --git a/drivers/nvme/host/sysfs.c b/drivers/nvme/host/sysfs.c
index ba05faaac562..1d60316274ce 100644
--- a/drivers/nvme/host/sysfs.c
+++ b/drivers/nvme/host/sysfs.c
@@ -60,6 +60,36 @@ static ssize_t nvme_adm_passthru_err_log_enabled_store(struct device *dev,
return count;
}
+static ssize_t nvme_admin_timeout_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct nvme_ctrl *ctrl = dev_get_drvdata(dev);
+
+ return sysfs_emit(buf, "%u\n", jiffies_to_msecs(
+ ctrl->admin_q->rq_timeout));
+}
+
+static ssize_t nvme_admin_timeout_store(struct device *dev,
+ struct device_attribute *attr, const char *buf, size_t count)
+{
+ struct nvme_ctrl *ctrl = dev_get_drvdata(dev);
+ unsigned int timeout;
+ int err;
+
+ err = kstrtou32(buf, 10, &timeout);
+ if (err || !timeout)
+ return -EINVAL;
+
+ blk_queue_rq_timeout(ctrl->admin_q, msecs_to_jiffies(timeout));
+
+ return count;
+}
+
+static struct device_attribute dev_attr_admin_timeout = \
+ __ATTR(admin_timeout, S_IRUGO | S_IWUSR, \
+ nvme_admin_timeout_show, nvme_admin_timeout_store);
+
+
static inline struct nvme_ns_head *dev_to_ns_head(struct device *dev)
{
struct gendisk *disk = dev_to_disk(dev);
@@ -708,6 +738,7 @@ static struct attribute *nvme_dev_attrs[] = {
&dev_attr_tls_key.attr,
#endif
&dev_attr_adm_passthru_err_log_enabled.attr,
+ &dev_attr_admin_timeout.attr,
NULL
};
--
2.46.0
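The attribute accepts and reports milliseconds while the queue stores jiffies. A userspace sketch of that round trip and of the store path's validation follows; the HZ value and all helper names here are assumptions for illustration, not the kernel's actual implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define HZ 250	/* assumed tick rate; real kernels vary (100, 250, 1000, ...) */

/* Round up, in the spirit of the kernel's msecs_to_jiffies, so a short
 * timeout never truncates to zero jiffies. */
static unsigned long ms_to_jiffies(unsigned long ms)
{
	return (ms + (1000 / HZ) - 1) / (1000 / HZ);
}

static unsigned long jiffies_to_ms(unsigned long j)
{
	return j * (1000 / HZ);
}

/* Mirror of the store path: reject non-numeric input and zero, as the
 * patch's kstrtou32 / !timeout check does. */
static int admin_timeout_store_sketch(const char *buf, unsigned long *out_jiffies)
{
	char *end;
	unsigned long ms = strtoul(buf, &end, 10);

	if (end == buf || ms == 0)
		return -EINVAL;
	*out_jiffies = ms_to_jiffies(ms);
	return 0;
}
```

One consequence of the conversion: a written value that is not a whole multiple of the tick period reads back rounded up, e.g. writing 1 ms stores one jiffy, which shows as 4 ms at an assumed HZ of 250.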
* [PATCH 3/4] nvme: pci: use admin queue timeout over NVME_ADMIN_TIMEOUT
2024-09-13 13:44 [PATCH 0/4] cleaning up nvme timeout over-dependence on global values David Jeffery
2024-09-13 13:44 ` [PATCH 1/4] nvme: use queue's configured timeout not global config value for passthru David Jeffery
2024-09-13 13:44 ` [PATCH 2/4] nvme: add sysfs attribute to change admin timeout per nvme dev David Jeffery
@ 2024-09-13 13:44 ` David Jeffery
2024-09-13 13:44 ` [PATCH 4/4] nvme: use per device timeout waits over depending on global default value David Jeffery
3 siblings, 0 replies; 8+ messages in thread
From: David Jeffery @ 2024-09-13 13:44 UTC (permalink / raw)
To: Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg
Cc: linux-nvme, David Jeffery
While tearing down its queues, nvme-pci uses NVME_ADMIN_TIMEOUT as its
timeout target. Instead, use the configured admin queue's timeout value
when available to match the device's existing timeout setting.
Signed-off-by: David Jeffery <djeffery@redhat.com>
---
drivers/nvme/host/pci.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index c0533f3f64cb..823476f4e42f 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2467,9 +2467,10 @@ static bool __nvme_delete_io_queues(struct nvme_dev *dev, u8 opcode)
{
int nr_queues = dev->online_queues - 1, sent = 0;
unsigned long timeout;
+ struct request_queue *q = dev->ctrl.admin_q;
retry:
- timeout = NVME_ADMIN_TIMEOUT;
+ timeout = q ? q->rq_timeout : NVME_ADMIN_TIMEOUT;
while (nr_queues > 0) {
if (nvme_delete_queue(&dev->queues[nr_queues], opcode))
break;
--
2.46.0
* [PATCH 4/4] nvme: use per device timeout waits over depending on global default value
2024-09-13 13:44 [PATCH 0/4] cleaning up nvme timeout over-dependence on global values David Jeffery
` (2 preceding siblings ...)
2024-09-13 13:44 ` [PATCH 3/4] nvme: pci: use admin queue timeout over NVME_ADMIN_TIMEOUT David Jeffery
@ 2024-09-13 13:44 ` David Jeffery
2024-09-13 15:17 ` Keith Busch
3 siblings, 1 reply; 8+ messages in thread
From: David Jeffery @ 2024-09-13 13:44 UTC (permalink / raw)
To: Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg
Cc: linux-nvme, David Jeffery
Instead of passing NVME_IO_TIMEOUT as a parameter with every call to
nvme_wait_freeze_timeout, use the largest timeout value from all the
namespaces for the nvme controller. This will match the wait to the
configuration of the devices instead of assuming a single value can
fulfill the needs of all devices on the system.
Signed-off-by: David Jeffery <djeffery@redhat.com>
---
drivers/nvme/host/apple.c | 2 +-
drivers/nvme/host/core.c | 10 +++++++++-
drivers/nvme/host/nvme.h | 2 +-
drivers/nvme/host/pci.c | 2 +-
drivers/nvme/host/rdma.c | 2 +-
drivers/nvme/host/tcp.c | 2 +-
6 files changed, 14 insertions(+), 6 deletions(-)
diff --git a/drivers/nvme/host/apple.c b/drivers/nvme/host/apple.c
index b1387dc459a3..37333de2accc 100644
--- a/drivers/nvme/host/apple.c
+++ b/drivers/nvme/host/apple.c
@@ -820,7 +820,7 @@ static void apple_nvme_disable(struct apple_nvme *anv, bool shutdown)
* doing a safe shutdown.
*/
if (!dead && shutdown && freeze)
- nvme_wait_freeze_timeout(&anv->ctrl, NVME_IO_TIMEOUT);
+ nvme_wait_freeze_timeout(&anv->ctrl);
nvme_quiesce_io_queues(&anv->ctrl);
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 7ad83d443101..398e1f5de33f 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -4843,12 +4843,20 @@ void nvme_unfreeze(struct nvme_ctrl *ctrl)
}
EXPORT_SYMBOL_GPL(nvme_unfreeze);
-int nvme_wait_freeze_timeout(struct nvme_ctrl *ctrl, long timeout)
+int nvme_wait_freeze_timeout(struct nvme_ctrl *ctrl)
{
struct nvme_ns *ns;
int srcu_idx;
+ long timeout = 0;
srcu_idx = srcu_read_lock(&ctrl->srcu);
+
+ /* find largest I/O timeout */
+ list_for_each_entry_rcu(ns, &ctrl->namespaces, list) {
+ if (ns->queue->rq_timeout > timeout)
+ timeout = ns->queue->rq_timeout;
+ }
+
list_for_each_entry_rcu(ns, &ctrl->namespaces, list) {
timeout = blk_mq_freeze_queue_wait_timeout(ns->queue, timeout);
if (timeout <= 0)
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index da57947130cc..859b70733b99 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -813,7 +813,7 @@ void nvme_sync_queues(struct nvme_ctrl *ctrl);
void nvme_sync_io_queues(struct nvme_ctrl *ctrl);
void nvme_unfreeze(struct nvme_ctrl *ctrl);
void nvme_wait_freeze(struct nvme_ctrl *ctrl);
-int nvme_wait_freeze_timeout(struct nvme_ctrl *ctrl, long timeout);
+int nvme_wait_freeze_timeout(struct nvme_ctrl *ctrl);
void nvme_start_freeze(struct nvme_ctrl *ctrl);
static inline enum req_op nvme_req_op(struct nvme_command *cmd)
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 823476f4e42f..43d0ac4a5065 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2646,7 +2646,7 @@ static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown)
* if doing a safe shutdown.
*/
if (!dead && shutdown)
- nvme_wait_freeze_timeout(&dev->ctrl, NVME_IO_TIMEOUT);
+ nvme_wait_freeze_timeout(&dev->ctrl);
}
nvme_quiesce_io_queues(&dev->ctrl);
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index 2eb33842f971..ac9c788ad5a2 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -888,7 +888,7 @@ static int nvme_rdma_configure_io_queues(struct nvme_rdma_ctrl *ctrl, bool new)
if (!new) {
nvme_start_freeze(&ctrl->ctrl);
nvme_unquiesce_io_queues(&ctrl->ctrl);
- if (!nvme_wait_freeze_timeout(&ctrl->ctrl, NVME_IO_TIMEOUT)) {
+ if (!nvme_wait_freeze_timeout(&ctrl->ctrl)) {
/*
* If we timed out waiting for freeze we are likely to
* be stuck. Fail the controller initialization just
diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index a2a47d3ab99f..641202eca0b5 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -2040,7 +2040,7 @@ static int nvme_tcp_configure_io_queues(struct nvme_ctrl *ctrl, bool new)
if (!new) {
nvme_start_freeze(ctrl);
nvme_unquiesce_io_queues(ctrl);
- if (!nvme_wait_freeze_timeout(ctrl, NVME_IO_TIMEOUT)) {
+ if (!nvme_wait_freeze_timeout(ctrl)) {
/*
* If we timed out waiting for freeze we are likely to
* be stuck. Fail the controller initialization just
--
2.46.0
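The timeout selection the patch adds inside nvme_wait_freeze_timeout can be sketched outside the kernel with a plain array standing in for the srcu-protected namespace list (the struct name is invented for the sketch):

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for a namespace's request queue timeout; the real code walks
 * ctrl->namespaces under srcu and reads ns->queue->rq_timeout. */
struct ns_sketch { long rq_timeout; };

/* Largest configured I/O timeout across all namespaces; 0 if none. */
static long largest_io_timeout(const struct ns_sketch *ns, size_t count)
{
	long timeout = 0;

	for (size_t i = 0; i < count; i++)
		if (ns[i].rq_timeout > timeout)
			timeout = ns[i].rq_timeout;
	return timeout;
}
```

With no namespaces the result is 0, but in the patch the subsequent freeze-wait loop then also has nothing to iterate, so no wait is attempted with a zero timeout.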
* Re: [PATCH 4/4] nvme: use per device timeout waits over depending on global default value
2024-09-13 13:44 ` [PATCH 4/4] nvme: use per device timeout waits over depending on global default value David Jeffery
@ 2024-09-13 15:17 ` Keith Busch
2024-09-13 18:36 ` David Jeffery
0 siblings, 1 reply; 8+ messages in thread
From: Keith Busch @ 2024-09-13 15:17 UTC (permalink / raw)
To: David Jeffery; +Cc: Jens Axboe, Christoph Hellwig, Sagi Grimberg, linux-nvme
On Fri, Sep 13, 2024 at 09:44:30AM -0400, David Jeffery wrote:
> Instead of passing NVME_IO_TIMEOUT as a parameter with every call to
> nvme_wait_freeze_timeout, use the largest timeout value from all the
> namespaces for the nvme controller. This will match the wait to the
> configuration of the devices instead of assuming a single value can
> fulfill the needs of all devices on the system.
The module parameter is obviously too coarse, but does IO timeout really
need to be per-namespace? I'd think setting the preferred timeout at the
controller level is sufficient.
* Re: [PATCH 4/4] nvme: use per device timeout waits over depending on global default value
2024-09-13 15:17 ` Keith Busch
@ 2024-09-13 18:36 ` David Jeffery
2024-09-16 7:17 ` Christoph Hellwig
0 siblings, 1 reply; 8+ messages in thread
From: David Jeffery @ 2024-09-13 18:36 UTC (permalink / raw)
To: Keith Busch; +Cc: Jens Axboe, Christoph Hellwig, Sagi Grimberg, linux-nvme
On Fri, Sep 13, 2024 at 11:17 AM Keith Busch <kbusch@kernel.org> wrote:
>
> On Fri, Sep 13, 2024 at 09:44:30AM -0400, David Jeffery wrote:
> > Instead of passing NVME_IO_TIMEOUT as a parameter with every call to
> > nvme_wait_freeze_timeout, use the largest timeout value from all the
> > namespaces for the nvme controller. This will match the wait to the
> > configuration of the devices instead of assuming a single value can
> > fulfill the needs of all devices on the system.
>
> The module parameter is obviously too coarse, but does IO timeout really
> need to be per-namespace? I'd think setting the preferred timeout at the
> controller level is sufficient.
>
The expansion of nvme into a remote storage interface means different
namespaces provided by the same controller can have very different
characteristics. Take an nvme-tcp target where one namespace is backed by
local, high-speed SSDs while another is backed by a slower or more finicky
storage solution such as NFS or a clustered storage product. I would expect
such cases to be rare, but they are still cases where independent timeout
values may be needed to better handle different namespaces behind the same
* Re: [PATCH 4/4] nvme: use per device timeout waits over depending on global default value
2024-09-13 18:36 ` David Jeffery
@ 2024-09-16 7:17 ` Christoph Hellwig
0 siblings, 0 replies; 8+ messages in thread
From: Christoph Hellwig @ 2024-09-16 7:17 UTC (permalink / raw)
To: David Jeffery
Cc: Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg,
linux-nvme
On Fri, Sep 13, 2024 at 02:36:45PM -0400, David Jeffery wrote:
> The expansion of nvme to be a remote storage interface means different
> namespaces provided by the same controller can have very different
> characteristics.
You can do the same over PCIe attachment. But that doesn't really
change the fact that the fundamental abstraction level relevant in NVMe
here is the controller.