public inbox for linux-nvme@lists.infradead.org
* [PATCH v4 0/3] Handle number of queue changes
@ 2022-08-29  9:28 Daniel Wagner
  2022-08-29  9:28 ` [PATCH v4 1/3] nvmet: Expose max queues to configfs Daniel Wagner
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: Daniel Wagner @ 2022-08-29  9:28 UTC (permalink / raw)
  To: linux-nvme; +Cc: Sagi Grimberg, Hannes Reinecke, Chao Leng, Daniel Wagner

Updated the first patch with the feedback from Sagi and Hannes.
While at it also collected the review tags.


From the previous cover letter:

We got a report from our customer that a firmware upgrade on the
storage array is able to 'break' the host. This is caused by a change
in the number of queues the target supports after a reconnect.

Let's assume the number of queues is 8 and all is working fine. Then
the connection is dropped and the host starts trying to reconnect.
Eventually this succeeds, but now the new number of queues is 10:

nvme0: creating 8 I/O queues.
nvme0: mapped 8/0/0 default/read/poll queues.
nvme0: new ctrl: NQN "nvmet-test", addr 10.100.128.29:4420
nvme0: queue 0: timeout request 0x0 type 4
nvme0: starting error recovery
nvme0: failed nvme_keep_alive_end_io error=10
nvme0: Reconnecting in 10 seconds...
nvme0: failed to connect socket: -110
nvme0: Failed reconnect attempt 1
nvme0: Reconnecting in 10 seconds...
nvme0: creating 10 I/O queues.
nvme0: Connect command failed, error wo/DNR bit: -16389
nvme0: failed to connect queue: 9 ret=-5
nvme0: Failed reconnect attempt 2

As you can see, queue number 9 is not able to connect.

As the order of starting and unfreezing the queues is important, we
can't just move the start of the queues after the tagset update. So
the idea is to start just the old number of queues first, update the
tagset, and then start the rest.

This seems to work:

nvme nvme0: creating 4 I/O queues.
nvme nvme0: mapped 4/0/0 default/read/poll queues.
nvme_tcp_start_io_queues nr_hw_queues 4 queue_count 5 qcnt 5
nvme_tcp_start_io_queues nr_hw_queues 4 queue_count 5 qcnt 5
nvme nvme0: new ctrl: NQN "nvmet-test", addr 10.100.128.29:4420
nvme nvme0: queue 0: timeout request 0x0 type 4
nvme nvme0: starting error recovery
nvme0: Keep Alive(0x18), Unknown (sct 0x3 / sc 0x71)
nvme nvme0: failed nvme_keep_alive_end_io error=10
nvme nvme0: Reconnecting in 10 seconds...
nvme nvme0: creating 6 I/O queues.
nvme_tcp_start_io_queues nr_hw_queues 4 queue_count 7 qcnt 5
nvme nvme0: mapped 6/0/0 default/read/poll queues.
nvme_tcp_start_io_queues nr_hw_queues 6 queue_count 7 qcnt 7
nvme nvme0: Successfully reconnected (1 attempt)
nvme nvme0: starting error recovery
nvme0: Keep Alive(0x18), Unknown (sct 0x3 / sc 0x71)
nvme nvme0: failed nvme_keep_alive_end_io error=10
nvme nvme0: Reconnecting in 10 seconds...
nvme nvme0: creating 4 I/O queues.
nvme_tcp_start_io_queues nr_hw_queues 6 queue_count 5 qcnt 5
nvme nvme0: mapped 4/0/0 default/read/poll queues.
nvme_tcp_start_io_queues nr_hw_queues 4 queue_count 5 qcnt 5
nvme nvme0: Successfully reconnected (1 attempt)


changes:
v4:
 - updated subject s/sysfs/configfs/
 - updated nvmet_subsys_attrs entry placement
 - added missing port variable
 - collected reviewed tags
v3:
 - only allow max_qid changes when port is disabled
 - use ctrl->tagset and avoid code churn
 - use correct error label for goto
v2:
 - removed debug logging
 - pass in queue range idx as argument to nvme_tcp_start_io_queues
v1:
 - https://lore.kernel.org/linux-nvme/20220812142824.17766-1-dwagner@suse.de/

Daniel Wagner (3):
  nvmet: Expose max queues to configfs
  nvme-tcp: Handle number of queue changes
  nvme-rdma: Handle number of queue changes

 drivers/nvme/host/rdma.c       | 26 +++++++++++++++++++++-----
 drivers/nvme/host/tcp.c        | 26 +++++++++++++++++++++-----
 drivers/nvme/target/configfs.c | 29 +++++++++++++++++++++++++++++
 3 files changed, 71 insertions(+), 10 deletions(-)

-- 
2.37.2



^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH v4 1/3] nvmet: Expose max queues to configfs
  2022-08-29  9:28 [PATCH v4 0/3] Handle number of queue changes Daniel Wagner
@ 2022-08-29  9:28 ` Daniel Wagner
  2022-08-29  9:28 ` [PATCH v4 2/3] nvme-tcp: Handle number of queue changes Daniel Wagner
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: Daniel Wagner @ 2022-08-29  9:28 UTC (permalink / raw)
  To: linux-nvme; +Cc: Sagi Grimberg, Hannes Reinecke, Chao Leng, Daniel Wagner

Allow setting the maximum number of queues the target supports. This
is useful for testing the host's reconnect behavior when the number
of supported queues changes.

Signed-off-by: Daniel Wagner <dwagner@suse.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
---
 drivers/nvme/target/configfs.c | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)

diff --git a/drivers/nvme/target/configfs.c b/drivers/nvme/target/configfs.c
index 2bcd60758919..e34a2896fedb 100644
--- a/drivers/nvme/target/configfs.c
+++ b/drivers/nvme/target/configfs.c
@@ -1281,6 +1281,34 @@ static ssize_t nvmet_subsys_attr_pi_enable_store(struct config_item *item,
 CONFIGFS_ATTR(nvmet_subsys_, attr_pi_enable);
 #endif
 
+static ssize_t nvmet_subsys_attr_qid_max_show(struct config_item *item,
+					      char *page)
+{
+	return snprintf(page, PAGE_SIZE, "%u\n", to_subsys(item)->max_qid);
+}
+
+static ssize_t nvmet_subsys_attr_qid_max_store(struct config_item *item,
+					       const char *page, size_t cnt)
+{
+	struct nvmet_port *port = to_nvmet_port(item);
+	u16 qid_max;
+
+	if (nvmet_is_port_enabled(port, __func__))
+		return -EACCES;
+
+	if (sscanf(page, "%hu\n", &qid_max) != 1)
+		return -EINVAL;
+
+	if (qid_max < 1 || qid_max > NVMET_NR_QUEUES)
+		return -EINVAL;
+
+	down_write(&nvmet_config_sem);
+	to_subsys(item)->max_qid = qid_max;
+	up_write(&nvmet_config_sem);
+	return cnt;
+}
+CONFIGFS_ATTR(nvmet_subsys_, attr_qid_max);
+
 static struct configfs_attribute *nvmet_subsys_attrs[] = {
 	&nvmet_subsys_attr_attr_allow_any_host,
 	&nvmet_subsys_attr_attr_version,
@@ -1288,6 +1316,7 @@ static struct configfs_attribute *nvmet_subsys_attrs[] = {
 	&nvmet_subsys_attr_attr_cntlid_min,
 	&nvmet_subsys_attr_attr_cntlid_max,
 	&nvmet_subsys_attr_attr_model,
+	&nvmet_subsys_attr_attr_qid_max,
 #ifdef CONFIG_BLK_DEV_INTEGRITY
 	&nvmet_subsys_attr_attr_pi_enable,
 #endif
-- 
2.37.2




* [PATCH v4 2/3] nvme-tcp: Handle number of queue changes
  2022-08-29  9:28 [PATCH v4 0/3] Handle number of queue changes Daniel Wagner
  2022-08-29  9:28 ` [PATCH v4 1/3] nvmet: Expose max queues to configfs Daniel Wagner
@ 2022-08-29  9:28 ` Daniel Wagner
  2022-08-29  9:28 ` [PATCH v4 3/3] nvme-rdma: " Daniel Wagner
  2022-09-05 11:09 ` [PATCH v4 0/3] " Christoph Hellwig
  3 siblings, 0 replies; 5+ messages in thread
From: Daniel Wagner @ 2022-08-29  9:28 UTC (permalink / raw)
  To: linux-nvme; +Cc: Sagi Grimberg, Hannes Reinecke, Chao Leng, Daniel Wagner

On reconnect, the number of queues might have changed.

In the case where we have more queues available than previously, we
try to access queues which are not initialized yet.

In the other case, where we have fewer queues than previously, the
connection attempt will fail because the target doesn't support the
old number of queues, and we end up in a reconnect loop.

Thus, only start the queues which are currently present in the
tagset, limited by the number of available queues. Then update the
tagset, and we can start any new queues.

Signed-off-by: Daniel Wagner <dwagner@suse.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Hannes Reinecke <hare@suse.de>
---
 drivers/nvme/host/tcp.c | 26 +++++++++++++++++++++-----
 1 file changed, 21 insertions(+), 5 deletions(-)

diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 044da18c06f5..d97b0b6369b5 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -1762,11 +1762,12 @@ static void nvme_tcp_stop_io_queues(struct nvme_ctrl *ctrl)
 		nvme_tcp_stop_queue(ctrl, i);
 }
 
-static int nvme_tcp_start_io_queues(struct nvme_ctrl *ctrl)
+static int nvme_tcp_start_io_queues(struct nvme_ctrl *ctrl,
+				    int first, int last)
 {
 	int i, ret;
 
-	for (i = 1; i < ctrl->queue_count; i++) {
+	for (i = first; i < last; i++) {
 		ret = nvme_tcp_start_queue(ctrl, i);
 		if (ret)
 			goto out_stop_queues;
@@ -1775,7 +1776,7 @@ static int nvme_tcp_start_io_queues(struct nvme_ctrl *ctrl)
 	return 0;
 
 out_stop_queues:
-	for (i--; i >= 1; i--)
+	for (i--; i >= first; i--)
 		nvme_tcp_stop_queue(ctrl, i);
 	return ret;
 }
@@ -1901,7 +1902,7 @@ static void nvme_tcp_destroy_io_queues(struct nvme_ctrl *ctrl, bool remove)
 
 static int nvme_tcp_configure_io_queues(struct nvme_ctrl *ctrl, bool new)
 {
-	int ret;
+	int ret, nr_queues;
 
 	ret = nvme_tcp_alloc_io_queues(ctrl);
 	if (ret)
@@ -1917,7 +1918,13 @@ static int nvme_tcp_configure_io_queues(struct nvme_ctrl *ctrl, bool new)
 			goto out_free_tag_set;
 	}
 
-	ret = nvme_tcp_start_io_queues(ctrl);
+	/*
+	 * Only start IO queues for which we have allocated the tagset
+	 * and limited it to the available queues. On reconnects, the
+	 * queue number might have changed.
+	 */
+	nr_queues = min(ctrl->tagset->nr_hw_queues + 1, ctrl->queue_count);
+	ret = nvme_tcp_start_io_queues(ctrl, 1, nr_queues);
 	if (ret)
 		goto out_cleanup_connect_q;
 
@@ -1937,6 +1944,15 @@ static int nvme_tcp_configure_io_queues(struct nvme_ctrl *ctrl, bool new)
 		nvme_unfreeze(ctrl);
 	}
 
+	/*
+	 * If the number of queues has increased (reconnect case)
+	 * start all new queues now.
+	 */
+	ret = nvme_tcp_start_io_queues(ctrl, nr_queues,
+				       ctrl->tagset->nr_hw_queues + 1);
+	if (ret)
+		goto out_wait_freeze_timed_out;
+
 	return 0;
 
 out_wait_freeze_timed_out:
-- 
2.37.2




* [PATCH v4 3/3] nvme-rdma: Handle number of queue changes
  2022-08-29  9:28 [PATCH v4 0/3] Handle number of queue changes Daniel Wagner
  2022-08-29  9:28 ` [PATCH v4 1/3] nvmet: Expose max queues to configfs Daniel Wagner
  2022-08-29  9:28 ` [PATCH v4 2/3] nvme-tcp: Handle number of queue changes Daniel Wagner
@ 2022-08-29  9:28 ` Daniel Wagner
  2022-09-05 11:09 ` [PATCH v4 0/3] " Christoph Hellwig
  3 siblings, 0 replies; 5+ messages in thread
From: Daniel Wagner @ 2022-08-29  9:28 UTC (permalink / raw)
  To: linux-nvme; +Cc: Sagi Grimberg, Hannes Reinecke, Chao Leng, Daniel Wagner

On reconnect, the number of queues might have changed.

In the case where we have more queues available than previously, we
try to access queues which are not initialized yet.

In the other case, where we have fewer queues than previously, the
connection attempt will fail because the target doesn't support the
old number of queues, and we end up in a reconnect loop.

Thus, only start the queues which are currently present in the
tagset, limited by the number of available queues. Then update the
tagset, and we can start any new queues.

Signed-off-by: Daniel Wagner <dwagner@suse.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Hannes Reinecke <hare@suse.de>
---
 drivers/nvme/host/rdma.c | 26 +++++++++++++++++++++-----
 1 file changed, 21 insertions(+), 5 deletions(-)

diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index 3100643be299..0d1d29e644d9 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -696,11 +696,12 @@ static int nvme_rdma_start_queue(struct nvme_rdma_ctrl *ctrl, int idx)
 	return ret;
 }
 
-static int nvme_rdma_start_io_queues(struct nvme_rdma_ctrl *ctrl)
+static int nvme_rdma_start_io_queues(struct nvme_rdma_ctrl *ctrl,
+				     int first, int last)
 {
 	int i, ret = 0;
 
-	for (i = 1; i < ctrl->ctrl.queue_count; i++) {
+	for (i = first; i < last; i++) {
 		ret = nvme_rdma_start_queue(ctrl, i);
 		if (ret)
 			goto out_stop_queues;
@@ -709,7 +710,7 @@ static int nvme_rdma_start_io_queues(struct nvme_rdma_ctrl *ctrl)
 	return 0;
 
 out_stop_queues:
-	for (i--; i >= 1; i--)
+	for (i--; i >= first; i--)
 		nvme_rdma_stop_queue(&ctrl->queues[i]);
 	return ret;
 }
@@ -964,7 +965,7 @@ static void nvme_rdma_destroy_io_queues(struct nvme_rdma_ctrl *ctrl,
 
 static int nvme_rdma_configure_io_queues(struct nvme_rdma_ctrl *ctrl, bool new)
 {
-	int ret;
+	int ret, nr_queues;
 
 	ret = nvme_rdma_alloc_io_queues(ctrl);
 	if (ret)
@@ -980,7 +981,13 @@ static int nvme_rdma_configure_io_queues(struct nvme_rdma_ctrl *ctrl, bool new)
 			goto out_free_tag_set;
 	}
 
-	ret = nvme_rdma_start_io_queues(ctrl);
+	/*
+	 * Only start IO queues for which we have allocated the tagset
+	 * and limited it to the available queues. On reconnects, the
+	 * queue number might have changed.
+	 */
+	nr_queues = min(ctrl->tag_set.nr_hw_queues + 1, ctrl->ctrl.queue_count);
+	ret = nvme_rdma_start_io_queues(ctrl, 1, nr_queues);
 	if (ret)
 		goto out_cleanup_connect_q;
 
@@ -1000,6 +1007,15 @@ static int nvme_rdma_configure_io_queues(struct nvme_rdma_ctrl *ctrl, bool new)
 		nvme_unfreeze(&ctrl->ctrl);
 	}
 
+	/*
+	 * If the number of queues has increased (reconnect case)
+	 * start all new queues now.
+	 */
+	ret = nvme_rdma_start_io_queues(ctrl, nr_queues,
+					ctrl->tag_set.nr_hw_queues + 1);
+	if (ret)
+		goto out_wait_freeze_timed_out;
+
 	return 0;
 
 out_wait_freeze_timed_out:
-- 
2.37.2




* Re: [PATCH v4 0/3] Handle number of queue changes
  2022-08-29  9:28 [PATCH v4 0/3] Handle number of queue changes Daniel Wagner
                   ` (2 preceding siblings ...)
  2022-08-29  9:28 ` [PATCH v4 3/3] nvme-rdma: " Daniel Wagner
@ 2022-09-05 11:09 ` Christoph Hellwig
  3 siblings, 0 replies; 5+ messages in thread
From: Christoph Hellwig @ 2022-09-05 11:09 UTC (permalink / raw)
  To: Daniel Wagner; +Cc: linux-nvme, Sagi Grimberg, Hannes Reinecke, Chao Leng

Thanks,

applied to nvme-6.1.



end of thread, other threads:[~2022-09-05 14:11 UTC | newest]
