* [PATCH v2 0/5] nvme: various bugs fix & code cleanup
From: brookxu.cn @ 2024-11-27  9:27 UTC
  To: kbusch, axboe, hch, sagi, hare; +Cc: linux-nvme, linux-kernel

From: "Chunguang.xu" <chunguang.xu@shopee.com>

This series fixes a hang in nvme-rdma and a memory leak in nvme-tcp,
and cleans up the nvme-tcp code.

Chunguang.xu (5):
  nvme-tcp: fix the memleak while create new ctrl failed
  nvme-rdma: unquiesce admin_q before destroy it
  nvme-tcp: no need to quiesce admin_q in nvme_tcp_teardown_io_queues()
  nvme-tcp: simplify nvme_tcp_configure_admin_queue()
  nvme-tcp: remove nvme_tcp_destroy_io_queues()

 drivers/nvme/host/rdma.c |  8 +------
 drivers/nvme/host/tcp.c  | 49 ++++++++++++++++------------------------
 2 files changed, 20 insertions(+), 37 deletions(-)

-- 
2.25.1




* [PATCH v2 1/5] nvme-tcp: fix the memleak while create new ctrl failed
From: brookxu.cn @ 2024-11-27  9:27 UTC
  To: kbusch, axboe, hch, sagi, hare; +Cc: linux-nvme, linux-kernel

From: "Chunguang.xu" <chunguang.xu@shopee.com>

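When creating a new controller fails, the tag set allocated for admin_q
is never freed, because the error path calls
nvme_tcp_teardown_admin_queue() with remove == false. Pass @new instead,
so that the admin tag set is removed when a new controller fails to set up.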
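To make the leak concrete, here is a minimal user-space sketch in C. It assumes the error path can be reduced to two allocations (the admin tag set and the queue 0 resources); `struct ctrl_model`, `model_alloc()`, `model_teardown_admin()` and `model_setup_ctrl()` are hypothetical names, not kernel APIs, and the model ignores everything else the real error path does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model: these are NOT kernel structures or APIs. */
struct ctrl_model {
	void *admin_tag_set;	/* stands in for the admin blk_mq_tag_set */
	void *admin_queue;	/* stands in for the queue 0 resources */
};

static int live_allocs;	/* outstanding allocations; nonzero after teardown = leak */

static void *model_alloc(void)
{
	void *p = malloc(sizeof(int));

	if (p)
		live_allocs++;
	return p;
}

static void model_free(void *p)
{
	if (p) {
		live_allocs--;
		free(p);
	}
}

/* Same shape as nvme_tcp_teardown_admin_queue(ctrl, remove) in this thread. */
static void model_teardown_admin(struct ctrl_model *c, bool remove)
{
	if (remove) {
		model_free(c->admin_tag_set);	/* nvme_remove_admin_tag_set() */
		c->admin_tag_set = NULL;
	}
	model_free(c->admin_queue);		/* nvme_tcp_free_admin_queue() */
	c->admin_queue = NULL;
}

/*
 * Controller setup that fails after the admin queue is up.
 * @patched selects the call this patch installs on the error path.
 */
static int model_setup_ctrl(struct ctrl_model *c, bool new, bool patched)
{
	if (new)
		c->admin_tag_set = model_alloc();
	c->admin_queue = model_alloc();

	/* ... assume I/O queue setup fails here ... */
	model_teardown_admin(c, patched ? new : false);
	return -1;
}
```

With `patched == false` one allocation (the tag set) is still live after the failed setup; with `patched == true` the count returns to zero.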

Fixes: fd1418de10b9 ("nvme-tcp: avoid open-coding nvme_tcp_teardown_admin_queue()")
Signed-off-by: Chunguang.xu <chunguang.xu@shopee.com>
---
 drivers/nvme/host/tcp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 3e416af2659f..55abfe5e1d25 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -2278,7 +2278,7 @@ static int nvme_tcp_setup_ctrl(struct nvme_ctrl *ctrl, bool new)
 	}
 destroy_admin:
 	nvme_stop_keep_alive(ctrl);
-	nvme_tcp_teardown_admin_queue(ctrl, false);
+	nvme_tcp_teardown_admin_queue(ctrl, new);
 	return ret;
 }
 
-- 
2.25.1




* [PATCH v2 2/5] nvme-rdma: unquiesce admin_q before destroy it
From: brookxu.cn @ 2024-11-27  9:27 UTC
  To: kbusch, axboe, hch, sagi, hare; +Cc: linux-nvme, linux-kernel

From: "Chunguang.xu" <chunguang.xu@shopee.com>

The kernel hangs while destroying admin_q when creating a controller
fails, as shown in the following call trace:

PID: 23644    TASK: ff2d52b40f439fc0  CPU: 2    COMMAND: "nvme"
 #0 [ff61d23de260fb78] __schedule at ffffffff8323bc15
 #1 [ff61d23de260fc08] schedule at ffffffff8323c014
 #2 [ff61d23de260fc28] blk_mq_freeze_queue_wait at ffffffff82a3dba1
 #3 [ff61d23de260fc78] blk_freeze_queue at ffffffff82a4113a
 #4 [ff61d23de260fc90] blk_cleanup_queue at ffffffff82a33006
 #5 [ff61d23de260fcb0] nvme_rdma_destroy_admin_queue at ffffffffc12686ce [nvme_rdma]
 #6 [ff61d23de260fcc8] nvme_rdma_setup_ctrl at ffffffffc1268ced [nvme_rdma]
 #7 [ff61d23de260fd28] nvme_rdma_create_ctrl at ffffffffc126919b [nvme_rdma]
 #8 [ff61d23de260fd68] nvmf_dev_write at ffffffffc024f362 [nvme_fabrics]
 #9 [ff61d23de260fe38] vfs_write at ffffffff827d5f25
    RIP: 00007fda7891d574  RSP: 00007ffe2ef06958  RFLAGS: 00000202
    RAX: ffffffffffffffda  RBX: 000055e8122a4d90  RCX: 00007fda7891d574
    RDX: 000000000000012b  RSI: 000055e8122a4d90  RDI: 0000000000000004
    RBP: 00007ffe2ef079c0   R8: 000000000000012b   R9: 000055e8122a4d90
    R10: 0000000000000000  R11: 0000000000000202  R12: 0000000000000004
    R13: 000055e8122923c0  R14: 000000000000012b  R15: 00007fda78a54500
    ORIG_RAX: 0000000000000001  CS: 0033  SS: 002b

This happens because admin_q is quiesced before the requests are
cancelled, but is never unquiesced before it is destroyed. As a result
the pending requests cannot be drained, and blk_mq_freeze_queue_wait()
waits forever. Fix this by reusing nvme_rdma_teardown_admin_queue(),
which unquiesces admin_q when called with remove == true, and simplify
the code.
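The following C sketch is a hypothetical user-space model of why the freeze never completes, assuming that a quiesced queue dispatches nothing and that requests left waiting in it are what keep the freeze from finishing. `struct admin_q_model`, `run_queue()`, `freeze_wait()` and `destroy_admin_on_error()` are illustrative names; `freeze_wait()` is bounded only so the model terminates, whereas blk_mq_freeze_queue_wait() sleeps indefinitely.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for admin_q; not the block layer's request_queue. */
struct admin_q_model {
	bool quiesced;	/* a quiesced queue does not dispatch */
	int pending;	/* requests queued but not yet dispatched */
};

/*
 * One dispatch pass. Assumption: with the transport stopped, dispatched
 * requests fail immediately, so an unquiesced queue drains completely.
 */
static void run_queue(struct admin_q_model *q)
{
	if (!q->quiesced)
		q->pending = 0;
}

/*
 * Bounded stand-in for blk_mq_freeze_queue_wait(): the real function
 * sleeps until the queue is empty; here we give up after max_passes.
 */
static bool freeze_wait(struct admin_q_model *q, int max_passes)
{
	for (int i = 0; i < max_passes && q->pending > 0; i++)
		run_queue(q);
	return q->pending == 0;
}

/*
 * Error path for a new controller; @fixed models calling
 * nvme_rdma_teardown_admin_queue(ctrl, true) instead of the open-coded steps.
 */
static bool destroy_admin_on_error(struct admin_q_model *q, bool fixed)
{
	q->quiesced = true;		/* nvme_quiesce_admin_queue() */
	/* nvme_cancel_admin_tagset() only handles in-flight requests. */
	if (fixed)
		q->quiesced = false;	/* nvme_unquiesce_admin_queue() */
	return freeze_wait(q, 8);	/* destroying admin_q freezes it */
}
```

With the old error path `destroy_admin_on_error(q, false)` returns false (still waiting); with the reused teardown helper it returns true.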

Reported-by: Yingfu.zhou <yingfu.zhou@shopee.com>
Signed-off-by: Chunguang.xu <chunguang.xu@shopee.com>
Signed-off-by: Yue.zhao <yue.zhao@shopee.com>
---
 drivers/nvme/host/rdma.c | 8 +-------
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index 24a2759798d0..913e6e5a8070 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -1091,13 +1091,7 @@ static int nvme_rdma_setup_ctrl(struct nvme_rdma_ctrl *ctrl, bool new)
 	}
 destroy_admin:
 	nvme_stop_keep_alive(&ctrl->ctrl);
-	nvme_quiesce_admin_queue(&ctrl->ctrl);
-	blk_sync_queue(ctrl->ctrl.admin_q);
-	nvme_rdma_stop_queue(&ctrl->queues[0]);
-	nvme_cancel_admin_tagset(&ctrl->ctrl);
-	if (new)
-		nvme_remove_admin_tag_set(&ctrl->ctrl);
-	nvme_rdma_destroy_admin_queue(ctrl);
+	nvme_rdma_teardown_admin_queue(ctrl, new);
 	return ret;
 }
 
-- 
2.25.1




* [PATCH v2 3/5] nvme-tcp: no need to quiesce admin_q in nvme_tcp_teardown_io_queues()
From: brookxu.cn @ 2024-11-27  9:27 UTC
  To: kbusch, axboe, hch, sagi, hare; +Cc: linux-nvme, linux-kernel

From: "Chunguang.xu" <chunguang.xu@shopee.com>

nvme_tcp_teardown_admin_queue() already quiesces admin_q, so there is no
need to quiesce it in nvme_tcp_teardown_io_queues() as well. Remove the
redundant call to keep things simple.
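A small C model of the argument, assuming that every caller of nvme_tcp_teardown_io_queues() later calls nvme_tcp_teardown_admin_queue(), and that quiescing an already-quiesced admin queue has no further effect. All names below are hypothetical; the model checks only the final admin_q state, not what could happen between the two teardown steps.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the admin-queue quiesce state. */
struct quiesce_model {
	bool admin_stopped;	/* flag guarding the quiesce (assumed idempotent) */
	int admin_quiesce_ops;	/* quiesce operations that actually took effect */
	int queue_count;	/* > 1 means I/O queues exist */
};

static void quiesce_admin(struct quiesce_model *m)
{
	if (!m->admin_stopped) {	/* a second call is a no-op */
		m->admin_stopped = true;
		m->admin_quiesce_ops++;
	}
}

static void teardown_io(struct quiesce_model *m, bool also_quiesce_admin)
{
	if (m->queue_count <= 1)
		return;
	if (also_quiesce_admin)	/* the call this patch removes */
		quiesce_admin(m);
	m->queue_count = 1;	/* I/O queues torn down */
}

static void teardown_admin(struct quiesce_model *m)
{
	quiesce_admin(m);	/* the admin teardown always quiesces admin_q */
}

/* Assumed caller order: I/O queues first, then the admin queue. */
static void teardown_ctrl(struct quiesce_model *m, bool old_behaviour)
{
	teardown_io(m, old_behaviour);
	teardown_admin(m);
}
```

Under these assumptions both variants end with admin_q quiesced exactly once.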

Signed-off-by: Chunguang.xu <chunguang.xu@shopee.com>
---
 drivers/nvme/host/tcp.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 55abfe5e1d25..98bf758dc6fc 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -2178,7 +2178,6 @@ static void nvme_tcp_teardown_io_queues(struct nvme_ctrl *ctrl,
 {
 	if (ctrl->queue_count <= 1)
 		return;
-	nvme_quiesce_admin_queue(ctrl);
 	nvme_quiesce_io_queues(ctrl);
 	nvme_sync_io_queues(ctrl);
 	nvme_tcp_stop_io_queues(ctrl);
-- 
2.25.1




* [PATCH v2 4/5] nvme-tcp: simplify nvme_tcp_configure_admin_queue()
From: brookxu.cn @ 2024-11-27  9:27 UTC
  To: kbusch, axboe, hch, sagi, hare; +Cc: linux-nvme, linux-kernel

From: "Chunguang.xu" <chunguang.xu@shopee.com>

nvme_tcp_teardown_admin_queue() is the only caller of
nvme_tcp_destroy_admin_queue(), so fold nvme_tcp_destroy_admin_queue()
into nvme_tcp_teardown_admin_queue() to simplify the code. This also
drops the second nvme_tcp_stop_queue(ctrl, 0) call on the teardown path.
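One way to check that such a merge preserves behaviour is to record the sequence of operations before and after and compare them. The C sketch below does that with hypothetical string tags standing in for the calls shown in the diff (the initial quiesce comes from the part of nvme_tcp_teardown_admin_queue() outside the hunk). It assumes a repeated nvme_tcp_stop_queue(ctrl, 0) is a no-op, which is the only difference the comparison tolerates; `struct trace`, `rec()` and the other functions are illustrative, not kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_OPS 16

/* Hypothetical op recorder: each modelled call appends its name. */
struct trace {
	const char *ops[MAX_OPS];
	int n;
};

static void rec(struct trace *t, const char *op)
{
	assert(t->n < MAX_OPS);
	t->ops[t->n++] = op;
}

/* Old shape: the destroy helper stops queue 0 a second time. */
static void old_destroy_admin(struct trace *t, bool remove)
{
	rec(t, "stop_queue0");
	if (remove)
		rec(t, "remove_admin_tag_set");
	rec(t, "free_admin_queue");
}

static void old_teardown_admin(struct trace *t, bool remove)
{
	rec(t, "quiesce_admin");
	rec(t, "sync_admin");
	rec(t, "stop_queue0");
	rec(t, "cancel_admin_tagset");
	if (remove)
		rec(t, "unquiesce_admin");
	old_destroy_admin(t, remove);
}

/* New shape after the merge in this patch. */
static void new_teardown_admin(struct trace *t, bool remove)
{
	rec(t, "quiesce_admin");
	rec(t, "sync_admin");
	rec(t, "stop_queue0");
	rec(t, "cancel_admin_tagset");
	if (remove) {
		rec(t, "unquiesce_admin");
		rec(t, "remove_admin_tag_set");
	}
	rec(t, "free_admin_queue");
}

/* True if @b equals @a with every repeated "stop_queue0" dropped. */
static bool same_modulo_repeat_stop(const struct trace *a, const struct trace *b)
{
	bool stopped = false;
	int j = 0;

	for (int i = 0; i < a->n; i++) {
		if (strcmp(a->ops[i], "stop_queue0") == 0) {
			if (stopped)
				continue;	/* assumed no-op second stop */
			stopped = true;
		}
		if (j >= b->n || strcmp(a->ops[i], b->ops[j]) != 0)
			return false;
		j++;
	}
	return j == b->n;
}
```

For both values of `remove`, the new trace is one entry shorter and otherwise identical.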

Signed-off-by: Chunguang.xu <chunguang.xu@shopee.com>
---
 drivers/nvme/host/tcp.c | 14 ++++----------
 1 file changed, 4 insertions(+), 10 deletions(-)

diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 98bf758dc6fc..28c76a3e1bd2 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -2101,14 +2101,6 @@ static int nvme_tcp_configure_io_queues(struct nvme_ctrl *ctrl, bool new)
 	return ret;
 }
 
-static void nvme_tcp_destroy_admin_queue(struct nvme_ctrl *ctrl, bool remove)
-{
-	nvme_tcp_stop_queue(ctrl, 0);
-	if (remove)
-		nvme_remove_admin_tag_set(ctrl);
-	nvme_tcp_free_admin_queue(ctrl);
-}
-
 static int nvme_tcp_configure_admin_queue(struct nvme_ctrl *ctrl, bool new)
 {
 	int error;
@@ -2163,9 +2155,11 @@ static void nvme_tcp_teardown_admin_queue(struct nvme_ctrl *ctrl,
 	blk_sync_queue(ctrl->admin_q);
 	nvme_tcp_stop_queue(ctrl, 0);
 	nvme_cancel_admin_tagset(ctrl);
-	if (remove)
+	if (remove) {
 		nvme_unquiesce_admin_queue(ctrl);
-	nvme_tcp_destroy_admin_queue(ctrl, remove);
+		nvme_remove_admin_tag_set(ctrl);
+	}
+	nvme_tcp_free_admin_queue(ctrl);
 	if (ctrl->tls_pskid) {
 		dev_dbg(ctrl->device, "Wipe negotiated TLS_PSK %08x\n",
 			ctrl->tls_pskid);
-- 
2.25.1




* [PATCH v2 5/5] nvme-tcp: remove nvme_tcp_destroy_io_queues()
From: brookxu.cn @ 2024-11-27  9:27 UTC
  To: kbusch, axboe, hch, sagi, hare; +Cc: linux-nvme, linux-kernel

From: "Chunguang.xu" <chunguang.xu@shopee.com>

When tearing down the I/O queues we currently call
nvme_tcp_stop_io_queues() twice: once in nvme_tcp_teardown_io_queues()
and again in nvme_tcp_destroy_io_queues(). Remove
nvme_tcp_destroy_io_queues() and merge it into
nvme_tcp_teardown_io_queues() to drop the redundant call, simplify the
code and align with nvme-rdma, which makes it easier to maintain.
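A hypothetical C model that counts how often the stop helper runs in the old and new nvme_tcp_teardown_io_queues() shapes. Only the stop, tag-set removal and free steps are modelled; the quiesce, sync, cancel and unquiesce steps are omitted, and `struct io_model` and the functions below are illustrative names, not kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical counters standing in for the side effects of the helpers. */
struct io_model {
	int queue_count;	/* admin queue + I/O queues, like ctrl->queue_count */
	int stop_calls;		/* how often the stop helper ran */
	bool tag_set;		/* I/O tag set still allocated */
	bool queues_allocated;	/* I/O queue resources still allocated */
};

static void stop_io(struct io_model *m)
{
	m->stop_calls++;
}

/* Old: the destroy helper stops the queues again. */
static void old_destroy_io(struct io_model *m, bool remove)
{
	stop_io(m);
	if (remove)
		m->tag_set = false;
	m->queues_allocated = false;
}

static void old_teardown_io(struct io_model *m, bool remove)
{
	if (m->queue_count <= 1)
		return;
	stop_io(m);
	old_destroy_io(m, remove);
}

/* New: destroy helper folded in, a single stop call. */
static void new_teardown_io(struct io_model *m, bool remove)
{
	if (m->queue_count > 1) {
		stop_io(m);
		if (remove)
			m->tag_set = false;
		m->queues_allocated = false;
	}
}
```

Both versions leave the same end state; the new one calls the stop helper once instead of twice, and still does nothing when `queue_count <= 1`.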

Signed-off-by: Chunguang.xu <chunguang.xu@shopee.com>
---
 drivers/nvme/host/tcp.c | 32 ++++++++++++++------------------
 1 file changed, 14 insertions(+), 18 deletions(-)

diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 28c76a3e1bd2..36c7e49af38a 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -2024,14 +2024,6 @@ static int nvme_tcp_alloc_io_queues(struct nvme_ctrl *ctrl)
 	return __nvme_tcp_alloc_io_queues(ctrl);
 }
 
-static void nvme_tcp_destroy_io_queues(struct nvme_ctrl *ctrl, bool remove)
-{
-	nvme_tcp_stop_io_queues(ctrl);
-	if (remove)
-		nvme_remove_io_tag_set(ctrl);
-	nvme_tcp_free_io_queues(ctrl);
-}
-
 static int nvme_tcp_configure_io_queues(struct nvme_ctrl *ctrl, bool new)
 {
 	int ret, nr_queues;
@@ -2170,15 +2162,17 @@ static void nvme_tcp_teardown_admin_queue(struct nvme_ctrl *ctrl,
 static void nvme_tcp_teardown_io_queues(struct nvme_ctrl *ctrl,
 		bool remove)
 {
-	if (ctrl->queue_count <= 1)
-		return;
-	nvme_quiesce_io_queues(ctrl);
-	nvme_sync_io_queues(ctrl);
-	nvme_tcp_stop_io_queues(ctrl);
-	nvme_cancel_tagset(ctrl);
-	if (remove)
-		nvme_unquiesce_io_queues(ctrl);
-	nvme_tcp_destroy_io_queues(ctrl, remove);
+	if (ctrl->queue_count > 1) {
+		nvme_quiesce_io_queues(ctrl);
+		nvme_sync_io_queues(ctrl);
+		nvme_tcp_stop_io_queues(ctrl);
+		nvme_cancel_tagset(ctrl);
+		if (remove) {
+			nvme_unquiesce_io_queues(ctrl);
+			nvme_remove_io_tag_set(ctrl);
+		}
+		nvme_tcp_free_io_queues(ctrl);
+	}
 }
 
 static void nvme_tcp_reconnect_or_remove(struct nvme_ctrl *ctrl,
@@ -2267,7 +2261,9 @@ static int nvme_tcp_setup_ctrl(struct nvme_ctrl *ctrl, bool new)
 		nvme_sync_io_queues(ctrl);
 		nvme_tcp_stop_io_queues(ctrl);
 		nvme_cancel_tagset(ctrl);
-		nvme_tcp_destroy_io_queues(ctrl, new);
+		if (new)
+			nvme_remove_io_tag_set(ctrl);
+		nvme_tcp_free_io_queues(ctrl);
 	}
 destroy_admin:
 	nvme_stop_keep_alive(ctrl);
-- 
2.25.1




* Re: [PATCH v2 1/5] nvme-tcp: fix the memleak while create new ctrl failed
From: Christoph Hellwig @ 2024-11-29  8:10 UTC
  To: brookxu.cn; +Cc: kbusch, axboe, hch, sagi, hare, linux-nvme, linux-kernel

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>




* Re: [PATCH v2 2/5] nvme-rdma: unquiesce admin_q before destroy it
From: Christoph Hellwig @ 2024-11-29  8:11 UTC
  To: brookxu.cn; +Cc: kbusch, axboe, hch, sagi, hare, linux-nvme, linux-kernel

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>




* Re: [PATCH v2 3/5] nvme-tcp: no need to quiesce admin_q in nvme_tcp_teardown_io_queues()
From: Christoph Hellwig @ 2024-11-29  8:11 UTC
  To: brookxu.cn; +Cc: kbusch, axboe, hch, sagi, hare, linux-nvme, linux-kernel

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>



* Re: [PATCH v2 4/5] nvme-tcp: simplify nvme_tcp_configure_admin_queue()
From: Christoph Hellwig @ 2024-11-29  8:11 UTC
  To: brookxu.cn; +Cc: kbusch, axboe, hch, sagi, hare, linux-nvme, linux-kernel

On Wed, Nov 27, 2024 at 05:27:50PM +0800, brookxu.cn wrote:
> From: "Chunguang.xu" <chunguang.xu@shopee.com>
> 
> As nvme_tcp_configure_admin_queue() is the only one caller of
> nvme_tcp_destroy_admin_queue(), so we can merge nvme_tcp_configure_admin_queue()
> into nvme_tcp_destroy_admin_queue() to simplify the code.


Need a little fixing for the line length here in the commit message,
but otherwise looks good:


Reviewed-by: Christoph Hellwig <hch@lst.de>



* Re: [PATCH v2 5/5] nvme-tcp: remove nvme_tcp_destroy_io_queues()
From: Christoph Hellwig @ 2024-11-29  8:12 UTC
  To: brookxu.cn; +Cc: kbusch, axboe, hch, sagi, hare, linux-nvme, linux-kernel

On Wed, Nov 27, 2024 at 05:27:51PM +0800, brookxu.cn wrote:
> From: "Chunguang.xu" <chunguang.xu@shopee.com>
> 
> Now when destroying the IO queue we call nvme_tcp_stop_io_queues() twice,
> nvme_tcp_destroy_io_queues() has an unnecessary call. Here we try to remove
> nvme_tcp_destroy_io_queues() and merge it into nvme_tcp_teardown_io_queues(),
> simplify the code and align with nvme-rdma, make it easy to maintaince.

Please split the reorganization from the fix.

Also can you add Fixes tag to the various bug fix patches?



