public inbox for linux-kernel@vger.kernel.org
* [PATCH v3 0/4] nvme: various bugs fix & code cleanup
@ 2024-12-03  3:34 brookxu.cn
  2024-12-03  3:34 ` [PATCH v3 1/4] nvme-tcp: fix the memleak while create new ctrl failed brookxu.cn
                   ` (4 more replies)
  0 siblings, 5 replies; 7+ messages in thread
From: brookxu.cn @ 2024-12-03  3:34 UTC (permalink / raw)
  To: kbusch, axboe, hch, sagi, linux-nvme, linux-kernel; +Cc: hare, lengchao

From: "Chunguang.xu" <chunguang.xu@shopee.com>

This series fixes a hang in nvme-rdma and a memory leak in nvme-tcp,
and cleans up the related nvme-tcp code.

Chunguang.xu (4):
  nvme-tcp: fix the memleak while create new ctrl failed
  nvme-rdma: unquiesce admin_q before destroy it
  nvme-tcp: no need to quiesce admin_q in nvme_tcp_teardown_io_queues()
  nvme-tcp: simplify nvme_tcp_teardown_io_queues()

 drivers/nvme/host/rdma.c |  8 +-------
 drivers/nvme/host/tcp.c  | 17 +++++------------
 2 files changed, 6 insertions(+), 19 deletions(-)

-- 
2.25.1



* [PATCH v3 1/4] nvme-tcp: fix the memleak while create new ctrl failed
  2024-12-03  3:34 [PATCH v3 0/4] nvme: various bugs fix & code cleanup brookxu.cn
@ 2024-12-03  3:34 ` brookxu.cn
  2024-12-03  7:19   ` Hannes Reinecke
  2024-12-03  3:34 ` [PATCH v3 2/4] nvme-rdma: unquiesce admin_q before destroy it brookxu.cn
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 7+ messages in thread
From: brookxu.cn @ 2024-12-03  3:34 UTC (permalink / raw)
  To: kbusch, axboe, hch, sagi, linux-nvme, linux-kernel; +Cc: hare, lengchao

From: "Chunguang.xu" <chunguang.xu@shopee.com>

When creating a new ctrl fails, the tagset occupied by admin_q is not
freed. Pass 'new' to nvme_tcp_teardown_admin_queue() on the error path
so that the tagset is released.

Fixes: fd1418de10b9 ("nvme-tcp: avoid open-coding nvme_tcp_teardown_admin_queue()")
Signed-off-by: Chunguang.xu <chunguang.xu@shopee.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 drivers/nvme/host/tcp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 3e416af2659f..55abfe5e1d25 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -2278,7 +2278,7 @@ static int nvme_tcp_setup_ctrl(struct nvme_ctrl *ctrl, bool new)
 	}
 destroy_admin:
 	nvme_stop_keep_alive(ctrl);
-	nvme_tcp_teardown_admin_queue(ctrl, false);
+	nvme_tcp_teardown_admin_queue(ctrl, new);
 	return ret;
 }
 
-- 
2.25.1
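
The effect of the one-line change above can be modelled in user space.
This is only a minimal sketch under stated assumptions: struct toy_ctrl
and every toy_* function are hypothetical stand-ins, not kernel code.
The sketch assumes that the admin tagset is allocated only for a new
controller and released only when the teardown's second argument is
true.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a controller; not the kernel's struct nvme_ctrl. */
struct toy_ctrl {
	void *admin_tagset;	/* allocated only when the controller is new */
};

/* Models setup: a new controller allocates its admin tagset. */
static int toy_configure_admin_queue(struct toy_ctrl *ctrl, bool new)
{
	if (new) {
		ctrl->admin_tagset = malloc(64);
		if (!ctrl->admin_tagset)
			return -1;
	}
	return 0;
}

/* Models teardown: the tagset is released only when remove is true. */
static void toy_teardown_admin_queue(struct toy_ctrl *ctrl, bool remove)
{
	assert(ctrl != NULL);
	if (remove) {
		free(ctrl->admin_tagset);
		ctrl->admin_tagset = NULL;
	}
}

/*
 * Runs a setup that fails after the admin queue was configured and
 * reports whether the tagset is still allocated after the error path.
 */
static bool toy_setup_fails_and_leaks(bool new, bool remove_arg)
{
	struct toy_ctrl ctrl = { .admin_tagset = NULL };
	bool leaked;

	if (toy_configure_admin_queue(&ctrl, new))
		return false;
	/* ... a later setup step fails, so the error path runs ... */
	toy_teardown_admin_queue(&ctrl, remove_arg);
	leaked = ctrl.admin_tagset != NULL;
	free(ctrl.admin_tagset);	/* keep the demo itself leak-free */
	return leaked;
}
```

In this model, toy_setup_fails_and_leaks(true, false) corresponds to
the old error path and reports a leak, while passing the same value for
both arguments, as the patch does with 'new', reports none.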



* [PATCH v3 2/4] nvme-rdma: unquiesce admin_q before destroy it
  2024-12-03  3:34 [PATCH v3 0/4] nvme: various bugs fix & code cleanup brookxu.cn
  2024-12-03  3:34 ` [PATCH v3 1/4] nvme-tcp: fix the memleak while create new ctrl failed brookxu.cn
@ 2024-12-03  3:34 ` brookxu.cn
  2024-12-03  3:34 ` [PATCH v3 3/4] nvme-tcp: no need to quiesce admin_q in nvme_tcp_teardown_io_queues() brookxu.cn
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 7+ messages in thread
From: brookxu.cn @ 2024-12-03  3:34 UTC (permalink / raw)
  To: kbusch, axboe, hch, sagi, linux-nvme, linux-kernel; +Cc: hare, lengchao

From: "Chunguang.xu" <chunguang.xu@shopee.com>

The kernel hangs while destroying admin_q when creating a ctrl fails,
as shown in the following call trace:

PID: 23644    TASK: ff2d52b40f439fc0  CPU: 2    COMMAND: "nvme"
 #0 [ff61d23de260fb78] __schedule at ffffffff8323bc15
 #1 [ff61d23de260fc08] schedule at ffffffff8323c014
 #2 [ff61d23de260fc28] blk_mq_freeze_queue_wait at ffffffff82a3dba1
 #3 [ff61d23de260fc78] blk_freeze_queue at ffffffff82a4113a
 #4 [ff61d23de260fc90] blk_cleanup_queue at ffffffff82a33006
 #5 [ff61d23de260fcb0] nvme_rdma_destroy_admin_queue at ffffffffc12686ce
 #6 [ff61d23de260fcc8] nvme_rdma_setup_ctrl at ffffffffc1268ced
 #7 [ff61d23de260fd28] nvme_rdma_create_ctrl at ffffffffc126919b
 #8 [ff61d23de260fd68] nvmf_dev_write at ffffffffc024f362
 #9 [ff61d23de260fe38] vfs_write at ffffffff827d5f25
    RIP: 00007fda7891d574  RSP: 00007ffe2ef06958  RFLAGS: 00000202
    RAX: ffffffffffffffda  RBX: 000055e8122a4d90  RCX: 00007fda7891d574
    RDX: 000000000000012b  RSI: 000055e8122a4d90  RDI: 0000000000000004
    RBP: 00007ffe2ef079c0   R8: 000000000000012b   R9: 000055e8122a4d90
    R10: 0000000000000000  R11: 0000000000000202  R12: 0000000000000004
    R13: 000055e8122923c0  R14: 000000000000012b  R15: 00007fda78a54500
    ORIG_RAX: 0000000000000001  CS: 0033  SS: 002b

This happens because admin_q is quiesced before its requests are
cancelled but is never unquiesced before it is destroyed. As a result
the pending requests cannot be drained, and the task hangs in
blk_mq_freeze_queue_wait() forever. Reuse
nvme_rdma_teardown_admin_queue() to fix this issue and simplify the
code.

Fixes: 958dc1d32c80 ("nvme-rdma: add clean action for failed reconnection")
Reported-by: Yingfu.zhou <yingfu.zhou@shopee.com>
Signed-off-by: Chunguang.xu <chunguang.xu@shopee.com>
Signed-off-by: Yue.zhao <yue.zhao@shopee.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
---
 drivers/nvme/host/rdma.c | 8 +-------
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index baf7d2490152..86a2891d9bcc 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -1091,13 +1091,7 @@ static int nvme_rdma_setup_ctrl(struct nvme_rdma_ctrl *ctrl, bool new)
 	}
 destroy_admin:
 	nvme_stop_keep_alive(&ctrl->ctrl);
-	nvme_quiesce_admin_queue(&ctrl->ctrl);
-	blk_sync_queue(ctrl->ctrl.admin_q);
-	nvme_rdma_stop_queue(&ctrl->queues[0]);
-	nvme_cancel_admin_tagset(&ctrl->ctrl);
-	if (new)
-		nvme_remove_admin_tag_set(&ctrl->ctrl);
-	nvme_rdma_destroy_admin_queue(ctrl);
+	nvme_rdma_teardown_admin_queue(ctrl, new);
 	return ret;
 }
 
-- 
2.25.1
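
The hang described in the commit message can be pictured with a toy
model. This is a sketch, not kernel code: struct toy_queue and the
toy_* functions are hypothetical, and the model assumes that a quiesced
queue never dispatches, so its pending requests are never completed.
The real wait sleeps indefinitely; the toy version gives up after a
bounded number of passes so that it can report the outcome.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a request queue. */
struct toy_queue {
	int pending;	/* requests that still hold a queue reference */
	bool quiesced;	/* dispatch is blocked while true */
};

/* One dispatch pass: completes all pending requests unless quiesced. */
static void toy_run_queue(struct toy_queue *q)
{
	if (!q->quiesced)
		q->pending = 0;
}

/*
 * Models waiting for the queue to drain. Returns true once nothing is
 * pending, or false if max_passes were used up without draining.
 */
static bool toy_freeze_wait(struct toy_queue *q, int max_passes)
{
	assert(max_passes > 0);
	for (int i = 0; i < max_passes && q->pending > 0; i++)
		toy_run_queue(q);
	return q->pending == 0;
}

/* Error path: the queue was quiesced earlier and is now destroyed. */
static bool toy_destroy_admin_queue(bool unquiesce_first)
{
	struct toy_queue q = { .pending = 3, .quiesced = true };

	if (unquiesce_first)
		q.quiesced = false;
	return toy_freeze_wait(&q, 1000);
}
```

In this model, toy_destroy_admin_queue(false) never drains, which
mirrors the task stuck in blk_mq_freeze_queue_wait(), while
toy_destroy_admin_queue(true) completes on the first pass.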



* [PATCH v3 3/4] nvme-tcp: no need to quiesce admin_q in nvme_tcp_teardown_io_queues()
  2024-12-03  3:34 [PATCH v3 0/4] nvme: various bugs fix & code cleanup brookxu.cn
  2024-12-03  3:34 ` [PATCH v3 1/4] nvme-tcp: fix the memleak while create new ctrl failed brookxu.cn
  2024-12-03  3:34 ` [PATCH v3 2/4] nvme-rdma: unquiesce admin_q before destroy it brookxu.cn
@ 2024-12-03  3:34 ` brookxu.cn
  2024-12-03  3:34 ` [PATCH v3 4/4] nvme-tcp: simplify nvme_tcp_teardown_io_queues() brookxu.cn
  2024-12-04 18:20 ` [PATCH v3 0/4] nvme: various bugs fix & code cleanup Keith Busch
  4 siblings, 0 replies; 7+ messages in thread
From: brookxu.cn @ 2024-12-03  3:34 UTC (permalink / raw)
  To: kbusch, axboe, hch, sagi, linux-nvme, linux-kernel; +Cc: hare, lengchao

From: "Chunguang.xu" <chunguang.xu@shopee.com>

admin_q is already quiesced in nvme_tcp_teardown_admin_queue(), so
there is no need to quiesce it in nvme_tcp_teardown_io_queues() as
well. Drop the redundant call to keep things simple.

Signed-off-by: Chunguang.xu <chunguang.xu@shopee.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
---
 drivers/nvme/host/tcp.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 55abfe5e1d25..98bf758dc6fc 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -2178,7 +2178,6 @@ static void nvme_tcp_teardown_io_queues(struct nvme_ctrl *ctrl,
 {
 	if (ctrl->queue_count <= 1)
 		return;
-	nvme_quiesce_admin_queue(ctrl);
 	nvme_quiesce_io_queues(ctrl);
 	nvme_sync_io_queues(ctrl);
 	nvme_tcp_stop_io_queues(ctrl);
-- 
2.25.1



* [PATCH v3 4/4] nvme-tcp: simplify nvme_tcp_teardown_io_queues()
  2024-12-03  3:34 [PATCH v3 0/4] nvme: various bugs fix & code cleanup brookxu.cn
                   ` (2 preceding siblings ...)
  2024-12-03  3:34 ` [PATCH v3 3/4] nvme-tcp: no need to quiesce admin_q in nvme_tcp_teardown_io_queues() brookxu.cn
@ 2024-12-03  3:34 ` brookxu.cn
  2024-12-04 18:20 ` [PATCH v3 0/4] nvme: various bugs fix & code cleanup Keith Busch
  4 siblings, 0 replies; 7+ messages in thread
From: brookxu.cn @ 2024-12-03  3:34 UTC (permalink / raw)
  To: kbusch, axboe, hch, sagi, linux-nvme, linux-kernel; +Cc: hare, lengchao

From: "Chunguang.xu" <chunguang.xu@shopee.com>

nvme_tcp_teardown_admin_queue() is the only caller of
nvme_tcp_destroy_admin_queue(), so merge the helper into
nvme_tcp_teardown_admin_queue() to simplify the code.

Signed-off-by: Chunguang.xu <chunguang.xu@shopee.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
---
 drivers/nvme/host/tcp.c | 14 ++++----------
 1 file changed, 4 insertions(+), 10 deletions(-)

v3: Update the commit log, no code changed.

diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 98bf758dc6fc..28c76a3e1bd2 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -2101,14 +2101,6 @@ static int nvme_tcp_configure_io_queues(struct nvme_ctrl *ctrl, bool new)
 	return ret;
 }
 
-static void nvme_tcp_destroy_admin_queue(struct nvme_ctrl *ctrl, bool remove)
-{
-	nvme_tcp_stop_queue(ctrl, 0);
-	if (remove)
-		nvme_remove_admin_tag_set(ctrl);
-	nvme_tcp_free_admin_queue(ctrl);
-}
-
 static int nvme_tcp_configure_admin_queue(struct nvme_ctrl *ctrl, bool new)
 {
 	int error;
@@ -2163,9 +2155,11 @@ static void nvme_tcp_teardown_admin_queue(struct nvme_ctrl *ctrl,
 	blk_sync_queue(ctrl->admin_q);
 	nvme_tcp_stop_queue(ctrl, 0);
 	nvme_cancel_admin_tagset(ctrl);
-	if (remove)
+	if (remove) {
 		nvme_unquiesce_admin_queue(ctrl);
-	nvme_tcp_destroy_admin_queue(ctrl, remove);
+		nvme_remove_admin_tag_set(ctrl);
+	}
+	nvme_tcp_free_admin_queue(ctrl);
 	if (ctrl->tls_pskid) {
 		dev_dbg(ctrl->device, "Wipe negotiated TLS_PSK %08x\n",
 			ctrl->tls_pskid);
-- 
2.25.1
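
One way to read the hunks above is to record the order of calls before
and after the merge. The sketch below is a user-space illustration
only: the enum values stand in for the kernel calls named in the diff,
and the trace helpers are hypothetical. It covers just the lines
visible in the hunks, so any step that runs before blk_sync_queue() or
after nvme_tcp_free_admin_queue() is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Each value stands in for one call that appears in the diff. */
enum step { SYNC, STOP, CANCEL, UNQUIESCE, REMOVE_TAGSET, FREE_QUEUE };

struct trace {
	enum step steps[8];
	int len;
};

static void record(struct trace *t, enum step s)
{
	assert(t->len < 8);
	t->steps[t->len++] = s;
}

/* The removed helper nvme_tcp_destroy_admin_queue(), as shown in the diff. */
static void old_destroy(struct trace *t, bool remove)
{
	record(t, STOP);
	if (remove)
		record(t, REMOVE_TAGSET);
	record(t, FREE_QUEUE);
}

/* Teardown tail before the patch. */
static void old_teardown(struct trace *t, bool remove)
{
	record(t, SYNC);
	record(t, STOP);
	record(t, CANCEL);
	if (remove)
		record(t, UNQUIESCE);
	old_destroy(t, remove);
}

/* Teardown tail after the patch, with the helper merged in. */
static void new_teardown(struct trace *t, bool remove)
{
	record(t, SYNC);
	record(t, STOP);
	record(t, CANCEL);
	if (remove) {
		record(t, UNQUIESCE);
		record(t, REMOVE_TAGSET);
	}
	record(t, FREE_QUEUE);
}

/* Counts how often step s occurs in a trace produced by fn. */
static int count_step(void (*fn)(struct trace *, bool), enum step s,
		      bool remove)
{
	struct trace t = { .len = 0 };
	int n = 0;

	fn(&t, remove);
	for (int i = 0; i < t.len; i++)
		if (t.steps[i] == s)
			n++;
	return n;
}
```

Under this reading, the only difference between the two traces is that
the old path records STOP twice, because the helper stopped queue 0
again, while the merged path records it once. Whether the second stop
was a no-op in the real driver is an assumption the sketch does not
verify.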



* Re: [PATCH v3 1/4] nvme-tcp: fix the memleak while create new ctrl failed
  2024-12-03  3:34 ` [PATCH v3 1/4] nvme-tcp: fix the memleak while create new ctrl failed brookxu.cn
@ 2024-12-03  7:19   ` Hannes Reinecke
  0 siblings, 0 replies; 7+ messages in thread
From: Hannes Reinecke @ 2024-12-03  7:19 UTC (permalink / raw)
  To: brookxu.cn, kbusch, axboe, hch, sagi, linux-nvme, linux-kernel; +Cc: lengchao

On 12/3/24 04:34, brookxu.cn wrote:
> From: "Chunguang.xu" <chunguang.xu@shopee.com>
> 
> When creating a new ctrl fails, the tagset occupied by admin_q is not
> freed. Pass 'new' to nvme_tcp_teardown_admin_queue() on the error path
> so that the tagset is released.
> 
> Fixes: fd1418de10b9 ("nvme-tcp: avoid open-coding nvme_tcp_teardown_admin_queue()")
> Signed-off-by: Chunguang.xu <chunguang.xu@shopee.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> ---
>   drivers/nvme/host/tcp.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
> index 3e416af2659f..55abfe5e1d25 100644
> --- a/drivers/nvme/host/tcp.c
> +++ b/drivers/nvme/host/tcp.c
> @@ -2278,7 +2278,7 @@ static int nvme_tcp_setup_ctrl(struct nvme_ctrl *ctrl, bool new)
>   	}
>   destroy_admin:
>   	nvme_stop_keep_alive(ctrl);
> -	nvme_tcp_teardown_admin_queue(ctrl, false);
> +	nvme_tcp_teardown_admin_queue(ctrl, new);
>   	return ret;
>   }
>   
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                  Kernel Storage Architect
hare@suse.de                                +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich


* Re: [PATCH v3 0/4] nvme: various bugs fix & code cleanup
  2024-12-03  3:34 [PATCH v3 0/4] nvme: various bugs fix & code cleanup brookxu.cn
                   ` (3 preceding siblings ...)
  2024-12-03  3:34 ` [PATCH v3 4/4] nvme-tcp: simplify nvme_tcp_teardown_io_queues() brookxu.cn
@ 2024-12-04 18:20 ` Keith Busch
  4 siblings, 0 replies; 7+ messages in thread
From: Keith Busch @ 2024-12-04 18:20 UTC (permalink / raw)
  To: brookxu.cn; +Cc: axboe, hch, sagi, linux-nvme, linux-kernel, hare, lengchao

On Tue, Dec 03, 2024 at 11:34:39AM +0800, brookxu.cn wrote:
> From: "Chunguang.xu" <chunguang.xu@shopee.com>
> 
> This series fixes a hang in nvme-rdma and a memory leak in nvme-tcp,
> and cleans up the related nvme-tcp code.

Thanks, applied to nvme-6.13.

