* [PATCH v3 0/4] nvme: various bugs fix & code cleanup
@ 2024-12-03 3:34 brookxu.cn
2024-12-03 3:34 ` [PATCH v3 1/4] nvme-tcp: fix the memleak while create new ctrl failed brookxu.cn
` (4 more replies)
0 siblings, 5 replies; 7+ messages in thread
From: brookxu.cn @ 2024-12-03 3:34 UTC
To: kbusch, axboe, hch, sagi, linux-nvme, linux-kernel; +Cc: hare, lengchao
From: "Chunguang.xu" <chunguang.xu@shopee.com>
This series fixes a hang in nvme-rdma and a memory leak in nvme-tcp,
and cleans up the related nvme-tcp code.
Chunguang.xu (4):
nvme-tcp: fix the memleak while create new ctrl failed
nvme-rdma: unquiesce admin_q before destroy it
nvme-tcp: no need to quiesce admin_q in nvme_tcp_teardown_io_queues()
nvme-tcp: simplify nvme_tcp_teardown_io_queues()
drivers/nvme/host/rdma.c | 8 +-------
drivers/nvme/host/tcp.c | 17 +++++------------
2 files changed, 6 insertions(+), 19 deletions(-)
--
2.25.1
* [PATCH v3 1/4] nvme-tcp: fix the memleak while create new ctrl failed
From: brookxu.cn @ 2024-12-03 3:34 UTC
To: kbusch, axboe, hch, sagi, linux-nvme, linux-kernel; +Cc: hare, lengchao
From: "Chunguang.xu" <chunguang.xu@shopee.com>
When creating a new ctrl fails, the tagset allocated for admin_q is
not freed. Pass 'new' to nvme_tcp_teardown_admin_queue() so that the
tagset is removed on this error path.
Fixes: fd1418de10b9 ("nvme-tcp: avoid open-coding nvme_tcp_teardown_admin_queue()")
Signed-off-by: Chunguang.xu <chunguang.xu@shopee.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
drivers/nvme/host/tcp.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 3e416af2659f..55abfe5e1d25 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -2278,7 +2278,7 @@ static int nvme_tcp_setup_ctrl(struct nvme_ctrl *ctrl, bool new)
 	}
 destroy_admin:
 	nvme_stop_keep_alive(ctrl);
-	nvme_tcp_teardown_admin_queue(ctrl, false);
+	nvme_tcp_teardown_admin_queue(ctrl, new);
 	return ret;
 }
--
2.25.1
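The leak fixed by patch 1/4 can be illustrated with a small userspace model. This is only a sketch: `struct model_ctrl` and all `model_*` functions are hypothetical stand-ins, not kernel APIs, and the model assumes that the admin tagset is allocated only when `new` is true and released only when teardown is called with `remove` set to true.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for the controller; not the kernel struct. */
struct model_ctrl {
	void *admin_tagset;	/* allocated only when a new controller is set up */
};

/* Models admin queue configuration: the tagset is allocated only if 'new'. */
static int model_configure_admin_queue(struct model_ctrl *ctrl, bool new)
{
	if (new) {
		ctrl->admin_tagset = malloc(64);
		if (!ctrl->admin_tagset)
			return -1;
	}
	return 0;
}

/* Models teardown: the tagset is released only when 'remove' is true. */
static void model_teardown_admin_queue(struct model_ctrl *ctrl, bool remove)
{
	if (remove) {
		free(ctrl->admin_tagset);
		ctrl->admin_tagset = NULL;
	}
}

/*
 * Models the destroy_admin error path of setup_ctrl.
 * Returns true if the tagset is still allocated after teardown.
 */
static bool model_setup_fails(bool new, bool fixed)
{
	struct model_ctrl ctrl = { 0 };
	bool leaked;

	if (model_configure_admin_queue(&ctrl, new))
		return false;

	/* Old code passed 'false' unconditionally; the fix passes 'new'. */
	model_teardown_admin_queue(&ctrl, fixed ? new : false);

	leaked = ctrl.admin_tagset != NULL;
	free(ctrl.admin_tagset);	/* keep the model itself leak-free */
	return leaked;
}
```

In this model, `model_setup_fails(true, false)` reports a leftover tagset, while `model_setup_fails(true, true)` does not. The reconnect case (`new` false) is unaffected either way, because nothing is allocated on that path.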
* [PATCH v3 2/4] nvme-rdma: unquiesce admin_q before destroy it
From: brookxu.cn @ 2024-12-03 3:34 UTC
To: kbusch, axboe, hch, sagi, linux-nvme, linux-kernel; +Cc: hare, lengchao
From: "Chunguang.xu" <chunguang.xu@shopee.com>
The kernel hangs while destroying admin_q when creating a ctrl fails,
as in the following call trace:
PID: 23644 TASK: ff2d52b40f439fc0 CPU: 2 COMMAND: "nvme"
#0 [ff61d23de260fb78] __schedule at ffffffff8323bc15
#1 [ff61d23de260fc08] schedule at ffffffff8323c014
#2 [ff61d23de260fc28] blk_mq_freeze_queue_wait at ffffffff82a3dba1
#3 [ff61d23de260fc78] blk_freeze_queue at ffffffff82a4113a
#4 [ff61d23de260fc90] blk_cleanup_queue at ffffffff82a33006
#5 [ff61d23de260fcb0] nvme_rdma_destroy_admin_queue at ffffffffc12686ce
#6 [ff61d23de260fcc8] nvme_rdma_setup_ctrl at ffffffffc1268ced
#7 [ff61d23de260fd28] nvme_rdma_create_ctrl at ffffffffc126919b
#8 [ff61d23de260fd68] nvmf_dev_write at ffffffffc024f362
#9 [ff61d23de260fe38] vfs_write at ffffffff827d5f25
RIP: 00007fda7891d574 RSP: 00007ffe2ef06958 RFLAGS: 00000202
RAX: ffffffffffffffda RBX: 000055e8122a4d90 RCX: 00007fda7891d574
RDX: 000000000000012b RSI: 000055e8122a4d90 RDI: 0000000000000004
RBP: 00007ffe2ef079c0 R8: 000000000000012b R9: 000055e8122a4d90
R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000004
R13: 000055e8122923c0 R14: 000000000000012b R15: 00007fda78a54500
ORIG_RAX: 0000000000000001 CS: 0033 SS: 002b
This happens because admin_q is quiesced before its requests are
cancelled, but is never unquiesced before it is destroyed. As a result
the pending requests cannot be drained, and the task hangs in
blk_mq_freeze_queue_wait() forever. Reuse
nvme_rdma_teardown_admin_queue() to fix this issue and simplify the
code.
Fixes: 958dc1d32c80 ("nvme-rdma: add clean action for failed reconnection")
Reported-by: Yingfu.zhou <yingfu.zhou@shopee.com>
Signed-off-by: Chunguang.xu <chunguang.xu@shopee.com>
Signed-off-by: Yue.zhao <yue.zhao@shopee.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
---
drivers/nvme/host/rdma.c | 8 +-------
1 file changed, 1 insertion(+), 7 deletions(-)
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index baf7d2490152..86a2891d9bcc 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -1091,13 +1091,7 @@ static int nvme_rdma_setup_ctrl(struct nvme_rdma_ctrl *ctrl, bool new)
 	}
 destroy_admin:
 	nvme_stop_keep_alive(&ctrl->ctrl);
-	nvme_quiesce_admin_queue(&ctrl->ctrl);
-	blk_sync_queue(ctrl->ctrl.admin_q);
-	nvme_rdma_stop_queue(&ctrl->queues[0]);
-	nvme_cancel_admin_tagset(&ctrl->ctrl);
-	if (new)
-		nvme_remove_admin_tag_set(&ctrl->ctrl);
-	nvme_rdma_destroy_admin_queue(ctrl);
+	nvme_rdma_teardown_admin_queue(ctrl, new);
 	return ret;
 }
--
2.25.1
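The hang described in patch 2/4 can be sketched with a toy queue model. Everything here is a hypothetical simplification (`struct model_queue` and the `model_*` helpers are not kernel APIs): it assumes that cancelled requests can only be drained while the queue is not quiesced, and it returns `false` at the point where the real kernel would block forever instead of actually blocking.

```c
#include <stdbool.h>

/* Hypothetical model of a request queue; not the kernel's request_queue. */
struct model_queue {
	bool quiesced;	/* dispatch is blocked while true */
	int pending;	/* requests that entered the queue but have not completed */
};

static void model_quiesce(struct model_queue *q)
{
	q->quiesced = true;
}

static void model_unquiesce(struct model_queue *q)
{
	q->quiesced = false;
}

/* Pending (cancelled) requests can only be drained while not quiesced. */
static void model_run_queue(struct model_queue *q)
{
	if (!q->quiesced)
		q->pending = 0;
}

/* Models the freeze wait; returns false where the kernel would hang. */
static bool model_freeze_wait(struct model_queue *q)
{
	model_run_queue(q);
	return q->pending == 0;
}

/* Models destroying admin_q on the error path, with or without unquiesce. */
static bool model_destroy_admin_queue(bool unquiesce_first)
{
	struct model_queue q = { .quiesced = false, .pending = 2 };

	model_quiesce(&q);
	if (unquiesce_first)
		model_unquiesce(&q);
	return model_freeze_wait(&q);
}
```

Here `model_destroy_admin_queue(false)` corresponds to the old error path, where the queue stays quiesced and the freeze wait never completes, and `model_destroy_admin_queue(true)` corresponds to a teardown that unquiesces the queue before destroying it.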
* [PATCH v3 3/4] nvme-tcp: no need to quiesce admin_q in nvme_tcp_teardown_io_queues()
From: brookxu.cn @ 2024-12-03 3:34 UTC
To: kbusch, axboe, hch, sagi, linux-nvme, linux-kernel; +Cc: hare, lengchao
From: "Chunguang.xu" <chunguang.xu@shopee.com>
nvme_tcp_teardown_admin_queue() already quiesces admin_q, so there is
no need to quiesce it again in nvme_tcp_teardown_io_queues(). Drop the
redundant call to keep things simple.
Signed-off-by: Chunguang.xu <chunguang.xu@shopee.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
---
drivers/nvme/host/tcp.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 55abfe5e1d25..98bf758dc6fc 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -2178,7 +2178,6 @@ static void nvme_tcp_teardown_io_queues(struct nvme_ctrl *ctrl,
 {
 	if (ctrl->queue_count <= 1)
 		return;
-	nvme_quiesce_admin_queue(ctrl);
 	nvme_quiesce_io_queues(ctrl);
 	nvme_sync_io_queues(ctrl);
 	nvme_tcp_stop_io_queues(ctrl);
--
2.25.1
* [PATCH v3 4/4] nvme-tcp: simplify nvme_tcp_teardown_io_queues()
From: brookxu.cn @ 2024-12-03 3:34 UTC
To: kbusch, axboe, hch, sagi, linux-nvme, linux-kernel; +Cc: hare, lengchao
From: "Chunguang.xu" <chunguang.xu@shopee.com>
nvme_tcp_destroy_admin_queue() has only one caller,
nvme_tcp_teardown_admin_queue(), so merge it into that caller to
simplify the code.
Signed-off-by: Chunguang.xu <chunguang.xu@shopee.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
---
drivers/nvme/host/tcp.c | 14 ++++----------
1 file changed, 4 insertions(+), 10 deletions(-)
v3: Update the commit log, no code changed.
diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 98bf758dc6fc..28c76a3e1bd2 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -2101,14 +2101,6 @@ static int nvme_tcp_configure_io_queues(struct nvme_ctrl *ctrl, bool new)
 	return ret;
 }
-static void nvme_tcp_destroy_admin_queue(struct nvme_ctrl *ctrl, bool remove)
-{
-	nvme_tcp_stop_queue(ctrl, 0);
-	if (remove)
-		nvme_remove_admin_tag_set(ctrl);
-	nvme_tcp_free_admin_queue(ctrl);
-}
-
 static int nvme_tcp_configure_admin_queue(struct nvme_ctrl *ctrl, bool new)
 {
 	int error;
@@ -2163,9 +2155,11 @@ static void nvme_tcp_teardown_admin_queue(struct nvme_ctrl *ctrl,
 	blk_sync_queue(ctrl->admin_q);
 	nvme_tcp_stop_queue(ctrl, 0);
 	nvme_cancel_admin_tagset(ctrl);
-	if (remove)
+	if (remove) {
 		nvme_unquiesce_admin_queue(ctrl);
-	nvme_tcp_destroy_admin_queue(ctrl, remove);
+		nvme_remove_admin_tag_set(ctrl);
+	}
+	nvme_tcp_free_admin_queue(ctrl);
 	if (ctrl->tls_pskid) {
 		dev_dbg(ctrl->device, "Wipe negotiated TLS_PSK %08x\n",
 			ctrl->tls_pskid);
--
2.25.1
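The effect of the merge in patch 4/4 can be checked by logging the sequence of calls before and after the change. This is a rough sketch, not kernel code: `log_call()` and the `model_*` functions are hypothetical, the logged names are shortened versions of the functions visible in the hunk above, and only the part of the teardown path from the first stop call onward is modelled.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define LOG_SIZE 256

/* Appends "name;" to a NUL-terminated log buffer of LOG_SIZE bytes. */
static void log_call(char *log, const char *name)
{
	size_t used = strlen(log);

	snprintf(log + used, LOG_SIZE - used, "%s;", name);
}

/* Before the patch: the separate helper that the patch removes. */
static void model_destroy_admin_queue(char *log, bool remove)
{
	log_call(log, "stop_queue");
	if (remove)
		log_call(log, "remove_admin_tag_set");
	log_call(log, "free_admin_queue");
}

/* Before the patch: tail of the teardown path, calling the helper. */
static void model_teardown_before(char *log, bool remove)
{
	log_call(log, "stop_queue");
	log_call(log, "cancel_admin_tagset");
	if (remove)
		log_call(log, "unquiesce_admin_queue");
	model_destroy_admin_queue(log, remove);
}

/* After the patch: the helper body is merged into the teardown path. */
static void model_teardown_after(char *log, bool remove)
{
	log_call(log, "stop_queue");
	log_call(log, "cancel_admin_tagset");
	if (remove) {
		log_call(log, "unquiesce_admin_queue");
		log_call(log, "remove_admin_tag_set");
	}
	log_call(log, "free_admin_queue");
}
```

Comparing the two logs shows that the only difference is the second `stop_queue` entry, which came from the removed helper. The model does not show whether that second stop is a no-op in the kernel; treating it as redundant is an assumption here.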
* Re: [PATCH v3 1/4] nvme-tcp: fix the memleak while create new ctrl failed
From: Hannes Reinecke @ 2024-12-03 7:19 UTC
To: brookxu.cn, kbusch, axboe, hch, sagi, linux-nvme, linux-kernel; +Cc: lengchao
On 12/3/24 04:34, brookxu.cn wrote:
> From: "Chunguang.xu" <chunguang.xu@shopee.com>
>
> When creating a new ctrl fails, the tagset allocated for admin_q is
> not freed. Pass 'new' to nvme_tcp_teardown_admin_queue() so that the
> tagset is removed on this error path.
>
> Fixes: fd1418de10b9 ("nvme-tcp: avoid open-coding nvme_tcp_teardown_admin_queue()")
> Signed-off-by: Chunguang.xu <chunguang.xu@shopee.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> ---
> drivers/nvme/host/tcp.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
> index 3e416af2659f..55abfe5e1d25 100644
> --- a/drivers/nvme/host/tcp.c
> +++ b/drivers/nvme/host/tcp.c
> @@ -2278,7 +2278,7 @@ static int nvme_tcp_setup_ctrl(struct nvme_ctrl *ctrl, bool new)
>  	}
>  destroy_admin:
>  	nvme_stop_keep_alive(ctrl);
> -	nvme_tcp_teardown_admin_queue(ctrl, false);
> +	nvme_tcp_teardown_admin_queue(ctrl, new);
>  	return ret;
>  }
>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare@suse.de +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich
* Re: [PATCH v3 0/4] nvme: various bugs fix & code cleanup
From: Keith Busch @ 2024-12-04 18:20 UTC
To: brookxu.cn; +Cc: axboe, hch, sagi, linux-nvme, linux-kernel, hare, lengchao
On Tue, Dec 03, 2024 at 11:34:39AM +0800, brookxu.cn wrote:
> From: "Chunguang.xu" <chunguang.xu@shopee.com>
>
> This series fixes a hang in nvme-rdma and a memory leak in nvme-tcp,
> and cleans up the related nvme-tcp code.
Thanks, applied to nvme-6.13.