* [PATCH v2 0/5] nvme: various bugs fix & code cleanup
From: brookxu.cn @ 2024-11-27 9:27 UTC
To: kbusch, axboe, hch, sagi, hare; +Cc: linux-nvme, linux-kernel
From: "Chunguang.xu" <chunguang.xu@shopee.com>
This series fixes a hang in nvme-rdma and a memory leak in nvme-tcp, and
cleans up the nvme-tcp code.
Chunguang.xu (5):
nvme-tcp: fix the memleak while create new ctrl failed
nvme-rdma: unquiesce admin_q before destroy it
nvme-tcp: no need to quiesec admin_q in nvme_tcp_teardown_io_queues()
nvme-tcp: simplify nvme_tcp_configure_admin_queue()
nvme-tcp: remove nvme_tcp_destroy_io_queues()
drivers/nvme/host/rdma.c | 8 +------
drivers/nvme/host/tcp.c | 49 ++++++++++++++++------------------------
2 files changed, 20 insertions(+), 37 deletions(-)
--
2.25.1
* [PATCH v2 1/5] nvme-tcp: fix the memleak while create new ctrl failed
From: brookxu.cn @ 2024-11-27 9:27 UTC
To: kbusch, axboe, hch, sagi, hare; +Cc: linux-nvme, linux-kernel
From: "Chunguang.xu" <chunguang.xu@shopee.com>
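When creating a new controller fails, the error path calls
nvme_tcp_teardown_admin_queue() with remove=false, so the tag set
allocated for admin_q is never freed. Pass 'new' instead so the tag set
is removed when a new controller fails to be set up.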
Fixes: fd1418de10b9 ("nvme-tcp: avoid open-coding nvme_tcp_teardown_admin_queue()")
Signed-off-by: Chunguang.xu <chunguang.xu@shopee.com>
---
drivers/nvme/host/tcp.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 3e416af2659f..55abfe5e1d25 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -2278,7 +2278,7 @@ static int nvme_tcp_setup_ctrl(struct nvme_ctrl *ctrl, bool new)
}
destroy_admin:
nvme_stop_keep_alive(ctrl);
- nvme_tcp_teardown_admin_queue(ctrl, false);
+ nvme_tcp_teardown_admin_queue(ctrl, new);
return ret;
}
--
2.25.1
* [PATCH v2 2/5] nvme-rdma: unquiesce admin_q before destroy it
From: brookxu.cn @ 2024-11-27 9:27 UTC
To: kbusch, axboe, hch, sagi, hare; +Cc: linux-nvme, linux-kernel
From: "Chunguang.xu" <chunguang.xu@shopee.com>
When creating a controller fails, the kernel hangs while destroying
admin_q, as shown in the following call trace:
PID: 23644 TASK: ff2d52b40f439fc0 CPU: 2 COMMAND: "nvme"
#0 [ff61d23de260fb78] __schedule at ffffffff8323bc15
#1 [ff61d23de260fc08] schedule at ffffffff8323c014
#2 [ff61d23de260fc28] blk_mq_freeze_queue_wait at ffffffff82a3dba1
#3 [ff61d23de260fc78] blk_freeze_queue at ffffffff82a4113a
#4 [ff61d23de260fc90] blk_cleanup_queue at ffffffff82a33006
#5 [ff61d23de260fcb0] nvme_rdma_destroy_admin_queue at ffffffffc12686ce [nvme_rdma]
#6 [ff61d23de260fcc8] nvme_rdma_setup_ctrl at ffffffffc1268ced [nvme_rdma]
#7 [ff61d23de260fd28] nvme_rdma_create_ctrl at ffffffffc126919b [nvme_rdma]
#8 [ff61d23de260fd68] nvmf_dev_write at ffffffffc024f362 [nvme_fabrics]
#9 [ff61d23de260fe38] vfs_write at ffffffff827d5f25
RIP: 00007fda7891d574 RSP: 00007ffe2ef06958 RFLAGS: 00000202
RAX: ffffffffffffffda RBX: 000055e8122a4d90 RCX: 00007fda7891d574
RDX: 000000000000012b RSI: 000055e8122a4d90 RDI: 0000000000000004
RBP: 00007ffe2ef079c0 R8: 000000000000012b R9: 000055e8122a4d90
R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000004
R13: 000055e8122923c0 R14: 000000000000012b R15: 00007fda78a54500
ORIG_RAX: 0000000000000001 CS: 0033 SS: 002b
This happens because admin_q is quiesced before its outstanding requests
are cancelled, but it is never unquiesced before being destroyed. The
pending requests therefore cannot be drained, and
blk_mq_freeze_queue_wait() waits forever. Fix this by reusing
nvme_rdma_teardown_admin_queue(), which also simplifies the code.
Reported-by: Yingfu.zhou <yingfu.zhou@shopee.com>
Signed-off-by: Chunguang.xu <chunguang.xu@shopee.com>
Signed-off-by: Yue.zhao <yue.zhao@shopee.com>
---
drivers/nvme/host/rdma.c | 8 +-------
1 file changed, 1 insertion(+), 7 deletions(-)
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index 24a2759798d0..913e6e5a8070 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -1091,13 +1091,7 @@ static int nvme_rdma_setup_ctrl(struct nvme_rdma_ctrl *ctrl, bool new)
}
destroy_admin:
nvme_stop_keep_alive(&ctrl->ctrl);
- nvme_quiesce_admin_queue(&ctrl->ctrl);
- blk_sync_queue(ctrl->ctrl.admin_q);
- nvme_rdma_stop_queue(&ctrl->queues[0]);
- nvme_cancel_admin_tagset(&ctrl->ctrl);
- if (new)
- nvme_remove_admin_tag_set(&ctrl->ctrl);
- nvme_rdma_destroy_admin_queue(ctrl);
+ nvme_rdma_teardown_admin_queue(ctrl, new);
return ret;
}
--
2.25.1
* [PATCH v2 3/5] nvme-tcp: no need to quiesec admin_q in nvme_tcp_teardown_io_queues()
From: brookxu.cn @ 2024-11-27 9:27 UTC
To: kbusch, axboe, hch, sagi, hare; +Cc: linux-nvme, linux-kernel
From: "Chunguang.xu" <chunguang.xu@shopee.com>
admin_q is already quiesced in nvme_tcp_teardown_admin_queue(), so there
is no need to quiesce it in nvme_tcp_teardown_io_queues() as well. Drop
the redundant call to keep things simple.
Signed-off-by: Chunguang.xu <chunguang.xu@shopee.com>
---
drivers/nvme/host/tcp.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 55abfe5e1d25..98bf758dc6fc 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -2178,7 +2178,6 @@ static void nvme_tcp_teardown_io_queues(struct nvme_ctrl *ctrl,
{
if (ctrl->queue_count <= 1)
return;
- nvme_quiesce_admin_queue(ctrl);
nvme_quiesce_io_queues(ctrl);
nvme_sync_io_queues(ctrl);
nvme_tcp_stop_io_queues(ctrl);
--
2.25.1
* [PATCH v2 4/5] nvme-tcp: simplify nvme_tcp_configure_admin_queue()
From: brookxu.cn @ 2024-11-27 9:27 UTC
To: kbusch, axboe, hch, sagi, hare; +Cc: linux-nvme, linux-kernel
From: "Chunguang.xu" <chunguang.xu@shopee.com>
nvme_tcp_teardown_admin_queue() is the only caller of
nvme_tcp_destroy_admin_queue(), so fold nvme_tcp_destroy_admin_queue()
into nvme_tcp_teardown_admin_queue() to simplify the code.
Signed-off-by: Chunguang.xu <chunguang.xu@shopee.com>
---
drivers/nvme/host/tcp.c | 14 ++++----------
1 file changed, 4 insertions(+), 10 deletions(-)
diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 98bf758dc6fc..28c76a3e1bd2 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -2101,14 +2101,6 @@ static int nvme_tcp_configure_io_queues(struct nvme_ctrl *ctrl, bool new)
return ret;
}
-static void nvme_tcp_destroy_admin_queue(struct nvme_ctrl *ctrl, bool remove)
-{
- nvme_tcp_stop_queue(ctrl, 0);
- if (remove)
- nvme_remove_admin_tag_set(ctrl);
- nvme_tcp_free_admin_queue(ctrl);
-}
-
static int nvme_tcp_configure_admin_queue(struct nvme_ctrl *ctrl, bool new)
{
int error;
@@ -2163,9 +2155,11 @@ static void nvme_tcp_teardown_admin_queue(struct nvme_ctrl *ctrl,
blk_sync_queue(ctrl->admin_q);
nvme_tcp_stop_queue(ctrl, 0);
nvme_cancel_admin_tagset(ctrl);
- if (remove)
+ if (remove) {
nvme_unquiesce_admin_queue(ctrl);
- nvme_tcp_destroy_admin_queue(ctrl, remove);
+ nvme_remove_admin_tag_set(ctrl);
+ }
+ nvme_tcp_free_admin_queue(ctrl);
if (ctrl->tls_pskid) {
dev_dbg(ctrl->device, "Wipe negotiated TLS_PSK %08x\n",
ctrl->tls_pskid);
--
2.25.1
* [PATCH v2 5/5] nvme-tcp: remove nvme_tcp_destroy_io_queues()
From: brookxu.cn @ 2024-11-27 9:27 UTC
To: kbusch, axboe, hch, sagi, hare; +Cc: linux-nvme, linux-kernel
From: "Chunguang.xu" <chunguang.xu@shopee.com>
When tearing down the I/O queues, nvme_tcp_stop_io_queues() is called
twice: once directly and once more from nvme_tcp_destroy_io_queues().
Remove nvme_tcp_destroy_io_queues() and open-code it in
nvme_tcp_teardown_io_queues() and in the error path of
nvme_tcp_setup_ctrl(). This simplifies the code, aligns it with
nvme-rdma and makes it easier to maintain.
Signed-off-by: Chunguang.xu <chunguang.xu@shopee.com>
---
drivers/nvme/host/tcp.c | 32 ++++++++++++++------------------
1 file changed, 14 insertions(+), 18 deletions(-)
diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 28c76a3e1bd2..36c7e49af38a 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -2024,14 +2024,6 @@ static int nvme_tcp_alloc_io_queues(struct nvme_ctrl *ctrl)
return __nvme_tcp_alloc_io_queues(ctrl);
}
-static void nvme_tcp_destroy_io_queues(struct nvme_ctrl *ctrl, bool remove)
-{
- nvme_tcp_stop_io_queues(ctrl);
- if (remove)
- nvme_remove_io_tag_set(ctrl);
- nvme_tcp_free_io_queues(ctrl);
-}
-
static int nvme_tcp_configure_io_queues(struct nvme_ctrl *ctrl, bool new)
{
int ret, nr_queues;
@@ -2170,15 +2162,17 @@ static void nvme_tcp_teardown_admin_queue(struct nvme_ctrl *ctrl,
static void nvme_tcp_teardown_io_queues(struct nvme_ctrl *ctrl,
bool remove)
{
- if (ctrl->queue_count <= 1)
- return;
- nvme_quiesce_io_queues(ctrl);
- nvme_sync_io_queues(ctrl);
- nvme_tcp_stop_io_queues(ctrl);
- nvme_cancel_tagset(ctrl);
- if (remove)
- nvme_unquiesce_io_queues(ctrl);
- nvme_tcp_destroy_io_queues(ctrl, remove);
+ if (ctrl->queue_count > 1) {
+ nvme_quiesce_io_queues(ctrl);
+ nvme_sync_io_queues(ctrl);
+ nvme_tcp_stop_io_queues(ctrl);
+ nvme_cancel_tagset(ctrl);
+ if (remove) {
+ nvme_unquiesce_io_queues(ctrl);
+ nvme_remove_io_tag_set(ctrl);
+ }
+ nvme_tcp_free_io_queues(ctrl);
+ }
}
static void nvme_tcp_reconnect_or_remove(struct nvme_ctrl *ctrl,
@@ -2267,7 +2261,9 @@ static int nvme_tcp_setup_ctrl(struct nvme_ctrl *ctrl, bool new)
nvme_sync_io_queues(ctrl);
nvme_tcp_stop_io_queues(ctrl);
nvme_cancel_tagset(ctrl);
- nvme_tcp_destroy_io_queues(ctrl, new);
+ if (new)
+ nvme_remove_io_tag_set(ctrl);
+ nvme_tcp_free_io_queues(ctrl);
}
destroy_admin:
nvme_stop_keep_alive(ctrl);
--
2.25.1
* Re: [PATCH v2 1/5] nvme-tcp: fix the memleak while create new ctrl failed
From: Christoph Hellwig @ 2024-11-29 8:10 UTC
To: brookxu.cn; +Cc: kbusch, axboe, hch, sagi, hare, linux-nvme, linux-kernel
Looks good:
Reviewed-by: Christoph Hellwig <hch@lst.de>
* Re: [PATCH v2 2/5] nvme-rdma: unquiesce admin_q before destroy it
From: Christoph Hellwig @ 2024-11-29 8:11 UTC
To: brookxu.cn; +Cc: kbusch, axboe, hch, sagi, hare, linux-nvme, linux-kernel
Looks good:
Reviewed-by: Christoph Hellwig <hch@lst.de>
* Re: [PATCH v2 3/5] nvme-tcp: no need to quiesec admin_q in nvme_tcp_teardown_io_queues()
From: Christoph Hellwig @ 2024-11-29 8:11 UTC
To: brookxu.cn; +Cc: kbusch, axboe, hch, sagi, hare, linux-nvme, linux-kernel
Looks good:
Reviewed-by: Christoph Hellwig <hch@lst.de>
* Re: [PATCH v2 4/5] nvme-tcp: simplify nvme_tcp_configure_admin_queue()
From: Christoph Hellwig @ 2024-11-29 8:11 UTC
To: brookxu.cn; +Cc: kbusch, axboe, hch, sagi, hare, linux-nvme, linux-kernel
On Wed, Nov 27, 2024 at 05:27:50PM +0800, brookxu.cn wrote:
> From: "Chunguang.xu" <chunguang.xu@shopee.com>
>
> nvme_tcp_teardown_admin_queue() is the only caller of
> nvme_tcp_destroy_admin_queue(), so fold nvme_tcp_destroy_admin_queue()
> into nvme_tcp_teardown_admin_queue() to simplify the code.
Need a little fixing for the line length here in the commit message,
but otherwise looks good:
Reviewed-by: Christoph Hellwig <hch@lst.de>
* Re: [PATCH v2 5/5] nvme-tcp: remove nvme_tcp_destroy_io_queues()
From: Christoph Hellwig @ 2024-11-29 8:12 UTC
To: brookxu.cn; +Cc: kbusch, axboe, hch, sagi, hare, linux-nvme, linux-kernel
On Wed, Nov 27, 2024 at 05:27:51PM +0800, brookxu.cn wrote:
> From: "Chunguang.xu" <chunguang.xu@shopee.com>
>
> When tearing down the I/O queues, nvme_tcp_stop_io_queues() is called
> twice: once directly and once more from nvme_tcp_destroy_io_queues().
> Remove nvme_tcp_destroy_io_queues() and open-code it in
> nvme_tcp_teardown_io_queues() and in the error path of
> nvme_tcp_setup_ctrl(). This simplifies the code, aligns it with
> nvme-rdma and makes it easier to maintain.
Please split the reorganization from the fix.
Also, can you add Fixes: tags to the various bug fix patches?