public inbox for linux-nvme@lists.infradead.org
* [PATCH 1/5] nvme-rdma: unquiesce admin_q before destroying it
       [not found] <cover.1732368538.git.chunguang.xu@shopee.com>
@ 2024-11-23 13:37 ` brookxu.cn
  2024-11-25  7:21   ` Christoph Hellwig
                     ` (2 more replies)
  2024-11-23 13:37 ` [PATCH 2/5] nvme-tcp: no need to quiesce admin_q in nvme_tcp_teardown_io_queues() brookxu.cn
                   ` (3 subsequent siblings)
  4 siblings, 3 replies; 17+ messages in thread
From: brookxu.cn @ 2024-11-23 13:37 UTC (permalink / raw)
  To: kbusch, axboe, hch, sagi; +Cc: linux-nvme, linux-kernel

From: "Chunguang.xu" <chunguang.xu@shopee.com>

The kernel will hang when destroying the admin_q after we fail to create
the ctrl, with a calltrace like the following:

PID: 23644    TASK: ff2d52b40f439fc0  CPU: 2    COMMAND: "nvme"
 #0 [ff61d23de260fb78] __schedule at ffffffff8323bc15
 #1 [ff61d23de260fc08] schedule at ffffffff8323c014
 #2 [ff61d23de260fc28] blk_mq_freeze_queue_wait at ffffffff82a3dba1
 #3 [ff61d23de260fc78] blk_freeze_queue at ffffffff82a4113a
 #4 [ff61d23de260fc90] blk_cleanup_queue at ffffffff82a33006
 #5 [ff61d23de260fcb0] nvme_rdma_destroy_admin_queue at ffffffffc12686ce [nvme_rdma]
 #6 [ff61d23de260fcc8] nvme_rdma_setup_ctrl at ffffffffc1268ced [nvme_rdma]
 #7 [ff61d23de260fd28] nvme_rdma_create_ctrl at ffffffffc126919b [nvme_rdma]
 #8 [ff61d23de260fd68] nvmf_dev_write at ffffffffc024f362 [nvme_fabrics]
 #9 [ff61d23de260fe38] vfs_write at ffffffff827d5f25
    RIP: 00007fda7891d574  RSP: 00007ffe2ef06958  RFLAGS: 00000202
    RAX: ffffffffffffffda  RBX: 000055e8122a4d90  RCX: 00007fda7891d574
    RDX: 000000000000012b  RSI: 000055e8122a4d90  RDI: 0000000000000004
    RBP: 00007ffe2ef079c0   R8: 000000000000012b   R9: 000055e8122a4d90
    R10: 0000000000000000  R11: 0000000000000202  R12: 0000000000000004
    R13: 000055e8122923c0  R14: 000000000000012b  R15: 00007fda78a54500
    ORIG_RAX: 0000000000000001  CS: 0033  SS: 002b

This is because we quiesce the admin_q before cancelling requests but
forget to unquiesce it before destroying the queue. As a result we fail
to drain the pending requests and hang in blk_mq_freeze_queue_wait()
forever. Reuse nvme_rdma_teardown_admin_queue() to fix this issue and
simplify the code.
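
For reference, here is the error path this patch removes, annotated (a
sketch based on the deleted lines below, assuming standard blk-mq freeze
semantics):

  nvme_quiesce_admin_queue(&ctrl->ctrl);  /* stop dispatching on admin_q */
  blk_sync_queue(ctrl->ctrl.admin_q);
  nvme_rdma_stop_queue(&ctrl->queues[0]);
  nvme_cancel_admin_tagset(&ctrl->ctrl);  /* end requests already in flight */
  if (new)
          nvme_remove_admin_tag_set(&ctrl->ctrl);
  /* missing: nvme_unquiesce_admin_queue(), so requests that were queued
   * but never dispatched cannot make progress ... */
  nvme_rdma_destroy_admin_queue(ctrl);    /* ... and the queue freeze done
                                           * in here waits on them forever */

nvme_rdma_teardown_admin_queue() performs the same steps but, when tearing
down a newly created controller, unquiesces the queue before freeing it
(mirroring the TCP helper shown in patch 3 of this series).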

Reported-by: Yingfu.zhou <yingfu.zhou@shopee.com>
Signed-off-by: Chunguang.xu <chunguang.xu@shopee.com>
Signed-off-by: Yue.zhao <yue.zhao@shopee.com>
---
 drivers/nvme/host/rdma.c | 8 +-------
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index 24a2759798d0..913e6e5a8070 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -1091,13 +1091,7 @@ static int nvme_rdma_setup_ctrl(struct nvme_rdma_ctrl *ctrl, bool new)
 	}
 destroy_admin:
 	nvme_stop_keep_alive(&ctrl->ctrl);
-	nvme_quiesce_admin_queue(&ctrl->ctrl);
-	blk_sync_queue(ctrl->ctrl.admin_q);
-	nvme_rdma_stop_queue(&ctrl->queues[0]);
-	nvme_cancel_admin_tagset(&ctrl->ctrl);
-	if (new)
-		nvme_remove_admin_tag_set(&ctrl->ctrl);
-	nvme_rdma_destroy_admin_queue(ctrl);
+	nvme_rdma_teardown_admin_queue(ctrl, new);
 	return ret;
 }
 
-- 
2.25.1




* [PATCH 2/5] nvme-tcp: no need to quiesce admin_q in nvme_tcp_teardown_io_queues()
       [not found] <cover.1732368538.git.chunguang.xu@shopee.com>
  2024-11-23 13:37 ` [PATCH 1/5] nvme-rdma: unquiesce admin_q before destroying it brookxu.cn
@ 2024-11-23 13:37 ` brookxu.cn
  2024-11-25  7:22   ` Christoph Hellwig
  2024-11-29 11:18   ` Hannes Reinecke
  2024-11-23 13:37 ` [PATCH 3/5] nvme-tcp: simplify nvme_tcp_teardown_admin_queue() brookxu.cn
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 17+ messages in thread
From: brookxu.cn @ 2024-11-23 13:37 UTC (permalink / raw)
  To: kbusch, axboe, hch, sagi; +Cc: linux-nvme, linux-kernel

From: "Chunguang.xu" <chunguang.xu@shopee.com>

As we already quiesce the admin_q in nvme_tcp_teardown_admin_queue(),
there is no need to quiesce it again in nvme_tcp_teardown_io_queues();
this makes things simpler.
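
A sketch of the teardown order this relies on (the call sequence is
assumed from the driver's reset/error handling; only the division of
labour matters here):

  nvme_tcp_teardown_io_queues(ctrl, remove);    /* io queues only after
                                                 * this patch */
  nvme_tcp_teardown_admin_queue(ctrl, remove);  /* quiesces admin_q itself */

Since the admin teardown quiesces the admin queue on its own, quiescing
it again in the io-queue teardown was redundant.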

Signed-off-by: Chunguang.xu <chunguang.xu@shopee.com>
---
 drivers/nvme/host/tcp.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 3e416af2659f..47d8f10b1f75 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -2178,7 +2178,6 @@ static void nvme_tcp_teardown_io_queues(struct nvme_ctrl *ctrl,
 {
 	if (ctrl->queue_count <= 1)
 		return;
-	nvme_quiesce_admin_queue(ctrl);
 	nvme_quiesce_io_queues(ctrl);
 	nvme_sync_io_queues(ctrl);
 	nvme_tcp_stop_io_queues(ctrl);
-- 
2.25.1




* [PATCH 3/5] nvme-tcp: simplify nvme_tcp_teardown_admin_queue()
       [not found] <cover.1732368538.git.chunguang.xu@shopee.com>
  2024-11-23 13:37 ` [PATCH 1/5] nvme-rdma: unquiesce admin_q before destroying it brookxu.cn
  2024-11-23 13:37 ` [PATCH 2/5] nvme-tcp: no need to quiesce admin_q in nvme_tcp_teardown_io_queues() brookxu.cn
@ 2024-11-23 13:37 ` brookxu.cn
  2024-11-25  7:22   ` Christoph Hellwig
  2024-11-29 11:19   ` Hannes Reinecke
  2024-11-23 13:37 ` [PATCH 4/5] nvme-tcp: remove nvme_tcp_destroy_io_queues() brookxu.cn
  2024-11-23 13:37 ` [PATCH 5/5] nvme-tcp: fix the memleak when creating a new ctrl fails brookxu.cn
  4 siblings, 2 replies; 17+ messages in thread
From: brookxu.cn @ 2024-11-23 13:37 UTC (permalink / raw)
  To: kbusch, axboe, hch, sagi; +Cc: linux-nvme, linux-kernel

From: "Chunguang.xu" <chunguang.xu@shopee.com>

As nvme_tcp_teardown_admin_queue() is the only caller of
nvme_tcp_destroy_admin_queue(), we can merge nvme_tcp_destroy_admin_queue()
into nvme_tcp_teardown_admin_queue() to simplify the code.
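
After the merge, the teardown helper reads roughly as follows (a sketch
assembled from the diff below; the TLS PSK wipe at the end of the
function is omitted):

  static void nvme_tcp_teardown_admin_queue(struct nvme_ctrl *ctrl,
                  bool remove)
  {
          nvme_quiesce_admin_queue(ctrl);
          blk_sync_queue(ctrl->admin_q);
          nvme_tcp_stop_queue(ctrl, 0);
          nvme_cancel_admin_tagset(ctrl);
          if (remove) {
                  nvme_unquiesce_admin_queue(ctrl);
                  nvme_remove_admin_tag_set(ctrl);
          }
          nvme_tcp_free_admin_queue(ctrl);
  }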

Signed-off-by: Chunguang.xu <chunguang.xu@shopee.com>
---
 drivers/nvme/host/tcp.c | 14 ++++----------
 1 file changed, 4 insertions(+), 10 deletions(-)

diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 47d8f10b1f75..45cbaa7523e6 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -2101,14 +2101,6 @@ static int nvme_tcp_configure_io_queues(struct nvme_ctrl *ctrl, bool new)
 	return ret;
 }
 
-static void nvme_tcp_destroy_admin_queue(struct nvme_ctrl *ctrl, bool remove)
-{
-	nvme_tcp_stop_queue(ctrl, 0);
-	if (remove)
-		nvme_remove_admin_tag_set(ctrl);
-	nvme_tcp_free_admin_queue(ctrl);
-}
-
 static int nvme_tcp_configure_admin_queue(struct nvme_ctrl *ctrl, bool new)
 {
 	int error;
@@ -2163,9 +2155,11 @@ static void nvme_tcp_teardown_admin_queue(struct nvme_ctrl *ctrl,
 	blk_sync_queue(ctrl->admin_q);
 	nvme_tcp_stop_queue(ctrl, 0);
 	nvme_cancel_admin_tagset(ctrl);
-	if (remove)
+	if (remove) {
 		nvme_unquiesce_admin_queue(ctrl);
-	nvme_tcp_destroy_admin_queue(ctrl, remove);
+		nvme_remove_admin_tag_set(ctrl);
+	}
+	nvme_tcp_free_admin_queue(ctrl);
 	if (ctrl->tls_pskid) {
 		dev_dbg(ctrl->device, "Wipe negotiated TLS_PSK %08x\n",
 			ctrl->tls_pskid);
-- 
2.25.1




* [PATCH 4/5] nvme-tcp: remove nvme_tcp_destroy_io_queues()
       [not found] <cover.1732368538.git.chunguang.xu@shopee.com>
                   ` (2 preceding siblings ...)
  2024-11-23 13:37 ` [PATCH 3/5] nvme-tcp: simplify nvme_tcp_teardown_admin_queue() brookxu.cn
@ 2024-11-23 13:37 ` brookxu.cn
  2024-11-25  7:24   ` Christoph Hellwig
  2024-11-29 11:22   ` Hannes Reinecke
  2024-11-23 13:37 ` [PATCH 5/5] nvme-tcp: fix the memleak when creating a new ctrl fails brookxu.cn
  4 siblings, 2 replies; 17+ messages in thread
From: brookxu.cn @ 2024-11-23 13:37 UTC (permalink / raw)
  To: kbusch, axboe, hch, sagi; +Cc: linux-nvme, linux-kernel

From: "Chunguang.xu" <chunguang.xu@shopee.com>

Remove nvme_tcp_destroy_io_queues() to simplify the code and avoid an
unnecessary call of nvme_tcp_stop_io_queues().
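
The redundancy being removed, sketched from the io-queue teardown path as
it stands before this patch:

  nvme_tcp_stop_io_queues(ctrl);             /* queues stopped here ... */
  nvme_cancel_tagset(ctrl);
  if (remove)
          nvme_unquiesce_io_queues(ctrl);
  nvme_tcp_destroy_io_queues(ctrl, remove);  /* ... and stopped again as
                                              * the helper's first action */

With the helper folded into its callers, the queues are stopped exactly
once on each path.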

Signed-off-by: Chunguang.xu <chunguang.xu@shopee.com>
---
 drivers/nvme/host/tcp.c | 32 ++++++++++++++------------------
 1 file changed, 14 insertions(+), 18 deletions(-)

diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 45cbaa7523e6..57f0f0290cc8 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -2024,14 +2024,6 @@ static int nvme_tcp_alloc_io_queues(struct nvme_ctrl *ctrl)
 	return __nvme_tcp_alloc_io_queues(ctrl);
 }
 
-static void nvme_tcp_destroy_io_queues(struct nvme_ctrl *ctrl, bool remove)
-{
-	nvme_tcp_stop_io_queues(ctrl);
-	if (remove)
-		nvme_remove_io_tag_set(ctrl);
-	nvme_tcp_free_io_queues(ctrl);
-}
-
 static int nvme_tcp_configure_io_queues(struct nvme_ctrl *ctrl, bool new)
 {
 	int ret, nr_queues;
@@ -2170,15 +2162,17 @@ static void nvme_tcp_teardown_admin_queue(struct nvme_ctrl *ctrl,
 static void nvme_tcp_teardown_io_queues(struct nvme_ctrl *ctrl,
 		bool remove)
 {
-	if (ctrl->queue_count <= 1)
-		return;
-	nvme_quiesce_io_queues(ctrl);
-	nvme_sync_io_queues(ctrl);
-	nvme_tcp_stop_io_queues(ctrl);
-	nvme_cancel_tagset(ctrl);
-	if (remove)
-		nvme_unquiesce_io_queues(ctrl);
-	nvme_tcp_destroy_io_queues(ctrl, remove);
+	if (ctrl->queue_count > 1) {
+		nvme_quiesce_io_queues(ctrl);
+		nvme_sync_io_queues(ctrl);
+		nvme_tcp_stop_io_queues(ctrl);
+		nvme_cancel_tagset(ctrl);
+		if (remove) {
+			nvme_unquiesce_io_queues(ctrl);
+			nvme_remove_io_tag_set(ctrl);
+		}
+		nvme_tcp_free_io_queues(ctrl);
+	}
 }
 
 static void nvme_tcp_reconnect_or_remove(struct nvme_ctrl *ctrl,
@@ -2267,7 +2261,9 @@ static int nvme_tcp_setup_ctrl(struct nvme_ctrl *ctrl, bool new)
 		nvme_sync_io_queues(ctrl);
 		nvme_tcp_stop_io_queues(ctrl);
 		nvme_cancel_tagset(ctrl);
-		nvme_tcp_destroy_io_queues(ctrl, new);
+		if (new)
+			nvme_remove_io_tag_set(ctrl);
+		nvme_tcp_free_io_queues(ctrl);
 	}
 destroy_admin:
 	nvme_stop_keep_alive(ctrl);
-- 
2.25.1




* [PATCH 5/5] nvme-tcp: fix the memleak when creating a new ctrl fails
       [not found] <cover.1732368538.git.chunguang.xu@shopee.com>
                   ` (3 preceding siblings ...)
  2024-11-23 13:37 ` [PATCH 4/5] nvme-tcp: remove nvme_tcp_destroy_io_queues() brookxu.cn
@ 2024-11-23 13:37 ` brookxu.cn
  2024-11-25  7:25   ` Christoph Hellwig
  4 siblings, 1 reply; 17+ messages in thread
From: brookxu.cn @ 2024-11-23 13:37 UTC (permalink / raw)
  To: kbusch, axboe, hch, sagi; +Cc: linux-nvme, linux-kernel

From: "Chunguang.xu" <chunguang.xu@shopee.com>

When we fail to create a new ctrl, we never free the tag set occupied
by the admin_q. Fix it by passing 'new' instead of 'false' to
nvme_tcp_teardown_admin_queue().
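
Why passing 'new' matters, sketched from the teardown helper as reworked
earlier in this series (see patch 3):

  /* inside nvme_tcp_teardown_admin_queue(ctrl, remove) */
  if (remove) {
          nvme_unquiesce_admin_queue(ctrl);
          nvme_remove_admin_tag_set(ctrl);  /* frees the admin tag set */
  }

With 'remove' hard-coded to false, a failed first-time setup never reached
nvme_remove_admin_tag_set() and leaked the tag set allocated for the new
controller.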

Signed-off-by: Chunguang.xu <chunguang.xu@shopee.com>
---
 drivers/nvme/host/tcp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 57f0f0290cc8..36c7e49af38a 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -2267,7 +2267,7 @@ static int nvme_tcp_setup_ctrl(struct nvme_ctrl *ctrl, bool new)
 	}
 destroy_admin:
 	nvme_stop_keep_alive(ctrl);
-	nvme_tcp_teardown_admin_queue(ctrl, false);
+	nvme_tcp_teardown_admin_queue(ctrl, new);
 	return ret;
 }
 
-- 
2.25.1




* Re: [PATCH 1/5] nvme-rdma: unquiesce admin_q before destroying it
  2024-11-23 13:37 ` [PATCH 1/5] nvme-rdma: unquiesce admin_q before destroying it brookxu.cn
@ 2024-11-25  7:21   ` Christoph Hellwig
  2024-11-29 11:18   ` Hannes Reinecke
  2024-12-24 11:53   ` Sagi Grimberg
  2 siblings, 0 replies; 17+ messages in thread
From: Christoph Hellwig @ 2024-11-25  7:21 UTC (permalink / raw)
  To: brookxu.cn; +Cc: kbusch, axboe, hch, sagi, linux-nvme, linux-kernel

On Sat, Nov 23, 2024 at 09:37:37PM +0800, brookxu.cn wrote:
> This is because we quiesce the admin_q before cancelling requests but
> forget to unquiesce it before destroying the queue. As a result we fail
> to drain the pending requests and hang in blk_mq_freeze_queue_wait()
> forever. Reuse nvme_rdma_teardown_admin_queue() to fix this issue and
> simplify the code.

Looks good, and matches what the TCP driver is doing:

Reviewed-by: Christoph Hellwig <hch@lst.de>




* Re: [PATCH 2/5] nvme-tcp: no need to quiesce admin_q in nvme_tcp_teardown_io_queues()
  2024-11-23 13:37 ` [PATCH 2/5] nvme-tcp: no need to quiesce admin_q in nvme_tcp_teardown_io_queues() brookxu.cn
@ 2024-11-25  7:22   ` Christoph Hellwig
  2024-11-25  7:55     ` 许春光
  2024-11-29 11:18   ` Hannes Reinecke
  1 sibling, 1 reply; 17+ messages in thread
From: Christoph Hellwig @ 2024-11-25  7:22 UTC (permalink / raw)
  To: brookxu.cn; +Cc: kbusch, axboe, hch, sagi, linux-nvme, linux-kernel

On Sat, Nov 23, 2024 at 09:37:38PM +0800, brookxu.cn wrote:
> From: "Chunguang.xu" <chunguang.xu@shopee.com>
> 
> As we already quiesce the admin_q in nvme_tcp_teardown_admin_queue(),
> there is no need to quiesce it again in nvme_tcp_teardown_io_queues();
> this makes things simpler.

Yes.  And this matches what RDMA is doing.  We really need to go
back to the attempt to consolidate this code.

Reviewed-by: Christoph Hellwig <hch@lst.de>



* Re: [PATCH 3/5] nvme-tcp: simplify nvme_tcp_teardown_admin_queue()
  2024-11-23 13:37 ` [PATCH 3/5] nvme-tcp: simplify nvme_tcp_teardown_admin_queue() brookxu.cn
@ 2024-11-25  7:22   ` Christoph Hellwig
  2024-11-29 11:19   ` Hannes Reinecke
  1 sibling, 0 replies; 17+ messages in thread
From: Christoph Hellwig @ 2024-11-25  7:22 UTC (permalink / raw)
  To: brookxu.cn; +Cc: kbusch, axboe, hch, sagi, linux-nvme, linux-kernel

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>




* Re: [PATCH 4/5] nvme-tcp: remove nvme_tcp_destroy_io_queues()
  2024-11-23 13:37 ` [PATCH 4/5] nvme-tcp: remove nvme_tcp_destroy_io_queues() brookxu.cn
@ 2024-11-25  7:24   ` Christoph Hellwig
  2024-11-25  7:51     ` 许春光
  2024-11-29 11:22   ` Hannes Reinecke
  1 sibling, 1 reply; 17+ messages in thread
From: Christoph Hellwig @ 2024-11-25  7:24 UTC (permalink / raw)
  To: brookxu.cn; +Cc: kbusch, axboe, hch, sagi, linux-nvme, linux-kernel

On Sat, Nov 23, 2024 at 09:37:40PM +0800, brookxu.cn wrote:
> From: "Chunguang.xu" <chunguang.xu@shopee.com>
> 
> Remove nvme_tcp_destroy_io_queues() to simplify the code and avoid an
> unnecessary call of nvme_tcp_stop_io_queues().

Please split the behavior change from the cleanup, and explain why
the behavior change is fine and desirable.




* Re: [PATCH 5/5] nvme-tcp: fix the memleak when creating a new ctrl fails
  2024-11-23 13:37 ` [PATCH 5/5] nvme-tcp: fix the memleak when creating a new ctrl fails brookxu.cn
@ 2024-11-25  7:25   ` Christoph Hellwig
  0 siblings, 0 replies; 17+ messages in thread
From: Christoph Hellwig @ 2024-11-25  7:25 UTC (permalink / raw)
  To: brookxu.cn; +Cc: kbusch, axboe, hch, sagi, linux-nvme, linux-kernel

On Sat, Nov 23, 2024 at 09:37:41PM +0800, brookxu.cn wrote:
> From: "Chunguang.xu" <chunguang.xu@shopee.com>
> 
> When we fail to create a new ctrl, we never free the tag set occupied
> by the admin_q. Fix it by passing 'new' instead of 'false' to
> nvme_tcp_teardown_admin_queue().

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

but please add a Fixes tag and move it to the beginning of the series.




* Re: [PATCH 4/5] nvme-tcp: remove nvme_tcp_destroy_io_queues()
  2024-11-25  7:24   ` Christoph Hellwig
@ 2024-11-25  7:51     ` 许春光
  0 siblings, 0 replies; 17+ messages in thread
From: 许春光 @ 2024-11-25  7:51 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: kbusch, axboe, sagi, linux-nvme, linux-kernel

Christoph Hellwig <hch@lst.de> wrote on Mon, 25 Nov 2024 at 15:24:
>
> On Sat, Nov 23, 2024 at 09:37:40PM +0800, brookxu.cn wrote:
> > From: "Chunguang.xu" <chunguang.xu@shopee.com>
> >
> > Remove nvme_tcp_destroy_io_queues() to simplify the code and avoid an
> > unnecessary call of nvme_tcp_stop_io_queues().
>
> Please split the behavior change from the cleanup, and explain why
> the behavior change is fine and desirable.
I think the only real change is removing the unnecessary call of
nvme_tcp_stop_io_queues(), as we already do it outside of
nvme_tcp_destroy_io_queues(); no other behaviour changes. In addition,
this aligns with the nvme-rdma code, so it is easier to maintain.

>



* Re: [PATCH 2/5] nvme-tcp: no need to quiesce admin_q in nvme_tcp_teardown_io_queues()
  2024-11-25  7:22   ` Christoph Hellwig
@ 2024-11-25  7:55     ` 许春光
  0 siblings, 0 replies; 17+ messages in thread
From: 许春光 @ 2024-11-25  7:55 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: kbusch, axboe, sagi, linux-nvme, linux-kernel

Christoph Hellwig <hch@lst.de> wrote on Mon, 25 Nov 2024 at 15:22:
>
> On Sat, Nov 23, 2024 at 09:37:38PM +0800, brookxu.cn wrote:
> > From: "Chunguang.xu" <chunguang.xu@shopee.com>
> >
> > As we already quiesce the admin_q in nvme_tcp_teardown_admin_queue(),
> > there is no need to quiesce it again in nvme_tcp_teardown_io_queues();
> > this makes things simpler.
>
> Yes.  And this matches what RDMA is doing.  We really need to go
> back to the attempt to consolidate this code.

Yes, I also think we can do it.

> Reviewed-by: Christoph Hellwig <hch@lst.de>



* Re: [PATCH 1/5] nvme-rdma: unquiesce admin_q before destroying it
  2024-11-23 13:37 ` [PATCH 1/5] nvme-rdma: unquiesce admin_q before destroying it brookxu.cn
  2024-11-25  7:21   ` Christoph Hellwig
@ 2024-11-29 11:18   ` Hannes Reinecke
  2024-12-24 11:53   ` Sagi Grimberg
  2 siblings, 0 replies; 17+ messages in thread
From: Hannes Reinecke @ 2024-11-29 11:18 UTC (permalink / raw)
  To: brookxu.cn, kbusch, axboe, hch, sagi; +Cc: linux-nvme, linux-kernel

On 11/23/24 14:37, brookxu.cn wrote:
> From: "Chunguang.xu" <chunguang.xu@shopee.com>
> 
> The kernel will hang when destroying the admin_q after we fail to create
> the ctrl, with a calltrace like the following:
> 
> PID: 23644    TASK: ff2d52b40f439fc0  CPU: 2    COMMAND: "nvme"
>   #0 [ff61d23de260fb78] __schedule at ffffffff8323bc15
>   #1 [ff61d23de260fc08] schedule at ffffffff8323c014
>   #2 [ff61d23de260fc28] blk_mq_freeze_queue_wait at ffffffff82a3dba1
>   #3 [ff61d23de260fc78] blk_freeze_queue at ffffffff82a4113a
>   #4 [ff61d23de260fc90] blk_cleanup_queue at ffffffff82a33006
>   #5 [ff61d23de260fcb0] nvme_rdma_destroy_admin_queue at ffffffffc12686ce [nvme_rdma]
>   #6 [ff61d23de260fcc8] nvme_rdma_setup_ctrl at ffffffffc1268ced [nvme_rdma]
>   #7 [ff61d23de260fd28] nvme_rdma_create_ctrl at ffffffffc126919b [nvme_rdma]
>   #8 [ff61d23de260fd68] nvmf_dev_write at ffffffffc024f362 [nvme_fabrics]
>   #9 [ff61d23de260fe38] vfs_write at ffffffff827d5f25
>      RIP: 00007fda7891d574  RSP: 00007ffe2ef06958  RFLAGS: 00000202
>      RAX: ffffffffffffffda  RBX: 000055e8122a4d90  RCX: 00007fda7891d574
>      RDX: 000000000000012b  RSI: 000055e8122a4d90  RDI: 0000000000000004
>      RBP: 00007ffe2ef079c0   R8: 000000000000012b   R9: 000055e8122a4d90
>      R10: 0000000000000000  R11: 0000000000000202  R12: 0000000000000004
>      R13: 000055e8122923c0  R14: 000000000000012b  R15: 00007fda78a54500
>      ORIG_RAX: 0000000000000001  CS: 0033  SS: 002b
> 
> This is because we quiesce the admin_q before cancelling requests but
> forget to unquiesce it before destroying the queue. As a result we fail
> to drain the pending requests and hang in blk_mq_freeze_queue_wait()
> forever. Reuse nvme_rdma_teardown_admin_queue() to fix this issue and
> simplify the code.
> 
> Reported-by: Yingfu.zhou <yingfu.zhou@shopee.com>
> Signed-off-by: Chunguang.xu <chunguang.xu@shopee.com>
> Signed-off-by: Yue.zhao <yue.zhao@shopee.com>
> ---
>   drivers/nvme/host/rdma.c | 8 +-------
>   1 file changed, 1 insertion(+), 7 deletions(-)
> 
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                  Kernel Storage Architect
hare@suse.de                                +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich



* Re: [PATCH 2/5] nvme-tcp: no need to quiesce admin_q in nvme_tcp_teardown_io_queues()
  2024-11-23 13:37 ` [PATCH 2/5] nvme-tcp: no need to quiesce admin_q in nvme_tcp_teardown_io_queues() brookxu.cn
  2024-11-25  7:22   ` Christoph Hellwig
@ 2024-11-29 11:18   ` Hannes Reinecke
  1 sibling, 0 replies; 17+ messages in thread
From: Hannes Reinecke @ 2024-11-29 11:18 UTC (permalink / raw)
  To: brookxu.cn, kbusch, axboe, hch, sagi; +Cc: linux-nvme, linux-kernel

On 11/23/24 14:37, brookxu.cn wrote:
> From: "Chunguang.xu" <chunguang.xu@shopee.com>
> 
> As we already quiesce the admin_q in nvme_tcp_teardown_admin_queue(),
> there is no need to quiesce it again in nvme_tcp_teardown_io_queues();
> this makes things simpler.
> 
> Signed-off-by: Chunguang.xu <chunguang.xu@shopee.com>
> ---
>   drivers/nvme/host/tcp.c | 1 -
>   1 file changed, 1 deletion(-)
> 
> diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
> index 3e416af2659f..47d8f10b1f75 100644
> --- a/drivers/nvme/host/tcp.c
> +++ b/drivers/nvme/host/tcp.c
> @@ -2178,7 +2178,6 @@ static void nvme_tcp_teardown_io_queues(struct nvme_ctrl *ctrl,
>   {
>   	if (ctrl->queue_count <= 1)
>   		return;
> -	nvme_quiesce_admin_queue(ctrl);
>   	nvme_quiesce_io_queues(ctrl);
>   	nvme_sync_io_queues(ctrl);
>   	nvme_tcp_stop_io_queues(ctrl);

Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                  Kernel Storage Architect
hare@suse.de                                +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich



* Re: [PATCH 3/5] nvme-tcp: simplify nvme_tcp_teardown_admin_queue()
  2024-11-23 13:37 ` [PATCH 3/5] nvme-tcp: simplify nvme_tcp_teardown_admin_queue() brookxu.cn
  2024-11-25  7:22   ` Christoph Hellwig
@ 2024-11-29 11:19   ` Hannes Reinecke
  1 sibling, 0 replies; 17+ messages in thread
From: Hannes Reinecke @ 2024-11-29 11:19 UTC (permalink / raw)
  To: brookxu.cn, kbusch, axboe, hch, sagi; +Cc: linux-nvme, linux-kernel

On 11/23/24 14:37, brookxu.cn wrote:
> From: "Chunguang.xu" <chunguang.xu@shopee.com>
> 
> As nvme_tcp_teardown_admin_queue() is the only caller of
> nvme_tcp_destroy_admin_queue(), we can merge nvme_tcp_destroy_admin_queue()
> into nvme_tcp_teardown_admin_queue() to simplify the code.
> 
> Signed-off-by: Chunguang.xu <chunguang.xu@shopee.com>
> ---
>   drivers/nvme/host/tcp.c | 14 ++++----------
>   1 file changed, 4 insertions(+), 10 deletions(-)
> 
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                  Kernel Storage Architect
hare@suse.de                                +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich



* Re: [PATCH 4/5] nvme-tcp: remove nvme_tcp_destroy_io_queues()
  2024-11-23 13:37 ` [PATCH 4/5] nvme-tcp: remove nvme_tcp_destroy_io_queues() brookxu.cn
  2024-11-25  7:24   ` Christoph Hellwig
@ 2024-11-29 11:22   ` Hannes Reinecke
  1 sibling, 0 replies; 17+ messages in thread
From: Hannes Reinecke @ 2024-11-29 11:22 UTC (permalink / raw)
  To: brookxu.cn, kbusch, axboe, hch, sagi; +Cc: linux-nvme, linux-kernel

On 11/23/24 14:37, brookxu.cn wrote:
> From: "Chunguang.xu" <chunguang.xu@shopee.com>
> 
> Remove nvme_tcp_destroy_io_queues() to simplify the code and avoid an
> unnecessary call of nvme_tcp_stop_io_queues().
> 
> Signed-off-by: Chunguang.xu <chunguang.xu@shopee.com>
> ---
>   drivers/nvme/host/tcp.c | 32 ++++++++++++++------------------
>   1 file changed, 14 insertions(+), 18 deletions(-)
> 
> diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
> index 45cbaa7523e6..57f0f0290cc8 100644
> --- a/drivers/nvme/host/tcp.c
> +++ b/drivers/nvme/host/tcp.c
> @@ -2024,14 +2024,6 @@ static int nvme_tcp_alloc_io_queues(struct nvme_ctrl *ctrl)
>   	return __nvme_tcp_alloc_io_queues(ctrl);
>   }
>   
> -static void nvme_tcp_destroy_io_queues(struct nvme_ctrl *ctrl, bool remove)
> -{
> -	nvme_tcp_stop_io_queues(ctrl);
> -	if (remove)
> -		nvme_remove_io_tag_set(ctrl);
> -	nvme_tcp_free_io_queues(ctrl);
> -}
> -
>   static int nvme_tcp_configure_io_queues(struct nvme_ctrl *ctrl, bool new)
>   {
>   	int ret, nr_queues;
> @@ -2170,15 +2162,17 @@ static void nvme_tcp_teardown_admin_queue(struct nvme_ctrl *ctrl,
>   static void nvme_tcp_teardown_io_queues(struct nvme_ctrl *ctrl,
>   		bool remove)
>   {
> -	if (ctrl->queue_count <= 1)
> -		return;
> -	nvme_quiesce_io_queues(ctrl);
> -	nvme_sync_io_queues(ctrl);
> -	nvme_tcp_stop_io_queues(ctrl);
> -	nvme_cancel_tagset(ctrl);
> -	if (remove)
> -		nvme_unquiesce_io_queues(ctrl);
> -	nvme_tcp_destroy_io_queues(ctrl, remove);
> +	if (ctrl->queue_count > 1) {
> +		nvme_quiesce_io_queues(ctrl);
> +		nvme_sync_io_queues(ctrl);
> +		nvme_tcp_stop_io_queues(ctrl);
> +		nvme_cancel_tagset(ctrl);
> +		if (remove) {
> +			nvme_unquiesce_io_queues(ctrl);
> +			nvme_remove_io_tag_set(ctrl);
> +		}
> +		nvme_tcp_free_io_queues(ctrl);
> +	}
>   }
>   
Please don't reorder the function; if you keep
the first 'if (ctrl->queue_count <= 1)' condition
you'll incur less churn and the patch is easier to read.

>   static void nvme_tcp_reconnect_or_remove(struct nvme_ctrl *ctrl,
> @@ -2267,7 +2261,9 @@ static int nvme_tcp_setup_ctrl(struct nvme_ctrl *ctrl, bool new)
>   		nvme_sync_io_queues(ctrl);
>   		nvme_tcp_stop_io_queues(ctrl);
>   		nvme_cancel_tagset(ctrl);
> -		nvme_tcp_destroy_io_queues(ctrl, new);
> +		if (new)
> +			nvme_remove_io_tag_set(ctrl);
> +		nvme_tcp_free_io_queues(ctrl);
>   	}
>   destroy_admin:
>   	nvme_stop_keep_alive(ctrl);

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                  Kernel Storage Architect
hare@suse.de                                +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich



* Re: [PATCH 1/5] nvme-rdma: unquiesce admin_q before destroy it
  2024-11-23 13:37 ` [PATCH 1/5] nvme-rdma: unquiesce admin_q before destroy it brookxu.cn
  2024-11-25  7:21   ` Christoph Hellwig
  2024-11-29 11:18   ` Hannes Reinecke
@ 2024-12-24 11:53   ` Sagi Grimberg
  2 siblings, 0 replies; 17+ messages in thread
From: Sagi Grimberg @ 2024-12-24 11:53 UTC (permalink / raw)
  To: brookxu.cn, kbusch, axboe, hch; +Cc: linux-nvme, linux-kernel

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>


