* [PATCH 1/2] nvmet-tcp: avoid circular locking dependency on install_queue()
2023-12-08 12:53 [PATCHv3 0/2] nvmet: avoid circular locking warning hare
@ 2023-12-08 12:53 ` hare
2023-12-11 14:10 ` Sagi Grimberg
2023-12-08 12:53 ` [PATCH 2/2] nvmet-rdma: " hare
` (2 subsequent siblings)
3 siblings, 1 reply; 9+ messages in thread
From: hare @ 2023-12-08 12:53 UTC (permalink / raw)
To: Sagi Grimberg
Cc: Christoph Hellwig, Keith Busch, linux-nvme, Hannes Reinecke,
Shin'ichiro Kawasaki
From: Hannes Reinecke <hare@suse.de>
nvmet_tcp_install_queue() is driven from the ->io_work workqueue
function, but calls flush_workqueue(), which might trigger
->release_work(), which in turn calls flush_work() on ->io_work.
To avoid that, check for pending queues in the disconnecting state,
and return 'controller busy' once a certain threshold is reached.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
---
drivers/nvme/target/tcp.c | 18 +++++++++++++++---
1 file changed, 15 insertions(+), 3 deletions(-)
diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
index 4cc27856aa8f..2be0eb6cdb23 100644
--- a/drivers/nvme/target/tcp.c
+++ b/drivers/nvme/target/tcp.c
@@ -24,6 +24,7 @@
#include "nvmet.h"
#define NVMET_TCP_DEF_INLINE_DATA_SIZE (4 * PAGE_SIZE)
+#define NVMET_TCP_BACKLOG 128
static int param_store_val(const char *str, int *val, int min, int max)
{
@@ -2053,7 +2054,7 @@ static int nvmet_tcp_add_port(struct nvmet_port *nport)
goto err_sock;
}
- ret = kernel_listen(port->sock, 128);
+ ret = kernel_listen(port->sock, NVMET_TCP_BACKLOG);
if (ret) {
pr_err("failed to listen %d on port sock\n", ret);
goto err_sock;
@@ -2119,8 +2120,19 @@ static u16 nvmet_tcp_install_queue(struct nvmet_sq *sq)
container_of(sq, struct nvmet_tcp_queue, nvme_sq);
if (sq->qid == 0) {
- /* Let inflight controller teardown complete */
- flush_workqueue(nvmet_wq);
+ struct nvmet_tcp_queue *q;
+ int pending = 0;
+
+ /* Check for pending controller teardown */
+ mutex_lock(&nvmet_tcp_queue_mutex);
+ list_for_each_entry(q, &nvmet_tcp_queue_list, queue_list) {
+ if (q->nvme_sq.ctrl == sq->ctrl &&
+ q->state == NVMET_TCP_Q_DISCONNECTING)
+ pending++;
+ }
+ mutex_unlock(&nvmet_tcp_queue_mutex);
+ if (pending > NVMET_TCP_BACKLOG)
+ return NVME_SC_CONNECT_CTRL_BUSY;
}
queue->nr_cmds = sq->size * 2;
--
2.35.3
^ permalink raw reply related [flat|nested] 9+ messages in thread

* [PATCH 2/2] nvmet-rdma: avoid circular locking dependency on install_queue()
2023-12-08 12:53 [PATCHv3 0/2] nvmet: avoid circular locking warning hare
2023-12-08 12:53 ` [PATCH 1/2] nvmet-tcp: avoid circular locking dependency on install_queue() hare
@ 2023-12-08 12:53 ` hare
2023-12-11 14:10 ` Sagi Grimberg
[not found] ` <3eefcbfd-a93f-4c1c-957a-66058e5cdb54@nvidia.com>
2023-12-11 12:44 ` [PATCHv3 0/2] nvmet: avoid circular locking warning Shinichiro Kawasaki
2024-01-10 4:12 ` Shinichiro Kawasaki
3 siblings, 2 replies; 9+ messages in thread
From: hare @ 2023-12-08 12:53 UTC (permalink / raw)
To: Sagi Grimberg
Cc: Christoph Hellwig, Keith Busch, linux-nvme, Hannes Reinecke,
Shin'ichiro Kawasaki
From: Hannes Reinecke <hare@suse.de>
nvmet_rdma_install_queue() is driven from the ->io_work workqueue
function, but calls flush_workqueue(), which might trigger
->release_work(), which in turn calls flush_work() on ->io_work.
To avoid that, check for pending queues in the disconnecting state,
and return 'controller busy' once a certain threshold is reached.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
---
drivers/nvme/target/rdma.c | 19 ++++++++++++++++---
1 file changed, 16 insertions(+), 3 deletions(-)
diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c
index 4597bca43a6d..667f9c04f35d 100644
--- a/drivers/nvme/target/rdma.c
+++ b/drivers/nvme/target/rdma.c
@@ -37,6 +37,8 @@
#define NVMET_RDMA_MAX_MDTS 8
#define NVMET_RDMA_MAX_METADATA_MDTS 5
+#define NVMET_RDMA_BACKLOG 128
+
struct nvmet_rdma_srq;
struct nvmet_rdma_cmd {
@@ -1583,8 +1585,19 @@ static int nvmet_rdma_queue_connect(struct rdma_cm_id *cm_id,
}
if (queue->host_qid == 0) {
- /* Let inflight controller teardown complete */
- flush_workqueue(nvmet_wq);
+ struct nvmet_rdma_queue *q;
+ int pending = 0;
+
+ /* Check for pending controller teardown */
+ mutex_lock(&nvmet_rdma_queue_mutex);
+ list_for_each_entry(q, &nvmet_rdma_queue_list, queue_list) {
+ if (q->nvme_sq.ctrl == queue->nvme_sq.ctrl &&
+ q->state == NVMET_RDMA_Q_DISCONNECTING)
+ pending++;
+ }
+ mutex_unlock(&nvmet_rdma_queue_mutex);
+ if (pending > NVMET_RDMA_BACKLOG)
+ return NVME_SC_CONNECT_CTRL_BUSY;
}
ret = nvmet_rdma_cm_accept(cm_id, queue, &event->param.conn);
@@ -1880,7 +1893,7 @@ static int nvmet_rdma_enable_port(struct nvmet_rdma_port *port)
goto out_destroy_id;
}
- ret = rdma_listen(cm_id, 128);
+ ret = rdma_listen(cm_id, NVMET_RDMA_BACKLOG);
if (ret) {
pr_err("listening to %pISpcs failed (%d)\n", addr, ret);
goto out_destroy_id;
--
2.35.3
^ permalink raw reply related [flat|nested] 9+ messages in thread

* Re: [PATCH 2/2] nvmet-rdma: avoid circular locking dependency on install_queue()
2023-12-08 12:53 ` [PATCH 2/2] nvmet-rdma: " hare
@ 2023-12-11 14:10 ` Sagi Grimberg
[not found] ` <3eefcbfd-a93f-4c1c-957a-66058e5cdb54@nvidia.com>
1 sibling, 0 replies; 9+ messages in thread
From: Sagi Grimberg @ 2023-12-11 14:10 UTC (permalink / raw)
To: hare
Cc: Christoph Hellwig, Keith Busch, linux-nvme, Hannes Reinecke,
Shin'ichiro Kawasaki
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
^ permalink raw reply [flat|nested] 9+ messages in thread

[parent not found: <3eefcbfd-a93f-4c1c-957a-66058e5cdb54@nvidia.com>]
* Re: [PATCH 2/2] nvmet-rdma: avoid circular locking dependency on install_queue()
[not found] ` <3eefcbfd-a93f-4c1c-957a-66058e5cdb54@nvidia.com>
@ 2024-06-03 6:43 ` Sagi Grimberg
0 siblings, 0 replies; 9+ messages in thread
From: Sagi Grimberg @ 2024-06-03 6:43 UTC (permalink / raw)
To: Max Gurtovoy, hare
Cc: Christoph Hellwig, Keith Busch, linux-nvme, Hannes Reinecke,
Shin'ichiro Kawasaki
On 02/06/2024 16:09, Max Gurtovoy wrote:
> Hi Hannes/Sagi,
>
> sorry for the late response on this one.
>
> On 08/12/2023 14:53, hare@kernel.org wrote:
>> From: Hannes Reinecke<hare@suse.de>
>>
>> nvmet_rdma_install_queue() is driven from the ->io_work workqueue
>
> nvmet_rdma_install_queue callback is not implemented in the driver.
> And RDMA doesn't use io_work workqueue..
>
>
>> function, but will call flush_workqueue() which might trigger
>> ->release_work() which in itself calls flush_work on ->io_work.
>>
>> To avoid that check for pending queue in disconnecting status,
>> and return 'controller busy' when we reached a certain threshold.
>>
>> Signed-off-by: Hannes Reinecke<hare@suse.de>
>> Tested-by: Shin'ichiro Kawasaki<shinichiro.kawasaki@wdc.com>
>> ---
>> drivers/nvme/target/rdma.c | 19 ++++++++++++++++---
>> 1 file changed, 16 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c
>> index 4597bca43a6d..667f9c04f35d 100644
>> --- a/drivers/nvme/target/rdma.c
>> +++ b/drivers/nvme/target/rdma.c
>> @@ -37,6 +37,8 @@
>> #define NVMET_RDMA_MAX_MDTS 8
>> #define NVMET_RDMA_MAX_METADATA_MDTS 5
>> +#define NVMET_RDMA_BACKLOG 128
>> +
>> struct nvmet_rdma_srq;
>> struct nvmet_rdma_cmd {
>> @@ -1583,8 +1585,19 @@ static int nvmet_rdma_queue_connect(struct
>> rdma_cm_id *cm_id,
>> }
>> if (queue->host_qid == 0) {
>> - /* Let inflight controller teardown complete */
>> - flush_workqueue(nvmet_wq);
>> + struct nvmet_rdma_queue *q;
>> + int pending = 0;
>> +
>> + /* Check for pending controller teardown */
>> + mutex_lock(&nvmet_rdma_queue_mutex);
>> + list_for_each_entry(q, &nvmet_rdma_queue_list, queue_list) {
>> + if (q->nvme_sq.ctrl == queue->nvme_sq.ctrl &&
>
> nvme_sq->ctrl pointer is set upon FABRICS_CONNECT during
> nvmet_install_queue().
>
> obviously this check is not relevant since queue->nvme_sq.ctrl is
> always NULL, isn't it ?
I think that the goal was to look at all the queues that were scheduled
for teardown (queue->release_work), so sq->ctrl should be valid for
those queues afaict.
>
> So, I wonder what is this check doing?
>
> Was the intention to check that port->cm_id (listener) is not
> overloaded ?
>
> Also the return value is "FABRICS" return code and not the value that
> rdma_cm is looking for.
>
> We probably should introduce nvmet_rdma_install_queue() and check the
> status of pending disconnections associated with port->cm_id, which is
> the one with the configured backlog.
>
> I don't understand the solution also for the nvmet_tcp.
>
> I think this series only removed the "flush_workqueue(nvmet_wq);"
> actually.
>
> Is there something I'm missing in this commit?
IIRC the goal was to get rid of the circular locking lockdep complaint.
nvmet needs a back-pressure mechanism for incoming connections because
queue teardown is async, and nvmet may hit OOM in a connect-disconnect
storm scenario. The goal here was to detect such a scenario by counting
the number of queues that are queued for teardown, and if they exceed a
certain threshold, return a BUSY status to the incoming fabrics connect.
Before, nvmet flushed the pending queue teardowns, but that triggered a
lockdep complaint, and Hannes attempted to address it.
>
>> + q->state == NVMET_RDMA_Q_DISCONNECTING)
>> + pending++;
>> + }
>> + mutex_unlock(&nvmet_rdma_queue_mutex);
>> + if (pending > NVMET_RDMA_BACKLOG)
>> + return NVME_SC_CONNECT_CTRL_BUSY;
>> }
>> ret = nvmet_rdma_cm_accept(cm_id, queue, &event->param.conn);
>> @@ -1880,7 +1893,7 @@ static int nvmet_rdma_enable_port(struct
>> nvmet_rdma_port *port)
>> goto out_destroy_id;
>> }
>> - ret = rdma_listen(cm_id, 128);
>> + ret = rdma_listen(cm_id, NVMET_RDMA_BACKLOG);
>> if (ret) {
>> pr_err("listening to %pISpcs failed (%d)\n", addr, ret);
>> goto out_destroy_id;
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCHv3 0/2] nvmet: avoid circular locking warning
2023-12-08 12:53 [PATCHv3 0/2] nvmet: avoid circular locking warning hare
2023-12-08 12:53 ` [PATCH 1/2] nvmet-tcp: avoid circular locking dependency on install_queue() hare
2023-12-08 12:53 ` [PATCH 2/2] nvmet-rdma: " hare
@ 2023-12-11 12:44 ` Shinichiro Kawasaki
2024-01-10 4:12 ` Shinichiro Kawasaki
3 siblings, 0 replies; 9+ messages in thread
From: Shinichiro Kawasaki @ 2023-12-11 12:44 UTC (permalink / raw)
To: hare@kernel.org
Cc: Sagi Grimberg, Christoph Hellwig, Keith Busch,
linux-nvme@lists.infradead.org, Hannes Reinecke
On Dec 08, 2023 / 13:53, hare@kernel.org wrote:
> From: Hannes Reinecke <hare@suse.de>
>
> nvmet-rdma and nvmet-tcp trigger a circular locking warning when
> tearing down; reason is a call to 'flush_workqueue' when creating
> a new controller which tries to cover for the fact that old controller
> instances might be in the process of tearing down.
> However, this is pure speculation as we don't know (and don't check)
> if there really _are_ controllers in shutdown.
> And even if there were, that should be short-lived, and would have been
> resolved by connecting just a tad later.
> So this patch returns 'controller busy' if we really find ourselves in this
> situation, allowing the caller to reconnect later.
Thanks Hannes. I reconfirmed the patches avoid the circular lockdep WARN with
rdma and tcp transports using v6.7-rc5 kernel. And I observed no regression in
my blktests runs. Looks good.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCHv3 0/2] nvmet: avoid circular locking warning
2023-12-08 12:53 [PATCHv3 0/2] nvmet: avoid circular locking warning hare
` (2 preceding siblings ...)
2023-12-11 12:44 ` [PATCHv3 0/2] nvmet: avoid circular locking warning Shinichiro Kawasaki
@ 2024-01-10 4:12 ` Shinichiro Kawasaki
2024-01-10 21:38 ` Keith Busch
3 siblings, 1 reply; 9+ messages in thread
From: Shinichiro Kawasaki @ 2024-01-10 4:12 UTC (permalink / raw)
To: Keith Busch
Cc: Sagi Grimberg, Christoph Hellwig, linux-nvme@lists.infradead.org,
Hannes Reinecke, hare@kernel.org
On Dec 08, 2023 / 13:53, hare@kernel.org wrote:
> From: Hannes Reinecke <hare@suse.de>
>
> nvmet-rdma and nvmet-tcp trigger a circular locking warning when
> tearing down; reason is a call to 'flush_workqueue' when creating
> a new controller which tries to cover for the fact that old controller
> instances might be in the process of tearing down.
> However, this is pure speculation as we don't know (and don't check)
> if there really _are_ controllers in shutdown.
> And even if there were, that should be short-lived, and would have been
> resolved by connecting just a tad later.
> So this patch returns 'controller busy' if we really find ourselves in this
> situation, allowing the caller to reconnect later.
Keith,
Could you consider upstreaming these patches for kernel v6.8-rcX? They fix
lockdep WARNs observed with the blktests nvme test groups and the rdma/tcp
transports.
^ permalink raw reply [flat|nested] 9+ messages in thread

* Re: [PATCHv3 0/2] nvmet: avoid circular locking warning
2024-01-10 4:12 ` Shinichiro Kawasaki
@ 2024-01-10 21:38 ` Keith Busch
0 siblings, 0 replies; 9+ messages in thread
From: Keith Busch @ 2024-01-10 21:38 UTC (permalink / raw)
To: Shinichiro Kawasaki
Cc: Sagi Grimberg, Christoph Hellwig, linux-nvme@lists.infradead.org,
Hannes Reinecke, hare@kernel.org
On Wed, Jan 10, 2024 at 04:12:16AM +0000, Shinichiro Kawasaki wrote:
> On Dec 08, 2023 / 13:53, hare@kernel.org wrote:
> > From: Hannes Reinecke <hare@suse.de>
> >
> > nvmet-rdma and nvmet-tcp trigger a circular locking warning when
> > tearing down; reason is a call to 'flush_workqueue' when creating
> > a new controller which tries to cover for the fact that old controller
> > instances might be in the process of tearing down.
> > However, this is pure speculation as we don't know (and don't check)
> > if there really _are_ controllers in shutdown.
> > And even if there were, that should be short-lived, and would have been
> > resolved by connecting just a tad later.
> > So this patch returns 'controller busy' if we really find ourselves in this
> > situation, allowing the caller to reconnect later.
>
> Keith,
>
> Could you consider upstreaming these patches for kernel v6.8-rcX? They fix
> lockdep WARNs observed with the blktests nvme test groups and the rdma/tcp
> transports.
Definitely, I knew I was forgetting something. Patches applied now for
6.8 and our 2nd merge window pull request will be issued today.
^ permalink raw reply [flat|nested] 9+ messages in thread