Linux-NVME Archive on lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/2] nvmet-rdma: Suppress a lockdep complaint
@ 2018-10-25 15:19 Bart Van Assche
  2018-10-25 15:19 ` [PATCH 1/2] nvmet-rdma: Avoid that nvmet_rdma_release_queue_work() triggers " Bart Van Assche
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Bart Van Assche @ 2018-10-25 15:19 UTC (permalink / raw)


Hi Christoph,

This patch series addresses a lockdep complaint that has been
discussed recently on the NVMe mailing list. Please consider this
patch series for the upstream kernel.

Thanks,

Bart.

Bart Van Assche (2):
  nvmet-rdma: Avoid that nvmet_rdma_release_queue_work() triggers a
    lockdep complaint
  nvmet-rdma: Use the system workqueues again for deletion work

 drivers/nvme/target/rdma.c | 19 ++++---------------
 1 file changed, 4 insertions(+), 15 deletions(-)

-- 
2.19.1.568.g152ad8e336-goog

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH 1/2] nvmet-rdma: Avoid that nvmet_rdma_release_queue_work() triggers a lockdep complaint
  2018-10-25 15:19 [PATCH 0/2] nvmet-rdma: Suppress a lockdep complaint Bart Van Assche
@ 2018-10-25 15:19 ` Bart Van Assche
  2018-10-25 19:43   ` Sagi Grimberg
  2018-10-25 21:05   ` Johannes Berg
  2018-10-25 15:19 ` [PATCH 2/2] nvmet-rdma: Use the system workqueues again for deletion work Bart Van Assche
  2018-10-25 16:08 ` [PATCH 0/2] nvmet-rdma: Suppress a lockdep complaint Bart Van Assche
  2 siblings, 2 replies; 7+ messages in thread
From: Bart Van Assche @ 2018-10-25 15:19 UTC (permalink / raw)


Call flush_work() instead of flush_workqueue(). That is sufficient to
suppress the following lockdep complaint (see also commit 87915adc3f0a
("workqueue: re-add lockdep dependencies for flushing")):

======================================================
WARNING: possible circular locking dependency detected
4.19.0-dbg+ #1 Not tainted
------------------------------------------------------
kworker/u12:0/7 is trying to acquire lock:
00000000c03a91d1 (&id_priv->handler_mutex){+.+.}, at: rdma_destroy_id+0x6f/0x440 [rdma_cm]

but task is already holding lock:
(work_completion)(&queue->release_work)){+.+.}, at: process_one_work+0x3c9/0x9f0
which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:
-> #3 ((work_completion)(&queue->release_work)){+.+.}:
       process_one_work+0x447/0x9f0
       worker_thread+0x63/0x5a0
       kthread+0x1cf/0x1f0
       ret_from_fork+0x24/0x30
-> #2 ((wq_completion)"nvmet-rdma-delete-wq"){+.+.}:
       flush_workqueue+0xf3/0x970
       nvmet_rdma_cm_handler+0x1320/0x170f [nvmet_rdma]
       cma_ib_req_handler+0x72f/0xf90 [rdma_cm]
       cm_process_work+0x2e/0x110 [ib_cm]
       cm_work_handler+0x431e/0x50ba [ib_cm]
       process_one_work+0x481/0x9f0
       worker_thread+0x63/0x5a0
       kthread+0x1cf/0x1f0
       ret_from_fork+0x24/0x30
-> #1 (&id_priv->handler_mutex/1){+.+.}:
       __mutex_lock+0xfe/0xbe0
       mutex_lock_nested+0x1b/0x20
       cma_ib_req_handler+0x6aa/0xf90 [rdma_cm]
       cm_process_work+0x2e/0x110 [ib_cm]
       cm_work_handler+0x431e/0x50ba [ib_cm]
       process_one_work+0x481/0x9f0
       worker_thread+0x63/0x5a0
       kthread+0x1cf/0x1f0
       ret_from_fork+0x24/0x30
-> #0 (&id_priv->handler_mutex){+.+.}:
       lock_acquire+0xc5/0x200
       __mutex_lock+0xfe/0xbe0
       mutex_lock_nested+0x1b/0x20
       rdma_destroy_id+0x6f/0x440 [rdma_cm]
       nvmet_rdma_release_queue_work+0x8e/0x1b0 [nvmet_rdma]
       process_one_work+0x481/0x9f0
       worker_thread+0x63/0x5a0
       kthread+0x1cf/0x1f0
       ret_from_fork+0x24/0x30
other info that might help us debug this:
Chain exists of:
  &id_priv->handler_mutex --> (wq_completion)"nvmet-rdma-delete-wq" --> (work_completion)(&queue->release_work)

Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock((work_completion)(&queue->release_work));
                               lock((wq_completion)"nvmet-rdma-delete-wq");
                               lock((work_completion)(&queue->release_work));
  lock(&id_priv->handler_mutex);

 *** DEADLOCK ***

2 locks held by kworker/u12:0/7:
 #0: 00000000272134f2 ((wq_completion)"nvmet-rdma-delete-wq"){+.+.}, at: process_one_work+0x3c9/0x9f0
 #1: 0000000090531fcd ((work_completion)(&queue->release_work)){+.+.}, at: process_one_work+0x3c9/0x9f0

               stack backtrace:
CPU: 1 PID: 7 Comm: kworker/u12:0 Not tainted 4.19.0-dbg+ #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
Workqueue: nvmet-rdma-delete-wq nvmet_rdma_release_queue_work [nvmet_rdma]
Call Trace:
 dump_stack+0x86/0xc5
 print_circular_bug.isra.32+0x20a/0x218
 __lock_acquire+0x1c68/0x1cf0
 lock_acquire+0xc5/0x200
 __mutex_lock+0xfe/0xbe0
 mutex_lock_nested+0x1b/0x20
 rdma_destroy_id+0x6f/0x440 [rdma_cm]
 nvmet_rdma_release_queue_work+0x8e/0x1b0 [nvmet_rdma]
 process_one_work+0x481/0x9f0
 worker_thread+0x63/0x5a0
 kthread+0x1cf/0x1f0
 ret_from_fork+0x24/0x30

Cc: Sagi Grimberg <sagi at grimberg.me>
Cc: Johannes Berg <johannes.berg at intel.com>
Signed-off-by: Bart Van Assche <bvanassche at acm.org>
---
 drivers/nvme/target/rdma.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c
index bd265aceb90c..b3ee64fd66c5 100644
--- a/drivers/nvme/target/rdma.c
+++ b/drivers/nvme/target/rdma.c
@@ -1268,7 +1268,7 @@ static int nvmet_rdma_queue_connect(struct rdma_cm_id *cm_id,
 
 	if (queue->host_qid == 0) {
 		/* Let inflight controller teardown complete */
-		flush_workqueue(nvmet_rdma_delete_wq);
+		flush_work(&queue->release_work);
 	}
 
 	ret = nvmet_rdma_cm_accept(cm_id, queue, &event->param.conn);
-- 
2.19.1.568.g152ad8e336-goog

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 2/2] nvmet-rdma: Use the system workqueues again for deletion work
  2018-10-25 15:19 [PATCH 0/2] nvmet-rdma: Suppress a lockdep complaint Bart Van Assche
  2018-10-25 15:19 ` [PATCH 1/2] nvmet-rdma: Avoid that nvmet_rdma_release_queue_work() triggers " Bart Van Assche
@ 2018-10-25 15:19 ` Bart Van Assche
  2018-10-25 16:08 ` [PATCH 0/2] nvmet-rdma: Suppress a lockdep complaint Bart Van Assche
  2 siblings, 0 replies; 7+ messages in thread
From: Bart Van Assche @ 2018-10-25 15:19 UTC (permalink / raw)


Minimize system resource usage by switching back to the system
workqueues for deletion work. This patch reverts commit 2acf70ade79d
("nvmet-rdma: use a private workqueue for delete").

Cc: Sagi Grimberg <sagi at grimberg.me>
Cc: Johannes Berg <johannes.berg at intel.com>
Signed-off-by: Bart Van Assche <bvanassche at acm.org>
---
 drivers/nvme/target/rdma.c | 17 +++--------------
 1 file changed, 3 insertions(+), 14 deletions(-)

diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c
index b3ee64fd66c5..65c95cb06218 100644
--- a/drivers/nvme/target/rdma.c
+++ b/drivers/nvme/target/rdma.c
@@ -122,7 +122,6 @@ struct nvmet_rdma_device {
 	int			inline_page_count;
 };
 
-static struct workqueue_struct *nvmet_rdma_delete_wq;
 static bool nvmet_rdma_use_srq;
 module_param_named(use_srq, nvmet_rdma_use_srq, bool, 0444);
 MODULE_PARM_DESC(use_srq, "Use shared receive queue.");
@@ -1273,7 +1272,7 @@ static int nvmet_rdma_queue_connect(struct rdma_cm_id *cm_id,
 
 	ret = nvmet_rdma_cm_accept(cm_id, queue, &event->param.conn);
 	if (ret) {
-		queue_work(nvmet_rdma_delete_wq, &queue->release_work);
+		schedule_work(&queue->release_work);
 		/* Destroying rdma_cm id is not needed here */
 		return 0;
 	}
@@ -1338,7 +1337,7 @@ static void __nvmet_rdma_queue_disconnect(struct nvmet_rdma_queue *queue)
 
 	if (disconnect) {
 		rdma_disconnect(queue->cm_id);
-		queue_work(nvmet_rdma_delete_wq, &queue->release_work);
+		schedule_work(&queue->release_work);
 	}
 }
 
@@ -1368,7 +1367,7 @@ static void nvmet_rdma_queue_connect_fail(struct rdma_cm_id *cm_id,
 	mutex_unlock(&nvmet_rdma_queue_mutex);
 
 	pr_err("failed to connect queue %d\n", queue->idx);
-	queue_work(nvmet_rdma_delete_wq, &queue->release_work);
+	schedule_work(&queue->release_work);
 }
 
 /**
@@ -1650,17 +1649,8 @@ static int __init nvmet_rdma_init(void)
 	if (ret)
 		goto err_ib_client;
 
-	nvmet_rdma_delete_wq = alloc_workqueue("nvmet-rdma-delete-wq",
-			WQ_UNBOUND | WQ_MEM_RECLAIM | WQ_SYSFS, 0);
-	if (!nvmet_rdma_delete_wq) {
-		ret = -ENOMEM;
-		goto err_unreg_transport;
-	}
-
 	return 0;
 
-err_unreg_transport:
-	nvmet_unregister_transport(&nvmet_rdma_ops);
 err_ib_client:
 	ib_unregister_client(&nvmet_rdma_ib_client);
 	return ret;
@@ -1668,7 +1658,6 @@ static int __init nvmet_rdma_init(void)
 
 static void __exit nvmet_rdma_exit(void)
 {
-	destroy_workqueue(nvmet_rdma_delete_wq);
 	nvmet_unregister_transport(&nvmet_rdma_ops);
 	ib_unregister_client(&nvmet_rdma_ib_client);
 	WARN_ON_ONCE(!list_empty(&nvmet_rdma_queue_list));
-- 
2.19.1.568.g152ad8e336-goog

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 0/2] nvmet-rdma: Suppress a lockdep complaint
  2018-10-25 15:19 [PATCH 0/2] nvmet-rdma: Suppress a lockdep complaint Bart Van Assche
  2018-10-25 15:19 ` [PATCH 1/2] nvmet-rdma: Avoid that nvmet_rdma_release_queue_work() triggers " Bart Van Assche
  2018-10-25 15:19 ` [PATCH 2/2] nvmet-rdma: Use the system workqueues again for deletion work Bart Van Assche
@ 2018-10-25 16:08 ` Bart Van Assche
  2 siblings, 0 replies; 7+ messages in thread
From: Bart Van Assche @ 2018-10-25 16:08 UTC (permalink / raw)


On Thu, 2018-10-25@08:19 -0700, Bart Van Assche wrote:
> This patch series addresses a lockdep complaint that has been
> discussed recently on the NVMe mailing list. Please consider this
> patch series for the upstream kernel.

Please ignore this patch series - the real reason the lockdep complaint
did not appear anymore was an incorrect patch for the kernel workqueue
implementation.

Bart.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH 1/2] nvmet-rdma: Avoid that nvmet_rdma_release_queue_work() triggers a lockdep complaint
  2018-10-25 15:19 ` [PATCH 1/2] nvmet-rdma: Avoid that nvmet_rdma_release_queue_work() triggers " Bart Van Assche
@ 2018-10-25 19:43   ` Sagi Grimberg
  2018-10-25 21:05   ` Johannes Berg
  1 sibling, 0 replies; 7+ messages in thread
From: Sagi Grimberg @ 2018-10-25 19:43 UTC (permalink / raw)



> diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c
> index bd265aceb90c..b3ee64fd66c5 100644
> --- a/drivers/nvme/target/rdma.c
> +++ b/drivers/nvme/target/rdma.c
> @@ -1268,7 +1268,7 @@ static int nvmet_rdma_queue_connect(struct rdma_cm_id *cm_id,
>   
>   	if (queue->host_qid == 0) {
>   		/* Let inflight controller teardown complete */
> -		flush_workqueue(nvmet_rdma_delete_wq);
> +		flush_work(&queue->release_work);

That is the opposite of what is attempted to be achieved here. The queue
connect is trying to wait for other queue deletion to complete.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH 1/2] nvmet-rdma: Avoid that nvmet_rdma_release_queue_work() triggers a lockdep complaint
  2018-10-25 15:19 ` [PATCH 1/2] nvmet-rdma: Avoid that nvmet_rdma_release_queue_work() triggers " Bart Van Assche
  2018-10-25 19:43   ` Sagi Grimberg
@ 2018-10-25 21:05   ` Johannes Berg
  2018-10-25 22:20     ` Bart Van Assche
  1 sibling, 1 reply; 7+ messages in thread
From: Johannes Berg @ 2018-10-25 21:05 UTC (permalink / raw)


Hi,

Yes, you said to ignore this, but I'm looking at the lockdep report
again anyway ...

So ... if I understand this correctly, we have:

cma's id_priv->handler_mutex    - a listen_id's mutex
cma's id_priv->handler_mutex/1  - a conn_id's mutex
queue                           - an RDMA queue, belonging to ...?
nvmet-rdma-delete-wq            - the global deletion workqueue

In cma_ib_req_handler(), we create a dependency from
	handler_mutex -> handler_mutex/1
since we lock both of them (here's the "real _nested()" part).

While holding those, we call the event handler, which is
nvmet_rdma_cm_handler. This, via nvmet_rdma_queue_connect, flushes the
workqueue, creating that dependency.

Obviously, queue->release_work has a dependency to the workqueue since
it runs on there.

Finally, the queue->release_work calls rdma_destroy_id() which locks the
destroyed id_priv's mutex.

Does that about seem right?

I'm not sure I see a solution right now. If you were to replace the
workqueue by a list of objects to be destroyed and then iterate it where
you have the flush_workqueue() now (*), you'd end up with a very similar
lockdep report I think, so it's not inherent in the lockdep workqueue
annotations, it's inherent in locking some object(s) while holding the
same class of lock for another object. This would be a pretty typical
case of mutex_lock_nested(), but it's hard to do across the layers here,
and pretty much impossible to do across the workqueue.

I'd usually solve that sort of problem by flushing/iterating before
taking all the locks, since it's a different object anyway, but given
the two layers here that doesn't seem feasible either.

(*) of course in addition to iterating it in some thread, or work
struct, or something else asynchronous

I think again you could suppress this by using flush_workqueue_nested()
here, basically saying that while it's the same workqueue, it cannot
actually connect back to itself locking-wise due to being guaranteed to
have different objects on it... I think, but it's getting late here.

johannes

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH 1/2] nvmet-rdma: Avoid that nvmet_rdma_release_queue_work() triggers a lockdep complaint
  2018-10-25 21:05   ` Johannes Berg
@ 2018-10-25 22:20     ` Bart Van Assche
  0 siblings, 0 replies; 7+ messages in thread
From: Bart Van Assche @ 2018-10-25 22:20 UTC (permalink / raw)


On Thu, 2018-10-25@23:05 +0200, Johannes Berg wrote:
> Yes, you said to ignore this, but I'm looking at the lockdep report
> again anyway ...
> 
> So ... if I understand this correctly, we have:
> 
> cma's id_priv->handler_mutex    - a listen_id's mutex
> cma's id_priv->handler_mutex/1  - a conn_id's mutex
> queue                           - an RDMA queue, belonging to ...?
> nvmet-rdma-delete-wq            - the global deletion workqueue
> 
> In cma_ib_req_handler(), we create a dependency from
> 	handler_mutex -> handler_mutex/1
> since we lock both of them (here's the "real _nested()" part).
> 
> While holding those, we call the event handler, which is
> nvmet_rdma_cm_handler. This, via nvmet_rdma_queue_connect, flushes the
> workqueue, creating that dependency.
> 
> Obviously, queue->release_work has a dependency to the workqueue since
> it runs on there.
> 
> Finally, the queue->release_work calls rdma_destroy_id() which locks the
> destroyed id_priv's mutex.
> 
> Does that about seem right?

Hi Johannes,

I don't think it's that complicated. I think only the conn_id handler_mutex
and the workqueue lockdep annotations are involved in the lockdep complaint.
In drivers/infiniband/core/cma.c one can see that a new CM ID (conn_id) is
created before the RDMA/CM handler callback occurs for event type
RDMA_CM_EVENT_CONNECT_REQUEST. Otherwise I agree that a flush_workqueue()
call occurs with the handler_mutex held and that the same handler_mutex is
obtained while executing work queued on the same workqueue that gets flushed.

> I'm not sure I see a solution right now. If you were to replace the
> workqueue by a list of objects to be destroyed and then iterate it where
> you have the flush_workqueue() now (*), you'd end up with a very similar
> lockdep report I think, so it's not inherent in the lockdep workqueue
> annotations, it's inherent in locking some object(s) while holding the
> same class of lock for another object. This would be a pretty typical
> case of mutex_lock_nested(), but it's hard to do across the layers here,
> and pretty much impossible to do across the workqueue.
> 
> I'd usually solve that sort of problem by flushing/iterating before
> taking all the locks, since it's a different object anyway, but given
> the two layers here that doesn't seem feasible either.
> 
> (*) of course in addition to iterating it in some thread, or work
> struct, or something else asynchronous
> 
> I think again you could suppress this by using flush_workqueue_nested()
> here, basically saying that while it's the same workqueue, it cannot
> actually connect back to itself locking-wise due to being guaranteed to
> have different objects on it... I think, but it's getting late here.

Since I'm not sure accepting every connect request in case of a DoS attack
is the best approach, my own preference is to address this by refusing new
connections if too many CM IDs are being cleaned up. So I changed the
approach of this patch series. In case you would like to have a look at v3
of this patch series, it is available at
http://lists.infradead.org/pipermail/linux-nvme/2018-October/020553.html.

Best regards,

Bart.

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2018-10-25 22:20 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2018-10-25 15:19 [PATCH 0/2] nvmet-rdma: Suppress a lockdep complaint Bart Van Assche
2018-10-25 15:19 ` [PATCH 1/2] nvmet-rdma: Avoid that nvmet_rdma_release_queue_work() triggers " Bart Van Assche
2018-10-25 19:43   ` Sagi Grimberg
2018-10-25 21:05   ` Johannes Berg
2018-10-25 22:20     ` Bart Van Assche
2018-10-25 15:19 ` [PATCH 2/2] nvmet-rdma: Use the system workqueues again for deletion work Bart Van Assche
2018-10-25 16:08 ` [PATCH 0/2] nvmet-rdma: Suppress a lockdep complaint Bart Van Assche

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox