* [PATCH 0/2] nvmet-rdma: Suppress a lockdep complaint
@ 2018-10-25 15:19 Bart Van Assche
2018-10-25 15:19 ` [PATCH 1/2] nvmet-rdma: Avoid that nvmet_rdma_release_queue_work() triggers " Bart Van Assche
` (2 more replies)
0 siblings, 3 replies; 7+ messages in thread
From: Bart Van Assche @ 2018-10-25 15:19 UTC (permalink / raw)
Hi Christoph,
This patch series addresses a lockdep complaint that has been
discussed recently on the NVMe mailing list. Please consider this
patch series for the upstream kernel.
Thanks,
Bart.
Bart Van Assche (2):
nvmet-rdma: Avoid that nvmet_rdma_release_queue_work() triggers a
lockdep complaint
nvmet-rdma: Use the system workqueues again for deletion work
drivers/nvme/target/rdma.c | 19 ++++---------------
1 file changed, 4 insertions(+), 15 deletions(-)
--
2.19.1.568.g152ad8e336-goog
^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH 1/2] nvmet-rdma: Avoid that nvmet_rdma_release_queue_work() triggers a lockdep complaint
2018-10-25 15:19 [PATCH 0/2] nvmet-rdma: Suppress a lockdep complaint Bart Van Assche
@ 2018-10-25 15:19 ` Bart Van Assche
2018-10-25 19:43 ` Sagi Grimberg
2018-10-25 21:05 ` Johannes Berg
2018-10-25 15:19 ` [PATCH 2/2] nvmet-rdma: Use the system workqueues again for deletion work Bart Van Assche
2018-10-25 16:08 ` [PATCH 0/2] nvmet-rdma: Suppress a lockdep complaint Bart Van Assche
2 siblings, 2 replies; 7+ messages in thread
From: Bart Van Assche @ 2018-10-25 15:19 UTC (permalink / raw)
Call flush_work() instead of flush_workqueue(). That is sufficient to
suppress the following lockdep complaint (see also commit 87915adc3f0a
("workqueue: re-add lockdep dependencies for flushing")):
======================================================
WARNING: possible circular locking dependency detected
4.19.0-dbg+ #1 Not tainted
------------------------------------------------------
kworker/u12:0/7 is trying to acquire lock:
00000000c03a91d1 (&id_priv->handler_mutex){+.+.}, at: rdma_destroy_id+0x6f/0x440 [rdma_cm]
but task is already holding lock:
(work_completion)(&queue->release_work)){+.+.}, at: process_one_work+0x3c9/0x9f0
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #3 ((work_completion)(&queue->release_work)){+.+.}:
process_one_work+0x447/0x9f0
worker_thread+0x63/0x5a0
kthread+0x1cf/0x1f0
ret_from_fork+0x24/0x30
-> #2 ((wq_completion)"nvmet-rdma-delete-wq"){+.+.}:
flush_workqueue+0xf3/0x970
nvmet_rdma_cm_handler+0x1320/0x170f [nvmet_rdma]
cma_ib_req_handler+0x72f/0xf90 [rdma_cm]
cm_process_work+0x2e/0x110 [ib_cm]
cm_work_handler+0x431e/0x50ba [ib_cm]
process_one_work+0x481/0x9f0
worker_thread+0x63/0x5a0
kthread+0x1cf/0x1f0
ret_from_fork+0x24/0x30
-> #1 (&id_priv->handler_mutex/1){+.+.}:
__mutex_lock+0xfe/0xbe0
mutex_lock_nested+0x1b/0x20
cma_ib_req_handler+0x6aa/0xf90 [rdma_cm]
cm_process_work+0x2e/0x110 [ib_cm]
cm_work_handler+0x431e/0x50ba [ib_cm]
process_one_work+0x481/0x9f0
worker_thread+0x63/0x5a0
kthread+0x1cf/0x1f0
ret_from_fork+0x24/0x30
-> #0 (&id_priv->handler_mutex){+.+.}:
lock_acquire+0xc5/0x200
__mutex_lock+0xfe/0xbe0
mutex_lock_nested+0x1b/0x20
rdma_destroy_id+0x6f/0x440 [rdma_cm]
nvmet_rdma_release_queue_work+0x8e/0x1b0 [nvmet_rdma]
process_one_work+0x481/0x9f0
worker_thread+0x63/0x5a0
kthread+0x1cf/0x1f0
ret_from_fork+0x24/0x30
other info that might help us debug this:
Chain exists of:
&id_priv->handler_mutex --> (wq_completion)"nvmet-rdma-delete-wq" --> (work_completion)(&queue->release_work)
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock((work_completion)(&queue->release_work));
lock((wq_completion)"nvmet-rdma-delete-wq");
lock((work_completion)(&queue->release_work));
lock(&id_priv->handler_mutex);
*** DEADLOCK ***
2 locks held by kworker/u12:0/7:
#0: 00000000272134f2 ((wq_completion)"nvmet-rdma-delete-wq"){+.+.}, at: process_one_work+0x3c9/0x9f0
#1: 0000000090531fcd ((work_completion)(&queue->release_work)){+.+.}, at: process_one_work+0x3c9/0x9f0
stack backtrace:
CPU: 1 PID: 7 Comm: kworker/u12:0 Not tainted 4.19.0-dbg+ #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
Workqueue: nvmet-rdma-delete-wq nvmet_rdma_release_queue_work [nvmet_rdma]
Call Trace:
dump_stack+0x86/0xc5
print_circular_bug.isra.32+0x20a/0x218
__lock_acquire+0x1c68/0x1cf0
lock_acquire+0xc5/0x200
__mutex_lock+0xfe/0xbe0
mutex_lock_nested+0x1b/0x20
rdma_destroy_id+0x6f/0x440 [rdma_cm]
nvmet_rdma_release_queue_work+0x8e/0x1b0 [nvmet_rdma]
process_one_work+0x481/0x9f0
worker_thread+0x63/0x5a0
kthread+0x1cf/0x1f0
ret_from_fork+0x24/0x30
Cc: Sagi Grimberg <sagi at grimberg.me>
Cc: Johannes Berg <johannes.berg at intel.com>
Signed-off-by: Bart Van Assche <bvanassche at acm.org>
---
drivers/nvme/target/rdma.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c
index bd265aceb90c..b3ee64fd66c5 100644
--- a/drivers/nvme/target/rdma.c
+++ b/drivers/nvme/target/rdma.c
@@ -1268,7 +1268,7 @@ static int nvmet_rdma_queue_connect(struct rdma_cm_id *cm_id,
if (queue->host_qid == 0) {
/* Let inflight controller teardown complete */
- flush_workqueue(nvmet_rdma_delete_wq);
+ flush_work(&queue->release_work);
}
ret = nvmet_rdma_cm_accept(cm_id, queue, &event->param.conn);
--
2.19.1.568.g152ad8e336-goog
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH 2/2] nvmet-rdma: Use the system workqueues again for deletion work
2018-10-25 15:19 [PATCH 0/2] nvmet-rdma: Suppress a lockdep complaint Bart Van Assche
2018-10-25 15:19 ` [PATCH 1/2] nvmet-rdma: Avoid that nvmet_rdma_release_queue_work() triggers " Bart Van Assche
@ 2018-10-25 15:19 ` Bart Van Assche
2018-10-25 16:08 ` [PATCH 0/2] nvmet-rdma: Suppress a lockdep complaint Bart Van Assche
2 siblings, 0 replies; 7+ messages in thread
From: Bart Van Assche @ 2018-10-25 15:19 UTC (permalink / raw)
Minimize system resource usage by switching back to the system
workqueues for deletion work. This patch reverts commit 2acf70ade79d
("nvmet-rdma: use a private workqueue for delete").
Cc: Sagi Grimberg <sagi at grimberg.me>
Cc: Johannes Berg <johannes.berg at intel.com>
Signed-off-by: Bart Van Assche <bvanassche at acm.org>
---
drivers/nvme/target/rdma.c | 17 +++--------------
1 file changed, 3 insertions(+), 14 deletions(-)
diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c
index b3ee64fd66c5..65c95cb06218 100644
--- a/drivers/nvme/target/rdma.c
+++ b/drivers/nvme/target/rdma.c
@@ -122,7 +122,6 @@ struct nvmet_rdma_device {
int inline_page_count;
};
-static struct workqueue_struct *nvmet_rdma_delete_wq;
static bool nvmet_rdma_use_srq;
module_param_named(use_srq, nvmet_rdma_use_srq, bool, 0444);
MODULE_PARM_DESC(use_srq, "Use shared receive queue.");
@@ -1273,7 +1272,7 @@ static int nvmet_rdma_queue_connect(struct rdma_cm_id *cm_id,
ret = nvmet_rdma_cm_accept(cm_id, queue, &event->param.conn);
if (ret) {
- queue_work(nvmet_rdma_delete_wq, &queue->release_work);
+ schedule_work(&queue->release_work);
/* Destroying rdma_cm id is not needed here */
return 0;
}
@@ -1338,7 +1337,7 @@ static void __nvmet_rdma_queue_disconnect(struct nvmet_rdma_queue *queue)
if (disconnect) {
rdma_disconnect(queue->cm_id);
- queue_work(nvmet_rdma_delete_wq, &queue->release_work);
+ schedule_work(&queue->release_work);
}
}
@@ -1368,7 +1367,7 @@ static void nvmet_rdma_queue_connect_fail(struct rdma_cm_id *cm_id,
mutex_unlock(&nvmet_rdma_queue_mutex);
pr_err("failed to connect queue %d\n", queue->idx);
- queue_work(nvmet_rdma_delete_wq, &queue->release_work);
+ schedule_work(&queue->release_work);
}
/**
@@ -1650,17 +1649,8 @@ static int __init nvmet_rdma_init(void)
if (ret)
goto err_ib_client;
- nvmet_rdma_delete_wq = alloc_workqueue("nvmet-rdma-delete-wq",
- WQ_UNBOUND | WQ_MEM_RECLAIM | WQ_SYSFS, 0);
- if (!nvmet_rdma_delete_wq) {
- ret = -ENOMEM;
- goto err_unreg_transport;
- }
-
return 0;
-err_unreg_transport:
- nvmet_unregister_transport(&nvmet_rdma_ops);
err_ib_client:
ib_unregister_client(&nvmet_rdma_ib_client);
return ret;
@@ -1668,7 +1658,6 @@ static int __init nvmet_rdma_init(void)
static void __exit nvmet_rdma_exit(void)
{
- destroy_workqueue(nvmet_rdma_delete_wq);
nvmet_unregister_transport(&nvmet_rdma_ops);
ib_unregister_client(&nvmet_rdma_ib_client);
WARN_ON_ONCE(!list_empty(&nvmet_rdma_queue_list));
--
2.19.1.568.g152ad8e336-goog
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH 0/2] nvmet-rdma: Suppress a lockdep complaint
2018-10-25 15:19 [PATCH 0/2] nvmet-rdma: Suppress a lockdep complaint Bart Van Assche
2018-10-25 15:19 ` [PATCH 1/2] nvmet-rdma: Avoid that nvmet_rdma_release_queue_work() triggers " Bart Van Assche
2018-10-25 15:19 ` [PATCH 2/2] nvmet-rdma: Use the system workqueues again for deletion work Bart Van Assche
@ 2018-10-25 16:08 ` Bart Van Assche
2 siblings, 0 replies; 7+ messages in thread
From: Bart Van Assche @ 2018-10-25 16:08 UTC (permalink / raw)
On Thu, 2018-10-25@08:19 -0700, Bart Van Assche wrote:
> This patch series addresses a lockdep complaint that has been
> discussed recently on the NVMe mailing list. Please consider this
> patch series for the upstream kernel.
Please ignore this patch series - the real reason the lockdep complaint
did not appear anymore was an incorrect patch for the kernel workqueue
implementation.
Bart.
^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH 1/2] nvmet-rdma: Avoid that nvmet_rdma_release_queue_work() triggers a lockdep complaint
2018-10-25 15:19 ` [PATCH 1/2] nvmet-rdma: Avoid that nvmet_rdma_release_queue_work() triggers " Bart Van Assche
@ 2018-10-25 19:43 ` Sagi Grimberg
2018-10-25 21:05 ` Johannes Berg
1 sibling, 0 replies; 7+ messages in thread
From: Sagi Grimberg @ 2018-10-25 19:43 UTC (permalink / raw)
> diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c
> index bd265aceb90c..b3ee64fd66c5 100644
> --- a/drivers/nvme/target/rdma.c
> +++ b/drivers/nvme/target/rdma.c
> @@ -1268,7 +1268,7 @@ static int nvmet_rdma_queue_connect(struct rdma_cm_id *cm_id,
>
> if (queue->host_qid == 0) {
> /* Let inflight controller teardown complete */
> - flush_workqueue(nvmet_rdma_delete_wq);
> + flush_work(&queue->release_work);
That is the opposite of what is attempted to be achieved here. The queue
connect is trying to wait for other queue deletion to complete.
^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH 1/2] nvmet-rdma: Avoid that nvmet_rdma_release_queue_work() triggers a lockdep complaint
2018-10-25 15:19 ` [PATCH 1/2] nvmet-rdma: Avoid that nvmet_rdma_release_queue_work() triggers " Bart Van Assche
2018-10-25 19:43 ` Sagi Grimberg
@ 2018-10-25 21:05 ` Johannes Berg
2018-10-25 22:20 ` Bart Van Assche
1 sibling, 1 reply; 7+ messages in thread
From: Johannes Berg @ 2018-10-25 21:05 UTC (permalink / raw)
Hi,
Yes, you said to ignore this, but I'm looking at the lockdep report
again anyway ...
So ... if I understand this correctly, we have:
cma's id_priv->handler_mutex - a listen_id's mutex
cma's id_priv->handler_mutex/1 - a conn_id's mutex
queue - an RDMA queue, belonging to ...?
nvmet-rdma-delete-wq - the global deletion workqueue
In cma_ib_req_handler(), we create a dependency from
handler_mutex -> handler_mutex/1
since we lock both of them (here's the "real _nested()" part).
While holding those, we call the event handler, which is
nvmet_rdma_cm_handler. This, via nvmet_rdma_queue_connect, flushes the
workqueue, creating that dependency.
Obviously, queue->release_work has a dependency to the workqueue since
it runs on there.
Finally, the queue->release_work calls rdma_destroy_id() which locks the
destroyed id_priv's mutex.
Does that about seem right?
I'm not sure I see a solution right now. If you were to replace the
workqueue by a list of objects to be destroyed and then iterate it where
you have the flush_workqueue() now (*), you'd end up with a very similar
lockdep report I think, so it's not inherent in the lockdep workqueue
annotations, it's inherent in locking some object(s) while holding the
same class of lock for another object. This would be a pretty typical
case of mutex_lock_nested(), but it's hard to do across the layers here,
and pretty much impossible to do across the workqueue.
I'd usually solve that sort of problem by flushing/iterating before
taking all the locks, since it's a different object anyway, but given
the two layers here that doesn't seem feasible either.
(*) of course in addition to iterating it in some thread, or work
struct, or something else asynchronous
I think again you could suppress this by using flush_workqueue_nested()
here, basically saying that while it's the same workqueue, it cannot
actually connect back to itself locking-wise due to being guaranteed to
have different objects on it... I think, but it's getting late here.
johannes
^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH 1/2] nvmet-rdma: Avoid that nvmet_rdma_release_queue_work() triggers a lockdep complaint
2018-10-25 21:05 ` Johannes Berg
@ 2018-10-25 22:20 ` Bart Van Assche
0 siblings, 0 replies; 7+ messages in thread
From: Bart Van Assche @ 2018-10-25 22:20 UTC (permalink / raw)
On Thu, 2018-10-25@23:05 +0200, Johannes Berg wrote:
> Yes, you said to ignore this, but I'm looking at the lockdep report
> again anyway ...
>
> So ... if I understand this correctly, we have:
>
> cma's id_priv->handler_mutex - a listen_id's mutex
> cma's id_priv->handler_mutex/1 - a conn_id's mutex
> queue - an RDMA queue, belonging to ...?
> nvmet-rdma-delete-wq - the global deletion workqueue
>
> In cma_ib_req_handler(), we create a dependency from
> handler_mutex -> handler_mutex/1
> since we lock both of them (here's the "real _nested()" part).
>
> While holding those, we call the event handler, which is
> nvmet_rdma_cm_handler. This, via nvmet_rdma_queue_connect, flushes the
> workqueue, creating that dependency.
>
> Obviously, queue->release_work has a dependency to the workqueue since
> it runs on there.
>
> Finally, the queue->release_work calls rdma_destroy_id() which locks the
> destroyed id_priv's mutex.
>
> Does that about seem right?
Hi Johannes,
I don't think it's that complicated. I think only the conn_id handler_mutex
and the workqueue lockdep annotations are involved in the lockdep complaint.
In drivers/infiniband/core/cma.c one can see that a new CM ID (conn_id) is
created before the RDMA/CM handler callback occurs for event type
RDMA_CM_EVENT_CONNECT_REQUEST. Otherwise I agree that a flush_workqueue()
call occurs with the handler_mutex held and that the same handler_mutex is
obtained while executing work queued on the same workqueue that gets flushed.
> I'm not sure I see a solution right now. If you were to replace the
> workqueue by a list of objects to be destroyed and then iterate it where
> you have the flush_workqueue() now (*), you'd end up with a very similar
> lockdep report I think, so it's not inherent in the lockdep workqueue
> annotations, it's inherent in locking some object(s) while holding the
> same class of lock for another object. This would be a pretty typical
> case of mutex_lock_nested(), but it's hard to do across the layers here,
> and pretty much impossible to do across the workqueue.
>
> I'd usually solve that sort of problem by flushing/iterating before
> taking all the locks, since it's a different object anyway, but given
> the two layers here that doesn't seem feasible either.
>
> (*) of course in addition to iterating it in some thread, or work
> struct, or something else asynchronous
>
> I think again you could suppress this by using flush_workqueue_nested()
> here, basically saying that while it's the same workqueue, it cannot
> actually connect back to itself locking-wise due to being guaranteed to
> have different objects on it... I think, but it's getting late here.
Since I'm not sure accepting every connect request in case of a DoS attack
is the best approach, my own preference is to address this by refusing new
connections if too many CM IDs are being cleaned up. So I changed the
approach of this patch series. In case you would like to have a look at v3
of this patch series, it is available at
http://lists.infradead.org/pipermail/linux-nvme/2018-October/020553.html.
Best regards,
Bart.
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2018-10-25 22:20 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2018-10-25 15:19 [PATCH 0/2] nvmet-rdma: Suppress a lockdep complaint Bart Van Assche
2018-10-25 15:19 ` [PATCH 1/2] nvmet-rdma: Avoid that nvmet_rdma_release_queue_work() triggers " Bart Van Assche
2018-10-25 19:43 ` Sagi Grimberg
2018-10-25 21:05 ` Johannes Berg
2018-10-25 22:20 ` Bart Van Assche
2018-10-25 15:19 ` [PATCH 2/2] nvmet-rdma: Use the system workqueues again for deletion work Bart Van Assche
2018-10-25 16:08 ` [PATCH 0/2] nvmet-rdma: Suppress a lockdep complaint Bart Van Assche
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox