* [PATCH 0/2] nvmet-rdma: Suppress a lockdep complaint
@ 2018-10-25 15:19 Bart Van Assche
2018-10-25 15:19 ` [PATCH 1/2] nvmet-rdma: Avoid that nvmet_rdma_release_queue_work() triggers " Bart Van Assche
` (2 more replies)
0 siblings, 3 replies; 7+ messages in thread
From: Bart Van Assche @ 2018-10-25 15:19 UTC (permalink / raw)
Hi Christoph,
This patch series addresses a lockdep complaint that has been
discussed recently on the NVMe mailing list. Please consider this
patch series for the upstream kernel.
Thanks,
Bart.
Bart Van Assche (2):
nvmet-rdma: Avoid that nvmet_rdma_release_queue_work() triggers a
lockdep complaint
nvmet-rdma: Use the system workqueues again for deletion work
drivers/nvme/target/rdma.c | 19 ++++---------------
1 file changed, 4 insertions(+), 15 deletions(-)
--
2.19.1.568.g152ad8e336-goog
^ permalink raw reply [flat|nested] 7+ messages in thread* [PATCH 1/2] nvmet-rdma: Avoid that nvmet_rdma_release_queue_work() triggers a lockdep complaint 2018-10-25 15:19 [PATCH 0/2] nvmet-rdma: Suppress a lockdep complaint Bart Van Assche @ 2018-10-25 15:19 ` Bart Van Assche 2018-10-25 19:43 ` Sagi Grimberg 2018-10-25 21:05 ` Johannes Berg 2018-10-25 15:19 ` [PATCH 2/2] nvmet-rdma: Use the system workqueues again for deletion work Bart Van Assche 2018-10-25 16:08 ` [PATCH 0/2] nvmet-rdma: Suppress a lockdep complaint Bart Van Assche 2 siblings, 2 replies; 7+ messages in thread From: Bart Van Assche @ 2018-10-25 15:19 UTC (permalink / raw) Call flush_work() instead of flush_workqueue(). That is sufficient to suppress the following lockdep complaint (see also commit 87915adc3f0a ("workqueue: re-add lockdep dependencies for flushing")): ====================================================== WARNING: possible circular locking dependency detected 4.19.0-dbg+ #1 Not tainted ------------------------------------------------------ kworker/u12:0/7 is trying to acquire lock: 00000000c03a91d1 (&id_priv->handler_mutex){+.+.}, at: rdma_destroy_id+0x6f/0x440 [rdma_cm] but task is already holding lock: (work_completion)(&queue->release_work)){+.+.}, at: process_one_work+0x3c9/0x9f0 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #3 ((work_completion)(&queue->release_work)){+.+.}: process_one_work+0x447/0x9f0 worker_thread+0x63/0x5a0 kthread+0x1cf/0x1f0 ret_from_fork+0x24/0x30 -> #2 ((wq_completion)"nvmet-rdma-delete-wq"){+.+.}: flush_workqueue+0xf3/0x970 nvmet_rdma_cm_handler+0x1320/0x170f [nvmet_rdma] cma_ib_req_handler+0x72f/0xf90 [rdma_cm] cm_process_work+0x2e/0x110 [ib_cm] cm_work_handler+0x431e/0x50ba [ib_cm] process_one_work+0x481/0x9f0 worker_thread+0x63/0x5a0 kthread+0x1cf/0x1f0 ret_from_fork+0x24/0x30 -> #1 (&id_priv->handler_mutex/1){+.+.}: __mutex_lock+0xfe/0xbe0 mutex_lock_nested+0x1b/0x20 cma_ib_req_handler+0x6aa/0xf90 [rdma_cm] cm_process_work+0x2e/0x110 [ib_cm] cm_work_handler+0x431e/0x50ba [ib_cm] process_one_work+0x481/0x9f0 worker_thread+0x63/0x5a0 kthread+0x1cf/0x1f0 ret_from_fork+0x24/0x30 -> #0 (&id_priv->handler_mutex){+.+.}: lock_acquire+0xc5/0x200 __mutex_lock+0xfe/0xbe0 mutex_lock_nested+0x1b/0x20 rdma_destroy_id+0x6f/0x440 [rdma_cm] nvmet_rdma_release_queue_work+0x8e/0x1b0 [nvmet_rdma] process_one_work+0x481/0x9f0 worker_thread+0x63/0x5a0 kthread+0x1cf/0x1f0 ret_from_fork+0x24/0x30 other info that might help us debug this: Chain exists of: &id_priv->handler_mutex --> (wq_completion)"nvmet-rdma-delete-wq" --> (work_completion)(&queue->release_work) Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock((work_completion)(&queue->release_work)); lock((wq_completion)"nvmet-rdma-delete-wq"); lock((work_completion)(&queue->release_work)); lock(&id_priv->handler_mutex); *** DEADLOCK *** 2 locks held by kworker/u12:0/7: #0: 00000000272134f2 ((wq_completion)"nvmet-rdma-delete-wq"){+.+.}, at: process_one_work+0x3c9/0x9f0 #1: 0000000090531fcd ((work_completion)(&queue->release_work)){+.+.}, at: process_one_work+0x3c9/0x9f0 stack backtrace: CPU: 1 PID: 7 Comm: kworker/u12:0 Not tainted 4.19.0-dbg+ #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014 Workqueue: nvmet-rdma-delete-wq nvmet_rdma_release_queue_work [nvmet_rdma] Call Trace: dump_stack+0x86/0xc5 print_circular_bug.isra.32+0x20a/0x218 __lock_acquire+0x1c68/0x1cf0 lock_acquire+0xc5/0x200 __mutex_lock+0xfe/0xbe0 mutex_lock_nested+0x1b/0x20 rdma_destroy_id+0x6f/0x440 [rdma_cm] nvmet_rdma_release_queue_work+0x8e/0x1b0 [nvmet_rdma] process_one_work+0x481/0x9f0 worker_thread+0x63/0x5a0 kthread+0x1cf/0x1f0 ret_from_fork+0x24/0x30 Cc: Sagi Grimberg <sagi at grimberg.me> Cc: Johannes Berg <johannes.berg at intel.com> Signed-off-by: Bart Van Assche <bvanassche at acm.org> --- drivers/nvme/target/rdma.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c index bd265aceb90c..b3ee64fd66c5 100644 --- a/drivers/nvme/target/rdma.c +++ b/drivers/nvme/target/rdma.c @@ -1268,7 +1268,7 @@ static int nvmet_rdma_queue_connect(struct rdma_cm_id *cm_id, if (queue->host_qid == 0) { /* Let inflight controller teardown complete */ - flush_workqueue(nvmet_rdma_delete_wq); + flush_work(&queue->release_work); } ret = nvmet_rdma_cm_accept(cm_id, queue, &event->param.conn); -- 2.19.1.568.g152ad8e336-goog ^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH 1/2] nvmet-rdma: Avoid that nvmet_rdma_release_queue_work() triggers a lockdep complaint 2018-10-25 15:19 ` [PATCH 1/2] nvmet-rdma: Avoid that nvmet_rdma_release_queue_work() triggers " Bart Van Assche @ 2018-10-25 19:43 ` Sagi Grimberg 2018-10-25 21:05 ` Johannes Berg 1 sibling, 0 replies; 7+ messages in thread From: Sagi Grimberg @ 2018-10-25 19:43 UTC (permalink / raw) > diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c > index bd265aceb90c..b3ee64fd66c5 100644 > --- a/drivers/nvme/target/rdma.c > +++ b/drivers/nvme/target/rdma.c > @@ -1268,7 +1268,7 @@ static int nvmet_rdma_queue_connect(struct rdma_cm_id *cm_id, > > if (queue->host_qid == 0) { > /* Let inflight controller teardown complete */ > - flush_workqueue(nvmet_rdma_delete_wq); > + flush_work(&queue->release_work); That is the opposite of what is attempted to be achieved here. The queue connect is trying to wait for other queue deletion to complete. ^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH 1/2] nvmet-rdma: Avoid that nvmet_rdma_release_queue_work() triggers a lockdep complaint 2018-10-25 15:19 ` [PATCH 1/2] nvmet-rdma: Avoid that nvmet_rdma_release_queue_work() triggers " Bart Van Assche 2018-10-25 19:43 ` Sagi Grimberg @ 2018-10-25 21:05 ` Johannes Berg 2018-10-25 22:20 ` Bart Van Assche 1 sibling, 1 reply; 7+ messages in thread From: Johannes Berg @ 2018-10-25 21:05 UTC (permalink / raw) Hi, Yes, you said to ignore this, but I'm looking at the lockdep report again anyway ... So ... if I understand this correctly, we have: cma's id_priv->handler_mutex - a listen_id's mutex cma's id_priv->handler_mutex/1 - a conn_id's mutex queue - an RDMA queue, belonging to ...? nvmet-rdma-delete-wq - the global deletion workqueue In cma_ib_req_handler(), we create a dependency from handler_mutex -> handler_mutex/1 since we lock both of them (here's the "real _nested()" part). While holding those, we call the event handler, which is nvmet_rdma_cm_handler. This, via nvmet_rdma_queue_connect, flushes the workqueue, creating that dependency. Obviously, queue->release_work has a dependency to the workqueue since it runs on there. Finally, the queue->release_work calls rdma_destroy_id() which locks the destroyed id_priv's mutex. Does that about seem right? I'm not sure I see a solution right now. If you were to replace the workqueue by a list of objects to be destroyed and then iterate it where you have the flush_workqueue() now (*), you'd end up with a very similar lockdep report I think, so it's not inherent in the lockdep workqueue annotations, it's inherent in locking some object(s) while holding the same class of lock for another object. This would be a pretty typical case of mutex_lock_nested(), but it's hard to do across the layers here, and pretty much impossible to do across the workqueue. I'd usually solve that sort of problem by flushing/iterating before taking all the locks, since it's a different object anyway, but given the two layers here that doesn't seem feasible either. (*) of course in addition to iterating it in some thread, or work struct, or something else asynchronous I think again you could suppress this by using flush_workqueue_nested() here, basically saying that while it's the same workqueue, it cannot actually connect back to itself locking-wise due to being guaranteed to have different objects on it... I think, but it's getting late here. johannes ^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH 1/2] nvmet-rdma: Avoid that nvmet_rdma_release_queue_work() triggers a lockdep complaint 2018-10-25 21:05 ` Johannes Berg @ 2018-10-25 22:20 ` Bart Van Assche 0 siblings, 0 replies; 7+ messages in thread From: Bart Van Assche @ 2018-10-25 22:20 UTC (permalink / raw) On Thu, 2018-10-25@23:05 +0200, Johannes Berg wrote: > Yes, you said to ignore this, but I'm looking at the lockdep report > again anyway ... > > So ... if I understand this correctly, we have: > > cma's id_priv->handler_mutex - a listen_id's mutex > cma's id_priv->handler_mutex/1 - a conn_id's mutex > queue - an RDMA queue, belonging to ...? > nvmet-rdma-delete-wq - the global deletion workqueue > > In cma_ib_req_handler(), we create a dependency from > handler_mutex -> handler_mutex/1 > since we lock both of them (here's the "real _nested()" part). > > While holding those, we call the event handler, which is > nvmet_rdma_cm_handler. This, via nvmet_rdma_queue_connect, flushes the > workqueue, creating that dependency. > > Obviously, queue->release_work has a dependency to the workqueue since > it runs on there. > > Finally, the queue->release_work calls rdma_destroy_id() which locks the > destroyed id_priv's mutex. > > Does that about seem right? Hi Johannes, I don't think it's that complicated. I think only the conn_id handler_mutex and the workqueue lockdep annotations are involved in the lockdep complaint. In drivers/infiniband/core/cma.c one can see that a new CM ID (conn_id) is created before the RDMA/CM handler callback occurs for event type RDMA_CM_EVENT_CONNECT_REQUEST. Otherwise I agree that a flush_workqueue() call occurs with the handler_mutex held and that the same handler_mutex is obtained while executing work queued on the same workqueue that gets flushed. > I'm not sure I see a solution right now. If you were to replace the > workqueue by a list of objects to be destroyed and then iterate it where > you have the flush_workqueue() now (*), you'd end up with a very similar > lockdep report I think, so it's not inherent in the lockdep workqueue > annotations, it's inherent in locking some object(s) while holding the > same class of lock for another object. This would be a pretty typical > case of mutex_lock_nested(), but it's hard to do across the layers here, > and pretty much impossible to do across the workqueue. > > I'd usually solve that sort of problem by flushing/iterating before > taking all the locks, since it's a different object anyway, but given > the two layers here that doesn't seem feasible either. > > (*) of course in addition to iterating it in some thread, or work > struct, or something else asynchronous > > I think again you could suppress this by using flush_workqueue_nested() > here, basically saying that while it's the same workqueue, it cannot > actually connect back to itself locking-wise due to being guaranteed to > have different objects on it... I think, but it's getting late here. Since I'm not sure accepting every connect request in case of a DoS attack is the best approach, my own preference is to address this by refusing new connections if too many CM IDs are being cleaned up. So I changed the approach of this patch series. In case you would like to have a look at v3 of this patch series, it is available at http://lists.infradead.org/pipermail/linux-nvme/2018-October/020553.html. Best regards, Bart. ^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH 2/2] nvmet-rdma: Use the system workqueues again for deletion work 2018-10-25 15:19 [PATCH 0/2] nvmet-rdma: Suppress a lockdep complaint Bart Van Assche 2018-10-25 15:19 ` [PATCH 1/2] nvmet-rdma: Avoid that nvmet_rdma_release_queue_work() triggers " Bart Van Assche @ 2018-10-25 15:19 ` Bart Van Assche 2018-10-25 16:08 ` [PATCH 0/2] nvmet-rdma: Suppress a lockdep complaint Bart Van Assche 2 siblings, 0 replies; 7+ messages in thread From: Bart Van Assche @ 2018-10-25 15:19 UTC (permalink / raw) Minimize system resource usage by switching back to the system workqueues for deletion work. This patch reverts commit 2acf70ade79d ("nvmet-rdma: use a private workqueue for delete"). Cc: Sagi Grimberg <sagi at grimberg.me> Cc: Johannes Berg <johannes.berg at intel.com> Signed-off-by: Bart Van Assche <bvanassche at acm.org> --- drivers/nvme/target/rdma.c | 17 +++-------------- 1 file changed, 3 insertions(+), 14 deletions(-) diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c index b3ee64fd66c5..65c95cb06218 100644 --- a/drivers/nvme/target/rdma.c +++ b/drivers/nvme/target/rdma.c @@ -122,7 +122,6 @@ struct nvmet_rdma_device { int inline_page_count; }; -static struct workqueue_struct *nvmet_rdma_delete_wq; static bool nvmet_rdma_use_srq; module_param_named(use_srq, nvmet_rdma_use_srq, bool, 0444); MODULE_PARM_DESC(use_srq, "Use shared receive queue."); @@ -1273,7 +1272,7 @@ static int nvmet_rdma_queue_connect(struct rdma_cm_id *cm_id, ret = nvmet_rdma_cm_accept(cm_id, queue, &event->param.conn); if (ret) { - queue_work(nvmet_rdma_delete_wq, &queue->release_work); + schedule_work(&queue->release_work); /* Destroying rdma_cm id is not needed here */ return 0; } @@ -1338,7 +1337,7 @@ static void __nvmet_rdma_queue_disconnect(struct nvmet_rdma_queue *queue) if (disconnect) { rdma_disconnect(queue->cm_id); - queue_work(nvmet_rdma_delete_wq, &queue->release_work); + schedule_work(&queue->release_work); } } @@ -1368,7 +1367,7 @@ static void nvmet_rdma_queue_connect_fail(struct rdma_cm_id *cm_id, mutex_unlock(&nvmet_rdma_queue_mutex); pr_err("failed to connect queue %d\n", queue->idx); - queue_work(nvmet_rdma_delete_wq, &queue->release_work); + schedule_work(&queue->release_work); } /** @@ -1650,17 +1649,8 @@ static int __init nvmet_rdma_init(void) if (ret) goto err_ib_client; - nvmet_rdma_delete_wq = alloc_workqueue("nvmet-rdma-delete-wq", - WQ_UNBOUND | WQ_MEM_RECLAIM | WQ_SYSFS, 0); - if (!nvmet_rdma_delete_wq) { - ret = -ENOMEM; - goto err_unreg_transport; - } - return 0; -err_unreg_transport: - nvmet_unregister_transport(&nvmet_rdma_ops); err_ib_client: ib_unregister_client(&nvmet_rdma_ib_client); return ret; @@ -1668,7 +1658,6 @@ static int __init nvmet_rdma_init(void) static void __exit nvmet_rdma_exit(void) { - destroy_workqueue(nvmet_rdma_delete_wq); nvmet_unregister_transport(&nvmet_rdma_ops); ib_unregister_client(&nvmet_rdma_ib_client); WARN_ON_ONCE(!list_empty(&nvmet_rdma_queue_list)); -- 2.19.1.568.g152ad8e336-goog ^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH 0/2] nvmet-rdma: Suppress a lockdep complaint 2018-10-25 15:19 [PATCH 0/2] nvmet-rdma: Suppress a lockdep complaint Bart Van Assche 2018-10-25 15:19 ` [PATCH 1/2] nvmet-rdma: Avoid that nvmet_rdma_release_queue_work() triggers " Bart Van Assche 2018-10-25 15:19 ` [PATCH 2/2] nvmet-rdma: Use the system workqueues again for deletion work Bart Van Assche @ 2018-10-25 16:08 ` Bart Van Assche 2 siblings, 0 replies; 7+ messages in thread From: Bart Van Assche @ 2018-10-25 16:08 UTC (permalink / raw) On Thu, 2018-10-25@08:19 -0700, Bart Van Assche wrote: > This patch series addresses a lockdep complaint that has been > discussed recently on the NVMe mailing list. Please consider this > patch series for the upstream kernel. Please ignore this patch series - the real reason the lockdep complaint did not appear anymore was an incorrect patch for the kernel workqueue implementation. Bart. ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2018-10-25 22:20 UTC | newest] Thread overview: 7+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2018-10-25 15:19 [PATCH 0/2] nvmet-rdma: Suppress a lockdep complaint Bart Van Assche 2018-10-25 15:19 ` [PATCH 1/2] nvmet-rdma: Avoid that nvmet_rdma_release_queue_work() triggers " Bart Van Assche 2018-10-25 19:43 ` Sagi Grimberg 2018-10-25 21:05 ` Johannes Berg 2018-10-25 22:20 ` Bart Van Assche 2018-10-25 15:19 ` [PATCH 2/2] nvmet-rdma: Use the system workqueues again for deletion work Bart Van Assche 2018-10-25 16:08 ` [PATCH 0/2] nvmet-rdma: Suppress a lockdep complaint Bart Van Assche
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox