[PATCH RFC] nvmet-rdma: use a private workqueue for delete

From: bvanassche@acm.org (Bart Van Assche)
Subject: [PATCH RFC] nvmet-rdma: use a private workqueue for delete
Date: Fri, 19 Oct 2018 09:23:32 -0700
Message-ID: <1539966212.81977.53.camel@acm.org>
In-Reply-To: <9716592b-6175-600d-c1a1-593cd3145b39@grimberg.me>

On Thu, 2018-10-18 at 18:08 -0700, Sagi Grimberg wrote:
> > It seems like this has not yet been fixed entirely. This is what appeared
> > in the kernel log this morning on my test setup with Christoph's nvme-4.20
> > branch (commit cb4bfda62afa ("nvme-pci: fix hot removal during error
> > handling")):
> 
> There is something I'm missing here, the id_priv->handler_mutex that the
> connect context is running on is guaranteed to be different than the
> one being removed (a different cm_id) and also the workqueues are
> different.
> 
> Is it not allowed to flush workqueue A from a work that is hosted on
> workqueue B?

Hi Tejun and Johannes,

It seems like we ran into a lockdep complaint triggered by a recently queued
patch (87915adc3f0a ("workqueue: re-add lockdep dependencies for flushing")).
However, it's not clear to us whether anything is wrong with the code the
complaint refers to. Can either of you have a look? I have attached the lockdep
complaint to this e-mail.

Thanks,

Bart.
-------------- next part --------------
======================================================
WARNING: possible circular locking dependency detected
4.19.0-rc6-dbg+ #1 Not tainted
------------------------------------------------------
kworker/u16:7/169 is trying to acquire lock:
00000000578ccf82 (&id_priv->handler_mutex){+.+.}, at: rdma_destroy_id+0x6f/0x440 [rdma_cm]

but task is already holding lock:
000000005d67271b ((work_completion)(&queue->release_work)){+.+.}, at: process_one_work+0x3ed/0xa20

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #3 ((work_completion)(&queue->release_work)){+.+.}:
       process_one_work+0x474/0xa20
       worker_thread+0x63/0x5a0
       kthread+0x1cf/0x1f0
       ret_from_fork+0x24/0x30

-> #2 ((wq_completion)"nvmet-rdma-delete-wq"){+.+.}:
       flush_workqueue+0xf3/0x970
       nvmet_rdma_cm_handler+0x1319/0x170f [nvmet_rdma]
       cma_ib_req_handler+0x72f/0xf90 [rdma_cm]
       cm_process_work+0x2e/0x110 [ib_cm]
       cm_req_handler+0x135b/0x1c30 [ib_cm]
       cm_work_handler+0x2b7/0x38cd [ib_cm]
       process_one_work+0x4ae/0xa20
       worker_thread+0x63/0x5a0
       kthread+0x1cf/0x1f0
       ret_from_fork+0x24/0x30

-> #1 (&id_priv->handler_mutex/1){+.+.}:
       __mutex_lock+0xfe/0xbe0
       mutex_lock_nested+0x1b/0x20
       cma_ib_req_handler+0x6aa/0xf90 [rdma_cm]
       cm_process_work+0x2e/0x110 [ib_cm]
       cm_req_handler+0x135b/0x1c30 [ib_cm]
       cm_work_handler+0x2b7/0x38cd [ib_cm]
       process_one_work+0x4ae/0xa20
       worker_thread+0x63/0x5a0
       kthread+0x1cf/0x1f0
       ret_from_fork+0x24/0x30

-> #0 (&id_priv->handler_mutex){+.+.}:
       lock_acquire+0xd2/0x210
       __mutex_lock+0xfe/0xbe0
       mutex_lock_nested+0x1b/0x20
       rdma_destroy_id+0x6f/0x440 [rdma_cm]
       nvmet_rdma_release_queue_work+0x8e/0x1b0 [nvmet_rdma]
       process_one_work+0x4ae/0xa20
       worker_thread+0x63/0x5a0
       kthread+0x1cf/0x1f0
       ret_from_fork+0x24/0x30

other info that might help us debug this:

Chain exists of:
  &id_priv->handler_mutex --> (wq_completion)"nvmet-rdma-delete-wq" --> (work_completion)(&queue->release_work)

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock((work_completion)(&queue->release_work));
                               lock((wq_completion)"nvmet-rdma-delete-wq");
                               lock((work_completion)(&queue->release_work));
  lock(&id_priv->handler_mutex);

 *** DEADLOCK ***

2 locks held by kworker/u16:7/169:
 #0: 00000000a32d4be9 ((wq_completion)"nvmet-rdma-delete-wq"){+.+.}, at: process_one_work+0x3ed/0xa20
 #1: 000000005d67271b ((work_completion)(&queue->release_work)){+.+.}, at: process_one_work+0x3ed/0xa20

stack backtrace:
CPU: 1 PID: 169 Comm: kworker/u16:7 Not tainted 4.19.0-rc6-dbg+ #1
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
Workqueue: nvmet-rdma-delete-wq nvmet_rdma_release_queue_work [nvmet_rdma]
Call Trace:
 dump_stack+0xa4/0xf5
 print_circular_bug.isra.32+0x20a/0x218
 __lock_acquire+0x1a5e/0x1b20
 lock_acquire+0xd2/0x210
 __mutex_lock+0xfe/0xbe0
 mutex_lock_nested+0x1b/0x20
 rdma_destroy_id+0x6f/0x440 [rdma_cm]
 nvmet_rdma_release_queue_work+0x8e/0x1b0 [nvmet_rdma]
 process_one_work+0x4ae/0xa20
 worker_thread+0x63/0x5a0
 kthread+0x1cf/0x1f0
 ret_from_fork+0x24/0x30
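
To make the reported chain easier to follow, here is a minimal sketch of how
the four entries in the report combine into a cycle. It is a toy model and not
lockdep's actual implementation: the LockClassGraph class and its method names
are hypothetical, and the only inputs are the class names and the
held/acquired pairs that can be read off the traces above. The model keys
everything by lock class rather than by lock instance, so the handler_mutex of
two different cm_ids maps to the same node. That may be why the report appears
even though the two mutexes involved are different objects.

```python
from collections import deque

# Lock class names copied from the report above.
MUTEX = "&id_priv->handler_mutex"
MUTEX_1 = "&id_priv->handler_mutex/1"
WQ = '(wq_completion)"nvmet-rdma-delete-wq"'
WORK = "(work_completion)(&queue->release_work)"


class LockClassGraph:
    """Toy lock-class dependency graph (hypothetical, not lockdep's data structures)."""

    def __init__(self):
        # held class -> set of classes acquired while holding it
        self.edges = {}

    def find_path(self, src, dst):
        """Breadth-first search; return the classes from src to dst, or None."""
        parents = {src: None}
        queue = deque([src])
        while queue:
            node = queue.popleft()
            if node == dst:
                path = []
                while node is not None:
                    path.append(node)
                    node = parents[node]
                return path[::-1]
            for nxt in sorted(self.edges.get(node, ())):
                if nxt not in parents:
                    parents[nxt] = node
                    queue.append(nxt)
        return None

    def acquire(self, held, new):
        """Record 'new was acquired while held was held'.

        If new can already reach held, the new edge would close a cycle:
        return that existing chain and do not add the edge. Otherwise add
        the edge and return None.
        """
        chain = self.find_path(new, held)
        if chain is None:
            self.edges.setdefault(held, set()).add(new)
        return chain


graph = LockClassGraph()
earlier = [
    graph.acquire(MUTEX, MUTEX_1),  # 1: cma_ib_req_handler nests the mutexes
    graph.acquire(MUTEX, WQ),       # 2: flush_workqueue from the cm handler
    graph.acquire(WQ, WORK),        # 3: process_one_work runs release_work
]
# 0: release_work calls rdma_destroy_id, which takes handler_mutex.
chain = graph.acquire(WORK, MUTEX)
print(" --> ".join(chain))
```

The final acquire() returns the same three-class chain that the report lists
under "Chain exists of". Nothing in this model distinguishes the listening
cm_id from the one being destroyed, which is the distinction Sagi's question
relies on.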
