All of lore.kernel.org
 help / color / mirror / Atom feed
From: Guoqing Jiang <guoqing.jiang@linux.dev>
To: Sagi Grimberg <sagi@grimberg.me>, hch@lst.de, kch@nvidia.com
Cc: linux-nvme@lists.infradead.org
Subject: Re: [PATCH RFC] nvmet-tcp: add new workqueue to surpress lockdep warning
Date: Wed, 13 Sep 2023 09:51:56 +0800	[thread overview]
Message-ID: <6de8bea3-dafa-3173-a8ec-6f69707ec237@linux.dev> (raw)
In-Reply-To: <b736e71b-5d4f-f43e-289d-fdcbffc4ce83@grimberg.me>



On 9/12/23 20:24, Sagi Grimberg wrote:
>
>
> On 7/26/23 17:29, Guoqing Jiang wrote:
>> During the test of nvme-tcp, lockdep complains when discover local
>> nvme tcp device.
>>
>> [   87.699136] ======================================================
>> [   87.699137] WARNING: possible circular locking dependency detected
>> [   87.699138] 6.5.0-rc3+ #16 Tainted: G            E
>> [   87.699139] ------------------------------------------------------
>> [   87.699140] kworker/0:4H/1522 is trying to acquire lock:
>> [   87.699141] ffff93c4df45f538 
>> ((wq_completion)nvmet-wq){+.+.}-{0:0}, at: __flush_workqueue+0x99/0x4f0
>> [   87.699147]
>>                 but task is already holding lock:
>> [   87.699148] ffffafb40272fe40 
>> ((work_completion)(&queue->io_work)){+.+.}-{0:0}, at: 
>> process_one_work+0x236/0x590
>> [   87.699151]
>>                 which lock already depends on the new lock.
>> [   87.699152]
>>                 the existing dependency chain (in reverse order) is:
>> [   87.699153]
>>                 -> #2 ((work_completion)(&queue->io_work)){+.+.}-{0:0}:
>> [   87.699155]        __flush_work+0x7a/0x5c0
>> [   87.699157]        __cancel_work_timer+0x155/0x1e0
>> [   87.699158]        cancel_work_sync+0x10/0x20
>> [   87.699160]        nvmet_tcp_release_queue_work+0xcf/0x490 
>> [nvmet_tcp]
>> [   87.699163]        process_one_work+0x2bd/0x590
>> [   87.699165]        worker_thread+0x52/0x3f0
>> [   87.699166]        kthread+0x109/0x140
>> [   87.699168]        ret_from_fork+0x46/0x70
>> [   87.699170]        ret_from_fork_asm+0x1b/0x30
>> [   87.699172]
>>                 -> #1 
>> ((work_completion)(&queue->release_work)){+.+.}-{0:0}:
>> [   87.699174]        process_one_work+0x28c/0x590
>> [   87.699175]        worker_thread+0x52/0x3f0
>> [   87.699177]        kthread+0x109/0x140
>> [   87.699177]        ret_from_fork+0x46/0x70
>> [   87.699179]        ret_from_fork_asm+0x1b/0x30
>> [   87.699180]
>>                 -> #0 ((wq_completion)nvmet-wq){+.+.}-{0:0}:
>> [   87.699182]        __lock_acquire+0x1523/0x2590
>> [   87.699184]        lock_acquire+0xd6/0x2f0
>> [   87.699185]        __flush_workqueue+0xc5/0x4f0
>> [   87.699187]        nvmet_tcp_install_queue+0x30/0x160 [nvmet_tcp]
>> [   87.699189]        nvmet_install_queue+0xbf/0x200 [nvmet]
>> [   87.699196]        nvmet_execute_admin_connect+0x18b/0x2f0 [nvmet]
>> [   87.699200]        nvmet_tcp_io_work+0x7e3/0x850 [nvmet_tcp]
>> [   87.699203]        process_one_work+0x2bd/0x590
>> [   87.699204]        worker_thread+0x52/0x3f0
>> [   87.699206]        kthread+0x109/0x140
>> [   87.699207]        ret_from_fork+0x46/0x70
>> [   87.699208]        ret_from_fork_asm+0x1b/0x30
>> [   87.699209]
>>                 other info that might help us debug this:
>> [   87.699210] Chain exists of:
>>                   (wq_completion)nvmet-wq --> 
>> (work_completion)(&queue->release_work) --> 
>> (work_completion)(&queue->io_work)
>> [   87.699212]  Possible unsafe locking scenario:
>> [   87.699213]        CPU0                    CPU1
>> [   87.699214]        ----                    ----
>> [   87.699214] lock((work_completion)(&queue->io_work));
>> [   87.699215] lock((work_completion)(&queue->release_work));
>> [   87.699217] lock((work_completion)(&queue->io_work));
>> [   87.699218]   lock((wq_completion)nvmet-wq);
>>                          -> need to hold release_work since 
>> queue_work(nvmet_wq, &queue->release_work)
>> [   87.699219]
>>                  *** DEADLOCK ***
>> [   87.699220] 2 locks held by kworker/0:4H/1522:
>> [   87.699221]  #0: ffff93c4df45f338 
>> ((wq_completion)nvmet_tcp_wq){+.+.}-{0:0}, at: 
>> process_one_work+0x236/0x590
>> [   87.699224]  #1: ffffafb40272fe40 
>> ((work_completion)(&queue->io_work)){+.+.}-{0:0}, at: 
>> process_one_work+0x236/0x590
>> [   87.699227]
>>                 stack backtrace:
>> [   87.699229] CPU: 0 PID: 1522 Comm: kworker/0:4H Tainted: 
>> G            E      6.5.0-rc3+ #16
>> [   87.699230] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), 
>> BIOS rel-1.16.0-0-gd239552c-rebuilt.opensuse.org 04/01/2014
>> [   87.699231] Workqueue: nvmet_tcp_wq nvmet_tcp_io_work [nvmet_tcp]
>>
>> The above happens because nvmet_tcp_io_work can trigger below path
>>
>>     -> nvmet_tcp_try_recv
>>      -> nvmet_tcp_try_recv_one
>>       -> nvmet_tcp_try_recv_data
>>        -> nvmet_tcp_execute_request
>>         -> cmd->req.execute = nvmet_execute_admin_connect
>>          -> nvmet_install_queue
>>           -> ctrl->ops->install_queue = nvmet_install_queue

The above should be nvmet_tcp_install_queue instead of nvmet_install_queue.

>>            -> nvmet_tcp_install_queue
>>             -> flush_workqueue(nvmet_wq)
>>
>> And release_work (nvmet_tcp_release_queue_work) is also queued in
>> nvmet_wq, which need to flush io_work (nvmet_tcp_io_work) due to
>> cancel_work_sync(&queue->io_work).
>
> I'm not sure I understand the resolution here. io_work does not
> run on nvmet_wq, but on nvmet_tcp_wq. 

Yes, io_work is run on nvmet_tcp_wq, and the work may trigger
flush_workqueue(nvmet_wq)

io_work = nvmet_tcp_io_work
     -> nvmet_tcp_try_recv
      -> nvmet_tcp_try_recv_one
       -> nvmet_tcp_try_recv_data
        -> nvmet_tcp_execute_request
         -> cmd->req.execute = nvmet_execute_admin_connect
          -> nvmet_install_queue
           -> ctrl->ops->install_queue = nvmet_tcp_install_queue
            -> nvmet_tcp_install_queue
             -> flush_workqueue(nvmet_wq)

Also release_work = nvmet_tcp_release_queue_work need to
call cancel_work_sync(&queue->io_work), but release_work is
queued in nvmet_wq. I think this kind of mutual dependency
scenario is complained by lockdep.

> What does separating another workqueue give here?
>
>> We can surpress the lockdep warning by checking if the relevant work
>> is pending. So the simplest might be just add the checking before
>> flush_workqueue(nvmet_wq). However, there are other works are also
>> queued on the same queue, I am not sure if we should flush other
>> works unconditionally, so a new dedicated workqueue is added.

Please see above.

>> Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev>
>> ---
>>   drivers/nvme/target/tcp.c | 14 +++++++++++++-
>>   1 file changed, 13 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
>> index 868aa4de2e4c..ac611cb299a8 100644
>> --- a/drivers/nvme/target/tcp.c
>> +++ b/drivers/nvme/target/tcp.c
>> @@ -189,6 +189,7 @@ static LIST_HEAD(nvmet_tcp_queue_list);
>>   static DEFINE_MUTEX(nvmet_tcp_queue_mutex);
>>     static struct workqueue_struct *nvmet_tcp_wq;
>> +static struct workqueue_struct *nvmet_tcp_release_wq;
>>   static const struct nvmet_fabrics_ops nvmet_tcp_ops;
>>   static void nvmet_tcp_free_cmd(struct nvmet_tcp_cmd *c);
>>   static void nvmet_tcp_free_cmd_buffers(struct nvmet_tcp_cmd *cmd);
>> @@ -1288,7 +1289,7 @@ static void 
>> nvmet_tcp_schedule_release_queue(struct nvmet_tcp_queue *queue)
>>       spin_lock(&queue->state_lock);
>>       if (queue->state != NVMET_TCP_Q_DISCONNECTING) {
>>           queue->state = NVMET_TCP_Q_DISCONNECTING;
>> -        queue_work(nvmet_wq, &queue->release_work);
>> +        queue_work(nvmet_tcp_release_wq, &queue->release_work);
>>       }
>>       spin_unlock(&queue->state_lock);
>>   }
>> @@ -1847,6 +1848,8 @@ static u16 nvmet_tcp_install_queue(struct 
>> nvmet_sq *sq)
>>       if (sq->qid == 0) {
>>           /* Let inflight controller teardown complete */
>>           flush_workqueue(nvmet_wq);
>> +        if (work_pending(&queue->release_work))
>> +            flush_workqueue(nvmet_tcp_release_wq);
>
> This is effectively just never flushes anything. when we install
> the queue it's own release_work never really runs. So what your
> patch effectively does is just to remove the flush altogether.

IMHO work_pending would check whether there is a pending work
item before flush relevant workqueue.

BTW, Hannes's patch can fix this as well, which might be better.

https://lore.kernel.org/linux-nvme/20230810132006.129365-1-hare@suse.de/

Thanks,
Guoqing


      reply	other threads:[~2023-09-13  1:52 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-07-26 14:29 [PATCH RFC] nvmet-tcp: add new workqueue to surpress lockdep warning Guoqing Jiang
2023-09-07  6:41 ` Yi Zhang
2023-09-07  8:12   ` Guoqing Jiang
2023-09-08  7:15     ` Hannes Reinecke
2023-09-08  8:09       ` Yi Zhang
2023-09-08  8:44         ` Yi Zhang
2023-09-08  8:46           ` Hannes Reinecke
2023-09-08  8:57           ` Hannes Reinecke
2023-09-08  9:00             ` Yi Zhang
2023-09-11  5:54               ` Yi Zhang
2023-09-08  8:08     ` Yi Zhang
2023-09-15  7:40       ` Guoqing Jiang
2023-09-12 12:24 ` Sagi Grimberg
2023-09-13  1:51   ` Guoqing Jiang [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=6de8bea3-dafa-3173-a8ec-6f69707ec237@linux.dev \
    --to=guoqing.jiang@linux.dev \
    --cc=hch@lst.de \
    --cc=kch@nvidia.com \
    --cc=linux-nvme@lists.infradead.org \
    --cc=sagi@grimberg.me \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.