From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 40E37EE49B7 for ; Wed, 13 Sep 2023 01:52:36 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20210309; h=Sender:List-Subscribe:List-Help :List-Post:List-Archive:List-Unsubscribe:List-Id:Content-Transfer-Encoding: Content-Type:In-Reply-To:From:References:Cc:To:Subject:MIME-Version:Date: Message-ID:Reply-To:Content-ID:Content-Description:Resent-Date:Resent-From: Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:List-Owner; bh=Fjw3LcFiRY4Z8h+X1SWRzrqqbKDHd6gKqKrVLmMIJg0=; b=46SGkn4HtZQR5bGsPfab0nWk1X KHqeV3QnAyZ117sCoHI9x6UvI0ki0zbU0y8PXREcBF1D4ZoxR6elEruVJMNCd9fNnW8M4xZIXcMsH S8JCiAhQyN3EpHHA86BNqQZRTrSmi13wN5UCdEr9dKSTy8BHNpVjygvE+53+fqvUUk8RiDD36qgiC vNKhQ498KAlKWZbZpfMoYHXiULHmz07wVh5wGS7B6+Kqmhg8o0hDDPQZ26VzhRcMw4w4pRTAebUV0 Omd9MrG+fcTxsW59xT+ZEXYuVzmc06QL7IHfIq4M5FaYVl+LH+MdEAF4U8uWMcNSgDGscuvamghZX shpIaDLQ==; Received: from localhost ([::1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.96 #2 (Red Hat Linux)) id 1qgF3M-004PU5-11; Wed, 13 Sep 2023 01:52:28 +0000 Received: from out-219.mta0.migadu.com ([91.218.175.219]) by bombadil.infradead.org with esmtps (Exim 4.96 #2 (Red Hat Linux)) id 1qgF3B-004PTQ-2j for linux-nvme@lists.infradead.org; Wed, 13 Sep 2023 01:52:20 +0000 Message-ID: <6de8bea3-dafa-3173-a8ec-6f69707ec237@linux.dev> DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1694569922; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=Fjw3LcFiRY4Z8h+X1SWRzrqqbKDHd6gKqKrVLmMIJg0=; b=GPBNGu+s7n3tAhw/Gn1NrjW77fWg9BiX9WzhRoMlg+Jafx2IAfyzCu8rJMckHLNUutKzNn jV859XRoVpHIEN3XGEtWOmxLvib99LjN58JnaFf/ixnLCOfQYmNrFZYuInfWYL/N0TlVSz zCSJGmbFhz77QRVccYv3M5OTp7sm2EQ= Date: Wed, 13 Sep 2023 09:51:56 +0800 MIME-Version: 1.0 Subject: Re: [PATCH RFC] nvmet-tcp: add new workqueue to surpress lockdep warning To: Sagi Grimberg , hch@lst.de, kch@nvidia.com Cc: linux-nvme@lists.infradead.org References: <20230726142939.10062-1-guoqing.jiang@linux.dev> Content-Language: en-US X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. From: Guoqing Jiang In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-Migadu-Flow: FLOW_OUT X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20230912_185218_468561_6C4CB8AE X-CRM114-Status: GOOD ( 21.19 ) X-BeenThere: linux-nvme@lists.infradead.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: "Linux-nvme" Errors-To: linux-nvme-bounces+linux-nvme=archiver.kernel.org@lists.infradead.org On 9/12/23 20:24, Sagi Grimberg wrote: > > > On 7/26/23 17:29, Guoqing Jiang wrote: >> During the test of nvme-tcp, lockdep complains when discover local >> nvme tcp device. >> >> [   87.699136] ====================================================== >> [   87.699137] WARNING: possible circular locking dependency detected >> [   87.699138] 6.5.0-rc3+ #16 Tainted: G            E >> [   87.699139] ------------------------------------------------------ >> [   87.699140] kworker/0:4H/1522 is trying to acquire lock: >> [   87.699141] ffff93c4df45f538 >> ((wq_completion)nvmet-wq){+.+.}-{0:0}, at: __flush_workqueue+0x99/0x4f0 >> [   87.699147] >>                 but task is already holding lock: >> [   87.699148] ffffafb40272fe40 >> ((work_completion)(&queue->io_work)){+.+.}-{0:0}, at: >> process_one_work+0x236/0x590 >> [   87.699151] >>                 which lock already depends on the new lock. >> [   87.699152] >>                 the existing dependency chain (in reverse order) is: >> [   87.699153] >>                 -> #2 ((work_completion)(&queue->io_work)){+.+.}-{0:0}: >> [   87.699155]        __flush_work+0x7a/0x5c0 >> [   87.699157]        __cancel_work_timer+0x155/0x1e0 >> [   87.699158]        cancel_work_sync+0x10/0x20 >> [   87.699160]        nvmet_tcp_release_queue_work+0xcf/0x490 >> [nvmet_tcp] >> [   87.699163]        process_one_work+0x2bd/0x590 >> [   87.699165]        worker_thread+0x52/0x3f0 >> [   87.699166]        kthread+0x109/0x140 >> [   87.699168]        ret_from_fork+0x46/0x70 >> [   87.699170]        ret_from_fork_asm+0x1b/0x30 >> [   87.699172] >>                 -> #1 >> ((work_completion)(&queue->release_work)){+.+.}-{0:0}: >> [   87.699174]        process_one_work+0x28c/0x590 >> [   87.699175]        worker_thread+0x52/0x3f0 >> [   87.699177]        kthread+0x109/0x140 >> [   87.699177]        ret_from_fork+0x46/0x70 >> [   87.699179]        ret_from_fork_asm+0x1b/0x30 >> [   87.699180] >>                 -> #0 ((wq_completion)nvmet-wq){+.+.}-{0:0}: >> [   87.699182]        __lock_acquire+0x1523/0x2590 >> [   87.699184]        lock_acquire+0xd6/0x2f0 >> [   87.699185]        __flush_workqueue+0xc5/0x4f0 >> [   87.699187]        nvmet_tcp_install_queue+0x30/0x160 [nvmet_tcp] >> [   87.699189]        nvmet_install_queue+0xbf/0x200 [nvmet] >> [   87.699196]        nvmet_execute_admin_connect+0x18b/0x2f0 [nvmet] >> [   87.699200]        nvmet_tcp_io_work+0x7e3/0x850 [nvmet_tcp] >> [   87.699203]        process_one_work+0x2bd/0x590 >> [   87.699204]        worker_thread+0x52/0x3f0 >> [   87.699206]        kthread+0x109/0x140 >> [   87.699207]        ret_from_fork+0x46/0x70 >> [   87.699208]        ret_from_fork_asm+0x1b/0x30 >> [   87.699209] >>                 other info that might help us debug this: >> [   87.699210] Chain exists of: >>                   (wq_completion)nvmet-wq --> >> (work_completion)(&queue->release_work) --> >> (work_completion)(&queue->io_work) >> [   87.699212]  Possible unsafe locking scenario: >> [   87.699213]        CPU0                    CPU1 >> [   87.699214]        ----                    ---- >> [   87.699214] lock((work_completion)(&queue->io_work)); >> [   87.699215] lock((work_completion)(&queue->release_work)); >> [   87.699217] lock((work_completion)(&queue->io_work)); >> [   87.699218]   lock((wq_completion)nvmet-wq); >>                          -> need to hold release_work since >> queue_work(nvmet_wq, &queue->release_work) >> [   87.699219] >>                  *** DEADLOCK *** >> [   87.699220] 2 locks held by kworker/0:4H/1522: >> [   87.699221]  #0: ffff93c4df45f338 >> ((wq_completion)nvmet_tcp_wq){+.+.}-{0:0}, at: >> process_one_work+0x236/0x590 >> [   87.699224]  #1: ffffafb40272fe40 >> ((work_completion)(&queue->io_work)){+.+.}-{0:0}, at: >> process_one_work+0x236/0x590 >> [   87.699227] >>                 stack backtrace: >> [   87.699229] CPU: 0 PID: 1522 Comm: kworker/0:4H Tainted: >> G            E      6.5.0-rc3+ #16 >> [   87.699230] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), >> BIOS rel-1.16.0-0-gd239552c-rebuilt.opensuse.org 04/01/2014 >> [   87.699231] Workqueue: nvmet_tcp_wq nvmet_tcp_io_work [nvmet_tcp] >> >> The above happens because nvmet_tcp_io_work can trigger below path >> >>     -> nvmet_tcp_try_recv >>      -> nvmet_tcp_try_recv_one >>       -> nvmet_tcp_try_recv_data >>        -> nvmet_tcp_execute_request >>         -> cmd->req.execute = nvmet_execute_admin_connect >>          -> nvmet_install_queue >>           -> ctrl->ops->install_queue = nvmet_install_queue The above should be nvmet_tcp_install_queue instead of nvmet_install_queue. >>            -> nvmet_tcp_install_queue >>             -> flush_workqueue(nvmet_wq) >> >> And release_work (nvmet_tcp_release_queue_work) is also queued in >> nvmet_wq, which need to flush io_work (nvmet_tcp_io_work) due to >> cancel_work_sync(&queue->io_work). > > I'm not sure I understand the resolution here. io_work does not > run on nvmet_wq, but on nvmet_tcp_wq. Yes, io_work is run on nvmet_tcp_wq, and the work may trigger flush_workqueue(nvmet_wq) io_work = nvmet_tcp_io_work     -> nvmet_tcp_try_recv      -> nvmet_tcp_try_recv_one       -> nvmet_tcp_try_recv_data        -> nvmet_tcp_execute_request         -> cmd->req.execute = nvmet_execute_admin_connect          -> nvmet_install_queue           -> ctrl->ops->install_queue = nvmet_tcp_install_queue            -> nvmet_tcp_install_queue             -> flush_workqueue(nvmet_wq) Also release_work = nvmet_tcp_release_queue_work need to call cancel_work_sync(&queue->io_work), but release_work is queued in nvmet_wq. I think this kind of mutual dependency scenario is complained by lockdep. > What does separating another workqueue give here? > >> We can surpress the lockdep warning by checking if the relevant work >> is pending. So the simplest might be just add the checking before >> flush_workqueue(nvmet_wq). However, there are other works are also >> queued on the same queue, I am not sure if we should flush other >> works unconditionally, so a new dedicated workqueue is added. Please see above. >> Signed-off-by: Guoqing Jiang >> --- >>   drivers/nvme/target/tcp.c | 14 +++++++++++++- >>   1 file changed, 13 insertions(+), 1 deletion(-) >> >> diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c >> index 868aa4de2e4c..ac611cb299a8 100644 >> --- a/drivers/nvme/target/tcp.c >> +++ b/drivers/nvme/target/tcp.c >> @@ -189,6 +189,7 @@ static LIST_HEAD(nvmet_tcp_queue_list); >>   static DEFINE_MUTEX(nvmet_tcp_queue_mutex); >>     static struct workqueue_struct *nvmet_tcp_wq; >> +static struct workqueue_struct *nvmet_tcp_release_wq; >>   static const struct nvmet_fabrics_ops nvmet_tcp_ops; >>   static void nvmet_tcp_free_cmd(struct nvmet_tcp_cmd *c); >>   static void nvmet_tcp_free_cmd_buffers(struct nvmet_tcp_cmd *cmd); >> @@ -1288,7 +1289,7 @@ static void >> nvmet_tcp_schedule_release_queue(struct nvmet_tcp_queue *queue) >>       spin_lock(&queue->state_lock); >>       if (queue->state != NVMET_TCP_Q_DISCONNECTING) { >>           queue->state = NVMET_TCP_Q_DISCONNECTING; >> -        queue_work(nvmet_wq, &queue->release_work); >> +        queue_work(nvmet_tcp_release_wq, &queue->release_work); >>       } >>       spin_unlock(&queue->state_lock); >>   } >> @@ -1847,6 +1848,8 @@ static u16 nvmet_tcp_install_queue(struct >> nvmet_sq *sq) >>       if (sq->qid == 0) { >>           /* Let inflight controller teardown complete */ >>           flush_workqueue(nvmet_wq); >> +        if (work_pending(&queue->release_work)) >> +            flush_workqueue(nvmet_tcp_release_wq); > > This is effectively just never flushes anything. when we install > the queue it's own release_work never really runs. So what your > patch effectively does is just to remove the flush altogether. IMHO work_pending would check whether there is a pending work item before flush relevant workqueue. BTW, Hannes's patch can fix this as well, which might be better. https://lore.kernel.org/linux-nvme/20230810132006.129365-1-hare@suse.de/ Thanks, Guoqing