Re: [PATCH 2/2] nvmet-tcp: Fix incorrect locking in state_change sk callback

From: Yi Zhang <yi.zhang@redhat.com>
To: Sagi Grimberg <sagi@grimberg.me>
Cc: linux-nvme@lists.infradead.org, Christoph Hellwig <hch@lst.de>,
	Keith Busch <kbusch@kernel.org>,
	Chaitanya Kulkarni <Chaitanya.Kulkarni@wdc.com>
Subject: Re: [PATCH 2/2] nvmet-tcp: Fix incorrect locking in state_change sk callback
Date: Wed, 24 Mar 2021 10:06:03 +0800
Message-ID: <4fe7519f-b93e-a9b9-841d-56f0e3b647c4@redhat.com>
In-Reply-To: <20210321070849.813104-2-sagi@grimberg.me>

Hi Sagi,
With the two patches applied, I reproduced another lock dependency
issue. Here is the full log:

[  143.310362] run blktests nvme/003 at 2021-03-23 21:52:15
[  143.927284] loop: module loaded
[  144.027532] nvmet: adding nsid 1 to subsystem blktests-subsystem-1
[  144.059070] nvmet_tcp: enabling port 0 (127.0.0.1:4420)
[  144.201559] nvmet: creating controller 1 for subsystem nqn.2014-08.org.nvmexpress.discovery for NQN nqn.2014-08.org.nvmexpress:uuid:e25db33098f14032b70b755db1976647.
[  144.211644] nvme nvme1: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery", addr 127.0.0.1:4420
[  154.400575] nvme nvme1: Removing ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery"

[  154.407970] ======================================================
[  154.414871] WARNING: possible circular locking dependency detected
[  154.421765] 5.12.0-rc3.fix+ #2 Not tainted
[  154.426340] ------------------------------------------------------
[  154.433232] kworker/7:2/260 is trying to acquire lock:
[  154.438972] ffff888288e92030 ((work_completion)(&queue->io_work)){+.+.}-{0:0}, at: __flush_work+0x118/0x1a0
[  154.449882]
                but task is already holding lock:
[  154.456395] ffffc90002b57db0 ((work_completion)(&queue->release_work)){+.+.}-{0:0}, at: process_one_work+0x7c1/0x1480
[  154.468263]
                which lock already depends on the new lock.

[  154.477393]
                the existing dependency chain (in reverse order) is:
[  154.485739]
                -> #2 ((work_completion)(&queue->release_work)){+.+.}-{0:0}:
[  154.494884]        __lock_acquire+0xb77/0x18d0
[  154.499853]        lock_acquire+0x1ca/0x480
[  154.504528]        process_one_work+0x813/0x1480
[  154.509688]        worker_thread+0x590/0xf80
[  154.514458]        kthread+0x368/0x440
[  154.518650]        ret_from_fork+0x22/0x30
[  154.523232]
                -> #1 ((wq_completion)events){+.+.}-{0:0}:
[  154.530633]        __lock_acquire+0xb77/0x18d0
[  154.535597]        lock_acquire+0x1ca/0x480
[  154.540272]        flush_workqueue+0x101/0x1250
[  154.545334]        nvmet_tcp_install_queue+0x22c/0x2a0 [nvmet_tcp]
[  154.552242]        nvmet_install_queue+0x2a3/0x360 [nvmet]
[  154.558387]        nvmet_execute_admin_connect+0x321/0x420 [nvmet]
[  154.565305]        nvmet_tcp_io_work+0xa04/0xcfb [nvmet_tcp]
[  154.571629]        process_one_work+0x8b2/0x1480
[  154.576787]        worker_thread+0x590/0xf80
[  154.581560]        kthread+0x368/0x440
[  154.585749]        ret_from_fork+0x22/0x30
[  154.590328]
                -> #0 ((work_completion)(&queue->io_work)){+.+.}-{0:0}:
[  154.598989]        check_prev_add+0x15e/0x20f0
[  154.603953]        validate_chain+0xec9/0x19c0
[  154.608918]        __lock_acquire+0xb77/0x18d0
[  154.613883]        lock_acquire+0x1ca/0x480
[  154.618556]        __flush_work+0x139/0x1a0
[  154.623229]        nvmet_tcp_release_queue_work+0x2e5/0xcb0 [nvmet_tcp]
[  154.630621]        process_one_work+0x8b2/0x1480
[  154.635780]        worker_thread+0x590/0xf80
[  154.640549]        kthread+0x368/0x440
[  154.644741]        ret_from_fork+0x22/0x30
[  154.649321]
                other info that might help us debug this:

[  154.658257] Chain exists of:
                  (work_completion)(&queue->io_work) --> (wq_completion)events --> (work_completion)(&queue->release_work)

[  154.675070]  Possible unsafe locking scenario:

[  154.681679]        CPU0                    CPU1
[  154.686728]        ----                    ----
[  154.691776]   lock((work_completion)(&queue->release_work));
[  154.698102]                                lock((wq_completion)events);
[  154.705493]                                lock((work_completion)(&queue->release_work));
[  154.714631]   lock((work_completion)(&queue->io_work));
[  154.720470]
                 *** DEADLOCK ***

[  154.727080] 2 locks held by kworker/7:2/260:
[  154.731849]  #0: ffff888100053148 ((wq_completion)events){+.+.}-{0:0}, at: process_one_work+0x792/0x1480
[  154.742458]  #1: ffffc90002b57db0 ((work_completion)(&queue->release_work)){+.+.}-{0:0}, at: process_one_work+0x7c1/0x1480
[  154.754809]
                stack backtrace:
[  154.759674] CPU: 7 PID: 260 Comm: kworker/7:2 Not tainted 5.12.0-rc3.fix+ #2
[  154.767549] Hardware name: Dell Inc. PowerEdge R730xd/\xc9\xb2\xdePow, BIOS 2.12.1 12/04/2020
[  154.776197] Workqueue: events nvmet_tcp_release_queue_work [nvmet_tcp]
[  154.783497] Call Trace:
[  154.786231]  dump_stack+0x93/0xc2
[  154.789942]  check_noncircular+0x26a/0x310
[  154.794521]  ? print_circular_bug+0x460/0x460
[  154.799391]  ? deref_stack_reg+0x170/0x170
[  154.803967]  ? alloc_chain_hlocks+0x1de/0x520
[  154.808843]  check_prev_add+0x15e/0x20f0
[  154.813231]  validate_chain+0xec9/0x19c0
[  154.817611]  ? check_prev_add+0x20f0/0x20f0
[  154.822286]  ? save_trace+0x88/0x5e0
[  154.826290]  __lock_acquire+0xb77/0x18d0
[  154.830682]  lock_acquire+0x1ca/0x480
[  154.834775]  ? __flush_work+0x118/0x1a0
[  154.839066]  ? rcu_read_unlock+0x40/0x40
[  154.843455]  ? __lock_acquire+0xb77/0x18d0
[  154.848036]  __flush_work+0x139/0x1a0
[  154.852120]  ? __flush_work+0x118/0x1a0
[  154.856409]  ? start_flush_work+0x810/0x810
[  154.861084]  ? mark_lock+0xd3/0x1470
[  154.865082]  ? mark_lock_irq+0x1d10/0x1d10
[  154.869662]  ? lock_downgrade+0x100/0x100
[  154.874147]  ? mark_held_locks+0xa5/0xe0
[  154.878522]  ? sk_stream_wait_memory+0xe40/0xe40
[  154.883686]  ? lockdep_hardirqs_on_prepare.part.0+0x198/0x340
[  154.890394]  ? __local_bh_enable_ip+0xa2/0x100
[  154.895358]  ? trace_hardirqs_on+0x1c/0x160
[  154.900034]  ? sk_stream_wait_memory+0xe40/0xe40
[  154.905192]  nvmet_tcp_release_queue_work+0x2e5/0xcb0 [nvmet_tcp]
[  154.911999]  ? lock_is_held_type+0x9a/0x110
[  154.916676]  process_one_work+0x8b2/0x1480
[  154.921255]  ? pwq_dec_nr_in_flight+0x260/0x260
[  154.926315]  ? __lock_contended+0x910/0x910
[  154.930990]  ? worker_thread+0x150/0xf80
[  154.935374]  worker_thread+0x590/0xf80
[  154.939564]  ? __kthread_parkme+0xcb/0x1b0
[  154.944140]  ? process_one_work+0x1480/0x1480
[  154.949007]  kthread+0x368/0x440
[  154.952615]  ? _raw_spin_unlock_irq+0x24/0x30
[  154.957482]  ? __kthread_bind_mask+0x90/0x90
[  154.962255]  ret_from_fork+0x22/0x30
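
To restate the chain from the report in a form that can be checked
mechanically, here is a small user-space sketch. It only models the three
dependencies lockdep printed (#2, #1, #0) as a directed graph and looks for
a cycle. The node names are shortened from the report; the graph layout and
the function name find_cycle are my own illustrative assumptions, not
kernel code.

```python
# Hypothetical user-space model of the dependency chain in the lockdep
# report. An edge A -> B means "B was acquired (or waited on) while A
# was held". Nothing here is kernel API.
DEPENDENCIES = {
    # #1: nvmet_tcp_io_work -> nvmet_tcp_install_queue -> flush_workqueue(events)
    "io_work": ["events"],
    # #2: process_one_work on the events workqueue runs release_work
    "events": ["release_work"],
    # #0: nvmet_tcp_release_queue_work -> __flush_work(io_work)
    "release_work": ["io_work"],
}


def find_cycle(graph, start):
    """Return a list of nodes forming a cycle through start, or None."""
    path = []
    visited = set()

    def visit(node):
        path.append(node)
        for nxt in graph.get(node, []):
            if nxt == start:
                return True  # closed the loop back to the starting node
            if nxt not in visited:
                visited.add(nxt)
                if visit(nxt):
                    return True
        path.pop()
        return False

    if visit(start):
        return path + [start]
    return None


if __name__ == "__main__":
    cycle = find_cycle(DEPENDENCIES, "release_work")
    print(" --> ".join(cycle) if cycle else "no cycle")
```

In this model the cycle goes away as soon as any one of the three edges is
removed, for example if release_work no longer flushes io_work. The sketch
does not say which edge is the right one to break in the driver.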


On 3/21/21 3:08 PM, Sagi Grimberg wrote:
> We are not changing anything in the TCP connection state so
> we should not take a write_lock but rather a read lock.
>
> This caused a deadlock when running nvmet-tcp and nvme-tcp
> on the same system, where state_change callbacks on the
> host and on the controller side have causal relationship
> and made lockdep report on this with blktests:

