From mboxrd@z Thu Jan 1 00:00:00 1970
From: Alexander Aring
Date: Wed, 29 Sep 2021 13:10:54 -0400
Subject: [Cluster-devel] [PATCH dlm/next] fs: dlm: avoid ls_waiters_mutex circular lock dependency warning
Message-ID: <20210929171054.2623204-1-aahringo@redhat.com>
List-Id:
To: cluster-devel.redhat.com
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

This patch documents a very specific DLM behaviour that we rely on to
avoid a deadlock. The behaviour concerns the state in which new DLM
operations and the processing of DLM messages are stopped. While DLM is
in this state, a certain group of functions cannot run, e.g. the
central locking logic stages of fs/dlm/lock.c. To check for this state,
a new assert DLM_ASSERT_OPS_LOCKED() is introduced which verifies that
the required locks are held wherever this state is required.

One function that requires this state is dlm_recover_waiters_pre().
Without it, a deadlock could occur because ls_waiters_mutex and the
res_mutex of a resource are taken in a different order than during
normal lock operations. This reverse lock order cannot actually happen
while dlm_recover_waiters_pre() runs, because the locks that stop all
"lock operations" are held, which is exactly what
DLM_ASSERT_OPS_LOCKED() checks.

This patch therefore takes ls_waiters_mutex in a different lockclass in
dlm_recover_waiters_pre() to avoid a lockdep warning about a possible
circular lock dependency. Conceptually it is the same lockclass, but as
far as I understand this is the only way to tell lockdep to treat the
lock call in dlm_recover_waiters_pre() differently from the other
ls_waiters_mutex lock calls.
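For illustration only (not part of the patch), here is a minimal
stand-alone sketch of how a lockdep subclass separates two acquisitions
of the same mutex; the module and the names demo_a, demo_b and
lockdep_subclass_demo are made up:

  #include <linux/module.h>
  #include <linux/mutex.h>

  static DEFINE_MUTEX(demo_a);
  static DEFINE_MUTEX(demo_b);

  static int __init lockdep_subclass_demo_init(void)
  {
          /* First order: a -> b, recorded by lockdep as a dependency. */
          mutex_lock(&demo_a);
          mutex_lock(&demo_b);
          mutex_unlock(&demo_b);
          mutex_unlock(&demo_a);

          /* Reverse order: b -> a. With a plain mutex_lock(&demo_a)
           * here lockdep would report a possible circular locking
           * dependency. Taking demo_a with subclass 1 registers the
           * acquisition under a separate lockdep class, so it is not
           * chained with the subclass-0 acquisition above.
           */
          mutex_lock(&demo_b);
          mutex_lock_nested(&demo_a, 1);
          mutex_unlock(&demo_a);
          mutex_unlock(&demo_b);

          return 0;
  }

  static void __exit lockdep_subclass_demo_exit(void)
  {
  }

  module_init(lockdep_subclass_demo_init);
  module_exit(lockdep_subclass_demo_exit);
  MODULE_LICENSE("GPL");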
The patch should avoid the following warning:

[ 619.855891] ======================================================
[ 619.856858] WARNING: possible circular locking dependency detected
[ 619.857865] 5.14.0-1.el9.x86_64+debug #1 Not tainted
[ 619.858646] ------------------------------------------------------
[ 619.859646] dlm_recoverd/3961 is trying to acquire lock:
[ 619.860478] ffff888019dcd628 (&r->res_mutex){+.+.}-{3:3}, at: _receive_unlock_reply+0x78/0x600 [dlm]
[ 619.861999]
[ 619.861999] but task is already holding lock:
[ 619.862933] ffff88800ee901a8 (&ls->ls_waiters_mutex){+.+.}-{3:3}, at: dlm_recover_waiters_pre+0x72/0xc80 [dlm]
[ 619.864529]
[ 619.864529] which lock already depends on the new lock.
[ 619.864529]
[ 619.865837]
[ 619.865837] the existing dependency chain (in reverse order) is:
[ 619.866993]
[ 619.866993] -> #1 (&ls->ls_waiters_mutex){+.+.}-{3:3}:
[ 619.868088] __lock_acquire+0xb72/0x1870
[ 619.868861] lock_acquire+0x1ca/0x570
[ 619.869554] __mutex_lock+0x14c/0x1170
[ 619.870283] add_to_waiters+0x6a/0x500 [dlm]
[ 619.871047] _request_lock+0x39f/0x9f0 [dlm]
[ 619.871860] request_lock.part.0+0x1ae/0x220 [dlm]
[ 619.872713] dlm_user_request+0x237/0x5a0 [dlm]
[ 619.873555] device_user_lock+0x42c/0x660 [dlm]
[ 619.874366] device_write+0x5ff/0x8d0 [dlm]
[ 619.875116] vfs_write+0x1c7/0x850
[ 619.875762] ksys_write+0xf9/0x1d0
[ 619.876385] do_syscall_64+0x3b/0x90
[ 619.877034] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 619.877972]
[ 619.877972] -> #0 (&r->res_mutex){+.+.}-{3:3}:
[ 619.878931] check_prev_add+0x15e/0x20f0
[ 619.879699] validate_chain+0xaba/0xde0
[ 619.880404] __lock_acquire+0xb72/0x1870
[ 619.881100] lock_acquire+0x1ca/0x570
[ 619.881823] __mutex_lock+0x14c/0x1170
[ 619.882506] _receive_unlock_reply+0x78/0x600 [dlm]
[ 619.883365] dlm_recover_waiters_pre+0x6e8/0xc80 [dlm]
[ 619.884262] ls_recover.isra.0+0x517/0x1090 [dlm]
[ 619.885087] dlm_recoverd+0x348/0x430 [dlm]
[ 619.885844] kthread+0x329/0x3e0
[ 619.886456] ret_from_fork+0x22/0x30
[ 619.887113]
[ 619.887113] other info that might help us debug this:
[ 619.887113]
[ 619.888376] Possible unsafe locking scenario:
[ 619.888376]
[ 619.889359]        CPU0                    CPU1
[ 619.890064]        ----                    ----
[ 619.890775]   lock(&ls->ls_waiters_mutex);
[ 619.891436]                                lock(&r->res_mutex);
[ 619.892378]                                lock(&ls->ls_waiters_mutex);
[ 619.893436]   lock(&r->res_mutex);
[ 619.893991]
[ 619.893991]  *** DEADLOCK ***
[ 619.893991]
[ 619.894930] 3 locks held by dlm_recoverd/3961:
[ 619.895647] #0: ffff88800ee90d78 (&ls->ls_in_recovery){++++}-{3:3}, at: dlm_recoverd+0x1d1/0x430 [dlm]
[ 619.897173] #1: ffff88800ee90c68 (&ls->ls_recoverd_active){+.+.}-{3:3}, at: ls_recover.isra.0+0xf9/0x1090 [dlm]
[ 619.898759] #2: ffff88800ee901a8 (&ls->ls_waiters_mutex){+.+.}-{3:3}, at: dlm_recover_waiters_pre+0x72/0xc80 [dlm]
[ 619.900439]
[ 619.900439] stack backtrace:
[ 619.901145] CPU: 1 PID: 3961 Comm: dlm_recoverd Not tainted 5.14.0-1.el9.x86_64+debug #1
[ 619.902461] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[ 619.903390] Call Trace:
[ 619.903808] dump_stack_lvl+0x57/0x7d
[ 619.904493] check_noncircular+0x26a/0x310
[ 619.905155] ? print_circular_bug+0x1f0/0x1f0
[ 619.905839] ? alloc_chain_hlocks+0x1de/0x530
[ 619.906528] check_prev_add+0x15e/0x20f0
[ 619.907155] validate_chain+0xaba/0xde0
[ 619.907787] ? check_prev_add+0x20f0/0x20f0
[ 619.908489] __lock_acquire+0xb72/0x1870
[ 619.909147] lock_acquire+0x1ca/0x570
[ 619.909730] ? _receive_unlock_reply+0x78/0x600 [dlm]
[ 619.910554] ? rcu_read_unlock+0x40/0x40
[ 619.911183] ? __lock_acquired+0x1d2/0x8c0
[ 619.911826] ? dlm_recoverd+0x348/0x430 [dlm]
[ 619.912541] __mutex_lock+0x14c/0x1170
[ 619.913160] ? _receive_unlock_reply+0x78/0x600 [dlm]
[ 619.913997] ? _receive_unlock_reply+0x78/0x600 [dlm]
[ 619.914838] ? mutex_lock_io_nested+0xfc0/0xfc0
[ 619.915552] ? dlm_recover_waiters_pre+0x72/0xc80 [dlm]
[ 619.916380] ? io_schedule_timeout+0x150/0x150
[ 619.917072] ? mutex_lock_io_nested+0xfc0/0xfc0
[ 619.917833] ? lockdep_hardirqs_on_prepare.part.0+0x19a/0x350
[ 619.918738] ? _receive_unlock_reply+0x78/0x600 [dlm]
[ 619.919568] _receive_unlock_reply+0x78/0x600 [dlm]
[ 619.920352] dlm_recover_waiters_pre+0x6e8/0xc80 [dlm]
[ 619.921186] ls_recover.isra.0+0x517/0x1090 [dlm]
[ 619.921941] ? dlm_clear_toss+0x280/0x280 [dlm]
[ 619.922666] ? dlm_recoverd+0x33d/0x430 [dlm]
[ 619.923384] dlm_recoverd+0x348/0x430 [dlm]
[ 619.924053] ? ls_recover.isra.0+0x1090/0x1090 [dlm]
[ 619.924896] kthread+0x329/0x3e0
[ 619.925422] ? _raw_spin_unlock_irq+0x24/0x30
[ 619.926100] ? set_kthread_struct+0x100/0x100
[ 619.926788] ret_from_fork+0x22/0x30
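Read side by side, the two chains above amount to the following orders
(a simplified sketch, not literal DLM code; r stands for the rsb of the
lkb in question):

  /* chain #1, a normal lock operation via dlm_user_request():        */
  mutex_lock(&r->res_mutex);           /* rsb already locked ...      */
  mutex_lock(&ls->ls_waiters_mutex);   /* ... when add_to_waiters() runs */

  /* chain #0, recovery via dlm_recover_waiters_pre():                */
  mutex_lock(&ls->ls_waiters_mutex);   /* dlm_recover_waiters_pre()   */
  mutex_lock(&r->res_mutex);           /* _receive_unlock_reply()     */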
Signed-off-by: Alexander Aring
---
 fs/dlm/dlm_internal.h | 22 ++++++++++++++++++++++
 fs/dlm/lock.c         | 12 +++++++++++-
 2 files changed, 33 insertions(+), 1 deletion(-)

diff --git a/fs/dlm/dlm_internal.h b/fs/dlm/dlm_internal.h
index 49cf83e04c80..3fb610dfea19 100644
--- a/fs/dlm/dlm_internal.h
+++ b/fs/dlm/dlm_internal.h
@@ -746,6 +746,28 @@ static inline int dlm_no_directory(struct dlm_ls *ls)
 	return test_bit(LSFL_NODIR, &ls->ls_flags);
 }
 
+#ifdef CONFIG_PROVE_LOCKING
+static inline int dlm_assert_ops_unlocked(struct dlm_ls *ls)
+{
+	int ret = 0;
+
+	/* will block all new DLM-API from user */
+	ret |= !rwsem_is_locked(&ls->ls_in_recovery);
+	/* assumes this flag was set while ls_recv_active was locked.
+	 * This makes sure no receive is being processed and all
+	 * further receives are queued onto the requestqueue.
+	 */
+	ret |= !dlm_locking_stopped(ls);
+
+	return ret;
+}
+#define DLM_ASSERT_OPS_LOCKED(ls) \
+	DLM_ASSERT(dlm_assert_ops_unlocked(ls) == 0, \
+		   printk("Locking active in ls: %s\n", ls->ls_name);)
+#else
+#define DLM_ASSERT_OPS_LOCKED(ls)
+#endif /* CONFIG_PROVE_LOCKING */
+
 int dlm_netlink_init(void);
 void dlm_netlink_exit(void);
 void dlm_timeout_warn(struct dlm_lkb *lkb);
diff --git a/fs/dlm/lock.c b/fs/dlm/lock.c
index c502c065d007..4231edb3c614 100644
--- a/fs/dlm/lock.c
+++ b/fs/dlm/lock.c
@@ -5125,7 +5125,17 @@ void dlm_recover_waiters_pre(struct dlm_ls *ls)
 	if (!ms_stub)
 		return;
 
-	mutex_lock(&ls->ls_waiters_mutex);
+	/* Use mutex_lock_nested() with a different lockclass here to avoid
+	 * a false positive report about a possible circular lock dependency.
+	 * Conceptually the lockclass should be the same, but this is how we
+	 * silence the false positive. The report is about ls_waiters_mutex
+	 * and an lkb's res_mutex being taken in a different order here than
+	 * during lock operations. However, lock operations cannot run in
+	 * parallel with dlm_recover_waiters_pre() because certain locks are
+	 * held, which is what DLM_ASSERT_OPS_LOCKED(ls) checks.
+	 */
+	DLM_ASSERT_OPS_LOCKED(ls);
+	mutex_lock_nested(&ls->ls_waiters_mutex, 1);
 
 	list_for_each_entry_safe(lkb, safe, &ls->ls_waiters, lkb_wait_reply) {
 
-- 
2.27.0