From: Sasha Levin <sashal@kernel.org>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Josef Bacik <josef@toxicpanda.com>,
Filipe Manana <fdmanana@suse.com>,
David Sterba <dsterba@suse.com>, Sasha Levin <sashal@kernel.org>,
clm@fb.com, linux-btrfs@vger.kernel.org
Subject: [PATCH AUTOSEL 6.8 20/52] btrfs: take the cleaner_mutex earlier in qgroup disable
Date: Tue, 7 May 2024 19:06:46 -0400
Message-ID: <20240507230800.392128-20-sashal@kernel.org>
In-Reply-To: <20240507230800.392128-1-sashal@kernel.org>
From: Josef Bacik <josef@toxicpanda.com>
[ Upstream commit 0f2b8098d72a93890e69aa24ec549ef4bc34f4db ]
One of my CI runs popped the following lockdep splat:
======================================================
WARNING: possible circular locking dependency detected
6.9.0-rc4+ #1 Not tainted
------------------------------------------------------
btrfs/471533 is trying to acquire lock:
ffff92ba46980850 (&fs_info->cleaner_mutex){+.+.}-{3:3}, at: btrfs_quota_disable+0x54/0x4c0
but task is already holding lock:
ffff92ba46980bd0 (&fs_info->subvol_sem){++++}-{3:3}, at: btrfs_ioctl+0x1c8f/0x2600
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #2 (&fs_info->subvol_sem){++++}-{3:3}:
down_read+0x42/0x170
btrfs_rename+0x607/0xb00
btrfs_rename2+0x2e/0x70
vfs_rename+0xaf8/0xfc0
do_renameat2+0x586/0x600
__x64_sys_rename+0x43/0x50
do_syscall_64+0x95/0x180
entry_SYSCALL_64_after_hwframe+0x76/0x7e
-> #1 (&sb->s_type->i_mutex_key#16){++++}-{3:3}:
down_write+0x3f/0xc0
btrfs_inode_lock+0x40/0x70
prealloc_file_extent_cluster+0x1b0/0x370
relocate_file_extent_cluster+0xb2/0x720
relocate_data_extent+0x107/0x160
relocate_block_group+0x442/0x550
btrfs_relocate_block_group+0x2cb/0x4b0
btrfs_relocate_chunk+0x50/0x1b0
btrfs_balance+0x92f/0x13d0
btrfs_ioctl+0x1abf/0x2600
__x64_sys_ioctl+0x97/0xd0
do_syscall_64+0x95/0x180
entry_SYSCALL_64_after_hwframe+0x76/0x7e
-> #0 (&fs_info->cleaner_mutex){+.+.}-{3:3}:
__lock_acquire+0x13e7/0x2180
lock_acquire+0xcb/0x2e0
__mutex_lock+0xbe/0xc00
btrfs_quota_disable+0x54/0x4c0
btrfs_ioctl+0x206b/0x2600
__x64_sys_ioctl+0x97/0xd0
do_syscall_64+0x95/0x180
entry_SYSCALL_64_after_hwframe+0x76/0x7e
other info that might help us debug this:
Chain exists of:
&fs_info->cleaner_mutex --> &sb->s_type->i_mutex_key#16 --> &fs_info->subvol_sem
Possible unsafe locking scenario:
       CPU0                    CPU1
       ----                    ----
  lock(&fs_info->subvol_sem);
                               lock(&sb->s_type->i_mutex_key#16);
                               lock(&fs_info->subvol_sem);
  lock(&fs_info->cleaner_mutex);
*** DEADLOCK ***
2 locks held by btrfs/471533:
#0: ffff92ba4319e420 (sb_writers#14){.+.+}-{0:0}, at: btrfs_ioctl+0x3b5/0x2600
#1: ffff92ba46980bd0 (&fs_info->subvol_sem){++++}-{3:3}, at: btrfs_ioctl+0x1c8f/0x2600
stack backtrace:
CPU: 1 PID: 471533 Comm: btrfs Kdump: loaded Not tainted 6.9.0-rc4+ #1
Call Trace:
<TASK>
dump_stack_lvl+0x77/0xb0
check_noncircular+0x148/0x160
? lock_acquire+0xcb/0x2e0
__lock_acquire+0x13e7/0x2180
lock_acquire+0xcb/0x2e0
? btrfs_quota_disable+0x54/0x4c0
? lock_is_held_type+0x9a/0x110
__mutex_lock+0xbe/0xc00
? btrfs_quota_disable+0x54/0x4c0
? srso_return_thunk+0x5/0x5f
? lock_acquire+0xcb/0x2e0
? btrfs_quota_disable+0x54/0x4c0
? btrfs_quota_disable+0x54/0x4c0
btrfs_quota_disable+0x54/0x4c0
btrfs_ioctl+0x206b/0x2600
? srso_return_thunk+0x5/0x5f
? __do_sys_statfs+0x61/0x70
__x64_sys_ioctl+0x97/0xd0
do_syscall_64+0x95/0x180
? srso_return_thunk+0x5/0x5f
? reacquire_held_locks+0xd1/0x1f0
? do_user_addr_fault+0x307/0x8a0
? srso_return_thunk+0x5/0x5f
? lock_acquire+0xcb/0x2e0
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? find_held_lock+0x2b/0x80
? srso_return_thunk+0x5/0x5f
? lock_release+0xca/0x2a0
? srso_return_thunk+0x5/0x5f
? do_user_addr_fault+0x35c/0x8a0
? srso_return_thunk+0x5/0x5f
? trace_hardirqs_off+0x4b/0xc0
? srso_return_thunk+0x5/0x5f
? lockdep_hardirqs_on_prepare+0xde/0x190
? srso_return_thunk+0x5/0x5f
This happens because when we call rename we already hold the inode mutex,
and then we acquire the subvol_sem if the inode being renamed is a
subvolume. This creates the dependency
inode lock -> subvol_sem
When we're running data relocation we will preallocate space for the
data relocation inode, and we always run the relocation under the
->cleaner_mutex. This now creates the dependency of
cleaner_mutex -> inode lock (from the prealloc) -> subvol_sem
Qgroup disable takes these locks in the opposite order: it acquires the
subvol_sem and then the cleaner_mutex, which results in this lockdep
splat. This deadlock can't happen in reality, because we never rename the
data reloc inode, nor is the data reloc inode a subvolume.
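To make the reasoning above concrete, here is a heavily simplified, hypothetical sketch of the kind of check lockdep performs: record an edge A -> B whenever B is taken while A is held, and complain when a new edge would close a cycle. The enum, add_dependency() and reachable() are illustrative names only, not kernel APIs; real lockdep tracks lock classes, read/write modes, IRQ contexts and much more state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical IDs for the three lock classes in the splat above. */
enum lock_class { CLEANER_MUTEX, INODE_LOCK, SUBVOL_SEM, NR_CLASSES };

static const char *const class_name[NR_CLASSES] = {
	[CLEANER_MUTEX] = "cleaner_mutex",
	[INODE_LOCK] = "inode_lock",
	[SUBVOL_SEM] = "subvol_sem",
};

/* dep[a][b] is true once some task has taken lock b while holding lock a. */
static bool dep[NR_CLASSES][NR_CLASSES];

/* Depth-first search over recorded edges: can 'to' be reached from 'from'? */
static bool reachable(int from, int to, bool seen[NR_CLASSES])
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int next = 0; next < NR_CLASSES; next++)
		if (dep[from][next] && !seen[next] && reachable(next, to, seen))
			return true;
	return false;
}

/*
 * Record "took 'next' while holding 'held'". If 'held' is already reachable
 * from 'next', the new edge would close a cycle: report it, record nothing
 * and return false.
 */
static bool add_dependency(enum lock_class held, enum lock_class next)
{
	bool seen[NR_CLASSES] = { false };

	assert(held < NR_CLASSES && next < NR_CLASSES);
	if (reachable(next, held, seen)) {
		printf("possible circular dependency: %s -> %s\n",
		       class_name[held], class_name[next]);
		return false;
	}
	dep[held][next] = true;
	return true;
}
```

Replaying the edges in the order described above, inode_lock -> subvol_sem (rename) and cleaner_mutex -> inode_lock (relocation prealloc) are recorded, subvol_sem -> cleaner_mutex (quota disable before this patch) is rejected, and cleaner_mutex -> subvol_sem (quota disable after this patch) is accepted.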
However, this is fairly easy to fix: when disabling qgroups, simply take
the cleaner_mutex before we take the subvol_sem. This resolves the
lockdep splat.
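One generic way to state the resulting rule is a fixed lock hierarchy. The following userspace sketch (pthreads, C11 _Thread_local) assigns hypothetical ranks that follow cleaner_mutex -> inode_lock -> subvol_sem and refuses any acquisition that goes against them. This is not how btrfs or lockdep enforce ordering; struct ranked_lock, ranked_acquire(), ranked_release() and take_pair() are made up for illustration, and subvol_sem is modelled as a plain mutex even though it is really an rw_semaphore.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/*
 * Hypothetical stand-ins for the three lock classes. A thread may only take
 * a lock whose rank is higher than every lock it already holds.
 */
struct ranked_lock {
	pthread_mutex_t mutex;
	int rank;
};

static struct ranked_lock cleaner_mutex = { .mutex = PTHREAD_MUTEX_INITIALIZER, .rank = 10 };
static struct ranked_lock inode_lock = { .mutex = PTHREAD_MUTEX_INITIALIZER, .rank = 20 };
static struct ranked_lock subvol_sem = { .mutex = PTHREAD_MUTEX_INITIALIZER, .rank = 30 };

/* Highest rank held by the calling thread; 0 means nothing is held. */
static _Thread_local int held_rank;

/*
 * Take @l if that respects the ranking, saving the previous held rank in
 * @saved. On an ordering violation, take nothing and return false.
 */
static bool ranked_acquire(struct ranked_lock *l, int *saved)
{
	if (l->rank <= held_rank)
		return false;
	pthread_mutex_lock(&l->mutex);
	*saved = held_rank;
	held_rank = l->rank;
	return true;
}

/* Locks must be released in reverse order of acquisition. */
static void ranked_release(struct ranked_lock *l, int saved)
{
	assert(held_rank == l->rank);
	held_rank = saved;
	pthread_mutex_unlock(&l->mutex);
}

/* Take @outer then @inner, as one code path would; false on a violation. */
static bool take_pair(struct ranked_lock *outer, struct ranked_lock *inner)
{
	int s_outer, s_inner;

	if (!ranked_acquire(outer, &s_outer))
		return false;
	if (!ranked_acquire(inner, &s_inner)) {
		ranked_release(outer, s_outer);
		return false;
	}
	/* ... the critical section would run here ... */
	ranked_release(inner, s_inner);
	ranked_release(outer, s_outer);
	return true;
}
```

With these ranks, take_pair(&inode_lock, &subvol_sem) and take_pair(&cleaner_mutex, &inode_lock) succeed, take_pair(&subvol_sem, &cleaner_mutex) (the old quota-disable order) returns false without leaving anything locked, and take_pair(&cleaner_mutex, &subvol_sem) (the new order) succeeds.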
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
fs/btrfs/ioctl.c | 33 ++++++++++++++++++++++++++++++---
fs/btrfs/qgroup.c | 21 ++++++++-------------
2 files changed, 38 insertions(+), 16 deletions(-)
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 6b93fae74403d..8851ba7a1e271 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -3722,15 +3722,43 @@ static long btrfs_ioctl_quota_ctl(struct file *file, void __user *arg)
goto drop_write;
}
- down_write(&fs_info->subvol_sem);
-
switch (sa->cmd) {
case BTRFS_QUOTA_CTL_ENABLE:
case BTRFS_QUOTA_CTL_ENABLE_SIMPLE_QUOTA:
+ down_write(&fs_info->subvol_sem);
ret = btrfs_quota_enable(fs_info, sa);
+ up_write(&fs_info->subvol_sem);
break;
case BTRFS_QUOTA_CTL_DISABLE:
+ /*
+ * Lock the cleaner mutex to prevent races with concurrent
+ * relocation, because relocation may be building backrefs for
+ * blocks of the quota root while we are deleting the root. This
+ * is like dropping fs roots of deleted snapshots/subvolumes, we
+ * need the same protection.
+ *
+ * This also prevents races between concurrent tasks trying to
+ * disable quotas, because we will unlock and relock
+ * qgroup_ioctl_lock across BTRFS_FS_QUOTA_ENABLED changes.
+ *
+ * We take this here because we have the dependency of
+ *
+ * inode_lock -> subvol_sem
+ *
+ * because of rename. With relocation we can prealloc extents,
+ * so that makes the dependency chain
+ *
+ * cleaner_mutex -> inode_lock -> subvol_sem
+ *
+ * so we must take the cleaner_mutex here before we take the
+ * subvol_sem. The deadlock can't actually happen, but this
+ * quiets lockdep.
+ */
+ mutex_lock(&fs_info->cleaner_mutex);
+ down_write(&fs_info->subvol_sem);
ret = btrfs_quota_disable(fs_info);
+ up_write(&fs_info->subvol_sem);
+ mutex_unlock(&fs_info->cleaner_mutex);
break;
default:
ret = -EINVAL;
@@ -3738,7 +3766,6 @@ static long btrfs_ioctl_quota_ctl(struct file *file, void __user *arg)
}
kfree(sa);
- up_write(&fs_info->subvol_sem);
drop_write:
mnt_drop_write_file(file);
return ret;
diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 132802bd80999..9b8a61aa390eb 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -1342,16 +1342,10 @@ int btrfs_quota_disable(struct btrfs_fs_info *fs_info)
lockdep_assert_held_write(&fs_info->subvol_sem);
/*
- * Lock the cleaner mutex to prevent races with concurrent relocation,
- * because relocation may be building backrefs for blocks of the quota
- * root while we are deleting the root. This is like dropping fs roots
- * of deleted snapshots/subvolumes, we need the same protection.
- *
- * This also prevents races between concurrent tasks trying to disable
- * quotas, because we will unlock and relock qgroup_ioctl_lock across
- * BTRFS_FS_QUOTA_ENABLED changes.
+ * Relocation will mess with backrefs, so make sure we have the
+ * cleaner_mutex held to protect us from relocate.
*/
- mutex_lock(&fs_info->cleaner_mutex);
+ lockdep_assert_held(&fs_info->cleaner_mutex);
mutex_lock(&fs_info->qgroup_ioctl_lock);
if (!fs_info->quota_root)
@@ -1373,9 +1367,13 @@ int btrfs_quota_disable(struct btrfs_fs_info *fs_info)
clear_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags);
btrfs_qgroup_wait_for_completion(fs_info, false);
+ /*
+ * We have nothing held here and no trans handle, just return the error
+ * if there is one.
+ */
ret = flush_reservations(fs_info);
if (ret)
- goto out_unlock_cleaner;
+ return ret;
/*
* 1 For the root item
@@ -1439,9 +1437,6 @@ int btrfs_quota_disable(struct btrfs_fs_info *fs_info)
btrfs_end_transaction(trans);
else if (trans)
ret = btrfs_commit_transaction(trans);
-out_unlock_cleaner:
- mutex_unlock(&fs_info->cleaner_mutex);
-
return ret;
}
--
2.43.0
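The qgroup.c hunk moves the cleaner_mutex acquisition out of btrfs_quota_disable() and into its caller, btrfs_ioctl_quota_ctl(), leaving lockdep_assert_held() behind to document and check the requirement (that check compiles away when lockdep is disabled). A rough userspace analogue of this caller-locks/callee-asserts pattern, assuming a per-thread stack of held locks, might look like the sketch below. tracked_lock(), tracked_unlock(), is_held() and the *_model() functions are hypothetical, and both locks are modelled as plain pthread mutexes.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define MAX_HELD 8

/* Per-thread stack of mutexes currently held, most recent last. */
static _Thread_local pthread_mutex_t *held[MAX_HELD];
static _Thread_local int nr_held;

static void tracked_lock(pthread_mutex_t *m)
{
	assert(nr_held < MAX_HELD);
	pthread_mutex_lock(m);
	held[nr_held++] = m;
}

/* Locks must be released in reverse order of acquisition. */
static void tracked_unlock(pthread_mutex_t *m)
{
	assert(nr_held > 0 && held[nr_held - 1] == m);
	nr_held--;
	pthread_mutex_unlock(m);
}

/* Rough analogue of lockdep_assert_held(): does this thread hold @m? */
static bool is_held(const pthread_mutex_t *m)
{
	for (int i = 0; i < nr_held; i++)
		if (held[i] == m)
			return true;
	return false;
}

static pthread_mutex_t cleaner_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t subvol_sem = PTHREAD_MUTEX_INITIALIZER;

/* Callee: no longer takes the locks itself, it only checks the caller did. */
static int quota_disable_model(void)
{
	assert(is_held(&cleaner_mutex));
	assert(is_held(&subvol_sem));
	/* ... the actual work would go here ... */
	return 0;
}

/* Caller: takes cleaner_mutex before subvol_sem, as the patched ioctl does. */
static int quota_ctl_disable_model(void)
{
	int ret;

	tracked_lock(&cleaner_mutex);
	tracked_lock(&subvol_sem);
	ret = quota_disable_model();
	tracked_unlock(&subvol_sem);
	tracked_unlock(&cleaner_mutex);
	return ret;
}
```

quota_ctl_disable_model() returns 0 and leaves nr_held at 0. Calling quota_disable_model() directly, without the locks, would trip its assert() calls, roughly where lockdep_assert_held() would emit a warning in the kernel.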