public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Sasha Levin <sashal@kernel.org>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Josef Bacik <josef@toxicpanda.com>,
	Filipe Manana <fdmanana@suse.com>,
	David Sterba <dsterba@suse.com>, Sasha Levin <sashal@kernel.org>,
	clm@fb.com, linux-btrfs@vger.kernel.org
Subject: [PATCH AUTOSEL 6.8 20/52] btrfs: take the cleaner_mutex earlier in qgroup disable
Date: Tue,  7 May 2024 19:06:46 -0400	[thread overview]
Message-ID: <20240507230800.392128-20-sashal@kernel.org> (raw)
In-Reply-To: <20240507230800.392128-1-sashal@kernel.org>

From: Josef Bacik <josef@toxicpanda.com>

[ Upstream commit 0f2b8098d72a93890e69aa24ec549ef4bc34f4db ]

One of my CI runs popped the following lockdep splat

======================================================
WARNING: possible circular locking dependency detected
6.9.0-rc4+ #1 Not tainted
------------------------------------------------------
btrfs/471533 is trying to acquire lock:
ffff92ba46980850 (&fs_info->cleaner_mutex){+.+.}-{3:3}, at: btrfs_quota_disable+0x54/0x4c0

but task is already holding lock:
ffff92ba46980bd0 (&fs_info->subvol_sem){++++}-{3:3}, at: btrfs_ioctl+0x1c8f/0x2600

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #2 (&fs_info->subvol_sem){++++}-{3:3}:
       down_read+0x42/0x170
       btrfs_rename+0x607/0xb00
       btrfs_rename2+0x2e/0x70
       vfs_rename+0xaf8/0xfc0
       do_renameat2+0x586/0x600
       __x64_sys_rename+0x43/0x50
       do_syscall_64+0x95/0x180
       entry_SYSCALL_64_after_hwframe+0x76/0x7e

-> #1 (&sb->s_type->i_mutex_key#16){++++}-{3:3}:
       down_write+0x3f/0xc0
       btrfs_inode_lock+0x40/0x70
       prealloc_file_extent_cluster+0x1b0/0x370
       relocate_file_extent_cluster+0xb2/0x720
       relocate_data_extent+0x107/0x160
       relocate_block_group+0x442/0x550
       btrfs_relocate_block_group+0x2cb/0x4b0
       btrfs_relocate_chunk+0x50/0x1b0
       btrfs_balance+0x92f/0x13d0
       btrfs_ioctl+0x1abf/0x2600
       __x64_sys_ioctl+0x97/0xd0
       do_syscall_64+0x95/0x180
       entry_SYSCALL_64_after_hwframe+0x76/0x7e

-> #0 (&fs_info->cleaner_mutex){+.+.}-{3:3}:
       __lock_acquire+0x13e7/0x2180
       lock_acquire+0xcb/0x2e0
       __mutex_lock+0xbe/0xc00
       btrfs_quota_disable+0x54/0x4c0
       btrfs_ioctl+0x206b/0x2600
       __x64_sys_ioctl+0x97/0xd0
       do_syscall_64+0x95/0x180
       entry_SYSCALL_64_after_hwframe+0x76/0x7e

other info that might help us debug this:

Chain exists of:
  &fs_info->cleaner_mutex --> &sb->s_type->i_mutex_key#16 --> &fs_info->subvol_sem

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&fs_info->subvol_sem);
                               lock(&sb->s_type->i_mutex_key#16);
                               lock(&fs_info->subvol_sem);
  lock(&fs_info->cleaner_mutex);

 *** DEADLOCK ***

2 locks held by btrfs/471533:
 #0: ffff92ba4319e420 (sb_writers#14){.+.+}-{0:0}, at: btrfs_ioctl+0x3b5/0x2600
 #1: ffff92ba46980bd0 (&fs_info->subvol_sem){++++}-{3:3}, at: btrfs_ioctl+0x1c8f/0x2600

stack backtrace:
CPU: 1 PID: 471533 Comm: btrfs Kdump: loaded Not tainted 6.9.0-rc4+ #1
Call Trace:
 <TASK>
 dump_stack_lvl+0x77/0xb0
 check_noncircular+0x148/0x160
 ? lock_acquire+0xcb/0x2e0
 __lock_acquire+0x13e7/0x2180
 lock_acquire+0xcb/0x2e0
 ? btrfs_quota_disable+0x54/0x4c0
 ? lock_is_held_type+0x9a/0x110
 __mutex_lock+0xbe/0xc00
 ? btrfs_quota_disable+0x54/0x4c0
 ? srso_return_thunk+0x5/0x5f
 ? lock_acquire+0xcb/0x2e0
 ? btrfs_quota_disable+0x54/0x4c0
 ? btrfs_quota_disable+0x54/0x4c0
 btrfs_quota_disable+0x54/0x4c0
 btrfs_ioctl+0x206b/0x2600
 ? srso_return_thunk+0x5/0x5f
 ? __do_sys_statfs+0x61/0x70
 __x64_sys_ioctl+0x97/0xd0
 do_syscall_64+0x95/0x180
 ? srso_return_thunk+0x5/0x5f
 ? reacquire_held_locks+0xd1/0x1f0
 ? do_user_addr_fault+0x307/0x8a0
 ? srso_return_thunk+0x5/0x5f
 ? lock_acquire+0xcb/0x2e0
 ? srso_return_thunk+0x5/0x5f
 ? srso_return_thunk+0x5/0x5f
 ? find_held_lock+0x2b/0x80
 ? srso_return_thunk+0x5/0x5f
 ? lock_release+0xca/0x2a0
 ? srso_return_thunk+0x5/0x5f
 ? do_user_addr_fault+0x35c/0x8a0
 ? srso_return_thunk+0x5/0x5f
 ? trace_hardirqs_off+0x4b/0xc0
 ? srso_return_thunk+0x5/0x5f
 ? lockdep_hardirqs_on_prepare+0xde/0x190
 ? srso_return_thunk+0x5/0x5f

This happens because when we call rename we already have the inode mutex
held, and then we acquire the subvol_sem if we are a subvolume.  This
makes the dependency

inode lock -> subvol sem

When we're running data relocation we will preallocate space for the
data relocation inode, and we always run the relocation under the
->cleaner_mutex.  This now creates the dependency of

cleaner_mutex -> inode lock (from the prealloc) -> subvol_sem

Qgroup delete is doing this in the opposite order, it is acquiring the
subvol_sem and then it is acquiring the cleaner_mutex, which results in
this lockdep splat.  This deadlock can't happen in reality, because we
won't ever rename the data reloc inode, nor is the data reloc inode a
subvolume.

However this is fairly easy to fix, simply take the cleaner mutex in the
case where we are disabling qgroups before we take the subvol_sem.  This
resolves the lockdep splat.

Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 fs/btrfs/ioctl.c  | 33 ++++++++++++++++++++++++++++++---
 fs/btrfs/qgroup.c | 21 ++++++++-------------
 2 files changed, 38 insertions(+), 16 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 6b93fae74403d..8851ba7a1e271 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -3722,15 +3722,43 @@ static long btrfs_ioctl_quota_ctl(struct file *file, void __user *arg)
 		goto drop_write;
 	}
 
-	down_write(&fs_info->subvol_sem);
-
 	switch (sa->cmd) {
 	case BTRFS_QUOTA_CTL_ENABLE:
 	case BTRFS_QUOTA_CTL_ENABLE_SIMPLE_QUOTA:
+		down_write(&fs_info->subvol_sem);
 		ret = btrfs_quota_enable(fs_info, sa);
+		up_write(&fs_info->subvol_sem);
 		break;
 	case BTRFS_QUOTA_CTL_DISABLE:
+		/*
+		 * Lock the cleaner mutex to prevent races with concurrent
+		 * relocation, because relocation may be building backrefs for
+		 * blocks of the quota root while we are deleting the root. This
+		 * is like dropping fs roots of deleted snapshots/subvolumes, we
+		 * need the same protection.
+		 *
+		 * This also prevents races between concurrent tasks trying to
+		 * disable quotas, because we will unlock and relock
+		 * qgroup_ioctl_lock across BTRFS_FS_QUOTA_ENABLED changes.
+		 *
+		 * We take this here because we have the dependency of
+		 *
+		 * inode_lock -> subvol_sem
+		 *
+		 * because of rename.  With relocation we can prealloc extents,
+		 * so that makes the dependency chain
+		 *
+		 * cleaner_mutex -> inode_lock -> subvol_sem
+		 *
+		 * so we must take the cleaner_mutex here before we take the
+		 * subvol_sem.  The deadlock can't actually happen, but this
+		 * quiets lockdep.
+		 */
+		mutex_lock(&fs_info->cleaner_mutex);
+		down_write(&fs_info->subvol_sem);
 		ret = btrfs_quota_disable(fs_info);
+		up_write(&fs_info->subvol_sem);
+		mutex_unlock(&fs_info->cleaner_mutex);
 		break;
 	default:
 		ret = -EINVAL;
@@ -3738,7 +3766,6 @@ static long btrfs_ioctl_quota_ctl(struct file *file, void __user *arg)
 	}
 
 	kfree(sa);
-	up_write(&fs_info->subvol_sem);
 drop_write:
 	mnt_drop_write_file(file);
 	return ret;
diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 132802bd80999..9b8a61aa390eb 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -1342,16 +1342,10 @@ int btrfs_quota_disable(struct btrfs_fs_info *fs_info)
 	lockdep_assert_held_write(&fs_info->subvol_sem);
 
 	/*
-	 * Lock the cleaner mutex to prevent races with concurrent relocation,
-	 * because relocation may be building backrefs for blocks of the quota
-	 * root while we are deleting the root. This is like dropping fs roots
-	 * of deleted snapshots/subvolumes, we need the same protection.
-	 *
-	 * This also prevents races between concurrent tasks trying to disable
-	 * quotas, because we will unlock and relock qgroup_ioctl_lock across
-	 * BTRFS_FS_QUOTA_ENABLED changes.
+	 * Relocation will mess with backrefs, so make sure we have the
+	 * cleaner_mutex held to protect us from relocate.
 	 */
-	mutex_lock(&fs_info->cleaner_mutex);
+	lockdep_assert_held(&fs_info->cleaner_mutex);
 
 	mutex_lock(&fs_info->qgroup_ioctl_lock);
 	if (!fs_info->quota_root)
@@ -1373,9 +1367,13 @@ int btrfs_quota_disable(struct btrfs_fs_info *fs_info)
 	clear_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags);
 	btrfs_qgroup_wait_for_completion(fs_info, false);
 
+	/*
+	 * We have nothing held here and no trans handle, just return the error
+	 * if there is one.
+	 */
 	ret = flush_reservations(fs_info);
 	if (ret)
-		goto out_unlock_cleaner;
+		return ret;
 
 	/*
 	 * 1 For the root item
@@ -1439,9 +1437,6 @@ int btrfs_quota_disable(struct btrfs_fs_info *fs_info)
 		btrfs_end_transaction(trans);
 	else if (trans)
 		ret = btrfs_commit_transaction(trans);
-out_unlock_cleaner:
-	mutex_unlock(&fs_info->cleaner_mutex);
-
 	return ret;
 }
 
-- 
2.43.0


  parent reply	other threads:[~2024-05-07 23:08 UTC|newest]

Thread overview: 55+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-05-07 23:06 [PATCH AUTOSEL 6.8 01/52] ASoC: Intel: bytcr_rt5640: Apply Asus T100TA quirk to Asus T100TAM too Sasha Levin
2024-05-07 23:06 ` [PATCH AUTOSEL 6.8 02/52] regulator: irq_helpers: duplicate IRQ name Sasha Levin
2024-05-07 23:06 ` [PATCH AUTOSEL 6.8 03/52] ALSA: hda: cs35l56: Exit cache-only after cs35l56_wait_for_firmware_boot() Sasha Levin
2024-05-07 23:06 ` [PATCH AUTOSEL 6.8 04/52] ASoC: SOF: ipc4-pcm: Use consistent name for snd_sof_pcm_stream pointer Sasha Levin
2024-05-07 23:06 ` [PATCH AUTOSEL 6.8 05/52] ASoC: SOF: ipc4-pcm: Use consistent name for sof_ipc4_timestamp_info pointer Sasha Levin
2024-05-07 23:06 ` [PATCH AUTOSEL 6.8 06/52] ASoC: SOF: ipc4-pcm: Introduce generic sof_ipc4_pcm_stream_priv Sasha Levin
2024-05-07 23:06 ` [PATCH AUTOSEL 6.8 07/52] ASoC: SOF: pcm: Restrict DSP D0i3 during S0ix to IPC3 Sasha Levin
2024-05-07 23:06 ` [PATCH AUTOSEL 6.8 08/52] ASoC: acp: Support microphone from device Acer 315-24p Sasha Levin
2024-05-07 23:06 ` [PATCH AUTOSEL 6.8 09/52] ASoC: rt5645: Fix the electric noise due to the CBJ contacts floating Sasha Levin
2024-05-07 23:06 ` [PATCH AUTOSEL 6.8 10/52] ASoC: dt-bindings: rt5645: add cbj sleeve gpio property Sasha Levin
2024-05-07 23:06 ` [PATCH AUTOSEL 6.8 11/52] ASoC: rt722-sdca: modify channel number to support 4 channels Sasha Levin
2024-05-07 23:06 ` [PATCH AUTOSEL 6.8 12/52] ASoC: rt722-sdca: add headset microphone vrefo setting Sasha Levin
2024-05-07 23:06 ` [PATCH AUTOSEL 6.8 13/52] regulator: qcom-refgen: fix module autoloading Sasha Levin
2024-05-07 23:06 ` [PATCH AUTOSEL 6.8 14/52] regulator: vqmmc-ipq4019: " Sasha Levin
2024-05-07 23:06 ` [PATCH AUTOSEL 6.8 15/52] ASoC: cs35l41: Update DSP1RX5/6 Sources for DSP config Sasha Levin
2024-05-07 23:06 ` [PATCH AUTOSEL 6.8 16/52] ASoC: rt715: add vendor clear control register Sasha Levin
2024-05-07 23:06 ` [PATCH AUTOSEL 6.8 17/52] ASoC: rt715-sdca: volume step modification Sasha Levin
2024-05-07 23:06 ` [PATCH AUTOSEL 6.8 18/52] KVM: selftests: Add test for uaccesses to non-existent vgic-v2 CPUIF Sasha Levin
2024-05-08  6:25   ` Oliver Upton
2024-05-08 17:55     ` Sasha Levin
2024-05-07 23:06 ` [PATCH AUTOSEL 6.8 19/52] Input: xpad - add support for ASUS ROG RAIKIRI Sasha Levin
2024-05-07 23:06 ` Sasha Levin [this message]
2024-05-07 23:06 ` [PATCH AUTOSEL 6.8 21/52] EDAC/versal: Do not register for NOC errors Sasha Levin
2024-05-07 23:06 ` [PATCH AUTOSEL 6.8 22/52] fpga: dfl-pci: add PCI subdevice ID for Intel D5005 card Sasha Levin
2024-05-07 23:06 ` [PATCH AUTOSEL 6.8 23/52] bpf, x86: Fix PROBE_MEM runtime load check Sasha Levin
2024-05-07 23:06 ` [PATCH AUTOSEL 6.8 24/52] ALSA: emu10k1: factor out snd_emu1010_load_dock_firmware() Sasha Levin
2024-05-07 23:06 ` [PATCH AUTOSEL 6.8 25/52] ALSA: emu10k1: make E-MU FPGA writes potentially more reliable Sasha Levin
2024-05-07 23:06 ` [PATCH AUTOSEL 6.8 26/52] erofs: reliably distinguish block based and fscache mode Sasha Levin
2024-05-07 23:19   ` Gao Xiang
2024-05-07 23:06 ` [PATCH AUTOSEL 6.8 27/52] softirq: Fix suspicious RCU usage in __do_softirq() Sasha Levin
2024-05-07 23:06 ` [PATCH AUTOSEL 6.8 28/52] net: qede: sanitize 'rc' in qede_add_tc_flower_fltr() Sasha Levin
2024-05-07 23:06 ` [PATCH AUTOSEL 6.8 29/52] firewire: nosy: ensure user_length is taken into account when fetching packet contents Sasha Levin
2024-05-07 23:06 ` [PATCH AUTOSEL 6.8 30/52] platform/x86: ISST: Add Grand Ridge to HPM CPU list Sasha Levin
2024-05-07 23:06 ` [PATCH AUTOSEL 6.8 31/52] ASoC: da7219-aad: fix usage of device_get_named_child_node() Sasha Levin
2024-05-07 23:06 ` [PATCH AUTOSEL 6.8 32/52] ASoC: cs35l56: fix usages " Sasha Levin
2024-05-07 23:06 ` [PATCH AUTOSEL 6.8 33/52] ALSA: hda: intel-dsp-config: harden I2C/I2S codec detection Sasha Levin
2024-05-07 23:07 ` [PATCH AUTOSEL 6.8 34/52] Input: amimouse - mark driver struct with __refdata to prevent section mismatch Sasha Levin
2024-05-07 23:07 ` [PATCH AUTOSEL 6.8 35/52] drm/amdgpu: Fix VRAM memory accounting Sasha Levin
2024-05-07 23:07 ` [PATCH AUTOSEL 6.8 36/52] drm/amd/display: Ensure that dmcub support flag is set for DCN20 Sasha Levin
2024-05-07 23:07 ` [PATCH AUTOSEL 6.8 37/52] drm/amd/display: Add dtbclk access to dcn315 Sasha Levin
2024-05-07 23:07 ` [PATCH AUTOSEL 6.8 38/52] drm/amd/display: Atom Integrated System Info v2_2 for DCN35 Sasha Levin
2024-05-07 23:07 ` [PATCH AUTOSEL 6.8 39/52] drm/amd/display: Allocate zero bw after bw alloc enable Sasha Levin
2024-05-07 23:07 ` [PATCH AUTOSEL 6.8 40/52] drm/amd/display: Add VCO speed parameter for DCN31 FPU Sasha Levin
2024-05-07 23:07 ` [PATCH AUTOSEL 6.8 41/52] drm/amd/display: Fix DC mode screen flickering on DCN321 Sasha Levin
2024-05-07 23:07 ` [PATCH AUTOSEL 6.8 42/52] drm/amd/display: Disable seamless boot on 128b/132b encoding Sasha Levin
2024-05-07 23:07 ` [PATCH AUTOSEL 6.8 43/52] drm/amdkfd: Flush the process wq before creating a kfd_process Sasha Levin
2024-05-07 23:07 ` [PATCH AUTOSEL 6.8 44/52] x86/mm: Remove broken vsyscall emulation code from the page fault code Sasha Levin
2024-05-07 23:07 ` [PATCH AUTOSEL 6.8 45/52] nvme: find numa distance only if controller has valid numa id Sasha Levin
2024-05-07 23:07 ` [PATCH AUTOSEL 6.8 46/52] nvmet-auth: return the error code to the nvmet_auth_host_hash() callers Sasha Levin
2024-05-07 23:07 ` [PATCH AUTOSEL 6.8 47/52] nvmet-auth: replace pr_debug() with pr_err() to report an error Sasha Levin
2024-05-07 23:07 ` [PATCH AUTOSEL 6.8 48/52] nvme: cancel pending I/O if nvme controller is in terminal state Sasha Levin
2024-05-07 23:07 ` [PATCH AUTOSEL 6.8 49/52] nvmet-tcp: fix possible memory leak when tearing down a controller Sasha Levin
2024-05-07 23:07 ` [PATCH AUTOSEL 6.8 50/52] nvmet: fix nvme status code when namespace is disabled Sasha Levin
2024-05-07 23:07 ` [PATCH AUTOSEL 6.8 51/52] nvme-tcp: strict pdu pacing to avoid send stalls on TLS Sasha Levin
2024-05-07 23:07 ` [PATCH AUTOSEL 6.8 52/52] epoll: be better about file lifetimes Sasha Levin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240507230800.392128-20-sashal@kernel.org \
    --to=sashal@kernel.org \
    --cc=clm@fb.com \
    --cc=dsterba@suse.com \
    --cc=fdmanana@suse.com \
    --cc=josef@toxicpanda.com \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox