From: David Sterba <dsterba@suse.cz>
To: Josef Bacik <josef@toxicpanda.com>
Cc: linux-btrfs@vger.kernel.org, kernel-team@fb.com
Subject: Re: [PATCH] btrfs: take the cleaner_mutex earlier in qgroup disable
Date: Tue, 23 Apr 2024 15:53:40 +0200 [thread overview]
Message-ID: <20240423135340.GF3492@twin.jikos.cz> (raw)
In-Reply-To: <a78874868a1d3d8d78e6b0fdbb97debc88923734.1713573156.git.josef@toxicpanda.com>
On Fri, Apr 19, 2024 at 08:32:48PM -0400, Josef Bacik wrote:
> One of my CI runs popped the following lockdep splat
>
> ======================================================
> WARNING: possible circular locking dependency detected
> 6.9.0-rc4+ #1 Not tainted
> ------------------------------------------------------
> btrfs/471533 is trying to acquire lock:
> ffff92ba46980850 (&fs_info->cleaner_mutex){+.+.}-{3:3}, at: btrfs_quota_disable+0x54/0x4c0
>
> but task is already holding lock:
> ffff92ba46980bd0 (&fs_info->subvol_sem){++++}-{3:3}, at: btrfs_ioctl+0x1c8f/0x2600
>
> which lock already depends on the new lock.
>
> the existing dependency chain (in reverse order) is:
>
> -> #2 (&fs_info->subvol_sem){++++}-{3:3}:
> down_read+0x42/0x170
> btrfs_rename+0x607/0xb00
> btrfs_rename2+0x2e/0x70
> vfs_rename+0xaf8/0xfc0
> do_renameat2+0x586/0x600
> __x64_sys_rename+0x43/0x50
> do_syscall_64+0x95/0x180
> entry_SYSCALL_64_after_hwframe+0x76/0x7e
>
> -> #1 (&sb->s_type->i_mutex_key#16){++++}-{3:3}:
> down_write+0x3f/0xc0
> btrfs_inode_lock+0x40/0x70
> prealloc_file_extent_cluster+0x1b0/0x370
> relocate_file_extent_cluster+0xb2/0x720
> relocate_data_extent+0x107/0x160
> relocate_block_group+0x442/0x550
> btrfs_relocate_block_group+0x2cb/0x4b0
> btrfs_relocate_chunk+0x50/0x1b0
> btrfs_balance+0x92f/0x13d0
> btrfs_ioctl+0x1abf/0x2600
> __x64_sys_ioctl+0x97/0xd0
> do_syscall_64+0x95/0x180
> entry_SYSCALL_64_after_hwframe+0x76/0x7e
>
> -> #0 (&fs_info->cleaner_mutex){+.+.}-{3:3}:
> __lock_acquire+0x13e7/0x2180
> lock_acquire+0xcb/0x2e0
> __mutex_lock+0xbe/0xc00
> btrfs_quota_disable+0x54/0x4c0
> btrfs_ioctl+0x206b/0x2600
> __x64_sys_ioctl+0x97/0xd0
> do_syscall_64+0x95/0x180
> entry_SYSCALL_64_after_hwframe+0x76/0x7e
>
> other info that might help us debug this:
>
> Chain exists of:
> &fs_info->cleaner_mutex --> &sb->s_type->i_mutex_key#16 --> &fs_info->subvol_sem
>
> Possible unsafe locking scenario:
>
> CPU0 CPU1
> ---- ----
> lock(&fs_info->subvol_sem);
> lock(&sb->s_type->i_mutex_key#16);
> lock(&fs_info->subvol_sem);
> lock(&fs_info->cleaner_mutex);
>
> *** DEADLOCK ***
>
> 2 locks held by btrfs/471533:
> #0: ffff92ba4319e420 (sb_writers#14){.+.+}-{0:0}, at: btrfs_ioctl+0x3b5/0x2600
> #1: ffff92ba46980bd0 (&fs_info->subvol_sem){++++}-{3:3}, at: btrfs_ioctl+0x1c8f/0x2600
>
> stack backtrace:
> CPU: 1 PID: 471533 Comm: btrfs Kdump: loaded Not tainted 6.9.0-rc4+ #1
> Call Trace:
> <TASK>
> dump_stack_lvl+0x77/0xb0
> check_noncircular+0x148/0x160
> ? lock_acquire+0xcb/0x2e0
> __lock_acquire+0x13e7/0x2180
> lock_acquire+0xcb/0x2e0
> ? btrfs_quota_disable+0x54/0x4c0
> ? lock_is_held_type+0x9a/0x110
> __mutex_lock+0xbe/0xc00
> ? btrfs_quota_disable+0x54/0x4c0
> ? srso_return_thunk+0x5/0x5f
> ? lock_acquire+0xcb/0x2e0
> ? btrfs_quota_disable+0x54/0x4c0
> ? btrfs_quota_disable+0x54/0x4c0
> btrfs_quota_disable+0x54/0x4c0
> btrfs_ioctl+0x206b/0x2600
> ? srso_return_thunk+0x5/0x5f
> ? __do_sys_statfs+0x61/0x70
> __x64_sys_ioctl+0x97/0xd0
> do_syscall_64+0x95/0x180
> ? srso_return_thunk+0x5/0x5f
> ? reacquire_held_locks+0xd1/0x1f0
> ? do_user_addr_fault+0x307/0x8a0
> ? srso_return_thunk+0x5/0x5f
> ? lock_acquire+0xcb/0x2e0
> ? srso_return_thunk+0x5/0x5f
> ? srso_return_thunk+0x5/0x5f
> ? find_held_lock+0x2b/0x80
> ? srso_return_thunk+0x5/0x5f
> ? lock_release+0xca/0x2a0
> ? srso_return_thunk+0x5/0x5f
> ? do_user_addr_fault+0x35c/0x8a0
> ? srso_return_thunk+0x5/0x5f
> ? trace_hardirqs_off+0x4b/0xc0
> ? srso_return_thunk+0x5/0x5f
> ? lockdep_hardirqs_on_prepare+0xde/0x190
> ? srso_return_thunk+0x5/0x5f
>
> This happens because when we call rename we already have the inode mutex
> held, and then we acquire the subvol_sem if we are a subvolume. This
> makes the dependency
>
> inode lock -> subvol sem
>
> When we're running data relocation we will preallocate space for the
> data relocation inode, and we always run the relocation under the
> ->cleaner_mutex. This now creates the dependency of
>
> cleaner_mutex -> inode lock (from the prealloc) -> subvol_sem
>
> Qgroup delete is doing this in the opposite order, it is acquiring the
> subvol_sem and then it is acquiring the cleaner_mutex, which results in
> this lockdep splat. This deadlock can't happen in reality, because we
> won't ever rename the data reloc inode, nor is the data reloc inode a
> subvolume.
>
> However this is fairly easy to fix, simply take the cleaner mutex in the
> case where we are disabling qgroups before we take the subvol_sem. This
> resolves the lockdep splat.
>
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
prev parent reply other threads:[~2024-04-23 14:01 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-04-20 0:32 [PATCH] btrfs: take the cleaner_mutex earlier in qgroup disable Josef Bacik
2024-04-22 15:27 ` Filipe Manana
2024-04-23 14:03 ` David Sterba
2024-04-23 13:53 ` David Sterba [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20240423135340.GF3492@twin.jikos.cz \
--to=dsterba@suse.cz \
--cc=josef@toxicpanda.com \
--cc=kernel-team@fb.com \
--cc=linux-btrfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox