From: Johannes Thumshirn <jth@kernel.org>
To: linux-btrfs@vger.kernel.org
Cc: Damien Le Moal <dlemoal@kernel.org>,
Naohiro Aota <naohiro.aota@wdc.com>,
David Sterba <dsterba@suse.com>,
Josef Bacik <josef@toxicpanda.com>, Boris Burkov <boris@bur.io>,
Filipe Manana <fdmanana@suse.com>,
Johannes Thumshirn <johannes.thumshirn@wdc.com>
Subject: [PATCH RFC 5/9] btrfs: remove delalloc_root_mutex
Date: Fri, 27 Jun 2025 11:19:10 +0200
Message-ID: <20250627091914.100715-6-jth@kernel.org>
In-Reply-To: <20250627091914.100715-1-jth@kernel.org>
From: Johannes Thumshirn <johannes.thumshirn@wdc.com>
When benchmarking garbage collection on zoned BTRFS filesystems on ZNS
drives, we regularly observe hung_task messages like the following:
INFO: task kworker/u132:2:297 blocked for more than 122 seconds.
Not tainted 6.16.0-rc1+ #1225
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:kworker/u132:2 state:D stack:0 pid:297 tgid:297 ppid:2 task_flags:0x4208060 flags:0x00004000
Workqueue: events_unbound btrfs_preempt_reclaim_metadata_space
Call Trace:
<TASK>
__schedule+0x2f9/0x7b0
schedule+0x27/0x80
schedule_preempt_disabled+0x15/0x30
__mutex_lock.constprop.0+0x4af/0x890
? srso_return_thunk+0x5/0x5f
btrfs_start_delalloc_roots+0x8a/0x290
? timerqueue_del+0x2e/0x60
shrink_delalloc+0x10c/0x2d0
? srso_return_thunk+0x5/0x5f
? psi_group_change+0x19e/0x460
? srso_return_thunk+0x5/0x5f
? btrfs_reduce_alloc_profile+0x9a/0x1d0
flush_space+0x202/0x280
? srso_return_thunk+0x5/0x5f
? need_preemptive_reclaim+0xaa/0x190
btrfs_preempt_reclaim_metadata_space+0xe7/0x340
process_one_work+0x192/0x350
worker_thread+0x25a/0x3a0
? __pfx_worker_thread+0x10/0x10
kthread+0xfc/0x240
? __pfx_kthread+0x10/0x10
? __pfx_kthread+0x10/0x10
ret_from_fork+0x152/0x180
? __pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1a/0x30
</TASK>
INFO: task kworker/u132:2:297 is blocked on a mutex likely owned by task kworker/u129:0:2359.
task:kworker/u129:0 state:R running task stack:0 pid:2359 tgid:2359 ppid:2
The affected tasks are blocked on 'struct btrfs_fs_info::delalloc_root_mutex',
a global lock that serializes entry into btrfs_start_delalloc_roots().
The lock was introduced in commit 573bfb72f760 ("Btrfs: fix possible
empty list access when flushing the delalloc inodes"), which gives no
clear justification for why it is needed.
The condition it was meant to protect against, a possibly empty list
access, is already handled safely by 'list_splice_init()', which does
nothing when the source list is empty.
There are no known concurrency issues in btrfs_start_delalloc_roots()
that require serialization via this mutex. All critical regions are
either covered by per-root locking or operate on safely isolated lists.
Removing the lock eliminates the observed hangs and improves metadata
GC throughput, particularly on systems with high concurrency like
ZNS-based deployments.
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
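Note for reviewers: the list_splice_init() argument in the commit
message can be checked in isolation. The following is a minimal
userspace sketch, not kernel code. It re-implements just enough of
struct list_head to show that splicing from an empty source is a no-op,
and that a second caller arriving after the first has spliced simply
sees an empty shared list. The helpers init_list_head(), list_count()
and splice_demo() are hypothetical names used for illustration only.
The sketch is single-threaded and does not model delalloc_root_lock,
which is what protects the splice itself in the kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

static void init_list_head(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static int list_empty(const struct list_head *h)
{
	return h->next == h;
}

static void list_add_tail(struct list_head *entry, struct list_head *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

/*
 * Move every entry of @src to the front of @dst and reinitialise @src.
 * An empty @src leaves both lists untouched.
 */
static void list_splice_init(struct list_head *src, struct list_head *dst)
{
	struct list_head *first, *last, *at;

	if (list_empty(src))
		return;

	first = src->next;
	last = src->prev;
	at = dst->next;

	first->prev = dst;
	dst->next = first;
	last->next = at;
	at->prev = last;

	init_list_head(src);
}

/* Hypothetical helper: count the entries on a list. */
static size_t list_count(const struct list_head *h)
{
	const struct list_head *p;
	size_t n = 0;

	for (p = h->next; p != h; p = p->next)
		n++;
	return n;
}

/* Hypothetical demo: two back-to-back "callers" splice the shared list. */
int splice_demo(void)
{
	struct list_head shared, splice_a, splice_b, root1, root2;

	init_list_head(&shared);
	init_list_head(&splice_a);
	init_list_head(&splice_b);
	list_add_tail(&root1, &shared);
	list_add_tail(&root2, &shared);

	/* First caller takes both entries onto its private list. */
	list_splice_init(&shared, &splice_a);
	/* Second caller finds the shared list empty: nothing happens. */
	list_splice_init(&shared, &splice_b);

	assert(list_count(&splice_a) == 2);
	assert(list_empty(&splice_b));
	assert(list_empty(&shared));
	return 0;
}
```

Calling splice_demo() from a small main() is enough to run the checks.
The second caller ends up with an empty private list, so its
while (!list_empty(&splice)) loop would not execute at all.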
fs/btrfs/disk-io.c | 1 -
fs/btrfs/fs.h | 1 -
fs/btrfs/inode.c | 2 --
3 files changed, 4 deletions(-)
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 35cd38de7727..929f39886b0e 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2795,7 +2795,6 @@ void btrfs_init_fs_info(struct btrfs_fs_info *fs_info)
mutex_init(&fs_info->unused_bg_unpin_mutex);
mutex_init(&fs_info->reclaim_bgs_lock);
mutex_init(&fs_info->reloc_mutex);
- mutex_init(&fs_info->delalloc_root_mutex);
mutex_init(&fs_info->zoned_meta_io_lock);
mutex_init(&fs_info->zoned_data_reloc_io_lock);
seqlock_init(&fs_info->profiles_lock);
diff --git a/fs/btrfs/fs.h b/fs/btrfs/fs.h
index a388af40a251..04ebc976f841 100644
--- a/fs/btrfs/fs.h
+++ b/fs/btrfs/fs.h
@@ -606,7 +606,6 @@ struct btrfs_fs_info {
*/
struct list_head ordered_roots;
- struct mutex delalloc_root_mutex;
spinlock_t delalloc_root_lock;
/* All fs/file tree roots that have delalloc inodes. */
struct list_head delalloc_roots;
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 80c72c594b19..d68f4ef61c43 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -8766,7 +8766,6 @@ int btrfs_start_delalloc_roots(struct btrfs_fs_info *fs_info, long nr,
if (BTRFS_FS_ERROR(fs_info))
return -EROFS;
- mutex_lock(&fs_info->delalloc_root_mutex);
spin_lock(&fs_info->delalloc_root_lock);
list_splice_init(&fs_info->delalloc_roots, &splice);
while (!list_empty(&splice)) {
@@ -8800,7 +8799,6 @@ int btrfs_start_delalloc_roots(struct btrfs_fs_info *fs_info, long nr,
list_splice_tail(&splice, &fs_info->delalloc_roots);
spin_unlock(&fs_info->delalloc_root_lock);
}
- mutex_unlock(&fs_info->delalloc_root_mutex);
return ret;
}
--
2.49.0
Thread overview: 23+ messages
2025-06-27 9:19 [PATCH RFC 0/9] btrfs: zoned: fixes for garbage collection under preassure Johannes Thumshirn
2025-06-27 9:19 ` [PATCH RFC 1/9] btrfs: zoned: do not select metadata BG as finish target Johannes Thumshirn
2025-06-27 11:34 ` Christoph Hellwig
2025-07-02 15:34 ` Naohiro Aota
2025-06-27 9:19 ` [PATCH RFC 2/9] btrfs: zoned: get rid of relocation_bg_lock Johannes Thumshirn
2025-06-27 9:19 ` [PATCH RFC 3/9] btrfs: zoned: get rid of treelog_bg_lock Johannes Thumshirn
2025-06-27 9:19 ` [PATCH RFC 4/9] btrfs: zoned: don't hold space_info lock on zoned allocation Johannes Thumshirn
2025-06-27 9:19 ` Johannes Thumshirn [this message]
2025-06-27 12:42 ` [PATCH RFC 5/9] btrfs: remove delalloc_root_mutex Filipe Manana
2025-06-27 9:19 ` [PATCH RFC 6/9] btrfs: remove btrfs_root's delalloc_mutex Johannes Thumshirn
2025-06-27 12:30 ` Filipe Manana
2025-06-27 9:19 ` [PATCH RFC 7/9] btrfs: lower auto-reclaim message log level Johannes Thumshirn
2025-06-27 11:35 ` Christoph Hellwig
2025-06-27 9:19 ` [PATCH RFC 8/9] btrfs: lower log level of relocation messages Johannes Thumshirn
2025-06-27 11:36 ` Christoph Hellwig
2025-06-30 17:12 ` David Sterba
2025-07-01 5:09 ` Johannes Thumshirn
2025-07-01 14:43 ` David Sterba
2025-06-27 9:19 ` [PATCH RFC 9/9] btrfs: remove unused bgs on allocation failure Johannes Thumshirn
2025-06-27 11:38 ` Christoph Hellwig
2025-06-30 11:45 ` Johannes Thumshirn
2025-06-30 12:05 ` Filipe Manana
2025-06-27 12:14 ` Filipe Manana