From: Johannes Thumshirn <jth@kernel.org>
To: linux-btrfs@vger.kernel.org
Cc: Damien Le Moal <dlemoal@kernel.org>,
Naohiro Aota <naohiro.aota@wdc.com>,
David Sterba <dsterba@suse.com>,
Josef Bacik <josef@toxicpanda.com>, Boris Burkov <boris@bur.io>,
Filipe Manana <fdmanana@suse.com>,
Johannes Thumshirn <johannes.thumshirn@wdc.com>
Subject: [PATCH RFC 5/9] btrfs: remove delalloc_root_mutex
Date: Fri, 27 Jun 2025 11:19:10 +0200
Message-ID: <20250627091914.100715-6-jth@kernel.org>
In-Reply-To: <20250627091914.100715-1-jth@kernel.org>

From: Johannes Thumshirn <johannes.thumshirn@wdc.com>

When benchmarking garbage collection on zoned BTRFS filesystems on ZNS
drives, we regularly observe hung_task messages like the following:

  INFO: task kworker/u132:2:297 blocked for more than 122 seconds.
        Not tainted 6.16.0-rc1+ #1225
  "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  task:kworker/u132:2 state:D stack:0 pid:297 tgid:297 ppid:2 task_flags:0x4208060 flags:0x00004000
  Workqueue: events_unbound btrfs_preempt_reclaim_metadata_space
  Call Trace:
   <TASK>
   __schedule+0x2f9/0x7b0
   schedule+0x27/0x80
   schedule_preempt_disabled+0x15/0x30
   __mutex_lock.constprop.0+0x4af/0x890
   ? srso_return_thunk+0x5/0x5f
   btrfs_start_delalloc_roots+0x8a/0x290
   ? timerqueue_del+0x2e/0x60
   shrink_delalloc+0x10c/0x2d0
   ? srso_return_thunk+0x5/0x5f
   ? psi_group_change+0x19e/0x460
   ? srso_return_thunk+0x5/0x5f
   ? btrfs_reduce_alloc_profile+0x9a/0x1d0
   flush_space+0x202/0x280
   ? srso_return_thunk+0x5/0x5f
   ? need_preemptive_reclaim+0xaa/0x190
   btrfs_preempt_reclaim_metadata_space+0xe7/0x340
   process_one_work+0x192/0x350
   worker_thread+0x25a/0x3a0
   ? __pfx_worker_thread+0x10/0x10
   kthread+0xfc/0x240
   ? __pfx_kthread+0x10/0x10
   ? __pfx_kthread+0x10/0x10
   ret_from_fork+0x152/0x180
   ? __pfx_kthread+0x10/0x10
   ret_from_fork_asm+0x1a/0x30
   </TASK>
  INFO: task kworker/u132:2:297 is blocked on a mutex likely owned by task kworker/u129:0:2359.
  task:kworker/u129:0 state:R running task stack:0 pid:2359 tgid:2359 ppid:2

The affected tasks are blocked on 'struct btrfs_fs_info::delalloc_root_mutex',
a global lock that serializes entry into btrfs_start_delalloc_roots().
This lock was introduced in commit 573bfb72f760 ("Btrfs: fix possible
empty list access when flushing the delalloc inodes"), but that commit
gives no clear justification for why it is needed.

However, the condition it was meant to protect against (a possibly empty
list access) is already handled safely by 'list_splice_init()', which
does nothing when the source list is empty.
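
To make the empty-list argument concrete, here is a minimal userspace sketch of a splice-and-reinitialize helper. It is modeled on the behavior described above, not copied from the kernel: the names init_list_head(), list_is_empty(), list_add_tail_entry(), list_splice_and_init() and list_count() are hypothetical stand-ins chosen for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in for a circular, doubly linked list head. */
struct list_head {
	struct list_head *next, *prev;
};

static void init_list_head(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

static bool list_is_empty(const struct list_head *head)
{
	return head->next == head;
}

/* Append @entry at the tail of @head. */
static void list_add_tail_entry(struct list_head *entry, struct list_head *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

/*
 * Hypothetical equivalent of a splice-and-init helper: move every entry
 * of @src to the front of @dst and leave @src empty.
 */
static void list_splice_and_init(struct list_head *src, struct list_head *dst)
{
	struct list_head *first, *last, *at;

	if (list_is_empty(src))
		return;	/* empty source: neither list is touched */

	first = src->next;
	last = src->prev;
	at = dst->next;

	first->prev = dst;
	dst->next = first;
	last->next = at;
	at->prev = last;

	init_list_head(src);
	assert(list_is_empty(src));
}

/* Count the entries on @head (the head itself is not counted). */
static size_t list_count(const struct list_head *head)
{
	const struct list_head *pos;
	size_t n = 0;

	for (pos = head->next; pos != head; pos = pos->next)
		n++;
	return n;
}
```

Splicing an empty source leaves both lists empty and valid, so a following "while the private list is not empty" loop simply does not run. No entry of an empty list is ever dereferenced.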

There are no known concurrency issues in btrfs_start_delalloc_roots()
that require serialization via this mutex. All critical regions are
either covered by per-root locking or operate on safely isolated lists.

Removing the lock eliminates the observed hangs and improves metadata
GC throughput, particularly on highly concurrent systems such as
ZNS-based deployments.

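
The "safely isolated lists" point can be illustrated with a deliberately simplified, single-threaded sketch of two callers taking over a shared list one after the other. It only models the list hand-off. It does not model the spinlock that protects the splice in the real function, nor any re-queuing of entries onto the shared list, and every name in it (take_private(), drain_private() and the list helpers) is a hypothetical stand-in rather than kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct list_head {
	struct list_head *next, *prev;
};

static void init_list_head(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

static bool list_is_empty(const struct list_head *head)
{
	return head->next == head;
}

static void list_add_tail_entry(struct list_head *entry, struct list_head *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

static void list_del_entry(struct list_head *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
	init_list_head(entry);
}

/*
 * Step 1 of a caller: move everything from @shared onto the caller's
 * private list. In real code this step would run under a lock; here
 * the callers are simply invoked one after the other.
 */
static void take_private(struct list_head *shared, struct list_head *priv)
{
	init_list_head(priv);
	if (list_is_empty(shared))
		return;	/* nothing to take, @priv stays empty */

	priv->next = shared->next;
	priv->prev = shared->prev;
	priv->next->prev = priv;
	priv->prev->next = priv;
	init_list_head(shared);
	assert(list_is_empty(shared));
}

/* Step 2 of a caller: process the private list, return the entry count. */
static size_t drain_private(struct list_head *priv)
{
	size_t processed = 0;

	while (!list_is_empty(priv)) {
		list_del_entry(priv->next);
		processed++;
	}
	return processed;
}
```

If caller A runs take_private() first and caller B runs it next, B ends up with an empty private list and its drain loop does nothing, while A processes every entry exactly once. Whether this hand-off alone is sufficient for btrfs_start_delalloc_roots() depends on the parts this sketch leaves out.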
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
 fs/btrfs/disk-io.c | 1 -
 fs/btrfs/fs.h      | 1 -
 fs/btrfs/inode.c   | 2 --
 3 files changed, 4 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 35cd38de7727..929f39886b0e 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2795,7 +2795,6 @@ void btrfs_init_fs_info(struct btrfs_fs_info *fs_info)
 	mutex_init(&fs_info->unused_bg_unpin_mutex);
 	mutex_init(&fs_info->reclaim_bgs_lock);
 	mutex_init(&fs_info->reloc_mutex);
-	mutex_init(&fs_info->delalloc_root_mutex);
 	mutex_init(&fs_info->zoned_meta_io_lock);
 	mutex_init(&fs_info->zoned_data_reloc_io_lock);
 	seqlock_init(&fs_info->profiles_lock);
diff --git a/fs/btrfs/fs.h b/fs/btrfs/fs.h
index a388af40a251..04ebc976f841 100644
--- a/fs/btrfs/fs.h
+++ b/fs/btrfs/fs.h
@@ -606,7 +606,6 @@ struct btrfs_fs_info {
 	 */
 	struct list_head ordered_roots;
 
-	struct mutex delalloc_root_mutex;
 	spinlock_t delalloc_root_lock;
 	/* All fs/file tree roots that have delalloc inodes. */
 	struct list_head delalloc_roots;
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 80c72c594b19..d68f4ef61c43 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -8766,7 +8766,6 @@ int btrfs_start_delalloc_roots(struct btrfs_fs_info *fs_info, long nr,
 	if (BTRFS_FS_ERROR(fs_info))
 		return -EROFS;
 
-	mutex_lock(&fs_info->delalloc_root_mutex);
 	spin_lock(&fs_info->delalloc_root_lock);
 	list_splice_init(&fs_info->delalloc_roots, &splice);
 	while (!list_empty(&splice)) {
@@ -8800,7 +8799,6 @@ int btrfs_start_delalloc_roots(struct btrfs_fs_info *fs_info, long nr,
 		list_splice_tail(&splice, &fs_info->delalloc_roots);
 		spin_unlock(&fs_info->delalloc_root_lock);
 	}
-	mutex_unlock(&fs_info->delalloc_root_mutex);
 	return ret;
 }
--
2.49.0