public inbox for linux-btrfs@vger.kernel.org
From: Johannes Thumshirn <jth@kernel.org>
To: linux-btrfs@vger.kernel.org
Cc: Damien Le Moal <dlemoal@kernel.org>,
	Naohiro Aota <naohiro.aota@wdc.com>,
	David Sterba <dsterba@suse.com>,
	Josef Bacik <josef@toxicpanda.com>, Boris Burkov <boris@bur.io>,
	Filipe Manana <fdmanana@suse.com>,
	Johannes Thumshirn <johannes.thumshirn@wdc.com>
Subject: [PATCH RFC 5/9] btrfs: remove delalloc_root_mutex
Date: Fri, 27 Jun 2025 11:19:10 +0200
Message-ID: <20250627091914.100715-6-jth@kernel.org>
In-Reply-To: <20250627091914.100715-1-jth@kernel.org>

From: Johannes Thumshirn <johannes.thumshirn@wdc.com>

When benchmarking garbage collection on zoned BTRFS filesystems on ZNS
drives, we regularly observe hung_task messages like the following:

INFO: task kworker/u132:2:297 blocked for more than 122 seconds.
       Not tainted 6.16.0-rc1+ #1225
 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
 task:kworker/u132:2  state:D stack:0     pid:297   tgid:297   ppid:2      task_flags:0x4208060 flags:0x00004000
 Workqueue: events_unbound btrfs_preempt_reclaim_metadata_space
 Call Trace:
  <TASK>
  __schedule+0x2f9/0x7b0
  schedule+0x27/0x80
  schedule_preempt_disabled+0x15/0x30
  __mutex_lock.constprop.0+0x4af/0x890
  ? srso_return_thunk+0x5/0x5f
  btrfs_start_delalloc_roots+0x8a/0x290
  ? timerqueue_del+0x2e/0x60
  shrink_delalloc+0x10c/0x2d0
  ? srso_return_thunk+0x5/0x5f
  ? psi_group_change+0x19e/0x460
  ? srso_return_thunk+0x5/0x5f
  ? btrfs_reduce_alloc_profile+0x9a/0x1d0
  flush_space+0x202/0x280
  ? srso_return_thunk+0x5/0x5f
  ? need_preemptive_reclaim+0xaa/0x190
  btrfs_preempt_reclaim_metadata_space+0xe7/0x340
  process_one_work+0x192/0x350
  worker_thread+0x25a/0x3a0
  ? __pfx_worker_thread+0x10/0x10
  kthread+0xfc/0x240
  ? __pfx_kthread+0x10/0x10
  ? __pfx_kthread+0x10/0x10
  ret_from_fork+0x152/0x180
  ? __pfx_kthread+0x10/0x10
  ret_from_fork_asm+0x1a/0x30
  </TASK>
 INFO: task kworker/u132:2:297 is blocked on a mutex likely owned by task kworker/u129:0:2359.
 task:kworker/u129:0  state:R  running task     stack:0     pid:2359  tgid:2359  ppid:2

The affected tasks are blocked on 'struct btrfs_fs_info::delalloc_root_mutex',
a global lock that serializes entry into btrfs_start_delalloc_roots().
This lock was introduced in commit 573bfb72f760 ("Btrfs: fix possible
empty list access when flushing the delalloc inodes") but without a
clear justification for its necessity.

However, the condition it was meant to protect against, a possibly empty
list access, is already handled safely by 'list_splice_init()', which
does nothing when the source list is empty.

There are no known concurrency issues in btrfs_start_delalloc_roots()
that require serialization via this mutex. All critical regions are
either covered by per-root locking or operate on safely isolated lists.

Removing the lock eliminates the observed hangs and improves metadata
GC throughput, particularly on systems with high concurrency like
ZNS-based deployments.

Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
 fs/btrfs/disk-io.c | 1 -
 fs/btrfs/fs.h      | 1 -
 fs/btrfs/inode.c   | 2 --
 3 files changed, 4 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 35cd38de7727..929f39886b0e 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2795,7 +2795,6 @@ void btrfs_init_fs_info(struct btrfs_fs_info *fs_info)
 	mutex_init(&fs_info->unused_bg_unpin_mutex);
 	mutex_init(&fs_info->reclaim_bgs_lock);
 	mutex_init(&fs_info->reloc_mutex);
-	mutex_init(&fs_info->delalloc_root_mutex);
 	mutex_init(&fs_info->zoned_meta_io_lock);
 	mutex_init(&fs_info->zoned_data_reloc_io_lock);
 	seqlock_init(&fs_info->profiles_lock);
diff --git a/fs/btrfs/fs.h b/fs/btrfs/fs.h
index a388af40a251..04ebc976f841 100644
--- a/fs/btrfs/fs.h
+++ b/fs/btrfs/fs.h
@@ -606,7 +606,6 @@ struct btrfs_fs_info {
 	 */
 	struct list_head ordered_roots;
 
-	struct mutex delalloc_root_mutex;
 	spinlock_t delalloc_root_lock;
 	/* All fs/file tree roots that have delalloc inodes. */
 	struct list_head delalloc_roots;
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 80c72c594b19..d68f4ef61c43 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -8766,7 +8766,6 @@ int btrfs_start_delalloc_roots(struct btrfs_fs_info *fs_info, long nr,
 	if (BTRFS_FS_ERROR(fs_info))
 		return -EROFS;
 
-	mutex_lock(&fs_info->delalloc_root_mutex);
 	spin_lock(&fs_info->delalloc_root_lock);
 	list_splice_init(&fs_info->delalloc_roots, &splice);
 	while (!list_empty(&splice)) {
@@ -8800,7 +8799,6 @@ int btrfs_start_delalloc_roots(struct btrfs_fs_info *fs_info, long nr,
 		list_splice_tail(&splice, &fs_info->delalloc_roots);
 		spin_unlock(&fs_info->delalloc_root_lock);
 	}
-	mutex_unlock(&fs_info->delalloc_root_mutex);
 	return ret;
 }
 
-- 
2.49.0

