All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/3] btrfs: zoned: reduce lock contention in zoned extent allocator
@ 2025-06-24  9:37 Johannes Thumshirn
  2025-06-24  9:37 ` [PATCH v2 1/3] btrfs: zoned: get rid of relocation_bg_lock Johannes Thumshirn
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Johannes Thumshirn @ 2025-06-24  9:37 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Damien Le Moal, Naohiro Aota, Johannes Thumshirn

From: Johannes Thumshirn <johannes.thumshirn@wdc.com>

While investigating performance issues on zoned btrfs, I've analyzed
lockstat output while running fio workloads.

Three locks that have always been on top are the space_info's ->lock, as
well as fs_info's ->treelog_bg_lock and ->relocation_bg_lock.

The first one is not as trivial to get rid and most likely legit in most
cases (see patch #3). The other two are only protecting a single value in
fs_info and can be replaced by REAC_ONCE() and WRITE_ONCE().

The series has been run through fstests on tcmu-runner emulated SMR drives
as well as Damien's fio benchmarks on real ZNS hardware and no regressions
have been observed.

Changes in v2:
- Rewording of the commit message (Damien)
- No more local variables caching the values (Damien)

Johannes Thumshirn (3):
  btrfs: zoned: get rid of relocation_bg_lock
  btrfs: zoned: get rid of treelog_bg_lock
  btrfs: zoned: don't hold space_info lock on zoned allocation

 fs/btrfs/disk-io.c     |  2 -
 fs/btrfs/extent-tree.c | 86 +++++++++++-------------------------------
 fs/btrfs/fs.h          |  7 +---
 fs/btrfs/zoned.c       | 12 +++---
 fs/btrfs/zoned.h       |  6 +--
 5 files changed, 31 insertions(+), 82 deletions(-)

-- 
2.49.0


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH v2 1/3] btrfs: zoned: get rid of relocation_bg_lock
  2025-06-24  9:37 [PATCH v2 0/3] btrfs: zoned: reduce lock contention in zoned extent allocator Johannes Thumshirn
@ 2025-06-24  9:37 ` Johannes Thumshirn
  2025-06-24 12:32   ` Naohiro Aota
  2025-06-24  9:37 ` [PATCH v2 2/3] btrfs: zoned: get rid of treelog_bg_lock Johannes Thumshirn
  2025-06-24  9:37 ` [PATCH v2 3/3] btrfs: zoned: don't hold space_info lock on zoned allocation Johannes Thumshirn
  2 siblings, 1 reply; 6+ messages in thread
From: Johannes Thumshirn @ 2025-06-24  9:37 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Damien Le Moal, Naohiro Aota, Johannes Thumshirn

From: Johannes Thumshirn <johannes.thumshirn@wdc.com>

Lockstat analysis of benchmark workloads shows a very high contention on
relocation_bg_lock. But the lock only protects a single field in
'struct btrfs_fs_info', namely 'u64 data_reloc_bg'.

Use READ_ONCE()/WRITE_ONCE() to access 'btrfs_fs_info::data_reloc_bg'.

Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
 fs/btrfs/disk-io.c     |  1 -
 fs/btrfs/extent-tree.c | 34 ++++++++++++----------------------
 fs/btrfs/fs.h          |  6 +-----
 fs/btrfs/zoned.c       | 10 ++++------
 4 files changed, 17 insertions(+), 34 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index ee3cdd7035cc..24896322376d 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2791,7 +2791,6 @@ void btrfs_init_fs_info(struct btrfs_fs_info *fs_info)
 	spin_lock_init(&fs_info->unused_bgs_lock);
 	spin_lock_init(&fs_info->treelog_bg_lock);
 	spin_lock_init(&fs_info->zone_active_bgs_lock);
-	spin_lock_init(&fs_info->relocation_bg_lock);
 	rwlock_init(&fs_info->tree_mod_log_lock);
 	rwlock_init(&fs_info->global_root_lock);
 	mutex_init(&fs_info->unused_bg_unpin_mutex);
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 10f50c725313..80ceca89a9ef 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3842,7 +3842,6 @@ static int do_allocation_zoned(struct btrfs_block_group *block_group,
 	u64 avail;
 	u64 bytenr = block_group->start;
 	u64 log_bytenr;
-	u64 data_reloc_bytenr;
 	int ret = 0;
 	bool skip = false;
 
@@ -3865,14 +3864,9 @@ static int do_allocation_zoned(struct btrfs_block_group *block_group,
 	 * Do not allow non-relocation blocks in the dedicated relocation block
 	 * group, and vice versa.
 	 */
-	spin_lock(&fs_info->relocation_bg_lock);
-	data_reloc_bytenr = fs_info->data_reloc_bg;
-	if (data_reloc_bytenr &&
-	    ((ffe_ctl->for_data_reloc && bytenr != data_reloc_bytenr) ||
-	     (!ffe_ctl->for_data_reloc && bytenr == data_reloc_bytenr)))
-		skip = true;
-	spin_unlock(&fs_info->relocation_bg_lock);
-	if (skip)
+	if (READ_ONCE(fs_info->data_reloc_bg) &&
+	    ((ffe_ctl->for_data_reloc && bytenr != READ_ONCE(fs_info->data_reloc_bg)) ||
+	     (!ffe_ctl->for_data_reloc && bytenr == READ_ONCE(fs_info->data_reloc_bg))))
 		return 1;
 
 	/* Check RO and no space case before trying to activate it */
@@ -3899,7 +3893,6 @@ static int do_allocation_zoned(struct btrfs_block_group *block_group,
 	spin_lock(&space_info->lock);
 	spin_lock(&block_group->lock);
 	spin_lock(&fs_info->treelog_bg_lock);
-	spin_lock(&fs_info->relocation_bg_lock);
 
 	if (ret)
 		goto out;
@@ -3908,8 +3901,8 @@ static int do_allocation_zoned(struct btrfs_block_group *block_group,
 	       block_group->start == fs_info->treelog_bg ||
 	       fs_info->treelog_bg == 0);
 	ASSERT(!ffe_ctl->for_data_reloc ||
-	       block_group->start == fs_info->data_reloc_bg ||
-	       fs_info->data_reloc_bg == 0);
+	       block_group->start == READ_ONCE(fs_info->data_reloc_bg) ||
+	       READ_ONCE(fs_info->data_reloc_bg) == 0);
 
 	if (block_group->ro ||
 	    (!ffe_ctl->for_data_reloc &&
@@ -3932,7 +3925,7 @@ static int do_allocation_zoned(struct btrfs_block_group *block_group,
 	 * Do not allow currently used block group to be the data relocation
 	 * dedicated block group.
 	 */
-	if (ffe_ctl->for_data_reloc && !fs_info->data_reloc_bg &&
+	if (ffe_ctl->for_data_reloc && READ_ONCE(fs_info->data_reloc_bg) == 0 &&
 	    (block_group->used || block_group->reserved)) {
 		ret = 1;
 		goto out;
@@ -3957,8 +3950,8 @@ static int do_allocation_zoned(struct btrfs_block_group *block_group,
 		fs_info->treelog_bg = block_group->start;
 
 	if (ffe_ctl->for_data_reloc) {
-		if (!fs_info->data_reloc_bg)
-			fs_info->data_reloc_bg = block_group->start;
+		if (READ_ONCE(fs_info->data_reloc_bg) == 0)
+			WRITE_ONCE(fs_info->data_reloc_bg, block_group->start);
 		/*
 		 * Do not allow allocations from this block group, unless it is
 		 * for data relocation. Compared to increasing the ->ro, setting
@@ -3994,8 +3987,7 @@ static int do_allocation_zoned(struct btrfs_block_group *block_group,
 	if (ret && ffe_ctl->for_treelog)
 		fs_info->treelog_bg = 0;
 	if (ret && ffe_ctl->for_data_reloc)
-		fs_info->data_reloc_bg = 0;
-	spin_unlock(&fs_info->relocation_bg_lock);
+		WRITE_ONCE(fs_info->data_reloc_bg, 0);
 	spin_unlock(&fs_info->treelog_bg_lock);
 	spin_unlock(&block_group->lock);
 	spin_unlock(&space_info->lock);
@@ -4303,11 +4295,9 @@ static int prepare_allocation_zoned(struct btrfs_fs_info *fs_info,
 		if (fs_info->treelog_bg)
 			ffe_ctl->hint_byte = fs_info->treelog_bg;
 		spin_unlock(&fs_info->treelog_bg_lock);
-	} else if (ffe_ctl->for_data_reloc) {
-		spin_lock(&fs_info->relocation_bg_lock);
-		if (fs_info->data_reloc_bg)
-			ffe_ctl->hint_byte = fs_info->data_reloc_bg;
-		spin_unlock(&fs_info->relocation_bg_lock);
+	} else if (ffe_ctl->for_data_reloc &&
+		   READ_ONCE(fs_info->data_reloc_bg)) {
+			ffe_ctl->hint_byte = READ_ONCE(fs_info->data_reloc_bg);
 	} else if (ffe_ctl->flags & BTRFS_BLOCK_GROUP_DATA) {
 		struct btrfs_block_group *block_group;
 
diff --git a/fs/btrfs/fs.h b/fs/btrfs/fs.h
index b239e4b8421c..570f4b85096c 100644
--- a/fs/btrfs/fs.h
+++ b/fs/btrfs/fs.h
@@ -849,11 +849,7 @@ struct btrfs_fs_info {
 	spinlock_t treelog_bg_lock;
 	u64 treelog_bg;
 
-	/*
-	 * Start of the dedicated data relocation block group, protected by
-	 * relocation_bg_lock.
-	 */
-	spinlock_t relocation_bg_lock;
+	/* Start of the dedicated data relocation block group */
 	u64 data_reloc_bg;
 	struct mutex zoned_data_reloc_io_lock;
 
diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
index bd987c90a05c..421afdb6eb39 100644
--- a/fs/btrfs/zoned.c
+++ b/fs/btrfs/zoned.c
@@ -2496,10 +2496,8 @@ void btrfs_clear_data_reloc_bg(struct btrfs_block_group *bg)
 {
 	struct btrfs_fs_info *fs_info = bg->fs_info;
 
-	spin_lock(&fs_info->relocation_bg_lock);
-	if (fs_info->data_reloc_bg == bg->start)
-		fs_info->data_reloc_bg = 0;
-	spin_unlock(&fs_info->relocation_bg_lock);
+	if (READ_ONCE(fs_info->data_reloc_bg) == bg->start)
+		WRITE_ONCE(fs_info->data_reloc_bg, 0);
 }
 
 void btrfs_zoned_reserve_data_reloc_bg(struct btrfs_fs_info *fs_info)
@@ -2518,7 +2516,7 @@ void btrfs_zoned_reserve_data_reloc_bg(struct btrfs_fs_info *fs_info)
 	if (!btrfs_is_zoned(fs_info))
 		return;
 
-	if (fs_info->data_reloc_bg)
+	if (READ_ONCE(fs_info->data_reloc_bg))
 		return;
 
 	if (sb_rdonly(fs_info->sb))
@@ -2539,7 +2537,7 @@ void btrfs_zoned_reserve_data_reloc_bg(struct btrfs_fs_info *fs_info)
 			continue;
 		}
 
-		fs_info->data_reloc_bg = bg->start;
+		WRITE_ONCE(fs_info->data_reloc_bg, bg->start);
 		set_bit(BLOCK_GROUP_FLAG_ZONED_DATA_RELOC, &bg->runtime_flags);
 		btrfs_zone_activate(bg);
 
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH v2 2/3] btrfs: zoned: get rid of treelog_bg_lock
  2025-06-24  9:37 [PATCH v2 0/3] btrfs: zoned: reduce lock contention in zoned extent allocator Johannes Thumshirn
  2025-06-24  9:37 ` [PATCH v2 1/3] btrfs: zoned: get rid of relocation_bg_lock Johannes Thumshirn
@ 2025-06-24  9:37 ` Johannes Thumshirn
  2025-06-24  9:37 ` [PATCH v2 3/3] btrfs: zoned: don't hold space_info lock on zoned allocation Johannes Thumshirn
  2 siblings, 0 replies; 6+ messages in thread
From: Johannes Thumshirn @ 2025-06-24  9:37 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Damien Le Moal, Naohiro Aota, Johannes Thumshirn

From: Johannes Thumshirn <johannes.thumshirn@wdc.com>

Lockstat analysis of benchmark workloads shows a very high contention on
treelog_bg_lock. But the lock only protects a single field in
'struct btrfs_fs_info', namely 'u64 treelog_bg'.

Use READ_ONCE()/WRITE_ONCE() to access 'btrfs_fs_info::treelog_bg'.

Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
 fs/btrfs/disk-io.c     |  1 -
 fs/btrfs/extent-tree.c | 49 ++++++++++--------------------------------
 fs/btrfs/fs.h          |  1 -
 fs/btrfs/zoned.c       |  2 +-
 fs/btrfs/zoned.h       |  6 ++----
 5 files changed, 14 insertions(+), 45 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 24896322376d..a54218717cb4 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2789,7 +2789,6 @@ void btrfs_init_fs_info(struct btrfs_fs_info *fs_info)
 	spin_lock_init(&fs_info->defrag_inodes_lock);
 	spin_lock_init(&fs_info->super_lock);
 	spin_lock_init(&fs_info->unused_bgs_lock);
-	spin_lock_init(&fs_info->treelog_bg_lock);
 	spin_lock_init(&fs_info->zone_active_bgs_lock);
 	rwlock_init(&fs_info->tree_mod_log_lock);
 	rwlock_init(&fs_info->global_root_lock);
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 80ceca89a9ef..39d3984153a2 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3809,22 +3809,6 @@ static int do_allocation_clustered(struct btrfs_block_group *block_group,
 	return find_free_extent_unclustered(block_group, ffe_ctl);
 }
 
-/*
- * Tree-log block group locking
- * ============================
- *
- * fs_info::treelog_bg_lock protects the fs_info::treelog_bg which
- * indicates the starting address of a block group, which is reserved only
- * for tree-log metadata.
- *
- * Lock nesting
- * ============
- *
- * space_info::lock
- *   block_group::lock
- *     fs_info::treelog_bg_lock
- */
-
 /*
  * Simple allocator for sequential-only block group. It only allows sequential
  * allocation. No need to play with trees. This function also reserves the
@@ -3841,9 +3825,7 @@ static int do_allocation_zoned(struct btrfs_block_group *block_group,
 	u64 num_bytes = ffe_ctl->num_bytes;
 	u64 avail;
 	u64 bytenr = block_group->start;
-	u64 log_bytenr;
 	int ret = 0;
-	bool skip = false;
 
 	ASSERT(btrfs_is_zoned(block_group->fs_info));
 
@@ -3851,13 +3833,9 @@ static int do_allocation_zoned(struct btrfs_block_group *block_group,
 	 * Do not allow non-tree-log blocks in the dedicated tree-log block
 	 * group, and vice versa.
 	 */
-	spin_lock(&fs_info->treelog_bg_lock);
-	log_bytenr = fs_info->treelog_bg;
-	if (log_bytenr && ((ffe_ctl->for_treelog && bytenr != log_bytenr) ||
-			   (!ffe_ctl->for_treelog && bytenr == log_bytenr)))
-		skip = true;
-	spin_unlock(&fs_info->treelog_bg_lock);
-	if (skip)
+	if (READ_ONCE(fs_info->treelog_bg) &&
+	    ((ffe_ctl->for_treelog && bytenr != READ_ONCE(fs_info->treelog_bg)) ||
+	     (!ffe_ctl->for_treelog && bytenr == READ_ONCE(fs_info->treelog_bg))))
 		return 1;
 
 	/*
@@ -3892,14 +3870,13 @@ static int do_allocation_zoned(struct btrfs_block_group *block_group,
 
 	spin_lock(&space_info->lock);
 	spin_lock(&block_group->lock);
-	spin_lock(&fs_info->treelog_bg_lock);
 
 	if (ret)
 		goto out;
 
 	ASSERT(!ffe_ctl->for_treelog ||
-	       block_group->start == fs_info->treelog_bg ||
-	       fs_info->treelog_bg == 0);
+	       block_group->start == READ_ONCE(fs_info->treelog_bg) ||
+	       READ_ONCE(fs_info->treelog_bg) == 0);
 	ASSERT(!ffe_ctl->for_data_reloc ||
 	       block_group->start == READ_ONCE(fs_info->data_reloc_bg) ||
 	       READ_ONCE(fs_info->data_reloc_bg) == 0);
@@ -3915,7 +3892,7 @@ static int do_allocation_zoned(struct btrfs_block_group *block_group,
 	 * Do not allow currently using block group to be tree-log dedicated
 	 * block group.
 	 */
-	if (ffe_ctl->for_treelog && !fs_info->treelog_bg &&
+	if (ffe_ctl->for_treelog && READ_ONCE(fs_info->treelog_bg) == 0 &&
 	    (block_group->used || block_group->reserved)) {
 		ret = 1;
 		goto out;
@@ -3946,8 +3923,8 @@ static int do_allocation_zoned(struct btrfs_block_group *block_group,
 		goto out;
 	}
 
-	if (ffe_ctl->for_treelog && !fs_info->treelog_bg)
-		fs_info->treelog_bg = block_group->start;
+	if (ffe_ctl->for_treelog && READ_ONCE(fs_info->treelog_bg) == 0)
+		WRITE_ONCE(fs_info->treelog_bg, block_group->start);
 
 	if (ffe_ctl->for_data_reloc) {
 		if (READ_ONCE(fs_info->data_reloc_bg) == 0)
@@ -3985,10 +3962,9 @@ static int do_allocation_zoned(struct btrfs_block_group *block_group,
 
 out:
 	if (ret && ffe_ctl->for_treelog)
-		fs_info->treelog_bg = 0;
+		WRITE_ONCE(fs_info->treelog_bg, 0);
 	if (ret && ffe_ctl->for_data_reloc)
 		WRITE_ONCE(fs_info->data_reloc_bg, 0);
-	spin_unlock(&fs_info->treelog_bg_lock);
 	spin_unlock(&block_group->lock);
 	spin_unlock(&space_info->lock);
 	return ret;
@@ -4290,11 +4266,8 @@ static int prepare_allocation_clustered(struct btrfs_fs_info *fs_info,
 static int prepare_allocation_zoned(struct btrfs_fs_info *fs_info,
 				    struct find_free_extent_ctl *ffe_ctl)
 {
-	if (ffe_ctl->for_treelog) {
-		spin_lock(&fs_info->treelog_bg_lock);
-		if (fs_info->treelog_bg)
-			ffe_ctl->hint_byte = fs_info->treelog_bg;
-		spin_unlock(&fs_info->treelog_bg_lock);
+	if (ffe_ctl->for_treelog && READ_ONCE(fs_info->treelog_bg)) {
+			ffe_ctl->hint_byte = READ_ONCE(fs_info->treelog_bg);
 	} else if (ffe_ctl->for_data_reloc &&
 		   READ_ONCE(fs_info->data_reloc_bg)) {
 			ffe_ctl->hint_byte = READ_ONCE(fs_info->data_reloc_bg);
diff --git a/fs/btrfs/fs.h b/fs/btrfs/fs.h
index 570f4b85096c..a388af40a251 100644
--- a/fs/btrfs/fs.h
+++ b/fs/btrfs/fs.h
@@ -846,7 +846,6 @@ struct btrfs_fs_info {
 	u64 max_zone_append_size;
 
 	struct mutex zoned_meta_io_lock;
-	spinlock_t treelog_bg_lock;
 	u64 treelog_bg;
 
 	/* Start of the dedicated data relocation block group */
diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
index 421afdb6eb39..98f483def77d 100644
--- a/fs/btrfs/zoned.c
+++ b/fs/btrfs/zoned.c
@@ -1948,7 +1948,7 @@ static bool check_bg_is_active(struct btrfs_eb_write_context *ctx,
 	if (test_bit(BLOCK_GROUP_FLAG_ZONE_IS_ACTIVE, &block_group->runtime_flags))
 		return true;
 
-	if (fs_info->treelog_bg == block_group->start) {
+	if (READ_ONCE(fs_info->treelog_bg) == block_group->start) {
 		if (!btrfs_zone_activate(block_group)) {
 			int ret_fin = btrfs_zone_finish_one_bg(fs_info);
 
diff --git a/fs/btrfs/zoned.h b/fs/btrfs/zoned.h
index 6e11533b8e14..f71026b6c100 100644
--- a/fs/btrfs/zoned.h
+++ b/fs/btrfs/zoned.h
@@ -387,10 +387,8 @@ static inline void btrfs_clear_treelog_bg(struct btrfs_block_group *bg)
 	if (!btrfs_is_zoned(fs_info))
 		return;
 
-	spin_lock(&fs_info->treelog_bg_lock);
-	if (fs_info->treelog_bg == bg->start)
-		fs_info->treelog_bg = 0;
-	spin_unlock(&fs_info->treelog_bg_lock);
+	if (READ_ONCE(fs_info->treelog_bg) == bg->start)
+		WRITE_ONCE(fs_info->treelog_bg, 0);
 }
 
 static inline void btrfs_zoned_data_reloc_lock(struct btrfs_inode *inode)
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH v2 3/3] btrfs: zoned: don't hold space_info lock on zoned allocation
  2025-06-24  9:37 [PATCH v2 0/3] btrfs: zoned: reduce lock contention in zoned extent allocator Johannes Thumshirn
  2025-06-24  9:37 ` [PATCH v2 1/3] btrfs: zoned: get rid of relocation_bg_lock Johannes Thumshirn
  2025-06-24  9:37 ` [PATCH v2 2/3] btrfs: zoned: get rid of treelog_bg_lock Johannes Thumshirn
@ 2025-06-24  9:37 ` Johannes Thumshirn
  2 siblings, 0 replies; 6+ messages in thread
From: Johannes Thumshirn @ 2025-06-24  9:37 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Damien Le Moal, Naohiro Aota, Johannes Thumshirn

From: Johannes Thumshirn <johannes.thumshirn@wdc.com>

The zoned extent allocator holds 'struct btrfs_space_info::lock' nearly
over the entirety of the allocation process, but nothing in
do_allocation_zoned() is actually accessing fields of 'struct
btrfs_space_info'.

Furthermore taking lock_stat snapshots in performance testing, always shows
the space_info::lock as the most contented lock in the entire system.

Remove locking the space_info lock during do_allocation_zoned() to reduce
lock contention.

Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
 fs/btrfs/extent-tree.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 39d3984153a2..e8d607877fb4 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3819,7 +3819,6 @@ static int do_allocation_zoned(struct btrfs_block_group *block_group,
 			       struct btrfs_block_group **bg_ret)
 {
 	struct btrfs_fs_info *fs_info = block_group->fs_info;
-	struct btrfs_space_info *space_info = block_group->space_info;
 	struct btrfs_free_space_ctl *ctl = block_group->free_space_ctl;
 	u64 start = block_group->start;
 	u64 num_bytes = ffe_ctl->num_bytes;
@@ -3868,7 +3867,6 @@ static int do_allocation_zoned(struct btrfs_block_group *block_group,
 		 */
 	}
 
-	spin_lock(&space_info->lock);
 	spin_lock(&block_group->lock);
 
 	if (ret)
@@ -3966,7 +3964,6 @@ static int do_allocation_zoned(struct btrfs_block_group *block_group,
 	if (ret && ffe_ctl->for_data_reloc)
 		WRITE_ONCE(fs_info->data_reloc_bg, 0);
 	spin_unlock(&block_group->lock);
-	spin_unlock(&space_info->lock);
 	return ret;
 }
 
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH v2 1/3] btrfs: zoned: get rid of relocation_bg_lock
  2025-06-24  9:37 ` [PATCH v2 1/3] btrfs: zoned: get rid of relocation_bg_lock Johannes Thumshirn
@ 2025-06-24 12:32   ` Naohiro Aota
  2025-06-25 10:28     ` Johannes Thumshirn
  0 siblings, 1 reply; 6+ messages in thread
From: Naohiro Aota @ 2025-06-24 12:32 UTC (permalink / raw)
  To: Johannes Thumshirn, linux-btrfs@vger.kernel.org
  Cc: Damien Le Moal, Naohiro Aota, Johannes Thumshirn

On Tue Jun 24, 2025 at 6:37 PM JST, Johannes Thumshirn wrote:
> From: Johannes Thumshirn <johannes.thumshirn@wdc.com>
>
> Lockstat analysis of benchmark workloads shows a very high contention on
> relocation_bg_lock. But the lock only protects a single field in
> 'struct btrfs_fs_info', namely 'u64 data_reloc_bg'.
>
> Use READ_ONCE()/WRITE_ONCE() to access 'btrfs_fs_info::data_reloc_bg'.
>
> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
> ---
>  fs/btrfs/disk-io.c     |  1 -
>  fs/btrfs/extent-tree.c | 34 ++++++++++++----------------------
>  fs/btrfs/fs.h          |  6 +-----
>  fs/btrfs/zoned.c       | 10 ++++------
>  4 files changed, 17 insertions(+), 34 deletions(-)
>
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index ee3cdd7035cc..24896322376d 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -2791,7 +2791,6 @@ void btrfs_init_fs_info(struct btrfs_fs_info *fs_info)
>  	spin_lock_init(&fs_info->unused_bgs_lock);
>  	spin_lock_init(&fs_info->treelog_bg_lock);
>  	spin_lock_init(&fs_info->zone_active_bgs_lock);
> -	spin_lock_init(&fs_info->relocation_bg_lock);
>  	rwlock_init(&fs_info->tree_mod_log_lock);
>  	rwlock_init(&fs_info->global_root_lock);
>  	mutex_init(&fs_info->unused_bg_unpin_mutex);
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 10f50c725313..80ceca89a9ef 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -3842,7 +3842,6 @@ static int do_allocation_zoned(struct btrfs_block_group *block_group,
>  	u64 avail;
>  	u64 bytenr = block_group->start;
>  	u64 log_bytenr;
> -	u64 data_reloc_bytenr;
>  	int ret = 0;
>  	bool skip = false;
>  
> @@ -3865,14 +3864,9 @@ static int do_allocation_zoned(struct btrfs_block_group *block_group,
>  	 * Do not allow non-relocation blocks in the dedicated relocation block
>  	 * group, and vice versa.
>  	 */
> -	spin_lock(&fs_info->relocation_bg_lock);
> -	data_reloc_bytenr = fs_info->data_reloc_bg;
> -	if (data_reloc_bytenr &&
> -	    ((ffe_ctl->for_data_reloc && bytenr != data_reloc_bytenr) ||
> -	     (!ffe_ctl->for_data_reloc && bytenr == data_reloc_bytenr)))
> -		skip = true;
> -	spin_unlock(&fs_info->relocation_bg_lock);
> -	if (skip)
> +	if (READ_ONCE(fs_info->data_reloc_bg) &&
> +	    ((ffe_ctl->for_data_reloc && bytenr != READ_ONCE(fs_info->data_reloc_bg)) ||
> +	     (!ffe_ctl->for_data_reloc && bytenr == READ_ONCE(fs_info->data_reloc_bg))))
>  		return 1;
>  
>  	/* Check RO and no space case before trying to activate it */
> @@ -3899,7 +3893,6 @@ static int do_allocation_zoned(struct btrfs_block_group *block_group,
>  	spin_lock(&space_info->lock);
>  	spin_lock(&block_group->lock);
>  	spin_lock(&fs_info->treelog_bg_lock);
> -	spin_lock(&fs_info->relocation_bg_lock);
>  
>  	if (ret)
>  		goto out;
> @@ -3908,8 +3901,8 @@ static int do_allocation_zoned(struct btrfs_block_group *block_group,
>  	       block_group->start == fs_info->treelog_bg ||
>  	       fs_info->treelog_bg == 0);
>  	ASSERT(!ffe_ctl->for_data_reloc ||
> -	       block_group->start == fs_info->data_reloc_bg ||
> -	       fs_info->data_reloc_bg == 0);
> +	       block_group->start == READ_ONCE(fs_info->data_reloc_bg) ||
> +	       READ_ONCE(fs_info->data_reloc_bg) == 0);
>  
>  	if (block_group->ro ||
>  	    (!ffe_ctl->for_data_reloc &&
> @@ -3932,7 +3925,7 @@ static int do_allocation_zoned(struct btrfs_block_group *block_group,
>  	 * Do not allow currently used block group to be the data relocation
>  	 * dedicated block group.
>  	 */
> -	if (ffe_ctl->for_data_reloc && !fs_info->data_reloc_bg &&
> +	if (ffe_ctl->for_data_reloc && READ_ONCE(fs_info->data_reloc_bg) == 0 &&
>  	    (block_group->used || block_group->reserved)) {
>  		ret = 1;
>  		goto out;
> @@ -3957,8 +3950,8 @@ static int do_allocation_zoned(struct btrfs_block_group *block_group,
>  		fs_info->treelog_bg = block_group->start;
>  
>  	if (ffe_ctl->for_data_reloc) {
> -		if (!fs_info->data_reloc_bg)
> -			fs_info->data_reloc_bg = block_group->start;
> +		if (READ_ONCE(fs_info->data_reloc_bg) == 0)
> +			WRITE_ONCE(fs_info->data_reloc_bg, block_group->start);

What happens two threads reach here at the same time and each has a
different block group locked? One of them will get an unexpected state.

Apparently, it does not happen because we have the space_info->lock
around, but patch 3 will remove it...

>  		/*
>  		 * Do not allow allocations from this block group, unless it is
>  		 * for data relocation. Compared to increasing the ->ro, setting
> @@ -3994,8 +3987,7 @@ static int do_allocation_zoned(struct btrfs_block_group *block_group,
>  	if (ret && ffe_ctl->for_treelog)
>  		fs_info->treelog_bg = 0;
>  	if (ret && ffe_ctl->for_data_reloc)
> -		fs_info->data_reloc_bg = 0;
> -	spin_unlock(&fs_info->relocation_bg_lock);
> +		WRITE_ONCE(fs_info->data_reloc_bg, 0);
>  	spin_unlock(&fs_info->treelog_bg_lock);
>  	spin_unlock(&block_group->lock);
>  	spin_unlock(&space_info->lock);
> @@ -4303,11 +4295,9 @@ static int prepare_allocation_zoned(struct btrfs_fs_info *fs_info,
>  		if (fs_info->treelog_bg)
>  			ffe_ctl->hint_byte = fs_info->treelog_bg;
>  		spin_unlock(&fs_info->treelog_bg_lock);
> -	} else if (ffe_ctl->for_data_reloc) {
> -		spin_lock(&fs_info->relocation_bg_lock);
> -		if (fs_info->data_reloc_bg)
> -			ffe_ctl->hint_byte = fs_info->data_reloc_bg;
> -		spin_unlock(&fs_info->relocation_bg_lock);
> +	} else if (ffe_ctl->for_data_reloc &&
> +		   READ_ONCE(fs_info->data_reloc_bg)) {
> +			ffe_ctl->hint_byte = READ_ONCE(fs_info->data_reloc_bg);
>  	} else if (ffe_ctl->flags & BTRFS_BLOCK_GROUP_DATA) {
>  		struct btrfs_block_group *block_group;
>  
> diff --git a/fs/btrfs/fs.h b/fs/btrfs/fs.h
> index b239e4b8421c..570f4b85096c 100644
> --- a/fs/btrfs/fs.h
> +++ b/fs/btrfs/fs.h
> @@ -849,11 +849,7 @@ struct btrfs_fs_info {
>  	spinlock_t treelog_bg_lock;
>  	u64 treelog_bg;
>  
> -	/*
> -	 * Start of the dedicated data relocation block group, protected by
> -	 * relocation_bg_lock.
> -	 */
> -	spinlock_t relocation_bg_lock;
> +	/* Start of the dedicated data relocation block group */
>  	u64 data_reloc_bg;
>  	struct mutex zoned_data_reloc_io_lock;
>  
> diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
> index bd987c90a05c..421afdb6eb39 100644
> --- a/fs/btrfs/zoned.c
> +++ b/fs/btrfs/zoned.c
> @@ -2496,10 +2496,8 @@ void btrfs_clear_data_reloc_bg(struct btrfs_block_group *bg)
>  {
>  	struct btrfs_fs_info *fs_info = bg->fs_info;
>  
> -	spin_lock(&fs_info->relocation_bg_lock);
> -	if (fs_info->data_reloc_bg == bg->start)
> -		fs_info->data_reloc_bg = 0;
> -	spin_unlock(&fs_info->relocation_bg_lock);
> +	if (READ_ONCE(fs_info->data_reloc_bg) == bg->start)
> +		WRITE_ONCE(fs_info->data_reloc_bg, 0);
>  }
>  
>  void btrfs_zoned_reserve_data_reloc_bg(struct btrfs_fs_info *fs_info)
> @@ -2518,7 +2516,7 @@ void btrfs_zoned_reserve_data_reloc_bg(struct btrfs_fs_info *fs_info)
>  	if (!btrfs_is_zoned(fs_info))
>  		return;
>  
> -	if (fs_info->data_reloc_bg)
> +	if (READ_ONCE(fs_info->data_reloc_bg))
>  		return;
>  
>  	if (sb_rdonly(fs_info->sb))
> @@ -2539,7 +2537,7 @@ void btrfs_zoned_reserve_data_reloc_bg(struct btrfs_fs_info *fs_info)
>  			continue;
>  		}
>  
> -		fs_info->data_reloc_bg = bg->start;
> +		WRITE_ONCE(fs_info->data_reloc_bg, bg->start);
>  		set_bit(BLOCK_GROUP_FLAG_ZONED_DATA_RELOC, &bg->runtime_flags);
>  		btrfs_zone_activate(bg);
>  

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v2 1/3] btrfs: zoned: get rid of relocation_bg_lock
  2025-06-24 12:32   ` Naohiro Aota
@ 2025-06-25 10:28     ` Johannes Thumshirn
  0 siblings, 0 replies; 6+ messages in thread
From: Johannes Thumshirn @ 2025-06-25 10:28 UTC (permalink / raw)
  To: Naohiro Aota, Johannes Thumshirn, linux-btrfs@vger.kernel.org
  Cc: Damien Le Moal

On 24.06.25 14:32, Naohiro Aota wrote:
> What happens two threads reach here at the same time and each has a
> different block group locked? One of them will get an unexpected state.
> 
> Apparently, it does not happen because we have the space_info->lock
> around, but patch 3 will remove it...

But can this really happen, now that we have dedicated space_infos for 
relocation and treelog?

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2025-06-25 10:28 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-06-24  9:37 [PATCH v2 0/3] btrfs: zoned: reduce lock contention in zoned extent allocator Johannes Thumshirn
2025-06-24  9:37 ` [PATCH v2 1/3] btrfs: zoned: get rid of relocation_bg_lock Johannes Thumshirn
2025-06-24 12:32   ` Naohiro Aota
2025-06-25 10:28     ` Johannes Thumshirn
2025-06-24  9:37 ` [PATCH v2 2/3] btrfs: zoned: get rid of treelog_bg_lock Johannes Thumshirn
2025-06-24  9:37 ` [PATCH v2 3/3] btrfs: zoned: don't hold space_info lock on zoned allocation Johannes Thumshirn

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.