* [PATCH 00/11] btrfs: zoned: split out data relocation space_info
@ 2024-12-05  7:48 Naohiro Aota
  2024-12-05  7:48 ` [PATCH 01/11] btrfs: take btrfs_space_info in btrfs_reserve_data_bytes Naohiro Aota
                   ` (11 more replies)
  0 siblings, 12 replies; 20+ messages in thread
From: Naohiro Aota @ 2024-12-05  7:48 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Naohiro Aota

As discussed in [1], there is a longstanding early ENOSPC issue in zoned
mode, reproducible even with a simple fio script. It also causes blktests
zbd/009 to fail [2].

[1] https://lore.kernel.org/linux-btrfs/cover.1731571240.git.naohiro.aota@wdc.com/
[2] https://github.com/osandov/blktests/issues/150

This series is the second part of the fix for the ENOSPC issue. It
introduces a "space_info sub-space" and uses it to split out a separate
space_info for the data relocation block group.

The current code assumes we have only one space_info for each block group
type (DATA, METADATA, and SYSTEM). We sometimes need multiple space_infos
to manage special block groups.

One example is handling the data relocation block group in zoned mode.
That block group is dedicated to writing relocated data, and we cannot
allocate any regular extent from it, which is enforced in the zoned extent
allocator. However, that block group still belongs to the normal data
space_info. So, when all the normal data block groups are full and there
is some free space in the dedicated block group, the space_info appears to
have free space even though it can no longer allocate a normal extent.
That results in a strange ENOSPC error. We need a separate space_info for
the data relocation block group to represent the situation properly.
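
The accounting mismatch described above can be illustrated with a small
userspace sketch. This is only a toy model under stated assumptions: the
toy_* names and the reloc_only flag are hypothetical and do not exist in
btrfs, and real space_info accounting tracks many more counters.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a block group. */
struct toy_block_group {
	uint64_t free_bytes;
	bool reloc_only;	/* dedicated data relocation block group */
};

/* Free space as one shared data space_info would report it. */
static uint64_t toy_shared_free(const struct toy_block_group *bgs, int nr)
{
	uint64_t total = 0;

	for (int i = 0; i < nr; i++)
		total += bgs[i].free_bytes;
	return total;
}

/* Free space that a regular (non-relocation) write can actually use. */
static uint64_t toy_regular_free(const struct toy_block_group *bgs, int nr)
{
	uint64_t total = 0;

	for (int i = 0; i < nr; i++)
		if (!bgs[i].reloc_only)
			total += bgs[i].free_bytes;
	return total;
}
```

When the two totals differ, a reservation can succeed against the shared
counter while no regular extent can be allocated. Giving the relocation
block group its own space_info makes the counter match what the allocator
can use.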

Naohiro Aota (11):
  btrfs: take btrfs_space_info in btrfs_reserve_data_bytes
  btrfs: take struct btrfs_inode in
    btrfs_free_reserved_data_space_noquota
  btrfs: factor out init_space_info()
  btrfs: spin out do_async_reclaim_data_space()
  btrfs: factor out check_removing_space_info()
  btrfs: introduce space_info argument to btrfs_chunk_alloc
  btrfs: pass space_info for block group creation
  btrfs: introduce btrfs_space_info sub-group
  btrfs: tweak extent/chunk allocation for space_info sub-space
  btrfs: use proper data space_info
  btrfs: reclaim from data sub-space space_info

 fs/btrfs/block-group.c    | 89 ++++++++++++++++++++++++---------------
 fs/btrfs/block-group.h    |  5 ++-
 fs/btrfs/delalloc-space.c | 26 ++++++++----
 fs/btrfs/delalloc-space.h |  3 +-
 fs/btrfs/extent-tree.c    |  5 ++-
 fs/btrfs/inode.c          |  4 +-
 fs/btrfs/relocation.c     |  3 +-
 fs/btrfs/space-info.c     | 84 ++++++++++++++++++++++++------------
 fs/btrfs/space-info.h     | 10 ++++-
 fs/btrfs/sysfs.c          | 16 +++++--
 fs/btrfs/transaction.c    |  2 +-
 fs/btrfs/volumes.c        | 16 ++++---
 fs/btrfs/volumes.h        |  2 +-
 13 files changed, 176 insertions(+), 89 deletions(-)

-- 
2.47.1



* [PATCH 01/11] btrfs: take btrfs_space_info in btrfs_reserve_data_bytes
  2024-12-05  7:48 [PATCH 00/11] btrfs: zoned: split out data relocation space_info Naohiro Aota
@ 2024-12-05  7:48 ` Naohiro Aota
  2024-12-05 23:48   ` Johannes Thumshirn
  2024-12-05  7:48 ` [PATCH 02/11] btrfs: take struct btrfs_inode in btrfs_free_reserved_data_space_noquota Naohiro Aota
                   ` (10 subsequent siblings)
  11 siblings, 1 reply; 20+ messages in thread
From: Naohiro Aota @ 2024-12-05  7:48 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Naohiro Aota

Take struct btrfs_space_info in btrfs_reserve_data_bytes() to allow
reserving data from one of multiple data space_info candidates.

This is preparation for the following commits; there is no functional
change.

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 fs/btrfs/delalloc-space.c |  4 ++--
 fs/btrfs/space-info.c     | 10 +++++-----
 fs/btrfs/space-info.h     |  2 +-
 3 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/delalloc-space.c b/fs/btrfs/delalloc-space.c
index 88e900e5a43d..e9750f96f86c 100644
--- a/fs/btrfs/delalloc-space.c
+++ b/fs/btrfs/delalloc-space.c
@@ -123,7 +123,7 @@ int btrfs_alloc_data_chunk_ondemand(const struct btrfs_inode *inode, u64 bytes)
 	if (btrfs_is_free_space_inode(inode))
 		flush = BTRFS_RESERVE_FLUSH_FREE_SPACE_INODE;
 
-	return btrfs_reserve_data_bytes(fs_info, bytes, flush);
+	return btrfs_reserve_data_bytes(fs_info->data_sinfo, bytes, flush);
 }
 
 int btrfs_check_data_free_space(struct btrfs_inode *inode,
@@ -144,7 +144,7 @@ int btrfs_check_data_free_space(struct btrfs_inode *inode,
 	else if (btrfs_is_free_space_inode(inode))
 		flush = BTRFS_RESERVE_FLUSH_FREE_SPACE_INODE;
 
-	ret = btrfs_reserve_data_bytes(fs_info, len, flush);
+	ret = btrfs_reserve_data_bytes(fs_info->data_sinfo, len, flush);
 	if (ret < 0)
 		return ret;
 
diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
index a341d087567a..2c07871480b6 100644
--- a/fs/btrfs/space-info.c
+++ b/fs/btrfs/space-info.c
@@ -1836,10 +1836,10 @@ int btrfs_reserve_metadata_bytes(struct btrfs_fs_info *fs_info,
  * This will reserve bytes from the data space info.  If there is not enough
  * space then we will attempt to flush space as specified by flush.
  */
-int btrfs_reserve_data_bytes(struct btrfs_fs_info *fs_info, u64 bytes,
+int btrfs_reserve_data_bytes(struct btrfs_space_info *space_info, u64 bytes,
 			     enum btrfs_reserve_flush_enum flush)
 {
-	struct btrfs_space_info *data_sinfo = fs_info->data_sinfo;
+	struct btrfs_fs_info *fs_info = space_info->fs_info;
 	int ret;
 
 	ASSERT(flush == BTRFS_RESERVE_FLUSH_DATA ||
@@ -1847,12 +1847,12 @@ int btrfs_reserve_data_bytes(struct btrfs_fs_info *fs_info, u64 bytes,
 	       flush == BTRFS_RESERVE_NO_FLUSH);
 	ASSERT(!current->journal_info || flush != BTRFS_RESERVE_FLUSH_DATA);
 
-	ret = __reserve_bytes(fs_info, data_sinfo, bytes, flush);
+	ret = __reserve_bytes(fs_info, space_info, bytes, flush);
 	if (ret == -ENOSPC) {
 		trace_btrfs_space_reservation(fs_info, "space_info:enospc",
-					      data_sinfo->flags, bytes, 1);
+					      space_info->flags, bytes, 1);
 		if (btrfs_test_opt(fs_info, ENOSPC_DEBUG))
-			btrfs_dump_space_info(fs_info, data_sinfo, bytes, 0);
+			btrfs_dump_space_info(fs_info, space_info, bytes, 0);
 	}
 	return ret;
 }
diff --git a/fs/btrfs/space-info.h b/fs/btrfs/space-info.h
index a96efdb5e681..7459b4eb99cd 100644
--- a/fs/btrfs/space-info.h
+++ b/fs/btrfs/space-info.h
@@ -288,7 +288,7 @@ static inline void btrfs_space_info_free_bytes_may_use(
 	btrfs_try_granting_tickets(space_info->fs_info, space_info);
 	spin_unlock(&space_info->lock);
 }
-int btrfs_reserve_data_bytes(struct btrfs_fs_info *fs_info, u64 bytes,
+int btrfs_reserve_data_bytes(struct btrfs_space_info *space_info, u64 bytes,
 			     enum btrfs_reserve_flush_enum flush);
 void btrfs_dump_space_info_for_trans_abort(struct btrfs_fs_info *fs_info);
 void btrfs_init_async_reclaim_work(struct btrfs_fs_info *fs_info);
-- 
2.47.1



* [PATCH 02/11] btrfs: take struct btrfs_inode in btrfs_free_reserved_data_space_noquota
  2024-12-05  7:48 [PATCH 00/11] btrfs: zoned: split out data relocation space_info Naohiro Aota
  2024-12-05  7:48 ` [PATCH 01/11] btrfs: take btrfs_space_info in btrfs_reserve_data_bytes Naohiro Aota
@ 2024-12-05  7:48 ` Naohiro Aota
  2024-12-05 23:49   ` Johannes Thumshirn
  2024-12-05  7:48 ` [PATCH 03/11] btrfs: factor out init_space_info() Naohiro Aota
                   ` (9 subsequent siblings)
  11 siblings, 1 reply; 20+ messages in thread
From: Naohiro Aota @ 2024-12-05  7:48 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Naohiro Aota

As in the previous patch, take struct btrfs_inode in
btrfs_free_reserved_data_space_noquota() so that a later patch can let it
distinguish which data space_info it is working on. There is no functional
change in this commit.

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 fs/btrfs/delalloc-space.c | 7 ++++---
 fs/btrfs/delalloc-space.h | 3 +--
 fs/btrfs/inode.c          | 4 ++--
 fs/btrfs/relocation.c     | 3 +--
 4 files changed, 8 insertions(+), 9 deletions(-)

diff --git a/fs/btrfs/delalloc-space.c b/fs/btrfs/delalloc-space.c
index e9750f96f86c..918ba2ab1d5f 100644
--- a/fs/btrfs/delalloc-space.c
+++ b/fs/btrfs/delalloc-space.c
@@ -151,7 +151,7 @@ int btrfs_check_data_free_space(struct btrfs_inode *inode,
 	/* Use new btrfs_qgroup_reserve_data to reserve precious data space. */
 	ret = btrfs_qgroup_reserve_data(inode, reserved, start, len);
 	if (ret < 0) {
-		btrfs_free_reserved_data_space_noquota(fs_info, len);
+		btrfs_free_reserved_data_space_noquota(inode, len);
 		extent_changeset_free(*reserved);
 		*reserved = NULL;
 	} else {
@@ -168,9 +168,10 @@ int btrfs_check_data_free_space(struct btrfs_inode *inode,
  * which we can't sleep and is sure it won't affect qgroup reserved space.
  * Like clear_bit_hook().
  */
-void btrfs_free_reserved_data_space_noquota(struct btrfs_fs_info *fs_info,
+void btrfs_free_reserved_data_space_noquota(struct btrfs_inode *inode,
 					    u64 len)
 {
+	struct btrfs_fs_info *fs_info = inode->root->fs_info;
 	struct btrfs_space_info *data_sinfo;
 
 	ASSERT(IS_ALIGNED(len, fs_info->sectorsize));
@@ -196,7 +197,7 @@ void btrfs_free_reserved_data_space(struct btrfs_inode *inode,
 	      round_down(start, fs_info->sectorsize);
 	start = round_down(start, fs_info->sectorsize);
 
-	btrfs_free_reserved_data_space_noquota(fs_info, len);
+	btrfs_free_reserved_data_space_noquota(inode, len);
 	btrfs_qgroup_free_data(inode, reserved, start, len, NULL);
 }
 
diff --git a/fs/btrfs/delalloc-space.h b/fs/btrfs/delalloc-space.h
index 3f32953c0a80..d582779dac5a 100644
--- a/fs/btrfs/delalloc-space.h
+++ b/fs/btrfs/delalloc-space.h
@@ -18,8 +18,7 @@ void btrfs_free_reserved_data_space(struct btrfs_inode *inode,
 void btrfs_delalloc_release_space(struct btrfs_inode *inode,
 				  struct extent_changeset *reserved,
 				  u64 start, u64 len, bool qgroup_free);
-void btrfs_free_reserved_data_space_noquota(struct btrfs_fs_info *fs_info,
-					    u64 len);
+void btrfs_free_reserved_data_space_noquota(struct btrfs_inode *inode, u64 len);
 void btrfs_delalloc_release_metadata(struct btrfs_inode *inode, u64 num_bytes,
 				     bool qgroup_free);
 int btrfs_delalloc_reserve_space(struct btrfs_inode *inode,
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index c4997200dbb2..e553264eaa0f 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -2556,7 +2556,7 @@ void btrfs_clear_delalloc_extent(struct btrfs_inode *inode,
 		    !btrfs_is_free_space_inode(inode) &&
 		    !(state->state & EXTENT_NORESERVE) &&
 		    (bits & EXTENT_CLEAR_DATA_RESV))
-			btrfs_free_reserved_data_space_noquota(fs_info, len);
+			btrfs_free_reserved_data_space_noquota(inode, len);
 
 		percpu_counter_add_batch(&fs_info->delalloc_bytes, -len,
 					 fs_info->delalloc_batch);
@@ -9644,7 +9644,7 @@ ssize_t btrfs_do_encoded_write(struct kiocb *iocb, struct iov_iter *from,
 	 * bytes_may_use.
 	 */
 	if (!extent_reserved)
-		btrfs_free_reserved_data_space_noquota(fs_info, disk_num_bytes);
+		btrfs_free_reserved_data_space_noquota(inode, disk_num_bytes);
 out_unlock:
 	unlock_extent(io_tree, start, end, &cached_state);
 out_folios:
diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index bf267bdfa8f8..d60e118e88a3 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -2828,8 +2828,7 @@ static noinline_for_stack int prealloc_file_extent_cluster(struct reloc_control
 	btrfs_inode_unlock(inode, 0);
 
 	if (cur_offset < prealloc_end)
-		btrfs_free_reserved_data_space_noquota(inode->root->fs_info,
-					       prealloc_end + 1 - cur_offset);
+		btrfs_free_reserved_data_space_noquota(inode, prealloc_end + 1 - cur_offset);
 	return ret;
 }
 
-- 
2.47.1



* [PATCH 03/11] btrfs: factor out init_space_info()
  2024-12-05  7:48 [PATCH 00/11] btrfs: zoned: split out data relocation space_info Naohiro Aota
  2024-12-05  7:48 ` [PATCH 01/11] btrfs: take btrfs_space_info in btrfs_reserve_data_bytes Naohiro Aota
  2024-12-05  7:48 ` [PATCH 02/11] btrfs: take struct btrfs_inode in btrfs_free_reserved_data_space_noquota Naohiro Aota
@ 2024-12-05  7:48 ` Naohiro Aota
  2024-12-05  7:48 ` [PATCH 04/11] btrfs: spin out do_async_reclaim_data_space() Naohiro Aota
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 20+ messages in thread
From: Naohiro Aota @ 2024-12-05  7:48 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Naohiro Aota

Factor out initialization of the space_info struct, which is used in a later
patch. There is no functional change.

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 fs/btrfs/space-info.c | 27 ++++++++++++++++-----------
 1 file changed, 16 insertions(+), 11 deletions(-)

diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
index 2c07871480b6..782807c926e1 100644
--- a/fs/btrfs/space-info.c
+++ b/fs/btrfs/space-info.c
@@ -234,19 +234,11 @@ void btrfs_update_space_info_chunk_size(struct btrfs_space_info *space_info,
 	WRITE_ONCE(space_info->chunk_size, chunk_size);
 }
 
-static int create_space_info(struct btrfs_fs_info *info, u64 flags)
+static void init_space_info(struct btrfs_fs_info *info,
+			    struct btrfs_space_info *space_info, u64 flags)
 {
-
-	struct btrfs_space_info *space_info;
-	int i;
-	int ret;
-
-	space_info = kzalloc(sizeof(*space_info), GFP_NOFS);
-	if (!space_info)
-		return -ENOMEM;
-
 	space_info->fs_info = info;
-	for (i = 0; i < BTRFS_NR_RAID_TYPES; i++)
+	for (int i = 0; i < BTRFS_NR_RAID_TYPES; i++)
 		INIT_LIST_HEAD(&space_info->block_groups[i]);
 	init_rwsem(&space_info->groups_sem);
 	spin_lock_init(&space_info->lock);
@@ -260,6 +252,19 @@ static int create_space_info(struct btrfs_fs_info *info, u64 flags)
 
 	if (btrfs_is_zoned(info))
 		space_info->bg_reclaim_threshold = BTRFS_DEFAULT_ZONED_RECLAIM_THRESH;
+}
+
+static int create_space_info(struct btrfs_fs_info *info, u64 flags)
+{
+
+	struct btrfs_space_info *space_info;
+	int ret;
+
+	space_info = kzalloc(sizeof(*space_info), GFP_NOFS);
+	if (!space_info)
+		return -ENOMEM;
+
+	init_space_info(info, space_info, flags);
 
 	ret = btrfs_sysfs_add_space_info_type(info, space_info);
 	if (ret)
-- 
2.47.1



* [PATCH 04/11] btrfs: spin out do_async_reclaim_data_space()
  2024-12-05  7:48 [PATCH 00/11] btrfs: zoned: split out data relocation space_info Naohiro Aota
                   ` (2 preceding siblings ...)
  2024-12-05  7:48 ` [PATCH 03/11] btrfs: factor out init_space_info() Naohiro Aota
@ 2024-12-05  7:48 ` Naohiro Aota
  2024-12-05  7:48 ` [PATCH 05/11] btrfs: factor out check_removing_space_info() Naohiro Aota
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 20+ messages in thread
From: Naohiro Aota @ 2024-12-05  7:48 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Naohiro Aota

Factor out the main part of btrfs_async_reclaim_data_space() into
do_async_reclaim_data_space(), so that it can take the data space_info it
works on as a parameter. There is no functional change.
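
The shape of this split can be sketched in plain userspace C. This is a
hedged illustration only: every toy_* name is hypothetical, and the
container_of macro is re-implemented here with offsetof() rather than
taken from the kernel headers.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace re-implementation of the container_of idea (assumption). */
#define toy_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct toy_work { int pending; };
struct toy_space_info { int flush; };

struct toy_fs_info {
	struct toy_space_info *data_sinfo;
	struct toy_work async_data_reclaim_work;
};

/* Worker body: operates on whichever space_info it is handed. */
static void toy_do_reclaim(struct toy_space_info *space_info)
{
	space_info->flush = 0;
}

/* Work callback: recovers fs_info from the work item, then delegates. */
static void toy_reclaim_work_fn(struct toy_work *work)
{
	struct toy_fs_info *fs_info =
		toy_container_of(work, struct toy_fs_info, async_data_reclaim_work);

	toy_do_reclaim(fs_info->data_sinfo);
}
```

With the body separated from the work callback, the same worker can be
called for any other space_info without going through the work item.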

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 fs/btrfs/space-info.c | 18 ++++++++++++------
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
index 782807c926e1..1fb55655f49d 100644
--- a/fs/btrfs/space-info.c
+++ b/fs/btrfs/space-info.c
@@ -1323,16 +1323,12 @@ static const enum btrfs_flush_state data_flush_states[] = {
 	ALLOC_CHUNK_FORCE,
 };
 
-static void btrfs_async_reclaim_data_space(struct work_struct *work)
+static void do_async_reclaim_data_space(struct btrfs_space_info *space_info)
 {
-	struct btrfs_fs_info *fs_info;
-	struct btrfs_space_info *space_info;
+	struct btrfs_fs_info *fs_info = space_info->fs_info;
 	u64 last_tickets_id;
 	enum btrfs_flush_state flush_state = 0;
 
-	fs_info = container_of(work, struct btrfs_fs_info, async_data_reclaim_work);
-	space_info = fs_info->data_sinfo;
-
 	spin_lock(&space_info->lock);
 	if (list_empty(&space_info->tickets)) {
 		space_info->flush = 0;
@@ -1400,6 +1396,16 @@ static void btrfs_async_reclaim_data_space(struct work_struct *work)
 	spin_unlock(&space_info->lock);
 }
 
+static void btrfs_async_reclaim_data_space(struct work_struct *work)
+{
+	struct btrfs_fs_info *fs_info;
+	struct btrfs_space_info *space_info;
+
+	fs_info = container_of(work, struct btrfs_fs_info, async_data_reclaim_work);
+	space_info = fs_info->data_sinfo;
+	do_async_reclaim_data_space(space_info);
+}
+
 void btrfs_init_async_reclaim_work(struct btrfs_fs_info *fs_info)
 {
 	INIT_WORK(&fs_info->async_reclaim_work, btrfs_async_reclaim_metadata_space);
-- 
2.47.1



* [PATCH 05/11] btrfs: factor out check_removing_space_info()
  2024-12-05  7:48 [PATCH 00/11] btrfs: zoned: split out data relocation space_info Naohiro Aota
                   ` (3 preceding siblings ...)
  2024-12-05  7:48 ` [PATCH 04/11] btrfs: spin out do_async_reclaim_data_space() Naohiro Aota
@ 2024-12-05  7:48 ` Naohiro Aota
  2024-12-07 11:29   ` Johannes Thumshirn
  2024-12-05  7:48 ` [PATCH 06/11] btrfs: introduce space_info argument to btrfs_chunk_alloc Naohiro Aota
                   ` (6 subsequent siblings)
  11 siblings, 1 reply; 20+ messages in thread
From: Naohiro Aota @ 2024-12-05  7:48 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Naohiro Aota

Factor out check_removing_space_info() from btrfs_free_block_groups(). It
sanity checks a to-be-removed space_info. There is no functional change.
---
 fs/btrfs/block-group.c | 51 ++++++++++++++++++++++++------------------
 1 file changed, 29 insertions(+), 22 deletions(-)

diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index 5be029734cfa..4b8071a8d795 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -4371,6 +4371,34 @@ void btrfs_put_block_group_cache(struct btrfs_fs_info *info)
 	}
 }
 
+static void check_removing_space_info(struct btrfs_space_info *space_info)
+{
+	struct btrfs_fs_info *info = space_info->fs_info;
+
+	/*
+	 * Do not hide this behind enospc_debug, this is actually
+	 * important and indicates a real bug if this happens.
+	 */
+	if (WARN_ON(space_info->bytes_pinned > 0 ||
+		    space_info->bytes_may_use > 0))
+		btrfs_dump_space_info(info, space_info, 0, 0);
+
+	/*
+	 * If there was a failure to cleanup a log tree, very likely due
+	 * to an IO failure on a writeback attempt of one or more of its
+	 * extent buffers, we could not do proper (and cheap) unaccounting
+	 * of their reserved space, so don't warn on bytes_reserved > 0 in
+	 * that case.
+	 */
+	if (!(space_info->flags & BTRFS_BLOCK_GROUP_METADATA) ||
+	    !BTRFS_FS_LOG_CLEANUP_ERROR(info)) {
+		if (WARN_ON(space_info->bytes_reserved > 0))
+			btrfs_dump_space_info(info, space_info, 0, 0);
+	}
+
+	WARN_ON(space_info->reclaim_size > 0);
+}
+
 /*
  * Must be called only after stopping all workers, since we could have block
  * group caching kthreads running, and therefore they could race with us if we
@@ -4472,28 +4500,7 @@ int btrfs_free_block_groups(struct btrfs_fs_info *info)
 					struct btrfs_space_info,
 					list);
 
-		/*
-		 * Do not hide this behind enospc_debug, this is actually
-		 * important and indicates a real bug if this happens.
-		 */
-		if (WARN_ON(space_info->bytes_pinned > 0 ||
-			    space_info->bytes_may_use > 0))
-			btrfs_dump_space_info(info, space_info, 0, 0);
-
-		/*
-		 * If there was a failure to cleanup a log tree, very likely due
-		 * to an IO failure on a writeback attempt of one or more of its
-		 * extent buffers, we could not do proper (and cheap) unaccounting
-		 * of their reserved space, so don't warn on bytes_reserved > 0 in
-		 * that case.
-		 */
-		if (!(space_info->flags & BTRFS_BLOCK_GROUP_METADATA) ||
-		    !BTRFS_FS_LOG_CLEANUP_ERROR(info)) {
-			if (WARN_ON(space_info->bytes_reserved > 0))
-				btrfs_dump_space_info(info, space_info, 0, 0);
-		}
-
-		WARN_ON(space_info->reclaim_size > 0);
+		check_removing_space_info(space_info);
 		list_del(&space_info->list);
 		btrfs_sysfs_remove_space_info(space_info);
 	}
-- 
2.47.1



* [PATCH 06/11] btrfs: introduce space_info argument to btrfs_chunk_alloc
  2024-12-05  7:48 [PATCH 00/11] btrfs: zoned: split out data relocation space_info Naohiro Aota
                   ` (4 preceding siblings ...)
  2024-12-05  7:48 ` [PATCH 05/11] btrfs: factor out check_removing_space_info() Naohiro Aota
@ 2024-12-05  7:48 ` Naohiro Aota
  2024-12-05  7:48 ` [PATCH 07/11] btrfs: pass space_info for block group creation Naohiro Aota
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 20+ messages in thread
From: Naohiro Aota @ 2024-12-05  7:48 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Naohiro Aota

Take an optional btrfs_space_info argument in btrfs_chunk_alloc(). If it
is specified, btrfs_chunk_alloc() works on that space_info. If it is NULL,
the default space_info for the given flags is used, as before.
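
The optional-argument convention can be sketched in userspace C as
follows. This is a minimal model under stated assumptions: the toy_* names,
the flag values, and the nr_chunks counter are hypothetical and only stand
in for the lookup by flags and the chunk allocation.

```c
#include <assert.h>
#include <stddef.h>

struct toy_space_info {
	unsigned long flags;
	int nr_chunks;
};

#define TOY_NR_TYPES 3

/* One default space_info per type, found by its flags (hypothetical). */
static struct toy_space_info toy_defaults[TOY_NR_TYPES] = {
	{ .flags = 1 }, { .flags = 2 }, { .flags = 4 },
};

static struct toy_space_info *toy_find_space_info(unsigned long flags)
{
	for (int i = 0; i < TOY_NR_TYPES; i++)
		if (toy_defaults[i].flags & flags)
			return &toy_defaults[i];
	return NULL;
}

/* A NULL space_info means "look up the default one from flags". */
static struct toy_space_info *toy_chunk_alloc(struct toy_space_info *space_info,
					      unsigned long flags)
{
	if (!space_info)
		space_info = toy_find_space_info(flags);
	assert(space_info);

	space_info->nr_chunks++;
	return space_info;
}
```

Existing callers keep their behavior by passing NULL, while a caller that
already holds a specific space_info can direct the new chunk to it.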

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 fs/btrfs/block-group.c | 19 ++++++++++++-------
 fs/btrfs/block-group.h |  3 ++-
 fs/btrfs/extent-tree.c |  2 +-
 fs/btrfs/space-info.c  |  2 +-
 fs/btrfs/transaction.c |  2 +-
 5 files changed, 17 insertions(+), 11 deletions(-)

diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index 4b8071a8d795..ad78c8f1d381 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -2985,7 +2985,7 @@ int btrfs_inc_block_group_ro(struct btrfs_block_group *cache,
 		 */
 		alloc_flags = btrfs_get_alloc_profile(fs_info, cache->flags);
 		if (alloc_flags != cache->flags) {
-			ret = btrfs_chunk_alloc(trans, alloc_flags,
+			ret = btrfs_chunk_alloc(trans, NULL, alloc_flags,
 						CHUNK_ALLOC_FORCE);
 			/*
 			 * ENOSPC is allowed here, we may have enough space
@@ -3014,7 +3014,7 @@ int btrfs_inc_block_group_ro(struct btrfs_block_group *cache,
 		goto unlock_out;
 
 	alloc_flags = btrfs_get_alloc_profile(fs_info, cache->space_info->flags);
-	ret = btrfs_chunk_alloc(trans, alloc_flags, CHUNK_ALLOC_FORCE);
+	ret = btrfs_chunk_alloc(trans, NULL, alloc_flags, CHUNK_ALLOC_FORCE);
 	if (ret < 0)
 		goto out;
 	/*
@@ -3870,7 +3870,7 @@ int btrfs_force_chunk_alloc(struct btrfs_trans_handle *trans, u64 type)
 {
 	u64 alloc_flags = btrfs_get_alloc_profile(trans->fs_info, type);
 
-	return btrfs_chunk_alloc(trans, alloc_flags, CHUNK_ALLOC_FORCE);
+	return btrfs_chunk_alloc(trans, NULL, alloc_flags, CHUNK_ALLOC_FORCE);
 }
 
 static struct btrfs_block_group *do_chunk_alloc(struct btrfs_trans_handle *trans, u64 flags)
@@ -4073,12 +4073,15 @@ static struct btrfs_block_group *do_chunk_alloc(struct btrfs_trans_handle *trans
  *    - return 0 if it doesn't need to allocate a new chunk,
  *    - return 1 if it successfully allocates a chunk,
  *    - return errors including -ENOSPC otherwise.
+ *
+ * @space_info can optionally be specified to make a new chunk belong to it. If
+ * it is NULL, it is set automatically.
  */
-int btrfs_chunk_alloc(struct btrfs_trans_handle *trans, u64 flags,
+int btrfs_chunk_alloc(struct btrfs_trans_handle *trans,
+		      struct btrfs_space_info *space_info, u64 flags,
 		      enum btrfs_chunk_alloc_enum force)
 {
 	struct btrfs_fs_info *fs_info = trans->fs_info;
-	struct btrfs_space_info *space_info;
 	struct btrfs_block_group *ret_bg;
 	bool wait_for_alloc = false;
 	bool should_alloc = false;
@@ -4117,8 +4120,10 @@ int btrfs_chunk_alloc(struct btrfs_trans_handle *trans, u64 flags,
 	if (flags & BTRFS_BLOCK_GROUP_SYSTEM)
 		return -ENOSPC;
 
-	space_info = btrfs_find_space_info(fs_info, flags);
-	ASSERT(space_info);
+	if (!space_info) {
+		space_info = btrfs_find_space_info(fs_info, flags);
+		ASSERT(space_info);
+	}
 
 	do {
 		spin_lock(&space_info->lock);
diff --git a/fs/btrfs/block-group.h b/fs/btrfs/block-group.h
index 36937eeab9b8..c01f3af726a1 100644
--- a/fs/btrfs/block-group.h
+++ b/fs/btrfs/block-group.h
@@ -342,7 +342,8 @@ int btrfs_add_reserved_bytes(struct btrfs_block_group *cache,
 			     bool force_wrong_size_class);
 void btrfs_free_reserved_bytes(struct btrfs_block_group *cache,
 			       u64 num_bytes, int delalloc);
-int btrfs_chunk_alloc(struct btrfs_trans_handle *trans, u64 flags,
+int btrfs_chunk_alloc(struct btrfs_trans_handle *trans,
+		      struct btrfs_space_info *space_info, u64 flags,
 		      enum btrfs_chunk_alloc_enum force);
 int btrfs_force_chunk_alloc(struct btrfs_trans_handle *trans, u64 type);
 void check_system_chunk(struct btrfs_trans_handle *trans, const u64 type);
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 2f9126528a01..334a1701ff33 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -4118,7 +4118,7 @@ static int find_free_extent_update_loop(struct btrfs_fs_info *fs_info,
 				return ret;
 			}
 
-			ret = btrfs_chunk_alloc(trans, ffe_ctl->flags,
+			ret = btrfs_chunk_alloc(trans, NULL, ffe_ctl->flags,
 						CHUNK_ALLOC_FORCE_FOR_EXTENT);
 
 			/* Do not bail out on ENOSPC since we can do more. */
diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
index 1fb55655f49d..1d0f0c9d8956 100644
--- a/fs/btrfs/space-info.c
+++ b/fs/btrfs/space-info.c
@@ -817,7 +817,7 @@ static void flush_space(struct btrfs_fs_info *fs_info,
 			ret = PTR_ERR(trans);
 			break;
 		}
-		ret = btrfs_chunk_alloc(trans,
+		ret = btrfs_chunk_alloc(trans, space_info,
 				btrfs_get_alloc_profile(fs_info, space_info->flags),
 				(state == ALLOC_CHUNK) ? CHUNK_ALLOC_NO_FORCE :
 					CHUNK_ALLOC_FORCE);
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 15312013f2a3..e5852316f0b6 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -755,7 +755,7 @@ start_transaction(struct btrfs_root *root, unsigned int num_items,
 	if (do_chunk_alloc && num_bytes) {
 		u64 flags = h->block_rsv->space_info->flags;
 
-		btrfs_chunk_alloc(h, btrfs_get_alloc_profile(fs_info, flags),
+		btrfs_chunk_alloc(h, NULL, btrfs_get_alloc_profile(fs_info, flags),
 				  CHUNK_ALLOC_NO_FORCE);
 	}
 
-- 
2.47.1



* [PATCH 07/11] btrfs: pass space_info for block group creation
  2024-12-05  7:48 [PATCH 00/11] btrfs: zoned: split out data relocation space_info Naohiro Aota
                   ` (5 preceding siblings ...)
  2024-12-05  7:48 ` [PATCH 06/11] btrfs: introduce space_info argument to btrfs_chunk_alloc Naohiro Aota
@ 2024-12-05  7:48 ` Naohiro Aota
  2024-12-05  7:48 ` [PATCH 08/11] btrfs: introduce btrfs_space_info sub-group Naohiro Aota
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 20+ messages in thread
From: Naohiro Aota @ 2024-12-05  7:48 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Naohiro Aota

Add a btrfs_space_info parameter to btrfs_make_block_group(), its related
functions, and the related struct. The new block group is added to the
passed space_info. If NULL is passed, the default space_info is used.

The parameter is used in a later commit; the behavior is unchanged for now.

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 fs/btrfs/block-group.c | 15 ++++++++-------
 fs/btrfs/block-group.h |  2 +-
 fs/btrfs/volumes.c     | 16 +++++++++++-----
 fs/btrfs/volumes.h     |  2 +-
 4 files changed, 21 insertions(+), 14 deletions(-)

diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index ad78c8f1d381..6787f7034b9e 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -2833,7 +2833,7 @@ static u64 calculate_global_root_id(const struct btrfs_fs_info *fs_info, u64 off
 }
 
 struct btrfs_block_group *btrfs_make_block_group(struct btrfs_trans_handle *trans,
-						 u64 type,
+						 struct btrfs_space_info *space_info, u64 type,
 						 u64 chunk_offset, u64 size)
 {
 	struct btrfs_fs_info *fs_info = trans->fs_info;
@@ -2888,7 +2888,7 @@ struct btrfs_block_group *btrfs_make_block_group(struct btrfs_trans_handle *tran
 	 * assigned to our block group. We want our bg to be added to the rbtree
 	 * with its ->space_info set.
 	 */
-	cache->space_info = btrfs_find_space_info(fs_info, cache->flags);
+	cache->space_info = space_info;
 	ASSERT(cache->space_info);
 
 	ret = btrfs_add_block_group_cache(fs_info, cache);
@@ -3873,7 +3873,8 @@ int btrfs_force_chunk_alloc(struct btrfs_trans_handle *trans, u64 type)
 	return btrfs_chunk_alloc(trans, NULL, alloc_flags, CHUNK_ALLOC_FORCE);
 }
 
-static struct btrfs_block_group *do_chunk_alloc(struct btrfs_trans_handle *trans, u64 flags)
+static struct btrfs_block_group *do_chunk_alloc(struct btrfs_trans_handle *trans,
+						struct btrfs_space_info *space_info, u64 flags)
 {
 	struct btrfs_block_group *bg;
 	int ret;
@@ -3886,7 +3887,7 @@ static struct btrfs_block_group *do_chunk_alloc(struct btrfs_trans_handle *trans
 	 */
 	check_system_chunk(trans, flags);
 
-	bg = btrfs_create_chunk(trans, flags);
+	bg = btrfs_create_chunk(trans, space_info, flags);
 	if (IS_ERR(bg)) {
 		ret = PTR_ERR(bg);
 		goto out;
@@ -3935,7 +3936,7 @@ static struct btrfs_block_group *do_chunk_alloc(struct btrfs_trans_handle *trans
 		const u64 sys_flags = btrfs_system_alloc_profile(trans->fs_info);
 		struct btrfs_block_group *sys_bg;
 
-		sys_bg = btrfs_create_chunk(trans, sys_flags);
+		sys_bg = btrfs_create_chunk(trans, NULL, sys_flags);
 		if (IS_ERR(sys_bg)) {
 			ret = PTR_ERR(sys_bg);
 			btrfs_abort_transaction(trans, ret);
@@ -4185,7 +4186,7 @@ int btrfs_chunk_alloc(struct btrfs_trans_handle *trans,
 			force_metadata_allocation(fs_info);
 	}
 
-	ret_bg = do_chunk_alloc(trans, flags);
+	ret_bg = do_chunk_alloc(trans, space_info, flags);
 	trans->allocating_chunk = false;
 
 	if (IS_ERR(ret_bg)) {
@@ -4268,7 +4269,7 @@ static void reserve_chunk_space(struct btrfs_trans_handle *trans,
 		 * the paths we visit in the chunk tree (they were already COWed
 		 * or created in the current transaction for example).
 		 */
-		bg = btrfs_create_chunk(trans, flags);
+		bg = btrfs_create_chunk(trans, NULL, flags);
 		if (IS_ERR(bg)) {
 			ret = PTR_ERR(bg);
 		} else {
diff --git a/fs/btrfs/block-group.h b/fs/btrfs/block-group.h
index c01f3af726a1..cb9b0405172c 100644
--- a/fs/btrfs/block-group.h
+++ b/fs/btrfs/block-group.h
@@ -326,7 +326,7 @@ void btrfs_reclaim_bgs(struct btrfs_fs_info *fs_info);
 void btrfs_mark_bg_to_reclaim(struct btrfs_block_group *bg);
 int btrfs_read_block_groups(struct btrfs_fs_info *info);
 struct btrfs_block_group *btrfs_make_block_group(struct btrfs_trans_handle *trans,
-						 u64 type,
+						 struct btrfs_space_info *space_info, u64 type,
 						 u64 chunk_offset, u64 size);
 void btrfs_create_pending_block_groups(struct btrfs_trans_handle *trans);
 int btrfs_inc_block_group_ro(struct btrfs_block_group *cache,
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 1cccaf9c2b0d..d1f3068377aa 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -3403,7 +3403,7 @@ int btrfs_remove_chunk(struct btrfs_trans_handle *trans, u64 chunk_offset)
 		const u64 sys_flags = btrfs_system_alloc_profile(fs_info);
 		struct btrfs_block_group *sys_bg;
 
-		sys_bg = btrfs_create_chunk(trans, sys_flags);
+		sys_bg = btrfs_create_chunk(trans, NULL, sys_flags);
 		if (IS_ERR(sys_bg)) {
 			ret = PTR_ERR(sys_bg);
 			btrfs_abort_transaction(trans, ret);
@@ -5202,6 +5202,8 @@ struct alloc_chunk_ctl {
 	u64 stripe_size;
 	u64 chunk_size;
 	int ndevs;
+	/* Space_info the block group is going to belong to. */
+	struct btrfs_space_info *space_info;
 };
 
 static void init_alloc_chunk_ctl_policy_regular(
@@ -5603,7 +5605,7 @@ static struct btrfs_block_group *create_chunk(struct btrfs_trans_handle *trans,
 		return ERR_PTR(ret);
 	}
 
-	block_group = btrfs_make_block_group(trans, type, start, ctl->chunk_size);
+	block_group = btrfs_make_block_group(trans, ctl->space_info, type, start, ctl->chunk_size);
 	if (IS_ERR(block_group)) {
 		btrfs_remove_chunk_map(info, map);
 		return block_group;
@@ -5629,7 +5631,7 @@ static struct btrfs_block_group *create_chunk(struct btrfs_trans_handle *trans,
 }
 
 struct btrfs_block_group *btrfs_create_chunk(struct btrfs_trans_handle *trans,
-					    u64 type)
+					     struct btrfs_space_info *space_info, u64 type)
 {
 	struct btrfs_fs_info *info = trans->fs_info;
 	struct btrfs_fs_devices *fs_devices = info->fs_devices;
@@ -5657,8 +5659,12 @@ struct btrfs_block_group *btrfs_create_chunk(struct btrfs_trans_handle *trans,
 		return ERR_PTR(-EINVAL);
 	}
 
+	if (!space_info)
+		space_info = btrfs_find_space_info(info, type);
+	ASSERT(space_info);
 	ctl.start = find_next_chunk(info);
 	ctl.type = type;
+	ctl.space_info = space_info;
 	init_alloc_chunk_ctl(fs_devices, &ctl);
 
 	devices_info = kcalloc(fs_devices->rw_devices, sizeof(*devices_info),
@@ -5826,12 +5832,12 @@ static noinline int init_first_rw_device(struct btrfs_trans_handle *trans)
 	 */
 
 	alloc_profile = btrfs_metadata_alloc_profile(fs_info);
-	meta_bg = btrfs_create_chunk(trans, alloc_profile);
+	meta_bg = btrfs_create_chunk(trans, NULL, alloc_profile);
 	if (IS_ERR(meta_bg))
 		return PTR_ERR(meta_bg);
 
 	alloc_profile = btrfs_system_alloc_profile(fs_info);
-	sys_bg = btrfs_create_chunk(trans, alloc_profile);
+	sys_bg = btrfs_create_chunk(trans, NULL, alloc_profile);
 	if (IS_ERR(sys_bg))
 		return PTR_ERR(sys_bg);
 
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 3a416b1bc24c..2faee2d2e584 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -690,7 +690,7 @@ struct btrfs_discard_stripe *btrfs_map_discard(struct btrfs_fs_info *fs_info,
 int btrfs_read_sys_array(struct btrfs_fs_info *fs_info);
 int btrfs_read_chunk_tree(struct btrfs_fs_info *fs_info);
 struct btrfs_block_group *btrfs_create_chunk(struct btrfs_trans_handle *trans,
-					    u64 type);
+					     struct btrfs_space_info *space_info, u64 type);
 void btrfs_mapping_tree_free(struct btrfs_fs_info *fs_info);
 int btrfs_open_devices(struct btrfs_fs_devices *fs_devices,
 		       blk_mode_t flags, void *holder);
-- 
2.47.1



* [PATCH 08/11] btrfs: introduce btrfs_space_info sub-group
  2024-12-05  7:48 [PATCH 00/11] btrfs: zoned: split out data relocation space_info Naohiro Aota
                   ` (6 preceding siblings ...)
  2024-12-05  7:48 ` [PATCH 07/11] btrfs: pass space_info for block group creation Naohiro Aota
@ 2024-12-05  7:48 ` Naohiro Aota
  2024-12-24 14:21   ` kernel test robot
  2024-12-05  7:48 ` [PATCH 09/11] btrfs: tweak extent/chunk allocation for space_info sub-space Naohiro Aota
                   ` (3 subsequent siblings)
  11 siblings, 1 reply; 20+ messages in thread
From: Naohiro Aota @ 2024-12-05  7:48 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Naohiro Aota

Current code assumes we have only one space_info for each block group type
(DATA, METADATA, and SYSTEM). We sometimes need multiple space_infos to
manage special block groups.

One example is handling the data relocation block group for the zoned mode.
That block group is dedicated to writing relocated data, and we cannot
allocate any regular extent from it, which is enforced in the zoned extent
allocator. That block group still belongs to the normal data space_info.
So, when all the normal data block groups are full and there is some free
space in the dedicated block group, the space_info appears to have free
space even though it can no longer allocate a normal extent. That results
in a strange ENOSPC error. We need a separate space_info for the data
relocation block group to represent the situation properly.

This commit adds the basic infrastructure for having a "sub-group" of a
space_info: creation and removal. A sub-group space_info belongs to one of
the primary space_infos and has the same flags as its parent.

Currently, the sub-group is only implemented for the data relocation
space_info. In the future, we can also implement a space_info for the
tree-log block group. It could also be useful for implementing tiered
storage for btrfs, e.g., by adding a sub-group space_info for block
groups that reside on fast storage.

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 fs/btrfs/block-group.c |  6 ++++++
 fs/btrfs/space-info.c  | 20 ++++++++++++++++++--
 fs/btrfs/space-info.h  |  8 ++++++++
 fs/btrfs/sysfs.c       | 16 +++++++++++++---
 4 files changed, 45 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index 6787f7034b9e..aa35b62e9773 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -4506,6 +4506,12 @@ int btrfs_free_block_groups(struct btrfs_fs_info *info)
 					struct btrfs_space_info,
 					list);
 
+		for (int i = 0; i < BTRFS_SPACE_INFO_SUB_GROUP_MAX; i++) {
+			if (space_info->sub_group[i]) {
+				check_removing_space_info(space_info->sub_group[i]);
+				kfree(space_info->sub_group[i]);
+			}
+		}
 		check_removing_space_info(space_info);
 		list_del(&space_info->list);
 		btrfs_sysfs_remove_space_info(space_info);
diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
index 1d0f0c9d8956..16beb25be4b0 100644
--- a/fs/btrfs/space-info.c
+++ b/fs/btrfs/space-info.c
@@ -249,6 +249,7 @@ static void init_space_info(struct btrfs_fs_info *info,
 	INIT_LIST_HEAD(&space_info->priority_tickets);
 	space_info->clamp = 1;
 	btrfs_update_space_info_chunk_size(space_info, calc_chunk_size(info, flags));
+	space_info->subgroup_id = SUB_GROUP_PRIMARY;
 
 	if (btrfs_is_zoned(info))
 		space_info->bg_reclaim_threshold = BTRFS_DEFAULT_ZONED_RECLAIM_THRESH;
@@ -266,6 +267,20 @@ static int create_space_info(struct btrfs_fs_info *info, u64 flags)
 
 	init_space_info(info, space_info, flags);
 
+	if (btrfs_is_zoned(info) && (flags & BTRFS_BLOCK_GROUP_DATA)) {
+		struct btrfs_space_info *reloc = kzalloc(sizeof(*reloc), GFP_NOFS);
+
+		if (!reloc)
+			return -ENOMEM;
+		init_space_info(info, reloc, flags);
+		space_info->sub_group[SUB_GROUP_DATA_RELOC] = reloc;
+		reloc->parent = space_info;
+		reloc->subgroup_id = SUB_GROUP_DATA_RELOC;
+
+		ret = btrfs_sysfs_add_space_info_type(info, reloc);
+		ASSERT(!ret);
+	}
+
 	ret = btrfs_sysfs_add_space_info_type(info, space_info);
 	if (ret)
 		return ret;
@@ -561,8 +576,9 @@ static void __btrfs_dump_space_info(const struct btrfs_fs_info *fs_info,
 	lockdep_assert_held(&info->lock);
 
 	/* The free space could be negative in case of overcommit */
-	btrfs_info(fs_info, "space_info %s has %lld free, is %sfull",
-		   flag_str,
+	btrfs_info(fs_info,
+		   "space_info %s (sub-group id %d) has %lld free, is %sfull",
+		   flag_str, info->subgroup_id,
 		   (s64)(info->total_bytes - btrfs_space_info_used(info, true)),
 		   info->full ? "" : "not ");
 	btrfs_info(fs_info,
diff --git a/fs/btrfs/space-info.h b/fs/btrfs/space-info.h
index 7459b4eb99cd..64641885babd 100644
--- a/fs/btrfs/space-info.h
+++ b/fs/btrfs/space-info.h
@@ -98,8 +98,16 @@ enum btrfs_flush_state {
 	RESET_ZONES		= 12,
 };
 
+enum btrfs_space_info_sub_group {
+	SUB_GROUP_DATA_RELOC = 0,
+	SUB_GROUP_PRIMARY = -1,
+};
+#define BTRFS_SPACE_INFO_SUB_GROUP_MAX 1
 struct btrfs_space_info {
 	struct btrfs_fs_info *fs_info;
+	struct btrfs_space_info *parent;
+	struct btrfs_space_info *sub_group[BTRFS_SPACE_INFO_SUB_GROUP_MAX];
+	int subgroup_id;
 	spinlock_t lock;
 
 	u64 total_bytes;	/* total bytes in the space,
diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
index fdcbf650ac31..041ffe5e9cc5 100644
--- a/fs/btrfs/sysfs.c
+++ b/fs/btrfs/sysfs.c
@@ -1792,15 +1792,25 @@ void btrfs_sysfs_remove_space_info(struct btrfs_space_info *space_info)
 	kobject_put(&space_info->kobj);
 }
 
-static const char *alloc_name(u64 flags)
+static const char *alloc_name(struct btrfs_space_info *space_info)
 {
+	u64 flags = space_info->flags;
+
 	switch (flags) {
 	case BTRFS_BLOCK_GROUP_METADATA | BTRFS_BLOCK_GROUP_DATA:
 		return "mixed";
 	case BTRFS_BLOCK_GROUP_METADATA:
 		return "metadata";
 	case BTRFS_BLOCK_GROUP_DATA:
-		return "data";
+		switch (space_info->subgroup_id) {
+		case SUB_GROUP_PRIMARY:
+			return "data";
+		case SUB_GROUP_DATA_RELOC:
+			return "data-reloc";
+		default:
+			WARN_ON_ONCE(1);
+			return "data (unknown sub-group)";
+		}
 	case BTRFS_BLOCK_GROUP_SYSTEM:
 		return "system";
 	default:
@@ -1820,7 +1830,7 @@ int btrfs_sysfs_add_space_info_type(struct btrfs_fs_info *fs_info,
 
 	ret = kobject_init_and_add(&space_info->kobj, &space_info_ktype,
 				   fs_info->space_info_kobj, "%s",
-				   alloc_name(space_info->flags));
+				   alloc_name(space_info));
 	if (ret) {
 		kobject_put(&space_info->kobj);
 		return ret;
-- 
2.47.1



* [PATCH 09/11] btrfs: tweak extent/chunk allocation for space_info sub-space
  2024-12-05  7:48 [PATCH 00/11] btrfs: zoned: split out data relocation space_info Naohiro Aota
                   ` (7 preceding siblings ...)
  2024-12-05  7:48 ` [PATCH 08/11] btrfs: introduce btrfs_space_info sub-group Naohiro Aota
@ 2024-12-05  7:48 ` Naohiro Aota
  2024-12-05  7:48 ` [PATCH 10/11] btrfs: use proper data space_info Naohiro Aota
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 20+ messages in thread
From: Naohiro Aota @ 2024-12-05  7:48 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Naohiro Aota

Make the extent allocator and the chunk allocator aware of the sub-space.
The extent allocator now uses the SUB_GROUP_DATA_RELOC sub-space for the
data relocation block group. When a block group candidate is given, it
also checks that the candidate belongs to the right space_info. In
addition, a new block group now belongs to the specified space_info.

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 fs/btrfs/extent-tree.c | 3 +++
 fs/btrfs/space-info.c  | 4 +++-
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 334a1701ff33..2f32497d2577 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -4342,6 +4342,8 @@ static noinline int find_free_extent(struct btrfs_root *root,
 		btrfs_err(fs_info, "No space info for %llu", ffe_ctl->flags);
 		return -ENOSPC;
 	}
+	if (btrfs_is_zoned(fs_info) && ffe_ctl->for_data_reloc)
+		space_info = space_info->sub_group[SUB_GROUP_DATA_RELOC];
 
 	ret = prepare_allocation(fs_info, ffe_ctl, space_info, ins);
 	if (ret < 0)
@@ -4361,6 +4363,7 @@ static noinline int find_free_extent(struct btrfs_root *root,
 		 * picked out then we don't care that the block group is cached.
 		 */
 		if (block_group && block_group_bits(block_group, ffe_ctl->flags) &&
+		    block_group->space_info == space_info &&
 		    block_group->cached != BTRFS_CACHE_NO) {
 			down_read(&space_info->groups_sem);
 			if (list_empty(&block_group->list) ||
diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
index 16beb25be4b0..cfc59123b00c 100644
--- a/fs/btrfs/space-info.c
+++ b/fs/btrfs/space-info.c
@@ -337,7 +337,9 @@ void btrfs_add_bg_to_space_info(struct btrfs_fs_info *info,
 
 	factor = btrfs_bg_type_to_factor(block_group->flags);
 
-	found = btrfs_find_space_info(info, block_group->flags);
+	found = block_group->space_info;
+	if (!found)
+		found = btrfs_find_space_info(info, block_group->flags);
 	ASSERT(found);
 	spin_lock(&found->lock);
 	found->total_bytes += block_group->length;
-- 
2.47.1



* [PATCH 10/11] btrfs: use proper data space_info
  2024-12-05  7:48 [PATCH 00/11] btrfs: zoned: split out data relocation space_info Naohiro Aota
                   ` (8 preceding siblings ...)
  2024-12-05  7:48 ` [PATCH 09/11] btrfs: tweak extent/chunk allocation for space_info sub-space Naohiro Aota
@ 2024-12-05  7:48 ` Naohiro Aota
  2024-12-05  7:48 ` [PATCH 11/11] btrfs: reclaim from data sub-space space_info Naohiro Aota
  2024-12-07 11:35 ` [PATCH 00/11] btrfs: zoned: split out data relocation space_info Johannes Thumshirn
  11 siblings, 0 replies; 20+ messages in thread
From: Naohiro Aota @ 2024-12-05  7:48 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Naohiro Aota

Now that we have a data sub-space for the zoned mode, tweak some
space_info functions to use the proper space_info for a file.

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 fs/btrfs/delalloc-space.c | 19 ++++++++++++++-----
 1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/delalloc-space.c b/fs/btrfs/delalloc-space.c
index 918ba2ab1d5f..89e43abfb2b8 100644
--- a/fs/btrfs/delalloc-space.c
+++ b/fs/btrfs/delalloc-space.c
@@ -111,6 +111,17 @@
  *  making error handling and cleanup easier.
  */
 
+static inline struct btrfs_space_info *data_sinfo_for_inode(const struct btrfs_inode *inode)
+{
+	struct btrfs_fs_info *fs_info = inode->root->fs_info;
+
+	if (!btrfs_is_zoned(fs_info))
+		return fs_info->data_sinfo;
+	if (btrfs_is_data_reloc_root(inode->root))
+		return fs_info->data_sinfo->sub_group[SUB_GROUP_DATA_RELOC];
+	return fs_info->data_sinfo;
+}
+
 int btrfs_alloc_data_chunk_ondemand(const struct btrfs_inode *inode, u64 bytes)
 {
 	struct btrfs_root *root = inode->root;
@@ -123,7 +134,7 @@ int btrfs_alloc_data_chunk_ondemand(const struct btrfs_inode *inode, u64 bytes)
 	if (btrfs_is_free_space_inode(inode))
 		flush = BTRFS_RESERVE_FLUSH_FREE_SPACE_INODE;
 
-	return btrfs_reserve_data_bytes(fs_info->data_sinfo, bytes, flush);
+	return btrfs_reserve_data_bytes(data_sinfo_for_inode(inode), bytes, flush);
 }
 
 int btrfs_check_data_free_space(struct btrfs_inode *inode,
@@ -144,7 +155,7 @@ int btrfs_check_data_free_space(struct btrfs_inode *inode,
 	else if (btrfs_is_free_space_inode(inode))
 		flush = BTRFS_RESERVE_FLUSH_FREE_SPACE_INODE;
 
-	ret = btrfs_reserve_data_bytes(fs_info->data_sinfo, len, flush);
+	ret = btrfs_reserve_data_bytes(data_sinfo_for_inode(inode), len, flush);
 	if (ret < 0)
 		return ret;
 
@@ -172,12 +183,10 @@ void btrfs_free_reserved_data_space_noquota(struct btrfs_inode *inode,
 					    u64 len)
 {
 	struct btrfs_fs_info *fs_info = inode->root->fs_info;
-	struct btrfs_space_info *data_sinfo;
 
 	ASSERT(IS_ALIGNED(len, fs_info->sectorsize));
 
-	data_sinfo = fs_info->data_sinfo;
-	btrfs_space_info_free_bytes_may_use(data_sinfo, len);
+	btrfs_space_info_free_bytes_may_use(data_sinfo_for_inode(inode), len);
 }
 
 /*
-- 
2.47.1



* [PATCH 11/11] btrfs: reclaim from data sub-space space_info
  2024-12-05  7:48 [PATCH 00/11] btrfs: zoned: split out data relocation space_info Naohiro Aota
                   ` (9 preceding siblings ...)
  2024-12-05  7:48 ` [PATCH 10/11] btrfs: use proper data space_info Naohiro Aota
@ 2024-12-05  7:48 ` Naohiro Aota
  2024-12-07 11:35 ` [PATCH 00/11] btrfs: zoned: split out data relocation space_info Johannes Thumshirn
  11 siblings, 0 replies; 20+ messages in thread
From: Naohiro Aota @ 2024-12-05  7:48 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Naohiro Aota

Currently, we only have a sub-space space_info for data. Modify
btrfs_async_reclaim_data_space() to run the reclaim process on the
sub-spaces as well.

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 fs/btrfs/space-info.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
index cfc59123b00c..ddb042845e86 100644
--- a/fs/btrfs/space-info.c
+++ b/fs/btrfs/space-info.c
@@ -1422,6 +1422,9 @@ static void btrfs_async_reclaim_data_space(struct work_struct *work)
 	fs_info = container_of(work, struct btrfs_fs_info, async_data_reclaim_work);
 	space_info = fs_info->data_sinfo;
 	do_async_reclaim_data_space(space_info);
+	for (int i = 0; i < BTRFS_SPACE_INFO_SUB_GROUP_MAX; i++)
+		if (space_info->sub_group[i])
+			do_async_reclaim_data_space(space_info->sub_group[i]);
 }
 
 void btrfs_init_async_reclaim_work(struct btrfs_fs_info *fs_info)
-- 
2.47.1



* Re: [PATCH 01/11] btrfs: take btrfs_space_info in btrfs_reserve_data_bytes
  2024-12-05  7:48 ` [PATCH 01/11] btrfs: take btrfs_space_info in btrfs_reserve_data_bytes Naohiro Aota
@ 2024-12-05 23:48   ` Johannes Thumshirn
  0 siblings, 0 replies; 20+ messages in thread
From: Johannes Thumshirn @ 2024-12-05 23:48 UTC (permalink / raw)
  To: Naohiro Aota, linux-btrfs@vger.kernel.org

Looks good,
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>


* Re: [PATCH 02/11] btrfs: take struct btrfs_inode in btrfs_free_reserved_data_space_noquota
  2024-12-05  7:48 ` [PATCH 02/11] btrfs: take struct btrfs_inode in btrfs_free_reserved_data_space_noquota Naohiro Aota
@ 2024-12-05 23:49   ` Johannes Thumshirn
  0 siblings, 0 replies; 20+ messages in thread
From: Johannes Thumshirn @ 2024-12-05 23:49 UTC (permalink / raw)
  To: Naohiro Aota, linux-btrfs@vger.kernel.org

Looks good,
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>


* Re: [PATCH 05/11] btrfs: factor out check_removing_space_info()
  2024-12-05  7:48 ` [PATCH 05/11] btrfs: factor out check_removing_space_info() Naohiro Aota
@ 2024-12-07 11:29   ` Johannes Thumshirn
  2024-12-10  5:16     ` Naohiro Aota
  0 siblings, 1 reply; 20+ messages in thread
From: Johannes Thumshirn @ 2024-12-07 11:29 UTC (permalink / raw)
  To: Naohiro Aota, linux-btrfs@vger.kernel.org

On 05.12.24 08:50, Naohiro Aota wrote:
> Factor out check_removing_space_info() from btrfs_free_block_groups(). It
> sanity checks a to-be-removed space_info. There is no functional change.
> ---

This is missing your SoB


* Re: [PATCH 00/11] btrfs: zoned: split out data relocation space_info
  2024-12-05  7:48 [PATCH 00/11] btrfs: zoned: split out data relocation space_info Naohiro Aota
                   ` (10 preceding siblings ...)
  2024-12-05  7:48 ` [PATCH 11/11] btrfs: reclaim from data sub-space space_info Naohiro Aota
@ 2024-12-07 11:35 ` Johannes Thumshirn
  2024-12-10  5:40   ` Naohiro Aota
  11 siblings, 1 reply; 20+ messages in thread
From: Johannes Thumshirn @ 2024-12-07 11:35 UTC (permalink / raw)
  To: Naohiro Aota, linux-btrfs@vger.kernel.org

On 05.12.24 08:50, Naohiro Aota wrote:
> As discussed in [1], there is a longstanding early ENOSPC issue on the
> zoned mode even with simple fio script. This is also causing blktests
> zbd/009 to fail [2].
> 
> [1] https://lore.kernel.org/linux-btrfs/cover.1731571240.git.naohiro.aota@wdc.com/
> [2] https://github.com/osandov/blktests/issues/150
> 
> This series is the second part to fix the ENOSPC issue. This series
> introduces "space_info sub-space" and use it split a space_info for data
> relocation block group.
> 
> Current code assumes we have only one space_info for each block group type
> (DATA, METADATA, and SYSTEM). We sometime needs multiple space_info to
> manage special block groups.
> 
> One example is handling the data relocation block group for the zoned mode.
> That block group is dedicated for writing relocated data and we cannot
> allocate any regular extent from that block group, which is implemented in
> the zoned extent allocator. That block group still belongs to the normal
> data space_info. So, when all the normal data block groups are full and
> there are some free space in the dedicated block group, the space_info
> looks to have some free space, while it cannot allocate normal extent
> anymore. That results in a strange ENOSPC error. We need to have a
> space_info for the relocation data block group to represent the situation
> properly.

I like the idea and the patches, but I'm a bit concerned that it makes
zoned and non-zoned btrfs diverge quite a bit in how they handle
relocation. I'd be interested in what David and Josef think of it. If no
one objects to having these sub-space_infos be zoned-specific, I'm all
good with it, as it fixes real problems.

Would it be useful to do the same for regular btrfs as well? And while
we're at it, the treelog block group for zoned mode could benefit from
its own space_info too, couldn't it? That would avoid running into
premature ENOSPC on frequent syncs, or is this unlikely to happen (I'm
thinking out loud here)?

Byte,
	Johannes


* Re: [PATCH 05/11] btrfs: factor out check_removing_space_info()
  2024-12-07 11:29   ` Johannes Thumshirn
@ 2024-12-10  5:16     ` Naohiro Aota
  0 siblings, 0 replies; 20+ messages in thread
From: Naohiro Aota @ 2024-12-10  5:16 UTC (permalink / raw)
  To: Johannes Thumshirn; +Cc: linux-btrfs@vger.kernel.org

On Sat, Dec 07, 2024 at 11:29:51AM +0000, Johannes Thumshirn wrote:
> On 05.12.24 08:50, Naohiro Aota wrote:
> > Factor out check_removing_space_info() from btrfs_free_block_groups(). It
> > sanity checks a to-be-removed space_info. There is no functional change.
> > ---
> 
> This is missing your SoB

Oops, I'll add it in the next version.


* Re: [PATCH 00/11] btrfs: zoned: split out data relocation space_info
  2024-12-07 11:35 ` [PATCH 00/11] btrfs: zoned: split out data relocation space_info Johannes Thumshirn
@ 2024-12-10  5:40   ` Naohiro Aota
  2025-01-02 14:19     ` David Sterba
  0 siblings, 1 reply; 20+ messages in thread
From: Naohiro Aota @ 2024-12-10  5:40 UTC (permalink / raw)
  To: Johannes Thumshirn; +Cc: linux-btrfs@vger.kernel.org

On Sat, Dec 07, 2024 at 11:35:04AM +0000, Johannes Thumshirn wrote:
> On 05.12.24 08:50, Naohiro Aota wrote:
> > As discussed in [1], there is a longstanding early ENOSPC issue on the
> > zoned mode even with simple fio script. This is also causing blktests
> > zbd/009 to fail [2].
> > 
> > [1] https://lore.kernel.org/linux-btrfs/cover.1731571240.git.naohiro.aota@wdc.com/
> > [2] https://github.com/osandov/blktests/issues/150
> > 
> > This series is the second part to fix the ENOSPC issue. This series
> > introduces "space_info sub-space" and use it split a space_info for data
> > relocation block group.
> > 
> > Current code assumes we have only one space_info for each block group type
> > (DATA, METADATA, and SYSTEM). We sometime needs multiple space_info to
> > manage special block groups.
> > 
> > One example is handling the data relocation block group for the zoned mode.
> > That block group is dedicated for writing relocated data and we cannot
> > allocate any regular extent from that block group, which is implemented in
> > the zoned extent allocator. That block group still belongs to the normal
> > data space_info. So, when all the normal data block groups are full and
> > there are some free space in the dedicated block group, the space_info
> > looks to have some free space, while it cannot allocate normal extent
> > anymore. That results in a strange ENOSPC error. We need to have a
> > space_info for the relocation data block group to represent the situation
> > properly.
> 
> I like the idea and the patches, but I'm a bit concerned it diverges 
> zoned and non-zoned btrfs quite a bit in handling relocation. I'd be 
> interested what David and Josef think of it. If no one objects to have 
> these sub-space_infos zoned specific I'm all good with it as it fixes 
> real problems.

Hmm, on that point, we already do relocation a bit differently on zoned
vs non-zoned. In the zoned mode, the relocated data is always allocated
from a dedicated block group. This series just moves that block group into
its own space_info (aka sub-space_info) to fix a space accounting issue.

I admit the concept of sub-space_info is new, so I'd like to hear David and
Josef's opinion on it.

> 
> Would it be useful to also do the same for regular btrfs? And while 
> we're at it, the treelog block-group for zoned mode could benefit form a 
> own space-info as well, couldn't it? To not run into premature ENOSPC on 
> frequent syncs, or is this unlikely to happen (I'm thinking out loud here).

In the regular mode, space for relocation data can be allocated from any
block group, so separating the space_info is not so useful there. However,
it would be interesting to have a dedicated relocation data block group in
the regular mode as well, because it could reduce the fragmentation of
relocated data. This would be an interesting topic to explore.

Yes, adding a treelog sub-space_info would be useful too. In fact, I
sometimes see a test failure due to a treelog space_info accounting
mismatch. I didn't implement that sub-space_info for now, because I'd
first like to know whether the sub-space_info concept itself is the right
way to go.

> 
> Byte,
> 	Johannes


* Re: [PATCH 08/11] btrfs: introduce btrfs_space_info sub-group
  2024-12-05  7:48 ` [PATCH 08/11] btrfs: introduce btrfs_space_info sub-group Naohiro Aota
@ 2024-12-24 14:21   ` kernel test robot
  0 siblings, 0 replies; 20+ messages in thread
From: kernel test robot @ 2024-12-24 14:21 UTC (permalink / raw)
  To: Naohiro Aota; +Cc: oe-lkp, lkp, linux-btrfs, Naohiro Aota, oliver.sang


Hello,

kernel test robot noticed a 13.0% regression of aim7.jobs-per-min on:


commit: 1d2d783b0ef24d58eae07a32493d1e1e78b4351c ("[PATCH 08/11] btrfs: introduce btrfs_space_info sub-group")
url: https://github.com/intel-lab-lkp/linux/commits/Naohiro-Aota/btrfs-take-btrfs_space_info-in-btrfs_reserve_data_bytes/20241205-195311
base: https://git.kernel.org/cgit/linux/kernel/git/kdave/linux.git for-next
patch link: https://lore.kernel.org/all/ab8eb232d1acbf02af7352a2224a31b53ece01f5.1733384172.git.naohiro.aota@wdc.com/
patch subject: [PATCH 08/11] btrfs: introduce btrfs_space_info sub-group

testcase: aim7
config: x86_64-rhel-9.4
compiler: gcc-12
test machine: 128 threads 2 sockets Intel(R) Xeon(R) Gold 6338 CPU @ 2.00GHz (Ice Lake) with 256G memory
parameters:

	disk: 1BRD_48G
	fs: btrfs
	test: disk_cp
	load: 1500
	cpufreq_governor: performance




If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202412241603.d5f0c18f-lkp@intel.com


Details are as below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20241224/202412241603.d5f0c18f-lkp@intel.com

=========================================================================================
compiler/cpufreq_governor/disk/fs/kconfig/load/rootfs/tbox_group/test/testcase:
  gcc-12/performance/1BRD_48G/btrfs/x86_64-rhel-9.4/1500/debian-12-x86_64-20240206.cgz/lkp-icl-2sp2/disk_cp/aim7

commit: 
  d437de1e3e ("btrfs: pass space_info for block group creation")
  1d2d783b0e ("btrfs: introduce btrfs_space_info sub-group")

d437de1e3ee21349 1d2d783b0ef24d58eae07a32493 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
 3.914e+09 ±  2%      +9.0%  4.266e+09        cpuidle..time
     13.96            -5.6%      13.17        iostat.cpu.idle
      0.28 ±  3%      -0.0        0.24 ±  2%  mpstat.cpu.all.usr%
    283.52 ±  2%     +11.1%     315.09        uptime.boot
    145120 ± 40%     +51.6%     220066 ±  2%  meminfo.AnonHugePages
   9544499 ±  8%     +11.8%   10670421 ±  4%  meminfo.DirectMap2M
    124537            -7.6%     115017        vmstat.system.cs
    216746            -4.5%     207084        vmstat.system.in
    235158 ± 22%     +89.9%     446683 ± 20%  numa-meminfo.node0.Shmem
     92328 ± 88%    +138.0%     219719 ±  2%  numa-meminfo.node1.AnonHugePages
    906862 ±  6%     -25.2%     678748 ± 12%  numa-meminfo.node1.Shmem
     58803 ± 22%     +90.0%     111697 ± 20%  numa-vmstat.node0.nr_shmem
    163496 ±  2%     +12.8%     184425 ±  4%  numa-vmstat.node0.nr_written
     45.09 ± 88%    +138.1%     107.38 ±  2%  numa-vmstat.node1.nr_anon_transparent_hugepages
    226731 ±  6%     -25.1%     169747 ± 12%  numa-vmstat.node1.nr_shmem
    163188 ±  2%     +12.8%     184147 ±  2%  numa-vmstat.node1.nr_written
     39739 ±  2%     -13.0%      34581        aim7.jobs-per-min
    226.75 ±  2%     +14.8%     260.40        aim7.time.elapsed_time
    226.75 ±  2%     +14.8%     260.40        aim7.time.elapsed_time.max
    201270 ±  3%     +18.8%     239157        aim7.time.involuntary_context_switches
     24819 ±  2%     +15.8%      28731        aim7.time.system_time
  14211174            +5.9%   15052342        aim7.time.voluntary_context_switches
     20288            +1.0%      20501        proc-vmstat.nr_inactive_file
     30666 ±  2%      -5.6%      28940 ±  2%  proc-vmstat.nr_mapped
    326687           +13.2%     369933 ±  2%  proc-vmstat.nr_written
     20288            +1.0%      20501        proc-vmstat.nr_zone_inactive_file
   1098889            +6.7%    1172947        proc-vmstat.pgfault
   1317807           +13.2%    1492099 ±  2%  proc-vmstat.pgpgout
     69223 ±  4%      +8.5%      75099 ±  3%  proc-vmstat.pgreuse
 7.248e+09            -2.6%   7.06e+09        perf-stat.i.branch-instructions
      0.55 ±  2%      -0.0        0.51        perf-stat.i.branch-miss-rate%
  50144736 ±  8%      -7.5%   46399318        perf-stat.i.cache-misses
 2.114e+08            -7.4%  1.958e+08        perf-stat.i.cache-references
    125888            -7.9%     115986        perf-stat.i.context-switches
      8.96            +4.5%       9.36        perf-stat.i.cpi
      5667 ±  7%      +8.4%       6144        perf-stat.i.cycles-between-cache-misses
 3.119e+10            -3.1%  3.021e+10        perf-stat.i.instructions
      0.17            -7.4%       0.16        perf-stat.i.ipc
      0.71 ± 10%     -77.0%       0.16 ± 27%  perf-stat.i.metric.K/sec
      4337 ±  2%      -6.0%       4075        perf-stat.i.minor-faults
      4339 ±  2%      -6.0%       4078        perf-stat.i.page-faults
      5.59 ± 81%     +72.6%       9.64        perf-stat.overall.cpi
      3401 ± 82%     +84.6%       6279        perf-stat.overall.cycles-between-cache-misses
 4.276e+12 ± 81%     +84.2%  7.875e+12        perf-stat.total.instructions
   9128951           +37.9%   12584528        sched_debug.cfs_rq:/.avg_vruntime.avg
  19113988 ± 16%     +52.5%   29146821 ± 21%  sched_debug.cfs_rq:/.avg_vruntime.max
   7853850 ±  2%     +38.2%   10851391 ±  3%  sched_debug.cfs_rq:/.avg_vruntime.min
   1473135 ± 16%     +60.2%    2360403 ± 24%  sched_debug.cfs_rq:/.avg_vruntime.stddev
    577.10 ± 20%     -31.6%     394.77 ± 20%  sched_debug.cfs_rq:/.load_avg.max
    128.74 ±  9%     -26.2%      94.97 ± 14%  sched_debug.cfs_rq:/.load_avg.stddev
   9128951           +37.9%   12584527        sched_debug.cfs_rq:/.min_vruntime.avg
  19113988 ± 16%     +52.5%   29146821 ± 21%  sched_debug.cfs_rq:/.min_vruntime.max
   7853850 ±  2%     +38.2%   10851391 ±  3%  sched_debug.cfs_rq:/.min_vruntime.min
   1473135 ± 16%     +60.2%    2360403 ± 24%  sched_debug.cfs_rq:/.min_vruntime.stddev
    257.90           -19.0%     209.00 ±  3%  sched_debug.cfs_rq:/.removed.load_avg.max
    132.30 ±  3%     -19.2%     106.93 ±  3%  sched_debug.cfs_rq:/.removed.runnable_avg.max
    131.35 ±  2%     -18.6%     106.93 ±  3%  sched_debug.cfs_rq:/.removed.util_avg.max
    506.87 ±  2%     +10.4%     559.52 ±  2%  sched_debug.cfs_rq:/.util_est.avg
      1264 ±  9%     +14.3%       1445 ±  4%  sched_debug.cfs_rq:/.util_est.max
     71296 ±  6%     +16.6%      83111 ±  6%  sched_debug.cpu.avg_idle.min
    131900 ±  8%     -14.7%     112551 ±  9%  sched_debug.cpu.avg_idle.stddev
    146287           +19.1%     174200        sched_debug.cpu.clock.avg
    146298           +19.1%     174211        sched_debug.cpu.clock.max
    146275           +19.1%     174187        sched_debug.cpu.clock.min
    145437           +19.0%     173116        sched_debug.cpu.clock_task.avg
    145601           +19.0%     173298        sched_debug.cpu.clock_task.max
    136631           +20.2%     164246        sched_debug.cpu.clock_task.min
      6491 ±  8%     +19.4%       7753 ±  4%  sched_debug.cpu.curr->pid.max
     84360           +24.8%     105323        sched_debug.cpu.nr_switches.avg
    112587 ±  3%     +25.5%     141314 ±  8%  sched_debug.cpu.nr_switches.max
     79225           +26.6%     100293        sched_debug.cpu.nr_switches.min
    146275           +19.1%     174187        sched_debug.cpu_clk
    145107           +19.2%     173021        sched_debug.ktime
    147232           +18.9%     175035        sched_debug.sched_clk
      0.03 ±100%    +707.4%       0.26 ±167%  perf-sched.sch_delay.avg.ms.__cond_resched.kmem_cache_alloc_noprof.alloc_extent_state.__clear_extent_bit.btrfs_dirty_folio
      0.01 ± 47%    +265.2%       0.05 ± 74%  perf-sched.sch_delay.avg.ms.__cond_resched.stop_one_cpu.migrate_task_to.task_numa_migrate.isra
      0.00 ± 51%    +150.0%       0.01 ± 39%  perf-sched.sch_delay.avg.ms.btrfs_start_ordered_extent.lock_and_cleanup_extent_if_need.btrfs_buffered_write.btrfs_do_write_iter
      0.12 ± 53%     -55.6%       0.05 ± 32%  perf-sched.sch_delay.avg.ms.schedule_hrtimeout_range.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
      0.03 ± 14%    +111.5%       0.06 ± 67%  perf-sched.sch_delay.avg.ms.schedule_timeout.kcompactd.kthread.ret_from_fork
      0.01 ±136%   +1221.8%       0.08 ± 58%  perf-sched.sch_delay.avg.ms.usleep_range_state.wait_for_tpm_stat.tpm_tis_send_data.tpm_tis_send_main
      0.14 ± 39%    +300.9%       0.56 ± 95%  perf-sched.sch_delay.max.ms.__cond_resched.down_write.btrfs_tree_lock_nested.btrfs_lock_root_node.btrfs_search_slot
      0.08 ±116%    +979.5%       0.90 ±156%  perf-sched.sch_delay.max.ms.__cond_resched.kmem_cache_alloc_noprof.alloc_extent_state.__clear_extent_bit.btrfs_dirty_folio
      2.45 ± 13%     +26.1%       3.09 ± 11%  perf-sched.sch_delay.max.ms.pipe_read.vfs_read.ksys_read.do_syscall_64
      1256 ± 80%     +98.2%       2490        perf-sched.sch_delay.max.ms.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read.btrfs_tree_read_lock_nested
      0.07 ±  9%    +749.4%       0.57 ±121%  perf-sched.sch_delay.max.ms.schedule_timeout.kcompactd.kthread.ret_from_fork
      2.38 ± 36%     -56.0%       1.05 ± 45%  perf-sched.sch_delay.max.ms.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
      0.27 ±126%    +188.4%       0.77 ± 95%  perf-sched.sch_delay.max.ms.usleep_range_state.tpm_try_transmit.tpm_transmit.tpm_transmit_cmd
      0.02 ±139%   +5117.2%       0.98 ± 77%  perf-sched.sch_delay.max.ms.usleep_range_state.wait_for_tpm_stat.tpm_tis_send_data.tpm_tis_send_main
      2092 ±  4%     +19.1%       2491        perf-sched.total_sch_delay.max.ms
    286268           -11.6%     253047 ±  2%  perf-sched.total_wait_and_delay.count.ms
      4111 ±  5%     +18.3%       4862 ±  4%  perf-sched.total_wait_and_delay.max.ms
    115.81 ±  5%     +32.9%     153.90 ±  6%  perf-sched.wait_and_delay.avg.ms.schedule_hrtimeout_range.do_poll.constprop.0.do_sys_poll
    130.80           -19.1%     105.83 ±  4%  perf-sched.wait_and_delay.count.schedule_hrtimeout_range.do_poll.constprop.0.do_sys_poll
    259644           -12.4%     227389 ±  3%  perf-sched.wait_and_delay.count.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read.btrfs_tree_read_lock_nested
     10660 ±  2%      -8.6%       9745 ±  2%  perf-sched.wait_and_delay.count.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.btrfs_tree_lock_nested
      1639 ±  3%     -16.8%       1364 ±  5%  perf-sched.wait_and_delay.count.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.do_unlinkat
      2784 ±  3%     -11.2%       2471        perf-sched.wait_and_delay.count.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.open_last_lookups
      2977 ± 30%     +38.9%       4137 ± 27%  perf-sched.wait_and_delay.max.ms.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read.btrfs_tree_read_lock_nested
      2204 ±  2%     +49.8%       3301 ± 33%  perf-sched.wait_and_delay.max.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.btrfs_tree_lock_nested
      2158 ± 28%     +40.6%       3033 ± 16%  perf-sched.wait_and_delay.max.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
      0.03 ± 52%     +60.1%       0.04 ± 12%  perf-sched.wait_time.avg.ms.io_schedule.folio_wait_bit_common.extent_write_cache_pages.btrfs_writepages
    114.21 ±  5%     +31.8%     150.51 ±  5%  perf-sched.wait_time.avg.ms.schedule_hrtimeout_range.do_poll.constprop.0.do_sys_poll
      0.22 ±122%    +180.9%       0.61 ±  6%  perf-sched.wait_time.avg.ms.usleep_range_state.wait_for_tpm_stat.tpm_tis_send_data.tpm_tis_send_main
      1717 ± 50%     +44.7%       2484        perf-sched.wait_time.max.ms.__cond_resched.__alloc_pages_noprof.alloc_pages_mpol_noprof.folio_alloc_noprof.__filemap_get_folio
      2192 ±  2%     +13.0%       2476        perf-sched.wait_time.max.ms.__cond_resched.__filemap_get_folio.prepare_one_folio.constprop.0
      2186 ±  2%     +14.2%       2496        perf-sched.wait_time.max.ms.__cond_resched.btrfs_buffered_write.btrfs_do_write_iter.vfs_write.ksys_write
      2162 ±  3%     +14.8%       2482        perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc_noprof.alloc_extent_map.btrfs_get_extent.btrfs_set_extent_delalloc
      2154 ±  2%     +15.7%       2493        perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc_noprof.alloc_extent_state.__set_extent_bit.set_extent_bit
      2207 ±  2%     +13.9%       2513        perf-sched.wait_time.max.ms.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read.btrfs_tree_read_lock_nested
      2204 ±  2%     +13.9%       2509        perf-sched.wait_time.max.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.btrfs_tree_lock_nested
      2204 ±  2%     +14.0%       2512        perf-sched.wait_time.max.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.do_unlinkat
      2206 ±  2%     +13.8%       2510        perf-sched.wait_time.max.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.open_last_lookups
      0.66 ±123%    +164.6%       1.74 ± 42%  perf-sched.wait_time.max.ms.usleep_range_state.tpm_try_transmit.tpm_transmit.tpm_transmit_cmd
      0.41 ±122%    +296.7%       1.63 ± 36%  perf-sched.wait_time.max.ms.usleep_range_state.wait_for_tpm_stat.tpm_tis_send_data.tpm_tis_send_main
      1850 ± 19%     +55.3%       2873 ± 21%  perf-sched.wait_time.max.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
     33.43            -0.2       33.19        perf-profile.calltrace.cycles-pp.btrfs_block_rsv_release.btrfs_inode_rsv_release.btrfs_buffered_write.btrfs_do_write_iter.vfs_write
     33.43            -0.2       33.20        perf-profile.calltrace.cycles-pp.btrfs_inode_rsv_release.btrfs_buffered_write.btrfs_do_write_iter.vfs_write.ksys_write
     33.33            -0.2       33.10        perf-profile.calltrace.cycles-pp._raw_spin_lock.btrfs_block_rsv_release.btrfs_inode_rsv_release.btrfs_buffered_write.btrfs_do_write_iter
     33.25            -0.2       33.02        perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.btrfs_block_rsv_release.btrfs_inode_rsv_release.btrfs_buffered_write
     27.98            +0.1       28.11        perf-profile.calltrace.cycles-pp.btrfs_dirty_folio.btrfs_buffered_write.btrfs_do_write_iter.vfs_write.ksys_write
     98.01            +0.1       98.15        perf-profile.calltrace.cycles-pp.write
     97.93            +0.2       98.08        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.write
     97.87            +0.2       98.02        perf-profile.calltrace.cycles-pp.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
     97.88            +0.2       98.04        perf-profile.calltrace.cycles-pp.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
     97.92            +0.2       98.08        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
     97.84            +0.2       98.00        perf-profile.calltrace.cycles-pp.btrfs_do_write_iter.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
     97.81            +0.2       97.97        perf-profile.calltrace.cycles-pp.btrfs_buffered_write.btrfs_do_write_iter.vfs_write.ksys_write.do_syscall_64
     34.96            +0.2       35.14        perf-profile.calltrace.cycles-pp._raw_spin_lock.__reserve_bytes.btrfs_reserve_metadata_bytes.btrfs_delalloc_reserve_metadata.btrfs_buffered_write
     26.26            +0.2       26.44        perf-profile.calltrace.cycles-pp.btrfs_block_rsv_release.btrfs_inode_rsv_release.btrfs_clear_delalloc_extent.clear_state_bit.__clear_extent_bit
     26.20            +0.2       26.38        perf-profile.calltrace.cycles-pp._raw_spin_lock.btrfs_block_rsv_release.btrfs_inode_rsv_release.btrfs_clear_delalloc_extent.clear_state_bit
     26.27            +0.2       26.45        perf-profile.calltrace.cycles-pp.btrfs_inode_rsv_release.btrfs_clear_delalloc_extent.clear_state_bit.__clear_extent_bit.btrfs_dirty_folio
     26.14            +0.2       26.32        perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.btrfs_block_rsv_release.btrfs_inode_rsv_release.btrfs_clear_delalloc_extent
     34.86            +0.2       35.05        perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.__reserve_bytes.btrfs_reserve_metadata_bytes.btrfs_delalloc_reserve_metadata
     35.29            +0.2       35.48        perf-profile.calltrace.cycles-pp.btrfs_reserve_metadata_bytes.btrfs_delalloc_reserve_metadata.btrfs_buffered_write.btrfs_do_write_iter.vfs_write
     35.29            +0.2       35.48        perf-profile.calltrace.cycles-pp.__reserve_bytes.btrfs_reserve_metadata_bytes.btrfs_delalloc_reserve_metadata.btrfs_buffered_write.btrfs_do_write_iter
     35.41            +0.2       35.62        perf-profile.calltrace.cycles-pp.btrfs_delalloc_reserve_metadata.btrfs_buffered_write.btrfs_do_write_iter.vfs_write.ksys_write
     26.60            +0.3       26.90        perf-profile.calltrace.cycles-pp.__clear_extent_bit.btrfs_dirty_folio.btrfs_buffered_write.btrfs_do_write_iter.vfs_write
     26.55            +0.3       26.85        perf-profile.calltrace.cycles-pp.btrfs_clear_delalloc_extent.clear_state_bit.__clear_extent_bit.btrfs_dirty_folio.btrfs_buffered_write
     26.55            +0.3       26.86        perf-profile.calltrace.cycles-pp.clear_state_bit.__clear_extent_bit.btrfs_dirty_folio.btrfs_buffered_write.btrfs_do_write_iter
      0.34 ±  5%      -0.1        0.29 ±  2%  perf-profile.children.cycles-pp.read
      0.40 ±  3%      -0.0        0.36 ±  3%  perf-profile.children.cycles-pp.down_write
      0.23 ±  5%      -0.0        0.20 ±  4%  perf-profile.children.cycles-pp.ksys_read
      0.47 ±  2%      -0.0        0.43 ±  3%  perf-profile.children.cycles-pp.start_secondary
      0.18 ±  8%      -0.0        0.15 ±  8%  perf-profile.children.cycles-pp._raw_spin_lock_irq
      0.41            -0.0        0.38 ±  2%  perf-profile.children.cycles-pp.cpuidle_idle_call
      0.47 ±  2%      -0.0        0.44 ±  2%  perf-profile.children.cycles-pp.common_startup_64
      0.47 ±  2%      -0.0        0.44 ±  2%  perf-profile.children.cycles-pp.cpu_startup_entry
      0.47 ±  2%      -0.0        0.44 ±  2%  perf-profile.children.cycles-pp.do_idle
      0.19 ±  6%      -0.0        0.15 ±  6%  perf-profile.children.cycles-pp.filemap_read
      0.39 ±  3%      -0.0        0.36 ±  2%  perf-profile.children.cycles-pp.rwsem_down_write_slowpath
      0.21 ±  5%      -0.0        0.18 ±  5%  perf-profile.children.cycles-pp.vfs_read
      0.34 ±  3%      -0.0        0.31 ±  2%  perf-profile.children.cycles-pp.rwsem_optimistic_spin
      0.09 ±  4%      -0.0        0.06 ± 11%  perf-profile.children.cycles-pp.btrfs_bin_search
      0.10 ± 11%      -0.0        0.07 ± 14%  perf-profile.children.cycles-pp.filemap_get_pages
      0.37 ±  3%      -0.0        0.34 ±  2%  perf-profile.children.cycles-pp.open_last_lookups
      0.30 ±  2%      -0.0        0.27 ±  2%  perf-profile.children.cycles-pp.acpi_idle_enter
      0.31            -0.0        0.29 ±  2%  perf-profile.children.cycles-pp.cpuidle_enter_state
      0.37 ±  3%      -0.0        0.34 ±  2%  perf-profile.children.cycles-pp.__x64_sys_creat
      0.37 ±  3%      -0.0        0.34 ±  2%  perf-profile.children.cycles-pp.creat64
      0.37 ±  3%      -0.0        0.34 ±  2%  perf-profile.children.cycles-pp.do_filp_open
      0.37 ±  3%      -0.0        0.34 ±  2%  perf-profile.children.cycles-pp.path_openat
      0.30            -0.0        0.27 ±  2%  perf-profile.children.cycles-pp.acpi_safe_halt
      0.31            -0.0        0.29 ±  2%  perf-profile.children.cycles-pp.cpuidle_enter
      0.16 ±  8%      -0.0        0.14 ±  5%  perf-profile.children.cycles-pp.osq_lock
      0.30            -0.0        0.27 ±  2%  perf-profile.children.cycles-pp.acpi_idle_do_entry
      0.09            -0.0        0.07 ± 10%  perf-profile.children.cycles-pp.read_block_for_search
      0.24 ±  3%      -0.0        0.22 ±  3%  perf-profile.children.cycles-pp.asm_sysvec_call_function_single
      0.18 ±  3%      -0.0        0.16        perf-profile.children.cycles-pp.__set_extent_bit
      0.13 ±  3%      -0.0        0.12 ±  4%  perf-profile.children.cycles-pp.set_extent_bit
      0.34            -0.0        0.32        perf-profile.children.cycles-pp.__close
      0.34            -0.0        0.32        perf-profile.children.cycles-pp.__dentry_kill
      0.34            -0.0        0.32        perf-profile.children.cycles-pp.__fput
      0.34            -0.0        0.32        perf-profile.children.cycles-pp.__x64_sys_close
      0.34            -0.0        0.32        perf-profile.children.cycles-pp.dput
      0.08 ±  5%      -0.0        0.07        perf-profile.children.cycles-pp.btrfs_space_info_update_bytes_may_use
      0.06            -0.0        0.05        perf-profile.children.cycles-pp.kmem_cache_free
      0.06            -0.0        0.05        perf-profile.children.cycles-pp.up_write
      0.05            +0.0        0.06        perf-profile.children.cycles-pp.calc_available_free_space
      0.07 ±  7%      +0.0        0.08 ±  4%  perf-profile.children.cycles-pp.btrfs_folio_clamp_clear_checked
      0.07            +0.0        0.09        perf-profile.children.cycles-pp.btrfs_drop_folio
     99.26            +0.1       99.32        perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
     99.25            +0.1       99.32        perf-profile.children.cycles-pp.do_syscall_64
      0.30 ± 19%      +0.1        0.40 ±  8%  perf-profile.children.cycles-pp.btrfs_reserve_data_bytes
      0.32 ± 17%      +0.1        0.42 ±  8%  perf-profile.children.cycles-pp.btrfs_check_data_free_space
      0.24 ± 16%      +0.1        0.36 ±  7%  perf-profile.children.cycles-pp.btrfs_free_reserved_data_space_noquota
     27.98            +0.1       28.11        perf-profile.children.cycles-pp.btrfs_dirty_folio
     98.05            +0.1       98.20        perf-profile.children.cycles-pp.write
     97.91            +0.2       98.07        perf-profile.children.cycles-pp.ksys_write
     97.89            +0.2       98.05        perf-profile.children.cycles-pp.vfs_write
     97.84            +0.2       98.00        perf-profile.children.cycles-pp.btrfs_do_write_iter
     97.81            +0.2       97.98        perf-profile.children.cycles-pp.btrfs_buffered_write
     35.43            +0.2       35.62        perf-profile.children.cycles-pp.btrfs_reserve_metadata_bytes
     35.42            +0.2       35.63        perf-profile.children.cycles-pp.btrfs_delalloc_reserve_metadata
     26.73            +0.3       27.02        perf-profile.children.cycles-pp.__clear_extent_bit
     26.63            +0.3       26.93        perf-profile.children.cycles-pp.clear_state_bit
     35.67            +0.3       35.97        perf-profile.children.cycles-pp.__reserve_bytes
     26.59            +0.3       26.89        perf-profile.children.cycles-pp.btrfs_clear_delalloc_extent
     95.31            +0.3       95.62        perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
     95.80            +0.3       96.12        perf-profile.children.cycles-pp._raw_spin_lock
      0.66            -0.0        0.63        perf-profile.self.cycles-pp._raw_spin_lock
      0.09 ±  5%      -0.0        0.06 ± 11%  perf-profile.self.cycles-pp.btrfs_bin_search
      0.16 ±  6%      -0.0        0.14 ±  5%  perf-profile.self.cycles-pp.osq_lock
      0.08 ±  5%      -0.0        0.07 ±  7%  perf-profile.self.cycles-pp.btrfs_space_info_update_bytes_may_use
      0.15 ±  3%      -0.0        0.13 ±  3%  perf-profile.self.cycles-pp.acpi_safe_halt
      0.06            -0.0        0.05        perf-profile.self.cycles-pp.kmem_cache_alloc_noprof
      0.07            +0.0        0.08        perf-profile.self.cycles-pp.btrfs_block_rsv_release
      0.06 ±  7%      +0.0        0.08        perf-profile.self.cycles-pp.btrfs_folio_clamp_clear_checked
      0.10            +0.0        0.12 ±  3%  perf-profile.self.cycles-pp.need_preemptive_reclaim
     94.52            +0.3       94.86        perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath

Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki



* Re: [PATCH 00/11] btrfs: zoned: split out data relocation space_info
  2024-12-10  5:40   ` Naohiro Aota
@ 2025-01-02 14:19     ` David Sterba
  0 siblings, 0 replies; 20+ messages in thread
From: David Sterba @ 2025-01-02 14:19 UTC (permalink / raw)
  To: Naohiro Aota; +Cc: Johannes Thumshirn, linux-btrfs@vger.kernel.org

On Tue, Dec 10, 2024 at 05:40:30AM +0000, Naohiro Aota wrote:
> On Sat, Dec 07, 2024 at 11:35:04AM +0000, Johannes Thumshirn wrote:
> > On 05.12.24 08:50, Naohiro Aota wrote:
> > > As discussed in [1], there is a longstanding early ENOSPC issue on the
> > > zoned mode even with simple fio script. This is also causing blktests
> > > zbd/009 to fail [2].
> > > 
> > > [1] https://lore.kernel.org/linux-btrfs/cover.1731571240.git.naohiro.aota@wdc.com/
> > > [2] https://github.com/osandov/blktests/issues/150
> > > 
> > > This series is the second part of the fix for the ENOSPC issue. It
> > > introduces "space_info sub-space" and uses it to split out a space_info
> > > for the data relocation block group.
> > > 
> > > The current code assumes we have only one space_info for each block group
> > > type (DATA, METADATA, and SYSTEM). We sometimes need multiple space_infos
> > > to manage special block groups.
> > > 
> > > One example is handling the data relocation block group for the zoned mode.
> > > That block group is dedicated to writing relocated data and we cannot
> > > allocate any regular extent from that block group, which is implemented in
> > > the zoned extent allocator. That block group still belongs to the normal
> > > data space_info. So, when all the normal data block groups are full and
> > > there is some free space in the dedicated block group, the space_info
> > > appears to have some free space, while it cannot allocate a normal extent
> > > anymore. That results in a strange ENOSPC error. We need to have a
> > > space_info for the relocation data block group to represent the situation
> > > properly.
> > 
> > I like the idea and the patches, but I'm a bit concerned that it makes
> > zoned and non-zoned btrfs diverge quite a bit in handling relocation. I'd
> > be interested in what David and Josef think of it. If no one objects to
> > having these sub-space_infos be zoned-specific, I'm all good with it, as
> > it fixes real problems.
> 
> Hmm, on that point, we already do the relocation a bit differently on zoned
> vs non-zoned. On the zoned mode, the relocated data is always allocated
> from a dedicated block group. This series just moves that block group into
> its own space_info (aka sub-space_info) to fix a space accounting issue.
> 
> I admit the concept of sub-space_info is new, so I'd like to hear David and
> Josef's opinion on it.

One thing that sounds ok is that it fixes real problems. Even if it's
just for the zoned mode, the exceptions for handling space already
exist.

Conceptually the sub groups also sound ok, but this is a design decision
that reaches into user space (e.g. exporting to sysfs or requiring more
detailed 'btrfs fi df' and other utilities). Handling the reloc and
potentially tree-log block groups more transparently sounds like a good
idea. Reusing that for more fine-grained space control like tiering
also sounds ok.

> > Would it be useful to also do the same for regular btrfs? And while
> > we're at it, the treelog block group for zoned mode could benefit from
> > its own space_info as well, couldn't it? To not run into premature ENOSPC
> > on frequent syncs, or is this unlikely to happen (I'm thinking out loud here).
> 
> On the regular mode, it can allocate space for relocation data from any
> block group. So, it is not so useful to have a separate space_info for
> that. However, it would be interesting to have a dedicated relocation data
> block group on the regular mode as well, because it could reduce the
> fragmentation of relocated data. This would be an interesting topic to
> explore.
> 
> Yes, adding a treelog sub-space_info is useful too. Apparently, I sometimes
> see a test failure due to a treelog space_info accounting mismatch. I didn't
> implement that sub-space_info for now, because I'd like to know first that the
> sub-space_info concept itself is the right way to go.


Thread overview: 20+ messages
2024-12-05  7:48 [PATCH 00/11] btrfs: zoned: split out data relocation space_info Naohiro Aota
2024-12-05  7:48 ` [PATCH 01/11] btrfs: take btrfs_space_info in btrfs_reserve_data_bytes Naohiro Aota
2024-12-05 23:48   ` Johannes Thumshirn
2024-12-05  7:48 ` [PATCH 02/11] btrfs: take struct btrfs_inode in btrfs_free_reserved_data_space_noquota Naohiro Aota
2024-12-05 23:49   ` Johannes Thumshirn
2024-12-05  7:48 ` [PATCH 03/11] btrfs: factor out init_space_info() Naohiro Aota
2024-12-05  7:48 ` [PATCH 04/11] btrfs: spin out do_async_reclaim_data_space() Naohiro Aota
2024-12-05  7:48 ` [PATCH 05/11] btrfs: factor out check_removing_space_info() Naohiro Aota
2024-12-07 11:29   ` Johannes Thumshirn
2024-12-10  5:16     ` Naohiro Aota
2024-12-05  7:48 ` [PATCH 06/11] btrfs: introduce space_info argument to btrfs_chunk_alloc Naohiro Aota
2024-12-05  7:48 ` [PATCH 07/11] btrfs: pass space_info for block group creation Naohiro Aota
2024-12-05  7:48 ` [PATCH 08/11] btrfs: introduce btrfs_space_info sub-group Naohiro Aota
2024-12-24 14:21   ` kernel test robot
2024-12-05  7:48 ` [PATCH 09/11] btrfs: tweak extent/chunk allocation for space_info sub-space Naohiro Aota
2024-12-05  7:48 ` [PATCH 10/11] btrfs: use proper data space_info Naohiro Aota
2024-12-05  7:48 ` [PATCH 11/11] btrfs: reclaim from data sub-space space_info Naohiro Aota
2024-12-07 11:35 ` [PATCH 00/11] btrfs: zoned: split out data relocation space_info Johannes Thumshirn
2024-12-10  5:40   ` Naohiro Aota
2025-01-02 14:19     ` David Sterba
