* [PATCH v3 00/13] btrfs: zoned: split out space_info for dedicated block groups
@ 2025-04-16 14:28 Naohiro Aota
2025-04-16 14:28 ` [PATCH v3 01/13] btrfs: take btrfs_space_info in btrfs_reserve_data_bytes Naohiro Aota
` (12 more replies)
0 siblings, 13 replies; 27+ messages in thread
From: Naohiro Aota @ 2025-04-16 14:28 UTC
To: linux-btrfs; +Cc: Naohiro Aota
As discussed in [1], there is a longstanding early ENOSPC issue in zoned
mode, which can be triggered even with a simple fio script. It also causes
blktests zbd/009 to fail [2].
[1] https://lore.kernel.org/linux-btrfs/cover.1731571240.git.naohiro.aota@wdc.com/
[2] https://github.com/osandov/blktests/issues/150
This series is the second part of the fix for the ENOSPC issue. It
introduces a "space_info sub-space" and uses it to split out separate
space_infos for the data relocation block group and the metadata tree-log
block group.
The current code assumes that there is only one space_info for each block
group type (DATA, METADATA, and SYSTEM). We sometimes need multiple
space_infos to manage special block groups.
One example is the data relocation block group in zoned mode. That block
group is dedicated to writing relocated data, and the zoned extent
allocator refuses to allocate any regular extent from it. However, that
block group still belongs to the normal data space_info. So, when all the
normal data block groups are full and there is still some free space in the
dedicated block group, the space_info appears to have free space even though
it can no longer allocate normal extents. This results in a confusing ENOSPC
error. We need a separate space_info for the data relocation block group to
represent this situation properly.
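To make the accounting mismatch concrete, here is a minimal userspace
model. It is not btrfs code: struct toy_bg, struct toy_space_info,
toy_free_bytes() and toy_alloc_regular() are hypothetical names invented for
this sketch. One space_info aggregates a normal data block group and a
dedicated relocation block group; the free-space total counts both, while a
regular allocation, which (like the zoned allocator) skips the dedicated
group, can still fail.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified block group: only what the example needs. */
struct toy_bg {
	uint64_t size;
	uint64_t used;
	bool dedicated_reloc;	/* only relocation writes may use it */
};

/* Hypothetical space_info that aggregates several block groups. */
struct toy_space_info {
	struct toy_bg *bgs;
	size_t nr_bgs;
};

/* Free bytes as the space_info sees them: summed over every block group. */
static uint64_t toy_free_bytes(const struct toy_space_info *si)
{
	uint64_t free_bytes = 0;

	for (size_t i = 0; i < si->nr_bgs; i++)
		free_bytes += si->bgs[i].size - si->bgs[i].used;
	return free_bytes;
}

/* Regular allocation: skip dedicated groups, as the zoned allocator does. */
static int toy_alloc_regular(struct toy_space_info *si, uint64_t bytes)
{
	for (size_t i = 0; i < si->nr_bgs; i++) {
		struct toy_bg *bg = &si->bgs[i];

		if (bg->dedicated_reloc)
			continue;
		if (bg->size - bg->used >= bytes) {
			bg->used += bytes;
			return 0;
		}
	}
	return -ENOSPC;
}
```

For example, with one fully used 256-byte normal block group and one empty
256-byte dedicated block group, toy_free_bytes() reports 256 free bytes
while toy_alloc_regular() for 16 bytes returns -ENOSPC. Giving the dedicated
group its own space_info makes the reported free space match what regular
allocations can actually use.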
Changes:
- v3:
- Add proper error handling at the ASSERT in btrfs_create_chunk()
- Move the loop on sub_group into check_removing_space_info()
- Introduce create_space_info_sub_group() to create a sub_group
space_info.
- Formatting fixes.
- v2: https://patch.msgid.link/cover.1742364593.git.naohiro.aota@wdc.com
- Add tree-log sub-space_info implementation.
- Some spelling and style fixes.
- v1: https://patch.msgid.link/cover.1733384171.git.naohiro.aota@wdc.com
Naohiro Aota (13):
btrfs: take btrfs_space_info in btrfs_reserve_data_bytes
btrfs: take struct btrfs_inode in
btrfs_free_reserved_data_space_noquota
btrfs: factor out init_space_info()
btrfs: spin out do_async_reclaim_{data,metadata}_space()
btrfs: factor out check_removing_space_info()
btrfs: introduce space_info argument to btrfs_chunk_alloc
btrfs: pass space_info for block group creation
btrfs: introduce btrfs_space_info sub-group
btrfs: introduce tree-log sub-space_info
btrfs: tweak extent/chunk allocation for space_info sub-space
btrfs: use proper data space_info
btrfs: add block_rsv for treelog
btrfs: reclaim from sub-space space_info
fs/btrfs/block-group.c | 99 +++++++++++++++++-----------
fs/btrfs/block-group.h | 7 +-
fs/btrfs/block-rsv.c | 12 ++++
fs/btrfs/block-rsv.h | 1 +
fs/btrfs/delalloc-space.c | 24 ++++---
fs/btrfs/delalloc-space.h | 3 +-
fs/btrfs/disk-io.c | 1 +
fs/btrfs/extent-tree.c | 20 ++++--
fs/btrfs/fs.h | 2 +
fs/btrfs/inode.c | 4 +-
fs/btrfs/relocation.c | 3 +-
fs/btrfs/space-info.c | 134 ++++++++++++++++++++++++++++----------
fs/btrfs/space-info.h | 11 +++-
fs/btrfs/sysfs.c | 26 ++++++--
fs/btrfs/transaction.c | 2 +-
fs/btrfs/volumes.c | 22 +++++--
fs/btrfs/volumes.h | 3 +-
17 files changed, 267 insertions(+), 107 deletions(-)
--
2.49.0
* [PATCH v3 01/13] btrfs: take btrfs_space_info in btrfs_reserve_data_bytes
2025-04-16 14:28 [PATCH v3 00/13] btrfs: zoned: split out space_info for dedicated block groups Naohiro Aota
@ 2025-04-16 14:28 ` Naohiro Aota
2025-04-16 14:28 ` [PATCH v3 02/13] btrfs: take struct btrfs_inode in btrfs_free_reserved_data_space_noquota Naohiro Aota
` (11 subsequent siblings)
12 siblings, 0 replies; 27+ messages in thread
From: Naohiro Aota @ 2025-04-16 14:28 UTC
To: linux-btrfs; +Cc: Naohiro Aota, Johannes Thumshirn
Make btrfs_reserve_data_bytes() take a struct btrfs_space_info instead of
struct btrfs_fs_info, so that data can be reserved from one of several data
space_info candidates.
This is a preparation for the following commits and there is no functional
change.
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
fs/btrfs/delalloc-space.c | 4 ++--
fs/btrfs/space-info.c | 10 +++++-----
fs/btrfs/space-info.h | 2 +-
3 files changed, 8 insertions(+), 8 deletions(-)
diff --git a/fs/btrfs/delalloc-space.c b/fs/btrfs/delalloc-space.c
index 1479be2427cb..c7181779b013 100644
--- a/fs/btrfs/delalloc-space.c
+++ b/fs/btrfs/delalloc-space.c
@@ -123,7 +123,7 @@ int btrfs_alloc_data_chunk_ondemand(const struct btrfs_inode *inode, u64 bytes)
if (btrfs_is_free_space_inode(inode))
flush = BTRFS_RESERVE_FLUSH_FREE_SPACE_INODE;
- return btrfs_reserve_data_bytes(fs_info, bytes, flush);
+ return btrfs_reserve_data_bytes(fs_info->data_sinfo, bytes, flush);
}
int btrfs_check_data_free_space(struct btrfs_inode *inode,
@@ -144,7 +144,7 @@ int btrfs_check_data_free_space(struct btrfs_inode *inode,
else if (btrfs_is_free_space_inode(inode))
flush = BTRFS_RESERVE_FLUSH_FREE_SPACE_INODE;
- ret = btrfs_reserve_data_bytes(fs_info, len, flush);
+ ret = btrfs_reserve_data_bytes(fs_info->data_sinfo, len, flush);
if (ret < 0)
return ret;
diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
index 77cc5d4a5a47..3bb7246f40fa 100644
--- a/fs/btrfs/space-info.c
+++ b/fs/btrfs/space-info.c
@@ -1836,10 +1836,10 @@ int btrfs_reserve_metadata_bytes(struct btrfs_fs_info *fs_info,
* This will reserve bytes from the data space info. If there is not enough
* space then we will attempt to flush space as specified by flush.
*/
-int btrfs_reserve_data_bytes(struct btrfs_fs_info *fs_info, u64 bytes,
+int btrfs_reserve_data_bytes(struct btrfs_space_info *space_info, u64 bytes,
enum btrfs_reserve_flush_enum flush)
{
- struct btrfs_space_info *data_sinfo = fs_info->data_sinfo;
+ struct btrfs_fs_info *fs_info = space_info->fs_info;
int ret;
ASSERT(flush == BTRFS_RESERVE_FLUSH_DATA ||
@@ -1847,12 +1847,12 @@ int btrfs_reserve_data_bytes(struct btrfs_fs_info *fs_info, u64 bytes,
flush == BTRFS_RESERVE_NO_FLUSH);
ASSERT(!current->journal_info || flush != BTRFS_RESERVE_FLUSH_DATA);
- ret = __reserve_bytes(fs_info, data_sinfo, bytes, flush);
+ ret = __reserve_bytes(fs_info, space_info, bytes, flush);
if (ret == -ENOSPC) {
trace_btrfs_space_reservation(fs_info, "space_info:enospc",
- data_sinfo->flags, bytes, 1);
+ space_info->flags, bytes, 1);
if (btrfs_test_opt(fs_info, ENOSPC_DEBUG))
- btrfs_dump_space_info(fs_info, data_sinfo, bytes, 0);
+ btrfs_dump_space_info(fs_info, space_info, bytes, 0);
}
return ret;
}
diff --git a/fs/btrfs/space-info.h b/fs/btrfs/space-info.h
index a96efdb5e681..7459b4eb99cd 100644
--- a/fs/btrfs/space-info.h
+++ b/fs/btrfs/space-info.h
@@ -288,7 +288,7 @@ static inline void btrfs_space_info_free_bytes_may_use(
btrfs_try_granting_tickets(space_info->fs_info, space_info);
spin_unlock(&space_info->lock);
}
-int btrfs_reserve_data_bytes(struct btrfs_fs_info *fs_info, u64 bytes,
+int btrfs_reserve_data_bytes(struct btrfs_space_info *space_info, u64 bytes,
enum btrfs_reserve_flush_enum flush);
void btrfs_dump_space_info_for_trans_abort(struct btrfs_fs_info *fs_info);
void btrfs_init_async_reclaim_work(struct btrfs_fs_info *fs_info);
--
2.49.0
* [PATCH v3 02/13] btrfs: take struct btrfs_inode in btrfs_free_reserved_data_space_noquota
2025-04-16 14:28 [PATCH v3 00/13] btrfs: zoned: split out space_info for dedicated block groups Naohiro Aota
2025-04-16 14:28 ` [PATCH v3 01/13] btrfs: take btrfs_space_info in btrfs_reserve_data_bytes Naohiro Aota
@ 2025-04-16 14:28 ` Naohiro Aota
2025-04-16 14:28 ` [PATCH v3 03/13] btrfs: factor out init_space_info() Naohiro Aota
` (10 subsequent siblings)
12 siblings, 0 replies; 27+ messages in thread
From: Naohiro Aota @ 2025-04-16 14:28 UTC
To: linux-btrfs; +Cc: Naohiro Aota, Johannes Thumshirn
As in the previous patch, make btrfs_free_reserved_data_space_noquota()
take a struct btrfs_inode instead of struct btrfs_fs_info, so that a later
patch can determine which data space_info it is working on. There is no
functional change with this commit.
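With the inode in hand, the function can still reach fs_info via
inode->root->fs_info, and it gains enough context to pick a data space_info
per inode. The sketch below uses hypothetical toy_ types, and
toy_data_sinfo_for() is a hypothetical hook that today returns the default
data space_info, matching the "no functional change" of this patch; how the
later patch actually chooses the space_info is not shown here.

```c
#include <assert.h>
#include <stdint.h>

struct toy_space_info {
	uint64_t bytes_may_use;
};

struct toy_fs_info {
	uint32_t sectorsize;
	struct toy_space_info *data_sinfo;
};

struct toy_root {
	struct toy_fs_info *fs_info;
};

struct toy_inode {
	struct toy_root *root;
};

/* Hypothetical hook: currently always the default data space_info. */
static struct toy_space_info *toy_data_sinfo_for(struct toy_inode *inode)
{
	return inode->root->fs_info->data_sinfo;
}

static void toy_free_reserved_data_space_noquota(struct toy_inode *inode,
						 uint64_t len)
{
	struct toy_fs_info *fs_info = inode->root->fs_info;
	struct toy_space_info *data_sinfo = toy_data_sinfo_for(inode);

	assert(len % fs_info->sectorsize == 0);	/* like IS_ALIGNED() */
	assert(data_sinfo->bytes_may_use >= len);
	data_sinfo->bytes_may_use -= len;
}
```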
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
fs/btrfs/delalloc-space.c | 7 ++++---
fs/btrfs/delalloc-space.h | 3 +--
fs/btrfs/inode.c | 4 ++--
fs/btrfs/relocation.c | 3 +--
4 files changed, 8 insertions(+), 9 deletions(-)
diff --git a/fs/btrfs/delalloc-space.c b/fs/btrfs/delalloc-space.c
index c7181779b013..a18895255af9 100644
--- a/fs/btrfs/delalloc-space.c
+++ b/fs/btrfs/delalloc-space.c
@@ -151,7 +151,7 @@ int btrfs_check_data_free_space(struct btrfs_inode *inode,
/* Use new btrfs_qgroup_reserve_data to reserve precious data space. */
ret = btrfs_qgroup_reserve_data(inode, reserved, start, len);
if (ret < 0) {
- btrfs_free_reserved_data_space_noquota(fs_info, len);
+ btrfs_free_reserved_data_space_noquota(inode, len);
extent_changeset_free(*reserved);
*reserved = NULL;
} else {
@@ -168,9 +168,10 @@ int btrfs_check_data_free_space(struct btrfs_inode *inode,
* which we can't sleep and is sure it won't affect qgroup reserved space.
* Like clear_bit_hook().
*/
-void btrfs_free_reserved_data_space_noquota(struct btrfs_fs_info *fs_info,
+void btrfs_free_reserved_data_space_noquota(struct btrfs_inode *inode,
u64 len)
{
+ struct btrfs_fs_info *fs_info = inode->root->fs_info;
struct btrfs_space_info *data_sinfo;
ASSERT(IS_ALIGNED(len, fs_info->sectorsize));
@@ -196,7 +197,7 @@ void btrfs_free_reserved_data_space(struct btrfs_inode *inode,
round_down(start, fs_info->sectorsize);
start = round_down(start, fs_info->sectorsize);
- btrfs_free_reserved_data_space_noquota(fs_info, len);
+ btrfs_free_reserved_data_space_noquota(inode, len);
btrfs_qgroup_free_data(inode, reserved, start, len, NULL);
}
diff --git a/fs/btrfs/delalloc-space.h b/fs/btrfs/delalloc-space.h
index 069005959479..6119c0d3f883 100644
--- a/fs/btrfs/delalloc-space.h
+++ b/fs/btrfs/delalloc-space.h
@@ -18,8 +18,7 @@ void btrfs_free_reserved_data_space(struct btrfs_inode *inode,
void btrfs_delalloc_release_space(struct btrfs_inode *inode,
struct extent_changeset *reserved,
u64 start, u64 len, bool qgroup_free);
-void btrfs_free_reserved_data_space_noquota(struct btrfs_fs_info *fs_info,
- u64 len);
+void btrfs_free_reserved_data_space_noquota(struct btrfs_inode *inode, u64 len);
void btrfs_delalloc_release_metadata(struct btrfs_inode *inode, u64 num_bytes,
bool qgroup_free);
int btrfs_delalloc_reserve_space(struct btrfs_inode *inode,
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 868ec20ef805..652dd9c5ec82 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -2591,7 +2591,7 @@ void btrfs_clear_delalloc_extent(struct btrfs_inode *inode,
!btrfs_is_free_space_inode(inode) &&
!(state->state & EXTENT_NORESERVE) &&
(bits & EXTENT_CLEAR_DATA_RESV))
- btrfs_free_reserved_data_space_noquota(fs_info, len);
+ btrfs_free_reserved_data_space_noquota(inode, len);
percpu_counter_add_batch(&fs_info->delalloc_bytes, -len,
fs_info->delalloc_batch);
@@ -9718,7 +9718,7 @@ ssize_t btrfs_do_encoded_write(struct kiocb *iocb, struct iov_iter *from,
* bytes_may_use.
*/
if (!extent_reserved)
- btrfs_free_reserved_data_space_noquota(fs_info, disk_num_bytes);
+ btrfs_free_reserved_data_space_noquota(inode, disk_num_bytes);
out_unlock:
btrfs_unlock_extent(io_tree, start, end, &cached_state);
out_folios:
diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 6ba9fcb53c33..6f4d9ffa404e 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -2749,8 +2749,7 @@ static noinline_for_stack int prealloc_file_extent_cluster(struct reloc_control
btrfs_inode_unlock(inode, 0);
if (cur_offset < prealloc_end)
- btrfs_free_reserved_data_space_noquota(inode->root->fs_info,
- prealloc_end + 1 - cur_offset);
+ btrfs_free_reserved_data_space_noquota(inode, prealloc_end + 1 - cur_offset);
return ret;
}
--
2.49.0
* [PATCH v3 03/13] btrfs: factor out init_space_info()
2025-04-16 14:28 [PATCH v3 00/13] btrfs: zoned: split out space_info for dedicated block groups Naohiro Aota
2025-04-16 14:28 ` [PATCH v3 01/13] btrfs: take btrfs_space_info in btrfs_reserve_data_bytes Naohiro Aota
2025-04-16 14:28 ` [PATCH v3 02/13] btrfs: take struct btrfs_inode in btrfs_free_reserved_data_space_noquota Naohiro Aota
@ 2025-04-16 14:28 ` Naohiro Aota
2025-04-16 14:28 ` [PATCH v3 04/13] btrfs: spin out do_async_reclaim_{data,metadata}_space() Naohiro Aota
` (9 subsequent siblings)
12 siblings, 0 replies; 27+ messages in thread
From: Naohiro Aota @ 2025-04-16 14:28 UTC
To: linux-btrfs; +Cc: Naohiro Aota, Johannes Thumshirn
Factor out the initialization of struct btrfs_space_info into
init_space_info(), which will be used by a later patch. There is no
functional change.
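Separating initialization from allocation lets a later caller initialize a
space_info that it obtained some other way, while create_space_info() keeps
doing both. A minimal sketch of the split, using hypothetical toy_ names,
only two fields, and calloc() standing in for kzalloc(GFP_NOFS):

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

struct toy_space_info {
	uint64_t flags;
	int full;
};

/* Initialization only: usable on any already-allocated object. */
static void toy_init_space_info(struct toy_space_info *space_info,
				uint64_t flags)
{
	space_info->flags = flags;
	space_info->full = 0;
}

/* Allocation plus initialization, the shape create_space_info() keeps. */
static int toy_create_space_info(uint64_t flags,
				 struct toy_space_info **out)
{
	struct toy_space_info *space_info = calloc(1, sizeof(*space_info));

	if (!space_info)
		return -ENOMEM;
	toy_init_space_info(space_info, flags);
	*out = space_info;
	return 0;
}
```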
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
fs/btrfs/space-info.c | 27 ++++++++++++++++-----------
1 file changed, 16 insertions(+), 11 deletions(-)
diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
index 3bb7246f40fa..7334ffa67a86 100644
--- a/fs/btrfs/space-info.c
+++ b/fs/btrfs/space-info.c
@@ -234,19 +234,11 @@ void btrfs_update_space_info_chunk_size(struct btrfs_space_info *space_info,
WRITE_ONCE(space_info->chunk_size, chunk_size);
}
-static int create_space_info(struct btrfs_fs_info *info, u64 flags)
+static void init_space_info(struct btrfs_fs_info *info,
+ struct btrfs_space_info *space_info, u64 flags)
{
-
- struct btrfs_space_info *space_info;
- int i;
- int ret;
-
- space_info = kzalloc(sizeof(*space_info), GFP_NOFS);
- if (!space_info)
- return -ENOMEM;
-
space_info->fs_info = info;
- for (i = 0; i < BTRFS_NR_RAID_TYPES; i++)
+ for (int i = 0; i < BTRFS_NR_RAID_TYPES; i++)
INIT_LIST_HEAD(&space_info->block_groups[i]);
init_rwsem(&space_info->groups_sem);
spin_lock_init(&space_info->lock);
@@ -260,6 +252,19 @@ static int create_space_info(struct btrfs_fs_info *info, u64 flags)
if (btrfs_is_zoned(info))
space_info->bg_reclaim_threshold = BTRFS_DEFAULT_ZONED_RECLAIM_THRESH;
+}
+
+static int create_space_info(struct btrfs_fs_info *info, u64 flags)
+{
+
+ struct btrfs_space_info *space_info;
+ int ret;
+
+ space_info = kzalloc(sizeof(*space_info), GFP_NOFS);
+ if (!space_info)
+ return -ENOMEM;
+
+ init_space_info(info, space_info, flags);
ret = btrfs_sysfs_add_space_info_type(info, space_info);
if (ret)
--
2.49.0
* [PATCH v3 04/13] btrfs: spin out do_async_reclaim_{data,metadata}_space()
2025-04-16 14:28 [PATCH v3 00/13] btrfs: zoned: split out space_info for dedicated block groups Naohiro Aota
` (2 preceding siblings ...)
2025-04-16 14:28 ` [PATCH v3 03/13] btrfs: factor out init_space_info() Naohiro Aota
@ 2025-04-16 14:28 ` Naohiro Aota
2025-04-16 14:28 ` [PATCH v3 05/13] btrfs: factor out check_removing_space_info() Naohiro Aota
` (8 subsequent siblings)
12 siblings, 0 replies; 27+ messages in thread
From: Naohiro Aota @ 2025-04-16 14:28 UTC
To: linux-btrfs; +Cc: Naohiro Aota, Johannes Thumshirn
Factor out the main part of btrfs_async_reclaim_data_space() into
do_async_reclaim_data_space(), so that it can take the data space_info it
works on as a parameter. Do the same for the metadata counterpart. There is
no functional change.
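The split follows a common kernel pattern: the work-item callback only maps
the work_struct back to its owner with container_of() and picks the
space_info, while the do_*() body takes the space_info directly and can
therefore run on any space_info. A userspace sketch, assuming a local
container_of() built on offsetof() and hypothetical toy_ types in place of
struct work_struct and struct btrfs_fs_info:

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's container_of() macro. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct toy_work {
	void (*func)(struct toy_work *work);
};

struct toy_space_info {
	int reclaim_runs;
};

struct toy_fs_info {
	struct toy_space_info *data_sinfo;
	struct toy_work async_data_reclaim_work;
};

/* The body only needs a space_info, so any space_info can be passed. */
static void toy_do_async_reclaim_data_space(struct toy_space_info *space_info)
{
	space_info->reclaim_runs++;
}

/* Thin work-item wrapper: recover the owner, then delegate. */
static void toy_async_reclaim_data_space(struct toy_work *work)
{
	struct toy_fs_info *fs_info =
		container_of(work, struct toy_fs_info, async_data_reclaim_work);

	toy_do_async_reclaim_data_space(fs_info->data_sinfo);
}
```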
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
fs/btrfs/space-info.c | 45 ++++++++++++++++++++++++++++---------------
1 file changed, 29 insertions(+), 16 deletions(-)
diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
index 7334ffa67a86..d6d33ab754ba 100644
--- a/fs/btrfs/space-info.c
+++ b/fs/btrfs/space-info.c
@@ -1088,23 +1088,15 @@ static bool maybe_fail_all_tickets(struct btrfs_fs_info *fs_info,
return (tickets_id != space_info->tickets_id);
}
-/*
- * This is for normal flushers, we can wait all goddamned day if we want to. We
- * will loop and continuously try to flush as long as we are making progress.
- * We count progress as clearing off tickets each time we have to loop.
- */
-static void btrfs_async_reclaim_metadata_space(struct work_struct *work)
+static void do_async_reclaim_metadata_space(struct btrfs_space_info *space_info)
{
- struct btrfs_fs_info *fs_info;
- struct btrfs_space_info *space_info;
+ struct btrfs_fs_info *fs_info = space_info->fs_info;
u64 to_reclaim;
enum btrfs_flush_state flush_state;
int commit_cycles = 0;
u64 last_tickets_id;
enum btrfs_flush_state final_state;
- fs_info = container_of(work, struct btrfs_fs_info, async_reclaim_work);
- space_info = btrfs_find_space_info(fs_info, BTRFS_BLOCK_GROUP_METADATA);
if (btrfs_is_zoned(fs_info))
final_state = RESET_ZONES;
else
@@ -1178,6 +1170,21 @@ static void btrfs_async_reclaim_metadata_space(struct work_struct *work)
} while (flush_state <= final_state);
}
+/*
+ * This is for normal flushers, we can wait all goddamned day if we want to. We
+ * will loop and continuously try to flush as long as we are making progress.
+ * We count progress as clearing off tickets each time we have to loop.
+ */
+static void btrfs_async_reclaim_metadata_space(struct work_struct *work)
+{
+ struct btrfs_fs_info *fs_info;
+ struct btrfs_space_info *space_info;
+
+ fs_info = container_of(work, struct btrfs_fs_info, async_reclaim_work);
+ space_info = btrfs_find_space_info(fs_info, BTRFS_BLOCK_GROUP_METADATA);
+ do_async_reclaim_metadata_space(space_info);
+}
+
/*
* This handles pre-flushing of metadata space before we get to the point that
* we need to start blocking threads on tickets. The logic here is different
@@ -1323,16 +1330,12 @@ static const enum btrfs_flush_state data_flush_states[] = {
ALLOC_CHUNK_FORCE,
};
-static void btrfs_async_reclaim_data_space(struct work_struct *work)
+static void do_async_reclaim_data_space(struct btrfs_space_info *space_info)
{
- struct btrfs_fs_info *fs_info;
- struct btrfs_space_info *space_info;
+ struct btrfs_fs_info *fs_info = space_info->fs_info;
u64 last_tickets_id;
enum btrfs_flush_state flush_state = 0;
- fs_info = container_of(work, struct btrfs_fs_info, async_data_reclaim_work);
- space_info = fs_info->data_sinfo;
-
spin_lock(&space_info->lock);
if (list_empty(&space_info->tickets)) {
space_info->flush = 0;
@@ -1400,6 +1403,16 @@ static void btrfs_async_reclaim_data_space(struct work_struct *work)
spin_unlock(&space_info->lock);
}
+static void btrfs_async_reclaim_data_space(struct work_struct *work)
+{
+ struct btrfs_fs_info *fs_info;
+ struct btrfs_space_info *space_info;
+
+ fs_info = container_of(work, struct btrfs_fs_info, async_data_reclaim_work);
+ space_info = fs_info->data_sinfo;
+ do_async_reclaim_data_space(space_info);
+}
+
void btrfs_init_async_reclaim_work(struct btrfs_fs_info *fs_info)
{
INIT_WORK(&fs_info->async_reclaim_work, btrfs_async_reclaim_metadata_space);
--
2.49.0
* [PATCH v3 05/13] btrfs: factor out check_removing_space_info()
2025-04-16 14:28 [PATCH v3 00/13] btrfs: zoned: split out space_info for dedicated block groups Naohiro Aota
` (3 preceding siblings ...)
2025-04-16 14:28 ` [PATCH v3 04/13] btrfs: spin out do_async_reclaim_{data,metadata}_space() Naohiro Aota
@ 2025-04-16 14:28 ` Naohiro Aota
2025-04-16 14:28 ` [PATCH v3 06/13] btrfs: introduce space_info argument to btrfs_chunk_alloc Naohiro Aota
` (7 subsequent siblings)
12 siblings, 0 replies; 27+ messages in thread
From: Naohiro Aota @ 2025-04-16 14:28 UTC
To: linux-btrfs; +Cc: Naohiro Aota, Johannes Thumshirn
Factor out check_removing_space_info() from btrfs_free_block_groups(). It
sanity-checks a space_info that is about to be removed. There is no
functional change.
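The checks amount to "every reservation counter should be back to zero at
teardown", with one exception for bytes_reserved on metadata after a failed
log tree cleanup. The sketch below restates that logic with hypothetical
toy_ names; where the kernel version uses WARN_ON() and
btrfs_dump_space_info(), this one counts the problems and prints a line to
stderr.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

struct toy_space_info {
	uint64_t bytes_pinned;
	uint64_t bytes_may_use;
	uint64_t bytes_reserved;
	uint64_t reclaim_size;
	bool is_metadata;
};

/*
 * Return how many of the checks fail for a space_info being torn down.
 * bytes_reserved is tolerated on metadata if a log tree cleanup failed.
 */
static int toy_check_removing_space_info(const struct toy_space_info *si,
					 bool log_cleanup_error)
{
	int problems = 0;

	if (si->bytes_pinned > 0 || si->bytes_may_use > 0)
		problems++;
	if (!si->is_metadata || !log_cleanup_error) {
		if (si->bytes_reserved > 0)
			problems++;
	}
	if (si->reclaim_size > 0)
		problems++;
	if (problems)
		fprintf(stderr, "space_info leak: %d problem(s)\n", problems);
	return problems;
}
```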
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
fs/btrfs/block-group.c | 51 ++++++++++++++++++++++++------------------
1 file changed, 29 insertions(+), 22 deletions(-)
diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index 91807d294366..b700d80089d3 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -4400,6 +4400,34 @@ void btrfs_put_block_group_cache(struct btrfs_fs_info *info)
}
}
+static void check_removing_space_info(struct btrfs_space_info *space_info)
+{
+ struct btrfs_fs_info *info = space_info->fs_info;
+
+ /*
+ * Do not hide this behind enospc_debug, this is actually
+ * important and indicates a real bug if this happens.
+ */
+ if (WARN_ON(space_info->bytes_pinned > 0 ||
+ space_info->bytes_may_use > 0))
+ btrfs_dump_space_info(info, space_info, 0, 0);
+
+ /*
+ * If there was a failure to cleanup a log tree, very likely due
+ * to an IO failure on a writeback attempt of one or more of its
+ * extent buffers, we could not do proper (and cheap) unaccounting
+ * of their reserved space, so don't warn on bytes_reserved > 0 in
+ * that case.
+ */
+ if (!(space_info->flags & BTRFS_BLOCK_GROUP_METADATA) ||
+ !BTRFS_FS_LOG_CLEANUP_ERROR(info)) {
+ if (WARN_ON(space_info->bytes_reserved > 0))
+ btrfs_dump_space_info(info, space_info, 0, 0);
+ }
+
+ WARN_ON(space_info->reclaim_size > 0);
+}
+
/*
* Must be called only after stopping all workers, since we could have block
* group caching kthreads running, and therefore they could race with us if we
@@ -4501,28 +4529,7 @@ int btrfs_free_block_groups(struct btrfs_fs_info *info)
struct btrfs_space_info,
list);
- /*
- * Do not hide this behind enospc_debug, this is actually
- * important and indicates a real bug if this happens.
- */
- if (WARN_ON(space_info->bytes_pinned > 0 ||
- space_info->bytes_may_use > 0))
- btrfs_dump_space_info(info, space_info, 0, 0);
-
- /*
- * If there was a failure to cleanup a log tree, very likely due
- * to an IO failure on a writeback attempt of one or more of its
- * extent buffers, we could not do proper (and cheap) unaccounting
- * of their reserved space, so don't warn on bytes_reserved > 0 in
- * that case.
- */
- if (!(space_info->flags & BTRFS_BLOCK_GROUP_METADATA) ||
- !BTRFS_FS_LOG_CLEANUP_ERROR(info)) {
- if (WARN_ON(space_info->bytes_reserved > 0))
- btrfs_dump_space_info(info, space_info, 0, 0);
- }
-
- WARN_ON(space_info->reclaim_size > 0);
+ check_removing_space_info(space_info);
list_del(&space_info->list);
btrfs_sysfs_remove_space_info(space_info);
}
--
2.49.0
* [PATCH v3 06/13] btrfs: introduce space_info argument to btrfs_chunk_alloc
2025-04-16 14:28 [PATCH v3 00/13] btrfs: zoned: split out space_info for dedicated block groups Naohiro Aota
` (4 preceding siblings ...)
2025-04-16 14:28 ` [PATCH v3 05/13] btrfs: factor out check_removing_space_info() Naohiro Aota
@ 2025-04-16 14:28 ` Naohiro Aota
2025-04-17 12:38 ` Josef Bacik
2025-04-16 14:28 ` [PATCH v3 07/13] btrfs: pass space_info for block group creation Naohiro Aota
` (6 subsequent siblings)
12 siblings, 1 reply; 27+ messages in thread
From: Naohiro Aota @ 2025-04-16 14:28 UTC
To: linux-btrfs; +Cc: Naohiro Aota, Johannes Thumshirn
Add an optional struct btrfs_space_info argument to btrfs_chunk_alloc(). If
it is specified, btrfs_chunk_alloc() works on that space_info. If it is
NULL, the default space_info is looked up, as before.
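The optional argument uses the usual "NULL means default" convention, so
every existing caller can pass NULL and keep its behavior. A minimal sketch
with hypothetical toy_ types; toy_find_space_info() only imitates the role
of btrfs_find_space_info() (first space_info whose type flags match) and
does not reproduce its implementation:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct toy_space_info {
	uint64_t flags;
	int chunks;
};

struct toy_fs_info {
	struct toy_space_info *sinfos;
	size_t nr_sinfos;
};

/* Stand-in lookup: first space_info sharing a type flag with @flags. */
static struct toy_space_info *toy_find_space_info(struct toy_fs_info *fs_info,
						  uint64_t flags)
{
	for (size_t i = 0; i < fs_info->nr_sinfos; i++)
		if (fs_info->sinfos[i].flags & flags)
			return &fs_info->sinfos[i];
	return NULL;
}

static int toy_chunk_alloc(struct toy_fs_info *fs_info,
			   struct toy_space_info *space_info, uint64_t flags)
{
	/* NULL keeps the old behavior: look the space_info up by type. */
	if (!space_info) {
		space_info = toy_find_space_info(fs_info, flags);
		assert(space_info);
	}
	space_info->chunks++;
	return 1;	/* "successfully allocated a chunk" */
}
```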
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
fs/btrfs/block-group.c | 19 ++++++++++++-------
fs/btrfs/block-group.h | 3 ++-
fs/btrfs/extent-tree.c | 2 +-
fs/btrfs/space-info.c | 2 +-
fs/btrfs/transaction.c | 2 +-
5 files changed, 17 insertions(+), 11 deletions(-)
diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index b700d80089d3..12cc9069d4bb 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -3018,7 +3018,7 @@ int btrfs_inc_block_group_ro(struct btrfs_block_group *cache,
*/
alloc_flags = btrfs_get_alloc_profile(fs_info, cache->flags);
if (alloc_flags != cache->flags) {
- ret = btrfs_chunk_alloc(trans, alloc_flags,
+ ret = btrfs_chunk_alloc(trans, NULL, alloc_flags,
CHUNK_ALLOC_FORCE);
/*
* ENOSPC is allowed here, we may have enough space
@@ -3047,7 +3047,7 @@ int btrfs_inc_block_group_ro(struct btrfs_block_group *cache,
goto unlock_out;
alloc_flags = btrfs_get_alloc_profile(fs_info, cache->space_info->flags);
- ret = btrfs_chunk_alloc(trans, alloc_flags, CHUNK_ALLOC_FORCE);
+ ret = btrfs_chunk_alloc(trans, NULL, alloc_flags, CHUNK_ALLOC_FORCE);
if (ret < 0)
goto out;
/*
@@ -3899,7 +3899,7 @@ int btrfs_force_chunk_alloc(struct btrfs_trans_handle *trans, u64 type)
{
u64 alloc_flags = btrfs_get_alloc_profile(trans->fs_info, type);
- return btrfs_chunk_alloc(trans, alloc_flags, CHUNK_ALLOC_FORCE);
+ return btrfs_chunk_alloc(trans, NULL, alloc_flags, CHUNK_ALLOC_FORCE);
}
static struct btrfs_block_group *do_chunk_alloc(struct btrfs_trans_handle *trans, u64 flags)
@@ -4102,12 +4102,15 @@ static struct btrfs_block_group *do_chunk_alloc(struct btrfs_trans_handle *trans
* - return 0 if it doesn't need to allocate a new chunk,
* - return 1 if it successfully allocates a chunk,
* - return errors including -ENOSPC otherwise.
+ *
+ * @space_info can optionally be specified to make a new chunk belong to it. If
+ * it is NULL, it is set automatically.
*/
-int btrfs_chunk_alloc(struct btrfs_trans_handle *trans, u64 flags,
+int btrfs_chunk_alloc(struct btrfs_trans_handle *trans,
+ struct btrfs_space_info *space_info, u64 flags,
enum btrfs_chunk_alloc_enum force)
{
struct btrfs_fs_info *fs_info = trans->fs_info;
- struct btrfs_space_info *space_info;
struct btrfs_block_group *ret_bg;
bool wait_for_alloc = false;
bool should_alloc = false;
@@ -4146,8 +4149,10 @@ int btrfs_chunk_alloc(struct btrfs_trans_handle *trans, u64 flags,
if (flags & BTRFS_BLOCK_GROUP_SYSTEM)
return -ENOSPC;
- space_info = btrfs_find_space_info(fs_info, flags);
- ASSERT(space_info);
+ if (!space_info) {
+ space_info = btrfs_find_space_info(fs_info, flags);
+ ASSERT(space_info);
+ }
do {
spin_lock(&space_info->lock);
diff --git a/fs/btrfs/block-group.h b/fs/btrfs/block-group.h
index 36937eeab9b8..c01f3af726a1 100644
--- a/fs/btrfs/block-group.h
+++ b/fs/btrfs/block-group.h
@@ -342,7 +342,8 @@ int btrfs_add_reserved_bytes(struct btrfs_block_group *cache,
bool force_wrong_size_class);
void btrfs_free_reserved_bytes(struct btrfs_block_group *cache,
u64 num_bytes, int delalloc);
-int btrfs_chunk_alloc(struct btrfs_trans_handle *trans, u64 flags,
+int btrfs_chunk_alloc(struct btrfs_trans_handle *trans,
+ struct btrfs_space_info *space_info, u64 flags,
enum btrfs_chunk_alloc_enum force);
int btrfs_force_chunk_alloc(struct btrfs_trans_handle *trans, u64 type);
void check_system_chunk(struct btrfs_trans_handle *trans, const u64 type);
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index a68a8a07caff..1dad1a42c9c1 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -4159,7 +4159,7 @@ static int find_free_extent_update_loop(struct btrfs_fs_info *fs_info,
return ret;
}
- ret = btrfs_chunk_alloc(trans, ffe_ctl->flags,
+ ret = btrfs_chunk_alloc(trans, NULL, ffe_ctl->flags,
CHUNK_ALLOC_FORCE_FOR_EXTENT);
/* Do not bail out on ENOSPC since we can do more. */
diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
index d6d33ab754ba..2489c2a16123 100644
--- a/fs/btrfs/space-info.c
+++ b/fs/btrfs/space-info.c
@@ -817,7 +817,7 @@ static void flush_space(struct btrfs_fs_info *fs_info,
ret = PTR_ERR(trans);
break;
}
- ret = btrfs_chunk_alloc(trans,
+ ret = btrfs_chunk_alloc(trans, space_info,
btrfs_get_alloc_profile(fs_info, space_info->flags),
(state == ALLOC_CHUNK) ? CHUNK_ALLOC_NO_FORCE :
CHUNK_ALLOC_FORCE);
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 39e48bf610a1..670e0527996c 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -763,7 +763,7 @@ start_transaction(struct btrfs_root *root, unsigned int num_items,
if (do_chunk_alloc && num_bytes) {
u64 flags = h->block_rsv->space_info->flags;
- btrfs_chunk_alloc(h, btrfs_get_alloc_profile(fs_info, flags),
+ btrfs_chunk_alloc(h, NULL, btrfs_get_alloc_profile(fs_info, flags),
CHUNK_ALLOC_NO_FORCE);
}
--
2.49.0
* [PATCH v3 07/13] btrfs: pass space_info for block group creation
2025-04-16 14:28 [PATCH v3 00/13] btrfs: zoned: split out space_info for dedicated block groups Naohiro Aota
` (5 preceding siblings ...)
2025-04-16 14:28 ` [PATCH v3 06/13] btrfs: introduce space_info argument to btrfs_chunk_alloc Naohiro Aota
@ 2025-04-16 14:28 ` Naohiro Aota
2025-04-17 12:40 ` Josef Bacik
2025-04-16 14:28 ` [PATCH v3 08/13] btrfs: introduce btrfs_space_info sub-group Naohiro Aota
` (5 subsequent siblings)
12 siblings, 1 reply; 27+ messages in thread
From: Naohiro Aota @ 2025-04-16 14:28 UTC
To: linux-btrfs; +Cc: Naohiro Aota, Johannes Thumshirn
Add a struct btrfs_space_info parameter to btrfs_make_block_group(), its
related functions, and struct alloc_chunk_ctl. The new block group will
belong to the passed space_info. If NULL is passed, the default space_info
is used.
The parameter is used in a later commit; the behavior is unchanged for now.
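Threading the target space_info through the allocation context means the
block group is attached to the intended space_info at creation time instead
of being looked up by type afterwards. The excerpt above does not show
where a NULL argument is resolved to the default, so this sketch simply
resolves it once at a hypothetical top-level entry point; all toy_ names are
hypothetical, and toy_alloc_chunk_ctl only mirrors the new space_info member
of struct alloc_chunk_ctl.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

struct toy_space_info {
	int nr_block_groups;
};

struct toy_block_group {
	struct toy_space_info *space_info;
	uint64_t size;
};

/* Mirrors the new ctl member: the space_info the bg will belong to. */
struct toy_alloc_chunk_ctl {
	uint64_t chunk_size;
	struct toy_space_info *space_info;
};

static int toy_make_block_group(struct toy_block_group *bg,
				struct toy_space_info *space_info,
				uint64_t size)
{
	if (!space_info)
		return -EINVAL;
	bg->space_info = space_info;
	bg->size = size;
	space_info->nr_block_groups++;
	return 0;
}

/* Resolve NULL to the default once, then pass it down via the ctl. */
static int toy_create_chunk(struct toy_block_group *bg,
			    struct toy_space_info *space_info,
			    struct toy_space_info *default_sinfo,
			    uint64_t chunk_size)
{
	struct toy_alloc_chunk_ctl ctl = {
		.chunk_size = chunk_size,
		.space_info = space_info ? space_info : default_sinfo,
	};

	return toy_make_block_group(bg, ctl.space_info, ctl.chunk_size);
}
```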
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
fs/btrfs/block-group.c | 18 ++++++++++--------
fs/btrfs/block-group.h | 4 ++--
fs/btrfs/volumes.c | 22 +++++++++++++++++-----
fs/btrfs/volumes.h | 3 ++-
4 files changed, 31 insertions(+), 16 deletions(-)
diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index 12cc9069d4bb..846c9737ff5a 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -2866,8 +2866,8 @@ static u64 calculate_global_root_id(const struct btrfs_fs_info *fs_info, u64 off
}
struct btrfs_block_group *btrfs_make_block_group(struct btrfs_trans_handle *trans,
- u64 type,
- u64 chunk_offset, u64 size)
+ struct btrfs_space_info *space_info,
+ u64 type, u64 chunk_offset, u64 size)
{
struct btrfs_fs_info *fs_info = trans->fs_info;
struct btrfs_block_group *cache;
@@ -2921,7 +2921,7 @@ struct btrfs_block_group *btrfs_make_block_group(struct btrfs_trans_handle *tran
* assigned to our block group. We want our bg to be added to the rbtree
* with its ->space_info set.
*/
- cache->space_info = btrfs_find_space_info(fs_info, cache->flags);
+ cache->space_info = space_info;
ASSERT(cache->space_info);
ret = btrfs_add_block_group_cache(cache);
@@ -3902,7 +3902,9 @@ int btrfs_force_chunk_alloc(struct btrfs_trans_handle *trans, u64 type)
return btrfs_chunk_alloc(trans, NULL, alloc_flags, CHUNK_ALLOC_FORCE);
}
-static struct btrfs_block_group *do_chunk_alloc(struct btrfs_trans_handle *trans, u64 flags)
+static struct btrfs_block_group *do_chunk_alloc(struct btrfs_trans_handle *trans,
+ struct btrfs_space_info *space_info,
+ u64 flags)
{
struct btrfs_block_group *bg;
int ret;
@@ -3915,7 +3917,7 @@ static struct btrfs_block_group *do_chunk_alloc(struct btrfs_trans_handle *trans
*/
check_system_chunk(trans, flags);
- bg = btrfs_create_chunk(trans, flags);
+ bg = btrfs_create_chunk(trans, space_info, flags);
if (IS_ERR(bg)) {
ret = PTR_ERR(bg);
goto out;
@@ -3964,7 +3966,7 @@ static struct btrfs_block_group *do_chunk_alloc(struct btrfs_trans_handle *trans
const u64 sys_flags = btrfs_system_alloc_profile(trans->fs_info);
struct btrfs_block_group *sys_bg;
- sys_bg = btrfs_create_chunk(trans, sys_flags);
+ sys_bg = btrfs_create_chunk(trans, NULL, sys_flags);
if (IS_ERR(sys_bg)) {
ret = PTR_ERR(sys_bg);
btrfs_abort_transaction(trans, ret);
@@ -4214,7 +4216,7 @@ int btrfs_chunk_alloc(struct btrfs_trans_handle *trans,
force_metadata_allocation(fs_info);
}
- ret_bg = do_chunk_alloc(trans, flags);
+ ret_bg = do_chunk_alloc(trans, space_info, flags);
trans->allocating_chunk = false;
if (IS_ERR(ret_bg)) {
@@ -4297,7 +4299,7 @@ static void reserve_chunk_space(struct btrfs_trans_handle *trans,
* the paths we visit in the chunk tree (they were already COWed
* or created in the current transaction for example).
*/
- bg = btrfs_create_chunk(trans, flags);
+ bg = btrfs_create_chunk(trans, NULL, flags);
if (IS_ERR(bg)) {
ret = PTR_ERR(bg);
} else {
diff --git a/fs/btrfs/block-group.h b/fs/btrfs/block-group.h
index c01f3af726a1..35309b690d6f 100644
--- a/fs/btrfs/block-group.h
+++ b/fs/btrfs/block-group.h
@@ -326,8 +326,8 @@ void btrfs_reclaim_bgs(struct btrfs_fs_info *fs_info);
void btrfs_mark_bg_to_reclaim(struct btrfs_block_group *bg);
int btrfs_read_block_groups(struct btrfs_fs_info *info);
struct btrfs_block_group *btrfs_make_block_group(struct btrfs_trans_handle *trans,
- u64 type,
- u64 chunk_offset, u64 size);
+ struct btrfs_space_info *space_info,
+ u64 type, u64 chunk_offset, u64 size);
void btrfs_create_pending_block_groups(struct btrfs_trans_handle *trans);
int btrfs_inc_block_group_ro(struct btrfs_block_group *cache,
bool do_chunk_alloc);
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 7509cbe3272c..5462c832ea19 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -3420,7 +3420,7 @@ int btrfs_remove_chunk(struct btrfs_trans_handle *trans, u64 chunk_offset)
const u64 sys_flags = btrfs_system_alloc_profile(fs_info);
struct btrfs_block_group *sys_bg;
- sys_bg = btrfs_create_chunk(trans, sys_flags);
+ sys_bg = btrfs_create_chunk(trans, NULL, sys_flags);
if (IS_ERR(sys_bg)) {
ret = PTR_ERR(sys_bg);
btrfs_abort_transaction(trans, ret);
@@ -5216,6 +5216,8 @@ struct alloc_chunk_ctl {
u64 stripe_size;
u64 chunk_size;
int ndevs;
+ /* space_info the block group is going to belong to. */
+ struct btrfs_space_info *space_info;
};
static void init_alloc_chunk_ctl_policy_regular(
@@ -5617,7 +5619,8 @@ static struct btrfs_block_group *create_chunk(struct btrfs_trans_handle *trans,
return ERR_PTR(ret);
}
- block_group = btrfs_make_block_group(trans, type, start, ctl->chunk_size);
+ block_group = btrfs_make_block_group(trans, ctl->space_info, type, start,
+ ctl->chunk_size);
if (IS_ERR(block_group)) {
btrfs_remove_chunk_map(info, map);
return block_group;
@@ -5643,7 +5646,8 @@ static struct btrfs_block_group *create_chunk(struct btrfs_trans_handle *trans,
}
struct btrfs_block_group *btrfs_create_chunk(struct btrfs_trans_handle *trans,
- u64 type)
+ struct btrfs_space_info *space_info,
+ u64 type)
{
struct btrfs_fs_info *info = trans->fs_info;
struct btrfs_fs_devices *fs_devices = info->fs_devices;
@@ -5671,8 +5675,16 @@ struct btrfs_block_group *btrfs_create_chunk(struct btrfs_trans_handle *trans,
return ERR_PTR(-EINVAL);
}
+ if (!space_info) {
+ space_info = btrfs_find_space_info(info, type);
+ if (!space_info) {
+ ASSERT(0);
+ return ERR_PTR(-EINVAL);
+ }
+ }
ctl.start = find_next_chunk(info);
ctl.type = type;
+ ctl.space_info = space_info;
init_alloc_chunk_ctl(fs_devices, &ctl);
devices_info = kcalloc(fs_devices->rw_devices, sizeof(*devices_info),
@@ -5840,12 +5852,12 @@ static noinline int init_first_rw_device(struct btrfs_trans_handle *trans)
*/
alloc_profile = btrfs_metadata_alloc_profile(fs_info);
- meta_bg = btrfs_create_chunk(trans, alloc_profile);
+ meta_bg = btrfs_create_chunk(trans, NULL, alloc_profile);
if (IS_ERR(meta_bg))
return PTR_ERR(meta_bg);
alloc_profile = btrfs_system_alloc_profile(fs_info);
- sys_bg = btrfs_create_chunk(trans, alloc_profile);
+ sys_bg = btrfs_create_chunk(trans, NULL, alloc_profile);
if (IS_ERR(sys_bg))
return PTR_ERR(sys_bg);
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index e247d551da67..7f314a4487c4 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -715,7 +715,8 @@ struct btrfs_discard_stripe *btrfs_map_discard(struct btrfs_fs_info *fs_info,
int btrfs_read_sys_array(struct btrfs_fs_info *fs_info);
int btrfs_read_chunk_tree(struct btrfs_fs_info *fs_info);
struct btrfs_block_group *btrfs_create_chunk(struct btrfs_trans_handle *trans,
- u64 type);
+ struct btrfs_space_info *space_info,
+ u64 type);
void btrfs_mapping_tree_free(struct btrfs_fs_info *fs_info);
int btrfs_open_devices(struct btrfs_fs_devices *fs_devices,
blk_mode_t flags, void *holder);
--
2.49.0
* [PATCH v3 08/13] btrfs: introduce btrfs_space_info sub-group
2025-04-16 14:28 [PATCH v3 00/13] btrfs: zoned: split out space_info for dedicated block groups Naohiro Aota
` (6 preceding siblings ...)
2025-04-16 14:28 ` [PATCH v3 07/13] btrfs: pass space_info for block group creation Naohiro Aota
@ 2025-04-16 14:28 ` Naohiro Aota
2025-04-16 14:56 ` Johannes Thumshirn
2025-04-17 12:43 ` Josef Bacik
2025-04-16 14:28 ` [PATCH v3 09/13] btrfs: introduce tree-log sub-space_info Naohiro Aota
` (4 subsequent siblings)
12 siblings, 2 replies; 27+ messages in thread
From: Naohiro Aota @ 2025-04-16 14:28 UTC (permalink / raw)
To: linux-btrfs; +Cc: Naohiro Aota
Current code assumes we have only one space_info for each block group type
(DATA, METADATA, and SYSTEM). We sometimes need multiple space_infos to
manage special block groups.
One example is the data relocation block group in zoned mode. That block
group is dedicated to writing relocated data, and the zoned extent
allocator never allocates a regular extent from it. However, the block
group still belongs to the normal data space_info. So, when all the normal
data block groups are full while the dedicated block group still has free
space, the space_info appears to have free space even though no regular
extent can be allocated anymore. This results in an unexpected early ENOSPC
error. We need a separate space_info for the data relocation block group to
represent this situation properly.
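To make the mismatch concrete, here is a minimal user-space model
(struct bg_model and free_space() are hypothetical, not btrfs code) of how
much free space a single shared data space_info reports versus how much
the regular extent allocator can actually use. Sizes are in MiB and the
accounting is deliberately simplified to length minus used bytes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a data block group; sizes are in MiB. */
struct bg_model {
	uint64_t length;
	uint64_t used;
	bool data_reloc; /* dedicated data relocation block group */
};

/*
 * Sum the free bytes of the block groups a space_info owns. With one shared
 * data space_info, both kinds are counted; with a sub-space, each side only
 * counts its own block groups.
 */
static uint64_t free_space(const struct bg_model *bgs, size_t n,
			   bool count_reloc, bool count_regular)
{
	uint64_t total = 0;

	for (size_t i = 0; i < n; i++) {
		bool counted = bgs[i].data_reloc ? count_reloc : count_regular;

		if (counted)
			total += bgs[i].length - bgs[i].used;
	}
	return total;
}
```

With two full 256 MiB regular block groups and a relocation block group
with 64 MiB used, the shared view reports 192 MiB free while regular
allocations have nothing left; giving the relocation block group its own
space_info takes those 192 MiB out of the data space_info's view.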
This commit adds basic infrastructure for a "sub-group" of a space_info:
creation and removal. A sub-group space_info belongs to one of the primary
space_infos and has the same flags as its parent.
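A minimal sketch of the parent/child layout, assuming simplified user-space
types (space_info_new(), create_sub_group() and space_info_free() are
hypothetical stand-ins for init_space_info(), create_space_info_sub_group()
and the teardown loop in check_removing_space_info()). A child inherits its
parent's flags, records its parent, and is freed before the parent.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical user-space model of the sub-group layout; not kernel code. */
enum sub_group_id {
	SUB_GROUP_DATA_RELOC = 0,
	SUB_GROUP_PRIMARY = -1,
};
#define SUB_GROUP_MAX 1

struct space_info {
	uint64_t flags;
	int subgroup_id;
	struct space_info *parent;
	struct space_info *sub_group[SUB_GROUP_MAX];
};

/* Allocate a primary space_info with no children. */
static struct space_info *space_info_new(uint64_t flags)
{
	struct space_info *si = calloc(1, sizeof(*si));

	if (si) {
		si->flags = flags;
		si->subgroup_id = SUB_GROUP_PRIMARY;
	}
	return si;
}

/* Attach a child that inherits the parent's flags. Returns 0, or -1 on OOM. */
static int create_sub_group(struct space_info *parent, enum sub_group_id id)
{
	struct space_info *sub;

	assert(parent->subgroup_id == SUB_GROUP_PRIMARY);
	assert(id != SUB_GROUP_PRIMARY);
	sub = space_info_new(parent->flags);
	if (!sub)
		return -1;
	sub->parent = parent;
	sub->subgroup_id = id;
	parent->sub_group[id] = sub;
	return 0;
}

/* Free children before the parent; children have no sub-groups of their own. */
static void space_info_free(struct space_info *si)
{
	if (si->subgroup_id == SUB_GROUP_PRIMARY) {
		for (int i = 0; i < SUB_GROUP_MAX; i++) {
			if (si->sub_group[i]) {
				space_info_free(si->sub_group[i]);
				si->sub_group[i] = NULL;
			}
		}
	}
	free(si);
}
```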
This commit introduces the data relocation sub-space_info first, and the
next commit introduces the tree-log sub-space_info. In the future, this
could also help implement tiered storage for btrfs, e.g., with a sub-group
space_info for block groups residing on fast storage.
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
fs/btrfs/block-group.c | 11 +++++++++++
fs/btrfs/space-info.c | 38 +++++++++++++++++++++++++++++++++++---
fs/btrfs/space-info.h | 8 ++++++++
fs/btrfs/sysfs.c | 16 +++++++++++++---
4 files changed, 67 insertions(+), 6 deletions(-)
diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index 846c9737ff5a..475353b0b32c 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -4411,6 +4411,17 @@ static void check_removing_space_info(struct btrfs_space_info *space_info)
{
struct btrfs_fs_info *info = space_info->fs_info;
+ if (space_info->subgroup_id == SUB_GROUP_PRIMARY) {
+ /* This is a top-level space_info; process its children first. */
+ for (int i = 0; i < BTRFS_SPACE_INFO_SUB_GROUP_MAX; i++) {
+ if (space_info->sub_group[i]) {
+ check_removing_space_info(space_info->sub_group[i]);
+ kfree(space_info->sub_group[i]);
+ space_info->sub_group[i] = NULL;
+ }
+ }
+ }
+
/*
* Do not hide this behind enospc_debug, this is actually
* important and indicates a real bug if this happens.
diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
index 2489c2a16123..37e55298c082 100644
--- a/fs/btrfs/space-info.c
+++ b/fs/btrfs/space-info.c
@@ -249,16 +249,38 @@ static void init_space_info(struct btrfs_fs_info *info,
INIT_LIST_HEAD(&space_info->priority_tickets);
space_info->clamp = 1;
btrfs_update_space_info_chunk_size(space_info, calc_chunk_size(info, flags));
+ space_info->subgroup_id = SUB_GROUP_PRIMARY;
if (btrfs_is_zoned(info))
space_info->bg_reclaim_threshold = BTRFS_DEFAULT_ZONED_RECLAIM_THRESH;
}
+static int create_space_info_sub_group(struct btrfs_space_info *parent, u64 flags,
+ enum btrfs_space_info_sub_group id)
+{
+ struct btrfs_fs_info *fs_info = parent->fs_info;
+ struct btrfs_space_info *sub_space;
+
+ ASSERT(parent->subgroup_id == SUB_GROUP_PRIMARY);
+ ASSERT(id != SUB_GROUP_PRIMARY);
+
+ sub_space = kzalloc(sizeof(*sub_space), GFP_NOFS);
+ if (!sub_space)
+ return -ENOMEM;
+
+ init_space_info(fs_info, sub_space, flags);
+ parent->sub_group[id] = sub_space;
+ sub_space->parent = parent;
+ sub_space->subgroup_id = id;
+
+ return btrfs_sysfs_add_space_info_type(fs_info, sub_space);
+}
+
static int create_space_info(struct btrfs_fs_info *info, u64 flags)
{
struct btrfs_space_info *space_info;
- int ret;
+ int ret = 0;
space_info = kzalloc(sizeof(*space_info), GFP_NOFS);
if (!space_info)
@@ -266,6 +288,15 @@ static int create_space_info(struct btrfs_fs_info *info, u64 flags)
init_space_info(info, space_info, flags);
+ if (btrfs_is_zoned(info)) {
+ if (flags & BTRFS_BLOCK_GROUP_DATA)
+ ret = create_space_info_sub_group(space_info, flags,
+ SUB_GROUP_DATA_RELOC);
+ if (ret == -ENOMEM)
+ return ret;
+ ASSERT(!ret);
+ }
+
ret = btrfs_sysfs_add_space_info_type(info, space_info);
if (ret)
return ret;
@@ -561,8 +592,9 @@ static void __btrfs_dump_space_info(const struct btrfs_fs_info *fs_info,
lockdep_assert_held(&info->lock);
/* The free space could be negative in case of overcommit */
- btrfs_info(fs_info, "space_info %s has %lld free, is %sfull",
- flag_str,
+ btrfs_info(fs_info,
+ "space_info %s (sub-group id %d) has %lld free, is %sfull",
+ flag_str, info->subgroup_id,
(s64)(info->total_bytes - btrfs_space_info_used(info, true)),
info->full ? "" : "not ");
btrfs_info(fs_info,
diff --git a/fs/btrfs/space-info.h b/fs/btrfs/space-info.h
index 7459b4eb99cd..64641885babd 100644
--- a/fs/btrfs/space-info.h
+++ b/fs/btrfs/space-info.h
@@ -98,8 +98,16 @@ enum btrfs_flush_state {
RESET_ZONES = 12,
};
+enum btrfs_space_info_sub_group {
+ SUB_GROUP_DATA_RELOC = 0,
+ SUB_GROUP_PRIMARY = -1,
+};
+#define BTRFS_SPACE_INFO_SUB_GROUP_MAX 1
struct btrfs_space_info {
struct btrfs_fs_info *fs_info;
+ struct btrfs_space_info *parent;
+ struct btrfs_space_info *sub_group[BTRFS_SPACE_INFO_SUB_GROUP_MAX];
+ int subgroup_id;
spinlock_t lock;
u64 total_bytes; /* total bytes in the space,
diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
index b9af74498b0c..92caa5d09e2f 100644
--- a/fs/btrfs/sysfs.c
+++ b/fs/btrfs/sysfs.c
@@ -1930,15 +1930,25 @@ void btrfs_sysfs_remove_space_info(struct btrfs_space_info *space_info)
kobject_put(&space_info->kobj);
}
-static const char *alloc_name(u64 flags)
+static const char *alloc_name(struct btrfs_space_info *space_info)
{
+ u64 flags = space_info->flags;
+
switch (flags) {
case BTRFS_BLOCK_GROUP_METADATA | BTRFS_BLOCK_GROUP_DATA:
return "mixed";
case BTRFS_BLOCK_GROUP_METADATA:
return "metadata";
case BTRFS_BLOCK_GROUP_DATA:
- return "data";
+ switch (space_info->subgroup_id) {
+ case SUB_GROUP_PRIMARY:
+ return "data";
+ case SUB_GROUP_DATA_RELOC:
+ return "data-reloc";
+ default:
+ WARN_ON_ONCE(1);
+ return "data (unknown sub-group)";
+ }
case BTRFS_BLOCK_GROUP_SYSTEM:
return "system";
default:
@@ -1958,7 +1968,7 @@ int btrfs_sysfs_add_space_info_type(struct btrfs_fs_info *fs_info,
ret = kobject_init_and_add(&space_info->kobj, &space_info_ktype,
fs_info->space_info_kobj, "%s",
- alloc_name(space_info->flags));
+ alloc_name(space_info));
if (ret) {
kobject_put(&space_info->kobj);
return ret;
--
2.49.0
* [PATCH v3 09/13] btrfs: introduce tree-log sub-space_info
2025-04-16 14:28 [PATCH v3 00/13] btrfs: zoned: split out space_info for dedicated block groups Naohiro Aota
` (7 preceding siblings ...)
2025-04-16 14:28 ` [PATCH v3 08/13] btrfs: introduce btrfs_space_info sub-group Naohiro Aota
@ 2025-04-16 14:28 ` Naohiro Aota
2025-04-16 14:57 ` Johannes Thumshirn
2025-04-17 12:44 ` Josef Bacik
2025-04-16 14:28 ` [PATCH v3 10/13] btrfs: tweak extent/chunk allocation for space_info sub-space Naohiro Aota
` (3 subsequent siblings)
12 siblings, 2 replies; 27+ messages in thread
From: Naohiro Aota @ 2025-04-16 14:28 UTC (permalink / raw)
To: linux-btrfs; +Cc: Naohiro Aota
This commit introduces the tree-log sub-space_info, a sub-space of the
metadata space_info dedicated to tree-log node allocation.
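The sysfs naming below is a hedged model of alloc_name() after this patch
(space_info_name() and its helpers are hypothetical and take the flags and
subgroup id directly; the flag values are assumed to match btrfs's DATA,
SYSTEM and METADATA bits). Because SUB_GROUP_DATA_RELOC and
SUB_GROUP_METADATA_TREELOG share the value 0, a sub-group id is only
meaningful together with its parent's type, so the id is resolved
separately for each type.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Block group type bits, assumed to match btrfs's DATA/SYSTEM/METADATA flags. */
#define BG_DATA     (1ULL << 0)
#define BG_SYSTEM   (1ULL << 1)
#define BG_METADATA (1ULL << 2)

/* Ids only need to be unique under one parent, so both children use slot 0. */
enum sub_group_id {
	SUB_GROUP_DATA_RELOC = 0,
	SUB_GROUP_METADATA_TREELOG = 0,
	SUB_GROUP_PRIMARY = -1,
};

static const char *metadata_name(int subgroup_id)
{
	switch (subgroup_id) {
	case SUB_GROUP_PRIMARY:
		return "metadata";
	case SUB_GROUP_METADATA_TREELOG:
		return "metadata-treelog";
	default:
		return "metadata (unknown sub-group)";
	}
}

static const char *data_name(int subgroup_id)
{
	switch (subgroup_id) {
	case SUB_GROUP_PRIMARY:
		return "data";
	case SUB_GROUP_DATA_RELOC:
		return "data-reloc";
	default:
		return "data (unknown sub-group)";
	}
}

/* Hypothetical helper modelled on alloc_name(): pick a sysfs directory name. */
static const char *space_info_name(uint64_t flags, int subgroup_id)
{
	switch (flags) {
	case BG_METADATA | BG_DATA:
		return "mixed";
	case BG_METADATA:
		return metadata_name(subgroup_id);
	case BG_DATA:
		return data_name(subgroup_id);
	case BG_SYSTEM:
		return "system";
	default:
		return "unknown";
	}
}
```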
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
fs/btrfs/space-info.c | 4 ++++
fs/btrfs/space-info.h | 1 +
fs/btrfs/sysfs.c | 10 +++++++++-
3 files changed, 14 insertions(+), 1 deletion(-)
diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
index 37e55298c082..4b2343a3a009 100644
--- a/fs/btrfs/space-info.c
+++ b/fs/btrfs/space-info.c
@@ -292,6 +292,10 @@ static int create_space_info(struct btrfs_fs_info *info, u64 flags)
if (flags & BTRFS_BLOCK_GROUP_DATA)
ret = create_space_info_sub_group(space_info, flags,
SUB_GROUP_DATA_RELOC);
+ else if (flags & BTRFS_BLOCK_GROUP_METADATA)
+ ret = create_space_info_sub_group(space_info, flags,
+ SUB_GROUP_METADATA_TREELOG);
+
if (ret == -ENOMEM)
return ret;
ASSERT(!ret);
diff --git a/fs/btrfs/space-info.h b/fs/btrfs/space-info.h
index 64641885babd..1aadf88e5789 100644
--- a/fs/btrfs/space-info.h
+++ b/fs/btrfs/space-info.h
@@ -100,6 +100,7 @@ enum btrfs_flush_state {
enum btrfs_space_info_sub_group {
SUB_GROUP_DATA_RELOC = 0,
+ SUB_GROUP_METADATA_TREELOG = 0,
SUB_GROUP_PRIMARY = -1,
};
#define BTRFS_SPACE_INFO_SUB_GROUP_MAX 1
diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
index 92caa5d09e2f..fba31e2354e5 100644
--- a/fs/btrfs/sysfs.c
+++ b/fs/btrfs/sysfs.c
@@ -1938,7 +1938,15 @@ static const char *alloc_name(struct btrfs_space_info *space_info)
case BTRFS_BLOCK_GROUP_METADATA | BTRFS_BLOCK_GROUP_DATA:
return "mixed";
case BTRFS_BLOCK_GROUP_METADATA:
- return "metadata";
+ switch (space_info->subgroup_id) {
+ case SUB_GROUP_PRIMARY:
+ return "metadata";
+ case SUB_GROUP_METADATA_TREELOG:
+ return "metadata-treelog";
+ default:
+ WARN_ON_ONCE(1);
+ return "metadata (unknown sub-group)";
+ }
case BTRFS_BLOCK_GROUP_DATA:
switch (space_info->subgroup_id) {
case SUB_GROUP_PRIMARY:
--
2.49.0
* [PATCH v3 10/13] btrfs: tweak extent/chunk allocation for space_info sub-space
2025-04-16 14:28 [PATCH v3 00/13] btrfs: zoned: split out space_info for dedicated block groups Naohiro Aota
` (8 preceding siblings ...)
2025-04-16 14:28 ` [PATCH v3 09/13] btrfs: introduce tree-log sub-space_info Naohiro Aota
@ 2025-04-16 14:28 ` Naohiro Aota
2025-04-17 5:48 ` Naohiro Aota
2025-04-16 14:28 ` [PATCH v3 11/13] btrfs: use proper data space_info Naohiro Aota
` (2 subsequent siblings)
12 siblings, 1 reply; 27+ messages in thread
From: Naohiro Aota @ 2025-04-16 14:28 UTC (permalink / raw)
To: linux-btrfs; +Cc: Naohiro Aota, Johannes Thumshirn
Make the extent allocator and the chunk allocator aware of sub-spaces.
They now use the SUB_GROUP_DATA_RELOC sub-space for the data relocation
block group and SUB_GROUP_METADATA_TREELOG for the metadata tree-log block
group.
When a candidate block group is given, the allocator must also check that
it belongs to the right space_info. Also, a newly created block group now
belongs to the specified space_info.
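A minimal sketch of the selection order, assuming simplified types
(fs_model, find_primary(), pick_space_info() and hint_usable() are
hypothetical): the primary space_info has to be looked up by flags before
descending into sub_group[], since dereferencing space_info before that
lookup would hit a NULL pointer. A hinted block group is reused only if it
belongs to the chosen space_info. Falling back to the primary when a child
is missing is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BG_DATA     (1ULL << 0)
#define BG_METADATA (1ULL << 2)

/* Both dedicated sub-spaces live in slot 0 of their own parent. */
#define SUB_GROUP_DATA_RELOC       0
#define SUB_GROUP_METADATA_TREELOG 0
#define SUB_GROUP_MAX              1

struct space_info {
	uint64_t flags;
	struct space_info *sub_group[SUB_GROUP_MAX];
};

struct block_group {
	struct space_info *space_info;
};

/* Hypothetical filesystem model holding the primary space_infos. */
struct fs_model {
	struct space_info *primaries[2];
	bool zoned;
};

/* Stand-in for btrfs_find_space_info(): match a primary by type bits. */
static struct space_info *find_primary(const struct fs_model *fs, uint64_t flags)
{
	for (size_t i = 0; i < 2; i++) {
		if (fs->primaries[i] && (fs->primaries[i]->flags & flags))
			return fs->primaries[i];
	}
	return NULL;
}

/* Look up the primary first, then descend into a dedicated sub-space. */
static struct space_info *pick_space_info(const struct fs_model *fs, uint64_t flags,
					  bool for_data_reloc, bool for_treelog)
{
	struct space_info *si = find_primary(fs, flags);

	if (!si)
		return NULL; /* the caller would report ENOSPC */
	if (fs->zoned) {
		if (for_data_reloc && si->sub_group[SUB_GROUP_DATA_RELOC])
			return si->sub_group[SUB_GROUP_DATA_RELOC];
		if (for_treelog && si->sub_group[SUB_GROUP_METADATA_TREELOG])
			return si->sub_group[SUB_GROUP_METADATA_TREELOG];
	}
	return si;
}

/* A hinted block group is only reused if it belongs to the chosen space_info. */
static bool hint_usable(const struct block_group *bg, const struct space_info *si)
{
	return bg != NULL && bg->space_info == si;
}
```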
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
fs/btrfs/extent-tree.c | 18 ++++++++++++++----
fs/btrfs/space-info.c | 4 +++-
2 files changed, 17 insertions(+), 5 deletions(-)
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 1dad1a42c9c1..0744134a0000 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -4347,7 +4347,7 @@ static noinline int find_free_extent(struct btrfs_root *root,
int ret = 0;
int cache_block_group_error = 0;
struct btrfs_block_group *block_group = NULL;
- struct btrfs_space_info *space_info;
+ struct btrfs_space_info *space_info = NULL;
bool full_search = false;
WARN_ON(ffe_ctl->num_bytes < fs_info->sectorsize);
@@ -4378,10 +4378,19 @@ static noinline int find_free_extent(struct btrfs_root *root,
trace_btrfs_find_free_extent(root, ffe_ctl);
- space_info = btrfs_find_space_info(fs_info, ffe_ctl->flags);
+ if (btrfs_is_zoned(fs_info)) {
+ /* Use dedicated sub-space_info for dedicated block group users. */
+ if (ffe_ctl->for_data_reloc)
+ space_info = space_info->sub_group[SUB_GROUP_DATA_RELOC];
+ else if (ffe_ctl->for_treelog)
+ space_info = space_info->sub_group[SUB_GROUP_METADATA_TREELOG];
+ }
if (!space_info) {
- btrfs_err(fs_info, "No space info for %llu", ffe_ctl->flags);
- return -ENOSPC;
+ space_info = btrfs_find_space_info(fs_info, ffe_ctl->flags);
+ if (!space_info) {
+ btrfs_err(fs_info, "No space info for %llu", ffe_ctl->flags);
+ return -ENOSPC;
+ }
}
ret = prepare_allocation(fs_info, ffe_ctl, space_info, ins);
@@ -4402,6 +4411,7 @@ static noinline int find_free_extent(struct btrfs_root *root,
* picked out then we don't care that the block group is cached.
*/
if (block_group && block_group_bits(block_group, ffe_ctl->flags) &&
+ block_group->space_info == space_info &&
block_group->cached != BTRFS_CACHE_NO) {
down_read(&space_info->groups_sem);
if (list_empty(&block_group->list) ||
diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
index 4b2343a3a009..62dc69322b80 100644
--- a/fs/btrfs/space-info.c
+++ b/fs/btrfs/space-info.c
@@ -357,7 +357,9 @@ void btrfs_add_bg_to_space_info(struct btrfs_fs_info *info,
factor = btrfs_bg_type_to_factor(block_group->flags);
- found = btrfs_find_space_info(info, block_group->flags);
+ found = block_group->space_info;
+ if (!found)
+ found = btrfs_find_space_info(info, block_group->flags);
ASSERT(found);
spin_lock(&found->lock);
found->total_bytes += block_group->length;
--
2.49.0
* [PATCH v3 11/13] btrfs: use proper data space_info
2025-04-16 14:28 [PATCH v3 00/13] btrfs: zoned: split out space_info for dedicated block groups Naohiro Aota
` (9 preceding siblings ...)
2025-04-16 14:28 ` [PATCH v3 10/13] btrfs: tweak extent/chunk allocation for space_info sub-space Naohiro Aota
@ 2025-04-16 14:28 ` Naohiro Aota
2025-04-16 14:59 ` Johannes Thumshirn
2025-04-16 14:28 ` [PATCH v3 12/13] btrfs: add block_rsv for treelog Naohiro Aota
2025-04-16 14:28 ` [PATCH v3 13/13] btrfs: reclaim from sub-space space_info Naohiro Aota
12 siblings, 1 reply; 27+ messages in thread
From: Naohiro Aota @ 2025-04-16 14:28 UTC (permalink / raw)
To: linux-btrfs; +Cc: Naohiro Aota
Now that we have a data sub-space for the zoned mode, this commit tweaks
some space_info functions to use the proper space_info for a file.
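A hedged model of the reservation side (inode_model, data_sinfo_for(),
reserve() and release() are hypothetical, and reserve() only compares
reserved bytes against total bytes, without any flushing): an inode in the
data relocation root reserves from the relocation sub-space, a regular
inode still gets ENOSPC when the primary is exhausted, and a release must
go to the same space_info as the matching reservation, which is why the
free path uses data_sinfo_for_inode() as well.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define SUB_GROUP_DATA_RELOC 0

/* Simplified space_info: only total and reserved ("may use") bytes. */
struct space_info {
	uint64_t total_bytes;
	uint64_t bytes_may_use;
	struct space_info *sub_group[1];
};

/* Hypothetical inode model: does it belong to the data relocation root? */
struct inode_model {
	bool in_data_reloc_root;
};

/* Same selection as data_sinfo_for_inode() in the patch, on the model types. */
static struct space_info *data_sinfo_for(struct space_info *data_sinfo, bool zoned,
					 const struct inode_model *inode)
{
	if (zoned && inode->in_data_reloc_root)
		return data_sinfo->sub_group[SUB_GROUP_DATA_RELOC];
	return data_sinfo;
}

/* Reserve without flushing: fail with -ENOSPC if the space_info is full. */
static int reserve(struct space_info *si, uint64_t bytes)
{
	if (si->bytes_may_use + bytes > si->total_bytes)
		return -ENOSPC;
	si->bytes_may_use += bytes;
	return 0;
}

/* Release must target the same space_info the reservation was taken from. */
static void release(struct space_info *si, uint64_t bytes)
{
	assert(si->bytes_may_use >= bytes);
	si->bytes_may_use -= bytes;
}
```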
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
fs/btrfs/delalloc-space.c | 17 ++++++++++++-----
1 file changed, 12 insertions(+), 5 deletions(-)
diff --git a/fs/btrfs/delalloc-space.c b/fs/btrfs/delalloc-space.c
index a18895255af9..f7927657e036 100644
--- a/fs/btrfs/delalloc-space.c
+++ b/fs/btrfs/delalloc-space.c
@@ -111,6 +111,15 @@
* making error handling and cleanup easier.
*/
+static inline struct btrfs_space_info *data_sinfo_for_inode(const struct btrfs_inode *inode)
+{
+ struct btrfs_fs_info *fs_info = inode->root->fs_info;
+
+ if (btrfs_is_zoned(fs_info) && btrfs_is_data_reloc_root(inode->root))
+ return fs_info->data_sinfo->sub_group[SUB_GROUP_DATA_RELOC];
+ return fs_info->data_sinfo;
+}
+
int btrfs_alloc_data_chunk_ondemand(const struct btrfs_inode *inode, u64 bytes)
{
struct btrfs_root *root = inode->root;
@@ -123,7 +132,7 @@ int btrfs_alloc_data_chunk_ondemand(const struct btrfs_inode *inode, u64 bytes)
if (btrfs_is_free_space_inode(inode))
flush = BTRFS_RESERVE_FLUSH_FREE_SPACE_INODE;
- return btrfs_reserve_data_bytes(fs_info->data_sinfo, bytes, flush);
+ return btrfs_reserve_data_bytes(data_sinfo_for_inode(inode), bytes, flush);
}
int btrfs_check_data_free_space(struct btrfs_inode *inode,
@@ -144,7 +153,7 @@ int btrfs_check_data_free_space(struct btrfs_inode *inode,
else if (btrfs_is_free_space_inode(inode))
flush = BTRFS_RESERVE_FLUSH_FREE_SPACE_INODE;
- ret = btrfs_reserve_data_bytes(fs_info->data_sinfo, len, flush);
+ ret = btrfs_reserve_data_bytes(data_sinfo_for_inode(inode), len, flush);
if (ret < 0)
return ret;
@@ -172,12 +181,10 @@ void btrfs_free_reserved_data_space_noquota(struct btrfs_inode *inode,
u64 len)
{
struct btrfs_fs_info *fs_info = inode->root->fs_info;
- struct btrfs_space_info *data_sinfo;
ASSERT(IS_ALIGNED(len, fs_info->sectorsize));
- data_sinfo = fs_info->data_sinfo;
- btrfs_space_info_free_bytes_may_use(data_sinfo, len);
+ btrfs_space_info_free_bytes_may_use(data_sinfo_for_inode(inode), len);
}
/*
--
2.49.0
* [PATCH v3 12/13] btrfs: add block_rsv for treelog
2025-04-16 14:28 [PATCH v3 00/13] btrfs: zoned: split out space_info for dedicated block groups Naohiro Aota
` (10 preceding siblings ...)
2025-04-16 14:28 ` [PATCH v3 11/13] btrfs: use proper data space_info Naohiro Aota
@ 2025-04-16 14:28 ` Naohiro Aota
2025-04-17 12:48 ` Josef Bacik
2025-04-16 14:28 ` [PATCH v3 13/13] btrfs: reclaim from sub-space space_info Naohiro Aota
12 siblings, 1 reply; 27+ messages in thread
From: Naohiro Aota @ 2025-04-16 14:28 UTC (permalink / raw)
To: linux-btrfs; +Cc: Naohiro Aota, Johannes Thumshirn
We need a dedicated block_rsv for the tree-log, because
btrfs_alloc_tree_block() takes the space for a new tree node from a
block_rsv. Currently, the tree-log tree uses fs_info->empty_block_rsv,
which is shared across trees and points to the normal metadata space_info.
Instead, add a dedicated block_rsv that can use the dedicated
sub-space_info.
For now, the dedicated block_rsv is used only in zoned mode, but it might
also be useful for regular btrfs.
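A minimal sketch of how the tree-log root ends up on the sub-space,
assuming simplified types (fs_model, init_treelog_rsv(),
treelog_root_rsv() and rsv_for_alloc() are hypothetical; rsv_for_alloc()
only models the final fallback to empty_block_rsv when a root has no
block_rsv of its own).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SUB_GROUP_METADATA_TREELOG 0

struct space_info {
	struct space_info *sub_group[1];
};

struct block_rsv {
	struct space_info *space_info;
};

/* Hypothetical slice of fs_info holding the reservations involved. */
struct fs_model {
	bool zoned;
	struct block_rsv treelog_rsv;
	struct block_rsv empty_block_rsv;
};

/* Point treelog_rsv at the tree-log sub-space only in zoned mode. */
static void init_treelog_rsv(struct fs_model *fs, struct space_info *metadata)
{
	fs->empty_block_rsv.space_info = metadata;
	fs->treelog_rsv.space_info = fs->zoned ?
		metadata->sub_group[SUB_GROUP_METADATA_TREELOG] : metadata;
}

/* The tree-log root gets treelog_rsv on zoned; otherwise none of its own. */
static struct block_rsv *treelog_root_rsv(struct fs_model *fs)
{
	return fs->zoned ? &fs->treelog_rsv : NULL;
}

/* Simplified fallback: a root without its own rsv uses empty_block_rsv. */
static struct block_rsv *rsv_for_alloc(struct fs_model *fs, struct block_rsv *root_rsv)
{
	return root_rsv ? root_rsv : &fs->empty_block_rsv;
}
```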
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
fs/btrfs/block-rsv.c | 12 ++++++++++++
fs/btrfs/block-rsv.h | 1 +
fs/btrfs/disk-io.c | 1 +
fs/btrfs/fs.h | 2 ++
4 files changed, 16 insertions(+)
diff --git a/fs/btrfs/block-rsv.c b/fs/btrfs/block-rsv.c
index 3f3608299c0b..680b395b32ad 100644
--- a/fs/btrfs/block-rsv.c
+++ b/fs/btrfs/block-rsv.c
@@ -418,6 +418,12 @@ void btrfs_init_root_block_rsv(struct btrfs_root *root)
case BTRFS_CHUNK_TREE_OBJECTID:
root->block_rsv = &fs_info->chunk_block_rsv;
break;
+ case BTRFS_TREE_LOG_OBJECTID:
+ if (btrfs_is_zoned(fs_info))
+ root->block_rsv = &fs_info->treelog_rsv;
+ else
+ root->block_rsv = NULL;
+ break;
default:
root->block_rsv = NULL;
break;
@@ -438,6 +444,12 @@ void btrfs_init_global_block_rsv(struct btrfs_fs_info *fs_info)
fs_info->delayed_block_rsv.space_info = space_info;
fs_info->delayed_refs_rsv.space_info = space_info;
+ /* The treelog_rsv uses a dedicated space_info on the zoned mode. */
+ if (!btrfs_is_zoned(fs_info))
+ fs_info->treelog_rsv.space_info = space_info;
+ else
+ fs_info->treelog_rsv.space_info = space_info->sub_group[SUB_GROUP_METADATA_TREELOG];
+
btrfs_update_global_block_rsv(fs_info);
}
diff --git a/fs/btrfs/block-rsv.h b/fs/btrfs/block-rsv.h
index d12b1fac5c74..79ae9d05cd91 100644
--- a/fs/btrfs/block-rsv.h
+++ b/fs/btrfs/block-rsv.h
@@ -24,6 +24,7 @@ enum btrfs_rsv_type {
BTRFS_BLOCK_RSV_CHUNK,
BTRFS_BLOCK_RSV_DELOPS,
BTRFS_BLOCK_RSV_DELREFS,
+ BTRFS_BLOCK_RSV_TREELOG,
BTRFS_BLOCK_RSV_EMPTY,
BTRFS_BLOCK_RSV_TEMP,
};
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 59da809b7d57..88dbda24ad46 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2822,6 +2822,7 @@ void btrfs_init_fs_info(struct btrfs_fs_info *fs_info)
BTRFS_BLOCK_RSV_GLOBAL);
btrfs_init_block_rsv(&fs_info->trans_block_rsv, BTRFS_BLOCK_RSV_TRANS);
btrfs_init_block_rsv(&fs_info->chunk_block_rsv, BTRFS_BLOCK_RSV_CHUNK);
+ btrfs_init_block_rsv(&fs_info->treelog_rsv, BTRFS_BLOCK_RSV_TREELOG);
btrfs_init_block_rsv(&fs_info->empty_block_rsv, BTRFS_BLOCK_RSV_EMPTY);
btrfs_init_block_rsv(&fs_info->delayed_block_rsv,
BTRFS_BLOCK_RSV_DELOPS);
diff --git a/fs/btrfs/fs.h b/fs/btrfs/fs.h
index bcca43046064..0d5af9732a3c 100644
--- a/fs/btrfs/fs.h
+++ b/fs/btrfs/fs.h
@@ -471,6 +471,8 @@ struct btrfs_fs_info {
struct btrfs_block_rsv delayed_block_rsv;
/* Block reservation for delayed refs */
struct btrfs_block_rsv delayed_refs_rsv;
+ /* Block reservation for treelog tree */
+ struct btrfs_block_rsv treelog_rsv;
struct btrfs_block_rsv empty_block_rsv;
--
2.49.0
* [PATCH v3 13/13] btrfs: reclaim from sub-space space_info
2025-04-16 14:28 [PATCH v3 00/13] btrfs: zoned: split out space_info for dedicated block groups Naohiro Aota
` (11 preceding siblings ...)
2025-04-16 14:28 ` [PATCH v3 12/13] btrfs: add block_rsv for treelog Naohiro Aota
@ 2025-04-16 14:28 ` Naohiro Aota
2025-04-17 12:49 ` Josef Bacik
12 siblings, 1 reply; 27+ messages in thread
From: Naohiro Aota @ 2025-04-16 14:28 UTC (permalink / raw)
To: linux-btrfs; +Cc: Naohiro Aota, Johannes Thumshirn
Modify btrfs_async_reclaim_{data,metadata}_space() to run the reclaim
process on the sub-spaces as well.
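A minimal model of the reclaim change (do_reclaim() and async_reclaim()
are hypothetical stand-ins for do_async_reclaim_{data,metadata}_space()
and the work functions): reclaim runs per space_info, so in this model a
sub-space is only processed if the worker visits it explicitly after its
parent.

```c
#include <assert.h>
#include <stddef.h>

#define SUB_GROUP_MAX 1

/* Simplified space_info that records how often reclaim visited it. */
struct space_info {
	int reclaim_runs;
	struct space_info *sub_group[SUB_GROUP_MAX];
};

/* Stand-in for do_async_reclaim_{data,metadata}_space() on one space_info. */
static void do_reclaim(struct space_info *si)
{
	si->reclaim_runs++;
}

/* Reclaim the primary, then every sub-space hanging off it. */
static void async_reclaim(struct space_info *primary)
{
	do_reclaim(primary);
	for (int i = 0; i < SUB_GROUP_MAX; i++) {
		if (primary->sub_group[i])
			do_reclaim(primary->sub_group[i]);
	}
}
```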
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
fs/btrfs/space-info.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
index 62dc69322b80..0f543e3cb2fe 100644
--- a/fs/btrfs/space-info.c
+++ b/fs/btrfs/space-info.c
@@ -1221,6 +1221,9 @@ static void btrfs_async_reclaim_metadata_space(struct work_struct *work)
fs_info = container_of(work, struct btrfs_fs_info, async_reclaim_work);
space_info = btrfs_find_space_info(fs_info, BTRFS_BLOCK_GROUP_METADATA);
do_async_reclaim_metadata_space(space_info);
+ for (int i = 0; i < BTRFS_SPACE_INFO_SUB_GROUP_MAX; i++)
+ if (space_info->sub_group[i])
+ do_async_reclaim_metadata_space(space_info->sub_group[i]);
}
/*
@@ -1449,6 +1452,9 @@ static void btrfs_async_reclaim_data_space(struct work_struct *work)
fs_info = container_of(work, struct btrfs_fs_info, async_data_reclaim_work);
space_info = fs_info->data_sinfo;
do_async_reclaim_data_space(space_info);
+ for (int i = 0; i < BTRFS_SPACE_INFO_SUB_GROUP_MAX; i++)
+ if (space_info->sub_group[i])
+ do_async_reclaim_data_space(space_info->sub_group[i]);
}
void btrfs_init_async_reclaim_work(struct btrfs_fs_info *fs_info)
--
2.49.0
* Re: [PATCH v3 08/13] btrfs: introduce btrfs_space_info sub-group
2025-04-16 14:28 ` [PATCH v3 08/13] btrfs: introduce btrfs_space_info sub-group Naohiro Aota
@ 2025-04-16 14:56 ` Johannes Thumshirn
2025-04-17 12:43 ` Josef Bacik
1 sibling, 0 replies; 27+ messages in thread
From: Johannes Thumshirn @ 2025-04-16 14:56 UTC (permalink / raw)
To: Naohiro Aota, linux-btrfs@vger.kernel.org
Looks good,
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
* Re: [PATCH v3 09/13] btrfs: introduce tree-log sub-space_info
2025-04-16 14:28 ` [PATCH v3 09/13] btrfs: introduce tree-log sub-space_info Naohiro Aota
@ 2025-04-16 14:57 ` Johannes Thumshirn
2025-04-17 12:44 ` Josef Bacik
1 sibling, 0 replies; 27+ messages in thread
From: Johannes Thumshirn @ 2025-04-16 14:57 UTC (permalink / raw)
To: Naohiro Aota, linux-btrfs@vger.kernel.org
Looks good,
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
* Re: [PATCH v3 11/13] btrfs: use proper data space_info
2025-04-16 14:28 ` [PATCH v3 11/13] btrfs: use proper data space_info Naohiro Aota
@ 2025-04-16 14:59 ` Johannes Thumshirn
2025-04-17 4:35 ` Naohiro Aota
0 siblings, 1 reply; 27+ messages in thread
From: Johannes Thumshirn @ 2025-04-16 14:59 UTC (permalink / raw)
To: Naohiro Aota, linux-btrfs@vger.kernel.org
On 16.04.25 16:30, Naohiro Aota wrote:
> Now that, we have data sub-space for the zoned mode. This commit tweaks
Can you do a
s/This commit tweaks/Tweak/
when applying?
Other than that:
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
* Re: [PATCH v3 11/13] btrfs: use proper data space_info
2025-04-16 14:59 ` Johannes Thumshirn
@ 2025-04-17 4:35 ` Naohiro Aota
0 siblings, 0 replies; 27+ messages in thread
From: Naohiro Aota @ 2025-04-17 4:35 UTC (permalink / raw)
To: Johannes Thumshirn, linux-btrfs@vger.kernel.org
On Wed Apr 16, 2025 at 11:59 PM JST, Johannes Thumshirn wrote:
> On 16.04.25 16:30, Naohiro Aota wrote:
>> Now that, we have data sub-space for the zoned mode. This commit tweaks
>
> Can you do a
>
> s/This commit tweaks/Tweak/
>
> when applying?
>
> Other than that:
> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Sure. I'll fix that.
* Re: [PATCH v3 10/13] btrfs: tweak extent/chunk allocation for space_info sub-space
2025-04-16 14:28 ` [PATCH v3 10/13] btrfs: tweak extent/chunk allocation for space_info sub-space Naohiro Aota
@ 2025-04-17 5:48 ` Naohiro Aota
0 siblings, 0 replies; 27+ messages in thread
From: Naohiro Aota @ 2025-04-17 5:48 UTC (permalink / raw)
To: Naohiro Aota, linux-btrfs@vger.kernel.org; +Cc: Johannes Thumshirn
On Wed Apr 16, 2025 at 11:28 PM JST, Naohiro Aota wrote:
> Make the extent allocator and the chunk allocator aware of the sub-space.
> It now uses SUB_GROUP_DATA_RELOC sub-space for data relocation block group,
> and uses SUB_GROUP_METADATA_TREELOG for metadata tree-log block group.
>
> And, it needs to check the space_info is the right one when a block group
> candidate is given. Also, new block group should now belong to the
> specified one.
>
> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
> ---
> fs/btrfs/extent-tree.c | 18 ++++++++++++++----
> fs/btrfs/space-info.c | 4 +++-
> 2 files changed, 17 insertions(+), 5 deletions(-)
>
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 1dad1a42c9c1..0744134a0000 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -4347,7 +4347,7 @@ static noinline int find_free_extent(struct btrfs_root *root,
> int ret = 0;
> int cache_block_group_error = 0;
> struct btrfs_block_group *block_group = NULL;
> - struct btrfs_space_info *space_info;
> + struct btrfs_space_info *space_info = NULL;
> bool full_search = false;
>
> WARN_ON(ffe_ctl->num_bytes < fs_info->sectorsize);
> @@ -4378,10 +4378,19 @@ static noinline int find_free_extent(struct btrfs_root *root,
>
> trace_btrfs_find_free_extent(root, ffe_ctl);
>
> - space_info = btrfs_find_space_info(fs_info, ffe_ctl->flags);
> + if (btrfs_is_zoned(fs_info)) {
> + /* Use dedicated sub-space_info for dedicated block group users. */
> + if (ffe_ctl->for_data_reloc)
> + space_info = space_info->sub_group[SUB_GROUP_DATA_RELOC];
> + else if (ffe_ctl->for_treelog)
> + space_info = space_info->sub_group[SUB_GROUP_METADATA_TREELOG];
> + }
I noticed this part always fails with a NULL pointer dereference because
space_info is still NULL here. I'll update this one.
> if (!space_info) {
> - btrfs_err(fs_info, "No space info for %llu", ffe_ctl->flags);
> - return -ENOSPC;
> + space_info = btrfs_find_space_info(fs_info, ffe_ctl->flags);
> + if (!space_info) {
> + btrfs_err(fs_info, "No space info for %llu", ffe_ctl->flags);
> + return -ENOSPC;
> + }
> }
>
> ret = prepare_allocation(fs_info, ffe_ctl, space_info, ins);
> @@ -4402,6 +4411,7 @@ static noinline int find_free_extent(struct btrfs_root *root,
* Re: [PATCH v3 06/13] btrfs: introduce space_info argument to btrfs_chunk_alloc
2025-04-16 14:28 ` [PATCH v3 06/13] btrfs: introduce space_info argument to btrfs_chunk_alloc Naohiro Aota
@ 2025-04-17 12:38 ` Josef Bacik
2025-04-18 0:59 ` Naohiro Aota
0 siblings, 1 reply; 27+ messages in thread
From: Josef Bacik @ 2025-04-17 12:38 UTC (permalink / raw)
To: Naohiro Aota; +Cc: linux-btrfs, Johannes Thumshirn
On Wed, Apr 16, 2025 at 11:28:11PM +0900, Naohiro Aota wrote:
> Take an optional btrfs_space_info argument in btrfs_chunk_alloc(). If
> specified, btrfs_chunk_alloc() works on the space_info. If not, the default
> space_info is used as the same as before.
>
> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
For consistency's sake I'd prefer if you'd just update the callers to look up the
space_info and pass that in as appropriate. In fact a lot of these callers
already have the block group or space_info available, so we could avoid the
extra overhead of doing a lookup.
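A minimal sketch of what this suggests, using simplified, hypothetical stand-ins (chunk_alloc(), lookup() and the trimmed structs are not kernel code): the space_info argument becomes mandatory, callers that already hold a block group or block_rsv pass its space_info directly, and only callers that know just the type pay for a lookup.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

struct space_info { uint64_t flags; int chunks; };
struct block_group { struct space_info *space_info; };
struct block_rsv { struct space_info *space_info; };

/* Sketch of btrfs_chunk_alloc() with a mandatory space_info: no internal lookup. */
static int chunk_alloc(struct space_info *space_info, uint64_t flags)
{
	if (!space_info)
		return -EINVAL;
	(void)flags; /* profile bits are unused in this sketch */
	space_info->chunks++;
	return 1; /* 1 == "allocated a chunk", matching the documented convention */
}

/* Caller that already holds a block group, as in btrfs_inc_block_group_ro(). */
static int ro_caller(struct block_group *cache, uint64_t alloc_flags)
{
	return chunk_alloc(cache->space_info, alloc_flags);
}

/* Caller that already holds a block_rsv, as in start_transaction(). */
static int trans_caller(struct block_rsv *rsv, uint64_t alloc_flags)
{
	return chunk_alloc(rsv->space_info, alloc_flags);
}

/* Type-only lookup, standing in for btrfs_find_space_info(). */
static struct space_info *lookup(struct space_info *all, size_t n, uint64_t type)
{
	for (size_t i = 0; i < n; i++) {
		if (all[i].flags & type)
			return &all[i];
	}
	return NULL;
}

/* Caller that only knows the type, as in btrfs_force_chunk_alloc(). */
static int force_caller(struct space_info *all, size_t n, uint64_t type)
{
	return chunk_alloc(lookup(all, n, type), type);
}
```

A failed lookup surfaces as -EINVAL from chunk_alloc() instead of being silently resolved inside it.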
> ---
> fs/btrfs/block-group.c | 19 ++++++++++++-------
> fs/btrfs/block-group.h | 3 ++-
> fs/btrfs/extent-tree.c | 2 +-
> fs/btrfs/space-info.c | 2 +-
> fs/btrfs/transaction.c | 2 +-
> 5 files changed, 17 insertions(+), 11 deletions(-)
>
> diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
> index b700d80089d3..12cc9069d4bb 100644
> --- a/fs/btrfs/block-group.c
> +++ b/fs/btrfs/block-group.c
> @@ -3018,7 +3018,7 @@ int btrfs_inc_block_group_ro(struct btrfs_block_group *cache,
> */
> alloc_flags = btrfs_get_alloc_profile(fs_info, cache->flags);
> if (alloc_flags != cache->flags) {
> - ret = btrfs_chunk_alloc(trans, alloc_flags,
> + ret = btrfs_chunk_alloc(trans, NULL, alloc_flags,
> CHUNK_ALLOC_FORCE);
Here we can just pass cache->space_info.
> /*
> * ENOSPC is allowed here, we may have enough space
> @@ -3047,7 +3047,7 @@ int btrfs_inc_block_group_ro(struct btrfs_block_group *cache,
> goto unlock_out;
>
> alloc_flags = btrfs_get_alloc_profile(fs_info, cache->space_info->flags);
> - ret = btrfs_chunk_alloc(trans, alloc_flags, CHUNK_ALLOC_FORCE);
> + ret = btrfs_chunk_alloc(trans, NULL, alloc_flags, CHUNK_ALLOC_FORCE);
Same here.
> if (ret < 0)
> goto out;
> /*
> @@ -3899,7 +3899,7 @@ int btrfs_force_chunk_alloc(struct btrfs_trans_handle *trans, u64 type)
> {
> u64 alloc_flags = btrfs_get_alloc_profile(trans->fs_info, type);
>
> - return btrfs_chunk_alloc(trans, alloc_flags, CHUNK_ALLOC_FORCE);
> + return btrfs_chunk_alloc(trans, NULL, alloc_flags, CHUNK_ALLOC_FORCE);
Here we'd have to look it up.
> }
>
> static struct btrfs_block_group *do_chunk_alloc(struct btrfs_trans_handle *trans, u64 flags)
> @@ -4102,12 +4102,15 @@ static struct btrfs_block_group *do_chunk_alloc(struct btrfs_trans_handle *trans
> * - return 0 if it doesn't need to allocate a new chunk,
> * - return 1 if it successfully allocates a chunk,
> * - return errors including -ENOSPC otherwise.
> + *
> + * @space_info can optionally be specified to make a new chunk belong to it. If
> + * it is NULL, it is set automatically.
> */
> -int btrfs_chunk_alloc(struct btrfs_trans_handle *trans, u64 flags,
> +int btrfs_chunk_alloc(struct btrfs_trans_handle *trans,
> + struct btrfs_space_info *space_info, u64 flags,
> enum btrfs_chunk_alloc_enum force)
> {
> struct btrfs_fs_info *fs_info = trans->fs_info;
> - struct btrfs_space_info *space_info;
> struct btrfs_block_group *ret_bg;
> bool wait_for_alloc = false;
> bool should_alloc = false;
> @@ -4146,8 +4149,10 @@ int btrfs_chunk_alloc(struct btrfs_trans_handle *trans, u64 flags,
> if (flags & BTRFS_BLOCK_GROUP_SYSTEM)
> return -ENOSPC;
>
> - space_info = btrfs_find_space_info(fs_info, flags);
> - ASSERT(space_info);
> + if (!space_info) {
> + space_info = btrfs_find_space_info(fs_info, flags);
> + ASSERT(space_info);
> + }
>
> do {
> spin_lock(&space_info->lock);
> diff --git a/fs/btrfs/block-group.h b/fs/btrfs/block-group.h
> index 36937eeab9b8..c01f3af726a1 100644
> --- a/fs/btrfs/block-group.h
> +++ b/fs/btrfs/block-group.h
> @@ -342,7 +342,8 @@ int btrfs_add_reserved_bytes(struct btrfs_block_group *cache,
> bool force_wrong_size_class);
> void btrfs_free_reserved_bytes(struct btrfs_block_group *cache,
> u64 num_bytes, int delalloc);
> -int btrfs_chunk_alloc(struct btrfs_trans_handle *trans, u64 flags,
> +int btrfs_chunk_alloc(struct btrfs_trans_handle *trans,
> + struct btrfs_space_info *space_info, u64 flags,
> enum btrfs_chunk_alloc_enum force);
> int btrfs_force_chunk_alloc(struct btrfs_trans_handle *trans, u64 type);
> void check_system_chunk(struct btrfs_trans_handle *trans, const u64 type);
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index a68a8a07caff..1dad1a42c9c1 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -4159,7 +4159,7 @@ static int find_free_extent_update_loop(struct btrfs_fs_info *fs_info,
> return ret;
> }
>
> - ret = btrfs_chunk_alloc(trans, ffe_ctl->flags,
> + ret = btrfs_chunk_alloc(trans, NULL, ffe_ctl->flags,
> CHUNK_ALLOC_FORCE_FOR_EXTENT);
We'd have to look up here.
>
> /* Do not bail out on ENOSPC since we can do more. */
> diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
> index d6d33ab754ba..2489c2a16123 100644
> --- a/fs/btrfs/space-info.c
> +++ b/fs/btrfs/space-info.c
> @@ -817,7 +817,7 @@ static void flush_space(struct btrfs_fs_info *fs_info,
> ret = PTR_ERR(trans);
> break;
> }
> - ret = btrfs_chunk_alloc(trans,
> + ret = btrfs_chunk_alloc(trans, space_info,
> btrfs_get_alloc_profile(fs_info, space_info->flags),
> (state == ALLOC_CHUNK) ? CHUNK_ALLOC_NO_FORCE :
> CHUNK_ALLOC_FORCE);
> diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
> index 39e48bf610a1..670e0527996c 100644
> --- a/fs/btrfs/transaction.c
> +++ b/fs/btrfs/transaction.c
> @@ -763,7 +763,7 @@ start_transaction(struct btrfs_root *root, unsigned int num_items,
> if (do_chunk_alloc && num_bytes) {
> u64 flags = h->block_rsv->space_info->flags;
>
> - btrfs_chunk_alloc(h, btrfs_get_alloc_profile(fs_info, flags),
> + btrfs_chunk_alloc(h, NULL, btrfs_get_alloc_profile(fs_info, flags),
> CHUNK_ALLOC_NO_FORCE);
Here we can just pass h->block_rsv->space_info. Thanks,
Josef
* Re: [PATCH v3 07/13] btrfs: pass space_info for block group creation
2025-04-16 14:28 ` [PATCH v3 07/13] btrfs: pass space_info for block group creation Naohiro Aota
@ 2025-04-17 12:40 ` Josef Bacik
0 siblings, 0 replies; 27+ messages in thread
From: Josef Bacik @ 2025-04-17 12:40 UTC
To: Naohiro Aota; +Cc: linux-btrfs, Johannes Thumshirn
On Wed, Apr 16, 2025 at 11:28:12PM +0900, Naohiro Aota wrote:
> Add a btrfs_space_info parameter to btrfs_make_block_group(), its related
> functions, and the related struct. The new block group will belong to the
> passed space_info. If NULL is passed, the default space_info is used.
>
> The parameter is used in a later commit and the behavior is unchanged now.
>
> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
> ---
> fs/btrfs/block-group.c | 18 ++++++++++--------
> fs/btrfs/block-group.h | 4 ++--
> fs/btrfs/volumes.c | 22 +++++++++++++++++-----
> fs/btrfs/volumes.h | 3 ++-
> 4 files changed, 31 insertions(+), 16 deletions(-)
>
> diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
> index 12cc9069d4bb..846c9737ff5a 100644
> --- a/fs/btrfs/block-group.c
> +++ b/fs/btrfs/block-group.c
> @@ -2866,8 +2866,8 @@ static u64 calculate_global_root_id(const struct btrfs_fs_info *fs_info, u64 off
> }
>
> struct btrfs_block_group *btrfs_make_block_group(struct btrfs_trans_handle *trans,
> - u64 type,
> - u64 chunk_offset, u64 size)
> + struct btrfs_space_info *space_info,
> + u64 type, u64 chunk_offset, u64 size)
> {
> struct btrfs_fs_info *fs_info = trans->fs_info;
> struct btrfs_block_group *cache;
> @@ -2921,7 +2921,7 @@ struct btrfs_block_group *btrfs_make_block_group(struct btrfs_trans_handle *tran
> * assigned to our block group. We want our bg to be added to the rbtree
> * with its ->space_info set.
> */
> - cache->space_info = btrfs_find_space_info(fs_info, cache->flags);
> + cache->space_info = space_info;
> ASSERT(cache->space_info);
>
> ret = btrfs_add_block_group_cache(cache);
> @@ -3902,7 +3902,9 @@ int btrfs_force_chunk_alloc(struct btrfs_trans_handle *trans, u64 type)
> return btrfs_chunk_alloc(trans, NULL, alloc_flags, CHUNK_ALLOC_FORCE);
> }
>
> -static struct btrfs_block_group *do_chunk_alloc(struct btrfs_trans_handle *trans, u64 flags)
> +static struct btrfs_block_group *do_chunk_alloc(struct btrfs_trans_handle *trans,
> + struct btrfs_space_info *space_info,
> + u64 flags)
> {
> struct btrfs_block_group *bg;
> int ret;
> @@ -3915,7 +3917,7 @@ static struct btrfs_block_group *do_chunk_alloc(struct btrfs_trans_handle *trans
> */
> check_system_chunk(trans, flags);
>
> - bg = btrfs_create_chunk(trans, flags);
> + bg = btrfs_create_chunk(trans, space_info, flags);
> if (IS_ERR(bg)) {
> ret = PTR_ERR(bg);
> goto out;
> @@ -3964,7 +3966,7 @@ static struct btrfs_block_group *do_chunk_alloc(struct btrfs_trans_handle *trans
> const u64 sys_flags = btrfs_system_alloc_profile(trans->fs_info);
> struct btrfs_block_group *sys_bg;
>
> - sys_bg = btrfs_create_chunk(trans, sys_flags);
> + sys_bg = btrfs_create_chunk(trans, NULL, sys_flags);
> if (IS_ERR(sys_bg)) {
> ret = PTR_ERR(sys_bg);
> btrfs_abort_transaction(trans, ret);
> @@ -4214,7 +4216,7 @@ int btrfs_chunk_alloc(struct btrfs_trans_handle *trans,
> force_metadata_allocation(fs_info);
> }
>
> - ret_bg = do_chunk_alloc(trans, flags);
> + ret_bg = do_chunk_alloc(trans, space_info, flags);
> trans->allocating_chunk = false;
>
> if (IS_ERR(ret_bg)) {
> @@ -4297,7 +4299,7 @@ static void reserve_chunk_space(struct btrfs_trans_handle *trans,
> * the paths we visit in the chunk tree (they were already COWed
> * or created in the current transaction for example).
> */
> - bg = btrfs_create_chunk(trans, flags);
> + bg = btrfs_create_chunk(trans, NULL, flags);
> if (IS_ERR(bg)) {
> ret = PTR_ERR(bg);
> } else {
> diff --git a/fs/btrfs/block-group.h b/fs/btrfs/block-group.h
> index c01f3af726a1..35309b690d6f 100644
> --- a/fs/btrfs/block-group.h
> +++ b/fs/btrfs/block-group.h
> @@ -326,8 +326,8 @@ void btrfs_reclaim_bgs(struct btrfs_fs_info *fs_info);
> void btrfs_mark_bg_to_reclaim(struct btrfs_block_group *bg);
> int btrfs_read_block_groups(struct btrfs_fs_info *info);
> struct btrfs_block_group *btrfs_make_block_group(struct btrfs_trans_handle *trans,
> - u64 type,
> - u64 chunk_offset, u64 size);
> + struct btrfs_space_info *space_info,
> + u64 type, u64 chunk_offset, u64 size);
> void btrfs_create_pending_block_groups(struct btrfs_trans_handle *trans);
> int btrfs_inc_block_group_ro(struct btrfs_block_group *cache,
> bool do_chunk_alloc);
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 7509cbe3272c..5462c832ea19 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -3420,7 +3420,7 @@ int btrfs_remove_chunk(struct btrfs_trans_handle *trans, u64 chunk_offset)
> const u64 sys_flags = btrfs_system_alloc_profile(fs_info);
> struct btrfs_block_group *sys_bg;
>
> - sys_bg = btrfs_create_chunk(trans, sys_flags);
> + sys_bg = btrfs_create_chunk(trans, NULL, sys_flags);
> if (IS_ERR(sys_bg)) {
> ret = PTR_ERR(sys_bg);
> btrfs_abort_transaction(trans, ret);
> @@ -5216,6 +5216,8 @@ struct alloc_chunk_ctl {
> u64 stripe_size;
> u64 chunk_size;
> int ndevs;
> + /* space_info the block group is going to belong to. */
> + struct btrfs_space_info *space_info;
> };
>
> static void init_alloc_chunk_ctl_policy_regular(
> @@ -5617,7 +5619,8 @@ static struct btrfs_block_group *create_chunk(struct btrfs_trans_handle *trans,
> return ERR_PTR(ret);
> }
>
> - block_group = btrfs_make_block_group(trans, type, start, ctl->chunk_size);
> + block_group = btrfs_make_block_group(trans, ctl->space_info, type, start,
> + ctl->chunk_size);
> if (IS_ERR(block_group)) {
> btrfs_remove_chunk_map(info, map);
> return block_group;
> @@ -5643,7 +5646,8 @@ static struct btrfs_block_group *create_chunk(struct btrfs_trans_handle *trans,
> }
>
> struct btrfs_block_group *btrfs_create_chunk(struct btrfs_trans_handle *trans,
> - u64 type)
> + struct btrfs_space_info *space_info,
> + u64 type)
> {
> struct btrfs_fs_info *info = trans->fs_info;
> struct btrfs_fs_devices *fs_devices = info->fs_devices;
> @@ -5671,8 +5675,16 @@ struct btrfs_block_group *btrfs_create_chunk(struct btrfs_trans_handle *trans,
> return ERR_PTR(-EINVAL);
> }
>
> + if (!space_info) {
> + space_info = btrfs_find_space_info(info, type);
> + if (!space_info) {
> + ASSERT(0);
> + return ERR_PTR(-EINVAL);
> + }
> + }
Same comment here: make every caller pass down the space_info instead of making it
optional. Thanks,
Josef
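Sketched under the same assumptions (trimmed, hypothetical stand-ins for btrfs_create_chunk(), struct alloc_chunk_ctl and btrfs_make_block_group(), not the kernel code), a required space_info simply flows through alloc_chunk_ctl into the new block group, and the NULL fallback lookup with its ASSERT(0) disappears:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

struct space_info { uint64_t flags; };
struct block_group {
	struct space_info *space_info;
	uint64_t flags;
	uint64_t length;
};

/* Trimmed alloc_chunk_ctl: carries the target space_info down the call chain. */
struct alloc_chunk_ctl {
	uint64_t type;
	uint64_t chunk_size;
	struct space_info *space_info;
};

/* Stand-in for btrfs_make_block_group(): trusts the space_info it is given. */
static void make_block_group(struct block_group *bg,
			     const struct alloc_chunk_ctl *ctl)
{
	bg->space_info = ctl->space_info;
	bg->flags = ctl->type;
	bg->length = ctl->chunk_size;
}

/* Stand-in for btrfs_create_chunk() with a required space_info argument. */
static int create_chunk(struct block_group *out, struct space_info *space_info,
			uint64_t type, uint64_t size)
{
	struct alloc_chunk_ctl ctl = {
		.type = type,
		.chunk_size = size,
		.space_info = space_info,
	};

	if (!space_info)
		return -EINVAL; /* a caller bug, rejected up front */
	make_block_group(out, &ctl);
	return 0;
}
```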
* Re: [PATCH v3 08/13] btrfs: introduce btrfs_space_info sub-group
2025-04-16 14:28 ` [PATCH v3 08/13] btrfs: introduce btrfs_space_info sub-group Naohiro Aota
2025-04-16 14:56 ` Johannes Thumshirn
@ 2025-04-17 12:43 ` Josef Bacik
1 sibling, 0 replies; 27+ messages in thread
From: Josef Bacik @ 2025-04-17 12:43 UTC
To: Naohiro Aota; +Cc: linux-btrfs
On Wed, Apr 16, 2025 at 11:28:13PM +0900, Naohiro Aota wrote:
> Current code assumes we have only one space_info for each block group type
> (DATA, METADATA, and SYSTEM). We sometimes need multiple space_infos to
> manage special block groups.
>
> One example is handling the data relocation block group for the zoned mode.
> That block group is dedicated to writing relocated data and we cannot
> allocate any regular extent from that block group, which is implemented in
> the zoned extent allocator. That block group still belongs to the normal
> data space_info. So, when all the normal data block groups are full and
> there is some free space in the dedicated block group, the space_info
> looks to have some free space, while it cannot allocate normal extents
> anymore. That results in a strange ENOSPC error. We need to have a
> space_info for the relocation data block group to represent the situation
> properly.
>
> This commit adds basic infrastructure for having a "sub-group" of a
> space_info: creation and removal. A sub-group space_info belongs to one of
> the primary space_infos and has the same flags as its parent.
>
> This commit first introduces the relocation data sub-space_info, and the
> next commit will introduce tree-log sub-space_info. In the future, it could
> be useful to implement tiered storage for btrfs, e.g., by implementing a
> sub-group space_info for block groups residing on fast storage.
>
> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
> ---
> fs/btrfs/block-group.c | 11 +++++++++++
> fs/btrfs/space-info.c | 38 +++++++++++++++++++++++++++++++++++---
> fs/btrfs/space-info.h | 8 ++++++++
> fs/btrfs/sysfs.c | 16 +++++++++++++---
> 4 files changed, 67 insertions(+), 6 deletions(-)
>
> diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
> index 846c9737ff5a..475353b0b32c 100644
> --- a/fs/btrfs/block-group.c
> +++ b/fs/btrfs/block-group.c
> @@ -4411,6 +4411,17 @@ static void check_removing_space_info(struct btrfs_space_info *space_info)
> {
> struct btrfs_fs_info *info = space_info->fs_info;
>
> + if (space_info->subgroup_id == SUB_GROUP_PRIMARY) {
> + /* This is a top-level space_info; process its children first. */
> + for (int i = 0; i < BTRFS_SPACE_INFO_SUB_GROUP_MAX; i++) {
> + if (space_info->sub_group[i]) {
> + check_removing_space_info(space_info->sub_group[i]);
> + kfree(space_info->sub_group[i]);
> + space_info->sub_group[i] = NULL;
> + }
> + }
> + }
> +
> /*
> * Do not hide this behind enospc_debug, this is actually
> * important and indicates a real bug if this happens.
> diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
> index 2489c2a16123..37e55298c082 100644
> --- a/fs/btrfs/space-info.c
> +++ b/fs/btrfs/space-info.c
> @@ -249,16 +249,38 @@ static void init_space_info(struct btrfs_fs_info *info,
> INIT_LIST_HEAD(&space_info->priority_tickets);
> space_info->clamp = 1;
> btrfs_update_space_info_chunk_size(space_info, calc_chunk_size(info, flags));
> + space_info->subgroup_id = SUB_GROUP_PRIMARY;
>
> if (btrfs_is_zoned(info))
> space_info->bg_reclaim_threshold = BTRFS_DEFAULT_ZONED_RECLAIM_THRESH;
> }
>
> +static int create_space_info_sub_group(struct btrfs_space_info *parent, u64 flags,
> + enum btrfs_space_info_sub_group id)
> +{
> + struct btrfs_fs_info *fs_info = parent->fs_info;
> + struct btrfs_space_info *sub_space;
> +
> + ASSERT(parent->subgroup_id == SUB_GROUP_PRIMARY);
> + ASSERT(id != SUB_GROUP_PRIMARY);
> +
> + sub_space = kzalloc(sizeof(*sub_space), GFP_NOFS);
> + if (!sub_space)
> + return -ENOMEM;
> +
> + init_space_info(fs_info, sub_space, flags);
> + parent->sub_group[id] = sub_space;
> + sub_space->parent = parent;
> + sub_space->subgroup_id = id;
> +
> + return btrfs_sysfs_add_space_info_type(fs_info, sub_space);
> +}
> +
> static int create_space_info(struct btrfs_fs_info *info, u64 flags)
> {
>
> struct btrfs_space_info *space_info;
> - int ret;
> + int ret = 0;
>
> space_info = kzalloc(sizeof(*space_info), GFP_NOFS);
> if (!space_info)
> @@ -266,6 +288,15 @@ static int create_space_info(struct btrfs_fs_info *info, u64 flags)
>
> init_space_info(info, space_info, flags);
>
> + if (btrfs_is_zoned(info)) {
> + if (flags & BTRFS_BLOCK_GROUP_DATA)
> + ret = create_space_info_sub_group(space_info, flags,
> + SUB_GROUP_DATA_RELOC);
> + if (ret == -ENOMEM)
> + return ret;
> + ASSERT(!ret);
> + }
> +
> ret = btrfs_sysfs_add_space_info_type(info, space_info);
> if (ret)
> return ret;
> @@ -561,8 +592,9 @@ static void __btrfs_dump_space_info(const struct btrfs_fs_info *fs_info,
> lockdep_assert_held(&info->lock);
>
> /* The free space could be negative in case of overcommit */
> - btrfs_info(fs_info, "space_info %s has %lld free, is %sfull",
> - flag_str,
> + btrfs_info(fs_info,
> + "space_info %s (sub-group id %d) has %lld free, is %sfull",
> + flag_str, info->subgroup_id,
> (s64)(info->total_bytes - btrfs_space_info_used(info, true)),
> info->full ? "" : "not ");
> btrfs_info(fs_info,
> diff --git a/fs/btrfs/space-info.h b/fs/btrfs/space-info.h
> index 7459b4eb99cd..64641885babd 100644
> --- a/fs/btrfs/space-info.h
> +++ b/fs/btrfs/space-info.h
> @@ -98,8 +98,16 @@ enum btrfs_flush_state {
> RESET_ZONES = 12,
> };
>
> +enum btrfs_space_info_sub_group {
> + SUB_GROUP_DATA_RELOC = 0,
> + SUB_GROUP_PRIMARY = -1,
> +};
> +#define BTRFS_SPACE_INFO_SUB_GROUP_MAX 1
We want to avoid namespace pollution, so rename these to have a btrfs prefix,
and do something like
enum btrfs_space_info_sub_group {
BTRFS_SUB_GROUP_DATA_RELOC = 0,
BTRFS_SUB_GROUP_MAX,
BTRFS_SUB_GROUP_PRIMARY = -1,
};
And then you can remove the define. Thanks,
Josef
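A rough sketch of how the suggested enum could drive both sub-group creation and teardown, so every loop is bounded by BTRFS_SUB_GROUP_MAX instead of a separate define. create_sub_group(), remove_sub_groups() and the trimmed struct are hypothetical stand-ins (sysfs registration and space accounting are omitted), and invalid arguments return -EINVAL instead of hitting an ASSERT so the sketch can be exercised:

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

enum btrfs_space_info_sub_group {
	BTRFS_SUB_GROUP_DATA_RELOC = 0,
	BTRFS_SUB_GROUP_MAX,          /* array size, no separate #define needed */
	BTRFS_SUB_GROUP_PRIMARY = -1,
};

/* Hypothetical, trimmed space_info: only the parent/child links. */
struct space_info {
	enum btrfs_space_info_sub_group subgroup_id;
	struct space_info *parent;
	struct space_info *sub_group[BTRFS_SUB_GROUP_MAX];
};

static int create_sub_group(struct space_info *parent,
			    enum btrfs_space_info_sub_group id)
{
	struct space_info *sub;

	/* Only primaries get children, and only into valid array slots. */
	if (parent->subgroup_id != BTRFS_SUB_GROUP_PRIMARY ||
	    id < 0 || id >= BTRFS_SUB_GROUP_MAX)
		return -EINVAL;
	sub = calloc(1, sizeof(*sub));
	if (!sub)
		return -ENOMEM;
	sub->subgroup_id = id;
	sub->parent = parent;
	parent->sub_group[id] = sub;
	return 0;
}

/* Teardown walks the same range; returns how many children were freed. */
static int remove_sub_groups(struct space_info *parent)
{
	int freed = 0;

	for (int i = 0; i < BTRFS_SUB_GROUP_MAX; i++) {
		if (parent->sub_group[i]) {
			free(parent->sub_group[i]);
			parent->sub_group[i] = NULL;
			freed++;
		}
	}
	return freed;
}
```

Adding a new sub-group then only means adding an enumerator before BTRFS_SUB_GROUP_MAX; the array size and both loops follow automatically.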
* Re: [PATCH v3 09/13] btrfs: introduce tree-log sub-space_info
2025-04-16 14:28 ` [PATCH v3 09/13] btrfs: introduce tree-log sub-space_info Naohiro Aota
2025-04-16 14:57 ` Johannes Thumshirn
@ 2025-04-17 12:44 ` Josef Bacik
2025-04-18 1:08 ` Naohiro Aota
1 sibling, 1 reply; 27+ messages in thread
From: Josef Bacik @ 2025-04-17 12:44 UTC
To: Naohiro Aota; +Cc: linux-btrfs
On Wed, Apr 16, 2025 at 11:28:14PM +0900, Naohiro Aota wrote:
> This commit introduces the tree-log sub-space_info, which is a sub-space of
> the metadata space_info dedicated to tree-log node allocation.
>
> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
> ---
> fs/btrfs/space-info.c | 4 ++++
> fs/btrfs/space-info.h | 1 +
> fs/btrfs/sysfs.c | 10 +++++++++-
> 3 files changed, 14 insertions(+), 1 deletion(-)
>
> diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
> index 37e55298c082..4b2343a3a009 100644
> --- a/fs/btrfs/space-info.c
> +++ b/fs/btrfs/space-info.c
> @@ -292,6 +292,10 @@ static int create_space_info(struct btrfs_fs_info *info, u64 flags)
> if (flags & BTRFS_BLOCK_GROUP_DATA)
> ret = create_space_info_sub_group(space_info, flags,
> SUB_GROUP_DATA_RELOC);
> + else if (flags & BTRFS_BLOCK_GROUP_METADATA)
> + ret = create_space_info_sub_group(space_info, flags,
> + SUB_GROUP_METADATA_TREELOG);
> +
> if (ret == -ENOMEM)
> return ret;
> ASSERT(!ret);
> diff --git a/fs/btrfs/space-info.h b/fs/btrfs/space-info.h
> index 64641885babd..1aadf88e5789 100644
> --- a/fs/btrfs/space-info.h
> +++ b/fs/btrfs/space-info.h
> @@ -100,6 +100,7 @@ enum btrfs_flush_state {
>
> enum btrfs_space_info_sub_group {
> SUB_GROUP_DATA_RELOC = 0,
> + SUB_GROUP_METADATA_TREELOG = 0,
This will break things, since they now have the same value. Thanks,
Josef
* Re: [PATCH v3 12/13] btrfs: add block_rsv for treelog
2025-04-16 14:28 ` [PATCH v3 12/13] btrfs: add block_rsv for treelog Naohiro Aota
@ 2025-04-17 12:48 ` Josef Bacik
0 siblings, 0 replies; 27+ messages in thread
From: Josef Bacik @ 2025-04-17 12:48 UTC
To: Naohiro Aota; +Cc: linux-btrfs, Johannes Thumshirn
On Wed, Apr 16, 2025 at 11:28:17PM +0900, Naohiro Aota wrote:
> We need to add a dedicated block_rsv for tree-log, because the block_rsv
> serves tree node allocation in btrfs_alloc_tree_block(). Currently, the
> tree-log tree uses fs_info->empty_block_rsv, which is shared across trees
> and points to the normal metadata space_info. Instead, we add a dedicated
> block_rsv that can use the dedicated sub-space_info.
>
> Currently, we use the dedicated block_rsv only in zoned mode, but it
> might be useful for regular btrfs too.
We can just use the treelog_rsv for this too; it'll give the same behavior as
the empty_rsv, so just do that and it'll make this cleaner. Thanks,
Josef
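A minimal sketch of the suggestion, assuming (as the comment implies) that an unfilled treelog_rsv behaves like empty_block_rsv: set up treelog_rsv in both modes and let only its space_info differ. The trimmed fs_info, its metadata_treelog field, and both helpers are hypothetical, not the kernel's block_rsv code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct space_info { const char *name; };
struct block_rsv { struct space_info *space_info; uint64_t size; };

/* Hypothetical, trimmed fs_info: only the fields this sketch needs. */
struct fs_info {
	bool zoned;
	struct space_info metadata;         /* primary METADATA space_info */
	struct space_info metadata_treelog; /* tree-log sub-group (zoned only) */
	struct block_rsv treelog_rsv;
};

/*
 * Set up treelog_rsv in both modes. Like an empty reserve it holds nothing
 * (size 0); the only mode-dependent part is which space_info it points at.
 */
static void init_treelog_rsv(struct fs_info *fs)
{
	fs->treelog_rsv.size = 0;
	fs->treelog_rsv.space_info = fs->zoned ? &fs->metadata_treelog
					       : &fs->metadata;
}

/* Log-tree block allocation always uses treelog_rsv: no zoned special case. */
static struct block_rsv *log_tree_block_rsv(struct fs_info *fs)
{
	return &fs->treelog_rsv;
}
```

The design point is that the zoned/non-zoned decision happens once at setup, so the allocation path never branches on the mode.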
* Re: [PATCH v3 13/13] btrfs: reclaim from sub-space space_info
2025-04-16 14:28 ` [PATCH v3 13/13] btrfs: reclaim from sub-space space_info Naohiro Aota
@ 2025-04-17 12:49 ` Josef Bacik
0 siblings, 0 replies; 27+ messages in thread
From: Josef Bacik @ 2025-04-17 12:49 UTC
To: Naohiro Aota; +Cc: linux-btrfs, Johannes Thumshirn
On Wed, Apr 16, 2025 at 11:28:18PM +0900, Naohiro Aota wrote:
> Modify btrfs_async_reclaim_{data,metadata}_space() to run the reclaim process on
> the sub-spaces as well.
>
> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
> ---
> fs/btrfs/space-info.c | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
> index 62dc69322b80..0f543e3cb2fe 100644
> --- a/fs/btrfs/space-info.c
> +++ b/fs/btrfs/space-info.c
> @@ -1221,6 +1221,9 @@ static void btrfs_async_reclaim_metadata_space(struct work_struct *work)
> fs_info = container_of(work, struct btrfs_fs_info, async_reclaim_work);
> space_info = btrfs_find_space_info(fs_info, BTRFS_BLOCK_GROUP_METADATA);
> do_async_reclaim_metadata_space(space_info);
> + for (int i = 0; i < BTRFS_SPACE_INFO_SUB_GROUP_MAX; i++)
> + if (space_info->sub_group[i])
> + do_async_reclaim_metadata_space(space_info->sub_group[i]);
Just a formatting thing: for multi-line for loops, even with a single if, it's
cleaner to have the braces. I find it easier to read and less error prone.
Thanks,
Josef
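For illustration, the loop from the hunk above with the requested braces. The trimmed struct and the counting stand-in for do_async_reclaim_metadata_space() are hypothetical and exist only to make the sketch runnable:

```c
#include <assert.h>
#include <stddef.h>

#define BTRFS_SPACE_INFO_SUB_GROUP_MAX 1

/* Hypothetical, trimmed space_info: a reclaim counter plus sub-group slots. */
struct space_info {
	int reclaim_runs;
	struct space_info *sub_group[BTRFS_SPACE_INFO_SUB_GROUP_MAX];
};

/* Stand-in for do_async_reclaim_metadata_space(): just counts invocations. */
static void do_async_reclaim(struct space_info *space_info)
{
	space_info->reclaim_runs++;
}

static void async_reclaim_with_sub_groups(struct space_info *space_info)
{
	do_async_reclaim(space_info);
	/* Braces around the multi-line loop body, as requested in review. */
	for (int i = 0; i < BTRFS_SPACE_INFO_SUB_GROUP_MAX; i++) {
		if (space_info->sub_group[i])
			do_async_reclaim(space_info->sub_group[i]);
	}
}
```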
* Re: [PATCH v3 06/13] btrfs: introduce space_info argument to btrfs_chunk_alloc
2025-04-17 12:38 ` Josef Bacik
@ 2025-04-18 0:59 ` Naohiro Aota
0 siblings, 0 replies; 27+ messages in thread
From: Naohiro Aota @ 2025-04-18 0:59 UTC
To: Josef Bacik, Naohiro Aota; +Cc: linux-btrfs@vger.kernel.org, Johannes Thumshirn
On Thu Apr 17, 2025 at 9:38 PM JST, Josef Bacik wrote:
> On Wed, Apr 16, 2025 at 11:28:11PM +0900, Naohiro Aota wrote:
>> Take an optional btrfs_space_info argument in btrfs_chunk_alloc(). If
>> specified, btrfs_chunk_alloc() works on that space_info. If not, the default
>> space_info is used, as before.
>>
>> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
>> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
>
> For consistency sake I'd prefer if you'd just update the callers to lookup the
> space_info and pass that in as appropriate. In fact a lot of these callers
> already have the block group or space_info available, so we could avoid the
> extra overhead of doing a lookup
Sure. I'll revise this patch and the next one that way.
* Re: [PATCH v3 09/13] btrfs: introduce tree-log sub-space_info
2025-04-17 12:44 ` Josef Bacik
@ 2025-04-18 1:08 ` Naohiro Aota
0 siblings, 0 replies; 27+ messages in thread
From: Naohiro Aota @ 2025-04-18 1:08 UTC
To: Josef Bacik, Naohiro Aota; +Cc: linux-btrfs@vger.kernel.org
On Thu Apr 17, 2025 at 9:44 PM JST, Josef Bacik wrote:
> On Wed, Apr 16, 2025 at 11:28:14PM +0900, Naohiro Aota wrote:
>> This commit introduces the tree-log sub-space_info, which is a sub-space of
>> the metadata space_info dedicated to tree-log node allocation.
>>
>> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
>> ---
>> fs/btrfs/space-info.c | 4 ++++
>> fs/btrfs/space-info.h | 1 +
>> fs/btrfs/sysfs.c | 10 +++++++++-
>> 3 files changed, 14 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
>> index 37e55298c082..4b2343a3a009 100644
>> --- a/fs/btrfs/space-info.c
>> +++ b/fs/btrfs/space-info.c
>> @@ -292,6 +292,10 @@ static int create_space_info(struct btrfs_fs_info *info, u64 flags)
>> if (flags & BTRFS_BLOCK_GROUP_DATA)
>> ret = create_space_info_sub_group(space_info, flags,
>> SUB_GROUP_DATA_RELOC);
>> + else if (flags & BTRFS_BLOCK_GROUP_METADATA)
>> + ret = create_space_info_sub_group(space_info, flags,
>> + SUB_GROUP_METADATA_TREELOG);
>> +
>> if (ret == -ENOMEM)
>> return ret;
>> ASSERT(!ret);
>> diff --git a/fs/btrfs/space-info.h b/fs/btrfs/space-info.h
>> index 64641885babd..1aadf88e5789 100644
>> --- a/fs/btrfs/space-info.h
>> +++ b/fs/btrfs/space-info.h
>> @@ -100,6 +100,7 @@ enum btrfs_flush_state {
>>
>> enum btrfs_space_info_sub_group {
>> SUB_GROUP_DATA_RELOC = 0,
>> + SUB_GROUP_METADATA_TREELOG = 0,
>
> This will mess up since they have the same value now. Thanks,
They intentionally have the same value. SUB_GROUP_DATA_RELOC and
SUB_GROUP_METADATA_TREELOG are used to index the space_info->sub_group array,
and since they live in different parent space_infos (DATA vs. METADATA), they
can share a value.
But, yeah, treating them as distinct IDs would be cleaner. So I'm going to
make the values distinct and introduce a small inline function like
this:
static inline struct btrfs_space_info *btrfs_space_info_sub_group(struct btrfs_space_info *space_info,
enum btrfs_space_info_sub_group subgroup_id)
{
if (subgroup_id == BTRFS_SUB_GROUP_PRIMARY)
return space_info;
if (space_info->flags & BTRFS_BLOCK_GROUP_DATA) {
ASSERT(subgroup_id == BTRFS_SUB_GROUP_DATA_RELOC);
return space_info->sub_group[0];
} else if (space_info->flags & BTRFS_BLOCK_GROUP_METADATA) {
ASSERT(subgroup_id == BTRFS_SUB_GROUP_METADATA_TREELOG);
return space_info->sub_group[0];
}
/* Invalid combination */
ASSERT(0);
return NULL;
}
>
> Josef
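A userspace sketch of that helper with distinct values, using hypothetical trimmed structures and illustrative BG_* bits. Because each parent still holds only one child, both ids can map to slot 0 while remaining distinguishable as ids. Unlike the proposal, which ASSERTs on an invalid combination, the sketch returns NULL so that case can be checked:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative block group type bits (values chosen for this sketch). */
#define BG_DATA     (1ULL << 0)
#define BG_METADATA (1ULL << 2)

/* Distinct ids, as proposed; PRIMARY stays -1. */
enum btrfs_space_info_sub_group {
	BTRFS_SUB_GROUP_DATA_RELOC = 0,
	BTRFS_SUB_GROUP_METADATA_TREELOG = 1,
	BTRFS_SUB_GROUP_PRIMARY = -1,
};

/* One slot per parent: DATA holds reloc, METADATA holds tree-log. */
struct space_info {
	uint64_t flags;
	struct space_info *sub_group[1];
};

static struct space_info *
space_info_sub_group(struct space_info *space_info,
		     enum btrfs_space_info_sub_group id)
{
	if (id == BTRFS_SUB_GROUP_PRIMARY)
		return space_info;
	if ((space_info->flags & BG_DATA) && id == BTRFS_SUB_GROUP_DATA_RELOC)
		return space_info->sub_group[0];
	if ((space_info->flags & BG_METADATA) &&
	    id == BTRFS_SUB_GROUP_METADATA_TREELOG)
		return space_info->sub_group[0];
	/* Invalid combination: the proposal ASSERTs; this sketch returns NULL. */
	return NULL;
}
```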
end of thread
Thread overview: 27+ messages
2025-04-16 14:28 [PATCH v3 00/13] btrfs: zoned: split out space_info for dedicated block groups Naohiro Aota
2025-04-16 14:28 ` [PATCH v3 01/13] btrfs: take btrfs_space_info in btrfs_reserve_data_bytes Naohiro Aota
2025-04-16 14:28 ` [PATCH v3 02/13] btrfs: take struct btrfs_inode in btrfs_free_reserved_data_space_noquota Naohiro Aota
2025-04-16 14:28 ` [PATCH v3 03/13] btrfs: factor out init_space_info() Naohiro Aota
2025-04-16 14:28 ` [PATCH v3 04/13] btrfs: spin out do_async_reclaim_{data,metadata}_space() Naohiro Aota
2025-04-16 14:28 ` [PATCH v3 05/13] btrfs: factor out check_removing_space_info() Naohiro Aota
2025-04-16 14:28 ` [PATCH v3 06/13] btrfs: introduce space_info argument to btrfs_chunk_alloc Naohiro Aota
2025-04-17 12:38 ` Josef Bacik
2025-04-18 0:59 ` Naohiro Aota
2025-04-16 14:28 ` [PATCH v3 07/13] btrfs: pass space_info for block group creation Naohiro Aota
2025-04-17 12:40 ` Josef Bacik
2025-04-16 14:28 ` [PATCH v3 08/13] btrfs: introduce btrfs_space_info sub-group Naohiro Aota
2025-04-16 14:56 ` Johannes Thumshirn
2025-04-17 12:43 ` Josef Bacik
2025-04-16 14:28 ` [PATCH v3 09/13] btrfs: introduce tree-log sub-space_info Naohiro Aota
2025-04-16 14:57 ` Johannes Thumshirn
2025-04-17 12:44 ` Josef Bacik
2025-04-18 1:08 ` Naohiro Aota
2025-04-16 14:28 ` [PATCH v3 10/13] btrfs: tweak extent/chunk allocation for space_info sub-space Naohiro Aota
2025-04-17 5:48 ` Naohiro Aota
2025-04-16 14:28 ` [PATCH v3 11/13] btrfs: use proper data space_info Naohiro Aota
2025-04-16 14:59 ` Johannes Thumshirn
2025-04-17 4:35 ` Naohiro Aota
2025-04-16 14:28 ` [PATCH v3 12/13] btrfs: add block_rsv for treelog Naohiro Aota
2025-04-17 12:48 ` Josef Bacik
2025-04-16 14:28 ` [PATCH v3 13/13] btrfs: reclaim from sub-space space_info Naohiro Aota
2025-04-17 12:49 ` Josef Bacik