* [PATCH v2 00/12] btrfs-progs: zoned: support zone capacity and
@ 2025-02-19 7:57 Naohiro Aota
2025-02-19 7:57 ` [PATCH v2 01/12] btrfs-progs: introduce min_not_zero() Naohiro Aota
` (13 more replies)
0 siblings, 14 replies; 17+ messages in thread
From: Naohiro Aota @ 2025-02-19 7:57 UTC (permalink / raw)
To: linux-btrfs; +Cc: Naohiro Aota
Running mkfs.btrfs on a null_blk device with the following setup fails
as below.
- zone size: 64MB
- zone capacity: 64MB
- number of conventional zones: 6
- storage size: 2048MB
+ /home/naota/src/btrfs-progs/mkfs.btrfs -d single -m dup -f /dev/nullb0
btrfs-progs v6.10
See https://btrfs.readthedocs.io for more information.
zoned: /dev/nullb0: host-managed device detected, setting zoned feature
Resetting device zones /dev/nullb0 (32 zones) ...
NOTE: several default settings have changed in version 5.15, please make sure
this does not affect your deployments:
- DUP for metadata (-m dup)
- enabled no-holes (-O no-holes)
- enabled free-space-tree (-R free-space-tree)
bad tree block 268435456, bytenr mismatch, want=268435456, have=0
kernel-shared/disk-io.c:485: write_tree_block: BUG_ON `1` triggered, value 1
/home/naota/src/btrfs-progs/mkfs.btrfs(+0x290ca) [0x55603cf7e0ca]
/home/naota/src/btrfs-progs/mkfs.btrfs(write_tree_block+0xa7) [0x55603cf80417]
/home/naota/src/btrfs-progs/mkfs.btrfs(__commit_transaction+0xe8) [0x55603cf9b7d8]
/home/naota/src/btrfs-progs/mkfs.btrfs(btrfs_commit_transaction+0x176) [0x55603cf9ba66]
/home/naota/src/btrfs-progs/mkfs.btrfs(main+0x2831) [0x55603cf67291]
/usr/lib64/libc.so.6(+0x271ee) [0x7f5ab706f1ee]
/usr/lib64/libc.so.6(__libc_start_main+0x89) [0x7f5ab706f2a9]
/home/naota/src/btrfs-progs/mkfs.btrfs(_start+0x25) [0x55603cf6a135]
/home/naota/tmp/test-mkfs.sh: line 13: 821886 Aborted (core dumped)
The crash happens because btrfs-progs fails to set a proper allocation
pointer when a DUP block group is created over one conventional zone and one
sequential-write-required zone. In that case, the write pointer must be
recovered from the last allocated extent in the block group. That recovery
is not properly implemented on the btrfs-progs side.
Implementing that functionality is relatively straightforward because we can
copy the code from the kernel side. However, the kernel and userspace
implementations have drifted quite far out of sync. So, this series first
refactors btrfs_load_block_group_zone_info() to make it easier to integrate
the kernel-side code.
The main part is the last patch, which fixes the allocation pointer
calculation for all the profiles.
While at it, this series also adds support for zone capacity and zone
activeness. However, zone activeness support is currently limited: it does
not attempt to check the active zone limit on extent allocation, because
mkfs.btrfs should work without hitting the limit.
- v2
- Temporarily fail some profiles while adding support in the patch
series.
- v1: https://lore.kernel.org/linux-btrfs/cover.1739756953.git.naohiro.aota@wdc.com/
Naohiro Aota (12):
btrfs-progs: introduce min_not_zero()
btrfs-progs: zoned: introduce a zone_info struct in
btrfs_load_block_group_zone_info
btrfs-progs: zoned: support zone capacity
btrfs-progs: zoned: load zone activeness
btrfs-progs: zoned: activate block group on loading
btrfs-progs: factor out btrfs_load_zone_info()
btrfs-progs: zoned: factor out SINGLE zone info loading
btrfs-progs: zoned: implement DUP zone info loading
btrfs-progs: zoned: implement RAID1 zone info loading
btrfs-progs: zoned: implement RAID0 zone info loading
btrfs-progs: implement RAID10 zone info loading
btrfs-progs: zoned: fix alloc_offset calculation for partly
conventional block groups
include/kerncompat.h | 10 +
kernel-shared/ctree.h | 3 +
kernel-shared/extent-tree.c | 2 +-
kernel-shared/zoned.c | 458 +++++++++++++++++++++++++++++++-----
kernel-shared/zoned.h | 3 +
5 files changed, 414 insertions(+), 62 deletions(-)
--
2.48.1
^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH v2 01/12] btrfs-progs: introduce min_not_zero()
2025-02-19 7:57 [PATCH v2 00/12] btrfs-progs: zoned: support zone capacity and Naohiro Aota
@ 2025-02-19 7:57 ` Naohiro Aota
2025-02-19 7:57 ` [PATCH v2 02/12] btrfs-progs: zoned: introduce a zone_info struct in btrfs_load_block_group_zone_info Naohiro Aota
` (12 subsequent siblings)
13 siblings, 0 replies; 17+ messages in thread
From: Naohiro Aota @ 2025-02-19 7:57 UTC (permalink / raw)
To: linux-btrfs; +Cc: Naohiro Aota
Introduce min_not_zero() macro from the kernel.
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
include/kerncompat.h | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/include/kerncompat.h b/include/kerncompat.h
index 42c84460c1e5..e95bb4a53342 100644
--- a/include/kerncompat.h
+++ b/include/kerncompat.h
@@ -127,6 +127,16 @@
}
#endif
+/**
+ * min_not_zero - return the minimum that is _not_ zero, unless both are zero
+ * @x: value1
+ * @y: value2
+ */
+#define min_not_zero(x, y) ({ \
+ typeof(x) __x = (x); \
+ typeof(y) __y = (y); \
+ __x == 0 ? __y : ((__y == 0) ? __x : min(__x, __y)); })
+
static inline void print_trace(void)
{
#ifndef BTRFS_DISABLE_BACKTRACE
--
2.48.1
* [PATCH v2 02/12] btrfs-progs: zoned: introduce a zone_info struct in btrfs_load_block_group_zone_info
2025-02-19 7:57 [PATCH v2 00/12] btrfs-progs: zoned: support zone capacity and Naohiro Aota
2025-02-19 7:57 ` [PATCH v2 01/12] btrfs-progs: introduce min_not_zero() Naohiro Aota
@ 2025-02-19 7:57 ` Naohiro Aota
2025-02-19 7:57 ` [PATCH v2 03/12] btrfs-progs: zoned: support zone capacity Naohiro Aota
` (11 subsequent siblings)
13 siblings, 0 replies; 17+ messages in thread
From: Naohiro Aota @ 2025-02-19 7:57 UTC (permalink / raw)
To: linux-btrfs; +Cc: Naohiro Aota
This is a userland-side update to follow kernel-side commit 15c12fcc50a1
("btrfs: zoned: introduce a zone_info struct in
btrfs_load_block_group_zone_info"). This will make unifying the code easier.
This commit introduces a zone_info structure to hold per-zone information in
btrfs_load_block_group_zone_info().
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
kernel-shared/zoned.c | 46 ++++++++++++++++++++++---------------------
1 file changed, 24 insertions(+), 22 deletions(-)
diff --git a/kernel-shared/zoned.c b/kernel-shared/zoned.c
index fd8a776dc471..b06774482cfd 100644
--- a/kernel-shared/zoned.c
+++ b/kernel-shared/zoned.c
@@ -828,6 +828,11 @@ bool zoned_profile_supported(u64 map_type, bool rst)
return false;
}
+struct zone_info {
+ u64 physical;
+ u64 alloc_offset;
+};
+
int btrfs_load_block_group_zone_info(struct btrfs_fs_info *fs_info,
struct btrfs_block_group *cache)
{
@@ -837,10 +842,9 @@ int btrfs_load_block_group_zone_info(struct btrfs_fs_info *fs_info,
struct map_lookup *map;
u64 logical = cache->start;
u64 length = cache->length;
- u64 physical = 0;
+ struct zone_info *zone_info = NULL;
int ret = 0;
int i;
- u64 *alloc_offsets = NULL;
u64 last_alloc = 0;
u32 num_conventional = 0;
@@ -867,30 +871,29 @@ int btrfs_load_block_group_zone_info(struct btrfs_fs_info *fs_info,
}
map = container_of(ce, struct map_lookup, ce);
- alloc_offsets = calloc(map->num_stripes, sizeof(*alloc_offsets));
- if (!alloc_offsets) {
- error_msg(ERROR_MSG_MEMORY, "zone offsets");
+ zone_info = calloc(map->num_stripes, sizeof(*zone_info));
+ if (!zone_info) {
+ error_msg(ERROR_MSG_MEMORY, "zone info");
return -ENOMEM;
}
for (i = 0; i < map->num_stripes; i++) {
+ struct zone_info *info = &zone_info[i];
bool is_sequential;
struct blk_zone zone;
device = map->stripes[i].dev;
- physical = map->stripes[i].physical;
+ info->physical = map->stripes[i].physical;
if (device->fd == -1) {
- alloc_offsets[i] = WP_MISSING_DEV;
+ info->alloc_offset = WP_MISSING_DEV;
continue;
}
- is_sequential = btrfs_dev_is_sequential(device, physical);
- if (!is_sequential)
- num_conventional++;
-
+ is_sequential = btrfs_dev_is_sequential(device, info->physical);
if (!is_sequential) {
- alloc_offsets[i] = WP_CONVENTIONAL;
+ num_conventional++;
+ info->alloc_offset = WP_CONVENTIONAL;
continue;
}
@@ -898,28 +901,27 @@ int btrfs_load_block_group_zone_info(struct btrfs_fs_info *fs_info,
* The group is mapped to a sequential zone. Get the zone write
* pointer to determine the allocation offset within the zone.
*/
- WARN_ON(!IS_ALIGNED(physical, fs_info->zone_size));
- zone = device->zone_info->zones[physical / fs_info->zone_size];
+ WARN_ON(!IS_ALIGNED(info->physical, fs_info->zone_size));
+ zone = device->zone_info->zones[info->physical / fs_info->zone_size];
switch (zone.cond) {
case BLK_ZONE_COND_OFFLINE:
case BLK_ZONE_COND_READONLY:
error(
"zoned: offline/readonly zone %llu on device %s (devid %llu)",
- physical / fs_info->zone_size, device->name,
+ info->physical / fs_info->zone_size, device->name,
device->devid);
- alloc_offsets[i] = WP_MISSING_DEV;
+ info->alloc_offset = WP_MISSING_DEV;
break;
case BLK_ZONE_COND_EMPTY:
- alloc_offsets[i] = 0;
+ info->alloc_offset = 0;
break;
case BLK_ZONE_COND_FULL:
- alloc_offsets[i] = fs_info->zone_size;
+ info->alloc_offset = fs_info->zone_size;
break;
default:
/* Partially used zone */
- alloc_offsets[i] =
- ((zone.wp - zone.start) << SECTOR_SHIFT);
+ info->alloc_offset = ((zone.wp - zone.start) << SECTOR_SHIFT);
break;
}
}
@@ -943,7 +945,7 @@ int btrfs_load_block_group_zone_info(struct btrfs_fs_info *fs_info,
ret = -EINVAL;
goto out;
}
- cache->alloc_offset = alloc_offsets[0];
+ cache->alloc_offset = zone_info[0].alloc_offset;
out:
/* An extent is allocated after the write pointer */
@@ -957,7 +959,7 @@ out:
if (!ret)
cache->write_offset = cache->alloc_offset;
- kfree(alloc_offsets);
+ kfree(zone_info);
return ret;
}
--
2.48.1
* [PATCH v2 03/12] btrfs-progs: zoned: support zone capacity
2025-02-19 7:57 [PATCH v2 00/12] btrfs-progs: zoned: support zone capacity and Naohiro Aota
2025-02-19 7:57 ` [PATCH v2 01/12] btrfs-progs: introduce min_not_zero() Naohiro Aota
2025-02-19 7:57 ` [PATCH v2 02/12] btrfs-progs: zoned: introduce a zone_info struct in btrfs_load_block_group_zone_info Naohiro Aota
@ 2025-02-19 7:57 ` Naohiro Aota
2025-02-19 7:57 ` [PATCH v2 04/12] btrfs-progs: zoned: load zone activeness Naohiro Aota
` (10 subsequent siblings)
13 siblings, 0 replies; 17+ messages in thread
From: Naohiro Aota @ 2025-02-19 7:57 UTC (permalink / raw)
To: linux-btrfs; +Cc: Naohiro Aota
The userland tool did not load or use the zone capacity. Support it properly.
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
kernel-shared/ctree.h | 1 +
kernel-shared/extent-tree.c | 2 +-
kernel-shared/zoned.c | 9 ++++++++-
3 files changed, 10 insertions(+), 2 deletions(-)
diff --git a/kernel-shared/ctree.h b/kernel-shared/ctree.h
index 8c923be96705..a6aa10a690bb 100644
--- a/kernel-shared/ctree.h
+++ b/kernel-shared/ctree.h
@@ -285,6 +285,7 @@ struct btrfs_block_group {
*/
u64 alloc_offset;
u64 write_offset;
+ u64 zone_capacity;
u64 global_root_id;
};
diff --git a/kernel-shared/extent-tree.c b/kernel-shared/extent-tree.c
index 20eef4f3df7b..2b7a962f294b 100644
--- a/kernel-shared/extent-tree.c
+++ b/kernel-shared/extent-tree.c
@@ -300,7 +300,7 @@ again:
goto new_group;
if (btrfs_is_zoned(root->fs_info)) {
- if (cache->length - cache->alloc_offset < num)
+ if (cache->zone_capacity - cache->alloc_offset < num)
goto new_group;
*start_ret = cache->start + cache->alloc_offset;
cache->alloc_offset += num;
diff --git a/kernel-shared/zoned.c b/kernel-shared/zoned.c
index b06774482cfd..319ee88d5b06 100644
--- a/kernel-shared/zoned.c
+++ b/kernel-shared/zoned.c
@@ -776,7 +776,7 @@ static int calculate_alloc_pointer(struct btrfs_fs_info *fs_info,
length = fs_info->nodesize;
if (!(found_key.objectid >= cache->start &&
- found_key.objectid + length <= cache->start + cache->length)) {
+ found_key.objectid + length <= cache->start + cache->zone_capacity)) {
ret = -EUCLEAN;
goto out;
}
@@ -830,6 +830,7 @@ bool zoned_profile_supported(u64 map_type, bool rst)
struct zone_info {
u64 physical;
+ u64 capacity;
u64 alloc_offset;
};
@@ -894,6 +895,7 @@ int btrfs_load_block_group_zone_info(struct btrfs_fs_info *fs_info,
if (!is_sequential) {
num_conventional++;
info->alloc_offset = WP_CONVENTIONAL;
+ info->capacity = device->zone_info->zone_size;
continue;
}
@@ -904,6 +906,8 @@ int btrfs_load_block_group_zone_info(struct btrfs_fs_info *fs_info,
WARN_ON(!IS_ALIGNED(info->physical, fs_info->zone_size));
zone = device->zone_info->zones[info->physical / fs_info->zone_size];
+ info->capacity = (zone.capacity << SECTOR_SHIFT);
+
switch (zone.cond) {
case BLK_ZONE_COND_OFFLINE:
case BLK_ZONE_COND_READONLY:
@@ -927,6 +931,8 @@ int btrfs_load_block_group_zone_info(struct btrfs_fs_info *fs_info,
}
if (num_conventional > 0) {
+ /* Zone capacity is always zone size in emulation */
+ cache->zone_capacity = cache->length;
ret = calculate_alloc_pointer(fs_info, cache, &last_alloc);
if (ret || map->num_stripes == num_conventional) {
if (!ret)
@@ -946,6 +952,7 @@ int btrfs_load_block_group_zone_info(struct btrfs_fs_info *fs_info,
goto out;
}
cache->alloc_offset = zone_info[0].alloc_offset;
+ cache->zone_capacity = zone_info[0].capacity;
out:
/* An extent is allocated after the write pointer */
--
2.48.1
* [PATCH v2 04/12] btrfs-progs: zoned: load zone activeness
2025-02-19 7:57 [PATCH v2 00/12] btrfs-progs: zoned: support zone capacity and Naohiro Aota
` (2 preceding siblings ...)
2025-02-19 7:57 ` [PATCH v2 03/12] btrfs-progs: zoned: support zone capacity Naohiro Aota
@ 2025-02-19 7:57 ` Naohiro Aota
2025-02-19 7:57 ` [PATCH v2 05/12] btrfs-progs: zoned: activate block group on loading Naohiro Aota
` (9 subsequent siblings)
13 siblings, 0 replies; 17+ messages in thread
From: Naohiro Aota @ 2025-02-19 7:57 UTC (permalink / raw)
To: linux-btrfs; +Cc: Naohiro Aota
Properly load the zone activeness in the userland tool. Also, check that a
device's active zone limit is large enough to run btrfs.
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
kernel-shared/ctree.h | 1 +
kernel-shared/zoned.c | 77 +++++++++++++++++++++++++++++++++++++++----
kernel-shared/zoned.h | 3 ++
3 files changed, 75 insertions(+), 6 deletions(-)
diff --git a/kernel-shared/ctree.h b/kernel-shared/ctree.h
index a6aa10a690bb..f10142df80eb 100644
--- a/kernel-shared/ctree.h
+++ b/kernel-shared/ctree.h
@@ -368,6 +368,7 @@ struct btrfs_fs_info {
unsigned int allow_transid_mismatch:1;
unsigned int skip_leaf_item_checks:1;
unsigned int rebuilding_extent_tree:1;
+ unsigned int active_zone_tracking:1;
int transaction_aborted;
diff --git a/kernel-shared/zoned.c b/kernel-shared/zoned.c
index 319ee88d5b06..a97466635ecb 100644
--- a/kernel-shared/zoned.c
+++ b/kernel-shared/zoned.c
@@ -23,6 +23,7 @@
#include <stdlib.h>
#include <string.h>
#include "kernel-lib/list.h"
+#include "kernel-lib/bitmap.h"
#include "kernel-shared/volumes.h"
#include "kernel-shared/zoned.h"
#include "kernel-shared/accessors.h"
@@ -57,6 +58,16 @@ static u64 emulated_zone_size = DEFAULT_EMULATED_ZONE_SIZE;
#define BTRFS_MAX_ZONE_SIZE (8ULL * SZ_1G)
#define BTRFS_MIN_ZONE_SIZE (SZ_4M)
+/*
+ * Minimum of active zones we need:
+ *
+ * - BTRFS_SUPER_MIRROR_MAX zones for superblock mirrors
+ * - 3 zones to ensure at least one zone per SYSTEM, META and DATA block group
+ * - 1 zone for tree-log dedicated block group
+ * - 1 zone for relocation
+ */
+#define BTRFS_MIN_ACTIVE_ZONES (BTRFS_SUPER_MIRROR_MAX + 5)
+
static int btrfs_get_dev_zone_info(struct btrfs_device *device);
enum btrfs_zoned_model zoned_model(const char *file)
@@ -132,6 +143,18 @@ static u64 max_zone_append_size(const char *file)
return strtoull((const char *)chunk, NULL, 10);
}
+static unsigned int max_active_zone_count(const char *file)
+{
+ char buf[32];
+ int ret;
+
+ ret = device_get_queue_param(file, "max_active_zones", buf, sizeof(buf));
+ if (ret <= 0)
+ return 0;
+
+ return strtoul((const char *)buf, NULL, 10);
+}
+
#ifdef BTRFS_ZONED
/*
* Emulate blkdev_report_zones() for a non-zoned device. It slices up the block
@@ -273,7 +296,8 @@ static int report_zones(int fd, const char *file,
struct stat st;
struct blk_zone_report *rep;
struct blk_zone *zone;
- unsigned int i, n = 0;
+ unsigned int i, nreported = 0, nactive = 0;
+ unsigned int max_active_zones;
int ret;
/*
@@ -336,6 +360,20 @@ static int report_zones(int fd, const char *file,
exit(1);
}
+ zinfo->active_zones = bitmap_zalloc(zinfo->nr_zones);
+ if (!zinfo->active_zones) {
+ error_msg(ERROR_MSG_MEMORY, "active zone bitmap");
+ exit(1);
+ }
+
+ max_active_zones = max_active_zone_count(file);
+ if (max_active_zones && max_active_zones < BTRFS_MIN_ACTIVE_ZONES) {
+ error("zoned: %s: max active zones %u is too small, need at least %u active zones",
+ file, max_active_zones, BTRFS_MIN_ACTIVE_ZONES);
+ exit(1);
+ }
+ zinfo->max_active_zones = max_active_zones;
+
/* Allocate a zone report */
rep_size = sizeof(struct blk_zone_report) +
sizeof(struct blk_zone) * BTRFS_REPORT_NR_ZONES;
@@ -347,7 +385,7 @@ static int report_zones(int fd, const char *file,
/* Get zone information */
zone = (struct blk_zone *)(rep + 1);
- while (n < zinfo->nr_zones) {
+ while (nreported < zinfo->nr_zones) {
memset(rep, 0, rep_size);
rep->sector = sector;
rep->nr_zones = BTRFS_REPORT_NR_ZONES;
@@ -374,17 +412,36 @@ static int report_zones(int fd, const char *file,
break;
for (i = 0; i < rep->nr_zones; i++) {
- if (n >= zinfo->nr_zones)
+ if (nreported >= zinfo->nr_zones)
break;
- memcpy(&zinfo->zones[n], &zone[i],
+ memcpy(&zinfo->zones[nreported], &zone[i],
sizeof(struct blk_zone));
- n++;
+ switch (zone[i].cond) {
+ case BLK_ZONE_COND_EMPTY:
+ break;
+ case BLK_ZONE_COND_IMP_OPEN:
+ case BLK_ZONE_COND_EXP_OPEN:
+ case BLK_ZONE_COND_CLOSED:
+ set_bit(nreported, zinfo->active_zones);
+ nactive++;
+ break;
+ }
+ nreported++;
}
sector = zone[rep->nr_zones - 1].start +
zone[rep->nr_zones - 1].len;
}
+ if (max_active_zones) {
+ if (nactive > max_active_zones) {
+ error("zoned: %u active zones on %s exceeds max_active_zones %u",
+ nactive, file, max_active_zones);
+ exit(1);
+ }
+ zinfo->active_zones_left = max_active_zones - nactive;
+ }
+
kfree(rep);
return 0;
@@ -1080,6 +1137,7 @@ int btrfs_get_dev_zone_info_all_devices(struct btrfs_fs_info *fs_info)
static int btrfs_get_dev_zone_info(struct btrfs_device *device)
{
struct btrfs_fs_info *fs_info = device->fs_info;
+ int ret;
/*
* Cannot use btrfs_is_zoned here, since fs_info::zone_size might not
@@ -1091,7 +1149,14 @@ static int btrfs_get_dev_zone_info(struct btrfs_device *device)
if (device->zone_info)
return 0;
- return btrfs_get_zone_info(device->fd, device->name, &device->zone_info);
+ ret = btrfs_get_zone_info(device->fd, device->name, &device->zone_info);
+ if (ret)
+ return ret;
+
+ if (device->zone_info->max_active_zones)
+ fs_info->active_zone_tracking = 1;
+
+ return 0;
}
int btrfs_get_zone_info(int fd, const char *file,
diff --git a/kernel-shared/zoned.h b/kernel-shared/zoned.h
index c593571c4b69..d004ff16f198 100644
--- a/kernel-shared/zoned.h
+++ b/kernel-shared/zoned.h
@@ -72,7 +72,10 @@ struct btrfs_zoned_device_info {
enum btrfs_zoned_model model;
u64 zone_size;
u32 nr_zones;
+ unsigned int max_active_zones;
struct blk_zone *zones;
+ atomic_t active_zones_left;
+ unsigned long *active_zones;
bool emulated;
};
--
2.48.1
* [PATCH v2 05/12] btrfs-progs: zoned: activate block group on loading
2025-02-19 7:57 [PATCH v2 00/12] btrfs-progs: zoned: support zone capacity and Naohiro Aota
` (3 preceding siblings ...)
2025-02-19 7:57 ` [PATCH v2 04/12] btrfs-progs: zoned: load zone activeness Naohiro Aota
@ 2025-02-19 7:57 ` Naohiro Aota
2025-02-19 7:57 ` [PATCH v2 06/12] btrfs-progs: factor out btrfs_load_zone_info() Naohiro Aota
` (8 subsequent siblings)
13 siblings, 0 replies; 17+ messages in thread
From: Naohiro Aota @ 2025-02-19 7:57 UTC (permalink / raw)
To: linux-btrfs; +Cc: Naohiro Aota
Introduce "zone_is_active" member to struct btrfs_block_group and activate it
on loading a block group.
Note that the activeness check for extent allocation is currently not
implemented. That check would require activating a non-active block group on
extent allocation, which in turn requires finishing a zone when the active
zone limit is hit. Since mkfs should not hit the limit, implementing the
zone finishing code is not necessary at the moment.
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
kernel-shared/ctree.h | 1 +
kernel-shared/zoned.c | 15 +++++++++++++++
2 files changed, 16 insertions(+)
diff --git a/kernel-shared/ctree.h b/kernel-shared/ctree.h
index f10142df80eb..da0635d567dc 100644
--- a/kernel-shared/ctree.h
+++ b/kernel-shared/ctree.h
@@ -286,6 +286,7 @@ struct btrfs_block_group {
u64 alloc_offset;
u64 write_offset;
u64 zone_capacity;
+ bool zone_is_active;
u64 global_root_id;
};
diff --git a/kernel-shared/zoned.c b/kernel-shared/zoned.c
index a97466635ecb..ee6c4ee61e4a 100644
--- a/kernel-shared/zoned.c
+++ b/kernel-shared/zoned.c
@@ -901,6 +901,7 @@ int btrfs_load_block_group_zone_info(struct btrfs_fs_info *fs_info,
u64 logical = cache->start;
u64 length = cache->length;
struct zone_info *zone_info = NULL;
+ unsigned long *active = NULL;
int ret = 0;
int i;
u64 last_alloc = 0;
@@ -935,6 +936,13 @@ int btrfs_load_block_group_zone_info(struct btrfs_fs_info *fs_info,
return -ENOMEM;
}
+ active = bitmap_zalloc(map->num_stripes);
+ if (!active) {
+ free(zone_info);
+ error_msg(ERROR_MSG_MEMORY, "active bitmap");
+ return -ENOMEM;
+ }
+
for (i = 0; i < map->num_stripes; i++) {
struct zone_info *info = &zone_info[i];
bool is_sequential;
@@ -948,6 +956,10 @@ int btrfs_load_block_group_zone_info(struct btrfs_fs_info *fs_info,
continue;
}
+ /* Consider a zone as active if we can allow any number of active zones. */
+ if (!device->zone_info->max_active_zones)
+ set_bit(i, active);
+
is_sequential = btrfs_dev_is_sequential(device, info->physical);
if (!is_sequential) {
num_conventional++;
@@ -983,6 +995,7 @@ int btrfs_load_block_group_zone_info(struct btrfs_fs_info *fs_info,
default:
/* Partially used zone */
info->alloc_offset = ((zone.wp - zone.start) << SECTOR_SHIFT);
+ set_bit(i, active);
break;
}
}
@@ -1008,8 +1021,10 @@ int btrfs_load_block_group_zone_info(struct btrfs_fs_info *fs_info,
ret = -EINVAL;
goto out;
}
+ /* SINGLE profile case. */
cache->alloc_offset = zone_info[0].alloc_offset;
cache->zone_capacity = zone_info[0].capacity;
+ cache->zone_is_active = test_bit(0, active);
out:
/* An extent is allocated after the write pointer */
--
2.48.1
* [PATCH v2 06/12] btrfs-progs: factor out btrfs_load_zone_info()
2025-02-19 7:57 [PATCH v2 00/12] btrfs-progs: zoned: support zone capacity and Naohiro Aota
` (4 preceding siblings ...)
2025-02-19 7:57 ` [PATCH v2 05/12] btrfs-progs: zoned: activate block group on loading Naohiro Aota
@ 2025-02-19 7:57 ` Naohiro Aota
2025-02-19 7:57 ` [PATCH v2 07/12] btrfs-progs: zoned: factor out SINGLE zone info loading Naohiro Aota
` (7 subsequent siblings)
13 siblings, 0 replies; 17+ messages in thread
From: Naohiro Aota @ 2025-02-19 7:57 UTC (permalink / raw)
To: linux-btrfs; +Cc: Naohiro Aota
Now that we have zone capacity and (basic) zone activeness support, it's
time to factor out btrfs_load_zone_info() in the same way as the kernel
side.
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
kernel-shared/zoned.c | 124 ++++++++++++++++++++++++------------------
1 file changed, 71 insertions(+), 53 deletions(-)
diff --git a/kernel-shared/zoned.c b/kernel-shared/zoned.c
index ee6c4ee61e4a..4045cf0d2b98 100644
--- a/kernel-shared/zoned.c
+++ b/kernel-shared/zoned.c
@@ -891,10 +891,76 @@ struct zone_info {
u64 alloc_offset;
};
+static int btrfs_load_zone_info(struct btrfs_fs_info *fs_info, int zone_idx,
+ struct zone_info *info, unsigned long *active,
+ struct map_lookup *map)
+{
+ struct btrfs_device *device;
+ struct blk_zone zone;
+
+ info->physical = map->stripes[zone_idx].physical;
+
+ device = map->stripes[zone_idx].dev;
+
+ if (device->fd == -1) {
+ info->alloc_offset = WP_MISSING_DEV;
+ return 0;
+ }
+
+ /* Consider a zone as active if we can allow any number of active zones. */
+ if (!device->zone_info->max_active_zones)
+ set_bit(zone_idx, active);
+
+ if (!btrfs_dev_is_sequential(device, info->physical)) {
+ info->alloc_offset = WP_CONVENTIONAL;
+ info->capacity = device->zone_info->zone_size;
+ return 0;
+ }
+
+ /*
+ * The group is mapped to a sequential zone. Get the zone write
+ * pointer to determine the allocation offset within the zone.
+ */
+ WARN_ON(!IS_ALIGNED(info->physical, fs_info->zone_size));
+ zone = device->zone_info->zones[info->physical / fs_info->zone_size];
+
+ if (zone.type == BLK_ZONE_TYPE_CONVENTIONAL) {
+ error("zoned: unexpected conventional zone %llu on device %s (devid %llu)",
+ zone.start << SECTOR_SHIFT, device->name,
+ device->devid);
+ return -EIO;
+ }
+
+ info->capacity = (zone.capacity << SECTOR_SHIFT);
+
+ switch (zone.cond) {
+ case BLK_ZONE_COND_OFFLINE:
+ case BLK_ZONE_COND_READONLY:
+ error(
+ "zoned: offline/readonly zone %llu on device %s (devid %llu)",
+ info->physical / fs_info->zone_size, device->name,
+ device->devid);
+ info->alloc_offset = WP_MISSING_DEV;
+ break;
+ case BLK_ZONE_COND_EMPTY:
+ info->alloc_offset = 0;
+ break;
+ case BLK_ZONE_COND_FULL:
+ info->alloc_offset = fs_info->zone_size;
+ break;
+ default:
+ /* Partially used zone */
+ info->alloc_offset = ((zone.wp - zone.start) << SECTOR_SHIFT);
+ set_bit(zone_idx, active);
+ break;
+ }
+
+ return 0;
+}
+
int btrfs_load_block_group_zone_info(struct btrfs_fs_info *fs_info,
struct btrfs_block_group *cache)
{
- struct btrfs_device *device;
struct btrfs_mapping_tree *map_tree = &fs_info->mapping_tree;
struct cache_extent *ce;
struct map_lookup *map;
@@ -944,60 +1010,12 @@ int btrfs_load_block_group_zone_info(struct btrfs_fs_info *fs_info,
}
for (i = 0; i < map->num_stripes; i++) {
- struct zone_info *info = &zone_info[i];
- bool is_sequential;
- struct blk_zone zone;
-
- device = map->stripes[i].dev;
- info->physical = map->stripes[i].physical;
-
- if (device->fd == -1) {
- info->alloc_offset = WP_MISSING_DEV;
- continue;
- }
-
- /* Consider a zone as active if we can allow any number of active zones. */
- if (!device->zone_info->max_active_zones)
- set_bit(i, active);
+ ret = btrfs_load_zone_info(fs_info, i, &zone_info[i], active, map);
+ if (ret)
+ goto out;
- is_sequential = btrfs_dev_is_sequential(device, info->physical);
- if (!is_sequential) {
+ if (zone_info[i].alloc_offset == WP_CONVENTIONAL)
num_conventional++;
- info->alloc_offset = WP_CONVENTIONAL;
- info->capacity = device->zone_info->zone_size;
- continue;
- }
-
- /*
- * The group is mapped to a sequential zone. Get the zone write
- * pointer to determine the allocation offset within the zone.
- */
- WARN_ON(!IS_ALIGNED(info->physical, fs_info->zone_size));
- zone = device->zone_info->zones[info->physical / fs_info->zone_size];
-
- info->capacity = (zone.capacity << SECTOR_SHIFT);
-
- switch (zone.cond) {
- case BLK_ZONE_COND_OFFLINE:
- case BLK_ZONE_COND_READONLY:
- error(
- "zoned: offline/readonly zone %llu on device %s (devid %llu)",
- info->physical / fs_info->zone_size, device->name,
- device->devid);
- info->alloc_offset = WP_MISSING_DEV;
- break;
- case BLK_ZONE_COND_EMPTY:
- info->alloc_offset = 0;
- break;
- case BLK_ZONE_COND_FULL:
- info->alloc_offset = fs_info->zone_size;
- break;
- default:
- /* Partially used zone */
- info->alloc_offset = ((zone.wp - zone.start) << SECTOR_SHIFT);
- set_bit(i, active);
- break;
- }
}
if (num_conventional > 0) {
--
2.48.1
* [PATCH v2 07/12] btrfs-progs: zoned: factor out SINGLE zone info loading
2025-02-19 7:57 [PATCH v2 00/12] btrfs-progs: zoned: support zone capacity and Naohiro Aota
` (5 preceding siblings ...)
2025-02-19 7:57 ` [PATCH v2 06/12] btrfs-progs: factor out btrfs_load_zone_info() Naohiro Aota
@ 2025-02-19 7:57 ` Naohiro Aota
2025-02-19 7:57 ` [PATCH v2 08/12] btrfs-progs: zoned: implement DUP " Naohiro Aota
` (6 subsequent siblings)
13 siblings, 0 replies; 17+ messages in thread
From: Naohiro Aota @ 2025-02-19 7:57 UTC (permalink / raw)
To: linux-btrfs; +Cc: Naohiro Aota
Currently, the userland tool only considers the SINGLE profile, which makes
it fail when a DUP block group is created over one conventional zone and one
sequential-write-required zone.
Before adding support for the other profiles, factor out the per-profile
code (currently, SINGLE only) into functions, just as on the kernel side.
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
kernel-shared/zoned.c | 47 +++++++++++++++++++++++++++++++++++++++----
1 file changed, 43 insertions(+), 4 deletions(-)
diff --git a/kernel-shared/zoned.c b/kernel-shared/zoned.c
index 4045cf0d2b98..e3240714b415 100644
--- a/kernel-shared/zoned.c
+++ b/kernel-shared/zoned.c
@@ -958,6 +958,26 @@ static int btrfs_load_zone_info(struct btrfs_fs_info *fs_info, int zone_idx,
return 0;
}
+static int btrfs_load_block_group_single(struct btrfs_fs_info *fs_info,
+ struct btrfs_block_group *bg,
+ struct zone_info *info,
+ unsigned long *active)
+{
+ if (info->alloc_offset == WP_MISSING_DEV) {
+ btrfs_err(fs_info,
+ "zoned: cannot recover write pointer for zone %llu",
+ info->physical);
+ return -EIO;
+ }
+
+ bg->alloc_offset = info->alloc_offset;
+ bg->zone_capacity = info->capacity;
+ if (test_bit(0, active))
+ bg->zone_is_active = 1;
+ return 0;
+}
+
+
int btrfs_load_block_group_zone_info(struct btrfs_fs_info *fs_info,
struct btrfs_block_group *cache)
{
@@ -972,6 +992,7 @@ int btrfs_load_block_group_zone_info(struct btrfs_fs_info *fs_info,
int i;
u64 last_alloc = 0;
u32 num_conventional = 0;
+ u64 profile;
if (!btrfs_is_zoned(fs_info))
return 0;
@@ -1039,10 +1060,28 @@ int btrfs_load_block_group_zone_info(struct btrfs_fs_info *fs_info,
ret = -EINVAL;
goto out;
}
- /* SINGLE profile case. */
- cache->alloc_offset = zone_info[0].alloc_offset;
- cache->zone_capacity = zone_info[0].capacity;
- cache->zone_is_active = test_bit(0, active);
+
+ profile = map->type & BTRFS_BLOCK_GROUP_PROFILE_MASK;
+ switch (profile) {
+ case 0: /* single */
+ ret = btrfs_load_block_group_single(fs_info, cache, &zone_info[0], active);
+ break;
+ case BTRFS_BLOCK_GROUP_DUP:
+ case BTRFS_BLOCK_GROUP_RAID1:
+ case BTRFS_BLOCK_GROUP_RAID1C3:
+ case BTRFS_BLOCK_GROUP_RAID1C4:
+ case BTRFS_BLOCK_GROUP_RAID0:
+ case BTRFS_BLOCK_GROUP_RAID10:
+ /* Temporarily fail these cases, until the following commits. */
+ fallthrough;
+ case BTRFS_BLOCK_GROUP_RAID5:
+ case BTRFS_BLOCK_GROUP_RAID6:
+ default:
+ error("zoned: profile %s not yet supported",
+ btrfs_bg_type_to_raid_name(map->type));
+ ret = -EINVAL;
+ goto out;
+ }
out:
/* An extent is allocated after the write pointer */
--
2.48.1
* [PATCH v2 08/12] btrfs-progs: zoned: implement DUP zone info loading
2025-02-19 7:57 [PATCH v2 00/12] btrfs-progs: zoned: support zone capacity and Naohiro Aota
` (6 preceding siblings ...)
2025-02-19 7:57 ` [PATCH v2 07/12] btrfs-progs: zoned: factor out SINGLE zone info loading Naohiro Aota
@ 2025-02-19 7:57 ` Naohiro Aota
2025-02-19 7:57 ` [PATCH v2 09/12] btrfs-progs: zoned: implement RAID1 " Naohiro Aota
` (5 subsequent siblings)
13 siblings, 0 replies; 17+ messages in thread
From: Naohiro Aota @ 2025-02-19 7:57 UTC (permalink / raw)
To: linux-btrfs; +Cc: Naohiro Aota
Add DUP support in the same way as the kernel side.
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
kernel-shared/zoned.c | 42 ++++++++++++++++++++++++++++++++++++++++++
1 file changed, 42 insertions(+)
diff --git a/kernel-shared/zoned.c b/kernel-shared/zoned.c
index e3240714b415..f0a44587679b 100644
--- a/kernel-shared/zoned.c
+++ b/kernel-shared/zoned.c
@@ -977,6 +977,46 @@ static int btrfs_load_block_group_single(struct btrfs_fs_info *fs_info,
return 0;
}
+static int btrfs_load_block_group_dup(struct btrfs_fs_info *fs_info,
+ struct btrfs_block_group *bg,
+ struct map_lookup *map,
+ struct zone_info *zone_info,
+ unsigned long *active)
+{
+ if ((map->type & BTRFS_BLOCK_GROUP_DATA) && !fs_info->stripe_root) {
+ btrfs_err(fs_info, "zoned: data DUP profile needs raid-stripe-tree");
+ return -EINVAL;
+ }
+
+ bg->zone_capacity = min_not_zero(zone_info[0].capacity, zone_info[1].capacity);
+
+ if (zone_info[0].alloc_offset == WP_MISSING_DEV) {
+ btrfs_err(fs_info,
+ "zoned: cannot recover write pointer for zone %llu",
+ zone_info[0].physical);
+ return -EIO;
+ }
+ if (zone_info[1].alloc_offset == WP_MISSING_DEV) {
+ btrfs_err(fs_info,
+ "zoned: cannot recover write pointer for zone %llu",
+ zone_info[1].physical);
+ return -EIO;
+ }
+ if (zone_info[0].alloc_offset != zone_info[1].alloc_offset) {
+ btrfs_err(fs_info,
+ "zoned: write pointer offset mismatch of zones in DUP profile");
+ return -EIO;
+ }
+
+ if (test_bit(0, active) != test_bit(1, active)) {
+ return -EIO;
+ } else if (test_bit(0, active)) {
+ bg->zone_is_active = 1;
+ }
+
+ bg->alloc_offset = zone_info[0].alloc_offset;
+ return 0;
+}
int btrfs_load_block_group_zone_info(struct btrfs_fs_info *fs_info,
struct btrfs_block_group *cache)
@@ -1067,6 +1107,8 @@ int btrfs_load_block_group_zone_info(struct btrfs_fs_info *fs_info,
ret = btrfs_load_block_group_single(fs_info, cache, &zone_info[0], active);
break;
case BTRFS_BLOCK_GROUP_DUP:
+ ret = btrfs_load_block_group_dup(fs_info, cache, map, zone_info, active);
+ break;
case BTRFS_BLOCK_GROUP_RAID1:
case BTRFS_BLOCK_GROUP_RAID1C3:
case BTRFS_BLOCK_GROUP_RAID1C4:
--
2.48.1
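As a review aid, the constraints the DUP loader enforces can be sketched in Python. This is an illustrative model, not the actual implementation: `zone_info` is modeled as a list of dicts and `WP_MISSING_DEV` as a sentinel object, standing in for the C structures and constants.

```python
WP_MISSING_DEV = object()  # illustrative stand-in for the C sentinel value

def dup_load(zone_info):
    """Sketch of btrfs_load_block_group_dup(): both mirrored zones must
    expose a recoverable and identical write pointer."""
    # min_not_zero() behavior: a missing device reports capacity 0 and
    # must not shrink the block-group capacity to zero.
    caps = [z["capacity"] for z in zone_info[:2] if z["capacity"] != 0]
    zone_capacity = min(caps) if caps else 0

    for z in zone_info[:2]:
        if z["alloc_offset"] is WP_MISSING_DEV:
            raise OSError("zoned: cannot recover write pointer")
    if zone_info[0]["alloc_offset"] != zone_info[1]["alloc_offset"]:
        raise OSError("zoned: write pointer offset mismatch in DUP profile")
    return zone_info[0]["alloc_offset"], zone_capacity
```

For example, a pair with capacities 64 and 0 and matching write pointers yields a usable block group with capacity 64: the zero capacity of a missing device is skipped, while mismatched write pointers are rejected outright.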
* [PATCH v2 09/12] btrfs-progs: zoned: implement RAID1 zone info loading
2025-02-19 7:57 [PATCH v2 00/12] btrfs-progs: zoned: support zone capacity and Naohiro Aota
` (7 preceding siblings ...)
2025-02-19 7:57 ` [PATCH v2 08/12] btrfs-progs: zoned: implement DUP " Naohiro Aota
@ 2025-02-19 7:57 ` Naohiro Aota
2025-02-19 7:57 ` [PATCH v2 10/12] btrfs-progs: zoned: implement RAID0 " Naohiro Aota
` (4 subsequent siblings)
13 siblings, 0 replies; 17+ messages in thread
From: Naohiro Aota @ 2025-02-19 7:57 UTC (permalink / raw)
To: linux-btrfs; +Cc: Naohiro Aota
Implement it just like on the kernel side.
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
kernel-shared/zoned.c | 46 +++++++++++++++++++++++++++++++++++++++++++
1 file changed, 46 insertions(+)
diff --git a/kernel-shared/zoned.c b/kernel-shared/zoned.c
index f0a44587679b..e3ee1dc941dc 100644
--- a/kernel-shared/zoned.c
+++ b/kernel-shared/zoned.c
@@ -1018,6 +1018,50 @@ static int btrfs_load_block_group_dup(struct btrfs_fs_info *fs_info,
return 0;
}
+static int btrfs_load_block_group_raid1(struct btrfs_fs_info *fs_info,
+ struct btrfs_block_group *bg,
+ struct map_lookup *map,
+ struct zone_info *zone_info,
+ unsigned long *active)
+{
+ int i;
+
+ if ((map->type & BTRFS_BLOCK_GROUP_DATA) && !fs_info->stripe_root) {
+ btrfs_err(fs_info, "zoned: data %s needs raid-stripe-tree",
+ btrfs_bg_type_to_raid_name(map->type));
+ return -EINVAL;
+ }
+
+ /* In case a device is missing we have a cap of 0, so don't use it. */
+ bg->zone_capacity = min_not_zero(zone_info[0].capacity, zone_info[1].capacity);
+
+ for (i = 0; i < map->num_stripes; i++) {
+ if (zone_info[i].alloc_offset == WP_MISSING_DEV ||
+ zone_info[i].alloc_offset == WP_CONVENTIONAL)
+ continue;
+
+ if (zone_info[0].alloc_offset != zone_info[i].alloc_offset) {
+ btrfs_err(fs_info,
+ "zoned: write pointer offset mismatch of zones in %s profile",
+ btrfs_bg_type_to_raid_name(map->type));
+ return -EIO;
+ }
+ if (test_bit(0, active) != test_bit(i, active)) {
+ return -EIO;
+ } else {
+ if (test_bit(0, active))
+ bg->zone_is_active = 1;
+ }
+ }
+
+ if (zone_info[0].alloc_offset != WP_MISSING_DEV)
+ bg->alloc_offset = zone_info[0].alloc_offset;
+ else
+ bg->alloc_offset = zone_info[i - 1].alloc_offset;
+
+ return 0;
+}
+
int btrfs_load_block_group_zone_info(struct btrfs_fs_info *fs_info,
struct btrfs_block_group *cache)
{
@@ -1112,6 +1156,8 @@ int btrfs_load_block_group_zone_info(struct btrfs_fs_info *fs_info,
case BTRFS_BLOCK_GROUP_RAID1:
case BTRFS_BLOCK_GROUP_RAID1C3:
case BTRFS_BLOCK_GROUP_RAID1C4:
+ ret = btrfs_load_block_group_raid1(fs_info, cache, map, zone_info, active);
+ break;
case BTRFS_BLOCK_GROUP_RAID0:
case BTRFS_BLOCK_GROUP_RAID10:
/* Temporarily fail these cases, until following commits. */
--
2.48.1
* [PATCH v2 10/12] btrfs-progs: zoned: implement RAID0 zone info loading
2025-02-19 7:57 [PATCH v2 00/12] btrfs-progs: zoned: support zone capacity and Naohiro Aota
` (8 preceding siblings ...)
2025-02-19 7:57 ` [PATCH v2 09/12] btrfs-progs: zoned: implement RAID1 " Naohiro Aota
@ 2025-02-19 7:57 ` Naohiro Aota
2025-02-19 7:57 ` [PATCH v2 11/12] btrfs-progs: implement RAID10 " Naohiro Aota
` (3 subsequent siblings)
13 siblings, 0 replies; 17+ messages in thread
From: Naohiro Aota @ 2025-02-19 7:57 UTC (permalink / raw)
To: linux-btrfs; +Cc: Naohiro Aota
Implement it just like on the kernel side.
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
kernel-shared/zoned.c | 32 ++++++++++++++++++++++++++++++++
1 file changed, 32 insertions(+)
diff --git a/kernel-shared/zoned.c b/kernel-shared/zoned.c
index e3ee1dc941dc..10e59b837efd 100644
--- a/kernel-shared/zoned.c
+++ b/kernel-shared/zoned.c
@@ -1062,6 +1062,36 @@ static int btrfs_load_block_group_raid1(struct btrfs_fs_info *fs_info,
return 0;
}
+static int btrfs_load_block_group_raid0(struct btrfs_fs_info *fs_info,
+ struct btrfs_block_group *bg,
+ struct map_lookup *map,
+ struct zone_info *zone_info,
+ unsigned long *active)
+{
+ if ((map->type & BTRFS_BLOCK_GROUP_DATA) && !fs_info->stripe_root) {
+ btrfs_err(fs_info, "zoned: data %s needs raid-stripe-tree",
+ btrfs_bg_type_to_raid_name(map->type));
+ return -EINVAL;
+ }
+
+ for (int i = 0; i < map->num_stripes; i++) {
+ if (zone_info[i].alloc_offset == WP_MISSING_DEV ||
+ zone_info[i].alloc_offset == WP_CONVENTIONAL)
+ continue;
+
+ if (test_bit(0, active) != test_bit(i, active)) {
+ return -EIO;
+ } else {
+ if (test_bit(0, active))
+ bg->zone_is_active = 1;
+ }
+ bg->zone_capacity += zone_info[i].capacity;
+ bg->alloc_offset += zone_info[i].alloc_offset;
+ }
+
+ return 0;
+}
+
int btrfs_load_block_group_zone_info(struct btrfs_fs_info *fs_info,
struct btrfs_block_group *cache)
{
@@ -1159,6 +1189,8 @@ int btrfs_load_block_group_zone_info(struct btrfs_fs_info *fs_info,
ret = btrfs_load_block_group_raid1(fs_info, cache, map, zone_info, active);
break;
case BTRFS_BLOCK_GROUP_RAID0:
+ ret = btrfs_load_block_group_raid0(fs_info, cache, map, zone_info, active);
+ break;
case BTRFS_BLOCK_GROUP_RAID10:
/* Temporarily fail these cases, until following commits. */
fallthrough;
--
2.48.1
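The RAID0 aggregation in the patch reduces to a sum over the usable stripes. A minimal Python sketch, with dict-based `zone_info` entries as illustrative stand-ins for the C structs (an `"unusable"` flag models the `WP_MISSING_DEV`/`WP_CONVENTIONAL` skip):

```python
def raid0_load(zone_info):
    """Sketch of btrfs_load_block_group_raid0(): the block group's
    capacity and allocation offset are summed across all usable
    stripes, since RAID0 spreads data over every zone."""
    zone_capacity = 0
    alloc_offset = 0
    for z in zone_info:
        if z.get("unusable"):  # WP_MISSING_DEV / WP_CONVENTIONAL in the C code
            continue
        zone_capacity += z["capacity"]
        alloc_offset += z["alloc_offset"]
    return alloc_offset, zone_capacity
```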
* [PATCH v2 11/12] btrfs-progs: implement RAID10 zone info loading
2025-02-19 7:57 [PATCH v2 00/12] btrfs-progs: zoned: support zone capacity and Naohiro Aota
` (9 preceding siblings ...)
2025-02-19 7:57 ` [PATCH v2 10/12] btrfs-progs: zoned: implement RAID0 " Naohiro Aota
@ 2025-02-19 7:57 ` Naohiro Aota
2025-02-19 7:57 ` [PATCH v2 12/12] btrfs-progs: zoned: fix alloc_offset calculation for partly conventional block groups Naohiro Aota
` (2 subsequent siblings)
13 siblings, 0 replies; 17+ messages in thread
From: Naohiro Aota @ 2025-02-19 7:57 UTC (permalink / raw)
To: linux-btrfs; +Cc: Naohiro Aota
Just the same as on the kernel side.
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
kernel-shared/zoned.c | 37 +++++++++++++++++++++++++++++++++++--
1 file changed, 35 insertions(+), 2 deletions(-)
diff --git a/kernel-shared/zoned.c b/kernel-shared/zoned.c
index 10e59b837efd..484bade1d2ed 100644
--- a/kernel-shared/zoned.c
+++ b/kernel-shared/zoned.c
@@ -1092,6 +1092,39 @@ static int btrfs_load_block_group_raid0(struct btrfs_fs_info *fs_info,
return 0;
}
+static int btrfs_load_block_group_raid10(struct btrfs_fs_info *fs_info,
+ struct btrfs_block_group *bg,
+ struct map_lookup *map,
+ struct zone_info *zone_info,
+ unsigned long *active)
+{
+ if ((map->type & BTRFS_BLOCK_GROUP_DATA) && !fs_info->stripe_root) {
+ btrfs_err(fs_info, "zoned: data %s needs raid-stripe-tree",
+ btrfs_bg_type_to_raid_name(map->type));
+ return -EINVAL;
+ }
+
+ for (int i = 0; i < map->num_stripes; i++) {
+ if (zone_info[i].alloc_offset == WP_MISSING_DEV ||
+ zone_info[i].alloc_offset == WP_CONVENTIONAL)
+ continue;
+
+ if (test_bit(0, active) != test_bit(i, active)) {
+ return -EIO;
+ } else {
+ if (test_bit(0, active))
+ bg->zone_is_active = 1;
+ }
+
+ if ((i % map->sub_stripes) == 0) {
+ bg->zone_capacity += zone_info[i].capacity;
+ bg->alloc_offset += zone_info[i].alloc_offset;
+ }
+ }
+
+ return 0;
+}
+
int btrfs_load_block_group_zone_info(struct btrfs_fs_info *fs_info,
struct btrfs_block_group *cache)
{
@@ -1192,8 +1225,8 @@ int btrfs_load_block_group_zone_info(struct btrfs_fs_info *fs_info,
ret = btrfs_load_block_group_raid0(fs_info, cache, map, zone_info, active);
break;
case BTRFS_BLOCK_GROUP_RAID10:
- /* Temporarily fail these cases, until following commits. */
- fallthrough;
+ ret = btrfs_load_block_group_raid10(fs_info, cache, map, zone_info, active);
+ break;
case BTRFS_BLOCK_GROUP_RAID5:
case BTRFS_BLOCK_GROUP_RAID6:
default:
--
2.48.1
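RAID10 differs from RAID0 only in that zones within one sub-stripe group mirror each other, so only one zone per group contributes. A hedged Python sketch of that accounting, again with illustrative dict-based `zone_info` entries:

```python
def raid10_load(zone_info, sub_stripes):
    """Sketch of btrfs_load_block_group_raid10(): zones inside one
    sub-stripe group hold identical data, so only the first zone of
    each group adds to the capacity and allocation offset."""
    zone_capacity = 0
    alloc_offset = 0
    for i, z in enumerate(zone_info):
        if z.get("unusable"):  # WP_MISSING_DEV / WP_CONVENTIONAL in the C code
            continue
        if i % sub_stripes == 0:  # first zone of its mirror group
            zone_capacity += z["capacity"]
            alloc_offset += z["alloc_offset"]
    return alloc_offset, zone_capacity
```

With four zones and `sub_stripes = 2`, only zones 0 and 2 are counted, matching the `(i % map->sub_stripes) == 0` test in the patch.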
* [PATCH v2 12/12] btrfs-progs: zoned: fix alloc_offset calculation for partly conventional block groups
2025-02-19 7:57 [PATCH v2 00/12] btrfs-progs: zoned: support zone capacity and Naohiro Aota
` (10 preceding siblings ...)
2025-02-19 7:57 ` [PATCH v2 11/12] btrfs-progs: implement RAID10 " Naohiro Aota
@ 2025-02-19 7:57 ` Naohiro Aota
2025-02-19 16:58 ` Johannes Thumshirn
2025-02-19 16:58 ` [PATCH v2 00/12] btrfs-progs: zoned: support zone capacity and Johannes Thumshirn
2025-03-05 13:20 ` David Sterba
13 siblings, 1 reply; 17+ messages in thread
From: Naohiro Aota @ 2025-02-19 7:57 UTC (permalink / raw)
To: linux-btrfs; +Cc: Naohiro Aota
When one of the two zones composing a DUP block group is a conventional zone,
zone_info[i]->alloc_offset is set to WP_CONVENTIONAL. That will, of course,
not match the write pointer of the other zone, and loading that block group
fails.
This commit solves the issue by properly recovering the emulated write pointer
from the last allocated extent. The offset for SINGLE, DUP, and RAID1 is
straightforward: it is the same as the end of the last allocated extent. RAID0
and RAID10 are a bit trickier because we need to do the striping math.
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
kernel-shared/zoned.c | 65 +++++++++++++++++++++++++++++++++----------
1 file changed, 51 insertions(+), 14 deletions(-)
diff --git a/kernel-shared/zoned.c b/kernel-shared/zoned.c
index 484bade1d2ed..d96311af70b2 100644
--- a/kernel-shared/zoned.c
+++ b/kernel-shared/zoned.c
@@ -981,7 +981,7 @@ static int btrfs_load_block_group_dup(struct btrfs_fs_info *fs_info,
struct btrfs_block_group *bg,
struct map_lookup *map,
struct zone_info *zone_info,
- unsigned long *active)
+ unsigned long *active, u64 last_alloc)
{
if ((map->type & BTRFS_BLOCK_GROUP_DATA) && !fs_info->stripe_root) {
btrfs_err(fs_info, "zoned: data DUP profile needs raid-stripe-tree");
@@ -1002,6 +1002,12 @@ static int btrfs_load_block_group_dup(struct btrfs_fs_info *fs_info,
zone_info[1].physical);
return -EIO;
}
+
+ if (zone_info[0].alloc_offset == WP_CONVENTIONAL)
+ zone_info[0].alloc_offset = last_alloc;
+ if (zone_info[1].alloc_offset == WP_CONVENTIONAL)
+ zone_info[1].alloc_offset = last_alloc;
+
if (zone_info[0].alloc_offset != zone_info[1].alloc_offset) {
btrfs_err(fs_info,
"zoned: write pointer offset mismatch of zones in DUP profile");
@@ -1022,7 +1028,7 @@ static int btrfs_load_block_group_raid1(struct btrfs_fs_info *fs_info,
struct btrfs_block_group *bg,
struct map_lookup *map,
struct zone_info *zone_info,
- unsigned long *active)
+ unsigned long *active, u64 last_alloc)
{
int i;
@@ -1036,9 +1042,10 @@ static int btrfs_load_block_group_raid1(struct btrfs_fs_info *fs_info,
bg->zone_capacity = min_not_zero(zone_info[0].capacity, zone_info[1].capacity);
for (i = 0; i < map->num_stripes; i++) {
- if (zone_info[i].alloc_offset == WP_MISSING_DEV ||
- zone_info[i].alloc_offset == WP_CONVENTIONAL)
+ if (zone_info[i].alloc_offset == WP_MISSING_DEV)
continue;
+ if (zone_info[i].alloc_offset == WP_CONVENTIONAL)
+ zone_info[i].alloc_offset = last_alloc;
if (zone_info[0].alloc_offset != zone_info[i].alloc_offset) {
btrfs_err(fs_info,
@@ -1066,7 +1073,7 @@ static int btrfs_load_block_group_raid0(struct btrfs_fs_info *fs_info,
struct btrfs_block_group *bg,
struct map_lookup *map,
struct zone_info *zone_info,
- unsigned long *active)
+ unsigned long *active, u64 last_alloc)
{
if ((map->type & BTRFS_BLOCK_GROUP_DATA) && !fs_info->stripe_root) {
btrfs_err(fs_info, "zoned: data %s needs raid-stripe-tree",
@@ -1075,9 +1082,24 @@ static int btrfs_load_block_group_raid0(struct btrfs_fs_info *fs_info,
}
for (int i = 0; i < map->num_stripes; i++) {
- if (zone_info[i].alloc_offset == WP_MISSING_DEV ||
- zone_info[i].alloc_offset == WP_CONVENTIONAL)
+ if (zone_info[i].alloc_offset == WP_MISSING_DEV)
continue;
+ if (zone_info[i].alloc_offset == WP_CONVENTIONAL) {
+ u64 stripe_nr, full_stripe_nr;
+ u64 stripe_offset;
+ int stripe_index;
+
+ stripe_nr = last_alloc / map->stripe_len;
+ stripe_offset = stripe_nr * map->stripe_len;
+ full_stripe_nr = stripe_nr / map->num_stripes;
+ stripe_index = stripe_nr % map->num_stripes;
+
+ zone_info[i].alloc_offset = full_stripe_nr * map->stripe_len;
+ if (stripe_index > i)
+ zone_info[i].alloc_offset += map->stripe_len;
+ else if (stripe_index == i)
+ zone_info[i].alloc_offset += (last_alloc - stripe_offset);
+ }
if (test_bit(0, active) != test_bit(i, active)) {
return -EIO;
@@ -1096,7 +1118,7 @@ static int btrfs_load_block_group_raid10(struct btrfs_fs_info *fs_info,
struct btrfs_block_group *bg,
struct map_lookup *map,
struct zone_info *zone_info,
- unsigned long *active)
+ unsigned long *active, u64 last_alloc)
{
if ((map->type & BTRFS_BLOCK_GROUP_DATA) && !fs_info->stripe_root) {
btrfs_err(fs_info, "zoned: data %s needs raid-stripe-tree",
@@ -1105,9 +1127,24 @@ static int btrfs_load_block_group_raid10(struct btrfs_fs_info *fs_info,
}
for (int i = 0; i < map->num_stripes; i++) {
- if (zone_info[i].alloc_offset == WP_MISSING_DEV ||
- zone_info[i].alloc_offset == WP_CONVENTIONAL)
+ if (zone_info[i].alloc_offset == WP_MISSING_DEV)
continue;
+ if (zone_info[i].alloc_offset == WP_CONVENTIONAL) {
+ u64 stripe_nr, full_stripe_nr;
+ u64 stripe_offset;
+ int stripe_index;
+
+ stripe_nr = last_alloc / map->stripe_len;
+ stripe_offset = stripe_nr * map->stripe_len;
+ full_stripe_nr = stripe_nr / (map->num_stripes / map->sub_stripes);
+ stripe_index = stripe_nr % (map->num_stripes / map->sub_stripes);
+
+ zone_info[i].alloc_offset = full_stripe_nr * map->stripe_len;
+ if (stripe_index > (i / map->sub_stripes))
+ zone_info[i].alloc_offset += map->stripe_len;
+ else if (stripe_index == (i / map->sub_stripes))
+ zone_info[i].alloc_offset += (last_alloc - stripe_offset);
+ }
if (test_bit(0, active) != test_bit(i, active)) {
return -EIO;
@@ -1214,18 +1251,18 @@ int btrfs_load_block_group_zone_info(struct btrfs_fs_info *fs_info,
ret = btrfs_load_block_group_single(fs_info, cache, &zone_info[0], active);
break;
case BTRFS_BLOCK_GROUP_DUP:
- ret = btrfs_load_block_group_dup(fs_info, cache, map, zone_info, active);
+ ret = btrfs_load_block_group_dup(fs_info, cache, map, zone_info, active, last_alloc);
break;
case BTRFS_BLOCK_GROUP_RAID1:
case BTRFS_BLOCK_GROUP_RAID1C3:
case BTRFS_BLOCK_GROUP_RAID1C4:
- ret = btrfs_load_block_group_raid1(fs_info, cache, map, zone_info, active);
+ ret = btrfs_load_block_group_raid1(fs_info, cache, map, zone_info, active, last_alloc);
break;
case BTRFS_BLOCK_GROUP_RAID0:
- ret = btrfs_load_block_group_raid0(fs_info, cache, map, zone_info, active);
+ ret = btrfs_load_block_group_raid0(fs_info, cache, map, zone_info, active, last_alloc);
break;
case BTRFS_BLOCK_GROUP_RAID10:
- ret = btrfs_load_block_group_raid10(fs_info, cache, map, zone_info, active);
+ ret = btrfs_load_block_group_raid10(fs_info, cache, map, zone_info, active, last_alloc);
break;
case BTRFS_BLOCK_GROUP_RAID5:
case BTRFS_BLOCK_GROUP_RAID6:
--
2.48.1
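The RAID0 write-pointer recovery math this patch adds can be checked with a short Python sketch. Parameter names follow the patch; the function itself is illustrative, not part of btrfs-progs:

```python
def raid0_recover_offsets(last_alloc, stripe_len, num_stripes):
    """Emulate the per-zone write pointers of a RAID0 block group from
    the logical offset of the last allocated extent (last_alloc),
    mirroring the math in btrfs_load_block_group_raid0()."""
    stripe_nr = last_alloc // stripe_len
    stripe_offset = stripe_nr * stripe_len   # logical start of the current stripe
    full_stripe_nr = stripe_nr // num_stripes
    stripe_index = stripe_nr % num_stripes
    offsets = []
    for i in range(num_stripes):
        off = full_stripe_nr * stripe_len
        if stripe_index > i:       # this zone's stripe in the last row is full
            off += stripe_len
        elif stripe_index == i:    # partially written stripe
            off += last_alloc - stripe_offset
        offsets.append(off)
    return offsets
```

A useful invariant to test against: summing the recovered per-zone offsets gives back last_alloc, since RAID0 maps every logical byte to exactly one stripe. The RAID10 variant in the patch is the same math applied to sub-stripe groups, dividing stripe_nr by num_stripes / sub_stripes instead of num_stripes.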
* Re: [PATCH v2 12/12] btrfs-progs: zoned: fix alloc_offset calculation for partly conventional block groups
2025-02-19 7:57 ` [PATCH v2 12/12] btrfs-progs: zoned: fix alloc_offset calculation for partly conventional block groups Naohiro Aota
@ 2025-02-19 16:58 ` Johannes Thumshirn
2025-02-20 5:06 ` Naohiro Aota
0 siblings, 1 reply; 17+ messages in thread
From: Johannes Thumshirn @ 2025-02-19 16:58 UTC (permalink / raw)
To: Naohiro Aota, linux-btrfs@vger.kernel.org
On 19.02.25 09:00, Naohiro Aota wrote:
> When one of the two zones composing a DUP block group is a conventional zone,
> zone_info[i]->alloc_offset is set to WP_CONVENTIONAL. That will, of course,
> not match the write pointer of the other zone, and loading that block group
> fails.
>
> This commit solves the issue by properly recovering the emulated write pointer
> from the last allocated extent. The offset for SINGLE, DUP, and RAID1 is
> straightforward: it is the same as the end of the last allocated extent. RAID0
> and RAID10 are a bit trickier because we need to do the striping math.
I wonder if we need this patch on the kernel side as well. Can happen
there too.
* Re: [PATCH v2 00/12] btrfs-progs: zoned: support zone capacity and
2025-02-19 7:57 [PATCH v2 00/12] btrfs-progs: zoned: support zone capacity and Naohiro Aota
` (11 preceding siblings ...)
2025-02-19 7:57 ` [PATCH v2 12/12] btrfs-progs: zoned: fix alloc_offset calculation for partly conventional block groups Naohiro Aota
@ 2025-02-19 16:58 ` Johannes Thumshirn
2025-03-05 13:20 ` David Sterba
13 siblings, 0 replies; 17+ messages in thread
From: Johannes Thumshirn @ 2025-02-19 16:58 UTC (permalink / raw)
To: Naohiro Aota, linux-btrfs@vger.kernel.org
Looks good to me,
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
* Re: [PATCH v2 12/12] btrfs-progs: zoned: fix alloc_offset calculation for partly conventional block groups
2025-02-19 16:58 ` Johannes Thumshirn
@ 2025-02-20 5:06 ` Naohiro Aota
0 siblings, 0 replies; 17+ messages in thread
From: Naohiro Aota @ 2025-02-20 5:06 UTC (permalink / raw)
To: Johannes Thumshirn, Naohiro Aota, linux-btrfs@vger.kernel.org
On Thu Feb 20, 2025 at 1:58 AM JST, Johannes Thumshirn wrote:
> On 19.02.25 09:00, Naohiro Aota wrote:
>> When one of the two zones composing a DUP block group is a conventional zone,
>> zone_info[i]->alloc_offset is set to WP_CONVENTIONAL. That will, of course,
>> not match the write pointer of the other zone, and loading that block group
>> fails.
>>
>> This commit solves the issue by properly recovering the emulated write pointer
>> from the last allocated extent. The offset for SINGLE, DUP, and RAID1 is
>> straightforward: it is the same as the end of the last allocated extent. RAID0
>> and RAID10 are a bit trickier because we need to do the striping math.
>
> I wonder if we need this patch on the kernel side as well. Can happen
> there too.
Yes. I'm going to apply this one to the kernel side too.
* Re: [PATCH v2 00/12] btrfs-progs: zoned: support zone capacity and
2025-02-19 7:57 [PATCH v2 00/12] btrfs-progs: zoned: support zone capacity and Naohiro Aota
` (12 preceding siblings ...)
2025-02-19 16:58 ` [PATCH v2 00/12] btrfs-progs: zoned: support zone capacity and Johannes Thumshirn
@ 2025-03-05 13:20 ` David Sterba
13 siblings, 0 replies; 17+ messages in thread
From: David Sterba @ 2025-03-05 13:20 UTC (permalink / raw)
To: Naohiro Aota; +Cc: linux-btrfs
On Wed, Feb 19, 2025 at 04:57:44PM +0900, Naohiro Aota wrote:
> Running mkfs.btrfs on a null_blk device with the following setup fails
> as below.
>
> - zone size: 64MB
> - zone capacity: 64MB
> - number of conventional zones: 6
> - storage size: 2048MB
>
> + /home/naota/src/btrfs-progs/mkfs.btrfs -d single -m dup -f /dev/nullb0
> btrfs-progs v6.10
> See https://btrfs.readthedocs.io for more information.
>
> zoned: /dev/nullb0: host-managed device detected, setting zoned feature
> Resetting device zones /dev/nullb0 (32 zones) ...
> NOTE: several default settings have changed in version 5.15, please make sure
> this does not affect your deployments:
> - DUP for metadata (-m dup)
> - enabled no-holes (-O no-holes)
> - enabled free-space-tree (-R free-space-tree)
>
> bad tree block 268435456, bytenr mismatch, want=268435456, have=0
> kernel-shared/disk-io.c:485: write_tree_block: BUG_ON `1` triggered, value 1
> /home/naota/src/btrfs-progs/mkfs.btrfs(+0x290ca) [0x55603cf7e0ca]
> /home/naota/src/btrfs-progs/mkfs.btrfs(write_tree_block+0xa7) [0x55603cf80417]
> /home/naota/src/btrfs-progs/mkfs.btrfs(__commit_transaction+0xe8) [0x55603cf9b7d8]
> /home/naota/src/btrfs-progs/mkfs.btrfs(btrfs_commit_transaction+0x176) [0x55603cf9ba66]
> /home/naota/src/btrfs-progs/mkfs.btrfs(main+0x2831) [0x55603cf67291]
> /usr/lib64/libc.so.6(+0x271ee) [0x7f5ab706f1ee]
> /usr/lib64/libc.so.6(__libc_start_main+0x89) [0x7f5ab706f2a9]
> /home/naota/src/btrfs-progs/mkfs.btrfs(_start+0x25) [0x55603cf6a135]
> /home/naota/tmp/test-mkfs.sh: line 13: 821886 Aborted (core dumped)
>
> The crash happens because btrfs-progs failed to set proper allocation
> pointer when a DUP block group is created over a conventional zone and a
> sequential write required zone. In that case, the write pointer is
> recovered from the last allocated extent in the block group. That
> functionality is not well implemented in btrfs-progs side.
>
> Implementing that functionality is relatively trivial because we can
> copy the code from the kernel side. However, the code is quite out of
> sync between the kernel side and user space side. So, this series first
> refactors btrfs_load_block_group_zone_info() to make it easy to
> integrate the code from the kernel side.
>
> The main part is the last patch, which fixes allocation pointer
> calculation for all the profiles.
>
> While at it, this series also adds support for zone capacity and zone
> activeness. But, zone activeness support is currently limited. It does
> not attempt to check the zone active limit on the extent allocation,
> because mkfs.btrfs should work without hitting the limit.
>
> - v2
> - Temporarily fails some profiles while adding supports in the patch
> series.
> - v1: https://lore.kernel.org/linux-btrfs/cover.1739756953.git.naohiro.aota@wdc.com/
>
> Naohiro Aota (12):
> btrfs-progs: introduce min_not_zero()
> btrfs-progs: zoned: introduce a zone_info struct in
> btrfs_load_block_group_zone_info
> btrfs-progs: zoned: support zone capacity
> btrfs-progs: zoned: load zone activeness
> btrfs-progs: zoned: activate block group on loading
> btrfs-progs: factor out btrfs_load_zone_info()
> btrfs-progs: zoned: factor out SINGLE zone info loading
> btrfs-progs: zoned: implement DUP zone info loading
> btrfs-progs: zoned: implement RAID1 zone info loading
> btrfs-progs: zoned: implement RAID0 zone info loading
> btrfs-progs: implement RAID10 zone info loading
> btrfs-progs: zoned: fix alloc_offset calculation for partly
> conventional block groups
Added to devel, thanks.