public inbox for linux-btrfs@vger.kernel.org
* [PATCH 00/12] btrfs-progs: zoned: support zone capacity and
@ 2025-02-17  2:37 Naohiro Aota
  2025-02-17  2:37 ` [PATCH 01/12] btrfs-progs: introduce min_not_zero() Naohiro Aota
                   ` (11 more replies)
  0 siblings, 12 replies; 15+ messages in thread
From: Naohiro Aota @ 2025-02-17  2:37 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Naohiro Aota

Running mkfs.btrfs on a null_blk device with the following setup fails
as below.

- zone size: 64MB
- zone capacity: 64MB
- number of conventional zones: 6
- storage size: 2048MB

    + /home/naota/src/btrfs-progs/mkfs.btrfs -d single -m dup -f /dev/nullb0
    btrfs-progs v6.10
    See https://btrfs.readthedocs.io for more information.

    zoned: /dev/nullb0: host-managed device detected, setting zoned feature
    Resetting device zones /dev/nullb0 (32 zones) ...
    NOTE: several default settings have changed in version 5.15, please make sure
          this does not affect your deployments:
          - DUP for metadata (-m dup)
          - enabled no-holes (-O no-holes)
          - enabled free-space-tree (-R free-space-tree)

    bad tree block 268435456, bytenr mismatch, want=268435456, have=0
    kernel-shared/disk-io.c:485: write_tree_block: BUG_ON `1` triggered, value 1
    /home/naota/src/btrfs-progs/mkfs.btrfs(+0x290ca) [0x55603cf7e0ca]
    /home/naota/src/btrfs-progs/mkfs.btrfs(write_tree_block+0xa7) [0x55603cf80417]
    /home/naota/src/btrfs-progs/mkfs.btrfs(__commit_transaction+0xe8) [0x55603cf9b7d8]
    /home/naota/src/btrfs-progs/mkfs.btrfs(btrfs_commit_transaction+0x176) [0x55603cf9ba66]
    /home/naota/src/btrfs-progs/mkfs.btrfs(main+0x2831) [0x55603cf67291]
    /usr/lib64/libc.so.6(+0x271ee) [0x7f5ab706f1ee]
    /usr/lib64/libc.so.6(__libc_start_main+0x89) [0x7f5ab706f2a9]
    /home/naota/src/btrfs-progs/mkfs.btrfs(_start+0x25) [0x55603cf6a135]
    /home/naota/tmp/test-mkfs.sh: line 13: 821886 Aborted                 (core dumped)

The crash happens because btrfs-progs fails to set a proper allocation
pointer when a DUP block group is created over a conventional zone and a
sequential-write-required zone. In that case, the write pointer must be
recovered from the last allocated extent in the block group, and that
functionality is not well implemented on the btrfs-progs side.

Implementing that functionality is relatively trivial because we can
copy the code from the kernel side. However, the code has drifted quite
far out of sync between the kernel and user space. So, this series first
refactors btrfs_load_block_group_zone_info() to make it easier to
integrate the kernel-side code.

The main part is the last patch, which fixes allocation pointer
calculation for all the profiles.

While at it, this series also adds support for zone capacity and zone
activeness. Zone activeness support is currently limited, though: it
does not attempt to check the active zone limit on extent allocation,
because mkfs.btrfs should work without hitting the limit.

Naohiro Aota (12):
  btrfs-progs: introduce min_not_zero()
  btrfs-progs: zoned: introduce a zone_info struct in
    btrfs_load_block_group_zone_info
  btrfs-progs: zoned: support zone capacity
  btrfs-progs: zoned: load zone activeness
  btrfs-progs: zoned: activate block group on loading
  btrfs-progs: factor out btrfs_load_zone_info()
  btrfs-progs: zoned: factor out SINGLE zone info loading
  btrfs-progs: zoned: implement DUP zone info loading
  btrfs-progs: zoned: implement RAID1 zone info loading
  btrfs-progs: zoned: implement RAID0 zone info loading
  btrfs-progs: implement RAID10 zone info loading
  btrfs-progs: zoned: fix alloc_offset calculation for partly
    conventional block groups

 include/kerncompat.h        |  10 +
 kernel-shared/ctree.h       |   3 +
 kernel-shared/extent-tree.c |   2 +-
 kernel-shared/zoned.c       | 458 +++++++++++++++++++++++++++++++-----
 kernel-shared/zoned.h       |   3 +
 5 files changed, 414 insertions(+), 62 deletions(-)

--
2.48.1


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 01/12] btrfs-progs: introduce min_not_zero()
  2025-02-17  2:37 [PATCH 00/12] btrfs-progs: zoned: support zone capacity and Naohiro Aota
@ 2025-02-17  2:37 ` Naohiro Aota
  2025-02-17  2:37 ` [PATCH 02/12] btrfs-progs: zoned: introduce a zone_info struct in btrfs_load_block_group_zone_info Naohiro Aota
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 15+ messages in thread
From: Naohiro Aota @ 2025-02-17  2:37 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Naohiro Aota

Introduce min_not_zero() macro from the kernel.

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 include/kerncompat.h | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/include/kerncompat.h b/include/kerncompat.h
index 42c84460c1e5..e95bb4a53342 100644
--- a/include/kerncompat.h
+++ b/include/kerncompat.h
@@ -127,6 +127,16 @@
 	}
 #endif
 
+/**
+ * min_not_zero - return the minimum that is _not_ zero, unless both are zero
+ * @x: value1
+ * @y: value2
+ */
+#define min_not_zero(x, y) ({			\
+	typeof(x) __x = (x);			\
+	typeof(y) __y = (y);			\
+	__x == 0 ? __y : ((__y == 0) ? __x : min(__x, __y)); })
+
 static inline void print_trace(void)
 {
 #ifndef BTRFS_DISABLE_BACKTRACE
-- 
2.48.1



* [PATCH 02/12] btrfs-progs: zoned: introduce a zone_info struct in btrfs_load_block_group_zone_info
  2025-02-17  2:37 [PATCH 00/12] btrfs-progs: zoned: support zone capacity and Naohiro Aota
  2025-02-17  2:37 ` [PATCH 01/12] btrfs-progs: introduce min_not_zero() Naohiro Aota
@ 2025-02-17  2:37 ` Naohiro Aota
  2025-02-17  2:37 ` [PATCH 03/12] btrfs-progs: zoned: support zone capacity Naohiro Aota
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 15+ messages in thread
From: Naohiro Aota @ 2025-02-17  2:37 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Naohiro Aota

This is a userland-side update following kernel commit 15c12fcc50a1
("btrfs: zoned: introduce a zone_info struct in
btrfs_load_block_group_zone_info"). It will make the code unification easier.

This commit introduces zone_info structure to hold per-zone information in
btrfs_load_block_group_zone_info.

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 kernel-shared/zoned.c | 46 ++++++++++++++++++++++---------------------
 1 file changed, 24 insertions(+), 22 deletions(-)

diff --git a/kernel-shared/zoned.c b/kernel-shared/zoned.c
index fd8a776dc471..b06774482cfd 100644
--- a/kernel-shared/zoned.c
+++ b/kernel-shared/zoned.c
@@ -828,6 +828,11 @@ bool zoned_profile_supported(u64 map_type, bool rst)
 	return false;
 }
 
+struct zone_info {
+	u64 physical;
+	u64 alloc_offset;
+};
+
 int btrfs_load_block_group_zone_info(struct btrfs_fs_info *fs_info,
 				     struct btrfs_block_group *cache)
 {
@@ -837,10 +842,9 @@ int btrfs_load_block_group_zone_info(struct btrfs_fs_info *fs_info,
 	struct map_lookup *map;
 	u64 logical = cache->start;
 	u64 length = cache->length;
-	u64 physical = 0;
+	struct zone_info *zone_info = NULL;
 	int ret = 0;
 	int i;
-	u64 *alloc_offsets = NULL;
 	u64 last_alloc = 0;
 	u32 num_conventional = 0;
 
@@ -867,30 +871,29 @@ int btrfs_load_block_group_zone_info(struct btrfs_fs_info *fs_info,
 	}
 	map = container_of(ce, struct map_lookup, ce);
 
-	alloc_offsets = calloc(map->num_stripes, sizeof(*alloc_offsets));
-	if (!alloc_offsets) {
-		error_msg(ERROR_MSG_MEMORY, "zone offsets");
+	zone_info = calloc(map->num_stripes, sizeof(*zone_info));
+	if (!zone_info) {
+		error_msg(ERROR_MSG_MEMORY, "zone info");
 		return -ENOMEM;
 	}
 
 	for (i = 0; i < map->num_stripes; i++) {
+		struct zone_info *info = &zone_info[i];
 		bool is_sequential;
 		struct blk_zone zone;
 
 		device = map->stripes[i].dev;
-		physical = map->stripes[i].physical;
+		info->physical = map->stripes[i].physical;
 
 		if (device->fd == -1) {
-			alloc_offsets[i] = WP_MISSING_DEV;
+			info->alloc_offset = WP_MISSING_DEV;
 			continue;
 		}
 
-		is_sequential = btrfs_dev_is_sequential(device, physical);
-		if (!is_sequential)
-			num_conventional++;
-
+		is_sequential = btrfs_dev_is_sequential(device, info->physical);
 		if (!is_sequential) {
-			alloc_offsets[i] = WP_CONVENTIONAL;
+			num_conventional++;
+			info->alloc_offset = WP_CONVENTIONAL;
 			continue;
 		}
 
@@ -898,28 +901,27 @@ int btrfs_load_block_group_zone_info(struct btrfs_fs_info *fs_info,
 		 * The group is mapped to a sequential zone. Get the zone write
 		 * pointer to determine the allocation offset within the zone.
 		 */
-		WARN_ON(!IS_ALIGNED(physical, fs_info->zone_size));
-		zone = device->zone_info->zones[physical / fs_info->zone_size];
+		WARN_ON(!IS_ALIGNED(info->physical, fs_info->zone_size));
+		zone = device->zone_info->zones[info->physical / fs_info->zone_size];
 
 		switch (zone.cond) {
 		case BLK_ZONE_COND_OFFLINE:
 		case BLK_ZONE_COND_READONLY:
 			error(
 		"zoned: offline/readonly zone %llu on device %s (devid %llu)",
-			      physical / fs_info->zone_size, device->name,
+			      info->physical / fs_info->zone_size, device->name,
 			      device->devid);
-			alloc_offsets[i] = WP_MISSING_DEV;
+			info->alloc_offset = WP_MISSING_DEV;
 			break;
 		case BLK_ZONE_COND_EMPTY:
-			alloc_offsets[i] = 0;
+			info->alloc_offset = 0;
 			break;
 		case BLK_ZONE_COND_FULL:
-			alloc_offsets[i] = fs_info->zone_size;
+			info->alloc_offset = fs_info->zone_size;
 			break;
 		default:
 			/* Partially used zone */
-			alloc_offsets[i] =
-					((zone.wp - zone.start) << SECTOR_SHIFT);
+			info->alloc_offset = ((zone.wp - zone.start) << SECTOR_SHIFT);
 			break;
 		}
 	}
@@ -943,7 +945,7 @@ int btrfs_load_block_group_zone_info(struct btrfs_fs_info *fs_info,
 		ret = -EINVAL;
 		goto out;
 	}
-	cache->alloc_offset = alloc_offsets[0];
+	cache->alloc_offset = zone_info[0].alloc_offset;
 
 out:
 	/* An extent is allocated after the write pointer */
@@ -957,7 +959,7 @@ out:
 	if (!ret)
 		cache->write_offset = cache->alloc_offset;
 
-	kfree(alloc_offsets);
+	kfree(zone_info);
 	return ret;
 }
 
-- 
2.48.1



* [PATCH 03/12] btrfs-progs: zoned: support zone capacity
  2025-02-17  2:37 [PATCH 00/12] btrfs-progs: zoned: support zone capacity and Naohiro Aota
  2025-02-17  2:37 ` [PATCH 01/12] btrfs-progs: introduce min_not_zero() Naohiro Aota
  2025-02-17  2:37 ` [PATCH 02/12] btrfs-progs: zoned: introduce a zone_info struct in btrfs_load_block_group_zone_info Naohiro Aota
@ 2025-02-17  2:37 ` Naohiro Aota
  2025-02-17  2:37 ` [PATCH 04/12] btrfs-progs: zoned: load zone activeness Naohiro Aota
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 15+ messages in thread
From: Naohiro Aota @ 2025-02-17  2:37 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Naohiro Aota

The userland tools did not load or use the zone capacity. Load and use it properly.

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 kernel-shared/ctree.h       | 1 +
 kernel-shared/extent-tree.c | 2 +-
 kernel-shared/zoned.c       | 9 ++++++++-
 3 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/kernel-shared/ctree.h b/kernel-shared/ctree.h
index 8c923be96705..a6aa10a690bb 100644
--- a/kernel-shared/ctree.h
+++ b/kernel-shared/ctree.h
@@ -285,6 +285,7 @@ struct btrfs_block_group {
 	 */
 	u64 alloc_offset;
 	u64 write_offset;
+	u64 zone_capacity;
 
 	u64 global_root_id;
 };
diff --git a/kernel-shared/extent-tree.c b/kernel-shared/extent-tree.c
index 20eef4f3df7b..2b7a962f294b 100644
--- a/kernel-shared/extent-tree.c
+++ b/kernel-shared/extent-tree.c
@@ -300,7 +300,7 @@ again:
 		goto new_group;
 
 	if (btrfs_is_zoned(root->fs_info)) {
-		if (cache->length - cache->alloc_offset < num)
+		if (cache->zone_capacity - cache->alloc_offset < num)
 			goto new_group;
 		*start_ret = cache->start + cache->alloc_offset;
 		cache->alloc_offset += num;
diff --git a/kernel-shared/zoned.c b/kernel-shared/zoned.c
index b06774482cfd..319ee88d5b06 100644
--- a/kernel-shared/zoned.c
+++ b/kernel-shared/zoned.c
@@ -776,7 +776,7 @@ static int calculate_alloc_pointer(struct btrfs_fs_info *fs_info,
 		length = fs_info->nodesize;
 
 	if (!(found_key.objectid >= cache->start &&
-	       found_key.objectid + length <= cache->start + cache->length)) {
+	       found_key.objectid + length <= cache->start + cache->zone_capacity)) {
 		ret = -EUCLEAN;
 		goto out;
 	}
@@ -830,6 +830,7 @@ bool zoned_profile_supported(u64 map_type, bool rst)
 
 struct zone_info {
 	u64 physical;
+	u64 capacity;
 	u64 alloc_offset;
 };
 
@@ -894,6 +895,7 @@ int btrfs_load_block_group_zone_info(struct btrfs_fs_info *fs_info,
 		if (!is_sequential) {
 			num_conventional++;
 			info->alloc_offset = WP_CONVENTIONAL;
+			info->capacity = device->zone_info->zone_size;
 			continue;
 		}
 
@@ -904,6 +906,8 @@ int btrfs_load_block_group_zone_info(struct btrfs_fs_info *fs_info,
 		WARN_ON(!IS_ALIGNED(info->physical, fs_info->zone_size));
 		zone = device->zone_info->zones[info->physical / fs_info->zone_size];
 
+		info->capacity = (zone.capacity << SECTOR_SHIFT);
+
 		switch (zone.cond) {
 		case BLK_ZONE_COND_OFFLINE:
 		case BLK_ZONE_COND_READONLY:
@@ -927,6 +931,8 @@ int btrfs_load_block_group_zone_info(struct btrfs_fs_info *fs_info,
 	}
 
 	if (num_conventional > 0) {
+		/* Zone capacity is always zone size in emulation */
+		cache->zone_capacity = cache->length;
 		ret = calculate_alloc_pointer(fs_info, cache, &last_alloc);
 		if (ret || map->num_stripes == num_conventional) {
 			if (!ret)
@@ -946,6 +952,7 @@ int btrfs_load_block_group_zone_info(struct btrfs_fs_info *fs_info,
 		goto out;
 	}
 	cache->alloc_offset = zone_info[0].alloc_offset;
+	cache->zone_capacity = zone_info[0].capacity;
 
 out:
 	/* An extent is allocated after the write pointer */
-- 
2.48.1



* [PATCH 04/12] btrfs-progs: zoned: load zone activeness
  2025-02-17  2:37 [PATCH 00/12] btrfs-progs: zoned: support zone capacity and Naohiro Aota
                   ` (2 preceding siblings ...)
  2025-02-17  2:37 ` [PATCH 03/12] btrfs-progs: zoned: support zone capacity Naohiro Aota
@ 2025-02-17  2:37 ` Naohiro Aota
  2025-02-17  2:37 ` [PATCH 05/12] btrfs-progs: zoned: activate block group on loading Naohiro Aota
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 15+ messages in thread
From: Naohiro Aota @ 2025-02-17  2:37 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Naohiro Aota

Properly load the zone activeness in the userland tool. Also, check that a
device's active zone limit is large enough to run btrfs.

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 kernel-shared/ctree.h |  1 +
 kernel-shared/zoned.c | 77 +++++++++++++++++++++++++++++++++++++++----
 kernel-shared/zoned.h |  3 ++
 3 files changed, 75 insertions(+), 6 deletions(-)

diff --git a/kernel-shared/ctree.h b/kernel-shared/ctree.h
index a6aa10a690bb..f10142df80eb 100644
--- a/kernel-shared/ctree.h
+++ b/kernel-shared/ctree.h
@@ -368,6 +368,7 @@ struct btrfs_fs_info {
 	unsigned int allow_transid_mismatch:1;
 	unsigned int skip_leaf_item_checks:1;
 	unsigned int rebuilding_extent_tree:1;
+	unsigned int active_zone_tracking:1;
 
 	int transaction_aborted;
 
diff --git a/kernel-shared/zoned.c b/kernel-shared/zoned.c
index 319ee88d5b06..a97466635ecb 100644
--- a/kernel-shared/zoned.c
+++ b/kernel-shared/zoned.c
@@ -23,6 +23,7 @@
 #include <stdlib.h>
 #include <string.h>
 #include "kernel-lib/list.h"
+#include "kernel-lib/bitmap.h"
 #include "kernel-shared/volumes.h"
 #include "kernel-shared/zoned.h"
 #include "kernel-shared/accessors.h"
@@ -57,6 +58,16 @@ static u64 emulated_zone_size = DEFAULT_EMULATED_ZONE_SIZE;
 #define BTRFS_MAX_ZONE_SIZE		(8ULL * SZ_1G)
 #define BTRFS_MIN_ZONE_SIZE		(SZ_4M)
 
+/*
+ * Minimum of active zones we need:
+ *
+ * - BTRFS_SUPER_MIRROR_MAX zones for superblock mirrors
+ * - 3 zones to ensure at least one zone per SYSTEM, META and DATA block group
+ * - 1 zone for tree-log dedicated block group
+ * - 1 zone for relocation
+ */
+#define BTRFS_MIN_ACTIVE_ZONES		(BTRFS_SUPER_MIRROR_MAX + 5)
+
 static int btrfs_get_dev_zone_info(struct btrfs_device *device);
 
 enum btrfs_zoned_model zoned_model(const char *file)
@@ -132,6 +143,18 @@ static u64 max_zone_append_size(const char *file)
 	return strtoull((const char *)chunk, NULL, 10);
 }
 
+static unsigned int max_active_zone_count(const char *file)
+{
+	char buf[32];
+	int ret;
+
+	ret = device_get_queue_param(file, "max_active_zones", buf, sizeof(buf));
+	if (ret <= 0)
+		return 0;
+
+	return strtoul((const char *)buf, NULL, 10);
+}
+
 #ifdef BTRFS_ZONED
 /*
  * Emulate blkdev_report_zones() for a non-zoned device. It slices up the block
@@ -273,7 +296,8 @@ static int report_zones(int fd, const char *file,
 	struct stat st;
 	struct blk_zone_report *rep;
 	struct blk_zone *zone;
-	unsigned int i, n = 0;
+	unsigned int i, nreported = 0, nactive = 0;
+	unsigned int max_active_zones;
 	int ret;
 
 	/*
@@ -336,6 +360,20 @@ static int report_zones(int fd, const char *file,
 		exit(1);
 	}
 
+	zinfo->active_zones = bitmap_zalloc(zinfo->nr_zones);
+	if (!zinfo->active_zones) {
+		error_msg(ERROR_MSG_MEMORY, "active zone bitmap");
+		exit(1);
+	}
+
+	max_active_zones = max_active_zone_count(file);
+	if (max_active_zones && max_active_zones < BTRFS_MIN_ACTIVE_ZONES) {
+		error("zoned: %s: max active zones %u is too small, need at least %u active zones",
+		      file, max_active_zones, BTRFS_MIN_ACTIVE_ZONES);
+		exit(1);
+	}
+	zinfo->max_active_zones = max_active_zones;
+
 	/* Allocate a zone report */
 	rep_size = sizeof(struct blk_zone_report) +
 		   sizeof(struct blk_zone) * BTRFS_REPORT_NR_ZONES;
@@ -347,7 +385,7 @@ static int report_zones(int fd, const char *file,
 
 	/* Get zone information */
 	zone = (struct blk_zone *)(rep + 1);
-	while (n < zinfo->nr_zones) {
+	while (nreported < zinfo->nr_zones) {
 		memset(rep, 0, rep_size);
 		rep->sector = sector;
 		rep->nr_zones = BTRFS_REPORT_NR_ZONES;
@@ -374,17 +412,36 @@ static int report_zones(int fd, const char *file,
 			break;
 
 		for (i = 0; i < rep->nr_zones; i++) {
-			if (n >= zinfo->nr_zones)
+			if (nreported >= zinfo->nr_zones)
 				break;
-			memcpy(&zinfo->zones[n], &zone[i],
+			memcpy(&zinfo->zones[nreported], &zone[i],
 			       sizeof(struct blk_zone));
-			n++;
+			switch (zone[i].cond) {
+			case BLK_ZONE_COND_EMPTY:
+				break;
+			case BLK_ZONE_COND_IMP_OPEN:
+			case BLK_ZONE_COND_EXP_OPEN:
+			case BLK_ZONE_COND_CLOSED:
+				set_bit(nreported, zinfo->active_zones);
+				nactive++;
+				break;
+			}
+			nreported++;
 		}
 
 		sector = zone[rep->nr_zones - 1].start +
 			 zone[rep->nr_zones - 1].len;
 	}
 
+	if (max_active_zones) {
+		if (nactive > max_active_zones) {
+			error("zoned: %u active zones on %s exceeds max_active_zones %u",
+			      nactive, file, max_active_zones);
+			exit(1);
+		}
+		zinfo->active_zones_left = max_active_zones - nactive;
+	}
+
 	kfree(rep);
 
 	return 0;
@@ -1080,6 +1137,7 @@ int btrfs_get_dev_zone_info_all_devices(struct btrfs_fs_info *fs_info)
 static int btrfs_get_dev_zone_info(struct btrfs_device *device)
 {
 	struct btrfs_fs_info *fs_info = device->fs_info;
+	int ret;
 
 	/*
 	 * Cannot use btrfs_is_zoned here, since fs_info::zone_size might not
@@ -1091,7 +1149,14 @@ static int btrfs_get_dev_zone_info(struct btrfs_device *device)
 	if (device->zone_info)
 		return 0;
 
-	return btrfs_get_zone_info(device->fd, device->name, &device->zone_info);
+	ret = btrfs_get_zone_info(device->fd, device->name, &device->zone_info);
+	if (ret)
+		return ret;
+
+	if (device->zone_info->max_active_zones)
+		fs_info->active_zone_tracking = 1;
+
+	return 0;
 }
 
 int btrfs_get_zone_info(int fd, const char *file,
diff --git a/kernel-shared/zoned.h b/kernel-shared/zoned.h
index c593571c4b69..d004ff16f198 100644
--- a/kernel-shared/zoned.h
+++ b/kernel-shared/zoned.h
@@ -72,7 +72,10 @@ struct btrfs_zoned_device_info {
 	enum btrfs_zoned_model	model;
 	u64			zone_size;
 	u32			nr_zones;
+	unsigned int            max_active_zones;
 	struct blk_zone		*zones;
+	atomic_t                active_zones_left;
+	unsigned long           *active_zones;
 	bool			emulated;
 };
 
-- 
2.48.1



* [PATCH 05/12] btrfs-progs: zoned: activate block group on loading
  2025-02-17  2:37 [PATCH 00/12] btrfs-progs: zoned: support zone capacity and Naohiro Aota
                   ` (3 preceding siblings ...)
  2025-02-17  2:37 ` [PATCH 04/12] btrfs-progs: zoned: load zone activeness Naohiro Aota
@ 2025-02-17  2:37 ` Naohiro Aota
  2025-02-17  2:37 ` [PATCH 06/12] btrfs-progs: factor out btrfs_load_zone_info() Naohiro Aota
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 15+ messages in thread
From: Naohiro Aota @ 2025-02-17  2:37 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Naohiro Aota

Introduce a "zone_is_active" member in struct btrfs_block_group and set it
when loading a block group.

Note that the activeness check for extent allocation is currently not
implemented. That check would require activating a non-active block group on
extent allocation, which in turn requires finishing a zone when the active
zone limit is hit. Since mkfs should not hit the limit, implementing the zone
finishing code is not necessary at the moment.

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 kernel-shared/ctree.h |  1 +
 kernel-shared/zoned.c | 15 +++++++++++++++
 2 files changed, 16 insertions(+)

diff --git a/kernel-shared/ctree.h b/kernel-shared/ctree.h
index f10142df80eb..da0635d567dc 100644
--- a/kernel-shared/ctree.h
+++ b/kernel-shared/ctree.h
@@ -286,6 +286,7 @@ struct btrfs_block_group {
 	u64 alloc_offset;
 	u64 write_offset;
 	u64 zone_capacity;
+	bool zone_is_active;
 
 	u64 global_root_id;
 };
diff --git a/kernel-shared/zoned.c b/kernel-shared/zoned.c
index a97466635ecb..ee6c4ee61e4a 100644
--- a/kernel-shared/zoned.c
+++ b/kernel-shared/zoned.c
@@ -901,6 +901,7 @@ int btrfs_load_block_group_zone_info(struct btrfs_fs_info *fs_info,
 	u64 logical = cache->start;
 	u64 length = cache->length;
 	struct zone_info *zone_info = NULL;
+	unsigned long *active = NULL;
 	int ret = 0;
 	int i;
 	u64 last_alloc = 0;
@@ -935,6 +936,13 @@ int btrfs_load_block_group_zone_info(struct btrfs_fs_info *fs_info,
 		return -ENOMEM;
 	}
 
+	active = bitmap_zalloc(map->num_stripes);
+	if (!active) {
+		free(zone_info);
+		error_msg(ERROR_MSG_MEMORY, "active bitmap");
+		return -ENOMEM;
+	}
+
 	for (i = 0; i < map->num_stripes; i++) {
 		struct zone_info *info = &zone_info[i];
 		bool is_sequential;
@@ -948,6 +956,10 @@ int btrfs_load_block_group_zone_info(struct btrfs_fs_info *fs_info,
 			continue;
 		}
 
+		/* Consider a zone as active if we can allow any number of active zones. */
+		if (!device->zone_info->max_active_zones)
+			set_bit(i, active);
+
 		is_sequential = btrfs_dev_is_sequential(device, info->physical);
 		if (!is_sequential) {
 			num_conventional++;
@@ -983,6 +995,7 @@ int btrfs_load_block_group_zone_info(struct btrfs_fs_info *fs_info,
 		default:
 			/* Partially used zone */
 			info->alloc_offset = ((zone.wp - zone.start) << SECTOR_SHIFT);
+			set_bit(i, active);
 			break;
 		}
 	}
@@ -1008,8 +1021,10 @@ int btrfs_load_block_group_zone_info(struct btrfs_fs_info *fs_info,
 		ret = -EINVAL;
 		goto out;
 	}
+	/* SINGLE profile case. */
 	cache->alloc_offset = zone_info[0].alloc_offset;
 	cache->zone_capacity = zone_info[0].capacity;
+	cache->zone_is_active = test_bit(0, active);
 
 out:
 	/* An extent is allocated after the write pointer */
-- 
2.48.1



* [PATCH 06/12] btrfs-progs: factor out btrfs_load_zone_info()
  2025-02-17  2:37 [PATCH 00/12] btrfs-progs: zoned: support zone capacity and Naohiro Aota
                   ` (4 preceding siblings ...)
  2025-02-17  2:37 ` [PATCH 05/12] btrfs-progs: zoned: activate block group on loading Naohiro Aota
@ 2025-02-17  2:37 ` Naohiro Aota
  2025-02-17  2:37 ` [PATCH 07/12] btrfs-progs: zoned: factor out SINGLE zone info loading Naohiro Aota
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 15+ messages in thread
From: Naohiro Aota @ 2025-02-17  2:37 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Naohiro Aota

Now that we have zone capacity and (basic) zone activeness support, it's time
to factor out btrfs_load_zone_info(), matching the kernel side.

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 kernel-shared/zoned.c | 124 ++++++++++++++++++++++++------------------
 1 file changed, 71 insertions(+), 53 deletions(-)

diff --git a/kernel-shared/zoned.c b/kernel-shared/zoned.c
index ee6c4ee61e4a..4045cf0d2b98 100644
--- a/kernel-shared/zoned.c
+++ b/kernel-shared/zoned.c
@@ -891,10 +891,76 @@ struct zone_info {
 	u64 alloc_offset;
 };
 
+static int btrfs_load_zone_info(struct btrfs_fs_info *fs_info, int zone_idx,
+				struct zone_info *info, unsigned long *active,
+				struct map_lookup *map)
+{
+	struct btrfs_device *device;
+	struct blk_zone zone;
+
+	info->physical = map->stripes[zone_idx].physical;
+
+	device = map->stripes[zone_idx].dev;
+
+	if (device->fd == -1) {
+		info->alloc_offset = WP_MISSING_DEV;
+		return 0;
+	}
+
+	/* Consider a zone as active if we can allow any number of active zones. */
+	if (!device->zone_info->max_active_zones)
+		set_bit(zone_idx, active);
+
+	if (!btrfs_dev_is_sequential(device, info->physical)) {
+		info->alloc_offset = WP_CONVENTIONAL;
+		info->capacity = device->zone_info->zone_size;
+		return 0;
+	}
+
+	/*
+	 * The group is mapped to a sequential zone. Get the zone write
+	 * pointer to determine the allocation offset within the zone.
+	 */
+	WARN_ON(!IS_ALIGNED(info->physical, fs_info->zone_size));
+	zone = device->zone_info->zones[info->physical / fs_info->zone_size];
+
+	if (zone.type == BLK_ZONE_TYPE_CONVENTIONAL) {
+		error("zoned: unexpected conventional zone %llu on device %s (devid %llu)",
+		      zone.start << SECTOR_SHIFT, device->name,
+		      device->devid);
+		return -EIO;
+	}
+
+	info->capacity = (zone.capacity << SECTOR_SHIFT);
+
+	switch (zone.cond) {
+	case BLK_ZONE_COND_OFFLINE:
+	case BLK_ZONE_COND_READONLY:
+		error(
+	"zoned: offline/readonly zone %llu on device %s (devid %llu)",
+		      info->physical / fs_info->zone_size, device->name,
+		      device->devid);
+		info->alloc_offset = WP_MISSING_DEV;
+		break;
+	case BLK_ZONE_COND_EMPTY:
+		info->alloc_offset = 0;
+		break;
+	case BLK_ZONE_COND_FULL:
+		info->alloc_offset = fs_info->zone_size;
+		break;
+	default:
+		/* Partially used zone */
+		info->alloc_offset = ((zone.wp - zone.start) << SECTOR_SHIFT);
+		set_bit(zone_idx, active);
+		break;
+	}
+
+	return 0;
+}
+
 int btrfs_load_block_group_zone_info(struct btrfs_fs_info *fs_info,
 				     struct btrfs_block_group *cache)
 {
-	struct btrfs_device *device;
 	struct btrfs_mapping_tree *map_tree = &fs_info->mapping_tree;
 	struct cache_extent *ce;
 	struct map_lookup *map;
@@ -944,60 +1010,12 @@ int btrfs_load_block_group_zone_info(struct btrfs_fs_info *fs_info,
 	}
 
 	for (i = 0; i < map->num_stripes; i++) {
-		struct zone_info *info = &zone_info[i];
-		bool is_sequential;
-		struct blk_zone zone;
-
-		device = map->stripes[i].dev;
-		info->physical = map->stripes[i].physical;
-
-		if (device->fd == -1) {
-			info->alloc_offset = WP_MISSING_DEV;
-			continue;
-		}
-
-		/* Consider a zone as active if we can allow any number of active zones. */
-		if (!device->zone_info->max_active_zones)
-			set_bit(i, active);
+		ret = btrfs_load_zone_info(fs_info, i, &zone_info[i], active, map);
+		if (ret)
+			goto out;
 
-		is_sequential = btrfs_dev_is_sequential(device, info->physical);
-		if (!is_sequential) {
+		if (zone_info[i].alloc_offset == WP_CONVENTIONAL)
 			num_conventional++;
-			info->alloc_offset = WP_CONVENTIONAL;
-			info->capacity = device->zone_info->zone_size;
-			continue;
-		}
-
-		/*
-		 * The group is mapped to a sequential zone. Get the zone write
-		 * pointer to determine the allocation offset within the zone.
-		 */
-		WARN_ON(!IS_ALIGNED(info->physical, fs_info->zone_size));
-		zone = device->zone_info->zones[info->physical / fs_info->zone_size];
-
-		info->capacity = (zone.capacity << SECTOR_SHIFT);
-
-		switch (zone.cond) {
-		case BLK_ZONE_COND_OFFLINE:
-		case BLK_ZONE_COND_READONLY:
-			error(
-		"zoned: offline/readonly zone %llu on device %s (devid %llu)",
-			      info->physical / fs_info->zone_size, device->name,
-			      device->devid);
-			info->alloc_offset = WP_MISSING_DEV;
-			break;
-		case BLK_ZONE_COND_EMPTY:
-			info->alloc_offset = 0;
-			break;
-		case BLK_ZONE_COND_FULL:
-			info->alloc_offset = fs_info->zone_size;
-			break;
-		default:
-			/* Partially used zone */
-			info->alloc_offset = ((zone.wp - zone.start) << SECTOR_SHIFT);
-			set_bit(i, active);
-			break;
-		}
 	}
 
 	if (num_conventional > 0) {
-- 
2.48.1



* [PATCH 07/12] btrfs-progs: zoned: factor out SINGLE zone info loading
  2025-02-17  2:37 [PATCH 00/12] btrfs-progs: zoned: support zone capacity and Naohiro Aota
                   ` (5 preceding siblings ...)
  2025-02-17  2:37 ` [PATCH 06/12] btrfs-progs: factor out btrfs_load_zone_info() Naohiro Aota
@ 2025-02-17  2:37 ` Naohiro Aota
  2025-02-17 17:12   ` Johannes Thumshirn
  2025-02-17  2:37 ` [PATCH 08/12] btrfs-progs: zoned: implement DUP " Naohiro Aota
                   ` (4 subsequent siblings)
  11 siblings, 1 reply; 15+ messages in thread
From: Naohiro Aota @ 2025-02-17  2:37 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Naohiro Aota

Currently, the userland tool only considers the SINGLE profile, which makes it
fail when a DUP block group is created over one conventional zone and one
sequential-write-required zone.

Before adding support for the other profiles, factor out the per-profile code
(currently, SINGLE only) into functions, just like on the kernel side.

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 kernel-shared/zoned.c | 39 +++++++++++++++++++++++++++++++++++----
 1 file changed, 35 insertions(+), 4 deletions(-)

diff --git a/kernel-shared/zoned.c b/kernel-shared/zoned.c
index 4045cf0d2b98..3bc7d6ba1924 100644
--- a/kernel-shared/zoned.c
+++ b/kernel-shared/zoned.c
@@ -958,6 +958,26 @@ static int btrfs_load_zone_info(struct btrfs_fs_info *fs_info, int zone_idx,
 	return 0;
 }
 
+static int btrfs_load_block_group_single(struct btrfs_fs_info *fs_info,
+					 struct btrfs_block_group *bg,
+					 struct zone_info *info,
+					 unsigned long *active)
+{
+	if (info->alloc_offset == WP_MISSING_DEV) {
+		btrfs_err(fs_info,
+			"zoned: cannot recover write pointer for zone %llu",
+			info->physical);
+		return -EIO;
+	}
+
+	bg->alloc_offset = info->alloc_offset;
+	bg->zone_capacity = info->capacity;
+	if (test_bit(0, active))
+		bg->zone_is_active = 1;
+	return 0;
+}
+
+
 int btrfs_load_block_group_zone_info(struct btrfs_fs_info *fs_info,
 				     struct btrfs_block_group *cache)
 {
@@ -972,6 +992,7 @@ int btrfs_load_block_group_zone_info(struct btrfs_fs_info *fs_info,
 	int i;
 	u64 last_alloc = 0;
 	u32 num_conventional = 0;
+	u64 profile;
 
 	if (!btrfs_is_zoned(fs_info))
 		return 0;
@@ -1039,10 +1060,20 @@ int btrfs_load_block_group_zone_info(struct btrfs_fs_info *fs_info,
 		ret = -EINVAL;
 		goto out;
 	}
-	/* SINGLE profile case. */
-	cache->alloc_offset = zone_info[0].alloc_offset;
-	cache->zone_capacity = zone_info[0].capacity;
-	cache->zone_is_active = test_bit(0, active);
+
+	profile = map->type & BTRFS_BLOCK_GROUP_PROFILE_MASK;
+	switch (profile) {
+	case 0: /* single */
+		ret = btrfs_load_block_group_single(fs_info, cache, &zone_info[0], active);
+		break;
+	case BTRFS_BLOCK_GROUP_RAID5:
+	case BTRFS_BLOCK_GROUP_RAID6:
+	default:
+		error("zoned: profile %s not yet supported",
+		      btrfs_bg_type_to_raid_name(map->type));
+		ret = -EINVAL;
+		goto out;
+	}
 
 out:
 	/* An extent is allocated after the write pointer */
-- 
2.48.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 08/12] btrfs-progs: zoned: implement DUP zone info loading
  2025-02-17  2:37 [PATCH 00/12] btrfs-progs: zoned: support zone capacity and Naohiro Aota
                   ` (6 preceding siblings ...)
  2025-02-17  2:37 ` [PATCH 07/12] btrfs-progs: zoned: factor out SINGLE zone info loading Naohiro Aota
@ 2025-02-17  2:37 ` Naohiro Aota
  2025-02-17  2:37 ` [PATCH 09/12] btrfs-progs: zoned: implement RAID1 " Naohiro Aota
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 15+ messages in thread
From: Naohiro Aota @ 2025-02-17  2:37 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Naohiro Aota

Add DUP support in the same way as on the kernel side.
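
The DUP loader takes the smaller non-zero capacity of the two zones via
min_not_zero() (introduced in patch 01), so a missing device reporting
capacity 0 does not clamp the block group to zero. A minimal sketch of
that helper's semantics, assuming a simple two-argument function rather
than the actual btrfs-progs macro:

```c
#include <stdint.h>

/*
 * Like min(), except a zero operand (e.g. the capacity reported for a
 * missing device) is ignored instead of winning the comparison.
 */
static uint64_t min_not_zero(uint64_t a, uint64_t b)
{
	if (a == 0)
		return b;
	if (b == 0)
		return a;
	return a < b ? a : b;
}
```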

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 kernel-shared/zoned.c | 43 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 43 insertions(+)

diff --git a/kernel-shared/zoned.c b/kernel-shared/zoned.c
index 3bc7d6ba1924..dd1ddd01cfba 100644
--- a/kernel-shared/zoned.c
+++ b/kernel-shared/zoned.c
@@ -977,6 +977,46 @@ static int btrfs_load_block_group_single(struct btrfs_fs_info *fs_info,
 	return 0;
 }
 
+static int btrfs_load_block_group_dup(struct btrfs_fs_info *fs_info,
+				      struct btrfs_block_group *bg,
+				      struct map_lookup *map,
+				      struct zone_info *zone_info,
+				      unsigned long *active)
+{
+	if ((map->type & BTRFS_BLOCK_GROUP_DATA) && !fs_info->stripe_root) {
+		btrfs_err(fs_info, "zoned: data DUP profile needs raid-stripe-tree");
+		return -EINVAL;
+	}
+
+	bg->zone_capacity = min_not_zero(zone_info[0].capacity, zone_info[1].capacity);
+
+	if (zone_info[0].alloc_offset == WP_MISSING_DEV) {
+		btrfs_err(fs_info,
+			  "zoned: cannot recover write pointer for zone %llu",
+			  zone_info[0].physical);
+		return -EIO;
+	}
+	if (zone_info[1].alloc_offset == WP_MISSING_DEV) {
+		btrfs_err(fs_info,
+			  "zoned: cannot recover write pointer for zone %llu",
+			  zone_info[1].physical);
+		return -EIO;
+	}
+	if (zone_info[0].alloc_offset != zone_info[1].alloc_offset) {
+		btrfs_err(fs_info,
+			  "zoned: write pointer offset mismatch of zones in DUP profile");
+		return -EIO;
+	}
+
+	if (test_bit(0, active) != test_bit(1, active)) {
+		return -EIO;
+	} else if (test_bit(0, active)) {
+		bg->zone_is_active = 1;
+	}
+
+	bg->alloc_offset = zone_info[0].alloc_offset;
+	return 0;
+}
 
 int btrfs_load_block_group_zone_info(struct btrfs_fs_info *fs_info,
 				     struct btrfs_block_group *cache)
@@ -1066,6 +1106,9 @@ int btrfs_load_block_group_zone_info(struct btrfs_fs_info *fs_info,
 	case 0: /* single */
 		ret = btrfs_load_block_group_single(fs_info, cache, &zone_info[0], active);
 		break;
+	case BTRFS_BLOCK_GROUP_DUP:
+		ret = btrfs_load_block_group_dup(fs_info, cache, map, zone_info, active);
+		break;
 	case BTRFS_BLOCK_GROUP_RAID5:
 	case BTRFS_BLOCK_GROUP_RAID6:
 	default:
-- 
2.48.1



* [PATCH 09/12] btrfs-progs: zoned: implement RAID1 zone info loading
  2025-02-17  2:37 [PATCH 00/12] btrfs-progs: zoned: support zone capacity and Naohiro Aota
                   ` (7 preceding siblings ...)
  2025-02-17  2:37 ` [PATCH 08/12] btrfs-progs: zoned: implement DUP " Naohiro Aota
@ 2025-02-17  2:37 ` Naohiro Aota
  2025-02-17  2:37 ` [PATCH 10/12] btrfs-progs: zoned: implement RAID0 " Naohiro Aota
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 15+ messages in thread
From: Naohiro Aota @ 2025-02-17  2:37 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Naohiro Aota

Implement it just like on the kernel side.

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 kernel-shared/zoned.c | 49 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 49 insertions(+)

diff --git a/kernel-shared/zoned.c b/kernel-shared/zoned.c
index dd1ddd01cfba..e1cb57d938c5 100644
--- a/kernel-shared/zoned.c
+++ b/kernel-shared/zoned.c
@@ -1018,6 +1018,50 @@ static int btrfs_load_block_group_dup(struct btrfs_fs_info *fs_info,
 	return 0;
 }
 
+static int btrfs_load_block_group_raid1(struct btrfs_fs_info *fs_info,
+					struct btrfs_block_group *bg,
+					struct map_lookup *map,
+					struct zone_info *zone_info,
+					unsigned long *active)
+{
+	int i;
+
+	if ((map->type & BTRFS_BLOCK_GROUP_DATA) && !fs_info->stripe_root) {
+		btrfs_err(fs_info, "zoned: data %s needs raid-stripe-tree",
+			  btrfs_bg_type_to_raid_name(map->type));
+		return -EINVAL;
+	}
+
+	/* In case a device is missing we have a cap of 0, so don't use it. */
+	bg->zone_capacity = min_not_zero(zone_info[0].capacity, zone_info[1].capacity);
+
+	for (i = 0; i < map->num_stripes; i++) {
+		if (zone_info[i].alloc_offset == WP_MISSING_DEV ||
+		    zone_info[i].alloc_offset == WP_CONVENTIONAL)
+			continue;
+
+		if (zone_info[0].alloc_offset != zone_info[i].alloc_offset) {
+			btrfs_err(fs_info,
+			"zoned: write pointer offset mismatch of zones in %s profile",
+				  btrfs_bg_type_to_raid_name(map->type));
+			return -EIO;
+		}
+		if (test_bit(0, active) != test_bit(i, active)) {
+			return -EIO;
+		} else {
+			if (test_bit(0, active))
+				bg->zone_is_active = 1;
+		}
+	}
+
+	if (zone_info[0].alloc_offset != WP_MISSING_DEV)
+		bg->alloc_offset = zone_info[0].alloc_offset;
+	else
+		bg->alloc_offset = zone_info[i - 1].alloc_offset;
+
+	return 0;
+}
+
 int btrfs_load_block_group_zone_info(struct btrfs_fs_info *fs_info,
 				     struct btrfs_block_group *cache)
 {
@@ -1109,6 +1153,11 @@ int btrfs_load_block_group_zone_info(struct btrfs_fs_info *fs_info,
 	case BTRFS_BLOCK_GROUP_DUP:
 		ret = btrfs_load_block_group_dup(fs_info, cache, map, zone_info, active);
 		break;
+	case BTRFS_BLOCK_GROUP_RAID1:
+	case BTRFS_BLOCK_GROUP_RAID1C3:
+	case BTRFS_BLOCK_GROUP_RAID1C4:
+		ret = btrfs_load_block_group_raid1(fs_info, cache, map, zone_info, active);
+		break;
 	case BTRFS_BLOCK_GROUP_RAID5:
 	case BTRFS_BLOCK_GROUP_RAID6:
 	default:
-- 
2.48.1



* [PATCH 10/12] btrfs-progs: zoned: implement RAID0 zone info loading
  2025-02-17  2:37 [PATCH 00/12] btrfs-progs: zoned: support zone capacity and Naohiro Aota
                   ` (8 preceding siblings ...)
  2025-02-17  2:37 ` [PATCH 09/12] btrfs-progs: zoned: implement RAID1 " Naohiro Aota
@ 2025-02-17  2:37 ` Naohiro Aota
  2025-02-17  2:37 ` [PATCH 11/12] btrfs-progs: implement RAID10 " Naohiro Aota
  2025-02-17  2:37 ` [PATCH 12/12] btrfs-progs: zoned: fix alloc_offset calculation for partly conventional block groups Naohiro Aota
  11 siblings, 0 replies; 15+ messages in thread
From: Naohiro Aota @ 2025-02-17  2:37 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Naohiro Aota

Implement it just like on the kernel side.

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 kernel-shared/zoned.c | 33 +++++++++++++++++++++++++++++++++
 1 file changed, 33 insertions(+)

diff --git a/kernel-shared/zoned.c b/kernel-shared/zoned.c
index e1cb57d938c5..66d76427d216 100644
--- a/kernel-shared/zoned.c
+++ b/kernel-shared/zoned.c
@@ -1062,6 +1062,36 @@ static int btrfs_load_block_group_raid1(struct btrfs_fs_info *fs_info,
 	return 0;
 }
 
+static int btrfs_load_block_group_raid0(struct btrfs_fs_info *fs_info,
+					struct btrfs_block_group *bg,
+					struct map_lookup *map,
+					struct zone_info *zone_info,
+					unsigned long *active)
+{
+	if ((map->type & BTRFS_BLOCK_GROUP_DATA) && !fs_info->stripe_root) {
+		btrfs_err(fs_info, "zoned: data %s needs raid-stripe-tree",
+			  btrfs_bg_type_to_raid_name(map->type));
+		return -EINVAL;
+	}
+
+	for (int i = 0; i < map->num_stripes; i++) {
+		if (zone_info[i].alloc_offset == WP_MISSING_DEV ||
+		    zone_info[i].alloc_offset == WP_CONVENTIONAL)
+			continue;
+
+		if (test_bit(0, active) != test_bit(i, active)) {
+			return -EIO;
+		} else {
+			if (test_bit(0, active))
+				bg->zone_is_active = 1;
+		}
+		bg->zone_capacity += zone_info[i].capacity;
+		bg->alloc_offset += zone_info[i].alloc_offset;
+	}
+
+	return 0;
+}
+
 int btrfs_load_block_group_zone_info(struct btrfs_fs_info *fs_info,
 				     struct btrfs_block_group *cache)
 {
@@ -1158,6 +1188,9 @@ int btrfs_load_block_group_zone_info(struct btrfs_fs_info *fs_info,
 	case BTRFS_BLOCK_GROUP_RAID1C4:
 		ret = btrfs_load_block_group_raid1(fs_info, cache, map, zone_info, active);
 		break;
+	case BTRFS_BLOCK_GROUP_RAID0:
+		ret = btrfs_load_block_group_raid0(fs_info, cache, map, zone_info, active);
+		break;
 	case BTRFS_BLOCK_GROUP_RAID5:
 	case BTRFS_BLOCK_GROUP_RAID6:
 	default:
-- 
2.48.1



* [PATCH 11/12] btrfs-progs: implement RAID10 zone info loading
  2025-02-17  2:37 [PATCH 00/12] btrfs-progs: zoned: support zone capacity and Naohiro Aota
                   ` (9 preceding siblings ...)
  2025-02-17  2:37 ` [PATCH 10/12] btrfs-progs: zoned: implement RAID0 " Naohiro Aota
@ 2025-02-17  2:37 ` Naohiro Aota
  2025-02-17  2:37 ` [PATCH 12/12] btrfs-progs: zoned: fix alloc_offset calculation for partly conventional block groups Naohiro Aota
  11 siblings, 0 replies; 15+ messages in thread
From: Naohiro Aota @ 2025-02-17  2:37 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Naohiro Aota

Just the same as on the kernel side.
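
The accumulation rule in the hunk below can be read as: stripes within a
mirror group are copies, so only the first zone of every sub_stripes
group contributes capacity and allocation offset. A standalone sketch
with illustrative types (not the btrfs-progs structs):

```c
#include <stdint.h>

struct zinfo {
	uint64_t capacity;
	uint64_t alloc_offset;
};

/* Sum capacity/offset over one representative zone per mirror group. */
static void raid10_accumulate(const struct zinfo *zi, int num_stripes,
			      int sub_stripes, uint64_t *capacity,
			      uint64_t *alloc_offset)
{
	*capacity = 0;
	*alloc_offset = 0;
	for (int i = 0; i < num_stripes; i++) {
		/* The other sub_stripes - 1 zones mirror this one. */
		if (i % sub_stripes == 0) {
			*capacity += zi[i].capacity;
			*alloc_offset += zi[i].alloc_offset;
		}
	}
}
```

With 4 stripes and sub_stripes == 2, only stripes 0 and 2 count, which
matches the RAID10 layout of two striped mirror pairs.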

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 kernel-shared/zoned.c | 36 ++++++++++++++++++++++++++++++++++++
 1 file changed, 36 insertions(+)

diff --git a/kernel-shared/zoned.c b/kernel-shared/zoned.c
index 66d76427d216..484bade1d2ed 100644
--- a/kernel-shared/zoned.c
+++ b/kernel-shared/zoned.c
@@ -1092,6 +1092,39 @@ static int btrfs_load_block_group_raid0(struct btrfs_fs_info *fs_info,
 	return 0;
 }
 
+static int btrfs_load_block_group_raid10(struct btrfs_fs_info *fs_info,
+					 struct btrfs_block_group *bg,
+					 struct map_lookup *map,
+					 struct zone_info *zone_info,
+					 unsigned long *active)
+{
+	if ((map->type & BTRFS_BLOCK_GROUP_DATA) && !fs_info->stripe_root) {
+		btrfs_err(fs_info, "zoned: data %s needs raid-stripe-tree",
+			  btrfs_bg_type_to_raid_name(map->type));
+		return -EINVAL;
+	}
+
+	for (int i = 0; i < map->num_stripes; i++) {
+		if (zone_info[i].alloc_offset == WP_MISSING_DEV ||
+		    zone_info[i].alloc_offset == WP_CONVENTIONAL)
+			continue;
+
+		if (test_bit(0, active) != test_bit(i, active)) {
+			return -EIO;
+		} else {
+			if (test_bit(0, active))
+				bg->zone_is_active = 1;
+		}
+
+		if ((i % map->sub_stripes) == 0) {
+			bg->zone_capacity += zone_info[i].capacity;
+			bg->alloc_offset += zone_info[i].alloc_offset;
+		}
+	}
+
+	return 0;
+}
+
 int btrfs_load_block_group_zone_info(struct btrfs_fs_info *fs_info,
 				     struct btrfs_block_group *cache)
 {
@@ -1191,6 +1224,9 @@ int btrfs_load_block_group_zone_info(struct btrfs_fs_info *fs_info,
 	case BTRFS_BLOCK_GROUP_RAID0:
 		ret = btrfs_load_block_group_raid0(fs_info, cache, map, zone_info, active);
 		break;
+	case BTRFS_BLOCK_GROUP_RAID10:
+		ret = btrfs_load_block_group_raid10(fs_info, cache, map, zone_info, active);
+		break;
 	case BTRFS_BLOCK_GROUP_RAID5:
 	case BTRFS_BLOCK_GROUP_RAID6:
 	default:
-- 
2.48.1



* [PATCH 12/12] btrfs-progs: zoned: fix alloc_offset calculation for partly conventional block groups
  2025-02-17  2:37 [PATCH 00/12] btrfs-progs: zoned: support zone capacity and Naohiro Aota
                   ` (10 preceding siblings ...)
  2025-02-17  2:37 ` [PATCH 11/12] btrfs-progs: implement RAID10 " Naohiro Aota
@ 2025-02-17  2:37 ` Naohiro Aota
  11 siblings, 0 replies; 15+ messages in thread
From: Naohiro Aota @ 2025-02-17  2:37 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Naohiro Aota

When one of the two zones composing a DUP block group is a conventional
zone, zone_info[i].alloc_offset is set to WP_CONVENTIONAL. That will, of
course, not match the write pointer of the other zone, and loading that
block group fails.

This commit solves the issue by properly recovering the emulated write
pointer from the last allocated extent. The offset for SINGLE, DUP, and
RAID1 is straightforward: it is the same as the end of the last
allocated extent. RAID0 and RAID10 are a bit trickier, as we need to do
the striping math.
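
The RAID0 striping math can be sketched as a standalone function
(illustrative names, not the btrfs-progs symbols); for RAID10 the same
computation applies with the stripe count and comparison index divided
by sub_stripes, as in the hunks below:

```c
#include <stdint.h>

/*
 * Recover the emulated write pointer of stripe i in a RAID0 block group
 * whose logical allocation ends at last_alloc bytes: count the full
 * stripe rotations, then account for the partial stripe at the tail.
 */
static uint64_t raid0_zone_alloc_offset(uint64_t last_alloc, uint64_t stripe_len,
					int num_stripes, int i)
{
	uint64_t stripe_nr = last_alloc / stripe_len;
	uint64_t stripe_offset = stripe_nr * stripe_len;   /* start of partial stripe */
	uint64_t full_stripe_nr = stripe_nr / num_stripes; /* completed rotations */
	int stripe_index = stripe_nr % num_stripes;        /* zone holding the tail */
	uint64_t offset = full_stripe_nr * stripe_len;

	if (stripe_index > i)
		offset += stripe_len;                 /* filled in this rotation */
	else if (stripe_index == i)
		offset += last_alloc - stripe_offset; /* holds the partial tail */
	return offset;
}
```

A quick sanity check on this sketch: the per-zone offsets of all stripes
must add back up to last_alloc.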

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 kernel-shared/zoned.c | 65 +++++++++++++++++++++++++++++++++----------
 1 file changed, 51 insertions(+), 14 deletions(-)

diff --git a/kernel-shared/zoned.c b/kernel-shared/zoned.c
index 484bade1d2ed..d96311af70b2 100644
--- a/kernel-shared/zoned.c
+++ b/kernel-shared/zoned.c
@@ -981,7 +981,7 @@ static int btrfs_load_block_group_dup(struct btrfs_fs_info *fs_info,
 				      struct btrfs_block_group *bg,
 				      struct map_lookup *map,
 				      struct zone_info *zone_info,
-				      unsigned long *active)
+				      unsigned long *active, u64 last_alloc)
 {
 	if ((map->type & BTRFS_BLOCK_GROUP_DATA) && !fs_info->stripe_root) {
 		btrfs_err(fs_info, "zoned: data DUP profile needs raid-stripe-tree");
@@ -1002,6 +1002,12 @@ static int btrfs_load_block_group_dup(struct btrfs_fs_info *fs_info,
 			  zone_info[1].physical);
 		return -EIO;
 	}
+
+	if (zone_info[0].alloc_offset == WP_CONVENTIONAL)
+		zone_info[0].alloc_offset = last_alloc;
+	if (zone_info[1].alloc_offset == WP_CONVENTIONAL)
+		zone_info[1].alloc_offset = last_alloc;
+
 	if (zone_info[0].alloc_offset != zone_info[1].alloc_offset) {
 		btrfs_err(fs_info,
 			  "zoned: write pointer offset mismatch of zones in DUP profile");
@@ -1022,7 +1028,7 @@ static int btrfs_load_block_group_raid1(struct btrfs_fs_info *fs_info,
 					struct btrfs_block_group *bg,
 					struct map_lookup *map,
 					struct zone_info *zone_info,
-					unsigned long *active)
+					unsigned long *active, u64 last_alloc)
 {
 	int i;
 
@@ -1036,9 +1042,10 @@ static int btrfs_load_block_group_raid1(struct btrfs_fs_info *fs_info,
 	bg->zone_capacity = min_not_zero(zone_info[0].capacity, zone_info[1].capacity);
 
 	for (i = 0; i < map->num_stripes; i++) {
-		if (zone_info[i].alloc_offset == WP_MISSING_DEV ||
-		    zone_info[i].alloc_offset == WP_CONVENTIONAL)
+		if (zone_info[i].alloc_offset == WP_MISSING_DEV)
 			continue;
+		if (zone_info[i].alloc_offset == WP_CONVENTIONAL)
+			zone_info[i].alloc_offset = last_alloc;
 
 		if (zone_info[0].alloc_offset != zone_info[i].alloc_offset) {
 			btrfs_err(fs_info,
@@ -1066,7 +1073,7 @@ static int btrfs_load_block_group_raid0(struct btrfs_fs_info *fs_info,
 					struct btrfs_block_group *bg,
 					struct map_lookup *map,
 					struct zone_info *zone_info,
-					unsigned long *active)
+					unsigned long *active, u64 last_alloc)
 {
 	if ((map->type & BTRFS_BLOCK_GROUP_DATA) && !fs_info->stripe_root) {
 		btrfs_err(fs_info, "zoned: data %s needs raid-stripe-tree",
@@ -1075,9 +1082,24 @@ static int btrfs_load_block_group_raid0(struct btrfs_fs_info *fs_info,
 	}
 
 	for (int i = 0; i < map->num_stripes; i++) {
-		if (zone_info[i].alloc_offset == WP_MISSING_DEV ||
-		    zone_info[i].alloc_offset == WP_CONVENTIONAL)
+		if (zone_info[i].alloc_offset == WP_MISSING_DEV)
 			continue;
+		if (zone_info[i].alloc_offset == WP_CONVENTIONAL) {
+			u64 stripe_nr, full_stripe_nr;
+			u64 stripe_offset;
+			int stripe_index;
+
+			stripe_nr = last_alloc / map->stripe_len;
+			stripe_offset = stripe_nr * map->stripe_len;
+			full_stripe_nr = stripe_nr / map->num_stripes;
+			stripe_index = stripe_nr % map->num_stripes;
+
+			zone_info[i].alloc_offset = full_stripe_nr * map->stripe_len;
+			if (stripe_index > i)
+				zone_info[i].alloc_offset += map->stripe_len;
+			else if (stripe_index == i)
+				zone_info[i].alloc_offset += (last_alloc - stripe_offset);
+		}
 
 		if (test_bit(0, active) != test_bit(i, active)) {
 			return -EIO;
@@ -1096,7 +1118,7 @@ static int btrfs_load_block_group_raid10(struct btrfs_fs_info *fs_info,
 					 struct btrfs_block_group *bg,
 					 struct map_lookup *map,
 					 struct zone_info *zone_info,
-					 unsigned long *active)
+					 unsigned long *active, u64 last_alloc)
 {
 	if ((map->type & BTRFS_BLOCK_GROUP_DATA) && !fs_info->stripe_root) {
 		btrfs_err(fs_info, "zoned: data %s needs raid-stripe-tree",
@@ -1105,9 +1127,24 @@ static int btrfs_load_block_group_raid10(struct btrfs_fs_info *fs_info,
 	}
 
 	for (int i = 0; i < map->num_stripes; i++) {
-		if (zone_info[i].alloc_offset == WP_MISSING_DEV ||
-		    zone_info[i].alloc_offset == WP_CONVENTIONAL)
+		if (zone_info[i].alloc_offset == WP_MISSING_DEV)
 			continue;
+		if (zone_info[i].alloc_offset == WP_CONVENTIONAL) {
+			u64 stripe_nr, full_stripe_nr;
+			u64 stripe_offset;
+			int stripe_index;
+
+			stripe_nr = last_alloc / map->stripe_len;
+			stripe_offset = stripe_nr * map->stripe_len;
+			full_stripe_nr = stripe_nr / (map->num_stripes / map->sub_stripes);
+			stripe_index = stripe_nr % (map->num_stripes / map->sub_stripes);
+
+			zone_info[i].alloc_offset = full_stripe_nr * map->stripe_len;
+			if (stripe_index > (i / map->sub_stripes))
+				zone_info[i].alloc_offset += map->stripe_len;
+			else if (stripe_index == (i / map->sub_stripes))
+				zone_info[i].alloc_offset += (last_alloc - stripe_offset);
+		}
 
 		if (test_bit(0, active) != test_bit(i, active)) {
 			return -EIO;
@@ -1214,18 +1251,18 @@ int btrfs_load_block_group_zone_info(struct btrfs_fs_info *fs_info,
 		ret = btrfs_load_block_group_single(fs_info, cache, &zone_info[0], active);
 		break;
 	case BTRFS_BLOCK_GROUP_DUP:
-		ret = btrfs_load_block_group_dup(fs_info, cache, map, zone_info, active);
+		ret = btrfs_load_block_group_dup(fs_info, cache, map, zone_info, active, last_alloc);
 		break;
 	case BTRFS_BLOCK_GROUP_RAID1:
 	case BTRFS_BLOCK_GROUP_RAID1C3:
 	case BTRFS_BLOCK_GROUP_RAID1C4:
-		ret = btrfs_load_block_group_raid1(fs_info, cache, map, zone_info, active);
+		ret = btrfs_load_block_group_raid1(fs_info, cache, map, zone_info, active, last_alloc);
 		break;
 	case BTRFS_BLOCK_GROUP_RAID0:
-		ret = btrfs_load_block_group_raid0(fs_info, cache, map, zone_info, active);
+		ret = btrfs_load_block_group_raid0(fs_info, cache, map, zone_info, active, last_alloc);
 		break;
 	case BTRFS_BLOCK_GROUP_RAID10:
-		ret = btrfs_load_block_group_raid10(fs_info, cache, map, zone_info, active);
+		ret = btrfs_load_block_group_raid10(fs_info, cache, map, zone_info, active, last_alloc);
 		break;
 	case BTRFS_BLOCK_GROUP_RAID5:
 	case BTRFS_BLOCK_GROUP_RAID6:
-- 
2.48.1



* Re: [PATCH 07/12] btrfs-progs: zoned: factor out SINGLE zone info loading
  2025-02-17  2:37 ` [PATCH 07/12] btrfs-progs: zoned: factor out SINGLE zone info loading Naohiro Aota
@ 2025-02-17 17:12   ` Johannes Thumshirn
  2025-02-17 17:16     ` Johannes Thumshirn
  0 siblings, 1 reply; 15+ messages in thread
From: Johannes Thumshirn @ 2025-02-17 17:12 UTC (permalink / raw)
  To: Naohiro Aota, linux-btrfs@vger.kernel.org

On 17.02.25 03:39, Naohiro Aota wrote:
> -	/* SINGLE profile case. */
> -	cache->alloc_offset = zone_info[0].alloc_offset;
> -	cache->zone_capacity = zone_info[0].capacity;
> -	cache->zone_is_active = test_bit(0, active);
> +
> +	profile = map->type & BTRFS_BLOCK_GROUP_PROFILE_MASK;
> +	switch (profile) {
> +	case 0: /* single */
> +		ret = btrfs_load_block_group_single(fs_info, cache, &zone_info[0], active);
> +		break;
> +	case BTRFS_BLOCK_GROUP_RAID5:
> +	case BTRFS_BLOCK_GROUP_RAID6:
> +	default:
> +		error("zoned: profile %s not yet supported",
> +		      btrfs_bg_type_to_raid_name(map->type));
> +		ret = -EINVAL;
> +		goto out;
> +	}

The above is missing RAID0/1/10, which on a non-experimental build 
should also error out. I see patch 9 is adding RAID1, but I think this 
patch needs to add those cases as well and error out (for now).


* Re: [PATCH 07/12] btrfs-progs: zoned: factor out SINGLE zone info loading
  2025-02-17 17:12   ` Johannes Thumshirn
@ 2025-02-17 17:16     ` Johannes Thumshirn
  0 siblings, 0 replies; 15+ messages in thread
From: Johannes Thumshirn @ 2025-02-17 17:16 UTC (permalink / raw)
  To: Naohiro Aota, linux-btrfs@vger.kernel.org

On 17.02.25 18:13, Johannes Thumshirn wrote:
> On 17.02.25 03:39, Naohiro Aota wrote:
>> -	/* SINGLE profile case. */
>> -	cache->alloc_offset = zone_info[0].alloc_offset;
>> -	cache->zone_capacity = zone_info[0].capacity;
>> -	cache->zone_is_active = test_bit(0, active);
>> +
>> +	profile = map->type & BTRFS_BLOCK_GROUP_PROFILE_MASK;
>> +	switch (profile) {
>> +	case 0: /* single */
>> +		ret = btrfs_load_block_group_single(fs_info, cache, &zone_info[0], active);
>> +		break;
>> +	case BTRFS_BLOCK_GROUP_RAID5:
>> +	case BTRFS_BLOCK_GROUP_RAID6:
>> +	default:
>> +		error("zoned: profile %s not yet supported",
>> +		      btrfs_bg_type_to_raid_name(map->type));
>> +		ret = -EINVAL;
>> +		goto out;
>> +	}
> 
> The above is missing RAID0/1/10. Which on a non-experimental build
> should also error out. I see patch 9 is adding RAID1 but I think this
> patch needs to add the cases as well and error out (for now).
> 

And obviously DUP as well, sorry.


Thread overview: 15+ messages
2025-02-17  2:37 [PATCH 00/12] btrfs-progs: zoned: support zone capacity and Naohiro Aota
2025-02-17  2:37 ` [PATCH 01/12] btrfs-progs: introduce min_not_zero() Naohiro Aota
2025-02-17  2:37 ` [PATCH 02/12] btrfs-progs: zoned: introduce a zone_info struct in btrfs_load_block_group_zone_info Naohiro Aota
2025-02-17  2:37 ` [PATCH 03/12] btrfs-progs: zoned: support zone capacity Naohiro Aota
2025-02-17  2:37 ` [PATCH 04/12] btrfs-progs: zoned: load zone activeness Naohiro Aota
2025-02-17  2:37 ` [PATCH 05/12] btrfs-progs: zoned: activate block group on loading Naohiro Aota
2025-02-17  2:37 ` [PATCH 06/12] btrfs-progs: factor out btrfs_load_zone_info() Naohiro Aota
2025-02-17  2:37 ` [PATCH 07/12] btrfs-progs: zoned: factor out SINGLE zone info loading Naohiro Aota
2025-02-17 17:12   ` Johannes Thumshirn
2025-02-17 17:16     ` Johannes Thumshirn
2025-02-17  2:37 ` [PATCH 08/12] btrfs-progs: zoned: implement DUP " Naohiro Aota
2025-02-17  2:37 ` [PATCH 09/12] btrfs-progs: zoned: implement RAID1 " Naohiro Aota
2025-02-17  2:37 ` [PATCH 10/12] btrfs-progs: zoned: implement RAID0 " Naohiro Aota
2025-02-17  2:37 ` [PATCH 11/12] btrfs-progs: implement RAID10 " Naohiro Aota
2025-02-17  2:37 ` [PATCH 12/12] btrfs-progs: zoned: fix alloc_offset calculation for partly conventional block groups Naohiro Aota
