* [PATCH v2 0/8] btrfs-progs: zoned: proper "mkfs.btrfs -b" support
@ 2024-05-14 18:22 Naohiro Aota
2024-05-14 18:22 ` [PATCH v2 1/8] btrfs-progs: rename block_count to byte_count Naohiro Aota
` (7 more replies)
0 siblings, 8 replies; 18+ messages in thread
From: Naohiro Aota @ 2024-05-14 18:22 UTC (permalink / raw)
To: linux-btrfs; +Cc: Naohiro Aota
"mkfs.btrfs -b <byte_count>" on a zoned device has several issues:
- The FS size needs to be at least the minimal size that can host a btrfs,
but the calculation of that minimum does not consider non-SINGLE profiles
- The calculation also does not reserve zones for the tree-log BG and the
data relocation BG
- It allows creating a FS whose size is not aligned to the zone boundary
- It resets all zones on the device, including those beyond the specified
length
This series fixes these issues, along with some cleanups.
This version passed the CI workflow here:
https://github.com/naota/btrfs-progs/actions/runs/9083915553
Patches 1 to 3 are cleanup patches, so they should not change the behavior.
Patches 4 to 6 address the issues.
The last two patches handle the test cases. Patch 7 adds a new test for
zone resetting, and patch 8 tweaks an existing test to use a smaller zone
size so that the device has enough zones to meet the new requirement.
Changes:
- v2
- fix function declaration on older distro (non-ZONED setup)
- fix mkfs test failure
Naohiro Aota (8):
btrfs-progs: rename block_count to byte_count
btrfs-progs: mkfs: remove duplicated device size check
btrfs-progs: mkfs: unify zoned mode minimum size calc into
btrfs_min_dev_size()
btrfs-progs: mkfs: fix minimum size calculation for zoned mode
btrfs-progs: mkfs: check if byte_count is zone size aligned
btrfs-progs: support byte length for zone resetting
btrfs-progs: add test for zone resetting
btrfs-progs: test: use smaller emulated zone size
.github/workflows/coverage.yml | 2 +-
.github/workflows/devel.yml | 2 +-
.github/workflows/pull-request.yml | 2 +-
common/device-utils.c | 45 +++++++------
kernel-shared/zoned.c | 23 ++++++-
kernel-shared/zoned.h | 7 +-
mkfs/common.c | 48 +++++++++++++-
mkfs/common.h | 2 +-
mkfs/main.c | 82 ++++++++++--------------
tests/mkfs-tests/030-zoned-rst/test.sh | 7 +-
tests/mkfs-tests/032-zoned-reset/test.sh | 62 ++++++++++++++++++
11 files changed, 202 insertions(+), 80 deletions(-)
create mode 100755 tests/mkfs-tests/032-zoned-reset/test.sh
--
2.45.0
* [PATCH v2 1/8] btrfs-progs: rename block_count to byte_count
2024-05-14 18:22 [PATCH v2 0/8] btrfs-progs: zoned: proper "mkfs.btrfs -b" support Naohiro Aota
@ 2024-05-14 18:22 ` Naohiro Aota
2024-05-14 18:22 ` [PATCH v2 2/8] btrfs-progs: mkfs: remove duplicated device size check Naohiro Aota
` (6 subsequent siblings)
7 siblings, 0 replies; 18+ messages in thread
From: Naohiro Aota @ 2024-05-14 18:22 UTC (permalink / raw)
To: linux-btrfs; +Cc: Naohiro Aota
block_count and dev_block_count hold sizes in bytes, not block counts, so
comparing them with e.g. "min_dev_size" is confusing. Rename them to
reflect the unit.
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
common/device-utils.c | 28 +++++++++++-----------
mkfs/main.c | 56 +++++++++++++++++++++----------------------
2 files changed, 42 insertions(+), 42 deletions(-)
diff --git a/common/device-utils.c b/common/device-utils.c
index d086e9ea2564..86942e0c7041 100644
--- a/common/device-utils.c
+++ b/common/device-utils.c
@@ -222,11 +222,11 @@ out:
* - reset zones
* - delete end of the device
*/
-int btrfs_prepare_device(int fd, const char *file, u64 *block_count_ret,
- u64 max_block_count, unsigned opflags)
+int btrfs_prepare_device(int fd, const char *file, u64 *byte_count_ret,
+ u64 max_byte_count, unsigned opflags)
{
struct btrfs_zoned_device_info *zinfo = NULL;
- u64 block_count;
+ u64 byte_count;
struct stat st;
int i, ret;
@@ -236,13 +236,13 @@ int btrfs_prepare_device(int fd, const char *file, u64 *block_count_ret,
return 1;
}
- block_count = device_get_partition_size_fd_stat(fd, &st);
- if (block_count == 0) {
+ byte_count = device_get_partition_size_fd_stat(fd, &st);
+ if (byte_count == 0) {
error("unable to determine size of %s", file);
return 1;
}
- if (max_block_count)
- block_count = min(block_count, max_block_count);
+ if (max_byte_count)
+ byte_count = min(byte_count, max_byte_count);
if (opflags & PREP_DEVICE_ZONED) {
ret = btrfs_get_zone_info(fd, file, &zinfo);
@@ -276,18 +276,18 @@ int btrfs_prepare_device(int fd, const char *file, u64 *block_count_ret,
if (discard_supported(file)) {
if (opflags & PREP_DEVICE_VERBOSE)
printf("Performing full device TRIM %s (%s) ...\n",
- file, pretty_size(block_count));
- device_discard_blocks(fd, 0, block_count);
+ file, pretty_size(byte_count));
+ device_discard_blocks(fd, 0, byte_count);
}
}
- ret = zero_dev_clamped(fd, zinfo, 0, ZERO_DEV_BYTES, block_count);
+ ret = zero_dev_clamped(fd, zinfo, 0, ZERO_DEV_BYTES, byte_count);
for (i = 0 ; !ret && i < BTRFS_SUPER_MIRROR_MAX; i++)
ret = zero_dev_clamped(fd, zinfo, btrfs_sb_offset(i),
- BTRFS_SUPER_INFO_SIZE, block_count);
+ BTRFS_SUPER_INFO_SIZE, byte_count);
if (!ret && (opflags & PREP_DEVICE_ZERO_END))
- ret = zero_dev_clamped(fd, zinfo, block_count - ZERO_DEV_BYTES,
- ZERO_DEV_BYTES, block_count);
+ ret = zero_dev_clamped(fd, zinfo, byte_count - ZERO_DEV_BYTES,
+ ZERO_DEV_BYTES, byte_count);
if (ret < 0) {
errno = -ret;
@@ -302,7 +302,7 @@ int btrfs_prepare_device(int fd, const char *file, u64 *block_count_ret,
}
free(zinfo);
- *block_count_ret = block_count;
+ *byte_count_ret = byte_count;
return 0;
err:
diff --git a/mkfs/main.c b/mkfs/main.c
index a467795d4428..950f76101058 100644
--- a/mkfs/main.c
+++ b/mkfs/main.c
@@ -80,8 +80,8 @@ static int opt_oflags = O_RDWR;
struct prepare_device_progress {
int fd;
char *file;
- u64 dev_block_count;
- u64 block_count;
+ u64 dev_byte_count;
+ u64 byte_count;
int ret;
};
@@ -1159,8 +1159,8 @@ static void *prepare_one_device(void *ctx)
}
prepare_ctx->ret = btrfs_prepare_device(prepare_ctx->fd,
prepare_ctx->file,
- &prepare_ctx->dev_block_count,
- prepare_ctx->block_count,
+ &prepare_ctx->dev_byte_count,
+ prepare_ctx->byte_count,
(bconf.verbose ? PREP_DEVICE_VERBOSE : 0) |
(opt_zero_end ? PREP_DEVICE_ZERO_END : 0) |
(opt_discard ? PREP_DEVICE_DISCARD : 0) |
@@ -1204,8 +1204,8 @@ int BOX_MAIN(mkfs)(int argc, char **argv)
bool metadata_profile_set = false;
u64 data_profile = 0;
bool data_profile_set = false;
- u64 block_count = 0;
- u64 dev_block_count = 0;
+ u64 byte_count = 0;
+ u64 dev_byte_count = 0;
bool mixed = false;
char *label = NULL;
int nr_global_roots = sysconf(_SC_NPROCESSORS_ONLN);
@@ -1347,7 +1347,7 @@ int BOX_MAIN(mkfs)(int argc, char **argv)
sectorsize = arg_strtou64_with_suffix(optarg);
break;
case 'b':
- block_count = arg_strtou64_with_suffix(optarg);
+ byte_count = arg_strtou64_with_suffix(optarg);
opt_zero_end = false;
break;
case 'v':
@@ -1623,34 +1623,34 @@ int BOX_MAIN(mkfs)(int argc, char **argv)
* Block_count not specified, use file/device size first.
* Or we will always use source_dir_size calculated for mkfs.
*/
- if (!block_count)
- block_count = device_get_partition_size_fd_stat(fd, &statbuf);
+ if (!byte_count)
+ byte_count = device_get_partition_size_fd_stat(fd, &statbuf);
source_dir_size = btrfs_mkfs_size_dir(source_dir, sectorsize,
min_dev_size, metadata_profile, data_profile);
- if (block_count < source_dir_size) {
+ if (byte_count < source_dir_size) {
if (S_ISREG(statbuf.st_mode)) {
- block_count = source_dir_size;
+ byte_count = source_dir_size;
} else {
warning(
"the target device %llu (%s) is smaller than the calculated source directory size %llu (%s), mkfs may fail",
- block_count, pretty_size(block_count),
+ byte_count, pretty_size(byte_count),
source_dir_size, pretty_size(source_dir_size));
}
}
- ret = zero_output_file(fd, block_count);
+ ret = zero_output_file(fd, byte_count);
if (ret) {
error("unable to zero the output file");
close(fd);
goto error;
}
/* our "device" is the new image file */
- dev_block_count = block_count;
+ dev_byte_count = byte_count;
close(fd);
}
- /* Check device/block_count after the nodesize is determined */
- if (block_count && block_count < min_dev_size) {
+ /* Check device/byte_count after the nodesize is determined */
+ if (byte_count && byte_count < min_dev_size) {
error("size %llu is too small to make a usable filesystem",
- block_count);
+ byte_count);
error("minimum size for btrfs filesystem is %llu",
min_dev_size);
goto error;
@@ -1661,9 +1661,9 @@ int BOX_MAIN(mkfs)(int argc, char **argv)
* 1 zone for a metadata block group
* 1 zone for a data block group
*/
- if (opt_zoned && block_count && block_count < 5 * zone_size(file)) {
+ if (opt_zoned && byte_count && byte_count < 5 * zone_size(file)) {
error("size %llu is too small to make a usable filesystem",
- block_count);
+ byte_count);
error("minimum size for a zoned btrfs filesystem is %llu",
min_dev_size);
goto error;
@@ -1741,8 +1741,8 @@ int BOX_MAIN(mkfs)(int argc, char **argv)
/* Start threads */
for (i = 0; i < device_count; i++) {
prepare_ctx[i].file = argv[optind + i - 1];
- prepare_ctx[i].block_count = block_count;
- prepare_ctx[i].dev_block_count = block_count;
+ prepare_ctx[i].byte_count = byte_count;
+ prepare_ctx[i].dev_byte_count = byte_count;
ret = pthread_create(&t_prepare[i], NULL, prepare_one_device,
&prepare_ctx[i]);
if (ret) {
@@ -1763,16 +1763,16 @@ int BOX_MAIN(mkfs)(int argc, char **argv)
goto error;
}
- dev_block_count = prepare_ctx[0].dev_block_count;
- if (block_count && block_count > dev_block_count) {
+ dev_byte_count = prepare_ctx[0].dev_byte_count;
+ if (byte_count && byte_count > dev_byte_count) {
error("%s is smaller than requested size, expected %llu, found %llu",
- file, block_count, dev_block_count);
+ file, byte_count, dev_byte_count);
goto error;
}
/* To create the first block group and chunk 0 in make_btrfs */
system_group_size = (opt_zoned ? zone_size(file) : BTRFS_MKFS_SYSTEM_GROUP_SIZE);
- if (dev_block_count < system_group_size) {
+ if (dev_byte_count < system_group_size) {
error("device is too small to make filesystem, must be at least %llu",
system_group_size);
goto error;
@@ -1794,7 +1794,7 @@ int BOX_MAIN(mkfs)(int argc, char **argv)
mkfs_cfg.label = label;
memcpy(mkfs_cfg.fs_uuid, fs_uuid, sizeof(mkfs_cfg.fs_uuid));
memcpy(mkfs_cfg.dev_uuid, dev_uuid, sizeof(mkfs_cfg.dev_uuid));
- mkfs_cfg.num_bytes = dev_block_count;
+ mkfs_cfg.num_bytes = dev_byte_count;
mkfs_cfg.nodesize = nodesize;
mkfs_cfg.sectorsize = sectorsize;
mkfs_cfg.stripesize = stripesize;
@@ -1889,7 +1889,7 @@ int BOX_MAIN(mkfs)(int argc, char **argv)
file);
continue;
}
- dev_block_count = prepare_ctx[i].dev_block_count;
+ dev_byte_count = prepare_ctx[i].dev_byte_count;
if (prepare_ctx[i].ret) {
errno = -prepare_ctx[i].ret;
@@ -1898,7 +1898,7 @@ int BOX_MAIN(mkfs)(int argc, char **argv)
}
ret = btrfs_add_to_fsid(trans, root, prepare_ctx[i].fd,
- prepare_ctx[i].file, dev_block_count,
+ prepare_ctx[i].file, dev_byte_count,
sectorsize, sectorsize, sectorsize);
if (ret) {
error("unable to add %s to filesystem: %d",
--
2.45.0
* [PATCH v2 2/8] btrfs-progs: mkfs: remove duplicated device size check
2024-05-14 18:22 [PATCH v2 0/8] btrfs-progs: zoned: proper "mkfs.btrfs -b" support Naohiro Aota
2024-05-14 18:22 ` [PATCH v2 1/8] btrfs-progs: rename block_count to byte_count Naohiro Aota
@ 2024-05-14 18:22 ` Naohiro Aota
2024-05-14 18:22 ` [PATCH v2 3/8] btrfs-progs: mkfs: unify zoned mode minimum size calc into btrfs_min_dev_size() Naohiro Aota
` (5 subsequent siblings)
7 siblings, 0 replies; 18+ messages in thread
From: Naohiro Aota @ 2024-05-14 18:22 UTC (permalink / raw)
To: linux-btrfs; +Cc: Naohiro Aota
test_minimum_size() already checks if each device can host the initial
block groups. There is no need to check if the first device can host the
initial system chunk again.
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
mkfs/main.c | 9 ---------
1 file changed, 9 deletions(-)
diff --git a/mkfs/main.c b/mkfs/main.c
index 950f76101058..f6f67abf3b0e 100644
--- a/mkfs/main.c
+++ b/mkfs/main.c
@@ -1189,7 +1189,6 @@ int BOX_MAIN(mkfs)(int argc, char **argv)
struct prepare_device_progress *prepare_ctx = NULL;
struct mkfs_allocation allocation = { 0 };
struct btrfs_mkfs_config mkfs_cfg;
- u64 system_group_size;
/* Options */
bool force_overwrite = false;
struct btrfs_mkfs_features features = btrfs_mkfs_default_features;
@@ -1770,14 +1769,6 @@ int BOX_MAIN(mkfs)(int argc, char **argv)
goto error;
}
- /* To create the first block group and chunk 0 in make_btrfs */
- system_group_size = (opt_zoned ? zone_size(file) : BTRFS_MKFS_SYSTEM_GROUP_SIZE);
- if (dev_byte_count < system_group_size) {
- error("device is too small to make filesystem, must be at least %llu",
- system_group_size);
- goto error;
- }
-
if (btrfs_bg_type_to_tolerated_failures(metadata_profile) <
btrfs_bg_type_to_tolerated_failures(data_profile))
warning("metadata has lower redundancy than data!\n");
--
2.45.0
* [PATCH v2 3/8] btrfs-progs: mkfs: unify zoned mode minimum size calc into btrfs_min_dev_size()
2024-05-14 18:22 [PATCH v2 0/8] btrfs-progs: zoned: proper "mkfs.btrfs -b" support Naohiro Aota
2024-05-14 18:22 ` [PATCH v2 1/8] btrfs-progs: rename block_count to byte_count Naohiro Aota
2024-05-14 18:22 ` [PATCH v2 2/8] btrfs-progs: mkfs: remove duplicated device size check Naohiro Aota
@ 2024-05-14 18:22 ` Naohiro Aota
2024-05-14 18:22 ` [PATCH v2 4/8] btrfs-progs: mkfs: fix minimum size calculation for zoned mode Naohiro Aota
` (4 subsequent siblings)
7 siblings, 0 replies; 18+ messages in thread
From: Naohiro Aota @ 2024-05-14 18:22 UTC (permalink / raw)
To: linux-btrfs; +Cc: Naohiro Aota
We are going to implement a better minimum size calculation for the zoned
mode. Move the current logic to btrfs_min_dev_size() and unify the size
checking path.
Also, convert "int mixed" to "bool mixed" while at it.
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
mkfs/common.c | 11 ++++++++++-
mkfs/common.h | 2 +-
mkfs/main.c | 22 +++++-----------------
3 files changed, 16 insertions(+), 19 deletions(-)
diff --git a/mkfs/common.c b/mkfs/common.c
index 3c48a6c120e7..af54089654a0 100644
--- a/mkfs/common.c
+++ b/mkfs/common.c
@@ -811,13 +811,22 @@ static u64 btrfs_min_global_blk_rsv_size(u32 nodesize)
return (u64)nodesize << 10;
}
-u64 btrfs_min_dev_size(u32 nodesize, int mixed, u64 meta_profile,
+u64 btrfs_min_dev_size(u32 nodesize, bool mixed, u64 zone_size, u64 meta_profile,
u64 data_profile)
{
u64 reserved = 0;
u64 meta_size;
u64 data_size;
+ /*
+ * 2 zones for the primary superblock
+ * 1 zone for the system block group
+ * 1 zone for a metadata block group
+ * 1 zone for a data block group
+ */
+ if (zone_size)
+ return 5 * zone_size;
+
if (mixed)
return 2 * (BTRFS_MKFS_SYSTEM_GROUP_SIZE +
btrfs_min_global_blk_rsv_size(nodesize));
diff --git a/mkfs/common.h b/mkfs/common.h
index d9183c997bb2..de0ff57beee8 100644
--- a/mkfs/common.h
+++ b/mkfs/common.h
@@ -105,7 +105,7 @@ struct btrfs_mkfs_config {
int make_btrfs(int fd, struct btrfs_mkfs_config *cfg);
int btrfs_make_root_dir(struct btrfs_trans_handle *trans,
struct btrfs_root *root, u64 objectid);
-u64 btrfs_min_dev_size(u32 nodesize, int mixed, u64 meta_profile,
+u64 btrfs_min_dev_size(u32 nodesize, bool mixed, u64 zone_size, u64 meta_profile,
u64 data_profile);
int test_minimum_size(const char *file, u64 min_dev_size);
int is_vol_small(const char *file);
diff --git a/mkfs/main.c b/mkfs/main.c
index f6f67abf3b0e..a437ecc40c7f 100644
--- a/mkfs/main.c
+++ b/mkfs/main.c
@@ -1588,8 +1588,9 @@ int BOX_MAIN(mkfs)(int argc, char **argv)
goto error;
}
- min_dev_size = btrfs_min_dev_size(nodesize, mixed, metadata_profile,
- data_profile);
+ min_dev_size = btrfs_min_dev_size(nodesize, mixed,
+ opt_zoned ? zone_size(file) : 0,
+ metadata_profile, data_profile);
/*
* Enlarge the destination file or create a new one, using the size
* calculated from source dir.
@@ -1650,21 +1651,8 @@ int BOX_MAIN(mkfs)(int argc, char **argv)
if (byte_count && byte_count < min_dev_size) {
error("size %llu is too small to make a usable filesystem",
byte_count);
- error("minimum size for btrfs filesystem is %llu",
- min_dev_size);
- goto error;
- }
- /*
- * 2 zones for the primary superblock
- * 1 zone for the system block group
- * 1 zone for a metadata block group
- * 1 zone for a data block group
- */
- if (opt_zoned && byte_count && byte_count < 5 * zone_size(file)) {
- error("size %llu is too small to make a usable filesystem",
- byte_count);
- error("minimum size for a zoned btrfs filesystem is %llu",
- min_dev_size);
+ error("minimum size for a %sbtrfs filesystem is %llu",
+ opt_zoned ? "zoned mode " : "", min_dev_size);
goto error;
}
--
2.45.0
* [PATCH v2 4/8] btrfs-progs: mkfs: fix minimum size calculation for zoned mode
2024-05-14 18:22 [PATCH v2 0/8] btrfs-progs: zoned: proper "mkfs.btrfs -b" support Naohiro Aota
` (2 preceding siblings ...)
2024-05-14 18:22 ` [PATCH v2 3/8] btrfs-progs: mkfs: unify zoned mode minimum size calc into btrfs_min_dev_size() Naohiro Aota
@ 2024-05-14 18:22 ` Naohiro Aota
2024-05-14 22:54 ` Qu Wenruo
2024-05-14 18:22 ` [PATCH v2 5/8] btrfs-progs: mkfs: check if byte_count is zone size aligned Naohiro Aota
` (3 subsequent siblings)
7 siblings, 1 reply; 18+ messages in thread
From: Naohiro Aota @ 2024-05-14 18:22 UTC (permalink / raw)
To: linux-btrfs; +Cc: Naohiro Aota
Currently, we check whether a device has at least 5 zones to decide whether
we can create a btrfs on it. In practice, we need more zones to create DUP
block groups, so mkfs fails with "ERROR: not enough free space to allocate
chunk". Implement proper support for non-SINGLE profiles.
Also, the current code does not ensure that we can create the tree-log BG
and the data relocation BG, which are essential for real usage. Count them
as requirements too.
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
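For reviewers, the zone accounting introduced by this patch can be sketched
as a standalone helper. This is only an illustration under stated
assumptions: the BG_* macros are local stand-ins for the btrfs block group
profile bits (their values are assumed to match the on-disk format header),
and zoned_min_dev_zones() and zoned_min_dev_size() are hypothetical names,
not functions in btrfs-progs.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins for the btrfs profile bits (assumed values). */
#define BG_RAID0	(1ULL << 3)
#define BG_RAID1	(1ULL << 4)
#define BG_DUP		(1ULL << 5)
#define BG_RAID10	(1ULL << 6)
#define BG_RAID1C3	(1ULL << 9)
#define BG_RAID1C4	(1ULL << 10)
#define BG_PROFILE_MASK	(BG_RAID0 | BG_RAID1 | BG_DUP | BG_RAID10 | \
			 BG_RAID1C3 | BG_RAID1C4)

/* Hypothetical helper: minimum number of zones for the given profiles. */
static uint64_t zoned_min_dev_zones(uint64_t meta_profile, uint64_t data_profile)
{
	/* 2 zones for the primary superblock, 3 for the initial block groups. */
	uint64_t zones = 2 + 3;
	/* non-SINGLE: system + metadata + tree-log; SINGLE: tree-log only. */
	uint64_t meta = (meta_profile & BG_PROFILE_MASK) ? 3 : 1;
	/* non-SINGLE: data + data relocation; SINGLE: data relocation only. */
	uint64_t data = (data_profile & BG_PROFILE_MASK) ? 2 : 1;

	/* DUP needs two zones for each block group. */
	if (meta_profile & BG_DUP)
		meta *= 2;
	if (data_profile & BG_DUP)
		data *= 2;

	return zones + meta + data;
}

/* Hypothetical helper: the same requirement expressed in bytes. */
static uint64_t zoned_min_dev_size(uint64_t zone_size, uint64_t meta_profile,
				   uint64_t data_profile)
{
	assert(zone_size != 0);
	return zone_size * zoned_min_dev_zones(meta_profile, data_profile);
}
```

With this accounting, SINGLE/SINGLE needs 7 zones, DUP metadata with SINGLE
data needs 12, and DUP/DUP needs 15, compared with the fixed 5 zones checked
before.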
mkfs/common.c | 53 +++++++++++++++++++++++++++++++++++++++++++--------
1 file changed, 45 insertions(+), 8 deletions(-)
diff --git a/mkfs/common.c b/mkfs/common.c
index af54089654a0..a5100b296f65 100644
--- a/mkfs/common.c
+++ b/mkfs/common.c
@@ -818,14 +818,51 @@ u64 btrfs_min_dev_size(u32 nodesize, bool mixed, u64 zone_size, u64 meta_profile
u64 meta_size;
u64 data_size;
- /*
- * 2 zones for the primary superblock
- * 1 zone for the system block group
- * 1 zone for a metadata block group
- * 1 zone for a data block group
- */
- if (zone_size)
- return 5 * zone_size;
+ if (zone_size) {
+ /* 2 zones for the primary superblock. */
+ reserved += 2 * zone_size;
+
+ /*
+ * 1 zone each for the initial system, metadata, and data block
+ * group
+ */
+ reserved += 3 * zone_size;
+
+ /*
+ * non-SINGLE profile needs:
+ * 1 zone for system block group
+ * 1 zone for normal metadata block group
+ * 1 zone for tree-log block group
+ *
+ * SINGLE profile only needs to add the tree-log block group
+ */
+ if (meta_profile & BTRFS_BLOCK_GROUP_PROFILE_MASK)
+ meta_size = 3 * zone_size;
+ else
+ meta_size = zone_size;
+ /* DUP profile needs two zones for each block group. */
+ if (meta_profile & BTRFS_BLOCK_GROUP_DUP)
+ meta_size *= 2;
+ reserved += meta_size;
+
+ /*
+ * non-SINGLE profile needs:
+ * 1 zone for data block group
+ * 1 zone for data relocation block group
+ *
+ * SINGLE profile only needs to add the data relocation block group
+ */
+ if (data_profile & BTRFS_BLOCK_GROUP_PROFILE_MASK)
+ data_size = 2 * zone_size;
+ else
+ data_size = zone_size;
+ /* DUP profile needs two zones for each block group. */
+ if (data_profile & BTRFS_BLOCK_GROUP_DUP)
+ data_size *= 2;
+ reserved += data_size;
+
+ return reserved;
+ }
if (mixed)
return 2 * (BTRFS_MKFS_SYSTEM_GROUP_SIZE +
--
2.45.0
* [PATCH v2 5/8] btrfs-progs: mkfs: check if byte_count is zone size aligned
2024-05-14 18:22 [PATCH v2 0/8] btrfs-progs: zoned: proper "mkfs.btrfs -b" support Naohiro Aota
` (3 preceding siblings ...)
2024-05-14 18:22 ` [PATCH v2 4/8] btrfs-progs: mkfs: fix minimum size calculation for zoned mode Naohiro Aota
@ 2024-05-14 18:22 ` Naohiro Aota
2024-05-14 22:56 ` Qu Wenruo
2024-05-14 18:22 ` [PATCH v2 6/8] btrfs-progs: support byte length for zone resetting Naohiro Aota
` (2 subsequent siblings)
7 siblings, 1 reply; 18+ messages in thread
From: Naohiro Aota @ 2024-05-14 18:22 UTC (permalink / raw)
To: linux-btrfs; +Cc: Naohiro Aota
Creating a btrfs whose size is not aligned to the zone boundary is
meaningless and allowing it can confuse users. Disallow creating it.
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
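As an illustration of the check added here, the following sketch tests
alignment with a plain modulo, which does not assume a power-of-two zone
size. The helper names are hypothetical and are not part of btrfs-progs;
round_down_to_zone() only shows what size a user could pass to "-b" instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if byte_count is a multiple of zone_size. */
static bool size_is_zone_aligned(uint64_t byte_count, uint64_t zone_size)
{
	assert(zone_size != 0);
	return byte_count % zone_size == 0;
}

/* Hypothetical helper: largest zone-aligned size not above byte_count. */
static uint64_t round_down_to_zone(uint64_t byte_count, uint64_t zone_size)
{
	assert(zone_size != 0);
	return byte_count - (byte_count % zone_size);
}
```

For example, with 4M zones a requested size of 126M is rejected by such a
check, while 124M is accepted.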
mkfs/main.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/mkfs/main.c b/mkfs/main.c
index a437ecc40c7f..faf397848cc4 100644
--- a/mkfs/main.c
+++ b/mkfs/main.c
@@ -1655,6 +1655,11 @@ int BOX_MAIN(mkfs)(int argc, char **argv)
opt_zoned ? "zoned mode " : "", min_dev_size);
goto error;
}
+ if (byte_count && opt_zoned && !IS_ALIGNED(byte_count, zone_size(file))) {
+ error("size %llu is not aligned to zone size %llu", byte_count,
+ zone_size(file));
+ goto error;
+ }
for (i = saved_optind; i < saved_optind + device_count; i++) {
char *path;
--
2.45.0
* [PATCH v2 6/8] btrfs-progs: support byte length for zone resetting
2024-05-14 18:22 [PATCH v2 0/8] btrfs-progs: zoned: proper "mkfs.btrfs -b" support Naohiro Aota
` (4 preceding siblings ...)
2024-05-14 18:22 ` [PATCH v2 5/8] btrfs-progs: mkfs: check if byte_count is zone size aligned Naohiro Aota
@ 2024-05-14 18:22 ` Naohiro Aota
2024-05-14 22:59 ` Qu Wenruo
2024-05-14 18:22 ` [PATCH v2 7/8] btrfs-progs: add test " Naohiro Aota
2024-05-14 18:22 ` [PATCH v2 8/8] btrfs-progs: test: use smaller emulated zone size Naohiro Aota
7 siblings, 1 reply; 18+ messages in thread
From: Naohiro Aota @ 2024-05-14 18:22 UTC (permalink / raw)
To: linux-btrfs; +Cc: Naohiro Aota
Even with "mkfs.btrfs -b", mkfs.btrfs resets all the zones on the device.
Limit the reset to the zones within the specified length.
Also, we need to check that there is no active zone outside of the FS
range. If there is one, btrfs cannot properly honor the active zone limit.
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
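The "no active zone outside of the FS range" rule can be sketched on a
simplified zone table. This is an illustration only: struct zone, enum
zone_cond and active_zone_outside_fs() are hypothetical stand-ins for the
blkzoned report data used by the real code, and the sketch performs no I/O.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the zone conditions of interest. */
enum zone_cond { ZONE_EMPTY, ZONE_IMP_OPEN, ZONE_EXP_OPEN, ZONE_CLOSED, ZONE_FULL };

/* Simplified stand-in for one entry of a zone report. */
struct zone {
	bool conventional;
	enum zone_cond cond;
};

/*
 * Hypothetical helper: return EBUSY if a sequential zone past byte_count is
 * open or closed (that is, active), 0 otherwise. byte_count must be aligned
 * to zone_size.
 */
static int active_zone_outside_fs(const struct zone *zones, size_t nr_zones,
				  uint64_t zone_size, uint64_t byte_count)
{
	uint64_t first_outside;

	assert(zone_size != 0 && byte_count % zone_size == 0);
	first_outside = byte_count / zone_size;

	for (uint64_t i = first_outside; i < nr_zones; i++) {
		/* Conventional zones never count against the active limit. */
		if (zones[i].conventional)
			continue;
		if (zones[i].cond == ZONE_IMP_OPEN ||
		    zones[i].cond == ZONE_EXP_OPEN ||
		    zones[i].cond == ZONE_CLOSED)
			return EBUSY;
	}
	return 0;
}
```

Empty and full zones outside the range are accepted, since they do not
consume an active zone resource.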
common/device-utils.c | 17 ++++++++++++-----
kernel-shared/zoned.c | 23 ++++++++++++++++++++++-
kernel-shared/zoned.h | 7 ++++---
3 files changed, 38 insertions(+), 9 deletions(-)
diff --git a/common/device-utils.c b/common/device-utils.c
index 86942e0c7041..7df7d9ce39d8 100644
--- a/common/device-utils.c
+++ b/common/device-utils.c
@@ -254,16 +254,23 @@ int btrfs_prepare_device(int fd, const char *file, u64 *byte_count_ret,
if (!zinfo->emulated) {
if (opflags & PREP_DEVICE_VERBOSE)
- printf("Resetting device zones %s (%u zones) ...\n",
- file, zinfo->nr_zones);
+ printf("Resetting device zones %s (%llu zones) ...\n",
+ file, byte_count / zinfo->zone_size);
/*
* We cannot ignore zone reset errors for a zoned block
* device as this could result in the inability to write
* to non-empty sequential zones of the device.
*/
- if (btrfs_reset_all_zones(fd, zinfo)) {
- error("zoned: failed to reset device '%s' zones: %m",
- file);
+ ret = btrfs_reset_zones(fd, zinfo, byte_count);
+ if (ret) {
+ if (ret == EBUSY) {
+ error("zoned: device '%s' contains an active zone outside of the FS range",
+ file);
+ error("zoned: btrfs needs full control of active zones");
+ } else {
+ error("zoned: failed to reset device '%s' zones: %m",
+ file);
+ }
goto err;
}
}
diff --git a/kernel-shared/zoned.c b/kernel-shared/zoned.c
index fb1e1388804e..b4244966ca36 100644
--- a/kernel-shared/zoned.c
+++ b/kernel-shared/zoned.c
@@ -395,16 +395,24 @@ static int report_zones(int fd, const char *file,
* Discard blocks in the zones of a zoned block device. Process this with zone
* size granularity so that blocks in conventional zones are discarded using
* discard_range and blocks in sequential zones are reset though a zone reset.
+ *
+ * We need to ensure that zones outside of the FS are not active, so that
+ * the FS can use all the active zones. Return EBUSY if there is an active
+ * zone.
*/
-int btrfs_reset_all_zones(int fd, struct btrfs_zoned_device_info *zinfo)
+int btrfs_reset_zones(int fd, struct btrfs_zoned_device_info *zinfo, u64 byte_count)
{
unsigned int i;
int ret = 0;
ASSERT(zinfo);
+ ASSERT(IS_ALIGNED(byte_count, zinfo->zone_size));
/* Zone size granularity */
for (i = 0; i < zinfo->nr_zones; i++) {
+ if (byte_count == 0)
+ break;
+
if (zinfo->zones[i].type == BLK_ZONE_TYPE_CONVENTIONAL) {
ret = device_discard_blocks(fd,
zinfo->zones[i].start << SECTOR_SHIFT,
@@ -419,7 +427,20 @@ int btrfs_reset_all_zones(int fd, struct btrfs_zoned_device_info *zinfo)
if (ret)
return ret;
+
+ byte_count -= zinfo->zone_size;
}
+ for (; i < zinfo->nr_zones; i++) {
+ const enum blk_zone_cond cond = zinfo->zones[i].cond;
+
+ if (zinfo->zones[i].type == BLK_ZONE_TYPE_CONVENTIONAL)
+ continue;
+ if (cond == BLK_ZONE_COND_IMP_OPEN ||
+ cond == BLK_ZONE_COND_EXP_OPEN ||
+ cond == BLK_ZONE_COND_CLOSED)
+ return EBUSY;
+ }
+
return fsync(fd);
}
diff --git a/kernel-shared/zoned.h b/kernel-shared/zoned.h
index 6eba86d266bf..2bf24cbba62a 100644
--- a/kernel-shared/zoned.h
+++ b/kernel-shared/zoned.h
@@ -149,7 +149,7 @@ bool btrfs_redirty_extent_buffer_for_zoned(struct btrfs_fs_info *fs_info,
u64 start, u64 end);
int btrfs_reset_chunk_zones(struct btrfs_fs_info *fs_info, u64 devid,
u64 offset, u64 length);
-int btrfs_reset_all_zones(int fd, struct btrfs_zoned_device_info *zinfo);
+int btrfs_reset_zones(int fd, struct btrfs_zoned_device_info *zinfo, u64 byte_count);
int zero_zone_blocks(int fd, struct btrfs_zoned_device_info *zinfo, off_t start,
size_t len);
int btrfs_wipe_temporary_sb(struct btrfs_fs_devices *fs_devices);
@@ -203,8 +203,9 @@ static inline int btrfs_reset_chunk_zones(struct btrfs_fs_info *fs_info,
return 0;
}
-static inline int btrfs_reset_all_zones(int fd,
- struct btrfs_zoned_device_info *zinfo)
+static inline int btrfs_reset_zones(int fd,
+ struct btrfs_zoned_device_info *zinfo,
+ u64 byte_count)
{
return -EOPNOTSUPP;
}
--
2.45.0
* [PATCH v2 7/8] btrfs-progs: add test for zone resetting
2024-05-14 18:22 [PATCH v2 0/8] btrfs-progs: zoned: proper "mkfs.btrfs -b" support Naohiro Aota
` (5 preceding siblings ...)
2024-05-14 18:22 ` [PATCH v2 6/8] btrfs-progs: support byte length for zone resetting Naohiro Aota
@ 2024-05-14 18:22 ` Naohiro Aota
2024-05-14 23:04 ` Qu Wenruo
2024-05-14 18:22 ` [PATCH v2 8/8] btrfs-progs: test: use smaller emulated zone size Naohiro Aota
7 siblings, 1 reply; 18+ messages in thread
From: Naohiro Aota @ 2024-05-14 18:22 UTC (permalink / raw)
To: linux-btrfs; +Cc: Naohiro Aota
Add a test for mkfs.btrfs's zone reset behavior to check that
- it resets all the zones without the "-b" option
- it detects an active zone outside of the FS range
- it does not reset a zone outside of the range
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
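The test passes the start of the last zone to "blkzone report -o" in
512-byte sectors. The arithmetic behind last_zone_sector can be sketched as
below; zone_start_sector() is a hypothetical helper used only for
illustration.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SIZE 512ULL

/* Hypothetical helper: first 512-byte sector of the zone at zone_index. */
static uint64_t zone_start_sector(uint64_t zone_index, uint64_t zone_size_bytes)
{
	assert(zone_size_bytes % SECTOR_SIZE == 0);
	return zone_index * (zone_size_bytes / SECTOR_SIZE);
}
```

For the 128M device with 4M zones used in the test, the last zone has index
31 and starts at sector 253952.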
tests/mkfs-tests/032-zoned-reset/test.sh | 62 ++++++++++++++++++++++++
1 file changed, 62 insertions(+)
create mode 100755 tests/mkfs-tests/032-zoned-reset/test.sh
diff --git a/tests/mkfs-tests/032-zoned-reset/test.sh b/tests/mkfs-tests/032-zoned-reset/test.sh
new file mode 100755
index 000000000000..6a599dd2874f
--- /dev/null
+++ b/tests/mkfs-tests/032-zoned-reset/test.sh
@@ -0,0 +1,62 @@
+#!/bin/bash
+# Verify that mkfs.btrfs resets zones only within the FS range on zoned devices
+
+source "$TEST_TOP/common" || exit
+
+setup_root_helper
+prepare_test_dev
+
+nullb="$TEST_TOP/nullb"
+# Create one 128M device with 4M zones, 32 of them
+size=128
+zone=4
+
+run_mayfail $SUDO_HELPER "$nullb" setup
+if [ $? != 0 ]; then
+ _not_run "cannot setup nullb environment for zoned devices"
+fi
+
+# Record any other pre-existing devices in case creation fails
+run_check $SUDO_HELPER "$nullb" ls
+
+# Last line has the name of the device node path
+out=$(run_check_stdout $SUDO_HELPER "$nullb" create -s "$size" -z "$zone")
+if [ $? != 0 ]; then
+ _fail "cannot create nullb zoned device"
+fi
+dev=$(echo "$out" | tail -n 1)
+name=$(basename "${dev}")
+
+run_check $SUDO_HELPER "$nullb" ls
+
+TEST_DEV="${dev}"
+last_zone_sector=$(( 4 * 31 * 1024 * 1024 / 512 ))
+# Write some data to the last zone
+run_check $SUDO_HELPER dd if=/dev/urandom of="${dev}" bs=1M count=4 seek=$(( 4 * 31 ))
+# Use single as it's supported on more kernels
+run_check $SUDO_HELPER "$TOP/mkfs.btrfs" -f -m single -d single "${dev}"
+# Check if the last zone is empty
+$SUDO_HELPER blkzone report -o ${last_zone_sector} -c 1 "${dev}" | grep -Fq '(em)'
+if [ $? != 0 ]; then
+ _fail "last zone is not empty"
+fi
+
+# Write some data to the last zone
+run_check $SUDO_HELPER dd if=/dev/urandom of="${dev}" bs=1M count=1 seek=$(( 4 * 31 ))
+# Create a FS excluding the last zone
+run_mayfail $SUDO_HELPER "$TOP/mkfs.btrfs" -f -b $(( 4 * 31 ))M -m single -d single "${dev}"
+if [ $? == 0 ]; then
+ _fail "mkfs.btrfs should detect active zone outside of FS range"
+fi
+
+# Fill the last zone to finish it
+run_check $SUDO_HELPER dd if=/dev/urandom of="${dev}" bs=1M count=3 seek=$(( 4 * 31 + 1 ))
+# Create a FS excluding the last zone
+run_mayfail $SUDO_HELPER "$TOP/mkfs.btrfs" -f -b $(( 4 * 31 ))M -m single -d single "${dev}"
+# Check if the last zone is not empty
+$SUDO_HELPER blkzone report -o ${last_zone_sector} -c 1 "${dev}" | grep -Fq '(em)'
+if [ $? == 0 ]; then
+ _fail "last zone is empty"
+fi
+
+run_check $SUDO_HELPER "$nullb" rm "${name}"
--
2.45.0
* [PATCH v2 8/8] btrfs-progs: test: use smaller emulated zone size
2024-05-14 18:22 [PATCH v2 0/8] btrfs-progs: zoned: proper "mkfs.btrfs -b" support Naohiro Aota
` (6 preceding siblings ...)
2024-05-14 18:22 ` [PATCH v2 7/8] btrfs-progs: add test " Naohiro Aota
@ 2024-05-14 18:22 ` Naohiro Aota
7 siblings, 0 replies; 18+ messages in thread
From: Naohiro Aota @ 2024-05-14 18:22 UTC (permalink / raw)
To: linux-btrfs; +Cc: Naohiro Aota
With the change in the minimum number of zones, mkfs-tests/030-zoned-rst now
fails because the loopback device is 2GB and can contain only 8x 256MB zones.
Use "--param zone-size=4M" to get a 4MB zone size, the same as in the other
nullb test case.
We also need to enable the "--enable-experimental" configure option in the
CI scripts to use that mkfs.btrfs option. Currently, it is enabled only in
the jobs where the mkfs tests run, but it would be nice to have it in
general, as we need to test development code anyway.
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
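The zone counts behind this change can be checked with a small sketch;
zones_on_device() is a hypothetical helper used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: number of whole zones that fit on a device. */
static uint64_t zones_on_device(uint64_t dev_size, uint64_t zone_size)
{
	assert(zone_size != 0);
	return dev_size / zone_size;
}
```

A 2GB device holds 8 zones of 256MB, which is below, for example, the 10
zones that the accounting in patch 4 asks for with RAID1 metadata and RAID1
data (2 + 3 + 3 + 2). With 4MB zones the same device holds 512 zones.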
.github/workflows/coverage.yml | 2 +-
.github/workflows/devel.yml | 2 +-
.github/workflows/pull-request.yml | 2 +-
tests/mkfs-tests/030-zoned-rst/test.sh | 7 ++++---
4 files changed, 7 insertions(+), 6 deletions(-)
diff --git a/.github/workflows/coverage.yml b/.github/workflows/coverage.yml
index 3aea8cd5f56b..b7f209b3fc51 100644
--- a/.github/workflows/coverage.yml
+++ b/.github/workflows/coverage.yml
@@ -16,7 +16,7 @@ jobs:
- run: sudo modprobe btrfs
- run: sudo apt-get install -y pkg-config gcc liblzo2-dev libzstd-dev libblkid-dev uuid-dev zlib1g-dev libext2fs-dev e2fsprogs libudev-dev python3-sphinx libaio-dev liburing-dev attr jq lcov
- name: Configure
- run: ./autogen.sh && ./configure --disable-documentation
+ run: ./autogen.sh && ./configure --disable-documentation --enable-experimental
- name: Make
run: make V=1 D=gcov
- name: Tests cli
diff --git a/.github/workflows/devel.yml b/.github/workflows/devel.yml
index aca6ed975563..3ac85b0b32e4 100644
--- a/.github/workflows/devel.yml
+++ b/.github/workflows/devel.yml
@@ -71,7 +71,7 @@ jobs:
- run: sudo modprobe btrfs
- run: sudo apt-get install -y pkg-config gcc liblzo2-dev libzstd-dev libblkid-dev uuid-dev zlib1g-dev libext2fs-dev e2fsprogs libudev-dev libaio-dev liburing-dev attr jq
- name: Configure
- run: ./autogen.sh && ./configure --disable-documentation
+ run: ./autogen.sh && ./configure --disable-documentation --enable-experimental
- name: Make
run: make V=1
- name: Tests mkfs
diff --git a/.github/workflows/pull-request.yml b/.github/workflows/pull-request.yml
index 954e1ee5ffb0..9765ea24a2e4 100644
--- a/.github/workflows/pull-request.yml
+++ b/.github/workflows/pull-request.yml
@@ -20,7 +20,7 @@ jobs:
- run: sudo modprobe btrfs
- run: sudo apt-get install -y pkg-config gcc liblzo2-dev libzstd-dev libblkid-dev uuid-dev zlib1g-dev libext2fs-dev e2fsprogs libudev-dev python3-sphinx libaio-dev liburing-dev attr jq
- name: Configure
- run: ./autogen.sh && ./configure --disable-documentation
+ run: ./autogen.sh && ./configure --disable-documentation --enable-experimental
- name: Make
run: make V=1
# - name: Musl build
diff --git a/tests/mkfs-tests/030-zoned-rst/test.sh b/tests/mkfs-tests/030-zoned-rst/test.sh
index 2e048cf79f20..9fa9c8c0d30b 100755
--- a/tests/mkfs-tests/030-zoned-rst/test.sh
+++ b/tests/mkfs-tests/030-zoned-rst/test.sh
@@ -9,17 +9,18 @@ prepare_loopdevs
TEST_DEV=${loopdevs[1]}
profiles="single dup raid1 raid1c3 raid1c4 raid10"
+zoned_param="-O zoned --param zone-size=4M"
for dprofile in $profiles; do
for mprofile in $profiles; do
# It's sufficient to specify only 'zoned', the rst will be enabled
- run_check $SUDO_HELPER "$TOP/mkfs.btrfs" -f -O zoned -d "$dprofile" -m "$mprofile" "${loopdevs[@]}"
+ run_check $SUDO_HELPER "$TOP/mkfs.btrfs" -f ${zoned_param} -d "$dprofile" -m "$mprofile" "${loopdevs[@]}"
done
done
run_mustfail "unsupported profile raid56 created" \
- $SUDO_HELPER "$TOP/mkfs.btrfs" -f -O zoned -d raid5 -m raid5 "${loopdevs[@]}"
+ $SUDO_HELPER "$TOP/mkfs.btrfs" -f ${zoned_param} -d raid5 -m raid5 "${loopdevs[@]}"
run_mustfail "unsupported profile raid56 created" \
- $SUDO_HELPER "$TOP/mkfs.btrfs" -f -O zoned -d raid6 -m raid6 "${loopdevs[@]}"
+ $SUDO_HELPER "$TOP/mkfs.btrfs" -f ${zoned_param} -d raid6 -m raid6 "${loopdevs[@]}"
cleanup_loopdevs
--
2.45.0
* Re: [PATCH v2 4/8] btrfs-progs: mkfs: fix minimum size calculation for zoned mode
2024-05-14 18:22 ` [PATCH v2 4/8] btrfs-progs: mkfs: fix minimum size calculation for zoned mode Naohiro Aota
@ 2024-05-14 22:54 ` Qu Wenruo
2024-05-15 16:25 ` Naohiro Aota
0 siblings, 1 reply; 18+ messages in thread
From: Qu Wenruo @ 2024-05-14 22:54 UTC (permalink / raw)
To: Naohiro Aota, linux-btrfs
On 2024/5/15 03:52, Naohiro Aota wrote:
> Currently, we check if a device is larger than 5 zones to determine we can
> create btrfs on the device or not. Actually, we need more zones to create
> DUP block groups, so it fails with "ERROR: not enough free space to
> allocate chunk". Implement proper support for non-SINGLE profile.
>
> Also, current code does not ensure we can create tree-log BG and data
> relocation BG, which are essential for the real usage. Count them as
> requirement too.
>
> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
> ---
> mkfs/common.c | 53 +++++++++++++++++++++++++++++++++++++++++++--------
> 1 file changed, 45 insertions(+), 8 deletions(-)
>
> diff --git a/mkfs/common.c b/mkfs/common.c
> index af54089654a0..a5100b296f65 100644
> --- a/mkfs/common.c
> +++ b/mkfs/common.c
> @@ -818,14 +818,51 @@ u64 btrfs_min_dev_size(u32 nodesize, bool mixed, u64 zone_size, u64 meta_profile
> u64 meta_size;
> u64 data_size;
>
> - /*
> - * 2 zones for the primary superblock
> - * 1 zone for the system block group
> - * 1 zone for a metadata block group
> - * 1 zone for a data block group
> - */
> - if (zone_size)
> - return 5 * zone_size;
> + if (zone_size) {
> + /* 2 zones for the primary superblock. */
> + reserved += 2 * zone_size;
> +
> + /*
> + * 1 zone each for the initial system, metadata, and data block
> + * group
> + */
> + reserved += 3 * zone_size;
> +
> + /*
> + * non-SINGLE profile needs:
> + * 1 zone for system block group
> + * 1 zone for normal metadata block group
> + * 1 zone for tree-log block group
> + *
> + * SINGLE profile only need to add tree-log block group
This comment looks a little confusing to me.
As (for now) the only non-SINGLE profile for metadata is DUP, they
need at least 2 zones for each bg.
That is only explained later, at the "meta_size *= 2;" line.
Would the following be a little better?
/*
* non-SINGLE profile needs:
* 1 extra system block group
* 1 extra normal metadata block group
* 1 extra tree-log block group
*
* SINGLE profiles needs:
* 1 extra tree-log block group
*/
if (meta_profiles & BTRFS_BLOCK_GROUP_DUP)
factor = 2;
if (meta_profiles & BTRFS_BLOCK_GROUP_PROFILE_MASK)
meta_size = 3 * zone_size * factor;
else
meta_size = 1 * zone_size * factor;
Otherwise looks reasonable to me.
Thanks,
Qu
> + */
> + if (meta_profile & BTRFS_BLOCK_GROUP_PROFILE_MASK)
> + meta_size = 3 * zone_size;
> + else
> + meta_size = zone_size;
> + /* DUP profile needs two zones for each block group. */
> + if (meta_profile & BTRFS_BLOCK_GROUP_DUP)
> + meta_size *= 2;
> + reserved += meta_size;
> +
> + /*
> + * non-SINGLE profile needs:
> + * 1 zone for data block group
> + * 1 zone for data relocation block group
> + *
> + * SINGLE profile only need to add data relocationblock group
> + */
> + if (data_profile & BTRFS_BLOCK_GROUP_PROFILE_MASK)
> + data_size = 2 * zone_size;
> + else
> + data_size = zone_size;
> + /* DUP profile needs two zones for each block group. */
> + if (data_profile & BTRFS_BLOCK_GROUP_DUP)
> + data_size *= 2;
> + reserved += data_size;
> +
> + return reserved;
> + }
>
> if (mixed)
> return 2 * (BTRFS_MKFS_SYSTEM_GROUP_SIZE +
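For reference, the zone accounting in the hunk above can be written as a
standalone function. This is only a sketch under stated assumptions:
sketch_min_zoned_size() and the SKETCH_PROFILE_* bits are hypothetical
stand-ins for btrfs_min_dev_size() and the BTRFS_BLOCK_GROUP_* flags, and the
code counts zones first and multiplies by the zone size at the end, which is
not necessarily how the final patch is written.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the BTRFS_BLOCK_GROUP_* profile bits. */
#define SKETCH_PROFILE_RAID1	(1ULL << 0)
#define SKETCH_PROFILE_DUP	(1ULL << 1)
#define SKETCH_PROFILE_MASK	(SKETCH_PROFILE_RAID1 | SKETCH_PROFILE_DUP)

/*
 * Minimum zoned device size, following the accounting quoted above.
 * A profile value of 0 means SINGLE.
 */
static uint64_t sketch_min_zoned_size(uint64_t zone_size, uint64_t meta_profile,
				      uint64_t data_profile)
{
	uint64_t reserved_zones = 0;
	uint64_t meta_zones;
	uint64_t data_zones;

	assert(zone_size != 0);

	/* 2 zones for the primary superblock. */
	reserved_zones += 2;
	/* 1 zone each for the initial system, metadata, and data block group. */
	reserved_zones += 3;

	/* non-SINGLE: system + metadata + tree-log; SINGLE: tree-log only. */
	meta_zones = (meta_profile & SKETCH_PROFILE_MASK) ? 3 : 1;
	/* DUP needs two zones for each block group. */
	if (meta_profile & SKETCH_PROFILE_DUP)
		meta_zones *= 2;

	/* non-SINGLE: data + data relocation; SINGLE: data relocation only. */
	data_zones = (data_profile & SKETCH_PROFILE_MASK) ? 2 : 1;
	if (data_profile & SKETCH_PROFILE_DUP)
		data_zones *= 2;

	return (reserved_zones + meta_zones + data_zones) * zone_size;
}
```

Under this accounting, SINGLE metadata with SINGLE data needs 7 zones, and DUP
metadata with SINGLE data needs 12 zones, compared with the fixed 5 zones of
the old check.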
* Re: [PATCH v2 5/8] btrfs-progs: mkfs: check if byte_count is zone size aligned
2024-05-14 18:22 ` [PATCH v2 5/8] btrfs-progs: mkfs: check if byte_count is zone size aligned Naohiro Aota
@ 2024-05-14 22:56 ` Qu Wenruo
2024-05-15 15:43 ` Naohiro Aota
0 siblings, 1 reply; 18+ messages in thread
From: Qu Wenruo @ 2024-05-14 22:56 UTC (permalink / raw)
To: Naohiro Aota, linux-btrfs
On 2024/5/15 03:52, Naohiro Aota wrote:
> Creating a btrfs whose size is not aligned to the zone boundary is
> meaningless and allowing it can confuse users. Disallow creating it.
Can we just round it down and give a warning?
I'm pretty sure some users are used to just passing some numbers like
1000000 to "-b" option.
And it may also be a good idea to do the same rounddown for non-zoned fs.
Thanks,
Qu
>
> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
> ---
> mkfs/main.c | 5 +++++
> 1 file changed, 5 insertions(+)
>
> diff --git a/mkfs/main.c b/mkfs/main.c
> index a437ecc40c7f..faf397848cc4 100644
> --- a/mkfs/main.c
> +++ b/mkfs/main.c
> @@ -1655,6 +1655,11 @@ int BOX_MAIN(mkfs)(int argc, char **argv)
> opt_zoned ? "zoned mode " : "", min_dev_size);
> goto error;
> }
> + if (byte_count && opt_zoned && !IS_ALIGNED(byte_count, zone_size(file))) {
> + error("size %llu is not aligned to zone size %llu", byte_count,
> + zone_size(file));
> + goto error;
> + }
>
> for (i = saved_optind; i < saved_optind + device_count; i++) {
> char *path;
* Re: [PATCH v2 6/8] btrfs-progs: support byte length for zone resetting
2024-05-14 18:22 ` [PATCH v2 6/8] btrfs-progs: support byte length for zone resetting Naohiro Aota
@ 2024-05-14 22:59 ` Qu Wenruo
2024-05-15 16:11 ` Naohiro Aota
0 siblings, 1 reply; 18+ messages in thread
From: Qu Wenruo @ 2024-05-14 22:59 UTC (permalink / raw)
To: Naohiro Aota, linux-btrfs
On 2024/5/15 03:52, Naohiro Aota wrote:
> Even with "mkfs.btrfs -b", mkfs.btrfs resets all the zones on the device.
> Limit the reset target within the specified length.
>
> Also, we need to check that there is no active zone outside of the FS
> range. If there is one, btrfs fails to meet the active zone limit properly.
Could you explain more about why an active zone *outside* of the fs range is
a problem?
It's pretty intuitive to treat such active zones outside the fs
range as non-existent, so they should not cause much of a problem (until we
want to expand the fs etc).
This should just act like the data beyond the fs range on traditional
devices, and we have never really bothered with that.
Thanks,
Qu
>
> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
> ---
> common/device-utils.c | 17 ++++++++++++-----
> kernel-shared/zoned.c | 23 ++++++++++++++++++++++-
> kernel-shared/zoned.h | 7 ++++---
> 3 files changed, 38 insertions(+), 9 deletions(-)
>
> diff --git a/common/device-utils.c b/common/device-utils.c
> index 86942e0c7041..7df7d9ce39d8 100644
> --- a/common/device-utils.c
> +++ b/common/device-utils.c
> @@ -254,16 +254,23 @@ int btrfs_prepare_device(int fd, const char *file, u64 *byte_count_ret,
>
> if (!zinfo->emulated) {
> if (opflags & PREP_DEVICE_VERBOSE)
> - printf("Resetting device zones %s (%u zones) ...\n",
> - file, zinfo->nr_zones);
> + printf("Resetting device zones %s (%llu zones) ...\n",
> + file, byte_count / zinfo->zone_size);
> /*
> * We cannot ignore zone reset errors for a zoned block
> * device as this could result in the inability to write
> * to non-empty sequential zones of the device.
> */
> - if (btrfs_reset_all_zones(fd, zinfo)) {
> - error("zoned: failed to reset device '%s' zones: %m",
> - file);
> + ret = btrfs_reset_zones(fd, zinfo, byte_count);
> + if (ret) {
> + if (ret == EBUSY) {
> + error("zoned: device '%s' contains an active zone outside of the FS range",
> + file);
> + error("zoned: btrfs needs full control of active zones");
> + } else {
> + error("zoned: failed to reset device '%s' zones: %m",
> + file);
> + }
> goto err;
> }
> }
> diff --git a/kernel-shared/zoned.c b/kernel-shared/zoned.c
> index fb1e1388804e..b4244966ca36 100644
> --- a/kernel-shared/zoned.c
> +++ b/kernel-shared/zoned.c
> @@ -395,16 +395,24 @@ static int report_zones(int fd, const char *file,
> * Discard blocks in the zones of a zoned block device. Process this with zone
> * size granularity so that blocks in conventional zones are discarded using
> * discard_range and blocks in sequential zones are reset though a zone reset.
> + *
> + * We need to ensure that zones outside of the FS is not active, so that
> + * the FS can use all the active zones. Return EBUSY if there is an active
> + * zone.
> */
> -int btrfs_reset_all_zones(int fd, struct btrfs_zoned_device_info *zinfo)
> +int btrfs_reset_zones(int fd, struct btrfs_zoned_device_info *zinfo, u64 byte_count)
> {
> unsigned int i;
> int ret = 0;
>
> ASSERT(zinfo);
> + ASSERT(IS_ALIGNED(byte_count, zinfo->zone_size));
>
> /* Zone size granularity */
> for (i = 0; i < zinfo->nr_zones; i++) {
> + if (byte_count == 0)
> + break;
> +
> if (zinfo->zones[i].type == BLK_ZONE_TYPE_CONVENTIONAL) {
> ret = device_discard_blocks(fd,
> zinfo->zones[i].start << SECTOR_SHIFT,
> @@ -419,7 +427,20 @@ int btrfs_reset_all_zones(int fd, struct btrfs_zoned_device_info *zinfo)
>
> if (ret)
> return ret;
> +
> + byte_count -= zinfo->zone_size;
> }
> + for (; i < zinfo->nr_zones; i++) {
> + const enum blk_zone_cond cond = zinfo->zones[i].cond;
> +
> + if (zinfo->zones[i].type == BLK_ZONE_TYPE_CONVENTIONAL)
> + continue;
> + if (cond == BLK_ZONE_COND_IMP_OPEN ||
> + cond == BLK_ZONE_COND_EXP_OPEN ||
> + cond == BLK_ZONE_COND_CLOSED)
> + return EBUSY;
> + }
> +
> return fsync(fd);
> }
>
> diff --git a/kernel-shared/zoned.h b/kernel-shared/zoned.h
> index 6eba86d266bf..2bf24cbba62a 100644
> --- a/kernel-shared/zoned.h
> +++ b/kernel-shared/zoned.h
> @@ -149,7 +149,7 @@ bool btrfs_redirty_extent_buffer_for_zoned(struct btrfs_fs_info *fs_info,
> u64 start, u64 end);
> int btrfs_reset_chunk_zones(struct btrfs_fs_info *fs_info, u64 devid,
> u64 offset, u64 length);
> -int btrfs_reset_all_zones(int fd, struct btrfs_zoned_device_info *zinfo);
> +int btrfs_reset_zones(int fd, struct btrfs_zoned_device_info *zinfo, u64 byte_count);
> int zero_zone_blocks(int fd, struct btrfs_zoned_device_info *zinfo, off_t start,
> size_t len);
> int btrfs_wipe_temporary_sb(struct btrfs_fs_devices *fs_devices);
> @@ -203,8 +203,9 @@ static inline int btrfs_reset_chunk_zones(struct btrfs_fs_info *fs_info,
> return 0;
> }
>
> -static inline int btrfs_reset_all_zones(int fd,
> - struct btrfs_zoned_device_info *zinfo)
> +static inline int btrfs_reset_zones(int fd,
> + struct btrfs_zoned_device_info *zinfo,
> + u64 byte_count)
> {
> return -EOPNOTSUPP;
> }
* Re: [PATCH v2 7/8] btrfs-progs: add test for zone resetting
2024-05-14 18:22 ` [PATCH v2 7/8] btrfs-progs: add test " Naohiro Aota
@ 2024-05-14 23:04 ` Qu Wenruo
2024-05-15 16:14 ` Naohiro Aota
0 siblings, 1 reply; 18+ messages in thread
From: Qu Wenruo @ 2024-05-14 23:04 UTC (permalink / raw)
To: Naohiro Aota, linux-btrfs
On 2024/5/15 03:52, Naohiro Aota wrote:
> Add test for mkfs.btrfs's zone reset behavior to check if
>
> - it resets all the zones without "-b" option
> - it detects an active zone outside of the FS range
> - it do not reset a zone outside of the range
>
> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
> ---
> tests/mkfs-tests/032-zoned-reset/test.sh | 62 ++++++++++++++++++++++++
> 1 file changed, 62 insertions(+)
> create mode 100755 tests/mkfs-tests/032-zoned-reset/test.sh
>
> diff --git a/tests/mkfs-tests/032-zoned-reset/test.sh b/tests/mkfs-tests/032-zoned-reset/test.sh
> new file mode 100755
> index 000000000000..6a599dd2874f
> --- /dev/null
> +++ b/tests/mkfs-tests/032-zoned-reset/test.sh
> @@ -0,0 +1,62 @@
> +#!/bin/bash
> +# Verify mkfs for zoned devices support block-group-tree feature
> +
> +source "$TEST_TOP/common" || exit
> +
> +setup_root_helper
> +prepare_test_dev
> +
> +nullb="$TEST_TOP/nullb"
> +# Create one 128M device with 4M zones, 32 of them
> +size=128
> +zone=4
> +
> +run_mayfail $SUDO_HELPER "$nullb" setup
> +if [ $? != 0 ]; then
> + _not_run "cannot setup nullb environment for zoned devices"
> +fi
> +
> +# Record any other pre-existing devices in case creation fails
> +run_check $SUDO_HELPER "$nullb" ls
> +
> +# Last line has the name of the device node path
> +out=$(run_check_stdout $SUDO_HELPER "$nullb" create -s "$size" -z "$zone")
> +if [ $? != 0 ]; then
> + _fail "cannot create nullb zoned device $i"
> +fi
> +dev=$(echo "$out" | tail -n 1)
> +name=$(basename "${dev}")
Can we wrap all the zoned device setup in a common function?
I believe the number of zoned tests will only increase in the future.
> +
> +run_check $SUDO_HELPER "$nullb" ls
> +
> +TEST_DEV="${dev}"
> +last_zone_sector=$(( 4 * 31 * 1024 * 1024 / 512 ))
> +# Write some data to the last zone
> +run_check $SUDO_HELPER dd if=/dev/urandom of="${dev}" bs=1M count=4 seek=$(( 4 * 31 ))
> +# Use single as it's supported on more kernels
> +run_check $SUDO_HELPER "$TOP/mkfs.btrfs" -f -m single -d single "${dev}"
> +# Check if the lat zone is empty
> +$SUDO_HELPER blkzone report -o ${last_zone_sector} -c 1 "${dev}" | grep -Fq '(em)'
You may want to use `run_check_stdout`, as that would dump the command
and its output into the log for easier debugging.
And since the test relies on the external program `blkzone`, you may want
to put all those requirements into a zoned-specific helper like
`check_zoned_prereq()`.
Thanks,
Qu
> +if [ $? != 0 ]; then
> + _fail "last zone is not empty"
> +fi
> +
> +# Write some data to the last zone
> +run_check $SUDO_HELPER dd if=/dev/urandom of="${dev}" bs=1M count=1 seek=$(( 4 * 31 ))
> +# Create a FS excluding the last zone
> +run_mayfail $SUDO_HELPER "$TOP/mkfs.btrfs" -f -b $(( 4 * 31 ))M -m single -d single "${dev}"
> +if [ $? == 0 ]; then
> + _fail "mkfs.btrfs should detect active zone outside of FS range"
> +fi
> +
> +# Fill the last zone to finish it
> +run_check $SUDO_HELPER dd if=/dev/urandom of="${dev}" bs=1M count=3 seek=$(( 4 * 31 + 1 ))
> +# Create a FS excluding the last zone
> +run_mayfail $SUDO_HELPER "$TOP/mkfs.btrfs" -f -b $(( 4 * 31 ))M -m single -d single "${dev}"
> +# Check if the lat zone is not empty
> +$SUDO_HELPER blkzone report -o ${last_zone_sector} -c 1 "${dev}" | grep -Fq '(em)'
> +if [ $? == 0 ]; then
> + _fail "last zone is empty"
> +fi
> +
> +run_check $SUDO_HELPER "$nullb" rm "${name}"
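The last_zone_sector arithmetic in the test above converts a zone index into a
512-byte sector offset for "blkzone report -o". As a minimal sketch of the same
calculation, assuming the 4M zone size and 32 zones the test creates (the
function and macro names below are hypothetical, not part of the test suite):

```c
#include <assert.h>
#include <stdint.h>

/* The test script divides by 512, i.e. it works in 512-byte sectors. */
#define SKETCH_SECTOR_SIZE	512ULL

/* Start of zone 'zone_index' in sectors, for zones of 'zone_size_mib' MiB. */
static uint64_t sketch_zone_start_sector(uint64_t zone_index,
					 uint64_t zone_size_mib)
{
	assert(zone_size_mib != 0);
	return zone_index * zone_size_mib * 1024 * 1024 / SKETCH_SECTOR_SIZE;
}
```

For the last zone of the test device (index 31, 4M zones) this yields sector
253952, the same value as $(( 4 * 31 * 1024 * 1024 / 512 )).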
* Re: [PATCH v2 5/8] btrfs-progs: mkfs: check if byte_count is zone size aligned
2024-05-14 22:56 ` Qu Wenruo
@ 2024-05-15 15:43 ` Naohiro Aota
0 siblings, 0 replies; 18+ messages in thread
From: Naohiro Aota @ 2024-05-15 15:43 UTC (permalink / raw)
To: Qu Wenruo; +Cc: linux-btrfs@vger.kernel.org
On Wed, May 15, 2024 at 08:26:56AM +0930, Qu Wenruo wrote:
>
>
> On 2024/5/15 03:52, Naohiro Aota wrote:
> > Creating a btrfs whose size is not aligned to the zone boundary is
> > meaningless and allowing it can confuse users. Disallow creating it.
>
> Can we just round it down and give a warning?
>
> I'm pretty sure some users are used to just passing some numbers like
> 1000000 to "-b" option.
Sure, that would be nice.
>
> And it may also be a good idea to do the same rounddown for non-zoned fs.
So, round it down to the sector size (4KB)? In fact, we already do it when
we add a device to the FS.
https://github.com/kdave/btrfs-progs/blob/master/mkfs/common.c#L431
https://github.com/kdave/btrfs-progs/blob/master/common/device-scan.c#L144
Still, it would be nice to round it first and then do the size checks. I'll
implement it.
>
> Thanks,
> Qu
> >
> > Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
> > ---
> > mkfs/main.c | 5 +++++
> > 1 file changed, 5 insertions(+)
> >
> > diff --git a/mkfs/main.c b/mkfs/main.c
> > index a437ecc40c7f..faf397848cc4 100644
> > --- a/mkfs/main.c
> > +++ b/mkfs/main.c
> > @@ -1655,6 +1655,11 @@ int BOX_MAIN(mkfs)(int argc, char **argv)
> > opt_zoned ? "zoned mode " : "", min_dev_size);
> > goto error;
> > }
> > + if (byte_count && opt_zoned && !IS_ALIGNED(byte_count, zone_size(file))) {
> > + error("size %llu is not aligned to zone size %llu", byte_count,
> > + zone_size(file));
> > + goto error;
> > + }
> >
> > for (i = saved_optind; i < saved_optind + device_count; i++) {
> > char *path;
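The round-down-and-warn behavior agreed on above could look roughly like the
following. This is a sketch, not the code that will land in mkfs/main.c:
sketch_round_down_size() is a hypothetical helper, the warning text is made up
for illustration, and 'unit' stands for the zone size in zoned mode or the
sector size otherwise.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Round byte_count down to a multiple of unit and warn if that changed the
 * value. The caller is expected to run the minimum size checks afterwards.
 */
static uint64_t sketch_round_down_size(uint64_t byte_count, uint64_t unit)
{
	uint64_t rounded;

	assert(unit != 0);

	rounded = byte_count - (byte_count % unit);
	if (rounded != byte_count)
		fprintf(stderr,
			"WARNING: size %llu is not aligned to %llu, rounding down to %llu\n",
			(unsigned long long)byte_count,
			(unsigned long long)unit,
			(unsigned long long)rounded);
	return rounded;
}
```

Passing 1000000 with a 4KB unit gives 999424, while the same value with a 4M
zone size rounds down to 0, which is why the minimum size check has to come
after the rounding.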
* Re: [PATCH v2 6/8] btrfs-progs: support byte length for zone resetting
2024-05-14 22:59 ` Qu Wenruo
@ 2024-05-15 16:11 ` Naohiro Aota
2024-05-15 21:47 ` Qu Wenruo
0 siblings, 1 reply; 18+ messages in thread
From: Naohiro Aota @ 2024-05-15 16:11 UTC (permalink / raw)
To: Qu Wenruo; +Cc: linux-btrfs@vger.kernel.org
On Wed, May 15, 2024 at 08:29:55AM +0930, Qu Wenruo wrote:
>
>
> On 2024/5/15 03:52, Naohiro Aota wrote:
> > Even with "mkfs.btrfs -b", mkfs.btrfs resets all the zones on the device.
> > Limit the reset target within the specified length.
> >
> > Also, we need to check that there is no active zone outside of the FS
> > range. If there is one, btrfs fails to meet the active zone limit properly.
>
> Could you explain more about why an active zone *outside* of the fs range is
> a problem?
>
> It's pretty intuitive to treat such active zones outside the fs
> range as non-existent, so they should not cause much of a problem (until we
> want to expand the fs etc).
>
> This should just act like the data beyond the fs range on traditional
> devices, and we have never really bothered with that.
A zoned device may have an upper limit on the number of active zones, so
you cannot write into zones beyond that limit at the same time.
https://zonedstorage.io/docs/introduction/zns#zone-resources-limits
So, if there is an active zone outside the FS, btrfs cannot use all the
active zones for itself. In the worst case, if the active zone limit is
8 and 5 zones are already active outside the FS, we cannot maintain the
minimum of 4 active zones we need: superblock, data, metadata, and system
block group.
Technically, we could scan all the device zones to count the active zones and
try to live with the rest. But I don't see a clear use case for that.
However ... I just noticed that we already do it that way, because the current
mount code never checks btrfs_device->total_bytes. The minimum active zone
requirement check is broken for the "-b" case, though.
I believe mandating no active zones outside the FS, both at mkfs and mount
time, is the clean way to go unless there is a request with a good reason.
> Thanks,
> Qu
>
> >
> > Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
> > ---
> > common/device-utils.c | 17 ++++++++++++-----
> > kernel-shared/zoned.c | 23 ++++++++++++++++++++++-
> > kernel-shared/zoned.h | 7 ++++---
> > 3 files changed, 38 insertions(+), 9 deletions(-)
> >
> > diff --git a/common/device-utils.c b/common/device-utils.c
> > index 86942e0c7041..7df7d9ce39d8 100644
> > --- a/common/device-utils.c
> > +++ b/common/device-utils.c
> > @@ -254,16 +254,23 @@ int btrfs_prepare_device(int fd, const char *file, u64 *byte_count_ret,
> >
> > if (!zinfo->emulated) {
> > if (opflags & PREP_DEVICE_VERBOSE)
> > - printf("Resetting device zones %s (%u zones) ...\n",
> > - file, zinfo->nr_zones);
> > + printf("Resetting device zones %s (%llu zones) ...\n",
> > + file, byte_count / zinfo->zone_size);
> > /*
> > * We cannot ignore zone reset errors for a zoned block
> > * device as this could result in the inability to write
> > * to non-empty sequential zones of the device.
> > */
> > - if (btrfs_reset_all_zones(fd, zinfo)) {
> > - error("zoned: failed to reset device '%s' zones: %m",
> > - file);
> > + ret = btrfs_reset_zones(fd, zinfo, byte_count);
> > + if (ret) {
> > + if (ret == EBUSY) {
> > + error("zoned: device '%s' contains an active zone outside of the FS range",
> > + file);
> > + error("zoned: btrfs needs full control of active zones");
> > + } else {
> > + error("zoned: failed to reset device '%s' zones: %m",
> > + file);
> > + }
> > goto err;
> > }
> > }
> > diff --git a/kernel-shared/zoned.c b/kernel-shared/zoned.c
> > index fb1e1388804e..b4244966ca36 100644
> > --- a/kernel-shared/zoned.c
> > +++ b/kernel-shared/zoned.c
> > @@ -395,16 +395,24 @@ static int report_zones(int fd, const char *file,
> > * Discard blocks in the zones of a zoned block device. Process this with zone
> > * size granularity so that blocks in conventional zones are discarded using
> > * discard_range and blocks in sequential zones are reset though a zone reset.
> > + *
> > + * We need to ensure that zones outside of the FS is not active, so that
> > + * the FS can use all the active zones. Return EBUSY if there is an active
> > + * zone.
> > */
> > -int btrfs_reset_all_zones(int fd, struct btrfs_zoned_device_info *zinfo)
> > +int btrfs_reset_zones(int fd, struct btrfs_zoned_device_info *zinfo, u64 byte_count)
> > {
> > unsigned int i;
> > int ret = 0;
> >
> > ASSERT(zinfo);
> > + ASSERT(IS_ALIGNED(byte_count, zinfo->zone_size));
> >
> > /* Zone size granularity */
> > for (i = 0; i < zinfo->nr_zones; i++) {
> > + if (byte_count == 0)
> > + break;
> > +
> > if (zinfo->zones[i].type == BLK_ZONE_TYPE_CONVENTIONAL) {
> > ret = device_discard_blocks(fd,
> > zinfo->zones[i].start << SECTOR_SHIFT,
> > @@ -419,7 +427,20 @@ int btrfs_reset_all_zones(int fd, struct btrfs_zoned_device_info *zinfo)
> >
> > if (ret)
> > return ret;
> > +
> > + byte_count -= zinfo->zone_size;
> > }
> > + for (; i < zinfo->nr_zones; i++) {
> > + const enum blk_zone_cond cond = zinfo->zones[i].cond;
> > +
> > + if (zinfo->zones[i].type == BLK_ZONE_TYPE_CONVENTIONAL)
> > + continue;
> > + if (cond == BLK_ZONE_COND_IMP_OPEN ||
> > + cond == BLK_ZONE_COND_EXP_OPEN ||
> > + cond == BLK_ZONE_COND_CLOSED)
> > + return EBUSY;
> > + }
> > +
> > return fsync(fd);
> > }
> >
> > diff --git a/kernel-shared/zoned.h b/kernel-shared/zoned.h
> > index 6eba86d266bf..2bf24cbba62a 100644
> > --- a/kernel-shared/zoned.h
> > +++ b/kernel-shared/zoned.h
> > @@ -149,7 +149,7 @@ bool btrfs_redirty_extent_buffer_for_zoned(struct btrfs_fs_info *fs_info,
> > u64 start, u64 end);
> > int btrfs_reset_chunk_zones(struct btrfs_fs_info *fs_info, u64 devid,
> > u64 offset, u64 length);
> > -int btrfs_reset_all_zones(int fd, struct btrfs_zoned_device_info *zinfo);
> > +int btrfs_reset_zones(int fd, struct btrfs_zoned_device_info *zinfo, u64 byte_count);
> > int zero_zone_blocks(int fd, struct btrfs_zoned_device_info *zinfo, off_t start,
> > size_t len);
> > int btrfs_wipe_temporary_sb(struct btrfs_fs_devices *fs_devices);
> > @@ -203,8 +203,9 @@ static inline int btrfs_reset_chunk_zones(struct btrfs_fs_info *fs_info,
> > return 0;
> > }
> >
> > -static inline int btrfs_reset_all_zones(int fd,
> > - struct btrfs_zoned_device_info *zinfo)
> > +static inline int btrfs_reset_zones(int fd,
> > + struct btrfs_zoned_device_info *zinfo,
> > + u64 byte_count)
> > {
> > return -EOPNOTSUPP;
> > }
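The active zone budget argument above can be illustrated with a small model.
This is a simplified sketch, not btrfs-progs code: struct sketch_zone and the
SKETCH_COND_* values are hypothetical stand-ins for struct blk_zone and the
BLK_ZONE_COND_* conditions, and it assumes the device reports a non-zero
active zone limit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced model of a zone's state. */
enum sketch_zone_cond {
	SKETCH_COND_EMPTY,
	SKETCH_COND_IMP_OPEN,
	SKETCH_COND_EXP_OPEN,
	SKETCH_COND_CLOSED,
	SKETCH_COND_FULL,
};

struct sketch_zone {
	bool conventional;
	enum sketch_zone_cond cond;
};

/* A sequential zone is active while it is open or closed. */
static bool sketch_zone_is_active(const struct sketch_zone *zone)
{
	if (zone->conventional)
		return false;
	return zone->cond == SKETCH_COND_IMP_OPEN ||
	       zone->cond == SKETCH_COND_EXP_OPEN ||
	       zone->cond == SKETCH_COND_CLOSED;
}

/* Count the active zones at index fs_zones and beyond, i.e. outside the FS. */
static unsigned int sketch_active_outside_fs(const struct sketch_zone *zones,
					     unsigned int nr_zones,
					     unsigned int fs_zones)
{
	unsigned int active = 0;
	unsigned int i;

	assert(zones != NULL);

	for (i = fs_zones; i < nr_zones; i++) {
		if (sketch_zone_is_active(&zones[i]))
			active++;
	}
	return active;
}

/* Can the FS still keep 'needed' zones active within the device limit? */
static bool sketch_active_budget_ok(unsigned int max_active,
				    unsigned int active_outside,
				    unsigned int needed)
{
	return active_outside <= max_active &&
	       max_active - active_outside >= needed;
}
```

With the numbers from the mail (limit 8, 5 zones active outside the FS, 4
needed) the budget check fails, since only 3 active zones remain for btrfs.
The patch itself takes the stricter route and returns EBUSY as soon as it
finds one active zone outside the FS range.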
* Re: [PATCH v2 7/8] btrfs-progs: add test for zone resetting
2024-05-14 23:04 ` Qu Wenruo
@ 2024-05-15 16:14 ` Naohiro Aota
0 siblings, 0 replies; 18+ messages in thread
From: Naohiro Aota @ 2024-05-15 16:14 UTC (permalink / raw)
To: Qu Wenruo; +Cc: linux-btrfs@vger.kernel.org
On Wed, May 15, 2024 at 08:34:57AM +0930, Qu Wenruo wrote:
>
>
> On 2024/5/15 03:52, Naohiro Aota wrote:
> > Add test for mkfs.btrfs's zone reset behavior to check if
> >
> > - it resets all the zones without "-b" option
> > - it detects an active zone outside of the FS range
> > - it do not reset a zone outside of the range
> >
> > Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
> > ---
> > tests/mkfs-tests/032-zoned-reset/test.sh | 62 ++++++++++++++++++++++++
> > 1 file changed, 62 insertions(+)
> > create mode 100755 tests/mkfs-tests/032-zoned-reset/test.sh
> >
> > diff --git a/tests/mkfs-tests/032-zoned-reset/test.sh b/tests/mkfs-tests/032-zoned-reset/test.sh
> > new file mode 100755
> > index 000000000000..6a599dd2874f
> > --- /dev/null
> > +++ b/tests/mkfs-tests/032-zoned-reset/test.sh
> > @@ -0,0 +1,62 @@
> > +#!/bin/bash
> > +# Verify mkfs for zoned devices support block-group-tree feature
> > +
> > +source "$TEST_TOP/common" || exit
> > +
> > +setup_root_helper
> > +prepare_test_dev
> > +
> > +nullb="$TEST_TOP/nullb"
> > +# Create one 128M device with 4M zones, 32 of them
> > +size=128
> > +zone=4
> > +
> > +run_mayfail $SUDO_HELPER "$nullb" setup
> > +if [ $? != 0 ]; then
> > + _not_run "cannot setup nullb environment for zoned devices"
> > +fi
> > +
> > +# Record any other pre-existing devices in case creation fails
> > +run_check $SUDO_HELPER "$nullb" ls
> > +
> > +# Last line has the name of the device node path
> > +out=$(run_check_stdout $SUDO_HELPER "$nullb" create -s "$size" -z "$zone")
> > +if [ $? != 0 ]; then
> > + _fail "cannot create nullb zoned device $i"
> > +fi
> > +dev=$(echo "$out" | tail -n 1)
> > +name=$(basename "${dev}")
>
> Can we wrap all the zoned device setup in a common function?
>
> I believe the number of zoned tests will only increase in the future.
Sounds good. Then, we can migrate 030-zoned-rst to use it too, as there is
no reason to use a loop device there.
>
> > +
> > +run_check $SUDO_HELPER "$nullb" ls
> > +
> > +TEST_DEV="${dev}"
> > +last_zone_sector=$(( 4 * 31 * 1024 * 1024 / 512 ))
> > +# Write some data to the last zone
> > +run_check $SUDO_HELPER dd if=/dev/urandom of="${dev}" bs=1M count=4 seek=$(( 4 * 31 ))
> > +# Use single as it's supported on more kernels
> > +run_check $SUDO_HELPER "$TOP/mkfs.btrfs" -f -m single -d single "${dev}"
> > +# Check if the lat zone is empty
> > +$SUDO_HELPER blkzone report -o ${last_zone_sector} -c 1 "${dev}" | grep -Fq '(em)'
>
> You may want to use `run_check_stdout`, as that would dump the command
> and its output into the log for easier debugging.
>
> And since the test relies on the external program `blkzone`, you may want
> to put all those requirements into a zoned-specific helper like
> `check_zoned_prereq()`.
Will do. Thank you.
> Thanks,
> Qu
>
> > +if [ $? != 0 ]; then
> > + _fail "last zone is not empty"
> > +fi
> > +
> > +# Write some data to the last zone
> > +run_check $SUDO_HELPER dd if=/dev/urandom of="${dev}" bs=1M count=1 seek=$(( 4 * 31 ))
> > +# Create a FS excluding the last zone
> > +run_mayfail $SUDO_HELPER "$TOP/mkfs.btrfs" -f -b $(( 4 * 31 ))M -m single -d single "${dev}"
> > +if [ $? == 0 ]; then
> > + _fail "mkfs.btrfs should detect active zone outside of FS range"
> > +fi
> > +
> > +# Fill the last zone to finish it
> > +run_check $SUDO_HELPER dd if=/dev/urandom of="${dev}" bs=1M count=3 seek=$(( 4 * 31 + 1 ))
> > +# Create a FS excluding the last zone
> > +run_mayfail $SUDO_HELPER "$TOP/mkfs.btrfs" -f -b $(( 4 * 31 ))M -m single -d single "${dev}"
> > +# Check if the lat zone is not empty
> > +$SUDO_HELPER blkzone report -o ${last_zone_sector} -c 1 "${dev}" | grep -Fq '(em)'
> > +if [ $? == 0 ]; then
> > + _fail "last zone is empty"
> > +fi
> > +
> > +run_check $SUDO_HELPER "$nullb" rm "${name}"
* Re: [PATCH v2 4/8] btrfs-progs: mkfs: fix minimum size calculation for zoned mode
2024-05-14 22:54 ` Qu Wenruo
@ 2024-05-15 16:25 ` Naohiro Aota
0 siblings, 0 replies; 18+ messages in thread
From: Naohiro Aota @ 2024-05-15 16:25 UTC (permalink / raw)
To: Qu Wenruo; +Cc: linux-btrfs@vger.kernel.org
On Wed, May 15, 2024 at 08:24:46AM +0930, Qu Wenruo wrote:
>
>
> On 2024/5/15 03:52, Naohiro Aota wrote:
> > Currently, we check if a device is larger than 5 zones to determine we can
> > create btrfs on the device or not. Actually, we need more zones to create
> > DUP block groups, so it fails with "ERROR: not enough free space to
> > allocate chunk". Implement proper support for non-SINGLE profile.
> >
> > Also, current code does not ensure we can create tree-log BG and data
> > relocation BG, which are essential for the real usage. Count them as
> > requirement too.
> >
> > Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
> > ---
> > mkfs/common.c | 53 +++++++++++++++++++++++++++++++++++++++++++--------
> > 1 file changed, 45 insertions(+), 8 deletions(-)
> >
> > diff --git a/mkfs/common.c b/mkfs/common.c
> > index af54089654a0..a5100b296f65 100644
> > --- a/mkfs/common.c
> > +++ b/mkfs/common.c
> > @@ -818,14 +818,51 @@ u64 btrfs_min_dev_size(u32 nodesize, bool mixed, u64 zone_size, u64 meta_profile
> > u64 meta_size;
> > u64 data_size;
> >
> > - /*
> > - * 2 zones for the primary superblock
> > - * 1 zone for the system block group
> > - * 1 zone for a metadata block group
> > - * 1 zone for a data block group
> > - */
> > - if (zone_size)
> > - return 5 * zone_size;
> > + if (zone_size) {
> > + /* 2 zones for the primary superblock. */
> > + reserved += 2 * zone_size;
> > +
> > + /*
> > + * 1 zone each for the initial system, metadata, and data block
> > + * group
> > + */
> > + reserved += 3 * zone_size;
> > +
> > + /*
> > + * non-SINGLE profile needs:
> > + * 1 zone for system block group
> > + * 1 zone for normal metadata block group
> > + * 1 zone for tree-log block group
> > + *
> > + * SINGLE profile only need to add tree-log block group
>
> This comment looks a little confusing to me.
>
> As (for now) the only non-SINGLE profile for metadata is DUP, they
> need at least 2 zones for each bg.
RAID is also supported in an experimental build ;-)
> That is only explained later, at the "meta_size *= 2;" line.
>
> Would the following be a little better?
>
> /*
> * non-SINGLE profile needs:
> * 1 extra system block group
> * 1 extra normal metadata block group
> * 1 extra tree-log block group
> *
> * SINGLE profiles needs:
> * 1 extra tree-log block group
> */
> if (meta_profiles & BTRFS_BLOCK_GROUP_DUP)
> factor = 2;
> if (meta_profiles & BTRFS_BLOCK_GROUP_PROFILE_MASK)
> meta_size = 3 * zone_size * factor;
> else
> meta_size = 1 * zone_size * factor;
>
> Otherwise looks reasonable to me.
I followed the regular case code, but this looks cleaner to me. I'll follow
your suggestion and tweak the comment as well.
> Thanks,
> Qu
> > + */
> > + if (meta_profile & BTRFS_BLOCK_GROUP_PROFILE_MASK)
> > + meta_size = 3 * zone_size;
> > + else
> > + meta_size = zone_size;
> > + /* DUP profile needs two zones for each block group. */
> > + if (meta_profile & BTRFS_BLOCK_GROUP_DUP)
> > + meta_size *= 2;
> > + reserved += meta_size;
> > +
> > + /*
> > + * non-SINGLE profile needs:
> > + * 1 zone for data block group
> > + * 1 zone for data relocation block group
> > + *
> > + * SINGLE profile only need to add data relocationblock group
> > + */
> > + if (data_profile & BTRFS_BLOCK_GROUP_PROFILE_MASK)
> > + data_size = 2 * zone_size;
> > + else
> > + data_size = zone_size;
> > + /* DUP profile needs two zones for each block group. */
> > + if (data_profile & BTRFS_BLOCK_GROUP_DUP)
> > + data_size *= 2;
> > + reserved += data_size;
> > +
> > + return reserved;
> > + }
> >
> > if (mixed)
> > return 2 * (BTRFS_MKFS_SYSTEM_GROUP_SIZE +
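
For reference, the zoned minimum-size arithmetic discussed in this message can be written out as a small standalone sketch. This is not the btrfs-progs code: sketch_zoned_min_size() and the SKETCH_BG_* flags are hypothetical stand-ins for btrfs_min_dev_size() and the BTRFS_BLOCK_GROUP_* macros, and the layout follows the factor-style variant suggested in the review.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in flags for this sketch only; real code uses BTRFS_BLOCK_GROUP_*. */
#define SKETCH_BG_RAID1        (1ULL << 4)
#define SKETCH_BG_DUP          (1ULL << 5)
#define SKETCH_BG_PROFILE_MASK (SKETCH_BG_RAID1 | SKETCH_BG_DUP)

/*
 * Zones needed on top of the initial block groups for one profile.
 * A SINGLE profile adds 1 block group, a non-SINGLE profile adds
 * non_single_bgs block groups, and DUP doubles the zones per block group.
 */
uint64_t sketch_bg_zones(uint64_t profile, uint64_t non_single_bgs)
{
	uint64_t bgs = (profile & SKETCH_BG_PROFILE_MASK) ? non_single_bgs : 1;
	uint64_t factor = (profile & SKETCH_BG_DUP) ? 2 : 1;

	return bgs * factor;
}

uint64_t sketch_zoned_min_size(uint64_t zone_size, uint64_t meta_profile,
			       uint64_t data_profile)
{
	uint64_t reserved;

	assert(zone_size > 0);

	/* 2 zones for the primary superblock. */
	reserved = 2 * zone_size;
	/* 1 zone each for the initial system, metadata, and data block group. */
	reserved += 3 * zone_size;
	/* Metadata: non-SINGLE adds system, metadata, and tree-log block groups. */
	reserved += sketch_bg_zones(meta_profile, 3) * zone_size;
	/* Data: non-SINGLE adds data and data relocation block groups. */
	reserved += sketch_bg_zones(data_profile, 2) * zone_size;

	return reserved;
}
```

Under these assumptions, SINGLE metadata with SINGLE data comes to 7 zones, and DUP metadata with SINGLE data comes to 12 zones.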
* Re: [PATCH v2 6/8] btrfs-progs: support byte length for zone resetting
2024-05-15 16:11 ` Naohiro Aota
@ 2024-05-15 21:47 ` Qu Wenruo
0 siblings, 0 replies; 18+ messages in thread
From: Qu Wenruo @ 2024-05-15 21:47 UTC (permalink / raw)
To: Naohiro Aota; +Cc: linux-btrfs@vger.kernel.org
On 2024/5/16 01:41, Naohiro Aota wrote:
> On Wed, May 15, 2024 at 08:29:55AM +0930, Qu Wenruo wrote:
>>
>>
>> On 2024/5/15 03:52, Naohiro Aota wrote:
>>> Even with "mkfs.btrfs -b", mkfs.btrfs resets all the zones on the device.
>>> Limit the reset target within the specified length.
>>>
>>> Also, we need to check that there is no active zone outside of the FS
>>> range. If there is one, btrfs fails to meet the active zone limit properly.
>>
>> Mind explaining more about why an active zone *outside* of the fs range
>> is a problem?
>>
>> It's pretty intuitive to consider such active zones outside the fs
>> range as non-existent, so they should not cause much of a problem (until
>> we want to expand the fs etc).
>>
>> This should just act like the data beyond the fs range on traditional
>> devices, and we never really bothered with it.
>
> A zoned device may have an upper limit on the number of active zones, so
> you cannot write into zones beyond that limit at the same time.
>
> https://zonedstorage.io/docs/introduction/zns#zone-resources-limits
Oh, I forgot the active zone limits.
>
> So, if we have an active zone outside the FS, btrfs cannot use all of the
> device's active zones for itself. In the worst case, if you have an active
> zone limit = 8 and 5 zones are already active outside the FS, we cannot
> maintain the minimum necessary 4 active zones: superblock, data, metadata,
> and system block group.
>
> Technically, we can scan all the device zones to count active zones and try
> to live with the rest. But, I don't see a clear use case for that.
>
> However ... I just noticed we actually do so, because the current mount
> code never checks btrfs_device->total_bytes. The minimum active zone
> requirement check is broken for the "-b" case, though.
A new series for the kernel would be great.
>
> I believe mandating no active zones outside the FS both at mkfs and mount
> time is a clean way to go unless there is a request with a good reason.
Yeah, requiring no active zones sounds very reasonable now.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Thanks,
Qu
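
To spell out the arithmetic in the worst-case example above, here is a minimal sketch. The helper name sketch_can_meet_minimum() and the constant SKETCH_MIN_ACTIVE_ZONES are made up for illustration and do not exist in btrfs-progs or the kernel; the value 4 is the minimum named in the discussion (superblock, data, metadata, and system block group).

```c
#include <assert.h>
#include <stdbool.h>

/* Minimum from the discussion: superblock, data, metadata, system BG. */
#define SKETCH_MIN_ACTIVE_ZONES 4u

/*
 * Return true if the FS can still keep its minimum number of zones active
 * when active_outside zones beyond the FS range already count against the
 * device's active zone limit.
 */
bool sketch_can_meet_minimum(unsigned int max_active,
			     unsigned int active_outside)
{
	unsigned int left;

	assert(max_active > 0);

	/* Zones outside the FS consume the same device-wide budget. */
	left = active_outside >= max_active ? 0 : max_active - active_outside;

	return left >= SKETCH_MIN_ACTIVE_ZONES;
}
```

With a limit of 8 and 5 zones active outside the FS, only 3 remain, so the check fails; with no active zones outside the FS, the full budget of 8 is available.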
>
>> Thanks,
>> Qu
>>
>>>
>>> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
>>> ---
>>> common/device-utils.c | 17 ++++++++++++-----
>>> kernel-shared/zoned.c | 23 ++++++++++++++++++++++-
>>> kernel-shared/zoned.h | 7 ++++---
>>> 3 files changed, 38 insertions(+), 9 deletions(-)
>>>
>>> diff --git a/common/device-utils.c b/common/device-utils.c
>>> index 86942e0c7041..7df7d9ce39d8 100644
>>> --- a/common/device-utils.c
>>> +++ b/common/device-utils.c
>>> @@ -254,16 +254,23 @@ int btrfs_prepare_device(int fd, const char *file, u64 *byte_count_ret,
>>>
>>> if (!zinfo->emulated) {
>>> if (opflags & PREP_DEVICE_VERBOSE)
>>> - printf("Resetting device zones %s (%u zones) ...\n",
>>> - file, zinfo->nr_zones);
>>> + printf("Resetting device zones %s (%llu zones) ...\n",
>>> + file, byte_count / zinfo->zone_size);
>>> /*
>>> * We cannot ignore zone reset errors for a zoned block
>>> * device as this could result in the inability to write
>>> * to non-empty sequential zones of the device.
>>> */
>>> - if (btrfs_reset_all_zones(fd, zinfo)) {
>>> - error("zoned: failed to reset device '%s' zones: %m",
>>> - file);
>>> + ret = btrfs_reset_zones(fd, zinfo, byte_count);
>>> + if (ret) {
>>> + if (ret == EBUSY) {
>>> + error("zoned: device '%s' contains an active zone outside of the FS range",
>>> + file);
>>> + error("zoned: btrfs needs full control of active zones");
>>> + } else {
>>> + error("zoned: failed to reset device '%s' zones: %m",
>>> + file);
>>> + }
>>> goto err;
>>> }
>>> }
>>> diff --git a/kernel-shared/zoned.c b/kernel-shared/zoned.c
>>> index fb1e1388804e..b4244966ca36 100644
>>> --- a/kernel-shared/zoned.c
>>> +++ b/kernel-shared/zoned.c
>>> @@ -395,16 +395,24 @@ static int report_zones(int fd, const char *file,
>>> * Discard blocks in the zones of a zoned block device. Process this with zone
>>> * size granularity so that blocks in conventional zones are discarded using
>>> * discard_range and blocks in sequential zones are reset though a zone reset.
>>> + *
>>> + * We need to ensure that zones outside of the FS are not active, so that
>>> + * the FS can use all the active zones. Return EBUSY if there is an active
>>> + * zone.
>>> */
>>> -int btrfs_reset_all_zones(int fd, struct btrfs_zoned_device_info *zinfo)
>>> +int btrfs_reset_zones(int fd, struct btrfs_zoned_device_info *zinfo, u64 byte_count)
>>> {
>>> unsigned int i;
>>> int ret = 0;
>>>
>>> ASSERT(zinfo);
>>> + ASSERT(IS_ALIGNED(byte_count, zinfo->zone_size));
>>>
>>> /* Zone size granularity */
>>> for (i = 0; i < zinfo->nr_zones; i++) {
>>> + if (byte_count == 0)
>>> + break;
>>> +
>>> if (zinfo->zones[i].type == BLK_ZONE_TYPE_CONVENTIONAL) {
>>> ret = device_discard_blocks(fd,
>>> zinfo->zones[i].start << SECTOR_SHIFT,
>>> @@ -419,7 +427,20 @@ int btrfs_reset_all_zones(int fd, struct btrfs_zoned_device_info *zinfo)
>>>
>>> if (ret)
>>> return ret;
>>> +
>>> + byte_count -= zinfo->zone_size;
>>> }
>>> + for (; i < zinfo->nr_zones; i++) {
>>> + const enum blk_zone_cond cond = zinfo->zones[i].cond;
>>> +
>>> + if (zinfo->zones[i].type == BLK_ZONE_TYPE_CONVENTIONAL)
>>> + continue;
>>> + if (cond == BLK_ZONE_COND_IMP_OPEN ||
>>> + cond == BLK_ZONE_COND_EXP_OPEN ||
>>> + cond == BLK_ZONE_COND_CLOSED)
>>> + return EBUSY;
>>> + }
>>> +
>>> return fsync(fd);
>>> }
>>>
>>> diff --git a/kernel-shared/zoned.h b/kernel-shared/zoned.h
>>> index 6eba86d266bf..2bf24cbba62a 100644
>>> --- a/kernel-shared/zoned.h
>>> +++ b/kernel-shared/zoned.h
>>> @@ -149,7 +149,7 @@ bool btrfs_redirty_extent_buffer_for_zoned(struct btrfs_fs_info *fs_info,
>>> u64 start, u64 end);
>>> int btrfs_reset_chunk_zones(struct btrfs_fs_info *fs_info, u64 devid,
>>> u64 offset, u64 length);
>>> -int btrfs_reset_all_zones(int fd, struct btrfs_zoned_device_info *zinfo);
>>> +int btrfs_reset_zones(int fd, struct btrfs_zoned_device_info *zinfo, u64 byte_count);
>>> int zero_zone_blocks(int fd, struct btrfs_zoned_device_info *zinfo, off_t start,
>>> size_t len);
>>> int btrfs_wipe_temporary_sb(struct btrfs_fs_devices *fs_devices);
>>> @@ -203,8 +203,9 @@ static inline int btrfs_reset_chunk_zones(struct btrfs_fs_info *fs_info,
>>> return 0;
>>> }
>>>
>>> -static inline int btrfs_reset_all_zones(int fd,
>>> - struct btrfs_zoned_device_info *zinfo)
>>> +static inline int btrfs_reset_zones(int fd,
>>> + struct btrfs_zoned_device_info *zinfo,
>>> + u64 byte_count)
>>> {
>>> return -EOPNOTSUPP;
>>> }
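
The control flow of the quoted btrfs_reset_zones() can be modelled without a real device as follows. This is a simplified sketch, not the patch itself: struct sketch_zone, enum sketch_zone_cond, and sketch_reset_zones() are hypothetical, and "resetting" a zone is modelled as setting its condition to empty rather than issuing a discard or a zone reset.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified zone conditions for this sketch only. */
enum sketch_zone_cond {
	SKETCH_EMPTY,
	SKETCH_IMP_OPEN,
	SKETCH_EXP_OPEN,
	SKETCH_CLOSED,
	SKETCH_FULL,
};

struct sketch_zone {
	int conventional;		/* non-zero for a conventional zone */
	enum sketch_zone_cond cond;
};

int sketch_reset_zones(struct sketch_zone *zones, size_t nr_zones,
		       uint64_t zone_size, uint64_t byte_count)
{
	size_t i;

	assert(zone_size > 0 && byte_count % zone_size == 0);

	/* Reset only the zones covered by byte_count. */
	for (i = 0; i < nr_zones && byte_count > 0; i++) {
		if (!zones[i].conventional)
			zones[i].cond = SKETCH_EMPTY;
		byte_count -= zone_size;
	}

	/* Any open or closed sequential zone beyond the FS range is active. */
	for (; i < nr_zones; i++) {
		const enum sketch_zone_cond cond = zones[i].cond;

		if (zones[i].conventional)
			continue;
		if (cond == SKETCH_IMP_OPEN || cond == SKETCH_EXP_OPEN ||
		    cond == SKETCH_CLOSED)
			return EBUSY;
	}

	return 0;
}
```

As in the patch, the zones inside the FS range are reset before the outside zones are checked, so an EBUSY return still leaves the in-range zones empty. A full zone outside the range is not counted as active.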
Thread overview: 18+ messages
2024-05-14 18:22 [PATCH v2 0/8] btrfs-progs: zoned: proper "mkfs.btrfs -b" support Naohiro Aota
2024-05-14 18:22 ` [PATCH v2 1/8] btrfs-progs: rename block_count to byte_count Naohiro Aota
2024-05-14 18:22 ` [PATCH v2 2/8] btrfs-progs: mkfs: remove duplicated device size check Naohiro Aota
2024-05-14 18:22 ` [PATCH v2 3/8] btrfs-progs: mkfs: unify zoned mode minimum size calc into btrfs_min_dev_size() Naohiro Aota
2024-05-14 18:22 ` [PATCH v2 4/8] btrfs-progs: mkfs: fix minimum size calculation for zoned mode Naohiro Aota
2024-05-14 22:54 ` Qu Wenruo
2024-05-15 16:25 ` Naohiro Aota
2024-05-14 18:22 ` [PATCH v2 5/8] btrfs-progs: mkfs: check if byte_count is zone size aligned Naohiro Aota
2024-05-14 22:56 ` Qu Wenruo
2024-05-15 15:43 ` Naohiro Aota
2024-05-14 18:22 ` [PATCH v2 6/8] btrfs-progs: support byte length for zone resetting Naohiro Aota
2024-05-14 22:59 ` Qu Wenruo
2024-05-15 16:11 ` Naohiro Aota
2024-05-15 21:47 ` Qu Wenruo
2024-05-14 18:22 ` [PATCH v2 7/8] btrfs-progs: add test " Naohiro Aota
2024-05-14 23:04 ` Qu Wenruo
2024-05-15 16:14 ` Naohiro Aota
2024-05-14 18:22 ` [PATCH v2 8/8] btrfs-progs: test: use smaller emulated zone size Naohiro Aota