public inbox for fio@vger.kernel.org
* [PATCH v3 0/8] zbd: fix problems of random write with unaligned block size
@ 2026-03-02  2:26 Shin'ichiro Kawasaki
  2026-03-02  2:26 ` [PATCH v3 1/8] zbd: fix zone selection of random writes Shin'ichiro Kawasaki
                   ` (8 more replies)
  0 siblings, 9 replies; 14+ messages in thread
From: Shin'ichiro Kawasaki @ 2026-03-02  2:26 UTC (permalink / raw)
  To: fio, Jens Axboe, Vincent Fu; +Cc: Damien Le Moal, Shin'ichiro Kawasaki

When a random write workload runs with zonemode=zbd and a block size
that is not aligned to the zone size or to the initial write pointer
position, three problems are observed. The first is write target zone
selection. When a zone is filled by the write workload, the same zone is
selected as the next write target. This results in writes concentrating
on certain zones even though the workload specifies random writes.

The second problem is incorrect write target zone accounting. When a job
selects a zone for the next write, another job might already have
removed that zone from the write target zones array under certain
conditions.

The third problem is write performance. Writes with an unaligned block
size leave small remainder areas at the ends of write target zones. To
free up the zone resources, fio currently issues zone finish operations
to the zones with small remainders. Fio also calls io_u_quiesce() to
prepare for the zone finish operation and the write target zone switch.
These zone finish operations and io_u_quiesce() calls significantly
degrade random write performance.
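
As a rough illustration of where these remainders come from, the sketch
below computes the unwritten tail of a zone. The helper name and the
sample sizes are hypothetical and are not taken from fio; only the
arithmetic follows the description above.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not fio code: bytes left unwritten at the end of a
 * zone once no further write of block_size bytes fits.
 *   capacity   - writable bytes in the zone
 *   wp_offset  - write pointer offset from the zone start at workload start
 *   block_size - write block size in bytes
 */
static uint64_t zone_tail_remainder(uint64_t capacity, uint64_t wp_offset,
				    uint64_t block_size)
{
	assert(block_size != 0 && wp_offset <= capacity);

	return (capacity - wp_offset) % block_size;
}
```

For example, an empty 256 MiB zone written with 48 KiB blocks ends with a
16 KiB tail, while 64 KiB blocks leave no tail unless the write pointer
starts at an unaligned position.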

This series addresses these problems. The first two patches address the
first two problems respectively. The third patch introduces a new option
to address the performance problem. The last five patches adjust the
documentation and the test set for the new option introduced by the
third patch.

Changes from v2:
- 2nd patch: improved the commit message to explain which workloads
             remove zones from write target zones array
- 4th patch: reflected review comments
- Added Reviewed-by tags
- Link to v2: https://lore.kernel.org/fio/20260216075936.3318729-1-shinichiro.kawasaki@wdc.com/

Changes from v1:
- Per discussion with Vincent, kept the current zone finish operation as
  the default and introduced the new option "write_zone_remainder".
- Dropped the patches that removed the code for the zone finish operation
- Moved "fix write zone accounting" patch from 4th to 2nd in the series
- Rebased to the latest master branch tip
- Link to v1: https://lore.kernel.org/fio/20260109023603.2848421-1-shinichiro.kawasaki@wdc.com/


Shin'ichiro Kawasaki (8):
  zbd: fix zone selection of random writes
  zbd: fix write zone accounting
  zbd: introduce write_zone_remainder option
  doc: explain the option write_zone_remainder
  t/zbd: add -m option to enable write_zone_remainder option
  t/zbd: avoid test case 14 failure with write_zone_remainder option
  t/zbd: avoid test case 33 failure with write_zone_remainder option
  t/zbd: avoid test case 71 failure with write_zone_remainder option

 HOWTO.rst                     | 26 ++++++++++-
 cconv.c                       |  2 +
 fio.1                         | 25 ++++++++--
 init.c                        | 13 ++++++
 options.c                     | 10 ++++
 server.h                      |  2 +-
 t/zbd/run-tests-against-nullb |  6 +++
 t/zbd/test-zbd-support        | 45 ++++++++++++++++--
 thread_options.h              |  2 +
 zbd.c                         | 87 ++++++++++++++++++++++++++---------
 10 files changed, 185 insertions(+), 33 deletions(-)

-- 
2.49.0



* [PATCH v3 1/8] zbd: fix zone selection of random writes
  2026-03-02  2:26 [PATCH v3 0/8] zbd: fix problems of random write with unaligned block size Shin'ichiro Kawasaki
@ 2026-03-02  2:26 ` Shin'ichiro Kawasaki
  2026-03-02  2:26 ` [PATCH v3 2/8] zbd: fix write zone accounting Shin'ichiro Kawasaki
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 14+ messages in thread
From: Shin'ichiro Kawasaki @ 2026-03-02  2:26 UTC (permalink / raw)
  To: fio, Jens Axboe, Vincent Fu; +Cc: Damien Le Moal, Shin'ichiro Kawasaki

In zonemode=zbd, random write workloads targeting zoned block devices
with limits on write zones, such as max_open_zones, cannot write to
randomly chosen offsets because zoned block devices require writes to
occur at write pointers. To adjust the offsets to valid positions, fio
calls the function zbd_convert_to_write_zone(). This function checks the
current write target zones as candidates for the next offset, but may
fail depending on the conditions of those zones. In such cases, the
function waits for zone condition changes before retrying.

However, the retry logic begins with the zone where the previous attempt
ended, and selects zones that were previously write targets.
Consequently, the same zones are chosen for writing repeatedly, and
writes concentrate on certain zones even though the workload specifies
random writes.

To ensure proper zone selection for random writes, modify
zbd_convert_to_write_zone() to retry the zone selection based on the
original offset provided to the function. The local variable 'zb' keeps
the reference to the zone corresponding to the original offset. Use 'zb'
as the starting point of each retry attempt.
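
The effect of the retry starting point can be sketched with a simplified
model. This is not fio's actual selection logic: the function name and
the flat array of "writable" flags are assumptions made for illustration
only.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Simplified model, not fio's actual logic: scan forward from 'start',
 * wrapping around, and return the index of the first writable zone.
 * Returns nr_zones if no zone is writable.
 */
static size_t pick_write_zone(const bool *writable, size_t nr_zones,
			      size_t start)
{
	size_t i;

	for (i = 0; i < nr_zones; i++) {
		size_t idx = (start + i) % nr_zones;

		if (writable[idx])
			return idx;
	}
	return nr_zones;
}
```

In this model, if every retry restarts from the zone where the previous
scan ended, the same zone is picked each time. Restarting from the zone
of each io_u's own random offset spreads the picks across the writable
zones.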

Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
---
 zbd.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/zbd.c b/zbd.c
index 7a66b665..b71f842c 100644
--- a/zbd.c
+++ b/zbd.c
@@ -1643,6 +1643,19 @@ choose_other_zone:
 	}
 
 retry:
+	/*
+	 * For random writes, retry from the zone chosen at the beginning using
+	 * the initial io_u random offset.
+	 */
+	if (td_random(td)) {
+		zone_unlock(z);
+		zone_lock(td, f, zb);
+		if (zbd_write_zone_get(td, f, zb))
+			return zb;
+		z = zb;
+		zone_idx = zbd_zone_idx(f, z);
+	}
+
 	/* Zone 'z' is full, so try to choose a new zone. */
 	for (i = f->io_size / zbdi->zone_size; i > 0; i--) {
 		zone_idx++;
-- 
2.49.0



* [PATCH v3 2/8] zbd: fix write zone accounting
  2026-03-02  2:26 [PATCH v3 0/8] zbd: fix problems of random write with unaligned block size Shin'ichiro Kawasaki
  2026-03-02  2:26 ` [PATCH v3 1/8] zbd: fix zone selection of random writes Shin'ichiro Kawasaki
@ 2026-03-02  2:26 ` Shin'ichiro Kawasaki
  2026-03-02  3:41   ` Damien Le Moal
  2026-03-02  2:26 ` [PATCH v3 3/8] zbd: introduce write_zone_remainder option Shin'ichiro Kawasaki
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 14+ messages in thread
From: Shin'ichiro Kawasaki @ 2026-03-02  2:26 UTC (permalink / raw)
  To: fio, Jens Axboe, Vincent Fu; +Cc: Damien Le Moal, Shin'ichiro Kawasaki

Currently, zbd_convert_to_write_zone() calls io_u_quiesce() when the
number of write target zones hits one of the limits on write zones. This
wait in io_u_quiesce() significantly degrades performance. When I tried
to remove the io_u_quiesce() call, I observed that test case 58 of
t/zbd/test-zbd-support failed with null_blk devices that have a
max_active_zones limit set.

The cause of the failure is incorrect write target zone accounting in
zbd_convert_to_write_zone(). This function checks the current write
target zones and selects one of them as the next write target zone.
After the zone selection, it locks the zone. However, by the time the
zone is locked, another job, such as a trim workload or a write workload
with the zone_reset_threshold option, might have already reset the zone
and removed it from the write target zones array. This unexpected
removal from the array caused incorrect zone accounting and the test
case failure.

To avoid the incorrect zone accounting, call zbd_write_zone_get() after
the selected zone gets locked. If the zone has been removed from the
write target zones array, the function adds it back to the array.
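
The "select, lock, then re-check membership" idea can be sketched as
follows. The structure and function names are hypothetical stand-ins for
zbdi->write_zones[] and its helpers, the array limit is arbitrary, and
all locking is omitted; this does not mirror the fio implementation.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_WRITE_ZONES 4	/* hypothetical limit for this sketch */

struct write_zone_set {
	uint32_t zones[MAX_WRITE_ZONES];
	unsigned int nr;
};

/*
 * Return true if 'zone' is in the set, adding it back if another job
 * removed it in the meantime. Return false if it is absent and the set
 * is already at its limit, in which case the caller must pick another
 * zone.
 */
static bool write_zone_set_get(struct write_zone_set *s, uint32_t zone)
{
	unsigned int i;

	for (i = 0; i < s->nr; i++)
		if (s->zones[i] == zone)
			return true;
	if (s->nr == MAX_WRITE_ZONES)
		return false;
	s->zones[s->nr++] = zone;
	return true;
}

/* Model of another job resetting a zone: drop it from the set if present. */
static void write_zone_set_put(struct write_zone_set *s, uint32_t zone)
{
	unsigned int i;

	for (i = 0; i < s->nr; i++) {
		if (s->zones[i] == zone) {
			s->zones[i] = s->zones[--s->nr];
			return;
		}
	}
}
```

Calling the "get" helper after taking the zone lock keeps the count in
step with the array contents even when the zone was dropped between
selection and locking.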

Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
---
 zbd.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/zbd.c b/zbd.c
index b71f842c..c511b709 100644
--- a/zbd.c
+++ b/zbd.c
@@ -1693,8 +1693,17 @@ retry:
 
 		zone_lock(td, f, z);
 		if (zbd_zone_remainder(z) >= min_bs) {
-			need_zone_finish = false;
-			goto out;
+			/*
+			 * The zone might be already removed from
+			 * zbdi->write_zones[] by other jobs at this moment.
+			 * Even if the zone has remainder, call
+			 * zbd_write_zone_get() to ensure that it is in the
+			 * array.
+			 */
+			if (zbd_write_zone_get(td, f, z)) {
+				need_zone_finish = false;
+				goto out;
+			}
 		}
 		pthread_mutex_lock(&zbdi->mutex);
 	}
-- 
2.49.0



* [PATCH v3 3/8] zbd: introduce write_zone_remainder option
  2026-03-02  2:26 [PATCH v3 0/8] zbd: fix problems of random write with unaligned block size Shin'ichiro Kawasaki
  2026-03-02  2:26 ` [PATCH v3 1/8] zbd: fix zone selection of random writes Shin'ichiro Kawasaki
  2026-03-02  2:26 ` [PATCH v3 2/8] zbd: fix write zone accounting Shin'ichiro Kawasaki
@ 2026-03-02  2:26 ` Shin'ichiro Kawasaki
  2026-03-02  4:36   ` Damien Le Moal
  2026-03-02  2:26 ` [PATCH v3 4/8] doc: explain the option write_zone_remainder Shin'ichiro Kawasaki
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 14+ messages in thread
From: Shin'ichiro Kawasaki @ 2026-03-02  2:26 UTC (permalink / raw)
  To: fio, Jens Axboe, Vincent Fu; +Cc: Damien Le Moal, Shin'ichiro Kawasaki

When the specified block size is not aligned with the zone size or the
write pointer positions at workload start, write workloads create
unwritten remainder areas at the ends of zones. These remainder areas
leave zones in the open condition. This disrupts the intended write
target zone selection.

Previous commits e1a1b59b0b9b ("zbd: finish zones with remainder smaller
than minimum write block size") and e2e29bf6f830 ("zbd: finish zone when
all random write target zones have small remainder") attempted to solve
this problem by issuing zone finish operations for zones with small
remainders. However, this approach degraded performance for two reasons.
First, the zone finish operation takes substantial execution time.
Second, the zone finish operation requires waiting for in-flight writes
from other jobs to complete, which is done by calling io_u_quiesce()
before the zone finish operation.

To avoid the performance degradation, introduce the new option named
"write_zone_remainder". When the option is specified, issue writes to
the remainder areas instead of issuing zone finish operations. These
writes put the zones into the full condition in the same manner as the
zone finish operation, freeing up the zone resources of the device and
enabling writes to other zones. Also, when the option is set, skip the
io_u_quiesce() call that was required before the zone finish operation.
The performance benefit of eliminating the waits on in-flight writes is
particularly significant in asynchronous I/O workloads, where the write
operations to the remainder areas are managed as part of queued I/Os.

The drawback of this approach is that writing these remainders requires
write sizes smaller than the minimum block size. As a result, when using
the write_zone_remainder option, the random map feature must be disabled
with norandommap=1. This is done automatically when write_zone_remainder
is specified and norandommap is not set explicitly.
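
The change in the "zone full" test and in the write length adjustment
can be sketched as below. This is a simplified model of the logic in the
diff that follows, with hypothetical function names; it omits the has_wp
check and all locking, and assumes min_bs is non-zero.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Sketch only: 'remainder' is the unwritten space left in the zone and
 * 'min_bs' the minimum write block size, both in bytes.
 */
static bool zone_is_full(uint64_t remainder, uint64_t min_bs,
			 bool write_zone_remainder)
{
	if (write_zone_remainder)
		return remainder == 0;
	return remainder < min_bs;
}

/*
 * Length of the next write at the write pointer, or 0 if no write can be
 * issued to this zone.
 */
static uint64_t next_write_len(uint64_t buflen, uint64_t remainder,
			       uint64_t min_bs, bool write_zone_remainder)
{
	uint64_t len = buflen < remainder ? buflen : remainder;

	if (write_zone_remainder && len <= min_bs)
		return len;		/* short write fills the zone tail */
	return len / min_bs * min_bs;	/* round down to a min_bs multiple */
}
```

With a 48 KiB block size and a 16 KiB tail, the default behavior treats
the zone as full and issues no write, whereas with write_zone_remainder
set a single 16 KiB write fills the zone.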

Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
---
 cconv.c          |  2 ++
 init.c           | 13 +++++++++++
 options.c        | 10 ++++++++
 server.h         |  2 +-
 thread_options.h |  2 ++
 zbd.c            | 61 +++++++++++++++++++++++++++++++-----------------
 6 files changed, 68 insertions(+), 22 deletions(-)

diff --git a/cconv.c b/cconv.c
index 9f82c724..56cf6dbe 100644
--- a/cconv.c
+++ b/cconv.c
@@ -275,6 +275,7 @@ int convert_thread_options_to_cpu(struct thread_options *o,
 	o->max_open_zones = __le32_to_cpu(top->max_open_zones);
 	o->ignore_zone_limits = le32_to_cpu(top->ignore_zone_limits);
 	o->recover_zbd_write_error = le32_to_cpu(top->recover_zbd_write_error);
+	o->write_zone_remainder = le32_to_cpu(top->write_zone_remainder);
 	o->lockmem = le64_to_cpu(top->lockmem);
 	o->offset_increment_percent = le32_to_cpu(top->offset_increment_percent);
 	o->offset_increment = le64_to_cpu(top->offset_increment);
@@ -656,6 +657,7 @@ void convert_thread_options_to_net(struct thread_options_pack *top,
 	top->max_open_zones = __cpu_to_le32(o->max_open_zones);
 	top->ignore_zone_limits = cpu_to_le32(o->ignore_zone_limits);
 	top->recover_zbd_write_error = cpu_to_le32(o->recover_zbd_write_error);
+	top->write_zone_remainder = cpu_to_le32(o->write_zone_remainder);
 	top->lockmem = __cpu_to_le64(o->lockmem);
 	top->ddir_seq_add = __cpu_to_le64(o->ddir_seq_add);
 	top->file_size_low = __cpu_to_le64(o->file_size_low);
diff --git a/init.c b/init.c
index 130158cb..5cfbdd75 100644
--- a/init.c
+++ b/init.c
@@ -665,6 +665,19 @@ static int fixup_options(struct thread_data *td)
 		ret |= 1;
 	}
 
+	if (o->zone_mode == ZONE_MODE_ZBD && o->write_zone_remainder) {
+		if (fio_option_is_set(o, norandommap)) {
+			if (o->norandommap == 0) {
+				log_err("fio: write_zone_remainder=1 requires norandommap=1\n");
+				ret |= 1;
+			}
+			/* if == 1, OK */
+		} else {
+			dprint(FD_ZBD, "fio: override norandommap=1 for write_zone_remainder=1\n");
+			o->norandommap = 1;
+		}
+	}
+
 	if (o->zone_mode == ZONE_MODE_STRIDED && !o->zone_size) {
 		log_err("fio: --zonesize must be specified when using --zonemode=strided.\n");
 		ret |= 1;
diff --git a/options.c b/options.c
index f592bc24..61d405f0 100644
--- a/options.c
+++ b/options.c
@@ -3939,6 +3939,16 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.category = FIO_OPT_C_IO,
 		.group	= FIO_OPT_G_ZONE,
 	},
+	{
+		.name	= "write_zone_remainder",
+		.lname	= "Fill remainders of zones by write instead of zone finish operation",
+		.type	= FIO_OPT_BOOL,
+		.off1	= offsetof(struct thread_options, write_zone_remainder),
+		.def	= 0,
+		.help	= "When block size is unaligned, zones have small remainder write areas at ends. Fill them by write instead of zone finish operations for better performance.",
+		.category = FIO_OPT_C_IO,
+		.group	= FIO_OPT_G_ZONE,
+	},
 	{
 		.name   = "fdp",
 		.lname  = "Flexible data placement",
diff --git a/server.h b/server.h
index e0a921b8..589e8bea 100644
--- a/server.h
+++ b/server.h
@@ -51,7 +51,7 @@ struct fio_net_cmd_reply {
 };
 
 enum {
-	FIO_SERVER_VER			= 118,
+	FIO_SERVER_VER			= 119,
 
 	FIO_SERVER_MAX_FRAGMENT_PDU	= 1024,
 	FIO_SERVER_MAX_CMD_MB		= 2048,
diff --git a/thread_options.h b/thread_options.h
index 3e66d477..506f1233 100644
--- a/thread_options.h
+++ b/thread_options.h
@@ -399,6 +399,7 @@ struct thread_options {
 	unsigned int job_max_open_zones;
 	unsigned int ignore_zone_limits;
 	unsigned int recover_zbd_write_error;
+	unsigned int write_zone_remainder;
 	fio_fp64_t zrt;
 	fio_fp64_t zrf;
 
@@ -728,6 +729,7 @@ struct thread_options_pack {
 	int32_t max_open_zones;
 	uint32_t ignore_zone_limits;
 	uint32_t recover_zbd_write_error;
+	uint32_t write_zone_remainder;
 
 	uint32_t log_entries;
 	uint32_t log_prio;
diff --git a/zbd.c b/zbd.c
index c511b709..3d51478b 100644
--- a/zbd.c
+++ b/zbd.c
@@ -86,18 +86,21 @@ static inline uint64_t zbd_zone_remainder(struct fio_zone_info *z)
 
 /**
  * zbd_zone_full - verify whether a minimum number of bytes remain in a zone
- * @f: file pointer.
+ * @td: FIO thread data
  * @z: zone info pointer.
  * @required: minimum number of bytes that must remain in a zone.
  *
  * The caller must hold z->mutex.
  */
-static bool zbd_zone_full(const struct fio_file *f, struct fio_zone_info *z,
+static bool zbd_zone_full(const struct thread_data *td, struct fio_zone_info *z,
 			  uint64_t required)
 {
+	if (!z->has_wp)
+		return false;
+	if (td->o.write_zone_remainder)
+		return zbd_zone_remainder(z) == 0;
 	assert((required & 511) == 0);
-
-	return z->has_wp && required > zbd_zone_remainder(z);
+	return required > zbd_zone_remainder(z);
 }
 
 static void zone_lock(struct thread_data *td, const struct fio_file *f,
@@ -629,7 +632,7 @@ static bool zbd_write_zone_get(struct thread_data *td, const struct fio_file *f,
 	 * Skip full zones with data verification enabled because resetting a
 	 * zone causes data loss and hence causes verification to fail.
 	 */
-	if (td->o.verify != VERIFY_NONE && zbd_zone_full(f, z, min_bs))
+	if (td->o.verify != VERIFY_NONE && zbd_zone_full(td, z, min_bs))
 		return false;
 
 	return __zbd_write_zone_get(td, f, z);
@@ -1513,14 +1516,16 @@ static struct fio_zone_info *zbd_convert_to_write_zone(struct thread_data *td,
 	struct fio_zone_info *z;
 	uint32_t zone_idx, new_zone_idx;
 	int i;
-	bool wait_zone_write;
+	bool wait_zone_write = false;
 	bool in_flight;
 	bool should_retry = true;
 	bool need_zone_finish;
 
 	assert(is_valid_offset(f, io_u->offset));
 
-	if (zbd_zone_remainder(zb) > 0 && zbd_zone_remainder(zb) < min_bs) {
+	/* If the first selected zone has remainder, finish it */
+	if (!td->o.write_zone_remainder && zbd_zone_remainder(zb) > 0 &&
+	    zbd_zone_remainder(zb) < min_bs) {
 		pthread_mutex_lock(&f->zbd_info->mutex);
 		zbd_write_zone_put(td, f, zb);
 		pthread_mutex_unlock(&f->zbd_info->mutex);
@@ -1619,13 +1624,18 @@ examine_zone:
 	}
 
 choose_other_zone:
-	/* Check if number of write target zones reaches one of limits. */
-	wait_zone_write =
-		zbdi->num_write_zones == f->max_zone - f->min_zone ||
-		(zbdi->max_write_zones &&
-		 zbdi->num_write_zones == zbdi->max_write_zones) ||
-		(td->o.job_max_open_zones &&
-		 td->num_write_zones == td->o.job_max_open_zones);
+	/*
+	 * When zones have small remainder at zone ends, zone finish operations
+	 * may take some time. In this case, check if number of write target
+	 * zones reaches one of limits to wait for the zone finish operations.
+	 */
+	if (!td->o.write_zone_remainder)
+		wait_zone_write =
+			zbdi->num_write_zones == f->max_zone - f->min_zone ||
+			(zbdi->max_write_zones &&
+			 zbdi->num_write_zones == zbdi->max_write_zones) ||
+			(td->o.job_max_open_zones &&
+			 td->num_write_zones == td->o.job_max_open_zones);
 
 	pthread_mutex_unlock(&zbdi->mutex);
 
@@ -1681,8 +1691,10 @@ retry:
 
 	/* Check whether the write fits in any of the write target zones. */
 	pthread_mutex_lock(&zbdi->mutex);
-	need_zone_finish = true;
+	need_zone_finish = !td->o.write_zone_remainder;
 	for (i = 0; i < zbdi->num_write_zones; i++) {
+		uint64_t remainder;
+
 		zone_idx = zbdi->write_zones[i];
 		if (zone_idx < f->min_zone || zone_idx >= f->max_zone)
 			continue;
@@ -1692,7 +1704,10 @@ retry:
 		z = zbd_get_zone(f, zone_idx);
 
 		zone_lock(td, f, z);
-		if (zbd_zone_remainder(z) >= min_bs) {
+
+		remainder = zbd_zone_remainder(z);
+		if ((td->o.write_zone_remainder && remainder > 0) ||
+		    (!td->o.write_zone_remainder && remainder >= min_bs)) {
 			/*
 			 * The zone might be already removed from
 			 * zbdi->write_zones[] by other jobs at this moment.
@@ -2230,7 +2245,8 @@ retry:
 			goto eof;
 		}
 
-		if (zbd_zone_remainder(zb) > 0 &&
+		if (!td->o.write_zone_remainder &&
+		    zbd_zone_remainder(zb) > 0 &&
 		    zbd_zone_remainder(zb) < min_bs)
 			goto retry;
 
@@ -2243,7 +2259,7 @@ retry:
 		}
 
 		/* Reset the zone pointer if necessary */
-		if (zb->reset_zone || zbd_zone_full(f, zb, min_bs)) {
+		if (zb->reset_zone || zbd_zone_full(td, zb, min_bs)) {
 			if (td->o.verify != VERIFY_NONE) {
 				/*
 				 * Unset io-u->file to tell get_next_verify()
@@ -2278,7 +2294,7 @@ retry:
 		}
 
 		/* Make writes occur at the write pointer */
-		assert(!zbd_zone_full(f, zb, min_bs));
+		assert(!zbd_zone_full(td, zb, min_bs));
 		io_u->offset = zb->wp;
 		if (!is_valid_offset(f, io_u->offset)) {
 			td_verror(td, EINVAL, "invalid WP value");
@@ -2294,10 +2310,13 @@ retry:
 		 */
 		new_len = min((unsigned long long)io_u->buflen,
 			      zbd_zone_capacity_end(zb) - io_u->offset);
-		new_len = new_len / min_bs * min_bs;
+		if ((td->o.write_zone_remainder && new_len > min_bs) ||
+			!td->o.write_zone_remainder)
+				new_len = new_len / min_bs * min_bs;
+
 		if (new_len == io_u->buflen)
 			goto accept;
-		if (new_len >= min_bs) {
+		if (td->o.write_zone_remainder || new_len >= min_bs) {
 			io_u->buflen = new_len;
 			dprint(FD_IO, "Changed length from %u into %llu\n",
 			       orig_len, io_u->buflen);
-- 
2.49.0



* [PATCH v3 4/8] doc: explain the option write_zone_remainder
  2026-03-02  2:26 [PATCH v3 0/8] zbd: fix problems of random write with unaligned block size Shin'ichiro Kawasaki
                   ` (2 preceding siblings ...)
  2026-03-02  2:26 ` [PATCH v3 3/8] zbd: introduce write_zone_remainder option Shin'ichiro Kawasaki
@ 2026-03-02  2:26 ` Shin'ichiro Kawasaki
  2026-03-02  2:26 ` [PATCH v3 5/8] t/zbd: add -m option to enable write_zone_remainder option Shin'ichiro Kawasaki
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 14+ messages in thread
From: Shin'ichiro Kawasaki @ 2026-03-02  2:26 UTC (permalink / raw)
  To: fio, Jens Axboe, Vincent Fu; +Cc: Damien Le Moal, Shin'ichiro Kawasaki

The recent commit introduced the option write_zone_remainder. Explain
how it changes handling of zone end remainders. Also, amend the zbd
zone mode description to explain the default handling of zone end
remainders.

Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
---
 HOWTO.rst | 26 +++++++++++++++++++++++++-
 fio.1     | 25 ++++++++++++++++++++++---
 2 files changed, 47 insertions(+), 4 deletions(-)

diff --git a/HOWTO.rst b/HOWTO.rst
index d31851e9..d9cac796 100644
--- a/HOWTO.rst
+++ b/HOWTO.rst
@@ -1034,7 +1034,14 @@ Target file/device
 				all zones instead of being restricted to a
 				single zone. The :option:`zoneskip` parameter
 				is ignored. :option:`zonerange` and
-				:option:`zonesize` must be identical.
+				:option:`zonesize` must be identical. If the
+				size of the unwritten space in a zone is not
+				a multiple of the specified block size at
+				workload start, write workloads create unwritten
+				remainder areas at the ends of zones and keep
+				the zones in the open condition. To free up the
+				open zone resources, fio issues zone finish
+				operations to the zones with the remainders.
 				Trim is handled using a zone reset operation.
 				Trim only considers non-empty sequential write
 				required and sequential write preferred zones.
@@ -1167,6 +1174,23 @@ Target file/device
 	asynchronous IO engine and :option:`verify` workload are specified,
 	errors out. Default: false.
 
+.. option:: write_zone_remainder=bool
+
+	If the size of the unwritten space in a zone is not a multiple of the
+	specified block size at workload start, write workloads create unwritten
+	remainder areas at the ends of zones. By default, fio issues zone finish
+	operations on such zones, transitioning them to the full condition and
+	freeing up open zone resources. However, zone finish operations
+	introduce waits for in-flight writes, reducing overall write
+	throughput. If this option is specified, fio writes data to the
+	remainder areas instead of performing zone finish operations. This
+	improves write throughput by avoiding waits for in-flight writes,
+	particularly in asynchronous write workloads. The drawback of this
+	option is that it requires fio to perform writes smaller than the
+	minimum block size. Consequently, the option :option:`norandommap` must
+	be set. If :option:`norandommap` is not set, it is automatically set.
+	Default: false.
+
 I/O type
 ~~~~~~~~
 
diff --git a/fio.1 b/fio.1
index bc3efa5f..f963d4d4 100644
--- a/fio.1
+++ b/fio.1
@@ -809,9 +809,14 @@ starts. The \fBzonecapacity\fR parameter is ignored.
 .B zbd
 Zoned block device mode. I/O happens sequentially in each zone, even if random
 I/O has been selected. Random I/O happens across all zones instead of being
-restricted to a single zone.
-Trim is handled using a zone reset operation. Trim only considers non-empty
-sequential write required and sequential write preferred zones.
+restricted to a single zone. The \fBzoneskip\fR parameter is ignored.
+\fBzonerange\fR and \fBzonesize\fR must be identical. If the size of the
+unwritten space in a zone is not a multiple of the specified block size at
+workload start, write workloads create unwritten remainder areas at the ends of
+zones and keep the zones in the open condition. To free up the open zone resources,
+fio issues zone finish operations to the zones with the remainders. Trim is
+handled using a zone reset operation. Trim only considers non-empty sequential
+write required and sequential write preferred zones.
 .RE
 .RE
 .TP
@@ -931,6 +936,20 @@ fail due to partial writes and unexpected write pointer positions. If
 asynchronous, the write pointer move fills blocks with zero then breaks verify
 data. If an asynchronous IO engine and \fBverify\fR workload are specified,
 errors out. Default: false.
+.TP
+.BI write_zone_remainder \fR=\fPbool
+If the size of the unwritten space in a zone is not a multiple of the specified
+block size at workload start, write workloads create unwritten remainder areas
+at the ends of zones. By default, fio issues zone finish operations on such
+zones, transitioning them to the full condition and freeing up open zone
+resources. However, zone finish operations introduce waits for in-flight
+writes, reducing overall write throughput. If this option is specified, fio
+writes data to the remainder areas instead of performing zone finish operations.
+This improves write throughput by avoiding waits for in-flight writes,
+particularly in asynchronous write workloads. The drawback of this option is
+that it requires fio to perform writes smaller than the minimum block size.
+Consequently, the option \fBnorandommap\fR must be set. If \fBnorandommap\fR is
+not set, it is automatically set. Default: false.
 
 .SS "I/O type"
 .TP
-- 
2.49.0



* [PATCH v3 5/8] t/zbd: add -m option to enable write_zone_remainder option
  2026-03-02  2:26 [PATCH v3 0/8] zbd: fix problems of random write with unaligned block size Shin'ichiro Kawasaki
                   ` (3 preceding siblings ...)
  2026-03-02  2:26 ` [PATCH v3 4/8] doc: explain the option write_zone_remainder Shin'ichiro Kawasaki
@ 2026-03-02  2:26 ` Shin'ichiro Kawasaki
  2026-03-02  2:26 ` [PATCH v3 6/8] t/zbd: avoid test case 14 failure with " Shin'ichiro Kawasaki
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 14+ messages in thread
From: Shin'ichiro Kawasaki @ 2026-03-02  2:26 UTC (permalink / raw)
  To: fio, Jens Axboe, Vincent Fu; +Cc: Damien Le Moal, Shin'ichiro Kawasaki

The previous commit introduced the option write_zone_remainder. To
confirm the option works as expected, introduce the new option -m
to the test scripts test-zbd-support and run-tests-against-nullb.

Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
---
 t/zbd/run-tests-against-nullb |  6 ++++++
 t/zbd/test-zbd-support        | 30 ++++++++++++++++++++++++++++++
 2 files changed, 36 insertions(+)

diff --git a/t/zbd/run-tests-against-nullb b/t/zbd/run-tests-against-nullb
index f1cba355..628e41a3 100755
--- a/t/zbd/run-tests-against-nullb
+++ b/t/zbd/run-tests-against-nullb
@@ -23,6 +23,7 @@ usage()
 	echo -e "\t-o <max_open_zones> Specify MaxOpen value, (${set_max_open} by default)."
 	echo -e "\t-n <#number of runs> Set the number of times to run the entire suite "
 	echo -e "\t   or an individual section/test."
+	echo -e "\t-m Pass -m option to test-zbd-support to enable write_zone_remainder."
 	echo -e "\t-q Quit t/zbd/test-zbd-support run after any failed test."
 	echo -e "\t-r Remove the /dev/nullb0 device that may still exist after"
 	echo -e "\t   running this script."
@@ -426,6 +427,7 @@ dev_size=1024
 dev_blocksize=4096
 set_max_open=8
 set_extra_max_active=2
+write_zone_remainder=0
 zbd_test_opts=()
 num_of_runs=1
 test_case=0
@@ -438,6 +440,7 @@ while (($#)); do
 		-L) list_only=1; shift;;
 		-r) cleanup_nullb; exit 0;;
 		-n) num_of_runs="${2}"; shift; shift;;
+		-m) write_zone_remainder=1; shift;;
 		-t) test_case="${2}"; shift; shift;;
 		-q) quit_on_err=1; shift;;
 		-h) usage; break;;
@@ -491,6 +494,9 @@ while ((run_nr <= $num_of_runs)); do
 		if ((quit_on_err)); then
 			zbd_test_opts+=("-q")
 		fi
+		if ((write_zone_remainder)); then
+			zbd_test_opts+=("-m")
+		fi
 		section$section_number
 		configure_nullb
 		rc=$?
diff --git a/t/zbd/test-zbd-support b/t/zbd/test-zbd-support
index 40f1de90..01fb4dd3 100755
--- a/t/zbd/test-zbd-support
+++ b/t/zbd/test-zbd-support
@@ -13,6 +13,7 @@ usage() {
 	echo -e "\t-l Test with libzbc ioengine"
 	echo -e "\t-r Reset all zones before test start"
 	echo -e "\t-w Reset all zones before executing each write test case"
+	echo -e "\t-m Write zone remainder instead of zone finish operation"
 	echo -e "\t-o <max_open_zones> Run fio with max_open_zones limit"
 	echo -e "\t-t <test #> Run only a single test case with specified number"
 	echo -e "\t-s <test #> Start testing from the case with the specified number"
@@ -202,6 +203,30 @@ has_max_open_zones() {
 	return 1
 }
 
+has_zone_writes() {
+	local has_zonemode_zbd=0
+	local has_write=0
+	local word
+
+	while (($# > 1)); do
+		if [[ ${1} =~ "--zonemode=zbd" ]]; then
+			has_zonemode_zbd=1
+		fi
+		if [[ ${1} =~ "--rw=" ]]; then
+			word=${1}
+			word=${word/--rw=/}
+			if [[ $word =~ w ]]; then
+				has_write=1
+			fi
+		fi
+		shift
+	done
+	if ((has_zonemode_zbd && has_write)); then
+		return 0
+	fi
+	return 1
+}
+
 run_fio() {
     local fio opts
 
@@ -215,6 +240,9 @@ run_fio() {
     if [[ -n ${max_open_zones_opt} ]] && ! has_max_open_zones "${opts[@]}"; then
 	    opts+=("--max_open_zones=${max_open_zones_opt}")
     fi
+    if [[ -n ${write_zone_remainder} ]] && has_zone_writes "${opts[@]}"; then
+	    opts+=("--write_zone_remainder=1")
+    fi
     { echo; echo "fio ${opts[*]}"; echo; } >>"${logfile}.${test_number}"
 
     "${dynamic_analyzer[@]}" "$fio" "${opts[@]}"
@@ -1877,6 +1905,7 @@ reset_all_zones=
 reset_before_write=
 use_libzbc=
 zbd_debug=
+write_zone_remainder=
 max_open_zones_opt=
 quit_on_err=
 force_io_uring=
@@ -1892,6 +1921,7 @@ while [ "${1#-}" != "$1" ]; do
     -l) use_libzbc=1; shift;;
     -r) reset_all_zones=1; shift;;
     -w) reset_before_write=1; shift;;
+    -m) write_zone_remainder=1; shift;;
     -t) tests+=("$2"); shift; shift;;
     -o) max_open_zones_opt="${2}"; shift; shift;;
     -s) start_test=$2; shift; shift;;
-- 
2.49.0



* [PATCH v3 6/8] t/zbd: avoid test case 14 failure with write_zone_remainder option
  2026-03-02  2:26 [PATCH v3 0/8] zbd: fix problems of random write with unaligned block size Shin'ichiro Kawasaki
                   ` (4 preceding siblings ...)
  2026-03-02  2:26 ` [PATCH v3 5/8] t/zbd: add -m option to enable write_zone_remainder option Shin'ichiro Kawasaki
@ 2026-03-02  2:26 ` Shin'ichiro Kawasaki
  2026-03-02  2:26 ` [PATCH v3 7/8] t/zbd: avoid test case 33 " Shin'ichiro Kawasaki
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 14+ messages in thread
From: Shin'ichiro Kawasaki @ 2026-03-02  2:26 UTC (permalink / raw)
  To: fio, Jens Axboe, Vincent Fu; +Cc: Damien Le Moal, Shin'ichiro Kawasaki

When the -m option is passed to t/zbd/test-zbd-support, the
write_zone_remainder option is passed to fio. In this case, test case 14
fails because the random map feature is disabled, so random writes to
conventional zones may overlap. To avoid the failure, modify the test
case to count the overlaps and subtract the overlapped bytes from the
expected read size.

Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
---
 t/zbd/test-zbd-support | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/t/zbd/test-zbd-support b/t/zbd/test-zbd-support
index 01fb4dd3..e5a5f75b 100755
--- a/t/zbd/test-zbd-support
+++ b/t/zbd/test-zbd-support
@@ -665,7 +665,7 @@ test13() {
 
 # Random write to conventional zones.
 test14() {
-    local off size
+    local off size nr_overlaps
 
     if ! result=($(first_online_zone "$dev")); then
 	echo "Failed to determine first online zone"
@@ -678,10 +678,11 @@ test14() {
 
     run_one_fio_job "$(ioengine "libaio")" --iodepth=64 --rw=randwrite --bs=16K \
 		    --zonemode=zbd --zonesize="${zone_size}" --do_verify=1 \
-		    --verify=md5 --offset=$off --size=$size\
+		    --verify=md5 --offset=$off --size=$size --debug=io \
 		    >>"${logfile}.${test_number}" 2>&1 || return $?
+    nr_overlaps=$(grep "iolog: overlap" --count "${logfile}.${test_number}")
     check_written $((size)) || return $?
-    check_read $((size)) || return $?
+    check_read $((size - nr_overlaps * 16 * 1024)) || return $?
 }
 
 # Sequential read on a mix of empty and full zones.
-- 
2.49.0



* [PATCH v3 7/8] t/zbd: avoid test case 33 failure with write_zone_remainder option
  2026-03-02  2:26 [PATCH v3 0/8] zbd: fix problems of random write with unaligned block size Shin'ichiro Kawasaki
                   ` (5 preceding siblings ...)
  2026-03-02  2:26 ` [PATCH v3 6/8] t/zbd: avoid test case 14 failure with " Shin'ichiro Kawasaki
@ 2026-03-02  2:26 ` Shin'ichiro Kawasaki
  2026-03-02  2:26 ` [PATCH v3 8/8] t/zbd: avoid test case 71 " Shin'ichiro Kawasaki
  2026-03-02  5:06 ` [PATCH v3 0/8] zbd: fix problems of random write with unaligned block size fiotestbot
  8 siblings, 0 replies; 14+ messages in thread
From: Shin'ichiro Kawasaki @ 2026-03-02  2:26 UTC (permalink / raw)
  To: fio, Jens Axboe, Vincent Fu; +Cc: Damien Le Moal, Shin'ichiro Kawasaki

When the -m option is passed to t/zbd/test-zbd-support, the script
specifies the write_zone_remainder option to fio. In this case, test
case 33 fails because fio writes to the small remainder areas at zone
ends, which changes the number of writes. To avoid the failure, round
io_size down to a multiple of the block size only when
write_zone_remainder is not set, and expect the whole io_size to be
written.
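
The arithmetic behind the new condition can be sketched as below. The
helper name expected_written and the capacity value of 1024 are
assumptions chosen for illustration; the real test derives capacity
from the device.

```shell
# Hypothetical helper, not part of t/zbd/test-zbd-support.
# Usage: expected_written <zone_capacity> <write_zone_remainder: 0 or 1>
expected_written() {
    local capacity="$1" write_zone_remainder="$2" io_size bs

    io_size=$((5 * capacity))
    bs=$((3 * capacity / 4))
    # Without write_zone_remainder only whole blocks are written, so
    # round io_size down to a multiple of bs (integer division).
    if [ "$write_zone_remainder" != 1 ]; then
        io_size=$((io_size / bs * bs))
    fi
    echo "$io_size"
}

expected_written 1024 0   # io_size 5120, bs 768: prints 4608
expected_written 1024 1   # remainder writes allowed: prints 5120
```

With capacity 1024, six whole 768-byte blocks fit in 5120 bytes, which
gives 4608; with the option set the full 5120 bytes are expected.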

Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
---
 t/zbd/test-zbd-support | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/t/zbd/test-zbd-support b/t/zbd/test-zbd-support
index e5a5f75b..10fd6690 100755
--- a/t/zbd/test-zbd-support
+++ b/t/zbd/test-zbd-support
@@ -1002,10 +1002,13 @@ test33() {
     size=$((2 * zone_size))
     io_size=$((5 * capacity))
     bs=$((3 * capacity / 4))
+    if ((!write_zone_remainder)); then
+	    io_size=$((io_size / bs * bs))
+    fi
     run_fio_on_seq "$(ioengine "psync")" --iodepth=1 --rw=write	\
 		   --size=$size --io_size=$io_size --bs=$bs	\
 		   >> "${logfile}.${test_number}" 2>&1 || return $?
-    check_written $((io_size / bs * bs)) || return $?
+    check_written $((io_size)) || return $?
 }
 
 # Test repeated async write job with verify using two unaligned block sizes.
-- 
2.49.0



* [PATCH v3 8/8] t/zbd: avoid test case 71 failure with write_zone_remainder option
  2026-03-02  2:26 [PATCH v3 0/8] zbd: fix problems of random write with unaligned block size Shin'ichiro Kawasaki
                   ` (6 preceding siblings ...)
  2026-03-02  2:26 ` [PATCH v3 7/8] t/zbd: avoid test case 33 " Shin'ichiro Kawasaki
@ 2026-03-02  2:26 ` Shin'ichiro Kawasaki
  2026-03-02  5:06 ` [PATCH v3 0/8] zbd: fix problems of random write with unaligned block size fiotestbot
  8 siblings, 0 replies; 14+ messages in thread
From: Shin'ichiro Kawasaki @ 2026-03-02  2:26 UTC (permalink / raw)
  To: fio, Jens Axboe, Vincent Fu; +Cc: Damien Le Moal, Shin'ichiro Kawasaki

When the -m option is passed to t/zbd/test-zbd-support, the script
specifies the write_zone_remainder option to fio. In this case, test
case 71 fails because fio writes to the small remainder areas at zone
ends, which changes the number of writes. To avoid the failure, also
accept a written size that is 4096 bytes larger than eight zones.
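
The relaxed condition amounts to accepting either of two totals. A
minimal sketch of that check, using a hypothetical helper
written_matches_any (not part of the test script) and an assumed zone
size of 1 MiB:

```shell
# Hypothetical helper, not part of t/zbd/test-zbd-support.
# Usage: written_matches_any <written_bytes> <expected_bytes>...
# Succeeds if the written byte count equals any expected value.
written_matches_any() {
    local written="$1" expected
    shift
    for expected in "$@"; do
        [ "$written" -eq "$expected" ] && return 0
    done
    return 1
}

zone_size=$((1024 * 1024))   # assumed value, for illustration only
if written_matches_any $((zone_size * 8 + 4096)) \
        $((zone_size * 8)) $((zone_size * 8 + 4096)); then
    echo "accepted"
fi
```

The patch itself expresses the same idea by chaining two check_written
calls with ||.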

Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
---
 t/zbd/test-zbd-support | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/t/zbd/test-zbd-support b/t/zbd/test-zbd-support
index 10fd6690..466a8d4d 100755
--- a/t/zbd/test-zbd-support
+++ b/t/zbd/test-zbd-support
@@ -1768,7 +1768,8 @@ test71() {
 			--max_open_zones=1 --debug=zbd \
 		       >> "${logfile}.${test_number}" 2>&1 || return $?
 
-	check_written $((zone_size * 8)) || return $?
+	check_written $((zone_size * 8)) ||
+		check_written $((zone_size * 8 + 4096)) || return $?
 }
 
 set_nullb_badblocks() {
-- 
2.49.0



* Re: [PATCH v3 2/8] zbd: fix write zone accounting
  2026-03-02  2:26 ` [PATCH v3 2/8] zbd: fix write zone accounting Shin'ichiro Kawasaki
@ 2026-03-02  3:41   ` Damien Le Moal
  2026-03-02  6:49     ` Shinichiro Kawasaki
  0 siblings, 1 reply; 14+ messages in thread
From: Damien Le Moal @ 2026-03-02  3:41 UTC (permalink / raw)
  To: Shin'ichiro Kawasaki, fio, Jens Axboe, Vincent Fu

On 3/2/26 11:26 AM, Shin'ichiro Kawasaki wrote:
> Currently, zbd_convert_to_write_zones() calls io_u_quiesce() when the
> number of write target zones hits one of the limits of write zones. This
> wait by io_u_quiesce() significantly degrades the performance. When I
> tried to remove the io_u_quiesce() call, I observed that test case 58 of
> t/zbd/test-zbd-support failed with null_blk devices that have a
> max_active_zones limit set.
> 
> The failure cause is an incorrect write target zone accounting in
> zbd_convert_to_write_zones(). This function checks the current write
> target zones, and selects one of them as the next write target zone.
> After the zone selection, it locks the zone. However, when the zone is
> locked, another job such as a trim workload or a write workload with the
> zone_reset_threshold option might have already reset the zone and
> removed it from the write target zones array. This unexpected zone
> removal from the array caused an incorrect zone accounting and the test
> case failure.
> 
> To avoid the incorrect zone accounting, call zbd_write_zone_get() after
> the selected zone gets locked. If the zone is removed from the write
> target zones array, the function adds the zone back to the array.
> 
> Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
> ---
>  zbd.c | 13 +++++++++++--
>  1 file changed, 11 insertions(+), 2 deletions(-)
> 
> diff --git a/zbd.c b/zbd.c
> index b71f842c..c511b709 100644
> --- a/zbd.c
> +++ b/zbd.c
> @@ -1693,8 +1693,17 @@ retry:
>  
>  		zone_lock(td, f, z);
>  		if (zbd_zone_remainder(z) >= min_bs) {
> -			need_zone_finish = false;
> -			goto out;
> +			/*
> +			 * The zone might be already removed from
> +			 * zbdi->write_zones[] by other jobs at this moment.
> +			 * Even if the zone has remainder, call
> +			 * zbd_write_zone_get() to ensure that it is in the
> +			 * array.
> +			 */
> +			if (zbd_write_zone_get(td, f, z)) {
> +				need_zone_finish = false;
> +				goto out;
> +			}

Please change this to:

		if (zbd_zone_remainder(z) >= min_bs &&
		    zbd_write_zone_get(td, f, z)) {
			need_zone_finish = false;
			goto out;
		}

And move the comment block above the if. You could also improve the comment to
explain why we look at "zbd_zone_remainder(z) >= min_bs"

With that (and the much better commit message), feel free to add:

Reviewed-by: Damien Le Moal <dlemoal@kernel.org>

>  		}
>  		pthread_mutex_lock(&zbdi->mutex);
>  	}


-- 
Damien Le Moal
Western Digital Research


* Re: [PATCH v3 3/8] zbd: introduce write_zone_remainder option
  2026-03-02  2:26 ` [PATCH v3 3/8] zbd: introduce write_zone_remainder option Shin'ichiro Kawasaki
@ 2026-03-02  4:36   ` Damien Le Moal
  2026-03-02  6:51     ` Shinichiro Kawasaki
  0 siblings, 1 reply; 14+ messages in thread
From: Damien Le Moal @ 2026-03-02  4:36 UTC (permalink / raw)
  To: Shin'ichiro Kawasaki, fio, Jens Axboe, Vincent Fu

On 3/2/26 11:26 AM, Shin'ichiro Kawasaki wrote:
> When the specified block size is not aligned with the zone size or the
> write pointer positions at workload start, write workloads create
> unwritten remainder areas at the ends of zones. These remainder areas
> leave zones in the open condition. This disrupts the intended write
> target zone selection.
> 
> Previous commits e1a1b59b0b9b ("zbd: finish zones with remainder smaller
> than minimum write block size") and e2e29bf6f830 ("zbd: finish zone when
> all random write target zones have small remainder") attempted to solve
> this problem by issuing zone finish operations for zones with small
> remainders. However, this approach caused performance degradation for
> two reasons. First, the zone finish operation requires substantial
> execution time. Second, the zone finish operation requires waiting for
> in-flight writes from other jobs to complete, which is done by calling
> io_u_quiesce() before the zone finish operation.
> 
> To avoid the performance degradation, introduce a new option named
> "write_zone_remainder". When the option is specified, issue writes to
> the remainder areas instead of issuing zone finish operations. The write
> operation puts the zones in the full condition in the same manner as
> the zone finish operation, freeing up the zone resource of the device
> and enabling writing to other zones. Also, when the option is set, skip
> the io_u_quiesce() call that was required before the zone finish
> operation. The performance benefit of eliminating the waits on in-flight
> writes is particularly significant in asynchronous I/O workloads, where
> the write operations to the remainder areas are managed as part of
> queued I/Os.
> 
> The drawback of this approach is that writing these remainders requires
> write sizes smaller than the minimum block size. As a result, when using
> the write_zone_remainder option, the random map feature must be disabled
> using the norandommap=1 option, which is automatically done when the
> option is specified.
> 
> Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
> Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>

[...]

>  enum {
> -	FIO_SERVER_VER			= 118,
> +	FIO_SERVER_VER			= 119,

Note: this change will conflict with my patches proposing the addition of the
end_syncfs option.

-- 
Damien Le Moal
Western Digital Research


* Re: [PATCH v3 0/8] zbd: fix problems of random write with unaligned block size
  2026-03-02  2:26 [PATCH v3 0/8] zbd: fix problems of random write with unaligned block size Shin'ichiro Kawasaki
                   ` (7 preceding siblings ...)
  2026-03-02  2:26 ` [PATCH v3 8/8] t/zbd: avoid test case 71 " Shin'ichiro Kawasaki
@ 2026-03-02  5:06 ` fiotestbot
  8 siblings, 0 replies; 14+ messages in thread
From: fiotestbot @ 2026-03-02  5:06 UTC (permalink / raw)
  To: fio


The result of fio's continuous integration tests was: success

For more details see https://github.com/fiotestbot/fio/actions/runs/22560369343


* Re: [PATCH v3 2/8] zbd: fix write zone accounting
  2026-03-02  3:41   ` Damien Le Moal
@ 2026-03-02  6:49     ` Shinichiro Kawasaki
  0 siblings, 0 replies; 14+ messages in thread
From: Shinichiro Kawasaki @ 2026-03-02  6:49 UTC (permalink / raw)
  To: Damien Le Moal; +Cc: fio@vger.kernel.org, Jens Axboe, Vincent Fu

On Mar 02, 2026 / 12:41, Damien Le Moal wrote:
[...]
> > diff --git a/zbd.c b/zbd.c
> > index b71f842c..c511b709 100644
> > --- a/zbd.c
> > +++ b/zbd.c
> > @@ -1693,8 +1693,17 @@ retry:
> >  
> >  		zone_lock(td, f, z);
> >  		if (zbd_zone_remainder(z) >= min_bs) {
> > -			need_zone_finish = false;
> > -			goto out;
> > +			/*
> > +			 * The zone might be already removed from
> > +			 * zbdi->write_zones[] by other jobs at this moment.
> > +			 * Even if the zone has remainder, call
> > +			 * zbd_write_zone_get() to ensure that it is in the
> > +			 * array.
> > +			 */
> > +			if (zbd_write_zone_get(td, f, z)) {
> > +				need_zone_finish = false;
> > +				goto out;
> > +			}
> 
> Please change this to:
> 
> 		if (zbd_zone_remainder(z) >= min_bs &&
> 		    zbd_write_zone_get(td, f, z)) {
> 			need_zone_finish = false;
> 			goto out;
> 		}
> 
> And move the comment block above the if. You could also improve the comment to
> explain why we look at "zbd_zone_remainder(z) >= min_bs"
> 
> With that (and the much better commit message), feel free to add:
> 
> Reviewed-by: Damien Le Moal <dlemoal@kernel.org>

Thanks, will reflect the comments in v4.


* Re: [PATCH v3 3/8] zbd: introduce write_zone_remainder option
  2026-03-02  4:36   ` Damien Le Moal
@ 2026-03-02  6:51     ` Shinichiro Kawasaki
  0 siblings, 0 replies; 14+ messages in thread
From: Shinichiro Kawasaki @ 2026-03-02  6:51 UTC (permalink / raw)
  To: Damien Le Moal; +Cc: fio@vger.kernel.org, Jens Axboe, Vincent Fu

On Mar 02, 2026 / 13:36, Damien Le Moal wrote:

[...]

> >  enum {
> > -	FIO_SERVER_VER			= 118,
> > +	FIO_SERVER_VER			= 119,
> 
> Note: this change will conflict with my patches proposing the addition of the
> end_syncfs option.

I see. Since your end_syncfs patches have just been applied, I will
increment FIO_SERVER_VER to 120 in my v4 series.

