* [PATCH 0/6] zloop fixes and improvements
@ 2025-11-15 12:15 Damien Le Moal
From: Damien Le Moal @ 2025-11-15 12:15 UTC
To: Jens Axboe, linux-block; +Cc: Christoph Hellwig
Jens,
The first 2 patches are simple fixes for the zloop driver. The third
patch is a simple refactoring. Finally, patches 4 and 5 introduce new
configuration parameters that are very useful for testing the block
layer zone append emulation done as part of zone write plugging (patch
4) and for testing file systems that use zone append (XFS and btrfs) by
changing the processing behavior of zone append operations in zloop
(patch 5).
The last patch updates the zloop documentation.
Damien Le Moal (6):
zloop: make the write pointer of full zones invalid
zloop: fail zone append operations that are targeting full zones
zloop: simplify checks for writes to sequential zones
zloop: introduce the zone_append configuration parameter
zloop: introduce the ordered_zone_append configuration parameter
Documentation: admin-guide: blockdev: update zloop parameters
.../admin-guide/blockdev/zoned_loop.rst | 61 ++++---
drivers/block/zloop.c | 151 ++++++++++++++++--
2 files changed, 171 insertions(+), 41 deletions(-)
--
2.51.1
* [PATCH 1/6] zloop: make the write pointer of full zones invalid
From: Damien Le Moal @ 2025-11-15 12:15 UTC
To: Jens Axboe, linux-block; +Cc: Christoph Hellwig
The write pointer of zones that are in the full condition is always
invalid. Reflect that fact by setting the write pointer of full zones
to ULLONG_MAX.
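For illustration, the change can be observed with the util-linux
blkzone tool (a hedged sketch: it assumes a zloop device created with
the default parameters, i.e. 8 conventional zones and a 256 MiB zone
size, so the first sequential zone starts at sector 8 * 524288 =
4194304):
$ blkzone finish --offset 4194304 --count 1 /dev/zloop0
$ blkzone report --offset 4194304 --count 1 /dev/zloop0
With this patch, the finished zone is reported in the full condition
with an invalid write pointer (ULLONG_MAX) instead of the zone end.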
Fixes: eb0570c7df23 ("block: new zoned loop block device driver")
Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
---
drivers/block/zloop.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/drivers/block/zloop.c b/drivers/block/zloop.c
index 92be9f0af00a..a975b1d07f1c 100644
--- a/drivers/block/zloop.c
+++ b/drivers/block/zloop.c
@@ -177,7 +177,7 @@ static int zloop_update_seq_zone(struct zloop_device *zlo, unsigned int zone_no)
zone->wp = zone->start;
} else if (file_sectors == zlo->zone_capacity) {
zone->cond = BLK_ZONE_COND_FULL;
- zone->wp = zone->start + zlo->zone_size;
+ zone->wp = ULLONG_MAX;
} else {
zone->cond = BLK_ZONE_COND_CLOSED;
zone->wp = zone->start + file_sectors;
@@ -326,7 +326,7 @@ static int zloop_finish_zone(struct zloop_device *zlo, unsigned int zone_no)
}
zone->cond = BLK_ZONE_COND_FULL;
- zone->wp = zone->start + zlo->zone_size;
+ zone->wp = ULLONG_MAX;
clear_bit(ZLOOP_ZONE_SEQ_ERROR, &zone->flags);
unlock:
@@ -433,8 +433,10 @@ static void zloop_rw(struct zloop_cmd *cmd)
* copmpletes.
*/
zone->wp += nr_sectors;
- if (zone->wp == zone_end)
+ if (zone->wp == zone_end) {
zone->cond = BLK_ZONE_COND_FULL;
+ zone->wp = ULLONG_MAX;
+ }
}
rq_for_each_bvec(tmp, rq, rq_iter)
--
2.51.1
* [PATCH 2/6] zloop: fail zone append operations that are targeting full zones
From: Damien Le Moal @ 2025-11-15 12:15 UTC
To: Jens Axboe, linux-block; +Cc: Christoph Hellwig
zloop_rw() will fail any regular write operation that targets a full
sequential zone. The check for this is indirect, achieved by checking
the write pointer alignment of the write operation. But this check is
ineffective for zone append operations since these are always
automatically directed at the zone write pointer.
Prevent zone append operations from being executed in a full zone with
an explicit check of the zone condition.
Fixes: eb0570c7df23 ("block: new zoned loop block device driver")
Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
---
drivers/block/zloop.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/drivers/block/zloop.c b/drivers/block/zloop.c
index a975b1d07f1c..266d233776ad 100644
--- a/drivers/block/zloop.c
+++ b/drivers/block/zloop.c
@@ -407,6 +407,10 @@ static void zloop_rw(struct zloop_cmd *cmd)
mutex_lock(&zone->lock);
if (is_append) {
+ if (zone->cond == BLK_ZONE_COND_FULL) {
+ ret = -EIO;
+ goto unlock;
+ }
sector = zone->wp;
cmd->sector = sector;
}
--
2.51.1
* [PATCH 3/6] zloop: simplify checks for writes to sequential zones
From: Damien Le Moal @ 2025-11-15 12:15 UTC
To: Jens Axboe, linux-block; +Cc: Christoph Hellwig
The function zloop_rw() already checks early that a request is fully
contained within the target zone. So this check does not need to be done
again for regular writes to sequential zones. Furthermore, since zone
append operations are always directed to the zone write pointer
location, we do not need to check for their alignment to that value
after setting it. So turn the "if" checking the write pointer alignment
into an "else if".
While at it, improve the comment describing the write pointer
modification and how this value is corrected in case of error.
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
---
drivers/block/zloop.c | 19 +++++++++----------
1 file changed, 9 insertions(+), 10 deletions(-)
diff --git a/drivers/block/zloop.c b/drivers/block/zloop.c
index 266d233776ad..0526277f6cd1 100644
--- a/drivers/block/zloop.c
+++ b/drivers/block/zloop.c
@@ -406,6 +406,11 @@ static void zloop_rw(struct zloop_cmd *cmd)
if (!test_bit(ZLOOP_ZONE_CONV, &zone->flags) && is_write) {
mutex_lock(&zone->lock);
+ /*
+ * Zone append operations always go at the current write
+ * pointer, but regular write operations must already be
+ * aligned to the write pointer when submitted.
+ */
if (is_append) {
if (zone->cond == BLK_ZONE_COND_FULL) {
ret = -EIO;
@@ -413,13 +418,7 @@ static void zloop_rw(struct zloop_cmd *cmd)
}
sector = zone->wp;
cmd->sector = sector;
- }
-
- /*
- * Write operations must be aligned to the write pointer and
- * fully contained within the zone capacity.
- */
- if (sector != zone->wp || zone->wp + nr_sectors > zone_end) {
+ } else if (sector != zone->wp) {
pr_err("Zone %u: unaligned write: sect %llu, wp %llu\n",
zone_no, sector, zone->wp);
ret = -EIO;
@@ -432,9 +431,9 @@ static void zloop_rw(struct zloop_cmd *cmd)
zone->cond = BLK_ZONE_COND_IMP_OPEN;
/*
- * Advance the write pointer of sequential zones. If the write
- * fails, the wp position will be corrected when the next I/O
- * copmpletes.
+ * Advance the write pointer. If the write fails, the write
+ * pointer position will be corrected when the next I/O starts
+ * execution.
*/
zone->wp += nr_sectors;
if (zone->wp == zone_end) {
--
2.51.1
* [PATCH 4/6] zloop: introduce the zone_append configuration parameter
From: Damien Le Moal @ 2025-11-15 12:15 UTC
To: Jens Axboe, linux-block; +Cc: Christoph Hellwig
A zloop zoned block device declares to the block layer that it supports
zone append operations. That is, a zloop device resembles an NVMe ZNS
device supporting zone append.
This native support is fine but it does not allow exercising the block
layer zone write plugging emulation of zone append, as is done with SCSI
or ATA SMR HDDs.
Introduce the zone_append configuration parameter to allow creating a
zloop device without native support for zone append, thus relying on the
block layer zone append emulation. If not specified, zone append support
is enabled by default. Otherwise, a value of 0 disables native zone
append and a value of 1 enables it.
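For example, to create a device with native zone append disabled (a
hedged sketch: the "add" command syntax is the one documented in
Documentation/admin-guide/blockdev/zoned_loop.rst, and the message is
the pr_info() added by this patch):
$ echo "add id=0,zone_append=0" > /dev/zloop-control
$ dmesg | tail -1
zloop0: using emulated zone append
Zone append operations issued to such a device then go through the
block layer zone write plugging emulation instead of being handled
natively.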
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
---
drivers/block/zloop.c | 32 ++++++++++++++++++++++++++++++--
1 file changed, 30 insertions(+), 2 deletions(-)
diff --git a/drivers/block/zloop.c b/drivers/block/zloop.c
index 0526277f6cd1..cf9be42ca3e1 100644
--- a/drivers/block/zloop.c
+++ b/drivers/block/zloop.c
@@ -32,6 +32,7 @@ enum {
ZLOOP_OPT_NR_QUEUES = (1 << 6),
ZLOOP_OPT_QUEUE_DEPTH = (1 << 7),
ZLOOP_OPT_BUFFERED_IO = (1 << 8),
+ ZLOOP_OPT_ZONE_APPEND = (1 << 9),
};
static const match_table_t zloop_opt_tokens = {
@@ -44,6 +45,7 @@ static const match_table_t zloop_opt_tokens = {
{ ZLOOP_OPT_NR_QUEUES, "nr_queues=%u" },
{ ZLOOP_OPT_QUEUE_DEPTH, "queue_depth=%u" },
{ ZLOOP_OPT_BUFFERED_IO, "buffered_io" },
+ { ZLOOP_OPT_ZONE_APPEND, "zone_append=%u" },
{ ZLOOP_OPT_ERR, NULL }
};
@@ -56,6 +58,7 @@ static const match_table_t zloop_opt_tokens = {
#define ZLOOP_DEF_NR_QUEUES 1
#define ZLOOP_DEF_QUEUE_DEPTH 128
#define ZLOOP_DEF_BUFFERED_IO false
+#define ZLOOP_DEF_ZONE_APPEND true
/* Arbitrary limit on the zone size (16GB). */
#define ZLOOP_MAX_ZONE_SIZE_MB 16384
@@ -71,6 +74,7 @@ struct zloop_options {
unsigned int nr_queues;
unsigned int queue_depth;
bool buffered_io;
+ bool zone_append;
};
/*
@@ -108,6 +112,7 @@ struct zloop_device {
struct workqueue_struct *workqueue;
bool buffered_io;
+ bool zone_append;
const char *base_dir;
struct file *data_dir;
@@ -378,6 +383,11 @@ static void zloop_rw(struct zloop_cmd *cmd)
cmd->nr_sectors = nr_sectors;
cmd->ret = 0;
+ if (WARN_ON_ONCE(is_append && !zlo->zone_append)) {
+ ret = -EIO;
+ goto out;
+ }
+
/* We should never get an I/O beyond the device capacity. */
if (WARN_ON_ONCE(zone_no >= zlo->nr_zones)) {
ret = -EIO;
@@ -889,7 +899,6 @@ static int zloop_ctl_add(struct zloop_options *opts)
{
struct queue_limits lim = {
.max_hw_sectors = SZ_1M >> SECTOR_SHIFT,
- .max_hw_zone_append_sectors = SZ_1M >> SECTOR_SHIFT,
.chunk_sectors = opts->zone_size,
.features = BLK_FEAT_ZONED,
};
@@ -941,6 +950,7 @@ static int zloop_ctl_add(struct zloop_options *opts)
zlo->nr_zones = nr_zones;
zlo->nr_conv_zones = opts->nr_conv_zones;
zlo->buffered_io = opts->buffered_io;
+ zlo->zone_append = opts->zone_append;
zlo->workqueue = alloc_workqueue("zloop%d", WQ_UNBOUND | WQ_FREEZABLE,
opts->nr_queues * opts->queue_depth, zlo->id);
@@ -981,6 +991,8 @@ static int zloop_ctl_add(struct zloop_options *opts)
lim.physical_block_size = zlo->block_size;
lim.logical_block_size = zlo->block_size;
+ if (zlo->zone_append)
+ lim.max_hw_zone_append_sectors = lim.max_hw_sectors;
zlo->tag_set.ops = &zloop_mq_ops;
zlo->tag_set.nr_hw_queues = opts->nr_queues;
@@ -1021,10 +1033,13 @@ static int zloop_ctl_add(struct zloop_options *opts)
zlo->state = Zlo_live;
mutex_unlock(&zloop_ctl_mutex);
- pr_info("Added device %d: %u zones of %llu MB, %u B block size\n",
+ pr_info("zloop: device %d, %u zones of %llu MiB, %u B block size\n",
zlo->id, zlo->nr_zones,
((sector_t)zlo->zone_size << SECTOR_SHIFT) >> 20,
zlo->block_size);
+ pr_info("zloop%d: using %s zone append\n",
+ zlo->id,
+ zlo->zone_append ? "native" : "emulated");
return 0;
@@ -1111,6 +1126,7 @@ static int zloop_parse_options(struct zloop_options *opts, const char *buf)
opts->nr_queues = ZLOOP_DEF_NR_QUEUES;
opts->queue_depth = ZLOOP_DEF_QUEUE_DEPTH;
opts->buffered_io = ZLOOP_DEF_BUFFERED_IO;
+ opts->zone_append = ZLOOP_DEF_ZONE_APPEND;
if (!buf)
return 0;
@@ -1220,6 +1236,18 @@ static int zloop_parse_options(struct zloop_options *opts, const char *buf)
case ZLOOP_OPT_BUFFERED_IO:
opts->buffered_io = true;
break;
+ case ZLOOP_OPT_ZONE_APPEND:
+ if (match_uint(args, &token)) {
+ ret = -EINVAL;
+ goto out;
+ }
+ if (token != 0 && token != 1) {
+ pr_err("Invalid zone_append value\n");
+ ret = -EINVAL;
+ goto out;
+ }
+ opts->zone_append = token;
+ break;
case ZLOOP_OPT_ERR:
default:
pr_warn("unknown parameter or missing value '%s'\n", p);
--
2.51.1
* [PATCH 5/6] zloop: introduce the ordered_zone_append configuration parameter
From: Damien Le Moal @ 2025-11-15 12:15 UTC
To: Jens Axboe, linux-block; +Cc: Christoph Hellwig
The zone append operation processing for zloop devices is similar to
that of any other command: the operation is processed as a command work
item, without any special serialization between the work items (besides
the zone mutex for mutually exclusive code sections).
This processing is fine and gives excellent performance. However, it
has a side effect: zone append operations are very often reordered and
processed in a sequence that is very different from the order in which
the user issued them. This effect is very visible using an XFS file
system on top of a zloop device. A simple file write leads to many file
extents, as the data writes using zone append are reordered, resulting
in a physical order that differs from the file's logical order.
E.g. executing:
$ dd if=/dev/zero of=/mnt/test bs=1M count=10 && sync
$ xfs_bmap /mnt/test
/mnt/test:
0: [0..4095]: 2162688..2166783
1: [4096..6143]: 2168832..2170879
2: [6144..8191]: 2166784..2168831
3: [8192..10239]: 2170880..2172927
4: [10240..12287]: 2174976..2177023
5: [12288..14335]: 2172928..2174975
6: [14336..20479]: 2177024..2183167
For 10 IOs, 6 extents are created.
This is fine and actually allows exercising XFS zone garbage collection
very well. However, this also makes debugging/working on XFS data
placement harder, as the underlying device will most of the time
reorder IOs, resulting in many file extents.
Allow a user to mitigate this with the new ordered_zone_append
configuration parameter. For a zloop device created with this parameter
specified, the sector of a zone append command is set early, when the
command is submitted by the block layer through the zloop_queue_rq()
function, instead of in the zloop_rw() function, which is executed
later in the command work item context. This change ensures that, more
often than not, zone append data ends up being written in the same
order as the commands were submitted by the user.
In the case of XFS, this leads to far less file data extents. E.g., for
the previous example, we get a single file data extent for the written
file.
$ dd if=/dev/zero of=/mnt/test bs=1M count=10 && sync
$ xfs_bmap /mnt/test
/mnt/test:
0: [0..20479]: 2162688..2183167
Since we cannot use a mutex in the context of the zloop_queue_rq()
function to atomically set a zone append operation sector to the target
zone write pointer location and increment the write pointer, a new
per-zone spinlock is introduced to protect zone write pointer accesses
and modifications. To check a zone write pointer location and set a
zone append operation target sector to that value, the function
zloop_set_zone_append_sector() is introduced and called from
zloop_queue_rq().
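For example (a hedged sketch, using the same "add" command syntax as
for the zone_append parameter, with the message reflecting the
pr_info() updated by this patch; note that ordered_zone_append is only
effective when native zone append is enabled, which is the default):
$ echo "add id=1,ordered_zone_append" > /dev/zloop-control
$ dmesg | tail -1
zloop1: using ordered native zone append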
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
---
drivers/block/zloop.c | 108 +++++++++++++++++++++++++++++++++++++-----
1 file changed, 96 insertions(+), 12 deletions(-)
diff --git a/drivers/block/zloop.c b/drivers/block/zloop.c
index cf9be42ca3e1..c4da3116f7a9 100644
--- a/drivers/block/zloop.c
+++ b/drivers/block/zloop.c
@@ -33,6 +33,7 @@ enum {
ZLOOP_OPT_QUEUE_DEPTH = (1 << 7),
ZLOOP_OPT_BUFFERED_IO = (1 << 8),
ZLOOP_OPT_ZONE_APPEND = (1 << 9),
+ ZLOOP_OPT_ORDERED_ZONE_APPEND = (1 << 10),
};
static const match_table_t zloop_opt_tokens = {
@@ -46,6 +47,7 @@ static const match_table_t zloop_opt_tokens = {
{ ZLOOP_OPT_QUEUE_DEPTH, "queue_depth=%u" },
{ ZLOOP_OPT_BUFFERED_IO, "buffered_io" },
{ ZLOOP_OPT_ZONE_APPEND, "zone_append=%u" },
+ { ZLOOP_OPT_ORDERED_ZONE_APPEND, "ordered_zone_append" },
{ ZLOOP_OPT_ERR, NULL }
};
@@ -59,6 +61,7 @@ static const match_table_t zloop_opt_tokens = {
#define ZLOOP_DEF_QUEUE_DEPTH 128
#define ZLOOP_DEF_BUFFERED_IO false
#define ZLOOP_DEF_ZONE_APPEND true
+#define ZLOOP_DEF_ORDERED_ZONE_APPEND false
/* Arbitrary limit on the zone size (16GB). */
#define ZLOOP_MAX_ZONE_SIZE_MB 16384
@@ -75,6 +78,7 @@ struct zloop_options {
unsigned int queue_depth;
bool buffered_io;
bool zone_append;
+ bool ordered_zone_append;
};
/*
@@ -96,6 +100,7 @@ struct zloop_zone {
unsigned long flags;
struct mutex lock;
+ spinlock_t wp_lock;
enum blk_zone_cond cond;
sector_t start;
sector_t wp;
@@ -113,6 +118,7 @@ struct zloop_device {
struct workqueue_struct *workqueue;
bool buffered_io;
bool zone_append;
+ bool ordered_zone_append;
const char *base_dir;
struct file *data_dir;
@@ -152,6 +158,7 @@ static int zloop_update_seq_zone(struct zloop_device *zlo, unsigned int zone_no)
struct zloop_zone *zone = &zlo->zones[zone_no];
struct kstat stat;
sector_t file_sectors;
+ unsigned long flags;
int ret;
lockdep_assert_held(&zone->lock);
@@ -177,6 +184,7 @@ static int zloop_update_seq_zone(struct zloop_device *zlo, unsigned int zone_no)
return -EINVAL;
}
+ spin_lock_irqsave(&zone->wp_lock, flags);
if (!file_sectors) {
zone->cond = BLK_ZONE_COND_EMPTY;
zone->wp = zone->start;
@@ -187,6 +195,7 @@ static int zloop_update_seq_zone(struct zloop_device *zlo, unsigned int zone_no)
zone->cond = BLK_ZONE_COND_CLOSED;
zone->wp = zone->start + file_sectors;
}
+ spin_unlock_irqrestore(&zone->wp_lock, flags);
return 0;
}
@@ -230,6 +239,7 @@ static int zloop_open_zone(struct zloop_device *zlo, unsigned int zone_no)
static int zloop_close_zone(struct zloop_device *zlo, unsigned int zone_no)
{
struct zloop_zone *zone = &zlo->zones[zone_no];
+ unsigned long flags;
int ret = 0;
if (test_bit(ZLOOP_ZONE_CONV, &zone->flags))
@@ -248,10 +258,12 @@ static int zloop_close_zone(struct zloop_device *zlo, unsigned int zone_no)
break;
case BLK_ZONE_COND_IMP_OPEN:
case BLK_ZONE_COND_EXP_OPEN:
+ spin_lock_irqsave(&zone->wp_lock, flags);
if (zone->wp == zone->start)
zone->cond = BLK_ZONE_COND_EMPTY;
else
zone->cond = BLK_ZONE_COND_CLOSED;
+ spin_unlock_irqrestore(&zone->wp_lock, flags);
break;
case BLK_ZONE_COND_EMPTY:
case BLK_ZONE_COND_FULL:
@@ -269,6 +281,7 @@ static int zloop_close_zone(struct zloop_device *zlo, unsigned int zone_no)
static int zloop_reset_zone(struct zloop_device *zlo, unsigned int zone_no)
{
struct zloop_zone *zone = &zlo->zones[zone_no];
+ unsigned long flags;
int ret = 0;
if (test_bit(ZLOOP_ZONE_CONV, &zone->flags))
@@ -286,9 +299,11 @@ static int zloop_reset_zone(struct zloop_device *zlo, unsigned int zone_no)
goto unlock;
}
+ spin_lock_irqsave(&zone->wp_lock, flags);
zone->cond = BLK_ZONE_COND_EMPTY;
zone->wp = zone->start;
clear_bit(ZLOOP_ZONE_SEQ_ERROR, &zone->flags);
+ spin_unlock_irqrestore(&zone->wp_lock, flags);
unlock:
mutex_unlock(&zone->lock);
@@ -313,6 +328,7 @@ static int zloop_reset_all_zones(struct zloop_device *zlo)
static int zloop_finish_zone(struct zloop_device *zlo, unsigned int zone_no)
{
struct zloop_zone *zone = &zlo->zones[zone_no];
+ unsigned long flags;
int ret = 0;
if (test_bit(ZLOOP_ZONE_CONV, &zone->flags))
@@ -330,9 +346,11 @@ static int zloop_finish_zone(struct zloop_device *zlo, unsigned int zone_no)
goto unlock;
}
+ spin_lock_irqsave(&zone->wp_lock, flags);
zone->cond = BLK_ZONE_COND_FULL;
zone->wp = ULLONG_MAX;
clear_bit(ZLOOP_ZONE_SEQ_ERROR, &zone->flags);
+ spin_unlock_irqrestore(&zone->wp_lock, flags);
unlock:
mutex_unlock(&zone->lock);
@@ -374,6 +392,7 @@ static void zloop_rw(struct zloop_cmd *cmd)
struct zloop_zone *zone;
struct iov_iter iter;
struct bio_vec tmp;
+ unsigned long flags;
sector_t zone_end;
int nr_bvec = 0;
int ret;
@@ -416,19 +435,30 @@ static void zloop_rw(struct zloop_cmd *cmd)
if (!test_bit(ZLOOP_ZONE_CONV, &zone->flags) && is_write) {
mutex_lock(&zone->lock);
+ spin_lock_irqsave(&zone->wp_lock, flags);
+
/*
* Zone append operations always go at the current write
* pointer, but regular write operations must already be
* aligned to the write pointer when submitted.
*/
if (is_append) {
- if (zone->cond == BLK_ZONE_COND_FULL) {
- ret = -EIO;
- goto unlock;
+ /*
+ * If ordered zone append is in use, we already checked
+ * and set the target sector in zloop_queue_rq().
+ */
+ if (!zlo->ordered_zone_append) {
+ if (zone->cond == BLK_ZONE_COND_FULL) {
+ spin_unlock_irqrestore(&zone->wp_lock,
+ flags);
+ ret = -EIO;
+ goto unlock;
+ }
+ sector = zone->wp;
}
- sector = zone->wp;
cmd->sector = sector;
} else if (sector != zone->wp) {
+ spin_unlock_irqrestore(&zone->wp_lock, flags);
pr_err("Zone %u: unaligned write: sect %llu, wp %llu\n",
zone_no, sector, zone->wp);
ret = -EIO;
@@ -441,15 +471,19 @@ static void zloop_rw(struct zloop_cmd *cmd)
zone->cond = BLK_ZONE_COND_IMP_OPEN;
/*
- * Advance the write pointer. If the write fails, the write
- * pointer position will be corrected when the next I/O starts
- * execution.
+ * Advance the write pointer, unless ordered zone append is in
+ * use. If the write fails, the write pointer position will be
+ * corrected when the next I/O starts execution.
*/
- zone->wp += nr_sectors;
- if (zone->wp == zone_end) {
- zone->cond = BLK_ZONE_COND_FULL;
- zone->wp = ULLONG_MAX;
+ if (!is_append || !zlo->ordered_zone_append) {
+ zone->wp += nr_sectors;
+ if (zone->wp == zone_end) {
+ zone->cond = BLK_ZONE_COND_FULL;
+ zone->wp = ULLONG_MAX;
+ }
}
+
+ spin_unlock_irqrestore(&zone->wp_lock, flags);
}
rq_for_each_bvec(tmp, rq, rq_iter)
@@ -623,6 +657,35 @@ static void zloop_complete_rq(struct request *rq)
blk_mq_end_request(rq, sts);
}
+static bool zloop_set_zone_append_sector(struct request *rq)
+{
+ struct zloop_device *zlo = rq->q->queuedata;
+ unsigned int zone_no = rq_zone_no(rq);
+ struct zloop_zone *zone = &zlo->zones[zone_no];
+ sector_t zone_end = zone->start + zlo->zone_capacity;
+ sector_t nr_sectors = blk_rq_sectors(rq);
+ unsigned long flags;
+
+ spin_lock_irqsave(&zone->wp_lock, flags);
+
+ if (zone->cond == BLK_ZONE_COND_FULL ||
+ zone->wp + nr_sectors > zone_end) {
+ spin_unlock_irqrestore(&zone->wp_lock, flags);
+ return false;
+ }
+
+ rq->__sector = zone->wp;
+ zone->wp += blk_rq_sectors(rq);
+ if (zone->wp >= zone_end) {
+ zone->cond = BLK_ZONE_COND_FULL;
+ zone->wp = ULLONG_MAX;
+ }
+
+ spin_unlock_irqrestore(&zone->wp_lock, flags);
+
+ return true;
+}
+
static blk_status_t zloop_queue_rq(struct blk_mq_hw_ctx *hctx,
const struct blk_mq_queue_data *bd)
{
@@ -633,6 +696,16 @@ static blk_status_t zloop_queue_rq(struct blk_mq_hw_ctx *hctx,
if (zlo->state == Zlo_deleting)
return BLK_STS_IOERR;
+ /*
+ * If we need to strongly order zone append operations, set the request
+ * sector to the zone write pointer location now instead of when the
+ * command work runs.
+ */
+ if (zlo->ordered_zone_append && req_op(rq) == REQ_OP_ZONE_APPEND) {
+ if (!zloop_set_zone_append_sector(rq))
+ return BLK_STS_IOERR;
+ }
+
blk_mq_start_request(rq);
INIT_WORK(&cmd->work, zloop_cmd_workfn);
@@ -667,6 +740,7 @@ static int zloop_report_zones(struct gendisk *disk, sector_t sector,
struct zloop_device *zlo = disk->private_data;
struct blk_zone blkz = {};
unsigned int first, i;
+ unsigned long flags;
int ret;
first = disk_zone_no(disk, sector);
@@ -690,7 +764,9 @@ static int zloop_report_zones(struct gendisk *disk, sector_t sector,
blkz.start = zone->start;
blkz.len = zlo->zone_size;
+ spin_lock_irqsave(&zone->wp_lock, flags);
blkz.wp = zone->wp;
+ spin_unlock_irqrestore(&zone->wp_lock, flags);
blkz.cond = zone->cond;
if (test_bit(ZLOOP_ZONE_CONV, &zone->flags)) {
blkz.type = BLK_ZONE_TYPE_CONVENTIONAL;
@@ -798,6 +874,7 @@ static int zloop_init_zone(struct zloop_device *zlo, struct zloop_options *opts,
int ret;
mutex_init(&zone->lock);
+ spin_lock_init(&zone->wp_lock);
zone->start = (sector_t)zone_no << zlo->zone_shift;
if (!restore)
@@ -951,6 +1028,8 @@ static int zloop_ctl_add(struct zloop_options *opts)
zlo->nr_conv_zones = opts->nr_conv_zones;
zlo->buffered_io = opts->buffered_io;
zlo->zone_append = opts->zone_append;
+ if (zlo->zone_append)
+ zlo->ordered_zone_append = opts->ordered_zone_append;
zlo->workqueue = alloc_workqueue("zloop%d", WQ_UNBOUND | WQ_FREEZABLE,
opts->nr_queues * opts->queue_depth, zlo->id);
@@ -1037,8 +1116,9 @@ static int zloop_ctl_add(struct zloop_options *opts)
zlo->id, zlo->nr_zones,
((sector_t)zlo->zone_size << SECTOR_SHIFT) >> 20,
zlo->block_size);
- pr_info("zloop%d: using %s zone append\n",
+ pr_info("zloop%d: using %s%s zone append\n",
zlo->id,
+ zlo->ordered_zone_append ? "ordered " : "",
zlo->zone_append ? "native" : "emulated");
return 0;
@@ -1127,6 +1207,7 @@ static int zloop_parse_options(struct zloop_options *opts, const char *buf)
opts->queue_depth = ZLOOP_DEF_QUEUE_DEPTH;
opts->buffered_io = ZLOOP_DEF_BUFFERED_IO;
opts->zone_append = ZLOOP_DEF_ZONE_APPEND;
+ opts->ordered_zone_append = ZLOOP_DEF_ORDERED_ZONE_APPEND;
if (!buf)
return 0;
@@ -1248,6 +1329,9 @@ static int zloop_parse_options(struct zloop_options *opts, const char *buf)
}
opts->zone_append = token;
break;
+ case ZLOOP_OPT_ORDERED_ZONE_APPEND:
+ opts->ordered_zone_append = true;
+ break;
case ZLOOP_OPT_ERR:
default:
pr_warn("unknown parameter or missing value '%s'\n", p);
--
2.51.1
* [PATCH 6/6] Documentation: admin-guide: blockdev: update zloop parameters
From: Damien Le Moal @ 2025-11-15 12:15 UTC
To: Jens Axboe, linux-block; +Cc: Christoph Hellwig
In Documentation/admin-guide/blockdev/zoned_loop.rst, add the
description of the zone_append and ordered_zone_append configuration
arguments of the zloop "add" command (device creation).
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
---
.../admin-guide/blockdev/zoned_loop.rst | 61 +++++++++++--------
1 file changed, 37 insertions(+), 24 deletions(-)
diff --git a/Documentation/admin-guide/blockdev/zoned_loop.rst b/Documentation/admin-guide/blockdev/zoned_loop.rst
index 64dcfde7450a..806adde664db 100644
--- a/Documentation/admin-guide/blockdev/zoned_loop.rst
+++ b/Documentation/admin-guide/blockdev/zoned_loop.rst
@@ -68,30 +68,43 @@ The options available for the add command can be listed by reading the
In more details, the options that can be used with the "add" command are as
follows.
-================ ===========================================================
-id Device number (the X in /dev/zloopX).
- Default: automatically assigned.
-capacity_mb Device total capacity in MiB. This is always rounded up to
- the nearest higher multiple of the zone size.
- Default: 16384 MiB (16 GiB).
-zone_size_mb Device zone size in MiB. Default: 256 MiB.
-zone_capacity_mb Device zone capacity (must always be equal to or lower than
- the zone size. Default: zone size.
-conv_zones Total number of conventioanl zones starting from sector 0.
- Default: 8.
-base_dir Path to the base directory where to create the directory
- containing the zone files of the device.
- Default=/var/local/zloop.
- The device directory containing the zone files is always
- named with the device ID. E.g. the default zone file
- directory for /dev/zloop0 is /var/local/zloop/0.
-nr_queues Number of I/O queues of the zoned block device. This value is
- always capped by the number of online CPUs
- Default: 1
-queue_depth Maximum I/O queue depth per I/O queue.
- Default: 64
-buffered_io Do buffered IOs instead of direct IOs (default: false)
-================ ===========================================================
+=================== =========================================================
+id Device number (the X in /dev/zloopX).
+ Default: automatically assigned.
+capacity_mb Device total capacity in MiB. This is always rounded up
+ to the nearest higher multiple of the zone size.
+ Default: 16384 MiB (16 GiB).
+zone_size_mb Device zone size in MiB. Default: 256 MiB.
+zone_capacity_mb Device zone capacity (must always be equal to or lower
+ than the zone size). Default: zone size.
+conv_zones Total number of conventional zones starting from
+ sector 0.
+ Default: 8.
+base_dir Path to the base directory where to create the directory
+ containing the zone files of the device.
+ Default=/var/local/zloop.
+ The device directory containing the zone files is always
+ named with the device ID. E.g. the default zone file
+ directory for /dev/zloop0 is /var/local/zloop/0.
+nr_queues Number of I/O queues of the zoned block device. This
+ value is always capped by the number of online CPUs
+ Default: 1
+queue_depth Maximum I/O queue depth per I/O queue.
+ Default: 64
+buffered_io Do buffered IOs instead of direct IOs (default: false)
+zone_append Enable or disable native zone append support for a
+ zloop device.
+ Default: 1 (enabled).
+ If native zone append support is disabled, the block
+ layer will emulate zone append operations using regular
+ write operations.
+ordered_zone_append Enable zloop mitigation of zone append reordering.
+ Default: disabled.
+ This is useful for testing file system data mapping
+ (extents), as when enabled, this can significantly
+ reduce the number of data extents needed for a file's
+ data mapping.
+=================== =========================================================
3) Deleting a Zoned Device
--------------------------
--
2.51.1
* Re: [PATCH 0/6] zloop fixes and improvements
From: Jens Axboe @ 2025-11-17 16:42 UTC
To: linux-block, Damien Le Moal; +Cc: Christoph Hellwig
On Sat, 15 Nov 2025 21:15:50 +0900, Damien Le Moal wrote:
> Jens,
>
> The first 2 patches are simple fixes for the zloop driver. The third
> patch is a simple refactoring. Finally, patches 4 and 5 introduce new
> configuration parameters that are very useful for testing the block
> layer zone append emulation done as part of zone write plugging (patch
> 4) and for testing file systems that use zone append (XFS and btrfs) by
> changing the processing behavior of zone append operations in zloop
> (patch 5).
>
> [...]
Applied, thanks!
[1/6] zloop: make the write pointer of full zones invalid
commit: 866d65745b635927c3d1343ab67e6fd4a99d116d
[2/6] zloop: fail zone append operations that are targeting full zones
commit: cf28f6f923cb1dd2765b5c3d7697bb4dcf2096a0
[3/6] zloop: simplify checks for writes to sequential zones
commit: e3a96ca90462f80d9f58a1236514823334deef39
[4/6] zloop: introduce the zone_append configuration parameter
commit: 9236c5fdd5a8bec2445e834e7e1bbefb2eb62f67
[5/6] zloop: introduce the ordered_zone_append configuration parameter
commit: fcc6eaa3a03a0e94f6f1d0ac455209b520ef8024
[6/6] Documentation: admin-guide: blockdev: update zloop parameters
commit: ade260ca858627b21be87711b1e12a7bf80c0261
Best regards,
--
Jens Axboe