From: Stefan Hajnoczi <stefanha@redhat.com>
To: Sam Li <faithilikerun@gmail.com>
Cc: qemu-devel@nongnu.org,
Pierrick Bouvier <pierrick.bouvier@oss.qualcomm.com>,
dmitry.fomichev@wdc.com, Kevin Wolf <kwolf@redhat.com>,
cassel@kernel.org, Markus Armbruster <armbru@redhat.com>,
qemu-block@nongnu.org, dlemoal@kernel.org,
Eric Blake <eblake@redhat.com>,
"Michael S. Tsirkin" <mst@redhat.com>,
hare@suse.de, Hanna Reitz <hreitz@redhat.com>
Subject: Re: [PATCH v10 2/4] qcow2: add configurations for zoned format extension
Date: Thu, 14 May 2026 15:49:06 -0400
Message-ID: <20260514194906.GC1535937@fedora>
In-Reply-To: <20260510175059.311814-3-faithilikerun@gmail.com>
On Sun, May 10, 2026 at 07:50:57PM +0200, Sam Li wrote:
> To configure the zoned format feature on the qcow2 driver, it
> requires settings as: the device size, zone model, zone size,
> zone capacity, number of conventional zones, limits on zone
> resources (max append bytes, max open zones, and max_active_zones).
>
> To create a qcow2 image with zoned format feature, use command like
> this:
> qemu-img create -f qcow2 zbc.qcow2 -o size=768M \
> -o zone.size=64M -o zone.capacity=64M -o zone.conventional_zones=0 \
> -o zone.max_append_bytes=4096 -o zone.max_open_zones=6 \
> -o zone.max_active_zones=8 -o zone.mode=host-managed
>
> Signed-off-by: Sam Li <faithilikerun@gmail.com>
> ---
> block/file-posix.c | 2 +-
> block/qcow2.c | 264 ++++++++++++++++++++++++++++++-
> block/qcow2.h | 32 +++-
> docs/interop/qcow2.rst | 110 ++++++++++++-
> include/block/block_int-common.h | 15 +-
> qapi/block-core.json | 70 +++++++-
> 6 files changed, 485 insertions(+), 8 deletions(-)
>
> diff --git a/block/file-posix.c b/block/file-posix.c
> index e49b13d6ab..14278785b9 100644
> --- a/block/file-posix.c
> +++ b/block/file-posix.c
> @@ -3607,7 +3607,7 @@ raw_co_zone_append(BlockDriverState *bs,
>
> if (*offset & zone_size_mask) {
> error_report("sector offset %" PRId64 " is not aligned to zone size "
> - "%" PRId32 "", *offset / 512, bs->bl.zone_size / 512);
> + "%" PRId64 "", *offset / 512, bs->bl.zone_size / 512);
> return -EINVAL;
> }
>
> diff --git a/block/qcow2.c b/block/qcow2.c
> index 81fd299b4c..b543bcf3e3 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c
> @@ -73,6 +73,7 @@ typedef struct {
> #define QCOW2_EXT_MAGIC_CRYPTO_HEADER 0x0537be77
> #define QCOW2_EXT_MAGIC_BITMAPS 0x23852875
> #define QCOW2_EXT_MAGIC_DATA_FILE 0x44415441
> +#define QCOW2_EXT_MAGIC_ZONED_FORMAT 0x007a6264
>
> static int coroutine_fn
> qcow2_co_preadv_compressed(BlockDriverState *bs,
> @@ -194,6 +195,74 @@ qcow2_extract_crypto_opts(QemuOpts *opts, const char *fmt, Error **errp)
> return cryptoopts_qdict;
> }
>
> +/*
> + * Passing by the zoned device configurations by a zoned_header struct, check
> + * if the zone device options are under constraints. Return false when some
> + * option is invalid
> + */
> +static inline bool
> +qcow2_check_zone_options(Qcow2ZonedHeaderExtension *zone_opt, Error **errp)
> +{
> + if (zone_opt) {
The zone_opt == NULL case never happens. Remove it to simplify the
function?
> + uint32_t sequential_zones;
> +
> + if (zone_opt->zone_size == 0) {
> + error_setg(errp, "Zoned extension header zone_size field "
> + "can not be 0");
> + return false;
> + }
> +
> + if (zone_opt->zone_capacity > zone_opt->zone_size) {
> + error_setg(errp, "zone capacity %" PRIu64 "B exceeds zone size "
> + "%" PRIu64 "B", zone_opt->zone_capacity,
> + zone_opt->zone_size);
> + return false;
> + }
> +
> + if (zone_opt->max_append_bytes + BDRV_SECTOR_SIZE >=
> + zone_opt->zone_capacity) {
> + error_setg(errp, "max append bytes %" PRIu64 "B exceeds zone "
> + "capacity %" PRIu32 "B by more than block size",
> + zone_opt->zone_capacity,
> + zone_opt->max_append_bytes);
> + return false;
> + }
> +
> + if (zone_opt->conventional_zones >= zone_opt->nr_zones) {
> + error_setg(errp, "Conventional_zones %" PRIu32 " exceeds "
> + "nr_zones %" PRIu32 ".",
> + zone_opt->conventional_zones, zone_opt->nr_zones);
> + return false;
> + }
> +
> + if (zone_opt->max_active_zones > zone_opt->nr_zones) {
> + error_setg(errp, "Max_active_zones %" PRIu32 " exceeds "
> + "nr_zones %" PRIu32 ". Set it to nr_zones.",
> + zone_opt->max_active_zones, zone_opt->nr_zones);
> + zone_opt->max_active_zones = zone_opt->nr_zones;
> + }
> +
> + sequential_zones = zone_opt->nr_zones - zone_opt->conventional_zones;
> + if (zone_opt->max_open_zones > sequential_zones) {
> + error_setg(errp, "Max_open_zones field can not be larger than"
> + "the number of SWR zones. Set it to number of SWR"
> + "zones %" PRIu32 ".", sequential_zones);
> + zone_opt->max_open_zones = sequential_zones;
> + }
> + if (zone_opt->max_open_zones > zone_opt->max_active_zones) {
> + error_setg(errp, "Max_open_zones %" PRIu32 " exceeds "
> + "max_active_zones %" PRIu32 ". Set it to "
> + "max_active_zones.",
> + zone_opt->max_open_zones,
> + zone_opt->max_active_zones);
> + zone_opt->max_open_zones = zone_opt->max_active_zones;
> + }
> +
> + return true;
> + }
> + return false;
> +}
> +
> /*
> * read qcow2 extension and fill bs
> * start reading from start_offset
> @@ -211,6 +280,7 @@ qcow2_read_extensions(BlockDriverState *bs, uint64_t start_offset,
> uint64_t offset;
> int ret;
> Qcow2BitmapHeaderExt bitmaps_ext;
> + Qcow2ZonedHeaderExtension zoned_ext;
>
> if (need_update_header != NULL) {
> *need_update_header = false;
> @@ -432,6 +502,51 @@ qcow2_read_extensions(BlockDriverState *bs, uint64_t start_offset,
> break;
> }
>
> + case QCOW2_EXT_MAGIC_ZONED_FORMAT:
> + {
> + if (ext.len < sizeof(zoned_ext)) {
Should this be ext.len != sizeof(zoned_ext)?
> + /* Missing fields */
> + error_setg(errp, "zoned_ext: len=%" PRIu32 " too small "
> + "(<%zu)", ext.len, sizeof(zoned_ext));
> + return -EINVAL;
> + }
> + ret = bdrv_pread(bs->file, offset, ext.len, &zoned_ext, 0);
...otherwise reading ext.len bytes into &zoned_ext can overflow the
struct whenever ext.len > sizeof(zoned_ext).
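To illustrate the concern, here is a minimal standalone sketch (not QEMU
code) of an exact-length check before copying into a fixed-size struct.
ZonedExt and read_zoned_ext() are hypothetical stand-ins for
Qcow2ZonedHeaderExtension and the bdrv_pread() call, and the struct is
shortened for brevity:

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, shortened stand-in for Qcow2ZonedHeaderExtension */
typedef struct ZonedExt {
    uint8_t zoned;
    uint8_t reserved[7];
    uint64_t zone_size;
    uint64_t zone_capacity;
} ZonedExt;

/*
 * Copy an on-disk extension of ext_len bytes from src into *out.
 * Rejecting every length other than sizeof(*out) guarantees that the
 * copy neither overflows the destination nor leaves fields unset.
 */
static int read_zoned_ext(const uint8_t *src, uint32_t ext_len, ZonedExt *out)
{
    if (ext_len != sizeof(*out)) {
        return -EINVAL;
    }
    memcpy(out, src, sizeof(*out));
    return 0;
}
```

With a "<" check only, an extension longer than the struct would pass
validation and the read length would exceed the destination size.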
> + if (ret < 0) {
> + error_setg_errno(errp, -ret, "zoned_ext: "
> + "Could not read ext header");
> + return ret;
> + }
> +
> + zoned_ext.zone_size = be64_to_cpu(zoned_ext.zone_size);
> + zoned_ext.zone_capacity = be64_to_cpu(zoned_ext.zone_capacity);
> + zoned_ext.conventional_zones =
> + be32_to_cpu(zoned_ext.conventional_zones);
> + zoned_ext.nr_zones = be32_to_cpu(zoned_ext.nr_zones);
> + zoned_ext.max_open_zones = be32_to_cpu(zoned_ext.max_open_zones);
> + zoned_ext.max_active_zones =
> + be32_to_cpu(zoned_ext.max_active_zones);
> + zoned_ext.max_append_bytes =
> + be32_to_cpu(zoned_ext.max_append_bytes);
> + s->zoned_header = zoned_ext;
> +
> + /* refuse to open broken images */
> + if (zoned_ext.nr_zones != DIV_ROUND_UP(bs->total_sectors *
> + BDRV_SECTOR_SIZE, zoned_ext.zone_size)) {
> + error_setg(errp, "Zoned extension header nr_zones field "
> + "is wrong");
> + return -EINVAL;
> + }
> + if (!qcow2_check_zone_options(&zoned_ext, errp)) {
> + return -EINVAL;
> + }
> +
> +#ifdef DEBUG_EXT
> + printf("Qcow2: Got zoned format extension: "
> + "offset=%" PRIu32 "\n", offset);
> +#endif
> + break;
> + }
> +
> default:
> /* unknown magic - save it in case we need to rewrite the header */
> /* If you add a new feature, make sure to also update the fast
> @@ -2068,6 +2183,15 @@ static void qcow2_refresh_limits(BlockDriverState *bs, Error **errp)
> }
> bs->bl.pwrite_zeroes_alignment = s->subcluster_size;
> bs->bl.pdiscard_alignment = s->cluster_size;
> + bs->bl.zoned = s->zoned_header.zoned;
Let's avoid the assumption that qcow2 file format constants are
identical to internal QEMU block layer API constants (see more on this
below). This will need to translate from the qcow2 zoned header
host-managed constant to the QEMU block layer constant.
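Something along these lines is what I have in mind. This is only a
sketch: the QCOW2_ZONED_* names, the function name, and the
SketchBlockZoneModel enum are made up for illustration (the enum stands
in for BlockZoneModel and deliberately uses different values to show
that the two namespaces are independent):

```c
#include <stdbool.h>
#include <stdint.h>

/* On-disk values of the qcow2 "zoned" header field (hypothetical names) */
#define QCOW2_ZONED_NONE         0
#define QCOW2_ZONED_HOST_MANAGED 1

/* Stand-in for QEMU's BlockZoneModel; values intentionally differ */
typedef enum {
    SKETCH_BLK_Z_NONE = 0x0,
    SKETCH_BLK_Z_HM = 0x10,
} SketchBlockZoneModel;

/*
 * Translate the file format value into the block layer value.
 * Returns false for unknown zone models so the caller can refuse to
 * open the image.
 */
static bool qcow2_zoned_to_blk(uint8_t on_disk, SketchBlockZoneModel *model)
{
    switch (on_disk) {
    case QCOW2_ZONED_NONE:
        *model = SKETCH_BLK_Z_NONE;
        return true;
    case QCOW2_ZONED_HOST_MANAGED:
        *model = SKETCH_BLK_Z_HM;
        return true;
    default:
        return false;
    }
}
```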
> + bs->bl.nr_zones = s->zoned_header.nr_zones;
> + bs->bl.max_append_sectors = s->zoned_header.max_append_bytes
> + >> BDRV_SECTOR_BITS;
> + bs->bl.max_active_zones = s->zoned_header.max_active_zones;
> + bs->bl.max_open_zones = s->zoned_header.max_open_zones;
> + bs->bl.zone_size = s->zoned_header.zone_size;
> + bs->bl.zone_capacity = s->zoned_header.zone_capacity;
> + bs->bl.write_granularity = BDRV_SECTOR_SIZE;
> }
>
> static int GRAPH_UNLOCKED
> @@ -3170,6 +3294,11 @@ int qcow2_update_header(BlockDriverState *bs)
> .bit = QCOW2_INCOMPAT_EXTL2_BITNR,
> .name = "extended L2 entries",
> },
> + {
> + .type = QCOW2_FEAT_TYPE_INCOMPATIBLE,
> + .bit = QCOW2_INCOMPAT_ZONED_FORMAT_BITNR,
> + .name = "zoned format",
> + },
> {
> .type = QCOW2_FEAT_TYPE_COMPATIBLE,
> .bit = QCOW2_COMPAT_LAZY_REFCOUNTS_BITNR,
> @@ -3215,6 +3344,31 @@ int qcow2_update_header(BlockDriverState *bs)
> buflen -= ret;
> }
>
> + /* Zoned devices header extension */
> + if (s->zoned_header.zoned == BLK_Z_HM) {
> + Qcow2ZonedHeaderExtension zoned_header = {
> + .zoned = s->zoned_header.zoned,
> + .zone_size = cpu_to_be64(s->zoned_header.zone_size),
> + .zone_capacity = cpu_to_be64(s->zoned_header.zone_capacity),
> + .conventional_zones =
> + cpu_to_be32(s->zoned_header.conventional_zones),
> + .nr_zones = cpu_to_be32(s->zoned_header.nr_zones),
> + .max_open_zones = cpu_to_be32(s->zoned_header.max_open_zones),
> + .max_active_zones =
> + cpu_to_be32(s->zoned_header.max_active_zones),
> + .max_append_bytes =
> + cpu_to_be32(s->zoned_header.max_append_bytes)
> + };
> + ret = header_ext_add(buf, QCOW2_EXT_MAGIC_ZONED_FORMAT,
> + &zoned_header, sizeof(zoned_header),
> + buflen);
> + if (ret < 0) {
> + goto fail;
> + }
> + buf += ret;
> + buflen -= ret;
> + }
> +
> /* Keep unknown header extensions */
> QLIST_FOREACH(uext, &s->unknown_header_ext, next) {
> ret = header_ext_add(buf, uext->magic, uext->data, uext->len, buflen);
> @@ -3589,6 +3743,8 @@ qcow2_co_create(BlockdevCreateOptions *create_options, Error **errp)
> ERRP_GUARD();
> BlockdevCreateOptionsQcow2 *qcow2_opts;
> QDict *options;
> + Qcow2ZoneCreateOptions *zone_struct;
> + Qcow2ZoneHostManaged *zone_host_managed;
>
> /*
> * Open the image file and write a minimal qcow2 header.
> @@ -3615,6 +3771,8 @@ qcow2_co_create(BlockdevCreateOptions *create_options, Error **errp)
>
> assert(create_options->driver == BLOCKDEV_DRIVER_QCOW2);
> qcow2_opts = &create_options->u.qcow2;
> + zone_struct = create_options->u.qcow2.zone;
> + zone_host_managed = &create_options->u.qcow2.zone->u.host_managed;
>
> bs = bdrv_co_open_blockdev_ref(qcow2_opts->file, errp);
> if (bs == NULL) {
> @@ -3828,6 +3986,14 @@ qcow2_co_create(BlockdevCreateOptions *create_options, Error **errp)
> header->incompatible_features |=
> cpu_to_be64(QCOW2_INCOMPAT_DATA_FILE);
> }
> + if (zone_struct && zone_struct->mode == QCOW2_ZONE_MODEL_HOST_MANAGED) {
> + /*
> + * The incompatible bit must be set when the zone model is
> + * host-managed
> + */
> + header->incompatible_features |=
> + cpu_to_be64(QCOW2_INCOMPAT_ZONED_FORMAT);
> + }
> if (qcow2_opts->data_file_raw) {
> header->autoclear_features |=
> cpu_to_be64(QCOW2_AUTOCLEAR_DATA_FILE_RAW);
> @@ -3885,10 +4051,9 @@ qcow2_co_create(BlockdevCreateOptions *create_options, Error **errp)
> bdrv_graph_co_rdlock();
> ret = qcow2_alloc_clusters(blk_bs(blk), 3 * cluster_size);
> if (ret < 0) {
> - bdrv_graph_co_rdunlock();
> error_setg_errno(errp, -ret, "Could not allocate clusters for qcow2 "
> "header and refcount table");
> - goto out;
> + goto unlock;
>
> } else if (ret != 0) {
> error_report("Huh, first cluster in empty image is already in use?");
> @@ -3896,11 +4061,62 @@ qcow2_co_create(BlockdevCreateOptions *create_options, Error **errp)
> }
>
> /* Set the external data file if necessary */
> + BDRVQcow2State *s = blk_bs(blk)->opaque;
> if (data_bs) {
> - BDRVQcow2State *s = blk_bs(blk)->opaque;
> s->image_data_file = g_strdup(data_bs->filename);
> }
>
> + if (zone_struct && zone_struct->mode == QCOW2_ZONE_MODEL_HOST_MANAGED) {
> + s->zoned_header.zoned = BLK_Z_HM;
BLK_Z_HM is a QEMU block layer constant while zoned_header.zoned is a
qcow2 file format field. Please do not mix the two, even if the constant
values are identical. Instead, define a constant for host-managed (1) in
block/qcow2.h and use that. This ensures that QEMU's internal constant
values can be changed in the future and are not tied to the qcow2 file
format.
> +
> + if (!zone_host_managed->has_size) {
> + s->zoned_header.zone_size = DEFAULT_ZONE_SIZE;
> + } else {
> + s->zoned_header.zone_size = zone_host_managed->size;
> + }
> + s->zoned_header.nr_zones = DIV_ROUND_UP(qcow2_opts->size,
> + s->zoned_header.zone_size);
> +
> + if (zone_host_managed->has_capacity) {
> + s->zoned_header.zone_capacity = zone_host_managed->capacity;
> + } else {
> + s->zoned_header.zone_capacity = s->zoned_header.zone_size;
> + }
> +
> + if (zone_host_managed->has_conventional_zones) {
> + s->zoned_header.conventional_zones =
> + zone_host_managed->conventional_zones;
> + }
> +
> + if (zone_host_managed->has_max_active_zones) {
> + s->zoned_header.max_active_zones =
> + zone_host_managed->max_active_zones;
> +
> + if (zone_host_managed->has_max_open_zones) {
> + s->zoned_header.max_open_zones =
> + zone_host_managed->max_open_zones;
> + } else {
> + s->zoned_header.max_open_zones =
> + zone_host_managed->max_active_zones;
> + }
> + }
> +
> + if (zone_host_managed->has_max_append_bytes) {
> + s->zoned_header.max_append_bytes =
> + zone_host_managed->max_append_bytes;
> + } else {
> + s->zoned_header.max_append_bytes = DEFAULT_MAX_APPEND_BYTES;
> + }
> +
> + if (!qcow2_check_zone_options(&s->zoned_header, errp)) {
> + s->zoned_header.zoned = BLK_Z_NONE;
> + ret = -EINVAL;
> + goto unlock;
> + }
> + } else {
> + s->zoned_header.zoned = BLK_Z_NONE;
> + }
> +
> /* Create a full header (including things like feature table) */
> ret = qcow2_update_header(blk_bs(blk));
> bdrv_graph_co_rdunlock();
> @@ -3974,6 +4190,9 @@ qcow2_co_create(BlockdevCreateOptions *create_options, Error **errp)
> }
>
> ret = 0;
> + goto out;
> +unlock:
> + bdrv_graph_co_rdunlock();
> out:
> blk_co_unref(blk);
> bdrv_co_unref(bs);
> @@ -4052,6 +4271,10 @@ qcow2_co_create_opts(BlockDriver *drv, const char *filename, QemuOpts *opts,
> { BLOCK_OPT_COMPAT_LEVEL, "version" },
> { BLOCK_OPT_DATA_FILE_RAW, "data-file-raw" },
> { BLOCK_OPT_COMPRESSION_TYPE, "compression-type" },
> + { BLOCK_OPT_CONVENTIONAL_ZONES, "zone.conventional-zones" },
> + { BLOCK_OPT_MAX_OPEN_ZONES, "zone.max-open-zones" },
> + { BLOCK_OPT_MAX_ACTIVE_ZONES, "zone.max-active-zones" },
> + { BLOCK_OPT_MAX_APPEND_BYTES, "zone.max-append-bytes" },
> { NULL, NULL },
> };
>
> @@ -6265,6 +6488,41 @@ static QemuOptsList qcow2_create_opts = {
> .type = QEMU_OPT_BOOL, \
> .help = "Assume the external data file already exists and " \
> "do not overwrite it" \
> + }, \
> + { \
> + .name = BLOCK_OPT_ZONE_MODEL, \
> + .type = QEMU_OPT_STRING, \
> + .help = "zone model modes, mode choice: host-managed", \
> + }, \
> + { \
> + .name = BLOCK_OPT_ZONE_SIZE, \
> + .type = QEMU_OPT_SIZE, \
> + .help = "zone size", \
> + }, \
> + { \
> + .name = BLOCK_OPT_ZONE_CAPACITY, \
> + .type = QEMU_OPT_SIZE, \
> + .help = "zone capacity", \
> + }, \
> + { \
> + .name = BLOCK_OPT_CONVENTIONAL_ZONES, \
> + .type = QEMU_OPT_NUMBER, \
> + .help = "numbers of conventional zones", \
> + }, \
> + { \
> + .name = BLOCK_OPT_MAX_APPEND_BYTES, \
> + .type = QEMU_OPT_SIZE, \
> + .help = "max append bytes", \
> + }, \
> + { \
> + .name = BLOCK_OPT_MAX_ACTIVE_ZONES, \
> + .type = QEMU_OPT_NUMBER, \
> + .help = "max active zones", \
> + }, \
> + { \
> + .name = BLOCK_OPT_MAX_OPEN_ZONES, \
> + .type = QEMU_OPT_NUMBER, \
> + .help = "max open zones", \
> },
> QCOW_COMMON_OPTIONS,
> { /* end of list */ }
> diff --git a/block/qcow2.h b/block/qcow2.h
> index 192a45d596..743f2e3e79 100644
> --- a/block/qcow2.h
> +++ b/block/qcow2.h
> @@ -129,6 +129,9 @@
>
> #define DEFAULT_CLUSTER_SIZE 65536
>
> +#define DEFAULT_ZONE_SIZE (256 * MiB)
> +#define DEFAULT_MAX_APPEND_BYTES (64 * KiB)
DEFAULT_ZONE_MAX_APPEND_BYTES would make it clear that this constant is
related to zones.
> +
> #define QCOW2_OPT_DATA_FILE "data-file"
> #define QCOW2_OPT_LAZY_REFCOUNTS "lazy-refcounts"
> #define QCOW2_OPT_DISCARD_REQUEST "pass-discard-request"
> @@ -237,6 +240,27 @@ typedef struct Qcow2CryptoHeaderExtension {
> uint64_t length;
> } QEMU_PACKED Qcow2CryptoHeaderExtension;
>
> +typedef struct Qcow2ZonedHeaderExtension {
> + /* Zoned device attributes */
> + uint8_t zoned;
> + uint8_t reserved[3];
I suggest padding to natural alignment so there is no need to worry
about misaligned loads/stores:
uint8_t reserved[7];
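A quick way to check the resulting layout is with compile-time
assertions. The sketch below is standalone and uses a made-up struct
name. Note one assumption that goes beyond the suggestion above: with
five uint32_t fields before zonedmeta_size, the trailing 64-bit fields
would start at offset 44, so the sketch adds a hypothetical 4-byte
reserved2 field to keep them 8-byte aligned as well:

```c
#include <stddef.h>
#include <stdint.h>

/* Sketch of a naturally aligned layout; not the layout from the patch */
typedef struct __attribute__((packed)) SketchZonedHeaderExt {
    uint8_t zoned;
    uint8_t reserved[7];
    uint64_t zone_size;
    uint64_t zone_capacity;
    uint32_t conventional_zones;
    uint32_t nr_zones;
    uint32_t max_active_zones;
    uint32_t max_open_zones;
    uint32_t max_append_bytes;
    uint32_t reserved2;          /* assumption: pad to an 8-byte boundary */
    uint64_t zonedmeta_size;
    uint64_t zonedmeta_offset;
} SketchZonedHeaderExt;

/* Every 64-bit field must start on an 8-byte boundary */
_Static_assert(offsetof(SketchZonedHeaderExt, zone_size) % 8 == 0,
               "zone_size misaligned");
_Static_assert(offsetof(SketchZonedHeaderExt, zone_capacity) % 8 == 0,
               "zone_capacity misaligned");
_Static_assert(offsetof(SketchZonedHeaderExt, zonedmeta_size) % 8 == 0,
               "zonedmeta_size misaligned");
_Static_assert(offsetof(SketchZonedHeaderExt, zonedmeta_offset) % 8 == 0,
               "zonedmeta_offset misaligned");
```

If the layout changes, the byte offsets in docs/interop/qcow2.rst need
to be updated to match.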
> + uint64_t zone_size;
> + uint64_t zone_capacity;
> + uint32_t conventional_zones;
> + uint32_t nr_zones;
> + uint32_t max_active_zones;
> + uint32_t max_open_zones;
> + uint32_t max_append_bytes;
> + uint64_t zonedmeta_size;
> + uint64_t zonedmeta_offset;
> +} QEMU_PACKED Qcow2ZonedHeaderExtension;
> +
> +typedef struct Qcow2ZoneListEntry {
> + QLIST_ENTRY(Qcow2ZoneListEntry) exp_open_zone_entry;
> + QLIST_ENTRY(Qcow2ZoneListEntry) imp_open_zone_entry;
> + QLIST_ENTRY(Qcow2ZoneListEntry) closed_zone_entry;
> +} Qcow2ZoneListEntry;
> +
> typedef struct Qcow2UnknownHeaderExtension {
> uint32_t magic;
> uint32_t len;
> @@ -257,17 +281,20 @@ enum {
> QCOW2_INCOMPAT_DATA_FILE_BITNR = 2,
> QCOW2_INCOMPAT_COMPRESSION_BITNR = 3,
> QCOW2_INCOMPAT_EXTL2_BITNR = 4,
> + QCOW2_INCOMPAT_ZONED_FORMAT_BITNR = 5,
> QCOW2_INCOMPAT_DIRTY = 1 << QCOW2_INCOMPAT_DIRTY_BITNR,
> QCOW2_INCOMPAT_CORRUPT = 1 << QCOW2_INCOMPAT_CORRUPT_BITNR,
> QCOW2_INCOMPAT_DATA_FILE = 1 << QCOW2_INCOMPAT_DATA_FILE_BITNR,
> QCOW2_INCOMPAT_COMPRESSION = 1 << QCOW2_INCOMPAT_COMPRESSION_BITNR,
> QCOW2_INCOMPAT_EXTL2 = 1 << QCOW2_INCOMPAT_EXTL2_BITNR,
> + QCOW2_INCOMPAT_ZONED_FORMAT = 1 << QCOW2_INCOMPAT_ZONED_FORMAT_BITNR,
>
> QCOW2_INCOMPAT_MASK = QCOW2_INCOMPAT_DIRTY
> | QCOW2_INCOMPAT_CORRUPT
> | QCOW2_INCOMPAT_DATA_FILE
> | QCOW2_INCOMPAT_COMPRESSION
> - | QCOW2_INCOMPAT_EXTL2,
> + | QCOW2_INCOMPAT_EXTL2
> + | QCOW2_INCOMPAT_ZONED_FORMAT,
> };
>
> /* Compatible feature bits */
> @@ -426,6 +453,9 @@ typedef struct BDRVQcow2State {
> * is to convert the image with the desired compression type set.
> */
> Qcow2CompressionType compression_type;
> +
> + /* States of zoned device */
> + Qcow2ZonedHeaderExtension zoned_header;
> } BDRVQcow2State;
>
> typedef struct Qcow2COWRegion {
> diff --git a/docs/interop/qcow2.rst b/docs/interop/qcow2.rst
> index 5948591107..7bf3a00e77 100644
> --- a/docs/interop/qcow2.rst
> +++ b/docs/interop/qcow2.rst
> @@ -128,7 +128,26 @@ the next fields through ``header_length``.
> allows subcluster-based allocation. See the
> Extended L2 Entries section for more details.
>
> - Bits 5-63: Reserved (set to 0)
> + Bit 5: Zoned extension bit. If this bit is set then
> + the file is an emulated zoned device. The
> + zoned extension must be present.
> + Implementations that do not support zoned
> + emulation cannot open this file because it
> + generally only make sense to interpret the
> + data along with the zone information and
> + write pointers.
> +
> + It is unsafe when any qcow2 user without
> + knowing the zoned extension reads or edits
> + a file with the zoned extension. The write
> + pointer tracking can be corrupted when a
> + writer edits a file, like overwriting beyond
> + the write pointer locations. Or a reader tries
> + to access a file without knowing write
> + pointers where the software setup will cause
> + invalid reads.
> +
> + Bits 6-63: Reserved (set to 0)
>
> 80 - 87: compatible_features
> Bitmask of compatible features. An implementation can
> @@ -259,6 +278,7 @@ be stored. Each extension has a structure like the following::
> 0x23852875 - Bitmaps extension
> 0x0537be77 - Full disk encryption header pointer
> 0x44415441 - External data file name string
> + 0x007a6264 - Zoned extension
> other - Unknown header extension, can be safely
> ignored
>
> @@ -344,6 +364,94 @@ The fields of the bitmaps extension are::
> Offset into the image file at which the bitmap directory
> starts. Must be aligned to a cluster boundary.
>
> +Zoned extension
> +---------------
> +
> +The zoned extension must be present if the incompatible bit is set, and
> +omitted when the incompatible bit is clear. It contains fields for
The first sentence is a little vague about the "incompatible bit".
Either saying "the corresponding incompatible bit" or
"incompatible_features bit 5" is more specific and gives the reader an
indication that they need to look for a specific bit.
> +emulating the zoned storage model (https://zonedstorage.io/). When the
> +zone model mode is not host-managed, it is regarded as incompatible
> +and reports an error to users.
This sentence is unnecessary because the same information is repeated in
the zoned field description below. I suggest removing it since the zoned
field description provides more context, but if you want you could
replace it with something like:
Currently only the host-managed zone model is supported.
> +
> +The write pointers for each zone are stored in an area called zonedmeta
> +clusters. It is 8 bytes per zone. The offset and size of the zonedmeta
Slightly more descriptive:
"It is 8 bytes per zone" -> "The zonedmeta clusters contain a table of
the zone write pointers. Each entry is 8 bytes."
> +are kept in the zoned extension header.
> +
> +The fields of the zoned extension are::
> +
> + Byte 0: zoned
> + The byte represents the zoned model of the device. 0 is for
> + a non-zoned device (all other information in this header
> + is ignored). 1 is for a host-managed device, which only
> + allows for sequential writes within each zone. Other
> + values may be added later, the implementation must refuse
> + to open a device containing an unknown zone model.
> +
> + 1 - 3: Reserved, must be zero.
> +
> + 4 - 11: zone_size
> + Total size of each zone, in bytes. The 64-bit field is to
> + satisfy the virtio-blk zone_size range and emulate a read
> + zoned device, whose maximum zone size can be as large as
> + 2TB.
> +
> + The value must be power of 2. Linux currently requires
> + the zone size to be a power of 2 number of LBAs. Qcow2
> + following this is mainly to allow emulating a real
> + ZNS drive configuration. It is not relevant to the cluster
> + size.
> +
> + 12 - 19: zone_capacity
> + The number of writable bytes within the zones. The bytes
> + between zone capacity and zone size are unusable: reads
> + will return 0s and writes will fail.
> +
> + A zone capacity is always smaller or equal to the zone
> + size. It is for emulating a real ZNS drive configuration,
> + which has the constraint of aligning to some hardware erase
> + block size.
> +
> + 20 - 23: conventional_zones
> + The number of conventional zones. The conventional zones
> + allow sequential writes and random writes. While the
> + sequential zones only allow sequential writes.
> +
> + 24 - 27: nr_zones
> + The number of zones. It is the sum of conventional zones
> + and sequential zones. The maximum value for nr_zones is
> + (2^32 - 1)/8 = 536870911.
> +
> + 28 - 31: max_active_zones
> + The number of the zones that can be in the implicit open,
> + explicit open or closed state. The max active zones cannot
> + be larger than the max open zones.
> +
> + 32 - 35: max_open_zones
> + The maximal number of open (implicitly open or explicitly
> + open) zones. It cannot be larger than the number of SWR
> + zones of the device.
> +
> + If the limits of open zones or active zones are equal to
> + the total number of SWR zones, then it's the same as having
> + no limits therefore max open zones and max active zones are
> + set to 0.
> +
> + 36 - 39: max_append_bytes
> + The number of bytes of a zone append request that can be
> + issued to the device. It must be 512-byte aligned and less
> + than the zone capacity.
> +
> + 40 - 47: zonedmeta_size
> + The size of zoned metadata in bytes. It contains no more
> + than 4GB. The zoned metadata structure is the write
> + pointers for each zone whose size is the number of zones
> + multiplied by the zone size.
I can't parse the last sentence. Is this field necessary since there is
already a nr_zones field?
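If each write pointer entry is 8 bytes as described above, the size and
the per-zone entry offsets could be derived like this. This is a sketch
with hypothetical helper names, assuming the table is a plain array
indexed by zone number:

```c
#include <stdint.h>

/* Bytes per write pointer entry, taken from the spec text above */
#define ZONEDMETA_ENTRY_SIZE 8u

/* Size of the write pointer table, derived from nr_zones alone */
static uint64_t zonedmeta_size_from_nr_zones(uint32_t nr_zones)
{
    return (uint64_t)nr_zones * ZONEDMETA_ENTRY_SIZE;
}

/* Offset in the image file of the write pointer for one zone */
static uint64_t zone_wp_offset(uint64_t zonedmeta_offset, uint32_t zone_index)
{
    return zonedmeta_offset + (uint64_t)zone_index * ZONEDMETA_ENTRY_SIZE;
}
```

For the documented maximum of 536870911 zones this stays below 4 GB,
which matches the limit stated for zonedmeta_size.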
> +
> + 48 - 55: zonedmeta_offset
> + The offset of zoned metadata structure in the contained
> + image, in bytes.
Do you want to say anything about the order in which metadata is
persisted to disk when zones are used? I guess the data is written into
the image file first, then the non-zoned qcow2 L1/L2/refcount metadata
is updated, and finally the write pointer is written. Write pointers are
not guaranteed to be updated on disk until the write request followed by
a flush request are both completed.
(The idea is that the data must be visible in the qcow2 file before it
is safe to update the write pointer. Otherwise a power failure would
leave the file in an inconsistent state where the write pointer has
advanced but the data was not written.)
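Here is a rough standalone sketch of that ordering using plain POSIX
file I/O rather than QEMU block layer calls. The function name and
parameters are hypothetical, the write pointer is stored in host byte
order for simplicity, and the intermediate fsync() is my assumption
about one way to enforce the ordering, not a statement about how the
driver must do it:

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Append len bytes at the zone's current write pointer *wp, then persist
 * the advanced write pointer at wp_offset in the same file.
 * Returns 0 on success and -1 on error; *wp only changes on success.
 */
static int zone_append_ordered(int fd, off_t wp_offset, uint64_t *wp,
                               const void *data, size_t len)
{
    uint64_t new_wp = *wp + len;

    /* 1. Write the data first */
    if (pwrite(fd, data, len, (off_t)*wp) != (ssize_t)len) {
        return -1;
    }

    /* 2. A real driver would update L1/L2/refcount metadata here */

    /* Make the data durable before the pointer that refers to it */
    if (fsync(fd) < 0) {
        return -1;
    }

    /* 3. Only now advance the on-disk write pointer */
    if (pwrite(fd, &new_wp, sizeof(new_wp), wp_offset) !=
        (ssize_t)sizeof(new_wp)) {
        return -1;
    }
    if (fsync(fd) < 0) {
        return -1;
    }

    *wp = new_wp;
    return 0;
}
```

If power is lost between steps 1 and 3, the old write pointer is still
on disk and the extra data beyond it is simply ignored.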
> Full disk encryption header pointer
> -----------------------------------
>
> diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
> index 147c08155f..349876bc65 100644
> --- a/include/block/block_int-common.h
> +++ b/include/block/block_int-common.h
> @@ -57,6 +57,13 @@
> #define BLOCK_OPT_COMPRESSION_TYPE "compression_type"
> #define BLOCK_OPT_EXTL2 "extended_l2"
> #define BLOCK_OPT_KEEP_DATA_FILE "keep_data_file"
> +#define BLOCK_OPT_ZONE_MODEL "zone.mode"
> +#define BLOCK_OPT_ZONE_SIZE "zone.size"
> +#define BLOCK_OPT_ZONE_CAPACITY "zone.capacity"
> +#define BLOCK_OPT_CONVENTIONAL_ZONES "zone.conventional_zones"
> +#define BLOCK_OPT_MAX_APPEND_BYTES "zone.max_append_bytes"
> +#define BLOCK_OPT_MAX_ACTIVE_ZONES "zone.max_active_zones"
> +#define BLOCK_OPT_MAX_OPEN_ZONES "zone.max_open_zones"
>
> #define BLOCK_PROBE_BUF_SIZE 512
>
> @@ -901,7 +908,13 @@ typedef struct BlockLimits {
> BlockZoneModel zoned;
>
> /* zone size expressed in bytes */
> - uint32_t zone_size;
> + uint64_t zone_size;
> +
> + /*
> + * The number of usable logical blocks within the zone, expressed
> + * in bytes. A zone capacity is smaller or equal to the zone size.
> + */
> + uint64_t zone_capacity;
>
> /* total number of zones */
> uint32_t nr_zones;
> diff --git a/qapi/block-core.json b/qapi/block-core.json
> index 508b081ac1..d771dfb4a1 100644
> --- a/qapi/block-core.json
> +++ b/qapi/block-core.json
> @@ -5268,6 +5268,70 @@
> { 'enum': 'Qcow2CompressionType',
> 'data': [ 'zlib', { 'name': 'zstd', 'if': 'CONFIG_ZSTD' } ] }
>
> +##
> +# @Qcow2ZoneModel:
> +#
> +# Zoned device model used in qcow2 image file
> +#
> +# @host-managed: The host-managed model only allows sequential write
> +# over the device zones.
> +#
> +# Since 11.0
> +##
> +{ 'enum': 'Qcow2ZoneModel',
> + 'data': [ 'host-managed'] }
> +
> +##
> +# @Qcow2ZoneHostManaged:
> +#
> +# The host-managed zone model. It only allows sequential writes.
> +#
> +# @size: Total number of bytes within zones (default 256 MB).
> +#
> +# @capacity: The number of usable logical blocks within zones
> +# in bytes. A zone capacity is always smaller or equal to the
> +# zone size (default to zone size).
> +#
> +# @conventional-zones: The number of conventional zones of the
> +# zoned device (default 0).
> +#
> +# @max-open-zones: The maximal number of open zones. It is less than
> +# or equal to the number of sequential write required zones of
> +# the device (default 0).
> +#
> +# @max-active-zones: The maximal number of zones in the implicit
> +# open, explicit open or closed state. It is less than or equal
> +# to the max open zones (default 0).
> +#
> +# @max-append-bytes: The maximal number of bytes of a zone
> +# append request that can be issued to the device. It must be
> +# 512-byte aligned and less than the zone capacity
> +# (default 64 KB).
> +#
> +# Since 11.0
> +##
> +{ 'struct': 'Qcow2ZoneHostManaged',
> + 'data': { '*size': 'size',
> + '*capacity': 'size',
> + '*conventional-zones': 'uint32',
> + '*max-open-zones': 'uint32',
> + '*max-active-zones': 'uint32',
> + '*max-append-bytes': 'size' } }
> +
> +##
> +# @Qcow2ZoneCreateOptions:
> +#
> +# The zone device model for the qcow2 image.
> +#
> +# @mode: The zone device model modes.
> +#
> +# Since 11.0
> +##
> +{ 'union': 'Qcow2ZoneCreateOptions',
> + 'base': { 'mode': 'Qcow2ZoneModel' },
> + 'discriminator': 'mode',
> + 'data': { 'host-managed': 'Qcow2ZoneHostManaged' } }
> +
> ##
> # @BlockdevCreateOptionsQcow2:
> #
> @@ -5310,6 +5374,9 @@
> # @compression-type: The image cluster compression method
> # (default: zlib, since 5.1)
> #
> +# @zone: The zone device model modes. The default is that the
> +# device is not zoned. (since 11.0)
> +#
> # Since: 2.12
> ##
> { 'struct': 'BlockdevCreateOptionsQcow2',
> @@ -5326,7 +5393,8 @@
> '*preallocation': 'PreallocMode',
> '*lazy-refcounts': 'bool',
> '*refcount-bits': 'int',
> - '*compression-type':'Qcow2CompressionType' } }
> + '*compression-type':'Qcow2CompressionType',
> + '*zone': 'Qcow2ZoneCreateOptions' } }
>
> ##
> # @BlockdevCreateOptionsQed:
> --
> 2.43.0
>