* [PATCH v5 0/4] Add full zoned storage emulation to qcow2 driver
@ 2023-10-30 12:18 Sam Li
2023-10-30 12:18 ` [PATCH v5 1/4] docs/qcow2: add the zoned format feature Sam Li
` (3 more replies)
0 siblings, 4 replies; 17+ messages in thread
From: Sam Li @ 2023-10-30 12:18 UTC (permalink / raw)
To: qemu-devel
Cc: Kevin Wolf, Hanna Reitz, Eric Blake, dlemoal, hare,
dmitry.fomichev, stefanha, qemu-block, Markus Armbruster, Sam Li
This patch series add a new extension - zoned format - to the
qcow2 driver thereby allowing full zoned storage emulation on
the qcow2 img file. Users can attach such a qcow2 file to the
guest as a zoned device.
Write pointer are preserved in the zoned metadata. It will be
recovered after power cycle. Meanwhile, any open (implicit or
explicit) zone will show up as closed.
Zone states are in memory. Read-only and offline states are
device-internal events, which are not considerred in qcow2
emulation for simplicity. The other zone states
(closed, empty, full) can be inferred from write poiner
values, presistent across QEMU reboots. The open states are
kept in memory using open zone lists.
To create a qcow2 file with zoned format, use command like this:
$ qemu-img create -f qcow2 test.qcow2 -o size=768M -o
zone_size=64M -o zone_capacity=64M -o nr_conv_zones=0 -o
max_append_sectors=512 -o max_open_zones=0 -o max_active_zones=0
-o zone_model=1
Then add it to the QEMU command line:
-blockdev node-name=drive1,driver=qcow2,file.driver=file,file.filename=../qemu/test.qcow2 \
-device virtio-blk-pci,drive=drive1 \
v4->v5:
- add incompatible bit for zoned format [Eric]
- fix and manage zone resources via LRU [Damien]
- renaming functions and fields, spec changes [Markus, Damien]
- add closed zone list
- make qemu iotests for zoned device consecutive [Stefan]
v3->v4:
- use QLIST for implicit, explicit open zones management [Stefan]
- keep zone states in memory and drop state bits in wp metadata structure [Damien, Stefan]
- change zone resource management and iotests accordingly
- add tracing for number of implicit zones
- address review comments [Stefan, Markus]:
* documentation, config, style
v2->v3:
- drop zoned_profile option [Klaus]
- reformat doc comments of qcow2 [Markus]
- add input validation and checks for zoned information [Stefan]
- code style: format, comments, documentation, naming [Stefan]
- add tracing function for wp tracking [Stefan]
- reconstruct io path in check_zone_resources [Stefan]
v1->v2:
- add more tests to qemu-io zoned commands
- make zone append change state to full when wp reaches end
- add documentation to qcow2 zoned extension header
- address review comments (Stefan):
* fix zoned_mata allocation size
* use bitwise or than addition
* fix wp index overflow and locking
* cleanups: comments, naming
Sam Li (4):
docs/qcow2: add the zoned format feature
qcow2: add configurations for zoned format extension
qcow2: add zoned emulation capability
iotests: test the zoned format feature for qcow2 file
block/qcow2.c | 934 ++++++++++++++++++++++-
block/qcow2.h | 37 +-
block/trace-events | 2 +
docs/interop/qcow2.txt | 67 +-
docs/system/qemu-block-drivers.rst.inc | 33 +
include/block/block_int-common.h | 13 +
include/qemu/queue.h | 1 +
qapi/block-core.json | 45 +-
tests/qemu-iotests/tests/zoned-qcow2 | 126 +++
tests/qemu-iotests/tests/zoned-qcow2.out | 118 +++
10 files changed, 1370 insertions(+), 6 deletions(-)
create mode 100755 tests/qemu-iotests/tests/zoned-qcow2
create mode 100644 tests/qemu-iotests/tests/zoned-qcow2.out
--
2.40.1
^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH v5 1/4] docs/qcow2: add the zoned format feature
2023-10-30 12:18 [PATCH v5 0/4] Add full zoned storage emulation to qcow2 driver Sam Li
@ 2023-10-30 12:18 ` Sam Li
2023-10-30 14:04 ` Eric Blake
2023-10-30 12:18 ` [PATCH v5 2/4] qcow2: add configurations for zoned format extension Sam Li
` (2 subsequent siblings)
3 siblings, 1 reply; 17+ messages in thread
From: Sam Li @ 2023-10-30 12:18 UTC (permalink / raw)
To: qemu-devel
Cc: Kevin Wolf, Hanna Reitz, Eric Blake, dlemoal, hare,
dmitry.fomichev, stefanha, qemu-block, Markus Armbruster, Sam Li
Add the specs for the zoned format feature of the qcow2 driver.
The qcow2 file can be taken as zoned device and passed through by
virtio-blk device or NVMe ZNS device to the guest given zoned
information.
Signed-off-by: Sam Li <faithilikerun@gmail.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
docs/system/qemu-block-drivers.rst.inc | 33 ++++++++++++++++++++++++++
1 file changed, 33 insertions(+)
diff --git a/docs/system/qemu-block-drivers.rst.inc b/docs/system/qemu-block-drivers.rst.inc
index 105cb9679c..4647c5fa29 100644
--- a/docs/system/qemu-block-drivers.rst.inc
+++ b/docs/system/qemu-block-drivers.rst.inc
@@ -172,6 +172,39 @@ This section describes each format and the options that are supported for it.
filename`` to check if the NOCOW flag is set or not (Capital 'C' is
NOCOW flag).
+ .. option:: zoned
+ 1 for host-managed zoned device and 0 for a non-zoned device.
+
+ .. option:: zone_size
+
+ The size of a zone in bytes. The device is divided into zones of this
+ size with the exception of the last zone, which may be smaller.
+
+ .. option:: zone_capacity
+
+ The initial capacity value, in bytes, for all zones. The capacity must
+ be less than or equal to zone size. If the last zone is smaller, then
+ its capacity is capped.
+
+ The zone capacity is per zone and may be different between zones in real
+ devices. For simplicity, QCow2 sets all zones to the same capacity.
+
+ .. option:: zone_nr_conv
+
+ The number of conventional zones of the zoned device.
+
+ .. option:: max_open_zones
+
+ The maximal allowed open zones.
+
+ .. option:: max_active_zones
+
+ The limit of the zones with implicit open, explicit open or closed state.
+
+ .. option:: max_append_sectors
+
+ The maximal number of 512-byte sectors in a zone append request.
+
.. program:: image-formats
.. option:: qed
--
2.40.1
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH v5 2/4] qcow2: add configurations for zoned format extension
2023-10-30 12:18 [PATCH v5 0/4] Add full zoned storage emulation to qcow2 driver Sam Li
2023-10-30 12:18 ` [PATCH v5 1/4] docs/qcow2: add the zoned format feature Sam Li
@ 2023-10-30 12:18 ` Sam Li
2023-10-30 14:53 ` Eric Blake
` (2 more replies)
2023-10-30 12:18 ` [PATCH v5 3/4] qcow2: add zoned emulation capability Sam Li
2023-10-30 12:18 ` [PATCH v5 4/4] iotests: test the zoned format feature for qcow2 file Sam Li
3 siblings, 3 replies; 17+ messages in thread
From: Sam Li @ 2023-10-30 12:18 UTC (permalink / raw)
To: qemu-devel
Cc: Kevin Wolf, Hanna Reitz, Eric Blake, dlemoal, hare,
dmitry.fomichev, stefanha, qemu-block, Markus Armbruster, Sam Li
To configure the zoned format feature on the qcow2 driver, it
requires settings as: the device size, zone model, zone size,
zone capacity, number of conventional zones, limits on zone
resources (max append bytes, max open zones, and max_active_zones).
To create a qcow2 file with zoned format, use command like this:
$ qemu-img create -f qcow2 test.qcow2 -o size=768M -o
zone_size=64M -o zone_capacity=64M -o conventional_zones=0 -o
max_append_bytes=4096 -o max_open_zones=0 -o max_active_zones=0
-o zone_model=host-managed
Signed-off-by: Sam Li <faithilikerun@gmail.com>
fix config?
---
block/qcow2.c | 205 ++++++++++++++++++++++++++++++-
block/qcow2.h | 37 +++++-
docs/interop/qcow2.txt | 67 +++++++++-
include/block/block_int-common.h | 13 ++
qapi/block-core.json | 45 ++++++-
5 files changed, 362 insertions(+), 5 deletions(-)
diff --git a/block/qcow2.c b/block/qcow2.c
index aa01d9e7b5..cd53268ca7 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -73,6 +73,7 @@ typedef struct {
#define QCOW2_EXT_MAGIC_CRYPTO_HEADER 0x0537be77
#define QCOW2_EXT_MAGIC_BITMAPS 0x23852875
#define QCOW2_EXT_MAGIC_DATA_FILE 0x44415441
+#define QCOW2_EXT_MAGIC_ZONED_FORMAT 0x007a6264
static int coroutine_fn
qcow2_co_preadv_compressed(BlockDriverState *bs,
@@ -210,6 +211,7 @@ qcow2_read_extensions(BlockDriverState *bs, uint64_t start_offset,
uint64_t offset;
int ret;
Qcow2BitmapHeaderExt bitmaps_ext;
+ Qcow2ZonedHeaderExtension zoned_ext;
if (need_update_header != NULL) {
*need_update_header = false;
@@ -431,6 +433,63 @@ qcow2_read_extensions(BlockDriverState *bs, uint64_t start_offset,
break;
}
+ case QCOW2_EXT_MAGIC_ZONED_FORMAT:
+ {
+ if (ext.len != sizeof(zoned_ext)) {
+ error_setg(errp, "zoned_ext: Invalid extension length");
+ return -EINVAL;
+ }
+ ret = bdrv_pread(bs->file, offset, ext.len, &zoned_ext, 0);
+ if (ret < 0) {
+ error_setg_errno(errp, -ret, "zoned_ext: "
+ "Could not read ext header");
+ return ret;
+ }
+
+ if (s->incompatible_features & QCOW2_INCOMPAT_ZONED_FORMAT) {
+ warn_report("A program lacking zoned format support "
+ "may modify this file and zoned metadata are "
+ "now considered inconsistent");
+ error_printf("The zoned metadata is corrupted.\n");
+ }
+
+ zoned_ext.zone_size = be32_to_cpu(zoned_ext.zone_size);
+ zoned_ext.zone_capacity = be32_to_cpu(zoned_ext.zone_capacity);
+ zoned_ext.conventional_zones =
+ be32_to_cpu(zoned_ext.conventional_zones);
+ zoned_ext.nr_zones = be32_to_cpu(zoned_ext.nr_zones);
+ zoned_ext.max_open_zones = be32_to_cpu(zoned_ext.max_open_zones);
+ zoned_ext.max_active_zones =
+ be32_to_cpu(zoned_ext.max_active_zones);
+ zoned_ext.max_append_bytes =
+ be32_to_cpu(zoned_ext.max_append_bytes);
+ s->zoned_header = zoned_ext;
+
+ /* refuse to open broken images */
+ if (zoned_ext.zone_size == 0) {
+ error_setg(errp, "Zoned extension header zone_size field "
+ "can not be 0");
+ return -EINVAL;
+ }
+ if (zoned_ext.zone_capacity > zoned_ext.zone_size) {
+ error_setg(errp, "Zoned extension header zone_capacity field "
+ "can not be larger that zone_size field");
+ return -EINVAL;
+ }
+ if (zoned_ext.nr_zones != DIV_ROUND_UP(
+ bs->total_sectors * BDRV_SECTOR_SIZE, zoned_ext.zone_size)) {
+ error_setg(errp, "Zoned extension header nr_zones field "
+ "is wrong");
+ return -EINVAL;
+ }
+
+#ifdef DEBUG_EXT
+ printf("Qcow2: Got zoned format extension: "
+ "offset=%" PRIu32 "\n", offset);
+#endif
+ break;
+ }
+
default:
/* unknown magic - save it in case we need to rewrite the header */
/* If you add a new feature, make sure to also update the fast
@@ -1967,6 +2026,15 @@ static void qcow2_refresh_limits(BlockDriverState *bs, Error **errp)
}
bs->bl.pwrite_zeroes_alignment = s->subcluster_size;
bs->bl.pdiscard_alignment = s->cluster_size;
+ bs->bl.zoned = s->zoned_header.zoned;
+ bs->bl.nr_zones = s->zoned_header.nr_zones;
+ bs->bl.max_append_sectors = s->zoned_header.max_append_bytes
+ >> BDRV_SECTOR_BITS;
+ bs->bl.max_active_zones = s->zoned_header.max_active_zones;
+ bs->bl.max_open_zones = s->zoned_header.max_open_zones;
+ bs->bl.zone_size = s->zoned_header.zone_size;
+ bs->bl.zone_capacity = s->zoned_header.zone_capacity;
+ bs->bl.write_granularity = BDRV_SECTOR_SIZE;
}
static int GRAPH_UNLOCKED
@@ -3062,6 +3130,11 @@ int qcow2_update_header(BlockDriverState *bs)
.bit = QCOW2_INCOMPAT_EXTL2_BITNR,
.name = "extended L2 entries",
},
+ {
+ .type = QCOW2_FEAT_TYPE_INCOMPATIBLE,
+ .bit = QCOW2_INCOMPAT_ZONED_FORMAT_BITNR,
+ .name = "zoned format",
+ },
{
.type = QCOW2_FEAT_TYPE_COMPATIBLE,
.bit = QCOW2_COMPAT_LAZY_REFCOUNTS_BITNR,
@@ -3107,6 +3180,31 @@ int qcow2_update_header(BlockDriverState *bs)
buflen -= ret;
}
+ /* Zoned devices header extension */
+ if (s->zoned_header.zoned == BLK_Z_HM) {
+ Qcow2ZonedHeaderExtension zoned_header = {
+ .zoned = s->zoned_header.zoned,
+ .zone_size = cpu_to_be32(s->zoned_header.zone_size),
+ .zone_capacity = cpu_to_be32(s->zoned_header.zone_capacity),
+ .conventional_zones =
+ cpu_to_be32(s->zoned_header.conventional_zones),
+ .nr_zones = cpu_to_be32(s->zoned_header.nr_zones),
+ .max_open_zones = cpu_to_be32(s->zoned_header.max_open_zones),
+ .max_active_zones =
+ cpu_to_be32(s->zoned_header.max_active_zones),
+ .max_append_bytes =
+ cpu_to_be32(s->zoned_header.max_append_bytes)
+ };
+ ret = header_ext_add(buf, QCOW2_EXT_MAGIC_ZONED_FORMAT,
+ &zoned_header, sizeof(zoned_header),
+ buflen);
+ if (ret < 0) {
+ goto fail;
+ }
+ buf += ret;
+ buflen -= ret;
+ }
+
/* Keep unknown header extensions */
QLIST_FOREACH(uext, &s->unknown_header_ext, next) {
ret = header_ext_add(buf, uext->magic, uext->data, uext->len, buflen);
@@ -3718,6 +3816,10 @@ qcow2_co_create(BlockdevCreateOptions *create_options, Error **errp)
header->incompatible_features |=
cpu_to_be64(QCOW2_INCOMPAT_DATA_FILE);
}
+ if (qcow2_opts->zone_model != QCOW2_ZONE_MODEL_HOST_MANAGED) {
+ header->incompatible_features |=
+ cpu_to_be64(QCOW2_INCOMPAT_ZONED_FORMAT);
+ }
if (qcow2_opts->data_file_raw) {
header->autoclear_features |=
cpu_to_be64(QCOW2_AUTOCLEAR_DATA_FILE_RAW);
@@ -3786,11 +3888,70 @@ qcow2_co_create(BlockdevCreateOptions *create_options, Error **errp)
}
/* Set the external data file if necessary */
+ BDRVQcow2State *s = blk_bs(blk)->opaque;
if (data_bs) {
- BDRVQcow2State *s = blk_bs(blk)->opaque;
s->image_data_file = g_strdup(data_bs->filename);
}
+ if (qcow2_opts->zone_model == QCOW2_ZONE_MODEL_HOST_MANAGED) {
+ if (!qcow2_opts->has_zone_size) {
+ error_setg(errp, "Missing zone_size parameter");
+ ret = -EINVAL;
+ goto out;
+ }
+
+ if (qcow2_opts->zone_size == 0) {
+ s->zoned_header.zoned = BLK_Z_NONE;
+ error_setg(errp, "Zoned devices can not allow a larger-than-zero "
+ "zone_size");
+ ret = -EINVAL;
+ goto out;
+ }
+
+ s->zoned_header.zoned = BLK_Z_HM;
+ s->zoned_header.zone_size = qcow2_opts->zone_size;
+ s->zoned_header.nr_zones = DIV_ROUND_UP(qcow2_opts->size,
+ qcow2_opts->zone_size);
+
+ if (qcow2_opts->has_zone_capacity) {
+ if (qcow2_opts->zone_capacity > qcow2_opts->zone_size) {
+ s->zoned_header.zoned = BLK_Z_NONE;
+ error_setg(errp, "zone capacity %" PRIu64 "B exceeds zone size "
+ "%" PRIu64"B", qcow2_opts->zone_capacity,
+ qcow2_opts->zone_size);
+ ret = -EINVAL;
+ goto out;
+ }
+ s->zoned_header.zone_capacity = qcow2_opts->zone_capacity;
+ } else {
+ s->zoned_header.zone_capacity = qcow2_opts->zone_size;
+ }
+
+ if (qcow2_opts->has_conventional_zones) {
+ s->zoned_header.conventional_zones = qcow2_opts->conventional_zones;
+ }
+
+ if (qcow2_opts->has_max_active_zones) {
+ if (qcow2_opts->max_open_zones > qcow2_opts->max_active_zones) {
+ s->zoned_header.zoned = BLK_Z_NONE;
+ error_setg(errp, "max_open_zones %" PRIu32 " exceeds "
+ "max_active_zones %" PRIu32"",
+ qcow2_opts->max_open_zones,
+ qcow2_opts->max_active_zones);
+ ret = -EINVAL;
+ goto out;
+ }
+ if (qcow2_opts->has_max_open_zones) {
+ s->zoned_header.max_open_zones = qcow2_opts->max_active_zones;
+ } else {
+ s->zoned_header.max_open_zones = qcow2_opts->max_active_zones;
+ }
+ }
+ s->zoned_header.max_append_bytes = qcow2_opts->max_append_bytes;
+ } else {
+ s->zoned_header.zoned = BLK_Z_NONE;
+ }
+
/* Create a full header (including things like feature table) */
ret = qcow2_update_header(blk_bs(blk));
bdrv_graph_co_rdunlock();
@@ -3921,6 +4082,13 @@ qcow2_co_create_opts(BlockDriver *drv, const char *filename, QemuOpts *opts,
{ BLOCK_OPT_COMPAT_LEVEL, "version" },
{ BLOCK_OPT_DATA_FILE_RAW, "data-file-raw" },
{ BLOCK_OPT_COMPRESSION_TYPE, "compression-type" },
+ { BLOCK_OPT_ZONE_MODEL, "zone-model"},
+ { BLOCK_OPT_CONVENTIONAL_ZONES, "conventional-zones"},
+ { BLOCK_OPT_MAX_OPEN_ZONES, "max-open-zones"},
+ { BLOCK_OPT_MAX_ACTIVE_ZONES, "max-active-zones"},
+ { BLOCK_OPT_MAX_APPEND_BYTES, "max-append-bytes"},
+ { BLOCK_OPT_ZONE_SIZE, "zone-size"},
+ { BLOCK_OPT_ZONE_CAPACITY, "zone-capacity"},
{ NULL, NULL },
};
@@ -6087,6 +6255,41 @@ static QemuOptsList qcow2_create_opts = {
.help = "Compression method used for image cluster " \
"compression", \
.def_value_str = "zlib" \
+ }, \
+ { \
+ .name = BLOCK_OPT_ZONE_MODEL, \
+ .type = QEMU_OPT_STRING, \
+ .help = "zone model", \
+ }, \
+ { \
+ .name = BLOCK_OPT_ZONE_SIZE, \
+ .type = QEMU_OPT_SIZE, \
+ .help = "zone size", \
+ }, \
+ { \
+ .name = BLOCK_OPT_ZONE_CAPACITY, \
+ .type = QEMU_OPT_SIZE, \
+ .help = "zone capacity", \
+ }, \
+ { \
+ .name = BLOCK_OPT_CONVENTIONAL_ZONES, \
+ .type = QEMU_OPT_NUMBER, \
+ .help = "numbers of conventional zones", \
+ }, \
+ { \
+ .name = BLOCK_OPT_MAX_APPEND_BYTES, \
+ .type = QEMU_OPT_NUMBER, \
+ .help = "max append bytes", \
+ }, \
+ { \
+ .name = BLOCK_OPT_MAX_ACTIVE_ZONES, \
+ .type = QEMU_OPT_NUMBER, \
+ .help = "max active zones", \
+ }, \
+ { \
+ .name = BLOCK_OPT_MAX_OPEN_ZONES, \
+ .type = QEMU_OPT_NUMBER, \
+ .help = "max open zones", \
},
QCOW_COMMON_OPTIONS,
{ /* end of list */ }
diff --git a/block/qcow2.h b/block/qcow2.h
index 29958c512b..6e5d90afd2 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -236,6 +236,27 @@ typedef struct Qcow2CryptoHeaderExtension {
uint64_t length;
} QEMU_PACKED Qcow2CryptoHeaderExtension;
+typedef struct Qcow2ZonedHeaderExtension {
+ /* Zoned device attributes */
+ uint8_t zoned;
+ uint8_t reserved[3];
+ uint32_t zone_size;
+ uint32_t zone_capacity;
+ uint32_t conventional_zones;
+ uint32_t nr_zones;
+ uint32_t max_active_zones;
+ uint32_t max_open_zones;
+ uint32_t max_append_bytes;
+ uint64_t zonedmeta_size;
+ uint64_t zonedmeta_offset;
+} QEMU_PACKED Qcow2ZonedHeaderExtension;
+
+typedef struct Qcow2ZoneListEntry {
+ QLIST_ENTRY(Qcow2ZoneListEntry) exp_open_zone_entry;
+ QLIST_ENTRY(Qcow2ZoneListEntry) imp_open_zone_entry;
+ QLIST_ENTRY(Qcow2ZoneListEntry) closed_zone_entry;
+} Qcow2ZoneListEntry;
+
typedef struct Qcow2UnknownHeaderExtension {
uint32_t magic;
uint32_t len;
@@ -256,17 +277,20 @@ enum {
QCOW2_INCOMPAT_DATA_FILE_BITNR = 2,
QCOW2_INCOMPAT_COMPRESSION_BITNR = 3,
QCOW2_INCOMPAT_EXTL2_BITNR = 4,
+ QCOW2_INCOMPAT_ZONED_FORMAT_BITNR = 5,
QCOW2_INCOMPAT_DIRTY = 1 << QCOW2_INCOMPAT_DIRTY_BITNR,
QCOW2_INCOMPAT_CORRUPT = 1 << QCOW2_INCOMPAT_CORRUPT_BITNR,
QCOW2_INCOMPAT_DATA_FILE = 1 << QCOW2_INCOMPAT_DATA_FILE_BITNR,
QCOW2_INCOMPAT_COMPRESSION = 1 << QCOW2_INCOMPAT_COMPRESSION_BITNR,
QCOW2_INCOMPAT_EXTL2 = 1 << QCOW2_INCOMPAT_EXTL2_BITNR,
+ QCOW2_INCOMPAT_ZONED_FORMAT = 1 << QCOW2_INCOMPAT_ZONED_FORMAT_BITNR,
QCOW2_INCOMPAT_MASK = QCOW2_INCOMPAT_DIRTY
| QCOW2_INCOMPAT_CORRUPT
| QCOW2_INCOMPAT_DATA_FILE
| QCOW2_INCOMPAT_COMPRESSION
- | QCOW2_INCOMPAT_EXTL2,
+ | QCOW2_INCOMPAT_EXTL2
+ | QCOW2_INCOMPAT_ZONED_FORMAT,
};
/* Compatible feature bits */
@@ -285,7 +309,6 @@ enum {
QCOW2_AUTOCLEAR_DATA_FILE_RAW = 1 << QCOW2_AUTOCLEAR_DATA_FILE_RAW_BITNR,
QCOW2_AUTOCLEAR_MASK = QCOW2_AUTOCLEAR_BITMAPS
- | QCOW2_AUTOCLEAR_DATA_FILE_RAW,
};
enum qcow2_discard_type {
@@ -422,6 +445,16 @@ typedef struct BDRVQcow2State {
* is to convert the image with the desired compression type set.
*/
Qcow2CompressionType compression_type;
+
+ /* States of zoned device */
+ Qcow2ZonedHeaderExtension zoned_header;
+ QLIST_HEAD(, Qcow2ZoneListEntry) exp_open_zones;
+ QLIST_HEAD(, Qcow2ZoneListEntry) imp_open_zones;
+ QLIST_HEAD(, Qcow2ZoneListEntry) closed_zones;
+ Qcow2ZoneListEntry *zone_list_entries;
+ uint32_t nr_zones_exp_open;
+ uint32_t nr_zones_imp_open;
+ uint32_t nr_zones_closed;
} BDRVQcow2State;
typedef struct Qcow2COWRegion {
diff --git a/docs/interop/qcow2.txt b/docs/interop/qcow2.txt
index 2c4618375a..b210bc4580 100644
--- a/docs/interop/qcow2.txt
+++ b/docs/interop/qcow2.txt
@@ -125,7 +125,18 @@ the next fields through header_length.
allows subcluster-based allocation. See the
Extended L2 Entries section for more details.
- Bits 5-63: Reserved (set to 0)
+ Bit 5: Zoned extension bit. If this bit is set then
+ the file is a zoned device file with
+ host-managed model.
+
+ It is unsafe when any qcow2 writer which does
+ not understand zone information edits a file
+ with the zoned extension. The write pointer
+ tracking is corrupted between different version
+ of qcow2 writes. In such cases, the file can
+ no longer be seen as a zoned device.
+
+ Bits 6-63: Reserved (set to 0)
80 - 87: compatible_features
Bitmask of compatible features. An implementation can
@@ -249,6 +260,7 @@ be stored. Each extension has a structure like the following:
0x23852875 - Bitmaps extension
0x0537be77 - Full disk encryption header pointer
0x44415441 - External data file name string
+ 0x007a6264 - Zoned extension
other - Unknown header extension, can be safely
ignored
@@ -331,6 +343,59 @@ The fields of the bitmaps extension are:
Offset into the image file at which the bitmap directory
starts. Must be aligned to a cluster boundary.
+== Zoned extension ==
+
+The zoned extension is an optional header extension. It contains fields for
+emulating the zoned stroage model (https://zonedstorage.io/).
+
+When the device model is not recognized as host-managed, it is regared as
+incompatible and report an error to users.
+
+The fields of the zoned extension are:
+ Byte 0: zoned
+ This bit represents zone model of devices. 1 is for
+ host-managed, which only allows sequential writes.
+ And 0 is for non-zoned devices without such constraints.
+
+ 1 - 3: Reserved, must be zero.
+
+ 4 - 7: zone_size
+ Total size of each zones, in bytes. It is less than 4GB
+ in the contained image for simplicity.
+
+ 8 - 11: zone_capacity
+ The number of writable bytes within the zones.
+ A zone capacity is always smaller or equal to the
+ zone size.
+
+ 12 - 15: conventional_zones
+ The number of conventional zones.
+
+ 16 - 19: nr_zones
+ The number of zones. It is the sum of conventional zones
+ and sequential zones.
+
+ 20 - 23: max_active_zones
+ The number of the zones that are in the implicit open,
+ explicit open or closed state.
+
+ 24 - 27: max_open_zones
+ The maximal number of open (implicitly open or explicitly
+ open) zones.
+
+ 28 - 31: max_append_bytes
+ The number of bytes of a zone append request that can be
+ issued to the device. It must be 512-byte aligned.
+
+ 32 - 39: zonedmeta_size
+ The size of zoned metadata in bytes. It contains no more
+ than 4GB. The zoned metadata structure is the write
+ pointers for each zone.
+
+ 40 - 47: zonedmeta_offset
+ The offset of zoned metadata structure in the contained
+ image, in bytes.
+
== Full disk encryption header pointer ==
The full disk encryption header must be present if, and only if, the
diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
index b8d9d24f39..4b94e55eb4 100644
--- a/include/block/block_int-common.h
+++ b/include/block/block_int-common.h
@@ -57,6 +57,13 @@
#define BLOCK_OPT_DATA_FILE_RAW "data_file_raw"
#define BLOCK_OPT_COMPRESSION_TYPE "compression_type"
#define BLOCK_OPT_EXTL2 "extended_l2"
+#define BLOCK_OPT_ZONE_MODEL "zone_model"
+#define BLOCK_OPT_ZONE_SIZE "zone_size"
+#define BLOCK_OPT_ZONE_CAPACITY "zone_capacity"
+#define BLOCK_OPT_CONVENTIONAL_ZONES "conventional_zones"
+#define BLOCK_OPT_MAX_APPEND_BYTES "max_append_bytes"
+#define BLOCK_OPT_MAX_ACTIVE_ZONES "max_active_zones"
+#define BLOCK_OPT_MAX_OPEN_ZONES "max_open_zones"
#define BLOCK_PROBE_BUF_SIZE 512
@@ -883,6 +890,12 @@ typedef struct BlockLimits {
/* zone size expressed in bytes */
uint32_t zone_size;
+ /*
+ * the number of usable logical blocks within the zone, expressed
+ * in bytes. A zone capacity is smaller or equal to the zone size.
+ */
+ uint32_t zone_capacity;
+
/* total number of zones */
uint32_t nr_zones;
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 89751d81f2..884e058046 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -4981,6 +4981,21 @@
{ 'enum': 'Qcow2CompressionType',
'data': [ 'zlib', { 'name': 'zstd', 'if': 'CONFIG_ZSTD' } ] }
+##
+# @Qcow2ZoneModel:
+#
+# Zoned device model used in qcow2 image file
+#
+# @non-zoned: non-zoned model is for regular block devices
+#
+# @host-managed: host-managed model only allows sequential write over the
+# device zones
+#
+# Since 8.2
+##
+{ 'enum': 'Qcow2ZoneModel',
+ 'data': ['non-zoned', 'host-managed'] }
+
##
# @BlockdevCreateOptionsQcow2:
#
@@ -5023,6 +5038,27 @@
# @compression-type: The image cluster compression method
# (default: zlib, since 5.1)
#
+# @zone-model: @Qcow2ZoneModel. The zone device model.
+# (default: non-zoned, since 8.2)
+#
+# @zone-size: Total number of bytes within zones (since 8.2)
+#
+# @zone-capacity: The number of usable logical blocks within zones
+# in bytes. A zone capacity is always smaller or equal to the
+# zone size (since 8.2)
+#
+# @conventional-zones: The number of conventional zones of the
+# zoned device (since 8.2)
+#
+# @max-open-zones: The maximal number of open zones (since 8.2)
+#
+# @max-active-zones: The maximal number of zones in the implicit
+# open, explicit open or closed state (since 8.2)
+#
+# @max-append-bytes: The maximal number of bytes of a zone
+# append request that can be issued to the device. It must be
+# 512-byte aligned (since 8.2)
+#
# Since: 2.12
##
{ 'struct': 'BlockdevCreateOptionsQcow2',
@@ -5039,7 +5075,14 @@
'*preallocation': 'PreallocMode',
'*lazy-refcounts': 'bool',
'*refcount-bits': 'int',
- '*compression-type':'Qcow2CompressionType' } }
+ '*compression-type':'Qcow2CompressionType',
+ '*zone-model': 'Qcow2ZoneModel',
+ '*zone-size': 'size',
+ '*zone-capacity': 'size',
+ '*conventional-zones': 'uint32',
+ '*max-open-zones': 'uint32',
+ '*max-active-zones': 'uint32',
+ '*max-append-bytes': 'uint32' } }
##
# @BlockdevCreateOptionsQed:
--
2.40.1
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH v5 3/4] qcow2: add zoned emulation capability
2023-10-30 12:18 [PATCH v5 0/4] Add full zoned storage emulation to qcow2 driver Sam Li
2023-10-30 12:18 ` [PATCH v5 1/4] docs/qcow2: add the zoned format feature Sam Li
2023-10-30 12:18 ` [PATCH v5 2/4] qcow2: add configurations for zoned format extension Sam Li
@ 2023-10-30 12:18 ` Sam Li
2023-10-30 12:18 ` [PATCH v5 4/4] iotests: test the zoned format feature for qcow2 file Sam Li
3 siblings, 0 replies; 17+ messages in thread
From: Sam Li @ 2023-10-30 12:18 UTC (permalink / raw)
To: qemu-devel
Cc: Kevin Wolf, Hanna Reitz, Eric Blake, dlemoal, hare,
dmitry.fomichev, stefanha, qemu-block, Markus Armbruster, Sam Li
By adding zone operations and zoned metadata, the zoned emulation
capability enables full emulation support of zoned device using
a qcow2 file. The zoned device metadata includes zone type,
zoned device state and write pointer of each zone, which is stored
to an array of unsigned integers.
Each zone of a zoned device makes state transitions following
the zone state machine. The zone state machine mainly describes
five states, IMPLICIT OPEN, EXPLICIT OPEN, FULL, EMPTY and CLOSED.
READ ONLY and OFFLINE states will generally be affected by device
internal events. The operations on zones cause corresponding state
changing.
Zoned devices have a limit on zone resources, which puts constraints on
write operations into zones. It is managed by active zone lists
following LRU policy.
Signed-off-by: Sam Li <faithilikerun@gmail.com>
---
block/qcow2.c | 731 ++++++++++++++++++++++++++++++++++++++++++-
block/trace-events | 2 +
include/qemu/queue.h | 1 +
3 files changed, 732 insertions(+), 2 deletions(-)
diff --git a/block/qcow2.c b/block/qcow2.c
index cd53268ca7..b0f9023fd9 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -194,6 +194,178 @@ qcow2_extract_crypto_opts(QemuOpts *opts, const char *fmt, Error **errp)
return cryptoopts_qdict;
}
+#define QCOW2_ZT_IS_CONV(wp) (wp & 1ULL << 59)
+
+/*
+ * To emulate a real zoned device, closed, empty and full states are
+ * preserved after a power cycle. Open states are in-memory and will
+ * be lost after closing the device. Read-only and offline states are
+ * device-internal events, which are not considered for simplicity.
+ */
+static inline BlockZoneState qcow2_get_zone_state(BlockDriverState *bs,
+ uint32_t index)
+{
+ BDRVQcow2State *s = bs->opaque;
+ Qcow2ZoneListEntry *zone_entry = &s->zone_list_entries[index];
+ uint64_t zone_wp = bs->wps->wp[index];
+ uint64_t zone_start;
+
+ if (QCOW2_ZT_IS_CONV(zone_wp)) {
+ return BLK_ZS_NOT_WP;
+ }
+
+ if (QLIST_IS_INSERTED(zone_entry, exp_open_zone_entry)) {
+ return BLK_ZS_EOPEN;
+ }
+ if (QLIST_IS_INSERTED(zone_entry, imp_open_zone_entry)) {
+ return BLK_ZS_IOPEN;
+ }
+
+ zone_start = index * bs->bl.zone_size;
+ if (zone_wp == zone_start) {
+ return BLK_ZS_EMPTY;
+ }
+ if (zone_wp >= zone_start + bs->bl.zone_capacity) {
+ return BLK_ZS_FULL;
+ }
+ if (zone_wp > zone_start) {
+ return BLK_ZS_CLOSED;
+ }
+ return BLK_ZS_NOT_WP;
+}
+
+/*
+ * Write the new wp value to the dedicated location of the image file.
+ */
+static int qcow2_write_wp_at(BlockDriverState *bs, uint64_t *wp,
+ uint32_t index) {
+ BDRVQcow2State *s = bs->opaque;
+ uint64_t wpv = *wp;
+ int ret;
+
+ ret = bdrv_pwrite(bs->file, s->zoned_header.zonedmeta_offset
+ + sizeof(uint64_t) * index, sizeof(uint64_t), wp, 0);
+ if (ret < 0) {
+ goto exit;
+ }
+ trace_qcow2_wp_tracking(index, *wp >> BDRV_SECTOR_BITS);
+ return ret;
+
+exit:
+ *wp = wpv;
+ error_report("Failed to write metadata with file");
+ return ret;
+}
+
+static bool qcow2_can_activate_zone(BlockDriverState *bs)
+{
+ BDRVQcow2State *s = bs->opaque;
+ /* When the max active zone is zero, there is no limit on active zones */
+ if (!s->zoned_header.max_active_zones) {
+ return true;
+ }
+
+ /* The active zones are zones with the states of open and closed */
+ if (s->nr_zones_exp_open + s->nr_zones_imp_open + s->nr_zones_closed
+ < s->zoned_header.max_active_zones) {
+ return true;
+ }
+
+ return false;
+}
+
+/*
+ * This function manages open zones under active zones limit. It checks
+ * if a zone can transition to open state while maintaining max open and
+ * active zone limits.
+ */
+static bool qcow2_can_open_zone(BlockDriverState *bs)
+{
+ BDRVQcow2State *s = bs->opaque;
+ Qcow2ZoneListEntry *zone_entry;
+
+ /* When the max open zone is zero, there is no limit on open zones */
+ if (!s->zoned_header.max_open_zones) {
+ return true;
+ }
+
+ /*
+ * The open zones are zones with the states of explicitly and
+ * implicitly open.
+ */
+ if (s->nr_zones_imp_open + s->nr_zones_exp_open <
+ s->zoned_header.max_open_zones) {
+ return true;
+ }
+
+ /*
+ * Zones are managed once at a time. Thus, the number of implicitly open
+ * zone can never be over the open zone limit. When the active zone limit
+ * is not reached, close only one implicitly open zone.
+ */
+ if (qcow2_can_activate_zone(bs)) {
+ /*
+ * The LRU policy is used for handling active zone lists. When
+ * removing a random zone entry, we discard the least recently used
+ * list item. The list item at the last is the least recently used
+ * one. The zone list maintained this property by removing the last
+ * entry and inserting before the first entry.
+ */
+ zone_entry = QLIST_LAST(&s->imp_open_zones, imp_open_zone_entry);
+ QLIST_REMOVE(zone_entry, imp_open_zone_entry);
+ s->nr_zones_imp_open--;
+ trace_qcow2_imp_open_zones(0x23, s->nr_zones_imp_open);
+ s->nr_zones_closed++;
+ return true;
+ }
+ return false;
+}
+
+/*
+ * The zoned device has limited zone resources on open, closed, active
+ * zones.
+ */
+static int qcow2_check_zone_resources(BlockDriverState *bs,
+ BlockZoneState zs)
+{
+ switch (zs) {
+ case BLK_ZS_EMPTY:
+ if (!qcow2_can_activate_zone(bs)) {
+ error_report("No enough active zones");
+ return -EINVAL;
+ }
+ break;
+ case BLK_ZS_CLOSED:
+ if (!qcow2_can_open_zone(bs)) {
+ error_report("No enough open zones");
+ return -EINVAL;
+ }
+ break;
+ default:
+ /* Other states will not affect zone resources management */
+ return -EINVAL;
+ }
+ return 0;
+}
+
+static inline int qcow2_refresh_zonedmeta(BlockDriverState *bs)
+{
+ int ret;
+ BDRVQcow2State *s = bs->opaque;
+ uint64_t wps_size = s->zoned_header.zonedmeta_size;
+ g_autofree uint64_t *temp = NULL;
+ temp = g_new(uint64_t, wps_size);
+ ret = bdrv_pread(bs->file, s->zoned_header.zonedmeta_offset,
+ wps_size, temp, 0);
+ if (ret < 0) {
+ error_report("Can not read metadata");
+ return ret;
+ }
+
+ memcpy(bs->wps->wp, temp, wps_size);
+ return 0;
+}
+
/*
* read qcow2 extension and fill bs
* start reading from start_offset
@@ -463,7 +635,25 @@ qcow2_read_extensions(BlockDriverState *bs, uint64_t start_offset,
be32_to_cpu(zoned_ext.max_active_zones);
zoned_ext.max_append_bytes =
be32_to_cpu(zoned_ext.max_append_bytes);
+ zoned_ext.zonedmeta_offset =
+ be64_to_cpu(zoned_ext.zonedmeta_offset);
+ zoned_ext.zonedmeta_size = be64_to_cpu(zoned_ext.zonedmeta_size);
s->zoned_header = zoned_ext;
+ bs->wps = g_malloc(sizeof(BlockZoneWps)
+ + s->zoned_header.zonedmeta_size);
+ ret = qcow2_refresh_zonedmeta(bs);
+ if (ret < 0) {
+ error_setg_errno(errp, -ret, "zonedmeta: "
+ "Could not update zoned meta");
+ return ret;
+ }
+
+ s->zone_list_entries = g_new0(Qcow2ZoneListEntry,
+ zoned_ext.nr_zones);
+ QLIST_INIT(&s->exp_open_zones);
+ QLIST_INIT(&s->imp_open_zones);
+ QLIST_INIT(&s->closed_zones);
+ qemu_co_mutex_init(&bs->wps->colock);
/* refuse to open broken images */
if (zoned_ext.zone_size == 0) {
@@ -2734,9 +2924,37 @@ qcow2_co_pwritev_part(BlockDriverState *bs, int64_t offset, int64_t bytes,
uint64_t host_offset;
QCowL2Meta *l2meta = NULL;
AioTaskPool *aio = NULL;
+ int64_t start_offset, start_bytes;
+ BlockZoneState zs;
+ int64_t end;
+ uint64_t *wp;
+ int64_t zone_size = bs->bl.zone_size;
+ int index;
trace_qcow2_writev_start_req(qemu_coroutine_self(), offset, bytes);
+ start_offset = offset;
+ start_bytes = bytes;
+ if (bs->bl.zoned == BLK_Z_HM) {
+ /*
+ * The offset should not less than the wp of that zone where
+ * offset starts.
+ */
+ index = start_offset / zone_size;
+ wp = &bs->wps->wp[index];
+ if (offset < *wp) {
+ return -EINVAL;
+ }
+
+ /* Only allow writes when there are zone resources left */
+ zs = qcow2_get_zone_state(bs, index);
+ if (zs == BLK_ZS_CLOSED || zs == BLK_ZS_EMPTY) {
+ if (qcow2_check_zone_resources(bs, zs) < 0) {
+ return -EINVAL;
+ }
+ }
+ }
+
while (bytes != 0 && aio_task_pool_status(aio) == 0) {
l2meta = NULL;
@@ -2782,6 +3000,58 @@ qcow2_co_pwritev_part(BlockDriverState *bs, int64_t offset, int64_t bytes,
qiov_offset += cur_bytes;
trace_qcow2_writev_done_part(qemu_coroutine_self(), cur_bytes);
}
+
+ if (bs->bl.zoned == BLK_Z_HM) {
+ index = start_offset / zone_size;
+ wp = &bs->wps->wp[index];
+ zs = qcow2_get_zone_state(bs, index);
+ uint64_t wpv = *wp;
+ if (!QCOW2_ZT_IS_CONV(wpv)) {
+ /* align up (start_offset, zone_size), the start offset is not
+ * necessarily power of two. */
+ end = ((start_offset + zone_size) / zone_size) * zone_size;
+ if (start_offset + start_bytes <= end) {
+ *wp = start_offset + start_bytes;
+ } else {
+ ret = -EINVAL;
+ goto fail_nometa;
+ }
+
+ ret = qcow2_write_wp_at(bs, wp, index);
+ if (ret < 0) {
+ goto fail_nometa;
+ }
+
+ /*
+ * The zone state transitions to implicit open when the original
+ * state is empty or closed. When the wp reaches the end, the
+ * open states (explicit open, implicit open) become full.
+ */
+ Qcow2ZoneListEntry *zone_entry = &s->zone_list_entries[index];
+ if (!(*wp & (zone_size - 1))) {
+ /* Being aligned to zone size implies full state */
+ if (QLIST_IS_INSERTED(zone_entry, exp_open_zone_entry)) {
+ QLIST_REMOVE(zone_entry, exp_open_zone_entry);
+ s->nr_zones_exp_open--;
+ } else if (QLIST_IS_INSERTED(zone_entry, imp_open_zone_entry)) {
+ QLIST_REMOVE(zone_entry, imp_open_zone_entry);
+ s->nr_zones_imp_open--;
+ trace_qcow2_imp_open_zones(0x24,
+ s->nr_zones_imp_open);
+ }
+ } else {
+ if (zs == BLK_ZS_CLOSED || zs == BLK_ZS_EMPTY) {
+ QLIST_INSERT_HEAD(&s->imp_open_zones, zone_entry,
+ imp_open_zone_entry);
+ s->nr_zones_imp_open++;
+
+ if (zs == BLK_ZS_CLOSED) {
+ s->nr_zones_closed--;
+ }
+ }
+ }
+ }
+ }
ret = 0;
qemu_co_mutex_lock(&s->lock);
@@ -2840,6 +3110,26 @@ static int GRAPH_RDLOCK qcow2_inactivate(BlockDriverState *bs)
return result;
}
+static void qcow2_zoned_close(BDRVQcow2State *s)
+{
+ Qcow2ZoneListEntry *zone_entry, *next;
+
+ QLIST_FOREACH_SAFE(zone_entry, &s->imp_open_zones, imp_open_zone_entry,
+ next) {
+ QLIST_REMOVE(zone_entry, imp_open_zone_entry);
+ s->nr_zones_imp_open--;
+ trace_qcow2_imp_open_zones(0x22, s->nr_zones_imp_open);
+ }
+
+ QLIST_FOREACH_SAFE(zone_entry, &s->exp_open_zones, exp_open_zone_entry,
+ next) {
+ QLIST_REMOVE(zone_entry, exp_open_zone_entry);
+ s->nr_zones_exp_open--;
+ }
+
+ assert(s->nr_zones_imp_open + s->nr_zones_exp_open == 0);
+}
+
static void coroutine_mixed_fn GRAPH_RDLOCK
qcow2_do_close(BlockDriverState *bs, bool close_data_file)
{
@@ -2879,6 +3169,8 @@ qcow2_do_close(BlockDriverState *bs, bool close_data_file)
qcow2_refcount_close(bs);
qcow2_free_snapshots(bs);
+ qcow2_zoned_close(s);
+ g_free(bs->wps);
}
static void GRAPH_UNLOCKED qcow2_close(BlockDriverState *bs)
@@ -3193,7 +3485,10 @@ int qcow2_update_header(BlockDriverState *bs)
.max_active_zones =
cpu_to_be32(s->zoned_header.max_active_zones),
.max_append_bytes =
- cpu_to_be32(s->zoned_header.max_append_bytes)
+ cpu_to_be32(s->zoned_header.max_append_bytes),
+ .zonedmeta_offset =
+ cpu_to_be64(s->zoned_header.zonedmeta_offset),
+ .zonedmeta_size = cpu_to_be64(s->zoned_header.zonedmeta_size),
};
ret = header_ext_add(buf, QCOW2_EXT_MAGIC_ZONED_FORMAT,
&zoned_header, sizeof(zoned_header),
@@ -3598,7 +3893,8 @@ qcow2_co_create(BlockdevCreateOptions *create_options, Error **errp)
int version;
int refcount_order;
uint64_t *refcount_table;
- int ret;
+ uint64_t zoned_meta_size, zoned_clusterlen;
+ int ret, offset, i;
uint8_t compression_type = QCOW2_COMPRESSION_TYPE_ZLIB;
assert(create_options->driver == BLOCKDEV_DRIVER_QCOW2);
@@ -3948,6 +4244,46 @@ qcow2_co_create(BlockdevCreateOptions *create_options, Error **errp)
}
}
s->zoned_header.max_append_bytes = qcow2_opts->max_append_bytes;
+
+ uint32_t nrz = s->zoned_header.nr_zones;
+ zoned_meta_size = sizeof(uint64_t) * nrz;
+ g_autofree uint64_t *meta = NULL;
+ meta = g_new0(uint64_t, nrz);
+
+ for (i = 0; i < s->zoned_header.conventional_zones; ++i) {
+ meta[i] = i * s->zoned_header.zone_size;
+ meta[i] |= 1ULL << 59;
+ }
+
+ for (; i < nrz; ++i) {
+ meta[i] = i * s->zoned_header.zone_size;
+ }
+
+ offset = qcow2_alloc_clusters(blk_bs(blk), zoned_meta_size);
+ if (offset < 0) {
+ error_setg_errno(errp, -offset, "Could not allocate clusters "
+ "for zoned metadata size");
+ goto out;
+ }
+ s->zoned_header.zonedmeta_offset = offset;
+ s->zoned_header.zonedmeta_size = zoned_meta_size;
+
+ zoned_clusterlen = size_to_clusters(s, zoned_meta_size)
+ * s->cluster_size;
+ assert(qcow2_pre_write_overlap_check(bs, 0, offset,
+ zoned_clusterlen,false) == 0);
+ ret = bdrv_pwrite_zeroes(blk_bs(blk)->file, offset,
+ zoned_clusterlen, 0);
+ if (ret < 0) {
+ error_setg_errno(errp, -ret, "Could not zero fill zoned metadata");
+ goto out;
+ }
+ ret = bdrv_pwrite(blk_bs(blk)->file, offset, zoned_meta_size, meta, 0);
+ if (ret < 0) {
+ error_setg_errno(errp, -ret, "Could not write zoned metadata "
+ "to disk");
+ goto out;
+ }
} else {
s->zoned_header.zoned = BLK_Z_NONE;
}
@@ -4287,6 +4623,393 @@ qcow2_co_pdiscard(BlockDriverState *bs, int64_t offset, int64_t bytes)
return ret;
}
+static int coroutine_fn
+qcow2_co_zone_report(BlockDriverState *bs, int64_t offset,
+ unsigned int *nr_zones, BlockZoneDescriptor *zones)
+{
+ BDRVQcow2State *s = bs->opaque;
+ uint64_t zone_size = s->zoned_header.zone_size;
+ int64_t capacity = bs->total_sectors << BDRV_SECTOR_BITS;
+ int64_t size = bs->bl.nr_zones * zone_size;
+ unsigned int nrz = *nr_zones;
+ int i = 0;
+ int si;
+
+ if (offset >= capacity) {
+ error_report("offset %" PRId64 " is equal to or greater than the "
+ "device capacity %" PRId64 "", offset, capacity);
+ return -EINVAL;
+ }
+
+ if (nrz > bs->bl.nr_zones) {
+ error_report("nr_zones %" PRId32 " should not exceed the device zones"
+ "%" PRId32 "", nrz, bs->bl.nr_zones);
+ return -EINVAL;
+ }
+
+ if (zone_size > 0) {
+ si = offset / zone_size;
+ qemu_co_mutex_lock(&bs->wps->colock);
+ for (; i < nrz; ++i) {
+ if (i + si >= bs->bl.nr_zones) {
+ break;
+ }
+
+ zones[i].start = (si + i) * zone_size;
+
+ /* The last zone can be smaller than the zone size */
+ if ((si + i + 1) == bs->bl.nr_zones && size > capacity) {
+ uint32_t l = zone_size - (size - capacity);
+ zones[i].length = l;
+ zones[i].cap = l;
+ } else {
+ zones[i].length = zone_size;
+ zones[i].cap = zone_size;
+ }
+
+ uint64_t wp = bs->wps->wp[si + i];
+ if (QCOW2_ZT_IS_CONV(wp)) {
+ zones[i].type = BLK_ZT_CONV;
+ zones[i].state = BLK_ZS_NOT_WP;
+ /* Clear the zone type bit */
+ wp &= ~(1ULL << 59);
+ } else {
+ zones[i].type = BLK_ZT_SWR;
+ zones[i].state = qcow2_get_zone_state(bs, si + i);
+ }
+ zones[i].wp = wp;
+ }
+ qemu_co_mutex_unlock(&bs->wps->colock);
+ }
+ *nr_zones = i;
+ return 0;
+}
+
+static int qcow2_open_zone(BlockDriverState *bs, uint32_t index) {
+ BDRVQcow2State *s = bs->opaque;
+ int ret;
+
+ qemu_co_mutex_lock(&bs->wps->colock);
+ uint64_t *wp = &bs->wps->wp[index];
+ BlockZoneState zs = qcow2_get_zone_state(bs, index);
+
+ switch(zs) {
+ case BLK_ZS_EMPTY:
+ ret = qcow2_check_zone_resources(bs, BLK_ZS_EMPTY);
+ if (ret < 0) {
+ goto unlock;
+ }
+ break;
+ case BLK_ZS_IOPEN:
+ QLIST_REMOVE(&s->zone_list_entries[index], imp_open_zone_entry);
+ s->nr_zones_imp_open--;
+ trace_qcow2_imp_open_zones(BLK_ZO_OPEN, s->nr_zones_imp_open);
+ break;
+ case BLK_ZS_EOPEN:
+ return 0;
+ case BLK_ZS_CLOSED:
+ ret = qcow2_check_zone_resources(bs, BLK_ZS_CLOSED);
+ if (ret < 0) {
+ goto unlock;
+ }
+ s->nr_zones_closed--;
+ break;
+ case BLK_ZS_FULL:
+ break;
+ default:
+ ret = -EINVAL;
+ goto unlock;
+ }
+
+ ret = qcow2_write_wp_at(bs, wp, index);
+ if (!ret) {
+ QLIST_INSERT_HEAD(&s->exp_open_zones, &s->zone_list_entries[index],
+ exp_open_zone_entry);
+ s->nr_zones_exp_open++;
+ }
+
+unlock:
+ qemu_co_mutex_unlock(&bs->wps->colock);
+ return ret;
+}
+
+static int qcow2_close_zone(BlockDriverState *bs, uint32_t index) {
+ BDRVQcow2State *s = bs->opaque;
+ int ret;
+
+ qemu_co_mutex_lock(&bs->wps->colock);
+ BlockZoneState zs = qcow2_get_zone_state(bs, index);
+
+ switch(zs) {
+ case BLK_ZS_EMPTY:
+ break;
+ case BLK_ZS_IOPEN:
+ QLIST_REMOVE(&s->zone_list_entries[index], imp_open_zone_entry);
+ s->nr_zones_imp_open--;
+ trace_qcow2_imp_open_zones(BLK_ZO_CLOSE, s->nr_zones_imp_open);
+ break;
+ case BLK_ZS_EOPEN:
+ QLIST_REMOVE(&s->zone_list_entries[index], exp_open_zone_entry);
+ s->nr_zones_exp_open--;
+ break;
+ case BLK_ZS_CLOSED:
+ ret = qcow2_check_zone_resources(bs, BLK_ZS_CLOSED);
+ if (ret < 0) {
+ goto unlock;
+ }
+ s->nr_zones_closed--;
+ break;
+ case BLK_ZS_FULL:
+ break;
+ default:
+ ret = -EINVAL;
+ goto unlock;
+ }
+
+ if (qcow2_get_zone_state(bs, index) == BLK_ZS_CLOSED) {
+ s->nr_zones_closed++;
+ }
+ ret = 0;
+
+unlock:
+ qemu_co_mutex_unlock(&bs->wps->colock);
+ return ret;
+}
+
+static int qcow2_finish_zone(BlockDriverState *bs, uint32_t index) {
+ BDRVQcow2State *s = bs->opaque;
+ int ret;
+
+ qemu_co_mutex_lock(&bs->wps->colock);
+ uint64_t *wp = &bs->wps->wp[index];
+ BlockZoneState zs = qcow2_get_zone_state(bs, index);
+
+ switch(zs) {
+ case BLK_ZS_EMPTY:
+ ret = qcow2_check_zone_resources(bs, BLK_ZS_EMPTY);
+ if (ret < 0) {
+ goto unlock;
+ }
+ break;
+ case BLK_ZS_IOPEN:
+ QLIST_REMOVE(&s->zone_list_entries[index], imp_open_zone_entry);
+ s->nr_zones_imp_open--;
+ trace_qcow2_imp_open_zones(BLK_ZO_FINISH, s->nr_zones_imp_open);
+ break;
+ case BLK_ZS_EOPEN:
+ QLIST_REMOVE(&s->zone_list_entries[index], exp_open_zone_entry);
+ s->nr_zones_exp_open--;
+ break;
+ case BLK_ZS_CLOSED:
+ ret = qcow2_check_zone_resources(bs, BLK_ZS_CLOSED);
+ if (ret < 0) {
+ goto unlock;
+ }
+ s->nr_zones_closed--;
+ break;
+ case BLK_ZS_FULL:
+ ret = 0;
+ goto unlock;
+ default:
+ ret = -EINVAL;
+ goto unlock;
+ }
+
+ *wp = ((uint64_t)index + 1) * s->zoned_header.zone_size;
+ ret = qcow2_write_wp_at(bs, wp, index);
+
+unlock:
+ qemu_co_mutex_unlock(&bs->wps->colock);
+ return ret;
+}
+
+static int qcow2_reset_zone(BlockDriverState *bs, uint32_t index,
+ int64_t len) {
+ BDRVQcow2State *s = bs->opaque;
+ int nrz = bs->bl.nr_zones;
+ int zone_size = bs->bl.zone_size;
+ int n, ret = 0;
+
+ qemu_co_mutex_lock(&bs->wps->colock);
+ uint64_t *wp = &bs->wps->wp[index];
+ if (len == bs->total_sectors << BDRV_SECTOR_BITS) {
+ n = nrz;
+ index = 0;
+ } else {
+ n = len / zone_size;
+ }
+
+ for (int i = 0; i < n; ++i) {
+ uint64_t *wp_i = (uint64_t *)(wp + i);
+ uint64_t wpi_v = *wp_i;
+ if (QCOW2_ZT_IS_CONV(wpi_v)) {
+ continue;
+ }
+
+ BlockZoneState zs = qcow2_get_zone_state(bs, index + i);
+ switch (zs) {
+ case BLK_ZS_EMPTY:
+ break;
+ case BLK_ZS_IOPEN:
+ QLIST_REMOVE(&s->zone_list_entries[index + i], imp_open_zone_entry);
+ s->nr_zones_imp_open--;
+ trace_qcow2_imp_open_zones(BLK_ZO_RESET, s->nr_zones_imp_open);
+ break;
+ case BLK_ZS_EOPEN:
+ QLIST_REMOVE(&s->zone_list_entries[index + i], exp_open_zone_entry);
+ s->nr_zones_exp_open--;
+ break;
+ case BLK_ZS_CLOSED:
+ s->nr_zones_closed--;
+ break;
+ case BLK_ZS_FULL:
+ break;
+ default:
+ ret = -EINVAL;
+ goto unlock;
+ }
+
+ if (zs == BLK_ZS_EMPTY) {
+ continue;
+ }
+
+ *wp_i = ((uint64_t)index + i) * zone_size;
+ ret = qcow2_write_wp_at(bs, wp_i, index + i);
+ if (ret < 0) {
+ goto unlock;
+ }
+ /* clear data */
+ ret = qcow2_co_pwrite_zeroes(bs, *wp_i, zone_size, 0);
+ if (ret < 0) {
+ error_report("Failed to reset zone at 0x%" PRIx64 "", *wp_i);
+ }
+ }
+
+unlock:
+ qemu_co_mutex_unlock(&bs->wps->colock);
+ return ret;
+}
+
+static int coroutine_fn qcow2_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
+ int64_t offset, int64_t len)
+{
+ BDRVQcow2State *s = bs->opaque;
+ int ret = 0;
+ int64_t capacity = bs->total_sectors << BDRV_SECTOR_BITS;
+ int64_t zone_size = s->zoned_header.zone_size;
+ int64_t zone_size_mask = zone_size - 1;
+ uint32_t index = offset / zone_size;
+ BlockZoneWps *wps = bs->wps;
+
+ if (offset >= capacity) {
+ error_report("offset %" PRId64 " is equal to or greater than the"
+ "device capacity %" PRId64 "", offset, capacity);
+ return -EINVAL;
+ }
+
+ if (offset & zone_size_mask) {
+ error_report("sector offset %" PRId64 " is not aligned to zone size"
+ " %" PRId64 "", offset / 512, zone_size / 512);
+ return -EINVAL;
+ }
+
+ if (((offset + len) < capacity && len & zone_size_mask) ||
+ offset + len > capacity) {
+ error_report("number of sectors %" PRId64 " is not aligned to zone"
+ " size %" PRId64 "", len / 512, zone_size / 512);
+ return -EINVAL;
+ }
+
+ qemu_co_mutex_lock(&wps->colock);
+ uint64_t wpv = wps->wp[index];
+ if (QCOW2_ZT_IS_CONV(wpv) && len != capacity) {
+ error_report("zone mgmt operations are not allowed for "
+ "conventional zones");
+ ret = -EIO;
+ goto unlock;
+ }
+ qemu_co_mutex_unlock(&wps->colock);
+
+ switch(op) {
+ case BLK_ZO_OPEN:
+ ret = qcow2_open_zone(bs, index);
+ break;
+ case BLK_ZO_CLOSE:
+ ret = qcow2_close_zone(bs, index);
+ break;
+ case BLK_ZO_FINISH:
+ ret = qcow2_finish_zone(bs, index);
+ break;
+ case BLK_ZO_RESET:
+ ret = qcow2_reset_zone(bs, index, len);
+ break;
+ default:
+ error_report("Unsupported zone op: 0x%x", op);
+ ret = -ENOTSUP;
+ break;
+ }
+ return ret;
+
+unlock:
+ qemu_co_mutex_unlock(&wps->colock);
+ return ret;
+}
+
+static int coroutine_fn
+qcow2_co_zone_append(BlockDriverState *bs, int64_t *offset, QEMUIOVector *qiov,
+ BdrvRequestFlags flags)
+{
+ assert(flags == 0);
+ int64_t capacity = bs->total_sectors << BDRV_SECTOR_BITS;
+ uint32_t index;
+ int ret;
+ int64_t zone_size_mask = bs->bl.zone_size - 1;
+ int64_t iov_len = 0;
+ int64_t len = 0;
+
+ if (*offset >= capacity) {
+ error_report("*offset %" PRId64 " is equal to or greater than the"
+ "device capacity %" PRId64 "", *offset, capacity);
+ return -EINVAL;
+ }
+
+ /* offset + len should not pass the end of that zone starting from offset */
+ if (*offset & zone_size_mask) {
+ error_report("sector offset %" PRId64 " is not aligned to zone size "
+ "%" PRId32 "", *offset / 512, bs->bl.zone_size / 512);
+ return -EINVAL;
+ }
+
+ int64_t wg = bs->bl.write_granularity;
+ int64_t wg_mask = wg - 1;
+ for (int i = 0; i < qiov->niov; i++) {
+ iov_len = qiov->iov[i].iov_len;
+ if (iov_len & wg_mask) {
+ error_report("len of IOVector[%d] %" PRId64 " is not aligned to "
+ "block size %" PRId64 "", i, iov_len, wg);
+ return -EINVAL;
+ }
+ }
+ len = qiov->size;
+ index = *offset / bs->bl.zone_size;
+
+ if ((len >> BDRV_SECTOR_BITS) > bs->bl.max_append_sectors) {
+ return -ENOTSUP;
+ }
+
+ qemu_co_mutex_lock(&bs->wps->colock);
+ uint64_t wp_i = bs->wps->wp[index];
+ ret = qcow2_co_pwritev_part(bs, wp_i, len, qiov, 0, 0);
+ if (ret == 0) {
+ *offset = wp_i;
+ } else {
+ error_report("qcow2: zap failed");
+ }
+
+ qemu_co_mutex_unlock(&bs->wps->colock);
+ return ret;
+}
+
static int coroutine_fn GRAPH_RDLOCK
qcow2_co_copy_range_from(BlockDriverState *bs,
BdrvChild *src, int64_t src_offset,
@@ -6337,6 +7060,10 @@ BlockDriver bdrv_qcow2 = {
.bdrv_co_pwritev_part = qcow2_co_pwritev_part,
.bdrv_co_flush_to_os = qcow2_co_flush_to_os,
+ .bdrv_co_zone_report = qcow2_co_zone_report,
+ .bdrv_co_zone_mgmt = qcow2_co_zone_mgmt,
+ .bdrv_co_zone_append = qcow2_co_zone_append,
+
.bdrv_co_pwrite_zeroes = qcow2_co_pwrite_zeroes,
.bdrv_co_pdiscard = qcow2_co_pdiscard,
.bdrv_co_copy_range_from = qcow2_co_copy_range_from,
diff --git a/block/trace-events b/block/trace-events
index 8e789e1f12..e35222e079 100644
--- a/block/trace-events
+++ b/block/trace-events
@@ -82,6 +82,8 @@ qcow2_writev_data(void *co, uint64_t offset) "co %p offset 0x%" PRIx64
qcow2_pwrite_zeroes_start_req(void *co, int64_t offset, int64_t bytes) "co %p offset 0x%" PRIx64 " bytes %" PRId64
qcow2_pwrite_zeroes(void *co, int64_t offset, int64_t bytes) "co %p offset 0x%" PRIx64 " bytes %" PRId64
qcow2_skip_cow(void *co, uint64_t offset, int nb_clusters) "co %p offset 0x%" PRIx64 " nb_clusters %d"
+qcow2_wp_tracking(int index, uint64_t wp) "wps[%d]: 0x%" PRIx64
+qcow2_imp_open_zones(uint8_t op, int nrz) "nr_imp_open_zones after op 0x%x: %d"
# qcow2-cluster.c
qcow2_alloc_clusters_offset(void *co, uint64_t offset, int bytes) "co %p offset 0x%" PRIx64 " bytes %d"
diff --git a/include/qemu/queue.h b/include/qemu/queue.h
index e029e7bf66..3f0a48740e 100644
--- a/include/qemu/queue.h
+++ b/include/qemu/queue.h
@@ -179,6 +179,7 @@ struct { \
#define QLIST_EMPTY(head) ((head)->lh_first == NULL)
#define QLIST_FIRST(head) ((head)->lh_first)
#define QLIST_NEXT(elm, field) ((elm)->field.le_next)
+#define QLIST_LAST(head, field) (*(head)->lh_first->field.le_prev)
/*
--
2.40.1
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH v5 4/4] iotests: test the zoned format feature for qcow2 file
2023-10-30 12:18 [PATCH v5 0/4] Add full zoned storage emulation to qcow2 driver Sam Li
` (2 preceding siblings ...)
2023-10-30 12:18 ` [PATCH v5 3/4] qcow2: add zoned emulation capability Sam Li
@ 2023-10-30 12:18 ` Sam Li
3 siblings, 0 replies; 17+ messages in thread
From: Sam Li @ 2023-10-30 12:18 UTC (permalink / raw)
To: qemu-devel
Cc: Kevin Wolf, Hanna Reitz, Eric Blake, dlemoal, hare,
dmitry.fomichev, stefanha, qemu-block, Markus Armbruster, Sam Li
The zoned format feature can be tested by:
$ tests/qemu-iotests/check -qcow2 zoned-qcow2
Signed-off-by: Sam Li <faithilikerun@gmail.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
tests/qemu-iotests/tests/zoned-qcow2 | 126 +++++++++++++++++++++++
tests/qemu-iotests/tests/zoned-qcow2.out | 118 +++++++++++++++++++++
2 files changed, 244 insertions(+)
create mode 100755 tests/qemu-iotests/tests/zoned-qcow2
create mode 100644 tests/qemu-iotests/tests/zoned-qcow2.out
diff --git a/tests/qemu-iotests/tests/zoned-qcow2 b/tests/qemu-iotests/tests/zoned-qcow2
new file mode 100755
index 0000000000..7749329480
--- /dev/null
+++ b/tests/qemu-iotests/tests/zoned-qcow2
@@ -0,0 +1,126 @@
+#!/usr/bin/env bash
+#
+# Test zone management operations for qcow2 file.
+#
+
+seq="$(basename $0)"
+echo "QA output created by $seq"
+status=1 # failure is the default!
+
+file_name="zbc.qcow2"
+_cleanup()
+{
+ _cleanup_test_img
+ _rm_test_img "$file_name"
+}
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+# get standard environment, filters and checks
+. ../common.rc
+. ../common.filter
+. ../common.qemu
+
+# This test only runs on Linux hosts with qcow2 image files.
+_supported_fmt qcow2
+_supported_proto file
+_supported_os Linux
+
+echo
+echo "=== Initial image setup ==="
+echo
+
+$QEMU_IMG create -f qcow2 $file_name -o size=768M -o zone_size=64M \
+-o zone_capacity=64M -o conventional_zones=0 -o max_append_bytes=131072 \
+-o max_open_zones=0 -o max_active_zones=0 -o zone_model=host-managed
+
+IMG="--image-opts -n driver=qcow2,file.driver=file,file.filename=$file_name"
+QEMU_IO_OPTIONS=$QEMU_IO_OPTIONS_NO_FMT
+
+echo
+echo "=== Testing a qcow2 img with zoned format ==="
+echo
+echo "case 1: test zone operations one by one"
+
+echo "(1) report zones[0]:"
+$QEMU_IO $IMG -c "zrp 0 1"
+echo
+echo "report zones[0~9]:"
+$QEMU_IO $IMG -c "zrp 0 10"
+echo
+echo "report zones[-1]:" # zones[-1] dictates the last zone
+$QEMU_IO $IMG -c "zrp 0x2C000000 2" # 0x2C000000 / 512 = 0x160000
+echo
+echo
+echo "(2) open zones[0], zones[1], zones[-1] then close, finish, reset:"
+$QEMU_IO $IMG << EOF
+zo 0 0x4000000 # 0x4000000 / 512 = 0x20000
+zrp 0 1
+zo 0x4000000 0x4000000
+zrp 0x4000000 1
+zo 0x2C000000 0x4000000
+zrp 0x2C000000 2
+zc 0 0x4000000
+zrp 0 1
+zc 0x2C000000 0x4000000
+zrp 0x2C000000 2
+zf 0 0x4000000
+zrp 0 1
+zf 64M 64M
+zrp 0x4000000 2
+zf 0x2C000000 0x4000000
+zrp 0x2C000000 2
+zrs 0 0x4000000
+zrp 0 1
+zrs 0x4000000 0x4000000
+zrp 0x4000000 1
+zrs 0x2C000000 0x4000000
+zrp 0x2C000000 2
+EOF
+
+echo
+echo "(3) append write with (4k, 8k) data"
+$QEMU_IO $IMG -c "zrp 0 12" # the physical block size of the device is 4096
+echo "Append write zones[0], zones[1] twice"
+$QEMU_IO $IMG << EOF
+zap -p 0 0x1000 0x2000
+zrp 0 1
+zap -p 0 0x1000 0x2000
+zrp 0 1
+zap -p 0x4000000 0x1000 0x2000
+zrp 0x4000000 1
+zap -p 0x4000000 0x1000 0x2000
+zrp 0x4000000 1
+EOF
+
+echo
+echo "Reset all:"
+$QEMU_IO $IMG -c "zrs 0 768M" -c "zrp 0 12"
+echo
+echo
+
+echo "case 2: test a sets of ops that works or not"
+echo "(1) append write (4k, 4k) and then write to full"
+$QEMU_IO $IMG << EOF
+zap -p 0 0x1000 0x1000 # wrote (4k, 4k):
+zrp 0 1
+zap -p 0 0x1000 0x3ffd000
+zrp 0 1
+EOF
+
+echo "Reset zones[0]:"
+$QEMU_IO $IMG -c "zrs 0 64M" -c "zrp 0 1"
+
+echo "(2) write in zones[0], zones[3], zones[8], and then reset all"
+$QEMU_IO $IMG << EOF
+zap -p 0 0x1000 0x1000
+zap -p 0xc000000 0x1000 0x1000
+zap -p 0x20000000 0x1000 0x1000
+zrp 0 12
+zrs 0 768M
+zrp 0 12
+EOF
+
+# success, all done
+echo "*** done"
+rm -f $seq.full
+status=0
diff --git a/tests/qemu-iotests/tests/zoned-qcow2.out b/tests/qemu-iotests/tests/zoned-qcow2.out
new file mode 100644
index 0000000000..aec43f0d5b
--- /dev/null
+++ b/tests/qemu-iotests/tests/zoned-qcow2.out
@@ -0,0 +1,118 @@
+QA output created by zoned-qcow2
+
+=== Initial image setup ===
+
+Formatting 'zbc.qcow2', fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zlib zone_model=host-managed zone_size=67108864 zone_capacity=67108864 conventional_zones=0 max_append_bytes=131072 max_active_zones=0 max_open_zones=0 size=805306368 lazy_refcounts=off refcount_bits=16
+
+=== Testing a qcow2 img with zoned format ===
+
+case 1: test zone operations one by one
+(1) report zones[0]:
+start: 0x0, len 0x20000, cap 0x20000, wptr 0x0, zcond:1, [type: 2]
+
+report zones[0~9]:
+start: 0x0, len 0x20000, cap 0x20000, wptr 0x0, zcond:1, [type: 2]
+start: 0x20000, len 0x20000, cap 0x20000, wptr 0x20000, zcond:1, [type: 2]
+start: 0x40000, len 0x20000, cap 0x20000, wptr 0x40000, zcond:1, [type: 2]
+start: 0x60000, len 0x20000, cap 0x20000, wptr 0x60000, zcond:1, [type: 2]
+start: 0x80000, len 0x20000, cap 0x20000, wptr 0x80000, zcond:1, [type: 2]
+start: 0xa0000, len 0x20000, cap 0x20000, wptr 0xa0000, zcond:1, [type: 2]
+start: 0xc0000, len 0x20000, cap 0x20000, wptr 0xc0000, zcond:1, [type: 2]
+start: 0xe0000, len 0x20000, cap 0x20000, wptr 0xe0000, zcond:1, [type: 2]
+start: 0x100000, len 0x20000, cap 0x20000, wptr 0x100000, zcond:1, [type: 2]
+start: 0x120000, len 0x20000, cap 0x20000, wptr 0x120000, zcond:1, [type: 2]
+
+report zones[-1]:
+start: 0x160000, len 0x20000, cap 0x20000, wptr 0x160000, zcond:1, [type: 2]
+
+
+(2) open zones[0], zones[1], zones[-1] then close, finish, reset:
+qemu-io> bad argument count 8 to zo, expected 2 arguments
+qemu-io> start: 0x0, len 0x20000, cap 0x20000, wptr 0x0, zcond:1, [type: 2]
+qemu-io> qemu-io> start: 0x20000, len 0x20000, cap 0x20000, wptr 0x20000, zcond:3, [type: 2]
+qemu-io> qemu-io> start: 0x160000, len 0x20000, cap 0x20000, wptr 0x160000, zcond:3, [type: 2]
+qemu-io> qemu-io> start: 0x0, len 0x20000, cap 0x20000, wptr 0x0, zcond:1, [type: 2]
+qemu-io> qemu-io> start: 0x160000, len 0x20000, cap 0x20000, wptr 0x160000, zcond:1, [type: 2]
+qemu-io> qemu-io> start: 0x0, len 0x20000, cap 0x20000, wptr 0x20000, zcond:14, [type: 2]
+qemu-io> qemu-io> start: 0x20000, len 0x20000, cap 0x20000, wptr 0x40000, zcond:14, [type: 2]
+start: 0x40000, len 0x20000, cap 0x20000, wptr 0x40000, zcond:1, [type: 2]
+qemu-io> qemu-io> start: 0x160000, len 0x20000, cap 0x20000, wptr 0x180000, zcond:14, [type: 2]
+qemu-io> qemu-io> start: 0x0, len 0x20000, cap 0x20000, wptr 0x0, zcond:1, [type: 2]
+qemu-io> qemu-io> start: 0x20000, len 0x20000, cap 0x20000, wptr 0x20000, zcond:1, [type: 2]
+qemu-io> qemu-io> start: 0x160000, len 0x20000, cap 0x20000, wptr 0x160000, zcond:1, [type: 2]
+qemu-io>
+(3) append write with (4k, 8k) data
+start: 0x0, len 0x20000, cap 0x20000, wptr 0x0, zcond:1, [type: 2]
+start: 0x20000, len 0x20000, cap 0x20000, wptr 0x20000, zcond:1, [type: 2]
+start: 0x40000, len 0x20000, cap 0x20000, wptr 0x40000, zcond:1, [type: 2]
+start: 0x60000, len 0x20000, cap 0x20000, wptr 0x60000, zcond:1, [type: 2]
+start: 0x80000, len 0x20000, cap 0x20000, wptr 0x80000, zcond:1, [type: 2]
+start: 0xa0000, len 0x20000, cap 0x20000, wptr 0xa0000, zcond:1, [type: 2]
+start: 0xc0000, len 0x20000, cap 0x20000, wptr 0xc0000, zcond:1, [type: 2]
+start: 0xe0000, len 0x20000, cap 0x20000, wptr 0xe0000, zcond:1, [type: 2]
+start: 0x100000, len 0x20000, cap 0x20000, wptr 0x100000, zcond:1, [type: 2]
+start: 0x120000, len 0x20000, cap 0x20000, wptr 0x120000, zcond:1, [type: 2]
+start: 0x140000, len 0x20000, cap 0x20000, wptr 0x140000, zcond:1, [type: 2]
+start: 0x160000, len 0x20000, cap 0x20000, wptr 0x160000, zcond:1, [type: 2]
+Append write zones[0], zones[1] twice
+qemu-io> After zap done, the append sector is 0x0
+qemu-io> start: 0x0, len 0x20000, cap 0x20000, wptr 0x18, zcond:2, [type: 2]
+qemu-io> After zap done, the append sector is 0x18
+qemu-io> start: 0x0, len 0x20000, cap 0x20000, wptr 0x30, zcond:2, [type: 2]
+qemu-io> After zap done, the append sector is 0x20000
+qemu-io> start: 0x20000, len 0x20000, cap 0x20000, wptr 0x20018, zcond:2, [type: 2]
+qemu-io> After zap done, the append sector is 0x20018
+qemu-io> start: 0x20000, len 0x20000, cap 0x20000, wptr 0x20030, zcond:2, [type: 2]
+qemu-io>
+Reset all:
+start: 0x0, len 0x20000, cap 0x20000, wptr 0x0, zcond:1, [type: 2]
+start: 0x20000, len 0x20000, cap 0x20000, wptr 0x20000, zcond:1, [type: 2]
+start: 0x40000, len 0x20000, cap 0x20000, wptr 0x40000, zcond:1, [type: 2]
+start: 0x60000, len 0x20000, cap 0x20000, wptr 0x60000, zcond:1, [type: 2]
+start: 0x80000, len 0x20000, cap 0x20000, wptr 0x80000, zcond:1, [type: 2]
+start: 0xa0000, len 0x20000, cap 0x20000, wptr 0xa0000, zcond:1, [type: 2]
+start: 0xc0000, len 0x20000, cap 0x20000, wptr 0xc0000, zcond:1, [type: 2]
+start: 0xe0000, len 0x20000, cap 0x20000, wptr 0xe0000, zcond:1, [type: 2]
+start: 0x100000, len 0x20000, cap 0x20000, wptr 0x100000, zcond:1, [type: 2]
+start: 0x120000, len 0x20000, cap 0x20000, wptr 0x120000, zcond:1, [type: 2]
+start: 0x140000, len 0x20000, cap 0x20000, wptr 0x140000, zcond:1, [type: 2]
+start: 0x160000, len 0x20000, cap 0x20000, wptr 0x160000, zcond:1, [type: 2]
+
+
+case 2: test a sets of ops that works or not
+(1) append write (4k, 4k) and then write to full
+qemu-io> bad argument count 8 to zap, expected between 3 and 4 arguments
+qemu-io> start: 0x0, len 0x20000, cap 0x20000, wptr 0x0, zcond:1, [type: 2]
+qemu-io> zone append failed: Operation not supported
+qemu-io> start: 0x0, len 0x20000, cap 0x20000, wptr 0x0, zcond:1, [type: 2]
+qemu-io> Reset zones[0]:
+start: 0x0, len 0x20000, cap 0x20000, wptr 0x0, zcond:1, [type: 2]
+(2) write in zones[0], zones[3], zones[8], and then reset all
+qemu-io> After zap done, the append sector is 0x0
+qemu-io> After zap done, the append sector is 0x60000
+qemu-io> After zap done, the append sector is 0x100000
+qemu-io> start: 0x0, len 0x20000, cap 0x20000, wptr 0x10, zcond:2, [type: 2]
+start: 0x20000, len 0x20000, cap 0x20000, wptr 0x20000, zcond:1, [type: 2]
+start: 0x40000, len 0x20000, cap 0x20000, wptr 0x40000, zcond:1, [type: 2]
+start: 0x60000, len 0x20000, cap 0x20000, wptr 0x60010, zcond:2, [type: 2]
+start: 0x80000, len 0x20000, cap 0x20000, wptr 0x80000, zcond:1, [type: 2]
+start: 0xa0000, len 0x20000, cap 0x20000, wptr 0xa0000, zcond:1, [type: 2]
+start: 0xc0000, len 0x20000, cap 0x20000, wptr 0xc0000, zcond:1, [type: 2]
+start: 0xe0000, len 0x20000, cap 0x20000, wptr 0xe0000, zcond:1, [type: 2]
+start: 0x100000, len 0x20000, cap 0x20000, wptr 0x100010, zcond:2, [type: 2]
+start: 0x120000, len 0x20000, cap 0x20000, wptr 0x120000, zcond:1, [type: 2]
+start: 0x140000, len 0x20000, cap 0x20000, wptr 0x140000, zcond:1, [type: 2]
+start: 0x160000, len 0x20000, cap 0x20000, wptr 0x160000, zcond:1, [type: 2]
+qemu-io> qemu-io> start: 0x0, len 0x20000, cap 0x20000, wptr 0x0, zcond:1, [type: 2]
+start: 0x20000, len 0x20000, cap 0x20000, wptr 0x20000, zcond:1, [type: 2]
+start: 0x40000, len 0x20000, cap 0x20000, wptr 0x40000, zcond:1, [type: 2]
+start: 0x60000, len 0x20000, cap 0x20000, wptr 0x60000, zcond:1, [type: 2]
+start: 0x80000, len 0x20000, cap 0x20000, wptr 0x80000, zcond:1, [type: 2]
+start: 0xa0000, len 0x20000, cap 0x20000, wptr 0xa0000, zcond:1, [type: 2]
+start: 0xc0000, len 0x20000, cap 0x20000, wptr 0xc0000, zcond:1, [type: 2]
+start: 0xe0000, len 0x20000, cap 0x20000, wptr 0xe0000, zcond:1, [type: 2]
+start: 0x100000, len 0x20000, cap 0x20000, wptr 0x100000, zcond:1, [type: 2]
+start: 0x120000, len 0x20000, cap 0x20000, wptr 0x120000, zcond:1, [type: 2]
+start: 0x140000, len 0x20000, cap 0x20000, wptr 0x140000, zcond:1, [type: 2]
+start: 0x160000, len 0x20000, cap 0x20000, wptr 0x160000, zcond:1, [type: 2]
+qemu-io> *** done
--
2.40.1
^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH v5 1/4] docs/qcow2: add the zoned format feature
2023-10-30 12:18 ` [PATCH v5 1/4] docs/qcow2: add the zoned format feature Sam Li
@ 2023-10-30 14:04 ` Eric Blake
2023-10-30 14:19 ` Sam Li
2023-10-31 0:53 ` Damien Le Moal
0 siblings, 2 replies; 17+ messages in thread
From: Eric Blake @ 2023-10-30 14:04 UTC (permalink / raw)
To: Sam Li
Cc: qemu-devel, Kevin Wolf, Hanna Reitz, dlemoal, hare,
dmitry.fomichev, stefanha, qemu-block, Markus Armbruster
On Mon, Oct 30, 2023 at 08:18:44PM +0800, Sam Li wrote:
> Add the specs for the zoned format feature of the qcow2 driver.
> The qcow2 file can be taken as zoned device and passed through by
> virtio-blk device or NVMe ZNS device to the guest given zoned
> information.
>
> Signed-off-by: Sam Li <faithilikerun@gmail.com>
> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
> docs/system/qemu-block-drivers.rst.inc | 33 ++++++++++++++++++++++++++
> 1 file changed, 33 insertions(+)
>
> diff --git a/docs/system/qemu-block-drivers.rst.inc b/docs/system/qemu-block-drivers.rst.inc
> index 105cb9679c..4647c5fa29 100644
> --- a/docs/system/qemu-block-drivers.rst.inc
> +++ b/docs/system/qemu-block-drivers.rst.inc
> @@ -172,6 +172,39 @@ This section describes each format and the options that are supported for it.
> filename`` to check if the NOCOW flag is set or not (Capital 'C' is
> NOCOW flag).
>
> + .. option:: zoned
> + 1 for host-managed zoned device and 0 for a non-zoned device.
Should this be a bool or enum type, instead of requiring the user to
know magic numbers? Is there a potential to add yet another type in
the future?
> +
> + .. option:: zone_size
> +
> + The size of a zone in bytes. The device is divided into zones of this
> + size with the exception of the last zone, which may be smaller.
> +
> + .. option:: zone_capacity
> +
> + The initial capacity value, in bytes, for all zones. The capacity must
> + be less than or equal to zone size. If the last zone is smaller, then
> + its capacity is capped.
> +
> + The zone capacity is per zone and may be different between zones in real
> + devices. For simplicity, QCow2 sets all zones to the same capacity.
Just making sure I understand: One possible setup would be to describe
a block device with zones of size 1024M but with capacity 1000M (that
is, the zone reserves 24M capacity for other purposes)?
Otherwise, I'm having a hard time seeing when you would ever set a
capacity different from size.
Are there requirements that one (or both) of these values must be
powers of 2? Or is the requirement merely that they must be a
multiple of 512 bytes (because sub-sector operations are not
permitted)? Is there any implicit requirement based on qcow2
implementation that a zone size/capacity must be a multiple of cluster
size (other than possibly for the last zone)?
> +
> + .. option:: zone_nr_conv
> +
> + The number of conventional zones of the zoned device.
> +
> + .. option:: max_open_zones
> +
> + The maximal allowed open zones.
> +
> + .. option:: max_active_zones
> +
> + The limit of the zones with implicit open, explicit open or closed state.
> +
> + .. option:: max_append_sectors
> +
> + The maximal number of 512-byte sectors in a zone append request.
Why is this value in sectors instead of bytes? I understand that
drivers may be written with sectors in mind, but any time we mix units
in the public interface, it gets awkward. I'd lean towards having
bytes here, with a requirement that it be a multiple of 512.
--
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization: qemu.org | libguestfs.org
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v5 1/4] docs/qcow2: add the zoned format feature
2023-10-30 14:04 ` Eric Blake
@ 2023-10-30 14:19 ` Sam Li
2023-10-31 0:53 ` Damien Le Moal
1 sibling, 0 replies; 17+ messages in thread
From: Sam Li @ 2023-10-30 14:19 UTC (permalink / raw)
To: Eric Blake
Cc: qemu-devel, Kevin Wolf, Hanna Reitz, dlemoal, hare,
dmitry.fomichev, stefanha, qemu-block, Markus Armbruster
Eric Blake <eblake@redhat.com> 于2023年10月30日周一 22:05写道:
>
> On Mon, Oct 30, 2023 at 08:18:44PM +0800, Sam Li wrote:
> > Add the specs for the zoned format feature of the qcow2 driver.
> > The qcow2 file can be taken as zoned device and passed through by
> > virtio-blk device or NVMe ZNS device to the guest given zoned
> > information.
> >
> > Signed-off-by: Sam Li <faithilikerun@gmail.com>
> > Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> > ---
> > docs/system/qemu-block-drivers.rst.inc | 33 ++++++++++++++++++++++++++
> > 1 file changed, 33 insertions(+)
> >
> > diff --git a/docs/system/qemu-block-drivers.rst.inc b/docs/system/qemu-block-drivers.rst.inc
> > index 105cb9679c..4647c5fa29 100644
> > --- a/docs/system/qemu-block-drivers.rst.inc
> > +++ b/docs/system/qemu-block-drivers.rst.inc
> > @@ -172,6 +172,39 @@ This section describes each format and the options that are supported for it.
> > filename`` to check if the NOCOW flag is set or not (Capital 'C' is
> > NOCOW flag).
> >
> > + .. option:: zoned
> > + 1 for host-managed zoned device and 0 for a non-zoned device.
>
> Should this be a bool or enum type, instead of requiring the user to
> know magic numbers? Is there a potential to add yet another type in
> the future?
Mistake, sorry. Forgot to document this change but the configurations
in the subsequent patch uses enum type.
>
> > +
> > + .. option:: zone_size
> > +
> > + The size of a zone in bytes. The device is divided into zones of this
> > + size with the exception of the last zone, which may be smaller.
> > +
> > + .. option:: zone_capacity
> > +
> > + The initial capacity value, in bytes, for all zones. The capacity must
> > + be less than or equal to zone size. If the last zone is smaller, then
> > + its capacity is capped.
> > +
> > + The zone capacity is per zone and may be different between zones in real
> > + devices. For simplicity, QCow2 sets all zones to the same capacity.
>
> Just making sure I understand: One possible setup would be to describe
> a block device with zones of size 1024M but with capacity 1000M (that
> is, the zone reserves 24M capacity for other purposes)?
Yes, it is. The NVMe ZNS drive allows that.
>
> Otherwise, I'm having a hard time seeing when you would ever set a
> capacity different from size.
>
> Are there requirements that one (or both) of these values must be
> powers of 2? Or is the requirement merely that they must be a
> multiple of 512 bytes (because sub-sector operations are not
> permitted)? Is there any implicit requirement based on qcow2
> implementation that a zone size/capacity must be a multiple of cluster
> size (other than possibly for the last zone)?
Yes. Linux will only expose zoned devices that have a zone size
that is a power of 2 number of LBAs.
No, the zone size/capacity is not necessarily a multiple of the cluster size.
>
> > +
> > + .. option:: zone_nr_conv
> > +
> > + The number of conventional zones of the zoned device.
> > +
> > + .. option:: max_open_zones
> > +
> > + The maximal allowed open zones.
> > +
> > + .. option:: max_active_zones
> > +
> > + The limit of the zones with implicit open, explicit open or closed state.
> > +
> > + .. option:: max_append_sectors
> > +
> > + The maximal number of 512-byte sectors in a zone append request.
>
> Why is this value in sectors instead of bytes? I understand that
> drivers may be written with sectors in mind, but any time we mix units
> in the public interface, it gets awkward. I'd lean towards having
> bytes here, with a requirement that it be a multiple of 512.
Sorry. Same, already changed this in the following patches.
>
> --
> Eric Blake, Principal Software Engineer
> Red Hat, Inc.
> Virtualization: qemu.org | libguestfs.org
>
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v5 2/4] qcow2: add configurations for zoned format extension
2023-10-30 12:18 ` [PATCH v5 2/4] qcow2: add configurations for zoned format extension Sam Li
@ 2023-10-30 14:53 ` Eric Blake
2023-10-30 15:01 ` Sam Li
` (2 more replies)
2023-11-02 10:19 ` Stefan Hajnoczi
2023-11-02 10:31 ` Stefan Hajnoczi
2 siblings, 3 replies; 17+ messages in thread
From: Eric Blake @ 2023-10-30 14:53 UTC (permalink / raw)
To: Sam Li
Cc: qemu-devel, Kevin Wolf, Hanna Reitz, dlemoal, hare,
dmitry.fomichev, stefanha, qemu-block, Markus Armbruster
On Mon, Oct 30, 2023 at 08:18:45PM +0800, Sam Li wrote:
> To configure the zoned format feature on the qcow2 driver, it
> requires settings as: the device size, zone model, zone size,
> zone capacity, number of conventional zones, limits on zone
> resources (max append bytes, max open zones, and max_active_zones).
>
> To create a qcow2 file with zoned format, use command like this:
> $ qemu-img create -f qcow2 test.qcow2 -o size=768M -o
> zone_size=64M -o zone_capacity=64M -o conventional_zones=0 -o
> max_append_bytes=4096 -o max_open_zones=0 -o max_active_zones=0
> -o zone_model=host-managed
>
> Signed-off-by: Sam Li <faithilikerun@gmail.com>
>
> fix config?
Is this comment supposed to be part of the commit message? If not,...
> ---
...place it here under the divider, so 'git am' won't include it, if there is nothing further to change on this patch.
> block/qcow2.c | 205 ++++++++++++++++++++++++++++++-
> block/qcow2.h | 37 +++++-
> docs/interop/qcow2.txt | 67 +++++++++-
> include/block/block_int-common.h | 13 ++
> qapi/block-core.json | 45 ++++++-
> 5 files changed, 362 insertions(+), 5 deletions(-)
>
> diff --git a/block/qcow2.c b/block/qcow2.c
> index aa01d9e7b5..cd53268ca7 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c
> @@ -73,6 +73,7 @@ typedef struct {
> #define QCOW2_EXT_MAGIC_CRYPTO_HEADER 0x0537be77
> #define QCOW2_EXT_MAGIC_BITMAPS 0x23852875
> #define QCOW2_EXT_MAGIC_DATA_FILE 0x44415441
> +#define QCOW2_EXT_MAGIC_ZONED_FORMAT 0x007a6264
>
> static int coroutine_fn
> qcow2_co_preadv_compressed(BlockDriverState *bs,
> @@ -210,6 +211,7 @@ qcow2_read_extensions(BlockDriverState *bs, uint64_t start_offset,
> uint64_t offset;
> int ret;
> Qcow2BitmapHeaderExt bitmaps_ext;
> + Qcow2ZonedHeaderExtension zoned_ext;
>
> if (need_update_header != NULL) {
> *need_update_header = false;
> @@ -431,6 +433,63 @@ qcow2_read_extensions(BlockDriverState *bs, uint64_t start_offset,
> break;
> }
>
> + case QCOW2_EXT_MAGIC_ZONED_FORMAT:
> + {
> + if (ext.len != sizeof(zoned_ext)) {
> + error_setg(errp, "zoned_ext: Invalid extension length");
> + return -EINVAL;
> + }
Do we ever anticipate the struct growing in size in the future to add
further features? Forcing the size to be constant, rather than a
minimum, may get in the way of clean upgrades to a future version of
the extension header.
> + ret = bdrv_pread(bs->file, offset, ext.len, &zoned_ext, 0);
> + if (ret < 0) {
> + error_setg_errno(errp, -ret, "zoned_ext: "
> + "Could not read ext header");
> + return ret;
> + }
> +
> + if (s->incompatible_features & QCOW2_INCOMPAT_ZONED_FORMAT) {
> + warn_report("A program lacking zoned format support "
> + "may modify this file and zoned metadata are "
> + "now considered inconsistent");
> + error_printf("The zoned metadata is corrupted.\n");
Why is this mixing warn_report and error_printf at the same time.
Also, grammar is inconsistent from the similar
QCOW2_AUTOCLEAR_BITMAPS, which used:
if (s->qcow_version < 3) {
/* Let's be a bit more specific */
warn_report("This qcow2 v2 image contains bitmaps, but "
"they may have been modified by a program "
"without persistent bitmap support; so now "
"they must all be considered inconsistent");
} else {
warn_report("a program lacking bitmap support "
"modified this file, so all bitmaps are now "
"considered inconsistent");
This also raises the question whether we want to ever allow zoned
support with a v2 image, or whether it should just be a hard error if
it is not a v3 image (my preference would be the latter).
> + }
> +
> + zoned_ext.zone_size = be32_to_cpu(zoned_ext.zone_size);
> + zoned_ext.zone_capacity = be32_to_cpu(zoned_ext.zone_capacity);
> + zoned_ext.conventional_zones =
> + be32_to_cpu(zoned_ext.conventional_zones);
> + zoned_ext.nr_zones = be32_to_cpu(zoned_ext.nr_zones);
> + zoned_ext.max_open_zones = be32_to_cpu(zoned_ext.max_open_zones);
> + zoned_ext.max_active_zones =
> + be32_to_cpu(zoned_ext.max_active_zones);
> + zoned_ext.max_append_bytes =
> + be32_to_cpu(zoned_ext.max_append_bytes);
> + s->zoned_header = zoned_ext;
> +
> + /* refuse to open broken images */
> + if (zoned_ext.zone_size == 0) {
> + error_setg(errp, "Zoned extension header zone_size field "
> + "can not be 0");
> + return -EINVAL;
> + }
> + if (zoned_ext.zone_capacity > zoned_ext.zone_size) {
> + error_setg(errp, "Zoned extension header zone_capacity field "
> + "can not be larger that zone_size field");
> + return -EINVAL;
> + }
> + if (zoned_ext.nr_zones != DIV_ROUND_UP(
> + bs->total_sectors * BDRV_SECTOR_SIZE, zoned_ext.zone_size)) {
> + error_setg(errp, "Zoned extension header nr_zones field "
> + "is wrong");
> + return -EINVAL;
> + }
Are there any other sanity checks you should do, such as ensuring
max_open_zones <= the number of total possible zones once you divide
image size by zone size?
Such sanity checks would also be useful when creating new image with
zones (and not just when opening a pre-existing image); should you
have a helper function that performs all the sanity checks for zone
values being self-consistent, and reuse it in each context, rather
than repeated open-coding the same checks?
> +
> +#ifdef DEBUG_EXT
> + printf("Qcow2: Got zoned format extension: "
> + "offset=%" PRIu32 "\n", offset);
> +#endif
> + break;
> + }
> +
> default:
> /* unknown magic - save it in case we need to rewrite the header */
> /* If you add a new feature, make sure to also update the fast
> +++ b/block/qcow2.h
> @@ -236,6 +236,27 @@ typedef struct Qcow2CryptoHeaderExtension {
> uint64_t length;
> } QEMU_PACKED Qcow2CryptoHeaderExtension;
>
> +typedef struct Qcow2ZonedHeaderExtension {
> + /* Zoned device attributes */
> + uint8_t zoned;
> + uint8_t reserved[3];
> + uint32_t zone_size;
> + uint32_t zone_capacity;
> + uint32_t conventional_zones;
> + uint32_t nr_zones;
> + uint32_t max_active_zones;
> + uint32_t max_open_zones;
> + uint32_t max_append_bytes;
> + uint64_t zonedmeta_size;
> + uint64_t zonedmeta_offset;
> +} QEMU_PACKED Qcow2ZonedHeaderExtension;
Nice that everything is well-aligned so that the struct packs into 6
consecutive 8-byte values.
> +
> +typedef struct Qcow2ZoneListEntry {
> + QLIST_ENTRY(Qcow2ZoneListEntry) exp_open_zone_entry;
> + QLIST_ENTRY(Qcow2ZoneListEntry) imp_open_zone_entry;
> + QLIST_ENTRY(Qcow2ZoneListEntry) closed_zone_entry;
> +} Qcow2ZoneListEntry;
> +
> typedef struct Qcow2UnknownHeaderExtension {
> uint32_t magic;
> uint32_t len;
> @@ -256,17 +277,20 @@ enum {
> QCOW2_INCOMPAT_DATA_FILE_BITNR = 2,
> QCOW2_INCOMPAT_COMPRESSION_BITNR = 3,
> QCOW2_INCOMPAT_EXTL2_BITNR = 4,
> + QCOW2_INCOMPAT_ZONED_FORMAT_BITNR = 5,
> QCOW2_INCOMPAT_DIRTY = 1 << QCOW2_INCOMPAT_DIRTY_BITNR,
> QCOW2_INCOMPAT_CORRUPT = 1 << QCOW2_INCOMPAT_CORRUPT_BITNR,
> QCOW2_INCOMPAT_DATA_FILE = 1 << QCOW2_INCOMPAT_DATA_FILE_BITNR,
> QCOW2_INCOMPAT_COMPRESSION = 1 << QCOW2_INCOMPAT_COMPRESSION_BITNR,
> QCOW2_INCOMPAT_EXTL2 = 1 << QCOW2_INCOMPAT_EXTL2_BITNR,
> + QCOW2_INCOMPAT_ZONED_FORMAT = 1 << QCOW2_INCOMPAT_ZONED_FORMAT_BITNR,
>
> QCOW2_INCOMPAT_MASK = QCOW2_INCOMPAT_DIRTY
> | QCOW2_INCOMPAT_CORRUPT
> | QCOW2_INCOMPAT_DATA_FILE
> | QCOW2_INCOMPAT_COMPRESSION
> - | QCOW2_INCOMPAT_EXTL2,
> + | QCOW2_INCOMPAT_EXTL2
> + | QCOW2_INCOMPAT_ZONED_FORMAT,
> };
In the previous version, I had suggested an autoclear bit; but it
looks like you went with a full-bore incompatible bit. What about the
format of the disk makes it impossible to read an image if you don't
know about zoned formats? Other incompat bits have obvious reasons:
for example, you can't extract data from an extl2 if you don't know
how to access the external data; and you can't uncompress an image
with zstd if you don't have the compression header calling out that
compression type. But so far, I haven't seen anything about how zone
information makes the image unreadable by an earlier version of qemu.
Do you have proper sanity checks that the incompat bit and the zone
extension header are either both present or both absent?
>
> /* Compatible feature bits */
> @@ -285,7 +309,6 @@ enum {
> QCOW2_AUTOCLEAR_DATA_FILE_RAW = 1 << QCOW2_AUTOCLEAR_DATA_FILE_RAW_BITNR,
>
> QCOW2_AUTOCLEAR_MASK = QCOW2_AUTOCLEAR_BITMAPS
> - | QCOW2_AUTOCLEAR_DATA_FILE_RAW,
> };
Why are you removing a bit from the autoclear mask? Did you
experiment with making zoned devices an autoclear bit, and then change
your mind to making it an incompatible bit instead? At any rate, this
part of the hunk looks wrong.
>
> enum qcow2_discard_type {
> @@ -422,6 +445,16 @@ typedef struct BDRVQcow2State {
> * is to convert the image with the desired compression type set.
> */
> Qcow2CompressionType compression_type;
> +
> + /* States of zoned device */
> + Qcow2ZonedHeaderExtension zoned_header;
> + QLIST_HEAD(, Qcow2ZoneListEntry) exp_open_zones;
> + QLIST_HEAD(, Qcow2ZoneListEntry) imp_open_zones;
> + QLIST_HEAD(, Qcow2ZoneListEntry) closed_zones;
> + Qcow2ZoneListEntry *zone_list_entries;
> + uint32_t nr_zones_exp_open;
> + uint32_t nr_zones_imp_open;
> + uint32_t nr_zones_closed;
> } BDRVQcow2State;
>
> typedef struct Qcow2COWRegion {
> diff --git a/docs/interop/qcow2.txt b/docs/interop/qcow2.txt
> index 2c4618375a..b210bc4580 100644
> --- a/docs/interop/qcow2.txt
> +++ b/docs/interop/qcow2.txt
> @@ -125,7 +125,18 @@ the next fields through header_length.
> allows subcluster-based allocation. See the
> Extended L2 Entries section for more details.
>
> - Bits 5-63: Reserved (set to 0)
> + Bit 5: Zoned extension bit. If this bit is set then
> + the file is a zoned device file with
> + host-managed model.
> +
> + It is unsafe when any qcow2 writer which does
> + not understand zone information edits a file
> + with the zoned extension. The write pointer
> + tracking is corrupted between different version
> + of qcow2 writes. In such cases, the file can
> + no longer be seen as a zoned device.
This wording sounds more like you want an autoclear bit than an
incompat bit. An incompat bit implies that an image unaware of the
zone extension cannot even open the device for reads (making it
impossible to write and corrupt the zone information).
> +
> + Bits 6-63: Reserved (set to 0)
>
> 80 - 87: compatible_features
> Bitmask of compatible features. An implementation can
> @@ -249,6 +260,7 @@ be stored. Each extension has a structure like the following:
> 0x23852875 - Bitmaps extension
> 0x0537be77 - Full disk encryption header pointer
> 0x44415441 - External data file name string
> + 0x007a6264 - Zoned extension
> other - Unknown header extension, can be safely
> ignored
>
> @@ -331,6 +343,59 @@ The fields of the bitmaps extension are:
> Offset into the image file at which the bitmap directory
> starts. Must be aligned to a cluster boundary.
>
> +== Zoned extension ==
> +
> +The zoned extension is an optional header extension. It contains fields for
> +emulating the zoned stroage model (https://zonedstorage.io/).
If you stick with an incompat bit, then this header should be
mandatory when the incompat bit is set, and omitted when the incompat
bit is clear.
> +
> +When the device model is not recognized as host-managed, it is regared as
regarded
> +incompatible and report an error to users.
I'm not quite sure what you meant by a 'device model is not recognized
as host-managed'. The phrase 'and report an error to users' does not
seem to match anywhere else in the spec; I think what you are trying
to state is that a given implementation must refuse to open a qcow2
file with the zone extension header containing a 'zoned' byte (the
enum at offset 0) with a value it cannot support.
> +
> +The fields of the zoned extension are:
> + Byte 0: zoned
> + This bit represents zone model of devices. 1 is for
> + host-managed, which only allows sequential writes.
> + And 0 is for non-zoned devices without such constraints.
Grammar suggestions, and it's nice to list values in order. How about:
This bit represents the zone model of the device. 0 is for a
non-zoned device (all other information in this header is ignored). 1
is for a host-managed device, which only allows for sequential writes
within each zone. Other values may be added later; the implementation
must refuse to open a device containing an unknown zone model.
> +
> + 1 - 3: Reserved, must be zero.
> +
> + 4 - 7: zone_size
> + Total size of each zones, in bytes. It is less than 4GB
each zone
> + in the contained image for simplicity.
Must this be a power of 2, and/or a multiple of the cluster size?
Will we ever want to support making zones larger than 4G, in which
case we need to plan on sizing this field bigger to begin with?
> +
> + 8 - 11: zone_capacity
> + The number of writable bytes within the zones.
> + A zone capacity is always smaller or equal to the
> + zone size.
I'm still not understanding why we need this separate from zone_size.
> +
> + 12 - 15: conventional_zones
> + The number of conventional zones.
Given that this is a spec, it is probably worth defining what a
conventional zone is. I'm assuming it is a zone that does not have
append-only semantics?
> +
> + 16 - 19: nr_zones
> + The number of zones. It is the sum of conventional zones
> + and sequential zones.
Does the qcow2 implementation of zones guarantee that all conventional
zones appear before any sequential zones?
> +
> + 20 - 23: max_active_zones
> + The number of the zones that are in the implicit open,
> + explicit open or closed state.
s/are/can be/
> +
> + 24 - 27: max_open_zones
> + The maximal number of open (implicitly open or explicitly
> + open) zones.
What other states are there besides open and closed? If a closed zone
is active, then when can a zone ever not be active? Is it required
that max_open_zones <= max_active_zones <= nr_zones?
> +
> + 28 - 31: max_append_bytes
> + The number of bytes of a zone append request that can be
> + issued to the device. It must be 512-byte aligned.
Here, you call out bytes; in the docs above you called out sectors.
Make sure your patch is self-consistent.
> +
> + 32 - 39: zonedmeta_size
> + The size of zoned metadata in bytes. It contains no more
> + than 4GB. The zoned metadata structure is the write
> + pointers for each zone.
It contains no more than 4G, but yet has an 8-byte value. Why?
> +
> + 40 - 47: zonedmeta_offset
> + The offset of zoned metadata structure in the contained
> + image, in bytes.
> +
> == Full disk encryption header pointer ==
>
> The full disk encryption header must be present if, and only if, the
> diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
It feels like you are missing documentation on the zonedmeta
cluster(s) - you've described it as being write pointers for each
zone, but is it just 8 bytes per zone, taking up as many bytes as
needed to cover all zones, with all remaining bytes in the cluster
being zero padding?
> +++ b/qapi/block-core.json
> @@ -4981,6 +4981,21 @@
> { 'enum': 'Qcow2CompressionType',
> 'data': [ 'zlib', { 'name': 'zstd', 'if': 'CONFIG_ZSTD' } ] }
>
> +##
> +# @Qcow2ZoneModel:
> +#
> +# Zoned device model used in qcow2 image file
> +#
> +# @non-zoned: non-zoned model is for regular block devices
> +#
> +# @host-managed: host-managed model only allows sequential write over the
> +# device zones
> +#
> +# Since 8.2
> +##
> +{ 'enum': 'Qcow2ZoneModel',
> + 'data': ['non-zoned', 'host-managed'] }
> +
> ##
> # @BlockdevCreateOptionsQcow2:
> #
> @@ -5023,6 +5038,27 @@
> # @compression-type: The image cluster compression method
> # (default: zlib, since 5.1)
> #
> +# @zone-model: @Qcow2ZoneModel. The zone device model.
> +# (default: non-zoned, since 8.2)
> +#
> +# @zone-size: Total number of bytes within zones (since 8.2)
If @zone-model is "non-zoned", does it make sense to even allow
@zone-size and friends? Should this use a QMP union, where you can
pass in the remaining zone-* fields only when zone-model is set to
host-managed?
> +#
> +# @zone-capacity: The number of usable logical blocks within zones
> +# in bytes. A zone capacity is always smaller or equal to the
> +# zone size (since 8.2)
> +#
> +# @conventional-zones: The number of conventional zones of the
> +# zoned device (since 8.2)
> +#
> +# @max-open-zones: The maximal number of open zones (since 8.2)
> +#
> +# @max-active-zones: The maximal number of zones in the implicit
> +# open, explicit open or closed state (since 8.2)
> +#
> +# @max-append-bytes: The maximal number of bytes of a zone
> +# append request that can be issued to the device. It must be
> +# 512-byte aligned (since 8.2)
> +#
> # Since: 2.12
> ##
> { 'struct': 'BlockdevCreateOptionsQcow2',
> @@ -5039,7 +5075,14 @@
> '*preallocation': 'PreallocMode',
> '*lazy-refcounts': 'bool',
> '*refcount-bits': 'int',
> - '*compression-type':'Qcow2CompressionType' } }
> + '*compression-type':'Qcow2CompressionType',
> + '*zone-model': 'Qcow2ZoneModel',
> + '*zone-size': 'size',
> + '*zone-capacity': 'size',
> + '*conventional-zones': 'uint32',
> + '*max-open-zones': 'uint32',
> + '*max-active-zones': 'uint32',
> + '*max-append-bytes': 'uint32' } }
In other words, I'm envisioning something like an optional
'*zone':'ZoneStruct', where:
{ 'struct': 'ZoneHostManaged',
'data': { 'size': 'size', '*capacity': 'size', ..., '*max-append-bytes': 'uint32' } }
{ 'union': 'ZoneStruct',
'base': { 'model': 'Qcow2ZoneModel' },
'discriminator': 'model',
'data': { 'non-zoned': {},
'host-managed': 'ZoneHostManaged' } }
then over the wire, QMP can use the existing:
{ ..., "compression-type":"zstd" }
as a synonym for the new but explicit non-zoned:
{ ..., "compression-type":"zstd", "zone":{"mode":"non-zoned"} }
and when we want to use zones, we pass:
{ ..., "compression-type":"zstd", "zone":{"mode":"host-managed", "size":16777216} }
where you don't have to have zone- prefixing everywhere because it is
instead contained in the smart union object where it is obvious from
the 'mode' field what other fields should be present.
--
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization: qemu.org | libguestfs.org
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v5 2/4] qcow2: add configurations for zoned format extension
2023-10-30 14:53 ` Eric Blake
@ 2023-10-30 15:01 ` Sam Li
2023-11-02 6:30 ` Stefan Hajnoczi
2023-11-03 9:08 ` Markus Armbruster
2023-11-16 18:26 ` Sam Li
2 siblings, 1 reply; 17+ messages in thread
From: Sam Li @ 2023-10-30 15:01 UTC (permalink / raw)
To: Eric Blake
Cc: qemu-devel, Kevin Wolf, Hanna Reitz, dlemoal, hare,
dmitry.fomichev, stefanha, qemu-block, Markus Armbruster
Hi Eric,
Eric Blake <eblake@redhat.com> 于2023年10月30日周一 22:53写道:
>
> On Mon, Oct 30, 2023 at 08:18:45PM +0800, Sam Li wrote:
> > To configure the zoned format feature on the qcow2 driver, it
> > requires settings as: the device size, zone model, zone size,
> > zone capacity, number of conventional zones, limits on zone
> > resources (max append bytes, max open zones, and max_active_zones).
> >
> > To create a qcow2 file with zoned format, use command like this:
> > $ qemu-img create -f qcow2 test.qcow2 -o size=768M -o
> > zone_size=64M -o zone_capacity=64M -o conventional_zones=0 -o
> > max_append_bytes=4096 -o max_open_zones=0 -o max_active_zones=0
> > -o zone_model=host-managed
> >
> > Signed-off-by: Sam Li <faithilikerun@gmail.com>
> >
> > fix config?
>
> Is this comment supposed to be part of the commit message? If not,...
No...
>
> > ---
>
> ...place it here under the divider, so 'git am' won't include it, if there is nothing further to change on this patch.
>
> > block/qcow2.c | 205 ++++++++++++++++++++++++++++++-
> > block/qcow2.h | 37 +++++-
> > docs/interop/qcow2.txt | 67 +++++++++-
> > include/block/block_int-common.h | 13 ++
> > qapi/block-core.json | 45 ++++++-
> > 5 files changed, 362 insertions(+), 5 deletions(-)
> >
> > diff --git a/block/qcow2.c b/block/qcow2.c
> > index aa01d9e7b5..cd53268ca7 100644
> > --- a/block/qcow2.c
> > +++ b/block/qcow2.c
> > @@ -73,6 +73,7 @@ typedef struct {
> > #define QCOW2_EXT_MAGIC_CRYPTO_HEADER 0x0537be77
> > #define QCOW2_EXT_MAGIC_BITMAPS 0x23852875
> > #define QCOW2_EXT_MAGIC_DATA_FILE 0x44415441
> > +#define QCOW2_EXT_MAGIC_ZONED_FORMAT 0x007a6264
> >
> > static int coroutine_fn
> > qcow2_co_preadv_compressed(BlockDriverState *bs,
> > @@ -210,6 +211,7 @@ qcow2_read_extensions(BlockDriverState *bs, uint64_t start_offset,
> > uint64_t offset;
> > int ret;
> > Qcow2BitmapHeaderExt bitmaps_ext;
> > + Qcow2ZonedHeaderExtension zoned_ext;
> >
> > if (need_update_header != NULL) {
> > *need_update_header = false;
> > @@ -431,6 +433,63 @@ qcow2_read_extensions(BlockDriverState *bs, uint64_t start_offset,
> > break;
> > }
> >
> > + case QCOW2_EXT_MAGIC_ZONED_FORMAT:
> > + {
> > + if (ext.len != sizeof(zoned_ext)) {
> > + error_setg(errp, "zoned_ext: Invalid extension length");
> > + return -EINVAL;
> > + }
>
> Do we ever anticipate the struct growing in size in the future to add
> further features? Forcing the size to be constant, rather than a
> minimum, may get in the way of clean upgrades to a future version of
> the extension header.
The zoned extension could grow. So ext.len > sizeof(zoned_ext) -> invalid.
>
> > + ret = bdrv_pread(bs->file, offset, ext.len, &zoned_ext, 0);
> > + if (ret < 0) {
> > + error_setg_errno(errp, -ret, "zoned_ext: "
> > + "Could not read ext header");
> > + return ret;
> > + }
> > +
> > + if (s->incompatible_features & QCOW2_INCOMPAT_ZONED_FORMAT) {
> > + warn_report("A program lacking zoned format support "
> > + "may modify this file and zoned metadata are "
> > + "now considered inconsistent");
> > + error_printf("The zoned metadata is corrupted.\n");
>
> Why is this mixing warn_report and error_printf at the same time.
> Also, grammar is inconsistent from the similar
> QCOW2_AUTOCLEAR_BITMAPS, which used:
>
> if (s->qcow_version < 3) {
> /* Let's be a bit more specific */
> warn_report("This qcow2 v2 image contains bitmaps, but "
> "they may have been modified by a program "
> "without persistent bitmap support; so now "
> "they must all be considered inconsistent");
> } else {
> warn_report("a program lacking bitmap support "
> "modified this file, so all bitmaps are now "
> "considered inconsistent");
>
> This also raises the question whether we want to ever allow zoned
> support with a v2 image, or whether it should just be a hard error if
> it is not a v3 image (my preference would be the latter).
>
Actually, this part should be gotten rid of after discussion with
Stefan. The incompatible feature bit of zoned format does not need to
check if the zoned model is host-managed.
It's a bit late for me. So I will respond to the rest of the comments
later. Thanks for reviewing!
Sam
> > + }
> > +
> > + zoned_ext.zone_size = be32_to_cpu(zoned_ext.zone_size);
> > + zoned_ext.zone_capacity = be32_to_cpu(zoned_ext.zone_capacity);
> > + zoned_ext.conventional_zones =
> > + be32_to_cpu(zoned_ext.conventional_zones);
> > + zoned_ext.nr_zones = be32_to_cpu(zoned_ext.nr_zones);
> > + zoned_ext.max_open_zones = be32_to_cpu(zoned_ext.max_open_zones);
> > + zoned_ext.max_active_zones =
> > + be32_to_cpu(zoned_ext.max_active_zones);
> > + zoned_ext.max_append_bytes =
> > + be32_to_cpu(zoned_ext.max_append_bytes);
> > + s->zoned_header = zoned_ext;
> > +
> > + /* refuse to open broken images */
> > + if (zoned_ext.zone_size == 0) {
> > + error_setg(errp, "Zoned extension header zone_size field "
> > + "can not be 0");
> > + return -EINVAL;
> > + }
> > + if (zoned_ext.zone_capacity > zoned_ext.zone_size) {
> > + error_setg(errp, "Zoned extension header zone_capacity field "
> > + "can not be larger that zone_size field");
> > + return -EINVAL;
> > + }
> > + if (zoned_ext.nr_zones != DIV_ROUND_UP(
> > + bs->total_sectors * BDRV_SECTOR_SIZE, zoned_ext.zone_size)) {
> > + error_setg(errp, "Zoned extension header nr_zones field "
> > + "is wrong");
> > + return -EINVAL;
> > + }
>
> Are there any other sanity checks you should do, such as ensuring
> max_open_zones <= the number of total possible zones once you divide
> image size by zone size?
>
> Such sanity checks would also be useful when creating new image with
> zones (and not just when opening a pre-existing image); should you
> have a helper function that performs all the sanity checks for zone
> values being self-consistent, and reuse it in each context, rather
> than repeated open-coding the same checks?
>
> > +
> > +#ifdef DEBUG_EXT
> > + printf("Qcow2: Got zoned format extension: "
> > + "offset=%" PRIu32 "\n", offset);
> > +#endif
> > + break;
> > + }
> > +
> > default:
> > /* unknown magic - save it in case we need to rewrite the header */
> > /* If you add a new feature, make sure to also update the fast
> > +++ b/block/qcow2.h
> > @@ -236,6 +236,27 @@ typedef struct Qcow2CryptoHeaderExtension {
> > uint64_t length;
> > } QEMU_PACKED Qcow2CryptoHeaderExtension;
> >
> > +typedef struct Qcow2ZonedHeaderExtension {
> > + /* Zoned device attributes */
> > + uint8_t zoned;
> > + uint8_t reserved[3];
> > + uint32_t zone_size;
> > + uint32_t zone_capacity;
> > + uint32_t conventional_zones;
> > + uint32_t nr_zones;
> > + uint32_t max_active_zones;
> > + uint32_t max_open_zones;
> > + uint32_t max_append_bytes;
> > + uint64_t zonedmeta_size;
> > + uint64_t zonedmeta_offset;
> > +} QEMU_PACKED Qcow2ZonedHeaderExtension;
>
> Nice that everything is well-aligned so that the struct packs into 6
> consecutive 8-byte values.
>
> > +
> > +typedef struct Qcow2ZoneListEntry {
> > + QLIST_ENTRY(Qcow2ZoneListEntry) exp_open_zone_entry;
> > + QLIST_ENTRY(Qcow2ZoneListEntry) imp_open_zone_entry;
> > + QLIST_ENTRY(Qcow2ZoneListEntry) closed_zone_entry;
> > +} Qcow2ZoneListEntry;
> > +
> > typedef struct Qcow2UnknownHeaderExtension {
> > uint32_t magic;
> > uint32_t len;
> > @@ -256,17 +277,20 @@ enum {
> > QCOW2_INCOMPAT_DATA_FILE_BITNR = 2,
> > QCOW2_INCOMPAT_COMPRESSION_BITNR = 3,
> > QCOW2_INCOMPAT_EXTL2_BITNR = 4,
> > + QCOW2_INCOMPAT_ZONED_FORMAT_BITNR = 5,
> > QCOW2_INCOMPAT_DIRTY = 1 << QCOW2_INCOMPAT_DIRTY_BITNR,
> > QCOW2_INCOMPAT_CORRUPT = 1 << QCOW2_INCOMPAT_CORRUPT_BITNR,
> > QCOW2_INCOMPAT_DATA_FILE = 1 << QCOW2_INCOMPAT_DATA_FILE_BITNR,
> > QCOW2_INCOMPAT_COMPRESSION = 1 << QCOW2_INCOMPAT_COMPRESSION_BITNR,
> > QCOW2_INCOMPAT_EXTL2 = 1 << QCOW2_INCOMPAT_EXTL2_BITNR,
> > + QCOW2_INCOMPAT_ZONED_FORMAT = 1 << QCOW2_INCOMPAT_ZONED_FORMAT_BITNR,
> >
> > QCOW2_INCOMPAT_MASK = QCOW2_INCOMPAT_DIRTY
> > | QCOW2_INCOMPAT_CORRUPT
> > | QCOW2_INCOMPAT_DATA_FILE
> > | QCOW2_INCOMPAT_COMPRESSION
> > - | QCOW2_INCOMPAT_EXTL2,
> > + | QCOW2_INCOMPAT_EXTL2
> > + | QCOW2_INCOMPAT_ZONED_FORMAT,
> > };
>
> In the previous version, I had suggested an autoclear bit; but it
> looks like you went with a full-bore incompatible bit. What about the
> format of the disk makes it impossible to read an image if you don't
> know about zoned formats? Other incompat bits have obvious reasons:
> for example, you can't extract data from an extl2 if you don't know
> how to access the external data; and you can't uncompress an image
> with zstd if you don't have the compression header calling out that
> compression type. But so far, I haven't seen anything about how zone
> information makes the image unreadable by an earlier version of qemu.
>
> Do you have proper sanity checks that the incompat bit and the zone
> extension header are either both present or both absent?
>
> >
> > /* Compatible feature bits */
> > @@ -285,7 +309,6 @@ enum {
> > QCOW2_AUTOCLEAR_DATA_FILE_RAW = 1 << QCOW2_AUTOCLEAR_DATA_FILE_RAW_BITNR,
> >
> > QCOW2_AUTOCLEAR_MASK = QCOW2_AUTOCLEAR_BITMAPS
> > - | QCOW2_AUTOCLEAR_DATA_FILE_RAW,
> > };
>
> Why are you removing a bit from the autoclear mask? Did you
> experiment with making zoned devices an autoclear bit, and then change
> your mind to making it an incompatible bit instead? At any rate, this
> part of the hunk looks wrong.
>
> >
> > enum qcow2_discard_type {
> > @@ -422,6 +445,16 @@ typedef struct BDRVQcow2State {
> > * is to convert the image with the desired compression type set.
> > */
> > Qcow2CompressionType compression_type;
> > +
> > + /* States of zoned device */
> > + Qcow2ZonedHeaderExtension zoned_header;
> > + QLIST_HEAD(, Qcow2ZoneListEntry) exp_open_zones;
> > + QLIST_HEAD(, Qcow2ZoneListEntry) imp_open_zones;
> > + QLIST_HEAD(, Qcow2ZoneListEntry) closed_zones;
> > + Qcow2ZoneListEntry *zone_list_entries;
> > + uint32_t nr_zones_exp_open;
> > + uint32_t nr_zones_imp_open;
> > + uint32_t nr_zones_closed;
> > } BDRVQcow2State;
> >
> > typedef struct Qcow2COWRegion {
> > diff --git a/docs/interop/qcow2.txt b/docs/interop/qcow2.txt
> > index 2c4618375a..b210bc4580 100644
> > --- a/docs/interop/qcow2.txt
> > +++ b/docs/interop/qcow2.txt
> > @@ -125,7 +125,18 @@ the next fields through header_length.
> > allows subcluster-based allocation. See the
> > Extended L2 Entries section for more details.
> >
> > - Bits 5-63: Reserved (set to 0)
> > + Bit 5: Zoned extension bit. If this bit is set then
> > + the file is a zoned device file with
> > + host-managed model.
> > +
> > + It is unsafe when any qcow2 writer which does
> > + not understand zone information edits a file
> > + with the zoned extension. The write pointer
> > + tracking is corrupted between different version
> > + of qcow2 writes. In such cases, the file can
> > + no longer be seen as a zoned device.
>
> This wording sounds more like you want an autoclear bit than an
> incompat bit. An incompat bit implies that an image unaware of the
> zone extension cannot even open the device for reads (making it
> impossible to write and corrupt the zone information).
>
> > +
> > + Bits 6-63: Reserved (set to 0)
> >
> > 80 - 87: compatible_features
> > Bitmask of compatible features. An implementation can
> > @@ -249,6 +260,7 @@ be stored. Each extension has a structure like the following:
> > 0x23852875 - Bitmaps extension
> > 0x0537be77 - Full disk encryption header pointer
> > 0x44415441 - External data file name string
> > + 0x007a6264 - Zoned extension
> > other - Unknown header extension, can be safely
> > ignored
> >
> > @@ -331,6 +343,59 @@ The fields of the bitmaps extension are:
> > Offset into the image file at which the bitmap directory
> > starts. Must be aligned to a cluster boundary.
> >
> > +== Zoned extension ==
> > +
> > +The zoned extension is an optional header extension. It contains fields for
> > +emulating the zoned stroage model (https://zonedstorage.io/).
>
> If you stick with an incompat bit, then this header should be
> mandatory when the incompat bit is set, and omitted when the incompat
> bit is clear.
>
> > +
> > +When the device model is not recognized as host-managed, it is regared as
>
> regarded
>
> > +incompatible and report an error to users.
>
> I'm not quite sure what you meant by a 'device model is not recognized
> as host-managed'. The phrase 'and report an error to users' does not
> seem to match anywhere else in the spec; I think what you are trying
> to state is that a given implementation must refuse to open a qcow2
> file with the zone extension header containing a 'zoned' byte (the
> enum at offset 0) with a value it cannot support.
>
> > +
> > +The fields of the zoned extension are:
> > + Byte 0: zoned
> > + This bit represents zone model of devices. 1 is for
> > + host-managed, which only allows sequential writes.
> > + And 0 is for non-zoned devices without such constraints.
>
> Grammar suggestions, and it's nice to list values in order. How about:
>
> This bit represents the zone model of the device. 0 is for a
> non-zoned device (all other information in this header is ignored). 1
> is for a host-managed device, which only allows for sequential writes
> within each zone. Other values may be added later; the implementation
> must refuse to open a device containing an unknown zone model.
>
> > +
> > + 1 - 3: Reserved, must be zero.
> > +
> > + 4 - 7: zone_size
> > + Total size of each zones, in bytes. It is less than 4GB
>
> each zone
>
> > + in the contained image for simplicity.
>
> Must this be a power of 2, and/or a multiple of the cluster size?
> Will we ever want to support making zones larger than 4G, in which
> case we need to plan on sizing this field bigger to begin with?
>
> > +
> > + 8 - 11: zone_capacity
> > + The number of writable bytes within the zones.
> > + A zone capacity is always smaller or equal to the
> > + zone size.
>
> I'm still not understanding why we need this separate from zone_size.
>
> > +
> > + 12 - 15: conventional_zones
> > + The number of conventional zones.
>
> Given that this is a spec, it is probably worth defining what a
> conventional zone is. I'm assuming it is a zone that does not have
> append-only semantics?
>
> > +
> > + 16 - 19: nr_zones
> > + The number of zones. It is the sum of conventional zones
> > + and sequential zones.
>
> Does the qcow2 implementation of zones guarantee that all conventional
> zones appear before any sequential zones?
>
> > +
> > + 20 - 23: max_active_zones
> > + The number of the zones that are in the implicit open,
> > + explicit open or closed state.
>
> s/are/can be/
>
> > +
> > + 24 - 27: max_open_zones
> > + The maximal number of open (implicitly open or explicitly
> > + open) zones.
>
> What other states are there besides open and closed? If a closed zone
> is active, then when can a zone ever not be active? Is it required
> that max_open_zones <= max_active_zones <= nr_zones?
>
> > +
> > + 28 - 31: max_append_bytes
> > + The number of bytes of a zone append request that can be
> > + issued to the device. It must be 512-byte aligned.
>
> Here, you call out bytes; in the docs above you called out sectors.
> Make sure your patch is self-consistent.
>
> > +
> > + 32 - 39: zonedmeta_size
> > + The size of zoned metadata in bytes. It contains no more
> > + than 4GB. The zoned metadata structure is the write
> > + pointers for each zone.
>
> It contains no more than 4G, but yet has an 8-byte value. Why?
>
> > +
> > + 40 - 47: zonedmeta_offset
> > + The offset of zoned metadata structure in the contained
> > + image, in bytes.
> > +
> > == Full disk encryption header pointer ==
> >
> > The full disk encryption header must be present if, and only if, the
> > diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
>
> It feels like you are missing documentation on the zonedmeta
> cluster(s) - you've described it as being write pointers for each
> zone, but is it just 8 bytes per zone, taking up as many bytes as
> needed to cover all zones, with all remaining bytes in the cluster
> being zero padding?
>
> > +++ b/qapi/block-core.json
> > @@ -4981,6 +4981,21 @@
> > { 'enum': 'Qcow2CompressionType',
> > 'data': [ 'zlib', { 'name': 'zstd', 'if': 'CONFIG_ZSTD' } ] }
> >
> > +##
> > +# @Qcow2ZoneModel:
> > +#
> > +# Zoned device model used in qcow2 image file
> > +#
> > +# @non-zoned: non-zoned model is for regular block devices
> > +#
> > +# @host-managed: host-managed model only allows sequential write over the
> > +# device zones
> > +#
> > +# Since 8.2
> > +##
> > +{ 'enum': 'Qcow2ZoneModel',
> > + 'data': ['non-zoned', 'host-managed'] }
> > +
> > ##
> > # @BlockdevCreateOptionsQcow2:
> > #
> > @@ -5023,6 +5038,27 @@
> > # @compression-type: The image cluster compression method
> > # (default: zlib, since 5.1)
> > #
> > +# @zone-model: @Qcow2ZoneModel. The zone device model.
> > +# (default: non-zoned, since 8.2)
> > +#
> > +# @zone-size: Total number of bytes within zones (since 8.2)
>
> If @zone-model is "non-zoned", does it make sense to even allow
> @zone-size and friends? Should this use a QMP union, where you can
> pass in the remaining zone-* fields only when zone-model is set to
> host-managed?
>
> > +#
> > +# @zone-capacity: The number of usable logical blocks within zones
> > +# in bytes. A zone capacity is always smaller or equal to the
> > +# zone size (since 8.2)
> > +#
> > +# @conventional-zones: The number of conventional zones of the
> > +# zoned device (since 8.2)
> > +#
> > +# @max-open-zones: The maximal number of open zones (since 8.2)
> > +#
> > +# @max-active-zones: The maximal number of zones in the implicit
> > +# open, explicit open or closed state (since 8.2)
> > +#
> > +# @max-append-bytes: The maximal number of bytes of a zone
> > +# append request that can be issued to the device. It must be
> > +# 512-byte aligned (since 8.2)
> > +#
> > # Since: 2.12
> > ##
> > { 'struct': 'BlockdevCreateOptionsQcow2',
> > @@ -5039,7 +5075,14 @@
> > '*preallocation': 'PreallocMode',
> > '*lazy-refcounts': 'bool',
> > '*refcount-bits': 'int',
> > - '*compression-type':'Qcow2CompressionType' } }
> > + '*compression-type':'Qcow2CompressionType',
> > + '*zone-model': 'Qcow2ZoneModel',
> > + '*zone-size': 'size',
> > + '*zone-capacity': 'size',
> > + '*conventional-zones': 'uint32',
> > + '*max-open-zones': 'uint32',
> > + '*max-active-zones': 'uint32',
> > + '*max-append-bytes': 'uint32' } }
>
> In other words, I'm envisioning something like an optional
> '*zone':'ZoneStruct', where:
>
> { 'struct': 'ZoneHostManaged',
> 'data': { 'size': 'size', '*capacity': 'size', ..., '*max-append-bytes': 'uint32' } }
> { 'union': 'ZoneStruct',
> 'base': { 'model': 'Qcow2ZoneModel' },
> 'discriminator': 'model',
> 'data': { 'non-zoned': {},
> 'host-managed': 'ZoneHostManaged' } }
>
> then over the wire, QMP can use the existing:
> { ..., "compression-type":"zstd" }
>
> as a synonym for the new but explicit non-zoned:
> { ..., "compression-type":"zstd", "zone":{"mode":"non-zoned"} }
>
> and when we want to use zones, we pass:
> { ..., "compression-type":"zstd", "zone":{"mode":"host-managed", "size":16777216} }
>
> where you don't have to have zone- prefixing everywhere because it is
> instead contained in the smart union object where it is obvious from
> the 'mode' field what other fields should be present.
>
> --
> Eric Blake, Principal Software Engineer
> Red Hat, Inc.
> Virtualization: qemu.org | libguestfs.org
>
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v5 1/4] docs/qcow2: add the zoned format feature
2023-10-30 14:04 ` Eric Blake
2023-10-30 14:19 ` Sam Li
@ 2023-10-31 0:53 ` Damien Le Moal
1 sibling, 0 replies; 17+ messages in thread
From: Damien Le Moal @ 2023-10-31 0:53 UTC (permalink / raw)
To: Eric Blake, Sam Li
Cc: qemu-devel, Kevin Wolf, Hanna Reitz, hare, dmitry.fomichev,
stefanha, qemu-block, Markus Armbruster
On 10/30/23 23:04, Eric Blake wrote:
>> +
>> + .. option:: zone_size
>> +
>> + The size of a zone in bytes. The device is divided into zones of this
>> + size with the exception of the last zone, which may be smaller.
>> +
>> + .. option:: zone_capacity
>> +
>> + The initial capacity value, in bytes, for all zones. The capacity must
>> + be less than or equal to zone size. If the last zone is smaller, then
>> + its capacity is capped.
>> +
>> + The zone capacity is per zone and may be different between zones in real
>> + devices. For simplicity, QCow2 sets all zones to the same capacity.
>
> Just making sure I understand: One possible setup would be to describe
> a block device with zones of size 1024M but with capacity 1000M (that
> is, the zone reserves 24M capacity for other purposes)?
>
> Otherwise, I'm having a hard time seeing when you would ever set a
> capacity different from size.
>
> Are there requirements that one (or both) of these values must be
> powers of 2? Or is the requirement merely that they must be a
> multiple of 512 bytes (because sub-sector operations are not
> permitted)? Is there any implicit requirement based on qcow2
> implementation that a zone size/capacity must be a multiple of cluster
> size (other than possibly for the last zone)?
Linux requires the zone size to be a power of 2 number of LBAs. As a value so
defined may not align to a ZNS drive internal superblock size (e.g. align to
erase blocks), the zone capacity can be smaller than the zone size. The zone
capacity represents the number of LBAs that are usable within a zone. The LBAs
between zone capacity and zone size are unusable: reads will return 0s and
writes will fail for these LBAs. A drive capacity is reported as the sum of all
zone sizes, so it may be larger than the actual usable capacity, which is the
sum of all zone capacities.
Qcow2 follows this model despite the fact that we do not have the constraint of
aligning to some hardware erase block size. This is mainly to allow emulating a
real drive configuration. If a real drive emulation is not the goal of the
use-case at hand, most users will simply want to have zone size == zone capacity
for their zoned qcow2 drives.
>
>> +
>> + .. option:: zone_nr_conv
>> +
>> + The number of conventional zones of the zoned device.
>> +
>> + .. option:: max_open_zones
>> +
>> + The maximal allowed open zones.
>> +
>> + .. option:: max_active_zones
>> +
>> + The limit of the zones with implicit open, explicit open or closed state.
>> +
>> + .. option:: max_append_sectors
>> +
>> + The maximal number of 512-byte sectors in a zone append request.
>
> Why is this value in sectors instead of bytes? I understand that
> drivers may be written with sectors in mind, but any time we mix units
> in the public interface, it gets awkward. I'd lean towards having
> bytes here, with a requirement that it be a multiple of 512.
Agreed. Let's use bytes to avoid confusion.
--
Damien Le Moal
Western Digital Research
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v5 2/4] qcow2: add configurations for zoned format extension
2023-10-30 15:01 ` Sam Li
@ 2023-11-02 6:30 ` Stefan Hajnoczi
0 siblings, 0 replies; 17+ messages in thread
From: Stefan Hajnoczi @ 2023-11-02 6:30 UTC (permalink / raw)
To: Sam Li
Cc: Eric Blake, qemu-devel, Kevin Wolf, Hanna Reitz, dlemoal, hare,
dmitry.fomichev, qemu-block, Markus Armbruster
[-- Attachment #1: Type: text/plain, Size: 4422 bytes --]
On Mon, Oct 30, 2023 at 11:01:26PM +0800, Sam Li wrote:
> Hi Eric,
>
> Eric Blake <eblake@redhat.com> 于2023年10月30日周一 22:53写道:
> >
> > On Mon, Oct 30, 2023 at 08:18:45PM +0800, Sam Li wrote:
> > > To configure the zoned format feature on the qcow2 driver, it
> > > requires settings as: the device size, zone model, zone size,
> > > zone capacity, number of conventional zones, limits on zone
> > > resources (max append bytes, max open zones, and max_active_zones).
> > >
> > > To create a qcow2 file with zoned format, use command like this:
> > > $ qemu-img create -f qcow2 test.qcow2 -o size=768M -o
> > > zone_size=64M -o zone_capacity=64M -o conventional_zones=0 -o
> > > max_append_bytes=4096 -o max_open_zones=0 -o max_active_zones=0
> > > -o zone_model=host-managed
> > >
> > > Signed-off-by: Sam Li <faithilikerun@gmail.com>
> > >
> > > fix config?
> >
> > Is this comment supposed to be part of the commit message? If not,...
>
> No...
>
> >
> > > ---
> >
> > ...place it here under the divider, so 'git am' won't include it, if there is nothing further to change on this patch.
> >
> > > block/qcow2.c | 205 ++++++++++++++++++++++++++++++-
> > > block/qcow2.h | 37 +++++-
> > > docs/interop/qcow2.txt | 67 +++++++++-
> > > include/block/block_int-common.h | 13 ++
> > > qapi/block-core.json | 45 ++++++-
> > > 5 files changed, 362 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/block/qcow2.c b/block/qcow2.c
> > > index aa01d9e7b5..cd53268ca7 100644
> > > --- a/block/qcow2.c
> > > +++ b/block/qcow2.c
> > > @@ -73,6 +73,7 @@ typedef struct {
> > > #define QCOW2_EXT_MAGIC_CRYPTO_HEADER 0x0537be77
> > > #define QCOW2_EXT_MAGIC_BITMAPS 0x23852875
> > > #define QCOW2_EXT_MAGIC_DATA_FILE 0x44415441
> > > +#define QCOW2_EXT_MAGIC_ZONED_FORMAT 0x007a6264
> > >
> > > static int coroutine_fn
> > > qcow2_co_preadv_compressed(BlockDriverState *bs,
> > > @@ -210,6 +211,7 @@ qcow2_read_extensions(BlockDriverState *bs, uint64_t start_offset,
> > > uint64_t offset;
> > > int ret;
> > > Qcow2BitmapHeaderExt bitmaps_ext;
> > > + Qcow2ZonedHeaderExtension zoned_ext;
> > >
> > > if (need_update_header != NULL) {
> > > *need_update_header = false;
> > > @@ -431,6 +433,63 @@ qcow2_read_extensions(BlockDriverState *bs, uint64_t start_offset,
> > > break;
> > > }
> > >
> > > + case QCOW2_EXT_MAGIC_ZONED_FORMAT:
> > > + {
> > > + if (ext.len != sizeof(zoned_ext)) {
> > > + error_setg(errp, "zoned_ext: Invalid extension length");
> > > + return -EINVAL;
> > > + }
> >
> > Do we ever anticipate the struct growing in size in the future to add
> > further features? Forcing the size to be constant, rather than a
> > minimum, may get in the way of clean upgrades to a future version of
> > the extension header.
>
> The zoned extension could grow. So ext.len > sizeof(zoned_ext) -> invalid.
No, ext.len >= sizeof(zoned_ext) is valid. The program can extract the
zoned_ext fields that it knows about. Any additional fields are ignored
by the program because it doesn't know how to interpret them. ext.len <
sizeof(zoned_ext) is invalid because required fields are missing.
When zoned_ext is extended to add a new feature the first time, the code
is updated like this:
if (ext.len < endof(zoned_ext, last_minimum_field)) {
...invalid...
}
...handle minimum zoned_ext fields...
if (ext.len >= endof(zoned_ext, additional_field)) {
...handle additional_field...
}
...more additional fields...
This approach is often used by Linux ioctl(2) interfaces. It allows
extensions to the struct without breaking old programs that still use
shorter versions of the struct.
Only optional features can be added using this approach, so it's often
combined with a 'flags' field that indicates which mandatory features
userspace wants and the kernel has provided. That way the kernel can
reject ioctls that require a new feature that the old kernel doesn't
know and the kernel can fill in flags upon returning from ioctl(2) so
userspace knows which functionality was provided by the kernel. qcow2
has feature bits, so I don't think a 'flags' field is required.
Stefan
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v5 2/4] qcow2: add configurations for zoned format extension
2023-10-30 12:18 ` [PATCH v5 2/4] qcow2: add configurations for zoned format extension Sam Li
2023-10-30 14:53 ` Eric Blake
@ 2023-11-02 10:19 ` Stefan Hajnoczi
2023-11-02 10:31 ` Stefan Hajnoczi
2 siblings, 0 replies; 17+ messages in thread
From: Stefan Hajnoczi @ 2023-11-02 10:19 UTC (permalink / raw)
To: Sam Li
Cc: qemu-devel, Kevin Wolf, Hanna Reitz, Eric Blake, dlemoal, hare,
dmitry.fomichev, qemu-block, Markus Armbruster
[-- Attachment #1: Type: text/plain, Size: 27578 bytes --]
On Mon, Oct 30, 2023 at 08:18:45PM +0800, Sam Li wrote:
> To configure the zoned format feature on the qcow2 driver, it
> requires settings as: the device size, zone model, zone size,
> zone capacity, number of conventional zones, limits on zone
> resources (max append bytes, max open zones, and max_active_zones).
>
> To create a qcow2 file with zoned format, use command like this:
> $ qemu-img create -f qcow2 test.qcow2 -o size=768M -o
> zone_size=64M -o zone_capacity=64M -o conventional_zones=0 -o
> max_append_bytes=4096 -o max_open_zones=0 -o max_active_zones=0
> -o zone_model=host-managed
>
> Signed-off-by: Sam Li <faithilikerun@gmail.com>
>
> fix config?
> ---
> block/qcow2.c | 205 ++++++++++++++++++++++++++++++-
> block/qcow2.h | 37 +++++-
> docs/interop/qcow2.txt | 67 +++++++++-
> include/block/block_int-common.h | 13 ++
> qapi/block-core.json | 45 ++++++-
> 5 files changed, 362 insertions(+), 5 deletions(-)
>
> diff --git a/block/qcow2.c b/block/qcow2.c
> index aa01d9e7b5..cd53268ca7 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c
> @@ -73,6 +73,7 @@ typedef struct {
> #define QCOW2_EXT_MAGIC_CRYPTO_HEADER 0x0537be77
> #define QCOW2_EXT_MAGIC_BITMAPS 0x23852875
> #define QCOW2_EXT_MAGIC_DATA_FILE 0x44415441
> +#define QCOW2_EXT_MAGIC_ZONED_FORMAT 0x007a6264
>
> static int coroutine_fn
> qcow2_co_preadv_compressed(BlockDriverState *bs,
> @@ -210,6 +211,7 @@ qcow2_read_extensions(BlockDriverState *bs, uint64_t start_offset,
> uint64_t offset;
> int ret;
> Qcow2BitmapHeaderExt bitmaps_ext;
> + Qcow2ZonedHeaderExtension zoned_ext;
>
> if (need_update_header != NULL) {
> *need_update_header = false;
> @@ -431,6 +433,63 @@ qcow2_read_extensions(BlockDriverState *bs, uint64_t start_offset,
> break;
> }
>
> + case QCOW2_EXT_MAGIC_ZONED_FORMAT:
> + {
> + if (ext.len != sizeof(zoned_ext)) {
> + error_setg(errp, "zoned_ext: Invalid extension length");
> + return -EINVAL;
> + }
> + ret = bdrv_pread(bs->file, offset, ext.len, &zoned_ext, 0);
> + if (ret < 0) {
> + error_setg_errno(errp, -ret, "zoned_ext: "
> + "Could not read ext header");
> + return ret;
> + }
> +
> + if (s->incompatible_features & QCOW2_INCOMPAT_ZONED_FORMAT) {
> + warn_report("A program lacking zoned format support "
> + "may modify this file and zoned metadata are "
> + "now considered inconsistent");
> + error_printf("The zoned metadata is corrupted.\n");
> + }
This check is not necessary: the code already checks for unknown
incompatible features (QCOW2_INCOMPAT_MASK) in qcow2_do_open().
> +
> + zoned_ext.zone_size = be32_to_cpu(zoned_ext.zone_size);
> + zoned_ext.zone_capacity = be32_to_cpu(zoned_ext.zone_capacity);
> + zoned_ext.conventional_zones =
> + be32_to_cpu(zoned_ext.conventional_zones);
> + zoned_ext.nr_zones = be32_to_cpu(zoned_ext.nr_zones);
> + zoned_ext.max_open_zones = be32_to_cpu(zoned_ext.max_open_zones);
> + zoned_ext.max_active_zones =
> + be32_to_cpu(zoned_ext.max_active_zones);
> + zoned_ext.max_append_bytes =
> + be32_to_cpu(zoned_ext.max_append_bytes);
> + s->zoned_header = zoned_ext;
> +
> + /* refuse to open broken images */
> + if (zoned_ext.zone_size == 0) {
> + error_setg(errp, "Zoned extension header zone_size field "
> + "can not be 0");
> + return -EINVAL;
> + }
> + if (zoned_ext.zone_capacity > zoned_ext.zone_size) {
> + error_setg(errp, "Zoned extension header zone_capacity field "
> + "can not be larger that zone_size field");
> + return -EINVAL;
> + }
> + if (zoned_ext.nr_zones != DIV_ROUND_UP(
> + bs->total_sectors * BDRV_SECTOR_SIZE, zoned_ext.zone_size)) {
> + error_setg(errp, "Zoned extension header nr_zones field "
> + "is wrong");
> + return -EINVAL;
> + }
> +
> +#ifdef DEBUG_EXT
> + printf("Qcow2: Got zoned format extension: "
> + "offset=%" PRIu32 "\n", offset);
> +#endif
> + break;
> + }
> +
> default:
> /* unknown magic - save it in case we need to rewrite the header */
> /* If you add a new feature, make sure to also update the fast
> @@ -1967,6 +2026,15 @@ static void qcow2_refresh_limits(BlockDriverState *bs, Error **errp)
> }
> bs->bl.pwrite_zeroes_alignment = s->subcluster_size;
> bs->bl.pdiscard_alignment = s->cluster_size;
> + bs->bl.zoned = s->zoned_header.zoned;
> + bs->bl.nr_zones = s->zoned_header.nr_zones;
> + bs->bl.max_append_sectors = s->zoned_header.max_append_bytes
> + >> BDRV_SECTOR_BITS;
> + bs->bl.max_active_zones = s->zoned_header.max_active_zones;
> + bs->bl.max_open_zones = s->zoned_header.max_open_zones;
> + bs->bl.zone_size = s->zoned_header.zone_size;
> + bs->bl.zone_capacity = s->zoned_header.zone_capacity;
> + bs->bl.write_granularity = BDRV_SECTOR_SIZE;
> }
>
> static int GRAPH_UNLOCKED
> @@ -3062,6 +3130,11 @@ int qcow2_update_header(BlockDriverState *bs)
> .bit = QCOW2_INCOMPAT_EXTL2_BITNR,
> .name = "extended L2 entries",
> },
> + {
> + .type = QCOW2_FEAT_TYPE_INCOMPATIBLE,
> + .bit = QCOW2_INCOMPAT_ZONED_FORMAT_BITNR,
> + .name = "zoned format",
> + },
> {
> .type = QCOW2_FEAT_TYPE_COMPATIBLE,
> .bit = QCOW2_COMPAT_LAZY_REFCOUNTS_BITNR,
> @@ -3107,6 +3180,31 @@ int qcow2_update_header(BlockDriverState *bs)
> buflen -= ret;
> }
>
> + /* Zoned devices header extension */
> + if (s->zoned_header.zoned == BLK_Z_HM) {
> + Qcow2ZonedHeaderExtension zoned_header = {
> + .zoned = s->zoned_header.zoned,
> + .zone_size = cpu_to_be32(s->zoned_header.zone_size),
> + .zone_capacity = cpu_to_be32(s->zoned_header.zone_capacity),
> + .conventional_zones =
> + cpu_to_be32(s->zoned_header.conventional_zones),
> + .nr_zones = cpu_to_be32(s->zoned_header.nr_zones),
> + .max_open_zones = cpu_to_be32(s->zoned_header.max_open_zones),
> + .max_active_zones =
> + cpu_to_be32(s->zoned_header.max_active_zones),
> + .max_append_bytes =
> + cpu_to_be32(s->zoned_header.max_append_bytes)
> + };
> + ret = header_ext_add(buf, QCOW2_EXT_MAGIC_ZONED_FORMAT,
> + &zoned_header, sizeof(zoned_header),
> + buflen);
> + if (ret < 0) {
> + goto fail;
> + }
> + buf += ret;
> + buflen -= ret;
> + }
> +
> /* Keep unknown header extensions */
> QLIST_FOREACH(uext, &s->unknown_header_ext, next) {
> ret = header_ext_add(buf, uext->magic, uext->data, uext->len, buflen);
> @@ -3718,6 +3816,10 @@ qcow2_co_create(BlockdevCreateOptions *create_options, Error **errp)
> header->incompatible_features |=
> cpu_to_be64(QCOW2_INCOMPAT_DATA_FILE);
> }
> + if (qcow2_opts->zone_model != QCOW2_ZONE_MODEL_HOST_MANAGED) {
Should != be ==? The incompatible feature bit must be set when the zone
model is host-managed.
> + header->incompatible_features |=
> + cpu_to_be64(QCOW2_INCOMPAT_ZONED_FORMAT);
> + }
> if (qcow2_opts->data_file_raw) {
> header->autoclear_features |=
> cpu_to_be64(QCOW2_AUTOCLEAR_DATA_FILE_RAW);
> @@ -3786,11 +3888,70 @@ qcow2_co_create(BlockdevCreateOptions *create_options, Error **errp)
> }
>
> /* Set the external data file if necessary */
> + BDRVQcow2State *s = blk_bs(blk)->opaque;
> if (data_bs) {
> - BDRVQcow2State *s = blk_bs(blk)->opaque;
> s->image_data_file = g_strdup(data_bs->filename);
> }
>
> + if (qcow2_opts->zone_model == QCOW2_ZONE_MODEL_HOST_MANAGED) {
> + if (!qcow2_opts->has_zone_size) {
> + error_setg(errp, "Missing zone_size parameter");
> + ret = -EINVAL;
> + goto out;
> + }
> +
> + if (qcow2_opts->zone_size == 0) {
> + s->zoned_header.zoned = BLK_Z_NONE;
> + error_setg(errp, "Zoned devices can not allow a larger-than-zero "
> + "zone_size");
This error message is confusing. The zone size must be larger than zero
when the zone model is host-managed:
error_setg(errp, "zone_size must be larger than zero");
> + ret = -EINVAL;
> + goto out;
> + }
> +
> + s->zoned_header.zoned = BLK_Z_HM;
> + s->zoned_header.zone_size = qcow2_opts->zone_size;
> + s->zoned_header.nr_zones = DIV_ROUND_UP(qcow2_opts->size,
> + qcow2_opts->zone_size);
> +
> + if (qcow2_opts->has_zone_capacity) {
> + if (qcow2_opts->zone_capacity > qcow2_opts->zone_size) {
> + s->zoned_header.zoned = BLK_Z_NONE;
> + error_setg(errp, "zone capacity %" PRIu64 "B exceeds zone size "
> + "%" PRIu64"B", qcow2_opts->zone_capacity,
> + qcow2_opts->zone_size);
> + ret = -EINVAL;
> + goto out;
> + }
> + s->zoned_header.zone_capacity = qcow2_opts->zone_capacity;
> + } else {
> + s->zoned_header.zone_capacity = qcow2_opts->zone_size;
> + }
> +
> + if (qcow2_opts->has_conventional_zones) {
> + s->zoned_header.conventional_zones = qcow2_opts->conventional_zones;
> + }
> +
> + if (qcow2_opts->has_max_active_zones) {
> + if (qcow2_opts->max_open_zones > qcow2_opts->max_active_zones) {
> + s->zoned_header.zoned = BLK_Z_NONE;
> + error_setg(errp, "max_open_zones %" PRIu32 " exceeds "
> + "max_active_zones %" PRIu32"",
> + qcow2_opts->max_open_zones,
> + qcow2_opts->max_active_zones);
> + ret = -EINVAL;
> + goto out;
> + }
> + if (qcow2_opts->has_max_open_zones) {
> + s->zoned_header.max_open_zones = qcow2_opts->max_active_zones;
> + } else {
> + s->zoned_header.max_open_zones = qcow2_opts->max_active_zones;
> + }
> + }
> + s->zoned_header.max_append_bytes = qcow2_opts->max_append_bytes;
> + } else {
> + s->zoned_header.zoned = BLK_Z_NONE;
> + }
> +
> /* Create a full header (including things like feature table) */
> ret = qcow2_update_header(blk_bs(blk));
> bdrv_graph_co_rdunlock();
> @@ -3921,6 +4082,13 @@ qcow2_co_create_opts(BlockDriver *drv, const char *filename, QemuOpts *opts,
> { BLOCK_OPT_COMPAT_LEVEL, "version" },
> { BLOCK_OPT_DATA_FILE_RAW, "data-file-raw" },
> { BLOCK_OPT_COMPRESSION_TYPE, "compression-type" },
> + { BLOCK_OPT_ZONE_MODEL, "zone-model"},
> + { BLOCK_OPT_CONVENTIONAL_ZONES, "conventional-zones"},
> + { BLOCK_OPT_MAX_OPEN_ZONES, "max-open-zones"},
> + { BLOCK_OPT_MAX_ACTIVE_ZONES, "max-active-zones"},
> + { BLOCK_OPT_MAX_APPEND_BYTES, "max-append-bytes"},
> + { BLOCK_OPT_ZONE_SIZE, "zone-size"},
> + { BLOCK_OPT_ZONE_CAPACITY, "zone-capacity"},
> { NULL, NULL },
> };
>
> @@ -6087,6 +6255,41 @@ static QemuOptsList qcow2_create_opts = {
> .help = "Compression method used for image cluster " \
> "compression", \
> .def_value_str = "zlib" \
> + }, \
> + { \
> + .name = BLOCK_OPT_ZONE_MODEL, \
> + .type = QEMU_OPT_STRING, \
> + .help = "zone model", \
> + }, \
> + { \
> + .name = BLOCK_OPT_ZONE_SIZE, \
> + .type = QEMU_OPT_SIZE, \
> + .help = "zone size", \
> + }, \
> + { \
> + .name = BLOCK_OPT_ZONE_CAPACITY, \
> + .type = QEMU_OPT_SIZE, \
> + .help = "zone capacity", \
> + }, \
> + { \
> + .name = BLOCK_OPT_CONVENTIONAL_ZONES, \
> + .type = QEMU_OPT_NUMBER, \
> + .help = "numbers of conventional zones", \
> + }, \
> + { \
> + .name = BLOCK_OPT_MAX_APPEND_BYTES, \
> + .type = QEMU_OPT_NUMBER, \
> + .help = "max append bytes", \
> + }, \
> + { \
> + .name = BLOCK_OPT_MAX_ACTIVE_ZONES, \
> + .type = QEMU_OPT_NUMBER, \
> + .help = "max active zones", \
> + }, \
> + { \
> + .name = BLOCK_OPT_MAX_OPEN_ZONES, \
> + .type = QEMU_OPT_NUMBER, \
> + .help = "max open zones", \
> },
> QCOW_COMMON_OPTIONS,
> { /* end of list */ }
> diff --git a/block/qcow2.h b/block/qcow2.h
> index 29958c512b..6e5d90afd2 100644
> --- a/block/qcow2.h
> +++ b/block/qcow2.h
> @@ -236,6 +236,27 @@ typedef struct Qcow2CryptoHeaderExtension {
> uint64_t length;
> } QEMU_PACKED Qcow2CryptoHeaderExtension;
>
> +typedef struct Qcow2ZonedHeaderExtension {
> + /* Zoned device attributes */
> + uint8_t zoned;
> + uint8_t reserved[3];
> + uint32_t zone_size;
> + uint32_t zone_capacity;
> + uint32_t conventional_zones;
> + uint32_t nr_zones;
> + uint32_t max_active_zones;
> + uint32_t max_open_zones;
> + uint32_t max_append_bytes;
> + uint64_t zonedmeta_size;
> + uint64_t zonedmeta_offset;
> +} QEMU_PACKED Qcow2ZonedHeaderExtension;
> +
> +typedef struct Qcow2ZoneListEntry {
> + QLIST_ENTRY(Qcow2ZoneListEntry) exp_open_zone_entry;
> + QLIST_ENTRY(Qcow2ZoneListEntry) imp_open_zone_entry;
> + QLIST_ENTRY(Qcow2ZoneListEntry) closed_zone_entry;
> +} Qcow2ZoneListEntry;
> +
> typedef struct Qcow2UnknownHeaderExtension {
> uint32_t magic;
> uint32_t len;
> @@ -256,17 +277,20 @@ enum {
> QCOW2_INCOMPAT_DATA_FILE_BITNR = 2,
> QCOW2_INCOMPAT_COMPRESSION_BITNR = 3,
> QCOW2_INCOMPAT_EXTL2_BITNR = 4,
> + QCOW2_INCOMPAT_ZONED_FORMAT_BITNR = 5,
> QCOW2_INCOMPAT_DIRTY = 1 << QCOW2_INCOMPAT_DIRTY_BITNR,
> QCOW2_INCOMPAT_CORRUPT = 1 << QCOW2_INCOMPAT_CORRUPT_BITNR,
> QCOW2_INCOMPAT_DATA_FILE = 1 << QCOW2_INCOMPAT_DATA_FILE_BITNR,
> QCOW2_INCOMPAT_COMPRESSION = 1 << QCOW2_INCOMPAT_COMPRESSION_BITNR,
> QCOW2_INCOMPAT_EXTL2 = 1 << QCOW2_INCOMPAT_EXTL2_BITNR,
> + QCOW2_INCOMPAT_ZONED_FORMAT = 1 << QCOW2_INCOMPAT_ZONED_FORMAT_BITNR,
>
> QCOW2_INCOMPAT_MASK = QCOW2_INCOMPAT_DIRTY
> | QCOW2_INCOMPAT_CORRUPT
> | QCOW2_INCOMPAT_DATA_FILE
> | QCOW2_INCOMPAT_COMPRESSION
> - | QCOW2_INCOMPAT_EXTL2,
> + | QCOW2_INCOMPAT_EXTL2
> + | QCOW2_INCOMPAT_ZONED_FORMAT,
> };
>
> /* Compatible feature bits */
> @@ -285,7 +309,6 @@ enum {
> QCOW2_AUTOCLEAR_DATA_FILE_RAW = 1 << QCOW2_AUTOCLEAR_DATA_FILE_RAW_BITNR,
>
> QCOW2_AUTOCLEAR_MASK = QCOW2_AUTOCLEAR_BITMAPS
> - | QCOW2_AUTOCLEAR_DATA_FILE_RAW,
What is the purpose of this change?
> };
>
> enum qcow2_discard_type {
> @@ -422,6 +445,16 @@ typedef struct BDRVQcow2State {
> * is to convert the image with the desired compression type set.
> */
> Qcow2CompressionType compression_type;
> +
> + /* States of zoned device */
> + Qcow2ZonedHeaderExtension zoned_header;
> + QLIST_HEAD(, Qcow2ZoneListEntry) exp_open_zones;
> + QLIST_HEAD(, Qcow2ZoneListEntry) imp_open_zones;
> + QLIST_HEAD(, Qcow2ZoneListEntry) closed_zones;
> + Qcow2ZoneListEntry *zone_list_entries;
> + uint32_t nr_zones_exp_open;
> + uint32_t nr_zones_imp_open;
> + uint32_t nr_zones_closed;
> } BDRVQcow2State;
>
> typedef struct Qcow2COWRegion {
> diff --git a/docs/interop/qcow2.txt b/docs/interop/qcow2.txt
> index 2c4618375a..b210bc4580 100644
> --- a/docs/interop/qcow2.txt
> +++ b/docs/interop/qcow2.txt
> @@ -125,7 +125,18 @@ the next fields through header_length.
> allows subcluster-based allocation. See the
> Extended L2 Entries section for more details.
>
> - Bits 5-63: Reserved (set to 0)
> + Bit 5: Zoned extension bit. If this bit is set then
> + the file is a zoned device file with
> + host-managed model.
> +
> + It is unsafe when any qcow2 writer which does
> + not understand zone information edits a file
> + with the zoned extension. The write pointer
> + tracking is corrupted between different version
> + of qcow2 writes. In such cases, the file can
> + no longer be seen as a zoned device.
Here is a suggestion for rephrasing this:
Bit 5:
Zoned emulation bit. If this bit is set then the file is an emulated
zoned device. The Zoned extension must be present. Implementations
that do not support zoned emulation cannot open this file because it
generally only makes sense to interpret the data along with the zone
information and write pointers.
Don't mention the host-managed model in case other zone models are
supported in the future.
> +
> + Bits 6-63: Reserved (set to 0)
>
> 80 - 87: compatible_features
> Bitmask of compatible features. An implementation can
> @@ -249,6 +260,7 @@ be stored. Each extension has a structure like the following:
> 0x23852875 - Bitmaps extension
> 0x0537be77 - Full disk encryption header pointer
> 0x44415441 - External data file name string
> + 0x007a6264 - Zoned extension
> other - Unknown header extension, can be safely
> ignored
>
> @@ -331,6 +343,59 @@ The fields of the bitmaps extension are:
> Offset into the image file at which the bitmap directory
> starts. Must be aligned to a cluster boundary.
>
> +== Zoned extension ==
> +
> +The zoned extension is an optional header extension. It contains fields for
> +emulating the zoned stroage model (https://zonedstorage.io/).
s/stroage/storage/
> +
> +When the device model is not recognized as host-managed, it is regared as
s/regared/regarded/
> +incompatible and report an error to users.
> +
> +The fields of the zoned extension are:
> + Byte 0: zoned
> + This bit represents zone model of devices. 1 is for
> + host-managed, which only allows sequential writes.
> + And 0 is for non-zoned devices without such constraints.
> +
> + 1 - 3: Reserved, must be zero.
> +
> + 4 - 7: zone_size
> + Total size of each zones, in bytes. It is less than 4GB
> + in the contained image for simplicity.
> +
> + 8 - 11: zone_capacity
> + The number of writable bytes within the zones.
> + A zone capacity is always smaller or equal to the
> + zone size.
> +
> + 12 - 15: conventional_zones
> + The number of conventional zones.
> +
> + 16 - 19: nr_zones
> + The number of zones. It is the sum of conventional zones
> + and sequential zones.
> +
> + 20 - 23: max_active_zones
> + The number of the zones that are in the implicit open,
> + explicit open or closed state.
> +
> + 24 - 27: max_open_zones
> + The maximal number of open (implicitly open or explicitly
> + open) zones.
> +
> + 28 - 31: max_append_bytes
> + The number of bytes of a zone append request that can be
> + issued to the device. It must be 512-byte aligned.
> +
> + 32 - 39: zonedmeta_size
> + The size of zoned metadata in bytes. It contains no more
> + than 4GB. The zoned metadata structure is the write
> + pointers for each zone.
> +
> + 40 - 47: zonedmeta_offset
> + The offset of zoned metadata structure in the contained
> + image, in bytes.
> +
> == Full disk encryption header pointer ==
>
> The full disk encryption header must be present if, and only if, the
> diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
> index b8d9d24f39..4b94e55eb4 100644
> --- a/include/block/block_int-common.h
> +++ b/include/block/block_int-common.h
> @@ -57,6 +57,13 @@
> #define BLOCK_OPT_DATA_FILE_RAW "data_file_raw"
> #define BLOCK_OPT_COMPRESSION_TYPE "compression_type"
> #define BLOCK_OPT_EXTL2 "extended_l2"
> +#define BLOCK_OPT_ZONE_MODEL "zone_model"
> +#define BLOCK_OPT_ZONE_SIZE "zone_size"
> +#define BLOCK_OPT_ZONE_CAPACITY "zone_capacity"
> +#define BLOCK_OPT_CONVENTIONAL_ZONES "conventional_zones"
> +#define BLOCK_OPT_MAX_APPEND_BYTES "max_append_bytes"
> +#define BLOCK_OPT_MAX_ACTIVE_ZONES "max_active_zones"
> +#define BLOCK_OPT_MAX_OPEN_ZONES "max_open_zones"
>
> #define BLOCK_PROBE_BUF_SIZE 512
>
> @@ -883,6 +890,12 @@ typedef struct BlockLimits {
> /* zone size expressed in bytes */
> uint32_t zone_size;
>
> + /*
> + * the number of usable logical blocks within the zone, expressed
> + * in bytes. A zone capacity is smaller or equal to the zone size.
> + */
> + uint32_t zone_capacity;
> +
> /* total number of zones */
> uint32_t nr_zones;
>
> diff --git a/qapi/block-core.json b/qapi/block-core.json
> index 89751d81f2..884e058046 100644
> --- a/qapi/block-core.json
> +++ b/qapi/block-core.json
> @@ -4981,6 +4981,21 @@
> { 'enum': 'Qcow2CompressionType',
> 'data': [ 'zlib', { 'name': 'zstd', 'if': 'CONFIG_ZSTD' } ] }
>
> +##
> +# @Qcow2ZoneModel:
> +#
> +# Zoned device model used in qcow2 image file
> +#
> +# @non-zoned: non-zoned model is for regular block devices
> +#
> +# @host-managed: host-managed model only allows sequential write over the
> +# device zones
> +#
> +# Since 8.2
> +##
> +{ 'enum': 'Qcow2ZoneModel',
> + 'data': ['non-zoned', 'host-managed'] }
> +
> ##
> # @BlockdevCreateOptionsQcow2:
> #
> @@ -5023,6 +5038,27 @@
> # @compression-type: The image cluster compression method
> # (default: zlib, since 5.1)
> #
> +# @zone-model: @Qcow2ZoneModel. The zone device model.
> +# (default: non-zoned, since 8.2)
> +#
> +# @zone-size: Total number of bytes within zones (since 8.2)
> +#
> +# @zone-capacity: The number of usable logical blocks within zones
> +# in bytes. A zone capacity is always smaller or equal to the
> +# zone size (since 8.2)
> +#
> +# @conventional-zones: The number of conventional zones of the
> +# zoned device (since 8.2)
> +#
> +# @max-open-zones: The maximal number of open zones (since 8.2)
> +#
> +# @max-active-zones: The maximal number of zones in the implicit
> +# open, explicit open or closed state (since 8.2)
> +#
> +# @max-append-bytes: The maximal number of bytes of a zone
> +# append request that can be issued to the device. It must be
> +# 512-byte aligned (since 8.2)
> +#
> # Since: 2.12
> ##
> { 'struct': 'BlockdevCreateOptionsQcow2',
> @@ -5039,7 +5075,14 @@
> '*preallocation': 'PreallocMode',
> '*lazy-refcounts': 'bool',
> '*refcount-bits': 'int',
> - '*compression-type':'Qcow2CompressionType' } }
> + '*compression-type':'Qcow2CompressionType',
> + '*zone-model': 'Qcow2ZoneModel',
> + '*zone-size': 'size',
> + '*zone-capacity': 'size',
> + '*conventional-zones': 'uint32',
> + '*max-open-zones': 'uint32',
> + '*max-active-zones': 'uint32',
> + '*max-append-bytes': 'uint32' } }
>
> ##
> # @BlockdevCreateOptionsQed:
> --
> 2.40.1
>
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v5 2/4] qcow2: add configurations for zoned format extension
2023-10-30 12:18 ` [PATCH v5 2/4] qcow2: add configurations for zoned format extension Sam Li
2023-10-30 14:53 ` Eric Blake
2023-11-02 10:19 ` Stefan Hajnoczi
@ 2023-11-02 10:31 ` Stefan Hajnoczi
2023-11-16 17:55 ` Sam Li
2 siblings, 1 reply; 17+ messages in thread
From: Stefan Hajnoczi @ 2023-11-02 10:31 UTC (permalink / raw)
To: Sam Li
Cc: qemu-devel, Kevin Wolf, Hanna Reitz, Eric Blake, dlemoal, hare,
dmitry.fomichev, qemu-block, Markus Armbruster
[-- Attachment #1: Type: text/plain, Size: 307 bytes --]
On Mon, Oct 30, 2023 at 08:18:45PM +0800, Sam Li wrote:
> +typedef struct Qcow2ZoneListEntry {
> + QLIST_ENTRY(Qcow2ZoneListEntry) exp_open_zone_entry;
> + QLIST_ENTRY(Qcow2ZoneListEntry) imp_open_zone_entry;
> + QLIST_ENTRY(Qcow2ZoneListEntry) closed_zone_entry;
Where is closed_zone_entry used?
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v5 2/4] qcow2: add configurations for zoned format extension
2023-10-30 14:53 ` Eric Blake
2023-10-30 15:01 ` Sam Li
@ 2023-11-03 9:08 ` Markus Armbruster
2023-11-16 18:01 ` Sam Li
2023-11-16 18:26 ` Sam Li
2 siblings, 1 reply; 17+ messages in thread
From: Markus Armbruster @ 2023-11-03 9:08 UTC (permalink / raw)
To: Eric Blake
Cc: Sam Li, qemu-devel, Kevin Wolf, Hanna Reitz, dlemoal, hare,
dmitry.fomichev, stefanha, qemu-block, Markus Armbruster
Eric Blake <eblake@redhat.com> writes:
> On Mon, Oct 30, 2023 at 08:18:45PM +0800, Sam Li wrote:
>> To configure the zoned format feature on the qcow2 driver, it
>> requires settings as: the device size, zone model, zone size,
>> zone capacity, number of conventional zones, limits on zone
>> resources (max append bytes, max open zones, and max_active_zones).
>>
>> To create a qcow2 file with zoned format, use command like this:
>> $ qemu-img create -f qcow2 test.qcow2 -o size=768M -o
>> zone_size=64M -o zone_capacity=64M -o conventional_zones=0 -o
>> max_append_bytes=4096 -o max_open_zones=0 -o max_active_zones=0
>> -o zone_model=host-managed
>>
>> Signed-off-by: Sam Li <faithilikerun@gmail.com>
>>
>> fix config?
>
> Is this comment supposed to be part of the commit message? If not,...
>
>> ---
>
> ...place it here under the divider, so 'git am' won't include it, if there is nothing further to change on this patch.
[...]
>> +++ b/qapi/block-core.json
>> @@ -4981,6 +4981,21 @@
>> { 'enum': 'Qcow2CompressionType',
>> 'data': [ 'zlib', { 'name': 'zstd', 'if': 'CONFIG_ZSTD' } ] }
>>
>> +##
>> +# @Qcow2ZoneModel:
>> +#
>> +# Zoned device model used in qcow2 image file
>> +#
>> +# @non-zoned: non-zoned model is for regular block devices
>> +#
>> +# @host-managed: host-managed model only allows sequential write over the
>> +# device zones
>> +#
>> +# Since 8.2
>> +##
>> +{ 'enum': 'Qcow2ZoneModel',
>> + 'data': ['non-zoned', 'host-managed'] }
>> +
>> ##
>> # @BlockdevCreateOptionsQcow2:
>> #
>> @@ -5023,6 +5038,27 @@
>> # @compression-type: The image cluster compression method
>> # (default: zlib, since 5.1)
>> #
>> +# @zone-model: @Qcow2ZoneModel. The zone device model.
>> +# (default: non-zoned, since 8.2)
>> +#
>> +# @zone-size: Total number of bytes within zones (since 8.2)
>
> If @zone-model is "non-zoned", does it make sense to even allow
> @zone-size and friends? Should this use a QMP union, where you can
> pass in the remaining zone-* fields only when zone-model is set to
> host-managed?
Valid question; needs an answer.
>> +#
>> +# @zone-capacity: The number of usable logical blocks within zones
>> +# in bytes. A zone capacity is always smaller or equal to the
>> +# zone size (since 8.2)
>> +#
>> +# @conventional-zones: The number of conventional zones of the
>> +# zoned device (since 8.2)
>> +#
>> +# @max-open-zones: The maximal number of open zones (since 8.2)
>> +#
>> +# @max-active-zones: The maximal number of zones in the implicit
>> +# open, explicit open or closed state (since 8.2)
>> +#
>> +# @max-append-bytes: The maximal number of bytes of a zone
>> +# append request that can be issued to the device. It must be
>> +# 512-byte aligned (since 8.2)
>> +#
>> # Since: 2.12
>> ##
>> { 'struct': 'BlockdevCreateOptionsQcow2',
>> @@ -5039,7 +5075,14 @@
>> '*preallocation': 'PreallocMode',
>> '*lazy-refcounts': 'bool',
>> '*refcount-bits': 'int',
>> - '*compression-type':'Qcow2CompressionType' } }
>> + '*compression-type':'Qcow2CompressionType',
>> + '*zone-model': 'Qcow2ZoneModel',
>> + '*zone-size': 'size',
>> + '*zone-capacity': 'size',
>> + '*conventional-zones': 'uint32',
>> + '*max-open-zones': 'uint32',
>> + '*max-active-zones': 'uint32',
>> + '*max-append-bytes': 'uint32' } }
>
> In other words, I'm envisioning something like an optional
> '*zone':'ZoneStruct', where:
>
> { 'struct': 'ZoneHostManaged',
> 'data': { 'size': 'size', '*capacity': 'size', ..., '*max-append-bytes': 'uint32' } }
> { 'union': 'ZoneStruct',
> 'base': { 'model': 'Qcow2ZoneModel' },
> 'discriminator': 'model',
> 'data': { 'non-zoned': {},
> 'host-managed': 'ZoneHostManaged' } }
>
> then over the wire, QMP can use the existing:
> { ..., "compression-type":"zstd" }
>
> as a synonym for the new but explicit non-zoned:
> { ..., "compression-type":"zstd", "zone":{"mode":"non-zoned"} }
I.e. @zone is optional, and defaults to {"mode": "non-zoned"}.
> and when we want to use zones, we pass:
> { ..., "compression-type":"zstd", "zone":{"mode":"host-managed", "size":16777216} }
>
> where you don't have to have zone- prefixing everywhere because it is
> instead contained in the smart union object where it is obvious from
> the 'mode' field what other fields should be present.
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v5 2/4] qcow2: add configurations for zoned format extension
2023-11-02 10:31 ` Stefan Hajnoczi
@ 2023-11-16 17:55 ` Sam Li
0 siblings, 0 replies; 17+ messages in thread
From: Sam Li @ 2023-11-16 17:55 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: qemu-devel, Kevin Wolf, Hanna Reitz, Eric Blake, dlemoal, hare,
dmitry.fomichev, qemu-block, Markus Armbruster
Stefan Hajnoczi <stefanha@redhat.com> 于2023年11月3日周五 11:24写道:
>
> On Mon, Oct 30, 2023 at 08:18:45PM +0800, Sam Li wrote:
> > +typedef struct Qcow2ZoneListEntry {
> > + QLIST_ENTRY(Qcow2ZoneListEntry) exp_open_zone_entry;
> > + QLIST_ENTRY(Qcow2ZoneListEntry) imp_open_zone_entry;
> > + QLIST_ENTRY(Qcow2ZoneListEntry) closed_zone_entry;
>
> Where is closed_zone_entry used?
When the number of implicitly open zones are reaching the max
implicitly open zone and one implicitly open zone is closed, it will
add one closed zone to closed_zone_entry. (Will be in the next
version)
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v5 2/4] qcow2: add configurations for zoned format extension
2023-11-03 9:08 ` Markus Armbruster
@ 2023-11-16 18:01 ` Sam Li
0 siblings, 0 replies; 17+ messages in thread
From: Sam Li @ 2023-11-16 18:01 UTC (permalink / raw)
To: Markus Armbruster
Cc: Eric Blake, qemu-devel, Kevin Wolf, Hanna Reitz, dlemoal, hare,
dmitry.fomichev, stefanha, qemu-block
Markus Armbruster <armbru@redhat.com> 于2023年11月3日周五 17:08写道:
>
> Eric Blake <eblake@redhat.com> writes:
>
> > On Mon, Oct 30, 2023 at 08:18:45PM +0800, Sam Li wrote:
> >> To configure the zoned format feature on the qcow2 driver, it
> >> requires settings as: the device size, zone model, zone size,
> >> zone capacity, number of conventional zones, limits on zone
> >> resources (max append bytes, max open zones, and max_active_zones).
> >>
> >> To create a qcow2 file with zoned format, use command like this:
> >> $ qemu-img create -f qcow2 test.qcow2 -o size=768M -o
> >> zone_size=64M -o zone_capacity=64M -o conventional_zones=0 -o
> >> max_append_bytes=4096 -o max_open_zones=0 -o max_active_zones=0
> >> -o zone_model=host-managed
> >>
> >> Signed-off-by: Sam Li <faithilikerun@gmail.com>
> >>
> >> fix config?
> >
> > Is this comment supposed to be part of the commit message? If not,...
> >
> >> ---
> >
> > ...place it here under the divider, so 'git am' won't include it, if there is nothing further to change on this patch.
>
> [...]
>
> >> +++ b/qapi/block-core.json
> >> @@ -4981,6 +4981,21 @@
> >> { 'enum': 'Qcow2CompressionType',
> >> 'data': [ 'zlib', { 'name': 'zstd', 'if': 'CONFIG_ZSTD' } ] }
> >>
> >> +##
> >> +# @Qcow2ZoneModel:
> >> +#
> >> +# Zoned device model used in qcow2 image file
> >> +#
> >> +# @non-zoned: non-zoned model is for regular block devices
> >> +#
> >> +# @host-managed: host-managed model only allows sequential write over the
> >> +# device zones
> >> +#
> >> +# Since 8.2
> >> +##
> >> +{ 'enum': 'Qcow2ZoneModel',
> >> + 'data': ['non-zoned', 'host-managed'] }
> >> +
> >> ##
> >> # @BlockdevCreateOptionsQcow2:
> >> #
> >> @@ -5023,6 +5038,27 @@
> >> # @compression-type: The image cluster compression method
> >> # (default: zlib, since 5.1)
> >> #
> >> +# @zone-model: @Qcow2ZoneModel. The zone device model.
> >> +# (default: non-zoned, since 8.2)
> >> +#
> >> +# @zone-size: Total number of bytes within zones (since 8.2)
> >
> > If @zone-model is "non-zoned", does it make sense to even allow
> > @zone-size and friends? Should this use a QMP union, where you can
> > pass in the remaining zone-* fields only when zone-model is set to
> > host-managed?
>
> Valid question; needs an answer.
Yes, it should use a QMP union. It's better to separate those fields
for zoned and non-zoned.
>
> >> +#
> >> +# @zone-capacity: The number of usable logical blocks within zones
> >> +# in bytes. A zone capacity is always smaller or equal to the
> >> +# zone size (since 8.2)
> >> +#
> >> +# @conventional-zones: The number of conventional zones of the
> >> +# zoned device (since 8.2)
> >> +#
> >> +# @max-open-zones: The maximal number of open zones (since 8.2)
> >> +#
> >> +# @max-active-zones: The maximal number of zones in the implicit
> >> +# open, explicit open or closed state (since 8.2)
> >> +#
> >> +# @max-append-bytes: The maximal number of bytes of a zone
> >> +# append request that can be issued to the device. It must be
> >> +# 512-byte aligned (since 8.2)
> >> +#
> >> # Since: 2.12
> >> ##
> >> { 'struct': 'BlockdevCreateOptionsQcow2',
> >> @@ -5039,7 +5075,14 @@
> >> '*preallocation': 'PreallocMode',
> >> '*lazy-refcounts': 'bool',
> >> '*refcount-bits': 'int',
> >> - '*compression-type':'Qcow2CompressionType' } }
> >> + '*compression-type':'Qcow2CompressionType',
> >> + '*zone-model': 'Qcow2ZoneModel',
> >> + '*zone-size': 'size',
> >> + '*zone-capacity': 'size',
> >> + '*conventional-zones': 'uint32',
> >> + '*max-open-zones': 'uint32',
> >> + '*max-active-zones': 'uint32',
> >> + '*max-append-bytes': 'uint32' } }
> >
> > In other words, I'm envisioning something like an optional
> > '*zone':'ZoneStruct', where:
> >
> > { 'struct': 'ZoneHostManaged',
> > 'data': { 'size': 'size', '*capacity': 'size', ..., '*max-append-bytes': 'uint32' } }
> > { 'union': 'ZoneStruct',
> > 'base': { 'model': 'Qcow2ZoneModel' },
> > 'discriminator': 'model',
> > 'data': { 'non-zoned': {},
> > 'host-managed': 'ZoneHostManaged' } }
> >
> > then over the wire, QMP can use the existing:
> > { ..., "compression-type":"zstd" }
> >
> > as a synonym for the new but explicit non-zoned:
> > { ..., "compression-type":"zstd", "zone":{"mode":"non-zoned"} }
>
> I.e. @zone is optional, and defaults to {"mode": "non-zoned"}.
>
> > and when we want to use zones, we pass:
> > { ..., "compression-type":"zstd", "zone":{"mode":"host-managed", "size":16777216} }
> >
> > where you don't have to have zone- prefixing everywhere because it is
> > instead contained in the smart union object where it is obvious from
> > the 'mode' field what other fields should be present.
>
Yes, it's better. Thanks!
Sam
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v5 2/4] qcow2: add configurations for zoned format extension
2023-10-30 14:53 ` Eric Blake
2023-10-30 15:01 ` Sam Li
2023-11-03 9:08 ` Markus Armbruster
@ 2023-11-16 18:26 ` Sam Li
2 siblings, 0 replies; 17+ messages in thread
From: Sam Li @ 2023-11-16 18:26 UTC (permalink / raw)
To: Eric Blake
Cc: qemu-devel, Kevin Wolf, Hanna Reitz, dlemoal, hare,
dmitry.fomichev, stefanha, qemu-block, Markus Armbruster
Hi Eric,
Eric Blake <eblake@redhat.com> 于2023年10月30日周一 22:53写道:
>
> On Mon, Oct 30, 2023 at 08:18:45PM +0800, Sam Li wrote:
> > To configure the zoned format feature on the qcow2 driver, it
> > requires settings as: the device size, zone model, zone size,
> > zone capacity, number of conventional zones, limits on zone
> > resources (max append bytes, max open zones, and max_active_zones).
> >
> > To create a qcow2 file with zoned format, use command like this:
> > $ qemu-img create -f qcow2 test.qcow2 -o size=768M -o
> > zone_size=64M -o zone_capacity=64M -o conventional_zones=0 -o
> > max_append_bytes=4096 -o max_open_zones=0 -o max_active_zones=0
> > -o zone_model=host-managed
> >
> > Signed-off-by: Sam Li <faithilikerun@gmail.com>
> >
> > fix config?
>
> Is this comment supposed to be part of the commit message? If not,...
>
> > ---
>
> ...place it here under the divider, so 'git am' won't include it, if there is nothing further to change on this patch.
>
> > block/qcow2.c | 205 ++++++++++++++++++++++++++++++-
> > block/qcow2.h | 37 +++++-
> > docs/interop/qcow2.txt | 67 +++++++++-
> > include/block/block_int-common.h | 13 ++
> > qapi/block-core.json | 45 ++++++-
> > 5 files changed, 362 insertions(+), 5 deletions(-)
> >
> > diff --git a/block/qcow2.c b/block/qcow2.c
> > index aa01d9e7b5..cd53268ca7 100644
> > --- a/block/qcow2.c
> > +++ b/block/qcow2.c
> > @@ -73,6 +73,7 @@ typedef struct {
> > #define QCOW2_EXT_MAGIC_CRYPTO_HEADER 0x0537be77
> > #define QCOW2_EXT_MAGIC_BITMAPS 0x23852875
> > #define QCOW2_EXT_MAGIC_DATA_FILE 0x44415441
> > +#define QCOW2_EXT_MAGIC_ZONED_FORMAT 0x007a6264
> >
> > static int coroutine_fn
> > qcow2_co_preadv_compressed(BlockDriverState *bs,
> > @@ -210,6 +211,7 @@ qcow2_read_extensions(BlockDriverState *bs, uint64_t start_offset,
> > uint64_t offset;
> > int ret;
> > Qcow2BitmapHeaderExt bitmaps_ext;
> > + Qcow2ZonedHeaderExtension zoned_ext;
> >
> > if (need_update_header != NULL) {
> > *need_update_header = false;
> > @@ -431,6 +433,63 @@ qcow2_read_extensions(BlockDriverState *bs, uint64_t start_offset,
> > break;
> > }
> >
> > + case QCOW2_EXT_MAGIC_ZONED_FORMAT:
> > + {
> > + if (ext.len != sizeof(zoned_ext)) {
> > + error_setg(errp, "zoned_ext: Invalid extension length");
> > + return -EINVAL;
> > + }
>
> Do we ever anticipate the struct growing in size in the future to add
> further features? Forcing the size to be constant, rather than a
> minimum, may get in the way of clean upgrades to a future version of
> the extension header.
>
> > + ret = bdrv_pread(bs->file, offset, ext.len, &zoned_ext, 0);
> > + if (ret < 0) {
> > + error_setg_errno(errp, -ret, "zoned_ext: "
> > + "Could not read ext header");
> > + return ret;
> > + }
> > +
> > + if (s->incompatible_features & QCOW2_INCOMPAT_ZONED_FORMAT) {
> > + warn_report("A program lacking zoned format support "
> > + "may modify this file and zoned metadata are "
> > + "now considered inconsistent");
> > + error_printf("The zoned metadata is corrupted.\n");
>
> Why is this mixing warn_report and error_printf at the same time.
> Also, grammar is inconsistent from the similar
> QCOW2_AUTOCLEAR_BITMAPS, which used:
>
> if (s->qcow_version < 3) {
> /* Let's be a bit more specific */
> warn_report("This qcow2 v2 image contains bitmaps, but "
> "they may have been modified by a program "
> "without persistent bitmap support; so now "
> "they must all be considered inconsistent");
> } else {
> warn_report("a program lacking bitmap support "
> "modified this file, so all bitmaps are now "
> "considered inconsistent");
>
> This also raises the question whether we want to ever allow zoned
> support with a v2 image, or whether it should just be a hard error if
> it is not a v3 image (my preference would be the latter).
>
> > + }
> > +
> > + zoned_ext.zone_size = be32_to_cpu(zoned_ext.zone_size);
> > + zoned_ext.zone_capacity = be32_to_cpu(zoned_ext.zone_capacity);
> > + zoned_ext.conventional_zones =
> > + be32_to_cpu(zoned_ext.conventional_zones);
> > + zoned_ext.nr_zones = be32_to_cpu(zoned_ext.nr_zones);
> > + zoned_ext.max_open_zones = be32_to_cpu(zoned_ext.max_open_zones);
> > + zoned_ext.max_active_zones =
> > + be32_to_cpu(zoned_ext.max_active_zones);
> > + zoned_ext.max_append_bytes =
> > + be32_to_cpu(zoned_ext.max_append_bytes);
> > + s->zoned_header = zoned_ext;
> > +
> > + /* refuse to open broken images */
> > + if (zoned_ext.zone_size == 0) {
> > + error_setg(errp, "Zoned extension header zone_size field "
> > + "can not be 0");
> > + return -EINVAL;
> > + }
> > + if (zoned_ext.zone_capacity > zoned_ext.zone_size) {
> > + error_setg(errp, "Zoned extension header zone_capacity field "
> > + "can not be larger that zone_size field");
> > + return -EINVAL;
> > + }
> > + if (zoned_ext.nr_zones != DIV_ROUND_UP(
> > + bs->total_sectors * BDRV_SECTOR_SIZE, zoned_ext.zone_size)) {
> > + error_setg(errp, "Zoned extension header nr_zones field "
> > + "is wrong");
> > + return -EINVAL;
> > + }
>
> Are there any other sanity checks you should do, such as ensuring
> max_open_zones <= the number of total possible zones once you divide
> image size by zone size?
>
> Such sanity checks would also be useful when creating new image with
> zones (and not just when opening a pre-existing image); should you
> have a helper function that performs all the sanity checks for zone
> values being self-consistent, and reuse it in each context, rather
> than repeated open-coding the same checks?
>
> > +
> > +#ifdef DEBUG_EXT
> > + printf("Qcow2: Got zoned format extension: "
> > + "offset=%" PRIu32 "\n", offset);
> > +#endif
> > + break;
> > + }
> > +
> > default:
> > /* unknown magic - save it in case we need to rewrite the header */
> > /* If you add a new feature, make sure to also update the fast
> > +++ b/block/qcow2.h
> > @@ -236,6 +236,27 @@ typedef struct Qcow2CryptoHeaderExtension {
> > uint64_t length;
> > } QEMU_PACKED Qcow2CryptoHeaderExtension;
> >
> > +typedef struct Qcow2ZonedHeaderExtension {
> > + /* Zoned device attributes */
> > + uint8_t zoned;
> > + uint8_t reserved[3];
> > + uint32_t zone_size;
> > + uint32_t zone_capacity;
> > + uint32_t conventional_zones;
> > + uint32_t nr_zones;
> > + uint32_t max_active_zones;
> > + uint32_t max_open_zones;
> > + uint32_t max_append_bytes;
> > + uint64_t zonedmeta_size;
> > + uint64_t zonedmeta_offset;
> > +} QEMU_PACKED Qcow2ZonedHeaderExtension;
>
> Nice that everything is well-aligned so that the struct packs into 6
> consecutive 8-byte values.
>
> > +
> > +typedef struct Qcow2ZoneListEntry {
> > + QLIST_ENTRY(Qcow2ZoneListEntry) exp_open_zone_entry;
> > + QLIST_ENTRY(Qcow2ZoneListEntry) imp_open_zone_entry;
> > + QLIST_ENTRY(Qcow2ZoneListEntry) closed_zone_entry;
> > +} Qcow2ZoneListEntry;
> > +
> > typedef struct Qcow2UnknownHeaderExtension {
> > uint32_t magic;
> > uint32_t len;
> > @@ -256,17 +277,20 @@ enum {
> > QCOW2_INCOMPAT_DATA_FILE_BITNR = 2,
> > QCOW2_INCOMPAT_COMPRESSION_BITNR = 3,
> > QCOW2_INCOMPAT_EXTL2_BITNR = 4,
> > + QCOW2_INCOMPAT_ZONED_FORMAT_BITNR = 5,
> > QCOW2_INCOMPAT_DIRTY = 1 << QCOW2_INCOMPAT_DIRTY_BITNR,
> > QCOW2_INCOMPAT_CORRUPT = 1 << QCOW2_INCOMPAT_CORRUPT_BITNR,
> > QCOW2_INCOMPAT_DATA_FILE = 1 << QCOW2_INCOMPAT_DATA_FILE_BITNR,
> > QCOW2_INCOMPAT_COMPRESSION = 1 << QCOW2_INCOMPAT_COMPRESSION_BITNR,
> > QCOW2_INCOMPAT_EXTL2 = 1 << QCOW2_INCOMPAT_EXTL2_BITNR,
> > + QCOW2_INCOMPAT_ZONED_FORMAT = 1 << QCOW2_INCOMPAT_ZONED_FORMAT_BITNR,
> >
> > QCOW2_INCOMPAT_MASK = QCOW2_INCOMPAT_DIRTY
> > | QCOW2_INCOMPAT_CORRUPT
> > | QCOW2_INCOMPAT_DATA_FILE
> > | QCOW2_INCOMPAT_COMPRESSION
> > - | QCOW2_INCOMPAT_EXTL2,
> > + | QCOW2_INCOMPAT_EXTL2
> > + | QCOW2_INCOMPAT_ZONED_FORMAT,
> > };
>
> In the previous version, I had suggested an autoclear bit; but it
> looks like you went with a full-bore incompatible bit. What about the
> format of the disk makes it impossible to read an image if you don't
> know about zoned formats? Other incompat bits have obvious reasons:
> for example, you can't extract data from an extl2 if you don't know
> how to access the external data; and you can't uncompress an image
> with zstd if you don't have the compression header calling out that
> compression type. But so far, I haven't seen anything about how zone
> information makes the image unreadable by an earlier version of qemu.
>
> Do you have proper sanity checks that the incompat bit and the zone
> extension header are either both present or both absent?
>
> >
> > /* Compatible feature bits */
> > @@ -285,7 +309,6 @@ enum {
> > QCOW2_AUTOCLEAR_DATA_FILE_RAW = 1 << QCOW2_AUTOCLEAR_DATA_FILE_RAW_BITNR,
> >
> > QCOW2_AUTOCLEAR_MASK = QCOW2_AUTOCLEAR_BITMAPS
> > - | QCOW2_AUTOCLEAR_DATA_FILE_RAW,
> > };
>
> Why are you removing a bit from the autoclear mask? Did you
> experiment with making zoned devices an autoclear bit, and then change
> your mind to making it an incompatible bit instead? At any rate, this
> part of the hunk looks wrong.
>
> >
> > enum qcow2_discard_type {
> > @@ -422,6 +445,16 @@ typedef struct BDRVQcow2State {
> > * is to convert the image with the desired compression type set.
> > */
> > Qcow2CompressionType compression_type;
> > +
> > + /* States of zoned device */
> > + Qcow2ZonedHeaderExtension zoned_header;
> > + QLIST_HEAD(, Qcow2ZoneListEntry) exp_open_zones;
> > + QLIST_HEAD(, Qcow2ZoneListEntry) imp_open_zones;
> > + QLIST_HEAD(, Qcow2ZoneListEntry) closed_zones;
> > + Qcow2ZoneListEntry *zone_list_entries;
> > + uint32_t nr_zones_exp_open;
> > + uint32_t nr_zones_imp_open;
> > + uint32_t nr_zones_closed;
> > } BDRVQcow2State;
> >
> > typedef struct Qcow2COWRegion {
> > diff --git a/docs/interop/qcow2.txt b/docs/interop/qcow2.txt
> > index 2c4618375a..b210bc4580 100644
> > --- a/docs/interop/qcow2.txt
> > +++ b/docs/interop/qcow2.txt
> > @@ -125,7 +125,18 @@ the next fields through header_length.
> > allows subcluster-based allocation. See the
> > Extended L2 Entries section for more details.
> >
> > - Bits 5-63: Reserved (set to 0)
> > + Bit 5: Zoned extension bit. If this bit is set then
> > + the file is a zoned device file with
> > + host-managed model.
> > +
> > + It is unsafe when any qcow2 writer which does
> > + not understand zone information edits a file
> > + with the zoned extension. The write pointer
> > + tracking is corrupted between different version
> > + of qcow2 writes. In such cases, the file can
> > + no longer be seen as a zoned device.
>
> This wording sounds more like you want an autoclear bit than an
> incompat bit. An incompat bit implies that an image unaware of the
> zone extension cannot even open the device for reads (making it
> impossible to write and corrupt the zone information).
>
> > +
> > + Bits 6-63: Reserved (set to 0)
> >
> > 80 - 87: compatible_features
> > Bitmask of compatible features. An implementation can
> > @@ -249,6 +260,7 @@ be stored. Each extension has a structure like the following:
> > 0x23852875 - Bitmaps extension
> > 0x0537be77 - Full disk encryption header pointer
> > 0x44415441 - External data file name string
> > + 0x007a6264 - Zoned extension
> > other - Unknown header extension, can be safely
> > ignored
> >
> > @@ -331,6 +343,59 @@ The fields of the bitmaps extension are:
> > Offset into the image file at which the bitmap directory
> > starts. Must be aligned to a cluster boundary.
> >
> > +== Zoned extension ==
> > +
> > +The zoned extension is an optional header extension. It contains fields for
> > +emulating the zoned stroage model (https://zonedstorage.io/).
>
> If you stick with an incompat bit, then this header should be
> mandatory when the incompat bit is set, and omitted when the incompat
> bit is clear.
>
> > +
> > +When the device model is not recognized as host-managed, it is regared as
>
> regarded
>
> > +incompatible and report an error to users.
>
> I'm not quite sure what you meant by a 'device model is not recognized
> as host-managed'. The phrase 'and report an error to users' does not
> seem to match anywhere else in the spec; I think what you are trying
> to state is that a given implementation must refuse to open a qcow2
> file with the zone extension header containing a 'zoned' byte (the
> enum at offset 0) with a value it cannot support.
>
> > +
> > +The fields of the zoned extension are:
> > + Byte 0: zoned
> > + This bit represents zone model of devices. 1 is for
> > + host-managed, which only allows sequential writes.
> > + And 0 is for non-zoned devices without such constraints.
>
> Grammar suggestions, and it's nice to list values in order. How about:
>
> This bit represents the zone model of the device. 0 is for a
> non-zoned device (all other information in this header is ignored). 1
> is for a host-managed device, which only allows for sequential writes
> within each zone. Other values may be added later; the implementation
> must refuse to open a device containing an unknown zone model.
>
> > +
> > + 1 - 3: Reserved, must be zero.
> > +
> > + 4 - 7: zone_size
> > + Total size of each zones, in bytes. It is less than 4GB
>
> each zone
>
> > + in the contained image for simplicity.
>
> Must this be a power of 2, and/or a multiple of the cluster size?
> Will we ever want to support making zones larger than 4G, in which
> case we need to plan on sizing this field bigger to begin with?
>
> > +
> > + 8 - 11: zone_capacity
> > + The number of writable bytes within the zones.
> > + A zone capacity is always smaller or equal to the
> > + zone size.
>
> I'm still not understanding why we need this separate from zone_size.
>
> > +
> > + 12 - 15: conventional_zones
> > + The number of conventional zones.
>
> Given that this is a spec, it is probably worth defining what a
> conventional zone is. I'm assuming it is a zone that does not have
> append-only semantics?
Ok. The conventional zones refer to zones that allow sequential writes
and random writes. While the sequential zones only allows sequential
writes.
>
> > +
> > + 16 - 19: nr_zones
> > + The number of zones. It is the sum of conventional zones
> > + and sequential zones.
>
> Does the qcow2 implementation of zones guarantee that all conventional
> zones appear before any sequential zones?
Yes. The conventional zones are first.
>
> > +
> > + 20 - 23: max_active_zones
> > + The number of the zones that are in the implicit open,
> > + explicit open or closed state.
>
> s/are/can be/
>
> > +
> > + 24 - 27: max_open_zones
> > + The maximal number of open (implicitly open or explicitly
> > + open) zones.
>
> What other states are there besides open and closed? If a closed zone
> is active, then when can a zone ever not be active? Is it required
> that max_open_zones <= max_active_zones <= nr_zones?
The other states are empty and full (read only and offline states are
device-internal events and not considered in qcow2 emulation). When
opening a qcow2 emulated file, the closed zones are kept active.
It is mandatorily required but implicitly, yes.
>
> > +
> > + 28 - 31: max_append_bytes
> > + The number of bytes of a zone append request that can be
> > + issued to the device. It must be 512-byte aligned.
>
> Here, you call out bytes; in the docs above you called out sectors.
> Make sure your patch is self-consistent.
It's bytes here. Sorry about that. I will check the consistency
whenever sending patches again.
>
> > +
> > + 32 - 39: zonedmeta_size
> > + The size of zoned metadata in bytes. It contains no more
> > + than 4GB. The zoned metadata structure is the write
> > + pointers for each zone.
>
> It contains no more than 4G, but yet has an 8-byte value. Why?
The zone size and the number of zones are 4-byte values. The zoned
meta is nr_zones * zone_size. Therefore, it is an 8-byte value. Will
clarify this more clearly in v6.
>
> > +
> > + 40 - 47: zonedmeta_offset
> > + The offset of zoned metadata structure in the contained
> > + image, in bytes.
> > +
> > == Full disk encryption header pointer ==
> >
> > The full disk encryption header must be present if, and only if, the
> > diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
>
> It feels like you are missing documentation on the zonedmeta
> cluster(s) - you've described it as being write pointers for each
> zone, but is it just 8 bytes per zone, taking up as many bytes as
> needed to cover all zones, with all remaining bytes in the cluster
> being zero padding?
Ok, I will add it. Yes, it's just 8 bytes per zone for all zones. The
remaining bytes in the cluster are not zero padding.
>
> > +++ b/qapi/block-core.json
> > @@ -4981,6 +4981,21 @@
> > { 'enum': 'Qcow2CompressionType',
> > 'data': [ 'zlib', { 'name': 'zstd', 'if': 'CONFIG_ZSTD' } ] }
> >
> > +##
> > +# @Qcow2ZoneModel:
> > +#
> > +# Zoned device model used in qcow2 image file
> > +#
> > +# @non-zoned: non-zoned model is for regular block devices
> > +#
> > +# @host-managed: host-managed model only allows sequential write over the
> > +# device zones
> > +#
> > +# Since 8.2
> > +##
> > +{ 'enum': 'Qcow2ZoneModel',
> > + 'data': ['non-zoned', 'host-managed'] }
> > +
> > ##
> > # @BlockdevCreateOptionsQcow2:
> > #
> > @@ -5023,6 +5038,27 @@
> > # @compression-type: The image cluster compression method
> > # (default: zlib, since 5.1)
> > #
> > +# @zone-model: @Qcow2ZoneModel. The zone device model.
> > +# (default: non-zoned, since 8.2)
> > +#
> > +# @zone-size: Total number of bytes within zones (since 8.2)
>
> If @zone-model is "non-zoned", does it make sense to even allow
> @zone-size and friends? Should this use a QMP union, where you can
> pass in the remaining zone-* fields only when zone-model is set to
> host-managed?
>
> > +#
> > +# @zone-capacity: The number of usable logical blocks within zones
> > +# in bytes. A zone capacity is always smaller or equal to the
> > +# zone size (since 8.2)
> > +#
> > +# @conventional-zones: The number of conventional zones of the
> > +# zoned device (since 8.2)
> > +#
> > +# @max-open-zones: The maximal number of open zones (since 8.2)
> > +#
> > +# @max-active-zones: The maximal number of zones in the implicit
> > +# open, explicit open or closed state (since 8.2)
> > +#
> > +# @max-append-bytes: The maximal number of bytes of a zone
> > +# append request that can be issued to the device. It must be
> > +# 512-byte aligned (since 8.2)
> > +#
> > # Since: 2.12
> > ##
> > { 'struct': 'BlockdevCreateOptionsQcow2',
> > @@ -5039,7 +5075,14 @@
> > '*preallocation': 'PreallocMode',
> > '*lazy-refcounts': 'bool',
> > '*refcount-bits': 'int',
> > - '*compression-type':'Qcow2CompressionType' } }
> > + '*compression-type':'Qcow2CompressionType',
> > + '*zone-model': 'Qcow2ZoneModel',
> > + '*zone-size': 'size',
> > + '*zone-capacity': 'size',
> > + '*conventional-zones': 'uint32',
> > + '*max-open-zones': 'uint32',
> > + '*max-active-zones': 'uint32',
> > + '*max-append-bytes': 'uint32' } }
>
> In other words, I'm envisioning something like an optional
> '*zone':'ZoneStruct', where:
>
> { 'struct': 'ZoneHostManaged',
> 'data': { 'size': 'size', '*capacity': 'size', ..., '*max-append-bytes': 'uint32' } }
> { 'union': 'ZoneStruct',
> 'base': { 'model': 'Qcow2ZoneModel' },
> 'discriminator': 'model',
> 'data': { 'non-zoned': {},
> 'host-managed': 'ZoneHostManaged' } }
>
> then over the wire, QMP can use the existing:
> { ..., "compression-type":"zstd" }
>
> as a synonym for the new but explicit non-zoned:
> { ..., "compression-type":"zstd", "zone":{"mode":"non-zoned"} }
>
> and when we want to use zones, we pass:
> { ..., "compression-type":"zstd", "zone":{"mode":"host-managed", "size":16777216} }
>
> where you don't have to have zone- prefixing everywhere because it is
> instead contained in the smart union object where it is obvious from
> the 'mode' field what other fields should be present.
I see. Thanks!
>
> --
> Eric Blake, Principal Software Engineer
> Red Hat, Inc.
> Virtualization: qemu.org | libguestfs.org
>
^ permalink raw reply [flat|nested] 17+ messages in thread
end of thread, other threads:[~2023-11-16 18:28 UTC | newest]
Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-10-30 12:18 [PATCH v5 0/4] Add full zoned storage emulation to qcow2 driver Sam Li
2023-10-30 12:18 ` [PATCH v5 1/4] docs/qcow2: add the zoned format feature Sam Li
2023-10-30 14:04 ` Eric Blake
2023-10-30 14:19 ` Sam Li
2023-10-31 0:53 ` Damien Le Moal
2023-10-30 12:18 ` [PATCH v5 2/4] qcow2: add configurations for zoned format extension Sam Li
2023-10-30 14:53 ` Eric Blake
2023-10-30 15:01 ` Sam Li
2023-11-02 6:30 ` Stefan Hajnoczi
2023-11-03 9:08 ` Markus Armbruster
2023-11-16 18:01 ` Sam Li
2023-11-16 18:26 ` Sam Li
2023-11-02 10:19 ` Stefan Hajnoczi
2023-11-02 10:31 ` Stefan Hajnoczi
2023-11-16 17:55 ` Sam Li
2023-10-30 12:18 ` [PATCH v5 3/4] qcow2: add zoned emulation capability Sam Li
2023-10-30 12:18 ` [PATCH v5 4/4] iotests: test the zoned format feature for qcow2 file Sam Li
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).