* [RFC 1/4] docs/qcow2: add the zoned format feature
2023-06-05 10:41 [RFC 0/4] Add full zoned storage emulation to qcow2 driver Sam Li
@ 2023-06-05 10:41 ` Sam Li
2023-06-12 4:10 ` Stefan Hajnoczi
2023-06-05 10:41 ` [RFC 2/4] qcow2: add configurations for zoned format extension Sam Li
` (2 subsequent siblings)
3 siblings, 1 reply; 16+ messages in thread
From: Sam Li @ 2023-06-05 10:41 UTC (permalink / raw)
To: qemu-devel
Cc: dlemoal, dmitry.fomichev, hare, stefanha, Markus Armbruster,
Kevin Wolf, qemu-block, Eric Blake, Hanna Reitz, Sam Li
Add the specs for the zoned format feature of the qcow2 driver. Once
the zoned_profile is set to `zbc`, then the qcow2 file can be taken
as zoned devices and passed through by virtio-blk device to the guest.
Signed-off-by: Sam Li <faithilikerun@gmail.com>
---
docs/system/qemu-block-drivers.rst.inc | 31 ++++++++++++++++++++++++++
1 file changed, 31 insertions(+)
diff --git a/docs/system/qemu-block-drivers.rst.inc b/docs/system/qemu-block-drivers.rst.inc
index 105cb9679c..fdcf343652 100644
--- a/docs/system/qemu-block-drivers.rst.inc
+++ b/docs/system/qemu-block-drivers.rst.inc
@@ -172,6 +172,37 @@ This section describes each format and the options that are supported for it.
filename`` to check if the NOCOW flag is set or not (Capital 'C' is
NOCOW flag).
+ .. option:: zoned_profile
+
+ The option configures the zoned format feature on the qcow2 driver. If
+ this is set to ``zbc``, then it follows the basics of ZBC/ZAC protocol.
+
+ .. option:: zone_size
+
+ The size of a zone of the zoned device. The zoned device have the same
+ size of zones with an optional smaller last zone.
+
+ .. option:: zone_capacity
+
+ The capacity of a zone of the zoned device. The zoned device follows the
+ ZBC protocol tends to have the same size as its zone.
+
+ .. option:: zone_nr_conv
+
+ The number of conventional zones of the zoned device.
+
+ .. option:: max_open_zones
+
+ The maximal allowed open zones.
+
+ .. option:: max_active_zones
+
+ The limit of the zones with implicit open, explicit open or closed state.
+
+ .. option:: max_append_sectors
+
+ The maximal sectors that is allowed to append to zones while writing.
+
.. program:: image-formats
.. option:: qed
--
2.40.1
^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [RFC 1/4] docs/qcow2: add the zoned format feature
2023-06-05 10:41 ` [RFC 1/4] docs/qcow2: add the zoned format feature Sam Li
@ 2023-06-12 4:10 ` Stefan Hajnoczi
2023-06-13 8:01 ` Sam Li
0 siblings, 1 reply; 16+ messages in thread
From: Stefan Hajnoczi @ 2023-06-12 4:10 UTC (permalink / raw)
To: Sam Li
Cc: qemu-devel, dlemoal, dmitry.fomichev, hare, Markus Armbruster,
Kevin Wolf, qemu-block, Eric Blake, Hanna Reitz
[-- Attachment #1: Type: text/plain, Size: 2546 bytes --]
On Mon, Jun 05, 2023 at 06:41:05PM +0800, Sam Li wrote:
> Add the specs for the zoned format feature of the qcow2 driver. Once
> the zoned_profile is set to `zbc`, then the qcow2 file can be taken
> as zoned devices and passed through by virtio-blk device to the guest.
>
> Signed-off-by: Sam Li <faithilikerun@gmail.com>
> ---
> docs/system/qemu-block-drivers.rst.inc | 31 ++++++++++++++++++++++++++
> 1 file changed, 31 insertions(+)
>
> diff --git a/docs/system/qemu-block-drivers.rst.inc b/docs/system/qemu-block-drivers.rst.inc
> index 105cb9679c..fdcf343652 100644
> --- a/docs/system/qemu-block-drivers.rst.inc
> +++ b/docs/system/qemu-block-drivers.rst.inc
> @@ -172,6 +172,37 @@ This section describes each format and the options that are supported for it.
> filename`` to check if the NOCOW flag is set or not (Capital 'C' is
> NOCOW flag).
>
> + .. option:: zoned_profile
> +
> + The option configures the zoned format feature on the qcow2 driver. If
> + this is set to ``zbc``, then it follows the basics of ZBC/ZAC protocol.
What about virtio-blk? NVMe ZNS? Please indicate what effect the profile
has and whether it works with all emulated storage controllers that
support zoned storage.
> +
> + .. option:: zone_size
> +
> + The size of a zone of the zoned device. The zoned device have the same
"in bytes"? Please document the units.
> + size of zones with an optional smaller last zone.
"The device is divided into zones of this size with the exception of the
last zone, which may be smaller."
> +
> + .. option:: zone_capacity
> +
> + The capacity of a zone of the zoned device.
This can be expanded:
The initial capacity value for all zones. The capacity must be less
than or equal to zone size. If the last zone is smaller, then its
capacity is capped.
> The zoned device follows the
> + ZBC protocol tends to have the same size as its zone.
> +
> + .. option:: zone_nr_conv
> +
> + The number of conventional zones of the zoned device.
> +
> + .. option:: max_open_zones
> +
> + The maximal allowed open zones.
> +
> + .. option:: max_active_zones
> +
> + The limit of the zones with implicit open, explicit open or closed state.
> +
> + .. option:: max_append_sectors
> +
> + The maximal sectors that is allowed to append to zones while writing.
Does "sectors" mean 512B blocks or logical block size?
> +
> .. program:: image-formats
> .. option:: qed
>
> --
> 2.40.1
>
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [RFC 1/4] docs/qcow2: add the zoned format feature
2023-06-12 4:10 ` Stefan Hajnoczi
@ 2023-06-13 8:01 ` Sam Li
0 siblings, 0 replies; 16+ messages in thread
From: Sam Li @ 2023-06-13 8:01 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: qemu-devel, dlemoal, dmitry.fomichev, hare, Markus Armbruster,
Kevin Wolf, qemu-block, Eric Blake, Hanna Reitz
Stefan Hajnoczi <stefanha@redhat.com> 于2023年6月13日周二 15:04写道:
>
> On Mon, Jun 05, 2023 at 06:41:05PM +0800, Sam Li wrote:
> > Add the specs for the zoned format feature of the qcow2 driver. Once
> > the zoned_profile is set to `zbc`, then the qcow2 file can be taken
> > as zoned devices and passed through by virtio-blk device to the guest.
> >
> > Signed-off-by: Sam Li <faithilikerun@gmail.com>
> > ---
> > docs/system/qemu-block-drivers.rst.inc | 31 ++++++++++++++++++++++++++
> > 1 file changed, 31 insertions(+)
> >
> > diff --git a/docs/system/qemu-block-drivers.rst.inc b/docs/system/qemu-block-drivers.rst.inc
> > index 105cb9679c..fdcf343652 100644
> > --- a/docs/system/qemu-block-drivers.rst.inc
> > +++ b/docs/system/qemu-block-drivers.rst.inc
> > @@ -172,6 +172,37 @@ This section describes each format and the options that are supported for it.
> > filename`` to check if the NOCOW flag is set or not (Capital 'C' is
> > NOCOW flag).
> >
> > + .. option:: zoned_profile
> > +
> > + The option configures the zoned format feature on the qcow2 driver. If
> > + this is set to ``zbc``, then it follows the basics of ZBC/ZAC protocol.
>
> What about virtio-blk? NVMe ZNS? Please indicate what effect the profile
> has and whether it works with all emulated storage controllers that
> support zoned storage.
The ZNS profile hasn't been included in this patch series yet.
Both ZNS and ZBC profiles can be used as the backing file for
virtio-blk passthrough. Though the NVMe controller with ZNS can only
use the ZNS profile. The ZNS profile can have a smaller size of
zoned_capacity than zone_size and only allows sequential write
required zoned type.
>
> > +
> > + .. option:: zone_size
> > +
> > + The size of a zone of the zoned device. The zoned device have the same
>
> "in bytes"? Please document the units.
>
> > + size of zones with an optional smaller last zone.
>
> "The device is divided into zones of this size with the exception of the
> last zone, which may be smaller."
>
> > +
> > + .. option:: zone_capacity
> > +
> > + The capacity of a zone of the zoned device.
>
> This can be expanded:
>
> The initial capacity value for all zones. The capacity must be less
> than or equal to zone size. If the last zone is smaller, then its
> capacity is capped.
>
> > The zoned device follows the
> > + ZBC protocol tends to have the same size as its zone.
> > +
> > + .. option:: zone_nr_conv
> > +
> > + The number of conventional zones of the zoned device.
> > +
> > + .. option:: max_open_zones
> > +
> > + The maximal allowed open zones.
> > +
> > + .. option:: max_active_zones
> > +
> > + The limit of the zones with implicit open, explicit open or closed state.
> > +
> > + .. option:: max_append_sectors
> > +
> > + The maximal sectors that is allowed to append to zones while writing.
>
> Does "sectors" mean 512B blocks or logical block size?
According to virtio spec, it means 512B blocks.
Thanks!
>
> > +
> > .. program:: image-formats
> > .. option:: qed
> >
> > --
> > 2.40.1
> >
^ permalink raw reply [flat|nested] 16+ messages in thread
* [RFC 2/4] qcow2: add configurations for zoned format extension
2023-06-05 10:41 [RFC 0/4] Add full zoned storage emulation to qcow2 driver Sam Li
2023-06-05 10:41 ` [RFC 1/4] docs/qcow2: add the zoned format feature Sam Li
@ 2023-06-05 10:41 ` Sam Li
2023-06-05 14:50 ` Eric Blake
2023-06-19 10:10 ` Stefan Hajnoczi
2023-06-05 10:41 ` [RFC 3/4] qcow2: add zoned emulation capability Sam Li
2023-06-05 10:41 ` [RFC 4/4] iotests: test the zoned format feature for qcow2 file Sam Li
3 siblings, 2 replies; 16+ messages in thread
From: Sam Li @ 2023-06-05 10:41 UTC (permalink / raw)
To: qemu-devel
Cc: dlemoal, dmitry.fomichev, hare, stefanha, Markus Armbruster,
Kevin Wolf, qemu-block, Eric Blake, Hanna Reitz, Sam Li
To configure the zoned format feature on the qcow2 driver, it
requires following arguments: the device size, zoned profile,
zoned model, zone size, zone capacity, number of conventional
zones, limits on zone resources (max append sectors, max open
zones, and max_active_zones).
To create a qcow2 file with zoned format, use command like this:
$ qemu-img create -f qcow2 test.qcow2 -o size=768M -o
zone_size=64M -o zone_capacity=64M -o zone_nr_conv=0 -o
max_append_sectors=512 -o max_open_zones=0 -o max_active_zones=0
-o zoned_profile=zbc
Signed-off-by: Sam Li <faithilikerun@gmail.com>
---
block/qcow2.c | 119 +++++++++++++++++++++++++++++++
block/qcow2.h | 21 ++++++
include/block/block-common.h | 5 ++
include/block/block_int-common.h | 8 +++
qapi/block-core.json | 46 ++++++++----
5 files changed, 185 insertions(+), 14 deletions(-)
diff --git a/block/qcow2.c b/block/qcow2.c
index 7f3948360d..b886dab42b 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -73,6 +73,7 @@ typedef struct {
#define QCOW2_EXT_MAGIC_CRYPTO_HEADER 0x0537be77
#define QCOW2_EXT_MAGIC_BITMAPS 0x23852875
#define QCOW2_EXT_MAGIC_DATA_FILE 0x44415441
+#define QCOW2_EXT_MAGIC_ZONED_FORMAT 0x7a6264
static int coroutine_fn
qcow2_co_preadv_compressed(BlockDriverState *bs,
@@ -210,6 +211,7 @@ qcow2_read_extensions(BlockDriverState *bs, uint64_t start_offset,
uint64_t offset;
int ret;
Qcow2BitmapHeaderExt bitmaps_ext;
+ Qcow2ZonedHeaderExtension zoned_ext;
if (need_update_header != NULL) {
*need_update_header = false;
@@ -431,6 +433,37 @@ qcow2_read_extensions(BlockDriverState *bs, uint64_t start_offset,
break;
}
+ case QCOW2_EXT_MAGIC_ZONED_FORMAT:
+ {
+ if (ext.len != sizeof(zoned_ext)) {
+ error_setg_errno(errp, -ret, "zoned_ext: "
+ "Invalid extension length");
+ return -EINVAL;
+ }
+ ret = bdrv_pread(bs->file, offset, ext.len, &zoned_ext, 0);
+ if (ret < 0) {
+ error_setg_errno(errp, -ret, "zoned_ext: "
+ "Could not read ext header");
+ return ret;
+ }
+
+ zoned_ext.zone_size = be32_to_cpu(zoned_ext.zone_size);
+ zoned_ext.nr_zones = be32_to_cpu(zoned_ext.nr_zones);
+ zoned_ext.zone_nr_conv = be32_to_cpu(zoned_ext.zone_nr_conv);
+ zoned_ext.max_open_zones = be32_to_cpu(zoned_ext.max_open_zones);
+ zoned_ext.max_active_zones =
+ be32_to_cpu(zoned_ext.max_active_zones);
+ zoned_ext.max_append_sectors =
+ be32_to_cpu(zoned_ext.max_append_sectors);
+ s->zoned_header = zoned_ext;
+
+#ifdef DEBUG_EXT
+ printf("Qcow2: Got zoned format extension: "
+ "offset=%" PRIu32 "\n", offset);
+#endif
+ break;
+ }
+
default:
/* unknown magic - save it in case we need to rewrite the header */
/* If you add a new feature, make sure to also update the fast
@@ -3071,6 +3104,31 @@ int qcow2_update_header(BlockDriverState *bs)
buflen -= ret;
}
+ /* Zoned devices header extension */
+ if (s->zoned_header.zoned == BLK_Z_HM) {
+ Qcow2ZonedHeaderExtension zoned_header = {
+ .zoned_profile = s->zoned_header.zoned_profile,
+ .zoned = s->zoned_header.zoned,
+ .nr_zones = cpu_to_be32(s->zoned_header.nr_zones),
+ .zone_size = cpu_to_be32(s->zoned_header.zone_size),
+ .zone_capacity = cpu_to_be32(s->zoned_header.zone_capacity),
+ .zone_nr_conv = cpu_to_be32(s->zoned_header.zone_nr_conv),
+ .max_open_zones = cpu_to_be32(s->zoned_header.max_open_zones),
+ .max_active_zones =
+ cpu_to_be32(s->zoned_header.max_active_zones),
+ .max_append_sectors =
+ cpu_to_be32(s->zoned_header.max_append_sectors)
+ };
+ ret = header_ext_add(buf, QCOW2_EXT_MAGIC_ZONED_FORMAT,
+ &zoned_header, sizeof(zoned_header),
+ buflen);
+ if (ret < 0) {
+ goto fail;
+ }
+ buf += ret;
+ buflen -= ret;
+ }
+
/* Keep unknown header extensions */
QLIST_FOREACH(uext, &s->unknown_header_ext, next) {
ret = header_ext_add(buf, uext->magic, uext->data, uext->len, buflen);
@@ -3755,6 +3813,18 @@ qcow2_co_create(BlockdevCreateOptions *create_options, Error **errp)
s->image_data_file = g_strdup(data_bs->filename);
}
+ if (!strcmp(qcow2_opts->zoned_profile, "zbc")) {
+ BDRVQcow2State *s = blk_bs(blk)->opaque;
+ s->zoned_header.zoned_profile = BLK_ZP_ZBC;
+ s->zoned_header.zoned = BLK_Z_HM;
+ s->zoned_header.zone_size = qcow2_opts->zone_size;
+ s->zoned_header.zone_capacity = qcow2_opts->zone_capacity;
+ s->zoned_header.zone_nr_conv = qcow2_opts->zone_nr_conv;
+ s->zoned_header.max_open_zones = qcow2_opts->max_open_zones;
+ s->zoned_header.max_active_zones = qcow2_opts->max_active_zones;
+ s->zoned_header.max_append_sectors = qcow2_opts->max_append_sectors;
+ }
+
/* Create a full header (including things like feature table) */
ret = qcow2_update_header(blk_bs(blk));
bdrv_graph_co_rdunlock();
@@ -3873,6 +3943,13 @@ qcow2_co_create_opts(BlockDriver *drv, const char *filename, QemuOpts *opts,
qdict_put_str(qdict, BLOCK_OPT_COMPAT_LEVEL, "v3");
}
+ /* The available zoned-profile options are zbc, which stands for
+ * ZBC/ZAC standards, and zns following NVMe ZNS spec. */
+ val = qdict_get_try_str(qdict, BLOCK_OPT_Z_PROFILE);
+ if (val) {
+ qdict_put_str(qdict, BLOCK_OPT_Z_PROFILE, val);
+ }
+
/* Change legacy command line options into QMP ones */
static const QDictRenames opt_renames[] = {
{ BLOCK_OPT_BACKING_FILE, "backing-file" },
@@ -3885,6 +3962,13 @@ qcow2_co_create_opts(BlockDriver *drv, const char *filename, QemuOpts *opts,
{ BLOCK_OPT_COMPAT_LEVEL, "version" },
{ BLOCK_OPT_DATA_FILE_RAW, "data-file-raw" },
{ BLOCK_OPT_COMPRESSION_TYPE, "compression-type" },
+ { BLOCK_OPT_Z_PROFILE, "zoned-profile"},
+ { BLOCK_OPT_Z_NR_COV, "zone-nr-conv"},
+ { BLOCK_OPT_Z_MOZ, "max-open-zones"},
+ { BLOCK_OPT_Z_MAZ, "max-active-zones"},
+ { BLOCK_OPT_Z_MAS, "max-append-sectors"},
+ { BLOCK_OPT_Z_SIZE, "zone-size"},
+ { BLOCK_OPT_Z_CAP, "zone-capacity"},
{ NULL, NULL },
};
@@ -6048,6 +6132,41 @@ static QemuOptsList qcow2_create_opts = {
.help = "Compression method used for image cluster " \
"compression", \
.def_value_str = "zlib" \
+ }, \
+ {
+ .name = BLOCK_OPT_Z_PROFILE, \
+ .type = QEMU_OPT_STRING, \
+ .help = "zoned format option for the disk img", \
+ }, \
+ { \
+ .name = BLOCK_OPT_Z_SIZE, \
+ .type = QEMU_OPT_SIZE, \
+ .help = "zone size", \
+ }, \
+ { \
+ .name = BLOCK_OPT_Z_CAP, \
+ .type = QEMU_OPT_SIZE, \
+ .help = "zone capacity", \
+ }, \
+ { \
+ .name = BLOCK_OPT_Z_NR_COV, \
+ .type = QEMU_OPT_NUMBER, \
+ .help = "numbers of conventional zones", \
+ }, \
+ { \
+ .name = BLOCK_OPT_Z_MAS, \
+ .type = QEMU_OPT_NUMBER, \
+ .help = "max append sectors", \
+ }, \
+ { \
+ .name = BLOCK_OPT_Z_MAZ, \
+ .type = QEMU_OPT_NUMBER, \
+ .help = "max active zones", \
+ }, \
+ { \
+ .name = BLOCK_OPT_Z_MOZ, \
+ .type = QEMU_OPT_NUMBER, \
+ .help = "max open zones", \
},
QCOW_COMMON_OPTIONS,
{ /* end of list */ }
diff --git a/block/qcow2.h b/block/qcow2.h
index 4f67eb912a..fe18dc4d97 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -235,6 +235,20 @@ typedef struct Qcow2CryptoHeaderExtension {
uint64_t length;
} QEMU_PACKED Qcow2CryptoHeaderExtension;
+typedef struct Qcow2ZonedHeaderExtension {
+ /* Zoned device attributes */
+ BlockZonedProfile zoned_profile;
+ BlockZoneModel zoned;
+ uint32_t zone_size;
+ uint32_t zone_capacity;
+ uint32_t nr_zones;
+ uint32_t zone_nr_conv;
+ uint32_t max_active_zones;
+ uint32_t max_open_zones;
+ uint32_t max_append_sectors;
+ uint8_t padding[3];
+} QEMU_PACKED Qcow2ZonedHeaderExtension;
+
typedef struct Qcow2UnknownHeaderExtension {
uint32_t magic;
uint32_t len;
@@ -419,6 +433,13 @@ typedef struct BDRVQcow2State {
* is to convert the image with the desired compression type set.
*/
Qcow2CompressionType compression_type;
+
+ /* States of zoned device */
+ Qcow2ZonedHeaderExtension zoned_header;
+ uint32_t nr_zones_exp_open;
+ uint32_t nr_zones_imp_open;
+ uint32_t nr_zones_closed;
+ BlockZoneWps *wps;
} BDRVQcow2State;
typedef struct Qcow2COWRegion {
diff --git a/include/block/block-common.h b/include/block/block-common.h
index e15395f2cb..9f04a772f6 100644
--- a/include/block/block-common.h
+++ b/include/block/block-common.h
@@ -108,6 +108,11 @@ typedef enum BlockZoneType {
BLK_ZT_SWP = 0x3, /* Sequential writes preferred */
} BlockZoneType;
+typedef enum BlockZonedProfile {
+ BLK_ZP_ZBC = 0x1,
+ BLK_ZP_ZNS = 0x2,
+} BlockZonedProfile;
+
/*
* Zone descriptor data structure.
* Provides information on a zone with all position and size values in bytes.
diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
index 74195c3004..4be35feaf8 100644
--- a/include/block/block_int-common.h
+++ b/include/block/block_int-common.h
@@ -57,6 +57,14 @@
#define BLOCK_OPT_DATA_FILE_RAW "data_file_raw"
#define BLOCK_OPT_COMPRESSION_TYPE "compression_type"
#define BLOCK_OPT_EXTL2 "extended_l2"
+#define BLOCK_OPT_Z_PROFILE "zoned_profile"
+#define BLOCK_OPT_Z_MODEL "zoned"
+#define BLOCK_OPT_Z_SIZE "zone_size"
+#define BLOCK_OPT_Z_CAP "zone_capacity"
+#define BLOCK_OPT_Z_NR_COV "zone_nr_conv"
+#define BLOCK_OPT_Z_MAS "max_append_sectors"
+#define BLOCK_OPT_Z_MAZ "max_active_zones"
+#define BLOCK_OPT_Z_MOZ "max_open_zones"
#define BLOCK_PROBE_BUF_SIZE 512
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 4bf89171c6..f9a584cbb3 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -5013,24 +5013,42 @@
#
# @compression-type: The image cluster compression method
# (default: zlib, since 5.1)
+# @zoned-profile: Two zoned device protocol options, zbc or zns
+# (default: off, since 8.0)
+# @zone-size: The size of a zone of the zoned device (since 8.0)
+# @zone-capacity: The capacity of a zone of the zoned device (since 8.0)
+# @zone-nr-conv: The number of conventional zones of the zoned device
+# (since 8.0)
+# @max-open-zones: The maximal allowed open zones (since 8.0)
+# @max-active-zones: The limit of the zones that have the implicit open,
+# explicit open or closed state (since 8.0)
+# @max-append-sectors: The maximal sectors that is allowed to append write
+# (since 8.0)
#
# Since: 2.12
##
{ 'struct': 'BlockdevCreateOptionsQcow2',
- 'data': { 'file': 'BlockdevRef',
- '*data-file': 'BlockdevRef',
- '*data-file-raw': 'bool',
- '*extended-l2': 'bool',
- 'size': 'size',
- '*version': 'BlockdevQcow2Version',
- '*backing-file': 'str',
- '*backing-fmt': 'BlockdevDriver',
- '*encrypt': 'QCryptoBlockCreateOptions',
- '*cluster-size': 'size',
- '*preallocation': 'PreallocMode',
- '*lazy-refcounts': 'bool',
- '*refcount-bits': 'int',
- '*compression-type':'Qcow2CompressionType' } }
+ 'data': { 'file': 'BlockdevRef',
+ '*data-file': 'BlockdevRef',
+ '*data-file-raw': 'bool',
+ '*extended-l2': 'bool',
+ 'size': 'size',
+ '*version': 'BlockdevQcow2Version',
+ '*backing-file': 'str',
+ '*backing-fmt': 'BlockdevDriver',
+ '*encrypt': 'QCryptoBlockCreateOptions',
+ '*cluster-size': 'size',
+ '*preallocation': 'PreallocMode',
+ '*lazy-refcounts': 'bool',
+ '*refcount-bits': 'int',
+ '*compression-type': 'Qcow2CompressionType',
+ '*zoned-profile': 'str',
+ '*zone-size': 'size',
+ '*zone-capacity': 'size',
+ '*zone-nr-conv': 'uint32',
+ '*max-open-zones': 'uint32',
+ '*max-active-zones': 'uint32',
+ '*max-append-sectors': 'uint32'}}
##
# @BlockdevCreateOptionsQed:
--
2.40.1
^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [RFC 2/4] qcow2: add configurations for zoned format extension
2023-06-05 10:41 ` [RFC 2/4] qcow2: add configurations for zoned format extension Sam Li
@ 2023-06-05 14:50 ` Eric Blake
2023-06-19 10:10 ` Stefan Hajnoczi
1 sibling, 0 replies; 16+ messages in thread
From: Eric Blake @ 2023-06-05 14:50 UTC (permalink / raw)
To: Sam Li
Cc: qemu-devel, dlemoal, dmitry.fomichev, hare, stefanha,
Markus Armbruster, Kevin Wolf, qemu-block, Hanna Reitz
On Mon, Jun 05, 2023 at 06:41:06PM +0800, Sam Li wrote:
> To configure the zoned format feature on the qcow2 driver, it
> requires following arguments: the device size, zoned profile,
> zoned model, zone size, zone capacity, number of conventional
> zones, limits on zone resources (max append sectors, max open
> zones, and max_active_zones).
>
> To create a qcow2 file with zoned format, use command like this:
> $ qemu-img create -f qcow2 test.qcow2 -o size=768M -o
> zone_size=64M -o zone_capacity=64M -o zone_nr_conv=0 -o
> max_append_sectors=512 -o max_open_zones=0 -o max_active_zones=0
> -o zoned_profile=zbc
>
> Signed-off-by: Sam Li <faithilikerun@gmail.com>
> ---
> block/qcow2.c | 119 +++++++++++++++++++++++++++++++
> block/qcow2.h | 21 ++++++
> include/block/block-common.h | 5 ++
> include/block/block_int-common.h | 8 +++
> qapi/block-core.json | 46 ++++++++----
> 5 files changed, 185 insertions(+), 14 deletions(-)
>
...
> diff --git a/block/qcow2.c b/block/qcow2.c
> index 7f3948360d..b886dab42b 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c
> @@ -73,6 +73,7 @@ typedef struct {
> #define QCOW2_EXT_MAGIC_CRYPTO_HEADER 0x0537be77
> #define QCOW2_EXT_MAGIC_BITMAPS 0x23852875
> #define QCOW2_EXT_MAGIC_DATA_FILE 0x44415441
> +#define QCOW2_EXT_MAGIC_ZONED_FORMAT 0x7a6264
>
> static int coroutine_fn
> qcow2_co_preadv_compressed(BlockDriverState *bs,
> @@ -210,6 +211,7 @@ qcow2_read_extensions(BlockDriverState *bs, uint64_t start_offset,
> uint64_t offset;
> int ret;
> Qcow2BitmapHeaderExt bitmaps_ext;
> + Qcow2ZonedHeaderExtension zoned_ext;
>
> if (need_update_header != NULL) {
> *need_update_header = false;
> @@ -431,6 +433,37 @@ qcow2_read_extensions(BlockDriverState *bs, uint64_t start_offset,
> break;
> }
>
> + case QCOW2_EXT_MAGIC_ZONED_FORMAT:
> + {
Missing a patch to docs/interop/qcow2.txt that describes the new
header so that other qcow2 implementations can be interoperable with
it.
[unrelated - maybe we should convert that file to .rst someday?]
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3266
Virtualization: qemu.org | libvirt.org
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [RFC 2/4] qcow2: add configurations for zoned format extension
2023-06-05 10:41 ` [RFC 2/4] qcow2: add configurations for zoned format extension Sam Li
2023-06-05 14:50 ` Eric Blake
@ 2023-06-19 10:10 ` Stefan Hajnoczi
2023-06-19 10:32 ` Sam Li
1 sibling, 1 reply; 16+ messages in thread
From: Stefan Hajnoczi @ 2023-06-19 10:10 UTC (permalink / raw)
To: Sam Li
Cc: qemu-devel, dlemoal, dmitry.fomichev, hare, Markus Armbruster,
Kevin Wolf, qemu-block, Eric Blake, Hanna Reitz
[-- Attachment #1: Type: text/plain, Size: 17507 bytes --]
On Mon, Jun 05, 2023 at 06:41:06PM +0800, Sam Li wrote:
> To configure the zoned format feature on the qcow2 driver, it
> requires following arguments: the device size, zoned profile,
> zoned model, zone size, zone capacity, number of conventional
> zones, limits on zone resources (max append sectors, max open
> zones, and max_active_zones).
>
> To create a qcow2 file with zoned format, use command like this:
> $ qemu-img create -f qcow2 test.qcow2 -o size=768M -o
> zone_size=64M -o zone_capacity=64M -o zone_nr_conv=0 -o
> max_append_sectors=512 -o max_open_zones=0 -o max_active_zones=0
> -o zoned_profile=zbc
>
> Signed-off-by: Sam Li <faithilikerun@gmail.com>
> ---
> block/qcow2.c | 119 +++++++++++++++++++++++++++++++
> block/qcow2.h | 21 ++++++
> include/block/block-common.h | 5 ++
> include/block/block_int-common.h | 8 +++
> qapi/block-core.json | 46 ++++++++----
> 5 files changed, 185 insertions(+), 14 deletions(-)
>
> diff --git a/block/qcow2.c b/block/qcow2.c
> index 7f3948360d..b886dab42b 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c
> @@ -73,6 +73,7 @@ typedef struct {
> #define QCOW2_EXT_MAGIC_CRYPTO_HEADER 0x0537be77
> #define QCOW2_EXT_MAGIC_BITMAPS 0x23852875
> #define QCOW2_EXT_MAGIC_DATA_FILE 0x44415441
> +#define QCOW2_EXT_MAGIC_ZONED_FORMAT 0x7a6264
>
> static int coroutine_fn
> qcow2_co_preadv_compressed(BlockDriverState *bs,
> @@ -210,6 +211,7 @@ qcow2_read_extensions(BlockDriverState *bs, uint64_t start_offset,
> uint64_t offset;
> int ret;
> Qcow2BitmapHeaderExt bitmaps_ext;
> + Qcow2ZonedHeaderExtension zoned_ext;
>
> if (need_update_header != NULL) {
> *need_update_header = false;
> @@ -431,6 +433,37 @@ qcow2_read_extensions(BlockDriverState *bs, uint64_t start_offset,
> break;
> }
>
> + case QCOW2_EXT_MAGIC_ZONED_FORMAT:
> + {
> + if (ext.len != sizeof(zoned_ext)) {
> + error_setg_errno(errp, -ret, "zoned_ext: "
> + "Invalid extension length");
> + return -EINVAL;
> + }
> + ret = bdrv_pread(bs->file, offset, ext.len, &zoned_ext, 0);
> + if (ret < 0) {
> + error_setg_errno(errp, -ret, "zoned_ext: "
> + "Could not read ext header");
> + return ret;
> + }
> +
> + zoned_ext.zone_size = be32_to_cpu(zoned_ext.zone_size);
> + zoned_ext.nr_zones = be32_to_cpu(zoned_ext.nr_zones);
> + zoned_ext.zone_nr_conv = be32_to_cpu(zoned_ext.zone_nr_conv);
> + zoned_ext.max_open_zones = be32_to_cpu(zoned_ext.max_open_zones);
> + zoned_ext.max_active_zones =
> + be32_to_cpu(zoned_ext.max_active_zones);
> + zoned_ext.max_append_sectors =
> + be32_to_cpu(zoned_ext.max_append_sectors);
> + s->zoned_header = zoned_ext;
Please validate these values. The image file is not trusted and may be
broken/corrupt. For example, zone_size=0 and nr_zones=0 must be rejected
because the code can't do anything useful when these values are zero
(similar for values that are not multiples of the block size).
> +
> +#ifdef DEBUG_EXT
> + printf("Qcow2: Got zoned format extension: "
> + "offset=%" PRIu32 "\n", offset);
> +#endif
> + break;
> + }
> +
> default:
> /* unknown magic - save it in case we need to rewrite the header */
> /* If you add a new feature, make sure to also update the fast
> @@ -3071,6 +3104,31 @@ int qcow2_update_header(BlockDriverState *bs)
> buflen -= ret;
> }
>
> + /* Zoned devices header extension */
> + if (s->zoned_header.zoned == BLK_Z_HM) {
> + Qcow2ZonedHeaderExtension zoned_header = {
> + .zoned_profile = s->zoned_header.zoned_profile,
> + .zoned = s->zoned_header.zoned,
> + .nr_zones = cpu_to_be32(s->zoned_header.nr_zones),
> + .zone_size = cpu_to_be32(s->zoned_header.zone_size),
> + .zone_capacity = cpu_to_be32(s->zoned_header.zone_capacity),
> + .zone_nr_conv = cpu_to_be32(s->zoned_header.zone_nr_conv),
> + .max_open_zones = cpu_to_be32(s->zoned_header.max_open_zones),
> + .max_active_zones =
> + cpu_to_be32(s->zoned_header.max_active_zones),
> + .max_append_sectors =
> + cpu_to_be32(s->zoned_header.max_append_sectors)
> + };
> + ret = header_ext_add(buf, QCOW2_EXT_MAGIC_ZONED_FORMAT,
> + &zoned_header, sizeof(zoned_header),
> + buflen);
> + if (ret < 0) {
> + goto fail;
> + }
> + buf += ret;
> + buflen -= ret;
> + }
> +
> /* Keep unknown header extensions */
> QLIST_FOREACH(uext, &s->unknown_header_ext, next) {
> ret = header_ext_add(buf, uext->magic, uext->data, uext->len, buflen);
> @@ -3755,6 +3813,18 @@ qcow2_co_create(BlockdevCreateOptions *create_options, Error **errp)
> s->image_data_file = g_strdup(data_bs->filename);
> }
>
> + if (!strcmp(qcow2_opts->zoned_profile, "zbc")) {
> + BDRVQcow2State *s = blk_bs(blk)->opaque;
> + s->zoned_header.zoned_profile = BLK_ZP_ZBC;
> + s->zoned_header.zoned = BLK_Z_HM;
> + s->zoned_header.zone_size = qcow2_opts->zone_size;
> + s->zoned_header.zone_capacity = qcow2_opts->zone_capacity;
> + s->zoned_header.zone_nr_conv = qcow2_opts->zone_nr_conv;
> + s->zoned_header.max_open_zones = qcow2_opts->max_open_zones;
> + s->zoned_header.max_active_zones = qcow2_opts->max_active_zones;
> + s->zoned_header.max_append_sectors = qcow2_opts->max_append_sectors;
Where is the presence of these optional qcow2_opts checked? For example,
if the user didn't specify zone_size, then they cannot create an image
with a zoned profile.
These options also need to be validated to ensure that they contain
reasonable values (e.g. not 0).
> + }
> +
> /* Create a full header (including things like feature table) */
> ret = qcow2_update_header(blk_bs(blk));
> bdrv_graph_co_rdunlock();
> @@ -3873,6 +3943,13 @@ qcow2_co_create_opts(BlockDriver *drv, const char *filename, QemuOpts *opts,
> qdict_put_str(qdict, BLOCK_OPT_COMPAT_LEVEL, "v3");
> }
>
> + /* The available zoned-profile options are zbc, which stands for
> + * ZBC/ZAC standards, and zns following NVMe ZNS spec. */
> + val = qdict_get_try_str(qdict, BLOCK_OPT_Z_PROFILE);
> + if (val) {
> + qdict_put_str(qdict, BLOCK_OPT_Z_PROFILE, val);
> + }
What is the purpose of this code, it fetches and replaces the same qdict
element?
> +
> /* Change legacy command line options into QMP ones */
> static const QDictRenames opt_renames[] = {
> { BLOCK_OPT_BACKING_FILE, "backing-file" },
> @@ -3885,6 +3962,13 @@ qcow2_co_create_opts(BlockDriver *drv, const char *filename, QemuOpts *opts,
> { BLOCK_OPT_COMPAT_LEVEL, "version" },
> { BLOCK_OPT_DATA_FILE_RAW, "data-file-raw" },
> { BLOCK_OPT_COMPRESSION_TYPE, "compression-type" },
> + { BLOCK_OPT_Z_PROFILE, "zoned-profile"},
> + { BLOCK_OPT_Z_NR_COV, "zone-nr-conv"},
> + { BLOCK_OPT_Z_MOZ, "max-open-zones"},
> + { BLOCK_OPT_Z_MAZ, "max-active-zones"},
> + { BLOCK_OPT_Z_MAS, "max-append-sectors"},
> + { BLOCK_OPT_Z_SIZE, "zone-size"},
> + { BLOCK_OPT_Z_CAP, "zone-capacity"},
> { NULL, NULL },
> };
>
> @@ -6048,6 +6132,41 @@ static QemuOptsList qcow2_create_opts = {
> .help = "Compression method used for image cluster " \
> "compression", \
> .def_value_str = "zlib" \
> + }, \
> + {
The forward slash ('\') that wraps the line is missing and indentation
is off.
> + .name = BLOCK_OPT_Z_PROFILE, \
> + .type = QEMU_OPT_STRING, \
> + .help = "zoned format option for the disk img", \
> + }, \
> + { \
Indentation is off.
> + .name = BLOCK_OPT_Z_SIZE, \
> + .type = QEMU_OPT_SIZE, \
> + .help = "zone size", \
> + }, \
> + { \
> + .name = BLOCK_OPT_Z_CAP, \
> + .type = QEMU_OPT_SIZE, \
> + .help = "zone capacity", \
> + }, \
> + { \
> + .name = BLOCK_OPT_Z_NR_COV, \
Indentation is off.
> + .type = QEMU_OPT_NUMBER, \
> + .help = "numbers of conventional zones", \
> + }, \
> + { \
> + .name = BLOCK_OPT_Z_MAS, \
> + .type = QEMU_OPT_NUMBER, \
> + .help = "max append sectors", \
> + }, \
> + { \
> + .name = BLOCK_OPT_Z_MAZ, \
> + .type = QEMU_OPT_NUMBER, \
> + .help = "max active zones", \
> + }, \
> + { \
> + .name = BLOCK_OPT_Z_MOZ, \
> + .type = QEMU_OPT_NUMBER, \
> + .help = "max open zones", \
> },
> QCOW_COMMON_OPTIONS,
> { /* end of list */ }
> diff --git a/block/qcow2.h b/block/qcow2.h
> index 4f67eb912a..fe18dc4d97 100644
> --- a/block/qcow2.h
> +++ b/block/qcow2.h
> @@ -235,6 +235,20 @@ typedef struct Qcow2CryptoHeaderExtension {
> uint64_t length;
> } QEMU_PACKED Qcow2CryptoHeaderExtension;
>
> +typedef struct Qcow2ZonedHeaderExtension {
> + /* Zoned device attributes */
> + BlockZonedProfile zoned_profile;
> + BlockZoneModel zoned;
> + uint32_t zone_size;
> + uint32_t zone_capacity;
> + uint32_t nr_zones;
> + uint32_t zone_nr_conv;
> + uint32_t max_active_zones;
> + uint32_t max_open_zones;
> + uint32_t max_append_sectors;
> + uint8_t padding[3];
This looks strange. Why is there 3 bytes of padding at the end? Normally
padding would align to an even power-of-two number of bytes like 2, 4,
8, etc.
> +} QEMU_PACKED Qcow2ZonedHeaderExtension;
> +
> typedef struct Qcow2UnknownHeaderExtension {
> uint32_t magic;
> uint32_t len;
> @@ -419,6 +433,13 @@ typedef struct BDRVQcow2State {
> * is to convert the image with the desired compression type set.
> */
> Qcow2CompressionType compression_type;
> +
> + /* States of zoned device */
> + Qcow2ZonedHeaderExtension zoned_header;
> + uint32_t nr_zones_exp_open;
> + uint32_t nr_zones_imp_open;
> + uint32_t nr_zones_closed;
> + BlockZoneWps *wps;
Normally qcow2 code passes bs around, so it should be possible to access
the wps pointer without duplicating it here. This new field is not used
in this patch, so I can't tell yet how important it is. It's safer to
avoid duplicating pointers when the original pointer can be accessed
conveniently so that use-after-free, double-free, and similar memory
management bugs can be eliminated.
> } BDRVQcow2State;
>
> typedef struct Qcow2COWRegion {
> diff --git a/include/block/block-common.h b/include/block/block-common.h
> index e15395f2cb..9f04a772f6 100644
> --- a/include/block/block-common.h
> +++ b/include/block/block-common.h
> @@ -108,6 +108,11 @@ typedef enum BlockZoneType {
> BLK_ZT_SWP = 0x3, /* Sequential writes preferred */
> } BlockZoneType;
>
> +typedef enum BlockZonedProfile {
> + BLK_ZP_ZBC = 0x1,
> + BLK_ZP_ZNS = 0x2,
> +} BlockZonedProfile;
> +
> /*
> * Zone descriptor data structure.
> * Provides information on a zone with all position and size values in bytes.
> diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
> index 74195c3004..4be35feaf8 100644
> --- a/include/block/block_int-common.h
> +++ b/include/block/block_int-common.h
> @@ -57,6 +57,14 @@
> #define BLOCK_OPT_DATA_FILE_RAW "data_file_raw"
> #define BLOCK_OPT_COMPRESSION_TYPE "compression_type"
> #define BLOCK_OPT_EXTL2 "extended_l2"
> +#define BLOCK_OPT_Z_PROFILE "zoned_profile"
> +#define BLOCK_OPT_Z_MODEL "zoned"
> +#define BLOCK_OPT_Z_SIZE "zone_size"
> +#define BLOCK_OPT_Z_CAP "zone_capacity"
> +#define BLOCK_OPT_Z_NR_COV "zone_nr_conv"
> +#define BLOCK_OPT_Z_MAS "max_append_sectors"
> +#define BLOCK_OPT_Z_MAZ "max_active_zones"
> +#define BLOCK_OPT_Z_MOZ "max_open_zones"
>
> #define BLOCK_PROBE_BUF_SIZE 512
>
> diff --git a/qapi/block-core.json b/qapi/block-core.json
> index 4bf89171c6..f9a584cbb3 100644
> --- a/qapi/block-core.json
> +++ b/qapi/block-core.json
> @@ -5013,24 +5013,42 @@
> #
> # @compression-type: The image cluster compression method
> # (default: zlib, since 5.1)
> +# @zoned-profile: Two zoned device protocol options, zbc or zns
> +# (default: off, since 8.0)
> +# @zone-size: The size of a zone of the zoned device (since 8.0)
> +# @zone-capacity: The capacity of a zone of the zoned device (since 8.0)
> +# @zone-nr-conv: The number of conventional zones of the zoned device
> +# (since 8.0)
> +# @max-open-zones: The maximal allowed open zones (since 8.0)
> +# @max-active-zones: The limit of the zones that have the implicit open,
> +# explicit open or closed state (since 8.0)
> +# @max-append-sectors: The maximal sectors that is allowed to append write
> +# (since 8.0)
Since 8.1.
> #
> # Since: 2.12
> ##
> { 'struct': 'BlockdevCreateOptionsQcow2',
> - 'data': { 'file': 'BlockdevRef',
> - '*data-file': 'BlockdevRef',
> - '*data-file-raw': 'bool',
> - '*extended-l2': 'bool',
> - 'size': 'size',
> - '*version': 'BlockdevQcow2Version',
> - '*backing-file': 'str',
> - '*backing-fmt': 'BlockdevDriver',
> - '*encrypt': 'QCryptoBlockCreateOptions',
> - '*cluster-size': 'size',
> - '*preallocation': 'PreallocMode',
> - '*lazy-refcounts': 'bool',
> - '*refcount-bits': 'int',
> - '*compression-type':'Qcow2CompressionType' } }
> + 'data': { 'file': 'BlockdevRef',
> + '*data-file': 'BlockdevRef',
> + '*data-file-raw': 'bool',
> + '*extended-l2': 'bool',
> + 'size': 'size',
> + '*version': 'BlockdevQcow2Version',
> + '*backing-file': 'str',
> + '*backing-fmt': 'BlockdevDriver',
> + '*encrypt': 'QCryptoBlockCreateOptions',
> + '*cluster-size': 'size',
> + '*preallocation': 'PreallocMode',
> + '*lazy-refcounts': 'bool',
> + '*refcount-bits': 'int',
> + '*compression-type': 'Qcow2CompressionType',
> + '*zoned-profile': 'str',
> + '*zone-size': 'size',
> + '*zone-capacity': 'size',
> + '*zone-nr-conv': 'uint32',
> + '*max-open-zones': 'uint32',
> + '*max-active-zones': 'uint32',
> + '*max-append-sectors': 'uint32'}}
>
> ##
> # @BlockdevCreateOptionsQed:
> --
> 2.40.1
>
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [RFC 2/4] qcow2: add configurations for zoned format extension
2023-06-19 10:10 ` Stefan Hajnoczi
@ 2023-06-19 10:32 ` Sam Li
2023-06-19 14:42 ` Stefan Hajnoczi
0 siblings, 1 reply; 16+ messages in thread
From: Sam Li @ 2023-06-19 10:32 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: qemu-devel, dlemoal, dmitry.fomichev, hare, Markus Armbruster,
Kevin Wolf, qemu-block, Eric Blake, Hanna Reitz
Stefan Hajnoczi <stefanha@redhat.com> 于2023年6月19日周一 18:10写道:
>
> On Mon, Jun 05, 2023 at 06:41:06PM +0800, Sam Li wrote:
> > To configure the zoned format feature on the qcow2 driver, it
> > requires following arguments: the device size, zoned profile,
> > zoned model, zone size, zone capacity, number of conventional
> > zones, limits on zone resources (max append sectors, max open
> > zones, and max_active_zones).
> >
> > To create a qcow2 file with zoned format, use command like this:
> > $ qemu-img create -f qcow2 test.qcow2 -o size=768M -o
> > zone_size=64M -o zone_capacity=64M -o zone_nr_conv=0 -o
> > max_append_sectors=512 -o max_open_zones=0 -o max_active_zones=0
> > -o zoned_profile=zbc
> >
> > Signed-off-by: Sam Li <faithilikerun@gmail.com>
> > ---
> > block/qcow2.c | 119 +++++++++++++++++++++++++++++++
> > block/qcow2.h | 21 ++++++
> > include/block/block-common.h | 5 ++
> > include/block/block_int-common.h | 8 +++
> > qapi/block-core.json | 46 ++++++++----
> > 5 files changed, 185 insertions(+), 14 deletions(-)
> >
> > diff --git a/block/qcow2.c b/block/qcow2.c
> > index 7f3948360d..b886dab42b 100644
> > --- a/block/qcow2.c
> > +++ b/block/qcow2.c
> > @@ -73,6 +73,7 @@ typedef struct {
> > #define QCOW2_EXT_MAGIC_CRYPTO_HEADER 0x0537be77
> > #define QCOW2_EXT_MAGIC_BITMAPS 0x23852875
> > #define QCOW2_EXT_MAGIC_DATA_FILE 0x44415441
> > +#define QCOW2_EXT_MAGIC_ZONED_FORMAT 0x7a6264
> >
> > static int coroutine_fn
> > qcow2_co_preadv_compressed(BlockDriverState *bs,
> > @@ -210,6 +211,7 @@ qcow2_read_extensions(BlockDriverState *bs, uint64_t start_offset,
> > uint64_t offset;
> > int ret;
> > Qcow2BitmapHeaderExt bitmaps_ext;
> > + Qcow2ZonedHeaderExtension zoned_ext;
> >
> > if (need_update_header != NULL) {
> > *need_update_header = false;
> > @@ -431,6 +433,37 @@ qcow2_read_extensions(BlockDriverState *bs, uint64_t start_offset,
> > break;
> > }
> >
> > + case QCOW2_EXT_MAGIC_ZONED_FORMAT:
> > + {
> > + if (ext.len != sizeof(zoned_ext)) {
> > + error_setg_errno(errp, -ret, "zoned_ext: "
> > + "Invalid extension length");
> > + return -EINVAL;
> > + }
> > + ret = bdrv_pread(bs->file, offset, ext.len, &zoned_ext, 0);
> > + if (ret < 0) {
> > + error_setg_errno(errp, -ret, "zoned_ext: "
> > + "Could not read ext header");
> > + return ret;
> > + }
> > +
> > + zoned_ext.zone_size = be32_to_cpu(zoned_ext.zone_size);
> > + zoned_ext.nr_zones = be32_to_cpu(zoned_ext.nr_zones);
> > + zoned_ext.zone_nr_conv = be32_to_cpu(zoned_ext.zone_nr_conv);
> > + zoned_ext.max_open_zones = be32_to_cpu(zoned_ext.max_open_zones);
> > + zoned_ext.max_active_zones =
> > + be32_to_cpu(zoned_ext.max_active_zones);
> > + zoned_ext.max_append_sectors =
> > + be32_to_cpu(zoned_ext.max_append_sectors);
> > + s->zoned_header = zoned_ext;
>
> Please validate these values. The image file is not trusted and may be
> broken/corrupt. For example, zone_size=0 and nr_zones=0 must be rejected
> because the code can't do anything useful when these values are zero
> (similar for values that are not multiples of the block size).
>
> > +
> > +#ifdef DEBUG_EXT
> > + printf("Qcow2: Got zoned format extension: "
> > + "offset=%" PRIu32 "\n", offset);
> > +#endif
> > + break;
> > + }
> > +
> > default:
> > /* unknown magic - save it in case we need to rewrite the header */
> > /* If you add a new feature, make sure to also update the fast
> > @@ -3071,6 +3104,31 @@ int qcow2_update_header(BlockDriverState *bs)
> > buflen -= ret;
> > }
> >
> > + /* Zoned devices header extension */
> > + if (s->zoned_header.zoned == BLK_Z_HM) {
> > + Qcow2ZonedHeaderExtension zoned_header = {
> > + .zoned_profile = s->zoned_header.zoned_profile,
> > + .zoned = s->zoned_header.zoned,
> > + .nr_zones = cpu_to_be32(s->zoned_header.nr_zones),
> > + .zone_size = cpu_to_be32(s->zoned_header.zone_size),
> > + .zone_capacity = cpu_to_be32(s->zoned_header.zone_capacity),
> > + .zone_nr_conv = cpu_to_be32(s->zoned_header.zone_nr_conv),
> > + .max_open_zones = cpu_to_be32(s->zoned_header.max_open_zones),
> > + .max_active_zones =
> > + cpu_to_be32(s->zoned_header.max_active_zones),
> > + .max_append_sectors =
> > + cpu_to_be32(s->zoned_header.max_append_sectors)
> > + };
> > + ret = header_ext_add(buf, QCOW2_EXT_MAGIC_ZONED_FORMAT,
> > + &zoned_header, sizeof(zoned_header),
> > + buflen);
> > + if (ret < 0) {
> > + goto fail;
> > + }
> > + buf += ret;
> > + buflen -= ret;
> > + }
> > +
> > /* Keep unknown header extensions */
> > QLIST_FOREACH(uext, &s->unknown_header_ext, next) {
> > ret = header_ext_add(buf, uext->magic, uext->data, uext->len, buflen);
> > @@ -3755,6 +3813,18 @@ qcow2_co_create(BlockdevCreateOptions *create_options, Error **errp)
> > s->image_data_file = g_strdup(data_bs->filename);
> > }
> >
> > + if (!strcmp(qcow2_opts->zoned_profile, "zbc")) {
> > + BDRVQcow2State *s = blk_bs(blk)->opaque;
> > + s->zoned_header.zoned_profile = BLK_ZP_ZBC;
> > + s->zoned_header.zoned = BLK_Z_HM;
> > + s->zoned_header.zone_size = qcow2_opts->zone_size;
> > + s->zoned_header.zone_capacity = qcow2_opts->zone_capacity;
> > + s->zoned_header.zone_nr_conv = qcow2_opts->zone_nr_conv;
> > + s->zoned_header.max_open_zones = qcow2_opts->max_open_zones;
> > + s->zoned_header.max_active_zones = qcow2_opts->max_active_zones;
> > + s->zoned_header.max_append_sectors = qcow2_opts->max_append_sectors;
>
> Where is the presence of these optional qcow2_opts checked? For example,
> if the user didn't specify zone_size, then they cannot create an image
> with a zoned profile.
>
> These options also need to be validated to ensure that they contain
> reasonable values (e.g. not 0).
>
> > + }
> > +
> > /* Create a full header (including things like feature table) */
> > ret = qcow2_update_header(blk_bs(blk));
> > bdrv_graph_co_rdunlock();
> > @@ -3873,6 +3943,13 @@ qcow2_co_create_opts(BlockDriver *drv, const char *filename, QemuOpts *opts,
> > qdict_put_str(qdict, BLOCK_OPT_COMPAT_LEVEL, "v3");
> > }
> >
> > + /* The available zoned-profile options are zbc, which stands for
> > + * ZBC/ZAC standards, and zns following NVMe ZNS spec. */
> > + val = qdict_get_try_str(qdict, BLOCK_OPT_Z_PROFILE);
> > + if (val) {
> > + qdict_put_str(qdict, BLOCK_OPT_Z_PROFILE, val);
> > + }
>
> What is the purpose of this code, it fetches and replaces the same qdict
> element?
It creates a string configuration for zoned_profile and matches the
user input to that config.
>
> > +
> > /* Change legacy command line options into QMP ones */
> > static const QDictRenames opt_renames[] = {
> > { BLOCK_OPT_BACKING_FILE, "backing-file" },
> > @@ -3885,6 +3962,13 @@ qcow2_co_create_opts(BlockDriver *drv, const char *filename, QemuOpts *opts,
> > { BLOCK_OPT_COMPAT_LEVEL, "version" },
> > { BLOCK_OPT_DATA_FILE_RAW, "data-file-raw" },
> > { BLOCK_OPT_COMPRESSION_TYPE, "compression-type" },
> > + { BLOCK_OPT_Z_PROFILE, "zoned-profile"},
> > + { BLOCK_OPT_Z_NR_COV, "zone-nr-conv"},
> > + { BLOCK_OPT_Z_MOZ, "max-open-zones"},
> > + { BLOCK_OPT_Z_MAZ, "max-active-zones"},
> > + { BLOCK_OPT_Z_MAS, "max-append-sectors"},
> > + { BLOCK_OPT_Z_SIZE, "zone-size"},
> > + { BLOCK_OPT_Z_CAP, "zone-capacity"},
> > { NULL, NULL },
> > };
> >
> > @@ -6048,6 +6132,41 @@ static QemuOptsList qcow2_create_opts = {
> > .help = "Compression method used for image cluster " \
> > "compression", \
> > .def_value_str = "zlib" \
> > + }, \
> > + {
>
> The forward slash ('\') that wraps the line is missing and indentation
> is off.
>
> > + .name = BLOCK_OPT_Z_PROFILE, \
> > + .type = QEMU_OPT_STRING, \
> > + .help = "zoned format option for the disk img", \
> > + }, \
> > + { \
>
> Indentation is off.
>
> > + .name = BLOCK_OPT_Z_SIZE, \
> > + .type = QEMU_OPT_SIZE, \
> > + .help = "zone size", \
> > + }, \
> > + { \
> > + .name = BLOCK_OPT_Z_CAP, \
> > + .type = QEMU_OPT_SIZE, \
> > + .help = "zone capacity", \
> > + }, \
> > + { \
> > + .name = BLOCK_OPT_Z_NR_COV, \
>
> Indentation is off.
>
> > + .type = QEMU_OPT_NUMBER, \
> > + .help = "numbers of conventional zones", \
> > + }, \
> > + { \
> > + .name = BLOCK_OPT_Z_MAS, \
> > + .type = QEMU_OPT_NUMBER, \
> > + .help = "max append sectors", \
> > + }, \
> > + { \
> > + .name = BLOCK_OPT_Z_MAZ, \
> > + .type = QEMU_OPT_NUMBER, \
> > + .help = "max active zones", \
> > + }, \
> > + { \
> > + .name = BLOCK_OPT_Z_MOZ, \
> > + .type = QEMU_OPT_NUMBER, \
> > + .help = "max open zones", \
> > },
> > QCOW_COMMON_OPTIONS,
> > { /* end of list */ }
> > diff --git a/block/qcow2.h b/block/qcow2.h
> > index 4f67eb912a..fe18dc4d97 100644
> > --- a/block/qcow2.h
> > +++ b/block/qcow2.h
> > @@ -235,6 +235,20 @@ typedef struct Qcow2CryptoHeaderExtension {
> > uint64_t length;
> > } QEMU_PACKED Qcow2CryptoHeaderExtension;
> >
> > +typedef struct Qcow2ZonedHeaderExtension {
> > + /* Zoned device attributes */
> > + BlockZonedProfile zoned_profile;
> > + BlockZoneModel zoned;
> > + uint32_t zone_size;
> > + uint32_t zone_capacity;
> > + uint32_t nr_zones;
> > + uint32_t zone_nr_conv;
> > + uint32_t max_active_zones;
> > + uint32_t max_open_zones;
> > + uint32_t max_append_sectors;
> > + uint8_t padding[3];
>
> This looks strange. Why is there 3 bytes of padding at the end? Normally
> padding would align to an even power-of-two number of bytes like 2, 4,
> 8, etc.
It is calculated as 3 if sizeof(zoned+zoned_profile) = 8. Else if it's
16, the padding is 2.
>
> > +} QEMU_PACKED Qcow2ZonedHeaderExtension;
> > +
> > typedef struct Qcow2UnknownHeaderExtension {
> > uint32_t magic;
> > uint32_t len;
> > @@ -419,6 +433,13 @@ typedef struct BDRVQcow2State {
> > * is to convert the image with the desired compression type set.
> > */
> > Qcow2CompressionType compression_type;
> > +
> > + /* States of zoned device */
> > + Qcow2ZonedHeaderExtension zoned_header;
> > + uint32_t nr_zones_exp_open;
> > + uint32_t nr_zones_imp_open;
> > + uint32_t nr_zones_closed;
> > + BlockZoneWps *wps;
>
> Normally qcow2 code passes bs around, so it should be possible to access
> the wps pointer without duplicating it here. This new field is not used
> in this patch, so I can't tell yet how important it is. It's safer to
> avoid duplicating pointers when the original pointer can be accessed
> conveniently so that use-after-free, double-free, and similar memory
> management bugs can be eliminated.
I see. Thanks!
>
> > } BDRVQcow2State;
> >
> > typedef struct Qcow2COWRegion {
> > diff --git a/include/block/block-common.h b/include/block/block-common.h
> > index e15395f2cb..9f04a772f6 100644
> > --- a/include/block/block-common.h
> > +++ b/include/block/block-common.h
> > @@ -108,6 +108,11 @@ typedef enum BlockZoneType {
> > BLK_ZT_SWP = 0x3, /* Sequential writes preferred */
> > } BlockZoneType;
> >
> > +typedef enum BlockZonedProfile {
> > + BLK_ZP_ZBC = 0x1,
> > + BLK_ZP_ZNS = 0x2,
> > +} BlockZonedProfile;
> > +
> > /*
> > * Zone descriptor data structure.
> > * Provides information on a zone with all position and size values in bytes.
> > diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
> > index 74195c3004..4be35feaf8 100644
> > --- a/include/block/block_int-common.h
> > +++ b/include/block/block_int-common.h
> > @@ -57,6 +57,14 @@
> > #define BLOCK_OPT_DATA_FILE_RAW "data_file_raw"
> > #define BLOCK_OPT_COMPRESSION_TYPE "compression_type"
> > #define BLOCK_OPT_EXTL2 "extended_l2"
> > +#define BLOCK_OPT_Z_PROFILE "zoned_profile"
> > +#define BLOCK_OPT_Z_MODEL "zoned"
> > +#define BLOCK_OPT_Z_SIZE "zone_size"
> > +#define BLOCK_OPT_Z_CAP "zone_capacity"
> > +#define BLOCK_OPT_Z_NR_COV "zone_nr_conv"
> > +#define BLOCK_OPT_Z_MAS "max_append_sectors"
> > +#define BLOCK_OPT_Z_MAZ "max_active_zones"
> > +#define BLOCK_OPT_Z_MOZ "max_open_zones"
> >
> > #define BLOCK_PROBE_BUF_SIZE 512
> >
> > diff --git a/qapi/block-core.json b/qapi/block-core.json
> > index 4bf89171c6..f9a584cbb3 100644
> > --- a/qapi/block-core.json
> > +++ b/qapi/block-core.json
> > @@ -5013,24 +5013,42 @@
> > #
> > # @compression-type: The image cluster compression method
> > # (default: zlib, since 5.1)
> > +# @zoned-profile: Two zoned device protocol options, zbc or zns
> > +# (default: off, since 8.0)
> > +# @zone-size: The size of a zone of the zoned device (since 8.0)
> > +# @zone-capacity: The capacity of a zone of the zoned device (since 8.0)
> > +# @zone-nr-conv: The number of conventional zones of the zoned device
> > +# (since 8.0)
> > +# @max-open-zones: The maximal allowed open zones (since 8.0)
> > +# @max-active-zones: The limit of the zones that have the implicit open,
> > +# explicit open or closed state (since 8.0)
> > +# @max-append-sectors: The maximal sectors that is allowed to append write
> > +# (since 8.0)
>
> Since 8.1.
>
> > #
> > # Since: 2.12
> > ##
> > { 'struct': 'BlockdevCreateOptionsQcow2',
> > - 'data': { 'file': 'BlockdevRef',
> > - '*data-file': 'BlockdevRef',
> > - '*data-file-raw': 'bool',
> > - '*extended-l2': 'bool',
> > - 'size': 'size',
> > - '*version': 'BlockdevQcow2Version',
> > - '*backing-file': 'str',
> > - '*backing-fmt': 'BlockdevDriver',
> > - '*encrypt': 'QCryptoBlockCreateOptions',
> > - '*cluster-size': 'size',
> > - '*preallocation': 'PreallocMode',
> > - '*lazy-refcounts': 'bool',
> > - '*refcount-bits': 'int',
> > - '*compression-type':'Qcow2CompressionType' } }
> > + 'data': { 'file': 'BlockdevRef',
> > + '*data-file': 'BlockdevRef',
> > + '*data-file-raw': 'bool',
> > + '*extended-l2': 'bool',
> > + 'size': 'size',
> > + '*version': 'BlockdevQcow2Version',
> > + '*backing-file': 'str',
> > + '*backing-fmt': 'BlockdevDriver',
> > + '*encrypt': 'QCryptoBlockCreateOptions',
> > + '*cluster-size': 'size',
> > + '*preallocation': 'PreallocMode',
> > + '*lazy-refcounts': 'bool',
> > + '*refcount-bits': 'int',
> > + '*compression-type': 'Qcow2CompressionType',
> > + '*zoned-profile': 'str',
> > + '*zone-size': 'size',
> > + '*zone-capacity': 'size',
> > + '*zone-nr-conv': 'uint32',
> > + '*max-open-zones': 'uint32',
> > + '*max-active-zones': 'uint32',
> > + '*max-append-sectors': 'uint32'}}
> >
> > ##
> > # @BlockdevCreateOptionsQed:
> > --
> > 2.40.1
> >
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [RFC 2/4] qcow2: add configurations for zoned format extension
2023-06-19 10:32 ` Sam Li
@ 2023-06-19 14:42 ` Stefan Hajnoczi
2023-06-19 14:50 ` Sam Li
0 siblings, 1 reply; 16+ messages in thread
From: Stefan Hajnoczi @ 2023-06-19 14:42 UTC (permalink / raw)
To: Sam Li
Cc: qemu-devel, dlemoal, dmitry.fomichev, hare, Markus Armbruster,
Kevin Wolf, qemu-block, Eric Blake, Hanna Reitz
[-- Attachment #1: Type: text/plain, Size: 1323 bytes --]
On Mon, Jun 19, 2023 at 06:32:52PM +0800, Sam Li wrote:
> Stefan Hajnoczi <stefanha@redhat.com> 于2023年6月19日周一 18:10写道:
> > On Mon, Jun 05, 2023 at 06:41:06PM +0800, Sam Li wrote:
> > > diff --git a/block/qcow2.h b/block/qcow2.h
> > > index 4f67eb912a..fe18dc4d97 100644
> > > --- a/block/qcow2.h
> > > +++ b/block/qcow2.h
> > > @@ -235,6 +235,20 @@ typedef struct Qcow2CryptoHeaderExtension {
> > > uint64_t length;
> > > } QEMU_PACKED Qcow2CryptoHeaderExtension;
> > >
> > > +typedef struct Qcow2ZonedHeaderExtension {
> > > + /* Zoned device attributes */
> > > + BlockZonedProfile zoned_profile;
> > > + BlockZoneModel zoned;
> > > + uint32_t zone_size;
> > > + uint32_t zone_capacity;
> > > + uint32_t nr_zones;
> > > + uint32_t zone_nr_conv;
> > > + uint32_t max_active_zones;
> > > + uint32_t max_open_zones;
> > > + uint32_t max_append_sectors;
> > > + uint8_t padding[3];
> >
> > This looks strange. Why is there 3 bytes of padding at the end? Normally
> > padding would align to an even power-of-two number of bytes like 2, 4,
> > 8, etc.
>
> It is calculated as 3 if sizeof(zoned+zoned_profile) = 8. Else if it's
> 16, the padding is 2.
I don't understand. Can you explain why there is padding at the end of
this struct?
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [RFC 2/4] qcow2: add configurations for zoned format extension
2023-06-19 14:42 ` Stefan Hajnoczi
@ 2023-06-19 14:50 ` Sam Li
2023-06-20 14:44 ` Stefan Hajnoczi
0 siblings, 1 reply; 16+ messages in thread
From: Sam Li @ 2023-06-19 14:50 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: qemu-devel, dlemoal, dmitry.fomichev, hare, Markus Armbruster,
Kevin Wolf, qemu-block, Eric Blake, Hanna Reitz
Stefan Hajnoczi <stefanha@redhat.com> 于2023年6月19日周一 22:42写道:
>
> On Mon, Jun 19, 2023 at 06:32:52PM +0800, Sam Li wrote:
> > Stefan Hajnoczi <stefanha@redhat.com> 于2023年6月19日周一 18:10写道:
> > > On Mon, Jun 05, 2023 at 06:41:06PM +0800, Sam Li wrote:
> > > > diff --git a/block/qcow2.h b/block/qcow2.h
> > > > index 4f67eb912a..fe18dc4d97 100644
> > > > --- a/block/qcow2.h
> > > > +++ b/block/qcow2.h
> > > > @@ -235,6 +235,20 @@ typedef struct Qcow2CryptoHeaderExtension {
> > > > uint64_t length;
> > > > } QEMU_PACKED Qcow2CryptoHeaderExtension;
> > > >
> > > > +typedef struct Qcow2ZonedHeaderExtension {
> > > > + /* Zoned device attributes */
> > > > + BlockZonedProfile zoned_profile;
> > > > + BlockZoneModel zoned;
> > > > + uint32_t zone_size;
> > > > + uint32_t zone_capacity;
> > > > + uint32_t nr_zones;
> > > > + uint32_t zone_nr_conv;
> > > > + uint32_t max_active_zones;
> > > > + uint32_t max_open_zones;
> > > > + uint32_t max_append_sectors;
> > > > + uint8_t padding[3];
> > >
> > > This looks strange. Why is there 3 bytes of padding at the end? Normally
> > > padding would align to an even power-of-two number of bytes like 2, 4,
> > > 8, etc.
> >
> > It is calculated as 3 if sizeof(zoned+zoned_profile) = 8. Else if it's
> > 16, the padding is 2.
>
> I don't understand. Can you explain why there is padding at the end of
> this struct?
The overall size should be aligned with 64 bit, which leaves use one
uint32_t and two fields zoned, zoned_profile. I am not sure the size
of macros here and it used 4 for each. So it makes 3 (*8) + 32 + 8 =
64 in the end. If the macro size is wrong, then the padding will
change as well.
Sam
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [RFC 2/4] qcow2: add configurations for zoned format extension
2023-06-19 14:50 ` Sam Li
@ 2023-06-20 14:44 ` Stefan Hajnoczi
2023-06-20 15:07 ` Sam Li
0 siblings, 1 reply; 16+ messages in thread
From: Stefan Hajnoczi @ 2023-06-20 14:44 UTC (permalink / raw)
To: Sam Li
Cc: qemu-devel, dlemoal, dmitry.fomichev, hare, Markus Armbruster,
Kevin Wolf, qemu-block, Eric Blake, Hanna Reitz
[-- Attachment #1: Type: text/plain, Size: 2462 bytes --]
On Mon, Jun 19, 2023 at 10:50:31PM +0800, Sam Li wrote:
> Stefan Hajnoczi <stefanha@redhat.com> 于2023年6月19日周一 22:42写道:
> >
> > On Mon, Jun 19, 2023 at 06:32:52PM +0800, Sam Li wrote:
> > > Stefan Hajnoczi <stefanha@redhat.com> 于2023年6月19日周一 18:10写道:
> > > > On Mon, Jun 05, 2023 at 06:41:06PM +0800, Sam Li wrote:
> > > > > diff --git a/block/qcow2.h b/block/qcow2.h
> > > > > index 4f67eb912a..fe18dc4d97 100644
> > > > > --- a/block/qcow2.h
> > > > > +++ b/block/qcow2.h
> > > > > @@ -235,6 +235,20 @@ typedef struct Qcow2CryptoHeaderExtension {
> > > > > uint64_t length;
> > > > > } QEMU_PACKED Qcow2CryptoHeaderExtension;
> > > > >
> > > > > +typedef struct Qcow2ZonedHeaderExtension {
> > > > > + /* Zoned device attributes */
> > > > > + BlockZonedProfile zoned_profile;
> > > > > + BlockZoneModel zoned;
> > > > > + uint32_t zone_size;
> > > > > + uint32_t zone_capacity;
> > > > > + uint32_t nr_zones;
> > > > > + uint32_t zone_nr_conv;
> > > > > + uint32_t max_active_zones;
> > > > > + uint32_t max_open_zones;
> > > > > + uint32_t max_append_sectors;
> > > > > + uint8_t padding[3];
> > > >
> > > > This looks strange. Why is there 3 bytes of padding at the end? Normally
> > > > padding would align to an even power-of-two number of bytes like 2, 4,
> > > > 8, etc.
> > >
> > > It is calculated as 3 if sizeof(zoned+zoned_profile) = 8. Else if it's
> > > 16, the padding is 2.
> >
> > I don't understand. Can you explain why there is padding at the end of
> > this struct?
>
> The overall size should be aligned with 64 bit, which leaves use one
> uint32_t and two fields zoned, zoned_profile. I am not sure the size
> of macros here and it used 4 for each. So it makes 3 (*8) + 32 + 8 =
> 64 in the end. If the macro size is wrong, then the padding will
> change as well.
The choice of the type (char or int) representing an enum is
implementation-defined according to the C17 standard (see "6.7.2.2
Enumeration specifiers").
Therefore it's not portable to use enums in structs exposed to the
outside world (on-disk formats or network protocols).
Please use uint8_t for the zoned_profile and zoned fields and move them
to the end of the struct so the uint32_t fields are naturally aligned.
I think only 2 bytes of padding will be required to align the struct to
a 64-bit boundary once you've done that.
Stefan
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [RFC 2/4] qcow2: add configurations for zoned format extension
2023-06-20 14:44 ` Stefan Hajnoczi
@ 2023-06-20 15:07 ` Sam Li
0 siblings, 0 replies; 16+ messages in thread
From: Sam Li @ 2023-06-20 15:07 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: qemu-devel, dlemoal, dmitry.fomichev, hare, Markus Armbruster,
Kevin Wolf, qemu-block, Eric Blake, Hanna Reitz
Stefan Hajnoczi <stefanha@redhat.com> 于2023年6月20日周二 22:48写道:
>
> On Mon, Jun 19, 2023 at 10:50:31PM +0800, Sam Li wrote:
> > Stefan Hajnoczi <stefanha@redhat.com> 于2023年6月19日周一 22:42写道:
> > >
> > > On Mon, Jun 19, 2023 at 06:32:52PM +0800, Sam Li wrote:
> > > > Stefan Hajnoczi <stefanha@redhat.com> 于2023年6月19日周一 18:10写道:
> > > > > On Mon, Jun 05, 2023 at 06:41:06PM +0800, Sam Li wrote:
> > > > > > diff --git a/block/qcow2.h b/block/qcow2.h
> > > > > > index 4f67eb912a..fe18dc4d97 100644
> > > > > > --- a/block/qcow2.h
> > > > > > +++ b/block/qcow2.h
> > > > > > @@ -235,6 +235,20 @@ typedef struct Qcow2CryptoHeaderExtension {
> > > > > > uint64_t length;
> > > > > > } QEMU_PACKED Qcow2CryptoHeaderExtension;
> > > > > >
> > > > > > +typedef struct Qcow2ZonedHeaderExtension {
> > > > > > + /* Zoned device attributes */
> > > > > > + BlockZonedProfile zoned_profile;
> > > > > > + BlockZoneModel zoned;
> > > > > > + uint32_t zone_size;
> > > > > > + uint32_t zone_capacity;
> > > > > > + uint32_t nr_zones;
> > > > > > + uint32_t zone_nr_conv;
> > > > > > + uint32_t max_active_zones;
> > > > > > + uint32_t max_open_zones;
> > > > > > + uint32_t max_append_sectors;
> > > > > > + uint8_t padding[3];
> > > > >
> > > > > This looks strange. Why is there 3 bytes of padding at the end? Normally
> > > > > padding would align to an even power-of-two number of bytes like 2, 4,
> > > > > 8, etc.
> > > >
> > > > It is calculated as 3 if sizeof(zoned+zoned_profile) = 8. Else if it's
> > > > 16, the padding is 2.
> > >
> > > I don't understand. Can you explain why there is padding at the end of
> > > this struct?
> >
> > The overall size should be aligned with 64 bit, which leaves use one
> > uint32_t and two fields zoned, zoned_profile. I am not sure the size
> > of macros here and it used 4 for each. So it makes 3 (*8) + 32 + 8 =
> > 64 in the end. If the macro size is wrong, then the padding will
> > change as well.
>
> The choice of the type (char or int) representing an enum is
> implementation-defined according to the C17 standard (see "6.7.2.2
> Enumeration specifiers").
>
> Therefore it's not portable to use enums in structs exposed to the
> outside world (on-disk formats or network protocols).
>
> Please use uint8_t for the zoned_profile and zoned fields and move them
> to the end of the struct so the uint32_t fields are naturally aligned.
>
> I think only 2 bytes of padding will be required to align the struct to
> a 64-bit boundary once you've done that.
I see. Thanks!
Sam
^ permalink raw reply [flat|nested] 16+ messages in thread
* [RFC 3/4] qcow2: add zoned emulation capability
2023-06-05 10:41 [RFC 0/4] Add full zoned storage emulation to qcow2 driver Sam Li
2023-06-05 10:41 ` [RFC 1/4] docs/qcow2: add the zoned format feature Sam Li
2023-06-05 10:41 ` [RFC 2/4] qcow2: add configurations for zoned format extension Sam Li
@ 2023-06-05 10:41 ` Sam Li
2023-06-19 14:35 ` Stefan Hajnoczi
2023-06-05 10:41 ` [RFC 4/4] iotests: test the zoned format feature for qcow2 file Sam Li
3 siblings, 1 reply; 16+ messages in thread
From: Sam Li @ 2023-06-05 10:41 UTC (permalink / raw)
To: qemu-devel
Cc: dlemoal, dmitry.fomichev, hare, stefanha, Markus Armbruster,
Kevin Wolf, qemu-block, Eric Blake, Hanna Reitz, Sam Li
By adding zone operations and zoned metadata, the zoned emulation
capability enables full emulation support of zoned device using
a qcow2 file. The zoned device metadata includes zone type,
zoned device state and write pointer of each zone, which is stored
to an array of unsigned integers.
Each zone of a zoned device makes state transitions following
the zone state machine. The zone state machine mainly describes
five states, IMPLICIT OPEN, EXPLICIT OPEN, FULL, EMPTY and CLOSED.
READ ONLY and OFFLINE states will generally be affected by device
internal events. The operations on zones cause corresponding state
changing.
Zoned devices have a limit on zone resources, which puts constraints on
write operations into zones.
Signed-off-by: Sam Li <faithilikerun@gmail.com>
---
block/qcow2.c | 629 +++++++++++++++++++++++++++++++++++++++++++++++++-
block/qcow2.h | 2 +
2 files changed, 629 insertions(+), 2 deletions(-)
diff --git a/block/qcow2.c b/block/qcow2.c
index b886dab42b..f030965d5d 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -194,6 +194,164 @@ qcow2_extract_crypto_opts(QemuOpts *opts, const char *fmt, Error **errp)
return cryptoopts_qdict;
}
+#define QCOW2_ZT_IS_CONV(wp) (wp & 1ULL << 59)
+
+static inline int qcow2_get_wp(uint64_t wp)
+{
+ /* clear state and type information */
+ return ((wp << 5) >> 5);
+}
+
+static inline int qcow2_get_zs(uint64_t wp)
+{
+ return (wp >> 60);
+}
+
+static inline void qcow2_set_wp(uint64_t *wp, BlockZoneState zs)
+{
+ uint64_t addr = qcow2_get_wp(*wp);
+ addr |= ((uint64_t)zs << 60);
+ *wp = addr;
+}
+
+/*
+ * File wp tracking: reset zone, finish zone and append zone can
+ * change the value of write pointer. All zone operations will change
+ * the state of that/those zone.
+ * */
+static inline void qcow2_wp_tracking_helper(int index, uint64_t wp) {
+ /* format: operations, the wp. */
+ printf("wps[%d]: 0x%x\n", index, qcow2_get_wp(wp)>>BDRV_SECTOR_BITS);
+}
+
+/*
+ * Perform a state assignment and a flush operation that writes the new wp
+ * value to the dedicated location of the disk file.
+ */
+static int qcow2_write_wp_at(BlockDriverState *bs, uint64_t *wp,
+ uint32_t index, BlockZoneState zs) {
+ BDRVQcow2State *s = bs->opaque;
+ int ret;
+
+ qcow2_set_wp(wp, zs);
+ ret = bdrv_pwrite(bs->file, s->zoned_header.zonedmeta_offset
+ + sizeof(uint64_t) * index, sizeof(uint64_t), wp, 0);
+
+ if (ret < 0) {
+ goto exit;
+ }
+ qcow2_wp_tracking_helper(index, *wp);
+ return ret;
+
+exit:
+ error_report("Failed to write metadata with file");
+ return ret;
+}
+
+static int qcow2_check_active(BlockDriverState *bs)
+{
+ BDRVQcow2State *s = bs->opaque;
+
+ if (!s->zoned_header.max_active_zones) {
+ return 0;
+ }
+
+ if (s->nr_zones_exp_open + s->nr_zones_imp_open + s->nr_zones_closed
+ < s->zoned_header.max_active_zones) {
+ return 0;
+ }
+
+ return -1;
+}
+
+static int qcow2_check_open(BlockDriverState *bs)
+{
+ BDRVQcow2State *s = bs->opaque;
+ int ret;
+
+ if (!s->zoned_header.max_open_zones) {
+ return 0;
+ }
+
+ if (s->nr_zones_exp_open + s->nr_zones_imp_open
+ < s->zoned_header.max_open_zones) {
+ return 0;
+ }
+
+ if(s->nr_zones_imp_open) {
+ ret = qcow2_check_active(bs);
+ if (ret == 0) {
+ /* TODO: it takes O(n) time complexity (n = nr_zones).
+ * Optimizations required. */
+ /* close one implicitly open zones to make it available */
+ for (int i = s->zoned_header.zone_nr_conv;
+ i < bs->bl.nr_zones; ++i) {
+ uint64_t *wp = &s->wps->wp[i];
+ if (qcow2_get_zs(*wp) == BLK_ZS_IOPEN) {
+ ret = qcow2_write_wp_at(bs, wp, i, BLK_ZS_CLOSED);
+ if (ret < 0) {
+ return ret;
+ }
+ s->wps->wp[i] = *wp;
+ s->nr_zones_imp_open--;
+ s->nr_zones_closed++;
+ break;
+ }
+ }
+ return 0;
+ }
+ return ret;
+ }
+
+ return -1;
+}
+
+/*
+ * The zoned device has limited zone resources of open, closed, active
+ * zones.
+ */
+static int qcow2_check_zone_resources(BlockDriverState *bs,
+ BlockZoneState zs)
+{
+ int ret;
+
+ switch (zs) {
+ case BLK_ZS_EMPTY:
+ ret = qcow2_check_active(bs);
+ if (ret < 0) {
+ error_report("No enough active zones");
+ return ret;
+ }
+ return ret;
+ case BLK_ZS_CLOSED:
+ ret = qcow2_check_open(bs);
+ if (ret < 0) {
+ error_report("No enough open zones");
+ return ret;
+ }
+ return ret;
+ default:
+ return -EINVAL;
+ }
+
+}
+
+static inline int qcow2_refresh_zonedmeta(BlockDriverState *bs)
+{
+ int ret;
+ BDRVQcow2State *s = bs->opaque;
+ uint64_t *temp = g_malloc(s->zoned_header.zonedmeta_size);
+ ret = bdrv_pread(bs->file, s->zoned_header.zonedmeta_offset,
+ s->zoned_header.zonedmeta_size, temp, 0);
+ if (ret < 0) {
+ error_report("Can not read metadata\n");
+ return ret;
+ }
+
+ memcpy(s->wps->wp, temp, s->zoned_header.zonedmeta_size);
+ return 0;
+}
+
/*
* read qcow2 extension and fill bs
* start reading from start_offset
@@ -455,7 +613,19 @@ qcow2_read_extensions(BlockDriverState *bs, uint64_t start_offset,
be32_to_cpu(zoned_ext.max_active_zones);
zoned_ext.max_append_sectors =
be32_to_cpu(zoned_ext.max_append_sectors);
+ zoned_ext.zonedmeta_offset =
+ be64_to_cpu(zoned_ext.zonedmeta_offset);
+ zoned_ext.zonedmeta_size = be64_to_cpu(zoned_ext.zonedmeta_size);
s->zoned_header = zoned_ext;
+ s->wps = g_malloc(sizeof(BlockZoneWps)
+ + s->zoned_header.zonedmeta_size);
+ ret = qcow2_refresh_zonedmeta(bs);
+ if (ret < 0) {
+ error_setg_errno(errp, -ret, "zonedmeta: "
+ "Could not update zoned meta");
+ return ret;
+ }
+ qemu_co_mutex_init(&s->wps->colock);
#ifdef DEBUG_EXT
printf("Qcow2: Got zoned format extension: "
@@ -1982,6 +2152,14 @@ static void qcow2_refresh_limits(BlockDriverState *bs, Error **errp)
}
bs->bl.pwrite_zeroes_alignment = s->subcluster_size;
bs->bl.pdiscard_alignment = s->cluster_size;
+ bs->bl.zoned = s->zoned_header.zoned;
+ bs->bl.nr_zones = s->zoned_header.nr_zones;
+ bs->wps = s->wps;
+ bs->bl.max_append_sectors = s->zoned_header.max_append_sectors;
+ bs->bl.max_active_zones = s->zoned_header.max_active_zones;
+ bs->bl.max_open_zones = s->zoned_header.max_open_zones;
+ bs->bl.zone_size = s->zoned_header.zone_size;
+ bs->bl.write_granularity = BDRV_SECTOR_SIZE;
}
static int qcow2_reopen_prepare(BDRVReopenState *state,
@@ -2672,9 +2850,26 @@ qcow2_co_pwritev_part(BlockDriverState *bs, int64_t offset, int64_t bytes,
uint64_t host_offset;
QCowL2Meta *l2meta = NULL;
AioTaskPool *aio = NULL;
+ int64_t start_offset, start_bytes;
+ BlockZoneState zs;
+ int64_t end;
+ uint64_t *wp;
+ int64_t zone_size = bs->bl.zone_size;
+ int index;
trace_qcow2_writev_start_req(qemu_coroutine_self(), offset, bytes);
+ start_offset = offset;
+ start_bytes = bytes;
+ /* The offset should not less than the wp of that
+ * zone where offset starts. */
+ if (zone_size) {
+ index = start_offset / zone_size;
+ wp = &s->wps->wp[index];
+ if (offset < qcow2_get_wp(*wp)) {
+ return -EINVAL;
+ }
+ }
while (bytes != 0 && aio_task_pool_status(aio) == 0) {
l2meta = NULL;
@@ -2720,6 +2915,47 @@ qcow2_co_pwritev_part(BlockDriverState *bs, int64_t offset, int64_t bytes,
qiov_offset += cur_bytes;
trace_qcow2_writev_done_part(qemu_coroutine_self(), cur_bytes);
}
+
+ if (zone_size) {
+ index = start_offset / zone_size;
+ wp = &s->wps->wp[index];
+ uint64_t wpv = *wp;
+ if (!QCOW2_ZT_IS_CONV(wpv)) {
+ /*
+ * Implicitly open one closed zone to write if there are zone resources
+ * left.
+ */
+ zs = qcow2_get_zs(wpv);
+ if (zs == BLK_ZS_CLOSED || zs == BLK_ZS_EMPTY) {
+ ret = qcow2_check_zone_resources(bs, zs);
+ if (ret < 0) {
+ goto fail_nometa;
+ }
+
+ if (zs == BLK_ZS_CLOSED) {
+ s->nr_zones_closed--;
+ s->nr_zones_imp_open++;
+ } else {
+ s->nr_zones_imp_open++;
+ }
+ }
+
+ /* align up (start_offset, zone_size), the start offset is not
+ * necessarily power of two. */
+ end = ((start_offset + zone_size) / zone_size) * zone_size;
+ if (start_offset + start_bytes <= end) {
+ *wp = start_offset + start_bytes;
+ } else {
+ ret = -EINVAL;
+ goto fail_nometa;
+ }
+
+ ret = qcow2_write_wp_at(bs, wp, index,BLK_ZS_IOPEN);
+ if (ret < 0) {
+ goto fail_nometa;
+ }
+ }
+ }
ret = 0;
qemu_co_mutex_lock(&s->lock);
@@ -3117,7 +3353,9 @@ int qcow2_update_header(BlockDriverState *bs)
.max_active_zones =
cpu_to_be32(s->zoned_header.max_active_zones),
.max_append_sectors =
- cpu_to_be32(s->zoned_header.max_append_sectors)
+ cpu_to_be32(s->zoned_header.max_append_sectors),
+ .zonedmeta_offset = cpu_to_be64(s->zoned_header.zonedmeta_offset),
+ .zonedmeta_size = cpu_to_be64(s->zoned_header.zonedmeta_size)
};
ret = header_ext_add(buf, QCOW2_EXT_MAGIC_ZONED_FORMAT,
&zoned_header, sizeof(zoned_header),
@@ -3522,7 +3760,8 @@ qcow2_co_create(BlockdevCreateOptions *create_options, Error **errp)
int version;
int refcount_order;
uint64_t *refcount_table;
- int ret;
+ uint64_t zoned_meta_size, zoned_clusterlen;
+ int ret, offset, i;
uint8_t compression_type = QCOW2_COMPRESSION_TYPE_ZLIB;
assert(create_options->driver == BLOCKDEV_DRIVER_QCOW2);
@@ -3823,6 +4062,48 @@ qcow2_co_create(BlockdevCreateOptions *create_options, Error **errp)
s->zoned_header.max_open_zones = qcow2_opts->max_open_zones;
s->zoned_header.max_active_zones = qcow2_opts->max_active_zones;
s->zoned_header.max_append_sectors = qcow2_opts->max_append_sectors;
+ s->zoned_header.nr_zones = qcow2_opts->size / qcow2_opts->zone_size;
+
+ zoned_meta_size = sizeof(uint64_t) * s->zoned_header.nr_zones;
+ uint64_t meta[zoned_meta_size];
+ memset(meta, 0, zoned_meta_size);
+
+ for (i = 0; i < s->zoned_header.zone_nr_conv; ++i) {
+ meta[i] = i * s->zoned_header.zone_size;
+ meta[i] += 1ULL << 59;
+ }
+ for (; i < s->zoned_header.nr_zones; ++i) {
+ meta[i] = i * s->zoned_header.zone_size;
+ /* For sequential zones, the first four most significant bit
+ * indicates zone states. */
+ meta[i] += ((uint64_t)BLK_ZS_EMPTY << 60);
+ }
+
+ offset = qcow2_alloc_clusters(blk_bs(blk), zoned_meta_size);
+ if (offset < 0) {
+ error_setg_errno(errp, -offset, "Could not allocate clusters "
+ "for zoned metadata size");
+ goto out;
+ }
+ s->zoned_header.zonedmeta_offset = offset;
+ s->zoned_header.zonedmeta_size = zoned_meta_size;
+
+ zoned_clusterlen = size_to_clusters(s, zoned_meta_size)
+ * s->cluster_size;
+ assert(qcow2_pre_write_overlap_check(bs, 0, offset,
+ zoned_clusterlen,false) == 0);
+ ret = bdrv_pwrite_zeroes(blk_bs(blk)->file, offset,
+ zoned_clusterlen, 0);
+ if (ret < 0) {
+ error_setg_errno(errp, -ret, "Could not zero fill zoned metadata");
+ goto out;
+ }
+ ret = bdrv_pwrite(blk_bs(blk)->file, offset, zoned_meta_size, meta, 0);
+ if (ret < 0) {
+ error_setg_errno(errp, -ret, "Could not write zoned metadata "
+ "to disk");
+ goto out;
+ }
}
/* Create a full header (including things like feature table) */
@@ -4166,6 +4447,346 @@ static coroutine_fn int qcow2_co_pdiscard(BlockDriverState *bs,
return ret;
}
+static int coroutine_fn
+qcow2_co_zone_report(BlockDriverState *bs, int64_t offset,
+ unsigned int *nr_zones, BlockZoneDescriptor *zones)
+{
+ BDRVQcow2State *s = bs->opaque;
+ uint64_t zone_size = s->zoned_header.zone_size;
+ int64_t capacity = bs->total_sectors << BDRV_SECTOR_BITS;
+ int64_t size = bs->bl.nr_zones * zone_size;
+ int i = 0;
+ int si;
+
+ if (zone_size > 0) {
+ si = offset / zone_size;
+ unsigned int nrz = *nr_zones;
+ qemu_co_mutex_lock(&s->wps->colock);
+ for (; i < nrz; ++i) {
+ zones[i].start = (si + i) * zone_size;
+
+ /* The last zone can be smaller than the zone size */
+ if ((si + i + 1) == bs->bl.nr_zones && size > capacity) {
+ zones[i].length = zone_size - (size - capacity);
+ } else {
+ zones[i].length = zone_size;
+ }
+ zones[i].cap = zone_size;
+
+ uint64_t wp = s->wps->wp[si + i];
+ if (QCOW2_ZT_IS_CONV(wp)) {
+ zones[i].type = BLK_ZT_CONV;
+ zones[i].state = BLK_ZS_NOT_WP;
+ /* Clear the zone type bit */
+ wp &= ~(1ULL << 59);
+ } else {
+ zones[i].type = BLK_ZT_SWR;
+ zones[i].state = qcow2_get_zs(wp);
+ /* Clear the zone state bits */
+ wp = qcow2_get_wp(wp);
+ }
+
+ zones[i].wp = wp;
+ if (si + i == bs->bl.nr_zones) {
+ break;
+ }
+ }
+ qemu_co_mutex_unlock(&s->wps->colock);
+ }
+ *nr_zones = i;
+ return 0;
+}
+
+static int qcow2_open_zone(BlockDriverState *bs, uint32_t index) {
+ BDRVQcow2State *s = bs->opaque;
+ int ret;
+
+ qemu_co_mutex_lock(&s->wps->colock);
+ uint64_t *wp = &s->wps->wp[index];
+ BlockZoneState zs = qcow2_get_zs(*wp);
+
+ switch(zs) {
+ case BLK_ZS_EMPTY:
+ ret = qcow2_check_zone_resources(bs, BLK_ZS_EMPTY);
+ if (ret < 0) {
+ return ret;
+ }
+ break;
+ case BLK_ZS_IOPEN:
+ s->nr_zones_imp_open--;
+ break;
+ case BLK_ZS_EOPEN:
+ return 0;
+ case BLK_ZS_CLOSED:
+ ret = qcow2_check_zone_resources(bs, BLK_ZS_CLOSED);
+ if (ret < 0) {
+ return ret;
+ }
+ s->nr_zones_closed--;
+ break;
+ case BLK_ZS_FULL:
+ break;
+ default:
+ return -EINVAL;
+ }
+ ret = qcow2_write_wp_at(bs, wp, index, BLK_ZS_EOPEN);
+ if (!ret) {
+ s->nr_zones_exp_open++;
+ }
+ qemu_co_mutex_unlock(&s->wps->colock);
+ return ret;
+}
+
+static int qcow2_close_zone(BlockDriverState *bs, uint32_t index) {
+ BDRVQcow2State *s = bs->opaque;
+ int ret;
+
+ qemu_co_mutex_lock(&s->wps->colock);
+ uint64_t *wp = &s->wps->wp[index];
+ BlockZoneState zs = qcow2_get_zs(*wp);
+
+ switch(zs) {
+ case BLK_ZS_EMPTY:
+ break;
+ case BLK_ZS_IOPEN:
+ s->nr_zones_imp_open--;
+ break;
+ case BLK_ZS_EOPEN:
+ s->nr_zones_exp_open--;
+ break;
+ case BLK_ZS_CLOSED:
+ ret = qcow2_check_zone_resources(bs, BLK_ZS_CLOSED);
+ if (ret < 0) {
+ return ret;
+ }
+ s->nr_zones_closed--;
+ break;
+ case BLK_ZS_FULL:
+ break;
+ default:
+ return -EINVAL;
+ }
+
+ if (zs == BLK_ZS_EMPTY) {
+ ret = qcow2_write_wp_at(bs, wp, index, BLK_ZS_EMPTY);
+ } else {
+ ret = qcow2_write_wp_at(bs, wp, index, BLK_ZS_CLOSED);
+ if (!ret) {
+ s->nr_zones_closed++;
+ }
+ }
+ qemu_co_mutex_unlock(&s->wps->colock);
+ return ret;
+}
+
+static int qcow2_finish_zone(BlockDriverState *bs, uint32_t index) {
+ BDRVQcow2State *s = bs->opaque;
+ int ret;
+
+ qemu_co_mutex_lock(&s->wps->colock);
+ uint64_t *wp = &s->wps->wp[index];
+ BlockZoneState zs = qcow2_get_zs(*wp);
+
+ switch(zs) {
+ case BLK_ZS_EMPTY:
+ ret = qcow2_check_zone_resources(bs, BLK_ZS_EMPTY);
+ if (ret < 0) {
+ return ret;
+ }
+ break;
+ case BLK_ZS_IOPEN:
+ s->nr_zones_imp_open--;
+ break;
+ case BLK_ZS_EOPEN:
+ s->nr_zones_exp_open--;
+ break;
+ case BLK_ZS_CLOSED:
+ ret = qcow2_check_zone_resources(bs, BLK_ZS_CLOSED);
+ if (ret < 0) {
+ return ret;
+ }
+ s->nr_zones_closed--;
+ break;
+ case BLK_ZS_FULL:
+ return 0;
+ default:
+ return -EINVAL;
+ }
+
+ *wp = (index + 1) * s->zoned_header.zone_size;
+ ret = qcow2_write_wp_at(bs, wp, index, BLK_ZS_FULL);
+ qemu_co_mutex_unlock(&s->wps->colock);
+ return ret;
+}
+
+static int qcow2_reset_zone(BlockDriverState *bs, uint32_t index,
+ int64_t len) {
+ BDRVQcow2State *s = bs->opaque;
+ int nrz = bs->bl.nr_zones;
+ int zone_size = bs->bl.zone_size;
+ int n, ret = 0;
+
+ qemu_co_mutex_lock(&s->wps->colock);
+ uint64_t *wp = &s->wps->wp[index];
+ if (len == bs->total_sectors << BDRV_SECTOR_BITS) {
+ n = nrz;
+ index = 0;
+ } else {
+ n = len / zone_size;
+ }
+
+ for (int i = 0; i < n; ++i) {
+ uint64_t *wp_i = (uint64_t *)(wp + i);
+ uint64_t wpi_v = *wp_i;
+ if (QCOW2_ZT_IS_CONV(wpi_v)) {
+ continue;
+ }
+
+ BlockZoneState zs = qcow2_get_zs(wpi_v);
+ switch (zs) {
+ case BLK_ZS_EMPTY:
+ break;
+ case BLK_ZS_IOPEN:
+ s->nr_zones_imp_open--;
+ break;
+ case BLK_ZS_EOPEN:
+ s->nr_zones_exp_open--;
+ break;
+ case BLK_ZS_CLOSED:
+ s->nr_zones_closed--;
+ break;
+ case BLK_ZS_FULL:
+ break;
+ default:
+ return -EINVAL;
+ }
+
+ if (zs == BLK_ZS_EMPTY) {
+ continue;
+ }
+
+ *wp_i = (index + i) * zone_size;
+ ret = qcow2_write_wp_at(bs, wp_i, index + i, BLK_ZS_EMPTY);
+ if (ret < 0) {
+ return ret;
+ }
+ /* clear data */
+ ret = qcow2_co_pwrite_zeroes(bs, qcow2_get_wp(*wp_i), zone_size, 0);
+ if (ret < 0) {
+ error_report("Failed to reset zone at 0x%" PRIx64 "", *wp_i);
+ }
+ }
+ qemu_co_mutex_unlock(&s->wps->colock);
+ return ret;
+}
+
+static int coroutine_fn qcow2_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
+ int64_t offset, int64_t len)
+{
+ BDRVQcow2State *s = bs->opaque;
+ int ret = 0;
+ int64_t capacity = bs->total_sectors << BDRV_SECTOR_BITS;
+ int64_t zone_size = s->zoned_header.zone_size;
+ int64_t zone_size_mask = zone_size - 1;
+ uint32_t index = offset / zone_size;
+ BlockZoneWps *wps = s->wps;
+
+ if (offset & zone_size_mask) {
+ error_report("sector offset %" PRId64 " is not aligned to zone size"
+ " %" PRId64 "", offset / 512, zone_size / 512);
+ return -EINVAL;
+ }
+
+ if (((offset + len) < capacity && len & zone_size_mask) ||
+ offset + len > capacity) {
+ error_report("number of sectors %" PRId64 " is not aligned to zone"
+ " size %" PRId64 "", len / 512, zone_size / 512);
+ return -EINVAL;
+ }
+
+ qemu_co_mutex_lock(&wps->colock);
+ uint64_t wpv = wps->wp[offset / zone_size];
+ if (QCOW2_ZT_IS_CONV(wpv) && len != capacity) {
+ error_report("zone mgmt operations are not allowed for "
+ "conventional zones");
+ ret = -EIO;
+ goto unlock;
+ }
+ qemu_co_mutex_unlock(&wps->colock);
+
+ switch(op) {
+ case BLK_ZO_OPEN:
+ ret = qcow2_open_zone(bs, index);
+ break;
+ case BLK_ZO_CLOSE:
+ ret = qcow2_close_zone(bs, index);
+ break;
+ case BLK_ZO_FINISH:
+ ret = qcow2_finish_zone(bs, index);
+ break;
+ case BLK_ZO_RESET:
+ ret = qcow2_reset_zone(bs, index, len);
+ break;
+ default:
+ error_report("Unsupported zone op: 0x%x", op);
+ ret = -ENOTSUP;
+ break;
+ }
+ return ret;
+
+unlock:
+ qemu_co_mutex_unlock(&wps->colock);
+ return ret;
+}
+
+static int coroutine_fn
+qcow2_co_zone_append(BlockDriverState *bs, int64_t *offset, QEMUIOVector *qiov,
+ BdrvRequestFlags flags)
+{
+ assert(flags == 0);
+ BDRVQcow2State *s = bs->opaque;
+ int ret;
+ int64_t zone_size_mask = bs->bl.zone_size - 1;
+ int64_t iov_len = 0;
+ int64_t len = 0;
+
+ /* offset + len should not pass the end of that zone starting from offset */
+ if (*offset & zone_size_mask) {
+ error_report("sector offset %" PRId64 " is not aligned to zone size "
+ "%" PRId32 "", *offset / 512, bs->bl.zone_size / 512);
+ return -EINVAL;
+ }
+
+ int64_t wg = bs->bl.write_granularity;
+ int64_t wg_mask = wg - 1;
+ for (int i = 0; i < qiov->niov; i++) {
+ iov_len = qiov->iov[i].iov_len;
+ if (iov_len & wg_mask) {
+ error_report("len of IOVector[%d] %" PRId64 " is not aligned to "
+ "block size %" PRId64 "", i, iov_len, wg);
+ return -EINVAL;
+ }
+ }
+ len = qiov->size;
+
+ if ((len >> BDRV_SECTOR_BITS) > bs->bl.max_append_sectors) {
+ return -ENOTSUP;
+ }
+
+ qemu_co_mutex_lock(&s->wps->colock);
+ uint64_t wp = s->wps->wp[*offset / bs->bl.zone_size];
+ uint64_t wp_i = qcow2_get_wp(wp);
+ ret = qcow2_co_pwritev_part(bs, wp_i, len, qiov, 0, 0);
+ if (ret == 0) {
+ *offset = wp_i;
+ } else {
+ error_report("qcow2: zap failed");
+ }
+
+ qemu_co_mutex_unlock(&s->wps->colock);
+ return ret;
+}
+
static int coroutine_fn GRAPH_RDLOCK
qcow2_co_copy_range_from(BlockDriverState *bs,
BdrvChild *src, int64_t src_offset,
@@ -6214,6 +6835,10 @@ BlockDriver bdrv_qcow2 = {
.bdrv_co_pwritev_part = qcow2_co_pwritev_part,
.bdrv_co_flush_to_os = qcow2_co_flush_to_os,
+ .bdrv_co_zone_report = qcow2_co_zone_report,
+ .bdrv_co_zone_mgmt = qcow2_co_zone_mgmt,
+ .bdrv_co_zone_append = qcow2_co_zone_append,
+
.bdrv_co_pwrite_zeroes = qcow2_co_pwrite_zeroes,
.bdrv_co_pdiscard = qcow2_co_pdiscard,
.bdrv_co_copy_range_from = qcow2_co_copy_range_from,
diff --git a/block/qcow2.h b/block/qcow2.h
index fe18dc4d97..a3a96ddbce 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -246,6 +246,8 @@ typedef struct Qcow2ZonedHeaderExtension {
uint32_t max_active_zones;
uint32_t max_open_zones;
uint32_t max_append_sectors;
+ uint64_t zonedmeta_offset;
+ uint64_t zonedmeta_size;
uint8_t padding[3];
} QEMU_PACKED Qcow2ZonedHeaderExtension;
--
2.40.1
^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [RFC 3/4] qcow2: add zoned emulation capability
2023-06-05 10:41 ` [RFC 3/4] qcow2: add zoned emulation capability Sam Li
@ 2023-06-19 14:35 ` Stefan Hajnoczi
0 siblings, 0 replies; 16+ messages in thread
From: Stefan Hajnoczi @ 2023-06-19 14:35 UTC (permalink / raw)
To: Sam Li
Cc: qemu-devel, dlemoal, dmitry.fomichev, hare, Markus Armbruster,
Kevin Wolf, qemu-block, Eric Blake, Hanna Reitz
[-- Attachment #1: Type: text/plain, Size: 28171 bytes --]
On Mon, Jun 05, 2023 at 06:41:07PM +0800, Sam Li wrote:
> By adding zone operations and zoned metadata, the zoned emulation
> capability enables full emulation support of zoned device using
> a qcow2 file. The zoned device metadata includes zone type,
> zoned device state and write pointer of each zone, which is stored
> to an array of unsigned integers.
>
> Each zone of a zoned device makes state transitions following
> the zone state machine. The zone state machine mainly describes
> five states, IMPLICIT OPEN, EXPLICIT OPEN, FULL, EMPTY and CLOSED.
> READ ONLY and OFFLINE states will generally be affected by device
> internal events. The operations on zones cause corresponding state
> changing.
>
> Zoned devices have a limit on zone resources, which puts constraints on
> write operations into zones.
>
> Signed-off-by: Sam Li <faithilikerun@gmail.com>
> ---
> block/qcow2.c | 629 +++++++++++++++++++++++++++++++++++++++++++++++++-
> block/qcow2.h | 2 +
> 2 files changed, 629 insertions(+), 2 deletions(-)
>
> diff --git a/block/qcow2.c b/block/qcow2.c
> index b886dab42b..f030965d5d 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c
> @@ -194,6 +194,164 @@ qcow2_extract_crypto_opts(QemuOpts *opts, const char *fmt, Error **errp)
> return cryptoopts_qdict;
> }
>
> +#define QCOW2_ZT_IS_CONV(wp) (wp & 1ULL << 59)
> +
> +static inline int qcow2_get_wp(uint64_t wp)
> +{
> + /* clear state and type information */
> + return ((wp << 5) >> 5);
> +}
> +
> +static inline int qcow2_get_zs(uint64_t wp)
> +{
> + return (wp >> 60);
> +}
> +
> +static inline void qcow2_set_wp(uint64_t *wp, BlockZoneState zs)
> +{
> + uint64_t addr = qcow2_get_wp(*wp);
> + addr |= ((uint64_t)zs << 60);
> + *wp = addr;
> +}
> +
> +/*
> + * File wp tracking: reset zone, finish zone and append zone can
> + * change the value of write pointer. All zone operations will change
> + * the state of that/those zone.
> + * */
> +static inline void qcow2_wp_tracking_helper(int index, uint64_t wp) {
> + /* format: operations, the wp. */
> + printf("wps[%d]: 0x%x\n", index, qcow2_get_wp(wp)>>BDRV_SECTOR_BITS);
> +}
> +
> +/*
> + * Perform a state assignment and a flush operation that writes the new wp
> + * value to the dedicated location of the disk file.
> + */
> +static int qcow2_write_wp_at(BlockDriverState *bs, uint64_t *wp,
> + uint32_t index, BlockZoneState zs) {
> + BDRVQcow2State *s = bs->opaque;
> + int ret;
> +
> + qcow2_set_wp(wp, zs);
> + ret = bdrv_pwrite(bs->file, s->zoned_header.zonedmeta_offset
> + + sizeof(uint64_t) * index, sizeof(uint64_t), wp, 0);
> +
> + if (ret < 0) {
> + goto exit;
> + }
> + qcow2_wp_tracking_helper(index, *wp);
> + return ret;
> +
> +exit:
> + error_report("Failed to write metadata with file");
> + return ret;
> +}
> +
> +static int qcow2_check_active(BlockDriverState *bs)
> +{
> + BDRVQcow2State *s = bs->opaque;
> +
> + if (!s->zoned_header.max_active_zones) {
> + return 0;
> + }
> +
> + if (s->nr_zones_exp_open + s->nr_zones_imp_open + s->nr_zones_closed
> + < s->zoned_header.max_active_zones) {
> + return 0;
> + }
> +
> + return -1;
> +}
> +
> +static int qcow2_check_open(BlockDriverState *bs)
> +{
> + BDRVQcow2State *s = bs->opaque;
> + int ret;
> +
> + if (!s->zoned_header.max_open_zones) {
> + return 0;
> + }
> +
> + if (s->nr_zones_exp_open + s->nr_zones_imp_open
> + < s->zoned_header.max_open_zones) {
> + return 0;
> + }
> +
> + if(s->nr_zones_imp_open) {
> + ret = qcow2_check_active(bs);
> + if (ret == 0) {
> + /* TODO: it takes O(n) time complexity (n = nr_zones).
> + * Optimizations required. */
> + /* close one implicitly open zones to make it available */
> + for (int i = s->zoned_header.zone_nr_conv;
> + i < bs->bl.nr_zones; ++i) {
> + uint64_t *wp = &s->wps->wp[i];
> + if (qcow2_get_zs(*wp) == BLK_ZS_IOPEN) {
> + ret = qcow2_write_wp_at(bs, wp, i, BLK_ZS_CLOSED);
> + if (ret < 0) {
> + return ret;
> + }
> + s->wps->wp[i] = *wp;
> + s->nr_zones_imp_open--;
> + s->nr_zones_closed++;
> + break;
> + }
> + }
> + return 0;
> + }
> + return ret;
> + }
> +
> + return -1;
> +}
> +
> +/*
> + * The zoned device has limited zone resources of open, closed, active
> + * zones.
> + */
> +static int qcow2_check_zone_resources(BlockDriverState *bs,
> + BlockZoneState zs)
> +{
> + int ret;
> +
> + switch (zs) {
> + case BLK_ZS_EMPTY:
> + ret = qcow2_check_active(bs);
> + if (ret < 0) {
> + error_report("No enough active zones");
> + return ret;
> + }
> + return ret;
> + case BLK_ZS_CLOSED:
> + ret = qcow2_check_open(bs);
> + if (ret < 0) {
> + error_report("No enough open zones");
> + return ret;
> + }
> + return ret;
> + default:
> + return -EINVAL;
> + }
> +
> +}
> +
> +static inline int qcow2_refresh_zonedmeta(BlockDriverState *bs)
> +{
> + int ret;
> + BDRVQcow2State *s = bs->opaque;
> + uint64_t *temp = g_malloc(s->zoned_header.zonedmeta_size);
> + ret = bdrv_pread(bs->file, s->zoned_header.zonedmeta_offset,
> + s->zoned_header.zonedmeta_size, temp, 0);
> + if (ret < 0) {
> + error_report("Can not read metadata\n");
> + return ret;
> + }
> +
> + memcpy(s->wps->wp, temp, s->zoned_header.zonedmeta_size);
> + return 0;
> +}
> +
> /*
> * read qcow2 extension and fill bs
> * start reading from start_offset
> @@ -455,7 +613,19 @@ qcow2_read_extensions(BlockDriverState *bs, uint64_t start_offset,
> be32_to_cpu(zoned_ext.max_active_zones);
> zoned_ext.max_append_sectors =
> be32_to_cpu(zoned_ext.max_append_sectors);
> + zoned_ext.zonedmeta_offset =
> + be64_to_cpu(zoned_ext.zonedmeta_offset);
> + zoned_ext.zonedmeta_size = be64_to_cpu(zoned_ext.zonedmeta_size);
> s->zoned_header = zoned_ext;
> + s->wps = g_malloc(sizeof(BlockZoneWps)
> + + s->zoned_header.zonedmeta_size);
> + ret = qcow2_refresh_zonedmeta(bs);
> + if (ret < 0) {
> + error_setg_errno(errp, -ret, "zonedmeta: "
> + "Could not update zoned meta");
> + return ret;
> + }
> + qemu_co_mutex_init(&s->wps->colock);
>
> #ifdef DEBUG_EXT
> printf("Qcow2: Got zoned format extension: "
> @@ -1982,6 +2152,14 @@ static void qcow2_refresh_limits(BlockDriverState *bs, Error **errp)
> }
> bs->bl.pwrite_zeroes_alignment = s->subcluster_size;
> bs->bl.pdiscard_alignment = s->cluster_size;
> + bs->bl.zoned = s->zoned_header.zoned;
> + bs->bl.nr_zones = s->zoned_header.nr_zones;
> + bs->wps = s->wps;
> + bs->bl.max_append_sectors = s->zoned_header.max_append_sectors;
> + bs->bl.max_active_zones = s->zoned_header.max_active_zones;
> + bs->bl.max_open_zones = s->zoned_header.max_open_zones;
> + bs->bl.zone_size = s->zoned_header.zone_size;
> + bs->bl.write_granularity = BDRV_SECTOR_SIZE;
> }
>
> static int qcow2_reopen_prepare(BDRVReopenState *state,
> @@ -2672,9 +2850,26 @@ qcow2_co_pwritev_part(BlockDriverState *bs, int64_t offset, int64_t bytes,
> uint64_t host_offset;
> QCowL2Meta *l2meta = NULL;
> AioTaskPool *aio = NULL;
> + int64_t start_offset, start_bytes;
> + BlockZoneState zs;
> + int64_t end;
> + uint64_t *wp;
> + int64_t zone_size = bs->bl.zone_size;
> + int index;
>
> trace_qcow2_writev_start_req(qemu_coroutine_self(), offset, bytes);
>
> + start_offset = offset;
> + start_bytes = bytes;
> + /* The offset should not less than the wp of that
> + * zone where offset starts. */
> + if (zone_size) {
> + index = start_offset / zone_size;
> + wp = &s->wps->wp[index];
> + if (offset < qcow2_get_wp(*wp)) {
> + return -EINVAL;
> + }
> + }
> while (bytes != 0 && aio_task_pool_status(aio) == 0) {
>
> l2meta = NULL;
> @@ -2720,6 +2915,47 @@ qcow2_co_pwritev_part(BlockDriverState *bs, int64_t offset, int64_t bytes,
> qiov_offset += cur_bytes;
> trace_qcow2_writev_done_part(qemu_coroutine_self(), cur_bytes);
> }
> +
> + if (zone_size) {
> + index = start_offset / zone_size;
> + wp = &s->wps->wp[index];
> + uint64_t wpv = *wp;
> + if (!QCOW2_ZT_IS_CONV(wpv)) {
> + /*
> + * Implicitly open one closed zone to write if there are zone resources
> + * left.
> + */
> + zs = qcow2_get_zs(wpv);
> + if (zs == BLK_ZS_CLOSED || zs == BLK_ZS_EMPTY) {
> + ret = qcow2_check_zone_resources(bs, zs);
> + if (ret < 0) {
> + goto fail_nometa;
> + }
> +
> + if (zs == BLK_ZS_CLOSED) {
> + s->nr_zones_closed--;
> + s->nr_zones_imp_open++;
> + } else {
> + s->nr_zones_imp_open++;
> + }
> + }
> +
> + /* align up (start_offset, zone_size), the start offset is not
> + * necessarily power of two. */
> + end = ((start_offset + zone_size) / zone_size) * zone_size;
> + if (start_offset + start_bytes <= end) {
> + *wp = start_offset + start_bytes;
> + } else {
> + ret = -EINVAL;
> + goto fail_nometa;
> + }
> +
> + ret = qcow2_write_wp_at(bs, wp, index,BLK_ZS_IOPEN);
> + if (ret < 0) {
> + goto fail_nometa;
> + }
> + }
> + }
> ret = 0;
>
> qemu_co_mutex_lock(&s->lock);
> @@ -3117,7 +3353,9 @@ int qcow2_update_header(BlockDriverState *bs)
> .max_active_zones =
> cpu_to_be32(s->zoned_header.max_active_zones),
> .max_append_sectors =
> - cpu_to_be32(s->zoned_header.max_append_sectors)
> + cpu_to_be32(s->zoned_header.max_append_sectors),
> + .zonedmeta_offset = cpu_to_be64(s->zoned_header.zonedmeta_offset),
> + .zonedmeta_size = cpu_to_be64(s->zoned_header.zonedmeta_size)
> };
> ret = header_ext_add(buf, QCOW2_EXT_MAGIC_ZONED_FORMAT,
> &zoned_header, sizeof(zoned_header),
> @@ -3522,7 +3760,8 @@ qcow2_co_create(BlockdevCreateOptions *create_options, Error **errp)
> int version;
> int refcount_order;
> uint64_t *refcount_table;
> - int ret;
> + uint64_t zoned_meta_size, zoned_clusterlen;
> + int ret, offset, i;
> uint8_t compression_type = QCOW2_COMPRESSION_TYPE_ZLIB;
>
> assert(create_options->driver == BLOCKDEV_DRIVER_QCOW2);
> @@ -3823,6 +4062,48 @@ qcow2_co_create(BlockdevCreateOptions *create_options, Error **errp)
> s->zoned_header.max_open_zones = qcow2_opts->max_open_zones;
> s->zoned_header.max_active_zones = qcow2_opts->max_active_zones;
> s->zoned_header.max_append_sectors = qcow2_opts->max_append_sectors;
> + s->zoned_header.nr_zones = qcow2_opts->size / qcow2_opts->zone_size;
> +
> + zoned_meta_size = sizeof(uint64_t) * s->zoned_header.nr_zones;
> + uint64_t meta[zoned_meta_size];
zoned_meta_size is in bytes but the array is in uint64_t. I guess the
array size should be s->zoned_header.nr_zones (in zones) instead of
zoned_meta_size (in bytes).
Please use g_autoptr and g_new() for this to avoid stack overflow issues
if nr_zones is large.
> + memset(meta, 0, zoned_meta_size);
Unnecessary if you use g_new0(). Also, zeroing is probably unnecessary
since the for loops below fill in every element of the array.
> +
> + for (i = 0; i < s->zoned_header.zone_nr_conv; ++i) {
> + meta[i] = i * s->zoned_header.zone_size;
> + meta[i] += 1ULL << 59;
Bitwise OR ('|') is clearer than addition. You do not rely on or want
the add operation's arithmetic carry here.
> + }
> + for (; i < s->zoned_header.nr_zones; ++i) {
> + meta[i] = i * s->zoned_header.zone_size;
> + /* For sequential zones, the first four most significant bit
> + * indicates zone states. */
> + meta[i] += ((uint64_t)BLK_ZS_EMPTY << 60);
Bitwise OR.
> + }
> +
> + offset = qcow2_alloc_clusters(blk_bs(blk), zoned_meta_size);
> + if (offset < 0) {
> + error_setg_errno(errp, -offset, "Could not allocate clusters "
> + "for zoned metadata size");
> + goto out;
> + }
> + s->zoned_header.zonedmeta_offset = offset;
> + s->zoned_header.zonedmeta_size = zoned_meta_size;
> +
> + zoned_clusterlen = size_to_clusters(s, zoned_meta_size)
> + * s->cluster_size;
> + assert(qcow2_pre_write_overlap_check(bs, 0, offset,
> + zoned_clusterlen,false) == 0);
> + ret = bdrv_pwrite_zeroes(blk_bs(blk)->file, offset,
> + zoned_clusterlen, 0);
> + if (ret < 0) {
> + error_setg_errno(errp, -ret, "Could not zero fill zoned metadata");
> + goto out;
> + }
> + ret = bdrv_pwrite(blk_bs(blk)->file, offset, zoned_meta_size, meta, 0);
> + if (ret < 0) {
> + error_setg_errno(errp, -ret, "Could not write zoned metadata "
> + "to disk");
> + goto out;
> + }
> }
>
> /* Create a full header (including things like feature table) */
> @@ -4166,6 +4447,346 @@ static coroutine_fn int qcow2_co_pdiscard(BlockDriverState *bs,
> return ret;
> }
>
> +static int coroutine_fn
> +qcow2_co_zone_report(BlockDriverState *bs, int64_t offset,
> + unsigned int *nr_zones, BlockZoneDescriptor *zones)
> +{
> + BDRVQcow2State *s = bs->opaque;
> + uint64_t zone_size = s->zoned_header.zone_size;
> + int64_t capacity = bs->total_sectors << BDRV_SECTOR_BITS;
> + int64_t size = bs->bl.nr_zones * zone_size;
> + int i = 0;
> + int si;
> +
> + if (zone_size > 0) {
> + si = offset / zone_size;
offset must be validated. It might be beyond the capacity of the device.
> + unsigned int nrz = *nr_zones;
nr_zones must be validated. It might be larger than bs->bl.nr_zones.
> + qemu_co_mutex_lock(&s->wps->colock);
> + for (; i < nrz; ++i) {
> + zones[i].start = (si + i) * zone_size;
> +
> + /* The last zone can be smaller than the zone size */
> + if ((si + i + 1) == bs->bl.nr_zones && size > capacity) {
> + zones[i].length = zone_size - (size - capacity);
> + } else {
> + zones[i].length = zone_size;
> + }
> + zones[i].cap = zone_size;
Should capacity also be capped for the last zone?
> +
> + uint64_t wp = s->wps->wp[si + i];
> + if (QCOW2_ZT_IS_CONV(wp)) {
> + zones[i].type = BLK_ZT_CONV;
> + zones[i].state = BLK_ZS_NOT_WP;
> + /* Clear the zone type bit */
> + wp &= ~(1ULL << 59);
> + } else {
> + zones[i].type = BLK_ZT_SWR;
> + zones[i].state = qcow2_get_zs(wp);
> + /* Clear the zone state bits */
> + wp = qcow2_get_wp(wp);
> + }
> +
> + zones[i].wp = wp;
> + if (si + i == bs->bl.nr_zones) {
> + break;
> + }
This check is too late because wp[] has already been indexed. It is
insufficient when the first zone is already out of bounds.
> + }
> + qemu_co_mutex_unlock(&s->wps->colock);
> + }
> + *nr_zones = i;
> + return 0;
> +}
> +
> +static int qcow2_open_zone(BlockDriverState *bs, uint32_t index) {
> + BDRVQcow2State *s = bs->opaque;
> + int ret;
> +
> + qemu_co_mutex_lock(&s->wps->colock);
> + uint64_t *wp = &s->wps->wp[index];
> + BlockZoneState zs = qcow2_get_zs(*wp);
> +
> + switch(zs) {
> + case BLK_ZS_EMPTY:
> + ret = qcow2_check_zone_resources(bs, BLK_ZS_EMPTY);
> + if (ret < 0) {
> + return ret;
> + }
> + break;
> + case BLK_ZS_IOPEN:
> + s->nr_zones_imp_open--;
> + break;
> + case BLK_ZS_EOPEN:
> + return 0;
> + case BLK_ZS_CLOSED:
> + ret = qcow2_check_zone_resources(bs, BLK_ZS_CLOSED);
> + if (ret < 0) {
> + return ret;
> + }
> + s->nr_zones_closed--;
> + break;
> + case BLK_ZS_FULL:
> + break;
> + default:
> + return -EINVAL;
> + }
s->wps->colock is not unlocked in the return code paths above.
> + ret = qcow2_write_wp_at(bs, wp, index, BLK_ZS_EOPEN);
I wanted to confirm with you and Damien that zone states are persisted
in the zoned storage model, even OPEN/CLOSED? To me, OPEN/CLOSED, seem
more related to runtime resource limits than persistent state that needs
to be stored.
> + if (!ret) {
> + s->nr_zones_exp_open++;
> + }
> + qemu_co_mutex_unlock(&s->wps->colock);
> + return ret;
> +}
> +
> +static int qcow2_close_zone(BlockDriverState *bs, uint32_t index) {
> + BDRVQcow2State *s = bs->opaque;
> + int ret;
> +
> + qemu_co_mutex_lock(&s->wps->colock);
> + uint64_t *wp = &s->wps->wp[index];
> + BlockZoneState zs = qcow2_get_zs(*wp);
> +
> + switch(zs) {
> + case BLK_ZS_EMPTY:
> + break;
> + case BLK_ZS_IOPEN:
> + s->nr_zones_imp_open--;
> + break;
> + case BLK_ZS_EOPEN:
> + s->nr_zones_exp_open--;
> + break;
> + case BLK_ZS_CLOSED:
> + ret = qcow2_check_zone_resources(bs, BLK_ZS_CLOSED);
> + if (ret < 0) {
> + return ret;
> + }
> + s->nr_zones_closed--;
> + break;
> + case BLK_ZS_FULL:
> + break;
> + default:
> + return -EINVAL;
> + }
s->wps->colock is not unlocked in the return code paths above.
> +
> + if (zs == BLK_ZS_EMPTY) {
> + ret = qcow2_write_wp_at(bs, wp, index, BLK_ZS_EMPTY);
> + } else {
> + ret = qcow2_write_wp_at(bs, wp, index, BLK_ZS_CLOSED);
> + if (!ret) {
> + s->nr_zones_closed++;
> + }
> + }
> + qemu_co_mutex_unlock(&s->wps->colock);
> + return ret;
> +}
> +
> +static int qcow2_finish_zone(BlockDriverState *bs, uint32_t index) {
> + BDRVQcow2State *s = bs->opaque;
> + int ret;
> +
> + qemu_co_mutex_lock(&s->wps->colock);
> + uint64_t *wp = &s->wps->wp[index];
> + BlockZoneState zs = qcow2_get_zs(*wp);
> +
> + switch(zs) {
> + case BLK_ZS_EMPTY:
> + ret = qcow2_check_zone_resources(bs, BLK_ZS_EMPTY);
> + if (ret < 0) {
> + return ret;
> + }
> + break;
> + case BLK_ZS_IOPEN:
> + s->nr_zones_imp_open--;
> + break;
> + case BLK_ZS_EOPEN:
> + s->nr_zones_exp_open--;
> + break;
> + case BLK_ZS_CLOSED:
> + ret = qcow2_check_zone_resources(bs, BLK_ZS_CLOSED);
> + if (ret < 0) {
> + return ret;
> + }
> + s->nr_zones_closed--;
> + break;
> + case BLK_ZS_FULL:
> + return 0;
> + default:
> + return -EINVAL;
> + }
s->wps->colock is not unlocked in the return code paths above.
> +
> + *wp = (index + 1) * s->zoned_header.zone_size;
There is an integer overflow here. Please see my comment in
qcow2_reset_zone() below.
> + ret = qcow2_write_wp_at(bs, wp, index, BLK_ZS_FULL);
> + qemu_co_mutex_unlock(&s->wps->colock);
> + return ret;
> +}
> +
> +static int qcow2_reset_zone(BlockDriverState *bs, uint32_t index,
> + int64_t len) {
> + BDRVQcow2State *s = bs->opaque;
> + int nrz = bs->bl.nr_zones;
> + int zone_size = bs->bl.zone_size;
> + int n, ret = 0;
> +
> + qemu_co_mutex_lock(&s->wps->colock);
> + uint64_t *wp = &s->wps->wp[index];
> + if (len == bs->total_sectors << BDRV_SECTOR_BITS) {
> + n = nrz;
> + index = 0;
> + } else {
> + n = len / zone_size;
> + }
> +
> + for (int i = 0; i < n; ++i) {
> + uint64_t *wp_i = (uint64_t *)(wp + i);
> + uint64_t wpi_v = *wp_i;
> + if (QCOW2_ZT_IS_CONV(wpi_v)) {
> + continue;
> + }
> +
> + BlockZoneState zs = qcow2_get_zs(wpi_v);
> + switch (zs) {
> + case BLK_ZS_EMPTY:
> + break;
> + case BLK_ZS_IOPEN:
> + s->nr_zones_imp_open--;
> + break;
> + case BLK_ZS_EOPEN:
> + s->nr_zones_exp_open--;
> + break;
> + case BLK_ZS_CLOSED:
> + s->nr_zones_closed--;
> + break;
> + case BLK_ZS_FULL:
> + break;
> + default:
> + return -EINVAL;
> + }
> +
> + if (zs == BLK_ZS_EMPTY) {
> + continue;
> + }
> +
> + *wp_i = (index + i) * zone_size;
This calculation needs uint64_t to avoid overflowing int. The types
involved are:
uint64_t = (uint32_t + int) * int;
You can fix it using:
*wp_i = ((uint64_t)index + i) * zone_size;
Then the entire expression will be evaluated as a 64-bit integer instead
of a 32-bit integer.
> + ret = qcow2_write_wp_at(bs, wp_i, index + i, BLK_ZS_EMPTY);
> + if (ret < 0) {
> + return ret;
s->wps->colock must be unlocked.
> + }
> + /* clear data */
> + ret = qcow2_co_pwrite_zeroes(bs, qcow2_get_wp(*wp_i), zone_size, 0);
Does zone reset guarantee that the data blocks will be zeroed according
to the zoned storage model?
> + if (ret < 0) {
> + error_report("Failed to reset zone at 0x%" PRIx64 "", *wp_i);
> + }
> + }
> + qemu_co_mutex_unlock(&s->wps->colock);
> + return ret;
> +}
> +
> +static int coroutine_fn qcow2_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
> + int64_t offset, int64_t len)
> +{
> + BDRVQcow2State *s = bs->opaque;
> + int ret = 0;
> + int64_t capacity = bs->total_sectors << BDRV_SECTOR_BITS;
> + int64_t zone_size = s->zoned_header.zone_size;
> + int64_t zone_size_mask = zone_size - 1;
> + uint32_t index = offset / zone_size;
> + BlockZoneWps *wps = s->wps;
> +
> + if (offset & zone_size_mask) {
> + error_report("sector offset %" PRId64 " is not aligned to zone size"
> + " %" PRId64 "", offset / 512, zone_size / 512);
> + return -EINVAL;
> + }
> +
> + if (((offset + len) < capacity && len & zone_size_mask) ||
> + offset + len > capacity) {
> + error_report("number of sectors %" PRId64 " is not aligned to zone"
> + " size %" PRId64 "", len / 512, zone_size / 512);
> + return -EINVAL;
> + }
> +
> + qemu_co_mutex_lock(&wps->colock);
> + uint64_t wpv = wps->wp[offset / zone_size];
Use index here instead of recalculating it?
> + if (QCOW2_ZT_IS_CONV(wpv) && len != capacity) {
> + error_report("zone mgmt operations are not allowed for "
> + "conventional zones");
> + ret = -EIO;
> + goto unlock;
> + }
> + qemu_co_mutex_unlock(&wps->colock);
> +
> + switch(op) {
> + case BLK_ZO_OPEN:
> + ret = qcow2_open_zone(bs, index);
> + break;
> + case BLK_ZO_CLOSE:
> + ret = qcow2_close_zone(bs, index);
> + break;
> + case BLK_ZO_FINISH:
> + ret = qcow2_finish_zone(bs, index);
> + break;
> + case BLK_ZO_RESET:
> + ret = qcow2_reset_zone(bs, index, len);
> + break;
> + default:
> + error_report("Unsupported zone op: 0x%x", op);
> + ret = -ENOTSUP;
> + break;
> + }
> + return ret;
> +
> +unlock:
> + qemu_co_mutex_unlock(&wps->colock);
> + return ret;
> +}
> +
> +static int coroutine_fn
> +qcow2_co_zone_append(BlockDriverState *bs, int64_t *offset, QEMUIOVector *qiov,
> + BdrvRequestFlags flags)
> +{
> + assert(flags == 0);
> + BDRVQcow2State *s = bs->opaque;
> + int ret;
> + int64_t zone_size_mask = bs->bl.zone_size - 1;
> + int64_t iov_len = 0;
> + int64_t len = 0;
> +
> + /* offset + len should not pass the end of that zone starting from offset */
> + if (*offset & zone_size_mask) {
> + error_report("sector offset %" PRId64 " is not aligned to zone size "
> + "%" PRId32 "", *offset / 512, bs->bl.zone_size / 512);
> + return -EINVAL;
> + }
> +
> + int64_t wg = bs->bl.write_granularity;
> + int64_t wg_mask = wg - 1;
> + for (int i = 0; i < qiov->niov; i++) {
> + iov_len = qiov->iov[i].iov_len;
> + if (iov_len & wg_mask) {
> + error_report("len of IOVector[%d] %" PRId64 " is not aligned to "
> + "block size %" PRId64 "", i, iov_len, wg);
> + return -EINVAL;
> + }
> + }
> + len = qiov->size;
> +
> + if ((len >> BDRV_SECTOR_BITS) > bs->bl.max_append_sectors) {
> + return -ENOTSUP;
> + }
> +
> + qemu_co_mutex_lock(&s->wps->colock);
> + uint64_t wp = s->wps->wp[*offset / bs->bl.zone_size];
Where is *offset checked against nr_zones to prevent an access beyond
the end of the array?
> + uint64_t wp_i = qcow2_get_wp(wp);
> + ret = qcow2_co_pwritev_part(bs, wp_i, len, qiov, 0, 0);
> + if (ret == 0) {
> + *offset = wp_i;
> + } else {
> + error_report("qcow2: zap failed");
> + }
> +
> + qemu_co_mutex_unlock(&s->wps->colock);
> + return ret;
> +}
> +
> static int coroutine_fn GRAPH_RDLOCK
> qcow2_co_copy_range_from(BlockDriverState *bs,
> BdrvChild *src, int64_t src_offset,
> @@ -6214,6 +6835,10 @@ BlockDriver bdrv_qcow2 = {
> .bdrv_co_pwritev_part = qcow2_co_pwritev_part,
> .bdrv_co_flush_to_os = qcow2_co_flush_to_os,
>
> + .bdrv_co_zone_report = qcow2_co_zone_report,
> + .bdrv_co_zone_mgmt = qcow2_co_zone_mgmt,
> + .bdrv_co_zone_append = qcow2_co_zone_append,
> +
> .bdrv_co_pwrite_zeroes = qcow2_co_pwrite_zeroes,
> .bdrv_co_pdiscard = qcow2_co_pdiscard,
> .bdrv_co_copy_range_from = qcow2_co_copy_range_from,
> diff --git a/block/qcow2.h b/block/qcow2.h
> index fe18dc4d97..a3a96ddbce 100644
> --- a/block/qcow2.h
> +++ b/block/qcow2.h
> @@ -246,6 +246,8 @@ typedef struct Qcow2ZonedHeaderExtension {
> uint32_t max_active_zones;
> uint32_t max_open_zones;
> uint32_t max_append_sectors;
> + uint64_t zonedmeta_offset;
> + uint64_t zonedmeta_size;
> uint8_t padding[3];
> } QEMU_PACKED Qcow2ZonedHeaderExtension;
>
> --
> 2.40.1
>
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
^ permalink raw reply [flat|nested] 16+ messages in thread
* [RFC 4/4] iotests: test the zoned format feature for qcow2 file
2023-06-05 10:41 [RFC 0/4] Add full zoned storage emulation to qcow2 driver Sam Li
` (2 preceding siblings ...)
2023-06-05 10:41 ` [RFC 3/4] qcow2: add zoned emulation capability Sam Li
@ 2023-06-05 10:41 ` Sam Li
2023-06-19 14:40 ` Stefan Hajnoczi
3 siblings, 1 reply; 16+ messages in thread
From: Sam Li @ 2023-06-05 10:41 UTC (permalink / raw)
To: qemu-devel
Cc: dlemoal, dmitry.fomichev, hare, stefanha, Markus Armbruster,
Kevin Wolf, qemu-block, Eric Blake, Hanna Reitz, Sam Li
The zoned format feature can be tested by:
$ tests/qemu-iotests/check zoned-qcow2
Signed-off-by: Sam Li <faithilikerun@gmail.com>
---
tests/qemu-iotests/tests/zoned-qcow2 | 110 +++++++++++++++++++++++
tests/qemu-iotests/tests/zoned-qcow2.out | 87 ++++++++++++++++++
2 files changed, 197 insertions(+)
create mode 100755 tests/qemu-iotests/tests/zoned-qcow2
create mode 100644 tests/qemu-iotests/tests/zoned-qcow2.out
diff --git a/tests/qemu-iotests/tests/zoned-qcow2 b/tests/qemu-iotests/tests/zoned-qcow2
new file mode 100755
index 0000000000..6aa5ab3a03
--- /dev/null
+++ b/tests/qemu-iotests/tests/zoned-qcow2
@@ -0,0 +1,110 @@
+#!/usr/bin/env bash
+#
+# Test zone management operations for qcow2 file.
+#
+
+seq="$(basename $0)"
+echo "QA output created by $seq"
+status=1 # failure is the default!
+
+file_name="zbc.qcow2"
+_cleanup()
+{
+ _cleanup_test_img
+ _rm_test_img "$file_name"
+}
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+# get standard environment, filters and checks
+. ../common.rc
+. ../common.filter
+. ../common.qemu
+
+# This test only runs on Linux hosts with qcow2 image files.
+_supported_proto file
+_supported_os Linux
+
+echo
+echo "=== Initial image setup ==="
+echo
+
+$QEMU_IMG create -f qcow2 $file_name -o size=768M -o zone_size=64M \
+-o zone_capacity=64M -o zone_nr_conv=0 -o max_append_sectors=512 \
+-o max_open_zones=0 -o max_active_zones=0 -o zoned_profile=zbc
+
+IMG="--image-opts -n driver=qcow2,file.driver=file,file.filename=$file_name"
+QEMU_IO_OPTIONS=$QEMU_IO_OPTIONS_NO_FMT
+
+echo
+echo "=== Testing a qcow2 img with zoned format ==="
+echo
+echo "case 1: if the operations work"
+
+echo "(1) report the first zone:"
+$QEMU_IO $IMG -c "zrp 0 1"
+echo
+echo "report the first 10 zones"
+$QEMU_IO $IMG -c "zrp 0 10"
+echo
+echo "report the last zone:"
+$QEMU_IO $IMG -c "zrp 0x2C000000 2" # 0x2C000000 / 512 = 0x160000
+echo
+echo
+echo "(2) opening the first zone"
+$QEMU_IO $IMG -c "zo 0 0x4000000" # 0x4000000 / 512 = 0x20000
+echo "report after:"
+$QEMU_IO $IMG -c "zrp 0 1"
+echo
+echo "opening the second zone"
+$QEMU_IO $IMG -c "zo 0x4000000 0x4000000"
+echo "report after:"
+$QEMU_IO $IMG -c "zrp 0x4000000 1"
+echo
+echo "opening the last zone"
+$QEMU_IO $IMG -c "zo 0x2C000000 0x4000000"
+echo "report after:"
+$QEMU_IO $IMG -c "zrp 0x2C000000 2"
+echo
+echo
+echo "(3) closing the first zone"
+$QEMU_IO $IMG -c "zc 0 0x4000000"
+echo "report after:"
+$QEMU_IO $IMG -c "zrp 0 1"
+echo
+echo "closing the last zone"
+$QEMU_IO $IMG -c "zc 0x3e70000000 0x4000000"
+echo "report after:"
+$QEMU_IO $IMG -c "zrp 0x3e70000000 2"
+echo
+echo
+echo "(4) finishing the second zone"
+$QEMU_IO $IMG -c "zf 0x4000000 0x4000000"
+echo "After finishing a zone:"
+$QEMU_IO $IMG -c "zrp 0x4000000 1"
+echo
+echo
+echo "(5) resetting the second zone"
+$QEMU_IO $IMG -c "zrs 0x4000000 0x4000000"
+echo "After resetting a zone:"
+$QEMU_IO $IMG -c "zrp 0x4000000 1"
+echo
+echo
+echo "(6) append write" # the physical block size of the device is 4096
+$QEMU_IO $IMG -c "zrp 0 1"
+$QEMU_IO $IMG -c "zap -p 0 0x1000 0x2000"
+echo "After appending the first zone firstly:"
+$QEMU_IO $IMG -c "zrp 0 1"
+$QEMU_IO $IMG -c "zap -p 0 0x1000 0x2000"
+echo "After appending the first zone secondly:"
+$QEMU_IO $IMG -c "zrp 0 1"
+$QEMU_IO $IMG -c "zap -p 0x4000000 0x1000 0x2000"
+echo "After appending the second zone firstly:"
+$QEMU_IO $IMG -c "zrp 0x4000000 1"
+$QEMU_IO $IMG -c "zap -p 0x4000000 0x1000 0x2000"
+echo "After appending the second zone secondly:"
+$QEMU_IO $IMG -c "zrp 0x4000000 1"
+
+# success, all done
+echo "*** done"
+rm -f $seq.full
+status=0
diff --git a/tests/qemu-iotests/tests/zoned-qcow2.out b/tests/qemu-iotests/tests/zoned-qcow2.out
new file mode 100644
index 0000000000..288bceffc4
--- /dev/null
+++ b/tests/qemu-iotests/tests/zoned-qcow2.out
@@ -0,0 +1,87 @@
+QA output created by zoned-qcow2
+
+=== Initial image setup ===
+
+Formatting 'zbc.qcow2', fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zlib zoned_profile=zbc zone_size=67108864 zone_capacity=67108864 zone_nr_conv=0 max_append_sectors=512 max_active_zones=0 max_open_zones=0 size=805306368 lazy_refcounts=off refcount_bits=16
+
+=== Testing a qcow2 img with zoned format ===
+
+case 1: if the operations work
+(1) report the first zone:
+start: 0x0, len 0x20000, cap 0x20000, wptr 0x0, zcond:1, [type: 2]
+
+report the first 10 zones
+start: 0x0, len 0x20000, cap 0x20000, wptr 0x0, zcond:1, [type: 2]
+start: 0x20000, len 0x20000, cap 0x20000, wptr 0x20000, zcond:1, [type: 2]
+start: 0x40000, len 0x20000, cap 0x20000, wptr 0x40000, zcond:1, [type: 2]
+start: 0x60000, len 0x20000, cap 0x20000, wptr 0x60000, zcond:1, [type: 2]
+start: 0x80000, len 0x20000, cap 0x20000, wptr 0x80000, zcond:1, [type: 2]
+start: 0xa0000, len 0x20000, cap 0x20000, wptr 0xa0000, zcond:1, [type: 2]
+start: 0xc0000, len 0x20000, cap 0x20000, wptr 0xc0000, zcond:1, [type: 2]
+start: 0xe0000, len 0x20000, cap 0x20000, wptr 0xe0000, zcond:1, [type: 2]
+start: 0x100000, len 0x20000, cap 0x20000, wptr 0x100000, zcond:1, [type: 2]
+start: 0x120000, len 0x20000, cap 0x20000, wptr 0x120000, zcond:1, [type: 2]
+
+report the last zone:
+start: 0x160000, len 0x20000, cap 0x20000, wptr 0x160000, zcond:1, [type: 2]
+
+
+(2) opening the first zone
+wps[0]: 0x0
+report after:
+start: 0x0, len 0x20000, cap 0x20000, wptr 0x0, zcond:3, [type: 2]
+
+opening the second zone
+wps[1]: 0x20000
+report after:
+start: 0x20000, len 0x20000, cap 0x20000, wptr 0x20000, zcond:3, [type: 2]
+
+opening the last zone
+wps[11]: 0x160000
+report after:
+start: 0x160000, len 0x20000, cap 0x20000, wptr 0x160000, zcond:3, [type: 2]
+
+
+(3) closing the first zone
+wps[0]: 0x0
+report after:
+start: 0x0, len 0x20000, cap 0x20000, wptr 0x0, zcond:4, [type: 2]
+
+closing the last zone
+zone close failed: Input/output error
+report after:
+start: 0x1f380000, len 0x20000, cap 0x20000, wptr 0x0, zcond:0, [type: 2]
+start: 0x1f3a0000, len 0x20000, cap 0x20000, wptr 0x0, zcond:0, [type: 2]
+
+
+(4) finishing the second zone
+wps[1]: 0x40000
+After finishing a zone:
+start: 0x20000, len 0x20000, cap 0x20000, wptr 0x40000, zcond:14, [type: 2]
+
+
+(5) resetting the second zone
+wps[1]: 0x20000
+After resetting a zone:
+start: 0x20000, len 0x20000, cap 0x20000, wptr 0x20000, zcond:1, [type: 2]
+
+
+(6) append write
+start: 0x0, len 0x20000, cap 0x20000, wptr 0x0, zcond:4, [type: 2]
+wps[0]: 0x18
+After zap done, the append sector is 0x0
+After appending the first zone firstly:
+start: 0x0, len 0x20000, cap 0x20000, wptr 0x18, zcond:2, [type: 2]
+wps[0]: 0x30
+After zap done, the append sector is 0x18
+After appending the first zone secondly:
+start: 0x0, len 0x20000, cap 0x20000, wptr 0x30, zcond:2, [type: 2]
+wps[1]: 0x20018
+After zap done, the append sector is 0x20000
+After appending the second zone firstly:
+start: 0x20000, len 0x20000, cap 0x20000, wptr 0x20018, zcond:2, [type: 2]
+wps[1]: 0x20030
+After zap done, the append sector is 0x20018
+After appending the second zone secondly:
+start: 0x20000, len 0x20000, cap 0x20000, wptr 0x20030, zcond:2, [type: 2]
+*** done
--
2.40.1
^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [RFC 4/4] iotests: test the zoned format feature for qcow2 file
2023-06-05 10:41 ` [RFC 4/4] iotests: test the zoned format feature for qcow2 file Sam Li
@ 2023-06-19 14:40 ` Stefan Hajnoczi
0 siblings, 0 replies; 16+ messages in thread
From: Stefan Hajnoczi @ 2023-06-19 14:40 UTC (permalink / raw)
To: Sam Li
Cc: qemu-devel, dlemoal, dmitry.fomichev, hare, Markus Armbruster,
Kevin Wolf, qemu-block, Eric Blake, Hanna Reitz
[-- Attachment #1: Type: text/plain, Size: 8060 bytes --]
On Mon, Jun 05, 2023 at 06:41:08PM +0800, Sam Li wrote:
> The zoned format feature can be tested by:
> $ tests/qemu-iotests/check zoned-qcow2
>
> Signed-off-by: Sam Li <faithilikerun@gmail.com>
> ---
> tests/qemu-iotests/tests/zoned-qcow2 | 110 +++++++++++++++++++++++
> tests/qemu-iotests/tests/zoned-qcow2.out | 87 ++++++++++++++++++
> 2 files changed, 197 insertions(+)
> create mode 100755 tests/qemu-iotests/tests/zoned-qcow2
> create mode 100644 tests/qemu-iotests/tests/zoned-qcow2.out
>
> diff --git a/tests/qemu-iotests/tests/zoned-qcow2 b/tests/qemu-iotests/tests/zoned-qcow2
> new file mode 100755
> index 0000000000..6aa5ab3a03
> --- /dev/null
> +++ b/tests/qemu-iotests/tests/zoned-qcow2
> @@ -0,0 +1,110 @@
> +#!/usr/bin/env bash
> +#
> +# Test zone management operations for qcow2 file.
> +#
> +
> +seq="$(basename $0)"
> +echo "QA output created by $seq"
> +status=1 # failure is the default!
> +
> +file_name="zbc.qcow2"
Please use $TEST_IMG_FILE instead of defining your own variable here.
(TEST_IMG_FILE is already defined in common.rc.)
> +_cleanup()
> +{
> + _cleanup_test_img
> + _rm_test_img "$file_name"
> +}
> +trap "_cleanup; exit \$status" 0 1 2 3 15
> +
> +# get standard environment, filters and checks
> +. ../common.rc
> +. ../common.filter
> +. ../common.qemu
> +
> +# This test only runs on Linux hosts with qcow2 image files.
Then you need to add:
_supported_fmt qcow2
> +_supported_proto file
> +_supported_os Linux
Is this test really Linux-specific?
> +
> +echo
> +echo "=== Initial image setup ==="
> +echo
> +
> +$QEMU_IMG create -f qcow2 $file_name -o size=768M -o zone_size=64M \
> +-o zone_capacity=64M -o zone_nr_conv=0 -o max_append_sectors=512 \
> +-o max_open_zones=0 -o max_active_zones=0 -o zoned_profile=zbc
> +
> +IMG="--image-opts -n driver=qcow2,file.driver=file,file.filename=$file_name"
> +QEMU_IO_OPTIONS=$QEMU_IO_OPTIONS_NO_FMT
> +
> +echo
> +echo "=== Testing a qcow2 img with zoned format ==="
> +echo
> +echo "case 1: if the operations work"
> +
> +echo "(1) report the first zone:"
> +$QEMU_IO $IMG -c "zrp 0 1"
> +echo
> +echo "report the first 10 zones"
> +$QEMU_IO $IMG -c "zrp 0 10"
> +echo
> +echo "report the last zone:"
> +$QEMU_IO $IMG -c "zrp 0x2C000000 2" # 0x2C000000 / 512 = 0x160000
> +echo
> +echo
> +echo "(2) opening the first zone"
> +$QEMU_IO $IMG -c "zo 0 0x4000000" # 0x4000000 / 512 = 0x20000
> +echo "report after:"
> +$QEMU_IO $IMG -c "zrp 0 1"
> +echo
> +echo "opening the second zone"
> +$QEMU_IO $IMG -c "zo 0x4000000 0x4000000"
> +echo "report after:"
> +$QEMU_IO $IMG -c "zrp 0x4000000 1"
> +echo
> +echo "opening the last zone"
> +$QEMU_IO $IMG -c "zo 0x2C000000 0x4000000"
> +echo "report after:"
> +$QEMU_IO $IMG -c "zrp 0x2C000000 2"
> +echo
> +echo
> +echo "(3) closing the first zone"
> +$QEMU_IO $IMG -c "zc 0 0x4000000"
> +echo "report after:"
> +$QEMU_IO $IMG -c "zrp 0 1"
> +echo
> +echo "closing the last zone"
> +$QEMU_IO $IMG -c "zc 0x3e70000000 0x4000000"
> +echo "report after:"
> +$QEMU_IO $IMG -c "zrp 0x3e70000000 2"
> +echo
> +echo
> +echo "(4) finishing the second zone"
> +$QEMU_IO $IMG -c "zf 0x4000000 0x4000000"
> +echo "After finishing a zone:"
> +$QEMU_IO $IMG -c "zrp 0x4000000 1"
> +echo
> +echo
> +echo "(5) resetting the second zone"
> +$QEMU_IO $IMG -c "zrs 0x4000000 0x4000000"
> +echo "After resetting a zone:"
> +$QEMU_IO $IMG -c "zrp 0x4000000 1"
> +echo
> +echo
> +echo "(6) append write" # the physical block size of the device is 4096
> +$QEMU_IO $IMG -c "zrp 0 1"
> +$QEMU_IO $IMG -c "zap -p 0 0x1000 0x2000"
> +echo "After appending the first zone firstly:"
> +$QEMU_IO $IMG -c "zrp 0 1"
> +$QEMU_IO $IMG -c "zap -p 0 0x1000 0x2000"
> +echo "After appending the first zone secondly:"
> +$QEMU_IO $IMG -c "zrp 0 1"
> +$QEMU_IO $IMG -c "zap -p 0x4000000 0x1000 0x2000"
> +echo "After appending the second zone firstly:"
> +$QEMU_IO $IMG -c "zrp 0x4000000 1"
> +$QEMU_IO $IMG -c "zap -p 0x4000000 0x1000 0x2000"
> +echo "After appending the second zone secondly:"
> +$QEMU_IO $IMG -c "zrp 0x4000000 1"
> +
> +# success, all done
> +echo "*** done"
> +rm -f $seq.full
> +status=0
> diff --git a/tests/qemu-iotests/tests/zoned-qcow2.out b/tests/qemu-iotests/tests/zoned-qcow2.out
> new file mode 100644
> index 0000000000..288bceffc4
> --- /dev/null
> +++ b/tests/qemu-iotests/tests/zoned-qcow2.out
> @@ -0,0 +1,87 @@
> +QA output created by zoned-qcow2
> +
> +=== Initial image setup ===
> +
> +Formatting 'zbc.qcow2', fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zlib zoned_profile=zbc zone_size=67108864 zone_capacity=67108864 zone_nr_conv=0 max_append_sectors=512 max_active_zones=0 max_open_zones=0 size=805306368 lazy_refcounts=off refcount_bits=16
> +
> +=== Testing a qcow2 img with zoned format ===
> +
> +case 1: if the operations work
> +(1) report the first zone:
> +start: 0x0, len 0x20000, cap 0x20000, wptr 0x0, zcond:1, [type: 2]
> +
> +report the first 10 zones
> +start: 0x0, len 0x20000, cap 0x20000, wptr 0x0, zcond:1, [type: 2]
> +start: 0x20000, len 0x20000, cap 0x20000, wptr 0x20000, zcond:1, [type: 2]
> +start: 0x40000, len 0x20000, cap 0x20000, wptr 0x40000, zcond:1, [type: 2]
> +start: 0x60000, len 0x20000, cap 0x20000, wptr 0x60000, zcond:1, [type: 2]
> +start: 0x80000, len 0x20000, cap 0x20000, wptr 0x80000, zcond:1, [type: 2]
> +start: 0xa0000, len 0x20000, cap 0x20000, wptr 0xa0000, zcond:1, [type: 2]
> +start: 0xc0000, len 0x20000, cap 0x20000, wptr 0xc0000, zcond:1, [type: 2]
> +start: 0xe0000, len 0x20000, cap 0x20000, wptr 0xe0000, zcond:1, [type: 2]
> +start: 0x100000, len 0x20000, cap 0x20000, wptr 0x100000, zcond:1, [type: 2]
> +start: 0x120000, len 0x20000, cap 0x20000, wptr 0x120000, zcond:1, [type: 2]
> +
> +report the last zone:
> +start: 0x160000, len 0x20000, cap 0x20000, wptr 0x160000, zcond:1, [type: 2]
> +
> +
> +(2) opening the first zone
> +wps[0]: 0x0
> +report after:
> +start: 0x0, len 0x20000, cap 0x20000, wptr 0x0, zcond:3, [type: 2]
> +
> +opening the second zone
> +wps[1]: 0x20000
> +report after:
> +start: 0x20000, len 0x20000, cap 0x20000, wptr 0x20000, zcond:3, [type: 2]
> +
> +opening the last zone
> +wps[11]: 0x160000
> +report after:
> +start: 0x160000, len 0x20000, cap 0x20000, wptr 0x160000, zcond:3, [type: 2]
> +
> +
> +(3) closing the first zone
> +wps[0]: 0x0
> +report after:
> +start: 0x0, len 0x20000, cap 0x20000, wptr 0x0, zcond:4, [type: 2]
> +
> +closing the last zone
> +zone close failed: Input/output error
> +report after:
> +start: 0x1f380000, len 0x20000, cap 0x20000, wptr 0x0, zcond:0, [type: 2]
> +start: 0x1f3a0000, len 0x20000, cap 0x20000, wptr 0x0, zcond:0, [type: 2]
> +
> +
> +(4) finishing the second zone
> +wps[1]: 0x40000
> +After finishing a zone:
> +start: 0x20000, len 0x20000, cap 0x20000, wptr 0x40000, zcond:14, [type: 2]
> +
> +
> +(5) resetting the second zone
> +wps[1]: 0x20000
> +After resetting a zone:
> +start: 0x20000, len 0x20000, cap 0x20000, wptr 0x20000, zcond:1, [type: 2]
> +
> +
> +(6) append write
> +start: 0x0, len 0x20000, cap 0x20000, wptr 0x0, zcond:4, [type: 2]
> +wps[0]: 0x18
> +After zap done, the append sector is 0x0
> +After appending the first zone firstly:
> +start: 0x0, len 0x20000, cap 0x20000, wptr 0x18, zcond:2, [type: 2]
> +wps[0]: 0x30
> +After zap done, the append sector is 0x18
> +After appending the first zone secondly:
> +start: 0x0, len 0x20000, cap 0x20000, wptr 0x30, zcond:2, [type: 2]
> +wps[1]: 0x20018
> +After zap done, the append sector is 0x20000
> +After appending the second zone firstly:
> +start: 0x20000, len 0x20000, cap 0x20000, wptr 0x20018, zcond:2, [type: 2]
> +wps[1]: 0x20030
> +After zap done, the append sector is 0x20018
> +After appending the second zone secondly:
> +start: 0x20000, len 0x20000, cap 0x20000, wptr 0x20030, zcond:2, [type: 2]
> +*** done
> --
> 2.40.1
>
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
^ permalink raw reply [flat|nested] 16+ messages in thread