qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v7 0/8] Add support for zoned device
@ 2022-08-16  6:25 Sam Li
  2022-08-16  6:25 ` [PATCH v7 1/8] include: add zoned device structs Sam Li
                   ` (7 more replies)
  0 siblings, 8 replies; 28+ messages in thread
From: Sam Li @ 2022-08-16  6:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: hare, Fam Zheng, Kevin Wolf, Eric Blake, Markus Armbruster,
	stefanha, Hanna Reitz, dmitry.fomichev, qemu-block, damien.lemoal,
	Sam Li

Zoned Block Devices (ZBDs) devide the LBA space to block regions called zones
that are larger than the LBA size. It can only allow sequential writes, which
reduces write amplification in SSD, leading to higher throughput and increased
capacity. More details about ZBDs can be found at:

https://zonedstorage.io/docs/introduction/zoned-storage

The zoned device support aims to let guests (virtual machines) access zoned
storage devices on the host (hypervisor) through a virtio-blk device. This
involves extending QEMU's block layer and virtio-blk emulation code.  In its
current status, the virtio-blk device is not aware of ZBDs but the guest sees
host-managed drives as regular drive that will runs correctly under the most
common write workloads.

This patch series extend the block layer APIs with the minimum set of zoned
commands that are necessary to support zoned devices. The commands are - Report
Zones, four zone operations and Zone Append (developing).

It can be tested on a null_blk device using qemu-io or qemu-iotests. For
example, the command line for zone report using qemu-io is:
$ path/to/qemu-io --image-opts driver=zoned_host_device,filename=/dev/nullb0
-c "zrp offset nr_zones"

v7:
- address review comments
  * modify sysfs attribute helper funcations
  * move the input validation and error checking into raw_co_zone_* function
  * fix checks in config

v6:
- drop virtio-blk emulation changes
- address Stefan's review comments
  * fix CONFIG_BLKZONED configs in related functions
  * replace reading fd by g_file_get_contents() in get_sysfs_str_val()
  * rewrite documentation for zoned storage

v5:
- add zoned storage emulation to virtio-blk device
- add documentation for zoned storage
- address review comments
  * fix qemu-iotests
  * fix check to block layer
  * modify interfaces of sysfs helper functions
  * rename zoned device structs according to QEMU styles
  * reorder patches

v4:
- add virtio-blk headers for zoned device
- add configurations for zoned host device
- add zone operations for raw-format
- address review comments
  * fix memory leak bug in zone_report
  * add checks to block layers
  * fix qemu-iotests format
  * fix sysfs helper functions

v3:
- add helper functions to get sysfs attributes
- address review comments
  * fix zone report bugs
  * fix the qemu-io code path
  * use thread pool to avoid blocking ioctl() calls

v2:
- add qemu-io sub-commands
- address review comments
  * modify interfaces of APIs

v1:
- add block layer APIs resembling Linux ZoneBlockDevice ioctls

Sam Li (8):
  include: add zoned device structs
  file-posix: introduce get_sysfs_str_val for device zoned model
  file-posix: introduce get_sysfs_long_val for the long sysfs attribute
  block: add block layer APIs resembling Linux ZonedBlockDevice ioctls
  raw-format: add zone operations to pass through requests
  config: add check to block layer
  qemu-iotests: test new zone operations
  docs/zoned-storage: add zoned device documentation

 block.c                                |  14 +
 block/block-backend.c                  |  50 +++
 block/file-posix.c                     | 449 +++++++++++++++++++++++--
 block/io.c                             |  41 +++
 block/raw-format.c                     |  14 +
 docs/devel/zoned-storage.rst           |  41 +++
 docs/system/qemu-block-drivers.rst.inc |   6 +
 include/block/block-common.h           |  44 ++-
 include/block/block-io.h               |  13 +
 include/block/block_int-common.h       |  30 +-
 include/block/raw-aio.h                |   6 +-
 include/sysemu/block-backend-io.h      |   6 +
 meson.build                            |   1 +
 qapi/block-core.json                   |   8 +-
 qemu-io-cmds.c                         | 143 ++++++++
 tests/qemu-iotests/tests/zoned.out     |  53 +++
 tests/qemu-iotests/tests/zoned.sh      |  86 +++++
 17 files changed, 963 insertions(+), 42 deletions(-)
 create mode 100644 docs/devel/zoned-storage.rst
 create mode 100644 tests/qemu-iotests/tests/zoned.out
 create mode 100755 tests/qemu-iotests/tests/zoned.sh

-- 
2.37.1



^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH v7 1/8] include: add zoned device structs
  2022-08-16  6:25 [PATCH v7 0/8] Add support for zoned device Sam Li
@ 2022-08-16  6:25 ` Sam Li
  2022-08-16 17:27   ` Damien Le Moal
  2022-08-16  6:25 ` [PATCH v7 2/8] file-posix: introduce get_sysfs_str_val for device zoned model Sam Li
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 28+ messages in thread
From: Sam Li @ 2022-08-16  6:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: hare, Fam Zheng, Kevin Wolf, Eric Blake, Markus Armbruster,
	stefanha, Hanna Reitz, dmitry.fomichev, qemu-block, damien.lemoal,
	Sam Li

Signed-off-by: Sam Li <faithilikerun@gmail.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 include/block/block-common.h | 43 ++++++++++++++++++++++++++++++++++++
 1 file changed, 43 insertions(+)

diff --git a/include/block/block-common.h b/include/block/block-common.h
index fdb7306e78..36bd0e480e 100644
--- a/include/block/block-common.h
+++ b/include/block/block-common.h
@@ -49,6 +49,49 @@ typedef struct BlockDriver BlockDriver;
 typedef struct BdrvChild BdrvChild;
 typedef struct BdrvChildClass BdrvChildClass;
 
+typedef enum BlockZoneOp {
+    BLK_ZO_OPEN,
+    BLK_ZO_CLOSE,
+    BLK_ZO_FINISH,
+    BLK_ZO_RESET,
+} BlockZoneOp;
+
+typedef enum BlockZoneModel {
+    BLK_Z_NONE = 0x0, /* Regular block device */
+    BLK_Z_HM = 0x1, /* Host-managed zoned block device */
+    BLK_Z_HA = 0x2, /* Host-aware zoned block device */
+} BlockZoneModel;
+
+typedef enum BlockZoneCondition {
+    BLK_ZS_NOT_WP = 0x0,
+    BLK_ZS_EMPTY = 0x1,
+    BLK_ZS_IOPEN = 0x2,
+    BLK_ZS_EOPEN = 0x3,
+    BLK_ZS_CLOSED = 0x4,
+    BLK_ZS_RDONLY = 0xD,
+    BLK_ZS_FULL = 0xE,
+    BLK_ZS_OFFLINE = 0xF,
+} BlockZoneCondition;
+
+typedef enum BlockZoneType {
+    BLK_ZT_CONV = 0x1, /* Conventional random writes supported */
+    BLK_ZT_SWR = 0x2, /* Sequential writes required */
+    BLK_ZT_SWP = 0x3, /* Sequential writes preferred */
+} BlockZoneType;
+
+/*
+ * Zone descriptor data structure.
+ * Provides information on a zone with all position and size values in bytes.
+ */
+typedef struct BlockZoneDescriptor {
+    uint64_t start;
+    uint64_t length;
+    uint64_t cap;
+    uint64_t wp;
+    BlockZoneType type;
+    BlockZoneCondition cond;
+} BlockZoneDescriptor;
+
 typedef struct BlockDriverInfo {
     /* in bytes, 0 if irrelevant */
     int cluster_size;
-- 
2.37.1



^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v7 2/8] file-posix: introduce get_sysfs_str_val for device zoned model
  2022-08-16  6:25 [PATCH v7 0/8] Add support for zoned device Sam Li
  2022-08-16  6:25 ` [PATCH v7 1/8] include: add zoned device structs Sam Li
@ 2022-08-16  6:25 ` Sam Li
  2022-08-16 16:11   ` Sam Li
                     ` (2 more replies)
  2022-08-16  6:25 ` [PATCH v7 3/8] file-posix: introduce get_sysfs_long_val for the long sysfs attribute Sam Li
                   ` (5 subsequent siblings)
  7 siblings, 3 replies; 28+ messages in thread
From: Sam Li @ 2022-08-16  6:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: hare, Fam Zheng, Kevin Wolf, Eric Blake, Markus Armbruster,
	stefanha, Hanna Reitz, dmitry.fomichev, qemu-block, damien.lemoal,
	Sam Li

Use sysfs attribute files to get the string value of device
zoned model. Then get_sysfs_zoned_model can convert it to
BlockZoneModel type in QEMU.

Signed-off-by: Sam Li <faithilikerun@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
---
 block/file-posix.c               | 93 ++++++++++++++++++--------------
 include/block/block_int-common.h |  3 ++
 2 files changed, 55 insertions(+), 41 deletions(-)

diff --git a/block/file-posix.c b/block/file-posix.c
index 48cd096624..c07ac4c697 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -1210,66 +1210,71 @@ static int hdev_get_max_hw_transfer(int fd, struct stat *st)
 #endif
 }
 
-static int hdev_get_max_segments(int fd, struct stat *st)
-{
+/*
+ * Convert the zoned attribute file in sysfs to internal value.
+ */
+static int get_sysfs_str_val(struct stat *st, const char *attribute,
+                             char **val) {
 #ifdef CONFIG_LINUX
-    char buf[32];
-    const char *end;
-    char *sysfspath = NULL;
+    g_autofree char *sysfspath = NULL;
     int ret;
-    int sysfd = -1;
-    long max_segments;
-
-    if (S_ISCHR(st->st_mode)) {
-        if (ioctl(fd, SG_GET_SG_TABLESIZE, &ret) == 0) {
-            return ret;
-        }
-        return -ENOTSUP;
-    }
+    size_t len;
 
     if (!S_ISBLK(st->st_mode)) {
         return -ENOTSUP;
     }
 
-    sysfspath = g_strdup_printf("/sys/dev/block/%u:%u/queue/max_segments",
-                                major(st->st_rdev), minor(st->st_rdev));
-    sysfd = open(sysfspath, O_RDONLY);
-    if (sysfd == -1) {
-        ret = -errno;
-        goto out;
+    sysfspath = g_strdup_printf("/sys/dev/block/%u:%u/queue/%s",
+                                major(st->st_rdev), minor(st->st_rdev),
+                                attribute);
+    ret = g_file_get_contents(sysfspath, val, &len, NULL);
+    if (ret == -1) {
+        return -ENOENT;
     }
-    do {
-        ret = read(sysfd, buf, sizeof(buf) - 1);
-    } while (ret == -1 && errno == EINTR);
+    return ret;
+#else
+    return -ENOTSUP;
+#endif
+}
+
+static int get_sysfs_zoned_model(struct stat *st, BlockZoneModel *zoned) {
+    g_autofree char *val = NULL;
+    int ret;
+
+    ret = get_sysfs_str_val(st, "zoned", &val);
     if (ret < 0) {
-        ret = -errno;
-        goto out;
-    } else if (ret == 0) {
-        ret = -EIO;
-        goto out;
+        return ret;
     }
-    buf[ret] = 0;
-    /* The file is ended with '\n', pass 'end' to accept that. */
-    ret = qemu_strtol(buf, &end, 10, &max_segments);
-    if (ret == 0 && end && *end == '\n') {
-        ret = max_segments;
+
+    if (strcmp(val, "host-managed") == 0) {
+        *zoned = BLK_Z_HM;
+    } else if (strcmp(val, "host-aware") == 0) {
+        *zoned = BLK_Z_HA;
+    } else if (strcmp(val, "none") == 0) {
+        *zoned = BLK_Z_NONE;
+    } else {
+        return -ENOTSUP;
     }
+    return 0;
+}
 
-out:
-    if (sysfd != -1) {
-        close(sysfd);
+static int hdev_get_max_segments(int fd, struct stat *st) {
+    int ret;
+    if (S_ISCHR(st->st_mode)) {
+        if (ioctl(fd, SG_GET_SG_TABLESIZE, &ret) == 0) {
+            return ret;
+        }
+        return -ENOTSUP;
     }
-    g_free(sysfspath);
-    return ret;
-#else
-    return -ENOTSUP;
-#endif
+    return get_sysfs_long_val(st, "max_segments");
 }
 
 static void raw_refresh_limits(BlockDriverState *bs, Error **errp)
 {
     BDRVRawState *s = bs->opaque;
     struct stat st;
+    int ret;
+    BlockZoneModel zoned;
 
     s->needs_alignment = raw_needs_alignment(bs);
     raw_probe_alignment(bs, s->fd, errp);
@@ -1307,6 +1312,12 @@ static void raw_refresh_limits(BlockDriverState *bs, Error **errp)
             bs->bl.max_hw_iov = ret;
         }
     }
+
+    ret = get_sysfs_zoned_model(s->fd, &st, &zoned);
+    if (ret < 0) {
+        zoned = BLK_Z_NONE;
+    }
+    bs->bl.zoned = zoned;
 }
 
 static int check_for_dasd(int fd)
diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
index 8947abab76..7f7863cc9e 100644
--- a/include/block/block_int-common.h
+++ b/include/block/block_int-common.h
@@ -825,6 +825,9 @@ typedef struct BlockLimits {
 
     /* maximum number of iovec elements */
     int max_iov;
+
+    /* device zone model */
+    BlockZoneModel zoned;
 } BlockLimits;
 
 typedef struct BdrvOpBlocker BdrvOpBlocker;
-- 
2.37.1



^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v7 3/8] file-posix: introduce get_sysfs_long_val for the long sysfs attribute
  2022-08-16  6:25 [PATCH v7 0/8] Add support for zoned device Sam Li
  2022-08-16  6:25 ` [PATCH v7 1/8] include: add zoned device structs Sam Li
  2022-08-16  6:25 ` [PATCH v7 2/8] file-posix: introduce get_sysfs_str_val for device zoned model Sam Li
@ 2022-08-16  6:25 ` Sam Li
  2022-08-16 16:13   ` Sam Li
  2022-08-16 17:35   ` Damien Le Moal
  2022-08-16  6:25 ` [PATCH v7 4/8] block: add block layer APIs resembling Linux ZonedBlockDevice ioctls Sam Li
                   ` (4 subsequent siblings)
  7 siblings, 2 replies; 28+ messages in thread
From: Sam Li @ 2022-08-16  6:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: hare, Fam Zheng, Kevin Wolf, Eric Blake, Markus Armbruster,
	stefanha, Hanna Reitz, dmitry.fomichev, qemu-block, damien.lemoal,
	Sam Li

Use sysfs attribute files to get the long value of zoned device
information.

Signed-off-by: Sam Li <faithilikerun@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 block/file-posix.c | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/block/file-posix.c b/block/file-posix.c
index c07ac4c697..727389488c 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -1258,6 +1258,33 @@ static int get_sysfs_zoned_model(struct stat *st, BlockZoneModel *zoned) {
     return 0;
 }
 
+/*
+ * Get zoned device information (chunk_sectors, zoned_append_max_bytes,
+ * max_open_zones, max_active_zones) through sysfs attribute files.
+ */
+static long get_sysfs_long_val(struct stat *st, const char *attribute) {
+#ifdef CONFIG_LINUX
+    g_autofree char *str = NULL;
+    const char *end;
+    long val;
+    int ret;
+
+    ret = get_sysfs_str_val(st, attribute, &str);
+    if (ret < 0) {
+        return ret;
+    }
+
+    /* The file is ended with '\n', pass 'end' to accept that. */
+    ret = qemu_strtol(str, &end, 10, &val);
+    if (ret == 0 && end && *end == '\n') {
+        ret = val;
+    }
+    return ret;
+#else
+    return -ENOTSUP;
+#endif
+}
+
 static int hdev_get_max_segments(int fd, struct stat *st) {
     int ret;
     if (S_ISCHR(st->st_mode)) {
-- 
2.37.1



^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v7 4/8] block: add block layer APIs resembling Linux ZonedBlockDevice ioctls
  2022-08-16  6:25 [PATCH v7 0/8] Add support for zoned device Sam Li
                   ` (2 preceding siblings ...)
  2022-08-16  6:25 ` [PATCH v7 3/8] file-posix: introduce get_sysfs_long_val for the long sysfs attribute Sam Li
@ 2022-08-16  6:25 ` Sam Li
  2022-08-16 17:50   ` Damien Le Moal
  2022-08-23  0:49   ` Stefan Hajnoczi
  2022-08-16  6:25 ` [PATCH v7 5/8] raw-format: add zone operations to pass through requests Sam Li
                   ` (3 subsequent siblings)
  7 siblings, 2 replies; 28+ messages in thread
From: Sam Li @ 2022-08-16  6:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: hare, Fam Zheng, Kevin Wolf, Eric Blake, Markus Armbruster,
	stefanha, Hanna Reitz, dmitry.fomichev, qemu-block, damien.lemoal,
	Sam Li

By adding zone management operations in BlockDriver, storage controller
emulation can use the new block layer APIs including Report Zone and
four zone management operations (open, close, finish, reset).

Add zoned storage commands of the device: zone_report(zrp), zone_open(zo),
zone_close(zc), zone_reset(zrs), zone_finish(zf).

For example, to test zone_report, use following command:
$ ./build/qemu-io --image-opts driver=zoned_host_device, filename=/dev/nullb0
-c "zrp offset nr_zones"

Signed-off-by: Sam Li <faithilikerun@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
---
 block/block-backend.c             |  50 +++++
 block/file-posix.c                | 341 +++++++++++++++++++++++++++++-
 block/io.c                        |  41 ++++
 include/block/block-common.h      |   1 -
 include/block/block-io.h          |  13 ++
 include/block/block_int-common.h  |  22 +-
 include/block/raw-aio.h           |   6 +-
 include/sysemu/block-backend-io.h |   6 +
 meson.build                       |   1 +
 qapi/block-core.json              |   8 +-
 qemu-io-cmds.c                    | 143 +++++++++++++
 11 files changed, 625 insertions(+), 7 deletions(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index d4a5df2ac2..fc639b0cd7 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -1775,6 +1775,56 @@ int coroutine_fn blk_co_flush(BlockBackend *blk)
     return ret;
 }
 
+/*
+ * Send a zone_report command.
+ * offset is a byte offset from the start of the device. No alignment
+ * required for offset.
+ * nr_zones represents IN maximum and OUT actual.
+ */
+int coroutine_fn blk_co_zone_report(BlockBackend *blk, int64_t offset,
+                                    unsigned int *nr_zones,
+                                    BlockZoneDescriptor *zones)
+{
+    int ret;
+    IO_CODE();
+
+    blk_inc_in_flight(blk); /* increase before waiting */
+    blk_wait_while_drained(blk);
+    if (!blk_is_available(blk)) {
+        blk_dec_in_flight(blk);
+        return -ENOMEDIUM;
+    }
+    ret = bdrv_co_zone_report(blk_bs(blk), offset, nr_zones, zones);
+    blk_dec_in_flight(blk);
+    return ret;
+}
+
+/*
+ * Send a zone_management command.
+ * offset is the starting zone specified as a sector offset.
+ * len is the maximum number of sectors the command should operate on.
+ */
+int coroutine_fn blk_co_zone_mgmt(BlockBackend *blk, BlockZoneOp op,
+        int64_t offset, int64_t len)
+{
+    int ret;
+    IO_CODE();
+
+    ret = blk_check_byte_request(blk, offset, len);
+    if (ret < 0) {
+        return ret;
+    }
+    blk_inc_in_flight(blk);
+    blk_wait_while_drained(blk);
+    if (!blk_is_available(blk)) {
+        blk_dec_in_flight(blk);
+        return -ENOMEDIUM;
+    }
+    ret = bdrv_co_zone_mgmt(blk_bs(blk), op, offset, len);
+    blk_dec_in_flight(blk);
+    return ret;
+}
+
 void blk_drain(BlockBackend *blk)
 {
     BlockDriverState *bs = blk_bs(blk);
diff --git a/block/file-posix.c b/block/file-posix.c
index 727389488c..29f67082d9 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -67,6 +67,9 @@
 #include <sys/param.h>
 #include <sys/syscall.h>
 #include <sys/vfs.h>
+#if defined(CONFIG_BLKZONED)
+#include <linux/blkzoned.h>
+#endif
 #include <linux/cdrom.h>
 #include <linux/fd.h>
 #include <linux/fs.h>
@@ -216,6 +219,13 @@ typedef struct RawPosixAIOData {
             PreallocMode prealloc;
             Error **errp;
         } truncate;
+        struct {
+            unsigned int *nr_zones;
+            BlockZoneDescriptor *zones;
+        } zone_report;
+        struct {
+            unsigned long ioctl_op;
+        } zone_mgmt;
     };
 } RawPosixAIOData;
 
@@ -1328,7 +1338,7 @@ static void raw_refresh_limits(BlockDriverState *bs, Error **errp)
 #endif
 
     if (bs->sg || S_ISBLK(st.st_mode)) {
-        int ret = hdev_get_max_hw_transfer(s->fd, &st);
+        ret = hdev_get_max_hw_transfer(s->fd, &st);
 
         if (ret > 0 && ret <= BDRV_REQUEST_MAX_BYTES) {
             bs->bl.max_hw_transfer = ret;
@@ -1340,11 +1350,32 @@ static void raw_refresh_limits(BlockDriverState *bs, Error **errp)
         }
     }
 
-    ret = get_sysfs_zoned_model(s->fd, &st, &zoned);
+    ret = get_sysfs_zoned_model(&st, &zoned);
     if (ret < 0) {
         zoned = BLK_Z_NONE;
     }
     bs->bl.zoned = zoned;
+    if (zoned != BLK_Z_NONE) {
+        ret = get_sysfs_long_val(&st, "chunk_sectors");
+        if (ret > 0) {
+            bs->bl.zone_sectors = ret;
+        }
+
+        ret = get_sysfs_long_val(&st, "zone_append_max_bytes");
+        if (ret > 0) {
+            bs->bl.zone_append_max_bytes = ret;
+        }
+
+        ret = get_sysfs_long_val(&st, "max_open_zones");
+        if (ret > 0) {
+            bs->bl.max_open_zones = ret;
+        }
+
+        ret = get_sysfs_long_val(&st, "max_active_zones");
+        if (ret > 0) {
+            bs->bl.max_active_zones = ret;
+        }
+    }
 }
 
 static int check_for_dasd(int fd)
@@ -1839,6 +1870,134 @@ static off_t copy_file_range(int in_fd, off_t *in_off, int out_fd,
 }
 #endif
 
+/*
+ * parse_zone - Fill a zone descriptor
+ */
+#if defined(CONFIG_BLKZONED)
+static inline void parse_zone(struct BlockZoneDescriptor *zone,
+                              struct blk_zone *blkz) {
+    zone->start = blkz->start;
+    zone->length = blkz->len;
+    zone->cap = blkz->capacity;
+    zone->wp = blkz->wp;
+
+    switch (blkz->type) {
+    case BLK_ZONE_TYPE_SEQWRITE_REQ:
+        zone->type = BLK_ZT_SWR;
+        break;
+    case BLK_ZONE_TYPE_SEQWRITE_PREF:
+        zone->type = BLK_ZT_SWP;
+        break;
+    case BLK_ZONE_TYPE_CONVENTIONAL:
+        zone->type = BLK_ZT_CONV;
+        break;
+    default:
+        error_report("Invalid zone type: 0x%x", blkz->type);
+    }
+
+    switch (blkz->cond) {
+    case BLK_ZONE_COND_NOT_WP:
+        zone->cond = BLK_ZS_NOT_WP;
+        break;
+    case BLK_ZONE_COND_EMPTY:
+        zone->cond = BLK_ZS_EMPTY;
+        break;
+    case BLK_ZONE_COND_IMP_OPEN:
+        zone->cond =BLK_ZS_IOPEN;
+        break;
+    case BLK_ZONE_COND_EXP_OPEN:
+        zone->cond = BLK_ZS_EOPEN;
+        break;
+    case BLK_ZONE_COND_CLOSED:
+        zone->cond = BLK_ZS_CLOSED;
+        break;
+    case BLK_ZONE_COND_READONLY:
+        zone->cond = BLK_ZS_RDONLY;
+        break;
+    case BLK_ZONE_COND_FULL:
+        zone->cond = BLK_ZS_FULL;
+        break;
+    case BLK_ZONE_COND_OFFLINE:
+        zone->cond = BLK_ZS_OFFLINE;
+        break;
+    default:
+        error_report("Invalid zone condition 0x%x", blkz->cond);
+    }
+}
+#endif
+
+static int handle_aiocb_zone_report(void *opaque) {
+#if defined(CONFIG_BLKZONED)
+    RawPosixAIOData *aiocb = opaque;
+    int fd = aiocb->aio_fildes;
+    unsigned int *nr_zones = aiocb->zone_report.nr_zones;
+    BlockZoneDescriptor *zones = aiocb->zone_report.zones;
+    int64_t sector = aiocb->aio_offset;
+
+    struct blk_zone *blkz;
+    int64_t rep_size;
+    unsigned int nrz;
+    int ret, n = 0, i = 0;
+
+    nrz = *nr_zones;
+    rep_size = sizeof(struct blk_zone_report) + nrz * sizeof(struct blk_zone);
+    g_autofree struct blk_zone_report *rep = NULL;
+    rep = g_malloc(rep_size);
+
+    blkz = (struct blk_zone *)(rep + 1);
+    while (n < nrz) {
+        memset(rep, 0, rep_size);
+        rep->sector = sector;
+        rep->nr_zones = nrz - n;
+
+        ret = ioctl(fd, BLKREPORTZONE, rep);
+        if (ret != 0) {
+            error_report("%d: ioctl BLKREPORTZONE at %" PRId64 " failed %d",
+                         fd, sector, errno);
+            return -errno;
+        }
+
+        if (!rep->nr_zones) {
+            break;
+        }
+
+        for (i = 0; i < rep->nr_zones; i++, n++) {
+            parse_zone(&zones[n], &blkz[i]);
+            /* The next report should start after the last zone reported */
+            sector = blkz[i].start + blkz[i].len;
+        }
+    }
+
+    *nr_zones = n;
+    return 0;
+#else
+    return -ENOTSUP;
+#endif
+}
+
+static int handle_aiocb_zone_mgmt(void *opaque) {
+#if defined(CONFIG_BLKZONED)
+    RawPosixAIOData *aiocb = opaque;
+    int fd = aiocb->aio_fildes;
+    int64_t sector = aiocb->aio_offset;
+    int64_t nr_sectors = aiocb->aio_nbytes;
+    unsigned long ioctl_op = aiocb->zone_mgmt.ioctl_op;
+    struct blk_zone_range range;
+    int ret;
+
+    /* Execute the operation */
+    range.sector = sector;
+    range.nr_sectors = nr_sectors;
+    do {
+        ret = ioctl(fd, ioctl_op, &range);
+    } while (ret != 0 && errno == EINTR);
+
+    return ret;
+#else
+    return -ENOTSUP;
+#endif
+}
+
 static int handle_aiocb_copy_range(void *opaque)
 {
     RawPosixAIOData *aiocb = opaque;
@@ -3011,6 +3170,124 @@ static void raw_account_discard(BDRVRawState *s, uint64_t nbytes, int ret)
     }
 }
 
+/*
+ * zone report - Get a zone block device's information in the form
+ * of an array of zone descriptors.
+ *
+ * @param bs: passing zone block device file descriptor
+ * @param zones: an array of zone descriptors to hold zone
+ * information on reply
+ * @param offset: offset can be any byte within the zone size.
+ * @param len: (not sure yet.
+ * @return 0 on success, -1 on failure
+ */
+static int coroutine_fn raw_co_zone_report(BlockDriverState *bs, int64_t offset,
+                                           unsigned int *nr_zones,
+                                           BlockZoneDescriptor *zones) {
+#if defined(CONFIG_BLKZONED)
+    BDRVRawState *s = bs->opaque;
+    RawPosixAIOData acb;
+
+    acb = (RawPosixAIOData) {
+        .bs         = bs,
+        .aio_fildes = s->fd,
+        .aio_type   = QEMU_AIO_ZONE_REPORT,
+        /* zoned block devices use 512-byte sectors */
+        .aio_offset = offset / 512,
+        .zone_report    = {
+                .nr_zones       = nr_zones,
+                .zones          = zones,
+        },
+    };
+
+    return raw_thread_pool_submit(bs, handle_aiocb_zone_report, &acb);
+#else
+    return -ENOTSUP;
+#endif
+}
+
+/*
+ * zone management operations - Execute an operation on a zone
+ */
+static int coroutine_fn raw_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
+        int64_t offset, int64_t len) {
+#if defined(CONFIG_BLKZONED)
+    BDRVRawState *s = bs->opaque;
+    RawPosixAIOData acb;
+    int64_t zone_sector, zone_sector_mask;
+    const char *ioctl_name;
+    unsigned long ioctl_op;
+    int ret;
+
+    struct stat st;
+    if (fstat(s->fd, &st) < 0) {
+        ret = -errno;
+        return ret;
+    }
+    zone_sector = get_sysfs_long_val(&st, "chunk_sectors");
+    if (zone_sector < 0) {
+        error_report("invalid zone sector size %" PRId64 "", zone_sector);
+        return -EINVAL;
+    }
+
+    zone_sector_mask = zone_sector - 1;
+    if (offset & zone_sector_mask) {
+        error_report("sector offset %" PRId64 " is not aligned to zone size "
+                     "%" PRId64 "", offset, zone_sector);
+        return -EINVAL;
+    }
+
+    if (len & zone_sector_mask) {
+        error_report("number of sectors %" PRId64 " is not aligned to zone size"
+                      " %" PRId64 "", len, zone_sector);
+        return -EINVAL;
+    }
+
+    switch (op) {
+    case BLK_ZO_OPEN:
+        ioctl_name = "BLKOPENZONE";
+        ioctl_op = BLKOPENZONE;
+        break;
+    case BLK_ZO_CLOSE:
+        ioctl_name = "BLKCLOSEZONE";
+        ioctl_op = BLKCLOSEZONE;
+        break;
+    case BLK_ZO_FINISH:
+        ioctl_name = "BLKFINISHZONE";
+        ioctl_op = BLKFINISHZONE;
+        break;
+    case BLK_ZO_RESET:
+        ioctl_name = "BLKRESETZONE";
+        ioctl_op = BLKRESETZONE;
+        break;
+    default:
+        error_report("Invalid zone operation 0x%x", op);
+        return -EINVAL;
+    }
+
+    acb = (RawPosixAIOData) {
+        .bs             = bs,
+        .aio_fildes     = s->fd,
+        .aio_type       = QEMU_AIO_ZONE_MGMT,
+        .aio_offset     = offset,
+        .aio_nbytes     = len,
+        .zone_mgmt  = {
+                .ioctl_op = ioctl_op,
+        },
+    };
+
+    ret = raw_thread_pool_submit(bs, handle_aiocb_zone_mgmt, &acb);
+    if (ret != 0) {
+        error_report("ioctl %s failed %d", ioctl_name, errno);
+        return -errno;
+    }
+
+    return ret;
+#else
+    return -ENOTSUP;
+#endif
+}
+
 static coroutine_fn int
 raw_do_pdiscard(BlockDriverState *bs, int64_t offset, int64_t bytes,
                 bool blkdev)
@@ -3511,6 +3788,14 @@ static void hdev_parse_filename(const char *filename, QDict *options,
     bdrv_parse_filename_strip_prefix(filename, "host_device:", options);
 }
 
+#if defined(CONFIG_BLKZONED)
+static void zoned_host_device_parse_filename(const char *filename, QDict *options,
+                                Error **errp)
+{
+    bdrv_parse_filename_strip_prefix(filename, "zoned_host_device:", options);
+}
+#endif
+
 static bool hdev_is_sg(BlockDriverState *bs)
 {
 
@@ -3741,6 +4026,55 @@ static BlockDriver bdrv_host_device = {
 #endif
 };
 
+#if defined(CONFIG_BLKZONED)
+static BlockDriver bdrv_zoned_host_device = {
+        .format_name = "zoned_host_device",
+        .protocol_name = "zoned_host_device",
+        .instance_size = sizeof(BDRVRawState),
+        .bdrv_needs_filename = true,
+        .bdrv_probe_device  = hdev_probe_device,
+        .bdrv_parse_filename = zoned_host_device_parse_filename,
+        .bdrv_file_open     = hdev_open,
+        .bdrv_close         = raw_close,
+        .bdrv_reopen_prepare = raw_reopen_prepare,
+        .bdrv_reopen_commit  = raw_reopen_commit,
+        .bdrv_reopen_abort   = raw_reopen_abort,
+        .bdrv_co_create_opts = bdrv_co_create_opts_simple,
+        .create_opts         = &bdrv_create_opts_simple,
+        .mutable_opts        = mutable_opts,
+        .bdrv_co_invalidate_cache = raw_co_invalidate_cache,
+        .bdrv_co_pwrite_zeroes = hdev_co_pwrite_zeroes,
+
+        .bdrv_co_preadv         = raw_co_preadv,
+        .bdrv_co_pwritev        = raw_co_pwritev,
+        .bdrv_co_flush_to_disk  = raw_co_flush_to_disk,
+        .bdrv_co_pdiscard       = hdev_co_pdiscard,
+        .bdrv_co_copy_range_from = raw_co_copy_range_from,
+        .bdrv_co_copy_range_to  = raw_co_copy_range_to,
+        .bdrv_refresh_limits = raw_refresh_limits,
+        .bdrv_io_plug = raw_aio_plug,
+        .bdrv_io_unplug = raw_aio_unplug,
+        .bdrv_attach_aio_context = raw_aio_attach_aio_context,
+
+        .bdrv_co_truncate       = raw_co_truncate,
+        .bdrv_getlength = raw_getlength,
+        .bdrv_get_info = raw_get_info,
+        .bdrv_get_allocated_file_size
+                            = raw_get_allocated_file_size,
+        .bdrv_get_specific_stats = hdev_get_specific_stats,
+        .bdrv_check_perm = raw_check_perm,
+        .bdrv_set_perm   = raw_set_perm,
+        .bdrv_abort_perm_update = raw_abort_perm_update,
+        .bdrv_probe_blocksizes = hdev_probe_blocksizes,
+        .bdrv_probe_geometry = hdev_probe_geometry,
+        .bdrv_co_ioctl = hdev_co_ioctl,
+
+        /* zone management operations */
+        .bdrv_co_zone_report = raw_co_zone_report,
+        .bdrv_co_zone_mgmt = raw_co_zone_mgmt,
+};
+#endif
+
 #if defined(__linux__) || defined(__FreeBSD__) || defined(__FreeBSD_kernel__)
 static void cdrom_parse_filename(const char *filename, QDict *options,
                                  Error **errp)
@@ -4001,6 +4335,9 @@ static void bdrv_file_init(void)
     bdrv_register(&bdrv_file);
 #if defined(HAVE_HOST_BLOCK_DEVICE)
     bdrv_register(&bdrv_host_device);
+#if defined(CONFIG_BLKZONED)
+    bdrv_register(&bdrv_zoned_host_device);
+#endif
 #ifdef __linux__
     bdrv_register(&bdrv_host_cdrom);
 #endif
diff --git a/block/io.c b/block/io.c
index 0a8cbefe86..de9ec1d740 100644
--- a/block/io.c
+++ b/block/io.c
@@ -3198,6 +3198,47 @@ out:
     return co.ret;
 }
 
+int bdrv_co_zone_report(BlockDriverState *bs, int64_t offset,
+                        unsigned int *nr_zones,
+                        BlockZoneDescriptor *zones)
+{
+    BlockDriver *drv = bs->drv;
+    CoroutineIOCompletion co = {
+            .coroutine = qemu_coroutine_self(),
+    };
+    IO_CODE();
+
+    bdrv_inc_in_flight(bs);
+    if (!drv || !drv->bdrv_co_zone_report) {
+        co.ret = -ENOTSUP;
+        goto out;
+    }
+    co.ret = drv->bdrv_co_zone_report(bs, offset, nr_zones, zones);
+out:
+    bdrv_dec_in_flight(bs);
+    return co.ret;
+}
+
+int bdrv_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
+        int64_t offset, int64_t len)
+{
+    BlockDriver *drv = bs->drv;
+    CoroutineIOCompletion co = {
+            .coroutine = qemu_coroutine_self(),
+    };
+    IO_CODE();
+
+    bdrv_inc_in_flight(bs);
+    if (!drv || !drv->bdrv_co_zone_mgmt) {
+        co.ret = -ENOTSUP;
+        goto out;
+    }
+    co.ret = drv->bdrv_co_zone_mgmt(bs, op, offset, len);
+out:
+    bdrv_dec_in_flight(bs);
+    return co.ret;
+}
+
 void *qemu_blockalign(BlockDriverState *bs, size_t size)
 {
     IO_CODE();
diff --git a/include/block/block-common.h b/include/block/block-common.h
index 36bd0e480e..5102fa6858 100644
--- a/include/block/block-common.h
+++ b/include/block/block-common.h
@@ -23,7 +23,6 @@
  */
 #ifndef BLOCK_COMMON_H
 #define BLOCK_COMMON_H
-
 #include "block/aio.h"
 #include "block/aio-wait.h"
 #include "qemu/iov.h"
diff --git a/include/block/block-io.h b/include/block/block-io.h
index fd25ffa9be..55ad261e16 100644
--- a/include/block/block-io.h
+++ b/include/block/block-io.h
@@ -88,6 +88,13 @@ int bdrv_co_ioctl(BlockDriverState *bs, int req, void *buf);
 /* Ensure contents are flushed to disk.  */
 int coroutine_fn bdrv_co_flush(BlockDriverState *bs);
 
+/* Report zone information of zone block device. */
+int coroutine_fn bdrv_co_zone_report(BlockDriverState *bs, int64_t offset,
+                                     unsigned int *nr_zones,
+                                     BlockZoneDescriptor *zones);
+int coroutine_fn bdrv_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
+                                   int64_t offset, int64_t len);
+
 int bdrv_co_pdiscard(BdrvChild *child, int64_t offset, int64_t bytes);
 bool bdrv_can_write_zeroes_with_unmap(BlockDriverState *bs);
 int bdrv_block_status(BlockDriverState *bs, int64_t offset,
@@ -297,6 +304,12 @@ bdrv_readv_vmstate(BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos);
 int generated_co_wrapper
 bdrv_writev_vmstate(BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos);
 
+int generated_co_wrapper
+blk_zone_report(BlockBackend *blk, int64_t offset, unsigned int *nr_zones,
+                BlockZoneDescriptor *zones);
+int generated_co_wrapper
+blk_zone_mgmt(BlockBackend *blk, BlockZoneOp op, int64_t offset, int64_t len);
+
 /**
  * bdrv_parent_drained_begin_single:
  *
diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
index 7f7863cc9e..de44c7b6f4 100644
--- a/include/block/block_int-common.h
+++ b/include/block/block_int-common.h
@@ -94,7 +94,6 @@ typedef struct BdrvTrackedRequest {
     struct BdrvTrackedRequest *waiting_for;
 } BdrvTrackedRequest;
 
-
 struct BlockDriver {
     /*
      * These fields are initialized when this object is created,
@@ -691,6 +690,12 @@ struct BlockDriver {
                                           QEMUIOVector *qiov,
                                           int64_t pos);
 
+    int coroutine_fn (*bdrv_co_zone_report)(BlockDriverState *bs,
+            int64_t offset, unsigned int *nr_zones,
+            BlockZoneDescriptor *zones);
+    int coroutine_fn (*bdrv_co_zone_mgmt)(BlockDriverState *bs, BlockZoneOp op,
+            int64_t offset, int64_t len);
+
     /* removable device specific */
     bool (*bdrv_is_inserted)(BlockDriverState *bs);
     void (*bdrv_eject)(BlockDriverState *bs, bool eject_flag);
@@ -828,6 +833,21 @@ typedef struct BlockLimits {
 
     /* device zone model */
     BlockZoneModel zoned;
+
+    /* zone size expressed in 512-byte sectors */
+    uint32_t zone_sectors;
+
+    /* total number of zones */
+    unsigned int nr_zones;
+
+    /* maximum size in bytes of a zone append write operation */
+    int64_t zone_append_max_bytes;
+
+    /* maximum number of open zones */
+    int64_t max_open_zones;
+
+    /* maximum number of active zones */
+    int64_t max_active_zones;
 } BlockLimits;
 
 typedef struct BdrvOpBlocker BdrvOpBlocker;
diff --git a/include/block/raw-aio.h b/include/block/raw-aio.h
index 21fc10c4c9..3d26929cdd 100644
--- a/include/block/raw-aio.h
+++ b/include/block/raw-aio.h
@@ -29,6 +29,8 @@
 #define QEMU_AIO_WRITE_ZEROES 0x0020
 #define QEMU_AIO_COPY_RANGE   0x0040
 #define QEMU_AIO_TRUNCATE     0x0080
+#define QEMU_AIO_ZONE_REPORT  0x0100
+#define QEMU_AIO_ZONE_MGMT    0x0200
 #define QEMU_AIO_TYPE_MASK \
         (QEMU_AIO_READ | \
          QEMU_AIO_WRITE | \
@@ -37,7 +39,9 @@
          QEMU_AIO_DISCARD | \
          QEMU_AIO_WRITE_ZEROES | \
          QEMU_AIO_COPY_RANGE | \
-         QEMU_AIO_TRUNCATE)
+         QEMU_AIO_TRUNCATE  | \
+         QEMU_AIO_ZONE_REPORT | \
+         QEMU_AIO_ZONE_MGMT)
 
 /* AIO flags */
 #define QEMU_AIO_MISALIGNED   0x1000
diff --git a/include/sysemu/block-backend-io.h b/include/sysemu/block-backend-io.h
index 50f5aa2e07..6e7df1d93b 100644
--- a/include/sysemu/block-backend-io.h
+++ b/include/sysemu/block-backend-io.h
@@ -156,6 +156,12 @@ int generated_co_wrapper blk_pwrite_zeroes(BlockBackend *blk, int64_t offset,
 int coroutine_fn blk_co_pwrite_zeroes(BlockBackend *blk, int64_t offset,
                                       int64_t bytes, BdrvRequestFlags flags);
 
+int coroutine_fn blk_co_zone_report(BlockBackend *blk, int64_t offset,
+                                    unsigned int *nr_zones,
+                                    BlockZoneDescriptor *zones);
+int coroutine_fn blk_co_zone_mgmt(BlockBackend *blk, BlockZoneOp op,
+                                  int64_t offset, int64_t len);
+
 int generated_co_wrapper blk_pdiscard(BlockBackend *blk, int64_t offset,
                                       int64_t bytes);
 int coroutine_fn blk_co_pdiscard(BlockBackend *blk, int64_t offset,
diff --git a/meson.build b/meson.build
index 294e9a8f32..c3219b0e87 100644
--- a/meson.build
+++ b/meson.build
@@ -1883,6 +1883,7 @@ config_host_data.set('CONFIG_REPLICATION', get_option('live_block_migration').al
 # has_header
 config_host_data.set('CONFIG_EPOLL', cc.has_header('sys/epoll.h'))
 config_host_data.set('CONFIG_LINUX_MAGIC_H', cc.has_header('linux/magic.h'))
+config_host_data.set('CONFIG_BLKZONED', cc.has_header('linux/blkzoned.h'))
 config_host_data.set('CONFIG_VALGRIND_H', cc.has_header('valgrind/valgrind.h'))
 config_host_data.set('HAVE_BTRFS_H', cc.has_header('linux/btrfs.h'))
 config_host_data.set('HAVE_DRM_H', cc.has_header('libdrm/drm.h'))
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 2173e7734a..c6bbb7a037 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -2942,6 +2942,7 @@
 # @compress: Since 5.0
 # @copy-before-write: Since 6.2
 # @snapshot-access: Since 7.0
+# @zoned_host_device: Since 7.2
 #
 # Since: 2.9
 ##
@@ -2955,7 +2956,8 @@
             'luks', 'nbd', 'nfs', 'null-aio', 'null-co', 'nvme', 'parallels',
             'preallocate', 'qcow', 'qcow2', 'qed', 'quorum', 'raw', 'rbd',
             { 'name': 'replication', 'if': 'CONFIG_REPLICATION' },
-            'ssh', 'throttle', 'vdi', 'vhdx', 'vmdk', 'vpc', 'vvfat' ] }
+            'ssh', 'throttle', 'vdi', 'vhdx', 'vmdk', 'vpc', 'vvfat',
+            { 'name': 'zoned_host_device', 'if': 'CONFIG_BLKZONED' } ] }
 
 ##
 # @BlockdevOptionsFile:
@@ -4329,7 +4331,9 @@
       'vhdx':       'BlockdevOptionsGenericFormat',
       'vmdk':       'BlockdevOptionsGenericCOWFormat',
       'vpc':        'BlockdevOptionsGenericFormat',
-      'vvfat':      'BlockdevOptionsVVFAT'
+      'vvfat':      'BlockdevOptionsVVFAT',
+      'zoned_host_device': { 'type': 'BlockdevOptionsFile',
+                             'if': 'CONFIG_BLKZONED' }
   } }
 
 ##
diff --git a/qemu-io-cmds.c b/qemu-io-cmds.c
index 952dc940f1..687c3a624c 100644
--- a/qemu-io-cmds.c
+++ b/qemu-io-cmds.c
@@ -1712,6 +1712,144 @@ static const cmdinfo_t flush_cmd = {
     .oneline    = "flush all in-core file state to disk",
 };
 
+static int zone_report_f(BlockBackend *blk, int argc, char **argv)
+{
+    int ret;
+    int64_t offset;
+    unsigned int nr_zones;
+
+    ++optind;
+    offset = cvtnum(argv[optind]);
+    ++optind;
+    nr_zones = cvtnum(argv[optind]);
+
+    g_autofree BlockZoneDescriptor *zones = NULL;
+    zones = g_new(BlockZoneDescriptor, nr_zones);
+    ret = blk_zone_report(blk, offset, &nr_zones, zones);
+    if (ret < 0) {
+        printf("zone report failed: %s\n", strerror(-ret));
+    } else {
+        for (int i = 0; i < nr_zones; ++i) {
+            printf("start: 0x%" PRIx64 ", len 0x%" PRIx64 ", "
+                   "cap"" 0x%" PRIx64 ",wptr 0x%" PRIx64 ", "
+                   "zcond:%u, [type: %u]\n",
+                   zones[i].start, zones[i].length, zones[i].cap, zones[i].wp,
+                   zones[i].cond, zones[i].type);
+        }
+    }
+    return ret;
+}
+
+static const cmdinfo_t zone_report_cmd = {
+        .name = "zone_report",
+        .altname = "zrp",
+        .cfunc = zone_report_f,
+        .argmin = 2,
+        .argmax = 2,
+        .args = "offset number",
+        .oneline = "report zone information",
+};
+
+static int zone_open_f(BlockBackend *blk, int argc, char **argv)
+{
+    int ret;
+    int64_t offset, len;
+    ++optind;
+    offset = cvtnum(argv[optind]);
+    ++optind;
+    len = cvtnum(argv[optind]);
+    ret = blk_zone_mgmt(blk, BLK_ZO_OPEN, offset, len);
+    if (ret < 0) {
+        printf("zone open failed: %s\n", strerror(-ret));
+    }
+    return ret;
+}
+
+static const cmdinfo_t zone_open_cmd = {
+        .name = "zone_open",
+        .altname = "zo",
+        .cfunc = zone_open_f,
+        .argmin = 2,
+        .argmax = 2,
+        .args = "offset len",
+        .oneline = "explicit open a range of zones in zone block device",
+};
+
+static int zone_close_f(BlockBackend *blk, int argc, char **argv)
+{
+    int ret;
+    int64_t offset, len;
+    ++optind;
+    offset = cvtnum(argv[optind]);
+    ++optind;
+    len = cvtnum(argv[optind]);
+    ret = blk_zone_mgmt(blk, BLK_ZO_CLOSE, offset, len);
+    if (ret < 0) {
+        printf("zone close failed: %s\n", strerror(-ret));
+    }
+    return ret;
+}
+
+static const cmdinfo_t zone_close_cmd = {
+        .name = "zone_close",
+        .altname = "zc",
+        .cfunc = zone_close_f,
+        .argmin = 2,
+        .argmax = 2,
+        .args = "offset len",
+        .oneline = "close a range of zones in zone block device",
+};
+
+static int zone_finish_f(BlockBackend *blk, int argc, char **argv)
+{
+    int ret;
+    int64_t offset, len;
+    ++optind;
+    offset = cvtnum(argv[optind]);
+    ++optind;
+    len = cvtnum(argv[optind]);
+    ret = blk_zone_mgmt(blk, BLK_ZO_FINISH, offset, len);
+    if (ret < 0) {
+        printf("zone finish failed: %s\n", strerror(-ret));
+    }
+    return ret;
+}
+
+static const cmdinfo_t zone_finish_cmd = {
+        .name = "zone_finish",
+        .altname = "zf",
+        .cfunc = zone_finish_f,
+        .argmin = 2,
+        .argmax = 2,
+        .args = "offset len",
+        .oneline = "finish a range of zones in zone block device",
+};
+
+static int zone_reset_f(BlockBackend *blk, int argc, char **argv)
+{
+    int ret;
+    int64_t offset, len;
+    ++optind;
+    offset = cvtnum(argv[optind]);
+    ++optind;
+    len = cvtnum(argv[optind]);
+    ret = blk_zone_mgmt(blk, BLK_ZO_RESET, offset, len);
+    if (ret < 0) {
+        printf("zone reset failed: %s\n", strerror(-ret));
+    }
+    return ret;
+}
+
+static const cmdinfo_t zone_reset_cmd = {
+        .name = "zone_reset",
+        .altname = "zrs",
+        .cfunc = zone_reset_f,
+        .argmin = 2,
+        .argmax = 2,
+        .args = "offset len",
+        .oneline = "reset a zone write pointer in zone block device",
+};
+
 static int truncate_f(BlockBackend *blk, int argc, char **argv);
 static const cmdinfo_t truncate_cmd = {
     .name       = "truncate",
@@ -2504,6 +2642,11 @@ static void __attribute((constructor)) init_qemuio_commands(void)
     qemuio_add_command(&aio_write_cmd);
     qemuio_add_command(&aio_flush_cmd);
     qemuio_add_command(&flush_cmd);
+    qemuio_add_command(&zone_report_cmd);
+    qemuio_add_command(&zone_open_cmd);
+    qemuio_add_command(&zone_close_cmd);
+    qemuio_add_command(&zone_finish_cmd);
+    qemuio_add_command(&zone_reset_cmd);
     qemuio_add_command(&truncate_cmd);
     qemuio_add_command(&length_cmd);
     qemuio_add_command(&info_cmd);
-- 
2.37.1



^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v7 5/8] raw-format: add zone operations to pass through requests
  2022-08-16  6:25 [PATCH v7 0/8] Add support for zoned device Sam Li
                   ` (3 preceding siblings ...)
  2022-08-16  6:25 ` [PATCH v7 4/8] block: add block layer APIs resembling Linux ZonedBlockDevice ioctls Sam Li
@ 2022-08-16  6:25 ` Sam Li
  2022-08-16  6:25 ` [PATCH v7 6/8] config: add check to block layer Sam Li
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 28+ messages in thread
From: Sam Li @ 2022-08-16  6:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: hare, Fam Zheng, Kevin Wolf, Eric Blake, Markus Armbruster,
	stefanha, Hanna Reitz, dmitry.fomichev, qemu-block, damien.lemoal,
	Sam Li

raw-format driver usually sits on top of file-posix driver. It needs to
pass through requests of zone commands.

Signed-off-by: Sam Li <faithilikerun@gmail.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 block/raw-format.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/block/raw-format.c b/block/raw-format.c
index 69fd650eaf..6b20bd22ef 100644
--- a/block/raw-format.c
+++ b/block/raw-format.c
@@ -314,6 +314,17 @@ static int coroutine_fn raw_co_pdiscard(BlockDriverState *bs,
     return bdrv_co_pdiscard(bs->file, offset, bytes);
 }
 
+static int coroutine_fn raw_co_zone_report(BlockDriverState *bs, int64_t offset,
+                                           unsigned int *nr_zones,
+                                           BlockZoneDescriptor *zones) {
+    return bdrv_co_zone_report(bs->file->bs, offset, nr_zones, zones);
+}
+
+static int coroutine_fn raw_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
+                                         int64_t offset, int64_t len) {
+    return bdrv_co_zone_mgmt(bs->file->bs, op, offset, len);
+}
+
 static int64_t raw_getlength(BlockDriverState *bs)
 {
     int64_t len;
@@ -614,6 +625,8 @@ BlockDriver bdrv_raw = {
     .bdrv_co_pwritev      = &raw_co_pwritev,
     .bdrv_co_pwrite_zeroes = &raw_co_pwrite_zeroes,
     .bdrv_co_pdiscard     = &raw_co_pdiscard,
+    .bdrv_co_zone_report  = &raw_co_zone_report,
+    .bdrv_co_zone_mgmt  = &raw_co_zone_mgmt,
     .bdrv_co_block_status = &raw_co_block_status,
     .bdrv_co_copy_range_from = &raw_co_copy_range_from,
     .bdrv_co_copy_range_to  = &raw_co_copy_range_to,
-- 
2.37.1



^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v7 6/8] config: add check to block layer
  2022-08-16  6:25 [PATCH v7 0/8] Add support for zoned device Sam Li
                   ` (4 preceding siblings ...)
  2022-08-16  6:25 ` [PATCH v7 5/8] raw-format: add zone operations to pass through requests Sam Li
@ 2022-08-16  6:25 ` Sam Li
  2022-08-23  0:54   ` Stefan Hajnoczi
  2022-08-16  6:25 ` [PATCH v7 7/8] qemu-iotests: test new zone operations Sam Li
  2022-08-16  6:25 ` [PATCH v7 8/8] docs/zoned-storage: add zoned device documentation Sam Li
  7 siblings, 1 reply; 28+ messages in thread
From: Sam Li @ 2022-08-16  6:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: hare, Fam Zheng, Kevin Wolf, Eric Blake, Markus Armbruster,
	stefanha, Hanna Reitz, dmitry.fomichev, qemu-block, damien.lemoal,
	Sam Li

Putting zoned/non-zoned BlockDrivers on top of each other is not
allowed.

Signed-off-by: Sam Li <faithilikerun@gmail.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 block.c                          | 14 ++++++++++++++
 block/raw-format.c               |  1 +
 include/block/block_int-common.h |  5 +++++
 3 files changed, 20 insertions(+)

diff --git a/block.c b/block.c
index bc85f46eed..affe6c597e 100644
--- a/block.c
+++ b/block.c
@@ -7947,6 +7947,20 @@ void bdrv_add_child(BlockDriverState *parent_bs, BlockDriverState *child_bs,
         return;
     }
 
+    /*
+     * Non-zoned block drivers do not follow zoned storage constraints
+     * (i.e. sequential writes to zones). Refuse mixing zoned and non-zoned
+     * drivers in a graph.
+     */
+    if (!parent_bs->drv->supports_zoned_children &&
+        child_bs->bl.zoned != BLK_Z_HM) {
+        error_setg(errp, "Cannot add a %s child to a %s parent",
+                   child_bs->bl.zoned == BLK_Z_HM ? "zoned" : "non-zoned",
+                   parent_bs->drv->supports_zoned_children ?
+                   "support zoned children" : "not support zoned children");
+        return;
+    }
+
     if (!QLIST_EMPTY(&child_bs->parents)) {
         error_setg(errp, "The node %s already has a parent",
                    child_bs->node_name);
diff --git a/block/raw-format.c b/block/raw-format.c
index 6b20bd22ef..9441536819 100644
--- a/block/raw-format.c
+++ b/block/raw-format.c
@@ -614,6 +614,7 @@ static void raw_child_perm(BlockDriverState *bs, BdrvChild *c,
 BlockDriver bdrv_raw = {
     .format_name          = "raw",
     .instance_size        = sizeof(BDRVRawState),
+    .supports_zoned_children = true,
     .bdrv_probe           = &raw_probe,
     .bdrv_reopen_prepare  = &raw_reopen_prepare,
     .bdrv_reopen_commit   = &raw_reopen_commit,
diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
index de44c7b6f4..4c44592b59 100644
--- a/include/block/block_int-common.h
+++ b/include/block/block_int-common.h
@@ -126,6 +126,11 @@ struct BlockDriver {
      */
     bool is_format;
 
+    /*
+     * Set to true if the BlockDriver supports zoned children.
+     */
+    bool supports_zoned_children;
+
     /*
      * Drivers not implementing bdrv_parse_filename nor bdrv_open should have
      * this field set to true, except ones that are defined only by their
-- 
2.37.1



^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v7 7/8] qemu-iotests: test new zone operations
  2022-08-16  6:25 [PATCH v7 0/8] Add support for zoned device Sam Li
                   ` (5 preceding siblings ...)
  2022-08-16  6:25 ` [PATCH v7 6/8] config: add check to block layer Sam Li
@ 2022-08-16  6:25 ` Sam Li
  2022-08-16  6:25 ` [PATCH v7 8/8] docs/zoned-storage: add zoned device documentation Sam Li
  7 siblings, 0 replies; 28+ messages in thread
From: Sam Li @ 2022-08-16  6:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: hare, Fam Zheng, Kevin Wolf, Eric Blake, Markus Armbruster,
	stefanha, Hanna Reitz, dmitry.fomichev, qemu-block, damien.lemoal,
	Sam Li

We have added new block layer APIs of zoned block devices. Test it with:
Create a null_blk device, run each zone operation on it and see
whether reporting right zone information.

Signed-off-by: Sam Li <faithilikerun@gmail.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 tests/qemu-iotests/tests/zoned.out | 53 ++++++++++++++++++
 tests/qemu-iotests/tests/zoned.sh  | 86 ++++++++++++++++++++++++++++++
 2 files changed, 139 insertions(+)
 create mode 100644 tests/qemu-iotests/tests/zoned.out
 create mode 100755 tests/qemu-iotests/tests/zoned.sh

diff --git a/tests/qemu-iotests/tests/zoned.out b/tests/qemu-iotests/tests/zoned.out
new file mode 100644
index 0000000000..d09be2ffcd
--- /dev/null
+++ b/tests/qemu-iotests/tests/zoned.out
@@ -0,0 +1,53 @@
+QA output created by zoned.sh
+Testing a null_blk device:
+Simple cases: if the operations work
+(1) report the first zone:
+start: 0x0, len 0x80000, cap 0x80000,wptr 0x0, zcond:1, [type: 2]
+
+report the first 10 zones
+start: 0x0, len 0x80000, cap 0x80000,wptr 0x0, zcond:1, [type: 2]
+start: 0x80000, len 0x80000, cap 0x80000,wptr 0x80000, zcond:1, [type: 2]
+start: 0x100000, len 0x80000, cap 0x80000,wptr 0x100000, zcond:1, [type: 2]
+start: 0x180000, len 0x80000, cap 0x80000,wptr 0x180000, zcond:1, [type: 2]
+start: 0x200000, len 0x80000, cap 0x80000,wptr 0x200000, zcond:1, [type: 2]
+start: 0x280000, len 0x80000, cap 0x80000,wptr 0x280000, zcond:1, [type: 2]
+start: 0x300000, len 0x80000, cap 0x80000,wptr 0x300000, zcond:1, [type: 2]
+start: 0x380000, len 0x80000, cap 0x80000,wptr 0x380000, zcond:1, [type: 2]
+start: 0x400000, len 0x80000, cap 0x80000,wptr 0x400000, zcond:1, [type: 2]
+start: 0x480000, len 0x80000, cap 0x80000,wptr 0x480000, zcond:1, [type: 2]
+
+report the last zone:
+start: 0x1f380000, len 0x80000, cap 0x80000,wptr 0x1f380000, zcond:1, [type: 2]
+
+
+(2) opening the first zone
+report after:
+start: 0x0, len 0x80000, cap 0x80000,wptr 0x0, zcond:3, [type: 2]
+
+opening the second zone
+report after:
+start: 0x80000, len 0x80000, cap 0x80000,wptr 0x80000, zcond:3, [type: 2]
+
+opening the last zone
+report after:
+start: 0x1f380000, len 0x80000, cap 0x80000,wptr 0x1f380000, zcond:3, [type: 2]
+
+
+(3) closing the first zone
+report after:
+start: 0x0, len 0x80000, cap 0x80000,wptr 0x0, zcond:1, [type: 2]
+
+closing the last zone
+report after:
+start: 0x1f380000, len 0x80000, cap 0x80000,wptr 0x1f380000, zcond:1, [type: 2]
+
+
+(4) finishing the second zone
+After finishing a zone:
+start: 0x80000, len 0x80000, cap 0x80000,wptr 0x100000, zcond:14, [type: 2]
+
+
+(5) resetting the second zone
+After resetting a zone:
+start: 0x80000, len 0x80000, cap 0x80000,wptr 0x80000, zcond:1, [type: 2]
+*** done
diff --git a/tests/qemu-iotests/tests/zoned.sh b/tests/qemu-iotests/tests/zoned.sh
new file mode 100755
index 0000000000..d158db09c8
--- /dev/null
+++ b/tests/qemu-iotests/tests/zoned.sh
@@ -0,0 +1,86 @@
+#!/usr/bin/env bash
+#
+# Test zone management operations.
+#
+
+seq="$(basename $0)"
+echo "QA output created by $seq"
+status=1 # failure is the default!
+
+_cleanup()
+{
+  _cleanup_test_img
+  sudo rmmod null_blk
+}
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+# get standard environment, filters and checks
+. ./common.rc
+. ./common.filter
+. ./common.qemu
+
+# This test only runs on Linux hosts with raw image files.
+_supported_fmt raw
+_supported_proto file
+_supported_os Linux
+
+QEMU_IO="build/qemu-io"
+IMG="--image-opts -n driver=zoned_host_device,filename=/dev/nullb0"
+QEMU_IO_OPTIONS=$QEMU_IO_OPTIONS_NO_FMT
+
+echo "Testing a null_blk device:"
+echo "Simple cases: if the operations work"
+sudo modprobe null_blk nr_devices=1 zoned=1
+
+echo "(1) report the first zone:"
+sudo $QEMU_IO $IMG -c "zrp 0 1"
+echo
+echo "report the first 10 zones"
+sudo $QEMU_IO $IMG -c "zrp 0 10"
+echo
+echo "report the last zone:"
+sudo $QEMU_IO $IMG -c "zrp 0x3e70000000 2"
+echo
+echo
+echo "(2) opening the first zone"
+sudo $QEMU_IO $IMG -c "zo 0 0x80000"
+echo "report after:"
+sudo $QEMU_IO $IMG -c "zrp 0 1"
+echo
+echo "opening the second zone"
+sudo $QEMU_IO $IMG -c "zo 524288 0x80000" # 524288 is the zone sector size
+echo "report after:"
+sudo $QEMU_IO $IMG -c "zrp 268435456 1" # 268435456 / 512 = 524288
+echo
+echo "opening the last zone"
+sudo $QEMU_IO $IMG -c "zo 0x1f380000 0x80000"
+echo "report after:"
+sudo $QEMU_IO $IMG -c "zrp 0x3e70000000 2"
+echo
+echo
+echo "(3) closing the first zone"
+sudo $QEMU_IO $IMG -c "zc 0 0x80000"
+echo "report after:"
+sudo $QEMU_IO $IMG -c "zrp 0 1"
+echo
+echo "closing the last zone"
+sudo $QEMU_IO $IMG -c "zc 0x1f380000 0x80000"
+echo "report after:"
+sudo $QEMU_IO $IMG -c "zrp 0x3e70000000 2"
+echo
+echo
+echo "(4) finishing the second zone"
+sudo $QEMU_IO $IMG -c "zf 524288 0x80000"
+echo "After finishing a zone:"
+sudo $QEMU_IO $IMG -c "zrp 268435456 1"
+echo
+echo
+
+echo "(5) resetting the second zone"
+sudo $QEMU_IO $IMG -c "zrs 524288 0x80000"
+echo "After resetting a zone:"
+sudo $QEMU_IO $IMG -c "zrp 268435456 1"
+# success, all done
+echo "*** done"
+rm -f $seq.full
+status=0
-- 
2.37.1



^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v7 8/8] docs/zoned-storage: add zoned device documentation
  2022-08-16  6:25 [PATCH v7 0/8] Add support for zoned device Sam Li
                   ` (6 preceding siblings ...)
  2022-08-16  6:25 ` [PATCH v7 7/8] qemu-iotests: test new zone operations Sam Li
@ 2022-08-16  6:25 ` Sam Li
  7 siblings, 0 replies; 28+ messages in thread
From: Sam Li @ 2022-08-16  6:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: hare, Fam Zheng, Kevin Wolf, Eric Blake, Markus Armbruster,
	stefanha, Hanna Reitz, dmitry.fomichev, qemu-block, damien.lemoal,
	Sam Li

Add the documentation about the zoned device support to virtio-blk
emulation.

Signed-off-by: Sam Li <faithilikerun@gmail.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 docs/devel/zoned-storage.rst           | 41 ++++++++++++++++++++++++++
 docs/system/qemu-block-drivers.rst.inc |  6 ++++
 2 files changed, 47 insertions(+)
 create mode 100644 docs/devel/zoned-storage.rst

diff --git a/docs/devel/zoned-storage.rst b/docs/devel/zoned-storage.rst
new file mode 100644
index 0000000000..ead2d149cc
--- /dev/null
+++ b/docs/devel/zoned-storage.rst
@@ -0,0 +1,41 @@
+=============
+zoned-storage
+=============
+
+Zoned Block Devices (ZBDs) devide the LBA space into block regions called zones
+that are larger than the LBA size. It can only allow sequential writes, which
+reduces write amplification in SSDs, leading to higher throughput and increased
+capacity. More details about ZBDs can be found at:
+
+https://zonedstorage.io/docs/introduction/zoned-storage
+
+1. Block layer APIs for zoned storage
+-------------------------------------
+QEMU block layer has three zoned storage model:
+- BLK_Z_HM: This model only allows sequential writes access. It supports a set
+of ZBD-specific I/O request that used by the host to manage device zones.
+- BLK_Z_HA: It deals with both sequential writes and random writes access.
+- BLK_Z_NONE: Regular block devices and drive-managed ZBDs are treated as
+non-zoned devices.
+
+The block device information resides inside BlockDriverState. QEMU uses
+BlockLimits struct(BlockDriverState::bl) that is continuously accessed by the
+block layer while processing I/O requests. A BlockBackend has a root pointer to
+a BlockDriverState graph(for example, raw format on top of file-posix). The
+zoned storage information can be propagated from the leaf BlockDriverState all
+the way up to the BlockBackend. If the zoned storage model in file-posix is
+set to BLK_Z_HM, then block drivers will declare support for zoned host device.
+
+The block layer APIs support commands needed for zoned storage devices,
+including report zones, four zone operations, and zone append.
+
+2. Emulating zoned storage controllers
+--------------------------------------
+When the BlockBackend's BlockLimits model reports a zoned storage device, users
+like the virtio-blk emulation or the qemu-io-cmds.c utility can use block layer
+APIs for zoned storage emulation or testing.
+
+For example, the command line for zone report testing a null_blk device of
+qemu-io-cmds.c is:
+$ path/to/qemu-io --image-opts driver=zoned_host_device,filename=/dev/nullb0 -c
+"zrp offset nr_zones"
diff --git a/docs/system/qemu-block-drivers.rst.inc b/docs/system/qemu-block-drivers.rst.inc
index dfe5d2293d..0b97227fd9 100644
--- a/docs/system/qemu-block-drivers.rst.inc
+++ b/docs/system/qemu-block-drivers.rst.inc
@@ -430,6 +430,12 @@ Hard disks
   you may corrupt your host data (use the ``-snapshot`` command
   line option or modify the device permissions accordingly).
 
+Zoned block devices
+  Zoned block devices can be passed through to the guest if the emulated storage
+  controller supports zoned storage. Use ``--blockdev zoned_host_device,
+  node-name=drive0,filename=/dev/nullb0`` to pass through ``/dev/nullb0``
+  as ``drive0``.
+
 Windows
 ^^^^^^^
 
-- 
2.37.1



^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH v7 2/8] file-posix: introduce get_sysfs_str_val for device zoned model
  2022-08-16  6:25 ` [PATCH v7 2/8] file-posix: introduce get_sysfs_str_val for device zoned model Sam Li
@ 2022-08-16 16:11   ` Sam Li
  2022-08-16 17:32   ` Damien Le Moal
  2022-08-22 23:05   ` Stefan Hajnoczi
  2 siblings, 0 replies; 28+ messages in thread
From: Sam Li @ 2022-08-16 16:11 UTC (permalink / raw)
  To: qemu-devel
  Cc: Hannes Reinecke, Fam Zheng, Kevin Wolf, Eric Blake,
	Markus Armbruster, Stefan Hajnoczi, Hanna Reitz, Dmitry Fomichev,
	qemu block, Damien Le Moal

Sam Li <faithilikerun@gmail.com> 于2022年8月16日周二 14:25写道:
>
> Use sysfs attribute files to get the string value of device
> zoned model. Then get_sysfs_zoned_model can convert it to
> BlockZoneModel type in QEMU.
>
> Signed-off-by: Sam Li <faithilikerun@gmail.com>
> Reviewed-by: Hannes Reinecke <hare@suse.de>
> ---
>  block/file-posix.c               | 93 ++++++++++++++++++--------------
>  include/block/block_int-common.h |  3 ++
>  2 files changed, 55 insertions(+), 41 deletions(-)
>
> diff --git a/block/file-posix.c b/block/file-posix.c
> index 48cd096624..c07ac4c697 100644
> --- a/block/file-posix.c
> +++ b/block/file-posix.c
> @@ -1210,66 +1210,71 @@ static int hdev_get_max_hw_transfer(int fd, struct stat *st)
>  #endif
>  }
>
> -static int hdev_get_max_segments(int fd, struct stat *st)
> -{
> +/*
> + * Convert the zoned attribute file in sysfs to internal value.
> + */
> +static int get_sysfs_str_val(struct stat *st, const char *attribute,
> +                             char **val) {
>  #ifdef CONFIG_LINUX
> -    char buf[32];
> -    const char *end;
> -    char *sysfspath = NULL;
> +    g_autofree char *sysfspath = NULL;
>      int ret;
> -    int sysfd = -1;
> -    long max_segments;
> -
> -    if (S_ISCHR(st->st_mode)) {
> -        if (ioctl(fd, SG_GET_SG_TABLESIZE, &ret) == 0) {
> -            return ret;
> -        }
> -        return -ENOTSUP;
> -    }
> +    size_t len;
>
>      if (!S_ISBLK(st->st_mode)) {
>          return -ENOTSUP;
>      }
>
> -    sysfspath = g_strdup_printf("/sys/dev/block/%u:%u/queue/max_segments",
> -                                major(st->st_rdev), minor(st->st_rdev));
> -    sysfd = open(sysfspath, O_RDONLY);
> -    if (sysfd == -1) {
> -        ret = -errno;
> -        goto out;
> +    sysfspath = g_strdup_printf("/sys/dev/block/%u:%u/queue/%s",
> +                                major(st->st_rdev), minor(st->st_rdev),
> +                                attribute);
> +    ret = g_file_get_contents(sysfspath, val, &len, NULL);
> +    if (ret == -1) {
> +        return -ENOENT;
>      }

+/* The file is ended with '\n' */
+char *p;
+p = *val;
+if (*(p + len - 1) == '\n') {
+    *(p + len - 1) = '\0';
+}

I'm sorry to miss this part to make the str end with '\0'.

> -    do {
> -        ret = read(sysfd, buf, sizeof(buf) - 1);
> -    } while (ret == -1 && errno == EINTR);
> +    return ret;
> +#else
> +    return -ENOTSUP;
> +#endif
> +}
> +
> +static int get_sysfs_zoned_model(struct stat *st, BlockZoneModel *zoned) {
> +    g_autofree char *val = NULL;
> +    int ret;
> +
> +    ret = get_sysfs_str_val(st, "zoned", &val);
>      if (ret < 0) {
> -        ret = -errno;
> -        goto out;
> -    } else if (ret == 0) {
> -        ret = -EIO;
> -        goto out;
> +        return ret;
>      }
> -    buf[ret] = 0;
> -    /* The file is ended with '\n', pass 'end' to accept that. */
> -    ret = qemu_strtol(buf, &end, 10, &max_segments);
> -    if (ret == 0 && end && *end == '\n') {
> -        ret = max_segments;
> +
> +    if (strcmp(val, "host-managed") == 0) {
> +        *zoned = BLK_Z_HM;
> +    } else if (strcmp(val, "host-aware") == 0) {
> +        *zoned = BLK_Z_HA;
> +    } else if (strcmp(val, "none") == 0) {
> +        *zoned = BLK_Z_NONE;
> +    } else {
> +        return -ENOTSUP;
>      }
> +    return 0;
> +}
>
> -out:
> -    if (sysfd != -1) {
> -        close(sysfd);
> +static int hdev_get_max_segments(int fd, struct stat *st) {
> +    int ret;
> +    if (S_ISCHR(st->st_mode)) {
> +        if (ioctl(fd, SG_GET_SG_TABLESIZE, &ret) == 0) {
> +            return ret;
> +        }
> +        return -ENOTSUP;
>      }
> -    g_free(sysfspath);
> -    return ret;
> -#else
> -    return -ENOTSUP;
> -#endif
> +    return get_sysfs_long_val(st, "max_segments");
>  }
>
>  static void raw_refresh_limits(BlockDriverState *bs, Error **errp)
>  {
>      BDRVRawState *s = bs->opaque;
>      struct stat st;
> +    int ret;
> +    BlockZoneModel zoned;
>
>      s->needs_alignment = raw_needs_alignment(bs);
>      raw_probe_alignment(bs, s->fd, errp);
> @@ -1307,6 +1312,12 @@ static void raw_refresh_limits(BlockDriverState *bs, Error **errp)
>              bs->bl.max_hw_iov = ret;
>          }
>      }
> +
> +    ret = get_sysfs_zoned_model(s->fd, &st, &zoned);
> +    if (ret < 0) {
> +        zoned = BLK_Z_NONE;
> +    }
> +    bs->bl.zoned = zoned;
>  }
>
>  static int check_for_dasd(int fd)
> diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
> index 8947abab76..7f7863cc9e 100644
> --- a/include/block/block_int-common.h
> +++ b/include/block/block_int-common.h
> @@ -825,6 +825,9 @@ typedef struct BlockLimits {
>
>      /* maximum number of iovec elements */
>      int max_iov;
> +
> +    /* device zone model */
> +    BlockZoneModel zoned;
>  } BlockLimits;
>
>  typedef struct BdrvOpBlocker BdrvOpBlocker;
> --
> 2.37.1
>


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v7 3/8] file-posix: introduce get_sysfs_long_val for the long sysfs attribute
  2022-08-16  6:25 ` [PATCH v7 3/8] file-posix: introduce get_sysfs_long_val for the long sysfs attribute Sam Li
@ 2022-08-16 16:13   ` Sam Li
  2022-08-16 17:35   ` Damien Le Moal
  1 sibling, 0 replies; 28+ messages in thread
From: Sam Li @ 2022-08-16 16:13 UTC (permalink / raw)
  To: qemu-devel
  Cc: Hannes Reinecke, Fam Zheng, Kevin Wolf, Eric Blake,
	Markus Armbruster, Stefan Hajnoczi, Hanna Reitz, Dmitry Fomichev,
	qemu block, Damien Le Moal

Sam Li <faithilikerun@gmail.com> 于2022年8月16日周二 14:25写道:
>
> Use sysfs attribute files to get the long value of zoned device
> information.
>
> Signed-off-by: Sam Li <faithilikerun@gmail.com>
> Reviewed-by: Hannes Reinecke <hare@suse.de>
> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  block/file-posix.c | 27 +++++++++++++++++++++++++++
>  1 file changed, 27 insertions(+)
>
> diff --git a/block/file-posix.c b/block/file-posix.c
> index c07ac4c697..727389488c 100644
> --- a/block/file-posix.c
> +++ b/block/file-posix.c
> @@ -1258,6 +1258,33 @@ static int get_sysfs_zoned_model(struct stat *st, BlockZoneModel *zoned) {
>      return 0;
>  }
>
> +/*
> + * Get zoned device information (chunk_sectors, zoned_append_max_bytes,
> + * max_open_zones, max_active_zones) through sysfs attribute files.
> + */
> +static long get_sysfs_long_val(struct stat *st, const char *attribute) {
> +#ifdef CONFIG_LINUX
> +    g_autofree char *str = NULL;
> +    const char *end;
> +    long val;
> +    int ret;
> +
> +    ret = get_sysfs_str_val(st, attribute, &str);
> +    if (ret < 0) {
> +        return ret;
> +    }
> +
> +    /* The file is ended with '\n', pass 'end' to accept that. */
> +    ret = qemu_strtol(str, &end, 10, &val);
> +    if (ret == 0 && end && *end == '\n') {

Should be  "if (ret == 0 && end && *end == '\0') {". Changes accordingly.

> +        ret = val;
> +    }
> +    return ret;
> +#else
> +    return -ENOTSUP;
> +#endif
> +}
> +
>  static int hdev_get_max_segments(int fd, struct stat *st) {
>      int ret;
>      if (S_ISCHR(st->st_mode)) {
> --
> 2.37.1
>


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v7 1/8] include: add zoned device structs
  2022-08-16  6:25 ` [PATCH v7 1/8] include: add zoned device structs Sam Li
@ 2022-08-16 17:27   ` Damien Le Moal
  0 siblings, 0 replies; 28+ messages in thread
From: Damien Le Moal @ 2022-08-16 17:27 UTC (permalink / raw)
  To: Sam Li, qemu-devel
  Cc: hare, Fam Zheng, Kevin Wolf, Eric Blake, Markus Armbruster,
	stefanha, Hanna Reitz, dmitry.fomichev, qemu-block

On 2022/08/15 23:25, Sam Li wrote:
> Signed-off-by: Sam Li <faithilikerun@gmail.com>
> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

Looks good.

Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>

> ---
>  include/block/block-common.h | 43 ++++++++++++++++++++++++++++++++++++
>  1 file changed, 43 insertions(+)
> 
> diff --git a/include/block/block-common.h b/include/block/block-common.h
> index fdb7306e78..36bd0e480e 100644
> --- a/include/block/block-common.h
> +++ b/include/block/block-common.h
> @@ -49,6 +49,49 @@ typedef struct BlockDriver BlockDriver;
>  typedef struct BdrvChild BdrvChild;
>  typedef struct BdrvChildClass BdrvChildClass;
>  
> +typedef enum BlockZoneOp {
> +    BLK_ZO_OPEN,
> +    BLK_ZO_CLOSE,
> +    BLK_ZO_FINISH,
> +    BLK_ZO_RESET,
> +} BlockZoneOp;
> +
> +typedef enum BlockZoneModel {
> +    BLK_Z_NONE = 0x0, /* Regular block device */
> +    BLK_Z_HM = 0x1, /* Host-managed zoned block device */
> +    BLK_Z_HA = 0x2, /* Host-aware zoned block device */
> +} BlockZoneModel;
> +
> +typedef enum BlockZoneCondition {
> +    BLK_ZS_NOT_WP = 0x0,
> +    BLK_ZS_EMPTY = 0x1,
> +    BLK_ZS_IOPEN = 0x2,
> +    BLK_ZS_EOPEN = 0x3,
> +    BLK_ZS_CLOSED = 0x4,
> +    BLK_ZS_RDONLY = 0xD,
> +    BLK_ZS_FULL = 0xE,
> +    BLK_ZS_OFFLINE = 0xF,
> +} BlockZoneCondition;
> +
> +typedef enum BlockZoneType {
> +    BLK_ZT_CONV = 0x1, /* Conventional random writes supported */
> +    BLK_ZT_SWR = 0x2, /* Sequential writes required */
> +    BLK_ZT_SWP = 0x3, /* Sequential writes preferred */
> +} BlockZoneType;
> +
> +/*
> + * Zone descriptor data structure.
> + * Provides information on a zone with all position and size values in bytes.
> + */
> +typedef struct BlockZoneDescriptor {
> +    uint64_t start;
> +    uint64_t length;
> +    uint64_t cap;
> +    uint64_t wp;
> +    BlockZoneType type;
> +    BlockZoneCondition cond;
> +} BlockZoneDescriptor;
> +
>  typedef struct BlockDriverInfo {
>      /* in bytes, 0 if irrelevant */
>      int cluster_size;


-- 
Damien Le Moal
Western Digital Research


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v7 2/8] file-posix: introduce get_sysfs_str_val for device zoned model
  2022-08-16  6:25 ` [PATCH v7 2/8] file-posix: introduce get_sysfs_str_val for device zoned model Sam Li
  2022-08-16 16:11   ` Sam Li
@ 2022-08-16 17:32   ` Damien Le Moal
  2022-08-22 23:05   ` Stefan Hajnoczi
  2 siblings, 0 replies; 28+ messages in thread
From: Damien Le Moal @ 2022-08-16 17:32 UTC (permalink / raw)
  To: Sam Li, qemu-devel
  Cc: hare, Fam Zheng, Kevin Wolf, Eric Blake, Markus Armbruster,
	stefanha, Hanna Reitz, dmitry.fomichev, qemu-block

On 2022/08/15 23:25, Sam Li wrote:
> Use sysfs attribute files to get the string value of device
> zoned model. Then get_sysfs_zoned_model can convert it to
> BlockZoneModel type in QEMU.
> 
> Signed-off-by: Sam Li <faithilikerun@gmail.com>
> Reviewed-by: Hannes Reinecke <hare@suse.de>
> ---
>  block/file-posix.c               | 93 ++++++++++++++++++--------------
>  include/block/block_int-common.h |  3 ++
>  2 files changed, 55 insertions(+), 41 deletions(-)
> 
> diff --git a/block/file-posix.c b/block/file-posix.c
> index 48cd096624..c07ac4c697 100644
> --- a/block/file-posix.c
> +++ b/block/file-posix.c
> @@ -1210,66 +1210,71 @@ static int hdev_get_max_hw_transfer(int fd, struct stat *st)
>  #endif
>  }
>  
> -static int hdev_get_max_segments(int fd, struct stat *st)
> -{
> +/*
> + * Convert the zoned attribute file in sysfs to internal value.

This function does not convert anything. So this comment should be changed to
something like:

/*
 * Get a sysfs attribute value as a character string.
 */

> + */
> +static int get_sysfs_str_val(struct stat *st, const char *attribute,
> +                             char **val) {
>  #ifdef CONFIG_LINUX
> -    char buf[32];
> -    const char *end;
> -    char *sysfspath = NULL;
> +    g_autofree char *sysfspath = NULL;
>      int ret;
> -    int sysfd = -1;
> -    long max_segments;
> -
> -    if (S_ISCHR(st->st_mode)) {
> -        if (ioctl(fd, SG_GET_SG_TABLESIZE, &ret) == 0) {
> -            return ret;
> -        }
> -        return -ENOTSUP;
> -    }
> +    size_t len;
>  
>      if (!S_ISBLK(st->st_mode)) {
>          return -ENOTSUP;
>      }
>  
> -    sysfspath = g_strdup_printf("/sys/dev/block/%u:%u/queue/max_segments",
> -                                major(st->st_rdev), minor(st->st_rdev));
> -    sysfd = open(sysfspath, O_RDONLY);
> -    if (sysfd == -1) {
> -        ret = -errno;
> -        goto out;
> +    sysfspath = g_strdup_printf("/sys/dev/block/%u:%u/queue/%s",
> +                                major(st->st_rdev), minor(st->st_rdev),
> +                                attribute);
> +    ret = g_file_get_contents(sysfspath, val, &len, NULL);
> +    if (ret == -1) {
> +        return -ENOENT;
>      }
> -    do {
> -        ret = read(sysfd, buf, sizeof(buf) - 1);
> -    } while (ret == -1 && errno == EINTR);
> +    return ret;
> +#else
> +    return -ENOTSUP;
> +#endif
> +}
> +
> +static int get_sysfs_zoned_model(struct stat *st, BlockZoneModel *zoned) {
> +    g_autofree char *val = NULL;
> +    int ret;
> +
> +    ret = get_sysfs_str_val(st, "zoned", &val);
>      if (ret < 0) {
> -        ret = -errno;
> -        goto out;
> -    } else if (ret == 0) {
> -        ret = -EIO;
> -        goto out;
> +        return ret;
>      }
> -    buf[ret] = 0;
> -    /* The file is ended with '\n', pass 'end' to accept that. */
> -    ret = qemu_strtol(buf, &end, 10, &max_segments);
> -    if (ret == 0 && end && *end == '\n') {
> -        ret = max_segments;
> +
> +    if (strcmp(val, "host-managed") == 0) {
> +        *zoned = BLK_Z_HM;
> +    } else if (strcmp(val, "host-aware") == 0) {
> +        *zoned = BLK_Z_HA;
> +    } else if (strcmp(val, "none") == 0) {
> +        *zoned = BLK_Z_NONE;
> +    } else {
> +        return -ENOTSUP;
>      }
> +    return 0;
> +}
>  
> -out:
> -    if (sysfd != -1) {
> -        close(sysfd);
> +static int hdev_get_max_segments(int fd, struct stat *st) {
> +    int ret;

Add a blank line here ? Not sure about the qemu code style convention. But a
blank line after a variable declaration is always nice to clearly separate
declarations and code.

> +    if (S_ISCHR(st->st_mode)) {
> +        if (ioctl(fd, SG_GET_SG_TABLESIZE, &ret) == 0) {
> +            return ret;
> +        }
> +        return -ENOTSUP;
>      }
> -    g_free(sysfspath);
> -    return ret;
> -#else
> -    return -ENOTSUP;
> -#endif
> +    return get_sysfs_long_val(st, "max_segments");
>  }
>  
>  static void raw_refresh_limits(BlockDriverState *bs, Error **errp)
>  {
>      BDRVRawState *s = bs->opaque;
>      struct stat st;
> +    int ret;
> +    BlockZoneModel zoned;
>  
>      s->needs_alignment = raw_needs_alignment(bs);
>      raw_probe_alignment(bs, s->fd, errp);
> @@ -1307,6 +1312,12 @@ static void raw_refresh_limits(BlockDriverState *bs, Error **errp)
>              bs->bl.max_hw_iov = ret;
>          }
>      }
> +
> +    ret = get_sysfs_zoned_model(s->fd, &st, &zoned);
> +    if (ret < 0) {
> +        zoned = BLK_Z_NONE;
> +    }
> +    bs->bl.zoned = zoned;
>  }
>  
>  static int check_for_dasd(int fd)
> diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
> index 8947abab76..7f7863cc9e 100644
> --- a/include/block/block_int-common.h
> +++ b/include/block/block_int-common.h
> @@ -825,6 +825,9 @@ typedef struct BlockLimits {
>  
>      /* maximum number of iovec elements */
>      int max_iov;
> +
> +    /* device zone model */
> +    BlockZoneModel zoned;
>  } BlockLimits;
>  
>  typedef struct BdrvOpBlocker BdrvOpBlocker;


-- 
Damien Le Moal
Western Digital Research


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v7 3/8] file-posix: introduce get_sysfs_long_val for the long sysfs attribute
  2022-08-16  6:25 ` [PATCH v7 3/8] file-posix: introduce get_sysfs_long_val for the long sysfs attribute Sam Li
  2022-08-16 16:13   ` Sam Li
@ 2022-08-16 17:35   ` Damien Le Moal
  2022-08-16 17:53     ` Sam Li
  1 sibling, 1 reply; 28+ messages in thread
From: Damien Le Moal @ 2022-08-16 17:35 UTC (permalink / raw)
  To: Sam Li, qemu-devel
  Cc: hare, Fam Zheng, Kevin Wolf, Eric Blake, Markus Armbruster,
	stefanha, Hanna Reitz, dmitry.fomichev, qemu-block

On 2022/08/15 23:25, Sam Li wrote:
> Use sysfs attribute files to get the long value of zoned device
> information.
> 
> Signed-off-by: Sam Li <faithilikerun@gmail.com>
> Reviewed-by: Hannes Reinecke <hare@suse.de>
> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  block/file-posix.c | 27 +++++++++++++++++++++++++++
>  1 file changed, 27 insertions(+)
> 
> diff --git a/block/file-posix.c b/block/file-posix.c
> index c07ac4c697..727389488c 100644
> --- a/block/file-posix.c
> +++ b/block/file-posix.c
> @@ -1258,6 +1258,33 @@ static int get_sysfs_zoned_model(struct stat *st, BlockZoneModel *zoned) {
>      return 0;
>  }
>  
> +/*
> + * Get zoned device information (chunk_sectors, zoned_append_max_bytes,
> + * max_open_zones, max_active_zones) through sysfs attribute files.
> + */

The comment here needs to be more generic since this helper is used in patch 2
in hdev_get_max_segments(). So simply something like:

/*
 * Get a sysfs attribute value as a long integer.
 */

And since this helper is used in patch 2, this patch needs to go before patch 2
(reverse patch 2 and 3 order).

> +static long get_sysfs_long_val(struct stat *st, const char *attribute) {
> +#ifdef CONFIG_LINUX
> +    g_autofree char *str = NULL;
> +    const char *end;
> +    long val;
> +    int ret;
> +
> +    ret = get_sysfs_str_val(st, attribute, &str);
> +    if (ret < 0) {
> +        return ret;
> +    }
> +
> +    /* The file is ended with '\n', pass 'end' to accept that. */
> +    ret = qemu_strtol(str, &end, 10, &val);
> +    if (ret == 0 && end && *end == '\n') {
> +        ret = val;
> +    }
> +    return ret;
> +#else
> +    return -ENOTSUP;
> +#endif
> +}
> +
>  static int hdev_get_max_segments(int fd, struct stat *st) {
>      int ret;
>      if (S_ISCHR(st->st_mode)) {


-- 
Damien Le Moal
Western Digital Research


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v7 4/8] block: add block layer APIs resembling Linux ZonedBlockDevice ioctls
  2022-08-16  6:25 ` [PATCH v7 4/8] block: add block layer APIs resembling Linux ZonedBlockDevice ioctls Sam Li
@ 2022-08-16 17:50   ` Damien Le Moal
  2022-08-26 12:20     ` Sam Li
  2022-08-23  0:49   ` Stefan Hajnoczi
  1 sibling, 1 reply; 28+ messages in thread
From: Damien Le Moal @ 2022-08-16 17:50 UTC (permalink / raw)
  To: Sam Li, qemu-devel
  Cc: hare, Fam Zheng, Kevin Wolf, Eric Blake, Markus Armbruster,
	stefanha, Hanna Reitz, dmitry.fomichev, qemu-block

On 2022/08/15 23:25, Sam Li wrote:
> By adding zone management operations in BlockDriver, storage controller
> emulation can use the new block layer APIs including Report Zone and
> four zone management operations (open, close, finish, reset).
> 
> Add zoned storage commands of the device: zone_report(zrp), zone_open(zo),
> zone_close(zc), zone_reset(zrs), zone_finish(zf).
> 
> For example, to test zone_report, use following command:
> $ ./build/qemu-io --image-opts driver=zoned_host_device, filename=/dev/nullb0
> -c "zrp offset nr_zones"
> 
> Signed-off-by: Sam Li <faithilikerun@gmail.com>
> Reviewed-by: Hannes Reinecke <hare@suse.de>
> ---
>  block/block-backend.c             |  50 +++++
>  block/file-posix.c                | 341 +++++++++++++++++++++++++++++-
>  block/io.c                        |  41 ++++
>  include/block/block-common.h      |   1 -
>  include/block/block-io.h          |  13 ++
>  include/block/block_int-common.h  |  22 +-
>  include/block/raw-aio.h           |   6 +-
>  include/sysemu/block-backend-io.h |   6 +
>  meson.build                       |   1 +
>  qapi/block-core.json              |   8 +-
>  qemu-io-cmds.c                    | 143 +++++++++++++
>  11 files changed, 625 insertions(+), 7 deletions(-)
> 
> diff --git a/block/block-backend.c b/block/block-backend.c
> index d4a5df2ac2..fc639b0cd7 100644
> --- a/block/block-backend.c
> +++ b/block/block-backend.c
> @@ -1775,6 +1775,56 @@ int coroutine_fn blk_co_flush(BlockBackend *blk)
>      return ret;
>  }
>  
> +/*
> + * Send a zone_report command.
> + * offset is a byte offset from the start of the device. No alignment
> + * required for offset.
> + * nr_zones represents IN maximum and OUT actual.
> + */
> +int coroutine_fn blk_co_zone_report(BlockBackend *blk, int64_t offset,
> +                                    unsigned int *nr_zones,
> +                                    BlockZoneDescriptor *zones)
> +{
> +    int ret;
> +    IO_CODE();
> +
> +    blk_inc_in_flight(blk); /* increase before waiting */
> +    blk_wait_while_drained(blk);
> +    if (!blk_is_available(blk)) {
> +        blk_dec_in_flight(blk);
> +        return -ENOMEDIUM;
> +    }
> +    ret = bdrv_co_zone_report(blk_bs(blk), offset, nr_zones, zones);
> +    blk_dec_in_flight(blk);
> +    return ret;
> +}
> +
> +/*
> + * Send a zone_management command.
> + * offset is the starting zone specified as a sector offset.
> + * len is the maximum number of sectors the command should operate on.

You should mention that len should be zone size aligned. Also, for completness,
add a short description of the op argument too ?

> + */
> +int coroutine_fn blk_co_zone_mgmt(BlockBackend *blk, BlockZoneOp op,
> +        int64_t offset, int64_t len)
> +{
> +    int ret;
> +    IO_CODE();
> +
> +    ret = blk_check_byte_request(blk, offset, len);
> +    if (ret < 0) {
> +        return ret;
> +    }
> +    blk_inc_in_flight(blk);
> +    blk_wait_while_drained(blk);
> +    if (!blk_is_available(blk)) {
> +        blk_dec_in_flight(blk);
> +        return -ENOMEDIUM;
> +    }
> +    ret = bdrv_co_zone_mgmt(blk_bs(blk), op, offset, len);
> +    blk_dec_in_flight(blk);
> +    return ret;
> +}
> +
>  void blk_drain(BlockBackend *blk)
>  {
>      BlockDriverState *bs = blk_bs(blk);
> diff --git a/block/file-posix.c b/block/file-posix.c
> index 727389488c..29f67082d9 100644
> --- a/block/file-posix.c
> +++ b/block/file-posix.c
> @@ -67,6 +67,9 @@
>  #include <sys/param.h>
>  #include <sys/syscall.h>
>  #include <sys/vfs.h>
> +#if defined(CONFIG_BLKZONED)
> +#include <linux/blkzoned.h>
> +#endif
>  #include <linux/cdrom.h>
>  #include <linux/fd.h>
>  #include <linux/fs.h>
> @@ -216,6 +219,13 @@ typedef struct RawPosixAIOData {
>              PreallocMode prealloc;
>              Error **errp;
>          } truncate;
> +        struct {
> +            unsigned int *nr_zones;
> +            BlockZoneDescriptor *zones;
> +        } zone_report;
> +        struct {
> +            unsigned long ioctl_op;

May be clarify this field usage by calling it zone_op ?

> +        } zone_mgmt;
>      };
>  } RawPosixAIOData;
>  
> @@ -1328,7 +1338,7 @@ static void raw_refresh_limits(BlockDriverState *bs, Error **errp)
>  #endif
>  
>      if (bs->sg || S_ISBLK(st.st_mode)) {
> -        int ret = hdev_get_max_hw_transfer(s->fd, &st);
> +        ret = hdev_get_max_hw_transfer(s->fd, &st);
>  
>          if (ret > 0 && ret <= BDRV_REQUEST_MAX_BYTES) {
>              bs->bl.max_hw_transfer = ret;
> @@ -1340,11 +1350,32 @@ static void raw_refresh_limits(BlockDriverState *bs, Error **errp)
>          }
>      }
>  
> -    ret = get_sysfs_zoned_model(s->fd, &st, &zoned);
> +    ret = get_sysfs_zoned_model(&st, &zoned);
>      if (ret < 0) {
>          zoned = BLK_Z_NONE;
>      }
>      bs->bl.zoned = zoned;
> +    if (zoned != BLK_Z_NONE) {
> +        ret = get_sysfs_long_val(&st, "chunk_sectors");
> +        if (ret > 0) {
> +            bs->bl.zone_sectors = ret;
> +        }
> +
> +        ret = get_sysfs_long_val(&st, "zone_append_max_bytes");
> +        if (ret > 0) {
> +            bs->bl.zone_append_max_bytes = ret;
> +        }
> +
> +        ret = get_sysfs_long_val(&st, "max_open_zones");
> +        if (ret > 0) {

The value can be 0, so this should be "if (ret >= 0) {".

> +            bs->bl.max_open_zones = ret;
> +        }
> +
> +        ret = get_sysfs_long_val(&st, "max_active_zones");
> +        if (ret > 0) {

The value can be 0, so this should be "if (ret >= 0) {".

> +            bs->bl.max_active_zones = ret;
> +        }
> +    }
>  }
>  
>  static int check_for_dasd(int fd)
> @@ -1839,6 +1870,134 @@ static off_t copy_file_range(int in_fd, off_t *in_off, int out_fd,
>  }
>  #endif
>  
> +/*
> + * parse_zone - Fill a zone descriptor
> + */
> +#if defined(CONFIG_BLKZONED)
> +static inline void parse_zone(struct BlockZoneDescriptor *zone,
> +                              struct blk_zone *blkz) {
> +    zone->start = blkz->start;
> +    zone->length = blkz->len;
> +    zone->cap = blkz->capacity;
> +    zone->wp = blkz->wp;
> +
> +    switch (blkz->type) {
> +    case BLK_ZONE_TYPE_SEQWRITE_REQ:
> +        zone->type = BLK_ZT_SWR;
> +        break;
> +    case BLK_ZONE_TYPE_SEQWRITE_PREF:
> +        zone->type = BLK_ZT_SWP;
> +        break;
> +    case BLK_ZONE_TYPE_CONVENTIONAL:
> +        zone->type = BLK_ZT_CONV;
> +        break;
> +    default:
> +        error_report("Invalid zone type: 0x%x", blkz->type);
> +    }
> +
> +    switch (blkz->cond) {
> +    case BLK_ZONE_COND_NOT_WP:
> +        zone->cond = BLK_ZS_NOT_WP;
> +        break;
> +    case BLK_ZONE_COND_EMPTY:
> +        zone->cond = BLK_ZS_EMPTY;
> +        break;
> +    case BLK_ZONE_COND_IMP_OPEN:
> +        zone->cond =BLK_ZS_IOPEN;
> +        break;
> +    case BLK_ZONE_COND_EXP_OPEN:
> +        zone->cond = BLK_ZS_EOPEN;
> +        break;
> +    case BLK_ZONE_COND_CLOSED:
> +        zone->cond = BLK_ZS_CLOSED;
> +        break;
> +    case BLK_ZONE_COND_READONLY:
> +        zone->cond = BLK_ZS_RDONLY;
> +        break;
> +    case BLK_ZONE_COND_FULL:
> +        zone->cond = BLK_ZS_FULL;
> +        break;
> +    case BLK_ZONE_COND_OFFLINE:
> +        zone->cond = BLK_ZS_OFFLINE;
> +        break;
> +    default:
> +        error_report("Invalid zone condition 0x%x", blkz->cond);
> +    }
> +}
> +#endif
> +
> +static int handle_aiocb_zone_report(void *opaque) {
> +#if defined(CONFIG_BLKZONED)
> +    RawPosixAIOData *aiocb = opaque;
> +    int fd = aiocb->aio_fildes;
> +    unsigned int *nr_zones = aiocb->zone_report.nr_zones;
> +    BlockZoneDescriptor *zones = aiocb->zone_report.zones;
> +    int64_t sector = aiocb->aio_offset;
> +
> +    struct blk_zone *blkz;
> +    int64_t rep_size;
> +    unsigned int nrz;
> +    int ret, n = 0, i = 0;
> +
> +    nrz = *nr_zones;
> +    rep_size = sizeof(struct blk_zone_report) + nrz * sizeof(struct blk_zone);
> +    g_autofree struct blk_zone_report *rep = NULL;
> +    rep = g_malloc(rep_size);
> +
> +    blkz = (struct blk_zone *)(rep + 1);
> +    while (n < nrz) {
> +        memset(rep, 0, rep_size);
> +        rep->sector = sector;
> +        rep->nr_zones = nrz - n;
> +
> +        ret = ioctl(fd, BLKREPORTZONE, rep);
> +        if (ret != 0) {
> +            error_report("%d: ioctl BLKREPORTZONE at %" PRId64 " failed %d",
> +                         fd, sector, errno);
> +            return -errno;
> +        }
> +
> +        if (!rep->nr_zones) {
> +            break;
> +        }
> +
> +        for (i = 0; i < rep->nr_zones; i++, n++) {
> +            parse_zone(&zones[n], &blkz[i]);
> +            /* The next report should start after the last zone reported */
> +            sector = blkz[i].start + blkz[i].len;
> +        }
> +    }
> +
> +    *nr_zones = n;
> +    return 0;
> +#else
> +    return -ENOTSUP;
> +#endif
> +}
> +
> +static int handle_aiocb_zone_mgmt(void *opaque) {
> +#if defined(CONFIG_BLKZONED)
> +    RawPosixAIOData *aiocb = opaque;
> +    int fd = aiocb->aio_fildes;
> +    int64_t sector = aiocb->aio_offset;
> +    int64_t nr_sectors = aiocb->aio_nbytes;
> +    unsigned long ioctl_op = aiocb->zone_mgmt.ioctl_op;

Nit: I do not think these variables are very useful. You could reference
directly the aiocb fields in the code below.

> +    struct blk_zone_range range;
> +    int ret;
> +
> +    /* Execute the operation */
> +    range.sector = sector;
> +    range.nr_sectors = nr_sectors;
> +    do {
> +        ret = ioctl(fd, ioctl_op, &range);
> +    } while (ret != 0 && errno == EINTR);
> +
> +    return ret;
> +#else
> +    return -ENOTSUP;
> +#endif
> +}
> +
>  static int handle_aiocb_copy_range(void *opaque)
>  {
>      RawPosixAIOData *aiocb = opaque;
> @@ -3011,6 +3170,124 @@ static void raw_account_discard(BDRVRawState *s, uint64_t nbytes, int ret)
>      }
>  }
>  
> +/*
> + * zone report - Get a zone block device's information in the form
> + * of an array of zone descriptors.
> + *
> + * @param bs: passing zone block device file descriptor
> + * @param zones: an array of zone descriptors to hold zone
> + * information on reply
> + * @param offset: offset can be any byte within the zone size.
> + * @param len: (not sure yet.
> + * @return 0 on success, -1 on failure
> + */
> +static int coroutine_fn raw_co_zone_report(BlockDriverState *bs, int64_t offset,
> +                                           unsigned int *nr_zones,
> +                                           BlockZoneDescriptor *zones) {
> +#if defined(CONFIG_BLKZONED)
> +    BDRVRawState *s = bs->opaque;
> +    RawPosixAIOData acb;
> +
> +    acb = (RawPosixAIOData) {
> +        .bs         = bs,
> +        .aio_fildes = s->fd,
> +        .aio_type   = QEMU_AIO_ZONE_REPORT,
> +        /* zoned block devices use 512-byte sectors */
> +        .aio_offset = offset / 512,

This conversion from bytes to 512B sectors would be better placed in
handle_aiocb_zone_report(). Doing so, all the API uses bytes, similarly to other
operations and the conversion to 512B sectors only done for Linux specific ioctl
code.

> +        .zone_report    = {
> +                .nr_zones       = nr_zones,
> +                .zones          = zones,
> +        },
> +    };
> +
> +    return raw_thread_pool_submit(bs, handle_aiocb_zone_report, &acb);
> +#else
> +    return -ENOTSUP;
> +#endif
> +}
> +
> +/*
> + * zone management operations - Execute an operation on a zone
> + */
> +static int coroutine_fn raw_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
> +        int64_t offset, int64_t len) {
> +#if defined(CONFIG_BLKZONED)
> +    BDRVRawState *s = bs->opaque;
> +    RawPosixAIOData acb;
> +    int64_t zone_sector, zone_sector_mask;
> +    const char *ioctl_name;
> +    unsigned long ioctl_op;
> +    int ret;
> +
> +    struct stat st;
> +    if (fstat(s->fd, &st) < 0) {
> +        ret = -errno;
> +        return ret;
> +    }
> +    zone_sector = get_sysfs_long_val(&st, "chunk_sectors");
> +    if (zone_sector < 0) {
> +        error_report("invalid zone sector size %" PRId64 "", zone_sector);
> +        return -EINVAL;
> +    }

You already got this value in bs->bl.zone_sectors in raw_refresh_limits(). So I
you should not need to read it again from sysfs.

> +
> +    zone_sector_mask = zone_sector - 1;
> +    if (offset & zone_sector_mask) {
> +        error_report("sector offset %" PRId64 " is not aligned to zone size "
> +                     "%" PRId64 "", offset, zone_sector);
> +        return -EINVAL;
> +    }
> +
> +    if (len & zone_sector_mask) {
> +        error_report("number of sectors %" PRId64 " is not aligned to zone size"
> +                      " %" PRId64 "", len, zone_sector);
> +        return -EINVAL;
> +    }
> +
> +    switch (op) {
> +    case BLK_ZO_OPEN:
> +        ioctl_name = "BLKOPENZONE";
> +        ioctl_op = BLKOPENZONE;
> +        break;
> +    case BLK_ZO_CLOSE:
> +        ioctl_name = "BLKCLOSEZONE";
> +        ioctl_op = BLKCLOSEZONE;
> +        break;
> +    case BLK_ZO_FINISH:
> +        ioctl_name = "BLKFINISHZONE";
> +        ioctl_op = BLKFINISHZONE;
> +        break;
> +    case BLK_ZO_RESET:
> +        ioctl_name = "BLKRESETZONE";
> +        ioctl_op = BLKRESETZONE;
> +        break;
> +    default:
> +        error_report("Invalid zone operation 0x%x", op);
> +        return -EINVAL;
> +    }
> +
> +    acb = (RawPosixAIOData) {
> +        .bs             = bs,
> +        .aio_fildes     = s->fd,
> +        .aio_type       = QEMU_AIO_ZONE_MGMT,
> +        .aio_offset     = offset,
> +        .aio_nbytes     = len,

Are these 2 values in bytes or in 512B sectors ? Looking at
handle_aiocb_zone_mgmt(), it looks like 512B sectors. So where is the conversion
done from bytes ?

> +        .zone_mgmt  = {
> +                .ioctl_op = ioctl_op,
> +        },
> +    };
> +
> +    ret = raw_thread_pool_submit(bs, handle_aiocb_zone_mgmt, &acb);
> +    if (ret != 0) {
> +        error_report("ioctl %s failed %d", ioctl_name, errno);
> +        return -errno;
> +    }
> +
> +    return ret;
> +#else
> +    return -ENOTSUP;
> +#endif
> +}
> +
>  static coroutine_fn int
>  raw_do_pdiscard(BlockDriverState *bs, int64_t offset, int64_t bytes,
>                  bool blkdev)
> @@ -3511,6 +3788,14 @@ static void hdev_parse_filename(const char *filename, QDict *options,
>      bdrv_parse_filename_strip_prefix(filename, "host_device:", options);
>  }
>  
> +#if defined(CONFIG_BLKZONED)
> +static void zoned_host_device_parse_filename(const char *filename, QDict *options,
> +                                Error **errp)
> +{
> +    bdrv_parse_filename_strip_prefix(filename, "zoned_host_device:", options);
> +}
> +#endif
> +
>  static bool hdev_is_sg(BlockDriverState *bs)
>  {
>  
> @@ -3741,6 +4026,55 @@ static BlockDriver bdrv_host_device = {
>  #endif
>  };
>  
> +#if defined(CONFIG_BLKZONED)
> +static BlockDriver bdrv_zoned_host_device = {
> +        .format_name = "zoned_host_device",
> +        .protocol_name = "zoned_host_device",
> +        .instance_size = sizeof(BDRVRawState),
> +        .bdrv_needs_filename = true,
> +        .bdrv_probe_device  = hdev_probe_device,
> +        .bdrv_parse_filename = zoned_host_device_parse_filename,
> +        .bdrv_file_open     = hdev_open,
> +        .bdrv_close         = raw_close,
> +        .bdrv_reopen_prepare = raw_reopen_prepare,
> +        .bdrv_reopen_commit  = raw_reopen_commit,
> +        .bdrv_reopen_abort   = raw_reopen_abort,
> +        .bdrv_co_create_opts = bdrv_co_create_opts_simple,
> +        .create_opts         = &bdrv_create_opts_simple,
> +        .mutable_opts        = mutable_opts,
> +        .bdrv_co_invalidate_cache = raw_co_invalidate_cache,
> +        .bdrv_co_pwrite_zeroes = hdev_co_pwrite_zeroes,
> +
> +        .bdrv_co_preadv         = raw_co_preadv,
> +        .bdrv_co_pwritev        = raw_co_pwritev,
> +        .bdrv_co_flush_to_disk  = raw_co_flush_to_disk,
> +        .bdrv_co_pdiscard       = hdev_co_pdiscard,
> +        .bdrv_co_copy_range_from = raw_co_copy_range_from,
> +        .bdrv_co_copy_range_to  = raw_co_copy_range_to,
> +        .bdrv_refresh_limits = raw_refresh_limits,
> +        .bdrv_io_plug = raw_aio_plug,
> +        .bdrv_io_unplug = raw_aio_unplug,
> +        .bdrv_attach_aio_context = raw_aio_attach_aio_context,
> +
> +        .bdrv_co_truncate       = raw_co_truncate,
> +        .bdrv_getlength = raw_getlength,
> +        .bdrv_get_info = raw_get_info,
> +        .bdrv_get_allocated_file_size
> +                            = raw_get_allocated_file_size,
> +        .bdrv_get_specific_stats = hdev_get_specific_stats,
> +        .bdrv_check_perm = raw_check_perm,
> +        .bdrv_set_perm   = raw_set_perm,
> +        .bdrv_abort_perm_update = raw_abort_perm_update,
> +        .bdrv_probe_blocksizes = hdev_probe_blocksizes,
> +        .bdrv_probe_geometry = hdev_probe_geometry,
> +        .bdrv_co_ioctl = hdev_co_ioctl,
> +
> +        /* zone management operations */
> +        .bdrv_co_zone_report = raw_co_zone_report,
> +        .bdrv_co_zone_mgmt = raw_co_zone_mgmt,
> +};
> +#endif
> +
>  #if defined(__linux__) || defined(__FreeBSD__) || defined(__FreeBSD_kernel__)
>  static void cdrom_parse_filename(const char *filename, QDict *options,
>                                   Error **errp)
> @@ -4001,6 +4335,9 @@ static void bdrv_file_init(void)
>      bdrv_register(&bdrv_file);
>  #if defined(HAVE_HOST_BLOCK_DEVICE)
>      bdrv_register(&bdrv_host_device);
> +#if defined(CONFIG_BLKZONED)
> +    bdrv_register(&bdrv_zoned_host_device);
> +#endif
>  #ifdef __linux__
>      bdrv_register(&bdrv_host_cdrom);
>  #endif
> diff --git a/block/io.c b/block/io.c
> index 0a8cbefe86..de9ec1d740 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -3198,6 +3198,47 @@ out:
>      return co.ret;
>  }
>  
> +int bdrv_co_zone_report(BlockDriverState *bs, int64_t offset,
> +                        unsigned int *nr_zones,
> +                        BlockZoneDescriptor *zones)
> +{
> +    BlockDriver *drv = bs->drv;
> +    CoroutineIOCompletion co = {
> +            .coroutine = qemu_coroutine_self(),
> +    };
> +    IO_CODE();
> +
> +    bdrv_inc_in_flight(bs);
> +    if (!drv || !drv->bdrv_co_zone_report) {
> +        co.ret = -ENOTSUP;
> +        goto out;
> +    }
> +    co.ret = drv->bdrv_co_zone_report(bs, offset, nr_zones, zones);
> +out:
> +    bdrv_dec_in_flight(bs);
> +    return co.ret;
> +}
> +
> +int bdrv_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
> +        int64_t offset, int64_t len)
> +{
> +    BlockDriver *drv = bs->drv;
> +    CoroutineIOCompletion co = {
> +            .coroutine = qemu_coroutine_self(),
> +    };
> +    IO_CODE();
> +
> +    bdrv_inc_in_flight(bs);
> +    if (!drv || !drv->bdrv_co_zone_mgmt) {
> +        co.ret = -ENOTSUP;
> +        goto out;
> +    }
> +    co.ret = drv->bdrv_co_zone_mgmt(bs, op, offset, len);
> +out:
> +    bdrv_dec_in_flight(bs);
> +    return co.ret;
> +}
> +
>  void *qemu_blockalign(BlockDriverState *bs, size_t size)
>  {
>      IO_CODE();
> diff --git a/include/block/block-common.h b/include/block/block-common.h
> index 36bd0e480e..5102fa6858 100644
> --- a/include/block/block-common.h
> +++ b/include/block/block-common.h
> @@ -23,7 +23,6 @@
>   */
>  #ifndef BLOCK_COMMON_H
>  #define BLOCK_COMMON_H
> -
>  #include "block/aio.h"
>  #include "block/aio-wait.h"
>  #include "qemu/iov.h"
> diff --git a/include/block/block-io.h b/include/block/block-io.h
> index fd25ffa9be..55ad261e16 100644
> --- a/include/block/block-io.h
> +++ b/include/block/block-io.h
> @@ -88,6 +88,13 @@ int bdrv_co_ioctl(BlockDriverState *bs, int req, void *buf);
>  /* Ensure contents are flushed to disk.  */
>  int coroutine_fn bdrv_co_flush(BlockDriverState *bs);
>  
> +/* Report zone information of zone block device. */
> +int coroutine_fn bdrv_co_zone_report(BlockDriverState *bs, int64_t offset,
> +                                     unsigned int *nr_zones,
> +                                     BlockZoneDescriptor *zones);
> +int coroutine_fn bdrv_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
> +                                   int64_t offset, int64_t len);
> +
>  int bdrv_co_pdiscard(BdrvChild *child, int64_t offset, int64_t bytes);
>  bool bdrv_can_write_zeroes_with_unmap(BlockDriverState *bs);
>  int bdrv_block_status(BlockDriverState *bs, int64_t offset,
> @@ -297,6 +304,12 @@ bdrv_readv_vmstate(BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos);
>  int generated_co_wrapper
>  bdrv_writev_vmstate(BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos);
>  
> +int generated_co_wrapper
> +blk_zone_report(BlockBackend *blk, int64_t offset, unsigned int *nr_zones,
> +                BlockZoneDescriptor *zones);
> +int generated_co_wrapper
> +blk_zone_mgmt(BlockBackend *blk, BlockZoneOp op, int64_t offset, int64_t len);
> +
>  /**
>   * bdrv_parent_drained_begin_single:
>   *
> diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
> index 7f7863cc9e..de44c7b6f4 100644
> --- a/include/block/block_int-common.h
> +++ b/include/block/block_int-common.h
> @@ -94,7 +94,6 @@ typedef struct BdrvTrackedRequest {
>      struct BdrvTrackedRequest *waiting_for;
>  } BdrvTrackedRequest;
>  
> -
>  struct BlockDriver {
>      /*
>       * These fields are initialized when this object is created,
> @@ -691,6 +690,12 @@ struct BlockDriver {
>                                            QEMUIOVector *qiov,
>                                            int64_t pos);
>  
> +    int coroutine_fn (*bdrv_co_zone_report)(BlockDriverState *bs,
> +            int64_t offset, unsigned int *nr_zones,
> +            BlockZoneDescriptor *zones);
> +    int coroutine_fn (*bdrv_co_zone_mgmt)(BlockDriverState *bs, BlockZoneOp op,
> +            int64_t offset, int64_t len);
> +
>      /* removable device specific */
>      bool (*bdrv_is_inserted)(BlockDriverState *bs);
>      void (*bdrv_eject)(BlockDriverState *bs, bool eject_flag);
> @@ -828,6 +833,21 @@ typedef struct BlockLimits {
>  
>      /* device zone model */
>      BlockZoneModel zoned;
> +
> +    /* zone size expressed in 512-byte sectors */
> +    uint32_t zone_sectors;
> +
> +    /* total number of zones */
> +    unsigned int nr_zones;
> +
> +    /* maximum size in bytes of a zone append write operation */
> +    int64_t zone_append_max_bytes;
> +
> +    /* maximum number of open zones */
> +    int64_t max_open_zones;
> +
> +    /* maximum number of active zones */
> +    int64_t max_active_zones;
>  } BlockLimits;
>  
>  typedef struct BdrvOpBlocker BdrvOpBlocker;
> diff --git a/include/block/raw-aio.h b/include/block/raw-aio.h
> index 21fc10c4c9..3d26929cdd 100644
> --- a/include/block/raw-aio.h
> +++ b/include/block/raw-aio.h
> @@ -29,6 +29,8 @@
>  #define QEMU_AIO_WRITE_ZEROES 0x0020
>  #define QEMU_AIO_COPY_RANGE   0x0040
>  #define QEMU_AIO_TRUNCATE     0x0080
> +#define QEMU_AIO_ZONE_REPORT  0x0100
> +#define QEMU_AIO_ZONE_MGMT    0x0200
>  #define QEMU_AIO_TYPE_MASK \
>          (QEMU_AIO_READ | \
>           QEMU_AIO_WRITE | \
> @@ -37,7 +39,9 @@
>           QEMU_AIO_DISCARD | \
>           QEMU_AIO_WRITE_ZEROES | \
>           QEMU_AIO_COPY_RANGE | \
> -         QEMU_AIO_TRUNCATE)
> +         QEMU_AIO_TRUNCATE  | \
> +         QEMU_AIO_ZONE_REPORT | \
> +         QEMU_AIO_ZONE_MGMT)
>  
>  /* AIO flags */
>  #define QEMU_AIO_MISALIGNED   0x1000
> diff --git a/include/sysemu/block-backend-io.h b/include/sysemu/block-backend-io.h
> index 50f5aa2e07..6e7df1d93b 100644
> --- a/include/sysemu/block-backend-io.h
> +++ b/include/sysemu/block-backend-io.h
> @@ -156,6 +156,12 @@ int generated_co_wrapper blk_pwrite_zeroes(BlockBackend *blk, int64_t offset,
>  int coroutine_fn blk_co_pwrite_zeroes(BlockBackend *blk, int64_t offset,
>                                        int64_t bytes, BdrvRequestFlags flags);
>  
> +int coroutine_fn blk_co_zone_report(BlockBackend *blk, int64_t offset,
> +                                    unsigned int *nr_zones,
> +                                    BlockZoneDescriptor *zones);
> +int coroutine_fn blk_co_zone_mgmt(BlockBackend *blk, BlockZoneOp op,
> +                                  int64_t offset, int64_t len);
> +
>  int generated_co_wrapper blk_pdiscard(BlockBackend *blk, int64_t offset,
>                                        int64_t bytes);
>  int coroutine_fn blk_co_pdiscard(BlockBackend *blk, int64_t offset,
> diff --git a/meson.build b/meson.build
> index 294e9a8f32..c3219b0e87 100644
> --- a/meson.build
> +++ b/meson.build
> @@ -1883,6 +1883,7 @@ config_host_data.set('CONFIG_REPLICATION', get_option('live_block_migration').al
>  # has_header
>  config_host_data.set('CONFIG_EPOLL', cc.has_header('sys/epoll.h'))
>  config_host_data.set('CONFIG_LINUX_MAGIC_H', cc.has_header('linux/magic.h'))
> +config_host_data.set('CONFIG_BLKZONED', cc.has_header('linux/blkzoned.h'))
>  config_host_data.set('CONFIG_VALGRIND_H', cc.has_header('valgrind/valgrind.h'))
>  config_host_data.set('HAVE_BTRFS_H', cc.has_header('linux/btrfs.h'))
>  config_host_data.set('HAVE_DRM_H', cc.has_header('libdrm/drm.h'))
> diff --git a/qapi/block-core.json b/qapi/block-core.json
> index 2173e7734a..c6bbb7a037 100644
> --- a/qapi/block-core.json
> +++ b/qapi/block-core.json
> @@ -2942,6 +2942,7 @@
>  # @compress: Since 5.0
>  # @copy-before-write: Since 6.2
>  # @snapshot-access: Since 7.0
> +# @zoned_host_device: Since 7.2
>  #
>  # Since: 2.9
>  ##
> @@ -2955,7 +2956,8 @@
>              'luks', 'nbd', 'nfs', 'null-aio', 'null-co', 'nvme', 'parallels',
>              'preallocate', 'qcow', 'qcow2', 'qed', 'quorum', 'raw', 'rbd',
>              { 'name': 'replication', 'if': 'CONFIG_REPLICATION' },
> -            'ssh', 'throttle', 'vdi', 'vhdx', 'vmdk', 'vpc', 'vvfat' ] }
> +            'ssh', 'throttle', 'vdi', 'vhdx', 'vmdk', 'vpc', 'vvfat',
> +            { 'name': 'zoned_host_device', 'if': 'CONFIG_BLKZONED' } ] }
>  
>  ##
>  # @BlockdevOptionsFile:
> @@ -4329,7 +4331,9 @@
>        'vhdx':       'BlockdevOptionsGenericFormat',
>        'vmdk':       'BlockdevOptionsGenericCOWFormat',
>        'vpc':        'BlockdevOptionsGenericFormat',
> -      'vvfat':      'BlockdevOptionsVVFAT'
> +      'vvfat':      'BlockdevOptionsVVFAT',
> +      'zoned_host_device': { 'type': 'BlockdevOptionsFile',
> +                             'if': 'CONFIG_BLKZONED' }
>    } }
>  
>  ##
> diff --git a/qemu-io-cmds.c b/qemu-io-cmds.c
> index 952dc940f1..687c3a624c 100644
> --- a/qemu-io-cmds.c
> +++ b/qemu-io-cmds.c
> @@ -1712,6 +1712,144 @@ static const cmdinfo_t flush_cmd = {
>      .oneline    = "flush all in-core file state to disk",
>  };
>  
> +static int zone_report_f(BlockBackend *blk, int argc, char **argv)
> +{
> +    int ret;
> +    int64_t offset;
> +    unsigned int nr_zones;
> +
> +    ++optind;
> +    offset = cvtnum(argv[optind]);
> +    ++optind;
> +    nr_zones = cvtnum(argv[optind]);
> +
> +    g_autofree BlockZoneDescriptor *zones = NULL;
> +    zones = g_new(BlockZoneDescriptor, nr_zones);
> +    ret = blk_zone_report(blk, offset, &nr_zones, zones);
> +    if (ret < 0) {
> +        printf("zone report failed: %s\n", strerror(-ret));
> +    } else {
> +        for (int i = 0; i < nr_zones; ++i) {
> +            printf("start: 0x%" PRIx64 ", len 0x%" PRIx64 ", "
> +                   "cap"" 0x%" PRIx64 ",wptr 0x%" PRIx64 ", "
> +                   "zcond:%u, [type: %u]\n",
> +                   zones[i].start, zones[i].length, zones[i].cap, zones[i].wp,
> +                   zones[i].cond, zones[i].type);
> +        }
> +    }
> +    return ret;
> +}
> +
> +static const cmdinfo_t zone_report_cmd = {
> +        .name = "zone_report",
> +        .altname = "zrp",
> +        .cfunc = zone_report_f,
> +        .argmin = 2,
> +        .argmax = 2,
> +        .args = "offset number",
> +        .oneline = "report zone information",
> +};
> +
> +static int zone_open_f(BlockBackend *blk, int argc, char **argv)
> +{
> +    int ret;
> +    int64_t offset, len;
> +    ++optind;
> +    offset = cvtnum(argv[optind]);
> +    ++optind;
> +    len = cvtnum(argv[optind]);
> +    ret = blk_zone_mgmt(blk, BLK_ZO_OPEN, offset, len);
> +    if (ret < 0) {
> +        printf("zone open failed: %s\n", strerror(-ret));
> +    }
> +    return ret;
> +}
> +
> +static const cmdinfo_t zone_open_cmd = {
> +        .name = "zone_open",
> +        .altname = "zo",
> +        .cfunc = zone_open_f,
> +        .argmin = 2,
> +        .argmax = 2,
> +        .args = "offset len",
> +        .oneline = "explicit open a range of zones in zone block device",
> +};
> +
> +static int zone_close_f(BlockBackend *blk, int argc, char **argv)
> +{
> +    int ret;
> +    int64_t offset, len;
> +    ++optind;
> +    offset = cvtnum(argv[optind]);
> +    ++optind;
> +    len = cvtnum(argv[optind]);
> +    ret = blk_zone_mgmt(blk, BLK_ZO_CLOSE, offset, len);
> +    if (ret < 0) {
> +        printf("zone close failed: %s\n", strerror(-ret));
> +    }
> +    return ret;
> +}
> +
> +static const cmdinfo_t zone_close_cmd = {
> +        .name = "zone_close",
> +        .altname = "zc",
> +        .cfunc = zone_close_f,
> +        .argmin = 2,
> +        .argmax = 2,
> +        .args = "offset len",
> +        .oneline = "close a range of zones in zone block device",
> +};
> +
> +static int zone_finish_f(BlockBackend *blk, int argc, char **argv)
> +{
> +    int ret;
> +    int64_t offset, len;
> +    ++optind;
> +    offset = cvtnum(argv[optind]);
> +    ++optind;
> +    len = cvtnum(argv[optind]);
> +    ret = blk_zone_mgmt(blk, BLK_ZO_FINISH, offset, len);
> +    if (ret < 0) {
> +        printf("zone finish failed: %s\n", strerror(-ret));
> +    }
> +    return ret;
> +}
> +
> +static const cmdinfo_t zone_finish_cmd = {
> +        .name = "zone_finish",
> +        .altname = "zf",
> +        .cfunc = zone_finish_f,
> +        .argmin = 2,
> +        .argmax = 2,
> +        .args = "offset len",
> +        .oneline = "finish a range of zones in zone block device",
> +};
> +
> +static int zone_reset_f(BlockBackend *blk, int argc, char **argv)
> +{
> +    int ret;
> +    int64_t offset, len;
> +    ++optind;
> +    offset = cvtnum(argv[optind]);
> +    ++optind;
> +    len = cvtnum(argv[optind]);
> +    ret = blk_zone_mgmt(blk, BLK_ZO_RESET, offset, len);
> +    if (ret < 0) {
> +        printf("zone reset failed: %s\n", strerror(-ret));
> +    }
> +    return ret;
> +}
> +
> +static const cmdinfo_t zone_reset_cmd = {
> +        .name = "zone_reset",
> +        .altname = "zrs",
> +        .cfunc = zone_reset_f,
> +        .argmin = 2,
> +        .argmax = 2,
> +        .args = "offset len",
> +        .oneline = "reset a zone write pointer in zone block device",
> +};
> +
>  static int truncate_f(BlockBackend *blk, int argc, char **argv);
>  static const cmdinfo_t truncate_cmd = {
>      .name       = "truncate",
> @@ -2504,6 +2642,11 @@ static void __attribute((constructor)) init_qemuio_commands(void)
>      qemuio_add_command(&aio_write_cmd);
>      qemuio_add_command(&aio_flush_cmd);
>      qemuio_add_command(&flush_cmd);
> +    qemuio_add_command(&zone_report_cmd);
> +    qemuio_add_command(&zone_open_cmd);
> +    qemuio_add_command(&zone_close_cmd);
> +    qemuio_add_command(&zone_finish_cmd);
> +    qemuio_add_command(&zone_reset_cmd);
>      qemuio_add_command(&truncate_cmd);
>      qemuio_add_command(&length_cmd);
>      qemuio_add_command(&info_cmd);


-- 
Damien Le Moal
Western Digital Research


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v7 3/8] file-posix: introduce get_sysfs_long_val for the long sysfs attribute
  2022-08-16 17:35   ` Damien Le Moal
@ 2022-08-16 17:53     ` Sam Li
  2022-08-16 17:55       ` Damien Le Moal
  0 siblings, 1 reply; 28+ messages in thread
From: Sam Li @ 2022-08-16 17:53 UTC (permalink / raw)
  To: Damien Le Moal
  Cc: qemu-devel, Hannes Reinecke, Fam Zheng, Kevin Wolf, Eric Blake,
	Markus Armbruster, Stefan Hajnoczi, Hanna Reitz, Dmitry Fomichev,
	qemu block

Damien Le Moal <damien.lemoal@opensource.wdc.com> 于2022年8月17日周三 01:35写道:
>
> On 2022/08/15 23:25, Sam Li wrote:
> > Use sysfs attribute files to get the long value of zoned device
> > information.
> >
> > Signed-off-by: Sam Li <faithilikerun@gmail.com>
> > Reviewed-by: Hannes Reinecke <hare@suse.de>
> > Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> > ---
> >  block/file-posix.c | 27 +++++++++++++++++++++++++++
> >  1 file changed, 27 insertions(+)
> >
> > diff --git a/block/file-posix.c b/block/file-posix.c
> > index c07ac4c697..727389488c 100644
> > --- a/block/file-posix.c
> > +++ b/block/file-posix.c
> > @@ -1258,6 +1258,33 @@ static int get_sysfs_zoned_model(struct stat *st, BlockZoneModel *zoned) {
> >      return 0;
> >  }
> >
> > +/*
> > + * Get zoned device information (chunk_sectors, zoned_append_max_bytes,
> > + * max_open_zones, max_active_zones) through sysfs attribute files.
> > + */
>
> The comment here needs to be more generic since this helper is used in patch 2
> in hdev_get_max_segments(). So simply something like:
>
> /*
>  * Get a sysfs attribute value as a long integer.
>  */
>
> And since this helper is used in patch 2, this patch needs to go before patch 2
> (reverse patch 2 and 3 order).

Can I merge patch2 and patch 3 into one patch? Because in patch 2
hdev_get_max_segments -> get_sysfs_long_val(-> get_sysfs_str_val)
while in patch 3 get_sysfs_long_val-> get_sysfs_str_val,
hdev_get_max_segments is required for qemu setting up I guess so the
dependency is intertwined here. If we use separate patches, then the
last patch will modify the first patch's code, which I think is messy.

>
> > +static long get_sysfs_long_val(struct stat *st, const char *attribute) {
> > +#ifdef CONFIG_LINUX
> > +    g_autofree char *str = NULL;
> > +    const char *end;
> > +    long val;
> > +    int ret;
> > +
> > +    ret = get_sysfs_str_val(st, attribute, &str);
> > +    if (ret < 0) {
> > +        return ret;
> > +    }
> > +
> > +    /* The file is ended with '\n', pass 'end' to accept that. */
> > +    ret = qemu_strtol(str, &end, 10, &val);
> > +    if (ret == 0 && end && *end == '\n') {
> > +        ret = val;
> > +    }
> > +    return ret;
> > +#else
> > +    return -ENOTSUP;
> > +#endif
> > +}
> > +
> >  static int hdev_get_max_segments(int fd, struct stat *st) {
> >      int ret;
> >      if (S_ISCHR(st->st_mode)) {
>
>
> --
> Damien Le Moal
> Western Digital Research


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v7 3/8] file-posix: introduce get_sysfs_long_val for the long sysfs attribute
  2022-08-16 17:53     ` Sam Li
@ 2022-08-16 17:55       ` Damien Le Moal
  0 siblings, 0 replies; 28+ messages in thread
From: Damien Le Moal @ 2022-08-16 17:55 UTC (permalink / raw)
  To: Sam Li
  Cc: qemu-devel, Hannes Reinecke, Fam Zheng, Kevin Wolf, Eric Blake,
	Markus Armbruster, Stefan Hajnoczi, Hanna Reitz, Dmitry Fomichev,
	qemu block

On 2022/08/16 10:53, Sam Li wrote:
> Damien Le Moal <damien.lemoal@opensource.wdc.com> 于2022年8月17日周三 01:35写道:
>>
>> On 2022/08/15 23:25, Sam Li wrote:
>>> Use sysfs attribute files to get the long value of zoned device
>>> information.
>>>
>>> Signed-off-by: Sam Li <faithilikerun@gmail.com>
>>> Reviewed-by: Hannes Reinecke <hare@suse.de>
>>> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
>>> ---
>>>  block/file-posix.c | 27 +++++++++++++++++++++++++++
>>>  1 file changed, 27 insertions(+)
>>>
>>> diff --git a/block/file-posix.c b/block/file-posix.c
>>> index c07ac4c697..727389488c 100644
>>> --- a/block/file-posix.c
>>> +++ b/block/file-posix.c
>>> @@ -1258,6 +1258,33 @@ static int get_sysfs_zoned_model(struct stat *st, BlockZoneModel *zoned) {
>>>      return 0;
>>>  }
>>>
>>> +/*
>>> + * Get zoned device information (chunk_sectors, zoned_append_max_bytes,
>>> + * max_open_zones, max_active_zones) through sysfs attribute files.
>>> + */
>>
>> The comment here needs to be more generic since this helper is used in patch 2
>> in hdev_get_max_segments(). So simply something like:
>>
>> /*
>>  * Get a sysfs attribute value as a long integer.
>>  */
>>
>> And since this helper is used in patch 2, this patch needs to go before patch 2
>> (reverse patch 2 and 3 order).
> 
> Can I merge patch2 and patch 3 into one patch? Because in patch 2
> hdev_get_max_segments -> get_sysfs_long_val(-> get_sysfs_str_val)
> while in patch 3 get_sysfs_long_val-> get_sysfs_str_val,
> hdev_get_max_segments is required for qemu setting up I guess so the
> dependency is intertwined here. If we use separate patches, then the
> last patch will modify the first patch's code, which I think is messy.

Indeed. So merge the 2 patches to solve this. Rework the commit message too to
mention the introduction of the get_sysfs_long_val() helper.

> 
>>
>>> +static long get_sysfs_long_val(struct stat *st, const char *attribute) {
>>> +#ifdef CONFIG_LINUX
>>> +    g_autofree char *str = NULL;
>>> +    const char *end;
>>> +    long val;
>>> +    int ret;
>>> +
>>> +    ret = get_sysfs_str_val(st, attribute, &str);
>>> +    if (ret < 0) {
>>> +        return ret;
>>> +    }
>>> +
>>> +    /* The file is ended with '\n', pass 'end' to accept that. */
>>> +    ret = qemu_strtol(str, &end, 10, &val);
>>> +    if (ret == 0 && end && *end == '\n') {
>>> +        ret = val;
>>> +    }
>>> +    return ret;
>>> +#else
>>> +    return -ENOTSUP;
>>> +#endif
>>> +}
>>> +
>>>  static int hdev_get_max_segments(int fd, struct stat *st) {
>>>      int ret;
>>>      if (S_ISCHR(st->st_mode)) {
>>
>>
>> --
>> Damien Le Moal
>> Western Digital Research


-- 
Damien Le Moal
Western Digital Research


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v7 2/8] file-posix: introduce get_sysfs_str_val for device zoned model
  2022-08-16  6:25 ` [PATCH v7 2/8] file-posix: introduce get_sysfs_str_val for device zoned model Sam Li
  2022-08-16 16:11   ` Sam Li
  2022-08-16 17:32   ` Damien Le Moal
@ 2022-08-22 23:05   ` Stefan Hajnoczi
  2022-08-23  4:31     ` Sam Li
  2 siblings, 1 reply; 28+ messages in thread
From: Stefan Hajnoczi @ 2022-08-22 23:05 UTC (permalink / raw)
  To: Sam Li
  Cc: qemu-devel, hare, Fam Zheng, Kevin Wolf, Eric Blake,
	Markus Armbruster, Hanna Reitz, dmitry.fomichev, qemu-block,
	damien.lemoal

[-- Attachment #1: Type: text/plain, Size: 1010 bytes --]

On Tue, Aug 16, 2022 at 02:25:16PM +0800, Sam Li wrote:
> +static int hdev_get_max_segments(int fd, struct stat *st) {
> +    int ret;
> +    if (S_ISCHR(st->st_mode)) {
> +        if (ioctl(fd, SG_GET_SG_TABLESIZE, &ret) == 0) {

The ioctl must be within #ifdef CONFIG_LINUX since SG_GET_SG_TABLESIZE
will be undefined on other operating systems and a compiler error will
be encountered. Maybe keep the #ifdef around the entire body of this
hdev_get_max_segments().

> +            return ret;
> +        }
> +        return -ENOTSUP;
>      }
> -    g_free(sysfspath);
> -    return ret;
> -#else
> -    return -ENOTSUP;
> -#endif
> +    return get_sysfs_long_val(st, "max_segments");

Where is get_sysfs_long_val() defined? Maybe in a later patch? The code
must compile after each patch. You can test this with "git rebase -i
origin/master" and then adding "x make" lines after each commit in the
interactive rebase file. When rebase runs it will execute make after
each commit and will stop if make fails.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v7 4/8] block: add block layer APIs resembling Linux ZonedBlockDevice ioctls
  2022-08-16  6:25 ` [PATCH v7 4/8] block: add block layer APIs resembling Linux ZonedBlockDevice ioctls Sam Li
  2022-08-16 17:50   ` Damien Le Moal
@ 2022-08-23  0:49   ` Stefan Hajnoczi
  2022-08-23  4:12     ` Sam Li
  1 sibling, 1 reply; 28+ messages in thread
From: Stefan Hajnoczi @ 2022-08-23  0:49 UTC (permalink / raw)
  To: Sam Li
  Cc: qemu-devel, hare, Fam Zheng, Kevin Wolf, Eric Blake,
	Markus Armbruster, Hanna Reitz, dmitry.fomichev, qemu-block,
	damien.lemoal

[-- Attachment #1: Type: text/plain, Size: 32412 bytes --]

On Tue, Aug 16, 2022 at 02:25:18PM +0800, Sam Li wrote:
> By adding zone management operations in BlockDriver, storage controller
> emulation can use the new block layer APIs including Report Zone and
> four zone management operations (open, close, finish, reset).
> 
> Add zoned storage commands of the device: zone_report(zrp), zone_open(zo),
> zone_close(zc), zone_reset(zrs), zone_finish(zf).
> 
> For example, to test zone_report, use following command:
> $ ./build/qemu-io --image-opts driver=zoned_host_device, filename=/dev/nullb0
> -c "zrp offset nr_zones"
> 
> Signed-off-by: Sam Li <faithilikerun@gmail.com>
> Reviewed-by: Hannes Reinecke <hare@suse.de>
> ---
>  block/block-backend.c             |  50 +++++
>  block/file-posix.c                | 341 +++++++++++++++++++++++++++++-
>  block/io.c                        |  41 ++++
>  include/block/block-common.h      |   1 -
>  include/block/block-io.h          |  13 ++
>  include/block/block_int-common.h  |  22 +-
>  include/block/raw-aio.h           |   6 +-
>  include/sysemu/block-backend-io.h |   6 +
>  meson.build                       |   1 +
>  qapi/block-core.json              |   8 +-
>  qemu-io-cmds.c                    | 143 +++++++++++++
>  11 files changed, 625 insertions(+), 7 deletions(-)
> 
> diff --git a/block/block-backend.c b/block/block-backend.c
> index d4a5df2ac2..fc639b0cd7 100644
> --- a/block/block-backend.c
> +++ b/block/block-backend.c
> @@ -1775,6 +1775,56 @@ int coroutine_fn blk_co_flush(BlockBackend *blk)
>      return ret;
>  }
>  
> +/*
> + * Send a zone_report command.
> + * offset is a byte offset from the start of the device. No alignment
> + * required for offset.
> + * nr_zones represents IN maximum and OUT actual.
> + */
> +int coroutine_fn blk_co_zone_report(BlockBackend *blk, int64_t offset,
> +                                    unsigned int *nr_zones,
> +                                    BlockZoneDescriptor *zones)
> +{
> +    int ret;
> +    IO_CODE();
> +
> +    blk_inc_in_flight(blk); /* increase before waiting */
> +    blk_wait_while_drained(blk);
> +    if (!blk_is_available(blk)) {
> +        blk_dec_in_flight(blk);
> +        return -ENOMEDIUM;
> +    }
> +    ret = bdrv_co_zone_report(blk_bs(blk), offset, nr_zones, zones);
> +    blk_dec_in_flight(blk);
> +    return ret;
> +}
> +
> +/*
> + * Send a zone_management command.
> + * offset is the starting zone specified as a sector offset.
> + * len is the maximum number of sectors the command should operate on.
> + */
> +int coroutine_fn blk_co_zone_mgmt(BlockBackend *blk, BlockZoneOp op,
> +        int64_t offset, int64_t len)
> +{
> +    int ret;
> +    IO_CODE();
> +
> +    ret = blk_check_byte_request(blk, offset, len);
> +    if (ret < 0) {
> +        return ret;
> +    }

blk_check_byte_request() calls blk_is_available() and returns -ENOMEDIUM
when it fails. You can therefore move this down and replace "if
(!blk_is_available(blk)) {".

> +    blk_inc_in_flight(blk);
> +    blk_wait_while_drained(blk);
> +    if (!blk_is_available(blk)) {
> +        blk_dec_in_flight(blk);
> +        return -ENOMEDIUM;
> +    }
> +    ret = bdrv_co_zone_mgmt(blk_bs(blk), op, offset, len);
> +    blk_dec_in_flight(blk);
> +    return ret;
> +}
> +
>  void blk_drain(BlockBackend *blk)
>  {
>      BlockDriverState *bs = blk_bs(blk);
> diff --git a/block/file-posix.c b/block/file-posix.c
> index 727389488c..29f67082d9 100644
> --- a/block/file-posix.c
> +++ b/block/file-posix.c
> @@ -67,6 +67,9 @@
>  #include <sys/param.h>
>  #include <sys/syscall.h>
>  #include <sys/vfs.h>
> +#if defined(CONFIG_BLKZONED)
> +#include <linux/blkzoned.h>
> +#endif
>  #include <linux/cdrom.h>
>  #include <linux/fd.h>
>  #include <linux/fs.h>
> @@ -216,6 +219,13 @@ typedef struct RawPosixAIOData {
>              PreallocMode prealloc;
>              Error **errp;
>          } truncate;
> +        struct {
> +            unsigned int *nr_zones;
> +            BlockZoneDescriptor *zones;
> +        } zone_report;
> +        struct {
> +            unsigned long ioctl_op;
> +        } zone_mgmt;
>      };
>  } RawPosixAIOData;
>  
> @@ -1328,7 +1338,7 @@ static void raw_refresh_limits(BlockDriverState *bs, Error **errp)
>  #endif
>  
>      if (bs->sg || S_ISBLK(st.st_mode)) {
> -        int ret = hdev_get_max_hw_transfer(s->fd, &st);
> +        ret = hdev_get_max_hw_transfer(s->fd, &st);
>  
>          if (ret > 0 && ret <= BDRV_REQUEST_MAX_BYTES) {
>              bs->bl.max_hw_transfer = ret;
> @@ -1340,11 +1350,32 @@ static void raw_refresh_limits(BlockDriverState *bs, Error **errp)
>          }
>      }
>  
> -    ret = get_sysfs_zoned_model(s->fd, &st, &zoned);
> +    ret = get_sysfs_zoned_model(&st, &zoned);
>      if (ret < 0) {
>          zoned = BLK_Z_NONE;
>      }
>      bs->bl.zoned = zoned;
> +    if (zoned != BLK_Z_NONE) {
> +        ret = get_sysfs_long_val(&st, "chunk_sectors");
> +        if (ret > 0) {
> +            bs->bl.zone_sectors = ret;
> +        }
> +
> +        ret = get_sysfs_long_val(&st, "zone_append_max_bytes");
> +        if (ret > 0) {
> +            bs->bl.zone_append_max_bytes = ret;
> +        }
> +
> +        ret = get_sysfs_long_val(&st, "max_open_zones");
> +        if (ret > 0) {
> +            bs->bl.max_open_zones = ret;
> +        }
> +
> +        ret = get_sysfs_long_val(&st, "max_active_zones");
> +        if (ret > 0) {
> +            bs->bl.max_active_zones = ret;
> +        }
> +    }
>  }
>  
>  static int check_for_dasd(int fd)
> @@ -1839,6 +1870,134 @@ static off_t copy_file_range(int in_fd, off_t *in_off, int out_fd,
>  }
>  #endif
>  
> +/*
> + * parse_zone - Fill a zone descriptor
> + */
> +#if defined(CONFIG_BLKZONED)
> +static inline void parse_zone(struct BlockZoneDescriptor *zone,
> +                              struct blk_zone *blkz) {

Declaring the second argument "const struct blk_zone *blkz" would make
it clear that this function converts from blk_zone to
BlockZoneDescriptor.

> +    zone->start = blkz->start;
> +    zone->length = blkz->len;
> +    zone->cap = blkz->capacity;
> +    zone->wp = blkz->wp;
> +
> +    switch (blkz->type) {
> +    case BLK_ZONE_TYPE_SEQWRITE_REQ:
> +        zone->type = BLK_ZT_SWR;
> +        break;
> +    case BLK_ZONE_TYPE_SEQWRITE_PREF:
> +        zone->type = BLK_ZT_SWP;
> +        break;
> +    case BLK_ZONE_TYPE_CONVENTIONAL:
> +        zone->type = BLK_ZT_CONV;
> +        break;
> +    default:
> +        error_report("Invalid zone type: 0x%x", blkz->type);

Or g_assert_not_reached() to indicate that this should never happen. If
it does happen the process will call abort(3) and it will terminate with
a coredump file for debugging.

> +    }
> +
> +    switch (blkz->cond) {
> +    case BLK_ZONE_COND_NOT_WP:
> +        zone->cond = BLK_ZS_NOT_WP;
> +        break;
> +    case BLK_ZONE_COND_EMPTY:
> +        zone->cond = BLK_ZS_EMPTY;
> +        break;
> +    case BLK_ZONE_COND_IMP_OPEN:
> +        zone->cond =BLK_ZS_IOPEN;
> +        break;
> +    case BLK_ZONE_COND_EXP_OPEN:
> +        zone->cond = BLK_ZS_EOPEN;
> +        break;
> +    case BLK_ZONE_COND_CLOSED:
> +        zone->cond = BLK_ZS_CLOSED;
> +        break;
> +    case BLK_ZONE_COND_READONLY:
> +        zone->cond = BLK_ZS_RDONLY;
> +        break;
> +    case BLK_ZONE_COND_FULL:
> +        zone->cond = BLK_ZS_FULL;
> +        break;
> +    case BLK_ZONE_COND_OFFLINE:
> +        zone->cond = BLK_ZS_OFFLINE;
> +        break;
> +    default:
> +        error_report("Invalid zone condition 0x%x", blkz->cond);

Same here.

> +    }
> +}
> +#endif
> +
> +static int handle_aiocb_zone_report(void *opaque) {
> +#if defined(CONFIG_BLKZONED)
> +    RawPosixAIOData *aiocb = opaque;
> +    int fd = aiocb->aio_fildes;
> +    unsigned int *nr_zones = aiocb->zone_report.nr_zones;
> +    BlockZoneDescriptor *zones = aiocb->zone_report.zones;
> +    int64_t sector = aiocb->aio_offset;
> +
> +    struct blk_zone *blkz;
> +    int64_t rep_size;
> +    unsigned int nrz;
> +    int ret, n = 0, i = 0;
> +
> +    nrz = *nr_zones;
> +    rep_size = sizeof(struct blk_zone_report) + nrz * sizeof(struct blk_zone);
> +    g_autofree struct blk_zone_report *rep = NULL;
> +    rep = g_malloc(rep_size);
> +
> +    blkz = (struct blk_zone *)(rep + 1);
> +    while (n < nrz) {
> +        memset(rep, 0, rep_size);
> +        rep->sector = sector;
> +        rep->nr_zones = nrz - n;
> +
> +        ret = ioctl(fd, BLKREPORTZONE, rep);

Does this ioctl() need "do { ... } while (ret == -1 && errno == EINTR)"?

> +        if (ret != 0) {
> +            error_report("%d: ioctl BLKREPORTZONE at %" PRId64 " failed %d",
> +                         fd, sector, errno);
> +            return -errno;
> +        }
> +
> +        if (!rep->nr_zones) {
> +            break;
> +        }
> +
> +        for (i = 0; i < rep->nr_zones; i++, n++) {
> +            parse_zone(&zones[n], &blkz[i]);
> +            /* The next report should start after the last zone reported */
> +            sector = blkz[i].start + blkz[i].len;
> +        }
> +    }
> +
> +    *nr_zones = n;
> +    return 0;
> +#else
> +    return -ENOTSUP;
> +#endif
> +}
> +
> +static int handle_aiocb_zone_mgmt(void *opaque) {
> +#if defined(CONFIG_BLKZONED)
> +    RawPosixAIOData *aiocb = opaque;
> +    int fd = aiocb->aio_fildes;
> +    int64_t sector = aiocb->aio_offset;
> +    int64_t nr_sectors = aiocb->aio_nbytes;
> +    unsigned long ioctl_op = aiocb->zone_mgmt.ioctl_op;
> +    struct blk_zone_range range;
> +    int ret;
> +
> +    /* Execute the operation */
> +    range.sector = sector;
> +    range.nr_sectors = nr_sectors;
> +    do {
> +        ret = ioctl(fd, ioctl_op, &range);
> +    } while (ret != 0 && errno == EINTR);
> +
> +    return ret;

  if (ret < 0) {
      return -errno;
  }
  return 0;

> +#else
> +    return -ENOTSUP;
> +#endif
> +}
> +
>  static int handle_aiocb_copy_range(void *opaque)
>  {
>      RawPosixAIOData *aiocb = opaque;
> @@ -3011,6 +3170,124 @@ static void raw_account_discard(BDRVRawState *s, uint64_t nbytes, int ret)
>      }
>  }
>  
> +/*
> + * zone report - Get a zone block device's information in the form
> + * of an array of zone descriptors.
> + *
> + * @param bs: passing zone block device file descriptor
> + * @param zones: an array of zone descriptors to hold zone
> + * information on reply
> + * @param offset: offset can be any byte within the zone size.
> + * @param len: (not sure yet.
> + * @return 0 on success, -1 on failure
> + */
> +static int coroutine_fn raw_co_zone_report(BlockDriverState *bs, int64_t offset,
> +                                           unsigned int *nr_zones,
> +                                           BlockZoneDescriptor *zones) {
> +#if defined(CONFIG_BLKZONED)
> +    BDRVRawState *s = bs->opaque;
> +    RawPosixAIOData acb;
> +
> +    acb = (RawPosixAIOData) {
> +        .bs         = bs,
> +        .aio_fildes = s->fd,
> +        .aio_type   = QEMU_AIO_ZONE_REPORT,
> +        /* zoned block devices use 512-byte sectors */
> +        .aio_offset = offset / 512,
> +        .zone_report    = {
> +                .nr_zones       = nr_zones,
> +                .zones          = zones,
> +        },
> +    };
> +
> +    return raw_thread_pool_submit(bs, handle_aiocb_zone_report, &acb);
> +#else
> +    return -ENOTSUP;
> +#endif
> +}
> +
> +/*
> + * zone management operations - Execute an operation on a zone
> + */
> +static int coroutine_fn raw_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
> +        int64_t offset, int64_t len) {
> +#if defined(CONFIG_BLKZONED)
> +    BDRVRawState *s = bs->opaque;
> +    RawPosixAIOData acb;
> +    int64_t zone_sector, zone_sector_mask;
> +    const char *ioctl_name;
> +    unsigned long ioctl_op;
> +    int ret;
> +
> +    struct stat st;
> +    if (fstat(s->fd, &st) < 0) {
> +        ret = -errno;
> +        return ret;
> +    }
> +    zone_sector = get_sysfs_long_val(&st, "chunk_sectors");
> +    if (zone_sector < 0) {
> +        error_report("invalid zone sector size %" PRId64 "", zone_sector);
> +        return -EINVAL;
> +    }
> +
> +    zone_sector_mask = zone_sector - 1;
> +    if (offset & zone_sector_mask) {
> +        error_report("sector offset %" PRId64 " is not aligned to zone size "
> +                     "%" PRId64 "", offset, zone_sector);
> +        return -EINVAL;
> +    }
> +
> +    if (len & zone_sector_mask) {
> +        error_report("number of sectors %" PRId64 " is not aligned to zone size"
> +                      " %" PRId64 "", len, zone_sector);
> +        return -EINVAL;
> +    }
> +
> +    switch (op) {
> +    case BLK_ZO_OPEN:
> +        ioctl_name = "BLKOPENZONE";
> +        ioctl_op = BLKOPENZONE;
> +        break;
> +    case BLK_ZO_CLOSE:
> +        ioctl_name = "BLKCLOSEZONE";
> +        ioctl_op = BLKCLOSEZONE;
> +        break;
> +    case BLK_ZO_FINISH:
> +        ioctl_name = "BLKFINISHZONE";
> +        ioctl_op = BLKFINISHZONE;
> +        break;
> +    case BLK_ZO_RESET:
> +        ioctl_name = "BLKRESETZONE";
> +        ioctl_op = BLKRESETZONE;
> +        break;
> +    default:
> +        error_report("Invalid zone operation 0x%x", op);
> +        return -EINVAL;
> +    }
> +
> +    acb = (RawPosixAIOData) {
> +        .bs             = bs,
> +        .aio_fildes     = s->fd,
> +        .aio_type       = QEMU_AIO_ZONE_MGMT,
> +        .aio_offset     = offset,
> +        .aio_nbytes     = len,
> +        .zone_mgmt  = {
> +                .ioctl_op = ioctl_op,
> +        },
> +    };
> +
> +    ret = raw_thread_pool_submit(bs, handle_aiocb_zone_mgmt, &acb);
> +    if (ret != 0) {
> +        error_report("ioctl %s failed %d", ioctl_name, errno);
> +        return -errno;
> +    }
> +
> +    return ret;
> +#else
> +    return -ENOTSUP;
> +#endif
> +}
> +
>  static coroutine_fn int
>  raw_do_pdiscard(BlockDriverState *bs, int64_t offset, int64_t bytes,
>                  bool blkdev)
> @@ -3511,6 +3788,14 @@ static void hdev_parse_filename(const char *filename, QDict *options,
>      bdrv_parse_filename_strip_prefix(filename, "host_device:", options);
>  }
>  
> +#if defined(CONFIG_BLKZONED)
> +static void zoned_host_device_parse_filename(const char *filename, QDict *options,
> +                                Error **errp)
> +{
> +    bdrv_parse_filename_strip_prefix(filename, "zoned_host_device:", options);
> +}
> +#endif

Sorry, I asked you to add this function but I've changed my mind and I
think it should not be present. .bdrv_parse_filename() helps legacy
drivers convert arguments into QDict *options. But this is a new driver
that no one expects to work with string filenames. Therefore
.bdrv_parse_filename can be dropped.

> +
>  static bool hdev_is_sg(BlockDriverState *bs)
>  {
>  
> @@ -3741,6 +4026,55 @@ static BlockDriver bdrv_host_device = {
>  #endif
>  };
>  
> +#if defined(CONFIG_BLKZONED)
> +static BlockDriver bdrv_zoned_host_device = {
> +        .format_name = "zoned_host_device",
> +        .protocol_name = "zoned_host_device",
> +        .instance_size = sizeof(BDRVRawState),
> +        .bdrv_needs_filename = true,
> +        .bdrv_probe_device  = hdev_probe_device,
> +        .bdrv_parse_filename = zoned_host_device_parse_filename,
> +        .bdrv_file_open     = hdev_open,
> +        .bdrv_close         = raw_close,
> +        .bdrv_reopen_prepare = raw_reopen_prepare,
> +        .bdrv_reopen_commit  = raw_reopen_commit,
> +        .bdrv_reopen_abort   = raw_reopen_abort,
> +        .bdrv_co_create_opts = bdrv_co_create_opts_simple,
> +        .create_opts         = &bdrv_create_opts_simple,
> +        .mutable_opts        = mutable_opts,
> +        .bdrv_co_invalidate_cache = raw_co_invalidate_cache,
> +        .bdrv_co_pwrite_zeroes = hdev_co_pwrite_zeroes,
> +
> +        .bdrv_co_preadv         = raw_co_preadv,
> +        .bdrv_co_pwritev        = raw_co_pwritev,
> +        .bdrv_co_flush_to_disk  = raw_co_flush_to_disk,
> +        .bdrv_co_pdiscard       = hdev_co_pdiscard,
> +        .bdrv_co_copy_range_from = raw_co_copy_range_from,
> +        .bdrv_co_copy_range_to  = raw_co_copy_range_to,
> +        .bdrv_refresh_limits = raw_refresh_limits,
> +        .bdrv_io_plug = raw_aio_plug,
> +        .bdrv_io_unplug = raw_aio_unplug,
> +        .bdrv_attach_aio_context = raw_aio_attach_aio_context,
> +
> +        .bdrv_co_truncate       = raw_co_truncate,
> +        .bdrv_getlength = raw_getlength,
> +        .bdrv_get_info = raw_get_info,
> +        .bdrv_get_allocated_file_size
> +                            = raw_get_allocated_file_size,
> +        .bdrv_get_specific_stats = hdev_get_specific_stats,
> +        .bdrv_check_perm = raw_check_perm,
> +        .bdrv_set_perm   = raw_set_perm,
> +        .bdrv_abort_perm_update = raw_abort_perm_update,
> +        .bdrv_probe_blocksizes = hdev_probe_blocksizes,
> +        .bdrv_probe_geometry = hdev_probe_geometry,
> +        .bdrv_co_ioctl = hdev_co_ioctl,
> +
> +        /* zone management operations */
> +        .bdrv_co_zone_report = raw_co_zone_report,
> +        .bdrv_co_zone_mgmt = raw_co_zone_mgmt,
> +};
> +#endif
> +
>  #if defined(__linux__) || defined(__FreeBSD__) || defined(__FreeBSD_kernel__)
>  static void cdrom_parse_filename(const char *filename, QDict *options,
>                                   Error **errp)
> @@ -4001,6 +4335,9 @@ static void bdrv_file_init(void)
>      bdrv_register(&bdrv_file);
>  #if defined(HAVE_HOST_BLOCK_DEVICE)
>      bdrv_register(&bdrv_host_device);
> +#if defined(CONFIG_BLKZONED)
> +    bdrv_register(&bdrv_zoned_host_device);
> +#endif
>  #ifdef __linux__
>      bdrv_register(&bdrv_host_cdrom);
>  #endif
> diff --git a/block/io.c b/block/io.c
> index 0a8cbefe86..de9ec1d740 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -3198,6 +3198,47 @@ out:
>      return co.ret;
>  }
>  
> +int bdrv_co_zone_report(BlockDriverState *bs, int64_t offset,
> +                        unsigned int *nr_zones,
> +                        BlockZoneDescriptor *zones)
> +{
> +    BlockDriver *drv = bs->drv;
> +    CoroutineIOCompletion co = {
> +            .coroutine = qemu_coroutine_self(),
> +    };
> +    IO_CODE();
> +
> +    bdrv_inc_in_flight(bs);
> +    if (!drv || !drv->bdrv_co_zone_report) {
> +        co.ret = -ENOTSUP;
> +        goto out;
> +    }
> +    co.ret = drv->bdrv_co_zone_report(bs, offset, nr_zones, zones);
> +out:
> +    bdrv_dec_in_flight(bs);
> +    return co.ret;
> +}
> +
> +int bdrv_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
> +        int64_t offset, int64_t len)
> +{
> +    BlockDriver *drv = bs->drv;
> +    CoroutineIOCompletion co = {
> +            .coroutine = qemu_coroutine_self(),
> +    };
> +    IO_CODE();
> +
> +    bdrv_inc_in_flight(bs);
> +    if (!drv || !drv->bdrv_co_zone_mgmt) {
> +        co.ret = -ENOTSUP;
> +        goto out;
> +    }
> +    co.ret = drv->bdrv_co_zone_mgmt(bs, op, offset, len);
> +out:
> +    bdrv_dec_in_flight(bs);
> +    return co.ret;
> +}
> +
>  void *qemu_blockalign(BlockDriverState *bs, size_t size)
>  {
>      IO_CODE();
> diff --git a/include/block/block-common.h b/include/block/block-common.h
> index 36bd0e480e..5102fa6858 100644
> --- a/include/block/block-common.h
> +++ b/include/block/block-common.h
> @@ -23,7 +23,6 @@
>   */
>  #ifndef BLOCK_COMMON_H
>  #define BLOCK_COMMON_H
> -
>  #include "block/aio.h"
>  #include "block/aio-wait.h"
>  #include "qemu/iov.h"

Unrelated whitespace change. Please drop this.

> diff --git a/include/block/block-io.h b/include/block/block-io.h
> index fd25ffa9be..55ad261e16 100644
> --- a/include/block/block-io.h
> +++ b/include/block/block-io.h
> @@ -88,6 +88,13 @@ int bdrv_co_ioctl(BlockDriverState *bs, int req, void *buf);
>  /* Ensure contents are flushed to disk.  */
>  int coroutine_fn bdrv_co_flush(BlockDriverState *bs);
>  
> +/* Report zone information of zone block device. */
> +int coroutine_fn bdrv_co_zone_report(BlockDriverState *bs, int64_t offset,
> +                                     unsigned int *nr_zones,
> +                                     BlockZoneDescriptor *zones);
> +int coroutine_fn bdrv_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
> +                                   int64_t offset, int64_t len);
> +
>  int bdrv_co_pdiscard(BdrvChild *child, int64_t offset, int64_t bytes);
>  bool bdrv_can_write_zeroes_with_unmap(BlockDriverState *bs);
>  int bdrv_block_status(BlockDriverState *bs, int64_t offset,
> @@ -297,6 +304,12 @@ bdrv_readv_vmstate(BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos);
>  int generated_co_wrapper
>  bdrv_writev_vmstate(BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos);
>  
> +int generated_co_wrapper
> +blk_zone_report(BlockBackend *blk, int64_t offset, unsigned int *nr_zones,
> +                BlockZoneDescriptor *zones);
> +int generated_co_wrapper
> +blk_zone_mgmt(BlockBackend *blk, BlockZoneOp op, int64_t offset, int64_t len);
> +
>  /**
>   * bdrv_parent_drained_begin_single:
>   *
> diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
> index 7f7863cc9e..de44c7b6f4 100644
> --- a/include/block/block_int-common.h
> +++ b/include/block/block_int-common.h
> @@ -94,7 +94,6 @@ typedef struct BdrvTrackedRequest {
>      struct BdrvTrackedRequest *waiting_for;
>  } BdrvTrackedRequest;
>  
> -
>  struct BlockDriver {
>      /*
>       * These fields are initialized when this object is created,

Unrelated whitespace change. Please drop this.

> @@ -691,6 +690,12 @@ struct BlockDriver {
>                                            QEMUIOVector *qiov,
>                                            int64_t pos);
>  
> +    int coroutine_fn (*bdrv_co_zone_report)(BlockDriverState *bs,
> +            int64_t offset, unsigned int *nr_zones,
> +            BlockZoneDescriptor *zones);
> +    int coroutine_fn (*bdrv_co_zone_mgmt)(BlockDriverState *bs, BlockZoneOp op,
> +            int64_t offset, int64_t len);
> +
>      /* removable device specific */
>      bool (*bdrv_is_inserted)(BlockDriverState *bs);
>      void (*bdrv_eject)(BlockDriverState *bs, bool eject_flag);
> @@ -828,6 +833,21 @@ typedef struct BlockLimits {
>  
>      /* device zone model */
>      BlockZoneModel zoned;
> +
> +    /* zone size expressed in 512-byte sectors */
> +    uint32_t zone_sectors;
> +
> +    /* total number of zones */
> +    unsigned int nr_zones;
> +
> +    /* maximum size in bytes of a zone append write operation */
> +    int64_t zone_append_max_bytes;
> +
> +    /* maximum number of open zones */
> +    int64_t max_open_zones;
> +
> +    /* maximum number of active zones */
> +    int64_t max_active_zones;
>  } BlockLimits;
>  
>  typedef struct BdrvOpBlocker BdrvOpBlocker;
> diff --git a/include/block/raw-aio.h b/include/block/raw-aio.h
> index 21fc10c4c9..3d26929cdd 100644
> --- a/include/block/raw-aio.h
> +++ b/include/block/raw-aio.h
> @@ -29,6 +29,8 @@
>  #define QEMU_AIO_WRITE_ZEROES 0x0020
>  #define QEMU_AIO_COPY_RANGE   0x0040
>  #define QEMU_AIO_TRUNCATE     0x0080
> +#define QEMU_AIO_ZONE_REPORT  0x0100
> +#define QEMU_AIO_ZONE_MGMT    0x0200
>  #define QEMU_AIO_TYPE_MASK \
>          (QEMU_AIO_READ | \
>           QEMU_AIO_WRITE | \
> @@ -37,7 +39,9 @@
>           QEMU_AIO_DISCARD | \
>           QEMU_AIO_WRITE_ZEROES | \
>           QEMU_AIO_COPY_RANGE | \
> -         QEMU_AIO_TRUNCATE)
> +         QEMU_AIO_TRUNCATE  | \
> +         QEMU_AIO_ZONE_REPORT | \
> +         QEMU_AIO_ZONE_MGMT)
>  
>  /* AIO flags */
>  #define QEMU_AIO_MISALIGNED   0x1000
> diff --git a/include/sysemu/block-backend-io.h b/include/sysemu/block-backend-io.h
> index 50f5aa2e07..6e7df1d93b 100644
> --- a/include/sysemu/block-backend-io.h
> +++ b/include/sysemu/block-backend-io.h
> @@ -156,6 +156,12 @@ int generated_co_wrapper blk_pwrite_zeroes(BlockBackend *blk, int64_t offset,
>  int coroutine_fn blk_co_pwrite_zeroes(BlockBackend *blk, int64_t offset,
>                                        int64_t bytes, BdrvRequestFlags flags);
>  
> +int coroutine_fn blk_co_zone_report(BlockBackend *blk, int64_t offset,
> +                                    unsigned int *nr_zones,
> +                                    BlockZoneDescriptor *zones);
> +int coroutine_fn blk_co_zone_mgmt(BlockBackend *blk, BlockZoneOp op,
> +                                  int64_t offset, int64_t len);
> +
>  int generated_co_wrapper blk_pdiscard(BlockBackend *blk, int64_t offset,
>                                        int64_t bytes);
>  int coroutine_fn blk_co_pdiscard(BlockBackend *blk, int64_t offset,
> diff --git a/meson.build b/meson.build
> index 294e9a8f32..c3219b0e87 100644
> --- a/meson.build
> +++ b/meson.build
> @@ -1883,6 +1883,7 @@ config_host_data.set('CONFIG_REPLICATION', get_option('live_block_migration').al
>  # has_header
>  config_host_data.set('CONFIG_EPOLL', cc.has_header('sys/epoll.h'))
>  config_host_data.set('CONFIG_LINUX_MAGIC_H', cc.has_header('linux/magic.h'))
> +config_host_data.set('CONFIG_BLKZONED', cc.has_header('linux/blkzoned.h'))
>  config_host_data.set('CONFIG_VALGRIND_H', cc.has_header('valgrind/valgrind.h'))
>  config_host_data.set('HAVE_BTRFS_H', cc.has_header('linux/btrfs.h'))
>  config_host_data.set('HAVE_DRM_H', cc.has_header('libdrm/drm.h'))
> diff --git a/qapi/block-core.json b/qapi/block-core.json
> index 2173e7734a..c6bbb7a037 100644
> --- a/qapi/block-core.json
> +++ b/qapi/block-core.json
> @@ -2942,6 +2942,7 @@
>  # @compress: Since 5.0
>  # @copy-before-write: Since 6.2
>  # @snapshot-access: Since 7.0
> +# @zoned_host_device: Since 7.2
>  #
>  # Since: 2.9
>  ##
> @@ -2955,7 +2956,8 @@
>              'luks', 'nbd', 'nfs', 'null-aio', 'null-co', 'nvme', 'parallels',
>              'preallocate', 'qcow', 'qcow2', 'qed', 'quorum', 'raw', 'rbd',
>              { 'name': 'replication', 'if': 'CONFIG_REPLICATION' },
> -            'ssh', 'throttle', 'vdi', 'vhdx', 'vmdk', 'vpc', 'vvfat' ] }
> +            'ssh', 'throttle', 'vdi', 'vhdx', 'vmdk', 'vpc', 'vvfat',
> +            { 'name': 'zoned_host_device', 'if': 'CONFIG_BLKZONED' } ] }
>  
>  ##
>  # @BlockdevOptionsFile:
> @@ -4329,7 +4331,9 @@
>        'vhdx':       'BlockdevOptionsGenericFormat',
>        'vmdk':       'BlockdevOptionsGenericCOWFormat',
>        'vpc':        'BlockdevOptionsGenericFormat',
> -      'vvfat':      'BlockdevOptionsVVFAT'
> +      'vvfat':      'BlockdevOptionsVVFAT',
> +      'zoned_host_device': { 'type': 'BlockdevOptionsFile',
> +                             'if': 'CONFIG_BLKZONED' }
>    } }
>  
>  ##
> diff --git a/qemu-io-cmds.c b/qemu-io-cmds.c
> index 952dc940f1..687c3a624c 100644
> --- a/qemu-io-cmds.c
> +++ b/qemu-io-cmds.c
> @@ -1712,6 +1712,144 @@ static const cmdinfo_t flush_cmd = {
>      .oneline    = "flush all in-core file state to disk",
>  };
>  
> +static int zone_report_f(BlockBackend *blk, int argc, char **argv)
> +{
> +    int ret;
> +    int64_t offset;
> +    unsigned int nr_zones;
> +
> +    ++optind;
> +    offset = cvtnum(argv[optind]);
> +    ++optind;
> +    nr_zones = cvtnum(argv[optind]);
> +
> +    g_autofree BlockZoneDescriptor *zones = NULL;
> +    zones = g_new(BlockZoneDescriptor, nr_zones);
> +    ret = blk_zone_report(blk, offset, &nr_zones, zones);
> +    if (ret < 0) {
> +        printf("zone report failed: %s\n", strerror(-ret));
> +    } else {
> +        for (int i = 0; i < nr_zones; ++i) {
> +            printf("start: 0x%" PRIx64 ", len 0x%" PRIx64 ", "
> +                   "cap"" 0x%" PRIx64 ",wptr 0x%" PRIx64 ", "
> +                   "zcond:%u, [type: %u]\n",
> +                   zones[i].start, zones[i].length, zones[i].cap, zones[i].wp,
> +                   zones[i].cond, zones[i].type);
> +        }
> +    }
> +    return ret;
> +}
> +
> +static const cmdinfo_t zone_report_cmd = {
> +        .name = "zone_report",
> +        .altname = "zrp",
> +        .cfunc = zone_report_f,
> +        .argmin = 2,
> +        .argmax = 2,
> +        .args = "offset number",
> +        .oneline = "report zone information",
> +};
> +
> +static int zone_open_f(BlockBackend *blk, int argc, char **argv)
> +{
> +    int ret;
> +    int64_t offset, len;
> +    ++optind;
> +    offset = cvtnum(argv[optind]);
> +    ++optind;
> +    len = cvtnum(argv[optind]);
> +    ret = blk_zone_mgmt(blk, BLK_ZO_OPEN, offset, len);
> +    if (ret < 0) {
> +        printf("zone open failed: %s\n", strerror(-ret));
> +    }
> +    return ret;
> +}
> +
> +static const cmdinfo_t zone_open_cmd = {
> +        .name = "zone_open",
> +        .altname = "zo",
> +        .cfunc = zone_open_f,
> +        .argmin = 2,
> +        .argmax = 2,
> +        .args = "offset len",
> +        .oneline = "explicit open a range of zones in zone block device",
> +};
> +
> +static int zone_close_f(BlockBackend *blk, int argc, char **argv)
> +{
> +    int ret;
> +    int64_t offset, len;
> +    ++optind;
> +    offset = cvtnum(argv[optind]);
> +    ++optind;
> +    len = cvtnum(argv[optind]);
> +    ret = blk_zone_mgmt(blk, BLK_ZO_CLOSE, offset, len);
> +    if (ret < 0) {
> +        printf("zone close failed: %s\n", strerror(-ret));
> +    }
> +    return ret;
> +}
> +
> +static const cmdinfo_t zone_close_cmd = {
> +        .name = "zone_close",
> +        .altname = "zc",
> +        .cfunc = zone_close_f,
> +        .argmin = 2,
> +        .argmax = 2,
> +        .args = "offset len",
> +        .oneline = "close a range of zones in zone block device",
> +};
> +
> +static int zone_finish_f(BlockBackend *blk, int argc, char **argv)
> +{
> +    int ret;
> +    int64_t offset, len;
> +    ++optind;
> +    offset = cvtnum(argv[optind]);
> +    ++optind;
> +    len = cvtnum(argv[optind]);
> +    ret = blk_zone_mgmt(blk, BLK_ZO_FINISH, offset, len);
> +    if (ret < 0) {
> +        printf("zone finish failed: %s\n", strerror(-ret));
> +    }
> +    return ret;
> +}
> +
> +static const cmdinfo_t zone_finish_cmd = {
> +        .name = "zone_finish",
> +        .altname = "zf",
> +        .cfunc = zone_finish_f,
> +        .argmin = 2,
> +        .argmax = 2,
> +        .args = "offset len",
> +        .oneline = "finish a range of zones in zone block device",
> +};
> +
> +static int zone_reset_f(BlockBackend *blk, int argc, char **argv)
> +{
> +    int ret;
> +    int64_t offset, len;
> +    ++optind;
> +    offset = cvtnum(argv[optind]);
> +    ++optind;
> +    len = cvtnum(argv[optind]);
> +    ret = blk_zone_mgmt(blk, BLK_ZO_RESET, offset, len);
> +    if (ret < 0) {
> +        printf("zone reset failed: %s\n", strerror(-ret));
> +    }
> +    return ret;
> +}
> +
> +static const cmdinfo_t zone_reset_cmd = {
> +        .name = "zone_reset",
> +        .altname = "zrs",
> +        .cfunc = zone_reset_f,
> +        .argmin = 2,
> +        .argmax = 2,
> +        .args = "offset len",
> +        .oneline = "reset a zone write pointer in zone block device",
> +};
> +
>  static int truncate_f(BlockBackend *blk, int argc, char **argv);
>  static const cmdinfo_t truncate_cmd = {
>      .name       = "truncate",
> @@ -2504,6 +2642,11 @@ static void __attribute((constructor)) init_qemuio_commands(void)
>      qemuio_add_command(&aio_write_cmd);
>      qemuio_add_command(&aio_flush_cmd);
>      qemuio_add_command(&flush_cmd);
> +    qemuio_add_command(&zone_report_cmd);
> +    qemuio_add_command(&zone_open_cmd);
> +    qemuio_add_command(&zone_close_cmd);
> +    qemuio_add_command(&zone_finish_cmd);
> +    qemuio_add_command(&zone_reset_cmd);
>      qemuio_add_command(&truncate_cmd);
>      qemuio_add_command(&length_cmd);
>      qemuio_add_command(&info_cmd);
> -- 
> 2.37.1
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v7 6/8] config: add check to block layer
  2022-08-16  6:25 ` [PATCH v7 6/8] config: add check to block layer Sam Li
@ 2022-08-23  0:54   ` Stefan Hajnoczi
  2022-08-23  4:25     ` Sam Li
  0 siblings, 1 reply; 28+ messages in thread
From: Stefan Hajnoczi @ 2022-08-23  0:54 UTC (permalink / raw)
  To: Sam Li
  Cc: qemu-devel, hare, Fam Zheng, Kevin Wolf, Eric Blake,
	Markus Armbruster, Hanna Reitz, dmitry.fomichev, qemu-block,
	damien.lemoal

[-- Attachment #1: Type: text/plain, Size: 2802 bytes --]

On Tue, Aug 16, 2022 at 02:25:20PM +0800, Sam Li wrote:
> Putting zoned/non-zoned BlockDrivers on top of each other is not
> allowed.
> 
> Signed-off-by: Sam Li <faithilikerun@gmail.com>
> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  block.c                          | 14 ++++++++++++++
>  block/raw-format.c               |  1 +
>  include/block/block_int-common.h |  5 +++++
>  3 files changed, 20 insertions(+)
> 
> diff --git a/block.c b/block.c
> index bc85f46eed..affe6c597e 100644
> --- a/block.c
> +++ b/block.c
> @@ -7947,6 +7947,20 @@ void bdrv_add_child(BlockDriverState *parent_bs, BlockDriverState *child_bs,
>          return;
>      }
>  
> +    /*
> +     * Non-zoned block drivers do not follow zoned storage constraints
> +     * (i.e. sequential writes to zones). Refuse mixing zoned and non-zoned
> +     * drivers in a graph.
> +     */
> +    if (!parent_bs->drv->supports_zoned_children &&
> +        child_bs->bl.zoned != BLK_Z_HM) {

Is this logical expression correct:

  If the parent does not support zoned children and the child is not
  zoned, fail with an error.

?

> +        error_setg(errp, "Cannot add a %s child to a %s parent",
> +                   child_bs->bl.zoned == BLK_Z_HM ? "zoned" : "non-zoned",
> +                   parent_bs->drv->supports_zoned_children ?
> +                   "support zoned children" : "not support zoned children");
> +        return;
> +    }
> +
>      if (!QLIST_EMPTY(&child_bs->parents)) {
>          error_setg(errp, "The node %s already has a parent",
>                     child_bs->node_name);
> diff --git a/block/raw-format.c b/block/raw-format.c
> index 6b20bd22ef..9441536819 100644
> --- a/block/raw-format.c
> +++ b/block/raw-format.c
> @@ -614,6 +614,7 @@ static void raw_child_perm(BlockDriverState *bs, BdrvChild *c,
>  BlockDriver bdrv_raw = {
>      .format_name          = "raw",
>      .instance_size        = sizeof(BDRVRawState),
> +    .supports_zoned_children = true,
>      .bdrv_probe           = &raw_probe,
>      .bdrv_reopen_prepare  = &raw_reopen_prepare,
>      .bdrv_reopen_commit   = &raw_reopen_commit,
> diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
> index de44c7b6f4..4c44592b59 100644
> --- a/include/block/block_int-common.h
> +++ b/include/block/block_int-common.h
> @@ -126,6 +126,11 @@ struct BlockDriver {
>       */
>      bool is_format;
>  
> +    /*
> +     * Set to true if the BlockDriver supports zoned children.
> +     */
> +    bool supports_zoned_children;
> +
>      /*
>       * Drivers not implementing bdrv_parse_filename nor bdrv_open should have
>       * this field set to true, except ones that are defined only by their
> -- 
> 2.37.1
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v7 4/8] block: add block layer APIs resembling Linux ZonedBlockDevice ioctls
  2022-08-23  0:49   ` Stefan Hajnoczi
@ 2022-08-23  4:12     ` Sam Li
  2022-08-23 12:40       ` Stefan Hajnoczi
  2022-08-24 23:46       ` Damien Le Moal
  0 siblings, 2 replies; 28+ messages in thread
From: Sam Li @ 2022-08-23  4:12 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: qemu-devel, Hannes Reinecke, Fam Zheng, Kevin Wolf, Eric Blake,
	Markus Armbruster, Hanna Reitz, Dmitry Fomichev, qemu block,
	Damien Le Moal

Stefan Hajnoczi <stefanha@redhat.com> 于2022年8月23日周二 08:49写道:
>
> On Tue, Aug 16, 2022 at 02:25:18PM +0800, Sam Li wrote:
> > By adding zone management operations in BlockDriver, storage controller
> > emulation can use the new block layer APIs including Report Zone and
> > four zone management operations (open, close, finish, reset).
> >
> > Add zoned storage commands of the device: zone_report(zrp), zone_open(zo),
> > zone_close(zc), zone_reset(zrs), zone_finish(zf).
> >
> > For example, to test zone_report, use following command:
> > $ ./build/qemu-io --image-opts driver=zoned_host_device, filename=/dev/nullb0
> > -c "zrp offset nr_zones"
> >
> > Signed-off-by: Sam Li <faithilikerun@gmail.com>
> > Reviewed-by: Hannes Reinecke <hare@suse.de>
> > ---
> >  block/block-backend.c             |  50 +++++
> >  block/file-posix.c                | 341 +++++++++++++++++++++++++++++-
> >  block/io.c                        |  41 ++++
> >  include/block/block-common.h      |   1 -
> >  include/block/block-io.h          |  13 ++
> >  include/block/block_int-common.h  |  22 +-
> >  include/block/raw-aio.h           |   6 +-
> >  include/sysemu/block-backend-io.h |   6 +
> >  meson.build                       |   1 +
> >  qapi/block-core.json              |   8 +-
> >  qemu-io-cmds.c                    | 143 +++++++++++++
> >  11 files changed, 625 insertions(+), 7 deletions(-)
> >
> > diff --git a/block/block-backend.c b/block/block-backend.c
> > index d4a5df2ac2..fc639b0cd7 100644
> > --- a/block/block-backend.c
> > +++ b/block/block-backend.c
> > @@ -1775,6 +1775,56 @@ int coroutine_fn blk_co_flush(BlockBackend *blk)
> >      return ret;
> >  }
> >
> > +/*
> > + * Send a zone_report command.
> > + * offset is a byte offset from the start of the device. No alignment
> > + * required for offset.
> > + * nr_zones represents IN maximum and OUT actual.
> > + */
> > +int coroutine_fn blk_co_zone_report(BlockBackend *blk, int64_t offset,
> > +                                    unsigned int *nr_zones,
> > +                                    BlockZoneDescriptor *zones)
> > +{
> > +    int ret;
> > +    IO_CODE();
> > +
> > +    blk_inc_in_flight(blk); /* increase before waiting */
> > +    blk_wait_while_drained(blk);
> > +    if (!blk_is_available(blk)) {
> > +        blk_dec_in_flight(blk);
> > +        return -ENOMEDIUM;
> > +    }
> > +    ret = bdrv_co_zone_report(blk_bs(blk), offset, nr_zones, zones);
> > +    blk_dec_in_flight(blk);
> > +    return ret;
> > +}
> > +
> > +/*
> > + * Send a zone_management command.
> > + * offset is the starting zone specified as a sector offset.
> > + * len is the maximum number of sectors the command should operate on.
> > + */
> > +int coroutine_fn blk_co_zone_mgmt(BlockBackend *blk, BlockZoneOp op,
> > +        int64_t offset, int64_t len)
> > +{
> > +    int ret;
> > +    IO_CODE();
> > +
> > +    ret = blk_check_byte_request(blk, offset, len);
> > +    if (ret < 0) {
> > +        return ret;
> > +    }
>
> blk_check_byte_request() calls blk_is_available() and returns -ENOMEDIUM
> when it fails. You can therefore move this down and replace "if
> (!blk_is_available(blk)) {".
>
> > +    blk_inc_in_flight(blk);
> > +    blk_wait_while_drained(blk);
> > +    if (!blk_is_available(blk)) {
> > +        blk_dec_in_flight(blk);
> > +        return -ENOMEDIUM;
> > +    }
> > +    ret = bdrv_co_zone_mgmt(blk_bs(blk), op, offset, len);
> > +    blk_dec_in_flight(blk);
> > +    return ret;
> > +}
> > +
> >  void blk_drain(BlockBackend *blk)
> >  {
> >      BlockDriverState *bs = blk_bs(blk);
> > diff --git a/block/file-posix.c b/block/file-posix.c
> > index 727389488c..29f67082d9 100644
> > --- a/block/file-posix.c
> > +++ b/block/file-posix.c
> > @@ -67,6 +67,9 @@
> >  #include <sys/param.h>
> >  #include <sys/syscall.h>
> >  #include <sys/vfs.h>
> > +#if defined(CONFIG_BLKZONED)
> > +#include <linux/blkzoned.h>
> > +#endif
> >  #include <linux/cdrom.h>
> >  #include <linux/fd.h>
> >  #include <linux/fs.h>
> > @@ -216,6 +219,13 @@ typedef struct RawPosixAIOData {
> >              PreallocMode prealloc;
> >              Error **errp;
> >          } truncate;
> > +        struct {
> > +            unsigned int *nr_zones;
> > +            BlockZoneDescriptor *zones;
> > +        } zone_report;
> > +        struct {
> > +            unsigned long ioctl_op;
> > +        } zone_mgmt;
> >      };
> >  } RawPosixAIOData;
> >
> > @@ -1328,7 +1338,7 @@ static void raw_refresh_limits(BlockDriverState *bs, Error **errp)
> >  #endif
> >
> >      if (bs->sg || S_ISBLK(st.st_mode)) {
> > -        int ret = hdev_get_max_hw_transfer(s->fd, &st);
> > +        ret = hdev_get_max_hw_transfer(s->fd, &st);
> >
> >          if (ret > 0 && ret <= BDRV_REQUEST_MAX_BYTES) {
> >              bs->bl.max_hw_transfer = ret;
> > @@ -1340,11 +1350,32 @@ static void raw_refresh_limits(BlockDriverState *bs, Error **errp)
> >          }
> >      }
> >
> > -    ret = get_sysfs_zoned_model(s->fd, &st, &zoned);
> > +    ret = get_sysfs_zoned_model(&st, &zoned);
> >      if (ret < 0) {
> >          zoned = BLK_Z_NONE;
> >      }
> >      bs->bl.zoned = zoned;
> > +    if (zoned != BLK_Z_NONE) {
> > +        ret = get_sysfs_long_val(&st, "chunk_sectors");
> > +        if (ret > 0) {
> > +            bs->bl.zone_sectors = ret;
> > +        }
> > +
> > +        ret = get_sysfs_long_val(&st, "zone_append_max_bytes");
> > +        if (ret > 0) {
> > +            bs->bl.zone_append_max_bytes = ret;
> > +        }
> > +
> > +        ret = get_sysfs_long_val(&st, "max_open_zones");
> > +        if (ret > 0) {
> > +            bs->bl.max_open_zones = ret;
> > +        }
> > +
> > +        ret = get_sysfs_long_val(&st, "max_active_zones");
> > +        if (ret > 0) {
> > +            bs->bl.max_active_zones = ret;
> > +        }
> > +    }
> >  }
> >
> >  static int check_for_dasd(int fd)
> > @@ -1839,6 +1870,134 @@ static off_t copy_file_range(int in_fd, off_t *in_off, int out_fd,
> >  }
> >  #endif
> >
> > +/*
> > + * parse_zone - Fill a zone descriptor
> > + */
> > +#if defined(CONFIG_BLKZONED)
> > +static inline void parse_zone(struct BlockZoneDescriptor *zone,
> > +                              struct blk_zone *blkz) {
>
> Declaring the second argument "const struct blk_zone *blkz" would make
> it clear that this function converts from blk_zone to
> BlockZoneDescriptor.
>
> > +    zone->start = blkz->start;
> > +    zone->length = blkz->len;
> > +    zone->cap = blkz->capacity;
> > +    zone->wp = blkz->wp;
> > +
> > +    switch (blkz->type) {
> > +    case BLK_ZONE_TYPE_SEQWRITE_REQ:
> > +        zone->type = BLK_ZT_SWR;
> > +        break;
> > +    case BLK_ZONE_TYPE_SEQWRITE_PREF:
> > +        zone->type = BLK_ZT_SWP;
> > +        break;
> > +    case BLK_ZONE_TYPE_CONVENTIONAL:
> > +        zone->type = BLK_ZT_CONV;
> > +        break;
> > +    default:
> > +        error_report("Invalid zone type: 0x%x", blkz->type);
>
> Or g_assert_not_reached() to indicate that this should never happen. If
> it does happen the process will call abort(3) and it will terminate with
> a coredump file for debugging.
>
> > +    }
> > +
> > +    switch (blkz->cond) {
> > +    case BLK_ZONE_COND_NOT_WP:
> > +        zone->cond = BLK_ZS_NOT_WP;
> > +        break;
> > +    case BLK_ZONE_COND_EMPTY:
> > +        zone->cond = BLK_ZS_EMPTY;
> > +        break;
> > +    case BLK_ZONE_COND_IMP_OPEN:
> > +        zone->cond =BLK_ZS_IOPEN;
> > +        break;
> > +    case BLK_ZONE_COND_EXP_OPEN:
> > +        zone->cond = BLK_ZS_EOPEN;
> > +        break;
> > +    case BLK_ZONE_COND_CLOSED:
> > +        zone->cond = BLK_ZS_CLOSED;
> > +        break;
> > +    case BLK_ZONE_COND_READONLY:
> > +        zone->cond = BLK_ZS_RDONLY;
> > +        break;
> > +    case BLK_ZONE_COND_FULL:
> > +        zone->cond = BLK_ZS_FULL;
> > +        break;
> > +    case BLK_ZONE_COND_OFFLINE:
> > +        zone->cond = BLK_ZS_OFFLINE;
> > +        break;
> > +    default:
> > +        error_report("Invalid zone condition 0x%x", blkz->cond);
>
> Same here.
>
> > +    }
> > +}
> > +#endif
> > +
> > +static int handle_aiocb_zone_report(void *opaque) {
> > +#if defined(CONFIG_BLKZONED)
> > +    RawPosixAIOData *aiocb = opaque;
> > +    int fd = aiocb->aio_fildes;
> > +    unsigned int *nr_zones = aiocb->zone_report.nr_zones;
> > +    BlockZoneDescriptor *zones = aiocb->zone_report.zones;
> > +    int64_t sector = aiocb->aio_offset;
> > +
> > +    struct blk_zone *blkz;
> > +    int64_t rep_size;
> > +    unsigned int nrz;
> > +    int ret, n = 0, i = 0;
> > +
> > +    nrz = *nr_zones;
> > +    rep_size = sizeof(struct blk_zone_report) + nrz * sizeof(struct blk_zone);
> > +    g_autofree struct blk_zone_report *rep = NULL;
> > +    rep = g_malloc(rep_size);
> > +
> > +    blkz = (struct blk_zone *)(rep + 1);
> > +    while (n < nrz) {
> > +        memset(rep, 0, rep_size);
> > +        rep->sector = sector;
> > +        rep->nr_zones = nrz - n;
> > +
> > +        ret = ioctl(fd, BLKREPORTZONE, rep);
>
> Does this ioctl() need "do { ... } while (ret == -1 && errno == EINTR)"?

No? We discussed this before. I guess even EINTR should be propagated
back to the guest. Maybe Damien can talk more about why.

>
> > +        if (ret != 0) {
> > +            error_report("%d: ioctl BLKREPORTZONE at %" PRId64 " failed %d",
> > +                         fd, sector, errno);
> > +            return -errno;
> > +        }
> > +
> > +        if (!rep->nr_zones) {
> > +            break;
> > +        }
> > +
> > +        for (i = 0; i < rep->nr_zones; i++, n++) {
> > +            parse_zone(&zones[n], &blkz[i]);
> > +            /* The next report should start after the last zone reported */
> > +            sector = blkz[i].start + blkz[i].len;
> > +        }
> > +    }
> > +
> > +    *nr_zones = n;
> > +    return 0;
> > +#else
> > +    return -ENOTSUP;
> > +#endif
> > +}
> > +
> > +static int handle_aiocb_zone_mgmt(void *opaque) {
> > +#if defined(CONFIG_BLKZONED)
> > +    RawPosixAIOData *aiocb = opaque;
> > +    int fd = aiocb->aio_fildes;
> > +    int64_t sector = aiocb->aio_offset;
> > +    int64_t nr_sectors = aiocb->aio_nbytes;
> > +    unsigned long ioctl_op = aiocb->zone_mgmt.ioctl_op;
> > +    struct blk_zone_range range;
> > +    int ret;
> > +
> > +    /* Execute the operation */
> > +    range.sector = sector;
> > +    range.nr_sectors = nr_sectors;
> > +    do {
> > +        ret = ioctl(fd, ioctl_op, &range);
> > +    } while (ret != 0 && errno == EINTR);
> > +
> > +    return ret;
>
>   if (ret < 0) {
>       return -errno;
>   }
>   return 0;
>
> > +#else
> > +    return -ENOTSUP;
> > +#endif
> > +}
> > +
> >  static int handle_aiocb_copy_range(void *opaque)
> >  {
> >      RawPosixAIOData *aiocb = opaque;
> > @@ -3011,6 +3170,124 @@ static void raw_account_discard(BDRVRawState *s, uint64_t nbytes, int ret)
> >      }
> >  }
> >
> > +/*
> > + * zone report - Get a zone block device's information in the form
> > + * of an array of zone descriptors.
> > + *
> > + * @param bs: passing zone block device file descriptor
> > + * @param zones: an array of zone descriptors to hold zone
> > + * information on reply
> > + * @param offset: offset can be any byte within the zone size.
> > + * @param len: (not sure yet.
> > + * @return 0 on success, -1 on failure
> > + */
> > +static int coroutine_fn raw_co_zone_report(BlockDriverState *bs, int64_t offset,
> > +                                           unsigned int *nr_zones,
> > +                                           BlockZoneDescriptor *zones) {
> > +#if defined(CONFIG_BLKZONED)
> > +    BDRVRawState *s = bs->opaque;
> > +    RawPosixAIOData acb;
> > +
> > +    acb = (RawPosixAIOData) {
> > +        .bs         = bs,
> > +        .aio_fildes = s->fd,
> > +        .aio_type   = QEMU_AIO_ZONE_REPORT,
> > +        /* zoned block devices use 512-byte sectors */
> > +        .aio_offset = offset / 512,
> > +        .zone_report    = {
> > +                .nr_zones       = nr_zones,
> > +                .zones          = zones,
> > +        },
> > +    };
> > +
> > +    return raw_thread_pool_submit(bs, handle_aiocb_zone_report, &acb);
> > +#else
> > +    return -ENOTSUP;
> > +#endif
> > +}
> > +
> > +/*
> > + * zone management operations - Execute an operation on a zone
> > + */
> > +static int coroutine_fn raw_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
> > +        int64_t offset, int64_t len) {
> > +#if defined(CONFIG_BLKZONED)
> > +    BDRVRawState *s = bs->opaque;
> > +    RawPosixAIOData acb;
> > +    int64_t zone_sector, zone_sector_mask;
> > +    const char *ioctl_name;
> > +    unsigned long ioctl_op;
> > +    int ret;
> > +
> > +    struct stat st;
> > +    if (fstat(s->fd, &st) < 0) {
> > +        ret = -errno;
> > +        return ret;
> > +    }
> > +    zone_sector = get_sysfs_long_val(&st, "chunk_sectors");
> > +    if (zone_sector < 0) {
> > +        error_report("invalid zone sector size %" PRId64 "", zone_sector);
> > +        return -EINVAL;
> > +    }
> > +
> > +    zone_sector_mask = zone_sector - 1;
> > +    if (offset & zone_sector_mask) {
> > +        error_report("sector offset %" PRId64 " is not aligned to zone size "
> > +                     "%" PRId64 "", offset, zone_sector);
> > +        return -EINVAL;
> > +    }
> > +
> > +    if (len & zone_sector_mask) {
> > +        error_report("number of sectors %" PRId64 " is not aligned to zone size"
> > +                      " %" PRId64 "", len, zone_sector);
> > +        return -EINVAL;
> > +    }
> > +
> > +    switch (op) {
> > +    case BLK_ZO_OPEN:
> > +        ioctl_name = "BLKOPENZONE";
> > +        ioctl_op = BLKOPENZONE;
> > +        break;
> > +    case BLK_ZO_CLOSE:
> > +        ioctl_name = "BLKCLOSEZONE";
> > +        ioctl_op = BLKCLOSEZONE;
> > +        break;
> > +    case BLK_ZO_FINISH:
> > +        ioctl_name = "BLKFINISHZONE";
> > +        ioctl_op = BLKFINISHZONE;
> > +        break;
> > +    case BLK_ZO_RESET:
> > +        ioctl_name = "BLKRESETZONE";
> > +        ioctl_op = BLKRESETZONE;
> > +        break;
> > +    default:
> > +        error_report("Invalid zone operation 0x%x", op);
> > +        return -EINVAL;
> > +    }
> > +
> > +    acb = (RawPosixAIOData) {
> > +        .bs             = bs,
> > +        .aio_fildes     = s->fd,
> > +        .aio_type       = QEMU_AIO_ZONE_MGMT,
> > +        .aio_offset     = offset,
> > +        .aio_nbytes     = len,
> > +        .zone_mgmt  = {
> > +                .ioctl_op = ioctl_op,
> > +        },
> > +    };
> > +
> > +    ret = raw_thread_pool_submit(bs, handle_aiocb_zone_mgmt, &acb);
> > +    if (ret != 0) {
> > +        error_report("ioctl %s failed %d", ioctl_name, errno);
> > +        return -errno;
> > +    }
> > +
> > +    return ret;
> > +#else
> > +    return -ENOTSUP;
> > +#endif
> > +}
> > +
> >  static coroutine_fn int
> >  raw_do_pdiscard(BlockDriverState *bs, int64_t offset, int64_t bytes,
> >                  bool blkdev)
> > @@ -3511,6 +3788,14 @@ static void hdev_parse_filename(const char *filename, QDict *options,
> >      bdrv_parse_filename_strip_prefix(filename, "host_device:", options);
> >  }
> >
> > +#if defined(CONFIG_BLKZONED)
> > +static void zoned_host_device_parse_filename(const char *filename, QDict *options,
> > +                                Error **errp)
> > +{
> > +    bdrv_parse_filename_strip_prefix(filename, "zoned_host_device:", options);
> > +}
> > +#endif
>
> Sorry, I asked you to add this function but I've changed my mind and I
> think it should not be present. .bdrv_parse_filename() helps legacy
> drivers convert arguments into QDict *options. But this is a new driver
> that no one expects to work with string filenames. Therefore
> .bdrv_parse_filename can be dropped.
>
> > +
> >  static bool hdev_is_sg(BlockDriverState *bs)
> >  {
> >
> > @@ -3741,6 +4026,55 @@ static BlockDriver bdrv_host_device = {
> >  #endif
> >  };
> >
> > +#if defined(CONFIG_BLKZONED)
> > +static BlockDriver bdrv_zoned_host_device = {
> > +        .format_name = "zoned_host_device",
> > +        .protocol_name = "zoned_host_device",
> > +        .instance_size = sizeof(BDRVRawState),
> > +        .bdrv_needs_filename = true,
> > +        .bdrv_probe_device  = hdev_probe_device,
> > +        .bdrv_parse_filename = zoned_host_device_parse_filename,
> > +        .bdrv_file_open     = hdev_open,
> > +        .bdrv_close         = raw_close,
> > +        .bdrv_reopen_prepare = raw_reopen_prepare,
> > +        .bdrv_reopen_commit  = raw_reopen_commit,
> > +        .bdrv_reopen_abort   = raw_reopen_abort,
> > +        .bdrv_co_create_opts = bdrv_co_create_opts_simple,
> > +        .create_opts         = &bdrv_create_opts_simple,
> > +        .mutable_opts        = mutable_opts,
> > +        .bdrv_co_invalidate_cache = raw_co_invalidate_cache,
> > +        .bdrv_co_pwrite_zeroes = hdev_co_pwrite_zeroes,
> > +
> > +        .bdrv_co_preadv         = raw_co_preadv,
> > +        .bdrv_co_pwritev        = raw_co_pwritev,
> > +        .bdrv_co_flush_to_disk  = raw_co_flush_to_disk,
> > +        .bdrv_co_pdiscard       = hdev_co_pdiscard,
> > +        .bdrv_co_copy_range_from = raw_co_copy_range_from,
> > +        .bdrv_co_copy_range_to  = raw_co_copy_range_to,
> > +        .bdrv_refresh_limits = raw_refresh_limits,
> > +        .bdrv_io_plug = raw_aio_plug,
> > +        .bdrv_io_unplug = raw_aio_unplug,
> > +        .bdrv_attach_aio_context = raw_aio_attach_aio_context,
> > +
> > +        .bdrv_co_truncate       = raw_co_truncate,
> > +        .bdrv_getlength = raw_getlength,
> > +        .bdrv_get_info = raw_get_info,
> > +        .bdrv_get_allocated_file_size
> > +                            = raw_get_allocated_file_size,
> > +        .bdrv_get_specific_stats = hdev_get_specific_stats,
> > +        .bdrv_check_perm = raw_check_perm,
> > +        .bdrv_set_perm   = raw_set_perm,
> > +        .bdrv_abort_perm_update = raw_abort_perm_update,
> > +        .bdrv_probe_blocksizes = hdev_probe_blocksizes,
> > +        .bdrv_probe_geometry = hdev_probe_geometry,
> > +        .bdrv_co_ioctl = hdev_co_ioctl,
> > +
> > +        /* zone management operations */
> > +        .bdrv_co_zone_report = raw_co_zone_report,
> > +        .bdrv_co_zone_mgmt = raw_co_zone_mgmt,
> > +};
> > +#endif
> > +
> >  #if defined(__linux__) || defined(__FreeBSD__) || defined(__FreeBSD_kernel__)
> >  static void cdrom_parse_filename(const char *filename, QDict *options,
> >                                   Error **errp)
> > @@ -4001,6 +4335,9 @@ static void bdrv_file_init(void)
> >      bdrv_register(&bdrv_file);
> >  #if defined(HAVE_HOST_BLOCK_DEVICE)
> >      bdrv_register(&bdrv_host_device);
> > +#if defined(CONFIG_BLKZONED)
> > +    bdrv_register(&bdrv_zoned_host_device);
> > +#endif
> >  #ifdef __linux__
> >      bdrv_register(&bdrv_host_cdrom);
> >  #endif
> > diff --git a/block/io.c b/block/io.c
> > index 0a8cbefe86..de9ec1d740 100644
> > --- a/block/io.c
> > +++ b/block/io.c
> > @@ -3198,6 +3198,47 @@ out:
> >      return co.ret;
> >  }
> >
> > +int bdrv_co_zone_report(BlockDriverState *bs, int64_t offset,
> > +                        unsigned int *nr_zones,
> > +                        BlockZoneDescriptor *zones)
> > +{
> > +    BlockDriver *drv = bs->drv;
> > +    CoroutineIOCompletion co = {
> > +            .coroutine = qemu_coroutine_self(),
> > +    };
> > +    IO_CODE();
> > +
> > +    bdrv_inc_in_flight(bs);
> > +    if (!drv || !drv->bdrv_co_zone_report) {
> > +        co.ret = -ENOTSUP;
> > +        goto out;
> > +    }
> > +    co.ret = drv->bdrv_co_zone_report(bs, offset, nr_zones, zones);
> > +out:
> > +    bdrv_dec_in_flight(bs);
> > +    return co.ret;
> > +}
> > +
> > +int bdrv_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
> > +        int64_t offset, int64_t len)
> > +{
> > +    BlockDriver *drv = bs->drv;
> > +    CoroutineIOCompletion co = {
> > +            .coroutine = qemu_coroutine_self(),
> > +    };
> > +    IO_CODE();
> > +
> > +    bdrv_inc_in_flight(bs);
> > +    if (!drv || !drv->bdrv_co_zone_mgmt) {
> > +        co.ret = -ENOTSUP;
> > +        goto out;
> > +    }
> > +    co.ret = drv->bdrv_co_zone_mgmt(bs, op, offset, len);
> > +out:
> > +    bdrv_dec_in_flight(bs);
> > +    return co.ret;
> > +}
> > +
> >  void *qemu_blockalign(BlockDriverState *bs, size_t size)
> >  {
> >      IO_CODE();
> > diff --git a/include/block/block-common.h b/include/block/block-common.h
> > index 36bd0e480e..5102fa6858 100644
> > --- a/include/block/block-common.h
> > +++ b/include/block/block-common.h
> > @@ -23,7 +23,6 @@
> >   */
> >  #ifndef BLOCK_COMMON_H
> >  #define BLOCK_COMMON_H
> > -
> >  #include "block/aio.h"
> >  #include "block/aio-wait.h"
> >  #include "qemu/iov.h"
>
> Unrelated whitespace change. Please drop this.
>
> > diff --git a/include/block/block-io.h b/include/block/block-io.h
> > index fd25ffa9be..55ad261e16 100644
> > --- a/include/block/block-io.h
> > +++ b/include/block/block-io.h
> > @@ -88,6 +88,13 @@ int bdrv_co_ioctl(BlockDriverState *bs, int req, void *buf);
> >  /* Ensure contents are flushed to disk.  */
> >  int coroutine_fn bdrv_co_flush(BlockDriverState *bs);
> >
> > +/* Report zone information of zone block device. */
> > +int coroutine_fn bdrv_co_zone_report(BlockDriverState *bs, int64_t offset,
> > +                                     unsigned int *nr_zones,
> > +                                     BlockZoneDescriptor *zones);
> > +int coroutine_fn bdrv_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
> > +                                   int64_t offset, int64_t len);
> > +
> >  int bdrv_co_pdiscard(BdrvChild *child, int64_t offset, int64_t bytes);
> >  bool bdrv_can_write_zeroes_with_unmap(BlockDriverState *bs);
> >  int bdrv_block_status(BlockDriverState *bs, int64_t offset,
> > @@ -297,6 +304,12 @@ bdrv_readv_vmstate(BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos);
> >  int generated_co_wrapper
> >  bdrv_writev_vmstate(BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos);
> >
> > +int generated_co_wrapper
> > +blk_zone_report(BlockBackend *blk, int64_t offset, unsigned int *nr_zones,
> > +                BlockZoneDescriptor *zones);
> > +int generated_co_wrapper
> > +blk_zone_mgmt(BlockBackend *blk, BlockZoneOp op, int64_t offset, int64_t len);
> > +
> >  /**
> >   * bdrv_parent_drained_begin_single:
> >   *
> > diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
> > index 7f7863cc9e..de44c7b6f4 100644
> > --- a/include/block/block_int-common.h
> > +++ b/include/block/block_int-common.h
> > @@ -94,7 +94,6 @@ typedef struct BdrvTrackedRequest {
> >      struct BdrvTrackedRequest *waiting_for;
> >  } BdrvTrackedRequest;
> >
> > -
> >  struct BlockDriver {
> >      /*
> >       * These fields are initialized when this object is created,
>
> Unrelated whitespace change. Please drop this.
>
> > @@ -691,6 +690,12 @@ struct BlockDriver {
> >                                            QEMUIOVector *qiov,
> >                                            int64_t pos);
> >
> > +    int coroutine_fn (*bdrv_co_zone_report)(BlockDriverState *bs,
> > +            int64_t offset, unsigned int *nr_zones,
> > +            BlockZoneDescriptor *zones);
> > +    int coroutine_fn (*bdrv_co_zone_mgmt)(BlockDriverState *bs, BlockZoneOp op,
> > +            int64_t offset, int64_t len);
> > +
> >      /* removable device specific */
> >      bool (*bdrv_is_inserted)(BlockDriverState *bs);
> >      void (*bdrv_eject)(BlockDriverState *bs, bool eject_flag);
> > @@ -828,6 +833,21 @@ typedef struct BlockLimits {
> >
> >      /* device zone model */
> >      BlockZoneModel zoned;
> > +
> > +    /* zone size expressed in 512-byte sectors */
> > +    uint32_t zone_sectors;
> > +
> > +    /* total number of zones */
> > +    unsigned int nr_zones;
> > +
> > +    /* maximum size in bytes of a zone append write operation */
> > +    int64_t zone_append_max_bytes;
> > +
> > +    /* maximum number of open zones */
> > +    int64_t max_open_zones;
> > +
> > +    /* maximum number of active zones */
> > +    int64_t max_active_zones;
> >  } BlockLimits;
> >
> >  typedef struct BdrvOpBlocker BdrvOpBlocker;
> > diff --git a/include/block/raw-aio.h b/include/block/raw-aio.h
> > index 21fc10c4c9..3d26929cdd 100644
> > --- a/include/block/raw-aio.h
> > +++ b/include/block/raw-aio.h
> > @@ -29,6 +29,8 @@
> >  #define QEMU_AIO_WRITE_ZEROES 0x0020
> >  #define QEMU_AIO_COPY_RANGE   0x0040
> >  #define QEMU_AIO_TRUNCATE     0x0080
> > +#define QEMU_AIO_ZONE_REPORT  0x0100
> > +#define QEMU_AIO_ZONE_MGMT    0x0200
> >  #define QEMU_AIO_TYPE_MASK \
> >          (QEMU_AIO_READ | \
> >           QEMU_AIO_WRITE | \
> > @@ -37,7 +39,9 @@
> >           QEMU_AIO_DISCARD | \
> >           QEMU_AIO_WRITE_ZEROES | \
> >           QEMU_AIO_COPY_RANGE | \
> > -         QEMU_AIO_TRUNCATE)
> > +         QEMU_AIO_TRUNCATE  | \
> > +         QEMU_AIO_ZONE_REPORT | \
> > +         QEMU_AIO_ZONE_MGMT)
> >
> >  /* AIO flags */
> >  #define QEMU_AIO_MISALIGNED   0x1000
> > diff --git a/include/sysemu/block-backend-io.h b/include/sysemu/block-backend-io.h
> > index 50f5aa2e07..6e7df1d93b 100644
> > --- a/include/sysemu/block-backend-io.h
> > +++ b/include/sysemu/block-backend-io.h
> > @@ -156,6 +156,12 @@ int generated_co_wrapper blk_pwrite_zeroes(BlockBackend *blk, int64_t offset,
> >  int coroutine_fn blk_co_pwrite_zeroes(BlockBackend *blk, int64_t offset,
> >                                        int64_t bytes, BdrvRequestFlags flags);
> >
> > +int coroutine_fn blk_co_zone_report(BlockBackend *blk, int64_t offset,
> > +                                    unsigned int *nr_zones,
> > +                                    BlockZoneDescriptor *zones);
> > +int coroutine_fn blk_co_zone_mgmt(BlockBackend *blk, BlockZoneOp op,
> > +                                  int64_t offset, int64_t len);
> > +
> >  int generated_co_wrapper blk_pdiscard(BlockBackend *blk, int64_t offset,
> >                                        int64_t bytes);
> >  int coroutine_fn blk_co_pdiscard(BlockBackend *blk, int64_t offset,
> > diff --git a/meson.build b/meson.build
> > index 294e9a8f32..c3219b0e87 100644
> > --- a/meson.build
> > +++ b/meson.build
> > @@ -1883,6 +1883,7 @@ config_host_data.set('CONFIG_REPLICATION', get_option('live_block_migration').al
> >  # has_header
> >  config_host_data.set('CONFIG_EPOLL', cc.has_header('sys/epoll.h'))
> >  config_host_data.set('CONFIG_LINUX_MAGIC_H', cc.has_header('linux/magic.h'))
> > +config_host_data.set('CONFIG_BLKZONED', cc.has_header('linux/blkzoned.h'))
> >  config_host_data.set('CONFIG_VALGRIND_H', cc.has_header('valgrind/valgrind.h'))
> >  config_host_data.set('HAVE_BTRFS_H', cc.has_header('linux/btrfs.h'))
> >  config_host_data.set('HAVE_DRM_H', cc.has_header('libdrm/drm.h'))
> > diff --git a/qapi/block-core.json b/qapi/block-core.json
> > index 2173e7734a..c6bbb7a037 100644
> > --- a/qapi/block-core.json
> > +++ b/qapi/block-core.json
> > @@ -2942,6 +2942,7 @@
> >  # @compress: Since 5.0
> >  # @copy-before-write: Since 6.2
> >  # @snapshot-access: Since 7.0
> > +# @zoned_host_device: Since 7.2
> >  #
> >  # Since: 2.9
> >  ##
> > @@ -2955,7 +2956,8 @@
> >              'luks', 'nbd', 'nfs', 'null-aio', 'null-co', 'nvme', 'parallels',
> >              'preallocate', 'qcow', 'qcow2', 'qed', 'quorum', 'raw', 'rbd',
> >              { 'name': 'replication', 'if': 'CONFIG_REPLICATION' },
> > -            'ssh', 'throttle', 'vdi', 'vhdx', 'vmdk', 'vpc', 'vvfat' ] }
> > +            'ssh', 'throttle', 'vdi', 'vhdx', 'vmdk', 'vpc', 'vvfat',
> > +            { 'name': 'zoned_host_device', 'if': 'CONFIG_BLKZONED' } ] }
> >
> >  ##
> >  # @BlockdevOptionsFile:
> > @@ -4329,7 +4331,9 @@
> >        'vhdx':       'BlockdevOptionsGenericFormat',
> >        'vmdk':       'BlockdevOptionsGenericCOWFormat',
> >        'vpc':        'BlockdevOptionsGenericFormat',
> > -      'vvfat':      'BlockdevOptionsVVFAT'
> > +      'vvfat':      'BlockdevOptionsVVFAT',
> > +      'zoned_host_device': { 'type': 'BlockdevOptionsFile',
> > +                             'if': 'CONFIG_BLKZONED' }
> >    } }
> >
> >  ##
> > diff --git a/qemu-io-cmds.c b/qemu-io-cmds.c
> > index 952dc940f1..687c3a624c 100644
> > --- a/qemu-io-cmds.c
> > +++ b/qemu-io-cmds.c
> > @@ -1712,6 +1712,144 @@ static const cmdinfo_t flush_cmd = {
> >      .oneline    = "flush all in-core file state to disk",
> >  };
> >
> > +static int zone_report_f(BlockBackend *blk, int argc, char **argv)
> > +{
> > +    int ret;
> > +    int64_t offset;
> > +    unsigned int nr_zones;
> > +
> > +    ++optind;
> > +    offset = cvtnum(argv[optind]);
> > +    ++optind;
> > +    nr_zones = cvtnum(argv[optind]);
> > +
> > +    g_autofree BlockZoneDescriptor *zones = NULL;
> > +    zones = g_new(BlockZoneDescriptor, nr_zones);
> > +    ret = blk_zone_report(blk, offset, &nr_zones, zones);
> > +    if (ret < 0) {
> > +        printf("zone report failed: %s\n", strerror(-ret));
> > +    } else {
> > +        for (int i = 0; i < nr_zones; ++i) {
> > +            printf("start: 0x%" PRIx64 ", len 0x%" PRIx64 ", "
> > +                   "cap"" 0x%" PRIx64 ",wptr 0x%" PRIx64 ", "
> > +                   "zcond:%u, [type: %u]\n",
> > +                   zones[i].start, zones[i].length, zones[i].cap, zones[i].wp,
> > +                   zones[i].cond, zones[i].type);
> > +        }
> > +    }
> > +    return ret;
> > +}
> > +
> > +static const cmdinfo_t zone_report_cmd = {
> > +        .name = "zone_report",
> > +        .altname = "zrp",
> > +        .cfunc = zone_report_f,
> > +        .argmin = 2,
> > +        .argmax = 2,
> > +        .args = "offset number",
> > +        .oneline = "report zone information",
> > +};
> > +
> > +static int zone_open_f(BlockBackend *blk, int argc, char **argv)
> > +{
> > +    int ret;
> > +    int64_t offset, len;
> > +    ++optind;
> > +    offset = cvtnum(argv[optind]);
> > +    ++optind;
> > +    len = cvtnum(argv[optind]);
> > +    ret = blk_zone_mgmt(blk, BLK_ZO_OPEN, offset, len);
> > +    if (ret < 0) {
> > +        printf("zone open failed: %s\n", strerror(-ret));
> > +    }
> > +    return ret;
> > +}
> > +
> > +static const cmdinfo_t zone_open_cmd = {
> > +        .name = "zone_open",
> > +        .altname = "zo",
> > +        .cfunc = zone_open_f,
> > +        .argmin = 2,
> > +        .argmax = 2,
> > +        .args = "offset len",
> > +        .oneline = "explicit open a range of zones in zone block device",
> > +};
> > +
> > +static int zone_close_f(BlockBackend *blk, int argc, char **argv)
> > +{
> > +    int ret;
> > +    int64_t offset, len;
> > +    ++optind;
> > +    offset = cvtnum(argv[optind]);
> > +    ++optind;
> > +    len = cvtnum(argv[optind]);
> > +    ret = blk_zone_mgmt(blk, BLK_ZO_CLOSE, offset, len);
> > +    if (ret < 0) {
> > +        printf("zone close failed: %s\n", strerror(-ret));
> > +    }
> > +    return ret;
> > +}
> > +
> > +static const cmdinfo_t zone_close_cmd = {
> > +        .name = "zone_close",
> > +        .altname = "zc",
> > +        .cfunc = zone_close_f,
> > +        .argmin = 2,
> > +        .argmax = 2,
> > +        .args = "offset len",
> > +        .oneline = "close a range of zones in zone block device",
> > +};
> > +
> > +static int zone_finish_f(BlockBackend *blk, int argc, char **argv)
> > +{
> > +    int ret;
> > +    int64_t offset, len;
> > +    ++optind;
> > +    offset = cvtnum(argv[optind]);
> > +    ++optind;
> > +    len = cvtnum(argv[optind]);
> > +    ret = blk_zone_mgmt(blk, BLK_ZO_FINISH, offset, len);
> > +    if (ret < 0) {
> > +        printf("zone finish failed: %s\n", strerror(-ret));
> > +    }
> > +    return ret;
> > +}
> > +
> > +static const cmdinfo_t zone_finish_cmd = {
> > +        .name = "zone_finish",
> > +        .altname = "zf",
> > +        .cfunc = zone_finish_f,
> > +        .argmin = 2,
> > +        .argmax = 2,
> > +        .args = "offset len",
> > +        .oneline = "finish a range of zones in zone block device",
> > +};
> > +
> > +static int zone_reset_f(BlockBackend *blk, int argc, char **argv)
> > +{
> > +    int ret;
> > +    int64_t offset, len;
> > +    ++optind;
> > +    offset = cvtnum(argv[optind]);
> > +    ++optind;
> > +    len = cvtnum(argv[optind]);
> > +    ret = blk_zone_mgmt(blk, BLK_ZO_RESET, offset, len);
> > +    if (ret < 0) {
> > +        printf("zone reset failed: %s\n", strerror(-ret));
> > +    }
> > +    return ret;
> > +}
> > +
> > +static const cmdinfo_t zone_reset_cmd = {
> > +        .name = "zone_reset",
> > +        .altname = "zrs",
> > +        .cfunc = zone_reset_f,
> > +        .argmin = 2,
> > +        .argmax = 2,
> > +        .args = "offset len",
> > +        .oneline = "reset a zone write pointer in zone block device",
> > +};
> > +
> >  static int truncate_f(BlockBackend *blk, int argc, char **argv);
> >  static const cmdinfo_t truncate_cmd = {
> >      .name       = "truncate",
> > @@ -2504,6 +2642,11 @@ static void __attribute((constructor)) init_qemuio_commands(void)
> >      qemuio_add_command(&aio_write_cmd);
> >      qemuio_add_command(&aio_flush_cmd);
> >      qemuio_add_command(&flush_cmd);
> > +    qemuio_add_command(&zone_report_cmd);
> > +    qemuio_add_command(&zone_open_cmd);
> > +    qemuio_add_command(&zone_close_cmd);
> > +    qemuio_add_command(&zone_finish_cmd);
> > +    qemuio_add_command(&zone_reset_cmd);
> >      qemuio_add_command(&truncate_cmd);
> >      qemuio_add_command(&length_cmd);
> >      qemuio_add_command(&info_cmd);
> > --
> > 2.37.1
> >


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v7 6/8] config: add check to block layer
  2022-08-23  0:54   ` Stefan Hajnoczi
@ 2022-08-23  4:25     ` Sam Li
  2022-08-23 12:36       ` Stefan Hajnoczi
  0 siblings, 1 reply; 28+ messages in thread
From: Sam Li @ 2022-08-23  4:25 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: qemu-devel, Hannes Reinecke, Fam Zheng, Kevin Wolf, Eric Blake,
	Markus Armbruster, Hanna Reitz, Dmitry Fomichev, qemu block,
	Damien Le Moal

Stefan Hajnoczi <stefanha@redhat.com> 于2022年8月23日周二 08:54写道:
>
> On Tue, Aug 16, 2022 at 02:25:20PM +0800, Sam Li wrote:
> > Putting zoned/non-zoned BlockDrivers on top of each other is not
> > allowed.
> >
> > Signed-off-by: Sam Li <faithilikerun@gmail.com>
> > Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> > ---
> >  block.c                          | 14 ++++++++++++++
> >  block/raw-format.c               |  1 +
> >  include/block/block_int-common.h |  5 +++++
> >  3 files changed, 20 insertions(+)
> >
> > diff --git a/block.c b/block.c
> > index bc85f46eed..affe6c597e 100644
> > --- a/block.c
> > +++ b/block.c
> > @@ -7947,6 +7947,20 @@ void bdrv_add_child(BlockDriverState *parent_bs, BlockDriverState *child_bs,
> >          return;
> >      }
> >
> > +    /*
> > +     * Non-zoned block drivers do not follow zoned storage constraints
> > +     * (i.e. sequential writes to zones). Refuse mixing zoned and non-zoned
> > +     * drivers in a graph.
> > +     */
> > +    if (!parent_bs->drv->supports_zoned_children &&
> > +        child_bs->bl.zoned != BLK_Z_HM) {
>

Should be:
+if (!parent_bs->drv->supports_zoned_children &&
+        child_bs->bl.zoned == BLK_Z_HM)

> Is this logical expression correct:
>
>   If the parent does not support zoned children and the child is not
>   zoned, fail with an error.
>
> ?

No. It should be:

If the parent does not support zoned children and the child is zoned,
fail with an error.  It should handle the case where a filter node is
inserted above a raw block driver with a zoned_host_device child.

There are some QEMU command-line constraints for the zoned devices. I
was wondering where to add such support so that it can print an error
message for users:
1. cache.direct= setting
2. mix zoned/non-zoned drivers

>
> > +        error_setg(errp, "Cannot add a %s child to a %s parent",
> > +                   child_bs->bl.zoned == BLK_Z_HM ? "zoned" : "non-zoned",
> > +                   parent_bs->drv->supports_zoned_children ?
> > +                   "support zoned children" : "not support zoned children");
> > +        return;
> > +    }
> > +
> >      if (!QLIST_EMPTY(&child_bs->parents)) {
> >          error_setg(errp, "The node %s already has a parent",
> >                     child_bs->node_name);
> > diff --git a/block/raw-format.c b/block/raw-format.c
> > index 6b20bd22ef..9441536819 100644
> > --- a/block/raw-format.c
> > +++ b/block/raw-format.c
> > @@ -614,6 +614,7 @@ static void raw_child_perm(BlockDriverState *bs, BdrvChild *c,
> >  BlockDriver bdrv_raw = {
> >      .format_name          = "raw",
> >      .instance_size        = sizeof(BDRVRawState),
> > +    .supports_zoned_children = true,
> >      .bdrv_probe           = &raw_probe,
> >      .bdrv_reopen_prepare  = &raw_reopen_prepare,
> >      .bdrv_reopen_commit   = &raw_reopen_commit,
> > diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
> > index de44c7b6f4..4c44592b59 100644
> > --- a/include/block/block_int-common.h
> > +++ b/include/block/block_int-common.h
> > @@ -126,6 +126,11 @@ struct BlockDriver {
> >       */
> >      bool is_format;
> >
> > +    /*
> > +     * Set to true if the BlockDriver supports zoned children.
> > +     */
> > +    bool supports_zoned_children;
> > +
> >      /*
> >       * Drivers not implementing bdrv_parse_filename nor bdrv_open should have
> >       * this field set to true, except ones that are defined only by their
> > --
> > 2.37.1
> >


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v7 2/8] file-posix: introduce get_sysfs_str_val for device zoned model
  2022-08-22 23:05   ` Stefan Hajnoczi
@ 2022-08-23  4:31     ` Sam Li
  0 siblings, 0 replies; 28+ messages in thread
From: Sam Li @ 2022-08-23  4:31 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: qemu-devel, Hannes Reinecke, Fam Zheng, Kevin Wolf, Eric Blake,
	Markus Armbruster, Hanna Reitz, Dmitry Fomichev, qemu block,
	Damien Le Moal

Stefan Hajnoczi <stefanha@redhat.com> 于2022年8月23日周二 07:05写道:
>
> On Tue, Aug 16, 2022 at 02:25:16PM +0800, Sam Li wrote:
> > +static int hdev_get_max_segments(int fd, struct stat *st) {
> > +    int ret;
> > +    if (S_ISCHR(st->st_mode)) {
> > +        if (ioctl(fd, SG_GET_SG_TABLESIZE, &ret) == 0) {
>
> The ioctl must be within #ifdef CONFIG_LINUX since SG_GET_SG_TABLESIZE
> will be undefined on other operating systems and a compiler error will
> be encountered. Maybe keep the #ifdef around the entire body of this
> hdev_get_max_segments().
>
> > +            return ret;
> > +        }
> > +        return -ENOTSUP;
> >      }
> > -    g_free(sysfspath);
> > -    return ret;
> > -#else
> > -    return -ENOTSUP;
> > -#endif
> > +    return get_sysfs_long_val(st, "max_segments");
>
> Where is get_sysfs_long_val() defined? Maybe in a later patch? The code
> must compile after each patch. You can test this with "git rebase -i
> origin/master" and then adding "x make" lines after each commit in the
> interactive rebase file. When rebase runs it will execute make after
> each commit and will stop if make fails.

Explained in the next patch. I will make sure the patches compile in future.


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v7 6/8] config: add check to block layer
  2022-08-23  4:25     ` Sam Li
@ 2022-08-23 12:36       ` Stefan Hajnoczi
  0 siblings, 0 replies; 28+ messages in thread
From: Stefan Hajnoczi @ 2022-08-23 12:36 UTC (permalink / raw)
  To: Sam Li
  Cc: qemu-devel, Hannes Reinecke, Fam Zheng, Kevin Wolf, Eric Blake,
	Markus Armbruster, Hanna Reitz, Dmitry Fomichev, qemu block,
	Damien Le Moal

[-- Attachment #1: Type: text/plain, Size: 2305 bytes --]

On Tue, Aug 23, 2022 at 12:25:23PM +0800, Sam Li wrote:
> Stefan Hajnoczi <stefanha@redhat.com> 于2022年8月23日周二 08:54写道:
> >
> > On Tue, Aug 16, 2022 at 02:25:20PM +0800, Sam Li wrote:
> > > Putting zoned/non-zoned BlockDrivers on top of each other is not
> > > allowed.
> > >
> > > Signed-off-by: Sam Li <faithilikerun@gmail.com>
> > > Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> > > ---
> > >  block.c                          | 14 ++++++++++++++
> > >  block/raw-format.c               |  1 +
> > >  include/block/block_int-common.h |  5 +++++
> > >  3 files changed, 20 insertions(+)
> > >
> > > diff --git a/block.c b/block.c
> > > index bc85f46eed..affe6c597e 100644
> > > --- a/block.c
> > > +++ b/block.c
> > > @@ -7947,6 +7947,20 @@ void bdrv_add_child(BlockDriverState *parent_bs, BlockDriverState *child_bs,
> > >          return;
> > >      }
> > >
> > > +    /*
> > > +     * Non-zoned block drivers do not follow zoned storage constraints
> > > +     * (i.e. sequential writes to zones). Refuse mixing zoned and non-zoned
> > > +     * drivers in a graph.
> > > +     */
> > > +    if (!parent_bs->drv->supports_zoned_children &&
> > > +        child_bs->bl.zoned != BLK_Z_HM) {
> >
> 
> Should be:
> +if (!parent_bs->drv->supports_zoned_children &&
> +        child_bs->bl.zoned == BLK_Z_HM)
> 
> > Is this logical expression correct:
> >
> >   If the parent does not support zoned children and the child is not
> >   zoned, fail with an error.
> >
> > ?
> 
> No. It should be:
> 
> If the parent does not support zoned children and the child is zoned,
> fail with an error.  It should handle the case where a filter node is
> inserted above a raw block driver with a zoned_host_device child.
> 
> There are some QEMU command-line constraints for the zoned devices. I
> was wondering where to add such support so that it can print an error
> message for users:
> 1. cache.direct= setting

The O_DIRECT requirement is specific to file-posix and Linux's zoned
block device implementation, so it belongs in file-posix.c's
zoned_host_device .bdrv_file_open() function.

> 2. mix zoned/non-zoned drivers

This is generic and I think bdrv_add_child() is the right place for
parent-child compatibility checks.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v7 4/8] block: add block layer APIs resembling Linux ZonedBlockDevice ioctls
  2022-08-23  4:12     ` Sam Li
@ 2022-08-23 12:40       ` Stefan Hajnoczi
  2022-08-24 23:46       ` Damien Le Moal
  1 sibling, 0 replies; 28+ messages in thread
From: Stefan Hajnoczi @ 2022-08-23 12:40 UTC (permalink / raw)
  To: Sam Li
  Cc: qemu-devel, Hannes Reinecke, Fam Zheng, Kevin Wolf, Eric Blake,
	Markus Armbruster, Hanna Reitz, Dmitry Fomichev, qemu block,
	Damien Le Moal

[-- Attachment #1: Type: text/plain, Size: 1664 bytes --]

On Tue, Aug 23, 2022 at 12:12:44PM +0800, Sam Li wrote:
> Stefan Hajnoczi <stefanha@redhat.com> 于2022年8月23日周二 08:49写道:
> > On Tue, Aug 16, 2022 at 02:25:18PM +0800, Sam Li wrote:
> > > +static int handle_aiocb_zone_report(void *opaque) {
> > > +#if defined(CONFIG_BLKZONED)
> > > +    RawPosixAIOData *aiocb = opaque;
> > > +    int fd = aiocb->aio_fildes;
> > > +    unsigned int *nr_zones = aiocb->zone_report.nr_zones;
> > > +    BlockZoneDescriptor *zones = aiocb->zone_report.zones;
> > > +    int64_t sector = aiocb->aio_offset;
> > > +
> > > +    struct blk_zone *blkz;
> > > +    int64_t rep_size;
> > > +    unsigned int nrz;
> > > +    int ret, n = 0, i = 0;
> > > +
> > > +    nrz = *nr_zones;
> > > +    rep_size = sizeof(struct blk_zone_report) + nrz * sizeof(struct blk_zone);
> > > +    g_autofree struct blk_zone_report *rep = NULL;
> > > +    rep = g_malloc(rep_size);
> > > +
> > > +    blkz = (struct blk_zone *)(rep + 1);
> > > +    while (n < nrz) {
> > > +        memset(rep, 0, rep_size);
> > > +        rep->sector = sector;
> > > +        rep->nr_zones = nrz - n;
> > > +
> > > +        ret = ioctl(fd, BLKREPORTZONE, rep);
> >
> > Does this ioctl() need "do { ... } while (ret == -1 && errno == EINTR)"?
> 
> No? We discussed this before. I guess even EINTR should be propagated
> back to the guest. Maybe Damien can talk more about why.

No, EINTR is an internal error that must be handled by QEMU. It means
the QEMU process' syscall was interrupted by a signal and the syscall
must be retried. The guest shouldn't see EINTR (and there is no
virtio-blk error code defined for it).

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v7 4/8] block: add block layer APIs resembling Linux ZonedBlockDevice ioctls
  2022-08-23  4:12     ` Sam Li
  2022-08-23 12:40       ` Stefan Hajnoczi
@ 2022-08-24 23:46       ` Damien Le Moal
  2022-08-24 23:53         ` Damien Le Moal
  1 sibling, 1 reply; 28+ messages in thread
From: Damien Le Moal @ 2022-08-24 23:46 UTC (permalink / raw)
  To: Sam Li, Stefan Hajnoczi
  Cc: qemu-devel, Hannes Reinecke, Fam Zheng, Kevin Wolf, Eric Blake,
	Markus Armbruster, Hanna Reitz, Dmitry Fomichev, qemu block

On 2022/08/22 21:12, Sam Li wrote:
> Stefan Hajnoczi <stefanha@redhat.com> 于2022年8月23日周二 08:49写道:
>>
>> On Tue, Aug 16, 2022 at 02:25:18PM +0800, Sam Li wrote:
>>> By adding zone management operations in BlockDriver, storage controller
>>> emulation can use the new block layer APIs including Report Zone and
>>> four zone management operations (open, close, finish, reset).
>>>
>>> Add zoned storage commands of the device: zone_report(zrp), zone_open(zo),
>>> zone_close(zc), zone_reset(zrs), zone_finish(zf).
>>>
>>> For example, to test zone_report, use following command:
>>> $ ./build/qemu-io --image-opts driver=zoned_host_device, filename=/dev/nullb0
>>> -c "zrp offset nr_zones"
>>>
>>> Signed-off-by: Sam Li <faithilikerun@gmail.com>
>>> Reviewed-by: Hannes Reinecke <hare@suse.de>
>>> ---
>>>  block/block-backend.c             |  50 +++++
>>>  block/file-posix.c                | 341 +++++++++++++++++++++++++++++-
>>>  block/io.c                        |  41 ++++
>>>  include/block/block-common.h      |   1 -
>>>  include/block/block-io.h          |  13 ++
>>>  include/block/block_int-common.h  |  22 +-
>>>  include/block/raw-aio.h           |   6 +-
>>>  include/sysemu/block-backend-io.h |   6 +
>>>  meson.build                       |   1 +
>>>  qapi/block-core.json              |   8 +-
>>>  qemu-io-cmds.c                    | 143 +++++++++++++
>>>  11 files changed, 625 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/block/block-backend.c b/block/block-backend.c
>>> index d4a5df2ac2..fc639b0cd7 100644
>>> --- a/block/block-backend.c
>>> +++ b/block/block-backend.c
>>> @@ -1775,6 +1775,56 @@ int coroutine_fn blk_co_flush(BlockBackend *blk)
>>>      return ret;
>>>  }
>>>
>>> +/*
>>> + * Send a zone_report command.
>>> + * offset is a byte offset from the start of the device. No alignment
>>> + * required for offset.
>>> + * nr_zones represents IN maximum and OUT actual.
>>> + */
>>> +int coroutine_fn blk_co_zone_report(BlockBackend *blk, int64_t offset,
>>> +                                    unsigned int *nr_zones,
>>> +                                    BlockZoneDescriptor *zones)
>>> +{
>>> +    int ret;
>>> +    IO_CODE();
>>> +
>>> +    blk_inc_in_flight(blk); /* increase before waiting */
>>> +    blk_wait_while_drained(blk);
>>> +    if (!blk_is_available(blk)) {
>>> +        blk_dec_in_flight(blk);
>>> +        return -ENOMEDIUM;
>>> +    }
>>> +    ret = bdrv_co_zone_report(blk_bs(blk), offset, nr_zones, zones);
>>> +    blk_dec_in_flight(blk);
>>> +    return ret;
>>> +}
>>> +
>>> +/*
>>> + * Send a zone_management command.
>>> + * offset is the starting zone specified as a sector offset.
>>> + * len is the maximum number of sectors the command should operate on.
>>> + */
>>> +int coroutine_fn blk_co_zone_mgmt(BlockBackend *blk, BlockZoneOp op,
>>> +        int64_t offset, int64_t len)
>>> +{
>>> +    int ret;
>>> +    IO_CODE();
>>> +
>>> +    ret = blk_check_byte_request(blk, offset, len);
>>> +    if (ret < 0) {
>>> +        return ret;
>>> +    }
>>
>> blk_check_byte_request() calls blk_is_available() and returns -ENOMEDIUM
>> when it fails. You can therefore move this down and replace "if
>> (!blk_is_available(blk)) {".
>>
>>> +    blk_inc_in_flight(blk);
>>> +    blk_wait_while_drained(blk);
>>> +    if (!blk_is_available(blk)) {
>>> +        blk_dec_in_flight(blk);
>>> +        return -ENOMEDIUM;
>>> +    }
>>> +    ret = bdrv_co_zone_mgmt(blk_bs(blk), op, offset, len);
>>> +    blk_dec_in_flight(blk);
>>> +    return ret;
>>> +}
>>> +
>>>  void blk_drain(BlockBackend *blk)
>>>  {
>>>      BlockDriverState *bs = blk_bs(blk);
>>> diff --git a/block/file-posix.c b/block/file-posix.c
>>> index 727389488c..29f67082d9 100644
>>> --- a/block/file-posix.c
>>> +++ b/block/file-posix.c
>>> @@ -67,6 +67,9 @@
>>>  #include <sys/param.h>
>>>  #include <sys/syscall.h>
>>>  #include <sys/vfs.h>
>>> +#if defined(CONFIG_BLKZONED)
>>> +#include <linux/blkzoned.h>
>>> +#endif
>>>  #include <linux/cdrom.h>
>>>  #include <linux/fd.h>
>>>  #include <linux/fs.h>
>>> @@ -216,6 +219,13 @@ typedef struct RawPosixAIOData {
>>>              PreallocMode prealloc;
>>>              Error **errp;
>>>          } truncate;
>>> +        struct {
>>> +            unsigned int *nr_zones;
>>> +            BlockZoneDescriptor *zones;
>>> +        } zone_report;
>>> +        struct {
>>> +            unsigned long ioctl_op;
>>> +        } zone_mgmt;
>>>      };
>>>  } RawPosixAIOData;
>>>
>>> @@ -1328,7 +1338,7 @@ static void raw_refresh_limits(BlockDriverState *bs, Error **errp)
>>>  #endif
>>>
>>>      if (bs->sg || S_ISBLK(st.st_mode)) {
>>> -        int ret = hdev_get_max_hw_transfer(s->fd, &st);
>>> +        ret = hdev_get_max_hw_transfer(s->fd, &st);
>>>
>>>          if (ret > 0 && ret <= BDRV_REQUEST_MAX_BYTES) {
>>>              bs->bl.max_hw_transfer = ret;
>>> @@ -1340,11 +1350,32 @@ static void raw_refresh_limits(BlockDriverState *bs, Error **errp)
>>>          }
>>>      }
>>>
>>> -    ret = get_sysfs_zoned_model(s->fd, &st, &zoned);
>>> +    ret = get_sysfs_zoned_model(&st, &zoned);
>>>      if (ret < 0) {
>>>          zoned = BLK_Z_NONE;
>>>      }
>>>      bs->bl.zoned = zoned;
>>> +    if (zoned != BLK_Z_NONE) {
>>> +        ret = get_sysfs_long_val(&st, "chunk_sectors");
>>> +        if (ret > 0) {
>>> +            bs->bl.zone_sectors = ret;
>>> +        }
>>> +
>>> +        ret = get_sysfs_long_val(&st, "zone_append_max_bytes");
>>> +        if (ret > 0) {
>>> +            bs->bl.zone_append_max_bytes = ret;
>>> +        }
>>> +
>>> +        ret = get_sysfs_long_val(&st, "max_open_zones");
>>> +        if (ret > 0) {
>>> +            bs->bl.max_open_zones = ret;
>>> +        }
>>> +
>>> +        ret = get_sysfs_long_val(&st, "max_active_zones");
>>> +        if (ret > 0) {
>>> +            bs->bl.max_active_zones = ret;
>>> +        }
>>> +    }
>>>  }
>>>
>>>  static int check_for_dasd(int fd)
>>> @@ -1839,6 +1870,134 @@ static off_t copy_file_range(int in_fd, off_t *in_off, int out_fd,
>>>  }
>>>  #endif
>>>
>>> +/*
>>> + * parse_zone - Fill a zone descriptor
>>> + */
>>> +#if defined(CONFIG_BLKZONED)
>>> +static inline void parse_zone(struct BlockZoneDescriptor *zone,
>>> +                              struct blk_zone *blkz) {
>>
>> Declaring the second argument "const struct blk_zone *blkz" would make
>> it clear that this function converts from blk_zone to
>> BlockZoneDescriptor.
>>
>>> +    zone->start = blkz->start;
>>> +    zone->length = blkz->len;
>>> +    zone->cap = blkz->capacity;
>>> +    zone->wp = blkz->wp;
>>> +
>>> +    switch (blkz->type) {
>>> +    case BLK_ZONE_TYPE_SEQWRITE_REQ:
>>> +        zone->type = BLK_ZT_SWR;
>>> +        break;
>>> +    case BLK_ZONE_TYPE_SEQWRITE_PREF:
>>> +        zone->type = BLK_ZT_SWP;
>>> +        break;
>>> +    case BLK_ZONE_TYPE_CONVENTIONAL:
>>> +        zone->type = BLK_ZT_CONV;
>>> +        break;
>>> +    default:
>>> +        error_report("Invalid zone type: 0x%x", blkz->type);
>>
>> Or g_assert_not_reached() to indicate that this should never happen. If
>> it does happen the process will call abort(3) and it will terminate with
>> a coredump file for debugging.
>>
>>> +    }
>>> +
>>> +    switch (blkz->cond) {
>>> +    case BLK_ZONE_COND_NOT_WP:
>>> +        zone->cond = BLK_ZS_NOT_WP;
>>> +        break;
>>> +    case BLK_ZONE_COND_EMPTY:
>>> +        zone->cond = BLK_ZS_EMPTY;
>>> +        break;
>>> +    case BLK_ZONE_COND_IMP_OPEN:
>>> +        zone->cond =BLK_ZS_IOPEN;
>>> +        break;
>>> +    case BLK_ZONE_COND_EXP_OPEN:
>>> +        zone->cond = BLK_ZS_EOPEN;
>>> +        break;
>>> +    case BLK_ZONE_COND_CLOSED:
>>> +        zone->cond = BLK_ZS_CLOSED;
>>> +        break;
>>> +    case BLK_ZONE_COND_READONLY:
>>> +        zone->cond = BLK_ZS_RDONLY;
>>> +        break;
>>> +    case BLK_ZONE_COND_FULL:
>>> +        zone->cond = BLK_ZS_FULL;
>>> +        break;
>>> +    case BLK_ZONE_COND_OFFLINE:
>>> +        zone->cond = BLK_ZS_OFFLINE;
>>> +        break;
>>> +    default:
>>> +        error_report("Invalid zone condition 0x%x", blkz->cond);
>>
>> Same here.
>>
>>> +    }
>>> +}
>>> +#endif
>>> +
>>> +static int handle_aiocb_zone_report(void *opaque) {
>>> +#if defined(CONFIG_BLKZONED)
>>> +    RawPosixAIOData *aiocb = opaque;
>>> +    int fd = aiocb->aio_fildes;
>>> +    unsigned int *nr_zones = aiocb->zone_report.nr_zones;
>>> +    BlockZoneDescriptor *zones = aiocb->zone_report.zones;
>>> +    int64_t sector = aiocb->aio_offset;
>>> +
>>> +    struct blk_zone *blkz;
>>> +    int64_t rep_size;
>>> +    unsigned int nrz;
>>> +    int ret, n = 0, i = 0;
>>> +
>>> +    nrz = *nr_zones;
>>> +    rep_size = sizeof(struct blk_zone_report) + nrz * sizeof(struct blk_zone);
>>> +    g_autofree struct blk_zone_report *rep = NULL;
>>> +    rep = g_malloc(rep_size);
>>> +
>>> +    blkz = (struct blk_zone *)(rep + 1);
>>> +    while (n < nrz) {
>>> +        memset(rep, 0, rep_size);
>>> +        rep->sector = sector;
>>> +        rep->nr_zones = nrz - n;
>>> +
>>> +        ret = ioctl(fd, BLKREPORTZONE, rep);
>>
>> Does this ioctl() need "do { ... } while (ret == -1 && errno == EINTR)"?
> 
> No? We discussed this before. I guess even EINTR should be propagated
> back to the guest. Maybe Damien can talk more about why.

In the kernel, completion of zone management IO requests are waited for using
wait_for_completion_io() which uses TASK_UNINTERRUPTIBLE. So a signal will not
abort anything. So I do not think that the do { } while() loop is necessary.

> 
>>
>>> +        if (ret != 0) {
>>> +            error_report("%d: ioctl BLKREPORTZONE at %" PRId64 " failed %d",
>>> +                         fd, sector, errno);
>>> +            return -errno;
>>> +        }
>>> +
>>> +        if (!rep->nr_zones) {
>>> +            break;
>>> +        }
>>> +
>>> +        for (i = 0; i < rep->nr_zones; i++, n++) {
>>> +            parse_zone(&zones[n], &blkz[i]);
>>> +            /* The next report should start after the last zone reported */
>>> +            sector = blkz[i].start + blkz[i].len;
>>> +        }
>>> +    }
>>> +
>>> +    *nr_zones = n;
>>> +    return 0;
>>> +#else
>>> +    return -ENOTSUP;
>>> +#endif
>>> +}
>>> +
>>> +static int handle_aiocb_zone_mgmt(void *opaque) {
>>> +#if defined(CONFIG_BLKZONED)
>>> +    RawPosixAIOData *aiocb = opaque;
>>> +    int fd = aiocb->aio_fildes;
>>> +    int64_t sector = aiocb->aio_offset;
>>> +    int64_t nr_sectors = aiocb->aio_nbytes;
>>> +    unsigned long ioctl_op = aiocb->zone_mgmt.ioctl_op;
>>> +    struct blk_zone_range range;
>>> +    int ret;
>>> +
>>> +    /* Execute the operation */
>>> +    range.sector = sector;
>>> +    range.nr_sectors = nr_sectors;
>>> +    do {
>>> +        ret = ioctl(fd, ioctl_op, &range);
>>> +    } while (ret != 0 && errno == EINTR);
>>> +
>>> +    return ret;
>>
>>   if (ret < 0) {
>>       return -errno;
>>   }
>>   return 0;
>>
>>> +#else
>>> +    return -ENOTSUP;
>>> +#endif
>>> +}
>>> +
>>>  static int handle_aiocb_copy_range(void *opaque)
>>>  {
>>>      RawPosixAIOData *aiocb = opaque;
>>> @@ -3011,6 +3170,124 @@ static void raw_account_discard(BDRVRawState *s, uint64_t nbytes, int ret)
>>>      }
>>>  }
>>>
>>> +/*
>>> + * zone report - Get a zone block device's information in the form
>>> + * of an array of zone descriptors.
>>> + *
>>> + * @param bs: passing zone block device file descriptor
>>> + * @param zones: an array of zone descriptors to hold zone
>>> + * information on reply
>>> + * @param offset: offset can be any byte within the zone size.
>>> + * @param len: (not sure yet.
>>> + * @return 0 on success, -1 on failure
>>> + */
>>> +static int coroutine_fn raw_co_zone_report(BlockDriverState *bs, int64_t offset,
>>> +                                           unsigned int *nr_zones,
>>> +                                           BlockZoneDescriptor *zones) {
>>> +#if defined(CONFIG_BLKZONED)
>>> +    BDRVRawState *s = bs->opaque;
>>> +    RawPosixAIOData acb;
>>> +
>>> +    acb = (RawPosixAIOData) {
>>> +        .bs         = bs,
>>> +        .aio_fildes = s->fd,
>>> +        .aio_type   = QEMU_AIO_ZONE_REPORT,
>>> +        /* zoned block devices use 512-byte sectors */
>>> +        .aio_offset = offset / 512,
>>> +        .zone_report    = {
>>> +                .nr_zones       = nr_zones,
>>> +                .zones          = zones,
>>> +        },
>>> +    };
>>> +
>>> +    return raw_thread_pool_submit(bs, handle_aiocb_zone_report, &acb);
>>> +#else
>>> +    return -ENOTSUP;
>>> +#endif
>>> +}
>>> +
>>> +/*
>>> + * zone management operations - Execute an operation on a zone
>>> + */
>>> +static int coroutine_fn raw_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
>>> +        int64_t offset, int64_t len) {
>>> +#if defined(CONFIG_BLKZONED)
>>> +    BDRVRawState *s = bs->opaque;
>>> +    RawPosixAIOData acb;
>>> +    int64_t zone_sector, zone_sector_mask;
>>> +    const char *ioctl_name;
>>> +    unsigned long ioctl_op;
>>> +    int ret;
>>> +
>>> +    struct stat st;
>>> +    if (fstat(s->fd, &st) < 0) {
>>> +        ret = -errno;
>>> +        return ret;
>>> +    }
>>> +    zone_sector = get_sysfs_long_val(&st, "chunk_sectors");
>>> +    if (zone_sector < 0) {
>>> +        error_report("invalid zone sector size %" PRId64 "", zone_sector);
>>> +        return -EINVAL;
>>> +    }
>>> +
>>> +    zone_sector_mask = zone_sector - 1;
>>> +    if (offset & zone_sector_mask) {
>>> +        error_report("sector offset %" PRId64 " is not aligned to zone size "
>>> +                     "%" PRId64 "", offset, zone_sector);
>>> +        return -EINVAL;
>>> +    }
>>> +
>>> +    if (len & zone_sector_mask) {
>>> +        error_report("number of sectors %" PRId64 " is not aligned to zone size"
>>> +                      " %" PRId64 "", len, zone_sector);
>>> +        return -EINVAL;
>>> +    }
>>> +
>>> +    switch (op) {
>>> +    case BLK_ZO_OPEN:
>>> +        ioctl_name = "BLKOPENZONE";
>>> +        ioctl_op = BLKOPENZONE;
>>> +        break;
>>> +    case BLK_ZO_CLOSE:
>>> +        ioctl_name = "BLKCLOSEZONE";
>>> +        ioctl_op = BLKCLOSEZONE;
>>> +        break;
>>> +    case BLK_ZO_FINISH:
>>> +        ioctl_name = "BLKFINISHZONE";
>>> +        ioctl_op = BLKFINISHZONE;
>>> +        break;
>>> +    case BLK_ZO_RESET:
>>> +        ioctl_name = "BLKRESETZONE";
>>> +        ioctl_op = BLKRESETZONE;
>>> +        break;
>>> +    default:
>>> +        error_report("Invalid zone operation 0x%x", op);
>>> +        return -EINVAL;
>>> +    }
>>> +
>>> +    acb = (RawPosixAIOData) {
>>> +        .bs             = bs,
>>> +        .aio_fildes     = s->fd,
>>> +        .aio_type       = QEMU_AIO_ZONE_MGMT,
>>> +        .aio_offset     = offset,
>>> +        .aio_nbytes     = len,
>>> +        .zone_mgmt  = {
>>> +                .ioctl_op = ioctl_op,
>>> +        },
>>> +    };
>>> +
>>> +    ret = raw_thread_pool_submit(bs, handle_aiocb_zone_mgmt, &acb);
>>> +    if (ret != 0) {
>>> +        error_report("ioctl %s failed %d", ioctl_name, errno);
>>> +        return -errno;
>>> +    }
>>> +
>>> +    return ret;
>>> +#else
>>> +    return -ENOTSUP;
>>> +#endif
>>> +}
>>> +
>>>  static coroutine_fn int
>>>  raw_do_pdiscard(BlockDriverState *bs, int64_t offset, int64_t bytes,
>>>                  bool blkdev)
>>> @@ -3511,6 +3788,14 @@ static void hdev_parse_filename(const char *filename, QDict *options,
>>>      bdrv_parse_filename_strip_prefix(filename, "host_device:", options);
>>>  }
>>>
>>> +#if defined(CONFIG_BLKZONED)
>>> +static void zoned_host_device_parse_filename(const char *filename, QDict *options,
>>> +                                Error **errp)
>>> +{
>>> +    bdrv_parse_filename_strip_prefix(filename, "zoned_host_device:", options);
>>> +}
>>> +#endif
>>
>> Sorry, I asked you to add this function but I've changed my mind and I
>> think it should not be present. .bdrv_parse_filename() helps legacy
>> drivers convert arguments into QDict *options. But this is a new driver
>> that no one expects to work with string filenames. Therefore
>> .bdrv_parse_filename can be dropped.
>>
>>> +
>>>  static bool hdev_is_sg(BlockDriverState *bs)
>>>  {
>>>
>>> @@ -3741,6 +4026,55 @@ static BlockDriver bdrv_host_device = {
>>>  #endif
>>>  };
>>>
>>> +#if defined(CONFIG_BLKZONED)
>>> +static BlockDriver bdrv_zoned_host_device = {
>>> +        .format_name = "zoned_host_device",
>>> +        .protocol_name = "zoned_host_device",
>>> +        .instance_size = sizeof(BDRVRawState),
>>> +        .bdrv_needs_filename = true,
>>> +        .bdrv_probe_device  = hdev_probe_device,
>>> +        .bdrv_parse_filename = zoned_host_device_parse_filename,
>>> +        .bdrv_file_open     = hdev_open,
>>> +        .bdrv_close         = raw_close,
>>> +        .bdrv_reopen_prepare = raw_reopen_prepare,
>>> +        .bdrv_reopen_commit  = raw_reopen_commit,
>>> +        .bdrv_reopen_abort   = raw_reopen_abort,
>>> +        .bdrv_co_create_opts = bdrv_co_create_opts_simple,
>>> +        .create_opts         = &bdrv_create_opts_simple,
>>> +        .mutable_opts        = mutable_opts,
>>> +        .bdrv_co_invalidate_cache = raw_co_invalidate_cache,
>>> +        .bdrv_co_pwrite_zeroes = hdev_co_pwrite_zeroes,
>>> +
>>> +        .bdrv_co_preadv         = raw_co_preadv,
>>> +        .bdrv_co_pwritev        = raw_co_pwritev,
>>> +        .bdrv_co_flush_to_disk  = raw_co_flush_to_disk,
>>> +        .bdrv_co_pdiscard       = hdev_co_pdiscard,
>>> +        .bdrv_co_copy_range_from = raw_co_copy_range_from,
>>> +        .bdrv_co_copy_range_to  = raw_co_copy_range_to,
>>> +        .bdrv_refresh_limits = raw_refresh_limits,
>>> +        .bdrv_io_plug = raw_aio_plug,
>>> +        .bdrv_io_unplug = raw_aio_unplug,
>>> +        .bdrv_attach_aio_context = raw_aio_attach_aio_context,
>>> +
>>> +        .bdrv_co_truncate       = raw_co_truncate,
>>> +        .bdrv_getlength = raw_getlength,
>>> +        .bdrv_get_info = raw_get_info,
>>> +        .bdrv_get_allocated_file_size
>>> +                            = raw_get_allocated_file_size,
>>> +        .bdrv_get_specific_stats = hdev_get_specific_stats,
>>> +        .bdrv_check_perm = raw_check_perm,
>>> +        .bdrv_set_perm   = raw_set_perm,
>>> +        .bdrv_abort_perm_update = raw_abort_perm_update,
>>> +        .bdrv_probe_blocksizes = hdev_probe_blocksizes,
>>> +        .bdrv_probe_geometry = hdev_probe_geometry,
>>> +        .bdrv_co_ioctl = hdev_co_ioctl,
>>> +
>>> +        /* zone management operations */
>>> +        .bdrv_co_zone_report = raw_co_zone_report,
>>> +        .bdrv_co_zone_mgmt = raw_co_zone_mgmt,
>>> +};
>>> +#endif
>>> +
>>>  #if defined(__linux__) || defined(__FreeBSD__) || defined(__FreeBSD_kernel__)
>>>  static void cdrom_parse_filename(const char *filename, QDict *options,
>>>                                   Error **errp)
>>> @@ -4001,6 +4335,9 @@ static void bdrv_file_init(void)
>>>      bdrv_register(&bdrv_file);
>>>  #if defined(HAVE_HOST_BLOCK_DEVICE)
>>>      bdrv_register(&bdrv_host_device);
>>> +#if defined(CONFIG_BLKZONED)
>>> +    bdrv_register(&bdrv_zoned_host_device);
>>> +#endif
>>>  #ifdef __linux__
>>>      bdrv_register(&bdrv_host_cdrom);
>>>  #endif
>>> diff --git a/block/io.c b/block/io.c
>>> index 0a8cbefe86..de9ec1d740 100644
>>> --- a/block/io.c
>>> +++ b/block/io.c
>>> @@ -3198,6 +3198,47 @@ out:
>>>      return co.ret;
>>>  }
>>>
>>> +int bdrv_co_zone_report(BlockDriverState *bs, int64_t offset,
>>> +                        unsigned int *nr_zones,
>>> +                        BlockZoneDescriptor *zones)
>>> +{
>>> +    BlockDriver *drv = bs->drv;
>>> +    CoroutineIOCompletion co = {
>>> +            .coroutine = qemu_coroutine_self(),
>>> +    };
>>> +    IO_CODE();
>>> +
>>> +    bdrv_inc_in_flight(bs);
>>> +    if (!drv || !drv->bdrv_co_zone_report) {
>>> +        co.ret = -ENOTSUP;
>>> +        goto out;
>>> +    }
>>> +    co.ret = drv->bdrv_co_zone_report(bs, offset, nr_zones, zones);
>>> +out:
>>> +    bdrv_dec_in_flight(bs);
>>> +    return co.ret;
>>> +}
>>> +
>>> +int bdrv_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
>>> +        int64_t offset, int64_t len)
>>> +{
>>> +    BlockDriver *drv = bs->drv;
>>> +    CoroutineIOCompletion co = {
>>> +            .coroutine = qemu_coroutine_self(),
>>> +    };
>>> +    IO_CODE();
>>> +
>>> +    bdrv_inc_in_flight(bs);
>>> +    if (!drv || !drv->bdrv_co_zone_mgmt) {
>>> +        co.ret = -ENOTSUP;
>>> +        goto out;
>>> +    }
>>> +    co.ret = drv->bdrv_co_zone_mgmt(bs, op, offset, len);
>>> +out:
>>> +    bdrv_dec_in_flight(bs);
>>> +    return co.ret;
>>> +}
>>> +
>>>  void *qemu_blockalign(BlockDriverState *bs, size_t size)
>>>  {
>>>      IO_CODE();
>>> diff --git a/include/block/block-common.h b/include/block/block-common.h
>>> index 36bd0e480e..5102fa6858 100644
>>> --- a/include/block/block-common.h
>>> +++ b/include/block/block-common.h
>>> @@ -23,7 +23,6 @@
>>>   */
>>>  #ifndef BLOCK_COMMON_H
>>>  #define BLOCK_COMMON_H
>>> -
>>>  #include "block/aio.h"
>>>  #include "block/aio-wait.h"
>>>  #include "qemu/iov.h"
>>
>> Unrelated whitespace change. Please drop this.
>>
>>> diff --git a/include/block/block-io.h b/include/block/block-io.h
>>> index fd25ffa9be..55ad261e16 100644
>>> --- a/include/block/block-io.h
>>> +++ b/include/block/block-io.h
>>> @@ -88,6 +88,13 @@ int bdrv_co_ioctl(BlockDriverState *bs, int req, void *buf);
>>>  /* Ensure contents are flushed to disk.  */
>>>  int coroutine_fn bdrv_co_flush(BlockDriverState *bs);
>>>
>>> +/* Report zone information of zone block device. */
>>> +int coroutine_fn bdrv_co_zone_report(BlockDriverState *bs, int64_t offset,
>>> +                                     unsigned int *nr_zones,
>>> +                                     BlockZoneDescriptor *zones);
>>> +int coroutine_fn bdrv_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
>>> +                                   int64_t offset, int64_t len);
>>> +
>>>  int bdrv_co_pdiscard(BdrvChild *child, int64_t offset, int64_t bytes);
>>>  bool bdrv_can_write_zeroes_with_unmap(BlockDriverState *bs);
>>>  int bdrv_block_status(BlockDriverState *bs, int64_t offset,
>>> @@ -297,6 +304,12 @@ bdrv_readv_vmstate(BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos);
>>>  int generated_co_wrapper
>>>  bdrv_writev_vmstate(BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos);
>>>
>>> +int generated_co_wrapper
>>> +blk_zone_report(BlockBackend *blk, int64_t offset, unsigned int *nr_zones,
>>> +                BlockZoneDescriptor *zones);
>>> +int generated_co_wrapper
>>> +blk_zone_mgmt(BlockBackend *blk, BlockZoneOp op, int64_t offset, int64_t len);
>>> +
>>>  /**
>>>   * bdrv_parent_drained_begin_single:
>>>   *
>>> diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
>>> index 7f7863cc9e..de44c7b6f4 100644
>>> --- a/include/block/block_int-common.h
>>> +++ b/include/block/block_int-common.h
>>> @@ -94,7 +94,6 @@ typedef struct BdrvTrackedRequest {
>>>      struct BdrvTrackedRequest *waiting_for;
>>>  } BdrvTrackedRequest;
>>>
>>> -
>>>  struct BlockDriver {
>>>      /*
>>>       * These fields are initialized when this object is created,
>>
>> Unrelated whitespace change. Please drop this.
>>
>>> @@ -691,6 +690,12 @@ struct BlockDriver {
>>>                                            QEMUIOVector *qiov,
>>>                                            int64_t pos);
>>>
>>> +    int coroutine_fn (*bdrv_co_zone_report)(BlockDriverState *bs,
>>> +            int64_t offset, unsigned int *nr_zones,
>>> +            BlockZoneDescriptor *zones);
>>> +    int coroutine_fn (*bdrv_co_zone_mgmt)(BlockDriverState *bs, BlockZoneOp op,
>>> +            int64_t offset, int64_t len);
>>> +
>>>      /* removable device specific */
>>>      bool (*bdrv_is_inserted)(BlockDriverState *bs);
>>>      void (*bdrv_eject)(BlockDriverState *bs, bool eject_flag);
>>> @@ -828,6 +833,21 @@ typedef struct BlockLimits {
>>>
>>>      /* device zone model */
>>>      BlockZoneModel zoned;
>>> +
>>> +    /* zone size expressed in 512-byte sectors */
>>> +    uint32_t zone_sectors;
>>> +
>>> +    /* total number of zones */
>>> +    unsigned int nr_zones;
>>> +
>>> +    /* maximum size in bytes of a zone append write operation */
>>> +    int64_t zone_append_max_bytes;
>>> +
>>> +    /* maximum number of open zones */
>>> +    int64_t max_open_zones;
>>> +
>>> +    /* maximum number of active zones */
>>> +    int64_t max_active_zones;
>>>  } BlockLimits;
>>>
>>>  typedef struct BdrvOpBlocker BdrvOpBlocker;
>>> diff --git a/include/block/raw-aio.h b/include/block/raw-aio.h
>>> index 21fc10c4c9..3d26929cdd 100644
>>> --- a/include/block/raw-aio.h
>>> +++ b/include/block/raw-aio.h
>>> @@ -29,6 +29,8 @@
>>>  #define QEMU_AIO_WRITE_ZEROES 0x0020
>>>  #define QEMU_AIO_COPY_RANGE   0x0040
>>>  #define QEMU_AIO_TRUNCATE     0x0080
>>> +#define QEMU_AIO_ZONE_REPORT  0x0100
>>> +#define QEMU_AIO_ZONE_MGMT    0x0200
>>>  #define QEMU_AIO_TYPE_MASK \
>>>          (QEMU_AIO_READ | \
>>>           QEMU_AIO_WRITE | \
>>> @@ -37,7 +39,9 @@
>>>           QEMU_AIO_DISCARD | \
>>>           QEMU_AIO_WRITE_ZEROES | \
>>>           QEMU_AIO_COPY_RANGE | \
>>> -         QEMU_AIO_TRUNCATE)
>>> +         QEMU_AIO_TRUNCATE  | \
>>> +         QEMU_AIO_ZONE_REPORT | \
>>> +         QEMU_AIO_ZONE_MGMT)
>>>
>>>  /* AIO flags */
>>>  #define QEMU_AIO_MISALIGNED   0x1000
>>> diff --git a/include/sysemu/block-backend-io.h b/include/sysemu/block-backend-io.h
>>> index 50f5aa2e07..6e7df1d93b 100644
>>> --- a/include/sysemu/block-backend-io.h
>>> +++ b/include/sysemu/block-backend-io.h
>>> @@ -156,6 +156,12 @@ int generated_co_wrapper blk_pwrite_zeroes(BlockBackend *blk, int64_t offset,
>>>  int coroutine_fn blk_co_pwrite_zeroes(BlockBackend *blk, int64_t offset,
>>>                                        int64_t bytes, BdrvRequestFlags flags);
>>>
>>> +int coroutine_fn blk_co_zone_report(BlockBackend *blk, int64_t offset,
>>> +                                    unsigned int *nr_zones,
>>> +                                    BlockZoneDescriptor *zones);
>>> +int coroutine_fn blk_co_zone_mgmt(BlockBackend *blk, BlockZoneOp op,
>>> +                                  int64_t offset, int64_t len);
>>> +
>>>  int generated_co_wrapper blk_pdiscard(BlockBackend *blk, int64_t offset,
>>>                                        int64_t bytes);
>>>  int coroutine_fn blk_co_pdiscard(BlockBackend *blk, int64_t offset,
>>> diff --git a/meson.build b/meson.build
>>> index 294e9a8f32..c3219b0e87 100644
>>> --- a/meson.build
>>> +++ b/meson.build
>>> @@ -1883,6 +1883,7 @@ config_host_data.set('CONFIG_REPLICATION', get_option('live_block_migration').al
>>>  # has_header
>>>  config_host_data.set('CONFIG_EPOLL', cc.has_header('sys/epoll.h'))
>>>  config_host_data.set('CONFIG_LINUX_MAGIC_H', cc.has_header('linux/magic.h'))
>>> +config_host_data.set('CONFIG_BLKZONED', cc.has_header('linux/blkzoned.h'))
>>>  config_host_data.set('CONFIG_VALGRIND_H', cc.has_header('valgrind/valgrind.h'))
>>>  config_host_data.set('HAVE_BTRFS_H', cc.has_header('linux/btrfs.h'))
>>>  config_host_data.set('HAVE_DRM_H', cc.has_header('libdrm/drm.h'))
>>> diff --git a/qapi/block-core.json b/qapi/block-core.json
>>> index 2173e7734a..c6bbb7a037 100644
>>> --- a/qapi/block-core.json
>>> +++ b/qapi/block-core.json
>>> @@ -2942,6 +2942,7 @@
>>>  # @compress: Since 5.0
>>>  # @copy-before-write: Since 6.2
>>>  # @snapshot-access: Since 7.0
>>> +# @zoned_host_device: Since 7.2
>>>  #
>>>  # Since: 2.9
>>>  ##
>>> @@ -2955,7 +2956,8 @@
>>>              'luks', 'nbd', 'nfs', 'null-aio', 'null-co', 'nvme', 'parallels',
>>>              'preallocate', 'qcow', 'qcow2', 'qed', 'quorum', 'raw', 'rbd',
>>>              { 'name': 'replication', 'if': 'CONFIG_REPLICATION' },
>>> -            'ssh', 'throttle', 'vdi', 'vhdx', 'vmdk', 'vpc', 'vvfat' ] }
>>> +            'ssh', 'throttle', 'vdi', 'vhdx', 'vmdk', 'vpc', 'vvfat',
>>> +            { 'name': 'zoned_host_device', 'if': 'CONFIG_BLKZONED' } ] }
>>>
>>>  ##
>>>  # @BlockdevOptionsFile:
>>> @@ -4329,7 +4331,9 @@
>>>        'vhdx':       'BlockdevOptionsGenericFormat',
>>>        'vmdk':       'BlockdevOptionsGenericCOWFormat',
>>>        'vpc':        'BlockdevOptionsGenericFormat',
>>> -      'vvfat':      'BlockdevOptionsVVFAT'
>>> +      'vvfat':      'BlockdevOptionsVVFAT',
>>> +      'zoned_host_device': { 'type': 'BlockdevOptionsFile',
>>> +                             'if': 'CONFIG_BLKZONED' }
>>>    } }
>>>
>>>  ##
>>> diff --git a/qemu-io-cmds.c b/qemu-io-cmds.c
>>> index 952dc940f1..687c3a624c 100644
>>> --- a/qemu-io-cmds.c
>>> +++ b/qemu-io-cmds.c
>>> @@ -1712,6 +1712,144 @@ static const cmdinfo_t flush_cmd = {
>>>      .oneline    = "flush all in-core file state to disk",
>>>  };
>>>
>>> +static int zone_report_f(BlockBackend *blk, int argc, char **argv)
>>> +{
>>> +    int ret;
>>> +    int64_t offset;
>>> +    unsigned int nr_zones;
>>> +
>>> +    ++optind;
>>> +    offset = cvtnum(argv[optind]);
>>> +    ++optind;
>>> +    nr_zones = cvtnum(argv[optind]);
>>> +
>>> +    g_autofree BlockZoneDescriptor *zones = NULL;
>>> +    zones = g_new(BlockZoneDescriptor, nr_zones);
>>> +    ret = blk_zone_report(blk, offset, &nr_zones, zones);
>>> +    if (ret < 0) {
>>> +        printf("zone report failed: %s\n", strerror(-ret));
>>> +    } else {
>>> +        for (int i = 0; i < nr_zones; ++i) {
>>> +            printf("start: 0x%" PRIx64 ", len 0x%" PRIx64 ", "
>>> +                   "cap"" 0x%" PRIx64 ",wptr 0x%" PRIx64 ", "
>>> +                   "zcond:%u, [type: %u]\n",
>>> +                   zones[i].start, zones[i].length, zones[i].cap, zones[i].wp,
>>> +                   zones[i].cond, zones[i].type);
>>> +        }
>>> +    }
>>> +    return ret;
>>> +}
>>> +
>>> +static const cmdinfo_t zone_report_cmd = {
>>> +        .name = "zone_report",
>>> +        .altname = "zrp",
>>> +        .cfunc = zone_report_f,
>>> +        .argmin = 2,
>>> +        .argmax = 2,
>>> +        .args = "offset number",
>>> +        .oneline = "report zone information",
>>> +};
>>> +
>>> +static int zone_open_f(BlockBackend *blk, int argc, char **argv)
>>> +{
>>> +    int ret;
>>> +    int64_t offset, len;
>>> +    ++optind;
>>> +    offset = cvtnum(argv[optind]);
>>> +    ++optind;
>>> +    len = cvtnum(argv[optind]);
>>> +    ret = blk_zone_mgmt(blk, BLK_ZO_OPEN, offset, len);
>>> +    if (ret < 0) {
>>> +        printf("zone open failed: %s\n", strerror(-ret));
>>> +    }
>>> +    return ret;
>>> +}
>>> +
>>> +static const cmdinfo_t zone_open_cmd = {
>>> +        .name = "zone_open",
>>> +        .altname = "zo",
>>> +        .cfunc = zone_open_f,
>>> +        .argmin = 2,
>>> +        .argmax = 2,
>>> +        .args = "offset len",
>>> +        .oneline = "explicit open a range of zones in zone block device",
>>> +};
>>> +
>>> +static int zone_close_f(BlockBackend *blk, int argc, char **argv)
>>> +{
>>> +    int ret;
>>> +    int64_t offset, len;
>>> +    ++optind;
>>> +    offset = cvtnum(argv[optind]);
>>> +    ++optind;
>>> +    len = cvtnum(argv[optind]);
>>> +    ret = blk_zone_mgmt(blk, BLK_ZO_CLOSE, offset, len);
>>> +    if (ret < 0) {
>>> +        printf("zone close failed: %s\n", strerror(-ret));
>>> +    }
>>> +    return ret;
>>> +}
>>> +
>>> +static const cmdinfo_t zone_close_cmd = {
>>> +        .name = "zone_close",
>>> +        .altname = "zc",
>>> +        .cfunc = zone_close_f,
>>> +        .argmin = 2,
>>> +        .argmax = 2,
>>> +        .args = "offset len",
>>> +        .oneline = "close a range of zones in zone block device",
>>> +};
>>> +
>>> +static int zone_finish_f(BlockBackend *blk, int argc, char **argv)
>>> +{
>>> +    int ret;
>>> +    int64_t offset, len;
>>> +    ++optind;
>>> +    offset = cvtnum(argv[optind]);
>>> +    ++optind;
>>> +    len = cvtnum(argv[optind]);
>>> +    ret = blk_zone_mgmt(blk, BLK_ZO_FINISH, offset, len);
>>> +    if (ret < 0) {
>>> +        printf("zone finish failed: %s\n", strerror(-ret));
>>> +    }
>>> +    return ret;
>>> +}
>>> +
>>> +static const cmdinfo_t zone_finish_cmd = {
>>> +        .name = "zone_finish",
>>> +        .altname = "zf",
>>> +        .cfunc = zone_finish_f,
>>> +        .argmin = 2,
>>> +        .argmax = 2,
>>> +        .args = "offset len",
>>> +        .oneline = "finish a range of zones in zone block device",
>>> +};
>>> +
>>> +static int zone_reset_f(BlockBackend *blk, int argc, char **argv)
>>> +{
>>> +    int ret;
>>> +    int64_t offset, len;
>>> +    ++optind;
>>> +    offset = cvtnum(argv[optind]);
>>> +    ++optind;
>>> +    len = cvtnum(argv[optind]);
>>> +    ret = blk_zone_mgmt(blk, BLK_ZO_RESET, offset, len);
>>> +    if (ret < 0) {
>>> +        printf("zone reset failed: %s\n", strerror(-ret));
>>> +    }
>>> +    return ret;
>>> +}
>>> +
>>> +static const cmdinfo_t zone_reset_cmd = {
>>> +        .name = "zone_reset",
>>> +        .altname = "zrs",
>>> +        .cfunc = zone_reset_f,
>>> +        .argmin = 2,
>>> +        .argmax = 2,
>>> +        .args = "offset len",
>>> +        .oneline = "reset a zone write pointer in zone block device",
>>> +};
>>> +
>>>  static int truncate_f(BlockBackend *blk, int argc, char **argv);
>>>  static const cmdinfo_t truncate_cmd = {
>>>      .name       = "truncate",
>>> @@ -2504,6 +2642,11 @@ static void __attribute((constructor)) init_qemuio_commands(void)
>>>      qemuio_add_command(&aio_write_cmd);
>>>      qemuio_add_command(&aio_flush_cmd);
>>>      qemuio_add_command(&flush_cmd);
>>> +    qemuio_add_command(&zone_report_cmd);
>>> +    qemuio_add_command(&zone_open_cmd);
>>> +    qemuio_add_command(&zone_close_cmd);
>>> +    qemuio_add_command(&zone_finish_cmd);
>>> +    qemuio_add_command(&zone_reset_cmd);
>>>      qemuio_add_command(&truncate_cmd);
>>>      qemuio_add_command(&length_cmd);
>>>      qemuio_add_command(&info_cmd);
>>> --
>>> 2.37.1
>>>


-- 
Damien Le Moal
Western Digital Research


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v7 4/8] block: add block layer APIs resembling Linux ZonedBlockDevice ioctls
  2022-08-24 23:46       ` Damien Le Moal
@ 2022-08-24 23:53         ` Damien Le Moal
  0 siblings, 0 replies; 28+ messages in thread
From: Damien Le Moal @ 2022-08-24 23:53 UTC (permalink / raw)
  To: Sam Li, Stefan Hajnoczi
  Cc: qemu-devel, Hannes Reinecke, Fam Zheng, Kevin Wolf, Eric Blake,
	Markus Armbruster, Hanna Reitz, Dmitry Fomichev, qemu block

On 2022/08/24 16:46, Damien Le Moal wrote:
> On 2022/08/22 21:12, Sam Li wrote:
>> Stefan Hajnoczi <stefanha@redhat.com> 于2022年8月23日周二 08:49写道:
>>>
>>> On Tue, Aug 16, 2022 at 02:25:18PM +0800, Sam Li wrote:
[...]>>>> +    blkz = (struct blk_zone *)(rep + 1);
>>>> +    while (n < nrz) {
>>>> +        memset(rep, 0, rep_size);
>>>> +        rep->sector = sector;
>>>> +        rep->nr_zones = nrz - n;
>>>> +
>>>> +        ret = ioctl(fd, BLKREPORTZONE, rep);
>>>
>>> Does this ioctl() need "do { ... } while (ret == -1 && errno == EINTR)"?
>>
>> No? We discussed this before. I guess even EINTR should be propagated
>> back to the guest. Maybe Damien can talk more about why.
> 
> In the kernel, completion of zone management IO requests are waited for using
> wait_for_completion_io() which uses TASK_UNINTERRUPTIBLE. So a signal will not
> abort anything. So I do not think that the do { } while() loop is necessary.

Note: I do not think the loop to be necessary, but it will not hurt :)

-- 
Damien Le Moal
Western Digital Research


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v7 4/8] block: add block layer APIs resembling Linux ZonedBlockDevice ioctls
  2022-08-16 17:50   ` Damien Le Moal
@ 2022-08-26 12:20     ` Sam Li
  0 siblings, 0 replies; 28+ messages in thread
From: Sam Li @ 2022-08-26 12:20 UTC (permalink / raw)
  To: Damien Le Moal
  Cc: qemu-devel, Hannes Reinecke, Fam Zheng, Kevin Wolf, Eric Blake,
	Markus Armbruster, Stefan Hajnoczi, Hanna Reitz, Dmitry Fomichev,
	qemu block

Damien Le Moal <damien.lemoal@opensource.wdc.com> 于2022年8月17日周三 01:50写道:
>
> On 2022/08/15 23:25, Sam Li wrote:
> > By adding zone management operations in BlockDriver, storage controller
> > emulation can use the new block layer APIs including Report Zone and
> > four zone management operations (open, close, finish, reset).
> >
> > Add zoned storage commands of the device: zone_report(zrp), zone_open(zo),
> > zone_close(zc), zone_reset(zrs), zone_finish(zf).
> >
> > For example, to test zone_report, use following command:
> > $ ./build/qemu-io --image-opts driver=zoned_host_device, filename=/dev/nullb0
> > -c "zrp offset nr_zones"
> >
> > Signed-off-by: Sam Li <faithilikerun@gmail.com>
> > Reviewed-by: Hannes Reinecke <hare@suse.de>
> > ---
> >  block/block-backend.c             |  50 +++++
> >  block/file-posix.c                | 341 +++++++++++++++++++++++++++++-
> >  block/io.c                        |  41 ++++
> >  include/block/block-common.h      |   1 -
> >  include/block/block-io.h          |  13 ++
> >  include/block/block_int-common.h  |  22 +-
> >  include/block/raw-aio.h           |   6 +-
> >  include/sysemu/block-backend-io.h |   6 +
> >  meson.build                       |   1 +
> >  qapi/block-core.json              |   8 +-
> >  qemu-io-cmds.c                    | 143 +++++++++++++
> >  11 files changed, 625 insertions(+), 7 deletions(-)
> >
> > diff --git a/block/block-backend.c b/block/block-backend.c
> > index d4a5df2ac2..fc639b0cd7 100644
> > --- a/block/block-backend.c
> > +++ b/block/block-backend.c
> > @@ -1775,6 +1775,56 @@ int coroutine_fn blk_co_flush(BlockBackend *blk)
> >      return ret;
> >  }
> >
> > +/*
> > + * Send a zone_report command.
> > + * offset is a byte offset from the start of the device. No alignment
> > + * required for offset.
> > + * nr_zones represents IN maximum and OUT actual.
> > + */
> > +int coroutine_fn blk_co_zone_report(BlockBackend *blk, int64_t offset,
> > +                                    unsigned int *nr_zones,
> > +                                    BlockZoneDescriptor *zones)
> > +{
> > +    int ret;
> > +    IO_CODE();
> > +
> > +    blk_inc_in_flight(blk); /* increase before waiting */
> > +    blk_wait_while_drained(blk);
> > +    if (!blk_is_available(blk)) {
> > +        blk_dec_in_flight(blk);
> > +        return -ENOMEDIUM;
> > +    }
> > +    ret = bdrv_co_zone_report(blk_bs(blk), offset, nr_zones, zones);
> > +    blk_dec_in_flight(blk);
> > +    return ret;
> > +}
> > +
> > +/*
> > + * Send a zone_management command.
> > + * offset is the starting zone specified as a sector offset.
> > + * len is the maximum number of sectors the command should operate on.
>
> You should mention that len should be zone size aligned. Also, for completness,
> add a short description of the op argument too ?
>
> > + */
> > +int coroutine_fn blk_co_zone_mgmt(BlockBackend *blk, BlockZoneOp op,
> > +        int64_t offset, int64_t len)
> > +{
> > +    int ret;
> > +    IO_CODE();
> > +
> > +    ret = blk_check_byte_request(blk, offset, len);
> > +    if (ret < 0) {
> > +        return ret;
> > +    }
> > +    blk_inc_in_flight(blk);
> > +    blk_wait_while_drained(blk);
> > +    if (!blk_is_available(blk)) {
> > +        blk_dec_in_flight(blk);
> > +        return -ENOMEDIUM;
> > +    }
> > +    ret = bdrv_co_zone_mgmt(blk_bs(blk), op, offset, len);
> > +    blk_dec_in_flight(blk);
> > +    return ret;
> > +}
> > +
> >  void blk_drain(BlockBackend *blk)
> >  {
> >      BlockDriverState *bs = blk_bs(blk);
> > diff --git a/block/file-posix.c b/block/file-posix.c
> > index 727389488c..29f67082d9 100644
> > --- a/block/file-posix.c
> > +++ b/block/file-posix.c
> > @@ -67,6 +67,9 @@
> >  #include <sys/param.h>
> >  #include <sys/syscall.h>
> >  #include <sys/vfs.h>
> > +#if defined(CONFIG_BLKZONED)
> > +#include <linux/blkzoned.h>
> > +#endif
> >  #include <linux/cdrom.h>
> >  #include <linux/fd.h>
> >  #include <linux/fs.h>
> > @@ -216,6 +219,13 @@ typedef struct RawPosixAIOData {
> >              PreallocMode prealloc;
> >              Error **errp;
> >          } truncate;
> > +        struct {
> > +            unsigned int *nr_zones;
> > +            BlockZoneDescriptor *zones;
> > +        } zone_report;
> > +        struct {
> > +            unsigned long ioctl_op;
>
> May be clarify this field usage by calling it zone_op ?
>
> > +        } zone_mgmt;
> >      };
> >  } RawPosixAIOData;
> >
> > @@ -1328,7 +1338,7 @@ static void raw_refresh_limits(BlockDriverState *bs, Error **errp)
> >  #endif
> >
> >      if (bs->sg || S_ISBLK(st.st_mode)) {
> > -        int ret = hdev_get_max_hw_transfer(s->fd, &st);
> > +        ret = hdev_get_max_hw_transfer(s->fd, &st);
> >
> >          if (ret > 0 && ret <= BDRV_REQUEST_MAX_BYTES) {
> >              bs->bl.max_hw_transfer = ret;
> > @@ -1340,11 +1350,32 @@ static void raw_refresh_limits(BlockDriverState *bs, Error **errp)
> >          }
> >      }
> >
> > -    ret = get_sysfs_zoned_model(s->fd, &st, &zoned);
> > +    ret = get_sysfs_zoned_model(&st, &zoned);
> >      if (ret < 0) {
> >          zoned = BLK_Z_NONE;
> >      }
> >      bs->bl.zoned = zoned;
> > +    if (zoned != BLK_Z_NONE) {
> > +        ret = get_sysfs_long_val(&st, "chunk_sectors");
> > +        if (ret > 0) {
> > +            bs->bl.zone_sectors = ret;
> > +        }
> > +
> > +        ret = get_sysfs_long_val(&st, "zone_append_max_bytes");
> > +        if (ret > 0) {
> > +            bs->bl.zone_append_max_bytes = ret;
> > +        }
> > +
> > +        ret = get_sysfs_long_val(&st, "max_open_zones");
> > +        if (ret > 0) {
>
> The value can be 0, so this should be "if (ret >= 0) {".
>
> > +            bs->bl.max_open_zones = ret;
> > +        }
> > +
> > +        ret = get_sysfs_long_val(&st, "max_active_zones");
> > +        if (ret > 0) {
>
> The value can be 0, so this should be "if (ret >= 0) {".
>
> > +            bs->bl.max_active_zones = ret;
> > +        }
> > +    }
> >  }
> >
> >  static int check_for_dasd(int fd)
> > @@ -1839,6 +1870,134 @@ static off_t copy_file_range(int in_fd, off_t *in_off, int out_fd,
> >  }
> >  #endif
> >
> > +/*
> > + * parse_zone - Fill a zone descriptor
> > + */
> > +#if defined(CONFIG_BLKZONED)
> > +static inline void parse_zone(struct BlockZoneDescriptor *zone,
> > +                              struct blk_zone *blkz) {
> > +    zone->start = blkz->start;
> > +    zone->length = blkz->len;
> > +    zone->cap = blkz->capacity;
> > +    zone->wp = blkz->wp;
> > +
> > +    switch (blkz->type) {
> > +    case BLK_ZONE_TYPE_SEQWRITE_REQ:
> > +        zone->type = BLK_ZT_SWR;
> > +        break;
> > +    case BLK_ZONE_TYPE_SEQWRITE_PREF:
> > +        zone->type = BLK_ZT_SWP;
> > +        break;
> > +    case BLK_ZONE_TYPE_CONVENTIONAL:
> > +        zone->type = BLK_ZT_CONV;
> > +        break;
> > +    default:
> > +        error_report("Invalid zone type: 0x%x", blkz->type);
> > +    }
> > +
> > +    switch (blkz->cond) {
> > +    case BLK_ZONE_COND_NOT_WP:
> > +        zone->cond = BLK_ZS_NOT_WP;
> > +        break;
> > +    case BLK_ZONE_COND_EMPTY:
> > +        zone->cond = BLK_ZS_EMPTY;
> > +        break;
> > +    case BLK_ZONE_COND_IMP_OPEN:
> > +        zone->cond =BLK_ZS_IOPEN;
> > +        break;
> > +    case BLK_ZONE_COND_EXP_OPEN:
> > +        zone->cond = BLK_ZS_EOPEN;
> > +        break;
> > +    case BLK_ZONE_COND_CLOSED:
> > +        zone->cond = BLK_ZS_CLOSED;
> > +        break;
> > +    case BLK_ZONE_COND_READONLY:
> > +        zone->cond = BLK_ZS_RDONLY;
> > +        break;
> > +    case BLK_ZONE_COND_FULL:
> > +        zone->cond = BLK_ZS_FULL;
> > +        break;
> > +    case BLK_ZONE_COND_OFFLINE:
> > +        zone->cond = BLK_ZS_OFFLINE;
> > +        break;
> > +    default:
> > +        error_report("Invalid zone condition 0x%x", blkz->cond);
> > +    }
> > +}
> > +#endif
> > +
> > +static int handle_aiocb_zone_report(void *opaque) {
> > +#if defined(CONFIG_BLKZONED)
> > +    RawPosixAIOData *aiocb = opaque;
> > +    int fd = aiocb->aio_fildes;
> > +    unsigned int *nr_zones = aiocb->zone_report.nr_zones;
> > +    BlockZoneDescriptor *zones = aiocb->zone_report.zones;
> > +    int64_t sector = aiocb->aio_offset;
> > +
> > +    struct blk_zone *blkz;
> > +    int64_t rep_size;
> > +    unsigned int nrz;
> > +    int ret, n = 0, i = 0;
> > +
> > +    nrz = *nr_zones;
> > +    rep_size = sizeof(struct blk_zone_report) + nrz * sizeof(struct blk_zone);
> > +    g_autofree struct blk_zone_report *rep = NULL;
> > +    rep = g_malloc(rep_size);
> > +
> > +    blkz = (struct blk_zone *)(rep + 1);
> > +    while (n < nrz) {
> > +        memset(rep, 0, rep_size);
> > +        rep->sector = sector;
> > +        rep->nr_zones = nrz - n;
> > +
> > +        ret = ioctl(fd, BLKREPORTZONE, rep);
> > +        if (ret != 0) {
> > +            error_report("%d: ioctl BLKREPORTZONE at %" PRId64 " failed %d",
> > +                         fd, sector, errno);
> > +            return -errno;
> > +        }
> > +
> > +        if (!rep->nr_zones) {
> > +            break;
> > +        }
> > +
> > +        for (i = 0; i < rep->nr_zones; i++, n++) {
> > +            parse_zone(&zones[n], &blkz[i]);
> > +            /* The next report should start after the last zone reported */
> > +            sector = blkz[i].start + blkz[i].len;
> > +        }
> > +    }
> > +
> > +    *nr_zones = n;
> > +    return 0;
> > +#else
> > +    return -ENOTSUP;
> > +#endif
> > +}
> > +
> > +static int handle_aiocb_zone_mgmt(void *opaque) {
> > +#if defined(CONFIG_BLKZONED)
> > +    RawPosixAIOData *aiocb = opaque;
> > +    int fd = aiocb->aio_fildes;
> > +    int64_t sector = aiocb->aio_offset;
> > +    int64_t nr_sectors = aiocb->aio_nbytes;
> > +    unsigned long ioctl_op = aiocb->zone_mgmt.ioctl_op;
>
> Nit: I do not think these variables are very useful. You could reference
> directly the aiocb fields in the code below.
>
> > +    struct blk_zone_range range;
> > +    int ret;
> > +
> > +    /* Execute the operation */
> > +    range.sector = sector;
> > +    range.nr_sectors = nr_sectors;
> > +    do {
> > +        ret = ioctl(fd, ioctl_op, &range);
> > +    } while (ret != 0 && errno == EINTR);
> > +
> > +    return ret;
> > +#else
> > +    return -ENOTSUP;
> > +#endif
> > +}
> > +
> >  static int handle_aiocb_copy_range(void *opaque)
> >  {
> >      RawPosixAIOData *aiocb = opaque;
> > @@ -3011,6 +3170,124 @@ static void raw_account_discard(BDRVRawState *s, uint64_t nbytes, int ret)
> >      }
> >  }
> >
> > +/*
> > + * zone report - Get a zone block device's information in the form
> > + * of an array of zone descriptors.
> > + *
> > + * @param bs: passing zone block device file descriptor
> > + * @param zones: an array of zone descriptors to hold zone
> > + * information on reply
> > + * @param offset: offset can be any byte within the zone size.
> > + * @param len: (not sure yet.
> > + * @return 0 on success, -1 on failure
> > + */
> > +static int coroutine_fn raw_co_zone_report(BlockDriverState *bs, int64_t offset,
> > +                                           unsigned int *nr_zones,
> > +                                           BlockZoneDescriptor *zones) {
> > +#if defined(CONFIG_BLKZONED)
> > +    BDRVRawState *s = bs->opaque;
> > +    RawPosixAIOData acb;
> > +
> > +    acb = (RawPosixAIOData) {
> > +        .bs         = bs,
> > +        .aio_fildes = s->fd,
> > +        .aio_type   = QEMU_AIO_ZONE_REPORT,
> > +        /* zoned block devices use 512-byte sectors */
> > +        .aio_offset = offset / 512,
>
> This conversion from bytes to 512B sectors would be better placed in
> handle_aiocb_zone_report(). Doing so, all the API uses bytes, similarly to other
> operations and the conversion to 512B sectors only done for Linux specific ioctl
> code.
>
> > +        .zone_report    = {
> > +                .nr_zones       = nr_zones,
> > +                .zones          = zones,
> > +        },
> > +    };
> > +
> > +    return raw_thread_pool_submit(bs, handle_aiocb_zone_report, &acb);
> > +#else
> > +    return -ENOTSUP;
> > +#endif
> > +}
> > +
> > +/*
> > + * zone management operations - Execute an operation on a zone
> > + */
> > +static int coroutine_fn raw_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
> > +        int64_t offset, int64_t len) {
> > +#if defined(CONFIG_BLKZONED)
> > +    BDRVRawState *s = bs->opaque;
> > +    RawPosixAIOData acb;
> > +    int64_t zone_sector, zone_sector_mask;
> > +    const char *ioctl_name;
> > +    unsigned long ioctl_op;
> > +    int ret;
> > +
> > +    struct stat st;
> > +    if (fstat(s->fd, &st) < 0) {
> > +        ret = -errno;
> > +        return ret;
> > +    }
> > +    zone_sector = get_sysfs_long_val(&st, "chunk_sectors");
> > +    if (zone_sector < 0) {
> > +        error_report("invalid zone sector size %" PRId64 "", zone_sector);
> > +        return -EINVAL;
> > +    }
>
> You already got this value in bs->bl.zone_sectors in raw_refresh_limits(). So I
> you should not need to read it again from sysfs.
>
> > +
> > +    zone_sector_mask = zone_sector - 1;
> > +    if (offset & zone_sector_mask) {
> > +        error_report("sector offset %" PRId64 " is not aligned to zone size "
> > +                     "%" PRId64 "", offset, zone_sector);
> > +        return -EINVAL;
> > +    }
> > +
> > +    if (len & zone_sector_mask) {
> > +        error_report("number of sectors %" PRId64 " is not aligned to zone size"
> > +                      " %" PRId64 "", len, zone_sector);
> > +        return -EINVAL;
> > +    }
> > +
> > +    switch (op) {
> > +    case BLK_ZO_OPEN:
> > +        ioctl_name = "BLKOPENZONE";
> > +        ioctl_op = BLKOPENZONE;
> > +        break;
> > +    case BLK_ZO_CLOSE:
> > +        ioctl_name = "BLKCLOSEZONE";
> > +        ioctl_op = BLKCLOSEZONE;
> > +        break;
> > +    case BLK_ZO_FINISH:
> > +        ioctl_name = "BLKFINISHZONE";
> > +        ioctl_op = BLKFINISHZONE;
> > +        break;
> > +    case BLK_ZO_RESET:
> > +        ioctl_name = "BLKRESETZONE";
> > +        ioctl_op = BLKRESETZONE;
> > +        break;
> > +    default:
> > +        error_report("Invalid zone operation 0x%x", op);
> > +        return -EINVAL;
> > +    }
> > +
> > +    acb = (RawPosixAIOData) {
> > +        .bs             = bs,
> > +        .aio_fildes     = s->fd,
> > +        .aio_type       = QEMU_AIO_ZONE_MGMT,
> > +        .aio_offset     = offset,
> > +        .aio_nbytes     = len,
>
> Are these 2 values in bytes or in 512B sectors ? Looking at
> handle_aiocb_zone_mgmt(), it looks like 512B sectors. So where is the conversion
> done from bytes ?

They are both in number of sectors so it doesn't need conversion.

>
> > +        .zone_mgmt  = {
> > +                .ioctl_op = ioctl_op,
> > +        },
> > +    };
> > +
> > +    ret = raw_thread_pool_submit(bs, handle_aiocb_zone_mgmt, &acb);
> > +    if (ret != 0) {
> > +        error_report("ioctl %s failed %d", ioctl_name, errno);
> > +        return -errno;
> > +    }
> > +
> > +    return ret;
> > +#else
> > +    return -ENOTSUP;
> > +#endif
> > +}
> > +
> >  static coroutine_fn int
> >  raw_do_pdiscard(BlockDriverState *bs, int64_t offset, int64_t bytes,
> >                  bool blkdev)
> > @@ -3511,6 +3788,14 @@ static void hdev_parse_filename(const char *filename, QDict *options,
> >      bdrv_parse_filename_strip_prefix(filename, "host_device:", options);
> >  }
> >
> > +#if defined(CONFIG_BLKZONED)
> > +static void zoned_host_device_parse_filename(const char *filename, QDict *options,
> > +                                Error **errp)
> > +{
> > +    bdrv_parse_filename_strip_prefix(filename, "zoned_host_device:", options);
> > +}
> > +#endif
> > +
> >  static bool hdev_is_sg(BlockDriverState *bs)
> >  {
> >
> > @@ -3741,6 +4026,55 @@ static BlockDriver bdrv_host_device = {
> >  #endif
> >  };
> >
> > +#if defined(CONFIG_BLKZONED)
> > +static BlockDriver bdrv_zoned_host_device = {
> > +        .format_name = "zoned_host_device",
> > +        .protocol_name = "zoned_host_device",
> > +        .instance_size = sizeof(BDRVRawState),
> > +        .bdrv_needs_filename = true,
> > +        .bdrv_probe_device  = hdev_probe_device,
> > +        .bdrv_parse_filename = zoned_host_device_parse_filename,
> > +        .bdrv_file_open     = hdev_open,
> > +        .bdrv_close         = raw_close,
> > +        .bdrv_reopen_prepare = raw_reopen_prepare,
> > +        .bdrv_reopen_commit  = raw_reopen_commit,
> > +        .bdrv_reopen_abort   = raw_reopen_abort,
> > +        .bdrv_co_create_opts = bdrv_co_create_opts_simple,
> > +        .create_opts         = &bdrv_create_opts_simple,
> > +        .mutable_opts        = mutable_opts,
> > +        .bdrv_co_invalidate_cache = raw_co_invalidate_cache,
> > +        .bdrv_co_pwrite_zeroes = hdev_co_pwrite_zeroes,
> > +
> > +        .bdrv_co_preadv         = raw_co_preadv,
> > +        .bdrv_co_pwritev        = raw_co_pwritev,
> > +        .bdrv_co_flush_to_disk  = raw_co_flush_to_disk,
> > +        .bdrv_co_pdiscard       = hdev_co_pdiscard,
> > +        .bdrv_co_copy_range_from = raw_co_copy_range_from,
> > +        .bdrv_co_copy_range_to  = raw_co_copy_range_to,
> > +        .bdrv_refresh_limits = raw_refresh_limits,
> > +        .bdrv_io_plug = raw_aio_plug,
> > +        .bdrv_io_unplug = raw_aio_unplug,
> > +        .bdrv_attach_aio_context = raw_aio_attach_aio_context,
> > +
> > +        .bdrv_co_truncate       = raw_co_truncate,
> > +        .bdrv_getlength = raw_getlength,
> > +        .bdrv_get_info = raw_get_info,
> > +        .bdrv_get_allocated_file_size
> > +                            = raw_get_allocated_file_size,
> > +        .bdrv_get_specific_stats = hdev_get_specific_stats,
> > +        .bdrv_check_perm = raw_check_perm,
> > +        .bdrv_set_perm   = raw_set_perm,
> > +        .bdrv_abort_perm_update = raw_abort_perm_update,
> > +        .bdrv_probe_blocksizes = hdev_probe_blocksizes,
> > +        .bdrv_probe_geometry = hdev_probe_geometry,
> > +        .bdrv_co_ioctl = hdev_co_ioctl,
> > +
> > +        /* zone management operations */
> > +        .bdrv_co_zone_report = raw_co_zone_report,
> > +        .bdrv_co_zone_mgmt = raw_co_zone_mgmt,
> > +};
> > +#endif
> > +
> >  #if defined(__linux__) || defined(__FreeBSD__) || defined(__FreeBSD_kernel__)
> >  static void cdrom_parse_filename(const char *filename, QDict *options,
> >                                   Error **errp)
> > @@ -4001,6 +4335,9 @@ static void bdrv_file_init(void)
> >      bdrv_register(&bdrv_file);
> >  #if defined(HAVE_HOST_BLOCK_DEVICE)
> >      bdrv_register(&bdrv_host_device);
> > +#if defined(CONFIG_BLKZONED)
> > +    bdrv_register(&bdrv_zoned_host_device);
> > +#endif
> >  #ifdef __linux__
> >      bdrv_register(&bdrv_host_cdrom);
> >  #endif
> > diff --git a/block/io.c b/block/io.c
> > index 0a8cbefe86..de9ec1d740 100644
> > --- a/block/io.c
> > +++ b/block/io.c
> > @@ -3198,6 +3198,47 @@ out:
> >      return co.ret;
> >  }
> >
> > +int bdrv_co_zone_report(BlockDriverState *bs, int64_t offset,
> > +                        unsigned int *nr_zones,
> > +                        BlockZoneDescriptor *zones)
> > +{
> > +    BlockDriver *drv = bs->drv;
> > +    CoroutineIOCompletion co = {
> > +            .coroutine = qemu_coroutine_self(),
> > +    };
> > +    IO_CODE();
> > +
> > +    bdrv_inc_in_flight(bs);
> > +    if (!drv || !drv->bdrv_co_zone_report) {
> > +        co.ret = -ENOTSUP;
> > +        goto out;
> > +    }
> > +    co.ret = drv->bdrv_co_zone_report(bs, offset, nr_zones, zones);
> > +out:
> > +    bdrv_dec_in_flight(bs);
> > +    return co.ret;
> > +}
> > +
> > +int bdrv_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
> > +        int64_t offset, int64_t len)
> > +{
> > +    BlockDriver *drv = bs->drv;
> > +    CoroutineIOCompletion co = {
> > +            .coroutine = qemu_coroutine_self(),
> > +    };
> > +    IO_CODE();
> > +
> > +    bdrv_inc_in_flight(bs);
> > +    if (!drv || !drv->bdrv_co_zone_mgmt) {
> > +        co.ret = -ENOTSUP;
> > +        goto out;
> > +    }
> > +    co.ret = drv->bdrv_co_zone_mgmt(bs, op, offset, len);
> > +out:
> > +    bdrv_dec_in_flight(bs);
> > +    return co.ret;
> > +}
> > +
> >  void *qemu_blockalign(BlockDriverState *bs, size_t size)
> >  {
> >      IO_CODE();
> > diff --git a/include/block/block-common.h b/include/block/block-common.h
> > index 36bd0e480e..5102fa6858 100644
> > --- a/include/block/block-common.h
> > +++ b/include/block/block-common.h
> > @@ -23,7 +23,6 @@
> >   */
> >  #ifndef BLOCK_COMMON_H
> >  #define BLOCK_COMMON_H
> > -
> >  #include "block/aio.h"
> >  #include "block/aio-wait.h"
> >  #include "qemu/iov.h"
> > diff --git a/include/block/block-io.h b/include/block/block-io.h
> > index fd25ffa9be..55ad261e16 100644
> > --- a/include/block/block-io.h
> > +++ b/include/block/block-io.h
> > @@ -88,6 +88,13 @@ int bdrv_co_ioctl(BlockDriverState *bs, int req, void *buf);
> >  /* Ensure contents are flushed to disk.  */
> >  int coroutine_fn bdrv_co_flush(BlockDriverState *bs);
> >
> > +/* Report zone information of zone block device. */
> > +int coroutine_fn bdrv_co_zone_report(BlockDriverState *bs, int64_t offset,
> > +                                     unsigned int *nr_zones,
> > +                                     BlockZoneDescriptor *zones);
> > +int coroutine_fn bdrv_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
> > +                                   int64_t offset, int64_t len);
> > +
> >  int bdrv_co_pdiscard(BdrvChild *child, int64_t offset, int64_t bytes);
> >  bool bdrv_can_write_zeroes_with_unmap(BlockDriverState *bs);
> >  int bdrv_block_status(BlockDriverState *bs, int64_t offset,
> > @@ -297,6 +304,12 @@ bdrv_readv_vmstate(BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos);
> >  int generated_co_wrapper
> >  bdrv_writev_vmstate(BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos);
> >
> > +int generated_co_wrapper
> > +blk_zone_report(BlockBackend *blk, int64_t offset, unsigned int *nr_zones,
> > +                BlockZoneDescriptor *zones);
> > +int generated_co_wrapper
> > +blk_zone_mgmt(BlockBackend *blk, BlockZoneOp op, int64_t offset, int64_t len);
> > +
> >  /**
> >   * bdrv_parent_drained_begin_single:
> >   *
> > diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
> > index 7f7863cc9e..de44c7b6f4 100644
> > --- a/include/block/block_int-common.h
> > +++ b/include/block/block_int-common.h
> > @@ -94,7 +94,6 @@ typedef struct BdrvTrackedRequest {
> >      struct BdrvTrackedRequest *waiting_for;
> >  } BdrvTrackedRequest;
> >
> > -
> >  struct BlockDriver {
> >      /*
> >       * These fields are initialized when this object is created,
> > @@ -691,6 +690,12 @@ struct BlockDriver {
> >                                            QEMUIOVector *qiov,
> >                                            int64_t pos);
> >
> > +    int coroutine_fn (*bdrv_co_zone_report)(BlockDriverState *bs,
> > +            int64_t offset, unsigned int *nr_zones,
> > +            BlockZoneDescriptor *zones);
> > +    int coroutine_fn (*bdrv_co_zone_mgmt)(BlockDriverState *bs, BlockZoneOp op,
> > +            int64_t offset, int64_t len);
> > +
> >      /* removable device specific */
> >      bool (*bdrv_is_inserted)(BlockDriverState *bs);
> >      void (*bdrv_eject)(BlockDriverState *bs, bool eject_flag);
> > @@ -828,6 +833,21 @@ typedef struct BlockLimits {
> >
> >      /* device zone model */
> >      BlockZoneModel zoned;
> > +
> > +    /* zone size expressed in 512-byte sectors */
> > +    uint32_t zone_sectors;
> > +
> > +    /* total number of zones */
> > +    unsigned int nr_zones;
> > +
> > +    /* maximum size in bytes of a zone append write operation */
> > +    int64_t zone_append_max_bytes;
> > +
> > +    /* maximum number of open zones */
> > +    int64_t max_open_zones;
> > +
> > +    /* maximum number of active zones */
> > +    int64_t max_active_zones;
> >  } BlockLimits;
> >
> >  typedef struct BdrvOpBlocker BdrvOpBlocker;
> > diff --git a/include/block/raw-aio.h b/include/block/raw-aio.h
> > index 21fc10c4c9..3d26929cdd 100644
> > --- a/include/block/raw-aio.h
> > +++ b/include/block/raw-aio.h
> > @@ -29,6 +29,8 @@
> >  #define QEMU_AIO_WRITE_ZEROES 0x0020
> >  #define QEMU_AIO_COPY_RANGE   0x0040
> >  #define QEMU_AIO_TRUNCATE     0x0080
> > +#define QEMU_AIO_ZONE_REPORT  0x0100
> > +#define QEMU_AIO_ZONE_MGMT    0x0200
> >  #define QEMU_AIO_TYPE_MASK \
> >          (QEMU_AIO_READ | \
> >           QEMU_AIO_WRITE | \
> > @@ -37,7 +39,9 @@
> >           QEMU_AIO_DISCARD | \
> >           QEMU_AIO_WRITE_ZEROES | \
> >           QEMU_AIO_COPY_RANGE | \
> > -         QEMU_AIO_TRUNCATE)
> > +         QEMU_AIO_TRUNCATE  | \
> > +         QEMU_AIO_ZONE_REPORT | \
> > +         QEMU_AIO_ZONE_MGMT)
> >
> >  /* AIO flags */
> >  #define QEMU_AIO_MISALIGNED   0x1000
> > diff --git a/include/sysemu/block-backend-io.h b/include/sysemu/block-backend-io.h
> > index 50f5aa2e07..6e7df1d93b 100644
> > --- a/include/sysemu/block-backend-io.h
> > +++ b/include/sysemu/block-backend-io.h
> > @@ -156,6 +156,12 @@ int generated_co_wrapper blk_pwrite_zeroes(BlockBackend *blk, int64_t offset,
> >  int coroutine_fn blk_co_pwrite_zeroes(BlockBackend *blk, int64_t offset,
> >                                        int64_t bytes, BdrvRequestFlags flags);
> >
> > +int coroutine_fn blk_co_zone_report(BlockBackend *blk, int64_t offset,
> > +                                    unsigned int *nr_zones,
> > +                                    BlockZoneDescriptor *zones);
> > +int coroutine_fn blk_co_zone_mgmt(BlockBackend *blk, BlockZoneOp op,
> > +                                  int64_t offset, int64_t len);
> > +
> >  int generated_co_wrapper blk_pdiscard(BlockBackend *blk, int64_t offset,
> >                                        int64_t bytes);
> >  int coroutine_fn blk_co_pdiscard(BlockBackend *blk, int64_t offset,
> > diff --git a/meson.build b/meson.build
> > index 294e9a8f32..c3219b0e87 100644
> > --- a/meson.build
> > +++ b/meson.build
> > @@ -1883,6 +1883,7 @@ config_host_data.set('CONFIG_REPLICATION', get_option('live_block_migration').al
> >  # has_header
> >  config_host_data.set('CONFIG_EPOLL', cc.has_header('sys/epoll.h'))
> >  config_host_data.set('CONFIG_LINUX_MAGIC_H', cc.has_header('linux/magic.h'))
> > +config_host_data.set('CONFIG_BLKZONED', cc.has_header('linux/blkzoned.h'))
> >  config_host_data.set('CONFIG_VALGRIND_H', cc.has_header('valgrind/valgrind.h'))
> >  config_host_data.set('HAVE_BTRFS_H', cc.has_header('linux/btrfs.h'))
> >  config_host_data.set('HAVE_DRM_H', cc.has_header('libdrm/drm.h'))
> > diff --git a/qapi/block-core.json b/qapi/block-core.json
> > index 2173e7734a..c6bbb7a037 100644
> > --- a/qapi/block-core.json
> > +++ b/qapi/block-core.json
> > @@ -2942,6 +2942,7 @@
> >  # @compress: Since 5.0
> >  # @copy-before-write: Since 6.2
> >  # @snapshot-access: Since 7.0
> > +# @zoned_host_device: Since 7.2
> >  #
> >  # Since: 2.9
> >  ##
> > @@ -2955,7 +2956,8 @@
> >              'luks', 'nbd', 'nfs', 'null-aio', 'null-co', 'nvme', 'parallels',
> >              'preallocate', 'qcow', 'qcow2', 'qed', 'quorum', 'raw', 'rbd',
> >              { 'name': 'replication', 'if': 'CONFIG_REPLICATION' },
> > -            'ssh', 'throttle', 'vdi', 'vhdx', 'vmdk', 'vpc', 'vvfat' ] }
> > +            'ssh', 'throttle', 'vdi', 'vhdx', 'vmdk', 'vpc', 'vvfat',
> > +            { 'name': 'zoned_host_device', 'if': 'CONFIG_BLKZONED' } ] }
> >
> >  ##
> >  # @BlockdevOptionsFile:
> > @@ -4329,7 +4331,9 @@
> >        'vhdx':       'BlockdevOptionsGenericFormat',
> >        'vmdk':       'BlockdevOptionsGenericCOWFormat',
> >        'vpc':        'BlockdevOptionsGenericFormat',
> > -      'vvfat':      'BlockdevOptionsVVFAT'
> > +      'vvfat':      'BlockdevOptionsVVFAT',
> > +      'zoned_host_device': { 'type': 'BlockdevOptionsFile',
> > +                             'if': 'CONFIG_BLKZONED' }
> >    } }
> >
> >  ##
> > diff --git a/qemu-io-cmds.c b/qemu-io-cmds.c
> > index 952dc940f1..687c3a624c 100644
> > --- a/qemu-io-cmds.c
> > +++ b/qemu-io-cmds.c
> > @@ -1712,6 +1712,144 @@ static const cmdinfo_t flush_cmd = {
> >      .oneline    = "flush all in-core file state to disk",
> >  };
> >
> > +static int zone_report_f(BlockBackend *blk, int argc, char **argv)
> > +{
> > +    int ret;
> > +    int64_t offset;
> > +    unsigned int nr_zones;
> > +
> > +    ++optind;
> > +    offset = cvtnum(argv[optind]);
> > +    ++optind;
> > +    nr_zones = cvtnum(argv[optind]);
> > +
> > +    g_autofree BlockZoneDescriptor *zones = NULL;
> > +    zones = g_new(BlockZoneDescriptor, nr_zones);
> > +    ret = blk_zone_report(blk, offset, &nr_zones, zones);
> > +    if (ret < 0) {
> > +        printf("zone report failed: %s\n", strerror(-ret));
> > +    } else {
> > +        for (int i = 0; i < nr_zones; ++i) {
> > +            printf("start: 0x%" PRIx64 ", len 0x%" PRIx64 ", "
> > +                   "cap"" 0x%" PRIx64 ",wptr 0x%" PRIx64 ", "
> > +                   "zcond:%u, [type: %u]\n",
> > +                   zones[i].start, zones[i].length, zones[i].cap, zones[i].wp,
> > +                   zones[i].cond, zones[i].type);
> > +        }
> > +    }
> > +    return ret;
> > +}
> > +
> > +static const cmdinfo_t zone_report_cmd = {
> > +        .name = "zone_report",
> > +        .altname = "zrp",
> > +        .cfunc = zone_report_f,
> > +        .argmin = 2,
> > +        .argmax = 2,
> > +        .args = "offset number",
> > +        .oneline = "report zone information",
> > +};
> > +
> > +static int zone_open_f(BlockBackend *blk, int argc, char **argv)
> > +{
> > +    int ret;
> > +    int64_t offset, len;
> > +    ++optind;
> > +    offset = cvtnum(argv[optind]);
> > +    ++optind;
> > +    len = cvtnum(argv[optind]);
> > +    ret = blk_zone_mgmt(blk, BLK_ZO_OPEN, offset, len);
> > +    if (ret < 0) {
> > +        printf("zone open failed: %s\n", strerror(-ret));
> > +    }
> > +    return ret;
> > +}
> > +
> > +static const cmdinfo_t zone_open_cmd = {
> > +        .name = "zone_open",
> > +        .altname = "zo",
> > +        .cfunc = zone_open_f,
> > +        .argmin = 2,
> > +        .argmax = 2,
> > +        .args = "offset len",
> > +        .oneline = "explicit open a range of zones in zone block device",
> > +};
> > +
> > +static int zone_close_f(BlockBackend *blk, int argc, char **argv)
> > +{
> > +    int ret;
> > +    int64_t offset, len;
> > +    ++optind;
> > +    offset = cvtnum(argv[optind]);
> > +    ++optind;
> > +    len = cvtnum(argv[optind]);
> > +    ret = blk_zone_mgmt(blk, BLK_ZO_CLOSE, offset, len);
> > +    if (ret < 0) {
> > +        printf("zone close failed: %s\n", strerror(-ret));
> > +    }
> > +    return ret;
> > +}
> > +
> > +static const cmdinfo_t zone_close_cmd = {
> > +        .name = "zone_close",
> > +        .altname = "zc",
> > +        .cfunc = zone_close_f,
> > +        .argmin = 2,
> > +        .argmax = 2,
> > +        .args = "offset len",
> > +        .oneline = "close a range of zones in zone block device",
> > +};
> > +
> > +static int zone_finish_f(BlockBackend *blk, int argc, char **argv)
> > +{
> > +    int ret;
> > +    int64_t offset, len;
> > +    ++optind;
> > +    offset = cvtnum(argv[optind]);
> > +    ++optind;
> > +    len = cvtnum(argv[optind]);
> > +    ret = blk_zone_mgmt(blk, BLK_ZO_FINISH, offset, len);
> > +    if (ret < 0) {
> > +        printf("zone finish failed: %s\n", strerror(-ret));
> > +    }
> > +    return ret;
> > +}
> > +
> > +static const cmdinfo_t zone_finish_cmd = {
> > +        .name = "zone_finish",
> > +        .altname = "zf",
> > +        .cfunc = zone_finish_f,
> > +        .argmin = 2,
> > +        .argmax = 2,
> > +        .args = "offset len",
> > +        .oneline = "finish a range of zones in zone block device",
> > +};
> > +
> > +static int zone_reset_f(BlockBackend *blk, int argc, char **argv)
> > +{
> > +    int ret;
> > +    int64_t offset, len;
> > +    ++optind;
> > +    offset = cvtnum(argv[optind]);
> > +    ++optind;
> > +    len = cvtnum(argv[optind]);
> > +    ret = blk_zone_mgmt(blk, BLK_ZO_RESET, offset, len);
> > +    if (ret < 0) {
> > +        printf("zone reset failed: %s\n", strerror(-ret));
> > +    }
> > +    return ret;
> > +}
> > +
> > +static const cmdinfo_t zone_reset_cmd = {
> > +        .name = "zone_reset",
> > +        .altname = "zrs",
> > +        .cfunc = zone_reset_f,
> > +        .argmin = 2,
> > +        .argmax = 2,
> > +        .args = "offset len",
> > +        .oneline = "reset a zone write pointer in zone block device",
> > +};
> > +
> >  static int truncate_f(BlockBackend *blk, int argc, char **argv);
> >  static const cmdinfo_t truncate_cmd = {
> >      .name       = "truncate",
> > @@ -2504,6 +2642,11 @@ static void __attribute((constructor)) init_qemuio_commands(void)
> >      qemuio_add_command(&aio_write_cmd);
> >      qemuio_add_command(&aio_flush_cmd);
> >      qemuio_add_command(&flush_cmd);
> > +    qemuio_add_command(&zone_report_cmd);
> > +    qemuio_add_command(&zone_open_cmd);
> > +    qemuio_add_command(&zone_close_cmd);
> > +    qemuio_add_command(&zone_finish_cmd);
> > +    qemuio_add_command(&zone_reset_cmd);
> >      qemuio_add_command(&truncate_cmd);
> >      qemuio_add_command(&length_cmd);
> >      qemuio_add_command(&info_cmd);
>
>
> --
> Damien Le Moal
> Western Digital Research


^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2022-08-26 12:26 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2022-08-16  6:25 [PATCH v7 0/8] Add support for zoned device Sam Li
2022-08-16  6:25 ` [PATCH v7 1/8] include: add zoned device structs Sam Li
2022-08-16 17:27   ` Damien Le Moal
2022-08-16  6:25 ` [PATCH v7 2/8] file-posix: introduce get_sysfs_str_val for device zoned model Sam Li
2022-08-16 16:11   ` Sam Li
2022-08-16 17:32   ` Damien Le Moal
2022-08-22 23:05   ` Stefan Hajnoczi
2022-08-23  4:31     ` Sam Li
2022-08-16  6:25 ` [PATCH v7 3/8] file-posix: introduce get_sysfs_long_val for the long sysfs attribute Sam Li
2022-08-16 16:13   ` Sam Li
2022-08-16 17:35   ` Damien Le Moal
2022-08-16 17:53     ` Sam Li
2022-08-16 17:55       ` Damien Le Moal
2022-08-16  6:25 ` [PATCH v7 4/8] block: add block layer APIs resembling Linux ZonedBlockDevice ioctls Sam Li
2022-08-16 17:50   ` Damien Le Moal
2022-08-26 12:20     ` Sam Li
2022-08-23  0:49   ` Stefan Hajnoczi
2022-08-23  4:12     ` Sam Li
2022-08-23 12:40       ` Stefan Hajnoczi
2022-08-24 23:46       ` Damien Le Moal
2022-08-24 23:53         ` Damien Le Moal
2022-08-16  6:25 ` [PATCH v7 5/8] raw-format: add zone operations to pass through requests Sam Li
2022-08-16  6:25 ` [PATCH v7 6/8] config: add check to block layer Sam Li
2022-08-23  0:54   ` Stefan Hajnoczi
2022-08-23  4:25     ` Sam Li
2022-08-23 12:36       ` Stefan Hajnoczi
2022-08-16  6:25 ` [PATCH v7 7/8] qemu-iotests: test new zone operations Sam Li
2022-08-16  6:25 ` [PATCH v7 8/8] docs/zoned-storage: add zoned device documentation Sam Li

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).