* [PATCH v16 1/8] include: add zoned device structs
2023-03-10 10:23 [PATCH v16 0/8] Add support for zoned device Sam Li
@ 2023-03-10 10:23 ` Sam Li
2023-03-13 23:38 ` Dmitry Fomichev
2023-03-10 10:23 ` [PATCH v16 2/8] file-posix: introduce helper functions for sysfs attributes Sam Li
` (7 subsequent siblings)
8 siblings, 1 reply; 13+ messages in thread
From: Sam Li @ 2023-03-10 10:23 UTC
To: qemu-devel
Cc: Kevin Wolf, Paolo Bonzini, qemu-block, damien.lemoal, hare,
Stefan Hajnoczi, Marc-André Lureau, Fam Zheng,
Daniel P. Berrangé, dmitry.fomichev, Thomas Huth,
Hanna Reitz, Philippe Mathieu-Daudé, Sam Li
Signed-off-by: Sam Li <faithilikerun@gmail.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
---
include/block/block-common.h | 43 ++++++++++++++++++++++++++++++++++++
1 file changed, 43 insertions(+)
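
A note on how the descriptor fields relate to each other: the sketch below repeats the enum and struct definitions from this patch so it compiles on its own, and adds a hypothetical helper, zone_bytes_remaining(), that is not part of the series. It assumes wp is an absolute byte offset on the device (as in Linux's struct blk_zone, converted to bytes) and that writable space ends at start + cap rather than start + length.

```c
#include <stdint.h>

/* Local copies of the definitions added by this patch. */
typedef enum BlockZoneState {
    BLK_ZS_NOT_WP = 0x0,
    BLK_ZS_EMPTY = 0x1,
    BLK_ZS_IOPEN = 0x2,
    BLK_ZS_EOPEN = 0x3,
    BLK_ZS_CLOSED = 0x4,
    BLK_ZS_RDONLY = 0xD,
    BLK_ZS_FULL = 0xE,
    BLK_ZS_OFFLINE = 0xF,
} BlockZoneState;

typedef enum BlockZoneType {
    BLK_ZT_CONV = 0x1, /* Conventional random writes supported */
    BLK_ZT_SWR = 0x2,  /* Sequential writes required */
    BLK_ZT_SWP = 0x3,  /* Sequential writes preferred */
} BlockZoneType;

typedef struct BlockZoneDescriptor {
    uint64_t start;
    uint64_t length;
    uint64_t cap;
    uint64_t wp;
    BlockZoneType type;
    BlockZoneState state;
} BlockZoneDescriptor;

/*
 * Hypothetical helper (not in the patch): bytes that can still be written
 * sequentially at the write pointer. Returns 0 for zones without a usable
 * write pointer (conventional, full, read-only, offline).
 */
static uint64_t zone_bytes_remaining(const BlockZoneDescriptor *z)
{
    uint64_t end = z->start + z->cap;

    switch (z->state) {
    case BLK_ZS_NOT_WP:
    case BLK_ZS_RDONLY:
    case BLK_ZS_FULL:
    case BLK_ZS_OFFLINE:
        return 0;
    default:
        break;
    }
    /* Assumption: wp is an absolute byte offset within [start, end]. */
    if (z->wp < z->start || z->wp > end) {
        return 0;
    }
    return end - z->wp;
}
```

The point of the sketch is only that cap, not length, bounds the writable area; a zone whose capacity is smaller than its size has an unusable tail.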
diff --git a/include/block/block-common.h b/include/block/block-common.h
index b5122ef8ab..1576fcf2ed 100644
--- a/include/block/block-common.h
+++ b/include/block/block-common.h
@@ -75,6 +75,49 @@ typedef struct BlockDriver BlockDriver;
typedef struct BdrvChild BdrvChild;
typedef struct BdrvChildClass BdrvChildClass;
+typedef enum BlockZoneOp {
+ BLK_ZO_OPEN,
+ BLK_ZO_CLOSE,
+ BLK_ZO_FINISH,
+ BLK_ZO_RESET,
+} BlockZoneOp;
+
+typedef enum BlockZoneModel {
+ BLK_Z_NONE = 0x0, /* Regular block device */
+ BLK_Z_HM = 0x1, /* Host-managed zoned block device */
+ BLK_Z_HA = 0x2, /* Host-aware zoned block device */
+} BlockZoneModel;
+
+typedef enum BlockZoneState {
+ BLK_ZS_NOT_WP = 0x0,
+ BLK_ZS_EMPTY = 0x1,
+ BLK_ZS_IOPEN = 0x2,
+ BLK_ZS_EOPEN = 0x3,
+ BLK_ZS_CLOSED = 0x4,
+ BLK_ZS_RDONLY = 0xD,
+ BLK_ZS_FULL = 0xE,
+ BLK_ZS_OFFLINE = 0xF,
+} BlockZoneState;
+
+typedef enum BlockZoneType {
+ BLK_ZT_CONV = 0x1, /* Conventional random writes supported */
+ BLK_ZT_SWR = 0x2, /* Sequential writes required */
+ BLK_ZT_SWP = 0x3, /* Sequential writes preferred */
+} BlockZoneType;
+
+/*
+ * Zone descriptor data structure.
+ * Provides information on a zone with all position and size values in bytes.
+ */
+typedef struct BlockZoneDescriptor {
+ uint64_t start;
+ uint64_t length;
+ uint64_t cap;
+ uint64_t wp;
+ BlockZoneType type;
+ BlockZoneState state;
+} BlockZoneDescriptor;
+
typedef struct BlockDriverInfo {
/* in bytes, 0 if irrelevant */
int cluster_size;
--
2.39.2
* Re: [PATCH v16 1/8] include: add zoned device structs
2023-03-10 10:23 ` [PATCH v16 1/8] include: add zoned device structs Sam Li
@ 2023-03-13 23:38 ` Dmitry Fomichev
0 siblings, 0 replies; 13+ messages in thread
From: Dmitry Fomichev @ 2023-03-13 23:38 UTC
To: faithilikerun@gmail.com, qemu-devel@nongnu.org
Cc: hreitz@redhat.com, hare@suse.de, philmd@linaro.org,
stefanha@redhat.com, fam@euphon.net, qemu-block@nongnu.org,
marcandre.lureau@redhat.com, kwolf@redhat.com, thuth@redhat.com,
pbonzini@redhat.com, berrange@redhat.com,
damien.lemoal@opensource.wdc.com
On Fri, 2023-03-10 at 18:23 +0800, Sam Li wrote:
> Signed-off-by: Sam Li <faithilikerun@gmail.com>
> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
> Reviewed-by: Hannes Reinecke <hare@suse.de>
Looks good to me.
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
> ---
> include/block/block-common.h | 43 ++++++++++++++++++++++++++++++++++++
> 1 file changed, 43 insertions(+)
>
> diff --git a/include/block/block-common.h b/include/block/block-common.h
> index b5122ef8ab..1576fcf2ed 100644
> --- a/include/block/block-common.h
> +++ b/include/block/block-common.h
> @@ -75,6 +75,49 @@ typedef struct BlockDriver BlockDriver;
> typedef struct BdrvChild BdrvChild;
> typedef struct BdrvChildClass BdrvChildClass;
>
> +typedef enum BlockZoneOp {
> + BLK_ZO_OPEN,
> + BLK_ZO_CLOSE,
> + BLK_ZO_FINISH,
> + BLK_ZO_RESET,
> +} BlockZoneOp;
> +
> +typedef enum BlockZoneModel {
> + BLK_Z_NONE = 0x0, /* Regular block device */
> + BLK_Z_HM = 0x1, /* Host-managed zoned block device */
> + BLK_Z_HA = 0x2, /* Host-aware zoned block device */
> +} BlockZoneModel;
> +
> +typedef enum BlockZoneState {
> + BLK_ZS_NOT_WP = 0x0,
> + BLK_ZS_EMPTY = 0x1,
> + BLK_ZS_IOPEN = 0x2,
> + BLK_ZS_EOPEN = 0x3,
> + BLK_ZS_CLOSED = 0x4,
> + BLK_ZS_RDONLY = 0xD,
> + BLK_ZS_FULL = 0xE,
> + BLK_ZS_OFFLINE = 0xF,
> +} BlockZoneState;
> +
> +typedef enum BlockZoneType {
> + BLK_ZT_CONV = 0x1, /* Conventional random writes supported */
> + BLK_ZT_SWR = 0x2, /* Sequential writes required */
> + BLK_ZT_SWP = 0x3, /* Sequential writes preferred */
> +} BlockZoneType;
> +
> +/*
> + * Zone descriptor data structure.
> + * Provides information on a zone with all position and size values in bytes.
> + */
> +typedef struct BlockZoneDescriptor {
> + uint64_t start;
> + uint64_t length;
> + uint64_t cap;
> + uint64_t wp;
> + BlockZoneType type;
> + BlockZoneState state;
> +} BlockZoneDescriptor;
> +
> typedef struct BlockDriverInfo {
> /* in bytes, 0 if irrelevant */
> int cluster_size;
* [PATCH v16 2/8] file-posix: introduce helper functions for sysfs attributes
2023-03-10 10:23 [PATCH v16 0/8] Add support for zoned device Sam Li
2023-03-10 10:23 ` [PATCH v16 1/8] include: add zoned device structs Sam Li
@ 2023-03-10 10:23 ` Sam Li
2023-03-10 10:23 ` [PATCH v16 3/8] block: add block layer APIs resembling Linux ZonedBlockDevice ioctls Sam Li
` (6 subsequent siblings)
8 siblings, 0 replies; 13+ messages in thread
From: Sam Li @ 2023-03-10 10:23 UTC
To: qemu-devel
Cc: Kevin Wolf, Paolo Bonzini, qemu-block, damien.lemoal, hare,
Stefan Hajnoczi, Marc-André Lureau, Fam Zheng,
Daniel P. Berrangé, dmitry.fomichev, Thomas Huth,
Hanna Reitz, Philippe Mathieu-Daudé, Sam Li
Use get_sysfs_str_val() to read the device's zoned model from sysfs as a
string; get_sysfs_zoned_model() then converts it to QEMU's BlockZoneModel
type.
Use get_sysfs_long_val() to read zoned device attributes that are long
integers.
Signed-off-by: Sam Li <faithilikerun@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
---
block/file-posix.c | 122 ++++++++++++++++++++++---------
include/block/block_int-common.h | 3 +
2 files changed, 91 insertions(+), 34 deletions(-)
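
For readers who want to try the parsing logic outside QEMU, here is a minimal stand-alone sketch of the same idea. It is not the code in this patch: it uses stdio instead of g_file_get_contents(), takes a full path instead of building one from st_rdev, and the names strip_trailing_newline(), parse_zoned_model() and read_attr() are made up for illustration. Only the BlockZoneModel values and the three sysfs strings come from the series.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Local copy of the enum from patch 1/8. */
typedef enum BlockZoneModel {
    BLK_Z_NONE = 0x0, /* Regular block device */
    BLK_Z_HM = 0x1,   /* Host-managed zoned block device */
    BLK_Z_HA = 0x2,   /* Host-aware zoned block device */
} BlockZoneModel;

/* sysfs attribute files end with '\n'; drop it in place if present. */
static void strip_trailing_newline(char *s)
{
    size_t len = strlen(s);

    if (len > 0 && s[len - 1] == '\n') {
        s[len - 1] = '\0';
    }
}

/* Map the contents of queue/zoned to a model, -ENOTSUP if unknown. */
static int parse_zoned_model(const char *val, BlockZoneModel *model)
{
    if (strcmp(val, "host-managed") == 0) {
        *model = BLK_Z_HM;
    } else if (strcmp(val, "host-aware") == 0) {
        *model = BLK_Z_HA;
    } else if (strcmp(val, "none") == 0) {
        *model = BLK_Z_NONE;
    } else {
        return -ENOTSUP;
    }
    return 0;
}

/* Hypothetical reader: first line of the file at path, newline stripped. */
static int read_attr(const char *path, char *buf, size_t buflen)
{
    FILE *f = fopen(path, "r");

    if (!f) {
        return -errno;
    }
    if (!fgets(buf, (int)buflen, f)) {
        fclose(f);
        return -EIO;
    }
    fclose(f);
    strip_trailing_newline(buf);
    return 0;
}
```

A caller would pass something like /sys/dev/block/MAJOR:MINOR/queue/zoned to read_attr() and feed the result to parse_zoned_model(), falling back to BLK_Z_NONE on any error, which is what raw_refresh_limits() does in the diff below.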
diff --git a/block/file-posix.c b/block/file-posix.c
index 5760cf22d1..496edc644c 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -1202,64 +1202,112 @@ static int hdev_get_max_hw_transfer(int fd, struct stat *st)
#endif
}
-static int hdev_get_max_segments(int fd, struct stat *st)
-{
+/*
+ * Get a sysfs attribute value as character string.
+ */
+static int get_sysfs_str_val(struct stat *st, const char *attribute,
+ char **val) {
#ifdef CONFIG_LINUX
- char buf[32];
- const char *end;
- char *sysfspath = NULL;
+ g_autofree char *sysfspath = NULL;
int ret;
- int sysfd = -1;
- long max_segments;
+ size_t len;
- if (S_ISCHR(st->st_mode)) {
- if (ioctl(fd, SG_GET_SG_TABLESIZE, &ret) == 0) {
- return ret;
- }
+ if (!S_ISBLK(st->st_mode)) {
return -ENOTSUP;
}
- if (!S_ISBLK(st->st_mode)) {
- return -ENOTSUP;
+ sysfspath = g_strdup_printf("/sys/dev/block/%u:%u/queue/%s",
+ major(st->st_rdev), minor(st->st_rdev),
+ attribute);
+ ret = g_file_get_contents(sysfspath, val, &len, NULL);
+ if (!ret) { /* g_file_get_contents() returns FALSE on failure */
+ return -ENOENT;
}
- sysfspath = g_strdup_printf("/sys/dev/block/%u:%u/queue/max_segments",
- major(st->st_rdev), minor(st->st_rdev));
- sysfd = open(sysfspath, O_RDONLY);
- if (sysfd == -1) {
- ret = -errno;
- goto out;
+ /* The file ends with '\n'; strip it. */
+ char *p;
+ p = *val;
+ if (*(p + len - 1) == '\n') {
+ *(p + len - 1) = '\0';
}
- ret = RETRY_ON_EINTR(read(sysfd, buf, sizeof(buf) - 1));
+ return ret;
+#else
+ return -ENOTSUP;
+#endif
+}
+
+static int get_sysfs_zoned_model(struct stat *st, BlockZoneModel *zoned)
+{
+ g_autofree char *val = NULL;
+ int ret;
+
+ ret = get_sysfs_str_val(st, "zoned", &val);
if (ret < 0) {
- ret = -errno;
- goto out;
- } else if (ret == 0) {
- ret = -EIO;
- goto out;
+ return ret;
}
- buf[ret] = 0;
- /* The file is ended with '\n', pass 'end' to accept that. */
- ret = qemu_strtol(buf, &end, 10, &max_segments);
- if (ret == 0 && end && *end == '\n') {
- ret = max_segments;
+
+ if (strcmp(val, "host-managed") == 0) {
+ *zoned = BLK_Z_HM;
+ } else if (strcmp(val, "host-aware") == 0) {
+ *zoned = BLK_Z_HA;
+ } else if (strcmp(val, "none") == 0) {
+ *zoned = BLK_Z_NONE;
+ } else {
+ return -ENOTSUP;
+ }
+ return 0;
+}
+
+/*
+ * Get a sysfs attribute value as a long integer.
+ */
+static long get_sysfs_long_val(struct stat *st, const char *attribute)
+{
+#ifdef CONFIG_LINUX
+ g_autofree char *str = NULL;
+ const char *end;
+ long val;
+ int ret;
+
+ ret = get_sysfs_str_val(st, attribute, &str);
+ if (ret < 0) {
+ return ret;
}
-out:
- if (sysfd != -1) {
- close(sysfd);
+ /* get_sysfs_str_val() stripped the trailing '\n'; require a full parse. */
+ ret = qemu_strtol(str, &end, 10, &val);
+ if (ret == 0 && end && *end == '\0') {
+ ret = val;
}
- g_free(sysfspath);
return ret;
#else
return -ENOTSUP;
#endif
}
+static int hdev_get_max_segments(int fd, struct stat *st)
+{
+#ifdef CONFIG_LINUX
+ int ret;
+
+ if (S_ISCHR(st->st_mode)) {
+ if (ioctl(fd, SG_GET_SG_TABLESIZE, &ret) == 0) {
+ return ret;
+ }
+ return -ENOTSUP;
+ }
+ return get_sysfs_long_val(st, "max_segments");
+#else
+ return -ENOTSUP;
+#endif
+}
+
static void raw_refresh_limits(BlockDriverState *bs, Error **errp)
{
BDRVRawState *s = bs->opaque;
struct stat st;
+ int ret;
+ BlockZoneModel zoned;
s->needs_alignment = raw_needs_alignment(bs);
raw_probe_alignment(bs, s->fd, errp);
@@ -1297,6 +1345,12 @@ static void raw_refresh_limits(BlockDriverState *bs, Error **errp)
bs->bl.max_hw_iov = ret;
}
}
+
+ ret = get_sysfs_zoned_model(&st, &zoned);
+ if (ret < 0) {
+ zoned = BLK_Z_NONE;
+ }
+ bs->bl.zoned = zoned;
}
static int check_for_dasd(int fd)
diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
index d419017328..6d0f470626 100644
--- a/include/block/block_int-common.h
+++ b/include/block/block_int-common.h
@@ -855,6 +855,9 @@ typedef struct BlockLimits {
/* maximum number of iovec elements */
int max_iov;
+
+ /* device zone model */
+ BlockZoneModel zoned;
} BlockLimits;
typedef struct BdrvOpBlocker BdrvOpBlocker;
--
2.39.2
* [PATCH v16 3/8] block: add block layer APIs resembling Linux ZonedBlockDevice ioctls
2023-03-10 10:23 [PATCH v16 0/8] Add support for zoned device Sam Li
2023-03-10 10:23 ` [PATCH v16 1/8] include: add zoned device structs Sam Li
2023-03-10 10:23 ` [PATCH v16 2/8] file-posix: introduce helper functions for sysfs attributes Sam Li
@ 2023-03-10 10:23 ` Sam Li
2023-03-13 23:39 ` Dmitry Fomichev
2023-03-10 10:23 ` [PATCH v16 4/8] raw-format: add zone operations to pass through requests Sam Li
` (5 subsequent siblings)
8 siblings, 1 reply; 13+ messages in thread
From: Sam Li @ 2023-03-10 10:23 UTC
To: qemu-devel
Cc: Kevin Wolf, Paolo Bonzini, qemu-block, damien.lemoal, hare,
Stefan Hajnoczi, Marc-André Lureau, Fam Zheng,
Daniel P. Berrangé, dmitry.fomichev, Thomas Huth,
Hanna Reitz, Philippe Mathieu-Daudé, Sam Li
Add zoned device support to the host_device BlockDriver. It is presented only
for zoned host block devices. With zone management operations added to the
host_device BlockDriver, users can call the new block layer APIs: Report Zone
and the zone management operations (open, close, finish, reset, reset_all).
qemu-io uses the new APIs to issue zoned storage commands to the device:
zone_report(zrp), zone_open(zo), zone_close(zc), zone_reset(zrs),
zone_finish(zf).
For example, to test zone_report, use the following command:
$ ./build/qemu-io --image-opts -n driver=host_device,filename=/dev/nullb0 \
-c "zrp offset nr_zones"
Signed-off-by: Sam Li <faithilikerun@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: Kevin Wolf <kwolf@redhat.com>
---
block/block-backend.c | 133 +++++++++++++
block/file-posix.c | 309 +++++++++++++++++++++++++++++-
block/io.c | 41 ++++
include/block/block-io.h | 9 +
include/block/block_int-common.h | 21 ++
include/block/raw-aio.h | 6 +-
include/sysemu/block-backend-io.h | 18 ++
meson.build | 4 +
qemu-io-cmds.c | 149 ++++++++++++++
9 files changed, 687 insertions(+), 3 deletions(-)
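
As a reading aid for the range checks in raw_co_zone_mgmt() below, here is a stand-alone sketch of the same validation. The function names check_zone_mgmt_range() and bytes_to_sectors() are illustrative only and do not appear in the series. The sketch assumes, as the mask arithmetic in the patch does, that zone_size is a power of two.

```c
#include <errno.h>
#include <stdint.h>

#define SECTOR_BITS 9 /* 512-byte sectors, same value as BDRV_SECTOR_BITS */

/* Convert a byte count to 512-byte sectors. */
static int64_t bytes_to_sectors(int64_t bytes)
{
    return bytes >> SECTOR_BITS;
}

/*
 * Validate a zone management range given in bytes.
 * - offset must be zone-aligned;
 * - the range must not run past the device capacity;
 * - len must be zone-aligned unless the range ends exactly at capacity,
 *   which allows a smaller last zone.
 * Assumption: zone_size is a power of two, so zone_size - 1 is a mask.
 */
static int check_zone_mgmt_range(int64_t offset, int64_t len,
                                 int64_t zone_size, int64_t capacity)
{
    int64_t mask = zone_size - 1;

    if (offset & mask) {
        return -EINVAL;
    }
    if (offset + len > capacity) {
        return -EINVAL;
    }
    if (offset + len < capacity && (len & mask)) {
        return -EINVAL;
    }
    return 0;
}
```

With a 256 MiB zone size, a request such as "zo 0 268435456" in qemu-io passes these checks, while an offset of 4096 is rejected with -EINVAL before any ioctl is issued.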
diff --git a/block/block-backend.c b/block/block-backend.c
index 278b04ce69..f70b08e3f6 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -1806,6 +1806,139 @@ int coroutine_fn blk_co_flush(BlockBackend *blk)
return ret;
}
+static void coroutine_fn blk_aio_zone_report_entry(void *opaque)
+{
+ BlkAioEmAIOCB *acb = opaque;
+ BlkRwCo *rwco = &acb->rwco;
+
+ rwco->ret = blk_co_zone_report(rwco->blk, rwco->offset,
+ (unsigned int*)acb->bytes,rwco->iobuf);
+ blk_aio_complete(acb);
+}
+
+BlockAIOCB *blk_aio_zone_report(BlockBackend *blk, int64_t offset,
+ unsigned int *nr_zones,
+ BlockZoneDescriptor *zones,
+ BlockCompletionFunc *cb, void *opaque)
+{
+ BlkAioEmAIOCB *acb;
+ Coroutine *co;
+ IO_CODE();
+
+ blk_inc_in_flight(blk);
+ acb = blk_aio_get(&blk_aio_em_aiocb_info, blk, cb, opaque);
+ acb->rwco = (BlkRwCo) {
+ .blk = blk,
+ .offset = offset,
+ .iobuf = zones,
+ .ret = NOT_DONE,
+ };
+ acb->bytes = (int64_t)nr_zones;
+ acb->has_returned = false;
+
+ co = qemu_coroutine_create(blk_aio_zone_report_entry, acb);
+ aio_co_enter(blk_get_aio_context(blk), co);
+
+ acb->has_returned = true;
+ if (acb->rwco.ret != NOT_DONE) {
+ replay_bh_schedule_oneshot_event(blk_get_aio_context(blk),
+ blk_aio_complete_bh, acb);
+ }
+
+ return &acb->common;
+}
+
+static void coroutine_fn blk_aio_zone_mgmt_entry(void *opaque)
+{
+ BlkAioEmAIOCB *acb = opaque;
+ BlkRwCo *rwco = &acb->rwco;
+
+ rwco->ret = blk_co_zone_mgmt(rwco->blk, (BlockZoneOp)rwco->iobuf,
+ rwco->offset, acb->bytes);
+ blk_aio_complete(acb);
+}
+
+BlockAIOCB *blk_aio_zone_mgmt(BlockBackend *blk, BlockZoneOp op,
+ int64_t offset, int64_t len,
+ BlockCompletionFunc *cb, void *opaque) {
+ BlkAioEmAIOCB *acb;
+ Coroutine *co;
+ IO_CODE();
+
+ blk_inc_in_flight(blk);
+ acb = blk_aio_get(&blk_aio_em_aiocb_info, blk, cb, opaque);
+ acb->rwco = (BlkRwCo) {
+ .blk = blk,
+ .offset = offset,
+ .iobuf = (void *)op,
+ .ret = NOT_DONE,
+ };
+ acb->bytes = len;
+ acb->has_returned = false;
+
+ co = qemu_coroutine_create(blk_aio_zone_mgmt_entry, acb);
+ aio_co_enter(blk_get_aio_context(blk), co);
+
+ acb->has_returned = true;
+ if (acb->rwco.ret != NOT_DONE) {
+ replay_bh_schedule_oneshot_event(blk_get_aio_context(blk),
+ blk_aio_complete_bh, acb);
+ }
+
+ return &acb->common;
+}
+
+/*
+ * Send a zone_report command.
+ * offset is a byte offset from the start of the device. No alignment
+ * required for offset.
+ * nr_zones represents IN maximum and OUT actual.
+ */
+int coroutine_fn blk_co_zone_report(BlockBackend *blk, int64_t offset,
+ unsigned int *nr_zones,
+ BlockZoneDescriptor *zones)
+{
+ int ret;
+ IO_CODE();
+
+ blk_inc_in_flight(blk); /* increase before waiting */
+ blk_wait_while_drained(blk);
+ if (!blk_is_available(blk)) {
+ blk_dec_in_flight(blk);
+ return -ENOMEDIUM;
+ }
+ ret = bdrv_co_zone_report(blk_bs(blk), offset, nr_zones, zones);
+ blk_dec_in_flight(blk);
+ return ret;
+}
+
+/*
+ * Send a zone_management command.
+ * op is the zone operation;
+ * offset is the byte offset from the start of the zoned device;
+ * len is the maximum number of bytes the command should operate on. It
+ * should be aligned with the device zone size.
+ */
+int coroutine_fn blk_co_zone_mgmt(BlockBackend *blk, BlockZoneOp op,
+ int64_t offset, int64_t len)
+{
+ int ret;
+ IO_CODE();
+
+ blk_inc_in_flight(blk);
+ blk_wait_while_drained(blk);
+
+ ret = blk_check_byte_request(blk, offset, len);
+ if (ret < 0) {
+ blk_dec_in_flight(blk);
+ return ret;
+ }
+
+ ret = bdrv_co_zone_mgmt(blk_bs(blk), op, offset, len);
+ blk_dec_in_flight(blk);
+ return ret;
+}
+
void blk_drain(BlockBackend *blk)
{
BlockDriverState *bs = blk_bs(blk);
diff --git a/block/file-posix.c b/block/file-posix.c
index 496edc644c..df9b9f1e30 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -68,6 +68,9 @@
#include <sys/param.h>
#include <sys/syscall.h>
#include <sys/vfs.h>
+#if defined(CONFIG_BLKZONED)
+#include <linux/blkzoned.h>
+#endif
#include <linux/cdrom.h>
#include <linux/fd.h>
#include <linux/fs.h>
@@ -216,6 +219,13 @@ typedef struct RawPosixAIOData {
PreallocMode prealloc;
Error **errp;
} truncate;
+ struct {
+ unsigned int *nr_zones;
+ BlockZoneDescriptor *zones;
+ } zone_report;
+ struct {
+ unsigned long op;
+ } zone_mgmt;
};
} RawPosixAIOData;
@@ -1351,6 +1361,50 @@ static void raw_refresh_limits(BlockDriverState *bs, Error **errp)
zoned = BLK_Z_NONE;
}
bs->bl.zoned = zoned;
+ if (zoned != BLK_Z_NONE) {
+ /*
+ * The zoned device must at least have zone size and nr_zones fields.
+ */
+ ret = get_sysfs_long_val(&st, "chunk_sectors");
+ if (ret < 0) {
+ error_setg_errno(errp, -ret, "Unable to read chunk_sectors "
+ "sysfs attribute");
+ goto out;
+ } else if (!ret) {
+ error_setg(errp, "Read 0 from chunk_sectors sysfs attribute");
+ goto out;
+ }
+ bs->bl.zone_size = ret << BDRV_SECTOR_BITS;
+
+ ret = get_sysfs_long_val(&st, "nr_zones");
+ if (ret < 0) {
+ error_setg_errno(errp, -ret, "Unable to read nr_zones "
+ "sysfs attribute");
+ goto out;
+ } else if (!ret) {
+ error_setg(errp, "Read 0 from nr_zones sysfs attribute");
+ goto out;
+ }
+ bs->bl.nr_zones = ret;
+
+ ret = get_sysfs_long_val(&st, "zone_append_max_bytes");
+ if (ret > 0) {
+ bs->bl.max_append_sectors = ret >> BDRV_SECTOR_BITS;
+ }
+
+ ret = get_sysfs_long_val(&st, "max_open_zones");
+ if (ret >= 0) {
+ bs->bl.max_open_zones = ret;
+ }
+
+ ret = get_sysfs_long_val(&st, "max_active_zones");
+ if (ret >= 0) {
+ bs->bl.max_active_zones = ret;
+ }
+ return;
+ }
+out:
+ bs->bl.zoned = BLK_Z_NONE;
}
static int check_for_dasd(int fd)
@@ -1374,9 +1428,12 @@ static int hdev_probe_blocksizes(BlockDriverState *bs, BlockSizes *bsz)
BDRVRawState *s = bs->opaque;
int ret;
- /* If DASD, get blocksizes */
+ /* If DASD or zoned devices, get blocksizes */
if (check_for_dasd(s->fd) < 0) {
- return -ENOTSUP;
+ /* zoned devices are not DASD */
+ if (bs->bl.zoned == BLK_Z_NONE) {
+ return -ENOTSUP;
+ }
}
ret = probe_logical_blocksize(s->fd, &bsz->log);
if (ret < 0) {
@@ -1844,6 +1901,146 @@ static off_t copy_file_range(int in_fd, off_t *in_off, int out_fd,
}
#endif
+/*
+ * parse_zone - Fill a zone descriptor
+ */
+#if defined(CONFIG_BLKZONED)
+static inline int parse_zone(struct BlockZoneDescriptor *zone,
+ const struct blk_zone *blkz) {
+ zone->start = blkz->start << BDRV_SECTOR_BITS;
+ zone->length = blkz->len << BDRV_SECTOR_BITS;
+ zone->wp = blkz->wp << BDRV_SECTOR_BITS;
+
+#ifdef HAVE_BLK_ZONE_REP_CAPACITY
+ zone->cap = blkz->capacity << BDRV_SECTOR_BITS;
+#else
+ zone->cap = blkz->len << BDRV_SECTOR_BITS;
+#endif
+
+ switch (blkz->type) {
+ case BLK_ZONE_TYPE_SEQWRITE_REQ:
+ zone->type = BLK_ZT_SWR;
+ break;
+ case BLK_ZONE_TYPE_SEQWRITE_PREF:
+ zone->type = BLK_ZT_SWP;
+ break;
+ case BLK_ZONE_TYPE_CONVENTIONAL:
+ zone->type = BLK_ZT_CONV;
+ break;
+ default:
+ error_report("Unsupported zone type: 0x%x", blkz->type);
+ return -ENOTSUP;
+ }
+
+ switch (blkz->cond) {
+ case BLK_ZONE_COND_NOT_WP:
+ zone->state = BLK_ZS_NOT_WP;
+ break;
+ case BLK_ZONE_COND_EMPTY:
+ zone->state = BLK_ZS_EMPTY;
+ break;
+ case BLK_ZONE_COND_IMP_OPEN:
+ zone->state = BLK_ZS_IOPEN;
+ break;
+ case BLK_ZONE_COND_EXP_OPEN:
+ zone->state = BLK_ZS_EOPEN;
+ break;
+ case BLK_ZONE_COND_CLOSED:
+ zone->state = BLK_ZS_CLOSED;
+ break;
+ case BLK_ZONE_COND_READONLY:
+ zone->state = BLK_ZS_RDONLY;
+ break;
+ case BLK_ZONE_COND_FULL:
+ zone->state = BLK_ZS_FULL;
+ break;
+ case BLK_ZONE_COND_OFFLINE:
+ zone->state = BLK_ZS_OFFLINE;
+ break;
+ default:
+ error_report("Unsupported zone state: 0x%x", blkz->cond);
+ return -ENOTSUP;
+ }
+ return 0;
+}
+#endif
+
+#if defined(CONFIG_BLKZONED)
+static int handle_aiocb_zone_report(void *opaque)
+{
+ RawPosixAIOData *aiocb = opaque;
+ int fd = aiocb->aio_fildes;
+ unsigned int *nr_zones = aiocb->zone_report.nr_zones;
+ BlockZoneDescriptor *zones = aiocb->zone_report.zones;
+ /* zoned block devices use 512-byte sectors */
+ uint64_t sector = aiocb->aio_offset / 512;
+
+ struct blk_zone *blkz;
+ size_t rep_size;
+ unsigned int nrz;
+ int ret, n = 0, i = 0;
+
+ nrz = *nr_zones;
+ rep_size = sizeof(struct blk_zone_report) + nrz * sizeof(struct blk_zone);
+ g_autofree struct blk_zone_report *rep = NULL;
+ rep = g_malloc(rep_size);
+
+ blkz = (struct blk_zone *)(rep + 1);
+ while (n < nrz) {
+ memset(rep, 0, rep_size);
+ rep->sector = sector;
+ rep->nr_zones = nrz - n;
+
+ do {
+ ret = ioctl(fd, BLKREPORTZONE, rep);
+ } while (ret != 0 && errno == EINTR);
+ if (ret != 0) {
+ error_report("%d: ioctl BLKREPORTZONE at %" PRId64 " failed %d",
+ fd, sector, errno);
+ return -errno;
+ }
+
+ if (!rep->nr_zones) {
+ break;
+ }
+
+ for (i = 0; i < rep->nr_zones; i++, n++) {
+ ret = parse_zone(&zones[n], &blkz[i]);
+ if (ret != 0) {
+ return ret;
+ }
+
+ /* The next report should start after the last zone reported */
+ sector = blkz[i].start + blkz[i].len;
+ }
+ }
+
+ *nr_zones = n;
+ return 0;
+}
+#endif
+
+#if defined(CONFIG_BLKZONED)
+static int handle_aiocb_zone_mgmt(void *opaque)
+{
+ RawPosixAIOData *aiocb = opaque;
+ int fd = aiocb->aio_fildes;
+ uint64_t sector = aiocb->aio_offset / 512;
+ int64_t nr_sectors = aiocb->aio_nbytes / 512;
+ struct blk_zone_range range;
+ int ret;
+
+ /* Execute the operation */
+ range.sector = sector;
+ range.nr_sectors = nr_sectors;
+ do {
+ ret = ioctl(fd, aiocb->zone_mgmt.op, &range);
+ } while (ret != 0 && errno == EINTR);
+
+ return ret;
+}
+#endif
+
static int handle_aiocb_copy_range(void *opaque)
{
RawPosixAIOData *aiocb = opaque;
@@ -3034,6 +3231,107 @@ static void raw_account_discard(BDRVRawState *s, uint64_t nbytes, int ret)
}
}
+/*
+ * zone report - Get a zoned block device's information in the form
+ * of an array of zone descriptors.
+ * zones is an array of zone descriptors to hold zone information on reply;
+ * offset can be any byte within the entire size of the device;
+ * nr_zones is the maximum number of zones the command should report.
+ */
+#if defined(CONFIG_BLKZONED)
+static int coroutine_fn raw_co_zone_report(BlockDriverState *bs, int64_t offset,
+ unsigned int *nr_zones,
+ BlockZoneDescriptor *zones) {
+ BDRVRawState *s = bs->opaque;
+ RawPosixAIOData acb;
+
+ acb = (RawPosixAIOData) {
+ .bs = bs,
+ .aio_fildes = s->fd,
+ .aio_type = QEMU_AIO_ZONE_REPORT,
+ .aio_offset = offset,
+ .zone_report = {
+ .nr_zones = nr_zones,
+ .zones = zones,
+ },
+ };
+
+ return raw_thread_pool_submit(bs, handle_aiocb_zone_report, &acb);
+}
+#endif
+
+/*
+ * zone management operations - Execute an operation on a zone
+ */
+#if defined(CONFIG_BLKZONED)
+static int coroutine_fn raw_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
+ int64_t offset, int64_t len) {
+ BDRVRawState *s = bs->opaque;
+ RawPosixAIOData acb;
+ int64_t zone_size, zone_size_mask;
+ const char *op_name;
+ unsigned long zo;
+ int ret;
+ int64_t capacity = bs->total_sectors << BDRV_SECTOR_BITS;
+
+ zone_size = bs->bl.zone_size;
+ zone_size_mask = zone_size - 1;
+ if (offset & zone_size_mask) {
+ error_report("sector offset %" PRId64 " is not aligned to zone size "
+ "%" PRId64 "", offset / 512, zone_size / 512);
+ return -EINVAL;
+ }
+
+ if (((offset + len) < capacity && len & zone_size_mask) ||
+ offset + len > capacity) {
+ error_report("number of sectors %" PRId64 " is not aligned to zone size"
+ " %" PRId64 "", len / 512, zone_size / 512);
+ return -EINVAL;
+ }
+
+ switch (op) {
+ case BLK_ZO_OPEN:
+ op_name = "BLKOPENZONE";
+ zo = BLKOPENZONE;
+ break;
+ case BLK_ZO_CLOSE:
+ op_name = "BLKCLOSEZONE";
+ zo = BLKCLOSEZONE;
+ break;
+ case BLK_ZO_FINISH:
+ op_name = "BLKFINISHZONE";
+ zo = BLKFINISHZONE;
+ break;
+ case BLK_ZO_RESET:
+ op_name = "BLKRESETZONE";
+ zo = BLKRESETZONE;
+ break;
+ default:
+ error_report("Unsupported zone op: 0x%x", op);
+ return -ENOTSUP;
+ }
+
+ acb = (RawPosixAIOData) {
+ .bs = bs,
+ .aio_fildes = s->fd,
+ .aio_type = QEMU_AIO_ZONE_MGMT,
+ .aio_offset = offset,
+ .aio_nbytes = len,
+ .zone_mgmt = {
+ .op = zo,
+ },
+ };
+
+ ret = raw_thread_pool_submit(bs, handle_aiocb_zone_mgmt, &acb);
+ if (ret != 0) {
+ ret = -errno;
+ error_report("ioctl %s failed %d", op_name, ret);
+ }
+
+ return ret;
+}
+#endif
+
static coroutine_fn int
raw_do_pdiscard(BlockDriverState *bs, int64_t offset, int64_t bytes,
bool blkdev)
@@ -3789,6 +4087,13 @@ static BlockDriver bdrv_host_device = {
#ifdef __linux__
.bdrv_co_ioctl = hdev_co_ioctl,
#endif
+
+ /* zoned device */
+#if defined(CONFIG_BLKZONED)
+ /* zone management operations */
+ .bdrv_co_zone_report = raw_co_zone_report,
+ .bdrv_co_zone_mgmt = raw_co_zone_mgmt,
+#endif
};
#if defined(__linux__) || defined(__FreeBSD__) || defined(__FreeBSD_kernel__)
diff --git a/block/io.c b/block/io.c
index 8974d46941..5dbf1e50f2 100644
--- a/block/io.c
+++ b/block/io.c
@@ -3111,6 +3111,47 @@ out:
return co.ret;
}
+int coroutine_fn bdrv_co_zone_report(BlockDriverState *bs, int64_t offset,
+ unsigned int *nr_zones,
+ BlockZoneDescriptor *zones)
+{
+ BlockDriver *drv = bs->drv;
+ CoroutineIOCompletion co = {
+ .coroutine = qemu_coroutine_self(),
+ };
+ IO_CODE();
+
+ bdrv_inc_in_flight(bs);
+ if (!drv || !drv->bdrv_co_zone_report || bs->bl.zoned == BLK_Z_NONE) {
+ co.ret = -ENOTSUP;
+ goto out;
+ }
+ co.ret = drv->bdrv_co_zone_report(bs, offset, nr_zones, zones);
+out:
+ bdrv_dec_in_flight(bs);
+ return co.ret;
+}
+
+int coroutine_fn bdrv_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
+ int64_t offset, int64_t len)
+{
+ BlockDriver *drv = bs->drv;
+ CoroutineIOCompletion co = {
+ .coroutine = qemu_coroutine_self(),
+ };
+ IO_CODE();
+
+ bdrv_inc_in_flight(bs);
+ if (!drv || !drv->bdrv_co_zone_mgmt || bs->bl.zoned == BLK_Z_NONE) {
+ co.ret = -ENOTSUP;
+ goto out;
+ }
+ co.ret = drv->bdrv_co_zone_mgmt(bs, op, offset, len);
+out:
+ bdrv_dec_in_flight(bs);
+ return co.ret;
+}
+
void *qemu_blockalign(BlockDriverState *bs, size_t size)
{
IO_CODE();
diff --git a/include/block/block-io.h b/include/block/block-io.h
index 5da99d4d60..19d1fad9cf 100644
--- a/include/block/block-io.h
+++ b/include/block/block-io.h
@@ -112,6 +112,15 @@ int coroutine_fn GRAPH_RDLOCK bdrv_co_flush(BlockDriverState *bs);
int coroutine_fn GRAPH_RDLOCK bdrv_co_pdiscard(BdrvChild *child, int64_t offset,
int64_t bytes);
+/* Report zone information of a zoned block device. */
+int coroutine_fn GRAPH_RDLOCK bdrv_co_zone_report(BlockDriverState *bs,
+ int64_t offset,
+ unsigned int *nr_zones,
+ BlockZoneDescriptor *zones);
+int coroutine_fn GRAPH_RDLOCK bdrv_co_zone_mgmt(BlockDriverState *bs,
+ BlockZoneOp op,
+ int64_t offset, int64_t len);
+
bool bdrv_can_write_zeroes_with_unmap(BlockDriverState *bs);
int bdrv_block_status(BlockDriverState *bs, int64_t offset,
int64_t bytes, int64_t *pnum, int64_t *map,
diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
index 6d0f470626..a3efb385e0 100644
--- a/include/block/block_int-common.h
+++ b/include/block/block_int-common.h
@@ -714,6 +714,12 @@ struct BlockDriver {
int coroutine_fn GRAPH_RDLOCK_PTR (*bdrv_co_load_vmstate)(
BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos);
+ int coroutine_fn (*bdrv_co_zone_report)(BlockDriverState *bs,
+ int64_t offset, unsigned int *nr_zones,
+ BlockZoneDescriptor *zones);
+ int coroutine_fn (*bdrv_co_zone_mgmt)(BlockDriverState *bs, BlockZoneOp op,
+ int64_t offset, int64_t len);
+
/* removable device specific */
bool coroutine_fn GRAPH_RDLOCK_PTR (*bdrv_co_is_inserted)(
BlockDriverState *bs);
@@ -858,6 +864,21 @@ typedef struct BlockLimits {
/* device zone model */
BlockZoneModel zoned;
+
+ /* zone size expressed in bytes */
+ uint32_t zone_size;
+
+ /* total number of zones */
+ uint32_t nr_zones;
+
+ /* maximum sectors of a zone append write operation */
+ int64_t max_append_sectors;
+
+ /* maximum number of open zones */
+ int64_t max_open_zones;
+
+ /* maximum number of active zones */
+ int64_t max_active_zones;
} BlockLimits;
typedef struct BdrvOpBlocker BdrvOpBlocker;
diff --git a/include/block/raw-aio.h b/include/block/raw-aio.h
index f8cda9df91..eda6a7a253 100644
--- a/include/block/raw-aio.h
+++ b/include/block/raw-aio.h
@@ -28,6 +28,8 @@
#define QEMU_AIO_WRITE_ZEROES 0x0020
#define QEMU_AIO_COPY_RANGE 0x0040
#define QEMU_AIO_TRUNCATE 0x0080
+#define QEMU_AIO_ZONE_REPORT 0x0100
+#define QEMU_AIO_ZONE_MGMT 0x0200
#define QEMU_AIO_TYPE_MASK \
(QEMU_AIO_READ | \
QEMU_AIO_WRITE | \
@@ -36,7 +38,9 @@
QEMU_AIO_DISCARD | \
QEMU_AIO_WRITE_ZEROES | \
QEMU_AIO_COPY_RANGE | \
- QEMU_AIO_TRUNCATE)
+ QEMU_AIO_TRUNCATE | \
+ QEMU_AIO_ZONE_REPORT | \
+ QEMU_AIO_ZONE_MGMT)
/* AIO flags */
#define QEMU_AIO_MISALIGNED 0x1000
diff --git a/include/sysemu/block-backend-io.h b/include/sysemu/block-backend-io.h
index 40ab178719..f575ab5b6b 100644
--- a/include/sysemu/block-backend-io.h
+++ b/include/sysemu/block-backend-io.h
@@ -46,6 +46,13 @@ BlockAIOCB *blk_aio_pwritev(BlockBackend *blk, int64_t offset,
BlockCompletionFunc *cb, void *opaque);
BlockAIOCB *blk_aio_flush(BlockBackend *blk,
BlockCompletionFunc *cb, void *opaque);
+BlockAIOCB *blk_aio_zone_report(BlockBackend *blk, int64_t offset,
+ unsigned int *nr_zones,
+ BlockZoneDescriptor *zones,
+ BlockCompletionFunc *cb, void *opaque);
+BlockAIOCB *blk_aio_zone_mgmt(BlockBackend *blk, BlockZoneOp op,
+ int64_t offset, int64_t len,
+ BlockCompletionFunc *cb, void *opaque);
BlockAIOCB *blk_aio_pdiscard(BlockBackend *blk, int64_t offset, int64_t bytes,
BlockCompletionFunc *cb, void *opaque);
void blk_aio_cancel_async(BlockAIOCB *acb);
@@ -184,6 +191,17 @@ int co_wrapper_mixed blk_pwrite_zeroes(BlockBackend *blk, int64_t offset,
int coroutine_fn blk_co_pwrite_zeroes(BlockBackend *blk, int64_t offset,
int64_t bytes, BdrvRequestFlags flags);
+int coroutine_fn blk_co_zone_report(BlockBackend *blk, int64_t offset,
+ unsigned int *nr_zones,
+ BlockZoneDescriptor *zones);
+int co_wrapper_mixed blk_zone_report(BlockBackend *blk, int64_t offset,
+ unsigned int *nr_zones,
+ BlockZoneDescriptor *zones);
+int coroutine_fn blk_co_zone_mgmt(BlockBackend *blk, BlockZoneOp op,
+ int64_t offset, int64_t len);
+int co_wrapper_mixed blk_zone_mgmt(BlockBackend *blk, BlockZoneOp op,
+ int64_t offset, int64_t len);
+
int co_wrapper_mixed blk_pdiscard(BlockBackend *blk, int64_t offset,
int64_t bytes);
int coroutine_fn blk_co_pdiscard(BlockBackend *blk, int64_t offset,
diff --git a/meson.build b/meson.build
index 6bcab8bf0d..2985135802 100644
--- a/meson.build
+++ b/meson.build
@@ -1962,6 +1962,7 @@ config_host_data.set('CONFIG_REPLICATION', get_option('replication').allowed())
# has_header
config_host_data.set('CONFIG_EPOLL', cc.has_header('sys/epoll.h'))
config_host_data.set('CONFIG_LINUX_MAGIC_H', cc.has_header('linux/magic.h'))
+config_host_data.set('CONFIG_BLKZONED', cc.has_header('linux/blkzoned.h'))
config_host_data.set('CONFIG_VALGRIND_H', cc.has_header('valgrind/valgrind.h'))
config_host_data.set('HAVE_BTRFS_H', cc.has_header('linux/btrfs.h'))
config_host_data.set('HAVE_DRM_H', cc.has_header('libdrm/drm.h'))
@@ -2048,6 +2049,9 @@ config_host_data.set('HAVE_SIGEV_NOTIFY_THREAD_ID',
config_host_data.set('HAVE_STRUCT_STAT_ST_ATIM',
cc.has_member('struct stat', 'st_atim',
prefix: '#include <sys/stat.h>'))
+config_host_data.set('HAVE_BLK_ZONE_REP_CAPACITY',
+ cc.has_member('struct blk_zone', 'capacity',
+ prefix: '#include <linux/blkzoned.h>'))
# has_type
config_host_data.set('CONFIG_IOVEC',
diff --git a/qemu-io-cmds.c b/qemu-io-cmds.c
index e7a02f5b99..f35ea627d7 100644
--- a/qemu-io-cmds.c
+++ b/qemu-io-cmds.c
@@ -1730,6 +1730,150 @@ static const cmdinfo_t flush_cmd = {
.oneline = "flush all in-core file state to disk",
};
+static inline int64_t tosector(int64_t bytes)
+{
+ return bytes >> BDRV_SECTOR_BITS;
+}
+
+static int zone_report_f(BlockBackend *blk, int argc, char **argv)
+{
+ int ret;
+ int64_t offset;
+ unsigned int nr_zones;
+
+ ++optind;
+ offset = cvtnum(argv[optind]);
+ ++optind;
+ nr_zones = cvtnum(argv[optind]);
+
+ g_autofree BlockZoneDescriptor *zones = NULL;
+ zones = g_new(BlockZoneDescriptor, nr_zones);
+ ret = blk_zone_report(blk, offset, &nr_zones, zones);
+ if (ret < 0) {
+ printf("zone report failed: %s\n", strerror(-ret));
+ } else {
+ for (int i = 0; i < nr_zones; ++i) {
+ printf("start: 0x%" PRIx64 ", len 0x%" PRIx64 ", "
+ "cap"" 0x%" PRIx64 ", wptr 0x%" PRIx64 ", "
+ "zcond:%u, [type: %u]\n",
+ tosector(zones[i].start), tosector(zones[i].length),
+ tosector(zones[i].cap), tosector(zones[i].wp),
+ zones[i].state, zones[i].type);
+ }
+ }
+ return ret;
+}
+
+static const cmdinfo_t zone_report_cmd = {
+ .name = "zone_report",
+ .altname = "zrp",
+ .cfunc = zone_report_f,
+ .argmin = 2,
+ .argmax = 2,
+ .args = "offset number",
+ .oneline = "report zone information",
+};
+
+static int zone_open_f(BlockBackend *blk, int argc, char **argv)
+{
+ int ret;
+ int64_t offset, len;
+ ++optind;
+ offset = cvtnum(argv[optind]);
+ ++optind;
+ len = cvtnum(argv[optind]);
+ ret = blk_zone_mgmt(blk, BLK_ZO_OPEN, offset, len);
+ if (ret < 0) {
+ printf("zone open failed: %s\n", strerror(-ret));
+ }
+ return ret;
+}
+
+static const cmdinfo_t zone_open_cmd = {
+ .name = "zone_open",
+ .altname = "zo",
+ .cfunc = zone_open_f,
+ .argmin = 2,
+ .argmax = 2,
+ .args = "offset len",
+ .oneline = "explicitly open a range of zones in zone block device",
+};
+
+static int zone_close_f(BlockBackend *blk, int argc, char **argv)
+{
+ int ret;
+ int64_t offset, len;
+ ++optind;
+ offset = cvtnum(argv[optind]);
+ ++optind;
+ len = cvtnum(argv[optind]);
+ ret = blk_zone_mgmt(blk, BLK_ZO_CLOSE, offset, len);
+ if (ret < 0) {
+ printf("zone close failed: %s\n", strerror(-ret));
+ }
+ return ret;
+}
+
+static const cmdinfo_t zone_close_cmd = {
+ .name = "zone_close",
+ .altname = "zc",
+ .cfunc = zone_close_f,
+ .argmin = 2,
+ .argmax = 2,
+ .args = "offset len",
+ .oneline = "close a range of zones in zone block device",
+};
+
+static int zone_finish_f(BlockBackend *blk, int argc, char **argv)
+{
+ int ret;
+ int64_t offset, len;
+ ++optind;
+ offset = cvtnum(argv[optind]);
+ ++optind;
+ len = cvtnum(argv[optind]);
+ ret = blk_zone_mgmt(blk, BLK_ZO_FINISH, offset, len);
+ if (ret < 0) {
+ printf("zone finish failed: %s\n", strerror(-ret));
+ }
+ return ret;
+}
+
+static const cmdinfo_t zone_finish_cmd = {
+ .name = "zone_finish",
+ .altname = "zf",
+ .cfunc = zone_finish_f,
+ .argmin = 2,
+ .argmax = 2,
+ .args = "offset len",
+ .oneline = "finish a range of zones in zone block device",
+};
+
+static int zone_reset_f(BlockBackend *blk, int argc, char **argv)
+{
+ int ret;
+ int64_t offset, len;
+ ++optind;
+ offset = cvtnum(argv[optind]);
+ ++optind;
+ len = cvtnum(argv[optind]);
+ ret = blk_zone_mgmt(blk, BLK_ZO_RESET, offset, len);
+ if (ret < 0) {
+ printf("zone reset failed: %s\n", strerror(-ret));
+ }
+ return ret;
+}
+
+static const cmdinfo_t zone_reset_cmd = {
+ .name = "zone_reset",
+ .altname = "zrs",
+ .cfunc = zone_reset_f,
+ .argmin = 2,
+ .argmax = 2,
+ .args = "offset len",
+ .oneline = "reset a zone write pointer in zone block device",
+};
+
static int truncate_f(BlockBackend *blk, int argc, char **argv);
static const cmdinfo_t truncate_cmd = {
.name = "truncate",
@@ -2523,6 +2667,11 @@ static void __attribute((constructor)) init_qemuio_commands(void)
qemuio_add_command(&aio_write_cmd);
qemuio_add_command(&aio_flush_cmd);
qemuio_add_command(&flush_cmd);
+ qemuio_add_command(&zone_report_cmd);
+ qemuio_add_command(&zone_open_cmd);
+ qemuio_add_command(&zone_close_cmd);
+ qemuio_add_command(&zone_finish_cmd);
+ qemuio_add_command(&zone_reset_cmd);
qemuio_add_command(&truncate_cmd);
qemuio_add_command(&length_cmd);
qemuio_add_command(&info_cmd);
--
2.39.2
^ permalink raw reply related [flat|nested] 13+ messages in thread
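The qemu-io zone_report command added above prints every descriptor field in 512-byte sectors, while the block layer carries them in bytes. The following is a minimal standalone sketch of that conversion and of the output line, not QEMU code. ZoneDesc, SECTOR_BITS and format_zone() are hypothetical stand-ins for BlockZoneDescriptor, BDRV_SECTOR_BITS and the printf() call in zone_report_f(), and the trailing newline is left out.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Assumption: mirrors BDRV_SECTOR_BITS, i.e. 512-byte sectors. */
#define SECTOR_BITS 9

/* Hypothetical stand-in for BlockZoneDescriptor; sizes and positions in bytes. */
typedef struct ZoneDesc {
    uint64_t start;
    uint64_t length;
    uint64_t cap;
    uint64_t wp;
    unsigned int state;
    unsigned int type;
} ZoneDesc;

/* Convert a byte count to 512-byte sectors, like tosector() in the patch. */
static inline uint64_t to_sector(uint64_t bytes)
{
    return bytes >> SECTOR_BITS;
}

/*
 * Format one zone in the layout used by the zrp command.
 * Returns the value of snprintf(), so callers can detect truncation.
 */
static int format_zone(char *buf, size_t size, const ZoneDesc *z)
{
    return snprintf(buf, size,
                    "start: 0x%" PRIx64 ", len 0x%" PRIx64 ", cap 0x%" PRIx64
                    ", wptr 0x%" PRIx64 ", zcond:%u, [type: %u]",
                    to_sector(z->start), to_sector(z->length),
                    to_sector(z->cap), to_sector(z->wp),
                    z->state, z->type);
}
```

With a zone that starts at sector 0x80000 and whose write pointer sits 4096 bytes into it, the wptr field comes out as 0x80008, because 4096 bytes are 8 sectors.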
* Re: [PATCH v16 3/8] block: add block layer APIs resembling Linux ZonedBlockDevice ioctls
2023-03-10 10:23 ` [PATCH v16 3/8] block: add block layer APIs resembling Linux ZonedBlockDevice ioctls Sam Li
@ 2023-03-13 23:39 ` Dmitry Fomichev
0 siblings, 0 replies; 13+ messages in thread
From: Dmitry Fomichev @ 2023-03-13 23:39 UTC (permalink / raw)
To: faithilikerun@gmail.com, qemu-devel@nongnu.org
Cc: hreitz@redhat.com, hare@suse.de, philmd@linaro.org,
stefanha@redhat.com, fam@euphon.net, qemu-block@nongnu.org,
marcandre.lureau@redhat.com, kwolf@redhat.com, thuth@redhat.com,
pbonzini@redhat.com, berrange@redhat.com,
damien.lemoal@opensource.wdc.com
On Fri, 2023-03-10 at 18:23 +0800, Sam Li wrote:
> Add a zoned device option to the host_device BlockDriver. It will be presented
> only for zoned host block devices. By adding zone management operations to the
> host_device BlockDriver, users can use the new block layer APIs
> including Report Zone and four zone management operations
> (open, close, finish, reset).
>
> qemu-io uses the new APIs to issue zoned storage commands to the device:
> zone_report(zrp), zone_open(zo), zone_close(zc), zone_reset(zrs),
> zone_finish(zf).
>
> For example, to test zone_report, use the following command:
> $ ./build/qemu-io --image-opts -n driver=host_device,filename=/dev/nullb0
> -c "zrp offset nr_zones"
>
> Signed-off-by: Sam Li <faithilikerun@gmail.com>
> Reviewed-by: Hannes Reinecke <hare@suse.de>
> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
LGTM,
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
> Acked-by: Kevin Wolf <kwolf@redhat.com>
> ---
> block/block-backend.c | 133 +++++++++++++
> block/file-posix.c | 309 +++++++++++++++++++++++++++++-
> block/io.c | 41 ++++
> include/block/block-io.h | 9 +
> include/block/block_int-common.h | 21 ++
> include/block/raw-aio.h | 6 +-
> include/sysemu/block-backend-io.h | 18 ++
> meson.build | 4 +
> qemu-io-cmds.c | 149 ++++++++++++++
> 9 files changed, 687 insertions(+), 3 deletions(-)
>
> diff --git a/block/block-backend.c b/block/block-backend.c
> index 278b04ce69..f70b08e3f6 100644
> --- a/block/block-backend.c
> +++ b/block/block-backend.c
> @@ -1806,6 +1806,139 @@ int coroutine_fn blk_co_flush(BlockBackend *blk)
> return ret;
> }
>
> +static void coroutine_fn blk_aio_zone_report_entry(void *opaque)
> +{
> + BlkAioEmAIOCB *acb = opaque;
> + BlkRwCo *rwco = &acb->rwco;
> +
> + rwco->ret = blk_co_zone_report(rwco->blk, rwco->offset,
> + (unsigned int *)acb->bytes, rwco->iobuf);
> + blk_aio_complete(acb);
> +}
> +
> +BlockAIOCB *blk_aio_zone_report(BlockBackend *blk, int64_t offset,
> + unsigned int *nr_zones,
> + BlockZoneDescriptor *zones,
> + BlockCompletionFunc *cb, void *opaque)
> +{
> + BlkAioEmAIOCB *acb;
> + Coroutine *co;
> + IO_CODE();
> +
> + blk_inc_in_flight(blk);
> + acb = blk_aio_get(&blk_aio_em_aiocb_info, blk, cb, opaque);
> + acb->rwco = (BlkRwCo) {
> + .blk = blk,
> + .offset = offset,
> + .iobuf = zones,
> + .ret = NOT_DONE,
> + };
> + acb->bytes = (int64_t)nr_zones;
> + acb->has_returned = false;
> +
> + co = qemu_coroutine_create(blk_aio_zone_report_entry, acb);
> + aio_co_enter(blk_get_aio_context(blk), co);
> +
> + acb->has_returned = true;
> + if (acb->rwco.ret != NOT_DONE) {
> + replay_bh_schedule_oneshot_event(blk_get_aio_context(blk),
> + blk_aio_complete_bh, acb);
> + }
> +
> + return &acb->common;
> +}
> +
> +static void coroutine_fn blk_aio_zone_mgmt_entry(void *opaque)
> +{
> + BlkAioEmAIOCB *acb = opaque;
> + BlkRwCo *rwco = &acb->rwco;
> +
> + rwco->ret = blk_co_zone_mgmt(rwco->blk, (BlockZoneOp)rwco->iobuf,
> + rwco->offset, acb->bytes);
> + blk_aio_complete(acb);
> +}
> +
> +BlockAIOCB *blk_aio_zone_mgmt(BlockBackend *blk, BlockZoneOp op,
> + int64_t offset, int64_t len,
> + BlockCompletionFunc *cb, void *opaque) {
> + BlkAioEmAIOCB *acb;
> + Coroutine *co;
> + IO_CODE();
> +
> + blk_inc_in_flight(blk);
> + acb = blk_aio_get(&blk_aio_em_aiocb_info, blk, cb, opaque);
> + acb->rwco = (BlkRwCo) {
> + .blk = blk,
> + .offset = offset,
> + .iobuf = (void *)op,
> + .ret = NOT_DONE,
> + };
> + acb->bytes = len;
> + acb->has_returned = false;
> +
> + co = qemu_coroutine_create(blk_aio_zone_mgmt_entry, acb);
> + aio_co_enter(blk_get_aio_context(blk), co);
> +
> + acb->has_returned = true;
> + if (acb->rwco.ret != NOT_DONE) {
> + replay_bh_schedule_oneshot_event(blk_get_aio_context(blk),
> + blk_aio_complete_bh, acb);
> + }
> +
> + return &acb->common;
> +}
> +
> +/*
> + * Send a zone_report command.
> + * offset is a byte offset from the start of the device. No alignment
> + * required for offset.
> + * nr_zones represents IN maximum and OUT actual.
> + */
> +int coroutine_fn blk_co_zone_report(BlockBackend *blk, int64_t offset,
> + unsigned int *nr_zones,
> + BlockZoneDescriptor *zones)
> +{
> + int ret;
> + IO_CODE();
> +
> + blk_inc_in_flight(blk); /* increase before waiting */
> + blk_wait_while_drained(blk);
> + if (!blk_is_available(blk)) {
> + blk_dec_in_flight(blk);
> + return -ENOMEDIUM;
> + }
> + ret = bdrv_co_zone_report(blk_bs(blk), offset, nr_zones, zones);
> + blk_dec_in_flight(blk);
> + return ret;
> +}
> +
> +/*
> + * Send a zone_management command.
> + * op is the zone operation;
> + * offset is the byte offset from the start of the zoned device;
> + * len is the maximum number of bytes the command should operate on. It
> + * should be aligned with the device zone size.
> + */
> +int coroutine_fn blk_co_zone_mgmt(BlockBackend *blk, BlockZoneOp op,
> + int64_t offset, int64_t len)
> +{
> + int ret;
> + IO_CODE();
> +
> + blk_inc_in_flight(blk);
> + blk_wait_while_drained(blk);
> +
> + ret = blk_check_byte_request(blk, offset, len);
> + if (ret < 0) {
> + blk_dec_in_flight(blk);
> + return ret;
> + }
> +
> + ret = bdrv_co_zone_mgmt(blk_bs(blk), op, offset, len);
> + blk_dec_in_flight(blk);
> + return ret;
> +}
> +
> void blk_drain(BlockBackend *blk)
> {
> BlockDriverState *bs = blk_bs(blk);
> diff --git a/block/file-posix.c b/block/file-posix.c
> index 496edc644c..df9b9f1e30 100644
> --- a/block/file-posix.c
> +++ b/block/file-posix.c
> @@ -68,6 +68,9 @@
> #include <sys/param.h>
> #include <sys/syscall.h>
> #include <sys/vfs.h>
> +#if defined(CONFIG_BLKZONED)
> +#include <linux/blkzoned.h>
> +#endif
> #include <linux/cdrom.h>
> #include <linux/fd.h>
> #include <linux/fs.h>
> @@ -216,6 +219,13 @@ typedef struct RawPosixAIOData {
> PreallocMode prealloc;
> Error **errp;
> } truncate;
> + struct {
> + unsigned int *nr_zones;
> + BlockZoneDescriptor *zones;
> + } zone_report;
> + struct {
> + unsigned long op;
> + } zone_mgmt;
> };
> } RawPosixAIOData;
>
> @@ -1351,6 +1361,50 @@ static void raw_refresh_limits(BlockDriverState *bs, Error **errp)
> zoned = BLK_Z_NONE;
> }
> bs->bl.zoned = zoned;
> + if (zoned != BLK_Z_NONE) {
> + /*
> + * The zoned device must at least have zone size and nr_zones fields.
> + */
> + ret = get_sysfs_long_val(&st, "chunk_sectors");
> + if (ret < 0) {
> + error_setg_errno(errp, -ret, "Unable to read chunk_sectors "
> + "sysfs attribute");
> + goto out;
> + } else if (!ret) {
> + error_setg(errp, "Read 0 from chunk_sectors sysfs attribute");
> + goto out;
> + }
> + bs->bl.zone_size = ret << BDRV_SECTOR_BITS;
> +
> + ret = get_sysfs_long_val(&st, "nr_zones");
> + if (ret < 0) {
> + error_setg_errno(errp, -ret, "Unable to read nr_zones "
> + "sysfs attribute");
> + goto out;
> + } else if (!ret) {
> + error_setg(errp, "Read 0 from nr_zones sysfs attribute");
> + goto out;
> + }
> + bs->bl.nr_zones = ret;
> +
> + ret = get_sysfs_long_val(&st, "zone_append_max_bytes");
> + if (ret > 0) {
> + bs->bl.max_append_sectors = ret >> BDRV_SECTOR_BITS;
> + }
> +
> + ret = get_sysfs_long_val(&st, "max_open_zones");
> + if (ret >= 0) {
> + bs->bl.max_open_zones = ret;
> + }
> +
> + ret = get_sysfs_long_val(&st, "max_active_zones");
> + if (ret >= 0) {
> + bs->bl.max_active_zones = ret;
> + }
> + return;
> + }
> +out:
> + bs->bl.zoned = BLK_Z_NONE;
> }
>
> static int check_for_dasd(int fd)
> @@ -1374,9 +1428,12 @@ static int hdev_probe_blocksizes(BlockDriverState *bs, BlockSizes *bsz)
> BDRVRawState *s = bs->opaque;
> int ret;
>
> - /* If DASD, get blocksizes */
> + /* If DASD or zoned devices, get blocksizes */
> if (check_for_dasd(s->fd) < 0) {
> - return -ENOTSUP;
> + /* zoned devices are not DASD */
> + if (bs->bl.zoned == BLK_Z_NONE) {
> + return -ENOTSUP;
> + }
> }
> ret = probe_logical_blocksize(s->fd, &bsz->log);
> if (ret < 0) {
> @@ -1844,6 +1901,146 @@ static off_t copy_file_range(int in_fd, off_t *in_off, int out_fd,
> }
> #endif
>
> +/*
> + * parse_zone - Fill a zone descriptor
> + */
> +#if defined(CONFIG_BLKZONED)
> +static inline int parse_zone(struct BlockZoneDescriptor *zone,
> + const struct blk_zone *blkz) {
> + zone->start = blkz->start << BDRV_SECTOR_BITS;
> + zone->length = blkz->len << BDRV_SECTOR_BITS;
> + zone->wp = blkz->wp << BDRV_SECTOR_BITS;
> +
> +#ifdef HAVE_BLK_ZONE_REP_CAPACITY
> + zone->cap = blkz->capacity << BDRV_SECTOR_BITS;
> +#else
> + zone->cap = blkz->len << BDRV_SECTOR_BITS;
> +#endif
> +
> + switch (blkz->type) {
> + case BLK_ZONE_TYPE_SEQWRITE_REQ:
> + zone->type = BLK_ZT_SWR;
> + break;
> + case BLK_ZONE_TYPE_SEQWRITE_PREF:
> + zone->type = BLK_ZT_SWP;
> + break;
> + case BLK_ZONE_TYPE_CONVENTIONAL:
> + zone->type = BLK_ZT_CONV;
> + break;
> + default:
> + error_report("Unsupported zone type: 0x%x", blkz->type);
> + return -ENOTSUP;
> + }
> +
> + switch (blkz->cond) {
> + case BLK_ZONE_COND_NOT_WP:
> + zone->state = BLK_ZS_NOT_WP;
> + break;
> + case BLK_ZONE_COND_EMPTY:
> + zone->state = BLK_ZS_EMPTY;
> + break;
> + case BLK_ZONE_COND_IMP_OPEN:
> + zone->state = BLK_ZS_IOPEN;
> + break;
> + case BLK_ZONE_COND_EXP_OPEN:
> + zone->state = BLK_ZS_EOPEN;
> + break;
> + case BLK_ZONE_COND_CLOSED:
> + zone->state = BLK_ZS_CLOSED;
> + break;
> + case BLK_ZONE_COND_READONLY:
> + zone->state = BLK_ZS_RDONLY;
> + break;
> + case BLK_ZONE_COND_FULL:
> + zone->state = BLK_ZS_FULL;
> + break;
> + case BLK_ZONE_COND_OFFLINE:
> + zone->state = BLK_ZS_OFFLINE;
> + break;
> + default:
> + error_report("Unsupported zone state: 0x%x", blkz->cond);
> + return -ENOTSUP;
> + }
> + return 0;
> +}
> +#endif
> +
> +#if defined(CONFIG_BLKZONED)
> +static int handle_aiocb_zone_report(void *opaque)
> +{
> + RawPosixAIOData *aiocb = opaque;
> + int fd = aiocb->aio_fildes;
> + unsigned int *nr_zones = aiocb->zone_report.nr_zones;
> + BlockZoneDescriptor *zones = aiocb->zone_report.zones;
> + /* zoned block devices use 512-byte sectors */
> + uint64_t sector = aiocb->aio_offset / 512;
> +
> + struct blk_zone *blkz;
> + size_t rep_size;
> + unsigned int nrz;
> + int ret, n = 0, i = 0;
> +
> + nrz = *nr_zones;
> + rep_size = sizeof(struct blk_zone_report) + nrz * sizeof(struct blk_zone);
> + g_autofree struct blk_zone_report *rep = NULL;
> + rep = g_malloc(rep_size);
> +
> + blkz = (struct blk_zone *)(rep + 1);
> + while (n < nrz) {
> + memset(rep, 0, rep_size);
> + rep->sector = sector;
> + rep->nr_zones = nrz - n;
> +
> + do {
> + ret = ioctl(fd, BLKREPORTZONE, rep);
> + } while (ret != 0 && errno == EINTR);
> + if (ret != 0) {
> + error_report("%d: ioctl BLKREPORTZONE at %" PRId64 " failed %d",
> + fd, sector, errno);
> + return -errno;
> + }
> +
> + if (!rep->nr_zones) {
> + break;
> + }
> +
> + for (i = 0; i < rep->nr_zones; i++, n++) {
> + ret = parse_zone(&zones[n], &blkz[i]);
> + if (ret != 0) {
> + return ret;
> + }
> +
> + /* The next report should start after the last zone reported */
> + sector = blkz[i].start + blkz[i].len;
> + }
> + }
> +
> + *nr_zones = n;
> + return 0;
> +}
> +#endif
> +
> +#if defined(CONFIG_BLKZONED)
> +static int handle_aiocb_zone_mgmt(void *opaque)
> +{
> + RawPosixAIOData *aiocb = opaque;
> + int fd = aiocb->aio_fildes;
> + uint64_t sector = aiocb->aio_offset / 512;
> + int64_t nr_sectors = aiocb->aio_nbytes / 512;
> + struct blk_zone_range range;
> + int ret;
> +
> + /* Execute the operation */
> + range.sector = sector;
> + range.nr_sectors = nr_sectors;
> + do {
> + ret = ioctl(fd, aiocb->zone_mgmt.op, &range);
> + } while (ret != 0 && errno == EINTR);
> +
> + return ret;
> +}
> +#endif
> +
> static int handle_aiocb_copy_range(void *opaque)
> {
> RawPosixAIOData *aiocb = opaque;
> @@ -3034,6 +3231,107 @@ static void raw_account_discard(BDRVRawState *s, uint64_t nbytes, int ret)
> }
> }
>
> +/*
> + * zone report - Get a zone block device's information in the form
> + * of an array of zone descriptors.
> + * zones is an array of zone descriptors to hold zone information on reply;
> + * offset can be any byte within the entire size of the device;
> + * nr_zones is the maximum number of zones the command should report.
> + */
> +#if defined(CONFIG_BLKZONED)
> +static int coroutine_fn raw_co_zone_report(BlockDriverState *bs, int64_t offset,
> + unsigned int *nr_zones,
> + BlockZoneDescriptor *zones) {
> + BDRVRawState *s = bs->opaque;
> + RawPosixAIOData acb;
> +
> + acb = (RawPosixAIOData) {
> + .bs = bs,
> + .aio_fildes = s->fd,
> + .aio_type = QEMU_AIO_ZONE_REPORT,
> + .aio_offset = offset,
> + .zone_report = {
> + .nr_zones = nr_zones,
> + .zones = zones,
> + },
> + };
> +
> + return raw_thread_pool_submit(bs, handle_aiocb_zone_report, &acb);
> +}
> +#endif
> +
> +/*
> + * zone management operations - Execute an operation on a zone
> + */
> +#if defined(CONFIG_BLKZONED)
> +static int coroutine_fn raw_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
> + int64_t offset, int64_t len) {
> + BDRVRawState *s = bs->opaque;
> + RawPosixAIOData acb;
> + int64_t zone_size, zone_size_mask;
> + const char *op_name;
> + unsigned long zo;
> + int ret;
> + int64_t capacity = bs->total_sectors << BDRV_SECTOR_BITS;
> +
> + zone_size = bs->bl.zone_size;
> + zone_size_mask = zone_size - 1;
> + if (offset & zone_size_mask) {
> + error_report("sector offset %" PRId64 " is not aligned to zone size "
> + "%" PRId64 "", offset / 512, zone_size / 512);
> + return -EINVAL;
> + }
> +
> + if (((offset + len) < capacity && len & zone_size_mask) ||
> + offset + len > capacity) {
> + error_report("number of sectors %" PRId64 " is not aligned to zone size"
> + " %" PRId64 "", len / 512, zone_size / 512);
> + return -EINVAL;
> + }
> +
> + switch (op) {
> + case BLK_ZO_OPEN:
> + op_name = "BLKOPENZONE";
> + zo = BLKOPENZONE;
> + break;
> + case BLK_ZO_CLOSE:
> + op_name = "BLKCLOSEZONE";
> + zo = BLKCLOSEZONE;
> + break;
> + case BLK_ZO_FINISH:
> + op_name = "BLKFINISHZONE";
> + zo = BLKFINISHZONE;
> + break;
> + case BLK_ZO_RESET:
> + op_name = "BLKRESETZONE";
> + zo = BLKRESETZONE;
> + break;
> + default:
> + error_report("Unsupported zone op: 0x%x", op);
> + return -ENOTSUP;
> + }
> +
> + acb = (RawPosixAIOData) {
> + .bs = bs,
> + .aio_fildes = s->fd,
> + .aio_type = QEMU_AIO_ZONE_MGMT,
> + .aio_offset = offset,
> + .aio_nbytes = len,
> + .zone_mgmt = {
> + .op = zo,
> + },
> + };
> +
> + ret = raw_thread_pool_submit(bs, handle_aiocb_zone_mgmt, &acb);
> + if (ret != 0) {
> + ret = -errno;
> + error_report("ioctl %s failed %d", op_name, ret);
> + }
> +
> + return ret;
> +}
> +#endif
> +
> static coroutine_fn int
> raw_do_pdiscard(BlockDriverState *bs, int64_t offset, int64_t bytes,
> bool blkdev)
> @@ -3789,6 +4087,13 @@ static BlockDriver bdrv_host_device = {
> #ifdef __linux__
> .bdrv_co_ioctl = hdev_co_ioctl,
> #endif
> +
> + /* zoned device */
> +#if defined(CONFIG_BLKZONED)
> + /* zone management operations */
> + .bdrv_co_zone_report = raw_co_zone_report,
> + .bdrv_co_zone_mgmt = raw_co_zone_mgmt,
> +#endif
> };
>
> #if defined(__linux__) || defined(__FreeBSD__) || defined(__FreeBSD_kernel__)
> diff --git a/block/io.c b/block/io.c
> index 8974d46941..5dbf1e50f2 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -3111,6 +3111,47 @@ out:
> return co.ret;
> }
>
> +int coroutine_fn bdrv_co_zone_report(BlockDriverState *bs, int64_t offset,
> + unsigned int *nr_zones,
> + BlockZoneDescriptor *zones)
> +{
> + BlockDriver *drv = bs->drv;
> + CoroutineIOCompletion co = {
> + .coroutine = qemu_coroutine_self(),
> + };
> + IO_CODE();
> +
> + bdrv_inc_in_flight(bs);
> + if (!drv || !drv->bdrv_co_zone_report || bs->bl.zoned == BLK_Z_NONE) {
> + co.ret = -ENOTSUP;
> + goto out;
> + }
> + co.ret = drv->bdrv_co_zone_report(bs, offset, nr_zones, zones);
> +out:
> + bdrv_dec_in_flight(bs);
> + return co.ret;
> +}
> +
> +int coroutine_fn bdrv_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
> + int64_t offset, int64_t len)
> +{
> + BlockDriver *drv = bs->drv;
> + CoroutineIOCompletion co = {
> + .coroutine = qemu_coroutine_self(),
> + };
> + IO_CODE();
> +
> + bdrv_inc_in_flight(bs);
> + if (!drv || !drv->bdrv_co_zone_mgmt || bs->bl.zoned == BLK_Z_NONE) {
> + co.ret = -ENOTSUP;
> + goto out;
> + }
> + co.ret = drv->bdrv_co_zone_mgmt(bs, op, offset, len);
> +out:
> + bdrv_dec_in_flight(bs);
> + return co.ret;
> +}
> +
> void *qemu_blockalign(BlockDriverState *bs, size_t size)
> {
> IO_CODE();
> diff --git a/include/block/block-io.h b/include/block/block-io.h
> index 5da99d4d60..19d1fad9cf 100644
> --- a/include/block/block-io.h
> +++ b/include/block/block-io.h
> @@ -112,6 +112,15 @@ int coroutine_fn GRAPH_RDLOCK bdrv_co_flush(BlockDriverState *bs);
> int coroutine_fn GRAPH_RDLOCK bdrv_co_pdiscard(BdrvChild *child, int64_t offset,
> int64_t bytes);
>
> +/* Report zone information of zone block device. */
> +int coroutine_fn GRAPH_RDLOCK bdrv_co_zone_report(BlockDriverState *bs,
> + int64_t offset,
> + unsigned int *nr_zones,
> + BlockZoneDescriptor *zones);
> +int coroutine_fn GRAPH_RDLOCK bdrv_co_zone_mgmt(BlockDriverState *bs,
> + BlockZoneOp op,
> + int64_t offset, int64_t len);
> +
> bool bdrv_can_write_zeroes_with_unmap(BlockDriverState *bs);
> int bdrv_block_status(BlockDriverState *bs, int64_t offset,
> int64_t bytes, int64_t *pnum, int64_t *map,
> diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
> index 6d0f470626..a3efb385e0 100644
> --- a/include/block/block_int-common.h
> +++ b/include/block/block_int-common.h
> @@ -714,6 +714,12 @@ struct BlockDriver {
> int coroutine_fn GRAPH_RDLOCK_PTR (*bdrv_co_load_vmstate)(
> BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos);
>
> + int coroutine_fn (*bdrv_co_zone_report)(BlockDriverState *bs,
> + int64_t offset, unsigned int *nr_zones,
> + BlockZoneDescriptor *zones);
> + int coroutine_fn (*bdrv_co_zone_mgmt)(BlockDriverState *bs, BlockZoneOp op,
> + int64_t offset, int64_t len);
> +
> /* removable device specific */
> bool coroutine_fn GRAPH_RDLOCK_PTR (*bdrv_co_is_inserted)(
> BlockDriverState *bs);
> @@ -858,6 +864,21 @@ typedef struct BlockLimits {
>
> /* device zone model */
> BlockZoneModel zoned;
> +
> + /* zone size expressed in bytes */
> + uint32_t zone_size;
> +
> + /* total number of zones */
> + uint32_t nr_zones;
> +
> + /* maximum sectors of a zone append write operation */
> + int64_t max_append_sectors;
> +
> + /* maximum number of open zones */
> + int64_t max_open_zones;
> +
> + /* maximum number of active zones */
> + int64_t max_active_zones;
> } BlockLimits;
>
> typedef struct BdrvOpBlocker BdrvOpBlocker;
> diff --git a/include/block/raw-aio.h b/include/block/raw-aio.h
> index f8cda9df91..eda6a7a253 100644
> --- a/include/block/raw-aio.h
> +++ b/include/block/raw-aio.h
> @@ -28,6 +28,8 @@
> #define QEMU_AIO_WRITE_ZEROES 0x0020
> #define QEMU_AIO_COPY_RANGE 0x0040
> #define QEMU_AIO_TRUNCATE 0x0080
> +#define QEMU_AIO_ZONE_REPORT 0x0100
> +#define QEMU_AIO_ZONE_MGMT 0x0200
> #define QEMU_AIO_TYPE_MASK \
> (QEMU_AIO_READ | \
> QEMU_AIO_WRITE | \
> @@ -36,7 +38,9 @@
> QEMU_AIO_DISCARD | \
> QEMU_AIO_WRITE_ZEROES | \
> QEMU_AIO_COPY_RANGE | \
> - QEMU_AIO_TRUNCATE)
> + QEMU_AIO_TRUNCATE | \
> + QEMU_AIO_ZONE_REPORT | \
> + QEMU_AIO_ZONE_MGMT)
>
> /* AIO flags */
> #define QEMU_AIO_MISALIGNED 0x1000
> diff --git a/include/sysemu/block-backend-io.h b/include/sysemu/block-backend-io.h
> index 40ab178719..f575ab5b6b 100644
> --- a/include/sysemu/block-backend-io.h
> +++ b/include/sysemu/block-backend-io.h
> @@ -46,6 +46,13 @@ BlockAIOCB *blk_aio_pwritev(BlockBackend *blk, int64_t offset,
> BlockCompletionFunc *cb, void *opaque);
> BlockAIOCB *blk_aio_flush(BlockBackend *blk,
> BlockCompletionFunc *cb, void *opaque);
> +BlockAIOCB *blk_aio_zone_report(BlockBackend *blk, int64_t offset,
> + unsigned int *nr_zones,
> + BlockZoneDescriptor *zones,
> + BlockCompletionFunc *cb, void *opaque);
> +BlockAIOCB *blk_aio_zone_mgmt(BlockBackend *blk, BlockZoneOp op,
> + int64_t offset, int64_t len,
> + BlockCompletionFunc *cb, void *opaque);
> BlockAIOCB *blk_aio_pdiscard(BlockBackend *blk, int64_t offset, int64_t bytes,
> BlockCompletionFunc *cb, void *opaque);
> void blk_aio_cancel_async(BlockAIOCB *acb);
> @@ -184,6 +191,17 @@ int co_wrapper_mixed blk_pwrite_zeroes(BlockBackend *blk, int64_t offset,
> int coroutine_fn blk_co_pwrite_zeroes(BlockBackend *blk, int64_t offset,
> int64_t bytes, BdrvRequestFlags flags);
>
> +int coroutine_fn blk_co_zone_report(BlockBackend *blk, int64_t offset,
> + unsigned int *nr_zones,
> + BlockZoneDescriptor *zones);
> +int co_wrapper_mixed blk_zone_report(BlockBackend *blk, int64_t offset,
> + unsigned int *nr_zones,
> + BlockZoneDescriptor *zones);
> +int coroutine_fn blk_co_zone_mgmt(BlockBackend *blk, BlockZoneOp op,
> + int64_t offset, int64_t len);
> +int co_wrapper_mixed blk_zone_mgmt(BlockBackend *blk, BlockZoneOp op,
> + int64_t offset, int64_t len);
> +
> int co_wrapper_mixed blk_pdiscard(BlockBackend *blk, int64_t offset,
> int64_t bytes);
> int coroutine_fn blk_co_pdiscard(BlockBackend *blk, int64_t offset,
> diff --git a/meson.build b/meson.build
> index 6bcab8bf0d..2985135802 100644
> --- a/meson.build
> +++ b/meson.build
> @@ -1962,6 +1962,7 @@ config_host_data.set('CONFIG_REPLICATION', get_option('replication').allowed())
> # has_header
> config_host_data.set('CONFIG_EPOLL', cc.has_header('sys/epoll.h'))
> config_host_data.set('CONFIG_LINUX_MAGIC_H', cc.has_header('linux/magic.h'))
> +config_host_data.set('CONFIG_BLKZONED', cc.has_header('linux/blkzoned.h'))
> config_host_data.set('CONFIG_VALGRIND_H', cc.has_header('valgrind/valgrind.h'))
> config_host_data.set('HAVE_BTRFS_H', cc.has_header('linux/btrfs.h'))
> config_host_data.set('HAVE_DRM_H', cc.has_header('libdrm/drm.h'))
> @@ -2048,6 +2049,9 @@ config_host_data.set('HAVE_SIGEV_NOTIFY_THREAD_ID',
> config_host_data.set('HAVE_STRUCT_STAT_ST_ATIM',
> cc.has_member('struct stat', 'st_atim',
> prefix: '#include <sys/stat.h>'))
> +config_host_data.set('HAVE_BLK_ZONE_REP_CAPACITY',
> + cc.has_member('struct blk_zone', 'capacity',
> + prefix: '#include <linux/blkzoned.h>'))
>
> # has_type
> config_host_data.set('CONFIG_IOVEC',
> diff --git a/qemu-io-cmds.c b/qemu-io-cmds.c
> index e7a02f5b99..f35ea627d7 100644
> --- a/qemu-io-cmds.c
> +++ b/qemu-io-cmds.c
> @@ -1730,6 +1730,150 @@ static const cmdinfo_t flush_cmd = {
> .oneline = "flush all in-core file state to disk",
> };
>
> +static inline int64_t tosector(int64_t bytes)
> +{
> + return bytes >> BDRV_SECTOR_BITS;
> +}
> +
> +static int zone_report_f(BlockBackend *blk, int argc, char **argv)
> +{
> + int ret;
> + int64_t offset;
> + unsigned int nr_zones;
> +
> + ++optind;
> + offset = cvtnum(argv[optind]);
> + ++optind;
> + nr_zones = cvtnum(argv[optind]);
> +
> + g_autofree BlockZoneDescriptor *zones = NULL;
> + zones = g_new(BlockZoneDescriptor, nr_zones);
> + ret = blk_zone_report(blk, offset, &nr_zones, zones);
> + if (ret < 0) {
> + printf("zone report failed: %s\n", strerror(-ret));
> + } else {
> + for (int i = 0; i < nr_zones; ++i) {
> + printf("start: 0x%" PRIx64 ", len 0x%" PRIx64 ", "
> + "cap"" 0x%" PRIx64 ", wptr 0x%" PRIx64 ", "
> + "zcond:%u, [type: %u]\n",
> + tosector(zones[i].start), tosector(zones[i].length),
> + tosector(zones[i].cap), tosector(zones[i].wp),
> + zones[i].state, zones[i].type);
> + }
> + }
> + return ret;
> +}
> +
> +static const cmdinfo_t zone_report_cmd = {
> + .name = "zone_report",
> + .altname = "zrp",
> + .cfunc = zone_report_f,
> + .argmin = 2,
> + .argmax = 2,
> + .args = "offset number",
> + .oneline = "report zone information",
> +};
> +
> +static int zone_open_f(BlockBackend *blk, int argc, char **argv)
> +{
> + int ret;
> + int64_t offset, len;
> + ++optind;
> + offset = cvtnum(argv[optind]);
> + ++optind;
> + len = cvtnum(argv[optind]);
> + ret = blk_zone_mgmt(blk, BLK_ZO_OPEN, offset, len);
> + if (ret < 0) {
> + printf("zone open failed: %s\n", strerror(-ret));
> + }
> + return ret;
> +}
> +
> +static const cmdinfo_t zone_open_cmd = {
> + .name = "zone_open",
> + .altname = "zo",
> + .cfunc = zone_open_f,
> + .argmin = 2,
> + .argmax = 2,
> + .args = "offset len",
> + .oneline = "explicitly open a range of zones in zone block device",
> +};
> +
> +static int zone_close_f(BlockBackend *blk, int argc, char **argv)
> +{
> + int ret;
> + int64_t offset, len;
> + ++optind;
> + offset = cvtnum(argv[optind]);
> + ++optind;
> + len = cvtnum(argv[optind]);
> + ret = blk_zone_mgmt(blk, BLK_ZO_CLOSE, offset, len);
> + if (ret < 0) {
> + printf("zone close failed: %s\n", strerror(-ret));
> + }
> + return ret;
> +}
> +
> +static const cmdinfo_t zone_close_cmd = {
> + .name = "zone_close",
> + .altname = "zc",
> + .cfunc = zone_close_f,
> + .argmin = 2,
> + .argmax = 2,
> + .args = "offset len",
> + .oneline = "close a range of zones in zone block device",
> +};
> +
> +static int zone_finish_f(BlockBackend *blk, int argc, char **argv)
> +{
> + int ret;
> + int64_t offset, len;
> + ++optind;
> + offset = cvtnum(argv[optind]);
> + ++optind;
> + len = cvtnum(argv[optind]);
> + ret = blk_zone_mgmt(blk, BLK_ZO_FINISH, offset, len);
> + if (ret < 0) {
> + printf("zone finish failed: %s\n", strerror(-ret));
> + }
> + return ret;
> +}
> +
> +static const cmdinfo_t zone_finish_cmd = {
> + .name = "zone_finish",
> + .altname = "zf",
> + .cfunc = zone_finish_f,
> + .argmin = 2,
> + .argmax = 2,
> + .args = "offset len",
> + .oneline = "finish a range of zones in zone block device",
> +};
> +
> +static int zone_reset_f(BlockBackend *blk, int argc, char **argv)
> +{
> + int ret;
> + int64_t offset, len;
> + ++optind;
> + offset = cvtnum(argv[optind]);
> + ++optind;
> + len = cvtnum(argv[optind]);
> + ret = blk_zone_mgmt(blk, BLK_ZO_RESET, offset, len);
> + if (ret < 0) {
> + printf("zone reset failed: %s\n", strerror(-ret));
> + }
> + return ret;
> +}
> +
> +static const cmdinfo_t zone_reset_cmd = {
> + .name = "zone_reset",
> + .altname = "zrs",
> + .cfunc = zone_reset_f,
> + .argmin = 2,
> + .argmax = 2,
> + .args = "offset len",
> + .oneline = "reset a zone write pointer in zone block device",
> +};
> +
> static int truncate_f(BlockBackend *blk, int argc, char **argv);
> static const cmdinfo_t truncate_cmd = {
> .name = "truncate",
> @@ -2523,6 +2667,11 @@ static void __attribute((constructor)) init_qemuio_commands(void)
> qemuio_add_command(&aio_write_cmd);
> qemuio_add_command(&aio_flush_cmd);
> qemuio_add_command(&flush_cmd);
> + qemuio_add_command(&zone_report_cmd);
> + qemuio_add_command(&zone_open_cmd);
> + qemuio_add_command(&zone_close_cmd);
> + qemuio_add_command(&zone_finish_cmd);
> + qemuio_add_command(&zone_reset_cmd);
> qemuio_add_command(&truncate_cmd);
> qemuio_add_command(&length_cmd);
> qemuio_add_command(&info_cmd);
^ permalink raw reply [flat|nested] 13+ messages in thread
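The range validation at the top of raw_co_zone_mgmt() in the quoted patch can be read as three rules: the start must sit on a zone boundary, the range must not run past the device, and only a range that ends exactly at the device capacity may have a length that is not a multiple of the zone size. Below is a minimal standalone sketch of those rules. zone_range_ok() is a hypothetical helper, not a QEMU function, and like the mask test in the patch it assumes the zone size is a nonzero power of two.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true if [offset, offset + len) is an acceptable range for a zone
 * management operation.
 * Assumptions: all values are in bytes, offset and len are non-negative,
 * and zone_size is a nonzero power of two (required by the mask test).
 */
static bool zone_range_ok(int64_t offset, int64_t len,
                          int64_t zone_size, int64_t capacity)
{
    int64_t mask = zone_size - 1;

    if (offset & mask) {
        return false;   /* start is not on a zone boundary */
    }
    if (offset + len > capacity) {
        return false;   /* range runs past the end of the device */
    }
    if (offset + len < capacity && (len & mask)) {
        return false;   /* unaligned length is only allowed at the very end */
    }
    return true;
}
```

The last rule is what lets an operation cover a final zone that is smaller than the others: its length is not a multiple of zone_size, but the range ends exactly at capacity.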
* [PATCH v16 4/8] raw-format: add zone operations to pass through requests
2023-03-10 10:23 [PATCH v16 0/8] Add support for zoned device Sam Li
` (2 preceding siblings ...)
2023-03-10 10:23 ` [PATCH v16 3/8] block: add block layer APIs resembling Linux ZonedBlockDevice ioctls Sam Li
@ 2023-03-10 10:23 ` Sam Li
2023-03-10 10:24 ` [PATCH v16 5/8] config: add check to block layer Sam Li
` (4 subsequent siblings)
8 siblings, 0 replies; 13+ messages in thread
From: Sam Li @ 2023-03-10 10:23 UTC (permalink / raw)
To: qemu-devel
Cc: Kevin Wolf, Paolo Bonzini, qemu-block, damien.lemoal, hare,
Stefan Hajnoczi, Marc-André Lureau, Fam Zheng,
Daniel P. Berrangé, dmitry.fomichev, Thomas Huth,
Hanna Reitz, Philippe Mathieu-Daudé, Sam Li
The raw-format driver usually sits on top of the file-posix driver, so it
needs to pass zone command requests through to it.
Signed-off-by: Sam Li <faithilikerun@gmail.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
---
block/raw-format.c | 17 +++++++++++++++++
1 file changed, 17 insertions(+)
diff --git a/block/raw-format.c b/block/raw-format.c
index 66783ed8e7..6e1b9394c8 100644
--- a/block/raw-format.c
+++ b/block/raw-format.c
@@ -317,6 +317,21 @@ raw_co_pdiscard(BlockDriverState *bs, int64_t offset, int64_t bytes)
return bdrv_co_pdiscard(bs->file, offset, bytes);
}
+static int coroutine_fn GRAPH_RDLOCK
+raw_co_zone_report(BlockDriverState *bs, int64_t offset,
+ unsigned int *nr_zones,
+ BlockZoneDescriptor *zones)
+{
+ return bdrv_co_zone_report(bs->file->bs, offset, nr_zones, zones);
+}
+
+static int coroutine_fn GRAPH_RDLOCK
+raw_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
+ int64_t offset, int64_t len)
+{
+ return bdrv_co_zone_mgmt(bs->file->bs, op, offset, len);
+}
+
static int64_t coroutine_fn GRAPH_RDLOCK
raw_co_getlength(BlockDriverState *bs)
{
@@ -617,6 +632,8 @@ BlockDriver bdrv_raw = {
.bdrv_co_pwritev = &raw_co_pwritev,
.bdrv_co_pwrite_zeroes = &raw_co_pwrite_zeroes,
.bdrv_co_pdiscard = &raw_co_pdiscard,
+ .bdrv_co_zone_report = &raw_co_zone_report,
+ .bdrv_co_zone_mgmt = &raw_co_zone_mgmt,
.bdrv_co_block_status = &raw_co_block_status,
.bdrv_co_copy_range_from = &raw_co_copy_range_from,
.bdrv_co_copy_range_to = &raw_co_copy_range_to,
--
2.39.2
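The pass-through pattern used by this patch can be sketched in isolation. What follows is a minimal, self-contained C illustration and not QEMU code: Node, ZoneOp, node_zone_mgmt(), format_zone_mgmt() and leaf_zone_mgmt() are hypothetical stand-ins for BlockDriverState, BlockZoneOp, bdrv_co_zone_mgmt() and the raw-format and file-posix callbacks, and coroutines and graph locking are left out entirely.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration; these are not QEMU's types. */
typedef enum { ZO_OPEN, ZO_CLOSE, ZO_FINISH, ZO_RESET } ZoneOp;

typedef struct Node Node;
struct Node {
    Node *child;            /* node below this one, NULL for a leaf */
    int (*zone_mgmt)(Node *n, ZoneOp op, int64_t offset, int64_t len);
    int last_op;            /* last operation seen by a leaf, -1 if none */
    int64_t last_offset;    /* byte offset of that operation */
};

/* Generic entry point: dispatch to the node's callback if it has one. */
int node_zone_mgmt(Node *n, ZoneOp op, int64_t offset, int64_t len)
{
    if (n == NULL || n->zone_mgmt == NULL) {
        return -ENOTSUP;
    }
    return n->zone_mgmt(n, op, offset, len);
}

/* Format-layer callback: adds nothing, forwards the request to the child. */
int format_zone_mgmt(Node *n, ZoneOp op, int64_t offset, int64_t len)
{
    return node_zone_mgmt(n->child, op, offset, len);
}

/* Leaf callback: records the request instead of issuing a real ioctl(). */
int leaf_zone_mgmt(Node *n, ZoneOp op, int64_t offset, int64_t len)
{
    (void)len;
    n->last_op = (int)op;
    n->last_offset = offset;
    return 0;
}
```

With a two-node graph (a format node whose child is a leaf), a reset issued at the format node reaches the leaf unchanged, while a node without a callback reports -ENOTSUP in this sketch.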
* [PATCH v16 5/8] config: add check to block layer
2023-03-10 10:23 [PATCH v16 0/8] Add support for zoned device Sam Li
` (3 preceding siblings ...)
2023-03-10 10:23 ` [PATCH v16 4/8] raw-format: add zone operations to pass through requests Sam Li
@ 2023-03-10 10:24 ` Sam Li
2023-03-10 10:24 ` [PATCH v16 6/8] qemu-iotests: test new zone operations Sam Li
` (3 subsequent siblings)
8 siblings, 0 replies; 13+ messages in thread
From: Sam Li @ 2023-03-10 10:24 UTC (permalink / raw)
To: qemu-devel
Cc: Kevin Wolf, Paolo Bonzini, qemu-block, damien.lemoal, hare,
Stefan Hajnoczi, Marc-André Lureau, Fam Zheng,
Daniel P. Berrangé, dmitry.fomichev, Thomas Huth,
Hanna Reitz, Philippe Mathieu-Daudé, Sam Li
Putting zoned/non-zoned BlockDrivers on top of each other is not
allowed.
Signed-off-by: Sam Li <faithilikerun@gmail.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
---
block.c | 19 +++++++++++++++++++
block/file-posix.c | 12 ++++++++++++
block/raw-format.c | 1 +
include/block/block_int-common.h | 5 +++++
4 files changed, 37 insertions(+)
diff --git a/block.c b/block.c
index 0dd604d0f6..4ebf7bbc90 100644
--- a/block.c
+++ b/block.c
@@ -7953,6 +7953,25 @@ void bdrv_add_child(BlockDriverState *parent_bs, BlockDriverState *child_bs,
return;
}
+ /*
+ * Non-zoned block drivers do not follow zoned storage constraints
+ * (i.e. sequential writes to zones). Refuse mixing zoned and non-zoned
+ * drivers in a graph.
+ */
+ if (!parent_bs->drv->supports_zoned_children &&
+ child_bs->bl.zoned == BLK_Z_HM) {
+ /*
+ * The host-aware model accepts both sequential and random writes,
+ * so mixing host-aware and non-zoned drivers is allowed: the
+ * host-aware device is then used as a regular device.
+ */
+ error_setg(errp, "Cannot add a %s child to a parent that %s",
+ child_bs->bl.zoned == BLK_Z_HM ? "zoned" : "non-zoned",
+ parent_bs->drv->supports_zoned_children ?
+ "supports zoned children" : "does not support zoned children");
+ return;
+ }
+
if (!QLIST_EMPTY(&child_bs->parents)) {
error_setg(errp, "The node %s already has a parent",
child_bs->node_name);
diff --git a/block/file-posix.c b/block/file-posix.c
index df9b9f1e30..2eceb250f1 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -776,6 +776,18 @@ static int raw_open_common(BlockDriverState *bs, QDict *options,
goto fail;
}
}
+#ifdef CONFIG_BLKZONED
+ /*
+ * The kernel page cache does not reliably work for writes to SWR zones
+ * of a zoned block device because it cannot guarantee the order of writes.
+ */
+ if ((bs->bl.zoned != BLK_Z_NONE) &&
+ (!(s->open_flags & O_DIRECT))) {
+ error_setg(errp, "The driver supports zoned devices, and it requires "
+ "cache.direct=on, which was not specified.");
+ return -EINVAL; /* No host kernel page cache */
+ }
+#endif
if (S_ISBLK(st.st_mode)) {
#ifdef __linux__
diff --git a/block/raw-format.c b/block/raw-format.c
index 6e1b9394c8..72e23e7b55 100644
--- a/block/raw-format.c
+++ b/block/raw-format.c
@@ -621,6 +621,7 @@ static void raw_child_perm(BlockDriverState *bs, BdrvChild *c,
BlockDriver bdrv_raw = {
.format_name = "raw",
.instance_size = sizeof(BDRVRawState),
+ .supports_zoned_children = true,
.bdrv_probe = &raw_probe,
.bdrv_reopen_prepare = &raw_reopen_prepare,
.bdrv_reopen_commit = &raw_reopen_commit,
diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
index a3efb385e0..1bd2aef4d5 100644
--- a/include/block/block_int-common.h
+++ b/include/block/block_int-common.h
@@ -137,6 +137,11 @@ struct BlockDriver {
*/
bool is_format;
+ /*
+ * Set to true if the BlockDriver supports zoned children.
+ */
+ bool supports_zoned_children;
+
/*
* Drivers not implementing bdrv_parse_filename nor bdrv_open should have
* this field set to true, except ones that are defined only by their
--
2.39.2
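The rule enforced by the new check in bdrv_add_child() reduces to a small predicate. The sketch below is a standalone illustration under stated assumptions: ZoneModel mirrors the values of BlockZoneModel from patch 1, and zoned_child_allowed() is a hypothetical helper name that does not exist in QEMU.

```c
#include <stdbool.h>

/* Values mirror BlockZoneModel from patch 1; the type name is hypothetical. */
typedef enum {
    Z_NONE = 0x0,   /* regular block device */
    Z_HM   = 0x1,   /* host-managed zoned block device */
    Z_HA   = 0x2,   /* host-aware zoned block device */
} ZoneModel;

/*
 * Return true if a child with the given zone model may be attached to a
 * parent. Only host-managed children need a parent that declares
 * supports_zoned_children; host-aware and regular children are accepted
 * by any parent.
 */
bool zoned_child_allowed(bool parent_supports_zoned_children,
                         ZoneModel child_model)
{
    return parent_supports_zoned_children || child_model != Z_HM;
}
```

Written this way, the only rejected combination is a host-managed child under a parent that has not set the flag, which matches the condition in the hunk above.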
* [PATCH v16 6/8] qemu-iotests: test new zone operations
2023-03-10 10:23 [PATCH v16 0/8] Add support for zoned device Sam Li
` (4 preceding siblings ...)
2023-03-10 10:24 ` [PATCH v16 5/8] config: add check to block layer Sam Li
@ 2023-03-10 10:24 ` Sam Li
2023-03-10 10:24 ` [PATCH v16 7/8] block: add some trace events for new block layer APIs Sam Li
` (2 subsequent siblings)
8 siblings, 0 replies; 13+ messages in thread
From: Sam Li @ 2023-03-10 10:24 UTC (permalink / raw)
To: qemu-devel
Cc: Kevin Wolf, Paolo Bonzini, qemu-block, damien.lemoal, hare,
Stefan Hajnoczi, Marc-André Lureau, Fam Zheng,
Daniel P. Berrangé, dmitry.fomichev, Thomas Huth,
Hanna Reitz, Philippe Mathieu-Daudé, Sam Li
New block layer APIs for zoned block devices have been added. Test them
as follows: run each zone operation on a newly created null_blk device
and check whether the logs show the correct zone information:
$ ./tests/qemu-iotests/tests/zoned.sh
Signed-off-by: Sam Li <faithilikerun@gmail.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
tests/qemu-iotests/tests/zoned.out | 53 ++++++++++++++++++
tests/qemu-iotests/tests/zoned.sh | 86 ++++++++++++++++++++++++++++++
2 files changed, 139 insertions(+)
create mode 100644 tests/qemu-iotests/tests/zoned.out
create mode 100755 tests/qemu-iotests/tests/zoned.sh
diff --git a/tests/qemu-iotests/tests/zoned.out b/tests/qemu-iotests/tests/zoned.out
new file mode 100644
index 0000000000..0c8f96deb9
--- /dev/null
+++ b/tests/qemu-iotests/tests/zoned.out
@@ -0,0 +1,53 @@
+QA output created by zoned.sh
+Testing a null_blk device:
+Simple cases: if the operations work
+(1) report the first zone:
+start: 0x0, len 0x80000, cap 0x80000, wptr 0x0, zcond:1, [type: 2]
+
+report the first 10 zones
+start: 0x0, len 0x80000, cap 0x80000, wptr 0x0, zcond:1, [type: 2]
+start: 0x80000, len 0x80000, cap 0x80000, wptr 0x80000, zcond:1, [type: 2]
+start: 0x100000, len 0x80000, cap 0x80000, wptr 0x100000, zcond:1, [type: 2]
+start: 0x180000, len 0x80000, cap 0x80000, wptr 0x180000, zcond:1, [type: 2]
+start: 0x200000, len 0x80000, cap 0x80000, wptr 0x200000, zcond:1, [type: 2]
+start: 0x280000, len 0x80000, cap 0x80000, wptr 0x280000, zcond:1, [type: 2]
+start: 0x300000, len 0x80000, cap 0x80000, wptr 0x300000, zcond:1, [type: 2]
+start: 0x380000, len 0x80000, cap 0x80000, wptr 0x380000, zcond:1, [type: 2]
+start: 0x400000, len 0x80000, cap 0x80000, wptr 0x400000, zcond:1, [type: 2]
+start: 0x480000, len 0x80000, cap 0x80000, wptr 0x480000, zcond:1, [type: 2]
+
+report the last zone:
+start: 0x1f380000, len 0x80000, cap 0x80000, wptr 0x1f380000, zcond:1, [type: 2]
+
+
+(2) opening the first zone
+report after:
+start: 0x0, len 0x80000, cap 0x80000, wptr 0x0, zcond:3, [type: 2]
+
+opening the second zone
+report after:
+start: 0x80000, len 0x80000, cap 0x80000, wptr 0x80000, zcond:3, [type: 2]
+
+opening the last zone
+report after:
+start: 0x1f380000, len 0x80000, cap 0x80000, wptr 0x1f380000, zcond:3, [type: 2]
+
+
+(3) closing the first zone
+report after:
+start: 0x0, len 0x80000, cap 0x80000, wptr 0x0, zcond:1, [type: 2]
+
+closing the last zone
+report after:
+start: 0x1f380000, len 0x80000, cap 0x80000, wptr 0x1f380000, zcond:1, [type: 2]
+
+
+(4) finishing the second zone
+After finishing a zone:
+start: 0x80000, len 0x80000, cap 0x80000, wptr 0x100000, zcond:14, [type: 2]
+
+
+(5) resetting the second zone
+After resetting a zone:
+start: 0x80000, len 0x80000, cap 0x80000, wptr 0x80000, zcond:1, [type: 2]
+*** done
diff --git a/tests/qemu-iotests/tests/zoned.sh b/tests/qemu-iotests/tests/zoned.sh
new file mode 100755
index 0000000000..9d7c15dde6
--- /dev/null
+++ b/tests/qemu-iotests/tests/zoned.sh
@@ -0,0 +1,86 @@
+#!/usr/bin/env bash
+#
+# Test zone management operations.
+#
+
+seq="$(basename $0)"
+echo "QA output created by $seq"
+status=1 # failure is the default!
+
+_cleanup()
+{
+ _cleanup_test_img
+ sudo rmmod null_blk
+}
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+# get standard environment, filters and checks
+. ./common.rc
+. ./common.filter
+. ./common.qemu
+
+# This test only runs on Linux hosts with raw image files.
+_supported_fmt raw
+_supported_proto file
+_supported_os Linux
+
+QEMU_IO="build/qemu-io"
+IMG="--image-opts -n driver=host_device,filename=/dev/nullb0"
+QEMU_IO_OPTIONS=$QEMU_IO_OPTIONS_NO_FMT
+
+echo "Testing a null_blk device:"
+echo "Simple cases: if the operations work"
+sudo modprobe null_blk nr_devices=1 zoned=1
+
+echo "(1) report the first zone:"
+sudo $QEMU_IO $IMG -c "zrp 0 1"
+echo
+echo "report the first 10 zones"
+sudo $QEMU_IO $IMG -c "zrp 0 10"
+echo
+echo "report the last zone:"
+sudo $QEMU_IO $IMG -c "zrp 0x3e70000000 2" # 0x3e70000000 / 512 = 0x1f380000
+echo
+echo
+echo "(2) opening the first zone"
+sudo $QEMU_IO $IMG -c "zo 0 268435456" # 268435456 / 512 = 524288
+echo "report after:"
+sudo $QEMU_IO $IMG -c "zrp 0 1"
+echo
+echo "opening the second zone"
+sudo $QEMU_IO $IMG -c "zo 268435456 268435456"
+echo "report after:"
+sudo $QEMU_IO $IMG -c "zrp 268435456 1"
+echo
+echo "opening the last zone"
+sudo $QEMU_IO $IMG -c "zo 0x3e70000000 268435456"
+echo "report after:"
+sudo $QEMU_IO $IMG -c "zrp 0x3e70000000 2"
+echo
+echo
+echo "(3) closing the first zone"
+sudo $QEMU_IO $IMG -c "zc 0 268435456"
+echo "report after:"
+sudo $QEMU_IO $IMG -c "zrp 0 1"
+echo
+echo "closing the last zone"
+sudo $QEMU_IO $IMG -c "zc 0x3e70000000 268435456"
+echo "report after:"
+sudo $QEMU_IO $IMG -c "zrp 0x3e70000000 2"
+echo
+echo
+echo "(4) finishing the second zone"
+sudo $QEMU_IO $IMG -c "zf 268435456 268435456"
+echo "After finishing a zone:"
+sudo $QEMU_IO $IMG -c "zrp 268435456 1"
+echo
+echo
+echo "(5) resetting the second zone"
+sudo $QEMU_IO $IMG -c "zrs 268435456 268435456"
+echo "After resetting a zone:"
+sudo $QEMU_IO $IMG -c "zrp 268435456 1"
+
+# success, all done
+echo "*** done"
+rm -f $seq.full
+status=0
--
2.39.2
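The zcond and type fields in zoned.out are printed as raw numbers. A small decoder makes the expected output easier to read. The sketch below takes its numeric values from the BlockZoneState and BlockZoneType enums in patch 1; the functions zone_state_name() and zone_type_name() are hypothetical helpers for illustration and are not part of the series.

```c
#include <stddef.h>

/* Map a zcond value, as printed by "zrp", to the BLK_ZS_* suffix. */
const char *zone_state_name(unsigned int zcond)
{
    switch (zcond) {
    case 0x0: return "NOT_WP";
    case 0x1: return "EMPTY";
    case 0x2: return "IOPEN";
    case 0x3: return "EOPEN";
    case 0x4: return "CLOSED";
    case 0xD: return "RDONLY";
    case 0xE: return "FULL";
    case 0xF: return "OFFLINE";
    default:  return "UNKNOWN";
    }
}

/* Map a zone type value, as printed by "zrp", to the BLK_ZT_* suffix. */
const char *zone_type_name(unsigned int type)
{
    switch (type) {
    case 0x1: return "CONV";
    case 0x2: return "SWR";
    case 0x3: return "SWP";
    default:  return "UNKNOWN";
    }
}
```

Read against the expected output, zcond:1 is an empty zone, zcond:3 an explicitly opened one, zcond:14 a full one after a finish, and type 2 a sequential-write-required zone.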
* [PATCH v16 7/8] block: add some trace events for new block layer APIs
2023-03-10 10:23 [PATCH v16 0/8] Add support for zoned device Sam Li
` (5 preceding siblings ...)
2023-03-10 10:24 ` [PATCH v16 6/8] qemu-iotests: test new zone operations Sam Li
@ 2023-03-10 10:24 ` Sam Li
2023-03-13 23:40 ` Dmitry Fomichev
2023-03-10 10:24 ` [PATCH v16 8/8] docs/zoned-storage: add zoned device documentation Sam Li
2023-03-16 17:57 ` [PATCH v16 0/8] Add support for zoned device Stefan Hajnoczi
8 siblings, 1 reply; 13+ messages in thread
From: Sam Li @ 2023-03-10 10:24 UTC (permalink / raw)
To: qemu-devel
Cc: Kevin Wolf, Paolo Bonzini, qemu-block, damien.lemoal, hare,
Stefan Hajnoczi, Marc-André Lureau, Fam Zheng,
Daniel P. Berrangé, dmitry.fomichev, Thomas Huth,
Hanna Reitz, Philippe Mathieu-Daudé, Sam Li
Signed-off-by: Sam Li <faithilikerun@gmail.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
block/file-posix.c | 3 +++
block/trace-events | 2 ++
2 files changed, 5 insertions(+)
diff --git a/block/file-posix.c b/block/file-posix.c
index 2eceb250f1..563acc76ae 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -3256,6 +3256,7 @@ static int coroutine_fn raw_co_zone_report(BlockDriverState *bs, int64_t offset,
BlockZoneDescriptor *zones) {
BDRVRawState *s = bs->opaque;
RawPosixAIOData acb;
+ trace_zbd_zone_report(bs, *nr_zones, offset >> BDRV_SECTOR_BITS);
acb = (RawPosixAIOData) {
.bs = bs,
@@ -3334,6 +3335,8 @@ static int coroutine_fn raw_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
},
};
+ trace_zbd_zone_mgmt(bs, op_name, offset >> BDRV_SECTOR_BITS,
+ len >> BDRV_SECTOR_BITS);
ret = raw_thread_pool_submit(bs, handle_aiocb_zone_mgmt, &acb);
if (ret != 0) {
ret = -errno;
diff --git a/block/trace-events b/block/trace-events
index 48dbf10c66..3f4e1d088a 100644
--- a/block/trace-events
+++ b/block/trace-events
@@ -209,6 +209,8 @@ file_FindEjectableOpticalMedia(const char *media) "Matching using %s"
file_setup_cdrom(const char *partition) "Using %s as optical disc"
file_hdev_is_sg(int type, int version) "SG device found: type=%d, version=%d"
file_flush_fdatasync_failed(int err) "errno %d"
+zbd_zone_report(void *bs, unsigned int nr_zones, int64_t sector) "bs %p report %d zones starting at sector offset 0x%" PRIx64 ""
+zbd_zone_mgmt(void *bs, const char *op_name, int64_t sector, int64_t len) "bs %p %s starts at sector offset 0x%" PRIx64 " over a range of 0x%" PRIx64 " sectors"
# ssh.c
sftp_error(const char *op, const char *ssh_err, int ssh_err_code, int sftp_err_code) "%s failed: %s (libssh error code: %d, sftp error code: %d)"
--
2.39.2
* Re: [PATCH v16 7/8] block: add some trace events for new block layer APIs
2023-03-10 10:24 ` [PATCH v16 7/8] block: add some trace events for new block layer APIs Sam Li
@ 2023-03-13 23:40 ` Dmitry Fomichev
0 siblings, 0 replies; 13+ messages in thread
From: Dmitry Fomichev @ 2023-03-13 23:40 UTC (permalink / raw)
To: faithilikerun@gmail.com, qemu-devel@nongnu.org
Cc: hreitz@redhat.com, hare@suse.de, philmd@linaro.org,
stefanha@redhat.com, fam@euphon.net, qemu-block@nongnu.org,
marcandre.lureau@redhat.com, kwolf@redhat.com, thuth@redhat.com,
pbonzini@redhat.com, berrange@redhat.com,
damien.lemoal@opensource.wdc.com
On Fri, 2023-03-10 at 18:24 +0800, Sam Li wrote:
> Signed-off-by: Sam Li <faithilikerun@gmail.com>
> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
With one small nit below,
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
> ---
> block/file-posix.c | 3 +++
> block/trace-events | 2 ++
> 2 files changed, 5 insertions(+)
>
> diff --git a/block/file-posix.c b/block/file-posix.c
> index 2eceb250f1..563acc76ae 100644
> --- a/block/file-posix.c
> +++ b/block/file-posix.c
> @@ -3256,6 +3256,7 @@ static int coroutine_fn
> raw_co_zone_report(BlockDriverState *bs, int64_t offset,
> BlockZoneDescriptor *zones) {
> BDRVRawState *s = bs->opaque;
> RawPosixAIOData acb;
> + trace_zbd_zone_report(bs, *nr_zones, offset >> BDRV_SECTOR_BITS);
The code in this function could be made a bit simpler -
BDRVRawState *s = bs->opaque;
RawPosixAIOData acb = (RawPosixAIOData) {
.bs = bs,
.aio_fildes = s->fd,
.aio_type = QEMU_AIO_ZONE_REPORT,
.aio_offset = offset,
.zone_report = {
.nr_zones = nr_zones,
.zones = zones,
},
};
trace_zbd_zone_report(bs, *nr_zones, offset >> BDRV_SECTOR_BITS);
return raw_thread_pool_submit(bs, handle_aiocb_zone_report, &acb);
>
> acb = (RawPosixAIOData) {
> .bs = bs,
> @@ -3334,6 +3335,8 @@ static int coroutine_fn raw_co_zone_mgmt(BlockDriverState
> *bs, BlockZoneOp op,
> },
> };
>
> + trace_zbd_zone_mgmt(bs, op_name, offset >> BDRV_SECTOR_BITS,
> + len >> BDRV_SECTOR_BITS);
> ret = raw_thread_pool_submit(bs, handle_aiocb_zone_mgmt, &acb);
> if (ret != 0) {
> ret = -errno;
> diff --git a/block/trace-events b/block/trace-events
> index 48dbf10c66..3f4e1d088a 100644
> --- a/block/trace-events
> +++ b/block/trace-events
> @@ -209,6 +209,8 @@ file_FindEjectableOpticalMedia(const char *media) "Matching
> using %s"
> file_setup_cdrom(const char *partition) "Using %s as optical disc"
> file_hdev_is_sg(int type, int version) "SG device found: type=%d, version=%d"
> file_flush_fdatasync_failed(int err) "errno %d"
> +zbd_zone_report(void *bs, unsigned int nr_zones, int64_t sector) "bs %p report
> %d zones starting at sector offset 0x%" PRIx64 ""
> +zbd_zone_mgmt(void *bs, const char *op_name, int64_t sector, int64_t len) "bs
> %p %s starts at sector offset 0x%" PRIx64 " over a range of 0x%" PRIx64 "
> sectors"
>
> # ssh.c
> sftp_error(const char *op, const char *ssh_err, int ssh_err_code, int
> sftp_err_code) "%s failed: %s (libssh error code: %d, sftp error code: %d)"
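Both trace points convert byte offsets to sectors before printing them in hexadecimal. The standalone sketch below reproduces that arithmetic outside QEMU. Assumptions: SECTOR_BITS is defined locally as 9 (512-byte sectors, the same value as BDRV_SECTOR_BITS), bytes_to_sectors() and format_zone_report() are hypothetical names, the message text is shortened, and %u is used for the unsigned zone count where the trace event uses %d.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define SECTOR_BITS 9   /* 512-byte sectors, same value as BDRV_SECTOR_BITS */

/* Convert a byte offset or length to a count of 512-byte sectors. */
int64_t bytes_to_sectors(int64_t bytes)
{
    return bytes >> SECTOR_BITS;
}

/*
 * Format a message similar to the zbd_zone_report trace event.
 * Returns the number of characters snprintf() would have written.
 */
int format_zone_report(char *buf, size_t size,
                       unsigned int nr_zones, int64_t offset)
{
    return snprintf(buf, size,
                    "report %u zones starting at sector offset 0x%" PRIx64,
                    nr_zones, (uint64_t)bytes_to_sectors(offset));
}
```

For the byte offset 0x3e70000000 used by the test script, this yields sector 0x1f380000, and a 268435456-byte zone length becomes 0x80000 sectors, which is the unit shown in zoned.out.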
* [PATCH v16 8/8] docs/zoned-storage: add zoned device documentation
2023-03-10 10:23 [PATCH v16 0/8] Add support for zoned device Sam Li
` (6 preceding siblings ...)
2023-03-10 10:24 ` [PATCH v16 7/8] block: add some trace events for new block layer APIs Sam Li
@ 2023-03-10 10:24 ` Sam Li
2023-03-16 17:57 ` [PATCH v16 0/8] Add support for zoned device Stefan Hajnoczi
8 siblings, 0 replies; 13+ messages in thread
From: Sam Li @ 2023-03-10 10:24 UTC (permalink / raw)
To: qemu-devel
Cc: Kevin Wolf, Paolo Bonzini, qemu-block, damien.lemoal, hare,
Stefan Hajnoczi, Marc-André Lureau, Fam Zheng,
Daniel P. Berrangé, dmitry.fomichev, Thomas Huth,
Hanna Reitz, Philippe Mathieu-Daudé, Sam Li
Add the documentation about the zoned device support to virtio-blk
emulation.
Signed-off-by: Sam Li <faithilikerun@gmail.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
---
docs/devel/zoned-storage.rst | 43 ++++++++++++++++++++++++++
docs/system/qemu-block-drivers.rst.inc | 6 ++++
2 files changed, 49 insertions(+)
create mode 100644 docs/devel/zoned-storage.rst
diff --git a/docs/devel/zoned-storage.rst b/docs/devel/zoned-storage.rst
new file mode 100644
index 0000000000..6a36133e51
--- /dev/null
+++ b/docs/devel/zoned-storage.rst
@@ -0,0 +1,43 @@
+=============
+zoned-storage
+=============
+
+Zoned Block Devices (ZBDs) divide the LBA space into block regions called zones
+that are larger than the LBA size. They can only allow sequential writes, which
+can reduce write amplification in SSDs, and potentially lead to higher
+throughput and increased capacity. More details about ZBDs can be found at:
+
+https://zonedstorage.io/docs/introduction/zoned-storage
+
+1. Block layer APIs for zoned storage
+-------------------------------------
+QEMU block layer supports three zoned storage models:
+- BLK_Z_HM: The host-managed zoned model only allows sequential write access
+to zones. It supports ZBD-specific I/O commands that can be used by a host to
+manage the zones of a device.
+- BLK_Z_HA: The host-aware zoned model allows random write operations in
+zones, making it backward compatible with regular block devices.
+- BLK_Z_NONE: The non-zoned model has no zones support. It includes both
+regular and drive-managed ZBD devices. ZBD-specific I/O commands are not
+supported.
+
+The block device information resides inside BlockDriverState. QEMU uses the
+BlockLimits struct (BlockDriverState::bl) that is continuously accessed by the
+block layer while processing I/O requests. A BlockBackend has a root pointer to
+a BlockDriverState graph (for example, raw format on top of file-posix). The
+zoned storage information can be propagated from the leaf BlockDriverState all
+the way up to the BlockBackend. If the zoned storage model in file-posix is
+set to BLK_Z_HM, then block drivers will declare support for zoned host devices.
+
+The block layer APIs support commands needed for zoned storage devices,
+including report zones, four zone operations, and zone append.
+
+2. Emulating zoned storage controllers
+--------------------------------------
+When the BlockBackend's BlockLimits model reports a zoned storage device, users
+like the virtio-blk emulation or the qemu-io-cmds.c utility can use block layer
+APIs for zoned storage emulation or testing.
+
+For example, to test zone_report on a null_blk device using qemu-io:
+$ path/to/qemu-io --image-opts -n driver=host_device,filename=/dev/nullb0
+-c "zrp offset nr_zones"
diff --git a/docs/system/qemu-block-drivers.rst.inc b/docs/system/qemu-block-drivers.rst.inc
index dfe5d2293d..105cb9679c 100644
--- a/docs/system/qemu-block-drivers.rst.inc
+++ b/docs/system/qemu-block-drivers.rst.inc
@@ -430,6 +430,12 @@ Hard disks
you may corrupt your host data (use the ``-snapshot`` command
line option or modify the device permissions accordingly).
+Zoned block devices
+ Zoned block devices can be passed through to the guest if the emulated storage
+ controller supports zoned storage. Use ``--blockdev host_device,
+ node-name=drive0,filename=/dev/nullb0,cache.direct=on`` to pass through
+ ``/dev/nullb0`` as ``drive0``.
+
Windows
^^^^^^^
--
2.39.2
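The sequential-write constraint described in the documentation can be modelled with a write pointer per zone. The sketch below is a toy model under stated assumptions, not QEMU or kernel code: Zone, zone_write(), zone_finish() and zone_reset() are hypothetical names, all values are in 512-byte sectors as in zoned.out, the state values are taken from BlockZoneState in patch 1, and returning -EIO for a misplaced write is an arbitrary choice made for the illustration.

```c
#include <errno.h>
#include <stdint.h>

/* State values taken from BlockZoneState in patch 1. */
enum { ZS_EMPTY = 0x1, ZS_IOPEN = 0x2, ZS_FULL = 0xE };

/* Toy model of one sequential-write-required zone, in 512-byte sectors. */
typedef struct {
    uint64_t start;   /* first sector of the zone */
    uint64_t len;     /* zone length in sectors */
    uint64_t wp;      /* write pointer: next sector that may be written */
    int cond;         /* one of the ZS_* values above */
} Zone;

/* A write is accepted only at the write pointer and within the zone. */
int zone_write(Zone *z, uint64_t offset, uint64_t nr_sectors)
{
    uint64_t end = z->start + z->len;

    if (offset != z->wp || nr_sectors > end - z->wp) {
        return -EIO;   /* error code chosen for this sketch only */
    }
    z->wp += nr_sectors;
    z->cond = (z->wp == end) ? ZS_FULL : ZS_IOPEN;
    return 0;
}

/* Finish: move the write pointer to the end of the zone and mark it full. */
void zone_finish(Zone *z)
{
    z->wp = z->start + z->len;
    z->cond = ZS_FULL;
}

/* Reset: rewind the write pointer to the zone start and mark it empty. */
void zone_reset(Zone *z)
{
    z->wp = z->start;
    z->cond = ZS_EMPTY;
}
```

Applied to the second zone of the null_blk device (start 0x80000, length 0x80000), finishing moves the write pointer to 0x100000 with condition 14 and resetting brings it back to 0x80000 with condition 1, matching cases (4) and (5) in zoned.out.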
* Re: [PATCH v16 0/8] Add support for zoned device
2023-03-10 10:23 [PATCH v16 0/8] Add support for zoned device Sam Li
` (7 preceding siblings ...)
2023-03-10 10:24 ` [PATCH v16 8/8] docs/zoned-storage: add zoned device documentation Sam Li
@ 2023-03-16 17:57 ` Stefan Hajnoczi
8 siblings, 0 replies; 13+ messages in thread
From: Stefan Hajnoczi @ 2023-03-16 17:57 UTC (permalink / raw)
To: Sam Li
Cc: qemu-devel, Kevin Wolf, Paolo Bonzini, qemu-block, damien.lemoal,
hare, Marc-André Lureau, Fam Zheng, Daniel P. Berrangé,
dmitry.fomichev, Thomas Huth, Hanna Reitz,
Philippe Mathieu-Daudé
On Fri, Mar 10, 2023 at 06:23:55PM +0800, Sam Li wrote:
> Zoned Block Devices (ZBDs) divide the LBA space into block regions called zones
> that are larger than the LBA size. They only allow sequential writes, which
> reduces write amplification in SSDs, leading to higher throughput and increased
> capacity. More details about ZBDs can be found at:
>
> https://zonedstorage.io/docs/introduction/zoned-storage
>
> The zoned device support aims to let guests (virtual machines) access zoned
> storage devices on the host (hypervisor) through a virtio-blk device. This
> involves extending QEMU's block layer and virtio-blk emulation code. In its
> current status, the virtio-blk device is not aware of ZBDs but the guest sees
> host-managed drives as regular drives that will run correctly under the most
> common write workloads.
>
> This patch series extends the block layer APIs with the minimum set of zoned
> commands that are necessary to support zoned devices. The commands are - Report
> Zones, four zone operations and Zone Append.
>
> There has been a debate on whether to introduce a new zoned_host_device BlockDriver
> specifically for zoned devices. In the end, it's been decided to stick to
> existing host_device BlockDriver interface by only adding new zoned operations
> inside it. The benefit of that is to avoid further changes - one example is
> command line syntax - to the applications like Libvirt using QEMU zoned
> emulation.
>
> It can be tested on a null_blk device using qemu-io or qemu-iotests. For
> example, to test zone report using qemu-io:
> $ path/to/qemu-io --image-opts -n driver=host_device,filename=/dev/nullb0
> -c "zrp offset nr_zones"
>
> v16:
> - update zoned_host device name to host_device [Stefan]
> - fix probing zoned device blocksizes [Stefan]
> - Use empty fields instead of changing struct size of BlkRwCo [Kevin, Stefan]
>
> v15:
> - drop zoned_host_device BlockDriver
> - add zoned device option to host_device driver instead of introducing a new
> zoned_host_device BlockDriver [Stefan]
>
> v14:
> - address Stefan's comments of probing block sizes
>
> v13:
> - add some tracing points for new zone APIs [Dmitry]
> - change error handling in zone_mgmt [Damien, Stefan]
>
> v12:
> - address review comments
> * drop BLK_ZO_RESET_ALL bit [Damien]
> * fix error messages, style, and typos[Damien, Hannes]
>
> v11:
> - address review comments
> * fix possible BLKZONED config compiling warnings [Stefan]
> * fix capacity field compiling warnings on older kernel [Stefan,Damien]
>
> v10:
> - address review comments
> * deal with the last small zone case in zone_mgmt operations [Damien]
> * handle the capacity field outdated in old kernel(before 5.9) [Damien]
> * use byte unit in block layer to be consistent with QEMU [Eric]
> * fix coding style related problems [Stefan]
>
> v9:
> - address review comments
> * specify units of zone commands requests [Stefan]
> * fix some error handling in file-posix [Stefan]
> * introduce zoned_host_device in the commit message [Markus]
>
> v8:
> - address review comments
> * solve patch conflicts and merge sysfs helper functions into one patch
> * add cache.direct=on check in config
>
> v7:
> - address review comments
> * modify sysfs attribute helper functions
> * move the input validation and error checking into raw_co_zone_* function
> * fix checks in config
>
> v6:
> - drop virtio-blk emulation changes
> - address Stefan's review comments
> * fix CONFIG_BLKZONED configs in related functions
> * replace reading fd by g_file_get_contents() in get_sysfs_str_val()
> * rewrite documentation for zoned storage
>
> v5:
> - add zoned storage emulation to virtio-blk device
> - add documentation for zoned storage
> - address review comments
> * fix qemu-iotests
> * fix check to block layer
> * modify interfaces of sysfs helper functions
> * rename zoned device structs according to QEMU styles
> * reorder patches
>
> v4:
> - add virtio-blk headers for zoned device
> - add configurations for zoned host device
> - add zone operations for raw-format
> - address review comments
> * fix memory leak bug in zone_report
> * add checks to block layers
> * fix qemu-iotests format
> * fix sysfs helper functions
>
> v3:
> - add helper functions to get sysfs attributes
> - address review comments
> * fix zone report bugs
> * fix the qemu-io code path
> * use thread pool to avoid blocking ioctl() calls
>
> v2:
> - add qemu-io sub-commands
> - address review comments
> * modify interfaces of APIs
>
> v1:
> - add block layer APIs resembling Linux ZonedBlockDevice ioctls
>
> Sam Li (8):
> include: add zoned device structs
> file-posix: introduce helper functions for sysfs attributes
> block: add block layer APIs resembling Linux ZonedBlockDevice ioctls
> raw-format: add zone operations to pass through requests
> config: add check to block layer
> qemu-iotests: test new zone operations
> block: add some trace events for new block layer APIs
> docs/zoned-storage: add zoned device documentation
>
> block.c | 19 ++
> block/block-backend.c | 133 ++++++++
> block/file-posix.c | 446 +++++++++++++++++++++++--
> block/io.c | 41 +++
> block/raw-format.c | 18 +
> block/trace-events | 2 +
> docs/devel/zoned-storage.rst | 43 +++
> docs/system/qemu-block-drivers.rst.inc | 6 +
> include/block/block-common.h | 43 +++
> include/block/block-io.h | 9 +
> include/block/block_int-common.h | 29 ++
> include/block/raw-aio.h | 6 +-
> include/sysemu/block-backend-io.h | 18 +
> meson.build | 4 +
> qemu-io-cmds.c | 149 +++++++++
> tests/qemu-iotests/tests/zoned.out | 53 +++
> tests/qemu-iotests/tests/zoned.sh | 86 +++++
> 17 files changed, 1068 insertions(+), 37 deletions(-)
> create mode 100644 docs/devel/zoned-storage.rst
> create mode 100644 tests/qemu-iotests/tests/zoned.out
> create mode 100755 tests/qemu-iotests/tests/zoned.sh
>
> --
> 2.39.2
>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>