From: Stefan Hajnoczi <stefanha@redhat.com>
To: "Daniel P. Berrangé" <berrange@redhat.com>
Cc: Stefan Hajnoczi <stefanha@gmail.com>,
Sam Li <faithilikerun@gmail.com>,
qemu-devel@nongnu.org, dmitry.fomichev@wdc.com,
Raphael Norwitz <raphael.norwitz@nutanix.com>,
"Michael S. Tsirkin" <mst@redhat.com>,
Kevin Wolf <kwolf@redhat.com>,
damien.lemoal@opensource.wdc.com, hare@suse.de,
Markus Armbruster <armbru@redhat.com>,
qemu-block@nongnu.org, Eric Blake <eblake@redhat.com>,
Hanna Reitz <hreitz@redhat.com>
Subject: Re: [RFC v6 2/4] virtio-blk: add zoned storage emulation for zoned devices
Date: Tue, 31 Jan 2023 09:10:18 -0500
Message-ID: <Y9khSupdxuYFTqhb@fedora>
In-Reply-To: <Y9gMuBUOiPBStx+b@redhat.com>
On Mon, Jan 30, 2023 at 06:30:16PM +0000, Daniel P. Berrangé wrote:
> On Mon, Jan 30, 2023 at 10:17:48AM -0500, Stefan Hajnoczi wrote:
> > On Mon, 30 Jan 2023 at 07:33, Daniel P. Berrangé <berrange@redhat.com> wrote:
> > >
> > > On Sun, Jan 29, 2023 at 06:39:49PM +0800, Sam Li wrote:
> > > > This patch extends virtio-blk emulation to handle zoned device commands
> > > > by calling the new block layer APIs to perform zoned device I/O on
> > > > behalf of the guest. It supports Report Zone, four zone operations (open,
> > > > close, finish, reset), and Append Zone.
> > > >
> > > > The VIRTIO_BLK_F_ZONED feature bit will only be set if the host does
> > > > support zoned block devices. It will not be set for regular block
> > > > devices (conventional zones).
> > > >
> > > > The guest OS can use blktests and fio to test those commands on zoned devices.
> > > > Furthermore, using zonefs to test zone append write is also supported.
> > > >
> > > > Signed-off-by: Sam Li <faithilikerun@gmail.com>
> > > > ---
> > > > hw/block/virtio-blk-common.c | 2 +
> > > > hw/block/virtio-blk.c | 394 +++++++++++++++++++++++++++++++++++
> > > > 2 files changed, 396 insertions(+)
> > > >
> > >
> > > > @@ -949,6 +1311,30 @@ static void virtio_blk_update_config(VirtIODevice *vdev, uint8_t *config)
> > > >          blkcfg.write_zeroes_may_unmap = 1;
> > > >          virtio_stl_p(vdev, &blkcfg.max_write_zeroes_seg, 1);
> > > >      }
> > > > +    if (bs->bl.zoned != BLK_Z_NONE) {
> > > > +        switch (bs->bl.zoned) {
> > > > +        case BLK_Z_HM:
> > > > +            blkcfg.zoned.model = VIRTIO_BLK_Z_HM;
> > > > +            break;
> > > > +        case BLK_Z_HA:
> > > > +            blkcfg.zoned.model = VIRTIO_BLK_Z_HA;
> > > > +            break;
> > > > +        default:
> > > > +            g_assert_not_reached();
> > > > +        }
> > > > +
> > > > +        virtio_stl_p(vdev, &blkcfg.zoned.zone_sectors,
> > > > +                     bs->bl.zone_size / 512);
> > > > +        virtio_stl_p(vdev, &blkcfg.zoned.max_active_zones,
> > > > +                     bs->bl.max_active_zones);
> > > > +        virtio_stl_p(vdev, &blkcfg.zoned.max_open_zones,
> > > > +                     bs->bl.max_open_zones);
> > > > +        virtio_stl_p(vdev, &blkcfg.zoned.write_granularity, blk_size);
> > > > +        virtio_stl_p(vdev, &blkcfg.zoned.max_append_sectors,
> > > > +                     bs->bl.max_append_sectors);
> > >
> > > So these are all ABI sensitive frontend device settings, but they are
> > > not exposed as tunables on the virtio-blk device, instead they are
> > > implicitly set from the backend.
> > >
> > > We have done this kind of thing before in QEMU, but several times it
> > > has bitten QEMU maintainers/users, as having a backend affect the
> > > frontend ABI is not too typical. It wouldn't be immediately obvious
> > > when starting QEMU on a target host that the live migration would
> > > be breaking ABI if the target host wasn't using a zoned device with
> > > exact same settings.
> > >
> > > This also limits mgmt flexibility across live migration, if the
> > > mgmt app wants/needs to change the storage backend. eg maybe they
> > > need to evacuate the host for an emergency, but don't have spare
> > > hosts with same kind of storage. It might be desirable to migrate
> > > and switch to a plain block device or raw/qcow2 file, rather than
> > > let the VM die.
> > >
> > > Can we make these virtio settings explicitly controlled on the
> > > virtio-blk device? If not specified explicitly they could be
> > > auto-populated from the backend for ease of use, but if specified
> > > then simply validate the backend is a match. libvirt would then
> > > make sure these are always explicitly set on the frontend.
> >
> > I think this is a good idea, especially if we streamline the
> > file-posix.c driver by merging --blockdev zoned_host_device into
> > --blockdev host_device. It won't be obvious from the command-line
> > whether this is a zoned or non-zoned device. There should be a
> > --device virtio-blk-pci,drive=drive0,zoned=on option that fails when
> > drive0 isn't zoned. It should probably be on/off/auto where auto is
> > the default and doesn't check anything, on requires a zoned device,
> > and off requires a non-zoned device. That will prevent accidental
> > migration between zoned/non-zoned devices.
> >
> > I want to point out that virtio-blk doesn't have checks for the disk
> > size or other details, so what you're suggesting for zone_sectors, etc
> > is stricter than what QEMU does today. Since the virtio-blk parameters
> > you're proposing are optional, I think it doesn't hurt though.
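
For the zoned=on/off/auto idea quoted above, a minimal standalone sketch
of the check might look like the following. The enum and function names
are made up for illustration and this is not existing QEMU code; inside
QEMU the tri-state would presumably be the existing OnOffAuto type and
the error would be reported through Error **errp rather than a buffer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for a tri-state property, defined locally so
 * that the sketch compiles on its own. */
typedef enum {
    ZONED_AUTO, /* default: accept whatever the backend is */
    ZONED_ON,   /* require a zoned backend */
    ZONED_OFF,  /* require a non-zoned backend */
} ZonedOpt;

/*
 * Return true if the frontend setting is compatible with the backend.
 * On failure a message is written to err (a plain buffer here, purely
 * for illustration).
 */
static bool zoned_opt_check(ZonedOpt opt, bool backend_is_zoned,
                            char *err, size_t errlen)
{
    assert(err != NULL && errlen > 0);
    err[0] = '\0';

    switch (opt) {
    case ZONED_AUTO:
        /* auto does not check anything */
        return true;
    case ZONED_ON:
        if (!backend_is_zoned) {
            snprintf(err, errlen, "zoned=on requires a zoned drive");
            return false;
        }
        return true;
    case ZONED_OFF:
        if (backend_is_zoned) {
            snprintf(err, errlen, "zoned=off requires a non-zoned drive");
            return false;
        }
        return true;
    }

    snprintf(err, errlen, "invalid zoned option");
    return false;
}
```

With a check like this, a management tool that always passes zoned=on
or zoned=off gets a startup failure on the destination host instead of a
silent ABI change.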
>
> Yeah, it is slightly different than some of the parameters handling.
> I guess you could say that with disk capacity, matching size is a
> fairly obvious constraint/expectation to manage, and also long standing.
>
> With disk capacity, you can add the 'raw' driver on top of any block
> driver stack, to apply an arbitrary offset+size, to make the storage
> smaller than it otherwise is on disk. Conceptually that could have
> been done on the frontend device(s) too, but I guess it made more
> sense to do it in the block layer to give consistent enforcement
> of the limits across frontends. It is fuzzy whether such a use of
> the 'raw' driver is really considered backend config, as opposed to
> frontend config but to me it feels like frontend config.
>
> You could possibly come up with the concept of a 'zoned' format that
> can be layered on top of a block driver stack to add zoned I/O constraints
> for sake of compatibility, where none otherwise exists in the physical
> storage. Possibly useful if multiple frontends all support zoned storage,
> to avoid duplicating the constraints across all ?

Maybe:

  DEFINE_BLOCK_ZONED_PROPERTIES(VirtIOBlock, conf.conf),

and then:

  bool blkconf_check_zoned_properties(BlockBackend *blk,
                                      BlockZonedConf *conf,
                                      Error **errp);

That macro and helper function can be shared by all emulated storage
controllers that implement zoned storage.
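
To make the helper idea a bit more concrete, here is a rough standalone
sketch of the "auto-populate if unset, otherwise validate" behaviour
Daniel described. Everything in it is an assumption for illustration:
the struct names, the field set, and the use of 0 as "not set by the
user" are invented, and a real helper would take a BlockBackend and an
Error **errp. Note that 0 may not be a safe "unset" marker for every
field (a limit of 0 could be a meaningful value), so a real
implementation would likely need a separate way to track whether a
property was given.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical frontend properties; 0 means "not set by the user". */
typedef struct {
    uint32_t zone_sectors;
    uint32_t max_open_zones;
    uint32_t max_active_zones;
    uint32_t max_append_sectors;
} ZonedConfSketch;

/* Hypothetical stand-in for the backend limits (bs->bl.* in the patch),
 * with the zone size already converted to 512-byte sectors. */
typedef struct {
    uint32_t zone_sectors;
    uint32_t max_open_zones;
    uint32_t max_active_zones;
    uint32_t max_append_sectors;
} ZonedLimitsSketch;

/* Fill in an unset property from the backend, or verify an explicit one. */
static bool check_one(const char *name, uint32_t *prop, uint32_t backend,
                      char *err, size_t errlen)
{
    if (*prop == 0) {
        *prop = backend;
        return true;
    }
    if (*prop != backend) {
        snprintf(err, errlen, "%s=%u does not match backend value %u",
                 name, (unsigned)*prop, (unsigned)backend);
        return false;
    }
    return true;
}

/* Stops at the first mismatch; err then names the offending property. */
static bool zoned_conf_check(ZonedConfSketch *conf,
                             const ZonedLimitsSketch *bl,
                             char *err, size_t errlen)
{
    assert(conf != NULL && bl != NULL && err != NULL && errlen > 0);
    err[0] = '\0';

    return check_one("zone_sectors", &conf->zone_sectors,
                     bl->zone_sectors, err, errlen) &&
           check_one("max_open_zones", &conf->max_open_zones,
                     bl->max_open_zones, err, errlen) &&
           check_one("max_active_zones", &conf->max_active_zones,
                     bl->max_active_zones, err, errlen) &&
           check_one("max_append_sectors", &conf->max_append_sectors,
                     bl->max_append_sectors, err, errlen);
}
```

Each emulated controller would call the check once at realize time and
then build its own config space from the validated values.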

However, there's one problem: some storage interfaces extend the zoned
storage model (e.g. NVMe ZNS seems to have functionality that's not
available elsewhere). It would be necessary to check whether there is a
common subset of parameters with matching property names (because
terminology could be different) across emulated storage controllers.

But I think it's likely that this will work. I think the macro and
helper function approach is nice because it's internal to QEMU and users
don't need to set up a --blockdev enforce-zoned.

Stefan