From: "Daniel P. Berrangé" <berrange@redhat.com>
To: Stefan Hajnoczi <stefanha@gmail.com>
Cc: Sam Li <faithilikerun@gmail.com>,
	qemu-devel@nongnu.org, dmitry.fomichev@wdc.com,
	Raphael Norwitz <raphael.norwitz@nutanix.com>,
	stefanha@redhat.com, "Michael S. Tsirkin" <mst@redhat.com>,
	Kevin Wolf <kwolf@redhat.com>,
	damien.lemoal@opensource.wdc.com, hare@suse.de,
	Markus Armbruster <armbru@redhat.com>,
	qemu-block@nongnu.org, Eric Blake <eblake@redhat.com>,
	Hanna Reitz <hreitz@redhat.com>
Subject: Re: [RFC v6 2/4] virtio-blk: add zoned storage emulation for zoned devices
Date: Mon, 30 Jan 2023 18:30:16 +0000
Message-ID: <Y9gMuBUOiPBStx+b@redhat.com>
In-Reply-To: <CAJSP0QUOQge9V2jM+ibhNgt-c-sjWw5RjFeO2isfw6Gxo3gEwQ@mail.gmail.com>

On Mon, Jan 30, 2023 at 10:17:48AM -0500, Stefan Hajnoczi wrote:
> On Mon, 30 Jan 2023 at 07:33, Daniel P. Berrangé <berrange@redhat.com> wrote:
> >
> > On Sun, Jan 29, 2023 at 06:39:49PM +0800, Sam Li wrote:
> > > This patch extends virtio-blk emulation to handle zoned device commands
> > > by calling the new block layer APIs to perform zoned device I/O on
> > > behalf of the guest. It supports Report Zone, four zone operations (open,
> > > close, finish, reset), and Append Zone.
> > >
> > > The VIRTIO_BLK_F_ZONED feature bit will only be set if the host
> > > supports zoned block devices. It will not be set for regular block
> > > devices (conventional zones).
> > >
> > > The guest OS can use blktests and fio to test those commands on zoned devices.
> > > Furthermore, using zonefs to test zone append writes is also supported.
> > >
> > > Signed-off-by: Sam Li <faithilikerun@gmail.com>
> > > ---
> > >  hw/block/virtio-blk-common.c |   2 +
> > >  hw/block/virtio-blk.c        | 394 +++++++++++++++++++++++++++++++++++
> > >  2 files changed, 396 insertions(+)
> > >
> >
> > > @@ -949,6 +1311,30 @@ static void virtio_blk_update_config(VirtIODevice *vdev, uint8_t *config)
> > >          blkcfg.write_zeroes_may_unmap = 1;
> > >          virtio_stl_p(vdev, &blkcfg.max_write_zeroes_seg, 1);
> > >      }
> > > +    if (bs->bl.zoned != BLK_Z_NONE) {
> > > +        switch (bs->bl.zoned) {
> > > +        case BLK_Z_HM:
> > > +            blkcfg.zoned.model = VIRTIO_BLK_Z_HM;
> > > +            break;
> > > +        case BLK_Z_HA:
> > > +            blkcfg.zoned.model = VIRTIO_BLK_Z_HA;
> > > +            break;
> > > +        default:
> > > +            g_assert_not_reached();
> > > +        }
> > > +
> > > +        virtio_stl_p(vdev, &blkcfg.zoned.zone_sectors,
> > > +                     bs->bl.zone_size / 512);
> > > +        virtio_stl_p(vdev, &blkcfg.zoned.max_active_zones,
> > > +                     bs->bl.max_active_zones);
> > > +        virtio_stl_p(vdev, &blkcfg.zoned.max_open_zones,
> > > +                     bs->bl.max_open_zones);
> > > +        virtio_stl_p(vdev, &blkcfg.zoned.write_granularity, blk_size);
> > > +        virtio_stl_p(vdev, &blkcfg.zoned.max_append_sectors,
> > > +                     bs->bl.max_append_sectors);
> >
> > So these are all ABI sensitive frontend device settings, but they are
> > not exposed as tunables on the virtio-blk device, instead they are
> > implicitly set from the backend.
> >
> > We have done this kind of thing before in QEMU, but several times it
> > has bitten QEMU maintainers/users, as having a backend affect the
> > frontend ABI is not typical. It wouldn't be immediately obvious
> > when starting QEMU on a target host that the live migration would
> > be breaking ABI if the target host wasn't using a zoned device with
> > the exact same settings.
> >
> > This also limits mgmt flexibility across live migration, if the
> > mgmt app wants/needs to change the storage backend. E.g. maybe they
> > need to evacuate the host for an emergency, but don't have spare
> > hosts with the same kind of storage. It might be desirable to migrate
> > and switch to a plain block device or raw/qcow2 file, rather than
> > let the VM die.
> >
> > Can we make these virtio settings explicitly controllable on the
> > virtio-blk device? If not specified explicitly they could be
> > auto-populated from the backend for ease of use, but if specified
> > then simply validate the backend is a match. libvirt would then
> > make sure these are always explicitly set on the frontend.
> 
> I think this is a good idea, especially if we streamline the
> file-posix.c driver by merging --blockdev zoned_host_device into
> --blockdev host_device. It won't be obvious from the command-line
> whether this is a zoned or non-zoned device. There should be a
> --device virtio-blk-pci,drive=drive0,zoned=on option that fails when
> drive0 isn't zoned. It should probably be on/off/auto where auto is
> the default and doesn't check anything, on requires a zoned device,
> and off requires a non-zoned device. That will prevent accidental
> migration between zoned/non-zoned devices.
> 
> I want to point out that virtio-blk doesn't have checks for the disk
> size or other details, so what you're suggesting for zone_sectors, etc
> is stricter than what QEMU does today. Since the virtio-blk parameters
> you're proposing are optional, I think it doesn't hurt though.

Yeah, it is slightly different from some of the existing parameter
handling. I guess you could say that with disk capacity, matching size
is a fairly obvious constraint/expectation to manage, and also
long-standing.
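
To make the scheme from the quoted discussion concrete (explicit frontend
settings, auto-populated from the backend when unset, validated when set,
plus an on/off/auto switch), here is a minimal standalone sketch. All
names (ZonedOpt, BackendLimits, FrontendConfig, frontend_resolve) are
hypothetical and are not QEMU APIs; only zone_sectors is shown, and 0 is
assumed to mean "not specified".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tri-state for a frontend "zoned" option (not QEMU's API). */
typedef enum { ZONED_AUTO, ZONED_ON, ZONED_OFF } ZonedOpt;

/* What the backend reports; field names are illustrative only. */
typedef struct {
    bool zoned;
    uint32_t zone_sectors;
} BackendLimits;

/* What the user asked for on the frontend; zone_sectors == 0 means unset. */
typedef struct {
    ZonedOpt zoned;
    uint32_t zone_sectors;
} FrontendConfig;

/*
 * Resolve the frontend config against the backend.
 * Returns 0 on success, -1 on mismatch with *err set to a static message.
 */
int frontend_resolve(FrontendConfig *fe, const BackendLimits *be,
                     const char **err)
{
    assert(fe != NULL && be != NULL && err != NULL);

    if (fe->zoned == ZONED_ON && !be->zoned) {
        *err = "zoned=on requires a zoned backend";
        return -1;
    }
    if (fe->zoned == ZONED_OFF && be->zoned) {
        *err = "zoned=off requires a non-zoned backend";
        return -1;
    }
    if (fe->zoned == ZONED_AUTO) {
        /* auto: follow whatever the backend is */
        fe->zoned = be->zoned ? ZONED_ON : ZONED_OFF;
    }
    if (fe->zoned == ZONED_OFF) {
        return 0;
    }
    if (fe->zone_sectors == 0) {
        /* not specified: auto-populate from the backend */
        fe->zone_sectors = be->zone_sectors;
    } else if (fe->zone_sectors != be->zone_sectors) {
        /* specified: the backend must match exactly */
        *err = "zone_sectors does not match the backend";
        return -1;
    }
    return 0;
}
```

A mgmt app that always passes explicit values would then get a startup
failure on the migration target instead of a silent guest ABI change.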

With disk capacity, you can add the 'raw' driver on top of any block
driver stack, to apply an arbitrary offset+size, to make the storage
smaller than it otherwise is on disk. Conceptually that could have
been done on the frontend device(s) too, but I guess it made more
sense to do it in the block layer to give consistent enforcement
of the limits across frontends. It is fuzzy whether such a use of
the 'raw' driver is really considered backend config, as opposed to
frontend config, but to me it feels like frontend config.
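
As an illustration of what an offset+size window in the block layer
amounts to, here is a minimal standalone sketch of the translation and
bounds check. RawWindow and raw_window_translate are hypothetical names
for this example, not the actual 'raw' driver code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical window over a larger backing store, in bytes. */
typedef struct {
    uint64_t offset; /* where the guest-visible area starts on disk */
    uint64_t size;   /* guest-visible capacity */
} RawWindow;

/*
 * Map a guest request to a backing offset.
 * Returns 0 and stores the backing offset, or -1 if the request
 * falls outside the window (or the addition would overflow).
 */
int raw_window_translate(const RawWindow *w, uint64_t guest_off,
                         uint64_t len, uint64_t *backing_off)
{
    assert(w != NULL && backing_off != NULL);

    /* Written to avoid overflow in guest_off + len. */
    if (guest_off > w->size || len > w->size - guest_off) {
        return -1;
    }
    if (w->offset > UINT64_MAX - guest_off) {
        return -1;
    }
    *backing_off = w->offset + guest_off;
    return 0;
}
```

Because the check lives in one layer, every frontend sitting on top of
it sees the same capacity limit.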

You could possibly come up with the concept of a 'zoned' format that
can be layered on top of a block driver stack to add zoned I/O constraints
for the sake of compatibility, where none otherwise exist in the physical
storage. Possibly useful if multiple frontends all support zoned storage,
to avoid duplicating the constraints across all of them?
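
A very rough sketch of the kind of constraint such a layer would add on
top of plain storage: one write pointer per zone, and writes rejected
unless they are sequential within a zone. ZonedLayer and the
zoned_layer_* functions are hypothetical names; nothing like this exists
in QEMU, and a real format would also need zone states, open/active
limits, zone append and persistent metadata.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ZONES 8 /* arbitrary small limit for this sketch */

/* Hypothetical software zoned layer; units are sectors. */
typedef struct {
    uint64_t zone_sectors;
    uint32_t nr_zones;
    uint64_t wp[MAX_ZONES]; /* absolute write pointer of each zone */
} ZonedLayer;

void zoned_layer_init(ZonedLayer *z, uint64_t zone_sectors, uint32_t nr_zones)
{
    assert(z != NULL && zone_sectors > 0 && nr_zones <= MAX_ZONES);
    z->zone_sectors = zone_sectors;
    z->nr_zones = nr_zones;
    for (uint32_t i = 0; i < nr_zones; i++) {
        z->wp[i] = (uint64_t)i * zone_sectors; /* every zone starts empty */
    }
}

/* Returns 0 if the write is allowed (and advances the write pointer). */
int zoned_layer_write(ZonedLayer *z, uint64_t sector, uint64_t nr_sectors)
{
    uint64_t zone = sector / z->zone_sectors;

    if (zone >= z->nr_zones) {
        return -1; /* beyond the device */
    }
    if (sector != z->wp[zone]) {
        return -1; /* not at the write pointer: unaligned write */
    }
    if (nr_sectors > (zone + 1) * z->zone_sectors - sector) {
        return -1; /* would cross the zone boundary */
    }
    z->wp[zone] += nr_sectors;
    return 0;
}

/* Reset one zone back to empty. */
int zoned_layer_reset(ZonedLayer *z, uint32_t zone)
{
    if (zone >= z->nr_zones) {
        return -1;
    }
    z->wp[zone] = (uint64_t)zone * z->zone_sectors;
    return 0;
}
```

With the rules in one place, each frontend would only need to translate
its own command set into these operations.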

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



