From: Hanna Czenczek <hreitz@redhat.com>
To: Stefan Hajnoczi <stefanha@gmail.com>
Cc: "Stefan Hajnoczi" <stefanha@redhat.com>,
	virtio-fs-list <virtio-fs@redhat.com>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
	"Marc-André Lureau" <marcandre.lureau@redhat.com>,
	"Eugenio Pérez" <eperezma@redhat.com>,
	"Jason Wang" <jasowang@redhat.com>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	"Dave Gilbert" <dgilbert@redhat.com>,
	"Juan Quintela" <quintela@redhat.com>
Subject: Re: vhost-user (virtio-fs) migration: back end state
Date: Tue, 7 Feb 2023 10:08:23 +0100
Message-ID: <f26dd5ed-fa02-faeb-fadb-0dbfbe7792d3@redhat.com>
In-Reply-To: <CAJSP0QWnq6av7j6x_n-C2mLSPMYBhMeEthr6ayPN-cmsEB3UnA@mail.gmail.com>

On 06.02.23 17:27, Stefan Hajnoczi wrote:
> On Mon, 6 Feb 2023 at 07:36, Hanna Czenczek <hreitz@redhat.com> wrote:
>> Hi Stefan,
>>
>> For true virtio-fs migration, we need to migrate the daemon’s (back
>> end’s) state somehow.  I’m addressing you because you had a talk on this
>> topic at KVM Forum 2021. :)
>>
>> As far as I understood your talk, the only standardized way to migrate a
>> vhost-user back end’s state is via dbus-vmstate.  I believe that
>> interface is unsuitable for our use case, because we will need to
>> migrate more than 1 MB of state.  Now, that 1 MB limit has supposedly
>> been chosen arbitrarily, but the introducing commit’s message says that
>> it’s based on the idea that the data must be supplied basically
>> immediately anyway (due to both dbus and qemu migration requirements),
>> and I don’t think we can meet that requirement.
> Yes, dbus-vmstate is what's available today. It's independent of
> vhost-user and VIRTIO.
>
>> Has there been progress on the topic of standardizing a vhost-user back
>> end state migration channel besides dbus-vmstate?  I’ve looked around
>> but didn’t find anything.  If there isn’t anything yet, is there still
>> interest in the topic?
> Not that I'm aware of. There are two parts to the topic of VIRTIO
> device state migration:
> 1. Defining an interface for migrating VIRTIO/vDPA/vhost/vhost-user
> devices. It doesn't need to be implemented in all these places
> immediately, but the design should consider that each of these
> standards will need to participate in migration sooner or later. It
> makes sense to choose an interface that works for all or most of these
> standards instead of inventing something vhost-user-specific.
> 2. Defining standard device state formats so VIRTIO implementations
> can interoperate.
>
>> Of course, we could use a channel that completely bypasses qemu, but I
>> think we’d like to avoid that if possible.  First, this would require
>> adding functionality to virtiofsd to configure this channel.  Second,
>> not storing the state in the central VM state means that migrating to
>> file doesn’t work (well, we could migrate to a dedicated state file,
>> but...).  Third, setting up such a channel after virtiofsd has sandboxed
>> itself is hard.  I guess we should set up the migration channel before
>> sandboxing, which constrains runtime configuration (basically this would
>> only allow us to set up a listening server, I believe).  Well, and
>> finally, it isn’t a standard way, which won’t be great if we’re planning
>> to add a standard way anyway.
> Yes, live migration is hard enough. Duplicating it is probably not
> going to make things better. It would still be necessary to support
> saving to file as well as live migration.
>
> There are two high-level approaches to the migration interface:
> 1. The whitebox approach where the vhost-user back-end implements
> device-specific messages to get/set migration state (e.g.
> VIRTIO_FS_GET_DEVICE_STATE with a struct virtio_fs_device_state
> containing the state of the FUSE session or multiple fine-grained
> messages that extract pieces of state). The hypervisor is responsible
> for the actual device state serialization.
> 2. The blackbox approach where the vhost-user back-end implements the
> device state serialization itself and just produces a blob of data.

Implementing this through device-specific messages sounds quite nice to 
me, and I think this would work for the blackbox approach, too. The 
virtio-fs device in qemu (the front end stub) would provide that data as 
its VM state then, right?
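
For concreteness, here is a rough sketch of what such a message payload
could look like (all names here are made up; nothing like this exists
in the protocol yet):

    #include <stdint.h>

    /* Hypothetical vhost-user payload: the back end returns its opaque
     * state blob in chunks, which the front end concatenates and embeds
     * in the device's VM state section.  Purely illustrative. */
    struct vhost_user_device_state {
        uint64_t offset;   /* read cursor into the opaque state */
        uint32_t size;     /* number of valid bytes in data[] */
        uint32_t flags;    /* e.g. a "more data follows" bit */
        uint8_t  data[];   /* implementation-defined blob */
    };

That would also cover migration to file, since the blob would live in
qemu's VM state like any other device section.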

I’m not sure at this point whether it is sensible to define a 
device-specific standard for the state (i.e. the whitebox approach).  I 
think that it may be too rigid if we decide to extend it in the future.  
As far as I understand, the benefit is that it would allow for 
interoperability between different virtio-fs back end implementations, 
which isn’t really a concern right now.  If we need this in the future, 
I’m sure we can extend the protocol further to alternatively use 
standardized state.  (Which can easily be turned back into a blob if 
compatibility requires it.)

I think we’ll probably want a mix of both, where it is standardized that 
the state consists of information about each FUSE inode and each open 
handle, but that information itself is treated as a blob.
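
A sketch of such a layout (the framing would be standardized, the
record bodies stay opaque; all names and values are illustrative only):

    #include <stdint.h>

    enum virtio_fs_state_record_type {
        VIRTIO_FS_STATE_INODE  = 1,   /* one record per FUSE inode */
        VIRTIO_FS_STATE_HANDLE = 2,   /* one record per open handle */
    };

    /* Only the type/length/id framing would be standardized; body[]
     * remains an implementation-defined blob. */
    struct virtio_fs_state_record {
        uint32_t type;     /* enum virtio_fs_state_record_type */
        uint32_t length;   /* length of body[] in bytes */
        uint64_t id;       /* inode number resp. handle ID */
        uint8_t  body[];   /* opaque per-record data */
    };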

> An example of the whitebox approach is the existing vhost migration
> interface - except that it doesn't really support device-specific
> state, only generic virtqueue state.
>
> An example of the blackbox approach is the VFIO v2 migration interface:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/uapi/linux/vfio.h#n867
>
> Another aspect to consider is whether save/load is sufficient or if
> the full iterative migration model needs to be exposed by the
> interface. VFIO migration is an example of the full iterative model
> while dbus-vmstate is just save/load. Devices with large amounts of
> state need the full iterative model while simple devices just need
> save/load.

Yes, we will probably need an iterative model.  Splitting the state into 
information about each FUSE inode/handle (so that single inodes/handles 
can be updated if needed) should help accomplish this.
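
To sketch the idea (none of these helpers exist, they are invented for
illustration): during the live phase, only records whose inode or
handle was dirtied since the previous pass would be re-sent, and a
record with the same id simply supersedes the earlier one on the
destination:

    /* Illustrative fragment only; all helper names are hypothetical. */
    bool converged = false;
    while (!converged) {
        uint64_t id;
        while (next_dirty_inode(fs, &id)) {
            send_state_record(chan, VIRTIO_FS_STATE_INODE, id);
        }
        while (next_dirty_handle(fs, &id)) {
            send_state_record(chan, VIRTIO_FS_STATE_HANDLE, id);
        }
        converged = pending_state_size(fs) < threshold;
    }
    flush_remaining_records(chan);   /* final stop-and-copy pass */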

> Regarding virtiofs, I think the device state is not
> implementation-specific. Different implementations may have different
> device states (e.g. in-memory file system implementation versus POSIX
> file system-backed implementation), but the device state produced by
> https://gitlab.com/virtio-fs/virtiofsd can probably also be loaded by
> another implementation.

Difficult to say.  What seems universal to us now may well not be, 
because we’re just seeing our own implementation.  I think we’ll just 
serialize it in a way that makes sense to us now, and hope it’ll make 
sense to others too should the need arise.

> My suggestion is blackbox migration with a full iterative interface.
> The reason I like the blackbox approach is that a device's device
> state is encapsulated in the device implementation and does not
> require coordinating changes across other codebases (e.g. vDPA and
> vhost kernel interface, vhost-user protocol, QEMU, etc). A blackbox
> interface only needs to be defined and implemented once. After that,
> device implementations can evolve without constant changes at various
> layers.

Agreed.
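
By analogy with the VFIO flow, the front-end side could look roughly
like this (the message name and vhost_user_send_fd() helper are
invented here for illustration; qemu_put_buffer() is the existing
QEMUFile write helper):

    /* Front end creates a pipe, passes the write end to the back end
     * over the vhost-user socket, and reads the opaque blob from the
     * read end into the migration stream.  Hypothetical throughout. */
    int fds[2];
    if (pipe(fds) < 0) {
        return -errno;
    }
    vhost_user_send_fd(dev, HYPOTHETICAL_SET_DEVICE_STATE_FD, fds[1]);
    close(fds[1]);               /* back end holds the write end now */
    for (;;) {
        uint8_t buf[4096];
        ssize_t len = read(fds[0], buf, sizeof(buf));
        if (len <= 0) {
            break;               /* EOF: back end finished serializing */
        }
        qemu_put_buffer(f, buf, len);   /* into the device's VM state */
    }
    close(fds[0]);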

> So basically, something like VFIO v2 migration but for vhost-user
> (with an eye towards vDPA and VIRTIO support in the future).
>
> Should we schedule a call with Jason, Michael, Juan, David, etc to
> discuss further? That way there's less chance of spending weeks
> working on something only to be asked to change the approach later.

Sure, sounds good!  I’ve taken a look into what state we’ll need to 
migrate already, but I’ll take a more detailed look now so that it’s 
clear what our requirements are.

Hanna


