vhost-user (virtio-fs) migration: back end state


* vhost-user (virtio-fs) migration: back end state
@ 2023-02-06 12:35 Hanna Czenczek
  2023-02-06 16:27 ` Stefan Hajnoczi
  0 siblings, 1 reply; 13+ messages in thread
From: Hanna Czenczek @ 2023-02-06 12:35 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: virtio-fs-list, qemu-devel@nongnu.org

Hi Stefan,

For true virtio-fs migration, we need to migrate the daemon’s (back 
end’s) state somehow.  I’m addressing you because you had a talk on this 
topic at KVM Forum 2021. :)

As far as I understood your talk, the only standardized way to migrate a 
vhost-user back end’s state is via dbus-vmstate.  I believe that 
interface is unsuitable for our use case, because we will need to 
migrate more than 1 MB of state.  Now, that 1 MB limit has supposedly 
been chosen arbitrarily, but the introducing commit’s message says that 
it’s based on the idea that the data must be supplied basically 
immediately anyway (due to both dbus and qemu migration requirements), 
and I don’t think we can meet that requirement.

Has there been progress on the topic of standardizing a vhost-user back 
end state migration channel besides dbus-vmstate?  I’ve looked around 
but didn’t find anything.  If there isn’t anything yet, is there still 
interest in the topic?

Of course, we could use a channel that completely bypasses qemu, but I 
think we’d like to avoid that if possible.  First, this would require 
adding functionality to virtiofsd to configure this channel.  Second, 
not storing the state in the central VM state means that migrating to 
file doesn’t work (well, we could migrate to a dedicated state file, 
but...).  Third, setting up such a channel after virtiofsd has sandboxed 
itself is hard.  I guess we should set up the migration channel before 
sandboxing, which constrains runtime configuration (basically this would 
only allow us to set up a listening server, I believe).  Well, and 
finally, it isn’t a standard way, which won’t be great if we’re planning 
to add a standard way anyway.

Thanks!

Hanna

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: vhost-user (virtio-fs) migration: back end state
  2023-02-06 12:35 vhost-user (virtio-fs) migration: back end state Hanna Czenczek
@ 2023-02-06 16:27 ` Stefan Hajnoczi
  2023-02-06 21:02   ` Juan Quintela
  2023-02-07  9:08   ` Hanna Czenczek
  0 siblings, 2 replies; 13+ messages in thread
From: Stefan Hajnoczi @ 2023-02-06 16:27 UTC (permalink / raw)
  To: Hanna Czenczek
  Cc: Stefan Hajnoczi, virtio-fs-list, qemu-devel@nongnu.org,
	Marc-André Lureau, Eugenio Pérez, Jason Wang,
	Michael S. Tsirkin, Dave Gilbert, Juan Quintela

On Mon, 6 Feb 2023 at 07:36, Hanna Czenczek <hreitz@redhat.com> wrote:
>
> Hi Stefan,
>
> For true virtio-fs migration, we need to migrate the daemon’s (back
> end’s) state somehow.  I’m addressing you because you had a talk on this
> topic at KVM Forum 2021. :)
>
> As far as I understood your talk, the only standardized way to migrate a
> vhost-user back end’s state is via dbus-vmstate.  I believe that
> interface is unsuitable for our use case, because we will need to
> migrate more than 1 MB of state.  Now, that 1 MB limit has supposedly
> been chosen arbitrarily, but the introducing commit’s message says that
> it’s based on the idea that the data must be supplied basically
> immediately anyway (due to both dbus and qemu migration requirements),
> and I don’t think we can meet that requirement.

Yes, dbus-vmstate is what is available today. It's independent of
vhost-user and VIRTIO.

> Has there been progress on the topic of standardizing a vhost-user back
> end state migration channel besides dbus-vmstate?  I’ve looked around
> but didn’t find anything.  If there isn’t anything yet, is there still
> interest in the topic?

Not that I'm aware of. There are two parts to the topic of VIRTIO
device state migration:
1. Defining an interface for migrating VIRTIO/vDPA/vhost/vhost-user
devices. It doesn't need to be implemented in all these places
immediately, but the design should consider that each of these
standards will need to participate in migration sooner or later. It
makes sense to choose an interface that works for all or most of these
interfaces instead of inventing something vhost-user-specific.
2. Defining standard device state formats so VIRTIO implementations
can interoperate.

> Of course, we could use a channel that completely bypasses qemu, but I
> think we’d like to avoid that if possible.  First, this would require
> adding functionality to virtiofsd to configure this channel.  Second,
> not storing the state in the central VM state means that migrating to
> file doesn’t work (well, we could migrate to a dedicated state file,
> but...).  Third, setting up such a channel after virtiofsd has sandboxed
> itself is hard.  I guess we should set up the migration channel before
> sandboxing, which constrains runtime configuration (basically this would
> only allow us to set up a listening server, I believe).  Well, and
> finally, it isn’t a standard way, which won’t be great if we’re planning
> to add a standard way anyway.

Yes, live migration is hard enough. Duplicating it is probably not
going to make things better. It would still be necessary to support
saving to file as well as live migration.

There are two high-level approaches to the migration interface:
1. The whitebox approach where the vhost-user back-end implements
device-specific messages to get/set migration state (e.g.
VIRTIO_FS_GET_DEVICE_STATE with a struct virtio_fs_device_state
containing the state of the FUSE session or multiple fine-grained
messages that extract pieces of state). The hypervisor is responsible
for the actual device state serialization.
2. The blackbox approach where the vhost-user back-end implements the
device state serialization itself and just produces a blob of data.
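
To make the two approaches concrete, here is a minimal sketch. Every name in it (FuseSessionState, WhiteboxBackEnd, BlackboxBackEnd, the fields, the use of JSON) is hypothetical and chosen only for illustration; none of it is an existing QEMU, vhost-user, or virtiofsd interface.

```python
import json
from dataclasses import dataclass, field


@dataclass
class FuseSessionState:
    """Hypothetical device-specific state; the field names are made up."""
    open_inodes: dict = field(default_factory=dict)  # inode number -> path
    next_handle: int = 0


class WhiteboxBackEnd:
    """Whitebox: exposes structured state; the hypervisor serializes it."""

    def __init__(self, state):
        self.state = state

    def get_device_state(self):
        # The hypervisor sees, and must understand, every field.
        return {"open_inodes": self.state.open_inodes,
                "next_handle": self.state.next_handle}


class BlackboxBackEnd:
    """Blackbox: serializes its own state and hands over opaque bytes."""

    def __init__(self, state):
        self.state = state

    def save(self) -> bytes:
        payload = {"open_inodes": self.state.open_inodes,
                   "next_handle": self.state.next_handle}
        return json.dumps(payload, sort_keys=True).encode()

    @classmethod
    def load(cls, blob: bytes):
        payload = json.loads(blob.decode())
        # JSON turns integer keys into strings; convert them back.
        inodes = {int(k): v for k, v in payload["open_inodes"].items()}
        return cls(FuseSessionState(inodes, payload["next_handle"]))
```

In the whitebox case, any new field has to be taught to whoever consumes get_device_state(); in the blackbox case, only save() and load() change, and everything in between just moves bytes.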

An example of the whitebox approach is the existing vhost migration
interface - except that it doesn't really support device-specific
state, only generic virtqueue state.

An example of the blackbox approach is the VFIO v2 migration interface:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/uapi/linux/vfio.h#n867

Another aspect to consider is whether save/load is sufficient or if
the full iterative migration model needs to be exposed by the
interface. VFIO migration is an example of the full iterative model
while dbus-vmstate is just save/load. Devices with large amounts of
state need the full iterative model while simple devices just need
save/load.
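
The difference between the two models can be sketched as follows. This is only an illustration under assumptions: the class, its methods, and the idea of state as a key/value map are hypothetical and not taken from VFIO, dbus-vmstate, or vhost-user.

```python
class IterativeBackEnd:
    """Hypothetical back end whose state is a dict of key -> bytes."""

    def __init__(self, state):
        self.state = dict(state)
        self.unsent = set(self.state)  # keys changed since they were last sent

    def update(self, key, value):
        # Called while the guest keeps running and the state changes.
        self.state[key] = value
        self.unsent.add(key)

    def pending_bytes(self):
        return sum(len(self.state[k]) for k in self.unsent)

    def save_iterate(self, budget):
        """Iterative phase: emit changed entries, at most `budget` bytes."""
        out = []
        for key in sorted(self.unsent):
            size = len(self.state[key])
            if size > budget:
                break
            out.append((key, self.state[key]))
            budget -= size
        for key, _ in out:
            self.unsent.discard(key)
        return out

    def save_complete(self):
        """Final phase (device stopped): emit everything still pending."""
        out = [(k, self.state[k]) for k in sorted(self.unsent)]
        self.unsent.clear()
        return out


# A plain save/load device would only ever call save_complete().
src = IterativeBackEnd({"a": b"xxxx", "b": b"yy"})
dst = {}
dst.update(src.save_iterate(budget=4))  # sent while the guest is running
src.update("a", b"zz")                  # guest dirties already-sent state
dst.update(src.save_complete())         # sent during downtime
```

The point of the iterative phase is that save_complete() only has to carry what changed since the last save_iterate() call, which is what keeps downtime small for large state.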

Regarding virtiofs, I think the device state is not
implementation-specific. Different implementations may have different
device states (e.g. in-memory file system implementation versus POSIX
file system-backed implementation), but the device state produced by
https://gitlab.com/virtio-fs/virtiofsd can probably also be loaded by
another implementation.

My suggestion is blackbox migration with a full iterative interface.
The reason I like the blackbox approach is that a device's state is
encapsulated in the device implementation and does not
require coordinating changes across other codebases (e.g. vDPA and
vhost kernel interface, vhost-user protocol, QEMU, etc). A blackbox
interface only needs to be defined and implemented once. After that,
device implementations can evolve without constant changes at various
layers.

So basically, something like VFIO v2 migration but for vhost-user
(with an eye towards vDPA and VIRTIO support in the future).

Should we schedule a call with Jason, Michael, Juan, David, etc to
discuss further? That way there's less chance of spending weeks
working on something only to be asked to change the approach later.

Stefan


* Re: vhost-user (virtio-fs) migration: back end state
  2023-02-06 16:27 ` Stefan Hajnoczi
@ 2023-02-06 21:02   ` Juan Quintela
  2023-02-06 21:16     ` Stefan Hajnoczi
  2023-02-07  9:35     ` Hanna Czenczek
  2023-02-07  9:08   ` Hanna Czenczek
  1 sibling, 2 replies; 13+ messages in thread
From: Juan Quintela @ 2023-02-06 21:02 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Hanna Czenczek, Stefan Hajnoczi, virtio-fs-list,
	qemu-devel@nongnu.org, Marc-André Lureau, Eugenio Pérez,
	Jason Wang, Michael S. Tsirkin, Dave Gilbert

Stefan Hajnoczi <stefanha@gmail.com> wrote:
> On Mon, 6 Feb 2023 at 07:36, Hanna Czenczek <hreitz@redhat.com> wrote:
>>
>> Hi Stefan,
>>
>> For true virtio-fs migration, we need to migrate the daemon’s (back
>> end’s) state somehow.  I’m addressing you because you had a talk on this
>> topic at KVM Forum 2021. :)
>>
>> As far as I understood your talk, the only standardized way to migrate a
>> vhost-user back end’s state is via dbus-vmstate.  I believe that
>> interface is unsuitable for our use case, because we will need to
>> migrate more than 1 MB of state.  Now, that 1 MB limit has supposedly
>> been chosen arbitrarily, but the introducing commit’s message says that
>> it’s based on the idea that the data must be supplied basically
>> immediately anyway (due to both dbus and qemu migration requirements),
>> and I don’t think we can meet that requirement.
>
> Yes, dbus-vmstate is what is available today. It's independent of
> vhost-user and VIRTIO.

While we are at it:
- What is the typical size of your state (either vhost-user or whatever)?
- What are the chances that you can enter the iterative stage
  negotiation (i.e. that you can create a dirty bitmap of your state)?

>> Has there been progress on the topic of standardizing a vhost-user back
>> end state migration channel besides dbus-vmstate?  I’ve looked around
>> but didn’t find anything.  If there isn’t anything yet, is there still
>> interest in the topic?
>
> Not that I'm aware of. There are two parts to the topic of VIRTIO
> device state migration:
> 1. Defining an interface for migrating VIRTIO/vDPA/vhost/vhost-user
> devices.

Related topic: I am working on vfio device migration right now.  That
is basically hardware with huge binary blobs.  But they are "learning"
to have a dirty bitmap.  Current GPUs are already in the 128GB range,
so it is really needed.

> It doesn't need to be implemented in all these places
> immediately, but the design should consider that each of these
> standards will need to participate in migration sooner or later. It
> makes sense to choose an interface that works for all or most of these
> interfaces instead of inventing something vhost-user-specific.

In vfio, we really need to use binary blobs.  Here I don't know what to
do.  On one side, "understanding" what goes through the channel makes
things way easier.  On the other hand, learning vmstate or similar is
complicated.

The other thing that we *think* is going to be needed is something like
what we do with cpus: cpu models and flags.  Too many flags.

Why?  Because once they are at it, they want to be able to migrate
from one card, let's say Mellanox^WNVidia ConnectX-5, to
ConnectX-6, with not necessarily the same levels of firmware.
I.e. fun.

> 2. Defining standard device state formats so VIRTIO implementations
> can interoperate.

I have no clue here.

>> Of course, we could use a channel that completely bypasses qemu, but I
>> think we’d like to avoid that if possible.  First, this would require
>> adding functionality to virtiofsd to configure this channel.  Second,
>> not storing the state in the central VM state means that migrating to
>> file doesn’t work (well, we could migrate to a dedicated state file,
>> but...).

How much is migration to file used in practice?
I would like to have some information here.
It would probably be necessary to be able to encrypt it.  And that is a
whole (different) can of worms.

>> Third, setting up such a channel after virtiofsd has sandboxed
>> itself is hard.  I guess we should set up the migration channel before
>> sandboxing, which constrains runtime configuration (basically this would
>> only allow us to set up a listening server, I believe).  Well, and
>> finally, it isn’t a standard way, which won’t be great if we’re planning
>> to add a standard way anyway.
>
> Yes, live migration is hard enough. Duplicating it is probably not
> going to make things better. It would still be necessary to support
> saving to file as well as live migration.

The other problem with NOT using the migration infrastructure is firewalls.
Live migration only uses a single port.  It uses as many sockets as it
needs with multifd, but they all use the same port to make life easier for
libvirt/the management app.

Adding a new port for each vhost-user device is not going to fly with
admins.

> There are two high-level approaches to the migration interface:
> 1. The whitebox approach where the vhost-user back-end implements
> device-specific messages to get/set migration state (e.g.
> VIRTIO_FS_GET_DEVICE_STATE with a struct virtio_fs_device_state
> containing the state of the FUSE session or multiple fine-grained
> messages that extract pieces of state). The hypervisor is responsible
> for the actual device state serialization.
> 2. The blackbox approach where the vhost-user back-end implements the
> device state serialization itself and just produces a blob of data.

If your state is big enough, you are going to need a dirty bitmap or
something similar, independently of whether you use the white or black box
approach.

100gigabit network ~ 10GB/s transfer.
10GB of state takes 1 second of downtime.

And 100gigabit is not common now.  If you are stuck at 10gigabit, then
you can only transfer 1GB in 1 second of downtime.
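
A back-of-the-envelope way to compute such transfer times, assuming exactly 8 bits per byte and no protocol overhead (so the results come out slightly better than a rounder "10 bits per byte" rule of thumb). The function name is made up for this sketch.

```python
def transfer_seconds(state_bytes, link_gbit_per_s):
    """Time to push state_bytes over a link, ignoring protocol overhead."""
    bytes_per_s = link_gbit_per_s * 1e9 / 8
    return state_bytes / bytes_per_s


GB = 10**9
for link in (10, 100):
    secs = transfer_seconds(1 * GB, link)
    print(f"{link} Gbit/s: 1 GB of state takes {secs:.2f} s")
```

Since all of this state is sent while the guest is stopped, the result translates directly into downtime.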

And we are getting to the point when we have multiple vhost-user/vfio
devices, etc.

Another problem that we are working on right now is bitmaps.  Just
synchronizing them takes forever.  Take a 6TB guest:

6TB guest ~ 6TB/4KB ~ 1.600.000.000 pages, i.e. the size of the bitmap in bits
1.600.000.000 entries / 8 bits/byte ~ 200.000.000 bytes ~ 200MB for each bitmap.
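
The same arithmetic as a small sketch, assuming one bit per 4 KiB page and binary units (6 TiB); the helper name is hypothetical.

```python
def dirty_bitmap_size(guest_bytes, page_bytes=4096):
    """Return (number of pages, bitmap size in bytes) at one bit per page."""
    pages = -(-guest_bytes // page_bytes)  # ceiling division
    bitmap_bytes = -(-pages // 8)
    return pages, bitmap_bytes


TIB = 2**40
pages, bitmap_bytes = dirty_bitmap_size(6 * TIB)
print(pages, "pages,", bitmap_bytes / 2**20, "MiB per bitmap")
```

With these assumptions a 6 TiB guest has about 1.6 billion pages, and each bitmap is 192 MiB, i.e. roughly 200MB.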

If we end up needing one for memory, one for each vfio device, and
another for each vhost device, that makes synchronization
... interesting, to say the least.  We could start using GPUs to
synchronize bitmaps O:-)

> An example of the whitebox approach is the existing vhost migration
> interface - except that it doesn't really support device-specific
> state, only generic virtqueue state.
>
> An example of the blackbox approach is the VFIO v2 migration interface:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/uapi/linux/vfio.h#n867
>
> Another aspect to consider is whether save/load is sufficient or if
> the full iterative migration model needs to be exposed by the
> interface. VFIO migration is an example of the full iterative model
> while dbus-vmstate is just save/load. Devices with large amounts of
> state need the full iterative model while simple devices just need
> save/load.

This is why I asked about the size of vhost devices, or whatever they
are called this week O:-)

> Regarding virtiofs, I think the device state is not
> implementation-specific. Different implementations may have different
> device states (e.g. in-memory file system implementation versus POSIX
> file system-backed implementation), but the device state produced by
> https://gitlab.com/virtio-fs/virtiofsd can probably also be loaded by
> another implementation.
>
> My suggestion is blackbox migration with a full iterative interface.
> The reason I like the blackbox approach is that a device's device
> state is encapsulated in the device implementation and does not
> require coordinating changes across other codebases (e.g. vDPA and
> vhost kernel interface, vhost-user protocol, QEMU, etc). A blackbox
> interface only needs to be defined and implemented once. After that,
> device implementations can evolve without constant changes at various
> layers.
>
> So basically, something like VFIO v2 migration but for vhost-user
> (with an eye towards vDPA and VIRTIO support in the future).
>
> Should we schedule a call with Jason, Michael, Juan, David, etc to
> discuss further? That way there's less chance of spending weeks
> working on something only to be asked to change the approach later.

We are discussing this with vfio.

Basically what we have asked vfio to support is:
- Enter the iterative stage to report how much dirty memory they
  have.  We need this to calculate downtimes.  See my last PULL request
  to see how I implemented it.
  I generalized save_state_pending() for save_live devices into
  state_pending_estimate() and state_pending_exact().
  The only device that uses different implementations for those values
  right now is ram.  But I expect more to use it.
  The idea is that with estimate, you give an estimate of how much you
  think is pending, but without trying too hard:
  ram returns how many dirty bits are in the ram dirty bitmap now.
  With the _exact() one, you try very hard to give a "more" correct
  size.  It is called when, according to the estimates, we have so little
  dirty memory that we could enter the last stage of migration.

- My next project is creating a new multifd thread for each vfio device
  that requires it.  What it does right now:
  * we give the device a channel, nothing else will use it
  * a thread on the sending side/receiving side for the device
  * we notify them when we have ended the iterative stage, so they can start
  * they can use the channel however they want; as it is in the final
    stage, they can transfer at full speed.

- They asked for a way to stop migration if we cannot reach the
  downtime needed.  If, at the current speed, the maximum amount of dirty
  memory that we can transmit is 512MB, and vfio tells us that they have
  more than 512MB just by itself, we know this will never converge, so we
  have to abort migration.  In the case of vfio devices, the device
  state depends on the guest configuration, and it is not going to change
  until the guest changes configuration.
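
The estimate/exact split and the abort condition can be sketched like this. It is a simplified model under assumptions, not QEMU's migration code: the Device class, the min_final_bytes field, and decide() are hypothetical names, and only the idea of a cheap estimate followed by an expensive exact query is taken from the description above.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Device:
    name: str
    pending_estimate: Callable[[], int]  # cheap, may be imprecise
    pending_exact: Callable[[], int]     # expensive, e.g. syncs a bitmap
    min_final_bytes: int = 0             # state that can only go in the last stage


def decide(devices, bandwidth_bytes_per_s, max_downtime_s):
    """Return "abort", "iterate", or "complete" for one migration round."""
    threshold = int(bandwidth_bytes_per_s * max_downtime_s)
    # State that never shrinks while iterating: if it alone exceeds what
    # fits into the downtime, iterating further cannot help.
    if sum(d.min_final_bytes for d in devices) > threshold:
        return "abort"
    # Ask the cheap question first.
    if sum(d.pending_estimate() for d in devices) > threshold:
        return "iterate"
    # Only when the estimate looks good, pay for the exact answer.
    if sum(d.pending_exact() for d in devices) > threshold:
        return "iterate"
    return "complete"


MB = 2**20
ram = Device("ram", lambda: 100 * MB, lambda: 120 * MB)
vfio = Device("vfio", lambda: 600 * MB, lambda: 600 * MB,
              min_final_bytes=600 * MB)
print(decide([ram], 512 * MB, 1.0))
print(decide([ram, vfio], 512 * MB, 1.0))
```

The ordering matters: the exact query is only issued once the estimates suggest that the final stage is within reach, so its cost is not paid on every iteration.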

The last two bits are on my ToDo list for the near future, but not done.

If we end up having lots of such big devices, we are going to have to think
about downtimes on the order of dozens of seconds, not subsecond.

So, if you are planning on doing this in the near future, this is a good
time to discuss this.

Later, Juan.

> Stefan




* Re: vhost-user (virtio-fs) migration: back end state
  2023-02-06 21:02   ` Juan Quintela
@ 2023-02-06 21:16     ` Stefan Hajnoczi
  2023-02-06 23:58       ` Juan Quintela
  2023-02-07  9:35     ` Hanna Czenczek
  1 sibling, 1 reply; 13+ messages in thread
From: Stefan Hajnoczi @ 2023-02-06 21:16 UTC (permalink / raw)
  To: quintela
  Cc: Hanna Czenczek, Stefan Hajnoczi, virtio-fs-list,
	qemu-devel@nongnu.org, Marc-André Lureau, Eugenio Pérez,
	Jason Wang, Michael S. Tsirkin, Dave Gilbert

On Mon, 6 Feb 2023 at 16:02, Juan Quintela <quintela@redhat.com> wrote:
>
> The last two bits are on my ToDo list for the near future, but not done.
>
> If we end up having lots of such big devices, we are going to have to think
> about downtimes on the order of dozens of seconds, not subsecond.
>
> So, if you are planning on doing this in the near future, this is a good
> time to discuss this.

Can you explain the dirty bitmap requirement you mentioned further, given that:
1. vhost-user has dirty memory logging already, so DMA is covered.
2. An iterative interface allows the device to keep generating more
state, so does QEMU really need to know if parts of the previously
emitted binary blob have become dirty? It might allow QEMU to minimize
the size of the savevm file, but only if the overwritten data has the
same size.

Is a dirty bitmap for the device state necessary?

Stefan



* Re: vhost-user (virtio-fs) migration: back end state
  2023-02-06 21:16     ` Stefan Hajnoczi
@ 2023-02-06 23:58       ` Juan Quintela
  0 siblings, 0 replies; 13+ messages in thread
From: Juan Quintela @ 2023-02-06 23:58 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Hanna Czenczek, Stefan Hajnoczi, virtio-fs-list,
	qemu-devel@nongnu.org, Marc-André Lureau, Eugenio Pérez,
	Jason Wang, Michael S. Tsirkin, Dave Gilbert

Stefan Hajnoczi <stefanha@gmail.com> wrote:
> On Mon, 6 Feb 2023 at 16:02, Juan Quintela <quintela@redhat.com> wrote:
>> The last two bits are on my ToDo list for the near future, but not done.
>>
>> If we end up having lots of such big devices, we are going to have to think
>> about downtimes on the order of dozens of seconds, not subsecond.
>>
>> So, if you are planning on doing this in the near future, this is a good
>> time to discuss this.
>
> Can you explain the dirty bitmap requirement you mentioned further, given that:
> 1. vhost-user has dirty memory logging already, so DMA is covered.
> 2. An iterative interface allows the device to keep generating more
> state, so does QEMU really need to know if parts of the previously
> emitted binary blob have become dirty? It might allow QEMU to minimize
> the size of the savevm file, but only if the overwritten data has the
> same size.
>
> Is a dirty bitmap for the device state necessary?

Not for that.  My fault.  I was talking about a dirty bitmap because
that is how it is implemented in vfio.  Notice that qemu doesn't see that
bitmap.  But the device can enter the iterative stage.

The reason that I ask is whether you can enter the iterative stage to send
stuff before the last stage.

If you don't have something similar to that, you need to send everything in
one go, and that is going to take a really long time.

When we are talking about NVidia GPUs, they have two kinds of state:
- Frame buffer: they have (or at some point will have) a dirty bitmap
  and they can enter the iterative stage.
- Internal state that is not visible to the user; that state
  is big (around 1GB) and they can't enter the iterative stage, because
  they can't know if the guest, the card, or whoever has changed it.

For vhost device:
- what is the "typical" amount of state.
- if it is more than a few megabytes, is there a way to send parts of it
  before the ending stage?  That is what I asked for a dirty bitmap for
  it, but it can be anything else.  Just that vhost-user keep track of
  what has sent to the other side and then have less state for the last
  stage?
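
To make the question concrete, here is a minimal Python sketch of what
"keeping track of what has been sent" could look like on the back end.
It is only an illustration under stated assumptions: the class
IterativeState and its methods are hypothetical names, not part of
vhost-user, vfio, or qemu, and the 48-byte entries are arbitrary.

```python
class IterativeState:
    """Hypothetical back-end state that remembers which entries still need sending."""

    def __init__(self):
        self.entries = {}   # key -> serialized bytes for one piece of state
        self.dirty = set()  # keys changed since they were last sent

    def update(self, key, value):
        """Record a change made while the guest is still running."""
        self.entries[key] = value
        self.dirty.add(key)

    def pending_bytes(self):
        """Amount of state that would have to be sent if we stopped right now."""
        return sum(len(self.entries[k]) for k in self.dirty)

    def send_round(self):
        """One iterative round: emit every dirty entry and mark it clean."""
        batch = [(k, self.entries[k]) for k in sorted(self.dirty)]
        self.dirty.clear()
        return batch


state = IterativeState()
for i in range(1000):
    state.update(i, b"x" * 48)

first = state.send_round()                  # bulk of the state, sent while the guest runs
state.update(7, b"y" * 48)                  # one entry changes afterwards
pending_before_final = state.pending_bytes()
final = state.send_round()                  # the last stage only carries what changed
```

With this kind of bookkeeping the last stage carries one entry instead
of a thousand, which is the property I am asking about.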
  

Later, Juan.




* Re: vhost-user (virtio-fs) migration: back end state
  2023-02-06 21:02   ` Juan Quintela
  2023-02-06 21:16     ` Stefan Hajnoczi
@ 2023-02-07  9:35     ` Hanna Czenczek
  2023-02-07 15:13       ` Juan Quintela
  1 sibling, 1 reply; 13+ messages in thread
From: Hanna Czenczek @ 2023-02-07  9:35 UTC (permalink / raw)
  To: quintela, Stefan Hajnoczi
  Cc: Stefan Hajnoczi, virtio-fs-list, qemu-devel@nongnu.org,
	Marc-André Lureau, Eugenio Pérez, Jason Wang,
	Michael S. Tsirkin, Dave Gilbert

On 06.02.23 22:02, Juan Quintela wrote:
> Stefan Hajnoczi <stefanha@gmail.com> wrote:
>> On Mon, 6 Feb 2023 at 07:36, Hanna Czenczek <hreitz@redhat.com> wrote:
>>> Hi Stefan,
>>>
>>> For true virtio-fs migration, we need to migrate the daemon’s (back
>>> end’s) state somehow.  I’m addressing you because you had a talk on this
>>> topic at KVM Forum 2021. :)
>>>
>>> As far as I understood your talk, the only standardized way to migrate a
>>> vhost-user back end’s state is via dbus-vmstate.  I believe that
>>> interface is unsuitable for our use case, because we will need to
>>> migrate more than 1 MB of state.  Now, that 1 MB limit has supposedly
>>> been chosen arbitrarily, but the introducing commit’s message says that
>>> it’s based on the idea that the data must be supplied basically
>>> immediately anyway (due to both dbus and qemu migration requirements),
>>> and I don’t think we can meet that requirement.
>> Yes, dbus-vmstate is the available today. It's independent of
>> vhost-user and VIRTIO.
> Once that we are here:
> - typical size of your state (either vhost-user or whatever)

Difficult to say, completely depends on the use case.  When identifying 
files by path and organizing them in a tree structure, probably ~48 
bytes per indexed file, plus, say, 16 bytes per open file.

So for a small shared filesystem, the state can be very small, but we’ll 
also have to prepare for cases where it is in the range of several MB.
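
As a back-of-the-envelope illustration of those figures (a sketch only:
the per-entry sizes are the rough estimates above, and the file counts
are made-up examples, not measurements):

```python
BYTES_PER_INDEXED_FILE = 48  # rough estimate from the text above
BYTES_PER_OPEN_FILE = 16     # rough estimate from the text above
DBUS_VMSTATE_LIMIT = 1 << 20  # the 1 MB limit discussed earlier, taken as 1 MiB


def estimate_state_bytes(indexed_files, open_files):
    """Estimate the serialized state size for a given number of files."""
    return (indexed_files * BYTES_PER_INDEXED_FILE
            + open_files * BYTES_PER_OPEN_FILE)


# Hypothetical small share: 1,000 indexed files, 50 of them open.
small = estimate_state_bytes(1_000, 50)

# Hypothetical large share: 100,000 indexed files, 1,000 of them open.
large = estimate_state_bytes(100_000, 1_000)

small_fits = small <= DBUS_VMSTATE_LIMIT
large_fits = large <= DBUS_VMSTATE_LIMIT
```

With these example counts the small share needs about 49 kB, while the
large one needs about 4.8 MB and no longer fits into 1 MB.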

The main problem isn’t size but that (when identifying files by path) 
we’ll probably want to construct the paths when migrating, which won’t 
be done instantaneously.

> - what are the possibilities that you can enter the iterative stage
>    negotiation (i.e. that you can create a dirty bitmap about your state)

Very good.  We should know when parts of the state (information about a 
specific indexed or open file) change.  (Exceptions apply, but they 
mostly come down to whether indexed files are identified by path or file 
handle, which is a choice the user will probably need to make.  Either 
one comes with caveats.)

[...]

>>> Of course, we could use a channel that completely bypasses qemu, but I
>>> think we’d like to avoid that if possible.  First, this would require
>>> adding functionality to virtiofsd to configure this channel.  Second,
>>> not storing the state in the central VM state means that migrating to
>>> file doesn’t work (well, we could migrate to a dedicated state file,
>>> but...).
> How much is migration to file used in practice?
> I would like to have some information here.
> It could be necessary probably to be able to encrypt it.  And that is a
> (different) whole can of worms.

I don’t think virtio-fs state needs to be encrypted any more than any 
other state.  It’ll basically just map FUSE inode IDs to a file in the 
shared directory, either via path or file handle; and then also track 
open(2) flags for opened files.  (At least that’s what’s currently on my 
radar.)  That information should actually be replicated in the guest, 
too (because it too will have mapped the filesystem paths to FUSE inode 
IDs), so isn’t more security relevant than guest memory.
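
To give an idea of the shape of that state, here is a minimal Python
sketch.  The class and field names (InodeEntry, OpenHandleEntry, and so
on) are hypothetical and chosen for illustration only; they are not
virtiofsd's actual data structures.

```python
import os
from dataclasses import dataclass
from typing import Optional


@dataclass
class InodeEntry:
    """Maps one FUSE inode ID to a file in the shared directory."""
    fuse_inode_id: int
    path: Optional[str] = None           # set when files are identified by path
    file_handle: Optional[bytes] = None  # set when files are identified by handle

    def __post_init__(self):
        # Exactly one of the two identification methods must be used.
        if (self.path is None) == (self.file_handle is None):
            raise ValueError("exactly one of path or file_handle must be set")


@dataclass
class OpenHandleEntry:
    """Tracks the open(2) flags of one opened file."""
    handle_id: int
    fuse_inode_id: int
    open_flags: int


inodes = {2: InodeEntry(2, path="docs/readme.txt")}
handles = {1: OpenHandleEntry(1, 2, os.O_RDWR | os.O_APPEND)}
```

Nothing in there goes beyond what the guest already knows about the
shared directory.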

Hanna




* Re: vhost-user (virtio-fs) migration: back end state
  2023-02-07  9:35     ` Hanna Czenczek
@ 2023-02-07 15:13       ` Juan Quintela
  0 siblings, 0 replies; 13+ messages in thread
From: Juan Quintela @ 2023-02-07 15:13 UTC (permalink / raw)
  To: Hanna Czenczek
  Cc: Stefan Hajnoczi, Stefan Hajnoczi, virtio-fs-list,
	qemu-devel@nongnu.org, Marc-André Lureau, Eugenio Pérez,
	Jason Wang, Michael S. Tsirkin, Dave Gilbert

Hanna Czenczek <hreitz@redhat.com> wrote:
> On 06.02.23 22:02, Juan Quintela wrote:
>> Stefan Hajnoczi <stefanha@gmail.com> wrote:
>>> On Mon, 6 Feb 2023 at 07:36, Hanna Czenczek <hreitz@redhat.com> wrote:
>>>> Hi Stefan,
>>>>
>>>> For true virtio-fs migration, we need to migrate the daemon’s (back
>>>> end’s) state somehow.  I’m addressing you because you had a talk on this
>>>> topic at KVM Forum 2021. :)
>>>>
>>>> As far as I understood your talk, the only standardized way to migrate a
>>>> vhost-user back end’s state is via dbus-vmstate.  I believe that
>>>> interface is unsuitable for our use case, because we will need to
>>>> migrate more than 1 MB of state.  Now, that 1 MB limit has supposedly
>>>> been chosen arbitrarily, but the introducing commit’s message says that
>>>> it’s based on the idea that the data must be supplied basically
>>>> immediately anyway (due to both dbus and qemu migration requirements),
>>>> and I don’t think we can meet that requirement.
>>> Yes, dbus-vmstate is the available today. It's independent of
>>> vhost-user and VIRTIO.
>> Once that we are here:
>> - typical size of your state (either vhost-user or whatever)
>
> Difficult to say, completely depends on the use case.  When
> identifying files by path and organizing them in a tree structure,
> probably ~48 bytes per indexed file, plus, say, 16 bytes per open
> file.
>
> So for a small shared filesystem, the state can be very small, but
> we’ll also have to prepare for cases where it is in the range of
> several MB.

That is not too bad.  Anything below a few tens of megabytes is easy to
manage.  Anything in the hundreds of megabytes or more really needs thought.

> The main problem isn’t size but that (when identifying files by path)
> we’ll probably want to construct the paths when migrating, which won’t
> be done instantaneously.
>
>> - what are the possibilities that you can enter the iterative stage
>>    negotiation (i.e. that you can create a dirty bitmap about your state)
>
> Very good.  We should know when parts of the state (information about
> a specific indexed or open file) changes.  (Exceptions apply, but they
> mostly come down to whether indexed files are identified by path or
> file handle, which is a choice the user will probably need to make. 
> Either one comes with caveats.)

That is good.

>> How much is migration to file used in practice?
>> I would like to have some information here.
>> It could be necessary probably to be able to encrypt it.  And that is a
>> (different) whole can of worms.
>
> I don’t think virtio-fs state needs to be encrypted any more than any
> other state.  It’ll basically just map FUSE inode IDs to a file in the
> shared directory, either via path or file handle; and then also track
> open(2) flags for opened files.  (At least that’s what’s currently on
> my radar.)  That information should actually be replicated in the
> guest, too (because it too will have mapped the filesystem paths to
> FUSE inode IDs), so isn’t more security relevant than guest memory.

Oh, that was not about virtio-fs at all.

It was because you talked about migration to file.

Right now, we need to use exec migration to do this, but it is clearly
suboptimal.  Basically we just do a normal migration, but that means
that we have a lot of duplicated pages on the destination.

But we can do better.  Just create a file that is as big as the memory,
and write every page in its own place.  So loading is going to be really
fast.  (Yes, holes in RAM are a different issue, but we can ignore that
for now.)
The other thing is that we really have to encrypt it somehow, so I
guess that a block cipher should work, but encryption is not my field of
expertise at all.
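
The fixed-offset layout can be sketched in a few lines of Python.  This
is only an illustration of the idea, not qemu code: PAGE_SIZE,
save_page() and load_page() are hypothetical names, the 4 KiB page size
is an assumption, and holes and encryption are ignored.

```python
import tempfile

PAGE_SIZE = 4096  # assumed page size for the example


def save_page(f, page_index, data):
    """Write one page at its fixed offset; a later write overwrites it in place."""
    if len(data) != PAGE_SIZE:
        raise ValueError("a page must be exactly PAGE_SIZE bytes")
    f.seek(page_index * PAGE_SIZE)
    f.write(data)


def load_page(f, page_index):
    """Read one page back from its fixed offset."""
    f.seek(page_index * PAGE_SIZE)
    return f.read(PAGE_SIZE)


with tempfile.TemporaryFile() as f:
    f.truncate(4 * PAGE_SIZE)          # file as big as the (tiny) example RAM
    save_page(f, 2, b"a" * PAGE_SIZE)
    save_page(f, 2, b"b" * PAGE_SIZE)  # page dirtied again: no duplicate in the file
    f.seek(0, 2)                       # seek to the end to learn the file size
    file_size = f.tell()
    page2 = load_page(f, 2)
    page0 = load_page(f, 0)            # never written, reads back as zeroes
```

The file never grows beyond the size of RAM, however many times a page
is sent.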

In the vhost-user-fs case, I fully agree with you that if you are
"exporting" part of the local filesystem, encryption doesn't buy you
anything.

Later, Juan.





* Re: vhost-user (virtio-fs) migration: back end state
  2023-02-06 16:27 ` Stefan Hajnoczi
  2023-02-06 21:02   ` Juan Quintela
@ 2023-02-07  9:08   ` Hanna Czenczek
  2023-02-07 12:29     ` Stefan Hajnoczi
  1 sibling, 1 reply; 13+ messages in thread
From: Hanna Czenczek @ 2023-02-07  9:08 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Stefan Hajnoczi, virtio-fs-list, qemu-devel@nongnu.org,
	Marc-André Lureau, Eugenio Pérez, Jason Wang,
	Michael S. Tsirkin, Dave Gilbert, Juan Quintela

On 06.02.23 17:27, Stefan Hajnoczi wrote:
> On Mon, 6 Feb 2023 at 07:36, Hanna Czenczek <hreitz@redhat.com> wrote:
>> Hi Stefan,
>>
>> For true virtio-fs migration, we need to migrate the daemon’s (back
>> end’s) state somehow.  I’m addressing you because you had a talk on this
>> topic at KVM Forum 2021. :)
>>
>> As far as I understood your talk, the only standardized way to migrate a
>> vhost-user back end’s state is via dbus-vmstate.  I believe that
>> interface is unsuitable for our use case, because we will need to
>> migrate more than 1 MB of state.  Now, that 1 MB limit has supposedly
>> been chosen arbitrarily, but the introducing commit’s message says that
>> it’s based on the idea that the data must be supplied basically
>> immediately anyway (due to both dbus and qemu migration requirements),
>> and I don’t think we can meet that requirement.
> Yes, dbus-vmstate is the available today. It's independent of
> vhost-user and VIRTIO.
>
>> Has there been progress on the topic of standardizing a vhost-user back
>> end state migration channel besides dbus-vmstate?  I’ve looked around
>> but didn’t find anything.  If there isn’t anything yet, is there still
>> interest in the topic?
> Not that I'm aware of. There are two parts to the topic of VIRTIO
> device state migration:
> 1. Defining an interface for migrating VIRTIO/vDPA/vhost/vhost-user
> devices. It doesn't need to be implemented in all these places
> immediately, but the design should consider that each of these
> standards will need to participate in migration sooner or later. It
> makes sense to choose an interface that works for all or most of these
> interfaces instead of inventing something vhost-user-specific.
> 2. Defining standard device state formats so VIRTIO implementations
> can interoperate.
>
>> Of course, we could use a channel that completely bypasses qemu, but I
>> think we’d like to avoid that if possible.  First, this would require
>> adding functionality to virtiofsd to configure this channel.  Second,
>> not storing the state in the central VM state means that migrating to
>> file doesn’t work (well, we could migrate to a dedicated state file,
>> but...).  Third, setting up such a channel after virtiofsd has sandboxed
>> itself is hard.  I guess we should set up the migration channel before
>> sandboxing, which constrains runtime configuration (basically this would
>> only allow us to set up a listening server, I believe).  Well, and
>> finally, it isn’t a standard way, which won’t be great if we’re planning
>> to add a standard way anyway.
> Yes, live migration is hard enough. Duplicating it is probably not
> going to make things better. It would still be necessary to support
> saving to file as well as live migration.
>
> There are two high-level approaches to the migration interface:
> 1. The whitebox approach where the vhost-user back-end implements
> device-specific messages to get/set migration state (e.g.
> VIRTIO_FS_GET_DEVICE_STATE with a struct virtio_fs_device_state
> containing the state of the FUSE session or multiple fine-grained
> messages that extract pieces of state). The hypervisor is responsible
> for the actual device state serialization.
> 2. The blackbox approach where the vhost-user back-end implements the
> device state serialization itself and just produces a blob of data.

Implementing this through device-specific messages sounds quite nice to 
me, and I think this would work for the blackbox approach, too. The 
virtio-fs device in qemu (the front end stub) would provide that data as 
its VM state then, right?

I’m not sure at this point whether it is sensible to define a 
device-specific standard for the state (i.e. the whitebox approach).  I 
think that it may be too rigid if we decide to extend it in the future.  
As far as I understand, the benefit is that it would allow for 
interoperability between different virtio-fs back end implementations, 
which isn’t really a concern right now.  If we need this in the future, 
I’m sure we can extend the protocol further to alternatively use 
standardized state.  (Which can easily be turned back into a blob if 
compatibility requires it.)

I think we’ll probably want a mix of both, where it is standardized that 
the state consists of information about each FUSE inode and each open 
handle, but that information itself is treated as a blob.

> An example of the whitebox approach is the existing vhost migration
> interface - except that it doesn't really support device-specific
> state, only generic virtqueue state.
>
> An example of the blackbox approach is the VFIO v2 migration interface:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/uapi/linux/vfio.h#n867
>
> Another aspect to consider is whether save/load is sufficient or if
> the full iterative migration model needs to be exposed by the
> interface. VFIO migration is an example of the full iterative model
> while dbus-vmstate is just save/load. Devices with large amounts of
> state need the full iterative model while simple devices just need
> save/load.

Yes, we will probably need an iterative model.  Splitting the state into 
information about each FUSE inode/handle (so that single inodes/handles 
can be updated if needed) should help accomplish this.

> Regarding virtiofs, I think the device state is not
> implementation-specific. Different implementations may have different
> device states (e.g. in-memory file system implementation versus POSIX
> file system-backed implementation), but the device state produced by
> https://gitlab.com/virtio-fs/virtiofsd can probably also be loaded by
> another implementation.

Difficult to say.  What seems universal to us now may well not be, 
because we’re just seeing our own implementation.  I think we’ll just 
serialize it in a way that makes sense to us now, and hope it’ll make 
sense to others too should the need arise.

> My suggestion is blackbox migration with a full iterative interface.
> The reason I like the blackbox approach is that a device's device
> state is encapsulated in the device implementation and does not
> require coordinating changes across other codebases (e.g. vDPA and
> vhost kernel interface, vhost-user protocol, QEMU, etc). A blackbox
> interface only needs to be defined and implemented once. After that,
> device implementations can evolve without constant changes at various
> layers.

Agreed.

> So basically, something like VFIO v2 migration but for vhost-user
> (with an eye towards vDPA and VIRTIO support in the future).
>
> Should we schedule a call with Jason, Michael, Juan, David, etc to
> discuss further? That way there's less chance of spending weeks
> working on something only to be asked to change the approach later.

Sure, sounds good!  I’ve taken a look into what state we’ll need to 
migrate already, but I’ll take a more detailed look now so that it’s 
clear what our requirements are.

Hanna




* Re: vhost-user (virtio-fs) migration: back end state
  2023-02-07  9:08   ` Hanna Czenczek
@ 2023-02-07 12:29     ` Stefan Hajnoczi
  2023-02-08 14:32       ` Stefan Hajnoczi
  0 siblings, 1 reply; 13+ messages in thread
From: Stefan Hajnoczi @ 2023-02-07 12:29 UTC (permalink / raw)
  To: Hanna Czenczek
  Cc: Stefan Hajnoczi, virtio-fs-list, qemu-devel@nongnu.org,
	Marc-André Lureau, Eugenio Pérez, Jason Wang,
	Michael S. Tsirkin, Dave Gilbert, Juan Quintela

On Tue, 7 Feb 2023 at 04:08, Hanna Czenczek <hreitz@redhat.com> wrote:
>
> On 06.02.23 17:27, Stefan Hajnoczi wrote:
> > On Mon, 6 Feb 2023 at 07:36, Hanna Czenczek <hreitz@redhat.com> wrote:
> >> Hi Stefan,
> >>
> >> For true virtio-fs migration, we need to migrate the daemon’s (back
> >> end’s) state somehow.  I’m addressing you because you had a talk on this
> >> topic at KVM Forum 2021. :)
> >>
> >> As far as I understood your talk, the only standardized way to migrate a
> >> vhost-user back end’s state is via dbus-vmstate.  I believe that
> >> interface is unsuitable for our use case, because we will need to
> >> migrate more than 1 MB of state.  Now, that 1 MB limit has supposedly
> >> been chosen arbitrarily, but the introducing commit’s message says that
> >> it’s based on the idea that the data must be supplied basically
> >> immediately anyway (due to both dbus and qemu migration requirements),
> >> and I don’t think we can meet that requirement.
> > Yes, dbus-vmstate is the available today. It's independent of
> > vhost-user and VIRTIO.
> >
> >> Has there been progress on the topic of standardizing a vhost-user back
> >> end state migration channel besides dbus-vmstate?  I’ve looked around
> >> but didn’t find anything.  If there isn’t anything yet, is there still
> >> interest in the topic?
> > Not that I'm aware of. There are two parts to the topic of VIRTIO
> > device state migration:
> > 1. Defining an interface for migrating VIRTIO/vDPA/vhost/vhost-user
> > devices. It doesn't need to be implemented in all these places
> > immediately, but the design should consider that each of these
> > standards will need to participate in migration sooner or later. It
> > makes sense to choose an interface that works for all or most of these
> > interfaces instead of inventing something vhost-user-specific.
> > 2. Defining standard device state formats so VIRTIO implementations
> > can interoperate.
> >
> >> Of course, we could use a channel that completely bypasses qemu, but I
> >> think we’d like to avoid that if possible.  First, this would require
> >> adding functionality to virtiofsd to configure this channel.  Second,
> >> not storing the state in the central VM state means that migrating to
> >> file doesn’t work (well, we could migrate to a dedicated state file,
> >> but...).  Third, setting up such a channel after virtiofsd has sandboxed
> >> itself is hard.  I guess we should set up the migration channel before
> >> sandboxing, which constrains runtime configuration (basically this would
> >> only allow us to set up a listening server, I believe).  Well, and
> >> finally, it isn’t a standard way, which won’t be great if we’re planning
> >> to add a standard way anyway.
> > Yes, live migration is hard enough. Duplicating it is probably not
> > going to make things better. It would still be necessary to support
> > saving to file as well as live migration.
> >
> > There are two high-level approaches to the migration interface:
> > 1. The whitebox approach where the vhost-user back-end implements
> > device-specific messages to get/set migration state (e.g.
> > VIRTIO_FS_GET_DEVICE_STATE with a struct virtio_fs_device_state
> > containing the state of the FUSE session or multiple fine-grained
> > messages that extract pieces of state). The hypervisor is responsible
> > for the actual device state serialization.
> > 2. The blackbox approach where the vhost-user back-end implements the
> > device state serialization itself and just produces a blob of data.
>
> Implementing this through device-specific messages sounds quite nice to
> me, and I think this would work for the blackbox approach, too. The
> virtio-fs device in qemu (the front end stub) would provide that data as
> its VM state then, right?

Yes. In the blackbox approach the QEMU vhost-user-fs device's vmstate
contains a blob field. The contents of the blob come from the
vhost-user-fs back-end and are not parsed/modified by QEMU.
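
As a rough Python sketch of that pass-through (an illustration under
stated assumptions, not QEMU's actual vmstate encoding): the front end
stores a length followed by the bytes it received and hands the same
bytes back on load. The 8-byte big-endian length prefix and the names
save_blob() and load_blob() are hypothetical.

```python
import io
import struct


def save_blob(stream, blob):
    """Front end on the source: store the back end's bytes without looking inside."""
    stream.write(struct.pack(">Q", len(blob)))
    stream.write(blob)


def load_blob(stream):
    """Front end on the destination: read the bytes back to hand to the back end."""
    header = stream.read(8)
    if len(header) != 8:
        raise ValueError("truncated length field")
    (length,) = struct.unpack(">Q", header)
    blob = stream.read(length)
    if len(blob) != length:
        raise ValueError("truncated blob")
    return blob


# The blob contents are made up; only the back end would know their meaning.
backend_state = b"\x01opaque back-end state\x00"
vmstate = io.BytesIO()
save_blob(vmstate, backend_state)
vmstate.seek(0)
restored = load_blob(vmstate)
```

The front end never needs to change when the back end changes what it
puts into the blob.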

> I’m not sure at this point whether it is sensible to define a
> device-specific standard for the state (i.e. the whitebox approach).  I
> think that it may be too rigid if we decide to extend it in the future.
> As far as I understand, the benefit is that it would allow for
> interoperability between different virtio-fs back end implementations,
> which isn’t really a concern right now.  If we need this in the future,
> I’m sure we can extend the protocol further to alternatively use
> standardized state.  (Which can easily be turned back into a blob if
> compatibility requires it.)
>
> I think we’ll probably want a mix of both, where it is standardized that
> the state consists of information about each FUSE inode and each open
> handle, but that information itself is treated as a blob.
>
> > An example of the whitebox approach is the existing vhost migration
> > interface - except that it doesn't really support device-specific
> > state, only generic virtqueue state.
> >
> > An example of the blackbox approach is the VFIO v2 migration interface:
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/uapi/linux/vfio.h#n867
> >
> > Another aspect to consider is whether save/load is sufficient or if
> > the full iterative migration model needs to be exposed by the
> > interface. VFIO migration is an example of the full iterative model
> > while dbus-vmstate is just save/load. Devices with large amounts of
> > state need the full iterative model while simple devices just need
> > save/load.
>
> Yes, we will probably need an iterative model.  Splitting the state into
> information about each FUSE inode/handle (so that single inodes/handles
> can be updated if needed) should help accomplish this.
>
> > Regarding virtiofs, I think the device state is not
> > implementation-specific. Different implementations may have different
> > device states (e.g. in-memory file system implementation versus POSIX
> > file system-backed implementation), but the device state produced by
> > https://gitlab.com/virtio-fs/virtiofsd can probably also be loaded by
> > another implementation.
>
> Difficult to say.  What seems universal to us now may well not be,
> because we’re just seeing our own implementation.  I think we’ll just
> serialize it in a way that makes sense to us now, and hope it’ll make
> sense to others too should the need arise.

When writing this I thought about the old QEMU C virtiofsd and the
current Rust virtiofsd. I'm pretty sure they could be made to migrate
between each other. We don't need to implement that, but it shows that
the device state is not specific to just 1 implementation.

> > My suggestion is blackbox migration with a full iterative interface.
> > The reason I like the blackbox approach is that a device's device
> > state is encapsulated in the device implementation and does not
> > require coordinating changes across other codebases (e.g. vDPA and
> > vhost kernel interface, vhost-user protocol, QEMU, etc). A blackbox
> > interface only needs to be defined and implemented once. After that,
> > device implementations can evolve without constant changes at various
> > layers.
>
> Agreed.
>
> > So basically, something like VFIO v2 migration but for vhost-user
> > (with an eye towards vDPA and VIRTIO support in the future).
> >
> > Should we schedule a call with Jason, Michael, Juan, David, etc to
> > discuss further? That way there's less chance of spending weeks
> > working on something only to be asked to change the approach later.
>
> Sure, sounds good!  I’ve taken a look into what state we’ll need to
> migrate already, but I’ll take a more detailed look now so that it’s
> clear what our requirements are.

Another thing that will be important is the exact interface for
iterative migration. VFIO v1 migration had some limitations and
changed semantics in v2. Learning from that would be good.

Stefan



* Re: vhost-user (virtio-fs) migration: back end state
  2023-02-07 12:29     ` Stefan Hajnoczi
@ 2023-02-08 14:32       ` Stefan Hajnoczi
  2023-02-08 14:34         ` Michael S. Tsirkin
  2023-02-08 15:58         ` Hanna Czenczek
  0 siblings, 2 replies; 13+ messages in thread
From: Stefan Hajnoczi @ 2023-02-08 14:32 UTC (permalink / raw)
  To: Hanna Czenczek
  Cc: Stefan Hajnoczi, virtio-fs-list, qemu-devel@nongnu.org,
	Marc-André Lureau, Eugenio Pérez, Jason Wang,
	Michael S. Tsirkin, Dave Gilbert, Juan Quintela

On Tue, 7 Feb 2023 at 07:29, Stefan Hajnoczi <stefanha@gmail.com> wrote:
> On Tue, 7 Feb 2023 at 04:08, Hanna Czenczek <hreitz@redhat.com> wrote:
> > On 06.02.23 17:27, Stefan Hajnoczi wrote:
> > > On Mon, 6 Feb 2023 at 07:36, Hanna Czenczek <hreitz@redhat.com> wrote:
> > > Should we schedule a call with Jason, Michael, Juan, David, etc to
> > > discuss further? That way there's less chance of spending weeks
> > > working on something only to be asked to change the approach later.
> >
> > Sure, sounds good!  I’ve taken a look into what state we’ll need to
> > migrate already, but I’ll take a more detailed look now so that it’s
> > clear what our requirements are.

Hi Hanna,
The next step is getting agreement on how the vhost-user device state
interface will work. Do you want to draft the new vhost-user protocol
messages and put together slides for discussion with Michael, Jason,
Juan, and David in the next KVM call? That might be the best way to
get agreement. Doing it via email is possible too but I guess it will
take longer.

If you don't want to design the vhost-user protocol changes yourself
then someone on this email thread can help with that.

Stefan



* Re: vhost-user (virtio-fs) migration: back end state
  2023-02-08 14:32       ` Stefan Hajnoczi
@ 2023-02-08 14:34         ` Michael S. Tsirkin
  2023-02-08 15:58         ` Hanna Czenczek
  1 sibling, 0 replies; 13+ messages in thread
From: Michael S. Tsirkin @ 2023-02-08 14:34 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Hanna Czenczek, Stefan Hajnoczi, virtio-fs-list,
	qemu-devel@nongnu.org, Marc-André Lureau, Eugenio Pérez,
	Jason Wang, Dave Gilbert, Juan Quintela

On Wed, Feb 08, 2023 at 09:32:05AM -0500, Stefan Hajnoczi wrote:
> On Tue, 7 Feb 2023 at 07:29, Stefan Hajnoczi <stefanha@gmail.com> wrote:
> > On Tue, 7 Feb 2023 at 04:08, Hanna Czenczek <hreitz@redhat.com> wrote:
> > > On 06.02.23 17:27, Stefan Hajnoczi wrote:
> > > > On Mon, 6 Feb 2023 at 07:36, Hanna Czenczek <hreitz@redhat.com> wrote:
> > > > Should we schedule a call with Jason, Michael, Juan, David, etc to
> > > > discuss further? That way there's less chance of spending weeks
> > > > working on something only to be asked to change the approach later.
> > >
> > > Sure, sounds good!  I’ve taken a look into what state we’ll need to
> > > migrate already, but I’ll take a more detailed look now so that it’s
> > > clear what our requirements are.
> 
> Hi Hanna,
> The next step is getting agreement on how the vhost-user device state
> interface will work. Do you want to draft the new vhost-user protocol
> messages and put together slides for discussion with Michael, Jason,
> Juan, and David in the next KVM call? That might be the best way to
> get agreement. Doing it via email is possible too but I guess it will
> take longer.

Let's get a proposal on list first please. If there's a disagreement
we can do a call too, but give everyone time to review.

> If you don't want to design the vhost-user protocol changes yourself
> then someone on this email thread can help with that.
> 
> Stefan




* Re: vhost-user (virtio-fs) migration: back end state
  2023-02-08 14:32       ` Stefan Hajnoczi
  2023-02-08 14:34         ` Michael S. Tsirkin
@ 2023-02-08 15:58         ` Hanna Czenczek
  2023-02-08 16:32           ` Stefan Hajnoczi
  1 sibling, 1 reply; 13+ messages in thread
From: Hanna Czenczek @ 2023-02-08 15:58 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Stefan Hajnoczi, virtio-fs-list, qemu-devel@nongnu.org,
	Marc-André Lureau, Eugenio Pérez, Jason Wang,
	Michael S. Tsirkin, Dave Gilbert, Juan Quintela

On 08.02.23 15:32, Stefan Hajnoczi wrote:
> On Tue, 7 Feb 2023 at 07:29, Stefan Hajnoczi <stefanha@gmail.com> wrote:
>> On Tue, 7 Feb 2023 at 04:08, Hanna Czenczek <hreitz@redhat.com> wrote:
>>> On 06.02.23 17:27, Stefan Hajnoczi wrote:
>>>> On Mon, 6 Feb 2023 at 07:36, Hanna Czenczek <hreitz@redhat.com> wrote:
>>>> Should we schedule a call with Jason, Michael, Juan, David, etc to
>>>> discuss further? That way there's less chance of spending weeks
>>>> working on something only to be asked to change the approach later.
>>> Sure, sounds good!  I’ve taken a look into what state we’ll need to
>>> migrate already, but I’ll take a more detailed look now so that it’s
>>> clear what our requirements are.
> Hi Hanna,
> The next step is getting agreement on how the vhost-user device state
> interface will work. Do you want to draft the new vhost-user protocol
> messages and put together slides for discussion with Michael, Jason,
> Juan, and David in the next KVM call? That might be the best way to
> get agreement. Doing it via email is possible too but I guess it will
> take longer.
>
> If you don't want to design the vhost-user protocol changes yourself
> then someone on this email thread can help with that.

I’ll need to talk about the whole thing to Stefano and German first 
(we’re collaborating on virtio-fs migration, looking at different 
aspects of it).  Also, I think I’ll want to look into the code a bit 
first and fiddle around to get a working prototype so I get an idea of 
what might be feasible at all.  I wouldn’t want to propose something 
that actually can’t work when I try to make it work in practice. O:)

Hanna




* Re: vhost-user (virtio-fs) migration: back end state
  2023-02-08 15:58         ` Hanna Czenczek
@ 2023-02-08 16:32           ` Stefan Hajnoczi
  0 siblings, 0 replies; 13+ messages in thread
From: Stefan Hajnoczi @ 2023-02-08 16:32 UTC (permalink / raw)
  To: Hanna Czenczek
  Cc: Stefan Hajnoczi, virtio-fs-list, qemu-devel@nongnu.org,
	Marc-André Lureau, Eugenio Pérez, Jason Wang,
	Michael S. Tsirkin, Dave Gilbert, Juan Quintela

On Wed, 8 Feb 2023 at 10:58, Hanna Czenczek <hreitz@redhat.com> wrote:
>
> On 08.02.23 15:32, Stefan Hajnoczi wrote:
> > On Tue, 7 Feb 2023 at 07:29, Stefan Hajnoczi <stefanha@gmail.com> wrote:
> >> On Tue, 7 Feb 2023 at 04:08, Hanna Czenczek <hreitz@redhat.com> wrote:
> >>> On 06.02.23 17:27, Stefan Hajnoczi wrote:
> >>>> On Mon, 6 Feb 2023 at 07:36, Hanna Czenczek <hreitz@redhat.com> wrote:
> >>>> Should we schedule a call with Jason, Michael, Juan, David, etc to
> >>>> discuss further? That way there's less chance of spending weeks
> >>>> working on something only to be asked to change the approach later.
> >>> Sure, sounds good!  I’ve taken a look into what state we’ll need to
> >>> migrate already, but I’ll take a more detailed look now so that it’s
> >>> clear what our requirements are.
> > Hi Hanna,
> > The next step is getting agreement on how the vhost-user device state
> > interface will work. Do you want to draft the new vhost-user protocol
> > messages and put together slides for discussion with Michael, Jason,
> > Juan, and David in the next KVM call? That might be the best way to
> > get agreement. Doing it via email is possible too but I guess it will
> > take longer.
> >
> > If you don't want to design the vhost-user protocol changes yourself
> > then someone on this email thread can help with that.
>
> I’ll need to talk about the whole thing to Stefano and German first
> (we’re collaborating on virtio-fs migration, looking at different
> aspects of it).  Also, I think I’ll want to look into the code a bit
> first and fiddle around to get a working prototype so I get an idea of
> what might be feasible at all.  I wouldn’t want to propose something
> that actually can’t work when I try to make it work in practice. O:)

Okay.

The specifics of the virtiofs device state don't need much discussion
here. Only the vhost-user protocol changes need agreement because they
are generic infrastructure that will affect other vhost-user device
types, vDPA, and probably core VIRTIO. So don't worry about defining
what each piece of virtiofs state needs to be at this point. I guess
you only need a rough idea in order to come up with the vhost-user
protocol changes.

Stefan



Thread overview: 13+ messages
2023-02-06 12:35 vhost-user (virtio-fs) migration: back end state Hanna Czenczek
2023-02-06 16:27 ` Stefan Hajnoczi
2023-02-06 21:02   ` Juan Quintela
2023-02-06 21:16     ` Stefan Hajnoczi
2023-02-06 23:58       ` Juan Quintela
2023-02-07  9:35     ` Hanna Czenczek
2023-02-07 15:13       ` Juan Quintela
2023-02-07  9:08   ` Hanna Czenczek
2023-02-07 12:29     ` Stefan Hajnoczi
2023-02-08 14:32       ` Stefan Hajnoczi
2023-02-08 14:34         ` Michael S. Tsirkin
2023-02-08 15:58         ` Hanna Czenczek
2023-02-08 16:32           ` Stefan Hajnoczi
