qemu-devel.nongnu.org archive mirror
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
To: Stefan Hajnoczi <stefanha@redhat.com>
Cc: "John G Johnson" <john.g.johnson@oracle.com>,
	mtsirkin@redhat.com, "Daniel P. Berrangé" <berrange@redhat.com>,
	quintela@redhat.com, "Jason Wang" <jasowang@redhat.com>,
	"Felipe Franciosi" <felipe@nutanix.com>,
	"Kirti Wankhede" <kwankhede@nvidia.com>,
	qemu-devel@nongnu.org,
	"Alex Williamson" <alex.williamson@redhat.com>,
	"Thanos Makatos" <thanos.makatos@nutanix.com>,
	"Paolo Bonzini" <pbonzini@redhat.com>
Subject: Re: VFIO Migration
Date: Wed, 4 Nov 2020 17:32:02 +0000
Message-ID: <20201104173202.GG3896@work-vm>
In-Reply-To: <20201104164744.GC425016@stefanha-x1.localdomain>

* Stefan Hajnoczi (stefanha@redhat.com) wrote:
> On Wed, Nov 04, 2020 at 10:14:23AM +0000, Dr. David Alan Gilbert wrote:
> > * Stefan Hajnoczi (stefanha@redhat.com) wrote:
> > > On Tue, Nov 03, 2020 at 06:49:51PM +0000, Dr. David Alan Gilbert wrote:
> > > > * Stefan Hajnoczi (stefanha@redhat.com) wrote:
> > > > > On Tue, Nov 03, 2020 at 12:17:09PM +0000, Dr. David Alan Gilbert wrote:
> > > > > > * Stefan Hajnoczi (stefanha@redhat.com) wrote:
> > > > > > > Device Models
> > > > > > > -------------
> > > > > > > Devices have a *hardware interface* consisting of hardware registers,
> > > > > > > interrupts, and so on.
> > > > > > > 
> > > > > > > The hardware interface together with the device state representation is called
> > > > > > > a *device model*. Device models can be assigned URIs such as
> > > > > > > https://qemu.org/devices/e1000e to uniquely identify them.
> > > > > > 
> > > > > > I think this is a unique identifier, not actually a URI; the https://
> > > > > > isn't needed since no one expects to ever connect to this.
> > > > > 
> > > > > Yes, it could be any unique string. If the URI idea is not popular we
> > > > > can use any similar scheme.
> > > > 
> > > > I'm OK with it being a URI; just drop the https.
> > > 
> > > Okay.
> > > 
> > > > > > > However, secondary aspects related to the physical port may affect the device's
> > > > > > > hardware interface and need to be reflected in the device configuration. The
> > > > > > > link speed may depend on the physical port and be reported through the device's
> > > > > > > hardware interface. In that case a ``link-speed`` configuration parameter is
> > > > > > > required to prevent unexpected changes to the link speed after migration.
> > > > > > 
> > > > > > That's an interesting example; because depending on the device, it might
> > > > > > be:
> > > > > >     a) Completely virtualised so that the guest *shouldn't* know what
> > > > > > the physical link speed is, precisely to allow the physical network on
> > > > > > the destination to be different.
> > > > > > 
> > > > > >     b) Part of the migrated state
> > > > > > 
> > > > > >     c) Something that's allowed to be reloaded after migration
> > > > > > 
> > > > > >     d) Configurable
> > > > > > 
> > > > > > so I'm not sure whether it's a good example in this case or not.
> > > > > 
> > > > > Can you think of an example that has only one option?
> > > > > 
> > > > > I tried but couldn't. For example take a sound card. The guest is aware
> > > > > the device supports stereo playback (2 output channels), but which exact
> > > > > stereo host device is used doesn't matter, they are all suitable.
> > > > > 
> > > > > Now imagine migrating to a 7.1 surround-sound device. Similar options
> > > > > come into play:
> > > > > 
> > > > > a) Emulate stereo and mix it to 7.1 surround-sound on the physical
> > > > >    device. The guest still sees the stereo device.
> > > > > 
> > > > > b) Refuse migration.
> > > > > 
> > > > > c) Indicate that the output has switched and let the guest reconfigure
> > > > >    itself (e.g. a sound card with multiple outputs, where one of them is
> > > > >    stereo and another is 7.1 surround sound).
> > > > > 
> > > > > Which option is desirable depends on the use case.
> > > > 
> > > > Yes, but I think it might be worth calling out these differences;  there
> > > > are explicitly cases where you don't want external changes to be visible
> > > > and other cases where you do; both are valid, but both need thinking
> > > > about. (Another one, GPU whether you have a monitor plugged in!)
> > > 
> > > Okay.
> > > 
> > > > > > Maybe what's needed is a stronger instruction to abstract external
> > > > > > device state so that it's not part of the configuration in most cases.
> > > > > 
> > > > > Do you want to propose something?
> > > > 
> > > > I think something like 'Some part of a device's state may be irrelevant
> > > > to a migration; for example on some NICs it might be preferable to hide
> > > > the physical characteristics of the link from the guest.'
> > > 
> > > Got it.
> > > 
> > > > > > > For example, if address filtering support was added to a network card then
> > > > > > > device versions and the corresponding configurations may look like this:
> > > > > > > * ``version=1`` - Behaves as if ``rx-filter-size=0``
> > > > > > > * ``version=2`` - ``rx-filter-size=32``
> > > > > > 
> > > > > > Note configuration parameters might have been added during the life of
> > > > > > the device; e.g. if the original card had no support for rx-filters, it
> > > > > > might not have a rx-filter-size parameter.
> > > > > 
> > > > > version=1 does not explicitly set rx-filter-size=0. When a new parameter
> > > > > is introduced it must have a default value that disables its effect on
> > > > > the hardware interface and/or device state representation. This is
> > > > > described in a bit more detail in the next section, maybe it should be
> > > > > reordered.
> > > > 
> > > > We've generally found the definition of devices tends in practice to be
> > > > done newer->older; i.e. you define the current machine, and then define
> > > > the next older machine setting the defaults that used to be true; then
> > > > define the older version behind that....
> > > 
> > > That is not possible here because an older device implementation is
> > > unaware of new configuration parameters.
> > > 
> > > Looking at the example above, imagine a version=1 device is instantiated
> > > on a device implementation that supports both version=1 and version=2.
> > > Should the configuration parameter list for version=1 be empty or
> > > rx-filter-size=0?
> > > 
> > > It must be empty, otherwise an older device implementation that only
> > > supports version=1 cannot instantiate the device. The older device
> > > implementation does not recognize the rx-filter-size configuration
> > > parameter (it was introduced in version=2) so we cannot set it to 0.
> > 
> > I think this question might come down to who expands the device version
> > definition.
> > If it's the device itself that expands that, then a version 2 device
> > knows about what it needs to do for version 1 compatibility.
> > But if you're saying someone outside the device needs to be able to
> > expand that list then I'm not sure how you'd keep that expansion in line
> > with the implementation of a device.
> 
> The current approach is that the version is expanded into configuration
> parameters when the device is instantiated. Those parameters are then
> used to check migration compatibility of the destination (versions don't
> play a role once the device has been created).
> 
> Michael replied in another sub-thread wondering if versions are really
> necessary since tools do the migration checks. Let's try dropping
> versions to simplify things. We can bring them back if needed later.
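
As an illustration of the approach quoted above (a version is expanded
into configuration parameters at instantiation, and those parameters
drive the compatibility check), here is a minimal Python sketch. The
table, the function names and the "destination must recognize every
parameter" rule are assumptions made for the example, not an existing
QEMU or VFIO interface. A real check would presumably also compare
parameter values.

```python
# Hypothetical table: each version lists only the parameters it sets
# explicitly. version=1 sets nothing, so an implementation that predates
# rx-filter-size can still instantiate it.
VERSION_PARAMS = {
    1: {},
    2: {"rx-filter-size": 32},
}


def expand_version(version):
    """Expand a version number into explicit configuration parameters."""
    return dict(VERSION_PARAMS[version])


def can_migrate(source_params, dest_known_params):
    """Accept only if the destination recognizes every source parameter."""
    return all(name in dest_known_params for name in source_params)


old_impl = set()               # implementation that only knows version=1
new_impl = {"rx-filter-size"}  # implementation that also knows version=2

print(can_migrate(expand_version(1), old_impl))  # True
print(can_migrate(expand_version(2), old_impl))  # False
print(can_migrate(expand_version(1), new_impl))  # True
```

If versions are dropped, only can_migrate() remains and something
outside the device has to supply the parameter list.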

What does a user-facing tool do?  If I say I want one of these NICs
and I'm on the latest QEMU machine type, who sets all these parameters?

Dave

> > > > > > > Device States
> > > > > > > -------------
> > > > > > > The details of the device state representation are not covered in this document
> > > > > > > but the general requirements are discussed here.
> > > > > > > 
> > > > > > > The device state consists of data accessible through the device's hardware
> > > > > > > interface and internal state that is needed to restore device operation.
> > > > > > > State in the hardware interface includes the values of hardware registers.
> > > > > > > An example of internal state is an index value needed to avoid processing
> > > > > > > queued requests more than once.
> > > > > > 
> > > > > > I try and emphasise that 'internal state' should be represented in a way
> > > > > > that reflects the problem rather than the particular implementation;
> > > > > > this gives it a better chance of migrating to future versions.
> > > > > 
> > > > > Sounds like a good idea.
> > > > > 
> > > > > > > Changes can be made to the device state representation as follows. Each change
> > > > > > > to device state must have a corresponding device configuration parameter that
> > > > > > > > allows the change to be toggled:
> > > > > > > 
> > > > > > > * When the parameter is disabled the hardware interface and device state
> > > > > > >   representation are unchanged. This allows old device states to be loaded.
> > > > > > > 
> > > > > > > * When the parameter is enabled the change comes into effect.
> > > > > > > 
> > > > > > > * The parameter's default value disables the change. Therefore old versions do
> > > > > > >   not have to explicitly specify the parameter.
> > > > > > > 
> > > > > > > The following example illustrates migration from an old device
> > > > > > > implementation to a new one. A version=1 network card is migrated to a
> > > > > > > new device implementation that is also capable of version=2 and adds the
> > > > > > > rx-filter-size=32 parameter. The new device is instantiated with
> > > > > > > version=1, which disables rx-filter-size and is capable of loading the
> > > > > > > version=1 device state. The migration completes successfully but note
> > > > > > > the device is still operating at version=1 level in the new device.
> > > > > > > 
> > > > > > > The following example illustrates migration from a new device
> > > > > > > implementation back to an older one. The new device implementation
> > > > > > > supports version=1 and version=2. The old device implementation supports
> > > > > > > version=1 only. Therefore the device can only be migrated when
> > > > > > > instantiated with version=1 or the equivalent full configuration
> > > > > > > parameters.
> > > > > > 
> > > > > > I'm sometimes asked for 'ways out' of buggy migration cases; e.g. what
> > > > > > happens if version=1 forgot to migrate the X register; or what happens
> > > > > > > if version=1 forgot to handle the special, rare case when X=5 and we
> > > > > > now need to migrate some extra state.
> > > > > 
> > > > > Can these cases be handled by adding additional configuration parameters?
> > > > > 
> > > > > > If version=1 lacks essential state then version=2 can add it. The
> > > > > > user must configure the device to use version=2 before they can save
> > > > > > the full state.
> > > > > 
> > > > > If version=1 didn't handle the X=5 case then the same solution is
> > > > > needed. A new configuration parameter is introduced and the user needs
> > > > > to configure the device to be the new version before migrating.
> > > > > 
> > > > > Unfortunately this requires poweroff or hotplugging a new device
> > > > > > instance. But some disruption is probably necessary anyway so the
> > > > > migration code on the host side can be patched to use the updated device
> > > > > state representation.
> > > > 
> > > > There are some corner cases that people sometimes prefer; for example
> > > > > let's say the X=5 case is actually really rare - but when it happens the
> > > > device is hopelessly broken, some device authors prefer to fix it and
> > > > send the extra data and let the migration fail if the destination
> > > > doesn't understand it (it would break anyway).
> > > 
> > > The device implementation needs to be updated to send the extra data. At
> > > that point a new device configuration parameter should be introduced and
> > > if the user wishes to run the new version of the device then the extra
> > > data will be sent.
> > > 
> > > If the destination doesn't support the new parameter then migration will
> > > be refused. That matches what you've described, so I think the approach
> > > in this document handles this case.
> > 
> > Well that's the ideal; but the case I'm describing is where you're
> > recovering from a screwup in which the migration is going to fail in a
> > rare (runtime defined) corner case, and only sending the extra data
> > in that rare case before you get a chance to define a new version.
> 
> You need to upgrade the migration code in order to produce that extra
> data. Why not define a configuration parameter alongside this code
> change?
> 
> > > > I've also been asked
> > > > by mst for an 'unexpected data' mechanism to send data that the
> > > > destination might not expect if it didn't know about it, for similar
> > > > cases.
> > > 
> > > Do you mean optional data that can be more or less safely dropped? A new
> > > device configuration parameter is not needed because the hardware
> > > interface and device state representation remain compatible. That
> > > feature can be defined in the device state representation spec and is
> > > not visible at the layer discussed in this document. But I think it's
> > > worth adding an explanation into this document explaining what to do.
> > 
> > I mean a way to send optional data that the destination can drop; but
> > that the destination doesn't know what it means and at the time the
> > destination was written, wasn't yet defined. It is part of the device
> > state;  it's similar to the X=5 case above - but in this case it allows
> > the migration not to fail even when you start sending the extra data.
> 
> The device state representation may have a way of sending optional data.
> Since it just gets dropped if the destination doesn't recognize it there
> is no need to introduce a configuration parameter and it doesn't play a
> part in migration compatibility checks.
> 
> Stefan
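
For illustration, optional data that an unaware destination can drop
could be carried as tagged sections. The sketch below is a minimal
Python example under assumed conventions: the 4-byte section header
(tag, flags, length), the OPTIONAL flag bit and the function names are
all hypothetical and do not describe QEMU's migration stream or any
VFIO device state format.

```python
import struct

OPTIONAL = 0x01  # hypothetical flag: destination may drop this section


def pack_section(tag, flags, payload):
    """Serialize one section: 1-byte tag, 1-byte flags, 2-byte length."""
    return struct.pack(">BBH", tag, flags, len(payload)) + payload


def load_state(blob, known_tags):
    """Return {tag: payload} for known sections, skipping optional unknowns."""
    state, offset = {}, 0
    while offset < len(blob):
        tag, flags, length = struct.unpack_from(">BBH", blob, offset)
        offset += 4
        payload = blob[offset:offset + length]
        offset += length
        if tag in known_tags:
            state[tag] = payload
        elif not flags & OPTIONAL:
            # Unknown mandatory data: fail the migration.
            raise ValueError(f"unknown mandatory section {tag}")
        # Unknown optional sections are silently dropped.
    return state


blob = pack_section(1, 0, b"regs") + pack_section(9, OPTIONAL, b"extra")
print(load_state(blob, known_tags={1}))  # {1: b'regs'}
```

Clearing the OPTIONAL flag on the extra section gives the other
behaviour discussed in this thread: the load fails on a destination
that does not understand the data.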


-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


