From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
To: Stefan Hajnoczi <stefanha@redhat.com>
Cc: "John G Johnson" <john.g.johnson@oracle.com>,
	"Tian, Kevin" <kevin.tian@intel.com>,
	mtsirkin@redhat.com, "Daniel P. Berrangé" <berrange@redhat.com>,
	quintela@redhat.com, "Jason Wang" <jasowang@redhat.com>,
	"Zeng, Xin" <xin.zeng@intel.com>,
	qemu-devel@nongnu.org, "Yan Zhao" <yan.y.zhao@intel.com>,
	"Kirti Wankhede" <kwankhede@nvidia.com>,
	"Paolo Bonzini" <pbonzini@redhat.com>,
	"Alex Williamson" <alex.williamson@redhat.com>,
	"Gerd Hoffmann" <kraxel@redhat.com>,
	"Felipe Franciosi" <felipe@nutanix.com>,
	"Christophe de Dinechin" <dinechin@redhat.com>,
	"Thanos Makatos" <thanos.makatos@nutanix.com>
Subject: Re: [RFC v3] VFIO Migration
Date: Wed, 11 Nov 2020 15:41:59 +0000
Message-ID: <20201111154159.GG3232@work-vm>
In-Reply-To: <20201111153438.GD1421166@stefanha-x1.localdomain>

* Stefan Hajnoczi (stefanha@redhat.com) wrote:
> On Wed, Nov 11, 2020 at 12:56:26PM +0000, Dr. David Alan Gilbert wrote:
> > * Stefan Hajnoczi (stefanha@redhat.com) wrote:
> > > Orchestrating Migrations
> > > ------------------------
> > > In order to migrate a device a *migration parameter list* must first be built
> > > on the source. Each migration parameter is added to the list if it is in
> > > effect. For example, the migration parameter list for a device with
> > > new-feature=off,num-queues=4 would be num-queues=4 if the new-feature migration
> > > parameter was introduced with the off value disabling its effect.
> > 
> > What component builds that list (i.e. what component needs to know the
> > history that new-feature=off was the default - ah I think you answer
> > that below).
> 
> Yep. Thanks for noting this. I'll need to reorder things so it is clear.
> 
> > > The following conditions must be met to establish migration compatibility:
> > > 
> > > 1. The source and destination device model strings match.
> > > 
> > > 2. Each migration parameter name from the migration parameter list is supported
> > >    by the destination. For example, the destination supports the num-queues
> > >    migration parameter.
> > > 
> > > 3. Each migration parameter value from the migration parameter list is
> > >    supported by the destination. For example, the destination supports
> > >    num-queues=4.
> > 
> > Hmm, are combinations of parameter checks needed - i.e. is it possible
> > that a destination supports num-queues=4 and new-feature=on/off -
> > but only supports new-feature=on when num-queues>2?
> 
> Yes, it's possible but cannot be expressed in the migration info JSON.
> 
> We need to choose a level of expressiveness that will be useful enough
> without being complex. In the extreme the migration info would contain
> Turing complete validation expressions (e.g. JavaScript) so that any
> relationship can be expressed, but I doubt that complexity is needed.
> The other extreme is just booleans and (opaque) strings for maximum
> simplicity.
> 
> If the syntax is not expressive enough then it's impossible to check
> migration compatibility without actually creating a new device instance
> on the destination. Daniel Berrange raised the requirement of checking
> migration compatibility without creating the device since this helps
> with selecting a migration destination.

Right, but my worry isn't the JSON description, it's the set of 3
conditions above; they need to state that only some combinations need to
be valid.
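
To make the concern concrete, here is a minimal Python sketch of the three
conditions plus a combination check. It is not from the RFC: the dict layout
of the destination's migration info (keyed by parameter name) and the
dst_constraints predicates are assumptions made for illustration. Only the
"allowed_values" member and its "<min>-<max>" range strings come from the text
quoted further down.

```python
import re

_RANGE = re.compile(r"^(-?\d+)-(-?\d+)$")


def value_allowed(value, allowed_values):
    """Return True if value matches an entry of an "allowed_values" list."""
    if allowed_values is None:  # member absent: any value may be tried
        return True
    for entry in allowed_values:
        if isinstance(entry, str) and not isinstance(value, str):
            # A string entry can only match a non-string value as a range.
            m = _RANGE.match(entry)
            if m and isinstance(value, int) and not isinstance(value, bool):
                if int(m.group(1)) <= value <= int(m.group(2)):
                    return True
            continue
        # Compare types too, so that True does not match the integer 1.
        if type(entry) is type(value) and entry == value:
            return True
    return False


def check_compat(src_model, src_params, dst_model, dst_info, dst_constraints=()):
    """Return the reasons why migration is not possible (empty list = OK)."""
    problems = []
    if src_model != dst_model:  # condition 1
        problems.append(f"model mismatch: {src_model} != {dst_model}")
    for name, value in src_params.items():
        if name not in dst_info:  # condition 2
            problems.append(f"unsupported parameter: {name}")
        elif not value_allowed(value, dst_info[name].get("allowed_values")):
            problems.append(f"unsupported value: {name}={value}")  # condition 3
    # Hypothetical extension: predicates over the whole parameter list.
    for description, predicate in dst_constraints:
        if not predicate(src_params):
            problems.append(f"unsupported combination: {description}")
    return problems


# Assumed destination description (layout is illustrative only).
dst_info = {
    "num-queues": {"allowed_values": ["1-8"]},
    "new-feature": {"allowed_values": [True, False]},
}
constraints = [
    ("new-feature=on needs num-queues>2",
     lambda p: not p.get("new-feature", False) or p.get("num-queues", 0) > 2),
]
```

With these inputs, num-queues=2,new-feature=on passes conditions 1-3 and is
only rejected by the extra combination predicate, which is exactly the case
the three conditions as written cannot express.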

> 
> > > The migration compatibility check can be performed without initiating a
> > > migration. Therefore, this process can be used to select the migration
> > > destination.
> > > 
> > > The following steps perform the migration:
> > > 
> > > 1. Configure the destination so it is prepared to load the device state,
> > >    including applying the migration parameter list. This may involve
> > >    instantiating a new device instance or resetting an existing device instance
> > >    to a configuration that is compatible with the source.
> > > 
> > >    The details of how to do this for VFIO/mdev drivers and vfio-user device
> > >    backend programs are described below.
> > > 
> > > 2. Save the device state on the source and load it on the destination.
> > 
> > Which is true for almost everything, unless it turned out to have
> > significant amounts of RAM on board; do we have a way to deal with that
> > for vfio/vhost-user - where it needs to be iterative? (Let's just ignore
> > this for now)
> 
> Step 2 includes iterative migration. I should have mentioned that in the
> document.

OK.

> > > "allowed_values"
> > >   The list of all values that the device implementation accepts for this migration
> > >   parameter. Integer ranges can be described using "<min>-<max>" strings.
> > > 
> > >   Examples: ['a', 'b', 'c'], [1, 5, 7], ['0-255', 512, '1024-2048'], [true]
> > > 
> > >   This member is optional. When absent, any value suitable for the type may be
> > >   given but the device implementation may refuse certain values.
> > 
> > JSON isn't a great choice for specifying ranges of integers
> 
> Agreed :)
> 
> > > The device is instantiated by launching the destination process with the
> > > migration parameter list from the source:
> > > 
> > > .. code:: bash
> > > 
> > >   $ my-device --m-<param1>=<value1> --m-<param2> <value2> [...]
> > > 
> > > This example shows how to instantiate the device with migration parameters
> > > ``param1`` and ``param2``. Both ``--m-<param>=<value>`` and ``--m-<param>
> > > <value>`` option formats are accepted.
> > > 
> > > The ``--m-`` prefix is used to allow the device emulation program to implement
> > > device implementation-specific command-line options without conflicting with
> > > the migration parameter namespace.
> > 
> > That feels like an odd syntax to me.
> 
> Unfortunately we cannot use --<param>. I also considered using a JSON
> input file but that makes it harder to invoke the device emulation
> program manually for testing/development. I bet I'd have to look up the
> JSON syntax every time whereas it's easy to remember how to format a
> command-line parameter.
> 
> The other one I considered was using '--' or another marker to separate
> device implementation-specific command-line arguments from migration
> parameters. However, doing so places requirements on the device
> emulation program's command-line parsing library and I think people will
> be unhappy if their favorite Go, Rust, Python, etc library cannot handle
> the command-line options due to our weird syntax.
> 
> Any ideas for a better syntax?

I'd be happy with a repeated --param name=value, but I also know that
some option parsers don't like that.
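
For what it's worth, Python's argparse is one parser that copes with the
repeated form. The sketch below is only an illustration: the --param spelling
is the suggestion above, not something the RFC defines, and both helper names
are made up. Values stay as strings; typing them would need the migration
info.

```python
import argparse


def parse_migration_params(argv):
    """Split argv into migration parameters and device-specific arguments."""
    parser = argparse.ArgumentParser(prog="my-device", allow_abbrev=False)
    parser.add_argument("--param", action="append", default=[],
                        metavar="NAME=VALUE",
                        help="migration parameter; may be given repeatedly")
    # Options the parser does not know are returned untouched, so
    # device implementation-specific options can be handled elsewhere.
    args, device_args = parser.parse_known_args(argv)
    params = {}
    for item in args.param:
        name, sep, value = item.partition("=")
        if not sep or not name:
            parser.error(f"expected NAME=VALUE, got {item!r}")
        params[name] = value
    return params, device_args


def format_migration_params(params):
    """Build the matching argv fragment from a name -> string value mapping."""
    argv = []
    for name, value in params.items():
        argv += ["--param", f"{name}={value}"]
    return argv
```

For example, parse_migration_params(["--param", "num-queues=4", "--verbose"])
yields the num-queues parameter and leaves --verbose for the device's own
option handling.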

> > > When preparing for migration on the source, each migration parameter from the
> > > migration info JSON is added to the migration parameter list if its value
> > > differs from "off_value". If a migration parameter in the list is not available
> > > on the destination, then migration is not possible. If a migration parameter
> > > value is not in the "allowed_values" of the destination's migration_info.json,
> > > then migration is not possible.
> > > 
> > > On the destination, a command-line is generated from the migration parameter
> > > list. For each destination migration parameter missing from the migration
> > > parameter list a command-line option is added with the destination "off_value".
> > > The device emulation program prints an error message to standard error and
> > > terminates with exit status 1 if the device could not be instantiated.
> > 
> > I still don't think this revision answers the question of how a VM
> > management program picks a sane set of parameter values for a new VM
> > it's creating, especially if it wants it to be migratable.  That's
> > something your version stuff in V1 seemed nice for.
> 
> Good point. If we're creating a VM and expect to migrate between two
> device implementations, how do we choose the migration parameters?
> 
> I can see a solution for that: grab the set of "init_values" from both
> device implementations and use the one that both accept. This is O(N^2)
> so it's not great when there are many device implementations involved.
> It's O(N) with version numbers because you can keep an intersection set
> of supported version numbers.

Which is actually more complex if there's only some combinations that
work.
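
As a rough illustration, a per-parameter intersection over N implementations
can be done in a single pass, much like the version-number intersection. The
input layout below (one dict per implementation, mapping parameter name to an
explicit list of accepted values) is an assumption for the sketch, and a
parameter unknown to any implementation is simply dropped.

```python
from functools import reduce


def common_values(implementations):
    """Per-parameter intersection of accepted values.

    implementations must be a non-empty list of dicts mapping a parameter
    name to an iterable of accepted values (hypothetical layout).
    """
    # Only parameters known to every implementation are candidates.
    names = reduce(set.intersection, (set(impl) for impl in implementations))
    return {
        name: reduce(set.intersection,
                     (set(impl[name]) for impl in implementations))
        for name in sorted(names)
    }


impls = [
    {"num-queues": [1, 2, 4, 8], "new-feature": [False, True]},
    {"num-queues": [1, 2, 4], "new-feature": [False]},
    {"num-queues": [2, 4, 8]},
]
common = common_values(impls)
```

This treats every parameter independently. If an implementation only accepts
some combinations, a value that survives in each per-parameter set can still
be part of a combination that one implementation rejects, so the result is
only a set of candidates to be checked further.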

> This point definitely needs to be included in the document. Is my answer
> acceptable or do you think versions are really needed?
> 
> It's also hard to answer "which of these two migration parameter lists
> is better/more modern?" without versions when non-bool migration
> parameters are involved.

Dave

> Stefan


-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


