From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
To: Stefan Hajnoczi <stefanha@redhat.com>
Cc: "John G Johnson" <john.g.johnson@oracle.com>,
"Tian, Kevin" <kevin.tian@intel.com>,
mtsirkin@redhat.com, "Daniel P. Berrangé" <berrange@redhat.com>,
quintela@redhat.com, "Jason Wang" <jasowang@redhat.com>,
"Zeng, Xin" <xin.zeng@intel.com>,
qemu-devel@nongnu.org, "Yan Zhao" <yan.y.zhao@intel.com>,
"Kirti Wankhede" <kwankhede@nvidia.com>,
"Paolo Bonzini" <pbonzini@redhat.com>,
"Alex Williamson" <alex.williamson@redhat.com>,
"Gerd Hoffmann" <kraxel@redhat.com>,
"Felipe Franciosi" <felipe@nutanix.com>,
"Christophe de Dinechin" <dinechin@redhat.com>,
"Thanos Makatos" <thanos.makatos@nutanix.com>
Subject: Re: [RFC v3] VFIO Migration
Date: Wed, 11 Nov 2020 15:41:59 +0000
Message-ID: <20201111154159.GG3232@work-vm>
In-Reply-To: <20201111153438.GD1421166@stefanha-x1.localdomain>
* Stefan Hajnoczi (stefanha@redhat.com) wrote:
> On Wed, Nov 11, 2020 at 12:56:26PM +0000, Dr. David Alan Gilbert wrote:
> > * Stefan Hajnoczi (stefanha@redhat.com) wrote:
> > > Orchestrating Migrations
> > > ------------------------
> > > In order to migrate a device a *migration parameter list* must first be built
> > > on the source. Each migration parameter is added to the list if it is in
> > > effect. For example, the migration parameter list for a device with
> > > new-feature=off,num-queues=4 would be num-queues=4 if the new-feature migration
> > > parameter was introduced with the off value disabling its effect.
> >
> > What component builds that list (i.e. what component needs to know the
> > history that new-feature=off was the default - ah I think you answer
> > that below).
>
> Yep. Thanks for noting this. I'll need to reorder things so it is clear.
>
> > > The following conditions must be met to establish migration compatibility:
> > >
> > > 1. The source and destination device model strings match.
> > >
> > > 2. Each migration parameter name from the migration parameter list is supported
> > > by the destination. For example, the destination supports the num-queues
> > > migration parameter.
> > >
> > > 3. Each migration parameter value from the migration parameter list is
> > > supported by the destination. For example, the destination supports
> > > num-queues=4.
> >
> > Hmm, are combinations of parameter checks needed - i.e. is it possible
> > that a destination supports num-queues=4 and new-feature=on/off -
> > but only supports new-feature=on when num-queues>2 ?
>
> Yes, it's possible but cannot be expressed in the migration info JSON.
>
> We need to choose a level of expressiveness that will be useful enough
> without being complex. In the extreme the migration info would contain
> Turing complete validation expressions (e.g. JavaScript) so that any
> relationship can be expressed, but I doubt that complexity is needed.
> The other extreme is just booleans and (opaque) strings for maximum
> simplicity.
>
> If the syntax is not expressive enough then it's impossible to check
> migration compatibility without actually creating a new device instance
> on the destination. Daniel Berrange raised the requirement of checking
> migration compatibility without creating the device since this helps
> with selecting a migration destination.

Right, but my worry isn't the JSON description, it's the set of 3
conditions above; they need to state that only some combinations need to
be valid.
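
To make the concern concrete, here is a minimal sketch of the three conditions plus an extra whole-combination check. The function name, the ``dst_info`` layout and the ``combo_ok`` hook are all hypothetical; nothing like ``combo_ok`` exists in the RFC, which is exactly the gap.

```python
from typing import Any, Callable, Dict, Optional

Params = Dict[str, Any]


def is_compatible(src_model: str, param_list: Params,
                  dst_model: str, dst_info: Dict[str, dict],
                  combo_ok: Optional[Callable[[Params], bool]] = None) -> bool:
    # Condition 1: the device model strings match.
    if src_model != dst_model:
        return False
    for name, value in param_list.items():
        # Condition 2: the destination knows the parameter name.
        if name not in dst_info:
            return False
        # Condition 3: the destination accepts the value. A missing
        # "allowed_values" means any value of the type may be given.
        allowed = dst_info[name].get("allowed_values")
        if allowed is not None and value not in allowed:
            return False
    # Hypothetical extra step: validate the combination as a whole.
    if combo_ok is not None and not combo_ok(param_list):
        return False
    return True


# Illustrative destination description (names and values are assumptions).
dst_info = {
    "num-queues": {"allowed_values": [1, 2, 4]},
    "new-feature": {"allowed_values": ["on", "off"]},
}


def combo_ok(p: Params) -> bool:
    # new-feature=on is only supported when num-queues > 2.
    # Parameters absent from the list are assumed to be at their off value.
    return p.get("new-feature", "off") == "off" or p.get("num-queues", 1) > 2
```

With only conditions 1-3, ``{"num-queues": 2, "new-feature": "on"}`` passes even though the destination cannot run it; the per-parameter checks cannot see the dependency.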
>
> > > The migration compatibility check can be performed without initiating a
> > > migration. Therefore, this process can be used to select the migration
> > > destination.
> > >
> > > The following steps perform the migration:
> > >
> > > 1. Configure the destination so it is prepared to load the device state,
> > > including applying the migration parameter list. This may involve
> > > instantiating a new device instance or resetting an existing device instance
> > > to a configuration that is compatible with the source.
> > >
> > > The details of how to do this for VFIO/mdev drivers and vfio-user device
> > > backend programs are described below.
> > >
> > > 2. Save the device state on the source and load it on the destination.
> >
> > Which is true for almost everything, unless it turned out to have
> > significant amounts of RAM on board; do we have a way to deal with that
> > for vfio/vhost-user - where it needs to be iterative? (Let's just ignore
> > this for now)
>
> Step 2 includes iterative migration. I should have mentioned that in the
> document.

OK.

> > > "allowed_values"
> > > The list of all values that the device implementation accepts for this migration
> > > parameter. Integer ranges can be described using "<min>-<max>" strings.
> > >
> > > Examples: ['a', 'b', 'c'], [1, 5, 7], ['0-255', 512, '1024-2048'], [true]
> > >
> > > This member is optional. When absent, any value suitable for the type may be
> > > given but the device implementation may refuse certain values.
> >
> > JSON isn't a great choice for specifying ranges of integers
>
> Agreed :)
>
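For illustration, a rough sketch of what a checker for "allowed_values" might look like. The function name is hypothetical, and it assumes ranges only cover non-negative integers, since the text does not say how negative bounds would be written.

```python
import re

# "<min>-<max>" with non-negative integer bounds (an assumption).
_RANGE = re.compile(r"^(\d+)-(\d+)$")


def value_allowed(value, allowed_values):
    # An absent "allowed_values" member accepts any value of the type.
    if allowed_values is None:
        return True
    for entry in allowed_values:
        # A string entry may describe an integer range. bool is excluded
        # explicitly because it is a subclass of int in Python.
        if (isinstance(entry, str) and isinstance(value, int)
                and not isinstance(value, bool)):
            m = _RANGE.match(entry)
            if m and int(m.group(1)) <= value <= int(m.group(2)):
                return True
        # Otherwise require an exact match of both type and value,
        # so that True does not match 1.
        if type(entry) is type(value) and entry == value:
            return True
    return False
```

Note the ambiguity this exposes: the entry ``'0-255'`` accepts the integer ``42`` and also the literal string ``'0-255'``, so a string-typed parameter cannot tell a range from a value.
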
> > > The device is instantiated by launching the destination process with the
> > > migration parameter list from the source:
> > >
> > > .. code:: bash
> > >
> > > $ my-device --m-<param1>=<value1> --m-<param2> <value2> [...]
> > >
> > > This example shows how to instantiate the device with migration parameters
> > > ``param1`` and ``param2``. Both ``--m-<param>=<value>`` and ``--m-<param>
> > > <value>`` option formats are accepted.
> > >
> > > The ``--m-`` prefix is used to allow the device emulation program to implement
> > > device implementation-specific command-line options without conflicting with
> > > the migration parameter namespace.
> >
> > That feels like an odd syntax to me.
>
> Unfortunately we cannot use --<param>. I also considered using a JSON
> input file but that makes it harder to invoke the device emulation
> program manually for testing/development. I bet I'd have to look up the
> JSON syntax every time whereas it's easy to remember how to format a
> command-line parameter.
>
> The other one I considered was using '--' or another marker to separate
> device implementation-specific command-line arguments from migration
> parameters. However, doing so places requirements on the device
> emulation program's command-line parsing library and I think people will
> be unhappy if their favorite Go, Rust, Python, etc library cannot handle
> the command-line options due to our weird syntax.
>
> Any ideas for a better syntax?

I'd be happy with a --param name=value repeatedly, but also know that
some option parsers don't like that.
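
As a data point, Python's argparse copes with the repeated form. This is only a sketch: the ``--param`` option name and the helper are hypothetical, values are left as strings, and it says nothing about how Go or Rust parsers behave.

```python
import argparse


def parse_migration_params(argv):
    parser = argparse.ArgumentParser(prog="my-device")
    # action="append" collects every occurrence of --param into a list.
    parser.add_argument("--param", action="append", default=[],
                        metavar="NAME=VALUE")
    # Options the parser does not know are returned untouched, so
    # device implementation-specific options stay in their own namespace.
    args, rest = parser.parse_known_args(argv)
    params = {}
    for item in args.param:
        name, sep, value = item.partition("=")
        if not sep or not name:
            parser.error(f"expected NAME=VALUE, got {item!r}")
        params[name] = value
    return params, rest
```

Calling it with ``["--param", "num-queues=4", "--param", "new-feature=off", "--verbose"]`` yields the two parameters as a dict and leaves ``--verbose`` in the remainder.
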
> > > When preparing for migration on the source, each migration parameter from the
> > > migration info JSON is added to the migration parameter list if its value
> > > differs from "off_value". If a migration parameter in the list is not available
> > > on the destination, then migration is not possible. If a migration parameter
> > > value is not in the "allowed_values" of the destination's migration_info.json
> > > then migration is not possible.
> > >
> > > On the destination, a command-line is generated from the migration parameter
> > > list. For each destination migration parameter missing from the migration
> > > parameter list a command-line option is added with the destination "off_value".
> > > The device emulation program prints an error message to standard error and
> > > terminates with exit status 1 if the device could not be instantiated.
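
Read literally, the two paragraphs above amount to something like the following sketch. The helper names and the dict layout of the migration info are assumptions, and how non-string values are formatted on the command line is not specified, so plain ``str()`` formatting is used here.

```python
def build_param_list(current_values, src_info):
    # Source side: keep only parameters whose value differs from "off_value".
    return {name: value for name, value in current_values.items()
            if value != src_info[name]["off_value"]}


def destination_argv(program, param_list, dst_info):
    # A parameter the destination does not know makes migration impossible.
    missing = [name for name in param_list if name not in dst_info]
    if missing:
        raise ValueError(f"unsupported on destination: {missing}")
    argv = [program]
    for name, info in dst_info.items():
        # Parameters missing from the list get the destination's "off_value".
        value = param_list.get(name, info["off_value"])
        argv.append(f"--m-{name}={value}")
    return argv


# Illustrative data only; the parameter names and off values are made up.
src_info = {
    "num-queues": {"off_value": 1},
    "new-feature": {"off_value": "off"},
}
current = {"num-queues": 4, "new-feature": "off"}
dst_info = {
    "num-queues": {"off_value": 1},
    "new-feature": {"off_value": "off"},
    "other-feature": {"off_value": "off"},
}
```

Here ``build_param_list(current, src_info)`` keeps only ``num-queues=4``, and the destination command line gains explicit off values for ``new-feature`` and ``other-feature``.
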
> >
> > I still don't think this revision answers the question of how a VM
> > management program picks a sane set of parameter values for a new VM
> > it's creating, especially if it wants it to be migratable. That's
> > something your version stuff in V1 seemed nice for.
>
> Good point. If we're creating a VM and expect to migrate between two
> device implementations, how do we choose the migration parameters?
>
> I can see a solution for that: grab the set of "init_values" from both
> device implementations and use the one that both accept. This is O(N^2)
> so it's not great when there are many device implementations involved.
> It's O(N) with version numbers because you can keep an intersection set
> of supported version numbers.

Which is actually more complex if only some combinations work.

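
For comparison, the version-number approach is a single pass over the implementations. A sketch, assuming each implementation advertises a set of supported version numbers (hypothetical names; v3 of the RFC has no such field):

```python
def common_versions(implementations):
    # One pass: narrow a running intersection of supported versions.
    common = None
    for versions in implementations.values():
        common = set(versions) if common is None else common & set(versions)
    return common or set()


# Illustrative data only.
impls = {
    "impl-a": [1, 2, 3],
    "impl-b": [2, 3, 4],
    "impl-c": [3, 5],
}
```

``max(common_versions(impls))`` then also answers "which is more modern", which a bag of independent parameters cannot do directly.
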
> This point definitely needs to be included in the document. Is my answer
> acceptable or do you think versions are really needed?
>
> It's also hard to answer "which of these two migration parameter lists
> is better/more modern?" without versions when non-bool migration
> parameters are involved.
Dave
> Stefan
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK