From: "Michael S. Tsirkin" <mst@redhat.com>
To: "Daniel P. Berrangé" <berrange@redhat.com>
Cc: Sameeh Jubran <sameeh@daynix.com>,
	qemu-devel@nongnu.org, Jason Wang <jasowang@redhat.com>,
	Yan Vugenfirer <yan@daynix.com>,
	Eduardo Habkost <ehabkost@redhat.com>
Subject: Re: [Qemu-devel] [RFC 0/2] Attempt to implement the standby feature for assigned network devices
Date: Wed, 5 Dec 2018 12:26:02 -0500
Message-ID: <20181205122332-mutt-send-email-mst@kernel.org>
In-Reply-To: <20181205171818.GA1136@redhat.com>

On Wed, Dec 05, 2018 at 05:18:18PM +0000, Daniel P. Berrangé wrote:
> On Thu, Oct 25, 2018 at 05:06:29PM +0300, Sameeh Jubran wrote:
> > From: Sameeh Jubran <sjubran@redhat.com>
> > 
> > Hi all,
> > 
> > Background:
> > 
> > There have been a few attempts to implement the standby feature for vfio
> > assigned devices which aims to enable the migration of such devices. This
> > is another attempt.
> > 
> > The series implements an infrastructure for hiding devices from the bus
> > upon boot. What it does is the following:
> > 
> > * In the first patch the infrastructure for hiding the device is added
> >   for the qbus and qdev APIs. A "hidden" boolean is added to the device
> >   state and it is set based on a callback to the standby device which
> >   registers itself for handling the assessment: "should the primary device
> >   be hidden?" by cross validating the ids of the devices.
> > 
> > * In the second patch the virtio-net uses the API to hide the vfio
> >   device and unhides it when the feature is acked.
> 
> IIUC, the general idea is that we want to provide a pair of associated NIC
> devices to the guest, one emulated, one physical PCI device. The guest would
> put them in a bonded pair. Before migration the PCI device is unplugged & a
> new PCI device is plugged on the target after migration. The guest traffic
> continues without interruption thanks to the emulated device.
> 
> This kind of conceptual approach can already be implemented today by management
> apps. The only hard problem that exists today is how the guest OS can figure
> out that a particular pair of devices it has are intended to be used together. 
> 
> With this series, IIUC, the virtio-net device is getting a given property which
> defines the qdev ID of the associated VFIO device. When the guest OS activates
> the virtio-net device and acknowledges the STANDBY feature bit, qdev then
> unhides the associated VFIO device.
> 
> AFAICT the guest has to infer that the device which suddenly appears is the one
> associated with the virtio-net device it just initialized, for purposes of
> setting up the NIC bonding. There doesn't appear to be any explicit association
> between the devices exposed to the guest.
> 
> This feels pretty fragile for a guest needing to match up devices when there
> are many pairs of devices exposed to a single guest.
> 
> Unless I'm mis-reading the patches, it looks like the VFIO device always has
> to be available at the time QEMU is started. There's no way to boot a guest
> and then later hotplug a VFIO device to accelerate the existing virtio-net NIC.

That should be supported.

> Or similarly after migration there might not be any VFIO device available
> initially when QEMU is started to accept the incoming migration. So it might
> need to run in degraded mode for an extended period of time until one becomes
> available for hotplugging.

That should work too.

> The use of qdev IDs makes this troublesome, as the
> qdev ID of the future VFIO device would need to be decided upfront before it
> even exists.

I agree this sounds problematic.

> 
> So overall I'm not really a fan of the dynamic hiding/unhiding of devices.

Dynamic hiding is an orthogonal issue though. It's needed for
error handling in case of migration failure: we do not
want to close the VFIO device but we do need to
hide it from the guest. libvirt should not be involved in
this aspect though.
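
To make the hiding semantics under discussion concrete, here is a toy
model in Python. It is only a sketch, not QEMU code: Bus, Device,
StandbyNic and every method name are hypothetical, and the behaviour is
an assumption based on the cover letter (a "hidden" flag decided by a
callback that cross-checks device ids, cleared on feature ack) plus the
point above that a hidden device can stay open on the host.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Device:
    dev_id: str
    hidden: bool = False     # True: present in QEMU but not exposed to the guest
    host_open: bool = True   # host-side resource (e.g. the VFIO handle) is held


@dataclass
class Bus:
    devices: Dict[str, Device] = field(default_factory=dict)
    # Hypothetical registry of "should this device be hidden?" callbacks.
    hide_checks: List[Callable[[str], bool]] = field(default_factory=list)

    def add(self, dev: Device) -> None:
        # Ask every registered callback before exposing the new device.
        dev.hidden = any(check(dev.dev_id) for check in self.hide_checks)
        self.devices[dev.dev_id] = dev

    def guest_visible(self) -> List[str]:
        return sorted(d.dev_id for d in self.devices.values() if not d.hidden)


class StandbyNic:
    """Stand-in for the standby (virtio-net) side of a failover pair."""

    def __init__(self, bus: Bus, dev_id: str, primary_id: str) -> None:
        self.bus = bus
        self.primary_id = primary_id
        self.acked = False
        bus.add(Device(dev_id))
        bus.hide_checks.append(self._should_hide)

    def _should_hide(self, candidate_id: str) -> bool:
        # Cross-check ids: hide only our primary, and only until the ack.
        return candidate_id == self.primary_id and not self.acked

    def _set_primary_hidden(self, hidden: bool) -> None:
        primary = self.bus.devices.get(self.primary_id)
        if primary is not None:
            primary.hidden = hidden

    def guest_acked_standby(self) -> None:
        self.acked = True
        self._set_primary_hidden(False)

    def migration_started(self) -> None:
        # Hide from the guest, but keep the host-side handle open.
        self._set_primary_hidden(True)

    def migration_failed(self) -> None:
        # Nothing was closed, so recovery is just un-hiding.
        self._set_primary_hidden(False)
```

In this model a primary that is added after the ack is simply visible,
since the callback no longer asks for it to be hidden.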

> I
> would much prefer to see some way to expose an explicit relationship between
> the devices to the guest.
> 
> > Disclaimers:
> > 
> > * I have only scratch tested this and from qemu side, it seems to be
> >   working.
> > * This is an RFC so it lacks some proper error handling in a few cases
> >   and proper resource freeing. I wanted to get some feedback first
> >   before it is finalized.
> > 
> > Command line example:
> > 
> > /home/sameeh/Builds/failover/qemu/x86_64-softmmu/qemu-system-x86_64 \
> > -netdev tap,id=hostnet0,script=world_bridge_standalone.sh,downscript=no,ifname=cc1_71 \
> > -netdev tap,vhost=on,id=hostnet1,script=world_bridge_standalone.sh,downscript=no,ifname=cc1_72,queues=4 \
> > -device virtio-net,host_mtu=1500,netdev=hostnet1,id=cc1_72,vectors=10,mq=on,primary=cc1_71 \
> > -device e1000,netdev=hostnet0,id=cc1_71,standby=cc1_72
> > 
> > Migration support:
> > 
> > Before migration, or during the setup phase of the migration, we should
> > send an unplug request to the guest to unplug the primary device. I haven't
> > had the chance to implement that part yet but should do so soon. Do you know
> > what the best approach is? I wanted to have a callback to the
> > virtio-net device which tries to send an unplug request to the guest and,
> > if it succeeds, the migration continues. It needs to handle the case where
> > the migration fails and then it has to replug the primary device.
> 
> Having QEMU do this internally gets into a world of pain when you have
> multiple devices in the guest.
> 
> Consider if we have 2 pairs of devices. We unplug one VFIO device, but
> unplugging the second VFIO device fails, thus we try to replug the first
> VFIO device but this now fails too. We don't even get as far as starting
> the migration before we have to return an error.
> 
> The mgmt app will just see that the migration failed, but it will not
> be sure which devices are now actually exposed to the guest OS correctly.
> 
> A similar problem hits if we started the migration data stream, but
> then had to abort and so needed to try to replug on the source, but
> that failed for some reason.
> 
> Doing the VFIO device plugging/unplugging explicitly from the mgmt app
> gives that mgmt app direct information about which devices have been
> successfully made available to the guest at all times, because the mgmt
> app can see the errors from each step of the process.  Trying to do
> this inside QEMU doesn't achieve anything the mgmt app can't already
> do, but it obscures what happens during failures.  The same applies at
> the libvirt level too, which is why mgmt apps today will do the VFIO
> unplug/replug either side of migration themselves.
> 
> 
> Regards,
> Daniel
> -- 
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
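
For illustration, the two-pair failure case quoted above can be sketched
in Python. This is a minimal sketch of a management app driving each
step itself and recording the outcome per device; it is not libvirt or
QEMU API. The function name, the state strings, and the convention that
unplug/replug raise RuntimeError on failure are all assumptions.

```python
from typing import Callable, Dict, List

# Hypothetical hook type: performs one step, raises RuntimeError on failure.
Action = Callable[[str], None]


def unplug_for_migration(devices: List[str], unplug: Action,
                         replug: Action) -> Dict[str, str]:
    """Unplug every device; if one fails, try to replug the others.

    Returns the last known state of each device, so the caller can tell
    exactly which devices the guest still has after a failure.
    """
    state = {dev: "plugged" for dev in devices}
    failed = False
    for dev in devices:
        try:
            unplug(dev)
            state[dev] = "unplugged"
        except RuntimeError:
            state[dev] = "unplug-failed"
            failed = True
            break
    if failed:
        # Roll back: only devices that were actually removed need a replug.
        for dev in devices:
            if state[dev] != "unplugged":
                continue
            try:
                replug(dev)
                state[dev] = "plugged"
            except RuntimeError:
                state[dev] = "replug-failed"
    return state
```

With two devices where the second unplug fails and the replug of the
first also fails, the returned mapping records both errors separately,
rather than collapsing them into a single "migration failed" result.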
