kvm.vger.kernel.org archive mirror
From: "Michael S. Tsirkin" <mst@redhat.com>
To: "Lan, Tianyu" <tianyu.lan@intel.com>
Cc: Alexander Duyck <alexander.duyck@gmail.com>,
	"Dr. David Alan Gilbert" <dgilbert@redhat.com>,
	Yang Zhang <yang.zhang.wz@gmail.com>,
	qemu-devel@nongnu.org,
	"Tantilov, Emil S" <emil.s.tantilov@intel.com>,
	kvm@vger.kernel.org, Ard Biesheuvel <ard.biesheuvel@linaro.org>,
	aik@ozlabs.ru, "Skidmore, Donald C" <donald.c.skidmore@intel.com>,
	quintela@redhat.com, "Dong, Eddie" <eddie.dong@intel.com>,
	"Jani, Nrupal" <nrupal.jani@intel.com>,
	Alexander Graf <agraf@suse.de>, Blue Swirl <blauwirbel@gmail.com>,
	cornelia.huck@de.ibm.com,
	Alex Williamson <alex.williamson@redhat.com>,
	kraxel@redhat.com, Anthony Liguori <anthony@codemonkey.ws>,
	amit.shah@redhat.com, Paolo Bonzini <pbonzini@redhat.com>,
	"Rustad, Mark D" <mark.d.rustad@intel.com>,
	lcapitulino@redhat.com, Or Gerlitz <gerlitz.or@gmail.com>
Subject: Re: [Qemu-devel] live migration vs device assignment (motivation)
Date: Mon, 14 Dec 2015 11:26:35 +0200
Message-ID: <20151214112253-mutt-send-email-mst@redhat.com>
In-Reply-To: <566D9320.8000209@intel.com>

On Sun, Dec 13, 2015 at 11:47:44PM +0800, Lan, Tianyu wrote:
> 
> 
> On 12/11/2015 1:16 AM, Alexander Duyck wrote:
> >On Thu, Dec 10, 2015 at 6:38 AM, Lan, Tianyu <tianyu.lan@intel.com> wrote:
> >>
> >>
> >>On 12/10/2015 7:41 PM, Dr. David Alan Gilbert wrote:
> >>>>
> >>>>Ideally, it is able to leave guest driver unmodified but it requires the
> >>>>>hypervisor or qemu to be aware of the device, which means we may need a
> >>>>>driver in hypervisor or qemu to handle the device on behalf of guest driver.
> >>>
> >>>Can you answer the question of when do you use your code -
> >>>     at the start of migration or
> >>>     just before the end?
> >>
> >>
> >>Just before stopping the VCPU in this version; a VF mailbox irq is then
> >>injected to notify the driver, if the irq handler is installed.
> >>The Qemu side will also check this via the faked PCI migration capability,
> >>and the driver will set the status during the device open() or resume() callback.
> >
> >The VF mailbox interrupt is a very bad idea.  Really the device should
> >be in a reset state on the other side of a migration.  It doesn't make
> >sense to have the interrupt firing if the device is not configured.
> >This is one of the things that is preventing you from being able to
> >migrate the device while the interface is administratively down or the
> >VF driver is not loaded.
> 
> In my opinion, if the VF driver is not loaded and the hardware has not
> started to work, the device state doesn't need to be migrated.
> 
> We may add a flag for the driver to check whether migration happened while
> it was down, so that it reinitializes the hardware and clears the flag when
> the system tries to bring it up.
> 
> We may add a migration core in the Linux kernel and provide some helper
> functions to make it easier to add migration support to drivers.
> The migration core is in charge of syncing status with Qemu.
> 
> Example:
> migration_register()
> The driver provides
> - Callbacks to be called before and after migration, or for the bad path
> - The irq it prefers to use for handling migration events.
> 
> migration_event_check()
> The driver calls it in the irq handler. The migration core code will check
> the migration status and call the driver's callbacks when migration happens.
> 
> 
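
To make the proposal quoted above concrete, here is a minimal user-space
sketch of what such a migration core could look like. Only the names
migration_register() and migration_event_check() come from the proposal;
the event enum, the ops structure, migration_core_post_event() (a stand-in
for the core syncing status with Qemu) and the demo driver are all
assumptions made for illustration, not an existing kernel API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event types: before migration, after migration, bad path. */
enum migration_event { MIG_EVENT_NONE, MIG_EVENT_PRE, MIG_EVENT_POST, MIG_EVENT_FAILED };

/* Callbacks a driver hands to the migration core (field names are assumptions). */
struct migration_ops {
    void (*pre_migration)(void *dev);
    void (*post_migration)(void *dev);
    void (*migration_failed)(void *dev);
};

struct migration_client {
    const struct migration_ops *ops;
    void *dev;
    int irq;                       /* irq the driver wants to be notified on */
    enum migration_event pending;  /* event not yet delivered to this driver */
};

#define MIGRATION_MAX_CLIENTS 8
static struct migration_client clients[MIGRATION_MAX_CLIENTS];
static size_t nr_clients;

/* Register a driver's callbacks and preferred irq. Returns 0 or -1 if full. */
int migration_register(const struct migration_ops *ops, void *dev, int irq)
{
    assert(ops != NULL);
    if (nr_clients == MIGRATION_MAX_CLIENTS)
        return -1;
    clients[nr_clients++] = (struct migration_client){ ops, dev, irq, MIG_EVENT_NONE };
    return 0;
}

/* Stand-in for the core learning a new migration status from Qemu. */
void migration_core_post_event(enum migration_event ev)
{
    for (size_t i = 0; i < nr_clients; i++)
        clients[i].pending = ev;
}

/* Called from the driver's irq handler. Returns 1 if an event was handled. */
int migration_event_check(int irq)
{
    for (size_t i = 0; i < nr_clients; i++) {
        struct migration_client *c = &clients[i];
        if (c->irq != irq || c->pending == MIG_EVENT_NONE)
            continue;
        enum migration_event ev = c->pending;
        c->pending = MIG_EVENT_NONE;
        if (ev == MIG_EVENT_PRE && c->ops->pre_migration)
            c->ops->pre_migration(c->dev);
        else if (ev == MIG_EVENT_POST && c->ops->post_migration)
            c->ops->post_migration(c->dev);
        else if (ev == MIG_EVENT_FAILED && c->ops->migration_failed)
            c->ops->migration_failed(c->dev);
        return 1;
    }
    return 0;
}

/* Hypothetical demo driver that just counts how often each callback ran. */
struct demo_dev { int paused, resumed, reinit; };
static void demo_pre(void *dev)  { ((struct demo_dev *)dev)->paused++; }
static void demo_post(void *dev) { ((struct demo_dev *)dev)->resumed++; }
static void demo_fail(void *dev) { ((struct demo_dev *)dev)->reinit++; }
static const struct migration_ops demo_ops = { demo_pre, demo_post, demo_fail };
```

A driver would call migration_register(&demo_ops, &dev, irq) once at
probe time and migration_event_check(irq) from its interrupt handler.
Note that this models one irq per device, which is exactly the cost
questioned in the reply below.
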
> >
> >My thought on all this is that it might make sense to move this
> >functionality into a PCI-to-PCI bridge device and make it a
> >requirement that all direct-assigned devices have to exist behind that
> >device in order to support migration.  That way you would be working
> >with a directly emulated device that would likely already be
> >supporting hot-plug anyway.  Then it would just be a matter of coming
> >up with a few Qemu specific extensions that you would need to add to
> >the device itself.  The same approach would likely be portable enough
> >that you could achieve it with PCIe as well via the same configuration
> >space being present on the upstream side of a PCIe port or maybe a
> >PCIe switch of some sort.
> >
> >It would then be possible to signal via your vendor-specific PCI
> >capability on that device that all devices behind this bridge require
> >DMA page dirtying, you could use the configuration in addition to the
> >interrupt already provided for hot-plug to signal things like when you
> >are starting migration, and possibly even just extend the shpc
> >functionality so that if this capability is present you have the
> >option to pause/resume instead of remove/probe the device in the case
> >of certain hot-plug events.  The fact is there may be some use for a
> >pause/resume type approach for PCIe hot-plug in the near future
> >anyway.  From the sounds of it Apple has required it for all
> >Thunderbolt device drivers so that they can halt the device in order
> >to shuffle resources around, perhaps we should look at something
> >similar for Linux.
> >
> >The other advantage behind grouping functions on one bridge is things
> >like reset domains.  The PCI error handling logic will want to be able
> >to reset any devices that experienced an error in the event of
> >something such as a surprise removal.  By grouping all of the devices
> >you could disable/reset/enable them as one logical group in the event
> >of something such as the "bad path" approach Michael has mentioned.
> >
> 
> This sounds like we need to add a faked bridge for migration and add a
> driver in the guest for it. It would also need to extend the PCI bus/hotplug
> driver to pause/resume other devices, right?
> 
> My concern is still whether we can change PCI bus/hotplug like that
> without a spec change.
> 
> An IRQ is generic to any device and we may extend it for
> migration. The device driver can also decide whether to support migration
> or not.

A dedicated IRQ per device for something that is a system-wide event
sounds like a waste.  I don't understand why a spec change is strictly
required: we only need to support this with the specific virtual bridge
used by QEMU, so I think that a vendor-specific capability will do.
Once this works well in the field, a PCI spec ECN might make sense
to standardise the capability.
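
As a rough illustration of how a guest could discover such a capability,
the sketch below walks the standard PCI capability list in a 256-byte
copy of the bridge's configuration space and looks for a vendor-specific
capability (ID 0x09). The register offsets and the ID are standard PCI;
the signature byte and its position inside the capability are purely
hypothetical, since no layout has been defined for this.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Standard PCI configuration-space offsets and values. */
#define PCI_STATUS           0x06  /* status register (low byte used here) */
#define PCI_STATUS_CAP_LIST  0x10  /* bit set when a capability list exists */
#define PCI_CAPABILITY_LIST  0x34  /* offset of the first capability pointer */
#define PCI_CAP_ID_VNDR      0x09  /* vendor-specific capability ID */

/* HYPOTHETICAL layout: [id][next][len][signature]; not from any spec. */
#define MIG_CAP_OFF_SIG      3
#define MIG_CAP_SIGNATURE    0x4d  /* made-up marker byte */

/*
 * Return the config-space offset of the (hypothetical) migration
 * capability, or 0 if the device does not expose one.
 */
static uint8_t find_migration_cap(const uint8_t cfg[256])
{
    assert(cfg != NULL);

    if (!(cfg[PCI_STATUS] & PCI_STATUS_CAP_LIST))
        return 0;

    uint8_t pos = cfg[PCI_CAPABILITY_LIST] & 0xfc; /* low two bits are reserved */
    int ttl = 48;                                  /* guard against looped lists */

    while (pos >= 0x40 && ttl-- > 0) {
        if (cfg[pos] == PCI_CAP_ID_VNDR &&
            cfg[pos + MIG_CAP_OFF_SIG] == MIG_CAP_SIGNATURE)
            return pos;
        pos = cfg[pos + 1] & 0xfc;                 /* follow the next pointer */
    }
    return 0;
}
```

Checking a signature inside the capability matters because a device may
carry several vendor-specific capabilities, and the ID alone does not say
which one is the migration extension.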

> 
> 
> >>>
> >>>>>>>It would be great if we could avoid changing the guest; but at least
> >>>>>>>your guest driver changes don't actually seem to be that hardware
> >>>>>>>specific; could your changes actually be moved to generic PCI level
> >>>>>>>so they could be made to work for lots of drivers?
> >>>>
> >>>>>
> >>>>>It is impossible to use one common solution for all devices unless the
> >>>>>PCIe spec documents it clearly, and I think one day it will be there.
> >>>>>But before that, we need some workarounds in the guest driver to make
> >>>>>it work, even if it looks ugly.
> >>
> >>
> >>Yes, so far there is no hardware migration support and it's hard to modify
> >>bus-level code. It would also block implementation on Windows.
> >
> >Please don't assume things.  Unless you have hard data from Microsoft
> >that says they want it this way, let's just try to figure out what works
> >best for us for now and then we can start worrying about third party
> >implementations after we have figured out a solution that actually
> >works.
> >
> >- Alex
> >
