From: "Michael S. Tsirkin" <mst@redhat.com>
To: "Lan, Tianyu" <tianyu.lan@intel.com>
Cc: Alexander Duyck <alexander.duyck@gmail.com>,
"Dr. David Alan Gilbert" <dgilbert@redhat.com>,
Yang Zhang <yang.zhang.wz@gmail.com>,
qemu-devel@nongnu.org, "Tantilov,
Emil S" <emil.s.tantilov@intel.com>,
kvm@vger.kernel.org, Ard Biesheuvel <ard.biesheuvel@linaro.org>,
aik@ozlabs.ru, "Skidmore, Donald C" <donald.c.skidmore@intel.com>,
quintela@redhat.com, "Dong, Eddie" <eddie.dong@intel.com>,
"Jani, Nrupal" <nrupal.jani@intel.com>,
Alexander Graf <agraf@suse.de>, Blue Swirl <blauwirbel@gmail.com>,
cornelia.huck@de.ibm.com,
Alex Williamson <alex.williamson@redhat.com>,
kraxel@redhat.com, Anthony Liguori <anthony@codemonkey.ws>,
amit.shah@redhat.com, Paolo Bonzini <pbonzini@redhat.com>,
"Rustad, Mark D" <mark.d.rustad@intel.com>,
lcapitulino@redhat.com, Or Gerlitz <gerlitz.or@gmail.com>
Subject: Re: [Qemu-devel] live migration vs device assignment (motivation)
Date: Mon, 14 Dec 2015 11:26:35 +0200
Message-ID: <20151214112253-mutt-send-email-mst@redhat.com>
In-Reply-To: <566D9320.8000209@intel.com>
On Sun, Dec 13, 2015 at 11:47:44PM +0800, Lan, Tianyu wrote:
>
>
> On 12/11/2015 1:16 AM, Alexander Duyck wrote:
> >On Thu, Dec 10, 2015 at 6:38 AM, Lan, Tianyu <tianyu.lan@intel.com> wrote:
> >>
> >>
> >>On 12/10/2015 7:41 PM, Dr. David Alan Gilbert wrote:
> >>>>
> >>>>Ideally, the guest driver could be left unmodified, but that requires the
> >>>>hypervisor or qemu to be aware of the device, which means we may need a
> >>>>driver in the hypervisor or qemu to handle the device on behalf of the
> >>>>guest driver.
> >>>
> >>>Can you answer the question of when you use your code:
> >>> at the start of migration or
> >>> just before the end?
> >>
> >>
> >>Just before stopping the VCPUs, in this version. At that point we inject a
> >>VF mailbox irq to notify the driver, if its irq handler is installed.
> >>Qemu knows whether the handler is installed via the faked PCI migration
> >>capability: the driver sets its status in the open() or resume() callback.
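A minimal sketch of the handshake described above, assuming a hypothetical one-byte status field in the faked PCI migration capability (the field and its values are illustrative, not the layout used by the RFC patches): the driver marks itself ready in open() or resume(), and QEMU injects the VF mailbox irq just before stopping the vCPUs only if that status is set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical status values for the faked migration capability. */
enum mig_status {
    MIG_STATUS_NONE = 0,         /* driver not loaded, or device closed */
    MIG_STATUS_DRIVER_READY = 1, /* driver has installed its irq handler */
};

/* Hypothetical view of the faked capability shared by guest and QEMU. */
struct fake_mig_cap {
    uint8_t status;
};

/* Guest side: called from the driver's open() or resume() callback. */
void guest_driver_ready(struct fake_mig_cap *cap)
{
    assert(cap != NULL);
    cap->status = MIG_STATUS_DRIVER_READY;
}

/* Guest side: called from close() or suspend(), once the irq handler is freed. */
void guest_driver_gone(struct fake_mig_cap *cap)
{
    assert(cap != NULL);
    cap->status = MIG_STATUS_NONE;
}

/* QEMU side: checked just before stopping the vCPUs. */
bool qemu_should_inject_mailbox_irq(const struct fake_mig_cap *cap)
{
    assert(cap != NULL);
    return cap->status == MIG_STATUS_DRIVER_READY;
}
```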
> >
> >The VF mailbox interrupt is a very bad idea. Really the device should
> >be in a reset state on the other side of a migration. It doesn't make
> >sense to have the interrupt firing if the device is not configured.
> >This is one of the things that is preventing you from being able to
> >migrate the device while the interface is administratively down or the
> >VF driver is not loaded.
>
> In my opinion, if the VF driver is not loaded and the hardware has not
> started working, the device state doesn't need to be migrated.
>
> We could add a flag that lets the driver check whether a migration happened
> while it was down; when the system tries to bring it up, the driver
> reinitializes the hardware and clears the flag.
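The flag-based idea could look roughly like this. Everything here is hypothetical (struct vf_dev, vf_note_migration(), vf_open()); it only models a device that was down during migration skipping state transfer and being reinitialized on the next open.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-VF driver state; not a real kernel structure. */
struct vf_dev {
    bool up;                  /* interface administratively up */
    bool migrated_while_down; /* set if a migration completed while down */
    int reinit_count;         /* times state was rebuilt after a migration */
};

/* Hypothetical hook: invoked once a migration has completed. While the
 * interface is up, the normal migration path applies instead. */
void vf_note_migration(struct vf_dev *dev)
{
    assert(dev != NULL);
    if (!dev->up)
        dev->migrated_while_down = true; /* nothing to migrate now, defer */
}

/* Rebuild hardware state that may not match the destination host. */
static void vf_reinit_hw(struct vf_dev *dev)
{
    dev->reinit_count++;
}

/* Called when the system brings the interface up. */
void vf_open(struct vf_dev *dev)
{
    assert(dev != NULL);
    if (dev->migrated_while_down) {
        vf_reinit_hw(dev);
        dev->migrated_while_down = false;
    }
    dev->up = true;
}
```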
>
> We could add a migration core to the Linux kernel that provides helper
> functions to make it easier to add migration support to drivers. The
> migration core would be in charge of syncing status with Qemu.
>
> For example:
> migration_register()
> The driver provides
> - Callbacks to be called before and after migration, or on the bad path
> - The irq it prefers to use for migration events
>
> migration_event_check()
> The driver calls it from its irq handler. The migration core checks the
> migration status and calls the driver's callbacks when a migration happens.
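One possible shape for the proposed migration core API, as a sketch. migration_register() and migration_event_check() are the names used in the proposal above; their signatures, struct migration_ops, and the host_state variable standing in for the status synced from Qemu are assumptions for illustration, not an existing kernel interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical migration states that the core syncs from Qemu. */
enum mig_state { MIG_IDLE, MIG_STARTING, MIG_DONE, MIG_FAILED };

/* Callbacks a driver hands to the migration core. */
struct migration_ops {
    void (*before)(void *priv);   /* quiesce the device before the vCPUs stop */
    void (*after)(void *priv);    /* restore the device on the destination */
    void (*bad_path)(void *priv); /* migration failed: resume on the source */
};

struct migration_client {
    const struct migration_ops *ops;
    void *priv;
    int irq;                  /* irq the driver prefers for migration events */
    enum mig_state last_seen; /* last state this client acted on */
};

/* Stand-in for the status the core would read from Qemu. */
static enum mig_state host_state = MIG_IDLE;

void migration_register(struct migration_client *c,
                        const struct migration_ops *ops, void *priv, int irq)
{
    assert(c != NULL && ops != NULL);
    c->ops = ops;
    c->priv = priv;
    c->irq = irq;
    c->last_seen = MIG_IDLE;
}

/* Called by the driver from its irq handler. Returns true if a new
 * migration event was found and the matching callback was run. */
bool migration_event_check(struct migration_client *c)
{
    enum mig_state s = host_state;

    if (s == c->last_seen)
        return false;
    c->last_seen = s;
    switch (s) {
    case MIG_STARTING:
        c->ops->before(c->priv);
        return true;
    case MIG_DONE:
        c->ops->after(c->priv);
        return true;
    case MIG_FAILED:
        c->ops->bad_path(c->priv);
        return true;
    default:
        return false;
    }
}

/* Example driver callbacks that only count invocations. */
struct demo_counts { int before, after, bad; };
static void demo_before(void *p) { ((struct demo_counts *)p)->before++; }
static void demo_after(void *p) { ((struct demo_counts *)p)->after++; }
static void demo_bad(void *p) { ((struct demo_counts *)p)->bad++; }
static const struct migration_ops demo_ops = { demo_before, demo_after, demo_bad };
```

Tracking last_seen per client means an irq that fires again without a state change does not run the same callback twice.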
>
>
> >
> >My thought on all this is that it might make sense to move this
> >functionality into a PCI-to-PCI bridge device and make it a
> >requirement that all direct-assigned devices have to exist behind that
> >device in order to support migration. That way you would be working
> >with a directly emulated device that would likely already be
> >supporting hot-plug anyway. Then it would just be a matter of coming
> >up with a few Qemu specific extensions that you would need to add to
> >the device itself. The same approach would likely be portable enough
> >that you could achieve it with PCIe as well via the same configuration
> >space being present on the upstream side of a PCIe port or maybe a
> >PCIe switch of some sort.
> >
> >It would then be possible to signal via your vendor-specific PCI
> >capability on that device that all devices behind this bridge require
> >DMA page dirtying, you could use the configuration in addition to the
> >interrupt already provided for hot-plug to signal things like when you
> >are starting migration, and possibly even just extend the shpc
> >functionality so that if this capability is present you have the
> >option to pause/resume instead of remove/probe the device in the case
> >of certain hot-plug events. The fact is there may be some use for a
> >pause/resume type approach for PCIe hot-plug in the near future
> >anyway. From the sounds of it Apple has required it for all
> >Thunderbolt device drivers so that they can halt the device in order
> >to shuffle resources around, perhaps we should look at something
> >similar for Linux.
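To illustrate the discovery step, here is a sketch of how a guest could locate a vendor-specific capability (ID 0x09) on the bridge by walking the standard capability list in a 256-byte config space image. The offsets are standard PCI; the contents of the vendor-specific capability (for example a "requires DMA page dirtying" bit) are not defined anywhere and are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CFG_STATUS        0x06 /* Status register (16-bit, little endian) */
#define CFG_STATUS_CAPLST 0x10 /* Status bit 4: capability list present */
#define CFG_CAP_PTR       0x34 /* Capabilities pointer */
#define CAP_ID_VNDR       0x09 /* Vendor-specific capability ID */

/* Return the config-space offset of the first capability with the given
 * ID, or 0 if there is none. */
uint8_t find_capability(const uint8_t cfg[256], uint8_t cap_id)
{
    uint16_t status;
    uint8_t pos;
    int ttl = 48; /* bound the walk in case the list loops */

    assert(cfg != NULL);
    status = (uint16_t)(cfg[CFG_STATUS] | (cfg[CFG_STATUS + 1] << 8));
    if (!(status & CFG_STATUS_CAPLST))
        return 0;
    pos = cfg[CFG_CAP_PTR] & 0xFC; /* low two bits are reserved */
    while (pos >= 0x40 && ttl--) {
        if (cfg[pos] == cap_id)     /* byte 0: capability ID */
            return pos;
        pos = cfg[pos + 1] & 0xFC;  /* byte 1: next pointer */
    }
    return 0;
}
```

Inside the guest kernel, pci_find_capability(dev, PCI_CAP_ID_VNDR) performs the same walk; the 48-entry bound mirrors the loop guard Linux uses.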
> >
> >The other advantage behind grouping functions on one bridge is things
> >like reset domains. The PCI error handling logic will want to be able
> >to reset any devices that experienced an error in the event of
> >something such as a surprise removal. By grouping all of the devices
> >you could disable/reset/enable them as one logical group in the event
> >of something such as the "bad path" approach Michael has mentioned.
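A sketch of treating everything behind one bridge as a single reset domain. The structures are hypothetical; the point is the ordering: all devices are disabled before any is reset, and only devices that were enabled beforehand are re-enabled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one function behind the migration bridge. */
struct child_dev {
    bool enabled;
    bool was_enabled; /* state saved before the group reset */
    int resets;
};

/* Hypothetical bridge grouping its children into one reset domain. */
struct mig_bridge {
    struct child_dev *devs;
    size_t ndevs;
};

/* Disable every child, then reset every child, then re-enable only the
 * children that were enabled before, so the group moves through each
 * phase together. */
void bridge_group_reset(struct mig_bridge *br)
{
    size_t i;

    assert(br != NULL);
    for (i = 0; i < br->ndevs; i++) {
        br->devs[i].was_enabled = br->devs[i].enabled;
        br->devs[i].enabled = false;
    }
    for (i = 0; i < br->ndevs; i++)
        br->devs[i].resets++;
    for (i = 0; i < br->ndevs; i++)
        br->devs[i].enabled = br->devs[i].was_enabled;
}
```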
> >
>
> This sounds like we need to add a faked bridge for migration and a driver
> in the guest for it. We would also need to extend the PCI bus/hotplug
> driver to pause/resume other devices, right?
>
> My concern is still whether we can change the PCI bus/hotplug code like
> that without a spec change.
>
> An IRQ is generic to any device, and we could extend it for migration.
> The device driver can also decide whether or not to support migration.
A dedicated IRQ per device for something that is a system-wide event
sounds like a waste. I don't understand why a spec change is strictly
required, we only need to support this with the specific virtual bridge
used by QEMU, so I think that a vendor specific capability will do.
Once this works well in the field, a PCI spec ECN might make sense
to standardise the capability.
>
>
> >>>
> >>>>>>>It would be great if we could avoid changing the guest; but at least
> >>>>>>>your guest driver changes don't actually seem to be that hardware
> >>>>>>>specific; could your changes actually be moved to the generic PCI
> >>>>>>>level so they could be made to work for lots of drivers?
> >>>>>
> >>>>>It is impossible to use one common solution for all devices unless the
> >>>>>PCIe spec documents it clearly, and I think one day it will. Until
> >>>>>then, we need some workarounds in the guest driver to make it work,
> >>>>>even if they look ugly.
> >>
> >>
> >>Yes, so far there is no hardware migration support, and it's hard to modify
> >>bus-level code. It would also block an implementation on Windows.
> >
> >Please don't assume things. Unless you have hard data from Microsoft
> >that says they want it this way, let's just try to figure out what works
> >best for us for now and then we can start worrying about third party
> >implementations after we have figured out a solution that actually
> >works.
> >
> >- Alex
> >