From: "Lan, Tianyu" <tianyu.lan@intel.com>
To: Alexander Duyck <alexander.duyck@gmail.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>,
	Yang Zhang <yang.zhang.wz@gmail.com>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	qemu-devel@nongnu.org, "Tantilov,
	Emil S" <emil.s.tantilov@intel.com>,
	kvm@vger.kernel.org, Ard Biesheuvel <ard.biesheuvel@linaro.org>,
	aik@ozlabs.ru, "Skidmore, Donald C" <donald.c.skidmore@intel.com>,
	quintela@redhat.com, "Dong, Eddie" <eddie.dong@intel.com>,
	"Jani, Nrupal" <nrupal.jani@intel.com>,
	Alexander Graf <agraf@suse.de>, Blue Swirl <blauwirbel@gmail.com>,
	cornelia.huck@de.ibm.com,
	Alex Williamson <alex.williamson@redhat.com>,
	kraxel@redhat.com, Anthony Liguori <anthony@codemonkey.ws>,
	amit.shah@redhat.com, Paolo Bonzini <pbonzini@redhat.com>,
	"Rustad, Mark D" <mark.d.rustad@intel.com>,
	lcapitulino@redhat.com, Or Gerlitz <gerlitz.or@gmail.com>
Subject: Re: [Qemu-devel] live migration vs device assignment (motivation)
Date: Sun, 13 Dec 2015 23:47:44 +0800
Message-ID: <566D9320.8000209@intel.com>
In-Reply-To: <CAKgT0UduOMvnVAUvRgnXkMPDwvOBh_5RimCgnb0zRr7aOyza4A@mail.gmail.com>



On 12/11/2015 1:16 AM, Alexander Duyck wrote:
> On Thu, Dec 10, 2015 at 6:38 AM, Lan, Tianyu <tianyu.lan@intel.com> wrote:
>>
>>
>> On 12/10/2015 7:41 PM, Dr. David Alan Gilbert wrote:
>>>>
>>>> Ideally, it is able to leave guest driver unmodified but it requires the
>>>>> hypervisor or qemu to aware the device which means we may need a driver
>>>>> in
>>>>> hypervisor or qemu to handle the device on behalf of guest driver.
>>>
>>> Can you answer the question of when do you use your code -
>>>      at the start of migration or
>>>      just before the end?
>>
>>
>> Just before stopping VCPU in this version and inject VF mailbox irq to
>> notify the driver if the irq handler is installed.
>> Qemu side also will check this via the faked PCI migration capability
>> and driver will set the status during device open() or resume() callback.
>
> The VF mailbox interrupt is a very bad idea.  Really the device should
> be in a reset state on the other side of a migration.  It doesn't make
> sense to have the interrupt firing if the device is not configured.
> This is one of the things that is preventing you from being able to
> migrate the device while the interface is administratively down or the
> VF driver is not loaded.

In my opinion, if the VF driver is not loaded and the hardware has not
started working, the device state doesn't need to be migrated.

We may add a flag that lets the driver check whether a migration happened
while the interface was down; the driver would then reinitialize the
hardware and clear the flag when the system tries to bring it up.

We may add a migration core to the Linux kernel and provide some helper
functions to make it easier to add migration support to drivers.
The migration core is in charge of syncing status with Qemu.
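
A minimal user-space sketch of the flag idea above. Every name here
(struct toy_vf_dev, toy_note_migration(), toy_open(), toy_close()) is
hypothetical and only illustrates the control flow; it is not taken from
any existing driver or from the patch series.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-device state; not a real kernel structure. */
struct toy_vf_dev {
    bool up;                   /* interface is administratively up */
    bool migrated_while_down;  /* set if a migration happened while down */
    int hw_inits;              /* counts hardware reinitializations */
};

/* Assumed hook: called by whatever learns that a migration completed. */
void toy_note_migration(struct toy_vf_dev *d)
{
    assert(d);
    if (!d->up)
        d->migrated_while_down = true;
}

/* Stand-in for reprogramming the hardware from scratch. */
static void toy_hw_init(struct toy_vf_dev *d)
{
    d->hw_inits++;
}

/* open(): reinitialize only if a migration happened while we were down. */
int toy_open(struct toy_vf_dev *d)
{
    assert(d);
    if (d->migrated_while_down) {
        toy_hw_init(d);
        d->migrated_while_down = false;
    }
    d->up = true;
    return 0;
}

void toy_close(struct toy_vf_dev *d)
{
    assert(d);
    d->up = false;
}
```

In this sketch a migration that happens while the interface is up is left
to the normal migration path; only the "down" case is deferred to open().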

Example:
migration_register()
The driver provides:
- Callbacks to be called before and after migration, or on the bad path
- The irq it prefers to use for handling migration events

migration_event_check()
The driver calls it in its irq handler. The migration core code checks the
migration status and calls the driver's callbacks when a migration happens.
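
The two entry points above could look roughly like the following
user-space sketch. Only the names migration_register() and
migration_event_check() come from this mail; the signatures, struct
migration_ops, struct migration_client, enum migration_status and the
"pending" field (which stands in for the status synced with Qemu) are
assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical callback set supplied by a driver. */
struct migration_ops {
    void (*pre_migrate)(void *priv);   /* before migration */
    void (*post_migrate)(void *priv);  /* after migration */
    void (*bad_path)(void *priv);      /* migration failed */
};

/* Assumed status values; "pending" stands in for state synced with Qemu. */
enum migration_status { MIG_NONE, MIG_START, MIG_DONE, MIG_FAILED };

struct migration_client {
    const struct migration_ops *ops;
    int irq;                       /* irq the driver prefers for events */
    void *priv;                    /* driver private data */
    enum migration_status pending;
};

int migration_register(struct migration_client *c,
                       const struct migration_ops *ops, int irq, void *priv)
{
    if (!c || !ops)
        return -1;
    c->ops = ops;
    c->irq = irq;
    c->priv = priv;
    c->pending = MIG_NONE;
    return 0;
}

/* Called from the driver's irq handler; true if an event was consumed. */
bool migration_event_check(struct migration_client *c)
{
    enum migration_status s;

    if (!c || !c->ops)
        return false;
    s = c->pending;
    if (s == MIG_NONE)
        return false;
    c->pending = MIG_NONE;         /* consume the event */
    if (s == MIG_START && c->ops->pre_migrate)
        c->ops->pre_migrate(c->priv);
    else if (s == MIG_DONE && c->ops->post_migrate)
        c->ops->post_migrate(c->priv);
    else if (s == MIG_FAILED && c->ops->bad_path)
        c->ops->bad_path(c->priv);
    return true;
}

/* Toy driver used only to exercise the sketch. */
struct toy_vf { int paused; int resets; };

static void toy_pre(void *p)  { struct toy_vf *vf = p; assert(vf); vf->paused = 1; }
static void toy_post(void *p) { struct toy_vf *vf = p; assert(vf); vf->resets++; vf->paused = 0; }
static void toy_bad(void *p)  { struct toy_vf *vf = p; assert(vf); vf->paused = 0; }

static const struct migration_ops toy_ops = { toy_pre, toy_post, toy_bad };
```

A driver would call migration_register() once at probe time and
migration_event_check() at the top of its irq handler; when the call
returns false, the interrupt is an ordinary device interrupt.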


>
> My thought on all this is that it might make sense to move this
> functionality into a PCI-to-PCI bridge device and make it a
> requirement that all direct-assigned devices have to exist behind that
> device in order to support migration.  That way you would be working
> with a directly emulated device that would likely already be
> supporting hot-plug anyway.  Then it would just be a matter of coming
> up with a few Qemu specific extensions that you would need to add to
> the device itself.  The same approach would likely be portable enough
> that you could achieve it with PCIe as well via the same configuration
> space being present on the upstream side of a PCIe port or maybe a
> PCIe switch of some sort.
>
> It would then be possible to signal via your vendor-specific PCI
> capability on that device that all devices behind this bridge require
> DMA page dirtying, you could use the configuration in addition to the
> interrupt already provided for hot-plug to signal things like when you
> are starting migration, and possibly even just extend the shpc
> functionality so that if this capability is present you have the
> option to pause/resume instead of remove/probe the device in the case
> of certain hot-plug events.  The fact is there may be some use for a
> pause/resume type approach for PCIe hot-plug in the near future
> anyway.  From the sounds of it Apple has required it for all
> Thunderbolt device drivers so that they can halt the device in order
> to shuffle resources around, perhaps we should look at something
> similar for Linux.
>
> The other advantage behind grouping functions on one bridge is things
> like reset domains.  The PCI error handling logic will want to be able
> to reset any devices that experienced an error in the event of
> something such as a surprise removal.  By grouping all of the devices
> you could disable/reset/enable them as one logical group in the event
> of something such as the "bad path" approach Michael has mentioned.
>

It sounds like we would need to add a faked bridge for migration and a
guest driver for it. The PCI bus/hotplug driver would also need to be
extended to pause/resume other devices, right?

My concern is still whether we can change the PCI bus/hotplug driver like
that without a spec change.

An IRQ is generic to any device, and we may extend it for migration. The
device driver can also decide whether or not to support migration.



>>>
>>>>>>> It would be great if we could avoid changing the guest; but at least
>>>>>>> your guest
>>>>>>> driver changes don't actually seem to be that hardware specific;
>>>>>>> could your
>>>>>>> changes actually be moved to generic PCI level so they could be made
>>>>>>> to work for lots of drivers?
>>>>
>>>>>
>>>>> It is impossible to use one common solution for all devices unless the
>>>>> PCIE
>>>>> spec documents it clearly and i think one day it will be there. But
>>>>> before
>>>>> that, we need some workarounds on guest driver to make it work even it
>>>>> looks
>>>>> ugly.
>>
>>
>> Yes, so far there is not hardware migration support and it's hard to modify
>> bus level code. It also will block implementation on the Windows.
>
> Please don't assume things.  Unless you have hard data from Microsoft
> that says they want it this way lets just try to figure out what works
> best for us for now and then we can start worrying about third party
> implementations after we have figured out a solution that actually
> works.
>
> - Alex
>
