From: Lan Tianyu <tianyu.lan@intel.com>
To: Alexander Duyck <alexander.duyck@gmail.com>
Cc: Yang Zhang <yang.zhang.wz@gmail.com>,
"Tantilov, Emil S" <emil.s.tantilov@intel.com>,
kvm@vger.kernel.org, "Michael S. Tsirkin" <mst@redhat.com>,
aik@ozlabs.ru, qemu-devel@nongnu.org, lcapitulino@redhat.com,
Blue Swirl <blauwirbel@gmail.com>,
kraxel@redhat.com, "Rustad, Mark D" <mark.d.rustad@intel.com>,
quintela@redhat.com,
"Skidmore, Donald C" <donald.c.skidmore@intel.com>,
Alexander Graf <agraf@suse.de>, Or Gerlitz <gerlitz.or@gmail.com>,
"Dr. David Alan Gilbert" <dgilbert@redhat.com>,
Alex Williamson <alex.williamson@redhat.com>,
Anthony Liguori <anthony@codemonkey.ws>,
cornelia.huck@de.ibm.com,
Ard Biesheuvel <ard.biesheuvel@linaro.org>,
"Dong, Eddie" <eddie.dong@intel.com>,
"Jani, Nrupal" <nrupal.jani@intel.com>,
amit.shah@redhat.com, Paolo Bonzini <pbonzini@redhat.com>
Subject: Re: live migration vs device assignment (motivation)
Date: Fri, 25 Dec 2015 15:03:47 +0800
Message-ID: <567CEA53.5030601@intel.com>
In-Reply-To: <CAKgT0Uc9g5aqKUKudD4Rj+1KfbGZn6VLzZxGv7UrRK+dy3wEVA@mail.gmail.com>
Merry Christmas.
Sorry for the late response; I was away on personal business.
On 2015-12-14 03:30, Alexander Duyck wrote:
>> > It sounds like we need to add a fake bridge for migration and a
>> > driver in the guest for it. We would also need to extend the PCI
>> > bus/hotplug driver to pause/resume other devices, right?
>> >
>> > My concern is still whether we can change PCI bus/hotplug like that
>> > without a spec change.
>> >
>> > An IRQ is generic to any device, and we could extend it for
>> > migration. The device driver can also decide whether to support
>> > migration or not.
> The device should have no say in the matter. Either we are going to
> migrate or we will not. This is why I have suggested my approach as
> it allows for the least amount of driver intrusion while providing the
> maximum number of ways to still perform migration even if the device
> doesn't support it.
Even if the device driver doesn't support migration, you still want to
migrate the VM? That may be risky, and we should at least add a "bad
path" for the driver.
>
> The solution I have proposed is simple:
>
> 1. Extend swiotlb to allow for a page dirtying functionality.
>
> This part is pretty straightforward. I'll submit a few patches
> later today as RFC that can provide the minimal functionality needed
> for this.
I would very much appreciate that.
>
> 2. Provide a vendor specific configuration space option on the QEMU
> implementation of a PCI bridge to act as a bridge between direct
> assigned devices and the host bridge.
>
> My thought was to add some vendor specific block that includes a
> capabilities, status, and control register so you could go through and
> synchronize things like the DMA page dirtying feature. The bridge
> itself could manage the migration capable bit inside QEMU for all
> devices assigned to it. So if you added a VF to the bridge it would
> flag that you can support migration in QEMU, while the bridge would
> indicate you cannot until the DMA page dirtying control bit is set by
> the guest.
>
> We could also go through and optimize the DMA page dirtying after
> this is added so that we can narrow down the scope of use, and as a
> result improve the performance for other devices that don't need to
> support migration. It would then be a matter of adding an interrupt
> in the device to handle an event such as the DMA page dirtying status
> bit being set in the config space status register, while the bit is
> not set in the control register. If it doesn't get set then we would
> have to evict the devices before the warm-up phase of the migration,
> otherwise we can defer it until the end of the warm-up phase.
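To check my understanding of the capability/status/control handshake
described above, here is a small model of it. It is written in Python
only to show the logic, not as QEMU or kernel code. The bit value, the
class name and the method names are all hypothetical; none of them come
from an existing patch or from the PCI spec.

```python
DMA_DIRTY = 0x1  # hypothetical bit position for DMA page dirtying


class BridgeMigrationBlock:
    """Model of the vendor specific block on the fake bridge (assumed layout)."""

    def __init__(self, capabilities=DMA_DIRTY):
        self.capabilities = capabilities  # what the bridge advertises
        self.status = 0                   # bits raised by the host (QEMU)
        self.control = 0                  # bits acknowledged by the guest

    def host_request(self, bits):
        # The host can only raise status bits the bridge advertises.
        self.status |= bits & self.capabilities

    def guest_enable(self, bits):
        # The guest can only enable features the bridge advertises.
        self.control |= bits & self.capabilities

    def pending_events(self):
        # Status set while control is not set: the case that would
        # raise an interrupt in the guest.
        return self.status & ~self.control

    def eviction_point(self):
        # Without DMA page dirtying the device has to go before warm-up;
        # with it, eviction can wait until the end of warm-up.
        if self.control & DMA_DIRTY:
            return "end-of-warm-up"
        return "before-warm-up"
```

In this model the guest sees a pending event as soon as the host raises
the status bit, and the eviction point moves to the end of warm-up only
after the guest sets the control bit.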
>
> 3. Extend existing shpc driver to support the optional "pause"
> functionality as called out in section 4.1.2 of the Revision 1.1 PCI
> hot-plug specification.
Since your solution already adds a fake PCI bridge, why not notify the
bridge directly during migration via an IRQ and call the device driver's
callback from the new bridge driver?
The new bridge driver could also check whether the device driver
provides migration callbacks and, if so, call them to improve the
passthrough device's performance during migration.
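A rough model of that dispatch, in Python only to illustrate the control
flow (the real thing would be a kernel bridge driver). Every name here
(AssignedDevice, FakeBridgeDriver, on_migration_irq, pre_migrate,
post_migrate) is made up for the example and is not an existing
interface.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class AssignedDevice:
    """A passthrough device behind the fake bridge (hypothetical)."""
    name: str
    # Optional callbacks a device driver may register for migration.
    pre_migrate: Optional[Callable[[], None]] = None
    post_migrate: Optional[Callable[[], None]] = None


@dataclass
class FakeBridgeDriver:
    devices: List[AssignedDevice] = field(default_factory=list)

    def on_migration_irq(self, phase: str) -> List[str]:
        """Run per-driver callbacks; return devices needing the generic path."""
        if phase not in ("start", "done"):
            raise ValueError(f"unknown migration phase: {phase}")
        fallback = []
        for dev in self.devices:
            cb = dev.pre_migrate if phase == "start" else dev.post_migrate
            if cb is None:
                # Driver has no migration support: leave it to the
                # generic pause / hot-plug handling.
                fallback.append(dev.name)
            else:
                cb()
        return fallback
```

Devices whose drivers register no callbacks are simply reported back, so
the bridge can fall back to the generic handling for them.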
>
> Note I call out "extend" here instead of saying to add this.
> Basically what we should do is provide a means of quiescing the device
> without unloading the driver. This is called out as something the OS
> vendor can optionally implement in the PCI hot-plug specification. On
> OSes that wouldn't support this it would just be treated as a standard
> hot-plug event. We could add a capability, status, and control bit
> in the vendor specific configuration block for this as well. If set,
> the status bit would indicate the host wants to pause instead of
> remove, and the control bit would indicate the guest supports "pause"
> in the OS. We then could optionally disable guest migration while the
> VF is present and pause is not supported.
>
> To support this we would need to add a timer and if a new device
> is not inserted in some period of time (60 seconds for example), or if
> a different device is inserted, we need to unload the original driver
> from the device. In addition we would need to verify if drivers can
> call the remove function after having called suspend without resume.
> If not, we could look at adding a recovery function to remove the
> driver from the device in the case of a suspend with either a failed
> resume or no resume call. Once again it would probably be useful to
> have for those cases where power management suspend/resume runs into
> an issue like somebody causing a surprise removal while a device was
> suspended.
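The timeout rule for a paused device could be modeled as below. Again
this is Python used only as a sketch of the decision, under my own
assumptions: devices are compared by vendor/device ID, the timeout is
the 60 seconds given as an example above, and the names DeviceId and
resolve_pause are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

PAUSE_TIMEOUT_S = 60.0  # example value from the discussion, not a spec value


@dataclass(frozen=True)
class DeviceId:
    vendor: int
    device: int


def resolve_pause(paused: DeviceId,
                  inserted: Optional[DeviceId],
                  elapsed_s: float,
                  timeout_s: float = PAUSE_TIMEOUT_S) -> str:
    """Decide what to do with the driver of a paused device."""
    if inserted is None:
        # Nothing has been plugged back in yet.
        return "unload" if elapsed_s >= timeout_s else "wait"
    if elapsed_s >= timeout_s or inserted != paused:
        # Too late, or a different device: treat it as a normal
        # hot-plug and unload the original driver.
        return "unload"
    # Same device came back in time: resume without reloading.
    return "resume"
```

For example, resolve_pause(vf, vf, 5.0) gives "resume", while the same
call with no device inserted after 60 seconds gives "unload".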
--
Best regards
Tianyu Lan
Thread overview: 71+ messages
2015-11-24 13:35 [RFC PATCH V2 00/10] Qemu: Add live migration support for SRIOV NIC Lan Tianyu
2015-11-24 13:35 ` [RFC PATCH V2 01/10] Qemu/VFIO: Create head file pci.h to share data struct Lan Tianyu
2015-11-24 13:35 ` [RFC PATCH V2 02/10] Qemu/VFIO: Add new VFIO_GET_PCI_CAP_INFO ioctl cmd definition Lan Tianyu
2015-12-02 22:25 ` Alex Williamson
2015-12-03 8:40 ` Lan, Tianyu
2015-12-03 15:26 ` Alex Williamson
2015-11-24 13:35 ` [RFC PATCH V2 03/10] Qemu/VFIO: Rework vfio_std_cap_max_size() function Lan Tianyu
2015-11-24 13:35 ` [RFC PATCH V2 04/10] Qemu/VFIO: Add vfio_find_free_cfg_reg() to find free PCI config space regs Lan Tianyu
2015-11-24 13:35 ` [RFC PATCH V2 05/10] Qemu/VFIO: Expose PCI config space read/write and msix functions Lan Tianyu
2015-11-24 13:35 ` [RFC PATCH V2 06/10] Qemu/PCI: Add macros for faked PCI migration capability Lan Tianyu
2015-12-02 22:25 ` Alex Williamson
2015-12-03 8:57 ` Lan, Tianyu
2015-11-24 13:35 ` [RFC PATCH V2 07/10] Qemu: Add post_load_state() to run after restoring CPU state Lan Tianyu
2015-11-24 13:35 ` [RFC PATCH V2 08/10] Qemu: Add save_before_stop callback to run just before stopping VCPU during migration Lan Tianyu
2015-11-24 13:35 ` [RFC PATCH V2 09/10] Qemu/VFIO: Add SRIOV VF migration support Lan Tianyu
2015-11-24 21:03 ` Michael S. Tsirkin
2015-11-25 15:32 ` Lan, Tianyu
2015-11-25 15:44 ` Michael S. Tsirkin
2015-12-02 22:25 ` Alex Williamson
2015-12-03 8:56 ` Lan, Tianyu
2015-11-24 13:35 ` [RFC PATCH V2 10/10] Qemu/VFIO: Misc change for enable migration with VFIO Lan Tianyu
2015-11-30 8:01 ` [RFC PATCH V2 00/10] Qemu: Add live migration support for SRIOV NIC Michael S. Tsirkin
2015-12-01 6:26 ` Lan, Tianyu
2015-12-01 15:02 ` Michael S. Tsirkin
2015-12-02 14:08 ` Lan, Tianyu
2015-12-02 14:31 ` Michael S. Tsirkin
2015-12-03 14:53 ` Lan, Tianyu
2015-12-04 6:42 ` Lan, Tianyu
2015-12-04 8:05 ` Michael S. Tsirkin
2015-12-04 12:11 ` Lan, Tianyu
2015-12-03 18:32 ` Alexander Duyck
2015-12-07 16:50 ` live migration vs device assignment (was Re: [RFC PATCH V2 00/10] Qemu: Add live migration support for SRIOV NIC) Michael S. Tsirkin
2015-12-09 16:26 ` live migration vs device assignment (motivation) Lan, Tianyu
2015-12-09 17:14 ` Alexander Duyck
2015-12-10 3:15 ` Lan, Tianyu
2015-12-09 20:07 ` Michael S. Tsirkin
2015-12-10 3:04 ` Lan, Tianyu
2015-12-10 8:38 ` Michael S. Tsirkin
2015-12-10 14:23 ` Lan, Tianyu
2015-12-10 10:18 ` [Qemu-devel] " Dr. David Alan Gilbert
2015-12-10 11:28 ` Yang Zhang
2015-12-10 11:41 ` Dr. David Alan Gilbert
2015-12-10 13:07 ` Yang Zhang
2015-12-10 14:38 ` Lan, Tianyu
2015-12-10 16:11 ` [Qemu-devel] " Michael S. Tsirkin
2015-12-10 19:17 ` Alexander Duyck
2015-12-11 7:32 ` Lan, Tianyu
2015-12-14 9:12 ` Michael S. Tsirkin
2015-12-10 16:23 ` Dr. David Alan Gilbert
2015-12-10 17:16 ` Alexander Duyck
2015-12-13 15:47 ` Lan, Tianyu
2015-12-13 19:30 ` Alexander Duyck
2015-12-25 7:03 ` Lan Tianyu [this message]
2015-12-25 12:11 ` Michael S. Tsirkin
2015-12-28 17:42 ` Lan, Tianyu
2015-12-29 16:46 ` Michael S. Tsirkin
2015-12-29 17:04 ` Alexander Duyck
2015-12-29 17:15 ` Michael S. Tsirkin
2015-12-29 18:04 ` [Qemu-devel] " Alexander Duyck
2016-01-04 2:15 ` Lan Tianyu
2015-12-25 22:31 ` Alexander Duyck
2015-12-27 9:21 ` Michael S. Tsirkin
2015-12-27 21:45 ` [Qemu-devel] " Alexander Duyck
2015-12-28 8:51 ` Michael S. Tsirkin
2015-12-28 3:20 ` Dong, Eddie
2015-12-28 4:26 ` Alexander Duyck
2015-12-28 11:50 ` [Qemu-devel] " Michael S. Tsirkin
2015-12-14 9:26 ` Michael S. Tsirkin
2015-12-28 8:52 ` Pavel Fedin
2015-12-28 11:51 ` Michael S. Tsirkin
2016-03-17 9:15 ` [Qemu-devel] [RFC PATCH V2 00/10] Qemu: Add live migration support for SRIOV NIC Wei Yang