From: Alexander Duyck <alexander.duyck@gmail.com>
To: Alex Williamson <alex.williamson@redhat.com>,
Or Gerlitz <gerlitz.or@gmail.com>
Cc: emil.s.tantilov@intel.com, kvm@vger.kernel.org,
"Michael S. Tsirkin <mst@redhat.com> (mst@redhat.com)"
<mst@redhat.com>,
linux-pci@vger.kernel.org, qemu-devel@nongnu.org,
Jesse Brandeburg <jesse.brandeburg@intel.com>,
carolyn.wyborny@intel.com, "Skidmore,
Donald C" <donald.c.skidmore@intel.com>,
agraf@suse.de, matthew.vick@intel.com,
intel-wired-lan@lists.osuosl.org,
Jeff Kirsher <jeffrey.t.kirsher@intel.com>,
yang.z.zhang@intel.com,
Mitch Williams <mitch.a.williams@intel.com>,
nrupal.jani@intel.com, bhelgaas@google.com,
Lan Tianyu <tianyu.lan@intel.com>,
Linux Netdev List <netdev@vger.kernel.org>,
Shannon Nelson <shannon.nelson@intel.com>,
eddie.dong@intel.com, Linux Kernel <linux-kernel@vger.kernel.org>,
john.ronciak@intel.com, Paolo Bonzini <pbonzini@redhat.com>
Subject: Re: [Qemu-devel] [RFC Patch 00/12] IXGBE: Add live migration support for SRIOV NIC
Date: Wed, 21 Oct 2015 16:26:47 -0700
Message-ID: <56281F37.1050506@gmail.com>
In-Reply-To: <1445455227.4059.867.camel@redhat.com>
On 10/21/2015 12:20 PM, Alex Williamson wrote:
> On Wed, 2015-10-21 at 21:45 +0300, Or Gerlitz wrote:
>> On Wed, Oct 21, 2015 at 7:37 PM, Lan Tianyu <tianyu.lan@intel.com> wrote:
>>> This patchset is to propose a new solution to add live migration support
>>> for 82599 SRIOV network card.
>>
>>> In our solution, we prefer to put all device specific operation into VF and
>>> PF driver and make code in the Qemu more general.
>>
>> [...]
>>
>>> Service down time test
>>> So far, we tested migration between two laptops with 82599 nic which
>>> are connected to a gigabit switch. Ping VF in the 0.001s interval
>>> during migration on the host of source side. The service down
>>> time is about 180ms.
>>
>> So... what would you expect service down wise for the following
>> solution which is zero touch and I think should work for any VF
>> driver:
>>
>> on host A: unplug the VM and conduct live migration to host B ala the
>> no-SRIOV case.
>
> The trouble here is that the VF needs to be unplugged prior to the start
> of migration because we can't do effective dirty page tracking while the
> device is connected and doing DMA. So the downtime, assuming we're
> counting only VF connectivity, is dependent on memory size, rate of
> dirtying, and network bandwidth; seconds for small guests, minutes or
> more (maybe much, much more) for large guests.

The question of dirty page tracking though should be pretty simple. We
start the Tx packets out as dirty so we don't need to add anything
there. It seems like the Rx data and Tx/Rx descriptor rings are the issue.

> This is why the typical VF agnostic approach here is to use bonding
> and fail over to an emulated device during migration, so performance
> suffers, but downtime is something acceptable.
>
> If we want the ability to defer the VF unplug until just before the
> final stages of the migration, we need the VF to participate in dirty
> page tracking. Here it's done via an enlightened guest driver. Alex
> Graf presented a solution using a device specific enlightenment in QEMU.
> Otherwise we'd need hardware support from the IOMMU.

My only real complaint with this patch series is that it seems like
there was too much focus on instrumenting the driver instead of providing
the code necessary to enable a driver ecosystem that enables migration.

I don't know if what we need is a full hardware IOMMU. It seems like a
good way to take care of the need to flag dirty pages for DMA capable
devices would be to add functionality to the dma_map_ops calls
sync_{sg|single}_for_cpu and unmap_{page|sg} so that they would take care
of marking the pages as dirty for us when needed. We could probably
make do with just a few tweaks to the existing API in order to make this work.
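
To make the dirty-marking idea above concrete, here is a minimal userspace
sketch. It is not kernel code and not part of the patch series: the dirty
log, the direction enum, and every function name (model_sync_for_cpu,
mark_range_dirty, and so on) are hypothetical stand-ins. It only shows the
shape of the logic: when a buffer the device may have written is handed
back to the CPU, every page it touches is flagged dirty, while
device-read-only (Tx) buffers are skipped.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MODEL_PAGE_SHIFT 12   /* assumption: 4 KiB pages */
#define MODEL_NPAGES     64   /* assumption: tiny 64-page "guest" */

/* Hypothetical dirty log: one bit per guest page. */
struct dirty_log {
    uint8_t bits[MODEL_NPAGES / 8];
};

/* Hypothetical direction enum, loosely mirroring DMA data directions. */
enum model_dma_dir {
    MODEL_DMA_TO_DEVICE,    /* device only reads the buffer (Tx data) */
    MODEL_DMA_FROM_DEVICE,  /* device writes the buffer (Rx data) */
    MODEL_DMA_BIDIRECTIONAL
};

static void dirty_log_clear(struct dirty_log *log)
{
    memset(log->bits, 0, sizeof(log->bits));
}

static int dirty_log_test(const struct dirty_log *log, size_t pfn)
{
    assert(pfn < MODEL_NPAGES);
    return (log->bits[pfn / 8] >> (pfn % 8)) & 1;
}

/* Flag every page overlapped by [addr, addr + len) as dirty. */
static void mark_range_dirty(struct dirty_log *log, uint64_t addr, size_t len)
{
    if (len == 0)
        return;

    size_t first = (size_t)(addr >> MODEL_PAGE_SHIFT);
    size_t last = (size_t)((addr + len - 1) >> MODEL_PAGE_SHIFT);

    assert(last < MODEL_NPAGES);
    for (size_t pfn = first; pfn <= last; pfn++)
        log->bits[pfn / 8] |= (uint8_t)(1u << (pfn % 8));
}

/*
 * Stand-in for a sync-for-cpu or unmap hook: only buffers the device
 * could have written need to be reported as dirty.
 */
static void model_sync_for_cpu(struct dirty_log *log, uint64_t addr,
                               size_t len, enum model_dma_dir dir)
{
    if (dir != MODEL_DMA_TO_DEVICE)
        mark_range_dirty(log, addr, len);
}
```

In this model a driver would need no migration-specific code of its own;
the bookkeeping happens wherever the buffer is returned to the CPU.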

As far as the descriptor rings go, I would argue they are invalid as soon
as we migrate. The problem is there is no way to guarantee ordering, as we
cannot pre-emptively mark an Rx data buffer as being a dirty page when
we haven't even looked at the Rx descriptor for the given buffer yet.
Tx has similar issues, as we cannot guarantee the Tx will disable itself
after a complete frame. As such I would say the moment we migrate we
should just give up on the frames that are still in the descriptor
rings, drop them, and then start over with fresh rings.

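
The "drop and start over" approach described above can be sketched in
userspace as follows. This is an illustrative model only, under stated
assumptions: the ring layout, the ring size, and all names (model_ring,
ring_post, ring_reset_after_migration) are hypothetical and do not
correspond to the ixgbevf driver's actual structures. On migration the
frames still outstanding are counted as dropped and the ring is
reinitialized from scratch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MODEL_RING_SIZE 8     /* assumption: tiny ring for illustration */

/* Hypothetical descriptor: buffer address and length only. */
struct model_desc {
    uint64_t addr;
    uint32_t len;
};

/* Hypothetical ring with producer and consumer indices. */
struct model_ring {
    struct model_desc desc[MODEL_RING_SIZE];
    unsigned int next_to_use;    /* producer index */
    unsigned int next_to_clean;  /* consumer index */
    unsigned long dropped;       /* frames abandoned across migrations */
};

/* Number of descriptors posted but not yet cleaned. */
static unsigned int ring_pending(const struct model_ring *r)
{
    return (r->next_to_use + MODEL_RING_SIZE - r->next_to_clean) %
           MODEL_RING_SIZE;
}

/* Post one buffer; returns -1 if the ring is full, 0 on success. */
static int ring_post(struct model_ring *r, uint64_t addr, uint32_t len)
{
    if (ring_pending(r) == MODEL_RING_SIZE - 1)
        return -1;

    r->desc[r->next_to_use].addr = addr;
    r->desc[r->next_to_use].len = len;
    r->next_to_use = (r->next_to_use + 1) % MODEL_RING_SIZE;
    return 0;
}

/*
 * After migration the old descriptor state cannot be trusted, so give
 * up on whatever was in flight and start with an empty ring.
 */
static void ring_reset_after_migration(struct model_ring *r)
{
    r->dropped += ring_pending(r);
    memset(r->desc, 0, sizeof(r->desc));
    r->next_to_use = 0;
    r->next_to_clean = 0;
}
```

The cost in this model is a handful of lost frames per migration, which
upper-layer protocols already have to tolerate.
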
- Alex