From: Alex Williamson <alex.williamson@redhat.com>
To: Alexander Duyck <alexander.duyck@gmail.com>
Cc: emil.s.tantilov@intel.com, kvm@vger.kernel.org,
linux-pci@vger.kernel.org, qemu-devel@nongnu.org,
jesse.brandeburg@intel.com, carolyn.wyborny@intel.com,
donald.c.skidmore@intel.com, agraf@suse.de,
matthew.vick@intel.com, intel-wired-lan@lists.osuosl.org,
jeffrey.t.kirsher@intel.com, yang.z.zhang@intel.com,
mitch.a.williams@intel.com, nrupal.jani@intel.com,
bhelgaas@google.com, Lan Tianyu <tianyu.lan@intel.com>,
netdev@vger.kernel.org, shannon.nelson@intel.com,
eddie.dong@intel.com, linux-kernel@vger.kernel.org,
john.ronciak@intel.com, pbonzini@redhat.com
Subject: Re: [Qemu-devel] [RFC Patch 00/12] IXGBE: Add live migration support for SRIOV NIC
Date: Fri, 23 Oct 2015 13:05:53 -0600
Message-ID: <1445627153.5050.52.camel@redhat.com>
In-Reply-To: <562A7E33.4080800@gmail.com>

On Fri, 2015-10-23 at 11:36 -0700, Alexander Duyck wrote:
> On 10/21/2015 09:37 AM, Lan Tianyu wrote:
> > This patchset proposes a new solution that adds live migration support
> > for the 82599 SR-IOV network card.
> >
> > In our solution, we prefer to put all device-specific operations into
> > the VF and PF drivers and keep the code in Qemu more generic.
> >
> >
> > VF status migration
> > =================================================================
> > VF status can be divided into four parts:
> > 1) PCI configuration registers
> > 2) MSI-X configuration
> > 3) VF status in the PF driver
> > 4) VF MMIO registers
> >
> > The first three are all handled by Qemu.
> > The PCI configuration registers and MSI-X configuration are already
> > stored in Qemu. To let Qemu save and restore the "VF status in the PF
> > driver" during migration, we add a new sysfs node, "state_in_pf", under
> > the VF's sysfs directory.
> >
> > For VF MMIO registers, we introduce a self-emulation layer in the VF
> > driver that records register values on every MMIO read or write and
> > keeps this data in guest memory, so it is migrated to the new machine
> > along with the rest of guest memory.
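A minimal userspace sketch of the self-emulation idea, assuming a shadow array kept in ordinary guest RAM with one slot per 32-bit register. The names (`vf_reg_shadow`, `shadow_read`, `shadow_write`, `shadow_restore`, `fake_bar`, `fake_vf_reset`) are hypothetical and not taken from the patchset; a real driver would call readl()/writel() on the mapped BAR instead of touching `fake_bar`.

```c
#include <assert.h>
#include <stdint.h>

#define VF_REG_SPACE 0x1000u              /* hypothetical size of the VF register region */
#define VF_NUM_REGS  (VF_REG_SPACE / 4u)  /* number of 32-bit registers */

/* Stand-in for the device's MMIO BAR (hypothetical). */
static volatile uint32_t fake_bar[VF_NUM_REGS];

/* Shadow copy in ordinary guest RAM, so it is migrated with guest memory. */
struct vf_reg_shadow {
    uint32_t val[VF_NUM_REGS];
    uint8_t  valid[VF_NUM_REGS];  /* 1 once the register has been seen */
};

static void shadow_write(struct vf_reg_shadow *s, uint32_t off, uint32_t v)
{
    assert(off % 4 == 0 && off < VF_REG_SPACE);
    fake_bar[off / 4] = v;   /* the real MMIO write */
    s->val[off / 4] = v;     /* record what was written */
    s->valid[off / 4] = 1;
}

static uint32_t shadow_read(struct vf_reg_shadow *s, uint32_t off)
{
    assert(off % 4 == 0 && off < VF_REG_SPACE);
    uint32_t v = fake_bar[off / 4];  /* the real MMIO read */
    s->val[off / 4] = v;             /* record what was read */
    s->valid[off / 4] = 1;
    return v;
}

/* Simulates the zeroed registers of a freshly reset VF on the destination. */
static void fake_vf_reset(void)
{
    for (uint32_t i = 0; i < VF_NUM_REGS; i++)
        fake_bar[i] = 0;
}

/*
 * Replay every recorded value onto the new VF. A real driver must skip
 * read-only registers (such as the Tx/Rx descriptor head registers) and
 * registers whose writes have side effects; this sketch replays everything.
 */
static void shadow_restore(const struct vf_reg_shadow *s)
{
    for (uint32_t i = 0; i < VF_NUM_REGS; i++)
        if (s->valid[i])
            fake_bar[i] = s->val[i];
}
```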
> >
> >
> > VF function restoration
> > ================================================================
> > Restoring VF operation is done in the VF and PF drivers.
> >
> > To let the VF driver know the migration status, Qemu fakes the VF's
> > PCI configuration registers to indicate it, and we add a new sysfs node,
> > "notify_vf", that triggers a VF mailbox IRQ to notify the VF of a
> > migration status change.
> >
> > The Tx/Rx descriptor head registers are read-only, so they can't be
> > restored by writing back the recorded values, and they are reset to 0
> > during VF reset. To reuse the original Tx/Rx rings, we shift each
> > descriptor ring so that the descriptor pointed to by the original head
> > register becomes the first entry, and then enable the Tx/Rx rings. The
> > VF then resumes receiving and transmitting from the original head
> > descriptor.
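The ring shift can be sketched as an in-place left rotation by the saved head index, here using the classic three-reversal trick. `struct ring_desc`, `ring_reverse()` and `ring_rotate_to_head()` are hypothetical; the sketch assumes the driver's software buffer-info array is rotated by the same amount (not shown) and that the saved tail is remapped to `(tail - head) mod count`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 16-byte descriptor (the 82599 uses 16-byte descriptors). */
struct ring_desc {
    uint64_t addr;
    uint64_t cmd_status;
};

/* Reverse ring[lo, hi) in place. */
static void ring_reverse(struct ring_desc *ring, size_t lo, size_t hi)
{
    while (lo + 1 < hi) {
        struct ring_desc tmp = ring[lo];
        ring[lo] = ring[hi - 1];
        ring[hi - 1] = tmp;
        lo++;
        hi--;
    }
}

/*
 * Rotate the ring left by `head` so the descriptor the hardware would
 * have processed next lands at index 0, matching the head register's
 * post-reset value of 0. Returns the tail index in the rotated ring.
 */
static size_t ring_rotate_to_head(struct ring_desc *ring, size_t count,
                                  size_t head, size_t tail)
{
    assert(count > 0 && head < count && tail < count);
    ring_reverse(ring, 0, head);
    ring_reverse(ring, head, count);
    ring_reverse(ring, 0, count);
    return (tail + count - head) % count;
}
```

Reversal keeps the rotation in place with no temporary copy of the ring, which matters if the ring is large and must be fixed up before the Tx/Rx queues are re-enabled.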
> >
> >
> > Tracking DMA accessed memory
> > =================================================================
> > Migration relies on tracking dirty pages to migrate memory. Hardware
> > can't automatically mark a page as dirty after a DMA memory access, yet
> > VF descriptor rings and data buffers are modified by hardware when
> > receiving and transmitting data. To track such dirty memory manually,
> > we do dummy writes (read a byte and write it back) when receiving and
> > transmitting data.
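A sketch of the dummy write, assuming 4 KiB pages and a hypervisor that logs dirty pages by write-protecting guest memory, so any CPU store to a page marks it dirty. `dma_mark_dirty()` and `SKETCH_PAGE_SIZE` are hypothetical; the helper touches only bytes inside the buffer, so it never stores to memory the driver does not own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u /* assumption: 4 KiB guest pages */

/*
 * Touch one byte in every page covered by [buf, buf + len) so that
 * write-protect based dirty logging in the hypervisor sees a CPU store.
 * The stored value equals the loaded one, so the data is unchanged.
 * The load/store pair is not atomic with respect to device DMA, so call
 * this only on buffers the device has finished writing.
 * Returns the number of pages touched.
 */
static size_t dma_mark_dirty(void *buf, size_t len)
{
    if (len == 0)
        return 0;
    assert(buf != NULL);

    uintptr_t first = (uintptr_t)buf;
    uintptr_t end = first + len; /* exclusive */
    uintptr_t page = first & ~(uintptr_t)(SKETCH_PAGE_SIZE - 1);
    size_t touched = 0;

    for (; page < end; page += SKETCH_PAGE_SIZE) {
        /* Use the first byte of this page that lies inside the buffer. */
        volatile uint8_t *p = (volatile uint8_t *)(page < first ? first : page);
        uint8_t v = *p;
        *p = v;
        touched++;
    }
    return touched;
}
```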
>
> I was thinking about it and I am pretty sure the dummy write approach is
> problematic at best. Specifically, the issue is that while you are
> performing a dummy write you risk pulling in descriptors for data that
> hasn't been dummy written to yet. So when you resume and restore your
> descriptors, you may have Rx descriptors indicating they contain data
> when, after the migration, they don't.
>
> I really think the best approach would be to look at implementing an
> emulated IOMMU so that you could track DMA-mapped pages and avoid
> migrating the ones marked as DMA_FROM_DEVICE until they are unmapped.
> The advantage is that the ixgbevf driver now reuses the same pages for
> Rx DMA, so it rewrites those pages often. If you are marking those pages
> as dirty and transferring them, a flow of small packets could really
> make a mess of things, since you would be rewriting the same pages in a
> loop while the device is processing packets.
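A sketch of the bookkeeping such an emulated IOMMU could do on the hypervisor side, assuming every guest map/unmap traps so the page-frame number and direction are known. All names here (`viommu_track`, `viommu_map`, `viommu_unmap`, `viommu_collect`, the `SK_*` directions, `SK_NPAGES`) are hypothetical. Pages with a live device-writable mapping are skipped by the dirty-page scan and are queued again once their last such mapping is removed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SK_NPAGES 64u /* hypothetical guest size, in pages */

enum sk_dma_dir { SK_TO_DEVICE, SK_FROM_DEVICE };

struct viommu_track {
    uint16_t dev_writable[SK_NPAGES]; /* live DMA_FROM_DEVICE mappings per page */
    bool dirty[SK_NPAGES];            /* page needs to be (re)sent */
};

static void viommu_map(struct viommu_track *t, size_t pfn, enum sk_dma_dir dir)
{
    assert(pfn < SK_NPAGES);
    if (dir == SK_FROM_DEVICE)
        t->dev_writable[pfn]++;
}

static void viommu_unmap(struct viommu_track *t, size_t pfn, enum sk_dma_dir dir)
{
    assert(pfn < SK_NPAGES);
    if (dir != SK_FROM_DEVICE)
        return;
    assert(t->dev_writable[pfn] > 0);
    /* The device may have written the page; send it once nothing can write it. */
    if (--t->dev_writable[pfn] == 0)
        t->dirty[pfn] = true;
}

/* Gather up to max dirty pages for this round, deferring device-writable ones. */
static size_t viommu_collect(struct viommu_track *t, size_t *out, size_t max)
{
    size_t n = 0;
    for (size_t pfn = 0; pfn < SK_NPAGES && n < max; pfn++) {
        if (t->dirty[pfn] && t->dev_writable[pfn] == 0) {
            t->dirty[pfn] = false;
            out[n++] = pfn;
        }
    }
    return n;
}
```

At the final stop-and-copy phase the device would be quiesced, so any page still mapped for DMA_FROM_DEVICE would have to be sent regardless; the sketch leaves that step out. Trapping every map and unmap is also the per-operation cost that Alex weighs against virtio-net in his reply.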

I'd be concerned that an emulated IOMMU on the DMA path would reduce
throughput to the point where we shouldn't even bother assigning the
device in the first place and should use virtio-net instead. POWER
systems have a guest-visible IOMMU, and it's been challenging for them
to reach 10Gbps, requiring real-mode tricks. virtio-net may add some
latency, but it's not that hard to get it to 10Gbps, and it already
supports migration. An emulated IOMMU in the guest is really only good
for relatively static mappings; the latency for anything else is likely
too high. Maybe there are shadow page table tricks that could help, but
the emulated IOMMU imposes overhead the whole time the guest is running,
not only during migration. Thanks,
Alex