From: Peter Xu <peterx@redhat.com>
To: Alex Williamson <alex.williamson@redhat.com>
Cc: cohuck@redhat.com, cjia@nvidia.com, aik@ozlabs.ru,
Zhengxiao.zx@alibaba-inc.com, shuangtai.tst@alibaba-inc.com,
qemu-devel@nongnu.org, Kirti Wankhede <kwankhede@nvidia.com>,
eauger@redhat.com, yi.l.liu@intel.com, quintela@redhat.com,
ziye.yang@intel.com, armbru@redhat.com, mlevitsk@redhat.com,
pasic@linux.ibm.com, felipe@nutanix.com, zhi.a.wang@intel.com,
kevin.tian@intel.com, yan.y.zhao@intel.com, dgilbert@redhat.com,
changpeng.liu@intel.com, eskultet@redhat.com, Ken.Xue@amd.com,
jonathan.davies@nutanix.com, pbonzini@redhat.com
Subject: Re: [PATCH QEMU v25 13/17] vfio: create mapped iova list when vIOMMU is enabled
Date: Fri, 26 Jun 2020 10:43:41 -0400 [thread overview]
Message-ID: <20200626144341.GL64004@xz-x1> (raw)
In-Reply-To: <20200625114039.566b0914@x1.home>
On Thu, Jun 25, 2020 at 11:40:39AM -0600, Alex Williamson wrote:
> On Thu, 25 Jun 2020 20:04:08 +0530
> Kirti Wankhede <kwankhede@nvidia.com> wrote:
>
> > On 6/25/2020 12:25 AM, Alex Williamson wrote:
> > > On Sun, 21 Jun 2020 01:51:22 +0530
> > > Kirti Wankhede <kwankhede@nvidia.com> wrote:
> > >
> > >> Create mapped iova list when vIOMMU is enabled. For each mapped iova
> > >> save translated address. Add node to list on MAP and remove node from
> > >> list on UNMAP.
> > >> This list is used to track dirty pages during migration.
> > >
> > > This seems like a lot of overhead to support that the VM might migrate.
> > > Is there no way we can build this when we start migration, for example
> > > replaying the mappings at that time? Thanks,
> > >
> >
> > In my previous version I tried to go through whole range and find valid
> > iotlb, as below:
> >
> > +        if (memory_region_is_iommu(section->mr)) {
> > +            iotlb = address_space_get_iotlb_entry(container->space->as, iova,
> > +                                                  true, MEMTXATTRS_UNSPECIFIED);
> >
> > When mapping doesn't exist, qemu throws error as below:
> >
> > qemu-system-x86_64: vtd_iova_to_slpte: detected slpte permission error
> > (iova=0x0, level=0x3, slpte=0x0, write=1)
> > qemu-system-x86_64: vtd_iommu_translate: detected translation failure
> > (dev=00:03:00, iova=0x0)
> > qemu-system-x86_64: New fault is not recorded due to compression of faults
>
> My assumption would have been that we use the replay mechanism, which
> is known to work because we need to use it when we hot-add a device.
> We'd make use of iommu_notifier_init() to create a new handler for this
> purpose, then we'd walk our container->giommu_list and call
> memory_region_iommu_replay() for each.
>
> Peter, does this sound like the right approach to you?
(Sorry, I may not have the complete picture of this series, please bear with
me...)

This seems like a workable approach to me. However, we might then end up with
the same mapping entry cached a third time... The VFIO kernel driver has a copy
initially, and the QEMU vIOMMU has another one (please grep for iova_tree in
intel_iommu.c).

My wild guess is that the overhead should still be under control in most cases,
so even if we cache the mapping multiple times (for better layering) it would
still be fine. However, since we're in QEMU right now, I'm also wondering
whether we can share the information with the vIOMMU somehow, because even if
the page table entry has been wiped out by that time, we may still have a
chance to use the DMAMap object cached in the vIOMMU when the iommu notify()
fires. That may require some vIOMMU changes too (e.g., vtd_page_walk_one may
need to postpone the iova_tree_remove until after the hook_fn is called, and we
may need to pass the DMAMap object, or at least the previously translated
address, to the hook before removal), so maybe that can also be done on top.
>
> > Secondly, it iterates through whole range with IOMMU page size
> > granularity which is 4K, so it takes long time resulting in large
> > downtime. With this optimization, downtime with vIOMMU reduced
> > significantly.
>
> Right, but we amortize that overhead and the resulting bloat across the
> 99.9999% of the time that we're not migrating. I wonder if we could
> startup another thread to handle this when we enable dirty logging. We
> don't really need the result until we start processing the dirty
> bitmap, right? Also, if we're dealing with this many separate pages,
> shouldn't we be using a tree rather than a list to give us O(logN)
> rather than O(N)?
Yep, I agree. At least the vIOMMU cache is already using a gtree.

Btw, IIUC we won't always walk the whole range at 4K granularity, at least not
for the VT-d emulation: vtd_page_walk_level() is smart enough to skip invalid
entries at the higher levels, so it can jump over 2M/1G/... chunks when the
whole chunk is unmapped.
Thanks,
--
Peter Xu