From: Alex Williamson <alex.williamson@redhat.com>
To: "Tian, Kevin" <kevin.tian@intel.com>
Cc: "Zhengxiao.zx@Alibaba-inc.com" <Zhengxiao.zx@Alibaba-inc.com>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
	"Liu, Yi L" <yi.l.liu@intel.com>,
	"cjia@nvidia.com" <cjia@nvidia.com>,
	"eskultet@redhat.com" <eskultet@redhat.com>,
	"Yang, Ziye" <ziye.yang@intel.com>,
	"cohuck@redhat.com" <cohuck@redhat.com>,
	"shuangtai.tst@alibaba-inc.com" <shuangtai.tst@alibaba-inc.com>,
	"dgilbert@redhat.com" <dgilbert@redhat.com>,
	"Wang, Zhi A" <zhi.a.wang@intel.com>,
	"mlevitsk@redhat.com" <mlevitsk@redhat.com>,
	"pasic@linux.ibm.com" <pasic@linux.ibm.com>,
	"aik@ozlabs.ru" <aik@ozlabs.ru>,
	Kirti Wankhede <kwankhede@nvidia.com>,
	"eauger@redhat.com" <eauger@redhat.com>,
	"felipe@nutanix.com" <felipe@nutanix.com>,
	"jonathan.davies@nutanix.com" <jonathan.davies@nutanix.com>,
	"Zhao, Yan Y" <yan.y.zhao@intel.com>,
	"Liu, Changpeng" <changpeng.liu@intel.com>,
	"Ken.Xue@amd.com" <Ken.Xue@amd.com>
Subject: Re: [Qemu-devel] [PATCH v8 01/13] vfio: KABI for migration interface
Date: Fri, 30 Aug 2019 10:32:52 -0600
Message-ID: <20190830103252.2b427144@x1.home>
In-Reply-To: <AADFC41AFE54684AB9EE6CBC0274A5D19D553184@SHSMSX104.ccr.corp.intel.com>

On Fri, 30 Aug 2019 08:06:32 +0000
"Tian, Kevin" <kevin.tian@intel.com> wrote:

> > From: Tian, Kevin
> > Sent: Friday, August 30, 2019 3:26 PM
> >   
> [...]
> > > How does QEMU handle the fact that IOVAs are potentially dynamic while
> > > performing the live portion of a migration?  For example, each time a
> > > guest driver calls dma_map_page() or dma_unmap_page(), a
> > > MemoryRegionSection pops in or out of the AddressSpace for the device
> > > (I'm assuming a vIOMMU where the device AddressSpace is not
> > > system_memory).  I don't see any QEMU code that intercepts that change
> > > in the AddressSpace such that the IOVA dirty pfns could be recorded and
> > > translated to GFNs.  The vendor driver can't track these beyond getting
> > > an unmap notification since it only knows the IOVA pfns, which can be
> > > re-used with different GFN backing.  Once the DMA mapping is torn down,
> > > it seems those dirty pfns are lost in the ether.  If this works in QEMU,
> > > please help me find the code that handles it.  
> > 
> > I'm curious about this part too. Interestingly, I didn't find any log_sync
> > callback registered by emulated devices in QEMU. It looks like pages
> > dirtied by emulated DMA are recorded in some implicit way. But KVM always
> > reports dirty pages by GFN instead of IOVA, regardless of the presence of
> > a vIOMMU. If QEMU also tracks dirty pages by GFN for emulated DMA
> > (translation can be done when the DMA happens), then we don't need to
> > worry about transient mappings from IOVA to GFN. Along the same lines, we
> > also want a GFN-based dirty bitmap to be reported through VFIO,
> > similar to what KVM does. A vendor driver then needs to translate
> > from IOVA to HVA to GFN when tracking DMA activity on VFIO
> > devices. IOVA->HVA is provided by VFIO. HVA->GFN can be
> > provided by KVM, but I'm not sure whether it's exposed now.
> >   
> 
> HVA->GFN can be done through hva_to_gfn_memslot in kvm_host.h.

I thought it was bad enough that we have vendor drivers that depend on
KVM, but designing a vfio interface that only works with KVM is even
more undesirable.  I also note without comment that gfn_to_memslot()
is a GPL symbol.  Thanks,

Alex
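
For readers following the HVA->GFN step discussed above, here is a minimal
userspace sketch of the arithmetic involved. It is not the kernel code:
struct memslot and hva_to_gfn() are hypothetical stand-ins modelled on the
fields a KVM memory slot carries (base GFN, page count, userspace address),
PAGE_SHIFT is assumed to be 12, and the bounds checks are an addition made
for the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12  /* assumption: 4 KiB pages */

/* Hypothetical, simplified stand-in for a KVM memory slot. */
struct memslot {
    uint64_t base_gfn;        /* first guest frame number covered by the slot */
    uint64_t npages;          /* size of the slot in pages */
    uint64_t userspace_addr;  /* HVA at which the slot starts */
};

/*
 * Translate an HVA to a GFN within one slot.
 * Returns 0 and stores the result in *gfn, or -1 if hva is outside the slot.
 */
static int hva_to_gfn(const struct memslot *slot, uint64_t hva, uint64_t *gfn)
{
    assert(slot != NULL && gfn != NULL);

    if (hva < slot->userspace_addr)
        return -1;

    /* Page offset of hva from the start of the slot. */
    uint64_t off = (hva - slot->userspace_addr) >> PAGE_SHIFT;
    if (off >= slot->npages)
        return -1;

    *gfn = slot->base_gfn + off;
    return 0;
}
```

With a slot whose base_gfn is 0x100 and whose userspace_addr is some HVA H,
an address three pages past H translates to GFN 0x103. The point of the
sketch is only that the translation needs the slot layout, which in practice
lives in KVM.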

> The above flow works for a software-tracked dirty mechanism, e.g. in
> KVMGT, where a GFN-based 'dirty' is marked when a guest page is
> mapped into the device MMU. The IOVA->HPA->GFN translation is done
> at that time and is thus immune to later IOVA->GFN changes.
> 
> When the hardware IOMMU supports a D-bit in 2nd-level translation (e.g.
> VT-d rev3.0), there are two scenarios:
> 
> 1) nested translation: the guest manages 1st-level translation (IOVA->GPA)
> and the host manages 2nd-level translation (GPA->HPA). The 2nd level
> is not affected by guest mapping operations, so it's OK for the IOMMU
> driver to retrieve GFN-based dirty pages by directly scanning the
> 2nd-level structure, upon request from user space.
> 
> 2) shadowed translation (IOVA->HPA) in the 2nd level: in this case the dirty
> information is tied to the IOVA. The IOMMU driver is expected to maintain
> an internal dirty bitmap. Upon any notification from VFIO that an
> IOVA->GPA mapping has changed, the IOMMU driver should flush the dirty
> status of the affected 2nd-level entries to the internal GFN-based bitmap.
> At this point, IOVA->HVA->GPA translation is again required for GFN-based
> recording. When userspace queries the dirty bitmap, the IOMMU driver needs
> to flush the latest 2nd-level dirty status to the internal bitmap, which
> is then copied to user space.
> 
> Given the trickiness of 2), we aim to enable 1) on the intel-iommu driver.
> 
> Thanks
> Kevin
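
The bookkeeping described for scenario 2) can be sketched as a small
userspace model. Everything here is hypothetical and for illustration only:
struct shadow_entry, struct dirty_tracker and the tracker_*() functions are
not driver code, the table sizes are arbitrary, and the IOVA->HVA->GPA
translation is collapsed into a gfn field stored with each entry. Clearing
the internal bitmap after a query is also an assumption, not something the
quoted text specifies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_IOVA_PFNS 64
#define NR_GFNS      64  /* one bit per GFN fits in a single uint64_t */

/* Hypothetical model of one shadowed 2nd-level entry, keyed by IOVA pfn. */
struct shadow_entry {
    bool     present;
    bool     dirty;  /* models the hardware D-bit */
    uint64_t gfn;    /* GFN currently backing this IOVA pfn */
};

struct dirty_tracker {
    struct shadow_entry pt[NR_IOVA_PFNS];
    uint64_t gfn_dirty;  /* internal GFN-based dirty bitmap */
};

/* Move a pending D-bit from an IOVA-keyed entry into the GFN bitmap. */
static void flush_entry(struct dirty_tracker *t, uint64_t iova_pfn)
{
    struct shadow_entry *e = &t->pt[iova_pfn];

    if (e->present && e->dirty) {
        t->gfn_dirty |= UINT64_C(1) << e->gfn;
        e->dirty = false;
    }
}

static int tracker_map(struct dirty_tracker *t, uint64_t iova_pfn, uint64_t gfn)
{
    assert(t != 0);
    if (iova_pfn >= NR_IOVA_PFNS || gfn >= NR_GFNS)
        return -1;
    /* A remap changes the backing GFN, so record any pending dirt first. */
    flush_entry(t, iova_pfn);
    t->pt[iova_pfn] = (struct shadow_entry){ .present = true, .gfn = gfn };
    return 0;
}

/* Models the device writing through the IOMMU: hardware sets the D-bit. */
static int tracker_dma_write(struct dirty_tracker *t, uint64_t iova_pfn)
{
    if (iova_pfn >= NR_IOVA_PFNS || !t->pt[iova_pfn].present)
        return -1;
    t->pt[iova_pfn].dirty = true;
    return 0;
}

/* Unmap notification: flush before the IOVA->GFN association is lost. */
static int tracker_unmap(struct dirty_tracker *t, uint64_t iova_pfn)
{
    if (iova_pfn >= NR_IOVA_PFNS || !t->pt[iova_pfn].present)
        return -1;
    flush_entry(t, iova_pfn);
    t->pt[iova_pfn].present = false;
    return 0;
}

/* Userspace query: flush all live entries, then hand out the bitmap. */
static uint64_t tracker_query(struct dirty_tracker *t)
{
    for (uint64_t i = 0; i < NR_IOVA_PFNS; i++)
        flush_entry(t, i);

    uint64_t out = t->gfn_dirty;
    t->gfn_dirty = 0;  /* assumption: the bitmap is cleared once reported */
    return out;
}
```

If IOVA pfn 3 is mapped to GFN 10, written, unmapped, then remapped to GFN 20
and written again, a query reports both GFN 10 and GFN 20 as dirty. Without
the flush in tracker_unmap(), the dirt on GFN 10 would be lost when the IOVA
is reused, which is the failure mode raised earlier in the thread.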


