Re: [RFC PATCH v3 3/3] VFIO Type1 IOMMU change: to support with iommu and without iommu

From: Neo Jia <cjia@nvidia.com>
To: "Tian, Kevin" <kevin.tian@intel.com>
Cc: "Ruan, Shuai" <shuai.ruan@intel.com>,
	"Song, Jike" <jike.song@intel.com>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	Jike Song <albcamus@gmail.com>,
	Kirti Wankhede <kwankhede@nvidia.com>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
	Alex Williamson <alex.williamson@redhat.com>,
	"kraxel@redhat.com" <kraxel@redhat.com>,
	"pbonzini@redhat.com" <pbonzini@redhat.com>,
	"Lv, Zhiyuan" <zhiyuan.lv@intel.com>
Subject: Re: [RFC PATCH v3 3/3] VFIO Type1 IOMMU change: to support with iommu and without iommu
Date: Fri, 13 May 2016 01:31:07 -0700
Message-ID: <20160513083107.GC6162@nvidia.com>
In-Reply-To: <AADFC41AFE54684AB9EE6CBC0274A5D15F8544C2@SHSMSX101.ccr.corp.intel.com>

On Fri, May 13, 2016 at 07:45:14AM +0000, Tian, Kevin wrote:
> > From: Neo Jia [mailto:cjia@nvidia.com]
> > Sent: Friday, May 13, 2016 3:42 PM
> > 
> > On Fri, May 13, 2016 at 03:30:27PM +0800, Jike Song wrote:
> > > On 05/13/2016 02:43 PM, Neo Jia wrote:
> > > > On Fri, May 13, 2016 at 02:22:37PM +0800, Jike Song wrote:
> > > >> On 05/13/2016 10:41 AM, Tian, Kevin wrote:
> > > >>>> From: Neo Jia [mailto:cjia@nvidia.com] Sent: Friday, May 13,
> > > >>>> 2016 3:49 AM
> > > >>>>
> > > >>>>>
> > > >>>>>> Perhaps one possibility would be to allow the vgpu driver
> > > >>>>>> to register map and unmap callbacks.  The unmap callback
> > > >>>>>> might provide the invalidation interface that we're so far
> > > >>>>>> missing.  The combination of map and unmap callbacks might
> > > >>>>>> simplify the Intel approach of pinning the entire VM memory
> > > >>>>>> space, ie. for each map callback do a translation (pin) and
> > > >>>>>> dma_map_page, for each unmap do a dma_unmap_page and
> > > >>>>>> release the translation.
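A minimal sketch of the map/unmap callback idea above, written as a standalone userspace model so it can run on its own. Everything in it is hypothetical (struct vgpu_dma_ctx, vgpu_map_cb, vgpu_unmap_cb and the sim_* helpers). The sim_* helpers only count pages and stand in for the real translation (pin) and dma_map_page()/dma_unmap_page() steps, which are not called here. It shows the pairing being proposed: each map callback pins and DMA-maps one page, and each unmap callback undoes exactly that, which is also where an invalidation hook would sit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_MAPPINGS 64

/* Hypothetical: one pinned-and-DMA-mapped guest page per entry. */
struct dma_mapping {
    uint64_t iova;      /* guest IOVA handed to the map callback */
    uint64_t dma_addr;  /* simulated result of dma_map_page() */
    int in_use;
};

struct vgpu_dma_ctx {
    struct dma_mapping maps[MAX_MAPPINGS];
    int pinned_pages;   /* pages currently pinned on behalf of the vGPU */
};

/* Stand-in for "translate (pin) + dma_map_page()": only bumps a counter
 * and fabricates a bus address; no real memory is pinned or mapped. */
static uint64_t sim_pin_and_dma_map(struct vgpu_dma_ctx *ctx, uint64_t iova)
{
    ctx->pinned_pages++;
    return iova | (1ULL << 63);
}

/* Stand-in for "dma_unmap_page() + release the translation". */
static void sim_dma_unmap_and_unpin(struct vgpu_dma_ctx *ctx)
{
    assert(ctx->pinned_pages > 0);
    ctx->pinned_pages--;
}

/* Map callback: pin and DMA-map one page. Returns 0, or -1 if full. */
int vgpu_map_cb(struct vgpu_dma_ctx *ctx, uint64_t iova)
{
    for (size_t i = 0; i < MAX_MAPPINGS; i++) {
        if (!ctx->maps[i].in_use) {
            ctx->maps[i].iova = iova;
            ctx->maps[i].dma_addr = sim_pin_and_dma_map(ctx, iova);
            ctx->maps[i].in_use = 1;
            return 0;
        }
    }
    return -1;
}

/* Unmap callback: undo one mapping; doubles as the invalidation hook.
 * Returns 0, or -1 if nothing is mapped at that IOVA. */
int vgpu_unmap_cb(struct vgpu_dma_ctx *ctx, uint64_t iova)
{
    for (size_t i = 0; i < MAX_MAPPINGS; i++) {
        if (ctx->maps[i].in_use && ctx->maps[i].iova == iova) {
            sim_dma_unmap_and_unpin(ctx);
            ctx->maps[i].in_use = 0;
            return 0;
        }
    }
    return -1;
}
```

With this shape, the number of pinned pages tracks exactly what the guest currently has mapped, rather than the whole VM memory space.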
> > > >>>>>
> > > >>>>> Yes, adding map/unmap ops in the pGPU driver (I assume you are
> > > >>>>> referring to gpu_device_ops as implemented in Kirti's patch)
> > > >>>>> sounds like a good idea, satisfying both: 1) keeping vGPU purely
> > > >>>>> virtual; 2) dealing with the Linux DMA API to achieve hardware
> > > >>>>> IOMMU compatibility.
> > > >>>>>
> > > >>>>> PS, this has very little to do with pinning wholly or
> > > >>>>> partially. Intel KVMGT once had the whole guest memory
> > > >>>>> pinned, only because we used a spinlock, which can't
> > > >>>>> sleep at runtime.  We have removed that spinlock in
> > > >>>>> another of our upstreaming efforts, not here but for the
> > > >>>>> i915 driver, so probably no biggie.
> > > >>>>>
> > > >>>>
> > > >>>> OK, then you guys don't need to pin everything. The next
> > > >>>> question is whether your mediated driver backend can issue
> > > >>>> memory pinning requests, as we have demonstrated in the v3
> > > >>>> patch with the functions vfio_pin_pages and vfio_unpin_pages?
> > > >>>>
> > > >>>
> > > >>> Jike, can you confirm this statement? My feeling is that we don't
> > > >>> have such logic in our device model to figure out which pages
> > > >>> need to be pinned on demand. So currently pin-everything is the
> > > >>> same requirement on both the KVM and Xen sides...
> > > >>
> > > >> [Correct me if I've overlooked anything :)]
> > > >>
> > > >> IMO the ultimate reason to pin a page is DMA. Accessing RAM
> > > >> from a GPU is certainly a DMA operation. The DMA facility of most
> > > >> platforms, IGD and NVIDIA GPU included, is not capable of
> > > >> fault handling and retrying.
> > > >>
> > > >> As for vGPU solutions like the ones Nvidia and Intel provide,
> > > >> whenever the Guest sets up the mappings for the memory address
> > > >> region it uses for GPU access, the Host intercepts them, so it's
> > > >> safe to pin a page only before the Guest gets to use it. This
> > > >> probably doesn't require the device model to change :)
> > > >
> > > > Hi Jike
> > > >
> > > > Just out of curiosity, how does the host intercept this before it
> > > > goes on the bus?
> > > >
> > >
> > > Hi Neo,
> > >
> > > [Apologies if I expressed myself poorly, bad English ..]
> > >
> > > I was talking about intercepting the setting-up of GPU page tables,
> > > not the DMA itself.  For current Intel GPUs, the page tables, called
> > > GTT (Graphics Translation Table), are either MMIO registers or simply
> > > RAM pages, and every write to a GTT entry from the Guest is
> > > intercepted by the Host.
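A rough model of pinning at GTT-write interception time, assuming a flat shadow GTT in which each entry holds one guest frame number (gfn). The names (struct shadow_gtt, gtt_write_intercept, gfn_is_pinned, GTT_ENTRIES, GUEST_PAGES, GFN_INVALID) and the table sizes are hypothetical, and no real pinning or DMA happens; a per-gfn counter stands in for the pin. The new page is pinned before the old one is released, so rewriting an entry with the same gfn never drops the pin.

```c
#include <assert.h>
#include <stdint.h>

#define GTT_ENTRIES 16
#define GUEST_PAGES 64
#define GFN_INVALID UINT64_MAX

/* Hypothetical shadow of the guest GTT plus a pin refcount per gfn. */
struct shadow_gtt {
    uint64_t entry[GTT_ENTRIES];      /* gfn each entry points at */
    unsigned pin_count[GUEST_PAGES];  /* > 0 means the page is "pinned" */
};

void gtt_init(struct shadow_gtt *g)
{
    for (int i = 0; i < GTT_ENTRIES; i++)
        g->entry[i] = GFN_INVALID;
    for (int i = 0; i < GUEST_PAGES; i++)
        g->pin_count[i] = 0;
}

/* Called when the host traps a guest write to GTT entry idx.
 * new_gfn == GFN_INVALID clears the entry. Returns 0, or -1 on bad input. */
int gtt_write_intercept(struct shadow_gtt *g, unsigned idx, uint64_t new_gfn)
{
    if (idx >= GTT_ENTRIES)
        return -1;
    if (new_gfn != GFN_INVALID && new_gfn >= GUEST_PAGES)
        return -1;

    uint64_t old = g->entry[idx];

    /* Pin the new page first: it must be resident before the GPU can
     * DMA through this entry, since the GPU cannot fault and retry. */
    if (new_gfn != GFN_INVALID)
        g->pin_count[new_gfn]++;

    /* Then drop the pin held by the entry's previous target. */
    if (old != GFN_INVALID) {
        assert(g->pin_count[old] > 0);
        g->pin_count[old]--;
    }

    g->entry[idx] = new_gfn;
    return 0;
}

int gfn_is_pinned(const struct shadow_gtt *g, uint64_t gfn)
{
    return gfn < GUEST_PAGES && g->pin_count[gfn] > 0;
}
```

Because two GTT entries may point at the same guest page, the pin is a refcount rather than a flag; the page is released only when no entry references it.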
> > 
> > Hi Jike,
> > 
> > Thanks for the details. One more question: if the page tables are guest RAM, how do you
> > intercept writes to them from the host? I can see how they get intercepted when they are in the MMIO range.
> > 
> 
> We use the page tracking framework, which was recently added to KVM,
> to mark RAM pages as read-only so that write accesses are intercepted
> and forwarded to the device model.

Yes, I am aware of that patchset from Guangrong. So far the interfaces all
require a struct kvm *. From https://lkml.org/lkml/2015/11/30/644:

- kvm_page_track_add_page(): add the page to the tracking pool; after
  that, the specified access on that page will be tracked

- kvm_page_track_remove_page(): remove the page from the tracking pool;
  the specified access on the page is no longer tracked after the last
  user is gone

void kvm_page_track_add_page(struct kvm *kvm, gfn_t gfn,
                enum kvm_page_track_mode mode);
void kvm_page_track_remove_page(struct kvm *kvm, gfn_t gfn,
               enum kvm_page_track_mode mode);
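To illustrate the refcounted add/remove semantics described above, here is a small userspace model of a write-tracking pool. It is not the KVM implementation: the model_* names, MODEL_GFNS and the intercepted_writes counter are hypothetical, and the model deliberately has no struct kvm, which is exactly the dependency the real interface imposes. A page stays tracked until its last user removes it, and a guest write to a tracked page is counted as forwarded to the device model instead of going straight to RAM.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_GFNS 32

/* Hypothetical mirror of a "track writes" mode. */
enum model_track_mode { MODEL_TRACK_WRITE, MODEL_TRACK_MAX };

struct model_track_pool {
    unsigned short count[MODEL_TRACK_MAX][MODEL_GFNS]; /* users per gfn/mode */
    unsigned intercepted_writes; /* writes forwarded to the device model */
};

/* Add one user for (gfn, mode); tracking starts with the first user. */
void model_track_add(struct model_track_pool *p, uint64_t gfn,
                     enum model_track_mode mode)
{
    assert(gfn < MODEL_GFNS && mode < MODEL_TRACK_MAX);
    p->count[mode][gfn]++;
}

/* Drop one user; tracking stops only when the last user is gone. */
void model_track_remove(struct model_track_pool *p, uint64_t gfn,
                        enum model_track_mode mode)
{
    assert(gfn < MODEL_GFNS && mode < MODEL_TRACK_MAX);
    assert(p->count[mode][gfn] > 0);
    p->count[mode][gfn]--;
}

int model_is_tracked(const struct model_track_pool *p, uint64_t gfn,
                     enum model_track_mode mode)
{
    return gfn < MODEL_GFNS && mode < MODEL_TRACK_MAX &&
           p->count[mode][gfn] > 0;
}

/* Simulated guest store: returns 1 if it was intercepted because the page
 * is write-tracked (read-only), 0 if it went straight to guest RAM. */
int model_guest_write(struct model_track_pool *p, uint64_t gfn)
{
    if (model_is_tracked(p, gfn, MODEL_TRACK_WRITE)) {
        p->intercepted_writes++;
        return 1;
    }
    return 0;
}
```

For example, adding gfn 3 twice and removing it once leaves it tracked; only the second removal stops the interception.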

I'm really curious how you are going to get access to the struct kvm *kvm.
Or are you relying on userfaultfd to track the write faults, only as part of
the QEMU userfault thread?

Thanks,
Neo

> 
> Thanks
> Kevin
