From: Jike Song <jike.song@intel.com>
To: Neo Jia <cjia@nvidia.com>
Cc: Jike Song <albcamus@gmail.com>,
Alex Williamson <alex.williamson@redhat.com>,
"Tian, Kevin" <kevin.tian@intel.com>,
Kirti Wankhede <kwankhede@nvidia.com>,
"pbonzini@redhat.com" <pbonzini@redhat.com>,
"kraxel@redhat.com" <kraxel@redhat.com>,
"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
"Ruan, Shuai" <shuai.ruan@intel.com>,
"Lv, Zhiyuan" <zhiyuan.lv@intel.com>
Subject: Re: [Qemu-devel] [RFC PATCH v3 3/3] VFIO Type1 IOMMU change: to support with iommu and without iommu
Date: Fri, 13 May 2016 14:08:36 +0800
Message-ID: <57356F64.1010406@intel.com>
In-Reply-To: <20160512194924.GA24334@nvidia.com>

On 05/13/2016 03:49 AM, Neo Jia wrote:
> On Thu, May 12, 2016 at 12:11:00PM +0800, Jike Song wrote:
>> On Thu, May 12, 2016 at 6:06 AM, Alex Williamson
>> <alex.williamson@redhat.com> wrote:
>>> On Wed, 11 May 2016 17:15:15 +0800
>>> Jike Song <jike.song@intel.com> wrote:
>>>
>>>> On 05/11/2016 12:02 AM, Neo Jia wrote:
>>>>> On Tue, May 10, 2016 at 03:52:27PM +0800, Jike Song wrote:
>>>>>> On 05/05/2016 05:27 PM, Tian, Kevin wrote:
>>>>>>>> From: Song, Jike
>>>>>>>>
>>>>>>>> IIUC, an api-only domain is a VFIO domain *without* underlying IOMMU
>>>>>>>> hardware. It just, as you said in another mail, "rather than
>>>>>>>> programming them into an IOMMU for a device, it simply stores the
>>>>>>>> translations for use by later requests".
>>>>>>>>
>>>>>>>> That imposes a constraint on gfx driver: hardware IOMMU must be disabled.
>>>>>>>> Otherwise, if IOMMU is present, the gfx driver eventually programs
>>>>>>>> the hardware IOMMU with IOVA returned by pci_map_page or dma_map_page;
>>>>>>>> Meanwhile, the IOMMU backend for vgpu only maintains GPA <-> HPA
>>>>>>>> translations without any knowledge about hardware IOMMU, how is the
>>>>>>>> device model supposed to do to get an IOVA for a given GPA (thereby HPA
>>>>>>>> by the IOMMU backend here)?
>>>>>>>>
>>>>>>>> If things go as guessed above, as vfio_pin_pages() indicates, it
>>>>>>>> pin & translate vaddr to PFN, then it will be very difficult for the
>>>>>>>> device model to figure out:
>>>>>>>>
>>>>>>>> 1, for a given GPA, how to avoid calling dma_map_page multiple times?
>>>>>>>> 2, for which page to call dma_unmap_page?
>>>>>>>>
>>>>>>>> --
>>>>>>>
>>>>>>> We have to support both w/ iommu and w/o iommu case, since
>>>>>>> that fact is out of GPU driver control. A simple way is to use
>>>>>>> dma_map_page which internally will cope with w/ and w/o iommu
>>>>>>> case gracefully, i.e. return HPA w/o iommu and IOVA w/ iommu.
>>>>>>> Then in this file we only need to cache GPA to whatever dma_addr_t
>>>>>>> returned by dma_map_page.
>>>>>>>
>>>>>>
>>>>>> Hi Alex, Kirti and Neo, any thought on the IOMMU compatibility here?
>>>>>
>>>>> Hi Jike,
>>>>>
>>>>> With mediated passthru, you still can use hardware iommu, but more important
>>>>> that part is actually orthogonal to what we are discussing here as we will only
>>>>> cache the mapping between <gfn (iova if guest has iommu), (qemu) va>, once we
>>>>> have pinned pages later with the help of above info, you can map it into the
>>>>> proper iommu domain if the system has configured so.
>>>>>
>>>>
>>>> Hi Neo,
>>>>
>>>> Technically yes you can map a pfn into the proper IOMMU domain elsewhere,
>>>> but to find out whether a pfn was previously mapped or not, you have to
>>>> track it with another rbtree-alike data structure (the IOMMU driver simply
>>>> doesn't bother with tracking), that seems somehow duplicate with the vGPU
>>>> IOMMU backend we are discussing here.
>>>>
>>>> And it is also semantically correct for an IOMMU backend to handle both w/
>>>> and w/o an IOMMU hardware? :)
>>>
>>> A problem with the iommu doing the dma_map_page() though is for what
>>> device does it do this? In the mediated case the vfio infrastructure
>>> is dealing with a software representation of a device. For all we
>>> know that software model could transparently migrate from one physical
>>> GPU to another. There may not even be a physical device backing
>>> the mediated device. Those are details left to the vgpu driver itself.
>>>
>>
>> Great point :) Yes, I agree it's a bit intrusive to do the mapping for
>> a particular pdev in a vGPU IOMMU BE.
>>
>>> Perhaps one possibility would be to allow the vgpu driver to register
>>> map and unmap callbacks. The unmap callback might provide the
>>> invalidation interface that we're so far missing. The combination of
>>> map and unmap callbacks might simplify the Intel approach of pinning the
>>> entire VM memory space, ie. for each map callback do a translation
>>> (pin) and dma_map_page, for each unmap do a dma_unmap_page and release
>>> the translation.
>>
>> Yes, adding map/unmap ops in the pGPU driver (I assume you are referring to
>> gpu_device_ops as implemented in Kirti's patch) sounds like a good idea,
>> satisfying both: 1) keeping vGPU purely virtual; 2) dealing with the Linux
>> DMA API to achieve hardware IOMMU compatibility.
>>
>> PS, this has very little to do with pinning wholly or partially. Intel KVMGT
>> once had the whole guest memory pinned, only because we used a spinlock,
>> which can't sleep at runtime. We have removed that spinlock in another of our
>> upstreaming efforts, not here but for the i915 driver, so probably no biggie.
>>
>
> OK, then you guys don't need to pin everything.

Yes :)

> The next question will be if you
> can send the pinning request from your mediated driver backend to request memory
> pinning like we have demonstrated in the v3 patch, function vfio_pin_pages and
> vfio_unpin_pages?

Kind of yes, not exactly.

IMO the mediated driver backend cares not only about pinning but also, more
importantly, about translation. The vfio_pin_pages of the v3 patch does the
pinning and translation together, so I do think the API would be better named
'translate' instead of 'pin', like v2 did.

We possibly have the same requirement from the mediated driver backend:

	a) get a GFN, when the guest tries to tell the hardware;
	b) consult the vfio iommu with that GFN[1]: will you find me a proper dma_addr?

The vfio iommu backend searches the tracking table with this GFN[1]:

	c) if an entry is found, return the dma_addr;
	d) if nothing is found, call GUP to pin the page, and dma_map_page to get
	   the dma_addr[2], then return it;

The dma_addr will then be passed to the real GPU hardware.

I can't simply say 'Yes' here, since we may consult the dma_addr for a GFN
multiple times, but we need to pin the page only the first time.

IOW, pinning is kind of an internal action in the iommu backend.
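
To make steps a) to d) concrete, here is a minimal userspace sketch in C. It is
only an illustration under stated assumptions, not the code from any of the
patches: the names demo_table, demo_translate_gfn and demo_pin_and_map are
hypothetical, the linear array stands in for whatever tracking structure
(e.g. an rbtree) the backend would really use, and demo_pin_and_map merely
fakes what GUP plus dma_map_page would do in the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel types; not the real definitions. */
typedef uint64_t demo_gfn_t;
typedef uint64_t demo_dma_addr_t;

#define DEMO_TRACK_MAX 64

struct demo_entry {
	demo_gfn_t gfn;
	demo_dma_addr_t dma_addr;
};

/* Tracking table: a flat array here, assumed to be a tree in a real backend. */
struct demo_table {
	struct demo_entry entries[DEMO_TRACK_MAX];
	size_t count;
	unsigned int pin_calls;	/* how often the pin + map path actually ran */
};

/*
 * Fake replacement for "GUP to pin the page, then dma_map_page".
 * It only counts the call and derives a made-up address from the GFN.
 */
static demo_dma_addr_t demo_pin_and_map(struct demo_table *t, demo_gfn_t gfn)
{
	t->pin_calls++;
	return (gfn << 12) + 0x80000000ULL;
}

/* Steps b) to d): look the GFN up, pin and map only on the first request. */
static demo_dma_addr_t demo_translate_gfn(struct demo_table *t, demo_gfn_t gfn)
{
	size_t i;
	demo_dma_addr_t addr;

	/* c) entry found: return the cached dma_addr, no pinning needed */
	for (i = 0; i < t->count; i++)
		if (t->entries[i].gfn == gfn)
			return t->entries[i].dma_addr;

	/* d) nothing found: pin + map once, remember the result */
	assert(t->count < DEMO_TRACK_MAX);
	addr = demo_pin_and_map(t, gfn);
	t->entries[t->count].gfn = gfn;
	t->entries[t->count].dma_addr = addr;
	t->count++;
	return addr;
}
```

A caller would zero-initialize a struct demo_table and call
demo_translate_gfn() each time the guest hands over a GFN; repeated lookups of
the same GFN return the same address while pin_calls stays at one, which is
the sense in which pinning is internal to the backend.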

//Sorry for the long, maybe boring explanation.. :)

[1] GFN or vaddr, no biggie
[2] As pointed out by Alex, dma_map_page can be called elsewhere, e.g. from a callback.

--
Thanks,
Jike