Re: [Qemu-devel] [RFC PATCH v3 3/3] VFIO Type1 IOMMU change: to support with iommu and without iommu

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

From: Jike Song <jike.song@intel.com>
To: Alex Williamson <alex.williamson@redhat.com>
Cc: Kirti Wankhede <kwankhede@nvidia.com>,
	pbonzini@redhat.com, kraxel@redhat.com, cjia@nvidia.com,
	qemu-devel@nongnu.org, kvm@vger.kernel.org, kevin.tian@intel.com,
	shuai.ruan@intel.com, zhiyuan.lv@intel.com
Subject: Re: [Qemu-devel] [RFC PATCH v3 3/3] VFIO Type1 IOMMU change: to support with iommu and without iommu
Date: Thu, 05 May 2016 14:55:46 +0800	[thread overview]
Message-ID: <572AEE72.90008@intel.com> (raw)
In-Reply-To: <20160503164306.6a699fe3@t450s.home>

On 05/04/2016 06:43 AM, Alex Williamson wrote:
> On Tue, 3 May 2016 00:10:41 +0530
> Kirti Wankhede <kwankhede@nvidia.com> wrote:
>> +
>> +/*
>> + * Pin a set of guest PFNs and return their associated host PFNs for vGPU.
>> + * @vaddr [in]: array of guest PFNs
>> + * @npage [in]: count of array elements
>> + * @prot [in] : protection flags
>> + * @pfn_base[out] : array of host PFNs
>> + */
>> +int vfio_pin_pages(void *iommu_data, dma_addr_t *vaddr, long npage,
>> +		   int prot, dma_addr_t *pfn_base)
>> +{
>> +	struct vfio_iommu *iommu = iommu_data;
>> +	struct vfio_domain *domain = NULL, *domain_vgpu = NULL;
>> +	int i = 0, ret = 0;
>> +	long retpage;
>> +	dma_addr_t remote_vaddr = 0;
>> +	dma_addr_t *pfn = pfn_base;
>> +	struct vfio_dma *dma;
>> +
>> +	if (!iommu || !pfn_base)
>> +		return -EINVAL;
>> +
>> +	if (list_empty(&iommu->domain_list)) {
>> +		ret = -EINVAL;
>> +		goto pin_done;
>> +	}
>> +
>> +	get_first_domains(iommu, &domain, &domain_vgpu);
>> +
>> +	// Return error if vGPU domain doesn't exist
> 
> No c++ style comments please.
> 
>> +	if (!domain_vgpu) {
>> +		ret = -EINVAL;
>> +		goto pin_done;
>> +	}
>> +
>> +	for (i = 0; i < npage; i++) {
>> +		struct vfio_vgpu_pfn *p;
>> +		struct vfio_vgpu_pfn *lpfn;
>> +		unsigned long tpfn;
>> +		dma_addr_t iova;
>> +
>> +		mutex_lock(&iommu->lock);
>> +
>> +		iova = vaddr[i] << PAGE_SHIFT;
>> +
>> +		dma = vfio_find_dma(iommu, iova, 0 /*  size */);
>> +		if (!dma) {
>> +			mutex_unlock(&iommu->lock);
>> +			ret = -EINVAL;
>> +			goto pin_done;
>> +		}
>> +
>> +		remote_vaddr = dma->vaddr + iova - dma->iova;
>> +
>> +		retpage = vfio_pin_pages_internal(domain_vgpu, remote_vaddr,
>> +						  (long)1, prot, &tpfn);
>> +		mutex_unlock(&iommu->lock);
>> +		if (retpage <= 0) {
>> +			WARN_ON(!retpage);
>> +			ret = (int)retpage;
>> +			goto pin_done;
>> +		}
>> +
>> +		pfn[i] = tpfn;
>> +
>> +		mutex_lock(&domain_vgpu->lock);
>> +
>> +		// search if pfn exist
>> +		if ((p = vfio_find_vgpu_pfn(domain_vgpu, tpfn))) {
>> +			atomic_inc(&p->ref_count);
>> +			mutex_unlock(&domain_vgpu->lock);
>> +			continue;
>> +		}
> 
> The only reason I can come up with for why we'd want to integrate an
> api-only domain into the existing type1 code would be to avoid page
> accounting issues where we count locked pages once for a normal
> assigned device and again for a vgpu, but that's not what we're doing
> here.  We're not only locking the pages again regardless of them
> already being locked, we're counting every time we lock them through
> this new interface.  So there's really no point at all to making type1
> become this unsupportable.  In that case we should be pulling out the
> common code that we want to share from type1 and making a new type1
> compatible vfio iommu backend rather than conditionalizing everything
> here.
> 
>> +
>> +		// add to pfn_list
>> +		lpfn = kzalloc(sizeof(*lpfn), GFP_KERNEL);
>> +		if (!lpfn) {
>> +			ret = -ENOMEM;
>> +			mutex_unlock(&domain_vgpu->lock);
>> +			goto pin_done;
>> +		}
>> +		lpfn->vmm_va = remote_vaddr;
>> +		lpfn->iova = iova;
>> +		lpfn->pfn = pfn[i];
>> +		lpfn->npage = 1;
>> +		lpfn->prot = prot;
>> +		atomic_inc(&lpfn->ref_count);
>> +		vfio_link_vgpu_pfn(domain_vgpu, lpfn);
>> +		mutex_unlock(&domain_vgpu->lock);
>> +	}
>> +
>> +	ret = i;
>> +
>> +pin_done:
>> +	return ret;
>> +}
>> +EXPORT_SYMBOL(vfio_pin_pages);
>> +

IIUC, an api-only domain is a VFIO domain *without* underlying IOMMU
hardware. It just, as you said in another mail, "rather than
programming them into an IOMMU for a device, it simply stores the
translations for use by later requests".

That imposes a constraint on gfx driver: hardware IOMMU must be disabled.
Otherwise, if IOMMU is present, the gfx driver eventually programs
the hardware IOMMU with IOVA returned by pci_map_page or dma_map_page;
Meanwhile, the IOMMU backend for vgpu only maintains GPA <-> HPA
translations without any knowledge about hardware IOMMU, how is the
device model supposed to do to get an IOVA for a given GPA (thereby HPA
by the IOMMU backend here)?

If things go as guessed above, as vfio_pin_pages() indicates, it
pin & translate vaddr to PFN, then it will be very difficult for the
device model to figure out:

	1, for a given GPA, how to avoid calling dma_map_page multiple times?
	2, for which page to call dma_unmap_page?

--
Thanks,
Jike

next prev parent reply	other threads:[~2016-05-05  6:57 UTC|newest]

Thread overview: 78+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-05-02 18:40 [Qemu-devel] [RFC PATCH v3 0/3] Add vGPU support Kirti Wankhede
2016-05-02 18:40 ` [Qemu-devel] [RFC PATCH v3 1/3] vGPU Core driver Kirti Wankhede
2016-05-03 22:43   ` Alex Williamson
2016-05-04  2:45     ` Tian, Kevin
2016-05-04 16:57       ` Alex Williamson
2016-05-05  8:58         ` Tian, Kevin
2016-05-04  2:58     ` Tian, Kevin
2016-05-12  8:22       ` Tian, Kevin
2016-05-04 13:31     ` Kirti Wankhede
2016-05-05  9:06       ` Tian, Kevin
2016-05-05 10:44         ` Kirti Wankhede
2016-05-05 12:07           ` Tian, Kevin
2016-05-05 12:57             ` Kirti Wankhede
2016-05-11  6:37               ` Tian, Kevin
2016-05-06 12:14         ` Jike Song
2016-05-06 16:16           ` Kirti Wankhede
2016-05-09 12:12             ` Jike Song
2016-05-02 18:40 ` [Qemu-devel] [RFC PATCH v3 2/3] VFIO driver for vGPU device Kirti Wankhede
2016-05-03 22:43   ` Alex Williamson
2016-05-04  3:23     ` Tian, Kevin
2016-05-04 17:06       ` Alex Williamson
2016-05-04 21:14         ` Neo Jia
2016-05-05  4:42           ` Kirti Wankhede
2016-05-05  9:24         ` Tian, Kevin
2016-05-05 20:27           ` Neo Jia
2016-05-11  6:45         ` Tian, Kevin
2016-05-11 20:10           ` Alex Williamson
2016-05-12  0:59             ` Tian, Kevin
2016-05-04 16:25     ` Kirti Wankhede
2016-05-02 18:40 ` [Qemu-devel] [RFC PATCH v3 3/3] VFIO Type1 IOMMU change: to support with iommu and without iommu Kirti Wankhede
2016-05-03 10:40   ` Jike Song
2016-05-03 22:43   ` Alex Williamson
2016-05-04  3:39     ` Tian, Kevin
2016-05-05  6:55     ` Jike Song [this message]
2016-05-05  9:27       ` Tian, Kevin
2016-05-10  7:52         ` Jike Song
2016-05-10 16:02           ` Neo Jia
2016-05-11  9:15             ` Jike Song
2016-05-11 22:06               ` Alex Williamson
2016-05-12  4:11                 ` Jike Song
2016-05-12 19:49                   ` Neo Jia
2016-05-13  2:41                     ` Tian, Kevin
2016-05-13  6:22                       ` Jike Song
2016-05-13  6:43                         ` Neo Jia
2016-05-13  7:30                           ` Jike Song
2016-05-13  7:42                             ` Neo Jia
2016-05-13  7:45                               ` Tian, Kevin
2016-05-13  8:31                                 ` Neo Jia
2016-05-13  9:23                                   ` Jike Song
2016-05-13 15:50                                     ` Neo Jia
2016-05-16  6:57                                       ` Jike Song
2016-05-13  6:08                     ` Jike Song
2016-05-13  6:41                       ` Neo Jia
2016-05-13  7:13                         ` Tian, Kevin
2016-05-13  7:38                           ` Neo Jia
2016-05-13  8:02                             ` Tian, Kevin
2016-05-13  8:41                               ` Neo Jia
2016-05-12  8:00                 ` Tian, Kevin
2016-05-12 19:05                   ` Alex Williamson
2016-05-12 20:12                     ` Neo Jia
2016-05-13  9:46                       ` Jike Song
2016-05-13 15:48                         ` Neo Jia
2016-05-16  2:27                           ` Jike Song
2016-05-13  3:55                     ` Tian, Kevin
2016-05-13 16:16                       ` Alex Williamson
2016-05-13  7:10                     ` Dong Jia
2016-05-13  7:24                       ` Neo Jia
2016-05-13  8:39                         ` Dong Jia
2016-05-13  9:05                           ` Neo Jia
2016-05-19  7:28                             ` Dong Jia
2016-05-20  3:21                               ` Tian, Kevin
2016-06-06  6:59                                 ` Dong Jia
2016-06-07  2:47                                   ` Tian, Kevin
2016-06-07  7:04                                     ` Dong Jia
2016-05-05  7:51     ` Kirti Wankhede
2016-05-04  1:05 ` [Qemu-devel] [RFC PATCH v3 0/3] Add vGPU support Tian, Kevin
2016-05-04  6:17   ` Neo Jia
2016-05-04 17:07     ` Alex Williamson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=572AEE72.90008@intel.com \
    --to=jike.song@intel.com \
    --cc=alex.williamson@redhat.com \
    --cc=cjia@nvidia.com \
    --cc=kevin.tian@intel.com \
    --cc=kraxel@redhat.com \
    --cc=kvm@vger.kernel.org \
    --cc=kwankhede@nvidia.com \
    --cc=pbonzini@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=shuai.ruan@intel.com \
    --cc=zhiyuan.lv@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).