All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jike Song <jike.song@intel.com>
To: Alex Williamson <alex.williamson@redhat.com>
Cc: Kirti Wankhede <kwankhede@nvidia.com>,
	pbonzini@redhat.com, kraxel@redhat.com, cjia@nvidia.com,
	qemu-devel@nongnu.org, kvm@vger.kernel.org, kevin.tian@intel.com,
	shuai.ruan@intel.com, zhiyuan.lv@intel.com
Subject: Re: [RFC PATCH v3 3/3] VFIO Type1 IOMMU change: to support with iommu and without iommu
Date: Thu, 05 May 2016 14:55:46 +0800	[thread overview]
Message-ID: <572AEE72.90008@intel.com> (raw)
In-Reply-To: <20160503164306.6a699fe3@t450s.home>

On 05/04/2016 06:43 AM, Alex Williamson wrote:
> On Tue, 3 May 2016 00:10:41 +0530
> Kirti Wankhede <kwankhede@nvidia.com> wrote:
>> +
>> +/*
>> + * Pin a set of guest PFNs and return their associated host PFNs for vGPU.
>> + * @vaddr [in]: array of guest PFNs
>> + * @npage [in]: count of array elements
>> + * @prot [in] : protection flags
>> + * @pfn_base[out] : array of host PFNs
>> + */
>> +int vfio_pin_pages(void *iommu_data, dma_addr_t *vaddr, long npage,
>> +		   int prot, dma_addr_t *pfn_base)
>> +{
>> +	struct vfio_iommu *iommu = iommu_data;
>> +	struct vfio_domain *domain = NULL, *domain_vgpu = NULL;
>> +	int i = 0, ret = 0;
>> +	long retpage;
>> +	dma_addr_t remote_vaddr = 0;
>> +	dma_addr_t *pfn = pfn_base;
>> +	struct vfio_dma *dma;
>> +
>> +	if (!iommu || !pfn_base)
>> +		return -EINVAL;
>> +
>> +	if (list_empty(&iommu->domain_list)) {
>> +		ret = -EINVAL;
>> +		goto pin_done;
>> +	}
>> +
>> +	get_first_domains(iommu, &domain, &domain_vgpu);
>> +
>> +	// Return error if vGPU domain doesn't exist
> 
> No c++ style comments please.
> 
>> +	if (!domain_vgpu) {
>> +		ret = -EINVAL;
>> +		goto pin_done;
>> +	}
>> +
>> +	for (i = 0; i < npage; i++) {
>> +		struct vfio_vgpu_pfn *p;
>> +		struct vfio_vgpu_pfn *lpfn;
>> +		unsigned long tpfn;
>> +		dma_addr_t iova;
>> +
>> +		mutex_lock(&iommu->lock);
>> +
>> +		iova = vaddr[i] << PAGE_SHIFT;
>> +
>> +		dma = vfio_find_dma(iommu, iova, 0 /*  size */);
>> +		if (!dma) {
>> +			mutex_unlock(&iommu->lock);
>> +			ret = -EINVAL;
>> +			goto pin_done;
>> +		}
>> +
>> +		remote_vaddr = dma->vaddr + iova - dma->iova;
>> +
>> +		retpage = vfio_pin_pages_internal(domain_vgpu, remote_vaddr,
>> +						  (long)1, prot, &tpfn);
>> +		mutex_unlock(&iommu->lock);
>> +		if (retpage <= 0) {
>> +			WARN_ON(!retpage);
>> +			ret = (int)retpage;
>> +			goto pin_done;
>> +		}
>> +
>> +		pfn[i] = tpfn;
>> +
>> +		mutex_lock(&domain_vgpu->lock);
>> +
>> +		// search if pfn exist
>> +		if ((p = vfio_find_vgpu_pfn(domain_vgpu, tpfn))) {
>> +			atomic_inc(&p->ref_count);
>> +			mutex_unlock(&domain_vgpu->lock);
>> +			continue;
>> +		}
> 
> The only reason I can come up with for why we'd want to integrate an
> api-only domain into the existing type1 code would be to avoid page
> accounting issues where we count locked pages once for a normal
> assigned device and again for a vgpu, but that's not what we're doing
> here.  We're not only locking the pages again regardless of them
> already being locked, we're counting every time we lock them through
> this new interface.  So there's really no point at all to making type1
> become this unsupportable.  In that case we should be pulling out the
> common code that we want to share from type1 and making a new type1
> compatible vfio iommu backend rather than conditionalizing everything
> here.
> 
>> +
>> +		// add to pfn_list
>> +		lpfn = kzalloc(sizeof(*lpfn), GFP_KERNEL);
>> +		if (!lpfn) {
>> +			ret = -ENOMEM;
>> +			mutex_unlock(&domain_vgpu->lock);
>> +			goto pin_done;
>> +		}
>> +		lpfn->vmm_va = remote_vaddr;
>> +		lpfn->iova = iova;
>> +		lpfn->pfn = pfn[i];
>> +		lpfn->npage = 1;
>> +		lpfn->prot = prot;
>> +		atomic_inc(&lpfn->ref_count);
>> +		vfio_link_vgpu_pfn(domain_vgpu, lpfn);
>> +		mutex_unlock(&domain_vgpu->lock);
>> +	}
>> +
>> +	ret = i;
>> +
>> +pin_done:
>> +	return ret;
>> +}
>> +EXPORT_SYMBOL(vfio_pin_pages);
>> +

IIUC, an api-only domain is a VFIO domain *without* underlying IOMMU
hardware. It just, as you said in another mail, "rather than
programming them into an IOMMU for a device, it simply stores the
translations for use by later requests".

That imposes a constraint on gfx driver: hardware IOMMU must be disabled.
Otherwise, if IOMMU is present, the gfx driver eventually programs
the hardware IOMMU with IOVA returned by pci_map_page or dma_map_page;
Meanwhile, the IOMMU backend for vgpu only maintains GPA <-> HPA
translations without any knowledge about hardware IOMMU, how is the
device model supposed to do to get an IOVA for a given GPA (thereby HPA
by the IOMMU backend here)?

If things go as guessed above, as vfio_pin_pages() indicates, it
pin & translate vaddr to PFN, then it will be very difficult for the
device model to figure out:

	1, for a given GPA, how to avoid calling dma_map_page multiple times?
	2, for which page to call dma_unmap_page?

--
Thanks,
Jike


WARNING: multiple messages have this Message-ID (diff)
From: Jike Song <jike.song@intel.com>
To: Alex Williamson <alex.williamson@redhat.com>
Cc: Kirti Wankhede <kwankhede@nvidia.com>,
	pbonzini@redhat.com, kraxel@redhat.com, cjia@nvidia.com,
	qemu-devel@nongnu.org, kvm@vger.kernel.org, kevin.tian@intel.com,
	shuai.ruan@intel.com, zhiyuan.lv@intel.com
Subject: Re: [Qemu-devel] [RFC PATCH v3 3/3] VFIO Type1 IOMMU change: to support with iommu and without iommu
Date: Thu, 05 May 2016 14:55:46 +0800	[thread overview]
Message-ID: <572AEE72.90008@intel.com> (raw)
In-Reply-To: <20160503164306.6a699fe3@t450s.home>

On 05/04/2016 06:43 AM, Alex Williamson wrote:
> On Tue, 3 May 2016 00:10:41 +0530
> Kirti Wankhede <kwankhede@nvidia.com> wrote:
>> +
>> +/*
>> + * Pin a set of guest PFNs and return their associated host PFNs for vGPU.
>> + * @vaddr [in]: array of guest PFNs
>> + * @npage [in]: count of array elements
>> + * @prot [in] : protection flags
>> + * @pfn_base[out] : array of host PFNs
>> + */
>> +int vfio_pin_pages(void *iommu_data, dma_addr_t *vaddr, long npage,
>> +		   int prot, dma_addr_t *pfn_base)
>> +{
>> +	struct vfio_iommu *iommu = iommu_data;
>> +	struct vfio_domain *domain = NULL, *domain_vgpu = NULL;
>> +	int i = 0, ret = 0;
>> +	long retpage;
>> +	dma_addr_t remote_vaddr = 0;
>> +	dma_addr_t *pfn = pfn_base;
>> +	struct vfio_dma *dma;
>> +
>> +	if (!iommu || !pfn_base)
>> +		return -EINVAL;
>> +
>> +	if (list_empty(&iommu->domain_list)) {
>> +		ret = -EINVAL;
>> +		goto pin_done;
>> +	}
>> +
>> +	get_first_domains(iommu, &domain, &domain_vgpu);
>> +
>> +	// Return error if vGPU domain doesn't exist
> 
> No c++ style comments please.
> 
>> +	if (!domain_vgpu) {
>> +		ret = -EINVAL;
>> +		goto pin_done;
>> +	}
>> +
>> +	for (i = 0; i < npage; i++) {
>> +		struct vfio_vgpu_pfn *p;
>> +		struct vfio_vgpu_pfn *lpfn;
>> +		unsigned long tpfn;
>> +		dma_addr_t iova;
>> +
>> +		mutex_lock(&iommu->lock);
>> +
>> +		iova = vaddr[i] << PAGE_SHIFT;
>> +
>> +		dma = vfio_find_dma(iommu, iova, 0 /*  size */);
>> +		if (!dma) {
>> +			mutex_unlock(&iommu->lock);
>> +			ret = -EINVAL;
>> +			goto pin_done;
>> +		}
>> +
>> +		remote_vaddr = dma->vaddr + iova - dma->iova;
>> +
>> +		retpage = vfio_pin_pages_internal(domain_vgpu, remote_vaddr,
>> +						  (long)1, prot, &tpfn);
>> +		mutex_unlock(&iommu->lock);
>> +		if (retpage <= 0) {
>> +			WARN_ON(!retpage);
>> +			ret = (int)retpage;
>> +			goto pin_done;
>> +		}
>> +
>> +		pfn[i] = tpfn;
>> +
>> +		mutex_lock(&domain_vgpu->lock);
>> +
>> +		// search if pfn exist
>> +		if ((p = vfio_find_vgpu_pfn(domain_vgpu, tpfn))) {
>> +			atomic_inc(&p->ref_count);
>> +			mutex_unlock(&domain_vgpu->lock);
>> +			continue;
>> +		}
> 
> The only reason I can come up with for why we'd want to integrate an
> api-only domain into the existing type1 code would be to avoid page
> accounting issues where we count locked pages once for a normal
> assigned device and again for a vgpu, but that's not what we're doing
> here.  We're not only locking the pages again regardless of them
> already being locked, we're counting every time we lock them through
> this new interface.  So there's really no point at all to making type1
> become this unsupportable.  In that case we should be pulling out the
> common code that we want to share from type1 and making a new type1
> compatible vfio iommu backend rather than conditionalizing everything
> here.
> 
>> +
>> +		// add to pfn_list
>> +		lpfn = kzalloc(sizeof(*lpfn), GFP_KERNEL);
>> +		if (!lpfn) {
>> +			ret = -ENOMEM;
>> +			mutex_unlock(&domain_vgpu->lock);
>> +			goto pin_done;
>> +		}
>> +		lpfn->vmm_va = remote_vaddr;
>> +		lpfn->iova = iova;
>> +		lpfn->pfn = pfn[i];
>> +		lpfn->npage = 1;
>> +		lpfn->prot = prot;
>> +		atomic_inc(&lpfn->ref_count);
>> +		vfio_link_vgpu_pfn(domain_vgpu, lpfn);
>> +		mutex_unlock(&domain_vgpu->lock);
>> +	}
>> +
>> +	ret = i;
>> +
>> +pin_done:
>> +	return ret;
>> +}
>> +EXPORT_SYMBOL(vfio_pin_pages);
>> +

IIUC, an api-only domain is a VFIO domain *without* underlying IOMMU
hardware. It just, as you said in another mail, "rather than
programming them into an IOMMU for a device, it simply stores the
translations for use by later requests".

That imposes a constraint on gfx driver: hardware IOMMU must be disabled.
Otherwise, if IOMMU is present, the gfx driver eventually programs
the hardware IOMMU with IOVA returned by pci_map_page or dma_map_page;
Meanwhile, the IOMMU backend for vgpu only maintains GPA <-> HPA
translations without any knowledge about hardware IOMMU, how is the
device model supposed to do to get an IOVA for a given GPA (thereby HPA
by the IOMMU backend here)?

If things go as guessed above, as vfio_pin_pages() indicates, it
pin & translate vaddr to PFN, then it will be very difficult for the
device model to figure out:

	1, for a given GPA, how to avoid calling dma_map_page multiple times?
	2, for which page to call dma_unmap_page?

--
Thanks,
Jike

  parent reply	other threads:[~2016-05-05  6:56 UTC|newest]

Thread overview: 150+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-05-02 18:40 [RFC PATCH v3 0/3] Add vGPU support Kirti Wankhede
2016-05-02 18:40 ` [Qemu-devel] " Kirti Wankhede
2016-05-02 18:40 ` [RFC PATCH v3 1/3] vGPU Core driver Kirti Wankhede
2016-05-02 18:40   ` [Qemu-devel] " Kirti Wankhede
2016-05-03 22:43   ` Alex Williamson
2016-05-03 22:43     ` [Qemu-devel] " Alex Williamson
2016-05-04  2:45     ` Tian, Kevin
2016-05-04  2:45       ` [Qemu-devel] " Tian, Kevin
2016-05-04 16:57       ` Alex Williamson
2016-05-04 16:57         ` [Qemu-devel] " Alex Williamson
2016-05-05  8:58         ` Tian, Kevin
2016-05-05  8:58           ` [Qemu-devel] " Tian, Kevin
2016-05-04  2:58     ` Tian, Kevin
2016-05-04  2:58       ` [Qemu-devel] " Tian, Kevin
2016-05-12  8:22       ` Tian, Kevin
2016-05-12  8:22         ` [Qemu-devel] " Tian, Kevin
2016-05-04 13:31     ` Kirti Wankhede
2016-05-04 13:31       ` [Qemu-devel] " Kirti Wankhede
2016-05-05  9:06       ` Tian, Kevin
2016-05-05  9:06         ` [Qemu-devel] " Tian, Kevin
2016-05-05 10:44         ` Kirti Wankhede
2016-05-05 10:44           ` [Qemu-devel] " Kirti Wankhede
2016-05-05 12:07           ` Tian, Kevin
2016-05-05 12:07             ` [Qemu-devel] " Tian, Kevin
2016-05-05 12:57             ` Kirti Wankhede
2016-05-05 12:57               ` [Qemu-devel] " Kirti Wankhede
2016-05-11  6:37               ` Tian, Kevin
2016-05-11  6:37                 ` [Qemu-devel] " Tian, Kevin
2016-05-06 12:14         ` Jike Song
2016-05-06 12:14           ` [Qemu-devel] " Jike Song
2016-05-06 16:16           ` Kirti Wankhede
2016-05-06 16:16             ` [Qemu-devel] " Kirti Wankhede
2016-05-09 12:12             ` Jike Song
2016-05-09 12:12               ` [Qemu-devel] " Jike Song
2016-05-02 18:40 ` [RFC PATCH v3 2/3] VFIO driver for vGPU device Kirti Wankhede
2016-05-02 18:40   ` [Qemu-devel] " Kirti Wankhede
2016-05-03 22:43   ` Alex Williamson
2016-05-03 22:43     ` [Qemu-devel] " Alex Williamson
2016-05-04  3:23     ` Tian, Kevin
2016-05-04  3:23       ` [Qemu-devel] " Tian, Kevin
2016-05-04 17:06       ` Alex Williamson
2016-05-04 17:06         ` [Qemu-devel] " Alex Williamson
2016-05-04 21:14         ` Neo Jia
2016-05-04 21:14           ` [Qemu-devel] " Neo Jia
2016-05-05  4:42           ` Kirti Wankhede
2016-05-05  4:42             ` [Qemu-devel] " Kirti Wankhede
2016-05-05  9:24         ` Tian, Kevin
2016-05-05  9:24           ` [Qemu-devel] " Tian, Kevin
2016-05-05 20:27           ` Neo Jia
2016-05-05 20:27             ` [Qemu-devel] " Neo Jia
2016-05-11  6:45         ` Tian, Kevin
2016-05-11  6:45           ` [Qemu-devel] " Tian, Kevin
2016-05-11 20:10           ` Alex Williamson
2016-05-11 20:10             ` [Qemu-devel] " Alex Williamson
2016-05-12  0:59             ` Tian, Kevin
2016-05-12  0:59               ` [Qemu-devel] " Tian, Kevin
2016-05-04 16:25     ` Kirti Wankhede
2016-05-04 16:25       ` Kirti Wankhede
2016-05-02 18:40 ` [RFC PATCH v3 3/3] VFIO Type1 IOMMU change: to support with iommu and without iommu Kirti Wankhede
2016-05-02 18:40   ` [Qemu-devel] " Kirti Wankhede
2016-05-03 10:40   ` Jike Song
2016-05-03 10:40     ` [Qemu-devel] " Jike Song
2016-05-03 22:43   ` Alex Williamson
2016-05-03 22:43     ` [Qemu-devel] " Alex Williamson
2016-05-04  3:39     ` Tian, Kevin
2016-05-04  3:39       ` [Qemu-devel] " Tian, Kevin
2016-05-05  6:55     ` Jike Song [this message]
2016-05-05  6:55       ` Jike Song
2016-05-05  9:27       ` Tian, Kevin
2016-05-05  9:27         ` [Qemu-devel] " Tian, Kevin
2016-05-10  7:52         ` Jike Song
2016-05-10  7:52           ` [Qemu-devel] " Jike Song
2016-05-10 16:02           ` Neo Jia
2016-05-10 16:02             ` [Qemu-devel] " Neo Jia
2016-05-11  9:15             ` Jike Song
2016-05-11  9:15               ` [Qemu-devel] " Jike Song
2016-05-11 22:06               ` Alex Williamson
2016-05-11 22:06                 ` [Qemu-devel] " Alex Williamson
2016-05-12  4:11                 ` Jike Song
2016-05-12  4:11                   ` [Qemu-devel] " Jike Song
2016-05-12 19:49                   ` Neo Jia
2016-05-12 19:49                     ` [Qemu-devel] " Neo Jia
2016-05-13  2:41                     ` Tian, Kevin
2016-05-13  2:41                       ` [Qemu-devel] " Tian, Kevin
2016-05-13  6:22                       ` Jike Song
2016-05-13  6:22                         ` [Qemu-devel] " Jike Song
2016-05-13  6:43                         ` Neo Jia
2016-05-13  6:43                           ` [Qemu-devel] " Neo Jia
2016-05-13  7:30                           ` Jike Song
2016-05-13  7:30                             ` [Qemu-devel] " Jike Song
2016-05-13  7:42                             ` Neo Jia
2016-05-13  7:42                               ` [Qemu-devel] " Neo Jia
2016-05-13  7:45                               ` Tian, Kevin
2016-05-13  7:45                                 ` [Qemu-devel] " Tian, Kevin
2016-05-13  8:31                                 ` Neo Jia
2016-05-13  8:31                                   ` [Qemu-devel] " Neo Jia
2016-05-13  9:23                                   ` Jike Song
2016-05-13  9:23                                     ` [Qemu-devel] " Jike Song
2016-05-13 15:50                                     ` Neo Jia
2016-05-13 15:50                                       ` [Qemu-devel] " Neo Jia
2016-05-16  6:57                                       ` Jike Song
2016-05-16  6:57                                         ` [Qemu-devel] " Jike Song
2016-05-13  6:08                     ` Jike Song
2016-05-13  6:08                       ` [Qemu-devel] " Jike Song
2016-05-13  6:41                       ` Neo Jia
2016-05-13  6:41                         ` [Qemu-devel] " Neo Jia
2016-05-13  7:13                         ` Tian, Kevin
2016-05-13  7:13                           ` [Qemu-devel] " Tian, Kevin
2016-05-13  7:38                           ` Neo Jia
2016-05-13  7:38                             ` [Qemu-devel] " Neo Jia
2016-05-13  8:02                             ` Tian, Kevin
2016-05-13  8:02                               ` [Qemu-devel] " Tian, Kevin
2016-05-13  8:41                               ` Neo Jia
2016-05-13  8:41                                 ` [Qemu-devel] " Neo Jia
2016-05-12  8:00                 ` Tian, Kevin
2016-05-12  8:00                   ` [Qemu-devel] " Tian, Kevin
2016-05-12 19:05                   ` Alex Williamson
2016-05-12 19:05                     ` [Qemu-devel] " Alex Williamson
2016-05-12 20:12                     ` Neo Jia
2016-05-12 20:12                       ` [Qemu-devel] " Neo Jia
2016-05-13  9:46                       ` Jike Song
2016-05-13  9:46                         ` [Qemu-devel] " Jike Song
2016-05-13 15:48                         ` Neo Jia
2016-05-13 15:48                           ` [Qemu-devel] " Neo Jia
2016-05-16  2:27                           ` Jike Song
2016-05-16  2:27                             ` [Qemu-devel] " Jike Song
2016-05-13  3:55                     ` Tian, Kevin
2016-05-13  3:55                       ` [Qemu-devel] " Tian, Kevin
2016-05-13 16:16                       ` Alex Williamson
2016-05-13 16:16                         ` [Qemu-devel] " Alex Williamson
2016-05-13  7:10                     ` Dong Jia
2016-05-13  7:24                       ` Neo Jia
2016-05-13  8:39                         ` Dong Jia
2016-05-13  8:39                           ` [Qemu-devel] " Dong Jia
2016-05-13  9:05                           ` Neo Jia
2016-05-19  7:28                             ` Dong Jia
2016-05-20  3:21                               ` Tian, Kevin
2016-05-20  3:21                                 ` Tian, Kevin
2016-06-06  6:59                                 ` Dong Jia
2016-06-07  2:47                                   ` Tian, Kevin
2016-06-07  2:47                                     ` Tian, Kevin
2016-06-07  7:04                                     ` Dong Jia
2016-05-05  7:51     ` Kirti Wankhede
2016-05-05  7:51       ` [Qemu-devel] " Kirti Wankhede
2016-05-04  1:05 ` [RFC PATCH v3 0/3] Add vGPU support Tian, Kevin
2016-05-04  1:05   ` [Qemu-devel] " Tian, Kevin
2016-05-04  6:17   ` Neo Jia
2016-05-04  6:17     ` [Qemu-devel] " Neo Jia
2016-05-04 17:07     ` Alex Williamson
2016-05-04 17:07       ` [Qemu-devel] " Alex Williamson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=572AEE72.90008@intel.com \
    --to=jike.song@intel.com \
    --cc=alex.williamson@redhat.com \
    --cc=cjia@nvidia.com \
    --cc=kevin.tian@intel.com \
    --cc=kraxel@redhat.com \
    --cc=kvm@vger.kernel.org \
    --cc=kwankhede@nvidia.com \
    --cc=pbonzini@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=shuai.ruan@intel.com \
    --cc=zhiyuan.lv@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.