kvm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jike Song <jike.song@intel.com>
To: Kirti Wankhede <kwankhede@nvidia.com>
Cc: alex.williamson@redhat.com, pbonzini@redhat.com,
	kraxel@redhat.com, cjia@nvidia.com, qemu-devel@nongnu.org,
	kvm@vger.kernel.org, kevin.tian@intel.com,
	bjsdjshi@linux.vnet.ibm.com, "Xiao,
	Guangrong" <guangrong.xiao@intel.com>
Subject: Re: [PATCH v7 3/4] vfio iommu: Add support for mediated devices
Date: Sat, 08 Oct 2016 15:09:15 +0800	[thread overview]
Message-ID: <57F89B9B.7000508@intel.com> (raw)
In-Reply-To: <690c3679-4084-a8e2-6ca7-2d615768e802@nvidia.com>

On 09/30/2016 07:44 PM, Kirti Wankhede wrote:
> On 9/30/2016 8:40 AM, Jike Song wrote:
>> On 09/30/2016 10:58 AM, Jike Song wrote:
>>> On 09/29/2016 11:06 PM, Kirti Wankhede wrote:
>>>>
>>>>
>>>> On 9/29/2016 7:47 AM, Jike Song wrote:
>>>>> +Guangrong
>>>>>
>>>>> On 08/25/2016 11:53 AM, Kirti Wankhede wrote:
>>>>
>>>> ...
>>>>
>>>>>> +static long vfio_iommu_type1_pin_pages(void *iommu_data,
>>>>>> +				       unsigned long *user_pfn,
>>>>>> +				       long npage, int prot,
>>>>>> +				       unsigned long *phys_pfn)
>>>>>> +{
>>>>>> +	struct vfio_iommu *iommu = iommu_data;
>>>>>> +	struct vfio_domain *domain;
>>>>>> +	int i, j, ret;
>>>>>> +	long retpage;
>>>>>> +	unsigned long remote_vaddr;
>>>>>> +	unsigned long *pfn = phys_pfn;
>>>>>> +	struct vfio_dma *dma;
>>>>>> +	bool do_accounting = false;
>>>>>> +
>>>>>> +	if (!iommu || !user_pfn || !phys_pfn)
>>>>>> +		return -EINVAL;
>>>>>> +
>>>>>> +	mutex_lock(&iommu->lock);
>>>>>> +
>>>>>> +	if (!iommu->local_domain) {
>>>>>> +		ret = -EINVAL;
>>>>>> +		goto pin_done;
>>>>>> +	}
>>>>>> +
>>>>>> +	domain = iommu->local_domain;
>>>>>> +
>>>>>> +	/*
>>>>>> +	 * If iommu capable domain exist in the container then all pages are
>>>>>> +	 * already pinned and accounted. Accouting should be done if there is no
>>>>>> +	 * iommu capable domain in the container.
>>>>>> +	 */
>>>>>> +	do_accounting = !IS_IOMMU_CAPABLE_DOMAIN_IN_CONTAINER(iommu);
>>>>>> +
>>>>>> +	for (i = 0; i < npage; i++) {
>>>>>> +		struct vfio_pfn *p;
>>>>>> +		dma_addr_t iova;
>>>>>> +
>>>>>> +		iova = user_pfn[i] << PAGE_SHIFT;
>>>>>> +
>>>>>> +		dma = vfio_find_dma(iommu, iova, 0);
>>>>>> +		if (!dma) {
>>>>>> +			ret = -EINVAL;
>>>>>> +			goto pin_unwind;
>>>>>> +		}
>>>>>> +
>>>>>> +		remote_vaddr = dma->vaddr + iova - dma->iova;
>>>>>> +
>>>>>> +		retpage = __vfio_pin_pages_local(domain, remote_vaddr, prot,
>>>>>> +						 &pfn[i], do_accounting);
>>>>>
>>>>> Hi Kirti,
>>>>>
>>>>> Here you call __vfio_pin_pages_local() > vaddr_get_pfn() > GUP regardless
>>>>> whether the vaddr already pinned or not. That probably means, if the caller 
>>>>> calls vfio_pin_pages() with a GPA for multiple times, you get memory leaks.
>>>>>
>>>>> GUP always increases the page refcnt.
>>>>>
>>>>> FWIW, I would like to have the pfn_list_lock implemented with key == iova,
>>>>> so you can always try to find the PFN for a given iova, and pin it only if
>>>>> not found.
>>>>>
>>>>
>>>> I didn't get how there would be a memory leak.
>>>>
>>>> Right, GUP increases refcnt, so if vfio_pin_pages() is called for
>>>> multiple types for same GPA, refcnt would be incremented. In
>>>> vfio_iommu_type1_pin_pages() pinned pages list is maintained with
>>>> ref_count. If pfn is already in list, ref_count is incremented and same
>>>> is used while unpining pages.
>>>>
>>>
>>> Let's have a close look at vfio_unpin_pfn:
>>>
>>> 	static int vfio_unpin_pfn(struct vfio_domain *domain,
>>> 				  struct vfio_pfn *vpfn, bool do_accounting)
>>> 	{
>>> 		__vfio_unpin_pages_for_mdev(domain, vpfn->pfn, vpfn->prot,
>>> 					    do_accounting);
>>>
>>> 		if (atomic_dec_and_test(&vpfn->ref_count))
>>> 			vfio_remove_from_pfn_list(domain, vpfn);
>>>
>>> 		return 1;
>>> 	}
>>>
>>> Here you didn't call __vfio_unpin_pages_for_mdev -- thereby put_page -- for
>>> vpfn->ref_count times. If page->_refcount increased by GUP for (N) times, here
>>> you only set it back to (N-1).
>>>
> 
> User of vfio_pin_pages() should call vfio_unpin_pages() also,  so here
> we unpin it once. If vfio_pin_pages() is called twice for same page, we
> should get vfio_unpin_pages() twice for same page.
>

If this is the deliberate design, why do you need a 'ref_count'? You can
simply drop the 'ref_count' and blame the caller for pinning/unpinning
different times.

> If users of these APIs don't follow this, then
> vfio_release_domain() -> vfio_local_unpin_all() takes care of unpin,
> decrement ref_count and delete node on (ref_count == 0) for all
> remaining pfn.
>

Here you did pay attention to the "caller doesn't follow this" situation.
However, dealing with 'ref_count' in vfio-iommu is not enough: memory
leaked.

>>
>> What's more, since all pinned {iova, pfni} already saved, it's better to
>> consult it before calling GUP, which will get_page() unconditionally.
>
> pfn is required to unpin page, so we have pfn as key for rbtree.
> vfio_pin_pages() is called with user_pfn or iova, which can't be used to
> search in rbtree with iova in optimized way. Raw way would be to goto
> each node of rbtree and check iova which would hamper the performance in
> if this is called in performance critical path.
> So here optimized way is to first pin it, get pfn and check if already
> exist in rbtree. If it exist increment ref_count else add it to the rbtree.
> 

Of course pfn is required to unpin page, I 100% agree. But that doesn't 
change the argue: using iova instead for key, you can still store pfn
along with it.

By the way, calling GUP unconditionally hurts more than searching rbtree.


--
Thanks,
Jike


  reply	other threads:[~2016-10-08  7:11 UTC|newest]

Thread overview: 95+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-08-25  3:53 [PATCH v7 0/4] Add Mediated device support Kirti Wankhede
2016-08-25  3:53 ` [PATCH v7 1/4] vfio: Mediated device Core driver Kirti Wankhede
2016-09-08  8:09   ` Jike Song
2016-09-08  9:38     ` Neo Jia
2016-09-09  6:26       ` Jike Song
2016-09-09 17:48     ` Kirti Wankhede
2016-09-09 18:42       ` Alex Williamson
2016-09-09 19:55         ` Kirti Wankhede
2016-09-12  5:10           ` Jike Song
2016-09-12  7:49             ` Kirti Wankhede
2016-09-12 15:53               ` Alex Williamson
2016-09-19  7:08                 ` Jike Song
2016-09-19 17:29                 ` Kirti Wankhede
2016-09-19 18:11                   ` Alex Williamson
2016-09-19 20:09                     ` Kirti Wankhede
2016-09-19 20:59                       ` [Qemu-devel] " Alex Williamson
2016-09-20 12:48   ` Jike Song
2016-08-25  3:53 ` [PATCH v7 2/4] vfio: VFIO driver for mediated devices Kirti Wankhede
2016-08-25  9:22   ` Dong Jia
2016-08-26 14:13     ` Kirti Wankhede
2016-09-08  2:38       ` Jike Song
2016-09-19 18:22       ` [Qemu-devel] " Kirti Wankhede
2016-09-19 18:36         ` Alex Williamson
2016-09-19 19:13           ` Kirti Wankhede
2016-09-19 20:03             ` Alex Williamson
2016-09-20  2:50               ` Jike Song
2016-09-20 16:24                 ` Alex Williamson
2016-09-21  3:19                   ` Jike Song
2016-09-21  4:51                     ` Alex Williamson
2016-09-21  5:02                       ` Jike Song
2016-09-08  2:45     ` Jike Song
2016-09-13  2:35       ` Jike Song
2016-09-20  5:48         ` Dong Jia Shi
     [not found]         ` <20160920054851.GA2186@bjsdjshi@linux.vnet.ibm.com>
2016-09-20  6:37           ` Jike Song
2016-09-20 12:53   ` Jike Song
2016-08-25  3:53 ` [PATCH v7 3/4] vfio iommu: Add support " Kirti Wankhede
2016-08-25  7:29   ` Dong Jia
2016-08-26 13:50     ` Kirti Wankhede
2016-09-29  2:17   ` Jike Song
2016-09-29 15:06     ` Kirti Wankhede
2016-09-30  2:58       ` Jike Song
2016-09-30  3:10         ` Jike Song
2016-09-30 11:44           ` Kirti Wankhede
2016-10-08  7:09             ` Jike Song [this message]
2016-08-25  3:53 ` [PATCH v7 4/4] docs: Add Documentation for Mediated devices Kirti Wankhede
2016-09-03 16:40   ` Kirti Wankhede
2016-08-30 16:16 ` [PATCH v7 0/4] Add Mediated device support Alex Williamson
2016-08-31  6:12   ` Tian, Kevin
2016-08-31  7:04     ` Jike Song
2016-08-31 15:48       ` Alex Williamson
2016-09-01  4:09         ` Tian, Kevin
2016-09-01  4:10         ` Tian, Kevin
2016-09-01 18:22         ` Kirti Wankhede
2016-09-01 20:01           ` Alex Williamson
2016-09-02  6:17             ` Kirti Wankhede
2016-09-01 16:47     ` [Qemu-devel] " Michal Privoznik
2016-09-01 16:59       ` Alex Williamson
2016-09-02  4:48         ` [Qemu-devel] " Michal Privoznik
2016-09-02  5:21           ` Kirti Wankhede
2016-09-02 10:05             ` Paolo Bonzini
2016-09-02 17:15               ` Kirti Wankhede
2016-09-02 17:25                 ` Paolo Bonzini
2016-09-02 18:33                   ` Kirti Wankhede
2016-09-02 20:29                     ` [libvirt] " John Ferlan
2016-09-03 16:31                       ` [libvirt] " Kirti Wankhede
2016-09-06 17:54                         ` [libvirt] [Qemu-devel] " Alex Williamson
2016-09-02 21:48                     ` Paolo Bonzini
2016-09-03 11:56                       ` [libvirt] " John Ferlan
2016-09-03 13:07                         ` Paolo Bonzini
2016-09-03 17:47                           ` [libvirt] " Kirti Wankhede
2016-09-03 16:34                       ` [Qemu-devel] " Kirti Wankhede
2016-09-06 17:40                         ` Alex Williamson
2016-09-06 19:35                           ` Kirti Wankhede
2016-09-06 21:28                             ` Alex Williamson
2016-09-07  8:22                               ` Tian, Kevin
2016-09-07 16:00                                 ` Alex Williamson
2016-09-07 16:15                               ` Kirti Wankhede
2016-09-07 16:44                                 ` Alex Williamson
2016-09-07 18:06                                   ` Kirti Wankhede
2016-09-07 22:13                                     ` Alex Williamson
2016-09-08 18:48                                       ` Kirti Wankhede
2016-09-08 20:51                                         ` Alex Williamson
2016-09-07 18:17                                   ` Neo Jia
2016-09-07 18:27                                     ` Daniel P. Berrange
2016-09-07 18:32                                       ` Neo Jia
2016-09-07  6:48                           ` Tian, Kevin
2016-09-02 20:19               ` [libvirt] " John Ferlan
2016-09-02 21:44                 ` Paolo Bonzini
2016-09-02 23:57                   ` Laine Stump
2016-09-03 16:49                     ` Kirti Wankhede
2016-09-05  7:52                     ` Paolo Bonzini
2016-09-03 11:57                   ` John Ferlan
2016-09-05  7:54                     ` Paolo Bonzini
2016-09-02 17:55         ` Laine Stump
2016-09-02 19:15           ` [libvirt] " Alex Williamson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=57F89B9B.7000508@intel.com \
    --to=jike.song@intel.com \
    --cc=alex.williamson@redhat.com \
    --cc=bjsdjshi@linux.vnet.ibm.com \
    --cc=cjia@nvidia.com \
    --cc=guangrong.xiao@intel.com \
    --cc=kevin.tian@intel.com \
    --cc=kraxel@redhat.com \
    --cc=kvm@vger.kernel.org \
    --cc=kwankhede@nvidia.com \
    --cc=pbonzini@redhat.com \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).