From: Jike Song
Date: Fri, 30 Sep 2016 10:58:23 +0800
Message-ID: <57EDD4CF.2080806@intel.com>
References: <1472097235-6332-1-git-send-email-kwankhede@nvidia.com> <1472097235-6332-4-git-send-email-kwankhede@nvidia.com> <57EC79B3.2060305@intel.com>
Subject: Re: [Qemu-devel] [PATCH v7 3/4] vfio iommu: Add support for mediated devices
To: Kirti Wankhede
Cc: alex.williamson@redhat.com, pbonzini@redhat.com, kraxel@redhat.com, cjia@nvidia.com, qemu-devel@nongnu.org, kvm@vger.kernel.org, kevin.tian@intel.com, bjsdjshi@linux.vnet.ibm.com, "Xiao, Guangrong"

On 09/29/2016 11:06 PM, Kirti Wankhede wrote:
>
> On 9/29/2016 7:47 AM, Jike Song wrote:
>> +Guangrong
>>
>> On 08/25/2016 11:53 AM, Kirti Wankhede wrote:
>
> ...
>
>>> +static long vfio_iommu_type1_pin_pages(void *iommu_data,
>>> +                                       unsigned long *user_pfn,
>>> +                                       long npage, int prot,
>>> +                                       unsigned long *phys_pfn)
>>> +{
>>> +    struct vfio_iommu *iommu = iommu_data;
>>> +    struct vfio_domain *domain;
>>> +    int i, j, ret;
>>> +    long retpage;
>>> +    unsigned long remote_vaddr;
>>> +    unsigned long *pfn = phys_pfn;
>>> +    struct vfio_dma *dma;
>>> +    bool do_accounting = false;
>>> +
>>> +    if (!iommu || !user_pfn || !phys_pfn)
>>> +        return -EINVAL;
>>> +
>>> +    mutex_lock(&iommu->lock);
>>> +
>>> +    if (!iommu->local_domain) {
>>> +        ret = -EINVAL;
>>> +        goto pin_done;
>>> +    }
>>> +
>>> +    domain = iommu->local_domain;
>>> +
>>> +    /*
>>> +     * If iommu capable domain exist in the container then all pages are
>>> +     * already pinned and accounted. Accounting should be done if there is
>>> +     * no iommu capable domain in the container.
>>> +     */
>>> +    do_accounting = !IS_IOMMU_CAPABLE_DOMAIN_IN_CONTAINER(iommu);
>>> +
>>> +    for (i = 0; i < npage; i++) {
>>> +        struct vfio_pfn *p;
>>> +        dma_addr_t iova;
>>> +
>>> +        iova = user_pfn[i] << PAGE_SHIFT;
>>> +
>>> +        dma = vfio_find_dma(iommu, iova, 0);
>>> +        if (!dma) {
>>> +            ret = -EINVAL;
>>> +            goto pin_unwind;
>>> +        }
>>> +
>>> +        remote_vaddr = dma->vaddr + iova - dma->iova;
>>> +
>>> +        retpage = __vfio_pin_pages_local(domain, remote_vaddr, prot,
>>> +                                         &pfn[i], do_accounting);
>>
>> Hi Kirti,
>>
>> Here you call __vfio_pin_pages_local() > vaddr_get_pfn() > GUP regardless
>> of whether the vaddr is already pinned or not. That probably means, if the
>> caller calls vfio_pin_pages() with the same GPA multiple times, you get
>> memory leaks.
>>
>> GUP always increases the page refcnt.
>>
>> FWIW, I would like to have the pfn_list implemented with key == iova,
>> so you can always try to find the PFN for a given iova, and pin it only
>> if not found.
>>
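
To make the suggestion above concrete: only a rough sketch, where
vfio_find_vpfn_by_iova() and vfio_add_to_pfn_list() are made-up helper
names (not anything from this patch), and locking plus the error unwind
are left out:

/*
 * Sketch only: look the iova up first, GUP only when it is not
 * already tracked in the iova-keyed pfn list.
 */
static long vfio_pin_page_once(struct vfio_domain *domain,
                               struct vfio_dma *dma, dma_addr_t iova,
                               int prot, unsigned long *pfn,
                               bool do_accounting)
{
        struct vfio_pfn *vpfn;
        unsigned long remote_vaddr;
        long ret;

        /* already pinned for this iova? then no second GUP */
        vpfn = vfio_find_vpfn_by_iova(domain, iova);
        if (vpfn) {
                atomic_inc(&vpfn->ref_count);
                *pfn = vpfn->pfn;
                return 1;
        }

        /* first pin of this iova: GUP exactly once */
        remote_vaddr = dma->vaddr + iova - dma->iova;
        ret = __vfio_pin_pages_local(domain, remote_vaddr, prot,
                                     pfn, do_accounting);
        if (ret <= 0)
                return ret;

        /* remember it, keyed by iova, with ref_count == 1 */
        return vfio_add_to_pfn_list(domain, iova, *pfn, prot);
}

i.e. GUP is reached only on the first pin of an iova; later pins just
bump vpfn->ref_count.
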
> I didn't get how there would be a memory leak.
>
> Right, GUP increases refcnt, so if vfio_pin_pages() is called multiple
> times for the same GPA, refcnt would be incremented. In
> vfio_iommu_type1_pin_pages(), the pinned pages list is maintained with a
> ref_count. If the pfn is already in the list, ref_count is incremented
> and the same is used while unpinning pages.
>

Let's have a close look at vfio_unpin_pfn:

static int vfio_unpin_pfn(struct vfio_domain *domain,
                          struct vfio_pfn *vpfn, bool do_accounting)
{
        __vfio_unpin_pages_for_mdev(domain, vpfn->pfn, vpfn->prot,
                                    do_accounting);

        if (atomic_dec_and_test(&vpfn->ref_count))
                vfio_remove_from_pfn_list(domain, vpfn);

        return 1;
}

Here you didn't call __vfio_unpin_pages_for_mdev() -- thereby put_page() --
vpfn->ref_count times. If page->_refcount was increased by GUP (N) times,
here you only set it back to (N-1).
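
In other words, either the pin side avoids the extra GUPs (as in the
sketch earlier), or the teardown has to drop every reference it took.
Just to spell out the counting I mean -- a sketch only, not a patch,
ignoring locking and the unwind path:

static void vfio_unpin_pfn_all(struct vfio_domain *domain,
                               struct vfio_pfn *vpfn, bool do_accounting)
{
        int refs = atomic_read(&vpfn->ref_count);
        int i;

        /* one put_page() per GUP the pin path performed */
        for (i = 0; i < refs; i++)
                __vfio_unpin_pages_for_mdev(domain, vpfn->pfn, vpfn->prot,
                                            do_accounting);

        vfio_remove_from_pfn_list(domain, vpfn);
}

--
Thanks,
Jike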