From: Alex Williamson <alex.williamson@redhat.com>
Subject: Re: [RFC][PATCH] Fix superpage unmap on Intel IOMMU
Date: Fri, 03 Jun 2011 14:01:57 -0600
Message-ID: <1307131318.3128.10.camel@ul30vt>
References: <1307129491.23418.4.camel@i7.infradead.org>
In-Reply-To: <1307129491.23418.4.camel@i7.infradead.org>
To: David Woodhouse
Cc: kvm@vger.kernel.org, "Kay, Allen M", joerg.roedel@amd.com, iommu@lists.linux-foundation.org

On Fri, 2011-06-03 at 20:31 +0100, David Woodhouse wrote:
> Tell me it isn't so...

Looks accurate to me; in fact, with hugetlbfs it seems to be doing exactly what it should. The non-hugetlbfs case isn't efficient, but it isn't wrong either. Our only other option is to figure out what's contiguous, map those chunks, and try to keep track of all that. It's much easier to make you do it, since you have to manage the IOMMU page table anyway ;) Besides, it's the only way we get page granularity on unmapping.

And of course, isn't this just following the IOMMU API? It'd be nice if that API were documented somewhere. Thanks,

Alex

> diff --git a/drivers/pci/intel-iommu.c b/drivers/pci/intel-iommu.c
> index 59f17ac..6c588ee 100644
> --- a/drivers/pci/intel-iommu.c
> +++ b/drivers/pci/intel-iommu.c
> @@ -3866,6 +3866,20 @@ static int intel_iommu_unmap(struct iommu_domain *domain,
> 	struct dmar_domain *dmar_domain = domain->priv;
> 	size_t size = PAGE_SIZE << gfp_order;
>
> +	/* The KVM code is *fucked* in the head. It maps the range
> +	   one page at a time, using 4KiB pages unless it actually
> +	   allocated hugepages using hugetlbfs. (So we get to flush
> +	   the CPU data cache and then the IOTLB for each page in
> +	   its loop). And on unmap, it unmaps 4KiB at a time (always
> +	   passing gfp_order==0), regardless of whether it mapped
> +	   using superpages or not. So on unmap, if we detect a
> +	   superpage in our page tables we are expected to unmap
> +	   *more* than we are asked to, and return a value indicating
> +	   how much we actually unmapped. WTF? */
> +	if (dma_pfn_level_pte(dmar_domain, iova >> VTD_PAGE_SHIFT,
> +			      1, &gfp_order))
> +		size = PAGE_SIZE << gfp_order;
> +
> 	dma_pte_clear_range(dmar_domain, iova >> VTD_PAGE_SHIFT,
> 			    (iova + size - 1) >> VTD_PAGE_SHIFT);
>