From: Alex Williamson <alex.williamson@redhat.com>
Subject: Re: [RFC][PATCH] Fix superpage unmap on Intel IOMMU
Date: Fri, 03 Jun 2011 14:01:57 -0600
Message-ID: <1307131318.3128.10.camel@ul30vt>
References: <1307129491.23418.4.camel@i7.infradead.org>
In-Reply-To: <1307129491.23418.4.camel@i7.infradead.org>
To: David Woodhouse
Cc: kvm@vger.kernel.org, "Kay, Allen M", joerg.roedel@amd.com, iommu@lists.linux-foundation.org

On Fri, 2011-06-03 at 20:31 +0100, David Woodhouse wrote:
> Tell me it isn't so...

Looks accurate to me; in fact, with hugetlbfs it seems to be doing exactly what it should. The non-hugetlbfs case isn't efficient, but it isn't wrong either. Our only other option is to figure out what's contiguous, map those chunks, and try to keep track of all that. It's much easier to make you do it, since you have to manage the IOMMU page table anyway ;) Besides, it's the only way we get page granularity on unmapping.

And of course, isn't this just following the IOMMU API? It'd be nice if that API were documented somewhere. Thanks,

Alex

> diff --git a/drivers/pci/intel-iommu.c b/drivers/pci/intel-iommu.c
> index 59f17ac..6c588ee 100644
> --- a/drivers/pci/intel-iommu.c
> +++ b/drivers/pci/intel-iommu.c
> @@ -3866,6 +3866,20 @@ static int intel_iommu_unmap(struct iommu_domain *domain,
> 	struct dmar_domain *dmar_domain = domain->priv;
> 	size_t size = PAGE_SIZE << gfp_order;
>
> +	/* The KVM code is *fucked* in the head. It maps the range
> +	   one page at a time, using 4KiB pages unless it actually
> +	   allocated hugepages using hugetlbfs. (So we get to flush
> +	   the CPU data cache and then the IOTLB for each page in
> +	   its loop). And on unmap, it unmaps 4KiB at a time (always
> +	   passing gfp_order==0), regardless of whether it mapped
> +	   using superpages or not. So on unmap, if we detect a
> +	   superpage in our page tables we are expected to unmap
> +	   *more* than we are asked to, and return a value indicating
> +	   how much we actually unmapped. WTF? */
> +	if (dma_pfn_level_pte(dmar_domain, iova >> VTD_PAGE_SHIFT,
> +			      1, &gfp_order))
> +		size = PAGE_SIZE << gfp_order;
> +
> 	dma_pte_clear_range(dmar_domain, iova >> VTD_PAGE_SHIFT,
> 			    (iova + size - 1) >> VTD_PAGE_SHIFT);
>