public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org,
	Alex Williamson <alex.williamson@redhat.com>,
	Marcelo Tosatti <mtosatti@redhat.com>,
	Joerg Roedel <joro@8bytes.org>
Subject: [ 31/40] intel-iommu: Fix leaks in pagetable freeing
Date: Tue, 24 Sep 2013 17:12:03 -0700	[thread overview]
Message-ID: <20130925001045.327263910@linuxfoundation.org> (raw)
In-Reply-To: <20130925001041.939335518@linuxfoundation.org>

3.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Alex Williamson <alex.williamson@redhat.com>

commit 3269ee0bd6686baf86630300d528500ac5b516d7 upstream.

At best the current code only seems to free the leaf pagetables and
the root.  If you're unlucky enough to have a large gap (like any
QEMU guest with more than 3G of memory), only the first chunk of leaf
pagetables are freed (plus the root).  This is a massive memory leak.
This patch re-writes the pagetable freeing function to use a
recursive algorithm and manages to not only free all the pagetables,
but does it without any apparent performance loss versus the current
broken version.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Joerg Roedel <joro@8bytes.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/iommu/intel-iommu.c |   74 +++++++++++++++++++++-----------------------
 1 file changed, 36 insertions(+), 38 deletions(-)

--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -886,56 +886,54 @@ static int dma_pte_clear_range(struct dm
 	return order;
 }
 
+static void dma_pte_free_level(struct dmar_domain *domain, int level,
+			       struct dma_pte *pte, unsigned long pfn,
+			       unsigned long start_pfn, unsigned long last_pfn)
+{
+	pfn = max(start_pfn, pfn);
+	pte = &pte[pfn_level_offset(pfn, level)];
+
+	do {
+		unsigned long level_pfn;
+		struct dma_pte *level_pte;
+
+		if (!dma_pte_present(pte) || dma_pte_superpage(pte))
+			goto next;
+
+		level_pfn = pfn & level_mask(level - 1);
+		level_pte = phys_to_virt(dma_pte_addr(pte));
+
+		if (level > 2)
+			dma_pte_free_level(domain, level - 1, level_pte,
+					   level_pfn, start_pfn, last_pfn);
+
+		/* If range covers entire pagetable, free it */
+		if (!(start_pfn > level_pfn ||
+		      last_pfn < level_pfn + level_size(level))) {
+			dma_clear_pte(pte);
+			domain_flush_cache(domain, pte, sizeof(*pte));
+			free_pgtable_page(level_pte);
+		}
+next:
+		pfn += level_size(level);
+	} while (!first_pte_in_page(++pte) && pfn <= last_pfn);
+}
+
 /* free page table pages. last level pte should already be cleared */
 static void dma_pte_free_pagetable(struct dmar_domain *domain,
 				   unsigned long start_pfn,
 				   unsigned long last_pfn)
 {
 	int addr_width = agaw_to_width(domain->agaw) - VTD_PAGE_SHIFT;
-	struct dma_pte *first_pte, *pte;
-	int total = agaw_to_level(domain->agaw);
-	int level;
-	unsigned long tmp;
-	int large_page = 2;
 
 	BUG_ON(addr_width < BITS_PER_LONG && start_pfn >> addr_width);
 	BUG_ON(addr_width < BITS_PER_LONG && last_pfn >> addr_width);
 	BUG_ON(start_pfn > last_pfn);
 
 	/* We don't need lock here; nobody else touches the iova range */
-	level = 2;
-	while (level <= total) {
-		tmp = align_to_level(start_pfn, level);
-
-		/* If we can't even clear one PTE at this level, we're done */
-		if (tmp + level_size(level) - 1 > last_pfn)
-			return;
-
-		do {
-			large_page = level;
-			first_pte = pte = dma_pfn_level_pte(domain, tmp, level, &large_page);
-			if (large_page > level)
-				level = large_page + 1;
-			if (!pte) {
-				tmp = align_to_level(tmp + 1, level + 1);
-				continue;
-			}
-			do {
-				if (dma_pte_present(pte)) {
-					free_pgtable_page(phys_to_virt(dma_pte_addr(pte)));
-					dma_clear_pte(pte);
-				}
-				pte++;
-				tmp += level_size(level);
-			} while (!first_pte_in_page(pte) &&
-				 tmp + level_size(level) - 1 <= last_pfn);
-
-			domain_flush_cache(domain, first_pte,
-					   (void *)pte - (void *)first_pte);
-			
-		} while (tmp && tmp + level_size(level) - 1 <= last_pfn);
-		level++;
-	}
+	dma_pte_free_level(domain, agaw_to_level(domain->agaw),
+			   domain->pgd, 0, start_pfn, last_pfn);
+
 	/* free pgd */
 	if (start_pfn == 0 && last_pfn == DOMAIN_MAX_PFN(domain->gaw)) {
 		free_pgtable_page(domain->pgd);



  parent reply	other threads:[~2013-09-25  1:24 UTC|newest]

Thread overview: 44+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-09-25  0:11 [ 00/40] 3.4.63-stable review Greg Kroah-Hartman
2013-09-25  0:11 ` [ 01/40] SCSI: sd: Fix potential out-of-bounds access Greg Kroah-Hartman
2013-09-25  0:11 ` [ 02/40] crypto: api - Fix race condition in larval lookup Greg Kroah-Hartman
2013-09-25  0:11 ` [ 03/40] powerpc: Handle unaligned ldbrx/stdbrx Greg Kroah-Hartman
2013-09-25  0:11 ` [ 04/40] xen-gnt: prevent adding duplicate gnt callbacks Greg Kroah-Hartman
2013-09-25  0:11 ` [ 05/40] ARM: PCI: versatile: Fix SMAP register offsets Greg Kroah-Hartman
2013-09-25  0:11 ` [ 06/40] xhci-plat: Dont enable legacy PCI interrupts Greg Kroah-Hartman
2013-09-25  0:11 ` [ 07/40] usb: xhci: Disable runtime PM suspend for quirky controllers Greg Kroah-Hartman
2013-09-25  0:11 ` [ 08/40] cifs: ensure that srv_mutex is held when dealing with ssocket pointer Greg Kroah-Hartman
2013-09-25  0:11 ` [ 09/40] staging: comedi: dt282x: dt282x_ai_insn_read() always fails Greg Kroah-Hartman
2013-09-25  0:11 ` [ 10/40] USB: mos7720: use GFP_ATOMIC under spinlock Greg Kroah-Hartman
2013-09-25  0:11 ` [ 11/40] USB: mos7720: fix big-endian control requests Greg Kroah-Hartman
2013-09-25  0:11 ` [ 12/40] usb: ehci-mxc: check for pdata before dereferencing Greg Kroah-Hartman
2013-09-25  0:11 ` [ 13/40] USB: cdc-wdm: fix race between interrupt handler and tasklet Greg Kroah-Hartman
2013-09-25  0:11 ` [ 14/40] usb: config->desc.bLength may not exceed amount of data returned by the device Greg Kroah-Hartman
2013-09-25  0:11 ` [ 15/40] rculist: list_first_or_null_rcu() should use list_entry_rcu() Greg Kroah-Hartman
2013-09-25  0:11 ` [ 16/40] ASoC: wm8960: Fix PLL register writes Greg Kroah-Hartman
2013-09-25  0:11 ` [ 17/40] ALSA: hda - Add Toshiba Satellite C870 to MSI blacklist Greg Kroah-Hartman
2013-09-25  0:11 ` [ 18/40] brcmsmac: Fix WARNING caused by lack of calls to dma_mapping_error() Greg Kroah-Hartman
2013-09-25  0:11 ` [ 19/40] ath9k: always clear ps filter bit on new assoc Greg Kroah-Hartman
2013-09-25  0:11 ` [ 20/40] ath9k: fix rx descriptor related race condition Greg Kroah-Hartman
2013-09-25  0:11 ` [ 21/40] ath9k: avoid accessing MRC registers on single-chain devices Greg Kroah-Hartman
2013-09-25  0:11 ` [ 22/40] HID: pantherlord: validate output report details Greg Kroah-Hartman
2013-09-25  0:11 ` [ 23/40] HID: Fix Speedlink VAD Cezanne support for some devices Greg Kroah-Hartman
2013-09-25  0:11 ` [ 24/40] HID: validate HID report id size Greg Kroah-Hartman
2013-09-25  0:11 ` [ 25/40] HID: ntrig: validate feature report details Greg Kroah-Hartman
2013-09-25  0:11 ` [ 26/40] HID: battery: dont do DMA from stack Greg Kroah-Hartman
2013-09-25  0:11 ` [ 27/40] HID: check for NULL field when setting values Greg Kroah-Hartman
2013-09-25  0:12 ` [ 28/40] HID: usbhid: quirk for N-Trig DuoSense Touch Screen Greg Kroah-Hartman
2013-09-25  0:12 ` [ 29/40] media: v4l2: added missing mutex.h include to v4l2-ctrls.h Greg Kroah-Hartman
2013-09-25  0:12 ` [ 30/40] MIPS: ath79: Fix ar933x watchdog clock Greg Kroah-Hartman
2013-09-25  0:12 ` Greg Kroah-Hartman [this message]
2013-09-25  0:12 ` [ 32/40] ocfs2: fix the end cluster offset of FIEMAP Greg Kroah-Hartman
2013-09-25  0:12 ` [ 33/40] memcg: fix multiple large threshold notifications Greg Kroah-Hartman
2013-09-25  0:12 ` [ 34/40] mm/huge_memory.c: fix potential NULL pointer dereference Greg Kroah-Hartman
2013-09-25  0:12 ` [ 35/40] isofs: Refuse RW mount of the filesystem instead of making it RO Greg Kroah-Hartman
2013-09-25  0:12 ` [ 36/40] drm/edid: add quirk for Medion MD30217PG Greg Kroah-Hartman
2013-09-25  0:12 ` [ 37/40] mmc: tmio_mmc_dma: fix PIO fallback on SDHI Greg Kroah-Hartman
2013-09-25  0:12 ` [ 38/40] of: Fix missing memory initialization on FDT unflattening Greg Kroah-Hartman
2013-09-25  0:12 ` [ 39/40] fuse: postpone end_page_writeback() in fuse_writepage_locked() Greg Kroah-Hartman
2013-09-25  0:12 ` [ 40/40] fuse: invalidate inode attributes on xattr modification Greg Kroah-Hartman
2013-09-25  4:35 ` [ 00/40] 3.4.63-stable review Guenter Roeck
2013-09-26  1:09   ` Greg Kroah-Hartman
2013-09-26  2:24 ` Shuah Khan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130925001045.327263910@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=alex.williamson@redhat.com \
    --cc=joro@8bytes.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mtosatti@redhat.com \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox