From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org,
	Alex Williamson <alex.williamson@redhat.com>,
	Marcelo Tosatti <mtosatti@redhat.com>,
	Joerg Roedel <joro@8bytes.org>
Subject: [ 31/40] intel-iommu: Fix leaks in pagetable freeing
Date: Tue, 24 Sep 2013 17:12:03 -0700
Message-ID: <20130925001045.327263910@linuxfoundation.org>
In-Reply-To: <20130925001041.939335518@linuxfoundation.org>

3.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Alex Williamson <alex.williamson@redhat.com>

commit 3269ee0bd6686baf86630300d528500ac5b516d7 upstream.

At best the current code only seems to free the leaf pagetables and
the root.  If you're unlucky enough to have a large gap (like any
QEMU guest with more than 3G of memory), only the first chunk of leaf
pagetables is freed (plus the root).  This is a massive memory leak.
This patch rewrites the pagetable freeing function to use a
recursive algorithm that not only frees all the pagetables but does
so without any apparent performance loss versus the current broken
version.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Joerg Roedel <joro@8bytes.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/iommu/intel-iommu.c |   74 +++++++++++++++++++++-----------------------
 1 file changed, 36 insertions(+), 38 deletions(-)

--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -886,56 +886,54 @@ static int dma_pte_clear_range(struct dm
 	return order;
 }
 
+static void dma_pte_free_level(struct dmar_domain *domain, int level,
+			       struct dma_pte *pte, unsigned long pfn,
+			       unsigned long start_pfn, unsigned long last_pfn)
+{
+	pfn = max(start_pfn, pfn);
+	pte = &pte[pfn_level_offset(pfn, level)];
+
+	do {
+		unsigned long level_pfn;
+		struct dma_pte *level_pte;
+
+		if (!dma_pte_present(pte) || dma_pte_superpage(pte))
+			goto next;
+
+		level_pfn = pfn & level_mask(level - 1);
+		level_pte = phys_to_virt(dma_pte_addr(pte));
+
+		if (level > 2)
+			dma_pte_free_level(domain, level - 1, level_pte,
+					   level_pfn, start_pfn, last_pfn);
+
+		/* If range covers entire pagetable, free it */
+		if (!(start_pfn > level_pfn ||
+		      last_pfn < level_pfn + level_size(level))) {
+			dma_clear_pte(pte);
+			domain_flush_cache(domain, pte, sizeof(*pte));
+			free_pgtable_page(level_pte);
+		}
+next:
+		pfn += level_size(level);
+	} while (!first_pte_in_page(++pte) && pfn <= last_pfn);
+}
+
 /* free page table pages. last level pte should already be cleared */
 static void dma_pte_free_pagetable(struct dmar_domain *domain,
 				   unsigned long start_pfn,
 				   unsigned long last_pfn)
 {
 	int addr_width = agaw_to_width(domain->agaw) - VTD_PAGE_SHIFT;
-	struct dma_pte *first_pte, *pte;
-	int total = agaw_to_level(domain->agaw);
-	int level;
-	unsigned long tmp;
-	int large_page = 2;
 
 	BUG_ON(addr_width < BITS_PER_LONG && start_pfn >> addr_width);
 	BUG_ON(addr_width < BITS_PER_LONG && last_pfn >> addr_width);
 	BUG_ON(start_pfn > last_pfn);
 
 	/* We don't need lock here; nobody else touches the iova range */
-	level = 2;
-	while (level <= total) {
-		tmp = align_to_level(start_pfn, level);
-
-		/* If we can't even clear one PTE at this level, we're done */
-		if (tmp + level_size(level) - 1 > last_pfn)
-			return;
-
-		do {
-			large_page = level;
-			first_pte = pte = dma_pfn_level_pte(domain, tmp, level, &large_page);
-			if (large_page > level)
-				level = large_page + 1;
-			if (!pte) {
-				tmp = align_to_level(tmp + 1, level + 1);
-				continue;
-			}
-			do {
-				if (dma_pte_present(pte)) {
-					free_pgtable_page(phys_to_virt(dma_pte_addr(pte)));
-					dma_clear_pte(pte);
-				}
-				pte++;
-				tmp += level_size(level);
-			} while (!first_pte_in_page(pte) &&
-				 tmp + level_size(level) - 1 <= last_pfn);
-
-			domain_flush_cache(domain, first_pte,
-					   (void *)pte - (void *)first_pte);
-			
-		} while (tmp && tmp + level_size(level) - 1 <= last_pfn);
-		level++;
-	}
+	dma_pte_free_level(domain, agaw_to_level(domain->agaw),
+			   domain->pgd, 0, start_pfn, last_pfn);
+
 	/* free pgd */
 	if (start_pfn == 0 && last_pfn == DOMAIN_MAX_PFN(domain->gaw)) {
 		free_pgtable_page(domain->pgd);

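The commit message replaces the iterative walk with a recursive one. As a rough user-space illustration of that idea (not the kernel code), the sketch below models a multi-level table under stated assumptions: `struct table` is a hypothetical stand-in for a 512-entry VT-d page-table page, `calloc()`/`free()` stand in for the kernel's page allocation and `free_pgtable_page()`, and `map_range()`, `free_level()`, `free_pagetable()` and the two counters are illustrative names, not kernel APIs. Like `dma_pte_free_level()`, `free_level()` recurses into present entries above level 2 and frees a child table only when [start, last] covers its whole span; like `dma_pte_free_pagetable()`, the root is freed only for the full range.

```c
/* Hypothetical user-space model; names are illustrative, not kernel APIs. */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#define LEVEL_STRIDE 9                  /* 9 index bits per level, as in VT-d */
#define ENTRIES (1UL << LEVEL_STRIDE)   /* 512 entries per table */

/* An entry is NULL (not present), a child table pointer (levels > 1),
 * or a non-NULL marker (level-1 leaf entries, never dereferenced). */
struct table {
	void *entry[ENTRIES];
};

static unsigned long tables_allocated, tables_freed;

/* Number of pfns covered by one entry of a table at `level` (level 1 = leaf). */
static unsigned long level_size(int level)
{
	return 1UL << ((level - 1) * LEVEL_STRIDE);
}

static struct table *alloc_table(void)
{
	struct table *t = calloc(1, sizeof(*t));

	if (!t) {
		perror("calloc");
		exit(EXIT_FAILURE);
	}
	tables_allocated++;
	return t;
}

static void release_table(struct table *t)
{
	free(t);
	tables_freed++;
}

/* Map pfns [first, last], allocating intermediate tables on demand. */
static void map_range(struct table *root, int levels,
		      unsigned long first, unsigned long last)
{
	for (unsigned long pfn = first; pfn <= last; pfn++) {
		struct table *t = root;

		for (int level = levels; level > 1; level--) {
			unsigned long idx = (pfn / level_size(level)) % ENTRIES;

			if (!t->entry[idx])
				t->entry[idx] = alloc_table();
			t = t->entry[idx];
		}
		t->entry[pfn % ENTRIES] = (void *)1;  /* leaf marker, not a table */
	}
}

/* Recursively free every table below `t` (a table at `level` whose first
 * pfn is `base`) whose whole span lies inside [start, last]. */
static void free_level(struct table *t, int level, unsigned long base,
		       unsigned long start, unsigned long last)
{
	unsigned long size = level_size(level);
	unsigned long idx = start > base ? (start - base) / size : 0;

	for (; idx < ENTRIES; idx++) {
		unsigned long entry_pfn = base + idx * size;
		struct table *child = t->entry[idx];

		if (entry_pfn > last)
			break;
		if (!child)
			continue;
		/* Children of a level-2 table are leaf tables holding no
		 * table pointers, so only recurse above level 2. */
		if (level > 2)
			free_level(child, level - 1, entry_pfn, start, last);
		/* Free the child only if the range covers all of its pfns. */
		if (start <= entry_pfn && last >= entry_pfn + size - 1) {
			t->entry[idx] = NULL;
			release_table(child);
		}
	}
}

/* Free tables covering [start, last]; the root goes only with the full range. */
static void free_pagetable(struct table *root, int levels,
			   unsigned long start, unsigned long last)
{
	unsigned long max_pfn = level_size(levels) * ENTRIES - 1;

	assert(start <= last && last <= max_pfn);
	free_level(root, levels, 0, start, last);
	if (start == 0 && last == max_pfn)
		release_table(root);
}
```

For a 3-level table with a gap between the mapped chunks (pfns 0-1023, then 1048576-1049087), six tables are allocated; freeing [0, 511] releases only the one leaf table that range fully covers, and a subsequent full-range call (last pfn 134217727) releases the remaining five, including the intermediate tables on both sides of the gap.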
Thread overview: 45+ messages
2013-09-25  0:11 [ 00/40] 3.4.63-stable review Greg Kroah-Hartman
2013-09-25  0:11 ` [ 01/40] SCSI: sd: Fix potential out-of-bounds access Greg Kroah-Hartman
2013-09-25  0:11 ` [ 02/40] crypto: api - Fix race condition in larval lookup Greg Kroah-Hartman
2013-09-25  0:11   ` Greg Kroah-Hartman
2013-09-25  0:11 ` [ 03/40] powerpc: Handle unaligned ldbrx/stdbrx Greg Kroah-Hartman
2013-09-25  0:11 ` [ 04/40] xen-gnt: prevent adding duplicate gnt callbacks Greg Kroah-Hartman
2013-09-25  0:11 ` [ 05/40] ARM: PCI: versatile: Fix SMAP register offsets Greg Kroah-Hartman
2013-09-25  0:11 ` [ 06/40] xhci-plat: Dont enable legacy PCI interrupts Greg Kroah-Hartman
2013-09-25  0:11 ` [ 07/40] usb: xhci: Disable runtime PM suspend for quirky controllers Greg Kroah-Hartman
2013-09-25  0:11 ` [ 08/40] cifs: ensure that srv_mutex is held when dealing with ssocket pointer Greg Kroah-Hartman
2013-09-25  0:11 ` [ 09/40] staging: comedi: dt282x: dt282x_ai_insn_read() always fails Greg Kroah-Hartman
2013-09-25  0:11 ` [ 10/40] USB: mos7720: use GFP_ATOMIC under spinlock Greg Kroah-Hartman
2013-09-25  0:11 ` [ 11/40] USB: mos7720: fix big-endian control requests Greg Kroah-Hartman
2013-09-25  0:11 ` [ 12/40] usb: ehci-mxc: check for pdata before dereferencing Greg Kroah-Hartman
2013-09-25  0:11 ` [ 13/40] USB: cdc-wdm: fix race between interrupt handler and tasklet Greg Kroah-Hartman
2013-09-25  0:11 ` [ 14/40] usb: config->desc.bLength may not exceed amount of data returned by the device Greg Kroah-Hartman
2013-09-25  0:11 ` [ 15/40] rculist: list_first_or_null_rcu() should use list_entry_rcu() Greg Kroah-Hartman
2013-09-25  0:11 ` [ 16/40] ASoC: wm8960: Fix PLL register writes Greg Kroah-Hartman
2013-09-25  0:11 ` [ 17/40] ALSA: hda - Add Toshiba Satellite C870 to MSI blacklist Greg Kroah-Hartman
2013-09-25  0:11 ` [ 18/40] brcmsmac: Fix WARNING caused by lack of calls to dma_mapping_error() Greg Kroah-Hartman
2013-09-25  0:11 ` [ 19/40] ath9k: always clear ps filter bit on new assoc Greg Kroah-Hartman
2013-09-25  0:11 ` [ 20/40] ath9k: fix rx descriptor related race condition Greg Kroah-Hartman
2013-09-25  0:11 ` [ 21/40] ath9k: avoid accessing MRC registers on single-chain devices Greg Kroah-Hartman
2013-09-25  0:11 ` [ 22/40] HID: pantherlord: validate output report details Greg Kroah-Hartman
2013-09-25  0:11 ` [ 23/40] HID: Fix Speedlink VAD Cezanne support for some devices Greg Kroah-Hartman
2013-09-25  0:11 ` [ 24/40] HID: validate HID report id size Greg Kroah-Hartman
2013-09-25  0:11 ` [ 25/40] HID: ntrig: validate feature report details Greg Kroah-Hartman
2013-09-25  0:11 ` [ 26/40] HID: battery: dont do DMA from stack Greg Kroah-Hartman
2013-09-25  0:11 ` [ 27/40] HID: check for NULL field when setting values Greg Kroah-Hartman
2013-09-25  0:12 ` [ 28/40] HID: usbhid: quirk for N-Trig DuoSense Touch Screen Greg Kroah-Hartman
2013-09-25  0:12 ` [ 29/40] media: v4l2: added missing mutex.h include to v4l2-ctrls.h Greg Kroah-Hartman
2013-09-25  0:12 ` [ 30/40] MIPS: ath79: Fix ar933x watchdog clock Greg Kroah-Hartman
2013-09-25  0:12 ` Greg Kroah-Hartman [this message]
2013-09-25  0:12 ` [ 32/40] ocfs2: fix the end cluster offset of FIEMAP Greg Kroah-Hartman
2013-09-25  0:12 ` [ 33/40] memcg: fix multiple large threshold notifications Greg Kroah-Hartman
2013-09-25  0:12 ` [ 34/40] mm/huge_memory.c: fix potential NULL pointer dereference Greg Kroah-Hartman
2013-09-25  0:12 ` [ 35/40] isofs: Refuse RW mount of the filesystem instead of making it RO Greg Kroah-Hartman
2013-09-25  0:12 ` [ 36/40] drm/edid: add quirk for Medion MD30217PG Greg Kroah-Hartman
2013-09-25  0:12 ` [ 37/40] mmc: tmio_mmc_dma: fix PIO fallback on SDHI Greg Kroah-Hartman
2013-09-25  0:12 ` [ 38/40] of: Fix missing memory initialization on FDT unflattening Greg Kroah-Hartman
2013-09-25  0:12 ` [ 39/40] fuse: postpone end_page_writeback() in fuse_writepage_locked() Greg Kroah-Hartman
2013-09-25  0:12 ` [ 40/40] fuse: invalidate inode attributes on xattr modification Greg Kroah-Hartman
2013-09-25  4:35 ` [ 00/40] 3.4.63-stable review Guenter Roeck
2013-09-26  1:09   ` Greg Kroah-Hartman
2013-09-26  2:24 ` Shuah Khan