From: Borislav Petkov <bp-Gina5bIWoIWzQB+pC5nmwQ@public.gmane.org>
To: Alex Williamson
<alex.williamson-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Cc: kvm-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
stable-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org,
dwmw2-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org
Subject: Re: [PATCH] intel-iommu: Fix leaks in pagetable freeing
Date: Wed, 2 Oct 2013 10:44:31 +0200 [thread overview]
Message-ID: <20131002084431.GA20568@pd.tnic> (raw)
In-Reply-To: <20130615161614.2107.41044.stgit-xdHQ/5r00wBBDLzU/O5InQ@public.gmane.org>
On Sat, Jun 15, 2013 at 10:27:19AM -0600, Alex Williamson wrote:
> At best the current code only seems to free the leaf pagetables and
> the root. If you're unlucky enough to have a large gap (like any
> QEMU guest with more than 3G of memory), only the first chunk of leaf
> pagetables are freed (plus the root). This is a massive memory leak.
> This patch re-writes the pagetable freeing function to use a
> recursive algorithm and manages to not only free all the pagetables,
> but does it without any apparent performance loss versus the current
> broken version.
>
> Signed-off-by: Alex Williamson <alex.williamson-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> Cc: stable-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> ---
>
> Suggesting for stable, would like to see some soak time, but it's
> hard to imagine this being any worse than the current code.
Btw, I have a backport for the 3.0.x series which builds fine here, in
case you guys are interested :)
--
From: Alex Williamson <alex.williamson-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Date: Sat, 15 Jun 2013 10:27:19 -0600
Subject: [PATCH] intel-iommu: Fix leaks in pagetable freeing
upstream commit: 3269ee0bd6686baf86630300d528500ac5b516d7
At best the current code only seems to free the leaf pagetables and
the root. If you're unlucky enough to have a large gap (like any
QEMU guest with more than 3G of memory), only the first chunk of leaf
pagetables are freed (plus the root). This is a massive memory leak.
This patch re-writes the pagetable freeing function to use a
recursive algorithm and manages to not only free all the pagetables,
but does it without any apparent performance loss versus the current
broken version.
Signed-off-by: Alex Williamson <alex.williamson-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Reviewed-by: Marcelo Tosatti <mtosatti-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Signed-off-by: Joerg Roedel <joro-zLv9SwRftAIdnm+yROfE0A@public.gmane.org>
Signed-off-by: Borislav Petkov <bp-l3A5Bk7waGM@public.gmane.org>
---
drivers/pci/intel-iommu.c | 72 +++++++++++++++++++++++------------------------
1 file changed, 35 insertions(+), 37 deletions(-)
diff --git a/drivers/pci/intel-iommu.c b/drivers/pci/intel-iommu.c
index ae762ecc658b..68baf178cede 100644
--- a/drivers/pci/intel-iommu.c
+++ b/drivers/pci/intel-iommu.c
@@ -853,56 +853,54 @@ static int dma_pte_clear_range(struct dmar_domain *domain,
return order;
}
+static void dma_pte_free_level(struct dmar_domain *domain, int level,
+ struct dma_pte *pte, unsigned long pfn,
+ unsigned long start_pfn, unsigned long last_pfn)
+{
+ pfn = max(start_pfn, pfn);
+ pte = &pte[pfn_level_offset(pfn, level)];
+
+ do {
+ unsigned long level_pfn;
+ struct dma_pte *level_pte;
+
+ if (!dma_pte_present(pte) || dma_pte_superpage(pte))
+ goto next;
+
+ level_pfn = pfn & level_mask(level - 1);
+ level_pte = phys_to_virt(dma_pte_addr(pte));
+
+ if (level > 2)
+ dma_pte_free_level(domain, level - 1, level_pte,
+ level_pfn, start_pfn, last_pfn);
+
+ /* If range covers entire pagetable, free it */
+ if (!(start_pfn > level_pfn ||
+ last_pfn < level_pfn + level_size(level))) {
+ dma_clear_pte(pte);
+ domain_flush_cache(domain, pte, sizeof(*pte));
+ free_pgtable_page(level_pte);
+ }
+next:
+ pfn += level_size(level);
+ } while (!first_pte_in_page(++pte) && pfn <= last_pfn);
+}
+
/* free page table pages. last level pte should already be cleared */
static void dma_pte_free_pagetable(struct dmar_domain *domain,
unsigned long start_pfn,
unsigned long last_pfn)
{
int addr_width = agaw_to_width(domain->agaw) - VTD_PAGE_SHIFT;
- struct dma_pte *first_pte, *pte;
- int total = agaw_to_level(domain->agaw);
- int level;
- unsigned long tmp;
- int large_page = 2;
BUG_ON(addr_width < BITS_PER_LONG && start_pfn >> addr_width);
BUG_ON(addr_width < BITS_PER_LONG && last_pfn >> addr_width);
BUG_ON(start_pfn > last_pfn);
/* We don't need lock here; nobody else touches the iova range */
- level = 2;
- while (level <= total) {
- tmp = align_to_level(start_pfn, level);
-
- /* If we can't even clear one PTE at this level, we're done */
- if (tmp + level_size(level) - 1 > last_pfn)
- return;
-
- do {
- large_page = level;
- first_pte = pte = dma_pfn_level_pte(domain, tmp, level, &large_page);
- if (large_page > level)
- level = large_page + 1;
- if (!pte) {
- tmp = align_to_level(tmp + 1, level + 1);
- continue;
- }
- do {
- if (dma_pte_present(pte)) {
- free_pgtable_page(phys_to_virt(dma_pte_addr(pte)));
- dma_clear_pte(pte);
- }
- pte++;
- tmp += level_size(level);
- } while (!first_pte_in_page(pte) &&
- tmp + level_size(level) - 1 <= last_pfn);
+ dma_pte_free_level(domain, agaw_to_level(domain->agaw),
+ domain->pgd, 0, start_pfn, last_pfn);
- domain_flush_cache(domain, first_pte,
- (void *)pte - (void *)first_pte);
-
- } while (tmp && tmp + level_size(level) - 1 <= last_pfn);
- level++;
- }
/* free pgd */
if (start_pfn == 0 && last_pfn == DOMAIN_MAX_PFN(domain->gaw)) {
free_pgtable_page(domain->pgd);
--
1.8.4
--
Regards/Gruss,
Boris.
Sent from a fat crate under my desk. Formatting is fine.
--
WARNING: multiple messages have this Message-ID (diff)
From: Borislav Petkov <bp@alien8.de>
To: Alex Williamson <alex.williamson@redhat.com>
Cc: dwmw2@infradead.org, iommu@lists.linux-foundation.org,
ddutile@redhat.com, linux-kernel@vger.kernel.org,
kvm@vger.kernel.org, stable@vger.kernel.org
Subject: Re: [PATCH] intel-iommu: Fix leaks in pagetable freeing
Date: Wed, 2 Oct 2013 10:44:31 +0200 [thread overview]
Message-ID: <20131002084431.GA20568@pd.tnic> (raw)
In-Reply-To: <20130615161614.2107.41044.stgit@bling.home>
On Sat, Jun 15, 2013 at 10:27:19AM -0600, Alex Williamson wrote:
> At best the current code only seems to free the leaf pagetables and
> the root. If you're unlucky enough to have a large gap (like any
> QEMU guest with more than 3G of memory), only the first chunk of leaf
> pagetables are freed (plus the root). This is a massive memory leak.
> This patch re-writes the pagetable freeing function to use a
> recursive algorithm and manages to not only free all the pagetables,
> but does it without any apparent performance loss versus the current
> broken version.
>
> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> Cc: stable@vger.kernel.org
> ---
>
> Suggesting for stable, would like to see some soak time, but it's
> hard to imagine this being any worse than the current code.
Btw, I have a backport for the 3.0.x series which builds fine here, in
case you guys are interested :)
--
From: Alex Williamson <alex.williamson@redhat.com>
Date: Sat, 15 Jun 2013 10:27:19 -0600
Subject: [PATCH] intel-iommu: Fix leaks in pagetable freeing
upstream commit: 3269ee0bd6686baf86630300d528500ac5b516d7
At best the current code only seems to free the leaf pagetables and
the root. If you're unlucky enough to have a large gap (like any
QEMU guest with more than 3G of memory), only the first chunk of leaf
pagetables are freed (plus the root). This is a massive memory leak.
This patch re-writes the pagetable freeing function to use a
recursive algorithm and manages to not only free all the pagetables,
but does it without any apparent performance loss versus the current
broken version.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Joerg Roedel <joro@8bytes.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
---
drivers/pci/intel-iommu.c | 72 +++++++++++++++++++++++------------------------
1 file changed, 35 insertions(+), 37 deletions(-)
diff --git a/drivers/pci/intel-iommu.c b/drivers/pci/intel-iommu.c
index ae762ecc658b..68baf178cede 100644
--- a/drivers/pci/intel-iommu.c
+++ b/drivers/pci/intel-iommu.c
@@ -853,56 +853,54 @@ static int dma_pte_clear_range(struct dmar_domain *domain,
return order;
}
+static void dma_pte_free_level(struct dmar_domain *domain, int level,
+ struct dma_pte *pte, unsigned long pfn,
+ unsigned long start_pfn, unsigned long last_pfn)
+{
+ pfn = max(start_pfn, pfn);
+ pte = &pte[pfn_level_offset(pfn, level)];
+
+ do {
+ unsigned long level_pfn;
+ struct dma_pte *level_pte;
+
+ if (!dma_pte_present(pte) || dma_pte_superpage(pte))
+ goto next;
+
+ level_pfn = pfn & level_mask(level - 1);
+ level_pte = phys_to_virt(dma_pte_addr(pte));
+
+ if (level > 2)
+ dma_pte_free_level(domain, level - 1, level_pte,
+ level_pfn, start_pfn, last_pfn);
+
+ /* If range covers entire pagetable, free it */
+ if (!(start_pfn > level_pfn ||
+ last_pfn < level_pfn + level_size(level))) {
+ dma_clear_pte(pte);
+ domain_flush_cache(domain, pte, sizeof(*pte));
+ free_pgtable_page(level_pte);
+ }
+next:
+ pfn += level_size(level);
+ } while (!first_pte_in_page(++pte) && pfn <= last_pfn);
+}
+
/* free page table pages. last level pte should already be cleared */
static void dma_pte_free_pagetable(struct dmar_domain *domain,
unsigned long start_pfn,
unsigned long last_pfn)
{
int addr_width = agaw_to_width(domain->agaw) - VTD_PAGE_SHIFT;
- struct dma_pte *first_pte, *pte;
- int total = agaw_to_level(domain->agaw);
- int level;
- unsigned long tmp;
- int large_page = 2;
BUG_ON(addr_width < BITS_PER_LONG && start_pfn >> addr_width);
BUG_ON(addr_width < BITS_PER_LONG && last_pfn >> addr_width);
BUG_ON(start_pfn > last_pfn);
/* We don't need lock here; nobody else touches the iova range */
- level = 2;
- while (level <= total) {
- tmp = align_to_level(start_pfn, level);
-
- /* If we can't even clear one PTE at this level, we're done */
- if (tmp + level_size(level) - 1 > last_pfn)
- return;
-
- do {
- large_page = level;
- first_pte = pte = dma_pfn_level_pte(domain, tmp, level, &large_page);
- if (large_page > level)
- level = large_page + 1;
- if (!pte) {
- tmp = align_to_level(tmp + 1, level + 1);
- continue;
- }
- do {
- if (dma_pte_present(pte)) {
- free_pgtable_page(phys_to_virt(dma_pte_addr(pte)));
- dma_clear_pte(pte);
- }
- pte++;
- tmp += level_size(level);
- } while (!first_pte_in_page(pte) &&
- tmp + level_size(level) - 1 <= last_pfn);
+ dma_pte_free_level(domain, agaw_to_level(domain->agaw),
+ domain->pgd, 0, start_pfn, last_pfn);
- domain_flush_cache(domain, first_pte,
- (void *)pte - (void *)first_pte);
-
- } while (tmp && tmp + level_size(level) - 1 <= last_pfn);
- level++;
- }
/* free pgd */
if (start_pfn == 0 && last_pfn == DOMAIN_MAX_PFN(domain->gaw)) {
free_pgtable_page(domain->pgd);
--
1.8.4
--
Regards/Gruss,
Boris.
Sent from a fat crate under my desk. Formatting is fine.
--
next prev parent reply other threads:[~2013-10-02 8:44 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-06-15 16:27 [PATCH] intel-iommu: Fix leaks in pagetable freeing Alex Williamson
2013-06-15 16:27 ` Alex Williamson
[not found] ` <20130615161614.2107.41044.stgit-xdHQ/5r00wBBDLzU/O5InQ@public.gmane.org>
2013-07-24 15:25 ` Alex Williamson
2013-07-24 15:25 ` Alex Williamson
[not found] ` <1374679519.1675.1.camel-85EaTFmN5p//9pzu0YdTqQ@public.gmane.org>
2013-08-06 16:08 ` Marcelo Tosatti
2013-08-06 16:08 ` Marcelo Tosatti
2013-08-14 20:23 ` Joerg Roedel
2013-08-14 20:23 ` Joerg Roedel
2013-10-02 8:44 ` Borislav Petkov [this message]
2013-10-02 8:44 ` Borislav Petkov
[not found] ` <20131002084431.GA20568-fF5Pk5pvG8Y@public.gmane.org>
2013-10-05 23:41 ` Greg KH
2013-10-05 23:41 ` Greg KH
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20131002084431.GA20568@pd.tnic \
--to=bp-gina5biwoiwzqb+pc5nmwq@public.gmane.org \
--cc=alex.williamson-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org \
--cc=dwmw2-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org \
--cc=iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org \
--cc=kvm-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
--cc=linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
--cc=stable-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.