* [PATCH] intel-iommu: Fix leaks in pagetable freeing
From: Alex Williamson @ 2013-06-15 16:27 UTC
To: dwmw2@infradead.org
Cc: iommu@lists.linux-foundation.org, linux-kernel@vger.kernel.org,
	kvm@vger.kernel.org
At best the current code only seems to free the leaf pagetables and
the root. If you're unlucky enough to have a large gap (like any
QEMU guest with more than 3G of memory), only the first chunk of leaf
pagetables is freed (plus the root). This is a massive memory leak.

This patch re-writes the pagetable freeing function to use a
recursive algorithm; it not only frees all the pagetables, but does
so without any apparent performance loss versus the current broken
version.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Cc: stable@vger.kernel.org
---
Suggesting this for stable; I'd like to see some soak time, but it's
hard to imagine this being any worse than the current code.

This likely also affects device domains, but the current code does
ok at freeing individual leaf pagetables, and driver domains would
only get a full pruning if the driver or device is removed.
Some test programs:
https://github.com/awilliam/tests/blob/master/kvm-huge-guest-test.c
https://github.com/awilliam/tests/blob/master/vfio-huge-guest-test.c
Both of these simulate a large guest on a small host system. They
mmap 4G of memory and map it across a large address space just like
QEMU would (aside from re-using the same mmap across multiple IOVAs).
With the existing code, the vfio version (which isn't subject to a
KVM memory slot limit) will leak over 1G of pagetables per run.
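The mapping pattern boils down to roughly the sketch below (an
illustration, not the actual test program: the group number is
invented, the device in that group must already be bound to vfio-pci,
RLIMIT_MEMLOCK has to allow 4G of locked memory, and error handling is
pared down). It mmaps 4G once, maps 3G at IOVA 0 and the remaining 1G
above 4G, leaving the same kind of hole QEMU leaves for PCI:

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <linux/vfio.h>

#define GB (1UL << 30)

int main(void)
{
	int container = open("/dev/vfio/vfio", O_RDWR);
	int group = open("/dev/vfio/26", O_RDWR);	/* invented group number */
	struct vfio_iommu_type1_dma_map map = { .argsz = sizeof(map) };
	struct vfio_iommu_type1_dma_unmap unmap = { .argsz = sizeof(unmap) };
	void *mem;

	if (container < 0 || group < 0 ||
	    ioctl(group, VFIO_GROUP_SET_CONTAINER, &container) ||
	    ioctl(container, VFIO_SET_IOMMU, VFIO_TYPE1_IOMMU)) {
		perror("vfio setup");
		return 1;
	}

	mem = mmap(NULL, 4 * GB, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (mem == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	/* low 3G at IOVA 0, like guest RAM below the PCI hole */
	map.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE;
	map.vaddr = (uintptr_t)mem;
	map.iova = 0;
	map.size = 3 * GB;
	if (ioctl(container, VFIO_IOMMU_MAP_DMA, &map))
		perror("map low");

	/* remaining 1G above 4G, leaving a hole at [3G, 4G) */
	map.vaddr = (uintptr_t)mem + 3 * GB;
	map.iova = 4 * GB;
	map.size = 1 * GB;
	if (ioctl(container, VFIO_IOMMU_MAP_DMA, &map))
		perror("map high");

	/* unmap both ranges */
	unmap.iova = 0;
	unmap.size = 3 * GB;
	ioctl(container, VFIO_IOMMU_UNMAP_DMA, &unmap);
	unmap.iova = 4 * GB;
	unmap.size = 1 * GB;
	ioctl(container, VFIO_IOMMU_UNMAP_DMA, &unmap);

	return 0;	/* closing the container tears down the domain */
}

The unmap path only clears the leaf PTEs; the pagetable pages
themselves come back when the container is closed and the domain torn
down, which is the dma_pte_free_pagetable() walk over the whole range
that this patch fixes.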
drivers/iommu/intel-iommu.c | 72 +++++++++++++++++++++----------------------
1 file changed, 35 insertions(+), 37 deletions(-)
diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index eec0d3e..15e9b57 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -890,56 +890,54 @@ static int dma_pte_clear_range(struct dmar_domain *domain,
return order;
}
+static void dma_pte_free_level(struct dmar_domain *domain, int level,
+ struct dma_pte *pte, unsigned long pfn,
+ unsigned long start_pfn, unsigned long last_pfn)
+{
+ pfn = max(start_pfn, pfn);
+ pte = &pte[pfn_level_offset(pfn, level)];
+
+ do {
+ unsigned long level_pfn;
+ struct dma_pte *level_pte;
+
+ if (!dma_pte_present(pte) || dma_pte_superpage(pte))
+ goto next;
+
+ level_pfn = pfn & level_mask(level - 1);
+ level_pte = phys_to_virt(dma_pte_addr(pte));
+
+ if (level > 2)
+ dma_pte_free_level(domain, level - 1, level_pte,
+ level_pfn, start_pfn, last_pfn);
+
+ /* If range covers entire pagetable, free it */
+ if (!(start_pfn > level_pfn ||
+ last_pfn < level_pfn + level_size(level))) {
+ dma_clear_pte(pte);
+ domain_flush_cache(domain, pte, sizeof(*pte));
+ free_pgtable_page(level_pte);
+ }
+next:
+ pfn += level_size(level);
+ } while (!first_pte_in_page(++pte) && pfn <= last_pfn);
+}
+
/* free page table pages. last level pte should already be cleared */
static void dma_pte_free_pagetable(struct dmar_domain *domain,
unsigned long start_pfn,
unsigned long last_pfn)
{
int addr_width = agaw_to_width(domain->agaw) - VTD_PAGE_SHIFT;
- struct dma_pte *first_pte, *pte;
- int total = agaw_to_level(domain->agaw);
- int level;
- unsigned long tmp;
- int large_page = 2;
BUG_ON(addr_width < BITS_PER_LONG && start_pfn >> addr_width);
BUG_ON(addr_width < BITS_PER_LONG && last_pfn >> addr_width);
BUG_ON(start_pfn > last_pfn);
/* We don't need lock here; nobody else touches the iova range */
- level = 2;
- while (level <= total) {
- tmp = align_to_level(start_pfn, level);
-
- /* If we can't even clear one PTE at this level, we're done */
- if (tmp + level_size(level) - 1 > last_pfn)
- return;
-
- do {
- large_page = level;
- first_pte = pte = dma_pfn_level_pte(domain, tmp, level, &large_page);
- if (large_page > level)
- level = large_page + 1;
- if (!pte) {
- tmp = align_to_level(tmp + 1, level + 1);
- continue;
- }
- do {
- if (dma_pte_present(pte)) {
- free_pgtable_page(phys_to_virt(dma_pte_addr(pte)));
- dma_clear_pte(pte);
- }
- pte++;
- tmp += level_size(level);
- } while (!first_pte_in_page(pte) &&
- tmp + level_size(level) - 1 <= last_pfn);
+ dma_pte_free_level(domain, agaw_to_level(domain->agaw),
+ domain->pgd, 0, start_pfn, last_pfn);
- domain_flush_cache(domain, first_pte,
- (void *)pte - (void *)first_pte);
-
- } while (tmp && tmp + level_size(level) - 1 <= last_pfn);
- level++;
- }
/* free pgd */
if (start_pfn == 0 && last_pfn == DOMAIN_MAX_PFN(domain->gaw)) {
free_pgtable_page(domain->pgd);
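To make the recursion concrete, here is a toy userspace model of the
new walk (an illustration only: 8 entries per table instead of 512,
plain pointers instead of dma_pte entries, no superpages, and the
covers-whole-table test written as an inclusive comparison rather
than the patch's exact expression):

#include <stdio.h>
#include <stdlib.h>

#define PTES	8UL			/* entries per table page */
#define STRIDE	3			/* log2(PTES) */

struct table { struct table *child[PTES]; };

static unsigned long level_size(int level)	/* pfns per entry */
{
	return 1UL << ((level - 1) * STRIDE);
}

static unsigned long offset(unsigned long pfn, int level)
{
	return (pfn >> ((level - 1) * STRIDE)) & (PTES - 1);
}

static unsigned long freed;

static void free_level(struct table *tbl, int level, unsigned long pfn,
		       unsigned long start_pfn, unsigned long last_pfn)
{
	unsigned long idx;

	pfn = pfn > start_pfn ? pfn : start_pfn;	/* max(start_pfn, pfn) */
	for (idx = offset(pfn, level); idx < PTES && pfn <= last_pfn;
	     idx++, pfn = (pfn & ~(level_size(level) - 1)) + level_size(level)) {
		unsigned long level_pfn = pfn & ~(level_size(level) - 1);
		struct table *child = tbl->child[idx];

		if (!child)
			continue;
		if (level > 2)		/* recurse before freeing the child */
			free_level(child, level - 1, level_pfn,
				   start_pfn, last_pfn);
		/* free the child table only if the range covers all of it */
		if (start_pfn <= level_pfn &&
		    last_pfn >= level_pfn + level_size(level) - 1) {
			free(child);
			tbl->child[idx] = NULL;
			freed++;
		}
	}
}

static void map_pfn(struct table *root, unsigned long pfn)
{
	struct table *t = root;
	int level;

	for (level = 3; level > 1; level--) {	/* walk down, allocating */
		unsigned long idx = offset(pfn, level);

		if (!t->child[idx])
			t->child[idx] = calloc(1, sizeof(*t));
		t = t->child[idx];
	}
}

int main(void)
{
	struct table *root = calloc(1, sizeof(*root));
	unsigned long pfn;

	for (pfn = 0; pfn < 300; pfn++)		/* "guest RAM" below the hole */
		map_pfn(root, pfn);
	for (pfn = 400; pfn < 512; pfn++)	/* and above it */
		map_pfn(root, pfn);

	free_level(root, 3, 0, 0, 511);		/* free the whole range */
	free(root);				/* pgd, freed separately */
	printf("freed %lu table pages\n", freed);	/* 59 with these ranges */
	return 0;
}

The shape is what matters: descend through every populated entry
first, then free a child table only when the requested range covers
everything it maps. Tables on both sides of a hole are reached, which
is roughly what the old level-by-level loop failed to do.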
* Re: [PATCH] intel-iommu: Fix leaks in pagetable freeing
From: Alex Williamson @ 2013-07-24 15:25 UTC
To: dwmw2@infradead.org
Cc: iommu@lists.linux-foundation.org, linux-kernel@vger.kernel.org,
	kvm@vger.kernel.org
This is a pretty massive memory leak; anyone @Intel care? Thanks,
Alex
On Sat, 2013-06-15 at 10:27 -0600, Alex Williamson wrote:
> At best the current code only seems to free the leaf pagetables and
> the root. If you're unlucky enough to have a large gap (like any
> QEMU guest with more than 3G of memory), only the first chunk of leaf
> pagetables is freed (plus the root). This is a massive memory leak.
>
> This patch re-writes the pagetable freeing function to use a
> recursive algorithm; it not only frees all the pagetables, but does
> so without any apparent performance loss versus the current broken
> version.
>
> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> Cc: stable@vger.kernel.org
* Re: [PATCH] intel-iommu: Fix leaks in pagetable freeing
From: Marcelo Tosatti @ 2013-08-06 16:08 UTC
To: Alex Williamson
Cc: iommu@lists.linux-foundation.org, dwmw2@infradead.org,
	linux-kernel@vger.kernel.org, kvm@vger.kernel.org
On Wed, Jul 24, 2013 at 09:25:19AM -0600, Alex Williamson wrote:
>
> This is a pretty massive memory leak; anyone @Intel care? Thanks,
>
> Alex
>
> On Sat, 2013-06-15 at 10:27 -0600, Alex Williamson wrote:
> > At best the current code only seems to free the leaf pagetables and
> > the root. If you're unlucky enough to have a large gap (like any
> > QEMU guest with more than 3G of memory), only the first chunk of leaf
> > pagetables is freed (plus the root). This is a massive memory leak.
> >
> > This patch re-writes the pagetable freeing function to use a
> > recursive algorithm; it not only frees all the pagetables, but does
> > so without any apparent performance loss versus the current broken
> > version.
> >
> > Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> > Cc: stable@vger.kernel.org
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
* Re: [PATCH] intel-iommu: Fix leaks in pagetable freeing
From: Joerg Roedel @ 2013-08-14 20:23 UTC
To: Alex Williamson
Cc: iommu@lists.linux-foundation.org, dwmw2@infradead.org,
	linux-kernel@vger.kernel.org, kvm@vger.kernel.org
On Sat, Jun 15, 2013 at 10:27:19AM -0600, Alex Williamson wrote:
> At best the current code only seems to free the leaf pagetables and
> the root. If you're unlucky enough to have a large gap (like any
> QEMU guest with more than 3G of memory), only the first chunk of leaf
> pagetables is freed (plus the root). This is a massive memory leak.
>
> This patch re-writes the pagetable freeing function to use a
> recursive algorithm; it not only frees all the pagetables, but does
> so without any apparent performance loss versus the current broken
> version.
>
> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> Cc: stable@vger.kernel.org
Applied to iommu/fixes, thanks Alex. Will send this for v3.11 after a
couple of days in next.
* Re: [PATCH] intel-iommu: Fix leaks in pagetable freeing
From: Borislav Petkov @ 2013-10-02 8:44 UTC
To: Alex Williamson
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
	stable@vger.kernel.org, iommu@lists.linux-foundation.org,
	dwmw2@infradead.org
On Sat, Jun 15, 2013 at 10:27:19AM -0600, Alex Williamson wrote:
> At best the current code only seems to free the leaf pagetables and
> the root. If you're unlucky enough to have a large gap (like any
> QEMU guest with more than 3G of memory), only the first chunk of leaf
> pagetables is freed (plus the root). This is a massive memory leak.
>
> This patch re-writes the pagetable freeing function to use a
> recursive algorithm; it not only frees all the pagetables, but does
> so without any apparent performance loss versus the current broken
> version.
>
> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> Cc: stable@vger.kernel.org
> ---
>
> Suggesting this for stable; I'd like to see some soak time, but it's
> hard to imagine this being any worse than the current code.
Btw, I have a backport for the 3.0.x series which builds fine here, in
case you guys are interested :)
--
From: Alex Williamson <alex.williamson@redhat.com>
Date: Sat, 15 Jun 2013 10:27:19 -0600
Subject: [PATCH] intel-iommu: Fix leaks in pagetable freeing
upstream commit: 3269ee0bd6686baf86630300d528500ac5b516d7
At best the current code only seems to free the leaf pagetables and
the root. If you're unlucky enough to have a large gap (like any
QEMU guest with more than 3G of memory), only the first chunk of leaf
pagetables is freed (plus the root). This is a massive memory leak.

This patch re-writes the pagetable freeing function to use a
recursive algorithm; it not only frees all the pagetables, but does
so without any apparent performance loss versus the current broken
version.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Joerg Roedel <joro@8bytes.org>
Signed-off-by: Borislav Petkov <bp@alien8.de>
---
drivers/pci/intel-iommu.c | 72 +++++++++++++++++++++++------------------------
1 file changed, 35 insertions(+), 37 deletions(-)
diff --git a/drivers/pci/intel-iommu.c b/drivers/pci/intel-iommu.c
index ae762ecc658b..68baf178cede 100644
--- a/drivers/pci/intel-iommu.c
+++ b/drivers/pci/intel-iommu.c
@@ -853,56 +853,54 @@ static int dma_pte_clear_range(struct dmar_domain *domain,
return order;
}
+static void dma_pte_free_level(struct dmar_domain *domain, int level,
+ struct dma_pte *pte, unsigned long pfn,
+ unsigned long start_pfn, unsigned long last_pfn)
+{
+ pfn = max(start_pfn, pfn);
+ pte = &pte[pfn_level_offset(pfn, level)];
+
+ do {
+ unsigned long level_pfn;
+ struct dma_pte *level_pte;
+
+ if (!dma_pte_present(pte) || dma_pte_superpage(pte))
+ goto next;
+
+ level_pfn = pfn & level_mask(level - 1);
+ level_pte = phys_to_virt(dma_pte_addr(pte));
+
+ if (level > 2)
+ dma_pte_free_level(domain, level - 1, level_pte,
+ level_pfn, start_pfn, last_pfn);
+
+ /* If range covers entire pagetable, free it */
+ if (!(start_pfn > level_pfn ||
+ last_pfn < level_pfn + level_size(level))) {
+ dma_clear_pte(pte);
+ domain_flush_cache(domain, pte, sizeof(*pte));
+ free_pgtable_page(level_pte);
+ }
+next:
+ pfn += level_size(level);
+ } while (!first_pte_in_page(++pte) && pfn <= last_pfn);
+}
+
/* free page table pages. last level pte should already be cleared */
static void dma_pte_free_pagetable(struct dmar_domain *domain,
unsigned long start_pfn,
unsigned long last_pfn)
{
int addr_width = agaw_to_width(domain->agaw) - VTD_PAGE_SHIFT;
- struct dma_pte *first_pte, *pte;
- int total = agaw_to_level(domain->agaw);
- int level;
- unsigned long tmp;
- int large_page = 2;
BUG_ON(addr_width < BITS_PER_LONG && start_pfn >> addr_width);
BUG_ON(addr_width < BITS_PER_LONG && last_pfn >> addr_width);
BUG_ON(start_pfn > last_pfn);
/* We don't need lock here; nobody else touches the iova range */
- level = 2;
- while (level <= total) {
- tmp = align_to_level(start_pfn, level);
-
- /* If we can't even clear one PTE at this level, we're done */
- if (tmp + level_size(level) - 1 > last_pfn)
- return;
-
- do {
- large_page = level;
- first_pte = pte = dma_pfn_level_pte(domain, tmp, level, &large_page);
- if (large_page > level)
- level = large_page + 1;
- if (!pte) {
- tmp = align_to_level(tmp + 1, level + 1);
- continue;
- }
- do {
- if (dma_pte_present(pte)) {
- free_pgtable_page(phys_to_virt(dma_pte_addr(pte)));
- dma_clear_pte(pte);
- }
- pte++;
- tmp += level_size(level);
- } while (!first_pte_in_page(pte) &&
- tmp + level_size(level) - 1 <= last_pfn);
+ dma_pte_free_level(domain, agaw_to_level(domain->agaw),
+ domain->pgd, 0, start_pfn, last_pfn);
- domain_flush_cache(domain, first_pte,
- (void *)pte - (void *)first_pte);
-
- } while (tmp && tmp + level_size(level) - 1 <= last_pfn);
- level++;
- }
/* free pgd */
if (start_pfn == 0 && last_pfn == DOMAIN_MAX_PFN(domain->gaw)) {
free_pgtable_page(domain->pgd);
--
1.8.4
--
Regards/Gruss,
Boris.
Sent from a fat crate under my desk. Formatting is fine.
* Re: [PATCH] intel-iommu: Fix leaks in pagetable freeing
From: Greg KH @ 2013-10-05 23:41 UTC
To: Borislav Petkov
Cc: kvm@vger.kernel.org, iommu@lists.linux-foundation.org,
	stable@vger.kernel.org, linux-kernel@vger.kernel.org,
	dwmw2@infradead.org
On Wed, Oct 02, 2013 at 10:44:31AM +0200, Borislav Petkov wrote:
> On Sat, Jun 15, 2013 at 10:27:19AM -0600, Alex Williamson wrote:
> > At best the current code only seems to free the leaf pagetables and
> > the root. If you're unlucky enough to have a large gap (like any
> > QEMU guest with more than 3G of memory), only the first chunk of leaf
> > pagetables is freed (plus the root). This is a massive memory leak.
> >
> > This patch re-writes the pagetable freeing function to use a
> > recursive algorithm; it not only frees all the pagetables, but does
> > so without any apparent performance loss versus the current broken
> > version.
> >
> > Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> > Cc: stable@vger.kernel.org
> > ---
> >
> > Suggesting this for stable; I'd like to see some soak time, but it's
> > hard to imagine this being any worse than the current code.
>
> Btw, I have a backport for the 3.0.x series which builds fine here, in
> case you guys are interested :)
Thanks, now applied.
greg k-h