kvm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] KVM: PPC: Book3S HV: Fix refcounting of hugepages
@ 2012-05-08 10:24 Paul Mackerras
  2012-05-08 13:10 ` Alexander Graf
  0 siblings, 1 reply; 3+ messages in thread
From: Paul Mackerras @ 2012-05-08 10:24 UTC (permalink / raw)
  To: Alexander Graf, Avi Kivity; +Cc: kvm-ppc, kvm

From: David Gibson <david@gibson.dropbear.id.au>

The H_REGISTER_VPA hcall implementation in HV Power KVM needs to pin some
guest memory pages into host memory so that they can be safely accessed
from usermode.  It does this used get_user_pages_fast().  When the VPA is
unregistered, or the VCPUs are cleaned up, these pages are released using
put_page().

However, the get_user_pages() is invoked on the specific memory are of the
VPA which could lie within hugepages.  In case the pinned page is huge,
we explicitly find the head page of the compound page before calling
put_page() on it.

At least with the latest kernel, this is not correct.  put_page() already
handles finding the correct head page of a compound, and also deals with
various counts on the individual tail page which are important for
transparent huge pages.  We don't support transparent hugepages on Power,
but even so, bypassing this count maintenance can lead (when the VM ends)
to a hugepage being released back to the pool with a non-zero mapcount on
one of the tail pages.  This can then lead to a bad_page() when the page
is released from the hugepage pool.

This removes the explicit compound_head() call to correct this bug.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
---

This needs to go upstream into 3.4-rc since it is a regression fix.
The regression is that after using a Book3S HV guest backed by 16MB
large pages, if the large page pool is reduced in size the system will
hit a BUG_ON.  That didn't happen with 3.3.

 arch/powerpc/kvm/book3s_64_mmu_hv.c |   22 +++++++++++++---------
 arch/powerpc/kvm/book3s_hv.c        |    2 --
 3 files changed, 13 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index ddc485a..c3beaee 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -258,6 +258,8 @@ static long kvmppc_get_guest_page(struct kvm *kvm, unsigned long gfn,
 			    !(memslot->userspace_addr & (s - 1))) {
 				start &= ~(s - 1);
 				pgsize = s;
+				get_page(hpage);
+				put_page(page);
 				page = hpage;
 			}
 		}
@@ -281,11 +283,8 @@ static long kvmppc_get_guest_page(struct kvm *kvm, unsigned long gfn,
 	err = 0;
 
  out:
-	if (got) {
-		if (PageHuge(page))
-			page = compound_head(page);
+	if (got)
 		put_page(page);
-	}
 	return err;
 
  up_err:
@@ -678,8 +677,15 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
 		SetPageDirty(page);
 
  out_put:
-	if (page)
-		put_page(page);
+	if (page) {
+		/*
+		 * We drop pages[0] here, not page because page might
+		 * have been set to the head page of a compound, but
+		 * we have to drop the reference on the correct tail
+		 * page to match the get inside gup()
+		 */
+		put_page(pages[0]);
+	}
 	return ret;
 
  out_unlock:
@@ -979,6 +985,7 @@ void *kvmppc_pin_guest_page(struct kvm *kvm, unsigned long gpa,
 			pa = *physp;
 		}
 		page = pfn_to_page(pa >> PAGE_SHIFT);
+		get_page(page);
 	} else {
 		hva = gfn_to_hva_memslot(memslot, gfn);
 		npages = get_user_pages_fast(hva, 1, 1, pages);
@@ -991,8 +998,6 @@ void *kvmppc_pin_guest_page(struct kvm *kvm, unsigned long gpa,
 		page = compound_head(page);
 		psize <<= compound_order(page);
 	}
-	if (!kvm->arch.using_mmu_notifiers)
-		get_page(page);
 	offset = gpa & (psize - 1);
 	if (nb_ret)
 		*nb_ret = psize - offset;
@@ -1003,7 +1008,6 @@ void kvmppc_unpin_guest_page(struct kvm *kvm, void *va)
 {
 	struct page *page = virt_to_page(va);
 
-	page = compound_head(page);
 	put_page(page);
 }
 
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index d0bc79b..146c17c 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -1368,8 +1368,6 @@ static void unpin_slot(struct kvm *kvm, int slot_id)
 				continue;
 			pfn = physp[j] >> PAGE_SHIFT;
 			page = pfn_to_page(pfn);
-			if (PageHuge(page))
-				page = compound_head(page);
 			SetPageDirty(page);
 			put_page(page);
 		}
-- 
1.7.10

^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH] KVM: PPC: Book3S HV: Fix refcounting of hugepages
  2012-05-08 10:24 [PATCH] KVM: PPC: Book3S HV: Fix refcounting of hugepages Paul Mackerras
@ 2012-05-08 13:10 ` Alexander Graf
  2012-05-08 14:54   ` Avi Kivity
  0 siblings, 1 reply; 3+ messages in thread
From: Alexander Graf @ 2012-05-08 13:10 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: Avi Kivity, kvm-ppc, kvm


On 08.05.2012, at 12:24, Paul Mackerras wrote:

> From: David Gibson <david@gibson.dropbear.id.au>
> 
> The H_REGISTER_VPA hcall implementation in HV Power KVM needs to pin some
> guest memory pages into host memory so that they can be safely accessed
> from usermode.  It does this used get_user_pages_fast().  When the VPA is
> unregistered, or the VCPUs are cleaned up, these pages are released using
> put_page().
> 
> However, the get_user_pages() is invoked on the specific memory are of the
> VPA which could lie within hugepages.  In case the pinned page is huge,
> we explicitly find the head page of the compound page before calling
> put_page() on it.
> 
> At least with the latest kernel, this is not correct.  put_page() already
> handles finding the correct head page of a compound, and also deals with
> various counts on the individual tail page which are important for
> transparent huge pages.  We don't support transparent hugepages on Power,
> but even so, bypassing this count maintenance can lead (when the VM ends)
> to a hugepage being released back to the pool with a non-zero mapcount on
> one of the tail pages.  This can then lead to a bad_page() when the page
> is released from the hugepage pool.
> 
> This removes the explicit compound_head() call to correct this bug.
> 
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> Signed-off-by: Paul Mackerras <paulus@samba.org>

Acked-by: Alexander Graf <agraf@suse.de>

Avi, could you please make sure this makes the next 3.4-rc or -stable?


Thanks!

Alex

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH] KVM: PPC: Book3S HV: Fix refcounting of hugepages
  2012-05-08 13:10 ` Alexander Graf
@ 2012-05-08 14:54   ` Avi Kivity
  0 siblings, 0 replies; 3+ messages in thread
From: Avi Kivity @ 2012-05-08 14:54 UTC (permalink / raw)
  To: Alexander Graf; +Cc: Paul Mackerras, kvm-ppc, kvm

On 05/08/2012 04:10 PM, Alexander Graf wrote:
> On 08.05.2012, at 12:24, Paul Mackerras wrote:
>
> > From: David Gibson <david@gibson.dropbear.id.au>
> > 
> > The H_REGISTER_VPA hcall implementation in HV Power KVM needs to pin some
> > guest memory pages into host memory so that they can be safely accessed
> > from usermode.  It does this used get_user_pages_fast().  When the VPA is
> > unregistered, or the VCPUs are cleaned up, these pages are released using
> > put_page().
> > 
> > However, the get_user_pages() is invoked on the specific memory are of the
> > VPA which could lie within hugepages.  In case the pinned page is huge,
> > we explicitly find the head page of the compound page before calling
> > put_page() on it.
> > 
> > At least with the latest kernel, this is not correct.  put_page() already
> > handles finding the correct head page of a compound, and also deals with
> > various counts on the individual tail page which are important for
> > transparent huge pages.  We don't support transparent hugepages on Power,
> > but even so, bypassing this count maintenance can lead (when the VM ends)
> > to a hugepage being released back to the pool with a non-zero mapcount on
> > one of the tail pages.  This can then lead to a bad_page() when the page
> > is released from the hugepage pool.
> > 
> > This removes the explicit compound_head() call to correct this bug.
> > 
> > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > Signed-off-by: Paul Mackerras <paulus@samba.org>
>
> Acked-by: Alexander Graf <agraf@suse.de>
>
> Avi, could you please make sure this makes the next 3.4-rc or -stable?
>
>

Sure, applied to master, will push tomorrow.

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2012-05-08 14:54 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-05-08 10:24 [PATCH] KVM: PPC: Book3S HV: Fix refcounting of hugepages Paul Mackerras
2012-05-08 13:10 ` Alexander Graf
2012-05-08 14:54   ` Avi Kivity

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).