linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
From: Nicholas Piggin <npiggin@gmail.com>
To: kvm-ppc@vger.kernel.org
Cc: Nicholas Piggin <npiggin@gmail.com>, linuxppc-dev@lists.ozlabs.org
Subject: [PATCH v2 5/5] KVM: PPC: Book3S HV: radix do not clear partition scoped page table when page fault races with other vCPUs.
Date: Mon, 16 Apr 2018 14:32:40 +1000	[thread overview]
Message-ID: <20180416043240.8796-6-npiggin@gmail.com> (raw)
In-Reply-To: <20180416043240.8796-1-npiggin@gmail.com>

When running a SMP radix guest, KVM can get into page fault / tlbie
storms -- hundreds of thousands to the same address from different
threads -- due to partition scoped page faults invalidating the
page table entry if it was found to be already set up by a racing
CPU.

What can happen is that guest threads can hit page faults for the
same addresses, this can happen when KSM or THP takes out a commonly
used page. gRA zero (the interrupt vectors and important kernel text)
was a common one. Multiple CPUs will page fault and contend on the
same lock, when one CPU sets up the page table and releases the lock,
the next will find the new entry and invalidate it before installing
its own, which causes other page faults which invalidate that entry,
etc.

The solution to this is to avoid invalidating the entry or flushing
TLBs in case of a race. The pte may still need bits updated, but
those are to add R/C or relax access restrictions so no flush is
required.

This solves the page fault / tlbie storms.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/kvm/book3s_64_mmu_radix.c | 52 ++++++++++++++++----------
 1 file changed, 33 insertions(+), 19 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c b/arch/powerpc/kvm/book3s_64_mmu_radix.c
index dab6b622011c..2d3af22f90dd 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_radix.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c
@@ -199,7 +199,6 @@ static int kvmppc_create_pte(struct kvm *kvm, pte_t pte, unsigned long gpa,
 	pud_t *pud, *new_pud = NULL;
 	pmd_t *pmd, *new_pmd = NULL;
 	pte_t *ptep, *new_ptep = NULL;
-	unsigned long old;
 	int ret;
 
 	/* Traverse the guest's 2nd-level tree, allocate new levels needed */
@@ -243,6 +242,7 @@ static int kvmppc_create_pte(struct kvm *kvm, pte_t pte, unsigned long gpa,
 	pmd = pmd_offset(pud, gpa);
 	if (pmd_is_leaf(*pmd)) {
 		unsigned long lgpa = gpa & PMD_MASK;
+		pte_t old_pte = *pmdp_ptep(pmd);
 
 		/*
 		 * If we raced with another CPU which has just put
@@ -252,18 +252,22 @@ static int kvmppc_create_pte(struct kvm *kvm, pte_t pte, unsigned long gpa,
 			ret = -EAGAIN;
 			goto out_unlock;
 		}
-		/* Valid 2MB page here already, remove it */
-		old = kvmppc_radix_update_pte(kvm, pmdp_ptep(pmd),
-					      ~0UL, 0, lgpa, PMD_SHIFT);
-		kvmppc_radix_tlbie_page(kvm, lgpa, PMD_SHIFT);
-		if (old & _PAGE_DIRTY) {
-			unsigned long gfn = lgpa >> PAGE_SHIFT;
-			struct kvm_memory_slot *memslot;
-			memslot = gfn_to_memslot(kvm, gfn);
-			if (memslot && memslot->dirty_bitmap)
-				kvmppc_update_dirty_map(memslot,
-							gfn, PMD_SIZE);
+
+		/* PTE was previously valid, so update it */
+		if (pte_val(old_pte) == pte_val(pte)) {
+			ret = -EAGAIN;
+			goto out_unlock;
 		}
+
+		/* Make sure we're weren't trying to take bits away */
+		WARN_ON_ONCE(pte_pfn(old_pte) != pte_pfn(pte));
+		WARN_ON_ONCE((pte_val(old_pte) & ~pte_val(pte)) &
+			(_PAGE_PRESENT | _PAGE_READ | _PAGE_WRITE));
+
+		kvmppc_radix_update_pte(kvm, pmdp_ptep(pmd),
+					0, pte_val(pte), lgpa, PMD_SHIFT);
+		ret = 0;
+		goto out_unlock;
 	} else if (level == 1 && !pmd_none(*pmd)) {
 		/*
 		 * There's a page table page here, but we wanted
@@ -274,6 +278,8 @@ static int kvmppc_create_pte(struct kvm *kvm, pte_t pte, unsigned long gpa,
 		goto out_unlock;
 	}
 	if (level == 0) {
+		pte_t old_pte;
+
 		if (pmd_none(*pmd)) {
 			if (!new_ptep)
 				goto out_unlock;
@@ -281,13 +287,21 @@ static int kvmppc_create_pte(struct kvm *kvm, pte_t pte, unsigned long gpa,
 			new_ptep = NULL;
 		}
 		ptep = pte_offset_kernel(pmd, gpa);
-		if (pte_present(*ptep)) {
-			/* PTE was previously valid, so invalidate it */
-			old = kvmppc_radix_update_pte(kvm, ptep, _PAGE_PRESENT,
-						      0, gpa, 0);
-			kvmppc_radix_tlbie_page(kvm, gpa, 0);
-			if (old & _PAGE_DIRTY)
-				mark_page_dirty(kvm, gpa >> PAGE_SHIFT);
+		old_pte = *ptep;
+		if (pte_present(old_pte)) {
+			/* PTE was previously valid, so update it */
+			if (pte_val(old_pte) == pte_val(pte)) {
+				ret = -EAGAIN;
+				goto out_unlock;
+			}
+
+			/* Make sure we're weren't trying to take bits away */
+			WARN_ON_ONCE(pte_pfn(old_pte) != pte_pfn(pte));
+			WARN_ON_ONCE((pte_val(old_pte) & ~pte_val(pte)) &
+				(_PAGE_PRESENT | _PAGE_READ | _PAGE_WRITE));
+
+			kvmppc_radix_update_pte(kvm, ptep, 0,
+						pte_val(pte), gpa, 0);
 		}
 		kvmppc_radix_set_pte_at(kvm, gpa, ptep, pte);
 	} else {
-- 
2.17.0

  parent reply	other threads:[~2018-04-16  4:33 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-04-16  4:32 [PATCH v2 0/5] KVM TLB flushing improvements (for radix) Nicholas Piggin
2018-04-16  4:32 ` [PATCH v2 1/5] KVM: PPC: Book3S HV: radix use correct tlbie sequence in kvmppc_radix_tlbie_page Nicholas Piggin
2018-04-16  4:32 ` [PATCH v2 2/5] powerpc/mm/radix: implement LPID based TLB flushes to be used by KVM Nicholas Piggin
2018-04-16  4:32 ` [PATCH v2 3/5] KVM: PPC: Book3S HV: radix use the Linux TLB flush function in kvmppc_radix_tlbie_page Nicholas Piggin
2018-04-16  4:32 ` [PATCH v2 4/5] KVM: PPC: Book3S HV: radix handle process scoped LPID flush in C, with relocation on Nicholas Piggin
2018-04-16  4:32 ` Nicholas Piggin [this message]
2018-04-17  0:17   ` [PATCH v2 5/5] KVM: PPC: Book3S HV: radix do not clear partition scoped page table when page fault races with other vCPUs Nicholas Piggin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180416043240.8796-6-npiggin@gmail.com \
    --to=npiggin@gmail.com \
    --cc=kvm-ppc@vger.kernel.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).