From: Alexander Graf <agraf@suse.de>
To: "kvm@vger.kernel.org mailing list" <kvm@vger.kernel.org>
Cc: kvm-ppc@vger.kernel.org, Gleb Natapov <gleb@redhat.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Paul Mackerras <paulus@samba.org>
Subject: [PULL 20/51] KVM: PPC: Book3S PR: Make HPT accesses and updates SMP-safe
Date: Thu, 31 Oct 2013 22:18:05 +0100
Message-ID: <1383254316-11243-21-git-send-email-agraf@suse.de>
In-Reply-To: <1383254316-11243-1-git-send-email-agraf@suse.de>

From: Paul Mackerras <paulus@samba.org>

This adds a per-VM mutex to provide mutual exclusion between vcpus
for accesses to and updates of the guest hashed page table (HPT).
This also makes the code use single-byte writes to the HPTE
when updating the reference (R) and change (C) bits.  The reason
for doing this, rather than writing back the whole HPTE, is that on
non-PAPR virtual machines, the guest OS might be writing to the HPTE
concurrently, and writing back the whole HPTE might conflict with
that.  Also, real hardware does single-byte writes to update R and C.
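
As a rough userspace illustration of the single-byte update (not the
kernel code itself), the sketch below treats the second HPTE doubleword
as eight big-endian bytes and rewrites only the byte that holds R or C.
The helper names load_be64() and hpte_update_rc() are hypothetical; the
bit values are assumed to match the kernel's HPTE_R_R and HPTE_R_C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed to match the kernel's definitions for the second doubleword. */
#define HPTE_R_R 0x0000000000000100ULL	/* reference bit */
#define HPTE_R_C 0x0000000000000080ULL	/* change bit */

/* Read eight bytes stored most-significant first (big-endian). */
static uint64_t load_be64(const uint8_t *p)
{
	uint64_t v = 0;

	for (size_t i = 0; i < 8; i++)
		v = (v << 8) | p[i];
	return v;
}

/*
 * Hypothetical helper: set R and/or C in the HPTE word at 'p', writing
 * only the byte that contains each bit.  Returns the number of bytes
 * written, so a bit that is already set costs no store at all.
 */
static int hpte_update_rc(uint8_t *p, int set_r, int set_c)
{
	uint64_t r;
	int writes = 0;

	assert(p != NULL);
	r = load_be64(p);

	if (set_r && !(r & HPTE_R_R)) {
		r |= HPTE_R_R;
		p[6] = (uint8_t)(r >> 8);	/* R lives in byte 6 */
		writes++;
	}
	if (set_c && !(r & HPTE_R_C)) {
		r |= HPTE_R_C;
		p[7] = (uint8_t)r;		/* C lives in byte 7 */
		writes++;
	}
	return writes;
}
```

Bytes 0 to 5 are never stored to, so a guest that rewrites the upper
part of the same HPTE at the same time does not have its update
overwritten by a stale copy.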

The new mutex is taken in kvmppc_mmu_book3s_64_xlate() when reading
the HPT and updating R and/or C, and in the PAPR HPT update hcalls
(H_ENTER, H_REMOVE, etc.).  Having the mutex means that we don't need
to use a hypervisor lock bit in the HPT update hcalls, and we don't
need to be careful about the order in which the bytes of the HPTE are
updated by those hcalls.
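
The locking shape used in the hcalls can be sketched in userspace as
follows.  This is only an approximation under stated assumptions: a
pthread mutex stands in for the kernel's struct mutex, and struct
vm_sketch, vm_init() and h_remove_sketch() are hypothetical names.  The
return codes are assumed to follow the usual PAPR values.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <string.h>

#define HPTE_V_VALID	0x1ULL
#define H_SUCCESS	0L
#define H_PARAMETER	(-4L)
#define H_NOT_FOUND	(-7L)
#define HPT_ENTRIES	16

/* Hypothetical stand-in for the per-VM state that owns the HPT. */
struct vm_sketch {
	pthread_mutex_t hpt_mutex;
	uint64_t hpt[HPT_ENTRIES][2];	/* two doublewords per HPTE */
};

static void vm_init(struct vm_sketch *vm)
{
	assert(vm != NULL);
	memset(vm->hpt, 0, sizeof(vm->hpt));
	pthread_mutex_init(&vm->hpt_mutex, NULL);
}

/*
 * Remove one HPTE.  The read, the validity check and the write-back all
 * happen under the mutex, and every path after the lock leaves through
 * the single 'done' label so the mutex is always released.
 */
static long h_remove_sketch(struct vm_sketch *vm, unsigned int idx,
			    uint64_t old[2])
{
	long ret = H_NOT_FOUND;

	if (idx >= HPT_ENTRIES)
		return H_PARAMETER;	/* nothing locked yet */

	pthread_mutex_lock(&vm->hpt_mutex);
	if (!(vm->hpt[idx][0] & HPTE_V_VALID))
		goto done;

	old[0] = vm->hpt[idx][0];
	old[1] = vm->hpt[idx][1];
	vm->hpt[idx][0] = 0;		/* invalidate the entry */
	ret = H_SUCCESS;

done:
	pthread_mutex_unlock(&vm->hpt_mutex);
	return ret;
}
```

Because the whole read-modify-write sequence is serialized, the sketch
needs neither a lock bit inside the entry nor any particular store
ordering, which mirrors the argument made above.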

The other change here is to make emulated TLB invalidations (tlbie)
effective across all vcpus.  To do this we call kvmppc_mmu_pte_vflush
for all vcpus in kvmppc_mmu_book3s_64_tlbie().
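
A simplified model of the cross-vcpu flush, assuming each vcpu keeps its
own small cache of shadow translations, might look like the sketch
below.  The types and the functions pte_vflush_sketch() and
tlbie_all_sketch() are hypothetical and only mimic the masked
virtual-page match; they are not the kernel's data structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_SHADOW 8

/* Hypothetical shadow translation cached by one vcpu. */
struct shadow_pte {
	uint64_t vpage;		/* virtual page number */
	int valid;
};

struct vcpu_sketch {
	struct shadow_pte cache[MAX_SHADOW];
};

/* Invalidate every cached entry of one vcpu that matches under 'mask'. */
static int pte_vflush_sketch(struct vcpu_sketch *v, uint64_t vpage,
			     uint64_t mask)
{
	int flushed = 0;

	assert(v != NULL);
	for (size_t i = 0; i < MAX_SHADOW; i++) {
		struct shadow_pte *pte = &v->cache[i];

		if (pte->valid && (pte->vpage & mask) == (vpage & mask)) {
			pte->valid = 0;
			flushed++;
		}
	}
	return flushed;
}

/*
 * Emulated tlbie: flush the page on every vcpu of the VM, not only on
 * the vcpu that executed the instruction.  Returns the total number of
 * entries invalidated.
 */
static int tlbie_all_sketch(struct vcpu_sketch *vcpus, size_t nr_vcpus,
			    uint64_t va, uint64_t mask)
{
	int total = 0;

	for (size_t i = 0; i < nr_vcpus; i++)
		total += pte_vflush_sketch(&vcpus[i], va >> 12, mask);
	return total;
}
```

Flushing only the issuing vcpu would leave stale shadow entries on the
others, which is why the loop visits every vcpu.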

For 32-bit, this makes the setting of the accessed and dirty bits use
single-byte writes, and makes tlbie invalidate shadow HPTEs for all
vcpus.

With this, PR KVM can successfully run SMP guests.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
---
 arch/powerpc/include/asm/kvm_host.h |  3 +++
 arch/powerpc/kvm/book3s_32_mmu.c    | 36 ++++++++++++++++++++++--------------
 arch/powerpc/kvm/book3s_64_mmu.c    | 33 +++++++++++++++++++++++----------
 arch/powerpc/kvm/book3s_pr.c        |  1 +
 arch/powerpc/kvm/book3s_pr_papr.c   | 33 +++++++++++++++++++++++----------
 5 files changed, 72 insertions(+), 34 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 3d8b8a8..0fe4872 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -254,6 +254,9 @@ struct kvm_arch {
 	struct kvmppc_vcore *vcores[KVM_MAX_VCORES];
 	int hpt_cma_alloc;
 #endif /* CONFIG_KVM_BOOK3S_64_HV */
+#ifdef CONFIG_KVM_BOOK3S_PR
+	struct mutex hpt_mutex;
+#endif
 #ifdef CONFIG_PPC_BOOK3S_64
 	struct list_head spapr_tce_tables;
 	struct list_head rtas_tokens;
diff --git a/arch/powerpc/kvm/book3s_32_mmu.c b/arch/powerpc/kvm/book3s_32_mmu.c
index af04553..856af98 100644
--- a/arch/powerpc/kvm/book3s_32_mmu.c
+++ b/arch/powerpc/kvm/book3s_32_mmu.c
@@ -271,19 +271,22 @@ static int kvmppc_mmu_book3s_32_xlate_pte(struct kvm_vcpu *vcpu, gva_t eaddr,
 	/* Update PTE C and A bits, so the guest's swapper knows we used the
 	   page */
 	if (found) {
-		u32 oldpte = pteg[i+1];
-
-		if (pte->may_read)
-			pteg[i+1] |= PTEG_FLAG_ACCESSED;
-		if (pte->may_write)
-			pteg[i+1] |= PTEG_FLAG_DIRTY;
-		else
-			dprintk_pte("KVM: Mapping read-only page!\n");
-
-		/* Write back into the PTEG */
-		if (pteg[i+1] != oldpte)
-			copy_to_user((void __user *)ptegp, pteg, sizeof(pteg));
-
+		u32 pte_r = pteg[i+1];
+		char __user *addr = (char __user *) &pteg[i+1];
+
+		/*
+		 * Use single-byte writes to update the HPTE, to
+		 * conform to what real hardware does.
+		 */
+		if (pte->may_read && !(pte_r & PTEG_FLAG_ACCESSED)) {
+			pte_r |= PTEG_FLAG_ACCESSED;
+			put_user(pte_r >> 8, addr + 2);
+		}
+		if (pte->may_write && !(pte_r & PTEG_FLAG_DIRTY)) {
+			/* XXX should only set this for stores */
+			pte_r |= PTEG_FLAG_DIRTY;
+			put_user(pte_r, addr + 3);
+		}
 		return 0;
 	}
 
@@ -348,7 +351,12 @@ static void kvmppc_mmu_book3s_32_mtsrin(struct kvm_vcpu *vcpu, u32 srnum,
 
 static void kvmppc_mmu_book3s_32_tlbie(struct kvm_vcpu *vcpu, ulong ea, bool large)
 {
-	kvmppc_mmu_pte_flush(vcpu, ea, 0x0FFFF000);
+	int i;
+	struct kvm_vcpu *v;
+
+	/* flush this VA on all cpus */
+	kvm_for_each_vcpu(i, v, vcpu->kvm)
+		kvmppc_mmu_pte_flush(v, ea, 0x0FFFF000);
 }
 
 static int kvmppc_mmu_book3s_32_esid_to_vsid(struct kvm_vcpu *vcpu, ulong esid,
diff --git a/arch/powerpc/kvm/book3s_64_mmu.c b/arch/powerpc/kvm/book3s_64_mmu.c
index 9e6e112..ad9ecfd 100644
--- a/arch/powerpc/kvm/book3s_64_mmu.c
+++ b/arch/powerpc/kvm/book3s_64_mmu.c
@@ -257,6 +257,8 @@ static int kvmppc_mmu_book3s_64_xlate(struct kvm_vcpu *vcpu, gva_t eaddr,
 
 	pgsize = slbe->large ? MMU_PAGE_16M : MMU_PAGE_4K;
 
+	mutex_lock(&vcpu->kvm->arch.hpt_mutex);
+
 do_second:
 	ptegp = kvmppc_mmu_book3s_64_get_pteg(vcpu_book3s, slbe, eaddr, second);
 	if (kvm_is_error_hva(ptegp))
@@ -332,30 +334,37 @@ do_second:
 
 	/* Update PTE R and C bits, so the guest's swapper knows we used the
 	 * page */
-	if (gpte->may_read) {
-		/* Set the accessed flag */
+	if (gpte->may_read && !(r & HPTE_R_R)) {
+		/*
+		 * Set the accessed flag.
+		 * We have to write this back with a single byte write
+		 * because another vcpu may be accessing this on
+		 * non-PAPR platforms such as mac99, and this is
+		 * what real hardware does.
+		 */
+		char __user *addr = (char __user *) &pteg[i+1];
 		r |= HPTE_R_R;
+		put_user(r >> 8, addr + 6);
 	}
-	if (data && gpte->may_write) {
+	if (data && gpte->may_write && !(r & HPTE_R_C)) {
 		/* Set the dirty flag -- XXX even if not writing */
+		/* Use a single byte write */
+		char __user *addr = (char __user *) &pteg[i+1];
 		r |= HPTE_R_C;
+		put_user(r, addr + 7);
 	}
 
-	/* Write back into the PTEG */
-	if (pteg[i+1] != r) {
-		pteg[i+1] = r;
-		copy_to_user((void __user *)ptegp, pteg, sizeof(pteg));
-	}
+	mutex_unlock(&vcpu->kvm->arch.hpt_mutex);
 
 	if (!gpte->may_read)
 		return -EPERM;
 	return 0;
 
 no_page_found:
+	mutex_unlock(&vcpu->kvm->arch.hpt_mutex);
 	return -ENOENT;
 
 no_seg_found:
-
 	dprintk("KVM MMU: Trigger segment fault\n");
 	return -EINVAL;
 }
@@ -520,6 +529,8 @@ static void kvmppc_mmu_book3s_64_tlbie(struct kvm_vcpu *vcpu, ulong va,
 				       bool large)
 {
 	u64 mask = 0xFFFFFFFFFULL;
+	long i;
+	struct kvm_vcpu *v;
 
 	dprintk("KVM MMU: tlbie(0x%lx)\n", va);
 
@@ -542,7 +553,9 @@ static void kvmppc_mmu_book3s_64_tlbie(struct kvm_vcpu *vcpu, ulong va,
 		if (large)
 			mask = 0xFFFFFF000ULL;
 	}
-	kvmppc_mmu_pte_vflush(vcpu, va >> 12, mask);
+	/* flush this VA on all vcpus */
+	kvm_for_each_vcpu(i, v, vcpu->kvm)
+		kvmppc_mmu_pte_vflush(v, va >> 12, mask);
 }
 
 #ifdef CONFIG_PPC_64K_PAGES
diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
index e9e8c74..4fa73c3 100644
--- a/arch/powerpc/kvm/book3s_pr.c
+++ b/arch/powerpc/kvm/book3s_pr.c
@@ -1422,6 +1422,7 @@ int kvmppc_core_init_vm(struct kvm *kvm)
 	INIT_LIST_HEAD(&kvm->arch.spapr_tce_tables);
 	INIT_LIST_HEAD(&kvm->arch.rtas_tokens);
 #endif
+	mutex_init(&kvm->arch.hpt_mutex);
 
 	if (firmware_has_feature(FW_FEATURE_SET_MODE)) {
 		spin_lock(&kvm_global_user_count_lock);
diff --git a/arch/powerpc/kvm/book3s_pr_papr.c b/arch/powerpc/kvm/book3s_pr_papr.c
index 38f1899..5efa97b 100644
--- a/arch/powerpc/kvm/book3s_pr_papr.c
+++ b/arch/powerpc/kvm/book3s_pr_papr.c
@@ -48,6 +48,7 @@ static int kvmppc_h_pr_enter(struct kvm_vcpu *vcpu)
 	pte_index &= ~7UL;
 	pteg_addr = get_pteg_addr(vcpu, pte_index);
 
+	mutex_lock(&vcpu->kvm->arch.hpt_mutex);
 	copy_from_user(pteg, (void __user *)pteg_addr, sizeof(pteg));
 	hpte = pteg;
 
@@ -74,6 +75,7 @@ static int kvmppc_h_pr_enter(struct kvm_vcpu *vcpu)
 	ret = H_SUCCESS;
 
  done:
+	mutex_unlock(&vcpu->kvm->arch.hpt_mutex);
 	kvmppc_set_gpr(vcpu, 3, ret);
 
 	return EMULATE_DONE;
@@ -86,26 +88,31 @@ static int kvmppc_h_pr_remove(struct kvm_vcpu *vcpu)
 	unsigned long avpn = kvmppc_get_gpr(vcpu, 6);
 	unsigned long v = 0, pteg, rb;
 	unsigned long pte[2];
+	long int ret;
 
 	pteg = get_pteg_addr(vcpu, pte_index);
+	mutex_lock(&vcpu->kvm->arch.hpt_mutex);
 	copy_from_user(pte, (void __user *)pteg, sizeof(pte));
 
+	ret = H_NOT_FOUND;
 	if ((pte[0] & HPTE_V_VALID) == 0 ||
 	    ((flags & H_AVPN) && (pte[0] & ~0x7fUL) != avpn) ||
-	    ((flags & H_ANDCOND) && (pte[0] & avpn) != 0)) {
-		kvmppc_set_gpr(vcpu, 3, H_NOT_FOUND);
-		return EMULATE_DONE;
-	}
+	    ((flags & H_ANDCOND) && (pte[0] & avpn) != 0))
+		goto done;
 
 	copy_to_user((void __user *)pteg, &v, sizeof(v));
 
 	rb = compute_tlbie_rb(pte[0], pte[1], pte_index);
 	vcpu->arch.mmu.tlbie(vcpu, rb, rb & 1 ? true : false);
 
-	kvmppc_set_gpr(vcpu, 3, H_SUCCESS);
+	ret = H_SUCCESS;
 	kvmppc_set_gpr(vcpu, 4, pte[0]);
 	kvmppc_set_gpr(vcpu, 5, pte[1]);
 
+ done:
+	mutex_unlock(&vcpu->kvm->arch.hpt_mutex);
+	kvmppc_set_gpr(vcpu, 3, ret);
+
 	return EMULATE_DONE;
 }
 
@@ -133,6 +140,7 @@ static int kvmppc_h_pr_bulk_remove(struct kvm_vcpu *vcpu)
 	int paramnr = 4;
 	int ret = H_SUCCESS;
 
+	mutex_lock(&vcpu->kvm->arch.hpt_mutex);
 	for (i = 0; i < H_BULK_REMOVE_MAX_BATCH; i++) {
 		unsigned long tsh = kvmppc_get_gpr(vcpu, paramnr+(2*i));
 		unsigned long tsl = kvmppc_get_gpr(vcpu, paramnr+(2*i)+1);
@@ -181,6 +189,7 @@ static int kvmppc_h_pr_bulk_remove(struct kvm_vcpu *vcpu)
 		}
 		kvmppc_set_gpr(vcpu, paramnr+(2*i), tsh);
 	}
+	mutex_unlock(&vcpu->kvm->arch.hpt_mutex);
 	kvmppc_set_gpr(vcpu, 3, ret);
 
 	return EMULATE_DONE;
@@ -193,15 +202,16 @@ static int kvmppc_h_pr_protect(struct kvm_vcpu *vcpu)
 	unsigned long avpn = kvmppc_get_gpr(vcpu, 6);
 	unsigned long rb, pteg, r, v;
 	unsigned long pte[2];
+	long int ret;
 
 	pteg = get_pteg_addr(vcpu, pte_index);
+	mutex_lock(&vcpu->kvm->arch.hpt_mutex);
 	copy_from_user(pte, (void __user *)pteg, sizeof(pte));
 
+	ret = H_NOT_FOUND;
 	if ((pte[0] & HPTE_V_VALID) == 0 ||
-	    ((flags & H_AVPN) && (pte[0] & ~0x7fUL) != avpn)) {
-		kvmppc_set_gpr(vcpu, 3, H_NOT_FOUND);
-		return EMULATE_DONE;
-	}
+	    ((flags & H_AVPN) && (pte[0] & ~0x7fUL) != avpn))
+		goto done;
 
 	v = pte[0];
 	r = pte[1];
@@ -216,8 +226,11 @@ static int kvmppc_h_pr_protect(struct kvm_vcpu *vcpu)
 	rb = compute_tlbie_rb(v, r, pte_index);
 	vcpu->arch.mmu.tlbie(vcpu, rb, rb & 1 ? true : false);
 	copy_to_user((void __user *)pteg, pte, sizeof(pte));
+	ret = H_SUCCESS;
 
-	kvmppc_set_gpr(vcpu, 3, H_SUCCESS);
+ done:
+	mutex_unlock(&vcpu->kvm->arch.hpt_mutex);
+	kvmppc_set_gpr(vcpu, 3, ret);
 
 	return EMULATE_DONE;
 }
-- 
1.8.1.4
