* [PATCH 00/23] Allow PR and HV KVM to coexist in one kernel
From: Paul Mackerras @ 2013-08-06 4:12 UTC
To: Alexander Graf, Benjamin Herrenschmidt; +Cc: kvm-ppc, kvm
This series aims at making it possible to have one kernel image with
both PR and HV KVM code included, so that guests can be run under HV
KVM using hypervisor mode if available, or under PR KVM if hypervisor
mode is not available or the guest is not a PAPR guest.
One of the difficulties in doing this is that userspace may call the
KVM_PPC_GET_SMMU_INFO ioctl (and QEMU in fact does) before we have
enough information to decide whether the guest should use PR or HV
KVM. To overcome this, the series first enhances PR KVM to provide
the same set of MMU features as current real hardware, i.e. support
for 64kB pages and 1TB segments. Thus it's possible to construct a
result for KVM_PPC_GET_SMMU_INFO that is suitable for either PR or
HV KVM.
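For reference, this is the per-VM ioctl that QEMU issues; a minimal
sketch of the userspace side looks something like the following
(needs <linux/kvm.h> and <sys/ioctl.h>; the vmfd variable and the
lack of error handling are illustrative, not QEMU code):

	struct kvm_ppc_smmu_info info;

	/* vmfd is a VM file descriptor obtained from KVM_CREATE_VM */
	if (ioctl(vmfd, KVM_PPC_GET_SMMU_INFO, &info) == 0) {
		/*
		 * info.flags holds KVM_PPC_PAGE_SIZES_REAL and
		 * KVM_PPC_1T_SEGMENTS, info.slb_size gives the number of
		 * SLB entries, and info.sps[] lists the supported base
		 * page sizes with their actual-page-size encodings --
		 * this is what has to look the same whether the guest
		 * ends up on PR or HV KVM.
		 */
	}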
With this series, guests start out as PR guests. At the time when the
KVM_CAP_PPC_PAPR capability is enabled, the guest gets converted into
an HV guest if possible. This way, a non-PAPR guest will naturally run
using PR KVM.
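Concretely, the natural place for the switch is the KVM_ENABLE_CAP
handler for KVM_CAP_PPC_PAPR, which today just sets
vcpu->arch.papr_enabled. The following is only a sketch of that idea
with a made-up helper name, not code from the later patches in this
series:

	case KVM_CAP_PPC_PAPR:
		r = 0;
		vcpu->arch.papr_enabled = true;
		/*
		 * Sketch: if HV support is built in and the hardware has
		 * hypervisor mode, convert this VM from PR to HV now;
		 * otherwise it simply keeps running under PR.
		 * kvmppc_try_switch_to_hv() is a hypothetical name.
		 */
		if (cpu_has_feature(CPU_FTR_HVMODE))
			r = kvmppc_try_switch_to_hv(vcpu->kvm);
		break;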
The series also makes quite a lot of other improvements to PR KVM,
notably enabling it to run SMP guests and to work better with KSM.
For best results my patch "powerpc: Implement __get_user_pages_fast()"
is also needed, so that the generic KVM code can correctly detect when
pages for which we have only requested read access are actually
writable.
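That patch matters because the generic hva_to_pfn() code in
virt/kvm/kvm_main.c probes read-only faults with a fast write pin so
it can report pages that are already writable; paraphrasing the
generic logic of that era (not code from this series):

	struct page *page[1];

	/*
	 * Read fault, but the caller can accept a writable mapping: try a
	 * fast write pin. If it succeeds the page is already writable
	 * (so PR KVM may map it read-write), and if it fails -- e.g. for
	 * a read-only KSM-shared page -- we keep the read-only mapping.
	 * Without an arch __get_user_pages_fast() this always fails.
	 */
	if (__get_user_pages_fast(hva, 1, 1 /* write */, page) == 1)
		*writable = true;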
Paul.
* [PATCH 01/23] KVM: PPC: Book3S: Fix compile error in XICS emulation
From: Paul Mackerras @ 2013-08-06 4:13 UTC
To: Alexander Graf, Benjamin Herrenschmidt; +Cc: kvm-ppc, kvm
Commit 8e44ddc3f3 ("powerpc/kvm/book3s: Add support for H_IPOLL and
H_XIRR_X in XICS emulation") added a call to get_tb() but didn't
include the header that defines it, and on some configs this means
book3s_xics.c fails to compile:
arch/powerpc/kvm/book3s_xics.c: In function ‘kvmppc_xics_hcall’:
arch/powerpc/kvm/book3s_xics.c:812:3: error: implicit declaration of function ‘get_tb’ [-Werror=implicit-function-declaration]
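For context, the call added by that commit is in the H_XIRR_X
handling, roughly as follows (paraphrased from that commit, not part
of this patch):

	case H_XIRR_X:
		res = kvmppc_h_xirr(vcpu);
		kvmppc_set_gpr(vcpu, 4, res);
		kvmppc_set_gpr(vcpu, 5, get_tb());	/* needs <asm/time.h> */
		break;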
Cc: stable@vger.kernel.org [v3.10]
Signed-off-by: Paul Mackerras <paulus@samba.org>
---
arch/powerpc/kvm/book3s_xics.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/arch/powerpc/kvm/book3s_xics.c b/arch/powerpc/kvm/book3s_xics.c
index 94c1dd4..a3a5cb8 100644
--- a/arch/powerpc/kvm/book3s_xics.c
+++ b/arch/powerpc/kvm/book3s_xics.c
@@ -19,6 +19,7 @@
#include <asm/hvcall.h>
#include <asm/xics.h>
#include <asm/debug.h>
+#include <asm/time.h>
#include <linux/debugfs.h>
#include <linux/seq_file.h>
--
1.8.3.1
* [PATCH 02/23] KVM: PPC: Book3S PR: Don't corrupt guest state when kernel uses VMX
From: Paul Mackerras @ 2013-08-06 4:14 UTC
To: Alexander Graf, Benjamin Herrenschmidt; +Cc: kvm-ppc, kvm
Currently the code assumes that once we load up guest FP/VSX or VMX
state into the CPU, it stays valid in the CPU registers until we
explicitly flush it to the thread_struct. However, on POWER7,
copy_page() and memcpy() can use VMX. These functions do flush the
VMX state to the thread_struct before using VMX instructions, but if
this happens while we have guest state in the VMX registers, and we
then re-enter the guest, we don't reload the VMX state from the
thread_struct, so the guest's VMX state gets corrupted. This has been
observed to cause guest processes to segfault.
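To make the failure mode concrete, the losing sequence is roughly the
following (a sketch of the flow, not literal code;
enable_kernel_altivec() is how copy_page() and friends get at VMX on
POWER7):

	/*
	 * 1. kvmppc_handle_ext() loads the guest VMX registers and sets
	 *    MSR_VEC in current->thread.regs->msr.
	 * 2. The host kernel then runs copy_page()/memcpy(), which calls
	 *    enable_kernel_altivec() -> giveup_altivec(current): the guest
	 *    values are flushed to the thread_struct and MSR_VEC is
	 *    cleared in current->thread.regs->msr.
	 * 3. We re-enter the guest without reloading VMX from the
	 *    thread_struct, so the guest sees whatever VMX contents the
	 *    kernel left behind.
	 */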
To fix this, we check before re-entering the guest that all of the
bits corresponding to facilities owned by the guest, as expressed
in vcpu->arch.guest_owned_ext, are set in current->thread.regs->msr.
Any bits that have been cleared correspond to facilities that have
been used by kernel code and thus flushed to the thread_struct, so
for them we reload the state from the thread_struct.
We also need to check current->thread.regs->msr before calling
giveup_fpu() or giveup_altivec(), since if the relevant bit is
clear, the state has already been flushed to the thread_struct and
to flush it again would corrupt it.
Signed-off-by: Paul Mackerras <paulus@samba.org>
---
arch/powerpc/kvm/book3s_pr.c | 29 +++++++++++++++++++++++++----
1 file changed, 25 insertions(+), 4 deletions(-)
diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
index ddfaf56..adeab19 100644
--- a/arch/powerpc/kvm/book3s_pr.c
+++ b/arch/powerpc/kvm/book3s_pr.c
@@ -468,7 +468,8 @@ void kvmppc_giveup_ext(struct kvm_vcpu *vcpu, ulong msr)
* both the traditional FP registers and the added VSX
* registers into thread.fpr[].
*/
- giveup_fpu(current);
+ if (current->thread.regs->msr & MSR_FP)
+ giveup_fpu(current);
for (i = 0; i < ARRAY_SIZE(vcpu->arch.fpr); i++)
vcpu_fpr[i] = thread_fpr[get_fpr_index(i)];
@@ -483,7 +484,8 @@ void kvmppc_giveup_ext(struct kvm_vcpu *vcpu, ulong msr)
#ifdef CONFIG_ALTIVEC
if (msr & MSR_VEC) {
- giveup_altivec(current);
+ if (current->thread.regs->msr & MSR_VEC)
+ giveup_altivec(current);
memcpy(vcpu->arch.vr, t->vr, sizeof(vcpu->arch.vr));
vcpu->arch.vscr = t->vscr;
}
@@ -575,8 +577,6 @@ static int kvmppc_handle_ext(struct kvm_vcpu *vcpu, unsigned int exit_nr,
printk(KERN_INFO "Loading up ext 0x%lx\n", msr);
#endif
- current->thread.regs->msr |= msr;
-
if (msr & MSR_FP) {
for (i = 0; i < ARRAY_SIZE(vcpu->arch.fpr); i++)
thread_fpr[get_fpr_index(i)] = vcpu_fpr[i];
@@ -598,12 +598,32 @@ static int kvmppc_handle_ext(struct kvm_vcpu *vcpu, unsigned int exit_nr,
#endif
}
+ current->thread.regs->msr |= msr;
vcpu->arch.guest_owned_ext |= msr;
kvmppc_recalc_shadow_msr(vcpu);
return RESUME_GUEST;
}
+/*
+ * Kernel code using FP or VMX could have flushed guest state to
+ * the thread_struct; if so, get it back now.
+ */
+static void kvmppc_handle_lost_ext(struct kvm_vcpu *vcpu)
+{
+ unsigned long lost_ext;
+
+ lost_ext = vcpu->arch.guest_owned_ext & ~current->thread.regs->msr;
+ if (!lost_ext)
+ return;
+
+ if (lost_ext & MSR_FP)
+ kvmppc_load_up_fpu();
+ if (lost_ext & MSR_VEC)
+ kvmppc_load_up_altivec();
+ current->thread.regs->msr |= lost_ext;
+}
+
int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
unsigned int exit_nr)
{
@@ -892,6 +912,7 @@ program_interrupt:
} else {
kvmppc_fix_ee_before_entry();
}
+ kvmppc_handle_lost_ext(vcpu);
}
trace_kvm_book3s_reenter(r, vcpu);
--
1.8.3.1
* [PATCH 03/23] KVM: PPC: Book3S PR: Make instruction fetch fallback work for system calls
From: Paul Mackerras @ 2013-08-06 4:15 UTC
To: Alexander Graf, Benjamin Herrenschmidt; +Cc: kvm-ppc, kvm
It turns out that if we exit the guest due to an hcall instruction (sc 1),
and the loading of the instruction in the guest exit path fails for any
reason, the call to kvmppc_ld() in kvmppc_get_last_inst() fetches the
instruction after the hcall instruction rather than the hcall itself.
This in turn means that the instruction doesn't get recognized as an
hcall in kvmppc_handle_exit_pr() but gets passed to the guest kernel
as a sc instruction. That usually results in the guest kernel getting
a return code of 38 (ENOSYS) from an hcall, which often triggers a
BUG_ON() or other failure.
This fixes the problem by adding a new variant of kvmppc_get_last_inst()
called kvmppc_get_last_sc(), which fetches the instruction if necessary
from pc - 4 rather than pc.
Signed-off-by: Paul Mackerras <paulus@samba.org>
---
arch/powerpc/include/asm/kvm_book3s.h | 38 +++++++++++++++++++++++++++++++++++
arch/powerpc/kvm/book3s_pr.c | 2 +-
2 files changed, 39 insertions(+), 1 deletion(-)
diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h
index 08891d0..fa19e2f 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -334,6 +334,27 @@ static inline u32 kvmppc_get_last_inst(struct kvm_vcpu *vcpu)
return r;
}
+/*
+ * Like kvmppc_get_last_inst(), but for fetching a sc instruction.
+ * Because the sc instruction sets SRR0 to point to the following
+ * instruction, we have to fetch from pc - 4.
+ */
+static inline u32 kvmppc_get_last_sc(struct kvm_vcpu *vcpu)
+{
+ ulong pc = kvmppc_get_pc(vcpu) - 4;
+ struct kvmppc_book3s_shadow_vcpu *svcpu = svcpu_get(vcpu);
+ u32 r;
+
+ /* Load the instruction manually if it failed to do so in the
+ * exit path */
+ if (svcpu->last_inst == KVM_INST_FETCH_FAILED)
+ kvmppc_ld(vcpu, &pc, sizeof(u32), &svcpu->last_inst, false);
+
+ r = svcpu->last_inst;
+ svcpu_put(svcpu);
+ return r;
+}
+
static inline ulong kvmppc_get_fault_dar(struct kvm_vcpu *vcpu)
{
struct kvmppc_book3s_shadow_vcpu *svcpu = svcpu_get(vcpu);
@@ -446,6 +467,23 @@ static inline u32 kvmppc_get_last_inst(struct kvm_vcpu *vcpu)
return vcpu->arch.last_inst;
}
+/*
+ * Like kvmppc_get_last_inst(), but for fetching a sc instruction.
+ * Because the sc instruction sets SRR0 to point to the following
+ * instruction, we have to fetch from pc - 4.
+ */
+static inline u32 kvmppc_get_last_sc(struct kvm_vcpu *vcpu)
+{
+ ulong pc = kvmppc_get_pc(vcpu) - 4;
+
+ /* Load the instruction manually if it failed to do so in the
+ * exit path */
+ if (vcpu->arch.last_inst == KVM_INST_FETCH_FAILED)
+ kvmppc_ld(vcpu, &pc, sizeof(u32), &vcpu->arch.last_inst, false);
+
+ return vcpu->arch.last_inst;
+}
+
static inline ulong kvmppc_get_fault_dar(struct kvm_vcpu *vcpu)
{
return vcpu->arch.fault_dar;
diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
index adeab19..6cb29ef 100644
--- a/arch/powerpc/kvm/book3s_pr.c
+++ b/arch/powerpc/kvm/book3s_pr.c
@@ -792,7 +792,7 @@ program_interrupt:
}
case BOOK3S_INTERRUPT_SYSCALL:
if (vcpu->arch.papr_enabled &&
- (kvmppc_get_last_inst(vcpu) == 0x44000022) &&
+ (kvmppc_get_last_sc(vcpu) == 0x44000022) &&
!(vcpu->arch.shared->msr & MSR_PR)) {
/* SC 1 papr hypercalls */
ulong cmd = kvmppc_get_gpr(vcpu, 3);
--
1.8.3.1
* [PATCH 04/23] KVM: PPC: Book3S PR: Keep volatile reg values in vcpu rather than shadow_vcpu
From: Paul Mackerras @ 2013-08-06 4:16 UTC
To: Alexander Graf, Benjamin Herrenschmidt; +Cc: kvm-ppc, kvm
Currently PR-style KVM keeps the volatile guest register values
(R0 - R13, CR, LR, CTR, XER, PC) in a shadow_vcpu struct rather than
the main kvm_vcpu struct. For 64-bit, the shadow_vcpu exists in two
places, a kmalloc'd struct and in the PACA, and it gets copied back
and forth in kvmppc_core_vcpu_load/put(), because the real-mode code
can't rely on being able to access the kmalloc'd struct.
This changes the code to copy the volatile values into the shadow_vcpu
as one of the last things done before entering the guest. Similarly
the values are copied back out of the shadow_vcpu to the kvm_vcpu
immediately after exiting the guest. We arrange for interrupts to be
still disabled at this point so that we can't get preempted on 64-bit
and end up copying values from the wrong PACA.
This means that the accessor functions in kvm_book3s.h for these
registers are greatly simplified, and are the same between PR and HV KVM.
In places where accesses to shadow_vcpu fields are now replaced by
accesses to the kvm_vcpu, we can also remove the svcpu_get/put pairs.
Finally, on 64-bit, we don't need the kmalloc'd struct at all any more.
With this, the time to read the PVR one million times in a loop went
from 582.1ms to 584.3ms (averages of 10 values), a difference which is
not statistically significant given the variability of the results
(the standard deviations were 9.5ms and 8.6ms respectively). A version
of the patch that used loops to copy the GPR values increased that time
by around 5% to 611.2ms, so the loop has been unrolled.
Signed-off-by: Paul Mackerras <paulus@samba.org>
---
arch/powerpc/include/asm/kvm_book3s.h | 220 +++++-------------------------
arch/powerpc/include/asm/kvm_book3s_asm.h | 6 +-
arch/powerpc/include/asm/kvm_host.h | 1 +
arch/powerpc/kernel/asm-offsets.c | 4 +-
arch/powerpc/kvm/book3s_emulate.c | 8 +-
arch/powerpc/kvm/book3s_interrupts.S | 26 +++-
arch/powerpc/kvm/book3s_pr.c | 122 ++++++++++++-----
arch/powerpc/kvm/book3s_rmhandlers.S | 5 -
arch/powerpc/kvm/trace.h | 7 +-
9 files changed, 156 insertions(+), 243 deletions(-)
diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h
index fa19e2f..a8897c1 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -198,140 +198,76 @@ extern void kvm_return_point(void);
#include <asm/kvm_book3s_64.h>
#endif
-#ifdef CONFIG_KVM_BOOK3S_PR
-
-static inline unsigned long kvmppc_interrupt_offset(struct kvm_vcpu *vcpu)
-{
- return to_book3s(vcpu)->hior;
-}
-
-static inline void kvmppc_update_int_pending(struct kvm_vcpu *vcpu,
- unsigned long pending_now, unsigned long old_pending)
-{
- if (pending_now)
- vcpu->arch.shared->int_pending = 1;
- else if (old_pending)
- vcpu->arch.shared->int_pending = 0;
-}
-
static inline void kvmppc_set_gpr(struct kvm_vcpu *vcpu, int num, ulong val)
{
- if ( num < 14 ) {
- struct kvmppc_book3s_shadow_vcpu *svcpu = svcpu_get(vcpu);
- svcpu->gpr[num] = val;
- svcpu_put(svcpu);
- to_book3s(vcpu)->shadow_vcpu->gpr[num] = val;
- } else
- vcpu->arch.gpr[num] = val;
+ vcpu->arch.gpr[num] = val;
}
static inline ulong kvmppc_get_gpr(struct kvm_vcpu *vcpu, int num)
{
- if ( num < 14 ) {
- struct kvmppc_book3s_shadow_vcpu *svcpu = svcpu_get(vcpu);
- ulong r = svcpu->gpr[num];
- svcpu_put(svcpu);
- return r;
- } else
- return vcpu->arch.gpr[num];
+ return vcpu->arch.gpr[num];
}
static inline void kvmppc_set_cr(struct kvm_vcpu *vcpu, u32 val)
{
- struct kvmppc_book3s_shadow_vcpu *svcpu = svcpu_get(vcpu);
- svcpu->cr = val;
- svcpu_put(svcpu);
- to_book3s(vcpu)->shadow_vcpu->cr = val;
+ vcpu->arch.cr = val;
}
static inline u32 kvmppc_get_cr(struct kvm_vcpu *vcpu)
{
- struct kvmppc_book3s_shadow_vcpu *svcpu = svcpu_get(vcpu);
- u32 r;
- r = svcpu->cr;
- svcpu_put(svcpu);
- return r;
+ return vcpu->arch.cr;
}
static inline void kvmppc_set_xer(struct kvm_vcpu *vcpu, u32 val)
{
- struct kvmppc_book3s_shadow_vcpu *svcpu = svcpu_get(vcpu);
- svcpu->xer = val;
- to_book3s(vcpu)->shadow_vcpu->xer = val;
- svcpu_put(svcpu);
+ vcpu->arch.xer = val;
}
static inline u32 kvmppc_get_xer(struct kvm_vcpu *vcpu)
{
- struct kvmppc_book3s_shadow_vcpu *svcpu = svcpu_get(vcpu);
- u32 r;
- r = svcpu->xer;
- svcpu_put(svcpu);
- return r;
+ return vcpu->arch.xer;
}
static inline void kvmppc_set_ctr(struct kvm_vcpu *vcpu, ulong val)
{
- struct kvmppc_book3s_shadow_vcpu *svcpu = svcpu_get(vcpu);
- svcpu->ctr = val;
- svcpu_put(svcpu);
+ vcpu->arch.ctr = val;
}
static inline ulong kvmppc_get_ctr(struct kvm_vcpu *vcpu)
{
- struct kvmppc_book3s_shadow_vcpu *svcpu = svcpu_get(vcpu);
- ulong r;
- r = svcpu->ctr;
- svcpu_put(svcpu);
- return r;
+ return vcpu->arch.ctr;
}
static inline void kvmppc_set_lr(struct kvm_vcpu *vcpu, ulong val)
{
- struct kvmppc_book3s_shadow_vcpu *svcpu = svcpu_get(vcpu);
- svcpu->lr = val;
- svcpu_put(svcpu);
+ vcpu->arch.lr = val;
}
static inline ulong kvmppc_get_lr(struct kvm_vcpu *vcpu)
{
- struct kvmppc_book3s_shadow_vcpu *svcpu = svcpu_get(vcpu);
- ulong r;
- r = svcpu->lr;
- svcpu_put(svcpu);
- return r;
+ return vcpu->arch.lr;
}
static inline void kvmppc_set_pc(struct kvm_vcpu *vcpu, ulong val)
{
- struct kvmppc_book3s_shadow_vcpu *svcpu = svcpu_get(vcpu);
- svcpu->pc = val;
- svcpu_put(svcpu);
+ vcpu->arch.pc = val;
}
static inline ulong kvmppc_get_pc(struct kvm_vcpu *vcpu)
{
- struct kvmppc_book3s_shadow_vcpu *svcpu = svcpu_get(vcpu);
- ulong r;
- r = svcpu->pc;
- svcpu_put(svcpu);
- return r;
+ return vcpu->arch.pc;
}
static inline u32 kvmppc_get_last_inst(struct kvm_vcpu *vcpu)
{
ulong pc = kvmppc_get_pc(vcpu);
- struct kvmppc_book3s_shadow_vcpu *svcpu = svcpu_get(vcpu);
- u32 r;
/* Load the instruction manually if it failed to do so in the
* exit path */
- if (svcpu->last_inst == KVM_INST_FETCH_FAILED)
- kvmppc_ld(vcpu, &pc, sizeof(u32), &svcpu->last_inst, false);
+ if (vcpu->arch.last_inst == KVM_INST_FETCH_FAILED)
+ kvmppc_ld(vcpu, &pc, sizeof(u32), &vcpu->arch.last_inst, false);
- r = svcpu->last_inst;
- svcpu_put(svcpu);
- return r;
+ return vcpu->arch.last_inst;
}
/*
@@ -342,26 +278,34 @@ static inline u32 kvmppc_get_last_inst(struct kvm_vcpu *vcpu)
static inline u32 kvmppc_get_last_sc(struct kvm_vcpu *vcpu)
{
ulong pc = kvmppc_get_pc(vcpu) - 4;
- struct kvmppc_book3s_shadow_vcpu *svcpu = svcpu_get(vcpu);
- u32 r;
/* Load the instruction manually if it failed to do so in the
* exit path */
- if (svcpu->last_inst == KVM_INST_FETCH_FAILED)
- kvmppc_ld(vcpu, &pc, sizeof(u32), &svcpu->last_inst, false);
+ if (vcpu->arch.last_inst == KVM_INST_FETCH_FAILED)
+ kvmppc_ld(vcpu, &pc, sizeof(u32), &vcpu->arch.last_inst, false);
- r = svcpu->last_inst;
- svcpu_put(svcpu);
- return r;
+ return vcpu->arch.last_inst;
}
static inline ulong kvmppc_get_fault_dar(struct kvm_vcpu *vcpu)
{
- struct kvmppc_book3s_shadow_vcpu *svcpu = svcpu_get(vcpu);
- ulong r;
- r = svcpu->fault_dar;
- svcpu_put(svcpu);
- return r;
+ return vcpu->arch.fault_dar;
+}
+
+#ifdef CONFIG_KVM_BOOK3S_PR
+
+static inline unsigned long kvmppc_interrupt_offset(struct kvm_vcpu *vcpu)
+{
+ return to_book3s(vcpu)->hior;
+}
+
+static inline void kvmppc_update_int_pending(struct kvm_vcpu *vcpu,
+ unsigned long pending_now, unsigned long old_pending)
+{
+ if (pending_now)
+ vcpu->arch.shared->int_pending = 1;
+ else if (old_pending)
+ vcpu->arch.shared->int_pending = 0;
}
static inline bool kvmppc_critical_section(struct kvm_vcpu *vcpu)
@@ -395,100 +339,6 @@ static inline void kvmppc_update_int_pending(struct kvm_vcpu *vcpu,
{
}
-static inline void kvmppc_set_gpr(struct kvm_vcpu *vcpu, int num, ulong val)
-{
- vcpu->arch.gpr[num] = val;
-}
-
-static inline ulong kvmppc_get_gpr(struct kvm_vcpu *vcpu, int num)
-{
- return vcpu->arch.gpr[num];
-}
-
-static inline void kvmppc_set_cr(struct kvm_vcpu *vcpu, u32 val)
-{
- vcpu->arch.cr = val;
-}
-
-static inline u32 kvmppc_get_cr(struct kvm_vcpu *vcpu)
-{
- return vcpu->arch.cr;
-}
-
-static inline void kvmppc_set_xer(struct kvm_vcpu *vcpu, u32 val)
-{
- vcpu->arch.xer = val;
-}
-
-static inline u32 kvmppc_get_xer(struct kvm_vcpu *vcpu)
-{
- return vcpu->arch.xer;
-}
-
-static inline void kvmppc_set_ctr(struct kvm_vcpu *vcpu, ulong val)
-{
- vcpu->arch.ctr = val;
-}
-
-static inline ulong kvmppc_get_ctr(struct kvm_vcpu *vcpu)
-{
- return vcpu->arch.ctr;
-}
-
-static inline void kvmppc_set_lr(struct kvm_vcpu *vcpu, ulong val)
-{
- vcpu->arch.lr = val;
-}
-
-static inline ulong kvmppc_get_lr(struct kvm_vcpu *vcpu)
-{
- return vcpu->arch.lr;
-}
-
-static inline void kvmppc_set_pc(struct kvm_vcpu *vcpu, ulong val)
-{
- vcpu->arch.pc = val;
-}
-
-static inline ulong kvmppc_get_pc(struct kvm_vcpu *vcpu)
-{
- return vcpu->arch.pc;
-}
-
-static inline u32 kvmppc_get_last_inst(struct kvm_vcpu *vcpu)
-{
- ulong pc = kvmppc_get_pc(vcpu);
-
- /* Load the instruction manually if it failed to do so in the
- * exit path */
- if (vcpu->arch.last_inst == KVM_INST_FETCH_FAILED)
- kvmppc_ld(vcpu, &pc, sizeof(u32), &vcpu->arch.last_inst, false);
-
- return vcpu->arch.last_inst;
-}
-
-/*
- * Like kvmppc_get_last_inst(), but for fetching a sc instruction.
- * Because the sc instruction sets SRR0 to point to the following
- * instruction, we have to fetch from pc - 4.
- */
-static inline u32 kvmppc_get_last_sc(struct kvm_vcpu *vcpu)
-{
- ulong pc = kvmppc_get_pc(vcpu) - 4;
-
- /* Load the instruction manually if it failed to do so in the
- * exit path */
- if (vcpu->arch.last_inst == KVM_INST_FETCH_FAILED)
- kvmppc_ld(vcpu, &pc, sizeof(u32), &vcpu->arch.last_inst, false);
-
- return vcpu->arch.last_inst;
-}
-
-static inline ulong kvmppc_get_fault_dar(struct kvm_vcpu *vcpu)
-{
- return vcpu->arch.fault_dar;
-}
-
static inline bool kvmppc_critical_section(struct kvm_vcpu *vcpu)
{
return false;
diff --git a/arch/powerpc/include/asm/kvm_book3s_asm.h b/arch/powerpc/include/asm/kvm_book3s_asm.h
index 9039d3c..4141409 100644
--- a/arch/powerpc/include/asm/kvm_book3s_asm.h
+++ b/arch/powerpc/include/asm/kvm_book3s_asm.h
@@ -108,14 +108,14 @@ struct kvmppc_book3s_shadow_vcpu {
ulong gpr[14];
u32 cr;
u32 xer;
-
- u32 fault_dsisr;
- u32 last_inst;
ulong ctr;
ulong lr;
ulong pc;
+
ulong shadow_srr1;
ulong fault_dar;
+ u32 fault_dsisr;
+ u32 last_inst;
#ifdef CONFIG_PPC_BOOK3S_32
u32 sr[16]; /* Guest SRs */
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 3328353..7b26395 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -463,6 +463,7 @@ struct kvm_vcpu_arch {
u32 ctrl;
ulong dabr;
ulong cfar;
+ ulong shadow_srr1;
#endif
u32 vrsave; /* also USPRG0 */
u32 mmucr;
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index a67c76e..14a8004 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -515,18 +515,18 @@ int main(void)
DEFINE(VCPU_TRAP, offsetof(struct kvm_vcpu, arch.trap));
DEFINE(VCPU_PTID, offsetof(struct kvm_vcpu, arch.ptid));
DEFINE(VCPU_CFAR, offsetof(struct kvm_vcpu, arch.cfar));
+ DEFINE(VCPU_SHADOW_SRR1, offsetof(struct kvm_vcpu, arch.shadow_srr1));
DEFINE(VCORE_ENTRY_EXIT, offsetof(struct kvmppc_vcore, entry_exit_count));
DEFINE(VCORE_NAP_COUNT, offsetof(struct kvmppc_vcore, nap_count));
DEFINE(VCORE_IN_GUEST, offsetof(struct kvmppc_vcore, in_guest));
DEFINE(VCORE_NAPPING_THREADS, offsetof(struct kvmppc_vcore, napping_threads));
- DEFINE(VCPU_SVCPU, offsetof(struct kvmppc_vcpu_book3s, shadow_vcpu) -
- offsetof(struct kvmppc_vcpu_book3s, vcpu));
DEFINE(VCPU_SLB_E, offsetof(struct kvmppc_slb, orige));
DEFINE(VCPU_SLB_V, offsetof(struct kvmppc_slb, origv));
DEFINE(VCPU_SLB_SIZE, sizeof(struct kvmppc_slb));
#ifdef CONFIG_PPC_BOOK3S_64
#ifdef CONFIG_KVM_BOOK3S_PR
+ DEFINE(PACA_SVCPU, offsetof(struct paca_struct, shadow_vcpu));
# define SVCPU_FIELD(x, f) DEFINE(x, offsetof(struct paca_struct, shadow_vcpu.f))
#else
# define SVCPU_FIELD(x, f)
diff --git a/arch/powerpc/kvm/book3s_emulate.c b/arch/powerpc/kvm/book3s_emulate.c
index 360ce68..34044b1 100644
--- a/arch/powerpc/kvm/book3s_emulate.c
+++ b/arch/powerpc/kvm/book3s_emulate.c
@@ -267,12 +267,9 @@ int kvmppc_core_emulate_op(struct kvm_run *run, struct kvm_vcpu *vcpu,
r = kvmppc_st(vcpu, &addr, 32, zeros, true);
if ((r == -ENOENT) || (r == -EPERM)) {
- struct kvmppc_book3s_shadow_vcpu *svcpu;
-
- svcpu = svcpu_get(vcpu);
*advance = 0;
vcpu->arch.shared->dar = vaddr;
- svcpu->fault_dar = vaddr;
+ vcpu->arch.fault_dar = vaddr;
dsisr = DSISR_ISSTORE;
if (r == -ENOENT)
@@ -281,8 +278,7 @@ int kvmppc_core_emulate_op(struct kvm_run *run, struct kvm_vcpu *vcpu,
dsisr |= DSISR_PROTFAULT;
vcpu->arch.shared->dsisr = dsisr;
- svcpu->fault_dsisr = dsisr;
- svcpu_put(svcpu);
+ vcpu->arch.fault_dsisr = dsisr;
kvmppc_book3s_queue_irqprio(vcpu,
BOOK3S_INTERRUPT_DATA_STORAGE);
diff --git a/arch/powerpc/kvm/book3s_interrupts.S b/arch/powerpc/kvm/book3s_interrupts.S
index 17cfae5..c81a185 100644
--- a/arch/powerpc/kvm/book3s_interrupts.S
+++ b/arch/powerpc/kvm/book3s_interrupts.S
@@ -26,8 +26,12 @@
#if defined(CONFIG_PPC_BOOK3S_64)
#define FUNC(name) GLUE(.,name)
+#define GET_SHADOW_VCPU(reg) addi reg, r13, PACA_SVCPU
+
#elif defined(CONFIG_PPC_BOOK3S_32)
#define FUNC(name) name
+#define GET_SHADOW_VCPU(reg) lwz reg, (THREAD + THREAD_KVM_SVCPU)(r2)
+
#endif /* CONFIG_PPC_BOOK3S_XX */
#define VCPU_LOAD_NVGPRS(vcpu) \
@@ -87,8 +91,13 @@ kvm_start_entry:
VCPU_LOAD_NVGPRS(r4)
kvm_start_lightweight:
+ /* Copy registers into shadow vcpu so we can access them in real mode */
+ GET_SHADOW_VCPU(r3)
+ bl FUNC(kvmppc_copy_to_svcpu)
+ nop
#ifdef CONFIG_PPC_BOOK3S_64
+ /* Get the dcbz32 flag */
PPC_LL r3, VCPU_HFLAGS(r4)
rldicl r3, r3, 0, 63 /* r3 &= 1 */
stb r3, HSTATE_RESTORE_HID5(r13)
@@ -125,8 +134,17 @@ kvmppc_handler_highmem:
*
*/
- /* R7 = vcpu */
- PPC_LL r7, GPR4(r1)
+ /* Transfer reg values from shadow vcpu back to vcpu struct */
+ /* On 64-bit, interrupts are still off at this point */
+ PPC_LL r3, GPR4(r1) /* vcpu pointer */
+ GET_SHADOW_VCPU(r4)
+ bl FUNC(kvmppc_copy_from_svcpu)
+ nop
+
+ /* Re-enable interrupts */
+ mfmsr r3
+ ori r3, r3, MSR_EE
+ MTMSR_EERI(r3)
#ifdef CONFIG_PPC_BOOK3S_64
/*
@@ -135,8 +153,12 @@ kvmppc_handler_highmem:
*/
ld r3, PACA_SPRG3(r13)
mtspr SPRN_SPRG3, r3
+
#endif /* CONFIG_PPC_BOOK3S_64 */
+ /* R7 = vcpu */
+ PPC_LL r7, GPR4(r1)
+
PPC_STL r14, VCPU_GPR(R14)(r7)
PPC_STL r15, VCPU_GPR(R15)(r7)
PPC_STL r16, VCPU_GPR(R16)(r7)
diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
index 6cb29ef..28146c1 100644
--- a/arch/powerpc/kvm/book3s_pr.c
+++ b/arch/powerpc/kvm/book3s_pr.c
@@ -61,8 +61,6 @@ void kvmppc_core_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
#ifdef CONFIG_PPC_BOOK3S_64
struct kvmppc_book3s_shadow_vcpu *svcpu = svcpu_get(vcpu);
memcpy(svcpu->slb, to_book3s(vcpu)->slb_shadow, sizeof(svcpu->slb));
- memcpy(&get_paca()->shadow_vcpu, to_book3s(vcpu)->shadow_vcpu,
- sizeof(get_paca()->shadow_vcpu));
svcpu->slb_max = to_book3s(vcpu)->slb_shadow_max;
svcpu_put(svcpu);
#endif
@@ -77,8 +75,6 @@ void kvmppc_core_vcpu_put(struct kvm_vcpu *vcpu)
#ifdef CONFIG_PPC_BOOK3S_64
struct kvmppc_book3s_shadow_vcpu *svcpu = svcpu_get(vcpu);
memcpy(to_book3s(vcpu)->slb_shadow, svcpu->slb, sizeof(svcpu->slb));
- memcpy(to_book3s(vcpu)->shadow_vcpu, &get_paca()->shadow_vcpu,
- sizeof(get_paca()->shadow_vcpu));
to_book3s(vcpu)->slb_shadow_max = svcpu->slb_max;
svcpu_put(svcpu);
#endif
@@ -87,6 +83,60 @@ void kvmppc_core_vcpu_put(struct kvm_vcpu *vcpu)
vcpu->cpu = -1;
}
+/* Copy data needed by real-mode code from vcpu to shadow vcpu */
+void kvmppc_copy_to_svcpu(struct kvmppc_book3s_shadow_vcpu *svcpu,
+ struct kvm_vcpu *vcpu)
+{
+ svcpu->gpr[0] = vcpu->arch.gpr[0];
+ svcpu->gpr[1] = vcpu->arch.gpr[1];
+ svcpu->gpr[2] = vcpu->arch.gpr[2];
+ svcpu->gpr[3] = vcpu->arch.gpr[3];
+ svcpu->gpr[4] = vcpu->arch.gpr[4];
+ svcpu->gpr[5] = vcpu->arch.gpr[5];
+ svcpu->gpr[6] = vcpu->arch.gpr[6];
+ svcpu->gpr[7] = vcpu->arch.gpr[7];
+ svcpu->gpr[8] = vcpu->arch.gpr[8];
+ svcpu->gpr[9] = vcpu->arch.gpr[9];
+ svcpu->gpr[10] = vcpu->arch.gpr[10];
+ svcpu->gpr[11] = vcpu->arch.gpr[11];
+ svcpu->gpr[12] = vcpu->arch.gpr[12];
+ svcpu->gpr[13] = vcpu->arch.gpr[13];
+ svcpu->cr = vcpu->arch.cr;
+ svcpu->xer = vcpu->arch.xer;
+ svcpu->ctr = vcpu->arch.ctr;
+ svcpu->lr = vcpu->arch.lr;
+ svcpu->pc = vcpu->arch.pc;
+}
+
+/* Copy data touched by real-mode code from shadow vcpu back to vcpu */
+void kvmppc_copy_from_svcpu(struct kvm_vcpu *vcpu,
+ struct kvmppc_book3s_shadow_vcpu *svcpu)
+{
+ vcpu->arch.gpr[0] = svcpu->gpr[0];
+ vcpu->arch.gpr[1] = svcpu->gpr[1];
+ vcpu->arch.gpr[2] = svcpu->gpr[2];
+ vcpu->arch.gpr[3] = svcpu->gpr[3];
+ vcpu->arch.gpr[4] = svcpu->gpr[4];
+ vcpu->arch.gpr[5] = svcpu->gpr[5];
+ vcpu->arch.gpr[6] = svcpu->gpr[6];
+ vcpu->arch.gpr[7] = svcpu->gpr[7];
+ vcpu->arch.gpr[8] = svcpu->gpr[8];
+ vcpu->arch.gpr[9] = svcpu->gpr[9];
+ vcpu->arch.gpr[10] = svcpu->gpr[10];
+ vcpu->arch.gpr[11] = svcpu->gpr[11];
+ vcpu->arch.gpr[12] = svcpu->gpr[12];
+ vcpu->arch.gpr[13] = svcpu->gpr[13];
+ vcpu->arch.cr = svcpu->cr;
+ vcpu->arch.xer = svcpu->xer;
+ vcpu->arch.ctr = svcpu->ctr;
+ vcpu->arch.lr = svcpu->lr;
+ vcpu->arch.pc = svcpu->pc;
+ vcpu->arch.shadow_srr1 = svcpu->shadow_srr1;
+ vcpu->arch.fault_dar = svcpu->fault_dar;
+ vcpu->arch.fault_dsisr = svcpu->fault_dsisr;
+ vcpu->arch.last_inst = svcpu->last_inst;
+}
+
int kvmppc_core_check_requests(struct kvm_vcpu *vcpu)
{
int r = 1; /* Indicate we want to get back into the guest */
@@ -388,22 +438,18 @@ int kvmppc_handle_pagefault(struct kvm_run *run, struct kvm_vcpu *vcpu,
if (page_found == -ENOENT) {
/* Page not found in guest PTE entries */
- struct kvmppc_book3s_shadow_vcpu *svcpu = svcpu_get(vcpu);
vcpu->arch.shared->dar = kvmppc_get_fault_dar(vcpu);
- vcpu->arch.shared->dsisr = svcpu->fault_dsisr;
+ vcpu->arch.shared->dsisr = vcpu->arch.fault_dsisr;
vcpu->arch.shared->msr |=
- (svcpu->shadow_srr1 & 0x00000000f8000000ULL);
- svcpu_put(svcpu);
+ vcpu->arch.shadow_srr1 & 0x00000000f8000000ULL;
kvmppc_book3s_queue_irqprio(vcpu, vec);
} else if (page_found == -EPERM) {
/* Storage protection */
- struct kvmppc_book3s_shadow_vcpu *svcpu = svcpu_get(vcpu);
vcpu->arch.shared->dar = kvmppc_get_fault_dar(vcpu);
- vcpu->arch.shared->dsisr = svcpu->fault_dsisr & ~DSISR_NOHPTE;
+ vcpu->arch.shared->dsisr = vcpu->arch.fault_dsisr & ~DSISR_NOHPTE;
vcpu->arch.shared->dsisr |= DSISR_PROTFAULT;
vcpu->arch.shared->msr |=
- svcpu->shadow_srr1 & 0x00000000f8000000ULL;
- svcpu_put(svcpu);
+ vcpu->arch.shadow_srr1 & 0x00000000f8000000ULL;
kvmppc_book3s_queue_irqprio(vcpu, vec);
} else if (page_found == -EINVAL) {
/* Page not found in guest SLB */
@@ -643,21 +689,26 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
switch (exit_nr) {
case BOOK3S_INTERRUPT_INST_STORAGE:
{
- struct kvmppc_book3s_shadow_vcpu *svcpu = svcpu_get(vcpu);
- ulong shadow_srr1 = svcpu->shadow_srr1;
+ ulong shadow_srr1 = vcpu->arch.shadow_srr1;
vcpu->stat.pf_instruc++;
#ifdef CONFIG_PPC_BOOK3S_32
/* We set segments as unused segments when invalidating them. So
* treat the respective fault as segment fault. */
- if (svcpu->sr[kvmppc_get_pc(vcpu) >> SID_SHIFT] == SR_INVALID) {
- kvmppc_mmu_map_segment(vcpu, kvmppc_get_pc(vcpu));
- r = RESUME_GUEST;
+ {
+ struct kvmppc_book3s_shadow_vcpu *svcpu;
+ u32 sr;
+
+ svcpu = svcpu_get(vcpu);
+ sr = svcpu->sr[kvmppc_get_pc(vcpu) >> SID_SHIFT];
svcpu_put(svcpu);
- break;
+ if (sr == SR_INVALID) {
+ kvmppc_mmu_map_segment(vcpu, kvmppc_get_pc(vcpu));
+ r = RESUME_GUEST;
+ break;
+ }
}
#endif
- svcpu_put(svcpu);
/* only care about PTEG not found errors, but leave NX alone */
if (shadow_srr1 & 0x40000000) {
@@ -682,21 +733,26 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
case BOOK3S_INTERRUPT_DATA_STORAGE:
{
ulong dar = kvmppc_get_fault_dar(vcpu);
- struct kvmppc_book3s_shadow_vcpu *svcpu = svcpu_get(vcpu);
- u32 fault_dsisr = svcpu->fault_dsisr;
+ u32 fault_dsisr = vcpu->arch.fault_dsisr;
vcpu->stat.pf_storage++;
#ifdef CONFIG_PPC_BOOK3S_32
/* We set segments as unused segments when invalidating them. So
* treat the respective fault as segment fault. */
- if ((svcpu->sr[dar >> SID_SHIFT]) == SR_INVALID) {
- kvmppc_mmu_map_segment(vcpu, dar);
- r = RESUME_GUEST;
+ {
+ struct kvmppc_book3s_shadow_vcpu *svcpu;
+ u32 sr;
+
+ svcpu = svcpu_get(vcpu);
+ sr = svcpu->sr[dar >> SID_SHIFT];
svcpu_put(svcpu);
- break;
+ if (sr == SR_INVALID) {
+ kvmppc_mmu_map_segment(vcpu, dar);
+ r = RESUME_GUEST;
+ break;
+ }
}
#endif
- svcpu_put(svcpu);
/* The only case we need to handle is missing shadow PTEs */
if (fault_dsisr & DSISR_NOHPTE) {
@@ -743,13 +799,10 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
case BOOK3S_INTERRUPT_H_EMUL_ASSIST:
{
enum emulation_result er;
- struct kvmppc_book3s_shadow_vcpu *svcpu;
ulong flags;
program_interrupt:
- svcpu = svcpu_get(vcpu);
- flags = svcpu->shadow_srr1 & 0x1f0000ull;
- svcpu_put(svcpu);
+ flags = vcpu->arch.shadow_srr1 & 0x1f0000ull;
if (vcpu->arch.shared->msr & MSR_PR) {
#ifdef EXIT_DEBUG
@@ -881,9 +934,7 @@ program_interrupt:
break;
default:
{
- struct kvmppc_book3s_shadow_vcpu *svcpu = svcpu_get(vcpu);
- ulong shadow_srr1 = svcpu->shadow_srr1;
- svcpu_put(svcpu);
+ ulong shadow_srr1 = vcpu->arch.shadow_srr1;
/* Ugh - bork here! What did we get? */
printk(KERN_EMERG "exit_nr=0x%x | pc=0x%lx | msr=0x%lx\n",
exit_nr, kvmppc_get_pc(vcpu), shadow_srr1);
@@ -1058,11 +1109,12 @@ struct kvm_vcpu *kvmppc_core_vcpu_create(struct kvm *kvm, unsigned int id)
if (!vcpu_book3s)
goto out;
+#ifdef CONFIG_KVM_BOOK3S_32
vcpu_book3s->shadow_vcpu =
kzalloc(sizeof(*vcpu_book3s->shadow_vcpu), GFP_KERNEL);
if (!vcpu_book3s->shadow_vcpu)
goto free_vcpu;
-
+#endif
vcpu = &vcpu_book3s->vcpu;
err = kvm_vcpu_init(vcpu, kvm, id);
if (err)
@@ -1095,8 +1147,10 @@ struct kvm_vcpu *kvmppc_core_vcpu_create(struct kvm *kvm, unsigned int id)
uninit_vcpu:
kvm_vcpu_uninit(vcpu);
free_shadow_vcpu:
+#ifdef CONFIG_KVM_BOOK3S_32
kfree(vcpu_book3s->shadow_vcpu);
free_vcpu:
+#endif
vfree(vcpu_book3s);
out:
return ERR_PTR(err);
diff --git a/arch/powerpc/kvm/book3s_rmhandlers.S b/arch/powerpc/kvm/book3s_rmhandlers.S
index 8f7633e..b64d7f9 100644
--- a/arch/powerpc/kvm/book3s_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_rmhandlers.S
@@ -179,11 +179,6 @@ _GLOBAL(kvmppc_entry_trampoline)
li r6, MSR_IR | MSR_DR
andc r6, r5, r6 /* Clear DR and IR in MSR value */
- /*
- * Set EE in HOST_MSR so that it's enabled when we get into our
- * C exit handler function
- */
- ori r5, r5, MSR_EE
mtsrr0 r7
mtsrr1 r6
RFI
diff --git a/arch/powerpc/kvm/trace.h b/arch/powerpc/kvm/trace.h
index e326489..a088e9a 100644
--- a/arch/powerpc/kvm/trace.h
+++ b/arch/powerpc/kvm/trace.h
@@ -101,17 +101,12 @@ TRACE_EVENT(kvm_exit,
),
TP_fast_assign(
-#ifdef CONFIG_KVM_BOOK3S_PR
- struct kvmppc_book3s_shadow_vcpu *svcpu;
-#endif
__entry->exit_nr = exit_nr;
__entry->pc = kvmppc_get_pc(vcpu);
__entry->dar = kvmppc_get_fault_dar(vcpu);
__entry->msr = vcpu->arch.shared->msr;
#ifdef CONFIG_KVM_BOOK3S_PR
- svcpu = svcpu_get(vcpu);
- __entry->srr1 = svcpu->shadow_srr1;
- svcpu_put(svcpu);
+ __entry->srr1 = vcpu->arch.shadow_srr1;
#endif
__entry->last_inst = vcpu->arch.last_inst;
),
--
1.8.3.1
* [PATCH 05/23] KVM: PPC: Book3S PR: Rework kvmppc_mmu_book3s_64_xlate()
From: Paul Mackerras @ 2013-08-06 4:18 UTC
To: Alexander Graf, Benjamin Herrenschmidt; +Cc: kvm-ppc, kvm
This reworks kvmppc_mmu_book3s_64_xlate() to make it check the large
page bit in the hashed page table entries (HPTEs) it looks at, and
to simplify and streamline the code. The checking of the first dword
of each HPTE is now done with a single mask and compare operation,
and all the code dealing with the matching HPTE, if we find one,
is consolidated in one place in the main line of the function flow.
Signed-off-by: Paul Mackerras <paulus@samba.org>
---
arch/powerpc/kvm/book3s_64_mmu.c | 150 +++++++++++++++++++--------------------
1 file changed, 72 insertions(+), 78 deletions(-)
diff --git a/arch/powerpc/kvm/book3s_64_mmu.c b/arch/powerpc/kvm/book3s_64_mmu.c
index 739bfba..7e345e0 100644
--- a/arch/powerpc/kvm/book3s_64_mmu.c
+++ b/arch/powerpc/kvm/book3s_64_mmu.c
@@ -182,10 +182,13 @@ static int kvmppc_mmu_book3s_64_xlate(struct kvm_vcpu *vcpu, gva_t eaddr,
hva_t ptegp;
u64 pteg[16];
u64 avpn = 0;
+ u64 v, r;
+ u64 v_val, v_mask;
+ u64 eaddr_mask;
int i;
- u8 key = 0;
+ u8 pp, key = 0;
bool found = false;
- int second = 0;
+ bool second = false;
ulong mp_ea = vcpu->arch.magic_page_ea;
/* Magic page override */
@@ -208,8 +211,16 @@ static int kvmppc_mmu_book3s_64_xlate(struct kvm_vcpu *vcpu, gva_t eaddr,
goto no_seg_found;
avpn = kvmppc_mmu_book3s_64_get_avpn(slbe, eaddr);
+ v_val = avpn & HPTE_V_AVPN;
+
if (slbe->tb)
- avpn |= SLB_VSID_B_1T;
+ v_val |= SLB_VSID_B_1T;
+ if (slbe->large)
+ v_val |= HPTE_V_LARGE;
+ v_val |= HPTE_V_VALID;
+
+ v_mask = SLB_VSID_B | HPTE_V_AVPN | HPTE_V_LARGE | HPTE_V_VALID |
+ HPTE_V_SECONDARY;
do_second:
ptegp = kvmppc_mmu_book3s_64_get_pteg(vcpu_book3s, slbe, eaddr, second);
@@ -227,91 +238,74 @@ do_second:
key = 4;
for (i=0; i<16; i+=2) {
- u64 v = pteg[i];
- u64 r = pteg[i+1];
-
- /* Valid check */
- if (!(v & HPTE_V_VALID))
- continue;
- /* Hash check */
- if ((v & HPTE_V_SECONDARY) != second)
- continue;
-
- /* AVPN compare */
- if (HPTE_V_COMPARE(avpn, v)) {
- u8 pp = (r & HPTE_R_PP) | key;
- int eaddr_mask = 0xFFF;
-
- gpte->eaddr = eaddr;
- gpte->vpage = kvmppc_mmu_book3s_64_ea_to_vp(vcpu,
- eaddr,
- data);
- if (slbe->large)
- eaddr_mask = 0xFFFFFF;
- gpte->raddr = (r & HPTE_R_RPN) | (eaddr & eaddr_mask);
- gpte->may_execute = ((r & HPTE_R_N) ? false : true);
- gpte->may_read = false;
- gpte->may_write = false;
-
- switch (pp) {
- case 0:
- case 1:
- case 2:
- case 6:
- gpte->may_write = true;
- /* fall through */
- case 3:
- case 5:
- case 7:
- gpte->may_read = true;
- break;
- }
-
- dprintk("KVM MMU: Translated 0x%lx [0x%llx] -> 0x%llx "
- "-> 0x%lx\n",
- eaddr, avpn, gpte->vpage, gpte->raddr);
+ /* Check all relevant fields of 1st dword */
+ if ((pteg[i] & v_mask) == v_val) {
found = true;
break;
}
}
- /* Update PTE R and C bits, so the guest's swapper knows we used the
- * page */
- if (found) {
- u32 oldr = pteg[i+1];
+ if (!found) {
+ if (second)
+ goto no_page_found;
+ v_val |= HPTE_V_SECONDARY;
+ second = true;
+ goto do_second;
+ }
- if (gpte->may_read) {
- /* Set the accessed flag */
- pteg[i+1] |= HPTE_R_R;
- }
- if (gpte->may_write) {
- /* Set the dirty flag */
- pteg[i+1] |= HPTE_R_C;
- } else {
- dprintk("KVM: Mapping read-only page!\n");
- }
+ v = pteg[i];
+ r = pteg[i+1];
+ pp = (r & HPTE_R_PP) | key;
+ eaddr_mask = 0xFFF;
+
+ gpte->eaddr = eaddr;
+ gpte->vpage = kvmppc_mmu_book3s_64_ea_to_vp(vcpu, eaddr, data);
+ if (slbe->large)
+ eaddr_mask = 0xFFFFFF;
+ gpte->raddr = (r & HPTE_R_RPN & ~eaddr_mask) | (eaddr & eaddr_mask);
+ gpte->may_execute = ((r & HPTE_R_N) ? false : true);
+ gpte->may_read = false;
+ gpte->may_write = false;
+
+ switch (pp) {
+ case 0:
+ case 1:
+ case 2:
+ case 6:
+ gpte->may_write = true;
+ /* fall through */
+ case 3:
+ case 5:
+ case 7:
+ gpte->may_read = true;
+ break;
+ }
- /* Write back into the PTEG */
- if (pteg[i+1] != oldr)
- copy_to_user((void __user *)ptegp, pteg, sizeof(pteg));
+ dprintk("KVM MMU: Translated 0x%lx [0x%llx] -> 0x%llx "
+ "-> 0x%lx\n",
+ eaddr, avpn, gpte->vpage, gpte->raddr);
- if (!gpte->may_read)
- return -EPERM;
- return 0;
- } else {
- dprintk("KVM MMU: No PTE found (ea=0x%lx sdr1=0x%llx "
- "ptegp=0x%lx)\n",
- eaddr, to_book3s(vcpu)->sdr1, ptegp);
- for (i = 0; i < 16; i += 2)
- dprintk(" %02d: 0x%llx - 0x%llx (0x%llx)\n",
- i, pteg[i], pteg[i+1], avpn);
-
- if (!second) {
- second = HPTE_V_SECONDARY;
- goto do_second;
- }
+ /* Update PTE R and C bits, so the guest's swapper knows we used the
+ * page */
+ if (gpte->may_read) {
+ /* Set the accessed flag */
+ r |= HPTE_R_R;
+ }
+ if (data && gpte->may_write) {
+ /* Set the dirty flag -- XXX even if not writing */
+ r |= HPTE_R_C;
+ }
+
+ /* Write back into the PTEG */
+ if (pteg[i+1] != r) {
+ pteg[i+1] = r;
+ copy_to_user((void __user *)ptegp, pteg, sizeof(pteg));
}
+ if (!gpte->may_read)
+ return -EPERM;
+ return 0;
+
no_page_found:
return -ENOENT;
--
1.8.3.1
* [PATCH 06/23] KVM: PPC: Book3S PR: Allow guest to use 64k pages
From: Paul Mackerras @ 2013-08-06 4:18 UTC
To: Alexander Graf, Benjamin Herrenschmidt; +Cc: kvm-ppc, kvm
This adds the code to interpret 64k HPTEs in the guest hashed page
table (HPT) and 64k SLB entries, and to tell the guest about 64k
pages in kvm_vm_ioctl_get_smmu_info(). Guest 64k pages are still
shadowed by 4k pages.
This also adds another hash table to the four we have already in
book3s_mmu_hpte.c to allow us to find all the PTEs that we have
instantiated that match a given 64k guest page.
The tlbie instruction changed starting with POWER6 to use a bit in
the RB operand to indicate large page invalidations, and to use other
RB bits to indicate the base and actual page sizes and the segment
size. 64k pages came in slightly earlier, with POWER5++. At present
we use one bit in vcpu->arch.hflags to indicate that the emulated
cpu supports 64k pages and also has the new tlbie definition. If
we ever want to support emulation of POWER5++, we will need to use
another bit.
Signed-off-by: Paul Mackerras <paulus@samba.org>
---
arch/powerpc/include/asm/kvm_asm.h | 1 +
arch/powerpc/include/asm/kvm_book3s.h | 6 +++
arch/powerpc/include/asm/kvm_host.h | 4 ++
arch/powerpc/kvm/book3s_64_mmu.c | 92 +++++++++++++++++++++++++++++++----
arch/powerpc/kvm/book3s_mmu_hpte.c | 50 +++++++++++++++++++
arch/powerpc/kvm/book3s_pr.c | 30 +++++++++++-
6 files changed, 173 insertions(+), 10 deletions(-)
diff --git a/arch/powerpc/include/asm/kvm_asm.h b/arch/powerpc/include/asm/kvm_asm.h
index 851bac7..3d70b7e 100644
--- a/arch/powerpc/include/asm/kvm_asm.h
+++ b/arch/powerpc/include/asm/kvm_asm.h
@@ -123,6 +123,7 @@
#define BOOK3S_HFLAG_SLB 0x2
#define BOOK3S_HFLAG_PAIRED_SINGLE 0x4
#define BOOK3S_HFLAG_NATIVE_PS 0x8
+#define BOOK3S_HFLAG_MULTI_PGSIZE 0x10
#define RESUME_FLAG_NV (1<<0) /* Reload guest nonvolatile state? */
#define RESUME_FLAG_HOST (1<<1) /* Resume host? */
diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h
index a8897c1..175f876 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -58,6 +58,9 @@ struct hpte_cache {
struct hlist_node list_pte_long;
struct hlist_node list_vpte;
struct hlist_node list_vpte_long;
+#ifdef CONFIG_PPC_BOOK3S_64
+ struct hlist_node list_vpte_64k;
+#endif
struct rcu_head rcu_head;
u64 host_vpn;
u64 pfn;
@@ -99,6 +102,9 @@ struct kvmppc_vcpu_book3s {
struct hlist_head hpte_hash_pte_long[HPTEG_HASH_NUM_PTE_LONG];
struct hlist_head hpte_hash_vpte[HPTEG_HASH_NUM_VPTE];
struct hlist_head hpte_hash_vpte_long[HPTEG_HASH_NUM_VPTE_LONG];
+#ifdef CONFIG_PPC_BOOK3S_64
+ struct hlist_head hpte_hash_vpte_64k[HPTEG_HASH_NUM_VPTE_64K];
+#endif
int hpte_cache_count;
spinlock_t mmu_lock;
};
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 7b26395..2d3c770 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -73,10 +73,12 @@ extern void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte);
#define HPTEG_HASH_BITS_PTE_LONG 12
#define HPTEG_HASH_BITS_VPTE 13
#define HPTEG_HASH_BITS_VPTE_LONG 5
+#define HPTEG_HASH_BITS_VPTE_64K 11
#define HPTEG_HASH_NUM_PTE (1 << HPTEG_HASH_BITS_PTE)
#define HPTEG_HASH_NUM_PTE_LONG (1 << HPTEG_HASH_BITS_PTE_LONG)
#define HPTEG_HASH_NUM_VPTE (1 << HPTEG_HASH_BITS_VPTE)
#define HPTEG_HASH_NUM_VPTE_LONG (1 << HPTEG_HASH_BITS_VPTE_LONG)
+#define HPTEG_HASH_NUM_VPTE_64K (1 << HPTEG_HASH_BITS_VPTE_64K)
/* Physical Address Mask - allowed range of real mode RAM access */
#define KVM_PAM 0x0fffffffffffffffULL
@@ -328,6 +330,7 @@ struct kvmppc_pte {
bool may_read : 1;
bool may_write : 1;
bool may_execute : 1;
+ u8 page_size; /* MMU_PAGE_xxx */
};
struct kvmppc_mmu {
@@ -360,6 +363,7 @@ struct kvmppc_slb {
bool large : 1; /* PTEs are 16MB */
bool tb : 1; /* 1TB segment */
bool class : 1;
+ u8 base_page_size; /* MMU_PAGE_xxx */
};
# ifdef CONFIG_PPC_FSL_BOOK3E
diff --git a/arch/powerpc/kvm/book3s_64_mmu.c b/arch/powerpc/kvm/book3s_64_mmu.c
index 7e345e0..d5fa26c 100644
--- a/arch/powerpc/kvm/book3s_64_mmu.c
+++ b/arch/powerpc/kvm/book3s_64_mmu.c
@@ -107,9 +107,20 @@ static u64 kvmppc_mmu_book3s_64_ea_to_vp(struct kvm_vcpu *vcpu, gva_t eaddr,
return kvmppc_slb_calc_vpn(slb, eaddr);
}
+static int mmu_pagesize(int mmu_pg)
+{
+ switch (mmu_pg) {
+ case MMU_PAGE_64K:
+ return 16;
+ case MMU_PAGE_16M:
+ return 24;
+ }
+ return 12;
+}
+
static int kvmppc_mmu_book3s_64_get_pagesize(struct kvmppc_slb *slbe)
{
- return slbe->large ? 24 : 12;
+ return mmu_pagesize(slbe->base_page_size);
}
static u32 kvmppc_mmu_book3s_64_get_page(struct kvmppc_slb *slbe, gva_t eaddr)
@@ -166,14 +177,34 @@ static u64 kvmppc_mmu_book3s_64_get_avpn(struct kvmppc_slb *slbe, gva_t eaddr)
avpn = kvmppc_mmu_book3s_64_get_page(slbe, eaddr);
avpn |= slbe->vsid << (kvmppc_slb_sid_shift(slbe) - p);
- if (p < 24)
- avpn >>= ((80 - p) - 56) - 8;
+ if (p < 16)
+ avpn >>= ((80 - p) - 56) - 8; /* 16 - p */
else
- avpn <<= 8;
+ avpn <<= p - 16;
return avpn;
}
+/*
+ * Return page size encoded in the second word of a HPTE, or
+ * -1 for an invalid encoding for the base page size indicated by
+ * the SLB entry. This doesn't handle mixed pagesize segments yet.
+ */
+static int decode_pagesize(struct kvmppc_slb *slbe, u64 r)
+{
+ switch (slbe->base_page_size) {
+ case MMU_PAGE_64K:
+ if ((r & 0xf000) == 0x1000)
+ return MMU_PAGE_64K;
+ break;
+ case MMU_PAGE_16M:
+ if ((r & 0xff000) == 0)
+ return MMU_PAGE_16M;
+ break;
+ }
+ return -1;
+}
+
static int kvmppc_mmu_book3s_64_xlate(struct kvm_vcpu *vcpu, gva_t eaddr,
struct kvmppc_pte *gpte, bool data)
{
@@ -189,6 +220,7 @@ static int kvmppc_mmu_book3s_64_xlate(struct kvm_vcpu *vcpu, gva_t eaddr,
u8 pp, key = 0;
bool found = false;
bool second = false;
+ int pgsize;
ulong mp_ea = vcpu->arch.magic_page_ea;
/* Magic page override */
@@ -202,6 +234,7 @@ static int kvmppc_mmu_book3s_64_xlate(struct kvm_vcpu *vcpu, gva_t eaddr,
gpte->may_execute = true;
gpte->may_read = true;
gpte->may_write = true;
+ gpte->page_size = MMU_PAGE_4K;
return 0;
}
@@ -222,6 +255,8 @@ static int kvmppc_mmu_book3s_64_xlate(struct kvm_vcpu *vcpu, gva_t eaddr,
v_mask = SLB_VSID_B | HPTE_V_AVPN | HPTE_V_LARGE | HPTE_V_VALID |
HPTE_V_SECONDARY;
+ pgsize = slbe->large ? MMU_PAGE_16M : MMU_PAGE_4K;
+
do_second:
ptegp = kvmppc_mmu_book3s_64_get_pteg(vcpu_book3s, slbe, eaddr, second);
if (kvm_is_error_hva(ptegp))
@@ -240,6 +275,13 @@ do_second:
for (i=0; i<16; i+=2) {
/* Check all relevant fields of 1st dword */
if ((pteg[i] & v_mask) == v_val) {
+ /* If large page bit is set, check pgsize encoding */
+ if (slbe->large &&
+ (vcpu->arch.hflags & BOOK3S_HFLAG_MULTI_PGSIZE)) {
+ pgsize = decode_pagesize(slbe, pteg[i+1]);
+ if (pgsize < 0)
+ continue;
+ }
found = true;
break;
}
@@ -256,13 +298,13 @@ do_second:
v = pteg[i];
r = pteg[i+1];
pp = (r & HPTE_R_PP) | key;
- eaddr_mask = 0xFFF;
gpte->eaddr = eaddr;
gpte->vpage = kvmppc_mmu_book3s_64_ea_to_vp(vcpu, eaddr, data);
- if (slbe->large)
- eaddr_mask = 0xFFFFFF;
+
+ eaddr_mask = (1ull << mmu_pagesize(pgsize)) - 1;
gpte->raddr = (r & HPTE_R_RPN & ~eaddr_mask) | (eaddr & eaddr_mask);
+ gpte->page_size = pgsize;
gpte->may_execute = ((r & HPTE_R_N) ? false : true);
gpte->may_read = false;
gpte->may_write = false;
@@ -345,6 +387,21 @@ static void kvmppc_mmu_book3s_64_slbmte(struct kvm_vcpu *vcpu, u64 rs, u64 rb)
slbe->nx = (rs & SLB_VSID_N) ? 1 : 0;
slbe->class = (rs & SLB_VSID_C) ? 1 : 0;
+ slbe->base_page_size = MMU_PAGE_4K;
+ if (slbe->large) {
+ if (vcpu->arch.hflags & BOOK3S_HFLAG_MULTI_PGSIZE) {
+ switch (rs & SLB_VSID_LP) {
+ case SLB_VSID_LP_00:
+ slbe->base_page_size = MMU_PAGE_16M;
+ break;
+ case SLB_VSID_LP_01:
+ slbe->base_page_size = MMU_PAGE_64K;
+ break;
+ }
+ } else
+ slbe->base_page_size = MMU_PAGE_16M;
+ }
+
slbe->orige = rb & (ESID_MASK | SLB_ESID_V);
slbe->origv = rs;
@@ -463,8 +520,25 @@ static void kvmppc_mmu_book3s_64_tlbie(struct kvm_vcpu *vcpu, ulong va,
dprintk("KVM MMU: tlbie(0x%lx)\n", va);
- if (large)
- mask = 0xFFFFFF000ULL;
+ /*
+ * The tlbie instruction changed behaviour starting with
+ * POWER6. POWER6 and later don't have the large page flag
+ * in the instruction but in the RB value, along with bits
+ * indicating page and segment sizes.
+ */
+ if (vcpu->arch.hflags & BOOK3S_HFLAG_MULTI_PGSIZE) {
+ /* POWER6 or later */
+ if (va & 1) { /* L bit */
+ if ((va & 0xf000) == 0x1000)
+ mask = 0xFFFFFFFF0ULL; /* 64k page */
+ else
+ mask = 0xFFFFFF000ULL; /* 16M page */
+ }
+ } else {
+ /* older processors, e.g. PPC970 */
+ if (large)
+ mask = 0xFFFFFF000ULL;
+ }
kvmppc_mmu_pte_vflush(vcpu, va >> 12, mask);
}
diff --git a/arch/powerpc/kvm/book3s_mmu_hpte.c b/arch/powerpc/kvm/book3s_mmu_hpte.c
index da8b13c..d2d280b 100644
--- a/arch/powerpc/kvm/book3s_mmu_hpte.c
+++ b/arch/powerpc/kvm/book3s_mmu_hpte.c
@@ -56,6 +56,14 @@ static inline u64 kvmppc_mmu_hash_vpte_long(u64 vpage)
HPTEG_HASH_BITS_VPTE_LONG);
}
+#ifdef CONFIG_PPC_BOOK3S_64
+static inline u64 kvmppc_mmu_hash_vpte_64k(u64 vpage)
+{
+ return hash_64((vpage & 0xffffffff0ULL) >> 4,
+ HPTEG_HASH_BITS_VPTE_64K);
+}
+#endif
+
void kvmppc_mmu_hpte_cache_map(struct kvm_vcpu *vcpu, struct hpte_cache *pte)
{
u64 index;
@@ -83,6 +91,13 @@ void kvmppc_mmu_hpte_cache_map(struct kvm_vcpu *vcpu, struct hpte_cache *pte)
hlist_add_head_rcu(&pte->list_vpte_long,
&vcpu3s->hpte_hash_vpte_long[index]);
+#ifdef CONFIG_PPC_BOOK3S_64
+ /* Add to vPTE_64k list */
+ index = kvmppc_mmu_hash_vpte_64k(pte->pte.vpage);
+ hlist_add_head_rcu(&pte->list_vpte_64k,
+ &vcpu3s->hpte_hash_vpte_64k[index]);
+#endif
+
spin_unlock(&vcpu3s->mmu_lock);
}
@@ -113,6 +128,9 @@ static void invalidate_pte(struct kvm_vcpu *vcpu, struct hpte_cache *pte)
hlist_del_init_rcu(&pte->list_pte_long);
hlist_del_init_rcu(&pte->list_vpte);
hlist_del_init_rcu(&pte->list_vpte_long);
+#ifdef CONFIG_PPC_BOOK3S_64
+ hlist_del_init_rcu(&pte->list_vpte_64k);
+#endif
spin_unlock(&vcpu3s->mmu_lock);
@@ -219,6 +237,29 @@ static void kvmppc_mmu_pte_vflush_short(struct kvm_vcpu *vcpu, u64 guest_vp)
rcu_read_unlock();
}
+#ifdef CONFIG_PPC_BOOK3S_64
+/* Flush with mask 0xffffffff0 */
+static void kvmppc_mmu_pte_vflush_64k(struct kvm_vcpu *vcpu, u64 guest_vp)
+{
+ struct kvmppc_vcpu_book3s *vcpu3s = to_book3s(vcpu);
+ struct hlist_head *list;
+ struct hpte_cache *pte;
+ u64 vp_mask = 0xffffffff0ULL;
+
+ list = &vcpu3s->hpte_hash_vpte_64k[
+ kvmppc_mmu_hash_vpte_64k(guest_vp)];
+
+ rcu_read_lock();
+
+ /* Check the list for matching entries and invalidate */
+ hlist_for_each_entry_rcu(pte, list, list_vpte_64k)
+ if ((pte->pte.vpage & vp_mask) == guest_vp)
+ invalidate_pte(vcpu, pte);
+
+ rcu_read_unlock();
+}
+#endif
+
/* Flush with mask 0xffffff000 */
static void kvmppc_mmu_pte_vflush_long(struct kvm_vcpu *vcpu, u64 guest_vp)
{
@@ -249,6 +290,11 @@ void kvmppc_mmu_pte_vflush(struct kvm_vcpu *vcpu, u64 guest_vp, u64 vp_mask)
case 0xfffffffffULL:
kvmppc_mmu_pte_vflush_short(vcpu, guest_vp);
break;
+#ifdef CONFIG_PPC_BOOK3S_64
+ case 0xffffffff0ULL:
+ kvmppc_mmu_pte_vflush_64k(vcpu, guest_vp);
+ break;
+#endif
case 0xffffff000ULL:
kvmppc_mmu_pte_vflush_long(vcpu, guest_vp);
break;
@@ -320,6 +366,10 @@ int kvmppc_mmu_hpte_init(struct kvm_vcpu *vcpu)
ARRAY_SIZE(vcpu3s->hpte_hash_vpte));
kvmppc_mmu_hpte_init_hash(vcpu3s->hpte_hash_vpte_long,
ARRAY_SIZE(vcpu3s->hpte_hash_vpte_long));
+#ifdef CONFIG_PPC_BOOK3S_64
+ kvmppc_mmu_hpte_init_hash(vcpu3s->hpte_hash_vpte_64k,
+ ARRAY_SIZE(vcpu3s->hpte_hash_vpte_64k));
+#endif
spin_lock_init(&vcpu3s->mmu_lock);
diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
index 28146c1..efd8785 100644
--- a/arch/powerpc/kvm/book3s_pr.c
+++ b/arch/powerpc/kvm/book3s_pr.c
@@ -306,6 +306,23 @@ void kvmppc_set_pvr(struct kvm_vcpu *vcpu, u32 pvr)
if (!strcmp(cur_cpu_spec->platform, "ppc-cell-be"))
to_book3s(vcpu)->msr_mask &= ~(MSR_FE0 | MSR_FE1);
+ /*
+ * If they're asking for POWER6 or later, set the flag
+ * indicating that we can do multiple large page sizes.
+ * We also take this to mean that tlbie has the large page
+ * bit in the RB operand instead of the instruction and
+ * that the CPU can do 1TB segments. If we ever wanted
+ * to emulate POWER5++ we would need to separate these things.
+ */
+ switch (PVR_VER(pvr)) {
+ case PVR_POWER6:
+ case PVR_POWER7:
+ case PVR_POWER7p:
+ case PVR_POWER8:
+ vcpu->arch.hflags |= BOOK3S_HFLAG_MULTI_PGSIZE;
+ break;
+ }
+
#ifdef CONFIG_PPC_BOOK3S_32
/* 32 bit Book3S always has 32 byte dcbz */
vcpu->arch.hflags |= BOOK3S_HFLAG_DCBZ32;
@@ -1127,8 +1144,13 @@ struct kvm_vcpu *kvmppc_core_vcpu_create(struct kvm *kvm, unsigned int id)
goto uninit_vcpu;
#ifdef CONFIG_PPC_BOOK3S_64
- /* default to book3s_64 (970fx) */
+ /*
+ * Default to the same as the host if we're on a POWER7[+],
+ * otherwise default to PPC970FX.
+ */
vcpu->arch.pvr = 0x3C0301;
+ if (cpu_has_feature(CPU_FTR_ARCH_206))
+ vcpu->arch.pvr = mfspr(SPRN_PVR);
#else
/* default to book3s_32 (750) */
vcpu->arch.pvr = 0x84202;
@@ -1331,6 +1353,12 @@ int kvm_vm_ioctl_get_smmu_info(struct kvm *kvm, struct kvm_ppc_smmu_info *info)
info->sps[1].enc[0].page_shift = 24;
info->sps[1].enc[0].pte_enc = 0;
+ /* 64k large page size */
+ info->sps[2].page_shift = 16;
+ info->sps[2].slb_enc = SLB_VSID_L | SLB_VSID_LP_01;
+ info->sps[2].enc[0].page_shift = 16;
+ info->sps[2].enc[0].pte_enc = 1;
+
return 0;
}
#endif /* CONFIG_PPC64 */
--
1.8.3.1
* [PATCH 07/23] KVM: PPC: Book3S PR: Use 64k host pages where possible
From: Paul Mackerras @ 2013-08-06 4:19 UTC
To: Alexander Graf, Benjamin Herrenschmidt; +Cc: kvm-ppc, kvm
Currently, PR KVM uses 4k pages for the host-side mappings of guest
memory, regardless of the host page size. When the host page size is
64kB, we might as well use 64k host page mappings for guest mappings
of 64kB and larger pages and for guest real-mode mappings. However,
the magic page has to remain a 4k page.
To implement this, we first add another flag bit to the guest VSID
values we use, to indicate that this segment is one where host pages
should be mapped using 64k pages. For segments with this bit set
we set the bits in the shadow SLB entry to indicate a 64k base page
size. When faulting in host HPTEs for this segment, we make them
64k HPTEs instead of 4k. We record the pagesize in struct hpte_cache
for use when invalidating the HPTE.
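For orientation, the two consumers of the new flag bit look roughly
like this (an illustrative sketch only, not the exact hunks; the
variable names are approximate):

	/* kvmppc_mmu_map_segment(): mark the shadow SLB entry as 64k */
	if (vsid & VSID_64K)
		slb_vsid |= SLB_VSID_L | SLB_VSID_LP_01;

	/* kvmppc_mmu_map_page(): use a 64k host HPTE for such segments */
	if (vsid & VSID_64K)
		hpsize = MMU_PAGE_64K;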
Signed-off-by: Paul Mackerras <paulus@samba.org>
---
arch/powerpc/include/asm/kvm_book3s.h | 6 ++++--
arch/powerpc/kvm/book3s_32_mmu.c | 1 +
arch/powerpc/kvm/book3s_64_mmu.c | 35 ++++++++++++++++++++++++++++++-----
arch/powerpc/kvm/book3s_64_mmu_host.c | 27 +++++++++++++++++++++------
arch/powerpc/kvm/book3s_pr.c | 1 +
5 files changed, 57 insertions(+), 13 deletions(-)
diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h
index 175f876..322b539 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -66,6 +66,7 @@ struct hpte_cache {
u64 pfn;
ulong slot;
struct kvmppc_pte pte;
+ int pagesize;
};
struct kvmppc_vcpu_book3s {
@@ -113,8 +114,9 @@ struct kvmppc_vcpu_book3s {
#define CONTEXT_GUEST 1
#define CONTEXT_GUEST_END 2
-#define VSID_REAL 0x0fffffffffc00000ULL
-#define VSID_BAT 0x0fffffffffb00000ULL
+#define VSID_REAL 0x07ffffffffc00000ULL
+#define VSID_BAT 0x07ffffffffb00000ULL
+#define VSID_64K 0x0800000000000000ULL
#define VSID_1T 0x1000000000000000ULL
#define VSID_REAL_DR 0x2000000000000000ULL
#define VSID_REAL_IR 0x4000000000000000ULL
diff --git a/arch/powerpc/kvm/book3s_32_mmu.c b/arch/powerpc/kvm/book3s_32_mmu.c
index c8cefdd..af04553 100644
--- a/arch/powerpc/kvm/book3s_32_mmu.c
+++ b/arch/powerpc/kvm/book3s_32_mmu.c
@@ -308,6 +308,7 @@ static int kvmppc_mmu_book3s_32_xlate(struct kvm_vcpu *vcpu, gva_t eaddr,
ulong mp_ea = vcpu->arch.magic_page_ea;
pte->eaddr = eaddr;
+ pte->page_size = MMU_PAGE_4K;
/* Magic page override */
if (unlikely(mp_ea) &&
diff --git a/arch/powerpc/kvm/book3s_64_mmu.c b/arch/powerpc/kvm/book3s_64_mmu.c
index d5fa26c..658ccd7 100644
--- a/arch/powerpc/kvm/book3s_64_mmu.c
+++ b/arch/powerpc/kvm/book3s_64_mmu.c
@@ -542,6 +542,16 @@ static void kvmppc_mmu_book3s_64_tlbie(struct kvm_vcpu *vcpu, ulong va,
kvmppc_mmu_pte_vflush(vcpu, va >> 12, mask);
}
+#ifdef CONFIG_PPC_64K_PAGES
+static int segment_contains_magic_page(struct kvm_vcpu *vcpu, ulong esid)
+{
+ ulong mp_ea = vcpu->arch.magic_page_ea;
+
+ return mp_ea && !(vcpu->arch.shared->msr & MSR_PR) &&
+ (mp_ea >> SID_SHIFT) == esid;
+}
+#endif
+
static int kvmppc_mmu_book3s_64_esid_to_vsid(struct kvm_vcpu *vcpu, ulong esid,
u64 *vsid)
{
@@ -549,11 +559,13 @@ static int kvmppc_mmu_book3s_64_esid_to_vsid(struct kvm_vcpu *vcpu, ulong esid,
struct kvmppc_slb *slb;
u64 gvsid = esid;
ulong mp_ea = vcpu->arch.magic_page_ea;
+ int pagesize = MMU_PAGE_64K;
if (vcpu->arch.shared->msr & (MSR_DR|MSR_IR)) {
slb = kvmppc_mmu_book3s_64_find_slbe(vcpu, ea);
if (slb) {
gvsid = slb->vsid;
+ pagesize = slb->base_page_size;
if (slb->tb) {
gvsid <<= SID_SHIFT_1T - SID_SHIFT;
gvsid |= esid & ((1ul << (SID_SHIFT_1T - SID_SHIFT)) - 1);
@@ -564,28 +576,41 @@ static int kvmppc_mmu_book3s_64_esid_to_vsid(struct kvm_vcpu *vcpu, ulong esid,
switch (vcpu->arch.shared->msr & (MSR_DR|MSR_IR)) {
case 0:
- *vsid = VSID_REAL | esid;
+ gvsid = VSID_REAL | esid;
break;
case MSR_IR:
- *vsid = VSID_REAL_IR | gvsid;
+ gvsid |= VSID_REAL_IR;
break;
case MSR_DR:
- *vsid = VSID_REAL_DR | gvsid;
+ gvsid |= VSID_REAL_DR;
break;
case MSR_DR|MSR_IR:
if (!slb)
goto no_slb;
- *vsid = gvsid;
break;
default:
BUG();
break;
}
+#ifdef CONFIG_PPC_64K_PAGES
+ /*
+ * Mark this as a 64k segment if the host is using
+ * 64k pages, the host MMU supports 64k pages and
+ * the guest segment page size is >= 64k,
+ * but not if this segment contains the magic page.
+ */
+ if (pagesize >= MMU_PAGE_64K &&
+ mmu_psize_defs[MMU_PAGE_64K].shift &&
+ !segment_contains_magic_page(vcpu, esid))
+ gvsid |= VSID_64K;
+#endif
+
if (vcpu->arch.shared->msr & MSR_PR)
- *vsid |= VSID_PR;
+ gvsid |= VSID_PR;
+ *vsid = gvsid;
return 0;
no_slb:
diff --git a/arch/powerpc/kvm/book3s_64_mmu_host.c b/arch/powerpc/kvm/book3s_64_mmu_host.c
index b350d94..21a51e8 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_host.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_host.c
@@ -34,7 +34,7 @@
void kvmppc_mmu_invalidate_pte(struct kvm_vcpu *vcpu, struct hpte_cache *pte)
{
ppc_md.hpte_invalidate(pte->slot, pte->host_vpn,
- MMU_PAGE_4K, MMU_SEGSIZE_256M,
+ pte->pagesize, MMU_SEGSIZE_256M,
false);
}
@@ -90,6 +90,7 @@ int kvmppc_mmu_map_page(struct kvm_vcpu *vcpu, struct kvmppc_pte *orig_pte)
int attempt = 0;
struct kvmppc_sid_map *map;
int r = 0;
+ int hpsize = MMU_PAGE_4K;
/* Get host physical address for gpa */
hpaddr = kvmppc_gfn_to_pfn(vcpu, orig_pte->raddr >> PAGE_SHIFT);
@@ -99,7 +100,6 @@ int kvmppc_mmu_map_page(struct kvm_vcpu *vcpu, struct kvmppc_pte *orig_pte)
goto out;
}
hpaddr <<= PAGE_SHIFT;
- hpaddr |= orig_pte->raddr & (~0xfffULL & ~PAGE_MASK);
/* and write the mapping ea -> hpa into the pt */
vcpu->arch.mmu.esid_to_vsid(vcpu, orig_pte->eaddr >> SID_SHIFT, &vsid);
@@ -117,8 +117,7 @@ int kvmppc_mmu_map_page(struct kvm_vcpu *vcpu, struct kvmppc_pte *orig_pte)
goto out;
}
- vsid = map->host_vsid;
- vpn = hpt_vpn(orig_pte->eaddr, vsid, MMU_SEGSIZE_256M);
+ vpn = hpt_vpn(orig_pte->eaddr, map->host_vsid, MMU_SEGSIZE_256M);
if (!orig_pte->may_write)
rflags |= HPTE_R_PP;
@@ -130,7 +129,16 @@ int kvmppc_mmu_map_page(struct kvm_vcpu *vcpu, struct kvmppc_pte *orig_pte)
else
kvmppc_mmu_flush_icache(hpaddr >> PAGE_SHIFT);
- hash = hpt_hash(vpn, PTE_SIZE, MMU_SEGSIZE_256M);
+ /*
+ * Use 64K pages if possible; otherwise, on 64K page kernels,
+ * we need to transfer 4 more bits from guest real to host real addr.
+ */
+ if (vsid & VSID_64K)
+ hpsize = MMU_PAGE_64K;
+ else
+ hpaddr |= orig_pte->raddr & (~0xfffULL & ~PAGE_MASK);
+
+ hash = hpt_hash(vpn, mmu_psize_defs[hpsize].shift, MMU_SEGSIZE_256M);
map_again:
hpteg = ((hash & htab_hash_mask) * HPTES_PER_GROUP);
@@ -143,7 +151,7 @@ map_again:
}
ret = ppc_md.hpte_insert(hpteg, vpn, hpaddr, rflags, vflags,
- MMU_PAGE_4K, MMU_PAGE_4K, MMU_SEGSIZE_256M);
+ hpsize, hpsize, MMU_SEGSIZE_256M);
if (ret < 0) {
/* If we couldn't map a primary PTE, try a secondary */
@@ -168,6 +176,7 @@ map_again:
pte->host_vpn = vpn;
pte->pte = *orig_pte;
pte->pfn = hpaddr >> PAGE_SHIFT;
+ pte->pagesize = hpsize;
kvmppc_mmu_hpte_cache_map(vcpu, pte);
}
@@ -291,6 +300,12 @@ int kvmppc_mmu_map_segment(struct kvm_vcpu *vcpu, ulong eaddr)
slb_vsid &= ~SLB_VSID_KP;
slb_esid |= slb_index;
+#ifdef CONFIG_PPC_64K_PAGES
+ /* Set host segment base page size to 64K if possible */
+ if (gvsid & VSID_64K)
+ slb_vsid |= mmu_psize_defs[MMU_PAGE_64K].sllp;
+#endif
+
svcpu->slb[slb_index].esid = slb_esid;
svcpu->slb[slb_index].vsid = slb_vsid;
diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
index efd8785..4d39820 100644
--- a/arch/powerpc/kvm/book3s_pr.c
+++ b/arch/powerpc/kvm/book3s_pr.c
@@ -422,6 +422,7 @@ int kvmppc_handle_pagefault(struct kvm_run *run, struct kvm_vcpu *vcpu,
pte.raddr = eaddr & KVM_PAM;
pte.eaddr = eaddr;
pte.vpage = eaddr >> 12;
+ pte.page_size = MMU_PAGE_64K;
}
switch (vcpu->arch.shared->msr & (MSR_DR|MSR_IR)) {
--
1.8.3.1
^ permalink raw reply related [flat|nested] 68+ messages in thread
* [PATCH 08/23] KVM: PPC: Book3S PR: Handle PP0 page-protection bit in guest HPTEs
2013-08-06 4:12 [PATCH 00/23] Allow PR and HV KVM to coexist in one kernel Paul Mackerras
` (6 preceding siblings ...)
2013-08-06 4:19 ` [PATCH 07/23] KVM: PPC: Book3S PR: Use 64k host pages where possible Paul Mackerras
@ 2013-08-06 4:20 ` Paul Mackerras
2013-08-06 4:20 ` [PATCH 09/23] KVM: PPC: Book3S PR: Correct errors in H_ENTER implementation Paul Mackerras
` (14 subsequent siblings)
22 siblings, 0 replies; 68+ messages in thread
From: Paul Mackerras @ 2013-08-06 4:20 UTC (permalink / raw)
To: Alexander Graf, Benjamin Herrenschmidt; +Cc: kvm-ppc, kvm
64-bit POWER processors have a three-bit field for page protection in
the hashed page table entry (HPTE). Currently we only interpret the two
bits that were present in older versions of the architecture. The only
defined combination that has the new bit set is 110, meaning read-only
for supervisor and no access for user mode.
This adds code to kvmppc_mmu_book3s_64_xlate() to interpret the extra
bit appropriately.
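To make the encoding concrete, here is a small stand-alone sketch of how the third protection bit folds into the value that the permission switch then tests. Only the pp-combination step is taken from the hunk below; the position of HPTE_R_PP0 and the key handling (0 for supervisor, 4 for user) are simplified assumptions:
#include <stdio.h>
#include <stdint.h>
#define HPTE_R_PP	0x3ULL			/* low two PP bits */
#define HPTE_R_PP0	0x8000000000000000ULL	/* assumed: top bit of the second dword */
static unsigned int prot_value(uint64_t r, unsigned int key)
{
	unsigned int pp = (r & HPTE_R_PP) | key;	/* as in the existing code */
	if (r & HPTE_R_PP0)
		pp |= 8;				/* new bit interpreted by this patch */
	return pp;
}
int main(void)
{
	uint64_t r = HPTE_R_PP0 | 0x2;	/* the "110" combination */
	printf("supervisor: pp=%u (10 -> read-only)\n", prot_value(r, 0));
	printf("user:       pp=%u (not a readable case -> no access)\n", prot_value(r, 4));
	return 0;
}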
Signed-off-by: Paul Mackerras <paulus@samba.org>
---
arch/powerpc/kvm/book3s_64_mmu.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/arch/powerpc/kvm/book3s_64_mmu.c b/arch/powerpc/kvm/book3s_64_mmu.c
index 658ccd7..563fbf7 100644
--- a/arch/powerpc/kvm/book3s_64_mmu.c
+++ b/arch/powerpc/kvm/book3s_64_mmu.c
@@ -298,6 +298,8 @@ do_second:
v = pteg[i];
r = pteg[i+1];
pp = (r & HPTE_R_PP) | key;
+ if (r & HPTE_R_PP0)
+ pp |= 8;
gpte->eaddr = eaddr;
gpte->vpage = kvmppc_mmu_book3s_64_ea_to_vp(vcpu, eaddr, data);
@@ -319,6 +321,7 @@ do_second:
case 3:
case 5:
case 7:
+ case 10:
gpte->may_read = true;
break;
}
--
1.8.3.1
^ permalink raw reply related [flat|nested] 68+ messages in thread
* [PATCH 09/23] KVM: PPC: Book3S PR: Correct errors in H_ENTER implementation
2013-08-06 4:12 [PATCH 00/23] Allow PR and HV KVM to coexist in one kernel Paul Mackerras
` (7 preceding siblings ...)
2013-08-06 4:20 ` [PATCH 08/23] KVM: PPC: Book3S PR: Handle PP0 page-protection bit in guest HPTEs Paul Mackerras
@ 2013-08-06 4:20 ` Paul Mackerras
2013-08-06 4:21 ` [PATCH 10/23] KVM: PPC: Book3S PR: Make HPT accesses and updates SMP-safe Paul Mackerras
` (13 subsequent siblings)
22 siblings, 0 replies; 68+ messages in thread
From: Paul Mackerras @ 2013-08-06 4:20 UTC (permalink / raw)
To: Alexander Graf, Benjamin Herrenschmidt; +Cc: kvm-ppc, kvm
The implementation of H_ENTER in PR KVM has some errors:
* With H_EXACT not set, if the HPTEG is full, we return H_PTEG_FULL
as the return value of kvmppc_h_pr_enter, but the caller is expecting
one of the EMULATE_* values. The H_PTEG_FULL needs to go in the
guest's R3 instead.
* With H_EXACT set, if the selected HPTE is already valid, the H_ENTER
call should return a H_PTEG_FULL error.
This fixes these errors and also makes it write only the selected HPTE,
not the whole group, since only the selected HPTE has been modified.
This also micro-optimizes the calculations involving pte_index and i.
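The corrected control flow can be summarised with a small userspace sketch; the types, error values and the 8-entry group below are stand-ins, and the real code also copies the HPTE contents from guest registers, but the two fixes (H_PTEG_FULL going into guest r3, and H_EXACT rejecting an occupied slot) follow the hunk below:
#include <stdio.h>
#define HPTE_V_VALID	1u
#define H_EXACT		4u	/* stand-in flag value */
#define H_SUCCESS	0
#define H_PTEG_FULL	(-1)	/* stand-in error value */
#define EMULATE_DONE	1
static int h_enter(unsigned int *pteg_v, unsigned long flags,
		   unsigned int slot, long *guest_r3)
{
	long ret = H_PTEG_FULL;
	unsigned int i;
	if (!(flags & H_EXACT)) {
		for (i = 0; i < 8; i++)
			if (!(pteg_v[i] & HPTE_V_VALID))
				break;
		if (i == 8)
			goto done;		/* group full */
	} else {
		i = slot;
		if (pteg_v[i] & HPTE_V_VALID)
			goto done;		/* exact slot already in use */
	}
	pteg_v[i] = HPTE_V_VALID;		/* write only the chosen entry */
	ret = H_SUCCESS;
done:
	*guest_r3 = ret;			/* hcall status goes to guest r3 */
	return EMULATE_DONE;			/* caller always sees EMULATE_* */
}
int main(void)
{
	unsigned int pteg[8] = { HPTE_V_VALID };	/* slot 0 occupied */
	long r3;
	h_enter(pteg, 0, 0, &r3);
	printf("non-exact insert: r3 = %ld\n", r3);	/* H_SUCCESS */
	h_enter(pteg, H_EXACT, 0, &r3);
	printf("exact, slot busy: r3 = %ld\n", r3);	/* H_PTEG_FULL */
	return 0;
}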
Signed-off-by: Paul Mackerras <paulus@samba.org>
---
arch/powerpc/kvm/book3s_pr_papr.c | 19 ++++++++++++++-----
1 file changed, 14 insertions(+), 5 deletions(-)
diff --git a/arch/powerpc/kvm/book3s_pr_papr.c b/arch/powerpc/kvm/book3s_pr_papr.c
index da0e0bc..38f1899 100644
--- a/arch/powerpc/kvm/book3s_pr_papr.c
+++ b/arch/powerpc/kvm/book3s_pr_papr.c
@@ -21,6 +21,8 @@
#include <asm/kvm_ppc.h>
#include <asm/kvm_book3s.h>
+#define HPTE_SIZE 16 /* bytes per HPT entry */
+
static unsigned long get_pteg_addr(struct kvm_vcpu *vcpu, long pte_index)
{
struct kvmppc_vcpu_book3s *vcpu_book3s = to_book3s(vcpu);
@@ -40,32 +42,39 @@ static int kvmppc_h_pr_enter(struct kvm_vcpu *vcpu)
long pte_index = kvmppc_get_gpr(vcpu, 5);
unsigned long pteg[2 * 8];
unsigned long pteg_addr, i, *hpte;
+ long int ret;
+ i = pte_index & 7;
pte_index &= ~7UL;
pteg_addr = get_pteg_addr(vcpu, pte_index);
copy_from_user(pteg, (void __user *)pteg_addr, sizeof(pteg));
hpte = pteg;
+ ret = H_PTEG_FULL;
if (likely((flags & H_EXACT) == 0)) {
- pte_index &= ~7UL;
for (i = 0; ; ++i) {
if (i == 8)
- return H_PTEG_FULL;
+ goto done;
if ((*hpte & HPTE_V_VALID) == 0)
break;
hpte += 2;
}
} else {
- i = kvmppc_get_gpr(vcpu, 5) & 7UL;
hpte += i * 2;
+ if (*hpte & HPTE_V_VALID)
+ goto done;
}
hpte[0] = kvmppc_get_gpr(vcpu, 6);
hpte[1] = kvmppc_get_gpr(vcpu, 7);
- copy_to_user((void __user *)pteg_addr, pteg, sizeof(pteg));
- kvmppc_set_gpr(vcpu, 3, H_SUCCESS);
+ pteg_addr += i * HPTE_SIZE;
+ copy_to_user((void __user *)pteg_addr, hpte, HPTE_SIZE);
kvmppc_set_gpr(vcpu, 4, pte_index | i);
+ ret = H_SUCCESS;
+
+ done:
+ kvmppc_set_gpr(vcpu, 3, ret);
return EMULATE_DONE;
}
--
1.8.3.1
^ permalink raw reply related [flat|nested] 68+ messages in thread
* [PATCH 10/23] KVM: PPC: Book3S PR: Make HPT accesses and updates SMP-safe
2013-08-06 4:12 [PATCH 00/23] Allow PR and HV KVM to coexist in one kernel Paul Mackerras
` (8 preceding siblings ...)
2013-08-06 4:20 ` [PATCH 09/23] KVM: PPC: Book3S PR: Correct errors in H_ENTER implementation Paul Mackerras
@ 2013-08-06 4:21 ` Paul Mackerras
2013-08-06 4:21 ` [PATCH 11/23] KVM: PPC: Book3S PR: Allocate kvm_vcpu structs from kvm_vcpu_cache Paul Mackerras
` (12 subsequent siblings)
22 siblings, 0 replies; 68+ messages in thread
From: Paul Mackerras @ 2013-08-06 4:21 UTC (permalink / raw)
To: Alexander Graf, Benjamin Herrenschmidt; +Cc: kvm-ppc, kvm
This adds a per-VM mutex to provide mutual exclusion between vcpus
for accesses to and updates of the guest hashed page table (HPT).
This also makes the code use single-byte writes to the HPT entry
when updating of the reference (R) and change (C) bits. The reason
for doing this, rather than writing back the whole HPTE, is that on
non-PAPR virtual machines, the guest OS might be writing to the HPTE
concurrently, and writing back the whole HPTE might conflict with
that. Also, real hardware does single-byte writes to update R and C.
The new mutex is taken in kvmppc_mmu_book3s_64_xlate() when reading
the HPT and updating R and/or C, and in the PAPR HPT update hcalls
(H_ENTER, H_REMOVE, etc.). Having the mutex means that we don't need
to use a hypervisor lock bit in the HPT update hcalls, and we don't
need to be careful about the order in which the bytes of the HPTE are
updated by those hcalls.
The other change here is to make emulated TLB invalidations (tlbie)
effective across all vcpus. To do this we call kvmppc_mmu_pte_vflush
for all vcpus in kvmppc_ppc_book3s_64_tlbie().
For 32-bit, this makes the setting of the accessed and dirty bits use
single-byte writes, and makes tlbie invalidate shadow HPTEs for all
vcpus.
With this, PR KVM can successfully run SMP guests.
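The single-byte update idea, taken in isolation, looks like the following stand-alone sketch. The byte offsets correspond to where HPTE_R_R (0x100) and HPTE_R_C (0x80) land in the big-endian second doubleword of the HPTE; everything else is a stand-in:
#include <stdio.h>
#define R_BYTE	6	/* byte of the BE doubleword holding HPTE_R_R (0x100) */
#define C_BYTE	7	/* byte holding HPTE_R_C (0x80) */
#define R_BIT	0x01
#define C_BIT	0x80
/* set the R bit with a one-byte store, leaving every other byte untouched */
static void set_referenced(unsigned char hpte_r[8]) { hpte_r[R_BYTE] |= R_BIT; }
/* likewise for the C bit */
static void set_changed(unsigned char hpte_r[8])    { hpte_r[C_BYTE] |= C_BIT; }
int main(void)
{
	unsigned char hpte_r[8] = { 0 };	/* second HPTE doubleword, BE byte order */
	set_referenced(hpte_r);
	set_changed(hpte_r);
	printf("byte 6 = %02x, byte 7 = %02x\n", hpte_r[R_BYTE], hpte_r[C_BYTE]);
	return 0;
}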
Signed-off-by: Paul Mackerras <paulus@samba.org>
---
arch/powerpc/include/asm/kvm_host.h | 3 +++
arch/powerpc/kvm/book3s_32_mmu.c | 36 ++++++++++++++++++++++--------------
arch/powerpc/kvm/book3s_64_mmu.c | 33 +++++++++++++++++++++++----------
arch/powerpc/kvm/book3s_pr.c | 1 +
arch/powerpc/kvm/book3s_pr_papr.c | 33 +++++++++++++++++++++++----------
5 files changed, 72 insertions(+), 34 deletions(-)
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 2d3c770..c37207f 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -259,6 +259,9 @@ struct kvm_arch {
struct kvmppc_vcore *vcores[KVM_MAX_VCORES];
int hpt_cma_alloc;
#endif /* CONFIG_KVM_BOOK3S_64_HV */
+#ifdef CONFIG_KVM_BOOK3S_PR
+ struct mutex hpt_mutex;
+#endif
#ifdef CONFIG_PPC_BOOK3S_64
struct list_head spapr_tce_tables;
struct list_head rtas_tokens;
diff --git a/arch/powerpc/kvm/book3s_32_mmu.c b/arch/powerpc/kvm/book3s_32_mmu.c
index af04553..856af98 100644
--- a/arch/powerpc/kvm/book3s_32_mmu.c
+++ b/arch/powerpc/kvm/book3s_32_mmu.c
@@ -271,19 +271,22 @@ static int kvmppc_mmu_book3s_32_xlate_pte(struct kvm_vcpu *vcpu, gva_t eaddr,
/* Update PTE C and A bits, so the guest's swapper knows we used the
page */
if (found) {
- u32 oldpte = pteg[i+1];
-
- if (pte->may_read)
- pteg[i+1] |= PTEG_FLAG_ACCESSED;
- if (pte->may_write)
- pteg[i+1] |= PTEG_FLAG_DIRTY;
- else
- dprintk_pte("KVM: Mapping read-only page!\n");
-
- /* Write back into the PTEG */
- if (pteg[i+1] != oldpte)
- copy_to_user((void __user *)ptegp, pteg, sizeof(pteg));
-
+ u32 pte_r = pteg[i+1];
+ char __user *addr = (char __user *) &pteg[i+1];
+
+ /*
+ * Use single-byte writes to update the HPTE, to
+ * conform to what real hardware does.
+ */
+ if (pte->may_read && !(pte_r & PTEG_FLAG_ACCESSED)) {
+ pte_r |= PTEG_FLAG_ACCESSED;
+ put_user(pte_r >> 8, addr + 2);
+ }
+ if (pte->may_write && !(pte_r & PTEG_FLAG_DIRTY)) {
+ /* XXX should only set this for stores */
+ pte_r |= PTEG_FLAG_DIRTY;
+ put_user(pte_r, addr + 3);
+ }
return 0;
}
@@ -348,7 +351,12 @@ static void kvmppc_mmu_book3s_32_mtsrin(struct kvm_vcpu *vcpu, u32 srnum,
static void kvmppc_mmu_book3s_32_tlbie(struct kvm_vcpu *vcpu, ulong ea, bool large)
{
- kvmppc_mmu_pte_flush(vcpu, ea, 0x0FFFF000);
+ int i;
+ struct kvm_vcpu *v;
+
+ /* flush this VA on all cpus */
+ kvm_for_each_vcpu(i, v, vcpu->kvm)
+ kvmppc_mmu_pte_flush(v, ea, 0x0FFFF000);
}
static int kvmppc_mmu_book3s_32_esid_to_vsid(struct kvm_vcpu *vcpu, ulong esid,
diff --git a/arch/powerpc/kvm/book3s_64_mmu.c b/arch/powerpc/kvm/book3s_64_mmu.c
index 563fbf7..26a57ca 100644
--- a/arch/powerpc/kvm/book3s_64_mmu.c
+++ b/arch/powerpc/kvm/book3s_64_mmu.c
@@ -257,6 +257,8 @@ static int kvmppc_mmu_book3s_64_xlate(struct kvm_vcpu *vcpu, gva_t eaddr,
pgsize = slbe->large ? MMU_PAGE_16M : MMU_PAGE_4K;
+ mutex_lock(&vcpu->kvm->arch.hpt_mutex);
+
do_second:
ptegp = kvmppc_mmu_book3s_64_get_pteg(vcpu_book3s, slbe, eaddr, second);
if (kvm_is_error_hva(ptegp))
@@ -332,30 +334,37 @@ do_second:
/* Update PTE R and C bits, so the guest's swapper knows we used the
* page */
- if (gpte->may_read) {
- /* Set the accessed flag */
+ if (gpte->may_read && !(r & HPTE_R_R)) {
+ /*
+ * Set the accessed flag.
+ * We have to write this back with a single byte write
+ * because another vcpu may be accessing this on
+ * non-PAPR platforms such as mac99, and this is
+ * what real hardware does.
+ */
+ char __user *addr = (char __user *) &pteg[i+1];
r |= HPTE_R_R;
+ put_user(r >> 8, addr + 6);
}
- if (data && gpte->may_write) {
+ if (data && gpte->may_write && !(r & HPTE_R_C)) {
/* Set the dirty flag -- XXX even if not writing */
+ /* Use a single byte write */
+ char __user *addr = (char __user *) &pteg[i+1];
r |= HPTE_R_C;
+ put_user(r, addr + 7);
}
- /* Write back into the PTEG */
- if (pteg[i+1] != r) {
- pteg[i+1] = r;
- copy_to_user((void __user *)ptegp, pteg, sizeof(pteg));
- }
+ mutex_unlock(&vcpu->kvm->arch.hpt_mutex);
if (!gpte->may_read)
return -EPERM;
return 0;
no_page_found:
+ mutex_unlock(&vcpu->kvm->arch.hpt_mutex);
return -ENOENT;
no_seg_found:
-
dprintk("KVM MMU: Trigger segment fault\n");
return -EINVAL;
}
@@ -520,6 +529,8 @@ static void kvmppc_mmu_book3s_64_tlbie(struct kvm_vcpu *vcpu, ulong va,
bool large)
{
u64 mask = 0xFFFFFFFFFULL;
+ long i;
+ struct kvm_vcpu *v;
dprintk("KVM MMU: tlbie(0x%lx)\n", va);
@@ -542,7 +553,9 @@ static void kvmppc_mmu_book3s_64_tlbie(struct kvm_vcpu *vcpu, ulong va,
if (large)
mask = 0xFFFFFF000ULL;
}
- kvmppc_mmu_pte_vflush(vcpu, va >> 12, mask);
+ /* flush this VA on all vcpus */
+ kvm_for_each_vcpu(i, v, vcpu->kvm)
+ kvmppc_mmu_pte_vflush(v, va >> 12, mask);
}
#ifdef CONFIG_PPC_64K_PAGES
diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
index 4d39820..f61d10d 100644
--- a/arch/powerpc/kvm/book3s_pr.c
+++ b/arch/powerpc/kvm/book3s_pr.c
@@ -1401,6 +1401,7 @@ int kvmppc_core_init_vm(struct kvm *kvm)
INIT_LIST_HEAD(&kvm->arch.spapr_tce_tables);
INIT_LIST_HEAD(&kvm->arch.rtas_tokens);
#endif
+ mutex_init(&kvm->arch.hpt_mutex);
if (firmware_has_feature(FW_FEATURE_SET_MODE)) {
spin_lock(&kvm_global_user_count_lock);
diff --git a/arch/powerpc/kvm/book3s_pr_papr.c b/arch/powerpc/kvm/book3s_pr_papr.c
index 38f1899..5efa97b 100644
--- a/arch/powerpc/kvm/book3s_pr_papr.c
+++ b/arch/powerpc/kvm/book3s_pr_papr.c
@@ -48,6 +48,7 @@ static int kvmppc_h_pr_enter(struct kvm_vcpu *vcpu)
pte_index &= ~7UL;
pteg_addr = get_pteg_addr(vcpu, pte_index);
+ mutex_lock(&vcpu->kvm->arch.hpt_mutex);
copy_from_user(pteg, (void __user *)pteg_addr, sizeof(pteg));
hpte = pteg;
@@ -74,6 +75,7 @@ static int kvmppc_h_pr_enter(struct kvm_vcpu *vcpu)
ret = H_SUCCESS;
done:
+ mutex_unlock(&vcpu->kvm->arch.hpt_mutex);
kvmppc_set_gpr(vcpu, 3, ret);
return EMULATE_DONE;
@@ -86,26 +88,31 @@ static int kvmppc_h_pr_remove(struct kvm_vcpu *vcpu)
unsigned long avpn = kvmppc_get_gpr(vcpu, 6);
unsigned long v = 0, pteg, rb;
unsigned long pte[2];
+ long int ret;
pteg = get_pteg_addr(vcpu, pte_index);
+ mutex_lock(&vcpu->kvm->arch.hpt_mutex);
copy_from_user(pte, (void __user *)pteg, sizeof(pte));
+ ret = H_NOT_FOUND;
if ((pte[0] & HPTE_V_VALID) == 0 ||
((flags & H_AVPN) && (pte[0] & ~0x7fUL) != avpn) ||
- ((flags & H_ANDCOND) && (pte[0] & avpn) != 0)) {
- kvmppc_set_gpr(vcpu, 3, H_NOT_FOUND);
- return EMULATE_DONE;
- }
+ ((flags & H_ANDCOND) && (pte[0] & avpn) != 0))
+ goto done;
copy_to_user((void __user *)pteg, &v, sizeof(v));
rb = compute_tlbie_rb(pte[0], pte[1], pte_index);
vcpu->arch.mmu.tlbie(vcpu, rb, rb & 1 ? true : false);
- kvmppc_set_gpr(vcpu, 3, H_SUCCESS);
+ ret = H_SUCCESS;
kvmppc_set_gpr(vcpu, 4, pte[0]);
kvmppc_set_gpr(vcpu, 5, pte[1]);
+ done:
+ mutex_unlock(&vcpu->kvm->arch.hpt_mutex);
+ kvmppc_set_gpr(vcpu, 3, ret);
+
return EMULATE_DONE;
}
@@ -133,6 +140,7 @@ static int kvmppc_h_pr_bulk_remove(struct kvm_vcpu *vcpu)
int paramnr = 4;
int ret = H_SUCCESS;
+ mutex_lock(&vcpu->kvm->arch.hpt_mutex);
for (i = 0; i < H_BULK_REMOVE_MAX_BATCH; i++) {
unsigned long tsh = kvmppc_get_gpr(vcpu, paramnr+(2*i));
unsigned long tsl = kvmppc_get_gpr(vcpu, paramnr+(2*i)+1);
@@ -181,6 +189,7 @@ static int kvmppc_h_pr_bulk_remove(struct kvm_vcpu *vcpu)
}
kvmppc_set_gpr(vcpu, paramnr+(2*i), tsh);
}
+ mutex_unlock(&vcpu->kvm->arch.hpt_mutex);
kvmppc_set_gpr(vcpu, 3, ret);
return EMULATE_DONE;
@@ -193,15 +202,16 @@ static int kvmppc_h_pr_protect(struct kvm_vcpu *vcpu)
unsigned long avpn = kvmppc_get_gpr(vcpu, 6);
unsigned long rb, pteg, r, v;
unsigned long pte[2];
+ long int ret;
pteg = get_pteg_addr(vcpu, pte_index);
+ mutex_lock(&vcpu->kvm->arch.hpt_mutex);
copy_from_user(pte, (void __user *)pteg, sizeof(pte));
+ ret = H_NOT_FOUND;
if ((pte[0] & HPTE_V_VALID) == 0 ||
- ((flags & H_AVPN) && (pte[0] & ~0x7fUL) != avpn)) {
- kvmppc_set_gpr(vcpu, 3, H_NOT_FOUND);
- return EMULATE_DONE;
- }
+ ((flags & H_AVPN) && (pte[0] & ~0x7fUL) != avpn))
+ goto done;
v = pte[0];
r = pte[1];
@@ -216,8 +226,11 @@ static int kvmppc_h_pr_protect(struct kvm_vcpu *vcpu)
rb = compute_tlbie_rb(v, r, pte_index);
vcpu->arch.mmu.tlbie(vcpu, rb, rb & 1 ? true : false);
copy_to_user((void __user *)pteg, pte, sizeof(pte));
+ ret = H_SUCCESS;
- kvmppc_set_gpr(vcpu, 3, H_SUCCESS);
+ done:
+ mutex_unlock(&vcpu->kvm->arch.hpt_mutex);
+ kvmppc_set_gpr(vcpu, 3, ret);
return EMULATE_DONE;
}
--
1.8.3.1
^ permalink raw reply related [flat|nested] 68+ messages in thread
* [PATCH 11/23] KVM: PPC: Book3S PR: Allocate kvm_vcpu structs from kvm_vcpu_cache
2013-08-06 4:12 [PATCH 00/23] Allow PR and HV KVM to coexist in one kernel Paul Mackerras
` (9 preceding siblings ...)
2013-08-06 4:21 ` [PATCH 10/23] KVM: PPC: Book3S PR: Make HPT accesses and updates SMP-safe Paul Mackerras
@ 2013-08-06 4:21 ` Paul Mackerras
2013-08-12 10:03 ` Aneesh Kumar K.V
2013-08-06 4:22 ` [PATCH 12/23] KVM: PPC: Book3S HV: Better handling of exceptions that happen in real mode Paul Mackerras
` (11 subsequent siblings)
22 siblings, 1 reply; 68+ messages in thread
From: Paul Mackerras @ 2013-08-06 4:21 UTC (permalink / raw)
To: Alexander Graf, Benjamin Herrenschmidt; +Cc: kvm-ppc, kvm
This makes PR KVM allocate its kvm_vcpu structs from the kvm_vcpu_cache
rather than having them embedded in the kvmppc_vcpu_book3s struct,
which is allocated with vzalloc. The reason is to reduce the
differences between PR and HV KVM in order to make it easier to have
them coexist in one kernel binary.
With this, the kvm_vcpu struct has a pointer to the kvmppc_vcpu_book3s
struct. The pointer to the kvmppc_book3s_shadow_vcpu struct has moved
from the kvmppc_vcpu_book3s struct to the kvm_vcpu struct.
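A minimal stand-alone sketch of the new ownership, with struct contents reduced to stubs: the vcpu is now the allocation unit and merely points at its book3s and shadow state, rather than living inside the book3s struct:
#include <stdlib.h>
struct kvmppc_vcpu_book3s { int stub; };		/* contents omitted */
struct kvmppc_book3s_shadow_vcpu { int stub; };
struct kvm_vcpu {
	struct kvmppc_vcpu_book3s *book3s;		/* was the containing struct */
	struct kvmppc_book3s_shadow_vcpu *shadow_vcpu;	/* moved here from book3s */
};
static struct kvm_vcpu *vcpu_create(void)
{
	struct kvm_vcpu *vcpu = calloc(1, sizeof(*vcpu));	/* kvm_vcpu_cache alloc */
	if (!vcpu)
		return NULL;
	vcpu->book3s = calloc(1, sizeof(*vcpu->book3s));	/* vzalloc in the patch */
	if (!vcpu->book3s) {
		free(vcpu);
		return NULL;
	}
	return vcpu;
}
int main(void)
{
	struct kvm_vcpu *vcpu = vcpu_create();
	if (vcpu) {
		free(vcpu->book3s);
		free(vcpu);
	}
	return 0;
}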
Signed-off-by: Paul Mackerras <paulus@samba.org>
---
arch/powerpc/include/asm/kvm_book3s.h | 4 +---
arch/powerpc/include/asm/kvm_book3s_32.h | 2 +-
arch/powerpc/include/asm/kvm_host.h | 5 +++++
arch/powerpc/kvm/book3s_32_mmu.c | 8 ++++----
arch/powerpc/kvm/book3s_64_mmu.c | 11 +++++------
arch/powerpc/kvm/book3s_pr.c | 29 ++++++++++++++++++-----------
6 files changed, 34 insertions(+), 25 deletions(-)
diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h
index 322b539..1b32f6c 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -70,8 +70,6 @@ struct hpte_cache {
};
struct kvmppc_vcpu_book3s {
- struct kvm_vcpu vcpu;
- struct kvmppc_book3s_shadow_vcpu *shadow_vcpu;
struct kvmppc_sid_map sid_map[SID_MAP_NUM];
struct {
u64 esid;
@@ -192,7 +190,7 @@ extern int kvmppc_h_pr(struct kvm_vcpu *vcpu, unsigned long cmd);
static inline struct kvmppc_vcpu_book3s *to_book3s(struct kvm_vcpu *vcpu)
{
- return container_of(vcpu, struct kvmppc_vcpu_book3s, vcpu);
+ return vcpu->arch.book3s;
}
extern void kvm_return_point(void);
diff --git a/arch/powerpc/include/asm/kvm_book3s_32.h b/arch/powerpc/include/asm/kvm_book3s_32.h
index ce0ef6c..c720e0b 100644
--- a/arch/powerpc/include/asm/kvm_book3s_32.h
+++ b/arch/powerpc/include/asm/kvm_book3s_32.h
@@ -22,7 +22,7 @@
static inline struct kvmppc_book3s_shadow_vcpu *svcpu_get(struct kvm_vcpu *vcpu)
{
- return to_book3s(vcpu)->shadow_vcpu;
+ return vcpu->arch.shadow_vcpu;
}
static inline void svcpu_put(struct kvmppc_book3s_shadow_vcpu *svcpu)
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index c37207f..4d83972 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -91,6 +91,9 @@ struct lppaca;
struct slb_shadow;
struct dtl_entry;
+struct kvmppc_vcpu_book3s;
+struct kvmppc_book3s_shadow_vcpu;
+
struct kvm_vm_stat {
u32 remote_tlb_flush;
};
@@ -409,6 +412,8 @@ struct kvm_vcpu_arch {
int slb_max; /* 1 + index of last valid entry in slb[] */
int slb_nr; /* total number of entries in SLB */
struct kvmppc_mmu mmu;
+ struct kvmppc_vcpu_book3s *book3s;
+ struct kvmppc_book3s_shadow_vcpu *shadow_vcpu;
#endif
ulong gpr[32];
diff --git a/arch/powerpc/kvm/book3s_32_mmu.c b/arch/powerpc/kvm/book3s_32_mmu.c
index 856af98..b14af6d 100644
--- a/arch/powerpc/kvm/book3s_32_mmu.c
+++ b/arch/powerpc/kvm/book3s_32_mmu.c
@@ -111,10 +111,11 @@ static void kvmppc_mmu_book3s_32_reset_msr(struct kvm_vcpu *vcpu)
kvmppc_set_msr(vcpu, 0);
}
-static hva_t kvmppc_mmu_book3s_32_get_pteg(struct kvmppc_vcpu_book3s *vcpu_book3s,
+static hva_t kvmppc_mmu_book3s_32_get_pteg(struct kvm_vcpu *vcpu,
u32 sre, gva_t eaddr,
bool primary)
{
+ struct kvmppc_vcpu_book3s *vcpu_book3s = to_book3s(vcpu);
u32 page, hash, pteg, htabmask;
hva_t r;
@@ -132,7 +133,7 @@ static hva_t kvmppc_mmu_book3s_32_get_pteg(struct kvmppc_vcpu_book3s *vcpu_book3
kvmppc_get_pc(&vcpu_book3s->vcpu), eaddr, vcpu_book3s->sdr1, pteg,
sr_vsid(sre));
- r = gfn_to_hva(vcpu_book3s->vcpu.kvm, pteg >> PAGE_SHIFT);
+ r = gfn_to_hva(vcpu->kvm, pteg >> PAGE_SHIFT);
if (kvm_is_error_hva(r))
return r;
return r | (pteg & ~PAGE_MASK);
@@ -203,7 +204,6 @@ static int kvmppc_mmu_book3s_32_xlate_pte(struct kvm_vcpu *vcpu, gva_t eaddr,
struct kvmppc_pte *pte, bool data,
bool primary)
{
- struct kvmppc_vcpu_book3s *vcpu_book3s = to_book3s(vcpu);
u32 sre;
hva_t ptegp;
u32 pteg[16];
@@ -218,7 +218,7 @@ static int kvmppc_mmu_book3s_32_xlate_pte(struct kvm_vcpu *vcpu, gva_t eaddr,
pte->vpage = kvmppc_mmu_book3s_32_ea_to_vp(vcpu, eaddr, data);
- ptegp = kvmppc_mmu_book3s_32_get_pteg(vcpu_book3s, sre, eaddr, primary);
+ ptegp = kvmppc_mmu_book3s_32_get_pteg(vcpu, sre, eaddr, primary);
if (kvm_is_error_hva(ptegp)) {
printk(KERN_INFO "KVM: Invalid PTEG!\n");
goto no_page_found;
diff --git a/arch/powerpc/kvm/book3s_64_mmu.c b/arch/powerpc/kvm/book3s_64_mmu.c
index 26a57ca..86925da 100644
--- a/arch/powerpc/kvm/book3s_64_mmu.c
+++ b/arch/powerpc/kvm/book3s_64_mmu.c
@@ -130,11 +130,11 @@ static u32 kvmppc_mmu_book3s_64_get_page(struct kvmppc_slb *slbe, gva_t eaddr)
return ((eaddr & kvmppc_slb_offset_mask(slbe)) >> p);
}
-static hva_t kvmppc_mmu_book3s_64_get_pteg(
- struct kvmppc_vcpu_book3s *vcpu_book3s,
+static hva_t kvmppc_mmu_book3s_64_get_pteg(struct kvm_vcpu *vcpu,
struct kvmppc_slb *slbe, gva_t eaddr,
bool second)
{
+ struct kvmppc_vcpu_book3s *vcpu_book3s = to_book3s(vcpu);
u64 hash, pteg, htabsize;
u32 ssize;
hva_t r;
@@ -159,10 +159,10 @@ static hva_t kvmppc_mmu_book3s_64_get_pteg(
/* When running a PAPR guest, SDR1 contains a HVA address instead
of a GPA */
- if (vcpu_book3s->vcpu.arch.papr_enabled)
+ if (vcpu->arch.papr_enabled)
r = pteg;
else
- r = gfn_to_hva(vcpu_book3s->vcpu.kvm, pteg >> PAGE_SHIFT);
+ r = gfn_to_hva(vcpu->kvm, pteg >> PAGE_SHIFT);
if (kvm_is_error_hva(r))
return r;
@@ -208,7 +208,6 @@ static int decode_pagesize(struct kvmppc_slb *slbe, u64 r)
static int kvmppc_mmu_book3s_64_xlate(struct kvm_vcpu *vcpu, gva_t eaddr,
struct kvmppc_pte *gpte, bool data)
{
- struct kvmppc_vcpu_book3s *vcpu_book3s = to_book3s(vcpu);
struct kvmppc_slb *slbe;
hva_t ptegp;
u64 pteg[16];
@@ -260,7 +259,7 @@ static int kvmppc_mmu_book3s_64_xlate(struct kvm_vcpu *vcpu, gva_t eaddr,
mutex_lock(&vcpu->kvm->arch.hpt_mutex);
do_second:
- ptegp = kvmppc_mmu_book3s_64_get_pteg(vcpu_book3s, slbe, eaddr, second);
+ ptegp = kvmppc_mmu_book3s_64_get_pteg(vcpu, slbe, eaddr, second);
if (kvm_is_error_hva(ptegp))
goto no_page_found;
diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
index f61d10d..5b06a70 100644
--- a/arch/powerpc/kvm/book3s_pr.c
+++ b/arch/powerpc/kvm/book3s_pr.c
@@ -66,7 +66,7 @@ void kvmppc_core_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
#endif
vcpu->cpu = smp_processor_id();
#ifdef CONFIG_PPC_BOOK3S_32
- current->thread.kvm_shadow_vcpu = to_book3s(vcpu)->shadow_vcpu;
+ current->thread.kvm_shadow_vcpu = vcpu->arch.shadow_vcpu;
#endif
}
@@ -1123,17 +1123,22 @@ struct kvm_vcpu *kvmppc_core_vcpu_create(struct kvm *kvm, unsigned int id)
int err = -ENOMEM;
unsigned long p;
+ vcpu = kmem_cache_zalloc(kvm_vcpu_cache, GFP_KERNEL);
+ if (!vcpu)
+ goto out;
+
vcpu_book3s = vzalloc(sizeof(struct kvmppc_vcpu_book3s));
if (!vcpu_book3s)
- goto out;
+ goto free_vcpu;
+ vcpu->arch.book3s = vcpu_book3s;
#ifdef CONFIG_KVM_BOOK3S_32
- vcpu_book3s->shadow_vcpu =
+ vcpu->arch.shadow_vcpu =
kzalloc(sizeof(*vcpu_book3s->shadow_vcpu), GFP_KERNEL);
- if (!vcpu_book3s->shadow_vcpu)
- goto free_vcpu;
+ if (!vcpu->arch.shadow_vcpu)
+ goto free_vcpu3s;
#endif
- vcpu = &vcpu_book3s->vcpu;
+
err = kvm_vcpu_init(vcpu, kvm, id);
if (err)
goto free_shadow_vcpu;
@@ -1171,10 +1176,12 @@ uninit_vcpu:
kvm_vcpu_uninit(vcpu);
free_shadow_vcpu:
#ifdef CONFIG_KVM_BOOK3S_32
- kfree(vcpu_book3s->shadow_vcpu);
-free_vcpu:
+ kfree(vcpu->arch.shadow_vcpu);
+free_vcpu3s:
#endif
vfree(vcpu_book3s);
+free_vcpu:
+ kmem_cache_free(kvm_vcpu_cache, vcpu);
out:
return ERR_PTR(err);
}
@@ -1185,8 +1192,9 @@ void kvmppc_core_vcpu_free(struct kvm_vcpu *vcpu)
free_page((unsigned long)vcpu->arch.shared & PAGE_MASK);
kvm_vcpu_uninit(vcpu);
- kfree(vcpu_book3s->shadow_vcpu);
+ kfree(vcpu->arch.shadow_vcpu);
vfree(vcpu_book3s);
+ kmem_cache_free(kvm_vcpu_cache, vcpu);
}
int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
@@ -1431,8 +1439,7 @@ static int kvmppc_book3s_init(void)
{
int r;
- r = kvm_init(NULL, sizeof(struct kvmppc_vcpu_book3s), 0,
- THIS_MODULE);
+ r = kvm_init(NULL, sizeof(struct kvm_vcpu), 0, THIS_MODULE);
if (r)
return r;
--
1.8.3.1
^ permalink raw reply related [flat|nested] 68+ messages in thread
* [PATCH 12/23] KVM: PPC: Book3S HV: Better handling of exceptions that happen in real mode
2013-08-06 4:12 [PATCH 00/23] Allow PR and HV KVM to coexist in one kernel Paul Mackerras
` (10 preceding siblings ...)
2013-08-06 4:21 ` [PATCH 11/23] KVM: PPC: Book3S PR: Allocate kvm_vcpu structs from kvm_vcpu_cache Paul Mackerras
@ 2013-08-06 4:22 ` Paul Mackerras
2013-08-06 4:22 ` [PATCH 13/23] KVM: PPC: Book3S: Move skip-interrupt handlers to common code Paul Mackerras
` (10 subsequent siblings)
22 siblings, 0 replies; 68+ messages in thread
From: Paul Mackerras @ 2013-08-06 4:22 UTC (permalink / raw)
To: Alexander Graf, Benjamin Herrenschmidt; +Cc: kvm-ppc, kvm
When an interrupt or exception happens in the guest that comes to the
host, the CPU goes to hypervisor real mode (MMU off) to handle the
exception but doesn't change the MMU context. After saving a few
registers, we then clear the "in guest" flag. If, for any reason,
we get an exception in the real-mode code, it is then handled
by the normal kernel exception handlers, which turn the MMU on. This
is disastrous if the MMU is still set to the guest context, since we
end up executing instructions from random places in the guest kernel
with hypervisor privilege.
In order to catch this situation, we define a new value for the "in guest"
flag, KVM_GUEST_MODE_HOST_HV, to indicate that we are in hypervisor real
mode with guest MMU context. If the "in guest" flag is set to this value,
we branch off to an emergency handler. For the moment, this just does
a branch to self to stop the CPU from doing anything further.
While we're here, we define another new flag value to indicate that we
are in a HV guest, as distinct from a PR guest. This will be useful
when we have a kernel that can support both PR and HV guests concurrently.
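In C terms, the new state values and the check now done at interrupt entry amount to something like this sketch; the values mirror the kvm_asm.h hunk below, and the branch-target name follows the patch:
#include <stdio.h>
enum kvm_guest_mode {		/* mirrors the #defines added below */
	KVM_GUEST_MODE_NONE	= 0,
	KVM_GUEST_MODE_GUEST	= 1,
	KVM_GUEST_MODE_SKIP	= 2,
	KVM_GUEST_MODE_GUEST_HV	= 3,
	KVM_GUEST_MODE_HOST_HV	= 4,
};
/* conceptual version of the check kvmppc_interrupt now performs */
static const char *interrupt_entry(enum kvm_guest_mode in_guest)
{
	if (in_guest == KVM_GUEST_MODE_HOST_HV)
		return "kvmppc_bad_host_intr";	/* MMU still in guest context */
	return "normal guest exit path";
}
int main(void)
{
	printf("%s\n", interrupt_entry(KVM_GUEST_MODE_GUEST_HV));
	printf("%s\n", interrupt_entry(KVM_GUEST_MODE_HOST_HV));
	return 0;
}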
Signed-off-by: Paul Mackerras <paulus@samba.org>
---
arch/powerpc/include/asm/kvm_asm.h | 2 ++
arch/powerpc/include/asm/kvm_book3s_asm.h | 1 +
arch/powerpc/kernel/asm-offsets.c | 1 +
arch/powerpc/kvm/book3s_hv_rmhandlers.S | 38 +++++++++++++++++++++++--------
4 files changed, 33 insertions(+), 9 deletions(-)
diff --git a/arch/powerpc/include/asm/kvm_asm.h b/arch/powerpc/include/asm/kvm_asm.h
index 3d70b7e..9ca0228 100644
--- a/arch/powerpc/include/asm/kvm_asm.h
+++ b/arch/powerpc/include/asm/kvm_asm.h
@@ -137,6 +137,8 @@
#define KVM_GUEST_MODE_NONE 0
#define KVM_GUEST_MODE_GUEST 1
#define KVM_GUEST_MODE_SKIP 2
+#define KVM_GUEST_MODE_GUEST_HV 3
+#define KVM_GUEST_MODE_HOST_HV 4
#define KVM_INST_FETCH_FAILED -1
diff --git a/arch/powerpc/include/asm/kvm_book3s_asm.h b/arch/powerpc/include/asm/kvm_book3s_asm.h
index 4141409..360742a 100644
--- a/arch/powerpc/include/asm/kvm_book3s_asm.h
+++ b/arch/powerpc/include/asm/kvm_book3s_asm.h
@@ -79,6 +79,7 @@ struct kvmppc_host_state {
ulong vmhandler;
ulong scratch0;
ulong scratch1;
+ ulong scratch2;
u8 in_guest;
u8 restore_hid5;
u8 napping;
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index 14a8004..cbd9366 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -574,6 +574,7 @@ int main(void)
HSTATE_FIELD(HSTATE_VMHANDLER, vmhandler);
HSTATE_FIELD(HSTATE_SCRATCH0, scratch0);
HSTATE_FIELD(HSTATE_SCRATCH1, scratch1);
+ HSTATE_FIELD(HSTATE_SCRATCH2, scratch2);
HSTATE_FIELD(HSTATE_IN_GUEST, in_guest);
HSTATE_FIELD(HSTATE_RESTORE_HID5, restore_hid5);
HSTATE_FIELD(HSTATE_NAPPING, napping);
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index 60dce5b..cf3d045 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -266,6 +266,9 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_206)
mtspr SPRN_DAR, r5
mtspr SPRN_DSISR, r6
+ li r6, KVM_GUEST_MODE_HOST_HV
+ stb r6, HSTATE_IN_GUEST(r13)
+
BEGIN_FTR_SECTION
/* Restore AMR and UAMOR, set AMOR to all 1s */
ld r5,VCPU_AMR(r4)
@@ -533,7 +536,7 @@ fast_guest_return:
mtspr SPRN_HSRR1,r11
/* Activate guest mode, so faults get handled by KVM */
- li r9, KVM_GUEST_MODE_GUEST
+ li r9, KVM_GUEST_MODE_GUEST_HV
stb r9, HSTATE_IN_GUEST(r13)
/* Enter guest */
@@ -585,8 +588,15 @@ kvmppc_interrupt:
* guest CR, R12 saved in shadow VCPU SCRATCH1/0
* guest R13 saved in SPRN_SCRATCH0
*/
- /* abuse host_r2 as third scratch area; we get r2 from PACATOC(r13) */
- std r9, HSTATE_HOST_R2(r13)
+ std r9, HSTATE_SCRATCH2(r13)
+
+ lbz r9, HSTATE_IN_GUEST(r13)
+ cmpwi r9, KVM_GUEST_MODE_HOST_HV
+ beq kvmppc_bad_host_intr
+ /* We're now back in the host but in guest MMU context */
+ li r9, KVM_GUEST_MODE_HOST_HV
+ stb r9, HSTATE_IN_GUEST(r13)
+
ld r9, HSTATE_KVM_VCPU(r13)
/* Save registers */
@@ -600,7 +610,7 @@ kvmppc_interrupt:
std r6, VCPU_GPR(R6)(r9)
std r7, VCPU_GPR(R7)(r9)
std r8, VCPU_GPR(R8)(r9)
- ld r0, HSTATE_HOST_R2(r13)
+ ld r0, HSTATE_SCRATCH2(r13)
std r0, VCPU_GPR(R9)(r9)
std r10, VCPU_GPR(R10)(r9)
std r11, VCPU_GPR(R11)(r9)
@@ -634,10 +644,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_CFAR)
std r3, VCPU_GPR(R13)(r9)
std r4, VCPU_LR(r9)
- /* Unset guest mode */
- li r0, KVM_GUEST_MODE_NONE
- stb r0, HSTATE_IN_GUEST(r13)
-
stw r12,VCPU_TRAP(r9)
/* Save HEIR (HV emulation assist reg) in last_inst
@@ -1050,6 +1056,10 @@ BEGIN_FTR_SECTION
mtspr SPRN_AMR,r6
END_FTR_SECTION_IFSET(CPU_FTR_ARCH_206)
+ /* Unset guest mode */
+ li r0, KVM_GUEST_MODE_NONE
+ stb r0, HSTATE_IN_GUEST(r13)
+
/* Switch DSCR back to host value */
BEGIN_FTR_SECTION
mfspr r8, SPRN_DSCR
@@ -1321,7 +1331,7 @@ fast_interrupt_c_return:
stw r8, VCPU_LAST_INST(r9)
/* Unset guest mode. */
- li r0, KVM_GUEST_MODE_NONE
+ li r0, KVM_GUEST_MODE_HOST_HV
stb r0, HSTATE_IN_GUEST(r13)
b guest_exit_cont
@@ -1696,6 +1706,8 @@ secondary_too_late:
cmpwi r3,0
bne 13b
HMT_MEDIUM
+ li r0, KVM_GUEST_MODE_NONE
+ stb r0, HSTATE_IN_GUEST(r13)
ld r11,PACA_SLBSHADOWPTR(r13)
.rept SLB_NUM_BOLTED
@@ -1867,3 +1879,11 @@ END_FTR_SECTION_IFSET(CPU_FTR_ALTIVEC)
lwz r7,VCPU_VRSAVE(r4)
mtspr SPRN_VRSAVE,r7
blr
+
+/*
+ * We come here if we get any exception or interrupt while we are
+ * executing host real mode code while in guest MMU context.
+ * For now just spin, but we should do something better.
+ */
+kvmppc_bad_host_intr:
+ b .
--
1.8.3.1
^ permalink raw reply related [flat|nested] 68+ messages in thread
* [PATCH 13/23] KVM: PPC: Book3S: Move skip-interrupt handlers to common code
2013-08-06 4:12 [PATCH 00/23] Allow PR and HV KVM to coexist in one kernel Paul Mackerras
` (11 preceding siblings ...)
2013-08-06 4:22 ` [PATCH 12/23] KVM: PPC: Book3S HV: Better handling of exceptions that happen in real mode Paul Mackerras
@ 2013-08-06 4:22 ` Paul Mackerras
2013-08-06 4:23 ` [PATCH 14/23] KVM: PPC: Book3S PR: Delay disabling relocation-on interrupts Paul Mackerras
` (9 subsequent siblings)
22 siblings, 0 replies; 68+ messages in thread
From: Paul Mackerras @ 2013-08-06 4:22 UTC (permalink / raw)
To: Alexander Graf, Benjamin Herrenschmidt; +Cc: kvm-ppc, kvm
Both PR and HV KVM have separate, identical copies of the
kvmppc_skip_interrupt and kvmppc_skip_Hinterrupt handlers that are
used for the situation where an interrupt happens when loading the
instruction that caused an exit from the guest. To eliminate this
duplication and make it easier to compile in both PR and HV KVM,
this moves this code to arch/powerpc/kernel/exceptions-64s.S along
with other kernel interrupt handler code.
Signed-off-by: Paul Mackerras <paulus@samba.org>
---
arch/powerpc/kernel/exceptions-64s.S | 26 ++++++++++++++++++++++++++
arch/powerpc/kvm/book3s_hv_rmhandlers.S | 24 ------------------------
arch/powerpc/kvm/book3s_rmhandlers.S | 26 --------------------------
3 files changed, 26 insertions(+), 50 deletions(-)
diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index 40e4a17..e3c8a03 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -636,6 +636,32 @@ slb_miss_user_pseries:
b . /* prevent spec. execution */
#endif /* __DISABLED__ */
+#ifdef CONFIG_KVM_BOOK3S_64_HANDLER
+kvmppc_skip_interrupt:
+ /*
+ * Here all GPRs are unchanged from when the interrupt happened
+ * except for r13, which is saved in SPRG_SCRATCH0.
+ */
+ mfspr r13, SPRN_SRR0
+ addi r13, r13, 4
+ mtspr SPRN_SRR0, r13
+ GET_SCRATCH0(r13)
+ rfid
+ b .
+
+kvmppc_skip_Hinterrupt:
+ /*
+ * Here all GPRs are unchanged from when the interrupt happened
+ * except for r13, which is saved in SPRG_SCRATCH0.
+ */
+ mfspr r13, SPRN_HSRR0
+ addi r13, r13, 4
+ mtspr SPRN_HSRR0, r13
+ GET_SCRATCH0(r13)
+ hrfid
+ b .
+#endif
+
/*
* Code from here down to __end_handlers is invoked from the
* exception prologs above. Because the prologs assemble the
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index cf3d045..af9ba85 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -29,30 +29,6 @@
#include <asm/kvm_book3s_asm.h>
#include <asm/mmu-hash64.h>
-/*****************************************************************************
- * *
- * Real Mode handlers that need to be in the linear mapping *
- * *
- ****************************************************************************/
-
- .globl kvmppc_skip_interrupt
-kvmppc_skip_interrupt:
- mfspr r13,SPRN_SRR0
- addi r13,r13,4
- mtspr SPRN_SRR0,r13
- GET_SCRATCH0(r13)
- rfid
- b .
-
- .globl kvmppc_skip_Hinterrupt
-kvmppc_skip_Hinterrupt:
- mfspr r13,SPRN_HSRR0
- addi r13,r13,4
- mtspr SPRN_HSRR0,r13
- GET_SCRATCH0(r13)
- hrfid
- b .
-
/*
* Call kvmppc_hv_entry in real mode.
* Must be called with interrupts hard-disabled.
diff --git a/arch/powerpc/kvm/book3s_rmhandlers.S b/arch/powerpc/kvm/book3s_rmhandlers.S
index b64d7f9..b746c38 100644
--- a/arch/powerpc/kvm/book3s_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_rmhandlers.S
@@ -38,32 +38,6 @@
#define FUNC(name) GLUE(.,name)
- .globl kvmppc_skip_interrupt
-kvmppc_skip_interrupt:
- /*
- * Here all GPRs are unchanged from when the interrupt happened
- * except for r13, which is saved in SPRG_SCRATCH0.
- */
- mfspr r13, SPRN_SRR0
- addi r13, r13, 4
- mtspr SPRN_SRR0, r13
- GET_SCRATCH0(r13)
- rfid
- b .
-
- .globl kvmppc_skip_Hinterrupt
-kvmppc_skip_Hinterrupt:
- /*
- * Here all GPRs are unchanged from when the interrupt happened
- * except for r13, which is saved in SPRG_SCRATCH0.
- */
- mfspr r13, SPRN_HSRR0
- addi r13, r13, 4
- mtspr SPRN_HSRR0, r13
- GET_SCRATCH0(r13)
- hrfid
- b .
-
#elif defined(CONFIG_PPC_BOOK3S_32)
#define FUNC(name) name
--
1.8.3.1
^ permalink raw reply related [flat|nested] 68+ messages in thread
* [PATCH 14/23] KVM: PPC: Book3S PR: Delay disabling relocation-on interrupts
2013-08-06 4:12 [PATCH 00/23] Allow PR and HV KVM to coexist in one kernel Paul Mackerras
` (12 preceding siblings ...)
2013-08-06 4:22 ` [PATCH 13/23] KVM: PPC: Book3S: Move skip-interrupt handlers to common code Paul Mackerras
@ 2013-08-06 4:23 ` Paul Mackerras
2013-08-30 16:30 ` Alexander Graf
2013-08-06 4:24 ` [PATCH 15/23] KVM: PPC: Book3S: Rename symbols that exist in both PR and HV KVM Paul Mackerras
` (8 subsequent siblings)
22 siblings, 1 reply; 68+ messages in thread
From: Paul Mackerras @ 2013-08-06 4:23 UTC (permalink / raw)
To: Alexander Graf, Benjamin Herrenschmidt; +Cc: kvm-ppc, kvm
When we are running a PR KVM guest on POWER8, we have to disable the
new POWER8 feature of taking interrupts with relocation on, that is,
of taking interrupts without disabling the MMU, because the SLB does
not contain the normal kernel SLB entries while in the guest.
Currently we disable relocation-on interrupts when a PR guest is
created, and leave it disabled until there are no more PR guests in
existence.
This defers the disabling of relocation-on interrupts until the first
time a PR KVM guest vcpu is run. The reason is that in future we will
support both PR and HV guests in the same kernel, and this will avoid
disabling relocation-on interrupts unnecessarily for guests which turn
out to be HV guests, as we will not know at VM creation time whether
it will be a PR or a HV guest.
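The deferred-disable pattern can be sketched in userspace as follows; pthread mutexes stand in for kvm->lock and the global spinlock, and the expensive mode switch is reduced to a printout, but the once-per-VM check on the vcpu-run path follows the patch:
#include <stdio.h>
#include <stdbool.h>
#include <pthread.h>
static unsigned int global_user_count;
static pthread_mutex_t count_lock = PTHREAD_MUTEX_INITIALIZER;
struct vm {
	bool relon_disabled;
	pthread_mutex_t lock;		/* stands in for kvm->lock */
};
static void disable_relon(struct vm *vm)
{
	pthread_mutex_lock(&vm->lock);
	if (!vm->relon_disabled) {
		pthread_mutex_lock(&count_lock);
		if (++global_user_count == 1)
			puts("disabling relocation-on interrupts (expensive)");
		pthread_mutex_unlock(&count_lock);
		vm->relon_disabled = true;
	}
	pthread_mutex_unlock(&vm->lock);
}
static void vcpu_run(struct vm *vm)
{
	if (!vm->relon_disabled)	/* cheap check on every guest entry */
		disable_relon(vm);
	/* ... enter the guest ... */
}
int main(void)
{
	struct vm vm = { .relon_disabled = false,
			 .lock = PTHREAD_MUTEX_INITIALIZER };
	vcpu_run(&vm);	/* first run does the switch */
	vcpu_run(&vm);	/* subsequent runs are a no-op */
	return 0;
}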
Signed-off-by: Paul Mackerras <paulus@samba.org>
---
arch/powerpc/include/asm/kvm_host.h | 1 +
arch/powerpc/kvm/book3s_pr.c | 71 ++++++++++++++++++++++++++-----------
2 files changed, 52 insertions(+), 20 deletions(-)
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 4d83972..c012db2 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -264,6 +264,7 @@ struct kvm_arch {
#endif /* CONFIG_KVM_BOOK3S_64_HV */
#ifdef CONFIG_KVM_BOOK3S_PR
struct mutex hpt_mutex;
+ bool relon_disabled;
#endif
#ifdef CONFIG_PPC_BOOK3S_64
struct list_head spapr_tce_tables;
diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
index 5b06a70..2759ddc 100644
--- a/arch/powerpc/kvm/book3s_pr.c
+++ b/arch/powerpc/kvm/book3s_pr.c
@@ -1197,6 +1197,47 @@ void kvmppc_core_vcpu_free(struct kvm_vcpu *vcpu)
kmem_cache_free(kvm_vcpu_cache, vcpu);
}
+/*
+ * On POWER8, we have to disable relocation-on interrupts while
+ * we are in the guest, since the guest doesn't have the normal
+ * kernel SLB contents. Since disabling relocation-on interrupts
+ * is a fairly heavy-weight operation, we do it once when starting
+ * the first guest vcpu and leave it disabled until the last guest
+ * has been destroyed.
+ */
+static unsigned int kvm_global_user_count = 0;
+static DEFINE_SPINLOCK(kvm_global_user_count_lock);
+
+static void disable_relon_interrupts(struct kvm *kvm)
+{
+ mutex_lock(&kvm->lock);
+ if (!kvm->arch.relon_disabled) {
+ if (firmware_has_feature(FW_FEATURE_SET_MODE)) {
+ spin_lock(&kvm_global_user_count_lock);
+ if (++kvm_global_user_count == 1)
+ pSeries_disable_reloc_on_exc();
+ spin_unlock(&kvm_global_user_count_lock);
+ }
+ /* order disabling above with setting relon_disabled */
+ smp_mb();
+ kvm->arch.relon_disabled = true;
+ }
+ mutex_unlock(&kvm->lock);
+}
+
+static void enable_relon_interrupts(struct kvm *kvm)
+{
+ if (kvm->arch.relon_disabled &&
+ firmware_has_feature(FW_FEATURE_SET_MODE)) {
+ spin_lock(&kvm_global_user_count_lock);
+ BUG_ON(kvm_global_user_count == 0);
+ if (--kvm_global_user_count == 0)
+ pSeries_enable_reloc_on_exc();
+ spin_unlock(&kvm_global_user_count_lock);
+ }
+ kvm->arch.relon_disabled = false;
+}
+
int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
{
int ret;
@@ -1234,6 +1275,9 @@ int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
goto out;
}
+ if (!vcpu->kvm->arch.relon_disabled)
+ disable_relon_interrupts(vcpu->kvm);
+
/* Save FPU state in stack */
if (current->thread.regs->msr & MSR_FP)
giveup_fpu(current);
@@ -1400,9 +1444,6 @@ void kvmppc_core_flush_memslot(struct kvm *kvm, struct kvm_memory_slot *memslot)
{
}
-static unsigned int kvm_global_user_count = 0;
-static DEFINE_SPINLOCK(kvm_global_user_count_lock);
-
int kvmppc_core_init_vm(struct kvm *kvm)
{
#ifdef CONFIG_PPC64
@@ -1411,28 +1452,18 @@ int kvmppc_core_init_vm(struct kvm *kvm)
#endif
mutex_init(&kvm->arch.hpt_mutex);
- if (firmware_has_feature(FW_FEATURE_SET_MODE)) {
- spin_lock(&kvm_global_user_count_lock);
- if (++kvm_global_user_count == 1)
- pSeries_disable_reloc_on_exc();
- spin_unlock(&kvm_global_user_count_lock);
- }
+ /*
+ * If we don't have relocation-on interrupts at all,
+ * then we can consider them to be already disabled.
+ */
+ kvm->arch.relon_disabled = !firmware_has_feature(FW_FEATURE_SET_MODE);
+
return 0;
}
void kvmppc_core_destroy_vm(struct kvm *kvm)
{
-#ifdef CONFIG_PPC64
- WARN_ON(!list_empty(&kvm->arch.spapr_tce_tables));
-#endif
-
- if (firmware_has_feature(FW_FEATURE_SET_MODE)) {
- spin_lock(&kvm_global_user_count_lock);
- BUG_ON(kvm_global_user_count == 0);
- if (--kvm_global_user_count == 0)
- pSeries_enable_reloc_on_exc();
- spin_unlock(&kvm_global_user_count_lock);
- }
+ enable_relon_interrupts(kvm);
}
static int kvmppc_book3s_init(void)
--
1.8.3.1
^ permalink raw reply related [flat|nested] 68+ messages in thread
* [PATCH 15/23] KVM: PPC: Book3S: Rename symbols that exist in both PR and HV KVM
2013-08-06 4:12 [PATCH 00/23] Allow PR and HV KVM to coexist in one kernel Paul Mackerras
` (13 preceding siblings ...)
2013-08-06 4:23 ` [PATCH 14/23] KVM: PPC: Book3S PR: Delay disabling relocation-on interrupts Paul Mackerras
@ 2013-08-06 4:24 ` Paul Mackerras
2013-08-06 4:24 ` [PATCH 16/23] KVM: PPC: Book3S: Merge implementations of KVM_PPC_GET_SMMU_INFO ioctl Paul Mackerras
` (7 subsequent siblings)
22 siblings, 0 replies; 68+ messages in thread
From: Paul Mackerras @ 2013-08-06 4:24 UTC (permalink / raw)
To: Alexander Graf, Benjamin Herrenschmidt; +Cc: kvm-ppc, kvm
This renames almost all of the symbols that exist in both PR and HV
KVM, as one step towards making it possible to compile both in one
kernel image. Symbols in the PR KVM implementation get "_pr"
appended, and those in the HV KVM implementation get "_hv". Then,
in book3s.c, we add a function with the name without the suffix and
arrange for it to call the appropriate suffixed function using either
the VCPU_DO_PR/VCPU_DO_HV pair of macros or the DO_IF_PR/DO_IF_HV
pair. These macros take a "kvm" or "vcpu" argument that is currently
unused, but which will be used in future patches.
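Condensed from the book3s.c hunk further down, the dispatch boils down to the pattern below; the struct contents, function bodies and the forced CONFIG define are toy stand-ins so the sketch compiles on its own:
#include <stdio.h>
#define CONFIG_KVM_BOOK3S_PR 1		/* pretend PR was selected in Kconfig */
#ifdef CONFIG_KVM_BOOK3S_PR
#define VCPU_DO_PR(vcpu, x) x
#define VCPU_DO_HV(vcpu, x)
#else
#define VCPU_DO_PR(vcpu, x)
#define VCPU_DO_HV(vcpu, x) x
#endif
struct kvm_vcpu { int id; };
static void vcpu_load_pr(struct kvm_vcpu *vcpu, int cpu)
{
	printf("PR load: vcpu %d on cpu %d\n", vcpu->id, cpu);
}
static void vcpu_load_hv(struct kvm_vcpu *vcpu, int cpu)
{
	printf("HV load: vcpu %d on cpu %d\n", vcpu->id, cpu);
}
/* the un-suffixed entry point calls whichever flavour is built in */
static void vcpu_load(struct kvm_vcpu *vcpu, int cpu)
{
	VCPU_DO_PR(vcpu, vcpu_load_pr(vcpu, cpu));
	VCPU_DO_HV(vcpu, vcpu_load_hv(vcpu, cpu));
}
int main(void)
{
	struct kvm_vcpu vcpu = { .id = 0 };
	vcpu_load(&vcpu, 1);
	return 0;
}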
There are a few exceptions to this general scheme:
* kvmppc_core_free_memslot() and kvmppc_core_create_memslot() don't
take a kvm or vcpu argument, so for them we call the HV function
if HV is selected in the kernel config (the PR implementation of
these is empty).
* kvmppc_core_init_vm() and kvmppc_core_destroy_vm() have some common
code factored into the book3s.c implementation.
* kvmppc_book3s_init(), kvmppc_book3s_exit() and
kvmppc_core_check_processor_compat() have been moved entirely
into book3s.c
* kvmppc_interrupt and kvm_vm_ioctl_get_smmu_info() are not handled
here.
* The kvmppc_handler_highmem label is unused and is removed here.
* kvm_return_point() is declared but not defined or used anywhere,
so this removes the declaration.
Signed-off-by: Paul Mackerras <paulus@samba.org>
---
arch/powerpc/include/asm/kvm_book3s.h | 74 +++++++++-
arch/powerpc/kvm/book3s.c | 232 +++++++++++++++++++++++++++++++-
arch/powerpc/kvm/book3s_32_mmu_host.c | 2 +-
arch/powerpc/kvm/book3s_64_mmu_host.c | 2 +-
arch/powerpc/kvm/book3s_64_mmu_hv.c | 17 +--
arch/powerpc/kvm/book3s_hv.c | 106 +++++----------
arch/powerpc/kvm/book3s_hv_interrupts.S | 3 -
arch/powerpc/kvm/book3s_interrupts.S | 5 +-
arch/powerpc/kvm/book3s_pr.c | 116 ++++------------
9 files changed, 374 insertions(+), 183 deletions(-)
diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h
index 1b32f6c..476d862 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -24,6 +24,8 @@
#include <linux/kvm_host.h>
#include <asm/kvm_book3s_asm.h>
+union kvmppc_one_reg;
+
struct kvmppc_bat {
u64 raw;
u32 bepi;
@@ -124,7 +126,6 @@ extern void kvmppc_mmu_pte_flush(struct kvm_vcpu *vcpu, ulong ea, ulong ea_mask)
extern void kvmppc_mmu_pte_vflush(struct kvm_vcpu *vcpu, u64 vp, u64 vp_mask);
extern void kvmppc_mmu_pte_pflush(struct kvm_vcpu *vcpu, ulong pa_start, ulong pa_end);
extern void kvmppc_set_msr(struct kvm_vcpu *vcpu, u64 new_msr);
-extern void kvmppc_set_pvr(struct kvm_vcpu *vcpu, u32 pvr);
extern void kvmppc_mmu_book3s_64_init(struct kvm_vcpu *vcpu);
extern void kvmppc_mmu_book3s_32_init(struct kvm_vcpu *vcpu);
extern void kvmppc_mmu_book3s_hv_init(struct kvm_vcpu *vcpu);
@@ -188,13 +189,80 @@ extern u32 kvmppc_alignment_dsisr(struct kvm_vcpu *vcpu, unsigned int inst);
extern ulong kvmppc_alignment_dar(struct kvm_vcpu *vcpu, unsigned int inst);
extern int kvmppc_h_pr(struct kvm_vcpu *vcpu, unsigned long cmd);
+/* Functions that have implementations in both PR and HV KVM */
+extern struct kvm_vcpu *kvmppc_core_vcpu_create_pr(struct kvm *kvm,
+ unsigned int id);
+extern struct kvm_vcpu *kvmppc_core_vcpu_create_hv(struct kvm *kvm,
+ unsigned int id);
+extern void kvmppc_core_free_memslot_hv(struct kvm_memory_slot *free,
+ struct kvm_memory_slot *dont);
+extern int kvmppc_core_create_memslot_hv(struct kvm_memory_slot *slot,
+ unsigned long npages);
+extern int kvmppc_core_prepare_memory_region_hv(struct kvm *kvm,
+ struct kvm_memory_slot *memslot,
+ struct kvm_userspace_memory_region *mem);
+extern void kvmppc_core_commit_memory_region_hv(struct kvm *kvm,
+ struct kvm_userspace_memory_region *mem,
+ const struct kvm_memory_slot *old);
+extern int kvmppc_core_init_vm_pr(struct kvm *kvm);
+extern int kvmppc_core_init_vm_hv(struct kvm *kvm);
+extern void kvmppc_core_destroy_vm_pr(struct kvm *kvm);
+extern void kvmppc_core_destroy_vm_hv(struct kvm *kvm);
+
+extern void kvmppc_core_vcpu_load_pr(struct kvm_vcpu *vcpu, int cpu);
+extern void kvmppc_core_vcpu_load_hv(struct kvm_vcpu *vcpu, int cpu);
+extern void kvmppc_core_vcpu_put_pr(struct kvm_vcpu *vcpu);
+extern void kvmppc_core_vcpu_put_hv(struct kvm_vcpu *vcpu);
+extern void kvmppc_set_msr_pr(struct kvm_vcpu *vcpu, u64 msr);
+extern void kvmppc_set_msr_hv(struct kvm_vcpu *vcpu, u64 msr);
+extern void kvmppc_set_pvr_pr(struct kvm_vcpu *vcpu, u32 pvr);
+extern void kvmppc_set_pvr_hv(struct kvm_vcpu *vcpu, u32 pvr);
+extern int kvm_arch_vcpu_ioctl_get_sregs_pr(struct kvm_vcpu *vcpu,
+ struct kvm_sregs *sregs);
+extern int kvm_arch_vcpu_ioctl_get_sregs_hv(struct kvm_vcpu *vcpu,
+ struct kvm_sregs *sregs);
+extern int kvm_arch_vcpu_ioctl_set_sregs_pr(struct kvm_vcpu *vcpu,
+ struct kvm_sregs *sregs);
+extern int kvm_arch_vcpu_ioctl_set_sregs_hv(struct kvm_vcpu *vcpu,
+ struct kvm_sregs *sregs);
+extern void kvmppc_core_vcpu_free_pr(struct kvm_vcpu *vcpu);
+extern void kvmppc_core_vcpu_free_hv(struct kvm_vcpu *vcpu);
+extern int kvmppc_vcpu_run_pr(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu);
+extern int kvmppc_vcpu_run_hv(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu);
+extern void kvmppc_mmu_destroy_pr(struct kvm_vcpu *vcpu);
+extern int kvmppc_get_one_reg_pr(struct kvm_vcpu *vcpu, u64 id,
+ union kvmppc_one_reg *val);
+extern int kvmppc_get_one_reg_hv(struct kvm_vcpu *vcpu, u64 id,
+ union kvmppc_one_reg *val);
+extern int kvmppc_set_one_reg_pr(struct kvm_vcpu *vcpu, u64 id,
+ union kvmppc_one_reg *val);
+extern int kvmppc_set_one_reg_hv(struct kvm_vcpu *vcpu, u64 id,
+ union kvmppc_one_reg *val);
+extern int kvm_vm_ioctl_get_dirty_log_pr(struct kvm *kvm,
+ struct kvm_dirty_log *log);
+extern int kvm_vm_ioctl_get_dirty_log_hv(struct kvm *kvm,
+ struct kvm_dirty_log *log);
+extern int kvmppc_core_check_requests_pr(struct kvm_vcpu *vcpu);
+extern int kvm_unmap_hva_pr(struct kvm *kvm, unsigned long hva);
+extern int kvm_unmap_hva_hv(struct kvm *kvm, unsigned long hva);
+extern int kvm_unmap_hva_range_pr(struct kvm *kvm, unsigned long start,
+ unsigned long end);
+extern int kvm_unmap_hva_range_hv(struct kvm *kvm, unsigned long start,
+ unsigned long end);
+extern int kvm_age_hva_pr(struct kvm *kvm, unsigned long hva);
+extern int kvm_age_hva_hv(struct kvm *kvm, unsigned long hva);
+extern int kvm_test_age_hva_pr(struct kvm *kvm, unsigned long hva);
+extern int kvm_test_age_hva_hv(struct kvm *kvm, unsigned long hva);
+extern void kvm_set_spte_hva_pr(struct kvm *kvm, unsigned long hva, pte_t pte);
+extern void kvm_set_spte_hva_hv(struct kvm *kvm, unsigned long hva, pte_t pte);
+extern void kvmppc_core_flush_memslot_hv(struct kvm *kvm,
+ struct kvm_memory_slot *memslot);
+
static inline struct kvmppc_vcpu_book3s *to_book3s(struct kvm_vcpu *vcpu)
{
return vcpu->arch.book3s;
}
-extern void kvm_return_point(void);
-
/* Also add subarch specific defines */
#ifdef CONFIG_KVM_BOOK3S_32_HANDLER
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index 700df6f..4b136be 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -61,6 +61,20 @@ struct kvm_stats_debugfs_item debugfs_entries[] = {
{ NULL }
};
+#ifdef CONFIG_KVM_BOOK3S_PR
+#define DO_IF_PR(kvm, x) x
+#define DO_IF_HV(kvm, x)
+#define VCPU_DO_PR(vcpu, x) x
+#define VCPU_DO_HV(vcpu, x)
+#endif
+
+#ifdef CONFIG_KVM_BOOK3S_64_HV
+#define DO_IF_PR(kvm, x)
+#define DO_IF_HV(kvm, x) x
+#define VCPU_DO_PR(vcpu, x)
+#define VCPU_DO_HV(vcpu, x) x
+#endif
+
void kvmppc_core_load_host_debugstate(struct kvm_vcpu *vcpu)
{
}
@@ -419,6 +433,22 @@ void kvmppc_subarch_vcpu_uninit(struct kvm_vcpu *vcpu)
{
}
+int kvm_arch_vcpu_ioctl_get_sregs(struct kvm_vcpu *vcpu,
+ struct kvm_sregs *sregs)
+{
+ VCPU_DO_PR(vcpu, return kvm_arch_vcpu_ioctl_get_sregs_pr(vcpu, sregs));
+ VCPU_DO_HV(vcpu, return kvm_arch_vcpu_ioctl_get_sregs_hv(vcpu, sregs));
+ return -EINVAL;
+}
+
+int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu,
+ struct kvm_sregs *sregs)
+{
+ VCPU_DO_PR(vcpu, return kvm_arch_vcpu_ioctl_set_sregs_pr(vcpu, sregs));
+ VCPU_DO_HV(vcpu, return kvm_arch_vcpu_ioctl_set_sregs_hv(vcpu, sregs));
+ return -EINVAL;
+}
+
int kvm_arch_vcpu_ioctl_get_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
{
int i;
@@ -495,7 +525,8 @@ int kvm_vcpu_ioctl_get_one_reg(struct kvm_vcpu *vcpu, struct kvm_one_reg *reg)
if (size > sizeof(val))
return -EINVAL;
- r = kvmppc_get_one_reg(vcpu, reg->id, &val);
+ VCPU_DO_PR(vcpu, r = kvmppc_get_one_reg_pr(vcpu, reg->id, &val));
+ VCPU_DO_HV(vcpu, r = kvmppc_get_one_reg_hv(vcpu, reg->id, &val));
if (r == -EINVAL) {
r = 0;
@@ -572,7 +603,8 @@ int kvm_vcpu_ioctl_set_one_reg(struct kvm_vcpu *vcpu, struct kvm_one_reg *reg)
if (copy_from_user(&val, (char __user *)(unsigned long)reg->addr, size))
return -EFAULT;
- r = kvmppc_set_one_reg(vcpu, reg->id, &val);
+ VCPU_DO_PR(vcpu, r = kvmppc_set_one_reg_pr(vcpu, reg->id, &val));
+ VCPU_DO_HV(vcpu, r = kvmppc_set_one_reg_hv(vcpu, reg->id, &val));
if (r == -EINVAL) {
r = 0;
@@ -625,6 +657,31 @@ int kvm_vcpu_ioctl_set_one_reg(struct kvm_vcpu *vcpu, struct kvm_one_reg *reg)
return r;
}
+void kvmppc_core_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
+{
+ VCPU_DO_PR(vcpu, kvmppc_core_vcpu_load_pr(vcpu, cpu));
+ VCPU_DO_HV(vcpu, kvmppc_core_vcpu_load_hv(vcpu, cpu));
+}
+
+void kvmppc_core_vcpu_put(struct kvm_vcpu *vcpu)
+{
+ VCPU_DO_PR(vcpu, kvmppc_core_vcpu_put_pr(vcpu));
+ VCPU_DO_HV(vcpu, kvmppc_core_vcpu_put_hv(vcpu));
+}
+
+void kvmppc_set_msr(struct kvm_vcpu *vcpu, u64 msr)
+{
+ VCPU_DO_PR(vcpu, kvmppc_set_msr_pr(vcpu, msr));
+ VCPU_DO_HV(vcpu, kvmppc_set_msr_hv(vcpu, msr));
+}
+
+int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
+{
+ VCPU_DO_PR(vcpu, return kvmppc_vcpu_run_pr(kvm_run, vcpu));
+ VCPU_DO_HV(vcpu, return kvmppc_vcpu_run_hv(kvm_run, vcpu));
+ return -EINVAL;
+}
+
int kvm_arch_vcpu_ioctl_translate(struct kvm_vcpu *vcpu,
struct kvm_translation *tr)
{
@@ -644,3 +701,174 @@ void kvmppc_decrementer_func(unsigned long data)
kvmppc_core_queue_dec(vcpu);
kvm_vcpu_kick(vcpu);
}
+
+struct kvm_vcpu *kvmppc_core_vcpu_create(struct kvm *kvm, unsigned int id)
+{
+ DO_IF_PR(kvm, return kvmppc_core_vcpu_create_pr(kvm, id));
+ DO_IF_HV(kvm, return kvmppc_core_vcpu_create_hv(kvm, id));
+ return NULL;
+}
+
+void kvmppc_core_vcpu_free(struct kvm_vcpu *vcpu)
+{
+ VCPU_DO_PR(vcpu, kvmppc_core_vcpu_free_pr(vcpu));
+ VCPU_DO_HV(vcpu, kvmppc_core_vcpu_free_hv(vcpu));
+}
+
+int kvmppc_core_check_requests(struct kvm_vcpu *vcpu)
+{
+ VCPU_DO_PR(vcpu, return kvmppc_core_check_requests_pr(vcpu));
+ return 1;
+}
+
+int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
+{
+ DO_IF_PR(kvm, return kvm_vm_ioctl_get_dirty_log_pr(kvm, log));
+ DO_IF_HV(kvm, return kvm_vm_ioctl_get_dirty_log_hv(kvm, log));
+ return -ENOTTY;
+}
+
+void kvmppc_core_free_memslot(struct kvm_memory_slot *free,
+ struct kvm_memory_slot *dont)
+{
+#ifdef CONFIG_KVM_BOOK3S_64_HV
+ kvmppc_core_free_memslot_hv(free, dont);
+#endif
+}
+
+int kvmppc_core_create_memslot(struct kvm_memory_slot *slot,
+ unsigned long npages)
+{
+#ifdef CONFIG_KVM_BOOK3S_64_HV
+ return kvmppc_core_create_memslot_hv(slot, npages);
+#endif
+ return 0;
+}
+
+void kvmppc_core_flush_memslot(struct kvm *kvm, struct kvm_memory_slot *memslot)
+{
+ DO_IF_HV(kvm, kvmppc_core_flush_memslot_hv(kvm, memslot));
+}
+
+int kvmppc_core_prepare_memory_region(struct kvm *kvm,
+ struct kvm_memory_slot *memslot,
+ struct kvm_userspace_memory_region *mem)
+{
+ DO_IF_HV(kvm, return kvmppc_core_prepare_memory_region_hv(kvm,
+ memslot, mem));
+ return 0;
+}
+
+void kvmppc_core_commit_memory_region(struct kvm *kvm,
+ struct kvm_userspace_memory_region *mem,
+ const struct kvm_memory_slot *old)
+{
+ DO_IF_HV(kvm, kvmppc_core_commit_memory_region_hv(kvm, mem, old));
+}
+
+int kvm_unmap_hva(struct kvm *kvm, unsigned long hva)
+{
+ DO_IF_PR(kvm, return kvm_unmap_hva_pr(kvm, hva));
+ DO_IF_HV(kvm, return kvm_unmap_hva_hv(kvm, hva));
+ return 0;
+}
+
+int kvm_unmap_hva_range(struct kvm *kvm, unsigned long start, unsigned long end)
+{
+ DO_IF_PR(kvm, return kvm_unmap_hva_range_pr(kvm, start, end));
+ DO_IF_HV(kvm, return kvm_unmap_hva_range_hv(kvm, start, end));
+ return 0;
+}
+
+int kvm_age_hva(struct kvm *kvm, unsigned long hva)
+{
+ DO_IF_PR(kvm, return kvm_age_hva_pr(kvm, hva));
+ DO_IF_HV(kvm, return kvm_age_hva_hv(kvm, hva));
+ return 0;
+}
+
+int kvm_test_age_hva(struct kvm *kvm, unsigned long hva)
+{
+ DO_IF_PR(kvm, return kvm_test_age_hva_pr(kvm, hva));
+ DO_IF_HV(kvm, return kvm_test_age_hva_hv(kvm, hva));
+ return 0;
+}
+
+void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte)
+{
+ DO_IF_PR(kvm, kvm_set_spte_hva_pr(kvm, hva, pte));
+ DO_IF_HV(kvm, kvm_set_spte_hva_hv(kvm, hva, pte));
+}
+
+void kvmppc_mmu_destroy(struct kvm_vcpu *vcpu)
+{
+ VCPU_DO_PR(vcpu, kvmppc_mmu_destroy_pr(vcpu));
+}
+
+int kvmppc_core_init_vm(struct kvm *kvm)
+{
+ int err = -EINVAL;
+
+#ifdef CONFIG_PPC64
+ INIT_LIST_HEAD(&kvm->arch.spapr_tce_tables);
+ INIT_LIST_HEAD(&kvm->arch.rtas_tokens);
+#endif
+
+#ifdef CONFIG_KVM_BOOK3S_PR
+ err = kvmppc_core_init_vm_pr(kvm);
+#endif
+#ifdef CONFIG_KVM_BOOK3S_64_HV
+ err = kvmppc_core_init_vm_hv(kvm);
+#endif
+
+ return err;
+}
+
+void kvmppc_core_destroy_vm(struct kvm *kvm)
+{
+ DO_IF_PR(kvm, kvmppc_core_destroy_vm_pr(kvm));
+ DO_IF_HV(kvm, kvmppc_core_destroy_vm_hv(kvm));
+
+#ifdef CONFIG_PPC64
+ kvmppc_rtas_tokens_free(kvm);
+ WARN_ON(!list_empty(&kvm->arch.spapr_tce_tables));
+#endif
+}
+
+int kvmppc_core_check_processor_compat(void)
+{
+#if defined(CONFIG_KVM_BOOK3S_64_HV)
+ if (!cpu_has_feature(CPU_FTR_HVMODE))
+ return -EIO;
+#endif
+ return 0;
+}
+
+static int kvmppc_book3s_init(void)
+{
+ int r;
+
+ r = kvm_init(NULL, sizeof(struct kvm_vcpu), 0, THIS_MODULE);
+ if (r)
+ return r;
+
+#ifdef CONFIG_KVM_BOOK3S_PR
+ r = kvmppc_mmu_hpte_sysinit();
+#endif
+#ifdef CONFIG_KVM_BOOK3S_64_HV
+ r = kvmppc_mmu_hv_init();
+#endif
+
+ return r;
+}
+
+static void kvmppc_book3s_exit(void)
+{
+#ifdef CONFIG_KVM_BOOK3S_PR
+ kvmppc_mmu_hpte_sysexit();
+#endif
+ kvm_exit();
+}
+
+module_init(kvmppc_book3s_init);
+module_exit(kvmppc_book3s_exit);
diff --git a/arch/powerpc/kvm/book3s_32_mmu_host.c b/arch/powerpc/kvm/book3s_32_mmu_host.c
index 00e619b..c4361ef 100644
--- a/arch/powerpc/kvm/book3s_32_mmu_host.c
+++ b/arch/powerpc/kvm/book3s_32_mmu_host.c
@@ -341,7 +341,7 @@ void kvmppc_mmu_flush_segments(struct kvm_vcpu *vcpu)
svcpu_put(svcpu);
}
-void kvmppc_mmu_destroy(struct kvm_vcpu *vcpu)
+void kvmppc_mmu_destroy_pr(struct kvm_vcpu *vcpu)
{
int i;
diff --git a/arch/powerpc/kvm/book3s_64_mmu_host.c b/arch/powerpc/kvm/book3s_64_mmu_host.c
index 21a51e8..3dd178c 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_host.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_host.c
@@ -341,7 +341,7 @@ void kvmppc_mmu_flush_segments(struct kvm_vcpu *vcpu)
svcpu_put(svcpu);
}
-void kvmppc_mmu_destroy(struct kvm_vcpu *vcpu)
+void kvmppc_mmu_destroy_pr(struct kvm_vcpu *vcpu)
{
kvmppc_mmu_hpte_destroy(vcpu);
__destroy_context(to_book3s(vcpu)->context_id[0]);
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 7eb5dda..e37c785 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -260,10 +260,6 @@ int kvmppc_mmu_hv_init(void)
return 0;
}
-void kvmppc_mmu_destroy(struct kvm_vcpu *vcpu)
-{
-}
-
static void kvmppc_mmu_book3s_64_hv_reset_msr(struct kvm_vcpu *vcpu)
{
kvmppc_set_msr(vcpu, MSR_SF | MSR_ME);
@@ -904,21 +900,22 @@ static int kvm_unmap_rmapp(struct kvm *kvm, unsigned long *rmapp,
return 0;
}
-int kvm_unmap_hva(struct kvm *kvm, unsigned long hva)
+int kvm_unmap_hva_hv(struct kvm *kvm, unsigned long hva)
{
if (kvm->arch.using_mmu_notifiers)
kvm_handle_hva(kvm, hva, kvm_unmap_rmapp);
return 0;
}
-int kvm_unmap_hva_range(struct kvm *kvm, unsigned long start, unsigned long end)
+int kvm_unmap_hva_range_hv(struct kvm *kvm, unsigned long start, unsigned long end)
{
if (kvm->arch.using_mmu_notifiers)
kvm_handle_hva_range(kvm, start, end, kvm_unmap_rmapp);
return 0;
}
-void kvmppc_core_flush_memslot(struct kvm *kvm, struct kvm_memory_slot *memslot)
+void kvmppc_core_flush_memslot_hv(struct kvm *kvm,
+ struct kvm_memory_slot *memslot)
{
unsigned long *rmapp;
unsigned long gfn;
@@ -992,7 +989,7 @@ static int kvm_age_rmapp(struct kvm *kvm, unsigned long *rmapp,
return ret;
}
-int kvm_age_hva(struct kvm *kvm, unsigned long hva)
+int kvm_age_hva_hv(struct kvm *kvm, unsigned long hva)
{
if (!kvm->arch.using_mmu_notifiers)
return 0;
@@ -1030,14 +1027,14 @@ static int kvm_test_age_rmapp(struct kvm *kvm, unsigned long *rmapp,
return ret;
}
-int kvm_test_age_hva(struct kvm *kvm, unsigned long hva)
+int kvm_test_age_hva_hv(struct kvm *kvm, unsigned long hva)
{
if (!kvm->arch.using_mmu_notifiers)
return 0;
return kvm_handle_hva(kvm, hva, kvm_test_age_rmapp);
}
-void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte)
+void kvm_set_spte_hva_hv(struct kvm *kvm, unsigned long hva, pte_t pte)
{
if (!kvm->arch.using_mmu_notifiers)
return;
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 2b95c45..fcf0564 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -125,7 +125,7 @@ void kvmppc_fast_vcpu_kick(struct kvm_vcpu *vcpu)
* purely defensive; they should never fail.)
*/
-void kvmppc_core_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
+void kvmppc_core_vcpu_load_hv(struct kvm_vcpu *vcpu, int cpu)
{
struct kvmppc_vcore *vc = vcpu->arch.vcore;
@@ -143,7 +143,7 @@ void kvmppc_core_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
spin_unlock(&vcpu->arch.tbacct_lock);
}
-void kvmppc_core_vcpu_put(struct kvm_vcpu *vcpu)
+void kvmppc_core_vcpu_put_hv(struct kvm_vcpu *vcpu)
{
struct kvmppc_vcore *vc = vcpu->arch.vcore;
@@ -155,13 +155,13 @@ void kvmppc_core_vcpu_put(struct kvm_vcpu *vcpu)
spin_unlock(&vcpu->arch.tbacct_lock);
}
-void kvmppc_set_msr(struct kvm_vcpu *vcpu, u64 msr)
+void kvmppc_set_msr_hv(struct kvm_vcpu *vcpu, u64 msr)
{
vcpu->arch.shregs.msr = msr;
kvmppc_end_cede(vcpu);
}
-void kvmppc_set_pvr(struct kvm_vcpu *vcpu, u32 pvr)
+void kvmppc_set_pvr_hv(struct kvm_vcpu *vcpu, u32 pvr)
{
vcpu->arch.pvr = pvr;
}
@@ -576,8 +576,8 @@ int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu)
return RESUME_GUEST;
}
-static int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
- struct task_struct *tsk)
+static int kvmppc_handle_exit_hv(struct kvm_run *run, struct kvm_vcpu *vcpu,
+ struct task_struct *tsk)
{
int r = RESUME_HOST;
@@ -679,8 +679,8 @@ static int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
return r;
}
-int kvm_arch_vcpu_ioctl_get_sregs(struct kvm_vcpu *vcpu,
- struct kvm_sregs *sregs)
+int kvm_arch_vcpu_ioctl_get_sregs_hv(struct kvm_vcpu *vcpu,
+ struct kvm_sregs *sregs)
{
int i;
@@ -695,12 +695,12 @@ int kvm_arch_vcpu_ioctl_get_sregs(struct kvm_vcpu *vcpu,
return 0;
}
-int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu,
- struct kvm_sregs *sregs)
+int kvm_arch_vcpu_ioctl_set_sregs_hv(struct kvm_vcpu *vcpu,
+ struct kvm_sregs *sregs)
{
int i, j;
- kvmppc_set_pvr(vcpu, sregs->pvr);
+ kvmppc_set_pvr_hv(vcpu, sregs->pvr);
j = 0;
for (i = 0; i < vcpu->arch.slb_nr; i++) {
@@ -715,7 +715,8 @@ int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu,
return 0;
}
-int kvmppc_get_one_reg(struct kvm_vcpu *vcpu, u64 id, union kvmppc_one_reg *val)
+int kvmppc_get_one_reg_hv(struct kvm_vcpu *vcpu, u64 id,
+ union kvmppc_one_reg *val)
{
int r = 0;
long int i;
@@ -796,7 +797,8 @@ int kvmppc_get_one_reg(struct kvm_vcpu *vcpu, u64 id, union kvmppc_one_reg *val)
return r;
}
-int kvmppc_set_one_reg(struct kvm_vcpu *vcpu, u64 id, union kvmppc_one_reg *val)
+int kvmppc_set_one_reg_hv(struct kvm_vcpu *vcpu, u64 id,
+ union kvmppc_one_reg *val)
{
int r = 0;
long int i;
@@ -889,14 +891,7 @@ int kvmppc_set_one_reg(struct kvm_vcpu *vcpu, u64 id, union kvmppc_one_reg *val)
return r;
}
-int kvmppc_core_check_processor_compat(void)
-{
- if (cpu_has_feature(CPU_FTR_HVMODE))
- return 0;
- return -EIO;
-}
-
-struct kvm_vcpu *kvmppc_core_vcpu_create(struct kvm *kvm, unsigned int id)
+struct kvm_vcpu *kvmppc_core_vcpu_create_hv(struct kvm *kvm, unsigned int id)
{
struct kvm_vcpu *vcpu;
int err = -EINVAL;
@@ -921,7 +916,7 @@ struct kvm_vcpu *kvmppc_core_vcpu_create(struct kvm *kvm, unsigned int id)
vcpu->arch.ctrl = CTRL_RUNLATCH;
/* default to host PVR, since we can't spoof it */
vcpu->arch.pvr = mfspr(SPRN_PVR);
- kvmppc_set_pvr(vcpu, vcpu->arch.pvr);
+ kvmppc_set_pvr_hv(vcpu, vcpu->arch.pvr);
spin_lock_init(&vcpu->arch.vpa_update_lock);
spin_lock_init(&vcpu->arch.tbacct_lock);
vcpu->arch.busy_preempt = TB_NIL;
@@ -973,7 +968,7 @@ static void unpin_vpa(struct kvm *kvm, struct kvmppc_vpa *vpa)
vpa->dirty);
}
-void kvmppc_core_vcpu_free(struct kvm_vcpu *vcpu)
+void kvmppc_core_vcpu_free_hv(struct kvm_vcpu *vcpu)
{
spin_lock(&vcpu->arch.vpa_update_lock);
unpin_vpa(vcpu->kvm, &vcpu->arch.dtl);
@@ -1265,8 +1260,8 @@ static void kvmppc_run_core(struct kvmppc_vcore *vc)
ret = RESUME_GUEST;
if (vcpu->arch.trap)
- ret = kvmppc_handle_exit(vcpu->arch.kvm_run, vcpu,
- vcpu->arch.run_task);
+ ret = kvmppc_handle_exit_hv(vcpu->arch.kvm_run, vcpu,
+ vcpu->arch.run_task);
vcpu->arch.ret = ret;
vcpu->arch.trap = 0;
@@ -1425,7 +1420,7 @@ static int kvmppc_run_vcpu(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
return vcpu->arch.ret;
}
-int kvmppc_vcpu_run(struct kvm_run *run, struct kvm_vcpu *vcpu)
+int kvmppc_vcpu_run_hv(struct kvm_run *run, struct kvm_vcpu *vcpu)
{
int r;
int srcu_idx;
@@ -1614,7 +1609,7 @@ int kvm_vm_ioctl_get_smmu_info(struct kvm *kvm, struct kvm_ppc_smmu_info *info)
/*
* Get (and clear) the dirty memory log for a memory slot.
*/
-int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
+int kvm_vm_ioctl_get_dirty_log_hv(struct kvm *kvm, struct kvm_dirty_log *log)
{
struct kvm_memory_slot *memslot;
int r;
@@ -1668,8 +1663,8 @@ static void unpin_slot(struct kvm_memory_slot *memslot)
}
}
-void kvmppc_core_free_memslot(struct kvm_memory_slot *free,
- struct kvm_memory_slot *dont)
+void kvmppc_core_free_memslot_hv(struct kvm_memory_slot *free,
+ struct kvm_memory_slot *dont)
{
if (!dont || free->arch.rmap != dont->arch.rmap) {
vfree(free->arch.rmap);
@@ -1682,8 +1677,8 @@ void kvmppc_core_free_memslot(struct kvm_memory_slot *free,
}
}
-int kvmppc_core_create_memslot(struct kvm_memory_slot *slot,
- unsigned long npages)
+int kvmppc_core_create_memslot_hv(struct kvm_memory_slot *slot,
+ unsigned long npages)
{
slot->arch.rmap = vzalloc(npages * sizeof(*slot->arch.rmap));
if (!slot->arch.rmap)
@@ -1693,9 +1688,9 @@ int kvmppc_core_create_memslot(struct kvm_memory_slot *slot,
return 0;
}
-int kvmppc_core_prepare_memory_region(struct kvm *kvm,
- struct kvm_memory_slot *memslot,
- struct kvm_userspace_memory_region *mem)
+int kvmppc_core_prepare_memory_region_hv(struct kvm *kvm,
+ struct kvm_memory_slot *memslot,
+ struct kvm_userspace_memory_region *mem)
{
unsigned long *phys;
@@ -1711,9 +1706,9 @@ int kvmppc_core_prepare_memory_region(struct kvm *kvm,
return 0;
}
-void kvmppc_core_commit_memory_region(struct kvm *kvm,
- struct kvm_userspace_memory_region *mem,
- const struct kvm_memory_slot *old)
+void kvmppc_core_commit_memory_region_hv(struct kvm *kvm,
+ struct kvm_userspace_memory_region *mem,
+ const struct kvm_memory_slot *old)
{
unsigned long npages = mem->memory_size >> PAGE_SHIFT;
struct kvm_memory_slot *memslot;
@@ -1876,7 +1871,7 @@ static int kvmppc_hv_setup_htab_rma(struct kvm_vcpu *vcpu)
goto out_srcu;
}
-int kvmppc_core_init_vm(struct kvm *kvm)
+int kvmppc_core_init_vm_hv(struct kvm *kvm)
{
unsigned long lpcr, lpid;
@@ -1894,9 +1889,6 @@ int kvmppc_core_init_vm(struct kvm *kvm)
*/
cpumask_setall(&kvm->arch.need_tlb_flush);
- INIT_LIST_HEAD(&kvm->arch.spapr_tce_tables);
- INIT_LIST_HEAD(&kvm->arch.rtas_tokens);
-
kvm->arch.rma = NULL;
kvm->arch.host_sdr1 = mfspr(SPRN_SDR1);
@@ -1932,7 +1924,7 @@ int kvmppc_core_init_vm(struct kvm *kvm)
return 0;
}
-void kvmppc_core_destroy_vm(struct kvm *kvm)
+void kvmppc_core_destroy_vm_hv(struct kvm *kvm)
{
uninhibit_secondary_onlining();
@@ -1941,15 +1933,7 @@ void kvmppc_core_destroy_vm(struct kvm *kvm)
kvm->arch.rma = NULL;
}
- kvmppc_rtas_tokens_free(kvm);
-
kvmppc_free_hpt(kvm);
- WARN_ON(!list_empty(&kvm->arch.spapr_tce_tables));
-}
-
-/* These are stubs for now */
-void kvmppc_mmu_pte_pflush(struct kvm_vcpu *vcpu, ulong pa_start, ulong pa_end)
-{
}
/* We don't need to emulate any privileged instructions or dcbz */
@@ -1968,25 +1952,3 @@ int kvmppc_core_emulate_mfspr(struct kvm_vcpu *vcpu, int sprn, ulong *spr_val)
{
return EMULATE_FAIL;
}
-
-static int kvmppc_book3s_hv_init(void)
-{
- int r;
-
- r = kvm_init(NULL, sizeof(struct kvm_vcpu), 0, THIS_MODULE);
-
- if (r)
- return r;
-
- r = kvmppc_mmu_hv_init();
-
- return r;
-}
-
-static void kvmppc_book3s_hv_exit(void)
-{
- kvm_exit();
-}
-
-module_init(kvmppc_book3s_hv_init);
-module_exit(kvmppc_book3s_hv_exit);
diff --git a/arch/powerpc/kvm/book3s_hv_interrupts.S b/arch/powerpc/kvm/book3s_hv_interrupts.S
index 37f1cc4..928142c 100644
--- a/arch/powerpc/kvm/book3s_hv_interrupts.S
+++ b/arch/powerpc/kvm/book3s_hv_interrupts.S
@@ -158,9 +158,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_201)
* Interrupts are enabled again at this point.
*/
-.global kvmppc_handler_highmem
-kvmppc_handler_highmem:
-
/*
* Register usage at this point:
*
diff --git a/arch/powerpc/kvm/book3s_interrupts.S b/arch/powerpc/kvm/book3s_interrupts.S
index c81a185..0e2c7ac 100644
--- a/arch/powerpc/kvm/book3s_interrupts.S
+++ b/arch/powerpc/kvm/book3s_interrupts.S
@@ -120,9 +120,6 @@ kvm_start_lightweight:
*
*/
-.global kvmppc_handler_highmem
-kvmppc_handler_highmem:
-
/*
* Register usage at this point:
*
@@ -183,7 +180,7 @@ kvmppc_handler_highmem:
/* Restore r3 (kvm_run) and r4 (vcpu) */
REST_2GPRS(3, r1)
- bl FUNC(kvmppc_handle_exit)
+ bl FUNC(kvmppc_handle_exit_pr)
/* If RESUME_GUEST, get back in the loop */
cmpwi r3, RESUME_GUEST
diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
index 2759ddc..ab3b032 100644
--- a/arch/powerpc/kvm/book3s_pr.c
+++ b/arch/powerpc/kvm/book3s_pr.c
@@ -56,7 +56,7 @@ static int kvmppc_handle_ext(struct kvm_vcpu *vcpu, unsigned int exit_nr,
#define HW_PAGE_SIZE PAGE_SIZE
#endif
-void kvmppc_core_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
+void kvmppc_core_vcpu_load_pr(struct kvm_vcpu *vcpu, int cpu)
{
#ifdef CONFIG_PPC_BOOK3S_64
struct kvmppc_book3s_shadow_vcpu *svcpu = svcpu_get(vcpu);
@@ -70,7 +70,7 @@ void kvmppc_core_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
#endif
}
-void kvmppc_core_vcpu_put(struct kvm_vcpu *vcpu)
+void kvmppc_core_vcpu_put_pr(struct kvm_vcpu *vcpu)
{
#ifdef CONFIG_PPC_BOOK3S_64
struct kvmppc_book3s_shadow_vcpu *svcpu = svcpu_get(vcpu);
@@ -137,7 +137,7 @@ void kvmppc_copy_from_svcpu(struct kvm_vcpu *vcpu,
vcpu->arch.last_inst = svcpu->last_inst;
}
-int kvmppc_core_check_requests(struct kvm_vcpu *vcpu)
+int kvmppc_core_check_requests_pr(struct kvm_vcpu *vcpu)
{
int r = 1; /* Indicate we want to get back into the guest */
@@ -151,7 +151,7 @@ int kvmppc_core_check_requests(struct kvm_vcpu *vcpu)
/************* MMU Notifiers *************/
-int kvm_unmap_hva(struct kvm *kvm, unsigned long hva)
+int kvm_unmap_hva_pr(struct kvm *kvm, unsigned long hva)
{
trace_kvm_unmap_hva(hva);
@@ -164,7 +164,8 @@ int kvm_unmap_hva(struct kvm *kvm, unsigned long hva)
return 0;
}
-int kvm_unmap_hva_range(struct kvm *kvm, unsigned long start, unsigned long end)
+int kvm_unmap_hva_range_pr(struct kvm *kvm, unsigned long start,
+ unsigned long end)
{
/* kvm_unmap_hva flushes everything anyways */
kvm_unmap_hva(kvm, start);
@@ -172,19 +173,19 @@ int kvm_unmap_hva_range(struct kvm *kvm, unsigned long start, unsigned long end)
return 0;
}
-int kvm_age_hva(struct kvm *kvm, unsigned long hva)
+int kvm_age_hva_pr(struct kvm *kvm, unsigned long hva)
{
/* XXX could be more clever ;) */
return 0;
}
-int kvm_test_age_hva(struct kvm *kvm, unsigned long hva)
+int kvm_test_age_hva_pr(struct kvm *kvm, unsigned long hva)
{
/* XXX could be more clever ;) */
return 0;
}
-void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte)
+void kvm_set_spte_hva_pr(struct kvm *kvm, unsigned long hva, pte_t pte)
{
/* The page will get remapped properly on its next fault */
kvm_unmap_hva(kvm, hva);
@@ -209,7 +210,7 @@ static void kvmppc_recalc_shadow_msr(struct kvm_vcpu *vcpu)
vcpu->arch.shadow_msr = smsr;
}
-void kvmppc_set_msr(struct kvm_vcpu *vcpu, u64 msr)
+void kvmppc_set_msr_pr(struct kvm_vcpu *vcpu, u64 msr)
{
ulong old_msr = vcpu->arch.shared->msr;
@@ -269,7 +270,7 @@ void kvmppc_set_msr(struct kvm_vcpu *vcpu, u64 msr)
kvmppc_handle_ext(vcpu, BOOK3S_INTERRUPT_FP_UNAVAIL, MSR_FP);
}
-void kvmppc_set_pvr(struct kvm_vcpu *vcpu, u32 pvr)
+void kvmppc_set_pvr_pr(struct kvm_vcpu *vcpu, u32 pvr)
{
u32 host_pvr;
@@ -688,8 +689,8 @@ static void kvmppc_handle_lost_ext(struct kvm_vcpu *vcpu)
current->thread.regs->msr |= lost_ext;
}
-int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
- unsigned int exit_nr)
+int kvmppc_handle_exit_pr(struct kvm_run *run, struct kvm_vcpu *vcpu,
+ unsigned int exit_nr)
{
int r = RESUME_HOST;
int s;
@@ -989,8 +990,8 @@ program_interrupt:
return r;
}
-int kvm_arch_vcpu_ioctl_get_sregs(struct kvm_vcpu *vcpu,
- struct kvm_sregs *sregs)
+int kvm_arch_vcpu_ioctl_get_sregs_pr(struct kvm_vcpu *vcpu,
+ struct kvm_sregs *sregs)
{
struct kvmppc_vcpu_book3s *vcpu3s = to_book3s(vcpu);
int i;
@@ -1016,13 +1017,13 @@ int kvm_arch_vcpu_ioctl_get_sregs(struct kvm_vcpu *vcpu,
return 0;
}
-int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu,
- struct kvm_sregs *sregs)
+int kvm_arch_vcpu_ioctl_set_sregs_pr(struct kvm_vcpu *vcpu,
+ struct kvm_sregs *sregs)
{
struct kvmppc_vcpu_book3s *vcpu3s = to_book3s(vcpu);
int i;
- kvmppc_set_pvr(vcpu, sregs->pvr);
+ kvmppc_set_pvr_pr(vcpu, sregs->pvr);
vcpu3s->sdr1 = sregs->u.s.sdr1;
if (vcpu->arch.hflags & BOOK3S_HFLAG_SLB) {
@@ -1052,7 +1053,8 @@ int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu,
return 0;
}
-int kvmppc_get_one_reg(struct kvm_vcpu *vcpu, u64 id, union kvmppc_one_reg *val)
+int kvmppc_get_one_reg_pr(struct kvm_vcpu *vcpu, u64 id,
+ union kvmppc_one_reg *val)
{
int r = 0;
@@ -1081,7 +1083,8 @@ int kvmppc_get_one_reg(struct kvm_vcpu *vcpu, u64 id, union kvmppc_one_reg *val)
return r;
}
-int kvmppc_set_one_reg(struct kvm_vcpu *vcpu, u64 id, union kvmppc_one_reg *val)
+int kvmppc_set_one_reg_pr(struct kvm_vcpu *vcpu, u64 id,
+ union kvmppc_one_reg *val)
{
int r = 0;
@@ -1111,12 +1114,7 @@ int kvmppc_set_one_reg(struct kvm_vcpu *vcpu, u64 id, union kvmppc_one_reg *val)
return r;
}
-int kvmppc_core_check_processor_compat(void)
-{
- return 0;
-}
-
-struct kvm_vcpu *kvmppc_core_vcpu_create(struct kvm *kvm, unsigned int id)
+struct kvm_vcpu *kvmppc_core_vcpu_create_pr(struct kvm *kvm, unsigned int id)
{
struct kvmppc_vcpu_book3s *vcpu_book3s;
struct kvm_vcpu *vcpu;
@@ -1161,7 +1159,7 @@ struct kvm_vcpu *kvmppc_core_vcpu_create(struct kvm *kvm, unsigned int id)
/* default to book3s_32 (750) */
vcpu->arch.pvr = 0x84202;
#endif
- kvmppc_set_pvr(vcpu, vcpu->arch.pvr);
+ kvmppc_set_pvr_pr(vcpu, vcpu->arch.pvr);
vcpu->arch.slb_nr = 64;
vcpu->arch.shadow_msr = MSR_USER64;
@@ -1186,7 +1184,7 @@ out:
return ERR_PTR(err);
}
-void kvmppc_core_vcpu_free(struct kvm_vcpu *vcpu)
+void kvmppc_core_vcpu_free_pr(struct kvm_vcpu *vcpu)
{
struct kvmppc_vcpu_book3s *vcpu_book3s = to_book3s(vcpu);
@@ -1238,7 +1236,7 @@ static void enable_relon_interrupts(struct kvm *kvm)
kvm->arch.relon_disabled = false;
}
-int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
+int kvmppc_vcpu_run_pr(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
{
int ret;
double fpr[32][TS_FPRWIDTH];
@@ -1350,8 +1348,7 @@ out:
/*
* Get (and clear) the dirty memory log for a memory slot.
*/
-int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
- struct kvm_dirty_log *log)
+int kvm_vm_ioctl_get_dirty_log_pr(struct kvm *kvm, struct kvm_dirty_log *log)
{
struct kvm_memory_slot *memslot;
struct kvm_vcpu *vcpu;
@@ -1416,40 +1413,8 @@ int kvm_vm_ioctl_get_smmu_info(struct kvm *kvm, struct kvm_ppc_smmu_info *info)
}
#endif /* CONFIG_PPC64 */
-void kvmppc_core_free_memslot(struct kvm_memory_slot *free,
- struct kvm_memory_slot *dont)
-{
-}
-
-int kvmppc_core_create_memslot(struct kvm_memory_slot *slot,
- unsigned long npages)
+int kvmppc_core_init_vm_pr(struct kvm *kvm)
{
- return 0;
-}
-
-int kvmppc_core_prepare_memory_region(struct kvm *kvm,
- struct kvm_memory_slot *memslot,
- struct kvm_userspace_memory_region *mem)
-{
- return 0;
-}
-
-void kvmppc_core_commit_memory_region(struct kvm *kvm,
- struct kvm_userspace_memory_region *mem,
- const struct kvm_memory_slot *old)
-{
-}
-
-void kvmppc_core_flush_memslot(struct kvm *kvm, struct kvm_memory_slot *memslot)
-{
-}
-
-int kvmppc_core_init_vm(struct kvm *kvm)
-{
-#ifdef CONFIG_PPC64
- INIT_LIST_HEAD(&kvm->arch.spapr_tce_tables);
- INIT_LIST_HEAD(&kvm->arch.rtas_tokens);
-#endif
mutex_init(&kvm->arch.hpt_mutex);
/*
@@ -1461,30 +1426,7 @@ int kvmppc_core_init_vm(struct kvm *kvm)
return 0;
}
-void kvmppc_core_destroy_vm(struct kvm *kvm)
+void kvmppc_core_destroy_vm_pr(struct kvm *kvm)
{
enable_relon_interrupts(kvm);
}
-
-static int kvmppc_book3s_init(void)
-{
- int r;
-
- r = kvm_init(NULL, sizeof(struct kvm_vcpu), 0, THIS_MODULE);
-
- if (r)
- return r;
-
- r = kvmppc_mmu_hpte_sysinit();
-
- return r;
-}
-
-static void kvmppc_book3s_exit(void)
-{
- kvmppc_mmu_hpte_sysexit();
- kvm_exit();
-}
-
-module_init(kvmppc_book3s_init);
-module_exit(kvmppc_book3s_exit);
--
1.8.3.1
^ permalink raw reply related [flat|nested] 68+ messages in thread
* [PATCH 16/23] KVM: PPC: Book3S: Merge implementations of KVM_PPC_GET_SMMU_INFO ioctl
2013-08-06 4:12 [PATCH 00/23] Allow PR and HV KVM to coexist in one kernel Paul Mackerras
` (14 preceding siblings ...)
2013-08-06 4:24 ` [PATCH 15/23] KVM: PPC: Book3S: Rename symbols that exist in both PR and HV KVM Paul Mackerras
@ 2013-08-06 4:24 ` Paul Mackerras
2013-08-06 4:25 ` [PATCH 17/23] KVM: PPC: Book3S HV: Factorize kvmppc_core_vcpu_create_hv() Paul Mackerras
` (6 subsequent siblings)
22 siblings, 0 replies; 68+ messages in thread
From: Paul Mackerras @ 2013-08-06 4:24 UTC (permalink / raw)
To: Alexander Graf, Benjamin Herrenschmidt; +Cc: kvm-ppc, kvm
This merges the PR and HV implementations of kvm_vm_ioctl_get_smmu_info()
into a single implementation in book3s.c. Since userspace tends to
call this ioctl very early in the life of a VM, before (for instance)
enabling PAPR mode, we will need this to return results that are
compatible with both PR and HV guests, once we are able to compile both
PR and HV into one kernel image. For HV guests, the capabilities and
encodings need to be consistent with what the real hardware we are
running on can do, whereas for PR guests, the MMU is completely
virtual and so the set of capabilities and encodings is arbitrary.
To achieve this, we report a set of segment and page sizes and
encodings that are consistent with what real POWER processors do.
If the guest could potentially use HV mode then we filter that set
to remove anything that is not implemented by the CPU that we are
running on. The helper function that we add to trigger this filtering,
kvm_book3s_hv_possible(), is currently just defined based on the
kernel configuration.
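For reference, here is a minimal sketch of how userspace consumes this
ioctl (KVM_PPC_GET_SMMU_INFO and struct kvm_ppc_smmu_info are part of
the existing KVM API in <linux/kvm.h>; the printing and error handling
below are purely illustrative):

	#include <stdio.h>
	#include <sys/ioctl.h>
	#include <linux/kvm.h>

	/* vm_fd is a VM file descriptor obtained earlier via KVM_CREATE_VM */
	static void dump_smmu_info(int vm_fd)
	{
		struct kvm_ppc_smmu_info info;
		int i;

		if (ioctl(vm_fd, KVM_PPC_GET_SMMU_INFO, &info) < 0) {
			perror("KVM_PPC_GET_SMMU_INFO");
			return;
		}

		/* With this patch, the one answer has to suit both PR and HV */
		printf("slb_size %u flags 0x%llx\n", info.slb_size,
		       (unsigned long long)info.flags);
		for (i = 0; i < KVM_PPC_PAGE_SIZES_MAX_SZ; i++) {
			if (!info.sps[i].page_shift)
				break;
			printf("  seg: page_shift %u slb_enc 0x%x\n",
			       info.sps[i].page_shift, info.sps[i].slb_enc);
		}
	}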
Signed-off-by: Paul Mackerras <paulus@samba.org>
---
arch/powerpc/include/asm/kvm_ppc.h | 4 +++
arch/powerpc/kvm/book3s.c | 53 ++++++++++++++++++++++++++++++++++++++
arch/powerpc/kvm/book3s_hv.c | 38 ---------------------------
arch/powerpc/kvm/book3s_pr.c | 30 ---------------------
4 files changed, 57 insertions(+), 68 deletions(-)
diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index b15554a..af7fe62 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -283,6 +283,8 @@ static inline void kvmppc_set_host_ipi(int cpu, u8 host_ipi)
extern void kvmppc_fast_vcpu_kick(struct kvm_vcpu *vcpu);
+static inline int kvm_book3s_hv_possible(void) { return 1; }
+
#else
static inline void __init kvm_cma_reserve(void)
{}
@@ -302,6 +304,8 @@ static inline void kvmppc_fast_vcpu_kick(struct kvm_vcpu *vcpu)
{
kvm_vcpu_kick(vcpu);
}
+
+static inline int kvm_book3s_hv_possible(void) { return 0; }
#endif
#ifdef CONFIG_KVM_XICS
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index 4b136be..06abd84 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -728,6 +728,59 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
return -ENOTTY;
}
+#ifdef CONFIG_PPC64
+static void add_seg_page_size(struct kvm_ppc_one_seg_page_size **sps,
+ int linux_psize, int shift, int sllp, int lp)
+{
+ struct mmu_psize_def *def = &mmu_psize_defs[linux_psize];
+
+ if (kvm_book3s_hv_possible()) {
+ /* Check this matches what the hardware does */
+ if (shift != def->shift || sllp != def->sllp ||
+ lp != def->penc[linux_psize])
+ return;
+ }
+
+ (*sps)->page_shift = shift;
+ (*sps)->slb_enc = sllp;
+ (*sps)->enc[0].page_shift = shift;
+ (*sps)->enc[0].pte_enc = lp;
+ (*sps)++;
+}
+
+int kvm_vm_ioctl_get_smmu_info(struct kvm *kvm,
+ struct kvm_ppc_smmu_info *info)
+{
+ struct kvm_ppc_one_seg_page_size *sps;
+
+ /*
+ * At this stage we don't know whether this VM will be
+ * HV or PR, so if it could be HV, restrict what we report
+ * to what the hardware can do.
+ */
+ if (kvm_book3s_hv_possible()) {
+ info->slb_size = mmu_slb_size;
+ info->flags = KVM_PPC_PAGE_SIZES_REAL;
+ if (mmu_has_feature(MMU_FTR_1T_SEGMENT))
+ info->flags |= KVM_PPC_1T_SEGMENTS;
+ } else {
+ /* emulated SLB is always 64 entries */
+ info->slb_size = 64;
+ info->flags = KVM_PPC_1T_SEGMENTS;
+ }
+
+ /* No multi-page size segments (MPSS) support yet */
+ sps = &info->sps[0];
+ add_seg_page_size(&sps, MMU_PAGE_4K, 12, 0, 0);
+ add_seg_page_size(&sps, MMU_PAGE_64K, 16,
+ SLB_VSID_L | SLB_VSID_LP_01, 1);
+ add_seg_page_size(&sps, MMU_PAGE_16M, 24,
+ SLB_VSID_L | SLB_VSID_LP_00, 0);
+
+ return 0;
+}
+#endif /* CONFIG_PPC64 */
+
void kvmppc_core_free_memslot(struct kvm_memory_slot *free,
struct kvm_memory_slot *dont)
{
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index fcf0564..13f79dd 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -1568,44 +1568,6 @@ long kvm_vm_ioctl_allocate_rma(struct kvm *kvm, struct kvm_allocate_rma *ret)
return fd;
}
-static void kvmppc_add_seg_page_size(struct kvm_ppc_one_seg_page_size **sps,
- int linux_psize)
-{
- struct mmu_psize_def *def = &mmu_psize_defs[linux_psize];
-
- if (!def->shift)
- return;
- (*sps)->page_shift = def->shift;
- (*sps)->slb_enc = def->sllp;
- (*sps)->enc[0].page_shift = def->shift;
- /*
- * Only return base page encoding. We don't want to return
- * all the supporting pte_enc, because our H_ENTER doesn't
- * support MPSS yet. Once they do, we can start passing all
- * support pte_enc here
- */
- (*sps)->enc[0].pte_enc = def->penc[linux_psize];
- (*sps)++;
-}
-
-int kvm_vm_ioctl_get_smmu_info(struct kvm *kvm, struct kvm_ppc_smmu_info *info)
-{
- struct kvm_ppc_one_seg_page_size *sps;
-
- info->flags = KVM_PPC_PAGE_SIZES_REAL;
- if (mmu_has_feature(MMU_FTR_1T_SEGMENT))
- info->flags |= KVM_PPC_1T_SEGMENTS;
- info->slb_size = mmu_slb_size;
-
- /* We only support these sizes for now, and no muti-size segments */
- sps = &info->sps[0];
- kvmppc_add_seg_page_size(&sps, MMU_PAGE_4K);
- kvmppc_add_seg_page_size(&sps, MMU_PAGE_64K);
- kvmppc_add_seg_page_size(&sps, MMU_PAGE_16M);
-
- return 0;
-}
-
/*
* Get (and clear) the dirty memory log for a memory slot.
*/
diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
index ab3b032..f583e10 100644
--- a/arch/powerpc/kvm/book3s_pr.c
+++ b/arch/powerpc/kvm/book3s_pr.c
@@ -1383,36 +1383,6 @@ out:
return r;
}
-#ifdef CONFIG_PPC64
-int kvm_vm_ioctl_get_smmu_info(struct kvm *kvm, struct kvm_ppc_smmu_info *info)
-{
- info->flags = KVM_PPC_1T_SEGMENTS;
-
- /* SLB is always 64 entries */
- info->slb_size = 64;
-
- /* Standard 4k base page size segment */
- info->sps[0].page_shift = 12;
- info->sps[0].slb_enc = 0;
- info->sps[0].enc[0].page_shift = 12;
- info->sps[0].enc[0].pte_enc = 0;
-
- /* Standard 16M large page size segment */
- info->sps[1].page_shift = 24;
- info->sps[1].slb_enc = SLB_VSID_L;
- info->sps[1].enc[0].page_shift = 24;
- info->sps[1].enc[0].pte_enc = 0;
-
- /* 64k large page size */
- info->sps[2].page_shift = 16;
- info->sps[2].slb_enc = SLB_VSID_L | SLB_VSID_LP_01;
- info->sps[2].enc[0].page_shift = 16;
- info->sps[2].enc[0].pte_enc = 1;
-
- return 0;
-}
-#endif /* CONFIG_PPC64 */
-
int kvmppc_core_init_vm_pr(struct kvm *kvm)
{
mutex_init(&kvm->arch.hpt_mutex);
--
1.8.3.1
^ permalink raw reply related [flat|nested] 68+ messages in thread
* [PATCH 17/23] KVM: PPC: Book3S HV: Factorize kvmppc_core_vcpu_create_hv()
2013-08-06 4:12 [PATCH 00/23] Allow PR and HV KVM to coexist in one kernel Paul Mackerras
` (15 preceding siblings ...)
2013-08-06 4:24 ` [PATCH 16/23] KVM: PPC: Book3S: Merge implementations of KVM_PPC_GET_SMMU_INFO ioctl Paul Mackerras
@ 2013-08-06 4:25 ` Paul Mackerras
2013-08-06 4:25 ` [PATCH 18/23] KVM: PPC: Book3S: Allow both PR and HV KVM to be selected Paul Mackerras
` (5 subsequent siblings)
22 siblings, 0 replies; 68+ messages in thread
From: Paul Mackerras @ 2013-08-06 4:25 UTC (permalink / raw)
To: Alexander Graf, Benjamin Herrenschmidt; +Cc: kvm-ppc, kvm
This splits kvmppc_core_vcpu_create_hv() into three functions and
adds a new kvmppc_free_vcores() to free the kvmppc_vcore structures
that we allocate for a guest, which are currently being leaked.
The reason for the split is to make the split-out code available
for later use in converting PR kvm_vcpu structs to HV use.
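As a standalone illustration of the allocation pattern involved (and of
the leak that the new kvmppc_free_vcores() closes), here is a simplified
userspace analogue; the struct names and constants below are made up for
the example and are not the kernel's:

	#include <stdlib.h>

	#define MAX_VCORES		8
	#define THREADS_PER_CORE	4

	struct vcore { int num_threads; };
	struct vm {
		struct vcore *vcores[MAX_VCORES];
		int online_vcores;
	};

	/* Find or allocate the core-level struct a vcpu id maps to,
	 * in the same way kvmppc_alloc_vcore() now does for HV guests. */
	static struct vcore *get_vcore(struct vm *vm, unsigned int id)
	{
		unsigned int core = id / THREADS_PER_CORE;

		if (core >= MAX_VCORES)
			return NULL;
		if (!vm->vcores[core]) {
			vm->vcores[core] = calloc(1, sizeof(struct vcore));
			if (!vm->vcores[core])
				return NULL;
			vm->online_vcores++;
		}
		vm->vcores[core]->num_threads++;
		return vm->vcores[core];
	}

	/* Without this teardown step the per-core allocations leak,
	 * which is what the new kvmppc_free_vcores() fixes. */
	static void free_vcores(struct vm *vm)
	{
		int i;

		for (i = 0; i < MAX_VCORES; i++) {
			free(vm->vcores[i]);
			vm->vcores[i] = NULL;
		}
		vm->online_vcores = 0;
	}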
Signed-off-by: Paul Mackerras <paulus@samba.org>
---
arch/powerpc/kvm/book3s_hv.c | 95 +++++++++++++++++++++++++++-----------------
1 file changed, 59 insertions(+), 36 deletions(-)
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 13f79dd..c524d6b 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -891,32 +891,51 @@ int kvmppc_set_one_reg_hv(struct kvm_vcpu *vcpu, u64 id,
return r;
}
-struct kvm_vcpu *kvmppc_core_vcpu_create_hv(struct kvm *kvm, unsigned int id)
+static int kvmppc_alloc_vcore(struct kvm_vcpu *vcpu, unsigned int id)
{
- struct kvm_vcpu *vcpu;
- int err = -EINVAL;
- int core;
+ struct kvm *kvm = vcpu->kvm;
struct kvmppc_vcore *vcore;
+ int core;
core = id / threads_per_core;
if (core >= KVM_MAX_VCORES)
- goto out;
+ return -EINVAL;
- err = -ENOMEM;
- vcpu = kmem_cache_zalloc(kvm_vcpu_cache, GFP_KERNEL);
- if (!vcpu)
- goto out;
+ vcore = kvm->arch.vcores[core];
+ if (!vcore) {
+ vcore = kzalloc(sizeof(struct kvmppc_vcore), GFP_KERNEL);
+ if (!vcore)
+ return -ENOMEM;
+ INIT_LIST_HEAD(&vcore->runnable_threads);
+ spin_lock_init(&vcore->lock);
+ init_waitqueue_head(&vcore->wq);
+ vcore->preempt_tb = TB_NIL;
+ kvm->arch.vcores[core] = vcore;
+ kvm->arch.online_vcores++;
+ }
- err = kvm_vcpu_init(vcpu, kvm, id);
- if (err)
- goto free_vcpu;
+ spin_lock(&vcore->lock);
+ ++vcore->num_threads;
+ spin_unlock(&vcore->lock);
+ vcpu->arch.vcore = vcore;
+
+ return 0;
+}
+static void kvmppc_free_vcores(struct kvm *kvm)
+{
+ long int i;
+
+ for (i = 0; i < KVM_MAX_VCORES; ++i)
+ kfree(kvm->arch.vcores[i]);
+ kvm->arch.online_vcores = 0;
+}
+
+static void kvmppc_setup_hv_vcpu(struct kvm_vcpu *vcpu)
+{
vcpu->arch.shared = &vcpu->arch.shregs;
vcpu->arch.mmcr[0] = MMCR0_FC;
vcpu->arch.ctrl = CTRL_RUNLATCH;
- /* default to host PVR, since we can't spoof it */
- vcpu->arch.pvr = mfspr(SPRN_PVR);
- kvmppc_set_pvr_hv(vcpu, vcpu->arch.pvr);
spin_lock_init(&vcpu->arch.vpa_update_lock);
spin_lock_init(&vcpu->arch.tbacct_lock);
vcpu->arch.busy_preempt = TB_NIL;
@@ -927,31 +946,34 @@ struct kvm_vcpu *kvmppc_core_vcpu_create_hv(struct kvm *kvm, unsigned int id)
init_waitqueue_head(&vcpu->arch.cpu_run);
- mutex_lock(&kvm->lock);
- vcore = kvm->arch.vcores[core];
- if (!vcore) {
- vcore = kzalloc(sizeof(struct kvmppc_vcore), GFP_KERNEL);
- if (vcore) {
- INIT_LIST_HEAD(&vcore->runnable_threads);
- spin_lock_init(&vcore->lock);
- init_waitqueue_head(&vcore->wq);
- vcore->preempt_tb = TB_NIL;
- }
- kvm->arch.vcores[core] = vcore;
- kvm->arch.online_vcores++;
- }
- mutex_unlock(&kvm->lock);
+ vcpu->arch.cpu_type = KVM_CPU_3S_64;
+ kvmppc_sanity_check(vcpu);
+}
- if (!vcore)
+struct kvm_vcpu *kvmppc_core_vcpu_create_hv(struct kvm *kvm, unsigned int id)
+{
+ struct kvm_vcpu *vcpu;
+ int err = -EINVAL;
+
+ err = -ENOMEM;
+ vcpu = kmem_cache_zalloc(kvm_vcpu_cache, GFP_KERNEL);
+ if (!vcpu)
+ goto out;
+
+ err = kvm_vcpu_init(vcpu, kvm, id);
+ if (err)
goto free_vcpu;
- spin_lock(&vcore->lock);
- ++vcore->num_threads;
- spin_unlock(&vcore->lock);
- vcpu->arch.vcore = vcore;
+ /* default to host PVR, since we can't spoof it */
+ vcpu->arch.pvr = mfspr(SPRN_PVR);
- vcpu->arch.cpu_type = KVM_CPU_3S_64;
- kvmppc_sanity_check(vcpu);
+ mutex_lock(&kvm->lock);
+ err = kvmppc_alloc_vcore(vcpu, id);
+ mutex_unlock(&kvm->lock);
+ if (err)
+ goto free_vcpu;
+
+ kvmppc_setup_hv_vcpu(vcpu);
return vcpu;
@@ -1890,6 +1912,7 @@ void kvmppc_core_destroy_vm_hv(struct kvm *kvm)
{
uninhibit_secondary_onlining();
+ kvmppc_free_vcores(kvm);
if (kvm->arch.rma) {
kvm_release_rma(kvm->arch.rma);
kvm->arch.rma = NULL;
--
1.8.3.1
^ permalink raw reply related [flat|nested] 68+ messages in thread
* [PATCH 18/23] KVM: PPC: Book3S: Allow both PR and HV KVM to be selected
2013-08-06 4:12 [PATCH 00/23] Allow PR and HV KVM to coexist in one kernel Paul Mackerras
` (16 preceding siblings ...)
2013-08-06 4:25 ` [PATCH 17/23] KVM: PPC: Book3S HV: Factorize kvmppc_core_vcpu_create_hv() Paul Mackerras
@ 2013-08-06 4:25 ` Paul Mackerras
2013-08-06 4:26 ` [PATCH 19/23] KVM: PPC: Book3S: Select PR vs HV separately for each guest Paul Mackerras
` (4 subsequent siblings)
22 siblings, 0 replies; 68+ messages in thread
From: Paul Mackerras @ 2013-08-06 4:25 UTC (permalink / raw)
To: Alexander Graf, Benjamin Herrenschmidt; +Cc: kvm-ppc, kvm
This makes the config options for PR and HV KVM independently selectable,
making it possible to compile a KVM module with both PR and HV code in
it. This adds fields to struct kvm_arch and struct kvm_vcpu_arch to
indicate whether the guest is using PR or HV KVM, though at this stage
all guests in a given kernel instance are of the same type: HV KVM if
HV is enabled and the machine supports it (i.e. has suitable CPUs and
has a working hypervisor mode available), otherwise PR.
Since the code in book3s_64_vio_hv.c is called from real mode with HV
KVM, and therefore has to be built into the main kernel binary, this
makes it always built-in rather than part of the KVM module. It gets
called from the KVM module by PR KVM, so this adds an EXPORT_SYMBOL_GPL().
If both HV and PR KVM are included, interrupts come in to the HV version
of the kvmppc_interrupt code, which then jumps to the PR handler,
renamed to kvmppc_interrupt_pr, if the guest is a PR guest.
Allowing both PR and HV in the same kernel required some changes to
kvm_dev_ioctl_check_extension(), since the values returned now can't
be selected with #ifdefs as much as previously. For capabilities that
are only provided by HV KVM (for example, KVM_PPC_ALLOCATE_HTAB), we
return the HV value only if HV KVM is possible on the current machine.
For capabilities provided by PR KVM but not HV, we return the PR
value unless only HV KVM has been configured.
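To illustrate what this enables, a kernel can now be configured with
something like the following fragment (an example only; the exact set of
symbols and their y/m values depends on the rest of the configuration):

	CONFIG_VIRTUALIZATION=y
	CONFIG_KVM_BOOK3S_64=m
	CONFIG_KVM_BOOK3S_64_HV=y
	CONFIG_KVM_BOOK3S_64_PR=y

and the resulting KVM module then contains both the PR and HV code
paths, where previously KVM_BOOK3S_64_PR was forced off whenever
KVM_BOOK3S_64_HV was selected.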
Signed-off-by: Paul Mackerras <paulus@samba.org>
---
arch/powerpc/include/asm/kvm_book3s.h | 67 +++++++++++++++++++++------------
arch/powerpc/include/asm/kvm_host.h | 6 +++
arch/powerpc/include/asm/kvm_ppc.h | 5 ++-
arch/powerpc/kvm/Kconfig | 15 +++++++-
arch/powerpc/kvm/Makefile | 11 +++---
arch/powerpc/kvm/book3s.c | 56 +++++++++++++++++++++++----
arch/powerpc/kvm/book3s_64_vio_hv.c | 1 +
arch/powerpc/kvm/book3s_emulate.c | 9 +++++
arch/powerpc/kvm/book3s_exports.c | 3 +-
arch/powerpc/kvm/book3s_hv.c | 3 ++
arch/powerpc/kvm/book3s_hv_rmhandlers.S | 4 ++
arch/powerpc/kvm/book3s_segment.S | 7 ++++
arch/powerpc/kvm/book3s_xics.c | 2 +-
arch/powerpc/kvm/powerpc.c | 57 ++++++++++++++++++----------
14 files changed, 184 insertions(+), 62 deletions(-)
diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h
index 476d862..f6af43f 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -272,6 +272,29 @@ static inline struct kvmppc_vcpu_book3s *to_book3s(struct kvm_vcpu *vcpu)
#include <asm/kvm_book3s_64.h>
#endif
+#if defined(CONFIG_KVM_BOOK3S_PR) && defined(CONFIG_KVM_BOOK3S_64_HV)
+static inline int kvmppc_vcpu_pr(struct kvm_vcpu *vcpu)
+{
+ return !vcpu->arch.use_hv;
+}
+
+static inline int kvmppc_vcpu_hv(struct kvm_vcpu *vcpu)
+{
+ return vcpu->arch.use_hv;
+}
+
+#else
+static inline int kvmppc_vcpu_pr(struct kvm_vcpu *vcpu)
+{
+ return IS_ENABLED(CONFIG_KVM_BOOK3S_PR);
+}
+
+static inline int kvmppc_vcpu_hv(struct kvm_vcpu *vcpu)
+{
+ return IS_ENABLED(CONFIG_KVM_BOOK3S_64_HV);
+}
+#endif
+
static inline void kvmppc_set_gpr(struct kvm_vcpu *vcpu, int num, ulong val)
{
vcpu->arch.gpr[num] = val;
@@ -366,28 +389,38 @@ static inline ulong kvmppc_get_fault_dar(struct kvm_vcpu *vcpu)
return vcpu->arch.fault_dar;
}
-#ifdef CONFIG_KVM_BOOK3S_PR
+#ifdef CONFIG_KVM_BOOK3S_HANDLER
static inline unsigned long kvmppc_interrupt_offset(struct kvm_vcpu *vcpu)
{
- return to_book3s(vcpu)->hior;
+ if (kvmppc_vcpu_pr(vcpu))
+ return to_book3s(vcpu)->hior;
+ return 0;
}
static inline void kvmppc_update_int_pending(struct kvm_vcpu *vcpu,
unsigned long pending_now, unsigned long old_pending)
{
- if (pending_now)
- vcpu->arch.shared->int_pending = 1;
- else if (old_pending)
- vcpu->arch.shared->int_pending = 0;
+ if (kvmppc_vcpu_pr(vcpu)) {
+ if (pending_now)
+ vcpu->arch.shared->int_pending = 1;
+ else if (old_pending)
+ vcpu->arch.shared->int_pending = 0;
+ }
}
static inline bool kvmppc_critical_section(struct kvm_vcpu *vcpu)
{
- ulong crit_raw = vcpu->arch.shared->critical;
- ulong crit_r1 = kvmppc_get_gpr(vcpu, 1);
+ ulong crit_raw;
+ ulong crit_r1;
bool crit;
+ if (!kvmppc_vcpu_pr(vcpu))
+ return false;
+
+ crit_raw = vcpu->arch.shared->critical;
+ crit_r1 = kvmppc_get_gpr(vcpu, 1);
+
/* Truncate crit indicators in 32 bit mode */
if (!(vcpu->arch.shared->msr & MSR_SF)) {
crit_raw &= 0xffffffff;
@@ -401,23 +434,7 @@ static inline bool kvmppc_critical_section(struct kvm_vcpu *vcpu)
return crit;
}
-#else /* CONFIG_KVM_BOOK3S_PR */
-
-static inline unsigned long kvmppc_interrupt_offset(struct kvm_vcpu *vcpu)
-{
- return 0;
-}
-
-static inline void kvmppc_update_int_pending(struct kvm_vcpu *vcpu,
- unsigned long pending_now, unsigned long old_pending)
-{
-}
-
-static inline bool kvmppc_critical_section(struct kvm_vcpu *vcpu)
-{
- return false;
-}
-#endif
+#endif /* CONFIG_KVM_BOOK3S_HANDLER */
/* Magic register values loaded into r3 and r4 before the 'sc' assembly
* instruction for the OSI hypercalls */
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index c012db2..647e064 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -237,6 +237,7 @@ struct kvm_arch_memory_slot {
struct kvm_arch {
unsigned int lpid;
+ int kvm_mode;
#ifdef CONFIG_KVM_BOOK3S_64_HV
unsigned long hpt_virt;
struct revmap_entry *revmap;
@@ -278,6 +279,10 @@ struct kvm_arch {
#endif
};
+/* Values for kvm_mode */
+#define KVM_MODE_PR 1
+#define KVM_MODE_HV 2
+
/*
* Struct for a virtual core.
* Note: entry_exit_count combines an entry count in the bottom 8 bits
@@ -409,6 +414,7 @@ struct kvm_vcpu_arch {
ulong host_stack;
u32 host_pid;
#ifdef CONFIG_PPC_BOOK3S
+ bool use_hv;
struct kvmppc_slb slb[64];
int slb_max; /* 1 + index of last valid entry in slb[] */
int slb_nr; /* total number of entries in SLB */
diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index af7fe62..7408e2b 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -283,7 +283,8 @@ static inline void kvmppc_set_host_ipi(int cpu, u8 host_ipi)
extern void kvmppc_fast_vcpu_kick(struct kvm_vcpu *vcpu);
-static inline int kvm_book3s_hv_possible(void) { return 1; }
+extern int kvm_book3s_hv_possible(void);
+extern int kvm_is_book3s_hv(struct kvm *kvm);
#else
static inline void __init kvm_cma_reserve(void)
@@ -306,6 +307,8 @@ static inline void kvmppc_fast_vcpu_kick(struct kvm_vcpu *vcpu)
}
static inline int kvm_book3s_hv_possible(void) { return 0; }
+static inline int kvm_is_book3s_hv(struct kvm *kvm) { return 0; }
+
#endif
#ifdef CONFIG_KVM_XICS
diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig
index ffaef2c..2eeb2ae 100644
--- a/arch/powerpc/kvm/Kconfig
+++ b/arch/powerpc/kvm/Kconfig
@@ -89,9 +89,20 @@ config KVM_BOOK3S_64_HV
If unsure, say N.
config KVM_BOOK3S_64_PR
- def_bool y
- depends on KVM_BOOK3S_64 && !KVM_BOOK3S_64_HV
+ bool "KVM support without using hypervisor mode in host"
+ depends on KVM_BOOK3S_64
select KVM_BOOK3S_PR
+ ---help---
+ Support running guest kernels in virtual machines on processors
+ without using hypervisor mode in the host, by running the
+ guest in user mode (problem state) and emulating all
+ privileged instructions and registers.
+
+ This is not as fast as using hypervisor mode, but works on
+ machines where hypervisor mode is not available or not usable,
+ and can emulate processors that are different from the host
+ processor, including emulating 32-bit processors on a 64-bit
+ host.
config KVM_BOOKE_HV
bool
diff --git a/arch/powerpc/kvm/Makefile b/arch/powerpc/kvm/Makefile
index 6646c95..5d63d7f 100644
--- a/arch/powerpc/kvm/Makefile
+++ b/arch/powerpc/kvm/Makefile
@@ -53,32 +53,33 @@ kvm-e500mc-objs := \
e500_emulate.o
kvm-objs-$(CONFIG_KVM_E500MC) := $(kvm-e500mc-objs)
+kvm-book3s_64-builtin-objs-$(CONFIG_KVM_BOOK3S_64_HANDLER) := \
+ book3s_64_vio_hv.o
+
kvm-book3s_64-objs-$(CONFIG_KVM_BOOK3S_64_PR) := \
$(KVM)/coalesced_mmio.o \
fpu.o \
book3s_paired_singles.o \
book3s_pr.o \
book3s_pr_papr.o \
- book3s_64_vio_hv.o \
book3s_emulate.o \
book3s_interrupts.o \
book3s_mmu_hpte.o \
book3s_64_mmu_host.o \
book3s_64_mmu.o \
book3s_32_mmu.o
-kvm-book3s_64-builtin-objs-$(CONFIG_KVM_BOOK3S_64_PR) := \
+kvm-book3s_64-builtin-objs-$(CONFIG_KVM_BOOK3S_64_PR) += \
book3s_rmhandlers.o
-kvm-book3s_64-objs-$(CONFIG_KVM_BOOK3S_64_HV) := \
+kvm-book3s_64-objs-$(CONFIG_KVM_BOOK3S_64_HV) += \
book3s_hv.o \
book3s_hv_interrupts.o \
book3s_64_mmu_hv.o
kvm-book3s_64-builtin-xics-objs-$(CONFIG_KVM_XICS) := \
book3s_hv_rm_xics.o
-kvm-book3s_64-builtin-objs-$(CONFIG_KVM_BOOK3S_64_HV) := \
+kvm-book3s_64-builtin-objs-$(CONFIG_KVM_BOOK3S_64_HV) += \
book3s_hv_rmhandlers.o \
book3s_hv_rm_mmu.o \
- book3s_64_vio_hv.o \
book3s_hv_ras.o \
book3s_hv_builtin.o \
book3s_hv_cma.o \
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index 06abd84..f22b3af 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -61,19 +61,54 @@ struct kvm_stats_debugfs_item debugfs_entries[] = {
{ NULL }
};
+#ifdef CONFIG_KVM_BOOK3S_64_HV
+static int hv_ok;
+
+int kvm_book3s_hv_possible(void)
+{
+ return hv_ok;
+}
+
+int kvm_is_book3s_hv(struct kvm *kvm)
+{
+ return kvm->arch.kvm_mode == KVM_MODE_HV;
+}
+#endif
+
#ifdef CONFIG_KVM_BOOK3S_PR
+#ifdef CONFIG_KVM_BOOK3S_64_HV
+/* Do x if the VM mode is PR */
+#define DO_IF_PR(kvm, x) if ((kvm)->arch.kvm_mode == KVM_MODE_PR) { x; }
+/* Do x if the VM mode is HV */
+#define DO_IF_HV(kvm, x) if ((kvm)->arch.kvm_mode == KVM_MODE_HV) { x; }
+
+/* Do x for PR vcpus */
+#define VCPU_DO_PR(vcpu, x) if (!(vcpu)->arch.use_hv) { x; }
+/* Do x for HV vcpus */
+#define VCPU_DO_HV(vcpu, x) if ((vcpu)->arch.use_hv) { x; }
+
+#else
#define DO_IF_PR(kvm, x) x
#define DO_IF_HV(kvm, x)
#define VCPU_DO_PR(vcpu, x) x
#define VCPU_DO_HV(vcpu, x)
#endif
+#else
#ifdef CONFIG_KVM_BOOK3S_64_HV
#define DO_IF_PR(kvm, x)
#define DO_IF_HV(kvm, x) x
#define VCPU_DO_PR(vcpu, x)
#define VCPU_DO_HV(vcpu, x) x
+
+#else
+#define DO_IF_PR(kvm, x)
+#define DO_IF_HV(kvm, x)
+#define VCPU_DO_PR(vcpu, x)
+#define VCPU_DO_HV(vcpu, x)
#endif
+#endif /* CONFIG_KVM_BOOK3S_PR */
+
void kvmppc_core_load_host_debugstate(struct kvm_vcpu *vcpu)
{
@@ -867,11 +902,16 @@ int kvmppc_core_init_vm(struct kvm *kvm)
INIT_LIST_HEAD(&kvm->arch.rtas_tokens);
#endif
+#ifdef CONFIG_KVM_BOOK3S_64_HV
+ if (hv_ok) {
+ err = kvmppc_core_init_vm_hv(kvm);
+ kvm->arch.kvm_mode = KVM_MODE_HV;
+ return err;
+ }
+#endif
#ifdef CONFIG_KVM_BOOK3S_PR
err = kvmppc_core_init_vm_pr(kvm);
-#endif
-#ifdef CONFIG_KVM_BOOK3S_64_HV
- err = kvmppc_core_init_vm_hv(kvm);
+ kvm->arch.kvm_mode = KVM_MODE_PR;
#endif
return err;
@@ -890,7 +930,7 @@ void kvmppc_core_destroy_vm(struct kvm *kvm)
int kvmppc_core_check_processor_compat(void)
{
-#if defined(CONFIG_KVM_BOOK3S_64_HV)
+#if defined(CONFIG_KVM_BOOK3S_64_HV) && !defined(CONFIG_KVM_BOOK3S_PR)
if (!cpu_has_feature(CPU_FTR_HVMODE))
return -EIO;
#endif
@@ -905,11 +945,13 @@ static int kvmppc_book3s_init(void)
if (r)
return r;
-#ifdef CONFIG_KVM_BOOK3S_PR
- r = kvmppc_mmu_hpte_sysinit();
-#endif
#ifdef CONFIG_KVM_BOOK3S_64_HV
r = kvmppc_mmu_hv_init();
+ if (!r)
+ hv_ok = 1;
+#endif
+#ifdef CONFIG_KVM_BOOK3S_PR
+ r = kvmppc_mmu_hpte_sysinit();
#endif
return r;
diff --git a/arch/powerpc/kvm/book3s_64_vio_hv.c b/arch/powerpc/kvm/book3s_64_vio_hv.c
index 30c2f3b..2c25f54 100644
--- a/arch/powerpc/kvm/book3s_64_vio_hv.c
+++ b/arch/powerpc/kvm/book3s_64_vio_hv.c
@@ -74,3 +74,4 @@ long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
/* Didn't find the liobn, punt it to userspace */
return H_TOO_HARD;
}
+EXPORT_SYMBOL_GPL(kvmppc_h_put_tce);
diff --git a/arch/powerpc/kvm/book3s_emulate.c b/arch/powerpc/kvm/book3s_emulate.c
index 34044b1..fe958e1 100644
--- a/arch/powerpc/kvm/book3s_emulate.c
+++ b/arch/powerpc/kvm/book3s_emulate.c
@@ -95,6 +95,9 @@ int kvmppc_core_emulate_op(struct kvm_run *run, struct kvm_vcpu *vcpu,
int ra = get_ra(inst);
int rb = get_rb(inst);
+ if (kvmppc_vcpu_hv(vcpu))
+ return EMULATE_FAIL;
+
switch (get_op(inst)) {
case 19:
switch (get_xop(inst)) {
@@ -349,6 +352,9 @@ int kvmppc_core_emulate_mtspr(struct kvm_vcpu *vcpu, int sprn, ulong spr_val)
{
int emulated = EMULATE_DONE;
+ if (kvmppc_vcpu_hv(vcpu))
+ return EMULATE_FAIL;
+
switch (sprn) {
case SPRN_SDR1:
if (!spr_allowed(vcpu, PRIV_HYPER))
@@ -472,6 +478,9 @@ int kvmppc_core_emulate_mfspr(struct kvm_vcpu *vcpu, int sprn, ulong *spr_val)
{
int emulated = EMULATE_DONE;
+ if (kvmppc_vcpu_hv(vcpu))
+ return EMULATE_FAIL;
+
switch (sprn) {
case SPRN_IBAT0U ... SPRN_IBAT3L:
case SPRN_IBAT4U ... SPRN_IBAT7L:
diff --git a/arch/powerpc/kvm/book3s_exports.c b/arch/powerpc/kvm/book3s_exports.c
index 7057a02..0730d98 100644
--- a/arch/powerpc/kvm/book3s_exports.c
+++ b/arch/powerpc/kvm/book3s_exports.c
@@ -22,7 +22,8 @@
#ifdef CONFIG_KVM_BOOK3S_64_HV
EXPORT_SYMBOL_GPL(kvmppc_hv_entry_trampoline);
-#else
+#endif
+#ifdef CONFIG_KVM_BOOK3S_PR
EXPORT_SYMBOL_GPL(kvmppc_entry_trampoline);
EXPORT_SYMBOL_GPL(kvmppc_load_up_fpu);
#ifdef CONFIG_ALTIVEC
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index c524d6b..956318b 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -959,6 +959,7 @@ struct kvm_vcpu *kvmppc_core_vcpu_create_hv(struct kvm *kvm, unsigned int id)
vcpu = kmem_cache_zalloc(kvm_vcpu_cache, GFP_KERNEL);
if (!vcpu)
goto out;
+ vcpu->arch.use_hv = true;
err = kvm_vcpu_init(vcpu, kvm, id);
if (err)
@@ -1921,6 +1922,7 @@ void kvmppc_core_destroy_vm_hv(struct kvm *kvm)
kvmppc_free_hpt(kvm);
}
+#ifndef CONFIG_KVM_BOOK3S_PR
/* We don't need to emulate any privileged instructions or dcbz */
int kvmppc_core_emulate_op(struct kvm_run *run, struct kvm_vcpu *vcpu,
unsigned int inst, int *advance)
@@ -1937,3 +1939,4 @@ int kvmppc_core_emulate_mfspr(struct kvm_vcpu *vcpu, int sprn, ulong *spr_val)
{
return EMULATE_FAIL;
}
+#endif
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index af9ba85..467b24d 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -569,6 +569,10 @@ kvmppc_interrupt:
lbz r9, HSTATE_IN_GUEST(r13)
cmpwi r9, KVM_GUEST_MODE_HOST_HV
beq kvmppc_bad_host_intr
+#ifdef CONFIG_KVM_BOOK3S_PR
+ cmpwi r9, KVM_GUEST_MODE_GUEST
+ beq kvmppc_interrupt_pr
+#endif
/* We're now back in the host but in guest MMU context */
li r9, KVM_GUEST_MODE_HOST_HV
stb r9, HSTATE_IN_GUEST(r13)
diff --git a/arch/powerpc/kvm/book3s_segment.S b/arch/powerpc/kvm/book3s_segment.S
index 1abe478..13cfc59 100644
--- a/arch/powerpc/kvm/book3s_segment.S
+++ b/arch/powerpc/kvm/book3s_segment.S
@@ -161,8 +161,15 @@ kvmppc_handler_trampoline_enter_end:
.global kvmppc_handler_trampoline_exit
kvmppc_handler_trampoline_exit:
+#ifdef CONFIG_KVM_BOOK3S_64_HV
+.global kvmppc_interrupt_pr
+kvmppc_interrupt_pr:
+ ld r9, HSTATE_SCRATCH2(r13)
+
+#else
.global kvmppc_interrupt
kvmppc_interrupt:
+#endif
/* Register usage at this point:
*
diff --git a/arch/powerpc/kvm/book3s_xics.c b/arch/powerpc/kvm/book3s_xics.c
index a3a5cb8..d28d2f7 100644
--- a/arch/powerpc/kvm/book3s_xics.c
+++ b/arch/powerpc/kvm/book3s_xics.c
@@ -818,7 +818,7 @@ int kvmppc_xics_hcall(struct kvm_vcpu *vcpu, u32 req)
}
/* Check for real mode returning too hard */
- if (xics->real_mode)
+ if (xics->real_mode && kvmppc_vcpu_hv(vcpu))
return kvmppc_xics_rm_complete(vcpu, req);
switch (req) {
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 4e05f8c..49bbc9e 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -50,7 +50,6 @@ int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu)
return 1;
}
-#ifndef CONFIG_KVM_BOOK3S_64_HV
/*
* Common checks before entering the guest world. Call with interrupts
* disabled.
@@ -125,7 +124,6 @@ int kvmppc_prepare_to_enter(struct kvm_vcpu *vcpu)
return r;
}
-#endif /* CONFIG_KVM_BOOK3S_64_HV */
int kvmppc_kvm_pv(struct kvm_vcpu *vcpu)
{
@@ -193,8 +191,8 @@ int kvmppc_sanity_check(struct kvm_vcpu *vcpu)
goto out;
#ifdef CONFIG_KVM_BOOK3S_64_HV
- /* HV KVM can only do PAPR mode for now */
- if (!vcpu->arch.papr_enabled)
+ /* HV KVM can only do PAPR mode */
+ if (!vcpu->arch.papr_enabled && kvmppc_vcpu_hv(vcpu))
goto out;
#endif
@@ -298,6 +296,12 @@ void kvm_arch_sync_events(struct kvm *kvm)
{
}
+#if defined(CONFIG_KVM_BOOK3S_64_HV) && !defined(CONFIG_KVM_BOOK3S_PR)
+#define KVM_IS_BOOK3S_HV_ONLY 1
+#else
+#define KVM_IS_BOOK3S_HV_ONLY 0
+#endif
+
int kvm_dev_ioctl_check_extension(long ext)
{
int r;
@@ -320,22 +324,24 @@ int kvm_dev_ioctl_check_extension(long ext)
case KVM_CAP_DEVICE_CTRL:
r = 1;
break;
-#ifndef CONFIG_KVM_BOOK3S_64_HV
case KVM_CAP_PPC_PAIRED_SINGLES:
case KVM_CAP_PPC_OSI:
case KVM_CAP_PPC_GET_PVINFO:
#if defined(CONFIG_KVM_E500V2) || defined(CONFIG_KVM_E500MC)
case KVM_CAP_SW_TLB:
#endif
-#ifdef CONFIG_KVM_MPIC
- case KVM_CAP_IRQ_MPIC:
-#endif
- r = 1;
+ r = !KVM_IS_BOOK3S_HV_ONLY;
break;
+#ifdef CONFIG_KVM_MMIO
case KVM_CAP_COALESCED_MMIO:
r = KVM_COALESCED_MMIO_PAGE_OFFSET;
break;
#endif
+#ifdef CONFIG_KVM_MPIC
+ case KVM_CAP_IRQ_MPIC:
+ r = 1;
+ break;
+#endif
#ifdef CONFIG_PPC_BOOK3S_64
case KVM_CAP_SPAPR_TCE:
case KVM_CAP_PPC_ALLOC_HTAB:
@@ -348,30 +354,32 @@ int kvm_dev_ioctl_check_extension(long ext)
#endif /* CONFIG_PPC_BOOK3S_64 */
#ifdef CONFIG_KVM_BOOK3S_64_HV
case KVM_CAP_PPC_SMT:
- r = threads_per_core;
+ r = 0;
+ if (kvm_book3s_hv_possible())
+ r = threads_per_core;
break;
case KVM_CAP_PPC_RMA:
- r = 1;
- /* PPC970 requires an RMA */
- if (cpu_has_feature(CPU_FTR_ARCH_201))
+ r = kvm_book3s_hv_possible();
+ /* PPC970 requires an RMA for HV KVM */
+ if (r && cpu_has_feature(CPU_FTR_ARCH_201))
r = 2;
break;
#endif
case KVM_CAP_SYNC_MMU:
#ifdef CONFIG_KVM_BOOK3S_64_HV
- r = cpu_has_feature(CPU_FTR_ARCH_206) ? 1 : 0;
+ r = !KVM_IS_BOOK3S_HV_ONLY ||
+ cpu_has_feature(CPU_FTR_ARCH_206);
#elif defined(KVM_ARCH_WANT_MMU_NOTIFIER)
r = 1;
#else
r = 0;
- break;
#endif
+ break;
#ifdef CONFIG_KVM_BOOK3S_64_HV
case KVM_CAP_PPC_HTAB_FD:
- r = 1;
+ r = kvm_book3s_hv_possible();
break;
#endif
- break;
case KVM_CAP_NR_VCPUS:
/*
* Recommending a number of CPUs is somewhat arbitrary; we
@@ -379,10 +387,10 @@ int kvm_dev_ioctl_check_extension(long ext)
* will have secondary threads "offline"), and for other KVM
* implementations just count online CPUs.
*/
-#ifdef CONFIG_KVM_BOOK3S_64_HV
- r = num_present_cpus();
-#else
r = num_online_cpus();
+#ifdef CONFIG_KVM_BOOK3S_64_HV
+ if (kvm_book3s_hv_possible())
+ r = num_present_cpus();
#endif
break;
case KVM_CAP_MAX_VCPUS:
@@ -1027,6 +1035,9 @@ long kvm_arch_vm_ioctl(struct file *filp,
struct kvm_allocate_rma rma;
struct kvm *kvm = filp->private_data;
+ r = -ENOTTY;
+ if (!kvm_is_book3s_hv(kvm))
+ break;
r = kvm_vm_ioctl_allocate_rma(kvm, &rma);
if (r >= 0 && copy_to_user(argp, &rma, sizeof(rma)))
r = -EFAULT;
@@ -1036,6 +1047,9 @@ long kvm_arch_vm_ioctl(struct file *filp,
case KVM_PPC_ALLOCATE_HTAB: {
u32 htab_order;
+ r = -ENOTTY;
+ if (!kvm_is_book3s_hv(kvm))
+ break;
r = -EFAULT;
if (get_user(htab_order, (u32 __user *)argp))
break;
@@ -1052,6 +1066,9 @@ long kvm_arch_vm_ioctl(struct file *filp,
case KVM_PPC_GET_HTAB_FD: {
struct kvm_get_htab_fd ghf;
+ r = -ENOTTY;
+ if (!kvm_is_book3s_hv(kvm))
+ break;
r = -EFAULT;
if (copy_from_user(&ghf, argp, sizeof(ghf)))
break;
--
1.8.3.1
^ permalink raw reply related [flat|nested] 68+ messages in thread
* [PATCH 19/23] KVM: PPC: Book3S: Select PR vs HV separately for each guest
2013-08-06 4:12 [PATCH 00/23] Allow PR and HV KVM to coexist in one kernel Paul Mackerras
` (17 preceding siblings ...)
2013-08-06 4:25 ` [PATCH 18/23] KVM: PPC: Book3S: Allow both PR and HV KVM to be selected Paul Mackerras
@ 2013-08-06 4:26 ` Paul Mackerras
2013-09-12 22:56 ` Alexander Graf
2013-08-06 4:27 ` [PATCH 20/23] KVM: PPC: Book3S PR: Better handling of host-side read-only pages Paul Mackerras
` (3 subsequent siblings)
22 siblings, 1 reply; 68+ messages in thread
From: Paul Mackerras @ 2013-08-06 4:26 UTC (permalink / raw)
To: Alexander Graf, Benjamin Herrenschmidt; +Cc: kvm-ppc, kvm
This makes it possible to have both PR and HV guests running
concurrently on one machine, by deferring the decision about which type
of KVM to use for each guest until userspace either enables the PAPR
capability or runs a vcpu. (Of course, this is only possible if both
CONFIG_KVM_BOOK3S_PR and CONFIG_KVM_BOOK3S_64_HV are enabled.)
Guests start out essentially as PR guests but with kvm->arch.kvm_mode
set to KVM_MODE_UNKNOWN. If userspace then enables the KVM_CAP_PPC_PAPR
capability, and the machine is capable of running HV guests (i.e. it
has suitable CPUs and has a usable hypervisor mode available), the
guest gets converted to an HV guest at that point. If userspace runs
a vcpu without having enabled the KVM_CAP_PPC_PAPR capability, the
guest is confirmed as a PR guest at that point.
This also moves the preloading of the FPU for PR guests from
kvmppc_set_msr_pr() into kvmppc_handle_exit_pr(), because
kvmppc_set_msr_pr() can be called before any vcpu has been run, and
it may be that the guest will end up as an HV guest, in which case
the preloading is not appropriate. Instead it is now done after we
have emulated a privileged or illegal instruction, if the guest MSR
now has FP set.
Signed-off-by: Paul Mackerras <paulus@samba.org>
---
arch/powerpc/include/asm/kvm_book3s.h | 3 ++
arch/powerpc/include/asm/kvm_booke.h | 2 +
arch/powerpc/include/asm/kvm_host.h | 1 +
arch/powerpc/kvm/book3s.c | 76 ++++++++++++++++++++++++++++------
arch/powerpc/kvm/book3s_hv.c | 77 +++++++++++++++++++++++++++++++++++
arch/powerpc/kvm/book3s_pr.c | 24 +++++++++--
arch/powerpc/kvm/powerpc.c | 1 +
7 files changed, 167 insertions(+), 17 deletions(-)
diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h
index f6af43f..e0bc83b 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -189,6 +189,9 @@ extern u32 kvmppc_alignment_dsisr(struct kvm_vcpu *vcpu, unsigned int inst);
extern ulong kvmppc_alignment_dar(struct kvm_vcpu *vcpu, unsigned int inst);
extern int kvmppc_h_pr(struct kvm_vcpu *vcpu, unsigned long cmd);
+extern void kvmppc_core_enable_papr(struct kvm_vcpu *vcpu);
+extern int kvmppc_convert_to_hv(struct kvm *kvm);
+extern void kvmppc_release_vcpu_pr(struct kvm_vcpu *vcpu);
/* Functions that have implementations in both PR and HV KVM */
extern struct kvm_vcpu *kvmppc_core_vcpu_create_pr(struct kvm *kvm,
unsigned int id);
diff --git a/arch/powerpc/include/asm/kvm_booke.h b/arch/powerpc/include/asm/kvm_booke.h
index d3c1eb3..450bd71 100644
--- a/arch/powerpc/include/asm/kvm_booke.h
+++ b/arch/powerpc/include/asm/kvm_booke.h
@@ -23,6 +23,8 @@
#include <linux/types.h>
#include <linux/kvm_host.h>
+static inline void kvmppc_core_enable_papr(struct kvm_vcpu *vcpu) {}
+
/* LPIDs we support with this build -- runtime limit may be lower */
#define KVMPPC_NR_LPIDS 64
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 647e064..138e781 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -280,6 +280,7 @@ struct kvm_arch {
};
/* Values for kvm_mode */
+#define KVM_MODE_UNKNOWN 0
#define KVM_MODE_PR 1
#define KVM_MODE_HV 2
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index f22b3af..bddbfaa 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -77,9 +77,11 @@ int kvm_is_book3s_hv(struct kvm *kvm)
#ifdef CONFIG_KVM_BOOK3S_PR
#ifdef CONFIG_KVM_BOOK3S_64_HV
-/* Do x if the VM mode is PR */
+/* Do x if the VM mode is known to be PR */
#define DO_IF_PR(kvm, x) if ((kvm)->arch.kvm_mode == KVM_MODE_PR) { x; }
-/* Do x if the VM mode is HV */
+/* Do x if the VM mode is unknown or is known to be PR */
+#define DO_IF_PR_U(kvm, x) if ((kvm)->arch.kvm_mode != KVM_MODE_HV) { x; }
+/* Do x if the VM mode is known to be HV */
#define DO_IF_HV(kvm, x) if ((kvm)->arch.kvm_mode == KVM_MODE_HV) { x; }
/* Do x for PR vcpus */
@@ -89,6 +91,7 @@ int kvm_is_book3s_hv(struct kvm *kvm)
#else
#define DO_IF_PR(kvm, x) x
+#define DO_IF_PR_U(kvm, x) x
#define DO_IF_HV(kvm, x)
#define VCPU_DO_PR(vcpu, x) x
#define VCPU_DO_HV(vcpu, x)
@@ -97,12 +100,14 @@ int kvm_is_book3s_hv(struct kvm *kvm)
#else
#ifdef CONFIG_KVM_BOOK3S_64_HV
#define DO_IF_PR(kvm, x)
+#define DO_IF_PR_U(kvm, x)
#define DO_IF_HV(kvm, x) x
#define VCPU_DO_PR(vcpu, x)
#define VCPU_DO_HV(vcpu, x) x
#else
#define DO_IF_PR(kvm, x)
+#define DO_IF_PR_U(kvm, x)
#define DO_IF_HV(kvm, x)
#define VCPU_DO_PR(vcpu, x)
#define VCPU_DO_HV(vcpu, x)
@@ -712,11 +717,47 @@ void kvmppc_set_msr(struct kvm_vcpu *vcpu, u64 msr)
int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
{
+ struct kvm *kvm = vcpu->kvm;
+
+ /*
+ * If HV mode hasn't been selected by now, make it PR mode
+ * from now on.
+ */
+ if (kvm->arch.kvm_mode == KVM_MODE_UNKNOWN) {
+ mutex_lock(&kvm->lock);
+ if (kvm->arch.kvm_mode == KVM_MODE_UNKNOWN)
+ kvm->arch.kvm_mode = KVM_MODE_PR;
+ mutex_unlock(&kvm->lock);
+ }
+
VCPU_DO_PR(vcpu, return kvmppc_vcpu_run_pr(kvm_run, vcpu));
VCPU_DO_HV(vcpu, return kvmppc_vcpu_run_hv(kvm_run, vcpu));
return -EINVAL;
}
+/*
+ * If we can do either PR or HV, switch to HV if possible.
+ */
+void kvmppc_core_enable_papr(struct kvm_vcpu *vcpu)
+{
+#if defined(CONFIG_KVM_BOOK3S_PR) && defined(CONFIG_KVM_BOOK3S_64_HV)
+ struct kvm *kvm = vcpu->kvm;
+ int err;
+
+ mutex_lock(&kvm->lock);
+ if (kvm->arch.kvm_mode == KVM_MODE_UNKNOWN) {
+ if (kvm_book3s_hv_possible()) {
+ /* should check PVRs */
+ err = kvmppc_convert_to_hv(kvm);
+ if (!err)
+ pr_debug("KVM: Using HV mode for PAPR guest\n");
+ } else
+ kvm->arch.kvm_mode = KVM_MODE_PR;
+ }
+ mutex_unlock(&kvm->lock);
+#endif
+}
+
int kvm_arch_vcpu_ioctl_translate(struct kvm_vcpu *vcpu,
struct kvm_translation *tr)
{
@@ -739,7 +780,7 @@ void kvmppc_decrementer_func(unsigned long data)
struct kvm_vcpu *kvmppc_core_vcpu_create(struct kvm *kvm, unsigned int id)
{
- DO_IF_PR(kvm, return kvmppc_core_vcpu_create_pr(kvm, id));
+ DO_IF_PR_U(kvm, return kvmppc_core_vcpu_create_pr(kvm, id));
DO_IF_HV(kvm, return kvmppc_core_vcpu_create_hv(kvm, id));
return NULL;
}
@@ -758,7 +799,7 @@ int kvmppc_core_check_requests(struct kvm_vcpu *vcpu)
int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
{
- DO_IF_PR(kvm, return kvm_vm_ioctl_get_dirty_log_pr(kvm, log));
+ DO_IF_PR_U(kvm, return kvm_vm_ioctl_get_dirty_log_pr(kvm, log));
DO_IF_HV(kvm, return kvm_vm_ioctl_get_dirty_log_hv(kvm, log));
return -ENOTTY;
}
@@ -902,24 +943,33 @@ int kvmppc_core_init_vm(struct kvm *kvm)
INIT_LIST_HEAD(&kvm->arch.rtas_tokens);
#endif
-#ifdef CONFIG_KVM_BOOK3S_64_HV
- if (hv_ok) {
- err = kvmppc_core_init_vm_hv(kvm);
- kvm->arch.kvm_mode = KVM_MODE_HV;
- return err;
- }
-#endif
+ /*
+ * If both PR and HV are enabled, then new VMs start out as
+ * PR and get converted to HV when userspace enables the
+ * KVM_CAP_PPC_PAPR capability, assuming the system supports
+ * HV-mode KVM (i.e. has suitable CPUs and has hypervisor
+ * mode available).
+ */
#ifdef CONFIG_KVM_BOOK3S_PR
err = kvmppc_core_init_vm_pr(kvm);
+#ifdef CONFIG_KVM_BOOK3S_64_HV
+ kvm->arch.kvm_mode = KVM_MODE_UNKNOWN;
+#else
kvm->arch.kvm_mode = KVM_MODE_PR;
#endif
-
+#else
+#if defined(CONFIG_KVM_BOOK3S_64_HV)
+ err = kvmppc_core_init_vm_hv(kvm);
+ kvm->arch.kvm_mode = KVM_MODE_HV;
+#endif
+#endif
return err;
+
}
void kvmppc_core_destroy_vm(struct kvm *kvm)
{
- DO_IF_PR(kvm, kvmppc_core_destroy_vm_pr(kvm));
+ DO_IF_PR_U(kvm, kvmppc_core_destroy_vm_pr(kvm));
DO_IF_HV(kvm, kvmppc_core_destroy_vm_hv(kvm));
#ifdef CONFIG_PPC64
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 956318b..fee28e5 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -984,6 +984,83 @@ out:
return ERR_PTR(err);
}
+#ifdef CONFIG_KVM_BOOK3S_PR
+/*
+ * Adapt to different storage conventions for registers between PR and HV.
+ */
+static void kvmppc_convert_regs(struct kvm_vcpu *vcpu)
+{
+ int i, j;
+
+ /* Pack SLB entries down to low indexes and add index field */
+ j = 0;
+ for (i = 0; i < vcpu->arch.slb_nr; ++i) {
+ if (vcpu->arch.slb[i].valid) {
+ vcpu->arch.slb[j].orige = vcpu->arch.slb[i].orige | i;
+ vcpu->arch.slb[j].origv = vcpu->arch.slb[i].origv;
+ ++j;
+ }
+ }
+
+ memcpy(&vcpu->arch.shregs, vcpu->arch.shared,
+ sizeof(vcpu->arch.shregs));
+}
+
+/* Caller must hold kvm->lock */
+int kvmppc_convert_to_hv(struct kvm *kvm)
+{
+ int err;
+ long int i;
+ struct kvm_memory_slot *memslot;
+ struct kvm_memslots *slots;
+ struct kvm_vcpu *vcpu;
+
+ /* First do all the necessary memory allocations */
+ err = kvmppc_core_init_vm_hv(kvm);
+ if (err)
+ goto out;
+
+ kvm_for_each_vcpu(i, vcpu, kvm) {
+ err = kvmppc_alloc_vcore(vcpu, vcpu->vcpu_id);
+ if (err)
+ goto free_vm;
+ }
+
+ mutex_lock(&kvm->slots_lock);
+ slots = kvm->memslots;
+ err = 0;
+ kvm_for_each_memslot(memslot, slots) {
+ err = kvmppc_core_prepare_memory_region_hv(kvm, memslot, NULL);
+ if (err)
+ goto free_slots_locked;
+ }
+ mutex_unlock(&kvm->slots_lock);
+
+ /*
+ * Now that memory is allocated, switch the VM over to HV mode.
+ */
+ kvm->arch.kvm_mode = KVM_MODE_HV;
+
+ kvm_for_each_vcpu(i, vcpu, kvm) {
+ kvmppc_convert_regs(vcpu);
+ kvmppc_release_vcpu_pr(vcpu);
+ vcpu->arch.use_hv = true;
+ kvmppc_setup_hv_vcpu(vcpu);
+ }
+
+ return 0;
+
+ free_slots_locked:
+ kvm_for_each_memslot(memslot, slots)
+ kvmppc_core_free_memslot_hv(memslot, NULL);
+ mutex_unlock(&kvm->slots_lock);
+ free_vm:
+ kvmppc_core_destroy_vm_hv(kvm);
+ out:
+ return err;
+}
+#endif /* CONFIG_KVM_BOOK3S_PR */
+
static void unpin_vpa(struct kvm *kvm, struct kvmppc_vpa *vpa)
{
if (vpa->pinned_addr)
diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
index f583e10..f35425e 100644
--- a/arch/powerpc/kvm/book3s_pr.c
+++ b/arch/powerpc/kvm/book3s_pr.c
@@ -264,10 +264,6 @@ void kvmppc_set_msr_pr(struct kvm_vcpu *vcpu, u64 msr)
kvmppc_mmu_pte_flush(vcpu, (uint32_t)vcpu->arch.magic_page_pa,
~0xFFFUL);
}
-
- /* Preload FPU if it's enabled */
- if (vcpu->arch.shared->msr & MSR_FP)
- kvmppc_handle_ext(vcpu, BOOK3S_INTERRUPT_FP_UNAVAIL, MSR_FP);
}
void kvmppc_set_pvr_pr(struct kvm_vcpu *vcpu, u32 pvr)
@@ -840,6 +836,11 @@ program_interrupt:
switch (er) {
case EMULATE_DONE:
r = RESUME_GUEST_NV;
+ /* Preload FPU if it's enabled */
+ if (vcpu->arch.shared->msr & MSR_FP &
+ ~vcpu->arch.guest_owned_ext)
+ kvmppc_handle_ext(vcpu,
+ BOOK3S_INTERRUPT_FP_UNAVAIL, MSR_FP);
break;
case EMULATE_AGAIN:
r = RESUME_GUEST;
@@ -1184,6 +1185,21 @@ out:
return ERR_PTR(err);
}
+#ifdef CONFIG_KVM_BOOK3S_64_HV
+/*
+ * Release PR-specific resources allocated for this vcpu.
+ */
+void kvmppc_release_vcpu_pr(struct kvm_vcpu *vcpu)
+{
+ kvmppc_mmu_destroy_pr(vcpu);
+ free_page((unsigned long)vcpu->arch.shared & PAGE_MASK);
+ kfree(vcpu->arch.shadow_vcpu);
+ vcpu->arch.shadow_vcpu = NULL;
+ vfree(vcpu->arch.book3s);
+ vcpu->arch.book3s = NULL;
+}
+#endif /* CONFIG_KVM_BOOK3S_64_HV */
+
void kvmppc_core_vcpu_free_pr(struct kvm_vcpu *vcpu)
{
struct kvmppc_vcpu_book3s *vcpu_book3s = to_book3s(vcpu);
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 49bbc9e..a8caeec 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -800,6 +800,7 @@ static int kvm_vcpu_ioctl_enable_cap(struct kvm_vcpu *vcpu,
case KVM_CAP_PPC_PAPR:
r = 0;
vcpu->arch.papr_enabled = true;
+ kvmppc_core_enable_papr(vcpu);
break;
case KVM_CAP_PPC_EPR:
r = 0;
--
1.8.3.1
^ permalink raw reply related [flat|nested] 68+ messages in thread
* [PATCH 20/23] KVM: PPC: Book3S PR: Better handling of host-side read-only pages
2013-08-06 4:12 [PATCH 00/23] Allow PR and HV KVM to coexist in one kernel Paul Mackerras
` (18 preceding siblings ...)
2013-08-06 4:26 ` [PATCH 19/23] KVM: PPC: Book3S: Select PR vs HV separately for each guest Paul Mackerras
@ 2013-08-06 4:27 ` Paul Mackerras
2013-09-12 23:01 ` Alexander Graf
2013-08-06 4:27 ` [PATCH 21/23] KVM: PPC: Book3S PR: Use mmu_notifier_retry() in kvmppc_mmu_map_page() Paul Mackerras
` (2 subsequent siblings)
22 siblings, 1 reply; 68+ messages in thread
From: Paul Mackerras @ 2013-08-06 4:27 UTC (permalink / raw)
To: Alexander Graf, Benjamin Herrenschmidt; +Cc: kvm-ppc, kvm
Currently we request write access to all pages that get mapped into the
guest, even if the guest is only loading from the page. This reduces
the effectiveness of KSM because it means that we unshare every page we
access. Also, we always set the changed (C) bit in the guest HPTE if
it allows writing, even for a guest load.
This fixes both these problems. We pass an 'iswrite' flag to the
mmu.xlate() functions and to kvmppc_mmu_map_page() to indicate whether
the access is a load or a store. The mmu.xlate() functions now only
set C for stores. kvmppc_gfn_to_pfn() now calls gfn_to_pfn_prot()
instead of gfn_to_pfn() so that it can indicate whether we need write
access to the page, and get back a 'writable' flag to indicate whether
the page is writable or not. If that 'writable' flag is clear, we then
make the host HPTE read-only even if the guest HPTE allowed writing.
This means that we can get a protection fault when the guest writes to a
page that it has mapped read-write but which is read-only on the host
side (perhaps due to KSM having merged the page). Thus we now call
kvmppc_handle_pagefault() for protection faults as well as HPTE not found
faults. In kvmppc_handle_pagefault(), if the access was allowed by the
guest HPTE and we thus need to install a new host HPTE, we then need to
remove the old host HPTE if there is one. This is done with a new
function, kvmppc_mmu_unmap_page(), which uses kvmppc_mmu_pte_vflush() to
find and remove the old host HPTE.
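The effect on the host-side mapping path can be summarised as follows (a
condensed sketch of the 64-bit host MMU hunks below, not a verbatim excerpt):

	bool writable;
	pfn_t pfn;

	/* ask for host write access only when the guest access is a store */
	pfn = kvmppc_gfn_to_pfn(vcpu, orig_pte->raddr >> PAGE_SHIFT,
				iswrite, &writable);

	/*
	 * The host HPTE is made read-only unless both the guest HPTE and the
	 * host page allow writing (KSM may have write-protected the page).
	 * A later guest store then takes a protection fault, which is handled
	 * by kvmppc_handle_pagefault() like an HPTE-not-found fault, after
	 * kvmppc_mmu_unmap_page() has removed the stale read-only HPTE.
	 */
	if (!orig_pte->may_write || !writable)
		rflags |= HPTE_R_PP;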
Since the memslot-related functions require the KVM SRCU read lock to
be held, this adds srcu_read_lock/unlock pairs around the calls to
kvmppc_handle_pagefault().
Finally, this changes kvmppc_mmu_book3s_32_xlate_pte() to not ignore
guest HPTEs that don't permit access, and to return -EPERM for accesses
that are not permitted by the page protections.
Signed-off-by: Paul Mackerras <paulus@samba.org>
---
arch/powerpc/include/asm/kvm_book3s.h | 7 +++++--
arch/powerpc/include/asm/kvm_host.h | 3 ++-
arch/powerpc/kvm/book3s.c | 15 +++++++++------
arch/powerpc/kvm/book3s_32_mmu.c | 32 +++++++++++++++++---------------
arch/powerpc/kvm/book3s_32_mmu_host.c | 14 +++++++++++---
arch/powerpc/kvm/book3s_64_mmu.c | 9 +++++----
arch/powerpc/kvm/book3s_64_mmu_host.c | 20 +++++++++++++++++---
arch/powerpc/kvm/book3s_64_mmu_hv.c | 2 +-
arch/powerpc/kvm/book3s_pr.c | 29 ++++++++++++++++++++++++-----
9 files changed, 91 insertions(+), 40 deletions(-)
diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h
index e0bc83b..4fe6864 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -129,7 +129,9 @@ extern void kvmppc_set_msr(struct kvm_vcpu *vcpu, u64 new_msr);
extern void kvmppc_mmu_book3s_64_init(struct kvm_vcpu *vcpu);
extern void kvmppc_mmu_book3s_32_init(struct kvm_vcpu *vcpu);
extern void kvmppc_mmu_book3s_hv_init(struct kvm_vcpu *vcpu);
-extern int kvmppc_mmu_map_page(struct kvm_vcpu *vcpu, struct kvmppc_pte *pte);
+extern int kvmppc_mmu_map_page(struct kvm_vcpu *vcpu, struct kvmppc_pte *pte,
+ bool iswrite);
+extern void kvmppc_mmu_unmap_page(struct kvm_vcpu *vcpu, struct kvmppc_pte *pte);
extern int kvmppc_mmu_map_segment(struct kvm_vcpu *vcpu, ulong eaddr);
extern void kvmppc_mmu_flush_segment(struct kvm_vcpu *vcpu, ulong eaddr, ulong seg_size);
extern void kvmppc_mmu_flush_segments(struct kvm_vcpu *vcpu);
@@ -158,7 +160,8 @@ extern void kvmppc_set_bat(struct kvm_vcpu *vcpu, struct kvmppc_bat *bat,
bool upper, u32 val);
extern void kvmppc_giveup_ext(struct kvm_vcpu *vcpu, ulong msr);
extern int kvmppc_emulate_paired_single(struct kvm_run *run, struct kvm_vcpu *vcpu);
-extern pfn_t kvmppc_gfn_to_pfn(struct kvm_vcpu *vcpu, gfn_t gfn);
+extern pfn_t kvmppc_gfn_to_pfn(struct kvm_vcpu *vcpu, gfn_t gfn, bool writing,
+ bool *writable);
extern void kvmppc_add_revmap_chain(struct kvm *kvm, struct revmap_entry *rev,
unsigned long *rmap, long pte_index, int realmode);
extern void kvmppc_invalidate_hpte(struct kvm *kvm, unsigned long *hptep,
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 138e781..52c7b80 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -356,7 +356,8 @@ struct kvmppc_mmu {
/* book3s */
void (*mtsrin)(struct kvm_vcpu *vcpu, u32 srnum, ulong value);
u32 (*mfsrin)(struct kvm_vcpu *vcpu, u32 srnum);
- int (*xlate)(struct kvm_vcpu *vcpu, gva_t eaddr, struct kvmppc_pte *pte, bool data);
+ int (*xlate)(struct kvm_vcpu *vcpu, gva_t eaddr,
+ struct kvmppc_pte *pte, bool data, bool iswrite);
void (*reset_msr)(struct kvm_vcpu *vcpu);
void (*tlbie)(struct kvm_vcpu *vcpu, ulong addr, bool large);
int (*esid_to_vsid)(struct kvm_vcpu *vcpu, ulong esid, u64 *vsid);
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index bddbfaa..f0896b4 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -340,7 +340,8 @@ int kvmppc_core_prepare_to_enter(struct kvm_vcpu *vcpu)
return 0;
}
-pfn_t kvmppc_gfn_to_pfn(struct kvm_vcpu *vcpu, gfn_t gfn)
+pfn_t kvmppc_gfn_to_pfn(struct kvm_vcpu *vcpu, gfn_t gfn, bool writing,
+ bool *writable)
{
ulong mp_pa = vcpu->arch.magic_page_pa;
@@ -356,20 +357,22 @@ pfn_t kvmppc_gfn_to_pfn(struct kvm_vcpu *vcpu, gfn_t gfn)
pfn = (pfn_t)virt_to_phys((void*)shared_page) >> PAGE_SHIFT;
get_page(pfn_to_page(pfn));
+ if (writable)
+ *writable = true;
return pfn;
}
- return gfn_to_pfn(vcpu->kvm, gfn);
+ return gfn_to_pfn_prot(vcpu->kvm, gfn, writing, writable);
}
static int kvmppc_xlate(struct kvm_vcpu *vcpu, ulong eaddr, bool data,
- struct kvmppc_pte *pte)
+ bool iswrite, struct kvmppc_pte *pte)
{
int relocated = (vcpu->arch.shared->msr & (data ? MSR_DR : MSR_IR));
int r;
if (relocated) {
- r = vcpu->arch.mmu.xlate(vcpu, eaddr, pte, data);
+ r = vcpu->arch.mmu.xlate(vcpu, eaddr, pte, data, iswrite);
} else {
pte->eaddr = eaddr;
pte->raddr = eaddr & KVM_PAM;
@@ -415,7 +418,7 @@ int kvmppc_st(struct kvm_vcpu *vcpu, ulong *eaddr, int size, void *ptr,
vcpu->stat.st++;
- if (kvmppc_xlate(vcpu, *eaddr, data, &pte))
+ if (kvmppc_xlate(vcpu, *eaddr, data, true, &pte))
return -ENOENT;
*eaddr = pte.raddr;
@@ -437,7 +440,7 @@ int kvmppc_ld(struct kvm_vcpu *vcpu, ulong *eaddr, int size, void *ptr,
vcpu->stat.ld++;
- if (kvmppc_xlate(vcpu, *eaddr, data, &pte))
+ if (kvmppc_xlate(vcpu, *eaddr, data, false, &pte))
goto nopte;
*eaddr = pte.raddr;
diff --git a/arch/powerpc/kvm/book3s_32_mmu.c b/arch/powerpc/kvm/book3s_32_mmu.c
index b14af6d..76a64ce 100644
--- a/arch/powerpc/kvm/book3s_32_mmu.c
+++ b/arch/powerpc/kvm/book3s_32_mmu.c
@@ -84,7 +84,8 @@ static inline bool sr_nx(u32 sr_raw)
}
static int kvmppc_mmu_book3s_32_xlate_bat(struct kvm_vcpu *vcpu, gva_t eaddr,
- struct kvmppc_pte *pte, bool data);
+ struct kvmppc_pte *pte, bool data,
+ bool iswrite);
static int kvmppc_mmu_book3s_32_esid_to_vsid(struct kvm_vcpu *vcpu, ulong esid,
u64 *vsid);
@@ -99,7 +100,7 @@ static u64 kvmppc_mmu_book3s_32_ea_to_vp(struct kvm_vcpu *vcpu, gva_t eaddr,
u64 vsid;
struct kvmppc_pte pte;
- if (!kvmppc_mmu_book3s_32_xlate_bat(vcpu, eaddr, &pte, data))
+ if (!kvmppc_mmu_book3s_32_xlate_bat(vcpu, eaddr, &pte, data, false))
return pte.vpage;
kvmppc_mmu_book3s_32_esid_to_vsid(vcpu, eaddr >> SID_SHIFT, &vsid);
@@ -146,7 +147,8 @@ static u32 kvmppc_mmu_book3s_32_get_ptem(u32 sre, gva_t eaddr, bool primary)
}
static int kvmppc_mmu_book3s_32_xlate_bat(struct kvm_vcpu *vcpu, gva_t eaddr,
- struct kvmppc_pte *pte, bool data)
+ struct kvmppc_pte *pte, bool data,
+ bool iswrite)
{
struct kvmppc_vcpu_book3s *vcpu_book3s = to_book3s(vcpu);
struct kvmppc_bat *bat;
@@ -187,8 +189,7 @@ static int kvmppc_mmu_book3s_32_xlate_bat(struct kvm_vcpu *vcpu, gva_t eaddr,
printk(KERN_INFO "BAT is not readable!\n");
continue;
}
- if (!pte->may_write) {
- /* let's treat r/o BATs as not-readable for now */
+ if (iswrite && !pte->may_write) {
dprintk_pte("BAT is read-only!\n");
continue;
}
@@ -202,7 +203,7 @@ static int kvmppc_mmu_book3s_32_xlate_bat(struct kvm_vcpu *vcpu, gva_t eaddr,
static int kvmppc_mmu_book3s_32_xlate_pte(struct kvm_vcpu *vcpu, gva_t eaddr,
struct kvmppc_pte *pte, bool data,
- bool primary)
+ bool iswrite, bool primary)
{
u32 sre;
hva_t ptegp;
@@ -258,9 +259,6 @@ static int kvmppc_mmu_book3s_32_xlate_pte(struct kvm_vcpu *vcpu, gva_t eaddr,
break;
}
- if ( !pte->may_read )
- continue;
-
dprintk_pte("MMU: Found PTE -> %x %x - %x\n",
pteg[i], pteg[i+1], pp);
found = 1;
@@ -282,11 +280,12 @@ static int kvmppc_mmu_book3s_32_xlate_pte(struct kvm_vcpu *vcpu, gva_t eaddr,
pte_r |= PTEG_FLAG_ACCESSED;
put_user(pte_r >> 8, addr + 2);
}
- if (pte->may_write && !(pte_r & PTEG_FLAG_DIRTY)) {
- /* XXX should only set this for stores */
+ if (iswrite && pte->may_write && !(pte_r & PTEG_FLAG_DIRTY)) {
pte_r |= PTEG_FLAG_DIRTY;
put_user(pte_r, addr + 3);
}
+ if (!pte->may_read || (iswrite && !pte->may_write))
+ return -EPERM;
return 0;
}
@@ -305,7 +304,8 @@ no_page_found:
}
static int kvmppc_mmu_book3s_32_xlate(struct kvm_vcpu *vcpu, gva_t eaddr,
- struct kvmppc_pte *pte, bool data)
+ struct kvmppc_pte *pte, bool data,
+ bool iswrite)
{
int r;
ulong mp_ea = vcpu->arch.magic_page_ea;
@@ -327,11 +327,13 @@ static int kvmppc_mmu_book3s_32_xlate(struct kvm_vcpu *vcpu, gva_t eaddr,
return 0;
}
- r = kvmppc_mmu_book3s_32_xlate_bat(vcpu, eaddr, pte, data);
+ r = kvmppc_mmu_book3s_32_xlate_bat(vcpu, eaddr, pte, data, iswrite);
if (r < 0)
- r = kvmppc_mmu_book3s_32_xlate_pte(vcpu, eaddr, pte, data, true);
+ r = kvmppc_mmu_book3s_32_xlate_pte(vcpu, eaddr, pte,
+ data, iswrite, true);
if (r < 0)
- r = kvmppc_mmu_book3s_32_xlate_pte(vcpu, eaddr, pte, data, false);
+ r = kvmppc_mmu_book3s_32_xlate_pte(vcpu, eaddr, pte,
+ data, iswrite, false);
return r;
}
diff --git a/arch/powerpc/kvm/book3s_32_mmu_host.c b/arch/powerpc/kvm/book3s_32_mmu_host.c
index c4361ef..3a0abd2 100644
--- a/arch/powerpc/kvm/book3s_32_mmu_host.c
+++ b/arch/powerpc/kvm/book3s_32_mmu_host.c
@@ -138,7 +138,8 @@ static u32 *kvmppc_mmu_get_pteg(struct kvm_vcpu *vcpu, u32 vsid, u32 eaddr,
extern char etext[];
-int kvmppc_mmu_map_page(struct kvm_vcpu *vcpu, struct kvmppc_pte *orig_pte)
+int kvmppc_mmu_map_page(struct kvm_vcpu *vcpu, struct kvmppc_pte *orig_pte,
+ bool iswrite)
{
pfn_t hpaddr;
u64 vpn;
@@ -152,9 +153,11 @@ int kvmppc_mmu_map_page(struct kvm_vcpu *vcpu, struct kvmppc_pte *orig_pte)
bool evict = false;
struct hpte_cache *pte;
int r = 0;
+ bool writable;
/* Get host physical address for gpa */
- hpaddr = kvmppc_gfn_to_pfn(vcpu, orig_pte->raddr >> PAGE_SHIFT);
+ hpaddr = kvmppc_gfn_to_pfn(vcpu, orig_pte->raddr >> PAGE_SHIFT,
+ iswrite, &writable);
if (is_error_noslot_pfn(hpaddr)) {
printk(KERN_INFO "Couldn't get guest page for gfn %lx!\n",
orig_pte->eaddr);
@@ -204,7 +207,7 @@ next_pteg:
(primary ? 0 : PTE_SEC);
pteg1 = hpaddr | PTE_M | PTE_R | PTE_C;
- if (orig_pte->may_write) {
+ if (orig_pte->may_write && writable) {
pteg1 |= PP_RWRW;
mark_page_dirty(vcpu->kvm, orig_pte->raddr >> PAGE_SHIFT);
} else {
@@ -259,6 +262,11 @@ out:
return r;
}
+void kvmppc_mmu_unmap_page(struct kvm_vcpu *vcpu, struct kvmppc_pte *pte)
+{
+ kvmppc_mmu_pte_vflush(vcpu, pte->vpage, 0xfffffffffULL);
+}
+
static struct kvmppc_sid_map *create_sid_map(struct kvm_vcpu *vcpu, u64 gvsid)
{
struct kvmppc_sid_map *map;
diff --git a/arch/powerpc/kvm/book3s_64_mmu.c b/arch/powerpc/kvm/book3s_64_mmu.c
index 86925da..92c2aa8 100644
--- a/arch/powerpc/kvm/book3s_64_mmu.c
+++ b/arch/powerpc/kvm/book3s_64_mmu.c
@@ -206,7 +206,8 @@ static int decode_pagesize(struct kvmppc_slb *slbe, u64 r)
}
static int kvmppc_mmu_book3s_64_xlate(struct kvm_vcpu *vcpu, gva_t eaddr,
- struct kvmppc_pte *gpte, bool data)
+ struct kvmppc_pte *gpte, bool data,
+ bool iswrite)
{
struct kvmppc_slb *slbe;
hva_t ptegp;
@@ -345,8 +346,8 @@ do_second:
r |= HPTE_R_R;
put_user(r >> 8, addr + 6);
}
- if (data && gpte->may_write && !(r & HPTE_R_C)) {
- /* Set the dirty flag -- XXX even if not writing */
+ if (iswrite && gpte->may_write && !(r & HPTE_R_C)) {
+ /* Set the dirty flag */
/* Use a single byte write */
char __user *addr = (char __user *) &pteg[i+1];
r |= HPTE_R_C;
@@ -355,7 +356,7 @@ do_second:
mutex_unlock(&vcpu->kvm->arch.hpt_mutex);
- if (!gpte->may_read)
+ if (!gpte->may_read || (iswrite && !gpte->may_write))
return -EPERM;
return 0;
diff --git a/arch/powerpc/kvm/book3s_64_mmu_host.c b/arch/powerpc/kvm/book3s_64_mmu_host.c
index 3dd178c..7fcf38f 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_host.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_host.c
@@ -78,7 +78,8 @@ static struct kvmppc_sid_map *find_sid_vsid(struct kvm_vcpu *vcpu, u64 gvsid)
return NULL;
}
-int kvmppc_mmu_map_page(struct kvm_vcpu *vcpu, struct kvmppc_pte *orig_pte)
+int kvmppc_mmu_map_page(struct kvm_vcpu *vcpu, struct kvmppc_pte *orig_pte,
+ bool iswrite)
{
unsigned long vpn;
pfn_t hpaddr;
@@ -91,9 +92,11 @@ int kvmppc_mmu_map_page(struct kvm_vcpu *vcpu, struct kvmppc_pte *orig_pte)
struct kvmppc_sid_map *map;
int r = 0;
int hpsize = MMU_PAGE_4K;
+ bool writable;
/* Get host physical address for gpa */
- hpaddr = kvmppc_gfn_to_pfn(vcpu, orig_pte->raddr >> PAGE_SHIFT);
+ hpaddr = kvmppc_gfn_to_pfn(vcpu, orig_pte->raddr >> PAGE_SHIFT,
+ iswrite, &writable);
if (is_error_noslot_pfn(hpaddr)) {
printk(KERN_INFO "Couldn't get guest page for gfn %lx!\n", orig_pte->eaddr);
r = -EINVAL;
@@ -119,7 +122,7 @@ int kvmppc_mmu_map_page(struct kvm_vcpu *vcpu, struct kvmppc_pte *orig_pte)
vpn = hpt_vpn(orig_pte->eaddr, map->host_vsid, MMU_SEGSIZE_256M);
- if (!orig_pte->may_write)
+ if (!orig_pte->may_write || !writable)
rflags |= HPTE_R_PP;
else
mark_page_dirty(vcpu->kvm, orig_pte->raddr >> PAGE_SHIFT);
@@ -186,6 +189,17 @@ out:
return r;
}
+void kvmppc_mmu_unmap_page(struct kvm_vcpu *vcpu, struct kvmppc_pte *pte)
+{
+ u64 mask = 0xfffffffffULL;
+ u64 vsid;
+
+ vcpu->arch.mmu.esid_to_vsid(vcpu, pte->eaddr >> SID_SHIFT, &vsid);
+ if (vsid & VSID_64K)
+ mask = 0xffffffff0ULL;
+ kvmppc_mmu_pte_vflush(vcpu, pte->vpage, mask);
+}
+
static struct kvmppc_sid_map *create_sid_map(struct kvm_vcpu *vcpu, u64 gvsid)
{
struct kvmppc_sid_map *map;
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index e37c785..49ad17a 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -447,7 +447,7 @@ static unsigned long kvmppc_mmu_get_real_addr(unsigned long v, unsigned long r,
}
static int kvmppc_mmu_book3s_64_hv_xlate(struct kvm_vcpu *vcpu, gva_t eaddr,
- struct kvmppc_pte *gpte, bool data)
+ struct kvmppc_pte *gpte, bool data, bool iswrite)
{
struct kvm *kvm = vcpu->kvm;
struct kvmppc_slb *slbe;
diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
index f35425e..71f7cfe 100644
--- a/arch/powerpc/kvm/book3s_pr.c
+++ b/arch/powerpc/kvm/book3s_pr.c
@@ -398,6 +398,7 @@ int kvmppc_handle_pagefault(struct kvm_run *run, struct kvm_vcpu *vcpu,
ulong eaddr, int vec)
{
bool data = (vec == BOOK3S_INTERRUPT_DATA_STORAGE);
+ bool iswrite = false;
int r = RESUME_GUEST;
int relocated;
int page_found = 0;
@@ -408,10 +409,12 @@ int kvmppc_handle_pagefault(struct kvm_run *run, struct kvm_vcpu *vcpu,
u64 vsid;
relocated = data ? dr : ir;
+ if (data && (vcpu->arch.fault_dsisr & DSISR_ISSTORE))
+ iswrite = true;
/* Resolve real address if translation turned on */
if (relocated) {
- page_found = vcpu->arch.mmu.xlate(vcpu, eaddr, &pte, data);
+ page_found = vcpu->arch.mmu.xlate(vcpu, eaddr, &pte, data, iswrite);
} else {
pte.may_execute = true;
pte.may_read = true;
@@ -472,12 +475,20 @@ int kvmppc_handle_pagefault(struct kvm_run *run, struct kvm_vcpu *vcpu,
kvmppc_book3s_queue_irqprio(vcpu, vec + 0x80);
} else if (!is_mmio &&
kvmppc_visible_gfn(vcpu, pte.raddr >> PAGE_SHIFT)) {
+ if (data && !(vcpu->arch.fault_dsisr & DSISR_NOHPTE)) {
+ /*
+ * There is already a host HPTE there, presumably
+ * a read-only one for a page the guest thinks
+ * is writable, so get rid of it first.
+ */
+ kvmppc_mmu_unmap_page(vcpu, &pte);
+ }
/* The guest's PTE is not mapped yet. Map on the host */
- kvmppc_mmu_map_page(vcpu, &pte);
+ kvmppc_mmu_map_page(vcpu, &pte, iswrite);
if (data)
vcpu->stat.sp_storage++;
else if (vcpu->arch.mmu.is_dcbz32(vcpu) &&
- (!(vcpu->arch.hflags & BOOK3S_HFLAG_DCBZ32)))
+ (!(vcpu->arch.hflags & BOOK3S_HFLAG_DCBZ32)))
kvmppc_patch_dcbz(vcpu, &pte);
} else {
/* MMIO */
@@ -727,7 +738,9 @@ int kvmppc_handle_exit_pr(struct kvm_run *run, struct kvm_vcpu *vcpu,
/* only care about PTEG not found errors, but leave NX alone */
if (shadow_srr1 & 0x40000000) {
+ int idx = srcu_read_lock(&vcpu->kvm->srcu);
r = kvmppc_handle_pagefault(run, vcpu, kvmppc_get_pc(vcpu), exit_nr);
+ srcu_read_unlock(&vcpu->kvm->srcu, idx);
vcpu->stat.sp_instruc++;
} else if (vcpu->arch.mmu.is_dcbz32(vcpu) &&
(!(vcpu->arch.hflags & BOOK3S_HFLAG_DCBZ32))) {
@@ -769,9 +782,15 @@ int kvmppc_handle_exit_pr(struct kvm_run *run, struct kvm_vcpu *vcpu,
}
#endif
- /* The only case we need to handle is missing shadow PTEs */
- if (fault_dsisr & DSISR_NOHPTE) {
+ /*
+ * We need to handle missing shadow PTEs, and
+ * protection faults due to us mapping a page read-only
+ * when the guest thinks it is writable.
+ */
+ if (fault_dsisr & (DSISR_NOHPTE | DSISR_PROTFAULT)) {
+ int idx = srcu_read_lock(&vcpu->kvm->srcu);
r = kvmppc_handle_pagefault(run, vcpu, dar, exit_nr);
+ srcu_read_unlock(&vcpu->kvm->srcu, idx);
} else {
vcpu->arch.shared->dar = dar;
vcpu->arch.shared->dsisr = fault_dsisr;
--
1.8.3.1
^ permalink raw reply related [flat|nested] 68+ messages in thread
* [PATCH 21/23] KVM: PPC: Book3S PR: Use mmu_notifier_retry() in kvmppc_mmu_map_page()
2013-08-06 4:12 [PATCH 00/23] Allow PR and HV KVM to coexist in one kernel Paul Mackerras
` (19 preceding siblings ...)
2013-08-06 4:27 ` [PATCH 20/23] KVM: PPC: Book3S PR: Better handling of host-side read-only pages Paul Mackerras
@ 2013-08-06 4:27 ` Paul Mackerras
2013-08-07 4:13 ` Bhushan Bharat-R65777
2013-08-07 5:17 ` Bhushan Bharat-R65777
2013-08-06 4:27 ` [PATCH 22/23] KVM: PPC: Book3S PR: Mark pages accessed, and dirty if being written Paul Mackerras
2013-08-06 4:28 ` [PATCH 23/23] KVM: PPC: Book3S PR: Reduce number of shadow PTEs invalidated by MMU notifiers Paul Mackerras
22 siblings, 2 replies; 68+ messages in thread
From: Paul Mackerras @ 2013-08-06 4:27 UTC (permalink / raw)
To: Alexander Graf, Benjamin Herrenschmidt; +Cc: kvm-ppc, kvm
When the MM code is invalidating a range of pages, it calls the KVM
kvm_mmu_notifier_invalidate_range_start() notifier function, which calls
kvm_unmap_hva_range(), which arranges to flush all the existing host
HPTEs for guest pages. However, the Linux PTEs for the range being
flushed are still valid at that point. We are not supposed to establish
any new references to pages in the range until the ...range_end()
notifier gets called. The PPC-specific KVM code doesn't get any
explicit notification of that; instead, we are supposed to use
mmu_notifier_retry() to test whether we are or have been inside a
range flush notifier pair while we have been getting a page and
instantiating a host HPTE for the page.
This therefore adds a call to mmu_notifier_retry inside
kvmppc_mmu_map_page(). This call is inside a region locked with
kvm->mmu_lock, which is the same lock that is taken by the KVM
MMU notifier functions, thus ensuring that no new notification can
proceed while we are in the locked region. Inside this region we
also create the host HPTE and link the corresponding hpte_cache
structure into the lists used to find it later. We cannot allocate
the hpte_cache structure inside this locked region because that can
lead to deadlock, so we allocate it outside the region and free it
if we end up not using it.
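The resulting ordering in kvmppc_mmu_map_page() looks roughly like this (a
condensed sketch of the hunks below, with the HPTE insertion details elided):

	mmu_seq = kvm->mmu_notifier_seq;	/* snapshot before reading the Linux PTE */
	smp_rmb();

	hpaddr = kvmppc_gfn_to_pfn(vcpu, orig_pte->raddr >> PAGE_SHIFT,
				   iswrite, &writable);
	cpte = kvmppc_mmu_hpte_cache_next(vcpu);	/* allocate outside the lock */

	spin_lock(&kvm->mmu_lock);
	if (!cpte || mmu_notifier_retry(kvm, mmu_seq)) {
		r = -EAGAIN;		/* an invalidation raced with us; the guest refaults */
		goto out_unlock;
	}
	/* ... insert the host HPTE, then kvmppc_mmu_hpte_cache_map(vcpu, cpte)
	   and cpte = NULL on success ... */
out_unlock:
	spin_unlock(&kvm->mmu_lock);
	if (cpte)
		kvmppc_mmu_hpte_cache_free(cpte);	/* allocated but not used */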
This also moves the updates of vcpu3s->hpte_cache_count inside the
regions locked with vcpu3s->mmu_lock, and does the increment in
kvmppc_mmu_hpte_cache_map() when the pte is added to the cache
rather than when it is allocated, in order that the hpte_cache_count
is accurate.
Signed-off-by: Paul Mackerras <paulus@samba.org>
---
arch/powerpc/include/asm/kvm_book3s.h | 1 +
arch/powerpc/kvm/book3s_64_mmu_host.c | 37 ++++++++++++++++++++++++++---------
arch/powerpc/kvm/book3s_mmu_hpte.c | 14 +++++++++----
3 files changed, 39 insertions(+), 13 deletions(-)
diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h
index 4fe6864..e711e77 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -143,6 +143,7 @@ extern long kvmppc_hv_find_lock_hpte(struct kvm *kvm, gva_t eaddr,
extern void kvmppc_mmu_hpte_cache_map(struct kvm_vcpu *vcpu, struct hpte_cache *pte);
extern struct hpte_cache *kvmppc_mmu_hpte_cache_next(struct kvm_vcpu *vcpu);
+extern void kvmppc_mmu_hpte_cache_free(struct hpte_cache *pte);
extern void kvmppc_mmu_hpte_destroy(struct kvm_vcpu *vcpu);
extern int kvmppc_mmu_hpte_init(struct kvm_vcpu *vcpu);
extern void kvmppc_mmu_invalidate_pte(struct kvm_vcpu *vcpu, struct hpte_cache *pte);
diff --git a/arch/powerpc/kvm/book3s_64_mmu_host.c b/arch/powerpc/kvm/book3s_64_mmu_host.c
index 7fcf38f..b7e9504 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_host.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_host.c
@@ -93,6 +93,13 @@ int kvmppc_mmu_map_page(struct kvm_vcpu *vcpu, struct kvmppc_pte *orig_pte,
int r = 0;
int hpsize = MMU_PAGE_4K;
bool writable;
+ unsigned long mmu_seq;
+ struct kvm *kvm = vcpu->kvm;
+ struct hpte_cache *cpte;
+
+ /* used to check for invalidations in progress */
+ mmu_seq = kvm->mmu_notifier_seq;
+ smp_rmb();
/* Get host physical address for gpa */
hpaddr = kvmppc_gfn_to_pfn(vcpu, orig_pte->raddr >> PAGE_SHIFT,
@@ -143,6 +150,14 @@ int kvmppc_mmu_map_page(struct kvm_vcpu *vcpu, struct kvmppc_pte *orig_pte,
hash = hpt_hash(vpn, mmu_psize_defs[hpsize].shift, MMU_SEGSIZE_256M);
+ cpte = kvmppc_mmu_hpte_cache_next(vcpu);
+
+ spin_lock(&kvm->mmu_lock);
+ if (!cpte || mmu_notifier_retry(kvm, mmu_seq)) {
+ r = -EAGAIN;
+ goto out_unlock;
+ }
+
map_again:
hpteg = ((hash & htab_hash_mask) * HPTES_PER_GROUP);
@@ -150,7 +165,7 @@ map_again:
if (attempt > 1)
if (ppc_md.hpte_remove(hpteg) < 0) {
r = -1;
- goto out;
+ goto out_unlock;
}
ret = ppc_md.hpte_insert(hpteg, vpn, hpaddr, rflags, vflags,
@@ -163,8 +178,6 @@ map_again:
attempt++;
goto map_again;
} else {
- struct hpte_cache *pte = kvmppc_mmu_hpte_cache_next(vcpu);
-
trace_kvm_book3s_64_mmu_map(rflags, hpteg,
vpn, hpaddr, orig_pte);
@@ -175,15 +188,21 @@ map_again:
hpteg = ((hash & htab_hash_mask) * HPTES_PER_GROUP);
}
- pte->slot = hpteg + (ret & 7);
- pte->host_vpn = vpn;
- pte->pte = *orig_pte;
- pte->pfn = hpaddr >> PAGE_SHIFT;
- pte->pagesize = hpsize;
+ cpte->slot = hpteg + (ret & 7);
+ cpte->host_vpn = vpn;
+ cpte->pte = *orig_pte;
+ cpte->pfn = hpaddr >> PAGE_SHIFT;
+ cpte->pagesize = hpsize;
- kvmppc_mmu_hpte_cache_map(vcpu, pte);
+ kvmppc_mmu_hpte_cache_map(vcpu, cpte);
+ cpte = NULL;
}
+
+out_unlock:
+ spin_unlock(&kvm->mmu_lock);
kvm_release_pfn_clean(hpaddr >> PAGE_SHIFT);
+ if (cpte)
+ kvmppc_mmu_hpte_cache_free(cpte);
out:
return r;
diff --git a/arch/powerpc/kvm/book3s_mmu_hpte.c b/arch/powerpc/kvm/book3s_mmu_hpte.c
index d2d280b..6b79bfc 100644
--- a/arch/powerpc/kvm/book3s_mmu_hpte.c
+++ b/arch/powerpc/kvm/book3s_mmu_hpte.c
@@ -98,6 +98,8 @@ void kvmppc_mmu_hpte_cache_map(struct kvm_vcpu *vcpu, struct hpte_cache *pte)
&vcpu3s->hpte_hash_vpte_64k[index]);
#endif
+ vcpu3s->hpte_cache_count++;
+
spin_unlock(&vcpu3s->mmu_lock);
}
@@ -131,10 +133,10 @@ static void invalidate_pte(struct kvm_vcpu *vcpu, struct hpte_cache *pte)
#ifdef CONFIG_PPC_BOOK3S_64
hlist_del_init_rcu(&pte->list_vpte_64k);
#endif
+ vcpu3s->hpte_cache_count--;
spin_unlock(&vcpu3s->mmu_lock);
- vcpu3s->hpte_cache_count--;
call_rcu(&pte->rcu_head, free_pte_rcu);
}
@@ -331,15 +333,19 @@ struct hpte_cache *kvmppc_mmu_hpte_cache_next(struct kvm_vcpu *vcpu)
struct kvmppc_vcpu_book3s *vcpu3s = to_book3s(vcpu);
struct hpte_cache *pte;
- pte = kmem_cache_zalloc(hpte_cache, GFP_KERNEL);
- vcpu3s->hpte_cache_count++;
-
if (vcpu3s->hpte_cache_count == HPTEG_CACHE_NUM)
kvmppc_mmu_pte_flush_all(vcpu);
+ pte = kmem_cache_zalloc(hpte_cache, GFP_KERNEL);
+
return pte;
}
+void kvmppc_mmu_hpte_cache_free(struct hpte_cache *pte)
+{
+ kmem_cache_free(hpte_cache, pte);
+}
+
void kvmppc_mmu_hpte_destroy(struct kvm_vcpu *vcpu)
{
kvmppc_mmu_pte_flush(vcpu, 0, 0);
--
1.8.3.1
^ permalink raw reply related [flat|nested] 68+ messages in thread
* [PATCH 22/23] KVM: PPC: Book3S PR: Mark pages accessed, and dirty if being written
2013-08-06 4:12 [PATCH 00/23] Allow PR and HV KVM to coexist in one kernel Paul Mackerras
` (20 preceding siblings ...)
2013-08-06 4:27 ` [PATCH 21/23] KVM: PPC: Book3S PR: Use mmu_notifier_retry() in kvmppc_mmu_map_page() Paul Mackerras
@ 2013-08-06 4:27 ` Paul Mackerras
2013-08-06 4:28 ` [PATCH 23/23] KVM: PPC: Book3S PR: Reduce number of shadow PTEs invalidated by MMU notifiers Paul Mackerras
22 siblings, 0 replies; 68+ messages in thread
From: Paul Mackerras @ 2013-08-06 4:27 UTC (permalink / raw)
To: Alexander Graf, Benjamin Herrenschmidt; +Cc: kvm-ppc, kvm
The mark_page_dirty() function, despite what its name might suggest,
doesn't actually mark the page as dirty as far as the MM subsystem is
concerned. It merely sets a bit in KVM's map of dirty pages, if
userspace has requested dirty tracking for the relevant memslot.
To tell the MM subsystem that the page is dirty, we have to call
kvm_set_pfn_dirty() (or an equivalent such as SetPageDirty()).
This adds a call to kvm_set_pfn_dirty(), and while we are here, also
adds a call to kvm_set_pfn_accessed() to tell the MM subsystem that
the page has been accessed. Since we are now using the pfn in
several places, this adds a 'pfn' variable to store it and changes
the places that used hpaddr >> PAGE_SHIFT to use pfn instead, which
is the same thing.
This also changes a use of HPTE_R_PP to PP_RXRX. Both are 3, but
PP_RXRX is more informative as being the read-only page permission
bit setting.
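Taken together, the relevant part of kvmppc_mmu_map_page() now reads roughly
as follows (condensed from the hunk below):

	kvm_set_pfn_accessed(pfn);		/* tell the MM the page was referenced */
	if (!orig_pte->may_write || !writable)
		rflags |= PP_RXRX;		/* read-only host mapping */
	else {
		mark_page_dirty(vcpu->kvm, gfn);	/* KVM's dirty bitmap only */
		kvm_set_pfn_dirty(pfn);			/* dirties the struct page for the MM */
	}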
Signed-off-by: Paul Mackerras <paulus@samba.org>
---
arch/powerpc/kvm/book3s_64_mmu_host.c | 26 +++++++++++++++-----------
1 file changed, 15 insertions(+), 11 deletions(-)
diff --git a/arch/powerpc/kvm/book3s_64_mmu_host.c b/arch/powerpc/kvm/book3s_64_mmu_host.c
index b7e9504..8e8aff9 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_host.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_host.c
@@ -96,20 +96,21 @@ int kvmppc_mmu_map_page(struct kvm_vcpu *vcpu, struct kvmppc_pte *orig_pte,
unsigned long mmu_seq;
struct kvm *kvm = vcpu->kvm;
struct hpte_cache *cpte;
+ unsigned long gfn = orig_pte->raddr >> PAGE_SHIFT;
+ unsigned long pfn;
/* used to check for invalidations in progress */
mmu_seq = kvm->mmu_notifier_seq;
smp_rmb();
/* Get host physical address for gpa */
- hpaddr = kvmppc_gfn_to_pfn(vcpu, orig_pte->raddr >> PAGE_SHIFT,
- iswrite, &writable);
- if (is_error_noslot_pfn(hpaddr)) {
- printk(KERN_INFO "Couldn't get guest page for gfn %lx!\n", orig_pte->eaddr);
+ pfn = kvmppc_gfn_to_pfn(vcpu, gfn, iswrite, &writable);
+ if (is_error_noslot_pfn(pfn)) {
+ printk(KERN_INFO "Couldn't get guest page for gfn %lx!\n", gfn);
r = -EINVAL;
goto out;
}
- hpaddr <<= PAGE_SHIFT;
+ hpaddr = pfn << PAGE_SHIFT;
/* and write the mapping ea -> hpa into the pt */
vcpu->arch.mmu.esid_to_vsid(vcpu, orig_pte->eaddr >> SID_SHIFT, &vsid);
@@ -129,15 +130,18 @@ int kvmppc_mmu_map_page(struct kvm_vcpu *vcpu, struct kvmppc_pte *orig_pte,
vpn = hpt_vpn(orig_pte->eaddr, map->host_vsid, MMU_SEGSIZE_256M);
+ kvm_set_pfn_accessed(pfn);
if (!orig_pte->may_write || !writable)
- rflags |= HPTE_R_PP;
- else
- mark_page_dirty(vcpu->kvm, orig_pte->raddr >> PAGE_SHIFT);
+ rflags |= PP_RXRX;
+ else {
+ mark_page_dirty(vcpu->kvm, gfn);
+ kvm_set_pfn_dirty(pfn);
+ }
if (!orig_pte->may_execute)
rflags |= HPTE_R_N;
else
- kvmppc_mmu_flush_icache(hpaddr >> PAGE_SHIFT);
+ kvmppc_mmu_flush_icache(pfn);
/*
* Use 64K pages if possible; otherwise, on 64K page kernels,
@@ -191,7 +195,7 @@ map_again:
cpte->slot = hpteg + (ret & 7);
cpte->host_vpn = vpn;
cpte->pte = *orig_pte;
- cpte->pfn = hpaddr >> PAGE_SHIFT;
+ cpte->pfn = pfn;
cpte->pagesize = hpsize;
kvmppc_mmu_hpte_cache_map(vcpu, cpte);
@@ -200,7 +204,7 @@ map_again:
out_unlock:
spin_unlock(&kvm->mmu_lock);
- kvm_release_pfn_clean(hpaddr >> PAGE_SHIFT);
+ kvm_release_pfn_clean(pfn);
if (cpte)
kvmppc_mmu_hpte_cache_free(cpte);
--
1.8.3.1
^ permalink raw reply related [flat|nested] 68+ messages in thread
* [PATCH 23/23] KVM: PPC: Book3S PR: Reduce number of shadow PTEs invalidated by MMU notifiers
2013-08-06 4:12 [PATCH 00/23] Allow PR and HV KVM to coexist in one kernel Paul Mackerras
` (21 preceding siblings ...)
2013-08-06 4:27 ` [PATCH 22/23] KVM: PPC: Book3S PR: Mark pages accessed, and dirty if being written Paul Mackerras
@ 2013-08-06 4:28 ` Paul Mackerras
22 siblings, 0 replies; 68+ messages in thread
From: Paul Mackerras @ 2013-08-06 4:28 UTC (permalink / raw)
To: Alexander Graf, Benjamin Herrenschmidt; +Cc: kvm-ppc, kvm
Currently, whenever any of the MMU notifier callbacks get called, we
invalidate all the shadow PTEs. This is inefficient because it means
that we typically then get a lot of DSIs and ISIs in the guest to fault
the shadow PTEs back in. We do this even if the address range being
notified doesn't correspond to guest memory.
This commit adds code to scan the memslot array to find out what range(s)
of guest physical addresses corresponds to the host virtual address range
being affected. For each such range we flush only the shadow PTEs
for the range, on all cpus.
Signed-off-by: Paul Mackerras <paulus@samba.org>
---
arch/powerpc/kvm/book3s_pr.c | 40 ++++++++++++++++++++++++++++++++--------
1 file changed, 32 insertions(+), 8 deletions(-)
diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
index 71f7cfe..2336d9c 100644
--- a/arch/powerpc/kvm/book3s_pr.c
+++ b/arch/powerpc/kvm/book3s_pr.c
@@ -150,16 +150,41 @@ int kvmppc_core_check_requests_pr(struct kvm_vcpu *vcpu)
}
/************* MMU Notifiers *************/
+static void do_kvm_unmap_hva(struct kvm *kvm, unsigned long start,
+ unsigned long end)
+{
+ long i;
+ struct kvm_vcpu *vcpu;
+ struct kvm_memslots *slots;
+ struct kvm_memory_slot *memslot;
+
+ slots = kvm_memslots(kvm);
+ kvm_for_each_memslot(memslot, slots) {
+ unsigned long hva_start, hva_end;
+ gfn_t gfn, gfn_end;
+
+ hva_start = max(start, memslot->userspace_addr);
+ hva_end = min(end, memslot->userspace_addr +
+ (memslot->npages << PAGE_SHIFT));
+ if (hva_start >= hva_end)
+ continue;
+ /*
+ * {gfn(page) | page intersects with [hva_start, hva_end)} =
+ * {gfn, gfn+1, ..., gfn_end-1}.
+ */
+ gfn = hva_to_gfn_memslot(hva_start, memslot);
+ gfn_end = hva_to_gfn_memslot(hva_end + PAGE_SIZE - 1, memslot);
+ kvm_for_each_vcpu(i, vcpu, kvm)
+ kvmppc_mmu_pte_pflush(vcpu, gfn << PAGE_SHIFT,
+ gfn_end << PAGE_SHIFT);
+ }
+}
int kvm_unmap_hva_pr(struct kvm *kvm, unsigned long hva)
{
trace_kvm_unmap_hva(hva);
- /*
- * Flush all shadow tlb entries everywhere. This is slow, but
- * we are 100% sure that we catch the to be unmapped page
- */
- kvm_flush_remote_tlbs(kvm);
+ do_kvm_unmap_hva(kvm, hva, hva + PAGE_SIZE);
return 0;
}
@@ -167,8 +192,7 @@ int kvm_unmap_hva_pr(struct kvm *kvm, unsigned long hva)
int kvm_unmap_hva_range_pr(struct kvm *kvm, unsigned long start,
unsigned long end)
{
- /* kvm_unmap_hva flushes everything anyways */
- kvm_unmap_hva(kvm, start);
+ do_kvm_unmap_hva(kvm, start, end);
return 0;
}
@@ -188,7 +212,7 @@ int kvm_test_age_hva_pr(struct kvm *kvm, unsigned long hva)
void kvm_set_spte_hva_pr(struct kvm *kvm, unsigned long hva, pte_t pte)
{
/* The page will get remapped properly on its next fault */
- kvm_unmap_hva(kvm, hva);
+ do_kvm_unmap_hva(kvm, hva, hva + PAGE_SIZE);
}
/*****************************************/
--
1.8.3.1
^ permalink raw reply related [flat|nested] 68+ messages in thread
* RE: [PATCH 21/23] KVM: PPC: Book3S PR: Use mmu_notifier_retry() in kvmppc_mmu_map_page()
2013-08-06 4:27 ` [PATCH 21/23] KVM: PPC: Book3S PR: Use mmu_notifier_retry() in kvmppc_mmu_map_page() Paul Mackerras
@ 2013-08-07 4:13 ` Bhushan Bharat-R65777
2013-08-07 4:28 ` Paul Mackerras
2013-08-07 5:17 ` Bhushan Bharat-R65777
1 sibling, 1 reply; 68+ messages in thread
From: Bhushan Bharat-R65777 @ 2013-08-07 4:13 UTC (permalink / raw)
To: Paul Mackerras, Alexander Graf, Benjamin Herrenschmidt
Cc: kvm-ppc@vger.kernel.org, kvm@vger.kernel.org
> -----Original Message-----
> From: kvm-owner@vger.kernel.org [mailto:kvm-owner@vger.kernel.org] On Behalf Of
> Paul Mackerras
> Sent: Tuesday, August 06, 2013 9:58 AM
> To: Alexander Graf; Benjamin Herrenschmidt
> Cc: kvm-ppc@vger.kernel.org; kvm@vger.kernel.org
> Subject: [PATCH 21/23] KVM: PPC: Book3S PR: Use mmu_notifier_retry() in
> kvmppc_mmu_map_page()
>
> When the MM code is invalidating a range of pages, it calls the KVM
> kvm_mmu_notifier_invalidate_range_start() notifier function, which calls
> kvm_unmap_hva_range(), which arranges to flush all the existing host HPTEs for
> guest pages. However, the Linux PTEs for the range being flushed are still
> valid at that point. We are not supposed to establish any new references to
> pages in the range until the ...range_end() notifier gets called. The PPC-
> specific KVM code doesn't get any explicit notification of that; instead, we are
> supposed to use
> mmu_notifier_retry() to test whether we are or have been inside a range flush
> notifier pair while we have been getting a page and instantiating a host HPTE
> for the page.
>
> This therefore adds a call to mmu_notifier_retry inside kvmppc_mmu_map_page().
> This call is inside a region locked with
> kvm->mmu_lock, which is the same lock that is taken by the KVM
> MMU notifier functions, thus ensuring that no new notification can proceed while
> we are in the locked region. Inside this region we also create the host HPTE
> and link the corresponding hpte_cache structure into the lists used to find it
> later. We cannot allocate the hpte_cache structure inside this locked region
> because that can lead to deadlock, so we allocate it outside the region and free
> it if we end up not using it.
>
> This also moves the updates of vcpu3s->hpte_cache_count inside the regions
> locked with vcpu3s->mmu_lock, and does the increment in
> kvmppc_mmu_hpte_cache_map() when the pte is added to the cache rather than when
> it is allocated, in order that the hpte_cache_count is accurate.
>
> Signed-off-by: Paul Mackerras <paulus@samba.org>
> ---
> arch/powerpc/include/asm/kvm_book3s.h | 1 +
> arch/powerpc/kvm/book3s_64_mmu_host.c | 37 ++++++++++++++++++++++++++---------
> arch/powerpc/kvm/book3s_mmu_hpte.c | 14 +++++++++----
> 3 files changed, 39 insertions(+), 13 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/kvm_book3s.h
> b/arch/powerpc/include/asm/kvm_book3s.h
> index 4fe6864..e711e77 100644
> --- a/arch/powerpc/include/asm/kvm_book3s.h
> +++ b/arch/powerpc/include/asm/kvm_book3s.h
> @@ -143,6 +143,7 @@ extern long kvmppc_hv_find_lock_hpte(struct kvm *kvm, gva_t
> eaddr,
>
> extern void kvmppc_mmu_hpte_cache_map(struct kvm_vcpu *vcpu, struct hpte_cache
> *pte); extern struct hpte_cache *kvmppc_mmu_hpte_cache_next(struct kvm_vcpu
> *vcpu);
> +extern void kvmppc_mmu_hpte_cache_free(struct hpte_cache *pte);
> extern void kvmppc_mmu_hpte_destroy(struct kvm_vcpu *vcpu); extern int
> kvmppc_mmu_hpte_init(struct kvm_vcpu *vcpu); extern void
> kvmppc_mmu_invalidate_pte(struct kvm_vcpu *vcpu, struct hpte_cache *pte); diff -
> -git a/arch/powerpc/kvm/book3s_64_mmu_host.c
> b/arch/powerpc/kvm/book3s_64_mmu_host.c
> index 7fcf38f..b7e9504 100644
> --- a/arch/powerpc/kvm/book3s_64_mmu_host.c
> +++ b/arch/powerpc/kvm/book3s_64_mmu_host.c
> @@ -93,6 +93,13 @@ int kvmppc_mmu_map_page(struct kvm_vcpu *vcpu, struct
> kvmppc_pte *orig_pte,
> int r = 0;
> int hpsize = MMU_PAGE_4K;
> bool writable;
> + unsigned long mmu_seq;
> + struct kvm *kvm = vcpu->kvm;
> + struct hpte_cache *cpte;
> +
> + /* used to check for invalidations in progress */
> + mmu_seq = kvm->mmu_notifier_seq;
> + smp_rmb();
Should not the smp_rmb() come before reading kvm->mmu_notifier_seq?
-Bharat
>
> /* Get host physical address for gpa */
> hpaddr = kvmppc_gfn_to_pfn(vcpu, orig_pte->raddr >> PAGE_SHIFT, @@ -143,6
> +150,14 @@ int kvmppc_mmu_map_page(struct kvm_vcpu *vcpu, struct kvmppc_pte
> *orig_pte,
>
> hash = hpt_hash(vpn, mmu_psize_defs[hpsize].shift, MMU_SEGSIZE_256M);
>
> + cpte = kvmppc_mmu_hpte_cache_next(vcpu);
> +
> + spin_lock(&kvm->mmu_lock);
> + if (!cpte || mmu_notifier_retry(kvm, mmu_seq)) {
> + r = -EAGAIN;
> + goto out_unlock;
> + }
> +
> map_again:
> hpteg = ((hash & htab_hash_mask) * HPTES_PER_GROUP);
>
> @@ -150,7 +165,7 @@ map_again:
> if (attempt > 1)
> if (ppc_md.hpte_remove(hpteg) < 0) {
> r = -1;
> - goto out;
> + goto out_unlock;
> }
>
> ret = ppc_md.hpte_insert(hpteg, vpn, hpaddr, rflags, vflags, @@ -163,8
> +178,6 @@ map_again:
> attempt++;
> goto map_again;
> } else {
> - struct hpte_cache *pte = kvmppc_mmu_hpte_cache_next(vcpu);
> -
> trace_kvm_book3s_64_mmu_map(rflags, hpteg,
> vpn, hpaddr, orig_pte);
>
> @@ -175,15 +188,21 @@ map_again:
> hpteg = ((hash & htab_hash_mask) * HPTES_PER_GROUP);
> }
>
> - pte->slot = hpteg + (ret & 7);
> - pte->host_vpn = vpn;
> - pte->pte = *orig_pte;
> - pte->pfn = hpaddr >> PAGE_SHIFT;
> - pte->pagesize = hpsize;
> + cpte->slot = hpteg + (ret & 7);
> + cpte->host_vpn = vpn;
> + cpte->pte = *orig_pte;
> + cpte->pfn = hpaddr >> PAGE_SHIFT;
> + cpte->pagesize = hpsize;
>
> - kvmppc_mmu_hpte_cache_map(vcpu, pte);
> + kvmppc_mmu_hpte_cache_map(vcpu, cpte);
> + cpte = NULL;
> }
> +
> +out_unlock:
> + spin_unlock(&kvm->mmu_lock);
> kvm_release_pfn_clean(hpaddr >> PAGE_SHIFT);
> + if (cpte)
> + kvmppc_mmu_hpte_cache_free(cpte);
>
> out:
> return r;
> diff --git a/arch/powerpc/kvm/book3s_mmu_hpte.c
> b/arch/powerpc/kvm/book3s_mmu_hpte.c
> index d2d280b..6b79bfc 100644
> --- a/arch/powerpc/kvm/book3s_mmu_hpte.c
> +++ b/arch/powerpc/kvm/book3s_mmu_hpte.c
> @@ -98,6 +98,8 @@ void kvmppc_mmu_hpte_cache_map(struct kvm_vcpu *vcpu, struct
> hpte_cache *pte)
> &vcpu3s->hpte_hash_vpte_64k[index]);
> #endif
>
> + vcpu3s->hpte_cache_count++;
> +
> spin_unlock(&vcpu3s->mmu_lock);
> }
>
> @@ -131,10 +133,10 @@ static void invalidate_pte(struct kvm_vcpu *vcpu, struct
> hpte_cache *pte) #ifdef CONFIG_PPC_BOOK3S_64
> hlist_del_init_rcu(&pte->list_vpte_64k);
> #endif
> + vcpu3s->hpte_cache_count--;
>
> spin_unlock(&vcpu3s->mmu_lock);
>
> - vcpu3s->hpte_cache_count--;
> call_rcu(&pte->rcu_head, free_pte_rcu); }
>
> @@ -331,15 +333,19 @@ struct hpte_cache *kvmppc_mmu_hpte_cache_next(struct
> kvm_vcpu *vcpu)
> struct kvmppc_vcpu_book3s *vcpu3s = to_book3s(vcpu);
> struct hpte_cache *pte;
>
> - pte = kmem_cache_zalloc(hpte_cache, GFP_KERNEL);
> - vcpu3s->hpte_cache_count++;
> -
> if (vcpu3s->hpte_cache_count == HPTEG_CACHE_NUM)
> kvmppc_mmu_pte_flush_all(vcpu);
>
> + pte = kmem_cache_zalloc(hpte_cache, GFP_KERNEL);
> +
> return pte;
> }
>
> +void kvmppc_mmu_hpte_cache_free(struct hpte_cache *pte) {
> + kmem_cache_free(hpte_cache, pte);
> +}
> +
> void kvmppc_mmu_hpte_destroy(struct kvm_vcpu *vcpu) {
> kvmppc_mmu_pte_flush(vcpu, 0, 0);
> --
> 1.8.3.1
>
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a
> message to majordomo@vger.kernel.org More majordomo info at
> http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: [PATCH 21/23] KVM: PPC: Book3S PR: Use mmu_notifier_retry() in kvmppc_mmu_map_page()
2013-08-07 4:13 ` Bhushan Bharat-R65777
@ 2013-08-07 4:28 ` Paul Mackerras
2013-08-07 5:18 ` Bhushan Bharat-R65777
0 siblings, 1 reply; 68+ messages in thread
From: Paul Mackerras @ 2013-08-07 4:28 UTC (permalink / raw)
To: Bhushan Bharat-R65777
Cc: Alexander Graf, Benjamin Herrenschmidt, kvm-ppc@vger.kernel.org,
kvm@vger.kernel.org
On Wed, Aug 07, 2013 at 04:13:34AM +0000, Bhushan Bharat-R65777 wrote:
>
> > + /* used to check for invalidations in progress */
> > + mmu_seq = kvm->mmu_notifier_seq;
> > + smp_rmb();
>
> Should not the smp_rmb() come before reading kvm->mmu_notifier_seq?
No, it should come after, because it is ordering the read of
kvm->mmu_notifier_seq before the read of the Linux PTE.
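To see the pairing (the notifier side below is paraphrased from the generic
KVM code, not from this series):

	/* reader side, kvmppc_mmu_map_page() */
	mmu_seq = kvm->mmu_notifier_seq;
	smp_rmb();				/* seq read ordered before ... */
	hpaddr = kvmppc_gfn_to_pfn(vcpu, gfn, iswrite, &writable);
						/* ... the Linux PTE read in here */

	/*
	 * writer side, generic KVM MMU notifier (paraphrased):
	 * invalidate_range_start() takes kvm->mmu_lock, bumps
	 * mmu_notifier_count and flushes the host HPTEs; the Linux PTEs for
	 * the range are then torn down; invalidate_range_end() bumps
	 * mmu_notifier_seq before dropping mmu_notifier_count.  So if the PTE
	 * we read was in the middle of being invalidated, mmu_notifier_retry()
	 * under kvm->mmu_lock sees either a non-zero count or a changed seq,
	 * and we throw the mapping attempt away.
	 */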
Paul.
^ permalink raw reply [flat|nested] 68+ messages in thread
* RE: [PATCH 21/23] KVM: PPC: Book3S PR: Use mmu_notifier_retry() in kvmppc_mmu_map_page()
2013-08-06 4:27 ` [PATCH 21/23] KVM: PPC: Book3S PR: Use mmu_notifier_retry() in kvmppc_mmu_map_page() Paul Mackerras
2013-08-07 4:13 ` Bhushan Bharat-R65777
@ 2013-08-07 5:17 ` Bhushan Bharat-R65777
2013-08-07 8:27 ` Paul Mackerras
1 sibling, 1 reply; 68+ messages in thread
From: Bhushan Bharat-R65777 @ 2013-08-07 5:17 UTC (permalink / raw)
To: Paul Mackerras, Alexander Graf, Benjamin Herrenschmidt
Cc: kvm-ppc@vger.kernel.org, kvm@vger.kernel.org
> -----Original Message-----
> From: kvm-owner@vger.kernel.org [mailto:kvm-owner@vger.kernel.org] On Behalf Of
> Paul Mackerras
> Sent: Tuesday, August 06, 2013 9:58 AM
> To: Alexander Graf; Benjamin Herrenschmidt
> Cc: kvm-ppc@vger.kernel.org; kvm@vger.kernel.org
> Subject: [PATCH 21/23] KVM: PPC: Book3S PR: Use mmu_notifier_retry() in
> kvmppc_mmu_map_page()
>
> When the MM code is invalidating a range of pages, it calls the KVM
> kvm_mmu_notifier_invalidate_range_start() notifier function, which calls
> kvm_unmap_hva_range(), which arranges to flush all the existing host
> HPTEs for guest pages. However, the Linux PTEs for the range being
> flushed are still valid at that point. We are not supposed to establish
> any new references to pages in the range until the ...range_end()
> notifier gets called. The PPC-specific KVM code doesn't get any
> explicit notification of that; instead, we are supposed to use
> mmu_notifier_retry() to test whether we are or have been inside a
> range flush notifier pair while we have been getting a page and
> instantiating a host HPTE for the page.
>
> This therefore adds a call to mmu_notifier_retry inside
> kvmppc_mmu_map_page(). This call is inside a region locked with
> kvm->mmu_lock, which is the same lock that is taken by the KVM
> MMU notifier functions, thus ensuring that no new notification can
> proceed while we are in the locked region. Inside this region we
> also create the host HPTE and link the corresponding hpte_cache
> structure into the lists used to find it later. We cannot allocate
> the hpte_cache structure inside this locked region because that can
> lead to deadlock, so we allocate it outside the region and free it
> if we end up not using it.
>
> This also moves the updates of vcpu3s->hpte_cache_count inside the
> regions locked with vcpu3s->mmu_lock, and does the increment in
> kvmppc_mmu_hpte_cache_map() when the pte is added to the cache
> rather than when it is allocated, in order that the hpte_cache_count
> is accurate.
>
> Signed-off-by: Paul Mackerras <paulus@samba.org>
> ---
> arch/powerpc/include/asm/kvm_book3s.h | 1 +
> arch/powerpc/kvm/book3s_64_mmu_host.c | 37 ++++++++++++++++++++++++++---------
> arch/powerpc/kvm/book3s_mmu_hpte.c | 14 +++++++++----
> 3 files changed, 39 insertions(+), 13 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/kvm_book3s.h
> b/arch/powerpc/include/asm/kvm_book3s.h
> index 4fe6864..e711e77 100644
> --- a/arch/powerpc/include/asm/kvm_book3s.h
> +++ b/arch/powerpc/include/asm/kvm_book3s.h
> @@ -143,6 +143,7 @@ extern long kvmppc_hv_find_lock_hpte(struct kvm *kvm, gva_t
> eaddr,
>
> extern void kvmppc_mmu_hpte_cache_map(struct kvm_vcpu *vcpu, struct hpte_cache
> *pte);
> extern struct hpte_cache *kvmppc_mmu_hpte_cache_next(struct kvm_vcpu *vcpu);
> +extern void kvmppc_mmu_hpte_cache_free(struct hpte_cache *pte);
> extern void kvmppc_mmu_hpte_destroy(struct kvm_vcpu *vcpu);
> extern int kvmppc_mmu_hpte_init(struct kvm_vcpu *vcpu);
> extern void kvmppc_mmu_invalidate_pte(struct kvm_vcpu *vcpu, struct hpte_cache
> *pte);
> diff --git a/arch/powerpc/kvm/book3s_64_mmu_host.c
> b/arch/powerpc/kvm/book3s_64_mmu_host.c
> index 7fcf38f..b7e9504 100644
> --- a/arch/powerpc/kvm/book3s_64_mmu_host.c
> +++ b/arch/powerpc/kvm/book3s_64_mmu_host.c
> @@ -93,6 +93,13 @@ int kvmppc_mmu_map_page(struct kvm_vcpu *vcpu, struct
> kvmppc_pte *orig_pte,
> int r = 0;
> int hpsize = MMU_PAGE_4K;
> bool writable;
> + unsigned long mmu_seq;
> + struct kvm *kvm = vcpu->kvm;
> + struct hpte_cache *cpte;
> +
> + /* used to check for invalidations in progress */
> + mmu_seq = kvm->mmu_notifier_seq;
> + smp_rmb();
>
> /* Get host physical address for gpa */
> hpaddr = kvmppc_gfn_to_pfn(vcpu, orig_pte->raddr >> PAGE_SHIFT,
> @@ -143,6 +150,14 @@ int kvmppc_mmu_map_page(struct kvm_vcpu *vcpu, struct
> kvmppc_pte *orig_pte,
>
> hash = hpt_hash(vpn, mmu_psize_defs[hpsize].shift, MMU_SEGSIZE_256M);
>
> + cpte = kvmppc_mmu_hpte_cache_next(vcpu);
> +
> + spin_lock(&kvm->mmu_lock);
> + if (!cpte || mmu_notifier_retry(kvm, mmu_seq)) {
> + r = -EAGAIN;
Paul, I am trying to understand the flow; does retry mean that we do not create the mapping and return to the guest, which will fault again and then we will retry?
Thanks
-Bharat
> + goto out_unlock;
> + }
> +
> map_again:
> hpteg = ((hash & htab_hash_mask) * HPTES_PER_GROUP);
>
> @@ -150,7 +165,7 @@ map_again:
> if (attempt > 1)
> if (ppc_md.hpte_remove(hpteg) < 0) {
> r = -1;
> - goto out;
> + goto out_unlock;
> }
>
> ret = ppc_md.hpte_insert(hpteg, vpn, hpaddr, rflags, vflags,
> @@ -163,8 +178,6 @@ map_again:
> attempt++;
> goto map_again;
> } else {
> - struct hpte_cache *pte = kvmppc_mmu_hpte_cache_next(vcpu);
> -
> trace_kvm_book3s_64_mmu_map(rflags, hpteg,
> vpn, hpaddr, orig_pte);
>
> @@ -175,15 +188,21 @@ map_again:
> hpteg = ((hash & htab_hash_mask) * HPTES_PER_GROUP);
> }
>
> - pte->slot = hpteg + (ret & 7);
> - pte->host_vpn = vpn;
> - pte->pte = *orig_pte;
> - pte->pfn = hpaddr >> PAGE_SHIFT;
> - pte->pagesize = hpsize;
> + cpte->slot = hpteg + (ret & 7);
> + cpte->host_vpn = vpn;
> + cpte->pte = *orig_pte;
> + cpte->pfn = hpaddr >> PAGE_SHIFT;
> + cpte->pagesize = hpsize;
>
> - kvmppc_mmu_hpte_cache_map(vcpu, pte);
> + kvmppc_mmu_hpte_cache_map(vcpu, cpte);
> + cpte = NULL;
> }
> +
> +out_unlock:
> + spin_unlock(&kvm->mmu_lock);
> kvm_release_pfn_clean(hpaddr >> PAGE_SHIFT);
> + if (cpte)
> + kvmppc_mmu_hpte_cache_free(cpte);
>
> out:
> return r;
> diff --git a/arch/powerpc/kvm/book3s_mmu_hpte.c
> b/arch/powerpc/kvm/book3s_mmu_hpte.c
> index d2d280b..6b79bfc 100644
> --- a/arch/powerpc/kvm/book3s_mmu_hpte.c
> +++ b/arch/powerpc/kvm/book3s_mmu_hpte.c
> @@ -98,6 +98,8 @@ void kvmppc_mmu_hpte_cache_map(struct kvm_vcpu *vcpu, struct
> hpte_cache *pte)
> &vcpu3s->hpte_hash_vpte_64k[index]);
> #endif
>
> + vcpu3s->hpte_cache_count++;
> +
> spin_unlock(&vcpu3s->mmu_lock);
> }
>
> @@ -131,10 +133,10 @@ static void invalidate_pte(struct kvm_vcpu *vcpu, struct
> hpte_cache *pte)
> #ifdef CONFIG_PPC_BOOK3S_64
> hlist_del_init_rcu(&pte->list_vpte_64k);
> #endif
> + vcpu3s->hpte_cache_count--;
>
> spin_unlock(&vcpu3s->mmu_lock);
>
> - vcpu3s->hpte_cache_count--;
> call_rcu(&pte->rcu_head, free_pte_rcu);
> }
>
> @@ -331,15 +333,19 @@ struct hpte_cache *kvmppc_mmu_hpte_cache_next(struct
> kvm_vcpu *vcpu)
> struct kvmppc_vcpu_book3s *vcpu3s = to_book3s(vcpu);
> struct hpte_cache *pte;
>
> - pte = kmem_cache_zalloc(hpte_cache, GFP_KERNEL);
> - vcpu3s->hpte_cache_count++;
> -
> if (vcpu3s->hpte_cache_count == HPTEG_CACHE_NUM)
> kvmppc_mmu_pte_flush_all(vcpu);
>
> + pte = kmem_cache_zalloc(hpte_cache, GFP_KERNEL);
> +
> return pte;
> }
>
> +void kvmppc_mmu_hpte_cache_free(struct hpte_cache *pte)
> +{
> + kmem_cache_free(hpte_cache, pte);
> +}
> +
> void kvmppc_mmu_hpte_destroy(struct kvm_vcpu *vcpu)
> {
> kvmppc_mmu_pte_flush(vcpu, 0, 0);
> --
> 1.8.3.1
>
^ permalink raw reply [flat|nested] 68+ messages in thread
* RE: [PATCH 21/23] KVM: PPC: Book3S PR: Use mmu_notifier_retry() in kvmppc_mmu_map_page()
2013-08-07 4:28 ` Paul Mackerras
@ 2013-08-07 5:18 ` Bhushan Bharat-R65777
0 siblings, 0 replies; 68+ messages in thread
From: Bhushan Bharat-R65777 @ 2013-08-07 5:18 UTC (permalink / raw)
To: Paul Mackerras
Cc: Alexander Graf, Benjamin Herrenschmidt, kvm-ppc@vger.kernel.org,
kvm@vger.kernel.org
> -----Original Message-----
> From: Paul Mackerras [mailto:paulus@samba.org]
> Sent: Wednesday, August 07, 2013 9:59 AM
> To: Bhushan Bharat-R65777
> Cc: Alexander Graf; Benjamin Herrenschmidt; kvm-ppc@vger.kernel.org;
> kvm@vger.kernel.org
> Subject: Re: [PATCH 21/23] KVM: PPC: Book3S PR: Use mmu_notifier_retry() in
> kvmppc_mmu_map_page()
>
> On Wed, Aug 07, 2013 at 04:13:34AM +0000, Bhushan Bharat-R65777 wrote:
> >
> > > + /* used to check for invalidations in progress */
> > > + mmu_seq = kvm->mmu_notifier_seq;
> > > + smp_rmb();
> >
> > Should not the smp_rmb() come before reading kvm->mmu_notifier_seq.
>
> No, it should come after, because it is ordering the read of
> kvm->mmu_notifier_seq before the read of the Linux PTE.
Ahh, ok. Thanks
-Bharat
>
> Paul.
^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: [PATCH 21/23] KVM: PPC: Book3S PR: Use mmu_notifier_retry() in kvmppc_mmu_map_page()
2013-08-07 5:17 ` Bhushan Bharat-R65777
@ 2013-08-07 8:27 ` Paul Mackerras
2013-08-07 8:31 ` Bhushan Bharat-R65777
0 siblings, 1 reply; 68+ messages in thread
From: Paul Mackerras @ 2013-08-07 8:27 UTC (permalink / raw)
To: Bhushan Bharat-R65777
Cc: Alexander Graf, Benjamin Herrenschmidt, kvm-ppc@vger.kernel.org,
kvm@vger.kernel.org
On Wed, Aug 07, 2013 at 05:17:29AM +0000, Bhushan Bharat-R65777 wrote:
>
> Paul, I am trying to understand the flow; does retry mean that we do not create the mapping and return to guest, which will fault again and then we will retry?
Yes, and you do put_page or kvm_release_pfn_clean for any page that
you got.
Paul.
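The pattern under discussion in this sub-thread is the standard mmu_notifier
check-then-retry sequence. A minimal sketch, reusing the names from the patch
above (kvm->mmu_notifier_seq, mmu_notifier_retry(), kvm->mmu_lock,
kvm_release_pfn_clean()); the helper itself is hypothetical and not part of
the series:

static int sketch_map_page(struct kvm *kvm, gfn_t gfn)
{
        unsigned long mmu_seq;
        pfn_t pfn;

        /* Sample the notifier sequence count first; smp_rmb() orders this
         * read before the gfn->pfn translation below. */
        mmu_seq = kvm->mmu_notifier_seq;
        smp_rmb();

        pfn = gfn_to_pfn(kvm, gfn);     /* may sleep, may race with invalidation */
        if (is_error_pfn(pfn))
                return -EFAULT;

        spin_lock(&kvm->mmu_lock);
        if (mmu_notifier_retry(kvm, mmu_seq)) {
                /* An invalidation ran after mmu_seq was sampled: release the
                 * page and fault again (from the guest or within KVM). */
                spin_unlock(&kvm->mmu_lock);
                kvm_release_pfn_clean(pfn);
                return -EAGAIN;
        }
        /* ... create the host HPTE and record it while holding mmu_lock ... */
        spin_unlock(&kvm->mmu_lock);
        kvm_release_pfn_clean(pfn);
        return 0;
}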
^ permalink raw reply [flat|nested] 68+ messages in thread
* RE: [PATCH 21/23] KVM: PPC: Book3S PR: Use mmu_notifier_retry() in kvmppc_mmu_map_page()
2013-08-07 8:27 ` Paul Mackerras
@ 2013-08-07 8:31 ` Bhushan Bharat-R65777
2013-08-08 12:06 ` Paul Mackerras
0 siblings, 1 reply; 68+ messages in thread
From: Bhushan Bharat-R65777 @ 2013-08-07 8:31 UTC (permalink / raw)
To: Paul Mackerras
Cc: Alexander Graf, Benjamin Herrenschmidt, kvm-ppc@vger.kernel.org,
kvm@vger.kernel.org
> -----Original Message-----
> From: Paul Mackerras [mailto:paulus@samba.org]
> Sent: Wednesday, August 07, 2013 1:58 PM
> To: Bhushan Bharat-R65777
> Cc: Alexander Graf; Benjamin Herrenschmidt; kvm-ppc@vger.kernel.org;
> kvm@vger.kernel.org
> Subject: Re: [PATCH 21/23] KVM: PPC: Book3S PR: Use mmu_notifier_retry() in
> kvmppc_mmu_map_page()
>
> On Wed, Aug 07, 2013 at 05:17:29AM +0000, Bhushan Bharat-R65777 wrote:
> >
> > Paul, I am trying to understand the flow; does retry mean that we do not
> create the mapping and return to guest, which will fault again and then we will
> retry?
>
> Yes, and you do put_page or kvm_release_pfn_clean for any page that you got.
Ok, but what is the value of returning back to the guest when we know it is going to generate a fault again?
Can't we retry within KVM?
Thanks
-Bharat
>
> Paul.
^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: [PATCH 21/23] KVM: PPC: Book3S PR: Use mmu_notifier_retry() in kvmppc_mmu_map_page()
2013-08-07 8:31 ` Bhushan Bharat-R65777
@ 2013-08-08 12:06 ` Paul Mackerras
0 siblings, 0 replies; 68+ messages in thread
From: Paul Mackerras @ 2013-08-08 12:06 UTC (permalink / raw)
To: Bhushan Bharat-R65777
Cc: Alexander Graf, Benjamin Herrenschmidt, kvm-ppc@vger.kernel.org,
kvm@vger.kernel.org
On Wed, Aug 07, 2013 at 08:31:04AM +0000, Bhushan Bharat-R65777 wrote:
>
>
> > -----Original Message-----
> > From: Paul Mackerras [mailto:paulus@samba.org]
> > Sent: Wednesday, August 07, 2013 1:58 PM
> > To: Bhushan Bharat-R65777
> > Cc: Alexander Graf; Benjamin Herrenschmidt; kvm-ppc@vger.kernel.org;
> > kvm@vger.kernel.org
> > Subject: Re: [PATCH 21/23] KVM: PPC: Book3S PR: Use mmu_notifier_retry() in
> > kvmppc_mmu_map_page()
> >
> > On Wed, Aug 07, 2013 at 05:17:29AM +0000, Bhushan Bharat-R65777 wrote:
> > >
> > > Paul, I am trying to understand the flow; does retry mean that we do not
> > create the mapping and return to guest, which will fault again and then we will
> > retry?
> >
> > Yes, and you do put_page or kvm_release_pfn_clean for any page that you got.
>
> Ok, but what is the value of returning back to the guest when we know it is going to generate a fault again?
> Can't we retry within KVM?
You can, though you should make sure you include a preemption point.
Going back to the guest gets you a preemption point because of the
cond_resched() call in kvmppc_prepare_to_enter().
Paul.
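The alternative raised here, retrying inside KVM, therefore needs an explicit
preemption point. A minimal sketch, assuming a hypothetical helper
sketch_map_one_page() that returns -EAGAIN when mmu_notifier_retry() fires;
the cond_resched() call plays the same role as the one in
kvmppc_prepare_to_enter():

static int sketch_retry_in_kvm(struct kvm_vcpu *vcpu, gfn_t gfn)
{
        int r;

        do {
                r = sketch_map_one_page(vcpu, gfn);     /* hypothetical helper */
                if (r == -EAGAIN)
                        cond_resched();         /* preemption point */
        } while (r == -EAGAIN);

        return r;
}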
^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: [PATCH 02/23] KVM: PPC: Book3S PR: Don't corrupt guest state when kernel uses VMX
2013-08-06 4:14 ` [PATCH 02/23] KVM: PPC: Book3S PR: Don't corrupt guest state when kernel uses VMX Paul Mackerras
@ 2013-08-08 15:49 ` Aneesh Kumar K.V
2013-08-28 22:51 ` Alexander Graf
1 sibling, 0 replies; 68+ messages in thread
From: Aneesh Kumar K.V @ 2013-08-08 15:49 UTC (permalink / raw)
To: Paul Mackerras, Alexander Graf, Benjamin Herrenschmidt; +Cc: kvm-ppc, kvm
Paul Mackerras <paulus@samba.org> writes:
> @@ -575,8 +577,6 @@ static int kvmppc_handle_ext(struct kvm_vcpu *vcpu, unsigned int exit_nr,
> printk(KERN_INFO "Loading up ext 0x%lx\n", msr);
> #endif
>
> - current->thread.regs->msr |= msr;
> -
> if (msr & MSR_FP) {
> for (i = 0; i < ARRAY_SIZE(vcpu->arch.fpr); i++)
> thread_fpr[get_fpr_index(i)] = vcpu_fpr[i];
> @@ -598,12 +598,32 @@ static int kvmppc_handle_ext(struct kvm_vcpu *vcpu, unsigned int exit_nr,
> #endif
> }
>
> + current->thread.regs->msr |= msr;
> vcpu->arch.guest_owned_ext |= msr;
> kvmppc_recalc_shadow_msr(vcpu);
>
> return RESUME_GUEST;
> }
Any specific reason you are doing the above?
-aneesh
^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: [PATCH 04/23] KVM: PPC: Book3S PR: Keep volatile reg values in vcpu rather than shadow_vcpu
2013-08-06 4:16 ` [PATCH 04/23] KVM: PPC: Book3S PR: Keep volatile reg values in vcpu rather than shadow_vcpu Paul Mackerras
@ 2013-08-11 11:06 ` Aneesh Kumar K.V
2013-08-28 22:00 ` Alexander Graf
1 sibling, 0 replies; 68+ messages in thread
From: Aneesh Kumar K.V @ 2013-08-11 11:06 UTC (permalink / raw)
To: Paul Mackerras, Alexander Graf, Benjamin Herrenschmidt; +Cc: kvm-ppc, kvm
Paul Mackerras <paulus@samba.org> writes:
> Currently PR-style KVM keeps the volatile guest register values
> (R0 - R13, CR, LR, CTR, XER, PC) in a shadow_vcpu struct rather than
> the main kvm_vcpu struct. For 64-bit, the shadow_vcpu exists in two
> places, a kmalloc'd struct and in the PACA, and it gets copied back
> and forth in kvmppc_core_vcpu_load/put(), because the real-mode code
> can't rely on being able to access the kmalloc'd struct.
>
> This changes the code to copy the volatile values into the shadow_vcpu
> as one of the last things done before entering the guest. Similarly
> the values are copied back out of the shadow_vcpu to the kvm_vcpu
> immediately after exiting the guest. We arrange for interrupts to be
> still disabled at this point so that we can't get preempted on 64-bit
> and end up copying values from the wrong PACA.
Can we remove kvmppc_vcpu_book3s.shadow_vcpu in this patch? Do we still
use the kmalloc'd shadow_vcpu?
-aneesh
^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: [PATCH 11/23] KVM: PPC: Book3S PR: Allocate kvm_vcpu structs from kvm_vcpu_cache
2013-08-06 4:21 ` [PATCH 11/23] KVM: PPC: Book3S PR: Allocate kvm_vcpu structs from kvm_vcpu_cache Paul Mackerras
@ 2013-08-12 10:03 ` Aneesh Kumar K.V
0 siblings, 0 replies; 68+ messages in thread
From: Aneesh Kumar K.V @ 2013-08-12 10:03 UTC (permalink / raw)
To: Paul Mackerras, Alexander Graf, Benjamin Herrenschmidt; +Cc: kvm-ppc, kvm
Paul Mackerras <paulus@samba.org> writes:
> This makes PR KVM allocate its kvm_vcpu structs from the kvm_vcpu_cache
> rather than having them embedded in the kvmppc_vcpu_book3s struct,
> which is allocated with vzalloc. The reason is to reduce the
> differences between PR and HV KVM in order to make is easier to have
> them coexist in one kernel binary.
>
> With this, the kvm_vcpu struct has a pointer to the kvmppc_vcpu_book3s
> struct. The pointer to the kvmppc_book3s_shadow_vcpu struct has moved
> from the kvmppc_vcpu_book3s struct to the kvm_vcpu struct.
>
> Signed-off-by: Paul Mackerras <paulus@samba.org>
> ---
> arch/powerpc/include/asm/kvm_book3s.h | 4 +---
> arch/powerpc/include/asm/kvm_book3s_32.h | 2 +-
> arch/powerpc/include/asm/kvm_host.h | 5 +++++
> arch/powerpc/kvm/book3s_32_mmu.c | 8 ++++----
> arch/powerpc/kvm/book3s_64_mmu.c | 11 +++++------
> arch/powerpc/kvm/book3s_pr.c | 29 ++++++++++++++++++-----------
> 6 files changed, 34 insertions(+), 25 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h
> index 322b539..1b32f6c 100644
> --- a/arch/powerpc/include/asm/kvm_book3s.h
> +++ b/arch/powerpc/include/asm/kvm_book3s.h
> @@ -70,8 +70,6 @@ struct hpte_cache {
> };
>
> struct kvmppc_vcpu_book3s {
> - struct kvm_vcpu vcpu;
> - struct kvmppc_book3s_shadow_vcpu *shadow_vcpu;
> struct kvmppc_sid_map sid_map[SID_MAP_NUM];
> struct {
> u64 esid;
> @@ -192,7 +190,7 @@ extern int kvmppc_h_pr(struct kvm_vcpu *vcpu, unsigned long cmd);
>
> static inline struct kvmppc_vcpu_book3s *to_book3s(struct kvm_vcpu *vcpu)
> {
> - return container_of(vcpu, struct kvmppc_vcpu_book3s, vcpu);
> + return vcpu->arch.book3s;
> }
>
> extern void kvm_return_point(void);
> diff --git a/arch/powerpc/include/asm/kvm_book3s_32.h b/arch/powerpc/include/asm/kvm_book3s_32.h
> index ce0ef6c..c720e0b 100644
> --- a/arch/powerpc/include/asm/kvm_book3s_32.h
> +++ b/arch/powerpc/include/asm/kvm_book3s_32.h
> @@ -22,7 +22,7 @@
>
> static inline struct kvmppc_book3s_shadow_vcpu *svcpu_get(struct kvm_vcpu *vcpu)
> {
> - return to_book3s(vcpu)->shadow_vcpu;
> + return vcpu->arch.shadow_vcpu;
> }
>
> static inline void svcpu_put(struct kvmppc_book3s_shadow_vcpu *svcpu)
> diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
> index c37207f..4d83972 100644
> --- a/arch/powerpc/include/asm/kvm_host.h
> +++ b/arch/powerpc/include/asm/kvm_host.h
> @@ -91,6 +91,9 @@ struct lppaca;
> struct slb_shadow;
> struct dtl_entry;
>
> +struct kvmppc_vcpu_book3s;
> +struct kvmppc_book3s_shadow_vcpu;
> +
> struct kvm_vm_stat {
> u32 remote_tlb_flush;
> };
> @@ -409,6 +412,8 @@ struct kvm_vcpu_arch {
> int slb_max; /* 1 + index of last valid entry in slb[] */
> int slb_nr; /* total number of entries in SLB */
> struct kvmppc_mmu mmu;
> + struct kvmppc_vcpu_book3s *book3s;
> + struct kvmppc_book3s_shadow_vcpu *shadow_vcpu;
> #endif
Can the *shadow_vcpu be within #ifdef CONFIG_PPC_BOOK3S_32? The rest of
the code accesses the variable under #ifdef CONFIG_PPC_BOOK3S_32.
-aneesh
^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: [PATCH 04/23] KVM: PPC: Book3S PR: Keep volatile reg values in vcpu rather than shadow_vcpu
2013-08-06 4:16 ` [PATCH 04/23] KVM: PPC: Book3S PR: Keep volatile reg values in vcpu rather than shadow_vcpu Paul Mackerras
2013-08-11 11:06 ` Aneesh Kumar K.V
@ 2013-08-28 22:00 ` Alexander Graf
2013-08-29 5:04 ` Paul Mackerras
1 sibling, 1 reply; 68+ messages in thread
From: Alexander Graf @ 2013-08-28 22:00 UTC (permalink / raw)
To: Paul Mackerras; +Cc: Benjamin Herrenschmidt, kvm-ppc, kvm
On 06.08.2013, at 06:16, Paul Mackerras wrote:
> Currently PR-style KVM keeps the volatile guest register values
> (R0 - R13, CR, LR, CTR, XER, PC) in a shadow_vcpu struct rather than
> the main kvm_vcpu struct. For 64-bit, the shadow_vcpu exists in two
> places, a kmalloc'd struct and in the PACA, and it gets copied back
> and forth in kvmppc_core_vcpu_load/put(), because the real-mode code
> can't rely on being able to access the kmalloc'd struct.
>
> This changes the code to copy the volatile values into the shadow_vcpu
> as one of the last things done before entering the guest. Similarly
> the values are copied back out of the shadow_vcpu to the kvm_vcpu
> immediately after exiting the guest. We arrange for interrupts to be
> still disabled at this point so that we can't get preempted on 64-bit
> and end up copying values from the wrong PACA.
>
> This means that the accessor functions in kvm_book3s.h for these
> registers are greatly simplified, and are the same between PR and HV KVM.
> In places where accesses to shadow_vcpu fields are now replaced by
> accesses to the kvm_vcpu, we can also remove the svcpu_get/put pairs.
> Finally, on 64-bit, we don't need the kmalloc'd struct at all any more.
>
> With this, the time to read the PVR one million times in a loop went
> from 582.1ms to 584.3ms (averages of 10 values), a difference which is
> not statistically significant given the variability of the results
> (the standard deviations were 9.5ms and 8.6ms respectively). A version
> of the patch that used loops to copy the GPR values increased that time
> by around 5% to 611.2ms, so the loop has been unrolled.
>
> Signed-off-by: Paul Mackerras <paulus@samba.org>
> ---
> arch/powerpc/include/asm/kvm_book3s.h | 220 +++++-------------------------
> arch/powerpc/include/asm/kvm_book3s_asm.h | 6 +-
> arch/powerpc/include/asm/kvm_host.h | 1 +
> arch/powerpc/kernel/asm-offsets.c | 4 +-
> arch/powerpc/kvm/book3s_emulate.c | 8 +-
> arch/powerpc/kvm/book3s_interrupts.S | 26 +++-
> arch/powerpc/kvm/book3s_pr.c | 122 ++++++++++++-----
> arch/powerpc/kvm/book3s_rmhandlers.S | 5 -
> arch/powerpc/kvm/trace.h | 7 +-
> 9 files changed, 156 insertions(+), 243 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h
> index fa19e2f..a8897c1 100644
> --- a/arch/powerpc/include/asm/kvm_book3s.h
> +++ b/arch/powerpc/include/asm/kvm_book3s.h
> @@ -198,140 +198,76 @@ extern void kvm_return_point(void);
> #include <asm/kvm_book3s_64.h>
> #endif
>
> -#ifdef CONFIG_KVM_BOOK3S_PR
> -
> -static inline unsigned long kvmppc_interrupt_offset(struct kvm_vcpu *vcpu)
> -{
> - return to_book3s(vcpu)->hior;
> -}
> -
> -static inline void kvmppc_update_int_pending(struct kvm_vcpu *vcpu,
> - unsigned long pending_now, unsigned long old_pending)
> -{
> - if (pending_now)
> - vcpu->arch.shared->int_pending = 1;
> - else if (old_pending)
> - vcpu->arch.shared->int_pending = 0;
> -}
> -
> static inline void kvmppc_set_gpr(struct kvm_vcpu *vcpu, int num, ulong val)
> {
> - if ( num < 14 ) {
> - struct kvmppc_book3s_shadow_vcpu *svcpu = svcpu_get(vcpu);
> - svcpu->gpr[num] = val;
> - svcpu_put(svcpu);
> - to_book3s(vcpu)->shadow_vcpu->gpr[num] = val;
> - } else
> - vcpu->arch.gpr[num] = val;
> + vcpu->arch.gpr[num] = val;
> }
>
> static inline ulong kvmppc_get_gpr(struct kvm_vcpu *vcpu, int num)
> {
> - if ( num < 14 ) {
> - struct kvmppc_book3s_shadow_vcpu *svcpu = svcpu_get(vcpu);
> - ulong r = svcpu->gpr[num];
> - svcpu_put(svcpu);
> - return r;
> - } else
> - return vcpu->arch.gpr[num];
> + return vcpu->arch.gpr[num];
> }
>
> static inline void kvmppc_set_cr(struct kvm_vcpu *vcpu, u32 val)
> {
> - struct kvmppc_book3s_shadow_vcpu *svcpu = svcpu_get(vcpu);
> - svcpu->cr = val;
> - svcpu_put(svcpu);
> - to_book3s(vcpu)->shadow_vcpu->cr = val;
> + vcpu->arch.cr = val;
> }
>
> static inline u32 kvmppc_get_cr(struct kvm_vcpu *vcpu)
> {
> - struct kvmppc_book3s_shadow_vcpu *svcpu = svcpu_get(vcpu);
> - u32 r;
> - r = svcpu->cr;
> - svcpu_put(svcpu);
> - return r;
> + return vcpu->arch.cr;
> }
>
> static inline void kvmppc_set_xer(struct kvm_vcpu *vcpu, u32 val)
> {
> - struct kvmppc_book3s_shadow_vcpu *svcpu = svcpu_get(vcpu);
> - svcpu->xer = val;
> - to_book3s(vcpu)->shadow_vcpu->xer = val;
> - svcpu_put(svcpu);
> + vcpu->arch.xer = val;
> }
>
> static inline u32 kvmppc_get_xer(struct kvm_vcpu *vcpu)
> {
> - struct kvmppc_book3s_shadow_vcpu *svcpu = svcpu_get(vcpu);
> - u32 r;
> - r = svcpu->xer;
> - svcpu_put(svcpu);
> - return r;
> + return vcpu->arch.xer;
> }
>
> static inline void kvmppc_set_ctr(struct kvm_vcpu *vcpu, ulong val)
> {
> - struct kvmppc_book3s_shadow_vcpu *svcpu = svcpu_get(vcpu);
> - svcpu->ctr = val;
> - svcpu_put(svcpu);
> + vcpu->arch.ctr = val;
> }
>
> static inline ulong kvmppc_get_ctr(struct kvm_vcpu *vcpu)
> {
> - struct kvmppc_book3s_shadow_vcpu *svcpu = svcpu_get(vcpu);
> - ulong r;
> - r = svcpu->ctr;
> - svcpu_put(svcpu);
> - return r;
> + return vcpu->arch.ctr;
> }
>
> static inline void kvmppc_set_lr(struct kvm_vcpu *vcpu, ulong val)
> {
> - struct kvmppc_book3s_shadow_vcpu *svcpu = svcpu_get(vcpu);
> - svcpu->lr = val;
> - svcpu_put(svcpu);
> + vcpu->arch.lr = val;
> }
>
> static inline ulong kvmppc_get_lr(struct kvm_vcpu *vcpu)
> {
> - struct kvmppc_book3s_shadow_vcpu *svcpu = svcpu_get(vcpu);
> - ulong r;
> - r = svcpu->lr;
> - svcpu_put(svcpu);
> - return r;
> + return vcpu->arch.lr;
> }
>
> static inline void kvmppc_set_pc(struct kvm_vcpu *vcpu, ulong val)
> {
> - struct kvmppc_book3s_shadow_vcpu *svcpu = svcpu_get(vcpu);
> - svcpu->pc = val;
> - svcpu_put(svcpu);
> + vcpu->arch.pc = val;
> }
>
> static inline ulong kvmppc_get_pc(struct kvm_vcpu *vcpu)
> {
> - struct kvmppc_book3s_shadow_vcpu *svcpu = svcpu_get(vcpu);
> - ulong r;
> - r = svcpu->pc;
> - svcpu_put(svcpu);
> - return r;
> + return vcpu->arch.pc;
> }
>
> static inline u32 kvmppc_get_last_inst(struct kvm_vcpu *vcpu)
> {
> ulong pc = kvmppc_get_pc(vcpu);
> - struct kvmppc_book3s_shadow_vcpu *svcpu = svcpu_get(vcpu);
> - u32 r;
>
> /* Load the instruction manually if it failed to do so in the
> * exit path */
> - if (svcpu->last_inst == KVM_INST_FETCH_FAILED)
> - kvmppc_ld(vcpu, &pc, sizeof(u32), &svcpu->last_inst, false);
> + if (vcpu->arch.last_inst == KVM_INST_FETCH_FAILED)
> + kvmppc_ld(vcpu, &pc, sizeof(u32), &vcpu->arch.last_inst, false);
>
> - r = svcpu->last_inst;
> - svcpu_put(svcpu);
> - return r;
> + return vcpu->arch.last_inst;
> }
>
> /*
> @@ -342,26 +278,34 @@ static inline u32 kvmppc_get_last_inst(struct kvm_vcpu *vcpu)
> static inline u32 kvmppc_get_last_sc(struct kvm_vcpu *vcpu)
> {
> ulong pc = kvmppc_get_pc(vcpu) - 4;
> - struct kvmppc_book3s_shadow_vcpu *svcpu = svcpu_get(vcpu);
> - u32 r;
>
> /* Load the instruction manually if it failed to do so in the
> * exit path */
> - if (svcpu->last_inst == KVM_INST_FETCH_FAILED)
> - kvmppc_ld(vcpu, &pc, sizeof(u32), &svcpu->last_inst, false);
> + if (vcpu->arch.last_inst == KVM_INST_FETCH_FAILED)
> + kvmppc_ld(vcpu, &pc, sizeof(u32), &vcpu->arch.last_inst, false);
>
> - r = svcpu->last_inst;
> - svcpu_put(svcpu);
> - return r;
> + return vcpu->arch.last_inst;
> }
>
> static inline ulong kvmppc_get_fault_dar(struct kvm_vcpu *vcpu)
> {
> - struct kvmppc_book3s_shadow_vcpu *svcpu = svcpu_get(vcpu);
> - ulong r;
> - r = svcpu->fault_dar;
> - svcpu_put(svcpu);
> - return r;
> + return vcpu->arch.fault_dar;
> +}
> +
> +#ifdef CONFIG_KVM_BOOK3S_PR
> +
> +static inline unsigned long kvmppc_interrupt_offset(struct kvm_vcpu *vcpu)
> +{
> + return to_book3s(vcpu)->hior;
> +}
> +
> +static inline void kvmppc_update_int_pending(struct kvm_vcpu *vcpu,
> + unsigned long pending_now, unsigned long old_pending)
> +{
> + if (pending_now)
> + vcpu->arch.shared->int_pending = 1;
> + else if (old_pending)
> + vcpu->arch.shared->int_pending = 0;
> }
>
> static inline bool kvmppc_critical_section(struct kvm_vcpu *vcpu)
> @@ -395,100 +339,6 @@ static inline void kvmppc_update_int_pending(struct kvm_vcpu *vcpu,
> {
> }
>
> -static inline void kvmppc_set_gpr(struct kvm_vcpu *vcpu, int num, ulong val)
> -{
> - vcpu->arch.gpr[num] = val;
> -}
> -
> -static inline ulong kvmppc_get_gpr(struct kvm_vcpu *vcpu, int num)
> -{
> - return vcpu->arch.gpr[num];
> -}
> -
> -static inline void kvmppc_set_cr(struct kvm_vcpu *vcpu, u32 val)
> -{
> - vcpu->arch.cr = val;
> -}
> -
> -static inline u32 kvmppc_get_cr(struct kvm_vcpu *vcpu)
> -{
> - return vcpu->arch.cr;
> -}
> -
> -static inline void kvmppc_set_xer(struct kvm_vcpu *vcpu, u32 val)
> -{
> - vcpu->arch.xer = val;
> -}
> -
> -static inline u32 kvmppc_get_xer(struct kvm_vcpu *vcpu)
> -{
> - return vcpu->arch.xer;
> -}
> -
> -static inline void kvmppc_set_ctr(struct kvm_vcpu *vcpu, ulong val)
> -{
> - vcpu->arch.ctr = val;
> -}
> -
> -static inline ulong kvmppc_get_ctr(struct kvm_vcpu *vcpu)
> -{
> - return vcpu->arch.ctr;
> -}
> -
> -static inline void kvmppc_set_lr(struct kvm_vcpu *vcpu, ulong val)
> -{
> - vcpu->arch.lr = val;
> -}
> -
> -static inline ulong kvmppc_get_lr(struct kvm_vcpu *vcpu)
> -{
> - return vcpu->arch.lr;
> -}
> -
> -static inline void kvmppc_set_pc(struct kvm_vcpu *vcpu, ulong val)
> -{
> - vcpu->arch.pc = val;
> -}
> -
> -static inline ulong kvmppc_get_pc(struct kvm_vcpu *vcpu)
> -{
> - return vcpu->arch.pc;
> -}
> -
> -static inline u32 kvmppc_get_last_inst(struct kvm_vcpu *vcpu)
> -{
> - ulong pc = kvmppc_get_pc(vcpu);
> -
> - /* Load the instruction manually if it failed to do so in the
> - * exit path */
> - if (vcpu->arch.last_inst == KVM_INST_FETCH_FAILED)
> - kvmppc_ld(vcpu, &pc, sizeof(u32), &vcpu->arch.last_inst, false);
> -
> - return vcpu->arch.last_inst;
> -}
> -
> -/*
> - * Like kvmppc_get_last_inst(), but for fetching a sc instruction.
> - * Because the sc instruction sets SRR0 to point to the following
> - * instruction, we have to fetch from pc - 4.
> - */
> -static inline u32 kvmppc_get_last_sc(struct kvm_vcpu *vcpu)
> -{
> - ulong pc = kvmppc_get_pc(vcpu) - 4;
> -
> - /* Load the instruction manually if it failed to do so in the
> - * exit path */
> - if (vcpu->arch.last_inst == KVM_INST_FETCH_FAILED)
> - kvmppc_ld(vcpu, &pc, sizeof(u32), &vcpu->arch.last_inst, false);
> -
> - return vcpu->arch.last_inst;
> -}
> -
> -static inline ulong kvmppc_get_fault_dar(struct kvm_vcpu *vcpu)
> -{
> - return vcpu->arch.fault_dar;
> -}
> -
> static inline bool kvmppc_critical_section(struct kvm_vcpu *vcpu)
> {
> return false;
> diff --git a/arch/powerpc/include/asm/kvm_book3s_asm.h b/arch/powerpc/include/asm/kvm_book3s_asm.h
> index 9039d3c..4141409 100644
> --- a/arch/powerpc/include/asm/kvm_book3s_asm.h
> +++ b/arch/powerpc/include/asm/kvm_book3s_asm.h
> @@ -108,14 +108,14 @@ struct kvmppc_book3s_shadow_vcpu {
> ulong gpr[14];
> u32 cr;
> u32 xer;
> -
> - u32 fault_dsisr;
> - u32 last_inst;
> ulong ctr;
> ulong lr;
> ulong pc;
> +
> ulong shadow_srr1;
> ulong fault_dar;
> + u32 fault_dsisr;
> + u32 last_inst;
>
> #ifdef CONFIG_PPC_BOOK3S_32
> u32 sr[16]; /* Guest SRs */
> diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
> index 3328353..7b26395 100644
> --- a/arch/powerpc/include/asm/kvm_host.h
> +++ b/arch/powerpc/include/asm/kvm_host.h
> @@ -463,6 +463,7 @@ struct kvm_vcpu_arch {
> u32 ctrl;
> ulong dabr;
> ulong cfar;
> + ulong shadow_srr1;
> #endif
> u32 vrsave; /* also USPRG0 */
> u32 mmucr;
> diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
> index a67c76e..14a8004 100644
> --- a/arch/powerpc/kernel/asm-offsets.c
> +++ b/arch/powerpc/kernel/asm-offsets.c
> @@ -515,18 +515,18 @@ int main(void)
> DEFINE(VCPU_TRAP, offsetof(struct kvm_vcpu, arch.trap));
> DEFINE(VCPU_PTID, offsetof(struct kvm_vcpu, arch.ptid));
> DEFINE(VCPU_CFAR, offsetof(struct kvm_vcpu, arch.cfar));
> + DEFINE(VCPU_SHADOW_SRR1, offsetof(struct kvm_vcpu, arch.shadow_srr1));
> DEFINE(VCORE_ENTRY_EXIT, offsetof(struct kvmppc_vcore, entry_exit_count));
> DEFINE(VCORE_NAP_COUNT, offsetof(struct kvmppc_vcore, nap_count));
> DEFINE(VCORE_IN_GUEST, offsetof(struct kvmppc_vcore, in_guest));
> DEFINE(VCORE_NAPPING_THREADS, offsetof(struct kvmppc_vcore, napping_threads));
> - DEFINE(VCPU_SVCPU, offsetof(struct kvmppc_vcpu_book3s, shadow_vcpu) -
> - offsetof(struct kvmppc_vcpu_book3s, vcpu));
> DEFINE(VCPU_SLB_E, offsetof(struct kvmppc_slb, orige));
> DEFINE(VCPU_SLB_V, offsetof(struct kvmppc_slb, origv));
> DEFINE(VCPU_SLB_SIZE, sizeof(struct kvmppc_slb));
>
> #ifdef CONFIG_PPC_BOOK3S_64
> #ifdef CONFIG_KVM_BOOK3S_PR
> + DEFINE(PACA_SVCPU, offsetof(struct paca_struct, shadow_vcpu));
> # define SVCPU_FIELD(x, f) DEFINE(x, offsetof(struct paca_struct, shadow_vcpu.f))
> #else
> # define SVCPU_FIELD(x, f)
> diff --git a/arch/powerpc/kvm/book3s_emulate.c b/arch/powerpc/kvm/book3s_emulate.c
> index 360ce68..34044b1 100644
> --- a/arch/powerpc/kvm/book3s_emulate.c
> +++ b/arch/powerpc/kvm/book3s_emulate.c
> @@ -267,12 +267,9 @@ int kvmppc_core_emulate_op(struct kvm_run *run, struct kvm_vcpu *vcpu,
>
> r = kvmppc_st(vcpu, &addr, 32, zeros, true);
> if ((r == -ENOENT) || (r == -EPERM)) {
> - struct kvmppc_book3s_shadow_vcpu *svcpu;
> -
> - svcpu = svcpu_get(vcpu);
> *advance = 0;
> vcpu->arch.shared->dar = vaddr;
> - svcpu->fault_dar = vaddr;
> + vcpu->arch.fault_dar = vaddr;
>
> dsisr = DSISR_ISSTORE;
> if (r == -ENOENT)
> @@ -281,8 +278,7 @@ int kvmppc_core_emulate_op(struct kvm_run *run, struct kvm_vcpu *vcpu,
> dsisr |= DSISR_PROTFAULT;
>
> vcpu->arch.shared->dsisr = dsisr;
> - svcpu->fault_dsisr = dsisr;
> - svcpu_put(svcpu);
> + vcpu->arch.fault_dsisr = dsisr;
>
> kvmppc_book3s_queue_irqprio(vcpu,
> BOOK3S_INTERRUPT_DATA_STORAGE);
> diff --git a/arch/powerpc/kvm/book3s_interrupts.S b/arch/powerpc/kvm/book3s_interrupts.S
> index 17cfae5..c81a185 100644
> --- a/arch/powerpc/kvm/book3s_interrupts.S
> +++ b/arch/powerpc/kvm/book3s_interrupts.S
> @@ -26,8 +26,12 @@
>
> #if defined(CONFIG_PPC_BOOK3S_64)
> #define FUNC(name) GLUE(.,name)
> +#define GET_SHADOW_VCPU(reg) addi reg, r13, PACA_SVCPU
> +
> #elif defined(CONFIG_PPC_BOOK3S_32)
> #define FUNC(name) name
> +#define GET_SHADOW_VCPU(reg) lwz reg, (THREAD + THREAD_KVM_SVCPU)(r2)
> +
> #endif /* CONFIG_PPC_BOOK3S_XX */
>
> #define VCPU_LOAD_NVGPRS(vcpu) \
> @@ -87,8 +91,13 @@ kvm_start_entry:
> VCPU_LOAD_NVGPRS(r4)
>
> kvm_start_lightweight:
> + /* Copy registers into shadow vcpu so we can access them in real mode */
> + GET_SHADOW_VCPU(r3)
> + bl FUNC(kvmppc_copy_to_svcpu)
This will clobber r3 and r4, no? We need to restore them from the stack here, I would think.
> + nop
>
> #ifdef CONFIG_PPC_BOOK3S_64
> + /* Get the dcbz32 flag */
> PPC_LL r3, VCPU_HFLAGS(r4)
> rldicl r3, r3, 0, 63 /* r3 &= 1 */
> stb r3, HSTATE_RESTORE_HID5(r13)
> @@ -125,8 +134,17 @@ kvmppc_handler_highmem:
> *
> */
>
> - /* R7 = vcpu */
> - PPC_LL r7, GPR4(r1)
> + /* Transfer reg values from shadow vcpu back to vcpu struct */
> + /* On 64-bit, interrupts are still off at this point */
> + PPC_LL r3, GPR4(r1) /* vcpu pointer */
> + GET_SHADOW_VCPU(r4)
> + bl FUNC(kvmppc_copy_from_svcpu)
> + nop
> +
> + /* Re-enable interrupts */
> + mfmsr r3
> + ori r3, r3, MSR_EE
> + MTMSR_EERI(r3)
>
> #ifdef CONFIG_PPC_BOOK3S_64
> /*
> @@ -135,8 +153,12 @@ kvmppc_handler_highmem:
> */
> ld r3, PACA_SPRG3(r13)
> mtspr SPRN_SPRG3, r3
> +
> #endif /* CONFIG_PPC_BOOK3S_64 */
>
> + /* R7 = vcpu */
> + PPC_LL r7, GPR4(r1)
> +
> PPC_STL r14, VCPU_GPR(R14)(r7)
> PPC_STL r15, VCPU_GPR(R15)(r7)
> PPC_STL r16, VCPU_GPR(R16)(r7)
> diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
> index 6cb29ef..28146c1 100644
> --- a/arch/powerpc/kvm/book3s_pr.c
> +++ b/arch/powerpc/kvm/book3s_pr.c
> @@ -61,8 +61,6 @@ void kvmppc_core_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
> #ifdef CONFIG_PPC_BOOK3S_64
> struct kvmppc_book3s_shadow_vcpu *svcpu = svcpu_get(vcpu);
> memcpy(svcpu->slb, to_book3s(vcpu)->slb_shadow, sizeof(svcpu->slb));
> - memcpy(&get_paca()->shadow_vcpu, to_book3s(vcpu)->shadow_vcpu,
> - sizeof(get_paca()->shadow_vcpu));
> svcpu->slb_max = to_book3s(vcpu)->slb_shadow_max;
> svcpu_put(svcpu);
> #endif
> @@ -77,8 +75,6 @@ void kvmppc_core_vcpu_put(struct kvm_vcpu *vcpu)
> #ifdef CONFIG_PPC_BOOK3S_64
> struct kvmppc_book3s_shadow_vcpu *svcpu = svcpu_get(vcpu);
> memcpy(to_book3s(vcpu)->slb_shadow, svcpu->slb, sizeof(svcpu->slb));
> - memcpy(to_book3s(vcpu)->shadow_vcpu, &get_paca()->shadow_vcpu,
> - sizeof(get_paca()->shadow_vcpu));
> to_book3s(vcpu)->slb_shadow_max = svcpu->slb_max;
> svcpu_put(svcpu);
> #endif
> @@ -87,6 +83,60 @@ void kvmppc_core_vcpu_put(struct kvm_vcpu *vcpu)
> vcpu->cpu = -1;
> }
>
> +/* Copy data needed by real-mode code from vcpu to shadow vcpu */
> +void kvmppc_copy_to_svcpu(struct kvmppc_book3s_shadow_vcpu *svcpu,
> + struct kvm_vcpu *vcpu)
> +{
> + svcpu->gpr[0] = vcpu->arch.gpr[0];
> + svcpu->gpr[1] = vcpu->arch.gpr[1];
> + svcpu->gpr[2] = vcpu->arch.gpr[2];
> + svcpu->gpr[3] = vcpu->arch.gpr[3];
> + svcpu->gpr[4] = vcpu->arch.gpr[4];
> + svcpu->gpr[5] = vcpu->arch.gpr[5];
> + svcpu->gpr[6] = vcpu->arch.gpr[6];
> + svcpu->gpr[7] = vcpu->arch.gpr[7];
> + svcpu->gpr[8] = vcpu->arch.gpr[8];
> + svcpu->gpr[9] = vcpu->arch.gpr[9];
> + svcpu->gpr[10] = vcpu->arch.gpr[10];
> + svcpu->gpr[11] = vcpu->arch.gpr[11];
> + svcpu->gpr[12] = vcpu->arch.gpr[12];
> + svcpu->gpr[13] = vcpu->arch.gpr[13];
> + svcpu->cr = vcpu->arch.cr;
> + svcpu->xer = vcpu->arch.xer;
> + svcpu->ctr = vcpu->arch.ctr;
> + svcpu->lr = vcpu->arch.lr;
> + svcpu->pc = vcpu->arch.pc;
> +}
> +
> +/* Copy data touched by real-mode code from shadow vcpu back to vcpu */
> +void kvmppc_copy_from_svcpu(struct kvm_vcpu *vcpu,
> + struct kvmppc_book3s_shadow_vcpu *svcpu)
> +{
> + vcpu->arch.gpr[0] = svcpu->gpr[0];
> + vcpu->arch.gpr[1] = svcpu->gpr[1];
> + vcpu->arch.gpr[2] = svcpu->gpr[2];
> + vcpu->arch.gpr[3] = svcpu->gpr[3];
> + vcpu->arch.gpr[4] = svcpu->gpr[4];
> + vcpu->arch.gpr[5] = svcpu->gpr[5];
> + vcpu->arch.gpr[6] = svcpu->gpr[6];
> + vcpu->arch.gpr[7] = svcpu->gpr[7];
> + vcpu->arch.gpr[8] = svcpu->gpr[8];
> + vcpu->arch.gpr[9] = svcpu->gpr[9];
> + vcpu->arch.gpr[10] = svcpu->gpr[10];
> + vcpu->arch.gpr[11] = svcpu->gpr[11];
> + vcpu->arch.gpr[12] = svcpu->gpr[12];
> + vcpu->arch.gpr[13] = svcpu->gpr[13];
> + vcpu->arch.cr = svcpu->cr;
> + vcpu->arch.xer = svcpu->xer;
> + vcpu->arch.ctr = svcpu->ctr;
> + vcpu->arch.lr = svcpu->lr;
> + vcpu->arch.pc = svcpu->pc;
> + vcpu->arch.shadow_srr1 = svcpu->shadow_srr1;
> + vcpu->arch.fault_dar = svcpu->fault_dar;
> + vcpu->arch.fault_dsisr = svcpu->fault_dsisr;
> + vcpu->arch.last_inst = svcpu->last_inst;
> +}
> +
> int kvmppc_core_check_requests(struct kvm_vcpu *vcpu)
> {
> int r = 1; /* Indicate we want to get back into the guest */
> @@ -388,22 +438,18 @@ int kvmppc_handle_pagefault(struct kvm_run *run, struct kvm_vcpu *vcpu,
>
> if (page_found == -ENOENT) {
> /* Page not found in guest PTE entries */
> - struct kvmppc_book3s_shadow_vcpu *svcpu = svcpu_get(vcpu);
> vcpu->arch.shared->dar = kvmppc_get_fault_dar(vcpu);
> - vcpu->arch.shared->dsisr = svcpu->fault_dsisr;
> + vcpu->arch.shared->dsisr = vcpu->arch.fault_dsisr;
> vcpu->arch.shared->msr |=
> - (svcpu->shadow_srr1 & 0x00000000f8000000ULL);
> - svcpu_put(svcpu);
> + vcpu->arch.shadow_srr1 & 0x00000000f8000000ULL;
> kvmppc_book3s_queue_irqprio(vcpu, vec);
> } else if (page_found == -EPERM) {
> /* Storage protection */
> - struct kvmppc_book3s_shadow_vcpu *svcpu = svcpu_get(vcpu);
> vcpu->arch.shared->dar = kvmppc_get_fault_dar(vcpu);
> - vcpu->arch.shared->dsisr = svcpu->fault_dsisr & ~DSISR_NOHPTE;
> + vcpu->arch.shared->dsisr = vcpu->arch.fault_dsisr & ~DSISR_NOHPTE;
> vcpu->arch.shared->dsisr |= DSISR_PROTFAULT;
> vcpu->arch.shared->msr |=
> - svcpu->shadow_srr1 & 0x00000000f8000000ULL;
> - svcpu_put(svcpu);
> + vcpu->arch.shadow_srr1 & 0x00000000f8000000ULL;
> kvmppc_book3s_queue_irqprio(vcpu, vec);
> } else if (page_found == -EINVAL) {
> /* Page not found in guest SLB */
> @@ -643,21 +689,26 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
> switch (exit_nr) {
> case BOOK3S_INTERRUPT_INST_STORAGE:
> {
> - struct kvmppc_book3s_shadow_vcpu *svcpu = svcpu_get(vcpu);
> - ulong shadow_srr1 = svcpu->shadow_srr1;
> + ulong shadow_srr1 = vcpu->arch.shadow_srr1;
> vcpu->stat.pf_instruc++;
>
> #ifdef CONFIG_PPC_BOOK3S_32
> /* We set segments as unused segments when invalidating them. So
> * treat the respective fault as segment fault. */
> - if (svcpu->sr[kvmppc_get_pc(vcpu) >> SID_SHIFT] == SR_INVALID) {
> - kvmppc_mmu_map_segment(vcpu, kvmppc_get_pc(vcpu));
> - r = RESUME_GUEST;
> + {
> + struct kvmppc_book3s_shadow_vcpu *svcpu;
> + u32 sr;
> +
> + svcpu = svcpu_get(vcpu);
> + sr = svcpu->sr[kvmppc_get_pc(vcpu) >> SID_SHIFT];
Doesn't this break two concurrently running guests now that we don't copy the shadow vcpu anymore? Just move the sr array to a kmalloc'ed area until the whole vcpu is kmalloc'ed. Then you can get rid of all shadow vcpu code.
> svcpu_put(svcpu);
> - break;
> + if (sr == SR_INVALID) {
> + kvmppc_mmu_map_segment(vcpu, kvmppc_get_pc(vcpu));
> + r = RESUME_GUEST;
> + break;
> + }
> }
> #endif
> - svcpu_put(svcpu);
>
> /* only care about PTEG not found errors, but leave NX alone */
> if (shadow_srr1 & 0x40000000) {
> @@ -682,21 +733,26 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
> case BOOK3S_INTERRUPT_DATA_STORAGE:
> {
> ulong dar = kvmppc_get_fault_dar(vcpu);
> - struct kvmppc_book3s_shadow_vcpu *svcpu = svcpu_get(vcpu);
> - u32 fault_dsisr = svcpu->fault_dsisr;
> + u32 fault_dsisr = vcpu->arch.fault_dsisr;
> vcpu->stat.pf_storage++;
>
> #ifdef CONFIG_PPC_BOOK3S_32
> /* We set segments as unused segments when invalidating them. So
> * treat the respective fault as segment fault. */
> - if ((svcpu->sr[dar >> SID_SHIFT]) == SR_INVALID) {
> - kvmppc_mmu_map_segment(vcpu, dar);
> - r = RESUME_GUEST;
> + {
> + struct kvmppc_book3s_shadow_vcpu *svcpu;
> + u32 sr;
> +
> + svcpu = svcpu_get(vcpu);
> + sr = svcpu->sr[dar >> SID_SHIFT];
> svcpu_put(svcpu);
> - break;
> + if (sr == SR_INVALID) {
> + kvmppc_mmu_map_segment(vcpu, dar);
> + r = RESUME_GUEST;
> + break;
> + }
> }
> #endif
> - svcpu_put(svcpu);
>
> /* The only case we need to handle is missing shadow PTEs */
> if (fault_dsisr & DSISR_NOHPTE) {
> @@ -743,13 +799,10 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
> case BOOK3S_INTERRUPT_H_EMUL_ASSIST:
> {
> enum emulation_result er;
> - struct kvmppc_book3s_shadow_vcpu *svcpu;
> ulong flags;
>
> program_interrupt:
> - svcpu = svcpu_get(vcpu);
> - flags = svcpu->shadow_srr1 & 0x1f0000ull;
> - svcpu_put(svcpu);
> + flags = vcpu->arch.shadow_srr1 & 0x1f0000ull;
>
> if (vcpu->arch.shared->msr & MSR_PR) {
> #ifdef EXIT_DEBUG
> @@ -881,9 +934,7 @@ program_interrupt:
> break;
> default:
> {
> - struct kvmppc_book3s_shadow_vcpu *svcpu = svcpu_get(vcpu);
> - ulong shadow_srr1 = svcpu->shadow_srr1;
> - svcpu_put(svcpu);
> + ulong shadow_srr1 = vcpu->arch.shadow_srr1;
> /* Ugh - bork here! What did we get? */
> printk(KERN_EMERG "exit_nr=0x%x | pc=0x%lx | msr=0x%lx\n",
> exit_nr, kvmppc_get_pc(vcpu), shadow_srr1);
> @@ -1058,11 +1109,12 @@ struct kvm_vcpu *kvmppc_core_vcpu_create(struct kvm *kvm, unsigned int id)
> if (!vcpu_book3s)
> goto out;
>
> +#ifdef CONFIG_KVM_BOOK3S_32
> vcpu_book3s->shadow_vcpu =
> kzalloc(sizeof(*vcpu_book3s->shadow_vcpu), GFP_KERNEL);
> if (!vcpu_book3s->shadow_vcpu)
> goto free_vcpu;
> -
> +#endif
> vcpu = &vcpu_book3s->vcpu;
> err = kvm_vcpu_init(vcpu, kvm, id);
> if (err)
> @@ -1095,8 +1147,10 @@ struct kvm_vcpu *kvmppc_core_vcpu_create(struct kvm *kvm, unsigned int id)
> uninit_vcpu:
> kvm_vcpu_uninit(vcpu);
> free_shadow_vcpu:
> +#ifdef CONFIG_KVM_BOOK3S_32
> kfree(vcpu_book3s->shadow_vcpu);
> free_vcpu:
> +#endif
> vfree(vcpu_book3s);
> out:
> return ERR_PTR(err);
> diff --git a/arch/powerpc/kvm/book3s_rmhandlers.S b/arch/powerpc/kvm/book3s_rmhandlers.S
> index 8f7633e..b64d7f9 100644
> --- a/arch/powerpc/kvm/book3s_rmhandlers.S
> +++ b/arch/powerpc/kvm/book3s_rmhandlers.S
> @@ -179,11 +179,6 @@ _GLOBAL(kvmppc_entry_trampoline)
>
> li r6, MSR_IR | MSR_DR
> andc r6, r5, r6 /* Clear DR and IR in MSR value */
> - /*
> - * Set EE in HOST_MSR so that it's enabled when we get into our
> - * C exit handler function
> - */
> - ori r5, r5, MSR_EE
This looks like an unrelated change?
Alex
> mtsrr0 r7
> mtsrr1 r6
> RFI
> diff --git a/arch/powerpc/kvm/trace.h b/arch/powerpc/kvm/trace.h
> index e326489..a088e9a 100644
> --- a/arch/powerpc/kvm/trace.h
> +++ b/arch/powerpc/kvm/trace.h
> @@ -101,17 +101,12 @@ TRACE_EVENT(kvm_exit,
> ),
>
> TP_fast_assign(
> -#ifdef CONFIG_KVM_BOOK3S_PR
> - struct kvmppc_book3s_shadow_vcpu *svcpu;
> -#endif
> __entry->exit_nr = exit_nr;
> __entry->pc = kvmppc_get_pc(vcpu);
> __entry->dar = kvmppc_get_fault_dar(vcpu);
> __entry->msr = vcpu->arch.shared->msr;
> #ifdef CONFIG_KVM_BOOK3S_PR
> - svcpu = svcpu_get(vcpu);
> - __entry->srr1 = svcpu->shadow_srr1;
> - svcpu_put(svcpu);
> + __entry->srr1 = vcpu->arch.shadow_srr1;
> #endif
> __entry->last_inst = vcpu->arch.last_inst;
> ),
> --
> 1.8.3.1
>
^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: [PATCH 05/23] KVM: PPC: Book3S PR: Rework kvmppc_mmu_book3s_64_xlate()
2013-08-06 4:18 ` [PATCH 05/23] KVM: PPC: Book3S PR: Rework kvmppc_mmu_book3s_64_xlate() Paul Mackerras
@ 2013-08-28 22:51 ` Alexander Graf
0 siblings, 0 replies; 68+ messages in thread
From: Alexander Graf @ 2013-08-28 22:51 UTC (permalink / raw)
To: Paul Mackerras; +Cc: Benjamin Herrenschmidt, kvm-ppc, kvm
On 06.08.2013, at 06:18, Paul Mackerras wrote:
> This reworks kvmppc_mmu_book3s_64_xlate() to make it check the large
> page bit in the hashed page table entries (HPTEs) it looks at, and
> to simplify and streamline the code. The checking of the first dword
> of each HPTE is now done with a single mask and compare operation,
> and all the code dealing with the matching HPTE, if we find one,
> is consolidated in one place in the main line of the function flow.
>
> Signed-off-by: Paul Mackerras <paulus@samba.org>
Thanks, applied to kvm-ppc-queue.
Alex
^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: [PATCH 03/23] KVM: PPC: Book3S PR: Make instruction fetch fallback work for system calls
2013-08-06 4:15 ` [PATCH 03/23] KVM: PPC: Book3S PR: Make instruction fetch fallback work for system calls Paul Mackerras
@ 2013-08-28 22:51 ` Alexander Graf
0 siblings, 0 replies; 68+ messages in thread
From: Alexander Graf @ 2013-08-28 22:51 UTC (permalink / raw)
To: Paul Mackerras; +Cc: Benjamin Herrenschmidt, kvm-ppc, kvm
On 06.08.2013, at 06:15, Paul Mackerras wrote:
> It turns out that if we exit the guest due to an hcall instruction (sc 1),
> and the loading of the instruction in the guest exit path fails for any
> reason, the call to kvmppc_ld() in kvmppc_get_last_inst() fetches the
> instruction after the hcall instruction rather than the hcall itself.
> This in turn means that the instruction doesn't get recognized as an
> hcall in kvmppc_handle_exit_pr() but gets passed to the guest kernel
> as a sc instruction. That usually results in the guest kernel getting
> a return code of 38 (ENOSYS) from an hcall, which often triggers a
> BUG_ON() or other failure.
>
> This fixes the problem by adding a new variant of kvmppc_get_last_inst()
> called kvmppc_get_last_sc(), which fetches the instruction if necessary
> from pc - 4 rather than pc.
>
> Signed-off-by: Paul Mackerras <paulus@samba.org>
Thanks, applied to kvm-ppc-queue.
Alex
^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: [PATCH 02/23] KVM: PPC: Book3S PR: Don't corrupt guest state when kernel uses VMX
2013-08-06 4:14 ` [PATCH 02/23] KVM: PPC: Book3S PR: Don't corrupt guest state when kernel uses VMX Paul Mackerras
2013-08-08 15:49 ` Aneesh Kumar K.V
@ 2013-08-28 22:51 ` Alexander Graf
1 sibling, 0 replies; 68+ messages in thread
From: Alexander Graf @ 2013-08-28 22:51 UTC (permalink / raw)
To: Paul Mackerras; +Cc: Benjamin Herrenschmidt, kvm-ppc, kvm
On 06.08.2013, at 06:14, Paul Mackerras wrote:
> Currently the code assumes that once we load up guest FP/VSX or VMX
> state into the CPU, it stays valid in the CPU registers until we
> explicitly flush it to the thread_struct. However, on POWER7,
> copy_page() and memcpy() can use VMX. These functions do flush the
> VMX state to the thread_struct before using VMX instructions, but if
> this happens while we have guest state in the VMX registers, and we
> then re-enter the guest, we don't reload the VMX state from the
> thread_struct, leading to guest corruption. This has been observed
> to cause guest processes to segfault.
>
> To fix this, we check before re-entering the guest that all of the
> bits corresponding to facilities owned by the guest, as expressed
> in vcpu->arch.guest_owned_ext, are set in current->thread.regs->msr.
> Any bits that have been cleared correspond to facilities that have
> been used by kernel code and thus flushed to the thread_struct, so
> for them we reload the state from the thread_struct.
>
> We also need to check current->thread.regs->msr before calling
> giveup_fpu() or giveup_altivec(), since if the relevant bit is
> clear, the state has already been flushed to the thread_struct and
> to flush it again would corrupt it.
>
> Signed-off-by: Paul Mackerras <paulus@samba.org>
Thanks, applied to kvm-ppc-queue.
Alex
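The check described in the commit message above can be sketched roughly as
follows. This is illustrative only, not the patch's actual code;
sketch_load_facility() is a hypothetical stand-in for reloading the FP, VMX
or VSX state from the thread_struct into the CPU:

static void sketch_recheck_ext(struct kvm_vcpu *vcpu)
{
        /* Facilities the guest owns but whose MSR bit the kernel has since
         * cleared, i.e. whose state was flushed to the thread_struct. */
        unsigned long lost = vcpu->arch.guest_owned_ext &
                             ~current->thread.regs->msr;

        if (lost & MSR_FP)
                sketch_load_facility(vcpu, MSR_FP);     /* hypothetical helper */
        if (lost & MSR_VEC)
                sketch_load_facility(vcpu, MSR_VEC);
        if (lost & MSR_VSX)
                sketch_load_facility(vcpu, MSR_VSX);
}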
^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: [PATCH 01/23] KVM: PPC: Book3S: Fix compile error in XICS emulation
2013-08-06 4:13 ` [PATCH 01/23] KVM: PPC: Book3S: Fix compile error in XICS emulation Paul Mackerras
@ 2013-08-28 22:51 ` Alexander Graf
0 siblings, 0 replies; 68+ messages in thread
From: Alexander Graf @ 2013-08-28 22:51 UTC (permalink / raw)
To: Paul Mackerras; +Cc: Benjamin Herrenschmidt, kvm-ppc, kvm
On 06.08.2013, at 06:13, Paul Mackerras wrote:
> Commit 8e44ddc3f3 ("powerpc/kvm/book3s: Add support for H_IPOLL and
> H_XIRR_X in XICS emulation") added a call to get_tb() but didn't
> include the header that defines it, and on some configs this means
> book3s_xics.c fails to compile:
>
> arch/powerpc/kvm/book3s_xics.c: In function ‘kvmppc_xics_hcall’:
> arch/powerpc/kvm/book3s_xics.c:812:3: error: implicit declaration of function ‘get_tb’ [-Werror=implicit-function-declaration]
>
> Cc: stable@vger.kernel.org [v3.10]
> Signed-off-by: Paul Mackerras <paulus@samba.org>
Thanks, applied to kvm-ppc-queue.
Alex
^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: [PATCH 06/23] KVM: PPC: Book3S PR: Allow guest to use 64k pages
2013-08-06 4:18 ` [PATCH 06/23] KVM: PPC: Book3S PR: Allow guest to use 64k pages Paul Mackerras
@ 2013-08-28 22:56 ` Alexander Graf
2013-08-29 5:17 ` Paul Mackerras
0 siblings, 1 reply; 68+ messages in thread
From: Alexander Graf @ 2013-08-28 22:56 UTC (permalink / raw)
To: Paul Mackerras; +Cc: Benjamin Herrenschmidt, kvm-ppc, kvm
On 06.08.2013, at 06:18, Paul Mackerras wrote:
> This adds the code to interpret 64k HPTEs in the guest hashed page
> table (HPT), 64k SLB entries, and to tell the guest about 64k pages
> in kvm_vm_ioctl_get_smmu_info(). Guest 64k pages are still shadowed
> by 4k pages.
>
> This also adds another hash table to the four we have already in
> book3s_mmu_hpte.c to allow us to find all the PTEs that we have
> instantiated that match a given 64k guest page.
>
> The tlbie instruction changed starting with POWER6 to use a bit in
> the RB operand to indicate large page invalidations, and to use other
> RB bits to indicate the base and actual page sizes and the segment
> size. 64k pages came in slightly earlier, with POWER5++. At present
> we use one bit in vcpu->arch.hflags to indicate that the emulated
> cpu supports 64k pages and also has the new tlbie definition. If
> we ever want to support emulation of POWER5++, we will need to use
> another bit.
>
> Signed-off-by: Paul Mackerras <paulus@samba.org>
> ---
> arch/powerpc/include/asm/kvm_asm.h | 1 +
> arch/powerpc/include/asm/kvm_book3s.h | 6 +++
> arch/powerpc/include/asm/kvm_host.h | 4 ++
> arch/powerpc/kvm/book3s_64_mmu.c | 92 +++++++++++++++++++++++++++++++----
> arch/powerpc/kvm/book3s_mmu_hpte.c | 50 +++++++++++++++++++
> arch/powerpc/kvm/book3s_pr.c | 30 +++++++++++-
> 6 files changed, 173 insertions(+), 10 deletions(-)
>
>
[...]
> @@ -1127,8 +1144,13 @@ struct kvm_vcpu *kvmppc_core_vcpu_create(struct kvm *kvm, unsigned int id)
> goto uninit_vcpu;
>
> #ifdef CONFIG_PPC_BOOK3S_64
> - /* default to book3s_64 (970fx) */
> + /*
> + * Default to the same as the host if we're on a POWER7[+],
> + * otherwise default to PPC970FX.
> + */
> vcpu->arch.pvr = 0x3C0301;
> + if (cpu_has_feature(CPU_FTR_ARCH_206))
> + vcpu->arch.pvr = mfspr(SPRN_PVR);
Unrelated change? Also, why? Any reasonable user space these days should set PVR anyways.
> #else
> /* default to book3s_32 (750) */
> vcpu->arch.pvr = 0x84202;
> @@ -1331,6 +1353,12 @@ int kvm_vm_ioctl_get_smmu_info(struct kvm *kvm, struct kvm_ppc_smmu_info *info)
> info->sps[1].enc[0].page_shift = 24;
> info->sps[1].enc[0].pte_enc = 0;
>
> + /* 64k large page size */
> + info->sps[2].page_shift = 16;
> + info->sps[2].slb_enc = SLB_VSID_L | SLB_VSID_LP_01;
> + info->sps[2].enc[0].page_shift = 16;
> + info->sps[2].enc[0].pte_enc = 1;
We only support this with BOOK3S_HFLAG_MULTI_PGSIZE, no?
Alex
^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: [PATCH 07/23] KVM: PPC: Book3S PR: Use 64k host pages where possible
2013-08-06 4:19 ` [PATCH 07/23] KVM: PPC: Book3S PR: Use 64k host pages where possible Paul Mackerras
@ 2013-08-28 23:24 ` Alexander Graf
2013-08-29 5:23 ` Paul Mackerras
0 siblings, 1 reply; 68+ messages in thread
From: Alexander Graf @ 2013-08-28 23:24 UTC (permalink / raw)
To: Paul Mackerras; +Cc: Benjamin Herrenschmidt, kvm-ppc, kvm
On 06.08.2013, at 06:19, Paul Mackerras wrote:
> Currently, PR KVM uses 4k pages for the host-side mappings of guest
> memory, regardless of the host page size. When the host page size is
> 64kB, we might as well use 64k host page mappings for guest mappings
> of 64kB and larger pages and for guest real-mode mappings. However,
> the magic page has to remain a 4k page.
>
> To implement this, we first add another flag bit to the guest VSID
> values we use, to indicate that this segment is one where host pages
> should be mapped using 64k pages. For segments with this bit set
> we set the bits in the shadow SLB entry to indicate a 64k base page
> size. When faulting in host HPTEs for this segment, we make them
> 64k HPTEs instead of 4k. We record the pagesize in struct hpte_cache
> for use when invalidating the HPTE.
>
> Signed-off-by: Paul Mackerras <paulus@samba.org>
> ---
> arch/powerpc/include/asm/kvm_book3s.h | 6 ++++--
> arch/powerpc/kvm/book3s_32_mmu.c | 1 +
> arch/powerpc/kvm/book3s_64_mmu.c | 35 ++++++++++++++++++++++++++++++-----
> arch/powerpc/kvm/book3s_64_mmu_host.c | 27 +++++++++++++++++++++------
> arch/powerpc/kvm/book3s_pr.c | 1 +
> 5 files changed, 57 insertions(+), 13 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h
> index 175f876..322b539 100644
> --- a/arch/powerpc/include/asm/kvm_book3s.h
> +++ b/arch/powerpc/include/asm/kvm_book3s.h
> @@ -66,6 +66,7 @@ struct hpte_cache {
> u64 pfn;
> ulong slot;
> struct kvmppc_pte pte;
> + int pagesize;
> };
>
> struct kvmppc_vcpu_book3s {
> @@ -113,8 +114,9 @@ struct kvmppc_vcpu_book3s {
> #define CONTEXT_GUEST 1
> #define CONTEXT_GUEST_END 2
>
> -#define VSID_REAL 0x0fffffffffc00000ULL
> -#define VSID_BAT 0x0fffffffffb00000ULL
> +#define VSID_REAL 0x07ffffffffc00000ULL
> +#define VSID_BAT 0x07ffffffffb00000ULL
> +#define VSID_64K 0x0800000000000000ULL
> #define VSID_1T 0x1000000000000000ULL
> #define VSID_REAL_DR 0x2000000000000000ULL
> #define VSID_REAL_IR 0x4000000000000000ULL
> diff --git a/arch/powerpc/kvm/book3s_32_mmu.c b/arch/powerpc/kvm/book3s_32_mmu.c
> index c8cefdd..af04553 100644
> --- a/arch/powerpc/kvm/book3s_32_mmu.c
> +++ b/arch/powerpc/kvm/book3s_32_mmu.c
> @@ -308,6 +308,7 @@ static int kvmppc_mmu_book3s_32_xlate(struct kvm_vcpu *vcpu, gva_t eaddr,
> ulong mp_ea = vcpu->arch.magic_page_ea;
>
> pte->eaddr = eaddr;
> + pte->page_size = MMU_PAGE_4K;
>
> /* Magic page override */
> if (unlikely(mp_ea) &&
> diff --git a/arch/powerpc/kvm/book3s_64_mmu.c b/arch/powerpc/kvm/book3s_64_mmu.c
> index d5fa26c..658ccd7 100644
> --- a/arch/powerpc/kvm/book3s_64_mmu.c
> +++ b/arch/powerpc/kvm/book3s_64_mmu.c
> @@ -542,6 +542,16 @@ static void kvmppc_mmu_book3s_64_tlbie(struct kvm_vcpu *vcpu, ulong va,
> kvmppc_mmu_pte_vflush(vcpu, va >> 12, mask);
> }
>
> +#ifdef CONFIG_PPC_64K_PAGES
> +static int segment_contains_magic_page(struct kvm_vcpu *vcpu, ulong esid)
> +{
> + ulong mp_ea = vcpu->arch.magic_page_ea;
> +
> + return mp_ea && !(vcpu->arch.shared->msr & MSR_PR) &&
> + (mp_ea >> SID_SHIFT) == esid;
> +}
> +#endif
> +
> static int kvmppc_mmu_book3s_64_esid_to_vsid(struct kvm_vcpu *vcpu, ulong esid,
> u64 *vsid)
> {
> @@ -549,11 +559,13 @@ static int kvmppc_mmu_book3s_64_esid_to_vsid(struct kvm_vcpu *vcpu, ulong esid,
> struct kvmppc_slb *slb;
> u64 gvsid = esid;
> ulong mp_ea = vcpu->arch.magic_page_ea;
> + int pagesize = MMU_PAGE_64K;
>
> if (vcpu->arch.shared->msr & (MSR_DR|MSR_IR)) {
> slb = kvmppc_mmu_book3s_64_find_slbe(vcpu, ea);
> if (slb) {
> gvsid = slb->vsid;
> + pagesize = slb->base_page_size;
> if (slb->tb) {
> gvsid <<= SID_SHIFT_1T - SID_SHIFT;
> gvsid |= esid & ((1ul << (SID_SHIFT_1T - SID_SHIFT)) - 1);
> @@ -564,28 +576,41 @@ static int kvmppc_mmu_book3s_64_esid_to_vsid(struct kvm_vcpu *vcpu, ulong esid,
>
> switch (vcpu->arch.shared->msr & (MSR_DR|MSR_IR)) {
> case 0:
> - *vsid = VSID_REAL | esid;
> + gvsid = VSID_REAL | esid;
> break;
> case MSR_IR:
> - *vsid = VSID_REAL_IR | gvsid;
> + gvsid |= VSID_REAL_IR;
> break;
> case MSR_DR:
> - *vsid = VSID_REAL_DR | gvsid;
> + gvsid |= VSID_REAL_DR;
> break;
> case MSR_DR|MSR_IR:
> if (!slb)
> goto no_slb;
>
> - *vsid = gvsid;
> break;
> default:
> BUG();
> break;
> }
>
> +#ifdef CONFIG_PPC_64K_PAGES
> + /*
> + * Mark this as a 64k segment if the host is using
> + * 64k pages, the host MMU supports 64k pages and
> + * the guest segment page size is >= 64k,
> + * but not if this segment contains the magic page.
What's the problem with the magic page? As long as we map the magic page as a host 64k page and access only the upper 4k (which we handle today already) we should be set, no?
Alex
^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: [PATCH 04/23] KVM: PPC: Book3S PR: Keep volatile reg values in vcpu rather than shadow_vcpu
2013-08-28 22:00 ` Alexander Graf
@ 2013-08-29 5:04 ` Paul Mackerras
2013-08-29 12:46 ` Alexander Graf
0 siblings, 1 reply; 68+ messages in thread
From: Paul Mackerras @ 2013-08-29 5:04 UTC (permalink / raw)
To: Alexander Graf; +Cc: Benjamin Herrenschmidt, kvm-ppc, kvm
On Thu, Aug 29, 2013 at 12:00:53AM +0200, Alexander Graf wrote:
>
> On 06.08.2013, at 06:16, Paul Mackerras wrote:
>
> > kvm_start_lightweight:
> > + /* Copy registers into shadow vcpu so we can access them in real mode */
> > + GET_SHADOW_VCPU(r3)
> > + bl FUNC(kvmppc_copy_to_svcpu)
>
> This will clobber r3 and r4, no? We need to restore them from the stack here I would think.
You're right. We don't need to restore r3 since we don't actually use
it, but we do need to restore r4.
> > #ifdef CONFIG_PPC_BOOK3S_32
> > /* We set segments as unused segments when invalidating them. So
> > * treat the respective fault as segment fault. */
> > - if (svcpu->sr[kvmppc_get_pc(vcpu) >> SID_SHIFT] == SR_INVALID) {
> > - kvmppc_mmu_map_segment(vcpu, kvmppc_get_pc(vcpu));
> > - r = RESUME_GUEST;
> > + {
> > + struct kvmppc_book3s_shadow_vcpu *svcpu;
> > + u32 sr;
> > +
> > + svcpu = svcpu_get(vcpu);
> > + sr = svcpu->sr[kvmppc_get_pc(vcpu) >> SID_SHIFT];
>
> Doesn't this break two concurrently running guests now that we don't copy the shadow vcpu anymore? Just move the sr array to a kmalloc'ed area until the whole vcpu is kmalloc'ed. Then you can get rid of all shadow vcpu code.
This is 32-bit only... the svcpu is already kmalloc'ed, so I'm not
sure what you're asking for here or why you think this would break
with multiple guests.
Paul.
^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: [PATCH 06/23] KVM: PPC: Book3S PR: Allow guest to use 64k pages
2013-08-28 22:56 ` Alexander Graf
@ 2013-08-29 5:17 ` Paul Mackerras
2013-08-29 12:48 ` Alexander Graf
0 siblings, 1 reply; 68+ messages in thread
From: Paul Mackerras @ 2013-08-29 5:17 UTC (permalink / raw)
To: Alexander Graf; +Cc: Benjamin Herrenschmidt, kvm-ppc, kvm
On Thu, Aug 29, 2013 at 12:56:40AM +0200, Alexander Graf wrote:
>
> On 06.08.2013, at 06:18, Paul Mackerras wrote:
>
> > #ifdef CONFIG_PPC_BOOK3S_64
> > - /* default to book3s_64 (970fx) */
> > + /*
> > + * Default to the same as the host if we're on a POWER7[+],
> > + * otherwise default to PPC970FX.
> > + */
> > vcpu->arch.pvr = 0x3C0301;
> > + if (cpu_has_feature(CPU_FTR_ARCH_206))
> > + vcpu->arch.pvr = mfspr(SPRN_PVR);
>
> Unrelated change? Also, why? Any reasonable user space these days should set PVR anyways.
The issue is that the most widely-deployed userspace user of KVM
(i.e., QEMU) does the KVM_PPC_GET_SMMU_INFO ioctl *before* it tells
KVM what it wants the guest PVR to be. Originally I had
kvm_vm_ioctl_get_smmu_info() returning the 64k page size only if the
BOOK3S_HFLAG_MULTI_PGSIZE flag was set, so I had to add this change so
that userspace would see the 64k page size. So yes, I could probably
remove this hunk now.
> >
> > + /* 64k large page size */
> > + info->sps[2].page_shift = 16;
> > + info->sps[2].slb_enc = SLB_VSID_L | SLB_VSID_LP_01;
> > + info->sps[2].enc[0].page_shift = 16;
> > + info->sps[2].enc[0].pte_enc = 1;
>
> We only support this with BOOK3S_HFLAG_MULTI_PGSIZE, no?
The virtual machine implemented by PR KVM supports 64k pages on any
hardware, since it is implementing the POWER MMU in software. That's
why I didn't make it depend on that flag. That means that we rely on
userspace to filter out any capabilities that don't apply to the
machine it wants to emulate. We can't do that filtering here because
userspace queries the MMU capabilities before it sets the PVR.
Regards,
Paul.
^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: [PATCH 07/23] KVM: PPC: Book3S PR: Use 64k host pages where possible
2013-08-28 23:24 ` Alexander Graf
@ 2013-08-29 5:23 ` Paul Mackerras
2013-08-29 12:43 ` Alexander Graf
0 siblings, 1 reply; 68+ messages in thread
From: Paul Mackerras @ 2013-08-29 5:23 UTC (permalink / raw)
To: Alexander Graf; +Cc: Benjamin Herrenschmidt, kvm-ppc, kvm
On Thu, Aug 29, 2013 at 01:24:04AM +0200, Alexander Graf wrote:
>
> On 06.08.2013, at 06:19, Paul Mackerras wrote:
>
> > +#ifdef CONFIG_PPC_64K_PAGES
> > + /*
> > + * Mark this as a 64k segment if the host is using
> > + * 64k pages, the host MMU supports 64k pages and
> > + * the guest segment page size is >= 64k,
> > + * but not if this segment contains the magic page.
>
> What's the problem with the magic page? As long as we map the magic page as a host 64k page and access only the upper 4k (which we handle today already) we should be set, no?
If we use a 64k host HPTE to map the magic page, then we are taking up
64k of the guest address space, and I was concerned that the guest
might ask to map the magic page at address X and then map something
else at address X+4k or X-4k. If we use a 64k host HPTE then we would
tromp on those nearby mappings. If you think the guest will never try
to create any nearby mappings, then we could relax this restriction.
Paul.
^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: [PATCH 07/23] KVM: PPC: Book3S PR: Use 64k host pages where possible
2013-08-29 5:23 ` Paul Mackerras
@ 2013-08-29 12:43 ` Alexander Graf
0 siblings, 0 replies; 68+ messages in thread
From: Alexander Graf @ 2013-08-29 12:43 UTC (permalink / raw)
To: Paul Mackerras; +Cc: Benjamin Herrenschmidt, kvm-ppc, kvm
On 29.08.2013, at 07:23, Paul Mackerras wrote:
> On Thu, Aug 29, 2013 at 01:24:04AM +0200, Alexander Graf wrote:
>>
>> On 06.08.2013, at 06:19, Paul Mackerras wrote:
>>
>>> +#ifdef CONFIG_PPC_64K_PAGES
>>> + /*
>>> + * Mark this as a 64k segment if the host is using
>>> + * 64k pages, the host MMU supports 64k pages and
>>> + * the guest segment page size is >= 64k,
>>> + * but not if this segment contains the magic page.
>>
>> What's the problem with the magic page? As long as we map the magic page as a host 64k page and access only the upper 4k (which we handle today already) we should be set, no?
>
> If we use a 64k host HPTE to map the magic page, then we are taking up
> 64k of the guest address space, and I was concerned that the guest
> might ask to map the magic page at address X and then map something
> else at address X+4k or X-4k. If we use a 64k host HPTE then we would
> tromp on those nearby mappings. If you think the guest will never try
> to create any nearby mappings, then we could relax this restriction.
I think we should just add this restriction to the documentation, yes :). So far the only 2 users I'm aware of are Linux and Mac-on-Linux. Both map the magic page to the top of the address space and don't care whether it's the upper 64k or upper 4k they clutter.
Also, we only map this as 64k when the guest is requesting 64k pages for that segment, no? So a guest that wants 4k granularity should still get it when it configures its segment accordingly.
Alex
^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: [PATCH 04/23] KVM: PPC: Book3S PR: Keep volatile reg values in vcpu rather than shadow_vcpu
2013-08-29 5:04 ` Paul Mackerras
@ 2013-08-29 12:46 ` Alexander Graf
0 siblings, 0 replies; 68+ messages in thread
From: Alexander Graf @ 2013-08-29 12:46 UTC (permalink / raw)
To: Paul Mackerras; +Cc: Benjamin Herrenschmidt, kvm-ppc, kvm
On 29.08.2013, at 07:04, Paul Mackerras wrote:
> On Thu, Aug 29, 2013 at 12:00:53AM +0200, Alexander Graf wrote:
>>
>> On 06.08.2013, at 06:16, Paul Mackerras wrote:
>>
>>> kvm_start_lightweight:
>>> + /* Copy registers into shadow vcpu so we can access them in real mode */
>>> + GET_SHADOW_VCPU(r3)
>>> + bl FUNC(kvmppc_copy_to_svcpu)
>>
>> This will clobber r3 and r4, no? We need to restore them from the stack here I would think.
>
> You're right. We don't need to restore r3 since we don't actually use
> it, but we do need to restore r4.
>
>>> #ifdef CONFIG_PPC_BOOK3S_32
>>> /* We set segments as unused segments when invalidating them. So
>>> * treat the respective fault as segment fault. */
>>> - if (svcpu->sr[kvmppc_get_pc(vcpu) >> SID_SHIFT] == SR_INVALID) {
>>> - kvmppc_mmu_map_segment(vcpu, kvmppc_get_pc(vcpu));
>>> - r = RESUME_GUEST;
>>> + {
>>> + struct kvmppc_book3s_shadow_vcpu *svcpu;
>>> + u32 sr;
>>> +
>>> + svcpu = svcpu_get(vcpu);
>>> + sr = svcpu->sr[kvmppc_get_pc(vcpu) >> SID_SHIFT];
>>
>> Doesn't this break two concurrently running guests now that we don't copy the shadow vcpu anymore? Just move the sr array to a kmalloc'ed area until the whole vcpu is kmalloc'ed. Then you can get rid of all shadow vcpu code.
>
> This is 32-bit only... the svcpu is already kmalloc'ed, so I'm not
> sure what you're asking for here or why you think this would break
> with multiple guests.
Oh, you're right. It wouldn't. I was confused and thought that svcpu_get() would give you the in-paca view, but on 32-bit we already keep it as a separate kmalloc'ed area.
Alex
^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: [PATCH 06/23] KVM: PPC: Book3S PR: Allow guest to use 64k pages
2013-08-29 5:17 ` Paul Mackerras
@ 2013-08-29 12:48 ` Alexander Graf
0 siblings, 0 replies; 68+ messages in thread
From: Alexander Graf @ 2013-08-29 12:48 UTC (permalink / raw)
To: Paul Mackerras; +Cc: Benjamin Herrenschmidt, kvm-ppc, kvm
On 29.08.2013, at 07:17, Paul Mackerras wrote:
> On Thu, Aug 29, 2013 at 12:56:40AM +0200, Alexander Graf wrote:
>>
>> On 06.08.2013, at 06:18, Paul Mackerras wrote:
>>
>>> #ifdef CONFIG_PPC_BOOK3S_64
>>> - /* default to book3s_64 (970fx) */
>>> + /*
>>> + * Default to the same as the host if we're on a POWER7[+],
>>> + * otherwise default to PPC970FX.
>>> + */
>>> vcpu->arch.pvr = 0x3C0301;
>>> + if (cpu_has_feature(CPU_FTR_ARCH_206))
>>> + vcpu->arch.pvr = mfspr(SPRN_PVR);
>>
>> Unrelated change? Also, why? Any reasonable user space these days should set PVR anyways.
>
> The issue is that the most widely-deployed userspace user of KVM
> (i.e., QEMU) does the KVM_PPC_GET_SMMU_INFO ioctl *before* it tells
> KVM what it wants the guest PVR to be. Originally I had
> kvm_vm_ioctl_get_smmu_info() returning the 64k page size only if the
> BOOK3S_HFLAG_MULTI_PGSIZE flag was set, so I had to add this change so
> that userspace would see the 64k page size. So yes, I could probably
> remove this hunk now.
>
>>>
>>> + /* 64k large page size */
>>> + info->sps[2].page_shift = 16;
>>> + info->sps[2].slb_enc = SLB_VSID_L | SLB_VSID_LP_01;
>>> + info->sps[2].enc[0].page_shift = 16;
>>> + info->sps[2].enc[0].pte_enc = 1;
>>
>> We only support this with BOOK3S_HFLAG_MULTI_PGSIZE, no?
>
> The virtual machine implemented by PR KVM supports 64k pages on any
> hardware, since it is implementing the POWER MMU in software. That's
> why I didn't make it depend on that flag. That means that we rely on
> userspace to filter out any capabilities that don't apply to the
> machine it wants to emulate. We can't do that filtering here because
> userspace queries the MMU capabilities before it sets the PVR.
Ok, works for me :).
Alex
^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: [PATCH 14/23] KVM: PPC: Book3S PR: Delay disabling relocation-on interrupts
2013-08-06 4:23 ` [PATCH 14/23] KVM: PPC: Book3S PR: Delay disabling relocation-on interrupts Paul Mackerras
@ 2013-08-30 16:30 ` Alexander Graf
2013-08-30 22:55 ` Paul Mackerras
0 siblings, 1 reply; 68+ messages in thread
From: Alexander Graf @ 2013-08-30 16:30 UTC (permalink / raw)
To: Paul Mackerras; +Cc: Benjamin Herrenschmidt, kvm-ppc, kvm
On 06.08.2013, at 06:23, Paul Mackerras wrote:
> When we are running a PR KVM guest on POWER8, we have to disable the
> new POWER8 feature of taking interrupts with relocation on, that is,
> of taking interrupts without disabling the MMU, because the SLB does
> not contain the normal kernel SLB entries while in the guest.
> Currently we disable relocation-on interrupts when a PR guest is
> created, and leave it disabled until there are no more PR guests in
> existence.
>
> This defers the disabling of relocation-on interrupts until the first
It would've been nice to see the original patch on kvm-ppc@vger.
> time a PR KVM guest vcpu is run. The reason is that in future we will
> support both PR and HV guests in the same kernel, and this will avoid
> disabling relocation-on interrupts unnecessarily for guests which turn
> out to be HV guests, as we will not know at VM creation time whether
> it will be a PR or a HV guest.
>
> Signed-off-by: Paul Mackerras <paulus@samba.org>
> ---
> arch/powerpc/include/asm/kvm_host.h | 1 +
> arch/powerpc/kvm/book3s_pr.c | 71 ++++++++++++++++++++++++++-----------
> 2 files changed, 52 insertions(+), 20 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
> index 4d83972..c012db2 100644
> --- a/arch/powerpc/include/asm/kvm_host.h
> +++ b/arch/powerpc/include/asm/kvm_host.h
> @@ -264,6 +264,7 @@ struct kvm_arch {
> #endif /* CONFIG_KVM_BOOK3S_64_HV */
> #ifdef CONFIG_KVM_BOOK3S_PR
> struct mutex hpt_mutex;
> + bool relon_disabled;
> #endif
> #ifdef CONFIG_PPC_BOOK3S_64
> struct list_head spapr_tce_tables;
> diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
> index 5b06a70..2759ddc 100644
> --- a/arch/powerpc/kvm/book3s_pr.c
> +++ b/arch/powerpc/kvm/book3s_pr.c
> @@ -1197,6 +1197,47 @@ void kvmppc_core_vcpu_free(struct kvm_vcpu *vcpu)
> kmem_cache_free(kvm_vcpu_cache, vcpu);
> }
>
> +/*
> + * On POWER8, we have to disable relocation-on interrupts while
> + * we are in the guest, since the guest doesn't have the normal
> + * kernel SLB contents. Since disabling relocation-on interrupts
> + * is a fairly heavy-weight operation, we do it once when starting
> + * the first guest vcpu and leave it disabled until the last guest
> + * has been destroyed.
> + */
> +static unsigned int kvm_global_user_count = 0;
> +static DEFINE_SPINLOCK(kvm_global_user_count_lock);
> +
> +static void disable_relon_interrupts(struct kvm *kvm)
> +{
> + mutex_lock(&kvm->lock);
> + if (!kvm->arch.relon_disabled) {
> + if (firmware_has_feature(FW_FEATURE_SET_MODE)) {
Is this the same as the endianness setting rtas call? If so, would a PR guest in an HV guest that provides only endianness setting but no relocation-on setting confuse any of this code?
Alex
> + spin_lock(&kvm_global_user_count_lock);
> + if (++kvm_global_user_count == 1)
> + pSeries_disable_reloc_on_exc();
> + spin_unlock(&kvm_global_user_count_lock);
> + }
> + /* order disabling above with setting relon_disabled */
> + smp_mb();
> + kvm->arch.relon_disabled = true;
> + }
> + mutex_unlock(&kvm->lock);
> +}
> +
> +static void enable_relon_interrupts(struct kvm *kvm)
> +{
> + if (kvm->arch.relon_disabled &&
> + firmware_has_feature(FW_FEATURE_SET_MODE)) {
> + spin_lock(&kvm_global_user_count_lock);
> + BUG_ON(kvm_global_user_count == 0);
> + if (--kvm_global_user_count == 0)
> + pSeries_enable_reloc_on_exc();
> + spin_unlock(&kvm_global_user_count_lock);
> + }
> + kvm->arch.relon_disabled = false;
> +}
> +
> int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
> {
> int ret;
> @@ -1234,6 +1275,9 @@ int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
> goto out;
> }
>
> + if (!vcpu->kvm->arch.relon_disabled)
> + disable_relon_interrupts(vcpu->kvm);
> +
> /* Save FPU state in stack */
> if (current->thread.regs->msr & MSR_FP)
> giveup_fpu(current);
> @@ -1400,9 +1444,6 @@ void kvmppc_core_flush_memslot(struct kvm *kvm, struct kvm_memory_slot *memslot)
> {
> }
>
> -static unsigned int kvm_global_user_count = 0;
> -static DEFINE_SPINLOCK(kvm_global_user_count_lock);
> -
> int kvmppc_core_init_vm(struct kvm *kvm)
> {
> #ifdef CONFIG_PPC64
> @@ -1411,28 +1452,18 @@ int kvmppc_core_init_vm(struct kvm *kvm)
> #endif
> mutex_init(&kvm->arch.hpt_mutex);
>
> - if (firmware_has_feature(FW_FEATURE_SET_MODE)) {
> - spin_lock(&kvm_global_user_count_lock);
> - if (++kvm_global_user_count == 1)
> - pSeries_disable_reloc_on_exc();
> - spin_unlock(&kvm_global_user_count_lock);
> - }
> + /*
> + * If we don't have relocation-on interrupts at all,
> + * then we can consider them to be already disabled.
> + */
> + kvm->arch.relon_disabled = !firmware_has_feature(FW_FEATURE_SET_MODE);
> +
> return 0;
> }
>
> void kvmppc_core_destroy_vm(struct kvm *kvm)
> {
> -#ifdef CONFIG_PPC64
> - WARN_ON(!list_empty(&kvm->arch.spapr_tce_tables));
> -#endif
> -
> - if (firmware_has_feature(FW_FEATURE_SET_MODE)) {
> - spin_lock(&kvm_global_user_count_lock);
> - BUG_ON(kvm_global_user_count == 0);
> - if (--kvm_global_user_count == 0)
> - pSeries_enable_reloc_on_exc();
> - spin_unlock(&kvm_global_user_count_lock);
> - }
> + enable_relon_interrupts(kvm);
> }
>
> static int kvmppc_book3s_init(void)
> --
> 1.8.3.1
>
^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: [PATCH 14/23] KVM: PPC: Book3S PR: Delay disabling relocation-on interrupts
2013-08-30 16:30 ` Alexander Graf
@ 2013-08-30 22:55 ` Paul Mackerras
2013-08-30 23:13 ` Alexander Graf
0 siblings, 1 reply; 68+ messages in thread
From: Paul Mackerras @ 2013-08-30 22:55 UTC (permalink / raw)
To: Alexander Graf; +Cc: Benjamin Herrenschmidt, kvm-ppc, kvm
On Fri, Aug 30, 2013 at 06:30:50PM +0200, Alexander Graf wrote:
>
> On 06.08.2013, at 06:23, Paul Mackerras wrote:
>
> > When we are running a PR KVM guest on POWER8, we have to disable the
> > new POWER8 feature of taking interrupts with relocation on, that is,
> > of taking interrupts without disabling the MMU, because the SLB does
> > not contain the normal kernel SLB entries while in the guest.
> > Currently we disable relocation-on interrupts when a PR guest is
> > created, and leave it disabled until there are no more PR guests in
> > existence.
> >
> > This defers the disabling of relocation-on interrupts until the first
>
> It would've been nice to see the original patch on kvm-ppc@vger.
Here are the headers from my copy of the original mail:
> Date: Tue, 6 Aug 2013 14:23:37 +1000
> From: Paul Mackerras <paulus@samba.org>
> To: Alexander Graf <agraf@suse.de>, Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Cc: kvm-ppc@vger.kernel.org, kvm@vger.kernel.org
> Subject: [PATCH 14/23] KVM: PPC: Book3S PR: Delay disabling relocation-on interrupts
So as far as I can see, I *did* cc it to kvm-ppc@vger.
> > + if (!kvm->arch.relon_disabled) {
> > + if (firmware_has_feature(FW_FEATURE_SET_MODE)) {
>
> Is this the same as the endianness setting rtas call? If so, would a PR guest in an HV guest that provides only endianness setting but no relocation-on setting confuse any of this code?
It is the same hcall, but since the interrupts-with-relocation-on
function was defined in the first PAPR version that has H_SET_MODE,
we shouldn't ever hit that situation. In any case, if we did happen
to run under a (non PAPR-compliant) hypervisor that implemented
H_SET_MODE but not the relocation-on setting, then we couldn't have
enabled relocation-on interrupts in the first place, so it wouldn't
matter.
Paul.
^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: [PATCH 14/23] KVM: PPC: Book3S PR: Delay disabling relocation-on interrupts
2013-08-30 22:55 ` Paul Mackerras
@ 2013-08-30 23:13 ` Alexander Graf
2013-08-31 5:42 ` Paul Mackerras
0 siblings, 1 reply; 68+ messages in thread
From: Alexander Graf @ 2013-08-30 23:13 UTC (permalink / raw)
To: Paul Mackerras; +Cc: Benjamin Herrenschmidt, kvm-ppc, kvm
On 31.08.2013, at 00:55, Paul Mackerras wrote:
> On Fri, Aug 30, 2013 at 06:30:50PM +0200, Alexander Graf wrote:
>>
>> On 06.08.2013, at 06:23, Paul Mackerras wrote:
>>
>>> When we are running a PR KVM guest on POWER8, we have to disable the
>>> new POWER8 feature of taking interrupts with relocation on, that is,
>>> of taking interrupts without disabling the MMU, because the SLB does
>>> not contain the normal kernel SLB entries while in the guest.
>>> Currently we disable relocation-on interrupts when a PR guest is
>>> created, and leave it disabled until there are no more PR guests in
>>> existence.
>>>
>>> This defers the disabling of relocation-on interrupts until the first
>>
>> It would've been nice to see the original patch on kvm-ppc@vger.
>
> Here are the headers from my copy of the original mail:
>
>> Date: Tue, 6 Aug 2013 14:23:37 +1000
>> From: Paul Mackerras <paulus@samba.org>
>> To: Alexander Graf <agraf@suse.de>, Benjamin Herrenschmidt <benh@kernel.crashing.org>
>> Cc: kvm-ppc@vger.kernel.org, kvm@vger.kernel.org
>> Subject: [PATCH 14/23] KVM: PPC: Book3S PR: Delay disabling relocation-on interrupts
>
> So as far as I can see, I *did* cc it to kvm-ppc@vger.
Oh, sorry to not be more explicit here. I meant the one that actually introduced the relocation-on handling:
https://lists.ozlabs.org/pipermail/linuxppc-dev/2012-December/102355.html
I can't find any trace of that in my inbox, even though it clearly touches KVM PPC code.
>
>>> + if (!kvm->arch.relon_disabled) {
>>> + if (firmware_has_feature(FW_FEATURE_SET_MODE)) {
>>
>> Is this the same as the endianness setting rtas call? If so, would a PR guest in an HV guest that provides only endianness setting but no relocation-on setting confuse any of this code?
>
> It is the same hcall, but since the interrupts-with-relocation-on
> function was defined in the first PAPR version that has H_SET_MODE,
> we shouldn't ever hit that situation. In any case, if we did happen
> to run under a (non PAPR-compliant) hypervisor that implemented
> H_SET_MODE but not the relocation-on setting, then we couldn't have
> enabled relocation-on interrupts in the first place, so it wouldn't
> matter.
Well, I think Anton's patches do exactly that:
https://lists.nongnu.org/archive/html/qemu-ppc/2013-08/msg00253.html
I really just want to double-check that we're not shooting ourselves in the foot here.
Alex
^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: [PATCH 14/23] KVM: PPC: Book3S PR: Delay disabling relocation-on interrupts
2013-08-30 23:13 ` Alexander Graf
@ 2013-08-31 5:42 ` Paul Mackerras
0 siblings, 0 replies; 68+ messages in thread
From: Paul Mackerras @ 2013-08-31 5:42 UTC (permalink / raw)
To: Alexander Graf; +Cc: Benjamin Herrenschmidt, kvm-ppc, kvm
On Sat, Aug 31, 2013 at 01:13:07AM +0200, Alexander Graf wrote:
>
> Oh, sorry to not be more explicit here. I meant the one that actually introduced the relocation-on handling:
>
> https://lists.ozlabs.org/pipermail/linuxppc-dev/2012-December/102355.html
>
> I can't find any trace of that in my inbox, even though it clearly touches KVM PPC code.
True, Ian should have cc'd it to kvm-ppc@vger, I'll mention it to him.
>
> >
> >>> + if (!kvm->arch.relon_disabled) {
> >>> + if (firmware_has_feature(FW_FEATURE_SET_MODE)) {
> >>
> >> Is this the same as the endianness setting rtas call? If so, would a PR guest in an HV guest that provides only endianness setting but no relocation-on setting confuse any of this code?
> >
> > It is the same hcall, but since the interrupts-with-relocation-on
> > function was defined in the first PAPR version that has H_SET_MODE,
> > we shouldn't ever hit that situation. In any case, if we did happen
> > to run under a (non PAPR-compliant) hypervisor that implemented
> > H_SET_MODE but not the relocation-on setting, then we couldn't have
> > enabled relocation-on interrupts in the first place, so it wouldn't
> > matter.
>
> Well, I think Anton's patches do exactly that:
>
> https://lists.nongnu.org/archive/html/qemu-ppc/2013-08/msg00253.html
>
> I really just want to double-check that we're not shooting ourselves in the foot here.
I still think there's no real problem, since there would be no other
way to enable relocation-on interrupts other than H_SET_MODE. So if
H_SET_MODE can't control that setting, then it must be disabled
already.
However, we should also make sure that H_SET_MODE supports changing
the relocation-on setting when it first goes in. I'm going to want
that soon anyway since I'm working on POWER8 KVM support at the
moment.
Paul.
^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: [PATCH 19/23] KVM: PPC: Book3S: Select PR vs HV separately for each guest
2013-08-06 4:26 ` [PATCH 19/23] KVM: PPC: Book3S: Select PR vs HV separately for each guest Paul Mackerras
@ 2013-09-12 22:56 ` Alexander Graf
2013-09-13 0:17 ` Paul Mackerras
0 siblings, 1 reply; 68+ messages in thread
From: Alexander Graf @ 2013-09-12 22:56 UTC (permalink / raw)
To: Paul Mackerras; +Cc: Benjamin Herrenschmidt, kvm-ppc, kvm
On 05.08.2013, at 23:26, Paul Mackerras wrote:
> This makes it possible to have both PR and HV guests running
> concurrently on one machine, by deferring the decision about which type
> of KVM to use for each guest until it either enables the PAPR capability
> or runs a vcpu. (Of course, this is only possible if both
> CONFIG_KVM_BOOK3S_PR and CONFIG_KVM_BOOK3S_64_HV are enabled.)
>
> Guests start out essentially as PR guests but with kvm->arch.kvm_mode
> set to KVM_MODE_UNKNOWN. If the guest then enables the KVM_CAP_PPC_PAPR
> capability, and the machine is capable of running HV guests (i.e. it
> has suitable CPUs and has a usable hypervisor mode available), the
> guest gets converted to an HV guest at that point. If userspace runs
> a vcpu without having enabled the KVM_CAP_PPC_PAPR capability, the
> guest is confirmed as a PR guest at that point.
>
> This also moves the preloading of the FPU for PR guests from
> kvmppc_set_msr_pr() into kvmppc_handle_exit_pr(), because
> kvmppc_set_msr_pr() can be called before any vcpu has been run, and
> it may be that the guest will end up as a HV guest, and in this case
> the preloading is not appropriate. Instead it is now done after we
> have emulated a privileged or illegal instruction, if the guest MSR
> now has FP set.
>
> Signed-off-by: Paul Mackerras <paulus@samba.org>
We need to have a way to force set the mode to either HV, PR or "try HV, fall back to PR if it wouldn't work" (the one you implemented). That way management software can choose to not default to fallback mode if it wants to guarantee consistent performance.
Alex
^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: [PATCH 20/23] KVM: PPC: Book3S PR: Better handling of host-side read-only pages
2013-08-06 4:27 ` [PATCH 20/23] KVM: PPC: Book3S PR: Better handling of host-side read-only pages Paul Mackerras
@ 2013-09-12 23:01 ` Alexander Graf
2013-09-13 0:23 ` Paul Mackerras
2013-09-14 5:24 ` Paul Mackerras
0 siblings, 2 replies; 68+ messages in thread
From: Alexander Graf @ 2013-09-12 23:01 UTC (permalink / raw)
To: Paul Mackerras; +Cc: Benjamin Herrenschmidt, kvm-ppc, kvm
On 05.08.2013, at 23:27, Paul Mackerras wrote:
> Currently we request write access to all pages that get mapped into the
> guest, even if the guest is only loading from the page. This reduces
> the effectiveness of KSM because it means that we unshare every page we
> access. Also, we always set the changed (C) bit in the guest HPTE if
> it allows writing, even for a guest load.
>
> This fixes both these problems. We pass an 'iswrite' flag to the
> mmu.xlate() functions and to kvmppc_mmu_map_page() to indicate whether
> the access is a load or a store. The mmu.xlate() functions now only
> set C for stores. kvmppc_gfn_to_pfn() now calls gfn_to_pfn_prot()
> instead of gfn_to_pfn() so that it can indicate whether we need write
> access to the page, and get back a 'writable' flag to indicate whether
> the page is writable or not. If that 'writable' flag is clear, we then
> make the host HPTE read-only even if the guest HPTE allowed writing.
>
> This means that we can get a protection fault when the guest writes to a
> page that it has mapped read-write but which is read-only on the host
> side (perhaps due to KSM having merged the page). Thus we now call
> kvmppc_handle_pagefault() for protection faults as well as HPTE not found
> faults. In kvmppc_handle_pagefault(), if the access was allowed by the
> guest HPTE and we thus need to install a new host HPTE, we then need to
> remove the old host HPTE if there is one. This is done with a new
> function, kvmppc_mmu_unmap_page(), which uses kvmppc_mmu_pte_vflush() to
> find and remove the old host HPTE.
Have you measured how much performance we lose by mapping it twice? Usually Linux will mark user pages that are not written to yet as non-writable, no? That's why I assumed that "may_write" is the same as "guest wants to write" back when I wrote this.
I'm also afraid that a sequence like
ld x,y
std x,y
in the kernel will trap twice and slow us down heavily. But maybe I'm just being paranoid. Can you please measure bootup time with and without this, as well as a fork bomb (spawn /bin/echo 1000 times and time it) with and without so we get a feeling for its impact?
Thanks a lot!
Alex
^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: [PATCH 19/23] KVM: PPC: Book3S: Select PR vs HV separately for each guest
2013-09-12 22:56 ` Alexander Graf
@ 2013-09-13 0:17 ` Paul Mackerras
2013-09-13 1:31 ` Benjamin Herrenschmidt
2013-09-13 4:17 ` Alexander Graf
0 siblings, 2 replies; 68+ messages in thread
From: Paul Mackerras @ 2013-09-13 0:17 UTC (permalink / raw)
To: Alexander Graf; +Cc: Benjamin Herrenschmidt, kvm-ppc, kvm
On Thu, Sep 12, 2013 at 05:56:11PM -0500, Alexander Graf wrote:
>
> On 05.08.2013, at 23:26, Paul Mackerras wrote:
>
> > This makes it possible to have both PR and HV guests running
> > concurrently on one machine, by deferring the decision about which type
> > of KVM to use for each guest until it either enables the PAPR capability
> > or runs a vcpu. (Of course, this is only possible if both
> > CONFIG_KVM_BOOK3S_PR and CONFIG_KVM_BOOK3S_64_HV are enabled.)
> >
> > Guests start out essentially as PR guests but with kvm->arch.kvm_mode
> > set to KVM_MODE_UNKNOWN. If the guest then enables the KVM_CAP_PPC_PAPR
> > capability, and the machine is capable of running HV guests (i.e. it
> > has suitable CPUs and has a usable hypervisor mode available), the
> > guest gets converted to an HV guest at that point. If userspace runs
> > a vcpu without having enabled the KVM_CAP_PPC_PAPR capability, the
> > guest is confirmed as a PR guest at that point.
> >
> > This also moves the preloading of the FPU for PR guests from
> > kvmppc_set_msr_pr() into kvmppc_handle_exit_pr(), because
> > kvmppc_set_msr_pr() can be called before any vcpu has been run, and
> > it may be that the guest will end up as a HV guest, and in this case
> > the preloading is not appropriate. Instead it is now done after we
> > have emulated a privileged or illegal instruction, if the guest MSR
> > now has FP set.
> >
> > Signed-off-by: Paul Mackerras <paulus@samba.org>
>
> We need to have a way to force set the mode to either HV, PR or "try HV, fall back to PR if it wouldn't work" (the one you implemented). That way management software can choose to not default to fallback mode if it wants to guarantee consistent performance.
Yes, Anthony Liguori mentioned a similar concern to me.
Aneesh and I are currently investigating an alternative approach,
which is much more like the x86 way of doing things. We are looking
at splitting the code into three modules: a kvm_pr.ko module with the
PR-specific bits, a kvm_hv.ko module with the HV-specific bits, and a
core kvm.ko module with the generic bits (basically book3s.c,
powerpc.c, stuff from virt/kvm/, plus the stuff that both PR and HV
use). Basically the core module would have a pointer to a struct
full of function pointers for the various ops that book3s_pr.c and
book3s_hv.c both provide. You would only be able to have one of
kvm_pr and kvm_hv loaded at any one time. If they were built in, you
could have them both built in but only one could register its function
pointer struct with the core. Obviously the kvm_hv module would only
load and register its struct on a machine that had hypervisor mode
available. If they were both built in I would think we would give HV
the first chance to register itself, and let PR register if we can't
do HV.
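As a rough sketch of the registration interface (the struct and function
names below are only illustrative placeholders, not actual code from the
series), the core module would export something along these lines:

    #include <linux/kvm_host.h>

    /* Per-flavour entry points that kvm-pr.ko and kvm-hv.ko would each fill in */
    struct kvmppc_ops {
            int  (*core_init_vm)(struct kvm *kvm);
            void (*core_destroy_vm)(struct kvm *kvm);
            int  (*vcpu_run)(struct kvm_run *run, struct kvm_vcpu *vcpu);
            /* ... further PR/HV-specific hooks ... */
    };

    static struct kvmppc_ops *kvmppc_ops;   /* owned by the core kvm.ko module */

    /* Called from kvm-hv.ko or kvm-pr.ko at module init; only one can win */
    int kvmppc_register_ops(struct kvmppc_ops *ops)
    {
            if (kvmppc_ops)
                    return -EBUSY;  /* the other flavour is already registered */
            kvmppc_ops = ops;
            return 0;
    }
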
How does that sound?
Paul.
^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: [PATCH 20/23] KVM: PPC: Book3S PR: Better handling of host-side read-only pages
2013-09-12 23:01 ` Alexander Graf
@ 2013-09-13 0:23 ` Paul Mackerras
2013-09-14 5:24 ` Paul Mackerras
1 sibling, 0 replies; 68+ messages in thread
From: Paul Mackerras @ 2013-09-13 0:23 UTC (permalink / raw)
To: Alexander Graf; +Cc: Benjamin Herrenschmidt, kvm-ppc, kvm
On Thu, Sep 12, 2013 at 06:01:37PM -0500, Alexander Graf wrote:
>
> On 05.08.2013, at 23:27, Paul Mackerras wrote:
>
> > Currently we request write access to all pages that get mapped into the
> > guest, even if the guest is only loading from the page. This reduces
> > the effectiveness of KSM because it means that we unshare every page we
> > access. Also, we always set the changed (C) bit in the guest HPTE if
> > it allows writing, even for a guest load.
> >
> > This fixes both these problems. We pass an 'iswrite' flag to the
> > mmu.xlate() functions and to kvmppc_mmu_map_page() to indicate whether
> > the access is a load or a store. The mmu.xlate() functions now only
> > set C for stores. kvmppc_gfn_to_pfn() now calls gfn_to_pfn_prot()
> > instead of gfn_to_pfn() so that it can indicate whether we need write
> > access to the page, and get back a 'writable' flag to indicate whether
> > the page is writable or not. If that 'writable' flag is clear, we then
> > make the host HPTE read-only even if the guest HPTE allowed writing.
> >
> > This means that we can get a protection fault when the guest writes to a
> > page that it has mapped read-write but which is read-only on the host
> > side (perhaps due to KSM having merged the page). Thus we now call
> > kvmppc_handle_pagefault() for protection faults as well as HPTE not found
> > faults. In kvmppc_handle_pagefault(), if the access was allowed by the
> > guest HPTE and we thus need to install a new host HPTE, we then need to
> > remove the old host HPTE if there is one. This is done with a new
> > function, kvmppc_mmu_unmap_page(), which uses kvmppc_mmu_pte_vflush() to
> > find and remove the old host HPTE.
>
> Have you measured how much performance we lose by mapping it twice? Usually Linux will mark user pages that are not written to yet as non-writable, no? That's why I assumed that "may_write" is the same as "guest wants to write" back when I wrote this.
Anonymous user pages start out both writable and dirty, so I think
it's OK.
> I'm also afraid that a sequence like
>
> ld x,y
> std x,y
>
> in the kernel will trap twice and slow us down heavily. But maybe I'm just being paranoid. Can you please measure bootup time with and without this, as well as a fork bomb (spawn /bin/echo 1000 times and time it) with and without so we get a feeling for its impact?
OK, I can do that.
If a page is actually writable but the guest is only asking for read
access, we give it write access on the first fault, so I don't expect
to see any slowdown. We would get the second fault mainly when KSM
has decided to share the underlying page, and there we do need the
second fault in order to do the copy-on-write.
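To put that in code terms, the mapping decision boils down to something
like the following (a simplified sketch; the helper name and signature
are illustrative rather than the exact patch code):

    /* Translate a guest frame, asking for write access only on stores */
    static pfn_t kvmppc_gfn_to_pfn(struct kvm_vcpu *vcpu, gfn_t gfn,
                                   bool writing, bool *writable)
    {
            /* 'writable' reports whether the host page may be written */
            return gfn_to_pfn_prot(vcpu->kvm, gfn, writing, writable);
    }

    /*
     * At mapping time: if the host page is not writable (e.g. it has been
     * merged by KSM), install a read-only host HPTE even when the guest
     * HPTE allows writing.  A later guest store then takes a protection
     * fault, which lets the host do its copy-on-write and gives us a
     * fresh, writable page to map on the second fault.
     */
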
Paul.
^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: [PATCH 19/23] KVM: PPC: Book3S: Select PR vs HV separately for each guest
2013-09-13 0:17 ` Paul Mackerras
@ 2013-09-13 1:31 ` Benjamin Herrenschmidt
2013-09-13 4:18 ` Alexander Graf
2013-09-14 18:33 ` Aneesh Kumar K.V
2013-09-13 4:17 ` Alexander Graf
1 sibling, 2 replies; 68+ messages in thread
From: Benjamin Herrenschmidt @ 2013-09-13 1:31 UTC (permalink / raw)
To: Paul Mackerras; +Cc: Alexander Graf, kvm-ppc, kvm
On Fri, 2013-09-13 at 10:17 +1000, Paul Mackerras wrote:
> Aneesh and I are currently investigating an alternative approach,
> which is much more like the x86 way of doing things. We are looking
> at splitting the code into three modules: a kvm_pr.ko module with the
> PR-specific bits, a kvm_hv.ko module with the HV-specific bits, and a
> core kvm.ko module with the generic bits (basically book3s.c,
> powerpc.c, stuff from virt/kvm/, plus the stuff that both PR and HV
> use). Basically the core module would have a pointer to a struct
> full of function pointers for the various ops that book3s_pr.c and
> book3s_hv.c both provide. You would only be able to have one of
> kvm_pr and kvm_hv loaded at any one time. If they were built in, you
> could have them both built in but only one could register its function
> pointer struct with the core. Obviously the kvm_hv module would only
> load and register its struct on a machine that had hypervisor mode
> available. If they were both built in I would think we would give HV
> the first chance to register itself, and let PR register if we can't
> do HV.
>
> How does that sound?
As long as we can force-load the PR one on a machine that normally runs
HV for the sake of testing ...
Also, all those KVM modules ... they don't auto-load, do they?
Cheers,
Ben.
^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: [PATCH 19/23] KVM: PPC: Book3S: Select PR vs HV separately for each guest
2013-09-13 0:17 ` Paul Mackerras
2013-09-13 1:31 ` Benjamin Herrenschmidt
@ 2013-09-13 4:17 ` Alexander Graf
2013-09-18 12:05 ` Paul Mackerras
1 sibling, 1 reply; 68+ messages in thread
From: Alexander Graf @ 2013-09-13 4:17 UTC (permalink / raw)
To: Paul Mackerras
Cc: Benjamin Herrenschmidt, kvm-ppc@vger.kernel.org,
kvm@vger.kernel.org
Am 12.09.2013 um 19:17 schrieb Paul Mackerras <paulus@samba.org>:
> On Thu, Sep 12, 2013 at 05:56:11PM -0500, Alexander Graf wrote:
>>
>> On 05.08.2013, at 23:26, Paul Mackerras wrote:
>>
>>> This makes it possible to have both PR and HV guests running
>>> concurrently on one machine, by deferring the decision about which type
>>> of KVM to use for each guest until it either enables the PAPR capability
>>> or runs a vcpu. (Of course, this is only possible if both
>>> CONFIG_KVM_BOOK3S_PR and CONFIG_KVM_BOOK3S_64_HV are enabled.)
>>>
>>> Guests start out essentially as PR guests but with kvm->arch.kvm_mode
>>> set to KVM_MODE_UNKNOWN. If the guest then enables the KVM_CAP_PPC_PAPR
>>> capability, and the machine is capable of running HV guests (i.e. it
>>> has suitable CPUs and has a usable hypervisor mode available), the
>>> guest gets converted to an HV guest at that point. If userspace runs
>>> a vcpu without having enabled the KVM_CAP_PPC_PAPR capability, the
>>> guest is confirmed as a PR guest at that point.
>>>
>>> This also moves the preloading of the FPU for PR guests from
>>> kvmppc_set_msr_pr() into kvmppc_handle_exit_pr(), because
>>> kvmppc_set_msr_pr() can be called before any vcpu has been run, and
>>> it may be that the guest will end up as a HV guest, and in this case
>>> the preloading is not appropriate. Instead it is now done after we
>>> have emulated a privileged or illegal instruction, if the guest MSR
>>> now has FP set.
>>>
>>> Signed-off-by: Paul Mackerras <paulus@samba.org>
>>
>> We need to have a way to force set the mode to either HV, PR or "try HV, fall back to PR if it wouldn't work" (the one you implemented). That way management software can choose to not default to fallback mode if it wants to guarantee consistent performance.
>
> Yes, Anthony Liguori mentioned a similar concern to me.
>
> Aneesh and I are currently investigating an alternative approach,
> which is much more like the x86 way of doing things. We are looking
> at splitting the code into three modules: a kvm_pr.ko module with the
> PR-specific bits, a kvm_hv.ko module with the HV-specific bits, and a
> core kvm.ko module with the generic bits (basically book3s.c,
> powerpc.c, stuff from virt/kvm/, plus the stuff that both PR and HV
> use). Basically the core module would have a pointer to a struct
> full of function pointers for the various ops that book3s_pr.c and
> book3s_hv.c both provide. You would only be able to have one of
> kvm_pr and kvm_hv loaded at any one time. If they were built in, you
> could have them both built in but only one could register its function
> pointer struct with the core. Obviously the kvm_hv module would only
> load and register its struct on a machine that had hypervisor mode
> available. If they were both built in I would think we would give HV
> the first chance to register itself, and let PR register if we can't
> do HV.
>
> How does that sound?
It means you can only choose between HV and PR machine-wide, while with this patch set you give the user the flexibility to have HV and PR guests run in parallel.
I know that Anthony doesn't believe it's a valid use case, but I like the flexible solution better. It does however make sense to enable a sysadmin to remove any PR functionality from the system by blocking that module.
Can't we have both?
Alex
>
> Paul.
> --
> To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: [PATCH 19/23] KVM: PPC: Book3S: Select PR vs HV separately for each guest
2013-09-13 1:31 ` Benjamin Herrenschmidt
@ 2013-09-13 4:18 ` Alexander Graf
2013-09-14 18:33 ` Aneesh Kumar K.V
1 sibling, 0 replies; 68+ messages in thread
From: Alexander Graf @ 2013-09-13 4:18 UTC (permalink / raw)
To: Benjamin Herrenschmidt
Cc: Paul Mackerras, kvm-ppc@vger.kernel.org, kvm@vger.kernel.org
Am 12.09.2013 um 20:31 schrieb Benjamin Herrenschmidt <benh@kernel.crashing.org>:
> On Fri, 2013-09-13 at 10:17 +1000, Paul Mackerras wrote:
>
>> Aneesh and I are currently investigating an alternative approach,
>> which is much more like the x86 way of doing things. We are looking
>> at splitting the code into three modules: a kvm_pr.ko module with the
>> PR-specific bits, a kvm_hv.ko module with the HV-specific bits, and a
>> core kvm.ko module with the generic bits (basically book3s.c,
>> powerpc.c, stuff from virt/kvm/, plus the stuff that both PR and HV
>> use). Basically the core module would have a pointer to a struct
>> full of function pointers for the various ops that book3s_pr.c and
>> book3s_hv.c both provide. You would only be able to have one of
>> kvm_pr and kvm_hv loaded at any one time. If they were built in, you
>> could have them both built in but only one could register its function
>> pointer struct with the core. Obviously the kvm_hv module would only
>> load and register its struct on a machine that had hypervisor mode
>> available. If they were both built in I would think we would give HV
>> the first chance to register itself, and let PR register if we can't
>> do HV.
>>
>> How does that sound?
>
> As long as we can force-load the PR one on a machine that normally runs
> HV for the sake of testing ...
>
> Also, all those KVM modules ... they don't auto-load, do they?
They don't today, but they should.
Alex
>
> Cheers,
> Ben.
>
>
^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: [PATCH 20/23] KVM: PPC: Book3S PR: Better handling of host-side read-only pages
2013-09-12 23:01 ` Alexander Graf
2013-09-13 0:23 ` Paul Mackerras
@ 2013-09-14 5:24 ` Paul Mackerras
2013-09-14 20:23 ` Alexander Graf
1 sibling, 1 reply; 68+ messages in thread
From: Paul Mackerras @ 2013-09-14 5:24 UTC (permalink / raw)
To: Alexander Graf; +Cc: Benjamin Herrenschmidt, kvm-ppc, kvm
On Thu, Sep 12, 2013 at 06:01:37PM -0500, Alexander Graf wrote:
>
> On 05.08.2013, at 23:27, Paul Mackerras wrote:
>
> > Currently we request write access to all pages that get mapped into the
> > guest, even if the guest is only loading from the page. This reduces
> > the effectiveness of KSM because it means that we unshare every page we
> > access. Also, we always set the changed (C) bit in the guest HPTE if
> > it allows writing, even for a guest load.
>
> Have you measured how much performance we lose by mapping it twice? Usually Linux will mark user pages that are not written to yet as non-writable, no? That's why I assumed that "may_write" is the same as "guest wants to write" back when I wrote this.
>
> I'm also afraid that a sequence like
>
> ld x,y
> std x,y
>
> in the kernel will trap twice and slow us down heavily. But maybe I'm just being paranoid. Can you please measure bootup time with and without this, as well as a fork bomb (spawn /bin/echo 1000 times and time it) with and without so we get a feeling for its impact?
Bootup (F19 guest, 3 runs):
Without the patch: average 20.12 seconds, st. dev. 0.17 seconds
With the patch: 20.47 seconds, st. dev. 0.19 seconds
Delta: 0.35 seconds, or 1.7%.
time for i in $(seq 1000); do /bin/echo $i >/dev/null; done:
Without the patch: average 7.27 seconds, st. dev. 0.23 seconds
With the patch: average 7.55 seconds, st. dev. 0.39 seconds
Delta: 0.28 seconds, or 3.8%.
So there appears to be a small effect, of a few percent.
Paul.
^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: [PATCH 19/23] KVM: PPC: Book3S: Select PR vs HV separately for each guest
2013-09-13 1:31 ` Benjamin Herrenschmidt
2013-09-13 4:18 ` Alexander Graf
@ 2013-09-14 18:33 ` Aneesh Kumar K.V
2013-09-14 20:22 ` Alexander Graf
1 sibling, 1 reply; 68+ messages in thread
From: Aneesh Kumar K.V @ 2013-09-14 18:33 UTC (permalink / raw)
To: Benjamin Herrenschmidt, Paul Mackerras; +Cc: Alexander Graf, kvm-ppc, kvm
Benjamin Herrenschmidt <benh@kernel.crashing.org> writes:
> On Fri, 2013-09-13 at 10:17 +1000, Paul Mackerras wrote:
>
>> Aneesh and I are currently investigating an alternative approach,
>> which is much more like the x86 way of doing things. We are looking
>> at splitting the code into three modules: a kvm_pr.ko module with the
>> PR-specific bits, a kvm_hv.ko module with the HV-specific bits, and a
>> core kvm.ko module with the generic bits (basically book3s.c,
>> powerpc.c, stuff from virt/kvm/, plus the stuff that both PR and HV
>> use). Basically the core module would have a pointer to a struct
>> full of function pointers for the various ops that book3s_pr.c and
>> book3s_hv.c both provide. You would only be able to have one of
>> kvm_pr and kvm_hv loaded at any one time. If they were built in, you
>> could have them both built in but only one could register its function
>> pointer struct with the core. Obviously the kvm_hv module would only
>> load and register its struct on a machine that had hypervisor mode
>> available. If they were both built in I would think we would give HV
>> the first chance to register itself, and let PR register if we can't
>> do HV.
>>
>> How does that sound?
>
> As long as we can force-load the PR one on a machine that normally runs
> HV for the sake of testing ...
This is what I currently have
[root@llmp24l02 kvm]# insmod ./kvm-hv.ko
[root@llmp24l02 kvm]# insmod ./kvm-pr.ko
insmod: ERROR: could not insert module ./kvm-pr.ko: File exists
[root@llmp24l02 kvm]# rmmod kvm-hv
[root@llmp24l02 kvm]# insmod ./kvm-pr.ko
[root@llmp24l02 kvm]#
So if by force-load you mean rmmod kvm-hv and then modprobe kvm-pr, that
works. But loading kvm-pr alongside kvm-hv is not supported. My
understanding was that we didn't want to allow that, because it can
confuse users who are not sure whether it is HV or PR KVM they are using.
-aneesh
^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: [PATCH 19/23] KVM: PPC: Book3S: Select PR vs HV separately for each guest
2013-09-14 18:33 ` Aneesh Kumar K.V
@ 2013-09-14 20:22 ` Alexander Graf
2013-09-15 9:16 ` Aneesh Kumar K.V
0 siblings, 1 reply; 68+ messages in thread
From: Alexander Graf @ 2013-09-14 20:22 UTC (permalink / raw)
To: Aneesh Kumar K.V
Cc: Benjamin Herrenschmidt, Paul Mackerras, kvm-ppc@vger.kernel.org,
kvm@vger.kernel.org
Am 14.09.2013 um 13:33 schrieb "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>:
> Benjamin Herrenschmidt <benh@kernel.crashing.org> writes:
>
>> On Fri, 2013-09-13 at 10:17 +1000, Paul Mackerras wrote:
>>
>>> Aneesh and I are currently investigating an alternative approach,
>>> which is much more like the x86 way of doing things. We are looking
>>> at splitting the code into three modules: a kvm_pr.ko module with the
>>> PR-specific bits, a kvm_hv.ko module with the HV-specific bits, and a
>>> core kvm.ko module with the generic bits (basically book3s.c,
>>> powerpc.c, stuff from virt/kvm/, plus the stuff that both PR and HV
>>> use). Basically the core module would have a pointer to a struct
>>> full of function pointers for the various ops that book3s_pr.c and
>>> book3s_hv.c both provide. You would only be able to have one of
>>> kvm_pr and kvm_hv loaded at any one time. If they were built in, you
>>> could have them both built in but only one could register its function
>>> pointer struct with the core. Obviously the kvm_hv module would only
>>> load and register its struct on a machine that had hypervisor mode
>>> available. If they were both built in I would think we would give HV
>>> the first chance to register itself, and let PR register if we can't
>>> do HV.
>>>
>>> How does that sound?
>>
>> As long as we can force-load the PR one on a machine that normally runs
>> HV for the sake of testing ...
>
> This is what I currently have
>
> [root@llmp24l02 kvm]# insmod ./kvm-hv.ko
> [root@llmp24l02 kvm]# insmod ./kvm-pr.ko
> insmod: ERROR: could not insert module ./kvm-pr.ko: File exists
The reason this model makes sense for x86 is that you never have SVM and VMX in the cpu at the same time. Either it is an AMD chip or an Intel chip.
PR and HV however are not mutually exclusive in hardware. What you really want is
1) distro can force HV/PR
2) admin can force HV/PR
3) user can force HV/PR
4) by default things "just work"
1 can be done through kernel config options.
2 can be done through modules that get loaded or not
3 can be done through a vm ioctl
4 only works if you allow hv and pr to be available at the same time
I can guess who you talked to about this when making these design decisions, but it definitely was not me.
Alex
> [root@llmp24l02 kvm]# rmmod kvm-hv
> [root@llmp24l02 kvm]# insmod ./kvm-pr.ko
> [root@llmp24l02 kvm]#
>
> So if by force load you mean rmmod kvm-hv and then modprobe kvm-pr that
> works. But loading kvm-pr along side kvm-hv is not supported. My
> understanding was we didn't want to allow that because that can confuse users
> when they are not sure whether it is hv or pr kvm they are using.
>
> -aneesh
>
^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: [PATCH 20/23] KVM: PPC: Book3S PR: Better handling of host-side read-only pages
2013-09-14 5:24 ` Paul Mackerras
@ 2013-09-14 20:23 ` Alexander Graf
2013-09-16 4:12 ` Paul Mackerras
0 siblings, 1 reply; 68+ messages in thread
From: Alexander Graf @ 2013-09-14 20:23 UTC (permalink / raw)
To: Paul Mackerras
Cc: Benjamin Herrenschmidt, kvm-ppc@vger.kernel.org,
kvm@vger.kernel.org
Am 14.09.2013 um 00:24 schrieb Paul Mackerras <paulus@samba.org>:
> On Thu, Sep 12, 2013 at 06:01:37PM -0500, Alexander Graf wrote:
>>
>> On 05.08.2013, at 23:27, Paul Mackerras wrote:
>>
>>> Currently we request write access to all pages that get mapped into the
>>> guest, even if the guest is only loading from the page. This reduces
>>> the effectiveness of KSM because it means that we unshare every page we
>>> access. Also, we always set the changed (C) bit in the guest HPTE if
>>> it allows writing, even for a guest load.
>>
>> Have you measured how much performance we lose by mapping it twice? Usually Linux will mark user pages that are not written to yet as non-writable, no? That's why I assumed that "may_write" is the same as "guest wants to write" back when I wrote this.
>>
>> I'm also afraid that a sequence like
>>
>> ld x,y
>> std x,y
>>
>> in the kernel will trap twice and slow us down heavily. But maybe I'm just being paranoid. Can you please measure bootup time with and without this, as well as a fork bomb (spawn /bin/echo 1000 times and time it) with and without so we get a feeling for its impact?
>
> Bootup (F19 guest, 3 runs):
>
> Without the patch: average 20.12 seconds, st. dev. 0.17 seconds
> With the patch: 20.47 seconds, st. dev. 0.19 seconds
>
> Delta: 0.35 seconds, or 1.7%.
>
> time for i in $(seq 1000); do /bin/echo $i >/dev/null; done:
>
> Without the patch: average 7.27 seconds, st. dev. 0.23 seconds
> With the patch: average 7.55 seconds, st. dev. 0.39 seconds
>
> Delta: 0.28 seconds, or 3.8%.
>
> So there appears to be a small effect, of a few percent.
So in the normal case it slows us down, but allows KSM to be effective. Do we actually want this change then?
Alex
>
> Paul.
>
^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: [PATCH 19/23] KVM: PPC: Book3S: Select PR vs HV separately for each guest
2013-09-14 20:22 ` Alexander Graf
@ 2013-09-15 9:16 ` Aneesh Kumar K.V
2013-09-15 11:55 ` Alexander Graf
0 siblings, 1 reply; 68+ messages in thread
From: Aneesh Kumar K.V @ 2013-09-15 9:16 UTC (permalink / raw)
To: Alexander Graf
Cc: Benjamin Herrenschmidt, Paul Mackerras, kvm-ppc@vger.kernel.org,
kvm@vger.kernel.org
Alexander Graf <agraf@suse.de> writes:
> Am 14.09.2013 um 13:33 schrieb "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>:
>
>> Benjamin Herrenschmidt <benh@kernel.crashing.org> writes:
>>
>>> On Fri, 2013-09-13 at 10:17 +1000, Paul Mackerras wrote:
>>>
>>>> Aneesh and I are currently investigating an alternative approach,
>>>> which is much more like the x86 way of doing things. We are looking
>>>> at splitting the code into three modules: a kvm_pr.ko module with the
>>>> PR-specific bits, a kvm_hv.ko module with the HV-specific bits, and a
>>>> core kvm.ko module with the generic bits (basically book3s.c,
>>>> powerpc.c, stuff from virt/kvm/, plus the stuff that both PR and HV
>>>> use). Basically the core module would have a pointer to a struct
>>>> full of function pointers for the various ops that book3s_pr.c and
>>>> book3s_hv.c both provide. You would only be able to have one of
>>>> kvm_pr and kvm_hv loaded at any one time. If they were built in, you
>>>> could have them both built in but only one could register its function
>>>> pointer struct with the core. Obviously the kvm_hv module would only
>>>> load and register its struct on a machine that had hypervisor mode
>>>> available. If they were both built in I would think we would give HV
>>>> the first chance to register itself, and let PR register if we can't
>>>> do HV.
>>>>
>>>> How does that sound?
>>>
>>> As long as we can force-load the PR one on a machine that normally runs
>>> HV for the sake of testing ...
>>
>> This is what I currently have
>>
>> [root@llmp24l02 kvm]# insmod ./kvm-hv.ko
>> [root@llmp24l02 kvm]# insmod ./kvm-pr.ko
>> insmod: ERROR: could not insert module ./kvm-pr.ko: File exists
>
> The reason this model makes sense for x86 is that you never have SVM and VMX in the cpu at the same time. Either it is an AMD chip or an Intel chip.
>
> PR and HV however are not mutually exclusive in hardware. What you really want is
>
> 1) distro can force HV/PR
> 2) admin can force HV/PR
> 3) user can force HV/PR
> 4) by default things "just work"
>
> 1 can be done through kernel config options.
> 2 can be done through modules that get loaded or not
> 3 can be done through a vm ioctl
> 4 only works if you allow hv and pr to be available at the same time
>
> I can assume who you talked to about this to make these design decisions, but it definitely was not me.
I haven't had much discussion around the design with anybody yet. What
you saw above was me changing/moving code around madly to get
something working in a day. I was hoping to get something that I could
post as an RFC early and let others comment. Good to get the feedback early.
-aneesh
^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: [PATCH 19/23] KVM: PPC: Book3S: Select PR vs HV separately for each guest
2013-09-15 9:16 ` Aneesh Kumar K.V
@ 2013-09-15 11:55 ` Alexander Graf
0 siblings, 0 replies; 68+ messages in thread
From: Alexander Graf @ 2013-09-15 11:55 UTC (permalink / raw)
To: Aneesh Kumar K.V
Cc: Benjamin Herrenschmidt, Paul Mackerras, kvm-ppc@vger.kernel.org,
kvm@vger.kernel.org
Am 15.09.2013 um 04:16 schrieb "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>:
> Alexander Graf <agraf@suse.de> writes:
>
>> Am 14.09.2013 um 13:33 schrieb "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>:
>>
>>> Benjamin Herrenschmidt <benh@kernel.crashing.org> writes:
>>>
>>>> On Fri, 2013-09-13 at 10:17 +1000, Paul Mackerras wrote:
>>>>
>>>>> Aneesh and I are currently investigating an alternative approach,
>>>>> which is much more like the x86 way of doing things. We are looking
>>>>> at splitting the code into three modules: a kvm_pr.ko module with the
>>>>> PR-specific bits, a kvm_hv.ko module with the HV-specific bits, and a
>>>>> core kvm.ko module with the generic bits (basically book3s.c,
>>>>> powerpc.c, stuff from virt/kvm/, plus the stuff that both PR and HV
>>>>> use). Basically the core module would have a pointer to a struct
>>>>> full of function pointers for the various ops that book3s_pr.c and
>>>>> book3s_hv.c both provide. You would only be able to have one of
>>>>> kvm_pr and kvm_hv loaded at any one time. If they were built in, you
>>>>> could have them both built in but only one could register its function
>>>>> pointer struct with the core. Obviously the kvm_hv module would only
>>>>> load and register its struct on a machine that had hypervisor mode
>>>>> available. If they were both built in I would think we would give HV
>>>>> the first chance to register itself, and let PR register if we can't
>>>>> do HV.
>>>>>
>>>>> How does that sound?
>>>>
>>>> As long as we can force-load the PR one on a machine that normally runs
>>>> HV for the sake of testing ...
>>>
>>> This is what I currently have
>>>
>>> [root@llmp24l02 kvm]# insmod ./kvm-hv.ko
>>> [root@llmp24l02 kvm]# insmod ./kvm-pr.ko
>>> insmod: ERROR: could not insert module ./kvm-pr.ko: File exists
>>
>> The reason this model makes sense for x86 is that you never have SVM and VMX in the cpu at the same time. Either it is an AMD chip or an Intel chip.
>>
>> PR and HV however are not mutually exclusive in hardware. What you really want is
>>
>> 1) distro can force HV/PR
>> 2) admin can force HV/PR
>> 3) user can force HV/PR
>> 4) by default things "just work"
>>
>> 1 can be done through kernel config options.
>> 2 can be done through modules that get loaded or not
>> 3 can be done through a vm ioctl
>> 4 only works if you allow hv and pr to be available at the same time
>>
>> I can assume who you talked to about this to make these design decisions, but it definitely was not me.
>
> I haven't had much discussion around the design with anybody yet. What
> you saw above was me changing/moving code around madly to get
> something working in a day. I was hoping to get something that I could
> post as an RFC early and let others comment. Good to get the feedback early.
Heh, ok :). I think we want to be flexible here unless the complexity becomes too much of a maintenance burden and/or slows things down.
Alex
>
> -aneesh
>
^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: [PATCH 20/23] KVM: PPC: Book3S PR: Better handling of host-side read-only pages
2013-09-14 20:23 ` Alexander Graf
@ 2013-09-16 4:12 ` Paul Mackerras
2013-09-16 12:47 ` Alexander Graf
0 siblings, 1 reply; 68+ messages in thread
From: Paul Mackerras @ 2013-09-16 4:12 UTC (permalink / raw)
To: Alexander Graf
Cc: Benjamin Herrenschmidt, kvm-ppc@vger.kernel.org,
kvm@vger.kernel.org
On Sat, Sep 14, 2013 at 03:23:53PM -0500, Alexander Graf wrote:
>
>
> Am 14.09.2013 um 00:24 schrieb Paul Mackerras <paulus@samba.org>:
>
> > Bootup (F19 guest, 3 runs):
> >
> > Without the patch: average 20.12 seconds, st. dev. 0.17 seconds
> > With the patch: 20.47 seconds, st. dev. 0.19 seconds
> >
> > Delta: 0.35 seconds, or 1.7%.
> >
> > time for i in $(seq 1000); do /bin/echo $i >/dev/null; done:
> >
> > Without the patch: average 7.27 seconds, st. dev. 0.23 seconds
> > With the patch: average 7.55 seconds, st. dev. 0.39 seconds
> >
> > Delta: 0.28 seconds, or 3.8%.
> >
> > So there appears to be a small effect, of a few percent.
>
> So in the normal case it slows us down, but allows ksm to be effective. Do we actually want this change then?
I was a bit puzzled why there was a measurable slowdown until I
remembered that this patch was intended to go along with the patch
"powerpc: Implement __get_user_pages_fast()", which Ben took and which
is now upstream in Linus' tree (1f7bf028). So, I applied that patch
on top of this "Better handling of host-side read-only pages" patch,
and did the same measurements. The results were:
Bootup (F19 guest, 3 runs): average 20.05 seconds, st. dev. 0.53s
1000 /bin/echo (4 runs): average 7.27 seconds, st. dev. 0.032s
So with both patches applied there is no slowdown at all, and KSM
works properly. I think we want this patch.
Paul.
^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: [PATCH 20/23] KVM: PPC: Book3S PR: Better handling of host-side read-only pages
2013-09-16 4:12 ` Paul Mackerras
@ 2013-09-16 12:47 ` Alexander Graf
0 siblings, 0 replies; 68+ messages in thread
From: Alexander Graf @ 2013-09-16 12:47 UTC (permalink / raw)
To: Paul Mackerras
Cc: Benjamin Herrenschmidt, kvm-ppc@vger.kernel.org,
kvm@vger.kernel.org
Am 15.09.2013 um 23:12 schrieb Paul Mackerras <paulus@samba.org>:
> On Sat, Sep 14, 2013 at 03:23:53PM -0500, Alexander Graf wrote:
>>
>>
>> Am 14.09.2013 um 00:24 schrieb Paul Mackerras <paulus@samba.org>:
>>
>>> Bootup (F19 guest, 3 runs):
>>>
>>> Without the patch: average 20.12 seconds, st. dev. 0.17 seconds
>>> With the patch: 20.47 seconds, st. dev. 0.19 seconds
>>>
>>> Delta: 0.35 seconds, or 1.7%.
>>>
>>> time for i in $(seq 1000); do /bin/echo $i >/dev/null; done:
>>>
>>> Without the patch: average 7.27 seconds, st. dev. 0.23 seconds
>>> With the patch: average 7.55 seconds, st. dev. 0.39 seconds
>>>
>>> Delta: 0.28 seconds, or 3.8%.
>>>
>>> So there appears to be a small effect, of a few percent.
>>
>> So in the normal case it slows us down, but allows KSM to be effective. Do we actually want this change then?
>
> I was a bit puzzled why there was a measurable slowdown until I
> remembered that this patch was intended to go along with the patch
> "powerpc: Implement __get_user_pages_fast()", which Ben took and which
> is now upstream in Linus' tree (1f7bf028). So, I applied that patch
> on top of this "Better handling of host-side read-only pages" patch,
> and did the same measurements. The results were:
>
> Bootup (F19 guest, 3 runs): average 20.05 seconds, st. dev. 0.53s
>
> 1000 /bin/echo (4 runs): average 7.27 seconds, st. dev. 0.032s
>
> So with both patches applied there is no slowdown at all, and KSM
> works properly. I think we want this patch.
Ah, cool. Works for me then. Please resend it in a new set along with the other ones that didn't make it in yet :).
Alex
>
> Paul.
^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: [PATCH 19/23] KVM: PPC: Book3S: Select PR vs HV separately for each guest
2013-09-13 4:17 ` Alexander Graf
@ 2013-09-18 12:05 ` Paul Mackerras
2013-09-19 7:31 ` Alexander Graf
0 siblings, 1 reply; 68+ messages in thread
From: Paul Mackerras @ 2013-09-18 12:05 UTC (permalink / raw)
To: Alexander Graf
Cc: Benjamin Herrenschmidt, kvm-ppc@vger.kernel.org,
kvm@vger.kernel.org
On Thu, Sep 12, 2013 at 11:17:11PM -0500, Alexander Graf wrote:
>
> It means you can only choose between HV and PR machine-wide, while with this patch set you give the user the flexibility to have HV and PR guests run in parallel.
>
> I know that Anthony doesn't believe it's a valid use case, but I like the flexible solution better. It does however make sense to enable a sysadmin to remove any PR functionality from the system by blocking that module.
>
> Can't we have both?
So, one suggestion (from Aneesh) is to use the 'type' argument to
kvm_arch_init_vm() to indicate whether we want a specific type of KVM
(PR or HV), or just the default. Zero would mean default (fastest
available) whereas other values would indicate a specific choice of PR
or HV. Then, if we build separate kvm_pr and kvm_hv modules when KVM
is configured to be a module, the sysadmin can control the default
choice by loading and unloading modules.
How does that sound? Or would you prefer to stick with a single
module and have a module option to control the default choice?
Paul.
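To make the suggestion concrete, a rough sketch of how the 'type' argument could be dispatched on in kvm_arch_init_vm(). The KVM_VM_TYPE_* values, the kvmppc_hv_ops/kvmppc_pr_ops pointers and the ops indirection are all hypothetical placeholders for whatever the kvm_hv and kvm_pr modules would register at load time; this is not the code that was eventually merged.

#define KVM_VM_TYPE_DEFAULT	0	/* hypothetical: fastest available */
#define KVM_VM_TYPE_HV		1	/* hypothetical: force HV */
#define KVM_VM_TYPE_PR		2	/* hypothetical: force PR */

extern struct kvmppc_ops *kvmppc_hv_ops;	/* set when kvm_hv loads (assumed) */
extern struct kvmppc_ops *kvmppc_pr_ops;	/* set when kvm_pr loads (assumed) */

int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
{
	struct kvmppc_ops *ops;

	switch (type) {
	case KVM_VM_TYPE_HV:
		ops = kvmppc_hv_ops;		/* NULL unless kvm_hv is loaded */
		break;
	case KVM_VM_TYPE_PR:
		ops = kvmppc_pr_ops;		/* NULL unless kvm_pr is loaded */
		break;
	case KVM_VM_TYPE_DEFAULT:
		/* prefer HV when the hardware and module allow it, else PR */
		ops = kvmppc_hv_ops ? kvmppc_hv_ops : kvmppc_pr_ops;
		break;
	default:
		return -EINVAL;
	}

	if (!ops)
		return -EINVAL;			/* requested flavour not available */

	kvm->arch.kvm_ops = ops;
	return ops->init_vm(kvm);
}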
^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: [PATCH 19/23] KVM: PPC: Book3S: Select PR vs HV separately for each guest
2013-09-18 12:05 ` Paul Mackerras
@ 2013-09-19 7:31 ` Alexander Graf
0 siblings, 0 replies; 68+ messages in thread
From: Alexander Graf @ 2013-09-19 7:31 UTC (permalink / raw)
To: Paul Mackerras
Cc: Benjamin Herrenschmidt, kvm-ppc@vger.kernel.org,
kvm@vger.kernel.org
On 18.09.2013 at 07:05, Paul Mackerras <paulus@samba.org> wrote:
> On Thu, Sep 12, 2013 at 11:17:11PM -0500, Alexander Graf wrote:
>>
>> It means you can only choose between HV and PR machine-wide, while with this patch set you give the user the flexibility to have HV and PR guests run in parallel.
>>
>> I know that Anthony doesn't believe it's a valid use case, but I like the flexible solution better. It does however make sense to enable a sysadmin to remove any PR functionality from the system by blocking that module.
>>
>> Can't we have both?
>
> So, one suggestion (from Aneesh) is to use the 'type' argument to
> kvm_arch_init_vm() to indicate whether we want a specific type of KVM
> (PR or HV), or just the default. Zero would mean default (fastest
> available) whereas other values would indicate a specific choice of PR
> or HV. Then, if we build separate kvm_pr and kvm_hv modules when KVM
> is configured to be a module, the sysadmin can control the default
> choice by loading and unloading modules.
>
> How does that sound? Or would you prefer to stick with a single
> module and have a module option to control the default choice?
I think keeping 2 modules makes a lot of sense, but I'm not sure a parameter to init_vm works well with the way we model machines in QEMU. IIRC we only find out whether we want to force anything during machine model initialization, which happens well after the VM init.
Alex
>
> Paul.
^ permalink raw reply [flat|nested] 68+ messages in thread
end of thread, other threads: [~2013-09-19 7:31 UTC | newest]
Thread overview: 68+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-08-06 4:12 [PATCH 00/23] Allow PR and HV KVM to coexist in one kernel Paul Mackerras
2013-08-06 4:13 ` [PATCH 01/23] KVM: PPC: Book3S: Fix compile error in XICS emulation Paul Mackerras
2013-08-28 22:51 ` Alexander Graf
2013-08-06 4:14 ` [PATCH 02/23] KVM: PPC: Book3S PR: Don't corrupt guest state when kernel uses VMX Paul Mackerras
2013-08-08 15:49 ` Aneesh Kumar K.V
2013-08-28 22:51 ` Alexander Graf
2013-08-06 4:15 ` [PATCH 03/23] KVM: PPC: Book3S PR: Make instruction fetch fallback work for system calls Paul Mackerras
2013-08-28 22:51 ` Alexander Graf
2013-08-06 4:16 ` [PATCH 04/23] KVM: PPC: Book3S PR: Keep volatile reg values in vcpu rather than shadow_vcpu Paul Mackerras
2013-08-11 11:06 ` Aneesh Kumar K.V
2013-08-28 22:00 ` Alexander Graf
2013-08-29 5:04 ` Paul Mackerras
2013-08-29 12:46 ` Alexander Graf
2013-08-06 4:18 ` [PATCH 05/23] KVM: PPC: Book3S PR: Rework kvmppc_mmu_book3s_64_xlate() Paul Mackerras
2013-08-28 22:51 ` Alexander Graf
2013-08-06 4:18 ` [PATCH 06/23] KVM: PPC: Book3S PR: Allow guest to use 64k pages Paul Mackerras
2013-08-28 22:56 ` Alexander Graf
2013-08-29 5:17 ` Paul Mackerras
2013-08-29 12:48 ` Alexander Graf
2013-08-06 4:19 ` [PATCH 07/23] KVM: PPC: Book3S PR: Use 64k host pages where possible Paul Mackerras
2013-08-28 23:24 ` Alexander Graf
2013-08-29 5:23 ` Paul Mackerras
2013-08-29 12:43 ` Alexander Graf
2013-08-06 4:20 ` [PATCH 08/23] KVM: PPC: Book3S PR: Handle PP0 page-protection bit in guest HPTEs Paul Mackerras
2013-08-06 4:20 ` [PATCH 09/23] KVM: PPC: Book3S PR: Correct errors in H_ENTER implementation Paul Mackerras
2013-08-06 4:21 ` [PATCH 10/23] KVM: PPC: Book3S PR: Make HPT accesses and updates SMP-safe Paul Mackerras
2013-08-06 4:21 ` [PATCH 11/23] KVM: PPC: Book3S PR: Allocate kvm_vcpu structs from kvm_vcpu_cache Paul Mackerras
2013-08-12 10:03 ` Aneesh Kumar K.V
2013-08-06 4:22 ` [PATCH 12/23] KVM: PPC: Book3S HV: Better handling of exceptions that happen in real mode Paul Mackerras
2013-08-06 4:22 ` [PATCH 13/23] KVM: PPC: Book3S: Move skip-interrupt handlers to common code Paul Mackerras
2013-08-06 4:23 ` [PATCH 14/23] KVM: PPC: Book3S PR: Delay disabling relocation-on interrupts Paul Mackerras
2013-08-30 16:30 ` Alexander Graf
2013-08-30 22:55 ` Paul Mackerras
2013-08-30 23:13 ` Alexander Graf
2013-08-31 5:42 ` Paul Mackerras
2013-08-06 4:24 ` [PATCH 15/23] KVM: PPC: Book3S: Rename symbols that exist in both PR and HV KVM Paul Mackerras
2013-08-06 4:24 ` [PATCH 16/23] KVM: PPC: Book3S: Merge implementations of KVM_PPC_GET_SMMU_INFO ioctl Paul Mackerras
2013-08-06 4:25 ` [PATCH 17/23] KVM: PPC: Book3S HV: Factorize kvmppc_core_vcpu_create_hv() Paul Mackerras
2013-08-06 4:25 ` [PATCH 18/23] KVM: PPC: Book3S: Allow both PR and HV KVM to be selected Paul Mackerras
2013-08-06 4:26 ` [PATCH 19/23] KVM: PPC: Book3S: Select PR vs HV separately for each guest Paul Mackerras
2013-09-12 22:56 ` Alexander Graf
2013-09-13 0:17 ` Paul Mackerras
2013-09-13 1:31 ` Benjamin Herrenschmidt
2013-09-13 4:18 ` Alexander Graf
2013-09-14 18:33 ` Aneesh Kumar K.V
2013-09-14 20:22 ` Alexander Graf
2013-09-15 9:16 ` Aneesh Kumar K.V
2013-09-15 11:55 ` Alexander Graf
2013-09-13 4:17 ` Alexander Graf
2013-09-18 12:05 ` Paul Mackerras
2013-09-19 7:31 ` Alexander Graf
2013-08-06 4:27 ` [PATCH 20/23] KVM: PPC: Book3S PR: Better handling of host-side read-only pages Paul Mackerras
2013-09-12 23:01 ` Alexander Graf
2013-09-13 0:23 ` Paul Mackerras
2013-09-14 5:24 ` Paul Mackerras
2013-09-14 20:23 ` Alexander Graf
2013-09-16 4:12 ` Paul Mackerras
2013-09-16 12:47 ` Alexander Graf
2013-08-06 4:27 ` [PATCH 21/23] KVM: PPC: Book3S PR: Use mmu_notifier_retry() in kvmppc_mmu_map_page() Paul Mackerras
2013-08-07 4:13 ` Bhushan Bharat-R65777
2013-08-07 4:28 ` Paul Mackerras
2013-08-07 5:18 ` Bhushan Bharat-R65777
2013-08-07 5:17 ` Bhushan Bharat-R65777
2013-08-07 8:27 ` Paul Mackerras
2013-08-07 8:31 ` Bhushan Bharat-R65777
2013-08-08 12:06 ` Paul Mackerras
2013-08-06 4:27 ` [PATCH 22/23] KVM: PPC: Book3S PR: Mark pages accessed, and dirty if being written Paul Mackerras
2013-08-06 4:28 ` [PATCH 23/23] KVM: PPC: Book3S PR: Reduce number of shadow PTEs invalidated by MMU notifiers Paul Mackerras