linuxppc-dev.lists.ozlabs.org archive mirror
* [PATCH 1/4] KVM: PPC: e500mc: Revert "add load inst fixup"
@ 2014-02-20 16:30 Mihai Caraman
  2014-02-20 16:30 ` [PATCH 2/4] KVM: PPC: Book3e: Add TLBSEL/TSIZE defines for MAS0/1 Mihai Caraman
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Mihai Caraman @ 2014-02-20 16:30 UTC (permalink / raw)
  To: kvm-ppc; +Cc: Mihai Caraman, linuxppc-dev, kvm

Load external pid (lwepx) execution faults, generated when KVM
accesses guest contexts, need to be handled by KVM. In the current
implementation the host kernel handles lwepx faults, searching for the
faulting address in its own Logical Partition ID (LPID) context! If a
host translation is found, execution returns to the lwepx instruction
instead of to the fixup, ending up in an infinite loop.

Revert commit 1d628af7 "add load inst fixup". We will address the
lwepx issue in a subsequent patch without the need for fixups.
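
For clarity, the failure sequence described above is roughly the
following (an illustrative sketch, not actual kernel code):

	/*
	 * 1. KVM sets up EPLC for the guest context and issues lwepx.
	 * 2. lwepx faults; the host kernel handles the fault, searching
	 *    for the faulting address in its own LPID context, and finds
	 *    a host translation, so the fixup handler is never reached.
	 * 3. Execution returns to the lwepx instruction, which faults
	 *    again in the guest context: an infinite loop.
	 */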

Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
---
 arch/powerpc/kvm/bookehv_interrupts.S |   26 +-------------------------
 1 files changed, 1 insertions(+), 25 deletions(-)

diff --git a/arch/powerpc/kvm/bookehv_interrupts.S b/arch/powerpc/kvm/bookehv_interrupts.S
index e8ed7d6..20c7a54 100644
--- a/arch/powerpc/kvm/bookehv_interrupts.S
+++ b/arch/powerpc/kvm/bookehv_interrupts.S
@@ -29,7 +29,6 @@
 #include <asm/asm-compat.h>
 #include <asm/asm-offsets.h>
 #include <asm/bitsperlong.h>
-#include <asm/thread_info.h>
 
 #ifdef CONFIG_64BIT
 #include <asm/exception-64e.h>
@@ -162,32 +161,9 @@
 	PPC_STL	r30, VCPU_GPR(R30)(r4)
 	PPC_STL	r31, VCPU_GPR(R31)(r4)
 	mtspr	SPRN_EPLC, r8
-
-	/* disable preemption, so we are sure we hit the fixup handler */
-	CURRENT_THREAD_INFO(r8, r1)
-	li	r7, 1
-	stw	r7, TI_PREEMPT(r8)
-
 	isync
-
-	/*
-	 * In case the read goes wrong, we catch it and write an invalid value
-	 * in LAST_INST instead.
-	 */
-1:	lwepx	r9, 0, r5
-2:
-.section .fixup, "ax"
-3:	li	r9, KVM_INST_FETCH_FAILED
-	b	2b
-.previous
-.section __ex_table,"a"
-	PPC_LONG_ALIGN
-	PPC_LONG 1b,3b
-.previous
-
+	lwepx   r9, 0, r5
 	mtspr	SPRN_EPLC, r3
-	li	r7, 0
-	stw	r7, TI_PREEMPT(r8)
 	stw	r9, VCPU_LAST_INST(r4)
 	.endif
 
-- 
1.7.3.4

* [PATCH 2/4] KVM: PPC: Book3e: Add TLBSEL/TSIZE defines for MAS0/1
  2014-02-20 16:30 [PATCH 1/4] KVM: PPC: e500mc: Revert "add load inst fixup" Mihai Caraman
@ 2014-02-20 16:30 ` Mihai Caraman
  2014-02-20 16:30 ` [PATCH 3/4] KVM: PPC: Allow kvmppc_get_last_inst() to fail Mihai Caraman
  2014-02-20 16:30 ` [PATCH 4/4] KVM: PPC: Bookehv: Get vcpu's last instruction for emulation Mihai Caraman
  2 siblings, 0 replies; 12+ messages in thread
From: Mihai Caraman @ 2014-02-20 16:30 UTC (permalink / raw)
  To: kvm-ppc; +Cc: Mihai Caraman, linuxppc-dev, kvm

Add missing defines MAS0_GET_TLBSEL() and MAS1_GET_TSIZE() for Book3E.

Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
---
 arch/powerpc/include/asm/mmu-book3e.h |    6 +++++-
 1 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/include/asm/mmu-book3e.h b/arch/powerpc/include/asm/mmu-book3e.h
index 936db36..2b23313 100644
--- a/arch/powerpc/include/asm/mmu-book3e.h
+++ b/arch/powerpc/include/asm/mmu-book3e.h
@@ -40,7 +40,10 @@
 
 /* MAS registers bit definitions */
 
-#define MAS0_TLBSEL(x)		(((x) << 28) & 0x30000000)
+#define MAS0_TLBSEL_MASK	0x30000000
+#define MAS0_TLBSEL_SHIFT	28
+#define MAS0_TLBSEL(x)		(((x) << MAS0_TLBSEL_SHIFT) & MAS0_TLBSEL_MASK)
+#define MAS0_GET_TLBSEL(mas0)	(((mas0) & MAS0_TLBSEL_MASK) >> MAS0_TLBSEL_SHIFT)
 #define MAS0_ESEL_MASK		0x0FFF0000
 #define MAS0_ESEL_SHIFT		16
 #define MAS0_ESEL(x)		(((x) << MAS0_ESEL_SHIFT) & MAS0_ESEL_MASK)
@@ -58,6 +61,7 @@
 #define MAS1_TSIZE_MASK		0x00000f80
 #define MAS1_TSIZE_SHIFT	7
 #define MAS1_TSIZE(x)		(((x) << MAS1_TSIZE_SHIFT) & MAS1_TSIZE_MASK)
+#define MAS1_GET_TSIZE(mas1)	(((mas1) & MAS1_TSIZE_MASK) >> MAS1_TSIZE_SHIFT)
 
 #define MAS2_EPN		(~0xFFFUL)
 #define MAS2_X0			0x00000040
-- 
1.7.3.4
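
For illustration, patch 4 in this series uses the new getters to
recover the page size of a TLB entry read back through tlbsx, roughly
like this (a sketch of that usage):

	u32 mas0 = mfspr(SPRN_MAS0);
	u32 mas1 = mfspr(SPRN_MAS1);
	unsigned int psize_shift;

	if (MAS0_GET_TLBSEL(mas0) == 0)		/* entry lives in TLB0 */
		psize_shift = PAGE_SHIFT;	/* fixed 4K pages */
	else					/* TLB1: variable-size pages */
		psize_shift = MAS1_GET_TSIZE(mas1) + 10;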

* [PATCH 3/4] KVM: PPC: Allow kvmppc_get_last_inst() to fail
  2014-02-20 16:30 [PATCH 1/4] KVM: PPC: e500mc: Revert "add load inst fixup" Mihai Caraman
  2014-02-20 16:30 ` [PATCH 2/4] KVM: PPC: Book3e: Add TLBSEL/TSIZE defines for MAS0/1 Mihai Caraman
@ 2014-02-20 16:30 ` Mihai Caraman
  2014-03-26 20:52   ` Scott Wood
  2014-02-20 16:30 ` [PATCH 4/4] KVM: PPC: Bookehv: Get vcpu's last instruction for emulation Mihai Caraman
  2 siblings, 1 reply; 12+ messages in thread
From: Mihai Caraman @ 2014-02-20 16:30 UTC (permalink / raw)
  To: kvm-ppc; +Cc: Mihai Caraman, linuxppc-dev, kvm

On book3e we get the last instruction on the exit path, using the
dedicated load external pid (lwepx) instruction. The current
implementation proved to be buggy, and the alternative, hooking
lwepx exceptions into KVM, is considered too intrusive for the host.

In the proposed solution we get the instruction by mapping the guest
memory in the host, which requires locating the guest TLB entry.
This may fail if the guest TLB entry gets evicted. In the failure
case we force a TLB update by re-executing the guest instruction,
which generates a guest TLB miss exception.

Alex G. suggested fetching the last instruction at
kvmppc_get_last_inst() time and handling the TLB search miss in its
call chain. This patch allows the function to fail, by returning an
emulation_result error code, and handles the EMULATE_AGAIN error code
in the emulation layer (leading to guest instruction re-execution).

The common emulation_result enum is not accessible from the
subarch-specific booke headers, so this patch moves the
kvmppc_get_last_inst() definitions into the book3s and booke source
files.
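
The intended calling convention, as a minimal sketch (it mirrors the
emulate.c hunk below):

	u32 inst;
	enum emulation_result emulated;

	emulated = kvmppc_get_last_inst(vcpu, &inst);
	if (emulated != EMULATE_DONE)
		return emulated;  /* e.g. EMULATE_AGAIN: re-execute the guest instruction */
	/* ... decode and emulate inst ... */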

Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>

Please sanity check the Book3S changes; I did the integration blindly.
---
 arch/powerpc/include/asm/kvm_book3s.h    |   29 ----------------------------
 arch/powerpc/include/asm/kvm_booke.h     |    5 ----
 arch/powerpc/include/asm/kvm_ppc.h       |    2 +
 arch/powerpc/kvm/book3s.c                |   31 ++++++++++++++++++++++++++++++
 arch/powerpc/kvm/book3s_paired_singles.c |   29 ++++++++++++++++-----------
 arch/powerpc/kvm/book3s_pr.c             |   27 +++++++++++++++++--------
 arch/powerpc/kvm/booke.c                 |   16 ++++++++++++++-
 arch/powerpc/kvm/booke.h                 |    5 ++++
 arch/powerpc/kvm/e500_mmu_host.c         |    5 ++++
 arch/powerpc/kvm/emulate.c               |   19 ++++++++++++-----
 arch/powerpc/kvm/powerpc.c               |   10 +++++++-
 11 files changed, 114 insertions(+), 64 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h
index bc23b1b..6048eea 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -271,35 +271,6 @@ static inline ulong kvmppc_get_pc(struct kvm_vcpu *vcpu)
 	return vcpu->arch.pc;
 }
 
-static inline u32 kvmppc_get_last_inst(struct kvm_vcpu *vcpu)
-{
-	ulong pc = kvmppc_get_pc(vcpu);
-
-	/* Load the instruction manually if it failed to do so in the
-	 * exit path */
-	if (vcpu->arch.last_inst == KVM_INST_FETCH_FAILED)
-		kvmppc_ld(vcpu, &pc, sizeof(u32), &vcpu->arch.last_inst, false);
-
-	return vcpu->arch.last_inst;
-}
-
-/*
- * Like kvmppc_get_last_inst(), but for fetching a sc instruction.
- * Because the sc instruction sets SRR0 to point to the following
- * instruction, we have to fetch from pc - 4.
- */
-static inline u32 kvmppc_get_last_sc(struct kvm_vcpu *vcpu)
-{
-	ulong pc = kvmppc_get_pc(vcpu) - 4;
-
-	/* Load the instruction manually if it failed to do so in the
-	 * exit path */
-	if (vcpu->arch.last_inst == KVM_INST_FETCH_FAILED)
-		kvmppc_ld(vcpu, &pc, sizeof(u32), &vcpu->arch.last_inst, false);
-
-	return vcpu->arch.last_inst;
-}
-
 static inline ulong kvmppc_get_fault_dar(struct kvm_vcpu *vcpu)
 {
 	return vcpu->arch.fault_dar;
diff --git a/arch/powerpc/include/asm/kvm_booke.h b/arch/powerpc/include/asm/kvm_booke.h
index dd8f615..7a85a69 100644
--- a/arch/powerpc/include/asm/kvm_booke.h
+++ b/arch/powerpc/include/asm/kvm_booke.h
@@ -63,11 +63,6 @@ static inline u32 kvmppc_get_xer(struct kvm_vcpu *vcpu)
 	return vcpu->arch.xer;
 }
 
-static inline u32 kvmppc_get_last_inst(struct kvm_vcpu *vcpu)
-{
-	return vcpu->arch.last_inst;
-}
-
 static inline void kvmppc_set_ctr(struct kvm_vcpu *vcpu, ulong val)
 {
 	vcpu->arch.ctr = val;
diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index c8317fb..bc0cd21 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -71,6 +71,8 @@ extern int kvmppc_sanity_check(struct kvm_vcpu *vcpu);
 extern int kvmppc_subarch_vcpu_init(struct kvm_vcpu *vcpu);
 extern void kvmppc_subarch_vcpu_uninit(struct kvm_vcpu *vcpu);
 
+extern int kvmppc_get_last_inst(struct kvm_vcpu *vcpu, u32 *inst);
+
 /* Core-specific hooks */
 
 extern void kvmppc_mmu_map(struct kvm_vcpu *vcpu, u64 gvaddr, gpa_t gpaddr,
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index 8912608..a86f7a4 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -461,6 +461,37 @@ mmio:
 }
 EXPORT_SYMBOL_GPL(kvmppc_ld);
 
+int kvmppc_get_last_inst(struct kvm_vcpu *vcpu, u32 *inst)
+{
+	ulong pc = kvmppc_get_pc(vcpu);
+
+	/* Load the instruction manually if it failed to do so in the
+	 * exit path */
+	if (vcpu->arch.last_inst == KVM_INST_FETCH_FAILED)
+		kvmppc_ld(vcpu, &pc, sizeof(u32), &vcpu->arch.last_inst, false);
+	*inst = vcpu->arch.last_inst;
+
+	return EMULATE_DONE;
+}
+
+/*
+ * Like kvmppc_get_last_inst(), but for fetching a sc instruction.
+ * Because the sc instruction sets SRR0 to point to the following
+ * instruction, we have to fetch from pc - 4.
+ */
+int kvmppc_get_last_sc(struct kvm_vcpu *vcpu, u32 *inst)
+{
+	ulong pc = kvmppc_get_pc(vcpu) - 4;
+
+	/* Load the instruction manually if it failed to do so in the
+	 * exit path */
+	if (vcpu->arch.last_inst == KVM_INST_FETCH_FAILED)
+		kvmppc_ld(vcpu, &pc, sizeof(u32), &vcpu->arch.last_inst, false);
+	*inst = vcpu->arch.last_inst;
+
+	return EMULATE_DONE;
+}
+
 int kvm_arch_vcpu_setup(struct kvm_vcpu *vcpu)
 {
 	return 0;
diff --git a/arch/powerpc/kvm/book3s_paired_singles.c b/arch/powerpc/kvm/book3s_paired_singles.c
index a59a25a..80c533e 100644
--- a/arch/powerpc/kvm/book3s_paired_singles.c
+++ b/arch/powerpc/kvm/book3s_paired_singles.c
@@ -640,19 +640,24 @@ static int kvmppc_ps_one_in(struct kvm_vcpu *vcpu, bool rc,
 
 int kvmppc_emulate_paired_single(struct kvm_run *run, struct kvm_vcpu *vcpu)
 {
-	u32 inst = kvmppc_get_last_inst(vcpu);
+	u32 inst;
 	enum emulation_result emulated = EMULATE_DONE;
-
-	int ax_rd = inst_get_field(inst, 6, 10);
-	int ax_ra = inst_get_field(inst, 11, 15);
-	int ax_rb = inst_get_field(inst, 16, 20);
-	int ax_rc = inst_get_field(inst, 21, 25);
-	short full_d = inst_get_field(inst, 16, 31);
-
-	u64 *fpr_d = &vcpu->arch.fpr[ax_rd];
-	u64 *fpr_a = &vcpu->arch.fpr[ax_ra];
-	u64 *fpr_b = &vcpu->arch.fpr[ax_rb];
-	u64 *fpr_c = &vcpu->arch.fpr[ax_rc];
+	int ax_rd, ax_ra, ax_rb, ax_rc;
+	short full_d;
+	u64 *fpr_d, *fpr_a, *fpr_b, *fpr_c;
+
+	kvmppc_get_last_inst(vcpu, &inst);
+
+	ax_rd = inst_get_field(inst, 6, 10);
+	ax_ra = inst_get_field(inst, 11, 15);
+	ax_rb = inst_get_field(inst, 16, 20);
+	ax_rc = inst_get_field(inst, 21, 25);
+	full_d = inst_get_field(inst, 16, 31);
+
+	fpr_d = &vcpu->arch.fpr[ax_rd];
+	fpr_a = &vcpu->arch.fpr[ax_ra];
+	fpr_b = &vcpu->arch.fpr[ax_rb];
+	fpr_c = &vcpu->arch.fpr[ax_rc];
 
 	bool rcomp = (inst & 1) ? true : false;
 	u32 cr = kvmppc_get_cr(vcpu);
diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
index 5b9e906..b0d884d 100644
--- a/arch/powerpc/kvm/book3s_pr.c
+++ b/arch/powerpc/kvm/book3s_pr.c
@@ -624,9 +624,10 @@ void kvmppc_giveup_ext(struct kvm_vcpu *vcpu, ulong msr)
 static int kvmppc_read_inst(struct kvm_vcpu *vcpu)
 {
 	ulong srr0 = kvmppc_get_pc(vcpu);
-	u32 last_inst = kvmppc_get_last_inst(vcpu);
+	u32 last_inst;
 	int ret;
 
+	kvmppc_get_last_inst(vcpu, &last_inst);
 	ret = kvmppc_ld(vcpu, &srr0, sizeof(u32), &last_inst, false);
 	if (ret == -ENOENT) {
 		ulong msr = vcpu->arch.shared->msr;
@@ -890,15 +891,17 @@ int kvmppc_handle_exit_pr(struct kvm_run *run, struct kvm_vcpu *vcpu,
 	{
 		enum emulation_result er;
 		ulong flags;
+		u32 last_inst;
 
 program_interrupt:
 		flags = vcpu->arch.shadow_srr1 & 0x1f0000ull;
+		kvmppc_get_last_inst(vcpu, &last_inst);
 
 		if (vcpu->arch.shared->msr & MSR_PR) {
 #ifdef EXIT_DEBUG
-			printk(KERN_INFO "Userspace triggered 0x700 exception at 0x%lx (0x%x)\n", kvmppc_get_pc(vcpu), kvmppc_get_last_inst(vcpu));
+			printk(KERN_INFO "Userspace triggered 0x700 exception at 0x%lx (0x%x)\n", kvmppc_get_pc(vcpu), last_inst);
 #endif
-			if ((kvmppc_get_last_inst(vcpu) & 0xff0007ff) !=
+			if ((last_inst & 0xff0007ff) !=
 			    (INS_DCBZ & 0xfffffff7)) {
 				kvmppc_core_queue_program(vcpu, flags);
 				r = RESUME_GUEST;
@@ -917,7 +920,7 @@ program_interrupt:
 			break;
 		case EMULATE_FAIL:
 			printk(KERN_CRIT "%s: emulation at %lx failed (%08x)\n",
-			       __func__, kvmppc_get_pc(vcpu), kvmppc_get_last_inst(vcpu));
+			       __func__, kvmppc_get_pc(vcpu), last_inst);
 			kvmppc_core_queue_program(vcpu, flags);
 			r = RESUME_GUEST;
 			break;
@@ -934,8 +937,11 @@ program_interrupt:
 		break;
 	}
 	case BOOK3S_INTERRUPT_SYSCALL:
+	{
+		u32 last_sc;
+		kvmppc_get_last_sc(vcpu, &last_sc);
 		if (vcpu->arch.papr_enabled &&
-		    (kvmppc_get_last_sc(vcpu) == 0x44000022) &&
+		    (last_sc == 0x44000022) &&
 		    !(vcpu->arch.shared->msr & MSR_PR)) {
 			/* SC 1 papr hypercalls */
 			ulong cmd = kvmppc_get_gpr(vcpu, 3);
@@ -980,6 +986,7 @@ program_interrupt:
 			r = RESUME_GUEST;
 		}
 		break;
+	}
 	case BOOK3S_INTERRUPT_FP_UNAVAIL:
 	case BOOK3S_INTERRUPT_ALTIVEC:
 	case BOOK3S_INTERRUPT_VSX:
@@ -1008,15 +1015,17 @@ program_interrupt:
 		break;
 	}
 	case BOOK3S_INTERRUPT_ALIGNMENT:
+	{
+		u32 last_inst;
+		kvmppc_get_last_inst(vcpu, &last_inst);
 		if (kvmppc_read_inst(vcpu) == EMULATE_DONE) {
-			vcpu->arch.shared->dsisr = kvmppc_alignment_dsisr(vcpu,
-				kvmppc_get_last_inst(vcpu));
-			vcpu->arch.shared->dar = kvmppc_alignment_dar(vcpu,
-				kvmppc_get_last_inst(vcpu));
+			vcpu->arch.shared->dsisr = kvmppc_alignment_dsisr(vcpu, last_inst);
+			vcpu->arch.shared->dar = kvmppc_alignment_dar(vcpu, last_inst);
 			kvmppc_book3s_queue_irqprio(vcpu, exit_nr);
 		}
 		r = RESUME_GUEST;
 		break;
+	}
 	case BOOK3S_INTERRUPT_MACHINE_CHECK:
 	case BOOK3S_INTERRUPT_TRACE:
 		kvmppc_book3s_queue_irqprio(vcpu, exit_nr);
diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index 0591e05..886e511 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -201,7 +201,7 @@ static void kvmppc_core_queue_data_storage(struct kvm_vcpu *vcpu,
 	kvmppc_booke_queue_irqprio(vcpu, BOOKE_IRQPRIO_DATA_STORAGE);
 }
 
-static void kvmppc_core_queue_inst_storage(struct kvm_vcpu *vcpu,
+void kvmppc_core_queue_inst_storage(struct kvm_vcpu *vcpu,
                                            ulong esr_flags)
 {
 	vcpu->arch.queued_esr = esr_flags;
@@ -772,6 +772,9 @@ static int emulation_exit(struct kvm_run *run, struct kvm_vcpu *vcpu)
 		 * they were actually modified by emulation. */
 		return RESUME_GUEST_NV;
 
+	case EMULATE_AGAIN:
+		return RESUME_GUEST;
+
 	case EMULATE_DO_DCR:
 		run->exit_reason = KVM_EXIT_DCR;
 		return RESUME_HOST;
@@ -1943,6 +1946,17 @@ void kvmppc_core_vcpu_put(struct kvm_vcpu *vcpu)
 	vcpu->kvm->arch.kvm_ops->vcpu_put(vcpu);
 }
 
+int kvmppc_get_last_inst(struct kvm_vcpu *vcpu, u32 *inst)
+{
+	int result = EMULATE_DONE;
+
+	if (vcpu->arch.last_inst == KVM_INST_FETCH_FAILED)
+		result = kvmppc_ld_inst(vcpu, &vcpu->arch.last_inst);
+	*inst = vcpu->arch.last_inst;
+
+	return result;
+}
+
 int __init kvmppc_booke_init(void)
 {
 #ifndef CONFIG_KVM_BOOKE_HV
diff --git a/arch/powerpc/kvm/booke.h b/arch/powerpc/kvm/booke.h
index 09bfd9b..c7c60c2 100644
--- a/arch/powerpc/kvm/booke.h
+++ b/arch/powerpc/kvm/booke.h
@@ -90,6 +90,9 @@ void kvmppc_vcpu_disable_spe(struct kvm_vcpu *vcpu);
 void kvmppc_booke_vcpu_load(struct kvm_vcpu *vcpu, int cpu);
 void kvmppc_booke_vcpu_put(struct kvm_vcpu *vcpu);
 
+void kvmppc_core_queue_inst_storage(struct kvm_vcpu *vcpu,
+                                           ulong esr_flags);
+
 enum int_class {
 	INT_CLASS_NONCRIT,
 	INT_CLASS_CRIT,
@@ -123,6 +126,8 @@ extern int kvmppc_core_emulate_mtspr_e500(struct kvm_vcpu *vcpu, int sprn,
 extern int kvmppc_core_emulate_mfspr_e500(struct kvm_vcpu *vcpu, int sprn,
 					  ulong *spr_val);
 
+extern int kvmppc_ld_inst(struct kvm_vcpu *vcpu, u32 *instr) ;
+
 /*
  * Load up guest vcpu FP state if it's needed.
  * It also set the MSR_FP in thread so that host know
diff --git a/arch/powerpc/kvm/e500_mmu_host.c b/arch/powerpc/kvm/e500_mmu_host.c
index ecf2247..6025cb7 100644
--- a/arch/powerpc/kvm/e500_mmu_host.c
+++ b/arch/powerpc/kvm/e500_mmu_host.c
@@ -34,6 +34,7 @@
 #include "e500.h"
 #include "timing.h"
 #include "e500_mmu_host.h"
+#include "booke.h"
 
 #include "trace_booke.h"
 
@@ -597,6 +598,10 @@ void kvmppc_mmu_map(struct kvm_vcpu *vcpu, u64 eaddr, gpa_t gpaddr,
 	}
 }
 
+int kvmppc_ld_inst(struct kvm_vcpu *vcpu, u32 *instr) {
+	return EMULATE_FAIL;
+};
+
 /************* MMU Notifiers *************/
 
 int kvm_unmap_hva(struct kvm *kvm, unsigned long hva)
diff --git a/arch/powerpc/kvm/emulate.c b/arch/powerpc/kvm/emulate.c
index 2f9a087..24a8e50 100644
--- a/arch/powerpc/kvm/emulate.c
+++ b/arch/powerpc/kvm/emulate.c
@@ -225,19 +225,26 @@ static int kvmppc_emulate_mfspr(struct kvm_vcpu *vcpu, int sprn, int rt)
  * from opcode tables in the future. */
 int kvmppc_emulate_instruction(struct kvm_run *run, struct kvm_vcpu *vcpu)
 {
-	u32 inst = kvmppc_get_last_inst(vcpu);
-	int ra = get_ra(inst);
-	int rs = get_rs(inst);
-	int rt = get_rt(inst);
-	int sprn = get_sprn(inst);
-	enum emulation_result emulated = EMULATE_DONE;
+	u32 inst;
+	int ra, rs, rt, sprn;
+	enum emulation_result emulated;
 	int advance = 1;
 
 	/* this default type might be overwritten by subcategories */
 	kvmppc_set_exit_type(vcpu, EMULATED_INST_EXITS);
 
+	emulated = kvmppc_get_last_inst(vcpu, &inst);
+	if (emulated != EMULATE_DONE) {
+		return emulated;
+	}
+
 	pr_debug("Emulating opcode %d / %d\n", get_op(inst), get_xop(inst));
 
+	ra = get_ra(inst);
+	rs = get_rs(inst);
+	rt = get_rt(inst);
+	sprn = get_sprn(inst);
+
 	switch (get_op(inst)) {
 	case OP_TRAP:
 #ifdef CONFIG_PPC_BOOK3S
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 9ae9768..b611a5d 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -228,6 +228,9 @@ int kvmppc_emulate_mmio(struct kvm_run *run, struct kvm_vcpu *vcpu)
 		 * actually modified. */
 		r = RESUME_GUEST_NV;
 		break;
+	case EMULATE_AGAIN:
+		r = RESUME_GUEST;
+		break;
 	case EMULATE_DO_MMIO:
 		run->exit_reason = KVM_EXIT_MMIO;
 		/* We must reload nonvolatiles because "update" load/store
@@ -237,11 +240,14 @@ int kvmppc_emulate_mmio(struct kvm_run *run, struct kvm_vcpu *vcpu)
 		r = RESUME_HOST_NV;
 		break;
 	case EMULATE_FAIL:
+	{
+		u32 last_inst;
+		kvmppc_get_last_inst(vcpu, &last_inst);
 		/* XXX Deliver Program interrupt to guest. */
-		printk(KERN_EMERG "%s: emulation failed (%08x)\n", __func__,
-		       kvmppc_get_last_inst(vcpu));
+		printk(KERN_EMERG "%s: emulation failed (%08x)\n", __func__, last_inst);
 		r = RESUME_HOST;
 		break;
+	}
 	default:
 		WARN_ON(1);
 		r = RESUME_GUEST;
-- 
1.7.3.4

* [PATCH 4/4] KVM: PPC: Bookehv: Get vcpu's last instruction for emulation
  2014-02-20 16:30 [PATCH 1/4] KVM: PPC: e500mc: Revert "add load inst fixup" Mihai Caraman
  2014-02-20 16:30 ` [PATCH 2/4] KVM: PPC: Book3e: Add TLBSEL/TSIZE defines for MAS0/1 Mihai Caraman
  2014-02-20 16:30 ` [PATCH 3/4] KVM: PPC: Allow kvmppc_get_last_inst() to fail Mihai Caraman
@ 2014-02-20 16:30 ` Mihai Caraman
  2014-03-26 21:17   ` Scott Wood
  2 siblings, 1 reply; 12+ messages in thread
From: Mihai Caraman @ 2014-02-20 16:30 UTC (permalink / raw)
  To: kvm-ppc; +Cc: Mihai Caraman, linuxppc-dev, kvm

Load external pid (lwepx) instruction faults (when issued by KVM
with guest context) need to be handled by KVM. This implies
additional code in the DO_KVM macro to identify the source of the
exception (which originates from the KVM host rather than the guest).
The hook requires checking the Exception Syndrome Register ESR[EPID]
and the External PID Load Context Register EPLC[EGS] for some
exceptions (DTLB_MISS, DSI and LRAT). Doing this on the Data TLB
miss exception is obviously intrusive for the host.

Get rid of lwepx and acquire the last instruction in
kvmppc_get_last_inst() by looking up the physical address and
kmap'ing it. This fixes an infinite loop caused by lwepx's data TLB
miss being handled in the host, and addresses the TODO for TLB
eviction and execute-but-not-read entries.
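
As a worked example of the address composition done in
kvmppc_ld_inst() below, with hypothetical values: for a 16M TLB1
entry, psize_shift = MAS1_GET_TSIZE(mas1) + 10 = 24, and the real
address combines the RPN bits from MAS7||MAS3 with the in-page offset
of the guest PC:

	u64 mas7_mas3 = 0x23000000ULL;	/* RPN of a 16M TLB1 entry (hypothetical) */
	unsigned int psize_shift = 24;	/* MAS1_GET_TSIZE(mas1) + 10 */
	gva_t geaddr = 0xc0001234;	/* guest PC (hypothetical) */
	hpa_t addr = (mas7_mas3 & (~0ULL << psize_shift)) |
		     (geaddr & ((1ULL << psize_shift) - 1ULL));
	/* addr = 0x23000000 | 0x001234 = 0x23001234 */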

Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
---
 arch/powerpc/kvm/bookehv_interrupts.S |   37 +++----------
 arch/powerpc/kvm/e500_mmu_host.c      |   93 +++++++++++++++++++++++++++++++++
 2 files changed, 102 insertions(+), 28 deletions(-)

diff --git a/arch/powerpc/kvm/bookehv_interrupts.S b/arch/powerpc/kvm/bookehv_interrupts.S
index 20c7a54..c50490c 100644
--- a/arch/powerpc/kvm/bookehv_interrupts.S
+++ b/arch/powerpc/kvm/bookehv_interrupts.S
@@ -119,38 +119,14 @@
 1:
 
 	.if	\flags & NEED_EMU
-	/*
-	 * This assumes you have external PID support.
-	 * To support a bookehv CPU without external PID, you'll
-	 * need to look up the TLB entry and create a temporary mapping.
-	 *
-	 * FIXME: we don't currently handle if the lwepx faults.  PR-mode
-	 * booke doesn't handle it either.  Since Linux doesn't use
-	 * broadcast tlbivax anymore, the only way this should happen is
-	 * if the guest maps its memory execute-but-not-read, or if we
-	 * somehow take a TLB miss in the middle of this entry code and
-	 * evict the relevant entry.  On e500mc, all kernel lowmem is
-	 * bolted into TLB1 large page mappings, and we don't use
-	 * broadcast invalidates, so we should not take a TLB miss here.
-	 *
-	 * Later we'll need to deal with faults here.  Disallowing guest
-	 * mappings that are execute-but-not-read could be an option on
-	 * e500mc, but not on chips with an LRAT if it is used.
-	 */
-
-	mfspr	r3, SPRN_EPLC	/* will already have correct ELPID and EGS */
 	PPC_STL	r15, VCPU_GPR(R15)(r4)
 	PPC_STL	r16, VCPU_GPR(R16)(r4)
 	PPC_STL	r17, VCPU_GPR(R17)(r4)
 	PPC_STL	r18, VCPU_GPR(R18)(r4)
 	PPC_STL	r19, VCPU_GPR(R19)(r4)
-	mr	r8, r3
 	PPC_STL	r20, VCPU_GPR(R20)(r4)
-	rlwimi	r8, r6, EPC_EAS_SHIFT - MSR_IR_LG, EPC_EAS
 	PPC_STL	r21, VCPU_GPR(R21)(r4)
-	rlwimi	r8, r6, EPC_EPR_SHIFT - MSR_PR_LG, EPC_EPR
 	PPC_STL	r22, VCPU_GPR(R22)(r4)
-	rlwimi	r8, r10, EPC_EPID_SHIFT, EPC_EPID
 	PPC_STL	r23, VCPU_GPR(R23)(r4)
 	PPC_STL	r24, VCPU_GPR(R24)(r4)
 	PPC_STL	r25, VCPU_GPR(R25)(r4)
@@ -160,10 +136,15 @@
 	PPC_STL	r29, VCPU_GPR(R29)(r4)
 	PPC_STL	r30, VCPU_GPR(R30)(r4)
 	PPC_STL	r31, VCPU_GPR(R31)(r4)
-	mtspr	SPRN_EPLC, r8
-	isync
-	lwepx   r9, 0, r5
-	mtspr	SPRN_EPLC, r3
+
+	/*
+	 * We don't use external PID support. lwepx faults would need to be
+	 * handled by KVM and this implies additional code in DO_KVM (for
+	 * DTLB_MISS, DSI and LRAT) to check ESR[EPID] and EPLC[EGS], which
+	 * is too intrusive for the host. Get the last instruction in
+	 * kvmppc_get_last_inst().
+	 */
+	li	r9, KVM_INST_FETCH_FAILED
 	stw	r9, VCPU_LAST_INST(r4)
 	.endif
 
diff --git a/arch/powerpc/kvm/e500_mmu_host.c b/arch/powerpc/kvm/e500_mmu_host.c
index 6025cb7..1b4cb41 100644
--- a/arch/powerpc/kvm/e500_mmu_host.c
+++ b/arch/powerpc/kvm/e500_mmu_host.c
@@ -598,9 +598,102 @@ void kvmppc_mmu_map(struct kvm_vcpu *vcpu, u64 eaddr, gpa_t gpaddr,
 	}
 }
 
+#ifdef CONFIG_KVM_BOOKE_HV
+int kvmppc_ld_inst(struct kvm_vcpu *vcpu, u32 *instr)
+{
+	gva_t geaddr;
+	hpa_t addr;
+	hfn_t pfn;
+	hva_t eaddr;
+	u32 mas0, mas1, mas2, mas3;
+	u64 mas7_mas3;
+	struct page *page;
+	unsigned int addr_space, psize_shift;
+	bool pr;
+	unsigned long flags;
+
+	/* Search TLB for guest pc to get the real address */
+	geaddr = kvmppc_get_pc(vcpu);
+	addr_space = (vcpu->arch.shared->msr & MSR_IS) >> MSR_IR_LG;
+
+	local_irq_save(flags);
+	mtspr(SPRN_MAS6, (vcpu->arch.pid << MAS6_SPID_SHIFT) | addr_space);
+	mtspr(SPRN_MAS5, MAS5_SGS | vcpu->kvm->arch.lpid);
+	isync();
+	asm volatile("tlbsx 0, %[geaddr]\n" : : [geaddr] "r" (geaddr));
+	mtspr(SPRN_MAS5, 0);
+	mtspr(SPRN_MAS8, 0);
+	mas0 = mfspr(SPRN_MAS0);
+	mas1 = mfspr(SPRN_MAS1);
+	mas2 = mfspr(SPRN_MAS2);
+	mas3 = mfspr(SPRN_MAS3);
+	mas7_mas3 = (((u64) mfspr(SPRN_MAS7)) << 32) | mfspr(SPRN_MAS3);
+	local_irq_restore(flags);
+
+	/*
+	 * If the TLB entry for guest pc was evicted, return to the guest.
+	 * There are high chances to find a valid TLB entry next time.
+	 */
+	if (!(mas1 & MAS1_VALID))
+		return EMULATE_AGAIN;
+
+	/*
+	 * Another thread may rewrite the TLB entry in parallel, don't
+	 * execute from the address if the execute permission is not set
+	 */
+	pr = vcpu->arch.shared->msr & MSR_PR;
+	if ((pr && (!(mas3 & MAS3_UX))) || ((!pr) && (!(mas3 & MAS3_SX)))) {
+		kvmppc_core_queue_inst_storage(vcpu, 0);
+		return EMULATE_AGAIN;
+	}
+
+	/*
+	 * We will map the real address through a cacheable page, so we will
+	 * not support cache-inhibited guest pages. Fortunately emulated
+	 * instructions should not live there.
+	 */
+	if (mas2 & MAS2_I) {
+		printk(KERN_CRIT "Instruction emulation from cache-inhibited "
+				"guest pages is not supported\n");
+		return EMULATE_FAIL;
+	}
+
+	/* Get page size */
+	if (MAS0_GET_TLBSEL(mas0) == 0)
+		psize_shift = PAGE_SHIFT;
+	else
+		psize_shift = MAS1_GET_TSIZE(mas1) + 10;
+
+	/* Map a page and get guest's instruction */
+	addr = (mas7_mas3 & (~0ULL << psize_shift)) |
+	       (geaddr & ((1ULL << psize_shift) - 1ULL));
+	pfn = addr >> PAGE_SHIFT;
+
+	if (unlikely(!pfn_valid(pfn))) {
+		printk(KERN_CRIT "Invalid frame number\n");
+		return EMULATE_FAIL;
+	}
+
+	/* Guard us against emulation from devices area */
+	if (unlikely(!page_is_ram(pfn))) {
+		printk(KERN_CRIT "Instruction emulation from non-RAM host "
+				"pages is not supported\n");
+		return EMULATE_FAIL;
+	}
+
+	page = pfn_to_page(pfn);
+	eaddr = (unsigned long)kmap_atomic(page);
+	eaddr |= addr & ~PAGE_MASK;
+	*instr = *(u32 *)eaddr;
+	kunmap_atomic((u32 *)eaddr);
+
+	return EMULATE_DONE;
+}
+#else
 int kvmppc_ld_inst(struct kvm_vcpu *vcpu, u32 *instr) {
 	return EMULATE_FAIL;
 };
+#endif
 
 /************* MMU Notifiers *************/
 
-- 
1.7.3.4

* Re: [PATCH 3/4] KVM: PPC: Allow kvmppc_get_last_inst() to fail
  2014-02-20 16:30 ` [PATCH 3/4] KVM: PPC: Allow kvmppc_get_last_inst() to fail Mihai Caraman
@ 2014-03-26 20:52   ` Scott Wood
  2014-03-31 13:32     ` Alexander Graf
  0 siblings, 1 reply; 12+ messages in thread
From: Scott Wood @ 2014-03-26 20:52 UTC (permalink / raw)
  To: Mihai Caraman; +Cc: linuxppc-dev, kvm, kvm-ppc

On Thu, 2014-02-20 at 18:30 +0200, Mihai Caraman wrote:
> diff --git a/arch/powerpc/kvm/book3s_paired_singles.c b/arch/powerpc/kvm/book3s_paired_singles.c
> index a59a25a..80c533e 100644
> --- a/arch/powerpc/kvm/book3s_paired_singles.c
> +++ b/arch/powerpc/kvm/book3s_paired_singles.c
> @@ -640,19 +640,24 @@ static int kvmppc_ps_one_in(struct kvm_vcpu *vcpu, bool rc,
>  
>  int kvmppc_emulate_paired_single(struct kvm_run *run, struct kvm_vcpu *vcpu)
>  {
> -	u32 inst = kvmppc_get_last_inst(vcpu);
> +	u32 inst;
>  	enum emulation_result emulated = EMULATE_DONE;
> -
> -	int ax_rd = inst_get_field(inst, 6, 10);
> -	int ax_ra = inst_get_field(inst, 11, 15);
> -	int ax_rb = inst_get_field(inst, 16, 20);
> -	int ax_rc = inst_get_field(inst, 21, 25);
> -	short full_d = inst_get_field(inst, 16, 31);
> -
> -	u64 *fpr_d = &vcpu->arch.fpr[ax_rd];
> -	u64 *fpr_a = &vcpu->arch.fpr[ax_ra];
> -	u64 *fpr_b = &vcpu->arch.fpr[ax_rb];
> -	u64 *fpr_c = &vcpu->arch.fpr[ax_rc];
> +	int ax_rd, ax_ra, ax_rb, ax_rc;
> +	short full_d;
> +	u64 *fpr_d, *fpr_a, *fpr_b, *fpr_c;
> +
> +	kvmppc_get_last_inst(vcpu, &inst);

Should probably check for failure here and elsewhere -- even though it
can't currently fail on book3s, the interface now allows it.

> diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
> index 5b9e906..b0d884d 100644
> --- a/arch/powerpc/kvm/book3s_pr.c
> +++ b/arch/powerpc/kvm/book3s_pr.c
> @@ -624,9 +624,10 @@ void kvmppc_giveup_ext(struct kvm_vcpu *vcpu, ulong msr)
>  static int kvmppc_read_inst(struct kvm_vcpu *vcpu)
>  {
>  	ulong srr0 = kvmppc_get_pc(vcpu);
> -	u32 last_inst = kvmppc_get_last_inst(vcpu);
> +	u32 last_inst;
>  	int ret;
>  
> +	kvmppc_get_last_inst(vcpu, &last_inst);
>  	ret = kvmppc_ld(vcpu, &srr0, sizeof(u32), &last_inst, false);

This isn't new, but this function looks odd to me -- calling
kvmppc_get_last_inst() but ignoring last_inst, then calling kvmppc_ld()
and ignoring anything but failure.  last_inst itself is never read.  And
no comments to explain the weirdness. :-)

I get that kvmppc_get_last_inst() is probably being used for the side
effect of filling in vcpu->arch.last_inst, but why store the return
value without using it?  Why pass the address of it to kvmppc_ld(),
which seems to be used only as an indirect way of determining whether
kvmppc_get_last_inst() failed?  And that whole mechanism becomes
stranger once it becomes possible for kvmppc_get_last_inst() to directly
return failure.

> diff --git a/arch/powerpc/kvm/booke.h b/arch/powerpc/kvm/booke.h
> index 09bfd9b..c7c60c2 100644
> --- a/arch/powerpc/kvm/booke.h
> +++ b/arch/powerpc/kvm/booke.h
> @@ -90,6 +90,9 @@ void kvmppc_vcpu_disable_spe(struct kvm_vcpu *vcpu);
>  void kvmppc_booke_vcpu_load(struct kvm_vcpu *vcpu, int cpu);
>  void kvmppc_booke_vcpu_put(struct kvm_vcpu *vcpu);
>  
> +void kvmppc_core_queue_inst_storage(struct kvm_vcpu *vcpu,
> +                                           ulong esr_flags);

Whitespace

> +
>  enum int_class {
>  	INT_CLASS_NONCRIT,
>  	INT_CLASS_CRIT,
> @@ -123,6 +126,8 @@ extern int kvmppc_core_emulate_mtspr_e500(struct kvm_vcpu *vcpu, int sprn,
>  extern int kvmppc_core_emulate_mfspr_e500(struct kvm_vcpu *vcpu, int sprn,
>  					  ulong *spr_val);
>  
> +extern int kvmppc_ld_inst(struct kvm_vcpu *vcpu, u32 *instr) ;

Whitespace

> diff --git a/arch/powerpc/kvm/e500_mmu_host.c b/arch/powerpc/kvm/e500_mmu_host.c
> index ecf2247..6025cb7 100644
> --- a/arch/powerpc/kvm/e500_mmu_host.c
> +++ b/arch/powerpc/kvm/e500_mmu_host.c
> @@ -34,6 +34,7 @@
>  #include "e500.h"
>  #include "timing.h"
>  #include "e500_mmu_host.h"
> +#include "booke.h"
>  
>  #include "trace_booke.h"
>  
> @@ -597,6 +598,10 @@ void kvmppc_mmu_map(struct kvm_vcpu *vcpu, u64 eaddr, gpa_t gpaddr,
>  	}
>  }
>  
> +int kvmppc_ld_inst(struct kvm_vcpu *vcpu, u32 *instr) {
> +	return EMULATE_FAIL;
> +};

Brace placement

>  /************* MMU Notifiers *************/
>  
>  int kvm_unmap_hva(struct kvm *kvm, unsigned long hva)
> diff --git a/arch/powerpc/kvm/emulate.c b/arch/powerpc/kvm/emulate.c
> index 2f9a087..24a8e50 100644
> --- a/arch/powerpc/kvm/emulate.c
> +++ b/arch/powerpc/kvm/emulate.c
> @@ -225,19 +225,26 @@ static int kvmppc_emulate_mfspr(struct kvm_vcpu *vcpu, int sprn, int rt)
>   * from opcode tables in the future. */
>  int kvmppc_emulate_instruction(struct kvm_run *run, struct kvm_vcpu *vcpu)
>  {
> -	u32 inst = kvmppc_get_last_inst(vcpu);
> -	int ra = get_ra(inst);
> -	int rs = get_rs(inst);
> -	int rt = get_rt(inst);
> -	int sprn = get_sprn(inst);
> -	enum emulation_result emulated = EMULATE_DONE;
> +	u32 inst;
> +	int ra, rs, rt, sprn;
> +	enum emulation_result emulated;
>  	int advance = 1;
>  
>  	/* this default type might be overwritten by subcategories */
>  	kvmppc_set_exit_type(vcpu, EMULATED_INST_EXITS);
>  
> +	emulated = kvmppc_get_last_inst(vcpu, &inst);
> +	if (emulated != EMULATE_DONE) {
> +		return emulated;
> +	}

Unnecessary braces

-Scott

* Re: [PATCH 4/4] KVM: PPC: Bookehv: Get vcpu's last instruction for emulation
  2014-02-20 16:30 ` [PATCH 4/4] KVM: PPC: Bookehv: Get vcpu's last instruction for emulation Mihai Caraman
@ 2014-03-26 21:17   ` Scott Wood
  2014-03-31 13:41     ` Alexander Graf
  0 siblings, 1 reply; 12+ messages in thread
From: Scott Wood @ 2014-03-26 21:17 UTC (permalink / raw)
  To: Mihai Caraman; +Cc: linuxppc-dev, kvm, kvm-ppc

On Thu, 2014-02-20 at 18:30 +0200, Mihai Caraman wrote:
> Load external pid (lwepx) instruction faults (when issued by KVM
> with guest context) need to be handled by KVM. This implies
> additional code in the DO_KVM macro to identify the source of the
> exception (which originates from the KVM host rather than the guest).
> The hook requires checking the Exception Syndrome Register ESR[EPID]
> and the External PID Load Context Register EPLC[EGS] for some
> exceptions (DTLB_MISS, DSI and LRAT). Doing this on the Data TLB
> miss exception is obviously intrusive for the host.
> 
> Get rid of lwepx and acquire the last instruction in
> kvmppc_get_last_inst() by looking up the physical address and
> kmap'ing it. This fixes an infinite loop caused by lwepx's data TLB
> miss being handled in the host, and addresses the TODO for TLB
> eviction and execute-but-not-read entries.
> 
> Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
> ---
>  arch/powerpc/kvm/bookehv_interrupts.S |   37 +++----------
>  arch/powerpc/kvm/e500_mmu_host.c      |   93 +++++++++++++++++++++++++++++++++
>  2 files changed, 102 insertions(+), 28 deletions(-)
> 
> diff --git a/arch/powerpc/kvm/bookehv_interrupts.S b/arch/powerpc/kvm/bookehv_interrupts.S
> index 20c7a54..c50490c 100644
> --- a/arch/powerpc/kvm/bookehv_interrupts.S
> +++ b/arch/powerpc/kvm/bookehv_interrupts.S
> @@ -119,38 +119,14 @@
>  1:
>  
>  	.if	\flags & NEED_EMU
> -	/*
> -	 * This assumes you have external PID support.
> -	 * To support a bookehv CPU without external PID, you'll
> -	 * need to look up the TLB entry and create a temporary mapping.
> -	 *
> -	 * FIXME: we don't currently handle if the lwepx faults.  PR-mode
> -	 * booke doesn't handle it either.  Since Linux doesn't use
> -	 * broadcast tlbivax anymore, the only way this should happen is
> -	 * if the guest maps its memory execute-but-not-read, or if we
> -	 * somehow take a TLB miss in the middle of this entry code and
> -	 * evict the relevant entry.  On e500mc, all kernel lowmem is
> -	 * bolted into TLB1 large page mappings, and we don't use
> -	 * broadcast invalidates, so we should not take a TLB miss here.
> -	 *
> -	 * Later we'll need to deal with faults here.  Disallowing guest
> -	 * mappings that are execute-but-not-read could be an option on
> -	 * e500mc, but not on chips with an LRAT if it is used.
> -	 */
> -
> -	mfspr	r3, SPRN_EPLC	/* will already have correct ELPID and EGS */
>  	PPC_STL	r15, VCPU_GPR(R15)(r4)
>  	PPC_STL	r16, VCPU_GPR(R16)(r4)
>  	PPC_STL	r17, VCPU_GPR(R17)(r4)
>  	PPC_STL	r18, VCPU_GPR(R18)(r4)
>  	PPC_STL	r19, VCPU_GPR(R19)(r4)
> -	mr	r8, r3
>  	PPC_STL	r20, VCPU_GPR(R20)(r4)
> -	rlwimi	r8, r6, EPC_EAS_SHIFT - MSR_IR_LG, EPC_EAS
>  	PPC_STL	r21, VCPU_GPR(R21)(r4)
> -	rlwimi	r8, r6, EPC_EPR_SHIFT - MSR_PR_LG, EPC_EPR
>  	PPC_STL	r22, VCPU_GPR(R22)(r4)
> -	rlwimi	r8, r10, EPC_EPID_SHIFT, EPC_EPID
>  	PPC_STL	r23, VCPU_GPR(R23)(r4)
>  	PPC_STL	r24, VCPU_GPR(R24)(r4)
>  	PPC_STL	r25, VCPU_GPR(R25)(r4)
> @@ -160,10 +136,15 @@
>  	PPC_STL	r29, VCPU_GPR(R29)(r4)
>  	PPC_STL	r30, VCPU_GPR(R30)(r4)
>  	PPC_STL	r31, VCPU_GPR(R31)(r4)
> -	mtspr	SPRN_EPLC, r8
> -	isync
> -	lwepx   r9, 0, r5
> -	mtspr	SPRN_EPLC, r3
> +
> +	/*
> +	 * We don't use external PID support. lwepx faults would need to be
> +	 * handled by KVM and this implies aditional code in DO_KVM (for
> +	 * DTB_MISS, DSI and LRAT) to check ESR[EPID] and EPLC[EGS] which
> +	 * is too intrusive for the host. Get last instuction in
> +	 * kvmppc_get_last_inst().
> +	 */
> +	li	r9, KVM_INST_FETCH_FAILED
>  	stw	r9, VCPU_LAST_INST(r4)
>  	.endif
>  
> diff --git a/arch/powerpc/kvm/e500_mmu_host.c b/arch/powerpc/kvm/e500_mmu_host.c
> index 6025cb7..1b4cb41 100644
> --- a/arch/powerpc/kvm/e500_mmu_host.c
> +++ b/arch/powerpc/kvm/e500_mmu_host.c
> @@ -598,9 +598,102 @@ void kvmppc_mmu_map(struct kvm_vcpu *vcpu, u64 eaddr, gpa_t gpaddr,
>  	}
>  }
>  
> +#ifdef CONFIG_KVM_BOOKE_HV
> +int kvmppc_ld_inst(struct kvm_vcpu *vcpu, u32 *instr)

It'd be interesting to see what the performance impact of doing this on
non-HV would be -- it would eliminate divergent code, eliminate the
MSR_DS hack, and make exec-only mappings work.

> +{
> +	gva_t geaddr;
> +	hpa_t addr;
> +	hfn_t pfn;
> +	hva_t eaddr;
> +	u32 mas0, mas1, mas2, mas3;
> +	u64 mas7_mas3;
> +	struct page *page;
> +	unsigned int addr_space, psize_shift;
> +	bool pr;
> +	unsigned long flags;
> +
> +	/* Search TLB for guest pc to get the real address */
> +	geaddr = kvmppc_get_pc(vcpu);
> +	addr_space = (vcpu->arch.shared->msr & MSR_IS) >> MSR_IR_LG;
> +
> +	local_irq_save(flags);
> +	mtspr(SPRN_MAS6, (vcpu->arch.pid << MAS6_SPID_SHIFT) | addr_space);
> +	mtspr(SPRN_MAS5, MAS5_SGS | vcpu->kvm->arch.lpid);
> +	isync();
> +	asm volatile("tlbsx 0, %[geaddr]\n" : : [geaddr] "r" (geaddr));

We can probably get away without that isync, despite what the manual
says.  We've been doing it in other contexts on e500 since forever, and
tlbsx has presync serialization which means it already waits for all
previous instructions to complete before beginning execution.

> +	mtspr(SPRN_MAS5, 0);
> +	mtspr(SPRN_MAS8, 0);
> +	mas0 = mfspr(SPRN_MAS0);
> +	mas1 = mfspr(SPRN_MAS1);
> +	mas2 = mfspr(SPRN_MAS2);
> +	mas3 = mfspr(SPRN_MAS3);
> +	mas7_mas3 = (((u64) mfspr(SPRN_MAS7)) << 32) | mfspr(SPRN_MAS3);

Why read mas3 twice?

> +	local_irq_restore(flags);
> +
> +	/*
> +	 * If the TLB entry for guest pc was evicted, return to the guest.
> +	 * There are high chances to find a valid TLB entry next time.
> +	 */
> +	if (!(mas1 & MAS1_VALID))
> +		return EMULATE_AGAIN;
> +
> +	/*
> +	 * Another thread may rewrite the TLB entry in parallel, don't
> +	 * execute from the address if the execute permission is not set
> +	 */
> +	pr = vcpu->arch.shared->msr & MSR_PR;
> +	if ((pr && (!(mas3 & MAS3_UX))) || ((!pr) && (!(mas3 & MAS3_SX)))) {
> +		kvmppc_core_queue_inst_storage(vcpu, 0);
> +		return EMULATE_AGAIN;
> +	}

s/(!foo)/!foo/g

> +	/*
> +	 * We will map the real address through a cacheable page, so we will
> +	 * not support cache-inhibited guest pages. Fortunately emulated
> +	 * instructions should not live there.
> +	 */
> +	if (mas2 & MAS2_I) {
> +		printk(KERN_CRIT "Instuction emulation from cache-inhibited "
> +				"guest pages is not supported\n");
> +		return EMULATE_FAIL;
> +	}

This message needs to be ratelimited, and use pr_err() (or maybe even
pr_debug()).

> +	/* Get page size */
> +	if (MAS0_GET_TLBSEL(mas0) == 0)
> +		psize_shift = PAGE_SHIFT;
> +	else
> +		psize_shift = MAS1_GET_TSIZE(mas1) + 10;

TSIZE should be correct when reading from TLB0, even if it was incorrect
(and ignored) when writing.

> +	/* Map a page and get guest's instruction */
> +	addr = (mas7_mas3 & (~0ULL << psize_shift)) |
> +	       (geaddr & ((1ULL << psize_shift) - 1ULL));
> +	pfn = addr >> PAGE_SHIFT;
> +
> +	if (unlikely(!pfn_valid(pfn))) {
> +		printk(KERN_CRIT "Invalid frame number\n");
> +		return EMULATE_FAIL;
> +	}
> +
> +	/* Guard us against emulation from devices area */
> +	if (unlikely(!page_is_ram(pfn))) {
> +		printk(KERN_CRIT "Instruction emulation from non-RAM host "
> +				"pages is not supported\n");
> +		return EMULATE_FAIL;
> +	}

Same comment as the cache-inhibited message, and you can probably have
one message for both pfn_valid and page_is_ram (is one of those a subset
of the other?).

-Scott

* Re: [PATCH 3/4] KVM: PPC: Allow kvmppc_get_last_inst() to fail
  2014-03-26 20:52   ` Scott Wood
@ 2014-03-31 13:32     ` Alexander Graf
  0 siblings, 0 replies; 12+ messages in thread
From: Alexander Graf @ 2014-03-31 13:32 UTC (permalink / raw)
  To: Scott Wood; +Cc: Mihai Caraman, linuxppc-dev, kvm, kvm-ppc

On 03/26/2014 09:52 PM, Scott Wood wrote:
> On Thu, 2014-02-20 at 18:30 +0200, Mihai Caraman wrote:
>> diff --git a/arch/powerpc/kvm/book3s_paired_singles.c b/arch/powerpc/kvm/book3s_paired_singles.c
>> index a59a25a..80c533e 100644
>> --- a/arch/powerpc/kvm/book3s_paired_singles.c
>> +++ b/arch/powerpc/kvm/book3s_paired_singles.c
>> @@ -640,19 +640,24 @@ static int kvmppc_ps_one_in(struct kvm_vcpu *vcpu, bool rc,
>>   
>>   int kvmppc_emulate_paired_single(struct kvm_run *run, struct kvm_vcpu *vcpu)
>>   {
>> -	u32 inst = kvmppc_get_last_inst(vcpu);
>> +	u32 inst;
>>   	enum emulation_result emulated = EMULATE_DONE;
>> -
>> -	int ax_rd = inst_get_field(inst, 6, 10);
>> -	int ax_ra = inst_get_field(inst, 11, 15);
>> -	int ax_rb = inst_get_field(inst, 16, 20);
>> -	int ax_rc = inst_get_field(inst, 21, 25);
>> -	short full_d = inst_get_field(inst, 16, 31);
>> -
>> -	u64 *fpr_d = &vcpu->arch.fpr[ax_rd];
>> -	u64 *fpr_a = &vcpu->arch.fpr[ax_ra];
>> -	u64 *fpr_b = &vcpu->arch.fpr[ax_rb];
>> -	u64 *fpr_c = &vcpu->arch.fpr[ax_rc];
>> +	int ax_rd, ax_ra, ax_rb, ax_rc;
>> +	short full_d;
>> +	u64 *fpr_d, *fpr_a, *fpr_b, *fpr_c;
>> +
>> +	kvmppc_get_last_inst(vcpu, &inst);
> Should probably check for failure here and elsewhere -- even though it
> can't currently fail on book3s, the interface now allows it.
>
>> diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
>> index 5b9e906..b0d884d 100644
>> --- a/arch/powerpc/kvm/book3s_pr.c
>> +++ b/arch/powerpc/kvm/book3s_pr.c
>> @@ -624,9 +624,10 @@ void kvmppc_giveup_ext(struct kvm_vcpu *vcpu, ulong msr)
>>   static int kvmppc_read_inst(struct kvm_vcpu *vcpu)
>>   {
>>   	ulong srr0 = kvmppc_get_pc(vcpu);
>> -	u32 last_inst = kvmppc_get_last_inst(vcpu);
>> +	u32 last_inst;
>>   	int ret;
>>   
>> +	kvmppc_get_last_inst(vcpu, &last_inst);
>>   	ret = kvmppc_ld(vcpu, &srr0, sizeof(u32), &last_inst, false);
> This isn't new, but this function looks odd to me -- calling
> kvmppc_get_last_inst() but ignoring last_inst, then calling kvmppc_ld()
> and ignoring anything but failure.  last_inst itself is never read.  And
> no comments to explain the weirdness. :-)
>
> I get that kvmppc_get_last_inst() is probably being used for the side
> effect of filling in vcpu->arch.last_inst, but why store the return
> value without using it?  Why pass the address of it to kvmppc_ld(),
> which seems to be used only as an indirect way of determining whether
> kvmppc_get_last_inst() failed?  And that whole mechanism becomes
> stranger once it becomes possible for kvmppc_get_last_inst() to directly
> return failure.

If you're interested in the history of this, here's the patch :)

https://github.com/mirrors/linux-2.6/commit/c7f38f46f2a98d232147e47284cb4e7363296a3e

The idea is that we need 2 things to be good after this function:

   1) vcpu->arch.last_inst is valid
   2) if the last instruction is not readable, return failure

Hence this weird construct. I don't think it's really necessary though - 
just remove the kvmppc_ld() call and only fail read_inst() when the 
caching didn't work and we can't translate the address.
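
A minimal sketch of what that simplification might look like
(hypothetical and untested; fault delivery to the guest is elided):

	static int kvmppc_read_inst(struct kvm_vcpu *vcpu)
	{
		u32 last_inst;

		if (kvmppc_get_last_inst(vcpu, &last_inst) != EMULATE_DONE) {
			/* the address couldn't be translated: deliver an
			 * instruction storage interrupt to the guest here */
			return EMULATE_AGAIN;
		}
		return EMULATE_DONE;
	}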


Alex

* Re: [PATCH 4/4] KVM: PPC: Bookehv: Get vcpu's last instruction for emulation
  2014-03-26 21:17   ` Scott Wood
@ 2014-03-31 13:41     ` Alexander Graf
  2014-03-31 23:03       ` Scott Wood
  0 siblings, 1 reply; 12+ messages in thread
From: Alexander Graf @ 2014-03-31 13:41 UTC (permalink / raw)
  To: Scott Wood; +Cc: Mihai Caraman, linuxppc-dev, kvm, kvm-ppc

On 03/26/2014 10:17 PM, Scott Wood wrote:
> On Thu, 2014-02-20 at 18:30 +0200, Mihai Caraman wrote:
>> Load external pid (lwepx) instruction faults (when issued by KVM
>> with guest context) need to be handled by KVM. This implies
>> additional code in the DO_KVM macro to identify the source of the
>> exception (which originates from the KVM host rather than the guest).
>> The hook requires checking the Exception Syndrome Register ESR[EPID]
>> and the External PID Load Context Register EPLC[EGS] for some
>> exceptions (DTLB_MISS, DSI and LRAT). Doing this on the Data TLB
>> miss exception is obviously intrusive for the host.
>>
>> Get rid of lwepx and acquire the last instruction in
>> kvmppc_get_last_inst() by looking up the physical address and
>> kmap'ing it. This fixes an infinite loop caused by lwepx's data TLB
>> miss being handled in the host, and addresses the TODO for TLB
>> eviction and execute-but-not-read entries.
>>
>> Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
>> ---
>>   arch/powerpc/kvm/bookehv_interrupts.S |   37 +++----------
>>   arch/powerpc/kvm/e500_mmu_host.c      |   93 +++++++++++++++++++++++++++++++++
>>   2 files changed, 102 insertions(+), 28 deletions(-)
>>
>> diff --git a/arch/powerpc/kvm/bookehv_interrupts.S b/arch/powerpc/kvm/bookehv_interrupts.S
>> index 20c7a54..c50490c 100644
>> --- a/arch/powerpc/kvm/bookehv_interrupts.S
>> +++ b/arch/powerpc/kvm/bookehv_interrupts.S
>> @@ -119,38 +119,14 @@
>>   1:
>>   
>>   	.if	\flags & NEED_EMU
>> -	/*
>> -	 * This assumes you have external PID support.
>> -	 * To support a bookehv CPU without external PID, you'll
>> -	 * need to look up the TLB entry and create a temporary mapping.
>> -	 *
>> -	 * FIXME: we don't currently handle if the lwepx faults.  PR-mode
>> -	 * booke doesn't handle it either.  Since Linux doesn't use
>> -	 * broadcast tlbivax anymore, the only way this should happen is
>> -	 * if the guest maps its memory execute-but-not-read, or if we
>> -	 * somehow take a TLB miss in the middle of this entry code and
>> -	 * evict the relevant entry.  On e500mc, all kernel lowmem is
>> -	 * bolted into TLB1 large page mappings, and we don't use
>> -	 * broadcast invalidates, so we should not take a TLB miss here.
>> -	 *
>> -	 * Later we'll need to deal with faults here.  Disallowing guest
>> -	 * mappings that are execute-but-not-read could be an option on
>> -	 * e500mc, but not on chips with an LRAT if it is used.
>> -	 */
>> -
>> -	mfspr	r3, SPRN_EPLC	/* will already have correct ELPID and EGS */
>>   	PPC_STL	r15, VCPU_GPR(R15)(r4)
>>   	PPC_STL	r16, VCPU_GPR(R16)(r4)
>>   	PPC_STL	r17, VCPU_GPR(R17)(r4)
>>   	PPC_STL	r18, VCPU_GPR(R18)(r4)
>>   	PPC_STL	r19, VCPU_GPR(R19)(r4)
>> -	mr	r8, r3
>>   	PPC_STL	r20, VCPU_GPR(R20)(r4)
>> -	rlwimi	r8, r6, EPC_EAS_SHIFT - MSR_IR_LG, EPC_EAS
>>   	PPC_STL	r21, VCPU_GPR(R21)(r4)
>> -	rlwimi	r8, r6, EPC_EPR_SHIFT - MSR_PR_LG, EPC_EPR
>>   	PPC_STL	r22, VCPU_GPR(R22)(r4)
>> -	rlwimi	r8, r10, EPC_EPID_SHIFT, EPC_EPID
>>   	PPC_STL	r23, VCPU_GPR(R23)(r4)
>>   	PPC_STL	r24, VCPU_GPR(R24)(r4)
>>   	PPC_STL	r25, VCPU_GPR(R25)(r4)
>> @@ -160,10 +136,15 @@
>>   	PPC_STL	r29, VCPU_GPR(R29)(r4)
>>   	PPC_STL	r30, VCPU_GPR(R30)(r4)
>>   	PPC_STL	r31, VCPU_GPR(R31)(r4)
>> -	mtspr	SPRN_EPLC, r8
>> -	isync
>> -	lwepx   r9, 0, r5
>> -	mtspr	SPRN_EPLC, r3
>> +
>> +	/*
>> +	 * We don't use external PID support. lwepx faults would need to be
>> +	 * handled by KVM and this implies additional code in DO_KVM (for
>> +	 * DTLB_MISS, DSI and LRAT) to check ESR[EPID] and EPLC[EGS], which
>> +	 * is too intrusive for the host. Get the last instruction in
>> +	 * kvmppc_get_last_inst().
>> +	 */
>> +	li	r9, KVM_INST_FETCH_FAILED
>>   	stw	r9, VCPU_LAST_INST(r4)
>>   	.endif
>>   
>> diff --git a/arch/powerpc/kvm/e500_mmu_host.c b/arch/powerpc/kvm/e500_mmu_host.c
>> index 6025cb7..1b4cb41 100644
>> --- a/arch/powerpc/kvm/e500_mmu_host.c
>> +++ b/arch/powerpc/kvm/e500_mmu_host.c
>> @@ -598,9 +598,102 @@ void kvmppc_mmu_map(struct kvm_vcpu *vcpu, u64 eaddr, gpa_t gpaddr,
>>   	}
>>   }
>>   
>> +#ifdef CONFIG_KVM_BOOKE_HV
>> +int kvmppc_ld_inst(struct kvm_vcpu *vcpu, u32 *instr)
> It'd be interesting to see what the performance impact of doing this on
> non-HV would be -- it would eliminate divergent code, eliminate the
> MSR_DS hack, and make exec-only mappings work.

We hit the instruction emulation path a lot more often on non-HV, so 
even a slight performance impact that might not be a major bummer for HV 
would become critical for PR.

But I agree - it'd be interesting to see numbers.

>
>> +{
>> +	gva_t geaddr;
>> +	hpa_t addr;
>> +	hfn_t pfn;
>> +	hva_t eaddr;
>> +	u32 mas0, mas1, mas2, mas3;
>> +	u64 mas7_mas3;
>> +	struct page *page;
>> +	unsigned int addr_space, psize_shift;
>> +	bool pr;
>> +	unsigned long flags;
>> +
>> +	/* Search TLB for guest pc to get the real address */
>> +	geaddr = kvmppc_get_pc(vcpu);
>> +	addr_space = (vcpu->arch.shared->msr & MSR_IS) >> MSR_IR_LG;
>> +
>> +	local_irq_save(flags);
>> +	mtspr(SPRN_MAS6, (vcpu->arch.pid << MAS6_SPID_SHIFT) | addr_space);
>> +	mtspr(SPRN_MAS5, MAS5_SGS | vcpu->kvm->arch.lpid);
>> +	isync();
>> +	asm volatile("tlbsx 0, %[geaddr]\n" : : [geaddr] "r" (geaddr));
> We can probably get away without that isync, despite what the manual
> says.  We've been doing it in other contexts on e500 since forever, and
> tlbsx has presync serialization which means it already waits for all
> previous instructions to complete before beginning execution.
>
>> +	mtspr(SPRN_MAS5, 0);
>> +	mtspr(SPRN_MAS8, 0);
>> +	mas0 = mfspr(SPRN_MAS0);
>> +	mas1 = mfspr(SPRN_MAS1);
>> +	mas2 = mfspr(SPRN_MAS2);
>> +	mas3 = mfspr(SPRN_MAS3);
>> +	mas7_mas3 = (((u64) mfspr(SPRN_MAS7)) << 32) | mfspr(SPRN_MAS3);
> Why read mas3 twice?
>
>> +	local_irq_restore(flags);
>> +
>> +	/*
>> +	 * If the TLB entry for guest pc was evicted, return to the guest.
>> +	 * There are high chances to find a valid TLB entry next time.
>> +	 */
>> +	if (!(mas1 & MAS1_VALID))
>> +		return EMULATE_AGAIN;
>> +
>> +	/*
>> +	 * Another thread may rewrite the TLB entry in parallel, don't
>> +	 * execute from the address if the execute permission is not set
>> +	 */

What happens when another thread rewrites the TLB entry in parallel? 
Does tlbsx succeed? Does it fail? Do we see failure indicated somehow? 
Are the contents of the MAS registers consistent at this point or 
inconsistent?

There has to be a good way to detect such a race and deal with it, no?

>> +	pr = vcpu->arch.shared->msr & MSR_PR;
>> +	if ((pr && (!(mas3 & MAS3_UX))) || ((!pr) && (!(mas3 & MAS3_SX)))) {
>> +		kvmppc_core_queue_inst_storage(vcpu, 0);
>> +		return EMULATE_AGAIN;
>> +	}
> s/(!foo)/!foo/g
>
>> +	/*
>> +	 * We will map the real address through a cacheable page, so we will
>> +	 * not support cache-inhibited guest pages. Fortunately emulated
>> +	 * instructions should not live there.
>> +	 */
>> +	if (mas2 & MAS2_I) {
>> +		printk(KERN_CRIT "Instruction emulation from cache-inhibited "
>> +				"guest pages is not supported\n");
>> +		return EMULATE_FAIL;
>> +	}
> This message needs to be ratelimited, and use pr_err() (or maybe even
> pr_debug()).

I'd go for pr_debug(). If anything we'd want a trace point indicating 
whether instruction fetching worked.


Alex

* Re: [PATCH 4/4] KVM: PPC: Bookehv: Get vcpu's last instruction for emulation
  2014-03-31 13:41     ` Alexander Graf
@ 2014-03-31 23:03       ` Scott Wood
  2014-04-01  5:47         ` Alexander Graf
  0 siblings, 1 reply; 12+ messages in thread
From: Scott Wood @ 2014-03-31 23:03 UTC (permalink / raw)
  To: Alexander Graf; +Cc: Mihai Caraman, linuxppc-dev, kvm, kvm-ppc

On Mon, 2014-03-31 at 15:41 +0200, Alexander Graf wrote:
> On 03/26/2014 10:17 PM, Scott Wood wrote:
> > On Thu, 2014-02-20 at 18:30 +0200, Mihai Caraman wrote:
> >> +	/*
> >> +	 * Another thread may rewrite the TLB entry in parallel, don't
> >> +	 * execute from the address if the execute permission is not set
> >> +	 */
> 
> What happens when another thread rewrites the TLB entry in parallel? 
> Does tlbsx succeed? Does it fail? Do we see failure indicated somehow? 
> Are the contents of the MAS registers consistent at this point or 
> inconsistent?

If another thread rewrites the TLB entry, then we use the new TLB entry,
just as if it had raced in hardware.  This check ensures that we don't
execute from the new TLB entry if it doesn't have execute permissions
(just as hardware would).

What happens if the new TLB entry is valid and executable, and the
instruction pointed to is valid, but doesn't trap (and thus we don't
have emulation for it)?

> There has to be a good way to detect such a race and deal with it, no?

How would you detect it?  We don't get any information from the trap
about what physical address the instruction was fetched from, or what
the instruction was.

-Scott

* Re: [PATCH 4/4] KVM: PPC: Bookehv: Get vcpu's last instruction for emulation
  2014-03-31 23:03       ` Scott Wood
@ 2014-04-01  5:47         ` Alexander Graf
  2014-04-01 16:58           ` Scott Wood
  0 siblings, 1 reply; 12+ messages in thread
From: Alexander Graf @ 2014-04-01  5:47 UTC (permalink / raw)
  To: Scott Wood
  Cc: Mihai Caraman, <linuxppc-dev@lists.ozlabs.org>,
	<kvm@vger.kernel.org>, <kvm-ppc@vger.kernel.org>



> On 01.04.2014 at 01:03, Scott Wood <scottwood@freescale.com> wrote:
> 
>> On Mon, 2014-03-31 at 15:41 +0200, Alexander Graf wrote:
>>> On 03/26/2014 10:17 PM, Scott Wood wrote:
>>>> On Thu, 2014-02-20 at 18:30 +0200, Mihai Caraman wrote:
>>>> +    /*
>>>> +     * Another thread may rewrite the TLB entry in parallel, don't
>>>> +     * execute from the address if the execute permission is not set
>>>> +     */
>> 
>> What happens when another thread rewrites the TLB entry in parallel?
>> Does tlbsx succeed? Does it fail? Do we see failure indicated somehow?
>> Are the contents of the MAS registers consistent at this point or
>> inconsistent?
> 
> If another thread rewrites the TLB entry, then we use the new TLB entry,
> just as if it had raced in hardware.  This check ensures that we don't
> execute from the new TLB entry if it doesn't have execute permissions
> (just as hardware would).
> 
> What happens if the new TLB entry is valid and executable, and the
> instruction pointed to is valid, but doesn't trap (and thus we don't
> have emulation for it)?
> 
>> There has to be a good way to detect such a race and deal with it, no?
> 
> How would you detect it?  We don't get any information from the trap
> about what physical address the instruction was fetched from, or what
> the instruction was.

Ah, this is a guest race where the guest modifies exactly the TLB in question. I see.

Why would this ever happen in practice (in a case where we're not executing malicious code)?


Alex

> 
> -Scott

* Re: [PATCH 4/4] KVM: PPC: Bookehv: Get vcpu's last instruction for emulation
  2014-04-01  5:47         ` Alexander Graf
@ 2014-04-01 16:58           ` Scott Wood
  2014-04-01 17:11             ` Alexander Graf
  0 siblings, 1 reply; 12+ messages in thread
From: Scott Wood @ 2014-04-01 16:58 UTC (permalink / raw)
  To: Alexander Graf
  Cc: Mihai Caraman, <linuxppc-dev@lists.ozlabs.org>,
	<kvm@vger.kernel.org>, <kvm-ppc@vger.kernel.org>

On Tue, 2014-04-01 at 07:47 +0200, Alexander Graf wrote:
> 
> > On 01.04.2014 at 01:03, Scott Wood <scottwood@freescale.com> wrote:
> > 
> >> On Mon, 2014-03-31 at 15:41 +0200, Alexander Graf wrote:
> >>> On 03/26/2014 10:17 PM, Scott Wood wrote:
> >>>> On Thu, 2014-02-20 at 18:30 +0200, Mihai Caraman wrote:
> >>>> +    /*
> >>>> +     * Another thread may rewrite the TLB entry in parallel, don't
> >>>> +     * execute from the address if the execute permission is not set
> >>>> +     */
> >> 
> >> What happens when another thread rewrites the TLB entry in parallel? 
> >> Does tlbsx succeed? Does it fail? Do we see failure indicated somehow? 
> >> Are the contents of the MAS registers consistent at this point or 
> >> inconsistent?
> > 
> > If another thread rewrites the TLB entry, then we use the new TLB entry,
> > just as if it had raced in hardware.  This check ensures that we don't
> > execute from the new TLB entry if it doesn't have execute permissions
> > (just as hardware would).
> > 
> > What happens if the new TLB entry is valid and executable, and the
> > instruction pointed to is valid, but doesn't trap (and thus we don't
> > have emulation for it)?
> > 
> >> There has to be a good way to detect such a race and deal with it, no?
> > 
> > How would you detect it?  We don't get any information from the trap
> > about what physical address the instruction was fetched from, or what
> > the instruction was.
> 
> Ah, this is a guest race where the guest modifies exactly the TLB in question. I see.
> 
> Why would this ever happen in practice (in a case where we're not executing malicious code)?

I don't know.  It probably wouldn't.  But it seems wrong to not check
the exec bit.

-Scott

* Re: [PATCH 4/4] KVM: PPC: Bookehv: Get vcpu's last instruction for emulation
  2014-04-01 16:58           ` Scott Wood
@ 2014-04-01 17:11             ` Alexander Graf
  0 siblings, 0 replies; 12+ messages in thread
From: Alexander Graf @ 2014-04-01 17:11 UTC (permalink / raw)
  To: Scott Wood
  Cc: Mihai Caraman, <linuxppc-dev@lists.ozlabs.org>,
	<kvm@vger.kernel.org>, <kvm-ppc@vger.kernel.org>

On 04/01/2014 06:58 PM, Scott Wood wrote:
> On Tue, 2014-04-01 at 07:47 +0200, Alexander Graf wrote:
>>> Am 01.04.2014 um 01:03 schrieb Scott Wood <scottwood@freescale.com>:
>>>
>>>> On Mon, 2014-03-31 at 15:41 +0200, Alexander Graf wrote:
>>>>> On 03/26/2014 10:17 PM, Scott Wood wrote:
>>>>>> On Thu, 2014-02-20 at 18:30 +0200, Mihai Caraman wrote:
>>>>>> +    /*
>>>>>> +     * Another thread may rewrite the TLB entry in parallel, don't
>>>>>> +     * execute from the address if the execute permission is not set
>>>>>> +     */
>>>> What happens when another thread rewrites the TLB entry in parallel?
>>>> Does tlbsx succeed? Does it fail? Do we see failure indicated somehow?
>>>> Are the contents of the MAS registers consistent at this point or
>>>> inconsistent?
>>> If another thread rewrites the TLB entry, then we use the new TLB entry,
>>> just as if it had raced in hardware.  This check ensures that we don't
>>> execute from the new TLB entry if it doesn't have execute permissions
>>> (just as hardware would).
>>>
>>> What happens if the new TLB entry is valid and executable, and the
>>> instruction pointed to is valid, but doesn't trap (and thus we don't
>>> have emulation for it)?
>>>
>>>> There has to be a good way to detect such a race and deal with it, no?
>>> How would you detect it?  We don't get any information from the trap
>>> about what physical address the instruction was fetched from, or what
>>> the instruction was.
>> Ah, this is a guest race where the guest modifies exactly the TLB in question. I see.
>>
>> Why would this ever happen in practice (in a case where we're not executing malicious code)?
> I don't know.  It probably wouldn't.  But it seems wrong to not check
> the exec bit.

Right, I'm just saying that a program interrupt is not too bad of an 
answer in case the guest does something as stupid as this in kernel 
space :). It's definitely good practice to check for the exec bit.


Alex
