LinuxPPC-Dev Archive on lore.kernel.org
* [PATCH 26/26] KVM: PPC: Add Documentation about PV interface
From: Alexander Graf @ 2010-06-25 23:25 UTC (permalink / raw)
  To: kvm-ppc; +Cc: linuxppc-dev, KVM list
In-Reply-To: <1277508314-915-1-git-send-email-agraf@suse.de>

We just introduced a new PV interface that screams for documentation. So here
it is: a shiny new text file describing the inner workings of the PPC KVM
paravirtual interface.

Signed-off-by: Alexander Graf <agraf@suse.de>
---
 Documentation/kvm/ppc-pv.txt |  164 ++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 164 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/kvm/ppc-pv.txt
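
For illustration only, not part of the patch: a minimal sketch of how the
displacements used by the patched loads and stores fall out of the magic page
layout documented in ppc-pv.txt. The struct mirrors the documented field order
with fixed-width C types. The names magic_page_layout, magic_disp and
magic_disp32 are made up for this example; only the field order, the -4096
mapping and the "+4 on 32 bit" rule come from the document.

```c
#include <stddef.h>
#include <stdint.h>

/* Same field order as the layout documented in ppc-pv.txt. */
struct magic_page_layout {
	uint64_t scratch1;
	uint64_t scratch2;
	uint64_t scratch3;
	uint64_t critical;
	uint64_t sprg0;
	uint64_t sprg1;
	uint64_t sprg2;
	uint64_t sprg3;
	uint64_t srr0;
	uint64_t srr1;
	uint64_t dar;
	uint64_t msr;
	uint32_t dsisr;
	uint32_t int_pending;
};

/* The page is mapped at the very top of the address space. */
#define MAGIC_PAGE_EA	(-4096L)

/* Displacement for an absolute access "ld/std rX, disp(0)" to a field. */
#define magic_disp(field) \
	(MAGIC_PAGE_EA + (long)offsetof(struct magic_page_layout, field))

/*
 * 32 bit guests access only the low word of a 64 bit field with lwz/stw.
 * On big endian the low word sits 4 bytes into the field.
 */
#define magic_disp32(field)	(magic_disp(field) + 4)
```

With this layout, magic_disp(msr) evaluates to -4008 and magic_disp32(msr) to
-4004. magic_disp32() only makes sense for the 64 bit fields; dsisr and
int_pending are already 32 bit wide and are accessed at magic_disp().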

diff --git a/Documentation/kvm/ppc-pv.txt b/Documentation/kvm/ppc-pv.txt
new file mode 100644
index 0000000..7cbcd51
--- /dev/null
+++ b/Documentation/kvm/ppc-pv.txt
@@ -0,0 +1,164 @@
+The PPC KVM paravirtual interface
+=================================
+
+The basic execution principle by which KVM on PowerPC works is to run all kernel
+space code in PR=1 which is user space. This way we trap all privileged
+instructions and can emulate them accordingly.
+
+Unfortunately that is also the downfall. There are quite a few privileged
+instructions that needlessly return us to the hypervisor even though they
+could be handled differently.
+
+This is what the PPC PV interface helps with. It takes privileged instructions
+and transforms them into unprivileged ones with some help from the hypervisor.
+This cuts down virtualization costs by about 50% on some of my benchmarks.
+
+The code for that interface can be found in arch/powerpc/kernel/kvm*
+
+Querying for existence
+======================
+
+To find out if we're running on KVM or not, we overlay the PVR register. Usually
+the PVR register contains an id that identifies your CPU type. If, however, you
+pass KVM_PVR_PARA in the register that you want the PVR result in, the register
+still contains KVM_PVR_PARA after the mfpvr call.
+
+	LOAD_REG_IMM(r5, KVM_PVR_PARA)
+	mfpvr	r5
+	[r5 still contains KVM_PVR_PARA]
+
+Once determined to run under a PV capable KVM, you can now use hypercalls as
+described below.
+
+PPC hypercalls
+==============
+
+The only viable ways to reliably get from guest context to host context are:
+
+	1) Call an invalid instruction
+	2) Call the "sc" instruction with a parameter to "sc"
+	3) Call the "sc" instruction with parameters in GPRs
+
+Method 1 is always a bad idea. Invalid instructions can be replaced later on
+by valid instructions, rendering the interface broken.
+
+Method 2 also has drawbacks. If the parameter to "sc" is != 0 the spec is
+rather unclear whether the sc is targeted directly at the hypervisor or the
+supervisor. It would also require that we read the syscall issuing instruction
+every time a syscall is issued, slowing down guest syscalls.
+
+Method 3 is what KVM uses. We pass magic constants (KVM_SC_MAGIC_R3 and
+KVM_SC_MAGIC_R4) in r3 and r4 respectively. If a syscall instruction with these
+magic values arrives from the guest's kernel mode, we take the syscall as a
+hypercall.
+
+The parameters are as follows:
+
+	r3		KVM_SC_MAGIC_R3
+	r4		KVM_SC_MAGIC_R4
+	r5		Hypercall number
+	r6		First parameter
+	r7		Second parameter
+	r8		Third parameter
+	r9		Fourth parameter
+
+Hypercall definitions are shared in generic code, so the same hypercall numbers
+apply for x86 and powerpc alike.
+
+The magic page
+==============
+
+To enable communication between the hypervisor and guest there is a new shared
+page that contains parts of supervisor visible register state. The guest can
+map this shared page using the KVM hypercall KVM_HC_PPC_MAP_MAGIC_PAGE.
+
+With this hypercall issued the guest always gets the magic page mapped at the
+desired location in effective and physical address space. For now, we always
+map the page to -4096. This way we can access it using absolute load and store
+functions. The following instruction reads the first field of the magic page:
+
+	ld	rX, -4096(0)
+
+The interface is designed to be extensible should there be need later to add
+additional registers to the magic page. If you add fields to the magic page,
+also define a new hypercall feature to indicate that the host can give you more
+registers. Only if the host supports the additional features, make use of them.
+
+The magic page has the following layout as described in
+arch/powerpc/include/asm/kvm_para.h:
+
+struct kvm_vcpu_arch_shared {
+	__u64 scratch1;
+	__u64 scratch2;
+	__u64 scratch3;
+	__u64 critical;		/* Guest may not get interrupts if == r1 */
+	__u64 sprg0;
+	__u64 sprg1;
+	__u64 sprg2;
+	__u64 sprg3;
+	__u64 srr0;
+	__u64 srr1;
+	__u64 dar;
+	__u64 msr;
+	__u32 dsisr;
+	__u32 int_pending;	/* Tells the guest if we have an interrupt */
+};
+
+Additions to the page must only occur at the end. Struct fields are always 32
+bit aligned.
+
+Patched instructions
+====================
+
+The "ld" and "std" instructions are transformed to "lwz" and "stw" instructions
+respectively on 32 bit systems with an added offset of 4 to account for big
+endianness.
+
+From			To
+====			==
+
+mfmsr	rX		ld	rX, magic_page->msr
+mfsprg	rX, 0		ld	rX, magic_page->sprg0
+mfsprg	rX, 1		ld	rX, magic_page->sprg1
+mfsprg	rX, 2		ld	rX, magic_page->sprg2
+mfsprg	rX, 3		ld	rX, magic_page->sprg3
+mfsrr0	rX		ld	rX, magic_page->srr0
+mfsrr1	rX		ld	rX, magic_page->srr1
+mfdar	rX		ld	rX, magic_page->dar
+mfdsisr	rX		lwz	rX, magic_page->dsisr
+
+mtmsr	rX		std	rX, magic_page->msr
+mtsprg	0, rX		std	rX, magic_page->sprg0
+mtsprg	1, rX		std	rX, magic_page->sprg1
+mtsprg	2, rX		std	rX, magic_page->sprg2
+mtsprg	3, rX		std	rX, magic_page->sprg3
+mtsrr0	rX		std	rX, magic_page->srr0
+mtsrr1	rX		std	rX, magic_page->srr1
+mtdar	rX		std	rX, magic_page->dar
+mtdsisr	rX		stw	rX, magic_page->dsisr
+
+tlbsync			nop
+
+mtmsrd	rX, 0		b	<special mtmsr section>
+mtmsr	rX		b	<special mtmsr section>
+
+mtmsrd	rX, 1		b	<special mtmsrd section>
+
+[BookE only]
+wrteei	[0|1]		b	<special wrteei section>
+
+
+Some instructions require more logic to determine what's going on than a load
+or store instruction can deliver. To enable patching of those, we keep some
+RAM around into which we can translate instructions at runtime. What happens
+is the following:
+
+	1) copy emulation code to memory
+	2) patch that code to fit the emulated instruction
+	3) patch that code to return to the original pc + 4
+	4) patch the original instruction to branch to the new code
+
+That way we can inject an arbitrary amount of code as replacement for a single
+instruction. This allows us to check for pending interrupts when setting EE=1
+for example.
+
-- 
1.6.0.2

^ permalink raw reply related

* [PATCH 16/26] KVM: Move kvm_guest_init out of generic code
From: Alexander Graf @ 2010-06-25 23:25 UTC (permalink / raw)
  To: kvm-ppc; +Cc: linuxppc-dev, KVM list
In-Reply-To: <1277508314-915-1-git-send-email-agraf@suse.de>

Currently x86 is the only architecture that uses kvm_guest_init(). With
PowerPC we're getting a second user, but the signature is different there
and we don't need to export it, as it uses the normal kernel init framework.

So let's move the x86 specific definition of that function over to the x86
specific header file.

Signed-off-by: Alexander Graf <agraf@suse.de>
---
 arch/x86/include/asm/kvm_para.h |    6 ++++++
 include/linux/kvm_para.h        |    5 -----
 2 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index 05eba5e..7b562b6 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -158,6 +158,12 @@ static inline unsigned int kvm_arch_para_features(void)
 	return cpuid_eax(KVM_CPUID_FEATURES);
 }
 
+#ifdef CONFIG_KVM_GUEST
+void __init kvm_guest_init(void);
+#else
+#define kvm_guest_init() do { } while (0)
 #endif
 
+#endif /* __KERNEL__ */
+
 #endif /* _ASM_X86_KVM_PARA_H */
diff --git a/include/linux/kvm_para.h b/include/linux/kvm_para.h
index ac2015a..47a070b 100644
--- a/include/linux/kvm_para.h
+++ b/include/linux/kvm_para.h
@@ -26,11 +26,6 @@
 #include <asm/kvm_para.h>
 
 #ifdef __KERNEL__
-#ifdef CONFIG_KVM_GUEST
-void __init kvm_guest_init(void);
-#else
-#define kvm_guest_init() do { } while (0)
-#endif
 
 static inline int kvm_para_has_feature(unsigned int feature)
 {
-- 
1.6.0.2

^ permalink raw reply related

* [PATCH 20/26] KVM: PPC: PV tlbsync to nop
From: Alexander Graf @ 2010-06-25 23:25 UTC (permalink / raw)
  To: kvm-ppc; +Cc: linuxppc-dev, KVM list
In-Reply-To: <1277508314-915-1-git-send-email-agraf@suse.de>

With our current MMU scheme we don't need to know about the tlbsync instruction.
So we can just nop it out.

Signed-off-by: Alexander Graf <agraf@suse.de>
---
 arch/powerpc/kernel/kvm.c |   12 ++++++++++++
 1 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/kernel/kvm.c b/arch/powerpc/kernel/kvm.c
index b165b20..b091f94 100644
--- a/arch/powerpc/kernel/kvm.c
+++ b/arch/powerpc/kernel/kvm.c
@@ -61,6 +61,8 @@
 #define KVM_INST_MTSPR_DAR	0x7c1303a6
 #define KVM_INST_MTSPR_DSISR	0x7c1203a6
 
+#define KVM_INST_TLBSYNC	0x7c00046c
+
 static bool kvm_patching_worked = true;
 
 static void kvm_patch_ins_ld(u32 *inst, long addr, u32 rt)
@@ -91,6 +93,11 @@ static void kvm_patch_ins_stw(u32 *inst, long addr, u32 rt)
 	*inst = KVM_INST_STW | rt | (addr & 0x0000fffc);
 }
 
+static void kvm_patch_ins_nop(u32 *inst)
+{
+	*inst = KVM_INST_NOP;
+}
+
 static void kvm_map_magic_page(void *data)
 {
 	kvm_hypercall2(KVM_HC_PPC_MAP_MAGIC_PAGE,
@@ -159,6 +166,11 @@ static void kvm_check_ins(u32 *inst)
 	case KVM_INST_MTSPR_DSISR:
 		kvm_patch_ins_stw(inst, magic_var(dsisr), inst_rt);
 		break;
+
+	/* Nops */
+	case KVM_INST_TLBSYNC:
+		kvm_patch_ins_nop(inst);
+		break;
 	}
 
 	switch (_inst) {
-- 
1.6.0.2

^ permalink raw reply related

* [PATCH 18/26] KVM: PPC: KVM PV guest stubs
From: Alexander Graf @ 2010-06-25 23:25 UTC (permalink / raw)
  To: kvm-ppc; +Cc: linuxppc-dev, KVM list
In-Reply-To: <1277508314-915-1-git-send-email-agraf@suse.de>

We will soon start to replace instructions from the text section with
other, paravirtualized versions. To ease the readability of those patches
I split out the generic looping and magic page mapping code.

This patch still only contains stubs. But at least it loops through the
text section :).
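
For illustration only, not part of the patch: a rough sketch of the scan this
stub sets up, written as plain user space C over an array of instruction
words. It masks out the RT field before comparing, the way kvm_check_ins()
does with KVM_MASK_RT. The constant values are the RT field mask and the
mfmsr opcode as I understand the encoding; the names MASK_RT, INST_MFMSR,
inst_rt and count_mfmsr are invented for this example.

```c
#include <stddef.h>
#include <stdint.h>

#define MASK_RT		0x03e00000u	/* bits holding the target register */
#define INST_MFMSR	0x7c0000a6u	/* mfmsr with RT = r0 */

/* Extract the register number from the RT field of an instruction word. */
static unsigned int inst_rt(uint32_t inst)
{
	return (inst & MASK_RT) >> 21;
}

/* Walk [start, end) and count mfmsr instructions, whatever their RT. */
static size_t count_mfmsr(const uint32_t *start, const uint32_t *end)
{
	size_t hits = 0;
	const uint32_t *p;

	for (p = start; p < end; p++)
		if ((*p & ~MASK_RT) == INST_MFMSR)
			hits++;

	return hits;
}
```

The real loop patches each match in place instead of counting it, but the
matching step is the same: compare the word with RT cleared, then carry the
original RT over into the replacement instruction.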

Signed-off-by: Alexander Graf <agraf@suse.de>
---
 arch/powerpc/kernel/kvm.c |   59 +++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 59 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/kernel/kvm.c b/arch/powerpc/kernel/kvm.c
index 2d8dd73..d873bc6 100644
--- a/arch/powerpc/kernel/kvm.c
+++ b/arch/powerpc/kernel/kvm.c
@@ -32,3 +32,62 @@
 #define KVM_MAGIC_PAGE		(-4096L)
 #define magic_var(x) KVM_MAGIC_PAGE + offsetof(struct kvm_vcpu_arch_shared, x)
 
+static bool kvm_patching_worked = true;
+
+static void kvm_map_magic_page(void *data)
+{
+	kvm_hypercall2(KVM_HC_PPC_MAP_MAGIC_PAGE,
+		       KVM_MAGIC_PAGE,  /* Physical Address */
+		       KVM_MAGIC_PAGE); /* Effective Address */
+}
+
+static void kvm_check_ins(u32 *inst)
+{
+	u32 _inst = *inst;
+	u32 inst_no_rt = _inst & ~KVM_MASK_RT;
+	u32 inst_rt = _inst & KVM_MASK_RT;
+
+	switch (inst_no_rt) {
+	}
+
+	switch (_inst) {
+	}
+
+	flush_icache_range((ulong)inst, (ulong)inst + 4);
+}
+
+static void kvm_use_magic_page(void)
+{
+	u32 *p;
+	u32 *start, *end;
+
+	/* Tell the host to map the magic page to -4096 on all CPUs */
+
+	on_each_cpu(kvm_map_magic_page, NULL, 1);
+
+	/* Now loop through all code and find instructions */
+
+	start = (void*)_stext;
+	end = (void*)_etext;
+
+	for (p = start; p < end; p++)
+		kvm_check_ins(p);
+}
+
+static int __init kvm_guest_init(void)
+{
+	char *p;
+
+	if (!kvm_para_available())
+		return 0;
+
+	if (kvm_para_has_feature(KVM_FEATURE_MAGIC_PAGE))
+		kvm_use_magic_page();
+
+	printk(KERN_INFO "KVM: Live patching for a fast VM %s\n",
+			 kvm_patching_worked ? "worked" : "failed");
+
+	return 0;
+}
+
+postcore_initcall(kvm_guest_init);
-- 
1.6.0.2

^ permalink raw reply related

* [PATCH 12/26] KVM: PPC: First magic page steps
From: Alexander Graf @ 2010-06-25 23:25 UTC (permalink / raw)
  To: kvm-ppc; +Cc: linuxppc-dev, KVM list
In-Reply-To: <1277508314-915-1-git-send-email-agraf@suse.de>

We will be introducing a method to project the shared page into guest context.
As soon as we're talking about this coupling, the shared page is called the
magic page.

This patch introduces simple defines, so the follow-up patches are easier to
read.

Signed-off-by: Alexander Graf <agraf@suse.de>
---
 arch/powerpc/include/asm/kvm_host.h |    2 ++
 include/linux/kvm_para.h            |    1 +
 2 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index e35c1ac..5f8c214 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -285,6 +285,8 @@ struct kvm_vcpu_arch {
 	u64 dec_jiffies;
 	unsigned long pending_exceptions;
 	struct kvm_vcpu_arch_shared *shared;
+	unsigned long magic_page_pa; /* phys addr to map the magic page to */
+	unsigned long magic_page_ea; /* effect. addr to map the magic page to */
 
 #ifdef CONFIG_PPC_BOOK3S
 	struct kmem_cache *hpte_cache;
diff --git a/include/linux/kvm_para.h b/include/linux/kvm_para.h
index 3b8080e..ac2015a 100644
--- a/include/linux/kvm_para.h
+++ b/include/linux/kvm_para.h
@@ -18,6 +18,7 @@
 #define KVM_HC_VAPIC_POLL_IRQ		1
 #define KVM_HC_MMU_OP			2
 #define KVM_HC_FEATURES			3
+#define KVM_HC_PPC_MAP_MAGIC_PAGE	4
 
 /*
  * hypercalls use architecture specific
-- 
1.6.0.2

^ permalink raw reply related

* [PATCH 24/26] KVM: PPC: PV mtmsrd L=0 and mtmsr
From: Alexander Graf @ 2010-06-25 23:25 UTC (permalink / raw)
  To: kvm-ppc; +Cc: linuxppc-dev, KVM list
In-Reply-To: <1277508314-915-1-git-send-email-agraf@suse.de>

There is also a form of mtmsr where all bits need to be addressed. While the
PPC64 Linux kernel behaves reasonably well here, the PPC32 one never uses the
L=1 form but does mtmsr even for simple things like only changing EE.

So we need to hook into that one as well and check for a mask of bits that we
deem safe to change from within guest context.
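
For illustration only: the decision the emulation code makes, restated as a C
predicate instead of the assembly added below. The MSR bit values are the ones
I recall from the kernel's register headers and should be treated as
assumptions, as should the function name mtmsr_must_trap.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed MSR bit values, see the note above. */
#define MSR_EE	0x8000ULL	/* external interrupt enable */
#define MSR_CE	0x20000ULL	/* critical interrupt enable (BookE) */
#define MSR_ME	0x1000ULL	/* machine check enable */
#define MSR_RI	0x2ULL		/* recoverable interrupt */

#define MSR_SAFE_BITS	(MSR_EE | MSR_CE | MSR_ME | MSR_RI)

/*
 * Returns true if the real mtmsr has to be executed (and thus traps to the
 * host), false if writing new_msr into the magic page is enough.
 */
static bool mtmsr_must_trap(uint64_t old_msr, uint64_t new_msr,
			    uint32_t int_pending)
{
	uint64_t changed = old_msr ^ new_msr;

	/* Any bit outside the safe mask changed: the host has to see it. */
	if (changed & ~MSR_SAFE_BITS)
		return true;

	/* Interrupts get enabled while one is pending: let the host inject. */
	if (int_pending && (new_msr & MSR_EE))
		return true;

	return false;
}
```

So enabling EE with nothing pending, or disabling EE at any time, stays
entirely within the guest.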

Signed-off-by: Alexander Graf <agraf@suse.de>
---
 arch/powerpc/kernel/kvm.c      |   51 ++++++++++++++++++++++++
 arch/powerpc/kernel/kvm_emul.S |   84 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 135 insertions(+), 0 deletions(-)
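
For illustration only, not part of the patch: a small sketch of how a relative
"b" instruction is built from a byte distance, which is what the patching
helper does when it redirects the original instruction into the emulation
chunk and back. The opcode and field mask follow the PowerPC I-form branch
encoding as I understand it; the constants the patch itself uses
(KVM_INST_B, KVM_INST_B_MASK, KVM_INST_B_MAX) are defined outside this
excerpt, so the values below are assumptions. The sketch also rejects
backward branches that are out of range, which is stricter than the check in
the patch.

```c
#include <stdbool.h>
#include <stdint.h>

#define INST_B		0x48000000u	/* primary opcode 18, AA=0, LK=0 */
#define INST_B_MASK	0x03fffffcu	/* 24 bit LI field, word aligned */

/*
 * Encode "b <to>" for an instruction located at <from>.
 * Returns false if the target is misaligned or out of reach.
 */
static bool encode_branch(uint32_t *out, uint64_t from, uint64_t to)
{
	int64_t dist = (int64_t)to - (int64_t)from;

	/* Branch targets must be word aligned. */
	if (dist & 3)
		return false;

	/* LI is a signed 26 bit byte offset: -32 MB .. +32 MB - 4. */
	if (dist > 0x01fffffc || dist < -0x02000000)
		return false;

	*out = INST_B | ((uint32_t)dist & INST_B_MASK);
	return true;
}
```

A branch from 0x1000 to 0x1008 encodes as 0x48000008, and the return branch in
the opposite direction as 0x4bfffff8.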

diff --git a/arch/powerpc/kernel/kvm.c b/arch/powerpc/kernel/kvm.c
index 71153d0..3557bc8 100644
--- a/arch/powerpc/kernel/kvm.c
+++ b/arch/powerpc/kernel/kvm.c
@@ -62,7 +62,9 @@
 #define KVM_INST_MTSPR_DSISR	0x7c1203a6
 
 #define KVM_INST_TLBSYNC	0x7c00046c
+#define KVM_INST_MTMSRD_L0	0x7c000164
 #define KVM_INST_MTMSRD_L1	0x7c010164
+#define KVM_INST_MTMSR		0x7c000124
 
 static bool kvm_patching_worked = true;
 static char kvm_tmp[1024 * 1024];
@@ -155,6 +157,49 @@ static void kvm_patch_ins_mtmsrd(u32 *inst, u32 rt)
 	*inst = KVM_INST_B | (distance_start & KVM_INST_B_MASK);
 }
 
+extern u32 kvm_emulate_mtmsr_branch_offs;
+extern u32 kvm_emulate_mtmsr_reg1_offs;
+extern u32 kvm_emulate_mtmsr_reg2_offs;
+extern u32 kvm_emulate_mtmsr_reg3_offs;
+extern u32 kvm_emulate_mtmsr_orig_ins_offs;
+extern u32 kvm_emulate_mtmsr_len;
+extern u32 kvm_emulate_mtmsr[];
+
+static void kvm_patch_ins_mtmsr(u32 *inst, u32 rt)
+{
+	u32 *p;
+	int distance_start;
+	int distance_end;
+	ulong next_inst;
+
+	p = kvm_alloc(kvm_emulate_mtmsr_len * 4);
+	if (!p)
+		return;
+
+	/* Find out where we are and put everything there */
+	distance_start = (ulong)p - (ulong)inst;
+	next_inst = ((ulong)inst + 4);
+	distance_end = next_inst - (ulong)&p[kvm_emulate_mtmsr_branch_offs];
+
+	/* Make sure we only write valid b instructions */
+	if (distance_start > KVM_INST_B_MAX) {
+		kvm_patching_worked = false;
+		return;
+	}
+
+	/* Modify the chunk to fit the invocation */
+	memcpy(p, kvm_emulate_mtmsr, kvm_emulate_mtmsr_len * 4);
+	p[kvm_emulate_mtmsr_branch_offs] |= distance_end & KVM_INST_B_MASK;
+	p[kvm_emulate_mtmsr_reg1_offs] |= rt;
+	p[kvm_emulate_mtmsr_reg2_offs] |= rt;
+	p[kvm_emulate_mtmsr_reg3_offs] |= rt;
+	p[kvm_emulate_mtmsr_orig_ins_offs] = *inst;
+	flush_icache_range((ulong)p, (ulong)p + kvm_emulate_mtmsr_len * 4);
+
+	/* Patch the invocation */
+	*inst = KVM_INST_B | (distance_start & KVM_INST_B_MASK);
+}
+
 static void kvm_map_magic_page(void *data)
 {
 	kvm_hypercall2(KVM_HC_PPC_MAP_MAGIC_PAGE,
@@ -235,6 +280,12 @@ static void kvm_check_ins(u32 *inst)
 		if (get_rt(inst_rt) < 30)
 			kvm_patch_ins_mtmsrd(inst, inst_rt);
 		break;
+	case KVM_INST_MTMSR:
+	case KVM_INST_MTMSRD_L0:
+		/* We use r30 and r31 during the hook */
+		if (get_rt(inst_rt) < 30)
+			kvm_patch_ins_mtmsr(inst, inst_rt);
+		break;
 	}
 
 	switch (_inst) {
diff --git a/arch/powerpc/kernel/kvm_emul.S b/arch/powerpc/kernel/kvm_emul.S
index 25e6683..ccf5a42 100644
--- a/arch/powerpc/kernel/kvm_emul.S
+++ b/arch/powerpc/kernel/kvm_emul.S
@@ -110,3 +110,87 @@ kvm_emulate_mtmsrd_reg_offs:
 .global kvm_emulate_mtmsrd_len
 kvm_emulate_mtmsrd_len:
 	.long (kvm_emulate_mtmsrd_end - kvm_emulate_mtmsrd) / 4
+
+
+#define MSR_SAFE_BITS (MSR_EE | MSR_CE | MSR_ME | MSR_RI)
+#define MSR_CRITICAL_BITS ~MSR_SAFE_BITS
+
+.global kvm_emulate_mtmsr
+kvm_emulate_mtmsr:
+
+	SCRATCH_SAVE
+
+	/* Fetch old MSR in r31 */
+	LL64(r31, KVM_MAGIC_PAGE + KVM_MAGIC_MSR, 0)
+
+	/* Find the changed bits between old and new MSR */
+kvm_emulate_mtmsr_reg1:
+	xor	r31, r0, r31
+
+	/* Check if we need to really do mtmsr */
+	LOAD_REG_IMMEDIATE(r30, MSR_CRITICAL_BITS)
+	and.	r31, r31, r30
+
+	/* No critical bits changed? Maybe we can stay in the guest. */
+	beq	maybe_stay_in_guest
+
+do_mtmsr:
+
+	SCRATCH_RESTORE
+
+	/* Just fire off the mtmsr if it's critical */
+kvm_emulate_mtmsr_orig_ins:
+	mtmsr	r0
+
+	b	kvm_emulate_mtmsr_branch
+
+maybe_stay_in_guest:
+
+	/* Check if we have to fetch an interrupt */
+	lwz	r31, (KVM_MAGIC_PAGE + KVM_MAGIC_INT)(0)
+	cmpwi	r31, 0
+	beq+	no_mtmsr
+
+	/* Check if we may trigger an interrupt */
+kvm_emulate_mtmsr_reg2:
+	andi.	r31, r0, MSR_EE
+	beq	no_mtmsr
+
+	b	do_mtmsr
+
+no_mtmsr:
+
+	/* Put MSR into magic page because we don't call mtmsr */
+kvm_emulate_mtmsr_reg3:
+	STL64(r0, KVM_MAGIC_PAGE + KVM_MAGIC_MSR, 0)
+
+	SCRATCH_RESTORE
+
+	/* Go back to caller */
+kvm_emulate_mtmsr_branch:
+	b	.
+kvm_emulate_mtmsr_end:
+
+.global kvm_emulate_mtmsr_branch_offs
+kvm_emulate_mtmsr_branch_offs:
+	.long (kvm_emulate_mtmsr_branch - kvm_emulate_mtmsr) / 4
+
+.global kvm_emulate_mtmsr_reg1_offs
+kvm_emulate_mtmsr_reg1_offs:
+	.long (kvm_emulate_mtmsr_reg1 - kvm_emulate_mtmsr) / 4
+
+.global kvm_emulate_mtmsr_reg2_offs
+kvm_emulate_mtmsr_reg2_offs:
+	.long (kvm_emulate_mtmsr_reg2 - kvm_emulate_mtmsr) / 4
+
+.global kvm_emulate_mtmsr_reg3_offs
+kvm_emulate_mtmsr_reg3_offs:
+	.long (kvm_emulate_mtmsr_reg3 - kvm_emulate_mtmsr) / 4
+
+.global kvm_emulate_mtmsr_orig_ins_offs
+kvm_emulate_mtmsr_orig_ins_offs:
+	.long (kvm_emulate_mtmsr_orig_ins - kvm_emulate_mtmsr) / 4
+
+.global kvm_emulate_mtmsr_len
+kvm_emulate_mtmsr_len:
+	.long (kvm_emulate_mtmsr_end - kvm_emulate_mtmsr) / 4
-- 
1.6.0.2

^ permalink raw reply related

* [PATCH 25/26] KVM: PPC: PV wrteei
From: Alexander Graf @ 2010-06-25 23:25 UTC (permalink / raw)
  To: kvm-ppc; +Cc: linuxppc-dev, KVM list
In-Reply-To: <1277508314-915-1-git-send-email-agraf@suse.de>

On BookE the preferred way to write the EE bit is the wrteei instruction. It
already encodes the EE bit in the instruction.

So in order to get BookE some speedups as well, let's also PV'nize that
instruction.
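
For illustration only: what the emulation chunk computes, written as a C
function rather than the assembly added below. It relies on the E bit of the
wrteei encoding sitting at the same bit position as MSR_EE, which is what the
two opcodes defined in this patch suggest. The MSR_EE value and the name
apply_wrteei are assumptions made for the example.

```c
#include <stdint.h>

#define MSR_EE	0x8000ULL	/* assumed value of the EE bit */

/* Return the MSR value that results from executing the given wrteei. */
static uint64_t apply_wrteei(uint64_t old_msr, uint32_t inst)
{
	/* Clear the old EE bit, then OR in the one encoded in the opcode. */
	return (old_msr & ~MSR_EE) | (inst & MSR_EE);
}
```

With the opcodes from this patch, 0x7c008146 (wrteei 1) sets EE and
0x7c000146 (wrteei 0) clears it, leaving all other MSR bits untouched.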

Signed-off-by: Alexander Graf <agraf@suse.de>
---
 arch/powerpc/kernel/kvm.c      |   50 ++++++++++++++++++++++++++++++++++++++++
 arch/powerpc/kernel/kvm_emul.S |   41 ++++++++++++++++++++++++++++++++
 2 files changed, 91 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/kernel/kvm.c b/arch/powerpc/kernel/kvm.c
index 3557bc8..85e2163 100644
--- a/arch/powerpc/kernel/kvm.c
+++ b/arch/powerpc/kernel/kvm.c
@@ -66,6 +66,9 @@
 #define KVM_INST_MTMSRD_L1	0x7c010164
 #define KVM_INST_MTMSR		0x7c000124
 
+#define KVM_INST_WRTEEI_0	0x7c000146
+#define KVM_INST_WRTEEI_1	0x7c008146
+
 static bool kvm_patching_worked = true;
 static char kvm_tmp[1024 * 1024];
 static int kvm_tmp_index;
@@ -200,6 +203,47 @@ static void kvm_patch_ins_mtmsr(u32 *inst, u32 rt)
 	*inst = KVM_INST_B | (distance_start & KVM_INST_B_MASK);
 }
 
+#ifdef CONFIG_BOOKE
+
+extern u32 kvm_emulate_wrteei_branch_offs;
+extern u32 kvm_emulate_wrteei_ee_offs;
+extern u32 kvm_emulate_wrteei_len;
+extern u32 kvm_emulate_wrteei[];
+
+static void kvm_patch_ins_wrteei(u32 *inst)
+{
+	u32 *p;
+	int distance_start;
+	int distance_end;
+	ulong next_inst;
+
+	p = kvm_alloc(kvm_emulate_wrteei_len * 4);
+	if (!p)
+		return;
+
+	/* Find out where we are and put everything there */
+	distance_start = (ulong)p - (ulong)inst;
+	next_inst = ((ulong)inst + 4);
+	distance_end = next_inst - (ulong)&p[kvm_emulate_wrteei_branch_offs];
+
+	/* Make sure we only write valid b instructions */
+	if (distance_start > KVM_INST_B_MAX) {
+		kvm_patching_worked = false;
+		return;
+	}
+
+	/* Modify the chunk to fit the invocation */
+	memcpy(p, kvm_emulate_wrteei, kvm_emulate_wrteei_len * 4);
+	p[kvm_emulate_wrteei_branch_offs] |= distance_end & KVM_INST_B_MASK;
+	p[kvm_emulate_wrteei_ee_offs] |= (*inst & MSR_EE);
+	flush_icache_range((ulong)p, (ulong)p + kvm_emulate_wrteei_len * 4);
+
+	/* Patch the invocation */
+	*inst = KVM_INST_B | (distance_start & KVM_INST_B_MASK);
+}
+
+#endif
+
 static void kvm_map_magic_page(void *data)
 {
 	kvm_hypercall2(KVM_HC_PPC_MAP_MAGIC_PAGE,
@@ -289,6 +333,12 @@ static void kvm_check_ins(u32 *inst)
 	}
 
 	switch (_inst) {
+#ifdef CONFIG_BOOKE
+	case KVM_INST_WRTEEI_0:
+	case KVM_INST_WRTEEI_1:
+		kvm_patch_ins_wrteei(inst);
+		break;
+#endif
 	}
 
 	flush_icache_range((ulong)inst, (ulong)inst + 4);
diff --git a/arch/powerpc/kernel/kvm_emul.S b/arch/powerpc/kernel/kvm_emul.S
index ccf5a42..b79b9de 100644
--- a/arch/powerpc/kernel/kvm_emul.S
+++ b/arch/powerpc/kernel/kvm_emul.S
@@ -194,3 +194,44 @@ kvm_emulate_mtmsr_orig_ins_offs:
 .global kvm_emulate_mtmsr_len
 kvm_emulate_mtmsr_len:
 	.long (kvm_emulate_mtmsr_end - kvm_emulate_mtmsr) / 4
+
+
+
+.global kvm_emulate_wrteei
+kvm_emulate_wrteei:
+
+	SCRATCH_SAVE
+
+	/* Fetch old MSR in r31 */
+	LL64(r31, KVM_MAGIC_PAGE + KVM_MAGIC_MSR, 0)
+
+	/* Remove MSR_EE from old MSR */
+	li	r30, 0
+	ori	r30, r30, MSR_EE
+	andc	r31, r31, r30
+
+	/* OR new MSR_EE onto the old MSR */
+kvm_emulate_wrteei_ee:
+	ori	r31, r31, 0
+
+	/* Write new MSR value back */
+	STL64(r31, KVM_MAGIC_PAGE + KVM_MAGIC_MSR, 0)
+
+	SCRATCH_RESTORE
+
+	/* Go back to caller */
+kvm_emulate_wrteei_branch:
+	b	.
+kvm_emulate_wrteei_end:
+
+.global kvm_emulate_wrteei_branch_offs
+kvm_emulate_wrteei_branch_offs:
+	.long (kvm_emulate_wrteei_branch - kvm_emulate_wrteei) / 4
+
+.global kvm_emulate_wrteei_ee_offs
+kvm_emulate_wrteei_ee_offs:
+	.long (kvm_emulate_wrteei_ee - kvm_emulate_wrteei) / 4
+
+.global kvm_emulate_wrteei_len
+kvm_emulate_wrteei_len:
+	.long (kvm_emulate_wrteei_end - kvm_emulate_wrteei) / 4
-- 
1.6.0.2

^ permalink raw reply related

* RE: JFFS2 corruption when mounting filesystem with filenames oflength> 7
From: Steve Deiters @ 2010-06-25 23:48 UTC (permalink / raw)
  To: linux-mtd; +Cc: linuxppc-dev
In-Reply-To: <181804936ABC2349BE503168465576460F1AB73E@exchserver.basler.com>

> -----Original Message-----
> From: linux-mtd-bounces@lists.infradead.org
> [mailto:linux-mtd-bounces@lists.infradead.org] On Behalf Of
> Steve Deiters
> Sent: Thursday, June 24, 2010 3:02 PM
> To: linux-mtd@lists.infradead.org
> Subject: RE: JFFS2 corruption when mounting filesystem with
> filenames oflength> 7
>
> > -----Original Message-----
> > From: linux-mtd-bounces@lists.infradead.org
> > [mailto:linux-mtd-bounces@lists.infradead.org] On Behalf Of Steve
> > Deiters
> > Sent: Wednesday, June 23, 2010 5:42 PM
> > To: linux-mtd@lists.infradead.org
> > Subject: RE: JFFS2 corruption when mounting filesystem with
> filenames
> > oflength > 7
> >
> > > -----Original Message-----
> > > From: linux-mtd-bounces@lists.infradead.org
> > > [mailto:linux-mtd-bounces@lists.infradead.org] On Behalf Of Steve
> > > Deiters
> > > Sent: Wednesday, June 23, 2010 5:21 PM
> > > To: linux-mtd@lists.infradead.org
> > > Subject: JFFS2 corruption when mounting filesystem with
> > filenames of
> > > length > 7
> > >
> > > I found an archived post which seems to be identical to my issue.
> > > However, this is quite old and there never seemed to be any
> > > resolution.
> > >
> > > http://www.infradead.org/pipermail/linux-mtd/2006-September/01
> > > 6491.html
> > >
> > > If I mount a filesystem that has filenames greater than 7
> > characters
> > > in length, the files are corrupted when I mount.
> > > In my case, I am making a
> > > JFFS2 image with mkfs.jffs2 and flashing it in with u-boot.
> > > However, I have attached a workflow where I erase the Flash
> > and create
> > > a new filesystem completely within Linux and it gives the same
> > > behavior.  I can list the files with the 'ls'
> > > command from within u-boot.  If I mount from within
> Linux, and then
> > > reboot into u-boot, it will not display any files that had
> > a filename
> > > greater than 7 characters.
> > >
> > > I enabled the MTD debug verbosity at level 2 for the
> > attached example
> > > session.
> > >
> > > I am running on a custom board with a MPC5121 and Linux 2.6.33.4.
> > >
> > > Thanks in advance for any help.
> >
> >
> > Sorry for the jumbled mess.  Looks like the line endings are messed
> > up.
> > Trying again.  I also provided this as an attachment in
> case it gets
> > messed up again.
>
> Once again sorry for the mess.
>
> I tried this again with the DENX-v2.6.34 tag in the DENX git
> repository (git://git.denx.de/linux-2.6-denx.git).  The only
> modification I made was to add my dts file.  I still get the
> same issue I had before.
>
> I've attached my kernel config if that gives any clues.
>
> Are there any thoughts on what may be causing this?
>
> Thanks.


I think there may be something weird going on with the memcpy in my
build.  If I use the following patch I no longer get errors when I mount
the filesystem.  All I did was replace the memcpy with a loop.

I'm not sure what's special about this particular use of memcpy.  I
can't believe that things would be working as well as they do if memcpy
was broken in general.

This is on a PowerPC 32 bit build for a MPC5121.  I am using a GCC 4.1.2
to compile.  Is anyone aware of any issues with memcpy in this
configuration?

Thanks.

-------

diff --git a/fs/jffs2/scan.c b/fs/jffs2/scan.c
index 46f870d..673caa2 100644
--- a/fs/jffs2/scan.c
+++ b/fs/jffs2/scan.c
@@ -1038,7 +1038,10 @@ static int jffs2_scan_dirent_node(struct jffs2_sb_info *c, struct jffs2_eraseblo
 	if (!fd) {
 		return -ENOMEM;
 	}
-	memcpy(&fd->name, rd->name, checkedlen);
+	int i;
+	for(i = 0; i < checkedlen; i++)
+		((unsigned char*)fd->name)[i] = ((const unsigned char*)rd->name)[i];
+
 	fd->name[checkedlen] = 0;
 
 	crc = crc32(0, fd->name, rd->nsize);

^ permalink raw reply related

* RE: JFFS2 corruption when mounting filesystem with filenames oflength> 7
From: Joakim Tjernlund @ 2010-06-26 13:27 UTC (permalink / raw)
  To: Steve Deiters; +Cc: linuxppc-dev, linux-mtd
In-Reply-To: <181804936ABC2349BE503168465576460F20CB90@exchserver.basler.com>

>
> > -----Original Message-----
> > From: linux-mtd-bounces@lists.infradead.org
> > [mailto:linux-mtd-bounces@lists.infradead.org] On Behalf Of
> > Steve Deiters
> > Sent: Thursday, June 24, 2010 3:02 PM
> > To: linux-mtd@lists.infradead.org
> > Subject: RE: JFFS2 corruption when mounting filesystem with
> > filenames oflength> 7
> >
> > > -----Original Message-----
> > > From: linux-mtd-bounces@lists.infradead.org
> > > [mailto:linux-mtd-bounces@lists.infradead.org] On Behalf Of Steve
> > > Deiters
> > > Sent: Wednesday, June 23, 2010 5:42 PM
> > > To: linux-mtd@lists.infradead.org
> > > Subject: RE: JFFS2 corruption when mounting filesystem with
> > filenames
> > > oflength > 7
> > >
> > > > -----Original Message-----
> > > > From: linux-mtd-bounces@lists.infradead.org
> > > > [mailto:linux-mtd-bounces@lists.infradead.org] On Behalf Of Steve
> > > > Deiters
> > > > Sent: Wednesday, June 23, 2010 5:21 PM
> > > > To: linux-mtd@lists.infradead.org
> > > > Subject: JFFS2 corruption when mounting filesystem with
> > > filenames of
> > > > length > 7
> > > >
> > > > I found an archived post which seems to be identical to my issue.
> > > > However, this is quite old and there never seemed to be any
> > > > resolution.
> > > >
> > > > http://www.infradead.org/pipermail/linux-mtd/2006-September/01
> > > > 6491.html
> > > >
> > > > If I mount a filesystem that has filenames greater than 7
> > > characters
> > > > in length, the files are corrupted when I mount.
> > > > In my case, I am making a
> > > > JFFS2 image with mkfs.jffs2 and flashing it in with u-boot.
> > > > However, I have attached a workflow where I erase the Flash
> > > and create
> > > > a new filesystem completely within Linux and it gives the same
> > > > behavior.  I can list the files with the 'ls'
> > > > command from within u-boot.  If I mount from within
> > Linux, and then
> > > > reboot into u-boot, it will not display any files that had
> > > a filename
> > > > greater than 7 characters.
> > > >
> > > > I enabled the MTD debug verbosity at level 2 for the
> > > attached example
> > > > session.
> > > >
> > > > I am running on a custom board with a MPC5121 and Linux 2.6.33.4.
> > > >
> > > > Thanks in advance for any help.
> > >
> > >
> > > Sorry for the jumbled mess.  Looks like the line endings are messed
> > > up.
> > > Trying again.  I also provided this as an attachment in
> > case it gets
> > > messed up again.
> >
> > Once again sorry for the mess.
> >
> > I tried this again with the DENX-v2.6.34 tag in the DENX git
> > repository (git://git.denx.de/linux-2.6-denx.git).  The only
> > modification I made was to add my dts file.  I still get the
> > same issue I had before.
> >
> > I've attached my kernel config if that gives any clues.
> >
> > Are there any thoughts on what may be causing this?
> >
> > Thanks.
>
>
> I think there may be something weird going on with the memcpy in my
> build.  If I use the following patch I no longer get errors when I mount
> the filesystem.  All I did was replace the memcpy with a loop.
>
> I'm not sure what's special about this particular use of memcpy.  I
> can't believe that things would be working as well as they do if memcpy
> was broken in general.
>
> This is on a PowerPC 32 bit build for a MPC5121.  I am using a GCC 4.1.2
> to compile.  Is anyone aware of any issues with memcpy in this
> configuration?
>
> Thanks.
>
> -------
>
> diff --git a/fs/jffs2/scan.c b/fs/jffs2/scan.c
> index 46f870d..673caa2 100644
> --- a/fs/jffs2/scan.c
> +++ b/fs/jffs2/scan.c
> @@ -1038,7 +1038,10 @@ static int jffs2_scan_dirent_node(struct
> jffs2_sb_info *c, struct jffs2_eraseblo
>     if (!fd) {
>        return -ENOMEM;
>     }
> -   memcpy(&fd->name, rd->name, checkedlen);

Are the pointers to memcpy overlapping? If so memcpy is undefined
and you have to use memmove().
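
To illustrate the difference, a minimal stand-alone sketch (not jffs2 code;
the function name is made up): shifting data inside one buffer is the classic
overlapping case where memmove() is required. Whether fd->name and rd->name
can overlap here depends on where the two structures are allocated, which I
have not checked.

```c
#include <string.h>

/*
 * Shift the first n bytes of buf one position to the right.
 * Source and destination overlap, so memcpy() would be undefined here.
 * The caller must make sure buf has room for n + 1 bytes.
 */
static void shift_right_by_one(char *buf, size_t n)
{
	memmove(buf + 1, buf, n);
}
```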

> +   int i;
> +   for(i = 0; i < checkedlen; i++)
> +      ((unsigned char*)fd->name)[i] = ((const unsigned
> char*)rd->name)[i];
> +
>     fd->name[checkedlen] = 0;
>
>     crc = crc32(0, fd->name, rd->nsize);

^ permalink raw reply

* Re: [PATCH 11/26] KVM: PPC: Make RMO a define
From: Segher Boessenkool @ 2010-06-26 16:52 UTC (permalink / raw)
  To: Alexander Graf; +Cc: linuxppc-dev, KVM list, kvm-ppc
In-Reply-To: <1277508314-915-12-git-send-email-agraf@suse.de>

> On PowerPC it's very normal to not support all of the physical RAM  
> in real mode.

Oh?  Are you referring to "real mode limit", or 32-bit implementations
with more than 32 address lines, or something else?

Either way, RMO is a really bad name for this, since that name is
already used for a similar but different concept.

Also, it seems you construct the physical address by masking out bits
from the effective address.  Most implementations will trap or machine
check if you address outside of physical address space, instead.


Segher

^ permalink raw reply

* Re: [PATCH 24/26] KVM: PPC: PV mtmsrd L=0 and mtmsr
From: Segher Boessenkool @ 2010-06-26 17:03 UTC (permalink / raw)
  To: Alexander Graf; +Cc: linuxppc-dev, KVM list, kvm-ppc
In-Reply-To: <1277508314-915-25-git-send-email-agraf@suse.de>

> There is also a form of mtmsr where all bits need to be addressed. While the
> PPC64 Linux kernel behaves reasonably well here, the PPC32 one never uses the
> L=1 form but does mtmsr even for simple things like only changing EE.

You make it sound like the 32-bit kernel does something stupid, while
there is no other choice.  The "L=1" thing only exists for 64-bit.


Segher

^ permalink raw reply

* Re: [PATCH 1/2] KVM: PPC: Add generic hpte management functions
From: Benjamin Herrenschmidt @ 2010-06-26 22:58 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm-ppc, linuxppc-dev, Alexander Graf, kvm
In-Reply-To: <4C20AA9B.8000807@redhat.com>

On Tue, 2010-06-22 at 15:20 +0300, Avi Kivity wrote:
> On 06/22/2010 03:14 PM, Alexander Graf wrote:
> > Avi Kivity wrote:
> >    
> >> On 06/22/2010 03:10 PM, Alexander Graf wrote:
> >>      
> >>> If you have more performance hints, I'll gladly take them :).
> >>>
> >>>        
> >> Using a cpu that virtualizes the mmu in hardware helps tremendously.
> >>
> >>      
> > PPC never does that. Even with the virtualization extensions the MMU is
> > still software managed.
> 
> Then mmu intensive loads can expect to be slow.

Well, depends. ppc64 indeed requires the hash to be managed by the
hypervisor, so inserting or invalidating translations will mean a
roundtrip to the hypervisor, though there are ways at least the
insertion could be alleviated (for example, the HV could service the
hash misses directly walking the guest page tables).

But that's due in part to a design choice (whether it's a good one or
not I'm not going to argue here) which favors huge reasonably static
workloads where the hash is expected to contain all translations for
everything.

However, note that BookE (the embedded variant of the architecture) uses
a different model for virtualization, including options in its latest
variant for a HW logical->real translation (via a small dedicated TLB)
and direct access to some TLB ops from the guest.

> > I was also more thinking of hints like
> > "kmem_cache_zalloc is slow" or so ;).
> >    
> 
> Stuff like that is usually worthless.  To give real feedback I need to 
> understand the hardware, so I'm reduced to coding style and indentation 
> review.

In that case, I'd say that BAT manipulation is rare enough (mostly only
at boot time) to warrant indeed speeding up the normal PTE operations &
invalidations at the expense of the BAT change case.

Cheers,
Ben.

^ permalink raw reply

* Re: [PATCH] pcf857x: support working w/o platform data
From: Dmitry Eremin-Solenikov @ 2010-06-27  6:38 UTC (permalink / raw)
  To: David Brownell; +Cc: Jean Delvare, linux-kernel, linuxppc-dev
In-Reply-To: <352874.17151.qm@web180303.mail.gq1.yahoo.com>

Hello,

On 6/25/10, David Brownell <david-b@pacbell.net> wrote:
>
> --- On Thu, 6/17/10, Dmitry Eremin-Solenikov <dbaryshkov@gmail.com> wrote:
>
>> Provide sane defaults for pcf857x, so
>> the driver can be used w/o
>> providing platform data (and thus can be simply bound via OF tree).
>
>
> Maybe we can get an ack from some OF folk
> who are using it in that way?

Please see arch/powerpc/boot/dts/mpc8349emitx.dts (the platform I'm using it on).

>> Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
>> ---
>>  drivers/gpio/pcf857x.c |    9 ++++-----
>>  1 files changed, 4 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/gpio/pcf857x.c b/drivers/gpio/pcf857x.c
>> index 29f19ce..879b473 100644
>> --- a/drivers/gpio/pcf857x.c
>> +++ b/drivers/gpio/pcf857x.c
>> @@ -190,7 +190,6 @@ static int pcf857x_probe(struct i2c_client *client,
>>      pdata = client->dev.platform_data;
>>      if (!pdata) {
>>          dev_dbg(&client->dev, "no platform data\n");
>> -        return -EINVAL;
>>      }
>>
>>      /* Allocate, initialize, and register this gpio_chip. */
>> @@ -200,7 +199,7 @@ static int pcf857x_probe(struct i2c_client *client,
>>
>>      mutex_init(&gpio->lock);
>>
>> -    gpio->chip.base = pdata->gpio_base;
>> +    gpio->chip.base = pdata ? pdata->gpio_base : -1;
>>      gpio->chip.can_sleep = 1;
>>      gpio->chip.dev = &client->dev;
>>      gpio->chip.owner = THIS_MODULE;
>> @@ -278,7 +277,7 @@ static int pcf857x_probe(struct i2c_client *client,
>>       * to zero, our software copy of the "latch" then matches the chip's
>>       * all-ones reset state.  Otherwise it flags pins to be driven low.
>>       */
>> -    gpio->out = ~pdata->n_latch;
>> +    gpio->out = pdata ? ~pdata->n_latch : ~0;
>>
>>      status = gpiochip_add(&gpio->chip);
>>      if (status < 0)
>> @@ -299,7 +298,7 @@ static int pcf857x_probe(struct i2c_client *client,
>>      /* Let platform code set up the GPIOs and their users.
>>       * Now is the first time anyone could use them.
>>       */
>> -    if (pdata->setup) {
>> +    if (pdata && pdata->setup) {
>>          status = pdata->setup(client,
>>                  gpio->chip.base, gpio->chip.ngpio,
>>                  pdata->context);
>> @@ -322,7 +321,7 @@ static int pcf857x_remove(struct i2c_client *client)
>>      struct pcf857x *gpio = i2c_get_clientdata(client);
>>      int status = 0;
>>
>> -    if (pdata->teardown) {
>> +    if (pdata && pdata->teardown) {
>>          status = pdata->teardown(client,
>>                  gpio->chip.base, gpio->chip.ngpio,
>>                  pdata->context);
>> --
>> 1.7.1
>>
>>
>
>


-- 
With best wishes
Dmitry

^ permalink raw reply

* Re: [PATCH 1/2] KVM: PPC: Add generic hpte management functions
From: Avi Kivity @ 2010-06-27  7:53 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: kvm-ppc, linuxppc-dev, Alexander Graf, kvm
In-Reply-To: <1277593118.4200.122.camel@pasglop>

On 06/27/2010 01:58 AM, Benjamin Herrenschmidt wrote:
>
>> Then mmu intensive loads can expect to be slow.
>>      
> Well, depends. ppc64 indeed requires the hash to be managed by the
> hypervisor, so inserting or invalidating translations will mean a
> roundtrip to the hypervisor, though there are ways at least the
> insertion could be alleviated (for example, the HV could service the
> hash misses directly walking the guest page tables).
>    

But the guest page tables are software defined, no?  That means the 
interface will break if the page table format changes.

> But that's due in part to a design choice (whether it's a good one or
> not I'm not going to argue here) which favors huge reasonably static
> workloads where the hash is expected to contain all translations for
> everything.
>    

What about when you have memory pressure?  The hash will have to reflect 
those pte_clear_flush_young(), no?

It seems horribly expensive.

> However, note that BookE (the embedded variant of the architecture) uses
> a different model for virtualization, including options in its latest
> variant for a HW logical->real translation (via a small dedicated TLB)
> and direct access to some TLB ops from the guest.
>    

I'm somewhat familiar with it, yes.

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply

* Re: [PATCH 26/26] KVM: PPC: Add Documentation about PV interface
From: Avi Kivity @ 2010-06-27  8:14 UTC (permalink / raw)
  To: Alexander Graf; +Cc: linuxppc-dev, KVM list, kvm-ppc
In-Reply-To: <1277508314-915-27-git-send-email-agraf@suse.de>

On 06/26/2010 02:25 AM, Alexander Graf wrote:
> We just introduced a new PV interface that screams for documentation. So here
> it is - a shiny new and awesome text file describing the internal works of
> the PPC KVM paravirtual interface.
>    

Good, that lets people who have no idea what they're talking about 
participate in the review.

> +
> +PPC hypercalls
> +==============
> +
> +The only viable ways to reliably get from guest context to host context are:
> +
> +	1) Call an invalid instruction
> +	2) Call the "sc" instruction with a parameter to "sc"
> +	3) Call the "sc" instruction with parameters in GPRs
> +
> +Method 1 is always a bad idea. Invalid instructions can be replaced later on
> +by valid instructions, rendering the interface broken.
> +
> +Method 2 also has downfalls. If the parameter to "sc" is != 0 the spec is
> +rather unclear if the sc is targeted directly for the hypervisor or the
> +supervisor. It would also require that we read the syscall issuing instruction
> +every time a syscall is issued, slowing down guest syscalls.
> +
> +Method 3 is what KVM uses. We pass magic constants (KVM_SC_MAGIC_R3 and
> +KVM_SC_MAGIC_R4) in r3 and r4 respectively. If a syscall instruction with these
> +magic values arrives from the guest's kernel mode, we take the syscall as a
> +hypercall.
>    

Is there any chance a normal syscall will have those values in r3 and r4?

If so, maybe it's better to use pc as the key for hypercalls.  Let the 
guest designate one instruction address as the hypercall call point; kvm 
can easily check it and reflect it back to the guest if it doesn't match.

Is it valid and useful to issue sc from privileged mode anyway, except 
for calling the hypervisor?

> +
> +The parameters are as follows:
> +
> +	r3		KVM_SC_MAGIC_R3
> +	r4		KVM_SC_MAGIC_R4
> +	r5		Hypercall number
> +	r6		First parameter
> +	r7		Second parameter
> +	r8		Third parameter
> +	r9		Fourth parameter
> +
> +Hypercall definitions are shared in generic code, so the same hypercall numbers
> +apply for x86 and powerpc alike.
>    

Addresses passed in hypercall parameters are guest physical addresses.

Do you have >32 bit physical addresses on 32-bit guests?  if so, you'll 
need to pass physical addresses in two registers.

> +
> +The magic page
> +==============
> +
> +To enable communication between the hypervisor and guest there is a new shared
> +page that contains parts of supervisor visible register state. The guest can
> +map this shared page using the KVM hypercall KVM_HC_PPC_MAP_MAGIC_PAGE.
> +
> +With this hypercall issued the guest always gets the magic page mapped at the
> +desired location in effective and physical address space. For now, we always
> +map the page to -4096. This way we can access it using absolute load and store
> +functions. The following instruction reads the first field of the magic page:
> +
> +	ld	rX, -4096(0)
>    

Is the address guest controlled or host controlled?

> +
> +The interface is designed to be extensible should there be need later to add
> +additional registers to the magic page. If you add fields to the magic page,
> +also define a new hypercall feature to indicate that the host can give you more
> +registers. Only if the host supports the additional features, make use of them.
> +
> +The magic page has the following layout as described in
> +arch/powerpc/include/asm/kvm_para.h:
> +
> +struct kvm_vcpu_arch_shared {
> +	__u64 scratch1;
> +	__u64 scratch2;
> +	__u64 scratch3;
> +	__u64 critical;		/* Guest may not get interrupts if == r1 */
>    

Elaborate?

> +	__u64 sprg0;
> +	__u64 sprg1;
> +	__u64 sprg2;
> +	__u64 sprg3;
> +	__u64 srr0;
> +	__u64 srr1;
> +	__u64 dar;
> +	__u64 msr;
> +	__u32 dsisr;
> +	__u32 int_pending;	/* Tells the guest if we have an interrupt */
> +};
> +
> +Additions to the page must only occur at the end. Struct fields are always 32
> +bit aligned.
> +
> +Patched instructions
> +====================
> +
> +The "ld" and "std" instructions are transformed to "lwz" and "stw" instructions
> +respectively on 32 bit systems with an added offset of 4 to accommodate for big
> +endianness.
>    

Who does the patching? guest or host?

> +
> +From			To
> +====			==
> +
> +mfmsr	rX		ld	rX, magic_page->msr
> +mfsprg	rX, 0		ld	rX, magic_page->sprg0
> +mfsprg	rX, 1		ld	rX, magic_page->sprg1
> +mfsprg	rX, 2		ld	rX, magic_page->sprg2
> +mfsprg	rX, 3		ld	rX, magic_page->sprg3
> +mfsrr0	rX		ld	rX, magic_page->srr0
> +mfsrr1	rX		ld	rX, magic_page->srr1
> +mfdar	rX		ld	rX, magic_page->dar
> +mfdsisr	rX		ld	rX, magic_page->dsisr
> +
> +mtmsr	rX		std	rX, magic_page->msr
> +mtsprg	0, rX		std	rX, magic_page->sprg0
> +mtsprg	1, rX		std	rX, magic_page->sprg1
> +mtsprg	2, rX		std	rX, magic_page->sprg2
> +mtsprg	3, rX		std	rX, magic_page->sprg3
> +mtsrr0	rX		std	rX, magic_page->srr0
> +mtsrr1	rX		std	rX, magic_page->srr1
> +mtdar	rX		std	rX, magic_page->dar
> +mtdsisr	rX		std	rX, magic_page->dsisr
> +
> +tlbsync			nop
> +
> +mtmsrd	rX, 0		b	<special mtmsr section>
> +mtmsr			b	<special mtmsr section>
> +
> +mtmsrd	rX, 1		b	<special mtmsrd section>
> +
> +[BookE only]
> +wrteei	[0|1]		b	<special wrteei section>
>    

Probably the guest, as only it can arrange for special * sections.  Good.

> +
> +Some instructions require more logic to determine what's going on than a load
> +or store instruction can deliver. To enable patching of those, we keep some
> +RAM around where we can live translate instructions to. What happens is the
> +following:
> +
> +	1) copy emulation code to memory
> +	2) patch that code to fit the emulated instruction
> +	3) patch that code to return to the original pc + 4
> +	4) patch the original instruction to branch to the new code
> +
> +That way we can inject an arbitrary amount of code as replacement for a single
> +instruction. This allows us to check for pending interrupts when setting EE=1
> +for example.
> +
>    

Or not.

What about transitions from paravirt to non-paravirt?  For example, a 
system reset.
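
For readers following the four patching steps quoted above: steps 3 and 4 both come down to writing a relative branch over an existing instruction. A minimal sketch of that encoding, assuming the standard PowerPC I-form `b` instruction (primary opcode 18, 24-bit word-aligned displacement). The helper name `ppc_encode_branch` is made up for illustration and is not taken from the patch set.

```c
#include <assert.h>
#include <stdint.h>

#define PPC_INST_BRANCH  0x48000000u  /* primary opcode 18, AA=0, LK=0 */
#define PPC_BRANCH_MASK  0x03fffffcu  /* 24-bit LI field, word aligned */

/*
 * Encode a relative "b" located at address `from` that jumps to `to`.
 * Both addresses are plain integers here; real patching code would also
 * have to flush the instruction cache after writing the result.
 */
static uint32_t ppc_encode_branch(uint32_t from, uint32_t to)
{
	int32_t offset = (int32_t)(to - from);

	/* Relative branches reach +/- 32 MB and must be word aligned. */
	assert((offset & 3) == 0);
	assert(offset >= -0x02000000 && offset < 0x02000000);

	return PPC_INST_BRANCH | ((uint32_t)offset & PPC_BRANCH_MASK);
}
```

The +/- 32 MB reach is one reason the scratch memory for the emulation code has to sit reasonably close to the kernel text being patched.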

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply

* Re: [PATCH 02/26] KVM: PPC: Convert MSR to shared page
From: Avi Kivity @ 2010-06-27  8:16 UTC (permalink / raw)
  To: Alexander Graf; +Cc: linuxppc-dev, KVM list, kvm-ppc
In-Reply-To: <1277508314-915-3-git-send-email-agraf@suse.de>

On 06/26/2010 02:24 AM, Alexander Graf wrote:
> One of the most obvious registers to share with the guest directly is the
> MSR. The MSR contains the "interrupts enabled" flag which the guest has to
> toggle in critical sections.
>
> So in order to bring the overhead of interrupt en- and disabling down, let's
> put msr into the shared page. Keep in mind that even though you can fully read
> its contents, writing to it doesn't always update all state. There are a few
> safe fields that don't require hypervisor interaction. See the guest
> implementation that follows later for reference.
>    


You mean, see the documentation for reference.

It should be possible to write the guest code looking only at the 
documentation.

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply

* Re: [PATCH 08/26] KVM: PPC: Add PV guest critical sections
From: Avi Kivity @ 2010-06-27  8:21 UTC (permalink / raw)
  To: Alexander Graf; +Cc: linuxppc-dev, KVM list, kvm-ppc
In-Reply-To: <1277508314-915-9-git-send-email-agraf@suse.de>

On 06/26/2010 02:24 AM, Alexander Graf wrote:
> When running in hooked code we need a way to disable interrupts without
> clobbering any interrupts or exiting out to the hypervisor.
>
> To achieve this, we have an additional critical field in the shared page. If
> that field is equal to the r1 register of the guest, it tells the hypervisor
> that we're in such a critical section and thus may not receive any interrupts.
>    

Is r1 reserved for this purpose?  Can't it match accidentally?

Why won't zero/nonzero work for this?
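
A rough sketch of the rule as the commit message states it, to make the question easier to discuss. Only the comparison `critical == r1` comes from the patch description; combining it with MSR[EE], the struct name, and the function names are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define MSR_EE 0x8000u  /* external interrupt enable */

/* Only the fields this sketch needs, not the full shared page. */
struct shared_sketch {
	uint64_t critical;
	uint64_t msr;
};

/*
 * As described in the commit message: the guest is inside a critical
 * section while the shared `critical` field equals its r1.
 */
static bool in_critical_section(const struct shared_sketch *s, uint64_t r1)
{
	return s->critical == r1;
}

/* Assumed policy: deliver only if EE is set and no critical section. */
static bool may_deliver_interrupt(const struct shared_sketch *s, uint64_t r1)
{
	return (s->msr & MSR_EE) != 0 && !in_critical_section(s, r1);
}
```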

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply

* Re: [PATCH 09/26] KVM: PPC: Add PV guest scratch registers
From: Avi Kivity @ 2010-06-27  8:22 UTC (permalink / raw)
  To: Alexander Graf; +Cc: linuxppc-dev, KVM list, kvm-ppc
In-Reply-To: <1277508314-915-10-git-send-email-agraf@suse.de>

On 06/26/2010 02:24 AM, Alexander Graf wrote:
> While running in hooked code we need to store register contents out because
> we must not clobber any registers.
>
> So let's add some fields to the shared page we can just happily write to.
>
>    

How are these protected during interrupts?

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply

* Re: [PATCH 12/26] KVM: PPC: First magic page steps
From: Avi Kivity @ 2010-06-27  8:24 UTC (permalink / raw)
  To: Alexander Graf; +Cc: linuxppc-dev, KVM list, kvm-ppc
In-Reply-To: <1277508314-915-13-git-send-email-agraf@suse.de>

On 06/26/2010 02:25 AM, Alexander Graf wrote:
> We will be introducing a method to project the shared page in guest context.
> As soon as we're talking about this coupling, the shared page is called magic
> page.
>
> This patch introduces simple defines, so the follow-up patches are easier to
> read.
>
>
>
> diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
> index e35c1ac..5f8c214 100644
> --- a/arch/powerpc/include/asm/kvm_host.h
> +++ b/arch/powerpc/include/asm/kvm_host.h
> @@ -285,6 +285,8 @@ struct kvm_vcpu_arch {
>   	u64 dec_jiffies;
>   	unsigned long pending_exceptions;
>   	struct kvm_vcpu_arch_shared *shared;
> +	unsigned long magic_page_pa; /* phys addr to map the magic page to */
> +	unsigned long magic_page_ea; /* effect. addr to map the magic page to */
>    

Is ea like a va?  If so, can't the guest specify it by manipulating the 
hash table (or tlb)?

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply

* Re: [PATCH 18/26] KVM: PPC: KVM PV guest stubs
From: Avi Kivity @ 2010-06-27  8:28 UTC (permalink / raw)
  To: Alexander Graf; +Cc: linuxppc-dev, KVM list, kvm-ppc
In-Reply-To: <1277508314-915-19-git-send-email-agraf@suse.de>

On 06/26/2010 02:25 AM, Alexander Graf wrote:
> We will soon start and replace instructions from the text section with
> other, paravirtualized versions. To ease the readability of those patches
> I split out the generic looping and magic page mapping code out.
>
> This patch still only contains stubs. But at least it loops through the
> text section :).
>
>
> +
> +static void kvm_check_ins(u32 *inst)
> +{
> +	u32 _inst = *inst;
> +	u32 inst_no_rt = _inst & ~KVM_MASK_RT;
> +	u32 inst_rt = _inst & KVM_MASK_RT;
> +
> +	switch (inst_no_rt) {
> +	}
> +
> +	switch (_inst) {
> +	}
> +
> +	flush_icache_range((ulong)inst, (ulong)inst + 4);
> +}
>    

Shouldn't we flush only if we patched something?

> +
> +static void kvm_use_magic_page(void)
> +{
> +	u32 *p;
> +	u32 *start, *end;
> +
> +	/* Tell the host to map the magic page to -4096 on all CPUs */
> +
> +	on_each_cpu(kvm_map_magic_page, NULL, 1);
> +
> +	/* Now loop through all code and find instructions */
> +
> +	start = (void*)_stext;
> +	end = (void*)_etext;
> +
> +	for (p = start; p < end; p++)
> +		kvm_check_ins(p);
> +}
> +
>    

Or, flush the entire thing here.
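
Roughly what both suggestions look like together, as a userspace sketch. Everything here is a stand-in: `flush_icache_range_stub` only counts calls instead of touching caches, and the "patchable" encoding 0xdeadbeef is a placeholder, not a real instruction from the patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the kernel's flush_icache_range(); counts calls only. */
static unsigned int flush_calls;

static void flush_icache_range_stub(uintptr_t start, uintptr_t end)
{
	(void)start;
	(void)end;
	flush_calls++;
}

/* Hypothetical patcher: rewrites one made-up opcode, reports if it did. */
static bool check_ins_sketch(uint32_t *inst)
{
	if (*inst != 0xdeadbeefu)   /* placeholder "patchable" encoding */
		return false;
	*inst = 0x60000000u;        /* PowerPC nop (ori 0,0,0) */
	return true;
}

/* Walk the text once, then flush the whole range a single time. */
static size_t patch_range_sketch(uint32_t *start, uint32_t *end)
{
	size_t patched = 0;
	uint32_t *p;

	for (p = start; p < end; p++)
		patched += check_ins_sketch(p);

	if (patched)
		flush_icache_range_stub((uintptr_t)start, (uintptr_t)end);

	return patched;
}
```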


-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply

* Re: [PATCH 26/26] KVM: PPC: Add Documentation about PV interface
From: Avi Kivity @ 2010-06-27  8:34 UTC (permalink / raw)
  To: Alexander Graf; +Cc: linuxppc-dev, KVM list, kvm-ppc
In-Reply-To: <1277508314-915-27-git-send-email-agraf@suse.de>

On 06/26/2010 02:25 AM, Alexander Graf wrote:
> We just introduced a new PV interface that screams for documentation. So here
> it is - a shiny new and awesome text file describing the internal works of
> the PPC KVM paravirtual interface.
>
>
> +Querying for existence
> +======================
> +
> +To find out if we're running on KVM or not, we overlay the PVR register. Usually
> +the PVR register contains an id that identifies your CPU type. If, however, you
> +pass KVM_PVR_PARA in the register that you want the PVR result in, the register
> +still contains KVM_PVR_PARA after the mfpvr call.
> +
> +	LOAD_REG_IMM(r5, KVM_PVR_PARA)
> +	mfpvr	r5
> +	[r5 still contains KVM_PVR_PARA]
> +
> +Once determined to run under a PV capable KVM, you can now use hypercalls as
> +described below.
>    

On x86 we allow host userspace to determine whether the guest sees the 
paravirt interface (and what features are exposed).  This allows you to 
live migrate from a newer host to an older host, by not exposing the 
newer features.

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply

* Re: [PATCH 11/26] KVM: PPC: Make RMO a define
From: Alexander Graf @ 2010-06-27  9:08 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: linuxppc-dev, KVM list, kvm-ppc@vger.kernel.org
In-Reply-To: <2078D8A9-7D36-4B5D-A779-9BBAB545A53D@kernel.crashing.org>


On 26.06.2010 at 18:52, Segher Boessenkool <segher@kernel.crashing.org> wrote:

>> On PowerPC it's very normal to not support all of the physical RAM  
>> in real mode.
>
> Oh?  Are you referring to "real mode limit", or 32-bit  
> implementations with
> more than 32 address lines, or something else?

The former.

>
> Either way, RMO is a really bad name for this, since that name is  
> already
> used for a similar but different concept.

It's the same concept, no? Not all physical memory is accessible from  
real mode.

>
> Also, it seems you construct the physical address by masking out  
> bits from
> the effective address.  Most implementations will trap or machine  
> check if
> you address outside of physical address space, instead.

Well the only case where I remember to have hit a real RMO case is on  
the PS3 - that issues a data/instruction storage interrupt when  
accessing anything > 8MB in real mode.

So I'd argue this is heavily implementation specific.

Apart from that what I'm trying to cover is that on ppc64 accessing
0xc000000000000000 in real mode gets you 0x0. Is there a better name for
this?
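
A small sketch of the masking behavior being described, where the ppc64 kernel base 0xc000000000000000 aliases physical 0 in real mode. The mask value (dropping the top four address bits) and both names are assumptions for illustration; as noted in the thread, real hardware may instead raise an interrupt for out-of-range addresses.

```c
#include <stdint.h>

/* Assumed mask: ignore the top four effective-address bits in real mode. */
#define REAL_MODE_MASK_SKETCH 0x0fffffffffffffffULL

/* Map an effective address to the address used when translation is off. */
static uint64_t real_mode_addr_sketch(uint64_t ea)
{
	return ea & REAL_MODE_MASK_SKETCH;
}
```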

Alex

>

^ permalink raw reply

* Re: [PATCH 24/26] KVM: PPC: PV mtmsrd L=0 and mtmsr
From: Alexander Graf @ 2010-06-27  9:10 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: linuxppc-dev, KVM list, kvm-ppc@vger.kernel.org
In-Reply-To: <EDF0A567-C440-4F1B-9AF5-2E0F8203D566@kernel.crashing.org>


On 26.06.2010 at 19:03, Segher Boessenkool <segher@kernel.crashing.org> wrote:

>> There is also a form of mtmsr where all bits need to be addressed. While the
>> PPC64 Linux kernel behaves reasonably well here, the PPC32 one never uses the
>> L=1 form but does mtmsr even for simple things like only changing EE.
>
> You make it sound like the 32-bit kernel does something stupid, while
> there is no other choice.  The "L=1" thing only exists for 64-bit.

Oh, so that's why :). That doesn't really change the fact that it's  
very hard to distinguish between a mtmsr that only changes MSR_EE vs  
one that changes MSR_IR for example :).
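
To illustrate why a full mtmsr needs extra logic: the replacement code has to compare old and new values at run time to see which bits actually change. A minimal sketch, assuming that EE and RI are the bits that can be changed without the hypervisor; that choice and the names `MSR_SAFE_BITS_SKETCH` and `mtmsr_needs_hypervisor` are hypothetical. The bit values are the standard MSR definitions.

```c
#include <stdbool.h>
#include <stdint.h>

#define MSR_EE 0x8000u  /* external interrupt enable */
#define MSR_IR 0x0020u  /* instruction relocate */
#define MSR_RI 0x0002u  /* recoverable interrupt */

/* Bits assumed safe to change without hypervisor help (an assumption). */
#define MSR_SAFE_BITS_SKETCH (MSR_EE | MSR_RI)

/* True if the write touches any bit outside the assumed safe set. */
static bool mtmsr_needs_hypervisor(uint64_t old_msr, uint64_t new_msr)
{
	uint64_t changed = old_msr ^ new_msr;

	return (changed & ~(uint64_t)MSR_SAFE_BITS_SKETCH) != 0;
}
```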

Alex

>

^ permalink raw reply

* Re: [PATCH 26/26] KVM: PPC: Add Documentation about PV interface
From: Alexander Graf @ 2010-06-27  9:33 UTC (permalink / raw)
  To: Avi Kivity; +Cc: linuxppc-dev, KVM list, kvm-ppc@vger.kernel.org
In-Reply-To: <4C270876.2050806@redhat.com>


On 27.06.2010 at 10:14, Avi Kivity <avi@redhat.com> wrote:

> On 06/26/2010 02:25 AM, Alexander Graf wrote:
>> We just introduced a new PV interface that screams for documentation. So here
>> it is - a shiny new and awesome text file describing the internal works of
>> the PPC KVM paravirtual interface.
>>
>
> Good, that lets people who have no idea what they're talking about
> participate in the review.

Heh, I knew you'd like this :).

>
>> +
>> +PPC hypercalls
>> +==============
>> +
>> +The only viable ways to reliably get from guest context to host context are:
>> +
>> +    1) Call an invalid instruction
>> +    2) Call the "sc" instruction with a parameter to "sc"
>> +    3) Call the "sc" instruction with parameters in GPRs
>> +
>> +Method 1 is always a bad idea. Invalid instructions can be replaced later on
>> +by valid instructions, rendering the interface broken.
>> +
>> +Method 2 also has downfalls. If the parameter to "sc" is != 0 the spec is
>> +rather unclear if the sc is targeted directly for the hypervisor or the
>> +supervisor. It would also require that we read the syscall issuing instruction
>> +every time a syscall is issued, slowing down guest syscalls.
>> +
>> +Method 3 is what KVM uses. We pass magic constants (KVM_SC_MAGIC_R3 and
>> +KVM_SC_MAGIC_R4) in r3 and r4 respectively. If a syscall instruction with these
>> +magic values arrives from the guest's kernel mode, we take the syscall as a
>> +hypercall.
>>
>
> Is there any chance a normal syscall will have those values in r3  
> and r4?

r3 is the syscall number. So as long as the guest doesn't reuse that  
value, we're safe. Since in general syscall numbers are not randomly  
scattered throughout the number range, we should be ok here.
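
As a rough sketch of the host-side check under discussion: an sc counts as a hypercall only if it comes from guest kernel mode and both magic values match. The two constants below are placeholders, not the real KVM_SC_MAGIC_R3/KVM_SC_MAGIC_R4 values, and `sc_is_hypercall` is a hypothetical name. MSR_PR is the standard problem-state bit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder values; the real constants are defined by the patch set. */
#define SC_MAGIC_R3_SKETCH 0x11111111u
#define SC_MAGIC_R4_SKETCH 0x22222222u

#define MSR_PR 0x4000u  /* problem state: set while guest user code runs */

/* Decide whether a trapped guest "sc" should be treated as a hypercall. */
static bool sc_is_hypercall(uint32_t r3, uint32_t r4, uint64_t guest_msr)
{
	if (guest_msr & MSR_PR)
		return false;   /* from guest user space: ordinary syscall */

	return r3 == SC_MAGIC_R3_SKETCH && r4 == SC_MAGIC_R4_SKETCH;
}
```

Anything that fails the check would be reflected back into the guest as a normal system call.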

>
> If so, maybe it's better to use pc as they key for hypercalls.  Let  
> the guest designate one instruction address as the hypercall call  
> point; kvm can easily check it and reflect it back to the guest if  
> it doesn't match.
>

You mean the guest would tell the hv where the hypercall lies? That  
would require a hypercall, no? Defining it statically is tricky. I  
want to PV'nize osx using a kernel module later, so I don't have  
control over the physical layout.

> Is it valid and useful to issue sc from privileged mode anyway,  
> except for calling the hypervisor?

Same as a syscall on x86 really. The kernel can and does issue  
syscalls within itself.

>
>> +
>> +The parameters are as follows:
>> +
>> +    r3        KVM_SC_MAGIC_R3
>> +    r4        KVM_SC_MAGIC_R4
>> +    r5        Hypercall number
>> +    r6        First parameter
>> +    r7        Second parameter
>> +    r8        Third parameter
>> +    r9        Fourth parameter
>> +
>> +Hypercall definitions are shared in generic code, so the same  
>> hypercall numbers
>> +apply for x86 and powerpc alike.
>>
>
> Addresses passed in hypercall parameters are guest physical addresses.
>
> Do you have >32 bit physical addresses on 32-bit guests?  if so,  
> you'll need to pass physical addresses in two registers.

I think theoretically it's possible. Will we ever support it?
Doubtful. Do we need to pass high memory addresses to the hv? Even
more doubtful.

If we hit such a case, I'd just disable the hypercall for 32 bit. Or
define param1 and param2 to contain the address if the guest is in
32-bit mode. No need to always make all params 64 bit imho.
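
The two-register variant could look like the sketch below on the host side. Which parameter carries the high word is not decided anywhere in this thread, so the high-word-first ordering and the name `gpa_from_params` are purely hypothetical.

```c
#include <stdint.h>

/* Hypothetical convention: param1 carries the high word, param2 the low. */
static uint64_t gpa_from_params(uint32_t param1, uint32_t param2)
{
	return ((uint64_t)param1 << 32) | param2;
}
```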

>
>> +
>> +The magic page
>> +==============
>> +
>> +To enable communication between the hypervisor and guest there is  
>> a new shared
>> +page that contains parts of supervisor visible register state. The  
>> guest can
>> +map this shared page using the KVM hypercall  
>> KVM_HC_PPC_MAP_MAGIC_PAGE.
>> +
>> +With this hypercall issued the guest always gets the magic page  
>> mapped at the
>> +desired location in effective and physical address space. For now,  
>> we always
>> +map the page to -4096. This way we can access it using absolute  
>> load and store
>> +functions. The following instruction reads the first field of the  
>> magic page:
>> +
>> +    ld    rX, -4096(0)
>>
>
> Is the address guest controlled or host controlled?

Guest controlled. It's passed in to the map_magic_page hypercall.

>
>> +
>> +The interface is designed to be extensible should there be need  
>> later to add
>> +additional registers to the magic page. If you add fields to the  
>> magic page,
>> +also define a new hypercall feature to indicate that the host can  
>> give you more
>> +registers. Only if the host supports the additional features, make  
>> use of them.
>> +
>> +The magic page has the following layout as described in
>> +arch/powerpc/include/asm/kvm_para.h:
>> +
>> +struct kvm_vcpu_arch_shared {
>> +    __u64 scratch1;
>> +    __u64 scratch2;
>> +    __u64 scratch3;
>> +    __u64 critical;        /* Guest may not get interrupts if ==  
>> r1 */
>>
>
> Elaborate?

I think I have a description in the respective patch. Probably a good  
idea to add it to the documentation.

>
>> +    __u64 sprg0;
>> +    __u64 sprg1;
>> +    __u64 sprg2;
>> +    __u64 sprg3;
>> +    __u64 srr0;
>> +    __u64 srr1;
>> +    __u64 dar;
>> +    __u64 msr;
>> +    __u32 dsisr;
>> +    __u32 int_pending;    /* Tells the guest if we have an  
>> interrupt */
>> +};
>> +
>> +Additions to the page must only occur at the end. Struct fields  
>> are always 32
>> +bit aligned.
>> +
>> +Patched instructions
>> +====================
>> +
>> +The "ld" and "std" instructions are transformed to "lwz" and "stw" instructions
>> +respectively on 32 bit systems with an added offset of 4 to accommodate for big
>> +endianness.
>>
>
> Who does the patching? guest or host?

All patching is done by the guest. Probably worth mentioning, yeah.
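
For anyone writing the guest side, the displacement used in the patched load or store follows directly from the struct layout quoted above. A userspace sketch, mirroring the struct with `uint64_t`/`uint32_t` in place of `__u64`/`__u32`; the struct and macro names are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Userspace mirror of the magic page layout quoted in the documentation. */
struct magic_page_sketch {
	uint64_t scratch1;
	uint64_t scratch2;
	uint64_t scratch3;
	uint64_t critical;
	uint64_t sprg0;
	uint64_t sprg1;
	uint64_t sprg2;
	uint64_t sprg3;
	uint64_t srr0;
	uint64_t srr1;
	uint64_t dar;
	uint64_t msr;
	uint32_t dsisr;
	uint32_t int_pending;
};

/* Offset a 64-bit guest would use with ld/std. */
#define MAGIC_OFF64(field) offsetof(struct magic_page_sketch, field)

/*
 * Offset a 32-bit big-endian guest would use with lwz/stw: the low word
 * of a 64-bit field sits 4 bytes in. Only meaningful for 64-bit fields.
 */
#define MAGIC_OFF32(field) (MAGIC_OFF64(field) + 4)
```

With the page mapped at -4096, `mfmsr rX` on a 64-bit guest would then become a load from -4096 + MAGIC_OFF64(msr), and on a 32-bit guest a word load from -4096 + MAGIC_OFF32(msr).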

>
>> +
>> +From            To
>> +====            ==
>> +
>> +mfmsr    rX        ld    rX, magic_page->msr
>> +mfsprg    rX, 0        ld    rX, magic_page->sprg0
>> +mfsprg    rX, 1        ld    rX, magic_page->sprg1
>> +mfsprg    rX, 2        ld    rX, magic_page->sprg2
>> +mfsprg    rX, 3        ld    rX, magic_page->sprg3
>> +mfsrr0    rX        ld    rX, magic_page->srr0
>> +mfsrr1    rX        ld    rX, magic_page->srr1
>> +mfdar    rX        ld    rX, magic_page->dar
>> +mfdsisr    rX        ld    rX, magic_page->dsisr
>> +
>> +mtmsr    rX        std    rX, magic_page->msr
>> +mtsprg    0, rX        std    rX, magic_page->sprg0
>> +mtsprg    1, rX        std    rX, magic_page->sprg1
>> +mtsprg    2, rX        std    rX, magic_page->sprg2
>> +mtsprg    3, rX        std    rX, magic_page->sprg3
>> +mtsrr0    rX        std    rX, magic_page->srr0
>> +mtsrr1    rX        std    rX, magic_page->srr1
>> +mtdar    rX        std    rX, magic_page->dar
>> +mtdsisr    rX        std    rX, magic_page->dsisr
>> +
>> +tlbsync            nop
>> +
>> +mtmsrd    rX, 0        b    <special mtmsr section>
>> +mtmsr            b    <special mtmsr section>
>> +
>> +mtmsrd    rX, 1        b    <special mtmsrd section>
>> +
>> +[BookE only]
>> +wrteei    [0|1]        b    <special wrteei section>
>>
>
> Probably the guest, as only it can arrange for special * sections.   
> Good.
>
>> +
>> +Some instructions require more logic to determine what's going on  
>> than a load
>> +or store instruction can deliver. To enable patching of those, we  
>> keep some
>> +RAM around where we can live translate instructions to. What  
>> happens is the
>> +following:
>> +
>> +    1) copy emulation code to memory
>> +    2) patch that code to fit the emulated instruction
>> +    3) patch that code to return to the original pc + 4
>> +    4) patch the original instruction to branch to the new code
>> +
>> +That way we can inject an arbitrary amount of code as replacement  
>> for a single
>> +instruction. This allows us to check for pending interrupts when  
>> setting EE=1
>> +for example.
>> +
>>
>
> Or not.
>
> What about transitions from paravirt to non-paravirt?  For example,  
> a system reset.

That ... eh ... good question. It would leave the map pending, but  
everything still continues working.

I don't really know in kvm when a reset occurred. So we have to make
qemu set the map to 0 on reset. Let's add that when we add migration
support and actually expose all those missing states to userspace.
Currently we only expose half the necessary state for migration
anyway :).


Alex

^ permalink raw reply

* Re: [PATCH 02/26] KVM: PPC: Convert MSR to shared page
From: Alexander Graf @ 2010-06-27  9:38 UTC (permalink / raw)
  To: Avi Kivity; +Cc: linuxppc-dev, KVM list, kvm-ppc@vger.kernel.org
In-Reply-To: <4C2708EB.9020500@redhat.com>


On 27.06.2010 at 10:16, Avi Kivity <avi@redhat.com> wrote:

> On 06/26/2010 02:24 AM, Alexander Graf wrote:
>> One of the most obvious registers to share with the guest directly  
>> is the
>> MSR. The MSR contains the "interrupts enabled" flag which the guest  
>> has to
>> toggle in critical sections.
>>
>> So in order to bring the overhead of interrupt en- and disabling  
>> down, let's
>> put msr into the shared page. Keep in mind that even though you can  
>> fully read
>> its contents, writing to it doesn't always update all state. There  
>> are a few
>> safe fields that don't require hypervisor interaction. See the guest
>> implementation that follows later for reference.
>>
>
>
> You mean, see the documentation for reference.
>
> It should be possible to write the guest code looking only at the  
> documentation.

*shrug* since we're writing open source I don't mind telling people to
read code for a reference implementation. If well written, that's more
comprehensible than documentation anyways :).

But either way, you can take a look at both - documentation and code,  
yes.

What I really meant here is that the list of registers we patch should  
be taken from the patch code. I didn't want to write out all of them  
in the description.


Alex

^ permalink raw reply

