All of lore.kernel.org
 help / color / mirror / Atom feed
From: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com>
To: kvm-ppc@vger.kernel.org
Subject: Re: [kvm-ppc-devel] KVM kernel/userspace TLB interface
Date: Wed, 23 Jan 2008 13:38:43 +0000	[thread overview]
Message-ID: <47974363.1070008@linux.vnet.ibm.com> (raw)
In-Reply-To: <1199751688.5741.40.camel@basalt>

[-- Attachment #1: Type: text/plain, Size: 6174 bytes --]

Christian Ehrhardt wrote:

This mail now has the reworked implementation of the patch for ppc440 in the #2 style attached where the guest tlb can be get/set completely and the shadow tlb can be read from userspace (now separate ioctls for guest/shadow).
Actually this should work for most if not all use cases we have. Especially our current use case of setting up a proper initial state is very arch dependent anyway and if we really can save userspace code for shared usage depends on the fact if we will be able to create a shared bit format for flags (I start to put a common "map this" call down on my prio list).

I just downloaded the "PowerPC e500 Core Family Reference Manual" and will read a bit to see where the differences are in relation to this interface, but atm everything works for me because the #2 style interface in this patch is sufficient for the only use case we have atm.
@Zhang Wei: I would really like to see your code for the solution in the "map this" style you have mentioned that you have written for e500.

> It looks like we have two different needs we can solve with the approach 
> Hollis suggested by implementing #2 (arch specific) and #3 (generic map 
> this) interfaces.
> 
> Use cases - for #2 (set/get TLB elements including architecture specific 
> information):
> - Dump TLB, comparable to vcpu_dump we will need a tlb_dump for 
> debugging and error reporting.
> - Save/Restore/Migrate - even not being a need right now this might 
> become a need for us all end so we will need a get/set function that is 
> able to completely save/restore the guest TLB
> 
> Use Cases for #3 (insert a single mapping )
> - simplifying shared userspace code by using only one common interface 
> wherever possible that does not need to know which hardware it adresses
> 
> ---
> 
> #2 - "get/set TLB"
> I attached a prototype implementation of an architecture dependent mass 
> KVM_GET_TLB & KVM_SET_TLB which get and set the full status the guest & 
> shadow tlb structs assigned to a vcpu.
> I intend to eventually have different calls for guest and shadow 
> structure because e.g. we actually will never want to set shadow tlb 
> from userspace - this prototype is still in development and any comments 
> are welcome (I'll resend it once it is changed to different 
> getter/setter for guest and shadow tlb).
> @Zhang Wei - the interface I have in mind for this arch specific 
> implementation is an array of tlbe[MAX_TLB_SIZE] get-/setting all 
> elements of either guest or shadow tlb at once. Every architecture can 
> define it's own tlbe structure in the kernel headers and can implement 
> special needs in their backend for these ioctls (KVM_GET_GUEST_TLB, 
> KVM_GET_SHADOW_TLB, KVM_SET_GUEST_TLB, but no KVM_SET_SHADOW_TLB).
> 
> ---
> 
> #3 - "map this"
> For #3 we need some discussion how the data structure of a shared 
> interface might look like. For ppc440 it would be the { virtual addr, 
> real addr, size, flags } as you see the tlbe definition that is moved in 
> my patch. I wanted to ask what more/less needs e500 has for a "map this" 
> interface to insert a single tlb-entry. We need also to clarify what 
> flags we all need to define a shared set of bits (if possible to share 
> that flags) in the flags variable we can use that every architecture can 
> map to it's own meaning in the backend to this ioctl.
> When I know the needs of e500 for the "map this" interface (and any 
> other needs brought up that might affect future hardware we want to be 
> compatible with) I intend to implement a prototype of this interface for 
> ppc440 and then iterate it until we are happy with it for ppc440, e500 
> and whatever hw we have in mind for the future.
> 
> 
> Zhang Wei wrote:
>> Hi, Hollis,
>>
>>> Hi Zhang, the administrative tools have a need to access the guest's TLB
>>> state, for example to create the mapping for the initial code to execute
>>> in. This requires a kernel interface to be exported to userspace, but as
>>> you noted, the TLBs are very different between 440 and e500.
>>>
>>> Here are a couple ideas:
> [...]
>>>      #2* Have a per-core interface, e.g. KVM_440_SET_TLB_ENTRY and
>>>         KVM_e500_SET_TLB_ENTRY. These would communicate via completely
>>>         different structures. For example, the 440 structure would look
>>>         something like { index, TID, word 0, word 1, word 2 }. We'd
>>>         probably want to sanity check each ioctl to make sure it matches
>>>         the vcpu type. Not only that, but we would need core-specific
>>>         code in userspace to actually make the call.
>>>      #3* Try to abstract it so that we have a higher-level "map this"
>>>         interface. This could look something like { virtual addr, real
>>>         addr, size, flags }. It's up to core-specific code to select in
>>>         which TLB to create the mapping, which entry to clobber,
>>>         interpret the meaning of "flags", etc. I guess the big downside
>>>         here is that if we want to dump guest state for debugging
>>>         purposes we'd still need the core-specific interface.
>>>
>>> We could also try supporting both #2 and #3, so that "generic" code like
>>> qemu could use the abstracted (and shared) "map this" call, while tools
>>> with more specific needs could still get at the underlying details.
>>>
>>
>> Agree with you!
>> In my porting implementation, I used #3. I think #3 should be more
>> flexible.
>> Maybe we'll add the other TLB support to it in the future. We should
>> define a good data structure about TLB content, which will include 440
>> and E500 TLB meaning.
>>
>> And If we take out TLB operations, rest codes are more generic.
>>
>> Cheers!
>> Wei.
> 
> And Avi Kivity wrote:
>> Hollis Blanchard wrote:   [...]
>> An unabstracted interface is needed for save/restore/migrate, which 
>> may or may not be important for your target.  I do recommend having 
>> getters and setters for all architectural state.
>>
> 

-- 

Grüsse / regards, 
Christian Ehrhardt
IBM Linux Technology Center, Open Virtualization

[-- Attachment #2: ioctl-tlb-getset --]
[-- Type: text/plain, Size: 3604 bytes --]

diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -26,6 +26,7 @@
 #include <linux/vmalloc.h>
 #include <asm/cputable.h>
 #include <asm/cacheflush.h>
+#include <asm/uaccess.h>
 
 #include "44x_tlb.h"
 
@@ -578,24 +579,9 @@ int kvm_arch_vcpu_init(struct kvm_vcpu *
 /* Initial guest state: 16MB mapping 0 -> 0, PC = 0, MSR = 0, R1 = 16MB */
 int kvm_arch_vcpu_setup(struct kvm_vcpu *vcpu)
 {
-	struct tlbe *tlbe = &vcpu->arch.guest_tlb[0];
-
-	tlbe->tid = 0;
-	tlbe->word0 = PPC44x_TLB_16M | PPC44x_TLB_VALID;
-	tlbe->word1 = 0;
-	tlbe->word2 = PPC44x_TLB_SX | PPC44x_TLB_SW | PPC44x_TLB_SR;
-
-	tlbe++;
-	tlbe->tid = 0;
-	tlbe->word0 = 0xef600000 | PPC44x_TLB_4K | PPC44x_TLB_VALID;
-	tlbe->word1 = 0xef600000;
-	tlbe->word2 = PPC44x_TLB_SX | PPC44x_TLB_SW | PPC44x_TLB_SR
-	              | PPC44x_TLB_I | PPC44x_TLB_G;
-
 	vcpu->arch.pc = 0;
 	vcpu->arch.guest_msr = 0;
 	vcpu->arch.shadow_msr = MSR_PR|MSR_EE|MSR_IS|MSR_DS;
-	vcpu->arch.gpr[1] = (16<<20) - 8; /* -8 for the callee-save LR slot */
 
 	/* Eye-catching number so we know if the guest takes an interrupt
 	 * before it's programmed its own IVPR. */
@@ -820,6 +806,33 @@ long kvm_arch_vcpu_ioctl(struct file *fi
 	long r;
 
 	switch (ioctl) {
+	case KVM_GET_GUEST_TLB: {
+		struct kvm_vcpu *vcpu = filp->private_data;
+		struct kvm_single_tlb __user *utlb = (struct kvm_single_tlb __user *)arg;
+ 
+		r = copy_to_user(&(utlb->tlb), &(vcpu->arch.guest_tlb), sizeof(struct kvm_single_tlb));
+		if (r)
+			return r;
+		break;
+	}
+	case KVM_SET_GUEST_TLB: {
+		struct kvm_vcpu *vcpu = filp->private_data;
+		struct kvm_single_tlb __user *utlb = (struct kvm_single_tlb __user *)arg;
+ 
+		r = copy_from_user(&(vcpu->arch.guest_tlb), &(utlb->tlb), sizeof(struct kvm_single_tlb));
+		if (r)
+			return r;
+ 		break;
+	}
+	case KVM_GET_SHADOW_TLB: {
+		struct kvm_vcpu *vcpu = filp->private_data;
+		struct kvm_single_tlb __user *utlb = (struct kvm_single_tlb __user *)arg;
+ 
+		r = copy_to_user(&(utlb->tlb), &(vcpu->arch.shadow_tlb), sizeof(struct kvm_single_tlb));
+		if (r)
+			return r;
+		break;
+	}
 	default:
 		r = -EINVAL;
 	}
diff --git a/include/asm-powerpc/kvm.h b/include/asm-powerpc/kvm.h
--- a/include/asm-powerpc/kvm.h
+++ b/include/asm-powerpc/kvm.h
@@ -21,6 +21,7 @@
 #define __POWERPC_KVM_H__
 
 #include <asm/types.h>
+#include <asm/mmu-44x.h>
 
 struct kvm_regs {
 	__u32 pc;
@@ -53,4 +54,14 @@ struct kvm_fpu {
 struct kvm_fpu {
 };
 
+struct tlbe {
+	__u32 tid; /* Only the low 8 bits are used. */
+	__u32 word0;
+	__u32 word1;
+	__u32 word2;
+};
+
+struct kvm_single_tlb {
+	struct tlbe tlb[PPC44x_TLB_SIZE];
+};
 #endif /* __POWERPC_KVM_H__ */
diff --git a/include/asm-powerpc/kvm_host.h b/include/asm-powerpc/kvm_host.h
--- a/include/asm-powerpc/kvm_host.h
+++ b/include/asm-powerpc/kvm_host.h
@@ -39,13 +39,6 @@ struct kvm_vcpu_stat {
 	u32 mmio_exits;
 	u32 signal_exits;
 	u32 light_exits;
-};
-
-struct tlbe {
-	u32 tid; /* Only the low 8 bits are used. */
-	u32 word0;
-	u32 word1;
-	u32 word2;
 };
 
 struct kvm_arch {
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -289,4 +289,8 @@ struct kvm_vapic_addr {
 /* Available with KVM_CAP_VAPIC */
 #define KVM_SET_VAPIC_ADDR        _IOW(KVMIO,  0x93, struct kvm_vapic_addr)
 
+#define KVM_GET_GUEST_TLB         _IOR(KVMIO,  0x94, struct kvm_single_tlb)
+#define KVM_SET_GUEST_TLB         _IOW(KVMIO,  0x95, struct kvm_single_tlb)
+#define KVM_GET_SHADOW_TLB        _IOR(KVMIO,  0x96, struct kvm_single_tlb)
+
 #endif

[-- Attachment #3: ioctl-tlb-getset-userspace --]
[-- Type: text/plain, Size: 4225 bytes --]

diff --git a/libkvm/libkvm-powerpc.c b/libkvm/libkvm-powerpc.c
--- a/libkvm/libkvm-powerpc.c
+++ b/libkvm/libkvm-powerpc.c
@@ -25,6 +25,7 @@
 #include "kvm-powerpc.h"
 #include <errno.h>
 #include <stdio.h>
+#include <sys/ioctl.h>
 
 int handle_dcr(struct kvm_run *run,  kvm_context_t kvm)
 {
@@ -110,3 +111,69 @@ int kvm_arch_run(struct kvm_run *run, kv
 	}
 	return ret;
 }
+
+int kvm_get_guest_tlb(kvm_context_t kvm, int vcpu, struct kvm_single_tlb *tlb)
+{
+	int r = 0;
+
+	r = ioctl(kvm->vcpu_fd[vcpu], KVM_GET_GUEST_TLB, tlb);
+	if (r) {
+		fprintf(stderr,"%s failed (r='%d')\n",__func__,r);
+	}
+
+	return r;
+}
+
+int kvm_get_shadow_tlb(kvm_context_t kvm, int vcpu, struct kvm_single_tlb *tlb)
+{
+	int r = 0;
+
+	r = ioctl(kvm->vcpu_fd[vcpu], KVM_GET_SHADOW_TLB, tlb);
+	if (r) {
+		fprintf(stderr,"%s failed (r='%d')\n",__func__,r);
+	}
+
+	return r;
+}
+
+int kvm_set_guest_tlb(kvm_context_t kvm, int vcpu, struct kvm_single_tlb *tlb)
+{
+	int r = 0;
+
+	r = ioctl(kvm->vcpu_fd[vcpu], KVM_SET_GUEST_TLB, tlb);
+	if (r) {
+		fprintf(stderr,"%s - failed (r='%d')\n",__func__,r);
+	}
+
+	return r;
+}
+
+static void print_tlb(struct kvm_single_tlb *tlb, char* name, char* flag)
+{
+	int i;
+
+	fprintf(stderr,"%s:\n",name);
+	fprintf(stderr,"| %2s | %8s | %8s | %8s | %8s |\n","nr"," tid ",
+						"word0","word1","word2");
+	for (i = 0; i < PPC44x_TLB_SIZE; i++) {
+		if (tlb->tlb[i].word0 & PPC44x_TLB_VALID)
+			fprintf(stderr,"%s%2d | %08X | %08X | %08X | %08X |\n",
+				flag,i,
+				tlb->tlb[i].tid,
+				tlb->tlb[i].word0,
+				tlb->tlb[i].word1,
+				tlb->tlb[i].word2);
+	}
+	fflush(stdout);
+}
+
+void kvm_dump_tlb(kvm_context_t kvm, int vcpu)
+{
+	struct kvm_single_tlb tlb;
+
+	kvm_get_guest_tlb(kvm, vcpu, &tlb);
+	print_tlb(&tlb,"Guest TLB","G");
+
+	kvm_get_shadow_tlb(kvm, vcpu, &tlb);
+	print_tlb(&tlb,"Shadow TLB","S");
+}
diff --git a/libkvm/libkvm.h b/libkvm/libkvm.h
--- a/libkvm/libkvm.h
+++ b/libkvm/libkvm.h
@@ -556,4 +556,10 @@ int kvm_enable_vapic(kvm_context_t kvm, 
 
 #endif
 
+#if defined(__powerpc__)
+int kvm_get_guest_tlb(kvm_context_t kvm, int vcpu, struct kvm_single_tlb *tlb);
+int kvm_set_guest_tlb(kvm_context_t kvm, int vcpu, struct kvm_single_tlb *tlb);
+void kvm_dump_tlb(kvm_context_t kvm, int vcpu);
 #endif
+
+#endif
diff --git a/qemu/hw/ppc440_boards_kvm.c b/qemu/hw/ppc440_boards_kvm.c
--- a/qemu/hw/ppc440_boards_kvm.c
+++ b/qemu/hw/ppc440_boards_kvm.c
@@ -124,7 +124,6 @@ void ppc440_init(CPUState *env,
 	ppc_dcr_register(env, DCRN_CPR0_CFGDATA, cpr0, dcr_read_cpr0, NULL);
 }
 
-
 static void bambooKVM_init(ram_addr_t ram_size, int vga_ram_size, 
 			const char *boot_device, DisplayState *ds,  
 			const char *kernel_filename,
@@ -220,7 +219,9 @@ static void bambooKVM_init(ram_addr_t ra
 		}
 	}
 
-	/* XXX insert TLB entries */
+	/* insert initial TLB entry state */
+	kvm_ppc440_tlb_init(env);
+	/* set up initial stack pointer */
 	env->gpr[1] = (16<<20) - 8;
 	env->gpr[4] = initrd_base;
 	env->gpr[5] = initrd_size;
diff --git a/qemu/qemu-kvm-powerpc.c b/qemu/qemu-kvm-powerpc.c
--- a/qemu/qemu-kvm-powerpc.c
+++ b/qemu/qemu-kvm-powerpc.c
@@ -204,4 +204,24 @@ int handle_powerpc_dcr_write(uint32_t dc
     return 0; /* XXX ignore failed DCR ops */
 }
 
+void kvm_ppc440_tlb_init(CPUState *env)
+{
+    static struct kvm_single_tlb tlb;
+    struct tlbe *gtlbe = (tlb.tlb);
+
+    gtlbe->tid = 0;
+    gtlbe->word0 = PPC44x_TLB_16M | PPC44x_TLB_VALID;
+    gtlbe->word1 = 0;
+    gtlbe->word2 = PPC44x_TLB_SX | PPC44x_TLB_SW | PPC44x_TLB_SR;
+
+    gtlbe++;
+    gtlbe->tid = 0;
+    gtlbe->word0 = 0xef600000 | PPC44x_TLB_4K | PPC44x_TLB_VALID;
+    gtlbe->word1 = 0xef600000;
+    gtlbe->word2 = PPC44x_TLB_SX | PPC44x_TLB_SW | PPC44x_TLB_SR
+                   | PPC44x_TLB_I | PPC44x_TLB_G;
+
+    kvm_set_guest_tlb(kvm_context, env->cpu_index, &tlb);
+}
+
 #endif
diff --git a/qemu/qemu-kvm.h b/qemu/qemu-kvm.h
--- a/qemu/qemu-kvm.h
+++ b/qemu/qemu-kvm.h
@@ -51,6 +51,7 @@ int handle_tpr_access(void *opaque, int 
 #ifdef TARGET_PPC
 int handle_powerpc_dcr_read(uint32_t dcrn, uint32_t *data);
 int handle_powerpc_dcr_write(uint32_t dcrn, uint32_t data);
+void kvm_ppc440_tlb_init(CPUState *env);
 #endif
 
 #define ALIGN(x, y)  (((x)+(y)-1) & ~((y)-1))

[-- Attachment #4: Type: text/plain, Size: 228 bytes --]

-------------------------------------------------------------------------
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2008.
http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/

[-- Attachment #5: Type: text/plain, Size: 170 bytes --]

_______________________________________________
kvm-ppc-devel mailing list
kvm-ppc-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-ppc-devel

  parent reply	other threads:[~2008-01-23 13:38 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-01-08  0:21 [kvm-ppc-devel] KVM kernel/userspace TLB interface Hollis Blanchard
2008-01-08  7:12 ` Zhang Wei
2008-01-08 14:03 ` Avi Kivity
2008-01-22 13:25 ` Christian Ehrhardt
2008-01-23 13:38 ` Christian Ehrhardt [this message]
2008-01-23 21:05 ` Hollis Blanchard
2008-01-24  7:23 ` Christian Ehrhardt
2008-01-24  7:37 ` Hollis Blanchard
2008-01-24  7:58 ` Zhang Wei
2008-01-24  8:01 ` Zhang Wei

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=47974363.1070008@linux.vnet.ibm.com \
    --to=ehrhardt@linux.vnet.ibm.com \
    --cc=kvm-ppc@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.