From mboxrd@z Thu Jan 1 00:00:00 1970 From: Christian Ehrhardt Date: Wed, 23 Jan 2008 13:38:43 +0000 Subject: Re: [kvm-ppc-devel] KVM kernel/userspace TLB interface Message-Id: <47974363.1070008@linux.vnet.ibm.com> MIME-Version: 1 Content-Type: multipart/mixed; boundary="------------030408090601050007070403" List-Id: References: <1199751688.5741.40.camel@basalt> In-Reply-To: <1199751688.5741.40.camel@basalt> To: kvm-ppc@vger.kernel.org This is a multi-part message in MIME format. --------------030408090601050007070403 Content-Type: text/plain; charset="iso-8859-1"; format="flowed" Content-Transfer-Encoding: quoted-printable Christian Ehrhardt wrote: This mail now has the reworked implementation of the patch for ppc440 in th= e #2 style attached where the guest tlb can be get/set completely and the s= hadow tlb can be read from userspace (now separate ioctls for guest/shadow). Actually this should work for most if not all use cases we have. Especially= our current use case of setting up a proper initial state is very arch dep= endent anyway and if we really can save userspace code for shared usage dep= ends on the fact if we will be able to create a shared bit format for flags= (I start to put a common "map this" call down on my prio list). I just downloaded the "PowerPC e500 Core Family Reference Manual" and will = read a bit to see where the differences are in relation to this interface, = but atm everything works for me because the #2 style interface in this patc= h is sufficient for the only use case we have atm. @Zhang Wei: I would really like to see your code for the solution in the "m= ap this" style you have mentioned that you have written for e500. > It looks like we have two different needs we can solve with the approach = > Hollis suggested by implementing #2 (arch specific) and #3 (generic map=20 > this) interfaces. >=20 > Use cases - for #2 (set/get TLB elements including architecture specific = > information): > - Dump TLB, comparable to vcpu_dump we will need a tlb_dump for=20 > debugging and error reporting. > - Save/Restore/Migrate - even not being a need right now this might=20 > become a need for us all end so we will need a get/set function that is=20 > able to completely save/restore the guest TLB >=20 > Use Cases for #3 (insert a single mapping ) > - simplifying shared userspace code by using only one common interface=20 > wherever possible that does not need to know which hardware it adresses >=20 > --- >=20 > #2 - "get/set TLB" > I attached a prototype implementation of an architecture dependent mass=20 > KVM_GET_TLB & KVM_SET_TLB which get and set the full status the guest &=20 > shadow tlb structs assigned to a vcpu. > I intend to eventually have different calls for guest and shadow=20 > structure because e.g. we actually will never want to set shadow tlb=20 > from userspace - this prototype is still in development and any comments = > are welcome (I'll resend it once it is changed to different=20 > getter/setter for guest and shadow tlb). > @Zhang Wei - the interface I have in mind for this arch specific=20 > implementation is an array of tlbe[MAX_TLB_SIZE] get-/setting all=20 > elements of either guest or shadow tlb at once. Every architecture can=20 > define it's own tlbe structure in the kernel headers and can implement=20 > special needs in their backend for these ioctls (KVM_GET_GUEST_TLB,=20 > KVM_GET_SHADOW_TLB, KVM_SET_GUEST_TLB, but no KVM_SET_SHADOW_TLB). >=20 > --- >=20 > #3 - "map this" > For #3 we need some discussion how the data structure of a shared=20 > interface might look like. For ppc440 it would be the { virtual addr,=20 > real addr, size, flags } as you see the tlbe definition that is moved in = > my patch. I wanted to ask what more/less needs e500 has for a "map this" = > interface to insert a single tlb-entry. We need also to clarify what=20 > flags we all need to define a shared set of bits (if possible to share=20 > that flags) in the flags variable we can use that every architecture can = > map to it's own meaning in the backend to this ioctl. > When I know the needs of e500 for the "map this" interface (and any=20 > other needs brought up that might affect future hardware we want to be=20 > compatible with) I intend to implement a prototype of this interface for = > ppc440 and then iterate it until we are happy with it for ppc440, e500=20 > and whatever hw we have in mind for the future. >=20 >=20 > Zhang Wei wrote: >> Hi, Hollis, >> >>> Hi Zhang, the administrative tools have a need to access the guest's TLB >>> state, for example to create the mapping for the initial code to execute >>> in. This requires a kernel interface to be exported to userspace, but as >>> you noted, the TLBs are very different between 440 and e500. >>> >>> Here are a couple ideas: > [...] >>> #2* Have a per-core interface, e.g. KVM_440_SET_TLB_ENTRY and >>> KVM_e500_SET_TLB_ENTRY. These would communicate via completely >>> different structures. For example, the 440 structure would look >>> something like { index, TID, word 0, word 1, word 2 }. We'd >>> probably want to sanity check each ioctl to make sure it matches >>> the vcpu type. Not only that, but we would need core-specific >>> code in userspace to actually make the call. >>> #3* Try to abstract it so that we have a higher-level "map this" >>> interface. This could look something like { virtual addr, real >>> addr, size, flags }. It's up to core-specific code to select in >>> which TLB to create the mapping, which entry to clobber, >>> interpret the meaning of "flags", etc. I guess the big downside >>> here is that if we want to dump guest state for debugging >>> purposes we'd still need the core-specific interface. >>> >>> We could also try supporting both #2 and #3, so that "generic" code like >>> qemu could use the abstracted (and shared) "map this" call, while tools >>> with more specific needs could still get at the underlying details. >>> >> >> Agree with you! >> In my porting implementation, I used #3. I think #3 should be more >> flexible. >> Maybe we'll add the other TLB support to it in the future. We should >> define a good data structure about TLB content, which will include 440 >> and E500 TLB meaning. >> >> And If we take out TLB operations, rest codes are more generic. >> >> Cheers! >> Wei. >=20 > And Avi Kivity wrote: >> Hollis Blanchard wrote: [...] >> An unabstracted interface is needed for save/restore/migrate, which=20 >> may or may not be important for your target. I do recommend having=20 >> getters and setters for all architectural state. >> >=20 --=20 Gr=FCsse / regards,=20 Christian Ehrhardt IBM Linux Technology Center, Open Virtualization --------------030408090601050007070403 Content-Type: text/plain; name="ioctl-tlb-getset" Content-Transfer-Encoding: 7bit Content-Disposition: inline; filename="ioctl-tlb-getset" diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c --- a/arch/powerpc/kvm/powerpc.c +++ b/arch/powerpc/kvm/powerpc.c @@ -26,6 +26,7 @@ #include #include #include +#include #include "44x_tlb.h" @@ -578,24 +579,9 @@ int kvm_arch_vcpu_init(struct kvm_vcpu * /* Initial guest state: 16MB mapping 0 -> 0, PC = 0, MSR = 0, R1 = 16MB */ int kvm_arch_vcpu_setup(struct kvm_vcpu *vcpu) { - struct tlbe *tlbe = &vcpu->arch.guest_tlb[0]; - - tlbe->tid = 0; - tlbe->word0 = PPC44x_TLB_16M | PPC44x_TLB_VALID; - tlbe->word1 = 0; - tlbe->word2 = PPC44x_TLB_SX | PPC44x_TLB_SW | PPC44x_TLB_SR; - - tlbe++; - tlbe->tid = 0; - tlbe->word0 = 0xef600000 | PPC44x_TLB_4K | PPC44x_TLB_VALID; - tlbe->word1 = 0xef600000; - tlbe->word2 = PPC44x_TLB_SX | PPC44x_TLB_SW | PPC44x_TLB_SR - | PPC44x_TLB_I | PPC44x_TLB_G; - vcpu->arch.pc = 0; vcpu->arch.guest_msr = 0; vcpu->arch.shadow_msr = MSR_PR|MSR_EE|MSR_IS|MSR_DS; - vcpu->arch.gpr[1] = (16<<20) - 8; /* -8 for the callee-save LR slot */ /* Eye-catching number so we know if the guest takes an interrupt * before it's programmed its own IVPR. */ @@ -820,6 +806,33 @@ long kvm_arch_vcpu_ioctl(struct file *fi long r; switch (ioctl) { + case KVM_GET_GUEST_TLB: { + struct kvm_vcpu *vcpu = filp->private_data; + struct kvm_single_tlb __user *utlb = (struct kvm_single_tlb __user *)arg; + + r = copy_to_user(&(utlb->tlb), &(vcpu->arch.guest_tlb), sizeof(struct kvm_single_tlb)); + if (r) + return r; + break; + } + case KVM_SET_GUEST_TLB: { + struct kvm_vcpu *vcpu = filp->private_data; + struct kvm_single_tlb __user *utlb = (struct kvm_single_tlb __user *)arg; + + r = copy_from_user(&(vcpu->arch.guest_tlb), &(utlb->tlb), sizeof(struct kvm_single_tlb)); + if (r) + return r; + break; + } + case KVM_GET_SHADOW_TLB: { + struct kvm_vcpu *vcpu = filp->private_data; + struct kvm_single_tlb __user *utlb = (struct kvm_single_tlb __user *)arg; + + r = copy_to_user(&(utlb->tlb), &(vcpu->arch.shadow_tlb), sizeof(struct kvm_single_tlb)); + if (r) + return r; + break; + } default: r = -EINVAL; } diff --git a/include/asm-powerpc/kvm.h b/include/asm-powerpc/kvm.h --- a/include/asm-powerpc/kvm.h +++ b/include/asm-powerpc/kvm.h @@ -21,6 +21,7 @@ #define __POWERPC_KVM_H__ #include +#include struct kvm_regs { __u32 pc; @@ -53,4 +54,14 @@ struct kvm_fpu { struct kvm_fpu { }; +struct tlbe { + __u32 tid; /* Only the low 8 bits are used. */ + __u32 word0; + __u32 word1; + __u32 word2; +}; + +struct kvm_single_tlb { + struct tlbe tlb[PPC44x_TLB_SIZE]; +}; #endif /* __POWERPC_KVM_H__ */ diff --git a/include/asm-powerpc/kvm_host.h b/include/asm-powerpc/kvm_host.h --- a/include/asm-powerpc/kvm_host.h +++ b/include/asm-powerpc/kvm_host.h @@ -39,13 +39,6 @@ struct kvm_vcpu_stat { u32 mmio_exits; u32 signal_exits; u32 light_exits; -}; - -struct tlbe { - u32 tid; /* Only the low 8 bits are used. */ - u32 word0; - u32 word1; - u32 word2; }; struct kvm_arch { diff --git a/include/linux/kvm.h b/include/linux/kvm.h --- a/include/linux/kvm.h +++ b/include/linux/kvm.h @@ -289,4 +289,8 @@ struct kvm_vapic_addr { /* Available with KVM_CAP_VAPIC */ #define KVM_SET_VAPIC_ADDR _IOW(KVMIO, 0x93, struct kvm_vapic_addr) +#define KVM_GET_GUEST_TLB _IOR(KVMIO, 0x94, struct kvm_single_tlb) +#define KVM_SET_GUEST_TLB _IOW(KVMIO, 0x95, struct kvm_single_tlb) +#define KVM_GET_SHADOW_TLB _IOR(KVMIO, 0x96, struct kvm_single_tlb) + #endif --------------030408090601050007070403 Content-Type: text/plain; name="ioctl-tlb-getset-userspace" Content-Transfer-Encoding: 7bit Content-Disposition: inline; filename="ioctl-tlb-getset-userspace" diff --git a/libkvm/libkvm-powerpc.c b/libkvm/libkvm-powerpc.c --- a/libkvm/libkvm-powerpc.c +++ b/libkvm/libkvm-powerpc.c @@ -25,6 +25,7 @@ #include "kvm-powerpc.h" #include #include +#include int handle_dcr(struct kvm_run *run, kvm_context_t kvm) { @@ -110,3 +111,69 @@ int kvm_arch_run(struct kvm_run *run, kv } return ret; } + +int kvm_get_guest_tlb(kvm_context_t kvm, int vcpu, struct kvm_single_tlb *tlb) +{ + int r = 0; + + r = ioctl(kvm->vcpu_fd[vcpu], KVM_GET_GUEST_TLB, tlb); + if (r) { + fprintf(stderr,"%s failed (r='%d')\n",__func__,r); + } + + return r; +} + +int kvm_get_shadow_tlb(kvm_context_t kvm, int vcpu, struct kvm_single_tlb *tlb) +{ + int r = 0; + + r = ioctl(kvm->vcpu_fd[vcpu], KVM_GET_SHADOW_TLB, tlb); + if (r) { + fprintf(stderr,"%s failed (r='%d')\n",__func__,r); + } + + return r; +} + +int kvm_set_guest_tlb(kvm_context_t kvm, int vcpu, struct kvm_single_tlb *tlb) +{ + int r = 0; + + r = ioctl(kvm->vcpu_fd[vcpu], KVM_SET_GUEST_TLB, tlb); + if (r) { + fprintf(stderr,"%s - failed (r='%d')\n",__func__,r); + } + + return r; +} + +static void print_tlb(struct kvm_single_tlb *tlb, char* name, char* flag) +{ + int i; + + fprintf(stderr,"%s:\n",name); + fprintf(stderr,"| %2s | %8s | %8s | %8s | %8s |\n","nr"," tid ", + "word0","word1","word2"); + for (i = 0; i < PPC44x_TLB_SIZE; i++) { + if (tlb->tlb[i].word0 & PPC44x_TLB_VALID) + fprintf(stderr,"%s%2d | %08X | %08X | %08X | %08X |\n", + flag,i, + tlb->tlb[i].tid, + tlb->tlb[i].word0, + tlb->tlb[i].word1, + tlb->tlb[i].word2); + } + fflush(stdout); +} + +void kvm_dump_tlb(kvm_context_t kvm, int vcpu) +{ + struct kvm_single_tlb tlb; + + kvm_get_guest_tlb(kvm, vcpu, &tlb); + print_tlb(&tlb,"Guest TLB","G"); + + kvm_get_shadow_tlb(kvm, vcpu, &tlb); + print_tlb(&tlb,"Shadow TLB","S"); +} diff --git a/libkvm/libkvm.h b/libkvm/libkvm.h --- a/libkvm/libkvm.h +++ b/libkvm/libkvm.h @@ -556,4 +556,10 @@ int kvm_enable_vapic(kvm_context_t kvm, #endif +#if defined(__powerpc__) +int kvm_get_guest_tlb(kvm_context_t kvm, int vcpu, struct kvm_single_tlb *tlb); +int kvm_set_guest_tlb(kvm_context_t kvm, int vcpu, struct kvm_single_tlb *tlb); +void kvm_dump_tlb(kvm_context_t kvm, int vcpu); #endif + +#endif diff --git a/qemu/hw/ppc440_boards_kvm.c b/qemu/hw/ppc440_boards_kvm.c --- a/qemu/hw/ppc440_boards_kvm.c +++ b/qemu/hw/ppc440_boards_kvm.c @@ -124,7 +124,6 @@ void ppc440_init(CPUState *env, ppc_dcr_register(env, DCRN_CPR0_CFGDATA, cpr0, dcr_read_cpr0, NULL); } - static void bambooKVM_init(ram_addr_t ram_size, int vga_ram_size, const char *boot_device, DisplayState *ds, const char *kernel_filename, @@ -220,7 +219,9 @@ static void bambooKVM_init(ram_addr_t ra } } - /* XXX insert TLB entries */ + /* insert initial TLB entry state */ + kvm_ppc440_tlb_init(env); + /* set up initial stack pointer */ env->gpr[1] = (16<<20) - 8; env->gpr[4] = initrd_base; env->gpr[5] = initrd_size; diff --git a/qemu/qemu-kvm-powerpc.c b/qemu/qemu-kvm-powerpc.c --- a/qemu/qemu-kvm-powerpc.c +++ b/qemu/qemu-kvm-powerpc.c @@ -204,4 +204,24 @@ int handle_powerpc_dcr_write(uint32_t dc return 0; /* XXX ignore failed DCR ops */ } +void kvm_ppc440_tlb_init(CPUState *env) +{ + static struct kvm_single_tlb tlb; + struct tlbe *gtlbe = (tlb.tlb); + + gtlbe->tid = 0; + gtlbe->word0 = PPC44x_TLB_16M | PPC44x_TLB_VALID; + gtlbe->word1 = 0; + gtlbe->word2 = PPC44x_TLB_SX | PPC44x_TLB_SW | PPC44x_TLB_SR; + + gtlbe++; + gtlbe->tid = 0; + gtlbe->word0 = 0xef600000 | PPC44x_TLB_4K | PPC44x_TLB_VALID; + gtlbe->word1 = 0xef600000; + gtlbe->word2 = PPC44x_TLB_SX | PPC44x_TLB_SW | PPC44x_TLB_SR + | PPC44x_TLB_I | PPC44x_TLB_G; + + kvm_set_guest_tlb(kvm_context, env->cpu_index, &tlb); +} + #endif diff --git a/qemu/qemu-kvm.h b/qemu/qemu-kvm.h --- a/qemu/qemu-kvm.h +++ b/qemu/qemu-kvm.h @@ -51,6 +51,7 @@ int handle_tpr_access(void *opaque, int #ifdef TARGET_PPC int handle_powerpc_dcr_read(uint32_t dcrn, uint32_t *data); int handle_powerpc_dcr_write(uint32_t dcrn, uint32_t data); +void kvm_ppc440_tlb_init(CPUState *env); #endif #define ALIGN(x, y) (((x)+(y)-1) & ~((y)-1)) --------------030408090601050007070403 Content-Type: text/plain; charset="us-ascii" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit Content-Disposition: inline ------------------------------------------------------------------------- This SF.net email is sponsored by: Microsoft Defy all challenges. Microsoft(R) Visual Studio 2008. http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/ --------------030408090601050007070403 Content-Type: text/plain; charset="us-ascii" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit Content-Disposition: inline _______________________________________________ kvm-ppc-devel mailing list kvm-ppc-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/kvm-ppc-devel --------------030408090601050007070403--