From: Alexander Graf
Date: Thu, 03 Jul 2014 15:53:02 +0200
To: Mihai Caraman, kvm-ppc@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org, kvm@vger.kernel.org
Subject: Re: [PATCH 5/5 v4] KVM: PPC: Bookehv: Get vcpu's last instruction for emulation
Message-ID: <53B5603E.807@suse.de>
In-Reply-To: <1403909347-31622-6-git-send-email-mihai.caraman@freescale.com>
References: <1403909347-31622-1-git-send-email-mihai.caraman@freescale.com> <1403909347-31622-6-git-send-email-mihai.caraman@freescale.com>

On 28.06.14 00:49, Mihai Caraman wrote:
> On book3e, KVM uses the dedicated load-external-pid instruction (lwepx) to read
> the guest's last instruction on the exit path. lwepx exceptions (DTLB_MISS, DSI
> and LRAT), generated by loading a guest address, need to be handled by KVM.
> These exceptions are generated in a substituted guest translation context
> (EPLC[EGS] = 1) from host context (MSR[GS] = 0).
>
> Currently, KVM hooks only interrupts generated from guest context (MSR[GS] = 1),
> doing minimal checks on the fast path to avoid host performance degradation.
> lwepx exceptions originate from host state (MSR[GS] = 0), which implies
> additional checks in the DO_KVM macro (besides the current MSR[GS] = 1 check),
> looking at the Exception Syndrome Register (ESR[EPID]) and the External PID Load
> Context Register (EPLC[EGS]). Doing this on each Data TLB miss exception is
> obviously too intrusive for the host.
>
> Instead, read the guest's last instruction from kvmppc_load_last_inst() by
> searching for the physical address and kmapping it. This addresses the TODO for
> TLB eviction and execute-but-not-read entries, and allows us to get rid of
> lwepx until we are able to handle failures.
>
> A simple stress benchmark shows a 1% sys performance degradation compared with
> the previous approach (lwepx without failure handling):
>
> time for i in `seq 1 10000`; do /bin/echo > /dev/null; done
>
> real    0m 8.85s
> user    0m 4.34s
> sys     0m 4.48s
>
> vs
>
> real    0m 8.84s
> user    0m 4.36s
> sys     0m 4.44s
>
> An alternative solution, handling lwepx exceptions in KVM, is to temporarily
> hijack the interrupt vector from the host. Some cores share host IVOR registers
> between hardware threads, as is the case on FSL e6500, which imposes additional
> synchronization logic for this solution to work. That optimization can be
> addressed later, on top of this patch.
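
Condensed, the fetch path this patch adds (the full version is in the
e500_mmu_host.c hunk below) looks roughly like the sketch here;
search_guest_tlb() and guest_real_addr() are placeholder helpers standing
in for the MAS register sequence in the real code:

        /*
         * Rough sketch of the lwepx-free fetch path; see the real code in
         * the e500_mmu_host.c hunk below. search_guest_tlb() and
         * guest_real_addr() are placeholders, not functions from the patch.
         */
        static int load_last_inst_sketch(struct kvm_vcpu *vcpu, u32 *instr)
        {
                gva_t geaddr = kvmppc_get_pc(vcpu);     /* guest PC */
                u32 mas1, mas3;
                u64 mas7_mas3;
                hpa_t addr;
                void *va;

                /* 1. Search the hardware TLB for the guest PC (tlbsx, IRQs off) */
                search_guest_tlb(vcpu, geaddr, &mas1, &mas3, &mas7_mas3);

                /* 2. Entry evicted in the meantime? Re-enter the guest and retry. */
                if (!(mas1 & MAS1_VALID))
                        return EMULATE_AGAIN;

                /* 3. Compose the real address, read via a short-lived mapping */
                addr = guest_real_addr(mas7_mas3, geaddr, mas1);
                va = kmap_atomic(pfn_to_page(addr >> PAGE_SHIFT));
                *instr = *(u32 *)(va + (addr & ~PAGE_MASK));
                kunmap_atomic(va);

                return EMULATE_DONE;
        }
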
>
> Signed-off-by: Mihai Caraman
> ---
> v4:
>  - add switch and new function when getting last inst earlier
>  - use enum instead of prev semantic
>  - get rid of mas0, optimize mas7_mas3
>  - give more context in visible messages
>  - check storage attributes mismatch on MMUv2
>  - get rid of pfn_valid check
>
> v3:
>  - reworked patch description
>  - use unaltered kmap addr for kunmap
>  - get last instruction before being preempted
>
> v2:
>  - reworked patch description
>  - used pr_* functions
>  - addressed cosmetic feedback
>
>  arch/powerpc/kvm/booke.c              | 44 +++++++++++++++++
>  arch/powerpc/kvm/bookehv_interrupts.S | 37 ++++----------
>  arch/powerpc/kvm/e500_mmu_host.c      | 91 +++++++++++++++++++++++++++++++++++
>  3 files changed, 144 insertions(+), 28 deletions(-)
>
> diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
> index 34a42b9..843077b 100644
> --- a/arch/powerpc/kvm/booke.c
> +++ b/arch/powerpc/kvm/booke.c
> @@ -869,6 +869,28 @@ static void kvmppc_restart_interrupt(struct kvm_vcpu *vcpu,
>  	}
>  }
>  
> +static int kvmppc_resume_inst_load(struct kvm_run *run, struct kvm_vcpu *vcpu,
> +				   enum emulation_result emulated, u32 last_inst)
> +{
> +	switch (emulated) {
> +	case EMULATE_AGAIN:
> +		return RESUME_GUEST;
> +
> +	case EMULATE_FAIL:
> +		pr_debug("%s: load instruction from guest address %lx failed\n",
> +			 __func__, vcpu->arch.pc);
> +		/* For debugging, encode the failing instruction and
> +		 * report it to userspace. */
> +		run->hw.hardware_exit_reason = ~0ULL << 32;
> +		run->hw.hardware_exit_reason |= last_inst;
> +		kvmppc_core_queue_program(vcpu, ESR_PIL);
> +		return RESUME_HOST;
> +
> +	default:
> +		BUG();
> +	}
> +}
> +
>  /**
>   * kvmppc_handle_exit
>   *
> @@ -880,6 +902,8 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
>  	int r = RESUME_HOST;
>  	int s;
>  	int idx;
> +	u32 last_inst = KVM_INST_FETCH_FAILED;
> +	enum emulation_result emulated = EMULATE_DONE;
>  
>  	/* update before a new last_exit_type is rewritten */
>  	kvmppc_update_timing_stats(vcpu);
> @@ -887,6 +911,20 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
>  	/* restart interrupts if they were meant for the host */
>  	kvmppc_restart_interrupt(vcpu, exit_nr);
>  
> +	/*
> +	 * get last instruction before being preempted
> +	 * TODO: for e6500 check also BOOKE_INTERRUPT_LRAT_ERROR & ESR_DATA
> +	 */
> +	switch (exit_nr) {
> +	case BOOKE_INTERRUPT_DATA_STORAGE:
> +	case BOOKE_INTERRUPT_DTLB_MISS:
> +	case BOOKE_INTERRUPT_HV_PRIV:
> +		emulated = kvmppc_get_last_inst(vcpu, false, &last_inst);
> +		break;
> +	default:
> +		break;
> +	}
> +
>  	local_irq_enable();
>  
>  	trace_kvm_exit(exit_nr, vcpu);
> @@ -895,6 +933,11 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
>  	run->exit_reason = KVM_EXIT_UNKNOWN;
>  	run->ready_for_interrupt_injection = 1;
>  
> +	if (emulated != EMULATE_DONE) {
> +		r = kvmppc_resume_inst_load(run, vcpu, emulated, last_inst);
> +		goto out;
> +	}
> +
>  	switch (exit_nr) {
>  	case BOOKE_INTERRUPT_MACHINE_CHECK:
>  		printk("MACHINE CHECK: %lx\n", mfspr(SPRN_MCSR));
> @@ -1184,6 +1227,7 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
>  		BUG();
>  	}
>  
> +out:
>  	/*
>  	 * To avoid clobbering exit_reason, only check for signals if we
>  	 * aren't already exiting to userspace for some other reason.
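
A note on the hardware_exit_reason encoding above: the high 32 bits are
set to all-ones as a failure marker and the low 32 bits carry the failing
instruction word. Userspace could decode it along these lines (hypothetical
handler for illustration, not an existing QEMU function):

        /*
         * Hypothetical userspace-side decode of the encoding above: the
         * all-ones high word flags an instruction emulation failure, the
         * low word is the instruction (KVM_INST_FETCH_FAILED if the fetch
         * itself failed).
         */
        #include <stdint.h>
        #include <stdio.h>

        static void report_emulation_failure(uint64_t hardware_exit_reason)
        {
                if ((hardware_exit_reason >> 32) == 0xffffffffULL) {
                        uint32_t insn = (uint32_t)hardware_exit_reason;
                        fprintf(stderr, "emulation failed, last inst 0x%08x\n", insn);
                }
        }
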
> diff --git a/arch/powerpc/kvm/bookehv_interrupts.S b/arch/powerpc/kvm/bookehv_interrupts.S
> index 6ff4480..e000b39 100644
> --- a/arch/powerpc/kvm/bookehv_interrupts.S
> +++ b/arch/powerpc/kvm/bookehv_interrupts.S
> @@ -121,38 +121,14 @@
>  1:
>  
>  .if \flags & NEED_EMU
> -	/*
> -	 * This assumes you have external PID support.
> -	 * To support a bookehv CPU without external PID, you'll
> -	 * need to look up the TLB entry and create a temporary mapping.
> -	 *
> -	 * FIXME: we don't currently handle if the lwepx faults. PR-mode
> -	 * booke doesn't handle it either. Since Linux doesn't use
> -	 * broadcast tlbivax anymore, the only way this should happen is
> -	 * if the guest maps its memory execute-but-not-read, or if we
> -	 * somehow take a TLB miss in the middle of this entry code and
> -	 * evict the relevant entry. On e500mc, all kernel lowmem is
> -	 * bolted into TLB1 large page mappings, and we don't use
> -	 * broadcast invalidates, so we should not take a TLB miss here.
> -	 *
> -	 * Later we'll need to deal with faults here. Disallowing guest
> -	 * mappings that are execute-but-not-read could be an option on
> -	 * e500mc, but not on chips with an LRAT if it is used.
> -	 */
> -
> -	mfspr	r3, SPRN_EPLC	/* will already have correct ELPID and EGS */
>  	PPC_STL	r15, VCPU_GPR(R15)(r4)
>  	PPC_STL	r16, VCPU_GPR(R16)(r4)
>  	PPC_STL	r17, VCPU_GPR(R17)(r4)
>  	PPC_STL	r18, VCPU_GPR(R18)(r4)
>  	PPC_STL	r19, VCPU_GPR(R19)(r4)
> -	mr	r8, r3
>  	PPC_STL	r20, VCPU_GPR(R20)(r4)
> -	rlwimi	r8, r6, EPC_EAS_SHIFT - MSR_IR_LG, EPC_EAS
>  	PPC_STL	r21, VCPU_GPR(R21)(r4)
> -	rlwimi	r8, r6, EPC_EPR_SHIFT - MSR_PR_LG, EPC_EPR
>  	PPC_STL	r22, VCPU_GPR(R22)(r4)
> -	rlwimi	r8, r10, EPC_EPID_SHIFT, EPC_EPID
>  	PPC_STL	r23, VCPU_GPR(R23)(r4)
>  	PPC_STL	r24, VCPU_GPR(R24)(r4)
>  	PPC_STL	r25, VCPU_GPR(R25)(r4)
> @@ -162,10 +138,15 @@
>  	PPC_STL	r29, VCPU_GPR(R29)(r4)
>  	PPC_STL	r30, VCPU_GPR(R30)(r4)
>  	PPC_STL	r31, VCPU_GPR(R31)(r4)
> -	mtspr	SPRN_EPLC, r8
> -	isync
> -	lwepx	r9, 0, r5
> -	mtspr	SPRN_EPLC, r3
> +
> +	/*
> +	 * We don't use external PID support. lwepx faults would need to be
> +	 * handled by KVM and this implies additional code in DO_KVM (for
> +	 * DTLB_MISS, DSI and LRAT) to check ESR[EPID] and EPLC[EGS] which
> +	 * is too intrusive for the host. Get the last instruction in
> +	 * kvmppc_get_last_inst().
> +	 */
> +	li	r9, KVM_INST_FETCH_FAILED
>  	stw	r9, VCPU_LAST_INST(r4)
>  .endif
>
> diff --git a/arch/powerpc/kvm/e500_mmu_host.c b/arch/powerpc/kvm/e500_mmu_host.c
> index 4b4e8d6..57463e5 100644
> --- a/arch/powerpc/kvm/e500_mmu_host.c
> +++ b/arch/powerpc/kvm/e500_mmu_host.c
> @@ -606,11 +606,102 @@ void kvmppc_mmu_map(struct kvm_vcpu *vcpu, u64 eaddr, gpa_t gpaddr,
>  	}
>  }
>  
> +#ifdef CONFIG_KVM_BOOKE_HV
>  int kvmppc_load_last_inst(struct kvm_vcpu *vcpu, enum instruction_type type,
>  			  u32 *instr)
>  {
> +	gva_t geaddr;
> +	hpa_t addr;
> +	hfn_t pfn;
> +	hva_t eaddr;
> +	u32 mas1, mas2, mas3;
> +	u64 mas7_mas3;
> +	struct page *page;
> +	unsigned int addr_space, psize_shift;
> +	bool pr;
> +	unsigned long flags;
> +
> +	/* Search TLB for guest pc to get the real address */
> +	geaddr = kvmppc_get_pc(vcpu);
> +
> +	addr_space = (vcpu->arch.shared->msr & MSR_IS) >> MSR_IR_LG;
> +
> +	local_irq_save(flags);
> +	mtspr(SPRN_MAS6, (vcpu->arch.pid << MAS6_SPID_SHIFT) | addr_space);
> +	mtspr(SPRN_MAS5, MAS5_SGS | vcpu->kvm->arch.lpid);
> +	asm volatile("tlbsx 0, %[geaddr]\n" : :
> +		     [geaddr] "r" (geaddr));
> +	mtspr(SPRN_MAS5, 0);
> +	mtspr(SPRN_MAS8, 0);
> +	mas1 = mfspr(SPRN_MAS1);
> +	mas2 = mfspr(SPRN_MAS2);
> +	mas3 = mfspr(SPRN_MAS3);
> +#ifdef CONFIG_64BIT
> +	mas7_mas3 = mfspr(SPRN_MAS7_MAS3);
> +#else
> +	mas7_mas3 = ((u64)mfspr(SPRN_MAS7) << 32) | mas3;
> +#endif
> +	local_irq_restore(flags);
> +
> +	/*
> +	 * If the TLB entry for the guest pc was evicted, return to the guest.
> +	 * There is a good chance a valid TLB entry will be found next time.
> +	 */
> +	if (!(mas1 & MAS1_VALID))
> +		return EMULATE_AGAIN;
> +
> +	/*
> +	 * Another thread may rewrite the TLB entry in parallel; don't
> +	 * execute from the address if the execute permission is not set.
> +	 */
> +	pr = vcpu->arch.shared->msr & MSR_PR;
> +	if (unlikely((pr && !(mas3 & MAS3_UX)) ||
> +		     (!pr && !(mas3 & MAS3_SX)))) {
> +		pr_debug("%s: Instruction emulation from guest address %08lx without execute permission\n",
> +			 __func__, geaddr);
> +		return EMULATE_FAIL;

In this case, how did we ever get here? Why can't we just evict the entry 
and return EMULATE_AGAIN?

> +	}
> +
> +	/*
> +	 * The real address will be mapped by a cacheable, memory coherent,
> +	 * write-back page. Check for mismatches when LRAT is used.
> +	 */
> +	if (has_feature(vcpu, VCPU_FTR_MMU_V2) &&
> +	    unlikely((mas2 & MAS2_I) || (mas2 & MAS2_W) || !(mas2 & MAS2_M))) {
> +		pr_debug("%s: Instruction emulation from guest address %08lx mismatches storage attributes\n",
> +			 __func__, geaddr);
> +		return EMULATE_FAIL;

Hrm - do we really want to deal with injecting faults here? I'd say it's 
ok to just end up in an endless EMULATE_AGAIN loop.

> +	}
> +
> +	/* Get page size */
> +	psize_shift = MAS1_GET_TSIZE(mas1) + 10;
> +
> +	/* Map a page and get guest's instruction */
> +	addr = (mas7_mas3 & (~0ULL << psize_shift)) |
> +	       (geaddr & ((1ULL << psize_shift) - 1ULL));
> +	pfn = addr >> PAGE_SHIFT;
> +
> +	/* Guard against emulation from device areas */
> +	if (unlikely(!page_is_ram(pfn))) {
> +		pr_debug("%s: Instruction emulation from non-RAM host address %08llx is not supported\n",
> +			 __func__, addr);
> +		return EMULATE_FAIL;

Same here :).

> +	}
> +
> +	page = pfn_to_page(pfn);
> +	eaddr = (unsigned long)kmap_atomic(page);
> +	*instr = *(u32 *)(eaddr | (addr & ~PAGE_MASK));
> +	kunmap_atomic((u32 *)eaddr);

Doesn't kmap_atomic() have to be guarded somehow?


Alex
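
On that last question, for what it's worth: as far as I recall, kmap_atomic()
itself disables preemption and page faults until the matching kunmap_atomic(),
so the short map/read/unmap window above is self-guarding as long as nothing
in between can sleep. A minimal sketch of that pattern (read_guest_insn() is
a made-up helper name for illustration, not from the patch):

        /*
         * Minimal sketch of the kmap_atomic pattern used above:
         * kmap_atomic() disables preemption and page faults until the
         * matching kunmap_atomic(), so the short read needs no extra
         * guard. read_guest_insn() is a made-up name for illustration.
         */
        static u32 read_guest_insn(struct page *page, hpa_t addr)
        {
                void *va;
                u32 insn;

                va = kmap_atomic(page);         /* preemption + pagefaults off */
                insn = *(u32 *)(va + (addr & ~PAGE_MASK));
                kunmap_atomic(va);              /* re-enabled here */

                return insn;
        }
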