* [PATCH v4 1/2] KVM: PPC: Add new capability to control MCE behaviour @ 2017-01-09 11:40 Aravinda Prasad 2017-01-09 11:40 ` [PATCH v4 2/2] KVM: PPC: Exit guest upon MCE when FWNMI capability is enabled Aravinda Prasad ` (2 more replies) 0 siblings, 3 replies; 8+ messages in thread From: Aravinda Prasad @ 2017-01-09 11:40 UTC (permalink / raw) To: gleb, agraf, kvm-ppc, paulus, linuxppc-dev, pbonzini Cc: mahesh, mpe, kvm, david This patch introduces a new KVM capability to control how KVM behaves on machine check exception (MCE). Without this capability, KVM redirects machine check exceptions to guest's 0x200 vector, if the address in error belongs to the guest. With this capability KVM causes a guest exit with NMI exit reason. The new capability is required to avoid problems if a new kernel/KVM is used with an old QEMU for guests that don't issue "ibm,nmi-register". As old QEMU does not understand the NMI exit type, it treats it as a fatal error. However, the guest could have handled the machine check error if the exception was delivered to guest's 0x200 interrupt vector instead of NMI exit in case of old QEMU. QEMU part can be found at: http://lists.nongnu.org/archive/html/qemu-ppc/2015-12/msg00199.html Change Log v4: - Allow host-side handling of the machine check exception before passing on the exception to the guest. Change Log v3: - Split the patch into 2. First patch introduces the new capability while the second one enhances KVM to redirect MCE. - Fix access width bug Change Log v2: - Added KVM capability Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> --- arch/powerpc/include/asm/kvm_host.h | 1 + arch/powerpc/kernel/asm-offsets.c | 1 + arch/powerpc/kvm/powerpc.c | 7 +++++++ include/uapi/linux/kvm.h | 1 + 4 files changed, 10 insertions(+) diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index 28350a2..018c684 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -264,6 +264,7 @@ struct kvm_arch { int hpt_cma_alloc; struct dentry *debugfs_dir; struct dentry *htab_dentry; + u8 fwnmi_enabled; #endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */ #ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE struct mutex hpt_mutex; diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c index caec7bf..3acd503 100644 --- a/arch/powerpc/kernel/asm-offsets.c +++ b/arch/powerpc/kernel/asm-offsets.c @@ -494,6 +494,7 @@ int main(void) DEFINE(KVM_NEED_FLUSH, offsetof(struct kvm, arch.need_tlb_flush.bits)); DEFINE(KVM_ENABLED_HCALLS, offsetof(struct kvm, arch.enabled_hcalls)); DEFINE(KVM_VRMA_SLB_V, offsetof(struct kvm, arch.vrma_slb_v)); + DEFINE(KVM_FWNMI, offsetof(struct kvm, arch.fwnmi_enabled)); DEFINE(VCPU_DSISR, offsetof(struct kvm_vcpu, arch.shregs.dsisr)); DEFINE(VCPU_DAR, offsetof(struct kvm_vcpu, arch.shregs.dar)); DEFINE(VCPU_VPA, offsetof(struct kvm_vcpu, arch.vpa.pinned_addr)); diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c index 70963c8..a4405a8 100644 --- a/arch/powerpc/kvm/powerpc.c +++ b/arch/powerpc/kvm/powerpc.c @@ -604,6 +604,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext) r = cpu_has_feature(CPU_FTR_TM_COMP) && is_kvmppc_hv_enabled(kvm); break; + case KVM_CAP_PPC_FWNMI: + r = 1; + break; default: r = 0; break; @@ -1204,6 +1207,10 @@ static int kvm_vcpu_ioctl_enable_cap(struct kvm_vcpu *vcpu, break; } #endif /* CONFIG_KVM_XICS */ + case KVM_CAP_PPC_FWNMI: + r = 0; + vcpu->kvm->arch.fwnmi_enabled = true; + break; default: r = -EINVAL; break; diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index 4ee67cb..3e41c42 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -870,6 +870,7 @@ struct kvm_ppc_smmu_info { #define KVM_CAP_S390_USER_INSTR0 130 #define KVM_CAP_MSI_DEVID 131 #define KVM_CAP_PPC_HTM 132 +#define KVM_CAP_PPC_FWNMI 133 #ifdef KVM_CAP_IRQ_ROUTING ^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH v4 2/2] KVM: PPC: Exit guest upon MCE when FWNMI capability is enabled 2017-01-09 11:40 [PATCH v4 1/2] KVM: PPC: Add new capability to control MCE behaviour Aravinda Prasad @ 2017-01-09 11:40 ` Aravinda Prasad 2017-01-12 3:16 ` David Gibson 2017-01-12 9:05 ` Balbir Singh 2017-01-10 2:24 ` [PATCH v4 1/2] KVM: PPC: Add new capability to control MCE behaviour Balbir Singh 2017-01-12 9:56 ` Paul Mackerras 2 siblings, 2 replies; 8+ messages in thread From: Aravinda Prasad @ 2017-01-09 11:40 UTC (permalink / raw) To: gleb, agraf, kvm-ppc, paulus, linuxppc-dev, pbonzini Cc: mahesh, mpe, kvm, david Enhance KVM to cause a guest exit with KVM_EXIT_NMI exit reason upon a machine check exception (MCE) in the guest address space if the KVM_CAP_PPC_FWNMI capability is enabled (instead of delivering a 0x200 interrupt to guest). This enables QEMU to build error log and deliver machine check exception to guest via guest registered machine check handler. This approach simplifies the delivery of machine check exception to guest OS compared to the earlier approach of KVM directly invoking 0x200 guest interrupt vector. This design/approach is based on the feedback for the QEMU patches to handle machine check exception. Details of earlier approach of handling machine check exception in QEMU and related discussions can be found at: https://lists.nongnu.org/archive/html/qemu-devel/2014-11/msg00813.html Note: This patch introduces a hook which is invoked at the time of guest exit to facilitate the host-side handling of machine check exception before the exception is passed on to the guest. Hence, the host-side handling which was performed earlier via machine_check_fwnmi is removed. The reasons for this approach is (i) it is not possible to distinguish whether the exception occurred in the guest or the host from the pt_regs passed on the machine_check_exception(). Hence machine_check_exception() calls panic, instead of passing on the exception to the guest, if the machine check exception is not recoverable. (ii) the approach introduced in this patch gives opportunity to the host kernel to perform actions in virtual mode before passing on the exception to the guest. This approach does not require complex tweaks to machine_check_fwnmi and friends. Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com> --- arch/powerpc/kvm/book3s_hv.c | 27 +++++++++++++----- arch/powerpc/kvm/book3s_hv_rmhandlers.S | 47 ++++++++++++++++--------------- arch/powerpc/platforms/powernv/opal.c | 10 +++++++ 3 files changed, 54 insertions(+), 30 deletions(-) diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c index 3686471..cae4921 100644 --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -123,6 +123,7 @@ MODULE_PARM_DESC(halt_poll_ns_shrink, "Factor halt poll time is shrunk by"); static void kvmppc_end_cede(struct kvm_vcpu *vcpu); static int kvmppc_hv_setup_htab_rma(struct kvm_vcpu *vcpu); +static void kvmppc_machine_check_hook(void); static inline struct kvm_vcpu *next_runnable_thread(struct kvmppc_vcore *vc, int *ip) @@ -954,15 +955,14 @@ static int kvmppc_handle_exit_hv(struct kvm_run *run, struct kvm_vcpu *vcpu, r = RESUME_GUEST; break; case BOOK3S_INTERRUPT_MACHINE_CHECK: + /* Exit to guest with KVM_EXIT_NMI as exit reason */ + run->exit_reason = KVM_EXIT_NMI; + r = RESUME_HOST; /* - * Deliver a machine check interrupt to the guest. - * We have to do this, even if the host has handled the - * machine check, because machine checks use SRR0/1 and - * the interrupt might have trashed guest state in them. + * Invoke host-kernel handler to perform any host-side + * handling before exiting the guest. */ - kvmppc_book3s_queue_irqprio(vcpu, - BOOK3S_INTERRUPT_MACHINE_CHECK); - r = RESUME_GUEST; + kvmppc_machine_check_hook(); break; case BOOK3S_INTERRUPT_PROGRAM: { @@ -3491,6 +3491,19 @@ static void kvmppc_irq_bypass_del_producer_hv(struct irq_bypass_consumer *cons, } #endif +/* + * Hook to handle machine check exceptions occurred inside a guest. + * This hook is invoked from host virtual mode from KVM before exiting + * the guest with KVM_EXIT_NMI exit reason. This gives an opportunity + * for the host to take action (if any) before passing on the machine + * check exception to the guest kernel. + */ +static void kvmppc_machine_check_hook(void) +{ + if (ppc_md.machine_check_exception) + ppc_md.machine_check_exception(NULL); +} + static long kvm_arch_vm_ioctl_hv(struct file *filp, unsigned int ioctl, unsigned long arg) { diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S index c3c1d1b..9b41390 100644 --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S @@ -134,21 +134,18 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S) stb r0, HSTATE_HWTHREAD_REQ(r13) /* - * For external and machine check interrupts, we need - * to call the Linux handler to process the interrupt. - * We do that by jumping to absolute address 0x500 for - * external interrupts, or the machine_check_fwnmi label - * for machine checks (since firmware might have patched - * the vector area at 0x200). The [h]rfid at the end of the - * handler will return to the book3s_hv_interrupts.S code. - * For other interrupts we do the rfid to get back - * to the book3s_hv_interrupts.S code here. + * For external interrupts we need to call the Linux + * handler to process the interrupt. We do that by jumping + * to absolute address 0x500 for external interrupts. + * The [h]rfid at the end of the handler will return to + * the book3s_hv_interrupts.S code. For other interrupts + * we do the rfid to get back to the book3s_hv_interrupts.S + * code here. */ ld r8, 112+PPC_LR_STKOFF(r1) addi r1, r1, 112 ld r7, HSTATE_HOST_MSR(r13) - cmpwi cr1, r12, BOOK3S_INTERRUPT_MACHINE_CHECK cmpwi r12, BOOK3S_INTERRUPT_EXTERNAL beq 11f cmpwi r12, BOOK3S_INTERRUPT_H_DOORBELL @@ -163,7 +160,10 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S) mtmsrd r6, 1 /* Clear RI in MSR */ mtsrr0 r8 mtsrr1 r7 - beq cr1, 13f /* machine check */ + /* + * BOOK3S_INTERRUPT_MACHINE_CHECK is handled at the + * time of guest exit + */ RFI /* On POWER7, we have external interrupts set to use HSRR0/1 */ @@ -171,8 +171,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S) mtspr SPRN_HSRR1, r7 ba 0x500 -13: b machine_check_fwnmi - 14: mtspr SPRN_HSRR0, r8 mtspr SPRN_HSRR1, r7 b hmi_exception_after_realmode @@ -2338,15 +2336,13 @@ machine_check_realmode: ld r9, HSTATE_KVM_VCPU(r13) li r12, BOOK3S_INTERRUPT_MACHINE_CHECK /* - * Deliver unhandled/fatal (e.g. UE) MCE errors to guest through - * machine check interrupt (set HSRR0 to 0x200). And for handled - * errors (no-fatal), just go back to guest execution with current - * HSRR0 instead of exiting guest. This new approach will inject - * machine check to guest for fatal error causing guest to crash. - * - * The old code used to return to host for unhandled errors which - * was causing guest to hang with soft lockups inside guest and - * makes it difficult to recover guest instance. + * Deliver unhandled/fatal (e.g. UE) MCE errors to guest either + * through machine check interrupt (set HSRR0 to 0x200) or by + * exiting the guest with KVM_EXIT_NMI exit reason if guest is + * FWNMI capable. For handled errors (no-fatal), just go back + * to guest execution with current HSRR0. This new approach + * injects machine check errors in guest address space to guest + * enabling guest kernel to suitably handle such errors. * * if we receive machine check with MSR(RI=0) then deliver it to * guest as machine check causing guest to crash. @@ -2360,7 +2356,12 @@ machine_check_realmode: cmpdi r3, 0 /* Did we handle MCE ? */ bne 2f /* Continue guest execution. */ /* If not, deliver a machine check. SRR0/1 are already set */ -1: li r10, BOOK3S_INTERRUPT_MACHINE_CHECK + /* Check if guest is capable of handling NMI exit */ +1: ld r3, VCPU_KVM(r9) + lbz r3, KVM_FWNMI(r3) + cmpdi r3, 1 /* FWNMI capable? */ + beq mc_cont + li r10, BOOK3S_INTERRUPT_MACHINE_CHECK bl kvmppc_msr_interrupt 2: b fast_interrupt_c_return diff --git a/arch/powerpc/platforms/powernv/opal.c b/arch/powerpc/platforms/powernv/opal.c index 6c9a65b..25749c6 100644 --- a/arch/powerpc/platforms/powernv/opal.c +++ b/arch/powerpc/platforms/powernv/opal.c @@ -446,6 +446,16 @@ int opal_machine_check(struct pt_regs *regs) } machine_check_print_event_info(&evt); + /* + * If regs is NULL, then the machine check exception occurred + * in the guest. Currently no action is performed in the host + * other than printing the event information. The machine check + * exception is passed on to the guest kernel and the guest + * kernel will attempt for recovery. + */ + if (!regs) + return 0; + if (opal_recover_mce(regs, &evt)) return 1; ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH v4 2/2] KVM: PPC: Exit guest upon MCE when FWNMI capability is enabled 2017-01-09 11:40 ` [PATCH v4 2/2] KVM: PPC: Exit guest upon MCE when FWNMI capability is enabled Aravinda Prasad @ 2017-01-12 3:16 ` David Gibson 2017-01-12 9:05 ` Balbir Singh 1 sibling, 0 replies; 8+ messages in thread From: David Gibson @ 2017-01-12 3:16 UTC (permalink / raw) To: Aravinda Prasad Cc: gleb, agraf, kvm-ppc, paulus, linuxppc-dev, pbonzini, mahesh, mpe, kvm [-- Attachment #1: Type: text/plain, Size: 9348 bytes --] On Mon, Jan 09, 2017 at 05:10:45PM +0530, Aravinda Prasad wrote: > Enhance KVM to cause a guest exit with KVM_EXIT_NMI > exit reason upon a machine check exception (MCE) in > the guest address space if the KVM_CAP_PPC_FWNMI > capability is enabled (instead of delivering a 0x200 > interrupt to guest). This enables QEMU to build error > log and deliver machine check exception to guest via > guest registered machine check handler. > > This approach simplifies the delivery of machine > check exception to guest OS compared to the earlier > approach of KVM directly invoking 0x200 guest interrupt > vector. > > This design/approach is based on the feedback for the > QEMU patches to handle machine check exception. Details > of earlier approach of handling machine check exception > in QEMU and related discussions can be found at: > > https://lists.nongnu.org/archive/html/qemu-devel/2014-11/msg00813.html > > Note: > > This patch introduces a hook which is invoked at the time > of guest exit to facilitate the host-side handling of > machine check exception before the exception is passed > on to the guest. Hence, the host-side handling which was > performed earlier via machine_check_fwnmi is removed. > > The reasons for this approach is (i) it is not possible > to distinguish whether the exception occurred in the > guest or the host from the pt_regs passed on the > machine_check_exception(). Hence machine_check_exception() > calls panic, instead of passing on the exception to > the guest, if the machine check exception is not > recoverable. (ii) the approach introduced in this > patch gives opportunity to the host kernel to perform > actions in virtual mode before passing on the exception > to the guest. This approach does not require complex > tweaks to machine_check_fwnmi and friends. > > Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> > --- > arch/powerpc/kvm/book3s_hv.c | 27 +++++++++++++----- > arch/powerpc/kvm/book3s_hv_rmhandlers.S | 47 ++++++++++++++++--------------- > arch/powerpc/platforms/powernv/opal.c | 10 +++++++ > 3 files changed, 54 insertions(+), 30 deletions(-) > > diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c > index 3686471..cae4921 100644 > --- a/arch/powerpc/kvm/book3s_hv.c > +++ b/arch/powerpc/kvm/book3s_hv.c > @@ -123,6 +123,7 @@ MODULE_PARM_DESC(halt_poll_ns_shrink, "Factor halt poll time is shrunk by"); > > static void kvmppc_end_cede(struct kvm_vcpu *vcpu); > static int kvmppc_hv_setup_htab_rma(struct kvm_vcpu *vcpu); > +static void kvmppc_machine_check_hook(void); > > static inline struct kvm_vcpu *next_runnable_thread(struct kvmppc_vcore *vc, > int *ip) > @@ -954,15 +955,14 @@ static int kvmppc_handle_exit_hv(struct kvm_run *run, struct kvm_vcpu *vcpu, > r = RESUME_GUEST; > break; > case BOOK3S_INTERRUPT_MACHINE_CHECK: > + /* Exit to guest with KVM_EXIT_NMI as exit reason */ > + run->exit_reason = KVM_EXIT_NMI; > + r = RESUME_HOST; > /* > - * Deliver a machine check interrupt to the guest. > - * We have to do this, even if the host has handled the > - * machine check, because machine checks use SRR0/1 and > - * the interrupt might have trashed guest state in them. > + * Invoke host-kernel handler to perform any host-side > + * handling before exiting the guest. > */ > - kvmppc_book3s_queue_irqprio(vcpu, > - BOOK3S_INTERRUPT_MACHINE_CHECK); > - r = RESUME_GUEST; > + kvmppc_machine_check_hook(); > break; > case BOOK3S_INTERRUPT_PROGRAM: > { > @@ -3491,6 +3491,19 @@ static void kvmppc_irq_bypass_del_producer_hv(struct irq_bypass_consumer *cons, > } > #endif > > +/* > + * Hook to handle machine check exceptions occurred inside a guest. > + * This hook is invoked from host virtual mode from KVM before exiting > + * the guest with KVM_EXIT_NMI exit reason. This gives an opportunity > + * for the host to take action (if any) before passing on the machine > + * check exception to the guest kernel. > + */ > +static void kvmppc_machine_check_hook(void) > +{ > + if (ppc_md.machine_check_exception) > + ppc_md.machine_check_exception(NULL); > +} > + > static long kvm_arch_vm_ioctl_hv(struct file *filp, > unsigned int ioctl, unsigned long arg) > { > diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S > index c3c1d1b..9b41390 100644 > --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S > +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S > @@ -134,21 +134,18 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S) > stb r0, HSTATE_HWTHREAD_REQ(r13) > > /* > - * For external and machine check interrupts, we need > - * to call the Linux handler to process the interrupt. > - * We do that by jumping to absolute address 0x500 for > - * external interrupts, or the machine_check_fwnmi label > - * for machine checks (since firmware might have patched > - * the vector area at 0x200). The [h]rfid at the end of the > - * handler will return to the book3s_hv_interrupts.S code. > - * For other interrupts we do the rfid to get back > - * to the book3s_hv_interrupts.S code here. > + * For external interrupts we need to call the Linux > + * handler to process the interrupt. We do that by jumping > + * to absolute address 0x500 for external interrupts. > + * The [h]rfid at the end of the handler will return to > + * the book3s_hv_interrupts.S code. For other interrupts > + * we do the rfid to get back to the book3s_hv_interrupts.S > + * code here. > */ > ld r8, 112+PPC_LR_STKOFF(r1) > addi r1, r1, 112 > ld r7, HSTATE_HOST_MSR(r13) > > - cmpwi cr1, r12, BOOK3S_INTERRUPT_MACHINE_CHECK > cmpwi r12, BOOK3S_INTERRUPT_EXTERNAL > beq 11f > cmpwi r12, BOOK3S_INTERRUPT_H_DOORBELL > @@ -163,7 +160,10 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S) > mtmsrd r6, 1 /* Clear RI in MSR */ > mtsrr0 r8 > mtsrr1 r7 > - beq cr1, 13f /* machine check */ > + /* > + * BOOK3S_INTERRUPT_MACHINE_CHECK is handled at the > + * time of guest exit > + */ > RFI > > /* On POWER7, we have external interrupts set to use HSRR0/1 */ > @@ -171,8 +171,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S) > mtspr SPRN_HSRR1, r7 > ba 0x500 > > -13: b machine_check_fwnmi > - > 14: mtspr SPRN_HSRR0, r8 > mtspr SPRN_HSRR1, r7 > b hmi_exception_after_realmode > @@ -2338,15 +2336,13 @@ machine_check_realmode: > ld r9, HSTATE_KVM_VCPU(r13) > li r12, BOOK3S_INTERRUPT_MACHINE_CHECK > /* > - * Deliver unhandled/fatal (e.g. UE) MCE errors to guest through > - * machine check interrupt (set HSRR0 to 0x200). And for handled > - * errors (no-fatal), just go back to guest execution with current > - * HSRR0 instead of exiting guest. This new approach will inject > - * machine check to guest for fatal error causing guest to crash. > - * > - * The old code used to return to host for unhandled errors which > - * was causing guest to hang with soft lockups inside guest and > - * makes it difficult to recover guest instance. > + * Deliver unhandled/fatal (e.g. UE) MCE errors to guest either > + * through machine check interrupt (set HSRR0 to 0x200) or by > + * exiting the guest with KVM_EXIT_NMI exit reason if guest is > + * FWNMI capable. For handled errors (no-fatal), just go back > + * to guest execution with current HSRR0. This new approach > + * injects machine check errors in guest address space to guest > + * enabling guest kernel to suitably handle such errors. > * > * if we receive machine check with MSR(RI=0) then deliver it to > * guest as machine check causing guest to crash. > @@ -2360,7 +2356,12 @@ machine_check_realmode: > cmpdi r3, 0 /* Did we handle MCE ? */ > bne 2f /* Continue guest execution. */ > /* If not, deliver a machine check. SRR0/1 are already set */ > -1: li r10, BOOK3S_INTERRUPT_MACHINE_CHECK > + /* Check if guest is capable of handling NMI exit */ > +1: ld r3, VCPU_KVM(r9) > + lbz r3, KVM_FWNMI(r3) > + cmpdi r3, 1 /* FWNMI capable? */ > + beq mc_cont > + li r10, BOOK3S_INTERRUPT_MACHINE_CHECK > bl kvmppc_msr_interrupt > 2: b fast_interrupt_c_return > > diff --git a/arch/powerpc/platforms/powernv/opal.c b/arch/powerpc/platforms/powernv/opal.c > index 6c9a65b..25749c6 100644 > --- a/arch/powerpc/platforms/powernv/opal.c > +++ b/arch/powerpc/platforms/powernv/opal.c > @@ -446,6 +446,16 @@ int opal_machine_check(struct pt_regs *regs) > } > machine_check_print_event_info(&evt); > > + /* > + * If regs is NULL, then the machine check exception occurred > + * in the guest. Currently no action is performed in the host > + * other than printing the event information. The machine check > + * exception is passed on to the guest kernel and the guest > + * kernel will attempt for recovery. > + */ > + if (!regs) > + return 0; > + > if (opal_recover_mce(regs, &evt)) > return 1; > > -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 819 bytes --] ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v4 2/2] KVM: PPC: Exit guest upon MCE when FWNMI capability is enabled 2017-01-09 11:40 ` [PATCH v4 2/2] KVM: PPC: Exit guest upon MCE when FWNMI capability is enabled Aravinda Prasad 2017-01-12 3:16 ` David Gibson @ 2017-01-12 9:05 ` Balbir Singh 2017-01-12 10:14 ` Aravinda Prasad 1 sibling, 1 reply; 8+ messages in thread From: Balbir Singh @ 2017-01-12 9:05 UTC (permalink / raw) To: Aravinda Prasad Cc: gleb, agraf, kvm-ppc, paulus, linuxppc-dev, pbonzini, mahesh, david, kvm On Mon, Jan 09, 2017 at 05:10:45PM +0530, Aravinda Prasad wrote: > Enhance KVM to cause a guest exit with KVM_EXIT_NMI > exit reason upon a machine check exception (MCE) in > the guest address space if the KVM_CAP_PPC_FWNMI > capability is enabled (instead of delivering a 0x200 > interrupt to guest). This enables QEMU to build error > log and deliver machine check exception to guest via > guest registered machine check handler. > > This approach simplifies the delivery of machine > check exception to guest OS compared to the earlier > approach of KVM directly invoking 0x200 guest interrupt > vector. > > This design/approach is based on the feedback for the > QEMU patches to handle machine check exception. Details > of earlier approach of handling machine check exception > in QEMU and related discussions can be found at: > > https://lists.nongnu.org/archive/html/qemu-devel/2014-11/msg00813.html > > Note: > > This patch introduces a hook which is invoked at the time > of guest exit to facilitate the host-side handling of > machine check exception before the exception is passed > on to the guest. Hence, the host-side handling which was > performed earlier via machine_check_fwnmi is removed. > > The reasons for this approach is (i) it is not possible > to distinguish whether the exception occurred in the > guest or the host from the pt_regs passed on the > machine_check_exception(). Hence machine_check_exception() > calls panic, instead of passing on the exception to > the guest, if the machine check exception is not > recoverable. (ii) the approach introduced in this > patch gives opportunity to the host kernel to perform > actions in virtual mode before passing on the exception > to the guest. This approach does not require complex > tweaks to machine_check_fwnmi and friends. It would be good to qualify the different types of MCE and what action we expect across hypervisor and guest. > > Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com> > --- > arch/powerpc/kvm/book3s_hv.c | 27 +++++++++++++----- > arch/powerpc/kvm/book3s_hv_rmhandlers.S | 47 ++++++++++++++++--------------- > arch/powerpc/platforms/powernv/opal.c | 10 +++++++ > 3 files changed, 54 insertions(+), 30 deletions(-) > > diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c > index 3686471..cae4921 100644 > --- a/arch/powerpc/kvm/book3s_hv.c > +++ b/arch/powerpc/kvm/book3s_hv.c > @@ -123,6 +123,7 @@ MODULE_PARM_DESC(halt_poll_ns_shrink, "Factor halt poll time is shrunk by"); > > static void kvmppc_end_cede(struct kvm_vcpu *vcpu); > static int kvmppc_hv_setup_htab_rma(struct kvm_vcpu *vcpu); > +static void kvmppc_machine_check_hook(void); > > static inline struct kvm_vcpu *next_runnable_thread(struct kvmppc_vcore *vc, > int *ip) > @@ -954,15 +955,14 @@ static int kvmppc_handle_exit_hv(struct kvm_run *run, struct kvm_vcpu *vcpu, > r = RESUME_GUEST; > break; > case BOOK3S_INTERRUPT_MACHINE_CHECK: > + /* Exit to guest with KVM_EXIT_NMI as exit reason */ > + run->exit_reason = KVM_EXIT_NMI; > + r = RESUME_HOST; > /* > - * Deliver a machine check interrupt to the guest. > - * We have to do this, even if the host has handled the > - * machine check, because machine checks use SRR0/1 and > - * the interrupt might have trashed guest state in them. > + * Invoke host-kernel handler to perform any host-side > + * handling before exiting the guest. > */ > - kvmppc_book3s_queue_irqprio(vcpu, > - BOOK3S_INTERRUPT_MACHINE_CHECK); > - r = RESUME_GUEST; > + kvmppc_machine_check_hook(); > break; > case BOOK3S_INTERRUPT_PROGRAM: > { > @@ -3491,6 +3491,19 @@ static void kvmppc_irq_bypass_del_producer_hv(struct irq_bypass_consumer *cons, > } > #endif > > +/* > + * Hook to handle machine check exceptions occurred inside a guest. > + * This hook is invoked from host virtual mode from KVM before exiting > + * the guest with KVM_EXIT_NMI exit reason. This gives an opportunity > + * for the host to take action (if any) before passing on the machine > + * check exception to the guest kernel. > + */ > +static void kvmppc_machine_check_hook(void) > +{ > + if (ppc_md.machine_check_exception) > + ppc_md.machine_check_exception(NULL); > +} > + > static long kvm_arch_vm_ioctl_hv(struct file *filp, > unsigned int ioctl, unsigned long arg) > { > diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S > index c3c1d1b..9b41390 100644 > --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S > +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S > @@ -134,21 +134,18 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S) > stb r0, HSTATE_HWTHREAD_REQ(r13) > > /* > - * For external and machine check interrupts, we need > - * to call the Linux handler to process the interrupt. > - * We do that by jumping to absolute address 0x500 for > - * external interrupts, or the machine_check_fwnmi label > - * for machine checks (since firmware might have patched > - * the vector area at 0x200). The [h]rfid at the end of the > - * handler will return to the book3s_hv_interrupts.S code. > - * For other interrupts we do the rfid to get back > - * to the book3s_hv_interrupts.S code here. > + * For external interrupts we need to call the Linux > + * handler to process the interrupt. We do that by jumping > + * to absolute address 0x500 for external interrupts. > + * The [h]rfid at the end of the handler will return to > + * the book3s_hv_interrupts.S code. For other interrupts > + * we do the rfid to get back to the book3s_hv_interrupts.S > + * code here. > */ > ld r8, 112+PPC_LR_STKOFF(r1) > addi r1, r1, 112 > ld r7, HSTATE_HOST_MSR(r13) > > - cmpwi cr1, r12, BOOK3S_INTERRUPT_MACHINE_CHECK > cmpwi r12, BOOK3S_INTERRUPT_EXTERNAL > beq 11f > cmpwi r12, BOOK3S_INTERRUPT_H_DOORBELL > @@ -163,7 +160,10 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S) > mtmsrd r6, 1 /* Clear RI in MSR */ > mtsrr0 r8 > mtsrr1 r7 > - beq cr1, 13f /* machine check */ > + /* > + * BOOK3S_INTERRUPT_MACHINE_CHECK is handled at the > + * time of guest exit > + */ > RFI > > /* On POWER7, we have external interrupts set to use HSRR0/1 */ > @@ -171,8 +171,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S) > mtspr SPRN_HSRR1, r7 > ba 0x500 > > -13: b machine_check_fwnmi > - > 14: mtspr SPRN_HSRR0, r8 > mtspr SPRN_HSRR1, r7 > b hmi_exception_after_realmode > @@ -2338,15 +2336,13 @@ machine_check_realmode: > ld r9, HSTATE_KVM_VCPU(r13) > li r12, BOOK3S_INTERRUPT_MACHINE_CHECK > /* > - * Deliver unhandled/fatal (e.g. UE) MCE errors to guest through > - * machine check interrupt (set HSRR0 to 0x200). And for handled > - * errors (no-fatal), just go back to guest execution with current > - * HSRR0 instead of exiting guest. This new approach will inject > - * machine check to guest for fatal error causing guest to crash. > - * > - * The old code used to return to host for unhandled errors which > - * was causing guest to hang with soft lockups inside guest and > - * makes it difficult to recover guest instance. > + * Deliver unhandled/fatal (e.g. UE) MCE errors to guest either > + * through machine check interrupt (set HSRR0 to 0x200) or by > + * exiting the guest with KVM_EXIT_NMI exit reason if guest is > + * FWNMI capable. For handled errors (no-fatal), just go back > + * to guest execution with current HSRR0. This new approach > + * injects machine check errors in guest address space to guest > + * enabling guest kernel to suitably handle such errors. > * > * if we receive machine check with MSR(RI=0) then deliver it to > * guest as machine check causing guest to crash. > @@ -2360,7 +2356,12 @@ machine_check_realmode: > cmpdi r3, 0 /* Did we handle MCE ? */ > bne 2f /* Continue guest execution. */ > /* If not, deliver a machine check. SRR0/1 are already set */ > -1: li r10, BOOK3S_INTERRUPT_MACHINE_CHECK > + /* Check if guest is capable of handling NMI exit */ > +1: ld r3, VCPU_KVM(r9) > + lbz r3, KVM_FWNMI(r3) > + cmpdi r3, 1 /* FWNMI capable? */ > + beq mc_cont > + li r10, BOOK3S_INTERRUPT_MACHINE_CHECK > bl kvmppc_msr_interrupt > 2: b fast_interrupt_c_return > > diff --git a/arch/powerpc/platforms/powernv/opal.c b/arch/powerpc/platforms/powernv/opal.c > index 6c9a65b..25749c6 100644 > --- a/arch/powerpc/platforms/powernv/opal.c > +++ b/arch/powerpc/platforms/powernv/opal.c > @@ -446,6 +446,16 @@ int opal_machine_check(struct pt_regs *regs) > } > machine_check_print_event_info(&evt); > > + /* > + * If regs is NULL, then the machine check exception occurred > + * in the guest. Currently no action is performed in the host > + * other than printing the event information. The machine check > + * exception is passed on to the guest kernel and the guest > + * kernel will attempt for recovery. > + */ > + if (!regs) > + return 0; > + Shouldn't the host take action for example poison bad pages? > if (opal_recover_mce(regs, &evt)) > return 1; > > Balbir Singh ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v4 2/2] KVM: PPC: Exit guest upon MCE when FWNMI capability is enabled 2017-01-12 9:05 ` Balbir Singh @ 2017-01-12 10:14 ` Aravinda Prasad 0 siblings, 0 replies; 8+ messages in thread From: Aravinda Prasad @ 2017-01-12 10:14 UTC (permalink / raw) To: Balbir Singh Cc: kvm, gleb, mahesh, agraf, kvm-ppc, linuxppc-dev, pbonzini, david On Thursday 12 January 2017 02:35 PM, Balbir Singh wrote: > On Mon, Jan 09, 2017 at 05:10:45PM +0530, Aravinda Prasad wrote: [ . . .] >> The reasons for this approach is (i) it is not possible >> to distinguish whether the exception occurred in the >> guest or the host from the pt_regs passed on the >> machine_check_exception(). Hence machine_check_exception() >> calls panic, instead of passing on the exception to >> the guest, if the machine check exception is not >> recoverable. (ii) the approach introduced in this >> patch gives opportunity to the host kernel to perform >> actions in virtual mode before passing on the exception >> to the guest. This approach does not require complex >> tweaks to machine_check_fwnmi and friends. > > It would be good to qualify the different types of MCE > and what action we expect across hypervisor and guest. The hypervisor performs actions depending on the type of MCE (SLB multihit, UEs, etc). If the hypervisor is unable to recover from the MCE and if the address in error belongs to the guest, then this patch set forwards the error to the guest kernel for handling. The main goal of this patch set is to pass on the unrecoverable MCE errors in the guest address space to the guest kernel, instead of crashing the hypervisor. The action taken by the hypervisor and the guest kernel upon MCE remains unchanged. [ . . . ] > > Shouldn't the host take action for example poison bad pages? > We want to give the guest kernel a chance to recover the clean part of the page before poisoning. As in case of an UE only few bytes of a page are affected. Hence we don't immediately poison the bad pages in the host. It is expected that the guest kernel performs the poisoning of the bad pages after performing recovery action. This prevents the guest from reusing the bad page. However, the missing part is to communicate back to the host when guest is done with the recovery. This is mainly to prevent reuse of bad pages by the host when the guest shutdowns/reboots/crashes/migrates. We are planning to address this part as a separate patch set. Regards, Aravinda >> if (opal_recover_mce(regs, &evt)) >> return 1; >> >> > > Balbir Singh > -- Regards, Aravinda ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v4 1/2] KVM: PPC: Add new capability to control MCE behaviour 2017-01-09 11:40 [PATCH v4 1/2] KVM: PPC: Add new capability to control MCE behaviour Aravinda Prasad 2017-01-09 11:40 ` [PATCH v4 2/2] KVM: PPC: Exit guest upon MCE when FWNMI capability is enabled Aravinda Prasad @ 2017-01-10 2:24 ` Balbir Singh 2017-01-12 9:56 ` Paul Mackerras 2 siblings, 0 replies; 8+ messages in thread From: Balbir Singh @ 2017-01-10 2:24 UTC (permalink / raw) To: Aravinda Prasad Cc: gleb, agraf, kvm-ppc, paulus, linuxppc-dev, pbonzini, mahesh, david, kvm On Mon, Jan 09, 2017 at 05:10:35PM +0530, Aravinda Prasad wrote: > This patch introduces a new KVM capability to control > how KVM behaves on machine check exception (MCE). > Without this capability, KVM redirects machine check > exceptions to guest's 0x200 vector, if the address in > error belongs to the guest. With this capability KVM > causes a guest exit with NMI exit reason. > > The new capability is required to avoid problems if > a new kernel/KVM is used with an old QEMU for guests > that don't issue "ibm,nmi-register". As old QEMU does > not understand the NMI exit type, it treats it as a > fatal error. However, the guest could have handled > the machine check error if the exception was delivered > to guest's 0x200 interrupt vector instead of NMI exit > in case of old QEMU. > Can you move these to a cover letter, the description here does not match the changes > QEMU part can be found at: > http://lists.nongnu.org/archive/html/qemu-ppc/2015-12/msg00199.html > Balbir Singh. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v4 1/2] KVM: PPC: Add new capability to control MCE behaviour 2017-01-09 11:40 [PATCH v4 1/2] KVM: PPC: Add new capability to control MCE behaviour Aravinda Prasad 2017-01-09 11:40 ` [PATCH v4 2/2] KVM: PPC: Exit guest upon MCE when FWNMI capability is enabled Aravinda Prasad 2017-01-10 2:24 ` [PATCH v4 1/2] KVM: PPC: Add new capability to control MCE behaviour Balbir Singh @ 2017-01-12 9:56 ` Paul Mackerras 2017-01-13 7:22 ` Aravinda Prasad 2 siblings, 1 reply; 8+ messages in thread From: Paul Mackerras @ 2017-01-12 9:56 UTC (permalink / raw) To: Aravinda Prasad Cc: gleb, agraf, kvm-ppc, linuxppc-dev, pbonzini, mahesh, mpe, kvm, david On Mon, Jan 09, 2017 at 05:10:35PM +0530, Aravinda Prasad wrote: > This patch introduces a new KVM capability to control > how KVM behaves on machine check exception (MCE). > Without this capability, KVM redirects machine check > exceptions to guest's 0x200 vector, if the address in > error belongs to the guest. With this capability KVM > causes a guest exit with NMI exit reason. > > The new capability is required to avoid problems if > a new kernel/KVM is used with an old QEMU for guests > that don't issue "ibm,nmi-register". As old QEMU does > not understand the NMI exit type, it treats it as a > fatal error. However, the guest could have handled > the machine check error if the exception was delivered > to guest's 0x200 interrupt vector instead of NMI exit > in case of old QEMU. You need to add a description of the new capability to Documentation/virtual/kvm/api.txt. Paul. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v4 1/2] KVM: PPC: Add new capability to control MCE behaviour 2017-01-12 9:56 ` Paul Mackerras @ 2017-01-13 7:22 ` Aravinda Prasad 0 siblings, 0 replies; 8+ messages in thread From: Aravinda Prasad @ 2017-01-13 7:22 UTC (permalink / raw) To: Paul Mackerras Cc: gleb, agraf, kvm-ppc, linuxppc-dev, pbonzini, mahesh, mpe, kvm, david On Thursday 12 January 2017 03:26 PM, Paul Mackerras wrote: > On Mon, Jan 09, 2017 at 05:10:35PM +0530, Aravinda Prasad wrote: >> This patch introduces a new KVM capability to control >> how KVM behaves on machine check exception (MCE). >> Without this capability, KVM redirects machine check >> exceptions to guest's 0x200 vector, if the address in >> error belongs to the guest. With this capability KVM >> causes a guest exit with NMI exit reason. >> >> The new capability is required to avoid problems if >> a new kernel/KVM is used with an old QEMU for guests >> that don't issue "ibm,nmi-register". As old QEMU does >> not understand the NMI exit type, it treats it as a >> fatal error. However, the guest could have handled >> the machine check error if the exception was delivered >> to guest's 0x200 interrupt vector instead of NMI exit >> in case of old QEMU. > > You need to add a description of the new capability to > Documentation/virtual/kvm/api.txt. sure. > > Paul. > -- Regards, Aravinda ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2017-01-13 7:22 UTC | newest] Thread overview: 8+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2017-01-09 11:40 [PATCH v4 1/2] KVM: PPC: Add new capability to control MCE behaviour Aravinda Prasad 2017-01-09 11:40 ` [PATCH v4 2/2] KVM: PPC: Exit guest upon MCE when FWNMI capability is enabled Aravinda Prasad 2017-01-12 3:16 ` David Gibson 2017-01-12 9:05 ` Balbir Singh 2017-01-12 10:14 ` Aravinda Prasad 2017-01-10 2:24 ` [PATCH v4 1/2] KVM: PPC: Add new capability to control MCE behaviour Balbir Singh 2017-01-12 9:56 ` Paul Mackerras 2017-01-13 7:22 ` Aravinda Prasad
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).