* [PATCH v3 1/4] KVM: VMX: use kvm_event_needs_reinjection
@ 2017-08-24 4:21 Wanpeng Li
2017-08-24 4:21 ` [PATCH v3 2/4] KVM: X86: Fix loss of exception which has not yet injected Wanpeng Li
` (2 more replies)
0 siblings, 3 replies; 16+ messages in thread
From: Wanpeng Li @ 2017-08-24 4:21 UTC (permalink / raw)
To: linux-kernel, kvm; +Cc: Paolo Bonzini, Radim Krčmář, Wanpeng Li
From: Wanpeng Li <wanpeng.li@hotmail.com>
Use the kvm_event_needs_reinjection() helper instead of open-coding the check.
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
---
arch/x86/kvm/vmx.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index dd710d3..c5f43ab 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -10988,9 +10988,7 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu, bool external_intr)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);

-	if (vcpu->arch.exception.pending ||
-		vcpu->arch.nmi_injected ||
-		vcpu->arch.interrupt.pending)
+	if (kvm_event_needs_reinjection(vcpu))
 		return -EBUSY;

 	if (nested_cpu_has_preemption_timer(get_vmcs12(vcpu)) &&
--
2.7.4
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH v3 2/4] KVM: X86: Fix loss of exception which has not yet injected
2017-08-24 4:21 [PATCH v3 1/4] KVM: VMX: use kvm_event_needs_reinjection Wanpeng Li
@ 2017-08-24 4:21 ` Wanpeng Li
2017-08-24 6:52 ` Wanpeng Li
[not found] ` <1503548506-4457-2-git-send-email-wanpeng.li-PkbjNfxxIARBDgjK7y7TUQ@public.gmane.org>
2017-08-24 4:21 ` [PATCH v3 3/4] KVM: VMX: Move the nested_vmx_inject_exception_vmexit call from nested_vmx_check_exception to vmx_queue_exception Wanpeng Li
2017-08-24 4:21 ` [PATCH v3 4/4] KVM: nVMX: Fix trying to cancel vmlauch/vmresume Wanpeng Li
2 siblings, 2 replies; 16+ messages in thread
From: Wanpeng Li @ 2017-08-24 4:21 UTC (permalink / raw)
To: linux-kernel, kvm; +Cc: Paolo Bonzini, Radim Krčmář, Wanpeng Li

From: Wanpeng Li <wanpeng.li@hotmail.com>

vmx_complete_interrupts() assumes that the exception is always injected,
so it would be dropped by kvm_clear_exception_queue(). This patch separates
exception.pending from exception.injected, exception.injected represents the
exception is injected or the exception should be reinjected due to vmexit
occurs during event delivery in VMX non-root operation. exception.pending
represents the exception is queued and will be cleared when injecting the
exception to the guest. So exception.pending and exception.injected can
cooperate to guarantee exception will not be lost.

Reported-by: Radim Krčmář <rkrcmar@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
---
v2 -> v3:
 * the changes to inject_pending_event and adds the WARN_ON
 * combine injected and pending exception for GET/SET_VCPU_EVENTS

 arch/x86/include/asm/kvm_host.h |  2 +-
 arch/x86/kvm/svm.c              |  2 +-
 arch/x86/kvm/vmx.c              |  4 +--
 arch/x86/kvm/x86.c              | 56 ++++++++++++++++++++++++++---------------
 arch/x86/kvm/x86.h              |  4 +--
 5 files changed, 42 insertions(+), 26 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 9d90787..6e385ac 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -547,8 +547,8 @@ struct kvm_vcpu_arch {

 	struct kvm_queued_exception {
 		bool pending;
+		bool injected;
 		bool has_error_code;
-		bool reinject;
 		u8 nr;
 		u32 error_code;
 		u8 nested_apf;
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index a2fddce..6a439a1 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -655,7 +655,7 @@ static void svm_queue_exception(struct kvm_vcpu *vcpu)
 	struct vcpu_svm *svm = to_svm(vcpu);
 	unsigned nr = vcpu->arch.exception.nr;
 	bool has_error_code = vcpu->arch.exception.has_error_code;
-	bool reinject = vcpu->arch.exception.reinject;
+	bool reinject = vcpu->arch.exception.injected;
 	u32 error_code = vcpu->arch.exception.error_code;

 	/*
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index c5f43ab..902b780 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2503,7 +2503,7 @@ static void vmx_queue_exception(struct kvm_vcpu *vcpu)
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 	unsigned nr = vcpu->arch.exception.nr;
 	bool has_error_code = vcpu->arch.exception.has_error_code;
-	bool reinject = vcpu->arch.exception.reinject;
+	bool reinject = vcpu->arch.exception.injected;
 	u32 error_code = vcpu->arch.exception.error_code;
 	u32 intr_info = nr | INTR_INFO_VALID_MASK;

@@ -10948,7 +10948,7 @@ static void vmcs12_save_pending_event(struct kvm_vcpu *vcpu,
 	u32 idt_vectoring;
 	unsigned int nr;

-	if (vcpu->arch.exception.pending && vcpu->arch.exception.reinject) {
+	if (vcpu->arch.exception.injected) {
 		nr = vcpu->arch.exception.nr;
 		idt_vectoring = nr | VECTORING_INFO_VALID_MASK;

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 8f41b88..b698b2f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -389,15 +389,18 @@ static void kvm_multiple_exception(struct kvm_vcpu *vcpu,

 	kvm_make_request(KVM_REQ_EVENT, vcpu);

-	if (!vcpu->arch.exception.pending) {
+	if (!vcpu->arch.exception.pending ||
+		 !vcpu->arch.exception.injected) {
 	queue:
 		if (has_error && !is_protmode(vcpu))
 			has_error = false;
-		vcpu->arch.exception.pending = true;
+		if (reinject)
+			vcpu->arch.exception.injected = true;
+		else
+			vcpu->arch.exception.pending = true;
 		vcpu->arch.exception.has_error_code = has_error;
 		vcpu->arch.exception.nr = nr;
 		vcpu->arch.exception.error_code = error_code;
-		vcpu->arch.exception.reinject = reinject;
 		return;
 	}

@@ -3069,8 +3072,14 @@ static void kvm_vcpu_ioctl_x86_get_vcpu_events(struct kvm_vcpu *vcpu,
 					       struct kvm_vcpu_events *events)
 {
 	process_nmi(vcpu);
+	/*
+	 * FIXME: pass injected and pending separately.  This is only
+	 * needed for nested virtualization, whose state cannot be
+	 * migrated yet.  For now we combine them just in case.
+	 */
 	events->exception.injected =
-		vcpu->arch.exception.pending &&
+		(vcpu->arch.exception.pending ||
+		 vcpu->arch.exception.injected) &&
 		!kvm_exception_is_soft(vcpu->arch.exception.nr);
 	events->exception.nr = vcpu->arch.exception.nr;
 	events->exception.has_error_code = vcpu->arch.exception.has_error_code;
@@ -3125,6 +3134,7 @@ static int kvm_vcpu_ioctl_x86_set_vcpu_events(struct kvm_vcpu *vcpu,
 		return -EINVAL;

 	process_nmi(vcpu);
+	vcpu->arch.exception.injected = false;
 	vcpu->arch.exception.pending = events->exception.injected;
 	vcpu->arch.exception.nr = events->exception.nr;
 	vcpu->arch.exception.has_error_code = events->exception.has_error_code;
@@ -6350,21 +6360,7 @@ static int inject_pending_event(struct kvm_vcpu *vcpu, bool req_int_win)
 	int r;

 	/* try to reinject previous events if any */
-	if (vcpu->arch.exception.pending) {
-		trace_kvm_inj_exception(vcpu->arch.exception.nr,
-					vcpu->arch.exception.has_error_code,
-					vcpu->arch.exception.error_code);
-
-		if (exception_type(vcpu->arch.exception.nr) == EXCPT_FAULT)
-			__kvm_set_rflags(vcpu, kvm_get_rflags(vcpu) |
-					     X86_EFLAGS_RF);
-
-		if (vcpu->arch.exception.nr == DB_VECTOR &&
-		    (vcpu->arch.dr7 & DR7_GD)) {
-			vcpu->arch.dr7 &= ~DR7_GD;
-			kvm_update_dr7(vcpu);
-		}
-
+	if (vcpu->arch.exception.injected) {
 		kvm_x86_ops->queue_exception(vcpu);
 		return 0;
 	}
@@ -6386,7 +6382,25 @@ static int inject_pending_event(struct kvm_vcpu *vcpu, bool req_int_win)
 	}

 	/* try to inject new event if pending */
-	if (vcpu->arch.smi_pending && !is_smm(vcpu)) {
+	if (vcpu->arch.exception.pending) {
+		trace_kvm_inj_exception(vcpu->arch.exception.nr,
+					vcpu->arch.exception.has_error_code,
+					vcpu->arch.exception.error_code);
+
+		vcpu->arch.exception.pending = false;
+		vcpu->arch.exception.injected = true;
+		kvm_x86_ops->queue_exception(vcpu);
+
+		if (exception_type(vcpu->arch.exception.nr) == EXCPT_FAULT)
+			__kvm_set_rflags(vcpu, kvm_get_rflags(vcpu) |
+					     X86_EFLAGS_RF);
+
+		if (vcpu->arch.exception.nr == DB_VECTOR &&
+		    (vcpu->arch.dr7 & DR7_GD)) {
+			vcpu->arch.dr7 &= ~DR7_GD;
+			kvm_update_dr7(vcpu);
+		}
+	} else if (vcpu->arch.smi_pending && !is_smm(vcpu)) {
 		vcpu->arch.smi_pending = false;
 		enter_smm(vcpu);
 	} else if (vcpu->arch.nmi_pending && kvm_x86_ops->nmi_allowed(vcpu)) {
@@ -6862,6 +6876,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 				kvm_x86_ops->enable_nmi_window(vcpu);
 			if (kvm_cpu_has_injectable_intr(vcpu) || req_int_win)
 				kvm_x86_ops->enable_irq_window(vcpu);
+			WARN_ON(vcpu->arch.exception.pending);
 		}

 		if (kvm_lapic_enabled(vcpu)) {
@@ -7738,6 +7753,7 @@ void kvm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
 	vcpu->arch.nmi_injected = false;
 	kvm_clear_interrupt_queue(vcpu);
 	kvm_clear_exception_queue(vcpu);
+	vcpu->arch.exception.pending = false;

 	memset(vcpu->arch.db, 0, sizeof(vcpu->arch.db));
 	kvm_update_dr0123(vcpu);
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 1134603..e6ec0de 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -11,7 +11,7 @@

 static inline void kvm_clear_exception_queue(struct kvm_vcpu *vcpu)
 {
-	vcpu->arch.exception.pending = false;
+	vcpu->arch.exception.injected = false;
 }

 static inline void kvm_queue_interrupt(struct kvm_vcpu *vcpu, u8 vector,
@@ -29,7 +29,7 @@ static inline void kvm_clear_interrupt_queue(struct kvm_vcpu *vcpu)

 static inline bool kvm_event_needs_reinjection(struct kvm_vcpu *vcpu)
 {
-	return vcpu->arch.exception.pending || vcpu->arch.interrupt.pending ||
+	return vcpu->arch.exception.injected || vcpu->arch.interrupt.pending ||
 		vcpu->arch.nmi_injected;
 }

--
2.7.4
^ permalink raw reply related [flat|nested] 16+ messages in thread
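The split between exception.pending and exception.injected described in the patch above can be sketched as a small standalone model. This is only an illustration under stated assumptions, not kernel code: struct queued_exception and the model_* functions are hypothetical names that mirror, in simplified form, the fields and helpers touched by the diff.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for struct kvm_queued_exception. */
struct queued_exception {
	bool pending;      /* queued, not yet handed to the hardware */
	bool injected;     /* handed to the hardware, may need reinjection */
	unsigned char nr;  /* exception vector */
};

/* Model of queueing a new (not reinjected) exception. */
static void model_queue_exception(struct queued_exception *e, unsigned char nr)
{
	e->pending = true;
	e->nr = nr;
}

/* Model of the "inject new event" step: pending turns into injected. */
static bool model_inject_pending(struct queued_exception *e)
{
	if (!e->pending)
		return false;
	e->pending = false;
	e->injected = true;
	return true;
}

/* Model of kvm_clear_exception_queue() after the patch: only injected is cleared. */
static void model_clear_exception_queue(struct queued_exception *e)
{
	e->injected = false;
}

/* The guest is still owed an exception if either flag is set. */
static bool model_exception_outstanding(const struct queued_exception *e)
{
	return e->pending || e->injected;
}
```

In this model an exception that was queued but never injected survives model_clear_exception_queue(), which is the loss the patch description is concerned with; before the split, the single pending flag would have been cleared at that point.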
* Re: [PATCH v3 2/4] KVM: X86: Fix loss of exception which has not yet injected
2017-08-24 4:21 ` [PATCH v3 2/4] KVM: X86: Fix loss of exception which has not yet injected Wanpeng Li
@ 2017-08-24 6:52 ` Wanpeng Li
2017-08-24 8:57 ` Paolo Bonzini
[not found] ` <1503548506-4457-2-git-send-email-wanpeng.li-PkbjNfxxIARBDgjK7y7TUQ@public.gmane.org>
1 sibling, 1 reply; 16+ messages in thread
From: Wanpeng Li @ 2017-08-24 6:52 UTC (permalink / raw)
To: linux-kernel@vger.kernel.org, kvm
Cc: Paolo Bonzini, Radim Krčmář, Wanpeng Li

2017-08-24 12:21 GMT+08:00 Wanpeng Li <kernellwp@gmail.com>:
> From: Wanpeng Li <wanpeng.li@hotmail.com>
>
> vmx_complete_interrupts() assumes that the exception is always injected,
> so it would be dropped by kvm_clear_exception_queue(). This patch separates
> exception.pending from exception.injected, exception.injected represents the
> exception is injected or the exception should be reinjected due to vmexit
> occurs during event delivery in VMX non-root operation. exception.pending
> represents the exception is queued and will be cleared when injecting the
> exception to the guest. So exception.pending and exception.injected can
> cooperate to guarantee exception will not be lost.
>
> Reported-by: Radim Krčmář <rkrcmar@redhat.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Radim Krčmář <rkrcmar@redhat.com>
> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
> ---
> v2 -> v3:
>  * the changes to inject_pending_event and adds the WARN_ON
>  * combine injected and pending exception for GET/SET_VCPU_EVENTS
>
>  arch/x86/include/asm/kvm_host.h |  2 +-
>  arch/x86/kvm/svm.c              |  2 +-
>  arch/x86/kvm/vmx.c              |  4 +--
>  arch/x86/kvm/x86.c              | 56 ++++++++++++++++++++++++++---------------
>  arch/x86/kvm/x86.h              |  4 +--
>  5 files changed, 42 insertions(+), 26 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 9d90787..6e385ac 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -547,8 +547,8 @@ struct kvm_vcpu_arch {
>
>  	struct kvm_queued_exception {
>  		bool pending;
> +		bool injected;
>  		bool has_error_code;
> -		bool reinject;
>  		u8 nr;
>  		u32 error_code;
>  		u8 nested_apf;
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index a2fddce..6a439a1 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -655,7 +655,7 @@ static void svm_queue_exception(struct kvm_vcpu *vcpu)
>  	struct vcpu_svm *svm = to_svm(vcpu);
>  	unsigned nr = vcpu->arch.exception.nr;
>  	bool has_error_code = vcpu->arch.exception.has_error_code;
> -	bool reinject = vcpu->arch.exception.reinject;
> +	bool reinject = vcpu->arch.exception.injected;
>  	u32 error_code = vcpu->arch.exception.error_code;
>
>  	/*
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index c5f43ab..902b780 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -2503,7 +2503,7 @@ static void vmx_queue_exception(struct kvm_vcpu *vcpu)
>  	struct vcpu_vmx *vmx = to_vmx(vcpu);
>  	unsigned nr = vcpu->arch.exception.nr;
>  	bool has_error_code = vcpu->arch.exception.has_error_code;
> -	bool reinject = vcpu->arch.exception.reinject;
> +	bool reinject = vcpu->arch.exception.injected;
>  	u32 error_code = vcpu->arch.exception.error_code;
>  	u32 intr_info = nr | INTR_INFO_VALID_MASK;
>
> @@ -10948,7 +10948,7 @@ static void vmcs12_save_pending_event(struct kvm_vcpu *vcpu,
>  	u32 idt_vectoring;
>  	unsigned int nr;
>
> -	if (vcpu->arch.exception.pending && vcpu->arch.exception.reinject) {
> +	if (vcpu->arch.exception.injected) {
>  		nr = vcpu->arch.exception.nr;
>  		idt_vectoring = nr | VECTORING_INFO_VALID_MASK;
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 8f41b88..b698b2f 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -389,15 +389,18 @@ static void kvm_multiple_exception(struct kvm_vcpu *vcpu,
>
>  	kvm_make_request(KVM_REQ_EVENT, vcpu);
>
> -	if (!vcpu->arch.exception.pending) {
> +	if (!vcpu->arch.exception.pending ||
> +		 !vcpu->arch.exception.injected) {
>  	queue:
>  		if (has_error && !is_protmode(vcpu))
>  			has_error = false;
> -		vcpu->arch.exception.pending = true;
> +		if (reinject)
> +			vcpu->arch.exception.injected = true;
> +		else
> +			vcpu->arch.exception.pending = true;
>  		vcpu->arch.exception.has_error_code = has_error;
>  		vcpu->arch.exception.nr = nr;
>  		vcpu->arch.exception.error_code = error_code;
> -		vcpu->arch.exception.reinject = reinject;
>  		return;
>  	}
>
> @@ -3069,8 +3072,14 @@ static void kvm_vcpu_ioctl_x86_get_vcpu_events(struct kvm_vcpu *vcpu,
>  					       struct kvm_vcpu_events *events)
>  {
>  	process_nmi(vcpu);
> +	/*
> +	 * FIXME: pass injected and pending separately.  This is only
> +	 * needed for nested virtualization, whose state cannot be
> +	 * migrated yet.  For now we combine them just in case.
> +	 */
>  	events->exception.injected =
> -		vcpu->arch.exception.pending &&
> +		(vcpu->arch.exception.pending ||
> +		 vcpu->arch.exception.injected) &&
>  		!kvm_exception_is_soft(vcpu->arch.exception.nr);
>  	events->exception.nr = vcpu->arch.exception.nr;
>  	events->exception.has_error_code = vcpu->arch.exception.has_error_code;
> @@ -3125,6 +3134,7 @@ static int kvm_vcpu_ioctl_x86_set_vcpu_events(struct kvm_vcpu *vcpu,
>  		return -EINVAL;
>
>  	process_nmi(vcpu);
> +	vcpu->arch.exception.injected = false;
>  	vcpu->arch.exception.pending = events->exception.injected;
>  	vcpu->arch.exception.nr = events->exception.nr;
>  	vcpu->arch.exception.has_error_code = events->exception.has_error_code;
> @@ -6350,21 +6360,7 @@ static int inject_pending_event(struct kvm_vcpu *vcpu, bool req_int_win)
>  	int r;
>
>  	/* try to reinject previous events if any */
> -	if (vcpu->arch.exception.pending) {
> -		trace_kvm_inj_exception(vcpu->arch.exception.nr,
> -					vcpu->arch.exception.has_error_code,
> -					vcpu->arch.exception.error_code);
> -
> -		if (exception_type(vcpu->arch.exception.nr) == EXCPT_FAULT)
> -			__kvm_set_rflags(vcpu, kvm_get_rflags(vcpu) |
> -					     X86_EFLAGS_RF);
> -
> -		if (vcpu->arch.exception.nr == DB_VECTOR &&
> -		    (vcpu->arch.dr7 & DR7_GD)) {
> -			vcpu->arch.dr7 &= ~DR7_GD;
> -			kvm_update_dr7(vcpu);
> -		}
> -
> +	if (vcpu->arch.exception.injected) {
>  		kvm_x86_ops->queue_exception(vcpu);
>  		return 0;
>  	}
> @@ -6386,7 +6382,25 @@ static int inject_pending_event(struct kvm_vcpu *vcpu, bool req_int_win)
>  	}
>
>  	/* try to inject new event if pending */
> -	if (vcpu->arch.smi_pending && !is_smm(vcpu)) {
> +	if (vcpu->arch.exception.pending) {
> +		trace_kvm_inj_exception(vcpu->arch.exception.nr,
> +					vcpu->arch.exception.has_error_code,
> +					vcpu->arch.exception.error_code);
> +
> +		vcpu->arch.exception.pending = false;
> +		vcpu->arch.exception.injected = true;
> +		kvm_x86_ops->queue_exception(vcpu);
> +
> +		if (exception_type(vcpu->arch.exception.nr) == EXCPT_FAULT)
> +			__kvm_set_rflags(vcpu, kvm_get_rflags(vcpu) |
> +					     X86_EFLAGS_RF);
> +
> +		if (vcpu->arch.exception.nr == DB_VECTOR &&
> +		    (vcpu->arch.dr7 & DR7_GD)) {
> +			vcpu->arch.dr7 &= ~DR7_GD;
> +			kvm_update_dr7(vcpu);
> +		}
> +	} else if (vcpu->arch.smi_pending && !is_smm(vcpu)) {
>  		vcpu->arch.smi_pending = false;
>  		enter_smm(vcpu);
>  	} else if (vcpu->arch.nmi_pending && kvm_x86_ops->nmi_allowed(vcpu)) {
> @@ -6862,6 +6876,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>  				kvm_x86_ops->enable_nmi_window(vcpu);
>  			if (kvm_cpu_has_injectable_intr(vcpu) || req_int_win)
>  				kvm_x86_ops->enable_irq_window(vcpu);
> +			WARN_ON(vcpu->arch.exception.pending);

This WARN_ON() is suggested during the review of last version,
however, there are many cases in inject_pending_event() can result in
return directly w/ vcpu->arch.exception.pending is true. Actually I
have already caught the warning several times during the testing. I
think we should remove it when committing.

Regards,
Wanpeng Li

>  		}
>
>  		if (kvm_lapic_enabled(vcpu)) {
> @@ -7738,6 +7753,7 @@ void kvm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
>  	vcpu->arch.nmi_injected = false;
>  	kvm_clear_interrupt_queue(vcpu);
>  	kvm_clear_exception_queue(vcpu);
> +	vcpu->arch.exception.pending = false;
>
>  	memset(vcpu->arch.db, 0, sizeof(vcpu->arch.db));
>  	kvm_update_dr0123(vcpu);
> diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
> index 1134603..e6ec0de 100644
> --- a/arch/x86/kvm/x86.h
> +++ b/arch/x86/kvm/x86.h
> @@ -11,7 +11,7 @@
>
>  static inline void kvm_clear_exception_queue(struct kvm_vcpu *vcpu)
>  {
> -	vcpu->arch.exception.pending = false;
> +	vcpu->arch.exception.injected = false;
>  }
>
>  static inline void kvm_queue_interrupt(struct kvm_vcpu *vcpu, u8 vector,
> @@ -29,7 +29,7 @@ static inline void kvm_clear_interrupt_queue(struct kvm_vcpu *vcpu)
>
>  static inline bool kvm_event_needs_reinjection(struct kvm_vcpu *vcpu)
>  {
> -	return vcpu->arch.exception.pending || vcpu->arch.interrupt.pending ||
> +	return vcpu->arch.exception.injected || vcpu->arch.interrupt.pending ||
>  		vcpu->arch.nmi_injected;
>  }
>
> --
> 2.7.4
>
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v3 2/4] KVM: X86: Fix loss of exception which has not yet injected
2017-08-24 6:52 ` Wanpeng Li
@ 2017-08-24 8:57 ` Paolo Bonzini
2017-08-24 9:13 ` Wanpeng Li
0 siblings, 1 reply; 16+ messages in thread
From: Paolo Bonzini @ 2017-08-24 8:57 UTC (permalink / raw)
To: Wanpeng Li, linux-kernel@vger.kernel.org, kvm
Cc: Radim Krčmář, Wanpeng Li

On 24/08/2017 08:52, Wanpeng Li wrote:
>> @@ -6862,6 +6876,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>>  				kvm_x86_ops->enable_nmi_window(vcpu);
>>  			if (kvm_cpu_has_injectable_intr(vcpu) || req_int_win)
>>  				kvm_x86_ops->enable_irq_window(vcpu);
>> +			WARN_ON(vcpu->arch.exception.pending);
>
> This WARN_ON() is suggested during the review of last version,
> however, there are many cases in inject_pending_event() can result in
> return directly w/ vcpu->arch.exception.pending is true. Actually I
> have already caught the warning several times during the testing. I
> think we should remove it when committing.

No, it's a good thing that it's failing, because it's finding a bug.
There's no such thing as an "exception window", so at the very least it
should set req_immediate_exit to true.

Does this help?

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index b698b2f135a2..76d5a192be6c 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6365,14 +6365,20 @@ static int inject_pending_event(struct kvm_vcpu *vcpu, bool req_int_win)
 		return 0;
 	}

-	if (vcpu->arch.nmi_injected) {
-		kvm_x86_ops->set_nmi(vcpu);
-		return 0;
-	}
+	/*
+	 * Exceptions must be injected immediately, or the exception
+	 * frame will have the address of the NMI or interrupt handler.
+	 */
+	if (!vcpu->arch.exception.pending) {
+		if (vcpu->arch.nmi_injected) {
+			kvm_x86_ops->set_nmi(vcpu);
+			return 0;
+		}

-	if (vcpu->arch.interrupt.pending) {
-		kvm_x86_ops->set_irq(vcpu);
-		return 0;
+		if (vcpu->arch.interrupt.pending) {
+			kvm_x86_ops->set_irq(vcpu);
+			return 0;
+		}
 	}

 	if (is_guest_mode(vcpu) && kvm_x86_ops->check_nested_events) {

Paolo
^ permalink raw reply related [flat|nested] 16+ messages in thread
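The ordering proposed in the diff above can be sketched as a standalone decision function. This is a simplified model, not the kernel's inject_pending_event(): struct model_vcpu, enum model_action and model_pick_event() are hypothetical names, and the nested-event check, SMI handling and injection of new NMIs and interrupts are deliberately left out.

```c
#include <stdbool.h>

/* Hypothetical outcome of one pass over the event state. */
enum model_action {
	MODEL_REINJECT_EXCEPTION,
	MODEL_REINJECT_NMI,
	MODEL_REINJECT_IRQ,
	MODEL_INJECT_NEW_EXCEPTION,
	MODEL_NOTHING,
};

/* Hypothetical, flattened view of the relevant vcpu->arch fields. */
struct model_vcpu {
	bool exception_injected;
	bool exception_pending;
	bool nmi_injected;
	bool interrupt_pending;
};

static enum model_action model_pick_event(const struct model_vcpu *v)
{
	/* An exception that already reached the hardware is reinjected first. */
	if (v->exception_injected)
		return MODEL_REINJECT_EXCEPTION;

	/*
	 * A pending exception must go in before an NMI or interrupt is
	 * reinjected, otherwise its frame would point at their handler.
	 */
	if (!v->exception_pending) {
		if (v->nmi_injected)
			return MODEL_REINJECT_NMI;
		if (v->interrupt_pending)
			return MODEL_REINJECT_IRQ;
	}

	if (v->exception_pending)
		return MODEL_INJECT_NEW_EXCEPTION;

	return MODEL_NOTHING;
}
```

With both exception_pending and nmi_injected set, the model returns MODEL_INJECT_NEW_EXCEPTION rather than MODEL_REINJECT_NMI, which is the behaviour the added comment in the diff asks for.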
* Re: [PATCH v3 2/4] KVM: X86: Fix loss of exception which has not yet injected
2017-08-24 8:57 ` Paolo Bonzini
@ 2017-08-24 9:13 ` Wanpeng Li
2017-08-24 9:34 ` Wanpeng Li
2017-08-24 9:35 ` Paolo Bonzini
0 siblings, 2 replies; 16+ messages in thread
From: Wanpeng Li @ 2017-08-24 9:13 UTC (permalink / raw)
To: Paolo Bonzini
Cc: linux-kernel@vger.kernel.org, kvm, Radim Krčmář, Wanpeng Li

2017-08-24 16:57 GMT+08:00 Paolo Bonzini <pbonzini@redhat.com>:
> On 24/08/2017 08:52, Wanpeng Li wrote:
>>> @@ -6862,6 +6876,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>>>  				kvm_x86_ops->enable_nmi_window(vcpu);
>>>  			if (kvm_cpu_has_injectable_intr(vcpu) || req_int_win)
>>>  				kvm_x86_ops->enable_irq_window(vcpu);
>>> +			WARN_ON(vcpu->arch.exception.pending);
>>
>> This WARN_ON() is suggested during the review of last version,
>> however, there are many cases in inject_pending_event() can result in
>> return directly w/ vcpu->arch.exception.pending is true. Actually I
>> have already caught the warning several times during the testing. I
>> think we should remove it when committing.
>
> No, it's a good thing that it's failing, because it's finding a bug.
> There's no such thing as an "exception window", so at the very least it

Good point, the code looks good, I will fold it in next version.
However, I still can observe the warning.

Regards,
Wanpeng Li

> should set req_immediate_exit to true.
>
> Does this help?
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index b698b2f135a2..76d5a192be6c 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -6365,14 +6365,20 @@ static int inject_pending_event(struct kvm_vcpu *vcpu, bool req_int_win)
>  		return 0;
>  	}
>
> -	if (vcpu->arch.nmi_injected) {
> -		kvm_x86_ops->set_nmi(vcpu);
> -		return 0;
> -	}
> +	/*
> +	 * Exceptions must be injected immediately, or the exception
> +	 * frame will have the address of the NMI or interrupt handler.
> +	 */
> +	if (!vcpu->arch.exception.pending) {
> +		if (vcpu->arch.nmi_injected) {
> +			kvm_x86_ops->set_nmi(vcpu);
> +			return 0;
> +		}
>
> -	if (vcpu->arch.interrupt.pending) {
> -		kvm_x86_ops->set_irq(vcpu);
> -		return 0;
> +		if (vcpu->arch.interrupt.pending) {
> +			kvm_x86_ops->set_irq(vcpu);
> +			return 0;
> +		}
>  	}
>
>  	if (is_guest_mode(vcpu) && kvm_x86_ops->check_nested_events) {
>
> Paolo
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v3 2/4] KVM: X86: Fix loss of exception which has not yet injected
2017-08-24 9:13 ` Wanpeng Li
@ 2017-08-24 9:34 ` Wanpeng Li
2017-08-24 9:47 ` Paolo Bonzini
2017-08-24 9:35 ` Paolo Bonzini
1 sibling, 1 reply; 16+ messages in thread
From: Wanpeng Li @ 2017-08-24 9:34 UTC (permalink / raw)
To: Paolo Bonzini
Cc: linux-kernel@vger.kernel.org, kvm, Radim Krčmář, Wanpeng Li

2017-08-24 17:13 GMT+08:00 Wanpeng Li <kernellwp@gmail.com>:
> 2017-08-24 16:57 GMT+08:00 Paolo Bonzini <pbonzini@redhat.com>:
>> On 24/08/2017 08:52, Wanpeng Li wrote:
>>>> @@ -6862,6 +6876,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>>>>  				kvm_x86_ops->enable_nmi_window(vcpu);
>>>>  			if (kvm_cpu_has_injectable_intr(vcpu) || req_int_win)
>>>>  				kvm_x86_ops->enable_irq_window(vcpu);
>>>> +			WARN_ON(vcpu->arch.exception.pending);
>>>
>>> This WARN_ON() is suggested during the review of last version,
>>> however, there are many cases in inject_pending_event() can result in
>>> return directly w/ vcpu->arch.exception.pending is true. Actually I
>>> have already caught the warning several times during the testing. I
>>> think we should remove it when committing.
>>
>> No, it's a good thing that it's failing, because it's finding a bug.
>> There's no such thing as an "exception window", so at the very least it
>
> Good point, the code looks good, I will fold it in next version.
> However, I still can observe the warning.
>
> Regards,
> Wanpeng Li

I observed sometimes both vcpu->arch.exception.pending and
vcpu->arch.exception.injected are true before executing below codes:

	if (vcpu->arch.exception.injected) {
		kvm_x86_ops->queue_exception(vcpu);
		return 0;
	}

Regards,
Wanpeng Li

>
>> should set req_immediate_exit to true.
>>
>> Does this help?
>>
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index b698b2f135a2..76d5a192be6c 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -6365,14 +6365,20 @@ static int inject_pending_event(struct kvm_vcpu *vcpu, bool req_int_win)
>>  		return 0;
>>  	}
>>
>> -	if (vcpu->arch.nmi_injected) {
>> -		kvm_x86_ops->set_nmi(vcpu);
>> -		return 0;
>> -	}
>> +	/*
>> +	 * Exceptions must be injected immediately, or the exception
>> +	 * frame will have the address of the NMI or interrupt handler.
>> +	 */
>> +	if (!vcpu->arch.exception.pending) {
>> +		if (vcpu->arch.nmi_injected) {
>> +			kvm_x86_ops->set_nmi(vcpu);
>> +			return 0;
>> +		}
>>
>> -	if (vcpu->arch.interrupt.pending) {
>> -		kvm_x86_ops->set_irq(vcpu);
>> -		return 0;
>> +		if (vcpu->arch.interrupt.pending) {
>> +			kvm_x86_ops->set_irq(vcpu);
>> +			return 0;
>> +		}
>>  	}
>>
>>  	if (is_guest_mode(vcpu) && kvm_x86_ops->check_nested_events) {
>>
>> Paolo
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v3 2/4] KVM: X86: Fix loss of exception which has not yet injected
2017-08-24 9:34 ` Wanpeng Li
@ 2017-08-24 9:47 ` Paolo Bonzini
0 siblings, 0 replies; 16+ messages in thread
From: Paolo Bonzini @ 2017-08-24 9:47 UTC (permalink / raw)
To: Wanpeng Li
Cc: linux-kernel@vger.kernel.org, kvm, Radim Krčmář, Wanpeng Li

On 24/08/2017 11:34, Wanpeng Li wrote:
> 2017-08-24 17:13 GMT+08:00 Wanpeng Li <kernellwp@gmail.com>:
>> 2017-08-24 16:57 GMT+08:00 Paolo Bonzini <pbonzini@redhat.com>:
>>> On 24/08/2017 08:52, Wanpeng Li wrote:
>>>>> @@ -6862,6 +6876,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>>>>>  				kvm_x86_ops->enable_nmi_window(vcpu);
>>>>>  			if (kvm_cpu_has_injectable_intr(vcpu) || req_int_win)
>>>>>  				kvm_x86_ops->enable_irq_window(vcpu);
>>>>> +			WARN_ON(vcpu->arch.exception.pending);
>>>>
>>>> This WARN_ON() is suggested during the review of last version,
>>>> however, there are many cases in inject_pending_event() can result in
>>>> return directly w/ vcpu->arch.exception.pending is true. Actually I
>>>> have already caught the warning several times during the testing. I
>>>> think we should remove it when committing.
>>>
>>> No, it's a good thing that it's failing, because it's finding a bug.
>>> There's no such thing as an "exception window", so at the very least it
>>
>> Good point, the code looks good, I will fold it in next version.
>> However, I still can observe the warning.
>>
>> Regards,
>> Wanpeng Li
>
> I observed sometimes both vcpu->arch.exception.pending and
> vcpu->arch.exception.injected are true before executing below codes:
>
> 	if (vcpu->arch.exception.injected) {
> 		kvm_x86_ops->queue_exception(vcpu);
> 		return 0;
> 	}

More missing pieces:

1) kvm_x86_ops->queue_exception must be called in the
vcpu->arch.exception.pending case. Compare it with the others, which
are calling enter_smm, kvm_x86_ops->set_irq, kvm_x86_ops->set_nmi.

2) here:

	if (!vcpu->arch.exception.pending ||
		 !vcpu->arch.exception.injected) {
	queue:
		if (has_error && !is_protmode(vcpu))
			has_error = false;
		if (reinject)
			vcpu->arch.exception.injected = true;
		else
			vcpu->arch.exception.pending = true;

you need to reset the other field, because you can get here from the
double-fault case. Likewise below:

	if ((class1 == EXCPT_CONTRIBUTORY && class2 == EXCPT_CONTRIBUTORY)
		|| (class1 == EXCPT_PF && class2 != EXCPT_BENIGN)) {
		/* generate double fault per SDM Table 5-5 */
		vcpu->arch.exception.pending = true;

injected must be cleared.

> Regards,
> Wanpeng Li
>
>>
>>> should set req_immediate_exit to true.
>>>
>>> Does this help?
>>>
>>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>>> index b698b2f135a2..76d5a192be6c 100644
>>> --- a/arch/x86/kvm/x86.c
>>> +++ b/arch/x86/kvm/x86.c
>>> @@ -6365,14 +6365,20 @@ static int inject_pending_event(struct kvm_vcpu *vcpu, bool req_int_win)
>>>  		return 0;
>>>  	}
>>>
>>> -	if (vcpu->arch.nmi_injected) {
>>> -		kvm_x86_ops->set_nmi(vcpu);
>>> -		return 0;
>>> -	}
>>> +	/*
>>> +	 * Exceptions must be injected immediately, or the exception
>>> +	 * frame will have the address of the NMI or interrupt handler.
>>> +	 */
>>> +	if (!vcpu->arch.exception.pending) {
>>> +		if (vcpu->arch.nmi_injected) {
>>> +			kvm_x86_ops->set_nmi(vcpu);
>>> +			return 0;
>>> +		}
>>>
>>> -	if (vcpu->arch.interrupt.pending) {
>>> -		kvm_x86_ops->set_irq(vcpu);
>>> -		return 0;
>>> +		if (vcpu->arch.interrupt.pending) {
>>> +			kvm_x86_ops->set_irq(vcpu);
>>> +			return 0;
>>> +		}
>>>  	}
>>>
>>>  	if (is_guest_mode(vcpu) && kvm_x86_ops->check_nested_events) {
>>>
>>> Paolo
^ permalink raw reply [flat|nested] 16+ messages in thread
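The second review point above, resetting the other flag whenever an exception is queued or promoted to a double fault, can be sketched as a standalone model. It is a sketch under assumptions rather than the eventual kernel change: struct model_exception, enum model_result and model_merge_exception() are hypothetical names, the vector numbers are the architectural x86 ones, and the class rule is the one quoted in the review (contributory after contributory, or anything non-benign after a page fault, yields a double fault).

```c
#include <stdbool.h>

/* Architectural x86 exception vectors used by the classification. */
#define DE_VECTOR 0
#define DF_VECTOR 8
#define TS_VECTOR 10
#define NP_VECTOR 11
#define SS_VECTOR 12
#define GP_VECTOR 13
#define PF_VECTOR 14

enum model_class { EXCPT_BENIGN, EXCPT_CONTRIBUTORY, EXCPT_PF };
enum model_result { MODEL_QUEUED, MODEL_DOUBLE_FAULT, MODEL_TRIPLE_FAULT };

/* Hypothetical stand-in for the exception state discussed in the thread. */
struct model_exception {
	bool pending;
	bool injected;
	unsigned int nr;
};

static enum model_class model_exception_class(unsigned int vector)
{
	switch (vector) {
	case PF_VECTOR:
		return EXCPT_PF;
	case DE_VECTOR:
	case TS_VECTOR:
	case NP_VECTOR:
	case SS_VECTOR:
	case GP_VECTOR:
		return EXCPT_CONTRIBUTORY;
	default:
		return EXCPT_BENIGN;
	}
}

/* Queue a new exception; the other flag is always reset, as the review asks. */
static void model_set_pending(struct model_exception *e, unsigned int nr)
{
	e->pending = true;
	e->injected = false;
	e->nr = nr;
}

static enum model_result model_merge_exception(struct model_exception *e,
					       unsigned int nr)
{
	enum model_class class1, class2;

	if (!e->pending && !e->injected) {
		model_set_pending(e, nr);
		return MODEL_QUEUED;
	}

	/* A fault while delivering a double fault shuts the guest down. */
	if (e->nr == DF_VECTOR)
		return MODEL_TRIPLE_FAULT;

	class1 = model_exception_class(e->nr);
	class2 = model_exception_class(nr);
	if ((class1 == EXCPT_CONTRIBUTORY && class2 == EXCPT_CONTRIBUTORY) ||
	    (class1 == EXCPT_PF && class2 != EXCPT_BENIGN)) {
		model_set_pending(e, DF_VECTOR);
		return MODEL_DOUBLE_FAULT;
	}

	/* Otherwise the new exception replaces the previous one. */
	model_set_pending(e, nr);
	return MODEL_QUEUED;
}
```

Because model_set_pending() clears injected on every path, the model can never end up with both flags set, which is the state reported earlier in the thread.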
* Re: [PATCH v3 2/4] KVM: X86: Fix loss of exception which has not yet injected
2017-08-24 9:13 ` Wanpeng Li
2017-08-24 9:34 ` Wanpeng Li
@ 2017-08-24 9:35 ` Paolo Bonzini
2017-08-24 9:47 ` Wanpeng Li
1 sibling, 1 reply; 16+ messages in thread
From: Paolo Bonzini @ 2017-08-24 9:35 UTC (permalink / raw)
To: Wanpeng Li
Cc: linux-kernel@vger.kernel.org, kvm, Radim Krčmář, Wanpeng Li

On 24/08/2017 11:13, Wanpeng Li wrote:
> 2017-08-24 16:57 GMT+08:00 Paolo Bonzini <pbonzini@redhat.com>:
>> On 24/08/2017 08:52, Wanpeng Li wrote:
>>>> @@ -6862,6 +6876,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>>>>  				kvm_x86_ops->enable_nmi_window(vcpu);
>>>>  			if (kvm_cpu_has_injectable_intr(vcpu) || req_int_win)
>>>>  				kvm_x86_ops->enable_irq_window(vcpu);
>>>> +			WARN_ON(vcpu->arch.exception.pending);
>>>
>>> This WARN_ON() is suggested during the review of last version,
>>> however, there are many cases in inject_pending_event() can result in
>>> return directly w/ vcpu->arch.exception.pending is true. Actually I
>>> have already caught the warning several times during the testing. I
>>> think we should remove it when committing.
>>
>> No, it's a good thing that it's failing, because it's finding a bug.
>> There's no such thing as an "exception window", so at the very least it
>
> Good point, the code looks good, I will fold it in next version.
> However, I still can observe the warning.

In patch 4, vmx_check_nested_events must clear vcpu->ex.pending.

Paolo

> Regards,
> Wanpeng Li
>
>> should set req_immediate_exit to true.
>>
>> Does this help?
>>
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index b698b2f135a2..76d5a192be6c 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -6365,14 +6365,20 @@ static int inject_pending_event(struct kvm_vcpu *vcpu, bool req_int_win)
>>  		return 0;
>>  	}
>>
>> -	if (vcpu->arch.nmi_injected) {
>> -		kvm_x86_ops->set_nmi(vcpu);
>> -		return 0;
>> -	}
>> +	/*
>> +	 * Exceptions must be injected immediately, or the exception
>> +	 * frame will have the address of the NMI or interrupt handler.
>> +	 */
>> +	if (!vcpu->arch.exception.pending) {
>> +		if (vcpu->arch.nmi_injected) {
>> +			kvm_x86_ops->set_nmi(vcpu);
>> +			return 0;
>> +		}
>>
>> -	if (vcpu->arch.interrupt.pending) {
>> -		kvm_x86_ops->set_irq(vcpu);
>> -		return 0;
>> +		if (vcpu->arch.interrupt.pending) {
>> +			kvm_x86_ops->set_irq(vcpu);
>> +			return 0;
>> +		}
>>  	}
>>
>>  	if (is_guest_mode(vcpu) && kvm_x86_ops->check_nested_events) {
>>
>> Paolo
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v3 2/4] KVM: X86: Fix loss of exception which has not yet injected
2017-08-24 9:35 ` Paolo Bonzini
@ 2017-08-24 9:47 ` Wanpeng Li
0 siblings, 0 replies; 16+ messages in thread
From: Wanpeng Li @ 2017-08-24 9:47 UTC (permalink / raw)
To: Paolo Bonzini
Cc: linux-kernel@vger.kernel.org, kvm, Radim Krčmář, Wanpeng Li

2017-08-24 17:35 GMT+08:00 Paolo Bonzini <pbonzini@redhat.com>:
> On 24/08/2017 11:13, Wanpeng Li wrote:
>> 2017-08-24 16:57 GMT+08:00 Paolo Bonzini <pbonzini@redhat.com>:
>>> On 24/08/2017 08:52, Wanpeng Li wrote:
>>>>> @@ -6862,6 +6876,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>>>>> 			kvm_x86_ops->enable_nmi_window(vcpu);
>>>>> 		if (kvm_cpu_has_injectable_intr(vcpu) || req_int_win)
>>>>> 			kvm_x86_ops->enable_irq_window(vcpu);
>>>>> +		WARN_ON(vcpu->arch.exception.pending);
>>>>
>>>> This WARN_ON() is suggested during the review of last version,
>>>> however, there are many cases in inject_pending_event() can result in
>>>> return directly w/ vcpu->arch.exception.pending is true. Actually I
>>>> have already catched the warning several times during the testing. I
>>>> think we should remove it when committing.
>>>
>>> No, it's a good thing that it's failing, because it's finding a bug.
>>> There's no such thing as an "exception window", so at the very least it
>>
>> Good point, the code looks good, I will fold it in next version.
>> However, I still can observe the warning.
>
> In patch 4, vmx_check_nested_events must clear vcpu->ex.pending.

It is true, but I can still observe the warning.

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 6f88a79..4f97c4f 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -10990,6 +10990,8 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu, bool external_intr)
 		if (vmx->nested.nested_run_pending)
 			return -EBUSY;
 
+		vcpu->arch.exception.pending = false;
+		vcpu->arch.exception.injected = true;
 		nested_vmx_inject_exception_vmexit(vcpu, exit_qual);
 		return 0;
 	}
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index b698b2f..77e3031 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6365,14 +6365,20 @@ static int inject_pending_event(struct kvm_vcpu *vcpu, bool req_int_win)
 		return 0;
 	}
 
-	if (vcpu->arch.nmi_injected) {
-		kvm_x86_ops->set_nmi(vcpu);
-		return 0;
-	}
+	/*
+	 * Exceptions must be injected immediately, or the exception
+	 * frame will have the address of the NMI or interrupt handler.
+	 */
+	if (!vcpu->arch.exception.pending) {
+		if (vcpu->arch.nmi_injected) {
+			kvm_x86_ops->set_nmi(vcpu);
+			return 0;
+		}
 
-	if (vcpu->arch.interrupt.pending) {
-		kvm_x86_ops->set_irq(vcpu);
-		return 0;
+		if (vcpu->arch.interrupt.pending) {
+			kvm_x86_ops->set_irq(vcpu);
+			return 0;
+		}
 	}
 
 	if (is_guest_mode(vcpu) && kvm_x86_ops->check_nested_events) {

^ permalink raw reply related [flat|nested] 16+ messages in thread
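The ordering that the x86.c hunk in the message above introduces can be sketched as a small standalone model. This is only an illustration under stated assumptions, not the kernel code: `struct toy_vcpu`, `enum toy_event` and `toy_pick_event()` are hypothetical names, and the model leaves out the re-injection of already-injected exceptions and the nested-event check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the event fields of a vCPU. */
struct toy_vcpu {
	bool exception_pending;  /* exception queued, not yet injected */
	bool nmi_injected;       /* NMI that would have to be re-injected */
	bool interrupt_pending;  /* interrupt that would have to be re-injected */
};

enum toy_event { TOY_NONE, TOY_EXCEPTION, TOY_NMI, TOY_IRQ };

/* Return which event the model hands to the guest on this entry. */
enum toy_event toy_pick_event(const struct toy_vcpu *v)
{
	assert(v != NULL);

	/* NMI/IRQ re-injection is considered only without a pending exception. */
	if (!v->exception_pending) {
		if (v->nmi_injected)
			return TOY_NMI;
		if (v->interrupt_pending)
			return TOY_IRQ;
		return TOY_NONE;
	}

	/* A pending exception wins over a previously injected NMI or IRQ. */
	return TOY_EXCEPTION;
}
```

In this model a vCPU with both a pending exception and an injected NMI gets the exception first, which mirrors the comment in the hunk about the exception frame otherwise pointing at the NMI or interrupt handler.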
* Re: [PATCH v3 2/4] KVM: X86: Fix loss of exception which has not yet injected
2017-08-24 4:21 ` [PATCH v3 2/4] KVM: X86: Fix loss of exception which has not yet injected Wanpeng Li
@ 2018-01-07 7:26 ` Ross Zwisler
[not found] ` <1503548506-4457-2-git-send-email-wanpeng.li-PkbjNfxxIARBDgjK7y7TUQ@public.gmane.org>
1 sibling, 0 replies; 16+ messages in thread
From: Ross Zwisler @ 2018-01-07 7:26 UTC (permalink / raw)
To: Wanpeng Li
Cc: kvm-u79uwXL29TY76Z2rM5mHXA, Radim Krčmář, linux-nvdimm, LKML, Paolo Bonzini, Wanpeng Li

On Wed, Aug 23, 2017 at 10:21 PM, Wanpeng Li <kernellwp@gmail.com> wrote:
> From: Wanpeng Li <wanpeng.li@hotmail.com>
>
> vmx_complete_interrupts() assumes that the exception is always injected,
> so it would be dropped by kvm_clear_exception_queue(). This patch separates
> exception.pending from exception.injected, exception.inject represents the
> exception is injected or the exception should be reinjected due to vmexit
> occurs during event delivery in VMX non-root operation. exception.pending
> represents the exception is queued and will be cleared when injecting the
> exception to the guest. So exception.pending and exception.injected can
> cooperate to guarantee exception will not be lost.
>
> Reported-by: Radim Krčmář <rkrcmar@redhat.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Radim Krčmář <rkrcmar@redhat.com>
> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
> ---

I'm seeing a regression in my QEMU based NVDIMM testing system, and I
bisected it to this commit.

The behavior I'm seeing is that heavy I/O to simulated NVDIMMs in
multiple virtual machines causes the QEMU guests to receive double
faults, crashing them. Here's an example backtrace:

[ 1042.653816] PANIC: double fault, error_code: 0x0
[ 1042.654398] CPU: 2 PID: 30257 Comm: fsstress Not tainted 4.15.0-rc5 #1
[ 1042.655169] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-2.fc27 04/01/2014
[ 1042.656121] RIP: 0010:memcpy_flushcache+0x4d/0x180
[ 1042.656631] RSP: 0018:ffffac098c7d3808 EFLAGS: 00010286
[ 1042.657245] RAX: ffffac0d18ca8000 RBX: 0000000000000fe0 RCX: ffffac0d18ca8000
[ 1042.658085] RDX: ffff921aaa5df000 RSI: ffff921aaa5e0000 RDI: 000019f26e6c9000
[ 1042.658802] RBP: 0000000000001000 R08: 0000000000000000 R09: 0000000000000000
[ 1042.659503] R10: 0000000000000000 R11: 0000000000000000 R12: ffff921aaa5df020
[ 1042.660306] R13: ffffac0d18ca8000 R14: fffff4c102a977c0 R15: 0000000000001000
[ 1042.661132] FS: 00007f71530b90c0(0000) GS:ffff921b3b280000(0000) knlGS:0000000000000000
[ 1042.662051] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1042.662528] CR2: 0000000001156002 CR3: 000000012a936000 CR4: 00000000000006e0
[ 1042.663093] Call Trace:
[ 1042.663329] write_pmem+0x6c/0xa0 [nd_pmem]
[ 1042.663668] pmem_do_bvec+0x15f/0x330 [nd_pmem]
[ 1042.664056] ? kmem_alloc+0x61/0xe0 [xfs]
[ 1042.664393] pmem_make_request+0xdd/0x220 [nd_pmem]
[ 1042.664781] generic_make_request+0x11f/0x300
[ 1042.665135] ? submit_bio+0x6c/0x140
[ 1042.665436] submit_bio+0x6c/0x140
[ 1042.665754] ? next_bio+0x18/0x40
[ 1042.666025] ? _cond_resched+0x15/0x40
[ 1042.666341] submit_bio_wait+0x53/0x80
[ 1042.666804] blkdev_issue_zeroout+0xdc/0x210
[ 1042.667336] ? __dax_zero_page_range+0xb5/0x140
[ 1042.667810] __dax_zero_page_range+0xb5/0x140
[ 1042.668197] ? xfs_file_iomap_begin+0x2bd/0x8e0 [xfs]
[ 1042.668611] iomap_zero_range_actor+0x7c/0x1b0
[ 1042.668974] ? iomap_write_actor+0x170/0x170
[ 1042.669318] iomap_apply+0xa4/0x110
[ 1042.669616] ? iomap_write_actor+0x170/0x170
[ 1042.669958] iomap_zero_range+0x52/0x80
[ 1042.670255] ? iomap_write_actor+0x170/0x170
[ 1042.670616] xfs_setattr_size+0xd4/0x330 [xfs]
[ 1042.670995] xfs_ioc_space+0x27e/0x2f0 [xfs]
[ 1042.671332] ? terminate_walk+0x87/0xf0
[ 1042.671662] xfs_file_ioctl+0x862/0xa40 [xfs]
[ 1042.672035] ? _copy_to_user+0x22/0x30
[ 1042.672346] ? cp_new_stat+0x150/0x180
[ 1042.672663] do_vfs_ioctl+0xa1/0x610
[ 1042.672960] ? SYSC_newfstat+0x3c/0x60
[ 1042.673264] SyS_ioctl+0x74/0x80
[ 1042.673661] entry_SYSCALL_64_fastpath+0x1a/0x7d
[ 1042.674239] RIP: 0033:0x7f71525a2dc7
[ 1042.674681] RSP: 002b:00007ffef97aa778 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 1042.675664] RAX: ffffffffffffffda RBX: 00000000000112bc RCX: 00007f71525a2dc7
[ 1042.676592] RDX: 00007ffef97aa7a0 RSI: 0000000040305825 RDI: 0000000000000003
[ 1042.677520] RBP: 0000000000000009 R08: 0000000000000045 R09: 00007ffef97aa78c
[ 1042.678442] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000003
[ 1042.679330] R13: 0000000000019e38 R14: 00000000000fcca7 R15: 0000000000000016
[ 1042.680216] Code: 48 8d 5d e0 4c 8d 62 20 48 89 cf 48 29 d7 48 89 de 48 83 e6 e0 4c 01 e6 48 8d 04 17 4c 8b 02 4c 8b 4a 08 4c 8b 52 10 4c 8b 5a 18 <4c> 0f c3 00 4c 0f c3 48 08 4c 0f c3 50 10 4c 0f c3 58 18 48 83

This appears to be independent of both the guest kernel version (this
backtrace has v4.15.0-rc5, but I've seen it with other kernels) as
well as independent of the host QEMU version (mine happens to be
qemu-2.10.1-2.fc27 in Fedora 27).

The new behavior is due to this commit being present in the host OS
kernel. Prior to this commit I could fire up 4 VMs and run xfstests
on my simulated NVDIMMs, but after this commit such testing results in
multiple of my VMs crashing almost immediately.

Reproduction is very simple, at least on my development box. All you
need are a pair of VMs (I just did it with clean installs of Fedora
27) with NVDIMMs. Here's a sample QEMU command to get one of these:

# qemu-system-x86_64 /home/rzwisler/vms/Fedora27.qcow2 -m 4G,slots=3,maxmem=512G -smp 12 -machine pc,accel=kvm,nvdimm -enable-kvm -object memory-backend-file,id=mem1,share,mem-path=/home/rzwisler/nvdimms/nvdimm-1,size=17G -device nvdimm,memdev=mem1,id=nv1

In my setup my NVDIMMs backing files (/home/rzwisler/nvdimms/nvdimm-1)
are being created on a filesystem on an SSD.

After these two qemu guests are up, run write I/Os to the resulting
/dev/pmem0 devices. I've done this with xfstests and fio to get the
error, but the simplest way is just:

# dd if=/dev/zero of=/dev/pmem0

The double fault should happen in under a minute, definitely before
the DDs run out of space on their /dev/pmem0 devices.

I've reproduced this on multiple development boxes, so I'm pretty sure
it's not related to a flakey hardware setup.

Thanks,
- Ross

^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v3 2/4] KVM: X86: Fix loss of exception which has not yet injected 2018-01-07 7:26 ` Ross Zwisler @ 2018-01-09 1:24 ` Haozhong Zhang -1 siblings, 0 replies; 16+ messages in thread From: Haozhong Zhang @ 2018-01-09 1:24 UTC (permalink / raw) To: Ross Zwisler Cc: Wanpeng Li, kvm-u79uwXL29TY76Z2rM5mHXA, Radim Krčmář, linux-nvdimm, LKML, Paolo Bonzini, Wanpeng Li On 01/07/18 00:26 -0700, Ross Zwisler wrote: > On Wed, Aug 23, 2017 at 10:21 PM, Wanpeng Li <kernellwp@gmail.com> wrote: > > From: Wanpeng Li <wanpeng.li@hotmail.com> > > > > vmx_complete_interrupts() assumes that the exception is always injected, > > so it would be dropped by kvm_clear_exception_queue(). This patch separates > > exception.pending from exception.injected, exception.inject represents the > > exception is injected or the exception should be reinjected due to vmexit > > occurs during event delivery in VMX non-root operation. exception.pending > > represents the exception is queued and will be cleared when injecting the > > exception to the guest. So exception.pending and exception.injected can > > cooperate to guarantee exception will not be lost. > > > > Reported-by: Radim Krčmář <rkrcmar@redhat.com> > > Cc: Paolo Bonzini <pbonzini@redhat.com> > > Cc: Radim Krčmář <rkrcmar@redhat.com> > > Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> > > --- > > I'm seeing a regression in my QEMU based NVDIMM testing system, and I > bisected it to this commit. > > The behavior I'm seeing is that heavy I/O to simulated NVDIMMs in > multiple virtual machines causes the QEMU guests to receive double > faults, crashing them. 
Here's an example backtrace: > > [ 1042.653816] PANIC: double fault, error_code: 0x0 > [ 1042.654398] CPU: 2 PID: 30257 Comm: fsstress Not tainted 4.15.0-rc5 #1 > [ 1042.655169] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), > BIOS 1.10.2-2.fc27 04/01/2014 > [ 1042.656121] RIP: 0010:memcpy_flushcache+0x4d/0x180 > [ 1042.656631] RSP: 0018:ffffac098c7d3808 EFLAGS: 00010286 > [ 1042.657245] RAX: ffffac0d18ca8000 RBX: 0000000000000fe0 RCX: ffffac0d18ca8000 > [ 1042.658085] RDX: ffff921aaa5df000 RSI: ffff921aaa5e0000 RDI: 000019f26e6c9000 > [ 1042.658802] RBP: 0000000000001000 R08: 0000000000000000 R09: 0000000000000000 > [ 1042.659503] R10: 0000000000000000 R11: 0000000000000000 R12: ffff921aaa5df020 > [ 1042.660306] R13: ffffac0d18ca8000 R14: fffff4c102a977c0 R15: 0000000000001000 > [ 1042.661132] FS: 00007f71530b90c0(0000) GS:ffff921b3b280000(0000) > knlGS:0000000000000000 > [ 1042.662051] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [ 1042.662528] CR2: 0000000001156002 CR3: 000000012a936000 CR4: 00000000000006e0 > [ 1042.663093] Call Trace: > [ 1042.663329] write_pmem+0x6c/0xa0 [nd_pmem] > [ 1042.663668] pmem_do_bvec+0x15f/0x330 [nd_pmem] > [ 1042.664056] ? kmem_alloc+0x61/0xe0 [xfs] > [ 1042.664393] pmem_make_request+0xdd/0x220 [nd_pmem] > [ 1042.664781] generic_make_request+0x11f/0x300 > [ 1042.665135] ? submit_bio+0x6c/0x140 > [ 1042.665436] submit_bio+0x6c/0x140 > [ 1042.665754] ? next_bio+0x18/0x40 > [ 1042.666025] ? _cond_resched+0x15/0x40 > [ 1042.666341] submit_bio_wait+0x53/0x80 > [ 1042.666804] blkdev_issue_zeroout+0xdc/0x210 > [ 1042.667336] ? __dax_zero_page_range+0xb5/0x140 > [ 1042.667810] __dax_zero_page_range+0xb5/0x140 > [ 1042.668197] ? xfs_file_iomap_begin+0x2bd/0x8e0 [xfs] > [ 1042.668611] iomap_zero_range_actor+0x7c/0x1b0 > [ 1042.668974] ? iomap_write_actor+0x170/0x170 > [ 1042.669318] iomap_apply+0xa4/0x110 > [ 1042.669616] ? iomap_write_actor+0x170/0x170 > [ 1042.669958] iomap_zero_range+0x52/0x80 > [ 1042.670255] ? 
iomap_write_actor+0x170/0x170 > [ 1042.670616] xfs_setattr_size+0xd4/0x330 [xfs] > [ 1042.670995] xfs_ioc_space+0x27e/0x2f0 [xfs] > [ 1042.671332] ? terminate_walk+0x87/0xf0 > [ 1042.671662] xfs_file_ioctl+0x862/0xa40 [xfs] > [ 1042.672035] ? _copy_to_user+0x22/0x30 > [ 1042.672346] ? cp_new_stat+0x150/0x180 > [ 1042.672663] do_vfs_ioctl+0xa1/0x610 > [ 1042.672960] ? SYSC_newfstat+0x3c/0x60 > [ 1042.673264] SyS_ioctl+0x74/0x80 > [ 1042.673661] entry_SYSCALL_64_fastpath+0x1a/0x7d > [ 1042.674239] RIP: 0033:0x7f71525a2dc7 > [ 1042.674681] RSP: 002b:00007ffef97aa778 EFLAGS: 00000246 ORIG_RAX: > 0000000000000010 > [ 1042.675664] RAX: ffffffffffffffda RBX: 00000000000112bc RCX: 00007f71525a2dc7 > [ 1042.676592] RDX: 00007ffef97aa7a0 RSI: 0000000040305825 RDI: 0000000000000003 > [ 1042.677520] RBP: 0000000000000009 R08: 0000000000000045 R09: 00007ffef97aa78c > [ 1042.678442] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000003 > [ 1042.679330] R13: 0000000000019e38 R14: 00000000000fcca7 R15: 0000000000000016 > [ 1042.680216] Code: 48 8d 5d e0 4c 8d 62 20 48 89 cf 48 29 d7 48 89 > de 48 83 e6 e0 4c 01 e6 48 8d 04 17 4c 8b 02 4c 8b 4a 08 4c 8b 52 10 > 4c 8b 5a 18 <4c> 0f c3 00 4c 0f c3 48 08 4c 0f c3 50 10 4c 0f c3 58 18 > 48 83 > > This appears to be independent of both the guest kernel version (this > backtrace has v4.15.0-rc5, but I've seen it with other kernels) as > well as independent of the host QMEU version (mine happens to be > qemu-2.10.1-2.fc27 in Fedora 27). > > The new behavior is due to this commit being present in the host OS > kernel. Prior to this commit I could fire up 4 VMs and run xfstests > on my simulated NVDIMMs, but after this commit such testing results in > multiple of my VMs crashing almost immediately. > > Reproduction is very simple, at least on my development box. All you > need are a pair of VMs (I just did it with clean installs of Fedora > 27) with NVDIMMs. 
> Here's a sample QEMU command to get one of these:
>
> # qemu-system-x86_64 /home/rzwisler/vms/Fedora27.qcow2 -m
> 4G,slots=3,maxmem=512G -smp 12 -machine pc,accel=kvm,nvdimm
> -enable-kvm -object
> memory-backend-file,id=mem1,share,mem-path=/home/rzwisler/nvdimms/nvdimm-1,size=17G
> -device nvdimm,memdev=mem1,id=nv1
>
> In my setup my NVDIMMs backing files (/home/rzwisler/nvdimms/nvdimm-1)
> are being created on a filesystem on an SSD.
>
> After these two qemu guests are up, run write I/Os to the resulting
> /dev/pmem0 devices. I've done this with xfstests and fio to get the
> error, but the simplest way is just:
>
> # dd if=/dev/zero of=/dev/pmem0
>
> The double fault should happen in under a minute, definitely before
> the DDs run out of space on their /dev/pmem0 devices.
>
> I've reproduced this on multiple development boxes, so I'm pretty sure
> it's not related to a flakey hardware setup.
>

Thanks for reporting this issue. I'll look into this issue.

Haozhong

^ permalink raw reply [flat|nested] 16+ messages in thread
* Patch "KVM MMU: check pending exception before injecting APF" has been added to the 4.14-stable tree
2018-01-07 7:26 ` Ross Zwisler
(?)
(?)
@ 2018-02-13 16:13 ` gregkh
-1 siblings, 0 replies; 16+ messages in thread
From: gregkh @ 2018-02-13 16:13 UTC (permalink / raw)
To: haozhong.zhang, ab, gregkh, pbonzini, zwisler; +Cc: stable, stable-commits

This is a note to let you know that I've just added the patch titled

    KVM MMU: check pending exception before injecting APF

to the 4.14-stable tree which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
    kvm-mmu-check-pending-exception-before-injecting-apf.patch
and it can be found in the queue-4.14 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@vger.kernel.org> know about it.

From 2a266f23550be997d783f27e704b9b40c4010292 Mon Sep 17 00:00:00 2001
From: Haozhong Zhang <haozhong.zhang@intel.com>
Date: Wed, 10 Jan 2018 21:44:42 +0800
Subject: KVM MMU: check pending exception before injecting APF

From: Haozhong Zhang <haozhong.zhang@intel.com>

commit 2a266f23550be997d783f27e704b9b40c4010292 upstream.

For example, when two APF's for page ready happen after one exit and
the first one becomes pending, the second one will result in #DF.
Instead, just handle the second page fault synchronously.

Reported-by: Ross Zwisler <zwisler@gmail.com>
Message-ID: <CAOxpaSUBf8QoOZQ1p4KfUp0jq76OKfGY4Uxs-Gg8ngReD99xww@mail.gmail.com>
Reported-by: Alec Blayne <ab@tevsa.net>
Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/kvm/mmu.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -3784,7 +3784,8 @@ static int kvm_arch_setup_async_pf(struc
 bool kvm_can_do_async_pf(struct kvm_vcpu *vcpu)
 {
 	if (unlikely(!lapic_in_kernel(vcpu) ||
-		     kvm_event_needs_reinjection(vcpu)))
+		     kvm_event_needs_reinjection(vcpu) ||
+		     vcpu->arch.exception.pending))
 		return false;
 
 	if (!vcpu->arch.apf.delivery_as_pf_vmexit && is_guest_mode(vcpu))

Patches currently in stable-queue which might be from haozhong.zhang@intel.com are

queue-4.14/kvm-mmu-check-pending-exception-before-injecting-apf.patch

^ permalink raw reply [flat|nested] 16+ messages in thread
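The effect of the extra condition in the mmu.c hunk above can be illustrated with a toy predicate. This is a hedged sketch, not the kernel implementation: all `toy_*` names are hypothetical, the model assumes that the "needs reinjection" test covers only already-injected events (which is why a separate check of the pending flag is added), and it stops after the first condition of the real function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified vCPU state for the async page fault decision. */
struct toy_apf_vcpu {
	bool lapic_in_kernel;
	bool exception_pending;   /* queued, not yet injected */
	bool exception_injected;  /* already injected or to be re-injected */
	bool nmi_injected;
	bool interrupt_pending;
};

/* Assumed shape of the reinjection test: already-injected events only. */
bool toy_event_needs_reinjection(const struct toy_apf_vcpu *v)
{
	assert(v != NULL);
	return v->exception_injected || v->nmi_injected || v->interrupt_pending;
}

/* Refuse an async page fault while any event is injected or still pending. */
bool toy_can_do_async_pf(const struct toy_apf_vcpu *v)
{
	assert(v != NULL);

	if (!v->lapic_in_kernel ||
	    toy_event_needs_reinjection(v) ||
	    v->exception_pending)
		return false;

	return true;
}
```

With only `exception_pending` set, the reinjection test alone reports false, so without the added term the model would allow a second fault to be queued on top of the pending one; with it, the caller falls back to handling the fault synchronously.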
* [PATCH v3 3/4] KVM: VMX: Move the nested_vmx_inject_exception_vmexit call from nested_vmx_check_exception to vmx_queue_exception
2017-08-24 4:21 [PATCH v3 1/4] KVM: VMX: use kvm_event_needs_reinjection Wanpeng Li
2017-08-24 4:21 ` [PATCH v3 2/4] KVM: X86: Fix loss of exception which has not yet injected Wanpeng Li
@ 2017-08-24 4:21 ` Wanpeng Li
2017-08-24 4:21 ` [PATCH v3 4/4] KVM: nVMX: Fix trying to cancel vmlauch/vmresume Wanpeng Li
2 siblings, 0 replies; 16+ messages in thread
From: Wanpeng Li @ 2017-08-24 4:21 UTC (permalink / raw)
To: linux-kernel, kvm; +Cc: Paolo Bonzini, Radim Krčmář, Wanpeng Li

From: Wanpeng Li <wanpeng.li@hotmail.com>

Move the nested_vmx_inject_exception_vmexit call from
nested_vmx_check_exception to vmx_queue_exception.

Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
---
 arch/x86/kvm/vmx.c | 21 +++++++++++----------
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 902b780..21760b8 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2459,15 +2459,14 @@ static void nested_vmx_inject_exception_vmexit(struct kvm_vcpu *vcpu,
  * KVM wants to inject page-faults which it got to the guest. This function
  * checks whether in a nested guest, we need to inject them to L1 or L2.
  */
-static int nested_vmx_check_exception(struct kvm_vcpu *vcpu)
+static int nested_vmx_check_exception(struct kvm_vcpu *vcpu, unsigned long *exit_qual)
 {
 	struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
 	unsigned int nr = vcpu->arch.exception.nr;
 
 	if (nr == PF_VECTOR) {
 		if (vcpu->arch.exception.nested_apf) {
-			nested_vmx_inject_exception_vmexit(vcpu,
-							   vcpu->arch.apf.nested_apf_token);
+			*exit_qual = vcpu->arch.apf.nested_apf_token;
 			return 1;
 		}
 		/*
@@ -2481,16 +2480,15 @@ static int nested_vmx_check_exception(struct kvm_vcpu *vcpu)
 		 */
 		if (nested_vmx_is_page_fault_vmexit(vmcs12,
 						    vcpu->arch.exception.error_code)) {
-			nested_vmx_inject_exception_vmexit(vcpu, vcpu->arch.cr2);
+			*exit_qual = vcpu->arch.cr2;
 			return 1;
 		}
 	} else {
-		unsigned long exit_qual = 0;
-		if (nr == DB_VECTOR)
-			exit_qual = vcpu->arch.dr6;
-
 		if (vmcs12->exception_bitmap & (1u << nr)) {
-			nested_vmx_inject_exception_vmexit(vcpu, exit_qual);
+			if (nr == DB_VECTOR)
+				*exit_qual = vcpu->arch.dr6;
+			else
+				*exit_qual = 0;
 			return 1;
 		}
 	}
@@ -2506,10 +2504,13 @@ static void vmx_queue_exception(struct kvm_vcpu *vcpu)
 	bool reinject = vcpu->arch.exception.injected;
 	u32 error_code = vcpu->arch.exception.error_code;
 	u32 intr_info = nr | INTR_INFO_VALID_MASK;
+	unsigned long exit_qual;
 
 	if (!reinject && is_guest_mode(vcpu) &&
-	    nested_vmx_check_exception(vcpu))
+	    nested_vmx_check_exception(vcpu, &exit_qual)) {
+		nested_vmx_inject_exception_vmexit(vcpu, exit_qual);
 		return;
+	}
 
 	if (has_error_code) {
 		vmcs_write32(VM_ENTRY_EXCEPTION_ERROR_CODE, error_code);
-- 
2.7.4

^ permalink raw reply related [flat|nested] 16+ messages in thread
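The refactoring in patch 3/4 turns the check into a pure query that reports the exit qualification through an out-parameter and leaves the side effect to the caller. A minimal standalone sketch of that shape follows; it is not the kernel code. `struct toy_exception`, `toy_check_exception()` and the `l1_wants_page_fault` field (standing in for nested_vmx_is_page_fault_vmexit()) are hypothetical, while the vector numbers 1 (#DB) and 14 (#PF) are the architectural x86 values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { TOY_DB_VECTOR = 1, TOY_PF_VECTOR = 14 };

/* Hypothetical, flattened view of the state the check looks at. */
struct toy_exception {
	unsigned int nr;                /* exception vector, 0..31 */
	bool nested_apf;                /* async page fault meant for L1 */
	unsigned long apf_token;
	unsigned long cr2;
	unsigned long dr6;
	bool l1_wants_page_fault;       /* stand-in for the page-fault filter */
	unsigned int exception_bitmap;  /* L1's intercept bitmap */
};

/*
 * Return 1 and fill *exit_qual if L1 should see the exception,
 * 0 otherwise. The function has no side effects beyond *exit_qual.
 */
int toy_check_exception(const struct toy_exception *e, unsigned long *exit_qual)
{
	assert(e != NULL && exit_qual != NULL && e->nr < 32);

	if (e->nr == TOY_PF_VECTOR) {
		if (e->nested_apf) {
			*exit_qual = e->apf_token;
			return 1;
		}
		if (e->l1_wants_page_fault) {
			*exit_qual = e->cr2;
			return 1;
		}
	} else if (e->exception_bitmap & (1u << e->nr)) {
		*exit_qual = (e->nr == TOY_DB_VECTOR) ? e->dr6 : 0;
		return 1;
	}

	return 0;
}
```

Because the query no longer triggers the VM exit itself, a caller is free to look at the result first and decide later whether to act on it, which is what patch 4/4 relies on.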
* [PATCH v3 4/4] KVM: nVMX: Fix trying to cancel vmlauch/vmresume
2017-08-24 4:21 [PATCH v3 1/4] KVM: VMX: use kvm_event_needs_reinjection Wanpeng Li
2017-08-24 4:21 ` [PATCH v3 2/4] KVM: X86: Fix loss of exception which has not yet injected Wanpeng Li
2017-08-24 4:21 ` [PATCH v3 3/4] KVM: VMX: Move the nested_vmx_inject_exception_vmexit call from nested_vmx_check_exception to vmx_queue_exception Wanpeng Li
@ 2017-08-24 4:21 ` Wanpeng Li
2 siblings, 0 replies; 16+ messages in thread
From: Wanpeng Li @ 2017-08-24 4:21 UTC (permalink / raw)
To: linux-kernel, kvm; +Cc: Paolo Bonzini, Radim Krčmář, Wanpeng Li

From: Wanpeng Li <wanpeng.li@hotmail.com>

------------[ cut here ]------------
WARNING: CPU: 7 PID: 3861 at /home/kernel/ssd/kvm/arch/x86/kvm//vmx.c:11299 nested_vmx_vmexit+0x176e/0x1980 [kvm_intel]
CPU: 7 PID: 3861 Comm: qemu-system-x86 Tainted: G W OE 4.13.0-rc4+ #11
RIP: 0010:nested_vmx_vmexit+0x176e/0x1980 [kvm_intel]
Call Trace:
 ? kvm_multiple_exception+0x149/0x170 [kvm]
 ? handle_emulation_failure+0x79/0x230 [kvm]
 ? load_vmcs12_host_state+0xa80/0xa80 [kvm_intel]
 ? check_chain_key+0x137/0x1e0
 ? reexecute_instruction.part.168+0x130/0x130 [kvm]
 nested_vmx_inject_exception_vmexit+0xb7/0x100 [kvm_intel]
 ? nested_vmx_inject_exception_vmexit+0xb7/0x100 [kvm_intel]
 vmx_queue_exception+0x197/0x300 [kvm_intel]
 kvm_arch_vcpu_ioctl_run+0x1b0c/0x2c90 [kvm]
 ? kvm_arch_vcpu_runnable+0x220/0x220 [kvm]
 ? preempt_count_sub+0x18/0xc0
 ? restart_apic_timer+0x17d/0x300 [kvm]
 ? kvm_lapic_restart_hv_timer+0x37/0x50 [kvm]
 ? kvm_arch_vcpu_load+0x1d8/0x350 [kvm]
 kvm_vcpu_ioctl+0x4e4/0x910 [kvm]
 ? kvm_vcpu_ioctl+0x4e4/0x910 [kvm]
 ? kvm_dev_ioctl+0xbe0/0xbe0 [kvm]

The flag "nested_run_pending" can override the decision of which should
run next, L1 or L2. nested_run_pending=1 means that we *must* run L2
next, not L1. This is necessary in particular when L1 did a VMLAUNCH of
L2 and therefore expects L2 to be run (and perhaps be injected with an
event it specified, etc.). nested_run_pending is especially intended to
avoid switching to L1 at the injection decision point.

I caught this warning in the queue-exception path. This patch fixes it
by requesting an immediate VM exit from L2 and keeping the exception
for L1 pending until a subsequent nested VM exit.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
---
 arch/x86/kvm/vmx.c | 18 ++++++++++--------
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 21760b8..6f88a79 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2501,16 +2501,8 @@ static void vmx_queue_exception(struct kvm_vcpu *vcpu)
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 	unsigned nr = vcpu->arch.exception.nr;
 	bool has_error_code = vcpu->arch.exception.has_error_code;
-	bool reinject = vcpu->arch.exception.injected;
 	u32 error_code = vcpu->arch.exception.error_code;
 	u32 intr_info = nr | INTR_INFO_VALID_MASK;
-	unsigned long exit_qual;
-
-	if (!reinject && is_guest_mode(vcpu) &&
-	    nested_vmx_check_exception(vcpu, &exit_qual)) {
-		nested_vmx_inject_exception_vmexit(vcpu, exit_qual);
-		return;
-	}
 
 	if (has_error_code) {
 		vmcs_write32(VM_ENTRY_EXCEPTION_ERROR_CODE, error_code);
@@ -10988,10 +10980,20 @@ static void vmcs12_save_pending_event(struct kvm_vcpu *vcpu,
 static int vmx_check_nested_events(struct kvm_vcpu *vcpu, bool external_intr)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
+	unsigned long exit_qual;
 
 	if (kvm_event_needs_reinjection(vcpu))
 		return -EBUSY;
 
+	if (vcpu->arch.exception.pending &&
+		nested_vmx_check_exception(vcpu, &exit_qual)) {
+		if (vmx->nested.nested_run_pending)
+			return -EBUSY;
+
+		nested_vmx_inject_exception_vmexit(vcpu, exit_qual);
+		return 0;
+	}
+
 	if (nested_cpu_has_preemption_timer(get_vmcs12(vcpu)) &&
 	    vmx->nested.preemption_timer_expired) {
 		if (vmx->nested.nested_run_pending)
--
2.7.4

^ permalink raw reply related [flat|nested] 16+ messages in thread
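Editorial note on patch 4/4: the new branch in vmx_check_nested_events()
boils down to a three-way decision. The sketch below models only that
decision in userspace, under stated assumptions: struct sketch_nested,
its reflect_to_l1 flag (standing in for the result of
nested_vmx_check_exception()) and the vmexits_to_l1 counter (standing
in for the call to nested_vmx_inject_exception_vmexit()) are
hypothetical and do not exist in KVM.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of the state the new branch consults. */
struct sketch_nested {
	bool needs_reinjection;  /* an already-injected event must be replayed */
	bool exception_pending;  /* an exception is queued but not yet injected */
	bool reflect_to_l1;      /* stand-in for nested_vmx_check_exception() */
	bool nested_run_pending; /* L1 did VMLAUNCH/VMRESUME; L2 must run next */
	int vmexits_to_l1;       /* counts simulated nested VM exits */
};

/*
 * Returns -EBUSY when the switch to L1 has to be postponed, and 0
 * otherwise. A pending exception destined for L1 is only turned into a
 * nested VM exit once nested_run_pending is clear; until then it stays
 * queued.
 */
static int sketch_check_nested_events(struct sketch_nested *s)
{
	assert(s != NULL);

	if (s->needs_reinjection)
		return -EBUSY;

	if (s->exception_pending && s->reflect_to_l1) {
		if (s->nested_run_pending)
			return -EBUSY;	/* keep the exception pending for later */

		s->vmexits_to_l1++;	/* simulated exit from L2 to L1 */
		return 0;
	}

	return 0;
}
```

In this model a first call with nested_run_pending set returns -EBUSY
and leaves the exception queued; a later call with the flag cleared
performs the simulated exit, which matches the deferral described in
the commit message.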
end of thread, other threads:[~2018-02-13 16:13 UTC | newest]
Thread overview: 16+ messages
2017-08-24 4:21 [PATCH v3 1/4] KVM: VMX: use kvm_event_needs_reinjection Wanpeng Li
2017-08-24 4:21 ` [PATCH v3 2/4] KVM: X86: Fix loss of exception which has not yet injected Wanpeng Li
2017-08-24 6:52 ` Wanpeng Li
2017-08-24 8:57 ` Paolo Bonzini
2017-08-24 9:13 ` Wanpeng Li
2017-08-24 9:34 ` Wanpeng Li
2017-08-24 9:47 ` Paolo Bonzini
2017-08-24 9:35 ` Paolo Bonzini
2017-08-24 9:47 ` Wanpeng Li
[not found] ` <1503548506-4457-2-git-send-email-wanpeng.li-PkbjNfxxIARBDgjK7y7TUQ@public.gmane.org>
2018-01-07 7:26 ` Ross Zwisler
2018-01-07 7:26 ` Ross Zwisler
[not found] ` <CAOxpaSUBf8QoOZQ1p4KfUp0jq76OKfGY4Uxs-Gg8ngReD99xww-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2018-01-09 1:24 ` Haozhong Zhang
2018-01-09 1:24 ` Haozhong Zhang
2018-02-13 16:13 ` Patch "KVM MMU: check pending exception before injecting APF" has been added to the 4.14-stable tree gregkh
2017-08-24 4:21 ` [PATCH v3 3/4] KVM: VMX: Move the nested_vmx_inject_exception_vmexit call from nested_vmx_check_exception to vmx_queue_exception Wanpeng Li
2017-08-24 4:21 ` [PATCH v3 4/4] KVM: nVMX: Fix trying to cancel vmlauch/vmresume Wanpeng Li