From mboxrd@z Thu Jan 1 00:00:00 1970 From: Liran Alon Subject: Re: [PATCH] KVM: nVMX: Fix bug of injecting L2 exception into L1 Date: Tue, 21 Nov 2017 00:46:01 +0200 Message-ID: <5A135B29.9080805@ORACLE.COM> References: <1511108743-11752-1-git-send-email-liran.alon@oracle.com> <2d56b269-0d21-10a3-80ce-19e989d6903b@redhat.com> Mime-Version: 1.0 Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit Cc: jmattson@google.com, wanpeng.li@hotmail.com, idan.brown@ORACLE.COM, Krish Sadhukhan To: Paolo Bonzini , rkrcmar@redhat.com, kvm@vger.kernel.org Return-path: Received: from userp1040.oracle.com ([156.151.31.81]:43515 "EHLO userp1040.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751149AbdKTWqL (ORCPT ); Mon, 20 Nov 2017 17:46:11 -0500 In-Reply-To: <2d56b269-0d21-10a3-80ce-19e989d6903b@redhat.com> Sender: kvm-owner@vger.kernel.org List-ID: On 20/11/17 23:47, Paolo Bonzini wrote: > On 19/11/2017 17:25, Liran Alon wrote: >> L2 RDMSR could be handled as described below: >> 1) L2 exits to L0 on RDMSR and calls handle_rdmsr(). >> 2) handle_rdmsr() calls kvm_inject_gp() which sets >> KVM_REQ_EVENT, exception.pending=true and exception.injected=false. >> 3) vcpu_enter_guest() consumes KVM_REQ_EVENT and calls >> inject_pending_event() which calls vmx_check_nested_events() >> which sees that exception.pending=true but >> nested_vmx_check_exception() returns 0 and therefore does nothing at >> this point. However let's assume it later sees vmx-preemption-timer >> expired and therefore exits from L2 to L1 by calling >> nested_vmx_vmexit().> 4) nested_vmx_vmexit() calls prepare_vmcs12() >> which calls vmcs12_save_pending_event() but it does nothing as >> exception.injected is false. Also prepare_vmcs12() calls >> kvm_clear_exception_queue() which does nothing as >> exception.injected is already false. >> 5) We now return from vmx_check_nested_events() with 0 while still >> having exception.pending=true! >> 6) Therefore inject_pending_event() continues >> and we inject L2 exception to L1!... > > But this is buggy as well, because the #GP is lost, isn't it? I don't think so. Since commit 664f8e26b00c ("KVM: X86: Fix loss of exception which has not yet been injected"), there is a fundamental difference between a pending exception and an injected exception. A pending exception means that no side-effects of the exception have been applied yet. Including incrementing the RIP after the instruction which cause exception. In our case for example, handle_wrmsr() calls kvm_inject_gp() and returns without calling kvm_skip_emulated_instruction() which increments the RIP. Therefore, when we exit from L2 to L1 on vmx-preemption-timer, we can safely clear exception.pending because when L1 will resume L2, the exception will be raised again (the same WRMSR instruction will be run again which will raise #GP again). This is also why vmcs12_save_pending_event() only makes sure to save in VMCS12 idt-vectoring-info the "injected" events and not the "pending" events (interrupt.pending is misleading name and I would rename it in upcoming patch to interrupt.injected. See explanation below). And this is also why exception.pending is intercepted but exception.injected is not. I can confirm this patch works because I have wrote a kvm-unit-test which reproduce this issue. And after the fix the #GP is not lost and raised to L2 directly correctly. (I haven't posted the unit-test yet because it is very dependent on correct vmx-preemption-timer timer config that varies between environments). > > Is the bug that the preemption timer vmexit should only be injected if > there are no pending exceptions? In fact, the same bug could also > happened for NMIs or interrupts, I think. As explained above, I don't think so. In general there is a bit of mess in KVM code regarding clean separation between a "pending" event and an "injected" event. NMIs & Exceptions are now separated nicely. However, interrupt.pending is actually interrupt.injected as it is signaled after the side-effects have occurred (bit moved from IRR to ISR for example). I am going to post another series (which is a v2 of a previous series I posted here) tomorrow that will attempt to clean this and on the way fix a couple of bugs in this area. > > So, something like (101% untested): > > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c > index 5b436f4e6e3e..64eecb8b126d 100644 > --- a/arch/x86/kvm/vmx.c > +++ b/arch/x86/kvm/vmx.c > @@ -11137,8 +11137,9 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu, bool external_intr) > bool block_nested_events = > vmx->nested.nested_run_pending || kvm_event_needs_reinjection(vcpu); > > - if (vcpu->arch.exception.pending && > - nested_vmx_check_exception(vcpu, &exit_qual)) { > + if (vcpu->arch.exception.pending) { > + if (!nested_vmx_check_exception(vcpu, &exit_qual)) > + return 0; > if (block_nested_events) > return -EBUSY; > nested_vmx_inject_exception_vmexit(vcpu, exit_qual); > @@ -11146,15 +11147,9 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu, bool external_intr) > return 0; > } > > - if (nested_cpu_has_preemption_timer(get_vmcs12(vcpu)) && > - vmx->nested.preemption_timer_expired) { > - if (block_nested_events) > - return -EBUSY; > - nested_vmx_vmexit(vcpu, EXIT_REASON_PREEMPTION_TIMER, 0, 0); > - return 0; > - } > - > - if (vcpu->arch.nmi_pending && nested_exit_on_nmi(vcpu)) { > + if (vcpu->arch.nmi_pending) { > + if (!nested_exit_on_nmi(vcpu)) > + return 0; > if (block_nested_events) > return -EBUSY; > nested_vmx_vmexit(vcpu, EXIT_REASON_EXCEPTION_NMI, > @@ -11169,14 +11164,23 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu, bool external_intr) > return 0; > } > > - if ((kvm_cpu_has_interrupt(vcpu) || external_intr) && > - nested_exit_on_intr(vcpu)) { > + if (kvm_cpu_has_interrupt(vcpu) || external_intr) { > + if (!nested_exit_on_intr(vcpu)) > + return 0; > if (block_nested_events) > return -EBUSY; > nested_vmx_vmexit(vcpu, EXIT_REASON_EXTERNAL_INTERRUPT, 0, 0); > return 0; > } > > + if (nested_cpu_has_preemption_timer(get_vmcs12(vcpu)) && > + vmx->nested.preemption_timer_expired) { > + if (block_nested_events) > + return -EBUSY; > + nested_vmx_vmexit(vcpu, EXIT_REASON_PREEMPTION_TIMER, 0, 0); > + return 0; > + } > + > vmx_complete_nested_posted_interrupt(vcpu); > return 0; > } > > Paolo > >> This commit will fix above issue by changing step (4) to >> clear exception.pending in kvm_clear_exception_queue(). >> >> Fixes: 664f8e26b00c ("KVM: X86: Fix loss of exception which has not >> yet been injected") >> >> Signed-off-by: Liran Alon >> Reviewed-by: Nikita Leshenko >> Reviewed-by: Krish Sadhukhan >> Signed-off-by: Krish Sadhukhan >> --- >> arch/x86/kvm/vmx.c | 1 - >> arch/x86/kvm/x86.h | 1 + >> 2 files changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c >> index 7c3522a989d0..bee08782c781 100644 >> --- a/arch/x86/kvm/vmx.c >> +++ b/arch/x86/kvm/vmx.c >> @@ -11035,7 +11035,6 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu, bool external_intr) >> if (vmx->nested.nested_run_pending) >> return -EBUSY; >> nested_vmx_inject_exception_vmexit(vcpu, exit_qual); >> - vcpu->arch.exception.pending = false; >> return 0; >> } >> >> diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h >> index d0b95b7a90b4..6d112d8f799c 100644 >> --- a/arch/x86/kvm/x86.h >> +++ b/arch/x86/kvm/x86.h >> @@ -12,6 +12,7 @@ >> >> static inline void kvm_clear_exception_queue(struct kvm_vcpu *vcpu) >> { >> + vcpu->arch.exception.pending = false; >> vcpu->arch.exception.injected = false; >> } >> >> > > Should kvm_clear_interrupt_queue do the same? interrupts currently only have interrupt.pending (which as I said above, it is actually interrupt.injected). Therefore kvm_clear_interrupt_queue() clears all there is... > > Thanks, > > Paolo > Regards, -Liran