Re: [PATCH] KVM: nVMX: Fix bug of injecting L2 exception into L1

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Liran Alon <LIRAN.ALON@ORACLE.COM>
To: Paolo Bonzini <pbonzini@redhat.com>,
	rkrcmar@redhat.com, kvm@vger.kernel.org
Cc: jmattson@google.com, wanpeng.li@hotmail.com,
	idan.brown@ORACLE.COM,
	Krish Sadhukhan <krish.sadhukhan@ORACLE.COM>
Subject: Re: [PATCH] KVM: nVMX: Fix bug of injecting L2 exception into L1
Date: Tue, 21 Nov 2017 00:46:01 +0200	[thread overview]
Message-ID: <5A135B29.9080805@ORACLE.COM> (raw)
In-Reply-To: <2d56b269-0d21-10a3-80ce-19e989d6903b@redhat.com>



On 20/11/17 23:47, Paolo Bonzini wrote:
> On 19/11/2017 17:25, Liran Alon wrote:
>> L2 RDMSR could be handled as described below:
>> 1) L2 exits to L0 on RDMSR and calls handle_rdmsr().
>> 2) handle_rdmsr() calls kvm_inject_gp() which sets
>> KVM_REQ_EVENT, exception.pending=true and exception.injected=false.
>> 3) vcpu_enter_guest() consumes KVM_REQ_EVENT and calls
>> inject_pending_event() which calls vmx_check_nested_events()
>> which sees that exception.pending=true but
>> nested_vmx_check_exception() returns 0 and therefore does nothing at
>> this point. However let's assume it later sees vmx-preemption-timer
>> expired and therefore exits from L2 to L1 by calling
>> nested_vmx_vmexit().> 4) nested_vmx_vmexit() calls prepare_vmcs12()
>> which calls vmcs12_save_pending_event() but it does nothing as
>> exception.injected is false. Also prepare_vmcs12() calls
>> kvm_clear_exception_queue() which does nothing as
>> exception.injected is already false.
>> 5) We now return from vmx_check_nested_events() with 0 while still
>> having exception.pending=true!
>> 6) Therefore inject_pending_event() continues
>> and we inject L2 exception to L1!...
>
> But this is buggy as well, because the #GP is lost, isn't it?

I don't think so.

Since commit 664f8e26b00c ("KVM: X86: Fix loss of exception which has 
not yet been injected"), there is a fundamental difference between a 
pending exception and an injected exception.
A pending exception means that no side-effects of the exception have 
been applied yet. Including incrementing the RIP after the instruction 
which cause exception. In our case for example, handle_wrmsr() calls 
kvm_inject_gp() and returns without calling 
kvm_skip_emulated_instruction() which increments the RIP.

Therefore, when we exit from L2 to L1 on vmx-preemption-timer, we can 
safely clear exception.pending because when L1 will resume L2, the 
exception will be raised again (the same WRMSR instruction will be run 
again which will raise #GP again).
This is also why vmcs12_save_pending_event() only makes sure to save in 
VMCS12 idt-vectoring-info the "injected" events and not the "pending" 
events (interrupt.pending is misleading name and I would rename it in 
upcoming patch to interrupt.injected. See explanation below). And this 
is also why exception.pending is intercepted but exception.injected is not.

I can confirm this patch works because I have wrote a kvm-unit-test 
which reproduce this issue. And after the fix the #GP is not lost and 
raised to L2 directly correctly.
(I haven't posted the unit-test yet because it is very dependent on 
correct vmx-preemption-timer timer config that varies between environments).

>
> Is the bug that the preemption timer vmexit should only be injected if
> there are no pending exceptions?  In fact, the same bug could also
> happened for NMIs or interrupts, I think.
As explained above, I don't think so.

In general there is a bit of mess in KVM code regarding clean separation 
between a "pending" event and an "injected" event. NMIs & Exceptions are 
now separated nicely. However, interrupt.pending is actually 
interrupt.injected as it is signaled after the side-effects have 
occurred (bit moved from IRR to ISR for example).

I am going to post another series (which is a v2 of a previous series I 
posted here) tomorrow that will attempt to clean this and on the way fix 
a couple of bugs in this area.

>
> So, something like (101% untested):
>
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 5b436f4e6e3e..64eecb8b126d 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -11137,8 +11137,9 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu, bool external_intr)
>   	bool block_nested_events =
>   	    vmx->nested.nested_run_pending || kvm_event_needs_reinjection(vcpu);
>
> -	if (vcpu->arch.exception.pending &&
> -		nested_vmx_check_exception(vcpu, &exit_qual)) {
> +	if (vcpu->arch.exception.pending) {
> +		if (!nested_vmx_check_exception(vcpu, &exit_qual))
> +			return 0;
>   		if (block_nested_events)
>   			return -EBUSY;
>   		nested_vmx_inject_exception_vmexit(vcpu, exit_qual);
> @@ -11146,15 +11147,9 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu, bool external_intr)
>   		return 0;
>   	}
>
> -	if (nested_cpu_has_preemption_timer(get_vmcs12(vcpu)) &&
> -	    vmx->nested.preemption_timer_expired) {
> -		if (block_nested_events)
> -			return -EBUSY;
> -		nested_vmx_vmexit(vcpu, EXIT_REASON_PREEMPTION_TIMER, 0, 0);
> -		return 0;
> -	}
> -
> -	if (vcpu->arch.nmi_pending && nested_exit_on_nmi(vcpu)) {
> +	if (vcpu->arch.nmi_pending) {
> +		if (!nested_exit_on_nmi(vcpu))
> +			return 0;
>   		if (block_nested_events)
>   			return -EBUSY;
>   		nested_vmx_vmexit(vcpu, EXIT_REASON_EXCEPTION_NMI,
> @@ -11169,14 +11164,23 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu, bool external_intr)
>   		return 0;
>   	}
>
> -	if ((kvm_cpu_has_interrupt(vcpu) || external_intr) &&
> -	    nested_exit_on_intr(vcpu)) {
> +	if (kvm_cpu_has_interrupt(vcpu) || external_intr) {
> +		if (!nested_exit_on_intr(vcpu))
> +			return 0;
>   		if (block_nested_events)
>   			return -EBUSY;
>   		nested_vmx_vmexit(vcpu, EXIT_REASON_EXTERNAL_INTERRUPT, 0, 0);
>   		return 0;
>   	}
>
> +	if (nested_cpu_has_preemption_timer(get_vmcs12(vcpu)) &&
> +	    vmx->nested.preemption_timer_expired) {
> +		if (block_nested_events)
> +			return -EBUSY;
> +		nested_vmx_vmexit(vcpu, EXIT_REASON_PREEMPTION_TIMER, 0, 0);
> +		return 0;
> +	}
> +
>   	vmx_complete_nested_posted_interrupt(vcpu);
>   	return 0;
>   }
>
> Paolo
>
>> This commit will fix above issue by changing step (4) to
>> clear exception.pending in kvm_clear_exception_queue().
>>
>> Fixes: 664f8e26b00c ("KVM: X86: Fix loss of exception which has not
>> yet been injected")
>>
>> Signed-off-by: Liran Alon <liran.alon@oracle.com>
>> Reviewed-by: Nikita Leshenko <nikita.leshchenko@oracle.com>
>> Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
>> Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
>> ---
>>   arch/x86/kvm/vmx.c | 1 -
>>   arch/x86/kvm/x86.h | 1 +
>>   2 files changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>> index 7c3522a989d0..bee08782c781 100644
>> --- a/arch/x86/kvm/vmx.c
>> +++ b/arch/x86/kvm/vmx.c
>> @@ -11035,7 +11035,6 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu, bool external_intr)
>>   		if (vmx->nested.nested_run_pending)
>>   			return -EBUSY;
>>   		nested_vmx_inject_exception_vmexit(vcpu, exit_qual);
>> -		vcpu->arch.exception.pending = false;
>>   		return 0;
>>   	}
>>
>> diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
>> index d0b95b7a90b4..6d112d8f799c 100644
>> --- a/arch/x86/kvm/x86.h
>> +++ b/arch/x86/kvm/x86.h
>> @@ -12,6 +12,7 @@
>>
>>   static inline void kvm_clear_exception_queue(struct kvm_vcpu *vcpu)
>>   {
>> +	vcpu->arch.exception.pending = false;
>>   	vcpu->arch.exception.injected = false;
>>   }
>>
>>
>
> Should kvm_clear_interrupt_queue do the same?
interrupts currently only have interrupt.pending (which as I said above, 
it is actually interrupt.injected). Therefore 
kvm_clear_interrupt_queue() clears all there is...
>
> Thanks,
>
> Paolo
>

Regards,
-Liran

next prev parent reply	other threads:[~2017-11-20 22:46 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-11-19 16:25 [PATCH] KVM: nVMX: Fix bug of injecting L2 exception into L1 Liran Alon
2017-11-20 21:47 ` Paolo Bonzini
2017-11-20 22:46   ` Liran Alon [this message]
2017-11-20 22:58     ` Paolo Bonzini
2018-01-09 10:08 ` Paolo Bonzini
  -- strict thread matches above, loose matches on Subject: below --
2017-12-12  0:06 Liran Alon

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5A135B29.9080805@ORACLE.COM \
    --to=liran.alon@oracle.com \
    --cc=idan.brown@ORACLE.COM \
    --cc=jmattson@google.com \
    --cc=krish.sadhukhan@ORACLE.COM \
    --cc=kvm@vger.kernel.org \
    --cc=pbonzini@redhat.com \
    --cc=rkrcmar@redhat.com \
    --cc=wanpeng.li@hotmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.