[PATCH] kvm: x86: do not leak guest xcr0 into host interrupt handlers

linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH] kvm: x86: do not leak guest xcr0 into host interrupt handlers
@ 2016-03-30 19:24 David Matlack
  2016-03-31  8:05 ` Paolo Bonzini
                   ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: David Matlack @ 2016-03-30 19:24 UTC (permalink / raw)
  To: kvm; +Cc: pbonzini, linux-kernel, luto, David Matlack, stable

An interrupt handler that uses the fpu can kill a KVM VM, if it runs
under the following conditions:
 - the guest's xcr0 register is loaded on the cpu
 - the guest's fpu context is not loaded
 - the host is using eagerfpu

Note that the guest's xcr0 register and fpu context are not loaded as
part of the atomic world switch into "guest mode". They are loaded by
KVM while the cpu is still in "host mode".

Usage of the fpu in interrupt context is gated by irq_fpu_usable(). The
interrupt handler will look something like this:

if (irq_fpu_usable()) {
        kernel_fpu_begin();

        [... code that uses the fpu ...]

        kernel_fpu_end();
}

As long as the guest's fpu is not loaded and the host is using eager
fpu, irq_fpu_usable() returns true (interrupted_kernel_fpu_idle()
returns true). The interrupt handler proceeds to use the fpu with
the guest's xcr0 live.

kernel_fpu_begin() saves the current fpu context. If this uses
XSAVE[OPT], it may leave the xsave area in an undesirable state.
According to the SDM, during XSAVE bit i of XSTATE_BV is not modified
if bit i is 0 in xcr0. So it's possible that XSTATE_BV[i] == 1 and
xcr0[i] == 0 following an XSAVE.

kernel_fpu_end() restores the fpu context. Now if any bit i in
XSTATE_BV == 1 while xcr0[i] == 0, XRSTOR generates a #GP. The
fault is trapped and SIGSEGV is delivered to the current process.

Only pre-4.2 kernels appear to be vulnerable to this sequence of
events. Commit 653f52c ("kvm,x86: load guest FPU context more eagerly")
from 4.2 forces the guest's fpu to always be loaded on eagerfpu hosts.

This patch fixes the bug by keeping the host's xcr0 loaded outside
of the interrupts-disabled region where KVM switches into guest mode.

Cc: stable@vger.kernel.org
Suggested-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: David Matlack <dmatlack@google.com>
---
 arch/x86/kvm/x86.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index e260ccb..8df1167 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -700,7 +700,6 @@ static int __kvm_set_xcr(struct kvm_vcpu *vcpu, u32 index, u64 xcr)
 		if ((xcr0 & XFEATURE_MASK_AVX512) != XFEATURE_MASK_AVX512)
 			return 1;
 	}
-	kvm_put_guest_xcr0(vcpu);
 	vcpu->arch.xcr0 = xcr0;

 	if ((xcr0 ^ old_xcr0) & XFEATURE_MASK_EXTEND)
@@ -6590,8 +6589,6 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 	kvm_x86_ops->prepare_guest_switch(vcpu);
 	if (vcpu->fpu_active)
 		kvm_load_guest_fpu(vcpu);
-	kvm_load_guest_xcr0(vcpu);
-
 	vcpu->mode = IN_GUEST_MODE;

 	srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx);
@@ -6607,6 +6604,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)

 	local_irq_disable();

+	kvm_load_guest_xcr0(vcpu);
+
 	if (vcpu->mode == EXITING_GUEST_MODE || vcpu->requests
 	    || need_resched() || signal_pending(current)) {
 		vcpu->mode = OUTSIDE_GUEST_MODE;
@@ -6667,6 +6666,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 	vcpu->mode = OUTSIDE_GUEST_MODE;
 	smp_wmb();

+	kvm_put_guest_xcr0(vcpu);
+
 	/* Interrupt is enabled by handle_external_intr() */
 	kvm_x86_ops->handle_external_intr(vcpu);

@@ -7314,7 +7315,6 @@ void kvm_load_guest_fpu(struct kvm_vcpu *vcpu)
 	 * and assume host would use all available bits.
 	 * Guest xcr0 would be loaded later.
 	 */
-	kvm_put_guest_xcr0(vcpu);
 	vcpu->guest_fpu_loaded = 1;
 	__kernel_fpu_begin();
 	__copy_kernel_to_fpregs(&vcpu->arch.guest_fpu.state);
@@ -7323,8 +7323,6 @@ void kvm_load_guest_fpu(struct kvm_vcpu *vcpu)

 void kvm_put_guest_fpu(struct kvm_vcpu *vcpu)
 {
-	kvm_put_guest_xcr0(vcpu);
-
 	if (!vcpu->guest_fpu_loaded) {
 		vcpu->fpu_counter = 0;
 		return;
-- 
2.8.0.rc3.226.g39d4020

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH] kvm: x86: do not leak guest xcr0 into host interrupt handlers
  2016-03-30 19:24 [PATCH] kvm: x86: do not leak guest xcr0 into host interrupt handlers David Matlack
@ 2016-03-31  8:05 ` Paolo Bonzini
  2016-04-05 11:28 ` Paolo Bonzini
  2016-04-22  7:30 ` Wanpeng Li
  2 siblings, 0 replies; 14+ messages in thread
From: Paolo Bonzini @ 2016-03-31  8:05 UTC (permalink / raw)
  To: David Matlack, kvm; +Cc: linux-kernel, luto, stable



On 30/03/2016 21:24, David Matlack wrote:
> An interrupt handler that uses the fpu can kill a KVM VM, if it runs
> under the following conditions:
>  - the guest's xcr0 register is loaded on the cpu
>  - the guest's fpu context is not loaded
>  - the host is using eagerfpu
> 
> Note that the guest's xcr0 register and fpu context are not loaded as
> part of the atomic world switch into "guest mode". They are loaded by
> KVM while the cpu is still in "host mode".
> 
> Usage of the fpu in interrupt context is gated by irq_fpu_usable(). The
> interrupt handler will look something like this:
> 
> if (irq_fpu_usable()) {
>         kernel_fpu_begin();
> 
>         [... code that uses the fpu ...]
> 
>         kernel_fpu_end();
> }
> 
> As long as the guest's fpu is not loaded and the host is using eager
> fpu, irq_fpu_usable() returns true (interrupted_kernel_fpu_idle()
> returns true). The interrupt handler proceeds to use the fpu with
> the guest's xcr0 live.
> 
> kernel_fpu_begin() saves the current fpu context. If this uses
> XSAVE[OPT], it may leave the xsave area in an undesirable state.
> According to the SDM, during XSAVE bit i of XSTATE_BV is not modified
> if bit i is 0 in xcr0. So it's possible that XSTATE_BV[i] == 1 and
> xcr0[i] == 0 following an XSAVE.
> 
> kernel_fpu_end() restores the fpu context. Now if any bit i in
> XSTATE_BV == 1 while xcr0[i] == 0, XRSTOR generates a #GP. The
> fault is trapped and SIGSEGV is delivered to the current process.
> 
> Only pre-4.2 kernels appear to be vulnerable to this sequence of
> events. Commit 653f52c ("kvm,x86: load guest FPU context more eagerly")
> from 4.2 forces the guest's fpu to always be loaded on eagerfpu hosts.
> 
> This patch fixes the bug by keeping the host's xcr0 loaded outside
> of the interrupts-disabled region where KVM switches into guest mode.
> 
> Cc: stable@vger.kernel.org
> Suggested-by: Andy Lutomirski <luto@amacapital.net>
> Signed-off-by: David Matlack <dmatlack@google.com>
> ---
>  arch/x86/kvm/x86.c | 10 ++++------
>  1 file changed, 4 insertions(+), 6 deletions(-)

On guest entry we get:

> @@ -6590,8 +6589,6 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>  	kvm_x86_ops->prepare_guest_switch(vcpu);
>  	if (vcpu->fpu_active)
>  		kvm_load_guest_fpu(vcpu);

One fewer kvm_put_guest_xcr0, at least in eager mode.

> -	kvm_load_guest_xcr0(vcpu);
> -

One fewer kvm_load_guest_xcr0.

>  	vcpu->mode = IN_GUEST_MODE;
>  
>  	srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx);
> @@ -6607,6 +6604,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>  
>  	local_irq_disable();
>  
> +	kvm_load_guest_xcr0(vcpu);

One more kvm_load_guest_xcr0.

>  	if (vcpu->mode == EXITING_GUEST_MODE || vcpu->requests
>  	    || need_resched() || signal_pending(current)) {
>  		vcpu->mode = OUTSIDE_GUEST_MODE;
> @@ -6667,6 +6666,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>  	vcpu->mode = OUTSIDE_GUEST_MODE;
>  	smp_wmb();
>  
> +	kvm_put_guest_xcr0(vcpu);

One more kvm_put_guest_xcr0.

So everything balances out.  Considering that the logic is cleaner, I
can apply this to all released kernels.

Paolo

>  	/* Interrupt is enabled by handle_external_intr() */
>  	kvm_x86_ops->handle_external_intr(vcpu);
>  
> @@ -7314,7 +7315,6 @@ void kvm_load_guest_fpu(struct kvm_vcpu *vcpu)
>  	 * and assume host would use all available bits.
>  	 * Guest xcr0 would be loaded later.
>  	 */
> -	kvm_put_guest_xcr0(vcpu);
>  	vcpu->guest_fpu_loaded = 1;
>  	__kernel_fpu_begin();
>  	__copy_kernel_to_fpregs(&vcpu->arch.guest_fpu.state);
> @@ -7323,8 +7323,6 @@ void kvm_load_guest_fpu(struct kvm_vcpu *vcpu)
>  
>  void kvm_put_guest_fpu(struct kvm_vcpu *vcpu)
>  {
> -	kvm_put_guest_xcr0(vcpu);
> -
>  	if (!vcpu->guest_fpu_loaded) {
>  		vcpu->fpu_counter = 0;
>  		return;
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] kvm: x86: do not leak guest xcr0 into host interrupt handlers
  2016-03-30 19:24 [PATCH] kvm: x86: do not leak guest xcr0 into host interrupt handlers David Matlack
  2016-03-31  8:05 ` Paolo Bonzini
@ 2016-04-05 11:28 ` Paolo Bonzini
  2016-04-05 15:56   ` David Matlack
  2016-04-22  7:30 ` Wanpeng Li
  2 siblings, 1 reply; 14+ messages in thread
From: Paolo Bonzini @ 2016-04-05 11:28 UTC (permalink / raw)
  To: David Matlack, kvm; +Cc: linux-kernel, luto, stable



On 30/03/2016 21:24, David Matlack wrote:
> An interrupt handler that uses the fpu can kill a KVM VM, if it runs
> under the following conditions:
>  - the guest's xcr0 register is loaded on the cpu
>  - the guest's fpu context is not loaded
>  - the host is using eagerfpu
> 
> Note that the guest's xcr0 register and fpu context are not loaded as
> part of the atomic world switch into "guest mode". They are loaded by
> KVM while the cpu is still in "host mode".
> 
> Usage of the fpu in interrupt context is gated by irq_fpu_usable(). The
> interrupt handler will look something like this:
> 
> if (irq_fpu_usable()) {
>         kernel_fpu_begin();
> 
>         [... code that uses the fpu ...]
> 
>         kernel_fpu_end();
> }
> 
> As long as the guest's fpu is not loaded and the host is using eager
> fpu, irq_fpu_usable() returns true (interrupted_kernel_fpu_idle()
> returns true). The interrupt handler proceeds to use the fpu with
> the guest's xcr0 live.
> 
> kernel_fpu_begin() saves the current fpu context. If this uses
> XSAVE[OPT], it may leave the xsave area in an undesirable state.
> According to the SDM, during XSAVE bit i of XSTATE_BV is not modified
> if bit i is 0 in xcr0. So it's possible that XSTATE_BV[i] == 1 and
> xcr0[i] == 0 following an XSAVE.
> 
> kernel_fpu_end() restores the fpu context. Now if any bit i in
> XSTATE_BV == 1 while xcr0[i] == 0, XRSTOR generates a #GP. The
> fault is trapped and SIGSEGV is delivered to the current process.
> 
> Only pre-4.2 kernels appear to be vulnerable to this sequence of
> events. Commit 653f52c ("kvm,x86: load guest FPU context more eagerly")
> from 4.2 forces the guest's fpu to always be loaded on eagerfpu hosts.
> 
> This patch fixes the bug by keeping the host's xcr0 loaded outside
> of the interrupts-disabled region where KVM switches into guest mode.
> 
> Cc: stable@vger.kernel.org
> Suggested-by: Andy Lutomirski <luto@amacapital.net>
> Signed-off-by: David Matlack <dmatlack@google.com>
> ---
>  arch/x86/kvm/x86.c | 10 ++++------
>  1 file changed, 4 insertions(+), 6 deletions(-)

While running my acceptance tests, in one case I got one CPU whose xcr0 
had leaked into the host.  This showed up as a SIGILL in strncasecmp's 
AVX code, and a simple program confirmed it:

    $ cat xgetbv.c
    #include <stdio.h>
    int main(void)
    {
        unsigned xcr0_h, xcr0_l;
        asm("xgetbv" : "=d"(xcr0_h), "=a"(xcr0_l) : "c"(0));
        printf("%08x:%08x\n", xcr0_h, xcr0_l);
    }
    $ gcc xgetbv.c -O2
    $ for i in `seq 0 55`; do echo $i `taskset -c $i ./a.out`; done|grep -v 007
    19 00000000:00000003

I'm going to rerun the tests without this patch, as it seems the most
likely culprit, and leave it out of the pull request if they pass.

Paolo

> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index e260ccb..8df1167 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -700,7 +700,6 @@ static int __kvm_set_xcr(struct kvm_vcpu *vcpu, u32 index, u64 xcr)
>  		if ((xcr0 & XFEATURE_MASK_AVX512) != XFEATURE_MASK_AVX512)
>  			return 1;
>  	}
> -	kvm_put_guest_xcr0(vcpu);
>  	vcpu->arch.xcr0 = xcr0;
>  
>  	if ((xcr0 ^ old_xcr0) & XFEATURE_MASK_EXTEND)
> @@ -6590,8 +6589,6 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>  	kvm_x86_ops->prepare_guest_switch(vcpu);
>  	if (vcpu->fpu_active)
>  		kvm_load_guest_fpu(vcpu);
> -	kvm_load_guest_xcr0(vcpu);
> -
>  	vcpu->mode = IN_GUEST_MODE;
>  
>  	srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx);
> @@ -6607,6 +6604,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>  
>  	local_irq_disable();
>  
> +	kvm_load_guest_xcr0(vcpu);
> +
>  	if (vcpu->mode == EXITING_GUEST_MODE || vcpu->requests
>  	    || need_resched() || signal_pending(current)) {
>  		vcpu->mode = OUTSIDE_GUEST_MODE;
> @@ -6667,6 +6666,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>  	vcpu->mode = OUTSIDE_GUEST_MODE;
>  	smp_wmb();
>  
> +	kvm_put_guest_xcr0(vcpu);
> +
>  	/* Interrupt is enabled by handle_external_intr() */
>  	kvm_x86_ops->handle_external_intr(vcpu);
>  
> @@ -7314,7 +7315,6 @@ void kvm_load_guest_fpu(struct kvm_vcpu *vcpu)
>  	 * and assume host would use all available bits.
>  	 * Guest xcr0 would be loaded later.
>  	 */
> -	kvm_put_guest_xcr0(vcpu);
>  	vcpu->guest_fpu_loaded = 1;
>  	__kernel_fpu_begin();
>  	__copy_kernel_to_fpregs(&vcpu->arch.guest_fpu.state);
> @@ -7323,8 +7323,6 @@ void kvm_load_guest_fpu(struct kvm_vcpu *vcpu)
>  
>  void kvm_put_guest_fpu(struct kvm_vcpu *vcpu)
>  {
> -	kvm_put_guest_xcr0(vcpu);
> -
>  	if (!vcpu->guest_fpu_loaded) {
>  		vcpu->fpu_counter = 0;
>  		return;
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] kvm: x86: do not leak guest xcr0 into host interrupt handlers
  2016-04-05 11:28 ` Paolo Bonzini
@ 2016-04-05 15:56   ` David Matlack
  2016-04-05 16:07     ` Paolo Bonzini
  2016-04-07  9:08     ` Paolo Bonzini
  0 siblings, 2 replies; 14+ messages in thread
From: David Matlack @ 2016-04-05 15:56 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: kvm list, linux-kernel@vger.kernel.org, Andy Lutomirski, stable

On Tue, Apr 5, 2016 at 4:28 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>
...
>
> While running my acceptance tests, in one case I got one CPU whose xcr0
> had leaked into the host.  This showed up as a SIGILL in strncasecmp's
> AVX code, and a simple program confirmed it:
>
>     $ cat xgetbv.c
>     #include <stdio.h>
>     int main(void)
>     {
>         unsigned xcr0_h, xcr0_l;
>         asm("xgetbv" : "=d"(xcr0_h), "=a"(xcr0_l) : "c"(0));
>         printf("%08x:%08x\n", xcr0_h, xcr0_l);
>     }
>     $ gcc xgetbv.c -O2
>     $ for i in `seq 0 55`; do echo $i `taskset -c $i ./a.out`; done|grep -v 007
>     19 00000000:00000003
>
> I'm going to rerun the tests without this patch, as it seems the most
> likely culprit, and leave it out of the pull request if they pass.

Agreed this is a very likely culprit. I think I see one way the
guest's xcr0 can leak into the host. I will do some testing an send
another version. Thanks.

>
> Paolo
>
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index e260ccb..8df1167 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -700,7 +700,6 @@ static int __kvm_set_xcr(struct kvm_vcpu *vcpu, u32 index, u64 xcr)
>>               if ((xcr0 & XFEATURE_MASK_AVX512) != XFEATURE_MASK_AVX512)
>>                       return 1;
>>       }
>> -     kvm_put_guest_xcr0(vcpu);
>>       vcpu->arch.xcr0 = xcr0;
>>
>>       if ((xcr0 ^ old_xcr0) & XFEATURE_MASK_EXTEND)
>> @@ -6590,8 +6589,6 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>>       kvm_x86_ops->prepare_guest_switch(vcpu);
>>       if (vcpu->fpu_active)
>>               kvm_load_guest_fpu(vcpu);
>> -     kvm_load_guest_xcr0(vcpu);
>> -
>>       vcpu->mode = IN_GUEST_MODE;
>>
>>       srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx);
>> @@ -6607,6 +6604,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>>
>>       local_irq_disable();
>>
>> +     kvm_load_guest_xcr0(vcpu);
>> +
>>       if (vcpu->mode == EXITING_GUEST_MODE || vcpu->requests
>>           || need_resched() || signal_pending(current)) {

Here, after we've loaded the guest xcr0, if we enter this if
statement, we return from vcpu_enter_guest with the guest's xcr0 still
loaded.

>>               vcpu->mode = OUTSIDE_GUEST_MODE;
>> @@ -6667,6 +6666,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>>       vcpu->mode = OUTSIDE_GUEST_MODE;
>>       smp_wmb();
>>
>> +     kvm_put_guest_xcr0(vcpu);
>> +
>>       /* Interrupt is enabled by handle_external_intr() */
>>       kvm_x86_ops->handle_external_intr(vcpu);
>>
>> @@ -7314,7 +7315,6 @@ void kvm_load_guest_fpu(struct kvm_vcpu *vcpu)
>>        * and assume host would use all available bits.
>>        * Guest xcr0 would be loaded later.
>>        */
>> -     kvm_put_guest_xcr0(vcpu);
>>       vcpu->guest_fpu_loaded = 1;
>>       __kernel_fpu_begin();
>>       __copy_kernel_to_fpregs(&vcpu->arch.guest_fpu.state);
>> @@ -7323,8 +7323,6 @@ void kvm_load_guest_fpu(struct kvm_vcpu *vcpu)
>>
>>  void kvm_put_guest_fpu(struct kvm_vcpu *vcpu)
>>  {
>> -     kvm_put_guest_xcr0(vcpu);
>> -
>>       if (!vcpu->guest_fpu_loaded) {
>>               vcpu->fpu_counter = 0;
>>               return;
>>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] kvm: x86: do not leak guest xcr0 into host interrupt handlers
  2016-04-05 15:56   ` David Matlack
@ 2016-04-05 16:07     ` Paolo Bonzini
  2016-04-07  9:08     ` Paolo Bonzini
  1 sibling, 0 replies; 14+ messages in thread
From: Paolo Bonzini @ 2016-04-05 16:07 UTC (permalink / raw)
  To: David Matlack
  Cc: kvm list, linux-kernel@vger.kernel.org, Andy Lutomirski, stable



On 05/04/2016 17:56, David Matlack wrote:
> > I'm going to rerun the tests without this patch, as it seems the most
> > likely culprit, and leave it out of the pull request if they pass.
>
> Agreed this is a very likely culprit. I think I see one way the
> guest's xcr0 can leak into the host. I will do some testing an send
> another version. Thanks.

Tests passed without this patch.

Paolo

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] kvm: x86: do not leak guest xcr0 into host interrupt handlers
  2016-04-05 15:56   ` David Matlack
  2016-04-05 16:07     ` Paolo Bonzini
@ 2016-04-07  9:08     ` Paolo Bonzini
  2016-04-07 18:18       ` David Matlack
  1 sibling, 1 reply; 14+ messages in thread
From: Paolo Bonzini @ 2016-04-07  9:08 UTC (permalink / raw)
  To: David Matlack
  Cc: kvm list, linux-kernel@vger.kernel.org, Andy Lutomirski, stable



On 05/04/2016 17:56, David Matlack wrote:
> On Tue, Apr 5, 2016 at 4:28 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>>
> ...
>>
>> While running my acceptance tests, in one case I got one CPU whose xcr0
>> had leaked into the host.  This showed up as a SIGILL in strncasecmp's
>> AVX code, and a simple program confirmed it:
>>
>>     $ cat xgetbv.c
>>     #include <stdio.h>
>>     int main(void)
>>     {
>>         unsigned xcr0_h, xcr0_l;
>>         asm("xgetbv" : "=d"(xcr0_h), "=a"(xcr0_l) : "c"(0));
>>         printf("%08x:%08x\n", xcr0_h, xcr0_l);
>>     }
>>     $ gcc xgetbv.c -O2
>>     $ for i in `seq 0 55`; do echo $i `taskset -c $i ./a.out`; done|grep -v 007
>>     19 00000000:00000003
>>
>> I'm going to rerun the tests without this patch, as it seems the most
>> likely culprit, and leave it out of the pull request if they pass.
> 
> Agreed this is a very likely culprit. I think I see one way the
> guest's xcr0 can leak into the host.

That's cancel_injection, right?  If it's just about moving the load call
below, I can do that.  Hmm, I will even test that today. :)

Paolo

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] kvm: x86: do not leak guest xcr0 into host interrupt handlers
  2016-04-07  9:08     ` Paolo Bonzini
@ 2016-04-07 18:18       ` David Matlack
  2016-04-07 19:03         ` Paolo Bonzini
  0 siblings, 1 reply; 14+ messages in thread
From: David Matlack @ 2016-04-07 18:18 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: kvm list, linux-kernel@vger.kernel.org, Andy Lutomirski, stable

On Thu, Apr 7, 2016 at 2:08 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>
>
> On 05/04/2016 17:56, David Matlack wrote:
>> On Tue, Apr 5, 2016 at 4:28 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>>>
>> ...
>>>
>>> While running my acceptance tests, in one case I got one CPU whose xcr0
>>> had leaked into the host.  This showed up as a SIGILL in strncasecmp's
>>> AVX code, and a simple program confirmed it:
>>>
>>>     $ cat xgetbv.c
>>>     #include <stdio.h>
>>>     int main(void)
>>>     {
>>>         unsigned xcr0_h, xcr0_l;
>>>         asm("xgetbv" : "=d"(xcr0_h), "=a"(xcr0_l) : "c"(0));
>>>         printf("%08x:%08x\n", xcr0_h, xcr0_l);
>>>     }
>>>     $ gcc xgetbv.c -O2
>>>     $ for i in `seq 0 55`; do echo $i `taskset -c $i ./a.out`; done|grep -v 007
>>>     19 00000000:00000003
>>>
>>> I'm going to rerun the tests without this patch, as it seems the most
>>> likely culprit, and leave it out of the pull request if they pass.
>>
>> Agreed this is a very likely culprit. I think I see one way the
>> guest's xcr0 can leak into the host.
>
> That's cancel_injection, right?  If it's just about moving the load call
> below, I can do that.  Hmm, I will even test that today. :)

Yes that's what I was thinking, move kvm_load_guest_xcr0 below that if.

Thank you :). Let me know how testing goes.

>
> Paolo
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] kvm: x86: do not leak guest xcr0 into host interrupt handlers
  2016-04-07 18:18       ` David Matlack
@ 2016-04-07 19:03         ` Paolo Bonzini
  2016-04-08 16:25           ` David Matlack
  0 siblings, 1 reply; 14+ messages in thread
From: Paolo Bonzini @ 2016-04-07 19:03 UTC (permalink / raw)
  To: David Matlack; +Cc: kvm list, linux-kernel, Andy Lutomirski, stable



----- Original Message -----
> >>> While running my acceptance tests, in one case I got one CPU whose xcr0
> >>> had leaked into the host.  This showed up as a SIGILL in strncasecmp's
> >>> AVX code, and a simple program confirmed it:
> >>>
> >>>     $ cat xgetbv.c
> >>>     #include <stdio.h>
> >>>     int main(void)
> >>>     {
> >>>         unsigned xcr0_h, xcr0_l;
> >>>         asm("xgetbv" : "=d"(xcr0_h), "=a"(xcr0_l) : "c"(0));
> >>>         printf("%08x:%08x\n", xcr0_h, xcr0_l);
> >>>     }
> >>>     $ gcc xgetbv.c -O2
> >>>     $ for i in `seq 0 55`; do echo $i `taskset -c $i ./a.out`; done|grep
> >>>     -v 007
> >>>     19 00000000:00000003
> >>>
> >>> I'm going to rerun the tests without this patch, as it seems the most
> >>> likely culprit, and leave it out of the pull request if they pass.
> >>
> >> Agreed this is a very likely culprit. I think I see one way the
> >> guest's xcr0 can leak into the host.
> >
> > That's cancel_injection, right?  If it's just about moving the load call
> > below, I can do that.  Hmm, I will even test that today. :)
> 
> Yes that's what I was thinking, move kvm_load_guest_xcr0 below that if.
> 
> Thank you :). Let me know how testing goes.

It went well.

Paolo

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] kvm: x86: do not leak guest xcr0 into host interrupt handlers
  2016-04-07 19:03         ` Paolo Bonzini
@ 2016-04-08 16:25           ` David Matlack
  2016-04-08 16:50             ` Paolo Bonzini
  0 siblings, 1 reply; 14+ messages in thread
From: David Matlack @ 2016-04-08 16:25 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: kvm list, linux-kernel@vger.kernel.org, Andy Lutomirski, stable

On Thu, Apr 7, 2016 at 12:03 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>>
>> Thank you :). Let me know how testing goes.
>
> It went well.

Great! How should we proceed?

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] kvm: x86: do not leak guest xcr0 into host interrupt handlers
  2016-04-08 16:25           ` David Matlack
@ 2016-04-08 16:50             ` Paolo Bonzini
  2016-04-08 16:54               ` David Matlack
  0 siblings, 1 reply; 14+ messages in thread
From: Paolo Bonzini @ 2016-04-08 16:50 UTC (permalink / raw)
  To: David Matlack
  Cc: kvm list, linux-kernel@vger.kernel.org, Andy Lutomirski, stable



On 08/04/2016 18:25, David Matlack wrote:
> On Thu, Apr 7, 2016 at 12:03 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>>>
>>> Thank you :). Let me know how testing goes.
>>
>> It went well.
> 
> Great! How should we proceed?

It will appear very soon on kvm/next and Radim will send the pull
request to Linus next week (I'm having him practice before I go on
vacation ;)).

Paolo

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] kvm: x86: do not leak guest xcr0 into host interrupt handlers
  2016-04-08 16:50             ` Paolo Bonzini
@ 2016-04-08 16:54               ` David Matlack
  0 siblings, 0 replies; 14+ messages in thread
From: David Matlack @ 2016-04-08 16:54 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: kvm list, linux-kernel@vger.kernel.org, Andy Lutomirski, stable

On Fri, Apr 8, 2016 at 9:50 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>
>
> On 08/04/2016 18:25, David Matlack wrote:
>> On Thu, Apr 7, 2016 at 12:03 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>>>>
>>>> Thank you :). Let me know how testing goes.
>>>
>>> It went well.
>>
>> Great! How should we proceed?
>
> It will appear very soon on kvm/next and Radim will send the pull
> request to Linus next week (I'm having him practice before I go on
> vacation ;)).

Makes sense. Thanks for taking care of it!

>
> Paolo

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] kvm: x86: do not leak guest xcr0 into host interrupt handlers
  2016-03-30 19:24 [PATCH] kvm: x86: do not leak guest xcr0 into host interrupt handlers David Matlack
  2016-03-31  8:05 ` Paolo Bonzini
  2016-04-05 11:28 ` Paolo Bonzini
@ 2016-04-22  7:30 ` Wanpeng Li
  2016-04-22 17:21   ` David Matlack
  2 siblings, 1 reply; 14+ messages in thread
From: Wanpeng Li @ 2016-04-22  7:30 UTC (permalink / raw)
  To: David Matlack
  Cc: kvm, Paolo Bonzini, linux-kernel@vger.kernel.org, Andy Lutomirski,
	stable

Hi Paolo and David,
2016-03-31 3:24 GMT+08:00 David Matlack <dmatlack@google.com>:
> An interrupt handler that uses the fpu can kill a KVM VM, if it runs
> under the following conditions:
>  - the guest's xcr0 register is loaded on the cpu
>  - the guest's fpu context is not loaded
>  - the host is using eagerfpu
>
> Note that the guest's xcr0 register and fpu context are not loaded as
> part of the atomic world switch into "guest mode". They are loaded by
> KVM while the cpu is still in "host mode".
>
> Usage of the fpu in interrupt context is gated by irq_fpu_usable(). The
> interrupt handler will look something like this:
>
> if (irq_fpu_usable()) {
>         kernel_fpu_begin();
>
>         [... code that uses the fpu ...]
>
>         kernel_fpu_end();
> }
>
> As long as the guest's fpu is not loaded and the host is using eager
> fpu, irq_fpu_usable() returns true (interrupted_kernel_fpu_idle()
> returns true). The interrupt handler proceeds to use the fpu with
> the guest's xcr0 live.
>
> kernel_fpu_begin() saves the current fpu context. If this uses
> XSAVE[OPT], it may leave the xsave area in an undesirable state.
> According to the SDM, during XSAVE bit i of XSTATE_BV is not modified
> if bit i is 0 in xcr0. So it's possible that XSTATE_BV[i] == 1 and
> xcr0[i] == 0 following an XSAVE.

How XSAVE save bit i since SDM mentioned that "XSAVE saves state
component i if and only if RFBM[i] = 1. "?  RFBM[i] will be 0 if
XSTATE_BV[i] == 1 && guest xcr0[i] == 0.

Regards,
Wanpeng Li

>
> kernel_fpu_end() restores the fpu context. Now if any bit i in
> XSTATE_BV == 1 while xcr0[i] == 0, XRSTOR generates a #GP. The
> fault is trapped and SIGSEGV is delivered to the current process.
>
> Only pre-4.2 kernels appear to be vulnerable to this sequence of
> events. Commit 653f52c ("kvm,x86: load guest FPU context more eagerly")
> from 4.2 forces the guest's fpu to always be loaded on eagerfpu hosts.
>
> This patch fixes the bug by keeping the host's xcr0 loaded outside
> of the interrupts-disabled region where KVM switches into guest mode.
>
> Cc: stable@vger.kernel.org
> Suggested-by: Andy Lutomirski <luto@amacapital.net>
> Signed-off-by: David Matlack <dmatlack@google.com>
> ---
>  arch/x86/kvm/x86.c | 10 ++++------
>  1 file changed, 4 insertions(+), 6 deletions(-)
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index e260ccb..8df1167 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -700,7 +700,6 @@ static int __kvm_set_xcr(struct kvm_vcpu *vcpu, u32 index, u64 xcr)
>                 if ((xcr0 & XFEATURE_MASK_AVX512) != XFEATURE_MASK_AVX512)
>                         return 1;
>         }
> -       kvm_put_guest_xcr0(vcpu);
>         vcpu->arch.xcr0 = xcr0;
>
>         if ((xcr0 ^ old_xcr0) & XFEATURE_MASK_EXTEND)
> @@ -6590,8 +6589,6 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>         kvm_x86_ops->prepare_guest_switch(vcpu);
>         if (vcpu->fpu_active)
>                 kvm_load_guest_fpu(vcpu);
> -       kvm_load_guest_xcr0(vcpu);
> -
>         vcpu->mode = IN_GUEST_MODE;
>
>         srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx);
> @@ -6607,6 +6604,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>
>         local_irq_disable();
>
> +       kvm_load_guest_xcr0(vcpu);
> +
>         if (vcpu->mode == EXITING_GUEST_MODE || vcpu->requests
>             || need_resched() || signal_pending(current)) {
>                 vcpu->mode = OUTSIDE_GUEST_MODE;
> @@ -6667,6 +6666,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>         vcpu->mode = OUTSIDE_GUEST_MODE;
>         smp_wmb();
>
> +       kvm_put_guest_xcr0(vcpu);
> +
>         /* Interrupt is enabled by handle_external_intr() */
>         kvm_x86_ops->handle_external_intr(vcpu);
>
> @@ -7314,7 +7315,6 @@ void kvm_load_guest_fpu(struct kvm_vcpu *vcpu)
>          * and assume host would use all available bits.
>          * Guest xcr0 would be loaded later.
>          */
> -       kvm_put_guest_xcr0(vcpu);
>         vcpu->guest_fpu_loaded = 1;
>         __kernel_fpu_begin();
>         __copy_kernel_to_fpregs(&vcpu->arch.guest_fpu.state);
> @@ -7323,8 +7323,6 @@ void kvm_load_guest_fpu(struct kvm_vcpu *vcpu)
>
>  void kvm_put_guest_fpu(struct kvm_vcpu *vcpu)
>  {
> -       kvm_put_guest_xcr0(vcpu);
> -
>         if (!vcpu->guest_fpu_loaded) {
>                 vcpu->fpu_counter = 0;
>                 return;
> --
> 2.8.0.rc3.226.g39d4020
>



-- 
Regards,
Wanpeng Li

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] kvm: x86: do not leak guest xcr0 into host interrupt handlers
  2016-04-22  7:30 ` Wanpeng Li
@ 2016-04-22 17:21   ` David Matlack
  2016-04-23 22:37     ` Wanpeng Li
  0 siblings, 1 reply; 14+ messages in thread
From: David Matlack @ 2016-04-22 17:21 UTC (permalink / raw)
  To: Wanpeng Li
  Cc: kvm, Paolo Bonzini, linux-kernel@vger.kernel.org, Andy Lutomirski,
	stable

On Fri, Apr 22, 2016 at 12:30 AM, Wanpeng Li <kernellwp@gmail.com> wrote:
> Hi Paolo and David,
> 2016-03-31 3:24 GMT+08:00 David Matlack <dmatlack@google.com>:
>>
>> kernel_fpu_begin() saves the current fpu context. If this uses
>> XSAVE[OPT], it may leave the xsave area in an undesirable state.
>> According to the SDM, during XSAVE bit i of XSTATE_BV is not modified
>> if bit i is 0 in xcr0. So it's possible that XSTATE_BV[i] == 1 and
>> xcr0[i] == 0 following an XSAVE.
>
> How XSAVE save bit i since SDM mentioned that "XSAVE saves state
> component i if and only if RFBM[i] = 1. "?  RFBM[i] will be 0 if
> XSTATE_BV[i] == 1 && guest xcr0[i] == 0.

You are correct, RFBM[i] will be 0 and XSAVE does not save state
component i in this case. However, XSTATE_BV[i] is left untouched by
XSAVE (left as 1). On XRSTOR, the CPU checks if XSTATE_BV[i] == 1 &&
xcr0[i] == 0, and if so delivers a #GP.

If you are wondering how XSTATE_BV[i] could be 1 in the first place, I
suspect it is left over from a previous XSAVE (which sets XSTATE_BV[i]
to the value in XINUSE[i]).

>
> Regards,
> Wanpeng Li
>
>>
>> kernel_fpu_end() restores the fpu context. Now if any bit i in
>> XSTATE_BV == 1 while xcr0[i] == 0, XRSTOR generates a #GP. The
>> fault is trapped and SIGSEGV is delivered to the current process.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] kvm: x86: do not leak guest xcr0 into host interrupt handlers
  2016-04-22 17:21   ` David Matlack
@ 2016-04-23 22:37     ` Wanpeng Li
  0 siblings, 0 replies; 14+ messages in thread
From: Wanpeng Li @ 2016-04-23 22:37 UTC (permalink / raw)
  To: David Matlack
  Cc: kvm, Paolo Bonzini, linux-kernel@vger.kernel.org, Andy Lutomirski,
	stable

2016-04-23 1:21 GMT+08:00 David Matlack <dmatlack@google.com>:
> On Fri, Apr 22, 2016 at 12:30 AM, Wanpeng Li <kernellwp@gmail.com> wrote:
>> Hi Paolo and David,
>> 2016-03-31 3:24 GMT+08:00 David Matlack <dmatlack@google.com>:
>>>
>>> kernel_fpu_begin() saves the current fpu context. If this uses
>>> XSAVE[OPT], it may leave the xsave area in an undesirable state.
>>> According to the SDM, during XSAVE bit i of XSTATE_BV is not modified
>>> if bit i is 0 in xcr0. So it's possible that XSTATE_BV[i] == 1 and
>>> xcr0[i] == 0 following an XSAVE.
>>
>> How XSAVE save bit i since SDM mentioned that "XSAVE saves state
>> component i if and only if RFBM[i] = 1. "?  RFBM[i] will be 0 if
>> XSTATE_BV[i] == 1 && guest xcr0[i] == 0.
>
> You are correct, RFBM[i] will be 0 and XSAVE does not save state
> component i in this case. However, XSTATE_BV[i] is left untouched by
> XSAVE (left as 1). On XRSTOR, the CPU checks if XSTATE_BV[i] == 1 &&
> xcr0[i] == 0, and if so delivers a #GP.

However, SDM also mentioned that "If RFBM[i] = 0, XRSTOR does not
update state component i." So we #GP on a don't need restore bit i if
XSTATE_BV[I] == 1 && xcr0[0] ==0. That's where I miss I think, thanks
for your explanation.

Regard,
Wanpeng Li

>
> If you are wondering how XSTATE_BV[i] could be 1 in the first place, I
> suspect it is left over from a previous XSAVE (which sets XSTATE_BV[i]
> to the value in XINUSE[i]).

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2016-04-23 22:37 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-03-30 19:24 [PATCH] kvm: x86: do not leak guest xcr0 into host interrupt handlers David Matlack
2016-03-31  8:05 ` Paolo Bonzini
2016-04-05 11:28 ` Paolo Bonzini
2016-04-05 15:56   ` David Matlack
2016-04-05 16:07     ` Paolo Bonzini
2016-04-07  9:08     ` Paolo Bonzini
2016-04-07 18:18       ` David Matlack
2016-04-07 19:03         ` Paolo Bonzini
2016-04-08 16:25           ` David Matlack
2016-04-08 16:50             ` Paolo Bonzini
2016-04-08 16:54               ` David Matlack
2016-04-22  7:30 ` Wanpeng Li
2016-04-22 17:21   ` David Matlack
2016-04-23 22:37     ` Wanpeng Li

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).