* Re: [PATCH] kvm: x86: do not leak guest xcr0 into host interrupt handlers
2016-03-30 19:24 [PATCH] kvm: x86: do not leak guest xcr0 into host interrupt handlers David Matlack
@ 2016-03-31 8:05 ` Paolo Bonzini
2016-04-05 11:28 ` Paolo Bonzini
2016-04-22 7:30 ` Wanpeng Li
2 siblings, 0 replies; 14+ messages in thread
From: Paolo Bonzini @ 2016-03-31 8:05 UTC (permalink / raw)
To: David Matlack, kvm; +Cc: linux-kernel, luto, stable
On 30/03/2016 21:24, David Matlack wrote:
> An interrupt handler that uses the fpu can kill a KVM VM, if it runs
> under the following conditions:
> - the guest's xcr0 register is loaded on the cpu
> - the guest's fpu context is not loaded
> - the host is using eagerfpu
>
> Note that the guest's xcr0 register and fpu context are not loaded as
> part of the atomic world switch into "guest mode". They are loaded by
> KVM while the cpu is still in "host mode".
>
> Usage of the fpu in interrupt context is gated by irq_fpu_usable(). The
> interrupt handler will look something like this:
>
> if (irq_fpu_usable()) {
> kernel_fpu_begin();
>
> [... code that uses the fpu ...]
>
> kernel_fpu_end();
> }
>
> As long as the guest's fpu is not loaded and the host is using eager
> fpu, irq_fpu_usable() returns true (interrupted_kernel_fpu_idle()
> returns true). The interrupt handler proceeds to use the fpu with
> the guest's xcr0 live.
>
> kernel_fpu_begin() saves the current fpu context. If this uses
> XSAVE[OPT], it may leave the xsave area in an undesirable state.
> According to the SDM, during XSAVE bit i of XSTATE_BV is not modified
> if bit i is 0 in xcr0. So it's possible that XSTATE_BV[i] == 1 and
> xcr0[i] == 0 following an XSAVE.
>
> kernel_fpu_end() restores the fpu context. Now if any bit i in
> XSTATE_BV == 1 while xcr0[i] == 0, XRSTOR generates a #GP. The
> fault is trapped and SIGSEGV is delivered to the current process.
>
> Only pre-4.2 kernels appear to be vulnerable to this sequence of
> events. Commit 653f52c ("kvm,x86: load guest FPU context more eagerly")
> from 4.2 forces the guest's fpu to always be loaded on eagerfpu hosts.
>
> This patch fixes the bug by keeping the host's xcr0 loaded outside
> of the interrupts-disabled region where KVM switches into guest mode.
>
> Cc: stable@vger.kernel.org
> Suggested-by: Andy Lutomirski <luto@amacapital.net>
> Signed-off-by: David Matlack <dmatlack@google.com>
> ---
> arch/x86/kvm/x86.c | 10 ++++------
> 1 file changed, 4 insertions(+), 6 deletions(-)
On guest entry we get:
> @@ -6590,8 +6589,6 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
> kvm_x86_ops->prepare_guest_switch(vcpu);
> if (vcpu->fpu_active)
> kvm_load_guest_fpu(vcpu);
One fewer kvm_put_guest_xcr0, at least in eager mode.
> - kvm_load_guest_xcr0(vcpu);
> -
One fewer kvm_load_guest_xcr0.
> vcpu->mode = IN_GUEST_MODE;
>
> srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx);
> @@ -6607,6 +6604,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>
> local_irq_disable();
>
> + kvm_load_guest_xcr0(vcpu);
One more kvm_load_guest_xcr0.
> if (vcpu->mode == EXITING_GUEST_MODE || vcpu->requests
> || need_resched() || signal_pending(current)) {
> vcpu->mode = OUTSIDE_GUEST_MODE;
> @@ -6667,6 +6666,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
> vcpu->mode = OUTSIDE_GUEST_MODE;
> smp_wmb();
>
> + kvm_put_guest_xcr0(vcpu);
One more kvm_put_guest_xcr0.
So everything balances out. Considering that the logic is cleaner, I
can apply this to all released kernels.
Paolo
> /* Interrupt is enabled by handle_external_intr() */
> kvm_x86_ops->handle_external_intr(vcpu);
>
> @@ -7314,7 +7315,6 @@ void kvm_load_guest_fpu(struct kvm_vcpu *vcpu)
> * and assume host would use all available bits.
> * Guest xcr0 would be loaded later.
> */
> - kvm_put_guest_xcr0(vcpu);
> vcpu->guest_fpu_loaded = 1;
> __kernel_fpu_begin();
> __copy_kernel_to_fpregs(&vcpu->arch.guest_fpu.state);
> @@ -7323,8 +7323,6 @@ void kvm_load_guest_fpu(struct kvm_vcpu *vcpu)
>
> void kvm_put_guest_fpu(struct kvm_vcpu *vcpu)
> {
> - kvm_put_guest_xcr0(vcpu);
> -
> if (!vcpu->guest_fpu_loaded) {
> vcpu->fpu_counter = 0;
> return;
>
^ permalink raw reply [flat|nested] 14+ messages in thread* Re: [PATCH] kvm: x86: do not leak guest xcr0 into host interrupt handlers
2016-03-30 19:24 [PATCH] kvm: x86: do not leak guest xcr0 into host interrupt handlers David Matlack
2016-03-31 8:05 ` Paolo Bonzini
@ 2016-04-05 11:28 ` Paolo Bonzini
2016-04-05 15:56 ` David Matlack
2016-04-22 7:30 ` Wanpeng Li
2 siblings, 1 reply; 14+ messages in thread
From: Paolo Bonzini @ 2016-04-05 11:28 UTC (permalink / raw)
To: David Matlack, kvm; +Cc: linux-kernel, luto, stable
On 30/03/2016 21:24, David Matlack wrote:
> An interrupt handler that uses the fpu can kill a KVM VM, if it runs
> under the following conditions:
> - the guest's xcr0 register is loaded on the cpu
> - the guest's fpu context is not loaded
> - the host is using eagerfpu
>
> Note that the guest's xcr0 register and fpu context are not loaded as
> part of the atomic world switch into "guest mode". They are loaded by
> KVM while the cpu is still in "host mode".
>
> Usage of the fpu in interrupt context is gated by irq_fpu_usable(). The
> interrupt handler will look something like this:
>
> if (irq_fpu_usable()) {
> kernel_fpu_begin();
>
> [... code that uses the fpu ...]
>
> kernel_fpu_end();
> }
>
> As long as the guest's fpu is not loaded and the host is using eager
> fpu, irq_fpu_usable() returns true (interrupted_kernel_fpu_idle()
> returns true). The interrupt handler proceeds to use the fpu with
> the guest's xcr0 live.
>
> kernel_fpu_begin() saves the current fpu context. If this uses
> XSAVE[OPT], it may leave the xsave area in an undesirable state.
> According to the SDM, during XSAVE bit i of XSTATE_BV is not modified
> if bit i is 0 in xcr0. So it's possible that XSTATE_BV[i] == 1 and
> xcr0[i] == 0 following an XSAVE.
>
> kernel_fpu_end() restores the fpu context. Now if any bit i in
> XSTATE_BV == 1 while xcr0[i] == 0, XRSTOR generates a #GP. The
> fault is trapped and SIGSEGV is delivered to the current process.
>
> Only pre-4.2 kernels appear to be vulnerable to this sequence of
> events. Commit 653f52c ("kvm,x86: load guest FPU context more eagerly")
> from 4.2 forces the guest's fpu to always be loaded on eagerfpu hosts.
>
> This patch fixes the bug by keeping the host's xcr0 loaded outside
> of the interrupts-disabled region where KVM switches into guest mode.
>
> Cc: stable@vger.kernel.org
> Suggested-by: Andy Lutomirski <luto@amacapital.net>
> Signed-off-by: David Matlack <dmatlack@google.com>
> ---
> arch/x86/kvm/x86.c | 10 ++++------
> 1 file changed, 4 insertions(+), 6 deletions(-)
While running my acceptance tests, in one case I got one CPU whose xcr0
had leaked into the host. This showed up as a SIGILL in strncasecmp's
AVX code, and a simple program confirmed it:
$ cat xgetbv.c
#include <stdio.h>
int main(void)
{
unsigned xcr0_h, xcr0_l;
asm("xgetbv" : "=d"(xcr0_h), "=a"(xcr0_l) : "c"(0));
printf("%08x:%08x\n", xcr0_h, xcr0_l);
}
$ gcc xgetbv.c -O2
$ for i in `seq 0 55`; do echo $i `taskset -c $i ./a.out`; done|grep -v 007
19 00000000:00000003
I'm going to rerun the tests without this patch, as it seems the most
likely culprit, and leave it out of the pull request if they pass.
Paolo
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index e260ccb..8df1167 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -700,7 +700,6 @@ static int __kvm_set_xcr(struct kvm_vcpu *vcpu, u32 index, u64 xcr)
> if ((xcr0 & XFEATURE_MASK_AVX512) != XFEATURE_MASK_AVX512)
> return 1;
> }
> - kvm_put_guest_xcr0(vcpu);
> vcpu->arch.xcr0 = xcr0;
>
> if ((xcr0 ^ old_xcr0) & XFEATURE_MASK_EXTEND)
> @@ -6590,8 +6589,6 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
> kvm_x86_ops->prepare_guest_switch(vcpu);
> if (vcpu->fpu_active)
> kvm_load_guest_fpu(vcpu);
> - kvm_load_guest_xcr0(vcpu);
> -
> vcpu->mode = IN_GUEST_MODE;
>
> srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx);
> @@ -6607,6 +6604,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>
> local_irq_disable();
>
> + kvm_load_guest_xcr0(vcpu);
> +
> if (vcpu->mode == EXITING_GUEST_MODE || vcpu->requests
> || need_resched() || signal_pending(current)) {
> vcpu->mode = OUTSIDE_GUEST_MODE;
> @@ -6667,6 +6666,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
> vcpu->mode = OUTSIDE_GUEST_MODE;
> smp_wmb();
>
> + kvm_put_guest_xcr0(vcpu);
> +
> /* Interrupt is enabled by handle_external_intr() */
> kvm_x86_ops->handle_external_intr(vcpu);
>
> @@ -7314,7 +7315,6 @@ void kvm_load_guest_fpu(struct kvm_vcpu *vcpu)
> * and assume host would use all available bits.
> * Guest xcr0 would be loaded later.
> */
> - kvm_put_guest_xcr0(vcpu);
> vcpu->guest_fpu_loaded = 1;
> __kernel_fpu_begin();
> __copy_kernel_to_fpregs(&vcpu->arch.guest_fpu.state);
> @@ -7323,8 +7323,6 @@ void kvm_load_guest_fpu(struct kvm_vcpu *vcpu)
>
> void kvm_put_guest_fpu(struct kvm_vcpu *vcpu)
> {
> - kvm_put_guest_xcr0(vcpu);
> -
> if (!vcpu->guest_fpu_loaded) {
> vcpu->fpu_counter = 0;
> return;
>
^ permalink raw reply [flat|nested] 14+ messages in thread* Re: [PATCH] kvm: x86: do not leak guest xcr0 into host interrupt handlers
2016-04-05 11:28 ` Paolo Bonzini
@ 2016-04-05 15:56 ` David Matlack
2016-04-05 16:07 ` Paolo Bonzini
2016-04-07 9:08 ` Paolo Bonzini
0 siblings, 2 replies; 14+ messages in thread
From: David Matlack @ 2016-04-05 15:56 UTC (permalink / raw)
To: Paolo Bonzini
Cc: kvm list, linux-kernel@vger.kernel.org, Andy Lutomirski, stable
On Tue, Apr 5, 2016 at 4:28 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>
...
>
> While running my acceptance tests, in one case I got one CPU whose xcr0
> had leaked into the host. This showed up as a SIGILL in strncasecmp's
> AVX code, and a simple program confirmed it:
>
> $ cat xgetbv.c
> #include <stdio.h>
> int main(void)
> {
> unsigned xcr0_h, xcr0_l;
> asm("xgetbv" : "=d"(xcr0_h), "=a"(xcr0_l) : "c"(0));
> printf("%08x:%08x\n", xcr0_h, xcr0_l);
> }
> $ gcc xgetbv.c -O2
> $ for i in `seq 0 55`; do echo $i `taskset -c $i ./a.out`; done|grep -v 007
> 19 00000000:00000003
>
> I'm going to rerun the tests without this patch, as it seems the most
> likely culprit, and leave it out of the pull request if they pass.
Agreed this is a very likely culprit. I think I see one way the
guest's xcr0 can leak into the host. I will do some testing an send
another version. Thanks.
>
> Paolo
>
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index e260ccb..8df1167 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -700,7 +700,6 @@ static int __kvm_set_xcr(struct kvm_vcpu *vcpu, u32 index, u64 xcr)
>> if ((xcr0 & XFEATURE_MASK_AVX512) != XFEATURE_MASK_AVX512)
>> return 1;
>> }
>> - kvm_put_guest_xcr0(vcpu);
>> vcpu->arch.xcr0 = xcr0;
>>
>> if ((xcr0 ^ old_xcr0) & XFEATURE_MASK_EXTEND)
>> @@ -6590,8 +6589,6 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>> kvm_x86_ops->prepare_guest_switch(vcpu);
>> if (vcpu->fpu_active)
>> kvm_load_guest_fpu(vcpu);
>> - kvm_load_guest_xcr0(vcpu);
>> -
>> vcpu->mode = IN_GUEST_MODE;
>>
>> srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx);
>> @@ -6607,6 +6604,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>>
>> local_irq_disable();
>>
>> + kvm_load_guest_xcr0(vcpu);
>> +
>> if (vcpu->mode == EXITING_GUEST_MODE || vcpu->requests
>> || need_resched() || signal_pending(current)) {
Here, after we've loaded the guest xcr0, if we enter this if
statement, we return from vcpu_enter_guest with the guest's xcr0 still
loaded.
>> vcpu->mode = OUTSIDE_GUEST_MODE;
>> @@ -6667,6 +6666,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>> vcpu->mode = OUTSIDE_GUEST_MODE;
>> smp_wmb();
>>
>> + kvm_put_guest_xcr0(vcpu);
>> +
>> /* Interrupt is enabled by handle_external_intr() */
>> kvm_x86_ops->handle_external_intr(vcpu);
>>
>> @@ -7314,7 +7315,6 @@ void kvm_load_guest_fpu(struct kvm_vcpu *vcpu)
>> * and assume host would use all available bits.
>> * Guest xcr0 would be loaded later.
>> */
>> - kvm_put_guest_xcr0(vcpu);
>> vcpu->guest_fpu_loaded = 1;
>> __kernel_fpu_begin();
>> __copy_kernel_to_fpregs(&vcpu->arch.guest_fpu.state);
>> @@ -7323,8 +7323,6 @@ void kvm_load_guest_fpu(struct kvm_vcpu *vcpu)
>>
>> void kvm_put_guest_fpu(struct kvm_vcpu *vcpu)
>> {
>> - kvm_put_guest_xcr0(vcpu);
>> -
>> if (!vcpu->guest_fpu_loaded) {
>> vcpu->fpu_counter = 0;
>> return;
>>
^ permalink raw reply [flat|nested] 14+ messages in thread* Re: [PATCH] kvm: x86: do not leak guest xcr0 into host interrupt handlers
2016-04-05 15:56 ` David Matlack
@ 2016-04-05 16:07 ` Paolo Bonzini
2016-04-07 9:08 ` Paolo Bonzini
1 sibling, 0 replies; 14+ messages in thread
From: Paolo Bonzini @ 2016-04-05 16:07 UTC (permalink / raw)
To: David Matlack
Cc: kvm list, linux-kernel@vger.kernel.org, Andy Lutomirski, stable
On 05/04/2016 17:56, David Matlack wrote:
> > I'm going to rerun the tests without this patch, as it seems the most
> > likely culprit, and leave it out of the pull request if they pass.
>
> Agreed this is a very likely culprit. I think I see one way the
> guest's xcr0 can leak into the host. I will do some testing an send
> another version. Thanks.
Tests passed without this patch.
Paolo
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] kvm: x86: do not leak guest xcr0 into host interrupt handlers
2016-04-05 15:56 ` David Matlack
2016-04-05 16:07 ` Paolo Bonzini
@ 2016-04-07 9:08 ` Paolo Bonzini
2016-04-07 18:18 ` David Matlack
1 sibling, 1 reply; 14+ messages in thread
From: Paolo Bonzini @ 2016-04-07 9:08 UTC (permalink / raw)
To: David Matlack
Cc: kvm list, linux-kernel@vger.kernel.org, Andy Lutomirski, stable
On 05/04/2016 17:56, David Matlack wrote:
> On Tue, Apr 5, 2016 at 4:28 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>>
> ...
>>
>> While running my acceptance tests, in one case I got one CPU whose xcr0
>> had leaked into the host. This showed up as a SIGILL in strncasecmp's
>> AVX code, and a simple program confirmed it:
>>
>> $ cat xgetbv.c
>> #include <stdio.h>
>> int main(void)
>> {
>> unsigned xcr0_h, xcr0_l;
>> asm("xgetbv" : "=d"(xcr0_h), "=a"(xcr0_l) : "c"(0));
>> printf("%08x:%08x\n", xcr0_h, xcr0_l);
>> }
>> $ gcc xgetbv.c -O2
>> $ for i in `seq 0 55`; do echo $i `taskset -c $i ./a.out`; done|grep -v 007
>> 19 00000000:00000003
>>
>> I'm going to rerun the tests without this patch, as it seems the most
>> likely culprit, and leave it out of the pull request if they pass.
>
> Agreed this is a very likely culprit. I think I see one way the
> guest's xcr0 can leak into the host.
That's cancel_injection, right? If it's just about moving the load call
below, I can do that. Hmm, I will even test that today. :)
Paolo
^ permalink raw reply [flat|nested] 14+ messages in thread* Re: [PATCH] kvm: x86: do not leak guest xcr0 into host interrupt handlers
2016-04-07 9:08 ` Paolo Bonzini
@ 2016-04-07 18:18 ` David Matlack
2016-04-07 19:03 ` Paolo Bonzini
0 siblings, 1 reply; 14+ messages in thread
From: David Matlack @ 2016-04-07 18:18 UTC (permalink / raw)
To: Paolo Bonzini
Cc: kvm list, linux-kernel@vger.kernel.org, Andy Lutomirski, stable
On Thu, Apr 7, 2016 at 2:08 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>
>
> On 05/04/2016 17:56, David Matlack wrote:
>> On Tue, Apr 5, 2016 at 4:28 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>>>
>> ...
>>>
>>> While running my acceptance tests, in one case I got one CPU whose xcr0
>>> had leaked into the host. This showed up as a SIGILL in strncasecmp's
>>> AVX code, and a simple program confirmed it:
>>>
>>> $ cat xgetbv.c
>>> #include <stdio.h>
>>> int main(void)
>>> {
>>> unsigned xcr0_h, xcr0_l;
>>> asm("xgetbv" : "=d"(xcr0_h), "=a"(xcr0_l) : "c"(0));
>>> printf("%08x:%08x\n", xcr0_h, xcr0_l);
>>> }
>>> $ gcc xgetbv.c -O2
>>> $ for i in `seq 0 55`; do echo $i `taskset -c $i ./a.out`; done|grep -v 007
>>> 19 00000000:00000003
>>>
>>> I'm going to rerun the tests without this patch, as it seems the most
>>> likely culprit, and leave it out of the pull request if they pass.
>>
>> Agreed this is a very likely culprit. I think I see one way the
>> guest's xcr0 can leak into the host.
>
> That's cancel_injection, right? If it's just about moving the load call
> below, I can do that. Hmm, I will even test that today. :)
Yes that's what I was thinking, move kvm_load_guest_xcr0 below that if.
Thank you :). Let me know how testing goes.
>
> Paolo
>
^ permalink raw reply [flat|nested] 14+ messages in thread* Re: [PATCH] kvm: x86: do not leak guest xcr0 into host interrupt handlers
2016-04-07 18:18 ` David Matlack
@ 2016-04-07 19:03 ` Paolo Bonzini
2016-04-08 16:25 ` David Matlack
0 siblings, 1 reply; 14+ messages in thread
From: Paolo Bonzini @ 2016-04-07 19:03 UTC (permalink / raw)
To: David Matlack; +Cc: kvm list, linux-kernel, Andy Lutomirski, stable
----- Original Message -----
> >>> While running my acceptance tests, in one case I got one CPU whose xcr0
> >>> had leaked into the host. This showed up as a SIGILL in strncasecmp's
> >>> AVX code, and a simple program confirmed it:
> >>>
> >>> $ cat xgetbv.c
> >>> #include <stdio.h>
> >>> int main(void)
> >>> {
> >>> unsigned xcr0_h, xcr0_l;
> >>> asm("xgetbv" : "=d"(xcr0_h), "=a"(xcr0_l) : "c"(0));
> >>> printf("%08x:%08x\n", xcr0_h, xcr0_l);
> >>> }
> >>> $ gcc xgetbv.c -O2
> >>> $ for i in `seq 0 55`; do echo $i `taskset -c $i ./a.out`; done|grep
> >>> -v 007
> >>> 19 00000000:00000003
> >>>
> >>> I'm going to rerun the tests without this patch, as it seems the most
> >>> likely culprit, and leave it out of the pull request if they pass.
> >>
> >> Agreed this is a very likely culprit. I think I see one way the
> >> guest's xcr0 can leak into the host.
> >
> > That's cancel_injection, right? If it's just about moving the load call
> > below, I can do that. Hmm, I will even test that today. :)
>
> Yes that's what I was thinking, move kvm_load_guest_xcr0 below that if.
>
> Thank you :). Let me know how testing goes.
It went well.
Paolo
^ permalink raw reply [flat|nested] 14+ messages in thread* Re: [PATCH] kvm: x86: do not leak guest xcr0 into host interrupt handlers
2016-04-07 19:03 ` Paolo Bonzini
@ 2016-04-08 16:25 ` David Matlack
2016-04-08 16:50 ` Paolo Bonzini
0 siblings, 1 reply; 14+ messages in thread
From: David Matlack @ 2016-04-08 16:25 UTC (permalink / raw)
To: Paolo Bonzini
Cc: kvm list, linux-kernel@vger.kernel.org, Andy Lutomirski, stable
On Thu, Apr 7, 2016 at 12:03 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>>
>> Thank you :). Let me know how testing goes.
>
> It went well.
Great! How should we proceed?
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] kvm: x86: do not leak guest xcr0 into host interrupt handlers
2016-04-08 16:25 ` David Matlack
@ 2016-04-08 16:50 ` Paolo Bonzini
2016-04-08 16:54 ` David Matlack
0 siblings, 1 reply; 14+ messages in thread
From: Paolo Bonzini @ 2016-04-08 16:50 UTC (permalink / raw)
To: David Matlack
Cc: kvm list, linux-kernel@vger.kernel.org, Andy Lutomirski, stable
On 08/04/2016 18:25, David Matlack wrote:
> On Thu, Apr 7, 2016 at 12:03 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>>>
>>> Thank you :). Let me know how testing goes.
>>
>> It went well.
>
> Great! How should we proceed?
It will appear very soon on kvm/next and Radim will send the pull
request to Linus next week (I'm having him practice before I go on
vacation ;)).
Paolo
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] kvm: x86: do not leak guest xcr0 into host interrupt handlers
2016-04-08 16:50 ` Paolo Bonzini
@ 2016-04-08 16:54 ` David Matlack
0 siblings, 0 replies; 14+ messages in thread
From: David Matlack @ 2016-04-08 16:54 UTC (permalink / raw)
To: Paolo Bonzini
Cc: kvm list, linux-kernel@vger.kernel.org, Andy Lutomirski, stable
On Fri, Apr 8, 2016 at 9:50 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>
>
> On 08/04/2016 18:25, David Matlack wrote:
>> On Thu, Apr 7, 2016 at 12:03 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>>>>
>>>> Thank you :). Let me know how testing goes.
>>>
>>> It went well.
>>
>> Great! How should we proceed?
>
> It will appear very soon on kvm/next and Radim will send the pull
> request to Linus next week (I'm having him practice before I go on
> vacation ;)).
Makes sense. Thanks for taking care of it!
>
> Paolo
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] kvm: x86: do not leak guest xcr0 into host interrupt handlers
2016-03-30 19:24 [PATCH] kvm: x86: do not leak guest xcr0 into host interrupt handlers David Matlack
2016-03-31 8:05 ` Paolo Bonzini
2016-04-05 11:28 ` Paolo Bonzini
@ 2016-04-22 7:30 ` Wanpeng Li
2016-04-22 17:21 ` David Matlack
2 siblings, 1 reply; 14+ messages in thread
From: Wanpeng Li @ 2016-04-22 7:30 UTC (permalink / raw)
To: David Matlack
Cc: kvm, Paolo Bonzini, linux-kernel@vger.kernel.org, Andy Lutomirski,
stable
Hi Paolo and David,
2016-03-31 3:24 GMT+08:00 David Matlack <dmatlack@google.com>:
> An interrupt handler that uses the fpu can kill a KVM VM, if it runs
> under the following conditions:
> - the guest's xcr0 register is loaded on the cpu
> - the guest's fpu context is not loaded
> - the host is using eagerfpu
>
> Note that the guest's xcr0 register and fpu context are not loaded as
> part of the atomic world switch into "guest mode". They are loaded by
> KVM while the cpu is still in "host mode".
>
> Usage of the fpu in interrupt context is gated by irq_fpu_usable(). The
> interrupt handler will look something like this:
>
> if (irq_fpu_usable()) {
> kernel_fpu_begin();
>
> [... code that uses the fpu ...]
>
> kernel_fpu_end();
> }
>
> As long as the guest's fpu is not loaded and the host is using eager
> fpu, irq_fpu_usable() returns true (interrupted_kernel_fpu_idle()
> returns true). The interrupt handler proceeds to use the fpu with
> the guest's xcr0 live.
>
> kernel_fpu_begin() saves the current fpu context. If this uses
> XSAVE[OPT], it may leave the xsave area in an undesirable state.
> According to the SDM, during XSAVE bit i of XSTATE_BV is not modified
> if bit i is 0 in xcr0. So it's possible that XSTATE_BV[i] == 1 and
> xcr0[i] == 0 following an XSAVE.
How XSAVE save bit i since SDM mentioned that "XSAVE saves state
component i if and only if RFBM[i] = 1. "? RFBM[i] will be 0 if
XSTATE_BV[i] == 1 && guest xcr0[i] == 0.
Regards,
Wanpeng Li
>
> kernel_fpu_end() restores the fpu context. Now if any bit i in
> XSTATE_BV == 1 while xcr0[i] == 0, XRSTOR generates a #GP. The
> fault is trapped and SIGSEGV is delivered to the current process.
>
> Only pre-4.2 kernels appear to be vulnerable to this sequence of
> events. Commit 653f52c ("kvm,x86: load guest FPU context more eagerly")
> from 4.2 forces the guest's fpu to always be loaded on eagerfpu hosts.
>
> This patch fixes the bug by keeping the host's xcr0 loaded outside
> of the interrupts-disabled region where KVM switches into guest mode.
>
> Cc: stable@vger.kernel.org
> Suggested-by: Andy Lutomirski <luto@amacapital.net>
> Signed-off-by: David Matlack <dmatlack@google.com>
> ---
> arch/x86/kvm/x86.c | 10 ++++------
> 1 file changed, 4 insertions(+), 6 deletions(-)
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index e260ccb..8df1167 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -700,7 +700,6 @@ static int __kvm_set_xcr(struct kvm_vcpu *vcpu, u32 index, u64 xcr)
> if ((xcr0 & XFEATURE_MASK_AVX512) != XFEATURE_MASK_AVX512)
> return 1;
> }
> - kvm_put_guest_xcr0(vcpu);
> vcpu->arch.xcr0 = xcr0;
>
> if ((xcr0 ^ old_xcr0) & XFEATURE_MASK_EXTEND)
> @@ -6590,8 +6589,6 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
> kvm_x86_ops->prepare_guest_switch(vcpu);
> if (vcpu->fpu_active)
> kvm_load_guest_fpu(vcpu);
> - kvm_load_guest_xcr0(vcpu);
> -
> vcpu->mode = IN_GUEST_MODE;
>
> srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx);
> @@ -6607,6 +6604,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>
> local_irq_disable();
>
> + kvm_load_guest_xcr0(vcpu);
> +
> if (vcpu->mode == EXITING_GUEST_MODE || vcpu->requests
> || need_resched() || signal_pending(current)) {
> vcpu->mode = OUTSIDE_GUEST_MODE;
> @@ -6667,6 +6666,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
> vcpu->mode = OUTSIDE_GUEST_MODE;
> smp_wmb();
>
> + kvm_put_guest_xcr0(vcpu);
> +
> /* Interrupt is enabled by handle_external_intr() */
> kvm_x86_ops->handle_external_intr(vcpu);
>
> @@ -7314,7 +7315,6 @@ void kvm_load_guest_fpu(struct kvm_vcpu *vcpu)
> * and assume host would use all available bits.
> * Guest xcr0 would be loaded later.
> */
> - kvm_put_guest_xcr0(vcpu);
> vcpu->guest_fpu_loaded = 1;
> __kernel_fpu_begin();
> __copy_kernel_to_fpregs(&vcpu->arch.guest_fpu.state);
> @@ -7323,8 +7323,6 @@ void kvm_load_guest_fpu(struct kvm_vcpu *vcpu)
>
> void kvm_put_guest_fpu(struct kvm_vcpu *vcpu)
> {
> - kvm_put_guest_xcr0(vcpu);
> -
> if (!vcpu->guest_fpu_loaded) {
> vcpu->fpu_counter = 0;
> return;
> --
> 2.8.0.rc3.226.g39d4020
>
--
Regards,
Wanpeng Li
^ permalink raw reply [flat|nested] 14+ messages in thread* Re: [PATCH] kvm: x86: do not leak guest xcr0 into host interrupt handlers
2016-04-22 7:30 ` Wanpeng Li
@ 2016-04-22 17:21 ` David Matlack
2016-04-23 22:37 ` Wanpeng Li
0 siblings, 1 reply; 14+ messages in thread
From: David Matlack @ 2016-04-22 17:21 UTC (permalink / raw)
To: Wanpeng Li
Cc: kvm, Paolo Bonzini, linux-kernel@vger.kernel.org, Andy Lutomirski,
stable
On Fri, Apr 22, 2016 at 12:30 AM, Wanpeng Li <kernellwp@gmail.com> wrote:
> Hi Paolo and David,
> 2016-03-31 3:24 GMT+08:00 David Matlack <dmatlack@google.com>:
>>
>> kernel_fpu_begin() saves the current fpu context. If this uses
>> XSAVE[OPT], it may leave the xsave area in an undesirable state.
>> According to the SDM, during XSAVE bit i of XSTATE_BV is not modified
>> if bit i is 0 in xcr0. So it's possible that XSTATE_BV[i] == 1 and
>> xcr0[i] == 0 following an XSAVE.
>
> How XSAVE save bit i since SDM mentioned that "XSAVE saves state
> component i if and only if RFBM[i] = 1. "? RFBM[i] will be 0 if
> XSTATE_BV[i] == 1 && guest xcr0[i] == 0.
You are correct, RFBM[i] will be 0 and XSAVE does not save state
component i in this case. However, XSTATE_BV[i] is left untouched by
XSAVE (left as 1). On XRSTOR, the CPU checks if XSTATE_BV[i] == 1 &&
xcr0[i] == 0, and if so delivers a #GP.
If you are wondering how XSTATE_BV[i] could be 1 in the first place, I
suspect it is left over from a previous XSAVE (which sets XSTATE_BV[i]
to the value in XINUSE[i]).
>
> Regards,
> Wanpeng Li
>
>>
>> kernel_fpu_end() restores the fpu context. Now if any bit i in
>> XSTATE_BV == 1 while xcr0[i] == 0, XRSTOR generates a #GP. The
>> fault is trapped and SIGSEGV is delivered to the current process.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] kvm: x86: do not leak guest xcr0 into host interrupt handlers
2016-04-22 17:21 ` David Matlack
@ 2016-04-23 22:37 ` Wanpeng Li
0 siblings, 0 replies; 14+ messages in thread
From: Wanpeng Li @ 2016-04-23 22:37 UTC (permalink / raw)
To: David Matlack
Cc: kvm, Paolo Bonzini, linux-kernel@vger.kernel.org, Andy Lutomirski,
stable
2016-04-23 1:21 GMT+08:00 David Matlack <dmatlack@google.com>:
> On Fri, Apr 22, 2016 at 12:30 AM, Wanpeng Li <kernellwp@gmail.com> wrote:
>> Hi Paolo and David,
>> 2016-03-31 3:24 GMT+08:00 David Matlack <dmatlack@google.com>:
>>>
>>> kernel_fpu_begin() saves the current fpu context. If this uses
>>> XSAVE[OPT], it may leave the xsave area in an undesirable state.
>>> According to the SDM, during XSAVE bit i of XSTATE_BV is not modified
>>> if bit i is 0 in xcr0. So it's possible that XSTATE_BV[i] == 1 and
>>> xcr0[i] == 0 following an XSAVE.
>>
>> How XSAVE save bit i since SDM mentioned that "XSAVE saves state
>> component i if and only if RFBM[i] = 1. "? RFBM[i] will be 0 if
>> XSTATE_BV[i] == 1 && guest xcr0[i] == 0.
>
> You are correct, RFBM[i] will be 0 and XSAVE does not save state
> component i in this case. However, XSTATE_BV[i] is left untouched by
> XSAVE (left as 1). On XRSTOR, the CPU checks if XSTATE_BV[i] == 1 &&
> xcr0[i] == 0, and if so delivers a #GP.
However, SDM also mentioned that "If RFBM[i] = 0, XRSTOR does not
update state component i." So we #GP on a don't need restore bit i if
XSTATE_BV[I] == 1 && xcr0[0] ==0. That's where I miss I think, thanks
for your explanation.
Regard,
Wanpeng Li
>
> If you are wondering how XSTATE_BV[i] could be 1 in the first place, I
> suspect it is left over from a previous XSAVE (which sets XSTATE_BV[i]
> to the value in XINUSE[i]).
^ permalink raw reply [flat|nested] 14+ messages in thread