regression: nested: L1 3.15+ fails to load kvm-intel on L0 <3.15

kvm.vger.kernel.org archive mirror

* regression: nested: L1 3.15+ fails to load kvm-intel on L0 <3.15
@ 2015-03-18  8:46 Stefan Bader
  2015-03-18  9:18 ` Paolo Bonzini
  0 siblings, 1 reply; 7+ messages in thread
From: Stefan Bader @ 2015-03-18  8:46 UTC (permalink / raw)
  To: kvm, Linux Kernel Mailing List, Paolo Bonzini


Someone reported[1] that some of their L1 guests fail to load the kvm-intel
module (without giving many details). It turns out that this was (at least)
caused by

KVM: vmx: Allow the guest to run with dirty debug registers

as this adds VM_EXIT_SAVE_DEBUG_CONTROLS to the required MSR_IA32_VMX_EXIT_CTLS
bits. I am not sure whether this should be fixed up in pre-3.15 kernels or the
other way round. Maybe a naive question, but would it be sufficient to add this
bit as required to the VMCS setup of older kernels (without the code to make
any use of it)?

Regardless of that, I wonder whether the patch below (this version is untested)
sounds acceptable for upstream? At least it would make debugging much simpler. :)

--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2953,8 +2953,11 @@ static __init int adjust_vmx_controls(u32 ctl_min, u32 ct
        ctl |= vmx_msr_low;  /* bit == 1 in low word  ==> must be one  */

        /* Ensure minimum (required) set of control bits are supported. */
-       if (ctl_min & ~ctl)
+       if (ctl_min & ~ctl) {
+               printk(KERN_ERR "vmx: msr(%08x) does not match requirements. "
+                               "req=%08x cur=%08x\n", msr, ctl_min, ctl);
                return -EIO;
+       }

        *result = ctl;
        return 0;
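
For illustration only, the check that this patch instruments can be sketched
in userspace. This is not kernel code: check_vmx_controls() is a hypothetical
stand-in for adjust_vmx_controls() that takes the two MSR halves as arguments
instead of calling rdmsr(), and the two VM_EXIT_* values are assumed to match
arch/x86/include/asm/vmx.h.

```c
#include <errno.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed to match the definitions in arch/x86/include/asm/vmx.h. */
#define VM_EXIT_SAVE_DEBUG_CONTROLS   0x00000004u
#define VM_EXIT_HOST_ADDR_SPACE_SIZE  0x00000200u

/*
 * Hypothetical userspace stand-in for adjust_vmx_controls(): the low and
 * high halves of the capability MSR are passed in rather than read.
 */
static int check_vmx_controls(uint32_t ctl_min, uint32_t ctl_opt,
			      uint32_t msr_low, uint32_t msr_high,
			      uint32_t *result)
{
	uint32_t ctl = ctl_min | ctl_opt;

	ctl &= msr_high; /* bit == 0 in high word ==> must be zero */
	ctl |= msr_low;  /* bit == 1 in low word  ==> must be one  */

	/* A required bit that the MSR does not allow is fatal. */
	if (ctl_min & ~ctl) {
		fprintf(stderr, "vmx: requirements not met. req=%08x cur=%08x\n",
			(unsigned int)ctl_min, (unsigned int)ctl);
		return -EIO;
	}

	*result = ctl;
	return 0;
}
```

If msr_high lacks bit 0x4, as it would on an L0 that does not advertise the
save-debug-controls exit control, any ctl_min containing
VM_EXIT_SAVE_DEBUG_CONTROLS makes the function return -EIO. That is the
failure an L1 guest sees when loading kvm-intel.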

Thanks,
-Stefan

[1] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1431473


* Re: regression: nested: L1 3.15+ fails to load kvm-intel on L0 <3.15
  2015-03-18  8:46 regression: nested: L1 3.15+ fails to load kvm-intel on L0 <3.15 Stefan Bader
@ 2015-03-18  9:18 ` Paolo Bonzini
  2015-03-18  9:59   ` Stefan Bader
  2015-03-19 19:58   ` regression: nested: L1 3.15+ fails to load kvm-intel on L0 <3.10 Stefan Bader
  0 siblings, 2 replies; 7+ messages in thread
From: Paolo Bonzini @ 2015-03-18  9:18 UTC (permalink / raw)
  To: Stefan Bader, kvm, Linux Kernel Mailing List



On 18/03/2015 09:46, Stefan Bader wrote:
> 
> Regardless of that, I wonder whether the patch below (this version is untested)
> sounds acceptable for upstream? At least it would make debugging much simpler. :)
> 
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -2953,8 +2953,11 @@ static __init int adjust_vmx_controls(u32 ctl_min, u32 ct
>         ctl |= vmx_msr_low;  /* bit == 1 in low word  ==> must be one  */
> 
>         /* Ensure minimum (required) set of control bits are supported. */
> -       if (ctl_min & ~ctl)
> +       if (ctl_min & ~ctl) {
> +               printk(KERN_ERR "vmx: msr(%08x) does not match requirements. "
> +                               "req=%08x cur=%08x\n", msr, ctl_min, ctl);
>                 return -EIO;
> +       }
> 
>         *result = ctl;
>         return 0;

Yes, this is nice.  Maybe -ENODEV.

Also, a minimal patch for Ubuntu would probably be:

@@ -2850,7 +2851,7 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf)
 		      vmx_capability.ept, vmx_capability.vpid);
 	}
 
-	min = 0;
+	min = VM_EXIT_SAVE_DEBUG_CONTROLS;
 #ifdef CONFIG_X86_64
 	min |= VM_EXIT_HOST_ADDR_SPACE_SIZE;
 #endif

but I don't think it's a good idea to add it to stable kernels.

Paolo


* Re: regression: nested: L1 3.15+ fails to load kvm-intel on L0 <3.15
  2015-03-18  9:18 ` Paolo Bonzini
@ 2015-03-18  9:59   ` Stefan Bader
  2015-03-18 10:27     ` Paolo Bonzini
  2015-03-19 19:58   ` regression: nested: L1 3.15+ fails to load kvm-intel on L0 <3.10 Stefan Bader
  1 sibling, 1 reply; 7+ messages in thread
From: Stefan Bader @ 2015-03-18  9:59 UTC (permalink / raw)
  To: Paolo Bonzini, kvm, Linux Kernel Mailing List


On 18.03.2015 10:18, Paolo Bonzini wrote:
> 
> 
> On 18/03/2015 09:46, Stefan Bader wrote:
>>
>> Regardless of that, I wonder whether the patch below (this version is untested)
>> sounds acceptable for upstream? At least it would make debugging much simpler. :)
>>
>> --- a/arch/x86/kvm/vmx.c
>> +++ b/arch/x86/kvm/vmx.c
>> @@ -2953,8 +2953,11 @@ static __init int adjust_vmx_controls(u32 ctl_min, u32 ct
>>         ctl |= vmx_msr_low;  /* bit == 1 in low word  ==> must be one  */
>>
>>         /* Ensure minimum (required) set of control bits are supported. */
>> -       if (ctl_min & ~ctl)
>> +       if (ctl_min & ~ctl) {
>> +               printk(KERN_ERR "vmx: msr(%08x) does not match requirements. "
>> +                               "req=%08x cur=%08x\n", msr, ctl_min, ctl);
>>                 return -EIO;
>> +       }
>>
>>         *result = ctl;
>>         return 0;
> 
> Yes, this is nice.  Maybe -ENODEV.

Maybe, though I did not change that. I just added the message to give some kind
of hint when the module would otherwise fail with nothing but an I/O error.

> 
> Also, a minimal patch for Ubuntu would probably be:
> 
> @@ -2850,7 +2851,7 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf)
>  		      vmx_capability.ept, vmx_capability.vpid);
>  	}
>  
> -	min = 0;
> +	min = VM_EXIT_SAVE_DEBUG_CONTROLS;
>  #ifdef CONFIG_X86_64
>  	min |= VM_EXIT_HOST_ADDR_SPACE_SIZE;
>  #endif
> 
> but I don't think it's a good idea to add it to stable kernels.

Why is that? Because it risks causing the module to fail to load on an L0 where
it worked before? That is something I would rather avoid. In general I think it
would be good to have a fix that can be applied widely, given the speed at
which cloud service providers tend to move forward (OK, they may not actively
push the ability to go nested).

-Stefan
> 
> Paolo
> 




* Re: regression: nested: L1 3.15+ fails to load kvm-intel on L0 <3.15
  2015-03-18  9:59   ` Stefan Bader
@ 2015-03-18 10:27     ` Paolo Bonzini
  2015-03-18 10:30       ` Stefan Bader
  0 siblings, 1 reply; 7+ messages in thread
From: Paolo Bonzini @ 2015-03-18 10:27 UTC (permalink / raw)
  To: Stefan Bader, kvm, Linux Kernel Mailing List



On 18/03/2015 10:59, Stefan Bader wrote:
>> @@ -2850,7 +2851,7 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf)
>>  		      vmx_capability.ept, vmx_capability.vpid);
>>  	}
>>  
>> -	min = 0;
>> +	min = VM_EXIT_SAVE_DEBUG_CONTROLS;
>>  #ifdef CONFIG_X86_64
>>  	min |= VM_EXIT_HOST_ADDR_SPACE_SIZE;
>>  #endif
>> 
>> but I don't think it's a good idea to add it to stable kernels.
> 
> Why is that? Because it risks causing the module to fail to load on
> an L0 where it worked before?

Because if we wanted to make 3.14 nested VMX stable-ish we would need
several more patches, at least these:

      KVM: nVMX: fix lifetime issues for vmcs02
      KVM: nVMX: clean up nested_release_vmcs12 and code around it
      KVM: nVMX: Rework interception of IRQs and NMIs
      KVM: nVMX: Do not inject NMI vmexits when L2 has a pending
                 interrupt
      KVM: nVMX: Disable preemption while reading from shadow VMCS

and for 3.13:

      KVM: nVMX: Leave VMX mode on clearing of feature control MSR

There are also fixes for several L2-crash-L1 bugs in Nadav Amit's patches.

Basically, nested VMX was never considered stable-worthy.  Perhaps
that can change soon---but not retroactively.

So I'd rather avoid giving false impressions of the stability of nVMX
in 3.14.

Even if we considered nVMX stable, I'd _really_ not want to consider
the L1<->L2 boundary a secure one for a longer time.

> That is something I would rather avoid. In general I think it would
> be good to have a fix that can be applied widely, given the speed at
> which cloud service providers tend to move forward (OK, they may not
> actively push the ability to go nested).

And if they did, I'd really not want them to do it with a 3.14 kernel.

Paolo


* Re: regression: nested: L1 3.15+ fails to load kvm-intel on L0 <3.15
  2015-03-18 10:27     ` Paolo Bonzini
@ 2015-03-18 10:30       ` Stefan Bader
  0 siblings, 0 replies; 7+ messages in thread
From: Stefan Bader @ 2015-03-18 10:30 UTC (permalink / raw)
  To: Paolo Bonzini, kvm, Linux Kernel Mailing List


On 18.03.2015 11:27, Paolo Bonzini wrote:
> 
> 
> On 18/03/2015 10:59, Stefan Bader wrote:
>>> @@ -2850,7 +2851,7 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf)
>>>  		      vmx_capability.ept, vmx_capability.vpid);
>>>  	}
>>>  
>>> -	min = 0;
>>> +	min = VM_EXIT_SAVE_DEBUG_CONTROLS;
>>>  #ifdef CONFIG_X86_64
>>>  	min |= VM_EXIT_HOST_ADDR_SPACE_SIZE;
>>>  #endif
>>>
>>> but I don't think it's a good idea to add it to stable kernels.
>>
>> Why is that? Because it risks causing the module to fail to load on
>> an L0 where it worked before?
> 
> Because if we wanted to make 3.14 nested VMX stable-ish we would need
> several more patches, at least these:
> 
>       KVM: nVMX: fix lifetime issues for vmcs02
>       KVM: nVMX: clean up nested_release_vmcs12 and code around it
>       KVM: nVMX: Rework interception of IRQs and NMIs
>       KVM: nVMX: Do not inject NMI vmexits when L2 has a pending
>                  interrupt
>       KVM: nVMX: Disable preemption while reading from shadow VMCS
> 
> and for 3.13:
> 
>       KVM: nVMX: Leave VMX mode on clearing of feature control MSR
> 
> There are also fixes for several L2-crash-L1 bugs in Nadav Amit's patches.
> 
> Basically, nested VMX was never considered stable-worthy.  Perhaps
> that can change soon---but not retroactively.
> 
> So I'd rather avoid giving false impressions of the stability of nVMX
> in 3.14.
> 
> Even if we considered nVMX stable, I'd _really_ not want to consider
> the L1<->L2 boundary a secure one for a longer time.
> 
>> That is something I would rather avoid. In general I think it would
>> be good to have a fix that can be applied widely, given the speed at
>> which cloud service providers tend to move forward (OK, they may not
>> actively push the ability to go nested).
> 
> And if they did, I'd really not want them to do it with a 3.14 kernel.

3.14... you are optimistic. :) But thanks a lot for the detailed info.

-Stefan

> 
> Paolo
> 




* Re: regression: nested: L1 3.15+ fails to load kvm-intel on L0 <3.10
  2015-03-18  9:18 ` Paolo Bonzini
  2015-03-18  9:59   ` Stefan Bader
@ 2015-03-19 19:58   ` Stefan Bader
  2015-03-19 20:08     ` Paolo Bonzini
  1 sibling, 1 reply; 7+ messages in thread
From: Stefan Bader @ 2015-03-19 19:58 UTC (permalink / raw)
  To: Paolo Bonzini, kvm, Linux Kernel Mailing List, Ben Hutchings


On 18.03.2015 10:18, Paolo Bonzini wrote:
> 
> 
> On 18/03/2015 09:46, Stefan Bader wrote:
>>
>> Regardless of that, I wonder whether the patch below (this version is untested)
>> sounds acceptable for upstream? At least it would make debugging much simpler. :)
>>
>> --- a/arch/x86/kvm/vmx.c
>> +++ b/arch/x86/kvm/vmx.c
>> @@ -2953,8 +2953,11 @@ static __init int adjust_vmx_controls(u32 ctl_min, u32 ct
>>         ctl |= vmx_msr_low;  /* bit == 1 in low word  ==> must be one  */
>>
>>         /* Ensure minimum (required) set of control bits are supported. */
>> -       if (ctl_min & ~ctl)
>> +       if (ctl_min & ~ctl) {
>> +               printk(KERN_ERR "vmx: msr(%08x) does not match requirements. "
>> +                               "req=%08x cur=%08x\n", msr, ctl_min, ctl);
>>                 return -EIO;
>> +       }
>>
>>         *result = ctl;
>>         return 0;
> 
> Yes, this is nice.  Maybe -ENODEV.
> 
> Also, a minimal patch for Ubuntu would probably be:
> 
> @@ -2850,7 +2851,7 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf)
>  		      vmx_capability.ept, vmx_capability.vpid);
>  	}
>  
> -	min = 0;
> +	min = VM_EXIT_SAVE_DEBUG_CONTROLS;
>  #ifdef CONFIG_X86_64
>  	min |= VM_EXIT_HOST_ADDR_SPACE_SIZE;
>  #endif
> 
> but I don't think it's a good idea to add it to stable kernels.

Sorry, I got a bit confused in my assumptions. While the change above does
cause guests to fail, my statement that this is caused by host kernels from
before that change was wrong, and I should have known better.

The hosts actually affected are those running 3.2: they allowed guest kernels
<3.15 to use nested vmx (maybe not perfectly, but well enough), yet break with
newer guests. Running 3.13 on the host has no issues.

Comparing the rdmsr values of guests between those two host kernels, I found
that on 3.2 the exit control msr was very sparsely initialized. And looking at
the changes between 3.2 and 3.13 I found

commit 33fb20c39e98b90813b5ab2d9a0d6faa6300caca
Author: Jan Kiszka <jan.kiszka@siemens.com>
Date:   Wed Mar 6 15:44:03 2013 +0100

    KVM: nVMX: Fix content of MSR_IA32_VMX_ENTRY/EXIT_CTLS

This was added in 3.10. So the range of affected kernels is <3.10, back to when
nested vmx became somewhat usable. For 3.2, Ben (and obviously we) would be
affected. Apart from that, I believe only 3.4 has an active longterm release.
At least that change looks safer for stable, as it sounds like correcting
things rather than adding a feature. I was able to cherry-pick it into a 3.2
kernel, and a 3.16 guest can then successfully load the kvm-intel module
again, of course with the same shortcomings as before.
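
For anyone wanting to repeat the rdmsr comparison, a minimal sketch of how
the raw value can be decoded follows. It is an illustration under stated
assumptions: the MSR index and bit value are assumed to match
arch/x86/include/asm/vmx.h, exit_ctl_allowed() and exit_ctl_forced() are
hypothetical helper names, and the 64-bit value is whatever a tool such as
rdmsr from msr-tools prints for MSR 0x483 inside the L1 guest.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed to match the definitions in arch/x86/include/asm/vmx.h. */
#define MSR_IA32_VMX_EXIT_CTLS       0x00000483u
#define VM_EXIT_SAVE_DEBUG_CONTROLS  0x00000004u

/*
 * Hypothetical helper: the high 32 bits of the capability MSR list the
 * controls that may be set to 1, so a control is usable only if all of
 * its bits are present there.
 */
static bool exit_ctl_allowed(uint64_t msr_value, uint32_t bits)
{
	uint32_t allowed1 = (uint32_t)(msr_value >> 32);

	return (allowed1 & bits) == bits;
}

/*
 * Hypothetical helper: the low 32 bits list the controls that must be
 * set to 1.
 */
static bool exit_ctl_forced(uint64_t msr_value, uint32_t bits)
{
	uint32_t must_be_one = (uint32_t)msr_value;

	return (must_be_one & bits) == bits;
}
```

A 3.15+ guest needs exit_ctl_allowed(value, VM_EXIT_SAVE_DEBUG_CONTROLS) to
hold for the value read from MSR_IA32_VMX_EXIT_CTLS; on a host where the
high word does not contain bit 0x4 the kvm-intel module refuses to load.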

-Stefan
> 
> Paolo
> 


* Re: regression: nested: L1 3.15+ fails to load kvm-intel on L0 <3.10
  2015-03-19 19:58   ` regression: nested: L1 3.15+ fails to load kvm-intel on L0 <3.10 Stefan Bader
@ 2015-03-19 20:08     ` Paolo Bonzini
  0 siblings, 0 replies; 7+ messages in thread
From: Paolo Bonzini @ 2015-03-19 20:08 UTC (permalink / raw)
  To: Stefan Bader, kvm, Linux Kernel Mailing List, Ben Hutchings



On 19/03/2015 20:58, Stefan Bader wrote:
> This was added in 3.10. So the range of affected kernels is <3.10,
> back to when nested vmx became somewhat usable. For 3.2, Ben (and
> obviously we) would be affected. Apart from that, I believe only
> 3.4 has an active longterm release. At least that change looks
> safer for stable, as it sounds like correcting things rather than
> adding a feature. I was able to cherry-pick it into a 3.2 kernel,
> and a 3.16 guest can then successfully load the kvm-intel module
> again, of course with the same shortcomings as before.

Feel free to backport whatever you want to distro kernels.  But I'm
going to NACK for stable@ anything that is related to nested virt.

The code has changed so much that I simply cannot do a meaningful
review of most patches when applied to old codebases.

Paolo
