* nVMX: Shadowing of CPU_BASED_VM_EXEC_CONTROL broken
From: Jan Kiszka @ 2014-10-08 8:29 UTC
To: kvm, Paolo Bonzini, Bandan Das
Hi all,
After migrating a Jailhouse VM to a newer host platform with shadow VMCS
support, I found a bug. As you may know, Jailhouse doesn't intercept
interrupts and thus never requests an interrupt window. Nevertheless:
qemu-system-x86-5777 [001] 74970.625324: kvm_mmio: mmio write len 4 gpa 0xfebf5008 val 0x20
qemu-system-x86-5777 [001] 74970.625325: kvm_userspace_exit: reason KVM_EXIT_MMIO (6)
qemu-system-x86-5777 [001] 74970.625330: kvm_entry: vcpu 1
qemu-system-x86-5777 [001] 74970.625333: kvm_exit: reason PENDING_INTERRUPT rip 0xffffffff81043e54 info 0 0
qemu-system-x86-5777 [001] 74970.625333: kvm_nested_vmexit: rip: 0xffffffff81043e54 reason: PENDING_INTERRUPT ext_inf1: 0x0000000000000000 ext_inf2: 0x0000000000000000 ext_int: 0x00000000 ext_int_err: 0x00000000
qemu-system-x86-5777 [001] 74970.625334: kvm_nested_vmexit_inject: reason: PENDING_INTERRUPT ext_inf1: 0x0000000000000000 ext_inf2: 0x0000000000000000 ext_int: 0x00000000 ext_int_err: 0x00000000
qemu-system-x86-5777 [001] 74970.625339: kvm_entry: vcpu 1
qemu-system-x86-5777 [001] 74970.625341: kvm_exit: reason EPT_MISCONFIG rip 0xfffffffff0002307 info 0 0
qemu-system-x86-5777 [001] 74970.625343: kvm_emulate_insn: 0:fffffffff0002307:8b 40 20 (prot64)
And then Jailhouse crashes (why exactly is also worth understanding: the
L1 host state is corrupt). Anyway, the point is that we leak
CPU_BASED_VIRTUAL_INTR_PENDING from L0 into vmcs12. L0 sets it before
entering L2, and then, because that VMCS field is shadowed, we transfer
it from the hardware state to vmcs12 on exit. The crash disappears when
VMCS shadowing is disabled.
Can we simply stop shadowing CPU_BASED_VM_EXEC_CONTROL when
CPU_BASED_VIRTUAL_INTR_PENDING is injected to L2?
Jan
--
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux
* Re: nVMX: Shadowing of CPU_BASED_VM_EXEC_CONTROL broken
From: Paolo Bonzini @ 2014-10-08 8:47 UTC
To: Jan Kiszka, kvm, Bandan Das
On 08/10/2014 10:29, Jan Kiszka wrote:
> Can we simply stop shadowing CPU_BASED_VM_EXEC_CONTROL when
> CPU_BASED_VIRTUAL_INTR_PENDING is injected to L2?
The main problem is that we have a single shadowing bitmap for all
virtual machines. I cannot think of a simple solution except not
shadowing that field at all.
Paolo
* Re: nVMX: Shadowing of CPU_BASED_VM_EXEC_CONTROL broken
From: Jan Kiszka @ 2014-10-08 8:54 UTC
To: Paolo Bonzini, kvm, Bandan Das
On 2014-10-08 10:47, Paolo Bonzini wrote:
> The main problem is that we have a single shadowing bitmap for all
> virtual machines. I cannot think of a simple solution except not
> shadowing that field at all.
But that could be changed to a per-VCPU bitmap, no? For the time being,
dropping the field from the shadowed list is likely the quickest fix.
Are there more cases like this, where the host writes to hardware state
that is also used for shadowing? It's probably worth taking a second look...
Jan
* Re: nVMX: Shadowing of CPU_BASED_VM_EXEC_CONTROL broken
From: Wanpeng Li @ 2014-10-08 9:25 UTC
To: Jan Kiszka; +Cc: kvm, Paolo Bonzini, Bandan Das
Hi Jan,
On Wed, Oct 08, 2014 at 10:29:45AM +0200, Jan Kiszka wrote:
>And then Jailhouse crashes (which is also interesting to understand why
>- L1 host state is corrupt). Anyway, the point is that we leak
>CPU_BASED_VIRTUAL_INTR_PENDING from L0 into vmcs12. L0 sets it before
In prepare_vmcs02:
exec_control = vmx_exec_control(vmx); /* L0's desires */
exec_control &= ~CPU_BASED_VIRTUAL_INTR_PENDING;
exec_control &= ~CPU_BASED_VIRTUAL_NMI_PENDING;
exec_control &= ~CPU_BASED_TPR_SHADOW;
exec_control |= vmcs12->cpu_based_vm_exec_control;
Could you point out the other places where L0 sets
CPU_BASED_VIRTUAL_INTR_PENDING before entering L2?
Regards,
Wanpeng Li
* Re: nVMX: Shadowing of CPU_BASED_VM_EXEC_CONTROL broken
From: Jan Kiszka @ 2014-10-08 9:51 UTC
To: Wanpeng Li; +Cc: kvm, Paolo Bonzini, Bandan Das
On 2014-10-08 11:25, Wanpeng Li wrote:
> In prepare_vmcs02:
>
> exec_control = vmx_exec_control(vmx); /* L0's desires */
> exec_control &= ~CPU_BASED_VIRTUAL_INTR_PENDING;
> exec_control &= ~CPU_BASED_VIRTUAL_NMI_PENDING;
> exec_control &= ~CPU_BASED_TPR_SHADOW;
> exec_control |= vmcs12->cpu_based_vm_exec_control;
>
> Could you point out the other places where L0 sets
> CPU_BASED_VIRTUAL_INTR_PENDING before entering L2?
enable_irq_window(). I instrumented it, and it showed up right before
vmcs12 state became corrupted.
Jan
* Re: nVMX: Shadowing of CPU_BASED_VM_EXEC_CONTROL broken
From: Paolo Bonzini @ 2014-10-08 10:20 UTC
To: Jan Kiszka, Wanpeng Li; +Cc: kvm, Bandan Das
On 08/10/2014 11:51, Jan Kiszka wrote:
>> > Could you point out the other places where L0 sets
>> > CPU_BASED_VIRTUAL_INTR_PENDING before entering L2?
> enable_irq_window(). I instrumented it, and it showed up right before
> vmcs12 state became corrupted.
But it would write to the vmcs02, not to the shadow VMCS; the shadow
VMCS is active during copy_shadow_to_vmcs12/copy_vmcs12_to_shadow, and
at no other time. It is not clear to me how the VIRTUAL_INTR_PENDING
bit ended up from the vmcs02 (where it is perfectly fine) to the vmcs12.
BTW, I think the two lines here that Wanpeng pointed out:
exec_control = vmx_exec_control(vmx); /* L0's desires */
exec_control &= ~CPU_BASED_VIRTUAL_INTR_PENDING;
exec_control &= ~CPU_BASED_VIRTUAL_NMI_PENDING;
can be deleted: the bits will never be set in vmx_exec_control(vmx); see
setup_vmcs_config.
Paolo
* Re: nVMX: Shadowing of CPU_BASED_VM_EXEC_CONTROL broken
From: Jan Kiszka @ 2014-10-08 10:29 UTC
To: Paolo Bonzini, Wanpeng Li; +Cc: kvm, Bandan Das
On 2014-10-08 12:20, Paolo Bonzini wrote:
> On 08/10/2014 11:51, Jan Kiszka wrote:
>>>> Could you point out the other places where L0 sets
>>>> CPU_BASED_VIRTUAL_INTR_PENDING before entering L2?
>> enable_irq_window(). I instrumented it, and it showed up right before
>> vmcs12 state became corrupted.
>
> But it would write to the vmcs02, not to the shadow VMCS; the shadow
> VMCS is active during copy_shadow_to_vmcs12/copy_vmcs12_to_shadow, and
> at no other time. It is not clear to me how the VIRTUAL_INTR_PENDING
> bit ended up from the vmcs02 (where it is perfectly fine) to the vmcs12.
Well, but somehow that bit ends up in vmcs12, that's a fact. Also that
the problem disappears when shadowing is disabled. Need to think about
the path again. Maybe there is just a bug, not a conceptual issue.
Jan
* Re: nVMX: Shadowing of CPU_BASED_VM_EXEC_CONTROL broken
From: Paolo Bonzini @ 2014-10-08 10:34 UTC
To: Jan Kiszka, Wanpeng Li; +Cc: kvm, Bandan Das
On 08/10/2014 12:29, Jan Kiszka wrote:
>> > But it would write to the vmcs02, not to the shadow VMCS; the shadow
>> > VMCS is active during copy_shadow_to_vmcs12/copy_vmcs12_to_shadow, and
>> > at no other time. It is not clear to me how the VIRTUAL_INTR_PENDING
>> > bit ended up from the vmcs02 (where it is perfectly fine) to the vmcs12.
> Well, but somehow that bit ends up in vmcs12, that's a fact. Also that
> the problem disappears when shadowing is disabled. Need to think about
> the path again. Maybe there is just a bug, not a conceptual issue.
Yeah, and at this point we cannot actually exclude a processor bug. Can
you check that the bit is not in the shadow VMCS just before vmrun, or
just after enable_irq_window?
Having a kvm-unit-tests testcase could also be of some help.
Paolo
* Re: nVMX: Shadowing of CPU_BASED_VM_EXEC_CONTROL broken
From: Jan Kiszka @ 2014-10-08 15:07 UTC
To: Paolo Bonzini, Wanpeng Li; +Cc: kvm, Bandan Das
On 2014-10-08 12:34, Paolo Bonzini wrote:
> Yeah, and at this point we cannot actually exclude a processor bug. Can
> you check that the bit is not in the shadow VMCS just before vmrun, or
> just after enable_irq_window?
>
> Having a kvm-unit-tests testcase could also be of some help.
As usual, this was a nasty race that involved several concurrent VCPUs
and real host load, so it is hard to write a unit test for it...
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 04fa1b8..d6bcaca 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -6417,6 +6417,8 @@ static void copy_shadow_to_vmcs12(struct vcpu_vmx *vmx)
 	const unsigned long *fields = shadow_read_write_fields;
 	const int num_fields = max_shadow_read_write_fields;
 
+	preempt_disable();
+
 	vmcs_load(shadow_vmcs);
 
 	for (i = 0; i < num_fields; i++) {
@@ -6440,6 +6442,8 @@ static void copy_shadow_to_vmcs12(struct vcpu_vmx *vmx)
 
 	vmcs_clear(shadow_vmcs);
 	vmcs_load(vmx->loaded_vmcs->vmcs);
+
+	preempt_enable();
 }
 
 static void copy_vmcs12_to_shadow(struct vcpu_vmx *vmx)
@@ -6457,6 +6461,8 @@ static void copy_vmcs12_to_shadow(struct vcpu_vmx *vmx)
 	u64 field_value = 0;
 	struct vmcs *shadow_vmcs = vmx->nested.current_shadow_vmcs;
 
+	preempt_disable();
+
 	vmcs_load(shadow_vmcs);
 
 	for (q = 0; q < ARRAY_SIZE(fields); q++) {
@@ -6483,6 +6489,8 @@ static void copy_vmcs12_to_shadow(struct vcpu_vmx *vmx)
 
 	vmcs_clear(shadow_vmcs);
 	vmcs_load(vmx->loaded_vmcs->vmcs);
+
+	preempt_enable();
 }
 
 /*
No proper patch yet because there might be a smarter approach without
using the preempt_disable() hammer. But the point is that we temporarily
load a vmcs without updating loaded_vmcs->vmcs. Now, if some other VCPU
is scheduled in right in the middle of this, the wrong vmcs will be
flushed and then reloaded, e.g. a non-shadow vmcs with that
interrupt-window flag set...
Patch is currently under heavy load testing here, but it looks very good
as the bug was quickly reproducible before I applied it.
Jan
* Re: nVMX: Shadowing of CPU_BASED_VM_EXEC_CONTROL broken
From: Paolo Bonzini @ 2014-10-08 15:44 UTC
To: Jan Kiszka, Wanpeng Li; +Cc: kvm, Bandan Das
On 08/10/2014 17:07, Jan Kiszka wrote:
> No proper patch yet because there might be a smarter approach without
> using the preempt_disable() hammer.
copy_vmcs12_to_shadow already runs with preemption disabled; for stable@
it's not that bad to do the same in copy_shadow_to_vmcs12.
For 3.18 it would of course be nice to use loaded_vmcs properly, but that
would also incur some overhead.
Paolo
* Re: nVMX: Shadowing of CPU_BASED_VM_EXEC_CONTROL broken
From: Jan Kiszka @ 2014-10-08 16:07 UTC
To: Paolo Bonzini, Wanpeng Li; +Cc: kvm, Bandan Das
On 2014-10-08 17:44, Paolo Bonzini wrote:
> copy_vmcs12_to_shadow already runs with preemption disabled; for stable@
> it's not that bad to do the same in copy_shadow_to_vmcs12.
>
> For 3.18 it could be nice of course to use loaded_vmcs properly, but it
> would also incur some overhead.
If the other direction already runs under preempt_disable, I'm not sure
there is much to gain in this direction.
Anyway, fix sent.
Jan
* Re: nVMX: Shadowing of CPU_BASED_VM_EXEC_CONTROL broken
From: Wanpeng Li @ 2014-10-08 23:34 UTC
To: Jan Kiszka, Paolo Bonzini; +Cc: kvm, Bandan Das
On Wed, Oct 08, 2014 at 05:07:48PM +0200, Jan Kiszka wrote:
>No proper patch yet because there might be a smarter approach without
>using the preempt_disable() hammer. But the point is that we temporarily
>load a vmcs without updating loaded_vmcs->vmcs. Now, if some other VCPU
>is scheduling in right in the middle of this, the wrong vmcs will be
>flushed and then reloaded - e.g. a non-shadow vmcs with that interrupt
>window flag set...
Can non-shadow vmcs and shadow vmcs be present in one system simultaneously?
Regards,
Wanpeng Li
* Re: nVMX: Shadowing of CPU_BASED_VM_EXEC_CONTROL broken
From: Wanpeng Li @ 2014-10-08 23:58 UTC
To: Jan Kiszka, Paolo Bonzini; +Cc: kvm, Bandan Das
On Thu, Oct 09, 2014 at 07:34:47AM +0800, Wanpeng Li wrote:
>On Wed, Oct 08, 2014 at 05:07:48PM +0200, Jan Kiszka wrote:
>>No proper patch yet because there might be a smarter approach without
>>using the preempt_disable() hammer. But the point is that we temporarily
>>load a vmcs without updating loaded_vmcs->vmcs. Now, if some other VCPU
>>is scheduling in right in the middle of this, the wrong vmcs will be
>>flushed and then reloaded - e.g. a non-shadow vmcs with that interrupt
>>window flag set...
>
>Can non-shadow vmcs and shadow vmcs be present in one system simultaneously?
Ah, got it, you mean non-current-shadow vmcs.
Regards,
Wanpeng Li