* [PATCH v2 0/5] KVM: nVMX: Make direct IRQ/NMI injection work
@ 2013-03-16 10:23 Jan Kiszka
2013-03-16 10:23 ` [PATCH v2 1/5] KVM: nVMX: Fix injection of PENDING_INTERRUPT and NMI_WINDOW exits to L1 Jan Kiszka
` (5 more replies)
0 siblings, 6 replies; 13+ messages in thread
From: Jan Kiszka @ 2013-03-16 10:23 UTC (permalink / raw)
To: Gleb Natapov, Marcelo Tosatti; +Cc: kvm, Paolo Bonzini, Nadav Har'El
Version 2 takes review comments into account, reorders some patches that
have dependencies, and addresses new findings regarding NMI injection.
The fixes for vmx_interrupt_allowed and vmx_nmi_allowed were split up as
they turned out to be more different from each other than expected and
are independent anyway. Finally, there was still a bug in the handling of
EXIT_REASON_NMI_WINDOW: the wrong condition was checked.
Applies on top of 'queue'.
Jan Kiszka (5):
KVM: nVMX: Fix injection of PENDING_INTERRUPT and NMI_WINDOW exits to
L1
KVM: nVMX: Rework event injection and recovery
KVM: VMX: Move vmx_nmi_allowed after vmx_set_nmi_mask
KVM: nVMX: Fix conditions for interrupt injection
KVM: nVMX: Fix conditions for NMI injection
arch/x86/kvm/vmx.c | 210 ++++++++++++++++++++++++++++++++++++++--------------
1 files changed, 154 insertions(+), 56 deletions(-)
--
1.7.3.4
* [PATCH v2 1/5] KVM: nVMX: Fix injection of PENDING_INTERRUPT and NMI_WINDOW exits to L1
2013-03-16 10:23 [PATCH v2 0/5] KVM: nVMX: Make direct IRQ/NMI injection work Jan Kiszka
@ 2013-03-16 10:23 ` Jan Kiszka
2013-03-17 13:47 ` Gleb Natapov
2013-03-16 10:23 ` [PATCH v2 2/5] KVM: nVMX: Rework event injection and recovery Jan Kiszka
` (4 subsequent siblings)
5 siblings, 1 reply; 13+ messages in thread
From: Jan Kiszka @ 2013-03-16 10:23 UTC (permalink / raw)
To: Gleb Natapov, Marcelo Tosatti; +Cc: kvm, Paolo Bonzini, Nadav Har'El
From: Jan Kiszka <jan.kiszka@siemens.com>
Check if the interrupt or NMI window exit is for L1 by testing if it has
the corresponding controls enabled. This is required when we allow
direct injection from L0 to L2.
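For reference, nested_cpu_has() just tests a bit in the CPU-based
execution controls that L1 configured in vmcs12. Roughly (simplified
sketch of the helper in vmx.c):

	static inline bool nested_cpu_has(struct vmcs12 *vmcs12, u32 bit)
	{
		/* did L1 enable this processor-based execution control? */
		return vmcs12->cpu_based_vm_exec_control & bit;
	}

So the window exit is reflected to L1 only if L1 itself requested the
corresponding window exiting.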
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
arch/x86/kvm/vmx.c | 9 ++-------
1 files changed, 2 insertions(+), 7 deletions(-)
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index ad978a6..126d047 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -6111,14 +6111,9 @@ static bool nested_vmx_exit_handled(struct kvm_vcpu *vcpu)
case EXIT_REASON_TRIPLE_FAULT:
return 1;
case EXIT_REASON_PENDING_INTERRUPT:
+ return nested_cpu_has(vmcs12, CPU_BASED_VIRTUAL_INTR_PENDING);
case EXIT_REASON_NMI_WINDOW:
- /*
- * prepare_vmcs02() set the CPU_BASED_VIRTUAL_INTR_PENDING bit
- * (aka Interrupt Window Exiting) only when L1 turned it on,
- * so if we got a PENDING_INTERRUPT exit, this must be for L1.
- * Same for NMI Window Exiting.
- */
- return 1;
+ return nested_cpu_has(vmcs12, CPU_BASED_VIRTUAL_NMI_PENDING);
case EXIT_REASON_TASK_SWITCH:
return 1;
case EXIT_REASON_CPUID:
--
1.7.3.4
* [PATCH v2 2/5] KVM: nVMX: Rework event injection and recovery
2013-03-16 10:23 [PATCH v2 0/5] KVM: nVMX: Make direct IRQ/NMI injection work Jan Kiszka
2013-03-16 10:23 ` [PATCH v2 1/5] KVM: nVMX: Fix injection of PENDING_INTERRUPT and NMI_WINDOW exits to L1 Jan Kiszka
@ 2013-03-16 10:23 ` Jan Kiszka
2013-03-17 13:45 ` Gleb Natapov
2013-03-16 10:23 ` [PATCH v2 3/5] KVM: VMX: Move vmx_nmi_allowed after vmx_set_nmi_mask Jan Kiszka
` (3 subsequent siblings)
5 siblings, 1 reply; 13+ messages in thread
From: Jan Kiszka @ 2013-03-16 10:23 UTC (permalink / raw)
To: Gleb Natapov, Marcelo Tosatti; +Cc: kvm, Paolo Bonzini, Nadav Har'El
From: Jan Kiszka <jan.kiszka@siemens.com>
The basic idea is to always transfer the pending event injection on
vmexit into the architectural state of the VCPU and then drop it from
there if it turns out that we left L2 to enter L1.
VMX and SVM are now identical in how they recover event injections from
unperformed vmlaunch/vmresume: We detect that VM_ENTRY_INTR_INFO_FIELD
still contains a valid event and, if yes, transfer the content into L1's
idt_vectoring_info_field.
However, we differ on how to deal with events that L0 wanted to inject
into L2. Likely, this case is still broken in SVM. For VMX, the function
vmcs12_save_pending_events deals with transferring pending L0 events
into the queue of L1. That is mandatory as L1 may decide to switch the
guest state completely, invalidating or preserving the pending events
for later injection (including on a different node, once we support
migration).
Note that we treat directly injected NMIs differently as they can hit
both L1 and L2. In this case, we let L0 retry the injection, this time
over L1, after leaving L2.
To avoid incorrectly leaking an event that L1 wants to inject into the
architectural VCPU state, we skip cancellation on nested run.
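Condensed, the new flow in prepare_vmcs12() on nested vmexit is roughly
(simplified sketch; error code and instruction length handling omitted):

	if ((vmcs12->vm_entry_intr_info_field & INTR_INFO_VALID_MASK) &&
	    (vmcs_read32(VM_ENTRY_INTR_INFO_FIELD) & INTR_INFO_VALID_MASK)) {
		/* L1's injection was never performed - hand the event
		 * back via vmcs12->idt_vectoring_info_field */
	} else {
		/* queue events that L0 wanted to inject into L2 for
		 * reinjection by L1 */
		vmcs12_save_pending_events(vcpu, vmcs12);
	}
	/* whatever was preserved above must not stay in the
	 * architectural queues, or it would leak into L1 */
	kvm_clear_exception_queue(vcpu);
	kvm_clear_interrupt_queue(vcpu);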
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
arch/x86/kvm/vmx.c | 118 ++++++++++++++++++++++++++++++++++++++--------------
1 files changed, 87 insertions(+), 31 deletions(-)
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 126d047..ca74358 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -6492,8 +6492,6 @@ static void __vmx_complete_interrupts(struct kvm_vcpu *vcpu,
static void vmx_complete_interrupts(struct vcpu_vmx *vmx)
{
- if (is_guest_mode(&vmx->vcpu))
- return;
__vmx_complete_interrupts(&vmx->vcpu, vmx->idt_vectoring_info,
VM_EXIT_INSTRUCTION_LEN,
IDT_VECTORING_ERROR_CODE);
@@ -6501,7 +6499,7 @@ static void vmx_complete_interrupts(struct vcpu_vmx *vmx)
static void vmx_cancel_injection(struct kvm_vcpu *vcpu)
{
- if (is_guest_mode(vcpu))
+ if (to_vmx(vcpu)->nested.nested_run_pending)
return;
__vmx_complete_interrupts(vcpu,
vmcs_read32(VM_ENTRY_INTR_INFO_FIELD),
@@ -6534,21 +6532,6 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu *vcpu)
struct vcpu_vmx *vmx = to_vmx(vcpu);
unsigned long debugctlmsr;
- if (is_guest_mode(vcpu) && !vmx->nested.nested_run_pending) {
- struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
- if (vmcs12->idt_vectoring_info_field &
- VECTORING_INFO_VALID_MASK) {
- vmcs_write32(VM_ENTRY_INTR_INFO_FIELD,
- vmcs12->idt_vectoring_info_field);
- vmcs_write32(VM_ENTRY_INSTRUCTION_LEN,
- vmcs12->vm_exit_instruction_len);
- if (vmcs12->idt_vectoring_info_field &
- VECTORING_INFO_DELIVER_CODE_MASK)
- vmcs_write32(VM_ENTRY_EXCEPTION_ERROR_CODE,
- vmcs12->idt_vectoring_error_code);
- }
- }
-
/* Record the guest's net vcpu time for enforced NMI injections. */
if (unlikely(!cpu_has_virtual_nmis() && vmx->soft_vnmi_blocked))
vmx->entry_time = ktime_get();
@@ -6707,17 +6690,6 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu *vcpu)
vmx->idt_vectoring_info = vmcs_read32(IDT_VECTORING_INFO_FIELD);
- if (is_guest_mode(vcpu)) {
- struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
- vmcs12->idt_vectoring_info_field = vmx->idt_vectoring_info;
- if (vmx->idt_vectoring_info & VECTORING_INFO_VALID_MASK) {
- vmcs12->idt_vectoring_error_code =
- vmcs_read32(IDT_VECTORING_ERROR_CODE);
- vmcs12->vm_exit_instruction_len =
- vmcs_read32(VM_EXIT_INSTRUCTION_LEN);
- }
- }
-
vmx->loaded_vmcs->launched = 1;
vmx->exit_reason = vmcs_read32(VM_EXIT_REASON);
@@ -7324,6 +7296,52 @@ vmcs12_guest_cr4(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
vcpu->arch.cr4_guest_owned_bits));
}
+static void vmcs12_save_pending_events(struct kvm_vcpu *vcpu,
+ struct vmcs12 *vmcs12)
+{
+ u32 idt_vectoring;
+ unsigned int nr;
+
+ /*
+ * We only transfer exceptions and maskable interrupts. It is fine if
+ * L0 retries to inject a pending NMI over L1.
+ */
+ if (vcpu->arch.exception.pending) {
+ nr = vcpu->arch.exception.nr;
+ idt_vectoring = nr | VECTORING_INFO_VALID_MASK;
+
+ if (kvm_exception_is_soft(nr)) {
+ vmcs12->vm_exit_instruction_len =
+ vcpu->arch.event_exit_inst_len;
+ idt_vectoring |= INTR_TYPE_SOFT_EXCEPTION;
+ } else
+ idt_vectoring |= INTR_TYPE_HARD_EXCEPTION;
+
+ if (vcpu->arch.exception.has_error_code) {
+ idt_vectoring |= VECTORING_INFO_DELIVER_CODE_MASK;
+ vmcs12->idt_vectoring_error_code =
+ vcpu->arch.exception.error_code;
+ }
+
+ vmcs12->idt_vectoring_info_field = idt_vectoring;
+ } else if (vcpu->arch.interrupt.pending) {
+ nr = vcpu->arch.interrupt.nr;
+ idt_vectoring = nr | VECTORING_INFO_VALID_MASK;
+
+ if (vcpu->arch.interrupt.soft) {
+ idt_vectoring |= INTR_TYPE_SOFT_INTR;
+ vmcs12->vm_entry_instruction_len =
+ vcpu->arch.event_exit_inst_len;
+ } else
+ idt_vectoring |= INTR_TYPE_EXT_INTR;
+
+ vmcs12->idt_vectoring_info_field = idt_vectoring;
+ }
+
+ kvm_clear_exception_queue(vcpu);
+ kvm_clear_interrupt_queue(vcpu);
+}
+
/*
* prepare_vmcs12 is part of what we need to do when the nested L2 guest exits
* and we want to prepare to run its L1 parent. L1 keeps a vmcs for L2 (vmcs12),
@@ -7415,9 +7433,47 @@ static void prepare_vmcs12(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
vmcs12->vm_exit_instruction_len = vmcs_read32(VM_EXIT_INSTRUCTION_LEN);
vmcs12->vmx_instruction_info = vmcs_read32(VMX_INSTRUCTION_INFO);
- /* clear vm-entry fields which are to be cleared on exit */
- if (!(vmcs12->vm_exit_reason & VMX_EXIT_REASONS_FAILED_VMENTRY))
+ if (!(vmcs12->vm_exit_reason & VMX_EXIT_REASONS_FAILED_VMENTRY)) {
+ if ((vmcs12->vm_entry_intr_info_field &
+ INTR_INFO_VALID_MASK) &&
+ (vmcs_read32(VM_ENTRY_INTR_INFO_FIELD) &
+ INTR_INFO_VALID_MASK)) {
+ /*
+ * Preserve the event that was supposed to be injected
+ * by L1 by emulating how it would have been returned in
+ * IDT_VECTORING_INFO_FIELD.
+ */
+ vmcs12->idt_vectoring_info_field =
+ vmcs12->vm_entry_intr_info_field;
+ vmcs12->idt_vectoring_error_code =
+ vmcs12->vm_entry_exception_error_code;
+ vmcs12->vm_exit_instruction_len =
+ vmcs12->vm_entry_instruction_len;
+ vmcs_write32(VM_ENTRY_INTR_INFO_FIELD, 0);
+
+ /*
+ * We do not drop NMIs that targeted L2 below as they
+ * can also be reinjected over L1. But if this event
+ * was an NMI, it was synthetic and came from L1.
+ */
+ vcpu->arch.nmi_injected = false;
+ } else
+ /*
+ * Transfer the event L0 may have wanted to inject into L2
+ * to IDT_VECTORING_INFO_FIELD.
+ */
+ vmcs12_save_pending_events(vcpu, vmcs12);
+
+ /* clear vm-entry fields which are to be cleared on exit */
vmcs12->vm_entry_intr_info_field &= ~INTR_INFO_VALID_MASK;
+ }
+
+ /*
+ * Drop what we picked up for L2 via vmx_complete_interrupts. It is
+ * preserved above and would only end up incorrectly in L1.
+ */
+ kvm_clear_exception_queue(vcpu);
+ kvm_clear_interrupt_queue(vcpu);
}
/*
--
1.7.3.4
* [PATCH v2 3/5] KVM: VMX: Move vmx_nmi_allowed after vmx_set_nmi_mask
2013-03-16 10:23 [PATCH v2 0/5] KVM: nVMX: Make direct IRQ/NMI injection work Jan Kiszka
2013-03-16 10:23 ` [PATCH v2 1/5] KVM: nVMX: Fix injection of PENDING_INTERRUPT and NMI_WINDOW exits to L1 Jan Kiszka
2013-03-16 10:23 ` [PATCH v2 2/5] KVM: nVMX: Rework event injection and recovery Jan Kiszka
@ 2013-03-16 10:23 ` Jan Kiszka
2013-03-16 10:23 ` [PATCH v2 4/5] KVM: nVMX: Fix conditions for interrupt injection Jan Kiszka
` (2 subsequent siblings)
5 siblings, 0 replies; 13+ messages in thread
From: Jan Kiszka @ 2013-03-16 10:23 UTC (permalink / raw)
To: Gleb Natapov, Marcelo Tosatti; +Cc: kvm, Paolo Bonzini, Nadav Har'El
From: Jan Kiszka <jan.kiszka@siemens.com>
vmx_set_nmi_mask will soon be used by vmx_nmi_allowed. No functional
changes.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
arch/x86/kvm/vmx.c | 20 ++++++++++----------
1 files changed, 10 insertions(+), 10 deletions(-)
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index ca74358..a5f56df 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -4283,16 +4283,6 @@ static void vmx_inject_nmi(struct kvm_vcpu *vcpu)
INTR_TYPE_NMI_INTR | INTR_INFO_VALID_MASK | NMI_VECTOR);
}
-static int vmx_nmi_allowed(struct kvm_vcpu *vcpu)
-{
- if (!cpu_has_virtual_nmis() && to_vmx(vcpu)->soft_vnmi_blocked)
- return 0;
-
- return !(vmcs_read32(GUEST_INTERRUPTIBILITY_INFO) &
- (GUEST_INTR_STATE_MOV_SS | GUEST_INTR_STATE_STI
- | GUEST_INTR_STATE_NMI));
-}
-
static bool vmx_get_nmi_mask(struct kvm_vcpu *vcpu)
{
if (!cpu_has_virtual_nmis())
@@ -4322,6 +4312,16 @@ static void vmx_set_nmi_mask(struct kvm_vcpu *vcpu, bool masked)
}
}
+static int vmx_nmi_allowed(struct kvm_vcpu *vcpu)
+{
+ if (!cpu_has_virtual_nmis() && to_vmx(vcpu)->soft_vnmi_blocked)
+ return 0;
+
+ return !(vmcs_read32(GUEST_INTERRUPTIBILITY_INFO) &
+ (GUEST_INTR_STATE_MOV_SS | GUEST_INTR_STATE_STI
+ | GUEST_INTR_STATE_NMI));
+}
+
static int vmx_interrupt_allowed(struct kvm_vcpu *vcpu)
{
if (is_guest_mode(vcpu) && nested_exit_on_intr(vcpu)) {
--
1.7.3.4
* [PATCH v2 4/5] KVM: nVMX: Fix conditions for interrupt injection
2013-03-16 10:23 [PATCH v2 0/5] KVM: nVMX: Make direct IRQ/NMI injection work Jan Kiszka
` (2 preceding siblings ...)
2013-03-16 10:23 ` [PATCH v2 3/5] KVM: VMX: Move vmx_nmi_allowed after vmx_set_nmi_mask Jan Kiszka
@ 2013-03-16 10:23 ` Jan Kiszka
2013-03-16 10:23 ` [PATCH v2 5/5] KVM: nVMX: Fix conditions for NMI injection Jan Kiszka
2013-03-16 10:42 ` [PATCH v2 0/5] KVM: nVMX: Make direct IRQ/NMI injection work Jan Kiszka
5 siblings, 0 replies; 13+ messages in thread
From: Jan Kiszka @ 2013-03-16 10:23 UTC (permalink / raw)
To: Gleb Natapov, Marcelo Tosatti; +Cc: kvm, Paolo Bonzini, Nadav Har'El
From: Jan Kiszka <jan.kiszka@siemens.com>
If we are in guest mode, L0 can only inject events into L2 if L1 has
nothing pending. Otherwise, L0 would overwrite L1's events and they
would get lost. But even if no injection from L1 is pending, we do not
want L0 to unnecessarily interrupt an ongoing vmentry with all its side
effects on the vmcs. Therefore, injection shall be disallowed during
L1->L2 transitions. This check is conceptually independent of
nested_exit_on_intr.
If L1 traps external interrupts, then we also need to look at L1's
idt_vectoring_info_field. If it is empty, we can kick the guest from L2
to L1, just as the previous code did.
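nested_exit_on_intr() used here only tests L1's pin-based controls,
roughly (sketch; the NMI counterpart is added in patch 5):

	static bool nested_exit_on_intr(struct kvm_vcpu *vcpu)
	{
		/* does L1 want a vmexit on external interrupts? */
		return get_vmcs12(vcpu)->pin_based_vm_exec_control &
			PIN_BASED_EXT_INTR_MASK;
	}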
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
arch/x86/kvm/vmx.c | 28 ++++++++++++++++++++--------
1 files changed, 20 insertions(+), 8 deletions(-)
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index a5f56df..27e7e59 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -4324,16 +4324,28 @@ static int vmx_nmi_allowed(struct kvm_vcpu *vcpu)
static int vmx_interrupt_allowed(struct kvm_vcpu *vcpu)
{
- if (is_guest_mode(vcpu) && nested_exit_on_intr(vcpu)) {
+ if (is_guest_mode(vcpu)) {
struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
- if (to_vmx(vcpu)->nested.nested_run_pending ||
- (vmcs12->idt_vectoring_info_field &
- VECTORING_INFO_VALID_MASK))
+
+ if (to_vmx(vcpu)->nested.nested_run_pending)
return 0;
- nested_vmx_vmexit(vcpu);
- vmcs12->vm_exit_reason = EXIT_REASON_EXTERNAL_INTERRUPT;
- vmcs12->vm_exit_intr_info = 0;
- /* fall through to normal code, but now in L1, not L2 */
+ if (nested_exit_on_intr(vcpu)) {
+ /*
+ * Check if the idt_vectoring_info_field is free. We
+ * cannot raise EXIT_REASON_EXTERNAL_INTERRUPT if it
+ * isn't.
+ */
+ if (vmcs12->idt_vectoring_info_field &
+ VECTORING_INFO_VALID_MASK)
+ return 0;
+ nested_vmx_vmexit(vcpu);
+ vmcs12->vm_exit_reason =
+ EXIT_REASON_EXTERNAL_INTERRUPT;
+ vmcs12->vm_exit_intr_info = 0;
+ /*
+ * fall through to normal code, but now in L1, not L2
+ */
+ }
}
return (vmcs_readl(GUEST_RFLAGS) & X86_EFLAGS_IF) &&
--
1.7.3.4
* [PATCH v2 5/5] KVM: nVMX: Fix conditions for NMI injection
2013-03-16 10:23 [PATCH v2 0/5] KVM: nVMX: Make direct IRQ/NMI injection work Jan Kiszka
` (3 preceding siblings ...)
2013-03-16 10:23 ` [PATCH v2 4/5] KVM: nVMX: Fix conditions for interrupt injection Jan Kiszka
@ 2013-03-16 10:23 ` Jan Kiszka
2013-03-16 10:42 ` [PATCH v2 0/5] KVM: nVMX: Make direct IRQ/NMI injection work Jan Kiszka
5 siblings, 0 replies; 13+ messages in thread
From: Jan Kiszka @ 2013-03-16 10:23 UTC (permalink / raw)
To: Gleb Natapov, Marcelo Tosatti; +Cc: kvm, Paolo Bonzini, Nadav Har'El
From: Jan Kiszka <jan.kiszka@siemens.com>
The logic for checking if interrupts can be injected has to be applied
to NMIs as well. The difference is that, if NMI interception is enabled,
these events are consumed and blocked by the VM exit.
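The key difference to the interrupt case shows in the synthesized vmexit
(condensed from the hunk below):

	if (nested_exit_on_nmi(vcpu)) {
		nested_vmx_vmexit(vcpu);
		vmcs12->vm_exit_reason = EXIT_REASON_EXCEPTION_NMI;
		vmcs12->vm_exit_intr_info = NMI_VECTOR |
			INTR_TYPE_NMI_INTR | INTR_INFO_VALID_MASK;
		/* unlike an interrupt-window exit, this vmexit delivers
		 * the NMI itself, so consume it and block further NMIs */
		vcpu->arch.nmi_pending = 0;
		vmx_set_nmi_mask(vcpu, true);
	}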
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
arch/x86/kvm/vmx.c | 35 +++++++++++++++++++++++++++++++++++
1 files changed, 35 insertions(+), 0 deletions(-)
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 27e7e59..83a57b7 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -4189,6 +4189,12 @@ static bool nested_exit_on_intr(struct kvm_vcpu *vcpu)
PIN_BASED_EXT_INTR_MASK;
}
+static bool nested_exit_on_nmi(struct kvm_vcpu *vcpu)
+{
+ return get_vmcs12(vcpu)->pin_based_vm_exec_control &
+ PIN_BASED_NMI_EXITING;
+}
+
static void enable_irq_window(struct kvm_vcpu *vcpu)
{
u32 cpu_based_vm_exec_control;
@@ -4314,6 +4320,35 @@ static void vmx_set_nmi_mask(struct kvm_vcpu *vcpu, bool masked)
static int vmx_nmi_allowed(struct kvm_vcpu *vcpu)
{
+ if (is_guest_mode(vcpu)) {
+ struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
+
+ if (to_vmx(vcpu)->nested.nested_run_pending ||
+ vmcs_read32(GUEST_ACTIVITY_STATE) ==
+ GUEST_ACTIVITY_WAIT_SIPI)
+ return 0;
+ if (nested_exit_on_nmi(vcpu)) {
+ /*
+ * Check if the idt_vectoring_info_field is free. We
+ * cannot raise EXIT_REASON_EXCEPTION_NMI if it isn't.
+ */
+ if (vmcs12->idt_vectoring_info_field &
+ VECTORING_INFO_VALID_MASK)
+ return 0;
+ nested_vmx_vmexit(vcpu);
+ vmcs12->vm_exit_reason = EXIT_REASON_EXCEPTION_NMI;
+ vmcs12->vm_exit_intr_info = NMI_VECTOR |
+ INTR_TYPE_NMI_INTR | INTR_INFO_VALID_MASK;
+ /*
+ * The NMI-triggered VM exit counts as injection:
+ * clear this one and block further NMIs.
+ */
+ vcpu->arch.nmi_pending = 0;
+ vmx_set_nmi_mask(vcpu, true);
+ return 0;
+ }
+ }
+
if (!cpu_has_virtual_nmis() && to_vmx(vcpu)->soft_vnmi_blocked)
return 0;
--
1.7.3.4
* Re: [PATCH v2 0/5] KVM: nVMX: Make direct IRQ/NMI injection work
2013-03-16 10:23 [PATCH v2 0/5] KVM: nVMX: Make direct IRQ/NMI injection work Jan Kiszka
` (4 preceding siblings ...)
2013-03-16 10:23 ` [PATCH v2 5/5] KVM: nVMX: Fix conditions for NMI injection Jan Kiszka
@ 2013-03-16 10:42 ` Jan Kiszka
5 siblings, 0 replies; 13+ messages in thread
From: Jan Kiszka @ 2013-03-16 10:42 UTC (permalink / raw)
To: Gleb Natapov, Marcelo Tosatti; +Cc: kvm, Paolo Bonzini, Nadav Har'El
On 2013-03-16 11:23, Jan Kiszka wrote:
> Version 2 takes review comments into account, reorders some patches that
> have dependencies, and addresses new findings regarding NMI injection.
> The fixes for vmx_interrupt_allowed and vmx_nmi_allowed were split up as
> they turned out to be more different from each other than expected and
> are independent anyway. Finally, there was still a bug in the handling of
> EXIT_REASON_NMI_WINDOW: the wrong condition was checked.
>
> Applies on top of 'queue'.
>
> Jan Kiszka (5):
> KVM: nVMX: Fix injection of PENDING_INTERRUPT and NMI_WINDOW exits to
> L1
> KVM: nVMX: Rework event injection and recovery
> KVM: VMX: Move vmx_nmi_allowed after vmx_set_nmi_mask
> KVM: nVMX: Fix conditions for interrupt injection
> KVM: nVMX: Fix conditions for NMI injection
>
> arch/x86/kvm/vmx.c | 210 ++++++++++++++++++++++++++++++++++++++--------------
> 1 files changed, 154 insertions(+), 56 deletions(-)
>
BTW, I added nested virtualization test cases for KVM to our GSoC
project list. Anyone who wants to join me in mentoring is very welcome.
I'm not too optimistic that we can attract a good student for this, as
it's quite a tricky topic, but it's also a thrilling one IMO.
Jan
* Re: [PATCH v2 2/5] KVM: nVMX: Rework event injection and recovery
2013-03-16 10:23 ` [PATCH v2 2/5] KVM: nVMX: Rework event injection and recovery Jan Kiszka
@ 2013-03-17 13:45 ` Gleb Natapov
2013-03-17 15:02 ` Jan Kiszka
0 siblings, 1 reply; 13+ messages in thread
From: Gleb Natapov @ 2013-03-17 13:45 UTC (permalink / raw)
To: Jan Kiszka; +Cc: Marcelo Tosatti, kvm, Paolo Bonzini, Nadav Har'El
On Sat, Mar 16, 2013 at 11:23:16AM +0100, Jan Kiszka wrote:
> From: Jan Kiszka <jan.kiszka@siemens.com>
>
> The basic idea is to always transfer the pending event injection on
> vmexit into the architectural state of the VCPU and then drop it from
> there if it turns out that we left L2 to enter L1.
>
> VMX and SVM are now identical in how they recover event injections from
> unperformed vmlaunch/vmresume: We detect that VM_ENTRY_INTR_INFO_FIELD
> still contains a valid event and, if yes, transfer the content into L1's
> idt_vectoring_info_field.
>
But how can this happen with the VMX code? VMX has this nested_run_pending
thing that prevents #vmexit emulation from happening without vmlaunch.
This means that VM_ENTRY_INTR_INFO_FIELD should never be valid during
#vmexit emulation since it is marked invalid during vmlaunch.
> However, we differ on how to deal with events that L0 wanted to inject
> into L2. Likely, this case is still broken in SVM. For VMX, the function
> vmcs12_save_pending_events deals with transferring pending L0 events
> into the queue of L1. That is mandatory as L1 may decide to switch the
> guest state completely, invalidating or preserving the pending events
> for later injection (including on a different node, once we support
> migration).
>
> Note that we treat directly injected NMIs differently as they can hit
> both L1 and L2. In this case, we let L0 retry the injection, this time
> over L1, after leaving L2.
>
Hmm, where does the SDM say NMI behaves this way?
> To avoid incorrectly leaking an event that L1 wants to inject into the
> architectural VCPU state, we skip cancellation on nested run.
>
How can the leak happen?
> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
> ---
> arch/x86/kvm/vmx.c | 118 ++++++++++++++++++++++++++++++++++++++--------------
> 1 files changed, 87 insertions(+), 31 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 126d047..ca74358 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -6492,8 +6492,6 @@ static void __vmx_complete_interrupts(struct kvm_vcpu *vcpu,
>
> static void vmx_complete_interrupts(struct vcpu_vmx *vmx)
> {
> - if (is_guest_mode(&vmx->vcpu))
> - return;
> __vmx_complete_interrupts(&vmx->vcpu, vmx->idt_vectoring_info,
> VM_EXIT_INSTRUCTION_LEN,
> IDT_VECTORING_ERROR_CODE);
> @@ -6501,7 +6499,7 @@ static void vmx_complete_interrupts(struct vcpu_vmx *vmx)
>
> static void vmx_cancel_injection(struct kvm_vcpu *vcpu)
> {
> - if (is_guest_mode(vcpu))
> + if (to_vmx(vcpu)->nested.nested_run_pending)
> return;
> __vmx_complete_interrupts(vcpu,
> vmcs_read32(VM_ENTRY_INTR_INFO_FIELD),
> @@ -6534,21 +6532,6 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu *vcpu)
> struct vcpu_vmx *vmx = to_vmx(vcpu);
> unsigned long debugctlmsr;
>
> - if (is_guest_mode(vcpu) && !vmx->nested.nested_run_pending) {
> - struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
> - if (vmcs12->idt_vectoring_info_field &
> - VECTORING_INFO_VALID_MASK) {
> - vmcs_write32(VM_ENTRY_INTR_INFO_FIELD,
> - vmcs12->idt_vectoring_info_field);
> - vmcs_write32(VM_ENTRY_INSTRUCTION_LEN,
> - vmcs12->vm_exit_instruction_len);
> - if (vmcs12->idt_vectoring_info_field &
> - VECTORING_INFO_DELIVER_CODE_MASK)
> - vmcs_write32(VM_ENTRY_EXCEPTION_ERROR_CODE,
> - vmcs12->idt_vectoring_error_code);
> - }
> - }
> -
> /* Record the guest's net vcpu time for enforced NMI injections. */
> if (unlikely(!cpu_has_virtual_nmis() && vmx->soft_vnmi_blocked))
> vmx->entry_time = ktime_get();
> @@ -6707,17 +6690,6 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu *vcpu)
>
> vmx->idt_vectoring_info = vmcs_read32(IDT_VECTORING_INFO_FIELD);
>
> - if (is_guest_mode(vcpu)) {
> - struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
> - vmcs12->idt_vectoring_info_field = vmx->idt_vectoring_info;
> - if (vmx->idt_vectoring_info & VECTORING_INFO_VALID_MASK) {
> - vmcs12->idt_vectoring_error_code =
> - vmcs_read32(IDT_VECTORING_ERROR_CODE);
> - vmcs12->vm_exit_instruction_len =
> - vmcs_read32(VM_EXIT_INSTRUCTION_LEN);
> - }
> - }
> -
> vmx->loaded_vmcs->launched = 1;
>
> vmx->exit_reason = vmcs_read32(VM_EXIT_REASON);
> @@ -7324,6 +7296,52 @@ vmcs12_guest_cr4(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
> vcpu->arch.cr4_guest_owned_bits));
> }
>
> +static void vmcs12_save_pending_events(struct kvm_vcpu *vcpu,
> + struct vmcs12 *vmcs12)
> +{
> + u32 idt_vectoring;
> + unsigned int nr;
> +
> + /*
> + * We only transfer exceptions and maskable interrupts. It is fine if
> + * L0 retries to inject a pending NMI over L1.
> + */
> + if (vcpu->arch.exception.pending) {
> + nr = vcpu->arch.exception.nr;
> + idt_vectoring = nr | VECTORING_INFO_VALID_MASK;
> +
> + if (kvm_exception_is_soft(nr)) {
> + vmcs12->vm_exit_instruction_len =
> + vcpu->arch.event_exit_inst_len;
> + idt_vectoring |= INTR_TYPE_SOFT_EXCEPTION;
> + } else
> + idt_vectoring |= INTR_TYPE_HARD_EXCEPTION;
> +
> + if (vcpu->arch.exception.has_error_code) {
> + idt_vectoring |= VECTORING_INFO_DELIVER_CODE_MASK;
> + vmcs12->idt_vectoring_error_code =
> + vcpu->arch.exception.error_code;
> + }
> +
> + vmcs12->idt_vectoring_info_field = idt_vectoring;
> + } else if (vcpu->arch.interrupt.pending) {
> + nr = vcpu->arch.interrupt.nr;
> + idt_vectoring = nr | VECTORING_INFO_VALID_MASK;
> +
> + if (vcpu->arch.interrupt.soft) {
> + idt_vectoring |= INTR_TYPE_SOFT_INTR;
> + vmcs12->vm_entry_instruction_len =
> + vcpu->arch.event_exit_inst_len;
> + } else
> + idt_vectoring |= INTR_TYPE_EXT_INTR;
> +
> + vmcs12->idt_vectoring_info_field = idt_vectoring;
> + }
> +
> + kvm_clear_exception_queue(vcpu);
> + kvm_clear_interrupt_queue(vcpu);
> +}
> +
> /*
> * prepare_vmcs12 is part of what we need to do when the nested L2 guest exits
> * and we want to prepare to run its L1 parent. L1 keeps a vmcs for L2 (vmcs12),
> @@ -7415,9 +7433,47 @@ static void prepare_vmcs12(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
> vmcs12->vm_exit_instruction_len = vmcs_read32(VM_EXIT_INSTRUCTION_LEN);
> vmcs12->vmx_instruction_info = vmcs_read32(VMX_INSTRUCTION_INFO);
>
> - /* clear vm-entry fields which are to be cleared on exit */
> - if (!(vmcs12->vm_exit_reason & VMX_EXIT_REASONS_FAILED_VMENTRY))
> + if (!(vmcs12->vm_exit_reason & VMX_EXIT_REASONS_FAILED_VMENTRY)) {
> + if ((vmcs12->vm_entry_intr_info_field &
> + INTR_INFO_VALID_MASK) &&
> + (vmcs_read32(VM_ENTRY_INTR_INFO_FIELD) &
> + INTR_INFO_VALID_MASK)) {
Again I do not see how this condition can be true.
> + /*
> + * Preserve the event that was supposed to be injected
> + * by L1 by emulating how it would have been returned in
> + * IDT_VECTORING_INFO_FIELD.
> + */
> + vmcs12->idt_vectoring_info_field =
> + vmcs12->vm_entry_intr_info_field;
> + vmcs12->idt_vectoring_error_code =
> + vmcs12->vm_entry_exception_error_code;
> + vmcs12->vm_exit_instruction_len =
> + vmcs12->vm_entry_instruction_len;
> + vmcs_write32(VM_ENTRY_INTR_INFO_FIELD, 0);
> +
> + /*
> + * We do not drop NMIs that targeted L2 below as they
> + * can also be reinjected over L1. But if this event
> + * was an NMI, it was synthetic and came from L1.
> + */
> + vcpu->arch.nmi_injected = false;
> + } else
> + /*
> + * Transfer the event L0 may have wanted to inject into L2
> + * to IDT_VECTORING_INFO_FIELD.
> + */
I do not understand the comment. This transfers an event from the event queue into vmcs12.
Since vmx_complete_interrupts() transfers an event that L1 tried to inject
into the event queue too, here we handle not only L0->L2, but also L1->L2
events. In fact, I think only the "else" part of this if() is needed.
> + vmcs12_save_pending_events(vcpu, vmcs12);
> +
> + /* clear vm-entry fields which are to be cleared on exit */
> vmcs12->vm_entry_intr_info_field &= ~INTR_INFO_VALID_MASK;
> + }
> +
> + /*
> + * Drop what we picked up for L2 via vmx_complete_interrupts. It is
> + * preserved above and would only end up incorrectly in L1.
> + */
> + kvm_clear_exception_queue(vcpu);
> + kvm_clear_interrupt_queue(vcpu);
> }
>
> /*
> --
> 1.7.3.4
--
Gleb.
* Re: [PATCH v2 1/5] KVM: nVMX: Fix injection of PENDING_INTERRUPT and NMI_WINDOW exits to L1
2013-03-16 10:23 ` [PATCH v2 1/5] KVM: nVMX: Fix injection of PENDING_INTERRUPT and NMI_WINDOW exits to L1 Jan Kiszka
@ 2013-03-17 13:47 ` Gleb Natapov
0 siblings, 0 replies; 13+ messages in thread
From: Gleb Natapov @ 2013-03-17 13:47 UTC (permalink / raw)
To: Jan Kiszka; +Cc: Marcelo Tosatti, kvm, Paolo Bonzini, Nadav Har'El
On Sat, Mar 16, 2013 at 11:23:15AM +0100, Jan Kiszka wrote:
> From: Jan Kiszka <jan.kiszka@siemens.com>
>
> Check if the interrupt or NMI window exit is for L1 by testing if it has
> the corresponding controls enabled. This is required when we allow
> direct injection from L0 to L2.
>
> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Reviewed-by: Gleb Natapov <gleb@redhat.com>
> ---
> arch/x86/kvm/vmx.c | 9 ++-------
> 1 files changed, 2 insertions(+), 7 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index ad978a6..126d047 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -6111,14 +6111,9 @@ static bool nested_vmx_exit_handled(struct kvm_vcpu *vcpu)
> case EXIT_REASON_TRIPLE_FAULT:
> return 1;
> case EXIT_REASON_PENDING_INTERRUPT:
> + return nested_cpu_has(vmcs12, CPU_BASED_VIRTUAL_INTR_PENDING);
> case EXIT_REASON_NMI_WINDOW:
> - /*
> - * prepare_vmcs02() set the CPU_BASED_VIRTUAL_INTR_PENDING bit
> - * (aka Interrupt Window Exiting) only when L1 turned it on,
> - * so if we got a PENDING_INTERRUPT exit, this must be for L1.
> - * Same for NMI Window Exiting.
> - */
> - return 1;
> + return nested_cpu_has(vmcs12, CPU_BASED_VIRTUAL_NMI_PENDING);
> case EXIT_REASON_TASK_SWITCH:
> return 1;
> case EXIT_REASON_CPUID:
> --
> 1.7.3.4
--
Gleb.
* Re: [PATCH v2 2/5] KVM: nVMX: Rework event injection and recovery
2013-03-17 13:45 ` Gleb Natapov
@ 2013-03-17 15:02 ` Jan Kiszka
2013-03-17 15:14 ` Gleb Natapov
0 siblings, 1 reply; 13+ messages in thread
From: Jan Kiszka @ 2013-03-17 15:02 UTC (permalink / raw)
To: Gleb Natapov; +Cc: Marcelo Tosatti, kvm, Paolo Bonzini, Nadav Har'El
On 2013-03-17 14:45, Gleb Natapov wrote:
> On Sat, Mar 16, 2013 at 11:23:16AM +0100, Jan Kiszka wrote:
>> From: Jan Kiszka <jan.kiszka@siemens.com>
>>
>> The basic idea is to always transfer the pending event injection on
>> vmexit into the architectural state of the VCPU and then drop it from
>> there if it turns out that we left L2 to enter L1.
>>
>> VMX and SVM are now identical in how they recover event injections from
>> unperformed vmlaunch/vmresume: We detect that VM_ENTRY_INTR_INFO_FIELD
>> still contains a valid event and, if yes, transfer the content into L1's
>> idt_vectoring_info_field.
>>
> But how can this happen with the VMX code? VMX has this nested_run_pending
> thing that prevents #vmexit emulation from happening without vmlaunch.
> This means that VM_ENTRY_INTR_INFO_FIELD should never be valid during
> #vmexit emulation since it is marked invalid during vmlaunch.
Now that nmi/interrupt_allowed is strict w.r.t. nested_run_pending again,
it may indeed no longer happen. It was definitely a problem before, also
with direct vmexit on pending INIT. Requires a second thought, maybe
also a WARN_ON(vmx->nested.nested_run_pending) in nested_vmx_vmexit.
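For illustration, such a guard could look like this (untested sketch):

	/* at the top of nested_vmx_vmexit(): a nested vmexit must never
	 * preempt a still pending L1 vmentry */
	WARN_ON_ONCE(to_vmx(vcpu)->nested.nested_run_pending);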
>
>> However, we differ on how to deal with events that L0 wanted to inject
>> into L2. Likely, this case is still broken in SVM. For VMX, the function
>> vmcs12_save_pending_events deals with transferring pending L0 events
>> into the queue of L1. That is mandatory as L1 may decide to switch the
>> guest state completely, invalidating or preserving the pending events
>> for later injection (including on a different node, once we support
>> migration).
>>
>> Note that we treat directly injected NMIs differently as they can hit
>> both L1 and L2. In this case, we let L0 retry the injection, this time
>> over L1, after leaving L2.
>>
> Hmm, where does the SDM say NMI behaves this way?
NMIs are only blocked in root mode if we took an NMI-related vmexit (or,
of course, an NMI is being processed). Thus, every arriving NMI can
either hit the guest or the host - pure luck.
However, I have missed the fact that an NMI may have been injected from
L1 as well. If injection triggers a vmexit, that NMI could now leak into
L1. So we have to process them as well in vmcs12_save_pending_events.
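E.g. an additional branch between the exception and interrupt cases of
vmcs12_save_pending_events, along these lines (untested sketch, not the
final patch):

	else if (vcpu->arch.nmi_injected)
		/* re-queue an NMI that was already injected into L2 for
		 * L1 instead of letting L0 retry it over L1 */
		vmcs12->idt_vectoring_info_field =
			INTR_TYPE_NMI_INTR | INTR_INFO_VALID_MASK | NMI_VECTOR;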
>
>> To avoid incorrectly leaking an event that L1 wants to inject into the
>> architectural VCPU state, we skip cancellation on nested run.
>>
> How can the leak happen?
See above, this likely no longer applies.
>
>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
>> ---
>> arch/x86/kvm/vmx.c | 118 ++++++++++++++++++++++++++++++++++++++--------------
>> 1 files changed, 87 insertions(+), 31 deletions(-)
>>
>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>> index 126d047..ca74358 100644
>> --- a/arch/x86/kvm/vmx.c
>> +++ b/arch/x86/kvm/vmx.c
>> @@ -6492,8 +6492,6 @@ static void __vmx_complete_interrupts(struct kvm_vcpu *vcpu,
>>
>> static void vmx_complete_interrupts(struct vcpu_vmx *vmx)
>> {
>> - if (is_guest_mode(&vmx->vcpu))
>> - return;
>> __vmx_complete_interrupts(&vmx->vcpu, vmx->idt_vectoring_info,
>> VM_EXIT_INSTRUCTION_LEN,
>> IDT_VECTORING_ERROR_CODE);
>> @@ -6501,7 +6499,7 @@ static void vmx_complete_interrupts(struct vcpu_vmx *vmx)
>>
>> static void vmx_cancel_injection(struct kvm_vcpu *vcpu)
>> {
>> - if (is_guest_mode(vcpu))
>> + if (to_vmx(vcpu)->nested.nested_run_pending)
>> return;
>> __vmx_complete_interrupts(vcpu,
>> vmcs_read32(VM_ENTRY_INTR_INFO_FIELD),
>> @@ -6534,21 +6532,6 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu *vcpu)
>> struct vcpu_vmx *vmx = to_vmx(vcpu);
>> unsigned long debugctlmsr;
>>
>> - if (is_guest_mode(vcpu) && !vmx->nested.nested_run_pending) {
>> - struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
>> - if (vmcs12->idt_vectoring_info_field &
>> - VECTORING_INFO_VALID_MASK) {
>> - vmcs_write32(VM_ENTRY_INTR_INFO_FIELD,
>> - vmcs12->idt_vectoring_info_field);
>> - vmcs_write32(VM_ENTRY_INSTRUCTION_LEN,
>> - vmcs12->vm_exit_instruction_len);
>> - if (vmcs12->idt_vectoring_info_field &
>> - VECTORING_INFO_DELIVER_CODE_MASK)
>> - vmcs_write32(VM_ENTRY_EXCEPTION_ERROR_CODE,
>> - vmcs12->idt_vectoring_error_code);
>> - }
>> - }
>> -
>> /* Record the guest's net vcpu time for enforced NMI injections. */
>> if (unlikely(!cpu_has_virtual_nmis() && vmx->soft_vnmi_blocked))
>> vmx->entry_time = ktime_get();
>> @@ -6707,17 +6690,6 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu *vcpu)
>>
>> vmx->idt_vectoring_info = vmcs_read32(IDT_VECTORING_INFO_FIELD);
>>
>> - if (is_guest_mode(vcpu)) {
>> - struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
>> - vmcs12->idt_vectoring_info_field = vmx->idt_vectoring_info;
>> - if (vmx->idt_vectoring_info & VECTORING_INFO_VALID_MASK) {
>> - vmcs12->idt_vectoring_error_code =
>> - vmcs_read32(IDT_VECTORING_ERROR_CODE);
>> - vmcs12->vm_exit_instruction_len =
>> - vmcs_read32(VM_EXIT_INSTRUCTION_LEN);
>> - }
>> - }
>> -
>> vmx->loaded_vmcs->launched = 1;
>>
>> vmx->exit_reason = vmcs_read32(VM_EXIT_REASON);
>> @@ -7324,6 +7296,52 @@ vmcs12_guest_cr4(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
>> vcpu->arch.cr4_guest_owned_bits));
>> }
>>
>> +static void vmcs12_save_pending_events(struct kvm_vcpu *vcpu,
>> + struct vmcs12 *vmcs12)
>> +{
>> + u32 idt_vectoring;
>> + unsigned int nr;
>> +
>> + /*
>> + * We only transfer exceptions and maskable interrupts. It is fine if
>> + * L0 retries to inject a pending NMI over L1.
>> + */
>> + if (vcpu->arch.exception.pending) {
>> + nr = vcpu->arch.exception.nr;
>> + idt_vectoring = nr | VECTORING_INFO_VALID_MASK;
>> +
>> + if (kvm_exception_is_soft(nr)) {
>> + vmcs12->vm_exit_instruction_len =
>> + vcpu->arch.event_exit_inst_len;
>> + idt_vectoring |= INTR_TYPE_SOFT_EXCEPTION;
>> + } else
>> + idt_vectoring |= INTR_TYPE_HARD_EXCEPTION;
>> +
>> + if (vcpu->arch.exception.has_error_code) {
>> + idt_vectoring |= VECTORING_INFO_DELIVER_CODE_MASK;
>> + vmcs12->idt_vectoring_error_code =
>> + vcpu->arch.exception.error_code;
>> + }
>> +
>> + vmcs12->idt_vectoring_info_field = idt_vectoring;
>> + } else if (vcpu->arch.interrupt.pending) {
>> + nr = vcpu->arch.interrupt.nr;
>> + idt_vectoring = nr | VECTORING_INFO_VALID_MASK;
>> +
>> + if (vcpu->arch.interrupt.soft) {
>> + idt_vectoring |= INTR_TYPE_SOFT_INTR;
>> + vmcs12->vm_entry_instruction_len =
>> + vcpu->arch.event_exit_inst_len;
>> + } else
>> + idt_vectoring |= INTR_TYPE_EXT_INTR;
>> +
>> + vmcs12->idt_vectoring_info_field = idt_vectoring;
>> + }
>> +
>> + kvm_clear_exception_queue(vcpu);
>> + kvm_clear_interrupt_queue(vcpu);
>> +}
>> +
>> /*
>> * prepare_vmcs12 is part of what we need to do when the nested L2 guest exits
>> * and we want to prepare to run its L1 parent. L1 keeps a vmcs for L2 (vmcs12),
>> @@ -7415,9 +7433,47 @@ static void prepare_vmcs12(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
>> vmcs12->vm_exit_instruction_len = vmcs_read32(VM_EXIT_INSTRUCTION_LEN);
>> vmcs12->vmx_instruction_info = vmcs_read32(VMX_INSTRUCTION_INFO);
>>
>> - /* clear vm-entry fields which are to be cleared on exit */
>> - if (!(vmcs12->vm_exit_reason & VMX_EXIT_REASONS_FAILED_VMENTRY))
>> + if (!(vmcs12->vm_exit_reason & VMX_EXIT_REASONS_FAILED_VMENTRY)) {
>> + if ((vmcs12->vm_entry_intr_info_field &
>> + INTR_INFO_VALID_MASK) &&
>> + (vmcs_read32(VM_ENTRY_INTR_INFO_FIELD) &
>> + INTR_INFO_VALID_MASK)) {
> Again I do not see how this condition can be true.
>
>> + /*
>> + * Preserve the event that was supposed to be injected
>> + * by L1 by emulating how it would have been returned in
>> + * IDT_VECTORING_INFO_FIELD.
>> + */
>> + vmcs12->idt_vectoring_info_field =
>> + vmcs12->vm_entry_intr_info_field;
>> + vmcs12->idt_vectoring_error_code =
>> + vmcs12->vm_entry_exception_error_code;
>> + vmcs12->vm_exit_instruction_len =
>> + vmcs12->vm_entry_instruction_len;
>> + vmcs_write32(VM_ENTRY_INTR_INFO_FIELD, 0);
>> +
>> + /*
>> + * We do not drop NMIs that targeted L2 below as they
>> + * can also be reinjected over L1. But if this event
>> + * was an NMI, it was synthetic and came from L1.
>> + */
>> + vcpu->arch.nmi_injected = false;
>> + } else
>> + /*
>> + * Transfer the event L0 may have wanted to inject into L2
>> + * to IDT_VECTORING_INFO_FIELD.
>> + */
> I do not understand the comment. This transfers an event from the event queue into vmcs12.
> Since vmx_complete_interrupts() transfers an event that L1 tried to inject
> into the event queue too, here we handle not only L0->L2, but also L1->L2
> events.
I'm not sure if I fully understand your remark. Is it that the comment
is only talking about L0 events? That is indeed not fully true, L1
events should make it to the architectural queue as well. Will adjust this.
> In fact, I think only the "else" part of this if() is needed.
Yes, probably.
Thanks,
Jan
* Re: [PATCH v2 2/5] KVM: nVMX: Rework event injection and recovery
2013-03-17 15:02 ` Jan Kiszka
@ 2013-03-17 15:14 ` Gleb Natapov
2013-03-17 15:17 ` Jan Kiszka
0 siblings, 1 reply; 13+ messages in thread
From: Gleb Natapov @ 2013-03-17 15:14 UTC (permalink / raw)
To: Jan Kiszka; +Cc: Marcelo Tosatti, kvm, Paolo Bonzini, Nadav Har'El
On Sun, Mar 17, 2013 at 04:02:07PM +0100, Jan Kiszka wrote:
> On 2013-03-17 14:45, Gleb Natapov wrote:
> > On Sat, Mar 16, 2013 at 11:23:16AM +0100, Jan Kiszka wrote:
> >> From: Jan Kiszka <jan.kiszka@siemens.com>
> >>
> >> The basic idea is to always transfer the pending event injection on
> >> vmexit into the architectural state of the VCPU and then drop it from
> >> there if it turns out that we left L2 to enter L1.
> >>
> >> VMX and SVM are now identical in how they recover event injections from
> >> unperformed vmlaunch/vmresume: We detect that VM_ENTRY_INTR_INFO_FIELD
> >> still contains a valid event and, if yes, transfer the content into L1's
> >> idt_vectoring_info_field.
> >>
> > But how can this happen with the VMX code? VMX has this nested_run_pending
> > thing that prevents #vmexit emulation from happening without vmlaunch.
> > This means that VM_ENTRY_INTR_INFO_FIELD should never be valid during
> > #vmexit emulation since it is marked invalid during vmlaunch.
>
> Now that nmi/interrupt_allowed is strict w.r.t. nested_run_pending again,
> it may indeed no longer happen. It was definitely a problem before, also
> with direct vmexit on pending INIT. Requires a second thought, maybe
> also a WARN_ON(vmx->nested.nested_run_pending) in nested_vmx_vmexit.
>
> >
> >> However, we differ on how to deal with events that L0 wanted to inject
> >> into L2. Likely, this case is still broken in SVM. For VMX, the function
> >> vmcs12_save_pending_events deals with transferring pending L0 events
> >> into the queue of L1. That is mandatory as L1 may decide to switch the
> >> guest state completely, invalidating or preserving the pending events
> >> for later injection (including on a different node, once we support
> >> migration).
> >>
> >> Note that we treat directly injected NMIs differently as they can hit
> >> both L1 and L2. In this case, we let L0 retry the injection, this time
> >> over L1, after leaving L2.
> >>
> > Hmm, where does the SDM say NMI behaves this way?
>
> NMIs are only blocked in root mode if we took an NMI-related vmexit (or,
> of course, an NMI is being processed). Thus, every arriving NMI can
> either hit the guest or the host - pure luck.
>
> However, I have missed the fact that an NMI may have been injected from
> L1 as well. If injection triggers a vmexit, that NMI could now leak into
> L1. So we have to process them as well in vmcs12_save_pending_events.
>
You mean "should not leak into L0" not L1?
> >
> >> To avoid incorrectly leaking an event that L1 wants to inject into the
> >> architectural VCPU state, we skip cancellation on nested run.
> >>
> > How can the leak happen?
>
> See above, this likely no longer applies.
>
> >
> >> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
> >> ---
> >> arch/x86/kvm/vmx.c | 118 ++++++++++++++++++++++++++++++++++++++--------------
> >> 1 files changed, 87 insertions(+), 31 deletions(-)
> >>
> >> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> >> index 126d047..ca74358 100644
> >> --- a/arch/x86/kvm/vmx.c
> >> +++ b/arch/x86/kvm/vmx.c
> >> @@ -6492,8 +6492,6 @@ static void __vmx_complete_interrupts(struct kvm_vcpu *vcpu,
> >>
> >> static void vmx_complete_interrupts(struct vcpu_vmx *vmx)
> >> {
> >> - if (is_guest_mode(&vmx->vcpu))
> >> - return;
> >> __vmx_complete_interrupts(&vmx->vcpu, vmx->idt_vectoring_info,
> >> VM_EXIT_INSTRUCTION_LEN,
> >> IDT_VECTORING_ERROR_CODE);
> >> @@ -6501,7 +6499,7 @@ static void vmx_complete_interrupts(struct vcpu_vmx *vmx)
> >>
> >> static void vmx_cancel_injection(struct kvm_vcpu *vcpu)
> >> {
> >> - if (is_guest_mode(vcpu))
> >> + if (to_vmx(vcpu)->nested.nested_run_pending)
> >> return;
> >> __vmx_complete_interrupts(vcpu,
> >> vmcs_read32(VM_ENTRY_INTR_INFO_FIELD),
> >> @@ -6534,21 +6532,6 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu *vcpu)
> >> struct vcpu_vmx *vmx = to_vmx(vcpu);
> >> unsigned long debugctlmsr;
> >>
> >> - if (is_guest_mode(vcpu) && !vmx->nested.nested_run_pending) {
> >> - struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
> >> - if (vmcs12->idt_vectoring_info_field &
> >> - VECTORING_INFO_VALID_MASK) {
> >> - vmcs_write32(VM_ENTRY_INTR_INFO_FIELD,
> >> - vmcs12->idt_vectoring_info_field);
> >> - vmcs_write32(VM_ENTRY_INSTRUCTION_LEN,
> >> - vmcs12->vm_exit_instruction_len);
> >> - if (vmcs12->idt_vectoring_info_field &
> >> - VECTORING_INFO_DELIVER_CODE_MASK)
> >> - vmcs_write32(VM_ENTRY_EXCEPTION_ERROR_CODE,
> >> - vmcs12->idt_vectoring_error_code);
> >> - }
> >> - }
> >> -
> >> /* Record the guest's net vcpu time for enforced NMI injections. */
> >> if (unlikely(!cpu_has_virtual_nmis() && vmx->soft_vnmi_blocked))
> >> vmx->entry_time = ktime_get();
> >> @@ -6707,17 +6690,6 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu *vcpu)
> >>
> >> vmx->idt_vectoring_info = vmcs_read32(IDT_VECTORING_INFO_FIELD);
> >>
> >> - if (is_guest_mode(vcpu)) {
> >> - struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
> >> - vmcs12->idt_vectoring_info_field = vmx->idt_vectoring_info;
> >> - if (vmx->idt_vectoring_info & VECTORING_INFO_VALID_MASK) {
> >> - vmcs12->idt_vectoring_error_code =
> >> - vmcs_read32(IDT_VECTORING_ERROR_CODE);
> >> - vmcs12->vm_exit_instruction_len =
> >> - vmcs_read32(VM_EXIT_INSTRUCTION_LEN);
> >> - }
> >> - }
> >> -
> >> vmx->loaded_vmcs->launched = 1;
> >>
> >> vmx->exit_reason = vmcs_read32(VM_EXIT_REASON);
> >> @@ -7324,6 +7296,52 @@ vmcs12_guest_cr4(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
> >> vcpu->arch.cr4_guest_owned_bits));
> >> }
> >>
> >> +static void vmcs12_save_pending_events(struct kvm_vcpu *vcpu,
> >> + struct vmcs12 *vmcs12)
> >> +{
> >> + u32 idt_vectoring;
> >> + unsigned int nr;
> >> +
> >> + /*
> >> + * We only transfer exceptions and maskable interrupts. It is fine if
> >> + * L0 retries to inject a pending NMI over L1.
> >> + */
> >> + if (vcpu->arch.exception.pending) {
> >> + nr = vcpu->arch.exception.nr;
> >> + idt_vectoring = nr | VECTORING_INFO_VALID_MASK;
> >> +
> >> + if (kvm_exception_is_soft(nr)) {
> >> + vmcs12->vm_exit_instruction_len =
> >> + vcpu->arch.event_exit_inst_len;
> >> + idt_vectoring |= INTR_TYPE_SOFT_EXCEPTION;
> >> + } else
> >> + idt_vectoring |= INTR_TYPE_HARD_EXCEPTION;
> >> +
> >> + if (vcpu->arch.exception.has_error_code) {
> >> + idt_vectoring |= VECTORING_INFO_DELIVER_CODE_MASK;
> >> + vmcs12->idt_vectoring_error_code =
> >> + vcpu->arch.exception.error_code;
> >> + }
> >> +
> >> + vmcs12->idt_vectoring_info_field = idt_vectoring;
> >> + } else if (vcpu->arch.interrupt.pending) {
> >> + nr = vcpu->arch.interrupt.nr;
> >> + idt_vectoring = nr | VECTORING_INFO_VALID_MASK;
> >> +
> >> + if (vcpu->arch.interrupt.soft) {
> >> + idt_vectoring |= INTR_TYPE_SOFT_INTR;
> >> + vmcs12->vm_entry_instruction_len =
> >> + vcpu->arch.event_exit_inst_len;
> >> + } else
> >> + idt_vectoring |= INTR_TYPE_EXT_INTR;
> >> +
> >> + vmcs12->idt_vectoring_info_field = idt_vectoring;
> >> + }
> >> +
> >> + kvm_clear_exception_queue(vcpu);
> >> + kvm_clear_interrupt_queue(vcpu);
> >> +}
> >> +
> >> /*
> >> * prepare_vmcs12 is part of what we need to do when the nested L2 guest exits
> >> * and we want to prepare to run its L1 parent. L1 keeps a vmcs for L2 (vmcs12),
> >> @@ -7415,9 +7433,47 @@ static void prepare_vmcs12(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
> >> vmcs12->vm_exit_instruction_len = vmcs_read32(VM_EXIT_INSTRUCTION_LEN);
> >> vmcs12->vmx_instruction_info = vmcs_read32(VMX_INSTRUCTION_INFO);
> >>
> >> - /* clear vm-entry fields which are to be cleared on exit */
> >> - if (!(vmcs12->vm_exit_reason & VMX_EXIT_REASONS_FAILED_VMENTRY))
> >> + if (!(vmcs12->vm_exit_reason & VMX_EXIT_REASONS_FAILED_VMENTRY)) {
> >> + if ((vmcs12->vm_entry_intr_info_field &
> >> + INTR_INFO_VALID_MASK) &&
> >> + (vmcs_read32(VM_ENTRY_INTR_INFO_FIELD) &
> >> + INTR_INFO_VALID_MASK)) {
> > Again I do not see how this condition can be true.
> >
> >> + /*
> >> + * Preserve the event that was supposed to be injected
> >> + * by L1 by emulating how it would have been returned in
> >> + * IDT_VECTORING_INFO_FIELD.
> >> + */
> >> + vmcs12->idt_vectoring_info_field =
> >> + vmcs12->vm_entry_intr_info_field;
> >> + vmcs12->idt_vectoring_error_code =
> >> + vmcs12->vm_entry_exception_error_code;
> >> + vmcs12->vm_exit_instruction_len =
> >> + vmcs12->vm_entry_instruction_len;
> >> + vmcs_write32(VM_ENTRY_INTR_INFO_FIELD, 0);
> >> +
> >> + /*
> >> + * We do not drop NMIs that targeted L2 below as they
> >> + * can also be reinjected over L1. But if this event
> >> + * was an NMI, it was synthetic and came from L1.
> >> + */
> >> + vcpu->arch.nmi_injected = false;
> >> + } else
> >> + /*
> >> + * Transfer the event L0 may have wanted to inject into L2
> >> + * to IDT_VECTORING_INFO_FIELD.
> >> + */
> > I do not understand the comment. This transfers an event from the event queue into vmcs12.
> > Since vmx_complete_interrupts() transfers an event that L1 tried to inject
> > into the event queue too, here we handle not only L0->L2, but also L1->L2
> > events.
>
> I'm not sure if I fully understand your remark. Is it that the comment
> is only talking about L0 events? That is indeed not fully true, L1
> events should make it to the architectural queue as well. Will adjust this.
>
Yes, I was referring to the comment mentioning L0 only.
> > In fact, I think only the "else" part of this if() is needed.
>
> Yes, probably.
>
> Thanks,
> Jan
>
>
--
Gleb.
* Re: [PATCH v2 2/5] KVM: nVMX: Rework event injection and recovery
2013-03-17 15:14 ` Gleb Natapov
@ 2013-03-17 15:17 ` Jan Kiszka
2013-03-17 15:19 ` Gleb Natapov
0 siblings, 1 reply; 13+ messages in thread
From: Jan Kiszka @ 2013-03-17 15:17 UTC (permalink / raw)
To: Gleb Natapov; +Cc: Marcelo Tosatti, kvm, Paolo Bonzini, Nadav Har'El
On 2013-03-17 16:14, Gleb Natapov wrote:
> On Sun, Mar 17, 2013 at 04:02:07PM +0100, Jan Kiszka wrote:
>> On 2013-03-17 14:45, Gleb Natapov wrote:
>>> On Sat, Mar 16, 2013 at 11:23:16AM +0100, Jan Kiszka wrote:
>>>> From: Jan Kiszka <jan.kiszka@siemens.com>
>>>>
>>>> The basic idea is to always transfer the pending event injection on
>>>> vmexit into the architectural state of the VCPU and then drop it from
>>>> there if it turns out that we left L2 to enter L1.
>>>>
>>>> VMX and SVM are now identical in how they recover event injections from
>>>> unperformed vmlaunch/vmresume: We detect that VM_ENTRY_INTR_INFO_FIELD
>>>> still contains a valid event and, if yes, transfer the content into L1's
>>>> idt_vectoring_info_field.
>>>>
>>> But how can this happen with the VMX code? VMX has this nested_run_pending
>>> thing that prevents #vmexit emulation from happening without vmlaunch.
>>> This means that VM_ENTRY_INTR_INFO_FIELD should never be valid during
>>> #vmexit emulation since it is marked invalid during vmlaunch.
>>
>> Now that nmi/interrupt_allowed is strict w.r.t. nested_run_pending again,
>> it may indeed no longer happen. It was definitely a problem before, also
>> with direct vmexit on pending INIT. Requires a second thought, maybe
>> also a WARN_ON(vmx->nested.nested_run_pending) in nested_vmx_vmexit.
>>
>>>
>>>> However, we differ on how to deal with events that L0 wanted to inject
>>>> into L2. Likely, this case is still broken in SVM. For VMX, the function
>>>> vmcs12_save_pending_events deals with transferring pending L0 events
>>>> into the queue of L1. That is mandatory as L1 may decide to switch the
>>>> guest state completely, invalidating or preserving the pending events
>>>> for later injection (including on a different node, once we support
>>>> migration).
>>>>
>>>> Note that we treat directly injected NMIs differently as they can hit
>>>> both L1 and L2. In this case, we let L0 retry the injection, this time
>>>> over L1, after leaving L2.
>>>>
>>> Hmm, where does the SDM say NMI behaves this way?
>>
>> NMIs are only blocked in root mode if we took an NMI-related vmexit (or,
>> of course, an NMI is being processed). Thus, every arriving NMI can
>> either hit the guest or the host - pure luck.
>>
>> However, I have missed the fact that an NMI may have been injected from
>> L1 as well. If injection triggers a vmexit, that NMI could now leak into
>> L1. So we have to process them as well in vmcs12_save_pending_events.
>>
> You mean "should not leak into L0" not L1?
No, L1. If we keep the NMI in the architectural queue, L0 will try to
reinject it over L1 after the vmexit to it.
Jan
* Re: [PATCH v2 2/5] KVM: nVMX: Rework event injection and recovery
2013-03-17 15:17 ` Jan Kiszka
@ 2013-03-17 15:19 ` Gleb Natapov
0 siblings, 0 replies; 13+ messages in thread
From: Gleb Natapov @ 2013-03-17 15:19 UTC (permalink / raw)
To: Jan Kiszka; +Cc: Marcelo Tosatti, kvm, Paolo Bonzini, Nadav Har'El
On Sun, Mar 17, 2013 at 04:17:19PM +0100, Jan Kiszka wrote:
> On 2013-03-17 16:14, Gleb Natapov wrote:
> > On Sun, Mar 17, 2013 at 04:02:07PM +0100, Jan Kiszka wrote:
> >> On 2013-03-17 14:45, Gleb Natapov wrote:
> >>> On Sat, Mar 16, 2013 at 11:23:16AM +0100, Jan Kiszka wrote:
> >>>> From: Jan Kiszka <jan.kiszka@siemens.com>
> >>>>
> >>>> The basic idea is to always transfer the pending event injection on
> >>>> vmexit into the architectural state of the VCPU and then drop it from
> >>>> there if it turns out that we left L2 to enter L1.
> >>>>
> >>>> VMX and SVM are now identical in how they recover event injections from
> >>>> unperformed vmlaunch/vmresume: We detect that VM_ENTRY_INTR_INFO_FIELD
> >>>> still contains a valid event and, if yes, transfer the content into L1's
> >>>> idt_vectoring_info_field.
> >>>>
> >>> But how can this happen with the VMX code? VMX has this nested_run_pending
> >>> thing that prevents #vmexit emulation from happening without vmlaunch.
> >>> This means that VM_ENTRY_INTR_INFO_FIELD should never be valid during
> >>> #vmexit emulation since it is marked invalid during vmlaunch.
> >>
> >> Now that nmi/interrupt_allowed is strict w.r.t. nested_run_pending again,
> >> it may indeed no longer happen. It was definitely a problem before, also
> >> with direct vmexit on pending INIT. Requires a second thought, maybe
> >> also a WARN_ON(vmx->nested.nested_run_pending) in nested_vmx_vmexit.
> >>
> >>>
> >>>> However, we differ on how to deal with events that L0 wanted to inject
> >>>> into L2. Likely, this case is still broken in SVM. For VMX, the function
> >>>> vmcs12_save_pending_events deals with transferring pending L0 events
> >>>> into the queue of L1. That is mandatory as L1 may decide to switch the
> >>>> guest state completely, invalidating or preserving the pending events
> >>>> for later injection (including on a different node, once we support
> >>>> migration).
> >>>>
> >>>> Note that we treat directly injected NMIs differently as they can hit
> >>>> both L1 and L2. In this case, we let L0 retry the injection, this
> >>>> time over L1, after leaving L2.
> >>>>
> >>> Hmm, where does the SDM say NMI behaves this way?
> >>
> >> NMIs are only blocked in root mode if we took an NMI-related vmexit (or,
> >> of course, an NMI is being processed). Thus, every arriving NMI can
> >> either hit the guest or the host - pure luck.
> >>
> >> However, I have missed the fact that an NMI may have been injected from
> >> L1 as well. If injection triggers a vmexit, that NMI could now leak into
> >> L1. So we have to process them as well in vmcs12_save_pending_events.
> >>
> > You mean "should not leak into L0" not L1?
>
> No, L1. If we keep the NMI in the architectural queue, L0 will try to
> reinject it over L1 after the vmexit to it.
>
Ah, yes. I meant the same, but by leaking to L0 I was talking about L0
thinking that it needs to reinject it.
--
Gleb.