kvm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/7] KVM: nVMX: Fix IPIv vs. nested posted interrupts
@ 2024-09-06  4:34 Sean Christopherson
  2024-09-06  4:34 ` [PATCH v2 1/7] KVM: x86: Move "ack" phase of local APIC IRQ delivery to separate API Sean Christopherson
                   ` (7 more replies)
  0 siblings, 8 replies; 11+ messages in thread
From: Sean Christopherson @ 2024-09-06  4:34 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Nathan Chancellor, Chao Gao, Zeng Guang

Fix a bug where KVM injects L2's nested posted interrupt into L1 as a
nested VM-Exit instead of triggering PI processing.  The actual bug is
technically a generic nested posted interrupts problem, but due to the
way that KVM handles interrupt delivery, the issue is mostly limited to
to IPI virtualization being enabled.

Found by the nested posted interrupt KUT test on SPR.

If it weren't for an annoying TOCTOU bug waiting to happen, the fix would
be quite simple, e.g. it's really just:

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index f7dde74ff565..b07805daedf5 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -4288,6 +4288,15 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu)
                        return -EBUSY;
                if (!nested_exit_on_intr(vcpu))
                        goto no_vmexit;
+
+               if (nested_cpu_has_posted_intr(get_vmcs12(vcpu)) &&
+                   kvm_apic_has_interrupt(vcpu) == vmx->nested.posted_intr_nv) {
+                       vmx->nested.pi_pending = true;
+                       kvm_apic_clear_irr(vcpu, vmx->nested.posted_intr_nv);
+                       goto no_vmexit;
+               }
+
                nested_vmx_vmexit(vcpu, EXIT_REASON_EXTERNAL_INTERRUPT, 0, 0);
                return 0;
        }

v2:
 - Split kvm_get_apic_interrupt() into has+ack to avoid marking the IRQ as
   in-service in vmcs02 instead of vmcs01. [Nathan]
 - Gather reviews, but only for the patches that didn't meaningful change (all
   two of them). [Chao]
 - Drop Cc: stable@ from all patches.  For real world hypervisors, this is
   unlikely to cause functional issues, only loss of IPI virtualization
   performance due to the unnecessary VM-Exit.  Whereas evidenced by my screwup
   in v1, this code is plenty subtle enough to introduce bugs.
 - Drop the patch to store nested.posted_intr_nv as an int, as there is no need
   to explicitly match -1 (as a signed int) in this approach.
 - Add a patch to assert vcpu->mutex is held when getting vmcs12, as I was
   "this" close to yanking out nested.posted_intr_nv, until I realized that
   accessing a different vCPU's vmcs12 in the IPI path is unsafe.

v1: https://lore.kernel.org/all/20240720000138.3027780-1-seanjc@google.com

Sean Christopherson (7):
  KVM: x86: Move "ack" phase of local APIC IRQ delivery to separate API
  KVM: nVMX: Get to-be-acknowledge IRQ for nested VM-Exit at injection
    site
  KVM: nVMX: Suppress external interrupt VM-Exit injection if there's no
    IRQ
  KVM: nVMX: Detect nested posted interrupt NV at nested VM-Exit
    injection
  KVM: x86: Fold kvm_get_apic_interrupt() into kvm_cpu_get_interrupt()
  KVM: nVMX: Explicitly invalidate posted_intr_nv if PI is disabled at
    VM-Enter
  KVM: nVMX: Assert that vcpu->mutex is held when accessing secondary
    VMCSes

 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kvm/irq.c              | 10 ++++--
 arch/x86/kvm/lapic.c            |  9 +++---
 arch/x86/kvm/lapic.h            |  2 +-
 arch/x86/kvm/vmx/nested.c       | 57 ++++++++++++++++++++++++++-------
 arch/x86/kvm/vmx/nested.h       |  6 ++++
 arch/x86/kvm/vmx/vmx.c          |  7 ++++
 7 files changed, 72 insertions(+), 20 deletions(-)


base-commit: 332d2c1d713e232e163386c35a3ba0c1b90df83f
-- 
2.46.0.469.g59c65b2a67-goog


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v2 1/7] KVM: x86: Move "ack" phase of local APIC IRQ delivery to separate API
  2024-09-06  4:34 [PATCH v2 0/7] KVM: nVMX: Fix IPIv vs. nested posted interrupts Sean Christopherson
@ 2024-09-06  4:34 ` Sean Christopherson
  2024-09-06  4:34 ` [PATCH v2 2/7] KVM: nVMX: Get to-be-acknowledge IRQ for nested VM-Exit at injection site Sean Christopherson
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 11+ messages in thread
From: Sean Christopherson @ 2024-09-06  4:34 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Nathan Chancellor, Chao Gao, Zeng Guang

Split the "ack" phase, i.e. the movement of an interrupt from IRR=>ISR,
out of kvm_get_apic_interrupt() and into a separate API so that nested
VMX can acknowledge a specific interrupt _after_ emulating a VM-Exit from
L2 to L1.

To correctly emulate nested posted interrupts while APICv is active, KVM
must:

  1. find the highest pending interrupt.
  2. check if that IRQ is L2's notification vector
  3. emulate VM-Exit if the IRQ is NOT the notification vector
  4. ACK the IRQ in L1 _after_ VM-Exit

When APICv is active, the process of moving the IRQ from the IRR to the
ISR also requires a VMWRITE to update vmcs01.GUEST_INTERRUPT_STATUS.SVI,
and so acknowledging the interrupt before switching to vmcs01 would result
in marking the IRQ as in-service in the wrong VMCS.

KVM currently fudges around this issue by doing kvm_get_apic_interrupt()
smack dab in the middle of emulating VM-Exit, but that hack doesn't play
nice with nested posted interrupts, as notification vector IRQs don't
trigger a VM-Exit in the first place.

Cc: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/lapic.c | 17 +++++++++++++----
 arch/x86/kvm/lapic.h |  1 +
 2 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index a7172ba59ad2..ff63ef8163a9 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -2924,14 +2924,13 @@ void kvm_inject_apic_timer_irqs(struct kvm_vcpu *vcpu)
 	}
 }
 
-int kvm_get_apic_interrupt(struct kvm_vcpu *vcpu)
+void kvm_apic_ack_interrupt(struct kvm_vcpu *vcpu, int vector)
 {
-	int vector = kvm_apic_has_interrupt(vcpu);
 	struct kvm_lapic *apic = vcpu->arch.apic;
 	u32 ppr;
 
-	if (vector == -1)
-		return -1;
+	if (WARN_ON_ONCE(vector < 0 || !apic))
+		return;
 
 	/*
 	 * We get here even with APIC virtualization enabled, if doing
@@ -2959,6 +2958,16 @@ int kvm_get_apic_interrupt(struct kvm_vcpu *vcpu)
 		__apic_update_ppr(apic, &ppr);
 	}
 
+}
+EXPORT_SYMBOL_GPL(kvm_apic_ack_interrupt);
+
+int kvm_get_apic_interrupt(struct kvm_vcpu *vcpu)
+{
+	int vector = kvm_apic_has_interrupt(vcpu);
+
+	if (vector != -1)
+		kvm_apic_ack_interrupt(vcpu, vector);
+
 	return vector;
 }
 
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index 7ef8ae73e82d..db80a2965b9c 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -88,6 +88,7 @@ int kvm_create_lapic(struct kvm_vcpu *vcpu);
 void kvm_free_lapic(struct kvm_vcpu *vcpu);
 
 int kvm_apic_has_interrupt(struct kvm_vcpu *vcpu);
+void kvm_apic_ack_interrupt(struct kvm_vcpu *vcpu, int vector);
 int kvm_apic_accept_pic_intr(struct kvm_vcpu *vcpu);
 int kvm_get_apic_interrupt(struct kvm_vcpu *vcpu);
 int kvm_apic_accept_events(struct kvm_vcpu *vcpu);
-- 
2.46.0.469.g59c65b2a67-goog


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v2 2/7] KVM: nVMX: Get to-be-acknowledge IRQ for nested VM-Exit at injection site
  2024-09-06  4:34 [PATCH v2 0/7] KVM: nVMX: Fix IPIv vs. nested posted interrupts Sean Christopherson
  2024-09-06  4:34 ` [PATCH v2 1/7] KVM: x86: Move "ack" phase of local APIC IRQ delivery to separate API Sean Christopherson
@ 2024-09-06  4:34 ` Sean Christopherson
  2024-09-06  4:34 ` [PATCH v2 3/7] KVM: nVMX: Suppress external interrupt VM-Exit injection if there's no IRQ Sean Christopherson
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 11+ messages in thread
From: Sean Christopherson @ 2024-09-06  4:34 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Nathan Chancellor, Chao Gao, Zeng Guang

Move the logic to get the to-be-acknowledge IRQ for a nested VM-Exit from
nested_vmx_vmexit() to vmx_check_nested_events(), which is subtly the one
and only path where KVM invokes nested_vmx_vmexit() with
EXIT_REASON_EXTERNAL_INTERRUPT.  A future fix will perform a last-minute
check on L2's nested posted interrupt notification vector, just before
injecting a nested VM-Exit.  To handle that scenario correctly, KVM needs
to get the interrupt _before_ injecting VM-Exit, as simply querying the
highest priority interrupt, via kvm_cpu_has_interrupt(), would result in
TOCTOU bug, as a new, higher priority interrupt could arrive between
kvm_cpu_has_interrupt() and kvm_cpu_get_interrupt().

Unfortunately, simply moving the call to kvm_cpu_get_interrupt() doesn't
suffice, as a VMWRITE to GUEST_INTERRUPT_STATUS.SVI is hiding in
kvm_get_apic_interrupt(), and acknowledging the interrupt before nested
VM-Exit would cause the VMWRITE to hit vmcs02 instead of vmcs01.

Open code a rough equivalent to kvm_cpu_get_interrupt() so that the IRQ
is acknowledged after emulating VM-Exit, taking care to avoid the TOCTOU
issue described above.

Opportunistically convert the WARN_ON() to a WARN_ON_ONCE().  If KVM has
a bug that results in a false positive from kvm_cpu_has_interrupt(),
spamming dmesg won't help the situation.

Note, nested_vmx_reflect_vmexit() can never reflect external interrupts as
they are always "wanted" by L0.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kvm/irq.c              |  3 ++-
 arch/x86/kvm/vmx/nested.c       | 36 ++++++++++++++++++++++++---------
 3 files changed, 30 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 950a03e0181e..d66d562c0ab3 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -2251,6 +2251,7 @@ int kvm_cpu_has_injectable_intr(struct kvm_vcpu *v);
 int kvm_cpu_has_interrupt(struct kvm_vcpu *vcpu);
 int kvm_cpu_has_extint(struct kvm_vcpu *v);
 int kvm_arch_interrupt_allowed(struct kvm_vcpu *vcpu);
+int kvm_cpu_get_extint(struct kvm_vcpu *v);
 int kvm_cpu_get_interrupt(struct kvm_vcpu *v);
 void kvm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event);
 
diff --git a/arch/x86/kvm/irq.c b/arch/x86/kvm/irq.c
index 3d7eb11d0e45..810da99ff7ed 100644
--- a/arch/x86/kvm/irq.c
+++ b/arch/x86/kvm/irq.c
@@ -108,7 +108,7 @@ EXPORT_SYMBOL_GPL(kvm_cpu_has_interrupt);
  * Read pending interrupt(from non-APIC source)
  * vector and intack.
  */
-static int kvm_cpu_get_extint(struct kvm_vcpu *v)
+int kvm_cpu_get_extint(struct kvm_vcpu *v)
 {
 	if (!kvm_cpu_has_extint(v)) {
 		WARN_ON(!lapic_in_kernel(v));
@@ -131,6 +131,7 @@ static int kvm_cpu_get_extint(struct kvm_vcpu *v)
 	} else
 		return kvm_pic_read_irq(v->kvm); /* PIC */
 }
+EXPORT_SYMBOL_GPL(kvm_cpu_get_extint);
 
 /*
  * Read pending interrupt vector and intack.
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 2392a7ef254d..e6af5f1d3b61 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -4284,11 +4284,37 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu)
 	}
 
 	if (kvm_cpu_has_interrupt(vcpu) && !vmx_interrupt_blocked(vcpu)) {
+		int irq;
+
 		if (block_nested_events)
 			return -EBUSY;
 		if (!nested_exit_on_intr(vcpu))
 			goto no_vmexit;
-		nested_vmx_vmexit(vcpu, EXIT_REASON_EXTERNAL_INTERRUPT, 0, 0);
+
+		if (!nested_exit_intr_ack_set(vcpu)) {
+			nested_vmx_vmexit(vcpu, EXIT_REASON_EXTERNAL_INTERRUPT, 0, 0);
+			return 0;
+		}
+
+		irq = kvm_cpu_get_extint(vcpu);
+		if (irq != -1) {
+			nested_vmx_vmexit(vcpu, EXIT_REASON_EXTERNAL_INTERRUPT,
+					  INTR_INFO_VALID_MASK | INTR_TYPE_EXT_INTR | irq, 0);
+			return 0;
+		}
+
+		irq = kvm_apic_has_interrupt(vcpu);
+		WARN_ON_ONCE(irq < 0);
+
+		nested_vmx_vmexit(vcpu, EXIT_REASON_EXTERNAL_INTERRUPT,
+				  INTR_INFO_VALID_MASK | INTR_TYPE_EXT_INTR | irq, 0);
+
+		/*
+		 * ACK the interrupt _after_ emulating VM-Exit, as the IRQ must
+		 * be marked as in-service in vmcs01.GUEST_INTERRUPT_STATUS.SVI
+		 * if APICv is active.
+		 */
+		kvm_apic_ack_interrupt(vcpu, irq);
 		return 0;
 	}
 
@@ -4969,14 +4995,6 @@ void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 vm_exit_reason,
 	vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
 
 	if (likely(!vmx->fail)) {
-		if ((u16)vm_exit_reason == EXIT_REASON_EXTERNAL_INTERRUPT &&
-		    nested_exit_intr_ack_set(vcpu)) {
-			int irq = kvm_cpu_get_interrupt(vcpu);
-			WARN_ON(irq < 0);
-			vmcs12->vm_exit_intr_info = irq |
-				INTR_INFO_VALID_MASK | INTR_TYPE_EXT_INTR;
-		}
-
 		if (vm_exit_reason != -1)
 			trace_kvm_nested_vmexit_inject(vmcs12->vm_exit_reason,
 						       vmcs12->exit_qualification,
-- 
2.46.0.469.g59c65b2a67-goog


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v2 3/7] KVM: nVMX: Suppress external interrupt VM-Exit injection if there's no IRQ
  2024-09-06  4:34 [PATCH v2 0/7] KVM: nVMX: Fix IPIv vs. nested posted interrupts Sean Christopherson
  2024-09-06  4:34 ` [PATCH v2 1/7] KVM: x86: Move "ack" phase of local APIC IRQ delivery to separate API Sean Christopherson
  2024-09-06  4:34 ` [PATCH v2 2/7] KVM: nVMX: Get to-be-acknowledge IRQ for nested VM-Exit at injection site Sean Christopherson
@ 2024-09-06  4:34 ` Sean Christopherson
  2024-09-06  4:34 ` [PATCH v2 4/7] KVM: nVMX: Detect nested posted interrupt NV at nested VM-Exit injection Sean Christopherson
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 11+ messages in thread
From: Sean Christopherson @ 2024-09-06  4:34 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Nathan Chancellor, Chao Gao, Zeng Guang

In the should-be-impossible scenario that kvm_cpu_get_interrupt() doesn't
return a valid vector after checking kvm_cpu_has_interrupt(), skip VM-Exit
injection to reduce the probability of crashing/confusing L1.  Now that
KVM gets the IRQ _before_ calling nested_vmx_vmexit(), squashing the
VM-Exit injection is trivial since there are no actions that need to be
undone.

Reviewed-by: Chao Gao <chao.gao@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/vmx/nested.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index e6af5f1d3b61..6b7e0ab0e45e 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -4304,7 +4304,8 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu)
 		}
 
 		irq = kvm_apic_has_interrupt(vcpu);
-		WARN_ON_ONCE(irq < 0);
+		if (WARN_ON_ONCE(irq < 0))
+			goto no_vmexit;
 
 		nested_vmx_vmexit(vcpu, EXIT_REASON_EXTERNAL_INTERRUPT,
 				  INTR_INFO_VALID_MASK | INTR_TYPE_EXT_INTR | irq, 0);
-- 
2.46.0.469.g59c65b2a67-goog


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v2 4/7] KVM: nVMX: Detect nested posted interrupt NV at nested VM-Exit injection
  2024-09-06  4:34 [PATCH v2 0/7] KVM: nVMX: Fix IPIv vs. nested posted interrupts Sean Christopherson
                   ` (2 preceding siblings ...)
  2024-09-06  4:34 ` [PATCH v2 3/7] KVM: nVMX: Suppress external interrupt VM-Exit injection if there's no IRQ Sean Christopherson
@ 2024-09-06  4:34 ` Sean Christopherson
  2024-09-06  4:34 ` [PATCH v2 5/7] KVM: x86: Fold kvm_get_apic_interrupt() into kvm_cpu_get_interrupt() Sean Christopherson
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 11+ messages in thread
From: Sean Christopherson @ 2024-09-06  4:34 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Nathan Chancellor, Chao Gao, Zeng Guang

When synthensizing a nested VM-Exit due to an external interrupt, pend a
nested posted interrupt if the external interrupt vector matches L2's PI
notification vector, i.e. if the interrupt is a PI notification for L2.
This fixes a bug where KVM will incorrectly inject VM-Exit instead of
processing nested posted interrupt when IPI virtualization is enabled.

Per the SDM, detection of the notification vector doesn't occur until the
interrupt is acknowledge and deliver to the CPU core.

  If the external-interrupt exiting VM-execution control is 1, any unmasked
  external interrupt causes a VM exit (see Section 26.2). If the "process
  posted interrupts" VM-execution control is also 1, this behavior is
  changed and the processor handles an external interrupt as follows:

    1. The local APIC is acknowledged; this provides the processor core
       with an interrupt vector, called here the physical vector.
    2. If the physical vector equals the posted-interrupt notification
       vector, the logical processor continues to the next step. Otherwise,
       a VM exit occurs as it would normally due to an external interrupt;
       the vector is saved in the VM-exit interruption-information field.

For the most part, KVM has avoided problems because a PI NV for L2 that
arrives will L2 is active will be processed by hardware, and KVM checks
for a pending notification vector during nested VM-Enter.  Thus, to hit
the bug, the PI NV interrupt needs to sneak its way into L1's vIRR while
L2 is active.

Without IPI virtualization, the scenario is practically impossible to hit,
modulo L1 doing weird things (see below), as the ordering between
vmx_deliver_posted_interrupt() and nested VM-Enter effectively guarantees
that either the sender will see the vCPU as being in_guest_mode(), or the
receiver will see the interrupt in its vIRR.

With IPI virtualization, introduced by commit d588bb9be1da ("KVM: VMX:
enable IPI virtualization"), the sending CPU effectively implements a rough
equivalent of vmx_deliver_posted_interrupt(), sans the nested PI NV check.
If the target vCPU has a valid PID, the CPU will send a PI NV interrupt
based on _L1's_ PID, as the sender's because IPIv table points at L1 PIDs.

  PIR := 32 bytes at PID_ADDR;
  // under lock
  PIR[V] := 1;
  store PIR at PID_ADDR;
  // release lock

  NotifyInfo := 8 bytes at PID_ADDR + 32;
  // under lock
  IF NotifyInfo.ON = 0 AND NotifyInfo.SN = 0; THEN
    NotifyInfo.ON := 1;
    SendNotify := 1;
  ELSE
    SendNotify := 0;
  FI;
  store NotifyInfo at PID_ADDR + 32;
  // release lock

  IF SendNotify = 1; THEN
    send an IPI specified by NotifyInfo.NDST and NotifyInfo.NV;
  FI;

As a result, the target vCPU ends up receiving an interrupt on KVM's
POSTED_INTR_VECTOR while L2 is running, with an interrupt in L1's PIR for
L2's nested PI NV.  The POSTED_INTR_VECTOR interrupt triggers a VM-Exit
from L2 to L0, KVM moves the interrupt from L1's PIR to vIRR, triggers a
KVM_REQ_EVENT prior to re-entry to L2, and calls vmx_check_nested_events(),
effectively bypassing all of KVM's "early" checks on nested PI NV.

Without IPI virtualization, the bug can likely be hit only if L1 programs
an assigned device to _post_ an interrupt to L2's notification vector, by
way of L1's PID.PIR.  Doing so would allow the interrupt to get into L1's
vIRR without KVM checking vmcs12's NV.  Which is architecturally allowed,
but unlikely behavior for a hypervisor.

Cc: Zeng Guang <guang.zeng@intel.com>
Reviewed-by: Chao Gao <chao.gao@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/vmx/nested.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 6b7e0ab0e45e..238c26155c2a 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -4307,6 +4307,20 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu)
 		if (WARN_ON_ONCE(irq < 0))
 			goto no_vmexit;
 
+		/*
+		 * If the IRQ is L2's PI notification vector, process posted
+		 * interrupts for L2 instead of injecting VM-Exit, as the
+		 * detection/morphing architecturally occurs when the IRQ is
+		 * delivered to the CPU.  Note, only interrupts that are routed
+		 * through the local APIC trigger posted interrupt processing,
+		 * and enabling posted interrupts requires ACK-on-exit.
+		 */
+		if (irq == vmx->nested.posted_intr_nv) {
+			vmx->nested.pi_pending = true;
+			kvm_apic_clear_irr(vcpu, irq);
+			goto no_vmexit;
+		}
+
 		nested_vmx_vmexit(vcpu, EXIT_REASON_EXTERNAL_INTERRUPT,
 				  INTR_INFO_VALID_MASK | INTR_TYPE_EXT_INTR | irq, 0);
 
-- 
2.46.0.469.g59c65b2a67-goog


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v2 5/7] KVM: x86: Fold kvm_get_apic_interrupt() into kvm_cpu_get_interrupt()
  2024-09-06  4:34 [PATCH v2 0/7] KVM: nVMX: Fix IPIv vs. nested posted interrupts Sean Christopherson
                   ` (3 preceding siblings ...)
  2024-09-06  4:34 ` [PATCH v2 4/7] KVM: nVMX: Detect nested posted interrupt NV at nested VM-Exit injection Sean Christopherson
@ 2024-09-06  4:34 ` Sean Christopherson
  2024-09-06  4:34 ` [PATCH v2 6/7] KVM: nVMX: Explicitly invalidate posted_intr_nv if PI is disabled at VM-Enter Sean Christopherson
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 11+ messages in thread
From: Sean Christopherson @ 2024-09-06  4:34 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Nathan Chancellor, Chao Gao, Zeng Guang

Fold kvm_get_apic_interrupt() into kvm_cpu_get_interrupt() now that nVMX
essentially open codes kvm_get_apic_interrupt() in order to correctly
emulate nested posted interrupts.

Opportunistically stop exporting kvm_cpu_get_interrupt(), as the
aforementioned nVMX flow was the only user in vendor code.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/irq.c   |  7 +++++--
 arch/x86/kvm/lapic.c | 10 ----------
 arch/x86/kvm/lapic.h |  1 -
 3 files changed, 5 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kvm/irq.c b/arch/x86/kvm/irq.c
index 810da99ff7ed..63f66c51975a 100644
--- a/arch/x86/kvm/irq.c
+++ b/arch/x86/kvm/irq.c
@@ -142,9 +142,12 @@ int kvm_cpu_get_interrupt(struct kvm_vcpu *v)
 	if (vector != -1)
 		return vector;			/* PIC */
 
-	return kvm_get_apic_interrupt(v);	/* APIC */
+	vector = kvm_apic_has_interrupt(v);	/* APIC */
+	if (vector != -1)
+		kvm_apic_ack_interrupt(v, vector);
+
+	return vector;
 }
-EXPORT_SYMBOL_GPL(kvm_cpu_get_interrupt);
 
 void kvm_inject_pending_timer_irqs(struct kvm_vcpu *vcpu)
 {
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index ff63ef8163a9..5c820e3f06a8 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -2961,16 +2961,6 @@ void kvm_apic_ack_interrupt(struct kvm_vcpu *vcpu, int vector)
 }
 EXPORT_SYMBOL_GPL(kvm_apic_ack_interrupt);
 
-int kvm_get_apic_interrupt(struct kvm_vcpu *vcpu)
-{
-	int vector = kvm_apic_has_interrupt(vcpu);
-
-	if (vector != -1)
-		kvm_apic_ack_interrupt(vcpu, vector);
-
-	return vector;
-}
-
 static int kvm_apic_state_fixup(struct kvm_vcpu *vcpu,
 		struct kvm_lapic_state *s, bool set)
 {
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index db80a2965b9c..8310ff74be29 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -90,7 +90,6 @@ void kvm_free_lapic(struct kvm_vcpu *vcpu);
 int kvm_apic_has_interrupt(struct kvm_vcpu *vcpu);
 void kvm_apic_ack_interrupt(struct kvm_vcpu *vcpu, int vector);
 int kvm_apic_accept_pic_intr(struct kvm_vcpu *vcpu);
-int kvm_get_apic_interrupt(struct kvm_vcpu *vcpu);
 int kvm_apic_accept_events(struct kvm_vcpu *vcpu);
 void kvm_lapic_reset(struct kvm_vcpu *vcpu, bool init_event);
 u64 kvm_lapic_get_cr8(struct kvm_vcpu *vcpu);
-- 
2.46.0.469.g59c65b2a67-goog


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v2 6/7] KVM: nVMX: Explicitly invalidate posted_intr_nv if PI is disabled at VM-Enter
  2024-09-06  4:34 [PATCH v2 0/7] KVM: nVMX: Fix IPIv vs. nested posted interrupts Sean Christopherson
                   ` (4 preceding siblings ...)
  2024-09-06  4:34 ` [PATCH v2 5/7] KVM: x86: Fold kvm_get_apic_interrupt() into kvm_cpu_get_interrupt() Sean Christopherson
@ 2024-09-06  4:34 ` Sean Christopherson
  2024-09-06  4:34 ` [PATCH v2 7/7] KVM: nVMX: Assert that vcpu->mutex is held when accessing secondary VMCSes Sean Christopherson
  2024-09-10  4:56 ` [PATCH v2 0/7] KVM: nVMX: Fix IPIv vs. nested posted interrupts Sean Christopherson
  7 siblings, 0 replies; 11+ messages in thread
From: Sean Christopherson @ 2024-09-06  4:34 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Nathan Chancellor, Chao Gao, Zeng Guang

Explicitly invalidate posted_intr_nv when emulating nested VM-Enter and
posted interrupts are disabled to make it clear that posted_intr_nv is
valid if and only if nested posted interrupts are enabled, and as a cheap
way to harden against KVM bugs.

KVM initializes posted_intr_nv to -1 at vCPU creation and resets it to -1
when unloading vmcs12 and/or leaving nested mode, i.e. this is not a bug
fix (or at least, it's not intended to be a bug fix).

Note, tracking nested.posted_intr_nv as a u16 subtly adds a measure of
safety, as it prevents unintentionally matching KVM's informal "no IRQ"
vector of -1, stored as a signed int.  Because a u16 can be always be
represented as a signed int, the effective "invalid" value of
posted_intr_nv, 65535, will be preserved as-is when comparing against an
int, i.e. will be zero-extended, not sign-extended, and thus won't get a
false positive if KVM is buggy and compares posted_intr_nv against -1.

Opportunistically add a comment in vmx_deliver_nested_posted_interrupt()
to call out that it must check vmx->nested.posted_intr_nv, not the vector
in vmcs12, which is presumably the _entire_ reason nested.posted_intr_nv
exists.  E.g. vmcs12 is a KVM-controlled snapshot, so there are no TOCTOU
races to worry about, the only potential badness is if the vCPU leaves
nested and frees vmcs12 between the sender checking is_guest_mode() and
dereferencing the vmcs12 pointer.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/vmx/nested.c | 6 ++++--
 arch/x86/kvm/vmx/vmx.c    | 7 +++++++
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 238c26155c2a..7e8a646e2851 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -2317,10 +2317,12 @@ static void prepare_vmcs02_early(struct vcpu_vmx *vmx, struct loaded_vmcs *vmcs0
 
 	/* Posted interrupts setting is only taken from vmcs12.  */
 	vmx->nested.pi_pending = false;
-	if (nested_cpu_has_posted_intr(vmcs12))
+	if (nested_cpu_has_posted_intr(vmcs12)) {
 		vmx->nested.posted_intr_nv = vmcs12->posted_intr_nv;
-	else
+	} else {
+		vmx->nested.posted_intr_nv = -1;
 		exec_control &= ~PIN_BASED_POSTED_INTR;
+	}
 	pin_controls_set(vmx, exec_control);
 
 	/*
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index f18c2d8c7476..63d656032384 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -4219,6 +4219,13 @@ static int vmx_deliver_nested_posted_interrupt(struct kvm_vcpu *vcpu,
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 
+	/*
+	 * DO NOT query the vCPU's vmcs12, as vmcs12 is dynamically allocated
+	 * and freed, and must not be accessed outside of vcpu->mutex.  The
+	 * vCPU's cached PI NV is valid if and only if posted interrupts
+	 * enabled in its vmcs12, i.e. checking the vector also checks that
+	 * L1 has enabled posted interrupts for L2.
+	 */
 	if (is_guest_mode(vcpu) &&
 	    vector == vmx->nested.posted_intr_nv) {
 		/*
-- 
2.46.0.469.g59c65b2a67-goog


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v2 7/7] KVM: nVMX: Assert that vcpu->mutex is held when accessing secondary VMCSes
  2024-09-06  4:34 [PATCH v2 0/7] KVM: nVMX: Fix IPIv vs. nested posted interrupts Sean Christopherson
                   ` (5 preceding siblings ...)
  2024-09-06  4:34 ` [PATCH v2 6/7] KVM: nVMX: Explicitly invalidate posted_intr_nv if PI is disabled at VM-Enter Sean Christopherson
@ 2024-09-06  4:34 ` Sean Christopherson
  2024-09-10  4:56 ` [PATCH v2 0/7] KVM: nVMX: Fix IPIv vs. nested posted interrupts Sean Christopherson
  7 siblings, 0 replies; 11+ messages in thread
From: Sean Christopherson @ 2024-09-06  4:34 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Nathan Chancellor, Chao Gao, Zeng Guang

Add lockdep assertions in get_vmcs12() and get_shadow_vmcs12() to verify
the vCPU's mutex is held, as the returned VMCS objects are dynamically
allocated/freed when nested VMX is turned on/off, i.e. accessing vmcs12
structures without holding vcpu->mutex is susceptible to use-after-free.

Waive the assertion if the VM is being destroyed, as KVM currently forces
a nested VM-Exit when freeing the vCPU.  If/when that wart is fixed, the
assertion can/should be converted to an unqualified lockdep assertion.
See also https://lore.kernel.org/all/Zsd0TqCeY3B5Sb5b@google.com.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/vmx/nested.h | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/arch/x86/kvm/vmx/nested.h b/arch/x86/kvm/vmx/nested.h
index cce4e2aa30fb..668b6c83a373 100644
--- a/arch/x86/kvm/vmx/nested.h
+++ b/arch/x86/kvm/vmx/nested.h
@@ -39,11 +39,17 @@ bool nested_vmx_check_io_bitmaps(struct kvm_vcpu *vcpu, unsigned int port,
 
 static inline struct vmcs12 *get_vmcs12(struct kvm_vcpu *vcpu)
 {
+	lockdep_assert_once(lockdep_is_held(&vcpu->mutex) ||
+			    !refcount_read(&vcpu->kvm->users_count));
+
 	return to_vmx(vcpu)->nested.cached_vmcs12;
 }
 
 static inline struct vmcs12 *get_shadow_vmcs12(struct kvm_vcpu *vcpu)
 {
+	lockdep_assert_once(lockdep_is_held(&vcpu->mutex) ||
+			    !refcount_read(&vcpu->kvm->users_count));
+
 	return to_vmx(vcpu)->nested.cached_shadow_vmcs12;
 }
 
-- 
2.46.0.469.g59c65b2a67-goog


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 0/7] KVM: nVMX: Fix IPIv vs. nested posted interrupts
  2024-09-06  4:34 [PATCH v2 0/7] KVM: nVMX: Fix IPIv vs. nested posted interrupts Sean Christopherson
                   ` (6 preceding siblings ...)
  2024-09-06  4:34 ` [PATCH v2 7/7] KVM: nVMX: Assert that vcpu->mutex is held when accessing secondary VMCSes Sean Christopherson
@ 2024-09-10  4:56 ` Sean Christopherson
  2024-09-10 16:22   ` Nathan Chancellor
  7 siblings, 1 reply; 11+ messages in thread
From: Sean Christopherson @ 2024-09-10  4:56 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Nathan Chancellor, Chao Gao, Zeng Guang

On Thu, 05 Sep 2024 21:34:06 -0700, Sean Christopherson wrote:
> Fix a bug where KVM injects L2's nested posted interrupt into L1 as a
> nested VM-Exit instead of triggering PI processing.  The actual bug is
> technically a generic nested posted interrupts problem, but due to the
> way that KVM handles interrupt delivery, the issue is mostly limited to
> to IPI virtualization being enabled.
> 
> Found by the nested posted interrupt KUT test on SPR.
> 
> [...]

Trying this again, hopefully with less awful testing this time...

Applied to kvm-x86 vmx.

[1/7] KVM: x86: Move "ack" phase of local APIC IRQ delivery to separate API
      https://github.com/kvm-x86/linux/commit/a194a3a13ce0
[2/7] KVM: nVMX: Get to-be-acknowledge IRQ for nested VM-Exit at injection site
      https://github.com/kvm-x86/linux/commit/363010e1dd0e
[3/7] KVM: nVMX: Suppress external interrupt VM-Exit injection if there's no IRQ
      https://github.com/kvm-x86/linux/commit/8c23670f2b00
[4/7] KVM: nVMX: Detect nested posted interrupt NV at nested VM-Exit injection
      https://github.com/kvm-x86/linux/commit/6e0b456547f4
[5/7] KVM: x86: Fold kvm_get_apic_interrupt() into kvm_cpu_get_interrupt()
      https://github.com/kvm-x86/linux/commit/aa9477966aab
[6/7] KVM: nVMX: Explicitly invalidate posted_intr_nv if PI is disabled at VM-Enter
      https://github.com/kvm-x86/linux/commit/1ed0f119c5ff
[7/7] KVM: nVMX: Assert that vcpu->mutex is held when accessing secondary VMCSes
      https://github.com/kvm-x86/linux/commit/3dde46a21aa7

--
https://github.com/kvm-x86/linux/tree/next

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 0/7] KVM: nVMX: Fix IPIv vs. nested posted interrupts
  2024-09-10  4:56 ` [PATCH v2 0/7] KVM: nVMX: Fix IPIv vs. nested posted interrupts Sean Christopherson
@ 2024-09-10 16:22   ` Nathan Chancellor
  2024-09-10 17:43     ` Sean Christopherson
  0 siblings, 1 reply; 11+ messages in thread
From: Nathan Chancellor @ 2024-09-10 16:22 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, kvm, linux-kernel, Chao Gao, Zeng Guang

On Mon, Sep 09, 2024 at 09:56:42PM -0700, Sean Christopherson wrote:
> On Thu, 05 Sep 2024 21:34:06 -0700, Sean Christopherson wrote:
> > Fix a bug where KVM injects L2's nested posted interrupt into L1 as a
> > nested VM-Exit instead of triggering PI processing.  The actual bug is
> > technically a generic nested posted interrupts problem, but due to the
> > way that KVM handles interrupt delivery, the issue is mostly limited to
> > to IPI virtualization being enabled.
> > 
> > Found by the nested posted interrupt KUT test on SPR.
> > 
> > [...]
> 
> Trying this again, hopefully with less awful testing this time...

I meant to reply yesterday but I guess I lost track of time. This passed
my testing on all my machines, so it is not as bad as last time :)

Cheers,
Nathan

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 0/7] KVM: nVMX: Fix IPIv vs. nested posted interrupts
  2024-09-10 16:22   ` Nathan Chancellor
@ 2024-09-10 17:43     ` Sean Christopherson
  0 siblings, 0 replies; 11+ messages in thread
From: Sean Christopherson @ 2024-09-10 17:43 UTC (permalink / raw)
  To: Nathan Chancellor; +Cc: Paolo Bonzini, kvm, linux-kernel, Chao Gao, Zeng Guang

On Tue, Sep 10, 2024, Nathan Chancellor wrote:
> On Mon, Sep 09, 2024 at 09:56:42PM -0700, Sean Christopherson wrote:
> > On Thu, 05 Sep 2024 21:34:06 -0700, Sean Christopherson wrote:
> > > Fix a bug where KVM injects L2's nested posted interrupt into L1 as a
> > > nested VM-Exit instead of triggering PI processing.  The actual bug is
> > > technically a generic nested posted interrupts problem, but due to the
> > > way that KVM handles interrupt delivery, the issue is mostly limited to
> > > to IPI virtualization being enabled.
> > > 
> > > Found by the nested posted interrupt KUT test on SPR.
> > > 
> > > [...]
> > 
> > Trying this again, hopefully with less awful testing this time...
> 
> I meant to reply yesterday but I guess I lost track of time. This passed
> my testing on all my machines, so it is not as bad as last time :)

Mission Suck Less Accomplished!

Thanks much!

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2024-09-10 17:43 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-09-06  4:34 [PATCH v2 0/7] KVM: nVMX: Fix IPIv vs. nested posted interrupts Sean Christopherson
2024-09-06  4:34 ` [PATCH v2 1/7] KVM: x86: Move "ack" phase of local APIC IRQ delivery to separate API Sean Christopherson
2024-09-06  4:34 ` [PATCH v2 2/7] KVM: nVMX: Get to-be-acknowledge IRQ for nested VM-Exit at injection site Sean Christopherson
2024-09-06  4:34 ` [PATCH v2 3/7] KVM: nVMX: Suppress external interrupt VM-Exit injection if there's no IRQ Sean Christopherson
2024-09-06  4:34 ` [PATCH v2 4/7] KVM: nVMX: Detect nested posted interrupt NV at nested VM-Exit injection Sean Christopherson
2024-09-06  4:34 ` [PATCH v2 5/7] KVM: x86: Fold kvm_get_apic_interrupt() into kvm_cpu_get_interrupt() Sean Christopherson
2024-09-06  4:34 ` [PATCH v2 6/7] KVM: nVMX: Explicitly invalidate posted_intr_nv if PI is disabled at VM-Enter Sean Christopherson
2024-09-06  4:34 ` [PATCH v2 7/7] KVM: nVMX: Assert that vcpu->mutex is held when accessing secondary VMCSes Sean Christopherson
2024-09-10  4:56 ` [PATCH v2 0/7] KVM: nVMX: Fix IPIv vs. nested posted interrupts Sean Christopherson
2024-09-10 16:22   ` Nathan Chancellor
2024-09-10 17:43     ` Sean Christopherson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).