* [PATCH v2 00/17] KVM: TDX: TDX interrupts
@ 2025-02-11  2:58 Binbin Wu
  2025-02-11  2:58 ` [PATCH v2 01/17] KVM: TDX: Add support for find pending IRQ in a protected local APIC Binbin Wu
                   ` (16 more replies)
  0 siblings, 17 replies; 33+ messages in thread
From: Binbin Wu @ 2025-02-11  2:58 UTC (permalink / raw)
  To: pbonzini, seanjc, kvm
  Cc: rick.p.edgecombe, kai.huang, adrian.hunter, reinette.chatre,
	xiaoyao.li, tony.lindgren, isaku.yamahata, yan.y.zhao, chao.gao,
	linux-kernel, binbin.wu

Hi,

This patch series introduces support for interrupt handling in TDX
guests, including virtual interrupt injection and VM-Exits caused by
vectored events.

This patch set is one of several that are all needed to provide the
ability to run a functioning TD VM. We think it is in pretty good shape
at this point, but it probably needs another round of review before
handoff. We would appreciate review from Sean on the implementation of
the APICv feedback [1].


Base of this series
===================
This series is based on kvm-coco-queue up to the end of MMU part 2, plus
two later series. The stack is:
  - '55f78d925e07 ("KVM: TDX: Return -EBUSY when tdh_mem_page_add()
    encounters TDX_OPERAND_BUSY")'.
  - v2 of "KVM: TDX: TD vcpu enter/exit" (There is one small log difference
    between the v2 patches and the commits in kvm-coco-queue. No code
    differences). 
  - v2 of "KVM: TDX: TDX hypercalls may exit to userspace"


Notable changes since v1 [2]
============================
Enforce APICv as active for TDX guests from KVM's point of view. This was
suggested by Sean in a PUCK session, because it is not conceptually right
to "lie" to KVM that APICv is disabled while it is actually enabled.
Instead, it's better to report APICv as enabled and prevent it from being
disabled from KVM's point of view. More details can be found in the
discussion thread [1].
For this purpose, additional checks are implemented:
- Check enable_apicv in tdx_bringup().
- Reject KVM_{GET,SET}_LAPIC from userspace, which requires a QEMU change
  to skip the KVM_{GET,SET}_LAPIC requests (see the sketch after this
  list).
- Implement vt_refresh_apicv_exec_ctrl() to bug the VM if APICv is
  disabled.
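
For reference, a minimal sketch of the KVM_{GET,SET}_LAPIC rejection
described above.  The guest_apic_protected flag is the one added in patch
1; the exact placement in the vCPU ioctl path is an assumption:

	/*
	 * Hypothetical placement in the KVM_{GET,SET}_LAPIC handling of
	 * the vCPU ioctl: the vAPIC state of a TD is owned by the TDX
	 * module, so the LAPIC save/restore ioctls can't be serviced.
	 */
	if (lapic_in_kernel(vcpu) && vcpu->arch.apic->guest_apic_protected)
		return -EINVAL;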

Enforce KVM_IRQCHIP_SPLIT for TDX guests to disallow the in-kernel I/O
APIC, while the in-kernel local APIC is still needed.

Unify the code to handle NMIs and external interrupts.

WARN on INIT events for TDX vCPUs.

Drop vt_hwapic_irr_update() since .hwapic_irr_update() is gone in 6.14.

Also, there is a new update about the NMI blocking status after exiting
from a TDX guest for NMI-induced exits; see the "NMI" part of the section
"VM-Exits caused by vectored event" below.


Virtual interrupt injection
===========================
Non-NMI Interrupt
-----------------
TDX supports non-NMI interrupt injection only by posted interrupt. Posted
interrupt descriptors (PIDs) are allocated in shared memory, so KVM can
update them directly. To post pending interrupts in the PID, KVM can
generate a self-IPI with the notification vector prior to TD entry.
TDX guest state is protected, so KVM can't read the interrupt status of a
TDX guest. This series assumes interrupts are always allowed. A later
patch set will add support for the TDX guest to issue a TDVMCALL with
HLT, which passes an interrupt-blocked flag, so that whether an interrupt
is allowed during HLT can be checked against that flag.
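
As a concrete reference, the TD-entry path added in patch 4 posts
outstanding interrupts with exactly such a self-IPI; condensed from
tdx_vcpu_run() in this series:

	/*
	 * If the PID has a pending notification (ON set), send the posted
	 * interrupt notification vector to self so that the CPU syncs the
	 * PIR to the vIRR on TD entry.
	 */
	if (pi_test_on(&vt->pi_desc))
		apic->send_IPI_self(POSTED_INTR_VECTOR);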

NMI
---
KVM can request the TDX module to inject an NMI into a TDX vCPU by
setting the PEND_NMI TDVPS field to 1. Following that, KVM can call
TDH.VP.ENTER to run the vCPU and the TDX module will attempt to inject
the NMI as soon as possible.
The PEND_NMI TDVPS field is a 1-bit field, i.e. KVM can only pend one NMI
in the TDX module. Also, TDX doesn't allow KVM to request an NMI-window
exit directly. When one NMI is already pending in the TDX module, i.e. it
has not been delivered to the TDX guest yet, and another NMI is pending
in KVM, collapse the NMI pending in KVM into the one pending in the TDX
module. Such collapsing is OK considering that on x86 bare metal,
multiple NMIs can collapse into one NMI, e.g. when NMIs are blocked by an
SMI.  It's the OS's responsibility to poll all NMI sources in the NMI
handler to avoid missing the handling of some NMI events. More details
can be found in the changelog of the patch "KVM: TDX: Implement methods
to inject NMI".

SMI
---
TDX doesn't support system-management mode (SMM) and system-management
interrupt (SMI) in guest TDs because the TDX module doesn't provide a way
for the VMM to inject an SMI into a guest TD or to switch a guest vCPU
into SMM. Handle SMI requests as KVM does for CONFIG_KVM_SMM=n, i.e.
return -ENOTTY, and add KVM_BUG_ON() to the SMI-related ops for TDs.

INIT/SIPI event
----------------
TDX defines its own vCPU creation and initialization sequence, which
involves multiple SEAMCALLs and is only allowed during TD build time.
Always block INIT and SIPI events for the TDX guest.
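
A minimal sketch of one way the blocking could be wired up, assuming the
existing .apic_init_signal_blocked() hook is reused (the hook choice here
is an assumption; see the patch "KVM: TDX: Always block INIT/SIPI" for
the actual implementation):

static bool vt_apic_init_signal_blocked(struct kvm_vcpu *vcpu)
{
	/* INIT/SIPI can never be delivered to a TD vCPU after TD build. */
	if (is_td_vcpu(vcpu))
		return true;

	return vmx_apic_init_signal_blocked(vcpu);
}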


VM-Exits caused by vectored event
=================================
NMI (with *new update*)
-----------------------
Just like the VMX case, NMIs remain blocked after exiting from a TDX
guest for NMI-induced exits [*], so VM-Exits caused by NMIs are handled
within tdx_vcpu_enter_exit(), i.e., before leaving the safety of noinstr.

[*]: Old TDX modules may have a bug that leaves NMIs unblocked after
exiting from a TDX guest for NMI-induced exits.  This could potentially
lead to nested NMIs: a new NMI arrives while KVM is manually calling the
host NMI handler.  This is an architectural violation, but it does no
real harm until FRED is enabled together with TDX (for non-FRED, the host
NMI handler can handle nested NMIs).  Given that this is rare and does no
real harm, ignore it for the initial TDX support.
For new TDX modules that fix the bug, NMIs are blocked after exiting from
a TDX guest for NMI-induced exits, which is the default behavior and no
"opt-in" is needed. This is aligned with the suggestion made by Sean [3].

External Interrupt
------------------
Similar to the VMX case, external interrupts are handled in
.handle_exit_irqoff() callback.

Exception
---------
Machine check, which is handled in the .handle_exit_irqoff() callback, is
the only exception type KVM handles for TDX guests. Other exceptions
can't be intercepted because TDX guest state is protected, and the TDX
VMM isn't supposed to handle them. Exit to userspace with
KVM_EXIT_EXCEPTION if an unexpected exception occurs.
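
For illustration, the userspace exit could be filled in roughly as below;
the run-struct fields match the existing KVM_EXIT_EXCEPTION uAPI, while
intr_info and error_code are assumed to come from the TD exit info:

	/* Unexpected exception in a TD: report it to userspace. */
	vcpu->run->exit_reason = KVM_EXIT_EXCEPTION;
	vcpu->run->ex.exception = intr_info & INTR_INFO_VECTOR_MASK;
	vcpu->run->ex.error_code = error_code;
	return 0;	/* 0 tells the exit handler to go to userspace */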

SMI
---
In SEAM root mode (TDX module), all interrupts are blocked. If an SMI
occurs in SEAM non-root mode (TD guest), the SMI causes a VM exit to the
TDX module, which then SEAMRETs to KVM. Once execution returns to KVM,
the SMI is delivered and handled by the host kernel handler right away.
An SMI can be an "I/O SMI" or an "other SMI".  For TDX, there will be no
I/O SMI because I/O instructions inside a TDX guest trigger a #VE, and
the TDX guest needs to use TDVMCALL to request the VMM to do the I/O
emulation.
For "other SMI", there are two cases:
- MSMI case.  When BIOS eMCA MCE-SMI morphing is enabled, the #MC occurs in
  TDX guest will be delivered as an MSMI.  It causes an
  EXIT_REASON_OTHER_SMI VM exit with MSMI (bit 0) set in the exit
  qualification.  On VM exit, TDX module checks whether the "other SMI" is
  caused by an MSMI or not.  If so, TDX module marks TD as fatal,
  preventing further TD entries, and then completes the TD exit flow to KVM
  with the TDH.VP.ENTER outputs indicating TDX_NON_RECOVERABLE_TD.  After
  TD exit, the MSMI is delivered and eventually handled by the kernel
  machine check handler (7911f14 x86/mce: Implement recovery for errors in
  TDX/SEAM non-root mode), i.e., the memory page is marked as poisoned and
  it won't be freed to the free list when the TDX guest is terminated.
  Since the TDX guest is dead, follow other non-recoverable cases, exit to
  userspace.
- For non-MSMI case, KVM doesn't need to do anything, just continue TDX
  vCPU execution.
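
Put together, a hedged sketch of the "other SMI" exit handling described
above; the flow follows this cover letter, and the surrounding switch
statement in the TDX exit handler is assumed:

	case EXIT_REASON_OTHER_SMI:
		/*
		 * An MSMI (bit 0 of the exit qualification) means a #MC was
		 * morphed into an SMI.  The TDX module has already marked
		 * the TD as non-recoverable, which is reported via the
		 * TDH.VP.ENTER status and handled by the common
		 * non-recoverable path that exits to userspace.  A plain
		 * "other SMI" was already handled by the host SMI handler,
		 * so simply resume the vCPU.
		 */
		return 1;	/* resume the guest */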


Repos
=====
Due to "KVM: VMX: Move common fields of struct" in "TDX vcpu enter/exit" v2
[4], subsequent patches require changes to use new struct vcpu_vt, refer to
the full KVM branch below.

It requires TDX module 1.5.06.00.0744 [5] or later, as mentioned in [4].
A working edk2 commit is 95d8a1c ("UnitTestFrameworkPkg: Use TianoCore
mirror of subhook submodule").

The full KVM branch is here:
https://github.com/intel/tdx/tree/tdx_kvm_dev-2025-02-10

A matching QEMU is here:
https://github.com/intel-staging/qemu-tdx/tree/tdx-qemu-upstream-v7


Testing 
=======
The series has been tested as part of the development branch for the TDX
base series. The testing consisted of TDX kvm-unit-tests, TDX-enhanced
KVM selftests, and booting a Linux TD.

[1] https://lore.kernel.org/kvm/Z4VKdbW1R0AoLvkB@google.com
[2] https://lore.kernel.org/kvm/20241209010734.3543481-1-binbin.wu@linux.intel.com
[3] https://lore.kernel.org/kvm/Z0T_iPdmtpjrc14q@google.com
[4] https://lore.kernel.org/kvm/20250129095902.16391-1-adrian.hunter@intel.com
[5] https://github.com/intel/tdx-module/releases/tag/TDX_1.5.06

Binbin Wu (2):
  KVM: TDX: Enforce KVM_IRQCHIP_SPLIT for TDX guests
  KVM: VMX: Move emulation_required to struct vcpu_vt

Isaku Yamahata (12):
  KVM: TDX: Disable PI wakeup for IPIv
  KVM: VMX: Move posted interrupt delivery code to common header
  KVM: TDX: Implement non-NMI interrupt injection
  KVM: TDX: Wait lapic expire when timer IRQ was injected
  KVM: TDX: Implement methods to inject NMI
  KVM: TDX: Complete interrupts after TD exit
  KVM: TDX: Handle SMI request as !CONFIG_KVM_SMM
  KVM: TDX: Always block INIT/SIPI
  KVM: TDX: Force APICv active for TDX guest
  KVM: TDX: Add methods to ignore virtual apic related operation
  KVM: TDX: Handle EXCEPTION_NMI and EXTERNAL_INTERRUPT
  KVM: TDX: Handle EXIT_REASON_OTHER_SMI

Sean Christopherson (3):
  KVM: TDX: Add support for find pending IRQ in a protected local APIC
  KVM: x86: Assume timer IRQ was injected if APIC state is protected
  KVM: VMX: Add a helper for NMI handling

 arch/x86/include/asm/kvm-x86-ops.h |   1 +
 arch/x86/include/asm/kvm_host.h    |   1 +
 arch/x86/include/asm/posted_intr.h |   5 +
 arch/x86/include/uapi/asm/vmx.h    |   1 +
 arch/x86/kvm/irq.c                 |   3 +
 arch/x86/kvm/lapic.c               |  14 +-
 arch/x86/kvm/lapic.h               |   2 +
 arch/x86/kvm/smm.h                 |   3 +
 arch/x86/kvm/vmx/common.h          |  74 ++++++++
 arch/x86/kvm/vmx/main.c            | 262 ++++++++++++++++++++++++++---
 arch/x86/kvm/vmx/nested.c          |   2 +-
 arch/x86/kvm/vmx/posted_intr.c     |   9 +-
 arch/x86/kvm/vmx/posted_intr.h     |   2 +
 arch/x86/kvm/vmx/tdx.c             | 145 +++++++++++++++-
 arch/x86/kvm/vmx/tdx.h             |   5 +
 arch/x86/kvm/vmx/vmx.c             | 113 +++----------
 arch/x86/kvm/vmx/vmx.h             |   1 -
 arch/x86/kvm/vmx/x86_ops.h         |  12 ++
 arch/x86/kvm/x86.c                 |   6 +
 19 files changed, 541 insertions(+), 120 deletions(-)

-- 
2.46.0



* [PATCH v2 01/17] KVM: TDX: Add support for find pending IRQ in a protected local APIC
  2025-02-11  2:58 [PATCH v2 00/17] KVM: TDX: TDX interrupts Binbin Wu
@ 2025-02-11  2:58 ` Binbin Wu
  2025-02-11  7:23   ` Binbin Wu
  2025-02-12  8:12   ` Chao Gao
  2025-02-11  2:58 ` [PATCH v2 02/17] KVM: TDX: Disable PI wakeup for IPIv Binbin Wu
                   ` (15 subsequent siblings)
  16 siblings, 2 replies; 33+ messages in thread
From: Binbin Wu @ 2025-02-11  2:58 UTC (permalink / raw)
  To: pbonzini, seanjc, kvm
  Cc: rick.p.edgecombe, kai.huang, adrian.hunter, reinette.chatre,
	xiaoyao.li, tony.lindgren, isaku.yamahata, yan.y.zhao, chao.gao,
	linux-kernel, binbin.wu

From: Sean Christopherson <seanjc@google.com>

Add a flag and a hook to KVM's local APIC management to support
determining whether or not a TDX guest has a pending IRQ.  For TDX vCPUs,
the virtual APIC page is owned by the TDX module and cannot be accessed
by KVM.  As a result, registers that are virtualized by the CPU, e.g.
PPR, cannot be read or written by KVM.  To deliver interrupts for TDX
guests, KVM must send an IRQ to the CPU on the posted interrupt
notification vector.  And to determine if a TDX vCPU has a pending
interrupt, KVM must check if there is an outstanding notification.

Return "no interrupt" in kvm_apic_has_interrupt() if the guest APIC is
protected to short-circuit the various other flows that try to pull an
IRQ out of the vAPIC, the only valid operation is querying _if_ an IRQ is
pending, KVM can't do anything based on _which_ IRQ is pending.

Intentionally omit sanity checks from other flows, e.g. PPR update, so as
not to degrade non-TDX guests with unnecessary checks.  A well-behaved KVM
and userspace will never reach those flows for TDX guests, but reaching
them is not fatal if something does go awry.

Note, this doesn't handle interrupts that have been delivered to the vCPU
but not yet recognized by the core, i.e. interrupts that are sitting in
vmcs.GUEST_INTR_STATUS.  Querying that state requires a SEAMCALL and will
be supported in a future patch.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
---
TDX interrupts v2:
 - Fix a typo in changelog.

TDX interrupts v1:
 - Dropped vt_protected_apic_has_interrupt() with KVM_BUG_ON(), wire in
   tdx_protected_apic_has_interrupt() directly. (Rick)
 - Add {} on else in vt_hardware_setup()
---
 arch/x86/include/asm/kvm-x86-ops.h | 1 +
 arch/x86/include/asm/kvm_host.h    | 1 +
 arch/x86/kvm/irq.c                 | 3 +++
 arch/x86/kvm/lapic.c               | 3 +++
 arch/x86/kvm/lapic.h               | 2 ++
 arch/x86/kvm/vmx/main.c            | 3 +++
 arch/x86/kvm/vmx/tdx.c             | 6 ++++++
 arch/x86/kvm/vmx/x86_ops.h         | 2 ++
 8 files changed, 21 insertions(+)

diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index d953a454bafb..2eaabff66c82 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -115,6 +115,7 @@ KVM_X86_OP_OPTIONAL(pi_start_assignment)
 KVM_X86_OP_OPTIONAL(apicv_pre_state_restore)
 KVM_X86_OP_OPTIONAL(apicv_post_state_restore)
 KVM_X86_OP_OPTIONAL_RET0(dy_apicv_has_pending_interrupt)
+KVM_X86_OP_OPTIONAL(protected_apic_has_interrupt)
 KVM_X86_OP_OPTIONAL(set_hv_timer)
 KVM_X86_OP_OPTIONAL(cancel_hv_timer)
 KVM_X86_OP(setup_mce)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index e855866bf600..ad275b606d68 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1838,6 +1838,7 @@ struct kvm_x86_ops {
 	void (*apicv_pre_state_restore)(struct kvm_vcpu *vcpu);
 	void (*apicv_post_state_restore)(struct kvm_vcpu *vcpu);
 	bool (*dy_apicv_has_pending_interrupt)(struct kvm_vcpu *vcpu);
+	bool (*protected_apic_has_interrupt)(struct kvm_vcpu *vcpu);
 
 	int (*set_hv_timer)(struct kvm_vcpu *vcpu, u64 guest_deadline_tsc,
 			    bool *expired);
diff --git a/arch/x86/kvm/irq.c b/arch/x86/kvm/irq.c
index 63f66c51975a..f0644d0bbe11 100644
--- a/arch/x86/kvm/irq.c
+++ b/arch/x86/kvm/irq.c
@@ -100,6 +100,9 @@ int kvm_cpu_has_interrupt(struct kvm_vcpu *v)
 	if (kvm_cpu_has_extint(v))
 		return 1;
 
+	if (lapic_in_kernel(v) && v->arch.apic->guest_apic_protected)
+		return static_call(kvm_x86_protected_apic_has_interrupt)(v);
+
 	return kvm_apic_has_interrupt(v) != -1;	/* LAPIC */
 }
 EXPORT_SYMBOL_GPL(kvm_cpu_has_interrupt);
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index a1cbca31ec30..bbdede07d063 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -2967,6 +2967,9 @@ int kvm_apic_has_interrupt(struct kvm_vcpu *vcpu)
 	if (!kvm_apic_present(vcpu))
 		return -1;
 
+	if (apic->guest_apic_protected)
+		return -1;
+
 	__apic_update_ppr(apic, &ppr);
 	return apic_has_interrupt_for_ppr(apic, ppr);
 }
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index 1a8553ebdb42..e33c969439f7 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -65,6 +65,8 @@ struct kvm_lapic {
 	bool sw_enabled;
 	bool irr_pending;
 	bool lvt0_in_nmi_mode;
+	/* Select registers in the vAPIC cannot be read/written. */
+	bool guest_apic_protected;
 	/* Number of bits set in ISR. */
 	s16 isr_count;
 	/* The highest vector set in ISR; if -1 - invalid, must scan ISR. */
diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
index 7f1318c44040..2b1ea57a3a4e 100644
--- a/arch/x86/kvm/vmx/main.c
+++ b/arch/x86/kvm/vmx/main.c
@@ -62,6 +62,8 @@ static __init int vt_hardware_setup(void)
 		vt_x86_ops.set_external_spte = tdx_sept_set_private_spte;
 		vt_x86_ops.free_external_spt = tdx_sept_free_private_spt;
 		vt_x86_ops.remove_external_spte = tdx_sept_remove_private_spte;
+	} else {
+		vt_x86_ops.protected_apic_has_interrupt = NULL;
 	}
 
 	return 0;
@@ -371,6 +373,7 @@ struct kvm_x86_ops vt_x86_ops __initdata = {
 	.sync_pir_to_irr = vmx_sync_pir_to_irr,
 	.deliver_interrupt = vmx_deliver_interrupt,
 	.dy_apicv_has_pending_interrupt = pi_has_pending_interrupt,
+	.protected_apic_has_interrupt = tdx_protected_apic_has_interrupt,
 
 	.set_tss_addr = vmx_set_tss_addr,
 	.set_identity_map_addr = vmx_set_identity_map_addr,
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 8f3147c6e602..6940ce812730 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -668,6 +668,7 @@ int tdx_vcpu_create(struct kvm_vcpu *vcpu)
 		return -EINVAL;
 
 	fpstate_set_confidential(&vcpu->arch.guest_fpu);
+	vcpu->arch.apic->guest_apic_protected = true;
 
 	vcpu->arch.efer = EFER_SCE | EFER_LME | EFER_LMA | EFER_NX;
 
@@ -709,6 +710,11 @@ void tdx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 	local_irq_enable();
 }
 
+bool tdx_protected_apic_has_interrupt(struct kvm_vcpu *vcpu)
+{
+	return pi_has_pending_interrupt(vcpu);
+}
+
 /*
  * Compared to vmx_prepare_switch_to_guest(), there is not much to do
  * as SEAMCALL/SEAMRET calls take care of most of save and restore.
diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h
index 92716f6486e9..8086e5c58cd6 100644
--- a/arch/x86/kvm/vmx/x86_ops.h
+++ b/arch/x86/kvm/vmx/x86_ops.h
@@ -135,6 +135,7 @@ int tdx_vcpu_pre_run(struct kvm_vcpu *vcpu);
 fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu, bool force_immediate_exit);
 void tdx_prepare_switch_to_guest(struct kvm_vcpu *vcpu);
 void tdx_vcpu_put(struct kvm_vcpu *vcpu);
+bool tdx_protected_apic_has_interrupt(struct kvm_vcpu *vcpu);
 int tdx_handle_exit(struct kvm_vcpu *vcpu,
 		enum exit_fastpath_completion fastpath);
 void tdx_get_exit_info(struct kvm_vcpu *vcpu, u32 *reason,
@@ -173,6 +174,7 @@ static inline fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu, bool force_immediat
 }
 static inline void tdx_prepare_switch_to_guest(struct kvm_vcpu *vcpu) {}
 static inline void tdx_vcpu_put(struct kvm_vcpu *vcpu) {}
+static inline bool tdx_protected_apic_has_interrupt(struct kvm_vcpu *vcpu) { return false; }
 static inline int tdx_handle_exit(struct kvm_vcpu *vcpu,
 		enum exit_fastpath_completion fastpath) { return 0; }
 static inline void tdx_get_exit_info(struct kvm_vcpu *vcpu, u32 *reason, u64 *info1,
-- 
2.46.0



* [PATCH v2 02/17] KVM: TDX: Disable PI wakeup for IPIv
  2025-02-11  2:58 [PATCH v2 00/17] KVM: TDX: TDX interrupts Binbin Wu
  2025-02-11  2:58 ` [PATCH v2 01/17] KVM: TDX: Add support for find pending IRQ in a protected local APIC Binbin Wu
@ 2025-02-11  2:58 ` Binbin Wu
  2025-02-11  2:58 ` [PATCH v2 03/17] KVM: VMX: Move posted interrupt delivery code to common header Binbin Wu
                   ` (14 subsequent siblings)
  16 siblings, 0 replies; 33+ messages in thread
From: Binbin Wu @ 2025-02-11  2:58 UTC (permalink / raw)
  To: pbonzini, seanjc, kvm
  Cc: rick.p.edgecombe, kai.huang, adrian.hunter, reinette.chatre,
	xiaoyao.li, tony.lindgren, isaku.yamahata, yan.y.zhao, chao.gao,
	linux-kernel, binbin.wu

From: Isaku Yamahata <isaku.yamahata@intel.com>

Disable PI wakeup for the IPI virtualization (IPIv) case for TDX.

When a vCPU is being scheduled out, if the vCPU has interrupts enabled
and posted interrupts are used to wake it up, the notification vector is
switched and pi_wakeup_handler() is enabled.

For VMX, a blocked vCPU can be the target of posted interrupts when using
IPIv or VT-d PI.  TDX doesn't support IPIv, so disable PI wakeup for the
IPIv case.  Also, since the guest state of a TD vCPU is protected, assume
interrupts are always enabled for a TD.  (The PV HLT hypercall is not
supported yet; with it, the TDX guest tells the VMM whether HLT is called
with interrupts disabled or not.)

Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
[binbin: split into new patch]
Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
---
TDX interrupts v2:
- "KVM: VMX: Remove use of struct vcpu_vmx from posted_intr.c" is dropped
  because the related fields have been moved to the common struct vcpu_vt
  already. Move the pi_wakeup_list init to this patch.

TDX interrupts v1:
- This is split out as a new patch from patch
  "KVM: TDX: remove use of struct vcpu_vmx from posted_interrupt.c"
---
 arch/x86/kvm/vmx/posted_intr.c | 7 +++++--
 arch/x86/kvm/vmx/tdx.c         | 1 +
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/vmx/posted_intr.c b/arch/x86/kvm/vmx/posted_intr.c
index 5696e0f9f924..25f8a19e2831 100644
--- a/arch/x86/kvm/vmx/posted_intr.c
+++ b/arch/x86/kvm/vmx/posted_intr.c
@@ -11,6 +11,7 @@
 #include "posted_intr.h"
 #include "trace.h"
 #include "vmx.h"
+#include "tdx.h"
 
 /*
  * Maintain a per-CPU list of vCPUs that need to be awakened by wakeup_handler()
@@ -190,7 +191,8 @@ static bool vmx_needs_pi_wakeup(struct kvm_vcpu *vcpu)
 	 * notification vector is switched to the one that calls
 	 * back to the pi_wakeup_handler() function.
 	 */
-	return vmx_can_use_ipiv(vcpu) || vmx_can_use_vtd_pi(vcpu->kvm);
+	return (vmx_can_use_ipiv(vcpu) && !is_td_vcpu(vcpu)) ||
+		vmx_can_use_vtd_pi(vcpu->kvm);
 }
 
 void vmx_vcpu_pi_put(struct kvm_vcpu *vcpu)
@@ -200,7 +202,8 @@ void vmx_vcpu_pi_put(struct kvm_vcpu *vcpu)
 	if (!vmx_needs_pi_wakeup(vcpu))
 		return;
 
-	if (kvm_vcpu_is_blocking(vcpu) && !vmx_interrupt_blocked(vcpu))
+	if (kvm_vcpu_is_blocking(vcpu) &&
+	    (is_td_vcpu(vcpu) || !vmx_interrupt_blocked(vcpu)))
 		pi_enable_wakeup_handler(vcpu);
 
 	/*
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 6940ce812730..825f13371134 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -669,6 +669,7 @@ int tdx_vcpu_create(struct kvm_vcpu *vcpu)
 
 	fpstate_set_confidential(&vcpu->arch.guest_fpu);
 	vcpu->arch.apic->guest_apic_protected = true;
+	INIT_LIST_HEAD(&tdx->vt.pi_wakeup_list);
 
 	vcpu->arch.efer = EFER_SCE | EFER_LME | EFER_LMA | EFER_NX;
 
-- 
2.46.0



* [PATCH v2 03/17] KVM: VMX: Move posted interrupt delivery code to common header
  2025-02-11  2:58 [PATCH v2 00/17] KVM: TDX: TDX interrupts Binbin Wu
  2025-02-11  2:58 ` [PATCH v2 01/17] KVM: TDX: Add support for find pending IRQ in a protected local APIC Binbin Wu
  2025-02-11  2:58 ` [PATCH v2 02/17] KVM: TDX: Disable PI wakeup for IPIv Binbin Wu
@ 2025-02-11  2:58 ` Binbin Wu
  2025-02-13  6:59   ` Chao Gao
  2025-02-11  2:58 ` [PATCH v2 04/17] KVM: TDX: Implement non-NMI interrupt injection Binbin Wu
                   ` (13 subsequent siblings)
  16 siblings, 1 reply; 33+ messages in thread
From: Binbin Wu @ 2025-02-11  2:58 UTC (permalink / raw)
  To: pbonzini, seanjc, kvm
  Cc: rick.p.edgecombe, kai.huang, adrian.hunter, reinette.chatre,
	xiaoyao.li, tony.lindgren, isaku.yamahata, yan.y.zhao, chao.gao,
	linux-kernel, binbin.wu

From: Isaku Yamahata <isaku.yamahata@intel.com>

Move posted interrupt delivery code to common header so that TDX can
leverage it.

No functional change intended.

Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
[binbin: split into new patch]
Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
---
TDX interrupts v2:
- Rebased due to moving pi_desc to vcpu_vt.

TDX interrupts v1:
- This is split out from patch "KVM: TDX: Implement interrupt injection"
---
 arch/x86/kvm/vmx/common.h | 71 +++++++++++++++++++++++++++++++++++++++
 arch/x86/kvm/vmx/vmx.c    | 59 +-------------------------------
 2 files changed, 72 insertions(+), 58 deletions(-)

diff --git a/arch/x86/kvm/vmx/common.h b/arch/x86/kvm/vmx/common.h
index 9d4982694f06..079aeca65e2c 100644
--- a/arch/x86/kvm/vmx/common.h
+++ b/arch/x86/kvm/vmx/common.h
@@ -4,6 +4,7 @@
 
 #include <linux/kvm_host.h>
 
+#include "posted_intr.h"
 #include "mmu.h"
 
 union vmx_exit_reason {
@@ -108,4 +109,74 @@ static inline int __vmx_handle_ept_violation(struct kvm_vcpu *vcpu, gpa_t gpa,
 	return kvm_mmu_page_fault(vcpu, gpa, error_code, NULL, 0);
 }
 
+static inline void kvm_vcpu_trigger_posted_interrupt(struct kvm_vcpu *vcpu,
+						     int pi_vec)
+{
+#ifdef CONFIG_SMP
+	if (vcpu->mode == IN_GUEST_MODE) {
+		/*
+		 * The vector of the virtual has already been set in the PIR.
+		 * Send a notification event to deliver the virtual interrupt
+		 * unless the vCPU is the currently running vCPU, i.e. the
+		 * event is being sent from a fastpath VM-Exit handler, in
+		 * which case the PIR will be synced to the vIRR before
+		 * re-entering the guest.
+		 *
+		 * When the target is not the running vCPU, the following
+		 * possibilities emerge:
+		 *
+		 * Case 1: vCPU stays in non-root mode. Sending a notification
+		 * event posts the interrupt to the vCPU.
+		 *
+		 * Case 2: vCPU exits to root mode and is still runnable. The
+		 * PIR will be synced to the vIRR before re-entering the guest.
+		 * Sending a notification event is ok as the host IRQ handler
+		 * will ignore the spurious event.
+		 *
+		 * Case 3: vCPU exits to root mode and is blocked. vcpu_block()
+		 * has already synced PIR to vIRR and never blocks the vCPU if
+		 * the vIRR is not empty. Therefore, a blocked vCPU here does
+		 * not wait for any requested interrupts in PIR, and sending a
+		 * notification event also results in a benign, spurious event.
+		 */
+
+		if (vcpu != kvm_get_running_vcpu())
+			__apic_send_IPI_mask(get_cpu_mask(vcpu->cpu), pi_vec);
+		return;
+	}
+#endif
+	/*
+	 * The vCPU isn't in the guest; wake the vCPU in case it is blocking,
+	 * otherwise do nothing as KVM will grab the highest priority pending
+	 * IRQ via ->sync_pir_to_irr() in vcpu_enter_guest().
+	 */
+	kvm_vcpu_wake_up(vcpu);
+}
+
+/*
+ * Send interrupt to vcpu via posted interrupt way.
+ * 1. If target vcpu is running(non-root mode), send posted interrupt
+ * notification to vcpu and hardware will sync PIR to vIRR atomically.
+ * 2. If target vcpu isn't running(root mode), kick it to pick up the
+ * interrupt from PIR in next vmentry.
+ */
+static inline void __vmx_deliver_posted_interrupt(struct kvm_vcpu *vcpu,
+						  struct pi_desc *pi_desc, int vector)
+{
+	if (pi_test_and_set_pir(vector, pi_desc))
+		return;
+
+	/* If a previous notification has sent the IPI, nothing to do.  */
+	if (pi_test_and_set_on(pi_desc))
+		return;
+
+	/*
+	 * The implied barrier in pi_test_and_set_on() pairs with the smp_mb_*()
+	 * after setting vcpu->mode in vcpu_enter_guest(), thus the vCPU is
+	 * guaranteed to see PID.ON=1 and sync the PIR to IRR if triggering a
+	 * posted interrupt "fails" because vcpu->mode != IN_GUEST_MODE.
+	 */
+	kvm_vcpu_trigger_posted_interrupt(vcpu, POSTED_INTR_VECTOR);
+}
+
 #endif /* __KVM_X86_VMX_COMMON_H */
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 5475abb11533..77108ab24c66 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -4186,50 +4186,6 @@ void vmx_msr_filter_changed(struct kvm_vcpu *vcpu)
 		pt_update_intercept_for_msr(vcpu);
 }
 
-static inline void kvm_vcpu_trigger_posted_interrupt(struct kvm_vcpu *vcpu,
-						     int pi_vec)
-{
-#ifdef CONFIG_SMP
-	if (vcpu->mode == IN_GUEST_MODE) {
-		/*
-		 * The vector of the virtual has already been set in the PIR.
-		 * Send a notification event to deliver the virtual interrupt
-		 * unless the vCPU is the currently running vCPU, i.e. the
-		 * event is being sent from a fastpath VM-Exit handler, in
-		 * which case the PIR will be synced to the vIRR before
-		 * re-entering the guest.
-		 *
-		 * When the target is not the running vCPU, the following
-		 * possibilities emerge:
-		 *
-		 * Case 1: vCPU stays in non-root mode. Sending a notification
-		 * event posts the interrupt to the vCPU.
-		 *
-		 * Case 2: vCPU exits to root mode and is still runnable. The
-		 * PIR will be synced to the vIRR before re-entering the guest.
-		 * Sending a notification event is ok as the host IRQ handler
-		 * will ignore the spurious event.
-		 *
-		 * Case 3: vCPU exits to root mode and is blocked. vcpu_block()
-		 * has already synced PIR to vIRR and never blocks the vCPU if
-		 * the vIRR is not empty. Therefore, a blocked vCPU here does
-		 * not wait for any requested interrupts in PIR, and sending a
-		 * notification event also results in a benign, spurious event.
-		 */
-
-		if (vcpu != kvm_get_running_vcpu())
-			__apic_send_IPI_mask(get_cpu_mask(vcpu->cpu), pi_vec);
-		return;
-	}
-#endif
-	/*
-	 * The vCPU isn't in the guest; wake the vCPU in case it is blocking,
-	 * otherwise do nothing as KVM will grab the highest priority pending
-	 * IRQ via ->sync_pir_to_irr() in vcpu_enter_guest().
-	 */
-	kvm_vcpu_wake_up(vcpu);
-}
-
 static int vmx_deliver_nested_posted_interrupt(struct kvm_vcpu *vcpu,
 						int vector)
 {
@@ -4289,20 +4245,7 @@ static int vmx_deliver_posted_interrupt(struct kvm_vcpu *vcpu, int vector)
 	if (!vcpu->arch.apic->apicv_active)
 		return -1;
 
-	if (pi_test_and_set_pir(vector, &vt->pi_desc))
-		return 0;
-
-	/* If a previous notification has sent the IPI, nothing to do.  */
-	if (pi_test_and_set_on(&vt->pi_desc))
-		return 0;
-
-	/*
-	 * The implied barrier in pi_test_and_set_on() pairs with the smp_mb_*()
-	 * after setting vcpu->mode in vcpu_enter_guest(), thus the vCPU is
-	 * guaranteed to see PID.ON=1 and sync the PIR to IRR if triggering a
-	 * posted interrupt "fails" because vcpu->mode != IN_GUEST_MODE.
-	 */
-	kvm_vcpu_trigger_posted_interrupt(vcpu, POSTED_INTR_VECTOR);
+	__vmx_deliver_posted_interrupt(vcpu, &vt->pi_desc, vector);
 	return 0;
 }
 
-- 
2.46.0



* [PATCH v2 04/17] KVM: TDX: Implement non-NMI interrupt injection
  2025-02-11  2:58 [PATCH v2 00/17] KVM: TDX: TDX interrupts Binbin Wu
                   ` (2 preceding siblings ...)
  2025-02-11  2:58 ` [PATCH v2 03/17] KVM: VMX: Move posted interrupt delivery code to common header Binbin Wu
@ 2025-02-11  2:58 ` Binbin Wu
  2025-02-13  7:15   ` Chao Gao
  2025-02-11  2:58 ` [PATCH v2 05/17] KVM: x86: Assume timer IRQ was injected if APIC state is protected Binbin Wu
                   ` (12 subsequent siblings)
  16 siblings, 1 reply; 33+ messages in thread
From: Binbin Wu @ 2025-02-11  2:58 UTC (permalink / raw)
  To: pbonzini, seanjc, kvm
  Cc: rick.p.edgecombe, kai.huang, adrian.hunter, reinette.chatre,
	xiaoyao.li, tony.lindgren, isaku.yamahata, yan.y.zhao, chao.gao,
	linux-kernel, binbin.wu

From: Isaku Yamahata <isaku.yamahata@intel.com>

Implement non-NMI interrupt injection for TDX via posted interrupt.

As CPU state is protected and APICv is enabled for the TDX guest, TDX
supports non-NMI interrupt injection only by posted interrupt.  Posted
interrupt descriptors (PIDs) are allocated in shared memory, so KVM can
update them directly.  If the target vCPU is in non-root mode, send a
posted interrupt notification to the vCPU and the hardware will sync the
PIR to the vIRR atomically.  Otherwise, kick it to pick up the interrupt
from the PID.  To post pending interrupts in the PID, KVM can generate a
self-IPI with the notification vector prior to TD entry.

Since the guest state of a TD vCPU is protected, assume interrupts are
always allowed.  Ignore the code paths for the event injection mechanism
and LAPIC emulation for TDX.

Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Co-developed-by: Binbin Wu <binbin.wu@linux.intel.com>
Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
---
TDX interrupts v2:
- Rebased due to moving pi_desc to vcpu_vt.

TDX interrupts v1:
- Renamed from "KVM: TDX: Implement interrupt injection"
  to "KVM: TDX: Implement non-NMI interrupt injection"
- Rewrite changelog.
- Add a blank line. (Binbin)
- Split posted interrupt delivery code movement to a separate patch.
- Split kvm_wait_lapic_expire() out to a separate patch. (Chao)
- Use __pi_set_sn() to resolve upstream conflicts.
- Use kvm_x86_call()
---
 arch/x86/kvm/vmx/main.c        | 94 ++++++++++++++++++++++++++++++----
 arch/x86/kvm/vmx/posted_intr.c |  2 +-
 arch/x86/kvm/vmx/posted_intr.h |  2 +
 arch/x86/kvm/vmx/tdx.c         | 23 ++++++++-
 arch/x86/kvm/vmx/vmx.c         |  8 ---
 arch/x86/kvm/vmx/x86_ops.h     |  6 +++
 6 files changed, 116 insertions(+), 19 deletions(-)

diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
index 2b1ea57a3a4e..3d590b580e2f 100644
--- a/arch/x86/kvm/vmx/main.c
+++ b/arch/x86/kvm/vmx/main.c
@@ -180,6 +180,34 @@ static int vt_handle_exit(struct kvm_vcpu *vcpu,
 	return vmx_handle_exit(vcpu, fastpath);
 }
 
+static void vt_apicv_pre_state_restore(struct kvm_vcpu *vcpu)
+{
+	struct pi_desc *pi = vcpu_to_pi_desc(vcpu);
+
+	pi_clear_on(pi);
+	memset(pi->pir, 0, sizeof(pi->pir));
+}
+
+static int vt_sync_pir_to_irr(struct kvm_vcpu *vcpu)
+{
+	if (is_td_vcpu(vcpu))
+		return -1;
+
+	return vmx_sync_pir_to_irr(vcpu);
+}
+
+static void vt_deliver_interrupt(struct kvm_lapic *apic, int delivery_mode,
+			   int trig_mode, int vector)
+{
+	if (is_td_vcpu(apic->vcpu)) {
+		tdx_deliver_interrupt(apic, delivery_mode, trig_mode,
+					     vector);
+		return;
+	}
+
+	vmx_deliver_interrupt(apic, delivery_mode, trig_mode, vector);
+}
+
 static void vt_flush_tlb_all(struct kvm_vcpu *vcpu)
 {
 	if (is_td_vcpu(vcpu)) {
@@ -227,6 +255,54 @@ static void vt_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa,
 	vmx_load_mmu_pgd(vcpu, root_hpa, pgd_level);
 }
 
+static void vt_set_interrupt_shadow(struct kvm_vcpu *vcpu, int mask)
+{
+	if (is_td_vcpu(vcpu))
+		return;
+
+	vmx_set_interrupt_shadow(vcpu, mask);
+}
+
+static u32 vt_get_interrupt_shadow(struct kvm_vcpu *vcpu)
+{
+	if (is_td_vcpu(vcpu))
+		return 0;
+
+	return vmx_get_interrupt_shadow(vcpu);
+}
+
+static void vt_inject_irq(struct kvm_vcpu *vcpu, bool reinjected)
+{
+	if (is_td_vcpu(vcpu))
+		return;
+
+	vmx_inject_irq(vcpu, reinjected);
+}
+
+static void vt_cancel_injection(struct kvm_vcpu *vcpu)
+{
+	if (is_td_vcpu(vcpu))
+		return;
+
+	vmx_cancel_injection(vcpu);
+}
+
+static int vt_interrupt_allowed(struct kvm_vcpu *vcpu, bool for_injection)
+{
+	if (is_td_vcpu(vcpu))
+		return true;
+
+	return vmx_interrupt_allowed(vcpu, for_injection);
+}
+
+static void vt_enable_irq_window(struct kvm_vcpu *vcpu)
+{
+	if (is_td_vcpu(vcpu))
+		return;
+
+	vmx_enable_irq_window(vcpu);
+}
+
 static void vt_get_entry_info(struct kvm_vcpu *vcpu, u32 *intr_info, u32 *error_code)
 {
 	*intr_info = 0;
@@ -347,19 +423,19 @@ struct kvm_x86_ops vt_x86_ops __initdata = {
 	.handle_exit = vt_handle_exit,
 	.skip_emulated_instruction = vmx_skip_emulated_instruction,
 	.update_emulated_instruction = vmx_update_emulated_instruction,
-	.set_interrupt_shadow = vmx_set_interrupt_shadow,
-	.get_interrupt_shadow = vmx_get_interrupt_shadow,
+	.set_interrupt_shadow = vt_set_interrupt_shadow,
+	.get_interrupt_shadow = vt_get_interrupt_shadow,
 	.patch_hypercall = vmx_patch_hypercall,
-	.inject_irq = vmx_inject_irq,
+	.inject_irq = vt_inject_irq,
 	.inject_nmi = vmx_inject_nmi,
 	.inject_exception = vmx_inject_exception,
-	.cancel_injection = vmx_cancel_injection,
-	.interrupt_allowed = vmx_interrupt_allowed,
+	.cancel_injection = vt_cancel_injection,
+	.interrupt_allowed = vt_interrupt_allowed,
 	.nmi_allowed = vmx_nmi_allowed,
 	.get_nmi_mask = vmx_get_nmi_mask,
 	.set_nmi_mask = vmx_set_nmi_mask,
 	.enable_nmi_window = vmx_enable_nmi_window,
-	.enable_irq_window = vmx_enable_irq_window,
+	.enable_irq_window = vt_enable_irq_window,
 	.update_cr8_intercept = vmx_update_cr8_intercept,
 
 	.x2apic_icr_is_split = false,
@@ -367,11 +443,11 @@ struct kvm_x86_ops vt_x86_ops __initdata = {
 	.set_apic_access_page_addr = vmx_set_apic_access_page_addr,
 	.refresh_apicv_exec_ctrl = vmx_refresh_apicv_exec_ctrl,
 	.load_eoi_exitmap = vmx_load_eoi_exitmap,
-	.apicv_pre_state_restore = vmx_apicv_pre_state_restore,
+	.apicv_pre_state_restore = vt_apicv_pre_state_restore,
 	.required_apicv_inhibits = VMX_REQUIRED_APICV_INHIBITS,
 	.hwapic_isr_update = vmx_hwapic_isr_update,
-	.sync_pir_to_irr = vmx_sync_pir_to_irr,
-	.deliver_interrupt = vmx_deliver_interrupt,
+	.sync_pir_to_irr = vt_sync_pir_to_irr,
+	.deliver_interrupt = vt_deliver_interrupt,
 	.dy_apicv_has_pending_interrupt = pi_has_pending_interrupt,
 	.protected_apic_has_interrupt = tdx_protected_apic_has_interrupt,
 
diff --git a/arch/x86/kvm/vmx/posted_intr.c b/arch/x86/kvm/vmx/posted_intr.c
index 25f8a19e2831..895bbe85b818 100644
--- a/arch/x86/kvm/vmx/posted_intr.c
+++ b/arch/x86/kvm/vmx/posted_intr.c
@@ -32,7 +32,7 @@ static DEFINE_PER_CPU(struct list_head, wakeup_vcpus_on_cpu);
  */
 static DEFINE_PER_CPU(raw_spinlock_t, wakeup_vcpus_on_cpu_lock);
 
-static inline struct pi_desc *vcpu_to_pi_desc(struct kvm_vcpu *vcpu)
+struct pi_desc *vcpu_to_pi_desc(struct kvm_vcpu *vcpu)
 {
 	return &(to_vt(vcpu)->pi_desc);
 }
diff --git a/arch/x86/kvm/vmx/posted_intr.h b/arch/x86/kvm/vmx/posted_intr.h
index ad9116a99bcc..68605ca7ef68 100644
--- a/arch/x86/kvm/vmx/posted_intr.h
+++ b/arch/x86/kvm/vmx/posted_intr.h
@@ -5,6 +5,8 @@
 #include <linux/bitmap.h>
 #include <asm/posted_intr.h>
 
+struct pi_desc *vcpu_to_pi_desc(struct kvm_vcpu *vcpu);
+
 void vmx_vcpu_pi_load(struct kvm_vcpu *vcpu, int cpu);
 void vmx_vcpu_pi_put(struct kvm_vcpu *vcpu);
 void pi_wakeup_handler(void);
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 825f13371134..d289040172bc 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -685,6 +685,10 @@ int tdx_vcpu_create(struct kvm_vcpu *vcpu)
 	if ((kvm_tdx->xfam & XFEATURE_MASK_XTILE) == XFEATURE_MASK_XTILE)
 		vcpu->arch.xfd_no_write_intercept = true;
 
+
+	tdx->vt.pi_desc.nv = POSTED_INTR_VECTOR;
+	__pi_set_sn(&tdx->vt.pi_desc);
+
 	tdx->state = VCPU_TD_STATE_UNINITIALIZED;
 
 	return 0;
@@ -694,6 +698,7 @@ void tdx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 {
 	struct vcpu_tdx *tdx = to_tdx(vcpu);
 
+	vmx_vcpu_pi_load(vcpu, cpu);
 	if (vcpu->cpu == cpu)
 		return;
 
@@ -950,6 +955,9 @@ fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu, bool force_immediate_exit)
 
 	trace_kvm_entry(vcpu, force_immediate_exit);
 
+	if (pi_test_on(&vt->pi_desc))
+		apic->send_IPI_self(POSTED_INTR_VECTOR);
+
 	tdx_vcpu_enter_exit(vcpu);
 
 	if (vt->host_debugctlmsr & ~TDX_DEBUGCTL_PRESERVED)
@@ -1607,6 +1615,16 @@ int tdx_sept_remove_private_spte(struct kvm *kvm, gfn_t gfn,
 	return tdx_sept_drop_private_spte(kvm, gfn, level, pfn_to_page(pfn));
 }
 
+void tdx_deliver_interrupt(struct kvm_lapic *apic, int delivery_mode,
+			   int trig_mode, int vector)
+{
+	struct kvm_vcpu *vcpu = apic->vcpu;
+	struct vcpu_tdx *tdx = to_tdx(vcpu);
+
+	/* TDX supports only posted interrupt.  No lapic emulation. */
+	__vmx_deliver_posted_interrupt(vcpu, &tdx->vt.pi_desc, vector);
+}
+
 int tdx_handle_exit(struct kvm_vcpu *vcpu, fastpath_t fastpath)
 {
 	struct vcpu_tdx *tdx = to_tdx(vcpu);
@@ -2578,8 +2596,11 @@ static int tdx_vcpu_init(struct kvm_vcpu *vcpu, struct kvm_tdx_cmd *cmd)
 	/* TODO: freeze vCPU model before kvm_update_cpuid_runtime() */
 	kvm_update_cpuid_runtime(vcpu);
 
-	tdx->state = VCPU_TD_STATE_INITIALIZED;
+	td_vmcs_write16(tdx, POSTED_INTR_NV, POSTED_INTR_VECTOR);
+	td_vmcs_write64(tdx, POSTED_INTR_DESC_ADDR, __pa(&tdx->vt.pi_desc));
+	td_vmcs_setbit32(tdx, PIN_BASED_VM_EXEC_CONTROL, PIN_BASED_POSTED_INTR);
 
+	tdx->state = VCPU_TD_STATE_INITIALIZED;
 	return 0;
 }
 
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 77108ab24c66..cb6043e29ef9 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -6902,14 +6902,6 @@ void vmx_load_eoi_exitmap(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap)
 	vmcs_write64(EOI_EXIT_BITMAP3, eoi_exit_bitmap[3]);
 }
 
-void vmx_apicv_pre_state_restore(struct kvm_vcpu *vcpu)
-{
-	struct vcpu_vt *vt = to_vt(vcpu);
-
-	pi_clear_on(&vt->pi_desc);
-	memset(vt->pi_desc.pir, 0, sizeof(vt->pi_desc.pir));
-}
-
 void vmx_do_interrupt_irqoff(unsigned long entry);
 void vmx_do_nmi_irqoff(void);
 
diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h
index 8086e5c58cd6..d521ad276d51 100644
--- a/arch/x86/kvm/vmx/x86_ops.h
+++ b/arch/x86/kvm/vmx/x86_ops.h
@@ -138,6 +138,9 @@ void tdx_vcpu_put(struct kvm_vcpu *vcpu);
 bool tdx_protected_apic_has_interrupt(struct kvm_vcpu *vcpu);
 int tdx_handle_exit(struct kvm_vcpu *vcpu,
 		enum exit_fastpath_completion fastpath);
+
+void tdx_deliver_interrupt(struct kvm_lapic *apic, int delivery_mode,
+			   int trig_mode, int vector);
 void tdx_get_exit_info(struct kvm_vcpu *vcpu, u32 *reason,
 		u64 *info1, u64 *info2, u32 *intr_info, u32 *error_code);
 
@@ -177,6 +180,9 @@ static inline void tdx_vcpu_put(struct kvm_vcpu *vcpu) {}
 static inline bool tdx_protected_apic_has_interrupt(struct kvm_vcpu *vcpu) { return false; }
 static inline int tdx_handle_exit(struct kvm_vcpu *vcpu,
 		enum exit_fastpath_completion fastpath) { return 0; }
+
+static inline void tdx_deliver_interrupt(struct kvm_lapic *apic, int delivery_mode,
+					 int trig_mode, int vector) {}
 static inline void tdx_get_exit_info(struct kvm_vcpu *vcpu, u32 *reason, u64 *info1,
 				     u64 *info2, u32 *intr_info, u32 *error_code) {}
 
-- 
2.46.0



* [PATCH v2 05/17] KVM: x86: Assume timer IRQ was injected if APIC state is protected
  2025-02-11  2:58 [PATCH v2 00/17] KVM: TDX: TDX interrupts Binbin Wu
                   ` (3 preceding siblings ...)
  2025-02-11  2:58 ` [PATCH v2 04/17] KVM: TDX: Implement non-NMI interrupt injection Binbin Wu
@ 2025-02-11  2:58 ` Binbin Wu
  2025-02-13  7:26   ` Chao Gao
  2025-02-11  2:58 ` [PATCH v2 06/17] KVM: TDX: Wait lapic expire when timer IRQ was injected Binbin Wu
                   ` (11 subsequent siblings)
  16 siblings, 1 reply; 33+ messages in thread
From: Binbin Wu @ 2025-02-11  2:58 UTC (permalink / raw)
  To: pbonzini, seanjc, kvm
  Cc: rick.p.edgecombe, kai.huang, adrian.hunter, reinette.chatre,
	xiaoyao.li, tony.lindgren, isaku.yamahata, yan.y.zhao, chao.gao,
	linux-kernel, binbin.wu

From: Sean Christopherson <seanjc@google.com>

If APIC state is protected, i.e. the vCPU is a TDX guest, assume a timer
IRQ was injected when deciding whether or not to busy wait in the "timer
advanced" path.  The "real" vIRR is not readable/writable, so trying to
query for a pending timer IRQ will return garbage.

Note, TDX can scour the PIR if it wants to be more precise and skip the
"wait" call entirely.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
---
TDX interrupts v2:
- No change.

TDX interrupts v1:
- Renamed from "KVM: x86: Assume timer IRQ was injected if APIC state is proteced"
  to "KVM: x86: Assume timer IRQ was injected if APIC state is protected", i.e.,
  fix the typo 'proteced'.
---
 arch/x86/kvm/lapic.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index bbdede07d063..bab5c42f63b7 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1797,8 +1797,17 @@ static void apic_update_lvtt(struct kvm_lapic *apic)
 static bool lapic_timer_int_injected(struct kvm_vcpu *vcpu)
 {
 	struct kvm_lapic *apic = vcpu->arch.apic;
-	u32 reg = kvm_lapic_get_reg(apic, APIC_LVTT);
+	u32 reg;
 
+	/*
+	 * Assume a timer IRQ was "injected" if the APIC is protected.  KVM's
+	 * copy of the vIRR is bogus, it's the responsibility of the caller to
+	 * precisely check whether or not a timer IRQ is pending.
+	 */
+	if (apic->guest_apic_protected)
+		return true;
+
+	reg  = kvm_lapic_get_reg(apic, APIC_LVTT);
 	if (kvm_apic_hw_enabled(apic)) {
 		int vec = reg & APIC_VECTOR_MASK;
 		void *bitmap = apic->regs + APIC_ISR;
-- 
2.46.0



* [PATCH v2 06/17] KVM: TDX: Wait lapic expire when timer IRQ was injected
  2025-02-11  2:58 [PATCH v2 00/17] KVM: TDX: TDX interrupts Binbin Wu
                   ` (4 preceding siblings ...)
  2025-02-11  2:58 ` [PATCH v2 05/17] KVM: x86: Assume timer IRQ was injected if APIC state is protected Binbin Wu
@ 2025-02-11  2:58 ` Binbin Wu
  2025-02-11  2:58 ` [PATCH v2 07/17] KVM: TDX: Implement methods to inject NMI Binbin Wu
                   ` (10 subsequent siblings)
  16 siblings, 0 replies; 33+ messages in thread
From: Binbin Wu @ 2025-02-11  2:58 UTC (permalink / raw)
  To: pbonzini, seanjc, kvm
  Cc: rick.p.edgecombe, kai.huang, adrian.hunter, reinette.chatre,
	xiaoyao.li, tony.lindgren, isaku.yamahata, yan.y.zhao, chao.gao,
	linux-kernel, binbin.wu

From: Isaku Yamahata <isaku.yamahata@intel.com>

Call kvm_wait_lapic_expire() when POSTED_INTR_ON is set and the vector
for LVTT is set in the PIR before TD entry.

KVM always assumes a timer IRQ was injected if the APIC state is
protected.  For a TDX guest, the APIC state is protected and KVM injects
the timer IRQ via posted interrupt.  To avoid unnecessary wait calls,
only call kvm_wait_lapic_expire() when a timer IRQ was injected, i.e.,
when POSTED_INTR_ON is set and the vector for LVTT is set in the PIR.

Add a helper to test the PIR.

Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Co-developed-by: Binbin Wu <binbin.wu@linux.intel.com>
Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
---
TDX interrupts v2:
- Rebased due to moving pi_desc to vcpu_vt.

TDX interrupts v1:
- Split out from patch "KVM: TDX: Implement interrupt injection". (Chao)
- Check PIR against LVTT vector.
---
 arch/x86/include/asm/posted_intr.h | 5 +++++
 arch/x86/kvm/vmx/tdx.c             | 7 ++++++-
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/posted_intr.h b/arch/x86/include/asm/posted_intr.h
index de788b400fba..bb107ebbe713 100644
--- a/arch/x86/include/asm/posted_intr.h
+++ b/arch/x86/include/asm/posted_intr.h
@@ -81,6 +81,11 @@ static inline bool pi_test_sn(struct pi_desc *pi_desc)
 	return test_bit(POSTED_INTR_SN, (unsigned long *)&pi_desc->control);
 }
 
+static inline bool pi_test_pir(int vector, struct pi_desc *pi_desc)
+{
+	return test_bit(vector, (unsigned long *)pi_desc->pir);
+}
+
 /* Non-atomic helpers */
 static inline void __pi_set_sn(struct pi_desc *pi_desc)
 {
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index d289040172bc..4b8e28bde021 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -955,9 +955,14 @@ fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu, bool force_immediate_exit)
 
 	trace_kvm_entry(vcpu, force_immediate_exit);
 
-	if (pi_test_on(&vt->pi_desc))
+	if (pi_test_on(&vt->pi_desc)) {
 		apic->send_IPI_self(POSTED_INTR_VECTOR);
 
+		if (pi_test_pir(kvm_lapic_get_reg(vcpu->arch.apic, APIC_LVTT) &
+			       APIC_VECTOR_MASK, &vt->pi_desc))
+			kvm_wait_lapic_expire(vcpu);
+	}
+
 	tdx_vcpu_enter_exit(vcpu);
 
 	if (vt->host_debugctlmsr & ~TDX_DEBUGCTL_PRESERVED)
-- 
2.46.0



* [PATCH v2 07/17] KVM: TDX: Implement methods to inject NMI
  2025-02-11  2:58 [PATCH v2 00/17] KVM: TDX: TDX interrupts Binbin Wu
                   ` (5 preceding siblings ...)
  2025-02-11  2:58 ` [PATCH v2 06/17] KVM: TDX: Wait lapic expire when timer IRQ was injected Binbin Wu
@ 2025-02-11  2:58 ` Binbin Wu
  2025-02-11  2:58 ` [PATCH v2 08/17] KVM: TDX: Complete interrupts after TD exit Binbin Wu
                   ` (9 subsequent siblings)
  16 siblings, 0 replies; 33+ messages in thread
From: Binbin Wu @ 2025-02-11  2:58 UTC (permalink / raw)
  To: pbonzini, seanjc, kvm
  Cc: rick.p.edgecombe, kai.huang, adrian.hunter, reinette.chatre,
	xiaoyao.li, tony.lindgren, isaku.yamahata, yan.y.zhao, chao.gao,
	linux-kernel, binbin.wu

From: Isaku Yamahata <isaku.yamahata@intel.com>

Inject an NMI into a TDX guest by setting the PEND_NMI TDVPS field to 1,
i.e. make the NMI pending in the TDX module.  If there is a further
pending NMI in KVM, collapse it into the one pending in the TDX module.

The VMM can request the TDX module to inject an NMI into a TDX vCPU by
setting the PEND_NMI TDVPS field to 1.  Following that, the VMM can call
TDH.VP.ENTER to run the vCPU, and the TDX module will attempt to inject
the NMI as soon as possible.

KVM has the following 3 cases where two NMIs need to be injected
back-to-back when handling simultaneous NMIs.  Otherwise, the OS kernel
may fire a warning about an unknown NMI [1]:
K1. One NMI is being handled in the guest and one NMI is pending in KVM.
    KVM requests an NMI-window exit to inject the pending NMI.
K2. Two NMIs are pending in KVM.
    KVM injects the first NMI and requests an NMI-window exit to inject
    the second NMI.
K3. A previous NMI needs to be re-injected and one NMI is pending in KVM.
    KVM first requests a force immediate exit followed by a VM entry to
    complete the NMI re-injection.  Then, during the force immediate
    exit, KVM requests an NMI-window exit to inject the pending NMI.

For TDX, the PEND_NMI TDVPS field is a 1-bit field, i.e. KVM can only
pend one NMI in the TDX module.  Also, the vCPU state is protected and
KVM doesn't know the NMI blocking state of a TDX vCPU, so KVM has to
assume NMIs are always unmasked and allowed.  When KVM sees PEND_NMI is 1
after a TD exit, it means the previous NMI needs to be re-injected.

Based on KVM's NMI handling flow, there are the following 6 cases:
    In NMI handler    TDX module    KVM
T1. No                PEND_NMI=0    1 pending NMI
T2. No                PEND_NMI=0    2 pending NMIs
T3. No                PEND_NMI=1    1 pending NMI
T4. Yes               PEND_NMI=0    1 pending NMI
T5. Yes               PEND_NMI=0    2 pending NMIs
T6. Yes               PEND_NMI=1    1 pending NMI
K1 is mapped to T4.
K2 is mapped to T2 or T5.
K3 is mapped to T3 or T6.
Note: KVM doesn't know whether an NMI is blocked by another NMI or not,
so cases T5 and T6 can happen.

When handling a pending NMI in KVM for a TDX guest, all KVM can do is
pend an NMI in the TDX module when PEND_NMI is 0.  T1 and T4 can be
handled this way.  However, TDX doesn't allow KVM to request an
NMI-window exit directly; if PEND_NMI is already set and there is still
an NMI pending in KVM, the only thing KVM could try is to request a force
immediate exit.  But for cases T5 and T6, a force immediate exit would
result in an infinite loop, because it makes no progress in the NMI
handler, so the NMI pending in the TDX module can never be injected.

Considering that on x86 bare metal, multiple NMIs can collapse into one
NMI, e.g. when NMIs are blocked by an SMI, it's the OS's responsibility
to poll all NMI sources in the NMI handler to avoid missing the handling
of some NMI events.

Based on that, for the above 3 cases (K1-K3), only case K1 must inject
the second NMI: the guest NMI handler may have already polled some of the
NMI sources, which could include the source of the pending NMI, so the
pending NMI must be injected to avoid losing an NMI.  For cases K2 and
K3, the guest OS will poll all NMI sources (including the sources caused
by the second NMI and any further collapsed NMIs) upon delivery of the
first NMI, so KVM doesn't need to inject the second NMI.

To handle NMI injection properly for TDX, there are two options:
- Option 1: Modify KVM's common NMI handling code to collapse the second
  pending NMI for K2 and K3.
- Option 2: Do it in a TDX-specific way.  When the previous NMI is still
  pending in the TDX module, i.e. it has not been delivered to the TDX
  guest yet, collapse the NMI pending in KVM into the previous one.

This patch goes with option 2 because it is simple and doesn't impact
other VM types.  Option 1 may need more discussions.

This is the first time vCPU-scope metadata in the "management" class is
accessed.  Make the needed accessors available.

[1] https://lore.kernel.org/all/1317409584-23662-5-git-send-email-dzickus@redhat.com/

Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Co-developed-by: Binbin Wu <binbin.wu@linux.intel.com>
Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
---
TDX interrupts v2:
- Fix a typo "filed" -> "field" in changelog.

TDX interrupts v1:
- Collapse the pending NMI in KVM if there is already one pending in the
  TDX module.
---
 arch/x86/kvm/vmx/main.c    | 61 ++++++++++++++++++++++++++++++++++----
 arch/x86/kvm/vmx/tdx.c     | 16 ++++++++++
 arch/x86/kvm/vmx/tdx.h     |  5 ++++
 arch/x86/kvm/vmx/x86_ops.h |  2 ++
 4 files changed, 79 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
index 3d590b580e2f..0d9b17d55bcc 100644
--- a/arch/x86/kvm/vmx/main.c
+++ b/arch/x86/kvm/vmx/main.c
@@ -244,6 +244,57 @@ static void vt_flush_tlb_guest(struct kvm_vcpu *vcpu)
 	vmx_flush_tlb_guest(vcpu);
 }
 
+static void vt_inject_nmi(struct kvm_vcpu *vcpu)
+{
+	if (is_td_vcpu(vcpu)) {
+		tdx_inject_nmi(vcpu);
+		return;
+	}
+
+	vmx_inject_nmi(vcpu);
+}
+
+static int vt_nmi_allowed(struct kvm_vcpu *vcpu, bool for_injection)
+{
+	/*
+	 * The TDX module manages NMI windows and NMI reinjection, and hides NMI
+	 * blocking, all KVM can do is throw an NMI over the wall.
+	 */
+	if (is_td_vcpu(vcpu))
+		return true;
+
+	return vmx_nmi_allowed(vcpu, for_injection);
+}
+
+static bool vt_get_nmi_mask(struct kvm_vcpu *vcpu)
+{
+	/*
+	 * KVM can't get NMI blocking status for TDX guest, assume NMIs are
+	 * always unmasked.
+	 */
+	if (is_td_vcpu(vcpu))
+		return false;
+
+	return vmx_get_nmi_mask(vcpu);
+}
+
+static void vt_set_nmi_mask(struct kvm_vcpu *vcpu, bool masked)
+{
+	if (is_td_vcpu(vcpu))
+		return;
+
+	vmx_set_nmi_mask(vcpu, masked);
+}
+
+static void vt_enable_nmi_window(struct kvm_vcpu *vcpu)
+{
+	/* Refer to the comments in tdx_inject_nmi(). */
+	if (is_td_vcpu(vcpu))
+		return;
+
+	vmx_enable_nmi_window(vcpu);
+}
+
 static void vt_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa,
 			    int pgd_level)
 {
@@ -427,14 +478,14 @@ struct kvm_x86_ops vt_x86_ops __initdata = {
 	.get_interrupt_shadow = vt_get_interrupt_shadow,
 	.patch_hypercall = vmx_patch_hypercall,
 	.inject_irq = vt_inject_irq,
-	.inject_nmi = vmx_inject_nmi,
+	.inject_nmi = vt_inject_nmi,
 	.inject_exception = vmx_inject_exception,
 	.cancel_injection = vt_cancel_injection,
 	.interrupt_allowed = vt_interrupt_allowed,
-	.nmi_allowed = vmx_nmi_allowed,
-	.get_nmi_mask = vmx_get_nmi_mask,
-	.set_nmi_mask = vmx_set_nmi_mask,
-	.enable_nmi_window = vmx_enable_nmi_window,
+	.nmi_allowed = vt_nmi_allowed,
+	.get_nmi_mask = vt_get_nmi_mask,
+	.set_nmi_mask = vt_set_nmi_mask,
+	.enable_nmi_window = vt_enable_nmi_window,
 	.enable_irq_window = vt_enable_irq_window,
 	.update_cr8_intercept = vmx_update_cr8_intercept,
 
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 4b8e28bde021..ba9038ac5bf7 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -988,6 +988,22 @@ fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu, bool force_immediate_exit)
 	return tdx_exit_handlers_fastpath(vcpu);
 }
 
+void tdx_inject_nmi(struct kvm_vcpu *vcpu)
+{
+	++vcpu->stat.nmi_injections;
+	td_management_write8(to_tdx(vcpu), TD_VCPU_PEND_NMI, 1);
+	/*
+	 * TDX doesn't support KVM to request NMI window exit.  If there is
+	 * still a pending vNMI, KVM is not able to inject it along with the
+	 * one pending in TDX module in a back-to-back way.  Since the previous
+	 * vNMI is still pending in TDX module, i.e. it has not been delivered
+	 * to TDX guest yet, it's OK to collapse the pending vNMI into the
+	 * previous one.  The guest is expected to handle all the NMI sources
+	 * when handling the first vNMI.
+	 */
+	vcpu->arch.nmi_pending = 0;
+}
+
 static int complete_hypercall_exit(struct kvm_vcpu *vcpu)
 {
 	tdvmcall_set_return_code(vcpu, vcpu->run->hypercall.ret);
diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h
index 45c1d064b6b7..ba187cbf4a81 100644
--- a/arch/x86/kvm/vmx/tdx.h
+++ b/arch/x86/kvm/vmx/tdx.h
@@ -112,6 +112,8 @@ static __always_inline void tdvps_vmcs_check(u32 field, u8 bits)
 			 "Invalid TD VMCS access for 16-bit field");
 }
 
+static __always_inline void tdvps_management_check(u64 field, u8 bits) {}
+
 #define TDX_BUILD_TDVPS_ACCESSORS(bits, uclass, lclass)				\
 static __always_inline u##bits td_##lclass##_read##bits(struct vcpu_tdx *tdx,	\
 							u32 field)		\
@@ -161,6 +163,9 @@ static __always_inline void td_##lclass##_clearbit##bits(struct vcpu_tdx *tdx,	\
 TDX_BUILD_TDVPS_ACCESSORS(16, VMCS, vmcs);
 TDX_BUILD_TDVPS_ACCESSORS(32, VMCS, vmcs);
 TDX_BUILD_TDVPS_ACCESSORS(64, VMCS, vmcs);
+
+TDX_BUILD_TDVPS_ACCESSORS(8, MANAGEMENT, management);
+
 #else
 static inline int tdx_bringup(void) { return 0; }
 static inline void tdx_cleanup(void) {}
diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h
index d521ad276d51..91988a715d75 100644
--- a/arch/x86/kvm/vmx/x86_ops.h
+++ b/arch/x86/kvm/vmx/x86_ops.h
@@ -141,6 +141,7 @@ int tdx_handle_exit(struct kvm_vcpu *vcpu,
 
 void tdx_deliver_interrupt(struct kvm_lapic *apic, int delivery_mode,
 			   int trig_mode, int vector);
+void tdx_inject_nmi(struct kvm_vcpu *vcpu);
 void tdx_get_exit_info(struct kvm_vcpu *vcpu, u32 *reason,
 		u64 *info1, u64 *info2, u32 *intr_info, u32 *error_code);
 
@@ -183,6 +184,7 @@ static inline int tdx_handle_exit(struct kvm_vcpu *vcpu,
 
 static inline void tdx_deliver_interrupt(struct kvm_lapic *apic, int delivery_mode,
 					 int trig_mode, int vector) {}
+static inline void tdx_inject_nmi(struct kvm_vcpu *vcpu) {}
 static inline void tdx_get_exit_info(struct kvm_vcpu *vcpu, u32 *reason, u64 *info1,
 				     u64 *info2, u32 *intr_info, u32 *error_code) {}
 
-- 
2.46.0


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v2 08/17] KVM: TDX: Complete interrupts after TD exit
  2025-02-11  2:58 [PATCH v2 00/17] KVM: TDX: TDX interrupts Binbin Wu
                   ` (6 preceding siblings ...)
  2025-02-11  2:58 ` [PATCH v2 07/17] KVM: TDX: Implement methods to inject NMI Binbin Wu
@ 2025-02-11  2:58 ` Binbin Wu
  2025-02-13  8:20   ` Chao Gao
  2025-02-11  2:58 ` [PATCH v2 09/17] KVM: TDX: Handle SMI request as !CONFIG_KVM_SMM Binbin Wu
                   ` (8 subsequent siblings)
  16 siblings, 1 reply; 33+ messages in thread
From: Binbin Wu @ 2025-02-11  2:58 UTC (permalink / raw)
  To: pbonzini, seanjc, kvm
  Cc: rick.p.edgecombe, kai.huang, adrian.hunter, reinette.chatre,
	xiaoyao.li, tony.lindgren, isaku.yamahata, yan.y.zhao, chao.gao,
	linux-kernel, binbin.wu

From: Isaku Yamahata <isaku.yamahata@intel.com>

Complete NMI injection by updating the status of NMI injection for TDX.

Because TDX virtualizes the vAPIC, and non-NMI interrupts are delivered
via the posted-interrupt mechanism, KVM only needs to care about NMI
injection.

For VMX, KVM injects an NMI by setting VM_ENTRY_INTR_INFO_FIELD via the
vector-event injection mechanism.  For TDX, KVM needs to request the TDX
module to inject an NMI into a guest TD vCPU when the vCPU is not active,
by setting the PEND_NMI field within the TDX vCPU scope metadata (Trust
Domain Virtual Processor State (TDVPS)).  The TDX module will attempt to
inject the NMI as soon as possible on TD entry.  KVM can read PEND_NMI to
get the status of NMI injection.  A value of 0 indicates the NMI has been
injected into the guest TD vCPU.

Update KVM's NMI status on TD exit by checking whether a requested NMI has
been injected into the TD.  Reading the PEND_NMI field via SEAMCALL is
expensive, so only perform the check if an NMI was requested for injection.
If the value read back is 0, the NMI has been injected; update the NMI
status.  If the value read back is 1, no action is needed since PEND_NMI is
still set.
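
For reference, td_management_read8() used below is generated by the
TDX_BUILD_TDVPS_ACCESSORS() macro added in the previous patch.  A rough
sketch of what the generated accessor boils down to; this is simplified,
and tdh_vp_rd() as the TDH.VP.RD wrapper plus TDVPS_MANAGEMENT() as the
field encoder are assumptions here, not the literal generated code:

  static __always_inline u8 td_management_read8(struct vcpu_tdx *tdx,
						u32 field)
  {
	u64 err, data;

	tdvps_management_check(field, 8);
	/* Read the 8-bit TDVPS management field via a TDH.VP.RD SEAMCALL. */
	err = tdh_vp_rd(tdx, TDVPS_MANAGEMENT(field), &data);
	if (unlikely(err))
		return 0;

	return (u8)data;
  }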

Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
---
TDX interrupts v2:
- No change.

TDX interrupts v1:
- Shortlog "tdexit" -> "TD exit" (Reinette)
- Update changelog as suggested by Reinette, with a small supplement.
  https://lore.kernel.org/lkml/fe9cec78-36ee-4a20-81df-ec837a45f69f@linux.intel.com/
- Fix comment, "nmi" -> "NMI" and add a missing period. (Reinette)
- Add a comment to explain why no need to request KVM_REQ_EVENT.

v19:
- move tdvps_management_check() to this patch
- typo: complete -> Complete in short log
---
 arch/x86/kvm/vmx/tdx.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index ba9038ac5bf7..9737574b8049 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -803,6 +803,21 @@ int tdx_vcpu_pre_run(struct kvm_vcpu *vcpu)
 	return 1;
 }
 
+static void tdx_complete_interrupts(struct kvm_vcpu *vcpu)
+{
+	/* Avoid costly SEAMCALL if no NMI was injected. */
+	if (vcpu->arch.nmi_injected) {
+		/*
+		 * No need to request KVM_REQ_EVENT because PEND_NMI is still
+		 * set if NMI re-injection is needed.  No other event types
+		 * need to be handled because TDX doesn't support injection of
+		 * exceptions, SMIs or interrupts (via event injection).
+		 */
+		vcpu->arch.nmi_injected = td_management_read8(to_tdx(vcpu),
+							      TD_VCPU_PEND_NMI);
+	}
+}
+
 struct tdx_uret_msr {
 	u32 msr;
 	unsigned int slot;
@@ -985,6 +1000,8 @@ fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu, bool force_immediate_exit)
 	if (unlikely(tdx_failed_vmentry(vcpu)))
 		return EXIT_FASTPATH_NONE;
 
+	tdx_complete_interrupts(vcpu);
+
 	return tdx_exit_handlers_fastpath(vcpu);
 }
 
-- 
2.46.0


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v2 09/17] KVM: TDX: Handle SMI request as !CONFIG_KVM_SMM
  2025-02-11  2:58 [PATCH v2 00/17] KVM: TDX: TDX interrupts Binbin Wu
                   ` (7 preceding siblings ...)
  2025-02-11  2:58 ` [PATCH v2 08/17] KVM: TDX: Complete interrupts after TD exit Binbin Wu
@ 2025-02-11  2:58 ` Binbin Wu
  2025-02-12  1:47   ` Sean Christopherson
  2025-02-11  2:58 ` [PATCH v2 10/17] KVM: TDX: Always block INIT/SIPI Binbin Wu
                   ` (7 subsequent siblings)
  16 siblings, 1 reply; 33+ messages in thread
From: Binbin Wu @ 2025-02-11  2:58 UTC (permalink / raw)
  To: pbonzini, seanjc, kvm
  Cc: rick.p.edgecombe, kai.huang, adrian.hunter, reinette.chatre,
	xiaoyao.li, tony.lindgren, isaku.yamahata, yan.y.zhao, chao.gao,
	linux-kernel, binbin.wu

From: Isaku Yamahata <isaku.yamahata@intel.com>

Handle SMI requests the same way KVM does for CONFIG_KVM_SMM=n, i.e. return
-ENOTTY, and add KVM_BUG_ON() to the SMI-related OPs for TDs.

TDX doesn't support system-management mode (SMM) and system-management
interrupts (SMIs) in guest TDs.  Because guest state (vCPU state, memory
state) is protected, any change to guest state must go through the TDX
module APIs.  However, the TDX module doesn't provide a way for the VMM to
inject an SMI into a guest TD or to switch the guest vCPU mode into SMM.

MSR_IA32_SMBASE will not be emulated for TDX guests, and -ENOTTY will be
returned when an SMI is requested (see the userspace sketch below).
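
For context, kvm_inject_smi() backs the KVM_SMI vCPU ioctl, so userspace
observes the same behavior as on a CONFIG_KVM_SMM=n build.  A minimal
userspace sketch (illustrative only; error reporting is up to the VMM):

  #include <errno.h>
  #include <stdio.h>
  #include <sys/ioctl.h>
  #include <linux/kvm.h>

  static void try_smi(int vcpu_fd)
  {
	/* For a TDX vCPU, the SMI request is expected to be rejected. */
	if (ioctl(vcpu_fd, KVM_SMI, NULL) < 0 && errno == ENOTTY)
		fprintf(stderr, "SMM is not supported for this vCPU\n");
  }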

Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Co-developed-by: Binbin Wu <binbin.wu@linux.intel.com>
Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
---
TDX interrupts v2:
- No change.

TDX interrupts v1:
- Renamed from "KVM: TDX: Silently discard SMI request" to
  "KVM: TDX: Handle SMI request as !CONFIG_KVM_SMM".
- Change the changelog.
- Handle SMI request as !CONFIG_KVM_SMM for TD, and remove the
  unnecessary comment. (Sean)
- Bug the VM if SMI OPs are called for a TD and remove related
  tdx_* functions, but still keep the vt_* wrappers. (Sean, Paolo)
- Use kvm_x86_call()
---
 arch/x86/kvm/smm.h      |  3 +++
 arch/x86/kvm/vmx/main.c | 43 +++++++++++++++++++++++++++++++++++++----
 2 files changed, 42 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/smm.h b/arch/x86/kvm/smm.h
index a1cf2ac5bd78..551703fbe200 100644
--- a/arch/x86/kvm/smm.h
+++ b/arch/x86/kvm/smm.h
@@ -142,6 +142,9 @@ union kvm_smram {
 
 static inline int kvm_inject_smi(struct kvm_vcpu *vcpu)
 {
+	if (!kvm_x86_call(has_emulated_msr)(vcpu->kvm, MSR_IA32_SMBASE))
+		return -ENOTTY;
+
 	kvm_make_request(KVM_REQ_SMI, vcpu);
 	return 0;
 }
diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
index 0d9b17d55bcc..8d91bd8eb991 100644
--- a/arch/x86/kvm/vmx/main.c
+++ b/arch/x86/kvm/vmx/main.c
@@ -180,6 +180,41 @@ static int vt_handle_exit(struct kvm_vcpu *vcpu,
 	return vmx_handle_exit(vcpu, fastpath);
 }
 
+#ifdef CONFIG_KVM_SMM
+static int vt_smi_allowed(struct kvm_vcpu *vcpu, bool for_injection)
+{
+	if (KVM_BUG_ON(is_td_vcpu(vcpu), vcpu->kvm))
+		return false;
+
+	return vmx_smi_allowed(vcpu, for_injection);
+}
+
+static int vt_enter_smm(struct kvm_vcpu *vcpu, union kvm_smram *smram)
+{
+	if (KVM_BUG_ON(is_td_vcpu(vcpu), vcpu->kvm))
+		return 0;
+
+	return vmx_enter_smm(vcpu, smram);
+}
+
+static int vt_leave_smm(struct kvm_vcpu *vcpu, const union kvm_smram *smram)
+{
+	if (KVM_BUG_ON(is_td_vcpu(vcpu), vcpu->kvm))
+		return 0;
+
+	return vmx_leave_smm(vcpu, smram);
+}
+
+static void vt_enable_smi_window(struct kvm_vcpu *vcpu)
+{
+	if (KVM_BUG_ON(is_td_vcpu(vcpu), vcpu->kvm))
+		return;
+
+	/* RSM will cause a vmexit anyway.  */
+	vmx_enable_smi_window(vcpu);
+}
+#endif
+
 static void vt_apicv_pre_state_restore(struct kvm_vcpu *vcpu)
 {
 	struct pi_desc *pi = vcpu_to_pi_desc(vcpu);
@@ -539,10 +574,10 @@ struct kvm_x86_ops vt_x86_ops __initdata = {
 	.setup_mce = vmx_setup_mce,
 
 #ifdef CONFIG_KVM_SMM
-	.smi_allowed = vmx_smi_allowed,
-	.enter_smm = vmx_enter_smm,
-	.leave_smm = vmx_leave_smm,
-	.enable_smi_window = vmx_enable_smi_window,
+	.smi_allowed = vt_smi_allowed,
+	.enter_smm = vt_enter_smm,
+	.leave_smm = vt_leave_smm,
+	.enable_smi_window = vt_enable_smi_window,
 #endif
 
 	.check_emulate_instruction = vmx_check_emulate_instruction,
-- 
2.46.0


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v2 10/17] KVM: TDX: Always block INIT/SIPI
  2025-02-11  2:58 [PATCH v2 00/17] KVM: TDX: TDX interrupts Binbin Wu
                   ` (8 preceding siblings ...)
  2025-02-11  2:58 ` [PATCH v2 09/17] KVM: TDX: Handle SMI request as !CONFIG_KVM_SMM Binbin Wu
@ 2025-02-11  2:58 ` Binbin Wu
  2025-02-11  2:58 ` [PATCH v2 11/17] KVM: TDX: Enforce KVM_IRQCHIP_SPLIT for TDX guests Binbin Wu
                   ` (6 subsequent siblings)
  16 siblings, 0 replies; 33+ messages in thread
From: Binbin Wu @ 2025-02-11  2:58 UTC (permalink / raw)
  To: pbonzini, seanjc, kvm
  Cc: rick.p.edgecombe, kai.huang, adrian.hunter, reinette.chatre,
	xiaoyao.li, tony.lindgren, isaku.yamahata, yan.y.zhao, chao.gao,
	linux-kernel, binbin.wu

From: Isaku Yamahata <isaku.yamahata@intel.com>

Always block INIT and SIPI events for the TDX guest because the TDX module
doesn't provide an API for the VMM to inject an INIT IPI or SIPI.

TDX defines its own vCPU creation and initialization sequence involving
multiple SEAMCALLs.  Also, it's only allowed during TD build time.

Given that the TDX guest is para-virtualized to boot BSP/APs, normally there
shouldn't be any INIT/SIPI events for a TDX guest.  If any arrive, there are
three options to handle them:
1. Always block INIT/SIPI requests.
2. (Silently) ignore INIT/SIPI requests during delivery.
3. Return an error to guest TDs somehow.

Choose option 1 for simplicity (see the sketch below for where the blocking
hook is consulted).  Since INIT and SIPI are always blocked, INIT handling
and the OP vcpu_deliver_sipi_vector() won't be called, so there is no need
to add a new interface or helper function for INIT/SIPI delivery.
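
For context, a rough sketch of where the blocking takes effect: the common
x86 code checks the vendor hook when deciding whether to latch an INIT.
This is an assumed, simplified shape of the kvm_vcpu_latch_init() helper in
x86.h, not the literal code:

  static inline bool kvm_vcpu_latch_init(struct kvm_vcpu *vcpu)
  {
	/* INIT stays blocked while in SMM or when the vendor hook says so. */
	return is_smm(vcpu) ||
	       kvm_x86_call(apic_init_signal_blocked)(vcpu);
  }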

Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Co-developed-by: Binbin Wu <binbin.wu@linux.intel.com>
Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
---
TDX interrupts v2:
- WARN on init event. (Sean)
- Improve comments about vcpu reset for TDX. (Xiaoyao, Sean)

TDX interrupts v1:
- Renamed from "KVM: TDX: Silently ignore INIT/SIPI" to
  "KVM: TDX: Always block INIT/SIPI".
- Remove KVM_BUG_ON() in tdx_vcpu_reset(). (Rick)
- Drop tdx_vcpu_reset() and move the comment to vt_vcpu_reset().
- Remove unnecessary interface and helpers to deliver INIT/SIPI
  because INIT/SIPI events are always blocked for TDX. (Binbin)
- Update changelog.
---
 arch/x86/kvm/vmx/main.c    | 18 ++++++++++++++++--
 arch/x86/kvm/vmx/tdx.c     | 13 +++++++++++++
 arch/x86/kvm/vmx/x86_ops.h |  2 ++
 3 files changed, 31 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
index 8d91bd8eb991..1ff4903a1853 100644
--- a/arch/x86/kvm/vmx/main.c
+++ b/arch/x86/kvm/vmx/main.c
@@ -119,8 +119,10 @@ static void vt_vcpu_free(struct kvm_vcpu *vcpu)
 
 static void vt_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
 {
-	if (is_td_vcpu(vcpu))
+	if (is_td_vcpu(vcpu)) {
+		tdx_vcpu_reset(vcpu, init_event);
 		return;
+	}
 
 	vmx_vcpu_reset(vcpu, init_event);
 }
@@ -215,6 +217,18 @@ static void vt_enable_smi_window(struct kvm_vcpu *vcpu)
 }
 #endif
 
+static bool vt_apic_init_signal_blocked(struct kvm_vcpu *vcpu)
+{
+	/*
+	 * INIT and SIPI are always blocked for TDX, i.e., INIT handling and
+	 * the OP vcpu_deliver_sipi_vector() won't be called.
+	 */
+	if (is_td_vcpu(vcpu))
+		return true;
+
+	return vmx_apic_init_signal_blocked(vcpu);
+}
+
 static void vt_apicv_pre_state_restore(struct kvm_vcpu *vcpu)
 {
 	struct pi_desc *pi = vcpu_to_pi_desc(vcpu);
@@ -581,7 +595,7 @@ struct kvm_x86_ops vt_x86_ops __initdata = {
 #endif
 
 	.check_emulate_instruction = vmx_check_emulate_instruction,
-	.apic_init_signal_blocked = vmx_apic_init_signal_blocked,
+	.apic_init_signal_blocked = vt_apic_init_signal_blocked,
 	.migrate_timers = vmx_migrate_timers,
 
 	.msr_filter_changed = vmx_msr_filter_changed,
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 9737574b8049..bd349e3d4089 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -2642,6 +2642,19 @@ static int tdx_vcpu_init(struct kvm_vcpu *vcpu, struct kvm_tdx_cmd *cmd)
 	return 0;
 }
 
+void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
+{
+	/*
+	 * Yell on INIT, as TDX doesn't support INIT, i.e. KVM should drop all
+	 * INIT events.
+	 *
+	 * Defer initializing vCPU for RESET state until KVM_TDX_INIT_VCPU, as
+	 * userspace needs to define the vCPU model before KVM can initialize
+	 * vCPU state, e.g. to enable x2APIC.
+	 */
+	WARN_ON_ONCE(init_event);
+}
+
 struct tdx_gmem_post_populate_arg {
 	struct kvm_vcpu *vcpu;
 	__u32 flags;
diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h
index 91988a715d75..eb6a841f4842 100644
--- a/arch/x86/kvm/vmx/x86_ops.h
+++ b/arch/x86/kvm/vmx/x86_ops.h
@@ -129,6 +129,7 @@ void tdx_vm_free(struct kvm *kvm);
 int tdx_vm_ioctl(struct kvm *kvm, void __user *argp);
 
 int tdx_vcpu_create(struct kvm_vcpu *vcpu);
+void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event);
 void tdx_vcpu_free(struct kvm_vcpu *vcpu);
 void tdx_vcpu_load(struct kvm_vcpu *vcpu, int cpu);
 int tdx_vcpu_pre_run(struct kvm_vcpu *vcpu);
@@ -169,6 +170,7 @@ static inline void tdx_vm_free(struct kvm *kvm) {}
 static inline int tdx_vm_ioctl(struct kvm *kvm, void __user *argp) { return -EOPNOTSUPP; }
 
 static inline int tdx_vcpu_create(struct kvm_vcpu *vcpu) { return -EOPNOTSUPP; }
+static inline void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event) {}
 static inline void tdx_vcpu_free(struct kvm_vcpu *vcpu) {}
 static inline void tdx_vcpu_load(struct kvm_vcpu *vcpu, int cpu) {}
 static inline int tdx_vcpu_pre_run(struct kvm_vcpu *vcpu) { return -EOPNOTSUPP; }
-- 
2.46.0


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v2 11/17] KVM: TDX: Enforce KVM_IRQCHIP_SPLIT for TDX guests
  2025-02-11  2:58 [PATCH v2 00/17] KVM: TDX: TDX interrupts Binbin Wu
                   ` (9 preceding siblings ...)
  2025-02-11  2:58 ` [PATCH v2 10/17] KVM: TDX: Always block INIT/SIPI Binbin Wu
@ 2025-02-11  2:58 ` Binbin Wu
  2025-02-11  2:58 ` [PATCH v2 12/17] KVM: TDX: Force APICv active for TDX guest Binbin Wu
                   ` (5 subsequent siblings)
  16 siblings, 0 replies; 33+ messages in thread
From: Binbin Wu @ 2025-02-11  2:58 UTC (permalink / raw)
  To: pbonzini, seanjc, kvm
  Cc: rick.p.edgecombe, kai.huang, adrian.hunter, reinette.chatre,
	xiaoyao.li, tony.lindgren, isaku.yamahata, yan.y.zhao, chao.gao,
	linux-kernel, binbin.wu

Enforce KVM_IRQCHIP_SPLIT for TDX guests to disallow the in-kernel I/O APIC
while an in-kernel local APIC is needed.

APICv is always enabled by the TDX module, and the TDX module doesn't allow
the hypervisor to modify the EOI-bitmap, i.e. all EOIs are accelerated and
never trigger exits.  Level-triggered interrupts and other things depending
on EOI VM-Exits can't be faithfully emulated in KVM.  Also, the lazy check
of pending APIC EOI for RTC edge-triggered interrupts, which was introduced
as a workaround when EOI cannot be intercepted, doesn't work for TDX either,
because kvm_apic_pending_eoi() checks vIRR and vISR, but both values are
invisible to KVM.

If the guest induces generation of a level-triggered interrupt, the VMM is
left with the choice of dropping the interrupt, sending it as-is, or
converting it to an edge-triggered interrupt.  Ditto for KVM.  All of those
options will make the guest unhappy.  There's no architectural behavior KVM
can provide that's better than sending the interrupt and hoping for the
best.  (See the split-irqchip sketch below for the userspace side.)
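
With this change, userspace must configure a split irqchip before creating
TDX vCPUs.  A minimal sketch against the KVM uAPI (the route count in
args[0] is illustrative):

  #include <sys/ioctl.h>
  #include <linux/kvm.h>

  /* Use a split irqchip instead of KVM_CREATE_IRQCHIP, before any vCPU. */
  static int enable_split_irqchip(int vm_fd)
  {
	struct kvm_enable_cap cap = {
		.cap = KVM_CAP_SPLIT_IRQCHIP,
		.args[0] = 24,	/* GSI routes reserved for userspace I/O APIC */
	};

	return ioctl(vm_fd, KVM_ENABLE_CAP, &cap);
  }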

Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
---
TDX interrupts v2:
- New added.
---
 arch/x86/kvm/vmx/tdx.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index bd349e3d4089..4b3251680d43 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -13,6 +13,7 @@
 #include "mmu/spte.h"
 #include "common.h"
 #include "posted_intr.h"
+#include "irq.h"
 #include <trace/events/kvm.h>
 #include "trace.h"
 
@@ -663,8 +664,12 @@ int tdx_vcpu_create(struct kvm_vcpu *vcpu)
 	if (kvm_tdx->state != TD_STATE_INITIALIZED)
 		return -EIO;
 
-	/* TDX module mandates APICv, which requires an in-kernel local APIC. */
-	if (!lapic_in_kernel(vcpu))
+	/*
+	 * TDX module mandates APICv, which requires an in-kernel local APIC.
+	 * Disallow an in-kernel I/O APIC, because level-triggered interrupts
+	 * and thus the I/O APIC as a whole can't be faithfully emulated in KVM.
+	 */
+	if (!irqchip_split(vcpu->kvm))
 		return -EINVAL;
 
 	fpstate_set_confidential(&vcpu->arch.guest_fpu);
-- 
2.46.0


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v2 12/17] KVM: TDX: Force APICv active for TDX guest
  2025-02-11  2:58 [PATCH v2 00/17] KVM: TDX: TDX interrupts Binbin Wu
                   ` (10 preceding siblings ...)
  2025-02-11  2:58 ` [PATCH v2 11/17] KVM: TDX: Enforce KVM_IRQCHIP_SPLIT for TDX guests Binbin Wu
@ 2025-02-11  2:58 ` Binbin Wu
  2025-02-11  2:58 ` [PATCH v2 13/17] KVM: TDX: Add methods to ignore virtual apic related operation Binbin Wu
                   ` (4 subsequent siblings)
  16 siblings, 0 replies; 33+ messages in thread
From: Binbin Wu @ 2025-02-11  2:58 UTC (permalink / raw)
  To: pbonzini, seanjc, kvm
  Cc: rick.p.edgecombe, kai.huang, adrian.hunter, reinette.chatre,
	xiaoyao.li, tony.lindgren, isaku.yamahata, yan.y.zhao, chao.gao,
	linux-kernel, binbin.wu

From: Isaku Yamahata <isaku.yamahata@intel.com>

Force APICv active for TDX guests in KVM because APICv is always enabled
by the TDX module.

From the view of KVM, whether APICv is active is decided by:
1. The APIC is hardware enabled.
2. The VM and vCPU have no APICv inhibit reasons set.

After TDX vCPU init, the APIC is set to x2APIC mode.  KVM_SET_{SREGS,SREGS2}
are rejected because has_protected_state is set for TDs and
guest_state_protected is set for TDX vCPUs.  Reject KVM_{GET,SET}_LAPIC from
userspace since migration is not supported yet, so that userspace cannot
disable the APIC (see the userspace-side sketch below).
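
As a consequence, userspace (e.g. QEMU) must skip the local APIC state
save/restore for TDX vCPUs.  A hypothetical VMM-side guard; vcpu_is_tdx is
an assumed flag tracked by the VMM, not a KVM API:

  #include <stdbool.h>
  #include <sys/ioctl.h>
  #include <linux/kvm.h>

  static int sync_lapic(int vcpu_fd, bool vcpu_is_tdx)
  {
	struct kvm_lapic_state lapic;

	/* The local APIC state of a TDX vCPU is protected; skip the sync. */
	if (vcpu_is_tdx)
		return 0;

	return ioctl(vcpu_fd, KVM_GET_LAPIC, &lapic);
  }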

For various APICv inhibit reasons:
- APICV_INHIBIT_REASON_DISABLED is impossible after checking enable_apicv
  in tdx_bringup(). If !enable_apicv, TDX support will be disabled.
- APICV_INHIBIT_REASON_PHYSICAL_ID_ALIASED is impossible since x2APIC is
  mandatory, KVM emulates APIC_ID as read-only for x2APIC mode. (Note:
  APICV_INHIBIT_REASON_PHYSICAL_ID_ALIASED could be set if the memory
  allocation fails for KVM apic_map.)
- APICV_INHIBIT_REASON_HYPERV is impossible since TDX doesn't support
  Hyper-V guests yet.
- APICV_INHIBIT_REASON_ABSENT is impossible since in-kernel LAPIC is
  checked in tdx_vcpu_create().
- APICV_INHIBIT_REASON_BLOCKIRQ is impossible since TDX doesn't support
  KVM_SET_GUEST_DEBUG.
- APICV_INHIBIT_REASON_APIC_ID_MODIFIED is impossible since x2APIC is
  mandatory.
- APICV_INHIBIT_REASON_APIC_BASE_MODIFIED is impossible since KVM rejects
  userspace to set APIC base.
- The remaining inhibit reasons are relevant only to AMD's AVIC, including
  APICV_INHIBIT_REASON_NESTED, APICV_INHIBIT_REASON_IRQWIN,
  APICV_INHIBIT_REASON_PIT_REINJ, APICV_INHIBIT_REASON_SEV, and
  APICV_INHIBIT_REASON_LOGICAL_ID_ALIASED.
  (For APICV_INHIBIT_REASON_PIT_REINJ, similar to AVIC, KVM can't intercept
   EOI for TDX guests either, but KVM enforces KVM_IRQCHIP_SPLIT for TDX
   guests, which eliminates the in-kernel PIT.)

Implement vt_refresh_apicv_exec_ctrl() to call KVM_BUG_ON() if APICv is
disabled for TDX guests.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Co-developed-by: Binbin Wu <binbin.wu@linux.intel.com>
Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
---
TDX interrupts v2:
- Renamed from "KVM: TDX: Inhibit APICv for TDX guest"
- Check enable_apicv in tdx_bringup().
- Changed APICv active state from always false to true for TDX guests. (Sean)
- Reject KVM_{GET,SET}_LAPIC from userspace.
- Implement vt_refresh_apicv_exec_ctrl() to bug the VM if APICv is
  disabled.

TDX interrupts v1:
- Removed WARN_ON_ONCE(kvm_apicv_activated(vcpu->kvm)) in
  tdx_td_vcpu_init(). (Rick)
- Change APICV -> APICv in changelog for consistency.
- Split the changelog to 2 paragraphs.
---
 arch/x86/kvm/vmx/main.c | 12 +++++++++++-
 arch/x86/kvm/vmx/tdx.c  |  5 +++++
 arch/x86/kvm/x86.c      |  6 ++++++
 3 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
index 1ff4903a1853..7fa579c90991 100644
--- a/arch/x86/kvm/vmx/main.c
+++ b/arch/x86/kvm/vmx/main.c
@@ -426,6 +426,16 @@ static void vt_get_exit_info(struct kvm_vcpu *vcpu, u32 *reason,
 	vmx_get_exit_info(vcpu, reason, info1, info2, intr_info, error_code);
 }
 
+static void vt_refresh_apicv_exec_ctrl(struct kvm_vcpu *vcpu)
+{
+	if (is_td_vcpu(vcpu)) {
+		KVM_BUG_ON(!kvm_vcpu_apicv_active(vcpu), vcpu->kvm);
+		return;
+	}
+
+	vmx_refresh_apicv_exec_ctrl(vcpu);
+}
+
 static int vt_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
 {
 	if (!is_td(kvm))
@@ -541,7 +551,7 @@ struct kvm_x86_ops vt_x86_ops __initdata = {
 	.x2apic_icr_is_split = false,
 	.set_virtual_apic_mode = vmx_set_virtual_apic_mode,
 	.set_apic_access_page_addr = vmx_set_apic_access_page_addr,
-	.refresh_apicv_exec_ctrl = vmx_refresh_apicv_exec_ctrl,
+	.refresh_apicv_exec_ctrl = vt_refresh_apicv_exec_ctrl,
 	.load_eoi_exitmap = vmx_load_eoi_exitmap,
 	.apicv_pre_state_restore = vt_apicv_pre_state_restore,
 	.required_apicv_inhibits = VMX_REQUIRED_APICV_INHIBITS,
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 4b3251680d43..4a29b3998cde 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -3063,6 +3063,11 @@ int __init tdx_bringup(void)
 		goto success_disable_tdx;
 	}
 
+	if (!enable_apicv) {
+		pr_err("APICv is required for TDX\n");
+		goto success_disable_tdx;
+	}
+
 	if (!cpu_feature_enabled(X86_FEATURE_MOVDIR64B)) {
 		pr_err("MOVDIR64B is reqiured for TDX\n");
 		goto success_disable_tdx;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index a41d57ba4a86..1e2ab3598846 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5105,6 +5105,9 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
 static int kvm_vcpu_ioctl_get_lapic(struct kvm_vcpu *vcpu,
 				    struct kvm_lapic_state *s)
 {
+	if (vcpu->arch.apic->guest_apic_protected)
+		return -EINVAL;
+
 	kvm_x86_call(sync_pir_to_irr)(vcpu);
 
 	return kvm_apic_get_state(vcpu, s);
@@ -5115,6 +5118,9 @@ static int kvm_vcpu_ioctl_set_lapic(struct kvm_vcpu *vcpu,
 {
 	int r;
 
+	if (vcpu->arch.apic->guest_apic_protected)
+		return -EINVAL;
+
 	r = kvm_apic_set_state(vcpu, s);
 	if (r)
 		return r;
-- 
2.46.0


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v2 13/17] KVM: TDX: Add methods to ignore virtual apic related operation
  2025-02-11  2:58 [PATCH v2 00/17] KVM: TDX: TDX interrupts Binbin Wu
                   ` (11 preceding siblings ...)
  2025-02-11  2:58 ` [PATCH v2 12/17] KVM: TDX: Force APICv active for TDX guest Binbin Wu
@ 2025-02-11  2:58 ` Binbin Wu
  2025-02-11  2:58 ` [PATCH v2 14/17] KVM: VMX: Move emulation_required to struct vcpu_vt Binbin Wu
                   ` (3 subsequent siblings)
  16 siblings, 0 replies; 33+ messages in thread
From: Binbin Wu @ 2025-02-11  2:58 UTC (permalink / raw)
  To: pbonzini, seanjc, kvm
  Cc: rick.p.edgecombe, kai.huang, adrian.hunter, reinette.chatre,
	xiaoyao.li, tony.lindgren, isaku.yamahata, yan.y.zhao, chao.gao,
	linux-kernel, binbin.wu

From: Isaku Yamahata <isaku.yamahata@intel.com>

TDX protects the TDX guest's APIC state from the VMM.  Implement the
vAPIC access methods for TDX guests to ignore accesses or return zero.

Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
---
TDX interrupts v2:
- Rebased due to "Force APICv active for TDX guest", i.e.,
  vt_refresh_apicv_exec_ctrl() is moved to the patch
  "KVM: TDX: Force APICv active for TDX guest".
- Drop vt_hwapic_irr_update() since .hwapic_irr_update() is gone in 6.14.

TDX interrupts v1:
- Removed WARN_ON_ONCE() in tdx_set_virtual_apic_mode(). (Rick)
- Open code tdx_set_virtual_apic_mode(). (Binbin)
---
 arch/x86/kvm/vmx/main.c | 31 ++++++++++++++++++++++++++++---
 1 file changed, 28 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
index 7fa579c90991..9c173645928c 100644
--- a/arch/x86/kvm/vmx/main.c
+++ b/arch/x86/kvm/vmx/main.c
@@ -229,6 +229,15 @@ static bool vt_apic_init_signal_blocked(struct kvm_vcpu *vcpu)
 	return vmx_apic_init_signal_blocked(vcpu);
 }
 
+static void vt_set_virtual_apic_mode(struct kvm_vcpu *vcpu)
+{
+	/* Only x2APIC mode is supported for TD. */
+	if (is_td_vcpu(vcpu))
+		return;
+
+	return vmx_set_virtual_apic_mode(vcpu);
+}
+
 static void vt_apicv_pre_state_restore(struct kvm_vcpu *vcpu)
 {
 	struct pi_desc *pi = vcpu_to_pi_desc(vcpu);
@@ -237,6 +246,14 @@ static void vt_apicv_pre_state_restore(struct kvm_vcpu *vcpu)
 	memset(pi->pir, 0, sizeof(pi->pir));
 }
 
+static void vt_hwapic_isr_update(struct kvm_vcpu *vcpu, int max_isr)
+{
+	if (is_td_vcpu(vcpu))
+		return;
+
+	return vmx_hwapic_isr_update(vcpu, max_isr);
+}
+
 static int vt_sync_pir_to_irr(struct kvm_vcpu *vcpu)
 {
 	if (is_td_vcpu(vcpu))
@@ -426,6 +443,14 @@ static void vt_get_exit_info(struct kvm_vcpu *vcpu, u32 *reason,
 	vmx_get_exit_info(vcpu, reason, info1, info2, intr_info, error_code);
 }
 
+static void vt_set_apic_access_page_addr(struct kvm_vcpu *vcpu)
+{
+	if (is_td_vcpu(vcpu))
+		return;
+
+	vmx_set_apic_access_page_addr(vcpu);
+}
+
 static void vt_refresh_apicv_exec_ctrl(struct kvm_vcpu *vcpu)
 {
 	if (is_td_vcpu(vcpu)) {
@@ -549,13 +574,13 @@ struct kvm_x86_ops vt_x86_ops __initdata = {
 	.update_cr8_intercept = vmx_update_cr8_intercept,
 
 	.x2apic_icr_is_split = false,
-	.set_virtual_apic_mode = vmx_set_virtual_apic_mode,
-	.set_apic_access_page_addr = vmx_set_apic_access_page_addr,
+	.set_virtual_apic_mode = vt_set_virtual_apic_mode,
+	.set_apic_access_page_addr = vt_set_apic_access_page_addr,
 	.refresh_apicv_exec_ctrl = vt_refresh_apicv_exec_ctrl,
 	.load_eoi_exitmap = vmx_load_eoi_exitmap,
 	.apicv_pre_state_restore = vt_apicv_pre_state_restore,
 	.required_apicv_inhibits = VMX_REQUIRED_APICV_INHIBITS,
-	.hwapic_isr_update = vmx_hwapic_isr_update,
+	.hwapic_isr_update = vt_hwapic_isr_update,
 	.sync_pir_to_irr = vt_sync_pir_to_irr,
 	.deliver_interrupt = vt_deliver_interrupt,
 	.dy_apicv_has_pending_interrupt = pi_has_pending_interrupt,
-- 
2.46.0


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v2 14/17] KVM: VMX: Move emulation_required to struct vcpu_vt
  2025-02-11  2:58 [PATCH v2 00/17] KVM: TDX: TDX interrupts Binbin Wu
                   ` (12 preceding siblings ...)
  2025-02-11  2:58 ` [PATCH v2 13/17] KVM: TDX: Add methods to ignore virtual apic related operation Binbin Wu
@ 2025-02-11  2:58 ` Binbin Wu
  2025-02-11  2:58 ` [PATCH v2 15/17] KVM: VMX: Add a helper for NMI handling Binbin Wu
                   ` (2 subsequent siblings)
  16 siblings, 0 replies; 33+ messages in thread
From: Binbin Wu @ 2025-02-11  2:58 UTC (permalink / raw)
  To: pbonzini, seanjc, kvm
  Cc: rick.p.edgecombe, kai.huang, adrian.hunter, reinette.chatre,
	xiaoyao.li, tony.lindgren, isaku.yamahata, yan.y.zhao, chao.gao,
	linux-kernel, binbin.wu

Move emulation_required from struct vcpu_vmx to struct vcpu_vt so that
vmx_handle_exit_irqoff() can be reused by TDX code.

No functional change intended.

Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
---
TDX interrupts v2:
- New added.
---
 arch/x86/kvm/vmx/common.h |  1 +
 arch/x86/kvm/vmx/nested.c |  2 +-
 arch/x86/kvm/vmx/vmx.c    | 20 ++++++++++----------
 arch/x86/kvm/vmx/vmx.h    |  1 -
 4 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kvm/vmx/common.h b/arch/x86/kvm/vmx/common.h
index 079aeca65e2c..f26f7b1acbca 100644
--- a/arch/x86/kvm/vmx/common.h
+++ b/arch/x86/kvm/vmx/common.h
@@ -48,6 +48,7 @@ struct vcpu_vt {
 	 * hardware.
 	 */
 	bool		guest_state_loaded;
+	bool		emulation_required;
 
 #ifdef CONFIG_X86_64
 	u64		msr_host_kernel_gs_base;
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 3add9f1073ff..8ae608a1e66c 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -4794,7 +4794,7 @@ static void load_vmcs12_host_state(struct kvm_vcpu *vcpu,
 				vmcs12->vm_exit_msr_load_count))
 		nested_vmx_abort(vcpu, VMX_ABORT_LOAD_HOST_MSR_FAIL);
 
-	to_vmx(vcpu)->emulation_required = vmx_emulation_required(vcpu);
+	to_vt(vcpu)->emulation_required = vmx_emulation_required(vcpu);
 }
 
 static inline u64 nested_vmx_get_vmcs01_guest_efer(struct vcpu_vmx *vmx)
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index cb6043e29ef9..012649688e46 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1584,7 +1584,7 @@ void vmx_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags)
 	vmcs_writel(GUEST_RFLAGS, rflags);
 
 	if ((old_rflags ^ vmx->rflags) & X86_EFLAGS_VM)
-		vmx->emulation_required = vmx_emulation_required(vcpu);
+		vmx->vt.emulation_required = vmx_emulation_required(vcpu);
 }
 
 bool vmx_get_if_flag(struct kvm_vcpu *vcpu)
@@ -1866,7 +1866,7 @@ void vmx_inject_exception(struct kvm_vcpu *vcpu)
 		return;
 	}
 
-	WARN_ON_ONCE(vmx->emulation_required);
+	WARN_ON_ONCE(vmx->vt.emulation_required);
 
 	if (kvm_exception_is_soft(ex->vector)) {
 		vmcs_write32(VM_ENTRY_INSTRUCTION_LEN,
@@ -3395,7 +3395,7 @@ void vmx_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0)
 	}
 
 	/* depends on vcpu->arch.cr0 to be set to a new value */
-	vmx->emulation_required = vmx_emulation_required(vcpu);
+	vmx->vt.emulation_required = vmx_emulation_required(vcpu);
 }
 
 static int vmx_get_max_ept_level(void)
@@ -3658,7 +3658,7 @@ void vmx_set_segment(struct kvm_vcpu *vcpu, struct kvm_segment *var, int seg)
 {
 	__vmx_set_segment(vcpu, var, seg);
 
-	to_vmx(vcpu)->emulation_required = vmx_emulation_required(vcpu);
+	to_vmx(vcpu)->vt.emulation_required = vmx_emulation_required(vcpu);
 }
 
 void vmx_get_cs_db_l_bits(struct kvm_vcpu *vcpu, int *db, int *l)
@@ -5798,7 +5798,7 @@ static bool vmx_emulation_required_with_pending_exception(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 
-	return vmx->emulation_required && !vmx->rmode.vm86_active &&
+	return vmx->vt.emulation_required && !vmx->rmode.vm86_active &&
 	       (kvm_is_exception_pending(vcpu) || vcpu->arch.exception.injected);
 }
 
@@ -5811,7 +5811,7 @@ static int handle_invalid_guest_state(struct kvm_vcpu *vcpu)
 	intr_window_requested = exec_controls_get(vmx) &
 				CPU_BASED_INTR_WINDOW_EXITING;
 
-	while (vmx->emulation_required && count-- != 0) {
+	while (vmx->vt.emulation_required && count-- != 0) {
 		if (intr_window_requested && !vmx_interrupt_blocked(vcpu))
 			return handle_interrupt_window(&vmx->vcpu);
 
@@ -6458,7 +6458,7 @@ static int __vmx_handle_exit(struct kvm_vcpu *vcpu, fastpath_t exit_fastpath)
 		 * the least awful solution for the userspace case without
 		 * risking false positives.
 		 */
-		if (vmx->emulation_required) {
+		if (vmx->vt.emulation_required) {
 			nested_vmx_vmexit(vcpu, EXIT_REASON_TRIPLE_FAULT, 0, 0);
 			return 1;
 		}
@@ -6468,7 +6468,7 @@ static int __vmx_handle_exit(struct kvm_vcpu *vcpu, fastpath_t exit_fastpath)
 	}
 
 	/* If guest state is invalid, start emulating.  L2 is handled above. */
-	if (vmx->emulation_required)
+	if (vmx->vt.emulation_required)
 		return handle_invalid_guest_state(vcpu);
 
 	if (exit_reason.failed_vmentry) {
@@ -6961,7 +6961,7 @@ void vmx_handle_exit_irqoff(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 
-	if (vmx->emulation_required)
+	if (vmx->vt.emulation_required)
 		return;
 
 	if (vmx_get_exit_reason(vcpu).basic == EXIT_REASON_EXTERNAL_INTERRUPT)
@@ -7284,7 +7284,7 @@ fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu, bool force_immediate_exit)
 	 * start emulation until we arrive back to a valid state.  Synthesize a
 	 * consistency check VM-Exit due to invalid guest state and bail.
 	 */
-	if (unlikely(vmx->emulation_required)) {
+	if (unlikely(vmx->vt.emulation_required)) {
 		vmx->fail = 0;
 
 		vmx->vt.exit_reason.full = EXIT_REASON_INVALID_STATE;
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index e635199901e2..6d1e40ecc024 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -263,7 +263,6 @@ struct vcpu_vmx {
 		} seg[8];
 	} segment_cache;
 	int vpid;
-	bool emulation_required;
 
 	/* Support for a guest hypervisor (nested VMX) */
 	struct nested_vmx nested;
-- 
2.46.0


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v2 15/17] KVM: VMX: Add a helper for NMI handling
  2025-02-11  2:58 [PATCH v2 00/17] KVM: TDX: TDX interrupts Binbin Wu
                   ` (13 preceding siblings ...)
  2025-02-11  2:58 ` [PATCH v2 14/17] KVM: VMX: Move emulation_required to struct vcpu_vt Binbin Wu
@ 2025-02-11  2:58 ` Binbin Wu
  2025-02-12  1:10   ` Sean Christopherson
  2025-02-11  2:58 ` [PATCH v2 16/17] KVM: TDX: Handle EXCEPTION_NMI and EXTERNAL_INTERRUPT Binbin Wu
  2025-02-11  2:58 ` [PATCH v2 17/17] KVM: TDX: Handle EXIT_REASON_OTHER_SMI Binbin Wu
  16 siblings, 1 reply; 33+ messages in thread
From: Binbin Wu @ 2025-02-11  2:58 UTC (permalink / raw)
  To: pbonzini, seanjc, kvm
  Cc: rick.p.edgecombe, kai.huang, adrian.hunter, reinette.chatre,
	xiaoyao.li, tony.lindgren, isaku.yamahata, yan.y.zhao, chao.gao,
	linux-kernel, binbin.wu

From: Sean Christopherson <sean.j.christopherson@intel.com>

Add a helper to handle NMI exits.

TDX handles NMI exits the same way as the VMX case.  Add a helper to share
the code with TDX, and expose the helper in common.h.

No functional change intended.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Co-developed-by: Binbin Wu <binbin.wu@linux.intel.com>
Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
---
TDX interrupts v2:
- Renamed from "KVM: VMX: Move NMI/exception handler to common helper".
- Revert the unnecessary move, because in a later patch TDX will reuse
  vmx_handle_exit_irqoff() as the handle_exit_irqoff() callback.
- Add the check for NMI to __vmx_handle_nmi() and rename it to vmx_handle_nmi().
- Update change log according to the change.

TDX interrupts v1:
- Update change log with suggestions from (Binbin)
- Move the NMI handling code to common header and add a helper
  __vmx_handle_nmi() for it. (Binbin)
---
 arch/x86/kvm/vmx/common.h |  2 ++
 arch/x86/kvm/vmx/vmx.c    | 24 +++++++++++++++---------
 2 files changed, 17 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kvm/vmx/common.h b/arch/x86/kvm/vmx/common.h
index f26f7b1acbca..67b16bd8a788 100644
--- a/arch/x86/kvm/vmx/common.h
+++ b/arch/x86/kvm/vmx/common.h
@@ -180,4 +180,6 @@ static inline void __vmx_deliver_posted_interrupt(struct kvm_vcpu *vcpu,
 	kvm_vcpu_trigger_posted_interrupt(vcpu, POSTED_INTR_VECTOR);
 }
 
+noinstr void vmx_handle_nmi(struct kvm_vcpu *vcpu);
+
 #endif /* __KVM_X86_VMX_COMMON_H */
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 012649688e46..228a7e51b6a5 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7212,6 +7212,20 @@ static fastpath_t vmx_exit_handlers_fastpath(struct kvm_vcpu *vcpu,
 	}
 }
 
+noinstr void vmx_handle_nmi(struct kvm_vcpu *vcpu)
+{
+	if ((u16)vmx_get_exit_reason(vcpu).basic != EXIT_REASON_EXCEPTION_NMI ||
+		!is_nmi(vmx_get_intr_info(vcpu)))
+		return;
+
+	kvm_before_interrupt(vcpu, KVM_HANDLING_NMI);
+	if (cpu_feature_enabled(X86_FEATURE_FRED))
+		fred_entry_from_kvm(EVENT_TYPE_NMI, NMI_VECTOR);
+	else
+		vmx_do_nmi_irqoff();
+	kvm_after_interrupt(vcpu);
+}
+
 static noinstr void vmx_vcpu_enter_exit(struct kvm_vcpu *vcpu,
 					unsigned int flags)
 {
@@ -7255,15 +7269,7 @@ static noinstr void vmx_vcpu_enter_exit(struct kvm_vcpu *vcpu,
 	if (likely(!vmx_get_exit_reason(vcpu).failed_vmentry))
 		vmx->idt_vectoring_info = vmcs_read32(IDT_VECTORING_INFO_FIELD);
 
-	if ((u16)vmx_get_exit_reason(vcpu).basic == EXIT_REASON_EXCEPTION_NMI &&
-	    is_nmi(vmx_get_intr_info(vcpu))) {
-		kvm_before_interrupt(vcpu, KVM_HANDLING_NMI);
-		if (cpu_feature_enabled(X86_FEATURE_FRED))
-			fred_entry_from_kvm(EVENT_TYPE_NMI, NMI_VECTOR);
-		else
-			vmx_do_nmi_irqoff();
-		kvm_after_interrupt(vcpu);
-	}
+	vmx_handle_nmi(vcpu);
 
 out:
 	guest_state_exit_irqoff();
-- 
2.46.0


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v2 16/17] KVM: TDX: Handle EXCEPTION_NMI and EXTERNAL_INTERRUPT
  2025-02-11  2:58 [PATCH v2 00/17] KVM: TDX: TDX interrupts Binbin Wu
                   ` (14 preceding siblings ...)
  2025-02-11  2:58 ` [PATCH v2 15/17] KVM: VMX: Add a helper for NMI handling Binbin Wu
@ 2025-02-11  2:58 ` Binbin Wu
  2025-02-12  0:50   ` Sean Christopherson
  2025-02-11  2:58 ` [PATCH v2 17/17] KVM: TDX: Handle EXIT_REASON_OTHER_SMI Binbin Wu
  16 siblings, 1 reply; 33+ messages in thread
From: Binbin Wu @ 2025-02-11  2:58 UTC (permalink / raw)
  To: pbonzini, seanjc, kvm
  Cc: rick.p.edgecombe, kai.huang, adrian.hunter, reinette.chatre,
	xiaoyao.li, tony.lindgren, isaku.yamahata, yan.y.zhao, chao.gao,
	linux-kernel, binbin.wu

From: Isaku Yamahata <isaku.yamahata@intel.com>

Handle EXCEPTION_NMI and EXTERNAL_INTERRUPT exits for TDX.

NMI Handling: Just like the VMX case, NMIs remain blocked after exiting
from the TDX guest for NMI-induced exits [*].  Handle NMI-induced exits
for TDX guests in the same way as they are handled for VMX guests, i.e.,
handle the NMI in tdx_vcpu_enter_exit() by calling the vmx_handle_nmi()
helper.

Interrupt and Exception Handling: Similar to the VMX case, external
interrupts and exceptions (machine check is the only exception type
KVM handles for TDX guests) are handled in the .handle_exit_irqoff()
callback.

For other exceptions, because TDX guest state is protected, exceptions in
TDX guests can't be intercepted.  The TDX VMM isn't supposed to handle
these exceptions.  If an unexpected exception occurs, exit to userspace
with KVM_EXIT_EXCEPTION (see the userspace sketch below).
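
For reference, a minimal sketch of how a VMM's run loop might surface such
an exit; the exit_reason and ex fields are from the KVM uAPI, while the
abort() policy is just an example:

  #include <stdio.h>
  #include <stdlib.h>
  #include <linux/kvm.h>

  /* run points at the mmap'ed vCPU run area. */
  static void handle_exit(struct kvm_run *run)
  {
	switch (run->exit_reason) {
	case KVM_EXIT_EXCEPTION:
		fprintf(stderr, "unexpected exception %u (error code %u)\n",
			run->ex.exception, run->ex.error_code);
		abort();	/* VMM-specific policy */
	default:
		break;
	}
  }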

For external interrupts, increase the statistics, the same as in the VMX
case.

[*]: Some old TDX modules have a bug which leaves NMIs unblocked after
exiting from the TDX guest for NMI-induced exits.  This could potentially
lead to nested NMIs: a new NMI arrives while KVM is manually calling the
host NMI handler.  This is an architectural violation, but it does no real
harm until FRED is enabled together with TDX (without FRED, the host NMI
handler can handle nested NMIs).  Given that this is rare and does no real
harm, ignore it for the initial TDX support.

Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Co-developed-by: Binbin Wu <binbin.wu@linux.intel.com>
Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
---
TDX interrupts v2:
- Drop tdx_handle_exit_irqoff() and make vmx_handle_exit_irqoff() the common
  handle_exit_irqoff() callback for both VMX and TDX.
- Open code tdx_handle_external_interrupt(). (Sean)
- Use helper vmx_handle_nmi() to handle NMI for TDX.
- Update the changelog to reflect the latest TDX NMI arch update.

TDX interrupts v1:
- Renamed from "KVM: TDX: handle EXCEPTION_NMI and EXTERNAL_INTERRUPT"
  to "KVM: TDX: Handle EXCEPTION_NMI and EXTERNAL_INTERRUPT".
- Update changelog.
- Rename tdx_handle_exception() to tdx_handle_exception_nmi() to reflect
  that NMI is also checked. (Binbin)
- Add comments in tdx_handle_exception_nmi() about why NMI and machine
  checks are ignored. (Chao)
- Exit to userspace with KVM_EXIT_EXCEPTION when unexpected exception
  occurs instead of returning -EFAULT. (Chao, Isaku)
- Switch to vp_enter_ret.
- Move the handling of NMI, exception and external interrupt from
  "KVM: TDX: Add a place holder to handle TDX VM exit" to this patch.
- Use helper __vmx_handle_nmi() to handle NMI, which including the
  support for FRED.
---
 arch/x86/kvm/vmx/tdx.c | 29 +++++++++++++++++++++++++++++
 arch/x86/kvm/vmx/vmx.c |  4 +---
 2 files changed, 30 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 4a29b3998cde..2fa7ba465d10 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -911,6 +911,8 @@ static noinstr void tdx_vcpu_enter_exit(struct kvm_vcpu *vcpu)
 	tdx->exit_gpa = tdx->vp_enter_args.r8;
 	vt->exit_intr_info = tdx->vp_enter_args.r9;
 
+	vmx_handle_nmi(vcpu);
+
 	guest_state_exit_irqoff();
 }
 
@@ -1026,6 +1028,28 @@ void tdx_inject_nmi(struct kvm_vcpu *vcpu)
 	vcpu->arch.nmi_pending = 0;
 }
 
+static int tdx_handle_exception_nmi(struct kvm_vcpu *vcpu)
+{
+	u32 intr_info = vmx_get_intr_info(vcpu);
+
+	/*
+	 * Machine checks are handled by handle_exception_irqoff(), or by
+	 * tdx_handle_exit() with TDX_NON_RECOVERABLE set if a #MC occurs on
+	 * VM-Entry.  NMIs are handled by tdx_vcpu_enter_exit().
+	 */
+	if (is_nmi(intr_info) || is_machine_check(intr_info))
+		return 1;
+
+	kvm_pr_unimpl("unexpected exception 0x%x(exit_reason 0x%x qual 0x%lx)\n",
+		intr_info, vmx_get_exit_reason(vcpu).full, vmx_get_exit_qual(vcpu));
+
+	vcpu->run->exit_reason = KVM_EXIT_EXCEPTION;
+	vcpu->run->ex.exception = intr_info & INTR_INFO_VECTOR_MASK;
+	vcpu->run->ex.error_code = 0;
+
+	return 0;
+}
+
 static int complete_hypercall_exit(struct kvm_vcpu *vcpu)
 {
 	tdvmcall_set_return_code(vcpu, vcpu->run->hypercall.ret);
@@ -1713,6 +1737,11 @@ int tdx_handle_exit(struct kvm_vcpu *vcpu, fastpath_t fastpath)
 		vcpu->run->exit_reason = KVM_EXIT_SHUTDOWN;
 		vcpu->mmio_needed = 0;
 		return 0;
+	case EXIT_REASON_EXCEPTION_NMI:
+		return tdx_handle_exception_nmi(vcpu);
+	case EXIT_REASON_EXTERNAL_INTERRUPT:
+		++vcpu->stat.irq_exits;
+		return 1;
 	case EXIT_REASON_TDCALL:
 		return handle_tdvmcall(vcpu);
 	case EXIT_REASON_VMCALL:
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 228a7e51b6a5..caf4b2da8b67 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -6959,9 +6959,7 @@ static void handle_external_interrupt_irqoff(struct kvm_vcpu *vcpu,
 
 void vmx_handle_exit_irqoff(struct kvm_vcpu *vcpu)
 {
-	struct vcpu_vmx *vmx = to_vmx(vcpu);
-
-	if (vmx->vt.emulation_required)
+	if (to_vt(vcpu)->emulation_required)
 		return;
 
 	if (vmx_get_exit_reason(vcpu).basic == EXIT_REASON_EXTERNAL_INTERRUPT)
-- 
2.46.0


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v2 17/17] KVM: TDX: Handle EXIT_REASON_OTHER_SMI
  2025-02-11  2:58 [PATCH v2 00/17] KVM: TDX: TDX interrupts Binbin Wu
                   ` (15 preceding siblings ...)
  2025-02-11  2:58 ` [PATCH v2 16/17] KVM: TDX: Handle EXCEPTION_NMI and EXTERNAL_INTERRUPT Binbin Wu
@ 2025-02-11  2:58 ` Binbin Wu
  16 siblings, 0 replies; 33+ messages in thread
From: Binbin Wu @ 2025-02-11  2:58 UTC (permalink / raw)
  To: pbonzini, seanjc, kvm
  Cc: rick.p.edgecombe, kai.huang, adrian.hunter, reinette.chatre,
	xiaoyao.li, tony.lindgren, isaku.yamahata, yan.y.zhao, chao.gao,
	linux-kernel, binbin.wu

From: Isaku Yamahata <isaku.yamahata@intel.com>

Handle the VM exit caused by "other SMI" for TDX by returning to userspace
for the Machine Check System Management Interrupt (MSMI) case, or by
ignoring it and resuming the vCPU for the non-MSMI case.

For VMX, SMM transitions can happen in both VMX non-root mode and VMX
root mode.  Unlike VMX, in SEAM root mode (TDX module), all interrupts
are blocked.  If an SMI occurs in SEAM non-root mode (TD guest), the SMI
causes a VM exit to the TDX module, then a SEAMRET to KVM.  Once it exits
to KVM, the SMI is delivered and handled by the kernel handler right away.

An SMI can be an "I/O SMI" or an "other SMI".  For TDX, there will be no
I/O SMI because I/O instructions inside a TDX guest trigger #VE and the
TDX guest needs to use TDVMCALL to request the VMM to do I/O emulation.

For "other SMI", there are two cases:
- MSMI case.  When BIOS eMCA MCE-SMI morphing is enabled, the #MC occurs in
  TDX guest will be delivered as an MSMI.  It causes an
  EXIT_REASON_OTHER_SMI VM exit with MSMI (bit 0) set in the exit
  qualification.  On VM exit, TDX module checks whether the "other SMI" is
  caused by an MSMI or not.  If so, TDX module marks TD as fatal,
  preventing further TD entries, and then completes the TD exit flow to KVM
  with the TDH.VP.ENTER outputs indicating TDX_NON_RECOVERABLE_TD.  After
  TD exit, the MSMI is delivered and eventually handled by the kernel
  machine check handler (7911f145de5f x86/mce: Implement recovery for
  errors in TDX/SEAM non-root mode), i.e., the memory page is marked as
  poisoned and it won't be freed to the free list when the TDX guest is
  terminated.  Since the TDX guest is dead, follow other non-recoverable
  cases, exit to userspace.
- For non-MSMI case, KVM doesn't need to do anything, just continue TDX
  vCPU execution.

Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Co-developed-by: Binbin Wu <binbin.wu@linux.intel.com>
Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
---
TDX interrupts v2:
 - No change.

TDX interrupts v1:
 - Squashed "KVM: TDX: Handle EXIT_REASON_OTHER_SMI" and
   "KVM: TDX: Handle EXIT_REASON_OTHER_SMI with MSMI". (Chao)
 - Rewrite the changelog.
 - Remove the explicit call of kvm_machine_check() because the MSMI can
   be handled by host #MC handler.
 - Update comments according to the code change.
---
 arch/x86/include/uapi/asm/vmx.h |  1 +
 arch/x86/kvm/vmx/tdx.c          | 21 +++++++++++++++++++++
 2 files changed, 22 insertions(+)

diff --git a/arch/x86/include/uapi/asm/vmx.h b/arch/x86/include/uapi/asm/vmx.h
index 6a9f268a2d2c..f0f4a4cf84a7 100644
--- a/arch/x86/include/uapi/asm/vmx.h
+++ b/arch/x86/include/uapi/asm/vmx.h
@@ -34,6 +34,7 @@
 #define EXIT_REASON_TRIPLE_FAULT        2
 #define EXIT_REASON_INIT_SIGNAL			3
 #define EXIT_REASON_SIPI_SIGNAL         4
+#define EXIT_REASON_OTHER_SMI           6
 
 #define EXIT_REASON_INTERRUPT_WINDOW    7
 #define EXIT_REASON_NMI_WINDOW          8
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 2fa7ba465d10..f7b8b52c5a76 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -1750,6 +1750,27 @@ int tdx_handle_exit(struct kvm_vcpu *vcpu, fastpath_t fastpath)
 		return tdx_emulate_io(vcpu);
 	case EXIT_REASON_EPT_MISCONFIG:
 		return tdx_emulate_mmio(vcpu);
+	case EXIT_REASON_OTHER_SMI:
+		/*
+		 * Unlike VMX, an SMI in SEAM non-root mode (i.e. when the
+		 * TD guest vCPU is running) causes a VM exit to the TDX
+		 * module, then a SEAMRET to KVM.  Once it exits to KVM, the
+		 * SMI is delivered and handled by the kernel handler right
+		 * away.
+		 *
+		 * The Other SMI exit can also be caused by the SEAM non-root
+		 * machine check delivered via Machine Check System Management
+		 * Interrupt (MSMI), but it has already been handled by the
+		 * kernel machine check handler, i.e., the memory page has been
+		 * marked as poisoned and it won't be freed to the free list
+		 * when the TDX guest is terminated (the TDX module marks the
+		 * guest as dead and prevents it from further running when a
+		 * machine check happens in SEAM non-root mode).
+		 *
+		 * - An MSMI will not reach here; it's handled as the
+		 *   non_recoverable case above.
+		 * - If it's not an MSMI, no need to do anything here.
+		 */
+		return 1;
 	default:
 		break;
 	}
-- 
2.46.0


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 01/17] KVM: TDX: Add support for find pending IRQ in a protected local APIC
  2025-02-11  2:58 ` [PATCH v2 01/17] KVM: TDX: Add support for find pending IRQ in a protected local APIC Binbin Wu
@ 2025-02-11  7:23   ` Binbin Wu
  2025-02-12  8:12   ` Chao Gao
  1 sibling, 0 replies; 33+ messages in thread
From: Binbin Wu @ 2025-02-11  7:23 UTC (permalink / raw)
  To: pbonzini, seanjc, kvm
  Cc: rick.p.edgecombe, kai.huang, adrian.hunter, reinette.chatre,
	xiaoyao.li, tony.lindgren, isaku.yamahata, yan.y.zhao, chao.gao,
	linux-kernel



On 2/11/2025 10:58 AM, Binbin Wu wrote:
[...]
> diff --git a/arch/x86/kvm/irq.c b/arch/x86/kvm/irq.c
> index 63f66c51975a..f0644d0bbe11 100644
> --- a/arch/x86/kvm/irq.c
> +++ b/arch/x86/kvm/irq.c
> @@ -100,6 +100,9 @@ int kvm_cpu_has_interrupt(struct kvm_vcpu *v)
>   	if (kvm_cpu_has_extint(v))
>   		return 1;
>   
> +	if (lapic_in_kernel(v) && v->arch.apic->guest_apic_protected)
> +		return static_call(kvm_x86_protected_apic_has_interrupt)(v);
No functional impact.
But I forgot to replace "static_call(kvm_x86_protected_apic_has_interrupt)(v)"
with "kvm_x86_call(protected_apic_has_interrupt)(v)".

> +
>   	return kvm_apic_has_interrupt(v) != -1;	/* LAPIC */
>   }
>   EXPORT_SYMBOL_GPL(kvm_cpu_has_interrupt);
[...]

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 16/17] KVM: TDX: Handle EXCEPTION_NMI and EXTERNAL_INTERRUPT
  2025-02-11  2:58 ` [PATCH v2 16/17] KVM: TDX: Handle EXCEPTION_NMI and EXTERNAL_INTERRUPT Binbin Wu
@ 2025-02-12  0:50   ` Sean Christopherson
  0 siblings, 0 replies; 33+ messages in thread
From: Sean Christopherson @ 2025-02-12  0:50 UTC (permalink / raw)
  To: Binbin Wu
  Cc: pbonzini, kvm, rick.p.edgecombe, kai.huang, adrian.hunter,
	reinette.chatre, xiaoyao.li, tony.lindgren, isaku.yamahata,
	yan.y.zhao, chao.gao, linux-kernel

On Tue, Feb 11, 2025, Binbin Wu wrote:
> +	kvm_pr_unimpl("unexpected exception 0x%x(exit_reason 0x%x qual 0x%lx)\n",
> +		intr_info, vmx_get_exit_reason(vcpu).full, vmx_get_exit_qual(vcpu));

This should be vcpu_unimpl().  But I vote to omit it entirely.  Ratelimited
printks are notoriously unhelpful, and KVM is already providing a useful exit
to userspace.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 15/17] KVM: VMX: Add a helper for NMI handling
  2025-02-11  2:58 ` [PATCH v2 15/17] KVM: VMX: Add a helper for NMI handling Binbin Wu
@ 2025-02-12  1:10   ` Sean Christopherson
  0 siblings, 0 replies; 33+ messages in thread
From: Sean Christopherson @ 2025-02-12  1:10 UTC (permalink / raw)
  To: Binbin Wu
  Cc: pbonzini, kvm, rick.p.edgecombe, kai.huang, adrian.hunter,
	reinette.chatre, xiaoyao.li, tony.lindgren, isaku.yamahata,
	yan.y.zhao, chao.gao, linux-kernel

On Tue, Feb 11, 2025, Binbin Wu wrote:
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 012649688e46..228a7e51b6a5 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -7212,6 +7212,20 @@ static fastpath_t vmx_exit_handlers_fastpath(struct kvm_vcpu *vcpu,
>  	}
>  }
>  
> +noinstr void vmx_handle_nmi(struct kvm_vcpu *vcpu)
> +{
> +	if ((u16)vmx_get_exit_reason(vcpu).basic != EXIT_REASON_EXCEPTION_NMI ||
> +		!is_nmi(vmx_get_intr_info(vcpu)))

Align indentation.

> +		return;
> +
> +	kvm_before_interrupt(vcpu, KVM_HANDLING_NMI);
> +	if (cpu_feature_enabled(X86_FEATURE_FRED))
> +		fred_entry_from_kvm(EVENT_TYPE_NMI, NMI_VECTOR);
> +	else
> +		vmx_do_nmi_irqoff();
> +	kvm_after_interrupt(vcpu);
> +}

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 09/17] KVM: TDX: Handle SMI request as !CONFIG_KVM_SMM
  2025-02-11  2:58 ` [PATCH v2 09/17] KVM: TDX: Handle SMI request as !CONFIG_KVM_SMM Binbin Wu
@ 2025-02-12  1:47   ` Sean Christopherson
  2025-02-12  5:51     ` Binbin Wu
  2025-02-12 10:19     ` Huang, Kai
  0 siblings, 2 replies; 33+ messages in thread
From: Sean Christopherson @ 2025-02-12  1:47 UTC (permalink / raw)
  To: Binbin Wu
  Cc: pbonzini, kvm, rick.p.edgecombe, kai.huang, adrian.hunter,
	reinette.chatre, xiaoyao.li, tony.lindgren, isaku.yamahata,
	yan.y.zhao, chao.gao, linux-kernel

On Tue, Feb 11, 2025, Binbin Wu wrote:
> +#ifdef CONFIG_KVM_SMM
> +static int vt_smi_allowed(struct kvm_vcpu *vcpu, bool for_injection)
> +{
> +	if (KVM_BUG_ON(is_td_vcpu(vcpu), vcpu->kvm))
> +		return false;

Nit, while the name suggests a boolean return, the actual return is -errno/0/1,
i.e. this should be '0', not "false".

A bit late to be asking this, but has anyone verified all the KVM_BUG_ON() calls
are fully optimized out when CONFIG_KVM_INTEL_TDX=n?

/me rummages around

Sort of.  The KVM_BUG_ON()s are all gone, but sadly a stub gets left behind.  Not
the end of the world since they're all tail calls, but it's still quite useless,
especially when using frame pointers.

Aha!  Finally!  An excuse to macrofy some of this!

Rather than have a metric ton of stubs for all of the TDX variants, simply omit
the wrappers when CONFIG_KVM_INTEL_TDX=n.  Quite nearly all of vmx/main.c can go
under a single #ifdef.  That eliminates all the silly trampolines in the generated
code, and almost all of the stubs.

Compile tested only, and needs to be chunked up. E.g. switching to the
right CONFIG_xxx needs to be done elsewhere, ditto for moving the "pre restore"
function to posted_intr.c.

---
 arch/x86/kvm/vmx/main.c        | 212 +++++++++++++++++----------------
 arch/x86/kvm/vmx/posted_intr.c |   8 ++
 arch/x86/kvm/vmx/posted_intr.h |   1 +
 arch/x86/kvm/vmx/tdx.h         |   2 +-
 arch/x86/kvm/vmx/x86_ops.h     |  69 +----------
 5 files changed, 121 insertions(+), 171 deletions(-)

diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
index cfffa529c831..fc087fcabd7d 100644
--- a/arch/x86/kvm/vmx/main.c
+++ b/arch/x86/kvm/vmx/main.c
@@ -10,9 +10,8 @@
 #include "tdx.h"
 #include "tdx_arch.h"
 
-#ifdef CONFIG_INTEL_TDX_HOST
+#ifdef CONFIG_INTEL_KVM_TDX
 static_assert(offsetof(struct vcpu_vmx, vt) == offsetof(struct vcpu_tdx, vt));
-#endif
 
 static void vt_disable_virtualization_cpu(void)
 {
@@ -241,7 +240,7 @@ static int vt_complete_emulated_msr(struct kvm_vcpu *vcpu, int err)
 	if (is_td_vcpu(vcpu))
 		return tdx_complete_emulated_msr(vcpu, err);
 
-	return kvm_complete_insn_gp(vcpu, err);
+	return vmx_complete_emulated_msr(vcpu, err);
 }
 
 #ifdef CONFIG_KVM_SMM
@@ -316,14 +315,6 @@ static void vt_set_virtual_apic_mode(struct kvm_vcpu *vcpu)
 	return vmx_set_virtual_apic_mode(vcpu);
 }
 
-static void vt_apicv_pre_state_restore(struct kvm_vcpu *vcpu)
-{
-	struct pi_desc *pi = vcpu_to_pi_desc(vcpu);
-
-	pi_clear_on(pi);
-	memset(pi->pir, 0, sizeof(pi->pir));
-}
-
 static void vt_hwapic_isr_update(struct kvm_vcpu *vcpu, int max_isr)
 {
 	if (is_td_vcpu(vcpu))
@@ -352,6 +343,15 @@ static void vt_deliver_interrupt(struct kvm_lapic *apic, int delivery_mode,
 	vmx_deliver_interrupt(apic, delivery_mode, trig_mode, vector);
 }
 
+static bool vt_protected_apic_has_interrupt(struct kvm_vcpu *vcpu)
+{
+	if (WARN_ON_ONCE(!is_td_vcpu(vcpu)))
+		return false;
+
+	return tdx_protected_apic_has_interrupt(vcpu);
+}
+
 static void vt_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 {
 	if (is_td_vcpu(vcpu))
@@ -880,6 +880,12 @@ static int vt_gmem_private_max_mapping_level(struct kvm *kvm, kvm_pfn_t pfn)
 
 	return 0;
 }
+#define vt_op(name) vt_##name
+#define vt_op_tdx_only(name) vt_##name
+#else /* CONFIG_INTEL_KVM_TDX */
+#define vt_op(name) vmx_##name
+#define vt_op_tdx_only(name) NULL
+#endif
 
 #define VMX_REQUIRED_APICV_INHIBITS				\
 	(BIT(APICV_INHIBIT_REASON_DISABLED) |			\
@@ -898,113 +904,113 @@ struct kvm_x86_ops vt_x86_ops __initdata = {
 	.hardware_unsetup = vmx_hardware_unsetup,
 
 	.enable_virtualization_cpu = vmx_enable_virtualization_cpu,
-	.disable_virtualization_cpu = vt_disable_virtualization_cpu,
+	.disable_virtualization_cpu = vt_op(disable_virtualization_cpu),
 	.emergency_disable_virtualization_cpu = vmx_emergency_disable_virtualization_cpu,
 
-	.has_emulated_msr = vt_has_emulated_msr,
+	.has_emulated_msr = vt_op(has_emulated_msr),
 
 	.vm_size = sizeof(struct kvm_vmx),
 
-	.vm_init = vt_vm_init,
-	.vm_destroy = vt_vm_destroy,
-	.vm_free = vt_vm_free,
+	.vm_init = vt_op(vm_init),
+	.vm_destroy = vt_op(vm_destroy),
+	.vm_free = vt_op_tdx_only(vm_free),
 
-	.vcpu_precreate = vt_vcpu_precreate,
-	.vcpu_create = vt_vcpu_create,
-	.vcpu_free = vt_vcpu_free,
-	.vcpu_reset = vt_vcpu_reset,
+	.vcpu_precreate = vt_op(vcpu_precreate),
+	.vcpu_create = vt_op(vcpu_create),
+	.vcpu_free = vt_op(vcpu_free),
+	.vcpu_reset = vt_op(vcpu_reset),
 
-	.prepare_switch_to_guest = vt_prepare_switch_to_guest,
-	.vcpu_load = vt_vcpu_load,
-	.vcpu_put = vt_vcpu_put,
+	.prepare_switch_to_guest = vt_op(prepare_switch_to_guest),
+	.vcpu_load = vt_op(vcpu_load),
+	.vcpu_put = vt_op(vcpu_put),
 
-	.update_exception_bitmap = vt_update_exception_bitmap,
+	.update_exception_bitmap = vt_op(update_exception_bitmap),
 	.get_feature_msr = vmx_get_feature_msr,
-	.get_msr = vt_get_msr,
-	.set_msr = vt_set_msr,
+	.get_msr = vt_op(get_msr),
+	.set_msr = vt_op(set_msr),
 
-	.get_segment_base = vt_get_segment_base,
-	.get_segment = vt_get_segment,
-	.set_segment = vt_set_segment,
-	.get_cpl = vt_get_cpl,
-	.get_cpl_no_cache = vt_get_cpl_no_cache,
-	.get_cs_db_l_bits = vt_get_cs_db_l_bits,
-	.is_valid_cr0 = vt_is_valid_cr0,
-	.set_cr0 = vt_set_cr0,
-	.is_valid_cr4 = vt_is_valid_cr4,
-	.set_cr4 = vt_set_cr4,
-	.set_efer = vt_set_efer,
-	.get_idt = vt_get_idt,
-	.set_idt = vt_set_idt,
-	.get_gdt = vt_get_gdt,
-	.set_gdt = vt_set_gdt,
-	.set_dr7 = vt_set_dr7,
-	.sync_dirty_debug_regs = vt_sync_dirty_debug_regs,
-	.cache_reg = vt_cache_reg,
-	.get_rflags = vt_get_rflags,
-	.set_rflags = vt_set_rflags,
-	.get_if_flag = vt_get_if_flag,
+	.get_segment_base = vt_op(get_segment_base),
+	.get_segment = vt_op(get_segment),
+	.set_segment = vt_op(set_segment),
+	.get_cpl = vt_op(get_cpl),
+	.get_cpl_no_cache = vt_op(get_cpl_no_cache),
+	.get_cs_db_l_bits = vt_op(get_cs_db_l_bits),
+	.is_valid_cr0 = vt_op(is_valid_cr0),
+	.set_cr0 = vt_op(set_cr0),
+	.is_valid_cr4 = vt_op(is_valid_cr4),
+	.set_cr4 = vt_op(set_cr4),
+	.set_efer = vt_op(set_efer),
+	.get_idt = vt_op(get_idt),
+	.set_idt = vt_op(set_idt),
+	.get_gdt = vt_op(get_gdt),
+	.set_gdt = vt_op(set_gdt),
+	.set_dr7 = vt_op(set_dr7),
+	.sync_dirty_debug_regs = vt_op(sync_dirty_debug_regs),
+	.cache_reg = vt_op(cache_reg),
+	.get_rflags = vt_op(get_rflags),
+	.set_rflags = vt_op(set_rflags),
+	.get_if_flag = vt_op(get_if_flag),
 
-	.flush_tlb_all = vt_flush_tlb_all,
-	.flush_tlb_current = vt_flush_tlb_current,
-	.flush_tlb_gva = vt_flush_tlb_gva,
-	.flush_tlb_guest = vt_flush_tlb_guest,
+	.flush_tlb_all = vt_op(flush_tlb_all),
+	.flush_tlb_current = vt_op(flush_tlb_current),
+	.flush_tlb_gva = vt_op(flush_tlb_gva),
+	.flush_tlb_guest = vt_op(flush_tlb_guest),
 
-	.vcpu_pre_run = vt_vcpu_pre_run,
-	.vcpu_run = vt_vcpu_run,
-	.handle_exit = vt_handle_exit,
+	.vcpu_pre_run = vt_op(vcpu_pre_run),
+	.vcpu_run = vt_op(vcpu_run),
+	.handle_exit = vt_op(handle_exit),
 	.skip_emulated_instruction = vmx_skip_emulated_instruction,
 	.update_emulated_instruction = vmx_update_emulated_instruction,
-	.set_interrupt_shadow = vt_set_interrupt_shadow,
-	.get_interrupt_shadow = vt_get_interrupt_shadow,
-	.patch_hypercall = vt_patch_hypercall,
-	.inject_irq = vt_inject_irq,
-	.inject_nmi = vt_inject_nmi,
-	.inject_exception = vt_inject_exception,
-	.cancel_injection = vt_cancel_injection,
-	.interrupt_allowed = vt_interrupt_allowed,
-	.nmi_allowed = vt_nmi_allowed,
-	.get_nmi_mask = vt_get_nmi_mask,
-	.set_nmi_mask = vt_set_nmi_mask,
-	.enable_nmi_window = vt_enable_nmi_window,
-	.enable_irq_window = vt_enable_irq_window,
-	.update_cr8_intercept = vt_update_cr8_intercept,
+	.set_interrupt_shadow = vt_op(set_interrupt_shadow),
+	.get_interrupt_shadow = vt_op(get_interrupt_shadow),
+	.patch_hypercall = vt_op(patch_hypercall),
+	.inject_irq = vt_op(inject_irq),
+	.inject_nmi = vt_op(inject_nmi),
+	.inject_exception = vt_op(inject_exception),
+	.cancel_injection = vt_op(cancel_injection),
+	.interrupt_allowed = vt_op(interrupt_allowed),
+	.nmi_allowed = vt_op(nmi_allowed),
+	.get_nmi_mask = vt_op(get_nmi_mask),
+	.set_nmi_mask = vt_op(set_nmi_mask),
+	.enable_nmi_window = vt_op(enable_nmi_window),
+	.enable_irq_window = vt_op(enable_irq_window),
+	.update_cr8_intercept = vt_op(update_cr8_intercept),
 
 	.x2apic_icr_is_split = false,
-	.set_virtual_apic_mode = vt_set_virtual_apic_mode,
-	.set_apic_access_page_addr = vt_set_apic_access_page_addr,
-	.refresh_apicv_exec_ctrl = vt_refresh_apicv_exec_ctrl,
-	.load_eoi_exitmap = vt_load_eoi_exitmap,
-	.apicv_pre_state_restore = vt_apicv_pre_state_restore,
+	.set_virtual_apic_mode = vt_op(set_virtual_apic_mode),
+	.set_apic_access_page_addr = vt_op(set_apic_access_page_addr),
+	.refresh_apicv_exec_ctrl = vt_op(refresh_apicv_exec_ctrl),
+	.load_eoi_exitmap = vt_op(load_eoi_exitmap),
+	.apicv_pre_state_restore = pi_apicv_pre_state_restore,
 	.required_apicv_inhibits = VMX_REQUIRED_APICV_INHIBITS,
-	.hwapic_isr_update = vt_hwapic_isr_update,
-	.sync_pir_to_irr = vt_sync_pir_to_irr,
-	.deliver_interrupt = vt_deliver_interrupt,
+	.hwapic_isr_update = vt_op(hwapic_isr_update),
+	.sync_pir_to_irr = vt_op(sync_pir_to_irr),
+	.deliver_interrupt = vt_op(deliver_interrupt),
 	.dy_apicv_has_pending_interrupt = pi_has_pending_interrupt,
-	.protected_apic_has_interrupt = tdx_protected_apic_has_interrupt,
+	.protected_apic_has_interrupt = vt_op_tdx_only(protected_apic_has_interrupt),
 
-	.set_tss_addr = vt_set_tss_addr,
-	.set_identity_map_addr = vt_set_identity_map_addr,
+	.set_tss_addr = vt_op(set_tss_addr),
+	.set_identity_map_addr = vt_op(set_identity_map_addr),
 	.get_mt_mask = vmx_get_mt_mask,
 
-	.get_exit_info = vt_get_exit_info,
-	.get_entry_info = vt_get_entry_info,
+	.get_exit_info = vt_op(get_exit_info),
+	.get_entry_info = vt_op(get_entry_info),
 
-	.vcpu_after_set_cpuid = vt_vcpu_after_set_cpuid,
+	.vcpu_after_set_cpuid = vt_op(vcpu_after_set_cpuid),
 
 	.has_wbinvd_exit = cpu_has_vmx_wbinvd_exit,
 
-	.get_l2_tsc_offset = vt_get_l2_tsc_offset,
-	.get_l2_tsc_multiplier = vt_get_l2_tsc_multiplier,
-	.write_tsc_offset = vt_write_tsc_offset,
-	.write_tsc_multiplier = vt_write_tsc_multiplier,
+	.get_l2_tsc_offset = vt_op(get_l2_tsc_offset),
+	.get_l2_tsc_multiplier = vt_op(get_l2_tsc_multiplier),
+	.write_tsc_offset = vt_op(write_tsc_offset),
+	.write_tsc_multiplier = vt_op(write_tsc_multiplier),
 
-	.load_mmu_pgd = vt_load_mmu_pgd,
+	.load_mmu_pgd = vt_op(load_mmu_pgd),
 
 	.check_intercept = vmx_check_intercept,
 	.handle_exit_irqoff = vmx_handle_exit_irqoff,
 
-	.update_cpu_dirty_logging = vt_update_cpu_dirty_logging,
+	.update_cpu_dirty_logging = vt_op(update_cpu_dirty_logging),
 
 	.nested_ops = &vmx_nested_ops,
 
@@ -1012,38 +1018,38 @@ struct kvm_x86_ops vt_x86_ops __initdata = {
 	.pi_start_assignment = vmx_pi_start_assignment,
 
 #ifdef CONFIG_X86_64
-	.set_hv_timer = vt_set_hv_timer,
-	.cancel_hv_timer = vt_cancel_hv_timer,
+	.set_hv_timer = vt_op(set_hv_timer),
+	.cancel_hv_timer = vt_op(cancel_hv_timer),
 #endif
 
-	.setup_mce = vt_setup_mce,
+	.setup_mce = vt_op(setup_mce),
 
 #ifdef CONFIG_KVM_SMM
-	.smi_allowed = vt_smi_allowed,
-	.enter_smm = vt_enter_smm,
-	.leave_smm = vt_leave_smm,
-	.enable_smi_window = vt_enable_smi_window,
+	.smi_allowed = vt_op(smi_allowed),
+	.enter_smm = vt_op(enter_smm),
+	.leave_smm = vt_op(leave_smm),
+	.enable_smi_window = vt_op(enable_smi_window),
 #endif
 
-	.check_emulate_instruction = vt_check_emulate_instruction,
-	.apic_init_signal_blocked = vt_apic_init_signal_blocked,
+	.check_emulate_instruction = vt_op(check_emulate_instruction),
+	.apic_init_signal_blocked = vt_op(apic_init_signal_blocked),
 	.migrate_timers = vmx_migrate_timers,
 
-	.msr_filter_changed = vt_msr_filter_changed,
-	.complete_emulated_msr = vt_complete_emulated_msr,
+	.msr_filter_changed = vt_op(msr_filter_changed),
+	.complete_emulated_msr = vt_op(complete_emulated_msr),
 
 	.vcpu_deliver_sipi_vector = kvm_vcpu_deliver_sipi_vector,
 
 	.get_untagged_addr = vmx_get_untagged_addr,
 
-	.mem_enc_ioctl = vt_mem_enc_ioctl,
-	.vcpu_mem_enc_ioctl = vt_vcpu_mem_enc_ioctl,
+	.mem_enc_ioctl = vt_op_tdx_only(mem_enc_ioctl),
+	.vcpu_mem_enc_ioctl = vt_op_tdx_only(vcpu_mem_enc_ioctl),
 
-	.private_max_mapping_level = vt_gmem_private_max_mapping_level
+	.private_max_mapping_level = vt_op_tdx_only(gmem_private_max_mapping_level)
 };
 
 struct kvm_x86_init_ops vt_init_ops __initdata = {
-	.hardware_setup = vt_hardware_setup,
+	.hardware_setup = vt_op(hardware_setup),
 	.handle_intel_pt_intr = NULL,
 
 	.runtime_ops = &vt_x86_ops,
diff --git a/arch/x86/kvm/vmx/posted_intr.c b/arch/x86/kvm/vmx/posted_intr.c
index f2ca37b3f606..a140af060bb8 100644
--- a/arch/x86/kvm/vmx/posted_intr.c
+++ b/arch/x86/kvm/vmx/posted_intr.c
@@ -241,6 +241,14 @@ void __init pi_init_cpu(int cpu)
 	raw_spin_lock_init(&per_cpu(wakeup_vcpus_on_cpu_lock, cpu));
 }
 
+void pi_apicv_pre_state_restore(struct kvm_vcpu *vcpu)
+{
+	struct pi_desc *pi = vcpu_to_pi_desc(vcpu);
+
+	pi_clear_on(pi);
+	memset(pi->pir, 0, sizeof(pi->pir));
+}
+
 bool pi_has_pending_interrupt(struct kvm_vcpu *vcpu)
 {
 	struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
diff --git a/arch/x86/kvm/vmx/posted_intr.h b/arch/x86/kvm/vmx/posted_intr.h
index 68605ca7ef68..9d0677a2ba0e 100644
--- a/arch/x86/kvm/vmx/posted_intr.h
+++ b/arch/x86/kvm/vmx/posted_intr.h
@@ -11,6 +11,7 @@ void vmx_vcpu_pi_load(struct kvm_vcpu *vcpu, int cpu);
 void vmx_vcpu_pi_put(struct kvm_vcpu *vcpu);
 void pi_wakeup_handler(void);
 void __init pi_init_cpu(int cpu);
+void pi_apicv_pre_state_restore(struct kvm_vcpu *vcpu);
 bool pi_has_pending_interrupt(struct kvm_vcpu *vcpu);
 int vmx_pi_update_irte(struct kvm *kvm, unsigned int host_irq,
 		       uint32_t guest_irq, bool set);
diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h
index 196bf360a368..4e7336925059 100644
--- a/arch/x86/kvm/vmx/tdx.h
+++ b/arch/x86/kvm/vmx/tdx.h
@@ -5,7 +5,7 @@
 #include "tdx_arch.h"
 #include "tdx_errno.h"
 
-#ifdef CONFIG_INTEL_TDX_HOST
+#ifdef CONFIG_KVM_INTEL_TDX
 #include "common.h"
 
 int tdx_bringup(void);
diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h
index 9f286602b205..95f97f9c1b60 100644
--- a/arch/x86/kvm/vmx/x86_ops.h
+++ b/arch/x86/kvm/vmx/x86_ops.h
@@ -58,6 +58,7 @@ void vmx_prepare_switch_to_guest(struct kvm_vcpu *vcpu);
 void vmx_update_exception_bitmap(struct kvm_vcpu *vcpu);
 int vmx_get_feature_msr(u32 msr, u64 *data);
 int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info);
+#define vmx_complete_emulated_msr kvm_complete_insn_gp
 u64 vmx_get_segment_base(struct kvm_vcpu *vcpu, int seg);
 void vmx_get_segment(struct kvm_vcpu *vcpu, struct kvm_segment *var, int seg);
 void vmx_set_segment(struct kvm_vcpu *vcpu, struct kvm_segment *var, int seg);
@@ -120,7 +121,7 @@ void vmx_cancel_hv_timer(struct kvm_vcpu *vcpu);
 #endif
 void vmx_setup_mce(struct kvm_vcpu *vcpu);
 
-#ifdef CONFIG_INTEL_TDX_HOST
+#ifdef CONFIG_KVM_INTEL_TDX
 void tdx_disable_virtualization_cpu(void);
 int tdx_vm_init(struct kvm *kvm);
 void tdx_mmu_release_hkid(struct kvm *kvm);
@@ -164,72 +165,6 @@ void tdx_flush_tlb_current(struct kvm_vcpu *vcpu);
 void tdx_flush_tlb_all(struct kvm_vcpu *vcpu);
 void tdx_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa, int root_level);
 int tdx_gmem_private_max_mapping_level(struct kvm *kvm, kvm_pfn_t pfn);
-#else
-static inline void tdx_disable_virtualization_cpu(void) {}
-static inline int tdx_vm_init(struct kvm *kvm) { return -EOPNOTSUPP; }
-static inline void tdx_mmu_release_hkid(struct kvm *kvm) {}
-static inline void tdx_vm_free(struct kvm *kvm) {}
-
-static inline int tdx_vm_ioctl(struct kvm *kvm, void __user *argp) { return -EOPNOTSUPP; }
-
-static inline int tdx_vcpu_create(struct kvm_vcpu *vcpu) { return -EOPNOTSUPP; }
-static inline void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event) {}
-static inline void tdx_vcpu_free(struct kvm_vcpu *vcpu) {}
-static inline void tdx_vcpu_load(struct kvm_vcpu *vcpu, int cpu) {}
-static inline int tdx_vcpu_pre_run(struct kvm_vcpu *vcpu) { return -EOPNOTSUPP; }
-static inline fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu, bool force_immediate_exit)
-{
-	return EXIT_FASTPATH_NONE;
-}
-static inline void tdx_prepare_switch_to_guest(struct kvm_vcpu *vcpu) {}
-static inline void tdx_vcpu_put(struct kvm_vcpu *vcpu) {}
-static inline bool tdx_protected_apic_has_interrupt(struct kvm_vcpu *vcpu) { return false; }
-static inline int tdx_handle_exit(struct kvm_vcpu *vcpu,
-		enum exit_fastpath_completion fastpath) { return 0; }
-
-static inline void tdx_deliver_interrupt(struct kvm_lapic *apic, int delivery_mode,
-					 int trig_mode, int vector) {}
-static inline void tdx_inject_nmi(struct kvm_vcpu *vcpu) {}
-static inline void tdx_get_exit_info(struct kvm_vcpu *vcpu, u32 *reason, u64 *info1,
-				     u64 *info2, u32 *intr_info, u32 *error_code) {}
-static inline bool tdx_has_emulated_msr(u32 index) { return false; }
-static inline int tdx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr) { return 1; }
-static inline int tdx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr) { return 1; }
-
-static inline int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp) { return -EOPNOTSUPP; }
-
-static inline int tdx_sept_link_private_spt(struct kvm *kvm, gfn_t gfn,
-					    enum pg_level level,
-					    void *private_spt)
-{
-	return -EOPNOTSUPP;
-}
-
-static inline int tdx_sept_free_private_spt(struct kvm *kvm, gfn_t gfn,
-					    enum pg_level level,
-					    void *private_spt)
-{
-	return -EOPNOTSUPP;
-}
-
-static inline int tdx_sept_set_private_spte(struct kvm *kvm, gfn_t gfn,
-					    enum pg_level level,
-					    kvm_pfn_t pfn)
-{
-	return -EOPNOTSUPP;
-}
-
-static inline int tdx_sept_remove_private_spte(struct kvm *kvm, gfn_t gfn,
-					       enum pg_level level,
-					       kvm_pfn_t pfn)
-{
-	return -EOPNOTSUPP;
-}
-
-static inline void tdx_flush_tlb_current(struct kvm_vcpu *vcpu) {}
-static inline void tdx_flush_tlb_all(struct kvm_vcpu *vcpu) {}
-static inline void tdx_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa, int root_level) {}
-static inline int tdx_gmem_private_max_mapping_level(struct kvm *kvm, kvm_pfn_t pfn) { return 0; }
 #endif
 
 #endif /* __KVM_X86_VMX_X86_OPS_H */

base-commit: 50b7294b916de2d855549c179498ba4b7c3ecf37
-- 


* Re: [PATCH v2 09/17] KVM: TDX: Handle SMI request as !CONFIG_KVM_SMM
  2025-02-12  1:47   ` Sean Christopherson
@ 2025-02-12  5:51     ` Binbin Wu
  2025-02-14 17:15       ` Edgecombe, Rick P
  2025-02-12 10:19     ` Huang, Kai
  1 sibling, 1 reply; 33+ messages in thread
From: Binbin Wu @ 2025-02-12  5:51 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: pbonzini, kvm, rick.p.edgecombe, kai.huang, adrian.hunter,
	reinette.chatre, xiaoyao.li, tony.lindgren, isaku.yamahata,
	yan.y.zhao, chao.gao, linux-kernel



On 2/12/2025 9:47 AM, Sean Christopherson wrote:
> On Tue, Feb 11, 2025, Binbin Wu wrote:
>> +#ifdef CONFIG_KVM_SMM
>> +static int vt_smi_allowed(struct kvm_vcpu *vcpu, bool for_injection)
>> +{
>> +	if (KVM_BUG_ON(is_td_vcpu(vcpu), vcpu->kvm))
>> +		return false;
> Nit, while the name suggests a boolean return, the actual return in -errno/0/1,
> i.e. this should be '0', not "false".
Yes.

>
> A bit late to be asking this, but has anyone verified all the KVM_BUG_ON() calls
> are fully optimized out when CONFIG_KVM_INTEL_TDX=n?
>
> /me rummages around
>
> Sort of.  The KVM_BUG_ON()s are all gone, but sadly a stub gets left behind.  Not
> the end of the world since they're all tail calls, but it's still quite useless,
> especially when using frame pointers.
>
> Aha!  Finally!  An excuse to macrofy some of this!
>
> Rather than have a metric ton of stubs for all of the TDX variants, simply omit
> the wrappers when CONFIG_KVM_INTEL_TDX=n.  Quite nearly all of vmx/main.c can go
> under a single #ifdef.  That eliminates all the silly trampolines in the generated
> code, and almost all of the stubs.
Thanks for the suggestion!

Since the changes will be across multiple sections of TDX KVM support,
instead of modifying them individually, are you OK if we do it in a separate
cleanup patch?

[...]

* Re: [PATCH v2 01/17] KVM: TDX: Add support for find pending IRQ in a protected local APIC
  2025-02-11  2:58 ` [PATCH v2 01/17] KVM: TDX: Add support for find pending IRQ in a protected local APIC Binbin Wu
  2025-02-11  7:23   ` Binbin Wu
@ 2025-02-12  8:12   ` Chao Gao
  2025-02-12 16:04     ` Sean Christopherson
  1 sibling, 1 reply; 33+ messages in thread
From: Chao Gao @ 2025-02-12  8:12 UTC (permalink / raw)
  To: Binbin Wu
  Cc: pbonzini, seanjc, kvm, rick.p.edgecombe, kai.huang, adrian.hunter,
	reinette.chatre, xiaoyao.li, tony.lindgren, isaku.yamahata,
	yan.y.zhao, linux-kernel

>diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
>index 7f1318c44040..2b1ea57a3a4e 100644
>--- a/arch/x86/kvm/vmx/main.c
>+++ b/arch/x86/kvm/vmx/main.c
>@@ -62,6 +62,8 @@ static __init int vt_hardware_setup(void)
> 		vt_x86_ops.set_external_spte = tdx_sept_set_private_spte;
> 		vt_x86_ops.free_external_spt = tdx_sept_free_private_spt;
> 		vt_x86_ops.remove_external_spte = tdx_sept_remove_private_spte;

Nit: I think it would be more consistent to set up .protected_apic_has_interrupt
if TDX is enabled (rather than clearing it if TDX is disabled).

>+	} else {
>+		vt_x86_ops.protected_apic_has_interrupt = NULL;
> 	}
> 
> 	return 0;
>@@ -371,6 +373,7 @@ struct kvm_x86_ops vt_x86_ops __initdata = {
> 	.sync_pir_to_irr = vmx_sync_pir_to_irr,
> 	.deliver_interrupt = vmx_deliver_interrupt,
> 	.dy_apicv_has_pending_interrupt = pi_has_pending_interrupt,
>+	.protected_apic_has_interrupt = tdx_protected_apic_has_interrupt,
> 
> 	.set_tss_addr = vmx_set_tss_addr,
> 	.set_identity_map_addr = vmx_set_identity_map_addr,

* Re: [PATCH v2 09/17] KVM: TDX: Handle SMI request as !CONFIG_KVM_SMM
  2025-02-12  1:47   ` Sean Christopherson
  2025-02-12  5:51     ` Binbin Wu
@ 2025-02-12 10:19     ` Huang, Kai
  1 sibling, 0 replies; 33+ messages in thread
From: Huang, Kai @ 2025-02-12 10:19 UTC (permalink / raw)
  To: Sean Christopherson, Binbin Wu
  Cc: pbonzini@redhat.com, kvm@vger.kernel.org, Edgecombe, Rick P,
	Hunter, Adrian, Chatre, Reinette, Li, Xiaoyao, Lindgren, Tony,
	Yamahata, Isaku, Zhao, Yan Y, Gao, Chao,
	linux-kernel@vger.kernel.org



On 12/02/2025 2:47 pm, Sean Christopherson wrote:
> On Tue, Feb 11, 2025, Binbin Wu wrote:
>> +#ifdef CONFIG_KVM_SMM
>> +static int vt_smi_allowed(struct kvm_vcpu *vcpu, bool for_injection)
>> +{
>> +	if (KVM_BUG_ON(is_td_vcpu(vcpu), vcpu->kvm))
>> +		return false;
> 
> Nit, while the name suggests a boolean return, the actual return in -errno/0/1,
> i.e. this should be '0', not "false".
> 
> A bit late to be asking this, but has anyone verified all the KVM_BUG_ON() calls
> are fully optimized out when CONFIG_KVM_INTEL_TDX=n?
> 
> /me rummages around
> 
> Sort of.  The KVM_BUG_ON()s are all gone, but sadly a stub gets left behind.  Not
> the end of the world since they're all tail calls, but it's still quite useless,
> especially when using frame pointers.
> 
> Aha!  Finally!  An excuse to macrofy some of this!
> 
> Rather than have a metric ton of stubs for all of the TDX variants, simply omit
> the wrappers when CONFIG_KVM_INTEL_TDX=n.  Quite nearly all of vmx/main.c can go
> under a single #ifdef.  That eliminates all the silly trampolines in the generated
> code, and almost all of the stubs.
> 
> Compile tested only, and needs to be chunked up. E.g. switching to the
> right CONFIG_xxx needs to be done elsewhere, ditto for moving the "pre restore"
> function to posted_intr.c.

AFAICT, if we export kvm_ops_update(), or kvm_x86_ops directly, to
kvm-intel, we can take this a step further and set those callbacks back
to vmx_xx() when KVM determines at module loading time that TDX cannot
be supported.  That would cover !CONFIG_KVM_INTEL_TDX as well.
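
Something like the below, completely untested; vt_patch_vmx_ops() is a
made-up name, and it assumes kvm_x86_ops (or kvm_ops_update()) is
exported to kvm-intel and that enable_tdx reflects the module-load-time
decision:

/* Untested sketch: rebind the ops to bare VMX when TDX is unsupported. */
static void __init vt_patch_vmx_ops(void)
{
	if (enable_tdx)
		return;

	vt_x86_ops.vcpu_create = vmx_vcpu_create;
	vt_x86_ops.vcpu_free = vmx_vcpu_free;
	/* ...ditto for every other vt_xx() wrapper... */

	/* TDX-only hooks have no VMX fallback; drop them entirely. */
	vt_x86_ops.protected_apic_has_interrupt = NULL;
}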

* Re: [PATCH v2 01/17] KVM: TDX: Add support for find pending IRQ in a protected local APIC
  2025-02-12  8:12   ` Chao Gao
@ 2025-02-12 16:04     ` Sean Christopherson
  2025-02-13  2:12       ` Chao Gao
  0 siblings, 1 reply; 33+ messages in thread
From: Sean Christopherson @ 2025-02-12 16:04 UTC (permalink / raw)
  To: Chao Gao
  Cc: Binbin Wu, pbonzini, kvm, rick.p.edgecombe, kai.huang,
	adrian.hunter, reinette.chatre, xiaoyao.li, tony.lindgren,
	isaku.yamahata, yan.y.zhao, linux-kernel

On Wed, Feb 12, 2025, Chao Gao wrote:
> >diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
> >index 7f1318c44040..2b1ea57a3a4e 100644
> >--- a/arch/x86/kvm/vmx/main.c
> >+++ b/arch/x86/kvm/vmx/main.c
> >@@ -62,6 +62,8 @@ static __init int vt_hardware_setup(void)
> > 		vt_x86_ops.set_external_spte = tdx_sept_set_private_spte;
> > 		vt_x86_ops.free_external_spt = tdx_sept_free_private_spt;
> > 		vt_x86_ops.remove_external_spte = tdx_sept_remove_private_spte;
> 
> Nit: I think it would be more consistent to set up .protected_apic_has_interrupt
> if TDX is enabled (rather than clearing it if TDX is disabled).

I think my preference would be to do the vt_op_tdx_only() thing[*], wire up all
TDX hooks by default via vt_op_tdx_only(), and then nullify them if TDX support
isn't enabled.  Or even just leave them set, e.g. based on the comment in
vt_hardware_setup(), that can happen anyways.

https://lore.kernel.org/all/Z6v9yjWLNTU6X90d@google.com


* Re: [PATCH v2 01/17] KVM: TDX: Add support for find pending IRQ in a protected local APIC
  2025-02-12 16:04     ` Sean Christopherson
@ 2025-02-13  2:12       ` Chao Gao
  0 siblings, 0 replies; 33+ messages in thread
From: Chao Gao @ 2025-02-13  2:12 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Binbin Wu, pbonzini, kvm, rick.p.edgecombe, kai.huang,
	adrian.hunter, reinette.chatre, xiaoyao.li, tony.lindgren,
	isaku.yamahata, yan.y.zhao, linux-kernel

On Wed, Feb 12, 2025 at 08:04:49AM -0800, Sean Christopherson wrote:
>On Wed, Feb 12, 2025, Chao Gao wrote:
>> >diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
>> >index 7f1318c44040..2b1ea57a3a4e 100644
>> >--- a/arch/x86/kvm/vmx/main.c
>> >+++ b/arch/x86/kvm/vmx/main.c
>> >@@ -62,6 +62,8 @@ static __init int vt_hardware_setup(void)
>> > 		vt_x86_ops.set_external_spte = tdx_sept_set_private_spte;
>> > 		vt_x86_ops.free_external_spt = tdx_sept_free_private_spt;
>> > 		vt_x86_ops.remove_external_spte = tdx_sept_remove_private_spte;
>> 
>> Nit: I think it would be more consistent to set up .protected_apic_has_interrupt
>> if TDX is enabled (rather than clearing it if TDX is disabled).
>
>I think my preference would be to do the vt_op_tdx_only() thing[*], wire up all
>TDX hooks by default via vt_op_tdx_only(),

Yes, that makes sense. I am fine as long as the hooks are set up in the same way.

>and then nullify them if TDX support
>isn't enabled.  Or even just leave them set, e.g. based on the comment in
>vt_hardware_setup(), that can happen anyways.

Indeed. No need to nullify the hooks.

>
>https://lore.kernel.org/all/Z6v9yjWLNTU6X90d@google.com
>

* Re: [PATCH v2 03/17] KVM: VMX: Move posted interrupt delivery code to common header
  2025-02-11  2:58 ` [PATCH v2 03/17] KVM: VMX: Move posted interrupt delivery code to common header Binbin Wu
@ 2025-02-13  6:59   ` Chao Gao
  0 siblings, 0 replies; 33+ messages in thread
From: Chao Gao @ 2025-02-13  6:59 UTC (permalink / raw)
  To: Binbin Wu
  Cc: pbonzini, seanjc, kvm, rick.p.edgecombe, kai.huang, adrian.hunter,
	reinette.chatre, xiaoyao.li, tony.lindgren, isaku.yamahata,
	yan.y.zhao, linux-kernel

>+/*
>+ * Send interrupt to vcpu via posted interrupt way.
>+ * 1. If target vcpu is running(non-root mode), send posted interrupt
>+ * notification to vcpu and hardware will sync PIR to vIRR atomically.

This comment primarily describes what kvm_vcpu_trigger_posted_interrupt() does.
And, it is not entirely accurate, as it is not necessarily the "hardware" that
syncs PIR to vIRR (see case 2 & 3 in the comment in
kvm_vcpu_trigger_posted_interrupt()).

How about:

/*
 * Post an interrupt to a vCPU's PIR and trigger the vCPU to process the
 * interrupt if necessary.
 */


Other than that, the patch looks good to me.

Reviewed-by: Chao Gao <chao.gao@intel.com>

>+ * 2. If target vcpu isn't running(root mode), kick it to pick up the
>+ * interrupt from PIR in next vmentry.
>+ */
>+static inline void __vmx_deliver_posted_interrupt(struct kvm_vcpu *vcpu,
>+						  struct pi_desc *pi_desc, int vector)
>+{
>+	if (pi_test_and_set_pir(vector, pi_desc))
>+		return;
>+
>+	/* If a previous notification has sent the IPI, nothing to do.  */
>+	if (pi_test_and_set_on(pi_desc))
>+		return;
>+
>+	/*
>+	 * The implied barrier in pi_test_and_set_on() pairs with the smp_mb_*()
>+	 * after setting vcpu->mode in vcpu_enter_guest(), thus the vCPU is
>+	 * guaranteed to see PID.ON=1 and sync the PIR to IRR if triggering a
>+	 * posted interrupt "fails" because vcpu->mode != IN_GUEST_MODE.
>+	 */
>+	kvm_vcpu_trigger_posted_interrupt(vcpu, POSTED_INTR_VECTOR);
>+}
>+

* Re: [PATCH v2 04/17] KVM: TDX: Implement non-NMI interrupt injection
  2025-02-11  2:58 ` [PATCH v2 04/17] KVM: TDX: Implement non-NMI interrupt injection Binbin Wu
@ 2025-02-13  7:15   ` Chao Gao
  0 siblings, 0 replies; 33+ messages in thread
From: Chao Gao @ 2025-02-13  7:15 UTC (permalink / raw)
  To: Binbin Wu
  Cc: pbonzini, seanjc, kvm, rick.p.edgecombe, kai.huang, adrian.hunter,
	reinette.chatre, xiaoyao.li, tony.lindgren, isaku.yamahata,
	yan.y.zhao, linux-kernel

>--- a/arch/x86/kvm/vmx/tdx.c
>+++ b/arch/x86/kvm/vmx/tdx.c
>@@ -685,6 +685,10 @@ int tdx_vcpu_create(struct kvm_vcpu *vcpu)
> 	if ((kvm_tdx->xfam & XFEATURE_MASK_XTILE) == XFEATURE_MASK_XTILE)
> 		vcpu->arch.xfd_no_write_intercept = true;
> 
>+

remove this newline.

>+	tdx->vt.pi_desc.nv = POSTED_INTR_VECTOR;
>+	__pi_set_sn(&tdx->vt.pi_desc);
>+
> 	tdx->state = VCPU_TD_STATE_UNINITIALIZED;
> 
> 	return 0;
>@@ -694,6 +698,7 @@ void tdx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
> {
> 	struct vcpu_tdx *tdx = to_tdx(vcpu);
> 
>+	vmx_vcpu_pi_load(vcpu, cpu);
> 	if (vcpu->cpu == cpu)
> 		return;
> 
>@@ -950,6 +955,9 @@ fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu, bool force_immediate_exit)
> 
> 	trace_kvm_entry(vcpu, force_immediate_exit);
> 
>+	if (pi_test_on(&vt->pi_desc))
>+		apic->send_IPI_self(POSTED_INTR_VECTOR);
>+
> 	tdx_vcpu_enter_exit(vcpu);
> 
> 	if (vt->host_debugctlmsr & ~TDX_DEBUGCTL_PRESERVED)
>@@ -1607,6 +1615,16 @@ int tdx_sept_remove_private_spte(struct kvm *kvm, gfn_t gfn,
> 	return tdx_sept_drop_private_spte(kvm, gfn, level, pfn_to_page(pfn));
> }
> 
>+void tdx_deliver_interrupt(struct kvm_lapic *apic, int delivery_mode,
>+			   int trig_mode, int vector)
>+{
>+	struct kvm_vcpu *vcpu = apic->vcpu;
>+	struct vcpu_tdx *tdx = to_tdx(vcpu);
>+
>+	/* TDX supports only posted interrupt.  No lapic emulation. */
>+	__vmx_deliver_posted_interrupt(vcpu, &tdx->vt.pi_desc, vector);

trace_kvm_apicv_accept_irq() is missing compared to the VMX counterpart.
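
For illustration, it could mirror the VMX counterpart like this (untested
sketch):

void tdx_deliver_interrupt(struct kvm_lapic *apic, int delivery_mode,
			   int trig_mode, int vector)
{
	struct kvm_vcpu *vcpu = apic->vcpu;
	struct vcpu_tdx *tdx = to_tdx(vcpu);

	/* TDX supports only posted interrupt.  No lapic emulation. */
	__vmx_deliver_posted_interrupt(vcpu, &tdx->vt.pi_desc, vector);

	/* Trace the delivery, as vmx_deliver_interrupt() does. */
	trace_kvm_apicv_accept_irq(vcpu->vcpu_id, delivery_mode, trig_mode,
				   vector);
}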

* Re: [PATCH v2 05/17] KVM: x86: Assume timer IRQ was injected if APIC state is protected
  2025-02-11  2:58 ` [PATCH v2 05/17] KVM: x86: Assume timer IRQ was injected if APIC state is protected Binbin Wu
@ 2025-02-13  7:26   ` Chao Gao
  0 siblings, 0 replies; 33+ messages in thread
From: Chao Gao @ 2025-02-13  7:26 UTC (permalink / raw)
  To: Binbin Wu
  Cc: pbonzini, seanjc, kvm, rick.p.edgecombe, kai.huang, adrian.hunter,
	reinette.chatre, xiaoyao.li, tony.lindgren, isaku.yamahata,
	yan.y.zhao, linux-kernel

On Tue, Feb 11, 2025 at 10:58:16AM +0800, Binbin Wu wrote:
>From: Sean Christopherson <seanjc@google.com>
>
>If APIC state is protected, i.e. the vCPU is a TDX guest, assume a timer
>IRQ was injected when deciding whether or not to busy wait in the "timer
>advanced" path.  The "real" vIRR is not readable/writable, so trying to
>query for a pending timer IRQ will return garbage.
>
>Note, TDX can scour the PIR if it wants to be more precise and skip the
>"wait" call entirely.
>
>Signed-off-by: Sean Christopherson <seanjc@google.com>
>Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
>---
>TDX interrupts v2:
>- No change.
>
>TDX interrupts v1:
>- Renamed from "KVM: x86: Assume timer IRQ was injected if APIC state is proteced"
>  to "KVM: x86: Assume timer IRQ was injected if APIC state is protected", i.e.,
>  fix the typo 'proteced'.
>---
> arch/x86/kvm/lapic.c | 11 ++++++++++-
> 1 file changed, 10 insertions(+), 1 deletion(-)
>
>diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
>index bbdede07d063..bab5c42f63b7 100644
>--- a/arch/x86/kvm/lapic.c
>+++ b/arch/x86/kvm/lapic.c
>@@ -1797,8 +1797,17 @@ static void apic_update_lvtt(struct kvm_lapic *apic)
> static bool lapic_timer_int_injected(struct kvm_vcpu *vcpu)
> {
> 	struct kvm_lapic *apic = vcpu->arch.apic;
>-	u32 reg = kvm_lapic_get_reg(apic, APIC_LVTT);
>+	u32 reg;
> 
>+	/*
>+	 * Assume a timer IRQ was "injected" if the APIC is protected.  KVM's
>+	 * copy of the vIRR is bogus, it's the responsibility of the caller to
>+	 * precisely check whether or not a timer IRQ is pending.
>+	 */
>+	if (apic->guest_apic_protected)
>+		return true;
>+
>+	reg  = kvm_lapic_get_reg(apic, APIC_LVTT);

nit:	   ^^ remove one space here

> 	if (kvm_apic_hw_enabled(apic)) {
> 		int vec = reg & APIC_VECTOR_MASK;
> 		void *bitmap = apic->regs + APIC_ISR;
>-- 
>2.46.0
>

* Re: [PATCH v2 08/17] KVM: TDX: Complete interrupts after TD exit
  2025-02-11  2:58 ` [PATCH v2 08/17] KVM: TDX: Complete interrupts after TD exit Binbin Wu
@ 2025-02-13  8:20   ` Chao Gao
  2025-02-13  8:55     ` Binbin Wu
  0 siblings, 1 reply; 33+ messages in thread
From: Chao Gao @ 2025-02-13  8:20 UTC (permalink / raw)
  To: Binbin Wu
  Cc: pbonzini, seanjc, kvm, rick.p.edgecombe, kai.huang, adrian.hunter,
	reinette.chatre, xiaoyao.li, tony.lindgren, isaku.yamahata,
	yan.y.zhao, linux-kernel

>+static void tdx_complete_interrupts(struct kvm_vcpu *vcpu)
>+{
>+	/* Avoid costly SEAMCALL if no NMI was injected. */
>+	if (vcpu->arch.nmi_injected) {
>+		/*
>+		 * No need to request KVM_REQ_EVENT because PEND_NMI is still
>+		 * set if NMI re-injection needed.  No other event types need
>+		 * to be handled because TDX doesn't support injection of
>+		 * exception, SMI or interrupt (via event injection).
>+		 */
>+		vcpu->arch.nmi_injected = td_management_read8(to_tdx(vcpu),
>+							      TD_VCPU_PEND_NMI);
>+	}

Why does KVM care whether/when an NMI is injected by the TDX module?

I think we can simply set nmi_injected to false unconditionally here, or even in
tdx_inject_nmi(). From KVM's perspective, NMI injection is complete right after
writing to PEND_NMI. It is the TDX module that should inject the NMI at the
right time and do the re-injection.


>+}
>+
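
I.e., something like (untested sketch):

static void tdx_complete_interrupts(struct kvm_vcpu *vcpu)
{
	/*
	 * From KVM's perspective, the injection completed once PEND_NMI was
	 * written.  If the NMI is still pending, the TDX module re-injects
	 * it on a later TD entry without any involvement from KVM.
	 */
	vcpu->arch.nmi_injected = false;
}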

* Re: [PATCH v2 08/17] KVM: TDX: Complete interrupts after TD exit
  2025-02-13  8:20   ` Chao Gao
@ 2025-02-13  8:55     ` Binbin Wu
  0 siblings, 0 replies; 33+ messages in thread
From: Binbin Wu @ 2025-02-13  8:55 UTC (permalink / raw)
  To: Chao Gao
  Cc: pbonzini, seanjc, kvm, rick.p.edgecombe, kai.huang, adrian.hunter,
	reinette.chatre, xiaoyao.li, tony.lindgren, isaku.yamahata,
	yan.y.zhao, linux-kernel



On 2/13/2025 4:20 PM, Chao Gao wrote:
>> +static void tdx_complete_interrupts(struct kvm_vcpu *vcpu)
>> +{
>> +	/* Avoid costly SEAMCALL if no NMI was injected. */
>> +	if (vcpu->arch.nmi_injected) {
>> +		/*
>> +		 * No need to request KVM_REQ_EVENT because PEND_NMI is still
>> +		 * set if NMI re-injection needed.  No other event types need
>> +		 * to be handled because TDX doesn't support injection of
>> +		 * exception, SMI or interrupt (via event injection).
>> +		 */
>> +		vcpu->arch.nmi_injected = td_management_read8(to_tdx(vcpu),
>> +							      TD_VCPU_PEND_NMI);
>> +	}
> Why does KVM care whether/when an NMI is injected by the TDX module?
>
> I think we can simply set nmi_injected to false unconditionally here, or even in
> tdx_inject_nmi(). From KVM's perspective, NMI injection is complete right after
> writing to PEND_NMI. It is the TDX module that should inject the NMI at the
> right time and do the re-injection.
Yes, it can/should be cleared unconditionally here.

Previously (v19 and before), nmi_injected affected the limit on pending NMIs.
Now, we don't care about that limit because additional pending NMIs are
collapsed into the single NMI pending in the TDX module.

Will update it.
Thanks!

>
>
>> +}
>> +


* Re: [PATCH v2 09/17] KVM: TDX: Handle SMI request as !CONFIG_KVM_SMM
  2025-02-12  5:51     ` Binbin Wu
@ 2025-02-14 17:15       ` Edgecombe, Rick P
  0 siblings, 0 replies; 33+ messages in thread
From: Edgecombe, Rick P @ 2025-02-14 17:15 UTC (permalink / raw)
  To: pbonzini@redhat.com, seanjc@google.com, binbin.wu@linux.intel.com
  Cc: Gao, Chao, Huang, Kai, Li, Xiaoyao, Lindgren, Tony,
	Hunter, Adrian, Chatre, Reinette, kvm@vger.kernel.org,
	Zhao, Yan Y, Yamahata, Isaku, linux-kernel@vger.kernel.org

On Wed, 2025-02-12 at 13:51 +0800, Binbin Wu wrote:
> > Rather than have a metric ton of stubs for all of the TDX variants, simply
> > omit
> > the wrappers when CONFIG_KVM_INTEL_TDX=n.  Quite nearly all of vmx/main.c
> > can go
> > under a single #ifdef.  That eliminates all the silly trampolines in the
> > generated
> > code, and almost all of the stubs.
> Thanks for the suggestion!
> 
> Since the changes will be across multiple sections of TDX KVM support,
> instead of modifying them individually, are you OK if we do it in a separate
> cleanup patch?

Paolo, since this would make small changes across the whole series, we would
have to figure out how to get it into the "queued" kvm-coco-queue patches. I
think the two reasonable options would be to have you do the change directly
in kvm-coco-queue, and we would merge our other later changes back into the
resulting branch; or we could just do the macro change as a cleanup patch
after the base series.

We'll go with the second option unless we hear otherwise.


Thread overview: 33+ messages
2025-02-11  2:58 [PATCH v2 00/17] KVM: TDX: TDX interrupts Binbin Wu
2025-02-11  2:58 ` [PATCH v2 01/17] KVM: TDX: Add support for find pending IRQ in a protected local APIC Binbin Wu
2025-02-11  7:23   ` Binbin Wu
2025-02-12  8:12   ` Chao Gao
2025-02-12 16:04     ` Sean Christopherson
2025-02-13  2:12       ` Chao Gao
2025-02-11  2:58 ` [PATCH v2 02/17] KVM: TDX: Disable PI wakeup for IPIv Binbin Wu
2025-02-11  2:58 ` [PATCH v2 03/17] KVM: VMX: Move posted interrupt delivery code to common header Binbin Wu
2025-02-13  6:59   ` Chao Gao
2025-02-11  2:58 ` [PATCH v2 04/17] KVM: TDX: Implement non-NMI interrupt injection Binbin Wu
2025-02-13  7:15   ` Chao Gao
2025-02-11  2:58 ` [PATCH v2 05/17] KVM: x86: Assume timer IRQ was injected if APIC state is protected Binbin Wu
2025-02-13  7:26   ` Chao Gao
2025-02-11  2:58 ` [PATCH v2 06/17] KVM: TDX: Wait lapic expire when timer IRQ was injected Binbin Wu
2025-02-11  2:58 ` [PATCH v2 07/17] KVM: TDX: Implement methods to inject NMI Binbin Wu
2025-02-11  2:58 ` [PATCH v2 08/17] KVM: TDX: Complete interrupts after TD exit Binbin Wu
2025-02-13  8:20   ` Chao Gao
2025-02-13  8:55     ` Binbin Wu
2025-02-11  2:58 ` [PATCH v2 09/17] KVM: TDX: Handle SMI request as !CONFIG_KVM_SMM Binbin Wu
2025-02-12  1:47   ` Sean Christopherson
2025-02-12  5:51     ` Binbin Wu
2025-02-14 17:15       ` Edgecombe, Rick P
2025-02-12 10:19     ` Huang, Kai
2025-02-11  2:58 ` [PATCH v2 10/17] KVM: TDX: Always block INIT/SIPI Binbin Wu
2025-02-11  2:58 ` [PATCH v2 11/17] KVM: TDX: Enforce KVM_IRQCHIP_SPLIT for TDX guests Binbin Wu
2025-02-11  2:58 ` [PATCH v2 12/17] KVM: TDX: Force APICv active for TDX guest Binbin Wu
2025-02-11  2:58 ` [PATCH v2 13/17] KVM: TDX: Add methods to ignore virtual apic related operation Binbin Wu
2025-02-11  2:58 ` [PATCH v2 14/17] KVM: VMX: Move emulation_required to struct vcpu_vt Binbin Wu
2025-02-11  2:58 ` [PATCH v2 15/17] KVM: VMX: Add a helper for NMI handling Binbin Wu
2025-02-12  1:10   ` Sean Christopherson
2025-02-11  2:58 ` [PATCH v2 16/17] KVM: TDX: Handle EXCEPTION_NMI and EXTERNAL_INTERRUPT Binbin Wu
2025-02-12  0:50   ` Sean Christopherson
2025-02-11  2:58 ` [PATCH v2 17/17] KVM: TDX: Handle EXIT_REASON_OTHER_SMI Binbin Wu
