public inbox for kvm@vger.kernel.org
* [RFC PATCH v2 00/17] AMD: Add Secure AVIC KVM Support
@ 2025-09-23  5:03 Neeraj Upadhyay
  2025-09-23  5:03 ` [RFC PATCH v2 01/17] KVM: x86/lapic: Differentiate protected APIC interrupt mechanisms Neeraj Upadhyay
                   ` (17 more replies)
  0 siblings, 18 replies; 32+ messages in thread
From: Neeraj Upadhyay @ 2025-09-23  5:03 UTC (permalink / raw)
  To: kvm, seanjc, pbonzini
  Cc: linux-kernel, Thomas.Lendacky, nikunj, Santosh.Shukla,
	Vasant.Hegde, Suravee.Suthikulpanit, bp, David.Kaplan, huibo.wang,
	naveen.rao, tiala

Introduction
------------

Secure AVIC is a new hardware feature in the AMD64 architecture that
allows SEV-SNP guests to prevent the hypervisor from generating
unexpected interrupts to a vCPU or otherwise violating architectural
assumptions around APIC behavior.

One of the significant differences from AVIC or emulated x2APIC is that
Secure AVIC uses a guest-owned and managed APIC backing page. It also
introduces additional fields in both the VMCB and the Secure AVIC backing
page to aid the guest in limiting which interrupt vectors can be injected
into the guest.

Guest APIC Backing Page
-----------------------
Each vCPU has a guest-allocated APIC backing page, which maintains APIC
state for that vCPU. The x2APIC MSRs are mapped at their corresponding
x2APIC MMIO offsets within the guest APIC backing page. All x2APIC
accesses by the guest or by Secure AVIC hardware operate on this backing
page. The backing page should be pinned, and the NPT entry for it should
always be mapped while the corresponding vCPU is running.

MSR Accesses
------------
Secure AVIC only supports x2APIC MSR accesses. xAPIC MMIO offset based
accesses are not supported.

Some MSR writes, such as ICR writes (with shorthand equal to self),
SELF_IPI, EOI, and TPR writes, are accelerated by Secure AVIC hardware.
Other MSR writes generate a #VC exception (VMEXIT_AVIC_NOACCEL or
VMEXIT_AVIC_INCOMPLETE_IPI). The #VC exception handler reads/writes the
guest APIC backing page. As the guest APIC backing page is accessible
to the guest, the guest can optimize APIC register access by directly
reading/writing the backing page (instead of taking the #VC exception
route). APIC MSR reads are accelerated similarly to AVIC, as described
in table "15-22. Guest vAPIC Register Access Behavior" of the APM.

In addition to the architected MSRs, the following new fields are added
to the guest APIC backing page and can be modified directly by the
guest:

a. ALLOWED_IRR

The ALLOWED_IRR field indicates the interrupt vectors which the guest
allows the hypervisor to send. The combination of the host-controlled
REQUESTED_IRR vectors (part of the VMCB) and the guest-controlled
ALLOWED_IRR is used by hardware to update the IRR vectors of the guest
APIC backing page.

#Offset        #bits        Description
204h           31:0         Guest allowed vectors 0-31
214h           31:0         Guest allowed vectors 32-63
...
274h           31:0         Guest allowed vectors 224-255

ALLOWED_IRR is meant to be used specifically for vectors that the
hypervisor emulates and is allowed to inject, such as IOAPIC/MSI
device interrupts.  Interrupt vectors used exclusively by the guest
itself (like IPI vectors) should not be allowed to be injected into
the guest for security reasons.

b. NMI Request
 
#Offset        #bits        Description
278h           0            Set by Guest to request Virtual NMI

The guest can set NMI_REQUEST to trigger APIC_ICR-based NMIs.

APIC Registers
--------------

1. APIC ID

The APIC_ID value is set by KVM and, as with x2APIC, is equal to the
vcpu_id of the vCPU.

2. APIC LVR

The APIC Version register is expected to be read from KVM's APIC state
using an MSR_PROT RDMSR VMGEXIT and updated in the guest APIC backing
page.

3. APIC TPR

TPR writes are accelerated and not communicated to KVM. So, the
hypervisor does not have information about the TPR value of a vCPU.

4. APIC PPR

The current state of the PPR is not visible to KVM.

5. APIC SPIV

The Spurious Interrupt Vector register value is communicated by the
guest to KVM.

6. APIC IRR and APIC ISR

IRR and ISR state is visible only to the guest. So, KVM cannot use these
registers to determine which guest interrupts are pending completion.

7. APIC TMR

Trigger Mode Register state is owned by the guest and not visible to
KVM. However, for IOAPIC external interrupts, KVM's software vAPIC
trigger mode is set from the guest-controlled redirection table. So,
the APIC_TMR values in the software vAPIC state can be used to
distinguish between edge- and level-triggered IOAPIC interrupts.

8. Timer registers - TMICT, TMCCT, TDCR

Timer registers are accessed using MSR_PROT VMGEXIT calls and not from the
guest APIC backing page.

9. LVT* registers

LVT register state is accessed from KVM's vAPIC state for the vCPU.

Idle HLT Intercept
-------------------

As KVM does not have access to the APIC IRR state of a Secure AVIC
guest, the idle HLT intercept feature should always be enabled for a
Secure AVIC guest. Otherwise, any interrupts pending in the vAPIC IRR
at HLT VMEXIT would not be serviced, and the vCPU could get stuck in
HLT until the next wakeup event (which could arrive after a
non-deterministic amount of time). For the idle HLT intercept to work,
the vAPIC TPR value should not block the pending interrupts.

LAPIC Timer Support
-------------------
The LAPIC timer is emulated by KVM. So, the APIC_LVTT, APIC_TMICT,
APIC_TDCR and APIC_TMCCT registers are not read from or written to the
guest APIC backing page; they are communicated to KVM using MSR_PROT
VMGEXITs.

IPI Support
-----------
Only SELF_IPI is accelerated by Secure AVIC hardware. Other IPI
destination shorthands result in a VMEXIT_AVIC_INCOMPLETE_IPI #VC
exception.
The expected guest handling for VMEXIT_AVIC_INCOMPLETE_IPI is:

- For interrupts, update APIC_IRR in the target vCPU's guest APIC
  backing page.

- For NMIs, update NMI_REQUEST in the target vCPU's guest APIC backing
  page.

- ICR-based SMI, INIT and SIPI requests are not supported.

- After updating the target vCPU's guest APIC backing page, the source
  vCPU does an MSR_PROT VMGEXIT.

- KVM either wakes up the non-running target vCPU or sends an AVIC doorbell.

Exception Injection
-------------------

Hardware does not support event injection for guests with Secure AVIC
enabled in SEV_FEATURES. So, KVM cannot inject exceptions into Secure
AVIC guests. Hardware takes care of re-injecting an interrupted
exception (for example, one interrupted by an NPF) on the next VMRUN.
The #VC exception is not re-injected. KVM clears all exception
intercepts for Secure AVIC guests.

Interrupt Injection
-------------------

IOAPIC and MSI based device interrupts can be injected by KVM. The
interrupt flow for this is:

- IOAPIC/MSI interrupts are recorded in KVM's APIC_IRR state via
  kvm_irq_delivery_to_apic().
- In the ->inject_irq() callback, all interrupts which are set in KVM's
  APIC_IRR are copied to the RequestedIRR VMCB field and the UpdateIRR
  bit is set.
- VMRUN moves the current value of RequestedIRR into APIC_IRR in the
  guest APIC backing page and clears RequestedIRR and UpdateIRR.

Given that hardware clearing of RequestedIRR and UpdateIRR can race
with KVM's writes to these fields, the above interrupt injection flow
ensures that all RequestedIRR and UpdateIRR writes are done from the
same CPU on which the vCPU runs.

As interrupt delivery to a vCPU is managed by hardware, the interrupt
window is not applicable to Secure AVIC guests, and interrupts are
always allowed to be injected.

PIC interrupts
--------------

Legacy PIC interrupts cannot be injected as they require event_inj or
VINTR injection support, neither of which is available for Secure AVIC
guests.

PIT
---

PIT reinject mode is not supported for edge-triggered interrupts, as it
requires an IRQ ack notification on EOI. As EOI is accelerated by Secure
AVIC hardware for edge-triggered interrupts, the IRQ ack notification is
not invoked for them.

NMI Injection
-------------

NMI injection requires ALLOWED_NMI to be set in the Secure AVIC control
MSR by the guest. Only VNMI injection is allowed.

Design Caveats, Open Points and Improvement Opportunities
---------------------------------------------------------

- The current code uses KVM's vAPIC APIC_IRR for storing the interrupts
  which need to be injected into the guest. It then reuses KVM's
  existing interrupt injection flow (with some modifications to the
  determination of injectable interrupts).
  
  While functional, this approach conflates the state of KVM's
  software-emulated vAPIC with the state of the hardware-accelerated
  Secure AVIC. This can make the code harder to reason about. A cleaner
  approach would introduce a dedicated struct for holding SAVIC-specific
  state, completely decoupling it from the software lapic state and
  avoiding this overload of semantics.
  
  In addition, preserving the existing notion of a boolean
  guest_apic_protected, instead of having to subcategorize it based on
  the interrupt injection flow, would be desirable. Given that KVM
  cannot use TDX's PI (asynchronous interrupt injection) mechanism for
  SAVIC and must instead adopt the pre-VMRUN injection model of writing
  to the guest-visible backing page, this would require a separate flow
  for moving KVM's pending interrupts for the vCPU to the RequestedIRR
  field.

- EOI handling for level-triggered interrupts uses KVM's unused vAPIC
  APIC_ISR registers for tracking pending level-triggered interrupts.
  KVM uses its APIC_TMR state to determine level-triggered interrupts.
  As KVM's APIC_TMR is updated from the IOAPIC redirection tables, the
  TMR information should be accurate and match the guest vAPIC state.

  This can be cleaned up to not use KVM's vAPIC APIC_ISR state and
  instead maintain the state within the SEV code.

- RTC_GSI requires pending-EOI information to detect coalesced
  interrupts. As RTC_GSI is edge-triggered, Secure AVIC does not forward
  the EOI write to KVM for this interrupt. In addition, the APIC_IRR and
  APIC_ISR states are not visible to KVM and are part of the guest APIC
  backing page. The approach taken in this series is to disable checking
  for coalesced RTC_GSI interrupts for Secure AVIC, which could impact
  userspace code that relies on detecting RTC_GSI interrupt coalescing.

  An alternate approach would be to not support in-kernel IOAPIC
  emulation for Secure AVIC guests, similar to TDX.

- As exceptions cannot be injected by KVM, a more detailed examination
  of which exception intercepts need to be allowed for Secure AVIC
  guests is required.

- As KVM does not have access to the guest's APIC_IRR and APIC_ISR
  states, kvm_apic_pending_eoi() does not return correct information.

- External interrupts (PIC) are not supported. This breaks KVM's PIC
  emulation.

- PIT reinject mode is not supported.

Changes since v1:

v1: https://lore.kernel.org/lkml/20250228085115.105648-1-Neeraj.Upadhyay@amd.com/

- Rebased and resolved conflicts with the latest kvm next snapshot.
- Replaced the enum with a separate lapic struct member to differentiate
  a protected APIC's interrupt injection mechanism.
- Added a patch to disable KVM_FEATURE_PV_EOI and KVM_FEATURE_PV_SEND_IPI
  for protected APIC guests.
- Dropped the SPIV hack patch, which always returned true from
  kvm_apic_sw_enabled(): 20250228085115.105648-16-Neeraj.Upadhyay@amd.com
  Instead, rely on the guest propagating the APIC_SPIV value to KVM.
- Updated the commit logs and cover letter to provide more description.

This series is based on top of commit a6ad54137af9 ("Merge branch
'guest-memfd-mmap' into HEAD") of

  git.kernel.org/pub/scm/virt/kvm/kvm.git next

Git tree is available at:

  https://github.com/AMDESE/linux-kvm/tree/savic-host-latest

In addition, the below patch from v1 is required until the SAVIC guest
is updated to propagate APIC_SPIV to the hypervisor.

  20250228085115.105648-16-Neeraj.Upadhyay@amd.com

The QEMU tree is at:
  https://github.com/AMDESE/qemu/tree/secure-avic
  
QEMU command line for testing a Secure AVIC enabled guest:

qemu-system-x86_64 <...> -object sev-snp-guest,id=sev0,policy=0xb0000,cbitpos=51,reduced-phys-bits=1,allowed-sev-features=true,secure-avic=true

Guest support is present in the tip/master branch at commit snapshot
835794d1ae4c ("Merge branch into tip/master: 'x86/tdx'").

Kishon Vijay Abraham I (2):
  KVM: SVM: Do not inject exception for Secure AVIC
  KVM: SVM: Set VGIF in VMSA area for Secure AVIC guests

Neeraj Upadhyay (15):
  KVM: x86/lapic: Differentiate protected APIC interrupt mechanisms
  x86/cpufeatures: Add Secure AVIC CPU feature
  KVM: SVM: Add support for Secure AVIC capability in KVM
  KVM: SVM: Set guest APIC protection flags for Secure AVIC
  KVM: SVM: Do not intercept SECURE_AVIC_CONTROL MSR for SAVIC guests
  KVM: SVM: Implement interrupt injection for Secure AVIC
  KVM: SVM: Add IPI Delivery Support for Secure AVIC
  KVM: SVM: Do not intercept exceptions for Secure AVIC guests
  KVM: SVM: Enable NMI support for Secure AVIC guests
  KVM: SVM: Add VMGEXIT handler for Secure AVIC backing page
  KVM: SVM: Add IOAPIC EOI support for Secure AVIC guests
  KVM: x86/ioapic: Disable RTC EOI tracking for protected APIC guests
  KVM: SVM: Check injected timers for Secure AVIC guests
  KVM: x86/cpuid: Disable paravirt APIC features for protected APIC
  KVM: SVM: Advertise Secure AVIC support for SNP guests

 arch/x86/include/asm/cpufeatures.h |   1 +
 arch/x86/include/asm/msr-index.h   |   1 +
 arch/x86/include/asm/svm.h         |   9 +-
 arch/x86/include/uapi/asm/svm.h    |   3 +
 arch/x86/kvm/cpuid.c               |   4 +
 arch/x86/kvm/ioapic.c              |   8 +-
 arch/x86/kvm/lapic.c               |  17 +-
 arch/x86/kvm/lapic.h               |   5 +-
 arch/x86/kvm/svm/sev.c             | 367 ++++++++++++++++++++++++++++-
 arch/x86/kvm/svm/svm.c             |  80 +++++--
 arch/x86/kvm/svm/svm.h             |  14 ++
 arch/x86/kvm/x86.c                 |  15 +-
 12 files changed, 493 insertions(+), 31 deletions(-)


base-commit: a6ad54137af92535cfe32e19e5f3bc1bb7dbd383
-- 
2.34.1


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [RFC PATCH v2 01/17] KVM: x86/lapic: Differentiate protected APIC interrupt mechanisms
  2025-09-23  5:03 [RFC PATCH v2 00/17] AMD: Add Secure AVIC KVM Support Neeraj Upadhyay
@ 2025-09-23  5:03 ` Neeraj Upadhyay
  2025-09-23  5:03 ` [RFC PATCH v2 02/17] x86/cpufeatures: Add Secure AVIC CPU feature Neeraj Upadhyay
                   ` (16 subsequent siblings)
  17 siblings, 0 replies; 32+ messages in thread
From: Neeraj Upadhyay @ 2025-09-23  5:03 UTC (permalink / raw)
  To: kvm, seanjc, pbonzini
  Cc: linux-kernel, Thomas.Lendacky, nikunj, Santosh.Shukla,
	Vasant.Hegde, Suravee.Suthikulpanit, bp, David.Kaplan, huibo.wang,
	naveen.rao, tiala

The existing guest_apic_protected boolean flag is insufficient for
handling different protected guest technologies. While both Intel TDX
and AMD SNP (with Secure AVIC) protect the virtual APIC, they use
fundamentally different interrupt delivery mechanisms.

TDX relies on hardware-managed Posted Interrupts, whereas Secure AVIC
requires KVM to perform explicit software-based interrupt injection.
The current flag cannot distinguish between these two models.

To address this, introduce a new flag, prot_apic_intr_inject. This flag
is true for protected guests that require KVM to inject interrupts and
false for those that use a hardware-managed delivery mechanism.

This preparatory change allows subsequent commits to implement the correct
interrupt handling logic for Secure AVIC.

Signed-off-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com>
---
 arch/x86/kvm/lapic.h | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index 72de14527698..f48218fd4638 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -70,7 +70,10 @@ struct kvm_lapic {
 	bool irr_pending;
 	bool lvt0_in_nmi_mode;
 	/* Select registers in the vAPIC cannot be read/written. */
-	bool guest_apic_protected;
+	struct {
+		bool guest_apic_protected;
+		bool prot_apic_intr_inject;
+	};
 	/* Number of bits set in ISR. */
 	s16 isr_count;
 	/* The highest vector set in ISR; if -1 - invalid, must scan ISR. */
-- 
2.34.1



* [RFC PATCH v2 02/17] x86/cpufeatures: Add Secure AVIC CPU feature
  2025-09-23  5:03 [RFC PATCH v2 00/17] AMD: Add Secure AVIC KVM Support Neeraj Upadhyay
  2025-09-23  5:03 ` [RFC PATCH v2 01/17] KVM: x86/lapic: Differentiate protected APIC interrupt mechanisms Neeraj Upadhyay
@ 2025-09-23  5:03 ` Neeraj Upadhyay
  2025-09-23  5:03 ` [RFC PATCH v2 03/17] KVM: SVM: Add support for Secure AVIC capability in KVM Neeraj Upadhyay
                   ` (15 subsequent siblings)
  17 siblings, 0 replies; 32+ messages in thread
From: Neeraj Upadhyay @ 2025-09-23  5:03 UTC (permalink / raw)
  To: kvm, seanjc, pbonzini
  Cc: linux-kernel, Thomas.Lendacky, nikunj, Santosh.Shukla,
	Vasant.Hegde, Suravee.Suthikulpanit, bp, David.Kaplan, huibo.wang,
	naveen.rao, tiala

Add CPU feature detection for Secure AVIC. The Secure AVIC feature
provides hardware acceleration for performance-sensitive APIC accesses
and support for managing guest-owned APIC state for SEV-SNP guests.

Signed-off-by: Kishon Vijay Abraham I <kvijayab@amd.com>
Signed-off-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com>
---
 arch/x86/include/asm/cpufeatures.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 06fc0479a23f..d855825b1b9e 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -449,6 +449,7 @@
 #define X86_FEATURE_DEBUG_SWAP		(19*32+14) /* "debug_swap" SEV-ES full debug state swap support */
 #define X86_FEATURE_RMPREAD		(19*32+21) /* RMPREAD instruction */
 #define X86_FEATURE_SEGMENTED_RMP	(19*32+23) /* Segmented RMP support */
+#define X86_FEATURE_SECURE_AVIC		(19*32+26) /* Secure AVIC */
 #define X86_FEATURE_ALLOWED_SEV_FEATURES (19*32+27) /* Allowed SEV Features */
 #define X86_FEATURE_SVSM		(19*32+28) /* "svsm" SVSM present */
 #define X86_FEATURE_HV_INUSE_WR_ALLOWED	(19*32+30) /* Allow Write to in-use hypervisor-owned pages */
-- 
2.34.1



* [RFC PATCH v2 03/17] KVM: SVM: Add support for Secure AVIC capability in KVM
  2025-09-23  5:03 [RFC PATCH v2 00/17] AMD: Add Secure AVIC KVM Support Neeraj Upadhyay
  2025-09-23  5:03 ` [RFC PATCH v2 01/17] KVM: x86/lapic: Differentiate protected APIC interrupt mechanisms Neeraj Upadhyay
  2025-09-23  5:03 ` [RFC PATCH v2 02/17] x86/cpufeatures: Add Secure AVIC CPU feature Neeraj Upadhyay
@ 2025-09-23  5:03 ` Neeraj Upadhyay
  2025-09-23  5:03 ` [RFC PATCH v2 04/17] KVM: SVM: Set guest APIC protection flags for Secure AVIC Neeraj Upadhyay
                   ` (14 subsequent siblings)
  17 siblings, 0 replies; 32+ messages in thread
From: Neeraj Upadhyay @ 2025-09-23  5:03 UTC (permalink / raw)
  To: kvm, seanjc, pbonzini
  Cc: linux-kernel, Thomas.Lendacky, nikunj, Santosh.Shukla,
	Vasant.Hegde, Suravee.Suthikulpanit, bp, David.Kaplan, huibo.wang,
	naveen.rao, tiala

Add support to KVM for determining whether a system is capable of
supporting the Secure AVIC feature.

Secure AVIC feature support is determined based on:

- the secure_avic module parameter being set.
- the X86_FEATURE_SECURE_AVIC CPU feature bit being set.
- SNP being supported.

Co-developed-by: Kishon Vijay Abraham I <kvijayab@amd.com>
Signed-off-by: Kishon Vijay Abraham I <kvijayab@amd.com>
Signed-off-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com>
---
 arch/x86/include/asm/svm.h | 1 +
 arch/x86/kvm/svm/sev.c     | 9 +++++++++
 2 files changed, 10 insertions(+)

diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index ffc27f676243..ab3d55654c77 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -299,6 +299,7 @@ static_assert((X2AVIC_MAX_PHYSICAL_ID & AVIC_PHYSICAL_MAX_INDEX_MASK) == X2AVIC_
 #define SVM_SEV_FEAT_RESTRICTED_INJECTION		BIT(3)
 #define SVM_SEV_FEAT_ALTERNATE_INJECTION		BIT(4)
 #define SVM_SEV_FEAT_DEBUG_SWAP				BIT(5)
+#define SVM_SEV_FEAT_SECURE_AVIC			BIT(16)
 
 #define VMCB_ALLOWED_SEV_FEATURES_VALID			BIT_ULL(63)
 
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 5bac4d20aec0..b2eae102681c 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -59,6 +59,10 @@ static bool sev_es_debug_swap_enabled = true;
 module_param_named(debug_swap, sev_es_debug_swap_enabled, bool, 0444);
 static u64 sev_supported_vmsa_features;
 
+/* enable/disable SEV-SNP Secure AVIC support */
+bool sev_snp_savic_enabled = true;
+module_param_named(secure_avic, sev_snp_savic_enabled, bool, 0444);
+
 #define AP_RESET_HOLD_NONE		0
 #define AP_RESET_HOLD_NAE_EVENT		1
 #define AP_RESET_HOLD_MSR_PROTO		2
@@ -2911,6 +2915,8 @@ void __init sev_set_cpu_caps(void)
 		kvm_cpu_cap_set(X86_FEATURE_SEV_SNP);
 		kvm_caps.supported_vm_types |= BIT(KVM_X86_SNP_VM);
 	}
+	if (sev_snp_savic_enabled)
+		kvm_cpu_cap_set(X86_FEATURE_SECURE_AVIC);
 }
 
 static bool is_sev_snp_initialized(void)
@@ -3075,6 +3081,9 @@ void __init sev_hardware_setup(void)
 	    !cpu_feature_enabled(X86_FEATURE_NO_NESTED_DATA_BP))
 		sev_es_debug_swap_enabled = false;
 
+	if (!sev_snp_supported || !cpu_feature_enabled(X86_FEATURE_SECURE_AVIC))
+		sev_snp_savic_enabled = false;
+
 	sev_supported_vmsa_features = 0;
 	if (sev_es_debug_swap_enabled)
 		sev_supported_vmsa_features |= SVM_SEV_FEAT_DEBUG_SWAP;
-- 
2.34.1



* [RFC PATCH v2 04/17] KVM: SVM: Set guest APIC protection flags for Secure AVIC
  2025-09-23  5:03 [RFC PATCH v2 00/17] AMD: Add Secure AVIC KVM Support Neeraj Upadhyay
                   ` (2 preceding siblings ...)
  2025-09-23  5:03 ` [RFC PATCH v2 03/17] KVM: SVM: Add support for Secure AVIC capability in KVM Neeraj Upadhyay
@ 2025-09-23  5:03 ` Neeraj Upadhyay
  2025-09-23  5:03 ` [RFC PATCH v2 05/17] KVM: SVM: Do not intercept SECURE_AVIC_CONTROL MSR for SAVIC guests Neeraj Upadhyay
                   ` (13 subsequent siblings)
  17 siblings, 0 replies; 32+ messages in thread
From: Neeraj Upadhyay @ 2025-09-23  5:03 UTC (permalink / raw)
  To: kvm, seanjc, pbonzini
  Cc: linux-kernel, Thomas.Lendacky, nikunj, Santosh.Shukla,
	Vasant.Hegde, Suravee.Suthikulpanit, bp, David.Kaplan, huibo.wang,
	naveen.rao, tiala

Secure AVIC provides a hardware-backed, protected virtual APIC for
SNP guests. When this feature is active, KVM cannot directly access
the virtual APIC state and must use software-based interrupt injection
to deliver interrupts to the guest.

Introduce a helper, sev_savic_active(), to detect when a VM has Secure AVIC
enabled based on its VMSA features.

At vCPU creation time, use this helper to set the appropriate APIC flags:
 - guest_apic_protected is set to true, as the APIC state is not visible
   to KVM.
 - prot_apic_intr_inject is set to true to signal that the software
   injection path must be used for interrupt delivery.

This ensures that the core APIC code can correctly identify and handle
Secure AVIC guests.

This is only an initialization commit and actual support for creating
Secure AVIC enabled guests and injecting interrupts will be added in
later commits.

Signed-off-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com>
---
 arch/x86/kvm/svm/svm.c | 5 +++++
 arch/x86/kvm/svm/svm.h | 5 +++++
 2 files changed, 10 insertions(+)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 8a66e2e985a4..064ec98d7e67 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1300,6 +1300,11 @@ static int svm_vcpu_create(struct kvm_vcpu *vcpu)
 	if (err)
 		goto error_free_vmsa_page;
 
+	if (sev_savic_active(vcpu->kvm)) {
+		vcpu->arch.apic->guest_apic_protected = true;
+		vcpu->arch.apic->prot_apic_intr_inject = true;
+	}
+
 	svm->msrpm = svm_vcpu_alloc_msrpm();
 	if (!svm->msrpm) {
 		err = -ENOMEM;
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 70df7c6413cf..1090a48adeda 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -869,6 +869,10 @@ void sev_gmem_invalidate(kvm_pfn_t start, kvm_pfn_t end);
 int sev_gmem_max_mapping_level(struct kvm *kvm, kvm_pfn_t pfn, bool is_private);
 struct vmcb_save_area *sev_decrypt_vmsa(struct kvm_vcpu *vcpu);
 void sev_free_decrypted_vmsa(struct kvm_vcpu *vcpu, struct vmcb_save_area *vmsa);
+static inline bool sev_savic_active(struct kvm *kvm)
+{
+	return to_kvm_sev_info(kvm)->vmsa_features & SVM_SEV_FEAT_SECURE_AVIC;
+}
 #else
 static inline struct page *snp_safe_alloc_page_node(int node, gfp_t gfp)
 {
@@ -899,6 +903,7 @@ static inline int sev_gmem_max_mapping_level(struct kvm *kvm, kvm_pfn_t pfn, boo
 {
 	return 0;
 }
+static inline bool sev_savic_active(struct kvm *kvm) { return false; }
 
 static inline struct vmcb_save_area *sev_decrypt_vmsa(struct kvm_vcpu *vcpu)
 {
-- 
2.34.1



* [RFC PATCH v2 05/17] KVM: SVM: Do not intercept SECURE_AVIC_CONTROL MSR for SAVIC guests
  2025-09-23  5:03 [RFC PATCH v2 00/17] AMD: Add Secure AVIC KVM Support Neeraj Upadhyay
                   ` (3 preceding siblings ...)
  2025-09-23  5:03 ` [RFC PATCH v2 04/17] KVM: SVM: Set guest APIC protection flags for Secure AVIC Neeraj Upadhyay
@ 2025-09-23  5:03 ` Neeraj Upadhyay
  2025-09-23 13:55   ` Tom Lendacky
  2025-09-23  5:03 ` [RFC PATCH v2 06/17] KVM: SVM: Implement interrupt injection for Secure AVIC Neeraj Upadhyay
                   ` (12 subsequent siblings)
  17 siblings, 1 reply; 32+ messages in thread
From: Neeraj Upadhyay @ 2025-09-23  5:03 UTC (permalink / raw)
  To: kvm, seanjc, pbonzini
  Cc: linux-kernel, Thomas.Lendacky, nikunj, Santosh.Shukla,
	Vasant.Hegde, Suravee.Suthikulpanit, bp, David.Kaplan, huibo.wang,
	naveen.rao, tiala

Disable interception of the SECURE_AVIC_CONTROL MSR for Secure AVIC
enabled guests. The SECURE_AVIC_CONTROL MSR holds the GPA of the
guest APIC backing page, along with bitfields to control enablement of
Secure AVIC and whether the guest allows NMIs to be injected by the
hypervisor. This MSR is populated by the guest and can be read by the
guest to get the GPA of the APIC backing page. The MSR can only be
accessed in Secure AVIC mode; accessing it when not in Secure AVIC mode
results in a #GP. So, KVM should not intercept it.

Co-developed-by: Kishon Vijay Abraham I <kvijayab@amd.com>
Signed-off-by: Kishon Vijay Abraham I <kvijayab@amd.com>
Signed-off-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com>
---
 arch/x86/include/asm/msr-index.h | 1 +
 arch/x86/kvm/svm/sev.c           | 6 +++++-
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index b65c3ba5fa14..9f16030dd849 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -707,6 +707,7 @@
 #define MSR_AMD64_SEG_RMP_ENABLED_BIT	0
 #define MSR_AMD64_SEG_RMP_ENABLED	BIT_ULL(MSR_AMD64_SEG_RMP_ENABLED_BIT)
 #define MSR_AMD64_RMP_SEGMENT_SHIFT(x)	(((x) & GENMASK_ULL(13, 8)) >> 8)
+#define MSR_AMD64_SAVIC_CONTROL		0xc0010138
 
 #define MSR_SVSM_CAA			0xc001f000
 
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index b2eae102681c..afe4127a1918 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -4487,7 +4487,8 @@ void sev_vcpu_after_set_cpuid(struct vcpu_svm *svm)
 
 static void sev_es_init_vmcb(struct vcpu_svm *svm)
 {
-	struct kvm_sev_info *sev = to_kvm_sev_info(svm->vcpu.kvm);
+	struct kvm_vcpu *vcpu = &svm->vcpu;
+	struct kvm_sev_info *sev = to_kvm_sev_info(vcpu->kvm);
 	struct vmcb *vmcb = svm->vmcb01.ptr;
 
 	svm->vmcb->control.nested_ctl |= SVM_NESTED_CTL_SEV_ES_ENABLE;
@@ -4546,6 +4547,9 @@ static void sev_es_init_vmcb(struct vcpu_svm *svm)
 
 	/* Can't intercept XSETBV, HV can't modify XCR0 directly */
 	svm_clr_intercept(svm, INTERCEPT_XSETBV);
+
+	if (sev_savic_active(vcpu->kvm))
+		svm_set_intercept_for_msr(vcpu, MSR_AMD64_SAVIC_CONTROL, MSR_TYPE_RW, false);
 }
 
 void sev_init_vmcb(struct vcpu_svm *svm)
-- 
2.34.1



* [RFC PATCH v2 06/17] KVM: SVM: Implement interrupt injection for Secure AVIC
  2025-09-23  5:03 [RFC PATCH v2 00/17] AMD: Add Secure AVIC KVM Support Neeraj Upadhyay
                   ` (4 preceding siblings ...)
  2025-09-23  5:03 ` [RFC PATCH v2 05/17] KVM: SVM: Do not intercept SECURE_AVIC_CONTROL MSR for SAVIC guests Neeraj Upadhyay
@ 2025-09-23  5:03 ` Neeraj Upadhyay
  2025-09-23 14:47   ` Tom Lendacky
  2025-09-23  5:03 ` [RFC PATCH v2 07/17] KVM: SVM: Add IPI Delivery Support " Neeraj Upadhyay
                   ` (11 subsequent siblings)
  17 siblings, 1 reply; 32+ messages in thread
From: Neeraj Upadhyay @ 2025-09-23  5:03 UTC (permalink / raw)
  To: kvm, seanjc, pbonzini
  Cc: linux-kernel, Thomas.Lendacky, nikunj, Santosh.Shukla,
	Vasant.Hegde, Suravee.Suthikulpanit, bp, David.Kaplan, huibo.wang,
	naveen.rao, tiala

For AMD SEV-SNP guests with Secure AVIC, the virtual APIC state is not
visible to KVM and is managed by the hardware. This renders the
traditional interrupt injection mechanism, which directly modifies
guest state, unusable. Instead, interrupt delivery must be mediated
through a new interface in the VMCB. Implement support for this
mechanism.

First, new VMCB control fields, requested_irr and update_irr, are
defined to allow KVM to communicate pending interrupts to the hardware
before VMRUN.

Hook the core interrupt injection path, svm_inject_irq(). Instead of
injecting directly, transfer pending interrupts from KVM's software
IRR to the new requested_irr VMCB field and delegate final delivery
to the hardware.

Since the hardware is now responsible for the timing and delivery of
interrupts to the guest (including managing the guest's RFLAGS.IF and
vAPIC state), bypass the standard KVM interrupt window checks in
svm_interrupt_allowed() and svm_enable_irq_window(). Similarly, interrupt
re-injection is handled by the hardware and requires no explicit KVM
involvement.

Finally, update the logic for detecting pending interrupts. Add the
vendor op, protected_apic_has_interrupt(), to check only KVM's software
vAPIC IRR state.

Co-developed-by: Kishon Vijay Abraham I <kvijayab@amd.com>
Signed-off-by: Kishon Vijay Abraham I <kvijayab@amd.com>
Signed-off-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com>
---
 arch/x86/include/asm/svm.h |  8 +++++--
 arch/x86/kvm/lapic.c       | 17 ++++++++++++---
 arch/x86/kvm/svm/sev.c     | 44 ++++++++++++++++++++++++++++++++++++++
 arch/x86/kvm/svm/svm.c     | 13 +++++++++++
 arch/x86/kvm/svm/svm.h     |  4 ++++
 arch/x86/kvm/x86.c         | 15 ++++++++++++-
 6 files changed, 95 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index ab3d55654c77..0faf262f9f9f 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -162,10 +162,14 @@ struct __attribute__ ((__packed__)) vmcb_control_area {
 	u64 vmsa_pa;		/* Used for an SEV-ES guest */
 	u8 reserved_8[16];
 	u16 bus_lock_counter;		/* Offset 0x120 */
-	u8 reserved_9[22];
+	u8 reserved_9[18];
+	u8 update_irr;			/* Offset 0x134 */
+	u8 reserved_10[3];
 	u64 allowed_sev_features;	/* Offset 0x138 */
 	u64 guest_sev_features;		/* Offset 0x140 */
-	u8 reserved_10[664];
+	u8 reserved_11[8];
+	u32 requested_irr[8];		/* Offset 0x150 */
+	u8 reserved_12[624];
 	/*
 	 * Offset 0x3e0, 32 bytes reserved
 	 * for use by hypervisor/software.
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 5fc437341e03..3199c7c6db05 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -2938,11 +2938,22 @@ int kvm_apic_has_interrupt(struct kvm_vcpu *vcpu)
 	if (!kvm_apic_present(vcpu))
 		return -1;
 
-	if (apic->guest_apic_protected)
+	if (!apic->guest_apic_protected) {
+		__apic_update_ppr(apic, &ppr);
+		return apic_has_interrupt_for_ppr(apic, ppr);
+	}
+
+	if (!apic->prot_apic_intr_inject)
 		return -1;
 
-	__apic_update_ppr(apic, &ppr);
-	return apic_has_interrupt_for_ppr(apic, ppr);
+	/*
+	 * For a guest-protected virtual APIC, hardware manages the virtual
+	 * PPR and interrupt delivery to the guest, so checking the
+	 * KVM-managed virtual APIC's APIC_IRR state for pending vectors
+	 * is all that is required here.
+	 */
+	return apic_search_irr(apic);
+
 }
 EXPORT_SYMBOL_GPL(kvm_apic_has_interrupt);
 
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index afe4127a1918..78cefc14a2ee 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -28,6 +28,7 @@
 #include <asm/debugreg.h>
 #include <asm/msr.h>
 #include <asm/sev.h>
+#include <asm/apic.h>
 
 #include "mmu.h"
 #include "x86.h"
@@ -35,6 +36,7 @@
 #include "svm_ops.h"
 #include "cpuid.h"
 #include "trace.h"
+#include "lapic.h"
 
 #define GHCB_VERSION_MAX	2ULL
 #define GHCB_VERSION_DEFAULT	2ULL
@@ -5064,3 +5066,45 @@ void sev_free_decrypted_vmsa(struct kvm_vcpu *vcpu, struct vmcb_save_area *vmsa)
 
 	free_page((unsigned long)vmsa);
 }
+
+void sev_savic_set_requested_irr(struct vcpu_svm *svm, bool reinjected)
+{
+	unsigned int i, vec, vec_pos, vec_start;
+	struct kvm_lapic *apic;
+	bool has_interrupts;
+	u32 val;
+
+	/* Secure AVIC HW takes care of re-injection */
+	if (reinjected)
+		return;
+
+	apic = svm->vcpu.arch.apic;
+	has_interrupts = false;
+
+	for (i = 0; i < ARRAY_SIZE(svm->vmcb->control.requested_irr); i++) {
+		val = apic_get_reg(apic->regs, APIC_IRR + i * 0x10);
+		if (!val)
+			continue;
+		has_interrupts = true;
+		svm->vmcb->control.requested_irr[i] |= val;
+		vec_start = i * 32;
+		/*
+		 * Clear each vector one by one to avoid race with concurrent
+		 * APIC_IRR updates from the deliver_interrupt() path.
+		 */
+		do {
+			vec_pos = __ffs(val);
+			vec = vec_start + vec_pos;
+			apic_clear_vector(vec, apic->regs + APIC_IRR);
+			val = val & ~BIT(vec_pos);
+		} while (val);
+	}
+
+	if (has_interrupts)
+		svm->vmcb->control.update_irr |= BIT(0);
+}
+
+bool sev_savic_has_pending_interrupt(struct kvm_vcpu *vcpu)
+{
+	return kvm_apic_has_interrupt(vcpu) != -1;
+}
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 064ec98d7e67..7811a87bc111 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -52,6 +52,8 @@
 #include "svm.h"
 #include "svm_ops.h"
 
+#include "lapic.h"
+
 #include "kvm_onhyperv.h"
 #include "svm_onhyperv.h"
 
@@ -3689,6 +3691,9 @@ static void svm_inject_irq(struct kvm_vcpu *vcpu, bool reinjected)
 	struct vcpu_svm *svm = to_svm(vcpu);
 	u32 type;
 
+	if (sev_savic_active(vcpu->kvm))
+		return sev_savic_set_requested_irr(svm, reinjected);
+
 	if (vcpu->arch.interrupt.soft) {
 		if (svm_update_soft_interrupt_rip(vcpu))
 			return;
@@ -3870,6 +3875,9 @@ static int svm_interrupt_allowed(struct kvm_vcpu *vcpu, bool for_injection)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
 
+	if (sev_savic_active(vcpu->kvm))
+		return 1;
+
 	if (svm->nested.nested_run_pending)
 		return -EBUSY;
 
@@ -3890,6 +3898,9 @@ static void svm_enable_irq_window(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
 
+	if (sev_savic_active(vcpu->kvm))
+		return;
+
 	/*
 	 * In case GIF=0 we can't rely on the CPU to tell us when GIF becomes
 	 * 1, because that's a separate STGI/VMRUN intercept.  The next time we
@@ -5132,6 +5143,8 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
 	.apicv_post_state_restore = avic_apicv_post_state_restore,
 	.required_apicv_inhibits = AVIC_REQUIRED_APICV_INHIBITS,
 
+	.protected_apic_has_interrupt = sev_savic_has_pending_interrupt,
+
 	.get_exit_info = svm_get_exit_info,
 	.get_entry_info = svm_get_entry_info,
 
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 1090a48adeda..60dc424d62c4 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -873,6 +873,8 @@ static inline bool sev_savic_active(struct kvm *kvm)
 {
 	return to_kvm_sev_info(kvm)->vmsa_features & SVM_SEV_FEAT_SECURE_AVIC;
 }
+void sev_savic_set_requested_irr(struct vcpu_svm *svm, bool reinjected);
+bool sev_savic_has_pending_interrupt(struct kvm_vcpu *vcpu);
 #else
 static inline struct page *snp_safe_alloc_page_node(int node, gfp_t gfp)
 {
@@ -910,6 +912,8 @@ static inline struct vmcb_save_area *sev_decrypt_vmsa(struct kvm_vcpu *vcpu)
 	return NULL;
 }
 static inline void sev_free_decrypted_vmsa(struct kvm_vcpu *vcpu, struct vmcb_save_area *vmsa) {}
+static inline void sev_savic_set_requested_irr(struct vcpu_svm *svm, bool reinjected) {}
+static inline bool sev_savic_has_pending_interrupt(struct kvm_vcpu *vcpu) { return false; }
 #endif
 
 /* vmenter.S */
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 33fba801b205..65ebdc6deb92 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10369,7 +10369,20 @@ static int kvm_check_and_inject_events(struct kvm_vcpu *vcpu,
 		if (r < 0)
 			goto out;
 		if (r) {
-			int irq = kvm_cpu_get_interrupt(vcpu);
+			int irq;
+
+			/*
+			 * Do not ack the interrupt here for a guest-protected
+			 * vAPIC that requires interrupt injection into the guest.
+			 *
+			 * ->inject_irq reads KVM's vAPIC APIC_IRR state and
+			 * clears it.
+			 */
+			if (vcpu->arch.apic->guest_apic_protected &&
+			    vcpu->arch.apic->prot_apic_intr_inject)
+				irq = kvm_apic_has_interrupt(vcpu);
+			else
+				irq = kvm_cpu_get_interrupt(vcpu);
 
 			if (!WARN_ON_ONCE(irq == -1)) {
 				kvm_queue_interrupt(vcpu, irq, false);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [RFC PATCH v2 07/17] KVM: SVM: Add IPI Delivery Support for Secure AVIC
  2025-09-23  5:03 [RFC PATCH v2 00/17] AMD: Add Secure AVIC KVM Support Neeraj Upadhyay
                   ` (5 preceding siblings ...)
  2025-09-23  5:03 ` [RFC PATCH v2 06/17] KVM: SVM: Implement interrupt injection for Secure AVIC Neeraj Upadhyay
@ 2025-09-23  5:03 ` Neeraj Upadhyay
  2025-09-23  5:03 ` [RFC PATCH v2 08/17] KVM: SVM: Do not inject exception " Neeraj Upadhyay
                   ` (10 subsequent siblings)
  17 siblings, 0 replies; 32+ messages in thread
From: Neeraj Upadhyay @ 2025-09-23  5:03 UTC (permalink / raw)
  To: kvm, seanjc, pbonzini
  Cc: linux-kernel, Thomas.Lendacky, nikunj, Santosh.Shukla,
	Vasant.Hegde, Suravee.Suthikulpanit, bp, David.Kaplan, huibo.wang,
	naveen.rao, tiala

Secure AVIC hardware only accelerates Self-IPI, i.e. on WRMSR to
APIC_SELF_IPI and APIC_ICR (with destination shorthand equal to "self")
registers, hardware takes care of updating the APIC_IRR in the guest-owned
APIC backing page of the vCPU. For other IPI types (cross-vCPU, broadcast
IPIs), software must update the APIC_IRR state in the target vCPUs'
APIC backing pages and ensure that each target vCPU notices the new
pending interrupt.
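
The per-vector bookkeeping involved is simple: APIC_IRR is a 256-bit bitmap
spread over eight 32-bit registers. A minimal Python sketch of marking a
vector pending in a modeled backing page, with the register/bit split spelled
out (illustrative only, not the guest or KVM code):

```python
def set_irr_vector(irr_regs, vec):
    """Mark vector 'vec' (0-255) pending in an 8x32-bit APIC_IRR model."""
    assert 0 <= vec <= 255
    # Register index is vec // 32, bit position within it is vec % 32.
    irr_regs[vec // 32] |= 1 << (vec % 32)
    return irr_regs
```

For example, vector 0x31 sets bit 17 of the second 32-bit register.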

To ensure that the remote vCPU notices the new pending interrupt, the guest
sends an APIC_ICR MSR-write GHCB protocol event to the hypervisor.

Handle the APIC_ICR write MSR exits for Secure AVIC guests by either
sending an AVIC doorbell (if the target vCPU is running) or by waking up
the non-running target vCPU thread.

To ensure that the target vCPU observes the new IPI request, introduce a
new per-vcpu flag, sev_savic_has_pending_ipi. This flag acts as a reliable
"sticky bit" that signals a pending IPI, ensuring the event is not lost
even if the primary wakeup mechanism is missed. Update
sev_savic_has_pending_interrupt() to return true if
sev_savic_has_pending_ipi is set. This ensures that when a vCPU is about
to block (in kvm_vcpu_block()), it correctly recognizes that it has work
to do and will not go to sleep.

Clear the sev_savic_has_pending_ipi flag in pre_sev_run() just before the
next VM-entry. This resets the one-shot signal, as the pending interrupt
is now about to be processed by the hardware upon VMRUN.

During APIC_ICR write GHCB request handling, unconditionally set
sev_savic_has_pending_ipi for the target vCPU, irrespective of whether
the target vCPU is in guest mode. If the target vCPU takes no other
VMEXIT before its next hlt exit, blocking fails because
sev_savic_has_pending_ipi remains set. The flag is cleared before the
next VMRUN, so the vCPU thread blocks on a subsequent hlt exit.

The following race conditions can occur between the target vCPU
executing hlt and the source vCPU's IPI request handling.

a. VMEXIT before HLT when RFLAGS.IF = 0 or Interrupt shadow is active.

   #Source-vCPU                          #Target-VCPU

   1. sev_savic_has_pending_ipi = true
   2. smp_mb();
                                         3. Disable interrupts
   4. Target vCPU is in guest mode
   5. Raise AVIC doorbell to target
      vCPU's physical APIC_ID
                                         6. VMEXIT
                                         7. sev_savic_has_pending_ipi =
                                            false
                                         8. VMRUN
                                         9. HLT
                                        10. VMEXIT
                                        11. kvm_arch_vcpu_runnable()
                                            returns false
                                        12. vCPU thread blocks

   In this scenario, the idle HLT intercept ensures that the target
   vCPU does not take the hlt intercept, as V_INTR is set (the AVIC
   doorbell from the source vCPU triggers evaluation of the target
   vCPU's Secure AVIC backing page and sets V_INTR).

b. The target vCPU takes a HLT VMEXIT but has not cleared IN_GUEST_MODE
   at the time the doorbell write is issued by the source CPU.

   #Source-vCPU                          #Target-VCPU

   1. sev_savic_has_pending_ipi = true
   2. smp_mb();
   3. Target vCPU is in guest mode
                                         4. HLT
                                         5. VMEXIT
   6. Raise AVIC doorbell to the target
      physical CPU.
                                         7. vcpu->mode =
                                              OUTSIDE_GUEST_MODE
                                         8. kvm_cpu_has_interrupt()
                                             protected_..._interrupt()
                                              smp_mb()
                                              sev_savic_has_pending_ipi is
                                              true

   In this case, the smp_mb() barriers at steps 2 and 8 guarantee that
   the target vCPU's thread observes sev_savic_has_pending_ipi as set
   and returns to guest mode without blocking.

c. For other cases, where the source vCPU thread observes the target vCPU
   to be outside of guest mode, the memory barriers in rcuwait_wake_up()
   (source vCPU thread) and set_current_state() (target vCPU thread)
   provide the required ordering and ensure that the read of
   sev_savic_has_pending_ipi in kvm_vcpu_check_block() observes the write
   by the source vCPU.

   #Source-vCPU                          #Target-VCPU

   rcuwait_wake_up()
     smp_mb()
     task = rcu_dereference(w->task);
     if (task)
       wake_up_process()
                                        prepare_to_rcuwait()
                                          w->task = current
                                        set_current_state(
                                            TASK_INTERRUPTIBLE)
                                          smp_mb()
                                        kvm_vcpu_check_block()
                                          kvm_cpu_has_interrupt()
                                            <Read sev_savic_has_..._ipi>

Co-developed-by: Kishon Vijay Abraham I <kvijayab@amd.com>
Signed-off-by: Kishon Vijay Abraham I <kvijayab@amd.com>
Signed-off-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com>
---
 arch/x86/kvm/svm/sev.c | 218 ++++++++++++++++++++++++++++++++++++++++-
 arch/x86/kvm/svm/svm.h |   2 +
 2 files changed, 219 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 78cefc14a2ee..a64fcc7637c7 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3511,6 +3511,89 @@ int pre_sev_run(struct vcpu_svm *svm, int cpu)
 	if (!cpumask_test_cpu(cpu, to_kvm_sev_info(kvm)->have_run_cpus))
 		cpumask_set_cpu(cpu, to_kvm_sev_info(kvm)->have_run_cpus);
 
+	/*
+	 * It should be safe to clear sev_savic_has_pending_ipi here.
+	 *
+	 * The following scenarios are possible:
+	 *
+	 * Scenario 1: sev_savic_has_pending_ipi is set before hlt exit of the
+	 * target vCPU.
+	 *
+	 * Source vCPU                     Target vCPU
+	 *
+	 * 1. Set APIC_IRR of target
+	 *    vCPU.
+	 *
+	 * 2. VMGEXIT
+	 *
+	 * 3. Set ...has_pending_ipi
+	 *
+	 * savic_handle_icr_write()
+	 *   ..._has_pending_ipi = true
+	 *
+	 * 4. avic_ring_doorbell()
+	 *                            - VS -
+	 *
+	 *				   4. VMEXIT
+	 *
+	 *                                 5. ..._has_pending_ipi = false
+	 *
+	 *                                 6. VM entry
+	 *
+	 *                                 7. hlt exit
+	 *
+	 * In this case, any VM exit taken by target vCPU before hlt exit
+	 * clears sev_savic_has_pending_ipi. On hlt exit, idle halt intercept
+	 * would find the V_INTR set and skip hlt exit.
+	 *
+	 * Scenario 2: sev_savic_has_pending_ipi is set when target vCPU
+	 * has taken hlt exit.
+	 *
+	 * Source vCPU                     Target vCPU
+	 *
+	 *                                 1. hlt exit
+	 *
+	 * 2. Set ...has_pending_ipi
+	 *                                 3. kvm_vcpu_has_events() returns true
+	 *                                    and VM is reentered.
+	 *
+	 *                                    vcpu_block()
+	 *                                      kvm_arch_vcpu_runnable()
+	 *                                        kvm_vcpu_has_events()
+	 *                                          <return true as ..._has_pending_ipi
+	 *                                           is set>
+	 *
+	 *                                 4. On VM entry, APIC_IRR state is re-evaluated
+	 *                                    and V_INTR is set and interrupt is delivered
+	 *                                    to vCPU.
+	 *
+	 *
+	 * Scenario 3: sev_savic_has_pending_ipi is set while halt exit is happening:
+	 *
+	 *
+	 * Source vCPU                        Target vCPU
+	 *
+	 *                                  1. hlt
+	 *                                       Hardware checks V_INTR to determine
+	 *                                       if a hlt exit needs to be taken. No other
+	 *                                       exit such as intr exit can be taken
+	 *                                       while this sequence is being executed.
+	 *
+	 * 2. Set APIC_IRR of target vCPU.
+	 *
+	 * 3. Set ...has_pending_ipi
+	 *                                  4. hlt exit taken.
+	 *
+	 *                                  5. ...has_pending_ipi being set is observed
+	 *                                     by target vCPU and the vCPU is resumed.
+	 *
+	 * In this scenario, hardware ensures that target vCPU does not take any exit
+	 * between checking V_INTR state and halt exit. So, sev_savic_has_pending_ipi
+	 * remains set when vCPU takes hlt exit.
+	 */
+	if (READ_ONCE(svm->sev_savic_has_pending_ipi))
+		WRITE_ONCE(svm->sev_savic_has_pending_ipi, false);
+
 	/* Assign the asid allocated with this SEV guest */
 	svm->asid = asid;
 
@@ -4281,6 +4364,129 @@ static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
 	return 0;
 }
 
+static void savic_handle_icr_write(struct kvm_vcpu *kvm_vcpu, u64 icr)
+{
+	struct kvm *kvm = kvm_vcpu->kvm;
+	struct kvm_vcpu *vcpu;
+	u32 icr_low, icr_high;
+	bool in_guest_mode;
+	unsigned long i;
+
+	icr_low = lower_32_bits(icr);
+	icr_high = upper_32_bits(icr);
+
+	/*
+	 * TODO: Instead of scanning all the vCPUs, get fastpath working which should
+	 * look similar to avic_kick_target_vcpus_fast().
+	 */
+	kvm_for_each_vcpu(i, vcpu, kvm) {
+		if (!kvm_apic_match_dest(vcpu, kvm_vcpu->arch.apic, icr_low & APIC_SHORT_MASK,
+					 icr_high, icr_low & APIC_DEST_MASK))
+			continue;
+
+		/*
+		 * Setting sev_savic_has_pending_ipi could result in a spurious
+		 * return from hlt (as kvm_cpu_has_interrupt() would return true)
+		 * if destination CPU is in guest mode and the guest takes a hlt
+		 * exit after handling the IPI. sev_savic_has_pending_ipi gets cleared
+		 * on VM entry, so there can be at most one spurious return per IPI.
+		 * For vcpu->mode == IN_GUEST_MODE, sev_savic_has_pending_ipi needs
+		 * to be set to handle the case where the destination vCPU has taken
+		 * hlt exit and the source CPU has not observed (target)vcpu->mode !=
+		 * IN_GUEST_MODE.
+		 */
+		WRITE_ONCE(to_svm(vcpu)->sev_savic_has_pending_ipi, true);
+		/* Order sev_savic_has_pending_ipi write and vcpu->mode read. */
+		smp_mb();
+		/* Pairs with smp_store_release in vcpu_enter_guest. */
+		in_guest_mode = (smp_load_acquire(&vcpu->mode) == IN_GUEST_MODE);
+		if (in_guest_mode) {
+			/*
+			 * Signal the doorbell to tell hardware to inject the IRQ.
+			 *
+			 * If the vCPU exits the guest before the doorbell chimes,
+			 * below memory ordering guarantees that the destination vCPU
+			 * observes sev_savic_has_pending_ipi == true before
+			 * blocking.
+			 *
+			 *   Src-CPU                       Dest-CPU
+			 *
+			 *  savic_handle_icr_write()
+			 *    sev_savic_has_pending_ipi = true
+			 *    smp_mb()
+			 *    smp_load_acquire(&vcpu->mode)
+			 *
+			 *                    - VS -
+			 *                              vcpu->mode = OUTSIDE_GUEST_MODE
+			 *                              __kvm_emulate_halt()
+			 *                                kvm_cpu_has_interrupt()
+			 *                                  smp_mb()
+			 *                                  if (sev_savic_has_pending_ipi)
+			 *                                      return true;
+			 *
+			 *   [S1]
+			 *     sev_savic_has_pending_ipi = true
+			 *
+			 *     SMP_MB
+			 *
+			 *   [L1]
+			 *     vcpu->mode
+			 *                                  [S2]
+			 *                                  vcpu->mode = OUTSIDE_GUEST_MODE
+			 *
+			 *
+			 *                                  SMP_MB
+			 *
+			 *                                  [L2] sev_savic_has_pending_ipi == true
+			 *
+			 *   exists (L1=IN_GUEST_MODE /\ L2=false)
+			 *
+			 *   Above condition does not exist. So, if the source CPU observes
+			 *   vcpu->mode = IN_GUEST_MODE (L1), the sev_savic_has_pending_ipi
+			 *   load by the destination CPU (L2) observes the store (S1) from
+			 *   the source CPU.
+			 */
+			avic_ring_doorbell(vcpu);
+		} else {
+			/*
+			 * Wakeup the vCPU if it was blocking.
+			 *
+			 * Memory ordering is provided by smp_mb() in rcuwait_wake_up() on the
+			 * source CPU and smp_mb() in set_current_state() inside kvm_vcpu_block()
+			 * on the destination CPU.
+			 */
+			kvm_vcpu_kick(vcpu);
+		}
+	}
+}
+
+static bool savic_handle_msr_exit(struct kvm_vcpu *vcpu)
+{
+	u32 msr, reg;
+
+	msr = kvm_rcx_read(vcpu);
+	reg = (msr - APIC_BASE_MSR) << 4;
+
+	switch (reg) {
+	case APIC_ICR:
+		/*
+		 * Only APIC_ICR WRMSR requires special handling for Secure AVIC
+		 * guests to wake up destination vCPUs.
+		 */
+		if (to_svm(vcpu)->vmcb->control.exit_info_1) {
+			u64 data = kvm_read_edx_eax(vcpu);
+
+			savic_handle_icr_write(vcpu, data);
+			return true;
+		}
+		break;
+	default:
+		break;
+	}
+
+	return false;
+}
+
 int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
@@ -4419,6 +4625,11 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
 			    control->exit_info_1, control->exit_info_2);
 		ret = -EINVAL;
 		break;
+	case SVM_EXIT_MSR:
+		if (sev_savic_active(vcpu->kvm) && savic_handle_msr_exit(vcpu))
+			return 1;
+
+		fallthrough;
 	default:
 		ret = svm_invoke_exit_handler(vcpu, exit_code);
 	}
@@ -5106,5 +5317,10 @@ void sev_savic_set_requested_irr(struct vcpu_svm *svm, bool reinjected)
 
 bool sev_savic_has_pending_interrupt(struct kvm_vcpu *vcpu)
 {
-	return kvm_apic_has_interrupt(vcpu) != -1;
+	/*
+	 * See memory ordering description in savic_handle_icr_write().
+	 */
+	smp_mb();
+	return READ_ONCE(to_svm(vcpu)->sev_savic_has_pending_ipi) ||
+		kvm_apic_has_interrupt(vcpu) != -1;
 }
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 60dc424d62c4..a3edb6e720cd 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -335,6 +335,8 @@ struct vcpu_svm {
 
 	/* Guest GIF value, used when vGIF is not enabled */
 	bool guest_gif;
+
+	bool sev_savic_has_pending_ipi;
 };
 
 struct svm_cpu_data {
-- 
2.34.1



* [RFC PATCH v2 08/17] KVM: SVM: Do not inject exception for Secure AVIC
  2025-09-23  5:03 [RFC PATCH v2 00/17] AMD: Add Secure AVIC KVM Support Neeraj Upadhyay
                   ` (6 preceding siblings ...)
  2025-09-23  5:03 ` [RFC PATCH v2 07/17] KVM: SVM: Add IPI Delivery Support " Neeraj Upadhyay
@ 2025-09-23  5:03 ` Neeraj Upadhyay
  2025-09-23 15:00   ` Tom Lendacky
  2025-09-23  5:03 ` [RFC PATCH v2 09/17] KVM: SVM: Do not intercept exceptions for Secure AVIC guests Neeraj Upadhyay
                   ` (9 subsequent siblings)
  17 siblings, 1 reply; 32+ messages in thread
From: Neeraj Upadhyay @ 2025-09-23  5:03 UTC (permalink / raw)
  To: kvm, seanjc, pbonzini
  Cc: linux-kernel, Thomas.Lendacky, nikunj, Santosh.Shukla,
	Vasant.Hegde, Suravee.Suthikulpanit, bp, David.Kaplan, huibo.wang,
	naveen.rao, tiala

From: Kishon Vijay Abraham I <kvijayab@amd.com>

Secure AVIC does not support injecting exceptions from the hypervisor.
Take an early return from svm_inject_exception() for Secure AVIC enabled
guests.

Hardware takes care of delivering exceptions initiated by the guest, as
well as re-injecting them if an intercept is taken before the exception
is delivered.

Signed-off-by: Kishon Vijay Abraham I <kvijayab@amd.com>
Signed-off-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com>
---
 arch/x86/kvm/svm/svm.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 7811a87bc111..fdd612c975ae 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -374,6 +374,9 @@ static void svm_inject_exception(struct kvm_vcpu *vcpu)
 	struct kvm_queued_exception *ex = &vcpu->arch.exception;
 	struct vcpu_svm *svm = to_svm(vcpu);
 
+	if (sev_savic_active(vcpu->kvm))
+		return;
+
 	kvm_deliver_exception_payload(vcpu, ex);
 
 	if (kvm_exception_is_soft(ex->vector) &&
-- 
2.34.1



* [RFC PATCH v2 09/17] KVM: SVM: Do not intercept exceptions for Secure AVIC guests
  2025-09-23  5:03 [RFC PATCH v2 00/17] AMD: Add Secure AVIC KVM Support Neeraj Upadhyay
                   ` (7 preceding siblings ...)
  2025-09-23  5:03 ` [RFC PATCH v2 08/17] KVM: SVM: Do not inject exception " Neeraj Upadhyay
@ 2025-09-23  5:03 ` Neeraj Upadhyay
  2025-09-23 15:15   ` Tom Lendacky
  2025-09-23  5:03 ` [RFC PATCH v2 10/17] KVM: SVM: Set VGIF in VMSA area " Neeraj Upadhyay
                   ` (8 subsequent siblings)
  17 siblings, 1 reply; 32+ messages in thread
From: Neeraj Upadhyay @ 2025-09-23  5:03 UTC (permalink / raw)
  To: kvm, seanjc, pbonzini
  Cc: linux-kernel, Thomas.Lendacky, nikunj, Santosh.Shukla,
	Vasant.Hegde, Suravee.Suthikulpanit, bp, David.Kaplan, huibo.wang,
	naveen.rao, tiala

Exceptions cannot be injected from the hypervisor into Secure AVIC
enabled guests. If KVM were to intercept an exception (e.g., #PF or
#GP), it would be unable to deliver it back to the guest, effectively
dropping the event and leading to guest misbehavior or hangs. So, clear
the exception intercepts so that all exceptions are handled directly by
the guest without KVM intervention.

Signed-off-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com>
---
 arch/x86/kvm/svm/sev.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index a64fcc7637c7..837ab55a3330 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -4761,8 +4761,17 @@ static void sev_es_init_vmcb(struct vcpu_svm *svm)
 	/* Can't intercept XSETBV, HV can't modify XCR0 directly */
 	svm_clr_intercept(svm, INTERCEPT_XSETBV);
 
-	if (sev_savic_active(vcpu->kvm))
+	if (sev_savic_active(vcpu->kvm)) {
 		svm_set_intercept_for_msr(vcpu, MSR_AMD64_SAVIC_CONTROL, MSR_TYPE_RW, false);
+
+		/* Clear all exception intercepts. */
+		clr_exception_intercept(svm, PF_VECTOR);
+		clr_exception_intercept(svm, UD_VECTOR);
+		clr_exception_intercept(svm, MC_VECTOR);
+		clr_exception_intercept(svm, AC_VECTOR);
+		clr_exception_intercept(svm, DB_VECTOR);
+		clr_exception_intercept(svm, GP_VECTOR);
+	}
 }
 
 void sev_init_vmcb(struct vcpu_svm *svm)
-- 
2.34.1



* [RFC PATCH v2 10/17] KVM: SVM: Set VGIF in VMSA area for Secure AVIC guests
  2025-09-23  5:03 [RFC PATCH v2 00/17] AMD: Add Secure AVIC KVM Support Neeraj Upadhyay
                   ` (8 preceding siblings ...)
  2025-09-23  5:03 ` [RFC PATCH v2 09/17] KVM: SVM: Do not intercept exceptions for Secure AVIC guests Neeraj Upadhyay
@ 2025-09-23  5:03 ` Neeraj Upadhyay
  2025-09-23 15:16   ` Tom Lendacky
  2025-09-23  5:03 ` [RFC PATCH v2 11/17] KVM: SVM: Enable NMI support " Neeraj Upadhyay
                   ` (7 subsequent siblings)
  17 siblings, 1 reply; 32+ messages in thread
From: Neeraj Upadhyay @ 2025-09-23  5:03 UTC (permalink / raw)
  To: kvm, seanjc, pbonzini
  Cc: linux-kernel, Thomas.Lendacky, nikunj, Santosh.Shukla,
	Vasant.Hegde, Suravee.Suthikulpanit, bp, David.Kaplan, huibo.wang,
	naveen.rao, tiala

From: Kishon Vijay Abraham I <kvijayab@amd.com>

Unlike standard SVM, which uses the V_GIF (Virtual Global Interrupt Flag)
bit in the VMCB, Secure AVIC ignores this field.

Instead, the hardware requires an equivalent V_GIF bit to be set within
the vintr_ctrl field of the VMSA (Virtual Machine Save Area). Failure
to set this bit will cause the hardware to block all interrupt delivery,
rendering the guest non-functional.

To enable interrupts for Secure AVIC guests, modify sev_es_sync_vmsa()
to unconditionally set the V_GIF_MASK in the VMSA's vintr_ctrl field
whenever Secure AVIC is active. This ensures the hardware correctly
identifies the guest as interruptible.

Signed-off-by: Kishon Vijay Abraham I <kvijayab@amd.com>
Signed-off-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com>
---
 arch/x86/kvm/svm/sev.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 837ab55a3330..2dee210efb37 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -884,6 +884,9 @@ static int sev_es_sync_vmsa(struct vcpu_svm *svm)
 
 	save->sev_features = sev->vmsa_features;
 
+	if (sev_savic_active(vcpu->kvm))
+		save->vintr_ctrl |= V_GIF_MASK;
+
 	/*
 	 * Skip FPU and AVX setup with KVM_SEV_ES_INIT to avoid
 	 * breaking older measurements.
-- 
2.34.1



* [RFC PATCH v2 11/17] KVM: SVM: Enable NMI support for Secure AVIC guests
  2025-09-23  5:03 [RFC PATCH v2 00/17] AMD: Add Secure AVIC KVM Support Neeraj Upadhyay
                   ` (9 preceding siblings ...)
  2025-09-23  5:03 ` [RFC PATCH v2 10/17] KVM: SVM: Set VGIF in VMSA area " Neeraj Upadhyay
@ 2025-09-23  5:03 ` Neeraj Upadhyay
  2025-09-23 15:25   ` Tom Lendacky
  2025-09-23  5:03 ` [RFC PATCH v2 12/17] KVM: SVM: Add VMGEXIT handler for Secure AVIC backing page Neeraj Upadhyay
                   ` (6 subsequent siblings)
  17 siblings, 1 reply; 32+ messages in thread
From: Neeraj Upadhyay @ 2025-09-23  5:03 UTC (permalink / raw)
  To: kvm, seanjc, pbonzini
  Cc: linux-kernel, Thomas.Lendacky, nikunj, Santosh.Shukla,
	Vasant.Hegde, Suravee.Suthikulpanit, bp, David.Kaplan, huibo.wang,
	naveen.rao, tiala

The Secure AVIC hardware introduces a new model for handling Non-Maskable
Interrupts (NMIs). This model differs significantly from standard SVM, as
guest NMI state is managed by the hardware and is not visible to KVM.

Consequently, KVM can no longer use the generic EVENT_INJ mechanism and
must not track NMI masking state in software. Instead, it must adopt the
vNMI (Virtual NMI) flow, which is the only mechanism supported by
Secure AVIC.

Enable NMI support by making three key changes:

1.  Enable NMI in VMSA: Set the V_NMI_ENABLE_MASK bit in the VMSA's
    vintr_ctrl field. This is a hardware prerequisite to enable the
    vNMI feature for the guest.

2.  Use vNMI for Injection: Modify svm_inject_nmi() to use the vNMI
    flow for Secure AVIC guests. When an NMI is requested, set the
    V_NMI_PENDING_MASK in the VMCB instead of using EVENT_INJ.

3.  Update NMI Windowing: Modify svm_nmi_allowed() to reflect that
    hardware now manages NMI blocking. KVM's only responsibility is to
    avoid queuing a new vNMI if one is already pending. The check is
    now simplified to whether V_NMI_PENDING_MASK is already set.
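
Point 3 reduces the windowing decision to a single bit test. As a minimal
Python sketch of that decision (the mask value below is an illustrative
placeholder; the real V_NMI_PENDING_MASK definition lives in
arch/x86/include/asm/svm.h):

```python
V_NMI_PENDING_MASK = 1 << 11  # illustrative placeholder for the svm.h constant

def savic_nmi_allowed(int_ctl):
    """A new vNMI may be queued only if none is already pending."""
    return not (int_ctl & V_NMI_PENDING_MASK)
```

Hardware handles the actual NMI blocking; KVM only avoids double-queuing.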

Co-developed-by: Kishon Vijay Abraham I <kvijayab@amd.com>
Signed-off-by: Kishon Vijay Abraham I <kvijayab@amd.com>
Signed-off-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com>
---
 arch/x86/kvm/svm/sev.c |  2 +-
 arch/x86/kvm/svm/svm.c | 56 ++++++++++++++++++++++++++----------------
 2 files changed, 36 insertions(+), 22 deletions(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 2dee210efb37..7c66aefe428a 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -885,7 +885,7 @@ static int sev_es_sync_vmsa(struct vcpu_svm *svm)
 	save->sev_features = sev->vmsa_features;
 
 	if (sev_savic_active(vcpu->kvm))
-		save->vintr_ctrl |= V_GIF_MASK;
+		save->vintr_ctrl |= V_GIF_MASK | V_NMI_ENABLE_MASK;
 
 	/*
 	 * Skip FPU and AVX setup with KVM_SEV_ES_INIT to avoid
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index fdd612c975ae..a945bc094c1a 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -3635,27 +3635,6 @@ static int pre_svm_run(struct kvm_vcpu *vcpu)
 	return 0;
 }
 
-static void svm_inject_nmi(struct kvm_vcpu *vcpu)
-{
-	struct vcpu_svm *svm = to_svm(vcpu);
-
-	svm->vmcb->control.event_inj = SVM_EVTINJ_VALID | SVM_EVTINJ_TYPE_NMI;
-
-	if (svm->nmi_l1_to_l2)
-		return;
-
-	/*
-	 * No need to manually track NMI masking when vNMI is enabled, hardware
-	 * automatically sets V_NMI_BLOCKING_MASK as appropriate, including the
-	 * case where software directly injects an NMI.
-	 */
-	if (!is_vnmi_enabled(svm)) {
-		svm->nmi_masked = true;
-		svm_set_iret_intercept(svm);
-	}
-	++vcpu->stat.nmi_injections;
-}
-
 static bool svm_is_vnmi_pending(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
@@ -3689,6 +3668,33 @@ static bool svm_set_vnmi_pending(struct kvm_vcpu *vcpu)
 	return true;
 }
 
+static void svm_inject_nmi(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_svm *svm = to_svm(vcpu);
+
+	if (sev_savic_active(vcpu->kvm)) {
+		svm_set_vnmi_pending(vcpu);
+		++vcpu->stat.nmi_injections;
+		return;
+	}
+
+	svm->vmcb->control.event_inj = SVM_EVTINJ_VALID | SVM_EVTINJ_TYPE_NMI;
+
+	if (svm->nmi_l1_to_l2)
+		return;
+
+	/*
+	 * No need to manually track NMI masking when vNMI is enabled, hardware
+	 * automatically sets V_NMI_BLOCKING_MASK as appropriate, including the
+	 * case where software directly injects an NMI.
+	 */
+	if (!is_vnmi_enabled(svm)) {
+		svm->nmi_masked = true;
+		svm_set_iret_intercept(svm);
+	}
+	++vcpu->stat.nmi_injections;
+}
+
 static void svm_inject_irq(struct kvm_vcpu *vcpu, bool reinjected)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
@@ -3836,6 +3842,14 @@ bool svm_nmi_blocked(struct kvm_vcpu *vcpu)
 static int svm_nmi_allowed(struct kvm_vcpu *vcpu, bool for_injection)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
+
+	/* Secure AVIC only supports V_NMI-based NMI injection. */
+	if (sev_savic_active(vcpu->kvm)) {
+		if (svm->vmcb->control.int_ctl & V_NMI_PENDING_MASK)
+			return 0;
+		return 1;
+	}
+
 	if (svm->nested.nested_run_pending)
 		return -EBUSY;
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [RFC PATCH v2 12/17] KVM: SVM: Add VMGEXIT handler for Secure AVIC backing page
  2025-09-23  5:03 [RFC PATCH v2 00/17] AMD: Add Secure AVIC KVM Support Neeraj Upadhyay
                   ` (10 preceding siblings ...)
  2025-09-23  5:03 ` [RFC PATCH v2 11/17] KVM: SVM: Enable NMI support " Neeraj Upadhyay
@ 2025-09-23  5:03 ` Neeraj Upadhyay
  2025-09-23 16:02   ` Tom Lendacky
  2025-09-23  5:03 ` [RFC PATCH v2 13/17] KVM: SVM: Add IOAPIC EOI support for Secure AVIC guests Neeraj Upadhyay
                   ` (5 subsequent siblings)
  17 siblings, 1 reply; 32+ messages in thread
From: Neeraj Upadhyay @ 2025-09-23  5:03 UTC (permalink / raw)
  To: kvm, seanjc, pbonzini
  Cc: linux-kernel, Thomas.Lendacky, nikunj, Santosh.Shukla,
	Vasant.Hegde, Suravee.Suthikulpanit, bp, David.Kaplan, huibo.wang,
	naveen.rao, tiala

The Secure AVIC hardware requires uninterrupted access to the guest's
APIC backing page. If this page is not present in the Nested Page Table
(NPT) during a hardware access, a non-recoverable nested page fault
occurs. This sets a BUSY flag in the VMSA and causes subsequent
VMRUNs to fail with an unrecoverable VMEXIT_BUSY, effectively
killing the vCPU.

This situation can arise if the backing page resides within a 2MB large
page in the NPT. If other parts of that large page are modified (e.g.,
memory state changes), KVM would split the 2MB NPT entry into 4KB
entries. This process can temporarily zap the PTE for the backing page,
creating a window for the fatal hardware access.

Introduce a new GHCB VMGEXIT protocol, SVM_VMGEXIT_SECURE_AVIC, to
allow the guest to explicitly inform KVM of the APIC backing page's
location, thereby enabling KVM to guarantee its presence in the NPT.

Implement two actions for this protocol:

- SVM_VMGEXIT_SAVIC_REGISTER_BACKING_PAGE:
  On this request, KVM receives the GPA of the backing page. To prevent
  the 2MB page-split issue, immediately perform a PSMASH on the GPA by
  calling sev_handle_rmp_fault(). This proactively breaks any
  containing 2MB NPT entry into 4KB pages, isolating the backing page's
  PTE and guaranteeing its presence. Store the GPA for future reference.

- SVM_VMGEXIT_SAVIC_UNREGISTER_BACKING_PAGE:
  On this request, clear the stored GPA, releasing KVM from its
  obligation to maintain the NPT entry. Return the previously
  registered GPA to the guest.

This mechanism ensures the stability of the APIC backing page mapping,
which is critical for the correct operation of Secure AVIC.

Co-developed-by: Kishon Vijay Abraham I <kvijayab@amd.com>
Signed-off-by: Kishon Vijay Abraham I <kvijayab@amd.com>
Signed-off-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com>
---
 arch/x86/include/uapi/asm/svm.h |  3 ++
 arch/x86/kvm/svm/sev.c          | 59 +++++++++++++++++++++++++++++++++
 arch/x86/kvm/svm/svm.h          |  1 +
 3 files changed, 63 insertions(+)

diff --git a/arch/x86/include/uapi/asm/svm.h b/arch/x86/include/uapi/asm/svm.h
index 9c640a521a67..f1ef52e0fab1 100644
--- a/arch/x86/include/uapi/asm/svm.h
+++ b/arch/x86/include/uapi/asm/svm.h
@@ -118,6 +118,9 @@
 #define SVM_VMGEXIT_AP_CREATE			1
 #define SVM_VMGEXIT_AP_DESTROY			2
 #define SVM_VMGEXIT_SNP_RUN_VMPL		0x80000018
+#define SVM_VMGEXIT_SECURE_AVIC			0x8000001a
+#define SVM_VMGEXIT_SAVIC_REGISTER_BACKING_PAGE	0
+#define SVM_VMGEXIT_SAVIC_UNREGISTER_BACKING_PAGE	1
 #define SVM_VMGEXIT_HV_FEATURES			0x8000fffd
 #define SVM_VMGEXIT_TERM_REQUEST		0x8000fffe
 #define SVM_VMGEXIT_TERM_REASON(reason_set, reason_code)	\
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 7c66aefe428a..3e9cc50f2705 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3399,6 +3399,15 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
 		    !kvm_ghcb_rcx_is_valid(svm))
 			goto vmgexit_err;
 		break;
+	case SVM_VMGEXIT_SECURE_AVIC:
+		if (!sev_savic_active(vcpu->kvm))
+			goto vmgexit_err;
+		if (!kvm_ghcb_rax_is_valid(svm))
+			goto vmgexit_err;
+		if (svm->vmcb->control.exit_info_1 == SVM_VMGEXIT_SAVIC_REGISTER_BACKING_PAGE)
+			if (!kvm_ghcb_rbx_is_valid(svm))
+				goto vmgexit_err;
+		break;
 	case SVM_VMGEXIT_MMIO_READ:
 	case SVM_VMGEXIT_MMIO_WRITE:
 		if (!kvm_ghcb_sw_scratch_is_valid(svm))
@@ -4490,6 +4499,54 @@ static bool savic_handle_msr_exit(struct kvm_vcpu *vcpu)
 	return false;
 }
 
+static int sev_handle_savic_vmgexit(struct vcpu_svm *svm)
+{
+	struct kvm_vcpu *vcpu;
+	u64 apic_id;
+
+	apic_id = kvm_rax_read(&svm->vcpu);
+
+	if (apic_id == -1ULL) {
+		vcpu = &svm->vcpu;
+	} else {
+		vcpu = kvm_get_vcpu_by_id(svm->vcpu.kvm, apic_id);
+		if (!vcpu)
+			goto savic_request_invalid;
+	}
+
+	switch (svm->vmcb->control.exit_info_1) {
+	case SVM_VMGEXIT_SAVIC_REGISTER_BACKING_PAGE: {
+		gpa_t gpa;
+
+		gpa = kvm_rbx_read(&svm->vcpu);
+		if (!PAGE_ALIGNED(gpa))
+			goto savic_request_invalid;
+
+		/*
+		 * sev_handle_rmp_fault() invocation would result in PSMASH if
+		 * NPTE size is 2M.
+		 */
+		sev_handle_rmp_fault(vcpu, gpa, 0);
+		to_svm(vcpu)->sev_savic_gpa = gpa;
+		break;
+	}
+	case SVM_VMGEXIT_SAVIC_UNREGISTER_BACKING_PAGE:
+		kvm_rbx_write(&svm->vcpu, to_svm(vcpu)->sev_savic_gpa);
+		to_svm(vcpu)->sev_savic_gpa = 0;
+		break;
+	default:
+		goto savic_request_invalid;
+	}
+
+	return 1;
+
+savic_request_invalid:
+	ghcb_set_sw_exit_info_1(svm->sev_es.ghcb, 2);
+	ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, GHCB_ERR_INVALID_INPUT);
+
+	return 1;
+}
+
 int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
@@ -4628,6 +4684,9 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
 			    control->exit_info_1, control->exit_info_2);
 		ret = -EINVAL;
 		break;
+	case SVM_VMGEXIT_SECURE_AVIC:
+		ret = sev_handle_savic_vmgexit(svm);
+		break;
 	case SVM_EXIT_MSR:
 		if (sev_savic_active(vcpu->kvm) && savic_handle_msr_exit(vcpu))
 			return 1;
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index a3edb6e720cd..8043833a1a8c 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -337,6 +337,7 @@ struct vcpu_svm {
 	bool guest_gif;
 
 	bool sev_savic_has_pending_ipi;
+	gpa_t sev_savic_gpa;
 };
 
 struct svm_cpu_data {
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [RFC PATCH v2 13/17] KVM: SVM: Add IOAPIC EOI support for Secure AVIC guests
  2025-09-23  5:03 [RFC PATCH v2 00/17] AMD: Add Secure AVIC KVM Support Neeraj Upadhyay
                   ` (11 preceding siblings ...)
  2025-09-23  5:03 ` [RFC PATCH v2 12/17] KVM: SVM: Add VMGEXIT handler for Secure AVIC backing page Neeraj Upadhyay
@ 2025-09-23  5:03 ` Neeraj Upadhyay
  2025-09-23 16:15   ` Tom Lendacky
  2025-09-23  5:03 ` [RFC PATCH v2 14/17] KVM: x86/ioapic: Disable RTC EOI tracking for protected APIC guests Neeraj Upadhyay
                   ` (4 subsequent siblings)
  17 siblings, 1 reply; 32+ messages in thread
From: Neeraj Upadhyay @ 2025-09-23  5:03 UTC (permalink / raw)
  To: kvm, seanjc, pbonzini
  Cc: linux-kernel, Thomas.Lendacky, nikunj, Santosh.Shukla,
	Vasant.Hegde, Suravee.Suthikulpanit, bp, David.Kaplan, huibo.wang,
	naveen.rao, tiala

While Secure AVIC hardware accelerates End-of-Interrupt (EOI) processing
for edge-triggered interrupts, it requires hypervisor assistance for
level-triggered interrupts originating from the IOAPIC. For these
interrupts, a guest write to the EOI MSR triggers a VM-Exit.

The primary challenge in handling this exit is that the guest's real
In-Service Register (ISR) is not visible to KVM. When KVM receives an EOI,
it has no direct way of knowing which interrupt vector is being
acknowledged.

To solve this, use KVM's software vAPIC state as a shadow tracking
mechanism for active, level-triggered interrupts.

The implementation follows this flow:

1.  On interrupt injection (sev_savic_set_requested_irr), check KVM's
    software vAPIC Trigger Mode Register (TMR) to identify if the
    interrupt is level-triggered.

2.  If it is, set the corresponding vector in KVM's software shadow ISR.
    This marks the interrupt as "in-service" from KVM's perspective.

3.  When the guest later issues an EOI, the APIC_EOI MSR write exit
    handler finds the highest vector set in this shadow ISR.

4.  The handler then clears the vector from the shadow ISR and calls
    kvm_apic_set_eoi_accelerated() to propagate the EOI to the virtual
    IOAPIC, allowing it to de-assert the interrupt line.

This enables correct EOI handling for level-triggered interrupts in
Secure AVIC guests, despite the hardware-enforced opacity of the guest's
APIC state.
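The shadow-ISR bookkeeping in steps 1-4 can be modelled with plain
256-bit vector bitmaps. All names below are illustrative; KVM's actual
helpers (apic_set_vector(), apic_clear_vector(),
apic_find_highest_vector()) operate on the vAPIC register page, and the
real injection path also manipulates the IRR, which this sketch omits:

```c
#include <assert.h>
#include <stdint.h>

#define APIC_NR_VECTORS 256

/* Minimal 256-bit vector bitmap, standing in for the vAPIC ISR/TMR. */
struct vec_map { uint32_t bits[APIC_NR_VECTORS / 32]; };

static void vec_set(struct vec_map *m, int vec)
{
	m->bits[vec / 32] |= 1u << (vec % 32);
}

static void vec_clear(struct vec_map *m, int vec)
{
	m->bits[vec / 32] &= ~(1u << (vec % 32));
}

static int vec_test(const struct vec_map *m, int vec)
{
	return (m->bits[vec / 32] >> (vec % 32)) & 1;
}

static int vec_highest(const struct vec_map *m)
{
	for (int vec = APIC_NR_VECTORS - 1; vec >= 0; vec--)
		if (vec_test(m, vec))
			return vec;
	return -1;	/* nothing in service */
}

/* Steps 1-2: on injection, level-triggered vectors (TMR bit set) are
 * recorded in the shadow ISR; edge-triggered vectors are not. */
static void inject(struct vec_map *isr, const struct vec_map *tmr, int vec)
{
	if (vec_test(tmr, vec))
		vec_set(isr, vec);
}

/* Steps 3-4: on an EOI exit, complete the highest in-service vector
 * (the real handler then calls kvm_apic_set_eoi_accelerated()). */
static int handle_eoi(struct vec_map *isr)
{
	int vec = vec_highest(isr);

	if (vec >= 0)
		vec_clear(isr, vec);
	return vec;
}
```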

Signed-off-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com>
---
 arch/x86/kvm/svm/sev.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 3e9cc50f2705..5be2956fb812 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -4474,7 +4474,9 @@ static void savic_handle_icr_write(struct kvm_vcpu *kvm_vcpu, u64 icr)
 
 static bool savic_handle_msr_exit(struct kvm_vcpu *vcpu)
 {
+	struct kvm_lapic *apic;
 	u32 msr, reg;
+	int vec;
 
 	msr = kvm_rcx_read(vcpu);
 	reg = (msr - APIC_BASE_MSR) << 4;
@@ -4492,6 +4494,12 @@ static bool savic_handle_msr_exit(struct kvm_vcpu *vcpu)
 			return true;
 		}
 		break;
+	case APIC_EOI:
+		apic = vcpu->arch.apic;
+		vec = apic_find_highest_vector(apic->regs + APIC_ISR);
+		apic_clear_vector(vec, apic->regs + APIC_ISR);
+		kvm_apic_set_eoi_accelerated(vcpu, vec);
+		return true;
 	default:
 		break;
 	}
@@ -5379,6 +5387,8 @@ void sev_savic_set_requested_irr(struct vcpu_svm *svm, bool reinjected)
 			vec = vec_start + vec_pos;
 			apic_clear_vector(vec, apic->regs + APIC_IRR);
 			val = val & ~BIT(vec_pos);
+			if (apic_test_vector(vec, apic->regs + APIC_TMR))
+				apic_set_vector(vec, apic->regs + APIC_ISR);
 		} while (val);
 	}
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [RFC PATCH v2 14/17] KVM: x86/ioapic: Disable RTC EOI tracking for protected APIC guests
  2025-09-23  5:03 [RFC PATCH v2 00/17] AMD: Add Secure AVIC KVM Support Neeraj Upadhyay
                   ` (12 preceding siblings ...)
  2025-09-23  5:03 ` [RFC PATCH v2 13/17] KVM: SVM: Add IOAPIC EOI support for Secure AVIC guests Neeraj Upadhyay
@ 2025-09-23  5:03 ` Neeraj Upadhyay
  2025-09-23 16:23   ` Tom Lendacky
  2025-09-23  5:03 ` [RFC PATCH v2 15/17] KVM: SVM: Check injected timers for Secure AVIC guests Neeraj Upadhyay
                   ` (3 subsequent siblings)
  17 siblings, 1 reply; 32+ messages in thread
From: Neeraj Upadhyay @ 2025-09-23  5:03 UTC (permalink / raw)
  To: kvm, seanjc, pbonzini
  Cc: linux-kernel, Thomas.Lendacky, nikunj, Santosh.Shukla,
	Vasant.Hegde, Suravee.Suthikulpanit, bp, David.Kaplan, huibo.wang,
	naveen.rao, tiala

KVM tracks End-of-Interrupts (EOIs) for the legacy RTC interrupt (GSI 8)
to detect and report coalesced interrupts to userspace. This mechanism
fundamentally relies on KVM having visibility into the guest's interrupt
acknowledgment state.

This assumption is invalid for guests with a protected APIC (e.g., Secure
AVIC) for two main reasons:

a. The guest's true In-Service Register (ISR) is not visible to KVM,
   making it impossible to know whether the previous interrupt is still
   in service. As a result, KVM cannot lazily check for a pending EOI.

b. The RTC interrupt is edge-triggered, and its EOI is accelerated by the
   hardware without a VM-Exit. KVM never sees the EOI event.

Since KVM can observe neither the interrupt's service status nor its EOI,
the tracking logic is invalid. So, disable this feature for all protected
APIC guests. This change means that userspace will no longer be able to
detect coalesced RTC interrupts for these specific guest types.
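For reference, the coalescing logic that becomes untrackable can be
modelled as follows. This is a deliberate simplification of
ioapic->rtc_status (one pending-EOI counter, no dest_map), with
illustrative names:

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-in for ioapic->rtc_status. */
struct rtc_status {
	int pending_eoi;	/* vCPUs still owing an EOI for the last tick */
};

/*
 * Deliver one RTC tick to 'ndest' vCPUs. Returns true if the interrupt
 * was coalesced (the previous tick's EOIs are still outstanding).
 */
static bool rtc_deliver(struct rtc_status *st, int ndest, bool apic_protected)
{
	if (apic_protected)
		return false;	/* EOIs are invisible: coalescing untrackable */
	if (st->pending_eoi > 0)
		return true;	/* previous interrupt not yet acknowledged */
	st->pending_eoi = ndest;
	return false;
}

/* Called when KVM observes an EOI for the RTC vector on one vCPU. */
static void rtc_eoi(struct rtc_status *st)
{
	if (st->pending_eoi > 0)
		st->pending_eoi--;
}
```

With a protected APIC, rtc_eoi() is simply never reached for this
edge-triggered interrupt, which is why the accounting must be disabled.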

Signed-off-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com>
---
 arch/x86/kvm/ioapic.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c
index 2b5d389bca5f..308778ba4f58 100644
--- a/arch/x86/kvm/ioapic.c
+++ b/arch/x86/kvm/ioapic.c
@@ -113,6 +113,9 @@ static void __rtc_irq_eoi_tracking_restore_one(struct kvm_vcpu *vcpu)
 	struct dest_map *dest_map = &ioapic->rtc_status.dest_map;
 	union kvm_ioapic_redirect_entry *e;
 
+	if (vcpu->arch.apic->guest_apic_protected)
+		return;
+
 	e = &ioapic->redirtbl[RTC_GSI];
 	if (!kvm_apic_match_dest(vcpu, NULL, APIC_DEST_NOSHORT,
 				 e->fields.dest_id,
@@ -476,6 +479,7 @@ static int ioapic_service(struct kvm_ioapic *ioapic, int irq, bool line_status)
 {
 	union kvm_ioapic_redirect_entry *entry = &ioapic->redirtbl[irq];
 	struct kvm_lapic_irq irqe;
+	struct kvm_vcpu *vcpu;
 	int ret;
 
 	if (entry->fields.mask ||
@@ -505,7 +509,9 @@ static int ioapic_service(struct kvm_ioapic *ioapic, int irq, bool line_status)
 		BUG_ON(ioapic->rtc_status.pending_eoi != 0);
 		ret = kvm_irq_delivery_to_apic(ioapic->kvm, NULL, &irqe,
 					       &ioapic->rtc_status.dest_map);
-		ioapic->rtc_status.pending_eoi = (ret < 0 ? 0 : ret);
+		vcpu = kvm_get_vcpu(ioapic->kvm, 0);
+		if (!vcpu->arch.apic->guest_apic_protected)
+			ioapic->rtc_status.pending_eoi = (ret < 0 ? 0 : ret);
 	} else
 		ret = kvm_irq_delivery_to_apic(ioapic->kvm, NULL, &irqe, NULL);
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [RFC PATCH v2 15/17] KVM: SVM: Check injected timers for Secure AVIC guests
  2025-09-23  5:03 [RFC PATCH v2 00/17] AMD: Add Secure AVIC KVM Support Neeraj Upadhyay
                   ` (13 preceding siblings ...)
  2025-09-23  5:03 ` [RFC PATCH v2 14/17] KVM: x86/ioapic: Disable RTC EOI tracking for protected APIC guests Neeraj Upadhyay
@ 2025-09-23  5:03 ` Neeraj Upadhyay
  2025-09-23 16:32   ` Tom Lendacky
  2025-09-23  5:03 ` [RFC PATCH v2 16/17] KVM: x86/cpuid: Disable paravirt APIC features for protected APIC Neeraj Upadhyay
                   ` (2 subsequent siblings)
  17 siblings, 1 reply; 32+ messages in thread
From: Neeraj Upadhyay @ 2025-09-23  5:03 UTC (permalink / raw)
  To: kvm, seanjc, pbonzini
  Cc: linux-kernel, Thomas.Lendacky, nikunj, Santosh.Shukla,
	Vasant.Hegde, Suravee.Suthikulpanit, bp, David.Kaplan, huibo.wang,
	naveen.rao, tiala

The kvm_wait_lapic_expire() function is a pre-VMRUN optimization that
allows a vCPU to briefly wait for an imminent LAPIC timer interrupt.
However, this function is not fully compatible with protected APIC models
like Secure AVIC because it relies on inspecting KVM's software vAPIC
state. For Secure AVIC, the true timer state is hardware-managed and
opaque to KVM, so kvm_wait_lapic_expire() skips its injected-timer check
for guests with protected APIC state.

For protected APIC guests, that check must instead be performed by the
callers of kvm_wait_lapic_expire(). So, for Secure AVIC guests, look up
the timer interrupt vector in the VMCB's requested_irr (the set of
vectors pending injection) before calling kvm_wait_lapic_expire().
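The requested_irr lookup performed by the new helper amounts to simple
bit-index arithmetic. Sketched as standalone C, where the array layout
mirrors the 8 x 32-bit requested_irr field (this is a model, not KVM
code):

```c
#include <assert.h>
#include <stdint.h>

#define APIC_VECTOR_MASK 0xFF

/*
 * Model of sev_savic_timer_int_injected(): given the LVT Timer register
 * value and the hypervisor-written requested_irr bitmap, report whether
 * the timer vector is pending injection.
 */
static int timer_int_injected(uint32_t lvtt, const uint32_t requested_irr[8])
{
	int vec = lvtt & APIC_VECTOR_MASK;	/* low byte holds the vector */

	return (requested_irr[vec / 32] >> (vec % 32)) & 1;
}
```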

Signed-off-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com>
---
 arch/x86/kvm/svm/sev.c | 8 ++++++++
 arch/x86/kvm/svm/svm.c | 3 ++-
 arch/x86/kvm/svm/svm.h | 2 ++
 3 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 5be2956fb812..3f6cf8d5068a 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -5405,3 +5405,11 @@ bool sev_savic_has_pending_interrupt(struct kvm_vcpu *vcpu)
 	return READ_ONCE(to_svm(vcpu)->sev_savic_has_pending_ipi) ||
 		kvm_apic_has_interrupt(vcpu) != -1;
 }
+
+bool sev_savic_timer_int_injected(struct kvm_vcpu *vcpu)
+{
+	u32 reg  = kvm_lapic_get_reg(vcpu->arch.apic, APIC_LVTT);
+	int vec = reg & APIC_VECTOR_MASK;
+
+	return to_svm(vcpu)->vmcb->control.requested_irr[vec / 32] & BIT(vec % 32);
+}
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index a945bc094c1a..d0d972731ea7 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4335,7 +4335,8 @@ static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu, u64 run_flags)
 	    vcpu->arch.host_debugctl != svm->vmcb->save.dbgctl)
 		update_debugctlmsr(svm->vmcb->save.dbgctl);
 
-	kvm_wait_lapic_expire(vcpu);
+	if (!sev_savic_active(vcpu->kvm) || sev_savic_timer_int_injected(vcpu))
+		kvm_wait_lapic_expire(vcpu);
 
 	/*
 	 * If this vCPU has touched SPEC_CTRL, restore the guest's value if
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 8043833a1a8c..ecc4ea11822d 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -878,6 +878,7 @@ static inline bool sev_savic_active(struct kvm *kvm)
 }
 void sev_savic_set_requested_irr(struct vcpu_svm *svm, bool reinjected);
 bool sev_savic_has_pending_interrupt(struct kvm_vcpu *vcpu);
+bool sev_savic_timer_int_injected(struct kvm_vcpu *vcpu);
 #else
 static inline struct page *snp_safe_alloc_page_node(int node, gfp_t gfp)
 {
@@ -917,6 +918,7 @@ static inline struct vmcb_save_area *sev_decrypt_vmsa(struct kvm_vcpu *vcpu)
 static inline void sev_free_decrypted_vmsa(struct kvm_vcpu *vcpu, struct vmcb_save_area *vmsa) {}
 static inline void sev_savic_set_requested_irr(struct vcpu_svm *svm, bool reinjected) {}
 static inline bool sev_savic_has_pending_interrupt(struct kvm_vcpu *vcpu) { return false; }
+static inline bool sev_savic_timer_int_injected(struct kvm_vcpu *vcpu) { return true; }
 #endif
 
 /* vmenter.S */
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [RFC PATCH v2 16/17] KVM: x86/cpuid: Disable paravirt APIC features for protected APIC
  2025-09-23  5:03 [RFC PATCH v2 00/17] AMD: Add Secure AVIC KVM Support Neeraj Upadhyay
                   ` (14 preceding siblings ...)
  2025-09-23  5:03 ` [RFC PATCH v2 15/17] KVM: SVM: Check injected timers for Secure AVIC guests Neeraj Upadhyay
@ 2025-09-23  5:03 ` Neeraj Upadhyay
  2025-09-23  5:03 ` [RFC PATCH v2 17/17] KVM: SVM: Advertise Secure AVIC support for SNP guests Neeraj Upadhyay
  2025-09-23 10:02 ` [syzbot ci] Re: AMD: Add Secure AVIC KVM Support syzbot ci
  17 siblings, 0 replies; 32+ messages in thread
From: Neeraj Upadhyay @ 2025-09-23  5:03 UTC (permalink / raw)
  To: kvm, seanjc, pbonzini
  Cc: linux-kernel, Thomas.Lendacky, nikunj, Santosh.Shukla,
	Vasant.Hegde, Suravee.Suthikulpanit, bp, David.Kaplan, huibo.wang,
	naveen.rao, tiala

The paravirtualized APIC features, PV_EOI and PV_SEND_IPI, are
predicated on KVM having full visibility and control over the guest's
vAPIC state. This assumption is invalid for guests with a protected APIC
(e.g., AMD SEV-SNP with Secure AVIC, Intel TDX), where the APIC state is
opaque to the hypervisor and managed by the hardware.

- PV_EOI: KVM cannot service a PV_EOI MSR write because it has no
  access to the guest's true In-Service Register (ISR). For these
  guests, EOIs are either accelerated by hardware or virtualized via
  a different, technology-specific VM-Exit, not the PV MSR.

- PV_SEND_IPI: Protected guest models have their own specific IPI
  virtualization flows (e.g., VMGEXIT on ICR write for Secure AVIC).
  Exposing the generic PV_SEND_IPI hypercall would provide a
  conflicting, incorrect path that bypasses the required secure flow.

To prevent the guest from using these incompatible interfaces, clear
the KVM_FEATURE_PV_EOI and KVM_FEATURE_PV_SEND_IPI CPUID feature bits
for guests with a protected APIC.
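The CPUID adjustment itself is a simple mask operation; as a standalone
sketch, with the bit positions taken from the KVM PV feature definitions
in the UAPI headers:

```c
#include <assert.h>
#include <stdint.h>

/* KVM PV feature bit positions in CPUID leaf 0x40000001 EAX. */
#define KVM_FEATURE_PV_EOI	6
#define KVM_FEATURE_PV_SEND_IPI	11

/* Model of the quirk: strip the PV APIC features when the guest's
 * APIC state is protected (opaque to the hypervisor). */
static uint32_t apply_protected_apic_quirk(uint32_t eax, int apic_protected)
{
	if (apic_protected)
		eax &= ~((1u << KVM_FEATURE_PV_EOI) |
			 (1u << KVM_FEATURE_PV_SEND_IPI));
	return eax;
}
```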

Signed-off-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com>
---
 arch/x86/kvm/cpuid.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index e2836a255b16..01b3c4e88282 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -245,6 +245,10 @@ static u32 kvm_apply_cpuid_pv_features_quirk(struct kvm_vcpu *vcpu)
 	if (kvm_hlt_in_guest(vcpu->kvm))
 		best->eax &= ~(1 << KVM_FEATURE_PV_UNHALT);
 
+	if (vcpu->arch.apic->guest_apic_protected)
+		best->eax &= ~((1 << KVM_FEATURE_PV_EOI) |
+			       (1 << KVM_FEATURE_PV_SEND_IPI));
+
 	return best->eax;
 }
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [RFC PATCH v2 17/17] KVM: SVM: Advertise Secure AVIC support for SNP guests
  2025-09-23  5:03 [RFC PATCH v2 00/17] AMD: Add Secure AVIC KVM Support Neeraj Upadhyay
                   ` (15 preceding siblings ...)
  2025-09-23  5:03 ` [RFC PATCH v2 16/17] KVM: x86/cpuid: Disable paravirt APIC features for protected APIC Neeraj Upadhyay
@ 2025-09-23  5:03 ` Neeraj Upadhyay
  2025-09-23 10:02 ` [syzbot ci] Re: AMD: Add Secure AVIC KVM Support syzbot ci
  17 siblings, 0 replies; 32+ messages in thread
From: Neeraj Upadhyay @ 2025-09-23  5:03 UTC (permalink / raw)
  To: kvm, seanjc, pbonzini
  Cc: linux-kernel, Thomas.Lendacky, nikunj, Santosh.Shukla,
	Vasant.Hegde, Suravee.Suthikulpanit, bp, David.Kaplan, huibo.wang,
	naveen.rao, tiala

The preceding patches have implemented all the necessary KVM
infrastructure to support the Secure AVIC feature for SEV-SNP guests,
including interrupt/NMI injection, IPI virtualization, and EOI handling.

Despite the backend support being complete, KVM does not yet advertise
this capability. As a result, userspace tools cannot create VMs that
utilize this feature.

To enable the feature, add the SVM_SEV_FEAT_SECURE_AVIC flag to the
sev_supported_vmsa_features bitmask. This bitmask communicates
KVM's supported VMSA features to userspace.

This is the final enabling patch in the series, allowing the creation
of Secure AVIC-enabled virtual machines.

Signed-off-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com>
---
 arch/x86/kvm/svm/sev.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 3f6cf8d5068a..fe3d65c50afd 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3092,6 +3092,9 @@ void __init sev_hardware_setup(void)
 	sev_supported_vmsa_features = 0;
 	if (sev_es_debug_swap_enabled)
 		sev_supported_vmsa_features |= SVM_SEV_FEAT_DEBUG_SWAP;
+
+	if (sev_snp_savic_enabled)
+		sev_supported_vmsa_features |= SVM_SEV_FEAT_SECURE_AVIC;
 }
 
 void sev_hardware_unsetup(void)
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [syzbot ci] Re: AMD: Add Secure AVIC KVM Support
  2025-09-23  5:03 [RFC PATCH v2 00/17] AMD: Add Secure AVIC KVM Support Neeraj Upadhyay
                   ` (16 preceding siblings ...)
  2025-09-23  5:03 ` [RFC PATCH v2 17/17] KVM: SVM: Advertise Secure AVIC support for SNP guests Neeraj Upadhyay
@ 2025-09-23 10:02 ` syzbot ci
  2025-09-23 10:17   ` Upadhyay, Neeraj
  17 siblings, 1 reply; 32+ messages in thread
From: syzbot ci @ 2025-09-23 10:02 UTC (permalink / raw)
  To: bp, david.kaplan, huibo.wang, kvm, linux-kernel, naveen.rao,
	neeraj.upadhyay, nikunj, pbonzini, santosh.shukla, seanjc,
	suravee.suthikulpanit, thomas.lendacky, tiala, vasant.hegde
  Cc: syzbot, syzkaller-bugs

syzbot ci has tested the following series

[v2] AMD: Add Secure AVIC KVM Support
https://lore.kernel.org/all/20250923050317.205482-1-Neeraj.Upadhyay@amd.com
* [RFC PATCH v2 01/17] KVM: x86/lapic: Differentiate protected APIC interrupt mechanisms
* [RFC PATCH v2 02/17] x86/cpufeatures: Add Secure AVIC CPU feature
* [RFC PATCH v2 03/17] KVM: SVM: Add support for Secure AVIC capability in KVM
* [RFC PATCH v2 04/17] KVM: SVM: Set guest APIC protection flags for Secure AVIC
* [RFC PATCH v2 05/17] KVM: SVM: Do not intercept SECURE_AVIC_CONTROL MSR for SAVIC guests
* [RFC PATCH v2 06/17] KVM: SVM: Implement interrupt injection for Secure AVIC
* [RFC PATCH v2 07/17] KVM: SVM: Add IPI Delivery Support for Secure AVIC
* [RFC PATCH v2 08/17] KVM: SVM: Do not inject exception for Secure AVIC
* [RFC PATCH v2 09/17] KVM: SVM: Do not intercept exceptions for Secure AVIC guests
* [RFC PATCH v2 10/17] KVM: SVM: Set VGIF in VMSA area for Secure AVIC guests
* [RFC PATCH v2 11/17] KVM: SVM: Enable NMI support for Secure AVIC guests
* [RFC PATCH v2 12/17] KVM: SVM: Add VMGEXIT handler for Secure AVIC backing page
* [RFC PATCH v2 13/17] KVM: SVM: Add IOAPIC EOI support for Secure AVIC guests
* [RFC PATCH v2 14/17] KVM: x86/ioapic: Disable RTC EOI tracking for protected APIC guests
* [RFC PATCH v2 15/17] KVM: SVM: Check injected timers for Secure AVIC guests
* [RFC PATCH v2 16/17] KVM: x86/cpuid: Disable paravirt APIC features for protected APIC
* [RFC PATCH v2 17/17] KVM: SVM: Advertise Secure AVIC support for SNP guests

and found the following issue:
general protection fault in kvm_apply_cpuid_pv_features_quirk

Full report is available here:
https://ci.syzbot.org/series/887b895e-0315-498c-99e5-966704f16fb5

***

general protection fault in kvm_apply_cpuid_pv_features_quirk

tree:      kvm-next
URL:       https://kernel.googlesource.com/pub/scm/virt/kvm/kvm/
base:      a6ad54137af92535cfe32e19e5f3bc1bb7dbd383
arch:      amd64
compiler:  Debian clang version 20.1.8 (++20250708063551+0c9f909b7976-1~exp1~20250708183702.136), Debian LLD 20.1.8
config:    https://ci.syzbot.org/builds/a65d3de7-36d8-4181-8566-80e0f0719955/config
C repro:   https://ci.syzbot.org/findings/939a8c5a-41b2-4e9b-9129-80dff6d039c4/c_repro
syz repro: https://ci.syzbot.org/findings/939a8c5a-41b2-4e9b-9129-80dff6d039c4/syz_repro

Oops: general protection fault, probably for non-canonical address 0xdffffc0000000013: 0000 [#1] SMP KASAN NOPTI
KASAN: null-ptr-deref in range [0x0000000000000098-0x000000000000009f]
CPU: 0 UID: 0 PID: 5992 Comm: syz.0.17 Not tainted syzkaller #0 PREEMPT(full) 
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
RIP: 0010:kvm_apply_cpuid_pv_features_quirk+0x38c/0x4f0 arch/x86/kvm/cpuid.c:248
Code: c1 e8 03 80 3c 10 00 74 12 4c 89 ff e8 9d d8 d4 00 48 ba 00 00 00 00 00 fc ff df bb 9c 00 00 00 49 03 1f 48 89 d8 48 c1 e8 03 <0f> b6 04 10 84 c0 0f 85 c2 00 00 00 80 3b 00 74 2e e8 4e 6a 71 00
RSP: 0018:ffffc90004f871a0 EFLAGS: 00010203
RAX: 0000000000000013 RBX: 000000000000009c RCX: ffff888107562440
RDX: dffffc0000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: ffffc90004f87250 R08: 0000000000000005 R09: 000000008b838003
R10: ffffc90004f872e0 R11: fffff520009f0e61 R12: ffff888034f30970
R13: 1ffff110069e612e R14: ffff888020170528 R15: ffff888034f302f8
FS:  000055556af3f500(0000) GS:ffff8880b861b000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ffff40f56c8 CR3: 0000000020cc0000 CR4: 0000000000352ef0
Call Trace:
 <TASK>
 kvm_vcpu_after_set_cpuid+0xc75/0x18a0 arch/x86/kvm/cpuid.c:432
 kvm_set_cpuid+0xea4/0x1110 arch/x86/kvm/cpuid.c:551
 kvm_vcpu_ioctl_set_cpuid2+0xbe/0x130 arch/x86/kvm/cpuid.c:626
 kvm_arch_vcpu_ioctl+0x13c5/0x2a80 arch/x86/kvm/x86.c:5975
 kvm_vcpu_ioctl+0x74d/0xe90 virt/kvm/kvm_main.c:4637
 vfs_ioctl fs/ioctl.c:51 [inline]
 __do_sys_ioctl fs/ioctl.c:598 [inline]
 __se_sys_ioctl+0xf9/0x170 fs/ioctl.c:584
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0xfa/0x3b0 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f14f278e82b
Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 1c 48 8b 44 24 18 64 48 2b 04 25 28 00 00
RSP: 002b:00007ffff40f55f0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00007ffff40f5d40 RCX: 00007f14f278e82b
RDX: 00007ffff40f5d40 RSI: 000000004008ae90 RDI: 0000000000000005
RBP: 00002000008fc000 R08: 0000000000000000 R09: 0000000000000006
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 0000000000000063 R14: 00002000008fb000 R15: 00002000008fc800
 </TASK>
Modules linked in:
---[ end trace 0000000000000000 ]---
RIP: 0010:kvm_apply_cpuid_pv_features_quirk+0x38c/0x4f0 arch/x86/kvm/cpuid.c:248
Code: c1 e8 03 80 3c 10 00 74 12 4c 89 ff e8 9d d8 d4 00 48 ba 00 00 00 00 00 fc ff df bb 9c 00 00 00 49 03 1f 48 89 d8 48 c1 e8 03 <0f> b6 04 10 84 c0 0f 85 c2 00 00 00 80 3b 00 74 2e e8 4e 6a 71 00
RSP: 0018:ffffc90004f871a0 EFLAGS: 00010203
RAX: 0000000000000013 RBX: 000000000000009c RCX: ffff888107562440
RDX: dffffc0000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: ffffc90004f87250 R08: 0000000000000005 R09: 000000008b838003
R10: ffffc90004f872e0 R11: fffff520009f0e61 R12: ffff888034f30970
R13: 1ffff110069e612e R14: ffff888020170528 R15: ffff888034f302f8
FS:  000055556af3f500(0000) GS:ffff8881a3c1b000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055df3be04900 CR3: 0000000020cc0000 CR4: 0000000000352ef0
----------------
Code disassembly (best guess):
   0:	c1 e8 03             	shr    $0x3,%eax
   3:	80 3c 10 00          	cmpb   $0x0,(%rax,%rdx,1)
   7:	74 12                	je     0x1b
   9:	4c 89 ff             	mov    %r15,%rdi
   c:	e8 9d d8 d4 00       	call   0xd4d8ae
  11:	48 ba 00 00 00 00 00 	movabs $0xdffffc0000000000,%rdx
  18:	fc ff df
  1b:	bb 9c 00 00 00       	mov    $0x9c,%ebx
  20:	49 03 1f             	add    (%r15),%rbx
  23:	48 89 d8             	mov    %rbx,%rax
  26:	48 c1 e8 03          	shr    $0x3,%rax
* 2a:	0f b6 04 10          	movzbl (%rax,%rdx,1),%eax <-- trapping instruction
  2e:	84 c0                	test   %al,%al
  30:	0f 85 c2 00 00 00    	jne    0xf8
  36:	80 3b 00             	cmpb   $0x0,(%rbx)
  39:	74 2e                	je     0x69
  3b:	e8 4e 6a 71 00       	call   0x716a8e


***

If these findings have caused you to resend the series or submit a
separate fix, please add the following tag to your commit message:
  Tested-by: syzbot@syzkaller.appspotmail.com

---
This report is generated by a bot. It may contain errors.
syzbot ci engineers can be reached at syzkaller@googlegroups.com.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [syzbot ci] Re: AMD: Add Secure AVIC KVM Support
  2025-09-23 10:02 ` [syzbot ci] Re: AMD: Add Secure AVIC KVM Support syzbot ci
@ 2025-09-23 10:17   ` Upadhyay, Neeraj
  0 siblings, 0 replies; 32+ messages in thread
From: Upadhyay, Neeraj @ 2025-09-23 10:17 UTC (permalink / raw)
  To: syzbot ci, bp, david.kaplan, huibo.wang, kvm, linux-kernel,
	naveen.rao, nikunj, pbonzini, santosh.shukla, seanjc,
	suravee.suthikulpanit, thomas.lendacky, tiala, vasant.hegde
  Cc: syzbot, syzkaller-bugs



On 9/23/2025 3:32 PM, syzbot ci wrote:
> syzbot ci has tested the following series
> 
> [v2] AMD: Add Secure AVIC KVM Support
> https://lore.kernel.org/all/20250923050317.205482-1-Neeraj.Upadhyay@amd.com
> * [RFC PATCH v2 01/17] KVM: x86/lapic: Differentiate protected APIC interrupt mechanisms
> * [RFC PATCH v2 02/17] x86/cpufeatures: Add Secure AVIC CPU feature
> * [RFC PATCH v2 03/17] KVM: SVM: Add support for Secure AVIC capability in KVM
> * [RFC PATCH v2 04/17] KVM: SVM: Set guest APIC protection flags for Secure AVIC
> * [RFC PATCH v2 05/17] KVM: SVM: Do not intercept SECURE_AVIC_CONTROL MSR for SAVIC guests
> * [RFC PATCH v2 06/17] KVM: SVM: Implement interrupt injection for Secure AVIC
> * [RFC PATCH v2 07/17] KVM: SVM: Add IPI Delivery Support for Secure AVIC
> * [RFC PATCH v2 08/17] KVM: SVM: Do not inject exception for Secure AVIC
> * [RFC PATCH v2 09/17] KVM: SVM: Do not intercept exceptions for Secure AVIC guests
> * [RFC PATCH v2 10/17] KVM: SVM: Set VGIF in VMSA area for Secure AVIC guests
> * [RFC PATCH v2 11/17] KVM: SVM: Enable NMI support for Secure AVIC guests
> * [RFC PATCH v2 12/17] KVM: SVM: Add VMGEXIT handler for Secure AVIC backing page
> * [RFC PATCH v2 13/17] KVM: SVM: Add IOAPIC EOI support for Secure AVIC guests
> * [RFC PATCH v2 14/17] KVM: x86/ioapic: Disable RTC EOI tracking for protected APIC guests
> * [RFC PATCH v2 15/17] KVM: SVM: Check injected timers for Secure AVIC guests
> * [RFC PATCH v2 16/17] KVM: x86/cpuid: Disable paravirt APIC features for protected APIC
> * [RFC PATCH v2 17/17] KVM: SVM: Advertise Secure AVIC support for SNP guests
> 
> and found the following issue:
> general protection fault in kvm_apply_cpuid_pv_features_quirk
> 
> Full report is available here:
> https://ci.syzbot.org/series/887b895e-0315-498c-99e5-966704f16fb5
> 
> ***
> 
> general protection fault in kvm_apply_cpuid_pv_features_quirk
> 

Thanks for the report. I will update the check to the following:

       if (lapic_in_kernel(vcpu) && vcpu->arch.apic->guest_apic_protected)
               best->eax &= ~((1 << KVM_FEATURE_PV_EOI) |
                              (1 << KVM_FEATURE_PV_SEND_IPI));


- Neeraj




^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH v2 05/17] KVM: SVM: Do not intercept SECURE_AVIC_CONTROL MSR for SAVIC guests
  2025-09-23  5:03 ` [RFC PATCH v2 05/17] KVM: SVM: Do not intercept SECURE_AVIC_CONTROL MSR for SAVIC guests Neeraj Upadhyay
@ 2025-09-23 13:55   ` Tom Lendacky
  2025-09-25  5:16     ` Upadhyay, Neeraj
  0 siblings, 1 reply; 32+ messages in thread
From: Tom Lendacky @ 2025-09-23 13:55 UTC (permalink / raw)
  To: Neeraj Upadhyay, kvm, seanjc, pbonzini
  Cc: linux-kernel, nikunj, Santosh.Shukla, Vasant.Hegde,
	Suravee.Suthikulpanit, bp, David.Kaplan, huibo.wang, naveen.rao,
	tiala

On 9/23/25 00:03, Neeraj Upadhyay wrote:
> Disable interception for SECURE_AVIC_CONTROL MSR for Secure AVIC
> enabled guests. The SECURE_AVIC_CONTROL MSR holds the GPA of the
> guest APIC backing page and bitfields to control enablement of Secure
> AVIC and whether the guest allows NMIs to be injected by the hypervisor.
> This MSR is populated by the guest and can be read by the guest to get
> the GPA of the APIC backing page. The MSR can only be accessed in Secure
> AVIC mode; accessing it when not in Secure AVIC mode results in #GP. So,
> KVM should not intercept it.

The reason KVM should not intercept the MSR access is that the guest
would not be able to actually set the MSR if it is intercepted.

Thanks,
Tom

> 
> Co-developed-by: Kishon Vijay Abraham I <kvijayab@amd.com>
> Signed-off-by: Kishon Vijay Abraham I <kvijayab@amd.com>
> Signed-off-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com>
> ---
>  arch/x86/include/asm/msr-index.h | 1 +
>  arch/x86/kvm/svm/sev.c           | 6 +++++-
>  2 files changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
> index b65c3ba5fa14..9f16030dd849 100644
> --- a/arch/x86/include/asm/msr-index.h
> +++ b/arch/x86/include/asm/msr-index.h
> @@ -707,6 +707,7 @@
>  #define MSR_AMD64_SEG_RMP_ENABLED_BIT	0
>  #define MSR_AMD64_SEG_RMP_ENABLED	BIT_ULL(MSR_AMD64_SEG_RMP_ENABLED_BIT)
>  #define MSR_AMD64_RMP_SEGMENT_SHIFT(x)	(((x) & GENMASK_ULL(13, 8)) >> 8)
> +#define MSR_AMD64_SAVIC_CONTROL		0xc0010138
>  
>  #define MSR_SVSM_CAA			0xc001f000
>  
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index b2eae102681c..afe4127a1918 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -4487,7 +4487,8 @@ void sev_vcpu_after_set_cpuid(struct vcpu_svm *svm)
>  
>  static void sev_es_init_vmcb(struct vcpu_svm *svm)
>  {
> -	struct kvm_sev_info *sev = to_kvm_sev_info(svm->vcpu.kvm);
> +	struct kvm_vcpu *vcpu = &svm->vcpu;
> +	struct kvm_sev_info *sev = to_kvm_sev_info(vcpu->kvm);
>  	struct vmcb *vmcb = svm->vmcb01.ptr;
>  
>  	svm->vmcb->control.nested_ctl |= SVM_NESTED_CTL_SEV_ES_ENABLE;
> @@ -4546,6 +4547,9 @@ static void sev_es_init_vmcb(struct vcpu_svm *svm)
>  
>  	/* Can't intercept XSETBV, HV can't modify XCR0 directly */
>  	svm_clr_intercept(svm, INTERCEPT_XSETBV);
> +
> +	if (sev_savic_active(vcpu->kvm))
> +		svm_set_intercept_for_msr(vcpu, MSR_AMD64_SAVIC_CONTROL, MSR_TYPE_RW, false);
>  }
>  
>  void sev_init_vmcb(struct vcpu_svm *svm)

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH v2 06/17] KVM: SVM: Implement interrupt injection for Secure AVIC
  2025-09-23  5:03 ` [RFC PATCH v2 06/17] KVM: SVM: Implement interrupt injection for Secure AVIC Neeraj Upadhyay
@ 2025-09-23 14:47   ` Tom Lendacky
  0 siblings, 0 replies; 32+ messages in thread
From: Tom Lendacky @ 2025-09-23 14:47 UTC (permalink / raw)
  To: Neeraj Upadhyay, kvm, seanjc, pbonzini
  Cc: linux-kernel, nikunj, Santosh.Shukla, Vasant.Hegde,
	Suravee.Suthikulpanit, bp, David.Kaplan, huibo.wang, naveen.rao,
	tiala

On 9/23/25 00:03, Neeraj Upadhyay wrote:
> For AMD SEV-SNP guests with Secure AVIC, the virtual APIC state is
> not visible to KVM and managed by the hardware. This renders the
> traditional interrupt injection mechanism, which directly modifies
> guest state, unusable. Instead, interrupt delivery must be mediated
> through a new interface in the VMCB. Implement support for this
> mechanism.
> 
> First, new VMCB control fields, requested_irr and update_irr, are
> defined to allow KVM to communicate pending interrupts to the hardware
> before VMRUN.
> 
> Hook the core interrupt injection path, svm_inject_irq(). Instead of
> injecting directly, transfer pending interrupts from KVM's software
> IRR to the new requested_irr VMCB field and delegate final delivery
> to the hardware.
> 
> Since the hardware is now responsible for the timing and delivery of
> interrupts to the guest (including managing the guest's RFLAGS.IF and
> vAPIC state), bypass the standard KVM interrupt window checks in
> svm_interrupt_allowed() and svm_enable_irq_window(). Similarly, interrupt
> re-injection is handled by the hardware and requires no explicit KVM
> involvement.
> 
> Finally, update the logic for detecting pending interrupts. Add the
> vendor op, protected_apic_has_interrupt(), to check only KVM's software
> vAPIC IRR state.
> 
> Co-developed-by: Kishon Vijay Abraham I <kvijayab@amd.com>
> Signed-off-by: Kishon Vijay Abraham I <kvijayab@amd.com>
> Signed-off-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com>
> ---
>  arch/x86/include/asm/svm.h |  8 +++++--
>  arch/x86/kvm/lapic.c       | 17 ++++++++++++---
>  arch/x86/kvm/svm/sev.c     | 44 ++++++++++++++++++++++++++++++++++++++
>  arch/x86/kvm/svm/svm.c     | 13 +++++++++++
>  arch/x86/kvm/svm/svm.h     |  4 ++++
>  arch/x86/kvm/x86.c         | 15 ++++++++++++-
>  6 files changed, 95 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
> index ab3d55654c77..0faf262f9f9f 100644
> --- a/arch/x86/include/asm/svm.h
> +++ b/arch/x86/include/asm/svm.h
> @@ -162,10 +162,14 @@ struct __attribute__ ((__packed__)) vmcb_control_area {
>  	u64 vmsa_pa;		/* Used for an SEV-ES guest */
>  	u8 reserved_8[16];
>  	u16 bus_lock_counter;		/* Offset 0x120 */
> -	u8 reserved_9[22];
> +	u8 reserved_9[18];
> +	u8 update_irr;			/* Offset 0x134 */

The APM has this as a 4 byte field.

> +	u8 reserved_10[3];
>  	u64 allowed_sev_features;	/* Offset 0x138 */
>  	u64 guest_sev_features;		/* Offset 0x140 */
> -	u8 reserved_10[664];
> +	u8 reserved_11[8];
> +	u32 requested_irr[8];		/* Offset 0x150 */
> +	u8 reserved_12[624];
>  	/*
>  	 * Offset 0x3e0, 32 bytes reserved
>  	 * for use by hypervisor/software.
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 5fc437341e03..3199c7c6db05 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -2938,11 +2938,22 @@ int kvm_apic_has_interrupt(struct kvm_vcpu *vcpu)
>  	if (!kvm_apic_present(vcpu))
>  		return -1;
>  
> -	if (apic->guest_apic_protected)
> +	if (!apic->guest_apic_protected) {
> +		__apic_update_ppr(apic, &ppr);
> +		return apic_has_interrupt_for_ppr(apic, ppr);
> +	}
> +
> +	if (!apic->prot_apic_intr_inject)
>  		return -1;
>  
> -	__apic_update_ppr(apic, &ppr);
> -	return apic_has_interrupt_for_ppr(apic, ppr);
> +	/*
> +	 * For guest-protected virtual APIC, hardware manages the virtual
> +	 * PPR and interrupt delivery to the guest. So, checking the KVM
> +	 * managed virtual APIC's APIC_IRR state for any pending vectors
> +	 * is the only thing required here.
> +	 */
> +	return apic_search_irr(apic);

Just a thought, but I wonder if this would look cleaner by doing:

	if (apic->guest_apic_protected) {
		if (!apic->prot_apic_intr_inject)
			return -1;

		/*
		 * For guest-protected ...
		 */
		return apic_search_irr(apic);
	}

	__apic_update_ppr(apic, &ppr);
	return apic_has_interrupt_for_ppr(apic, ppr);

> +
>  }
>  EXPORT_SYMBOL_GPL(kvm_apic_has_interrupt);
>  
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index afe4127a1918..78cefc14a2ee 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -28,6 +28,7 @@
>  #include <asm/debugreg.h>
>  #include <asm/msr.h>
>  #include <asm/sev.h>
> +#include <asm/apic.h>
>  
>  #include "mmu.h"
>  #include "x86.h"
> @@ -35,6 +36,7 @@
>  #include "svm_ops.h"
>  #include "cpuid.h"
>  #include "trace.h"
> +#include "lapic.h"
>  
>  #define GHCB_VERSION_MAX	2ULL
>  #define GHCB_VERSION_DEFAULT	2ULL
> @@ -5064,3 +5066,45 @@ void sev_free_decrypted_vmsa(struct kvm_vcpu *vcpu, struct vmcb_save_area *vmsa)
>  
>  	free_page((unsigned long)vmsa);
>  }
> +
> +void sev_savic_set_requested_irr(struct vcpu_svm *svm, bool reinjected)
> +{
> +	unsigned int i, vec, vec_pos, vec_start;
> +	struct kvm_lapic *apic;
> +	bool has_interrupts;
> +	u32 val;
> +
> +	/* Secure AVIC HW takes care of re-injection */
> +	if (reinjected)
> +		return;
> +
> +	apic = svm->vcpu.arch.apic;
> +	has_interrupts = false;
> +
> +	for (i = 0; i < ARRAY_SIZE(svm->vmcb->control.requested_irr); i++) {
> +		val = apic_get_reg(apic->regs, APIC_IRR + i * 0x10);
> +		if (!val)
> +			continue;

Add a blank line here.

> +		has_interrupts = true;
> +		svm->vmcb->control.requested_irr[i] |= val;

Add a blank line here.

> +		vec_start = i * 32;

Move this line to just below the comment.

> +		/*
> +		 * Clear each vector one by one to avoid race with concurrent
> +		 * APIC_IRR updates from the deliver_interrupt() path.
> +		 */
> +		do {
> +			vec_pos = __ffs(val);
> +			vec = vec_start + vec_pos;
> +			apic_clear_vector(vec, apic->regs + APIC_IRR);
> +			val = val & ~BIT(vec_pos);
> +		} while (val);

Would the following be cleaner?

for_each_set_bit(vec_pos, &val, 32)
	apic_clear_vector(vec_start + vec_pos, apic->regs + APIC_IRR);

Might have to make "val" an unsigned long, though, and not sure how that
affects OR'ing it into requested_irr.

> +	}
> +
> +	if (has_interrupts)
> +		svm->vmcb->control.update_irr |= BIT(0);
> +}
> +
> +bool sev_savic_has_pending_interrupt(struct kvm_vcpu *vcpu)
> +{
> +	return kvm_apic_has_interrupt(vcpu) != -1;
> +}
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 064ec98d7e67..7811a87bc111 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -52,6 +52,8 @@
>  #include "svm.h"
>  #include "svm_ops.h"
>  
> +#include "lapic.h"

Is this include really needed?

> +
>  #include "kvm_onhyperv.h"
>  #include "svm_onhyperv.h"
>  
> @@ -3689,6 +3691,9 @@ static void svm_inject_irq(struct kvm_vcpu *vcpu, bool reinjected)
>  	struct vcpu_svm *svm = to_svm(vcpu);
>  	u32 type;
>  
> +	if (sev_savic_active(vcpu->kvm))
> +		return sev_savic_set_requested_irr(svm, reinjected);
> +
>  	if (vcpu->arch.interrupt.soft) {
>  		if (svm_update_soft_interrupt_rip(vcpu))
>  			return;
> @@ -3870,6 +3875,9 @@ static int svm_interrupt_allowed(struct kvm_vcpu *vcpu, bool for_injection)
>  {
>  	struct vcpu_svm *svm = to_svm(vcpu);
>  
> +	if (sev_savic_active(vcpu->kvm))
> +		return 1;

Maybe just add a comment above this about why you always return 1 for
Secure AVIC.

> +
>  	if (svm->nested.nested_run_pending)
>  		return -EBUSY;
>  
> @@ -3890,6 +3898,9 @@ static void svm_enable_irq_window(struct kvm_vcpu *vcpu)
>  {
>  	struct vcpu_svm *svm = to_svm(vcpu);
>  
> +	if (sev_savic_active(vcpu->kvm))
> +		return;

Ditto here on the comment.

> +
>  	/*
>  	 * In case GIF=0 we can't rely on the CPU to tell us when GIF becomes
>  	 * 1, because that's a separate STGI/VMRUN intercept.  The next time we
> @@ -5132,6 +5143,8 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
>  	.apicv_post_state_restore = avic_apicv_post_state_restore,
>  	.required_apicv_inhibits = AVIC_REQUIRED_APICV_INHIBITS,
>  
> +	.protected_apic_has_interrupt = sev_savic_has_pending_interrupt,
> +
>  	.get_exit_info = svm_get_exit_info,
>  	.get_entry_info = svm_get_entry_info,
>  
> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> index 1090a48adeda..60dc424d62c4 100644
> --- a/arch/x86/kvm/svm/svm.h
> +++ b/arch/x86/kvm/svm/svm.h
> @@ -873,6 +873,8 @@ static inline bool sev_savic_active(struct kvm *kvm)
>  {
>  	return to_kvm_sev_info(kvm)->vmsa_features & SVM_SEV_FEAT_SECURE_AVIC;
>  }
> +void sev_savic_set_requested_irr(struct vcpu_svm *svm, bool reinjected);
> +bool sev_savic_has_pending_interrupt(struct kvm_vcpu *vcpu);
>  #else
>  static inline struct page *snp_safe_alloc_page_node(int node, gfp_t gfp)
>  {
> @@ -910,6 +912,8 @@ static inline struct vmcb_save_area *sev_decrypt_vmsa(struct kvm_vcpu *vcpu)
>  	return NULL;
>  }
>  static inline void sev_free_decrypted_vmsa(struct kvm_vcpu *vcpu, struct vmcb_save_area *vmsa) {}
> +static inline void sev_savic_set_requested_irr(struct vcpu_svm *svm, bool reinjected) {}
> +static inline bool sev_savic_has_pending_interrupt(struct kvm_vcpu *vcpu) { return false; }
>  #endif
>  
>  /* vmenter.S */
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 33fba801b205..65ebdc6deb92 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -10369,7 +10369,20 @@ static int kvm_check_and_inject_events(struct kvm_vcpu *vcpu,
>  		if (r < 0)
>  			goto out;
>  		if (r) {
> -			int irq = kvm_cpu_get_interrupt(vcpu);
> +			int irq;
> +
> +			/*
> +			 * Do not ack the interrupt here for guest-protected VAPIC
> +			 * which requires interrupt injection to the guest.

Maybe a bit more detail about why you don't want to do the ACK?

Thanks,
Tom

> +			 *
> +			 * ->inject_irq reads the KVM's VAPIC's APIC_IRR state and
> +			 * clears it.
> +			 */
> +			if (vcpu->arch.apic->guest_apic_protected &&
> +			    vcpu->arch.apic->prot_apic_intr_inject)
> +				irq = kvm_apic_has_interrupt(vcpu);
> +			else
> +				irq = kvm_cpu_get_interrupt(vcpu);
>  
>  			if (!WARN_ON_ONCE(irq == -1)) {
>  				kvm_queue_interrupt(vcpu, irq, false);

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH v2 08/17] KVM: SVM: Do not inject exception for Secure AVIC
  2025-09-23  5:03 ` [RFC PATCH v2 08/17] KVM: SVM: Do not inject exception " Neeraj Upadhyay
@ 2025-09-23 15:00   ` Tom Lendacky
  0 siblings, 0 replies; 32+ messages in thread
From: Tom Lendacky @ 2025-09-23 15:00 UTC (permalink / raw)
  To: Neeraj Upadhyay, kvm, seanjc, pbonzini
  Cc: linux-kernel, nikunj, Santosh.Shukla, Vasant.Hegde,
	Suravee.Suthikulpanit, bp, David.Kaplan, huibo.wang, naveen.rao,
	tiala

On 9/23/25 00:03, Neeraj Upadhyay wrote:
> From: Kishon Vijay Abraham I <kvijayab@amd.com>
> 
> Secure AVIC does not support injecting exception from the hypervisor.
> Take an early return from svm_inject_exception() for Secure AVIC enabled
> guests.
> 
> Hardware takes care of delivering exceptions initiated by the guest, as
> well as re-injecting them (in case there's an intercept before the
> exception is delivered).
> 
> Signed-off-by: Kishon Vijay Abraham I <kvijayab@amd.com>
> Signed-off-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com>
> ---
>  arch/x86/kvm/svm/svm.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 7811a87bc111..fdd612c975ae 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -374,6 +374,9 @@ static void svm_inject_exception(struct kvm_vcpu *vcpu)
>  	struct kvm_queued_exception *ex = &vcpu->arch.exception;
>  	struct vcpu_svm *svm = to_svm(vcpu);
>  
> +	if (sev_savic_active(vcpu->kvm))
> +		return;

A comment above this would be good to have.

Thanks,
Tom

> +
>  	kvm_deliver_exception_payload(vcpu, ex);
>  
>  	if (kvm_exception_is_soft(ex->vector) &&

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH v2 09/17] KVM: SVM: Do not intercept exceptions for Secure AVIC guests
  2025-09-23  5:03 ` [RFC PATCH v2 09/17] KVM: SVM: Do not intercept exceptions for Secure AVIC guests Neeraj Upadhyay
@ 2025-09-23 15:15   ` Tom Lendacky
  0 siblings, 0 replies; 32+ messages in thread
From: Tom Lendacky @ 2025-09-23 15:15 UTC (permalink / raw)
  To: Neeraj Upadhyay, kvm, seanjc, pbonzini
  Cc: linux-kernel, nikunj, Santosh.Shukla, Vasant.Hegde,
	Suravee.Suthikulpanit, bp, David.Kaplan, huibo.wang, naveen.rao,
	tiala

On 9/23/25 00:03, Neeraj Upadhyay wrote:
> Exceptions cannot be explicitly injected from the hypervisor to
> Secure AVIC enabled guests. So, KVM cannot inject exceptions into
> a Secure AVIC guest. If KVM were to intercept an exception (e.g., #PF
> or #GP), it would be unable to deliver it back to the guest, effectively
> dropping the event and leading to guest misbehavior or hangs. So,
> clear exception intercepts so that all exceptions are handled directly by
> the guest without KVM intervention.
> 
> Signed-off-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com>
> ---
>  arch/x86/kvm/svm/sev.c | 11 ++++++++++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index a64fcc7637c7..837ab55a3330 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -4761,8 +4761,17 @@ static void sev_es_init_vmcb(struct vcpu_svm *svm)
>  	/* Can't intercept XSETBV, HV can't modify XCR0 directly */
>  	svm_clr_intercept(svm, INTERCEPT_XSETBV);
>  
> -	if (sev_savic_active(vcpu->kvm))
> +	if (sev_savic_active(vcpu->kvm)) {
>  		svm_set_intercept_for_msr(vcpu, MSR_AMD64_SAVIC_CONTROL, MSR_TYPE_RW, false);
> +
> +		/* Clear all exception intercepts. */
> +		clr_exception_intercept(svm, PF_VECTOR);
> +		clr_exception_intercept(svm, UD_VECTOR);
> +		clr_exception_intercept(svm, MC_VECTOR);
> +		clr_exception_intercept(svm, AC_VECTOR);
> +		clr_exception_intercept(svm, DB_VECTOR);
> +		clr_exception_intercept(svm, GP_VECTOR);

Some of these are cleared no matter what prior to here. For example,
PF_VECTOR is cleared if npt_enabled is true (which is required for SEV),
UD_VECTOR and GP_VECTOR are cleared in sev_init_vmcb().

For the MC_VECTOR interception, the SVM code just ignores it today by
returning 1 immediately, so clearing the interception looks like a NOP,
but I might be missing something.

Thanks,
Tom

> +	}
>  }
>  
>  void sev_init_vmcb(struct vcpu_svm *svm)

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH v2 10/17] KVM: SVM: Set VGIF in VMSA area for Secure AVIC guests
  2025-09-23  5:03 ` [RFC PATCH v2 10/17] KVM: SVM: Set VGIF in VMSA area " Neeraj Upadhyay
@ 2025-09-23 15:16   ` Tom Lendacky
  0 siblings, 0 replies; 32+ messages in thread
From: Tom Lendacky @ 2025-09-23 15:16 UTC (permalink / raw)
  To: Neeraj Upadhyay, kvm, seanjc, pbonzini
  Cc: linux-kernel, nikunj, Santosh.Shukla, Vasant.Hegde,
	Suravee.Suthikulpanit, bp, David.Kaplan, huibo.wang, naveen.rao,
	tiala

On 9/23/25 00:03, Neeraj Upadhyay wrote:
> From: Kishon Vijay Abraham I <kvijayab@amd.com>
> 
> Unlike standard SVM which uses the V_GIF (Virtual Global Interrupt Flag)
> bit in the VMCB, Secure AVIC ignores this field.
> 
> Instead, the hardware requires an equivalent V_GIF bit to be set within
> the vintr_ctrl field of the VMSA (Virtual Machine Save Area). Failure
> to set this bit will cause the hardware to block all interrupt delivery,
> rendering the guest non-functional.
> 
> To enable interrupts for Secure AVIC guests, modify sev_es_sync_vmsa()
> to unconditionally set the V_GIF_MASK in the VMSA's vintr_ctrl field
> whenever Secure AVIC is active. This ensures the hardware correctly
> identifies the guest as interruptible.
> 
> Signed-off-by: Kishon Vijay Abraham I <kvijayab@amd.com>
> Signed-off-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com>
> ---
>  arch/x86/kvm/svm/sev.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 837ab55a3330..2dee210efb37 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -884,6 +884,9 @@ static int sev_es_sync_vmsa(struct vcpu_svm *svm)
>  
>  	save->sev_features = sev->vmsa_features;
>  
> +	if (sev_savic_active(vcpu->kvm))
> +		save->vintr_ctrl |= V_GIF_MASK;

A comment above this would be good.

Thanks,
Tom

> +
>  	/*
>  	 * Skip FPU and AVX setup with KVM_SEV_ES_INIT to avoid
>  	 * breaking older measurements.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH v2 11/17] KVM: SVM: Enable NMI support for Secure AVIC guests
  2025-09-23  5:03 ` [RFC PATCH v2 11/17] KVM: SVM: Enable NMI support " Neeraj Upadhyay
@ 2025-09-23 15:25   ` Tom Lendacky
  0 siblings, 0 replies; 32+ messages in thread
From: Tom Lendacky @ 2025-09-23 15:25 UTC (permalink / raw)
  To: Neeraj Upadhyay, kvm, seanjc, pbonzini
  Cc: linux-kernel, nikunj, Santosh.Shukla, Vasant.Hegde,
	Suravee.Suthikulpanit, bp, David.Kaplan, huibo.wang, naveen.rao,
	tiala

On 9/23/25 00:03, Neeraj Upadhyay wrote:
> The Secure AVIC hardware introduces a new model for handling Non-Maskable
> Interrupts (NMIs). This model differs significantly from standard SVM, as
> guest NMI state is managed by the hardware and is not visible to KVM.
> 
> Consequently, KVM can no longer use the generic EVENT_INJ mechanism and
> must not track NMI masking state in software. Instead, it must adopt the
> vNMI (Virtual NMI) flow, which is the only mechanism supported by
> Secure AVIC.
> 
> Enable NMI support by making three key changes:
> 
> 1.  Enable NMI in VMSA: Set the V_NMI_ENABLE_MASK bit in the VMSA's
>     vintr_ctr field. This is a hardware prerequisite to enable the
>     vNMI feature for the guest.
> 
> 2.  Use vNMI for Injection: Modify svm_inject_nmi() to use the vNMI
>     flow for Secure AVIC guests. When an NMI is requested, set the
>     V_NMI_PENDING_MASK in the VMCB instead of using EVENT_INJ.
> 
> 3.  Update NMI Windowing: Modify svm_nmi_allowed() to reflect that
>     hardware now manages NMI blocking. KVM's only responsibility is to
>     avoid queuing a new vNMI if one is already pending. The check is
>     now simplified to whether V_NMI_PENDING_MASK is already set.
> 
> Co-developed-by: Kishon Vijay Abraham I <kvijayab@amd.com>
> Signed-off-by: Kishon Vijay Abraham I <kvijayab@amd.com>
> Signed-off-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com>
> ---
>  arch/x86/kvm/svm/sev.c |  2 +-
>  arch/x86/kvm/svm/svm.c | 56 ++++++++++++++++++++++++++----------------
>  2 files changed, 36 insertions(+), 22 deletions(-)
> 
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 2dee210efb37..7c66aefe428a 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -885,7 +885,7 @@ static int sev_es_sync_vmsa(struct vcpu_svm *svm)
>  	save->sev_features = sev->vmsa_features;
>  
>  	if (sev_savic_active(vcpu->kvm))
> -		save->vintr_ctrl |= V_GIF_MASK;
> +		save->vintr_ctrl |= V_GIF_MASK | V_NMI_ENABLE_MASK;
>  
>  	/*
>  	 * Skip FPU and AVX setup with KVM_SEV_ES_INIT to avoid
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index fdd612c975ae..a945bc094c1a 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -3635,27 +3635,6 @@ static int pre_svm_run(struct kvm_vcpu *vcpu)
>  	return 0;
>  }
>  
> -static void svm_inject_nmi(struct kvm_vcpu *vcpu)
> -{
> -	struct vcpu_svm *svm = to_svm(vcpu);
> -
> -	svm->vmcb->control.event_inj = SVM_EVTINJ_VALID | SVM_EVTINJ_TYPE_NMI;
> -
> -	if (svm->nmi_l1_to_l2)
> -		return;
> -
> -	/*
> -	 * No need to manually track NMI masking when vNMI is enabled, hardware
> -	 * automatically sets V_NMI_BLOCKING_MASK as appropriate, including the
> -	 * case where software directly injects an NMI.
> -	 */
> -	if (!is_vnmi_enabled(svm)) {
> -		svm->nmi_masked = true;
> -		svm_set_iret_intercept(svm);
> -	}
> -	++vcpu->stat.nmi_injections;
> -}

A pre-patch that moves this function would make the changes you make to
it in this patch more obvious.

Thanks,
Tom

> -
>  static bool svm_is_vnmi_pending(struct kvm_vcpu *vcpu)
>  {
>  	struct vcpu_svm *svm = to_svm(vcpu);
> @@ -3689,6 +3668,33 @@ static bool svm_set_vnmi_pending(struct kvm_vcpu *vcpu)
>  	return true;
>  }
>  
> +static void svm_inject_nmi(struct kvm_vcpu *vcpu)
> +{
> +	struct vcpu_svm *svm = to_svm(vcpu);
> +
> +	if (sev_savic_active(vcpu->kvm)) {
> +		svm_set_vnmi_pending(vcpu);
> +		++vcpu->stat.nmi_injections;
> +		return;
> +	}
> +
> +	svm->vmcb->control.event_inj = SVM_EVTINJ_VALID | SVM_EVTINJ_TYPE_NMI;
> +
> +	if (svm->nmi_l1_to_l2)
> +		return;
> +
> +	/*
> +	 * No need to manually track NMI masking when vNMI is enabled, hardware
> +	 * automatically sets V_NMI_BLOCKING_MASK as appropriate, including the
> +	 * case where software directly injects an NMI.
> +	 */
> +	if (!is_vnmi_enabled(svm)) {
> +		svm->nmi_masked = true;
> +		svm_set_iret_intercept(svm);
> +	}
> +	++vcpu->stat.nmi_injections;
> +}
> +
>  static void svm_inject_irq(struct kvm_vcpu *vcpu, bool reinjected)
>  {
>  	struct vcpu_svm *svm = to_svm(vcpu);
> @@ -3836,6 +3842,14 @@ bool svm_nmi_blocked(struct kvm_vcpu *vcpu)
>  static int svm_nmi_allowed(struct kvm_vcpu *vcpu, bool for_injection)
>  {
>  	struct vcpu_svm *svm = to_svm(vcpu);
> +
> +	/* Secure AVIC only supports V_NMI based NMI injection. */
> +	if (sev_savic_active(vcpu->kvm)) {
> +		if (svm->vmcb->control.int_ctl & V_NMI_PENDING_MASK)
> +			return 0;
> +		return 1;
> +	}
> +
>  	if (svm->nested.nested_run_pending)
>  		return -EBUSY;
>  

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH v2 12/17] KVM: SVM: Add VMGEXIT handler for Secure AVIC backing page
  2025-09-23  5:03 ` [RFC PATCH v2 12/17] KVM: SVM: Add VMGEXIT handler for Secure AVIC backing page Neeraj Upadhyay
@ 2025-09-23 16:02   ` Tom Lendacky
  0 siblings, 0 replies; 32+ messages in thread
From: Tom Lendacky @ 2025-09-23 16:02 UTC (permalink / raw)
  To: Neeraj Upadhyay, kvm, seanjc, pbonzini
  Cc: linux-kernel, nikunj, Santosh.Shukla, Vasant.Hegde,
	Suravee.Suthikulpanit, bp, David.Kaplan, huibo.wang, naveen.rao,
	tiala

On 9/23/25 00:03, Neeraj Upadhyay wrote:
> The Secure AVIC hardware requires uninterrupted access to the guest's
> APIC backing page. If this page is not present in the Nested Page Table
> (NPT) during a hardware access, a non-recoverable nested page fault
> occurs. This sets a BUSY flag in the VMSA and causes subsequent
> VMRUNs to fail with an unrecoverable VMEXIT_BUSY, effectively
> killing the vCPU.
> 
> This situation can arise if the backing page resides within a 2MB large
> page in the NPT. If other parts of that large page are modified (e.g.,
> memory state changes), KVM would split the 2MB NPT entry into 4KB
> entries. This process can temporarily zap the PTE for the backing page,
> creating a window for the fatal hardware access.
> 
> Introduce a new GHCB VMGEXIT protocol, SVM_VMGEXIT_SECURE_AVIC, to
> allow the guest to explicitly inform KVM of the APIC backing page's
> location, thereby enabling KVM to guarantee its presence in the NPT.
> 
> Implement two actions for this protocol:
> 
> - SVM_VMGEXIT_SAVIC_REGISTER_BACKING_PAGE:
>   On this request, KVM receives the GPA of the backing page. To prevent
>   the 2MB page-split issue, immediately perform a PSMASH on the GPA by
>   calling sev_handle_rmp_fault(). This proactively breaks any
>   containing 2MB NPT entry into 4KB pages, isolating the backing page's
>   PTE and guaranteeing its presence. Store the GPA for future reference.
> 
> - SVM_VMGEXIT_SAVIC_UNREGISTER_BACKING_PAGE:
>   On this request, clear the stored GPA, releasing KVM from its
>   obligation to maintain the NPT entry. Return the previously
>   registered GPA to the guest.
> 
> This mechanism ensures the stability of the APIC backing page mapping,
> which is critical for the correct operation of Secure AVIC.
> 
> Co-developed-by: Kishon Vijay Abraham I <kvijayab@amd.com>
> Signed-off-by: Kishon Vijay Abraham I <kvijayab@amd.com>
> Signed-off-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com>
> ---
>  arch/x86/include/uapi/asm/svm.h |  3 ++
>  arch/x86/kvm/svm/sev.c          | 59 +++++++++++++++++++++++++++++++++
>  arch/x86/kvm/svm/svm.h          |  1 +
>  3 files changed, 63 insertions(+)
> 
> diff --git a/arch/x86/include/uapi/asm/svm.h b/arch/x86/include/uapi/asm/svm.h
> index 9c640a521a67..f1ef52e0fab1 100644
> --- a/arch/x86/include/uapi/asm/svm.h
> +++ b/arch/x86/include/uapi/asm/svm.h
> @@ -118,6 +118,9 @@
>  #define SVM_VMGEXIT_AP_CREATE			1
>  #define SVM_VMGEXIT_AP_DESTROY			2
>  #define SVM_VMGEXIT_SNP_RUN_VMPL		0x80000018
> +#define SVM_VMGEXIT_SECURE_AVIC			0x8000001a
> +#define SVM_VMGEXIT_SAVIC_REGISTER_BACKING_PAGE	0
> +#define SVM_VMGEXIT_SAVIC_UNREGISTER_BACKING_PAGE	1
>  #define SVM_VMGEXIT_HV_FEATURES			0x8000fffd
>  #define SVM_VMGEXIT_TERM_REQUEST		0x8000fffe
>  #define SVM_VMGEXIT_TERM_REASON(reason_set, reason_code)	\
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 7c66aefe428a..3e9cc50f2705 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -3399,6 +3399,15 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
>  		    !kvm_ghcb_rcx_is_valid(svm))
>  			goto vmgexit_err;
>  		break;
> +	case SVM_VMGEXIT_SECURE_AVIC:
> +		if (!sev_savic_active(vcpu->kvm))
> +			goto vmgexit_err;
> +		if (!kvm_ghcb_rax_is_valid(svm))
> +			goto vmgexit_err;
> +		if (svm->vmcb->control.exit_info_1 == SVM_VMGEXIT_SAVIC_REGISTER_BACKING_PAGE)
> +			if (!kvm_ghcb_rbx_is_valid(svm))
> +				goto vmgexit_err;
> +		break;
>  	case SVM_VMGEXIT_MMIO_READ:
>  	case SVM_VMGEXIT_MMIO_WRITE:
>  		if (!kvm_ghcb_sw_scratch_is_valid(svm))
> @@ -4490,6 +4499,53 @@ static bool savic_handle_msr_exit(struct kvm_vcpu *vcpu)
>  	return false;
>  }
>  
> +static int sev_handle_savic_vmgexit(struct vcpu_svm *svm)
> +{
> +	struct kvm_vcpu *vcpu = NULL;

This gets confusing below; how about calling this target_vcpu? Also, it
shouldn't need initializing, right?

> +	u64 apic_id;
> +
> +	apic_id = kvm_rax_read(&svm->vcpu);
> +
> +	if (apic_id == -1ULL) {
> +		vcpu = &svm->vcpu;
> +	} else {
> +		vcpu = kvm_get_vcpu_by_id(vcpu->kvm, apic_id);
> +		if (!vcpu)
> +			goto savic_request_invalid;
> +	}
> +
> +	switch (svm->vmcb->control.exit_info_1) {
> +	case SVM_VMGEXIT_SAVIC_REGISTER_BACKING_PAGE:
> +		gpa_t gpa;
> +
> +		gpa = kvm_rbx_read(&svm->vcpu);
> +		if (!PAGE_ALIGNED(gpa))
> +			goto savic_request_invalid;
> +
> +		/*
> +		 * sev_handle_rmp_fault() invocation would result in PSMASH if
> +		 * NPTE size is 2M.
> +		 */

Explaining why you're invoking sev_handle_rmp_fault() would be more
appropriate in the comment.

Thanks,
Tom

> +		sev_handle_rmp_fault(vcpu, gpa, 0);
> +		to_svm(vcpu)->sev_savic_gpa = gpa;
> +		break;
> +	case SVM_VMGEXIT_SAVIC_UNREGISTER_BACKING_PAGE:
> +		kvm_rbx_write(&svm->vcpu, to_svm(vcpu)->sev_savic_gpa);
> +		to_svm(vcpu)->sev_savic_gpa = 0;
> +		break;
> +	default:
> +		goto savic_request_invalid;
> +	}
> +
> +	return 1;
> +
> +savic_request_invalid:
> +	ghcb_set_sw_exit_info_1(svm->sev_es.ghcb, 2);
> +	ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, GHCB_ERR_INVALID_INPUT);
> +
> +	return 1;
> +}
> +
>  int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
>  {
>  	struct vcpu_svm *svm = to_svm(vcpu);
> @@ -4628,6 +4684,9 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
>  			    control->exit_info_1, control->exit_info_2);
>  		ret = -EINVAL;
>  		break;
> +	case SVM_VMGEXIT_SECURE_AVIC:
> +		ret = sev_handle_savic_vmgexit(svm);
> +		break;
>  	case SVM_EXIT_MSR:
>  		if (sev_savic_active(vcpu->kvm) && savic_handle_msr_exit(vcpu))
>  			return 1;
> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> index a3edb6e720cd..8043833a1a8c 100644
> --- a/arch/x86/kvm/svm/svm.h
> +++ b/arch/x86/kvm/svm/svm.h
> @@ -337,6 +337,7 @@ struct vcpu_svm {
>  	bool guest_gif;
>  
>  	bool sev_savic_has_pending_ipi;
> +	gpa_t sev_savic_gpa;
>  };
>  
>  struct svm_cpu_data {

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH v2 13/17] KVM: SVM: Add IOAPIC EOI support for Secure AVIC guests
  2025-09-23  5:03 ` [RFC PATCH v2 13/17] KVM: SVM: Add IOAPIC EOI support for Secure AVIC guests Neeraj Upadhyay
@ 2025-09-23 16:15   ` Tom Lendacky
  0 siblings, 0 replies; 32+ messages in thread
From: Tom Lendacky @ 2025-09-23 16:15 UTC (permalink / raw)
  To: Neeraj Upadhyay, kvm, seanjc, pbonzini
  Cc: linux-kernel, nikunj, Santosh.Shukla, Vasant.Hegde,
	Suravee.Suthikulpanit, bp, David.Kaplan, huibo.wang, naveen.rao,
	tiala

On 9/23/25 00:03, Neeraj Upadhyay wrote:
> While Secure AVIC hardware accelerates End-of-Interrupt (EOI) processing
> for edge-triggered interrupts, it requires hypervisor assistance for
> level-triggered interrupts originating from the IOAPIC. For these
> interrupts, a guest write to the EOI MSR triggers a VM-Exit.
> 
> The primary challenge in handling this exit is that the guest's real
> In-Service Register (ISR) is not visible to KVM. When KVM receives an EOI,
> it has no direct way of knowing which interrupt vector is being
> acknowledged.
> 
> To solve this, use KVM's software vAPIC state as a shadow tracking
> mechanism for active, level-triggered interrupts.
> 
> The implementation follows this flow:
> 
> 1.  On interrupt injection (sev_savic_set_requested_irr), check KVM's
>     software vAPIC Trigger Mode Register (TMR) to identify if the
>     interrupt is level-triggered.
> 
> 2.  If it is, set the corresponding vector in KVM's software shadow ISR.
>     This marks the interrupt as "in-service" from KVM's perspective.
> 
> 3.  When the guest later issues an EOI, the APIC_EOI MSR write exit
>     handler finds the highest vector set in this shadow ISR.
> 
> 4.  The handler then clears the vector from the shadow ISR and calls
>     kvm_apic_set_eoi_accelerated() to propagate the EOI to the virtual
>     IOAPIC, allowing it to de-assert the interrupt line.
> 
> This enables correct EOI handling for level-triggered interrupts in
> Secure AVIC guests, despite the hardware-enforced opacity of the guest's
> APIC state.
> 
> Signed-off-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com>
> ---
>  arch/x86/kvm/svm/sev.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 3e9cc50f2705..5be2956fb812 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -4474,7 +4474,9 @@ static void savic_handle_icr_write(struct kvm_vcpu *kvm_vcpu, u64 icr)
>  
>  static bool savic_handle_msr_exit(struct kvm_vcpu *vcpu)
>  {
> +	struct kvm_lapic *apic;
>  	u32 msr, reg;
> +	int vec;
>  
>  	msr = kvm_rcx_read(vcpu);
>  	reg = (msr - APIC_BASE_MSR) << 4;
> @@ -4492,6 +4494,12 @@ static bool savic_handle_msr_exit(struct kvm_vcpu *vcpu)
>  			return true;
>  		}
>  		break;
> +	case APIC_EOI:
> +		apic = vcpu->arch.apic;
> +		vec = apic_find_highest_vector(apic->regs + APIC_ISR);
> +		apic_clear_vector(vec, apic->regs + APIC_ISR);
> +		kvm_apic_set_eoi_accelerated(vcpu, vec);
> +		return true;

Do you need to ensure that this is truly a WRMSR being done vs a RDMSR?
Or are you guaranteed that it is a WRMSR at this point?

Thanks,
Tom

>  	default:
>  		break;
>  	}
> @@ -5379,6 +5387,8 @@ void sev_savic_set_requested_irr(struct vcpu_svm *svm, bool reinjected)
>  			vec = vec_start + vec_pos;
>  			apic_clear_vector(vec, apic->regs + APIC_IRR);
>  			val = val & ~BIT(vec_pos);
> +			if (apic_test_vector(vec, apic->regs + APIC_TMR))
> +				apic_set_vector(vec, apic->regs + APIC_ISR);
>  		} while (val);
>  	}
>  

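The shadow-ISR bookkeeping described in the commit message above — set the vector on injection when it is level-triggered, then on EOI find and clear the highest in-service vector — can be sketched with a 256-bit bitmap in user-space C. All names here are illustrative, not KVM's helpers:

```c
#include <assert.h>
#include <stdint.h>

#define APIC_NR_VECTORS 256

/* 256-bit vector bitmap, 32 bits per word, as in the APIC IRR/ISR layout. */
typedef uint32_t apic_bitmap[APIC_NR_VECTORS / 32];

static int find_highest_vector(const apic_bitmap map)
{
	/* Scan from the top word down, like apic_find_highest_vector(). */
	for (int word = (APIC_NR_VECTORS / 32) - 1; word >= 0; word--) {
		if (map[word])
			return word * 32 + 31 - __builtin_clz(map[word]);
	}
	return -1;
}

static void set_vector(apic_bitmap map, int vec)
{
	map[vec / 32] |= 1u << (vec % 32);
}

static void clear_vector(apic_bitmap map, int vec)
{
	map[vec / 32] &= ~(1u << (vec % 32));
}

/*
 * EOI handling per the commit message: the highest vector in the
 * shadow ISR is taken to be the one being acknowledged; clear it and
 * return it so the caller can forward the EOI to the virtual IOAPIC.
 */
static int shadow_isr_eoi(apic_bitmap isr)
{
	int vec = find_highest_vector(isr);

	if (vec >= 0)
		clear_vector(isr, vec);
	return vec;
}
```

The sketch also shows why Tom's RDMSR-vs-WRMSR question matters: the handler unconditionally retires a vector, so running it for a read of APIC_EOI would corrupt the shadow state.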

* Re: [RFC PATCH v2 14/17] KVM: x86/ioapic: Disable RTC EOI tracking for protected APIC guests
  2025-09-23  5:03 ` [RFC PATCH v2 14/17] KVM: x86/ioapic: Disable RTC EOI tracking for protected APIC guests Neeraj Upadhyay
@ 2025-09-23 16:23   ` Tom Lendacky
  0 siblings, 0 replies; 32+ messages in thread
From: Tom Lendacky @ 2025-09-23 16:23 UTC (permalink / raw)
  To: Neeraj Upadhyay, kvm, seanjc, pbonzini
  Cc: linux-kernel, nikunj, Santosh.Shukla, Vasant.Hegde,
	Suravee.Suthikulpanit, bp, David.Kaplan, huibo.wang, naveen.rao,
	tiala

On 9/23/25 00:03, Neeraj Upadhyay wrote:
> KVM tracks End-of-Interrupts (EOIs) for the legacy RTC interrupt (GSI 8)
> to detect and report coalesced interrupts to userspace. This mechanism
> fundamentally relies on KVM having visibility into the guest's interrupt
> acknowledgment state.
> 
> This assumption is invalid for guests with a protected APIC (e.g., Secure
> AVIC) for two main reasons:
> 
> a. The guest's true In-Service Register (ISR) is not visible to KVM,
>    making it impossible to know if the previous interrupt is still active.
>    So, lazy pending EOI checks cannot be done.
> 
> b. The RTC interrupt is edge-triggered, and its EOI is accelerated by the
>    hardware without a VM-Exit. KVM never sees the EOI event.
> 
> Since KVM can observe neither the interrupt's service status nor its EOI,
> the tracking logic is invalid. So, disable this feature for all protected
> APIC guests. This change means that userspace will no longer be able to
> detect coalesced RTC interrupts for these specific guest types.
> 
> Signed-off-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com>
> ---
>  arch/x86/kvm/ioapic.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c
> index 2b5d389bca5f..308778ba4f58 100644
> --- a/arch/x86/kvm/ioapic.c
> +++ b/arch/x86/kvm/ioapic.c
> @@ -113,6 +113,9 @@ static void __rtc_irq_eoi_tracking_restore_one(struct kvm_vcpu *vcpu)
>  	struct dest_map *dest_map = &ioapic->rtc_status.dest_map;
>  	union kvm_ioapic_redirect_entry *e;
>  
> +	if (vcpu->arch.apic->guest_apic_protected)
> +		return;

A comment above this code would be good.

> +
>  	e = &ioapic->redirtbl[RTC_GSI];
>  	if (!kvm_apic_match_dest(vcpu, NULL, APIC_DEST_NOSHORT,
>  				 e->fields.dest_id,
> @@ -476,6 +479,7 @@ static int ioapic_service(struct kvm_ioapic *ioapic, int irq, bool line_status)
>  {
>  	union kvm_ioapic_redirect_entry *entry = &ioapic->redirtbl[irq];
>  	struct kvm_lapic_irq irqe;
> +	struct kvm_vcpu *vcpu;
>  	int ret;
>  
>  	if (entry->fields.mask ||
> @@ -505,7 +509,9 @@ static int ioapic_service(struct kvm_ioapic *ioapic, int irq, bool line_status)
>  		BUG_ON(ioapic->rtc_status.pending_eoi != 0);
>  		ret = kvm_irq_delivery_to_apic(ioapic->kvm, NULL, &irqe,
>  					       &ioapic->rtc_status.dest_map);
> -		ioapic->rtc_status.pending_eoi = (ret < 0 ? 0 : ret);
> +		vcpu = kvm_get_vcpu(ioapic->kvm, 0);
> +		if (!vcpu->arch.apic->guest_apic_protected)
> +			ioapic->rtc_status.pending_eoi = (ret < 0 ? 0 : ret);

And a comment about this, too.

Thanks,
Tom

>  	} else
>  		ret = kvm_irq_delivery_to_apic(ioapic->kvm, NULL, &irqe, NULL);
>  

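For context, the pending_eoi coalescing logic that the patch above bypasses for protected-APIC guests can be modeled in a few lines. This is a simplified user-space sketch (struct and function names are illustrative), not KVM's actual code:

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal model of ioapic->rtc_status plus the new guard flag. */
struct rtc_status_sketch {
	int pending_eoi;
	bool guest_apic_protected;
};

/* Returns true if the RTC interrupt is delivered, false if coalesced. */
static bool rtc_deliver(struct rtc_status_sketch *st)
{
	if (st->guest_apic_protected) {
		/* EOIs are invisible to KVM: never track, never report coalescing. */
		return true;
	}
	if (st->pending_eoi > 0)
		return false;	/* previous interrupt not yet EOI'd: coalesced */
	st->pending_eoi = 1;	/* arm the tracker until the guest's EOI */
	return true;
}

static void rtc_eoi(struct rtc_status_sketch *st)
{
	if (!st->guest_apic_protected)
		st->pending_eoi = 0;
}
```

The model makes the failure mode concrete: with a protected APIC the rtc_eoi() side never fires, so pending_eoi would stay armed forever and every subsequent RTC interrupt would be misreported as coalesced — hence disabling the tracking entirely.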

* Re: [RFC PATCH v2 15/17] KVM: SVM: Check injected timers for Secure AVIC guests
  2025-09-23  5:03 ` [RFC PATCH v2 15/17] KVM: SVM: Check injected timers for Secure AVIC guests Neeraj Upadhyay
@ 2025-09-23 16:32   ` Tom Lendacky
  0 siblings, 0 replies; 32+ messages in thread
From: Tom Lendacky @ 2025-09-23 16:32 UTC (permalink / raw)
  To: Neeraj Upadhyay, kvm, seanjc, pbonzini
  Cc: linux-kernel, nikunj, Santosh.Shukla, Vasant.Hegde,
	Suravee.Suthikulpanit, bp, David.Kaplan, huibo.wang, naveen.rao,
	tiala

On 9/23/25 00:03, Neeraj Upadhyay wrote:
> The kvm_wait_lapic_expire() function is a pre-VMRUN optimization that
> allows a vCPU to wait for an imminent LAPIC timer interrupt. However,
> this function is not fully compatible with protected APIC models like
> Secure AVIC because it relies on inspecting KVM's software vAPIC state.
> For Secure AVIC, the true timer state is hardware-managed and opaque
> to KVM. For this reason, kvm_wait_lapic_expire() does not check whether
> a timer interrupt has been injected for guests with protected APIC
> state.
> 
> For protected APIC guests, the check for an injected timer needs to be
> done by the callers of kvm_wait_lapic_expire(). So, for Secure AVIC
> guests, check the to-be-injected vectors in requested_irr for the
> timer interrupt vector before calling kvm_wait_lapic_expire().
> 
> Signed-off-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com>
> ---
>  arch/x86/kvm/svm/sev.c | 8 ++++++++
>  arch/x86/kvm/svm/svm.c | 3 ++-
>  arch/x86/kvm/svm/svm.h | 2 ++
>  3 files changed, 12 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 5be2956fb812..3f6cf8d5068a 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -5405,3 +5405,11 @@ bool sev_savic_has_pending_interrupt(struct kvm_vcpu *vcpu)
>  	return READ_ONCE(to_svm(vcpu)->sev_savic_has_pending_ipi) ||
>  		kvm_apic_has_interrupt(vcpu) != -1;
>  }
> +
> +bool sev_savic_timer_int_injected(struct kvm_vcpu *vcpu)
> +{
> +	u32 reg  = kvm_lapic_get_reg(vcpu->arch.apic, APIC_LVTT);

Extra space before the "="

> +	int vec = reg & APIC_VECTOR_MASK;
> +
> +	return to_svm(vcpu)->vmcb->control.requested_irr[vec / 32] & BIT(vec % 32);
> +}
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index a945bc094c1a..d0d972731ea7 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -4335,7 +4335,8 @@ static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu, u64 run_flags)
>  	    vcpu->arch.host_debugctl != svm->vmcb->save.dbgctl)
>  		update_debugctlmsr(svm->vmcb->save.dbgctl);
>  
> -	kvm_wait_lapic_expire(vcpu);
> +	if (!sev_savic_active(vcpu->kvm) || sev_savic_timer_int_injected(vcpu))
> +		kvm_wait_lapic_expire(vcpu);
>  
>  	/*
>  	 * If this vCPU has touched SPEC_CTRL, restore the guest's value if
> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> index 8043833a1a8c..ecc4ea11822d 100644
> --- a/arch/x86/kvm/svm/svm.h
> +++ b/arch/x86/kvm/svm/svm.h
> @@ -878,6 +878,7 @@ static inline bool sev_savic_active(struct kvm *kvm)
>  }
>  void sev_savic_set_requested_irr(struct vcpu_svm *svm, bool reinjected);
>  bool sev_savic_has_pending_interrupt(struct kvm_vcpu *vcpu);
> +bool sev_savic_timer_int_injected(struct kvm_vcpu *vcpu);
>  #else
>  static inline struct page *snp_safe_alloc_page_node(int node, gfp_t gfp)
>  {
> @@ -917,6 +918,7 @@ static inline struct vmcb_save_area *sev_decrypt_vmsa(struct kvm_vcpu *vcpu)
>  static inline void sev_free_decrypted_vmsa(struct kvm_vcpu *vcpu, struct vmcb_save_area *vmsa) {}
>  static inline void sev_savic_set_requested_irr(struct vcpu_svm *svm, bool reinjected) {}
>  static inline bool sev_savic_has_pending_interrupt(struct kvm_vcpu *vcpu) { return false; }
> +static inline bool sev_savic_timer_int_injected(struct kvm_vcpu *vcpu) { return true; }

Shouldn't this return false? If CONFIG_KVM_AMD_SEV isn't defined, then
sev_savic_active() will always be false and this won't be called anyway.

Thanks,
Tom

>  #endif
>  
>  /* vmenter.S */

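The proposed sev_savic_timer_int_injected() check — extract the timer vector from the LVTT register value and test its bit in the VMCB's requested_irr — can be modeled in user-space C. The struct below is an illustrative stand-in for the VMCB control area, not the real layout:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define APIC_VECTOR_MASK 0xFF

/* Stand-in for the 256-bit requested_irr array in the VMCB, 8 x 32 bits. */
struct vmcb_ctrl_sketch {
	uint32_t requested_irr[8];
};

/*
 * Mirror of the patch's logic: the low byte of LVTT is the timer
 * vector; report whether that vector is among the to-be-injected
 * interrupts the hypervisor has queued for the guest.
 */
static bool timer_int_injected(const struct vmcb_ctrl_sketch *ctrl,
			       uint32_t lvtt)
{
	int vec = lvtt & APIC_VECTOR_MASK;

	return ctrl->requested_irr[vec / 32] & (1u << (vec % 32));
}
```

This is the caller-side substitute for the ISR/IRR inspection that kvm_wait_lapic_expire() cannot do itself when the guest's APIC state is opaque.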

* Re: [RFC PATCH v2 05/17] KVM: SVM: Do not intercept SECURE_AVIC_CONTROL MSR for SAVIC guests
  2025-09-23 13:55   ` Tom Lendacky
@ 2025-09-25  5:16     ` Upadhyay, Neeraj
  2025-09-25 13:54       ` Tom Lendacky
  0 siblings, 1 reply; 32+ messages in thread
From: Upadhyay, Neeraj @ 2025-09-25  5:16 UTC (permalink / raw)
  To: Tom Lendacky, kvm, seanjc, pbonzini
  Cc: linux-kernel, nikunj, Santosh.Shukla, Vasant.Hegde,
	Suravee.Suthikulpanit, bp, David.Kaplan, huibo.wang, naveen.rao,
	tiala



On 9/23/2025 7:25 PM, Tom Lendacky wrote:
> On 9/23/25 00:03, Neeraj Upadhyay wrote:
>> Disable interception for SECURE_AVIC_CONTROL MSR for Secure AVIC
>> enabled guests. The SECURE_AVIC_CONTROL MSR holds the GPA of the
>> guest APIC backing page and bitfields to control enablement of Secure
>> AVIC and whether the guest allows NMIs to be injected by the hypervisor.
>> This MSR is populated by the guest and can be read by the guest to get
>> the GPA of the APIC backing page. The MSR can only be accessed in Secure
>> AVIC mode; accessing it when not in Secure AVIC mode results in #GP. So,
>> KVM should not intercept it.
> 
> The reason KVM should not intercept the MSR access is that the guest
> would not be able to actually set the MSR if it is intercepted.
> 

Yes, does something like the below look ok?

Disable interception for SECURE_AVIC_CONTROL MSR for Secure AVIC
enabled guests. The SECURE_AVIC_CONTROL MSR holds the GPA of the
guest APIC backing page and bitfields to control enablement of Secure
AVIC and whether the guest allows NMIs to be injected by the hypervisor.
This MSR is populated by the guest and can be read by the guest to get
the GPA of the APIC backing page. This MSR is only accessible by the
guest when the Secure AVIC feature is active; any other access attempt
will result in a #GP fault. So, KVM should not intercept access to this
MSR, as doing so prevents the guest from successfully reading/writing 
its configuration and enabling the feature.



- Neeraj



* Re: [RFC PATCH v2 05/17] KVM: SVM: Do not intercept SECURE_AVIC_CONTROL MSR for SAVIC guests
  2025-09-25  5:16     ` Upadhyay, Neeraj
@ 2025-09-25 13:54       ` Tom Lendacky
  0 siblings, 0 replies; 32+ messages in thread
From: Tom Lendacky @ 2025-09-25 13:54 UTC (permalink / raw)
  To: Upadhyay, Neeraj, kvm, seanjc, pbonzini
  Cc: linux-kernel, nikunj, Santosh.Shukla, Vasant.Hegde,
	Suravee.Suthikulpanit, bp, David.Kaplan, huibo.wang, naveen.rao,
	tiala

On 9/25/25 00:16, Upadhyay, Neeraj wrote:
> 
> 
> On 9/23/2025 7:25 PM, Tom Lendacky wrote:
>> On 9/23/25 00:03, Neeraj Upadhyay wrote:
>>> Disable interception for SECURE_AVIC_CONTROL MSR for Secure AVIC
>>> enabled guests. The SECURE_AVIC_CONTROL MSR holds the GPA of the
>>> guest APIC backing page and bitfields to control enablement of Secure
>>> AVIC and whether the guest allows NMIs to be injected by the hypervisor.
>>> This MSR is populated by the guest and can be read by the guest to get
>>> the GPA of the APIC backing page. The MSR can only be accessed in Secure
>>> AVIC mode; accessing it when not in Secure AVIC mode results in #GP. So,
>>> KVM should not intercept it.
>>
>> The reason KVM should not intercept the MSR access is that the guest
>> would not be able to actually set the MSR if it is intercepted.
>>
> 
> Yes, something like below looks ok?
> 
> Disable interception for SECURE_AVIC_CONTROL MSR for Secure AVIC
> enabled guests. The SECURE_AVIC_CONTROL MSR holds the GPA of the
> guest APIC backing page and bitfields to control enablement of Secure
> AVIC and whether the guest allows NMIs to be injected by the hypervisor.
> This MSR is populated by the guest and can be read by the guest to get
> the GPA of the APIC backing page. This MSR is only accessible by the
> guest when the Secure AVIC feature is active; any other access attempt
> will result in a #GP fault. So, KVM should not intercept access to this
> MSR, as doing so prevents the guest from successfully reading/writing its
> configuration and enabling the feature.

It's probably more info than is really needed. Just saying something like
the following should be enough (feel free to improve on this):

Disable interception of the SECURE_AVIC_CONTROL MSR for Secure AVIC
enabled guests. The SECURE_AVIC_CONTROL MSR is used by the guest to
configure and enable Secure AVIC. In order for the guest to be able to
successfully do this, the MSR access must not be intercepted.

Thanks,
Tom


> 
> 
> 
> - Neeraj
> 


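For background on what "disable interception" means mechanically: on SVM, each MSR maps to two bits (read-intercept and write-intercept) in the MSR permission map, which is split into three MSR ranges per the AMD APM. A sketch of the bit-offset calculation, as a user-space illustration (the helper name is made up; KVM has its own accessors for this):

```c
#include <assert.h>
#include <stdint.h>

/*
 * SVM MSR permission map layout (AMD APM vol. 2): 2 bits per MSR,
 * read bit first. Three ranges, each occupying 0x800 bytes:
 *   bytes 0x000-0x7ff:  MSRs 0x00000000 - 0x00001fff
 *   bytes 0x800-0xfff:  MSRs 0xc0000000 - 0xc0001fff
 *   bytes 0x1000-0x17ff: MSRs 0xc0010000 - 0xc0011fff
 * Returns the bit offset of the MSR's read-intercept bit, or -1 if
 * the MSR is outside the map (such MSRs are always intercepted).
 */
static int msrpm_bit_offset(uint32_t msr)
{
	if (msr <= 0x1fff)
		return msr * 2;
	if (msr >= 0xc0000000 && msr <= 0xc0001fff)
		return 0x800 * 8 + (msr - 0xc0000000) * 2;
	if (msr >= 0xc0010000 && msr <= 0xc0011fff)
		return 0x1000 * 8 + (msr - 0xc0010000) * 2;
	return -1;
}
```

Disabling interception for SECURE_AVIC_CONTROL amounts to clearing the read and write bits at its offset in the third range, so guest accesses are handled by hardware instead of exiting to KVM.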

end of thread, other threads:[~2025-09-25 13:54 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-09-23  5:03 [RFC PATCH v2 00/17] AMD: Add Secure AVIC KVM Support Neeraj Upadhyay
2025-09-23  5:03 ` [RFC PATCH v2 01/17] KVM: x86/lapic: Differentiate protected APIC interrupt mechanisms Neeraj Upadhyay
2025-09-23  5:03 ` [RFC PATCH v2 02/17] x86/cpufeatures: Add Secure AVIC CPU feature Neeraj Upadhyay
2025-09-23  5:03 ` [RFC PATCH v2 03/17] KVM: SVM: Add support for Secure AVIC capability in KVM Neeraj Upadhyay
2025-09-23  5:03 ` [RFC PATCH v2 04/17] KVM: SVM: Set guest APIC protection flags for Secure AVIC Neeraj Upadhyay
2025-09-23  5:03 ` [RFC PATCH v2 05/17] KVM: SVM: Do not intercept SECURE_AVIC_CONTROL MSR for SAVIC guests Neeraj Upadhyay
2025-09-23 13:55   ` Tom Lendacky
2025-09-25  5:16     ` Upadhyay, Neeraj
2025-09-25 13:54       ` Tom Lendacky
2025-09-23  5:03 ` [RFC PATCH v2 06/17] KVM: SVM: Implement interrupt injection for Secure AVIC Neeraj Upadhyay
2025-09-23 14:47   ` Tom Lendacky
2025-09-23  5:03 ` [RFC PATCH v2 07/17] KVM: SVM: Add IPI Delivery Support " Neeraj Upadhyay
2025-09-23  5:03 ` [RFC PATCH v2 08/17] KVM: SVM: Do not inject exception " Neeraj Upadhyay
2025-09-23 15:00   ` Tom Lendacky
2025-09-23  5:03 ` [RFC PATCH v2 09/17] KVM: SVM: Do not intercept exceptions for Secure AVIC guests Neeraj Upadhyay
2025-09-23 15:15   ` Tom Lendacky
2025-09-23  5:03 ` [RFC PATCH v2 10/17] KVM: SVM: Set VGIF in VMSA area " Neeraj Upadhyay
2025-09-23 15:16   ` Tom Lendacky
2025-09-23  5:03 ` [RFC PATCH v2 11/17] KVM: SVM: Enable NMI support " Neeraj Upadhyay
2025-09-23 15:25   ` Tom Lendacky
2025-09-23  5:03 ` [RFC PATCH v2 12/17] KVM: SVM: Add VMGEXIT handler for Secure AVIC backing page Neeraj Upadhyay
2025-09-23 16:02   ` Tom Lendacky
2025-09-23  5:03 ` [RFC PATCH v2 13/17] KVM: SVM: Add IOAPIC EOI support for Secure AVIC guests Neeraj Upadhyay
2025-09-23 16:15   ` Tom Lendacky
2025-09-23  5:03 ` [RFC PATCH v2 14/17] KVM: x86/ioapic: Disable RTC EOI tracking for protected APIC guests Neeraj Upadhyay
2025-09-23 16:23   ` Tom Lendacky
2025-09-23  5:03 ` [RFC PATCH v2 15/17] KVM: SVM: Check injected timers for Secure AVIC guests Neeraj Upadhyay
2025-09-23 16:32   ` Tom Lendacky
2025-09-23  5:03 ` [RFC PATCH v2 16/17] KVM: x86/cpuid: Disable paravirt APIC features for protected APIC Neeraj Upadhyay
2025-09-23  5:03 ` [RFC PATCH v2 17/17] KVM: SVM: Advertise Secure AVIC support for SNP guests Neeraj Upadhyay
2025-09-23 10:02 ` [syzbot ci] Re: AMD: Add Secure AVIC KVM Support syzbot ci
2025-09-23 10:17   ` Upadhyay, Neeraj

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox