* [PATCH v2 00/12] Implement support for IBS virtualization
@ 2025-09-01  5:16 Manali Shukla
  2025-09-01  5:19 ` [PATCH v2 01/12] perf/amd/ibs: Fix race condition in IBS Manali Shukla
                   ` (11 more replies)
  0 siblings, 12 replies; 13+ messages in thread
From: Manali Shukla @ 2025-09-01  5:16 UTC (permalink / raw)
  To: kvm, linux-perf-users, linux-doc
  Cc: seanjc, pbonzini, nikunj, manali.shukla, bp, peterz, mingo,
	mizhang, thomas.lendacky, ravi.bangoria, Sandipan.Das

Add support for IBS virtualization (VIBS). The VIBS feature allows the
guest to collect IBS samples without exiting the guest.  There are
two parts to it [1]:
 - Virtualizing the IBS register state.
 - Ensuring the IBS interrupt is handled in the guest without exiting
   to the hypervisor.

To deliver virtualized IBS interrupts to the guest, VIBS requires either
AVIC or Virtual NMI (VNMI) support [1]. During IBS sampling, the
hardware signals a VNMI. The source of this VNMI depends on the AVIC
configuration:

 - With AVIC disabled, the virtual NMI is hardware-accelerated.
 - With AVIC enabled, the virtual NMI is delivered via AVIC using Extended LVT.

The local interrupts are extended to include more LVT registers in
order to allow additional interrupt sources, such as Instruction-Based
Sampling [3].

Although IBS virtualization requires either AVIC or VNMI to be enabled
in order to successfully deliver IBS NMIs to the guest, VNMI must be
enabled to guarantee reliable delivery. This requirement stems from the
dynamic behavior of AVIC: even if a guest is launched with AVIC
enabled, AVIC can be inhibited at runtime. When AVIC is inhibited and
VNMI is disabled, there is no mechanism left to deliver IBS NMIs to the
guest. Therefore, enabling VNMI is necessary to support IBS
virtualization reliably.

Note that, since the IBS registers are swap type C [2], the hypervisor
is responsible for saving and restoring the host IBS state. The
hypervisor must disable host IBS, save its state, and only then enter
the guest. After a guest exit, the hypervisor restores the host IBS
state and re-enables IBS.

The mediated PMU saves the host context when entering the guest by
scheduling out all exclude_guest events, and restores the host context
when exiting the guest by scheduling those events back in. This
behavior matches exactly what swap type C registers require, so the
mediated PMU design can be leveraged to implement IBS virtualization.
As a result, enabling the mediated PMU is a hard requirement for IBS
virtualization.
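
As a rough illustration, the ordering the hypervisor has to follow
around VMRUN looks like the sketch below. The function names are purely
illustrative (not actual KVM/perf symbols); in this series the work is
done by the mediated PMU scheduling exclude_guest events out and in:

	/* Illustrative only -- not the actual implementation. */
	static void run_vcpu_with_vibs(struct kvm_vcpu *vcpu)
	{
		host_ibs_disable();       /* quiesce host IBS before touching its state */
		host_ibs_save_state();    /* swap type C: hypervisor owns host save/restore */

		vmrun(vcpu);              /* hardware swaps guest IBS state via the VMCB */

		host_ibs_restore_state();
		host_ibs_enable();        /* host IBS sampling resumes */
	}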

The initial version of this series has been posted here:
https://lore.kernel.org/kvm/f98687e0-1fee-8208-261f-d93152871f00@amd.com/

Since then, the mediated PMU patches [4] have matured significantly.
This series is a resurrection of the previous VIBS series and leverages
the mediated PMU infrastructure to enable IBS virtualization.

How to enable VIBS?
----------------------------------------------
echo 0 | sudo tee /proc/sys/kernel/nmi_watchdog
sudo modprobe -r kvm_amd
sudo modprobe kvm_amd enable_mediated_pmu=1 vnmi=1
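
If the module parameters take effect, kvm_amd reports VIBS support in
the kernel log (message added later in this series), so a quick sanity
check is:

sudo dmesg | grep -i "IBS virtualization"
# expected on capable hardware: "... IBS virtualization supported"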

Qemu changes can be found at below location:
----------------------------------------------
https://github.com/AMDESE/qemu/tree/vibs_v1

Qemu commandline to enable IBS virtualization:
------------------------------------------------
qemu-system-x86_64 -enable-kvm -cpu host \ ..

Testing done:
------------------------------------------------
- The following tests were executed on the guest:
  sudo perf record -e ibs_op// -c 100000 -a
  sudo perf record -e ibs_op// -c 100000 -C 10
  sudo perf record -e ibs_op/cnt_ctl=1/ -c 100000 -a
  sudo perf record -e ibs_op/cnt_ctl=1/ -c 100000 -a --raw-samples
  sudo perf record -e ibs_op/cnt_ctl=1,l3missonly=1/ -c 100000 -a
  sudo perf record -e ibs_op/cnt_ctl=1/ -c 100000 -p 1234
  sudo perf record -e ibs_op/cnt_ctl=1/ -c 100000 -- ls
  sudo perf record -e ibs_op// -e ibs_fetch// -a --raw-samples -c 100000
  sudo perf report
  sudo perf script
  sudo perf report -D | grep -P "LdOp 1.*StOp 0" | wc -l
  sudo perf report -D | grep -P "LdOp 1.*StOp 0.*DcMiss 1" | wc -l
  sudo perf report -D | grep -P "LdOp 1.*StOp 0.*DcMiss 1.*L2Miss 1" | wc -l
  sudo perf report -D | grep -B1 -P "LdOp 1.*StOp 0.*DcMiss 1.*L2Miss 1" | grep -P "DataSrc ([02-9]|1[0-2])=" | wc -l
- perf_fuzzer was run for 12 hours; no soft lockups or unknown NMIs
  were seen.
- Ran xapic_ipi_test and xapic_state_test to verify there was no
  regression after the changes made to the APIC register mask to
  accommodate extended APIC registers.

TO-DO:
-----------------------------------
Enable IBS virtualization on SEV-ES and SEV-SNP guests.

base-commit: https://github.com/sean-jc/linux.git tags/mediated-vpmu-v5

[1]: https://bugzilla.kernel.org/attachment.cgi?id=306250
     AMD64 Architecture Programmer’s Manual, Vol 2, Section 15.38
     Instruction-Based Sampling Virtualization.

[2]: https://bugzilla.kernel.org/attachment.cgi?id=306250
     AMD64 Architecture Programmer’s Manual, Vol 2, Appendix B Layout
     of VMCB, Table B-3 Swap Types.

[3]: https://bugzilla.kernel.org/attachment.cgi?id=306250
     AMD64 Architecture Programmer’s Manual, Vol 2, Section 16.4.5
     Extended Interrupts.

[4]: https://lore.kernel.org/kvm/463a0265-e854-4677-92f2-be17e46a3426@linux.intel.com/T/#t

v1->v2
- Incorporated review comments from Mi Dapeng
  - Change the name of kvm_lapic_state_w_extapic to kvm_ext_lapic_state.
  - Refactor APIC register mask handling in order to support extended
    APIC registers.
  - Miscellaneous changes

v1: https://lore.kernel.org/kvm/afafc865-b42f-4a9d-82d7-a72de16bb47b@amd.com/T/

Manali Shukla (7):
  perf/amd/ibs: Fix race condition in IBS
  KVM: x86: Refactor APIC register mask handling to support extended
    APIC registers
  KVM: Add KVM_GET_EXT_LAPIC and KVM_SET_EXT_LAPIC for extapic
  KVM: x86/cpuid: Add a KVM-only leaf for IBS capabilities
  KVM: x86: Extend CPUID range to include new leaf
  perf/x86/amd: Enable VPMU passthrough capability for IBS PMU
  perf/x86/amd: Remove exclude_guest check from perf_ibs_init()

Santosh Shukla (5):
  x86/cpufeatures: Add CPUID feature bit for Extended LVT
  KVM: x86: Add emulation support for Extended LVT registers
  x86/cpufeatures: Add CPUID feature bit for VIBS in SVM/SEV guests
  KVM: SVM: Extend VMCB area for virtualized IBS registers
  KVM: SVM: Add support for IBS Virtualization

 Documentation/virt/kvm/api.rst     |  23 +++++
 arch/x86/events/amd/ibs.c          |   8 +-
 arch/x86/include/asm/apicdef.h     |  17 ++++
 arch/x86/include/asm/cpufeatures.h |   2 +
 arch/x86/include/asm/kvm_host.h    |   1 +
 arch/x86/include/asm/svm.h         |  16 ++-
 arch/x86/include/uapi/asm/kvm.h    |   5 +
 arch/x86/kvm/cpuid.c               |  13 +++
 arch/x86/kvm/lapic.c               | 152 +++++++++++++++++++++--------
 arch/x86/kvm/lapic.h               |   9 +-
 arch/x86/kvm/reverse_cpuid.h       |  16 +++
 arch/x86/kvm/svm/avic.c            |   4 +
 arch/x86/kvm/svm/svm.c             |  98 +++++++++++++++++++
 arch/x86/kvm/vmx/vmx.c             |   9 +-
 arch/x86/kvm/x86.c                 |  37 +++++--
 include/uapi/linux/kvm.h           |  10 ++
 16 files changed, 359 insertions(+), 61 deletions(-)


base-commit: 196d9e72c4b0bd68b74a4ec7f52d248f37d0f030


* [PATCH v2 01/12] perf/amd/ibs: Fix race condition in IBS
  2025-09-01  5:16 [PATCH v2 00/12] Implement support for IBS virtualization Manali Shukla
@ 2025-09-01  5:19 ` Manali Shukla
  2025-09-01  5:21 ` [PATCH v2 02/12] KVM: x86: Refactor APIC register mask handling to support extended APIC registers Manali Shukla
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: Manali Shukla @ 2025-09-01  5:19 UTC (permalink / raw)
  To: kvm, linux-perf-users, linux-doc
  Cc: seanjc, pbonzini, nikunj, manali.shukla, bp, peterz, mingo,
	mizhang, thomas.lendacky, ravi.bangoria, Sandipan.Das

Consider the following scenario:

While scheduling out an IBS event from perf's core scheduling path,
event_sched_out() disables the IBS event by clearing the IBS enable
bit in perf_ibs_disable_event(). However, if a delayed IBS NMI is
delivered after the IBS enable bit is cleared, the IBS NMI handler
may still observe the valid bit set and incorrectly treat the sample
as valid. As a result, it re-enables IBS by setting the enable bit,
even though the event has already been scheduled out.

This leads to a situation where IBS is re-enabled after being
explicitly disabled, which is incorrect. Although this race does not
have visible side effects, it violates the expected behavior of the
perf subsystem.

The race is particularly noticeable when userspace repeatedly disables
and re-enables IBS using PERF_EVENT_IOC_DISABLE and
PERF_EVENT_IOC_ENABLE ioctls in a loop.
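
Roughly, the problematic interleaving looks like this (single CPU,
illustrative):

	perf core (sched out)              delayed IBS NMI
	---------------------              ---------------
	event_sched_out()
	  perf_ibs_disable_event()
	    clear IBS enable bit
	                                   perf_ibs_handle_irq()
	                                     valid bit still observed as set
	                                     perf_ibs_enable_event()
	                                       -> IBS re-enabled after being
	                                          explicitly disabled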

Fix this by checking the IBS_STOPPING bit in the IBS NMI handler before
re-enabling the IBS event. If the IBS_STOPPING bit is set, it indicates
that the event is either disabled or in the process of being disabled,
and the NMI handler must not re-enable it.

Signed-off-by: Manali Shukla <manali.shukla@amd.com>
---
 arch/x86/events/amd/ibs.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/events/amd/ibs.c b/arch/x86/events/amd/ibs.c
index 112f43b23ebf..67ed9673f1ac 100644
--- a/arch/x86/events/amd/ibs.c
+++ b/arch/x86/events/amd/ibs.c
@@ -1385,7 +1385,8 @@ static int perf_ibs_handle_irq(struct perf_ibs *perf_ibs, struct pt_regs *iregs)
 		}
 		new_config |= period >> 4;
 
-		perf_ibs_enable_event(perf_ibs, hwc, new_config);
+		if (!test_bit(IBS_STOPPING, pcpu->state))
+			perf_ibs_enable_event(perf_ibs, hwc, new_config);
 	}
 
 	perf_event_update_userpage(event);
-- 
2.43.0



* [PATCH v2 02/12] KVM: x86: Refactor APIC register mask handling to support extended APIC registers
  2025-09-01  5:16 [PATCH v2 00/12] Implement support for IBS virtualization Manali Shukla
  2025-09-01  5:19 ` [PATCH v2 01/12] perf/amd/ibs: Fix race condition in IBS Manali Shukla
@ 2025-09-01  5:21 ` Manali Shukla
  2025-09-01  5:21 ` [PATCH v2 03/12] KVM: Add KVM_GET_EXT_LAPIC and KVM_SET_EXT_LAPIC for extapic Manali Shukla
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: Manali Shukla @ 2025-09-01  5:21 UTC (permalink / raw)
  To: kvm, linux-perf-users, linux-doc
  Cc: seanjc, pbonzini, nikunj, manali.shukla, bp, peterz, mingo,
	mizhang, thomas.lendacky, ravi.bangoria, Sandipan.Das

Modify the APIC register mask infrastructure to support both standard
APIC registers (0x0-0x3f0) and extended APIC registers (0x400-0x530).

This refactoring:
- Replaces the single u64 bitmask with a u64[2] array to accommodate
  the extended register range (128-bit mask)
- Updates the APIC_REG_MASK macro to handle both standard and extended
  register spaces
- Adapts kvm_lapic_readable_reg_mask() to use the new approach
- Adds an APIC_REG_TEST macro to check register validity for both
  standard and extended APIC registers
- Updates all callers to use the new interface

This is purely an infrastructure change to support the upcoming
extended APIC register emulation.
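
For example, with the array-based mask a caller now checks whether a
register is readable as in the sketch below (this mirrors the
kvm_lapic_reg_read() change in this patch):

	u64 mask[2];

	kvm_lapic_readable_reg_mask(apic, mask);
	if (offset > last_reg || !APIC_REG_TEST(offset, mask))
		return 1;	/* reserved, write-only, or out-of-range register */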

Suggested-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Manali Shukla <manali.shukla@amd.com>
---
 arch/x86/kvm/lapic.c   | 99 ++++++++++++++++++++++++++----------------
 arch/x86/kvm/lapic.h   |  2 +-
 arch/x86/kvm/vmx/vmx.c | 10 +++--
 3 files changed, 70 insertions(+), 41 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index e19545b8cc98..f92e3f53ee75 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1587,53 +1587,77 @@ static inline struct kvm_lapic *to_lapic(struct kvm_io_device *dev)
 	return container_of(dev, struct kvm_lapic, dev);
 }
 
-#define APIC_REG_MASK(reg)	(1ull << ((reg) >> 4))
-#define APIC_REGS_MASK(first, count) \
-	(APIC_REG_MASK(first) * ((1ull << (count)) - 1))
-
-u64 kvm_lapic_readable_reg_mask(struct kvm_lapic *apic)
-{
-	/* Leave bits '0' for reserved and write-only registers. */
-	u64 valid_reg_mask =
-		APIC_REG_MASK(APIC_ID) |
-		APIC_REG_MASK(APIC_LVR) |
-		APIC_REG_MASK(APIC_TASKPRI) |
-		APIC_REG_MASK(APIC_PROCPRI) |
-		APIC_REG_MASK(APIC_LDR) |
-		APIC_REG_MASK(APIC_SPIV) |
-		APIC_REGS_MASK(APIC_ISR, APIC_ISR_NR) |
-		APIC_REGS_MASK(APIC_TMR, APIC_ISR_NR) |
-		APIC_REGS_MASK(APIC_IRR, APIC_ISR_NR) |
-		APIC_REG_MASK(APIC_ESR) |
-		APIC_REG_MASK(APIC_ICR) |
-		APIC_REG_MASK(APIC_LVTT) |
-		APIC_REG_MASK(APIC_LVTTHMR) |
-		APIC_REG_MASK(APIC_LVTPC) |
-		APIC_REG_MASK(APIC_LVT0) |
-		APIC_REG_MASK(APIC_LVT1) |
-		APIC_REG_MASK(APIC_LVTERR) |
-		APIC_REG_MASK(APIC_TMICT) |
-		APIC_REG_MASK(APIC_TMCCT) |
-		APIC_REG_MASK(APIC_TDCR);
+/*
+ * Helper macros for APIC register bitmask handling
+ * A two-element array is used to represent the 128-bit mask, where:
+ * - mask[0] tracks standard APIC registers (0x0-0x3f0)
+ * - mask[1] tracks extended APIC registers (0x400-0x530)
+ */
+
+#define APIC_REG_INDEX(reg)	(((reg) < 0x400) ? 0 : 1)
+#define APIC_REG_BIT(reg)	(((reg) < 0x400) ? ((reg) >> 4) : (((reg) - 0x400) >> 4))
+
+/* Set a bit in the mask for a single APIC register. */
+#define APIC_REG_MASK(reg, mask) do { \
+	(mask)[APIC_REG_INDEX(reg)] |= (1ULL << APIC_REG_BIT(reg)); \
+} while (0)
+
+/* Set bits in the mask for a range of consecutive APIC registers. */
+#define APIC_REGS_MASK(first, count, mask) do { \
+	(mask)[APIC_REG_INDEX(first)] |= ((1ULL << (count)) - 1) << APIC_REG_BIT(first); \
+} while (0)
+
+/* Macro to check whether an APIC register bit is set in the mask. */
+#define APIC_REG_TEST(reg, mask) \
+	((mask)[APIC_REG_INDEX(reg)] & (1ULL << APIC_REG_BIT(reg)))
+
+#define APIC_LAST_REG_OFFSET		0x3f0
+#define APIC_EXT_LAST_REG_OFFSET	0x530
+
+void kvm_lapic_readable_reg_mask(struct kvm_lapic *apic, u64 mask[2])
+{
+	mask[0] = 0;
+	mask[1] = 0;
+
+	APIC_REG_MASK(APIC_ID, mask);
+	APIC_REG_MASK(APIC_LVR, mask);
+	APIC_REG_MASK(APIC_TASKPRI, mask);
+	APIC_REG_MASK(APIC_PROCPRI, mask);
+	APIC_REG_MASK(APIC_LDR, mask);
+	APIC_REG_MASK(APIC_SPIV, mask);
+	APIC_REGS_MASK(APIC_ISR, APIC_ISR_NR, mask);
+	APIC_REGS_MASK(APIC_TMR, APIC_ISR_NR, mask);
+	APIC_REGS_MASK(APIC_IRR, APIC_ISR_NR, mask);
+	APIC_REG_MASK(APIC_ESR, mask);
+	APIC_REG_MASK(APIC_ICR, mask);
+	APIC_REG_MASK(APIC_LVTT, mask);
+	APIC_REG_MASK(APIC_LVTTHMR, mask);
+	APIC_REG_MASK(APIC_LVTPC, mask);
+	APIC_REG_MASK(APIC_LVT0, mask);
+	APIC_REG_MASK(APIC_LVT1, mask);
+	APIC_REG_MASK(APIC_LVTERR, mask);
+	APIC_REG_MASK(APIC_TMICT, mask);
+	APIC_REG_MASK(APIC_TMCCT, mask);
 
 	if (kvm_lapic_lvt_supported(apic, LVT_CMCI))
-		valid_reg_mask |= APIC_REG_MASK(APIC_LVTCMCI);
+		APIC_REG_MASK(APIC_LVTCMCI, mask);
 
 	/* ARBPRI, DFR, and ICR2 are not valid in x2APIC mode. */
-	if (!apic_x2apic_mode(apic))
-		valid_reg_mask |= APIC_REG_MASK(APIC_ARBPRI) |
-				  APIC_REG_MASK(APIC_DFR) |
-				  APIC_REG_MASK(APIC_ICR2);
-
-	return valid_reg_mask;
+	if (!apic_x2apic_mode(apic)) {
+		APIC_REG_MASK(APIC_ARBPRI, mask);
+		APIC_REG_MASK(APIC_DFR, mask);
+		APIC_REG_MASK(APIC_ICR2, mask);
+	}
 }
 EXPORT_SYMBOL_GPL(kvm_lapic_readable_reg_mask);
 
 static int kvm_lapic_reg_read(struct kvm_lapic *apic, u32 offset, int len,
 			      void *data)
 {
+	unsigned int last_reg = APIC_LAST_REG_OFFSET;
 	unsigned char alignment = offset & 0xf;
 	u32 result;
+	u64 mask[2];
 
 	/*
 	 * WARN if KVM reads ICR in x2APIC mode, as it's an 8-byte register in
@@ -1644,8 +1668,9 @@ static int kvm_lapic_reg_read(struct kvm_lapic *apic, u32 offset, int len,
 	if (alignment + len > 4)
 		return 1;
 
-	if (offset > 0x3f0 ||
-	    !(kvm_lapic_readable_reg_mask(apic) & APIC_REG_MASK(offset)))
+	kvm_lapic_readable_reg_mask(apic, mask);
+
+	if (offset > last_reg || !APIC_REG_TEST(offset, mask))
 		return 1;
 
 	result = __apic_read(apic, offset & ~0xf);
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index 8b00e29741de..a07f8524d04a 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -147,7 +147,7 @@ int kvm_hv_vapic_msr_read(struct kvm_vcpu *vcpu, u32 msr, u64 *data);
 int kvm_lapic_set_pv_eoi(struct kvm_vcpu *vcpu, u64 data, unsigned long len);
 void kvm_lapic_exit(void);
 
-u64 kvm_lapic_readable_reg_mask(struct kvm_lapic *apic);
+void kvm_lapic_readable_reg_mask(struct kvm_lapic *apic, u64 mask[2]);
 
 static inline void kvm_lapic_set_irr(int vec, struct kvm_lapic *apic)
 {
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 4a4691beba55..b13a20c9787e 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -4032,10 +4032,14 @@ static void vmx_update_msr_bitmap_x2apic(struct kvm_vcpu *vcpu)
 	 * through reads for all valid registers by default in x2APIC+APICv
 	 * mode, only the current timer count needs on-demand emulation by KVM.
 	 */
-	if (mode & MSR_BITMAP_MODE_X2APIC_APICV)
-		msr_bitmap[read_idx] = ~kvm_lapic_readable_reg_mask(vcpu->arch.apic);
-	else
+	if (mode & MSR_BITMAP_MODE_X2APIC_APICV) {
+		u64 mask[2];
+
+		kvm_lapic_readable_reg_mask(vcpu->arch.apic, mask);
+		msr_bitmap[read_idx] = ~mask[0];
+	} else {
 		msr_bitmap[read_idx] = ~0ull;
+	}
 	msr_bitmap[write_idx] = ~0ull;
 
 	/*
-- 
2.43.0



* [PATCH v2 03/12] KVM: Add KVM_GET_EXT_LAPIC and KVM_SET_EXT_LAPIC for extapic
  2025-09-01  5:16 [PATCH v2 00/12] Implement support for IBS virtualization Manali Shukla
  2025-09-01  5:19 ` [PATCH v2 01/12] perf/amd/ibs: Fix race condition in IBS Manali Shukla
  2025-09-01  5:21 ` [PATCH v2 02/12] KVM: x86: Refactor APIC register mask handling to support extended APIC registers Manali Shukla
@ 2025-09-01  5:21 ` Manali Shukla
  2025-09-01  5:22 ` [PATCH v2 04/12] x86/cpufeatures: Add CPUID feature bit for Extended LVT Manali Shukla
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: Manali Shukla @ 2025-09-01  5:21 UTC (permalink / raw)
  To: kvm, linux-perf-users, linux-doc
  Cc: seanjc, pbonzini, nikunj, manali.shukla, bp, peterz, mingo,
	mizhang, thomas.lendacky, ravi.bangoria, Sandipan.Das

Modern AMD processors expose four additional extended LVT registers in
the extended APIC register space, which can be used for additional
interrupt sources such as Instruction-Based Sampling, among others.

To support this, introduce two new vCPU-based IOCTLs:
KVM_GET_EXT_LAPIC and KVM_SET_EXT_LAPIC. These IOCTLs work similarly
to KVM_GET_LAPIC and KVM_SET_LAPIC, but operate on the APIC page
including the extended APIC register space at APIC offsets 400h-530h.

These IOCTLs are intended for use when extended APIC support is
enabled in the guest. They allow saving and restoring the full APIC
page, including the extended registers.

To support this, `struct kvm_ext_lapic_state` has been made extensible
rather than hardcoding its size, improving forward compatibility.

Documentation for the new IOCTLs has also been added.

For more details on the extended APIC space, refer to AMD Programmer’s
Manual Volume 2, Section 16.4.5: Extended Interrupts.
https://bugzilla.kernel.org/attachment.cgi?id=306250
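
A minimal userspace usage sketch (illustrative only; it assumes
<linux/kvm.h>, <stdlib.h> and <sys/ioctl.h> are included, vcpu_fd was
obtained via KVM_CREATE_VCPU, and error handling is omitted):

	struct kvm_ext_lapic_state *state = calloc(1, KVM_APIC_EXT_REG_SIZE);

	ioctl(vcpu_fd, KVM_GET_EXT_LAPIC, state);  /* APIC page incl. 400h-530h */
	/* ... save, migrate or modify state->regs ... */
	ioctl(vcpu_fd, KVM_SET_EXT_LAPIC, state);
	free(state);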

Signed-off-by: Manali Shukla <manali.shukla@amd.com>
---
 Documentation/virt/kvm/api.rst  | 23 ++++++++++++++++++++
 arch/x86/include/uapi/asm/kvm.h |  5 +++++
 arch/x86/kvm/lapic.c            | 12 ++++++-----
 arch/x86/kvm/lapic.h            |  6 ++++--
 arch/x86/kvm/x86.c              | 37 ++++++++++++++++++++++++---------
 include/uapi/linux/kvm.h        | 10 +++++++++
 6 files changed, 76 insertions(+), 17 deletions(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 6aa40ee05a4a..0653718a4f04 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -2048,6 +2048,18 @@ error.
 Reads the Local APIC registers and copies them into the input argument.  The
 data format and layout are the same as documented in the architecture manual.
 
+::
+
+  #define KVM_APIC_EXT_REG_SIZE 0x540
+  struct kvm_ext_lapic_state {
+	__DECLARE_FLEX_ARRAY(__u8, regs);
+  };
+
+Applications should use the KVM_GET_EXT_LAPIC ioctl if the extended APIC is
+enabled.  KVM_GET_EXT_LAPIC reads the Local APIC registers, including the
+extended APIC register space located at offsets 400h-530h, and copies them
+into the input argument.
+
 If KVM_X2APIC_API_USE_32BIT_IDS feature of KVM_CAP_X2APIC_API is
 enabled, then the format of APIC_ID register depends on the APIC mode
 (reported by MSR_IA32_APICBASE) of its VCPU.  x2APIC stores APIC ID in
@@ -2079,6 +2091,17 @@ always uses xAPIC format.
 Copies the input argument into the Local APIC registers.  The data format
 and layout are the same as documented in the architecture manual.
 
+::
+
+  #define KVM_APIC_EXT_REG_SIZE 0x540
+  struct kvm_ext_lapic_state {
+	__DECLARE_FLEX_ARRAY(__u8, regs);
+  };
+
+Applications should use the KVM_SET_EXT_LAPIC ioctl if the extended APIC is
+enabled.  KVM_SET_EXT_LAPIC copies the input argument, including the extended
+APIC register space, into the Local APIC and extended APIC registers.
+
 The format of the APIC ID register (bytes 32-35 of struct kvm_lapic_state's
 regs field) depends on the state of the KVM_CAP_X2APIC_API capability.
 See the note in KVM_GET_LAPIC.
diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
index 0f15d683817d..d26e1e1bf856 100644
--- a/arch/x86/include/uapi/asm/kvm.h
+++ b/arch/x86/include/uapi/asm/kvm.h
@@ -124,6 +124,11 @@ struct kvm_lapic_state {
 	char regs[KVM_APIC_REG_SIZE];
 };
 
+#define KVM_APIC_EXT_REG_SIZE 0x540
+struct kvm_ext_lapic_state {
+	__DECLARE_FLEX_ARRAY(__u8, regs);
+};
+
 struct kvm_segment {
 	__u64 base;
 	__u32 limit;
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index f92e3f53ee75..8bf7e0d33da9 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -3058,7 +3058,7 @@ void kvm_apic_ack_interrupt(struct kvm_vcpu *vcpu, int vector)
 EXPORT_SYMBOL_GPL(kvm_apic_ack_interrupt);
 
 static int kvm_apic_state_fixup(struct kvm_vcpu *vcpu,
-		struct kvm_lapic_state *s, bool set)
+		struct kvm_ext_lapic_state *s, bool set)
 {
 	if (apic_x2apic_mode(vcpu->arch.apic)) {
 		u32 x2apic_id = kvm_x2apic_id(vcpu->arch.apic);
@@ -3109,9 +3109,10 @@ static int kvm_apic_state_fixup(struct kvm_vcpu *vcpu,
 	return 0;
 }
 
-int kvm_apic_get_state(struct kvm_vcpu *vcpu, struct kvm_lapic_state *s)
+int kvm_apic_get_state(struct kvm_vcpu *vcpu, struct kvm_ext_lapic_state *s,
+		       unsigned int size)
 {
-	memcpy(s->regs, vcpu->arch.apic->regs, sizeof(*s));
+	memcpy(s->regs, vcpu->arch.apic->regs, size);
 
 	/*
 	 * Get calculated timer current count for remaining timer period (if
@@ -3122,7 +3123,8 @@ int kvm_apic_get_state(struct kvm_vcpu *vcpu, struct kvm_lapic_state *s)
 	return kvm_apic_state_fixup(vcpu, s, false);
 }
 
-int kvm_apic_set_state(struct kvm_vcpu *vcpu, struct kvm_lapic_state *s)
+int kvm_apic_set_state(struct kvm_vcpu *vcpu, struct kvm_ext_lapic_state *s,
+		       unsigned int size)
 {
 	struct kvm_lapic *apic = vcpu->arch.apic;
 	int r;
@@ -3137,7 +3139,7 @@ int kvm_apic_set_state(struct kvm_vcpu *vcpu, struct kvm_lapic_state *s)
 		kvm_recalculate_apic_map(vcpu->kvm);
 		return r;
 	}
-	memcpy(vcpu->arch.apic->regs, s->regs, sizeof(*s));
+	memcpy(vcpu->arch.apic->regs, s->regs, size);
 
 	atomic_set_release(&apic->vcpu->kvm->arch.apic_map_dirty, DIRTY);
 	kvm_recalculate_apic_map(vcpu->kvm);
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index a07f8524d04a..b411de5f33a3 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -122,9 +122,11 @@ bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, struct kvm_lapic *src,
 void kvm_apic_send_ipi(struct kvm_lapic *apic, u32 icr_low, u32 icr_high);
 
 int kvm_apic_set_base(struct kvm_vcpu *vcpu, u64 value, bool host_initiated);
-int kvm_apic_get_state(struct kvm_vcpu *vcpu, struct kvm_lapic_state *s);
-int kvm_apic_set_state(struct kvm_vcpu *vcpu, struct kvm_lapic_state *s);
 void kvm_apic_update_hwapic_isr(struct kvm_vcpu *vcpu);
+int kvm_apic_get_state(struct kvm_vcpu *vcpu, struct kvm_ext_lapic_state *s,
+		       unsigned int size);
+int kvm_apic_set_state(struct kvm_vcpu *vcpu, struct kvm_ext_lapic_state *s,
+		       unsigned int size);
 int kvm_lapic_find_highest_irr(struct kvm_vcpu *vcpu);
 
 u64 kvm_get_lapic_tscdeadline_msr(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index e612a34779d7..b249e4c74063 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5131,25 +5131,25 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
 }
 
 static int kvm_vcpu_ioctl_get_lapic(struct kvm_vcpu *vcpu,
-				    struct kvm_lapic_state *s)
+				    struct kvm_ext_lapic_state *s, unsigned int size)
 {
 	if (vcpu->arch.apic->guest_apic_protected)
 		return -EINVAL;
 
 	kvm_x86_call(sync_pir_to_irr)(vcpu);
 
-	return kvm_apic_get_state(vcpu, s);
+	return kvm_apic_get_state(vcpu, s, size);
 }
 
 static int kvm_vcpu_ioctl_set_lapic(struct kvm_vcpu *vcpu,
-				    struct kvm_lapic_state *s)
+				    struct kvm_ext_lapic_state *s, unsigned int size)
 {
 	int r;
 
 	if (vcpu->arch.apic->guest_apic_protected)
 		return -EINVAL;
 
-	r = kvm_apic_set_state(vcpu, s);
+	r = kvm_apic_set_state(vcpu, s, size);
 	if (r)
 		return r;
 	update_cr8_intercept(vcpu);
@@ -5872,10 +5872,11 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
 {
 	struct kvm_vcpu *vcpu = filp->private_data;
 	void __user *argp = (void __user *)arg;
+	unsigned long size;
 	int r;
 	union {
 		struct kvm_sregs2 *sregs2;
-		struct kvm_lapic_state *lapic;
+		struct kvm_ext_lapic_state *lapic;
 		struct kvm_xsave *xsave;
 		struct kvm_xcrs *xcrs;
 		void *buffer;
@@ -5885,35 +5886,51 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
 
 	u.buffer = NULL;
 	switch (ioctl) {
+	case KVM_GET_EXT_LAPIC:
 	case KVM_GET_LAPIC: {
 		r = -EINVAL;
 		if (!lapic_in_kernel(vcpu))
 			goto out;
-		u.lapic = kzalloc(sizeof(struct kvm_lapic_state), GFP_KERNEL);
+
+		if (ioctl == KVM_GET_EXT_LAPIC)
+			size = struct_size(u.lapic, regs, KVM_APIC_EXT_REG_SIZE);
+		else
+			size = sizeof(struct kvm_lapic_state);
+
+		u.lapic = kzalloc(size, GFP_KERNEL);
 
 		r = -ENOMEM;
 		if (!u.lapic)
 			goto out;
-		r = kvm_vcpu_ioctl_get_lapic(vcpu, u.lapic);
+		r = kvm_vcpu_ioctl_get_lapic(vcpu, u.lapic, size);
 		if (r)
 			goto out;
+
 		r = -EFAULT;
-		if (copy_to_user(argp, u.lapic, sizeof(struct kvm_lapic_state)))
+		if (copy_to_user(argp, u.lapic, size))
 			goto out;
+
 		r = 0;
 		break;
 	}
+	case KVM_SET_EXT_LAPIC:
 	case KVM_SET_LAPIC: {
 		r = -EINVAL;
 		if (!lapic_in_kernel(vcpu))
 			goto out;
-		u.lapic = memdup_user(argp, sizeof(*u.lapic));
+
+		if (ioctl == KVM_SET_EXT_LAPIC)
+			size = struct_size(u.lapic, regs, KVM_APIC_EXT_REG_SIZE);
+		else
+			size = sizeof(struct kvm_lapic_state);
+		u.lapic = memdup_user(argp, size);
+
 		if (IS_ERR(u.lapic)) {
 			r = PTR_ERR(u.lapic);
 			goto out_nofree;
 		}
 
-		r = kvm_vcpu_ioctl_set_lapic(vcpu, u.lapic);
+		r = kvm_vcpu_ioctl_set_lapic(vcpu, u.lapic, size);
 		break;
 	}
 	case KVM_INTERRUPT: {
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index f0f0d49d2544..e72e536e82bc 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1318,6 +1318,16 @@ struct kvm_vfio_spapr_tce {
 #define KVM_SET_FPU               _IOW(KVMIO,  0x8d, struct kvm_fpu)
 #define KVM_GET_LAPIC             _IOR(KVMIO,  0x8e, struct kvm_lapic_state)
 #define KVM_SET_LAPIC             _IOW(KVMIO,  0x8f, struct kvm_lapic_state)
+/*
+ * Added to save/restore local APIC registers with extended APIC (extapic)
+ * register space.
+ *
+ * QEMU emulates extapic logic only when KVM enables extapic functionality via
+ * a KVM capability. If QEMU sets extapic registers but KVM doesn't expose the
+ * extapic capability, QEMU ends up using KVM_GET_LAPIC and KVM_SET_LAPIC.
+ */
+#define KVM_GET_EXT_LAPIC	  _IOR(KVMIO,  0x8e, struct kvm_ext_lapic_state)
+#define KVM_SET_EXT_LAPIC	  _IOW(KVMIO,  0x8f, struct kvm_ext_lapic_state)
 #define KVM_SET_CPUID2            _IOW(KVMIO,  0x90, struct kvm_cpuid2)
 #define KVM_GET_CPUID2            _IOWR(KVMIO, 0x91, struct kvm_cpuid2)
 /* Available with KVM_CAP_VAPIC */
-- 
2.43.0



* [PATCH v2 04/12] x86/cpufeatures: Add CPUID feature bit for Extended LVT
  2025-09-01  5:16 [PATCH v2 00/12] Implement support for IBS virtualization Manali Shukla
                   ` (2 preceding siblings ...)
  2025-09-01  5:21 ` [PATCH v2 03/12] KVM: Add KVM_GET_EXT_LAPIC and KVM_SET_EXT_LAPIC for extapic Manali Shukla
@ 2025-09-01  5:22 ` Manali Shukla
  2025-09-01  5:22 ` [PATCH v2 05/12] KVM: x86: Add emulation support for Extended LVT registers Manali Shukla
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: Manali Shukla @ 2025-09-01  5:22 UTC (permalink / raw)
  To: kvm, linux-perf-users, linux-doc
  Cc: seanjc, pbonzini, nikunj, manali.shukla, bp, peterz, mingo,
	mizhang, thomas.lendacky, ravi.bangoria, Sandipan.Das

From: Santosh Shukla <santosh.shukla@amd.com>

Local interrupts can be extended to include more LVT registers in
order to allow additional interrupt sources, like Instruction Based
Sampling (IBS).

The Extended APIC feature register indicates the number of extended
Local Vector Table (LVT) registers in the local APIC.  Currently, four
extended LVT registers are available, located at APIC offsets
400h-530h.

The EXTLVT feature bit changes the behavior associated with reading
and writing an extended LVT register when AVIC is enabled. When both
EXTLVT and AVIC are enabled, a write to an extended LVT register
changes from a fault-style #VMEXIT to a trap-style #VMEXIT, and a read
of an extended LVT register no longer triggers a #VMEXIT [2].

Presence of the EXTLVT feature is indicated via CPUID function
0x8000000A_EDX[27].

More details about the EXTLVT feature can be found at [1].

[1]: AMD Programmer's Manual Volume 2,
Section 16.4.5 Extended Interrupts.
https://bugzilla.kernel.org/attachment.cgi?id=306250

[2]: AMD Programmer's Manual Volume 2,
Table 15-22. Guest vAPIC Register Access Behavior.
https://bugzilla.kernel.org/attachment.cgi?id=306250

Signed-off-by: Santosh Shukla <santosh.shukla@amd.com>
Signed-off-by: Manali Shukla <manali.shukla@amd.com>
---
 arch/x86/include/asm/cpufeatures.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 286d509f9363..0dd44cbf7196 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -378,6 +378,7 @@
 #define X86_FEATURE_X2AVIC		(15*32+18) /* "x2avic" Virtual x2apic */
 #define X86_FEATURE_V_SPEC_CTRL		(15*32+20) /* "v_spec_ctrl" Virtual SPEC_CTRL */
 #define X86_FEATURE_VNMI		(15*32+25) /* "vnmi" Virtual NMI */
+#define X86_FEATURE_EXTLVT		(15*32+27) /* Extended Local Vector Table */
 #define X86_FEATURE_SVME_ADDR_CHK	(15*32+28) /* SVME addr check */
 #define X86_FEATURE_BUS_LOCK_THRESHOLD	(15*32+29) /* Bus lock threshold */
 #define X86_FEATURE_IDLE_HLT		(15*32+30) /* IDLE HLT intercept */
-- 
2.43.0



* [PATCH v2 05/12] KVM: x86: Add emulation support for Extended LVT registers
  2025-09-01  5:16 [PATCH v2 00/12] Implement support for IBS virtualization Manali Shukla
                   ` (3 preceding siblings ...)
  2025-09-01  5:22 ` [PATCH v2 04/12] x86/cpufeatures: Add CPUID feature bit for Extended LVT Manali Shukla
@ 2025-09-01  5:22 ` Manali Shukla
  2025-09-01  5:23 ` [PATCH v2 06/12] x86/cpufeatures: Add CPUID feature bit for VIBS in SVM/SEV guests Manali Shukla
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: Manali Shukla @ 2025-09-01  5:22 UTC (permalink / raw)
  To: kvm, linux-perf-users, linux-doc
  Cc: seanjc, pbonzini, nikunj, manali.shukla, bp, peterz, mingo,
	mizhang, thomas.lendacky, ravi.bangoria, Sandipan.Das

From: Santosh Shukla <santosh.shukla@amd.com>

The local interrupts are extended to include more LVT registers in
order to allow additional interrupt sources, such as Instruction-Based
Sampling (IBS), among others.

Currently, four additional LVT registers are defined; they are located
at APIC offsets 400h-530h.

The AMD IBS driver is designed to use EXTLVT (extended interrupt Local
Vector Table) registers by default for driver initialization.

The extended LVT registers therefore need to be emulated for the guest
IBS driver to initialize successfully, as sketched below.
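
For context, a rough sketch of what the guest-side driver does during
initialization (simplified from the in-tree IBS driver; the exact flow
differs):

	/* Guest: route IBS interrupts through an extended LVT entry as NMI. */
	if (setup_APIC_eilvt(offset, 0, APIC_EILVT_MSG_NMI, 0))
		return -EINVAL;	/* fails unless APIC offsets 400h-530h are emulated */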

Please refer to Section 16.4.5 in AMD Programmer's Manual Volume 2 at
https://bugzilla.kernel.org/attachment.cgi?id=306250 for more details
on Extended LVT.

Signed-off-by: Santosh Shukla <santosh.shukla@amd.com>
Co-developed-by: Manali Shukla <manali.shukla@amd.com>
Signed-off-by: Manali Shukla <manali.shukla@amd.com>
---
 arch/x86/include/asm/apicdef.h | 17 ++++++++++++++
 arch/x86/kvm/cpuid.c           |  6 +++++
 arch/x86/kvm/lapic.c           | 42 ++++++++++++++++++++++++++++++++++
 arch/x86/kvm/lapic.h           |  1 +
 arch/x86/kvm/svm/avic.c        |  4 ++++
 arch/x86/kvm/svm/svm.c         |  6 +++++
 6 files changed, 76 insertions(+)

diff --git a/arch/x86/include/asm/apicdef.h b/arch/x86/include/asm/apicdef.h
index 094106b6a538..4c0f580578aa 100644
--- a/arch/x86/include/asm/apicdef.h
+++ b/arch/x86/include/asm/apicdef.h
@@ -146,6 +146,23 @@
 #define		APIC_EILVT_MSG_EXT	0x7
 #define		APIC_EILVT_MASKED	(1 << 16)
 
+/*
+ * Initialize extended APIC registers to the default value when guest
+ * is started and EXTAPIC feature is enabled on the guest.
+ *
+ * APIC_EFEAT is a read only Extended APIC feature register, whose
+ * default value is 0x00040007. However, bits 0, 1, and 2 represent
+ * features that are not currently emulated by KVM. Therefore, these
+ * bits must be cleared during initialization. As a result, the
+ * default value used for APIC_EFEAT in KVM is 0x00040000.
+ *
+ * APIC_ECTRL is a read-write Extended APIC control register, whose
+ * default value is 0x0.
+ */
+
+#define		APIC_EFEAT_DEFAULT	0x00040000
+#define		APIC_ECTRL_DEFAULT	0x0
+
 #define APIC_BASE (fix_to_virt(FIX_APIC_BASE))
 #define APIC_BASE_MSR		0x800
 #define APIC_X2APIC_ID_MSR	0x802
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index cc16e28bfab2..fd97000ddd13 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -443,6 +443,12 @@ void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 	/* Invoke the vendor callback only after the above state is updated. */
 	kvm_x86_call(vcpu_after_set_cpuid)(vcpu);
 
+	/*
+	 * Initialize extended LVT registers at guest startup to support delivery
+	 * of interrupts via the extended APIC space (offsets 0x400–0x530).
+	 */
+	kvm_apic_init_eilvt_regs(vcpu);
+
 	/*
 	 * Except for the MMU, which needs to do its thing any vendor specific
 	 * adjustments to the reserved GPA bits.
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 8bf7e0d33da9..576d2d127d04 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1648,6 +1648,12 @@ void kvm_lapic_readable_reg_mask(struct kvm_lapic *apic, u64 mask[2])
 		APIC_REG_MASK(APIC_DFR, mask);
 		APIC_REG_MASK(APIC_ICR2, mask);
 	}
+
+	if (guest_cpu_cap_has(apic->vcpu, X86_FEATURE_EXTAPIC)) {
+		APIC_REG_MASK(APIC_EFEAT, mask);
+		APIC_REG_MASK(APIC_ECTRL, mask);
+		APIC_REGS_MASK(APIC_EILVTn(0), APIC_EILVT_NR_MAX, mask);
+	}
 }
 EXPORT_SYMBOL_GPL(kvm_lapic_readable_reg_mask);
 
@@ -1664,6 +1670,14 @@ static int kvm_lapic_reg_read(struct kvm_lapic *apic, u32 offset, int len,
 	 * x2APIC and needs to be manually handled by the caller.
 	 */
 	WARN_ON_ONCE(apic_x2apic_mode(apic) && offset == APIC_ICR);
+	/*
+	 * The local interrupts are extended to include LVT registers to allow
+	 * additional interrupt sources when the EXTAPIC feature bit is enabled.
+	 * The Extended Interrupt LVT registers are located at APIC offsets 400-530h.
+	 */
+	if (guest_cpu_cap_has(apic->vcpu, X86_FEATURE_EXTAPIC)) {
+		last_reg = APIC_EXT_LAST_REG_OFFSET;
+	}
 
 	if (alignment + len > 4)
 		return 1;
@@ -2408,6 +2422,14 @@ static int kvm_lapic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val)
 		else
 			kvm_apic_send_ipi(apic, APIC_DEST_SELF | val, 0);
 		break;
+
+	case APIC_ECTRL:
+	case APIC_EILVTn(0):
+	case APIC_EILVTn(1):
+	case APIC_EILVTn(2):
+	case APIC_EILVTn(3):
+		kvm_lapic_set_reg(apic, reg, val);
+		break;
 	default:
 		ret = 1;
 		break;
@@ -2769,6 +2791,24 @@ void kvm_inhibit_apic_access_page(struct kvm_vcpu *vcpu)
 	kvm_vcpu_srcu_read_lock(vcpu);
 }
 
+/*
+ * Initialize extended APIC registers to the default value when guest is
+ * started. The extended APIC registers should only be initialized when the
+ * EXTAPIC feature is enabled on the guest.
+ */
+void kvm_apic_init_eilvt_regs(struct kvm_vcpu *vcpu)
+{
+	struct kvm_lapic *apic = vcpu->arch.apic;
+	int i;
+
+	if (guest_cpu_cap_has(vcpu, X86_FEATURE_EXTAPIC)) {
+		kvm_lapic_set_reg(apic, APIC_EFEAT, APIC_EFEAT_DEFAULT);
+		kvm_lapic_set_reg(apic, APIC_ECTRL, APIC_ECTRL_DEFAULT);
+		for (i = 0; i < APIC_EILVT_NR_MAX; i++)
+			kvm_lapic_set_reg(apic, APIC_EILVTn(i), APIC_EILVT_MASKED);
+	}
+}
+
 void kvm_lapic_reset(struct kvm_vcpu *vcpu, bool init_event)
 {
 	struct kvm_lapic *apic = vcpu->arch.apic;
@@ -2830,6 +2870,8 @@ void kvm_lapic_reset(struct kvm_vcpu *vcpu, bool init_event)
 		kvm_lapic_set_reg(apic, APIC_ISR + 0x10 * i, 0);
 		kvm_lapic_set_reg(apic, APIC_TMR + 0x10 * i, 0);
 	}
+	kvm_apic_init_eilvt_regs(vcpu);
+
 	kvm_apic_update_apicv(vcpu);
 	update_divide_count(apic);
 	atomic_set(&apic->lapic_timer.pending, 0);
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index b411de5f33a3..66084ca38b37 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -98,6 +98,7 @@ void kvm_apic_ack_interrupt(struct kvm_vcpu *vcpu, int vector);
 int kvm_apic_accept_pic_intr(struct kvm_vcpu *vcpu);
 int kvm_apic_accept_events(struct kvm_vcpu *vcpu);
 void kvm_lapic_reset(struct kvm_vcpu *vcpu, bool init_event);
+void kvm_apic_init_eilvt_regs(struct kvm_vcpu *vcpu);
 u64 kvm_lapic_get_cr8(struct kvm_vcpu *vcpu);
 void kvm_lapic_set_tpr(struct kvm_vcpu *vcpu, unsigned long cr8);
 void kvm_lapic_set_eoi(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index a34c5c3b164e..1b46de10e328 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -669,6 +669,10 @@ static bool is_avic_unaccelerated_access_trap(u32 offset)
 	case APIC_LVTERR:
 	case APIC_TMICT:
 	case APIC_TDCR:
+	case APIC_EILVTn(0):
+	case APIC_EILVTn(1):
+	case APIC_EILVTn(2):
+	case APIC_EILVTn(3):
 		ret = true;
 		break;
 	default:
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 2797c3ab7854..0471d72a7382 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -771,6 +771,12 @@ void svm_set_x2apic_msr_interception(struct vcpu_svm *svm, bool intercept)
 		X2APIC_MSR(APIC_TMICT),
 		X2APIC_MSR(APIC_TMCCT),
 		X2APIC_MSR(APIC_TDCR),
+		X2APIC_MSR(APIC_EFEAT),
+		X2APIC_MSR(APIC_ECTRL),
+		X2APIC_MSR(APIC_EILVTn(0)),
+		X2APIC_MSR(APIC_EILVTn(1)),
+		X2APIC_MSR(APIC_EILVTn(2)),
+		X2APIC_MSR(APIC_EILVTn(3)),
 	};
 	int i;
 
-- 
2.43.0



* [PATCH v2 06/12] x86/cpufeatures: Add CPUID feature bit for VIBS in SVM/SEV guests
  2025-09-01  5:16 [PATCH v2 00/12] Implement support for IBS virtualization Manali Shukla
                   ` (4 preceding siblings ...)
  2025-09-01  5:22 ` [PATCH v2 05/12] KVM: x86: Add emulation support for Extended LVT registers Manali Shukla
@ 2025-09-01  5:23 ` Manali Shukla
  2025-09-01  5:23 ` [PATCH v2 07/12] KVM: x86/cpuid: Add a KVM-only leaf for IBS capabilities Manali Shukla
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: Manali Shukla @ 2025-09-01  5:23 UTC (permalink / raw)
  To: kvm, linux-perf-users, linux-doc
  Cc: seanjc, pbonzini, nikunj, manali.shukla, bp, peterz, mingo,
	mizhang, thomas.lendacky, ravi.bangoria, Sandipan.Das

From: Santosh Shukla <santosh.shukla@amd.com>

The virtualized IBS (VIBS) feature allows the guest to collect IBS
samples without exiting the guest.

Presence of the VIBS feature is indicated via CPUID function
0x8000000A_EDX[26].

Signed-off-by: Santosh Shukla <santosh.shukla@amd.com>
Signed-off-by: Manali Shukla <manali.shukla@amd.com>
---
 arch/x86/include/asm/cpufeatures.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 0dd44cbf7196..3c31dea00671 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -379,6 +379,7 @@
 #define X86_FEATURE_V_SPEC_CTRL		(15*32+20) /* "v_spec_ctrl" Virtual SPEC_CTRL */
 #define X86_FEATURE_VNMI		(15*32+25) /* "vnmi" Virtual NMI */
 #define X86_FEATURE_EXTLVT		(15*32+27) /* Extended Local vector Table */
+#define X86_FEATURE_VIBS		(15*32+26) /* Virtual IBS */
 #define X86_FEATURE_SVME_ADDR_CHK	(15*32+28) /* SVME addr check */
 #define X86_FEATURE_BUS_LOCK_THRESHOLD	(15*32+29) /* Bus lock threshold */
 #define X86_FEATURE_IDLE_HLT		(15*32+30) /* IDLE HLT intercept */
-- 
2.43.0



* [PATCH v2 07/12] KVM: x86/cpuid: Add a KVM-only leaf for IBS capabilities
  2025-09-01  5:16 [PATCH v2 00/12] Implement support for IBS virtualization Manali Shukla
                   ` (5 preceding siblings ...)
  2025-09-01  5:23 ` [PATCH v2 06/12] x86/cpufeatures: Add CPUID feature bit for VIBS in SVM/SEV guests Manali Shukla
@ 2025-09-01  5:23 ` Manali Shukla
  2025-09-01  5:24 ` [PATCH v2 08/12] KVM: x86: Extend CPUID range to include new leaf Manali Shukla
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: Manali Shukla @ 2025-09-01  5:23 UTC (permalink / raw)
  To: kvm, linux-perf-users, linux-doc
  Cc: seanjc, pbonzini, nikunj, manali.shukla, bp, peterz, mingo,
	mizhang, thomas.lendacky, ravi.bangoria, Sandipan.Das

Add a KVM-only leaf for AMD's Instruction-Based Sampling capabilities.
The IBS capability bits are added to this KVM-only leaf so that KVM
can set them for the guest when the IBS feature bit is enabled on the
guest.

Signed-off-by: Manali Shukla <manali.shukla@amd.com>
---
 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kvm/reverse_cpuid.h    | 16 ++++++++++++++++
 2 files changed, 17 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 5512e33db14a..c615ee5b1e9f 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -772,6 +772,7 @@ enum kvm_only_cpuid_leafs {
 	CPUID_12_EAX	 = NCAPINTS,
 	CPUID_7_1_EDX,
 	CPUID_8000_0007_EDX,
+	CPUID_8000_001B_EAX,
 	CPUID_8000_0022_EAX,
 	CPUID_7_2_EDX,
 	CPUID_24_0_EBX,
diff --git a/arch/x86/kvm/reverse_cpuid.h b/arch/x86/kvm/reverse_cpuid.h
index c53b92379e6e..32b22c6508f1 100644
--- a/arch/x86/kvm/reverse_cpuid.h
+++ b/arch/x86/kvm/reverse_cpuid.h
@@ -56,6 +56,21 @@
 #define KVM_X86_FEATURE_TSA_SQ_NO	KVM_X86_FEATURE(CPUID_8000_0021_ECX, 1)
 #define KVM_X86_FEATURE_TSA_L1_NO	KVM_X86_FEATURE(CPUID_8000_0021_ECX, 2)
 
+/* AMD-defined Instruction-Based Sampling capabilities. CPUID level 0x8000001B (EAX). */
+#define X86_FEATURE_IBS_AVAIL		KVM_X86_FEATURE(CPUID_8000_001B_EAX, 0)
+#define X86_FEATURE_IBS_FETCHSAM	KVM_X86_FEATURE(CPUID_8000_001B_EAX, 1)
+#define X86_FEATURE_IBS_OPSAM		KVM_X86_FEATURE(CPUID_8000_001B_EAX, 2)
+#define X86_FEATURE_IBS_RDWROPCNT	KVM_X86_FEATURE(CPUID_8000_001B_EAX, 3)
+#define X86_FEATURE_IBS_OPCNT		KVM_X86_FEATURE(CPUID_8000_001B_EAX, 4)
+#define X86_FEATURE_IBS_BRNTRGT		KVM_X86_FEATURE(CPUID_8000_001B_EAX, 5)
+#define X86_FEATURE_IBS_OPCNTEXT	KVM_X86_FEATURE(CPUID_8000_001B_EAX, 6)
+#define X86_FEATURE_IBS_RIPINVALIDCHK	KVM_X86_FEATURE(CPUID_8000_001B_EAX, 7)
+#define X86_FEATURE_IBS_OPBRNFUSE	KVM_X86_FEATURE(CPUID_8000_001B_EAX, 8)
+#define X86_FEATURE_IBS_FETCHCTLEXTD	KVM_X86_FEATURE(CPUID_8000_001B_EAX, 9)
+#define X86_FEATURE_IBS_ZEN4_EXT	KVM_X86_FEATURE(CPUID_8000_001B_EAX, 11)
+#define X86_FEATURE_IBS_LOADLATFIL	KVM_X86_FEATURE(CPUID_8000_001B_EAX, 12)
+#define X86_FEATURE_IBS_DTLBSTAT	KVM_X86_FEATURE(CPUID_8000_001B_EAX, 19)
+
 struct cpuid_reg {
 	u32 function;
 	u32 index;
@@ -86,6 +101,7 @@ static const struct cpuid_reg reverse_cpuid[] = {
 	[CPUID_8000_0022_EAX] = {0x80000022, 0, CPUID_EAX},
 	[CPUID_7_2_EDX]       = {         7, 2, CPUID_EDX},
 	[CPUID_24_0_EBX]      = {      0x24, 0, CPUID_EBX},
+	[CPUID_8000_001B_EAX] = {0x8000001b, 0, CPUID_EAX},
 	[CPUID_8000_0021_ECX] = {0x80000021, 0, CPUID_ECX},
 };
 
-- 
2.43.0



* [PATCH v2 08/12] KVM: x86: Extend CPUID range to include new leaf
  2025-09-01  5:16 [PATCH v2 00/12] Implement support for IBS virtualization Manali Shukla
                   ` (6 preceding siblings ...)
  2025-09-01  5:23 ` [PATCH v2 07/12] KVM: x86/cpuid: Add a KVM-only leaf for IBS capabilities Manali Shukla
@ 2025-09-01  5:24 ` Manali Shukla
  2025-09-01  5:24 ` [PATCH v2 09/12] KVM: SVM: Extend VMCB area for virtualized IBS registers Manali Shukla
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: Manali Shukla @ 2025-09-01  5:24 UTC (permalink / raw)
  To: kvm, linux-perf-users, linux-doc
  Cc: seanjc, pbonzini, nikunj, manali.shukla, bp, peterz, mingo,
	mizhang, thomas.lendacky, ravi.bangoria, Sandipan.Das

CPUID leaf 0x8000001b (EAX) provides information about Instruction-Based
Sampling capabilities on AMD platforms.

Signed-off-by: Manali Shukla <manali.shukla@amd.com>
---
 arch/x86/kvm/cpuid.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index fd97000ddd13..55ce7d86b0f0 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -1746,6 +1746,13 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
 		entry->eax = entry->ebx = entry->ecx = 0;
 		entry->edx = 0; /* reserved */
 		break;
+	/* AMD IBS capability */
+	case 0x8000001B:
+		if (!kvm_cpu_cap_has(X86_FEATURE_IBS))
+			entry->eax = 0;
+
+		entry->ebx = entry->ecx = entry->edx = 0;
+		break;
 	case 0x8000001F:
 		if (!kvm_cpu_cap_has(X86_FEATURE_SEV)) {
 			entry->eax = entry->ebx = entry->ecx = entry->edx = 0;
-- 
2.43.0



* [PATCH v2 09/12] KVM: SVM: Extend VMCB area for virtualized IBS registers
  2025-09-01  5:16 [PATCH v2 00/12] Implement support for IBS virtualization Manali Shukla
                   ` (7 preceding siblings ...)
  2025-09-01  5:24 ` [PATCH v2 08/12] KVM: x86: Extend CPUID range to include new leaf Manali Shukla
@ 2025-09-01  5:24 ` Manali Shukla
  2025-09-01  5:25 ` [PATCH v2 10/12] KVM: SVM: Add support for IBS Virtualization Manali Shukla
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: Manali Shukla @ 2025-09-01  5:24 UTC (permalink / raw)
  To: kvm, linux-perf-users, linux-doc
  Cc: seanjc, pbonzini, nikunj, manali.shukla, bp, peterz, mingo,
	mizhang, thomas.lendacky, ravi.bangoria, Sandipan.Das

From: Santosh Shukla <santosh.shukla@amd.com>

Define the new VMCB fields that will be used to save and restore the
state of the following IBS fetch and op related MSRs.

  * MSRC001_1030 [IBS Fetch Control]
  * MSRC001_1031 [IBS Fetch Linear Address]
  * MSRC001_1033 [IBS Execution Control]
  * MSRC001_1034 [IBS Op Logical Address]
  * MSRC001_1035 [IBS Op Data]
  * MSRC001_1036 [IBS Op Data 2]
  * MSRC001_1037 [IBS Op Data 3]
  * MSRC001_1038 [IBS DC Linear Address]
  * MSRC001_103B [IBS Branch Target Address]
  * MSRC001_103C [IBS Fetch Control Extended]

Signed-off-by: Santosh Shukla <santosh.shukla@amd.com>
Signed-off-by: Manali Shukla <manali.shukla@amd.com>
---
 arch/x86/include/asm/svm.h | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index ffc27f676243..269a8327ab2a 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -359,6 +359,17 @@ struct vmcb_save_area {
 	u64 last_excp_to;
 	u8 reserved_0x298[72];
 	u64 spec_ctrl;		/* Guest version of SPEC_CTRL at 0x2E0 */
+	u8 reserved_0x2e8[1168];
+	u64 ibs_fetch_ctl;
+	u64 ibs_fetch_linear_addr;
+	u64 ibs_op_ctl;
+	u64 ibs_op_rip;
+	u64 ibs_op_data;
+	u64 ibs_op_data2;
+	u64 ibs_op_data3;
+	u64 ibs_dc_linear_addr;
+	u64 ibs_br_target;
+	u64 ibs_fetch_extd_ctl;
 } __packed;
 
 /* Save area definition for SEV-ES and SEV-SNP guests */
@@ -541,7 +552,7 @@ struct vmcb {
 	};
 } __packed;
 
-#define EXPECTED_VMCB_SAVE_AREA_SIZE		744
+#define EXPECTED_VMCB_SAVE_AREA_SIZE		1992
 #define EXPECTED_GHCB_SAVE_AREA_SIZE		1032
 #define EXPECTED_SEV_ES_SAVE_AREA_SIZE		1648
 #define EXPECTED_VMCB_CONTROL_AREA_SIZE		1024
@@ -567,6 +578,7 @@ static inline void __unused_size_checks(void)
 	BUILD_BUG_RESERVED_OFFSET(vmcb_save_area, 0x180);
 	BUILD_BUG_RESERVED_OFFSET(vmcb_save_area, 0x248);
 	BUILD_BUG_RESERVED_OFFSET(vmcb_save_area, 0x298);
+	BUILD_BUG_RESERVED_OFFSET(vmcb_save_area, 0x2e8);
 
 	BUILD_BUG_RESERVED_OFFSET(sev_es_save_area, 0xc8);
 	BUILD_BUG_RESERVED_OFFSET(sev_es_save_area, 0xcc);
-- 
2.43.0



* [PATCH v2 10/12] KVM: SVM: Add support for IBS Virtualization
  2025-09-01  5:16 [PATCH v2 00/12] Implement support for IBS virtualization Manali Shukla
                   ` (8 preceding siblings ...)
  2025-09-01  5:24 ` [PATCH v2 09/12] KVM: SVM: Extend VMCB area for virtualized IBS registers Manali Shukla
@ 2025-09-01  5:25 ` Manali Shukla
  2025-09-01  5:26 ` [PATCH v2 11/12] perf/x86/amd: Enable VPMU passthrough capability for IBS PMU Manali Shukla
  2025-09-01  5:26 ` [PATCH v2 12/12] perf/x86/amd: Remove exclude_guest check from perf_ibs_init() Manali Shukla
  11 siblings, 0 replies; 13+ messages in thread
From: Manali Shukla @ 2025-09-01  5:25 UTC (permalink / raw)
  To: kvm, linux-perf-users, linux-doc
  Cc: seanjc, pbonzini, nikunj, manali.shukla, bp, peterz, mingo,
	mizhang, thomas.lendacky, ravi.bangoria, Sandipan.Das

From: Santosh Shukla <santosh.shukla@amd.com>

IBS virtualization (VIBS) allows a guest to collect Instruction-Based
Sampling (IBS) data using hardware-assisted virtualization. With VIBS
enabled, the hardware automatically saves and restores guest IBS state
during VM-Entry and VM-Exit via the VMCB State Save Area.

IBS-generated interrupts are delivered directly to the guest without
causing a VMEXIT.

VIBS depends on mediated PMU mode and requires either AVIC or NMI
virtualization for interrupt delivery. However, since AVIC can be
dynamically inhibited, VIBS requires VNMI to be enabled to ensure
reliable interrupt delivery. If AVIC is inhibited and VNMI is
disabled, the guest can encounter a VMEXIT_INVALID when IBS
virtualization is enabled for the guest.

Because IBS state is classified as swap type C, the hypervisor must
save its own IBS state before VMRUN and restore it after VMEXIT. It
must also disable IBS before VMRUN and re-enable it afterward. This
will be handled using mediated PMU support in subsequent patches by
enabling mediated PMU capability for IBS PMUs.

More details about IBS virtualization can be found at [1].

[1]: https://bugzilla.kernel.org/attachment.cgi?id=306250
     AMD64 Architecture Programmer’s Manual, Vol 2, Section 15.38
     Instruction-Based Sampling Virtualization.

Signed-off-by: Santosh Shukla <santosh.shukla@amd.com>
Co-developed-by: Manali Shukla <manali.shukla@amd.com>
Signed-off-by: Manali Shukla <manali.shukla@amd.com>
---
 arch/x86/include/asm/svm.h |  2 +
 arch/x86/kvm/svm/svm.c     | 94 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 96 insertions(+)

diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index 269a8327ab2a..9416a20bf4d3 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -222,6 +222,8 @@ struct __attribute__ ((__packed__)) vmcb_control_area {
 #define LBR_CTL_ENABLE_MASK BIT_ULL(0)
 #define VIRTUAL_VMLOAD_VMSAVE_ENABLE_MASK BIT_ULL(1)
 
+#define VIRTUAL_IBS_ENABLE_MASK BIT_ULL(2)
+
 #define SVM_INTERRUPT_SHADOW_MASK	BIT_ULL(0)
 #define SVM_GUEST_INTERRUPT_MASK	BIT_ULL(1)
 
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 0471d72a7382..0be24cf03675 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -155,6 +155,10 @@ module_param(vgif, int, 0444);
 int lbrv = true;
 module_param(lbrv, int, 0444);
 
+/* enable/disable IBS virtualization */
+static int vibs = true;
+module_param(vibs, int, 0444);
+
 static int tsc_scaling = true;
 module_param(tsc_scaling, int, 0444);
 
@@ -977,6 +981,20 @@ void disable_nmi_singlestep(struct vcpu_svm *svm)
 	}
 }
 
+static void svm_ibs_msr_interception(struct vcpu_svm *svm, bool intercept)
+{
+	svm_set_intercept_for_msr(&svm->vcpu, MSR_AMD64_IBSFETCHCTL, MSR_TYPE_RW, intercept);
+	svm_set_intercept_for_msr(&svm->vcpu, MSR_AMD64_IBSFETCHLINAD, MSR_TYPE_RW, intercept);
+	svm_set_intercept_for_msr(&svm->vcpu, MSR_AMD64_IBSOPCTL, MSR_TYPE_RW, intercept);
+	svm_set_intercept_for_msr(&svm->vcpu, MSR_AMD64_IBSOPRIP, MSR_TYPE_RW, intercept);
+	svm_set_intercept_for_msr(&svm->vcpu, MSR_AMD64_IBSOPDATA, MSR_TYPE_RW, intercept);
+	svm_set_intercept_for_msr(&svm->vcpu, MSR_AMD64_IBSOPDATA2, MSR_TYPE_RW, intercept);
+	svm_set_intercept_for_msr(&svm->vcpu, MSR_AMD64_IBSOPDATA3, MSR_TYPE_RW, intercept);
+	svm_set_intercept_for_msr(&svm->vcpu, MSR_AMD64_IBSDCLINAD, MSR_TYPE_RW, intercept);
+	svm_set_intercept_for_msr(&svm->vcpu, MSR_AMD64_IBSBRTARGET, MSR_TYPE_RW, intercept);
+	svm_set_intercept_for_msr(&svm->vcpu, MSR_AMD64_ICIBSEXTDCTL, MSR_TYPE_RW, intercept);
+}
+
 static void grow_ple_window(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
@@ -1118,6 +1136,20 @@ static void svm_recalc_instruction_intercepts(struct kvm_vcpu *vcpu)
 			svm_clr_intercept(svm, INTERCEPT_VMSAVE);
 			svm->vmcb->control.virt_ext |= VIRTUAL_VMLOAD_VMSAVE_ENABLE_MASK;
 		}
+
+		/*
+		 * If hardware supports VIBS then no need to intercept IBS MSRs
+		 * when VIBS is enabled in guest.
+		 *
+		 * Enable VIBS by setting bit 2 at offset 0xb8 in VMCB.
+		 */
+		if (vibs) {
+			if (guest_cpu_cap_has(&svm->vcpu, X86_FEATURE_IBS) &&
+			    kvm_vcpu_has_mediated_pmu(vcpu)) {
+				svm_ibs_msr_interception(svm, false);
+				svm->vmcb->control.virt_ext |= VIRTUAL_IBS_ENABLE_MASK;
+			}
+		}
 	}
 
 	if (kvm_need_rdpmc_intercept(vcpu))
@@ -2894,6 +2926,27 @@ static int svm_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	case MSR_AMD64_DE_CFG:
 		msr_info->data = svm->msr_decfg;
 		break;
+
+	case MSR_AMD64_IBSCTL:
+		if (guest_cpu_cap_has(vcpu, X86_FEATURE_IBS))
+			msr_info->data = IBSCTL_LVT_OFFSET_VALID;
+		else
+			msr_info->data = 0;
+		break;
+
+
+	/*
+	 * When IBS virtualization is enabled, guest reads from
+	 * MSR_AMD64_IBSFETCHPHYSAD and MSR_AMD64_IBSDCPHYSAD must return 0.
+	 * This is done for security reasons, as guests should not be allowed to
+	 * access or infer any information about the system's physical
+	 * addresses.
+	 */
+	case MSR_AMD64_IBSDCPHYSAD:
+	case MSR_AMD64_IBSFETCHPHYSAD:
+		msr_info->data = 0;
+		break;
+
 	default:
 		return kvm_get_msr_common(vcpu, msr_info);
 	}
@@ -3138,6 +3191,16 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
 		svm->msr_decfg = data;
 		break;
 	}
+	/*
+	 * When IBS virtualization is enabled, guest writes to
+	 * MSR_AMD64_IBSFETCHPHYSAD and MSR_AMD64_IBSDCPHYSAD must be ignored.
+	 * This is done for security reasons, as guests should not be allowed to
+	 * access or infer any information about the system's physical
+	 * addresses.
+	 */
+	case MSR_AMD64_IBSDCPHYSAD:
+	case MSR_AMD64_IBSFETCHPHYSAD:
+		return 1;
 	default:
 		return kvm_set_msr_common(vcpu, msr);
 	}
@@ -5284,6 +5347,28 @@ static __init void svm_adjust_mmio_mask(void)
 	kvm_mmu_set_mmio_spte_mask(mask, mask, PT_WRITABLE_MASK | PT_USER_MASK);
 }
 
+static void svm_ibs_set_cpu_caps(void)
+{
+	kvm_cpu_cap_check_and_set(X86_FEATURE_IBS);
+	kvm_cpu_cap_check_and_set(X86_FEATURE_EXTLVT);
+	kvm_cpu_cap_check_and_set(X86_FEATURE_EXTAPIC);
+	if (kvm_cpu_cap_has(X86_FEATURE_IBS)) {
+		kvm_cpu_cap_check_and_set(X86_FEATURE_IBS_AVAIL);
+		kvm_cpu_cap_check_and_set(X86_FEATURE_IBS_FETCHSAM);
+		kvm_cpu_cap_check_and_set(X86_FEATURE_IBS_OPSAM);
+		kvm_cpu_cap_check_and_set(X86_FEATURE_IBS_RDWROPCNT);
+		kvm_cpu_cap_check_and_set(X86_FEATURE_IBS_OPCNT);
+		kvm_cpu_cap_check_and_set(X86_FEATURE_IBS_BRNTRGT);
+		kvm_cpu_cap_check_and_set(X86_FEATURE_IBS_OPCNTEXT);
+		kvm_cpu_cap_check_and_set(X86_FEATURE_IBS_RIPINVALIDCHK);
+		kvm_cpu_cap_check_and_set(X86_FEATURE_IBS_OPBRNFUSE);
+		kvm_cpu_cap_check_and_set(X86_FEATURE_IBS_FETCHCTLEXTD);
+		kvm_cpu_cap_check_and_set(X86_FEATURE_IBS_ZEN4_EXT);
+		kvm_cpu_cap_check_and_set(X86_FEATURE_IBS_LOADLATFIL);
+		kvm_cpu_cap_check_and_set(X86_FEATURE_IBS_DTLBSTAT);
+	}
+}
+
 static __init void svm_set_cpu_caps(void)
 {
 	kvm_set_cpu_caps();
@@ -5336,6 +5421,9 @@ static __init void svm_set_cpu_caps(void)
 	if (cpu_feature_enabled(X86_FEATURE_BUS_LOCK_THRESHOLD))
 		kvm_caps.has_bus_lock_exit = true;
 
+	if (vibs)
+		svm_ibs_set_cpu_caps();
+
 	/* CPUID 0x80000008 */
 	if (boot_cpu_has(X86_FEATURE_LS_CFG_SSBD) ||
 	    boot_cpu_has(X86_FEATURE_AMD_SSBD))
@@ -5509,6 +5597,12 @@ static __init int svm_hardware_setup(void)
 		svm_x86_ops.set_vnmi_pending = NULL;
 	}
 
+	vibs = enable_mediated_pmu && vnmi && vibs &&
+	       boot_cpu_has(X86_FEATURE_VIBS);
+
+	if (vibs)
+		pr_info("IBS virtualization supported\n");
+
 	if (!enable_pmu)
 		pr_info("PMU virtualization is disabled\n");
 
-- 
2.43.0



* [PATCH v2 11/12] perf/x86/amd: Enable VPMU passthrough capability for IBS PMU
  2025-09-01  5:16 [PATCH v2 00/12] Implement support for IBS virtualization Manali Shukla
                   ` (9 preceding siblings ...)
  2025-09-01  5:25 ` [PATCH v2 10/12] KVM: SVM: Add support for IBS Virtualization Manali Shukla
@ 2025-09-01  5:26 ` Manali Shukla
  2025-09-01  5:26 ` [PATCH v2 12/12] perf/x86/amd: Remove exclude_guest check from perf_ibs_init() Manali Shukla
  11 siblings, 0 replies; 13+ messages in thread
From: Manali Shukla @ 2025-09-01  5:26 UTC (permalink / raw)
  To: kvm, linux-perf-users, linux-doc
  Cc: seanjc, pbonzini, nikunj, manali.shukla, bp, peterz, mingo,
	mizhang, thomas.lendacky, ravi.bangoria, Sandipan.Das

IBS MSRs are classified as swap type C, which requires the hypervisor
to save its own IBS state before VMENTRY and restore it after VMEXIT.

To support this, set the ibs_op and ibs_fetch PMUs with the
PERF_PMU_CAP_MEDIATED_VPMU capability. This ensures that these PMUs are
exclusively owned by the guest while it is running, allowing the
hypervisor to manage IBS state transitions correctly.

Signed-off-by: Manali Shukla <manali.shukla@amd.com>
---
 arch/x86/events/amd/ibs.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/x86/events/amd/ibs.c b/arch/x86/events/amd/ibs.c
index 67ed9673f1ac..6dc2d1cb8b09 100644
--- a/arch/x86/events/amd/ibs.c
+++ b/arch/x86/events/amd/ibs.c
@@ -793,6 +793,7 @@ static struct perf_ibs perf_ibs_fetch = {
 		.stop		= perf_ibs_stop,
 		.read		= perf_ibs_read,
 		.check_period	= perf_ibs_check_period,
+		.capabilities	= PERF_PMU_CAP_MEDIATED_VPMU,
 	},
 	.msr			= MSR_AMD64_IBSFETCHCTL,
 	.config_mask		= IBS_FETCH_MAX_CNT | IBS_FETCH_RAND_EN,
@@ -818,6 +819,7 @@ static struct perf_ibs perf_ibs_op = {
 		.stop		= perf_ibs_stop,
 		.read		= perf_ibs_read,
 		.check_period	= perf_ibs_check_period,
+		.capabilities	= PERF_PMU_CAP_MEDIATED_VPMU,
 	},
 	.msr			= MSR_AMD64_IBSOPCTL,
 	.config_mask		= IBS_OP_MAX_CNT,
-- 
2.43.0



* [PATCH v2 12/12] perf/x86/amd: Remove exclude_guest check from perf_ibs_init()
  2025-09-01  5:16 [PATCH v2 00/12] Implement support for IBS virtualization Manali Shukla
                   ` (10 preceding siblings ...)
  2025-09-01  5:26 ` [PATCH v2 11/12] perf/x86/amd: Enable VPMU passthrough capability for IBS PMU Manali Shukla
@ 2025-09-01  5:26 ` Manali Shukla
  11 siblings, 0 replies; 13+ messages in thread
From: Manali Shukla @ 2025-09-01  5:26 UTC (permalink / raw)
  To: kvm, linux-perf-users, linux-doc
  Cc: seanjc, pbonzini, nikunj, manali.shukla, bp, peterz, mingo,
	mizhang, thomas.lendacky, ravi.bangoria, Sandipan.Das

Currently, the IBS driver doesn't allow the creation of an IBS event
with exclude_guest set: perf_ibs_init() returns -EINVAL if an IBS
event is created with exclude_guest set.

With the introduction of mediated PMU support, software-based handling
of exclude_guest is permitted for PMUs that have the
PERF_PMU_CAP_MEDIATED_VPMU capability.

Since the ibs_op and ibs_fetch PMUs now have the
PERF_PMU_CAP_MEDIATED_VPMU capability set, update perf_ibs_init() to
remove the exclude_guest check.

Signed-off-by: Manali Shukla <manali.shukla@amd.com>
---
 arch/x86/events/amd/ibs.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/x86/events/amd/ibs.c b/arch/x86/events/amd/ibs.c
index 6dc2d1cb8b09..fad2200ddc72 100644
--- a/arch/x86/events/amd/ibs.c
+++ b/arch/x86/events/amd/ibs.c
@@ -300,8 +300,7 @@ static int perf_ibs_init(struct perf_event *event)
 		return -EOPNOTSUPP;
 
 	/* handle exclude_{user,kernel} in the IRQ handler */
-	if (event->attr.exclude_host || event->attr.exclude_guest ||
-	    event->attr.exclude_idle)
+	if (event->attr.exclude_host || event->attr.exclude_idle)
 		return -EINVAL;
 
 	if (!(event->attr.config2 & IBS_SW_FILTER_MASK) &&
-- 
2.43.0


