linux-doc.vger.kernel.org archive mirror
* [PATCH v1 00/11] Implement support for IBS virtualization
@ 2025-06-27 16:25 Manali Shukla
  2025-06-27 16:25 ` [PATCH v1 01/11] perf/amd/ibs: Fix race condition in IBS Manali Shukla
                   ` (11 more replies)
  0 siblings, 12 replies; 22+ messages in thread
From: Manali Shukla @ 2025-06-27 16:25 UTC (permalink / raw)
  To: kvm, linux-perf-users, linux-doc
  Cc: seanjc, pbonzini, nikunj, manali.shukla, bp, peterz, mingo,
	mizhang, thomas.lendacky, ravi.bangoria, Sandipan.Das

Add support for IBS virtualization (VIBS). The VIBS feature allows the
guest to collect IBS samples without exiting the guest. There are
two parts to it [1]:
 - Virtualizing the IBS register state.
 - Ensuring the IBS interrupt is handled in the guest without exiting
   to the hypervisor.

To deliver virtualized IBS interrupts to the guest, VIBS requires either
AVIC or Virtual NMI (VNMI) support [1]. During IBS sampling, the
hardware signals a VNMI. The source of this VNMI depends on the AVIC
configuration:

 - With AVIC disabled, the virtual NMI is hardware-accelerated.
 - With AVIC enabled, the virtual NMI is delivered via AVIC using Extended LVT.

The local interrupts are extended to include more LVT registers, to
allow additional interrupt sources such as Instruction Based Sampling
(IBS) [3].

Although IBS virtualization requires either AVIC or VNMI to be enabled
in order to successfully deliver IBS NMIs to the guest, VNMI must be
enabled to ensure reliable delivery. This requirement stems from the
dynamic behavior of AVIC. While a guest is launched with AVIC enabled,
AVIC can be inhibited at runtime. When AVIC is inhibited and VNMI is
disabled, there is no mechanism to deliver IBS NMIs to the guest.
Therefore, enabling VNMI is necessary to support IBS virtualization
reliably.
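
The delivery rules above can be summarized in a small user-space sketch.
This is only a model of the reasoning in this cover letter, not KVM
code; the enum and both function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the delivery paths described above; not KVM code. */
enum ibs_nmi_path {
	IBS_NMI_NONE,	/* no mechanism to deliver the IBS NMI to the guest */
	IBS_NMI_AVIC,	/* delivered via AVIC using the extended LVT */
	IBS_NMI_VNMI,	/* delivered as a hardware-accelerated virtual NMI */
};

/* Pick the delivery path for the current AVIC/VNMI configuration. */
static enum ibs_nmi_path pick_ibs_nmi_path(bool avic_active, bool vnmi_enabled)
{
	if (avic_active)
		return IBS_NMI_AVIC;
	if (vnmi_enabled)
		return IBS_NMI_VNMI;
	return IBS_NMI_NONE;
}

/*
 * AVIC can be inhibited at runtime, so delivery is reliable only if a
 * path exists both with AVIC active and with AVIC inhibited.
 */
static bool ibs_nmi_reliable(bool vnmi_enabled)
{
	return pick_ibs_nmi_path(true, vnmi_enabled) != IBS_NMI_NONE &&
	       pick_ibs_nmi_path(false, vnmi_enabled) != IBS_NMI_NONE;
}
```

In this model, ibs_nmi_reliable() is true only when VNMI is enabled,
which is why the series loads kvm_amd with vnmi=1.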

Note that, since IBS registers are swap type C [2], the hypervisor is
responsible for saving and restoring the host IBS state. The hypervisor
needs to disable host IBS before saving the state and entering the
guest. After a guest exit, the hypervisor needs to restore the host IBS
state and re-enable IBS.
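
The ordering required for a swap type C register can be sketched in
user space as follows. This is a simplified model under stated
assumptions: the struct, the function names and the enable-bit position
are hypothetical and do not correspond to the code added by this series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for an IBS enable bit; the position is illustrative. */
#define MODEL_IBS_ENABLE	(1ull << 17)

struct model_cpu {
	uint64_t ibs_ctl;	/* live IBS control register contents */
	uint64_t host_saved;	/* host value kept while the guest runs */
	bool host_was_enabled;	/* whether host IBS must be re-enabled */
};

/* Guest entry: disable host IBS, save host state, make guest state live. */
static void model_vmenter(struct model_cpu *cpu, uint64_t guest_ctl)
{
	cpu->host_was_enabled = cpu->ibs_ctl & MODEL_IBS_ENABLE;
	cpu->ibs_ctl &= ~MODEL_IBS_ENABLE;	/* 1. disable host IBS */
	cpu->host_saved = cpu->ibs_ctl;		/* 2. save host state */
	cpu->ibs_ctl = guest_ctl;		/* 3. guest state is live */
}

/* Guest exit: restore host state, then re-enable host IBS if needed. */
static uint64_t model_vmexit(struct model_cpu *cpu)
{
	uint64_t guest_ctl = cpu->ibs_ctl;

	cpu->ibs_ctl = cpu->host_saved;		/* 4. restore host state */
	if (cpu->host_was_enabled)
		cpu->ibs_ctl |= MODEL_IBS_ENABLE; /* 5. re-enable host IBS */
	return guest_ctl;
}
```

The point of the model is only the ordering: host IBS is quiesced before
its state is saved, and it is re-armed only after the host state has
been restored.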

The mediated PMU has the capability to save the host context when
entering the guest by scheduling out all exclude_guest events, and to
restore the host context when exiting the guest by scheduling in the
previously scheduled-out events. This behavior aligns with the
requirement for IBS registers being of swap type C. Therefore, the
mediated PMU design can be leveraged to implement IBS virtualization.
As a result, enabling the mediated PMU is a necessary requirement for
IBS virtualization.

The initial version of this series has been posted here:
https://lore.kernel.org/kvm/f98687e0-1fee-8208-261f-d93152871f00@amd.com/

Since then, the mediated PMU patches [5] have matured significantly.
This series revives the previous VIBS series and leverages the
mediated PMU infrastructure to enable IBS virtualization.

How to enable VIBS?
----------------------------------------------
echo 0 | sudo tee /proc/sys/kernel/nmi_watchdog
sudo modprobe -r kvm_amd
sudo modprobe kvm_amd enable_mediated_pmu=1 vnmi=1

QEMU changes can be found at the location below:
----------------------------------------------
https://github.com/AMDESE/qemu/tree/vibs_v1

QEMU command line to enable IBS virtualization:
------------------------------------------------
qemu-system-x86_64 -enable-kvm -cpu EPYC-Genoa,+ibs,+extlvt,+extapic,+svm,+pmu \ ..

Testing done:
------------------------------------------------
- The following tests were executed in the guest:
  sudo perf record -e ibs_op// -c 100000 -a
  sudo perf record -e ibs_op// -c 100000 -C 10
  sudo perf record -e ibs_op/cnt_ctl=1/ -c 100000 -a
  sudo perf record -e ibs_op/cnt_ctl=1/ -c 100000 -a --raw-samples
  sudo perf record -e ibs_op/cnt_ctl=1,l3missonly=1/ -c 100000 -a
  sudo perf record -e ibs_op/cnt_ctl=1/ -c 100000 -p 1234
  sudo perf record -e ibs_op/cnt_ctl=1/ -c 100000 -- ls
  sudo ./tools/perf/perf record -e ibs_op// -e ibs_fetch// -a --raw-samples -c 100000
  sudo perf report
  sudo perf script
  sudo perf report -D | grep -P "LdOp 1.*StOp 0" | wc -l
  sudo perf report -D | grep -P "LdOp 1.*StOp 0.*DcMiss 1" | wc -l
  sudo perf report -D | grep -P "LdOp 1.*StOp 0.*DcMiss 1.*L2Miss 1" | wc -l
  sudo perf report -D | grep -B1 -P "LdOp 1.*StOp 0.*DcMiss 1.*L2Miss 1" | grep -P "DataSrc ([02-9]|1[0-2])=" | wc -l
- perf_fuzzer was run for 3 hours; no soft lockups or unknown NMIs were
  seen.

TO-DO: 
-----------------------------------
Enable IBS virtualization on SEV-ES and SEV-SNP guests.

base-commit (61374cc145f4) + [4] (Clean up KVM's MSR interception code)
+ [5] (Mediated vPMU 4.0 for x86). 

[1]: https://bugzilla.kernel.org/attachment.cgi?id=306250
     AMD64 Architecture Programmer’s Manual, Vol 2, Section 15.38
     Instruction-Based Sampling Virtualization.

[2]: https://bugzilla.kernel.org/attachment.cgi?id=306250
     AMD64 Architecture Programmer’s Manual, Vol 2, Appendix B Layout
     of VMCB, Table B-3 Swap Types.

[3]: https://bugzilla.kernel.org/attachment.cgi?id=306250
     AMD64 Architecture Programmer’s Manual, Vol 2, Section 16.4.5
     Extended Interrupts.

[4]: https://lore.kernel.org/kvm/20250610225737.156318-1-seanjc@google.com/

[5]: https://lore.kernel.org/kvm/20250324173121.1275209-1-mizhang@google.com/

Manali Shukla (6):
  perf/amd/ibs: Fix race condition in IBS
  KVM: Add KVM_GET_LAPIC_W_EXTAPIC and KVM_SET_LAPIC_W_EXTAPIC for
    extapic
  KVM: x86/cpuid: Add a KVM-only leaf for IBS capabilities
  KVM: x86: Extend CPUID range to include new leaf
  perf/x86/amd: Enable VPMU passthrough capability for IBS PMU
  perf/x86/amd: Remove exclude_guest check from perf_ibs_init()

Santosh Shukla (5):
  x86/cpufeatures: Add CPUID feature bit for Extended LVT
  KVM: x86: Add emulation support for Extended LVT registers
  x86/cpufeatures: Add CPUID feature bit for VIBS in SVM/SEV guests
  KVM: SVM: Extend VMCB area for virtualized IBS registers
  KVM: SVM: Add support for IBS Virtualization

 Documentation/virt/kvm/api.rst     | 23 +++++++
 arch/x86/events/amd/ibs.c          |  8 ++-
 arch/x86/include/asm/apicdef.h     | 17 ++++++
 arch/x86/include/asm/cpufeatures.h |  2 +
 arch/x86/include/asm/kvm_host.h    |  1 +
 arch/x86/include/asm/svm.h         | 16 ++++-
 arch/x86/include/uapi/asm/kvm.h    |  5 ++
 arch/x86/kvm/cpuid.c               | 13 ++++
 arch/x86/kvm/lapic.c               | 81 ++++++++++++++++++++++---
 arch/x86/kvm/lapic.h               |  7 ++-
 arch/x86/kvm/reverse_cpuid.h       | 16 +++++
 arch/x86/kvm/svm/avic.c            |  4 ++
 arch/x86/kvm/svm/svm.c             | 96 ++++++++++++++++++++++++++++++
 arch/x86/kvm/x86.c                 | 37 ++++++++----
 include/uapi/linux/kvm.h           | 10 ++++
 15 files changed, 313 insertions(+), 23 deletions(-)


base-commit: 61374cc145f4a56377eaf87c7409a97ec7a34041
-- 
2.43.0



* [PATCH v1 01/11] perf/amd/ibs: Fix race condition in IBS
  2025-06-27 16:25 [PATCH v1 00/11] Implement support for IBS virtualization Manali Shukla
@ 2025-06-27 16:25 ` Manali Shukla
  2025-06-27 16:25 ` [PATCH v1 02/11] KVM: Add KVM_GET_LAPIC_W_EXTAPIC and KVM_SET_LAPIC_W_EXTAPIC for extapic Manali Shukla
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 22+ messages in thread
From: Manali Shukla @ 2025-06-27 16:25 UTC (permalink / raw)
  To: kvm, linux-perf-users, linux-doc
  Cc: seanjc, pbonzini, nikunj, manali.shukla, bp, peterz, mingo,
	mizhang, thomas.lendacky, ravi.bangoria, Sandipan.Das

Consider the following scenario:

While scheduling out an IBS event from perf's core scheduling path,
event_sched_out() disables the IBS event by clearing the IBS enable
bit in perf_ibs_disable_event(). However, if a delayed IBS NMI is
delivered after the IBS enable bit is cleared, the IBS NMI handler
may still observe the valid bit set and incorrectly treat the sample
as valid. As a result, it re-enables IBS by setting the enable bit,
even though the event has already been scheduled out.

This leads to a situation where IBS is re-enabled after being
explicitly disabled, which is incorrect. Although this race does not
have visible side effects, it violates the expected behavior of the
perf subsystem.

The race is particularly noticeable when userspace repeatedly disables
and re-enables IBS using PERF_EVENT_IOC_DISABLE and
PERF_EVENT_IOC_ENABLE ioctls in a loop.

Fix this by checking the IBS_STOPPING bit in the IBS NMI handler before
re-enabling the IBS event. If the IBS_STOPPING bit is set, it indicates
that the event is either disabled or in the process of being disabled,
and the NMI handler should not re-enable it.
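
The race and the fix can be illustrated with a single-CPU user-space
model. This is a sketch, not kernel code: the struct and function names
are hypothetical, and it assumes the stop path marks the event as
stopping before it clears the enable bit.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical single-CPU model; not the perf IBS driver. */
struct model_ibs {
	bool hw_enabled;	/* models the IBS enable bit */
	bool sample_valid;	/* models the valid bit seen by the NMI handler */
	bool stopping;		/* models IBS_STOPPING in pcpu->state */
};

/* Scheduling-out path: mark the event as stopping, then disable IBS. */
static void model_ibs_stop(struct model_ibs *ibs)
{
	ibs->stopping = true;		/* assumption: set before disabling */
	ibs->hw_enabled = false;
}

/*
 * Tail of the NMI handler. With check_stopping == false this models the
 * old behavior; with check_stopping == true it models the fixed one.
 */
static void model_ibs_nmi(struct model_ibs *ibs, bool check_stopping)
{
	if (!ibs->sample_valid)
		return;
	ibs->sample_valid = false;	/* consume the sample */

	if (check_stopping && ibs->stopping)
		return;			/* do not re-arm a stopping event */
	ibs->hw_enabled = true;		/* re-arm for the next period */
}
```

With a delayed NMI arriving after model_ibs_stop(), the unchecked
variant leaves hw_enabled set, while the checked variant leaves IBS
disabled.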

Signed-off-by: Manali Shukla <manali.shukla@amd.com>
---
 arch/x86/events/amd/ibs.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/events/amd/ibs.c b/arch/x86/events/amd/ibs.c
index 0252b7ea8bca..c998f68eeddc 100644
--- a/arch/x86/events/amd/ibs.c
+++ b/arch/x86/events/amd/ibs.c
@@ -1386,7 +1386,8 @@ static int perf_ibs_handle_irq(struct perf_ibs *perf_ibs, struct pt_regs *iregs)
 		}
 		new_config |= period >> 4;
 
-		perf_ibs_enable_event(perf_ibs, hwc, new_config);
+		if (!test_bit(IBS_STOPPING, pcpu->state))
+			perf_ibs_enable_event(perf_ibs, hwc, new_config);
 	}
 
 	perf_event_update_userpage(event);

base-commit: 61374cc145f4a56377eaf87c7409a97ec7a34041
prerequisite-patch-id: 0094bc7b958de1caba7b779c1c2dc96b3e1bcac1
prerequisite-patch-id: 58a6a462207dd7ec06998d6ff6a418f373f25d43
prerequisite-patch-id: 42d94622dfc4ccc786ce9bd6be186dbc5d32ed5b
prerequisite-patch-id: d93658ba28ece72eb6b9e2bb2a2a6188c4654216
prerequisite-patch-id: 669a4b410b39f4f34b9bcc4748a277ad2ac3b24c
prerequisite-patch-id: 89161a3395ead9159840634b594b973eee4728e0
prerequisite-patch-id: 58860afef836e3429817055807502201dd914602
prerequisite-patch-id: 9de4d874201da2dcec388e8fe4b3750ba3afc563
prerequisite-patch-id: b1d44b6a3ed8124ce9bd71474e367c8143baad41
prerequisite-patch-id: b7d50d21fe0f1c6d3b63d31d8cf573ad4bd06d33
prerequisite-patch-id: 526f07d6a60996b4839a170bacead2eeacf953bf
prerequisite-patch-id: 602259a8d1ac84dd95ad56463ebabc55e612400b
prerequisite-patch-id: ae0e84487c58d976362fbef7eaec565b14162d3e
prerequisite-patch-id: 78ba19a866d65e36352ec8f5bbd039bc2108e54c
prerequisite-patch-id: 973787b7d310a4cbe45836921402c6708bf3f67a
prerequisite-patch-id: cb2f413bf916cb895c26a27bd6415396c56b3e63
prerequisite-patch-id: 7546a92ef58aa9e40b1389650f5c7ffc28de40e5
prerequisite-patch-id: 0750ebfe9b7d25e9b6bdf838a179190c958aff97
prerequisite-patch-id: bd71d326c645eff74bf3f203ebde1739ad2eaa64
prerequisite-patch-id: 1937d5112d9a975009d3b75cebd16fabd7e595e0
prerequisite-patch-id: c3e3bac41574713b413d6ef13e373953080d26c9
prerequisite-patch-id: ebf87b381105b90d89670f3f0e123de8bc4e2086
prerequisite-patch-id: d450df9a0e717374c4a73355ef438e8b012e2ab7
prerequisite-patch-id: 91d5d4adfd44253424016b3132de587328f4d1f6
prerequisite-patch-id: 5743f22d48a3e410ab28ce6d81d6213e9854128d
prerequisite-patch-id: 0f8c5fff2d0ae8eb84446439bbb1792e078acac8
prerequisite-patch-id: 9ead85c0f9cd7a0a5448e2aff7e2d94fab9fa106
prerequisite-patch-id: 4ea3a56935fd4a23c2c1101738002bb3c89c8723
prerequisite-patch-id: 303869f48baa0d36f8a894bea87ba9283314efa1
prerequisite-patch-id: d774d5ff6a124c32c86a6db6bd5d97285591294c
prerequisite-patch-id: 9708b3ac43c53623fe553e88cf42d06940af1d43
prerequisite-patch-id: 5d890648afbf86e18eff47520a10a8f6eefb5f6c
prerequisite-patch-id: ddba6b9f04901285c77f3af1ea5ab50fe063b015
prerequisite-patch-id: 2fafac2db57921b28591ff0bf1e38f911870ef05
prerequisite-patch-id: 16cd8c4d184fd1e4217835aa43c38d2986bb30f3
prerequisite-patch-id: 1536075a6bd45c6c2484e9045e6f0173dfb4fbc2
prerequisite-patch-id: 3dcc0186dd0c08353bb3ba50c384085fc6ded721
prerequisite-patch-id: 6c2e2cd3416fba38621f75286283b452765ee3bc
prerequisite-patch-id: 4cdd5b8e215224a7dd8224914cf1dfacc4f52a96
prerequisite-patch-id: bce24d584bfaf81b23ba88aea97187387791e8fc
prerequisite-patch-id: 3981227aa4d106f583a9ac07ca08e416b60ea52e
prerequisite-patch-id: 9b56c12722196db1ed0ab3a5aaf3d5c5b26e8814
prerequisite-patch-id: 9f3c7f29d4142a13b919356c458059aac4732082
prerequisite-patch-id: 218445b88281283066b23a1844f51098fe670f49
prerequisite-patch-id: 5d92b7d25437e3d3e5e3a5a08e779bb23b10d5ad
prerequisite-patch-id: 1d1a5aee655b9adc11daa0b24e5a6a44dd2b55eb
prerequisite-patch-id: 94adf0a619adcc857014fd1b5b52d2bc6a920aca
prerequisite-patch-id: 5317018cfb51d5aa27bab2ca259fa6a36aad6303
prerequisite-patch-id: 083070d1b008b9396f64ff3ea1998e624db058ca
prerequisite-patch-id: d69b8afdc062ad10cc8dc2aad1759dbe70fc666c
prerequisite-patch-id: 1f1045ecce2d127cbfc0a382adba9c4bb711dd30
prerequisite-patch-id: a7a8a308c1b0eca850bc140019066a52a9aaf64a
prerequisite-patch-id: 1891c0fec1d1f2ba2dd26e79a8207b0e13a0f8bf
prerequisite-patch-id: f4027ba53a2e69f12fd22697dbf1f97951323d6a
prerequisite-patch-id: 8ca87584eaaa9fda7ffe7bfac4684af9e82988f5
prerequisite-patch-id: 0000000000000000000000000000000000000000
prerequisite-patch-id: 4537a12ab34d9c9a40d5602b51e8ddc968d6ff83
prerequisite-patch-id: bc8ece4d02f8b541d5fcf731059c35861be5eefe
prerequisite-patch-id: 1759acbd0b0f8e8b974e22c0627ea81a0b1cb431
prerequisite-patch-id: 2876c944ff6ef5ad1e146cf04674cafd08023369
prerequisite-patch-id: d7245aeede10610be8545ae9344ae0a4ce5c4227
prerequisite-patch-id: efad6c0b30a629d976fce1ca63005b064029354c
prerequisite-patch-id: 68e90423c6d9ddecf4262234c0791b807885f7cd
prerequisite-patch-id: b8015065a77b10b3d27cc0277ccc4d8cbb476008
prerequisite-patch-id: 76180001a4e5e51fcb68e08403a82edca008e8af
prerequisite-patch-id: 647937e4716b773c813aa2de6cf689c518db8459
prerequisite-patch-id: 61e0acf355c7cbbe888f03ae3ebe5fa3df83176e
prerequisite-patch-id: 32eb0a230627e45739d15b1ffbad9e897144fcca
prerequisite-patch-id: a0f10d6af86558cb3752f32111ebee4e6ad0887b
prerequisite-patch-id: 219266326bb1c41c441d92a82d5e38c9ca8a066f
-- 
2.43.0



* [PATCH v1 02/11] KVM: Add KVM_GET_LAPIC_W_EXTAPIC and KVM_SET_LAPIC_W_EXTAPIC for extapic
  2025-06-27 16:25 [PATCH v1 00/11] Implement support for IBS virtualization Manali Shukla
  2025-06-27 16:25 ` [PATCH v1 01/11] perf/amd/ibs: Fix race condition in IBS Manali Shukla
@ 2025-06-27 16:25 ` Manali Shukla
  2025-07-15  2:21   ` Mi, Dapeng
  2025-06-27 16:25 ` [PATCH v1 03/11] x86/cpufeatures: Add CPUID feature bit for Extended LVT Manali Shukla
                   ` (9 subsequent siblings)
  11 siblings, 1 reply; 22+ messages in thread
From: Manali Shukla @ 2025-06-27 16:25 UTC (permalink / raw)
  To: kvm, linux-perf-users, linux-doc
  Cc: seanjc, pbonzini, nikunj, manali.shukla, bp, peterz, mingo,
	mizhang, thomas.lendacky, ravi.bangoria, Sandipan.Das

Modern AMD processors expose four additional extended LVT registers in
the extended APIC register space, which can be used for additional
interrupt sources such as instruction-based sampling and others.

To support this, introduce two new vCPU-based IOCTLs:
KVM_GET_LAPIC_W_EXTAPIC and KVM_SET_LAPIC_W_EXTAPIC. These IOCTLs work
similarly to KVM_GET_LAPIC and KVM_SET_LAPIC, but operate on an APIC page
that includes the extended APIC register space located at APIC offsets
400h-530h.

These IOCTLs are intended for use when extended APIC support is
enabled in the guest. They allow saving and restoring the full APIC
page, including the extended registers.

To support this, the `struct kvm_lapic_state_w_extapic` has been made
extensible rather than hardcoding its size, improving forward
compatibility.
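
As a rough illustration of the two buffer sizes involved, the following
user-space sketch models the sizing only. The MODEL_* names and helper
functions are hypothetical; the sizes are KVM_APIC_REG_SIZE (0x400) and
the KVM_APIC_EXT_REG_SIZE (0x540) introduced by this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define MODEL_APIC_REG_SIZE	0x400	/* bytes copied by KVM_{GET,SET}_LAPIC */
#define MODEL_APIC_EXT_REG_SIZE	0x540	/* bytes copied by the *_W_EXTAPIC variants */

/* Number of register bytes transferred for each flavor of the ioctl. */
static size_t model_lapic_state_size(bool with_extapic)
{
	return with_extapic ? MODEL_APIC_EXT_REG_SIZE : MODEL_APIC_REG_SIZE;
}

/* Registers sit on 16-byte boundaries, so the last one starts 0x10 early. */
static size_t model_last_reg_offset(size_t size)
{
	return size - 0x10;
}

/* Zeroed buffer large enough for the chosen flavor; caller frees it. */
static uint8_t *model_lapic_state_alloc(bool with_extapic)
{
	return calloc(1, model_lapic_state_size(with_extapic));
}
```

A caller of the extended variant would therefore need a buffer of at
least 0x540 bytes, whose last register starts at offset 530h.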

Documentation for the new IOCTLs has also been added.

For more details on the extended APIC space, refer to AMD Programmer’s
Manual Volume 2, Section 16.4.5: Extended Interrupts.
https://bugzilla.kernel.org/attachment.cgi?id=306250

Signed-off-by: Manali Shukla <manali.shukla@amd.com>
---
 Documentation/virt/kvm/api.rst  | 23 ++++++++++++++++++++
 arch/x86/include/uapi/asm/kvm.h |  5 +++++
 arch/x86/kvm/lapic.c            | 12 ++++++-----
 arch/x86/kvm/lapic.h            |  6 ++++--
 arch/x86/kvm/x86.c              | 37 ++++++++++++++++++++++++---------
 include/uapi/linux/kvm.h        | 10 +++++++++
 6 files changed, 76 insertions(+), 17 deletions(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 1bd2d42e6424..0ca11d43f833 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -2041,6 +2041,18 @@ error.
 Reads the Local APIC registers and copies them into the input argument.  The
 data format and layout are the same as documented in the architecture manual.
 
+::
+
+  #define KVM_APIC_EXT_REG_SIZE 0x540
+  struct kvm_lapic_state_w_extapic {
+	__DECLARE_FLEX_ARRAY(__u8, regs);
+  };
+
+Applications should use KVM_GET_LAPIC_W_EXTAPIC ioctl if extended APIC is
+enabled. KVM_GET_LAPIC_W_EXTAPIC reads Local APIC registers with extended
+APIC register space located at offsets 400h-530h and copies them into input
+argument.
+
 If KVM_X2APIC_API_USE_32BIT_IDS feature of KVM_CAP_X2APIC_API is
 enabled, then the format of APIC_ID register depends on the APIC mode
 (reported by MSR_IA32_APICBASE) of its VCPU.  x2APIC stores APIC ID in
@@ -2072,6 +2084,17 @@ always uses xAPIC format.
 Copies the input argument into the Local APIC registers.  The data format
 and layout are the same as documented in the architecture manual.
 
+::
+
+  #define KVM_APIC_EXT_REG_SIZE 0x540
+  struct kvm_lapic_state_w_extapic {
+	__DECLARE_FLEX_ARRAY(__u8, regs);
+  };
+
+Applications should use KVM_SET_LAPIC_W_EXTAPIC ioctl if extended APIC is enabled.
+KVM_SET_LAPIC_W_EXTAPIC copies input arguments with extended APIC register into
+Local APIC and extended APIC registers.
+
 The format of the APIC ID register (bytes 32-35 of struct kvm_lapic_state's
 regs field) depends on the state of the KVM_CAP_X2APIC_API capability.
 See the note in KVM_GET_LAPIC.
diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
index 6f3499507c5e..91c3c5b8cae3 100644
--- a/arch/x86/include/uapi/asm/kvm.h
+++ b/arch/x86/include/uapi/asm/kvm.h
@@ -124,6 +124,11 @@ struct kvm_lapic_state {
 	char regs[KVM_APIC_REG_SIZE];
 };
 
+#define KVM_APIC_EXT_REG_SIZE 0x540
+struct kvm_lapic_state_w_extapic {
+	__DECLARE_FLEX_ARRAY(__u8, regs);
+};
+
 struct kvm_segment {
 	__u64 base;
 	__u32 limit;
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 73418dc0ebb2..00ca2b0faa45 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -3046,7 +3046,7 @@ void kvm_apic_ack_interrupt(struct kvm_vcpu *vcpu, int vector)
 EXPORT_SYMBOL_GPL(kvm_apic_ack_interrupt);
 
 static int kvm_apic_state_fixup(struct kvm_vcpu *vcpu,
-		struct kvm_lapic_state *s, bool set)
+		struct kvm_lapic_state_w_extapic *s, bool set)
 {
 	if (apic_x2apic_mode(vcpu->arch.apic)) {
 		u32 x2apic_id = kvm_x2apic_id(vcpu->arch.apic);
@@ -3097,9 +3097,10 @@ static int kvm_apic_state_fixup(struct kvm_vcpu *vcpu,
 	return 0;
 }
 
-int kvm_apic_get_state(struct kvm_vcpu *vcpu, struct kvm_lapic_state *s)
+int kvm_apic_get_state(struct kvm_vcpu *vcpu, struct kvm_lapic_state_w_extapic *s,
+		       unsigned int size)
 {
-	memcpy(s->regs, vcpu->arch.apic->regs, sizeof(*s));
+	memcpy(s->regs, vcpu->arch.apic->regs, size);
 
 	/*
 	 * Get calculated timer current count for remaining timer period (if
@@ -3111,7 +3112,8 @@ int kvm_apic_get_state(struct kvm_vcpu *vcpu, struct kvm_lapic_state *s)
 	return kvm_apic_state_fixup(vcpu, s, false);
 }
 
-int kvm_apic_set_state(struct kvm_vcpu *vcpu, struct kvm_lapic_state *s)
+int kvm_apic_set_state(struct kvm_vcpu *vcpu, struct kvm_lapic_state_w_extapic *s,
+		       unsigned int size)
 {
 	struct kvm_lapic *apic = vcpu->arch.apic;
 	int r;
@@ -3126,7 +3128,7 @@ int kvm_apic_set_state(struct kvm_vcpu *vcpu, struct kvm_lapic_state *s)
 		kvm_recalculate_apic_map(vcpu->kvm);
 		return r;
 	}
-	memcpy(vcpu->arch.apic->regs, s->regs, sizeof(*s));
+	memcpy(vcpu->arch.apic->regs, s->regs, size);
 
 	atomic_set_release(&apic->vcpu->kvm->arch.apic_map_dirty, DIRTY);
 	kvm_recalculate_apic_map(vcpu->kvm);
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index 4518b4e0552f..7ad946b3738d 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -120,9 +120,11 @@ bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, struct kvm_lapic *src,
 void kvm_apic_send_ipi(struct kvm_lapic *apic, u32 icr_low, u32 icr_high);
 
 int kvm_apic_set_base(struct kvm_vcpu *vcpu, u64 value, bool host_initiated);
-int kvm_apic_get_state(struct kvm_vcpu *vcpu, struct kvm_lapic_state *s);
-int kvm_apic_set_state(struct kvm_vcpu *vcpu, struct kvm_lapic_state *s);
 void kvm_apic_update_hwapic_isr(struct kvm_vcpu *vcpu);
+int kvm_apic_get_state(struct kvm_vcpu *vcpu, struct kvm_lapic_state_w_extapic *s,
+		       unsigned int size);
+int kvm_apic_set_state(struct kvm_vcpu *vcpu, struct kvm_lapic_state_w_extapic *s,
+		       unsigned int size);
 int kvm_lapic_find_highest_irr(struct kvm_vcpu *vcpu);
 
 u64 kvm_get_lapic_tscdeadline_msr(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c880a512005e..c273bbbbbcc6 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5156,25 +5156,25 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
 }
 
 static int kvm_vcpu_ioctl_get_lapic(struct kvm_vcpu *vcpu,
-				    struct kvm_lapic_state *s)
+				    struct kvm_lapic_state_w_extapic *s, unsigned int size)
 {
 	if (vcpu->arch.apic->guest_apic_protected)
 		return -EINVAL;
 
 	kvm_x86_call(sync_pir_to_irr)(vcpu);
 
-	return kvm_apic_get_state(vcpu, s);
+	return kvm_apic_get_state(vcpu, s, size);
 }
 
 static int kvm_vcpu_ioctl_set_lapic(struct kvm_vcpu *vcpu,
-				    struct kvm_lapic_state *s)
+				    struct kvm_lapic_state_w_extapic *s, unsigned int size)
 {
 	int r;
 
 	if (vcpu->arch.apic->guest_apic_protected)
 		return -EINVAL;
 
-	r = kvm_apic_set_state(vcpu, s);
+	r = kvm_apic_set_state(vcpu, s, size);
 	if (r)
 		return r;
 	update_cr8_intercept(vcpu);
@@ -5903,10 +5903,11 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
 {
 	struct kvm_vcpu *vcpu = filp->private_data;
 	void __user *argp = (void __user *)arg;
+	unsigned long size;
 	int r;
 	union {
 		struct kvm_sregs2 *sregs2;
-		struct kvm_lapic_state *lapic;
+		struct kvm_lapic_state_w_extapic *lapic;
 		struct kvm_xsave *xsave;
 		struct kvm_xcrs *xcrs;
 		void *buffer;
@@ -5916,35 +5917,51 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
 
 	u.buffer = NULL;
 	switch (ioctl) {
+	case KVM_GET_LAPIC_W_EXTAPIC:
 	case KVM_GET_LAPIC: {
 		r = -EINVAL;
 		if (!lapic_in_kernel(vcpu))
 			goto out;
-		u.lapic = kzalloc(sizeof(struct kvm_lapic_state), GFP_KERNEL);
+
+		if (ioctl == KVM_GET_LAPIC_W_EXTAPIC)
+			size = struct_size(u.lapic, regs, KVM_APIC_EXT_REG_SIZE);
+		else
+			size = sizeof(struct kvm_lapic_state);
+
+		u.lapic = kzalloc(size, GFP_KERNEL);
 
 		r = -ENOMEM;
 		if (!u.lapic)
 			goto out;
-		r = kvm_vcpu_ioctl_get_lapic(vcpu, u.lapic);
+		r = kvm_vcpu_ioctl_get_lapic(vcpu, u.lapic, size);
 		if (r)
 			goto out;
+
 		r = -EFAULT;
-		if (copy_to_user(argp, u.lapic, sizeof(struct kvm_lapic_state)))
+		if (copy_to_user(argp, u.lapic, size))
 			goto out;
+
 		r = 0;
 		break;
 	}
+	case KVM_SET_LAPIC_W_EXTAPIC:
 	case KVM_SET_LAPIC: {
 		r = -EINVAL;
 		if (!lapic_in_kernel(vcpu))
 			goto out;
-		u.lapic = memdup_user(argp, sizeof(*u.lapic));
+
+		if (ioctl == KVM_SET_LAPIC_W_EXTAPIC)
+			size = struct_size(u.lapic, regs, KVM_APIC_EXT_REG_SIZE);
+		else
+			size = sizeof(struct kvm_lapic_state);
+		u.lapic = memdup_user(argp, size);
+
 		if (IS_ERR(u.lapic)) {
 			r = PTR_ERR(u.lapic);
 			goto out_nofree;
 		}
 
-		r = kvm_vcpu_ioctl_set_lapic(vcpu, u.lapic);
+		r = kvm_vcpu_ioctl_set_lapic(vcpu, u.lapic, size);
 		break;
 	}
 	case KVM_INTERRUPT: {
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index d00b85cb168c..cf23c1b52c49 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1290,6 +1290,16 @@ struct kvm_vfio_spapr_tce {
 #define KVM_SET_FPU               _IOW(KVMIO,  0x8d, struct kvm_fpu)
 #define KVM_GET_LAPIC             _IOR(KVMIO,  0x8e, struct kvm_lapic_state)
 #define KVM_SET_LAPIC             _IOW(KVMIO,  0x8f, struct kvm_lapic_state)
+/*
+ * Added to save/restore local APIC registers with extended APIC (extapic)
+ * register space.
+ *
+ * Qemu emulates extapic logic only when KVM enables extapic functionality via
+ * KVM capability. In the condition where Qemu sets extapic registers, but KVM doesn't
+ * set extapic capability, Qemu ends up using KVM_GET_LAPIC and KVM_SET_LAPIC.
+ */
+#define KVM_GET_LAPIC_W_EXTAPIC   _IOR(KVMIO,  0x8e, struct kvm_lapic_state_w_extapic)
+#define KVM_SET_LAPIC_W_EXTAPIC   _IOW(KVMIO,  0x8f, struct kvm_lapic_state_w_extapic)
 #define KVM_SET_CPUID2            _IOW(KVMIO,  0x90, struct kvm_cpuid2)
 #define KVM_GET_CPUID2            _IOWR(KVMIO, 0x91, struct kvm_cpuid2)
 /* Available with KVM_CAP_VAPIC */
-- 
2.43.0



* [PATCH v1 03/11] x86/cpufeatures: Add CPUID feature bit for Extended LVT
  2025-06-27 16:25 [PATCH v1 00/11] Implement support for IBS virtualization Manali Shukla
  2025-06-27 16:25 ` [PATCH v1 01/11] perf/amd/ibs: Fix race condition in IBS Manali Shukla
  2025-06-27 16:25 ` [PATCH v1 02/11] KVM: Add KVM_GET_LAPIC_W_EXTAPIC and KVM_SET_LAPIC_W_EXTAPIC for extapic Manali Shukla
@ 2025-06-27 16:25 ` Manali Shukla
  2025-06-27 16:25 ` [PATCH v1 04/11] KVM: x86: Add emulation support for Extended LVT registers Manali Shukla
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 22+ messages in thread
From: Manali Shukla @ 2025-06-27 16:25 UTC (permalink / raw)
  To: kvm, linux-perf-users, linux-doc
  Cc: seanjc, pbonzini, nikunj, manali.shukla, bp, peterz, mingo,
	mizhang, thomas.lendacky, ravi.bangoria, Sandipan.Das

From: Santosh Shukla <santosh.shukla@amd.com>

Local interrupts can be extended to include more LVT registers in
order to allow additional interrupt sources, like Instruction Based
Sampling (IBS).

The Extended APIC feature register indicates the number of extended
Local Vector Table (LVT) registers in the local APIC. Currently, there
are four extended LVT registers, located at APIC offsets 400h-530h.

The EXTLVT feature bit changes the behavior associated with reading
and writing an extended LVT register when AVIC is enabled. When both
EXTLVT and AVIC are enabled, a write to an extended LVT register
changes from a fault style #VMEXIT to a trap style #VMEXIT and a read
of an extended LVT register no longer triggers a #VMEXIT [2].

Presence of the EXTLVT feature is indicated via CPUID function
0x8000000A_EDX[27].
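
For illustration, the bit test can be written as a small user-space
helper. The helper names are hypothetical; the EDX value is assumed to
have been obtained by executing CPUID with EAX = 0x8000000A.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_SVM_FEATURE_LEAF	0x8000000Au	/* CPUID function holding the bit */
#define MODEL_EXTLVT_EDX_BIT	27		/* EDX[27] per the text above */

/* True if the EDX value returned by CPUID 0x8000000A advertises EXTLVT. */
static bool edx_has_extlvt(uint32_t edx)
{
	return (edx >> MODEL_EXTLVT_EDX_BIT) & 1u;
}

/*
 * The kernel encodes a feature as word * 32 + bit; this patch uses
 * (15*32+27), so these helpers recover word 15 and bit 27.
 */
static unsigned int model_feature_word(unsigned int feature)
{
	return feature / 32;
}

static unsigned int model_feature_bit(unsigned int feature)
{
	return feature % 32;
}
```

The bit index recovered from the X86_FEATURE_EXTLVT encoding matches the
EDX bit number quoted above.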

More details about the EXTLVT feature can be found at [1].

[1]: AMD Programmer's Manual Volume 2,
Section 16.4.5 Extended Interrupts.
https://bugzilla.kernel.org/attachment.cgi?id=306250

[2]: AMD Programmer's Manual Volume 2,
Table 15-22. Guest vAPIC Register Access Behavior.
https://bugzilla.kernel.org/attachment.cgi?id=306250

Signed-off-by: Santosh Shukla <santosh.shukla@amd.com>
Signed-off-by: Manali Shukla <manali.shukla@amd.com>
---
 arch/x86/include/asm/cpufeatures.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 77a265e0672e..d2ad0dd1e8db 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -378,6 +378,7 @@
 #define X86_FEATURE_X2AVIC		(15*32+18) /* "x2avic" Virtual x2apic */
 #define X86_FEATURE_V_SPEC_CTRL		(15*32+20) /* "v_spec_ctrl" Virtual SPEC_CTRL */
 #define X86_FEATURE_VNMI		(15*32+25) /* "vnmi" Virtual NMI */
+#define X86_FEATURE_EXTLVT		(15*32+27) /* Extended Local vector Table */
 #define X86_FEATURE_SVME_ADDR_CHK	(15*32+28) /* SVME addr check */
 #define X86_FEATURE_BUS_LOCK_THRESHOLD	(15*32+29) /* Bus lock threshold */
 #define X86_FEATURE_IDLE_HLT		(15*32+30) /* IDLE HLT intercept */
-- 
2.43.0



* [PATCH v1 04/11] KVM: x86: Add emulation support for Extended LVT registers
  2025-06-27 16:25 [PATCH v1 00/11] Implement support for IBS virtualization Manali Shukla
                   ` (2 preceding siblings ...)
  2025-06-27 16:25 ` [PATCH v1 03/11] x86/cpufeatures: Add CPUID feature bit for Extended LVT Manali Shukla
@ 2025-06-27 16:25 ` Manali Shukla
  2025-07-15  2:58   ` Mi, Dapeng
  2025-06-27 16:25 ` [PATCH v1 05/11] x86/cpufeatures: Add CPUID feature bit for VIBS in SVM/SEV guests Manali Shukla
                   ` (7 subsequent siblings)
  11 siblings, 1 reply; 22+ messages in thread
From: Manali Shukla @ 2025-06-27 16:25 UTC (permalink / raw)
  To: kvm, linux-perf-users, linux-doc
  Cc: seanjc, pbonzini, nikunj, manali.shukla, bp, peterz, mingo,
	mizhang, thomas.lendacky, ravi.bangoria, Sandipan.Das

From: Santosh Shukla <santosh.shukla@amd.com>

The local interrupts are extended to include more LVT registers in
order to allow additional interrupt sources, such as Instruction Based
Sampling (IBS).

Currently, four additional LVT registers are defined; they are
located at APIC offsets 400h-530h.

The AMD IBS driver uses the extended LVT (EXTLVT, Extended Interrupt
Local Vector Table) by default during driver initialization, so the
extended LVT registers must be emulated for the guest IBS driver to
initialize successfully.

Please refer to Section 16.4.5 in AMD Programmer's Manual Volume 2 at
https://bugzilla.kernel.org/attachment.cgi?id=306250 for more details
on Extended LVT.
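
The following user-space sketch shows how the extended register offsets
map onto the read mask used by the emulation. The register offsets
follow arch/x86/include/asm/apicdef.h and APIC_REG_EXT_MASK() is the
helper added by this patch; the model_* functions are hypothetical and
only approximate the check in kvm_lapic_reg_read().

```c
#include <assert.h>
#include <stdint.h>

/* Register offsets as in arch/x86/include/asm/apicdef.h. */
#define APIC_EFEAT		0x400
#define APIC_ECTRL		0x410
#define APIC_EILVTn(n)		(0x500 + 0x10 * (n))

/* Mask helper from this patch: bit 0 corresponds to offset 0x400. */
#define APIC_REG_EXT_MASK(reg)	(1ull << (((reg) >> 4) - 0x40))

/* Mask of the extended registers that the emulation treats as readable. */
static uint64_t model_valid_ext_mask(void)
{
	return APIC_REG_EXT_MASK(APIC_EFEAT) |
	       APIC_REG_EXT_MASK(APIC_ECTRL) |
	       APIC_REG_EXT_MASK(APIC_EILVTn(0)) |
	       APIC_REG_EXT_MASK(APIC_EILVTn(1)) |
	       APIC_REG_EXT_MASK(APIC_EILVTn(2)) |
	       APIC_REG_EXT_MASK(APIC_EILVTn(3));
}

/* Returns 1 if a 16-byte aligned offset names a readable extended register. */
static int model_ext_reg_readable(uint32_t offset)
{
	if (offset < 0x400 || offset > 0x530 || (offset & 0xf))
		return 0;
	return !!(model_valid_ext_mask() & APIC_REG_EXT_MASK(offset));
}
```

In this model, offsets 400h, 410h and 500h-530h are readable, while the
unused offsets in between (for example 420h) are rejected.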

Signed-off-by: Santosh Shukla <santosh.shukla@amd.com>
Co-developed-by: Manali Shukla <manali.shukla@amd.com>
Signed-off-by: Manali Shukla <manali.shukla@amd.com>
---
 arch/x86/include/asm/apicdef.h | 17 +++++++++
 arch/x86/kvm/cpuid.c           |  6 +++
 arch/x86/kvm/lapic.c           | 69 +++++++++++++++++++++++++++++++++-
 arch/x86/kvm/lapic.h           |  1 +
 arch/x86/kvm/svm/avic.c        |  4 ++
 arch/x86/kvm/svm/svm.c         |  4 ++
 6 files changed, 99 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/apicdef.h b/arch/x86/include/asm/apicdef.h
index 094106b6a538..4c0f580578aa 100644
--- a/arch/x86/include/asm/apicdef.h
+++ b/arch/x86/include/asm/apicdef.h
@@ -146,6 +146,23 @@
 #define		APIC_EILVT_MSG_EXT	0x7
 #define		APIC_EILVT_MASKED	(1 << 16)
 
+/*
+ * Initialize extended APIC registers to the default value when guest
+ * is started and EXTAPIC feature is enabled on the guest.
+ *
+ * APIC_EFEAT is a read only Extended APIC feature register, whose
+ * default value is 0x00040007. However, bits 0, 1, and 2 represent
+ * features that are not currently emulated by KVM. Therefore, these
+ * bits must be cleared during initialization. As a result, the
+ * default value used for APIC_EFEAT in KVM is 0x00040000.
+ *
+ * APIC_ECTRL is a read-write Extended APIC control register, whose
+ * default value is 0x0.
+ */
+
+#define		APIC_EFEAT_DEFAULT	0x00040000
+#define		APIC_ECTRL_DEFAULT	0x0
+
 #define APIC_BASE (fix_to_virt(FIX_APIC_BASE))
 #define APIC_BASE_MSR		0x800
 #define APIC_X2APIC_ID_MSR	0x802
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index eb7be340138b..7270d22fbf31 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -458,6 +458,12 @@ void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 	/* Invoke the vendor callback only after the above state is updated. */
 	kvm_x86_call(vcpu_after_set_cpuid)(vcpu);
 
+	/*
+	 * Initialize extended LVT registers at guest startup to support delivery
+	 * of interrupts via the extended APIC space (offsets 0x400–0x530).
+	 */
+	kvm_apic_init_eilvt_regs(vcpu);
+
 	/*
 	 * Except for the MMU, which needs to do its thing any vendor specific
 	 * adjustments to the reserved GPA bits.
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 00ca2b0faa45..cffe44eb3f2b 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1624,9 +1624,13 @@ static inline struct kvm_lapic *to_lapic(struct kvm_io_device *dev)
 }
 
 #define APIC_REG_MASK(reg)	(1ull << ((reg) >> 4))
+#define APIC_REG_EXT_MASK(reg)	(1ull << (((reg) >> 4) - 0x40))
 #define APIC_REGS_MASK(first, count) \
 	(APIC_REG_MASK(first) * ((1ull << (count)) - 1))
 
+#define APIC_LAST_REG_OFFSET		0x3f0
+#define APIC_EXT_LAST_REG_OFFSET	0x530
+
 u64 kvm_lapic_readable_reg_mask(struct kvm_lapic *apic)
 {
 	/* Leave bits '0' for reserved and write-only registers. */
@@ -1668,6 +1672,8 @@ EXPORT_SYMBOL_GPL(kvm_lapic_readable_reg_mask);
 static int kvm_lapic_reg_read(struct kvm_lapic *apic, u32 offset, int len,
 			      void *data)
 {
+	u64 valid_reg_ext_mask = 0;
+	unsigned int last_reg = APIC_LAST_REG_OFFSET;
 	unsigned char alignment = offset & 0xf;
 	u32 result;
 
@@ -1677,13 +1683,44 @@ static int kvm_lapic_reg_read(struct kvm_lapic *apic, u32 offset, int len,
 	 */
 	WARN_ON_ONCE(apic_x2apic_mode(apic) && offset == APIC_ICR);
 
+	/*
+	 * The local interrupts are extended to include LVT registers to allow
+	 * additional interrupt sources when the EXTAPIC feature bit is enabled.
+	 * The Extended Interrupt LVT registers are located at APIC offsets 400-530h.
+	 */
+	if (guest_cpu_cap_has(apic->vcpu, X86_FEATURE_EXTAPIC)) {
+		valid_reg_ext_mask =
+			APIC_REG_EXT_MASK(APIC_EFEAT) |
+			APIC_REG_EXT_MASK(APIC_ECTRL) |
+			APIC_REG_EXT_MASK(APIC_EILVTn(0)) |
+			APIC_REG_EXT_MASK(APIC_EILVTn(1)) |
+			APIC_REG_EXT_MASK(APIC_EILVTn(2)) |
+			APIC_REG_EXT_MASK(APIC_EILVTn(3));
+		last_reg = APIC_EXT_LAST_REG_OFFSET;
+	}
+
 	if (alignment + len > 4)
 		return 1;
 
-	if (offset > 0x3f0 ||
-	    !(kvm_lapic_readable_reg_mask(apic) & APIC_REG_MASK(offset)))
+	if (offset > last_reg)
 		return 1;
 
+	switch (offset) {
+	/*
+	 * Section 16.3.2 in the AMD Programmer's Manual Volume 2 states:
+	 * "APIC registers are aligned to 16-byte offsets and must be accessed
+	 * using naturally-aligned DWORD size read and writes."
+	 */
+	case KVM_APIC_REG_SIZE ... KVM_APIC_EXT_REG_SIZE - 16:
+		if (!(valid_reg_ext_mask & APIC_REG_EXT_MASK(offset)))
+			return 1;
+		break;
+	default:
+		if (!(kvm_lapic_readable_reg_mask(apic) & APIC_REG_MASK(offset)))
+			return 1;
+
+	}
+
 	result = __apic_read(apic, offset & ~0xf);
 
 	trace_kvm_apic_read(offset, result);
@@ -2419,6 +2456,14 @@ static int kvm_lapic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val)
 		else
 			kvm_apic_send_ipi(apic, APIC_DEST_SELF | val, 0);
 		break;
+
+	case APIC_ECTRL:
+	case APIC_EILVTn(0):
+	case APIC_EILVTn(1):
+	case APIC_EILVTn(2):
+	case APIC_EILVTn(3):
+		kvm_lapic_set_reg(apic, reg, val);
+		break;
 	default:
 		ret = 1;
 		break;
@@ -2757,6 +2802,24 @@ void kvm_inhibit_apic_access_page(struct kvm_vcpu *vcpu)
 	kvm_vcpu_srcu_read_lock(vcpu);
 }
 
+/*
+ * Initialize extended APIC registers to the default value when guest is
+ * started. The extended APIC registers should only be initialized when the
+ * EXTAPIC feature is enabled on the guest.
+ */
+void kvm_apic_init_eilvt_regs(struct kvm_vcpu *vcpu)
+{
+	struct kvm_lapic *apic = vcpu->arch.apic;
+	int i;
+
+	if (guest_cpu_cap_has(vcpu, X86_FEATURE_EXTAPIC)) {
+		kvm_lapic_set_reg(apic, APIC_EFEAT, APIC_EFEAT_DEFAULT);
+		kvm_lapic_set_reg(apic, APIC_ECTRL, APIC_ECTRL_DEFAULT);
+		for (i = 0; i < APIC_EILVT_NR_MAX; i++)
+			kvm_lapic_set_reg(apic, APIC_EILVTn(i), APIC_EILVT_MASKED);
+	}
+}
+
 void kvm_lapic_reset(struct kvm_vcpu *vcpu, bool init_event)
 {
 	struct kvm_lapic *apic = vcpu->arch.apic;
@@ -2818,6 +2881,8 @@ void kvm_lapic_reset(struct kvm_vcpu *vcpu, bool init_event)
 		kvm_lapic_set_reg(apic, APIC_ISR + 0x10 * i, 0);
 		kvm_lapic_set_reg(apic, APIC_TMR + 0x10 * i, 0);
 	}
+	kvm_apic_init_eilvt_regs(vcpu);
+
 	kvm_apic_update_apicv(vcpu);
 	update_divide_count(apic);
 	atomic_set(&apic->lapic_timer.pending, 0);
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index 7ad946b3738d..ff0f9eb3417b 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -96,6 +96,7 @@ void kvm_apic_ack_interrupt(struct kvm_vcpu *vcpu, int vector);
 int kvm_apic_accept_pic_intr(struct kvm_vcpu *vcpu);
 int kvm_apic_accept_events(struct kvm_vcpu *vcpu);
 void kvm_lapic_reset(struct kvm_vcpu *vcpu, bool init_event);
+void kvm_apic_init_eilvt_regs(struct kvm_vcpu *vcpu);
 u64 kvm_lapic_get_cr8(struct kvm_vcpu *vcpu);
 void kvm_lapic_set_tpr(struct kvm_vcpu *vcpu, unsigned long cr8);
 void kvm_lapic_set_eoi(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index 7338879d1c0c..323927fb6f57 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -682,6 +682,10 @@ static bool is_avic_unaccelerated_access_trap(u32 offset)
 	case APIC_LVTERR:
 	case APIC_TMICT:
 	case APIC_TDCR:
+	case APIC_EILVTn(0):
+	case APIC_EILVTn(1):
+	case APIC_EILVTn(2):
+	case APIC_EILVTn(3):
 		ret = true;
 		break;
 	default:
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index fffc3320ea00..f9a7ff37ea10 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -791,6 +791,10 @@ void svm_set_x2apic_msr_interception(struct vcpu_svm *svm, bool intercept)
 		X2APIC_MSR(APIC_TMICT),
 		X2APIC_MSR(APIC_TMCCT),
 		X2APIC_MSR(APIC_TDCR),
+		X2APIC_MSR(APIC_EILVTn(0)),
+		X2APIC_MSR(APIC_EILVTn(1)),
+		X2APIC_MSR(APIC_EILVTn(2)),
+		X2APIC_MSR(APIC_EILVTn(3)),
 	};
 	int i;
 
-- 
2.43.0



* [PATCH v1 05/11] x86/cpufeatures: Add CPUID feature bit for VIBS in SVM/SEV guests
  2025-06-27 16:25 [PATCH v1 00/11] Implement support for IBS virtualization Manali Shukla
                   ` (3 preceding siblings ...)
  2025-06-27 16:25 ` [PATCH v1 04/11] KVM: x86: Add emulation support for Extented LVT registers Manali Shukla
@ 2025-06-27 16:25 ` Manali Shukla
  2025-06-27 16:25 ` [PATCH v1 06/11] KVM: x86/cpuid: Add a KVM-only leaf for IBS capabilities Manali Shukla
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 22+ messages in thread
From: Manali Shukla @ 2025-06-27 16:25 UTC (permalink / raw)
  To: kvm, linux-perf-users, linux-doc
  Cc: seanjc, pbonzini, nikunj, manali.shukla, bp, peterz, mingo,
	mizhang, thomas.lendacky, ravi.bangoria, Sandipan.Das

From: Santosh Shukla <santosh.shukla@amd.com>

The virtualized IBS (VIBS) feature allows the guest to collect IBS
samples without exiting the guest.

Presence of the VIBS feature is indicated via CPUID function
0x8000000A_EDX[26].
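
As a rough illustration only (not part of this patch), the bit position
above can be tested against a raw EDX value, and the same position maps
onto the word*32+bit encoding used for X86_FEATURE_VIBS in the diff
below. The helper and macro names in this sketch are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* CPUID Fn8000_000A reports SVM features in EDX; VIBS is bit 26. */
#define SVM_FEATURE_LEAF	0x8000000Au
#define VIBS_EDX_BIT		26u

/* Kernel-style encoding: feature = word * 32 + bit (word 15 in the diff). */
#define FEATURE_ID(word, bit)	((word) * 32u + (bit))
#define SKETCH_FEATURE_VIBS	FEATURE_ID(15u, VIBS_EDX_BIT)

/* Hypothetical helper: test the VIBS bit in a raw EDX value. */
static inline bool edx_reports_vibs(uint32_t edx)
{
	return (edx >> VIBS_EDX_BIT) & 1u;
}

/* Split an encoded feature back into its capability word and bit. */
static inline uint32_t feature_word(uint32_t feature)
{
	return feature / 32u;
}

static inline uint32_t feature_bit(uint32_t feature)
{
	return feature % 32u;
}
```

The division and modulo helpers simply undo the encoding, which is why
the new define sits in the same word as X86_FEATURE_VNMI (15*32+25).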

Signed-off-by: Santosh Shukla <santosh.shukla@amd.com>
Signed-off-by: Manali Shukla <manali.shukla@amd.com>
---
 arch/x86/include/asm/cpufeatures.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index d2ad0dd1e8db..32032e2ff961 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -379,6 +379,7 @@
 #define X86_FEATURE_V_SPEC_CTRL		(15*32+20) /* "v_spec_ctrl" Virtual SPEC_CTRL */
 #define X86_FEATURE_VNMI		(15*32+25) /* "vnmi" Virtual NMI */
 #define X86_FEATURE_EXTLVT		(15*32+27) /* Extended Local vector Table */
+#define X86_FEATURE_VIBS		(15*32+26) /* Virtual IBS */
 #define X86_FEATURE_SVME_ADDR_CHK	(15*32+28) /* SVME addr check */
 #define X86_FEATURE_BUS_LOCK_THRESHOLD	(15*32+29) /* Bus lock threshold */
 #define X86_FEATURE_IDLE_HLT		(15*32+30) /* IDLE HLT intercept */
-- 
2.43.0



* [PATCH v1 06/11] KVM: x86/cpuid: Add a KVM-only leaf for IBS capabilities
  2025-06-27 16:25 [PATCH v1 00/11] Implement support for IBS virtualization Manali Shukla
                   ` (4 preceding siblings ...)
  2025-06-27 16:25 ` [PATCH v1 05/11] x86/cpufeatures: Add CPUID feature bit for VIBS in SVM/SEV guests Manali Shukla
@ 2025-06-27 16:25 ` Manali Shukla
  2025-06-27 16:25 ` [PATCH v1 07/11] KVM: x86: Extend CPUID range to include new leaf Manali Shukla
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 22+ messages in thread
From: Manali Shukla @ 2025-06-27 16:25 UTC (permalink / raw)
  To: kvm, linux-perf-users, linux-doc
  Cc: seanjc, pbonzini, nikunj, manali.shukla, bp, peterz, mingo,
	mizhang, thomas.lendacky, ravi.bangoria, Sandipan.Das

Add a KVM-only leaf for AMD's Instruction-Based Sampling (IBS)
capabilities. Thirteen capability bits are added to the KVM-only leaf
so that KVM can advertise them to the guest when the IBS feature bit is
enabled for the guest.
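
For illustration, the bit positions used by the X86_FEATURE_IBS_*
defines in the reverse_cpuid.h hunk of this patch can be collected into
a single EAX mask. The sketch below is not KVM code; the macro and
helper names are hypothetical, and the "intersect with the host value"
step is only an assumption about how such a mask might be applied.

```c
#include <stdint.h>

#define IBS_CAP_BIT(n)	(1u << (n))

/* Bit positions copied from the X86_FEATURE_IBS_* defines in this patch. */
#define IBS_CAPS_KNOWN_MASK (					\
	IBS_CAP_BIT(0)  | IBS_CAP_BIT(1)  | IBS_CAP_BIT(2)  |	\
	IBS_CAP_BIT(3)  | IBS_CAP_BIT(4)  | IBS_CAP_BIT(5)  |	\
	IBS_CAP_BIT(6)  | IBS_CAP_BIT(7)  | IBS_CAP_BIT(8)  |	\
	IBS_CAP_BIT(9)  | IBS_CAP_BIT(11) | IBS_CAP_BIT(12) |	\
	IBS_CAP_BIT(19))

/*
 * Hypothetical helper: keep only the capability bits the host reports
 * and the mask knows about; report nothing if the guest lacks IBS.
 */
static inline uint32_t ibs_caps_for_guest(uint32_t host_eax, int guest_has_ibs)
{
	return guest_has_ibs ? (host_eax & IBS_CAPS_KNOWN_MASK) : 0u;
}

/* Count set bits (Kernighan's method). */
static inline unsigned int ibs_caps_count(uint32_t eax)
{
	unsigned int n = 0;

	for (; eax; eax &= eax - 1)
		n++;
	return n;
}
```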

Signed-off-by: Manali Shukla <manali.shukla@amd.com>
---
 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kvm/reverse_cpuid.h    | 16 ++++++++++++++++
 2 files changed, 17 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 732bac9403b1..8e3d96b6166b 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -770,6 +770,7 @@ enum kvm_only_cpuid_leafs {
 	CPUID_12_EAX	 = NCAPINTS,
 	CPUID_7_1_EDX,
 	CPUID_8000_0007_EDX,
+	CPUID_8000_001B_EAX,
 	CPUID_8000_0022_EAX,
 	CPUID_7_2_EDX,
 	CPUID_24_0_EBX,
diff --git a/arch/x86/kvm/reverse_cpuid.h b/arch/x86/kvm/reverse_cpuid.h
index fde0ae986003..7f685361faa9 100644
--- a/arch/x86/kvm/reverse_cpuid.h
+++ b/arch/x86/kvm/reverse_cpuid.h
@@ -52,6 +52,21 @@
 /* CPUID level 0x80000022 (EAX) */
 #define KVM_X86_FEATURE_PERFMON_V2	KVM_X86_FEATURE(CPUID_8000_0022_EAX, 0)
 
+/* AMD defined Instruction-Based Sampling capabilities. CPUID level 0x8000001B (EAX). */
+#define X86_FEATURE_IBS_AVAIL		KVM_X86_FEATURE(CPUID_8000_001B_EAX, 0)
+#define X86_FEATURE_IBS_FETCHSAM	KVM_X86_FEATURE(CPUID_8000_001B_EAX, 1)
+#define X86_FEATURE_IBS_OPSAM		KVM_X86_FEATURE(CPUID_8000_001B_EAX, 2)
+#define X86_FEATURE_IBS_RDWROPCNT	KVM_X86_FEATURE(CPUID_8000_001B_EAX, 3)
+#define X86_FEATURE_IBS_OPCNT		KVM_X86_FEATURE(CPUID_8000_001B_EAX, 4)
+#define X86_FEATURE_IBS_BRNTRGT		KVM_X86_FEATURE(CPUID_8000_001B_EAX, 5)
+#define X86_FEATURE_IBS_OPCNTEXT	KVM_X86_FEATURE(CPUID_8000_001B_EAX, 6)
+#define X86_FEATURE_IBS_RIPINVALIDCHK	KVM_X86_FEATURE(CPUID_8000_001B_EAX, 7)
+#define X86_FEATURE_IBS_OPBRNFUSE	KVM_X86_FEATURE(CPUID_8000_001B_EAX, 8)
+#define X86_FEATURE_IBS_FETCHCTLEXTD	KVM_X86_FEATURE(CPUID_8000_001B_EAX, 9)
+#define X86_FEATURE_IBS_ZEN4_EXT	KVM_X86_FEATURE(CPUID_8000_001B_EAX, 11)
+#define X86_FEATURE_IBS_LOADLATFIL	KVM_X86_FEATURE(CPUID_8000_001B_EAX, 12)
+#define X86_FEATURE_IBS_DTLBSTAT	KVM_X86_FEATURE(CPUID_8000_001B_EAX, 19)
+
 struct cpuid_reg {
 	u32 function;
 	u32 index;
@@ -82,6 +97,7 @@ static const struct cpuid_reg reverse_cpuid[] = {
 	[CPUID_8000_0022_EAX] = {0x80000022, 0, CPUID_EAX},
 	[CPUID_7_2_EDX]       = {         7, 2, CPUID_EDX},
 	[CPUID_24_0_EBX]      = {      0x24, 0, CPUID_EBX},
+	[CPUID_8000_001B_EAX] = {0x8000001b, 0, CPUID_EAX},
 };
 
 /*
-- 
2.43.0



* [PATCH v1 07/11] KVM: x86: Extend CPUID range to include new leaf
  2025-06-27 16:25 [PATCH v1 00/11] Implement support for IBS virtualization Manali Shukla
                   ` (5 preceding siblings ...)
  2025-06-27 16:25 ` [PATCH v1 06/11] KVM: x86/cpuid: Add a KVM-only leaf for IBS capabilities Manali Shukla
@ 2025-06-27 16:25 ` Manali Shukla
  2025-06-27 16:25 ` [PATCH v1 08/11] KVM: SVM: Extend VMCB area for virtualized IBS registers Manali Shukla
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 22+ messages in thread
From: Manali Shukla @ 2025-06-27 16:25 UTC (permalink / raw)
  To: kvm, linux-perf-users, linux-doc
  Cc: seanjc, pbonzini, nikunj, manali.shukla, bp, peterz, mingo,
	mizhang, thomas.lendacky, ravi.bangoria, Sandipan.Das

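CPUID leaf 0x8000001B (EAX) provides information about Instruction-Based
Sampling (IBS) capabilities on AMD platforms. Handle this leaf in
__do_cpuid_func(): clear EAX when KVM does not support IBS, and always
clear EBX, ECX and EDX.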

Signed-off-by: Manali Shukla <manali.shukla@amd.com>
---
 arch/x86/kvm/cpuid.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 7270d22fbf31..d77184485e26 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -1751,6 +1751,13 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
 		entry->eax = entry->ebx = entry->ecx = 0;
 		entry->edx = 0; /* reserved */
 		break;
+	/* AMD IBS capability */
+	case 0x8000001B:
+		if (!kvm_cpu_cap_has(X86_FEATURE_IBS))
+			entry->eax = 0;
+
+		entry->ebx = entry->ecx = entry->edx = 0;
+		break;
 	case 0x8000001F:
 		if (!kvm_cpu_cap_has(X86_FEATURE_SEV)) {
 			entry->eax = entry->ebx = entry->ecx = entry->edx = 0;
-- 
2.43.0



* [PATCH v1 08/11] KVM: SVM: Extend VMCB area for virtualized IBS registers
  2025-06-27 16:25 [PATCH v1 00/11] Implement support for IBS virtualization Manali Shukla
                   ` (6 preceding siblings ...)
  2025-06-27 16:25 ` [PATCH v1 07/11] KVM: x86: Extend CPUID range to include new leaf Manali Shukla
@ 2025-06-27 16:25 ` Manali Shukla
  2025-07-15  3:13   ` Mi, Dapeng
  2025-06-27 16:25 ` [PATCH v1 09/11] KVM: SVM: Add support for IBS Virtualization Manali Shukla
                   ` (3 subsequent siblings)
  11 siblings, 1 reply; 22+ messages in thread
From: Manali Shukla @ 2025-06-27 16:25 UTC (permalink / raw)
  To: kvm, linux-perf-users, linux-doc
  Cc: seanjc, pbonzini, nikunj, manali.shukla, bp, peterz, mingo,
	mizhang, thomas.lendacky, ravi.bangoria, Sandipan.Das

From: Santosh Shukla <santosh.shukla@amd.com>

Define the new VMCB fields that will be used to save and restore the
state of the following fetch and op IBS related MSRs:

  * MSRC001_1030 [IBS Fetch Control]
  * MSRC001_1031 [IBS Fetch Linear Address]
  * MSRC001_1033 [IBS Execution Control]
  * MSRC001_1034 [IBS Op Logical Address]
  * MSRC001_1035 [IBS Op Data]
  * MSRC001_1036 [IBS Op Data 2]
  * MSRC001_1037 [IBS Op Data 3]
  * MSRC001_1038 [IBS DC Linear Address]
  * MSRC001_103B [IBS Branch Target Address]
  * MSRC001_103C [IBS Fetch Control Extended]
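
The size change in this patch can be cross-checked with a little
arithmetic: the save area previously ended at 0x2e8 (744 bytes), the new
reserved block adds 1168 bytes, and ten 64-bit fields add 80 more, for
744 + 1168 + 80 = 1992. The sketch below models only the tail of the
structure; the struct name, the opaque leading array and the helper
functions are hypothetical stand-ins, and the offsets are derived from
the numbers in this diff rather than quoted from the APM.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for struct vmcb_save_area; only the new tail is modeled. */
struct vmcb_save_area_sketch {
	uint8_t  up_to_spec_ctrl[0x2e8];	/* everything through spec_ctrl */
	uint8_t  reserved_0x2e8[1168];
	uint64_t ibs_fetch_ctl;
	uint64_t ibs_fetch_linear_addr;
	uint64_t ibs_op_ctl;
	uint64_t ibs_op_rip;
	uint64_t ibs_op_data;
	uint64_t ibs_op_data2;
	uint64_t ibs_op_data3;
	uint64_t ibs_dc_linear_addr;
	uint64_t ibs_br_target;
	uint64_t ibs_fetch_extd_ctl;
};

/* Compile-time checks mirroring EXPECTED_VMCB_SAVE_AREA_SIZE. */
_Static_assert(sizeof(struct vmcb_save_area_sketch) == 1992,
	       "save area sketch must be 1992 bytes");
_Static_assert(offsetof(struct vmcb_save_area_sketch, reserved_0x2e8) == 0x2e8,
	       "reserved block must start at 0x2e8");

/* Hypothetical helpers exposing the derived layout at run time. */
static inline size_t ibs_fetch_ctl_offset(void)
{
	return offsetof(struct vmcb_save_area_sketch, ibs_fetch_ctl);
}

static inline size_t save_area_sketch_size(void)
{
	return sizeof(struct vmcb_save_area_sketch);
}
```

Because 0x2e8 + 1168 is a multiple of 8, the 64-bit fields are naturally
aligned in this model and no packing attribute is needed for the check.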

Signed-off-by: Santosh Shukla <santosh.shukla@amd.com>
Signed-off-by: Manali Shukla <manali.shukla@amd.com>
---
 arch/x86/include/asm/svm.h | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index ad954a1a6656..b62049b51ebb 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -356,6 +356,17 @@ struct vmcb_save_area {
 	u64 last_excp_to;
 	u8 reserved_0x298[72];
 	u64 spec_ctrl;		/* Guest version of SPEC_CTRL at 0x2E0 */
+	u8 reserved_0x2e8[1168];
+	u64 ibs_fetch_ctl;
+	u64 ibs_fetch_linear_addr;
+	u64 ibs_op_ctl;
+	u64 ibs_op_rip;
+	u64 ibs_op_data;
+	u64 ibs_op_data2;
+	u64 ibs_op_data3;
+	u64 ibs_dc_linear_addr;
+	u64 ibs_br_target;
+	u64 ibs_fetch_extd_ctl;
 } __packed;
 
 /* Save area definition for SEV-ES and SEV-SNP guests */
@@ -538,7 +549,7 @@ struct vmcb {
 	};
 } __packed;
 
-#define EXPECTED_VMCB_SAVE_AREA_SIZE		744
+#define EXPECTED_VMCB_SAVE_AREA_SIZE		1992
 #define EXPECTED_GHCB_SAVE_AREA_SIZE		1032
 #define EXPECTED_SEV_ES_SAVE_AREA_SIZE		1648
 #define EXPECTED_VMCB_CONTROL_AREA_SIZE		1024
@@ -564,6 +575,7 @@ static inline void __unused_size_checks(void)
 	BUILD_BUG_RESERVED_OFFSET(vmcb_save_area, 0x180);
 	BUILD_BUG_RESERVED_OFFSET(vmcb_save_area, 0x248);
 	BUILD_BUG_RESERVED_OFFSET(vmcb_save_area, 0x298);
+	BUILD_BUG_RESERVED_OFFSET(vmcb_save_area, 0x2e8);
 
 	BUILD_BUG_RESERVED_OFFSET(sev_es_save_area, 0xc8);
 	BUILD_BUG_RESERVED_OFFSET(sev_es_save_area, 0xcc);
-- 
2.43.0



* [PATCH v1 09/11] KVM: SVM: Add support for IBS Virtualization
  2025-06-27 16:25 [PATCH v1 00/11] Implement support for IBS virtualization Manali Shukla
                   ` (7 preceding siblings ...)
  2025-06-27 16:25 ` [PATCH v1 08/11] KVM: SVM: Extend VMCB area for virtualized IBS registers Manali Shukla
@ 2025-06-27 16:25 ` Manali Shukla
  2025-06-27 16:25 ` [PATCH v1 10/11] perf/x86/amd: Enable VPMU passthrough capability for IBS PMU Manali Shukla
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 22+ messages in thread
From: Manali Shukla @ 2025-06-27 16:25 UTC (permalink / raw)
  To: kvm, linux-perf-users, linux-doc
  Cc: seanjc, pbonzini, nikunj, manali.shukla, bp, peterz, mingo,
	mizhang, thomas.lendacky, ravi.bangoria, Sandipan.Das

From: Santosh Shukla <santosh.shukla@amd.com>

IBS virtualization (VIBS) allows a guest to collect Instruction-Based
Sampling (IBS) data using hardware-assisted virtualization. With VIBS
enabled, the hardware automatically saves and restores guest IBS state
during VM-Entry and VM-Exit via the VMCB State Save Area.

IBS-generated interrupts are delivered directly to the guest without
causing a VMEXIT.

VIBS depends on mediated PMU mode and requires either AVIC or NMI
virtualization for interrupt delivery. However, since AVIC can be
dynamically inhibited, VIBS requires VNMI to be enabled to ensure
reliable interrupt delivery. If AVIC is inhibited and VNMI is
disabled, the guest can encounter a VMEXIT_INVALID when IBS
virtualization is enabled for the guest.
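
The dependency chain described above can be summarized as two
predicates: one evaluated once at module setup, one evaluated per vCPU.
The following is a simplified model of the checks in the diff below, not
the KVM code itself; the struct and function names are hypothetical, and
only VIRTUAL_IBS_ENABLE_MASK is taken from this patch.

```c
#include <stdbool.h>
#include <stdint.h>

#define VIRTUAL_IBS_ENABLE_MASK	(1ULL << 2)

/* Hypothetical snapshot of the host-side prerequisites. */
struct vibs_host_sketch {
	bool mediated_pmu;	/* enable_mediated_pmu module parameter */
	bool vnmi;		/* vnmi module parameter */
	bool vibs_param;	/* vibs module parameter */
	bool cpu_has_vibs;	/* CPUID 0x8000000A EDX[26] */
};

/* Setup-time check: every prerequisite must hold. */
static inline bool vibs_supported(const struct vibs_host_sketch *h)
{
	return h->mediated_pmu && h->vnmi && h->vibs_param && h->cpu_has_vibs;
}

/*
 * Per-vCPU check: set the VIBS enable bit in virt_ext only when the
 * guest CPUID exposes IBS and the vCPU uses the mediated PMU.
 */
static inline uint64_t vibs_update_virt_ext(uint64_t virt_ext, bool vibs,
					    bool guest_has_ibs,
					    bool guest_mediated_pmu)
{
	if (vibs && guest_has_ibs && guest_mediated_pmu)
		virt_ext |= VIRTUAL_IBS_ENABLE_MASK;
	return virt_ext;
}
```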

Because IBS state is classified as swap type C, the hypervisor must
save its own IBS state before VMRUN and restore it after VMEXIT. It
must also disable IBS before VMRUN and re-enable it afterward. This
will be handled using mediated PMU support in subsequent patches by
enabling mediated PMU capability for IBS PMUs.
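
To make the swap type C requirement concrete, here is a toy model of
the save/disable and restore steps on plain variables. It is not how
this series implements the swap (that is left to the mediated PMU
scheduling host events out and back in); the struct and helpers are
hypothetical, and the enable bit positions are assumed to match
IBS_FETCH_ENABLE and IBS_OP_ENABLE in Linux's asm/perf_event.h.

```c
#include <stdint.h>

/* Assumed enable bits for IBSFETCHCTL and IBSOPCTL. */
#define IBS_FETCH_ENABLE	(1ULL << 48)
#define IBS_OP_ENABLE		(1ULL << 17)

/* Simulated control registers; real code would use rdmsr/wrmsr. */
struct ibs_regs_sketch {
	uint64_t fetch_ctl;
	uint64_t op_ctl;
};

/* Before VMRUN: remember the host values, then clear the enable bits. */
static inline struct ibs_regs_sketch
host_ibs_save_and_disable(struct ibs_regs_sketch *hw)
{
	struct ibs_regs_sketch saved = *hw;

	hw->fetch_ctl &= ~IBS_FETCH_ENABLE;
	hw->op_ctl &= ~IBS_OP_ENABLE;
	return saved;
}

/* After VMEXIT: put the host values back, which also re-enables IBS. */
static inline void host_ibs_restore(struct ibs_regs_sketch *hw,
				    const struct ibs_regs_sketch *saved)
{
	*hw = *saved;
}
```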

More details about IBS virtualization can be found at [1].

[1]: https://bugzilla.kernel.org/attachment.cgi?id=306250
     AMD64 Architecture Programmer’s Manual, Vol 2, Section 15.38
     Instruction-Based Sampling Virtualization.

Signed-off-by: Santosh Shukla <santosh.shukla@amd.com>
Co-developed-by: Manali Shukla <manali.shukla@amd.com>
Signed-off-by: Manali Shukla <manali.shukla@amd.com>
---
 arch/x86/include/asm/svm.h |  2 +
 arch/x86/kvm/svm/svm.c     | 94 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 96 insertions(+)

diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index b62049b51ebb..1df51cf19ba9 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -222,6 +222,8 @@ struct __attribute__ ((__packed__)) vmcb_control_area {
 #define LBR_CTL_ENABLE_MASK BIT_ULL(0)
 #define VIRTUAL_VMLOAD_VMSAVE_ENABLE_MASK BIT_ULL(1)
 
+#define VIRTUAL_IBS_ENABLE_MASK BIT_ULL(2)
+
 #define SVM_INTERRUPT_SHADOW_MASK	BIT_ULL(0)
 #define SVM_GUEST_INTERRUPT_MASK	BIT_ULL(1)
 
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index f9a7ff37ea10..9340d3d3d1fe 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -154,6 +154,10 @@ module_param(vgif, int, 0444);
 int lbrv = true;
 module_param(lbrv, int, 0444);
 
+/* enable/disable IBS virtualization */
+static int vibs = true;
+module_param(vibs, int, 0444);
+
 static int tsc_scaling = true;
 module_param(tsc_scaling, int, 0444);
 
@@ -954,6 +958,20 @@ void disable_nmi_singlestep(struct vcpu_svm *svm)
 	}
 }
 
+static void svm_ibs_msr_interception(struct vcpu_svm *svm, bool intercept)
+{
+	svm_set_intercept_for_msr(&svm->vcpu, MSR_AMD64_IBSFETCHCTL, MSR_TYPE_RW, intercept);
+	svm_set_intercept_for_msr(&svm->vcpu, MSR_AMD64_IBSFETCHLINAD, MSR_TYPE_RW, intercept);
+	svm_set_intercept_for_msr(&svm->vcpu, MSR_AMD64_IBSOPCTL, MSR_TYPE_RW, intercept);
+	svm_set_intercept_for_msr(&svm->vcpu, MSR_AMD64_IBSOPRIP, MSR_TYPE_RW, intercept);
+	svm_set_intercept_for_msr(&svm->vcpu, MSR_AMD64_IBSOPDATA, MSR_TYPE_RW, intercept);
+	svm_set_intercept_for_msr(&svm->vcpu, MSR_AMD64_IBSOPDATA2, MSR_TYPE_RW, intercept);
+	svm_set_intercept_for_msr(&svm->vcpu, MSR_AMD64_IBSOPDATA3, MSR_TYPE_RW, intercept);
+	svm_set_intercept_for_msr(&svm->vcpu, MSR_AMD64_IBSDCLINAD, MSR_TYPE_RW, intercept);
+	svm_set_intercept_for_msr(&svm->vcpu, MSR_AMD64_IBSBRTARGET, MSR_TYPE_RW, intercept);
+	svm_set_intercept_for_msr(&svm->vcpu, MSR_AMD64_ICIBSEXTDCTL, MSR_TYPE_RW, intercept);
+}
+
 static void grow_ple_window(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
@@ -1095,6 +1113,20 @@ static void svm_recalc_instruction_intercepts(struct kvm_vcpu *vcpu)
 			svm_clr_intercept(svm, INTERCEPT_VMSAVE);
 			svm->vmcb->control.virt_ext |= VIRTUAL_VMLOAD_VMSAVE_ENABLE_MASK;
 		}
+
+		/*
+		 * If hardware supports VIBS then no need to intercept IBS MSRs
+		 * when VIBS is enabled in guest.
+		 *
+		 * Enable VIBS by setting bit 2 at offset 0xb8 in VMCB.
+		 */
+		if (vibs) {
+			if (guest_cpu_cap_has(&svm->vcpu, X86_FEATURE_IBS) &&
+			    kvm_mediated_pmu_enabled(vcpu)) {
+				svm_ibs_msr_interception(svm, false);
+				svm->vmcb->control.virt_ext |= VIRTUAL_IBS_ENABLE_MASK;
+			}
+		}
 	}
 }
 
@@ -2871,6 +2903,27 @@ static int svm_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	case MSR_AMD64_DE_CFG:
 		msr_info->data = svm->msr_decfg;
 		break;
+
+	case MSR_AMD64_IBSCTL:
+		if (guest_cpu_cap_has(vcpu, X86_FEATURE_IBS))
+			msr_info->data = IBSCTL_LVT_OFFSET_VALID;
+		else
+			msr_info->data = 0;
+		break;
+
+
+	/*
+	 * When IBS virtualization is enabled, guest reads from
+	 * MSR_AMD64_IBSFETCHPHYSAD and MSR_AMD64_IBSDCPHYSAD must return 0.
+	 * This is done for security reasons, as guests should not be allowed to
+	 * access or infer any information about the system's physical
+	 * addresses.
+	 */
+	case MSR_AMD64_IBSDCPHYSAD:
+	case MSR_AMD64_IBSFETCHPHYSAD:
+		msr_info->data = 0;
+		break;
+
 	default:
 		return kvm_get_msr_common(vcpu, msr_info);
 	}
@@ -3115,6 +3168,16 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
 		svm->msr_decfg = data;
 		break;
 	}
+	/*
+	 * When IBS virtualization is enabled, guest writes to
+	 * MSR_AMD64_IBSFETCHPHYSAD and MSR_AMD64_IBSDCPHYSAD must be ignored.
+	 * This is done for security reasons, as guests should not be allowed to
+	 * access or infer any information about the system's physical
+	 * addresses.
+	 */
+	case MSR_AMD64_IBSDCPHYSAD:
+	case MSR_AMD64_IBSFETCHPHYSAD:
+		return 1;
 	default:
 		return kvm_set_msr_common(vcpu, msr);
 	}
@@ -5248,6 +5311,28 @@ static __init void svm_adjust_mmio_mask(void)
 	kvm_mmu_set_mmio_spte_mask(mask, mask, PT_WRITABLE_MASK | PT_USER_MASK);
 }
 
+static void svm_ibs_set_cpu_caps(void)
+{
+	kvm_cpu_cap_check_and_set(X86_FEATURE_IBS);
+	kvm_cpu_cap_check_and_set(X86_FEATURE_EXTLVT);
+	kvm_cpu_cap_check_and_set(X86_FEATURE_EXTAPIC);
+	if (kvm_cpu_cap_has(X86_FEATURE_IBS)) {
+		kvm_cpu_cap_check_and_set(X86_FEATURE_IBS_AVAIL);
+		kvm_cpu_cap_check_and_set(X86_FEATURE_IBS_FETCHSAM);
+		kvm_cpu_cap_check_and_set(X86_FEATURE_IBS_OPSAM);
+		kvm_cpu_cap_check_and_set(X86_FEATURE_IBS_RDWROPCNT);
+		kvm_cpu_cap_check_and_set(X86_FEATURE_IBS_OPCNT);
+		kvm_cpu_cap_check_and_set(X86_FEATURE_IBS_BRNTRGT);
+		kvm_cpu_cap_check_and_set(X86_FEATURE_IBS_OPCNTEXT);
+		kvm_cpu_cap_check_and_set(X86_FEATURE_IBS_RIPINVALIDCHK);
+		kvm_cpu_cap_check_and_set(X86_FEATURE_IBS_OPBRNFUSE);
+		kvm_cpu_cap_check_and_set(X86_FEATURE_IBS_FETCHCTLEXTD);
+		kvm_cpu_cap_check_and_set(X86_FEATURE_IBS_ZEN4_EXT);
+		kvm_cpu_cap_check_and_set(X86_FEATURE_IBS_LOADLATFIL);
+		kvm_cpu_cap_check_and_set(X86_FEATURE_IBS_DTLBSTAT);
+	}
+}
+
 static __init void svm_set_cpu_caps(void)
 {
 	kvm_set_cpu_caps();
@@ -5300,6 +5385,9 @@ static __init void svm_set_cpu_caps(void)
 	if (cpu_feature_enabled(X86_FEATURE_BUS_LOCK_THRESHOLD))
 		kvm_caps.has_bus_lock_exit = true;
 
+	if (vibs)
+		svm_ibs_set_cpu_caps();
+
 	/* CPUID 0x80000008 */
 	if (boot_cpu_has(X86_FEATURE_LS_CFG_SSBD) ||
 	    boot_cpu_has(X86_FEATURE_AMD_SSBD))
@@ -5472,6 +5560,12 @@ static __init int svm_hardware_setup(void)
 		svm_x86_ops.set_vnmi_pending = NULL;
 	}
 
+	vibs = enable_mediated_pmu && vnmi && vibs
+		&& boot_cpu_has(X86_FEATURE_VIBS);
+
+	if (vibs)
+		pr_info("IBS virtualization supported\n");
+
 	if (!enable_pmu)
 		pr_info("PMU virtualization is disabled\n");
 
-- 
2.43.0



* [PATCH v1 10/11] perf/x86/amd: Enable VPMU passthrough capability for IBS PMU
  2025-06-27 16:25 [PATCH v1 00/11] Implement support for IBS virtualization Manali Shukla
                   ` (8 preceding siblings ...)
  2025-06-27 16:25 ` [PATCH v1 09/11] KVM: SVM: Add support for IBS Virtualization Manali Shukla
@ 2025-06-27 16:25 ` Manali Shukla
  2025-06-27 16:25 ` [PATCH v1 11/11] perf/x86/amd: Remove exclude_guest check from perf_ibs_init() Manali Shukla
  2025-07-14 11:51 ` [PATCH v1 00/11] Implement support for IBS virtualization Manali Shukla
  11 siblings, 0 replies; 22+ messages in thread
From: Manali Shukla @ 2025-06-27 16:25 UTC (permalink / raw)
  To: kvm, linux-perf-users, linux-doc
  Cc: seanjc, pbonzini, nikunj, manali.shukla, bp, peterz, mingo,
	mizhang, thomas.lendacky, ravi.bangoria, Sandipan.Das

IBS MSRs are classified as Swap Type C, which requires the hypervisor
to save and restore its own IBS state before VMENTRY and after VMEXIT.

To support this, set the ibs_op and ibs_fetch PMUs with the
PERF_PMU_CAP_MEDIATED_VPMU capability. This ensures that these PMUs are
exclusively owned by the guest while it is running, allowing the
hypervisor to manage IBS state transitions correctly.

Signed-off-by: Manali Shukla <manali.shukla@amd.com>
---
 arch/x86/events/amd/ibs.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/x86/events/amd/ibs.c b/arch/x86/events/amd/ibs.c
index c998f68eeddc..00c36ce16957 100644
--- a/arch/x86/events/amd/ibs.c
+++ b/arch/x86/events/amd/ibs.c
@@ -792,6 +792,7 @@ static struct perf_ibs perf_ibs_fetch = {
 		.stop		= perf_ibs_stop,
 		.read		= perf_ibs_read,
 		.check_period	= perf_ibs_check_period,
+		.capabilities	= PERF_PMU_CAP_MEDIATED_VPMU,
 	},
 	.msr			= MSR_AMD64_IBSFETCHCTL,
 	.config_mask		= IBS_FETCH_MAX_CNT | IBS_FETCH_RAND_EN,
@@ -817,6 +818,7 @@ static struct perf_ibs perf_ibs_op = {
 		.stop		= perf_ibs_stop,
 		.read		= perf_ibs_read,
 		.check_period	= perf_ibs_check_period,
+		.capabilities	= PERF_PMU_CAP_MEDIATED_VPMU,
 	},
 	.msr			= MSR_AMD64_IBSOPCTL,
 	.config_mask		= IBS_OP_MAX_CNT,
-- 
2.43.0



* [PATCH v1 11/11] perf/x86/amd: Remove exclude_guest check from perf_ibs_init()
  2025-06-27 16:25 [PATCH v1 00/11] Implement support for IBS virtualization Manali Shukla
                   ` (9 preceding siblings ...)
  2025-06-27 16:25 ` [PATCH v1 10/11] perf/x86/amd: Enable VPMU passthrough capability for IBS PMU Manali Shukla
@ 2025-06-27 16:25 ` Manali Shukla
  2025-07-14 11:51 ` [PATCH v1 00/11] Implement support for IBS virtualization Manali Shukla
  11 siblings, 0 replies; 22+ messages in thread
From: Manali Shukla @ 2025-06-27 16:25 UTC (permalink / raw)
  To: kvm, linux-perf-users, linux-doc
  Cc: seanjc, pbonzini, nikunj, manali.shukla, bp, peterz, mingo,
	mizhang, thomas.lendacky, ravi.bangoria, Sandipan.Das

Currently, the IBS driver doesn't allow the creation of an IBS event with
exclude_guest set. As a result, perf_ibs_init() returns -EINVAL if an
IBS event is created with exclude_guest set.

With the introduction of mediated PMU support, software-based handling
of exclude_guest is permitted for PMUs that have the
PERF_PMU_CAP_MEDIATED_VPMU capability.

Since the ibs_op and ibs_fetch PMUs have the PERF_PMU_CAP_MEDIATED_VPMU
capability set, update perf_ibs_init() to remove the exclude_guest check.

Signed-off-by: Manali Shukla <manali.shukla@amd.com>
---
 arch/x86/events/amd/ibs.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/x86/events/amd/ibs.c b/arch/x86/events/amd/ibs.c
index 00c36ce16957..35dc5a578778 100644
--- a/arch/x86/events/amd/ibs.c
+++ b/arch/x86/events/amd/ibs.c
@@ -299,8 +299,7 @@ static int perf_ibs_init(struct perf_event *event)
 		return -EOPNOTSUPP;
 
 	/* handle exclude_{user,kernel} in the IRQ handler */
-	if (event->attr.exclude_host || event->attr.exclude_guest ||
-	    event->attr.exclude_idle)
+	if (event->attr.exclude_host || event->attr.exclude_idle)
 		return -EINVAL;
 
 	if (!(event->attr.config2 & IBS_SW_FILTER_MASK) &&
-- 
2.43.0



* Re: [PATCH v1 00/11] Implement support for IBS virtualization
  2025-06-27 16:25 [PATCH v1 00/11] Implement support for IBS virtualization Manali Shukla
                   ` (10 preceding siblings ...)
  2025-06-27 16:25 ` [PATCH v1 11/11] perf/x86/amd: Remove exclude_guest check from perf_ibs_init() Manali Shukla
@ 2025-07-14 11:51 ` Manali Shukla
  11 siblings, 0 replies; 22+ messages in thread
From: Manali Shukla @ 2025-07-14 11:51 UTC (permalink / raw)
  To: kvm, linux-perf-users, linux-doc
  Cc: seanjc, pbonzini, nikunj, bp, peterz, mingo, mizhang,
	thomas.lendacky, ravi.bangoria, Sandipan.Das

On 6/27/2025 9:55 PM, Manali Shukla wrote:
> Add support for IBS virtualization (VIBS). VIBS feature allows the
> guest to collect IBS samples without exiting the guest.  There are
> 2 parts to it [1].
>  - Virtualizing the IBS register state.
>  - Ensuring the IBS interrupt is handled in the guest without exiting
>   to the hypervisor.
> 
> To deliver virtualized IBS interrupts to the guest, VIBS requires either
> AVIC or Virtual NMI (VNMI) support [1]. During IBS sampling, the
> hardware signals a VNMI. The source of this VNMI depends on the AVIC
> configuration:
> 
>  - With AVIC disabled, the virtual NMI is hardware-accelerated.
>  - With AVIC enabled, the virtual NMI is delivered via AVIC using Extended LVT.
> 
> The local interrupts are extended to include more LVT registers, to
> allow additional interrupt sources, like instruction based sampling
> etc. [3].
> 
> Although IBS virtualization requires either AVIC or VNMI to be enabled
> in order to successfully deliver IBS NMIs to the guest, VNMI must be
> enabled to ensure reliable delivery. This requirement stems from the
> dynamic behavior of AVIC. While a guest is launched with AVIC enabled,
> AVIC can be inhibited at runtime. When AVIC is inhibited and VNMI is
> disabled, there is no mechanism to deliver IBS NMIs to the guest.
> Therefore, enabling VNMI is necessary to support IBS virtualization
> reliably.
> 
> Note that, since IBS registers are swap type C [2], the hypervisor is
> responsible for saving and restoring the host IBS state. The hypervisor
> needs to disable host IBS before saving the state and entering the guest.
> After a guest exit, the hypervisor needs to restore the host IBS state
> and re-enable IBS.
> 
> The mediated PMU has the capability to save the host context when
> entering the guest by scheduling out all exclude_guest events, and to
> restore the host context when exiting the guest by scheduling in the
> previously scheduled-out events. This behavior aligns with the
> requirement for IBS registers being of swap type C. Therefore, the
> mediated PMU design can be leveraged to implement IBS virtualization.
> As a result, enabling the mediated PMU is a necessary requirement for
> IBS virtualization.
> 
> The initial version of this series has been posted here:
> https://lore.kernel.org/kvm/f98687e0-1fee-8208-261f-d93152871f00@amd.com/
> 
> Since then, the mediated PMU patches [5] have matured significantly.
> This series is a resurrection of previous VIBS series and leverages the
> mediated PMU infrastructure to enable IBS virtualization.
> 
> How to enable VIBS?
> ----------------------------------------------
> sudo echo 0 | sudo tee /proc/sys/kernel/nmi_watchdog
> sudo modprobe -r kvm_amd
> sudo modprobe kvm_amd enable_mediated_pmu=1 vnmi=1
> 
> Qemu changes can be found at below location:
> ----------------------------------------------
> https://github.com/AMDESE/qemu/tree/vibs_v1
> 
> Qemu commandline to enable IBS virtualization:
> ------------------------------------------------
> qemu-system-x86_64 -enable-kvm -cpu EPYC-Genoa,+ibs,+extlvt,+extapic,+svm,+pmu \ ..
> 
> Testing done:
> ------------------------------------------------
> - Following tests were executed on guest
>   sudo perf record -e ibs_op// -c 100000 -a
>   sudo perf record -e ibs_op// -c 100000 -C 10
>   sudo perf record -e ibs_op/cnt_ctl=1/ -c 100000 -a
>   sudo perf record -e ibs_op/cnt_ctl=1/ -c 100000 -a --raw-samples
>   sudo perf record -e ibs_op/cnt_ctl=1,l3missonly=1/ -c 100000 -a
>   sudo perf record -e ibs_op/cnt_ctl=1/ -c 100000 -p 1234
>   sudo perf record -e ibs_op/cnt_ctl=1/ -c 100000 -- ls
>   sudo ./tools/perf/perf record -e ibs_op// -e ibs_fetch// -a --raw-samples -c 100000
>   sudo perf report
>   sudo perf script
>   sudo perf report -D | grep -P "LdOp 1.*StOp 0" | wc -l
>   sudo perf report -D | grep -P "LdOp 1.*StOp 0.*DcMiss 1" | wc -l
>   sudo perf report -D | grep -P "LdOp 1.*StOp 0.*DcMiss 1.*L2Miss 1" | wc -l
>   sudo perf report -D | grep -B1 -P "LdOp 1.*StOp 0.*DcMiss 1.*L2Miss 1" | grep -P "DataSrc ([02-9]|1[0-2])=" | wc -l
> - perf_fuzzer was run for 3hrs, no softlockups or unknown NMIs were
>   seen.
> 
> TO-DO: 
> -----------------------------------
> Enable IBS virtualization on SEV-ES and SEV-SNP guests.
> 
> base-commit (61374cc145f4) + [4] (Clean up KVM's MSR interception code)
> + [5] (Mediated vPMU 4.0 for x86). 
> 
> [1]: https://bugzilla.kernel.org/attachment.cgi?id=306250
>      AMD64 Architecture Programmer’s Manual, Vol 2, Section 15.38
>      Instruction-Based Sampling Virtualization.
> 
> [2]: https://bugzilla.kernel.org/attachment.cgi?id=306250
>      AMD64 Architecture Programmer’s Manual, Vol 2, Appendix B Layout
>      of VMCB, Table B-3 Swap Types.
> 
> [3]: https://bugzilla.kernel.org/attachment.cgi?id=306250
>      AMD64 Architecture Programmer’s Manual, Vol 2, Section 16.4.5
>      Extended Interrupts.
> 
> [4]: https://lore.kernel.org/kvm/20250610225737.156318-1-seanjc@google.com/
> 
> [5]: https://lore.kernel.org/kvm/20250324173121.1275209-1-mizhang@google.com/
> 
> Manali Shukla (6):
>   perf/amd/ibs: Fix race condition in IBS
>   KVM: Add KVM_GET_LAPIC_W_EXTAPIC and KVM_SET_LAPIC_W_EXTAPIC for
>     extapic
>   KVM: x86/cpuid: Add a KVM-only leaf for IBS capabilities
>   KVM: x86: Extend CPUID range to include new leaf
>   perf/x86/amd: Enable VPMU passthrough capability for IBS PMU
>   perf/x86/amd: Remove exclude_guest check from perf_ibs_init()
> 
> Santosh Shukla (5):
>   x86/cpufeatures: Add CPUID feature bit for Extended LVT
>   KVM: x86: Add emulation support for Extented LVT registers
>   x86/cpufeatures: Add CPUID feature bit for VIBS in SVM/SEV guests
>   KVM: SVM: Extend VMCB area for virtualized IBS registers
>   KVM: SVM: Add support for IBS Virtualization
> 
>  Documentation/virt/kvm/api.rst     | 23 +++++++
>  arch/x86/events/amd/ibs.c          |  8 ++-
>  arch/x86/include/asm/apicdef.h     | 17 ++++++
>  arch/x86/include/asm/cpufeatures.h |  2 +
>  arch/x86/include/asm/kvm_host.h    |  1 +
>  arch/x86/include/asm/svm.h         | 16 ++++-
>  arch/x86/include/uapi/asm/kvm.h    |  5 ++
>  arch/x86/kvm/cpuid.c               | 13 ++++
>  arch/x86/kvm/lapic.c               | 81 ++++++++++++++++++++++---
>  arch/x86/kvm/lapic.h               |  7 ++-
>  arch/x86/kvm/reverse_cpuid.h       | 16 +++++
>  arch/x86/kvm/svm/avic.c            |  4 ++
>  arch/x86/kvm/svm/svm.c             | 96 ++++++++++++++++++++++++++++++
>  arch/x86/kvm/x86.c                 | 37 ++++++++----
>  include/uapi/linux/kvm.h           | 10 ++++
>  15 files changed, 313 insertions(+), 23 deletions(-)
> 
> 
> base-commit: 61374cc145f4a56377eaf87c7409a97ec7a34041


A gentle reminder for the review.

-Manali


* Re: [PATCH v1 02/11] KVM: Add KVM_GET_LAPIC_W_EXTAPIC and KVM_SET_LAPIC_W_EXTAPIC for extapic
  2025-06-27 16:25 ` [PATCH v1 02/11] KVM: Add KVM_GET_LAPIC_W_EXTAPIC and KVM_SET_LAPIC_W_EXTAPIC for extapic Manali Shukla
@ 2025-07-15  2:21   ` Mi, Dapeng
  2025-07-16  7:45     ` Manali Shukla
  0 siblings, 1 reply; 22+ messages in thread
From: Mi, Dapeng @ 2025-07-15  2:21 UTC (permalink / raw)
  To: Manali Shukla, kvm, linux-perf-users, linux-doc
  Cc: seanjc, pbonzini, nikunj, bp, peterz, mingo, mizhang,
	thomas.lendacky, ravi.bangoria, Sandipan.Das


On 6/28/2025 12:25 AM, Manali Shukla wrote:
> Modern AMD processors expose four additional extended LVT registers in
> the extended APIC register space, which can be used for additional
> interrupt sources such as instruction-based sampling and others.
>
> To support this, introduce two new vCPU-based IOCTLs:
> KVM_GET_LAPIC_W_EXTAPIC and KVM_SET_LAPIC_W_EXTAPIC. These IOCTLs works
> similarly to KVM_GET_LAPIC and KVM_SET_LAPIC, but operate on APIC page
> with extended APIC register space located at APIC offsets 400h-530h.
>
> These IOCTLs are intended for use when extended APIC support is
> enabled in the guest. They allow saving and restoring the full APIC
> page, including the extended registers.
>
> To support this, the `struct kvm_lapic_state_w_extapic` has been made
> extensible rather than hardcoding its size, improving forward
> compatibility.
>
> Documentation for the new IOCTLs has also been added.
>
> For more details on the extended APIC space, refer to AMD Programmer’s
> Manual Volume 2, Section 16.4.5: Extended Interrupts.
> https://bugzilla.kernel.org/attachment.cgi?id=306250
>
> Signed-off-by: Manali Shukla <manali.shukla@amd.com>
> ---
>  Documentation/virt/kvm/api.rst  | 23 ++++++++++++++++++++
>  arch/x86/include/uapi/asm/kvm.h |  5 +++++
>  arch/x86/kvm/lapic.c            | 12 ++++++-----
>  arch/x86/kvm/lapic.h            |  6 ++++--
>  arch/x86/kvm/x86.c              | 37 ++++++++++++++++++++++++---------
>  include/uapi/linux/kvm.h        | 10 +++++++++
>  6 files changed, 76 insertions(+), 17 deletions(-)
>
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index 1bd2d42e6424..0ca11d43f833 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -2041,6 +2041,18 @@ error.
>  Reads the Local APIC registers and copies them into the input argument.  The
>  data format and layout are the same as documented in the architecture manual.
>  
> +::
> +
> +  #define KVM_APIC_EXT_REG_SIZE 0x540
> +  struct kvm_lapic_state_w_extapic {
> +	__DECLARE_FLEX_ARRAY(__u8, regs);
> +  };
> +
> +Applications should use KVM_GET_LAPIC_W_EXTAPIC ioctl if extended APIC is
> +enabled. KVM_GET_LAPIC_W_EXTAPIC reads Local APIC registers with extended
> +APIC register space located at offsets 400h-530h and copies them into input
> +argument.
> +
>  If KVM_X2APIC_API_USE_32BIT_IDS feature of KVM_CAP_X2APIC_API is
>  enabled, then the format of APIC_ID register depends on the APIC mode
>  (reported by MSR_IA32_APICBASE) of its VCPU.  x2APIC stores APIC ID in
> @@ -2072,6 +2084,17 @@ always uses xAPIC format.
>  Copies the input argument into the Local APIC registers.  The data format
>  and layout are the same as documented in the architecture manual.
>  
> +::
> +
> +  #define KVM_APIC_EXT_REG_SIZE 0x540
> +  struct kvm_lapic_state_w_extapic {
> +	__DECLARE_FLEX_ARRAY(__u8, regs);
> +  };
> +
> +Applications should use KVM_SET_LAPIC_W_EXTAPIC ioctl if extended APIC is enabled.
> +KVM_SET_LAPIC_W_EXTAPIC copies input arguments with extended APIC register into
> +Local APIC and extended APIC registers.
> +
>  The format of the APIC ID register (bytes 32-35 of struct kvm_lapic_state's
>  regs field) depends on the state of the KVM_CAP_X2APIC_API capability.
>  See the note in KVM_GET_LAPIC.
> diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
> index 6f3499507c5e..91c3c5b8cae3 100644
> --- a/arch/x86/include/uapi/asm/kvm.h
> +++ b/arch/x86/include/uapi/asm/kvm.h
> @@ -124,6 +124,11 @@ struct kvm_lapic_state {
>  	char regs[KVM_APIC_REG_SIZE];
>  };
>  
> +#define KVM_APIC_EXT_REG_SIZE 0x540
> +struct kvm_lapic_state_w_extapic {
> +	__DECLARE_FLEX_ARRAY(__u8, regs);
> +};

The name "kvm_lapic_state_w_extapic" seems a little bit too long, maybe
"kvm_ext_lapic_state" is enough?


> +
>  struct kvm_segment {
>  	__u64 base;
>  	__u32 limit;
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 73418dc0ebb2..00ca2b0faa45 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -3046,7 +3046,7 @@ void kvm_apic_ack_interrupt(struct kvm_vcpu *vcpu, int vector)
>  EXPORT_SYMBOL_GPL(kvm_apic_ack_interrupt);
>  
>  static int kvm_apic_state_fixup(struct kvm_vcpu *vcpu,
> -		struct kvm_lapic_state *s, bool set)
> +		struct kvm_lapic_state_w_extapic *s, bool set)
>  {
>  	if (apic_x2apic_mode(vcpu->arch.apic)) {
>  		u32 x2apic_id = kvm_x2apic_id(vcpu->arch.apic);
> @@ -3097,9 +3097,10 @@ static int kvm_apic_state_fixup(struct kvm_vcpu *vcpu,
>  	return 0;
>  }
>  
> -int kvm_apic_get_state(struct kvm_vcpu *vcpu, struct kvm_lapic_state *s)
> +int kvm_apic_get_state(struct kvm_vcpu *vcpu, struct kvm_lapic_state_w_extapic *s,
> +		       unsigned int size)
>  {
> -	memcpy(s->regs, vcpu->arch.apic->regs, sizeof(*s));
> +	memcpy(s->regs, vcpu->arch.apic->regs, size);
>  
>  	/*
>  	 * Get calculated timer current count for remaining timer period (if
> @@ -3111,7 +3112,8 @@ int kvm_apic_get_state(struct kvm_vcpu *vcpu, struct kvm_lapic_state *s)
>  	return kvm_apic_state_fixup(vcpu, s, false);
>  }
>  
> -int kvm_apic_set_state(struct kvm_vcpu *vcpu, struct kvm_lapic_state *s)
> +int kvm_apic_set_state(struct kvm_vcpu *vcpu, struct kvm_lapic_state_w_extapic *s,
> +		       unsigned int size)
>  {
>  	struct kvm_lapic *apic = vcpu->arch.apic;
>  	int r;
> @@ -3126,7 +3128,7 @@ int kvm_apic_set_state(struct kvm_vcpu *vcpu, struct kvm_lapic_state *s)
>  		kvm_recalculate_apic_map(vcpu->kvm);
>  		return r;
>  	}
> -	memcpy(vcpu->arch.apic->regs, s->regs, sizeof(*s));
> +	memcpy(vcpu->arch.apic->regs, s->regs, size);
>  
>  	atomic_set_release(&apic->vcpu->kvm->arch.apic_map_dirty, DIRTY);
>  	kvm_recalculate_apic_map(vcpu->kvm);
> diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
> index 4518b4e0552f..7ad946b3738d 100644
> --- a/arch/x86/kvm/lapic.h
> +++ b/arch/x86/kvm/lapic.h
> @@ -120,9 +120,11 @@ bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, struct kvm_lapic *src,
>  void kvm_apic_send_ipi(struct kvm_lapic *apic, u32 icr_low, u32 icr_high);
>  
>  int kvm_apic_set_base(struct kvm_vcpu *vcpu, u64 value, bool host_initiated);
> -int kvm_apic_get_state(struct kvm_vcpu *vcpu, struct kvm_lapic_state *s);
> -int kvm_apic_set_state(struct kvm_vcpu *vcpu, struct kvm_lapic_state *s);
>  void kvm_apic_update_hwapic_isr(struct kvm_vcpu *vcpu);
> +int kvm_apic_get_state(struct kvm_vcpu *vcpu, struct kvm_lapic_state_w_extapic *s,
> +		       unsigned int size);
> +int kvm_apic_set_state(struct kvm_vcpu *vcpu, struct kvm_lapic_state_w_extapic *s,
> +		       unsigned int size);
>  int kvm_lapic_find_highest_irr(struct kvm_vcpu *vcpu);
>  
>  u64 kvm_get_lapic_tscdeadline_msr(struct kvm_vcpu *vcpu);
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index c880a512005e..c273bbbbbcc6 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -5156,25 +5156,25 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
>  }
>  
>  static int kvm_vcpu_ioctl_get_lapic(struct kvm_vcpu *vcpu,
> -				    struct kvm_lapic_state *s)
> +				    struct kvm_lapic_state_w_extapic *s, unsigned int size)
>  {
>  	if (vcpu->arch.apic->guest_apic_protected)
>  		return -EINVAL;
>  
>  	kvm_x86_call(sync_pir_to_irr)(vcpu);
>  
> -	return kvm_apic_get_state(vcpu, s);
> +	return kvm_apic_get_state(vcpu, s, size);
>  }
>  
>  static int kvm_vcpu_ioctl_set_lapic(struct kvm_vcpu *vcpu,
> -				    struct kvm_lapic_state *s)
> +				    struct kvm_lapic_state_w_extapic *s, unsigned int size)
>  {
>  	int r;
>  
>  	if (vcpu->arch.apic->guest_apic_protected)
>  		return -EINVAL;
>  
> -	r = kvm_apic_set_state(vcpu, s);
> +	r = kvm_apic_set_state(vcpu, s, size);
>  	if (r)
>  		return r;
>  	update_cr8_intercept(vcpu);
> @@ -5903,10 +5903,11 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
>  {
>  	struct kvm_vcpu *vcpu = filp->private_data;
>  	void __user *argp = (void __user *)arg;
> +	unsigned long size;
>  	int r;
>  	union {
>  		struct kvm_sregs2 *sregs2;
> -		struct kvm_lapic_state *lapic;
> +		struct kvm_lapic_state_w_extapic *lapic;
>  		struct kvm_xsave *xsave;
>  		struct kvm_xcrs *xcrs;
>  		void *buffer;
> @@ -5916,35 +5917,51 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
>  
>  	u.buffer = NULL;
>  	switch (ioctl) {
> +	case KVM_GET_LAPIC_W_EXTAPIC:
>  	case KVM_GET_LAPIC: {
>  		r = -EINVAL;
>  		if (!lapic_in_kernel(vcpu))
>  			goto out;
> -		u.lapic = kzalloc(sizeof(struct kvm_lapic_state), GFP_KERNEL);
> +
> +		if (ioctl == KVM_GET_LAPIC_W_EXTAPIC)
> +			size = struct_size(u.lapic, regs, KVM_APIC_EXT_REG_SIZE);
> +		else
> +			size = sizeof(struct kvm_lapic_state);
> +
> +		u.lapic = kzalloc(size, GFP_KERNEL);
>  
>  		r = -ENOMEM;
>  		if (!u.lapic)
>  			goto out;
> -		r = kvm_vcpu_ioctl_get_lapic(vcpu, u.lapic);
> +		r = kvm_vcpu_ioctl_get_lapic(vcpu, u.lapic, size);
>  		if (r)
>  			goto out;
> +
>  		r = -EFAULT;
> -		if (copy_to_user(argp, u.lapic, sizeof(struct kvm_lapic_state)))
> +		if (copy_to_user(argp, u.lapic, size))
>  			goto out;
> +
>  		r = 0;
>  		break;
>  	}
> +	case KVM_SET_LAPIC_W_EXTAPIC:
>  	case KVM_SET_LAPIC: {
>  		r = -EINVAL;
>  		if (!lapic_in_kernel(vcpu))
>  			goto out;
> -		u.lapic = memdup_user(argp, sizeof(*u.lapic));
> +
> +		if (ioctl == KVM_SET_LAPIC_W_EXTAPIC)
> +			size = struct_size(u.lapic, regs, KVM_APIC_EXT_REG_SIZE);
> +		else
> +			size = sizeof(struct kvm_lapic_state);
> +		u.lapic = memdup_user(argp, size);
> +
>  		if (IS_ERR(u.lapic)) {
>  			r = PTR_ERR(u.lapic);
>  			goto out_nofree;
>  		}
>  
> -		r = kvm_vcpu_ioctl_set_lapic(vcpu, u.lapic);
> +		r = kvm_vcpu_ioctl_set_lapic(vcpu, u.lapic, size);
>  		break;
>  	}
>  	case KVM_INTERRUPT: {
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index d00b85cb168c..cf23c1b52c49 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -1290,6 +1290,16 @@ struct kvm_vfio_spapr_tce {
>  #define KVM_SET_FPU               _IOW(KVMIO,  0x8d, struct kvm_fpu)
>  #define KVM_GET_LAPIC             _IOR(KVMIO,  0x8e, struct kvm_lapic_state)
>  #define KVM_SET_LAPIC             _IOW(KVMIO,  0x8f, struct kvm_lapic_state)
> +/*
> + * Added to save/restore local APIC registers with extended APIC (extapic)
> + * register space.
> + *
> + * Qemu emulates extapic logic only when KVM enables extapic functionality via
> + * KVM capability. In the condition where Qemu sets extapic registers, but KVM doesn't
> + * set extapic capability, Qemu ends up using KVM_GET_LAPIC and KVM_SET_LAPIC.
> + */
> +#define KVM_GET_LAPIC_W_EXTAPIC   _IOR(KVMIO,  0x8e, struct kvm_lapic_state_w_extapic)
> +#define KVM_SET_LAPIC_W_EXTAPIC   _IOW(KVMIO,  0x8f, struct kvm_lapic_state_w_extapic)
>  #define KVM_SET_CPUID2            _IOW(KVMIO,  0x90, struct kvm_cpuid2)
>  #define KVM_GET_CPUID2            _IOWR(KVMIO, 0x91, struct kvm_cpuid2)
>  /* Available with KVM_CAP_VAPIC */


* Re: [PATCH v1 04/11] KVM: x86: Add emulation support for Extented LVT registers
  2025-06-27 16:25 ` [PATCH v1 04/11] KVM: x86: Add emulation support for Extented LVT registers Manali Shukla
@ 2025-07-15  2:58   ` Mi, Dapeng
  2025-07-16 10:10     ` Manali Shukla
  0 siblings, 1 reply; 22+ messages in thread
From: Mi, Dapeng @ 2025-07-15  2:58 UTC (permalink / raw)
  To: Manali Shukla, kvm, linux-perf-users, linux-doc
  Cc: seanjc, pbonzini, nikunj, bp, peterz, mingo, mizhang,
	thomas.lendacky, ravi.bangoria, Sandipan.Das


On 6/28/2025 12:25 AM, Manali Shukla wrote:
> From: Santosh Shukla <santosh.shukla@amd.com>
>
> The local interrupts are extended to include more LVT registers in
> order to allow additional interrupt sources, like Instruction Based
> Sampling (IBS) and many more.
>
> Currently there are four additional LVT registers defined and they are
> located at APIC offsets 400h-530h.
>
> AMD IBS driver is designed to use EXTLVT (Extended interrupt local
> vector table) by default for driver initialization.
>
> Extended LVT registers are required to be emulated to initialize the
> guest IBS driver successfully.
>
> Please refer to Section 16.4.5 in AMD Programmer's Manual Volume 2 at
> https://bugzilla.kernel.org/attachment.cgi?id=306250 for more details
> on Extended LVT.
>
> Signed-off-by: Santosh Shukla <santosh.shukla@amd.com>
> Co-developed-by: Manali Shukla <manali.shukla@amd.com>
> Signed-off-by: Manali Shukla <manali.shukla@amd.com>
> ---
>  arch/x86/include/asm/apicdef.h | 17 +++++++++
>  arch/x86/kvm/cpuid.c           |  6 +++
>  arch/x86/kvm/lapic.c           | 69 +++++++++++++++++++++++++++++++++-
>  arch/x86/kvm/lapic.h           |  1 +
>  arch/x86/kvm/svm/avic.c        |  4 ++
>  arch/x86/kvm/svm/svm.c         |  4 ++
>  6 files changed, 99 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/include/asm/apicdef.h b/arch/x86/include/asm/apicdef.h
> index 094106b6a538..4c0f580578aa 100644
> --- a/arch/x86/include/asm/apicdef.h
> +++ b/arch/x86/include/asm/apicdef.h
> @@ -146,6 +146,23 @@
>  #define		APIC_EILVT_MSG_EXT	0x7
>  #define		APIC_EILVT_MASKED	(1 << 16)
>  
> +/*
> + * Initialize extended APIC registers to the default value when guest
> + * is started and EXTAPIC feature is enabled on the guest.
> + *
> + * APIC_EFEAT is a read only Extended APIC feature register, whose
> + * default value is 0x00040007. However, bits 0, 1, and 2 represent
> + * features that are not currently emulated by KVM. Therefore, these
> + * bits must be cleared during initialization. As a result, the
> + * default value used for APIC_EFEAT in KVM is 0x00040000.
> + *
> + * APIC_ECTRL is a read-write Extended APIC control register, whose
> + * default value is 0x0.
> + */
> +
> +#define		APIC_EFEAT_DEFAULT	0x00040000
> +#define		APIC_ECTRL_DEFAULT	0x0
> +
>  #define APIC_BASE (fix_to_virt(FIX_APIC_BASE))
>  #define APIC_BASE_MSR		0x800
>  #define APIC_X2APIC_ID_MSR	0x802
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index eb7be340138b..7270d22fbf31 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -458,6 +458,12 @@ void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
>  	/* Invoke the vendor callback only after the above state is updated. */
>  	kvm_x86_call(vcpu_after_set_cpuid)(vcpu);
>  
> +	/*
> +	 * Initialize extended LVT registers at guest startup to support delivery
> +	 * of interrupts via the extended APIC space (offsets 0x400–0x530).
> +	 */
> +	kvm_apic_init_eilvt_regs(vcpu);
> +
>  	/*
>  	 * Except for the MMU, which needs to do its thing any vendor specific
>  	 * adjustments to the reserved GPA bits.
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 00ca2b0faa45..cffe44eb3f2b 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -1624,9 +1624,13 @@ static inline struct kvm_lapic *to_lapic(struct kvm_io_device *dev)
>  }
>  
>  #define APIC_REG_MASK(reg)	(1ull << ((reg) >> 4))
> +#define APIC_REG_EXT_MASK(reg)	(1ull << (((reg) >> 4) - 0x40))

It seems there is no difference in the mask definition between
APIC_REG_MASK() and APIC_REG_EXT_MASK(). Why not use the original
APIC_REG_MASK() directly?

BTW, if we do need to define this new macro, could we define it as
below?

#define APIC_REG_EXT_MASK(reg)	(1ull << (((reg) - 0x400) >> 4))

It's easier to understand.


>  #define APIC_REGS_MASK(first, count) \
>  	(APIC_REG_MASK(first) * ((1ull << (count)) - 1))
>  
> +#define APIC_LAST_REG_OFFSET		0x3f0
> +#define APIC_EXT_LAST_REG_OFFSET	0x530
> +
>  u64 kvm_lapic_readable_reg_mask(struct kvm_lapic *apic)
>  {
>  	/* Leave bits '0' for reserved and write-only registers. */
> @@ -1668,6 +1672,8 @@ EXPORT_SYMBOL_GPL(kvm_lapic_readable_reg_mask);
>  static int kvm_lapic_reg_read(struct kvm_lapic *apic, u32 offset, int len,
>  			      void *data)
>  {
> +	u64 valid_reg_ext_mask = 0;
> +	unsigned int last_reg = APIC_LAST_REG_OFFSET;
>  	unsigned char alignment = offset & 0xf;
>  	u32 result;
>  
> @@ -1677,13 +1683,44 @@ static int kvm_lapic_reg_read(struct kvm_lapic *apic, u32 offset, int len,
>  	 */
>  	WARN_ON_ONCE(apic_x2apic_mode(apic) && offset == APIC_ICR);
>  
> +	/*
> +	 * The local interrupts are extended to include LVT registers to allow
> +	 * additional interrupt sources when the EXTAPIC feature bit is enabled.
> +	 * The Extended Interrupt LVT registers are located at APIC offsets 400-530h.
> +	 */
> +	if (guest_cpu_cap_has(apic->vcpu, X86_FEATURE_EXTAPIC)) {
> +		valid_reg_ext_mask =
> +			APIC_REG_EXT_MASK(APIC_EFEAT) |
> +			APIC_REG_EXT_MASK(APIC_ECTRL) |
> +			APIC_REG_EXT_MASK(APIC_EILVTn(0)) |
> +			APIC_REG_EXT_MASK(APIC_EILVTn(1)) |
> +			APIC_REG_EXT_MASK(APIC_EILVTn(2)) |
> +			APIC_REG_EXT_MASK(APIC_EILVTn(3));
> +		last_reg = APIC_EXT_LAST_REG_OFFSET;
> +	}

Why not move this piece of code into kvm_lapic_readable_reg_mask() and
use APIC_REG_MASK() directly for these extended regs? Then we wouldn't need
to modify the code below.


> +
>  	if (alignment + len > 4)
>  		return 1;
>  
> -	if (offset > 0x3f0 ||
> -	    !(kvm_lapic_readable_reg_mask(apic) & APIC_REG_MASK(offset)))
> +	if (offset > last_reg)
>  		return 1;
>  
> +	switch (offset) {
> +	/*
> +	 * Section 16.3.2 in the AMD Programmer's Manual Volume 2 states:
> +	 * "APIC registers are aligned to 16-byte offsets and must be accessed
> +	 * using naturally-aligned DWORD size read and writes."
> +	 */
> +	case KVM_APIC_REG_SIZE ... KVM_APIC_EXT_REG_SIZE - 16:
> +		if (!(valid_reg_ext_mask & APIC_REG_EXT_MASK(offset)))
> +			return 1;
> +		break;
> +	default:
> +		if (!(kvm_lapic_readable_reg_mask(apic) & APIC_REG_MASK(offset)))
> +			return 1;
> +
> +	}
> +
>  	result = __apic_read(apic, offset & ~0xf);
>  
>  	trace_kvm_apic_read(offset, result);
> @@ -2419,6 +2456,14 @@ static int kvm_lapic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val)
>  		else
>  			kvm_apic_send_ipi(apic, APIC_DEST_SELF | val, 0);
>  		break;
> +
> +	case APIC_ECTRL:
> +	case APIC_EILVTn(0):
> +	case APIC_EILVTn(1):
> +	case APIC_EILVTn(2):
> +	case APIC_EILVTn(3):
> +		kvm_lapic_set_reg(apic, reg, val);
> +		break;
>  	default:
>  		ret = 1;
>  		break;
> @@ -2757,6 +2802,24 @@ void kvm_inhibit_apic_access_page(struct kvm_vcpu *vcpu)
>  	kvm_vcpu_srcu_read_lock(vcpu);
>  }
>  
> +/*
> + * Initialize extended APIC registers to the default value when guest is
> + * started. The extended APIC registers should only be initialized when the
> + * EXTAPIC feature is enabled on the guest.
> + */
> +void kvm_apic_init_eilvt_regs(struct kvm_vcpu *vcpu)
> +{
> +	struct kvm_lapic *apic = vcpu->arch.apic;
> +	int i;
> +
> +	if (guest_cpu_cap_has(vcpu, X86_FEATURE_EXTAPIC)) {
> +		kvm_lapic_set_reg(apic, APIC_EFEAT, APIC_EFEAT_DEFAULT);
> +		kvm_lapic_set_reg(apic, APIC_ECTRL, APIC_ECTRL_DEFAULT);
> +		for (i = 0; i < APIC_EILVT_NR_MAX; i++)
> +			kvm_lapic_set_reg(apic, APIC_EILVTn(i), APIC_EILVT_MASKED);
> +	}
> +}
> +
>  void kvm_lapic_reset(struct kvm_vcpu *vcpu, bool init_event)
>  {
>  	struct kvm_lapic *apic = vcpu->arch.apic;
> @@ -2818,6 +2881,8 @@ void kvm_lapic_reset(struct kvm_vcpu *vcpu, bool init_event)
>  		kvm_lapic_set_reg(apic, APIC_ISR + 0x10 * i, 0);
>  		kvm_lapic_set_reg(apic, APIC_TMR + 0x10 * i, 0);
>  	}
> +	kvm_apic_init_eilvt_regs(vcpu);
> +
>  	kvm_apic_update_apicv(vcpu);
>  	update_divide_count(apic);
>  	atomic_set(&apic->lapic_timer.pending, 0);
> diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
> index 7ad946b3738d..ff0f9eb3417b 100644
> --- a/arch/x86/kvm/lapic.h
> +++ b/arch/x86/kvm/lapic.h
> @@ -96,6 +96,7 @@ void kvm_apic_ack_interrupt(struct kvm_vcpu *vcpu, int vector);
>  int kvm_apic_accept_pic_intr(struct kvm_vcpu *vcpu);
>  int kvm_apic_accept_events(struct kvm_vcpu *vcpu);
>  void kvm_lapic_reset(struct kvm_vcpu *vcpu, bool init_event);
> +void kvm_apic_init_eilvt_regs(struct kvm_vcpu *vcpu);
>  u64 kvm_lapic_get_cr8(struct kvm_vcpu *vcpu);
>  void kvm_lapic_set_tpr(struct kvm_vcpu *vcpu, unsigned long cr8);
>  void kvm_lapic_set_eoi(struct kvm_vcpu *vcpu);
> diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
> index 7338879d1c0c..323927fb6f57 100644
> --- a/arch/x86/kvm/svm/avic.c
> +++ b/arch/x86/kvm/svm/avic.c
> @@ -682,6 +682,10 @@ static bool is_avic_unaccelerated_access_trap(u32 offset)
>  	case APIC_LVTERR:
>  	case APIC_TMICT:
>  	case APIC_TDCR:
> +	case APIC_EILVTn(0):
> +	case APIC_EILVTn(1):
> +	case APIC_EILVTn(2):
> +	case APIC_EILVTn(3):
>  		ret = true;
>  		break;
>  	default:
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index fffc3320ea00..f9a7ff37ea10 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -791,6 +791,10 @@ void svm_set_x2apic_msr_interception(struct vcpu_svm *svm, bool intercept)
>  		X2APIC_MSR(APIC_TMICT),
>  		X2APIC_MSR(APIC_TMCCT),
>  		X2APIC_MSR(APIC_TDCR),
> +		X2APIC_MSR(APIC_EILVTn(0)),
> +		X2APIC_MSR(APIC_EILVTn(1)),
> +		X2APIC_MSR(APIC_EILVTn(2)),
> +		X2APIC_MSR(APIC_EILVTn(3)),
>  	};
>  	int i;
>  


* Re: [PATCH v1 08/11] KVM: SVM: Extend VMCB area for virtualized IBS registers
  2025-06-27 16:25 ` [PATCH v1 08/11] KVM: SVM: Extend VMCB area for virtualized IBS registers Manali Shukla
@ 2025-07-15  3:13   ` Mi, Dapeng
  2025-07-16  7:40     ` Manali Shukla
  0 siblings, 1 reply; 22+ messages in thread
From: Mi, Dapeng @ 2025-07-15  3:13 UTC (permalink / raw)
  To: Manali Shukla, kvm, linux-perf-users, linux-doc
  Cc: seanjc, pbonzini, nikunj, bp, peterz, mingo, mizhang,
	thomas.lendacky, ravi.bangoria, Sandipan.Das


On 6/28/2025 12:25 AM, Manali Shukla wrote:
> From: Santosh Shukla <santosh.shukla@amd.com>
>
> Define the new VMCB fields that will beused to save and restore the

s/beused/be used/


> satate of the following fetch and op IBS related MSRs.
>
>   * MSRC001_1030 [IBS Fetch Control]
>   * MSRC001_1031 [IBS Fetch Linear Address]
>   * MSRC001_1033 [IBS Execution Control]
>   * MSRC001_1034 [IBS Op Logical Address]
>   * MSRC001_1035 [IBS Op Data]
>   * MSRC001_1036 [IBS Op Data 2]
>   * MSRC001_1037 [IBS Op Data 3]
>   * MSRC001_1038 [IBS DC Linear Address]
>   * MSRC001_103B [IBS Branch Target Address]
>   * MSRC001_103C [IBS Fetch Control Extended]
>
> Signed-off-by: Santosh Shukla <santosh.shukla@amd.com>
> Signed-off-by: Manali Shukla <manali.shukla@amd.com>
> ---
>  arch/x86/include/asm/svm.h | 14 +++++++++++++-
>  1 file changed, 13 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
> index ad954a1a6656..b62049b51ebb 100644
> --- a/arch/x86/include/asm/svm.h
> +++ b/arch/x86/include/asm/svm.h
> @@ -356,6 +356,17 @@ struct vmcb_save_area {
>  	u64 last_excp_to;
>  	u8 reserved_0x298[72];
>  	u64 spec_ctrl;		/* Guest version of SPEC_CTRL at 0x2E0 */
> +	u8 reserved_0x2e8[1168];
> +	u64 ibs_fetch_ctl;
> +	u64 ibs_fetch_linear_addr;
> +	u64 ibs_op_ctl;
> +	u64 ibs_op_rip;
> +	u64 ibs_op_data;
> +	u64 ibs_op_data2;
> +	u64 ibs_op_data3;
> +	u64 ibs_dc_linear_addr;
> +	u64 ibs_br_target;
> +	u64 ibs_fetch_extd_ctl;
>  } __packed;
>  
>  /* Save area definition for SEV-ES and SEV-SNP guests */
> @@ -538,7 +549,7 @@ struct vmcb {
>  	};
>  } __packed;
>  
> -#define EXPECTED_VMCB_SAVE_AREA_SIZE		744
> +#define EXPECTED_VMCB_SAVE_AREA_SIZE		1992
>  #define EXPECTED_GHCB_SAVE_AREA_SIZE		1032
>  #define EXPECTED_SEV_ES_SAVE_AREA_SIZE		1648
>  #define EXPECTED_VMCB_CONTROL_AREA_SIZE		1024
> @@ -564,6 +575,7 @@ static inline void __unused_size_checks(void)
>  	BUILD_BUG_RESERVED_OFFSET(vmcb_save_area, 0x180);
>  	BUILD_BUG_RESERVED_OFFSET(vmcb_save_area, 0x248);
>  	BUILD_BUG_RESERVED_OFFSET(vmcb_save_area, 0x298);
> +	BUILD_BUG_RESERVED_OFFSET(vmcb_save_area, 0x2e8);
>  
>  	BUILD_BUG_RESERVED_OFFSET(sev_es_save_area, 0xc8);
>  	BUILD_BUG_RESERVED_OFFSET(sev_es_save_area, 0xcc);


* Re: [PATCH v1 08/11] KVM: SVM: Extend VMCB area for virtualized IBS registers
  2025-07-15  3:13   ` Mi, Dapeng
@ 2025-07-16  7:40     ` Manali Shukla
  0 siblings, 0 replies; 22+ messages in thread
From: Manali Shukla @ 2025-07-16  7:40 UTC (permalink / raw)
  To: Mi, Dapeng, kvm, linux-perf-users, linux-doc
  Cc: seanjc, pbonzini, nikunj, bp, peterz, mingo, mizhang,
	thomas.lendacky, ravi.bangoria, Sandipan.Das

Hi Dapeng Mi,

Thank you for reviewing my changes.

On 7/15/2025 8:43 AM, Mi, Dapeng wrote:
> 
> On 6/28/2025 12:25 AM, Manali Shukla wrote:
>> From: Santosh Shukla <santosh.shukla@amd.com>
>>
>> Define the new VMCB fields that will beused to save and restore the
> 
> s/beused/be used/

Ack. I will correct it in V2.

-Manali


* Re: [PATCH v1 02/11] KVM: Add KVM_GET_LAPIC_W_EXTAPIC and KVM_SET_LAPIC_W_EXTAPIC for extapic
  2025-07-15  2:21   ` Mi, Dapeng
@ 2025-07-16  7:45     ` Manali Shukla
  0 siblings, 0 replies; 22+ messages in thread
From: Manali Shukla @ 2025-07-16  7:45 UTC (permalink / raw)
  To: Mi, Dapeng, kvm, linux-perf-users, linux-doc
  Cc: seanjc, pbonzini, nikunj, bp, peterz, mingo, mizhang,
	thomas.lendacky, ravi.bangoria, Sandipan.Das

Hi Dapeng Mi,

Thank you for reviewing my patches.

On 7/15/2025 7:51 AM, Mi, Dapeng wrote:
> 
> On 6/28/2025 12:25 AM, Manali Shukla wrote:
>> Modern AMD processors expose four additional extended LVT registers in
>> the extended APIC register space, which can be used for additional
>> interrupt sources such as instruction-based sampling and others.
>>
>> To support this, introduce two new vCPU-based IOCTLs:
>> KVM_GET_LAPIC_W_EXTAPIC and KVM_SET_LAPIC_W_EXTAPIC. These IOCTLs works
>> similarly to KVM_GET_LAPIC and KVM_SET_LAPIC, but operate on APIC page
>> with extended APIC register space located at APIC offsets 400h-530h.
>>
>> These IOCTLs are intended for use when extended APIC support is
>> enabled in the guest. They allow saving and restoring the full APIC
>> page, including the extended registers.
>>
>> To support this, the `struct kvm_lapic_state_w_extapic` has been made
>> extensible rather than hardcoding its size, improving forward
>> compatibility.
>>
>> Documentation for the new IOCTLs has also been added.
>>
>> For more details on the extended APIC space, refer to AMD Programmer’s
>> Manual Volume 2, Section 16.4.5: Extended Interrupts.
>> https://bugzilla.kernel.org/attachment.cgi?id=306250
>>
>> Signed-off-by: Manali Shukla <manali.shukla@amd.com>
>> ---
>>  Documentation/virt/kvm/api.rst  | 23 ++++++++++++++++++++
>>  arch/x86/include/uapi/asm/kvm.h |  5 +++++
>>  arch/x86/kvm/lapic.c            | 12 ++++++-----
>>  arch/x86/kvm/lapic.h            |  6 ++++--
>>  arch/x86/kvm/x86.c              | 37 ++++++++++++++++++++++++---------
>>  include/uapi/linux/kvm.h        | 10 +++++++++
>>  6 files changed, 76 insertions(+), 17 deletions(-)
>>
>> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
>> index 1bd2d42e6424..0ca11d43f833 100644
>> --- a/Documentation/virt/kvm/api.rst
>> +++ b/Documentation/virt/kvm/api.rst
>> @@ -2041,6 +2041,18 @@ error.
>>  Reads the Local APIC registers and copies them into the input argument.  The
>>  data format and layout are the same as documented in the architecture manual.
>>  
>> +::
>> +
>> +  #define KVM_APIC_EXT_REG_SIZE 0x540
>> +  struct kvm_lapic_state_w_extapic {
>> +	__DECLARE_FLEX_ARRAY(__u8, regs);
>> +  };
>> +
>> +Applications should use KVM_GET_LAPIC_W_EXTAPIC ioctl if extended APIC is
>> +enabled. KVM_GET_LAPIC_W_EXTAPIC reads Local APIC registers with extended
>> +APIC register space located at offsets 400h-530h and copies them into input
>> +argument.
>> +
>>  If KVM_X2APIC_API_USE_32BIT_IDS feature of KVM_CAP_X2APIC_API is
>>  enabled, then the format of APIC_ID register depends on the APIC mode
>>  (reported by MSR_IA32_APICBASE) of its VCPU.  x2APIC stores APIC ID in
>> @@ -2072,6 +2084,17 @@ always uses xAPIC format.
>>  Copies the input argument into the Local APIC registers.  The data format
>>  and layout are the same as documented in the architecture manual.
>>  
>> +::
>> +
>> +  #define KVM_APIC_EXT_REG_SIZE 0x540
>> +  struct kvm_lapic_state_w_extapic {
>> +	__DECLARE_FLEX_ARRAY(__u8, regs);
>> +  };
>> +
>> +Applications should use KVM_SET_LAPIC_W_EXTAPIC ioctl if extended APIC is enabled.
>> +KVM_SET_LAPIC_W_EXTAPIC copies input arguments with extended APIC register into
>> +Local APIC and extended APIC registers.
>> +
>>  The format of the APIC ID register (bytes 32-35 of struct kvm_lapic_state's
>>  regs field) depends on the state of the KVM_CAP_X2APIC_API capability.
>>  See the note in KVM_GET_LAPIC.
>> diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
>> index 6f3499507c5e..91c3c5b8cae3 100644
>> --- a/arch/x86/include/uapi/asm/kvm.h
>> +++ b/arch/x86/include/uapi/asm/kvm.h
>> @@ -124,6 +124,11 @@ struct kvm_lapic_state {
>>  	char regs[KVM_APIC_REG_SIZE];
>>  };
>>  
>> +#define KVM_APIC_EXT_REG_SIZE 0x540
>> +struct kvm_lapic_state_w_extapic {
>> +	__DECLARE_FLEX_ARRAY(__u8, regs);
>> +};
> 
> The name "kvm_lapic_state_w_extapic" seems a little bit too long, maybe
> "kvm_ext_lapic_state" is enough?

I also found the name quite long, but I couldn't come up with a
better alternative. I'm fine with kvm_ext_lapic_state, as it is
concise and self-explanatory. I will change the name in v2.

-Manali




* Re: [PATCH v1 04/11] KVM: x86: Add emulation support for Extented LVT registers
  2025-07-15  2:58   ` Mi, Dapeng
@ 2025-07-16 10:10     ` Manali Shukla
  2025-07-17  2:02       ` Mi, Dapeng
  0 siblings, 1 reply; 22+ messages in thread
From: Manali Shukla @ 2025-07-16 10:10 UTC (permalink / raw)
  To: Mi, Dapeng, kvm, linux-perf-users, linux-doc
  Cc: seanjc, pbonzini, nikunj, bp, peterz, mingo, mizhang,
	thomas.lendacky, ravi.bangoria, Sandipan.Das

Hi Dapeng Mi,

Thanks for reviewing my patches.

On 7/15/2025 8:28 AM, Mi, Dapeng wrote:
> 
> On 6/28/2025 12:25 AM, Manali Shukla wrote:
>> From: Santosh Shukla <santosh.shukla@amd.com>
>>
>> The local interrupts are extended to include more LVT registers in
>> order to allow additional interrupt sources, like Instruction Based
>> Sampling (IBS) and many more.
>>
>> Currently there are four additional LVT registers defined and they are
>> located at APIC offsets 400h-530h.
>>
>> AMD IBS driver is designed to use EXTLVT (Extended interrupt local
>> vector table) by default for driver initialization.
>>
>> Extended LVT registers are required to be emulated to initialize the
>> guest IBS driver successfully.
>>
>> Please refer to Section 16.4.5 in AMD Programmer's Manual Volume 2 at
>> https://bugzilla.kernel.org/attachment.cgi?id=306250 for more details
>> on Extended LVT.
>>
>> Signed-off-by: Santosh Shukla <santosh.shukla@amd.com>
>> Co-developed-by: Manali Shukla <manali.shukla@amd.com>
>> Signed-off-by: Manali Shukla <manali.shukla@amd.com>
>> ---
>>  arch/x86/include/asm/apicdef.h | 17 +++++++++
>>  arch/x86/kvm/cpuid.c           |  6 +++
>>  arch/x86/kvm/lapic.c           | 69 +++++++++++++++++++++++++++++++++-
>>  arch/x86/kvm/lapic.h           |  1 +
>>  arch/x86/kvm/svm/avic.c        |  4 ++
>>  arch/x86/kvm/svm/svm.c         |  4 ++
>>  6 files changed, 99 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/apicdef.h b/arch/x86/include/asm/apicdef.h
>> index 094106b6a538..4c0f580578aa 100644
>> --- a/arch/x86/include/asm/apicdef.h
>> +++ b/arch/x86/include/asm/apicdef.h
>> @@ -146,6 +146,23 @@
>>  #define		APIC_EILVT_MSG_EXT	0x7
>>  #define		APIC_EILVT_MASKED	(1 << 16)
>>  
>> +/*
>> + * Initialize extended APIC registers to the default value when guest
>> + * is started and EXTAPIC feature is enabled on the guest.
>> + *
>> + * APIC_EFEAT is a read only Extended APIC feature register, whose
>> + * default value is 0x00040007. However, bits 0, 1, and 2 represent
>> + * features that are not currently emulated by KVM. Therefore, these
>> + * bits must be cleared during initialization. As a result, the
>> + * default value used for APIC_EFEAT in KVM is 0x00040000.
>> + *
>> + * APIC_ECTRL is a read-write Extended APIC control register, whose
>> + * default value is 0x0.
>> + */
>> +
>> +#define		APIC_EFEAT_DEFAULT	0x00040000
>> +#define		APIC_ECTRL_DEFAULT	0x0
>> +
>>  #define APIC_BASE (fix_to_virt(FIX_APIC_BASE))
>>  #define APIC_BASE_MSR		0x800
>>  #define APIC_X2APIC_ID_MSR	0x802
>> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
>> index eb7be340138b..7270d22fbf31 100644
>> --- a/arch/x86/kvm/cpuid.c
>> +++ b/arch/x86/kvm/cpuid.c
>> @@ -458,6 +458,12 @@ void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
>>  	/* Invoke the vendor callback only after the above state is updated. */
>>  	kvm_x86_call(vcpu_after_set_cpuid)(vcpu);
>>  
>> +	/*
>> +	 * Initialize extended LVT registers at guest startup to support delivery
>> +	 * of interrupts via the extended APIC space (offsets 0x400–0x530).
>> +	 */
>> +	kvm_apic_init_eilvt_regs(vcpu);
>> +
>>  	/*
>>  	 * Except for the MMU, which needs to do its thing any vendor specific
>>  	 * adjustments to the reserved GPA bits.
>> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
>> index 00ca2b0faa45..cffe44eb3f2b 100644
>> --- a/arch/x86/kvm/lapic.c
>> +++ b/arch/x86/kvm/lapic.c
>> @@ -1624,9 +1624,13 @@ static inline struct kvm_lapic *to_lapic(struct kvm_io_device *dev)
>>  }
>>  
>>  #define APIC_REG_MASK(reg)	(1ull << ((reg) >> 4))
>> +#define APIC_REG_EXT_MASK(reg)	(1ull << (((reg) >> 4) - 0x40))
> 
> It seems there is no difference on the MASK definition between
> APIC_REG_MASK() and APIC_REG_EXT_MASK(). Why not directly use the original
> APIC_REG_MASK()?
> 

The Extended LVT registers range from 0x400 to 0x530. Using
APIC_REG_MASK(reg) with reg = 0x400 (as an example) shifts 1ull left
by 64 (0x40) bits, which overflows the 64-bit mask. That is the reason
for creating a new macro for the extended APIC register space.
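
To make the arithmetic concrete, here is a minimal user-space sketch
(not the patch code). The register offsets follow
arch/x86/include/asm/apicdef.h; the helper names apic_reg_bit(),
ext_bit_v1(), ext_bit_suggested(), checked_mask() and check_ext_bits()
are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Extended APIC register offsets, as in arch/x86/include/asm/apicdef.h. */
#define APIC_EFEAT	0x400
#define APIC_ECTRL	0x410
#define APIC_EILVTn(n)	(0x500 + 0x10 * (n))

/* Bit index that APIC_REG_MASK() shifts by: one bit per 16-byte register. */
static inline unsigned int apic_reg_bit(unsigned int reg)
{
	return reg >> 4;
}

/* Bit index used by the v1 form of APIC_REG_EXT_MASK(). */
static inline unsigned int ext_bit_v1(unsigned int reg)
{
	return (reg >> 4) - 0x40;
}

/* Bit index used by the form suggested in review. */
static inline unsigned int ext_bit_suggested(unsigned int reg)
{
	return (reg - 0x400) >> 4;
}

/* Build a single-bit mask only when the index fits in a u64; else 0. */
static inline uint64_t checked_mask(unsigned int bit)
{
	return bit < 64 ? 1ull << bit : 0;
}

/* Both extended forms agree for every 16-byte-aligned offset in range. */
static inline void check_ext_bits(void)
{
	unsigned int reg;

	for (reg = APIC_EFEAT; reg <= APIC_EILVTn(3); reg += 0x10)
		assert(ext_bit_v1(reg) == ext_bit_suggested(reg));
}
```

With these offsets, apic_reg_bit(APIC_EFEAT) is 64, which is not a valid
shift count for a 64-bit value, while both extended forms map
0x400..0x530 to bit indices 0..19.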

> BTW, if we do need this new macro, could we define it like below?
> 
> #define APIC_REG_EXT_MASK(reg)	(1ull << (((reg) - 0x400) >> 4))
> 
> It's easier to understand.
> 

I can define the macro in this way.

> 
>>  #define APIC_REGS_MASK(first, count) \
>>  	(APIC_REG_MASK(first) * ((1ull << (count)) - 1))
>>  
>> +#define APIC_LAST_REG_OFFSET		0x3f0
>> +#define APIC_EXT_LAST_REG_OFFSET	0x530
>> +
>>  u64 kvm_lapic_readable_reg_mask(struct kvm_lapic *apic)
>>  {
>>  	/* Leave bits '0' for reserved and write-only registers. */
>> @@ -1668,6 +1672,8 @@ EXPORT_SYMBOL_GPL(kvm_lapic_readable_reg_mask);
>>  static int kvm_lapic_reg_read(struct kvm_lapic *apic, u32 offset, int len,
>>  			      void *data)
>>  {
>> +	u64 valid_reg_ext_mask = 0;
>> +	unsigned int last_reg = APIC_LAST_REG_OFFSET;
>>  	unsigned char alignment = offset & 0xf;
>>  	u32 result;
>>  
>> @@ -1677,13 +1683,44 @@ static int kvm_lapic_reg_read(struct kvm_lapic *apic, u32 offset, int len,
>>  	 */
>>  	WARN_ON_ONCE(apic_x2apic_mode(apic) && offset == APIC_ICR);
>>  
>> +	/*
>> +	 * The local interrupts are extended to include LVT registers to allow
>> +	 * additional interrupt sources when the EXTAPIC feature bit is enabled.
>> +	 * The Extended Interrupt LVT registers are located at APIC offsets 400-530h.
>> +	 */
>> +	if (guest_cpu_cap_has(apic->vcpu, X86_FEATURE_EXTAPIC)) {
>> +		valid_reg_ext_mask =
>> +			APIC_REG_EXT_MASK(APIC_EFEAT) |
>> +			APIC_REG_EXT_MASK(APIC_ECTRL) |
>> +			APIC_REG_EXT_MASK(APIC_EILVTn(0)) |
>> +			APIC_REG_EXT_MASK(APIC_EILVTn(1)) |
>> +			APIC_REG_EXT_MASK(APIC_EILVTn(2)) |
>> +			APIC_REG_EXT_MASK(APIC_EILVTn(3));
>> +		last_reg = APIC_EXT_LAST_REG_OFFSET;
>> +	}
> 
> Why not move this code piece into kvm_lapic_readable_reg_mask() and
> directly use APIC_REG_MASK() for these extended regs? Then we don't need to
> modify the below code. 
> 
> 



* Re: [PATCH v1 04/11] KVM: x86: Add emulation support for Extented LVT registers
  2025-07-16 10:10     ` Manali Shukla
@ 2025-07-17  2:02       ` Mi, Dapeng
  2025-08-01  9:33         ` Manali Shukla
  0 siblings, 1 reply; 22+ messages in thread
From: Mi, Dapeng @ 2025-07-17  2:02 UTC (permalink / raw)
  To: Manali Shukla, kvm, linux-perf-users, linux-doc
  Cc: seanjc, pbonzini, nikunj, bp, peterz, mingo, mizhang,
	thomas.lendacky, ravi.bangoria, Sandipan.Das


On 7/16/2025 6:10 PM, Manali Shukla wrote:
> Hi Dapeng Mi,
>
> Thanks for reviewing my patches.
>
> On 7/15/2025 8:28 AM, Mi, Dapeng wrote:
>> On 6/28/2025 12:25 AM, Manali Shukla wrote:
>>> From: Santosh Shukla <santosh.shukla@amd.com>
>>>
>>> The local interrupts are extended to include more LVT registers in
>>> order to allow additional interrupt sources, like Instruction Based
>>> Sampling (IBS) and many more.
>>>
>>> Currently there are four additional LVT registers defined and they are
>>> located at APIC offsets 400h-530h.
>>>
>>> AMD IBS driver is designed to use EXTLVT (Extended interrupt local
>>> vector table) by default for driver initialization.
>>>
>>> Extended LVT registers are required to be emulated to initialize the
>>> guest IBS driver successfully.
>>>
>>> Please refer to Section 16.4.5 in AMD Programmer's Manual Volume 2 at
>>> https://bugzilla.kernel.org/attachment.cgi?id=306250 for more details
>>> on Extended LVT.
>>>
>>> Signed-off-by: Santosh Shukla <santosh.shukla@amd.com>
>>> Co-developed-by: Manali Shukla <manali.shukla@amd.com>
>>> Signed-off-by: Manali Shukla <manali.shukla@amd.com>
>>> ---
>>>  arch/x86/include/asm/apicdef.h | 17 +++++++++
>>>  arch/x86/kvm/cpuid.c           |  6 +++
>>>  arch/x86/kvm/lapic.c           | 69 +++++++++++++++++++++++++++++++++-
>>>  arch/x86/kvm/lapic.h           |  1 +
>>>  arch/x86/kvm/svm/avic.c        |  4 ++
>>>  arch/x86/kvm/svm/svm.c         |  4 ++
>>>  6 files changed, 99 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/arch/x86/include/asm/apicdef.h b/arch/x86/include/asm/apicdef.h
>>> index 094106b6a538..4c0f580578aa 100644
>>> --- a/arch/x86/include/asm/apicdef.h
>>> +++ b/arch/x86/include/asm/apicdef.h
>>> @@ -146,6 +146,23 @@
>>>  #define		APIC_EILVT_MSG_EXT	0x7
>>>  #define		APIC_EILVT_MASKED	(1 << 16)
>>>  
>>> +/*
>>> + * Initialize extended APIC registers to the default value when guest
>>> + * is started and EXTAPIC feature is enabled on the guest.
>>> + *
>>> + * APIC_EFEAT is a read only Extended APIC feature register, whose
>>> + * default value is 0x00040007. However, bits 0, 1, and 2 represent
>>> + * features that are not currently emulated by KVM. Therefore, these
>>> + * bits must be cleared during initialization. As a result, the
>>> + * default value used for APIC_EFEAT in KVM is 0x00040000.
>>> + *
>>> + * APIC_ECTRL is a read-write Extended APIC control register, whose
>>> + * default value is 0x0.
>>> + */
>>> +
>>> +#define		APIC_EFEAT_DEFAULT	0x00040000
>>> +#define		APIC_ECTRL_DEFAULT	0x0
>>> +
>>>  #define APIC_BASE (fix_to_virt(FIX_APIC_BASE))
>>>  #define APIC_BASE_MSR		0x800
>>>  #define APIC_X2APIC_ID_MSR	0x802
>>> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
>>> index eb7be340138b..7270d22fbf31 100644
>>> --- a/arch/x86/kvm/cpuid.c
>>> +++ b/arch/x86/kvm/cpuid.c
>>> @@ -458,6 +458,12 @@ void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
>>>  	/* Invoke the vendor callback only after the above state is updated. */
>>>  	kvm_x86_call(vcpu_after_set_cpuid)(vcpu);
>>>  
>>> +	/*
>>> +	 * Initialize extended LVT registers at guest startup to support delivery
>>> +	 * of interrupts via the extended APIC space (offsets 0x400–0x530).
>>> +	 */
>>> +	kvm_apic_init_eilvt_regs(vcpu);
>>> +
>>>  	/*
>>>  	 * Except for the MMU, which needs to do its thing any vendor specific
>>>  	 * adjustments to the reserved GPA bits.
>>> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
>>> index 00ca2b0faa45..cffe44eb3f2b 100644
>>> --- a/arch/x86/kvm/lapic.c
>>> +++ b/arch/x86/kvm/lapic.c
>>> @@ -1624,9 +1624,13 @@ static inline struct kvm_lapic *to_lapic(struct kvm_io_device *dev)
>>>  }
>>>  
>>>  #define APIC_REG_MASK(reg)	(1ull << ((reg) >> 4))
>>> +#define APIC_REG_EXT_MASK(reg)	(1ull << (((reg) >> 4) - 0x40))
>> It seems there is no difference on the MASK definition between
>> APIC_REG_MASK() and APIC_REG_EXT_MASK(). Why not directly use the original
>> APIC_REG_MASK()?
>>
> The Extended LVT registers range from 0x400 to 0x530. Using
> APIC_REG_MASK(reg) with reg = 0x400 (as an example) shifts 1ull left
> by 64 (0x40) bits, which overflows the 64-bit mask. That is the reason
> for creating a new macro for the extended APIC register space.

I see. I overlooked that the bit index could go beyond 64 bits.


>
>> BTW, if we do need this new macro, could we define it like below?
>>
>> #define APIC_REG_EXT_MASK(reg)	(1ull << (((reg) - 0x400) >> 4))
>>
>> It's easier to understand.
>>
> I can define the macro in this way.
>
>>>  #define APIC_REGS_MASK(first, count) \
>>>  	(APIC_REG_MASK(first) * ((1ull << (count)) - 1))
>>>  
>>> +#define APIC_LAST_REG_OFFSET		0x3f0
>>> +#define APIC_EXT_LAST_REG_OFFSET	0x530
>>> +
>>>  u64 kvm_lapic_readable_reg_mask(struct kvm_lapic *apic)
>>>  {
>>>  	/* Leave bits '0' for reserved and write-only registers. */
>>> @@ -1668,6 +1672,8 @@ EXPORT_SYMBOL_GPL(kvm_lapic_readable_reg_mask);
>>>  static int kvm_lapic_reg_read(struct kvm_lapic *apic, u32 offset, int len,
>>>  			      void *data)
>>>  {
>>> +	u64 valid_reg_ext_mask = 0;
>>> +	unsigned int last_reg = APIC_LAST_REG_OFFSET;
>>>  	unsigned char alignment = offset & 0xf;
>>>  	u32 result;
>>>  
>>> @@ -1677,13 +1683,44 @@ static int kvm_lapic_reg_read(struct kvm_lapic *apic, u32 offset, int len,
>>>  	 */
>>>  	WARN_ON_ONCE(apic_x2apic_mode(apic) && offset == APIC_ICR);
>>>  
>>> +	/*
>>> +	 * The local interrupts are extended to include LVT registers to allow
>>> +	 * additional interrupt sources when the EXTAPIC feature bit is enabled.
>>> +	 * The Extended Interrupt LVT registers are located at APIC offsets 400-530h.
>>> +	 */
>>> +	if (guest_cpu_cap_has(apic->vcpu, X86_FEATURE_EXTAPIC)) {
>>> +		valid_reg_ext_mask =
>>> +			APIC_REG_EXT_MASK(APIC_EFEAT) |
>>> +			APIC_REG_EXT_MASK(APIC_ECTRL) |
>>> +			APIC_REG_EXT_MASK(APIC_EILVTn(0)) |
>>> +			APIC_REG_EXT_MASK(APIC_EILVTn(1)) |
>>> +			APIC_REG_EXT_MASK(APIC_EILVTn(2)) |
>>> +			APIC_REG_EXT_MASK(APIC_EILVTn(3));
>>> +		last_reg = APIC_EXT_LAST_REG_OFFSET;
>>> +	}
>> Why not move this code piece into kvm_lapic_readable_reg_mask() and
>> directly use APIC_REG_MASK() for these extended regs? Then we don't need to
>> modify the below code. 

I still think we should have a unified APIC register mask, covering the
extended APIC as well, from the kvm_lapic_readable_reg_mask() helper. We
can extend the current kvm_lapic_readable_reg_mask() and let it return a
128-bit bitmap, maybe like this:

void kvm_lapic_readable_reg_mask(struct kvm_lapic *apic, u64 *mask)

This makes the code easier to maintain.
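
As a rough illustration of a two-word (128-bit) mask, here is a
user-space model rather than KVM code. The names APIC_MASK_WORDS,
apic_mask_set(), apic_mask_test() and ext_readable_mask() are
hypothetical, and only the extended registers are filled in.

```c
#include <stdint.h>

/* Extended APIC register offsets, as in arch/x86/include/asm/apicdef.h. */
#define APIC_EFEAT	0x400
#define APIC_ECTRL	0x410
#define APIC_EILVTn(n)	(0x500 + 0x10 * (n))

/* Hypothetical: 128 bits stored as two 64-bit words. */
#define APIC_MASK_WORDS	2

/* Set the bit for a 16-byte-aligned register offset. */
static inline void apic_mask_set(uint64_t *mask, unsigned int reg)
{
	unsigned int bit = reg >> 4;

	mask[bit / 64] |= 1ull << (bit % 64);
}

/* Return 1 if the register's bit is set, 0 otherwise. */
static inline int apic_mask_test(const uint64_t *mask, unsigned int reg)
{
	unsigned int bit = reg >> 4;

	return (int)((mask[bit / 64] >> (bit % 64)) & 1);
}

/* Fill in the readable extended registers; word 0 is left empty here. */
static inline void ext_readable_mask(uint64_t *mask)
{
	int n;

	mask[0] = 0;
	mask[1] = 0;
	apic_mask_set(mask, APIC_EFEAT);
	apic_mask_set(mask, APIC_ECTRL);
	for (n = 0; n < 4; n++)
		apic_mask_set(mask, APIC_EILVTn(n));
}
```

In this model the existing registers (0x000..0x3f0) would live in
mask[0] and the extended ones in mask[1], so a single indexing rule
(reg >> 4) covers both ranges.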


>>
>


* Re: [PATCH v1 04/11] KVM: x86: Add emulation support for Extented LVT registers
  2025-07-17  2:02       ` Mi, Dapeng
@ 2025-08-01  9:33         ` Manali Shukla
  2025-08-05  1:10           ` Mi, Dapeng
  0 siblings, 1 reply; 22+ messages in thread
From: Manali Shukla @ 2025-08-01  9:33 UTC (permalink / raw)
  To: Mi, Dapeng, kvm, linux-perf-users, linux-doc
  Cc: seanjc, pbonzini, nikunj, bp, peterz, mingo, mizhang,
	thomas.lendacky, ravi.bangoria, Sandipan.Das

On 7/17/2025 7:32 AM, Mi, Dapeng wrote:
> 
> On 7/16/2025 6:10 PM, Manali Shukla wrote:
>> Hi Dapeng Mi,
>>
>> Thanks for reviewing my patches.
>>
>> On 7/15/2025 8:28 AM, Mi, Dapeng wrote:
>>> On 6/28/2025 12:25 AM, Manali Shukla wrote:
>>>> From: Santosh Shukla <santosh.shukla@amd.com>
>>>>
>>>> The local interrupts are extended to include more LVT registers in
>>>> order to allow additional interrupt sources, like Instruction Based
>>>> Sampling (IBS) and many more.
>>>>
>>>> Currently there are four additional LVT registers defined and they are
>>>> located at APIC offsets 400h-530h.
>>>>
>>>> AMD IBS driver is designed to use EXTLVT (Extended interrupt local
>>>> vector table) by default for driver initialization.
>>>>
>>>> Extended LVT registers are required to be emulated to initialize the
>>>> guest IBS driver successfully.
>>>>
>>>> Please refer to Section 16.4.5 in AMD Programmer's Manual Volume 2 at
>>>> https://bugzilla.kernel.org/attachment.cgi?id=306250 for more details
>>>> on Extended LVT.
>>>>
>>>> Signed-off-by: Santosh Shukla <santosh.shukla@amd.com>
>>>> Co-developed-by: Manali Shukla <manali.shukla@amd.com>
>>>> Signed-off-by: Manali Shukla <manali.shukla@amd.com>
>>>> ---
>>>>  arch/x86/include/asm/apicdef.h | 17 +++++++++
>>>>  arch/x86/kvm/cpuid.c           |  6 +++
>>>>  arch/x86/kvm/lapic.c           | 69 +++++++++++++++++++++++++++++++++-
>>>>  arch/x86/kvm/lapic.h           |  1 +
>>>>  arch/x86/kvm/svm/avic.c        |  4 ++
>>>>  arch/x86/kvm/svm/svm.c         |  4 ++
>>>>  6 files changed, 99 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/arch/x86/include/asm/apicdef.h b/arch/x86/include/asm/apicdef.h
>>>> index 094106b6a538..4c0f580578aa 100644
>>>> --- a/arch/x86/include/asm/apicdef.h
>>>> +++ b/arch/x86/include/asm/apicdef.h
>>>> @@ -146,6 +146,23 @@
>>>>  #define		APIC_EILVT_MSG_EXT	0x7
>>>>  #define		APIC_EILVT_MASKED	(1 << 16)
>>>>  
>>>> +/*
>>>> + * Initialize extended APIC registers to the default value when guest
>>>> + * is started and EXTAPIC feature is enabled on the guest.
>>>> + *
>>>> + * APIC_EFEAT is a read only Extended APIC feature register, whose
>>>> + * default value is 0x00040007. However, bits 0, 1, and 2 represent
>>>> + * features that are not currently emulated by KVM. Therefore, these
>>>> + * bits must be cleared during initialization. As a result, the
>>>> + * default value used for APIC_EFEAT in KVM is 0x00040000.
>>>> + *
>>>> + * APIC_ECTRL is a read-write Extended APIC control register, whose
>>>> + * default value is 0x0.
>>>> + */
>>>> +
>>>> +#define		APIC_EFEAT_DEFAULT	0x00040000
>>>> +#define		APIC_ECTRL_DEFAULT	0x0
>>>> +
>>>>  #define APIC_BASE (fix_to_virt(FIX_APIC_BASE))
>>>>  #define APIC_BASE_MSR		0x800
>>>>  #define APIC_X2APIC_ID_MSR	0x802
>>>> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
>>>> index eb7be340138b..7270d22fbf31 100644
>>>> --- a/arch/x86/kvm/cpuid.c
>>>> +++ b/arch/x86/kvm/cpuid.c
>>>> @@ -458,6 +458,12 @@ void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
>>>>  	/* Invoke the vendor callback only after the above state is updated. */
>>>>  	kvm_x86_call(vcpu_after_set_cpuid)(vcpu);
>>>>  
>>>> +	/*
>>>> +	 * Initialize extended LVT registers at guest startup to support delivery
>>>> +	 * of interrupts via the extended APIC space (offsets 0x400–0x530).
>>>> +	 */
>>>> +	kvm_apic_init_eilvt_regs(vcpu);
>>>> +
>>>>  	/*
>>>>  	 * Except for the MMU, which needs to do its thing any vendor specific
>>>>  	 * adjustments to the reserved GPA bits.
>>>> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
>>>> index 00ca2b0faa45..cffe44eb3f2b 100644
>>>> --- a/arch/x86/kvm/lapic.c
>>>> +++ b/arch/x86/kvm/lapic.c
>>>> @@ -1624,9 +1624,13 @@ static inline struct kvm_lapic *to_lapic(struct kvm_io_device *dev)
>>>>  }
>>>>  
>>>>  #define APIC_REG_MASK(reg)	(1ull << ((reg) >> 4))
>>>> +#define APIC_REG_EXT_MASK(reg)	(1ull << (((reg) >> 4) - 0x40))
>>> It seems there is no difference on the MASK definition between
>>> APIC_REG_MASK() and APIC_REG_EXT_MASK(). Why not directly use the original
>>> APIC_REG_MASK()?
>>>
>> The Extended LVT registers range from 0x400 to 0x530. Using
>> APIC_REG_MASK(reg) with reg = 0x400 (as an example) shifts 1ull left
>> by 64 (0x40) bits, which overflows the 64-bit mask. That is the reason
>> for creating a new macro for the extended APIC register space.
> 
> I see. I overlooked that the bit index could go beyond 64 bits.
> 
> 
>>
>>> BTW, if we do need this new macro, could we define it like below?
>>>
>>> #define APIC_REG_EXT_MASK(reg)	(1ull << (((reg) - 0x400) >> 4))
>>>
>>> It's easier to understand.
>>>
>> I can define the macro in this way.
>>
>>>>  #define APIC_REGS_MASK(first, count) \
>>>>  	(APIC_REG_MASK(first) * ((1ull << (count)) - 1))
>>>>  
>>>> +#define APIC_LAST_REG_OFFSET		0x3f0
>>>> +#define APIC_EXT_LAST_REG_OFFSET	0x530
>>>> +
>>>>  u64 kvm_lapic_readable_reg_mask(struct kvm_lapic *apic)
>>>>  {
>>>>  	/* Leave bits '0' for reserved and write-only registers. */
>>>> @@ -1668,6 +1672,8 @@ EXPORT_SYMBOL_GPL(kvm_lapic_readable_reg_mask);
>>>>  static int kvm_lapic_reg_read(struct kvm_lapic *apic, u32 offset, int len,
>>>>  			      void *data)
>>>>  {
>>>> +	u64 valid_reg_ext_mask = 0;
>>>> +	unsigned int last_reg = APIC_LAST_REG_OFFSET;
>>>>  	unsigned char alignment = offset & 0xf;
>>>>  	u32 result;
>>>>  
>>>> @@ -1677,13 +1683,44 @@ static int kvm_lapic_reg_read(struct kvm_lapic *apic, u32 offset, int len,
>>>>  	 */
>>>>  	WARN_ON_ONCE(apic_x2apic_mode(apic) && offset == APIC_ICR);
>>>>  
>>>> +	/*
>>>> +	 * The local interrupts are extended to include LVT registers to allow
>>>> +	 * additional interrupt sources when the EXTAPIC feature bit is enabled.
>>>> +	 * The Extended Interrupt LVT registers are located at APIC offsets 400-530h.
>>>> +	 */
>>>> +	if (guest_cpu_cap_has(apic->vcpu, X86_FEATURE_EXTAPIC)) {
>>>> +		valid_reg_ext_mask =
>>>> +			APIC_REG_EXT_MASK(APIC_EFEAT) |
>>>> +			APIC_REG_EXT_MASK(APIC_ECTRL) |
>>>> +			APIC_REG_EXT_MASK(APIC_EILVTn(0)) |
>>>> +			APIC_REG_EXT_MASK(APIC_EILVTn(1)) |
>>>> +			APIC_REG_EXT_MASK(APIC_EILVTn(2)) |
>>>> +			APIC_REG_EXT_MASK(APIC_EILVTn(3));
>>>> +		last_reg = APIC_EXT_LAST_REG_OFFSET;
>>>> +	}
>>> Why not move this code piece into kvm_lapic_readable_reg_mask() and
>>> directly use APIC_REG_MASK() for these extended regs? Then we don't need to
>>> modify the below code. 
> 
> I still think we should have a unified APIC register mask, covering the
> extended APIC as well, from the kvm_lapic_readable_reg_mask() helper. We
> can extend the current kvm_lapic_readable_reg_mask() and let it return a
> 128-bit bitmap, maybe like this:
> 
> void kvm_lapic_readable_reg_mask(struct kvm_lapic *apic, u64 *mask)
> 
> This makes the code easier to maintain.
> 
> 

Sorry for the delay.

I am wary of this approach because kvm_lapic_readable_reg_mask() is
currently used in vmx_update_msr_bitmap_x2apic(), where we directly use
its return value:

    if (mode & MSR_BITMAP_MODE_X2APIC_APICV)
        msr_bitmap[read_idx] = ~kvm_lapic_readable_reg_mask(vcpu->arch.apic);
    else
        msr_bitmap[read_idx] = ~0ull;
    msr_bitmap[write_idx] = ~0ull;

Where msr_bitmap is a u64 array.

Changing kvm_lapic_readable_reg_mask() to return a 128-bit mask would
require changes in vmx_update_msr_bitmap_x2apic() too.
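
A rough sketch of the kind of call-site change this implies, as a
user-space model only. Everything here is an assumption made for
illustration: readable_reg_mask() is a hypothetical stand-in for an
out-parameter variant of the helper, msr_read_bitmap_word() is a
hypothetical stand-in for the caller, the base registers are an
arbitrary sample, and whether the caller would consume only the first
word is not settled in this thread.

```c
#include <stdint.h>

/* Hypothetical: 128 bits stored as two 64-bit words. */
#define APIC_MASK_WORDS	2

/* Hypothetical stand-in for an out-parameter readable-register helper. */
static inline void readable_reg_mask(int has_extapic,
				     uint64_t mask[APIC_MASK_WORDS])
{
	/* Illustrative base registers only: APIC_ID (0x20), APIC_LVR (0x30). */
	mask[0] = (1ull << (0x20 >> 4)) | (1ull << (0x30 >> 4));
	/* Extended space: APIC_EFEAT (0x400), APIC_ECTRL (0x410), if present. */
	mask[1] = has_extapic ? 0x3ull : 0;
}

/* Caller modelled on the snippet above: it stores a single u64. */
static inline uint64_t msr_read_bitmap_word(int apicv, int has_extapic)
{
	uint64_t mask[APIC_MASK_WORDS];

	if (!apicv)
		return ~0ull;

	readable_reg_mask(has_extapic, mask);
	/* Assumption: only the first word is consumed at this call site. */
	return ~mask[0];
}
```

Under that assumption the stored word is unchanged whether or not the
extended registers are present; the caller only has to switch from
using a return value to reading mask[0].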

- Manali

>>>
>>



* Re: [PATCH v1 04/11] KVM: x86: Add emulation support for Extented LVT registers
  2025-08-01  9:33         ` Manali Shukla
@ 2025-08-05  1:10           ` Mi, Dapeng
  0 siblings, 0 replies; 22+ messages in thread
From: Mi, Dapeng @ 2025-08-05  1:10 UTC (permalink / raw)
  To: Manali Shukla, kvm, linux-perf-users, linux-doc
  Cc: seanjc, pbonzini, nikunj, bp, peterz, mingo, mizhang,
	thomas.lendacky, ravi.bangoria, Sandipan.Das


On 8/1/2025 5:33 PM, Manali Shukla wrote:
> On 7/17/2025 7:32 AM, Mi, Dapeng wrote:
>> On 7/16/2025 6:10 PM, Manali Shukla wrote:
>>> Hi Dapeng Mi,
>>>
>>> Thanks for reviewing my patches.
>>>
>>> On 7/15/2025 8:28 AM, Mi, Dapeng wrote:
>>>> On 6/28/2025 12:25 AM, Manali Shukla wrote:
>>>>> From: Santosh Shukla <santosh.shukla@amd.com>
>>>>>
>>>>> The local interrupts are extended to include more LVT registers in
>>>>> order to allow additional interrupt sources, like Instruction Based
>>>>> Sampling (IBS) and many more.
>>>>>
>>>>> Currently there are four additional LVT registers defined and they are
>>>>> located at APIC offsets 400h-530h.
>>>>>
>>>>> AMD IBS driver is designed to use EXTLVT (Extended interrupt local
>>>>> vector table) by default for driver initialization.
>>>>>
>>>>> Extended LVT registers are required to be emulated to initialize the
>>>>> guest IBS driver successfully.
>>>>>
>>>>> Please refer to Section 16.4.5 in AMD Programmer's Manual Volume 2 at
>>>>> https://bugzilla.kernel.org/attachment.cgi?id=306250 for more details
>>>>> on Extended LVT.
>>>>>
>>>>> Signed-off-by: Santosh Shukla <santosh.shukla@amd.com>
>>>>> Co-developed-by: Manali Shukla <manali.shukla@amd.com>
>>>>> Signed-off-by: Manali Shukla <manali.shukla@amd.com>
>>>>> ---
>>>>>  arch/x86/include/asm/apicdef.h | 17 +++++++++
>>>>>  arch/x86/kvm/cpuid.c           |  6 +++
>>>>>  arch/x86/kvm/lapic.c           | 69 +++++++++++++++++++++++++++++++++-
>>>>>  arch/x86/kvm/lapic.h           |  1 +
>>>>>  arch/x86/kvm/svm/avic.c        |  4 ++
>>>>>  arch/x86/kvm/svm/svm.c         |  4 ++
>>>>>  6 files changed, 99 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/arch/x86/include/asm/apicdef.h b/arch/x86/include/asm/apicdef.h
>>>>> index 094106b6a538..4c0f580578aa 100644
>>>>> --- a/arch/x86/include/asm/apicdef.h
>>>>> +++ b/arch/x86/include/asm/apicdef.h
>>>>> @@ -146,6 +146,23 @@
>>>>>  #define		APIC_EILVT_MSG_EXT	0x7
>>>>>  #define		APIC_EILVT_MASKED	(1 << 16)
>>>>>  
>>>>> +/*
>>>>> + * Initialize extended APIC registers to the default value when guest
>>>>> + * is started and EXTAPIC feature is enabled on the guest.
>>>>> + *
>>>>> + * APIC_EFEAT is a read only Extended APIC feature register, whose
>>>>> + * default value is 0x00040007. However, bits 0, 1, and 2 represent
>>>>> + * features that are not currently emulated by KVM. Therefore, these
>>>>> + * bits must be cleared during initialization. As a result, the
>>>>> + * default value used for APIC_EFEAT in KVM is 0x00040000.
>>>>> + *
>>>>> + * APIC_ECTRL is a read-write Extended APIC control register, whose
>>>>> + * default value is 0x0.
>>>>> + */
>>>>> +
>>>>> +#define		APIC_EFEAT_DEFAULT	0x00040000
>>>>> +#define		APIC_ECTRL_DEFAULT	0x0
>>>>> +
>>>>>  #define APIC_BASE (fix_to_virt(FIX_APIC_BASE))
>>>>>  #define APIC_BASE_MSR		0x800
>>>>>  #define APIC_X2APIC_ID_MSR	0x802
>>>>> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
>>>>> index eb7be340138b..7270d22fbf31 100644
>>>>> --- a/arch/x86/kvm/cpuid.c
>>>>> +++ b/arch/x86/kvm/cpuid.c
>>>>> @@ -458,6 +458,12 @@ void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
>>>>>  	/* Invoke the vendor callback only after the above state is updated. */
>>>>>  	kvm_x86_call(vcpu_after_set_cpuid)(vcpu);
>>>>>  
>>>>> +	/*
>>>>> +	 * Initialize extended LVT registers at guest startup to support delivery
>>>>> +	 * of interrupts via the extended APIC space (offsets 0x400–0x530).
>>>>> +	 */
>>>>> +	kvm_apic_init_eilvt_regs(vcpu);
>>>>> +
>>>>>  	/*
>>>>>  	 * Except for the MMU, which needs to do its thing any vendor specific
>>>>>  	 * adjustments to the reserved GPA bits.
>>>>> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
>>>>> index 00ca2b0faa45..cffe44eb3f2b 100644
>>>>> --- a/arch/x86/kvm/lapic.c
>>>>> +++ b/arch/x86/kvm/lapic.c
>>>>> @@ -1624,9 +1624,13 @@ static inline struct kvm_lapic *to_lapic(struct kvm_io_device *dev)
>>>>>  }
>>>>>  
>>>>>  #define APIC_REG_MASK(reg)	(1ull << ((reg) >> 4))
>>>>> +#define APIC_REG_EXT_MASK(reg)	(1ull << (((reg) >> 4) - 0x40))
>>>> It seems there is no difference on the MASK definition between
>>>> APIC_REG_MASK() and APIC_REG_EXT_MASK(). Why not directly use the original
>>>> APIC_REG_MASK()?
>>>>
>>> The Extended LVT registers range from 0x400 to 0x530. Using
>>> APIC_REG_MASK(reg) with reg = 0x400 (as an example) shifts 1ull left
>>> by 64 (0x40) bits, which overflows the 64-bit mask. That is the reason
>>> for creating a new macro for the extended APIC register space.
>> I see. I overlooked that the bit index could go beyond 64 bits.
>>
>>
>>>> BTW, if we do need this new macro, could we define it like below?
>>>>
>>>> #define APIC_REG_EXT_MASK(reg)	(1ull << (((reg) - 0x400) >> 4))
>>>>
>>>> It's easier to understand.
>>>>
>>> I can define the macro in this way.
>>>
>>>>>  #define APIC_REGS_MASK(first, count) \
>>>>>  	(APIC_REG_MASK(first) * ((1ull << (count)) - 1))
>>>>>  
>>>>> +#define APIC_LAST_REG_OFFSET		0x3f0
>>>>> +#define APIC_EXT_LAST_REG_OFFSET	0x530
>>>>> +
>>>>>  u64 kvm_lapic_readable_reg_mask(struct kvm_lapic *apic)
>>>>>  {
>>>>>  	/* Leave bits '0' for reserved and write-only registers. */
>>>>> @@ -1668,6 +1672,8 @@ EXPORT_SYMBOL_GPL(kvm_lapic_readable_reg_mask);
>>>>>  static int kvm_lapic_reg_read(struct kvm_lapic *apic, u32 offset, int len,
>>>>>  			      void *data)
>>>>>  {
>>>>> +	u64 valid_reg_ext_mask = 0;
>>>>> +	unsigned int last_reg = APIC_LAST_REG_OFFSET;
>>>>>  	unsigned char alignment = offset & 0xf;
>>>>>  	u32 result;
>>>>>  
>>>>> @@ -1677,13 +1683,44 @@ static int kvm_lapic_reg_read(struct kvm_lapic *apic, u32 offset, int len,
>>>>>  	 */
>>>>>  	WARN_ON_ONCE(apic_x2apic_mode(apic) && offset == APIC_ICR);
>>>>>  
>>>>> +	/*
>>>>> +	 * The local interrupts are extended to include LVT registers to allow
>>>>> +	 * additional interrupt sources when the EXTAPIC feature bit is enabled.
>>>>> +	 * The Extended Interrupt LVT registers are located at APIC offsets 400-530h.
>>>>> +	 */
>>>>> +	if (guest_cpu_cap_has(apic->vcpu, X86_FEATURE_EXTAPIC)) {
>>>>> +		valid_reg_ext_mask =
>>>>> +			APIC_REG_EXT_MASK(APIC_EFEAT) |
>>>>> +			APIC_REG_EXT_MASK(APIC_ECTRL) |
>>>>> +			APIC_REG_EXT_MASK(APIC_EILVTn(0)) |
>>>>> +			APIC_REG_EXT_MASK(APIC_EILVTn(1)) |
>>>>> +			APIC_REG_EXT_MASK(APIC_EILVTn(2)) |
>>>>> +			APIC_REG_EXT_MASK(APIC_EILVTn(3));
>>>>> +		last_reg = APIC_EXT_LAST_REG_OFFSET;
>>>>> +	}
>>>> Why not move this code piece into kvm_lapic_readable_reg_mask() and
>>>> directly use APIC_REG_MASK() for these extended regs? Then we don't need to
>>>> modify the below code. 
>> I still think we should have a unified APIC register mask, covering the
>> extended APIC as well, from the kvm_lapic_readable_reg_mask() helper. We
>> can extend the current kvm_lapic_readable_reg_mask() and let it return a
>> 128-bit bitmap, maybe like this:
>>
>> void kvm_lapic_readable_reg_mask(struct kvm_lapic *apic, u64 *mask)
>>
>> This makes the code easier to maintain.
>>
>>
> Sorry for the delay.
>
> I am wary of this approach because kvm_lapic_readable_reg_mask() is
> currently used in vmx_update_msr_bitmap_x2apic(), where we directly use
> its return value:
>
>     if (mode & MSR_BITMAP_MODE_X2APIC_APICV)
>         msr_bitmap[read_idx] = ~kvm_lapic_readable_reg_mask(vcpu->arch.apic);
>     else
>         msr_bitmap[read_idx] = ~0ull;
>     msr_bitmap[write_idx] = ~0ull;
>
> Where msr_bitmap is a u64 array.
>
> Changing kvm_lapic_readable_reg_mask() to return a 128-bit mask would
> require changes in vmx_update_msr_bitmap_x2apic() too.

Yes, I know. IMO, it's worth doing.


>
> - Manali
>


end of thread, other threads:[~2025-08-05  1:11 UTC | newest]

Thread overview: 22+ messages
2025-06-27 16:25 [PATCH v1 00/11] Implement support for IBS virtualization Manali Shukla
2025-06-27 16:25 ` [PATCH v1 01/11] perf/amd/ibs: Fix race condition in IBS Manali Shukla
2025-06-27 16:25 ` [PATCH v1 02/11] KVM: Add KVM_GET_LAPIC_W_EXTAPIC and KVM_SET_LAPIC_W_EXTAPIC for extapic Manali Shukla
2025-07-15  2:21   ` Mi, Dapeng
2025-07-16  7:45     ` Manali Shukla
2025-06-27 16:25 ` [PATCH v1 03/11] x86/cpufeatures: Add CPUID feature bit for Extended LVT Manali Shukla
2025-06-27 16:25 ` [PATCH v1 04/11] KVM: x86: Add emulation support for Extented LVT registers Manali Shukla
2025-07-15  2:58   ` Mi, Dapeng
2025-07-16 10:10     ` Manali Shukla
2025-07-17  2:02       ` Mi, Dapeng
2025-08-01  9:33         ` Manali Shukla
2025-08-05  1:10           ` Mi, Dapeng
2025-06-27 16:25 ` [PATCH v1 05/11] x86/cpufeatures: Add CPUID feature bit for VIBS in SVM/SEV guests Manali Shukla
2025-06-27 16:25 ` [PATCH v1 06/11] KVM: x86/cpuid: Add a KVM-only leaf for IBS capabilities Manali Shukla
2025-06-27 16:25 ` [PATCH v1 07/11] KVM: x86: Extend CPUID range to include new leaf Manali Shukla
2025-06-27 16:25 ` [PATCH v1 08/11] KVM: SVM: Extend VMCB area for virtualized IBS registers Manali Shukla
2025-07-15  3:13   ` Mi, Dapeng
2025-07-16  7:40     ` Manali Shukla
2025-06-27 16:25 ` [PATCH v1 09/11] KVM: SVM: Add support for IBS Virtualization Manali Shukla
2025-06-27 16:25 ` [PATCH v1 10/11] perf/x86/amd: Enable VPMU passthrough capability for IBS PMU Manali Shukla
2025-06-27 16:25 ` [PATCH v1 11/11] perf/x86/amd: Remove exclude_guest check from perf_ibs_init() Manali Shukla
2025-07-14 11:51 ` [PATCH v1 00/11] Implement support for IBS virtualization Manali Shukla
