* [PATCH v7 01/20] arm64: cpufeature: Add cpucap for HPMN0
2026-05-04 21:17 [PATCH v7 00/20] ARM64 PMU Partitioning Colton Lewis
@ 2026-05-04 21:17 ` Colton Lewis
2026-05-04 21:17 ` [PATCH v7 02/20] KVM: arm64: Reorganize PMU includes Colton Lewis
` (19 subsequent siblings)
20 siblings, 0 replies; 42+ messages in thread
From: Colton Lewis @ 2026-05-04 21:17 UTC (permalink / raw)
To: kvm
Cc: Alexandru Elisei, Paolo Bonzini, Jonathan Corbet, Russell King,
Catalin Marinas, Will Deacon, Marc Zyngier, Oliver Upton,
Mingwei Zhang, Joey Gouly, Suzuki K Poulose, Zenghui Yu,
Mark Rutland, Shuah Khan, Ganapatrao Kulkarni, James Clark,
linux-doc, linux-kernel, linux-arm-kernel, kvmarm,
linux-perf-users, linux-kselftest, Colton Lewis
Add a capability for FEAT_HPMN0, which indicates whether MDCR_EL2.HPMN
can specify that 0 counters are reserved for the guest.
This required changing HPMN0 to an UnsignedEnum in tools/sysreg
because otherwise not all of the macros needed to add it to the
arm64_features table of struct arm64_cpu_capabilities are generated.
Acked-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Colton Lewis <coltonlewis@google.com>
---
arch/arm64/kernel/cpufeature.c | 8 ++++++++
arch/arm64/kvm/sys_regs.c | 3 ++-
arch/arm64/tools/cpucaps | 1 +
arch/arm64/tools/sysreg | 6 +++---
4 files changed, 14 insertions(+), 4 deletions(-)
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 32c2dbcc0c641..5c6c76a9696cc 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -556,6 +556,7 @@ static const struct arm64_ftr_bits ftr_id_mmfr0[] = {
};
static const struct arm64_ftr_bits ftr_id_aa64dfr0[] = {
+ ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_LOWER_SAFE, ID_AA64DFR0_EL1_HPMN0_SHIFT, 4, 0),
S_ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64DFR0_EL1_DoubleLock_SHIFT, 4, 0),
ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_LOWER_SAFE, ID_AA64DFR0_EL1_PMSVer_SHIFT, 4, 0),
ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64DFR0_EL1_CTX_CMPs_SHIFT, 4, 0),
@@ -2964,6 +2965,13 @@ static const struct arm64_cpu_capabilities arm64_features[] = {
.matches = has_cpuid_feature,
ARM64_CPUID_FIELDS(ID_AA64MMFR0_EL1, FGT, FGT2)
},
+ {
+ .desc = "HPMN0",
+ .type = ARM64_CPUCAP_SYSTEM_FEATURE,
+ .capability = ARM64_HAS_HPMN0,
+ .matches = has_cpuid_feature,
+ ARM64_CPUID_FIELDS(ID_AA64DFR0_EL1, HPMN0, IMP)
+ },
#ifdef CONFIG_ARM64_SME
{
.desc = "Scalable Matrix Extension",
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 1b4cacb6e918a..0a8e8ee69cd00 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -3232,7 +3232,8 @@ static const struct sys_reg_desc sys_reg_descs[] = {
ID_AA64DFR0_EL1_DoubleLock_MASK |
ID_AA64DFR0_EL1_WRPs_MASK |
ID_AA64DFR0_EL1_PMUVer_MASK |
- ID_AA64DFR0_EL1_DebugVer_MASK),
+ ID_AA64DFR0_EL1_DebugVer_MASK |
+ ID_AA64DFR0_EL1_HPMN0_MASK),
ID_SANITISED(ID_AA64DFR1_EL1),
ID_UNALLOCATED(5,2),
ID_UNALLOCATED(5,3),
diff --git a/arch/arm64/tools/cpucaps b/arch/arm64/tools/cpucaps
index 7261553b644b2..654b165781854 100644
--- a/arch/arm64/tools/cpucaps
+++ b/arch/arm64/tools/cpucaps
@@ -42,6 +42,7 @@ HAS_GIC_PRIO_MASKING
HAS_GIC_PRIO_RELAXED_SYNC
HAS_ICH_HCR_EL2_TDIR
HAS_HCR_NV1
+HAS_HPMN0
HAS_HCX
HAS_LDAPR
HAS_LPA2
diff --git a/arch/arm64/tools/sysreg b/arch/arm64/tools/sysreg
index 9d1c211080571..92135f8834be0 100644
--- a/arch/arm64/tools/sysreg
+++ b/arch/arm64/tools/sysreg
@@ -1666,9 +1666,9 @@ EndEnum
EndSysreg
Sysreg ID_AA64DFR0_EL1 3 0 0 5 0
-Enum 63:60 HPMN0
- 0b0000 UNPREDICTABLE
- 0b0001 DEF
+UnsignedEnum 63:60 HPMN0
+ 0b0000 NI
+ 0b0001 IMP
EndEnum
UnsignedEnum 59:56 ExtTrcBuff
0b0000 NI
--
2.54.0.545.g6539524ca2-goog
* [PATCH v7 02/20] KVM: arm64: Reorganize PMU includes
2026-05-04 21:17 [PATCH v7 00/20] ARM64 PMU Partitioning Colton Lewis
2026-05-04 21:17 ` [PATCH v7 01/20] arm64: cpufeature: Add cpucap for HPMN0 Colton Lewis
@ 2026-05-04 21:17 ` Colton Lewis
2026-05-04 21:44 ` sashiko-bot
2026-05-04 21:17 ` [PATCH v7 03/20] KVM: arm64: Reorganize PMU functions Colton Lewis
` (18 subsequent siblings)
20 siblings, 1 reply; 42+ messages in thread
From: Colton Lewis @ 2026-05-04 21:17 UTC (permalink / raw)
To: kvm
Cc: Alexandru Elisei, Paolo Bonzini, Jonathan Corbet, Russell King,
Catalin Marinas, Will Deacon, Marc Zyngier, Oliver Upton,
Mingwei Zhang, Joey Gouly, Suzuki K Poulose, Zenghui Yu,
Mark Rutland, Shuah Khan, Ganapatrao Kulkarni, James Clark,
linux-doc, linux-kernel, linux-arm-kernel, kvmarm,
linux-perf-users, linux-kselftest, Colton Lewis
From: Marc Zyngier <maz@kernel.org>
Including *all* of asm/kvm_host.h in asm/arm_pmuv3.h is a bad idea
because that is much more than arm_pmuv3.h logically needs and creates
a circular dependency that makes it easy to introduce compiler errors
when editing this code.
asm/kvm_host.h -> kvm/arm_pmu.h -> perf/arm_pmuv3.h ->
asm/arm_pmuv3.h -> asm/kvm_host.h
Reorganize the PMU includes to be more sane. In particular:
* Remove the circular dependency by removing the kvm_host.h include
from asm/arm_pmuv3.h since 99% of it isn't needed.
* Move the remaining tiny bit of KVM/PMU interface from kvm_host.h
into arm_pmu.h
* On ARM64 only, include the more targeted kvm/arm_pmu.h directly
in the arm_pmuv3.c driver.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Colton Lewis <coltonlewis@google.com>
---
arch/arm64/include/asm/arm_pmuv3.h | 2 --
arch/arm64/include/asm/kvm_host.h | 14 --------------
drivers/perf/arm_pmuv3.c | 5 +++++
include/kvm/arm_pmu.h | 19 +++++++++++++++++++
4 files changed, 24 insertions(+), 16 deletions(-)
diff --git a/arch/arm64/include/asm/arm_pmuv3.h b/arch/arm64/include/asm/arm_pmuv3.h
index 8a777dec8d88a..cf2b2212e00a2 100644
--- a/arch/arm64/include/asm/arm_pmuv3.h
+++ b/arch/arm64/include/asm/arm_pmuv3.h
@@ -6,8 +6,6 @@
#ifndef __ASM_PMUV3_H
#define __ASM_PMUV3_H
-#include <asm/kvm_host.h>
-
#include <asm/cpufeature.h>
#include <asm/sysreg.h>
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 70cb9cfd760a3..1f789ba589d56 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -1441,25 +1441,11 @@ void kvm_arch_vcpu_ctxflush_fp(struct kvm_vcpu *vcpu);
void kvm_arch_vcpu_ctxsync_fp(struct kvm_vcpu *vcpu);
void kvm_arch_vcpu_put_fp(struct kvm_vcpu *vcpu);
-static inline bool kvm_pmu_counter_deferred(struct perf_event_attr *attr)
-{
- return (!has_vhe() && attr->exclude_host);
-}
-
#ifdef CONFIG_KVM
-void kvm_set_pmu_events(u64 set, struct perf_event_attr *attr);
-void kvm_clr_pmu_events(u64 clr);
-bool kvm_set_pmuserenr(u64 val);
void kvm_enable_trbe(void);
void kvm_disable_trbe(void);
void kvm_tracing_set_el1_configuration(u64 trfcr_while_in_guest);
#else
-static inline void kvm_set_pmu_events(u64 set, struct perf_event_attr *attr) {}
-static inline void kvm_clr_pmu_events(u64 clr) {}
-static inline bool kvm_set_pmuserenr(u64 val)
-{
- return false;
-}
static inline void kvm_enable_trbe(void) {}
static inline void kvm_disable_trbe(void) {}
static inline void kvm_tracing_set_el1_configuration(u64 trfcr_while_in_guest) {}
diff --git a/drivers/perf/arm_pmuv3.c b/drivers/perf/arm_pmuv3.c
index 8014ff766cff5..8d3b832cd633a 100644
--- a/drivers/perf/arm_pmuv3.c
+++ b/drivers/perf/arm_pmuv3.c
@@ -9,6 +9,11 @@
*/
#include <asm/irq_regs.h>
+
+#if defined(CONFIG_ARM64)
+#include <kvm/arm_pmu.h>
+#endif
+
#include <asm/perf_event.h>
#include <asm/virt.h>
diff --git a/include/kvm/arm_pmu.h b/include/kvm/arm_pmu.h
index 96754b51b4116..e91d15a7a564b 100644
--- a/include/kvm/arm_pmu.h
+++ b/include/kvm/arm_pmu.h
@@ -9,9 +9,19 @@
#include <linux/perf_event.h>
#include <linux/perf/arm_pmuv3.h>
+#include <linux/perf/arm_pmu.h>
#define KVM_ARMV8_PMU_MAX_COUNTERS 32
+#define kvm_pmu_counter_deferred(attr) \
+ ({ \
+ !has_vhe() && (attr)->exclude_host; \
+ })
+
+struct kvm;
+struct kvm_device_attr;
+struct kvm_vcpu;
+
#if IS_ENABLED(CONFIG_HW_PERF_EVENTS) && IS_ENABLED(CONFIG_KVM)
struct kvm_pmc {
u8 idx; /* index into the pmu->pmc array */
@@ -66,6 +76,9 @@ int kvm_arm_pmu_v3_has_attr(struct kvm_vcpu *vcpu,
int kvm_arm_pmu_v3_enable(struct kvm_vcpu *vcpu);
struct kvm_pmu_events *kvm_get_pmu_events(void);
+void kvm_set_pmu_events(u64 set, struct perf_event_attr *attr);
+void kvm_clr_pmu_events(u64 clr);
+bool kvm_set_pmuserenr(u64 val);
void kvm_vcpu_pmu_restore_guest(struct kvm_vcpu *vcpu);
void kvm_vcpu_pmu_restore_host(struct kvm_vcpu *vcpu);
void kvm_vcpu_pmu_resync_el0(void);
@@ -159,6 +172,12 @@ static inline u64 kvm_pmu_get_pmceid(struct kvm_vcpu *vcpu, bool pmceid1)
#define kvm_vcpu_has_pmu(vcpu) ({ false; })
static inline void kvm_pmu_update_vcpu_events(struct kvm_vcpu *vcpu) {}
+static inline void kvm_set_pmu_events(u64 set, struct perf_event_attr *attr) {}
+static inline void kvm_clr_pmu_events(u64 clr) {}
+static inline bool kvm_set_pmuserenr(u64 val)
+{
+ return false;
+}
static inline void kvm_vcpu_pmu_restore_guest(struct kvm_vcpu *vcpu) {}
static inline void kvm_vcpu_pmu_restore_host(struct kvm_vcpu *vcpu) {}
static inline void kvm_vcpu_reload_pmu(struct kvm_vcpu *vcpu) {}
--
2.54.0.545.g6539524ca2-goog
* Re: [PATCH v7 02/20] KVM: arm64: Reorganize PMU includes
2026-05-04 21:17 ` [PATCH v7 02/20] KVM: arm64: Reorganize PMU includes Colton Lewis
@ 2026-05-04 21:44 ` sashiko-bot
0 siblings, 0 replies; 42+ messages in thread
From: sashiko-bot @ 2026-05-04 21:44 UTC (permalink / raw)
To: Colton Lewis; +Cc: linux-perf-users
Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [Low] `include/kvm/arm_pmu.h` is no longer self-contained because it relies on `has_vhe()` without including `<asm/virt.h>`.
--
commit 016ed8cf72a663237c0c6836305a2edc3ef74982
Author: Marc Zyngier <maz@kernel.org>
KVM: arm64: Reorganize PMU includes
This commit reorganizes the PMU includes to resolve a circular dependency
between asm/kvm_host.h and asm/arm_pmuv3.h. It removes the inclusion of
kvm_host.h from arm_pmuv3.h and moves the remaining KVM/PMU interfaces into
arm_pmu.h.
[ ... ]
> diff --git a/include/kvm/arm_pmu.h b/include/kvm/arm_pmu.h
> index 96754b51b4116..e91d15a7a564b 100644
> --- a/include/kvm/arm_pmu.h
> +++ b/include/kvm/arm_pmu.h
> @@ -9,9 +9,19 @@
>
> #include <linux/perf_event.h>
> #include <linux/perf/arm_pmuv3.h>
> +#include <linux/perf/arm_pmu.h>
>
> #define KVM_ARMV8_PMU_MAX_COUNTERS 32
>
> +#define kvm_pmu_counter_deferred(attr) \
> + ({ \
> + !has_vhe() && (attr)->exclude_host; \
> + })
This isn't a bug, but by converting kvm_pmu_counter_deferred() from a static
inline function to a macro, do we lose type checking on the attr argument?
Additionally, this macro relies on has_vhe(), but asm/virt.h is not included
in this header. Does this leave the header no longer self-contained, relying
on callers to include asm/virt.h before using this macro?
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260504211813.1804997-1-coltonlewis@google.com?part=2
* [PATCH v7 03/20] KVM: arm64: Reorganize PMU functions
2026-05-04 21:17 [PATCH v7 00/20] ARM64 PMU Partitioning Colton Lewis
2026-05-04 21:17 ` [PATCH v7 01/20] arm64: cpufeature: Add cpucap for HPMN0 Colton Lewis
2026-05-04 21:17 ` [PATCH v7 02/20] KVM: arm64: Reorganize PMU includes Colton Lewis
@ 2026-05-04 21:17 ` Colton Lewis
2026-05-04 22:02 ` sashiko-bot
2026-05-04 21:17 ` [PATCH v7 04/20] perf: arm_pmuv3: Generalize counter bitmasks Colton Lewis
` (17 subsequent siblings)
20 siblings, 1 reply; 42+ messages in thread
From: Colton Lewis @ 2026-05-04 21:17 UTC (permalink / raw)
To: kvm
Cc: Alexandru Elisei, Paolo Bonzini, Jonathan Corbet, Russell King,
Catalin Marinas, Will Deacon, Marc Zyngier, Oliver Upton,
Mingwei Zhang, Joey Gouly, Suzuki K Poulose, Zenghui Yu,
Mark Rutland, Shuah Khan, Ganapatrao Kulkarni, James Clark,
linux-doc, linux-kernel, linux-arm-kernel, kvmarm,
linux-perf-users, linux-kselftest, Colton Lewis
Many of the functions in pmu-emul.c aren't specific to the emulated
PMU implementation. Move them to the more appropriate pmu.c file,
where shared PMU functions should live.
Signed-off-by: Colton Lewis <coltonlewis@google.com>
---
arch/arm64/kvm/pmu-emul.c | 672 +------------------------------------
arch/arm64/kvm/pmu.c | 676 ++++++++++++++++++++++++++++++++++++++
include/kvm/arm_pmu.h | 7 +
3 files changed, 684 insertions(+), 671 deletions(-)
diff --git a/arch/arm64/kvm/pmu-emul.c b/arch/arm64/kvm/pmu-emul.c
index 93cc9bbb5cecd..a40db0d5120ff 100644
--- a/arch/arm64/kvm/pmu-emul.c
+++ b/arch/arm64/kvm/pmu-emul.c
@@ -17,19 +17,10 @@
#define PERF_ATTR_CFG1_COUNTER_64BIT BIT(0)
-static LIST_HEAD(arm_pmus);
-static DEFINE_MUTEX(arm_pmus_lock);
-
static void kvm_pmu_create_perf_event(struct kvm_pmc *pmc);
static void kvm_pmu_release_perf_event(struct kvm_pmc *pmc);
static bool kvm_pmu_counter_is_enabled(struct kvm_pmc *pmc);
-bool kvm_supports_guest_pmuv3(void)
-{
- guard(mutex)(&arm_pmus_lock);
- return !list_empty(&arm_pmus);
-}
-
static struct kvm_vcpu *kvm_pmc_to_vcpu(const struct kvm_pmc *pmc)
{
return container_of(pmc, struct kvm_vcpu, arch.pmu.pmc[pmc->idx]);
@@ -40,46 +31,6 @@ static struct kvm_pmc *kvm_vcpu_idx_to_pmc(struct kvm_vcpu *vcpu, int cnt_idx)
return &vcpu->arch.pmu.pmc[cnt_idx];
}
-static u32 __kvm_pmu_event_mask(unsigned int pmuver)
-{
- switch (pmuver) {
- case ID_AA64DFR0_EL1_PMUVer_IMP:
- return GENMASK(9, 0);
- case ID_AA64DFR0_EL1_PMUVer_V3P1:
- case ID_AA64DFR0_EL1_PMUVer_V3P4:
- case ID_AA64DFR0_EL1_PMUVer_V3P5:
- case ID_AA64DFR0_EL1_PMUVer_V3P7:
- return GENMASK(15, 0);
- default: /* Shouldn't be here, just for sanity */
- WARN_ONCE(1, "Unknown PMU version %d\n", pmuver);
- return 0;
- }
-}
-
-static u32 kvm_pmu_event_mask(struct kvm *kvm)
-{
- u64 dfr0 = kvm_read_vm_id_reg(kvm, SYS_ID_AA64DFR0_EL1);
- u8 pmuver = SYS_FIELD_GET(ID_AA64DFR0_EL1, PMUVer, dfr0);
-
- return __kvm_pmu_event_mask(pmuver);
-}
-
-u64 kvm_pmu_evtyper_mask(struct kvm *kvm)
-{
- u64 mask = ARMV8_PMU_EXCLUDE_EL1 | ARMV8_PMU_EXCLUDE_EL0 |
- kvm_pmu_event_mask(kvm);
-
- if (kvm_has_feat(kvm, ID_AA64PFR0_EL1, EL2, IMP))
- mask |= ARMV8_PMU_INCLUDE_EL2;
-
- if (kvm_has_feat(kvm, ID_AA64PFR0_EL1, EL3, IMP))
- mask |= ARMV8_PMU_EXCLUDE_NS_EL0 |
- ARMV8_PMU_EXCLUDE_NS_EL1 |
- ARMV8_PMU_EXCLUDE_EL3;
-
- return mask;
-}
-
/**
* kvm_pmc_is_64bit - determine if counter is 64bit
* @pmc: counter context
@@ -272,59 +223,6 @@ void kvm_pmu_vcpu_destroy(struct kvm_vcpu *vcpu)
irq_work_sync(&vcpu->arch.pmu.overflow_work);
}
-static u64 kvm_pmu_hyp_counter_mask(struct kvm_vcpu *vcpu)
-{
- unsigned int hpmn, n;
-
- if (!vcpu_has_nv(vcpu))
- return 0;
-
- hpmn = SYS_FIELD_GET(MDCR_EL2, HPMN, __vcpu_sys_reg(vcpu, MDCR_EL2));
- n = vcpu->kvm->arch.nr_pmu_counters;
-
- /*
- * Programming HPMN to a value greater than PMCR_EL0.N is
- * CONSTRAINED UNPREDICTABLE. Make the implementation choice that an
- * UNKNOWN number of counters (in our case, zero) are reserved for EL2.
- */
- if (hpmn >= n)
- return 0;
-
- /*
- * Programming HPMN=0 is CONSTRAINED UNPREDICTABLE if FEAT_HPMN0 isn't
- * implemented. Since KVM's ability to emulate HPMN=0 does not directly
- * depend on hardware (all PMU registers are trapped), make the
- * implementation choice that all counters are included in the second
- * range reserved for EL2/EL3.
- */
- return GENMASK(n - 1, hpmn);
-}
-
-bool kvm_pmu_counter_is_hyp(struct kvm_vcpu *vcpu, unsigned int idx)
-{
- return kvm_pmu_hyp_counter_mask(vcpu) & BIT(idx);
-}
-
-u64 kvm_pmu_accessible_counter_mask(struct kvm_vcpu *vcpu)
-{
- u64 mask = kvm_pmu_implemented_counter_mask(vcpu);
-
- if (!vcpu_has_nv(vcpu) || vcpu_is_el2(vcpu))
- return mask;
-
- return mask & ~kvm_pmu_hyp_counter_mask(vcpu);
-}
-
-u64 kvm_pmu_implemented_counter_mask(struct kvm_vcpu *vcpu)
-{
- u64 val = FIELD_GET(ARMV8_PMU_PMCR_N, kvm_vcpu_read_pmcr(vcpu));
-
- if (val == 0)
- return BIT(ARMV8_PMU_CYCLE_IDX);
- else
- return GENMASK(val - 1, 0) | BIT(ARMV8_PMU_CYCLE_IDX);
-}
-
static void kvm_pmc_enable_perf_event(struct kvm_pmc *pmc)
{
if (!pmc->perf_event) {
@@ -370,7 +268,7 @@ void kvm_pmu_reprogram_counter_mask(struct kvm_vcpu *vcpu, u64 val)
* counter where the values of the global enable control, PMOVSSET_EL0[n], and
* PMINTENSET_EL1[n] are all 1.
*/
-static bool kvm_pmu_overflow_status(struct kvm_vcpu *vcpu)
+bool kvm_pmu_overflow_status(struct kvm_vcpu *vcpu)
{
u64 reg = __vcpu_sys_reg(vcpu, PMOVSSET_EL0);
@@ -393,24 +291,6 @@ static bool kvm_pmu_overflow_status(struct kvm_vcpu *vcpu)
return reg;
}
-static void kvm_pmu_update_state(struct kvm_vcpu *vcpu)
-{
- struct kvm_pmu *pmu = &vcpu->arch.pmu;
- bool overflow;
-
- overflow = kvm_pmu_overflow_status(vcpu);
- if (pmu->irq_level == overflow)
- return;
-
- pmu->irq_level = overflow;
-
- if (likely(irqchip_in_kernel(vcpu->kvm))) {
- int ret = kvm_vgic_inject_irq(vcpu->kvm, vcpu,
- pmu->irq_num, overflow, pmu);
- WARN_ON(ret);
- }
-}
-
bool kvm_pmu_should_notify_user(struct kvm_vcpu *vcpu)
{
struct kvm_pmu *pmu = &vcpu->arch.pmu;
@@ -436,43 +316,6 @@ void kvm_pmu_update_run(struct kvm_vcpu *vcpu)
regs->device_irq_level |= KVM_ARM_DEV_PMU;
}
-/**
- * kvm_pmu_flush_hwstate - flush pmu state to cpu
- * @vcpu: The vcpu pointer
- *
- * Check if the PMU has overflowed while we were running in the host, and inject
- * an interrupt if that was the case.
- */
-void kvm_pmu_flush_hwstate(struct kvm_vcpu *vcpu)
-{
- kvm_pmu_update_state(vcpu);
-}
-
-/**
- * kvm_pmu_sync_hwstate - sync pmu state from cpu
- * @vcpu: The vcpu pointer
- *
- * Check if the PMU has overflowed while we were running in the guest, and
- * inject an interrupt if that was the case.
- */
-void kvm_pmu_sync_hwstate(struct kvm_vcpu *vcpu)
-{
- kvm_pmu_update_state(vcpu);
-}
-
-/*
- * When perf interrupt is an NMI, we cannot safely notify the vcpu corresponding
- * to the event.
- * This is why we need a callback to do it once outside of the NMI context.
- */
-static void kvm_pmu_perf_overflow_notify_vcpu(struct irq_work *work)
-{
- struct kvm_vcpu *vcpu;
-
- vcpu = container_of(work, struct kvm_vcpu, arch.pmu.overflow_work);
- kvm_vcpu_kick(vcpu);
-}
-
/*
* Perform an increment on any of the counters described in @mask,
* generating the overflow if required, and propagate it as a chained
@@ -784,132 +627,6 @@ void kvm_pmu_set_counter_event_type(struct kvm_vcpu *vcpu, u64 data,
kvm_pmu_create_perf_event(pmc);
}
-void kvm_host_pmu_init(struct arm_pmu *pmu)
-{
- struct arm_pmu_entry *entry;
-
- /*
- * Check the sanitised PMU version for the system, as KVM does not
- * support implementations where PMUv3 exists on a subset of CPUs.
- */
- if (!pmuv3_implemented(kvm_arm_pmu_get_pmuver_limit()))
- return;
-
- guard(mutex)(&arm_pmus_lock);
-
- entry = kmalloc_obj(*entry);
- if (!entry)
- return;
-
- entry->arm_pmu = pmu;
- list_add_tail(&entry->entry, &arm_pmus);
-}
-
-static struct arm_pmu *kvm_pmu_probe_armpmu(void)
-{
- struct arm_pmu_entry *entry;
- struct arm_pmu *pmu;
- int cpu;
-
- guard(mutex)(&arm_pmus_lock);
-
- /*
- * It is safe to use a stale cpu to iterate the list of PMUs so long as
- * the same value is used for the entirety of the loop. Given this, and
- * the fact that no percpu data is used for the lookup there is no need
- * to disable preemption.
- *
- * It is still necessary to get a valid cpu, though, to probe for the
- * default PMU instance as userspace is not required to specify a PMU
- * type. In order to uphold the preexisting behavior KVM selects the
- * PMU instance for the core during vcpu init. A dependent use
- * case would be a user with disdain of all things big.LITTLE that
- * affines the VMM to a particular cluster of cores.
- *
- * In any case, userspace should just do the sane thing and use the UAPI
- * to select a PMU type directly. But, be wary of the baggage being
- * carried here.
- */
- cpu = raw_smp_processor_id();
- list_for_each_entry(entry, &arm_pmus, entry) {
- pmu = entry->arm_pmu;
-
- if (cpumask_test_cpu(cpu, &pmu->supported_cpus))
- return pmu;
- }
-
- return NULL;
-}
-
-static u64 __compute_pmceid(struct arm_pmu *pmu, bool pmceid1)
-{
- u32 hi[2], lo[2];
-
- bitmap_to_arr32(lo, pmu->pmceid_bitmap, ARMV8_PMUV3_MAX_COMMON_EVENTS);
- bitmap_to_arr32(hi, pmu->pmceid_ext_bitmap, ARMV8_PMUV3_MAX_COMMON_EVENTS);
-
- return ((u64)hi[pmceid1] << 32) | lo[pmceid1];
-}
-
-static u64 compute_pmceid0(struct arm_pmu *pmu)
-{
- u64 val = __compute_pmceid(pmu, 0);
-
- /* always support SW_INCR */
- val |= BIT(ARMV8_PMUV3_PERFCTR_SW_INCR);
- /* always support CHAIN */
- val |= BIT(ARMV8_PMUV3_PERFCTR_CHAIN);
- return val;
-}
-
-static u64 compute_pmceid1(struct arm_pmu *pmu)
-{
- u64 val = __compute_pmceid(pmu, 1);
-
- /*
- * Don't advertise STALL_SLOT*, as PMMIR_EL0 is handled
- * as RAZ
- */
- val &= ~(BIT_ULL(ARMV8_PMUV3_PERFCTR_STALL_SLOT - 32) |
- BIT_ULL(ARMV8_PMUV3_PERFCTR_STALL_SLOT_FRONTEND - 32) |
- BIT_ULL(ARMV8_PMUV3_PERFCTR_STALL_SLOT_BACKEND - 32));
- return val;
-}
-
-u64 kvm_pmu_get_pmceid(struct kvm_vcpu *vcpu, bool pmceid1)
-{
- struct arm_pmu *cpu_pmu = vcpu->kvm->arch.arm_pmu;
- unsigned long *bmap = vcpu->kvm->arch.pmu_filter;
- u64 val, mask = 0;
- int base, i, nr_events;
-
- if (!pmceid1) {
- val = compute_pmceid0(cpu_pmu);
- base = 0;
- } else {
- val = compute_pmceid1(cpu_pmu);
- base = 32;
- }
-
- if (!bmap)
- return val;
-
- nr_events = kvm_pmu_event_mask(vcpu->kvm) + 1;
-
- for (i = 0; i < 32; i += 8) {
- u64 byte;
-
- byte = bitmap_get_value8(bmap, base + i);
- mask |= byte << i;
- if (nr_events >= (0x4000 + base + 32)) {
- byte = bitmap_get_value8(bmap, 0x4000 + base + i);
- mask |= byte << (32 + i);
- }
- }
-
- return val & mask;
-}
-
void kvm_vcpu_reload_pmu(struct kvm_vcpu *vcpu)
{
u64 mask = kvm_pmu_implemented_counter_mask(vcpu);
@@ -921,393 +638,6 @@ void kvm_vcpu_reload_pmu(struct kvm_vcpu *vcpu)
kvm_pmu_reprogram_counter_mask(vcpu, mask);
}
-int kvm_arm_pmu_v3_enable(struct kvm_vcpu *vcpu)
-{
- if (!vcpu->arch.pmu.created)
- return -EINVAL;
-
- /*
- * A valid interrupt configuration for the PMU is either to have a
- * properly configured interrupt number and using an in-kernel
- * irqchip, or to not have an in-kernel GIC and not set an IRQ.
- */
- if (irqchip_in_kernel(vcpu->kvm)) {
- int irq = vcpu->arch.pmu.irq_num;
- /*
- * If we are using an in-kernel vgic, at this point we know
- * the vgic will be initialized, so we can check the PMU irq
- * number against the dimensions of the vgic and make sure
- * it's valid.
- */
- if (!irq_is_ppi(irq) && !vgic_valid_spi(vcpu->kvm, irq))
- return -EINVAL;
- } else if (kvm_arm_pmu_irq_initialized(vcpu)) {
- return -EINVAL;
- }
-
- return 0;
-}
-
-static int kvm_arm_pmu_v3_init(struct kvm_vcpu *vcpu)
-{
- if (irqchip_in_kernel(vcpu->kvm)) {
- int ret;
-
- /*
- * If using the PMU with an in-kernel virtual GIC
- * implementation, we require the GIC to be already
- * initialized when initializing the PMU.
- */
- if (!vgic_initialized(vcpu->kvm))
- return -ENODEV;
-
- if (!kvm_arm_pmu_irq_initialized(vcpu))
- return -ENXIO;
-
- ret = kvm_vgic_set_owner(vcpu, vcpu->arch.pmu.irq_num,
- &vcpu->arch.pmu);
- if (ret)
- return ret;
- }
-
- init_irq_work(&vcpu->arch.pmu.overflow_work,
- kvm_pmu_perf_overflow_notify_vcpu);
-
- vcpu->arch.pmu.created = true;
- return 0;
-}
-
-/*
- * For one VM the interrupt type must be same for each vcpu.
- * As a PPI, the interrupt number is the same for all vcpus,
- * while as an SPI it must be a separate number per vcpu.
- */
-static bool pmu_irq_is_valid(struct kvm *kvm, int irq)
-{
- unsigned long i;
- struct kvm_vcpu *vcpu;
-
- kvm_for_each_vcpu(i, vcpu, kvm) {
- if (!kvm_arm_pmu_irq_initialized(vcpu))
- continue;
-
- if (irq_is_ppi(irq)) {
- if (vcpu->arch.pmu.irq_num != irq)
- return false;
- } else {
- if (vcpu->arch.pmu.irq_num == irq)
- return false;
- }
- }
-
- return true;
-}
-
-/**
- * kvm_arm_pmu_get_max_counters - Return the max number of PMU counters.
- * @kvm: The kvm pointer
- */
-u8 kvm_arm_pmu_get_max_counters(struct kvm *kvm)
-{
- struct arm_pmu *arm_pmu = kvm->arch.arm_pmu;
-
- /*
- * PMUv3 requires that all event counters are capable of counting any
- * event, though the same may not be true of non-PMUv3 hardware.
- */
- if (cpus_have_final_cap(ARM64_WORKAROUND_PMUV3_IMPDEF_TRAPS))
- return 1;
-
- /*
- * The arm_pmu->cntr_mask considers the fixed counter(s) as well.
- * Ignore those and return only the general-purpose counters.
- */
- return bitmap_weight(arm_pmu->cntr_mask, ARMV8_PMU_MAX_GENERAL_COUNTERS);
-}
-
-static void kvm_arm_set_nr_counters(struct kvm *kvm, unsigned int nr)
-{
- kvm->arch.nr_pmu_counters = nr;
-
- /* Reset MDCR_EL2.HPMN behind the vcpus' back... */
- if (test_bit(KVM_ARM_VCPU_HAS_EL2, kvm->arch.vcpu_features)) {
- struct kvm_vcpu *vcpu;
- unsigned long i;
-
- kvm_for_each_vcpu(i, vcpu, kvm) {
- u64 val = __vcpu_sys_reg(vcpu, MDCR_EL2);
- val &= ~MDCR_EL2_HPMN;
- val |= FIELD_PREP(MDCR_EL2_HPMN, kvm->arch.nr_pmu_counters);
- __vcpu_assign_sys_reg(vcpu, MDCR_EL2, val);
- }
- }
-}
-
-static void kvm_arm_set_pmu(struct kvm *kvm, struct arm_pmu *arm_pmu)
-{
- lockdep_assert_held(&kvm->arch.config_lock);
-
- kvm->arch.arm_pmu = arm_pmu;
- kvm_arm_set_nr_counters(kvm, kvm_arm_pmu_get_max_counters(kvm));
-}
-
-/**
- * kvm_arm_set_default_pmu - No PMU set, get the default one.
- * @kvm: The kvm pointer
- *
- * The observant among you will notice that the supported_cpus
- * mask does not get updated for the default PMU even though it
- * is quite possible the selected instance supports only a
- * subset of cores in the system. This is intentional, and
- * upholds the preexisting behavior on heterogeneous systems
- * where vCPUs can be scheduled on any core but the guest
- * counters could stop working.
- */
-int kvm_arm_set_default_pmu(struct kvm *kvm)
-{
- struct arm_pmu *arm_pmu = kvm_pmu_probe_armpmu();
-
- if (!arm_pmu)
- return -ENODEV;
-
- kvm_arm_set_pmu(kvm, arm_pmu);
- return 0;
-}
-
-static int kvm_arm_pmu_v3_set_pmu(struct kvm_vcpu *vcpu, int pmu_id)
-{
- struct kvm *kvm = vcpu->kvm;
- struct arm_pmu_entry *entry;
- struct arm_pmu *arm_pmu;
- int ret = -ENXIO;
-
- lockdep_assert_held(&kvm->arch.config_lock);
- mutex_lock(&arm_pmus_lock);
-
- list_for_each_entry(entry, &arm_pmus, entry) {
- arm_pmu = entry->arm_pmu;
- if (arm_pmu->pmu.type == pmu_id) {
- if (kvm_vm_has_ran_once(kvm) ||
- (kvm->arch.pmu_filter && kvm->arch.arm_pmu != arm_pmu)) {
- ret = -EBUSY;
- break;
- }
-
- kvm_arm_set_pmu(kvm, arm_pmu);
- cpumask_copy(kvm->arch.supported_cpus, &arm_pmu->supported_cpus);
- ret = 0;
- break;
- }
- }
-
- mutex_unlock(&arm_pmus_lock);
- return ret;
-}
-
-static int kvm_arm_pmu_v3_set_nr_counters(struct kvm_vcpu *vcpu, unsigned int n)
-{
- struct kvm *kvm = vcpu->kvm;
-
- if (!kvm->arch.arm_pmu)
- return -EINVAL;
-
- if (n > kvm_arm_pmu_get_max_counters(kvm))
- return -EINVAL;
-
- kvm_arm_set_nr_counters(kvm, n);
- return 0;
-}
-
-int kvm_arm_pmu_v3_set_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
-{
- struct kvm *kvm = vcpu->kvm;
-
- lockdep_assert_held(&kvm->arch.config_lock);
-
- if (!kvm_vcpu_has_pmu(vcpu))
- return -ENODEV;
-
- if (vcpu->arch.pmu.created)
- return -EBUSY;
-
- switch (attr->attr) {
- case KVM_ARM_VCPU_PMU_V3_IRQ: {
- int __user *uaddr = (int __user *)(long)attr->addr;
- int irq;
-
- if (!irqchip_in_kernel(kvm))
- return -EINVAL;
-
- if (get_user(irq, uaddr))
- return -EFAULT;
-
- /* The PMU overflow interrupt can be a PPI or a valid SPI. */
- if (!(irq_is_ppi(irq) || irq_is_spi(irq)))
- return -EINVAL;
-
- if (!pmu_irq_is_valid(kvm, irq))
- return -EINVAL;
-
- if (kvm_arm_pmu_irq_initialized(vcpu))
- return -EBUSY;
-
- kvm_debug("Set kvm ARM PMU irq: %d\n", irq);
- vcpu->arch.pmu.irq_num = irq;
- return 0;
- }
- case KVM_ARM_VCPU_PMU_V3_FILTER: {
- u8 pmuver = kvm_arm_pmu_get_pmuver_limit();
- struct kvm_pmu_event_filter __user *uaddr;
- struct kvm_pmu_event_filter filter;
- int nr_events;
-
- /*
- * Allow userspace to specify an event filter for the entire
- * event range supported by PMUVer of the hardware, rather
- * than the guest's PMUVer for KVM backward compatibility.
- */
- nr_events = __kvm_pmu_event_mask(pmuver) + 1;
-
- uaddr = (struct kvm_pmu_event_filter __user *)(long)attr->addr;
-
- if (copy_from_user(&filter, uaddr, sizeof(filter)))
- return -EFAULT;
-
- if (((u32)filter.base_event + filter.nevents) > nr_events ||
- (filter.action != KVM_PMU_EVENT_ALLOW &&
- filter.action != KVM_PMU_EVENT_DENY))
- return -EINVAL;
-
- if (kvm_vm_has_ran_once(kvm))
- return -EBUSY;
-
- if (!kvm->arch.pmu_filter) {
- kvm->arch.pmu_filter = bitmap_alloc(nr_events, GFP_KERNEL_ACCOUNT);
- if (!kvm->arch.pmu_filter)
- return -ENOMEM;
-
- /*
- * The default depends on the first applied filter.
- * If it allows events, the default is to deny.
- * Conversely, if the first filter denies a set of
- * events, the default is to allow.
- */
- if (filter.action == KVM_PMU_EVENT_ALLOW)
- bitmap_zero(kvm->arch.pmu_filter, nr_events);
- else
- bitmap_fill(kvm->arch.pmu_filter, nr_events);
- }
-
- if (filter.action == KVM_PMU_EVENT_ALLOW)
- bitmap_set(kvm->arch.pmu_filter, filter.base_event, filter.nevents);
- else
- bitmap_clear(kvm->arch.pmu_filter, filter.base_event, filter.nevents);
-
- return 0;
- }
- case KVM_ARM_VCPU_PMU_V3_SET_PMU: {
- int __user *uaddr = (int __user *)(long)attr->addr;
- int pmu_id;
-
- if (get_user(pmu_id, uaddr))
- return -EFAULT;
-
- return kvm_arm_pmu_v3_set_pmu(vcpu, pmu_id);
- }
- case KVM_ARM_VCPU_PMU_V3_SET_NR_COUNTERS: {
- unsigned int __user *uaddr = (unsigned int __user *)(long)attr->addr;
- unsigned int n;
-
- if (get_user(n, uaddr))
- return -EFAULT;
-
- return kvm_arm_pmu_v3_set_nr_counters(vcpu, n);
- }
- case KVM_ARM_VCPU_PMU_V3_INIT:
- return kvm_arm_pmu_v3_init(vcpu);
- }
-
- return -ENXIO;
-}
-
-int kvm_arm_pmu_v3_get_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
-{
- switch (attr->attr) {
- case KVM_ARM_VCPU_PMU_V3_IRQ: {
- int __user *uaddr = (int __user *)(long)attr->addr;
- int irq;
-
- if (!irqchip_in_kernel(vcpu->kvm))
- return -EINVAL;
-
- if (!kvm_vcpu_has_pmu(vcpu))
- return -ENODEV;
-
- if (!kvm_arm_pmu_irq_initialized(vcpu))
- return -ENXIO;
-
- irq = vcpu->arch.pmu.irq_num;
- return put_user(irq, uaddr);
- }
- }
-
- return -ENXIO;
-}
-
-int kvm_arm_pmu_v3_has_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
-{
- switch (attr->attr) {
- case KVM_ARM_VCPU_PMU_V3_IRQ:
- case KVM_ARM_VCPU_PMU_V3_INIT:
- case KVM_ARM_VCPU_PMU_V3_FILTER:
- case KVM_ARM_VCPU_PMU_V3_SET_PMU:
- case KVM_ARM_VCPU_PMU_V3_SET_NR_COUNTERS:
- if (kvm_vcpu_has_pmu(vcpu))
- return 0;
- }
-
- return -ENXIO;
-}
-
-u8 kvm_arm_pmu_get_pmuver_limit(void)
-{
- unsigned int pmuver;
-
- pmuver = SYS_FIELD_GET(ID_AA64DFR0_EL1, PMUVer,
- read_sanitised_ftr_reg(SYS_ID_AA64DFR0_EL1));
-
- /*
- * Spoof a barebones PMUv3 implementation if the system supports IMPDEF
- * traps of the PMUv3 sysregs
- */
- if (cpus_have_final_cap(ARM64_WORKAROUND_PMUV3_IMPDEF_TRAPS))
- return ID_AA64DFR0_EL1_PMUVer_IMP;
-
- /*
- * Otherwise, treat IMPLEMENTATION DEFINED functionality as
- * unimplemented
- */
- if (pmuver == ID_AA64DFR0_EL1_PMUVer_IMP_DEF)
- return 0;
-
- return min(pmuver, ID_AA64DFR0_EL1_PMUVer_V3P5);
-}
-
-/**
- * kvm_vcpu_read_pmcr - Read PMCR_EL0 register for the vCPU
- * @vcpu: The vcpu pointer
- */
-u64 kvm_vcpu_read_pmcr(struct kvm_vcpu *vcpu)
-{
- u64 pmcr = __vcpu_sys_reg(vcpu, PMCR_EL0);
- u64 n = vcpu->kvm->arch.nr_pmu_counters;
-
- if (vcpu_has_nv(vcpu) && !vcpu_is_el2(vcpu))
- n = FIELD_GET(MDCR_EL2_HPMN, __vcpu_sys_reg(vcpu, MDCR_EL2));
-
- return u64_replace_bits(pmcr, n, ARMV8_PMU_PMCR_N);
-}
-
void kvm_pmu_nested_transition(struct kvm_vcpu *vcpu)
{
bool reprogrammed = false;
diff --git a/arch/arm64/kvm/pmu.c b/arch/arm64/kvm/pmu.c
index 6b48a3d16d0d5..ee2f0f7e61bcf 100644
--- a/arch/arm64/kvm/pmu.c
+++ b/arch/arm64/kvm/pmu.c
@@ -8,8 +8,22 @@
#include <linux/perf/arm_pmu.h>
#include <linux/perf/arm_pmuv3.h>
+#include <kvm/arm_pmu.h>
+
+#include <asm/kvm_emulate.h>
+
+static LIST_HEAD(arm_pmus);
+static DEFINE_MUTEX(arm_pmus_lock);
static DEFINE_PER_CPU(struct kvm_pmu_events, kvm_pmu_events);
+#define kvm_arm_pmu_irq_initialized(v) ((v)->arch.pmu.irq_num >= VGIC_NR_SGIS)
+
+bool kvm_supports_guest_pmuv3(void)
+{
+ guard(mutex)(&arm_pmus_lock);
+ return !list_empty(&arm_pmus);
+}
+
/*
* Given the perf event attributes and system type, determine
* if we are going to need to switch counters at guest entry/exit.
@@ -209,3 +223,665 @@ void kvm_vcpu_pmu_resync_el0(void)
kvm_make_request(KVM_REQ_RESYNC_PMU_EL0, vcpu);
}
+
+void kvm_host_pmu_init(struct arm_pmu *pmu)
+{
+ struct arm_pmu_entry *entry;
+
+ /*
+ * Check the sanitised PMU version for the system, as KVM does not
+ * support implementations where PMUv3 exists on a subset of CPUs.
+ */
+ if (!pmuv3_implemented(kvm_arm_pmu_get_pmuver_limit()))
+ return;
+
+ guard(mutex)(&arm_pmus_lock);
+
+ entry = kmalloc_obj(*entry);
+ if (!entry)
+ return;
+
+ entry->arm_pmu = pmu;
+ list_add_tail(&entry->entry, &arm_pmus);
+}
+
+static struct arm_pmu *kvm_pmu_probe_armpmu(void)
+{
+ struct arm_pmu_entry *entry;
+ struct arm_pmu *pmu;
+ int cpu;
+
+ guard(mutex)(&arm_pmus_lock);
+
+ /*
+ * It is safe to use a stale cpu to iterate the list of PMUs so long as
+ * the same value is used for the entirety of the loop. Given this, and
+ * the fact that no percpu data is used for the lookup there is no need
+ * to disable preemption.
+ *
+ * It is still necessary to get a valid cpu, though, to probe for the
+ * default PMU instance as userspace is not required to specify a PMU
+ * type. In order to uphold the preexisting behavior KVM selects the
+ * PMU instance for the core during vcpu init. A dependent use
+ * case would be a user with disdain of all things big.LITTLE that
+ * affines the VMM to a particular cluster of cores.
+ *
+ * In any case, userspace should just do the sane thing and use the UAPI
+ * to select a PMU type directly. But, be wary of the baggage being
+ * carried here.
+ */
+ cpu = raw_smp_processor_id();
+ list_for_each_entry(entry, &arm_pmus, entry) {
+ pmu = entry->arm_pmu;
+
+ if (cpumask_test_cpu(cpu, &pmu->supported_cpus))
+ return pmu;
+ }
+
+ return NULL;
+}
+
+static u64 __compute_pmceid(struct arm_pmu *pmu, bool pmceid1)
+{
+ u32 hi[2], lo[2];
+
+ bitmap_to_arr32(lo, pmu->pmceid_bitmap, ARMV8_PMUV3_MAX_COMMON_EVENTS);
+ bitmap_to_arr32(hi, pmu->pmceid_ext_bitmap, ARMV8_PMUV3_MAX_COMMON_EVENTS);
+
+ return ((u64)hi[pmceid1] << 32) | lo[pmceid1];
+}
+
+static u64 compute_pmceid0(struct arm_pmu *pmu)
+{
+ u64 val = __compute_pmceid(pmu, 0);
+
+ /* always support SW_INCR */
+ val |= BIT(ARMV8_PMUV3_PERFCTR_SW_INCR);
+ /* always support CHAIN */
+ val |= BIT(ARMV8_PMUV3_PERFCTR_CHAIN);
+ return val;
+}
+
+static u64 compute_pmceid1(struct arm_pmu *pmu)
+{
+ u64 val = __compute_pmceid(pmu, 1);
+
+ /*
+ * Don't advertise STALL_SLOT*, as PMMIR_EL0 is handled
+ * as RAZ
+ */
+ val &= ~(BIT_ULL(ARMV8_PMUV3_PERFCTR_STALL_SLOT - 32) |
+ BIT_ULL(ARMV8_PMUV3_PERFCTR_STALL_SLOT_FRONTEND - 32) |
+ BIT_ULL(ARMV8_PMUV3_PERFCTR_STALL_SLOT_BACKEND - 32));
+ return val;
+}
+
+u64 kvm_pmu_get_pmceid(struct kvm_vcpu *vcpu, bool pmceid1)
+{
+ struct arm_pmu *cpu_pmu = vcpu->kvm->arch.arm_pmu;
+ unsigned long *bmap = vcpu->kvm->arch.pmu_filter;
+ u64 val, mask = 0;
+ int base, i, nr_events;
+
+ if (!pmceid1) {
+ val = compute_pmceid0(cpu_pmu);
+ base = 0;
+ } else {
+ val = compute_pmceid1(cpu_pmu);
+ base = 32;
+ }
+
+ if (!bmap)
+ return val;
+
+ nr_events = kvm_pmu_event_mask(vcpu->kvm) + 1;
+
+ for (i = 0; i < 32; i += 8) {
+ u64 byte;
+
+ byte = bitmap_get_value8(bmap, base + i);
+ mask |= byte << i;
+ if (nr_events >= (0x4000 + base + 32)) {
+ byte = bitmap_get_value8(bmap, 0x4000 + base + i);
+ mask |= byte << (32 + i);
+ }
+ }
+
+ return val & mask;
+}
+
+/*
+ * When perf interrupt is an NMI, we cannot safely notify the vcpu corresponding
+ * to the event.
+ * This is why we need a callback to do it once outside of the NMI context.
+ */
+static void kvm_pmu_perf_overflow_notify_vcpu(struct irq_work *work)
+{
+ struct kvm_vcpu *vcpu;
+
+ vcpu = container_of(work, struct kvm_vcpu, arch.pmu.overflow_work);
+ kvm_vcpu_kick(vcpu);
+}
+
+static u32 __kvm_pmu_event_mask(unsigned int pmuver)
+{
+ switch (pmuver) {
+ case ID_AA64DFR0_EL1_PMUVer_IMP:
+ return GENMASK(9, 0);
+ case ID_AA64DFR0_EL1_PMUVer_V3P1:
+ case ID_AA64DFR0_EL1_PMUVer_V3P4:
+ case ID_AA64DFR0_EL1_PMUVer_V3P5:
+ case ID_AA64DFR0_EL1_PMUVer_V3P7:
+ return GENMASK(15, 0);
+ default: /* Shouldn't be here, just for sanity */
+ WARN_ONCE(1, "Unknown PMU version %d\n", pmuver);
+ return 0;
+ }
+}
+
+u32 kvm_pmu_event_mask(struct kvm *kvm)
+{
+ u64 dfr0 = kvm_read_vm_id_reg(kvm, SYS_ID_AA64DFR0_EL1);
+ u8 pmuver = SYS_FIELD_GET(ID_AA64DFR0_EL1, PMUVer, dfr0);
+
+ return __kvm_pmu_event_mask(pmuver);
+}
+
+u64 kvm_pmu_evtyper_mask(struct kvm *kvm)
+{
+ u64 mask = ARMV8_PMU_EXCLUDE_EL1 | ARMV8_PMU_EXCLUDE_EL0 |
+ kvm_pmu_event_mask(kvm);
+
+ if (kvm_has_feat(kvm, ID_AA64PFR0_EL1, EL2, IMP))
+ mask |= ARMV8_PMU_INCLUDE_EL2;
+
+ if (kvm_has_feat(kvm, ID_AA64PFR0_EL1, EL3, IMP))
+ mask |= ARMV8_PMU_EXCLUDE_NS_EL0 |
+ ARMV8_PMU_EXCLUDE_NS_EL1 |
+ ARMV8_PMU_EXCLUDE_EL3;
+
+ return mask;
+}
+
+static void kvm_pmu_update_state(struct kvm_vcpu *vcpu)
+{
+ struct kvm_pmu *pmu = &vcpu->arch.pmu;
+ bool overflow;
+
+ overflow = kvm_pmu_overflow_status(vcpu);
+ if (pmu->irq_level == overflow)
+ return;
+
+ pmu->irq_level = overflow;
+
+ if (likely(irqchip_in_kernel(vcpu->kvm))) {
+ int ret = kvm_vgic_inject_irq(vcpu->kvm, vcpu,
+ pmu->irq_num, overflow, pmu);
+ WARN_ON(ret);
+ }
+}
+
+/**
+ * kvm_pmu_flush_hwstate - flush pmu state to cpu
+ * @vcpu: The vcpu pointer
+ *
+ * Check if the PMU has overflowed while we were running in the host, and inject
+ * an interrupt if that was the case.
+ */
+void kvm_pmu_flush_hwstate(struct kvm_vcpu *vcpu)
+{
+ kvm_pmu_update_state(vcpu);
+}
+
+/**
+ * kvm_pmu_sync_hwstate - sync pmu state from cpu
+ * @vcpu: The vcpu pointer
+ *
+ * Check if the PMU has overflowed while we were running in the guest, and
+ * inject an interrupt if that was the case.
+ */
+void kvm_pmu_sync_hwstate(struct kvm_vcpu *vcpu)
+{
+ kvm_pmu_update_state(vcpu);
+}
+
+int kvm_arm_pmu_v3_enable(struct kvm_vcpu *vcpu)
+{
+ if (!vcpu->arch.pmu.created)
+ return -EINVAL;
+
+ /*
+ * A valid interrupt configuration for the PMU is either to have a
+ * properly configured interrupt number and using an in-kernel
+ * irqchip, or to not have an in-kernel GIC and not set an IRQ.
+ */
+ if (irqchip_in_kernel(vcpu->kvm)) {
+ int irq = vcpu->arch.pmu.irq_num;
+ /*
+ * If we are using an in-kernel vgic, at this point we know
+ * the vgic will be initialized, so we can check the PMU irq
+ * number against the dimensions of the vgic and make sure
+ * it's valid.
+ */
+ if (!irq_is_ppi(irq) && !vgic_valid_spi(vcpu->kvm, irq))
+ return -EINVAL;
+ } else if (kvm_arm_pmu_irq_initialized(vcpu)) {
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
+static int kvm_arm_pmu_v3_init(struct kvm_vcpu *vcpu)
+{
+ if (irqchip_in_kernel(vcpu->kvm)) {
+ int ret;
+
+ /*
+ * If using the PMU with an in-kernel virtual GIC
+ * implementation, we require the GIC to be already
+ * initialized when initializing the PMU.
+ */
+ if (!vgic_initialized(vcpu->kvm))
+ return -ENODEV;
+
+ if (!kvm_arm_pmu_irq_initialized(vcpu))
+ return -ENXIO;
+
+ ret = kvm_vgic_set_owner(vcpu, vcpu->arch.pmu.irq_num,
+ &vcpu->arch.pmu);
+ if (ret)
+ return ret;
+ }
+
+ init_irq_work(&vcpu->arch.pmu.overflow_work,
+ kvm_pmu_perf_overflow_notify_vcpu);
+
+ vcpu->arch.pmu.created = true;
+ return 0;
+}
+
+/*
+ * For one VM the interrupt type must be same for each vcpu.
+ * As a PPI, the interrupt number is the same for all vcpus,
+ * while as an SPI it must be a separate number per vcpu.
+ */
+static bool pmu_irq_is_valid(struct kvm *kvm, int irq)
+{
+ unsigned long i;
+ struct kvm_vcpu *vcpu;
+
+ kvm_for_each_vcpu(i, vcpu, kvm) {
+ if (!kvm_arm_pmu_irq_initialized(vcpu))
+ continue;
+
+ if (irq_is_ppi(irq)) {
+ if (vcpu->arch.pmu.irq_num != irq)
+ return false;
+ } else {
+ if (vcpu->arch.pmu.irq_num == irq)
+ return false;
+ }
+ }
+
+ return true;
+}
+
+/**
+ * kvm_arm_pmu_get_max_counters - Return the max number of PMU counters.
+ * @kvm: The kvm pointer
+ */
+u8 kvm_arm_pmu_get_max_counters(struct kvm *kvm)
+{
+ struct arm_pmu *arm_pmu = kvm->arch.arm_pmu;
+
+ /*
+ * PMUv3 requires that all event counters are capable of counting any
+ * event, though the same may not be true of non-PMUv3 hardware.
+ */
+ if (cpus_have_final_cap(ARM64_WORKAROUND_PMUV3_IMPDEF_TRAPS))
+ return 1;
+
+ /*
+ * The arm_pmu->cntr_mask considers the fixed counter(s) as well.
+ * Ignore those and return only the general-purpose counters.
+ */
+ return bitmap_weight(arm_pmu->cntr_mask, ARMV8_PMU_MAX_GENERAL_COUNTERS);
+}
+
+static void kvm_arm_set_nr_counters(struct kvm *kvm, unsigned int nr)
+{
+ kvm->arch.nr_pmu_counters = nr;
+
+ /* Reset MDCR_EL2.HPMN behind the vcpus' back... */
+ if (test_bit(KVM_ARM_VCPU_HAS_EL2, kvm->arch.vcpu_features)) {
+ struct kvm_vcpu *vcpu;
+ unsigned long i;
+
+ kvm_for_each_vcpu(i, vcpu, kvm) {
+ u64 val = __vcpu_sys_reg(vcpu, MDCR_EL2);
+
+ val &= ~MDCR_EL2_HPMN;
+ val |= FIELD_PREP(MDCR_EL2_HPMN, kvm->arch.nr_pmu_counters);
+ __vcpu_assign_sys_reg(vcpu, MDCR_EL2, val);
+ }
+ }
+}
+
+static void kvm_arm_set_pmu(struct kvm *kvm, struct arm_pmu *arm_pmu)
+{
+ lockdep_assert_held(&kvm->arch.config_lock);
+
+ kvm->arch.arm_pmu = arm_pmu;
+ kvm_arm_set_nr_counters(kvm, kvm_arm_pmu_get_max_counters(kvm));
+}
+
+/**
+ * kvm_arm_set_default_pmu - No PMU set, get the default one.
+ * @kvm: The kvm pointer
+ *
+ * The observant among you will notice that the supported_cpus
+ * mask does not get updated for the default PMU even though it
+ * is quite possible the selected instance supports only a
+ * subset of cores in the system. This is intentional, and
+ * upholds the preexisting behavior on heterogeneous systems
+ * where vCPUs can be scheduled on any core but the guest
+ * counters could stop working.
+ */
+int kvm_arm_set_default_pmu(struct kvm *kvm)
+{
+ struct arm_pmu *arm_pmu = kvm_pmu_probe_armpmu();
+
+ if (!arm_pmu)
+ return -ENODEV;
+
+ kvm_arm_set_pmu(kvm, arm_pmu);
+ return 0;
+}
+
+static int kvm_arm_pmu_v3_set_pmu(struct kvm_vcpu *vcpu, int pmu_id)
+{
+ struct kvm *kvm = vcpu->kvm;
+ struct arm_pmu_entry *entry;
+ struct arm_pmu *arm_pmu;
+ int ret = -ENXIO;
+
+ lockdep_assert_held(&kvm->arch.config_lock);
+ mutex_lock(&arm_pmus_lock);
+
+ list_for_each_entry(entry, &arm_pmus, entry) {
+ arm_pmu = entry->arm_pmu;
+ if (arm_pmu->pmu.type == pmu_id) {
+ if (kvm_vm_has_ran_once(kvm) ||
+ (kvm->arch.pmu_filter && kvm->arch.arm_pmu != arm_pmu)) {
+ ret = -EBUSY;
+ break;
+ }
+
+ kvm_arm_set_pmu(kvm, arm_pmu);
+ cpumask_copy(kvm->arch.supported_cpus, &arm_pmu->supported_cpus);
+ ret = 0;
+ break;
+ }
+ }
+
+ mutex_unlock(&arm_pmus_lock);
+ return ret;
+}
+
+static int kvm_arm_pmu_v3_set_nr_counters(struct kvm_vcpu *vcpu, unsigned int n)
+{
+ struct kvm *kvm = vcpu->kvm;
+
+ if (!kvm->arch.arm_pmu)
+ return -EINVAL;
+
+ if (n > kvm_arm_pmu_get_max_counters(kvm))
+ return -EINVAL;
+
+ kvm_arm_set_nr_counters(kvm, n);
+ return 0;
+}
+
+int kvm_arm_pmu_v3_set_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
+{
+ struct kvm *kvm = vcpu->kvm;
+
+ lockdep_assert_held(&kvm->arch.config_lock);
+
+ if (!kvm_vcpu_has_pmu(vcpu))
+ return -ENODEV;
+
+ if (vcpu->arch.pmu.created)
+ return -EBUSY;
+
+ switch (attr->attr) {
+ case KVM_ARM_VCPU_PMU_V3_IRQ: {
+ int __user *uaddr = (int __user *)(long)attr->addr;
+ int irq;
+
+ if (!irqchip_in_kernel(kvm))
+ return -EINVAL;
+
+ if (get_user(irq, uaddr))
+ return -EFAULT;
+
+ /* The PMU overflow interrupt can be a PPI or a valid SPI. */
+ if (!(irq_is_ppi(irq) || irq_is_spi(irq)))
+ return -EINVAL;
+
+ if (!pmu_irq_is_valid(kvm, irq))
+ return -EINVAL;
+
+ if (kvm_arm_pmu_irq_initialized(vcpu))
+ return -EBUSY;
+
+ kvm_debug("Set kvm ARM PMU irq: %d\n", irq);
+ vcpu->arch.pmu.irq_num = irq;
+ return 0;
+ }
+ case KVM_ARM_VCPU_PMU_V3_FILTER: {
+ u8 pmuver = kvm_arm_pmu_get_pmuver_limit();
+ struct kvm_pmu_event_filter __user *uaddr;
+ struct kvm_pmu_event_filter filter;
+ int nr_events;
+
+ /*
+ * Allow userspace to specify an event filter for the entire
+ * event range supported by PMUVer of the hardware, rather
+ * than the guest's PMUVer for KVM backward compatibility.
+ */
+ nr_events = __kvm_pmu_event_mask(pmuver) + 1;
+
+ uaddr = (struct kvm_pmu_event_filter __user *)(long)attr->addr;
+
+ if (copy_from_user(&filter, uaddr, sizeof(filter)))
+ return -EFAULT;
+
+ if (((u32)filter.base_event + filter.nevents) > nr_events ||
+ (filter.action != KVM_PMU_EVENT_ALLOW &&
+ filter.action != KVM_PMU_EVENT_DENY))
+ return -EINVAL;
+
+ if (kvm_vm_has_ran_once(kvm))
+ return -EBUSY;
+
+ if (!kvm->arch.pmu_filter) {
+ kvm->arch.pmu_filter = bitmap_alloc(nr_events, GFP_KERNEL_ACCOUNT);
+ if (!kvm->arch.pmu_filter)
+ return -ENOMEM;
+
+ /*
+ * The default depends on the first applied filter.
+ * If it allows events, the default is to deny.
+ * Conversely, if the first filter denies a set of
+ * events, the default is to allow.
+ */
+ if (filter.action == KVM_PMU_EVENT_ALLOW)
+ bitmap_zero(kvm->arch.pmu_filter, nr_events);
+ else
+ bitmap_fill(kvm->arch.pmu_filter, nr_events);
+ }
+
+ if (filter.action == KVM_PMU_EVENT_ALLOW)
+ bitmap_set(kvm->arch.pmu_filter, filter.base_event, filter.nevents);
+ else
+ bitmap_clear(kvm->arch.pmu_filter, filter.base_event, filter.nevents);
+
+ return 0;
+ }
+ case KVM_ARM_VCPU_PMU_V3_SET_PMU: {
+ int __user *uaddr = (int __user *)(long)attr->addr;
+ int pmu_id;
+
+ if (get_user(pmu_id, uaddr))
+ return -EFAULT;
+
+ return kvm_arm_pmu_v3_set_pmu(vcpu, pmu_id);
+ }
+ case KVM_ARM_VCPU_PMU_V3_SET_NR_COUNTERS: {
+ unsigned int __user *uaddr = (unsigned int __user *)(long)attr->addr;
+ unsigned int n;
+
+ if (get_user(n, uaddr))
+ return -EFAULT;
+
+ return kvm_arm_pmu_v3_set_nr_counters(vcpu, n);
+ }
+ case KVM_ARM_VCPU_PMU_V3_INIT:
+ return kvm_arm_pmu_v3_init(vcpu);
+ }
+
+ return -ENXIO;
+}
+
+int kvm_arm_pmu_v3_get_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
+{
+ switch (attr->attr) {
+ case KVM_ARM_VCPU_PMU_V3_IRQ: {
+ int __user *uaddr = (int __user *)(long)attr->addr;
+ int irq;
+
+ if (!irqchip_in_kernel(vcpu->kvm))
+ return -EINVAL;
+
+ if (!kvm_vcpu_has_pmu(vcpu))
+ return -ENODEV;
+
+ if (!kvm_arm_pmu_irq_initialized(vcpu))
+ return -ENXIO;
+
+ irq = vcpu->arch.pmu.irq_num;
+ return put_user(irq, uaddr);
+ }
+ }
+
+ return -ENXIO;
+}
+
+int kvm_arm_pmu_v3_has_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
+{
+ switch (attr->attr) {
+ case KVM_ARM_VCPU_PMU_V3_IRQ:
+ case KVM_ARM_VCPU_PMU_V3_INIT:
+ case KVM_ARM_VCPU_PMU_V3_FILTER:
+ case KVM_ARM_VCPU_PMU_V3_SET_PMU:
+ case KVM_ARM_VCPU_PMU_V3_SET_NR_COUNTERS:
+ if (kvm_vcpu_has_pmu(vcpu))
+ return 0;
+ }
+
+ return -ENXIO;
+}
+
+u8 kvm_arm_pmu_get_pmuver_limit(void)
+{
+ unsigned int pmuver;
+
+ pmuver = SYS_FIELD_GET(ID_AA64DFR0_EL1, PMUVer,
+ read_sanitised_ftr_reg(SYS_ID_AA64DFR0_EL1));
+
+ /*
+ * Spoof a barebones PMUv3 implementation if the system supports IMPDEF
+ * traps of the PMUv3 sysregs
+ */
+ if (cpus_have_final_cap(ARM64_WORKAROUND_PMUV3_IMPDEF_TRAPS))
+ return ID_AA64DFR0_EL1_PMUVer_IMP;
+
+ /*
+ * Otherwise, treat IMPLEMENTATION DEFINED functionality as
+ * unimplemented
+ */
+ if (pmuver == ID_AA64DFR0_EL1_PMUVer_IMP_DEF)
+ return 0;
+
+ return min(pmuver, ID_AA64DFR0_EL1_PMUVer_V3P5);
+}
+
+u64 kvm_pmu_implemented_counter_mask(struct kvm_vcpu *vcpu)
+{
+ u64 val = FIELD_GET(ARMV8_PMU_PMCR_N, kvm_vcpu_read_pmcr(vcpu));
+
+ if (val == 0)
+ return BIT(ARMV8_PMU_CYCLE_IDX);
+ else
+ return GENMASK(val - 1, 0) | BIT(ARMV8_PMU_CYCLE_IDX);
+}
+
+u64 kvm_pmu_hyp_counter_mask(struct kvm_vcpu *vcpu)
+{
+ unsigned int hpmn, n;
+
+ if (!vcpu_has_nv(vcpu))
+ return 0;
+
+ hpmn = SYS_FIELD_GET(MDCR_EL2, HPMN, __vcpu_sys_reg(vcpu, MDCR_EL2));
+ n = vcpu->kvm->arch.nr_pmu_counters;
+
+ /*
+ * Programming HPMN to a value greater than PMCR_EL0.N is
+ * CONSTRAINED UNPREDICTABLE. Make the implementation choice that an
+ * UNKNOWN number of counters (in our case, zero) are reserved for EL2.
+ */
+ if (hpmn >= n)
+ return 0;
+
+ /*
+ * Programming HPMN=0 is CONSTRAINED UNPREDICTABLE if FEAT_HPMN0 isn't
+ * implemented. Since KVM's ability to emulate HPMN=0 does not directly
+ * depend on hardware (all PMU registers are trapped), make the
+ * implementation choice that all counters are included in the second
+ * range reserved for EL2/EL3.
+ */
+ return GENMASK(n - 1, hpmn);
+}
+
+bool kvm_pmu_counter_is_hyp(struct kvm_vcpu *vcpu, unsigned int idx)
+{
+ return kvm_pmu_hyp_counter_mask(vcpu) & BIT(idx);
+}
+
+u64 kvm_pmu_accessible_counter_mask(struct kvm_vcpu *vcpu)
+{
+ u64 mask = kvm_pmu_implemented_counter_mask(vcpu);
+
+ if (!vcpu_has_nv(vcpu) || vcpu_is_el2(vcpu))
+ return mask;
+
+ return mask & ~kvm_pmu_hyp_counter_mask(vcpu);
+}
+
+/**
+ * kvm_vcpu_read_pmcr - Read PMCR_EL0 register for the vCPU
+ * @vcpu: The vcpu pointer
+ */
+u64 kvm_vcpu_read_pmcr(struct kvm_vcpu *vcpu)
+{
+ u64 pmcr = __vcpu_sys_reg(vcpu, PMCR_EL0);
+ u64 n = vcpu->kvm->arch.nr_pmu_counters;
+
+ if (vcpu_has_nv(vcpu) && !vcpu_is_el2(vcpu))
+ n = FIELD_GET(MDCR_EL2_HPMN, __vcpu_sys_reg(vcpu, MDCR_EL2));
+
+ return u64_replace_bits(pmcr, n, ARMV8_PMU_PMCR_N);
+}
diff --git a/include/kvm/arm_pmu.h b/include/kvm/arm_pmu.h
index e91d15a7a564b..24a471cf59d56 100644
--- a/include/kvm/arm_pmu.h
+++ b/include/kvm/arm_pmu.h
@@ -53,13 +53,16 @@ u64 kvm_pmu_get_counter_value(struct kvm_vcpu *vcpu, u64 select_idx);
void kvm_pmu_set_counter_value(struct kvm_vcpu *vcpu, u64 select_idx, u64 val);
void kvm_pmu_set_counter_value_user(struct kvm_vcpu *vcpu, u64 select_idx, u64 val);
u64 kvm_pmu_implemented_counter_mask(struct kvm_vcpu *vcpu);
+u64 kvm_pmu_hyp_counter_mask(struct kvm_vcpu *vcpu);
u64 kvm_pmu_accessible_counter_mask(struct kvm_vcpu *vcpu);
+u32 kvm_pmu_event_mask(struct kvm *kvm);
u64 kvm_pmu_get_pmceid(struct kvm_vcpu *vcpu, bool pmceid1);
void kvm_pmu_vcpu_init(struct kvm_vcpu *vcpu);
void kvm_pmu_vcpu_destroy(struct kvm_vcpu *vcpu);
void kvm_pmu_reprogram_counter_mask(struct kvm_vcpu *vcpu, u64 val);
void kvm_pmu_flush_hwstate(struct kvm_vcpu *vcpu);
void kvm_pmu_sync_hwstate(struct kvm_vcpu *vcpu);
+bool kvm_pmu_overflow_status(struct kvm_vcpu *vcpu);
bool kvm_pmu_should_notify_user(struct kvm_vcpu *vcpu);
void kvm_pmu_update_run(struct kvm_vcpu *vcpu);
void kvm_pmu_software_increment(struct kvm_vcpu *vcpu, u64 val);
@@ -132,6 +135,10 @@ static inline u64 kvm_pmu_accessible_counter_mask(struct kvm_vcpu *vcpu)
{
return 0;
}
+static inline u32 kvm_pmu_event_mask(struct kvm *kvm)
+{
+ return 0;
+}
static inline void kvm_pmu_vcpu_init(struct kvm_vcpu *vcpu) {}
static inline void kvm_pmu_vcpu_destroy(struct kvm_vcpu *vcpu) {}
static inline void kvm_pmu_reprogram_counter_mask(struct kvm_vcpu *vcpu, u64 val) {}
--
2.54.0.545.g6539524ca2-goog
^ permalink raw reply related [flat|nested] 42+ messages in thread

* Re: [PATCH v7 03/20] KVM: arm64: Reorganize PMU functions
2026-05-04 21:17 ` [PATCH v7 03/20] KVM: arm64: Reorganize PMU functions Colton Lewis
@ 2026-05-04 22:02 ` sashiko-bot
0 siblings, 0 replies; 42+ messages in thread
From: sashiko-bot @ 2026-05-04 22:02 UTC (permalink / raw)
To: Colton Lewis; +Cc: linux-perf-users
Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
- [Low] Redundant macro definition of `kvm_arm_pmu_irq_initialized`.
- [Low] Missing `#include <linux/uaccess.h>` in `arch/arm64/kvm/pmu.c`.
--
commit 9aeaae469d996e1d76ed323e65048d5b72469b0b
Author: Colton Lewis <coltonlewis@google.com>
KVM: arm64: Reorganize PMU functions
A lot of functions in pmu-emul.c aren't specific to the emulated PMU
implementation. Move them to the more appropriate pmu.c file where
shared PMU functions should live.
> diff --git a/arch/arm64/kvm/pmu.c b/arch/arm64/kvm/pmu.c
> index 6b48a3d16d0d5..ee2f0f7e61bcf 100644
> --- a/arch/arm64/kvm/pmu.c
> +++ b/arch/arm64/kvm/pmu.c
> @@ -8,8 +8,22 @@
> #include <linux/perf/arm_pmu.h>
> #include <linux/perf/arm_pmuv3.h>
>
> +#include <kvm/arm_pmu.h>
> +
> +#include <asm/kvm_emulate.h>
This isn't a bug, but should linux/uaccess.h be explicitly included here?
The refactoring moved functions like kvm_arm_pmu_v3_set_attr() to this file,
which rely on get_user() and copy_from_user().
> +static LIST_HEAD(arm_pmus);
> +static DEFINE_MUTEX(arm_pmus_lock);
> static DEFINE_PER_CPU(struct kvm_pmu_events, kvm_pmu_events);
>
> +#define kvm_arm_pmu_irq_initialized(v) ((v)->arch.pmu.irq_num >= VGIC_NR_SGIS)
Is this macro definition redundant?
It appears that kvm_arm_pmu_irq_initialized() is already defined in
include/kvm/arm_pmu.h, and this file includes kvm/arm_pmu.h just above.
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260504211813.1804997-1-coltonlewis@google.com?part=3
^ permalink raw reply [flat|nested] 42+ messages in thread
* [PATCH v7 04/20] perf: arm_pmuv3: Generalize counter bitmasks
2026-05-04 21:17 [PATCH v7 00/20] ARM64 PMU Partitioning Colton Lewis
` (2 preceding siblings ...)
2026-05-04 21:17 ` [PATCH v7 03/20] KVM: arm64: Reorganize PMU functions Colton Lewis
@ 2026-05-04 21:17 ` Colton Lewis
2026-05-04 21:41 ` sashiko-bot
2026-05-04 21:17 ` [PATCH v7 05/20] perf: arm_pmuv3: Check cntr_mask before using pmccntr Colton Lewis
` (16 subsequent siblings)
20 siblings, 1 reply; 42+ messages in thread
From: Colton Lewis @ 2026-05-04 21:17 UTC (permalink / raw)
To: kvm
Cc: Alexandru Elisei, Paolo Bonzini, Jonathan Corbet, Russell King,
Catalin Marinas, Will Deacon, Marc Zyngier, Oliver Upton,
Mingwei Zhang, Joey Gouly, Suzuki K Poulose, Zenghui Yu,
Mark Rutland, Shuah Khan, Ganapatrao Kulkarni, James Clark,
linux-doc, linux-kernel, linux-arm-kernel, kvmarm,
linux-perf-users, linux-kselftest, Colton Lewis
The OVSR bitmasks are valid for enable and interrupt registers as well as
overflow registers. Generalize the names.
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Colton Lewis <coltonlewis@google.com>
---
drivers/perf/arm_pmuv3.c | 4 ++--
include/linux/perf/arm_pmuv3.h | 14 +++++++-------
2 files changed, 9 insertions(+), 9 deletions(-)
diff --git a/drivers/perf/arm_pmuv3.c b/drivers/perf/arm_pmuv3.c
index 8d3b832cd633a..1cceb1f614515 100644
--- a/drivers/perf/arm_pmuv3.c
+++ b/drivers/perf/arm_pmuv3.c
@@ -534,7 +534,7 @@ static void armv8pmu_pmcr_write(u64 val)
static int armv8pmu_has_overflowed(u64 pmovsr)
{
- return !!(pmovsr & ARMV8_PMU_OVERFLOWED_MASK);
+ return !!(pmovsr & ARMV8_PMU_CNT_MASK_ALL);
}
static int armv8pmu_counter_has_overflowed(u64 pmnc, int idx)
@@ -770,7 +770,7 @@ static u64 armv8pmu_getreset_flags(void)
value = read_pmovsclr();
/* Write to clear flags */
- value &= ARMV8_PMU_OVERFLOWED_MASK;
+ value &= ARMV8_PMU_CNT_MASK_ALL;
write_pmovsclr(value);
return value;
diff --git a/include/linux/perf/arm_pmuv3.h b/include/linux/perf/arm_pmuv3.h
index d698efba28a27..fd2a34b4a64d1 100644
--- a/include/linux/perf/arm_pmuv3.h
+++ b/include/linux/perf/arm_pmuv3.h
@@ -224,14 +224,14 @@
ARMV8_PMU_PMCR_LC | ARMV8_PMU_PMCR_LP)
/*
- * PMOVSR: counters overflow flag status reg
+ * Counter bitmask layouts for overflow, enable, and interrupts
*/
-#define ARMV8_PMU_OVSR_P GENMASK(30, 0)
-#define ARMV8_PMU_OVSR_C BIT(31)
-#define ARMV8_PMU_OVSR_F BIT_ULL(32) /* arm64 only */
-/* Mask for writable bits is both P and C fields */
-#define ARMV8_PMU_OVERFLOWED_MASK (ARMV8_PMU_OVSR_P | ARMV8_PMU_OVSR_C | \
- ARMV8_PMU_OVSR_F)
+#define ARMV8_PMU_CNT_MASK_P GENMASK(30, 0)
+#define ARMV8_PMU_CNT_MASK_C BIT(31)
+#define ARMV8_PMU_CNT_MASK_F BIT_ULL(32) /* arm64 only */
+#define ARMV8_PMU_CNT_MASK_ALL (ARMV8_PMU_CNT_MASK_P | \
+ ARMV8_PMU_CNT_MASK_C | \
+ ARMV8_PMU_CNT_MASK_F)
/*
* PMXEVTYPER: Event selection reg
--
2.54.0.545.g6539524ca2-goog
^ permalink raw reply related [flat|nested] 42+ messages in thread

* Re: [PATCH v7 04/20] perf: arm_pmuv3: Generalize counter bitmasks
2026-05-04 21:17 ` [PATCH v7 04/20] perf: arm_pmuv3: Generalize counter bitmasks Colton Lewis
@ 2026-05-04 21:41 ` sashiko-bot
0 siblings, 0 replies; 42+ messages in thread
From: sashiko-bot @ 2026-05-04 21:41 UTC (permalink / raw)
To: Colton Lewis; +Cc: linux-perf-users
Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [Low] Missing synchronization for the tools mirrored header `tools/include/perf/arm_pmuv3.h`.
--
commit 11c1a089e4abf2f887089a60dd977c70575ab494
Author: Colton Lewis <coltonlewis@google.com>
perf: arm_pmuv3: Generalize counter bitmasks
This commit renames the OVSR bitmasks to be more general since they apply
to enable and interrupt registers as well as overflow registers.
> diff --git a/include/linux/perf/arm_pmuv3.h b/include/linux/perf/arm_pmuv3.h
> index d698efba28a27..fd2a34b4a64d1 100644
> --- a/include/linux/perf/arm_pmuv3.h
> +++ b/include/linux/perf/arm_pmuv3.h
> @@ -224,14 +224,14 @@
> ARMV8_PMU_PMCR_LC | ARMV8_PMU_PMCR_LP)
>
> /*
> - * PMOVSR: counters overflow flag status reg
> + * Counter bitmask layouts for overflow, enable, and interrupts
> */
> -#define ARMV8_PMU_OVSR_P GENMASK(30, 0)
> -#define ARMV8_PMU_OVSR_C BIT(31)
> -#define ARMV8_PMU_OVSR_F BIT_ULL(32) /* arm64 only */
> -/* Mask for writable bits is both P and C fields */
> -#define ARMV8_PMU_OVERFLOWED_MASK (ARMV8_PMU_OVSR_P | ARMV8_PMU_OVSR_C | \
> - ARMV8_PMU_OVSR_F)
> +#define ARMV8_PMU_CNT_MASK_P GENMASK(30, 0)
> +#define ARMV8_PMU_CNT_MASK_C BIT(31)
> +#define ARMV8_PMU_CNT_MASK_F BIT_ULL(32) /* arm64 only */
> +#define ARMV8_PMU_CNT_MASK_ALL (ARMV8_PMU_CNT_MASK_P | \
> + ARMV8_PMU_CNT_MASK_C | \
> + ARMV8_PMU_CNT_MASK_F)
This isn't a bug, but should tools/include/perf/arm_pmuv3.h also be updated
to reflect these macro renames?
The kernel maintains a mirrored copy of this header which is used by KVM
selftests (tools/testing/selftests/kvm/arm64/vpmu_counter_access.c). Keeping
the tools header synchronized helps avoid future build confusion for test
developers.
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260504211813.1804997-1-coltonlewis@google.com?part=4
^ permalink raw reply [flat|nested] 42+ messages in thread
* [PATCH v7 05/20] perf: arm_pmuv3: Check cntr_mask before using pmccntr
2026-05-04 21:17 [PATCH v7 00/20] ARM64 PMU Partitioning Colton Lewis
` (3 preceding siblings ...)
2026-05-04 21:17 ` [PATCH v7 04/20] perf: arm_pmuv3: Generalize counter bitmasks Colton Lewis
@ 2026-05-04 21:17 ` Colton Lewis
2026-05-04 21:49 ` sashiko-bot
2026-05-04 21:17 ` [PATCH v7 06/20] perf: arm_pmuv3: Add method to partition the PMU Colton Lewis
` (15 subsequent siblings)
20 siblings, 1 reply; 42+ messages in thread
From: Colton Lewis @ 2026-05-04 21:17 UTC (permalink / raw)
To: kvm
Cc: Alexandru Elisei, Paolo Bonzini, Jonathan Corbet, Russell King,
Catalin Marinas, Will Deacon, Marc Zyngier, Oliver Upton,
Mingwei Zhang, Joey Gouly, Suzuki K Poulose, Zenghui Yu,
Mark Rutland, Shuah Khan, Ganapatrao Kulkarni, James Clark,
linux-doc, linux-kernel, linux-arm-kernel, kvmarm,
linux-perf-users, linux-kselftest, Colton Lewis
Check cntr_mask before using pmccntr to ensure it's available. With a
partitioned PMU, there may be instances where pmccntr is being used by
the guest and will ba absent from cntr_mask.
Signed-off-by: Colton Lewis <coltonlewis@google.com>
---
drivers/perf/arm_pmuv3.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/perf/arm_pmuv3.c b/drivers/perf/arm_pmuv3.c
index 1cceb1f614515..7ff3139dda893 100644
--- a/drivers/perf/arm_pmuv3.c
+++ b/drivers/perf/arm_pmuv3.c
@@ -993,6 +993,9 @@ static bool armv8pmu_can_use_pmccntr(struct pmu_hw_events *cpuc,
if (evtype != ARMV8_PMUV3_PERFCTR_CPU_CYCLES)
return false;
+ if (!test_bit(ARMV8_PMU_CYCLE_IDX, cpu_pmu->cntr_mask))
+ return false;
+
/*
* A CPU_CYCLES event with threshold counting cannot use PMCCNTR_EL0
* since it lacks threshold support.
--
2.54.0.545.g6539524ca2-goog
^ permalink raw reply related [flat|nested] 42+ messages in thread* Re: [PATCH v7 05/20] perf: arm_pmuv3: Check cntr_mask before using pmccntr
2026-05-04 21:17 ` [PATCH v7 05/20] perf: arm_pmuv3: Check cntr_mask before using pmccntr Colton Lewis
@ 2026-05-04 21:49 ` sashiko-bot
0 siblings, 0 replies; 42+ messages in thread
From: sashiko-bot @ 2026-05-04 21:49 UTC (permalink / raw)
To: Colton Lewis; +Cc: linux-perf-users
Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [Medium] Bypassing the fallback safety check for 64-bit user-access events when pmccntr is partitioned causes counter corruption.
--
commit 1df2bf6962076bf4d1bac00a5756584e40f646ca
Author: Colton Lewis <coltonlewis@google.com>
perf: arm_pmuv3: Check cntr_mask before using pmccntr
This commit adds a check for cntr_mask before using pmccntr to ensure it is
available. This is intended to handle partitioned PMUs where pmccntr might
be used by the guest and absent from the mask.
> Check cntr_mask before using pmccntr to ensure it's available. With a
> partitioned PMU, there may be instances where pmccntr is being used by
> the guest and will ba absent from cntr_mask.
This isn't a bug, but there's a typo (ba) in the commit message.
> diff --git a/drivers/perf/arm_pmuv3.c b/drivers/perf/arm_pmuv3.c
> index 1cceb1f614515..7ff3139dda893 100644
> --- a/drivers/perf/arm_pmuv3.c
> +++ b/drivers/perf/arm_pmuv3.c
> @@ -993,6 +993,9 @@ static bool armv8pmu_can_use_pmccntr(struct pmu_hw_events *cpuc,
> if (evtype != ARMV8_PMUV3_PERFCTR_CPU_CYCLES)
> return false;
>
> + if (!test_bit(ARMV8_PMU_CYCLE_IDX, cpu_pmu->cntr_mask))
> + return false;
> +
Does moving this check into armv8pmu_can_use_pmccntr() introduce a regression
that bypasses the 64-bit user-access fallback safety check in
armv8pmu_get_event_idx()?
If pmccntr is reserved for the guest and removed from cntr_mask,
armv8pmu_can_use_pmccntr() will return false. Would this skip the -EAGAIN
safeguard for 64-bit user-access cycle events?
drivers/perf/arm_pmuv3.c:armv8pmu_get_event_idx() {
...
if (armv8pmu_can_use_pmccntr(cpuc, event)) {
if (!test_and_set_bit(ARMV8_PMU_CYCLE_IDX, cpuc->used_mask))
return ARMV8_PMU_CYCLE_IDX;
else if (armv8pmu_event_is_64bit(event) &&
armv8pmu_event_want_user_access(event) &&
!armv8pmu_has_long_event(cpu_pmu))
return -EAGAIN;
}
...
if (armv8pmu_event_is_chained(event))
return armv8pmu_get_chain_idx(cpuc, cpu_pmu);
else
return armv8pmu_get_single_idx(cpuc, cpu_pmu);
}
If armv8pmu_can_use_pmccntr() returns false, the logic will fall through to
armv8pmu_get_single_idx() and allocate a 32-bit general-purpose counter for
the 64-bit user-access event.
Because the event maintains the ARMPMU_EVT_64BIT flag, will
armpmu_event_update() incorrectly calculate deltas using a 64-bit mask? If
the 32-bit hardware counter overflows, could this regression lead to massive
erroneous jumps in the counter?
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260504211813.1804997-1-coltonlewis@google.com?part=5
^ permalink raw reply [flat|nested] 42+ messages in thread
* [PATCH v7 06/20] perf: arm_pmuv3: Add method to partition the PMU
2026-05-04 21:17 [PATCH v7 00/20] ARM64 PMU Partitioning Colton Lewis
` (4 preceding siblings ...)
2026-05-04 21:17 ` [PATCH v7 05/20] perf: arm_pmuv3: Check cntr_mask before using pmccntr Colton Lewis
@ 2026-05-04 21:17 ` Colton Lewis
2026-05-04 21:53 ` sashiko-bot
2026-05-11 14:51 ` James Clark
2026-05-04 21:18 ` [PATCH v7 07/20] KVM: arm64: Set up FGT for Partitioned PMU Colton Lewis
` (14 subsequent siblings)
20 siblings, 2 replies; 42+ messages in thread
From: Colton Lewis @ 2026-05-04 21:17 UTC (permalink / raw)
To: kvm
Cc: Alexandru Elisei, Paolo Bonzini, Jonathan Corbet, Russell King,
Catalin Marinas, Will Deacon, Marc Zyngier, Oliver Upton,
Mingwei Zhang, Joey Gouly, Suzuki K Poulose, Zenghui Yu,
Mark Rutland, Shuah Khan, Ganapatrao Kulkarni, James Clark,
linux-doc, linux-kernel, linux-arm-kernel, kvmarm,
linux-perf-users, linux-kselftest, Colton Lewis
For PMUv3, the register field MDCR_EL2.HPMN partitions the PMU
counters into two ranges: counters 0..HPMN-1 are accessible by
EL1 and, if allowed, EL0, while counters HPMN..N-1 are only accessible
by EL2.
Create a module parameter reserved_host_counters to reserve a number
of counters for the host. Counters not reserved for the host may be
used by a guest VM when the PMU is partitioned.
Add the function armv8pmu_partition() to check the validity of the
reservation and record that a partition has happened, along with the
maximum allowable value for HPMN.
Due to the difficulty this feature would create for the driver running
in nVHE mode, partitioning is only allowed in VHE mode. In order to
support partitioning on nVHE, we'd need to explicitly disable guest
counters on every exit and reset HPMN to place all counters in the
first range.
Signed-off-by: Colton Lewis <coltonlewis@google.com>
---
arch/arm/include/asm/arm_pmuv3.h | 4 ++
arch/arm64/include/asm/arm_pmuv3.h | 5 ++
arch/arm64/kvm/Makefile | 2 +-
arch/arm64/kvm/pmu-direct.c | 22 +++++++++
drivers/perf/arm_pmuv3.c | 77 ++++++++++++++++++++++++++++--
include/kvm/arm_pmu.h | 8 ++++
include/linux/perf/arm_pmu.h | 2 +
7 files changed, 115 insertions(+), 5 deletions(-)
create mode 100644 arch/arm64/kvm/pmu-direct.c
diff --git a/arch/arm/include/asm/arm_pmuv3.h b/arch/arm/include/asm/arm_pmuv3.h
index 2ec0e5e83fc98..154503f054886 100644
--- a/arch/arm/include/asm/arm_pmuv3.h
+++ b/arch/arm/include/asm/arm_pmuv3.h
@@ -221,6 +221,10 @@ static inline bool kvm_pmu_counter_deferred(struct perf_event_attr *attr)
return false;
}
+static inline bool has_host_pmu_partition_support(void)
+{
+ return false;
+}
static inline bool kvm_set_pmuserenr(u64 val)
{
return false;
diff --git a/arch/arm64/include/asm/arm_pmuv3.h b/arch/arm64/include/asm/arm_pmuv3.h
index cf2b2212e00a2..27c4d6d47da31 100644
--- a/arch/arm64/include/asm/arm_pmuv3.h
+++ b/arch/arm64/include/asm/arm_pmuv3.h
@@ -171,6 +171,11 @@ static inline bool pmuv3_implemented(int pmuver)
pmuver == ID_AA64DFR0_EL1_PMUVer_NI);
}
+static inline bool is_pmuv3p1(int pmuver)
+{
+ return pmuver >= ID_AA64DFR0_EL1_PMUVer_V3P1;
+}
+
static inline bool is_pmuv3p4(int pmuver)
{
return pmuver >= ID_AA64DFR0_EL1_PMUVer_V3P4;
diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
index 3ebc0570345cc..baf0f296c0e53 100644
--- a/arch/arm64/kvm/Makefile
+++ b/arch/arm64/kvm/Makefile
@@ -26,7 +26,7 @@ kvm-y += arm.o mmu.o mmio.o psci.o hypercalls.o pvtime.o \
vgic/vgic-its.o vgic/vgic-debug.o vgic/vgic-v3-nested.o \
vgic/vgic-v5.o
-kvm-$(CONFIG_HW_PERF_EVENTS) += pmu-emul.o pmu.o
+kvm-$(CONFIG_HW_PERF_EVENTS) += pmu-emul.o pmu-direct.o pmu.o
kvm-$(CONFIG_ARM64_PTR_AUTH) += pauth.o
kvm-$(CONFIG_PTDUMP_STAGE2_DEBUGFS) += ptdump.o
diff --git a/arch/arm64/kvm/pmu-direct.c b/arch/arm64/kvm/pmu-direct.c
new file mode 100644
index 0000000000000..74e40e4915416
--- /dev/null
+++ b/arch/arm64/kvm/pmu-direct.c
@@ -0,0 +1,22 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2025 Google LLC
+ * Author: Colton Lewis <coltonlewis@google.com>
+ */
+
+#include <linux/kvm_host.h>
+
+#include <asm/arm_pmuv3.h>
+
+/**
+ * has_host_pmu_partition_support() - Determine if partitioning is possible
+ *
+ * Partitioning is only supported in VHE mode with PMUv3
+ *
+ * Return: True if partitioning is possible, false otherwise
+ */
+bool has_host_pmu_partition_support(void)
+{
+ return has_vhe() &&
+ system_supports_pmuv3();
+}
diff --git a/drivers/perf/arm_pmuv3.c b/drivers/perf/arm_pmuv3.c
index 7ff3139dda893..6e447227d801f 100644
--- a/drivers/perf/arm_pmuv3.c
+++ b/drivers/perf/arm_pmuv3.c
@@ -42,6 +42,13 @@
#define ARMV8_THUNDER_PERFCTR_L1I_CACHE_PREF_ACCESS 0xEC
#define ARMV8_THUNDER_PERFCTR_L1I_CACHE_PREF_MISS 0xED
+static int reserved_host_counters __read_mostly = -1;
+bool armv8pmu_is_partitioned;
+
+module_param(reserved_host_counters, int, 0);
+MODULE_PARM_DESC(reserved_host_counters,
+ "PMU Partition: -1 = No partition; +N = Reserve N counters for the host");
+
/*
* ARMv8 Architectural defined events, not all of these may
* be supported on any given implementation. Unsupported events will
@@ -532,6 +539,11 @@ static void armv8pmu_pmcr_write(u64 val)
write_pmcr(val);
}
+static u64 armv8pmu_pmcr_n_read(void)
+{
+ return FIELD_GET(ARMV8_PMU_PMCR_N, armv8pmu_pmcr_read());
+}
+
static int armv8pmu_has_overflowed(u64 pmovsr)
{
return !!(pmovsr & ARMV8_PMU_CNT_MASK_ALL);
@@ -1312,6 +1324,54 @@ struct armv8pmu_probe_info {
bool present;
};
+/**
+ * armv8pmu_reservation_is_valid() - Determine if reservation is allowed
+ * @host_counters: Number of host counters to reserve
+ *
+ * Determine if the number of host counters in the argument is an
+ * allowed reservation, 0 to NR_COUNTERS inclusive.
+ *
+ * Return: True if reservation allowed, false otherwise
+ */
+static bool armv8pmu_reservation_is_valid(int host_counters)
+{
+ return host_counters >= 0 &&
+ host_counters <= armv8pmu_pmcr_n_read();
+}
+
+/**
+ * armv8pmu_partition() - Partition the PMU
+ * @pmu: Pointer to pmu being partitioned
+ * @host_counters: Number of host counters to reserve
+ *
+ * Partition the given PMU by taking a number of host counters to
+ * reserve and, if it is a valid reservation, recording the
+ * corresponding HPMN value in the max_guest_counters field of the PMU and
+ * clearing the guest-reserved counters from the counter mask.
+ *
+ * Return: 0 on success, -ERROR otherwise
+ */
+static int armv8pmu_partition(struct arm_pmu *pmu, int host_counters)
+{
+ u8 nr_counters;
+ u8 hpmn;
+
+ if (!armv8pmu_reservation_is_valid(host_counters)) {
+ pr_err("PMU partition reservation of %d host counters is not valid", host_counters);
+ return -EINVAL;
+ }
+
+ nr_counters = armv8pmu_pmcr_n_read();
+ hpmn = nr_counters - host_counters;
+
+ pmu->max_guest_counters = hpmn;
+ armv8pmu_is_partitioned = true;
+
+ pr_info("Partitioned PMU with %d host counters -> %u guest counters", host_counters, hpmn);
+
+ return 0;
+}
+
static void __armv8pmu_probe_pmu(void *info)
{
struct armv8pmu_probe_info *probe = info;
@@ -1326,17 +1386,26 @@ static void __armv8pmu_probe_pmu(void *info)
cpu_pmu->pmuver = pmuver;
probe->present = true;
+ cpu_pmu->max_guest_counters = -1;
/* Read the nb of CNTx counters supported from PMNC */
- bitmap_set(cpu_pmu->cntr_mask,
- 0, FIELD_GET(ARMV8_PMU_PMCR_N, armv8pmu_pmcr_read()));
+ bitmap_set(cpu_pmu->hw_cntr_mask, 0, armv8pmu_pmcr_n_read());
/* Add the CPU cycles counter */
- set_bit(ARMV8_PMU_CYCLE_IDX, cpu_pmu->cntr_mask);
+ set_bit(ARMV8_PMU_CYCLE_IDX, cpu_pmu->hw_cntr_mask);
/* Add the CPU instructions counter */
if (pmuv3_has_icntr())
- set_bit(ARMV8_PMU_INSTR_IDX, cpu_pmu->cntr_mask);
+ set_bit(ARMV8_PMU_INSTR_IDX, cpu_pmu->hw_cntr_mask);
+
+ bitmap_copy(cpu_pmu->cntr_mask, cpu_pmu->hw_cntr_mask, ARMPMU_MAX_HWEVENTS);
+
+ if (reserved_host_counters >= 0) {
+ if (has_host_pmu_partition_support())
+ armv8pmu_partition(cpu_pmu, reserved_host_counters);
+ else
+ pr_err("PMU partition is not supported");
+ }
pmceid[0] = pmceid_raw[0] = read_pmceid0();
pmceid[1] = pmceid_raw[1] = read_pmceid1();
diff --git a/include/kvm/arm_pmu.h b/include/kvm/arm_pmu.h
index 24a471cf59d56..95f404cdcb2df 100644
--- a/include/kvm/arm_pmu.h
+++ b/include/kvm/arm_pmu.h
@@ -47,7 +47,10 @@ struct arm_pmu_entry {
struct arm_pmu *arm_pmu;
};
+extern bool armv8pmu_is_partitioned;
+
bool kvm_supports_guest_pmuv3(void);
+bool has_host_pmu_partition_support(void);
#define kvm_arm_pmu_irq_initialized(v) ((v)->arch.pmu.irq_num >= VGIC_NR_SGIS)
u64 kvm_pmu_get_counter_value(struct kvm_vcpu *vcpu, u64 select_idx);
void kvm_pmu_set_counter_value(struct kvm_vcpu *vcpu, u64 select_idx, u64 val);
@@ -117,6 +120,11 @@ static inline bool kvm_supports_guest_pmuv3(void)
return false;
}
+static inline bool has_host_pmu_partition_support(void)
+{
+ return false;
+}
+
#define kvm_arm_pmu_irq_initialized(v) (false)
static inline u64 kvm_pmu_get_counter_value(struct kvm_vcpu *vcpu,
u64 select_idx)
diff --git a/include/linux/perf/arm_pmu.h b/include/linux/perf/arm_pmu.h
index 52b37f7bdbf9e..f7b000bb3eca8 100644
--- a/include/linux/perf/arm_pmu.h
+++ b/include/linux/perf/arm_pmu.h
@@ -109,6 +109,7 @@ struct arm_pmu {
*/
int (*map_pmuv3_event)(unsigned int eventsel);
DECLARE_BITMAP(cntr_mask, ARMPMU_MAX_HWEVENTS);
+ DECLARE_BITMAP(hw_cntr_mask, ARMPMU_MAX_HWEVENTS);
bool secure_access; /* 32-bit ARM only */
struct platform_device *plat_device;
struct pmu_hw_events __percpu *hw_events;
@@ -129,6 +130,7 @@ struct arm_pmu {
/* Only to be used by ACPI probing code */
unsigned long acpi_cpuid;
+ int max_guest_counters;
};
#define to_arm_pmu(p) (container_of(p, struct arm_pmu, pmu))
--
2.54.0.545.g6539524ca2-goog
^ permalink raw reply related [flat|nested] 42+ messages in thread
* Re: [PATCH v7 06/20] perf: arm_pmuv3: Add method to partition the PMU
2026-05-04 21:17 ` [PATCH v7 06/20] perf: arm_pmuv3: Add method to partition the PMU Colton Lewis
@ 2026-05-04 21:53 ` sashiko-bot
2026-05-11 14:51 ` James Clark
1 sibling, 0 replies; 42+ messages in thread
From: sashiko-bot @ 2026-05-04 21:53 UTC (permalink / raw)
To: Colton Lewis; +Cc: linux-perf-users
Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
- [Low] Missing newlines in printk messages.
- [Low] Kernel doc for armv8pmu_partition() directly contradicts the implementation.
--
commit 6d1ad2c30d1fee65c9f04a9893ee57e6ba41fe28
Author: Colton Lewis <coltonlewis@google.com>
perf: arm_pmuv3: Add method to partition the PMU
This commit creates a module parameter reserved_host_counters to reserve
a number of PMU counters for the host. It also adds the function
armv8pmu_partition() to check the validity of the reservation and record
that a partition has happened.
> diff --git a/drivers/perf/arm_pmuv3.c b/drivers/perf/arm_pmuv3.c
> index 7ff3139dda893..6e447227d801f 100644
> --- a/drivers/perf/arm_pmuv3.c
> +++ b/drivers/perf/arm_pmuv3.c
[ ... ]
> +/**
> + * armv8pmu_partition() - Partition the PMU
> + * @pmu: Pointer to pmu being partitioned
> + * @host_counters: Number of host counters to reserve
> + *
> + * Partition the given PMU by taking a number of host counters to
> + * reserve and, if it is a valid reservation, recording the
> + * corresponding HPMN value in the max_guest_counters field of the PMU and
> + * clearing the guest-reserved counters from the counter mask.
> + *
> + * Return: 0 on success, -ERROR otherwise
> + */
Is this kernel doc description accurate for the current implementation?
The documentation states this function is responsible for clearing the
guest-reserved counters from the counter mask, but the implementation below
leaves the counter mask unmodified.
> +static int armv8pmu_partition(struct arm_pmu *pmu, int host_counters)
> +{
> + u8 nr_counters;
> + u8 hpmn;
> +
> + if (!armv8pmu_reservation_is_valid(host_counters)) {
> + pr_err("PMU partition reservation of %d host counters is not valid", host_counters);
Does this message need a trailing newline?
In the kernel, pr_err() does not automatically append a newline, so
subsequent kernel log messages might be appended to the same line.
> + return -EINVAL;
> + }
> +
> + nr_counters = armv8pmu_pmcr_n_read();
> + hpmn = nr_counters - host_counters;
> +
> + pmu->max_guest_counters = hpmn;
> + armv8pmu_is_partitioned = true;
> +
> + pr_info("Partitioned PMU with %d host counters -> %u guest counters", host_counters, hpmn);
Similarly, does this pr_info() call also need a \n to prevent garbling the
dmesg output?
> +
> + return 0;
> +}
> +
> static void __armv8pmu_probe_pmu(void *info)
> {
[ ... ]
> /* Add the CPU instructions counter */
> if (pmuv3_has_icntr())
> - set_bit(ARMV8_PMU_INSTR_IDX, cpu_pmu->cntr_mask);
> + set_bit(ARMV8_PMU_INSTR_IDX, cpu_pmu->hw_cntr_mask);
> +
> + bitmap_copy(cpu_pmu->cntr_mask, cpu_pmu->hw_cntr_mask, ARMPMU_MAX_HWEVENTS);
> +
> + if (reserved_host_counters >= 0) {
> + if (has_host_pmu_partition_support())
> + armv8pmu_partition(cpu_pmu, reserved_host_counters);
> + else
> + pr_err("PMU partition is not supported");
Might this pr_err() also be missing a trailing newline?
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260504211813.1804997-1-coltonlewis@google.com?part=6
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH v7 06/20] perf: arm_pmuv3: Add method to partition the PMU
2026-05-04 21:17 ` [PATCH v7 06/20] perf: arm_pmuv3: Add method to partition the PMU Colton Lewis
2026-05-04 21:53 ` sashiko-bot
@ 2026-05-11 14:51 ` James Clark
1 sibling, 0 replies; 42+ messages in thread
From: James Clark @ 2026-05-11 14:51 UTC (permalink / raw)
To: Colton Lewis
Cc: Alexandru Elisei, Paolo Bonzini, Jonathan Corbet, Russell King,
Catalin Marinas, Will Deacon, Marc Zyngier, Oliver Upton,
Mingwei Zhang, Joey Gouly, Suzuki K Poulose, Zenghui Yu,
Mark Rutland, Shuah Khan, Ganapatrao Kulkarni, linux-doc,
linux-kernel, linux-arm-kernel, kvmarm, linux-perf-users,
linux-kselftest, kvm
On 04/05/2026 10:17 pm, Colton Lewis wrote:
> For PMUv3, the register field MDCR_EL2.HPMN partitions the PMU
> counters into two ranges: counters 0..HPMN-1 are accessible by
> EL1 and, if allowed, EL0, while counters HPMN..N-1 are only accessible
> by EL2.
>
> Create a module parameter reserved_host_counters to reserve a number
> of counters for the host. Counters not reserved for the host may be
> used by a guest VM when the PMU is partitioned.
>
> Add the function armv8pmu_partition() to check the validity of the
> reservation and record that a partition has happened, along with the
> maximum allowable value for HPMN.
>
> Due to the difficulty this feature would create for the driver running
> in nVHE mode, partitioning is only allowed in VHE mode. In order to
> support partitioning on nVHE, we'd need to explicitly disable guest
> counters on every exit and reset HPMN to place all counters in the
> first range.
>
> Signed-off-by: Colton Lewis <coltonlewis@google.com>
> ---
> arch/arm/include/asm/arm_pmuv3.h | 4 ++
> arch/arm64/include/asm/arm_pmuv3.h | 5 ++
> arch/arm64/kvm/Makefile | 2 +-
> arch/arm64/kvm/pmu-direct.c | 22 +++++++++
> drivers/perf/arm_pmuv3.c | 77 ++++++++++++++++++++++++++++--
> include/kvm/arm_pmu.h | 8 ++++
> include/linux/perf/arm_pmu.h | 2 +
> 7 files changed, 115 insertions(+), 5 deletions(-)
> create mode 100644 arch/arm64/kvm/pmu-direct.c
>
> diff --git a/arch/arm/include/asm/arm_pmuv3.h b/arch/arm/include/asm/arm_pmuv3.h
> index 2ec0e5e83fc98..154503f054886 100644
> --- a/arch/arm/include/asm/arm_pmuv3.h
> +++ b/arch/arm/include/asm/arm_pmuv3.h
> @@ -221,6 +221,10 @@ static inline bool kvm_pmu_counter_deferred(struct perf_event_attr *attr)
> return false;
> }
>
> +static inline bool has_host_pmu_partition_support(void)
> +{
> + return false;
> +}
> static inline bool kvm_set_pmuserenr(u64 val)
> {
> return false;
> diff --git a/arch/arm64/include/asm/arm_pmuv3.h b/arch/arm64/include/asm/arm_pmuv3.h
> index cf2b2212e00a2..27c4d6d47da31 100644
> --- a/arch/arm64/include/asm/arm_pmuv3.h
> +++ b/arch/arm64/include/asm/arm_pmuv3.h
> @@ -171,6 +171,11 @@ static inline bool pmuv3_implemented(int pmuver)
> pmuver == ID_AA64DFR0_EL1_PMUVer_NI);
> }
>
> +static inline bool is_pmuv3p1(int pmuver)
> +{
> + return pmuver >= ID_AA64DFR0_EL1_PMUVer_V3P1;
> +}
> +
> static inline bool is_pmuv3p4(int pmuver)
> {
> return pmuver >= ID_AA64DFR0_EL1_PMUVer_V3P4;
> diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
> index 3ebc0570345cc..baf0f296c0e53 100644
> --- a/arch/arm64/kvm/Makefile
> +++ b/arch/arm64/kvm/Makefile
> @@ -26,7 +26,7 @@ kvm-y += arm.o mmu.o mmio.o psci.o hypercalls.o pvtime.o \
> vgic/vgic-its.o vgic/vgic-debug.o vgic/vgic-v3-nested.o \
> vgic/vgic-v5.o
>
> -kvm-$(CONFIG_HW_PERF_EVENTS) += pmu-emul.o pmu.o
> +kvm-$(CONFIG_HW_PERF_EVENTS) += pmu-emul.o pmu-direct.o pmu.o
> kvm-$(CONFIG_ARM64_PTR_AUTH) += pauth.o
> kvm-$(CONFIG_PTDUMP_STAGE2_DEBUGFS) += ptdump.o
>
> diff --git a/arch/arm64/kvm/pmu-direct.c b/arch/arm64/kvm/pmu-direct.c
> new file mode 100644
> index 0000000000000..74e40e4915416
> --- /dev/null
> +++ b/arch/arm64/kvm/pmu-direct.c
> @@ -0,0 +1,22 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Copyright (C) 2025 Google LLC
> + * Author: Colton Lewis <coltonlewis@google.com>
> + */
> +
> +#include <linux/kvm_host.h>
> +
> +#include <asm/arm_pmuv3.h>
> +
> +/**
> + * has_host_pmu_partition_support() - Determine if partitioning is possible
> + *
> + * Partitioning is only supported in VHE mode with PMUv3
> + *
> + * Return: True if partitioning is possible, false otherwise
> + */
> +bool has_host_pmu_partition_support(void)
> +{
> + return has_vhe() &&
> + system_supports_pmuv3();
> +}
> diff --git a/drivers/perf/arm_pmuv3.c b/drivers/perf/arm_pmuv3.c
> index 7ff3139dda893..6e447227d801f 100644
> --- a/drivers/perf/arm_pmuv3.c
> +++ b/drivers/perf/arm_pmuv3.c
> @@ -42,6 +42,13 @@
> #define ARMV8_THUNDER_PERFCTR_L1I_CACHE_PREF_ACCESS 0xEC
> #define ARMV8_THUNDER_PERFCTR_L1I_CACHE_PREF_MISS 0xED
>
> +static int reserved_host_counters __read_mostly = -1;
> +bool armv8pmu_is_partitioned;
> +
> +module_param(reserved_host_counters, int, 0);
> +MODULE_PARM_DESC(reserved_host_counters,
> + "PMU Partition: -1 = No partition; +N = Reserve N counters for the host");
> +
> /*
> * ARMv8 Architectural defined events, not all of these may
> * be supported on any given implementation. Unsupported events will
> @@ -532,6 +539,11 @@ static void armv8pmu_pmcr_write(u64 val)
> write_pmcr(val);
> }
>
> +static u64 armv8pmu_pmcr_n_read(void)
> +{
> + return FIELD_GET(ARMV8_PMU_PMCR_N, armv8pmu_pmcr_read());
> +}
> +
> static int armv8pmu_has_overflowed(u64 pmovsr)
> {
> return !!(pmovsr & ARMV8_PMU_CNT_MASK_ALL);
> @@ -1312,6 +1324,54 @@ struct armv8pmu_probe_info {
> bool present;
> };
>
> +/**
> + * armv8pmu_reservation_is_valid() - Determine if reservation is allowed
> + * @host_counters: Number of host counters to reserve
> + *
> + * Determine if the number of host counters in the argument is an
> + * allowed reservation, 0 to NR_COUNTERS inclusive.
> + *
> + * Return: True if reservation allowed, false otherwise
> + */
> +static bool armv8pmu_reservation_is_valid(int host_counters)
> +{
> + return host_counters >= 0 &&
> + host_counters <= armv8pmu_pmcr_n_read();
> +}
> +
> +/**
> + * armv8pmu_partition() - Partition the PMU
> + * @pmu: Pointer to pmu being partitioned
> + * @host_counters: Number of host counters to reserve
> + *
> + * Partition the given PMU by taking a number of host counters to
> + * reserve and, if it is a valid reservation, recording the
> + * corresponding HPMN value in the max_guest_counters field of the PMU and
> + * clearing the guest-reserved counters from the counter mask.
> + *
> + * Return: 0 on success, -ERROR otherwise
> + */
> +static int armv8pmu_partition(struct arm_pmu *pmu, int host_counters)
> +{
> + u8 nr_counters;
> + u8 hpmn;
> +
> + if (!armv8pmu_reservation_is_valid(host_counters)) {
> + pr_err("PMU partition reservation of %d host counters is not valid", host_counters);
> + return -EINVAL;
> + }
> +
> + nr_counters = armv8pmu_pmcr_n_read();
> + hpmn = nr_counters - host_counters;
> +
> + pmu->max_guest_counters = hpmn;
> + armv8pmu_is_partitioned = true;
> +
> + pr_info("Partitioned PMU with %d host counters -> %u guest counters", host_counters, hpmn);
> +
> + return 0;
> +}
> +
> static void __armv8pmu_probe_pmu(void *info)
> {
> struct armv8pmu_probe_info *probe = info;
> @@ -1326,17 +1386,26 @@ static void __armv8pmu_probe_pmu(void *info)
>
> cpu_pmu->pmuver = pmuver;
> probe->present = true;
> + cpu_pmu->max_guest_counters = -1;
>
> /* Read the nb of CNTx counters supported from PMNC */
> - bitmap_set(cpu_pmu->cntr_mask,
> - 0, FIELD_GET(ARMV8_PMU_PMCR_N, armv8pmu_pmcr_read()));
> + bitmap_set(cpu_pmu->hw_cntr_mask, 0, armv8pmu_pmcr_n_read());
>
> /* Add the CPU cycles counter */
> - set_bit(ARMV8_PMU_CYCLE_IDX, cpu_pmu->cntr_mask);
> + set_bit(ARMV8_PMU_CYCLE_IDX, cpu_pmu->hw_cntr_mask);
>
> /* Add the CPU instructions counter */
> if (pmuv3_has_icntr())
> - set_bit(ARMV8_PMU_INSTR_IDX, cpu_pmu->cntr_mask);
> + set_bit(ARMV8_PMU_INSTR_IDX, cpu_pmu->hw_cntr_mask);
> +
> + bitmap_copy(cpu_pmu->cntr_mask, cpu_pmu->hw_cntr_mask, ARMPMU_MAX_HWEVENTS);
> +
> + if (reserved_host_counters >= 0) {
> + if (has_host_pmu_partition_support())
> + armv8pmu_partition(cpu_pmu, reserved_host_counters);
> + else
> + pr_err("PMU partition is not supported");
> + }
>
> pmceid[0] = pmceid_raw[0] = read_pmceid0();
> pmceid[1] = pmceid_raw[1] = read_pmceid1();
> diff --git a/include/kvm/arm_pmu.h b/include/kvm/arm_pmu.h
> index 24a471cf59d56..95f404cdcb2df 100644
> --- a/include/kvm/arm_pmu.h
> +++ b/include/kvm/arm_pmu.h
> @@ -47,7 +47,10 @@ struct arm_pmu_entry {
> struct arm_pmu *arm_pmu;
> };
>
> +extern bool armv8pmu_is_partitioned;
> +
> bool kvm_supports_guest_pmuv3(void);
> +bool has_host_pmu_partition_support(void);
> #define kvm_arm_pmu_irq_initialized(v) ((v)->arch.pmu.irq_num >= VGIC_NR_SGIS)
> u64 kvm_pmu_get_counter_value(struct kvm_vcpu *vcpu, u64 select_idx);
> void kvm_pmu_set_counter_value(struct kvm_vcpu *vcpu, u64 select_idx, u64 val);
> @@ -117,6 +120,11 @@ static inline bool kvm_supports_guest_pmuv3(void)
> return false;
> }
>
> +static inline bool has_host_pmu_partition_support(void)
> +{
> + return false;
> +}
> +
> #define kvm_arm_pmu_irq_initialized(v) (false)
> static inline u64 kvm_pmu_get_counter_value(struct kvm_vcpu *vcpu,
> u64 select_idx)
> diff --git a/include/linux/perf/arm_pmu.h b/include/linux/perf/arm_pmu.h
> index 52b37f7bdbf9e..f7b000bb3eca8 100644
> --- a/include/linux/perf/arm_pmu.h
> +++ b/include/linux/perf/arm_pmu.h
> @@ -109,6 +109,7 @@ struct arm_pmu {
> */
> int (*map_pmuv3_event)(unsigned int eventsel);
> DECLARE_BITMAP(cntr_mask, ARMPMU_MAX_HWEVENTS);
> + DECLARE_BITMAP(hw_cntr_mask, ARMPMU_MAX_HWEVENTS);
I think this needs a comment or a clearer name. Both cntr_mask and
hw_cntr_mask are used in KVM and the PMU driver and it's not immediately
obvious what the difference is.
^ permalink raw reply [flat|nested] 42+ messages in thread
* [PATCH v7 07/20] KVM: arm64: Set up FGT for Partitioned PMU
2026-05-04 21:17 [PATCH v7 00/20] ARM64 PMU Partitioning Colton Lewis
` (5 preceding siblings ...)
2026-05-04 21:17 ` [PATCH v7 06/20] perf: arm_pmuv3: Add method to partition the PMU Colton Lewis
@ 2026-05-04 21:18 ` Colton Lewis
2026-05-04 22:09 ` sashiko-bot
2026-05-04 21:18 ` [PATCH v7 08/20] KVM: arm64: Add Partitioned PMU register trap handlers Colton Lewis
` (13 subsequent siblings)
20 siblings, 1 reply; 42+ messages in thread
From: Colton Lewis @ 2026-05-04 21:18 UTC (permalink / raw)
To: kvm
Cc: Alexandru Elisei, Paolo Bonzini, Jonathan Corbet, Russell King,
Catalin Marinas, Will Deacon, Marc Zyngier, Oliver Upton,
Mingwei Zhang, Joey Gouly, Suzuki K Poulose, Zenghui Yu,
Mark Rutland, Shuah Khan, Ganapatrao Kulkarni, James Clark,
linux-doc, linux-kernel, linux-arm-kernel, kvmarm,
linux-perf-users, linux-kselftest, Colton Lewis
In order to gain the best performance benefit from partitioning the
PMU, utilize fine-grained traps (FEAT_FGT and FEAT_FGT2) to avoid
trapping common PMU register accesses by the guest, removing that
overhead.
Untrapped:
* PMCR_EL0
* PMUSERENR_EL0
* PMSELR_EL0
* PMCCNTR_EL0
* PMCNTEN_EL0
* PMINTEN_EL1
* PMEVCNTRn_EL0
These are safe to untrap because writing MDCR_EL2.HPMN, as this series
will do, limits the effect of writes to any of these registers to the
partition of counters 0..HPMN-1. Reads from these registers will not
leak information between guests, as all of these registers are
context-swapped by a later patch in this series. Reads from these
registers also do not leak any information about the host's hardware
beyond what is promised by PMUv3.
Trapped:
* PMOVS_EL0
* PMEVTYPERn_EL0
* PMCCFILTR_EL0
* PMICNTR_EL0
* PMICFILTR_EL0
* PMCEIDn_EL0
* PMMIR_EL1
PMOVS remains trapped so KVM can track overflow IRQs that will need to
be injected into the guest.
PMICNTR and PMIFILTR remain trapped because KVM is not handling them
yet.
PMEVTYPERn remains trapped so KVM can limit which events guests can
count, such as disallowing counting at EL2. PMCCFILTR and PMCIFILTR
are special cases of the same.
PMCEIDn and PMMIR remain trapped because they can leak information
specific to the host hardware implementation.
NOTE: This patch temporarily forces kvm_vcpu_pmu_is_partitioned() to
be false to prevent partial feature activation for easier debugging.
Signed-off-by: Colton Lewis <coltonlewis@google.com>
---
arch/arm/include/asm/arm_pmuv3.h | 4 +++
arch/arm64/kvm/config.c | 41 ++++++++++++++++++++++--
arch/arm64/kvm/pmu-direct.c | 54 ++++++++++++++++++++++++++++++++
include/kvm/arm_pmu.h | 30 ++++++++++++++++++
4 files changed, 126 insertions(+), 3 deletions(-)
diff --git a/arch/arm/include/asm/arm_pmuv3.h b/arch/arm/include/asm/arm_pmuv3.h
index 154503f054886..42d62aa48d0a6 100644
--- a/arch/arm/include/asm/arm_pmuv3.h
+++ b/arch/arm/include/asm/arm_pmuv3.h
@@ -231,6 +231,10 @@ static inline bool kvm_set_pmuserenr(u64 val)
}
static inline void kvm_vcpu_pmu_resync_el0(void) {}
+static inline bool kvm_pmu_is_partitioned(struct arm_pmu *pmu)
+{
+ return false;
+}
/* PMU Version in DFR Register */
#define ARMV8_PMU_DFR_VER_NI 0
diff --git a/arch/arm64/kvm/config.c b/arch/arm64/kvm/config.c
index d9f553cbf9dfd..3b5f028f5bf11 100644
--- a/arch/arm64/kvm/config.c
+++ b/arch/arm64/kvm/config.c
@@ -1598,12 +1598,47 @@ static void __compute_hfgwtr(struct kvm_vcpu *vcpu)
*vcpu_fgt(vcpu, HFGWTR_EL2) |= HFGWTR_EL2_TCR_EL1;
}
+static void __compute_hdfgrtr(struct kvm_vcpu *vcpu)
+{
+ __compute_fgt(vcpu, HDFGRTR_EL2);
+
+ *vcpu_fgt(vcpu, HDFGRTR_EL2) |=
+ HDFGRTR_EL2_PMOVS
+ | HDFGRTR_EL2_PMCCFILTR_EL0
+ | HDFGRTR_EL2_PMEVTYPERn_EL0
+ | HDFGRTR_EL2_PMCEIDn_EL0
+ | HDFGRTR_EL2_PMMIR_EL1;
+}
+
static void __compute_hdfgwtr(struct kvm_vcpu *vcpu)
{
__compute_fgt(vcpu, HDFGWTR_EL2);
if (is_hyp_ctxt(vcpu))
*vcpu_fgt(vcpu, HDFGWTR_EL2) |= HDFGWTR_EL2_MDSCR_EL1;
+
+ *vcpu_fgt(vcpu, HDFGWTR_EL2) |=
+ HDFGWTR_EL2_PMOVS
+ | HDFGWTR_EL2_PMCCFILTR_EL0
+ | HDFGWTR_EL2_PMEVTYPERn_EL0;
+}
+
+static void __compute_hdfgrtr2(struct kvm_vcpu *vcpu)
+{
+ __compute_fgt(vcpu, HDFGRTR2_EL2);
+
+ *vcpu_fgt(vcpu, HDFGRTR2_EL2) &=
+ ~(HDFGRTR2_EL2_nPMICFILTR_EL0
+ | HDFGRTR2_EL2_nPMICNTR_EL0);
+}
+
+static void __compute_hdfgwtr2(struct kvm_vcpu *vcpu)
+{
+ __compute_fgt(vcpu, HDFGWTR2_EL2);
+
+ *vcpu_fgt(vcpu, HDFGWTR2_EL2) &=
+ ~(HDFGWTR2_EL2_nPMICFILTR_EL0
+ | HDFGWTR2_EL2_nPMICNTR_EL0);
}
void kvm_vcpu_load_fgt(struct kvm_vcpu *vcpu)
@@ -1614,7 +1649,7 @@ void kvm_vcpu_load_fgt(struct kvm_vcpu *vcpu)
__compute_fgt(vcpu, HFGRTR_EL2);
__compute_hfgwtr(vcpu);
__compute_fgt(vcpu, HFGITR_EL2);
- __compute_fgt(vcpu, HDFGRTR_EL2);
+ __compute_hdfgrtr(vcpu);
__compute_hdfgwtr(vcpu);
__compute_fgt(vcpu, HAFGRTR_EL2);
@@ -1624,6 +1659,6 @@ void kvm_vcpu_load_fgt(struct kvm_vcpu *vcpu)
__compute_fgt(vcpu, HFGRTR2_EL2);
__compute_fgt(vcpu, HFGWTR2_EL2);
__compute_fgt(vcpu, HFGITR2_EL2);
- __compute_fgt(vcpu, HDFGRTR2_EL2);
- __compute_fgt(vcpu, HDFGWTR2_EL2);
+ __compute_hdfgrtr2(vcpu);
+ __compute_hdfgwtr2(vcpu);
}
diff --git a/arch/arm64/kvm/pmu-direct.c b/arch/arm64/kvm/pmu-direct.c
index 74e40e4915416..2148bc46079c4 100644
--- a/arch/arm64/kvm/pmu-direct.c
+++ b/arch/arm64/kvm/pmu-direct.c
@@ -5,6 +5,8 @@
*/
#include <linux/kvm_host.h>
+#include <linux/perf/arm_pmu.h>
+#include <linux/perf/arm_pmuv3.h>
#include <asm/arm_pmuv3.h>
@@ -20,3 +22,55 @@ bool has_host_pmu_partition_support(void)
return has_vhe() &&
system_supports_pmuv3();
}
+
+/**
+ * kvm_pmu_is_partitioned() - Determine if given PMU is partitioned
+ * @pmu: Pointer to arm_pmu struct
+ *
+ * Determine if given PMU is partitioned by looking at hpmn field. The
+ * PMU is partitioned if this field is less than the number of
+ * counters in the system.
+ *
+ * Return: True if the PMU is partitioned, false otherwise
+ */
+bool kvm_pmu_is_partitioned(struct arm_pmu *pmu)
+{
+ if (!pmu)
+ return false;
+
+ return pmu->max_guest_counters >= 0 &&
+ pmu->max_guest_counters <= *host_data_ptr(nr_event_counters);
+}
+
+/**
+ * kvm_vcpu_pmu_is_partitioned() - Determine if given VCPU has a partitioned PMU
+ * @vcpu: Pointer to kvm_vcpu struct
+ *
+ * Determine if given VCPU has a partitioned PMU by extracting that
+ * field and passing it to :c:func:`kvm_pmu_is_partitioned`
+ *
+ * Return: True if the VCPU PMU is partitioned, false otherwise
+ */
+bool kvm_vcpu_pmu_is_partitioned(struct kvm_vcpu *vcpu)
+{
+ return kvm_pmu_is_partitioned(vcpu->kvm->arch.arm_pmu) &&
+ false;
+}
+
+/**
+ * kvm_vcpu_pmu_use_fgt() - Determine if we can use FGT
+ * @vcpu: Pointer to struct kvm_vcpu
+ *
+ * Determine if we can use FGT for direct access to registers. We can
+ * if capabilities permit the number of guest counters requested.
+ *
+ * Return: True if we can use FGT, false otherwise
+ */
+bool kvm_vcpu_pmu_use_fgt(struct kvm_vcpu *vcpu)
+{
+ u8 hpmn = vcpu->kvm->arch.nr_pmu_counters;
+
+ return kvm_vcpu_pmu_is_partitioned(vcpu) &&
+ cpus_have_final_cap(ARM64_HAS_FGT) &&
+ (hpmn != 0 || cpus_have_final_cap(ARM64_HAS_HPMN0));
+}
diff --git a/include/kvm/arm_pmu.h b/include/kvm/arm_pmu.h
index 95f404cdcb2df..c3987e0c01775 100644
--- a/include/kvm/arm_pmu.h
+++ b/include/kvm/arm_pmu.h
@@ -92,6 +92,23 @@ void kvm_vcpu_pmu_resync_el0(void);
#define kvm_vcpu_has_pmu(vcpu) \
(vcpu_has_feature(vcpu, KVM_ARM_VCPU_PMU_V3))
+bool kvm_pmu_is_partitioned(struct arm_pmu *pmu);
+
+#if !defined(__KVM_NVHE_HYPERVISOR__)
+bool kvm_vcpu_pmu_is_partitioned(struct kvm_vcpu *vcpu);
+bool kvm_vcpu_pmu_use_fgt(struct kvm_vcpu *vcpu);
+#else
+static inline bool kvm_vcpu_pmu_is_partitioned(struct kvm_vcpu *vcpu)
+{
+ return false;
+}
+
+static inline bool kvm_vcpu_pmu_use_fgt(struct kvm_vcpu *vcpu)
+{
+ return false;
+}
+#endif
+
/*
* Updates the vcpu's view of the pmu events for this cpu.
* Must be called before every vcpu run after disabling interrupts, to ensure
@@ -131,6 +148,14 @@ static inline u64 kvm_pmu_get_counter_value(struct kvm_vcpu *vcpu,
{
return 0;
}
+static inline bool kvm_vcpu_pmu_is_partitioned(struct kvm_vcpu *vcpu)
+{
+ return false;
+}
+static inline bool kvm_vcpu_pmu_use_fgt(struct kvm_vcpu *vcpu)
+{
+ return false;
+}
static inline void kvm_pmu_set_counter_value(struct kvm_vcpu *vcpu,
u64 select_idx, u64 val) {}
static inline void kvm_pmu_set_counter_value_user(struct kvm_vcpu *vcpu,
@@ -228,6 +253,11 @@ static inline bool kvm_pmu_counter_is_hyp(struct kvm_vcpu *vcpu, unsigned int id
static inline void kvm_pmu_nested_transition(struct kvm_vcpu *vcpu) {}
+static inline bool kvm_pmu_is_partitioned(void *pmu)
+{
+ return false;
+}
+
#endif
#endif
--
2.54.0.545.g6539524ca2-goog
^ permalink raw reply related [flat|nested] 42+ messages in thread
* Re: [PATCH v7 07/20] KVM: arm64: Set up FGT for Partitioned PMU
2026-05-04 21:18 ` [PATCH v7 07/20] KVM: arm64: Set up FGT for Partitioned PMU Colton Lewis
@ 2026-05-04 22:09 ` sashiko-bot
0 siblings, 0 replies; 42+ messages in thread
From: sashiko-bot @ 2026-05-04 22:09 UTC (permalink / raw)
To: Colton Lewis; +Cc: linux-perf-users
Thank you for your contribution! Sashiko AI review found 5 potential issue(s) to consider:
- [Critical] Untrapping PMSELR_EL0 causes KVM's trap handlers to use a stale selector index, breaking PMU emulation and event filtering.
- [High] Untrapping PMCR_EL0 reads exposes the physical counter limit (N) to the guest, leading to UNDEF crashes during PMU probing.
- [High] `kvm_pmu_is_partitioned()` uses `host_data_ptr(nr_event_counters)` to read the current CPU's capacity, causing preemption safety and correctness issues.
- [High] Conflicting definitions and missing forward declarations for `kvm_pmu_is_partitioned` break the build when KVM is disabled.
- [Low] Typo in commit message system register names.
--
commit c59f16e72b7efa37d6a241d40b8efb2151b0a1ca
Author: Colton Lewis <coltonlewis@google.com>
KVM: arm64: Set up FGT for Partitioned PMU
This commit sets up fine grain traps (FGT) to avoid trapping common PMU
register accesses by the guest. It untraps several registers while keeping
others trapped for overflow tracking and event filtering.
> Untrapped:
> * PMCR_EL0
> * PMUSERENR_EL0
> * PMSELR_EL0
If PMCR_EL0 reads are untrapped, won't the guest see the host's physical
number of counters (N) instead of the restricted limit? If the guest OS
PMU driver discovers the physical N and attempts to configure counters
greater than or equal to HPMN, won't KVM's emulation inject an UNDEFINED
exception and crash the guest?
Additionally, since PMSELR_EL0 is untrapped, the guest can modify the
hardware register directly without KVM's knowledge. When KVM handles
traps for aliases like PMXEVTYPER_EL0, will it read a stale selector
index from the memory backing of PMSELR_EL0? Could this allow a guest
to bypass KVM's event filtering by desynchronizing the hardware selector
from KVM's memory backing?
[ ... ]
> PMICNTR and PMIFILTR remain trapped because KVM is not handling them
> yet.
>
> PMEVTYPERn remains trapped so KVM can limit which events guests can
> count, such as disallowing counting at EL2. PMCCFILTR and PMCIFILTR
> are special cases of the same.
This isn't a bug, but are PMIFILTR and PMCIFILTR typos in the commit
message? Should both be PMICFILTR, the FEAT_PMUv3_ICNTR filter register?
> diff --git a/arch/arm/include/asm/arm_pmuv3.h b/arch/arm/include/asm/arm_pmuv3.h
> index 154503f054886..42d62aa48d0a6 100644
> --- a/arch/arm/include/asm/arm_pmuv3.h
> +++ b/arch/arm/include/asm/arm_pmuv3.h
> @@ -231,6 +231,10 @@ static inline bool kvm_set_pmuserenr(u64 val)
> }
>
> static inline void kvm_vcpu_pmu_resync_el0(void) {}
> +static inline bool kvm_pmu_is_partitioned(struct arm_pmu *pmu)
> +{
> + return false;
> +}
Since struct arm_pmu is not forward-declared before its use here, could
this cause build warnings or errors in certain configurations?
[ ... ]
> diff --git a/arch/arm64/kvm/pmu-direct.c b/arch/arm64/kvm/pmu-direct.c
> index 74e40e4915416..2148bc46079c4 100644
> --- a/arch/arm64/kvm/pmu-direct.c
> +++ b/arch/arm64/kvm/pmu-direct.c
[ ... ]
> +/**
> + * kvm_pmu_is_partitioned() - Determine if given PMU is partitioned
> + * @pmu: Pointer to arm_pmu struct
> + *
> + * Determine if given PMU is partitioned by looking at hpmn field. The
> + * PMU is partitioned if this field is less than the number of
> + * counters in the system.
> + *
> + * Return: True if the PMU is partitioned, false otherwise
> + */
> +bool kvm_pmu_is_partitioned(struct arm_pmu *pmu)
> +{
> + if (!pmu)
> + return false;
> +
> + return pmu->max_guest_counters >= 0 &&
> + pmu->max_guest_counters <= *host_data_ptr(nr_event_counters);
> +}
Because host_data_ptr() uses this_cpu_ptr_hyp_sym() and this_cpu_ptr(),
will this trigger a preemption warning if called from a preemptible
context?
Also, on big.LITTLE systems, does this compare the specific PMU's limit
against the currently executing CPU's capacity rather than the correct
capacity for that PMU?
[ ... ]
> diff --git a/include/kvm/arm_pmu.h b/include/kvm/arm_pmu.h
> index 95f404cdcb2df..c3987e0c01775 100644
> --- a/include/kvm/arm_pmu.h
> +++ b/include/kvm/arm_pmu.h
[ ... ]
> @@ -228,6 +253,11 @@ static inline bool kvm_pmu_counter_is_hyp(struct kvm_vcpu *vcpu, unsigned int id
>
> static inline void kvm_pmu_nested_transition(struct kvm_vcpu *vcpu) {}
>
> +static inline bool kvm_pmu_is_partitioned(void *pmu)
> +{
> + return false;
> +}
Does this fallback definition using void *pmu conflict with the one added
to arch/arm/include/asm/arm_pmuv3.h that uses struct arm_pmu *pmu? If KVM
is disabled and a driver includes both headers, could this result in a
fatal conflicting types compiler error?
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260504211813.1804997-1-coltonlewis@google.com?part=7
^ permalink raw reply [flat|nested] 42+ messages in thread
* [PATCH v7 08/20] KVM: arm64: Add Partitioned PMU register trap handlers
2026-05-04 21:17 [PATCH v7 00/20] ARM64 PMU Partitioning Colton Lewis
` (6 preceding siblings ...)
2026-05-04 21:18 ` [PATCH v7 07/20] KVM: arm64: Set up FGT for Partitioned PMU Colton Lewis
@ 2026-05-04 21:18 ` Colton Lewis
2026-05-04 22:06 ` sashiko-bot
2026-05-04 21:18 ` [PATCH v7 09/20] KVM: arm64: Set up MDCR_EL2 to handle a Partitioned PMU Colton Lewis
` (12 subsequent siblings)
20 siblings, 1 reply; 42+ messages in thread
From: Colton Lewis @ 2026-05-04 21:18 UTC (permalink / raw)
To: kvm
Cc: Alexandru Elisei, Paolo Bonzini, Jonathan Corbet, Russell King,
Catalin Marinas, Will Deacon, Marc Zyngier, Oliver Upton,
Mingwei Zhang, Joey Gouly, Suzuki K Poulose, Zenghui Yu,
Mark Rutland, Shuah Khan, Ganapatrao Kulkarni, James Clark,
linux-doc, linux-kernel, linux-arm-kernel, kvmarm,
linux-perf-users, linux-kselftest, Colton Lewis
We may want a partitioned PMU but not have FEAT_FGT to untrap the
specific registers that would normally be untrapped. Add handling for
those trapped register accesses that does the right thing if the PMU
is partitioned.
For registers that shouldn't be written to hardware because they
require special handling (PMEVTYPER and PMOVS), write to the virtual
register. A later patch will ensure these are handled correctly at
vcpu_load time.
Signed-off-by: Colton Lewis <coltonlewis@google.com>
---
arch/arm64/kvm/sys_regs.c | 236 +++++++++++++++++++++++++++++++-------
1 file changed, 197 insertions(+), 39 deletions(-)
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 0a8e8ee69cd00..cc3d1804ab200 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -985,9 +985,25 @@ static u64 reset_pmcr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r)
return __vcpu_sys_reg(vcpu, r->reg);
}
+static void pmu_write_pmuserenr(struct kvm_vcpu *vcpu, u64 val)
+{
+ if (kvm_vcpu_pmu_is_partitioned(vcpu))
+ write_sysreg(val, pmuserenr_el0);
+ else
+ __vcpu_assign_sys_reg(vcpu, PMUSERENR_EL0, val);
+}
+
+static u64 pmu_read_pmuserenr(struct kvm_vcpu *vcpu)
+{
+ if (kvm_vcpu_pmu_is_partitioned(vcpu))
+ return read_sysreg(pmuserenr_el0);
+ else
+ return __vcpu_sys_reg(vcpu, PMUSERENR_EL0);
+}
+
static bool check_pmu_access_disabled(struct kvm_vcpu *vcpu, u64 flags)
{
- u64 reg = __vcpu_sys_reg(vcpu, PMUSERENR_EL0);
+ u64 reg = pmu_read_pmuserenr(vcpu);
bool enabled = (reg & flags) || vcpu_mode_priv(vcpu);
if (!enabled)
@@ -1016,6 +1032,29 @@ static bool pmu_access_event_counter_el0_disabled(struct kvm_vcpu *vcpu)
return check_pmu_access_disabled(vcpu, ARMV8_PMU_USERENR_ER | ARMV8_PMU_USERENR_EN);
}
+static void pmu_write_pmcr(struct kvm_vcpu *vcpu, u64 val)
+{
+ if (kvm_vcpu_pmu_is_partitioned(vcpu)) {
+ write_sysreg(val, pmcr_el0);
+ return;
+ }
+
+ kvm_pmu_handle_pmcr(vcpu, val);
+}
+
+static u64 pmu_read_pmcr(struct kvm_vcpu *vcpu)
+{
+ if (kvm_vcpu_pmu_is_partitioned(vcpu)) {
+ return u64_replace_bits(
+ read_sysreg(pmcr_el0),
+ vcpu->kvm->arch.nr_pmu_counters,
+ ARMV8_PMU_PMCR_N);
+ }
+
+ return kvm_vcpu_read_pmcr(vcpu);
+
+}
+
static bool access_pmcr(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
const struct sys_reg_desc *r)
{
@@ -1026,18 +1065,17 @@ static bool access_pmcr(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
if (p->is_write) {
/*
- * Only update writeable bits of PMCR (continuing into
- * kvm_pmu_handle_pmcr() as well)
+ * Only update writeable bits of PMCR
*/
- val = kvm_vcpu_read_pmcr(vcpu);
+ val = pmu_read_pmcr(vcpu);
val &= ~ARMV8_PMU_PMCR_MASK;
val |= p->regval & ARMV8_PMU_PMCR_MASK;
if (!kvm_supports_32bit_el0())
val |= ARMV8_PMU_PMCR_LC;
- kvm_pmu_handle_pmcr(vcpu, val);
+ pmu_write_pmcr(vcpu, val);
} else {
/* PMCR.P & PMCR.C are RAZ */
- val = kvm_vcpu_read_pmcr(vcpu)
+ val = pmu_read_pmcr(vcpu)
& ~(ARMV8_PMU_PMCR_P | ARMV8_PMU_PMCR_C);
p->regval = val;
}
@@ -1045,6 +1083,24 @@ static bool access_pmcr(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
return true;
}
+static void pmu_write_pmselr(struct kvm_vcpu *vcpu, u64 val)
+{
+ if (kvm_vcpu_pmu_is_partitioned(vcpu)) {
+ write_sysreg(val, pmselr_el0);
+ return;
+ }
+
+ __vcpu_assign_sys_reg(vcpu, PMSELR_EL0, val);
+}
+
+static u64 pmu_read_pmselr(struct kvm_vcpu *vcpu)
+{
+ if (kvm_vcpu_pmu_is_partitioned(vcpu))
+ return read_sysreg(pmselr_el0);
+
+ return __vcpu_sys_reg(vcpu, PMSELR_EL0);
+}
+
static bool access_pmselr(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
const struct sys_reg_desc *r)
{
@@ -1052,10 +1108,10 @@ static bool access_pmselr(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
return false;
if (p->is_write)
- __vcpu_assign_sys_reg(vcpu, PMSELR_EL0, p->regval);
+ pmu_write_pmselr(vcpu, p->regval);
else
/* return PMSELR.SEL field */
- p->regval = __vcpu_sys_reg(vcpu, PMSELR_EL0)
+ p->regval = pmu_read_pmselr(vcpu)
& PMSELR_EL0_SEL_MASK;
return true;
@@ -1128,6 +1184,44 @@ static int set_pmu_evcntr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r,
return 0;
}
+static void pmu_write_evcntr(struct kvm_vcpu *vcpu, u64 val, u64 idx)
+{
+ u64 pmselr;
+
+ if (!kvm_vcpu_pmu_is_partitioned(vcpu)) {
+ kvm_pmu_set_counter_value(vcpu, idx, val);
+ return;
+ }
+
+ if (idx == ARMV8_PMU_CYCLE_IDX) {
+ write_sysreg(val, pmccntr_el0);
+ return;
+ }
+
+ pmselr = read_sysreg(pmselr_el0);
+ write_sysreg(idx, pmselr_el0);
+ write_sysreg(val, pmxevcntr_el0);
+ write_sysreg(pmselr, pmselr_el0);
+}
+
+static u64 pmu_read_evcntr(struct kvm_vcpu *vcpu, u64 idx)
+{
+ u64 pmselr;
+ u64 val;
+
+ if (!kvm_vcpu_pmu_is_partitioned(vcpu))
+ return kvm_pmu_get_counter_value(vcpu, idx);
+
+ if (idx == ARMV8_PMU_CYCLE_IDX)
+ return read_sysreg(pmccntr_el0);
+
+ pmselr = read_sysreg(pmselr_el0);
+ write_sysreg(idx, pmselr_el0);
+ val = read_sysreg(pmxevcntr_el0);
+ write_sysreg(pmselr, pmselr_el0);
+ return val;
+}
+
static bool access_pmu_evcntr(struct kvm_vcpu *vcpu,
struct sys_reg_params *p,
const struct sys_reg_desc *r)
@@ -1141,7 +1235,7 @@ static bool access_pmu_evcntr(struct kvm_vcpu *vcpu,
return false;
idx = SYS_FIELD_GET(PMSELR_EL0, SEL,
- __vcpu_sys_reg(vcpu, PMSELR_EL0));
+ pmu_read_pmselr(vcpu));
} else if (r->Op2 == 0) {
/* PMCCNTR_EL0 */
if (pmu_access_cycle_counter_el0_disabled(vcpu))
@@ -1173,14 +1267,34 @@ static bool access_pmu_evcntr(struct kvm_vcpu *vcpu,
if (pmu_access_el0_disabled(vcpu))
return false;
- kvm_pmu_set_counter_value(vcpu, idx, p->regval);
+ pmu_write_evcntr(vcpu, p->regval, idx);
} else {
- p->regval = kvm_pmu_get_counter_value(vcpu, idx);
+ p->regval = pmu_read_evcntr(vcpu, idx);
}
return true;
}
+
+static void pmu_write_evtyper(struct kvm_vcpu *vcpu, u64 val, u64 idx)
+{
+ u64 mask;
+
+ if (kvm_vcpu_pmu_is_partitioned(vcpu)) {
+ mask = kvm_pmu_evtyper_mask(vcpu->kvm);
+ __vcpu_assign_sys_reg(vcpu, PMEVTYPER0_EL0 + idx, val & mask);
+ return;
+ }
+
+ kvm_pmu_set_counter_event_type(vcpu, val, idx);
+ kvm_vcpu_pmu_restore_guest(vcpu);
+}
+
+static u64 pmu_read_evtyper(struct kvm_vcpu *vcpu, u64 idx)
+{
+ return __vcpu_sys_reg(vcpu, PMEVTYPER0_EL0 + idx);
+}
+
static bool access_pmu_evtyper(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
const struct sys_reg_desc *r)
{
@@ -1191,7 +1305,7 @@ static bool access_pmu_evtyper(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
if (r->CRn == 9 && r->CRm == 13 && r->Op2 == 1) {
/* PMXEVTYPER_EL0 */
- idx = SYS_FIELD_GET(PMSELR_EL0, SEL, __vcpu_sys_reg(vcpu, PMSELR_EL0));
+ idx = SYS_FIELD_GET(PMSELR_EL0, SEL, pmu_read_pmselr(vcpu));
reg = PMEVTYPER0_EL0 + idx;
} else if (r->CRn == 14 && (r->CRm & 12) == 12) {
idx = ((r->CRm & 3) << 3) | (r->Op2 & 7);
@@ -1207,12 +1321,10 @@ static bool access_pmu_evtyper(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
if (!pmu_counter_idx_valid(vcpu, idx))
return false;
- if (p->is_write) {
- kvm_pmu_set_counter_event_type(vcpu, p->regval, idx);
- kvm_vcpu_pmu_restore_guest(vcpu);
- } else {
- p->regval = __vcpu_sys_reg(vcpu, reg);
- }
+ if (p->is_write)
+ pmu_write_evtyper(vcpu, p->regval, idx);
+ else
+ p->regval = pmu_read_evtyper(vcpu, idx);
return true;
}
@@ -1235,6 +1347,35 @@ static int get_pmreg(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r, u64 *v
return 0;
}
+static void pmu_write_pmcnten(struct kvm_vcpu *vcpu, u64 val, bool set)
+{
+ if (kvm_vcpu_pmu_is_partitioned(vcpu)) {
+ if (set)
+ write_sysreg(val, pmcntenset_el0);
+ else
+ write_sysreg(val, pmcntenclr_el0);
+
+ return;
+ }
+
+ if (set)
+ /* accessing PMCNTENSET_EL0 */
+ __vcpu_rmw_sys_reg(vcpu, PMCNTENSET_EL0, |=, val);
+ else
+ /* accessing PMCNTENCLR_EL0 */
+ __vcpu_rmw_sys_reg(vcpu, PMCNTENSET_EL0, &=, ~val);
+
+ kvm_pmu_reprogram_counter_mask(vcpu, val);
+}
+
+static u64 pmu_read_pmcnten(struct kvm_vcpu *vcpu)
+{
+ if (kvm_vcpu_pmu_is_partitioned(vcpu))
+ return read_sysreg(pmcntenset_el0);
+
+ return __vcpu_sys_reg(vcpu, PMCNTENSET_EL0);
+}
+
static bool access_pmcnten(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
const struct sys_reg_desc *r)
{
@@ -1246,40 +1387,58 @@ static bool access_pmcnten(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
mask = kvm_pmu_accessible_counter_mask(vcpu);
if (p->is_write) {
val = p->regval & mask;
- if (r->Op2 & 0x1)
- /* accessing PMCNTENSET_EL0 */
- __vcpu_rmw_sys_reg(vcpu, PMCNTENSET_EL0, |=, val);
- else
- /* accessing PMCNTENCLR_EL0 */
- __vcpu_rmw_sys_reg(vcpu, PMCNTENSET_EL0, &=, ~val);
-
- kvm_pmu_reprogram_counter_mask(vcpu, val);
+ pmu_write_pmcnten(vcpu, val, r->Op2 & 0x1);
} else {
- p->regval = __vcpu_sys_reg(vcpu, PMCNTENSET_EL0);
+ p->regval = pmu_read_pmcnten(vcpu);
}
return true;
}
+static void pmu_write_pminten(struct kvm_vcpu *vcpu, u64 val, bool set)
+{
+ if (kvm_vcpu_pmu_is_partitioned(vcpu)) {
+ if (set)
+ write_sysreg(val, pmintenset_el1);
+ else
+ write_sysreg(val, pmintenclr_el1);
+
+ return;
+ }
+
+ if (set)
+ /* accessing PMINTENSET_EL1 */
+ __vcpu_rmw_sys_reg(vcpu, PMINTENSET_EL1, |=, val);
+ else
+ /* accessing PMINTENCLR_EL1 */
+ __vcpu_rmw_sys_reg(vcpu, PMINTENSET_EL1, &=, ~val);
+
+ kvm_pmu_reprogram_counter_mask(vcpu, val);
+}
+
+static u64 pmu_read_pminten(struct kvm_vcpu *vcpu)
+{
+ if (kvm_vcpu_pmu_is_partitioned(vcpu))
+ return read_sysreg(pmintenset_el1);
+
+ return __vcpu_sys_reg(vcpu, PMINTENSET_EL1);
+}
+
static bool access_pminten(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
const struct sys_reg_desc *r)
{
- u64 mask = kvm_pmu_accessible_counter_mask(vcpu);
+ u64 val, mask;
if (check_pmu_access_disabled(vcpu, 0))
return false;
+ mask = kvm_pmu_accessible_counter_mask(vcpu);
if (p->is_write) {
- u64 val = p->regval & mask;
+ val = p->regval & mask;
- if (r->Op2 & 0x1)
- /* accessing PMINTENSET_EL1 */
- __vcpu_rmw_sys_reg(vcpu, PMINTENSET_EL1, |=, val);
- else
- /* accessing PMINTENCLR_EL1 */
- __vcpu_rmw_sys_reg(vcpu, PMINTENSET_EL1, &=, ~val);
+ pmu_write_pminten(vcpu, val, r->Op2 & 0x1);
} else {
- p->regval = __vcpu_sys_reg(vcpu, PMINTENSET_EL1);
+ p->regval = pmu_read_pminten(vcpu);
}
return true;
@@ -1330,10 +1489,9 @@ static bool access_pmuserenr(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
if (!vcpu_mode_priv(vcpu))
return undef_access(vcpu, p, r);
- __vcpu_assign_sys_reg(vcpu, PMUSERENR_EL0,
- (p->regval & ARMV8_PMU_USERENR_MASK));
+ pmu_write_pmuserenr(vcpu, p->regval & ARMV8_PMU_USERENR_MASK);
} else {
- p->regval = __vcpu_sys_reg(vcpu, PMUSERENR_EL0)
+ p->regval = pmu_read_pmuserenr(vcpu)
& ARMV8_PMU_USERENR_MASK;
}
--
2.54.0.545.g6539524ca2-goog
^ permalink raw reply related [flat|nested] 42+ messages in thread
* Re: [PATCH v7 08/20] KVM: arm64: Add Partitioned PMU register trap handlers
2026-05-04 21:18 ` [PATCH v7 08/20] KVM: arm64: Add Partitioned PMU register trap handlers Colton Lewis
@ 2026-05-04 22:06 ` sashiko-bot
0 siblings, 0 replies; 42+ messages in thread
From: sashiko-bot @ 2026-05-04 22:06 UTC (permalink / raw)
To: Colton Lewis; +Cc: linux-perf-users
Thank you for your contribution! Sashiko AI review found 6 potential issue(s) to consider:
- [Critical] Guest PMCR_EL0 writes directly trigger hardware reset at host privilege level, clearing host PMU counters.
- [High] Unmasked reads of PMCNTENSET_EL0 and PMINTENSET_EL1 leak host PMU configuration to the guest.
- [High] Race condition and data corruption due to hardware PMU registers being accessed without disabling IRQs or preemption.
- [High] Architectural violation: Missing instruction synchronization barrier (ISB) between PMSELR_EL0 write and PMXEVCNTR_EL0 access.
- [Medium] Nested virtualization boundary bypass in `pmu_read_pmcr`.
- [Low] Dead code and brittle logic: Unused `reg` variable in `access_pmu_evtyper` relies on implicit enum layout.
--
commit 6a28050b2477f91b45c9d48b5b98d954e310fb44
Author: Colton Lewis <coltonlewis@google.com>
KVM: arm64: Add Partitioned PMU register trap handlers
We may want a partitioned PMU but not have FEAT_FGT to untrap the
specific registers that would normally be untrapped. Add handling for
those trapped register accesses that does the right thing if the PMU
is partitioned.
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index 0a8e8ee69cd00..cc3d1804ab200 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
[ ... ]
> @@ -1016,6 +1032,29 @@ static bool pmu_access_event_counter_el0_disabled(struct kvm_vcpu *vcpu)
> return check_pmu_access_disabled(vcpu, ARMV8_PMU_USERENR_ER | ARMV8_PMU_USERENR_EN);
> }
>
> +static void pmu_write_pmcr(struct kvm_vcpu *vcpu, u64 val)
> +{
> + if (kvm_vcpu_pmu_is_partitioned(vcpu)) {
> + write_sysreg(val, pmcr_el0);
> + return;
> + }
> +
> + kvm_pmu_handle_pmcr(vcpu, val);
> +}
Since KVM sysreg trap handlers execute at EL2 in VHE mode, does writing the
guest's value directly to pmcr_el0 allow a guest to reset all host PMU
counters?
According to the ARM Architecture Reference Manual, writing PMCR_EL0.P = 1 or
PMCR_EL0.C = 1 from EL2 unconditionally resets all event counters or the cycle
counter respectively, bypassing MDCR_EL2.HPMN isolation.
> +
> +static u64 pmu_read_pmcr(struct kvm_vcpu *vcpu)
> +{
> + if (kvm_vcpu_pmu_is_partitioned(vcpu)) {
> + return u64_replace_bits(
> + read_sysreg(pmcr_el0),
> + vcpu->kvm->arch.nr_pmu_counters,
> + ARMV8_PMU_PMCR_N);
> + }
> +
> + return kvm_vcpu_read_pmcr(vcpu);
> +
> +}
Does this hardcoded replacement of the N field with
vcpu->kvm->arch.nr_pmu_counters ignore L1's nested virtualization limits?
The standard KVM read path in kvm_vcpu_read_pmcr() recalculates the N field
based on nested state to restrict counters for an L2 guest.
[ ... ]
> @@ -1128,6 +1184,44 @@ static int set_pmu_evcntr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r,
> return 0;
> }
>
> +static void pmu_write_evcntr(struct kvm_vcpu *vcpu, u64 val, u64 idx)
> +{
> + u64 pmselr;
> +
> + if (!kvm_vcpu_pmu_is_partitioned(vcpu)) {
> + kvm_pmu_set_counter_value(vcpu, idx, val);
> + return;
> + }
> +
> + if (idx == ARMV8_PMU_CYCLE_IDX) {
> + write_sysreg(val, pmccntr_el0);
> + return;
> + }
> +
> + pmselr = read_sysreg(pmselr_el0);
> + write_sysreg(idx, pmselr_el0);
> + write_sysreg(val, pmxevcntr_el0);
> + write_sysreg(pmselr, pmselr_el0);
> +}
Are preemption and local IRQs enabled here?
If a host Performance Monitoring Interrupt (PMI) fires between setting
pmselr_el0 and accessing pmxevcntr_el0, couldn't the host PMU driver modify
pmselr_el0 to service its own counter, causing KVM to access the wrong
counter when it resumes?
Also, is an instruction synchronization barrier (ISB) required between
writing pmselr_el0 and accessing pmxevcntr_el0?
The ARM Architecture Reference Manual requires a context synchronization event
after writing to a selector register before accessing multiplexed registers
like pmxevcntr_el0.
> +
> +static u64 pmu_read_evcntr(struct kvm_vcpu *vcpu, u64 idx)
> +{
> + u64 pmselr;
> + u64 val;
> +
> + if (!kvm_vcpu_pmu_is_partitioned(vcpu))
> + return kvm_pmu_get_counter_value(vcpu, idx);
> +
> + if (idx == ARMV8_PMU_CYCLE_IDX)
> + return read_sysreg(pmccntr_el0);
> +
> + pmselr = read_sysreg(pmselr_el0);
> + write_sysreg(idx, pmselr_el0);
> + val = read_sysreg(pmxevcntr_el0);
> + write_sysreg(pmselr, pmselr_el0);
> + return val;
> +}
The same questions regarding preemption, local IRQs, and the missing ISB
apply here when reading pmxevcntr_el0.
[ ... ]
> +static void pmu_write_evtyper(struct kvm_vcpu *vcpu, u64 val, u64 idx)
> +{
> + u64 mask;
> +
> + if (kvm_vcpu_pmu_is_partitioned(vcpu)) {
> + mask = kvm_pmu_evtyper_mask(vcpu->kvm);
> + __vcpu_assign_sys_reg(vcpu, PMEVTYPER0_EL0 + idx, val & mask);
> + return;
> + }
> +
> + kvm_pmu_set_counter_event_type(vcpu, val, idx);
> + kvm_vcpu_pmu_restore_guest(vcpu);
> +}
Does PMEVTYPER0_EL0 + idx assume a specific enum layout to reach PMCCFILTR_EL0
when idx is ARMV8_PMU_CYCLE_IDX (31)?
[ ... ]
> @@ -1191,7 +1305,7 @@ static bool access_pmu_evtyper(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
>
> if (r->CRn == 9 && r->CRm == 13 && r->Op2 == 1) {
> /* PMXEVTYPER_EL0 */
> - idx = SYS_FIELD_GET(PMSELR_EL0, SEL, __vcpu_sys_reg(vcpu, PMSELR_EL0));
> + idx = SYS_FIELD_GET(PMSELR_EL0, SEL, pmu_read_pmselr(vcpu));
> reg = PMEVTYPER0_EL0 + idx;
> } else if (r->CRn == 14 && (r->CRm & 12) == 12) {
> idx = ((r->CRm & 3) << 3) | (r->Op2 & 7);
> @@ -1207,12 +1321,10 @@ static bool access_pmu_evtyper(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
> if (!pmu_counter_idx_valid(vcpu, idx))
> return false;
>
> - if (p->is_write) {
> - kvm_pmu_set_counter_event_type(vcpu, p->regval, idx);
> - kvm_vcpu_pmu_restore_guest(vcpu);
> - } else {
> - p->regval = __vcpu_sys_reg(vcpu, reg);
> - }
> + if (p->is_write)
> + pmu_write_evtyper(vcpu, p->regval, idx);
> + else
> + p->regval = pmu_read_evtyper(vcpu, idx);
Is the reg variable computed above now completely unused in this function?
[ ... ]
> +static u64 pmu_read_pmcnten(struct kvm_vcpu *vcpu)
> +{
> + if (kvm_vcpu_pmu_is_partitioned(vcpu))
> + return read_sysreg(pmcntenset_el0);
> +
> + return __vcpu_sys_reg(vcpu, PMCNTENSET_EL0);
> +}
> +
> static bool access_pmcnten(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
> const struct sys_reg_desc *r)
> {
> @@ -1246,40 +1387,58 @@ static bool access_pmcnten(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
> mask = kvm_pmu_accessible_counter_mask(vcpu);
> if (p->is_write) {
> val = p->regval & mask;
> - if (r->Op2 & 0x1)
> - /* accessing PMCNTENSET_EL0 */
> - __vcpu_rmw_sys_reg(vcpu, PMCNTENSET_EL0, |=, val);
> - else
> - /* accessing PMCNTENCLR_EL0 */
> - __vcpu_rmw_sys_reg(vcpu, PMCNTENSET_EL0, &=, ~val);
> -
> - kvm_pmu_reprogram_counter_mask(vcpu, val);
> + pmu_write_pmcnten(vcpu, val, r->Op2 & 0x1);
> } else {
> - p->regval = __vcpu_sys_reg(vcpu, PMCNTENSET_EL0);
> + p->regval = pmu_read_pmcnten(vcpu);
> }
>
> return true;
> }
Since KVM executes at EL2, does reading pmcntenset_el0 directly return the
global PMU state, including enable bits for host-owned counters (indices >=
HPMN)?
Without applying kvm_pmu_accessible_counter_mask(vcpu) to the result in the
read path, it appears to expose host PMU activity to the guest.
[ ... ]
> +static u64 pmu_read_pminten(struct kvm_vcpu *vcpu)
> +{
> + if (kvm_vcpu_pmu_is_partitioned(vcpu))
> + return read_sysreg(pmintenset_el1);
> +
> + return __vcpu_sys_reg(vcpu, PMINTENSET_EL1);
> +}
> +
> static bool access_pminten(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
> const struct sys_reg_desc *r)
> {
> - u64 mask = kvm_pmu_accessible_counter_mask(vcpu);
> + u64 val, mask;
>
> if (check_pmu_access_disabled(vcpu, 0))
> return false;
>
> + mask = kvm_pmu_accessible_counter_mask(vcpu);
> if (p->is_write) {
> - u64 val = p->regval & mask;
> + val = p->regval & mask;
>
> - if (r->Op2 & 0x1)
> - /* accessing PMINTENSET_EL1 */
> - __vcpu_rmw_sys_reg(vcpu, PMINTENSET_EL1, |=, val);
> - else
> - /* accessing PMINTENCLR_EL1 */
> - __vcpu_rmw_sys_reg(vcpu, PMINTENSET_EL1, &=, ~val);
> + pmu_write_pminten(vcpu, val, r->Op2 & 0x1);
> } else {
> - p->regval = __vcpu_sys_reg(vcpu, PMINTENSET_EL1);
> + p->regval = pmu_read_pminten(vcpu);
> }
>
> return true;
> }
Similarly, does reading pmintenset_el1 directly leak host PMU interrupt
configuration to the guest without applying the mask?
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260504211813.1804997-1-coltonlewis@google.com?part=8
^ permalink raw reply [flat|nested] 42+ messages in thread
* [PATCH v7 09/20] KVM: arm64: Set up MDCR_EL2 to handle a Partitioned PMU
2026-05-04 21:17 [PATCH v7 00/20] ARM64 PMU Partitioning Colton Lewis
` (7 preceding siblings ...)
2026-05-04 21:18 ` [PATCH v7 08/20] KVM: arm64: Add Partitioned PMU register trap handlers Colton Lewis
@ 2026-05-04 21:18 ` Colton Lewis
2026-05-04 22:02 ` sashiko-bot
2026-05-04 21:18 ` [PATCH v7 10/20] KVM: arm64: Context swap Partitioned PMU guest registers Colton Lewis
` (11 subsequent siblings)
20 siblings, 1 reply; 42+ messages in thread
From: Colton Lewis @ 2026-05-04 21:18 UTC (permalink / raw)
To: kvm
Cc: Alexandru Elisei, Paolo Bonzini, Jonathan Corbet, Russell King,
Catalin Marinas, Will Deacon, Marc Zyngier, Oliver Upton,
Mingwei Zhang, Joey Gouly, Suzuki K Poulose, Zenghui Yu,
Mark Rutland, Shuah Khan, Ganapatrao Kulkarni, James Clark,
linux-doc, linux-kernel, linux-arm-kernel, kvmarm,
linux-perf-users, linux-kselftest, Colton Lewis
Set up MDCR_EL2 to handle a Partitioned PMU. That means calculating an
appropriate value for HPMN instead of the default maximum setting the
host allows (which implies no partition), so hardware enforces that a
guest will only see the counters in the guest partition.
Setting HPMN to a non-default value means the global enable bit for
the host counters is now MDCR_EL2.HPME instead of the usual
PMCR_EL0.E. Enable the HPME bit to allow the host to count guest
events. Since HPME only has an effect when HPMN is set, which we only
do for the guest, it is correct to enable it unconditionally here.
Unset the TPM and TPMCR bits, which trap all PMU accesses, if
FGT (fine grain trapping) is being used.
If available, set the filtering bits HPMD and HCCD to be extra sure
nothing in the guest counts at EL2.
Signed-off-by: Colton Lewis <coltonlewis@google.com>
---
arch/arm64/kvm/debug.c | 29 ++++++++++++++++++++++++++---
arch/arm64/kvm/pmu-direct.c | 24 ++++++++++++++++++++++++
arch/arm64/kvm/pmu.c | 7 +++++++
include/kvm/arm_pmu.h | 11 +++++++++++
4 files changed, 68 insertions(+), 3 deletions(-)
diff --git a/arch/arm64/kvm/debug.c b/arch/arm64/kvm/debug.c
index 3ad6b7c6e4ba7..0ab89c91e19cb 100644
--- a/arch/arm64/kvm/debug.c
+++ b/arch/arm64/kvm/debug.c
@@ -36,20 +36,43 @@ static int cpu_has_spe(u64 dfr0)
*/
static void kvm_arm_setup_mdcr_el2(struct kvm_vcpu *vcpu)
{
+ int hpmn = kvm_pmu_hpmn(vcpu);
+
preempt_disable();
/*
* This also clears MDCR_EL2_E2PB_MASK and MDCR_EL2_E2TB_MASK
* to disable guest access to the profiling and trace buffers
*/
- vcpu->arch.mdcr_el2 = FIELD_PREP(MDCR_EL2_HPMN,
- *host_data_ptr(nr_event_counters));
+
+ vcpu->arch.mdcr_el2 = FIELD_PREP(MDCR_EL2_HPMN, hpmn);
vcpu->arch.mdcr_el2 |= (MDCR_EL2_TPM |
MDCR_EL2_TPMS |
MDCR_EL2_TTRF |
MDCR_EL2_TPMCR |
MDCR_EL2_TDRA |
- MDCR_EL2_TDOSA);
+ MDCR_EL2_TDOSA |
+ MDCR_EL2_HPME);
+
+ if (kvm_vcpu_pmu_is_partitioned(vcpu)) {
+ /*
+ * Filtering these should be redundant because we trap
+ * all the TYPER and FILTR registers anyway and ensure
+ * they filter EL2, but set the bits if they are here.
+ */
+ if (is_pmuv3p1(read_pmuver()))
+ vcpu->arch.mdcr_el2 |= MDCR_EL2_HPMD;
+ if (is_pmuv3p5(read_pmuver()))
+ vcpu->arch.mdcr_el2 |= MDCR_EL2_HCCD;
+
+ /*
+ * Take out the coarse grain traps if we are using
+ * fine grain traps.
+ */
+ if (kvm_vcpu_pmu_use_fgt(vcpu))
+ vcpu->arch.mdcr_el2 &= ~(MDCR_EL2_TPM | MDCR_EL2_TPMCR);
+
+ }
/* Is the VM being debugged by userspace? */
if (vcpu->guest_debug)
diff --git a/arch/arm64/kvm/pmu-direct.c b/arch/arm64/kvm/pmu-direct.c
index 2148bc46079c4..63ac72910e4b5 100644
--- a/arch/arm64/kvm/pmu-direct.c
+++ b/arch/arm64/kvm/pmu-direct.c
@@ -74,3 +74,27 @@ bool kvm_vcpu_pmu_use_fgt(struct kvm_vcpu *vcpu)
cpus_have_final_cap(ARM64_HAS_FGT) &&
(hpmn != 0 || cpus_have_final_cap(ARM64_HAS_HPMN0));
}
+
+/**
+ * kvm_pmu_hpmn() - Calculate HPMN field value
+ * @vcpu: Pointer to struct kvm_vcpu
+ *
+ * Calculate the appropriate value to set for MDCR_EL2.HPMN. If
+ * partitioned, this is the number of counters set for the guest if
+ * supported, falling back to max_guest_counters if needed. If we are not
+ * partitioned or can't set the implied HPMN value, fall back to the
+ * host value.
+ *
+ * Return: A valid HPMN value
+ */
+u8 kvm_pmu_hpmn(struct kvm_vcpu *vcpu)
+{
+ u8 nr_guest_cntr = vcpu->kvm->arch.nr_pmu_counters;
+
+ if (kvm_vcpu_pmu_is_partitioned(vcpu)
+ && !vcpu_on_unsupported_cpu(vcpu)
+ && (cpus_have_final_cap(ARM64_HAS_HPMN0) || nr_guest_cntr > 0))
+ return nr_guest_cntr;
+
+ return *host_data_ptr(nr_event_counters);
+}
diff --git a/arch/arm64/kvm/pmu.c b/arch/arm64/kvm/pmu.c
index ee2f0f7e61bcf..8c10ad05661bc 100644
--- a/arch/arm64/kvm/pmu.c
+++ b/arch/arm64/kvm/pmu.c
@@ -542,6 +542,13 @@ u8 kvm_arm_pmu_get_max_counters(struct kvm *kvm)
if (cpus_have_final_cap(ARM64_WORKAROUND_PMUV3_IMPDEF_TRAPS))
return 1;
+ /*
+ * If partitioned then we are limited by the max counters in
+ * the guest partition.
+ */
+ if (kvm_pmu_is_partitioned(arm_pmu))
+ return arm_pmu->max_guest_counters;
+
/*
* The arm_pmu->cntr_mask considers the fixed counter(s) as well.
* Ignore those and return only the general-purpose counters.
diff --git a/include/kvm/arm_pmu.h b/include/kvm/arm_pmu.h
index c3987e0c01775..6aaeb27642540 100644
--- a/include/kvm/arm_pmu.h
+++ b/include/kvm/arm_pmu.h
@@ -94,6 +94,9 @@ void kvm_vcpu_pmu_resync_el0(void);
bool kvm_pmu_is_partitioned(struct arm_pmu *pmu);
+u8 kvm_pmu_guest_num_counters(struct kvm_vcpu *vcpu);
+u8 kvm_pmu_hpmn(struct kvm_vcpu *vcpu);
+
#if !defined(__KVM_NVHE_HYPERVISOR__)
bool kvm_vcpu_pmu_is_partitioned(struct kvm_vcpu *vcpu);
bool kvm_vcpu_pmu_use_fgt(struct kvm_vcpu *vcpu);
@@ -156,6 +159,14 @@ static inline bool kvm_vcpu_pmu_use_fgt(struct kvm_vcpu *vcpu)
{
return false;
}
+static inline u8 kvm_pmu_guest_num_counters(struct kvm_vcpu *vcpu)
+{
+ return 0;
+}
+static inline u8 kvm_pmu_hpmn(struct kvm_vcpu *vcpu)
+{
+ return 0;
+}
static inline void kvm_pmu_set_counter_value(struct kvm_vcpu *vcpu,
u64 select_idx, u64 val) {}
static inline void kvm_pmu_set_counter_value_user(struct kvm_vcpu *vcpu,
--
2.54.0.545.g6539524ca2-goog
^ permalink raw reply related [flat|nested] 42+ messages in thread* Re: [PATCH v7 09/20] KVM: arm64: Set up MDCR_EL2 to handle a Partitioned PMU
2026-05-04 21:18 ` [PATCH v7 09/20] KVM: arm64: Set up MDCR_EL2 to handle a Partitioned PMU Colton Lewis
@ 2026-05-04 22:02 ` sashiko-bot
0 siblings, 0 replies; 42+ messages in thread
From: sashiko-bot @ 2026-05-04 22:02 UTC (permalink / raw)
To: Colton Lewis; +Cc: linux-perf-users
Thank you for your contribution! Sashiko AI review found 5 potential issue(s) to consider:
- [High] Unconditionally setting MDCR_EL2.HPME forces host PMU counters on, breaking the host's ability to globally disable them.
- [High] Setting MDCR_EL2.HPMD and MDCR_EL2.HCCD unconditionally disables host profiling at EL2, blinding `perf` to the VHE host kernel.
- [High] kvm_pmu_hpmn() fails to fall back to max_guest_counters when HPMN0 is unsupported, breaking host partitioning.
- [High] Unsafe per-CPU data access with preemption enabled causes TOCTOU races and UNPREDICTABLE state on heterogeneous CPUs.
- [High] Desynchronized vcpu_on_unsupported_cpu() checks grant the guest untrapped access to the physical PMU of unsupported CPUs.
--
commit 2fa21e7add3769fb81de901ba24477b380914693
Author: Colton Lewis <coltonlewis@google.com>
KVM: arm64: Set up MDCR_EL2 to handle a Partitioned PMU
This commit configures MDCR_EL2 to support PMU partitioning by calculating the
appropriate HPMN value and configuring coarse and fine-grained traps. It also
enables host PMU counting and sets filtering bits for the PMU hardware.
> diff --git a/arch/arm64/kvm/debug.c b/arch/arm64/kvm/debug.c
> index 3ad6b7c6e4ba7..0ab89c91e19cb 100644
> --- a/arch/arm64/kvm/debug.c
> +++ b/arch/arm64/kvm/debug.c
> @@ -36,20 +36,43 @@ static int cpu_has_spe(u64 dfr0)
> */
> static void kvm_arm_setup_mdcr_el2(struct kvm_vcpu *vcpu)
> {
> + int hpmn = kvm_pmu_hpmn(vcpu);
> +
> preempt_disable();
Does calling kvm_pmu_hpmn() before preempt_disable() cause a Time-of-Check to
Time-of-Use race here?
kvm_pmu_hpmn() uses host_data_ptr(nr_event_counters), which accesses
per-CPU data. If the thread migrates, could KVM program MDCR_EL2.HPMN
using the counter value from the wrong physical CPU?
[ ... ]
> - vcpu->arch.mdcr_el2 = FIELD_PREP(MDCR_EL2_HPMN,
> - *host_data_ptr(nr_event_counters));
> + vcpu->arch.mdcr_el2 = FIELD_PREP(MDCR_EL2_HPMN, hpmn);
> vcpu->arch.mdcr_el2 |= (MDCR_EL2_TPM |
> MDCR_EL2_TPMS |
> MDCR_EL2_TTRF |
> MDCR_EL2_TPMCR |
> MDCR_EL2_TDRA |
> - MDCR_EL2_TDOSA);
> + MDCR_EL2_TDOSA |
> + MDCR_EL2_HPME);
Does unconditionally setting MDCR_EL2.HPME force the host PMU counters on?
When PMU partitioning is active, MDCR_EL2.HPME becomes the global enable for
the host partition. If KVM forces this to 1 while the vCPU is loaded, does
the host PMU driver lose the ability to globally stop its counters, for
example during counter reprogramming or overflow IRQ handling?
> + if (kvm_vcpu_pmu_is_partitioned(vcpu)) {
> + /*
> + * Filtering these should be redundant because we trap
> + * all the TYPER and FILTR registers anyway and ensure
> + * they filter EL2, but set the bits if they are here.
> + */
> + if (is_pmuv3p1(read_pmuver()))
> + vcpu->arch.mdcr_el2 |= MDCR_EL2_HPMD;
> + if (is_pmuv3p5(read_pmuver()))
> + vcpu->arch.mdcr_el2 |= MDCR_EL2_HCCD;
The commit message states these bits are set to ensure nothing in the guest
counts at EL2. Do these bits actually apply to the host partition rather than
the guest?
According to the ARM Architecture Reference Manual, when partitioned, the
guest counters are already prohibited from counting at EL2, while HPMD and
HCCD apply to the host partition counters. Does setting these bits permanently
disable the host's own counters from profiling at EL2?
> + /*
> + * Take out the coarse grain traps if we are using
> + * fine grain traps.
> + */
> + if (kvm_vcpu_pmu_use_fgt(vcpu))
> + vcpu->arch.mdcr_el2 &= ~(MDCR_EL2_TPM | MDCR_EL2_TPMCR);
> +
> + }
Can this grant the guest untrapped access to the physical PMU on unsupported
CPUs?
If a vCPU is on an unsupported CPU, kvm_pmu_hpmn() returns the host's maximum
counters, effectively disabling the partition. However, kvm_vcpu_pmu_use_fgt()
does not check vcpu_on_unsupported_cpu(). Will it return true and clear the
trap bits here, allowing direct hardware access to the PMU?
> diff --git a/arch/arm64/kvm/pmu-direct.c b/arch/arm64/kvm/pmu-direct.c
> index 2148bc46079c4..63ac72910e4b5 100644
> --- a/arch/arm64/kvm/pmu-direct.c
> +++ b/arch/arm64/kvm/pmu-direct.c
[ ... ]
> +u8 kvm_pmu_hpmn(struct kvm_vcpu *vcpu)
> +{
> + u8 nr_guest_cntr = vcpu->kvm->arch.nr_pmu_counters;
> +
> + if (kvm_vcpu_pmu_is_partitioned(vcpu)
> + && !vcpu_on_unsupported_cpu(vcpu)
> + && (cpus_have_final_cap(ARM64_HAS_HPMN0) || nr_guest_cntr > 0))
> + return nr_guest_cntr;
> +
> + return *host_data_ptr(nr_event_counters);
> +}
The commit message states this will fall back to max_guest_counters if needed
when partitioned but unable to set the implied HPMN value. Does the
implementation miss this fallback?
If the conditions fail (for instance, nr_guest_cntr is 0 on hardware without
HPMN0), it returns the total number of host counters instead of
max_guest_counters. Will setting HPMN to the maximum counters destroy the host
partition and cause the host's reserved counters to silently stop?
> diff --git a/arch/arm64/kvm/pmu.c b/arch/arm64/kvm/pmu.c
> index ee2f0f7e61bcf..8c10ad05661bc 100644
> --- a/arch/arm64/kvm/pmu.c
> +++ b/arch/arm64/kvm/pmu.c
> @@ -542,6 +542,13 @@ u8 kvm_arm_pmu_get_max_counters(struct kvm *kvm)
> if (cpus_have_final_cap(ARM64_WORKAROUND_PMUV3_IMPDEF_TRAPS))
> return 1;
>
> + /*
> + * If partitioned then we are limited by the max counters in
> + * the guest partition.
> + */
> + if (kvm_pmu_is_partitioned(arm_pmu))
> + return arm_pmu->max_guest_counters;
> +
Can this be called with preemption enabled?
kvm_arm_pmu_get_max_counters() can be called from userspace ioctls. Since
kvm_pmu_is_partitioned() evaluates host_data_ptr(), does this create a race
condition with per-CPU data access similar to the one in
kvm_arm_setup_mdcr_el2()?
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260504211813.1804997-1-coltonlewis@google.com?part=9
^ permalink raw reply [flat|nested] 42+ messages in thread
* [PATCH v7 10/20] KVM: arm64: Context swap Partitioned PMU guest registers
2026-05-04 21:17 [PATCH v7 00/20] ARM64 PMU Partitioning Colton Lewis
` (8 preceding siblings ...)
2026-05-04 21:18 ` [PATCH v7 09/20] KVM: arm64: Set up MDCR_EL2 to handle a Partitioned PMU Colton Lewis
@ 2026-05-04 21:18 ` Colton Lewis
2026-05-04 22:01 ` sashiko-bot
2026-05-11 14:49 ` James Clark
2026-05-04 21:18 ` [PATCH v7 11/20] KVM: arm64: Enforce PMU event filter at vcpu_load() Colton Lewis
` (10 subsequent siblings)
20 siblings, 2 replies; 42+ messages in thread
From: Colton Lewis @ 2026-05-04 21:18 UTC (permalink / raw)
To: kvm
Cc: Alexandru Elisei, Paolo Bonzini, Jonathan Corbet, Russell King,
Catalin Marinas, Will Deacon, Marc Zyngier, Oliver Upton,
Mingwei Zhang, Joey Gouly, Suzuki K Poulose, Zenghui Yu,
Mark Rutland, Shuah Khan, Ganapatrao Kulkarni, James Clark,
linux-doc, linux-kernel, linux-arm-kernel, kvmarm,
linux-perf-users, linux-kselftest, Colton Lewis
Save and restore newly untrapped registers that can be directly
accessed by the guest when the PMU is partitioned.
* PMEVCNTRn_EL0
* PMCCNTR_EL0
* PMSELR_EL0
* PMCR_EL0
* PMCNTEN_EL0
* PMINTEN_EL1
If we know we are not partitioned (that is, using the emulated vPMU),
then return immediately. A later patch will make this lazy so the
context swaps don't happen unless the guest has accessed the PMU.
PMEVTYPER is handled in a following patch since we must apply the KVM
event filter before writing values to hardware.
PMOVS guest counters are cleared to avoid the possibility of
generating spurious interrupts when PMINTEN is written. This is fine
because the virtual register for PMOVS is always the canonical value.
Signed-off-by: Colton Lewis <coltonlewis@google.com>
---
arch/arm/include/asm/arm_pmuv3.h | 4 +
arch/arm64/kvm/arm.c | 2 +
arch/arm64/kvm/pmu-direct.c | 169 +++++++++++++++++++++++++++++++
include/kvm/arm_pmu.h | 16 +++
4 files changed, 191 insertions(+)
diff --git a/arch/arm/include/asm/arm_pmuv3.h b/arch/arm/include/asm/arm_pmuv3.h
index 42d62aa48d0a6..eebc89bdab7a1 100644
--- a/arch/arm/include/asm/arm_pmuv3.h
+++ b/arch/arm/include/asm/arm_pmuv3.h
@@ -235,6 +235,10 @@ static inline bool kvm_pmu_is_partitioned(struct arm_pmu *pmu)
{
return false;
}
+static inline u64 kvm_pmu_host_counter_mask(struct arm_pmu *pmu)
+{
+ return ~0;
+}
/* PMU Version in DFR Register */
#define ARMV8_PMU_DFR_VER_NI 0
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 410ffd41fd73a..a942f2bc13fc4 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -680,6 +680,7 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
kvm_vcpu_load_vhe(vcpu);
kvm_arch_vcpu_load_fp(vcpu);
kvm_vcpu_pmu_restore_guest(vcpu);
+ kvm_pmu_load(vcpu);
if (kvm_arm_is_pvtime_enabled(&vcpu->arch))
kvm_make_request(KVM_REQ_RECORD_STEAL, vcpu);
@@ -721,6 +722,7 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
kvm_timer_vcpu_put(vcpu);
kvm_vgic_put(vcpu);
kvm_vcpu_pmu_restore_host(vcpu);
+ kvm_pmu_put(vcpu);
if (vcpu_has_nv(vcpu))
kvm_vcpu_put_hw_mmu(vcpu);
kvm_arm_vmid_clear_active();
diff --git a/arch/arm64/kvm/pmu-direct.c b/arch/arm64/kvm/pmu-direct.c
index 63ac72910e4b5..360d022d918d5 100644
--- a/arch/arm64/kvm/pmu-direct.c
+++ b/arch/arm64/kvm/pmu-direct.c
@@ -9,6 +9,7 @@
#include <linux/perf/arm_pmuv3.h>
#include <asm/arm_pmuv3.h>
+#include <asm/kvm_emulate.h>
/**
* has_host_pmu_partition_support() - Determine if partitioning is possible
@@ -98,3 +99,171 @@ u8 kvm_pmu_hpmn(struct kvm_vcpu *vcpu)
return *host_data_ptr(nr_event_counters);
}
+
+/**
+ * kvm_pmu_host_counter_mask() - Compute bitmask of host-reserved counters
+ * @pmu: Pointer to arm_pmu struct
+ *
+ * Compute the bitmask that selects the host-reserved counters in the
+ * {PMCNTEN,PMINTEN,PMOVS}{SET,CLR} registers. These are the counters
+ * in HPMN..N
+ *
+ * Return: Bitmask
+ */
+u64 kvm_pmu_host_counter_mask(struct arm_pmu *pmu)
+{
+ u8 nr_counters = *host_data_ptr(nr_event_counters);
+
+ if (kvm_pmu_is_partitioned(pmu))
+ return GENMASK(nr_counters - 1, pmu->max_guest_counters);
+
+ return ARMV8_PMU_CNT_MASK_ALL;
+}
+
+/**
+ * kvm_pmu_guest_counter_mask() - Compute bitmask of guest-reserved counters
+ * @pmu: Pointer to arm_pmu struct
+ *
+ * Compute the bitmask that selects the guest-reserved counters in the
+ * {PMCNTEN,PMINTEN,PMOVS}{SET,CLR} registers. These are the counters
+ * in 0..HPMN and the cycle and instruction counters.
+ *
+ * Return: Bitmask
+ */
+u64 kvm_pmu_guest_counter_mask(struct arm_pmu *pmu)
+{
+ if (kvm_pmu_is_partitioned(pmu))
+ return ARMV8_PMU_CNT_MASK_C | GENMASK(pmu->max_guest_counters - 1, 0);
+
+ return 0;
+}
+
+/**
+ * kvm_pmu_load() - Load untrapped PMU registers
+ * @vcpu: Pointer to struct kvm_vcpu
+ *
+ * Load all untrapped PMU registers from the VCPU into the PCPU. Mask
+ * to only bits belonging to guest-reserved counters and leave
+ * host-reserved counters alone in bitmask registers.
+ */
+void kvm_pmu_load(struct kvm_vcpu *vcpu)
+{
+ struct arm_pmu *pmu;
+ unsigned long guest_counters;
+ u64 mask;
+ u8 i;
+ u64 val;
+
+ /*
+ * If we aren't guest-owned then we know the guest isn't using
+ * the PMU anyway, so no need to bother with the swap.
+ */
+ if (!kvm_vcpu_pmu_is_partitioned(vcpu))
+ return;
+
+ preempt_disable();
+
+ pmu = vcpu->kvm->arch.arm_pmu;
+ guest_counters = kvm_pmu_guest_counter_mask(pmu);
+
+ for_each_set_bit(i, &guest_counters, ARMPMU_MAX_HWEVENTS) {
+ val = __vcpu_sys_reg(vcpu, PMEVCNTR0_EL0 + i);
+
+ if (i == ARMV8_PMU_CYCLE_IDX) {
+ write_sysreg(val, pmccntr_el0);
+ } else {
+ write_sysreg(i, pmselr_el0);
+ write_sysreg(val, pmxevcntr_el0);
+ }
+ }
+
+ val = __vcpu_sys_reg(vcpu, PMSELR_EL0);
+ write_sysreg(val, pmselr_el0);
+
+ /* Save only the stateful writable bits. */
+ val = __vcpu_sys_reg(vcpu, PMCR_EL0);
+ mask = ARMV8_PMU_PMCR_MASK &
+ ~(ARMV8_PMU_PMCR_P | ARMV8_PMU_PMCR_C);
+ write_sysreg(val & mask, pmcr_el0);
+
+ /*
+ * When handling these:
+ * 1. Apply only the bits for guest counters (indicated by mask)
+ * 2. Use the different registers for set and clear
+ */
+ mask = kvm_pmu_guest_counter_mask(pmu);
+
+ /* Clear the hardware overflow flags so there is no chance of
+ * creating spurious interrupts. The hardware here is never
+ * the canonical version anyway.
+ */
+ write_sysreg(mask, pmovsclr_el0);
+
+ val = __vcpu_sys_reg(vcpu, PMCNTENSET_EL0);
+ write_sysreg(val & mask, pmcntenset_el0);
+ write_sysreg(~val & mask, pmcntenclr_el0);
+
+ val = __vcpu_sys_reg(vcpu, PMINTENSET_EL1);
+ write_sysreg(val & mask, pmintenset_el1);
+ write_sysreg(~val & mask, pmintenclr_el1);
+
+ preempt_enable();
+}
+
+/**
+ * kvm_pmu_put() - Put untrapped PMU registers
+ * @vcpu: Pointer to struct kvm_vcpu
+ *
+ * Put all untrapped PMU registers from the VCPU into the PCPU. Mask
+ * to only bits belonging to guest-reserved counters and leave
+ * host-reserved counters alone in bitmask registers.
+ */
+void kvm_pmu_put(struct kvm_vcpu *vcpu)
+{
+ struct arm_pmu *pmu;
+ unsigned long guest_counters;
+ u64 mask;
+ u8 i;
+ u64 val;
+
+ /*
+ * If we aren't guest-owned then we know the guest is not
+ * accessing the PMU anyway, so no need to bother with the
+ * swap.
+ */
+ if (!kvm_vcpu_pmu_is_partitioned(vcpu))
+ return;
+
+ preempt_disable();
+
+ pmu = vcpu->kvm->arch.arm_pmu;
+ guest_counters = kvm_pmu_guest_counter_mask(pmu);
+
+ for_each_set_bit(i, &guest_counters, ARMPMU_MAX_HWEVENTS) {
+ if (i == ARMV8_PMU_CYCLE_IDX) {
+ val = read_sysreg(pmccntr_el0);
+ } else {
+ write_sysreg(i, pmselr_el0);
+ val = read_sysreg(pmxevcntr_el0);
+ }
+
+ __vcpu_assign_sys_reg(vcpu, PMEVCNTR0_EL0 + i, val);
+ }
+
+ val = read_sysreg(pmselr_el0);
+ __vcpu_assign_sys_reg(vcpu, PMSELR_EL0, val);
+
+ val = read_sysreg(pmcr_el0);
+ __vcpu_assign_sys_reg(vcpu, PMCR_EL0, val);
+
+ /* Mask these to only save the guest relevant bits. */
+ mask = kvm_pmu_guest_counter_mask(pmu);
+
+ val = read_sysreg(pmcntenset_el0);
+ __vcpu_assign_sys_reg(vcpu, PMCNTENSET_EL0, val & mask);
+
+ val = read_sysreg(pmintenset_el1);
+ __vcpu_assign_sys_reg(vcpu, PMINTENSET_EL1, val & mask);
+
+ preempt_enable();
+}
diff --git a/include/kvm/arm_pmu.h b/include/kvm/arm_pmu.h
index 6aaeb27642540..fa881dc5f5832 100644
--- a/include/kvm/arm_pmu.h
+++ b/include/kvm/arm_pmu.h
@@ -96,6 +96,10 @@ bool kvm_pmu_is_partitioned(struct arm_pmu *pmu);
u8 kvm_pmu_guest_num_counters(struct kvm_vcpu *vcpu);
u8 kvm_pmu_hpmn(struct kvm_vcpu *vcpu);
+u64 kvm_pmu_host_counter_mask(struct arm_pmu *pmu);
+u64 kvm_pmu_guest_counter_mask(struct arm_pmu *pmu);
+void kvm_pmu_load(struct kvm_vcpu *vcpu);
+void kvm_pmu_put(struct kvm_vcpu *vcpu);
#if !defined(__KVM_NVHE_HYPERVISOR__)
bool kvm_vcpu_pmu_is_partitioned(struct kvm_vcpu *vcpu);
@@ -167,6 +171,8 @@ static inline u8 kvm_pmu_hpmn(struct kvm_vcpu *vcpu)
{
return 0;
}
+static inline void kvm_pmu_load(struct kvm_vcpu *vcpu) {}
+static inline void kvm_pmu_put(struct kvm_vcpu *vcpu) {}
static inline void kvm_pmu_set_counter_value(struct kvm_vcpu *vcpu,
u64 select_idx, u64 val) {}
static inline void kvm_pmu_set_counter_value_user(struct kvm_vcpu *vcpu,
@@ -269,6 +275,16 @@ static inline bool kvm_pmu_is_partitioned(void *pmu)
return false;
}
+static inline u64 kvm_pmu_host_counter_mask(void *kvm)
+{
+ return ~0;
+}
+
+static inline u64 kvm_pmu_guest_counter_mask(void *kvm)
+{
+ return 0;
+}
+
#endif
#endif
--
2.54.0.545.g6539524ca2-goog
^ permalink raw reply related [flat|nested] 42+ messages in thread

* Re: [PATCH v7 10/20] KVM: arm64: Context swap Partitioned PMU guest registers
2026-05-04 21:18 ` [PATCH v7 10/20] KVM: arm64: Context swap Partitioned PMU guest registers Colton Lewis
@ 2026-05-04 22:01 ` sashiko-bot
2026-05-11 14:49 ` James Clark
1 sibling, 0 replies; 42+ messages in thread
From: sashiko-bot @ 2026-05-04 22:01 UTC (permalink / raw)
To: Colton Lewis; +Cc: linux-perf-users
Thank you for your contribution! Sashiko AI review found 7 potential issue(s) to consider:
- [High] kvm_pmu_put() clobbers the guest's PMSELR_EL0 register by reading it after a loop that mutates it.
- [High] Missing ISB instruction after writing PMSELR_EL0 violates ARM architecture requirements.
- [High] kvm_pmu_put() clobbers read-only virtual PMCR_EL0 fields and fails to restore the host's PMCR_EL0.
- [High] Hardware PMU counter overflows (PMOVSSET_EL0) occurring without guest interrupts enabled are permanently lost.
- [High] kvm_pmu_guest_counter_mask() incorrectly uses the hardware maximum instead of the vCPU's configured counter limit.
- [High] Guest PMU counters and interrupts are left enabled in hardware when switching back to the host.
- [Medium] GENMASK macros produce undefined behavior and invalid masks when calculating 0-length counter sets.
--
commit cbcf45d4a902a24d3f32262cdbec6312af24a97d
Author: Colton Lewis <coltonlewis@google.com>
KVM: arm64: Context swap Partitioned PMU guest registers
Save and restore newly untrapped registers that can be directly
accessed by the guest when the PMU is partitioned.
> diff --git a/arch/arm64/kvm/pmu-direct.c b/arch/arm64/kvm/pmu-direct.c
> index 63ac72910e4b5..360d022d918d5 100644
> --- a/arch/arm64/kvm/pmu-direct.c
> +++ b/arch/arm64/kvm/pmu-direct.c
[ ... ]
> @@ -98,3 +99,171 @@ u8 kvm_pmu_hpmn(struct kvm_vcpu *vcpu)
> +u64 kvm_pmu_host_counter_mask(struct arm_pmu *pmu)
> +{
> + u8 nr_counters = *host_data_ptr(nr_event_counters);
> +
> + if (kvm_pmu_is_partitioned(pmu))
> + return GENMASK(nr_counters - 1, pmu->max_guest_counters);
Does this cause a regression where the GENMASK macro evaluates to an invalid
all-ones mask on ARM64 if nr_counters equals pmu->max_guest_counters?
> +
> + return ARMV8_PMU_CNT_MASK_ALL;
> +}
> +
> +/**
> + * kvm_pmu_guest_counter_mask() - Compute bitmask of guest-reserved counters
[ ... ]
> +u64 kvm_pmu_guest_counter_mask(struct arm_pmu *pmu)
> +{
> + if (kvm_pmu_is_partitioned(pmu))
> + return ARMV8_PMU_CNT_MASK_C | GENMASK(pmu->max_guest_counters - 1, 0);
Could this mask incorrectly include host-owned counters if a VM is configured
with fewer counters than the physical maximum? It appears to use the
hardware maximum instead of the vCPU's configured counter limit.
Also, does this trigger the same GENMASK undefined behavior regression as
above if pmu->max_guest_counters is 0?
> +
> + return 0;
> +}
> +
> +/**
> + * kvm_pmu_load() - Load untrapped PMU registers
[ ... ]
> +void kvm_pmu_load(struct kvm_vcpu *vcpu)
> +{
[ ... ]
> + for_each_set_bit(i, &guest_counters, ARMPMU_MAX_HWEVENTS) {
> + val = __vcpu_sys_reg(vcpu, PMEVCNTR0_EL0 + i);
> +
> + if (i == ARMV8_PMU_CYCLE_IDX) {
> + write_sysreg(val, pmccntr_el0);
> + } else {
> + write_sysreg(i, pmselr_el0);
> + write_sysreg(val, pmxevcntr_el0);
Should there be a synchronization event between writing pmselr_el0 and
accessing pmxevcntr_el0? The ARMv8 architecture requires an isb instruction
here to prevent the pipeline from reordering the access and targeting the
previously selected counter.
> + }
> + }
> +
> + val = __vcpu_sys_reg(vcpu, PMSELR_EL0);
> + write_sysreg(val, pmselr_el0);
> +
> + /* Save only the stateful writable bits. */
> + val = __vcpu_sys_reg(vcpu, PMCR_EL0);
> + mask = ARMV8_PMU_PMCR_MASK &
> + ~(ARMV8_PMU_PMCR_P | ARMV8_PMU_PMCR_C);
> + write_sysreg(val & mask, pmcr_el0);
Will overwriting the physical pmcr_el0 here silently cause a regression for
host PMU functionality? It looks like the host's original pmcr_el0 is never
restored when switching back to unpartitioned mode in kvm_pmu_put.
> +
> + /*
> + * When handling these:
> + * 1. Apply only the bits for guest counters (indicated by mask)
> + * 2. Use the different registers for set and clear
> + */
> + mask = kvm_pmu_guest_counter_mask(pmu);
> +
> + /* Clear the hardware overflow flags so there is no chance of
> + * creating spurious interrupts. The hardware here is never
> + * the canonical version anyway.
> + */
> + write_sysreg(mask, pmovsclr_el0);
Since kvm_pmu_put does not seem to read and save the hardware pmovsset_el0
register on VM exit, does clearing it here mean that any untrapped counter
overflows that occurred without guest interrupts enabled are permanently
lost?
> +
> + val = __vcpu_sys_reg(vcpu, PMCNTENSET_EL0);
> + write_sysreg(val & mask, pmcntenset_el0);
> + write_sysreg(~val & mask, pmcntenclr_el0);
[ ... ]
> +void kvm_pmu_put(struct kvm_vcpu *vcpu)
> +{
[ ... ]
> + for_each_set_bit(i, &guest_counters, ARMPMU_MAX_HWEVENTS) {
> + if (i == ARMV8_PMU_CYCLE_IDX) {
> + val = read_sysreg(pmccntr_el0);
> + } else {
> + write_sysreg(i, pmselr_el0);
> + val = read_sysreg(pmxevcntr_el0);
Is a synchronization event needed here as well, similar to the logic in
kvm_pmu_load?
> + }
> +
> + __vcpu_assign_sys_reg(vcpu, PMEVCNTR0_EL0 + i, val);
> + }
> +
> + val = read_sysreg(pmselr_el0);
> + __vcpu_assign_sys_reg(vcpu, PMSELR_EL0, val);
Does this regression clobber the guest's pmselr_el0 state? The loop directly
above mutates the hardware pmselr_el0 register. Reading it immediately after
the loop would save the last loop index accessed instead of the guest's
programmed value.
> +
> + val = read_sysreg(pmcr_el0);
> + __vcpu_assign_sys_reg(vcpu, PMCR_EL0, val);
Could this blindly overwrite the read-only fields of the virtual pmcr_el0
(like N, IMP, IDCODE) with the host's physical values?
> +
> + /* Mask these to only save the guest relevant bits. */
> + mask = kvm_pmu_guest_counter_mask(pmu);
> +
> + val = read_sysreg(pmcntenset_el0);
> + __vcpu_assign_sys_reg(vcpu, PMCNTENSET_EL0, val & mask);
> +
> + val = read_sysreg(pmintenset_el1);
> + __vcpu_assign_sys_reg(vcpu, PMINTENSET_EL1, val & mask);
Since partitioned vPMUs do not use host perf events, does this leave the
guest's hardware counters and interrupts active while the CPU executes
host code? This regression might leak host execution counts into the guest's
virtual counters and generate spurious overflow interrupts on the host.
> +
> + preempt_enable();
> +}
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260504211813.1804997-1-coltonlewis@google.com?part=10
^ permalink raw reply [flat|nested] 42+ messages in thread

* Re: [PATCH v7 10/20] KVM: arm64: Context swap Partitioned PMU guest registers
2026-05-04 21:18 ` [PATCH v7 10/20] KVM: arm64: Context swap Partitioned PMU guest registers Colton Lewis
2026-05-04 22:01 ` sashiko-bot
@ 2026-05-11 14:49 ` James Clark
1 sibling, 0 replies; 42+ messages in thread
From: James Clark @ 2026-05-11 14:49 UTC (permalink / raw)
To: Colton Lewis
Cc: Alexandru Elisei, Paolo Bonzini, Jonathan Corbet, Russell King,
Catalin Marinas, Will Deacon, Marc Zyngier, Oliver Upton,
Mingwei Zhang, Joey Gouly, Suzuki K Poulose, Zenghui Yu,
Mark Rutland, Shuah Khan, Ganapatrao Kulkarni, linux-doc,
linux-kernel, linux-arm-kernel, kvmarm, linux-perf-users,
linux-kselftest, kvm
On 04/05/2026 10:18 pm, Colton Lewis wrote:
> Save and restore newly untrapped registers that can be directly
> accessed by the guest when the PMU is partitioned.
>
> * PMEVCNTRn_EL0
> * PMCCNTR_EL0
> * PMSELR_EL0
> * PMCR_EL0
> * PMCNTEN_EL0
> * PMINTEN_EL1
>
> If we know we are not partitioned (that is, using the emulated vPMU),
> then return immediately. A later patch will make this lazy so the
> context swaps don't happen unless the guest has accessed the PMU.
>
> PMEVTYPER is handled in a following patch since we must apply the KVM
> event filter before writing values to hardware.
>
> PMOVS guest counters are cleared to avoid the possibility of
> generating spurious interrupts when PMINTEN is written. This is fine
> because the virtual register for PMOVS is always the canonical value.
>
> Signed-off-by: Colton Lewis <coltonlewis@google.com>
> ---
> arch/arm/include/asm/arm_pmuv3.h | 4 +
> arch/arm64/kvm/arm.c | 2 +
> arch/arm64/kvm/pmu-direct.c | 169 +++++++++++++++++++++++++++++++
> include/kvm/arm_pmu.h | 16 +++
> 4 files changed, 191 insertions(+)
>
> diff --git a/arch/arm/include/asm/arm_pmuv3.h b/arch/arm/include/asm/arm_pmuv3.h
> index 42d62aa48d0a6..eebc89bdab7a1 100644
> --- a/arch/arm/include/asm/arm_pmuv3.h
> +++ b/arch/arm/include/asm/arm_pmuv3.h
> @@ -235,6 +235,10 @@ static inline bool kvm_pmu_is_partitioned(struct arm_pmu *pmu)
> {
> return false;
> }
> +static inline u64 kvm_pmu_host_counter_mask(struct arm_pmu *pmu)
> +{
> + return ~0;
> +}
>
> /* PMU Version in DFR Register */
> #define ARMV8_PMU_DFR_VER_NI 0
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index 410ffd41fd73a..a942f2bc13fc4 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -680,6 +680,7 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
> kvm_vcpu_load_vhe(vcpu);
> kvm_arch_vcpu_load_fp(vcpu);
> kvm_vcpu_pmu_restore_guest(vcpu);
> + kvm_pmu_load(vcpu);
> if (kvm_arm_is_pvtime_enabled(&vcpu->arch))
> kvm_make_request(KVM_REQ_RECORD_STEAL, vcpu);
>
> @@ -721,6 +722,7 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
> kvm_timer_vcpu_put(vcpu);
> kvm_vgic_put(vcpu);
> kvm_vcpu_pmu_restore_host(vcpu);
> + kvm_pmu_put(vcpu);
> if (vcpu_has_nv(vcpu))
> kvm_vcpu_put_hw_mmu(vcpu);
> kvm_arm_vmid_clear_active();
> diff --git a/arch/arm64/kvm/pmu-direct.c b/arch/arm64/kvm/pmu-direct.c
> index 63ac72910e4b5..360d022d918d5 100644
> --- a/arch/arm64/kvm/pmu-direct.c
> +++ b/arch/arm64/kvm/pmu-direct.c
> @@ -9,6 +9,7 @@
> #include <linux/perf/arm_pmuv3.h>
>
> #include <asm/arm_pmuv3.h>
> +#include <asm/kvm_emulate.h>
>
> /**
> * has_host_pmu_partition_support() - Determine if partitioning is possible
> @@ -98,3 +99,171 @@ u8 kvm_pmu_hpmn(struct kvm_vcpu *vcpu)
>
> return *host_data_ptr(nr_event_counters);
> }
> +
> +/**
> + * kvm_pmu_host_counter_mask() - Compute bitmask of host-reserved counters
> + * @pmu: Pointer to arm_pmu struct
> + *
> + * Compute the bitmask that selects the host-reserved counters in the
> + * {PMCNTEN,PMINTEN,PMOVS}{SET,CLR} registers. These are the counters
> + * in HPMN..N
> + *
> + * Return: Bitmask
> + */
> +u64 kvm_pmu_host_counter_mask(struct arm_pmu *pmu)
> +{
> + u8 nr_counters = *host_data_ptr(nr_event_counters);
> +
> + if (kvm_pmu_is_partitioned(pmu))
> + return GENMASK(nr_counters - 1, pmu->max_guest_counters);
> +
> + return ARMV8_PMU_CNT_MASK_ALL;
> +}
> +
> +/**
> + * kvm_pmu_guest_counter_mask() - Compute bitmask of guest-reserved counters
> + * @pmu: Pointer to arm_pmu struct
> + *
> + * Compute the bitmask that selects the guest-reserved counters in the
> + * {PMCNTEN,PMINTEN,PMOVS}{SET,CLR} registers. These are the counters
> + * in 0..HPMN and the cycle and instruction counters.
> + *
> + * Return: Bitmask
> + */
> +u64 kvm_pmu_guest_counter_mask(struct arm_pmu *pmu)
> +{
> + if (kvm_pmu_is_partitioned(pmu))
> + return ARMV8_PMU_CNT_MASK_C | GENMASK(pmu->max_guest_counters - 1, 0);
> +
> + return 0;
> +}
Minor nit: slightly inconsistent use of types. Returns a u64 but doesn't
use GENMASK_ULL and is also usually saved into a long when it's called.
^ permalink raw reply [flat|nested] 42+ messages in thread
* [PATCH v7 11/20] KVM: arm64: Enforce PMU event filter at vcpu_load()
2026-05-04 21:17 [PATCH v7 00/20] ARM64 PMU Partitioning Colton Lewis
` (9 preceding siblings ...)
2026-05-04 21:18 ` [PATCH v7 10/20] KVM: arm64: Context swap Partitioned PMU guest registers Colton Lewis
@ 2026-05-04 21:18 ` Colton Lewis
2026-05-04 22:31 ` sashiko-bot
2026-05-04 21:18 ` [PATCH v7 12/20] perf: Add perf_pmu_resched_update() Colton Lewis
` (9 subsequent siblings)
20 siblings, 1 reply; 42+ messages in thread
From: Colton Lewis @ 2026-05-04 21:18 UTC (permalink / raw)
To: kvm
Cc: Alexandru Elisei, Paolo Bonzini, Jonathan Corbet, Russell King,
Catalin Marinas, Will Deacon, Marc Zyngier, Oliver Upton,
Mingwei Zhang, Joey Gouly, Suzuki K Poulose, Zenghui Yu,
Mark Rutland, Shuah Khan, Ganapatrao Kulkarni, James Clark,
linux-doc, linux-kernel, linux-arm-kernel, kvmarm,
linux-perf-users, linux-kselftest, Colton Lewis
The KVM API for event filtering says that counters do not count when
blocked by the event filter. To enforce that, the event filter must be
rechecked on every load since it might have changed since the last
time the guest wrote a value. If the event is filtered, exclude
counting at all exception levels before writing the hardware.
Signed-off-by: Colton Lewis <coltonlewis@google.com>
---
arch/arm64/kvm/pmu-direct.c | 54 +++++++++++++++++++++++++++++++++++++
1 file changed, 54 insertions(+)
diff --git a/arch/arm64/kvm/pmu-direct.c b/arch/arm64/kvm/pmu-direct.c
index 360d022d918d5..2252d3b905db9 100644
--- a/arch/arm64/kvm/pmu-direct.c
+++ b/arch/arm64/kvm/pmu-direct.c
@@ -138,6 +138,59 @@ u64 kvm_pmu_guest_counter_mask(struct arm_pmu *pmu)
return 0;
}
+/**
+ * kvm_pmu_apply_event_filter()
+ * @vcpu: Pointer to vcpu struct
+ *
+ * To uphold the guarantee of the KVM PMU event filter, we must ensure
+ * no counter counts if the event is filtered. Accomplish this by
+ * filtering all exception levels if the event is filtered.
+ */
+static void kvm_pmu_apply_event_filter(struct kvm_vcpu *vcpu)
+{
+ struct arm_pmu *pmu = vcpu->kvm->arch.arm_pmu;
+ unsigned long guest_counters;
+ u64 evtyper_set = ARMV8_PMU_EXCLUDE_EL0 |
+ ARMV8_PMU_EXCLUDE_EL1;
+ u64 evtyper_clr = ARMV8_PMU_INCLUDE_EL2;
+ bool guest_include_el2;
+ u8 i;
+ u64 val;
+ u64 evsel;
+
+ if (!pmu)
+ return;
+
+ guest_counters = kvm_pmu_guest_counter_mask(pmu);
+
+ for_each_set_bit(i, &guest_counters, ARMPMU_MAX_HWEVENTS) {
+ if (i == ARMV8_PMU_CYCLE_IDX) {
+ val = __vcpu_sys_reg(vcpu, PMCCFILTR_EL0);
+ evsel = ARMV8_PMUV3_PERFCTR_CPU_CYCLES;
+ } else {
+ val = __vcpu_sys_reg(vcpu, PMEVTYPER0_EL0 + i);
+ evsel = val & kvm_pmu_event_mask(vcpu->kvm);
+ }
+
+ guest_include_el2 = (val & ARMV8_PMU_INCLUDE_EL2);
+ val &= ~evtyper_clr;
+
+ if (unlikely(is_hyp_ctxt(vcpu)) && guest_include_el2)
+ val &= ~ARMV8_PMU_EXCLUDE_EL1;
+
+ if (vcpu->kvm->arch.pmu_filter &&
+ !test_bit(evsel, vcpu->kvm->arch.pmu_filter))
+ val |= evtyper_set;
+
+ if (i == ARMV8_PMU_CYCLE_IDX) {
+ write_sysreg(val, pmccntr_el0);
+ } else {
+ write_sysreg(i, pmselr_el0);
+ write_sysreg(val, pmxevtyper_el0);
+ }
+ }
+}
+
/**
* kvm_pmu_load() - Load untrapped PMU registers
* @vcpu: Pointer to struct kvm_vcpu
@@ -165,6 +218,7 @@ void kvm_pmu_load(struct kvm_vcpu *vcpu)
pmu = vcpu->kvm->arch.arm_pmu;
guest_counters = kvm_pmu_guest_counter_mask(pmu);
+ kvm_pmu_apply_event_filter(vcpu);
for_each_set_bit(i, &guest_counters, ARMPMU_MAX_HWEVENTS) {
val = __vcpu_sys_reg(vcpu, PMEVCNTR0_EL0 + i);
--
2.54.0.545.g6539524ca2-goog
^ permalink raw reply related [flat|nested] 42+ messages in thread

* Re: [PATCH v7 11/20] KVM: arm64: Enforce PMU event filter at vcpu_load()
2026-05-04 21:18 ` [PATCH v7 11/20] KVM: arm64: Enforce PMU event filter at vcpu_load() Colton Lewis
@ 2026-05-04 22:31 ` sashiko-bot
0 siblings, 0 replies; 42+ messages in thread
From: sashiko-bot @ 2026-05-04 22:31 UTC (permalink / raw)
To: Colton Lewis; +Cc: linux-perf-users
Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:
- [High] kvm_pmu_apply_event_filter() incorrectly writes the cycle counter's event filter configuration to the cycle counter value register (pmccntr_el0) instead of the filter register (pmccfiltr_el0).
- [High] Hardware event filters are only updated on VCPU context switches, leaving hardware state stale when the guest dynamically reconfigures events.
- [High] When running a guest in nested EL2 context, kvm_pmu_apply_event_filter() fails to disable hardware EL1 counting if the guest did not request counting in EL2.
--
commit 63c3532220965281cfb17a041907d38b581dd90c
Author: Colton Lewis <coltonlewis@google.com>
KVM: arm64: Enforce PMU event filter at vcpu_load()
The KVM API for event filtering says that counters do not count when
blocked by the event filter. To enforce that, the event filter must be
rechecked on every load since it might have changed since the last
time the guest wrote a value. If the event is filtered, exclude
counting at all exception levels before writing the hardware.
> diff --git a/arch/arm64/kvm/pmu-direct.c b/arch/arm64/kvm/pmu-direct.c
> index 360d022d918d5..2252d3b905db9 100644
> --- a/arch/arm64/kvm/pmu-direct.c
> +++ b/arch/arm64/kvm/pmu-direct.c
[ ... ]
> + guest_include_el2 = (val & ARMV8_PMU_INCLUDE_EL2);
> + val &= ~evtyper_clr;
> +
> + if (unlikely(is_hyp_ctxt(vcpu)) && guest_include_el2)
> + val &= ~ARMV8_PMU_EXCLUDE_EL1;
When the guest is running in nested EL2 context (which executes in hardware
EL1), what happens if the guest configured the counter to count in guest EL1
but not in guest EL2?
If guest_include_el2 is false, do we need to explicitly set
ARMV8_PMU_EXCLUDE_EL1 in the hardware register here to ensure the counter
pauses when the guest enters EL2?
> +
> + if (vcpu->kvm->arch.pmu_filter &&
> + !test_bit(evsel, vcpu->kvm->arch.pmu_filter))
> + val |= evtyper_set;
> +
> + if (i == ARMV8_PMU_CYCLE_IDX) {
> + write_sysreg(val, pmccntr_el0);
Does this write the filter configuration into the cycle counter value register
instead of the filter register?
If we use pmccfiltr_el0 here instead, will it correctly apply the exclusion
mask?
> + } else {
> + write_sysreg(i, pmselr_el0);
> + write_sysreg(val, pmxevtyper_el0);
> + }
> + }
> +}
> +
> /**
> * kvm_pmu_load() - Load untrapped PMU registers
> * @vcpu: Pointer to struct kvm_vcpu
> @@ -165,6 +218,7 @@ void kvm_pmu_load(struct kvm_vcpu *vcpu)
>
> pmu = vcpu->kvm->arch.arm_pmu;
> guest_counters = kvm_pmu_guest_counter_mask(pmu);
> + kvm_pmu_apply_event_filter(vcpu);
If the guest dynamically reconfigures events by writing to PMEVTYPERn_EL0 or
PMCCFILTR_EL0, does the physical hardware PMU continue counting the old event
until the VCPU happens to be scheduled out and back in?
Is there another place where we push the new value to the physical hardware
during a standard sysreg trap return to avoid leaving the hardware state stale?
>
> for_each_set_bit(i, &guest_counters, ARMPMU_MAX_HWEVENTS) {
> val = __vcpu_sys_reg(vcpu, PMEVCNTR0_EL0 + i);
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260504211813.1804997-1-coltonlewis@google.com?part=11
^ permalink raw reply [flat|nested] 42+ messages in thread
* [PATCH v7 12/20] perf: Add perf_pmu_resched_update()
2026-05-04 21:17 [PATCH v7 00/20] ARM64 PMU Partitioning Colton Lewis
` (10 preceding siblings ...)
2026-05-04 21:18 ` [PATCH v7 11/20] KVM: arm64: Enforce PMU event filter at vcpu_load() Colton Lewis
@ 2026-05-04 21:18 ` Colton Lewis
2026-05-04 21:55 ` sashiko-bot
2026-05-04 21:18 ` [PATCH v7 13/20] KVM: arm64: Apply dynamic guest counter reservations Colton Lewis
` (8 subsequent siblings)
20 siblings, 1 reply; 42+ messages in thread
From: Colton Lewis @ 2026-05-04 21:18 UTC (permalink / raw)
To: kvm
Cc: Alexandru Elisei, Paolo Bonzini, Jonathan Corbet, Russell King,
Catalin Marinas, Will Deacon, Marc Zyngier, Oliver Upton,
Mingwei Zhang, Joey Gouly, Suzuki K Poulose, Zenghui Yu,
Mark Rutland, Shuah Khan, Ganapatrao Kulkarni, James Clark,
linux-doc, linux-kernel, linux-arm-kernel, kvmarm,
linux-perf-users, linux-kselftest, Colton Lewis
To modify PMU guest counter reservations dynamically, we need to
update the available counters safely.
Introduce perf_pmu_resched_update() to allow updating the PMU struct
in between scheduling perf events out and scheduling them back in
again. It takes a callback to invoke while the events are scheduled
out. This accomplishes the goal with minimal perf API expansion.
Refactor ctx_resched() to call the callback in the right place.
Signed-off-by: Colton Lewis <coltonlewis@google.com>
---
include/linux/perf_event.h | 3 +++
kernel/events/core.c | 28 +++++++++++++++++++++++++---
2 files changed, 28 insertions(+), 3 deletions(-)
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 48d851fbd8ea5..a08db3ee38b10 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1242,6 +1242,9 @@ extern int perf_event_task_disable(void);
extern int perf_event_task_enable(void);
extern void perf_pmu_resched(struct pmu *pmu);
+extern void perf_pmu_resched_update(struct pmu *pmu,
+ void (*update)(struct pmu *, void *),
+ void *data);
extern int perf_event_refresh(struct perf_event *event, int refresh);
extern void perf_event_update_userpage(struct perf_event *event);
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 89b40e4397177..62fec73caabad 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -2983,9 +2983,10 @@ static void perf_event_sched_in(struct perf_cpu_context *cpuctx,
* event_type is a bit mask of the types of events involved. For CPU events,
* event_type is only either EVENT_PINNED or EVENT_FLEXIBLE.
*/
-static void ctx_resched(struct perf_cpu_context *cpuctx,
- struct perf_event_context *task_ctx,
- struct pmu *pmu, enum event_type_t event_type)
+static void __ctx_resched(struct perf_cpu_context *cpuctx,
+ struct perf_event_context *task_ctx,
+ struct pmu *pmu, enum event_type_t event_type,
+ void (*update)(struct pmu *, void *), void *data)
{
bool cpu_event = !!(event_type & EVENT_CPU);
struct perf_event_pmu_context *epc;
@@ -3021,6 +3022,9 @@ static void ctx_resched(struct perf_cpu_context *cpuctx,
else if (event_type & EVENT_PINNED)
ctx_sched_out(&cpuctx->ctx, pmu, EVENT_FLEXIBLE);
+ if (update)
+ update(pmu, data);
+
perf_event_sched_in(cpuctx, task_ctx, pmu, 0);
for_each_epc(epc, &cpuctx->ctx, pmu, 0)
@@ -3032,6 +3036,24 @@ static void ctx_resched(struct perf_cpu_context *cpuctx,
}
}
+static void ctx_resched(struct perf_cpu_context *cpuctx,
+ struct perf_event_context *task_ctx,
+ struct pmu *pmu, enum event_type_t event_type)
+{
+ __ctx_resched(cpuctx, task_ctx, pmu, event_type, NULL, NULL);
+}
+
+void perf_pmu_resched_update(struct pmu *pmu, void (*update)(struct pmu *, void *), void *data)
+{
+ struct perf_cpu_context *cpuctx = this_cpu_ptr(&perf_cpu_context);
+ struct perf_event_context *task_ctx = cpuctx->task_ctx;
+
+ perf_ctx_lock(cpuctx, task_ctx);
+ __ctx_resched(cpuctx, task_ctx, pmu, EVENT_ALL|EVENT_CPU, update, data);
+ perf_ctx_unlock(cpuctx, task_ctx);
+}
+EXPORT_SYMBOL_GPL(perf_pmu_resched_update);
+
void perf_pmu_resched(struct pmu *pmu)
{
struct perf_cpu_context *cpuctx = this_cpu_ptr(&perf_cpu_context);
--
2.54.0.545.g6539524ca2-goog
^ permalink raw reply related	[flat|nested] 42+ messages in thread
* [PATCH v7 12/20] perf: Add perf_pmu_resched_update()
2026-05-04 21:18 ` [PATCH v7 12/20] perf: Add perf_pmu_resched_update() Colton Lewis
@ 2026-05-04 21:55 ` sashiko-bot
0 siblings, 0 replies; 42+ messages in thread
From: sashiko-bot @ 2026-05-04 21:55 UTC (permalink / raw)
To: Colton Lewis; +Cc: linux-perf-users
Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
- [High] Deadlock vulnerability in exported perf_pmu_resched_update() API due to missing interrupt disablement
- [Medium] Undocumented strict atomic context constraints for the update callback risk PREEMPT_RT lock inversion
--
commit bfd872050e6147889915de2de8e49972cd31d5f4
Author: Colton Lewis <coltonlewis@google.com>
perf: Add perf_pmu_resched_update()
This commit introduces perf_pmu_resched_update() to allow dynamic modification
of PMU guest counter reservations safely. It takes a callback that is executed
between scheduling perf events out and back in.
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 89b40e4397177..62fec73caabad 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
[ ... ]
> @@ -3021,6 +3022,9 @@ static void ctx_resched(struct perf_cpu_context *cpuctx,
> else if (event_type & EVENT_PINNED)
> ctx_sched_out(&cpuctx->ctx, pmu, EVENT_FLEXIBLE);
>
> + if (update)
> + update(pmu, data);
Are there restrictions on what this update callback can do?
Because the callback is synchronously invoked while holding the perf context
raw_spinlock_t locks, any attempt to acquire a standard spinlock_t, allocate
memory, or block will trigger a sleeping while atomic bug or a lock inversion
panic on PREEMPT_RT kernels.
Could we document these strict atomic context requirements so future callers
using this exported API do not inadvertently crash the kernel?
> +
> perf_event_sched_in(cpuctx, task_ctx, pmu, 0);
>
> for_each_epc(epc, &cpuctx->ctx, pmu, 0)
[ ... ]
> @@ -3032,6 +3036,24 @@ static void ctx_resched(struct perf_cpu_context *cpuctx,
> }
> }
>
> +static void ctx_resched(struct perf_cpu_context *cpuctx,
> + struct perf_event_context *task_ctx,
> + struct pmu *pmu, enum event_type_t event_type)
> +{
> + __ctx_resched(cpuctx, task_ctx, pmu, event_type, NULL, NULL);
> +}
> +
> +void perf_pmu_resched_update(struct pmu *pmu, void (*update)(struct pmu *, void *), void *data)
> +{
> + struct perf_cpu_context *cpuctx = this_cpu_ptr(&perf_cpu_context);
> + struct perf_event_context *task_ctx = cpuctx->task_ctx;
> +
> + perf_ctx_lock(cpuctx, task_ctx);
Can this cause a deadlock if a hardware interrupt fires?
Looking at perf_ctx_lock(), it uses raw_spin_lock(&ctx->lock) which disables
preemption but leaves local interrupts enabled. If a Performance Monitoring
Interrupt or a timer tick executing perf_event_task_tick() fires on this CPU
while the lock is held, the interrupt handler will attempt to acquire the
same raw spinlock.
Should this function disable local interrupts (e.g., using local_irq_save())
before retrieving the CPU pointer and acquiring the locks?
> + __ctx_resched(cpuctx, task_ctx, pmu, EVENT_ALL|EVENT_CPU, update, data);
> + perf_ctx_unlock(cpuctx, task_ctx);
> +}
> +EXPORT_SYMBOL_GPL(perf_pmu_resched_update);
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260504211813.1804997-1-coltonlewis@google.com?part=12
^ permalink raw reply [flat|nested] 42+ messages in thread
* [PATCH v7 13/20] KVM: arm64: Apply dynamic guest counter reservations
2026-05-04 21:17 [PATCH v7 00/20] ARM64 PMU Partitioning Colton Lewis
` (11 preceding siblings ...)
2026-05-04 21:18 ` [PATCH v7 12/20] perf: Add perf_pmu_resched_update() Colton Lewis
@ 2026-05-04 21:18 ` Colton Lewis
2026-05-04 22:11 ` sashiko-bot
2026-05-11 14:47 ` James Clark
2026-05-04 21:18 ` [PATCH v7 14/20] KVM: arm64: Implement lazy PMU context swaps Colton Lewis
` (7 subsequent siblings)
20 siblings, 2 replies; 42+ messages in thread
From: Colton Lewis @ 2026-05-04 21:18 UTC (permalink / raw)
To: kvm
Cc: Alexandru Elisei, Paolo Bonzini, Jonathan Corbet, Russell King,
Catalin Marinas, Will Deacon, Marc Zyngier, Oliver Upton,
Mingwei Zhang, Joey Gouly, Suzuki K Poulose, Zenghui Yu,
Mark Rutland, Shuah Khan, Ganapatrao Kulkarni, James Clark,
linux-doc, linux-kernel, linux-arm-kernel, kvmarm,
linux-perf-users, linux-kselftest, Colton Lewis
Apply dynamic guest counter reservations by checking if the requested
guest mask collides with any events the host has scheduled and calling
perf_pmu_resched_update() with a hook that updates the mask of
available counters in between schedule out and schedule in.
Signed-off-by: Colton Lewis <coltonlewis@google.com>
---
arch/arm64/kvm/pmu-direct.c | 69 ++++++++++++++++++++++++++++++++++++
include/linux/perf/arm_pmu.h | 1 +
2 files changed, 70 insertions(+)
diff --git a/arch/arm64/kvm/pmu-direct.c b/arch/arm64/kvm/pmu-direct.c
index 2252d3b905db9..14cc419dbafad 100644
--- a/arch/arm64/kvm/pmu-direct.c
+++ b/arch/arm64/kvm/pmu-direct.c
@@ -100,6 +100,73 @@ u8 kvm_pmu_hpmn(struct kvm_vcpu *vcpu)
return *host_data_ptr(nr_event_counters);
}
+/* Callback to update counter mask between perf scheduling */
+static void kvm_pmu_update_mask(struct pmu *pmu, void *data)
+{
+ struct arm_pmu *arm_pmu = to_arm_pmu(pmu);
+ unsigned long *new_mask = data;
+
+ bitmap_copy(arm_pmu->cntr_mask, new_mask, ARMPMU_MAX_HWEVENTS);
+}
+
+/**
+ * kvm_pmu_set_guest_counters() - Handle dynamic counter reservations
+ * @cpu_pmu: struct arm_pmu to potentially modify
+ * @guest_mask: new guest mask for the pmu
+ *
+ * Check if guest counters will interfere with current host events and
+ * call into perf_pmu_resched_update if a reschedule is required.
+ */
+static void kvm_pmu_set_guest_counters(struct arm_pmu *cpu_pmu, u64 guest_mask)
+{
+ struct pmu_hw_events *cpuc = this_cpu_ptr(cpu_pmu->hw_events);
+ DECLARE_BITMAP(guest_bitmap, ARMPMU_MAX_HWEVENTS);
+ DECLARE_BITMAP(new_mask, ARMPMU_MAX_HWEVENTS);
+ bool need_resched = false;
+
+ bitmap_from_arr64(guest_bitmap, &guest_mask, ARMPMU_MAX_HWEVENTS);
+ bitmap_copy(new_mask, cpu_pmu->hw_cntr_mask, ARMPMU_MAX_HWEVENTS);
+
+ if (guest_mask) {
+ /* Subtract guest counters from available host mask */
+ bitmap_andnot(new_mask, new_mask, guest_bitmap, ARMPMU_MAX_HWEVENTS);
+
+ /* Did we collide with an active host event? */
+ if (bitmap_intersects(cpuc->used_mask, guest_bitmap, ARMPMU_MAX_HWEVENTS)) {
+ int idx;
+
+ need_resched = true;
+ cpuc->host_squeezed = true;
+
+ /* Look for pinned events that are about to be preempted */
+ for_each_set_bit(idx, guest_bitmap, ARMPMU_MAX_HWEVENTS) {
+ if (test_bit(idx, cpuc->used_mask) && cpuc->events[idx] &&
+ cpuc->events[idx]->attr.pinned) {
+ pr_warn_ratelimited("perf: Pinned host event squeezed out by KVM guest PMU partition\n");
+ break;
+ }
+ }
+ }
+ } else {
+ /*
+ * Restoring to hw_cntr_mask.
+ * Only resched if we previously squeezed an event.
+ */
+ if (cpuc->host_squeezed) {
+ need_resched = true;
+ cpuc->host_squeezed = false;
+ }
+ }
+
+ if (need_resched) {
+ /* Collision: run full perf reschedule */
+ perf_pmu_resched_update(&cpu_pmu->pmu, kvm_pmu_update_mask, new_mask);
+ } else {
+ /* Host was never using guest counters anyway */
+ bitmap_copy(cpu_pmu->cntr_mask, new_mask, ARMPMU_MAX_HWEVENTS);
+ }
+}
+
/**
* kvm_pmu_host_counter_mask() - Compute bitmask of host-reserved counters
* @pmu: Pointer to arm_pmu struct
@@ -218,6 +285,7 @@ void kvm_pmu_load(struct kvm_vcpu *vcpu)
pmu = vcpu->kvm->arch.arm_pmu;
guest_counters = kvm_pmu_guest_counter_mask(pmu);
+ kvm_pmu_set_guest_counters(pmu, guest_counters);
kvm_pmu_apply_event_filter(vcpu);
for_each_set_bit(i, &guest_counters, ARMPMU_MAX_HWEVENTS) {
@@ -319,5 +387,6 @@ void kvm_pmu_put(struct kvm_vcpu *vcpu)
val = read_sysreg(pmintenset_el1);
__vcpu_assign_sys_reg(vcpu, PMINTENSET_EL1, val & mask);
+ kvm_pmu_set_guest_counters(pmu, 0);
preempt_enable();
}
diff --git a/include/linux/perf/arm_pmu.h b/include/linux/perf/arm_pmu.h
index f7b000bb3eca8..63f88fec5e80f 100644
--- a/include/linux/perf/arm_pmu.h
+++ b/include/linux/perf/arm_pmu.h
@@ -75,6 +75,7 @@ struct pmu_hw_events {
/* Active events requesting branch records */
unsigned int branch_users;
+ bool host_squeezed;
};
enum armpmu_attr_groups {
--
2.54.0.545.g6539524ca2-goog
^ permalink raw reply related	[flat|nested] 42+ messages in thread
* Re: [PATCH v7 13/20] KVM: arm64: Apply dynamic guest counter reservations
2026-05-04 21:18 ` [PATCH v7 13/20] KVM: arm64: Apply dynamic guest counter reservations Colton Lewis
@ 2026-05-04 22:11 ` sashiko-bot
2026-05-11 14:47 ` James Clark
1 sibling, 0 replies; 42+ messages in thread
From: sashiko-bot @ 2026-05-04 22:11 UTC (permalink / raw)
To: Colton Lewis; +Cc: linux-perf-users
Thank you for your contribution! Sashiko AI review found 4 potential issue(s) to consider:
- [High] Hard lockup / Deadlock due to acquiring `perf_ctx_lock` with hardware interrupts enabled.
- [High] Data race and cross-CPU isolation breach due to dynamic modification of the global PMU counter mask (`cntr_mask`).
- [High] NULL pointer dereference risk due to lockless, unprotected access to the per-CPU perf events array.
- [Medium] Pending host perf events remain permanently inactive due to skipped PMU reschedule updates.
--
commit ad275b1ce5aad3594e39d36f37c39f73178b1c0b
Author: Colton Lewis <coltonlewis@google.com>
KVM: arm64: Apply dynamic guest counter reservations
Apply dynamic guest counter reservations by checking if the requested
guest mask collides with any events the host has scheduled and calling
perf_pmu_resched_update() with a hook that updates the mask of
available counters in between schedule out and schedule in.
> diff --git a/arch/arm64/kvm/pmu-direct.c b/arch/arm64/kvm/pmu-direct.c
> index 2252d3b905db9..14cc419dbafad 100644
> --- a/arch/arm64/kvm/pmu-direct.c
> +++ b/arch/arm64/kvm/pmu-direct.c
[ ... ]
> @@ -100,6 +100,73 @@ u8 kvm_pmu_hpmn(struct kvm_vcpu *vcpu)
> +static void kvm_pmu_set_guest_counters(struct arm_pmu *cpu_pmu, u64 guest_mask)
> +{
> + struct pmu_hw_events *cpuc = this_cpu_ptr(cpu_pmu->hw_events);
> + DECLARE_BITMAP(guest_bitmap, ARMPMU_MAX_HWEVENTS);
> + DECLARE_BITMAP(new_mask, ARMPMU_MAX_HWEVENTS);
> + bool need_resched = false;
> +
> + bitmap_from_arr64(guest_bitmap, &guest_mask, ARMPMU_MAX_HWEVENTS);
> + bitmap_copy(new_mask, cpu_pmu->hw_cntr_mask, ARMPMU_MAX_HWEVENTS);
> +
> + if (guest_mask) {
> + /* Subtract guest counters from available host mask */
> + bitmap_andnot(new_mask, new_mask, guest_bitmap, ARMPMU_MAX_HWEVENTS);
> +
> + /* Did we collide with an active host event? */
> + if (bitmap_intersects(cpuc->used_mask, guest_bitmap, ARMPMU_MAX_HWEVENTS)) {
> + int idx;
> +
> + need_resched = true;
> + cpuc->host_squeezed = true;
> +
> + /* Look for pinned events that are about to be preempted */
> + for_each_set_bit(idx, guest_bitmap, ARMPMU_MAX_HWEVENTS) {
> + if (test_bit(idx, cpuc->used_mask) && cpuc->events[idx] &&
> + cpuc->events[idx]->attr.pinned) {
Is there a risk of a NULL pointer dereference here?
The per-CPU events array is accessed without locks and with hardware interrupts
enabled. If an interrupt occurs (like an IPI triggering an event removal)
between the evaluation of cpuc->events[idx] and reading ->attr.pinned, and
the compiler issues separate memory reads due to a lack of READ_ONCE(), could
the second read observe a NULL pointer?
> + pr_warn_ratelimited("perf: Pinned host event squeezed out by KVM guest PMU partition\n");
> + break;
> + }
> + }
> + }
> + } else {
> + /*
> + * Restoring to hw_cntr_mask.
> + * Only resched if we previously squeezed an event.
> + */
> + if (cpuc->host_squeezed) {
> + need_resched = true;
> + cpuc->host_squeezed = false;
> + }
> + }
What happens to host perf events created while the guest is actively running?
It looks like cpuc->host_squeezed is only set if a host event collided when
the guest was loaded. If a new host event is created while the guest is
running, it will fail to allocate a counter due to the restricted mask.
When the guest unloads, cpuc->host_squeezed is false, so no reschedule is
requested. Will the perf subsystem be notified that counters have been freed,
or will the host event remain permanently inactive?
> +
> + if (need_resched) {
> + /* Collision: run full perf reschedule */
> + perf_pmu_resched_update(&cpu_pmu->pmu, kvm_pmu_update_mask, new_mask);
Could this cause a deadlock if a Performance Monitor Interrupt (PMI) or
perf IPI fires?
kvm_pmu_load() calls this with preemption disabled but leaves hardware
interrupts enabled. perf_pmu_resched_update() unconditionally calls
perf_ctx_lock(), which acquires the context lock without disabling local IRQs.
If an interrupt fires on the same CPU while this lock is held, and the
handler attempts to acquire ctx->lock via raw_spin_lock_irqsave(), will
it spin indefinitely?
> + } else {
> + /* Host was never using guest counters anyway */
> + bitmap_copy(cpu_pmu->cntr_mask, new_mask, ARMPMU_MAX_HWEVENTS);
Does modifying cpu_pmu->cntr_mask here introduce a data race?
struct arm_pmu represents PMU hardware capabilities shared across multiple CPUs
in a domain. If this per-CPU load/put path dynamically mutates cntr_mask
without global synchronization, could reserving guest counters on one CPU
incorrectly prevent host perf events from running on another CPU?
Also, when a vCPU unloads, it appears to unconditionally restore cntr_mask to
hw_cntr_mask, which might wipe out active reservations from other vCPUs.
> + }
> +}
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260504211813.1804997-1-coltonlewis@google.com?part=13
^ permalink raw reply	[flat|nested] 42+ messages in thread
* Re: [PATCH v7 13/20] KVM: arm64: Apply dynamic guest counter reservations
2026-05-04 21:18 ` [PATCH v7 13/20] KVM: arm64: Apply dynamic guest counter reservations Colton Lewis
2026-05-04 22:11 ` sashiko-bot
@ 2026-05-11 14:47 ` James Clark
1 sibling, 0 replies; 42+ messages in thread
From: James Clark @ 2026-05-11 14:47 UTC (permalink / raw)
To: Colton Lewis
Cc: Alexandru Elisei, Paolo Bonzini, Jonathan Corbet, Russell King,
Catalin Marinas, Will Deacon, Marc Zyngier, Oliver Upton,
Mingwei Zhang, Joey Gouly, Suzuki K Poulose, Zenghui Yu,
Mark Rutland, Shuah Khan, Ganapatrao Kulkarni, linux-doc,
linux-kernel, linux-arm-kernel, kvmarm, linux-perf-users,
linux-kselftest, kvm
On 04/05/2026 10:18 pm, Colton Lewis wrote:
> Apply dynamic guest counter reservations by checking if the requested
> guest mask collides with any events the host has scheduled and calling
> pmu_perf_resched_update() with a hook that updates the mask of
> available counters in between schedule out and schedule in.
>
> Signed-off-by: Colton Lewis <coltonlewis@google.com>
> ---
> arch/arm64/kvm/pmu-direct.c | 69 ++++++++++++++++++++++++++++++++++++
> include/linux/perf/arm_pmu.h | 1 +
> 2 files changed, 70 insertions(+)
>
> diff --git a/arch/arm64/kvm/pmu-direct.c b/arch/arm64/kvm/pmu-direct.c
> index 2252d3b905db9..14cc419dbafad 100644
> --- a/arch/arm64/kvm/pmu-direct.c
> +++ b/arch/arm64/kvm/pmu-direct.c
> @@ -100,6 +100,73 @@ u8 kvm_pmu_hpmn(struct kvm_vcpu *vcpu)
> return *host_data_ptr(nr_event_counters);
> }
>
> +/* Callback to update counter mask between perf scheduling */
> +static void kvm_pmu_update_mask(struct pmu *pmu, void *data)
> +{
> + struct arm_pmu *arm_pmu = to_arm_pmu(pmu);
> + unsigned long *new_mask = data;
> +
> + bitmap_copy(arm_pmu->cntr_mask, new_mask, ARMPMU_MAX_HWEVENTS);
> +}
> +
> +/**
> + * kvm_pmu_set_guest_counters() - Handle dynamic counter reservations
> + * @cpu_pmu: struct arm_pmu to potentially modify
> + * @guest_mask: new guest mask for the pmu
> + *
> + * Check if guest counters will interfere with current host events and
> + * call into perf_pmu_resched_update if a reschedule is required.
> + */
> +static void kvm_pmu_set_guest_counters(struct arm_pmu *cpu_pmu, u64 guest_mask)
> +{
> + struct pmu_hw_events *cpuc = this_cpu_ptr(cpu_pmu->hw_events);
> + DECLARE_BITMAP(guest_bitmap, ARMPMU_MAX_HWEVENTS);
> + DECLARE_BITMAP(new_mask, ARMPMU_MAX_HWEVENTS);
> + bool need_resched = false;
> +
> + bitmap_from_arr64(guest_bitmap, &guest_mask, ARMPMU_MAX_HWEVENTS);
> + bitmap_copy(new_mask, cpu_pmu->hw_cntr_mask, ARMPMU_MAX_HWEVENTS);
> +
> + if (guest_mask) {
> + /* Subtract guest counters from available host mask */
> + bitmap_andnot(new_mask, new_mask, guest_bitmap, ARMPMU_MAX_HWEVENTS);
> +
> + /* Did we collide with an active host event? */
> + if (bitmap_intersects(cpuc->used_mask, guest_bitmap, ARMPMU_MAX_HWEVENTS)) {
> + int idx;
> +
> + need_resched = true;
> + cpuc->host_squeezed = true;
> +
> + /* Look for pinned events that are about to be preempted */
> + for_each_set_bit(idx, guest_bitmap, ARMPMU_MAX_HWEVENTS) {
> + if (test_bit(idx, cpuc->used_mask) && cpuc->events[idx] &&
> + cpuc->events[idx]->attr.pinned) {
> + pr_warn_ratelimited("perf: Pinned host event squeezed out by KVM guest PMU partition\n");
Hi Colton,
I get "perf: Pinned host event squeezed out by KVM guest PMU partition"
even with arm_pmuv3.reserved_host_counters=3, for example. I would have
expected any non-zero value to stop the warning.
I think armv8pmu_get_single_idx() needs to be changed to allocate from
the high-end host counters first. A more complicated option would be to
check whether any non-pinned counters sit in the host-reserved half when
a new pinned counter is opened, then swap the new pinned event with an
existing non-pinned one so that pinned events always prefer the host
half. But it's probably not worth doing that.
James
> + break;
> + }
> + }
> + }
> + } else {
> + /*
> + * Restoring to hw_cntr_mask.
> + * Only resched if we previously squeezed an event.
> + */
> + if (cpuc->host_squeezed) {
> + need_resched = true;
> + cpuc->host_squeezed = false;
> + }
> + }
> +
> + if (need_resched) {
> + /* Collision: run full perf reschedule */
> + perf_pmu_resched_update(&cpu_pmu->pmu, kvm_pmu_update_mask, new_mask);
> + } else {
> + /* Host was never using guest counters anyway */
> + bitmap_copy(cpu_pmu->cntr_mask, new_mask, ARMPMU_MAX_HWEVENTS);
> + }
> +}
> +
> /**
> * kvm_pmu_host_counter_mask() - Compute bitmask of host-reserved counters
> * @pmu: Pointer to arm_pmu struct
> @@ -218,6 +285,7 @@ void kvm_pmu_load(struct kvm_vcpu *vcpu)
>
> pmu = vcpu->kvm->arch.arm_pmu;
> guest_counters = kvm_pmu_guest_counter_mask(pmu);
> + kvm_pmu_set_guest_counters(pmu, guest_counters);
> kvm_pmu_apply_event_filter(vcpu);
>
> for_each_set_bit(i, &guest_counters, ARMPMU_MAX_HWEVENTS) {
> @@ -319,5 +387,6 @@ void kvm_pmu_put(struct kvm_vcpu *vcpu)
> val = read_sysreg(pmintenset_el1);
> __vcpu_assign_sys_reg(vcpu, PMINTENSET_EL1, val & mask);
>
> + kvm_pmu_set_guest_counters(pmu, 0);
> preempt_enable();
> }
> diff --git a/include/linux/perf/arm_pmu.h b/include/linux/perf/arm_pmu.h
> index f7b000bb3eca8..63f88fec5e80f 100644
> --- a/include/linux/perf/arm_pmu.h
> +++ b/include/linux/perf/arm_pmu.h
> @@ -75,6 +75,7 @@ struct pmu_hw_events {
>
> /* Active events requesting branch records */
> unsigned int branch_users;
> + bool host_squeezed;
> };
>
> enum armpmu_attr_groups {
^ permalink raw reply [flat|nested] 42+ messages in thread
* [PATCH v7 14/20] KVM: arm64: Implement lazy PMU context swaps
2026-05-04 21:17 [PATCH v7 00/20] ARM64 PMU Partitioning Colton Lewis
` (12 preceding siblings ...)
2026-05-04 21:18 ` [PATCH v7 13/20] KVM: arm64: Apply dynamic guest counter reservations Colton Lewis
@ 2026-05-04 21:18 ` Colton Lewis
2026-05-04 22:13 ` sashiko-bot
2026-05-04 21:18 ` [PATCH v7 15/20] perf: arm_pmuv3: Handle IRQs for Partitioned PMU guest counters Colton Lewis
` (6 subsequent siblings)
20 siblings, 1 reply; 42+ messages in thread
From: Colton Lewis @ 2026-05-04 21:18 UTC (permalink / raw)
To: kvm
Cc: Alexandru Elisei, Paolo Bonzini, Jonathan Corbet, Russell King,
Catalin Marinas, Will Deacon, Marc Zyngier, Oliver Upton,
Mingwei Zhang, Joey Gouly, Suzuki K Poulose, Zenghui Yu,
Mark Rutland, Shuah Khan, Ganapatrao Kulkarni, James Clark,
linux-doc, linux-kernel, linux-arm-kernel, kvmarm,
linux-perf-users, linux-kselftest, Colton Lewis
Since many guests will never touch the PMU, they need not pay the cost
of context swapping those registers.
Use an enum to implement a simple state machine for PMU register
access. The PMU is either free or guest owned. We only need to context
swap if the PMU registers are guest owned. The PMU initially starts as
free and only transitions to guest owned if a guest has touched the
PMU registers.
Signed-off-by: Colton Lewis <coltonlewis@google.com>
---
arch/arm64/include/asm/kvm_host.h | 1 +
arch/arm64/include/asm/kvm_types.h | 6 +++++-
arch/arm64/kvm/debug.c | 2 +-
arch/arm64/kvm/pmu-direct.c | 25 +++++++++++++++++++++++--
arch/arm64/kvm/sys_regs.c | 11 +++++++++++
include/kvm/arm_pmu.h | 5 +++++
6 files changed, 46 insertions(+), 4 deletions(-)
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 1f789ba589d56..891433fe304ac 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -1404,6 +1404,7 @@ static inline bool kvm_system_needs_idmapped_vectors(void)
return cpus_have_final_cap(ARM64_SPECTRE_V3A);
}
+void kvm_arm_setup_mdcr_el2(struct kvm_vcpu *vcpu);
void kvm_init_host_debug_data(void);
void kvm_debug_init_vhe(void);
void kvm_vcpu_load_debug(struct kvm_vcpu *vcpu);
diff --git a/arch/arm64/include/asm/kvm_types.h b/arch/arm64/include/asm/kvm_types.h
index 9a126b9e2d7c9..4e39cbc80aa0b 100644
--- a/arch/arm64/include/asm/kvm_types.h
+++ b/arch/arm64/include/asm/kvm_types.h
@@ -4,5 +4,9 @@
#define KVM_ARCH_NR_OBJS_PER_MEMORY_CACHE 40
-#endif /* _ASM_ARM64_KVM_TYPES_H */
+enum vcpu_pmu_register_access {
+ VCPU_PMU_ACCESS_FREE,
+ VCPU_PMU_ACCESS_GUEST_OWNED,
+};
+#endif /* _ASM_ARM64_KVM_TYPES_H */
diff --git a/arch/arm64/kvm/debug.c b/arch/arm64/kvm/debug.c
index 0ab89c91e19cb..c2cf6b308ec60 100644
--- a/arch/arm64/kvm/debug.c
+++ b/arch/arm64/kvm/debug.c
@@ -34,7 +34,7 @@ static int cpu_has_spe(u64 dfr0)
* - Self-hosted Trace Filter controls (MDCR_EL2_TTRF)
* - Self-hosted Trace (MDCR_EL2_TTRF/MDCR_EL2_E2TB)
*/
-static void kvm_arm_setup_mdcr_el2(struct kvm_vcpu *vcpu)
+void kvm_arm_setup_mdcr_el2(struct kvm_vcpu *vcpu)
{
int hpmn = kvm_pmu_hpmn(vcpu);
diff --git a/arch/arm64/kvm/pmu-direct.c b/arch/arm64/kvm/pmu-direct.c
index 14cc419dbafad..881cea5117515 100644
--- a/arch/arm64/kvm/pmu-direct.c
+++ b/arch/arm64/kvm/pmu-direct.c
@@ -72,10 +72,29 @@ bool kvm_vcpu_pmu_use_fgt(struct kvm_vcpu *vcpu)
u8 hpmn = vcpu->kvm->arch.nr_pmu_counters;
return kvm_vcpu_pmu_is_partitioned(vcpu) &&
+ vcpu->arch.pmu.access == VCPU_PMU_ACCESS_GUEST_OWNED &&
cpus_have_final_cap(ARM64_HAS_FGT) &&
(hpmn != 0 || cpus_have_final_cap(ARM64_HAS_HPMN0));
}
+/**
+ * kvm_pmu_set_guest_owned()
+ * @vcpu: Pointer to vcpu struct
+ *
+ * Reconfigure the guest for physical access of PMU hardware if
+ * allowed. This means reconfiguring mdcr_el2 and loading the vCPU
+ * state onto hardware.
+ *
+ */
+void kvm_pmu_set_guest_owned(struct kvm_vcpu *vcpu)
+{
+ if (kvm_vcpu_pmu_is_partitioned(vcpu) &&
+ vcpu->arch.pmu.access == VCPU_PMU_ACCESS_FREE) {
+ vcpu->arch.pmu.access = VCPU_PMU_ACCESS_GUEST_OWNED;
+ kvm_arm_setup_mdcr_el2(vcpu);
+ }
+}
+
/**
* kvm_pmu_hpmn() - Calculate HPMN field value
* @vcpu: Pointer to struct kvm_vcpu
@@ -278,7 +297,8 @@ void kvm_pmu_load(struct kvm_vcpu *vcpu)
* If we aren't guest-owned then we know the guest isn't using
* the PMU anyway, so no need to bother with the swap.
*/
- if (!kvm_vcpu_pmu_is_partitioned(vcpu))
+ if (!kvm_vcpu_pmu_is_partitioned(vcpu) ||
+ vcpu->arch.pmu.access != VCPU_PMU_ACCESS_GUEST_OWNED)
return;
preempt_disable();
@@ -353,7 +373,8 @@ void kvm_pmu_put(struct kvm_vcpu *vcpu)
* accessing the PMU anyway, so no need to bother with the
* swap.
*/
- if (!kvm_vcpu_pmu_is_partitioned(vcpu))
+ if (!kvm_vcpu_pmu_is_partitioned(vcpu) ||
+ vcpu->arch.pmu.access != VCPU_PMU_ACCESS_GUEST_OWNED)
return;
preempt_disable();
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index cc3d1804ab200..e4840d93a769f 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -1035,6 +1035,7 @@ static bool pmu_access_event_counter_el0_disabled(struct kvm_vcpu *vcpu)
static void pmu_write_pmcr(struct kvm_vcpu *vcpu, u64 val)
{
if (kvm_vcpu_pmu_is_partitioned(vcpu)) {
+ kvm_pmu_set_guest_owned(vcpu);
write_sysreg(val, pmcr_el0);
return;
}
@@ -1086,6 +1087,7 @@ static bool access_pmcr(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
static void pmu_write_pmselr(struct kvm_vcpu *vcpu, u64 val)
{
if (kvm_vcpu_pmu_is_partitioned(vcpu)) {
+ kvm_pmu_set_guest_owned(vcpu);
write_sysreg(val, pmselr_el0);
return;
}
@@ -1193,6 +1195,8 @@ static void pmu_write_evcntr(struct kvm_vcpu *vcpu, u64 val, u64 idx)
return;
}
+ kvm_pmu_set_guest_owned(vcpu);
+
if (idx == ARMV8_PMU_CYCLE_IDX) {
write_sysreg(val, pmccntr_el0);
return;
@@ -1281,6 +1285,7 @@ static void pmu_write_evtyper(struct kvm_vcpu *vcpu, u64 val, u64 idx)
u64 mask;
if (kvm_vcpu_pmu_is_partitioned(vcpu)) {
+ kvm_pmu_set_guest_owned(vcpu);
mask = kvm_pmu_evtyper_mask(vcpu->kvm);
__vcpu_assign_sys_reg(vcpu, PMEVTYPER0_EL0 + idx, val & mask);
return;
@@ -1350,6 +1355,8 @@ static int get_pmreg(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r, u64 *v
static void pmu_write_pmcnten(struct kvm_vcpu *vcpu, u64 val, bool set)
{
if (kvm_vcpu_pmu_is_partitioned(vcpu)) {
+ kvm_pmu_set_guest_owned(vcpu);
+
if (set)
write_sysreg(val, pmcntenset_el0);
else
@@ -1398,6 +1405,8 @@ static bool access_pmcnten(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
static void pmu_write_pminten(struct kvm_vcpu *vcpu, u64 val, bool set)
{
if (kvm_vcpu_pmu_is_partitioned(vcpu)) {
+ kvm_pmu_set_guest_owned(vcpu);
+
if (set)
write_sysreg(val, pmintenset_el1);
else
@@ -1453,6 +1462,8 @@ static bool access_pmovs(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
return false;
if (p->is_write) {
+ kvm_pmu_set_guest_owned(vcpu);
+
if (r->CRm & 0x2)
/* accessing PMOVSSET_EL0 */
__vcpu_rmw_sys_reg(vcpu, PMOVSSET_EL0, |=, (p->regval & mask));
diff --git a/include/kvm/arm_pmu.h b/include/kvm/arm_pmu.h
index fa881dc5f5832..0de63cc48fef9 100644
--- a/include/kvm/arm_pmu.h
+++ b/include/kvm/arm_pmu.h
@@ -7,6 +7,7 @@
#ifndef __ASM_ARM_KVM_PMU_H
#define __ASM_ARM_KVM_PMU_H
+#include <linux/kvm_types.h>
#include <linux/perf_event.h>
#include <linux/perf/arm_pmuv3.h>
#include <linux/perf/arm_pmu.h>
@@ -40,6 +41,7 @@ struct kvm_pmu {
int irq_num;
bool created;
bool irq_level;
+ enum vcpu_pmu_register_access access;
};
struct arm_pmu_entry {
@@ -101,6 +103,8 @@ u64 kvm_pmu_guest_counter_mask(struct arm_pmu *pmu);
void kvm_pmu_load(struct kvm_vcpu *vcpu);
void kvm_pmu_put(struct kvm_vcpu *vcpu);
+void kvm_pmu_set_guest_owned(struct kvm_vcpu *vcpu);
+
#if !defined(__KVM_NVHE_HYPERVISOR__)
bool kvm_vcpu_pmu_is_partitioned(struct kvm_vcpu *vcpu);
bool kvm_vcpu_pmu_use_fgt(struct kvm_vcpu *vcpu);
@@ -173,6 +177,7 @@ static inline u8 kvm_pmu_hpmn(struct kvm_vcpu *vcpu)
}
static inline void kvm_pmu_load(struct kvm_vcpu *vcpu) {}
static inline void kvm_pmu_put(struct kvm_vcpu *vcpu) {}
+static inline void kvm_pmu_set_guest_owned(struct kvm_vcpu *vcpu) {}
static inline void kvm_pmu_set_counter_value(struct kvm_vcpu *vcpu,
u64 select_idx, u64 val) {}
static inline void kvm_pmu_set_counter_value_user(struct kvm_vcpu *vcpu,
--
2.54.0.545.g6539524ca2-goog
^ permalink raw reply related	[flat|nested] 42+ messages in thread
* Re: [PATCH v7 14/20] KVM: arm64: Implement lazy PMU context swaps
2026-05-04 21:18 ` [PATCH v7 14/20] KVM: arm64: Implement lazy PMU context swaps Colton Lewis
@ 2026-05-04 22:13 ` sashiko-bot
0 siblings, 0 replies; 42+ messages in thread
From: sashiko-bot @ 2026-05-04 22:13 UTC (permalink / raw)
To: Colton Lewis; +Cc: linux-perf-users
Thank you for your contribution! Sashiko AI review found 5 potential issue(s) to consider:
- [Critical] Transitioning the PMU to GUEST_OWNED fails to load the guest's virtual PMU state into the hardware registers, leading to an information leak and state corruption.
- [High] PMU read accessors directly read from hardware registers when the PMU is in the FREE state, leaking host PMU state to the guest.
- [High] Restoring PMU registers via userspace during live migration fails to transition the vCPU to GUEST_OWNED, leaving restored counters inert.
- [High] `pmu_write_evtyper` fails to update the physical hardware register, causing guest event configurations to be ignored.
- [High] Direct hardware register manipulation in `pmu_write_evcntr` and `pmu_read_evcntr` lacks preemption protection, interacting with a bug in `kvm_pmu_put` to cause state corruption.
--
commit 62dcf0081f7f3437442e2756f69cc3188de056f1
Author: Colton Lewis <coltonlewis@google.com>
KVM: arm64: Implement lazy PMU context swaps
This commit implements a simple state machine for PMU register access to
avoid the cost of context swapping for guests that never touch the PMU. The
PMU starts as free and transitions to guest-owned only when a guest accesses
the PMU registers.
> diff --git a/arch/arm64/include/asm/kvm_types.h b/arch/arm64/include/asm/kvm_types.h
> index 9a126b9e2d7c9..4e39cbc80aa0b 100644
> --- a/arch/arm64/include/asm/kvm_types.h
> +++ b/arch/arm64/include/asm/kvm_types.h
> @@ -4,5 +4,9 @@
>
> #define KVM_ARCH_NR_OBJS_PER_MEMORY_CACHE 40
>
> -#endif /* _ASM_ARM64_KVM_TYPES_H */
> +enum vcpu_pmu_register_access {
> + VCPU_PMU_ACCESS_FREE,
> + VCPU_PMU_ACCESS_GUEST_OWNED,
> +};
How does this state machine handle PMU registers being restored via userspace
during live migration?
If userspace restores PMU registers via KVM_SET_ONE_REG, the sysreg setters
update the in-memory vCPU state but do not appear to call
kvm_pmu_set_guest_owned() or transition the state away from
VCPU_PMU_ACCESS_FREE.
When the migrated VM resumes execution, kvm_pmu_load() might ignore the PMU
context, leaving any previously running counters inert until the guest
explicitly performs a new PMU write.
> diff --git a/arch/arm64/kvm/pmu-direct.c b/arch/arm64/kvm/pmu-direct.c
> index 14cc419dbafad..881cea5117515 100644
> --- a/arch/arm64/kvm/pmu-direct.c
> +++ b/arch/arm64/kvm/pmu-direct.c
[ ... ]
> @@ -278,7 +297,8 @@ void kvm_pmu_load(struct kvm_vcpu *vcpu)
> * If we aren't guest-owned then we know the guest isn't using
> * the PMU anyway, so no need to bother with the swap.
> */
> - if (!kvm_vcpu_pmu_is_partitioned(vcpu))
> + if (!kvm_vcpu_pmu_is_partitioned(vcpu) ||
> + vcpu->arch.pmu.access != VCPU_PMU_ACCESS_GUEST_OWNED)
> return;
>
> preempt_disable();
If the PMU is in the VCPU_PMU_ACCESS_FREE state, do the PMU read accessors
directly read from hardware registers?
If a guest reads a PMU register before performing any writes, the state
remains free. Because this check skips loading the guest context, could the
read accessors execute read_sysreg() on the physical hardware and return the
host's PMU state to the guest?
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index cc3d1804ab200..e4840d93a769f 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -1035,6 +1035,7 @@ static bool pmu_access_event_counter_el0_disabled(struct kvm_vcpu *vcpu)
> static void pmu_write_pmcr(struct kvm_vcpu *vcpu, u64 val)
> {
> if (kvm_vcpu_pmu_is_partitioned(vcpu)) {
> + kvm_pmu_set_guest_owned(vcpu);
> write_sysreg(val, pmcr_el0);
> return;
> }
Does transitioning the PMU to VCPU_PMU_ACCESS_GUEST_OWNED here also require
loading the guest's virtual PMU state into the hardware registers?
When kvm_pmu_set_guest_owned() is called, it configures mdcr_el2 to allow
direct guest access but fails to invoke kvm_pmu_load(). Since the initial
vCPU entry skipped loading the PMU, the physical registers might still
contain the host's PMU state.
Could this allow the guest to read uninitialized hardware registers? Also,
upon preemption, kvm_pmu_put() might read these host values and overwrite the
guest's saved memory context.
[ ... ]
> @@ -1193,6 +1195,8 @@ static void pmu_write_evcntr(struct kvm_vcpu *vcpu, u64 val, u64 idx)
> return;
> }
>
> + kvm_pmu_set_guest_owned(vcpu);
> +
> if (idx == ARMV8_PMU_CYCLE_IDX) {
> write_sysreg(val, pmccntr_el0);
> return;
Is there a preemption issue when manipulating hardware registers directly
here? This problem wasn't introduced by this patch, but it significantly
impacts the PMU logic.
In pmu_write_evcntr(), the code modifies pmselr_el0 to select a counter index
before writing to pmxevcntr_el0. Since KVM sysreg handlers run with preemption
enabled, if preemption occurs mid-sequence, kvm_pmu_put() iterates over guest
counters, using and clobbering pmselr_el0.
It appears kvm_pmu_put() fails to read and save the original pmselr_el0 before
its loop, instead saving the highest guest counter index at the end. When the
task resumes, kvm_pmu_load() restores this highest index to pmselr_el0.
Execution resumes in pmu_write_evcntr(), which might then write to
pmxevcntr_el0, inadvertently mutating the highest guest counter instead of
the intended idx.
[ ... ]
> @@ -1281,6 +1285,7 @@ static void pmu_write_evtyper(struct kvm_vcpu *vcpu, u64 val, u64 idx)
> u64 mask;
>
> if (kvm_vcpu_pmu_is_partitioned(vcpu)) {
> + kvm_pmu_set_guest_owned(vcpu);
> mask = kvm_pmu_evtyper_mask(vcpu->kvm);
> __vcpu_assign_sys_reg(vcpu, PMEVTYPER0_EL0 + idx, val & mask);
> return;
> }
Does this sequence fail to update the physical hardware register?
The code updates the guest's event configuration in the memory array using
__vcpu_assign_sys_reg(), but omits writing the value to the physical
pmxevtyper_el0 register.
Since KVM does not invoke kvm_pmu_load() after returning from a sysreg trap,
could the active hardware counter continue counting its previous event instead
of the newly configured one?
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260504211813.1804997-1-coltonlewis@google.com?part=14
^ permalink raw reply [flat|nested] 42+ messages in thread
* [PATCH v7 15/20] perf: arm_pmuv3: Handle IRQs for Partitioned PMU guest counters
2026-05-04 21:17 [PATCH v7 00/20] ARM64 PMU Partitioning Colton Lewis
` (13 preceding siblings ...)
2026-05-04 21:18 ` [PATCH v7 14/20] KVM: arm64: Implement lazy PMU context swaps Colton Lewis
@ 2026-05-04 21:18 ` Colton Lewis
2026-05-04 22:18 ` sashiko-bot
2026-05-04 21:18 ` [PATCH v7 16/20] KVM: arm64: Detect overflows for the Partitioned PMU Colton Lewis
` (5 subsequent siblings)
20 siblings, 1 reply; 42+ messages in thread
From: Colton Lewis @ 2026-05-04 21:18 UTC (permalink / raw)
To: kvm
Cc: Alexandru Elisei, Paolo Bonzini, Jonathan Corbet, Russell King,
Catalin Marinas, Will Deacon, Marc Zyngier, Oliver Upton,
Mingwei Zhang, Joey Gouly, Suzuki K Poulose, Zenghui Yu,
Mark Rutland, Shuah Khan, Ganapatrao Kulkarni, James Clark,
linux-doc, linux-kernel, linux-arm-kernel, kvmarm,
linux-perf-users, linux-kselftest, Colton Lewis
Because ARM hardware is not yet capable of direct PPI injection into
guests, guest counters will still trigger interrupts that need to be
handled by the host PMU interrupt handler. Clear the overflow flags in
hardware to handle the interrupt as normal, but set the virtual
overflow register for later injecting the interrupt into the guest.
Signed-off-by: Colton Lewis <coltonlewis@google.com>
---
arch/arm/include/asm/arm_pmuv3.h | 6 ++++++
arch/arm64/include/asm/arm_pmuv3.h | 5 +++++
arch/arm64/kvm/pmu-direct.c | 22 ++++++++++++++++++++++
drivers/perf/arm_pmuv3.c | 24 +++++++++++++++++-------
include/kvm/arm_pmu.h | 3 +++
5 files changed, 53 insertions(+), 7 deletions(-)
diff --git a/arch/arm/include/asm/arm_pmuv3.h b/arch/arm/include/asm/arm_pmuv3.h
index eebc89bdab7a1..0d01508c5b77f 100644
--- a/arch/arm/include/asm/arm_pmuv3.h
+++ b/arch/arm/include/asm/arm_pmuv3.h
@@ -180,6 +180,11 @@ static inline void write_pmintenset(u32 val)
write_sysreg(val, PMINTENSET);
}
+static inline u32 read_pmintenset(void)
+{
+ return read_sysreg(PMINTENSET);
+}
+
static inline void write_pmintenclr(u32 val)
{
write_sysreg(val, PMINTENCLR);
@@ -239,6 +244,7 @@ static inline u64 kvm_pmu_host_counter_mask(struct arm_pmu *pmu)
{
return ~0;
}
+static inline void kvm_pmu_handle_guest_irq(struct arm_pmu *pmu, u64 pmovsr) {}
/* PMU Version in DFR Register */
#define ARMV8_PMU_DFR_VER_NI 0
diff --git a/arch/arm64/include/asm/arm_pmuv3.h b/arch/arm64/include/asm/arm_pmuv3.h
index 27c4d6d47da31..69ff4d014bf39 100644
--- a/arch/arm64/include/asm/arm_pmuv3.h
+++ b/arch/arm64/include/asm/arm_pmuv3.h
@@ -110,6 +110,11 @@ static inline void write_pmintenset(u64 val)
write_sysreg(val, pmintenset_el1);
}
+static inline u64 read_pmintenset(void)
+{
+ return read_sysreg(pmintenset_el1);
+}
+
static inline void write_pmintenclr(u64 val)
{
write_sysreg(val, pmintenclr_el1);
diff --git a/arch/arm64/kvm/pmu-direct.c b/arch/arm64/kvm/pmu-direct.c
index 881cea5117515..535b4c492ff80 100644
--- a/arch/arm64/kvm/pmu-direct.c
+++ b/arch/arm64/kvm/pmu-direct.c
@@ -411,3 +411,25 @@ void kvm_pmu_put(struct kvm_vcpu *vcpu)
kvm_pmu_set_guest_counters(pmu, 0);
preempt_enable();
}
+
+/**
+ * kvm_pmu_handle_guest_irq() - Record IRQs in guest counters
+ * @pmu: PMU to check for overflows
+ * @pmovsr: Overflow flags reported by driver
+ *
+ * Set overflow flags in guest-reserved counters in the VCPU register
+ * for the guest to clear later.
+ */
+void kvm_pmu_handle_guest_irq(struct arm_pmu *pmu, u64 pmovsr)
+{
+ struct kvm_vcpu *vcpu = kvm_get_running_vcpu();
+ u64 mask = kvm_pmu_guest_counter_mask(pmu);
+ u64 govf = pmovsr & mask;
+
+ write_pmovsclr(govf);
+
+ if (!vcpu)
+ return;
+
+ __vcpu_rmw_sys_reg(vcpu, PMOVSSET_EL0, |=, govf);
+}
diff --git a/drivers/perf/arm_pmuv3.c b/drivers/perf/arm_pmuv3.c
index 6e447227d801f..16e3700dca645 100644
--- a/drivers/perf/arm_pmuv3.c
+++ b/drivers/perf/arm_pmuv3.c
@@ -774,16 +774,15 @@ static void armv8pmu_disable_event_irq(struct perf_event *event)
armv8pmu_disable_intens(BIT(event->hw.idx));
}
-static u64 armv8pmu_getreset_flags(void)
+static u64 armv8pmu_getovf_flags(void)
{
u64 value;
/* Read */
value = read_pmovsclr();
- /* Write to clear flags */
- value &= ARMV8_PMU_CNT_MASK_ALL;
- write_pmovsclr(value);
+ /* Only report interrupt enabled counters. */
+ value &= read_pmintenset();
return value;
}
@@ -897,16 +896,17 @@ static void read_branch_records(struct pmu_hw_events *cpuc,
static irqreturn_t armv8pmu_handle_irq(struct arm_pmu *cpu_pmu)
{
- u64 pmovsr;
struct perf_sample_data data;
struct pmu_hw_events *cpuc = this_cpu_ptr(cpu_pmu->hw_events);
struct pt_regs *regs;
+ u64 host_set = kvm_pmu_host_counter_mask(cpu_pmu);
+ u64 pmovsr;
int idx;
/*
- * Get and reset the IRQ flags
+ * Get the IRQ flags
*/
- pmovsr = armv8pmu_getreset_flags();
+ pmovsr = armv8pmu_getovf_flags();
/*
* Did an overflow occur?
@@ -914,6 +914,12 @@ static irqreturn_t armv8pmu_handle_irq(struct arm_pmu *cpu_pmu)
if (!armv8pmu_has_overflowed(pmovsr))
return IRQ_NONE;
+ /*
+ * Guest flag reset is handled by the KVM hook at the bottom of
+ * this function.
+ */
+ write_pmovsclr(pmovsr & host_set);
+
/*
* Handle the counter(s) overflow(s)
*/
@@ -955,6 +961,10 @@ static irqreturn_t armv8pmu_handle_irq(struct arm_pmu *cpu_pmu)
*/
perf_event_overflow(event, &data, regs);
}
+
+ if (kvm_pmu_is_partitioned(cpu_pmu))
+ kvm_pmu_handle_guest_irq(cpu_pmu, pmovsr);
+
armv8pmu_start(cpu_pmu);
return IRQ_HANDLED;
diff --git a/include/kvm/arm_pmu.h b/include/kvm/arm_pmu.h
index 0de63cc48fef9..de058a5347d18 100644
--- a/include/kvm/arm_pmu.h
+++ b/include/kvm/arm_pmu.h
@@ -95,6 +95,7 @@ void kvm_vcpu_pmu_resync_el0(void);
(vcpu_has_feature(vcpu, KVM_ARM_VCPU_PMU_V3))
bool kvm_pmu_is_partitioned(struct arm_pmu *pmu);
+void kvm_pmu_handle_guest_irq(struct arm_pmu *pmu, u64 pmovsr);
u8 kvm_pmu_guest_num_counters(struct kvm_vcpu *vcpu);
u8 kvm_pmu_hpmn(struct kvm_vcpu *vcpu);
@@ -290,6 +291,8 @@ static inline u64 kvm_pmu_guest_counter_mask(void *kvm)
return 0;
}
+static inline void kvm_pmu_handle_guest_irq(struct arm_pmu *pmu, u64 pmovsr) {}
+
#endif
#endif
--
2.54.0.545.g6539524ca2-goog
^ permalink raw reply related [flat|nested] 42+ messages in thread
* Re: [PATCH v7 15/20] perf: arm_pmuv3: Handle IRQs for Partitioned PMU guest counters
2026-05-04 21:18 ` [PATCH v7 15/20] perf: arm_pmuv3: Handle IRQs for Partitioned PMU guest counters Colton Lewis
@ 2026-05-04 22:18 ` sashiko-bot
0 siblings, 0 replies; 42+ messages in thread
From: sashiko-bot @ 2026-05-04 22:18 UTC (permalink / raw)
To: Colton Lewis; +Cc: linux-perf-users
Thank you for your contribution! Sashiko AI review found 4 potential issue(s) to consider:
- [High] Non-atomic read-modify-write race on `PMOVSSET_EL0` corrupts guest PMU state.
- [High] Guest PMU interrupts are permanently dropped if they fire while the vCPU is scheduled out.
- [High] Missing KVM request fails to inject virtual PMU interrupts into the guest.
- [High] Pending or polled guest PMU overflow flags are silently lost during vCPU context switches.
--
commit a0dad1aa1988f327f4127d17e8307aeadf9dd34d
Author: Colton Lewis <coltonlewis@google.com>
perf: arm_pmuv3: Handle IRQs for Partitioned PMU guest counters
Because ARM hardware is not yet capable of direct PPI injection into
guests, guest counters will still trigger interrupts that need to be
handled by the host PMU interrupt handler. Clear the overflow flags in
hardware to handle the interrupt as normal, but set the virtual
overflow register for later injecting the interrupt into the guest.
[ ... ]
> diff --git a/arch/arm64/kvm/pmu-direct.c b/arch/arm64/kvm/pmu-direct.c
> index 881cea5117515..535b4c492ff80 100644
> --- a/arch/arm64/kvm/pmu-direct.c
> +++ b/arch/arm64/kvm/pmu-direct.c
> @@ -411,3 +411,25 @@ void kvm_pmu_put(struct kvm_vcpu *vcpu)
> kvm_pmu_set_guest_counters(pmu, 0);
> preempt_enable();
> }
Does this code need to save the guest's hardware overflow flags from
pmovsclr_el0 into the vCPU memory context when it is scheduled out?
If a guest counter overflows but the vCPU is preempted before the PMU IRQ
is delivered, or if the guest is intentionally polling the counters with
interrupts disabled, the hardware overflow bit remains set. Because this
function doesn't save the active hardware state, could the overflow be
lost across the context switch?
Also, should this function clear the hardware registers (pmcntenclr_el0
and pmintenclr_el1) to stop the guest counters? If they continue running
while the physical CPU executes host tasks, they can overflow and trigger
a physical PMU interrupt while the vCPU is scheduled out.
> +
> +/**
> + * kvm_pmu_handle_guest_irq() - Record IRQs in guest counters
> + * @pmu: PMU to check for overflows
> + * @pmovsr: Overflow flags reported by driver
> + *
> + * Set overflow flags in guest-reserved counters in the VCPU register
> + * for the guest to clear later.
> + */
> +void kvm_pmu_handle_guest_irq(struct arm_pmu *pmu, u64 pmovsr)
> +{
> + struct kvm_vcpu *vcpu = kvm_get_running_vcpu();
> + u64 mask = kvm_pmu_guest_counter_mask(pmu);
> + u64 govf = pmovsr & mask;
> +
> + write_pmovsclr(govf);
> +
> + if (!vcpu)
> + return;
If the counters continue running while the vCPU is scheduled out, can
a physical PMU interrupt fire when kvm_get_running_vcpu() is NULL?
Since the handler clears the physical overflow flag via write_pmovsclr()
but immediately returns here without saving the overflow state to the
vCPU memory backing store, are guest PMU interrupts permanently dropped
in this scenario?
> +
> + __vcpu_rmw_sys_reg(vcpu, PMOVSSET_EL0, |=, govf);
Could this non-atomic read-modify-write corrupt the guest PMU state?
kvm_pmu_handle_guest_irq() executes in hardirq context, while KVM FGT trap
handlers execute in KVM process context with local IRQs enabled. If a
physical PMU interrupt preempts the vCPU thread while it is in the middle
of a process-context read-modify-write on PMOVSSET_EL0, the hardirq
handler's modification could be completely overwritten and lost when the
interrupted trap handler writes back its stale value.
Additionally, since this executes unconditionally even if govf == 0,
could it create a non-atomic read/write cycle that blindly clobbers the
trap handler's concurrent modifications?
Finally, how does KVM know to inject the virtual PMU interrupt? Should
there be a call to kvm_make_request(KVM_REQ_PMU, vcpu) here? Without a
vCPU request to evaluate and route virtual interrupts to the VGIC, the
KVM run loop will re-enter the guest without asserting the vPPI, and the
guest will remain unaware of the overflow until an unrelated VM exit
occurs.
> +}
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260504211813.1804997-1-coltonlewis@google.com?part=15
^ permalink raw reply [flat|nested] 42+ messages in thread
* [PATCH v7 16/20] KVM: arm64: Detect overflows for the Partitioned PMU
2026-05-04 21:17 [PATCH v7 00/20] ARM64 PMU Partitioning Colton Lewis
` (14 preceding siblings ...)
2026-05-04 21:18 ` [PATCH v7 15/20] perf: arm_pmuv3: Handle IRQs for Partitioned PMU guest counters Colton Lewis
@ 2026-05-04 21:18 ` Colton Lewis
2026-05-04 23:47 ` sashiko-bot
2026-05-04 21:18 ` [PATCH v7 17/20] KVM: arm64: Add vCPU device attr to partition the PMU Colton Lewis
` (4 subsequent siblings)
20 siblings, 1 reply; 42+ messages in thread
From: Colton Lewis @ 2026-05-04 21:18 UTC (permalink / raw)
To: kvm
Cc: Alexandru Elisei, Paolo Bonzini, Jonathan Corbet, Russell King,
Catalin Marinas, Will Deacon, Marc Zyngier, Oliver Upton,
Mingwei Zhang, Joey Gouly, Suzuki K Poulose, Zenghui Yu,
Mark Rutland, Shuah Khan, Ganapatrao Kulkarni, James Clark,
linux-doc, linux-kernel, linux-arm-kernel, kvmarm,
linux-perf-users, linux-kselftest, Colton Lewis
When we re-enter the VM after handling a PMU interrupt, calculate
whether it was any of the guest counters that overflowed and inject an
interrupt into the guest if so.
Signed-off-by: Colton Lewis <coltonlewis@google.com>
---
arch/arm64/kvm/pmu-direct.c | 30 ++++++++++++++++++++++++++++++
arch/arm64/kvm/pmu-emul.c | 4 ++--
arch/arm64/kvm/pmu.c | 6 +++++-
include/kvm/arm_pmu.h | 2 ++
4 files changed, 39 insertions(+), 3 deletions(-)
diff --git a/arch/arm64/kvm/pmu-direct.c b/arch/arm64/kvm/pmu-direct.c
index 535b4c492ff80..9693d9eb69daa 100644
--- a/arch/arm64/kvm/pmu-direct.c
+++ b/arch/arm64/kvm/pmu-direct.c
@@ -433,3 +433,33 @@ void kvm_pmu_handle_guest_irq(struct arm_pmu *pmu, u64 pmovsr)
__vcpu_rmw_sys_reg(vcpu, PMOVSSET_EL0, |=, govf);
}
+
+/**
+ * kvm_pmu_part_overflow_status() - Determine if any guest counters have overflowed
+ * @vcpu: Pointer to struct kvm_vcpu
+ *
+ * Determine if any guest counters have overflowed and therefore an
+ * IRQ needs to be injected into the guest. If access is still free,
+ * then the guest hasn't accessed the PMU yet so we know the guest
+ * context is not loaded onto the pCPU and an overflow is impossible.
+ *
+ * Return: True if there was an overflow, false otherwise
+ */
+bool kvm_pmu_part_overflow_status(struct kvm_vcpu *vcpu)
+{
+ struct arm_pmu *pmu;
+ u64 mask, pmovs, pmint, pmcr;
+ bool overflow;
+
+ if (vcpu->arch.pmu.access == VCPU_PMU_ACCESS_FREE)
+ return false;
+
+ pmu = vcpu->kvm->arch.arm_pmu;
+ mask = kvm_pmu_guest_counter_mask(pmu);
+ pmovs = __vcpu_sys_reg(vcpu, PMOVSSET_EL0);
+ pmint = read_pmintenset();
+ pmcr = read_pmcr();
+ overflow = (pmcr & ARMV8_PMU_PMCR_E) && (mask & pmovs & pmint);
+
+ return overflow;
+}
diff --git a/arch/arm64/kvm/pmu-emul.c b/arch/arm64/kvm/pmu-emul.c
index a40db0d5120ff..c5438de3e5a74 100644
--- a/arch/arm64/kvm/pmu-emul.c
+++ b/arch/arm64/kvm/pmu-emul.c
@@ -268,7 +268,7 @@ void kvm_pmu_reprogram_counter_mask(struct kvm_vcpu *vcpu, u64 val)
* counter where the values of the global enable control, PMOVSSET_EL0[n], and
* PMINTENSET_EL1[n] are all 1.
*/
-bool kvm_pmu_overflow_status(struct kvm_vcpu *vcpu)
+bool kvm_pmu_emul_overflow_status(struct kvm_vcpu *vcpu)
{
u64 reg = __vcpu_sys_reg(vcpu, PMOVSSET_EL0);
@@ -405,7 +405,7 @@ static void kvm_pmu_perf_overflow(struct perf_event *perf_event,
kvm_pmu_counter_increment(vcpu, BIT(idx + 1),
ARMV8_PMUV3_PERFCTR_CHAIN);
- if (kvm_pmu_overflow_status(vcpu)) {
+ if (kvm_pmu_emul_overflow_status(vcpu)) {
kvm_make_request(KVM_REQ_IRQ_PENDING, vcpu);
if (!in_nmi())
diff --git a/arch/arm64/kvm/pmu.c b/arch/arm64/kvm/pmu.c
index 8c10ad05661bc..f1c66ce678840 100644
--- a/arch/arm64/kvm/pmu.c
+++ b/arch/arm64/kvm/pmu.c
@@ -408,7 +408,11 @@ static void kvm_pmu_update_state(struct kvm_vcpu *vcpu)
struct kvm_pmu *pmu = &vcpu->arch.pmu;
bool overflow;
- overflow = kvm_pmu_overflow_status(vcpu);
+ if (kvm_vcpu_pmu_is_partitioned(vcpu))
+ overflow = kvm_pmu_part_overflow_status(vcpu);
+ else
+ overflow = kvm_pmu_emul_overflow_status(vcpu);
+
if (pmu->irq_level == overflow)
return;
diff --git a/include/kvm/arm_pmu.h b/include/kvm/arm_pmu.h
index de058a5347d18..4af8abf2dde0f 100644
--- a/include/kvm/arm_pmu.h
+++ b/include/kvm/arm_pmu.h
@@ -90,6 +90,8 @@ bool kvm_set_pmuserenr(u64 val);
void kvm_vcpu_pmu_restore_guest(struct kvm_vcpu *vcpu);
void kvm_vcpu_pmu_restore_host(struct kvm_vcpu *vcpu);
void kvm_vcpu_pmu_resync_el0(void);
+bool kvm_pmu_emul_overflow_status(struct kvm_vcpu *vcpu);
+bool kvm_pmu_part_overflow_status(struct kvm_vcpu *vcpu);
#define kvm_vcpu_has_pmu(vcpu) \
(vcpu_has_feature(vcpu, KVM_ARM_VCPU_PMU_V3))
--
2.54.0.545.g6539524ca2-goog
^ permalink raw reply related [flat|nested] 42+ messages in thread
* Re: [PATCH v7 16/20] KVM: arm64: Detect overflows for the Partitioned PMU
2026-05-04 21:18 ` [PATCH v7 16/20] KVM: arm64: Detect overflows for the Partitioned PMU Colton Lewis
@ 2026-05-04 23:47 ` sashiko-bot
0 siblings, 0 replies; 42+ messages in thread
From: sashiko-bot @ 2026-05-04 23:47 UTC (permalink / raw)
To: Colton Lewis; +Cc: linux-perf-users
Thank you for your contribution! Sashiko AI review found 8 potential issue(s) to consider:
- [High] Virtual PMU interrupts can be permanently lost or arbitrarily delayed due to untrapped PMINTENSET_EL1 unmasking.
- [High] Undefined behavior and host state leakage when partitioned PMU has 0 guest counters.
- [High] The host PMU driver unconditionally overwrites the guest's PMCR_EL0.E state during host interrupt handling.
- [High] Hardware PMU interrupts do not wake a halted VCPU or assert KVM_REQ_IRQ_PENDING, leading to missed wakeups and TOCTOU races.
- [High] kvm_pmu_part_overflow_status() incorrectly drops pending PMU interrupts for migrated VMs.
- [High] Data race in PMOVSSET_EL0 read-modify-write causes lost hardware PMU overflows.
- [High] kvm_pmu_part_overflow_status() reads hardware registers containing host state due to incomplete lazy context switching.
- [Low] Dangling function declaration for kvm_pmu_overflow_status().
--
commit a5f6bda341ad4e262b923aa705a780bcfa130602
Author: Colton Lewis <coltonlewis@google.com>
KVM: arm64: Detect overflows for the Partitioned PMU
This commit calculates whether any guest counters overflowed upon VM
re-entry after a PMU interrupt, injecting an interrupt if necessary.
> diff --git a/arch/arm64/kvm/pmu-direct.c b/arch/arm64/kvm/pmu-direct.c
> index 535b4c492ff8..9693d9eb69da 100644
> --- a/arch/arm64/kvm/pmu-direct.c
> +++ b/arch/arm64/kvm/pmu-direct.c
[ ... ]
> @@ -433,3 +433,33 @@ void kvm_pmu_handle_guest_irq(struct arm_pmu *pmu, u64 pmovsr)
>
> __vcpu_rmw_sys_reg(vcpu, PMOVSSET_EL0, |=, govf);
Is there a data race here since __vcpu_rmw_sys_reg() is not atomic?
If kvm_pmu_handle_guest_irq() runs in hardirq context and preempts a
sysreg trap handler like access_pmovs() that is currently between
the read and write steps of __vcpu_rmw_sys_reg() for PMOVSSET_EL0:
access_pmovs()
read PMOVSSET_EL0
<-- hardware PMU interrupt preempts here
...
write PMOVSSET_EL0
Will the trap handler overwrite the register with a stale value and
erase the hardware overflow event?
Does this missing KVM_REQ_IRQ_PENDING and vcpu kick cause missed wakeups?
Unlike the emulated PMU handler, this does not call
kvm_make_request(KVM_REQ_IRQ_PENDING, vcpu) or kvm_vcpu_kick(vcpu). If the
hardware PMU interrupt fires while the VCPU is halted in kvm_vcpu_block(),
will KVM fail to wake it up?
> }
> +
> +/**
> + * kvm_pmu_part_overflow_status() - Determine if any guest counters have overflowed
> + * @vcpu: Pointer to struct kvm_vcpu
> + *
> + * Determine if any guest counters have overflowed and therefore an
> + * IRQ needs to be injected into the guest. If access is still free,
> + * then the guest hasn't accessed the PMU yet so we know the guest
> + * context is not loaded onto the pCPU and an overflow is impossible.
> + *
> + * Return: True if there was an overflow, false otherwise
> + */
> +bool kvm_pmu_part_overflow_status(struct kvm_vcpu *vcpu)
> +{
> + struct arm_pmu *pmu;
> + u64 mask, pmovs, pmint, pmcr;
> + bool overflow;
> +
> + if (vcpu->arch.pmu.access == VCPU_PMU_ACCESS_FREE)
> + return false;
Will this check drop pending PMU interrupts for migrated VMs?
When a VM is migrated, userspace populates the PMU state via KVM_SET_ONE_REG,
but vcpu->arch.pmu.access remains VCPU_PMU_ACCESS_FREE. When KVM executes
the VCPU for the first time, kvm_pmu_update_state() calls this function,
which will return false immediately and set pmu->irq_level = false, dropping
the restored interrupt.
> +
> + pmu = vcpu->kvm->arch.arm_pmu;
> + mask = kvm_pmu_guest_counter_mask(pmu);
Can this result in undefined behavior and leak host state if the partitioned
PMU has 0 guest counters?
If a module parameter reserves all counters for the host,
pmu->max_guest_counters is 0. In kvm_pmu_guest_counter_mask(), calling
GENMASK(pmu->max_guest_counters - 1, 0) becomes GENMASK(-1, 0), evaluating to
~0UL. Will this include all host-reserved counters in the mask and inject
spurious PMU interrupts into the guest on host overflows?
> + pmovs = __vcpu_sys_reg(vcpu, PMOVSSET_EL0);
> + pmint = read_pmintenset();
Since the guest has untrapped access to PMINTENSET_EL1 in partitioned mode,
can virtual PMU interrupts be lost or delayed?
If a guest counter overflows while its PMINTENSET bit is cleared, no hardware
interrupt fires. The host IRQ handler later clears the physical overflow bit
and moves it to the VCPU shadow register.
If the guest later writes to PMINTENSET_EL1 to unmask the interrupt, KVM is
not notified. The hardware will not assert a physical IRQ because the
physical bit is gone, and this function will not evaluate the new state
until an unrelated VM exit.
Will reading pmintenset_el1 directly from hardware read host state due to
incomplete lazy context switching?
If a VCPU is scheduled with access == VCPU_PMU_ACCESS_FREE, kvm_pmu_load()
skips loading the guest's PMU context. If the guest accesses a trapped
register, kvm_pmu_set_guest_owned() transitions the state but does not trigger
a full hardware reload. When the VCPU exits, these hardware reads will see
host state.
> + pmcr = read_pmcr();
Can the host PMU driver overwrite the guest's PMCR_EL0.E state here?
If a host-reserved counter overflows, the host's armv8pmu_handle_irq()
executes and calls armv8pmu_start(), which executes:
armv8pmu_pmcr_write(armv8pmu_pmcr_read() | ARMV8_PMU_PMCR_E)
Will this force PMCR_EL0.E to 1 in hardware, permanently corrupting the
guest's configuration and causing KVM to evaluate (pmcr & ARMV8_PMU_PMCR_E)
as true?
> + overflow = (pmcr & ARMV8_PMU_PMCR_E) && (mask & pmovs & pmint);
> +
> + return overflow;
> +}
> diff --git a/include/kvm/arm_pmu.h b/include/kvm/arm_pmu.h
> index de058a5347d1..4af8abf2dde0 100644
> --- a/include/kvm/arm_pmu.h
> +++ b/include/kvm/arm_pmu.h
[ ... ]
> @@ -90,6 +90,8 @@ bool kvm_set_pmuserenr(u64 val);
> void kvm_vcpu_pmu_restore_guest(struct kvm_vcpu *vcpu);
> void kvm_vcpu_pmu_restore_host(struct kvm_vcpu *vcpu);
> void kvm_vcpu_pmu_resync_el0(void);
> +bool kvm_pmu_emul_overflow_status(struct kvm_vcpu *vcpu);
> +bool kvm_pmu_part_overflow_status(struct kvm_vcpu *vcpu);
This isn't a bug, but this patch leaves a dangling declaration for
kvm_pmu_overflow_status() elsewhere in the header, since the function was
renamed to kvm_pmu_emul_overflow_status() in this commit.
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260504211813.1804997-1-coltonlewis@google.com?part=16
^ permalink raw reply [flat|nested] 42+ messages in thread
* [PATCH v7 17/20] KVM: arm64: Add vCPU device attr to partition the PMU
2026-05-04 21:17 [PATCH v7 00/20] ARM64 PMU Partitioning Colton Lewis
` (15 preceding siblings ...)
2026-05-04 21:18 ` [PATCH v7 16/20] KVM: arm64: Detect overflows for the Partitioned PMU Colton Lewis
@ 2026-05-04 21:18 ` Colton Lewis
2026-05-04 22:23 ` sashiko-bot
2026-05-04 21:18 ` [PATCH v7 18/20] KVM: selftests: Add find_bit to KVM library Colton Lewis
` (3 subsequent siblings)
20 siblings, 1 reply; 42+ messages in thread
From: Colton Lewis @ 2026-05-04 21:18 UTC (permalink / raw)
To: kvm
Cc: Alexandru Elisei, Paolo Bonzini, Jonathan Corbet, Russell King,
Catalin Marinas, Will Deacon, Marc Zyngier, Oliver Upton,
Mingwei Zhang, Joey Gouly, Suzuki K Poulose, Zenghui Yu,
Mark Rutland, Shuah Khan, Ganapatrao Kulkarni, James Clark,
linux-doc, linux-kernel, linux-arm-kernel, kvmarm,
linux-perf-users, linux-kselftest, Colton Lewis
Add a new PMU device attr to enable the partitioned PMU for a given
VM. This capability can be set when the PMU is initially configured
before the vCPU starts running and is allowed where PMUv3 and VHE are
supported and the host driver was configured with
arm_pmuv3.reserved_host_counters.
The enabled capability is tracked by the new flag
KVM_ARCH_FLAG_PARTITION_PMU_ENABLED.
Signed-off-by: Colton Lewis <coltonlewis@google.com>
---
arch/arm64/include/asm/kvm_host.h | 2 ++
arch/arm64/include/uapi/asm/kvm.h | 2 ++
arch/arm64/kvm/pmu-direct.c | 35 ++++++++++++++++++++++++++++---
arch/arm64/kvm/pmu.c | 14 +++++++++++++
include/kvm/arm_pmu.h | 9 ++++++++
5 files changed, 59 insertions(+), 3 deletions(-)
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 891433fe304ac..22b3985b978de 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -353,6 +353,8 @@ struct kvm_arch {
#define KVM_ARCH_FLAG_WRITABLE_IMP_ID_REGS 10
/* Unhandled SEAs are taken to userspace */
#define KVM_ARCH_FLAG_EXIT_SEA 11
+ /* Partitioned PMU Enabled */
+#define KVM_ARCH_FLAG_PARTITION_PMU_ENABLED 12
unsigned long flags;
/* VM-wide vCPU feature set */
diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
index a792a599b9d68..3e0b7619f781d 100644
--- a/arch/arm64/include/uapi/asm/kvm.h
+++ b/arch/arm64/include/uapi/asm/kvm.h
@@ -436,6 +436,8 @@ enum {
#define KVM_ARM_VCPU_PMU_V3_FILTER 2
#define KVM_ARM_VCPU_PMU_V3_SET_PMU 3
#define KVM_ARM_VCPU_PMU_V3_SET_NR_COUNTERS 4
+#define KVM_ARM_VCPU_PMU_V3_ENABLE_PARTITION 5
+
#define KVM_ARM_VCPU_TIMER_CTRL 1
#define KVM_ARM_VCPU_TIMER_IRQ_VTIMER 0
#define KVM_ARM_VCPU_TIMER_IRQ_PTIMER 1
diff --git a/arch/arm64/kvm/pmu-direct.c b/arch/arm64/kvm/pmu-direct.c
index 9693d9eb69daa..47fd143cf4ac7 100644
--- a/arch/arm64/kvm/pmu-direct.c
+++ b/arch/arm64/kvm/pmu-direct.c
@@ -44,8 +44,8 @@ bool kvm_pmu_is_partitioned(struct arm_pmu *pmu)
}
/**
- * kvm_vcpu_pmu_is_partitioned() - Determine if given VCPU has a partitioned PMU
- * @vcpu: Pointer to kvm_vcpu struct
+ * kvm_vcpu_pmu_is_partitioned() - Determine if given VCPU has a partitioned PMU
+ * @vcpu: Pointer to struct kvm_vcpu
*
* Determine if given VCPU has a partitioned PMU by extracting that
* field and passing it to :c:func:`kvm_pmu_is_partitioned`
@@ -55,7 +55,36 @@ bool kvm_pmu_is_partitioned(struct arm_pmu *pmu)
bool kvm_vcpu_pmu_is_partitioned(struct kvm_vcpu *vcpu)
{
return kvm_pmu_is_partitioned(vcpu->kvm->arch.arm_pmu) &&
- false;
+ test_bit(KVM_ARCH_FLAG_PARTITION_PMU_ENABLED, &vcpu->kvm->arch.flags);
+}
+
+/**
+ * has_kvm_pmu_partition_support() - Check if the partition can be enabled/disabled
+ *
+ * Return: true if allowed, false otherwise.
+ */
+bool has_kvm_pmu_partition_support(void)
+{
+ return has_host_pmu_partition_support() &&
+ kvm_supports_guest_pmuv3() &&
+ armv8pmu_is_partitioned;
+}
+
+/**
+ * kvm_pmu_partition_enable() - Enable/disable partition flag
+ * @kvm: Pointer to vcpu
+ * @enable: Whether to enable or disable
+ *
+ * If we want to enable the partition, the guest is free to grab
+ * hardware by accessing PMU registers. Otherwise, the host maintains
+ * control.
+ */
+void kvm_pmu_partition_enable(struct kvm *kvm, bool enable)
+{
+ if (enable)
+ set_bit(KVM_ARCH_FLAG_PARTITION_PMU_ENABLED, &kvm->arch.flags);
+ else
+ clear_bit(KVM_ARCH_FLAG_PARTITION_PMU_ENABLED, &kvm->arch.flags);
}
/**
diff --git a/arch/arm64/kvm/pmu.c b/arch/arm64/kvm/pmu.c
index f1c66ce678840..add5e7da830b2 100644
--- a/arch/arm64/kvm/pmu.c
+++ b/arch/arm64/kvm/pmu.c
@@ -759,6 +759,19 @@ int kvm_arm_pmu_v3_set_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
return kvm_arm_pmu_v3_set_nr_counters(vcpu, n);
}
+ case KVM_ARM_VCPU_PMU_V3_ENABLE_PARTITION: {
+ unsigned int __user *uaddr = (unsigned int __user *)(long)attr->addr;
+ bool enable;
+
+ if (get_user(enable, uaddr))
+ return -EFAULT;
+
+ if (!has_kvm_pmu_partition_support())
+ return -EPERM;
+
+ kvm_pmu_partition_enable(kvm, enable);
+ return 0;
+ }
case KVM_ARM_VCPU_PMU_V3_INIT:
return kvm_arm_pmu_v3_init(vcpu);
}
@@ -798,6 +811,7 @@ int kvm_arm_pmu_v3_has_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
case KVM_ARM_VCPU_PMU_V3_FILTER:
case KVM_ARM_VCPU_PMU_V3_SET_PMU:
case KVM_ARM_VCPU_PMU_V3_SET_NR_COUNTERS:
+ case KVM_ARM_VCPU_PMU_V3_ENABLE_PARTITION:
if (kvm_vcpu_has_pmu(vcpu))
return 0;
}
diff --git a/include/kvm/arm_pmu.h b/include/kvm/arm_pmu.h
index 4af8abf2dde0f..131c4b8eec194 100644
--- a/include/kvm/arm_pmu.h
+++ b/include/kvm/arm_pmu.h
@@ -107,6 +107,8 @@ void kvm_pmu_load(struct kvm_vcpu *vcpu);
void kvm_pmu_put(struct kvm_vcpu *vcpu);
void kvm_pmu_set_guest_owned(struct kvm_vcpu *vcpu);
+bool has_kvm_pmu_partition_support(void);
+void kvm_pmu_partition_enable(struct kvm *kvm, bool enable);
#if !defined(__KVM_NVHE_HYPERVISOR__)
bool kvm_vcpu_pmu_is_partitioned(struct kvm_vcpu *vcpu);
@@ -295,6 +297,13 @@ static inline u64 kvm_pmu_guest_counter_mask(void *kvm)
static inline void kvm_pmu_handle_guest_irq(struct arm_pmu *pmu, u64 pmovsr) {}
+static inline bool has_kvm_pmu_partition_support(void)
+{
+ return false;
+}
+
+static inline void kvm_pmu_partition_enable(struct kvm *kvm, bool enable) {}
+
#endif
#endif
--
2.54.0.545.g6539524ca2-goog
^ permalink raw reply related [flat|nested] 42+ messages in thread
* Re: [PATCH v7 17/20] KVM: arm64: Add vCPU device attr to partition the PMU
2026-05-04 21:18 ` [PATCH v7 17/20] KVM: arm64: Add vCPU device attr to partition the PMU Colton Lewis
@ 2026-05-04 22:23 ` sashiko-bot
0 siblings, 0 replies; 42+ messages in thread
From: sashiko-bot @ 2026-05-04 22:23 UTC (permalink / raw)
To: Colton Lewis; +Cc: linux-perf-users
Thank you for your contribution! Sashiko AI review found 4 potential issue(s) to consider:
- [High] KVM_ARM_VCPU_PMU_V3_ENABLE_PARTITION modifies VM-wide state without checking if the VM has already run, allowing state corruption.
- [High] get_user() reads a 32-bit unsigned int directly into a 1-byte bool, leading to truncation bugs and failing to enforce strict UAPI input validation.
- [Medium] kvm_arm_pmu_v3_has_attr() unconditionally advertises support for the KVM_ARM_VCPU_PMU_V3_ENABLE_PARTITION attribute even when the host hardware does not support it.
- [Low] Mismatched kernel-doc comments for kvm_vcpu_pmu_is_partitioned() and kvm_pmu_partition_enable().
--
commit d9172235a5a8d816682a8078c38e635415737a2b
Author: Colton Lewis <coltonlewis@google.com>
KVM: arm64: Add vCPU device attr to partition the PMU
This commit introduces a new PMU device attribute to enable partitioned
PMU support for a VM. The capability can be configured before the vCPU starts,
provided PMUv3 and VHE are supported and the host driver has reserved
counters.
> diff --git a/arch/arm64/kvm/pmu-direct.c b/arch/arm64/kvm/pmu-direct.c
> index 9693d9eb69daa..47fd143cf4ac7 100644
> --- a/arch/arm64/kvm/pmu-direct.c
> +++ b/arch/arm64/kvm/pmu-direct.c
> @@ -44,8 +44,8 @@ bool kvm_pmu_is_partitioned(struct arm_pmu *pmu)
> }
>
> /**
> - * kvm_vcpu_pmu_is_partitioned() - Determine if given VCPU has a partitioned PMU
> - * @vcpu: Pointer to kvm_vcpu struct
> + * kvm_pmu_is_partitioned() - Determine if given VCPU has a partitioned PMU
> + * @kvm: Pointer to kvm_vcpu struct
> *
> * Determine if given VCPU has a partitioned PMU by extracting that
> * field and passing it to :c:func:`kvm_pmu_is_partitioned`
This isn't a bug, but the function name and parameter in the kernel-doc
block don't match the actual function signature below:
> @@ -55,7 +55,36 @@ bool kvm_pmu_is_partitioned(struct arm_pmu *pmu)
> bool kvm_vcpu_pmu_is_partitioned(struct kvm_vcpu *vcpu)
[ ... ]
> +/**
> + * kvm_pmu_partition_enable() - Enable/disable partition flag
> + * @kvm: Pointer to vcpu
> + * @enable: Whether to enable or disable
> + *
> + * If we want to enable the partition, the guest is free to grab
> + * hardware by accessing PMU registers. Otherwise, the host maintains
> + * control.
> + */
> +void kvm_pmu_partition_enable(struct kvm *kvm, bool enable)
This isn't a bug, but the comment describes @kvm as a pointer to a vcpu,
while the function takes a pointer to struct kvm.
> diff --git a/arch/arm64/kvm/pmu.c b/arch/arm64/kvm/pmu.c
> index f1c66ce678840..add5e7da830b2 100644
> --- a/arch/arm64/kvm/pmu.c
> +++ b/arch/arm64/kvm/pmu.c
[ ... ]
> @@ -759,6 +759,19 @@ int kvm_arm_pmu_v3_set_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
>
> return kvm_arm_pmu_v3_set_nr_counters(vcpu, n);
> }
> + case KVM_ARM_VCPU_PMU_V3_ENABLE_PARTITION: {
> + unsigned int __user *uaddr = (unsigned int __user *)(long)attr->addr;
> + bool enable;
> +
> + if (get_user(enable, uaddr))
> + return -EFAULT;
Is it possible for the get_user() call to truncate the value being read?
Since uaddr is cast to unsigned int __user *, the inline assembly inside
get_user() will load a 32-bit value from userspace. However, the output
variable enable is a 1-byte bool. If userspace passes a value like 256
(0x100), the lowest byte is 0, which could incorrectly result in false.
Additionally, should we explicitly check that the input value is exactly
0 or 1? Accepting any arbitrary value here burns the upper 31 bits and
prevents repurposing them for future flags without breaking backward
compatibility.
> +
> + if (!has_kvm_pmu_partition_support())
> + return -EPERM;
> +
> + kvm_pmu_partition_enable(kvm, enable);
> + return 0;
> + }
Could this sequence result in concurrent state modification?
In a multi-vCPU VM, if userspace initializes and starts running vCPU 0, it
could subsequently invoke this ioctl on an uninitialized vCPU 1. Since
kvm_arm_pmu_v3_set_attr() only checks if the individual vCPU is initialized:
if (vcpu->arch.pmu.created)
return -EBUSY;
and KVM_ARM_VCPU_PMU_V3_ENABLE_PARTITION modifies VM-wide state without
calling kvm_vm_has_ran_once(kvm), the PMU partition mode could be altered
while vCPU 0 is already running.
[ ... ]
> @@ -798,6 +811,7 @@ int kvm_arm_pmu_v3_has_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
> case KVM_ARM_VCPU_PMU_V3_FILTER:
> case KVM_ARM_VCPU_PMU_V3_SET_PMU:
> case KVM_ARM_VCPU_PMU_V3_SET_NR_COUNTERS:
> + case KVM_ARM_VCPU_PMU_V3_ENABLE_PARTITION:
> if (kvm_vcpu_has_pmu(vcpu))
> return 0;
> }
Will this advertise the partition capability even when the host hardware
does not support it?
The has_attr handler returns 0 based solely on kvm_vcpu_has_pmu(vcpu), but
the set_attr handler requires has_kvm_pmu_partition_support() to be true. If
userspace relies on KVM_HAS_DEVICE_ATTR to probe capabilities, it might
incorrectly attempt to configure the feature and receive an unexpected -EPERM.
Should this return -ENXIO if has_kvm_pmu_partition_support() is false?
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260504211813.1804997-1-coltonlewis@google.com?part=17
^ permalink raw reply [flat|nested] 42+ messages in thread
* [PATCH v7 18/20] KVM: selftests: Add find_bit to KVM library
2026-05-04 21:17 [PATCH v7 00/20] ARM64 PMU Partitioning Colton Lewis
` (16 preceding siblings ...)
2026-05-04 21:18 ` [PATCH v7 17/20] KVM: arm64: Add vCPU device attr to partition the PMU Colton Lewis
@ 2026-05-04 21:18 ` Colton Lewis
2026-05-04 21:18 ` [PATCH v7 19/20] KVM: arm64: selftests: Add test case for Partitioned PMU Colton Lewis
` (2 subsequent siblings)
20 siblings, 0 replies; 42+ messages in thread
From: Colton Lewis @ 2026-05-04 21:18 UTC (permalink / raw)
To: kvm
Cc: Alexandru Elisei, Paolo Bonzini, Jonathan Corbet, Russell King,
Catalin Marinas, Will Deacon, Marc Zyngier, Oliver Upton,
Mingwei Zhang, Joey Gouly, Suzuki K Poulose, Zenghui Yu,
Mark Rutland, Shuah Khan, Ganapatrao Kulkarni, James Clark,
linux-doc, linux-kernel, linux-arm-kernel, kvmarm,
linux-perf-users, linux-kselftest, Colton Lewis
Some selftests have a dependency on find_bit and weren't compiling
separately without it, so I've added it to the KVM library here using
the same method as files like rbtree.c.
Signed-off-by: Colton Lewis <coltonlewis@google.com>
---
tools/testing/selftests/kvm/Makefile.kvm | 1 +
tools/testing/selftests/kvm/lib/find_bit.c | 1 +
2 files changed, 2 insertions(+)
create mode 100644 tools/testing/selftests/kvm/lib/find_bit.c
diff --git a/tools/testing/selftests/kvm/Makefile.kvm b/tools/testing/selftests/kvm/Makefile.kvm
index 6471fa214a9f9..79a45807865b5 100644
--- a/tools/testing/selftests/kvm/Makefile.kvm
+++ b/tools/testing/selftests/kvm/Makefile.kvm
@@ -5,6 +5,7 @@ all:
LIBKVM += lib/assert.c
LIBKVM += lib/elf.c
+LIBKVM += lib/find_bit.c
LIBKVM += lib/guest_modes.c
LIBKVM += lib/io.c
LIBKVM += lib/kvm_util.c
diff --git a/tools/testing/selftests/kvm/lib/find_bit.c b/tools/testing/selftests/kvm/lib/find_bit.c
new file mode 100644
index 0000000000000..67d9d9cbca85c
--- /dev/null
+++ b/tools/testing/selftests/kvm/lib/find_bit.c
@@ -0,0 +1 @@
+#include "../../../../lib/find_bit.c"
--
2.54.0.545.g6539524ca2-goog
^ permalink raw reply related [flat|nested] 42+ messages in thread
* [PATCH v7 19/20] KVM: arm64: selftests: Add test case for Partitioned PMU
2026-05-04 21:17 [PATCH v7 00/20] ARM64 PMU Partitioning Colton Lewis
` (17 preceding siblings ...)
2026-05-04 21:18 ` [PATCH v7 18/20] KVM: selftests: Add find_bit to KVM library Colton Lewis
@ 2026-05-04 21:18 ` Colton Lewis
2026-05-04 22:19 ` sashiko-bot
2026-05-04 21:18 ` [PATCH v7 20/20] KVM: arm64: selftests: Relax testing for exceptions when partitioned Colton Lewis
2026-05-11 14:57 ` [PATCH v7 00/20] ARM64 PMU Partitioning James Clark
20 siblings, 1 reply; 42+ messages in thread
From: Colton Lewis @ 2026-05-04 21:18 UTC (permalink / raw)
To: kvm
Cc: Alexandru Elisei, Paolo Bonzini, Jonathan Corbet, Russell King,
Catalin Marinas, Will Deacon, Marc Zyngier, Oliver Upton,
Mingwei Zhang, Joey Gouly, Suzuki K Poulose, Zenghui Yu,
Mark Rutland, Shuah Khan, Ganapatrao Kulkarni, James Clark,
linux-doc, linux-kernel, linux-arm-kernel, kvmarm,
linux-perf-users, linux-kselftest, Colton Lewis
Rerun all tests for a Partitioned PMU in vpmu_counter_access.
Create an enum specifying whether the emulated or Partitioned PMU is
under test, and modify all the test functions to take the
implementation as an argument and adjust their setup accordingly.
Signed-off-by: Colton Lewis <coltonlewis@google.com>
---
.../selftests/kvm/arm64/vpmu_counter_access.c | 94 ++++++++++++++-----
1 file changed, 73 insertions(+), 21 deletions(-)
diff --git a/tools/testing/selftests/kvm/arm64/vpmu_counter_access.c b/tools/testing/selftests/kvm/arm64/vpmu_counter_access.c
index ae36325c022fb..9702f1d43b832 100644
--- a/tools/testing/selftests/kvm/arm64/vpmu_counter_access.c
+++ b/tools/testing/selftests/kvm/arm64/vpmu_counter_access.c
@@ -25,9 +25,20 @@
/* The cycle counter bit position that's common among the PMU registers */
#define ARMV8_PMU_CYCLE_IDX 31
+enum pmu_impl {
+ EMULATED,
+ PARTITIONED
+};
+
+const char *pmu_impl_str[] = {
+ "Emulated",
+ "Partitioned"
+};
+
struct vpmu_vm {
struct kvm_vm *vm;
struct kvm_vcpu *vcpu;
+ bool pmu_partitioned;
};
static struct vpmu_vm vpmu_vm;
@@ -399,7 +410,7 @@ static void guest_code(uint64_t expected_pmcr_n)
}
/* Create a VM that has one vCPU with PMUv3 configured. */
-static void create_vpmu_vm(void *guest_code)
+static void create_vpmu_vm(void *guest_code, enum pmu_impl impl)
{
struct kvm_vcpu_init init;
uint8_t pmuver, ec;
@@ -409,6 +420,13 @@ static void create_vpmu_vm(void *guest_code)
.attr = KVM_ARM_VCPU_PMU_V3_IRQ,
.addr = (uint64_t)&irq,
};
+ bool partition = (impl == PARTITIONED);
+ struct kvm_device_attr part_attr = {
+ .group = KVM_ARM_VCPU_PMU_V3_CTRL,
+ .attr = KVM_ARM_VCPU_PMU_V3_ENABLE_PARTITION,
+ .addr = (uint64_t)&partition
+ };
+ int ret;
/* The test creates the vpmu_vm multiple times. Ensure a clean state */
memset(&vpmu_vm, 0, sizeof(vpmu_vm));
@@ -436,6 +454,15 @@ static void create_vpmu_vm(void *guest_code)
"Unexpected PMUVER (0x%x) on the vCPU with PMUv3", pmuver);
vcpu_ioctl(vpmu_vm.vcpu, KVM_SET_DEVICE_ATTR, &irq_attr);
+
+ ret = __vcpu_has_device_attr(
+ vpmu_vm.vcpu, KVM_ARM_VCPU_PMU_V3_CTRL, KVM_ARM_VCPU_PMU_V3_ENABLE_PARTITION);
+ if (!ret) {
+ vcpu_ioctl(vpmu_vm.vcpu, KVM_SET_DEVICE_ATTR, &part_attr);
+ vpmu_vm.pmu_partitioned = partition;
+ pr_debug("Set PMU partitioning: %d\n", partition);
+ }
+
}
static void destroy_vpmu_vm(void)
@@ -461,13 +488,14 @@ static void run_vcpu(struct kvm_vcpu *vcpu, uint64_t pmcr_n)
}
}
-static void test_create_vpmu_vm_with_nr_counters(unsigned int nr_counters, bool expect_fail)
+static void test_create_vpmu_vm_with_nr_counters(
+ unsigned int nr_counters, enum pmu_impl impl, bool expect_fail)
{
struct kvm_vcpu *vcpu;
unsigned int prev;
int ret;
- create_vpmu_vm(guest_code);
+ create_vpmu_vm(guest_code, impl);
vcpu = vpmu_vm.vcpu;
prev = get_pmcr_n(vcpu_get_reg(vcpu, KVM_ARM64_SYS_REG(SYS_PMCR_EL0)));
@@ -489,7 +517,7 @@ static void test_create_vpmu_vm_with_nr_counters(unsigned int nr_counters, bool
* Create a guest with one vCPU, set the PMCR_EL0.N for the vCPU to @pmcr_n,
* and run the test.
*/
-static void run_access_test(uint64_t pmcr_n)
+static void run_access_test(uint64_t pmcr_n, enum pmu_impl impl)
{
uint64_t sp;
struct kvm_vcpu *vcpu;
@@ -497,7 +525,7 @@ static void run_access_test(uint64_t pmcr_n)
pr_debug("Test with pmcr_n %lu\n", pmcr_n);
- test_create_vpmu_vm_with_nr_counters(pmcr_n, false);
+ test_create_vpmu_vm_with_nr_counters(pmcr_n, impl, false);
vcpu = vpmu_vm.vcpu;
/* Save the initial sp to restore them later to run the guest again */
@@ -531,14 +559,14 @@ static struct pmreg_sets validity_check_reg_sets[] = {
* Create a VM, and check if KVM handles the userspace accesses of
* the PMU register sets in @validity_check_reg_sets[] correctly.
*/
-static void run_pmregs_validity_test(uint64_t pmcr_n)
+static void run_pmregs_validity_test(uint64_t pmcr_n, enum pmu_impl impl)
{
int i;
struct kvm_vcpu *vcpu;
uint64_t set_reg_id, clr_reg_id, reg_val;
uint64_t valid_counters_mask, max_counters_mask;
- test_create_vpmu_vm_with_nr_counters(pmcr_n, false);
+ test_create_vpmu_vm_with_nr_counters(pmcr_n, impl, false);
vcpu = vpmu_vm.vcpu;
valid_counters_mask = get_counters_mask(pmcr_n);
@@ -588,11 +616,11 @@ static void run_pmregs_validity_test(uint64_t pmcr_n)
* the vCPU to @pmcr_n, which is larger than the host value.
* The attempt should fail as @pmcr_n is too big to set for the vCPU.
*/
-static void run_error_test(uint64_t pmcr_n)
+static void run_error_test(uint64_t pmcr_n, enum pmu_impl impl)
{
- pr_debug("Error test with pmcr_n %lu (larger than the host)\n", pmcr_n);
+ pr_debug("Error test with pmcr_n %lu (larger than the host allows)\n", pmcr_n);
- test_create_vpmu_vm_with_nr_counters(pmcr_n, true);
+ test_create_vpmu_vm_with_nr_counters(pmcr_n, impl, true);
destroy_vpmu_vm();
}
@@ -600,11 +628,11 @@ static void run_error_test(uint64_t pmcr_n)
* Return the default number of implemented PMU event counters excluding
* the cycle counter (i.e. PMCR_EL0.N value) for the guest.
*/
-static uint64_t get_pmcr_n_limit(void)
+static uint64_t get_pmcr_n_limit(enum pmu_impl impl)
{
uint64_t pmcr;
- create_vpmu_vm(guest_code);
+ create_vpmu_vm(guest_code, impl);
pmcr = vcpu_get_reg(vpmu_vm.vcpu, KVM_ARM64_SYS_REG(SYS_PMCR_EL0));
destroy_vpmu_vm();
return get_pmcr_n(pmcr);
@@ -614,7 +642,7 @@ static bool kvm_supports_nr_counters_attr(void)
{
bool supported;
- create_vpmu_vm(NULL);
+ create_vpmu_vm(NULL, EMULATED);
supported = !__vcpu_has_device_attr(vpmu_vm.vcpu, KVM_ARM_VCPU_PMU_V3_CTRL,
KVM_ARM_VCPU_PMU_V3_SET_NR_COUNTERS);
destroy_vpmu_vm();
@@ -622,22 +650,46 @@ static bool kvm_supports_nr_counters_attr(void)
return supported;
}
-int main(void)
+static bool kvm_supports_partition_attr(void)
+{
+ bool supported;
+
+ create_vpmu_vm(NULL, EMULATED);
+ supported = !__vcpu_has_device_attr(vpmu_vm.vcpu, KVM_ARM_VCPU_PMU_V3_CTRL,
+ KVM_ARM_VCPU_PMU_V3_ENABLE_PARTITION);
+ destroy_vpmu_vm();
+
+ return supported;
+}
+
+void test_pmu(enum pmu_impl impl)
{
uint64_t i, pmcr_n;
- TEST_REQUIRE(kvm_has_cap(KVM_CAP_ARM_PMU_V3));
- TEST_REQUIRE(kvm_supports_vgic_v3());
- TEST_REQUIRE(kvm_supports_nr_counters_attr());
+ pr_info("Testing PMU: Implementation = %s\n", pmu_impl_str[impl]);
+
+ pmcr_n = get_pmcr_n_limit(impl);
+ pr_debug("PMCR_EL0.N: Limit = %lu\n", pmcr_n);
- pmcr_n = get_pmcr_n_limit();
for (i = 0; i <= pmcr_n; i++) {
- run_access_test(i);
- run_pmregs_validity_test(i);
+ run_access_test(i, impl);
+ run_pmregs_validity_test(i, impl);
}
for (i = pmcr_n + 1; i < ARMV8_PMU_MAX_COUNTERS; i++)
- run_error_test(i);
+ run_error_test(i, impl);
+}
+
+int main(void)
+{
+ TEST_REQUIRE(kvm_has_cap(KVM_CAP_ARM_PMU_V3));
+ TEST_REQUIRE(kvm_supports_vgic_v3());
+ TEST_REQUIRE(kvm_supports_nr_counters_attr());
+
+ test_pmu(EMULATED);
+
+ if (kvm_supports_partition_attr())
+ test_pmu(PARTITIONED);
return 0;
}
--
2.54.0.545.g6539524ca2-goog
^ permalink raw reply related [flat|nested] 42+ messages in thread
* Re: [PATCH v7 19/20] KVM: arm64: selftests: Add test case for Partitioned PMU
2026-05-04 21:18 ` [PATCH v7 19/20] KVM: arm64: selftests: Add test case for Partitioned PMU Colton Lewis
@ 2026-05-04 22:19 ` sashiko-bot
0 siblings, 0 replies; 42+ messages in thread
From: sashiko-bot @ 2026-05-04 22:19 UTC (permalink / raw)
To: Colton Lewis; +Cc: linux-perf-users
Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
- [Medium] The KVM_ARM_VCPU_PMU_V3_ENABLE_PARTITION attribute expects an unsigned int, but a pointer to a 1-byte bool is passed, leading to an out-of-bounds stack read and potentially incorrect test execution.
- [Low] The 'pmu_partitioned' field added to 'struct vpmu_vm' is assigned but never read or utilized.
--
commit f6902795d8749ec81b1bab36231a48757e124a66
Author: Colton Lewis <coltonlewis@google.com>
KVM: arm64: selftests: Add test case for Partitioned PMU
This commit adds testing for the Partitioned PMU by running all tests for
both Emulated and Partitioned PMU implementations. It modifies the setup
functions to take the implementation type as an argument and configures the
vCPU attribute accordingly.
> diff --git a/tools/testing/selftests/kvm/arm64/vpmu_counter_access.c b/tools/testing/selftests/kvm/arm64/vpmu_counter_access.c
> index ae36325c022fb..9702f1d43b832 100644
> --- a/tools/testing/selftests/kvm/arm64/vpmu_counter_access.c
> +++ b/tools/testing/selftests/kvm/arm64/vpmu_counter_access.c
[ ... ]
> struct vpmu_vm {
> struct kvm_vm *vm;
> struct kvm_vcpu *vcpu;
> + bool pmu_partitioned;
> };
This isn't a bug, but is the pmu_partitioned field used anywhere else in
this test? It looks like it is assigned during setup but never read later
in the code.
[ ... ]
> @@ -409,6 +420,13 @@ static void create_vpmu_vm(void *guest_code)
> .attr = KVM_ARM_VCPU_PMU_V3_IRQ,
> .addr = (uint64_t)&irq,
> };
> + bool partition = (impl == PARTITIONED);
> + struct kvm_device_attr part_attr = {
> + .group = KVM_ARM_VCPU_PMU_V3_CTRL,
> + .attr = KVM_ARM_VCPU_PMU_V3_ENABLE_PARTITION,
> + .addr = (uint64_t)&partition
> + };
> + int ret;
Does this pass a 1-byte boolean to an attribute that expects a 4-byte
unsigned int?
When the kernel calls get_user() to read the
KVM_ARM_VCPU_PMU_V3_ENABLE_PARTITION attribute, could it read 3 bytes of
uninitialized stack data alongside the bool?
If the uninitialized stack data contains non-zero bytes, could the kernel
evaluate the 32-bit value as true and erroneously enable PMU partitioning
even when partition is false?
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260504211813.1804997-1-coltonlewis@google.com?part=19
^ permalink raw reply [flat|nested] 42+ messages in thread
* [PATCH v7 20/20] KVM: arm64: selftests: Relax testing for exceptions when partitioned
2026-05-04 21:17 [PATCH v7 00/20] ARM64 PMU Partitioning Colton Lewis
` (18 preceding siblings ...)
2026-05-04 21:18 ` [PATCH v7 19/20] KVM: arm64: selftests: Add test case for Partitioned PMU Colton Lewis
@ 2026-05-04 21:18 ` Colton Lewis
2026-05-11 14:57 ` [PATCH v7 00/20] ARM64 PMU Partitioning James Clark
20 siblings, 0 replies; 42+ messages in thread
From: Colton Lewis @ 2026-05-04 21:18 UTC (permalink / raw)
To: kvm
Cc: Alexandru Elisei, Paolo Bonzini, Jonathan Corbet, Russell King,
Catalin Marinas, Will Deacon, Marc Zyngier, Oliver Upton,
Mingwei Zhang, Joey Gouly, Suzuki K Poulose, Zenghui Yu,
Mark Rutland, Shuah Khan, Ganapatrao Kulkarni, James Clark,
linux-doc, linux-kernel, linux-arm-kernel, kvmarm,
linux-perf-users, linux-kselftest, Colton Lewis
Because the Partitioned PMU must lean heavily on underlying hardware
support, it can't guarantee an exception occurs when accessing an
invalid pmc index.
The ARM manual specifies that accessing PMEVCNTR<n>_EL0 where n is
greater than the number of counters on the system is constrained
unpredictable when FEAT_FGT is not implemented, and the Partitioned
PMU should still work without FEAT_FGT.
Though KVM could enforce exceptions here since all PMU accesses
without FEAT_FGT are trapped, that creates further difficulties. For
one example, the manual also says that after writing a value to
PMSELR_EL0 greater than the number of counters on a system, direct
reads will return an unknown value, meaning KVM could not rely on the
hardware register to hold the correct value.
Signed-off-by: Colton Lewis <coltonlewis@google.com>
---
.../selftests/kvm/arm64/vpmu_counter_access.c | 20 ++++++++++++++-----
1 file changed, 15 insertions(+), 5 deletions(-)
diff --git a/tools/testing/selftests/kvm/arm64/vpmu_counter_access.c b/tools/testing/selftests/kvm/arm64/vpmu_counter_access.c
index 9702f1d43b832..27b7d7b2a059a 100644
--- a/tools/testing/selftests/kvm/arm64/vpmu_counter_access.c
+++ b/tools/testing/selftests/kvm/arm64/vpmu_counter_access.c
@@ -38,10 +38,14 @@ const char *pmu_impl_str[] = {
struct vpmu_vm {
struct kvm_vm *vm;
struct kvm_vcpu *vcpu;
+};
+
+struct guest_context {
bool pmu_partitioned;
};
static struct vpmu_vm vpmu_vm;
+static struct guest_context guest_context;
struct pmreg_sets {
uint64_t set_reg_id;
@@ -342,11 +346,16 @@ static void test_access_invalid_pmc_regs(struct pmc_accessor *acc, int pmc_idx)
/*
* Reading/writing the event count/type registers should cause
* an UNDEFINED exception.
+ *
+ * If the pmu is partitioned, we can't guarantee it because
+ * hardware doesn't.
*/
- TEST_EXCEPTION(ESR_ELx_EC_UNKNOWN, acc->read_cntr(pmc_idx));
- TEST_EXCEPTION(ESR_ELx_EC_UNKNOWN, acc->write_cntr(pmc_idx, 0));
- TEST_EXCEPTION(ESR_ELx_EC_UNKNOWN, acc->read_typer(pmc_idx));
- TEST_EXCEPTION(ESR_ELx_EC_UNKNOWN, acc->write_typer(pmc_idx, 0));
+ if (!guest_context.pmu_partitioned) {
+ TEST_EXCEPTION(ESR_ELx_EC_UNKNOWN, acc->read_cntr(pmc_idx));
+ TEST_EXCEPTION(ESR_ELx_EC_UNKNOWN, acc->write_cntr(pmc_idx, 0));
+ TEST_EXCEPTION(ESR_ELx_EC_UNKNOWN, acc->read_typer(pmc_idx));
+ TEST_EXCEPTION(ESR_ELx_EC_UNKNOWN, acc->write_typer(pmc_idx, 0));
+ }
/*
* The bit corresponding to the (unimplemented) counter in
* {PMCNTEN,PMINTEN,PMOVS}{SET,CLR} registers should be RAZ.
@@ -459,7 +468,7 @@ static void create_vpmu_vm(void *guest_code, enum pmu_impl impl)
vpmu_vm.vcpu, KVM_ARM_VCPU_PMU_V3_CTRL, KVM_ARM_VCPU_PMU_V3_ENABLE_PARTITION);
if (!ret) {
vcpu_ioctl(vpmu_vm.vcpu, KVM_SET_DEVICE_ATTR, &part_attr);
- vpmu_vm.pmu_partitioned = partition;
+ guest_context.pmu_partitioned = partition;
pr_debug("Set PMU partitioning: %d\n", partition);
}
@@ -511,6 +520,7 @@ static void test_create_vpmu_vm_with_nr_counters(
TEST_ASSERT(!ret, KVM_IOCTL_ERROR(KVM_SET_DEVICE_ATTR, ret));
vcpu_device_attr_set(vcpu, KVM_ARM_VCPU_PMU_V3_CTRL, KVM_ARM_VCPU_PMU_V3_INIT, NULL);
+ sync_global_to_guest(vpmu_vm.vm, guest_context);
}
/*
--
2.54.0.545.g6539524ca2-goog
^ permalink raw reply related [flat|nested] 42+ messages in thread* Re: [PATCH v7 00/20] ARM64 PMU Partitioning
2026-05-04 21:17 [PATCH v7 00/20] ARM64 PMU Partitioning Colton Lewis
` (19 preceding siblings ...)
2026-05-04 21:18 ` [PATCH v7 20/20] KVM: arm64: selftests: Relax testing for exceptions when partitioned Colton Lewis
@ 2026-05-11 14:57 ` James Clark
20 siblings, 0 replies; 42+ messages in thread
From: James Clark @ 2026-05-11 14:57 UTC (permalink / raw)
To: Colton Lewis
Cc: Alexandru Elisei, Paolo Bonzini, Jonathan Corbet, Russell King,
Catalin Marinas, Will Deacon, Marc Zyngier, Oliver Upton,
Mingwei Zhang, Joey Gouly, Suzuki K Poulose, Zenghui Yu,
Mark Rutland, Shuah Khan, Ganapatrao Kulkarni, linux-doc,
linux-kernel, linux-arm-kernel, kvmarm, linux-perf-users,
linux-kselftest
On 04/05/2026 10:17 pm, Colton Lewis wrote:
> This series creates a new PMU scheme on ARM, a partitioned PMU that
> allows reserving a subset of counters for more direct guest access,
> significantly reducing overhead. More details, including performance
> benchmarks, can be read in the v1 cover letter linked below.
>
> An overview of what this series accomplishes was presented at KVM
> Forum 2025. Slides [1] and video [2] are linked below.
>
> After a few false starts, meeting with Will Deacon and Mark Rutland to
> discuss implementation ideas, and a few more false starts, I finally
> have an implementation of dynamic counter reservation that works
> without disrupting host perf too much. Now the host only loses access
> to the guest counters when a vCPU resides on the CPU.
>
> The key was creating perf_pmu_resched_update, which behaves exactly
> like perf_pmu_resched except it takes a callback to call in between
> when the perf events are scheduled out and when they are scheduled
> back in. That allows us to update the PMU's available counters when we
> know they are not currently in use without needing to expose private
> perf core functions and triple check they are not being called in a
> way that violates existing assumptions.
>
> Because this introduces a possibility of perf reschedule during vCPU
> load, I've optimized to only do that operation if there are host
> events occupying the intended guest counters at the time of the load.
>
> The kernel command line parameter for the driver still exists, but now
> only defines an upper limit of counters the guest might use rather
> than taking those counters from the host permanently.
>
> v7:
>
> * Implement dynamic counter reservation as described above. One side
> effect is the PMUv3 driver now needs much fewer changes to enforce
> the boundary.
>
> * Move register accesses out of fast path for non-FGT hardware. The
> performance impact was negligible and this moves bloat out of the
> fast path and allows a more reliable design with more code sharing.
>
> * Make PMCCNTR a special case in the context swap again because trying
> to access it with PMXEVCNTR is undefined.
>
> * Fix a bug where kvm_pmu_guest_counter_mask was using & instead of |.
>
> * Re-expose the dedicated instruction counter to the host since it was
> decided the guest will not own it.
>
> * Change the global armv8pmu_reserved_host_counters to
> armv8pmu_is_partitioned because it was only used in boolean checks.
>
> * Fix typo in vcpu attribute commit so the spelling of the flag in the
> commit message matches the code.
>
> * Rebase to v7.0-rc7
>
> v6:
> https://lore.kernel.org/kvmarm/20260209221414.2169465-1-coltonlewis@google.com/
>
> v5:
> https://lore.kernel.org/kvmarm/20251209205121.1871534-1-coltonlewis@google.com/
>
> v4:
> https://lore.kernel.org/kvmarm/20250714225917.1396543-1-coltonlewis@google.com/
>
> v3:
> https://lore.kernel.org/kvm/20250626200459.1153955-1-coltonlewis@google.com/
>
> v2:
> https://lore.kernel.org/kvm/20250620221326.1261128-1-coltonlewis@google.com/
>
> v1:
> https://lore.kernel.org/kvm/20250602192702.2125115-1-coltonlewis@google.com/
>
> [1] https://gitlab.com/qemu-project/kvm-forum/-/raw/main/_attachments/2025/Optimizing__itvHkhc.pdf
> [2] https://www.youtube.com/watch?v=YRzZ8jMIA6M&list=PLW3ep1uCIRfxwmllXTOA2txfDWN6vUOHp&index=9
>
> Colton Lewis (19):
> arm64: cpufeature: Add cpucap for HPMN0
> KVM: arm64: Reorganize PMU functions
> perf: arm_pmuv3: Generalize counter bitmasks
> perf: arm_pmuv3: Check cntr_mask before using pmccntr
> perf: arm_pmuv3: Add method to partition the PMU
> KVM: arm64: Set up FGT for Partitioned PMU
> KVM: arm64: Add Partitioned PMU register trap handlers
> KVM: arm64: Set up MDCR_EL2 to handle a Partitioned PMU
> KVM: arm64: Context swap Partitioned PMU guest registers
> KVM: arm64: Enforce PMU event filter at vcpu_load()
> perf: Add perf_pmu_resched_update()
> KVM: arm64: Apply dynamic guest counter reservations
> KVM: arm64: Implement lazy PMU context swaps
> perf: arm_pmuv3: Handle IRQs for Partitioned PMU guest counters
> KVM: arm64: Detect overflows for the Partitioned PMU
> KVM: arm64: Add vCPU device attr to partition the PMU
> KVM: selftests: Add find_bit to KVM library
> KVM: arm64: selftests: Add test case for Partitioned PMU
> KVM: arm64: selftests: Relax testing for exceptions when partitioned
>
> Marc Zyngier (1):
> KVM: arm64: Reorganize PMU includes
>
> arch/arm/include/asm/arm_pmuv3.h | 18 +
> arch/arm64/include/asm/arm_pmuv3.h | 12 +-
> arch/arm64/include/asm/kvm_host.h | 17 +-
> arch/arm64/include/asm/kvm_types.h | 6 +-
> arch/arm64/include/uapi/asm/kvm.h | 2 +
> arch/arm64/kernel/cpufeature.c | 8 +
> arch/arm64/kvm/Makefile | 2 +-
> arch/arm64/kvm/arm.c | 2 +
> arch/arm64/kvm/config.c | 41 +-
> arch/arm64/kvm/debug.c | 31 +-
> arch/arm64/kvm/pmu-direct.c | 494 ++++++++++++
> arch/arm64/kvm/pmu-emul.c | 674 +----------------
> arch/arm64/kvm/pmu.c | 701 ++++++++++++++++++
> arch/arm64/kvm/sys_regs.c | 250 ++++++-
> arch/arm64/tools/cpucaps | 1 +
> arch/arm64/tools/sysreg | 6 +-
> drivers/perf/arm_pmuv3.c | 111 ++-
> include/kvm/arm_pmu.h | 110 +++
> include/linux/perf/arm_pmu.h | 3 +
> include/linux/perf/arm_pmuv3.h | 14 +-
> include/linux/perf_event.h | 3 +
> kernel/events/core.c | 28 +-
> tools/testing/selftests/kvm/Makefile.kvm | 1 +
> .../selftests/kvm/arm64/vpmu_counter_access.c | 112 ++-
> tools/testing/selftests/kvm/lib/find_bit.c | 1 +
> 25 files changed, 1861 insertions(+), 787 deletions(-)
> create mode 100644 arch/arm64/kvm/pmu-direct.c
> create mode 100644 tools/testing/selftests/kvm/lib/find_bit.c
>
>
> base-commit: 591cd656a1bf5ea94a222af5ef2ee76df029c1d2
> --
> 2.54.0.545.g6539524ca2-goog
I tested it a bit and ran the kselftests and it all seems to be working
ok. Some of the critical sashiko comments look like they are worth
looking into though:
https://sashiko.dev/#/patchset/20260504211813.1804997-1-coltonlewis%40google.com
For example writing to PMCR_EL0.P from EL2 resets the host's counters,
even if it's KVM doing it after trapping a write from the guest.
^ permalink raw reply [flat|nested] 42+ messages in thread