* [PATCH v3 0/2] KVM: arm64: PMU: Use multiple host PMUs
@ 2026-02-25 4:31 Akihiko Odaki
2026-02-25 4:31 ` [PATCH v3 1/2] KVM: arm64: PMU: Introduce FIXED_COUNTERS_ONLY Akihiko Odaki
2026-02-25 4:31 ` [PATCH v3 2/2] KVM: arm64: selftests: Test PMU_V3_FIXED_COUNTERS_ONLY Akihiko Odaki
0 siblings, 2 replies; 8+ messages in thread
From: Akihiko Odaki @ 2026-02-25 4:31 UTC (permalink / raw)
To: Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
Zenghui Yu, Catalin Marinas, Will Deacon, Kees Cook,
Gustavo A. R. Silva, Paolo Bonzini, Jonathan Corbet, Shuah Khan
Cc: linux-arm-kernel, kvmarm, linux-kernel, linux-hardening, devel,
kvm, linux-doc, linux-kselftest, Akihiko Odaki
On a heterogeneous arm64 system, KVM's PMU emulation is based on the
features of a single host PMU instance. When a vCPU is migrated to a
pCPU with an incompatible PMU, counters such as PMCCNTR_EL0 stop
incrementing.
Although this behavior is permitted by the architecture, Windows does
not handle it gracefully and may crash with a division-by-zero error.
The current workaround requires VMMs to pin vCPUs to a set of pCPUs
that share a compatible PMU. This is difficult to implement correctly in
QEMU/libvirt, where pinning occurs after vCPU initialization, and it
also restricts the guest to a subset of available pCPUs.
This series introduces the KVM_ARM_VCPU_PMU_V3_FIXED_COUNTERS_ONLY
attribute. If set, PMUv3 will be emulated without programmable event
counters, and KVM will be able to run VCPUs on any physical CPU with a
compatible hardware PMU.
This allows Windows guests to run reliably on heterogeneous systems
without crashing, even without vCPU pinning, and enables VMMs to
schedule vCPUs across all available pCPUs, making full use of the host
hardware.
A QEMU patch that demonstrates the usage of the new attribute is
available at:
https://lore.kernel.org/qemu-devel/20260225-kvm-v2-1-b8d743db0f73@rsg.ci.i.u-tokyo.ac.jp/
("[PATCH RFC v2] target/arm/kvm: Choose PMU backend")
Signed-off-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp>
---
Changes in v3:
- Renamed the attribute to KVM_ARM_VCPU_PMU_V3_FIXED_COUNTERS_ONLY.
- Changed to request the creation of perf counters when loading a vCPU.
- Link to v2: https://lore.kernel.org/r/20250806-hybrid-v2-0-0661aec3af8c@rsg.ci.i.u-tokyo.ac.jp
Changes in v2:
- Added the KVM_ARM_VCPU_PMU_V3_COMPOSITION attribute to opt in to the
feature.
- Added code to handle overflow.
- Link to v1: https://lore.kernel.org/r/20250319-hybrid-v1-1-4d1ada10e705@daynix.com
---
Akihiko Odaki (2):
KVM: arm64: PMU: Introduce FIXED_COUNTERS_ONLY
KVM: arm64: selftests: Test PMU_V3_FIXED_COUNTERS_ONLY
Documentation/virt/kvm/devices/vcpu.rst | 29 ++++
arch/arm64/include/asm/kvm_host.h | 3 +
arch/arm64/include/uapi/asm/kvm.h | 1 +
arch/arm64/kvm/arm.c | 8 +-
arch/arm64/kvm/pmu-emul.c | 155 ++++++++++++++-------
include/kvm/arm_pmu.h | 1 +
.../selftests/kvm/arm64/vpmu_counter_access.c | 148 ++++++++++++++++----
7 files changed, 268 insertions(+), 77 deletions(-)
---
base-commit: ef87500dc466ef424e4fc344b5063d345e18bf73
change-id: 20250224-hybrid-01d5ff47edd2
Best regards,
--
Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp>
* [PATCH v3 1/2] KVM: arm64: PMU: Introduce FIXED_COUNTERS_ONLY
2026-02-25 4:31 [PATCH v3 0/2] KVM: arm64: PMU: Use multiple host PMUs Akihiko Odaki
@ 2026-02-25 4:31 ` Akihiko Odaki
2026-02-26 11:54 ` Oliver Upton
2026-02-25 4:31 ` [PATCH v3 2/2] KVM: arm64: selftests: Test PMU_V3_FIXED_COUNTERS_ONLY Akihiko Odaki
1 sibling, 1 reply; 8+ messages in thread
From: Akihiko Odaki @ 2026-02-25 4:31 UTC (permalink / raw)
To: Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
Zenghui Yu, Catalin Marinas, Will Deacon, Kees Cook,
Gustavo A. R. Silva, Paolo Bonzini, Jonathan Corbet, Shuah Khan
Cc: linux-arm-kernel, kvmarm, linux-kernel, linux-hardening, devel,
kvm, linux-doc, linux-kselftest, Akihiko Odaki
On a heterogeneous arm64 system, KVM's PMU emulation is based on the
features of a single host PMU instance. When a vCPU is migrated to a
pCPU with an incompatible PMU, counters such as PMCCNTR_EL0 stop
incrementing.
Although this behavior is permitted by the architecture, Windows does
not handle it gracefully and may crash with a division-by-zero error.
The current workaround requires VMMs to pin vCPUs to a set of pCPUs
that share a compatible PMU. This is difficult to implement correctly in
QEMU/libvirt, where pinning occurs after vCPU initialization, and it
also restricts the guest to a subset of available pCPUs.
Introduce the KVM_ARM_VCPU_PMU_V3_FIXED_COUNTERS_ONLY attribute to
create a "fixed-counters-only" PMU. When set, KVM exposes a PMU that is
compatible with all pCPUs but does not support programmable event
counters, which may have different feature sets on different PMUs.
This allows Windows guests to run reliably on heterogeneous systems
without crashing, even without vCPU pinning, and enables VMMs to
schedule vCPUs across all available pCPUs, making full use of the host
hardware.
Signed-off-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp>
---
Documentation/virt/kvm/devices/vcpu.rst | 29 ++++++
arch/arm64/include/asm/kvm_host.h | 3 +
arch/arm64/include/uapi/asm/kvm.h | 1 +
arch/arm64/kvm/arm.c | 8 +-
arch/arm64/kvm/pmu-emul.c | 155 +++++++++++++++++++++-----------
include/kvm/arm_pmu.h | 1 +
6 files changed, 146 insertions(+), 51 deletions(-)
diff --git a/Documentation/virt/kvm/devices/vcpu.rst b/Documentation/virt/kvm/devices/vcpu.rst
index 60bf205cb373..e0aeb1897d77 100644
--- a/Documentation/virt/kvm/devices/vcpu.rst
+++ b/Documentation/virt/kvm/devices/vcpu.rst
@@ -161,6 +161,35 @@ explicitly selected, or the number of counters is out of range for the
selected PMU. Selecting a new PMU cancels the effect of setting this
attribute.
+1.6 ATTRIBUTE: KVM_ARM_VCPU_PMU_V3_FIXED_COUNTERS_ONLY
+------------------------------------------------------
+
+:Parameters: no additional parameter in kvm_device_attr.addr
+
+:Returns:
+
+ ======= =====================================================
+ -EBUSY Attempted to set after initializing PMUv3 or running
+ VCPU, or attempted to set for the first time after
+ setting an event filter
+ -ENXIO Attempted to get before setting
+ -ENODEV Attempted to set while PMUv3 not supported
+ ======= =====================================================
+
+If set, PMUv3 will be emulated without programmable event counters. The VCPU
+will use any compatible hardware PMU. This attribute is particularly useful on
+heterogeneous systems where different hardware PMUs cover different physical
+CPUs. The compatibility of hardware PMUs can be checked with
+KVM_ARM_VCPU_PMU_V3_SET_PMU. All VCPUs in a VM share this attribute. It isn't
+possible to set it for the first time if a PMU event filter is already present.
+
+Note that KVM will not make any attempt to run the VCPU on physical CPUs with
+compatible hardware PMUs; this is entirely left to userspace. However,
+attempting to run the VCPU on an unsupported CPU will fail: KVM_RUN will return
+with exit_reason = KVM_EXIT_FAIL_ENTRY and populate the fail_entry struct,
+setting the hardware_entry_failure_reason field to
+KVM_EXIT_FAIL_ENTRY_CPU_UNSUPPORTED and the cpu field to the processor ID.
+
2. GROUP: KVM_ARM_VCPU_TIMER_CTRL
=================================
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 64302c438355..d1b0c71afbfe 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -54,6 +54,7 @@
#define KVM_REQ_NESTED_S2_UNMAP KVM_ARCH_REQ(8)
#define KVM_REQ_GUEST_HYP_IRQ_PENDING KVM_ARCH_REQ(9)
#define KVM_REQ_MAP_L1_VNCR_EL2 KVM_ARCH_REQ(10)
+#define KVM_REQ_CREATE_PMU KVM_ARCH_REQ(11)
#define KVM_DIRTY_LOG_MANUAL_CAPS (KVM_DIRTY_LOG_MANUAL_PROTECT_ENABLE | \
KVM_DIRTY_LOG_INITIALLY_SET)
@@ -350,6 +351,8 @@ struct kvm_arch {
#define KVM_ARCH_FLAG_GUEST_HAS_SVE 9
/* MIDR_EL1, REVIDR_EL1, and AIDR_EL1 are writable from userspace */
#define KVM_ARCH_FLAG_WRITABLE_IMP_ID_REGS 10
+ /* PMUv3 is emulated without programmable event counters */
+#define KVM_ARCH_FLAG_PMU_V3_FIXED_COUNTERS_ONLY 11
unsigned long flags;
/* VM-wide vCPU feature set */
diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
index ed5f3892674c..7fb7bf07df76 100644
--- a/arch/arm64/include/uapi/asm/kvm.h
+++ b/arch/arm64/include/uapi/asm/kvm.h
@@ -436,6 +436,7 @@ enum {
#define KVM_ARM_VCPU_PMU_V3_FILTER 2
#define KVM_ARM_VCPU_PMU_V3_SET_PMU 3
#define KVM_ARM_VCPU_PMU_V3_SET_NR_COUNTERS 4
+#define KVM_ARM_VCPU_PMU_V3_FIXED_COUNTERS_ONLY 5
#define KVM_ARM_VCPU_TIMER_CTRL 1
#define KVM_ARM_VCPU_TIMER_IRQ_VTIMER 0
#define KVM_ARM_VCPU_TIMER_IRQ_PTIMER 1
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 052bf0d4d0b0..6764d0bb3994 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -629,6 +629,8 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
kvm_vcpu_load_vhe(vcpu);
kvm_arch_vcpu_load_fp(vcpu);
kvm_vcpu_pmu_restore_guest(vcpu);
+ if (test_bit(KVM_ARCH_FLAG_PMU_V3_FIXED_COUNTERS_ONLY, &vcpu->kvm->arch.flags))
+ kvm_make_request(KVM_REQ_CREATE_PMU, vcpu);
if (kvm_arm_is_pvtime_enabled(&vcpu->arch))
kvm_make_request(KVM_REQ_RECORD_STEAL, vcpu);
@@ -1056,6 +1058,9 @@ static int check_vcpu_requests(struct kvm_vcpu *vcpu)
if (kvm_check_request(KVM_REQ_RELOAD_PMU, vcpu))
kvm_vcpu_reload_pmu(vcpu);
+ if (kvm_check_request(KVM_REQ_CREATE_PMU, vcpu))
+ kvm_vcpu_create_pmu(vcpu);
+
if (kvm_check_request(KVM_REQ_RESYNC_PMU_EL0, vcpu))
kvm_vcpu_pmu_restore_guest(vcpu);
@@ -1516,7 +1521,8 @@ static int kvm_setup_vcpu(struct kvm_vcpu *vcpu)
* When the vCPU has a PMU, but no PMU is set for the guest
* yet, set the default one.
*/
- if (kvm_vcpu_has_pmu(vcpu) && !kvm->arch.arm_pmu)
+ if (kvm_vcpu_has_pmu(vcpu) && !kvm->arch.arm_pmu &&
+ !test_bit(KVM_ARCH_FLAG_PMU_V3_FIXED_COUNTERS_ONLY, &kvm->arch.flags))
ret = kvm_arm_set_default_pmu(kvm);
/* Prepare for nested if required */
diff --git a/arch/arm64/kvm/pmu-emul.c b/arch/arm64/kvm/pmu-emul.c
index b03dbda7f1ab..2c19e037d432 100644
--- a/arch/arm64/kvm/pmu-emul.c
+++ b/arch/arm64/kvm/pmu-emul.c
@@ -619,18 +619,24 @@ void kvm_pmu_handle_pmcr(struct kvm_vcpu *vcpu, u64 val)
}
}
-static bool kvm_pmu_counter_is_enabled(struct kvm_pmc *pmc)
+static u64 kvm_pmu_enabled_counter_mask(struct kvm_vcpu *vcpu)
{
- struct kvm_vcpu *vcpu = kvm_pmc_to_vcpu(pmc);
- unsigned int mdcr = __vcpu_sys_reg(vcpu, MDCR_EL2);
+ u64 mask = 0;
- if (!(__vcpu_sys_reg(vcpu, PMCNTENSET_EL0) & BIT(pmc->idx)))
- return false;
+ if (__vcpu_sys_reg(vcpu, MDCR_EL2) & MDCR_EL2_HPME)
+ mask |= kvm_pmu_hyp_counter_mask(vcpu);
- if (kvm_pmu_counter_is_hyp(vcpu, pmc->idx))
- return mdcr & MDCR_EL2_HPME;
+ if (kvm_vcpu_read_pmcr(vcpu) & ARMV8_PMU_PMCR_E)
+ mask |= ~kvm_pmu_hyp_counter_mask(vcpu);
- return kvm_vcpu_read_pmcr(vcpu) & ARMV8_PMU_PMCR_E;
+ return __vcpu_sys_reg(vcpu, PMCNTENSET_EL0) & mask;
+}
+
+static bool kvm_pmu_counter_is_enabled(struct kvm_pmc *pmc)
+{
+ struct kvm_vcpu *vcpu = kvm_pmc_to_vcpu(pmc);
+
+ return kvm_pmu_enabled_counter_mask(vcpu) & BIT(pmc->idx);
}
static bool kvm_pmc_counts_at_el0(struct kvm_pmc *pmc)
@@ -662,10 +668,8 @@ static bool kvm_pmc_counts_at_el2(struct kvm_pmc *pmc)
return kvm_pmc_read_evtreg(pmc) & ARMV8_PMU_INCLUDE_EL2;
}
-static int kvm_map_pmu_event(struct kvm *kvm, unsigned int eventsel)
+static int kvm_map_pmu_event(struct arm_pmu *pmu, unsigned int eventsel)
{
- struct arm_pmu *pmu = kvm->arch.arm_pmu;
-
/*
* The CPU PMU likely isn't PMUv3; let the driver provide a mapping
* for the guest's PMUv3 event ID.
@@ -676,6 +680,23 @@ static int kvm_map_pmu_event(struct kvm *kvm, unsigned int eventsel)
return eventsel;
}
+static struct arm_pmu *kvm_pmu_probe_armpmu(int cpu)
+{
+ struct arm_pmu_entry *entry;
+ struct arm_pmu *pmu;
+
+ guard(mutex)(&arm_pmus_lock);
+
+ list_for_each_entry(entry, &arm_pmus, entry) {
+ pmu = entry->arm_pmu;
+
+ if (cpumask_test_cpu(cpu, &pmu->supported_cpus))
+ return pmu;
+ }
+
+ return NULL;
+}
+
/**
* kvm_pmu_create_perf_event - create a perf event for a counter
* @pmc: Counter context
@@ -689,6 +710,14 @@ static void kvm_pmu_create_perf_event(struct kvm_pmc *pmc)
int eventsel;
u64 evtreg;
+ if (!arm_pmu) {
+ arm_pmu = kvm_pmu_probe_armpmu(vcpu->cpu);
+ if (!arm_pmu) {
+ vcpu_set_on_unsupported_cpu(vcpu);
+ return;
+ }
+ }
+
evtreg = kvm_pmc_read_evtreg(pmc);
kvm_pmu_stop_counter(pmc);
@@ -717,7 +746,7 @@ static void kvm_pmu_create_perf_event(struct kvm_pmc *pmc)
* Don't create an event if we're running on hardware that requires
* PMUv3 event translation and we couldn't find a valid mapping.
*/
- eventsel = kvm_map_pmu_event(vcpu->kvm, eventsel);
+ eventsel = kvm_map_pmu_event(arm_pmu, eventsel);
if (eventsel < 0)
return;
@@ -805,42 +834,6 @@ void kvm_host_pmu_init(struct arm_pmu *pmu)
list_add_tail(&entry->entry, &arm_pmus);
}
-static struct arm_pmu *kvm_pmu_probe_armpmu(void)
-{
- struct arm_pmu_entry *entry;
- struct arm_pmu *pmu;
- int cpu;
-
- guard(mutex)(&arm_pmus_lock);
-
- /*
- * It is safe to use a stale cpu to iterate the list of PMUs so long as
- * the same value is used for the entirety of the loop. Given this, and
- * the fact that no percpu data is used for the lookup there is no need
- * to disable preemption.
- *
- * It is still necessary to get a valid cpu, though, to probe for the
- * default PMU instance as userspace is not required to specify a PMU
- * type. In order to uphold the preexisting behavior KVM selects the
- * PMU instance for the core during vcpu init. A dependent use
- * case would be a user with disdain of all things big.LITTLE that
- * affines the VMM to a particular cluster of cores.
- *
- * In any case, userspace should just do the sane thing and use the UAPI
- * to select a PMU type directly. But, be wary of the baggage being
- * carried here.
- */
- cpu = raw_smp_processor_id();
- list_for_each_entry(entry, &arm_pmus, entry) {
- pmu = entry->arm_pmu;
-
- if (cpumask_test_cpu(cpu, &pmu->supported_cpus))
- return pmu;
- }
-
- return NULL;
-}
-
static u64 __compute_pmceid(struct arm_pmu *pmu, bool pmceid1)
{
u32 hi[2], lo[2];
@@ -883,6 +876,9 @@ u64 kvm_pmu_get_pmceid(struct kvm_vcpu *vcpu, bool pmceid1)
u64 val, mask = 0;
int base, i, nr_events;
+ if (!cpu_pmu)
+ return 0;
+
if (!pmceid1) {
val = compute_pmceid0(cpu_pmu);
base = 0;
@@ -921,6 +917,15 @@ void kvm_vcpu_reload_pmu(struct kvm_vcpu *vcpu)
kvm_pmu_reprogram_counter_mask(vcpu, mask);
}
+void kvm_vcpu_create_pmu(struct kvm_vcpu *vcpu)
+{
+ unsigned long mask = kvm_pmu_enabled_counter_mask(vcpu);
+ int i;
+
+ for_each_set_bit(i, &mask, 32)
+ kvm_pmu_create_perf_event(kvm_vcpu_idx_to_pmc(vcpu, i));
+}
+
int kvm_arm_pmu_v3_enable(struct kvm_vcpu *vcpu)
{
if (!vcpu->arch.pmu.created)
@@ -1011,6 +1016,9 @@ u8 kvm_arm_pmu_get_max_counters(struct kvm *kvm)
{
struct arm_pmu *arm_pmu = kvm->arch.arm_pmu;
+ if (!arm_pmu)
+ return 0;
+
/*
* PMUv3 requires that all event counters are capable of counting any
* event, though the same may not be true of non-PMUv3 hardware.
@@ -1065,7 +1073,24 @@ static void kvm_arm_set_pmu(struct kvm *kvm, struct arm_pmu *arm_pmu)
*/
int kvm_arm_set_default_pmu(struct kvm *kvm)
{
- struct arm_pmu *arm_pmu = kvm_pmu_probe_armpmu();
+ /*
+ * It is safe to use a stale cpu to iterate the list of PMUs so long as
+ * the same value is used for the entirety of the loop. Given this, and
+ * the fact that no percpu data is used for the lookup there is no need
+ * to disable preemption.
+ *
+ * It is still necessary to get a valid cpu, though, to probe for the
+ * default PMU instance as userspace is not required to specify a PMU
+ * type. In order to uphold the preexisting behavior KVM selects the
+ * PMU instance for the core during vcpu init. A dependent use
+ * case would be a user with disdain of all things big.LITTLE that
+ * affines the VMM to a particular cluster of cores.
+ *
+ * In any case, userspace should just do the sane thing and use the UAPI
+ * to select a PMU type directly. But, be wary of the baggage being
+ * carried here.
+ */
+ struct arm_pmu *arm_pmu = kvm_pmu_probe_armpmu(raw_smp_processor_id());
if (!arm_pmu)
return -ENODEV;
@@ -1094,6 +1119,7 @@ static int kvm_arm_pmu_v3_set_pmu(struct kvm_vcpu *vcpu, int pmu_id)
}
kvm_arm_set_pmu(kvm, arm_pmu);
+ clear_bit(KVM_ARCH_FLAG_PMU_V3_FIXED_COUNTERS_ONLY, &kvm->arch.flags);
cpumask_copy(kvm->arch.supported_cpus, &arm_pmu->supported_cpus);
ret = 0;
break;
@@ -1104,11 +1130,33 @@ static int kvm_arm_pmu_v3_set_pmu(struct kvm_vcpu *vcpu, int pmu_id)
return ret;
}
+static int kvm_arm_pmu_v3_set_pmu_fixed_counters_only(struct kvm_vcpu *vcpu)
+{
+ struct kvm *kvm = vcpu->kvm;
+
+ lockdep_assert_held(&kvm->arch.config_lock);
+
+ if (kvm_vm_has_ran_once(kvm) ||
+ (kvm->arch.pmu_filter &&
+ !test_bit(KVM_ARCH_FLAG_PMU_V3_FIXED_COUNTERS_ONLY, &kvm->arch.flags)))
+ return -EBUSY;
+
+ kvm->arch.arm_pmu = NULL;
+ kvm_arm_set_nr_counters(kvm, 0);
+ set_bit(KVM_ARCH_FLAG_PMU_V3_FIXED_COUNTERS_ONLY, &kvm->arch.flags);
+ cpumask_copy(kvm->arch.supported_cpus, cpu_possible_mask);
+
+ return 0;
+}
+
static int kvm_arm_pmu_v3_set_nr_counters(struct kvm_vcpu *vcpu, unsigned int n)
{
struct kvm *kvm = vcpu->kvm;
- if (!kvm->arch.arm_pmu)
+ lockdep_assert_held(&kvm->arch.config_lock);
+
+ if (!kvm->arch.arm_pmu &&
+ !test_bit(KVM_ARCH_FLAG_PMU_V3_FIXED_COUNTERS_ONLY, &kvm->arch.flags))
return -EINVAL;
if (n > kvm_arm_pmu_get_max_counters(kvm))
@@ -1223,6 +1271,8 @@ int kvm_arm_pmu_v3_set_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
return kvm_arm_pmu_v3_set_nr_counters(vcpu, n);
}
+ case KVM_ARM_VCPU_PMU_V3_FIXED_COUNTERS_ONLY:
+ return kvm_arm_pmu_v3_set_pmu_fixed_counters_only(vcpu);
case KVM_ARM_VCPU_PMU_V3_INIT:
return kvm_arm_pmu_v3_init(vcpu);
}
@@ -1249,6 +1299,10 @@ int kvm_arm_pmu_v3_get_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
irq = vcpu->arch.pmu.irq_num;
return put_user(irq, uaddr);
}
+ case KVM_ARM_VCPU_PMU_V3_FIXED_COUNTERS_ONLY:
+ lockdep_assert_held(&vcpu->kvm->arch.config_lock);
+ if (test_bit(KVM_ARCH_FLAG_PMU_V3_FIXED_COUNTERS_ONLY, &vcpu->kvm->arch.flags))
+ return 0;
}
return -ENXIO;
@@ -1262,6 +1316,7 @@ int kvm_arm_pmu_v3_has_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
case KVM_ARM_VCPU_PMU_V3_FILTER:
case KVM_ARM_VCPU_PMU_V3_SET_PMU:
case KVM_ARM_VCPU_PMU_V3_SET_NR_COUNTERS:
+ case KVM_ARM_VCPU_PMU_V3_FIXED_COUNTERS_ONLY:
if (kvm_vcpu_has_pmu(vcpu))
return 0;
}
diff --git a/include/kvm/arm_pmu.h b/include/kvm/arm_pmu.h
index 96754b51b411..197ff8e25128 100644
--- a/include/kvm/arm_pmu.h
+++ b/include/kvm/arm_pmu.h
@@ -57,6 +57,7 @@ void kvm_pmu_handle_pmcr(struct kvm_vcpu *vcpu, u64 val);
void kvm_pmu_set_counter_event_type(struct kvm_vcpu *vcpu, u64 data,
u64 select_idx);
void kvm_vcpu_reload_pmu(struct kvm_vcpu *vcpu);
+void kvm_vcpu_create_pmu(struct kvm_vcpu *vcpu);
int kvm_arm_pmu_v3_set_attr(struct kvm_vcpu *vcpu,
struct kvm_device_attr *attr);
int kvm_arm_pmu_v3_get_attr(struct kvm_vcpu *vcpu,
--
2.53.0
* [PATCH v3 2/2] KVM: arm64: selftests: Test PMU_V3_FIXED_COUNTERS_ONLY
2026-02-25 4:31 [PATCH v3 0/2] KVM: arm64: PMU: Use multiple host PMUs Akihiko Odaki
2026-02-25 4:31 ` [PATCH v3 1/2] KVM: arm64: PMU: Introduce FIXED_COUNTERS_ONLY Akihiko Odaki
@ 2026-02-25 4:31 ` Akihiko Odaki
1 sibling, 0 replies; 8+ messages in thread
From: Akihiko Odaki @ 2026-02-25 4:31 UTC (permalink / raw)
To: Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
Zenghui Yu, Catalin Marinas, Will Deacon, Kees Cook,
Gustavo A. R. Silva, Paolo Bonzini, Jonathan Corbet, Shuah Khan
Cc: linux-arm-kernel, kvmarm, linux-kernel, linux-hardening, devel,
kvm, linux-doc, linux-kselftest, Akihiko Odaki
Assert the following:
- KVM_ARM_VCPU_PMU_V3_FIXED_COUNTERS_ONLY is unset at initialization.
- KVM_ARM_VCPU_PMU_V3_FIXED_COUNTERS_ONLY can be set.
- Setting KVM_ARM_VCPU_PMU_V3_FIXED_COUNTERS_ONLY for the first time
after setting an event filter results in EBUSY.
- KVM_ARM_VCPU_PMU_V3_FIXED_COUNTERS_ONLY can be set again even if an
event filter has already been set.
- Setting KVM_ARM_VCPU_PMU_V3_FIXED_COUNTERS_ONLY after running a VCPU
results in EBUSY.
- The existing test cases pass with
KVM_ARM_VCPU_PMU_V3_FIXED_COUNTERS_ONLY set.
Signed-off-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp>
---
.../selftests/kvm/arm64/vpmu_counter_access.c | 148 +++++++++++++++++----
1 file changed, 122 insertions(+), 26 deletions(-)
diff --git a/tools/testing/selftests/kvm/arm64/vpmu_counter_access.c b/tools/testing/selftests/kvm/arm64/vpmu_counter_access.c
index ae36325c022f..156bfa636923 100644
--- a/tools/testing/selftests/kvm/arm64/vpmu_counter_access.c
+++ b/tools/testing/selftests/kvm/arm64/vpmu_counter_access.c
@@ -403,12 +403,7 @@ static void create_vpmu_vm(void *guest_code)
{
struct kvm_vcpu_init init;
uint8_t pmuver, ec;
- uint64_t dfr0, irq = 23;
- struct kvm_device_attr irq_attr = {
- .group = KVM_ARM_VCPU_PMU_V3_CTRL,
- .attr = KVM_ARM_VCPU_PMU_V3_IRQ,
- .addr = (uint64_t)&irq,
- };
+ uint64_t dfr0;
/* The test creates the vpmu_vm multiple times. Ensure a clean state */
memset(&vpmu_vm, 0, sizeof(vpmu_vm));
@@ -434,8 +429,6 @@ static void create_vpmu_vm(void *guest_code)
TEST_ASSERT(pmuver != ID_AA64DFR0_EL1_PMUVer_IMP_DEF &&
pmuver >= ID_AA64DFR0_EL1_PMUVer_IMP,
"Unexpected PMUVER (0x%x) on the vCPU with PMUv3", pmuver);
-
- vcpu_ioctl(vpmu_vm.vcpu, KVM_SET_DEVICE_ATTR, &irq_attr);
}
static void destroy_vpmu_vm(void)
@@ -461,15 +454,25 @@ static void run_vcpu(struct kvm_vcpu *vcpu, uint64_t pmcr_n)
}
}
-static void test_create_vpmu_vm_with_nr_counters(unsigned int nr_counters, bool expect_fail)
+static void test_create_vpmu_vm_with_nr_counters(unsigned int nr_counters,
+ bool fixed_counters_only,
+ bool expect_fail)
{
struct kvm_vcpu *vcpu;
unsigned int prev;
int ret;
+ uint64_t irq = 23;
create_vpmu_vm(guest_code);
vcpu = vpmu_vm.vcpu;
+ if (fixed_counters_only)
+ vcpu_device_attr_set(vcpu, KVM_ARM_VCPU_PMU_V3_CTRL,
+ KVM_ARM_VCPU_PMU_V3_FIXED_COUNTERS_ONLY, NULL);
+
+ vcpu_device_attr_set(vcpu, KVM_ARM_VCPU_PMU_V3_CTRL,
+ KVM_ARM_VCPU_PMU_V3_IRQ, &irq);
+
prev = get_pmcr_n(vcpu_get_reg(vcpu, KVM_ARM64_SYS_REG(SYS_PMCR_EL0)));
ret = __vcpu_device_attr_set(vcpu, KVM_ARM_VCPU_PMU_V3_CTRL,
@@ -489,15 +492,15 @@ static void test_create_vpmu_vm_with_nr_counters(unsigned int nr_counters, bool
* Create a guest with one vCPU, set the PMCR_EL0.N for the vCPU to @pmcr_n,
* and run the test.
*/
-static void run_access_test(uint64_t pmcr_n)
+static void run_access_test(uint64_t pmcr_n, bool fixed_counters_only)
{
uint64_t sp;
struct kvm_vcpu *vcpu;
struct kvm_vcpu_init init;
- pr_debug("Test with pmcr_n %lu\n", pmcr_n);
+ pr_debug("Test with pmcr_n %lu, fixed_counters_only %d\n", pmcr_n, fixed_counters_only);
- test_create_vpmu_vm_with_nr_counters(pmcr_n, false);
+ test_create_vpmu_vm_with_nr_counters(pmcr_n, fixed_counters_only, false);
vcpu = vpmu_vm.vcpu;
/* Save the initial sp to restore them later to run the guest again */
@@ -531,14 +534,14 @@ static struct pmreg_sets validity_check_reg_sets[] = {
* Create a VM, and check if KVM handles the userspace accesses of
* the PMU register sets in @validity_check_reg_sets[] correctly.
*/
-static void run_pmregs_validity_test(uint64_t pmcr_n)
+static void run_pmregs_validity_test(uint64_t pmcr_n, bool fixed_counters_only)
{
int i;
struct kvm_vcpu *vcpu;
uint64_t set_reg_id, clr_reg_id, reg_val;
uint64_t valid_counters_mask, max_counters_mask;
- test_create_vpmu_vm_with_nr_counters(pmcr_n, false);
+ test_create_vpmu_vm_with_nr_counters(pmcr_n, fixed_counters_only, false);
vcpu = vpmu_vm.vcpu;
valid_counters_mask = get_counters_mask(pmcr_n);
@@ -588,11 +591,11 @@ static void run_pmregs_validity_test(uint64_t pmcr_n)
* the vCPU to @pmcr_n, which is larger than the host value.
* The attempt should fail as @pmcr_n is too big to set for the vCPU.
*/
-static void run_error_test(uint64_t pmcr_n)
+static void run_error_test(uint64_t pmcr_n, bool fixed_counters_only)
{
pr_debug("Error test with pmcr_n %lu (larger than the host)\n", pmcr_n);
- test_create_vpmu_vm_with_nr_counters(pmcr_n, true);
+ test_create_vpmu_vm_with_nr_counters(pmcr_n, fixed_counters_only, true);
destroy_vpmu_vm();
}
@@ -622,22 +625,115 @@ static bool kvm_supports_nr_counters_attr(void)
return supported;
}
-int main(void)
+static void test_config(uint64_t pmcr_n, bool fixed_counters_only)
{
- uint64_t i, pmcr_n;
-
- TEST_REQUIRE(kvm_has_cap(KVM_CAP_ARM_PMU_V3));
- TEST_REQUIRE(kvm_supports_vgic_v3());
- TEST_REQUIRE(kvm_supports_nr_counters_attr());
+ uint64_t i;
- pmcr_n = get_pmcr_n_limit();
for (i = 0; i <= pmcr_n; i++) {
- run_access_test(i);
- run_pmregs_validity_test(i);
+ run_access_test(i, fixed_counters_only);
+ run_pmregs_validity_test(i, fixed_counters_only);
}
for (i = pmcr_n + 1; i < ARMV8_PMU_MAX_COUNTERS; i++)
- run_error_test(i);
+ run_error_test(i, fixed_counters_only);
+}
+
+static void test_fixed_counters_only(void)
+{
+ struct kvm_pmu_event_filter filter = { .nevents = 0 };
+ struct kvm_vm *vm;
+ struct kvm_vcpu *running_vcpu;
+ struct kvm_vcpu *stopped_vcpu;
+ struct kvm_vcpu_init init;
+ int ret;
+ uint64_t irq = 23;
+
+ create_vpmu_vm(guest_code);
+ ret = __vcpu_has_device_attr(vpmu_vm.vcpu, KVM_ARM_VCPU_PMU_V3_CTRL,
+ KVM_ARM_VCPU_PMU_V3_FIXED_COUNTERS_ONLY);
+ if (ret) {
+ TEST_ASSERT(ret == -1 && errno == ENXIO,
+ KVM_IOCTL_ERROR(KVM_HAS_DEVICE_ATTR, ret));
+ destroy_vpmu_vm();
+ return;
+ }
+
+ /* Assert that FIXED_COUNTERS_ONLY is unset at initialization. */
+ ret = __vcpu_device_attr_get(vpmu_vm.vcpu, KVM_ARM_VCPU_PMU_V3_CTRL,
+ KVM_ARM_VCPU_PMU_V3_FIXED_COUNTERS_ONLY, NULL);
+ TEST_ASSERT(ret == -1 && errno == ENXIO,
+ KVM_IOCTL_ERROR(KVM_GET_DEVICE_ATTR, ret));
+
+ /* Assert that setting FIXED_COUNTERS_ONLY succeeds. */
+ vcpu_device_attr_set(vpmu_vm.vcpu, KVM_ARM_VCPU_PMU_V3_CTRL,
+ KVM_ARM_VCPU_PMU_V3_FIXED_COUNTERS_ONLY, NULL);
+
+ /* Assert that getting FIXED_COUNTERS_ONLY succeeds. */
+ vcpu_device_attr_get(vpmu_vm.vcpu, KVM_ARM_VCPU_PMU_V3_CTRL,
+ KVM_ARM_VCPU_PMU_V3_FIXED_COUNTERS_ONLY, NULL);
+
+ /*
+ * Assert that setting FIXED_COUNTERS_ONLY again succeeds even if an
+ * event filter has already been set.
+ */
+ vcpu_device_attr_set(vpmu_vm.vcpu, KVM_ARM_VCPU_PMU_V3_CTRL,
+ KVM_ARM_VCPU_PMU_V3_FILTER, &filter);
+
+ vcpu_device_attr_set(vpmu_vm.vcpu, KVM_ARM_VCPU_PMU_V3_CTRL,
+ KVM_ARM_VCPU_PMU_V3_FIXED_COUNTERS_ONLY, NULL);
+
+ destroy_vpmu_vm();
+
+ create_vpmu_vm(guest_code);
+
+ /*
+ * Assert that setting FIXED_COUNTERS_ONLY results in EBUSY if an event
+ * filter has already been set while FIXED_COUNTERS_ONLY has not.
+ */
+ vcpu_device_attr_set(vpmu_vm.vcpu, KVM_ARM_VCPU_PMU_V3_CTRL,
+ KVM_ARM_VCPU_PMU_V3_FILTER, &filter);
+
+ ret = __vcpu_device_attr_set(vpmu_vm.vcpu, KVM_ARM_VCPU_PMU_V3_CTRL,
+ KVM_ARM_VCPU_PMU_V3_FIXED_COUNTERS_ONLY, NULL);
+ TEST_ASSERT(ret == -1 && errno == EBUSY,
+ KVM_IOCTL_ERROR(KVM_SET_DEVICE_ATTR, ret));
+
+ destroy_vpmu_vm();
+
+ /*
+ * Assert that setting FIXED_COUNTERS_ONLY after running a VCPU results
+ * in EBUSY.
+ */
+ vm = vm_create(2);
+ vm_ioctl(vm, KVM_ARM_PREFERRED_TARGET, &init);
+ init.features[0] |= (1 << KVM_ARM_VCPU_PMU_V3);
+ running_vcpu = aarch64_vcpu_add(vm, 0, &init, guest_code);
+ stopped_vcpu = aarch64_vcpu_add(vm, 1, &init, guest_code);
+ kvm_arch_vm_finalize_vcpus(vm);
+ vcpu_device_attr_set(running_vcpu, KVM_ARM_VCPU_PMU_V3_CTRL,
+ KVM_ARM_VCPU_PMU_V3_IRQ, &irq);
+ vcpu_device_attr_set(running_vcpu, KVM_ARM_VCPU_PMU_V3_CTRL,
+ KVM_ARM_VCPU_PMU_V3_INIT, NULL);
+ vcpu_run(running_vcpu);
+
+ ret = __vcpu_device_attr_set(stopped_vcpu, KVM_ARM_VCPU_PMU_V3_CTRL,
+ KVM_ARM_VCPU_PMU_V3_FIXED_COUNTERS_ONLY, NULL);
+ TEST_ASSERT(ret == -1 && errno == EBUSY,
+ KVM_IOCTL_ERROR(KVM_SET_DEVICE_ATTR, ret));
+
+ kvm_vm_free(vm);
+
+ test_config(0, true);
+}
+
+int main(void)
+{
+ TEST_REQUIRE(kvm_has_cap(KVM_CAP_ARM_PMU_V3));
+ TEST_REQUIRE(kvm_supports_vgic_v3());
+ TEST_REQUIRE(kvm_supports_nr_counters_attr());
+
+ test_config(get_pmcr_n_limit(), false);
+ test_fixed_counters_only();
return 0;
}
--
2.53.0
* Re: [PATCH v3 1/2] KVM: arm64: PMU: Introduce FIXED_COUNTERS_ONLY
2026-02-25 4:31 ` [PATCH v3 1/2] KVM: arm64: PMU: Introduce FIXED_COUNTERS_ONLY Akihiko Odaki
@ 2026-02-26 11:54 ` Oliver Upton
2026-02-26 14:43 ` Akihiko Odaki
0 siblings, 1 reply; 8+ messages in thread
From: Oliver Upton @ 2026-02-26 11:54 UTC (permalink / raw)
To: Akihiko Odaki
Cc: Marc Zyngier, Joey Gouly, Suzuki K Poulose, Zenghui Yu,
Catalin Marinas, Will Deacon, Kees Cook, Gustavo A. R. Silva,
Paolo Bonzini, Jonathan Corbet, Shuah Khan, linux-arm-kernel,
kvmarm, linux-kernel, linux-hardening, devel, kvm, linux-doc,
linux-kselftest
Hi Akihiko,
On Wed, Feb 25, 2026 at 01:31:15PM +0900, Akihiko Odaki wrote:
> @@ -629,6 +629,8 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
> kvm_vcpu_load_vhe(vcpu);
> kvm_arch_vcpu_load_fp(vcpu);
> kvm_vcpu_pmu_restore_guest(vcpu);
> + if (test_bit(KVM_ARCH_FLAG_PMU_V3_FIXED_COUNTERS_ONLY, &vcpu->kvm->arch.flags))
> + kvm_make_request(KVM_REQ_CREATE_PMU, vcpu);
We only need to set the request if the vCPU has migrated to a different
PMU implementation, no?
> if (kvm_arm_is_pvtime_enabled(&vcpu->arch))
> kvm_make_request(KVM_REQ_RECORD_STEAL, vcpu);
>
> @@ -1056,6 +1058,9 @@ static int check_vcpu_requests(struct kvm_vcpu *vcpu)
> if (kvm_check_request(KVM_REQ_RELOAD_PMU, vcpu))
> kvm_vcpu_reload_pmu(vcpu);
>
> + if (kvm_check_request(KVM_REQ_CREATE_PMU, vcpu))
> + kvm_vcpu_create_pmu(vcpu);
> +
My strong preference would be to squash the migration handling into
kvm_vcpu_reload_pmu(). It is already reprogramming PMU events in
response to other things.
> if (kvm_check_request(KVM_REQ_RESYNC_PMU_EL0, vcpu))
> kvm_vcpu_pmu_restore_guest(vcpu);
>
> @@ -1516,7 +1521,8 @@ static int kvm_setup_vcpu(struct kvm_vcpu *vcpu)
> * When the vCPU has a PMU, but no PMU is set for the guest
> * yet, set the default one.
> */
> - if (kvm_vcpu_has_pmu(vcpu) && !kvm->arch.arm_pmu)
> + if (kvm_vcpu_has_pmu(vcpu) && !kvm->arch.arm_pmu &&
> + !test_bit(KVM_ARCH_FLAG_PMU_V3_FIXED_COUNTERS_ONLY, &kvm->arch.flags))
> ret = kvm_arm_set_default_pmu(kvm);
I'd rather just initialize it to a default than have to deal with the
field being sometimes null.
> -static bool kvm_pmu_counter_is_enabled(struct kvm_pmc *pmc)
> +static u64 kvm_pmu_enabled_counter_mask(struct kvm_vcpu *vcpu)
> {
> - struct kvm_vcpu *vcpu = kvm_pmc_to_vcpu(pmc);
> - unsigned int mdcr = __vcpu_sys_reg(vcpu, MDCR_EL2);
> + u64 mask = 0;
>
> - if (!(__vcpu_sys_reg(vcpu, PMCNTENSET_EL0) & BIT(pmc->idx)))
> - return false;
> + if (__vcpu_sys_reg(vcpu, MDCR_EL2) & MDCR_EL2_HPME)
> + mask |= kvm_pmu_hyp_counter_mask(vcpu);
>
> - if (kvm_pmu_counter_is_hyp(vcpu, pmc->idx))
> - return mdcr & MDCR_EL2_HPME;
> + if (kvm_vcpu_read_pmcr(vcpu) & ARMV8_PMU_PMCR_E)
> + mask |= ~kvm_pmu_hyp_counter_mask(vcpu);
>
> - return kvm_vcpu_read_pmcr(vcpu) & ARMV8_PMU_PMCR_E;
> + return __vcpu_sys_reg(vcpu, PMCNTENSET_EL0) & mask;
> +}
> +
> +static bool kvm_pmu_counter_is_enabled(struct kvm_pmc *pmc)
> +{
> + struct kvm_vcpu *vcpu = kvm_pmc_to_vcpu(pmc);
> +
> + return kvm_pmu_enabled_counter_mask(vcpu) & BIT(pmc->idx);
> }
You're churning a good bit of code, this needs to happen in a separate
patch (if at all).
> @@ -689,6 +710,14 @@ static void kvm_pmu_create_perf_event(struct kvm_pmc *pmc)
> int eventsel;
> u64 evtreg;
>
> + if (!arm_pmu) {
> + arm_pmu = kvm_pmu_probe_armpmu(vcpu->cpu);
kvm_pmu_probe_armpmu() takes a global mutex, I'm not sure that's what we
want.
What prevents us from opening a PERF_TYPE_RAW event and allowing perf to
work out the right PMU for this CPU?
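Something like this (untested sketch, reusing the attr KVM already
builds for counter events):

        struct perf_event_attr attr = {
                .type = PERF_TYPE_RAW,
                .size = sizeof(attr),
                .config = eventsel,
                /* exclude_* bits etc. filled in as today */
        };

        /* cpu == -1, task == current: let perf resolve the PMU. */
        event = perf_event_create_kernel_counter(&attr, -1, current,
                                                 kvm_pmu_perf_overflow,
                                                 pmc);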
> + if (!arm_pmu) {
> + vcpu_set_on_unsupported_cpu(vcpu);
At this point it seems pretty late to flag the CPU as unsupported. Maybe
instead we can compute the union cpumask for all the PMU implementations
the VM may schedule on.
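Roughly (untested):

        struct arm_pmu_entry *entry;

        cpumask_clear(kvm->arch.supported_cpus);

        guard(mutex)(&arm_pmus_lock);
        list_for_each_entry(entry, &arm_pmus, entry)
                cpumask_or(kvm->arch.supported_cpus,
                           kvm->arch.supported_cpus,
                           &entry->arm_pmu->supported_cpus);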
> @@ -1249,6 +1299,10 @@ int kvm_arm_pmu_v3_get_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
> irq = vcpu->arch.pmu.irq_num;
> return put_user(irq, uaddr);
> }
> + case KVM_ARM_VCPU_PMU_V3_FIXED_COUNTERS_ONLY:
> + lockdep_assert_held(&vcpu->kvm->arch.config_lock);
> + if (test_bit(KVM_ARCH_FLAG_PMU_V3_FIXED_COUNTERS_ONLY, &vcpu->kvm->arch.flags))
> + return 0;
We don't need a getter for this, userspace should remember how it
provisioned the VM.
Thanks,
Oliver
* Re: [PATCH v3 1/2] KVM: arm64: PMU: Introduce FIXED_COUNTERS_ONLY
2026-02-26 11:54 ` Oliver Upton
@ 2026-02-26 14:43 ` Akihiko Odaki
2026-02-26 14:47 ` Akihiko Odaki
0 siblings, 1 reply; 8+ messages in thread
From: Akihiko Odaki @ 2026-02-26 14:43 UTC (permalink / raw)
To: Oliver Upton
Cc: Marc Zyngier, Joey Gouly, Suzuki K Poulose, Zenghui Yu,
Catalin Marinas, Will Deacon, Kees Cook, Gustavo A. R. Silva,
Paolo Bonzini, Jonathan Corbet, Shuah Khan, linux-arm-kernel,
kvmarm, linux-kernel, linux-hardening, devel, kvm, linux-doc,
linux-kselftest
On 2026/02/26 20:54, Oliver Upton wrote:
> Hi Akihiko,
>
> On Wed, Feb 25, 2026 at 01:31:15PM +0900, Akihiko Odaki wrote:
>> @@ -629,6 +629,8 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>> kvm_vcpu_load_vhe(vcpu);
>> kvm_arch_vcpu_load_fp(vcpu);
>> kvm_vcpu_pmu_restore_guest(vcpu);
>> + if (test_bit(KVM_ARCH_FLAG_PMU_V3_FIXED_COUNTERS_ONLY, &vcpu->kvm->arch.flags))
>> + kvm_make_request(KVM_REQ_CREATE_PMU, vcpu);
>
> We only need to set the request if the vCPU has migrated to a different
> PMU implementation, no?
Indeed. I was too lazy to implement such a check since it won't affect
performance unless the new feature is requested, but having one may
still be nice.
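Something like the following might do (sketch only; last_arm_pmu would
be a new per-vCPU field remembering the PMU the perf events were last
created for):

        /* last_arm_pmu is hypothetical: the PMU backing the current events. */
        if (test_bit(KVM_ARCH_FLAG_PMU_V3_FIXED_COUNTERS_ONLY,
                     &vcpu->kvm->arch.flags) &&
            (!vcpu->arch.pmu.last_arm_pmu ||
             !cpumask_test_cpu(cpu, &vcpu->arch.pmu.last_arm_pmu->supported_cpus)))
                kvm_make_request(KVM_REQ_CREATE_PMU, vcpu);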
>
>> if (kvm_arm_is_pvtime_enabled(&vcpu->arch))
>> kvm_make_request(KVM_REQ_RECORD_STEAL, vcpu);
>>
>> @@ -1056,6 +1058,9 @@ static int check_vcpu_requests(struct kvm_vcpu *vcpu)
>> if (kvm_check_request(KVM_REQ_RELOAD_PMU, vcpu))
>> kvm_vcpu_reload_pmu(vcpu);
>>
>> + if (kvm_check_request(KVM_REQ_CREATE_PMU, vcpu))
>> + kvm_vcpu_create_pmu(vcpu);
>> +
>
> My strong preference would be to squash the migration handling into
> kvm_vcpu_reload_pmu(). It is already reprogramming PMU events in
> response to other things.
Can you share a reason for that?
In terms of complexity, I don't think it will help reduce complexity
since the only common thing between kvm_vcpu_reload_pmu() and
kvm_vcpu_create_pmu() is the enumeration of enabled counters, which is
simple enough.
In terms of performance, I guess it is better to keep
kvm_vcpu_create_pmu() small since it is triggered for each migration.
>
>> if (kvm_check_request(KVM_REQ_RESYNC_PMU_EL0, vcpu))
>> kvm_vcpu_pmu_restore_guest(vcpu);
>>
>> @@ -1516,7 +1521,8 @@ static int kvm_setup_vcpu(struct kvm_vcpu *vcpu)
>> * When the vCPU has a PMU, but no PMU is set for the guest
>> * yet, set the default one.
>> */
>> - if (kvm_vcpu_has_pmu(vcpu) && !kvm->arch.arm_pmu)
>> + if (kvm_vcpu_has_pmu(vcpu) && !kvm->arch.arm_pmu &&
>> + !test_bit(KVM_ARCH_FLAG_PMU_V3_FIXED_COUNTERS_ONLY, &kvm->arch.flags))
>> ret = kvm_arm_set_default_pmu(kvm);
>
> I'd rather just initialize it to a default than have to deal with the
> field being sometimes null.
I agree. I'll change this in the next version.
>
>> -static bool kvm_pmu_counter_is_enabled(struct kvm_pmc *pmc)
>> +static u64 kvm_pmu_enabled_counter_mask(struct kvm_vcpu *vcpu)
>> {
>> - struct kvm_vcpu *vcpu = kvm_pmc_to_vcpu(pmc);
>> - unsigned int mdcr = __vcpu_sys_reg(vcpu, MDCR_EL2);
>> + u64 mask = 0;
>>
>> - if (!(__vcpu_sys_reg(vcpu, PMCNTENSET_EL0) & BIT(pmc->idx)))
>> - return false;
>> + if (__vcpu_sys_reg(vcpu, MDCR_EL2) & MDCR_EL2_HPME)
>> + mask |= kvm_pmu_hyp_counter_mask(vcpu);
>>
>> - if (kvm_pmu_counter_is_hyp(vcpu, pmc->idx))
>> - return mdcr & MDCR_EL2_HPME;
>> + if (kvm_vcpu_read_pmcr(vcpu) & ARMV8_PMU_PMCR_E)
>> + mask |= ~kvm_pmu_hyp_counter_mask(vcpu);
>>
>> - return kvm_vcpu_read_pmcr(vcpu) & ARMV8_PMU_PMCR_E;
>> + return __vcpu_sys_reg(vcpu, PMCNTENSET_EL0) & mask;
>> +}
>> +
>> +static bool kvm_pmu_counter_is_enabled(struct kvm_pmc *pmc)
>> +{
>> + struct kvm_vcpu *vcpu = kvm_pmc_to_vcpu(pmc);
>> +
>> + return kvm_pmu_enabled_counter_mask(vcpu) & BIT(pmc->idx);
>> }
>
> You're churning a good bit of code, this needs to happen in a separate
> patch (if at all).
It makes sense. The next version will have a separate patch for this.
>
>> @@ -689,6 +710,14 @@ static void kvm_pmu_create_perf_event(struct kvm_pmc *pmc)
>> int eventsel;
>> u64 evtreg;
>>
>> + if (!arm_pmu) {
>> + arm_pmu = kvm_pmu_probe_armpmu(vcpu->cpu);
>
> kvm_pmu_probe_armpmu() takes a global mutex, I'm not sure that's what we
> want.
>
> What prevents us from opening a PERF_TYPE_RAW event and allowing perf to
> work out the right PMU for this CPU?
Unfortunately perf does not seem to have the capability to switch to the
right PMU. tools/perf/Documentation/intel-hybrid.txt says the perf tool
creates events for each PMU in a hybrid configuration, for example.
>
>> + if (!arm_pmu) {
>> + vcpu_set_on_unsupported_cpu(vcpu);
>
> At this point it seems pretty late to flag the CPU as unsupported. Maybe
> instead we can compute the union cpumask for all the PMU implemetations
> the VM may schedule on.
This is just a safeguard; it is the responsibility of userspace to
schedule the VCPU properly. It is conceptually the same as what
kvm_arch_vcpu_load() does when migrating to an unsupported CPU.
>
>> @@ -1249,6 +1299,10 @@ int kvm_arm_pmu_v3_get_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
>> irq = vcpu->arch.pmu.irq_num;
>> return put_user(irq, uaddr);
>> }
>> + case KVM_ARM_VCPU_PMU_V3_FIXED_COUNTERS_ONLY:
>> + lockdep_assert_held(&vcpu->kvm->arch.config_lock);
>> + if (test_bit(KVM_ARCH_FLAG_PMU_V3_FIXED_COUNTERS_ONLY, &vcpu->kvm->arch.flags))
>> + return 0;
>
> We don't need a getter for this, userspace should remember how it
> provisioned the VM.
The getter is useful for debugging and testing. The selftest will use it
to query the current state.
Regards,
Akihiko Odaki
* Re: [PATCH v3 1/2] KVM: arm64: PMU: Introduce FIXED_COUNTERS_ONLY
2026-02-26 14:43 ` Akihiko Odaki
@ 2026-02-26 14:47 ` Akihiko Odaki
2026-02-26 23:05 ` Oliver Upton
0 siblings, 1 reply; 8+ messages in thread
From: Akihiko Odaki @ 2026-02-26 14:47 UTC (permalink / raw)
To: Oliver Upton
Cc: Marc Zyngier, Joey Gouly, Suzuki K Poulose, Zenghui Yu,
Catalin Marinas, Will Deacon, Kees Cook, Gustavo A. R. Silva,
Paolo Bonzini, Jonathan Corbet, Shuah Khan, linux-arm-kernel,
kvmarm, linux-kernel, linux-hardening, devel, kvm, linux-doc,
linux-kselftest
On 2026/02/26 23:43, Akihiko Odaki wrote:
> On 2026/02/26 20:54, Oliver Upton wrote:
>> Hi Akihiko,
>>
>> On Wed, Feb 25, 2026 at 01:31:15PM +0900, Akihiko Odaki wrote:
>>> @@ -629,6 +629,8 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu,
>>> int cpu)
>>> kvm_vcpu_load_vhe(vcpu);
>>> kvm_arch_vcpu_load_fp(vcpu);
>>> kvm_vcpu_pmu_restore_guest(vcpu);
>>> + if (test_bit(KVM_ARCH_FLAG_PMU_V3_FIXED_COUNTERS_ONLY, &vcpu-
>>> >kvm->arch.flags))
>>> + kvm_make_request(KVM_REQ_CREATE_PMU, vcpu);
>>
>> We only need to set the request if the vCPU has migrated to a different
>> PMU implementation, no?
>
> Indeed. I was too lazy to implement such a check since it won't affect
> performance unless the new feature is requested, but having one may
> still be nice.
>
>>
>>> if (kvm_arm_is_pvtime_enabled(&vcpu->arch))
>>> kvm_make_request(KVM_REQ_RECORD_STEAL, vcpu);
>>> @@ -1056,6 +1058,9 @@ static int check_vcpu_requests(struct kvm_vcpu
>>> *vcpu)
>>> if (kvm_check_request(KVM_REQ_RELOAD_PMU, vcpu))
>>> kvm_vcpu_reload_pmu(vcpu);
>>> + if (kvm_check_request(KVM_REQ_CREATE_PMU, vcpu))
>>> + kvm_vcpu_create_pmu(vcpu);
>>> +
>>
>> My strong preference would be to squash the migration handling into
>> kvm_vcpu_reload_pmu(). It is already reprogramming PMU events in
>> response to other things.
>
> Can you share a reason for that?
>
> In terms of complexity, I don't think it will help reduce complexity
> since the only common thing between kvm_vcpu_reload_pmu() and
> kvm_vcpu_create_pmu() is the enumeration of enabled counters, which is
> simple enough.
>
> In terms of performance, I guess it is better to keep
> kvm_vcpu_create_pmu() small since it is triggered for each migration.
>
>>
>>> if (kvm_check_request(KVM_REQ_RESYNC_PMU_EL0, vcpu))
>>> kvm_vcpu_pmu_restore_guest(vcpu);
>>> @@ -1516,7 +1521,8 @@ static int kvm_setup_vcpu(struct kvm_vcpu *vcpu)
>>> * When the vCPU has a PMU, but no PMU is set for the guest
>>> * yet, set the default one.
>>> */
>>> - if (kvm_vcpu_has_pmu(vcpu) && !kvm->arch.arm_pmu)
>>> + if (kvm_vcpu_has_pmu(vcpu) && !kvm->arch.arm_pmu &&
>>> + !test_bit(KVM_ARCH_FLAG_PMU_V3_FIXED_COUNTERS_ONLY, &kvm-
>>> >arch.flags))
>>> ret = kvm_arm_set_default_pmu(kvm);
>>
>> I'd rather just initialize it to a default than have to deal with the
>> field being sometimes null.
>
> I agree. I'll change this in the next version.
>
>>
>>> -static bool kvm_pmu_counter_is_enabled(struct kvm_pmc *pmc)
>>> +static u64 kvm_pmu_enabled_counter_mask(struct kvm_vcpu *vcpu)
>>> {
>>> - struct kvm_vcpu *vcpu = kvm_pmc_to_vcpu(pmc);
>>> - unsigned int mdcr = __vcpu_sys_reg(vcpu, MDCR_EL2);
>>> + u64 mask = 0;
>>> - if (!(__vcpu_sys_reg(vcpu, PMCNTENSET_EL0) & BIT(pmc->idx)))
>>> - return false;
>>> + if (__vcpu_sys_reg(vcpu, MDCR_EL2) & MDCR_EL2_HPME)
>>> + mask |= kvm_pmu_hyp_counter_mask(vcpu);
>>> - if (kvm_pmu_counter_is_hyp(vcpu, pmc->idx))
>>> - return mdcr & MDCR_EL2_HPME;
>>> + if (kvm_vcpu_read_pmcr(vcpu) & ARMV8_PMU_PMCR_E)
>>> + mask |= ~kvm_pmu_hyp_counter_mask(vcpu);
>>> - return kvm_vcpu_read_pmcr(vcpu) & ARMV8_PMU_PMCR_E;
>>> + return __vcpu_sys_reg(vcpu, PMCNTENSET_EL0) & mask;
>>> +}
>>> +
>>> +static bool kvm_pmu_counter_is_enabled(struct kvm_pmc *pmc)
>>> +{
>>> + struct kvm_vcpu *vcpu = kvm_pmc_to_vcpu(pmc);
>>> +
>>> + return kvm_pmu_enabled_counter_mask(vcpu) & BIT(pmc->idx);
>>> }
>>
>> You're churning a good bit of code, this needs to happen in a separate
>> patch (if at all).
>
> It makes sense. The next version will have a separate patch for this.
>
>>
>>> @@ -689,6 +710,14 @@ static void kvm_pmu_create_perf_event(struct
>>> kvm_pmc *pmc)
>>> int eventsel;
>>> u64 evtreg;
>>> + if (!arm_pmu) {
>>> + arm_pmu = kvm_pmu_probe_armpmu(vcpu->cpu);
>>
>> kvm_pmu_probe_armpmu() takes a global mutex, I'm not sure that's what we
>> want.
>>
>> What prevents us from opening a PERF_TYPE_RAW event and allowing perf to
>> work out the right PMU for this CPU?
>
> Unfortunately perf does not seem to have the capability to switch to the
> right PMU. tools/perf/Documentation/intel-hybrid.txt says the perf tool
> creates events for each PMU in a hybrid configuration, for example.
I think I misunderstood what you meant. Letting
perf_event_create_kernel_counter() figure out which PMU to use may
be a good idea. I'll give it a try in the next version.
>
>>
>>> + if (!arm_pmu) {
>>> + vcpu_set_on_unsupported_cpu(vcpu);
>>
>> At this point it seems pretty late to flag the CPU as unsupported. Maybe
>> instead we can compute the union cpumask for all the PMU implemetations
>> the VM may schedule on.
>
> This is just a safeguard; it is the responsibility of userspace to
> schedule the VCPU properly. It is conceptually the same as what
> kvm_arch_vcpu_load() does when migrating to an unsupported CPU.
>
>>
>>> @@ -1249,6 +1299,10 @@ int kvm_arm_pmu_v3_get_attr(struct kvm_vcpu
>>> *vcpu, struct kvm_device_attr *attr)
>>> irq = vcpu->arch.pmu.irq_num;
>>> return put_user(irq, uaddr);
>>> }
>>> + case KVM_ARM_VCPU_PMU_V3_FIXED_COUNTERS_ONLY:
>>> + lockdep_assert_held(&vcpu->kvm->arch.config_lock);
>>> + if (test_bit(KVM_ARCH_FLAG_PMU_V3_FIXED_COUNTERS_ONLY,
>>> &vcpu->kvm->arch.flags))
>>> + return 0;
>>
>> We don't need a getter for this, userspace should remember how it
>> provisioned the VM.
>
> The getter is useful for debugging and testing. The selftest will use it
> to query the current state.
>
> Regards,
> Akihiko Odaki
* Re: [PATCH v3 1/2] KVM: arm64: PMU: Introduce FIXED_COUNTERS_ONLY
2026-02-26 14:47 ` Akihiko Odaki
@ 2026-02-26 23:05 ` Oliver Upton
2026-02-27 9:34 ` Akihiko Odaki
0 siblings, 1 reply; 8+ messages in thread
From: Oliver Upton @ 2026-02-26 23:05 UTC (permalink / raw)
To: Akihiko Odaki
Cc: Marc Zyngier, Joey Gouly, Suzuki K Poulose, Zenghui Yu,
Catalin Marinas, Will Deacon, Kees Cook, Gustavo A. R. Silva,
Paolo Bonzini, Jonathan Corbet, Shuah Khan, linux-arm-kernel,
kvmarm, linux-kernel, linux-hardening, devel, kvm, linux-doc,
linux-kselftest
On Thu, Feb 26, 2026 at 11:47:54PM +0900, Akihiko Odaki wrote:
> On 2026/02/26 23:43, Akihiko Odaki wrote:
> > On 2026/02/26 20:54, Oliver Upton wrote:
> > > Hi Akihiko,
> > >
> > > On Wed, Feb 25, 2026 at 01:31:15PM +0900, Akihiko Odaki wrote:
> > > > @@ -629,6 +629,8 @@ void kvm_arch_vcpu_load(struct kvm_vcpu
> > > > *vcpu, int cpu)
> > > > kvm_vcpu_load_vhe(vcpu);
> > > > kvm_arch_vcpu_load_fp(vcpu);
> > > > kvm_vcpu_pmu_restore_guest(vcpu);
> > > > + if (test_bit(KVM_ARCH_FLAG_PMU_V3_FIXED_COUNTERS_ONLY,
> > > > &vcpu- >kvm->arch.flags))
> > > > + kvm_make_request(KVM_REQ_CREATE_PMU, vcpu);
> > >
> > > We only need to set the request if the vCPU has migrated to a different
> > > PMU implementation, no?
> >
> > Indeed. I was too lazy to implement such a check since it won't affect
> > performance unless the new feature is requested, but having one may
> > still be nice.
I'd definitely like to see this.
> > >
> > > > if (kvm_arm_is_pvtime_enabled(&vcpu->arch))
> > > > kvm_make_request(KVM_REQ_RECORD_STEAL, vcpu);
> > > > @@ -1056,6 +1058,9 @@ static int check_vcpu_requests(struct
> > > > kvm_vcpu *vcpu)
> > > > if (kvm_check_request(KVM_REQ_RELOAD_PMU, vcpu))
> > > > kvm_vcpu_reload_pmu(vcpu);
> > > > + if (kvm_check_request(KVM_REQ_CREATE_PMU, vcpu))
> > > > + kvm_vcpu_create_pmu(vcpu);
> > > > +
> > >
> > > My strong preference would be to squash the migration handling into
> > > kvm_vcpu_reload_pmu(). It is already reprogramming PMU events in
> > > response to other things.
> >
> > Can you share a reason for that?
> >
> > In terms of complexity, I don't think it will help reduce complexity
> > since the only common thing between kvm_vcpu_reload_pmu() and
> > kvm_vcpu_create_pmu() is the enumeration of enabled counters, which is
> > simple enough.
I prefer it in terms of code organization. We should have a single
helper that refreshes the backing perf events when something has
globally changed for the vPMU.
Besides this, "create" is confusing since the vPMU has already been
instantiated.
> > In terms of performance, I guess it is better to keep
> > kvm_vcpu_create_pmu() small since it is triggered for each migration.
I think the surrounding KVM code for iterating over the counters is
inconsequential compared to the overheads of calling into perf to
recreate the PMU events. Since we expect this to be slow, we should only
set the request when absolutely necessary.
> > > > +static bool kvm_pmu_counter_is_enabled(struct kvm_pmc *pmc)
> > > > +{
> > > > + struct kvm_vcpu *vcpu = kvm_pmc_to_vcpu(pmc);
> > > > +
> > > > + return kvm_pmu_enabled_counter_mask(vcpu) & BIT(pmc->idx);
> > > > }
> > >
> > > You're churning a good bit of code, this needs to happen in a separate
> > > patch (if at all).
> >
> > It makes sense. The next version will have a separate patch for this.
If I have the full picture right, you may not need it with a common
request handler.
> > >
> > > > @@ -689,6 +710,14 @@ static void
> > > > kvm_pmu_create_perf_event(struct kvm_pmc *pmc)
> > > > int eventsel;
> > > > u64 evtreg;
> > > > + if (!arm_pmu) {
> > > > + arm_pmu = kvm_pmu_probe_armpmu(vcpu->cpu);
> > >
> > > kvm_pmu_probe_armpmu() takes a global mutex, I'm not sure that's what we
> > > want.
> > >
> > > What prevents us from opening a PERF_TYPE_RAW event and allowing perf to
> > > work out the right PMU for this CPU?
> >
> > Unfortunately perf does not seem to have the capability to switch to the
> > right PMU. tools/perf/Documentation/intel-hybrid.txt says the perf tool
> > creates events for each PMU in a hybrid configuration, for example.
>
> I think I misunderstood what you meant. Letting
> perf_event_create_kernel_counter() figure out which PMU to use may be a
> good idea. I'll give it a try in the next version.
Yep, this is what I was alluding to.
> >
> > >
> > > > + if (!arm_pmu) {
> > > > + vcpu_set_on_unsupported_cpu(vcpu);
> > >
> > > At this point it seems pretty late to flag the CPU as unsupported. Maybe
> > > instead we can compute the union cpumask for all the PMU implementations
> > > the VM may schedule on.
> >
> > This is just a safeguard; it is the responsibility of userspace to
> > schedule the VCPU properly. It is conceptually the same as what
> > kvm_arch_vcpu_load() does when migrating to an unsupported CPU.
I agree with you that we need to have some handling for this situation.
What I don't like about this is userspace doesn't discover its mistake
until the guest actually programs a PMC. I'd much rather preserve the
existing ABI where KVM proactively rejects running a vCPU on an
unsupported CPU.
> > > > + case KVM_ARM_VCPU_PMU_V3_FIXED_COUNTERS_ONLY:
> > > > + lockdep_assert_held(&vcpu->kvm->arch.config_lock);
> > > > + if (test_bit(KVM_ARCH_FLAG_PMU_V3_FIXED_COUNTERS_ONLY,
> > > > &vcpu->kvm->arch.flags))
> > > > + return 0;
> > >
> > > We don't need a getter for this, userspace should remember how it
> > > provisioned the VM.
> >
> > The getter is useful for debugging and testing. The selftest will use it
> > to query the current state.
That's fine for debugging this on your own kernel but we don't need it
upstream. There are several other vPMU attributes that are write-only,
like KVM_ARM_VCPU_PMU_V3_SET_PMU.
Thanks,
Oliver
* Re: [PATCH v3 1/2] KVM: arm64: PMU: Introduce FIXED_COUNTERS_ONLY
2026-02-26 23:05 ` Oliver Upton
@ 2026-02-27 9:34 ` Akihiko Odaki
0 siblings, 0 replies; 8+ messages in thread
From: Akihiko Odaki @ 2026-02-27 9:34 UTC (permalink / raw)
To: Oliver Upton
Cc: Marc Zyngier, Joey Gouly, Suzuki K Poulose, Zenghui Yu,
Catalin Marinas, Will Deacon, Kees Cook, Gustavo A. R. Silva,
Paolo Bonzini, Jonathan Corbet, Shuah Khan, linux-arm-kernel,
kvmarm, linux-kernel, linux-hardening, devel, kvm, linux-doc,
linux-kselftest
On 2026/02/27 8:05, Oliver Upton wrote:
> On Thu, Feb 26, 2026 at 11:47:54PM +0900, Akihiko Odaki wrote:
>> On 2026/02/26 23:43, Akihiko Odaki wrote:
>>> On 2026/02/26 20:54, Oliver Upton wrote:
>>>> Hi Akihiko,
>>>>
>>>> On Wed, Feb 25, 2026 at 01:31:15PM +0900, Akihiko Odaki wrote:
>>>>> @@ -629,6 +629,8 @@ void kvm_arch_vcpu_load(struct kvm_vcpu
>>>>> *vcpu, int cpu)
>>>>> kvm_vcpu_load_vhe(vcpu);
>>>>> kvm_arch_vcpu_load_fp(vcpu);
>>>>> kvm_vcpu_pmu_restore_guest(vcpu);
>>>>> + if (test_bit(KVM_ARCH_FLAG_PMU_V3_FIXED_COUNTERS_ONLY,
>>>>> &vcpu- >kvm->arch.flags))
>>>>> + kvm_make_request(KVM_REQ_CREATE_PMU, vcpu);
>>>>
>>>> We only need to set the request if the vCPU has migrated to a different
>>>> PMU implementation, no?
>>>
>>> Indeed. I was too lazy to implement such a check since it won't affect
>>> performance unless the new feature is requested, but having one may
>>> still be nice.
>
> I'd definitely like to see this.
>
>>>>
>>>>> if (kvm_arm_is_pvtime_enabled(&vcpu->arch))
>>>>> kvm_make_request(KVM_REQ_RECORD_STEAL, vcpu);
>>>>> @@ -1056,6 +1058,9 @@ static int check_vcpu_requests(struct
>>>>> kvm_vcpu *vcpu)
>>>>> if (kvm_check_request(KVM_REQ_RELOAD_PMU, vcpu))
>>>>> kvm_vcpu_reload_pmu(vcpu);
>>>>> + if (kvm_check_request(KVM_REQ_CREATE_PMU, vcpu))
>>>>> + kvm_vcpu_create_pmu(vcpu);
>>>>> +
>>>>
>>>> My strong preference would be to squash the migration handling into
>>>> kvm_vcpu_reload_pmu(). It is already reprogramming PMU events in
>>>> response to other things.
>>>
>>> Can you share a reason for that?
>>>
>>> In terms of complexity, I don't think it will help reduce complexity
>>> since the only common thing between kvm_vcpu_reload_pmu() and
>>> kvm_vcpu_create_pmu() is the enumeration of enabled counters, which is
>>> simple enough.
>
> I prefer it in terms of code organization. We should have a single
> helper that refreshes the backing perf events when something has
> globally changed for the vPMU.
>
> Besides this, "create" is confusing since the vPMU has already been
> instantiated.
>
>>> In terms of performance, I guess it is better to keep
>>> kvm_vcpu_create_pmu() small since it is triggered for each migration.
>
> I think the surrounding KVM code for iterating over the counters is
> inconsequential compared to the overheads of calling into perf to
> recreate the PMU events. Since we expect this to be slow, we should only
> set the request when absolutely necessary.
I see. I'll squash it into kvm_vcpu_reload_pmu().
>
>>>>> +static bool kvm_pmu_counter_is_enabled(struct kvm_pmc *pmc)
>>>>> +{
>>>>> + struct kvm_vcpu *vcpu = kvm_pmc_to_vcpu(pmc);
>>>>> +
>>>>> + return kvm_pmu_enabled_counter_mask(vcpu) & BIT(pmc->idx);
>>>>> }
>>>>
>>>> You're churning a good bit of code, this needs to happen in a separate
>>>> patch (if at all).
>>>
>>> It makes sense. The next version will have a separate patch for this.
>
> If I have the full picture right, you may not need it with a common
> request handler.
I think I'm going to use it to check if the vCPU is covered by the perf
events currently enabled before requesting KVM_REQ_RELOAD_PMU.
>
>>>>
>>>>> @@ -689,6 +710,14 @@ static void
>>>>> kvm_pmu_create_perf_event(struct kvm_pmc *pmc)
>>>>> int eventsel;
>>>>> u64 evtreg;
>>>>> + if (!arm_pmu) {
>>>>> + arm_pmu = kvm_pmu_probe_armpmu(vcpu->cpu);
>>>>
>>>> kvm_pmu_probe_armpmu() takes a global mutex, I'm not sure that's what we
>>>> want.
>>>>
>>>> What prevents us from opening a PERF_TYPE_RAW event and allowing perf to
>>>> work out the right PMU for this CPU?
>>>
>>> Unfortunately perf does not seem to have the capability to switch to the
>>> right PMU. tools/perf/Documentation/intel-hybrid.txt says the perf tool
>>> creates events for each PMU in a hybrid configuration, for example.
>>
>> I think I misunderstood what you meant. Letting
>> perf_event_create_kernel_counter() figure out which PMU to use may be a
>> good idea. I'll give it a try in the next version.
>
> Yep, this is what I was alluding to.
I tried this, but unfortunately it didn't work well. Simply using
PERF_TYPE_RAW let perf_event_create_kernel_counter() choose an arbitrary
PMU, potentially not covering the current PCPU.
We can change the cpu parameter of the function to fix this, but it
binds the perf event to that particular PCPU and requires recreating the
perf event when migrating to another PCPU covered by the same PMU.
I think I'm going to use RCU to avoid locking a global mutex.
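For example (sketch; assumes insertions into arm_pmus are converted to
list_add_tail_rcu() and that entries are never removed):

static struct arm_pmu *kvm_pmu_find_armpmu(int cpu)
{
        struct arm_pmu_entry *entry;
        struct arm_pmu *pmu = NULL;

        rcu_read_lock();
        list_for_each_entry_rcu(entry, &arm_pmus, entry) {
                if (cpumask_test_cpu(cpu, &entry->arm_pmu->supported_cpus)) {
                        pmu = entry->arm_pmu;
                        break;
                }
        }
        rcu_read_unlock();

        return pmu;
}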
>
>>>
>>>>
>>>>> + if (!arm_pmu) {
>>>>> + vcpu_set_on_unsupported_cpu(vcpu);
>>>>
>>>> At this point it seems pretty late to flag the CPU as unsupported. Maybe
>>>> instead we can compute the union cpumask for all the PMU implementations
>>>> the VM may schedule on.
>>>
>>> This is just a safeguard; it is the responsibility of userspace to
>>> schedule the VCPU properly. It is conceptually the same as what
>>> kvm_arch_vcpu_load() does when migrating to an unsupported CPU.
>
> I agree with you that we need to have some handling for this situation.
>
> What I don't like about this is userspace doesn't discover its mistake
> until the guest actually programs a PMC. I'd much rather preserve the
> existing ABI where KVM proactively rejects running a vCPU on an
> unsupported CPU.
Thanks for the explanation. I'll change this in the next version.
>
>>>>> + case KVM_ARM_VCPU_PMU_V3_FIXED_COUNTERS_ONLY:
>>>>> + lockdep_assert_held(&vcpu->kvm->arch.config_lock);
>>>>> + if (test_bit(KVM_ARCH_FLAG_PMU_V3_FIXED_COUNTERS_ONLY,
>>>>> &vcpu->kvm->arch.flags))
>>>>> + return 0;
>>>>
>>>> We don't need a getter for this, userspace should remember how it
>>>> provisioned the VM.
>>>
>>> The getter is useful for debugging and testing. The selftest will use it
>>> to query the current state.
>
> That's fine for debugging this on your own kernel but we don't need it
> upstream. There are several other vPMU attributes that are write-only,
> like KVM_ARM_VCPU_PMU_V3_SET_PMU.
It is useful not just for debugging the kernel, but also for debugging
userspace.
Indeed there are other write-only attributes, but there is also an
attribute with a getter: KVM_ARM_VCPU_PMU_V3_IRQ. I think there are more
if you look beyond the KVM_ARM_VCPU_PMU_V3_CTRL group, and such existing
getters for read/write attributes are probably only useful for
kernel/userspace debugging too. I think having a getter can be
justified, given these preexisting examples.
Regards,
Akihiko Odaki