Linux Perf Users
* [PATCH v8 00/21] ARM64 PMU Partitioning
@ 2026-06-12 19:28 Colton Lewis
From: Colton Lewis @ 2026-06-12 19:28 UTC
  To: kvm
  Cc: Alexandru Elisei, Paolo Bonzini, Jonathan Corbet, Russell King,
	Catalin Marinas, Will Deacon, Marc Zyngier, Oliver Upton,
	Mingwei Zhang, Joey Gouly, Suzuki K Poulose, Zenghui Yu,
	Mark Rutland, Shuah Khan, Ganapatrao Kulkarni, James Clark,
	linux-doc, linux-kernel, linux-arm-kernel, kvmarm,
	linux-perf-users, linux-kselftest, Colton Lewis

This series creates a new PMU scheme on ARM: a partitioned PMU that
allows reserving a subset of counters for more direct guest access,
significantly reducing overhead. More details, including performance
benchmarks, can be found in the v1 cover letter linked below.
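As a rough picture of the split, the sketch below assumes the
architectural meaning of MDCR_EL2.HPMN: event counters [0, HPMN) are
accessible to EL1 and EL0 (the guest), while counters [HPMN, N) are
reserved for EL2 (the VHE host). The helper names are illustrative,
not the series' actual functions, and HPMN == 0 is only
architecturally valid with FEAT_HPMN0.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace copy of the kernel's GENMASK_ULL(): bits h..l set. */
#define GENMASK_ULL(h, l) \
	(((~0ULL) - (1ULL << (l)) + 1) & (~0ULL >> (63 - (h))))

/* Hypothetical: event counters [0, hpmn) are visible to the guest. */
static uint64_t guest_cntr_mask(unsigned int hpmn)
{
	return hpmn ? GENMASK_ULL(hpmn - 1, 0) : 0;
}

/* Hypothetical: event counters [hpmn, nr_impl) remain with the host. */
static uint64_t host_cntr_mask(unsigned int hpmn, unsigned int nr_impl)
{
	assert(hpmn <= nr_impl && nr_impl <= 31);
	return hpmn < nr_impl ? GENMASK_ULL(nr_impl - 1, hpmn) : 0;
}
```

In this picture the host keeps the highest indices, which is
presumably why the series also allocates host counter indices from
high to low: host events stay clear of the low counters a guest may
later claim.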

An overview of what this series accomplishes was presented at KVM
Forum 2025. Slides [1] and video [2] are linked below.

The kernel command line parameter for the driver still exists, but it
now only defines an upper limit on the number of counters the guest
might use, rather than permanently taking those counters from the host.
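A minimal sketch of how such an upper limit could be applied, assuming
a hypothetical max_guest_cntrs value taken from the command line: each
request is clamped to the limit and to the number of implemented
counters at the time it is made, rather than being carved out of the
host at boot. The function name is illustrative only.

```c
#include <assert.h>

/*
 * Hypothetical: clamp a guest's counter request to the command-line
 * upper limit and to what the hardware implements. The result would be
 * reserved at run time, not permanently removed from the host at boot.
 */
static unsigned int guest_cntr_reservation(unsigned int requested,
					   unsigned int max_guest_cntrs,
					   unsigned int nr_impl)
{
	unsigned int n = requested;

	assert(nr_impl <= 31);	/* PMCR_EL0.N is a 5-bit field */

	if (n > max_guest_cntrs)
		n = max_guest_cntrs;
	if (n > nr_impl)
		n = nr_impl;
	return n;
}
```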

I would appreciate discussion on whether that parameter should still
exist: it is an inconvenient gate for enabling the feature, and it is
no longer required. The question comes down to which guards, if any,
we want against a guest monopolizing all counters on a system.

v8:

* Rebase on top of v7.1-rc7.

* Implement Oliver Upton's accessor proposal to centralize PMU
  register access and simplify trap handlers. Instead of a single
  accessor, implement two, because the read and write paths always
  differ anyway.

* Introduce the partitioning flag along with the
  kvm_pmu_is_partitioned predicate.

* Don't use #ifdef for the partitioning predicates, as that case can
  be handled by has_vhe().

* Clean up MDCR_EL2 handling by open-coding use_fgt and hpmn and
  unconditionally setting RES0 bits.

* Use {read,write}_pmcrcntrn in context swaps

* Put operators on the preceding line when wrapping expressions.

* Rename hw_cntr_mask to hw_cntr_impl to clarify that it tracks the
  number of counters implemented by hardware.

* Use GENMASK_ULL in mask functions returning u64

* warn_once when host events are squeezed out by guest counter
  allocations.

* Address Sashiko AI Review findings:

  - Critical fixes for lazy PMU context swaps (ensuring guest state is
    loaded on the transition to GUEST_OWNED), PMSELR_EL0 trapping to
    prevent a stale selector index, and masking guest PMCR_EL0 writes
    to prevent them from resetting host state.

  - High priority fixes for lock safety (disabling IRQs when acquiring
    the perf context lock), disabling guest counters on vCPU put,
    preserving VHE host profiling in MDCR_EL2, waking halted vCPUs on
    guest PMU interrupts, masking to prevent host configuration leaks,
    preemption safety in per-CPU accesses, emulating PMCR.N reads, and
    preventing data races in PMOVSSET_EL0 accesses.

  - Medium/Low fixes for user-access fallback safety, VM-wide state
    modification restrictions, selftests type safety, removal of unused
    fields, and typo fixes.
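To illustrate the v8 accessor change (one read path and one write
path), here is a userspace sketch assuming a partitioned vCPU accesses
the physical register while an emulated vCPU uses a saved copy. The
register subset, struct, and function names are hypothetical, and an
array stands in for the hardware. Writes to the *SET registers have
write-1-to-set semantics, which is one reason a write path does not
look like a read path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum pmu_reg { PMCNTENSET, PMINTENSET, PMOVSSET, NR_PMU_REGS };

struct vcpu_pmu {
	bool partitioned;
	uint64_t shadow[NR_PMU_REGS];	/* saved/emulated register values */
};

/* Stand-in for the physical registers (real code would use MRS/MSR). */
static uint64_t hw_regs[NR_PMU_REGS];

/* The one read path every trap handler goes through. */
static uint64_t pmu_read_reg(const struct vcpu_pmu *pmu, enum pmu_reg r)
{
	assert(r < NR_PMU_REGS);
	return pmu->partitioned ? hw_regs[r] : pmu->shadow[r];
}

/* The one write path: these *SET registers are write-1-to-set. */
static void pmu_write_reg(struct vcpu_pmu *pmu, enum pmu_reg r, uint64_t val)
{
	uint64_t *slot;

	assert(r < NR_PMU_REGS);
	slot = pmu->partitioned ? &hw_regs[r] : &pmu->shadow[r];
	*slot |= val;
}
```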
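On the GENMASK_ULL item in the v8 changes: in the kernel, GENMASK()
yields an unsigned long, which is only 32 bits wide on 32-bit arm
(arm_pmuv3 is also built there, via arch/arm/include/asm/arm_pmuv3.h),
so it cannot describe bit 32 or above of a u64. The sketch re-creates
GENMASK_ULL() in userspace and uses it in a hypothetical mask helper
for counters [0, n).

```c
#include <assert.h>
#include <stdint.h>

/* Userspace copy of the kernel's GENMASK_ULL(): bits h..l set, 64-bit. */
#define GENMASK_ULL(h, l) \
	(((~0ULL) - (1ULL << (l)) + 1) & (~0ULL >> (63 - (h))))

/* Hypothetical mask helper returning u64: counters [0, n). */
static uint64_t cntr_mask_u64(unsigned int n)
{
	assert(n <= 64);
	return n ? GENMASK_ULL(n - 1, 0) : 0;
}
```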
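The first critical fix in the review findings, loading guest state on
the transition to GUEST_OWNED, can be pictured as a small ownership
state machine. Only GUEST_OWNED comes from the text; HOST_OWNED, the
struct, and the load/save hooks are hypothetical, and the hooks just
count calls instead of touching registers.

```c
#include <assert.h>

enum pmu_owner { HOST_OWNED, GUEST_OWNED };

struct lazy_pmu {
	enum pmu_owner owner;
	unsigned int loads, saves;	/* stand-ins for real register swaps */
};

static void load_guest_state(struct lazy_pmu *p) { p->loads++; }
static void save_guest_state(struct lazy_pmu *p) { p->saves++; }

/* Swap registers only when ownership actually changes. */
static void pmu_set_owner(struct lazy_pmu *p, enum pmu_owner new_owner)
{
	assert(new_owner == HOST_OWNED || new_owner == GUEST_OWNED);

	if (p->owner == new_owner)
		return;			/* lazy: already in place */

	if (new_owner == GUEST_OWNED)
		load_guest_state(p);	/* every entry to GUEST_OWNED loads */
	else
		save_guest_state(p);

	p->owner = new_owner;
}
```

In this picture, a path that updated the owner without calling the
load hook would leave the guest running on stale counter values.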

v7:
https://lore.kernel.org/kvmarm/20260504211813.1804997-1-coltonlewis@google.com/

v6:
https://lore.kernel.org/kvmarm/20260209221414.2169465-1-coltonlewis@google.com/

v5:
https://lore.kernel.org/kvmarm/20251209205121.1871534-1-coltonlewis@google.com/

v4:
https://lore.kernel.org/kvmarm/20250714225917.1396543-1-coltonlewis@google.com/

v3:
https://lore.kernel.org/kvm/20250626200459.1153955-1-coltonlewis@google.com/

v2:
https://lore.kernel.org/kvm/20250620221326.1261128-1-coltonlewis@google.com/

v1:
https://lore.kernel.org/kvm/20250602192702.2125115-1-coltonlewis@google.com/

[1] https://gitlab.com/qemu-project/kvm-forum/-/raw/main/_attachments/2025/Optimizing__itvHkhc.pdf
[2] https://www.youtube.com/watch?v=YRzZ8jMIA6M&list=PLW3ep1uCIRfxwmllXTOA2txfDWN6vUOHp&index=9

Colton Lewis (20):
  arm64: cpufeature: Add cpucap for HPMN0
  KVM: arm64: Reorganize PMU functions
  perf: arm_pmuv3: Generalize counter bitmasks
  perf: arm_pmuv3: Check cntr_mask before using pmccntr
  perf: arm_pmuv3: Allocate counter indices from high to low
  perf: arm_pmuv3: Add method to partition the PMU
  KVM: arm64: Set up FGT for Partitioned PMU
  KVM: arm64: Add Partitioned PMU register trap handlers
  KVM: arm64: Set up MDCR_EL2 to handle a Partitioned PMU
  KVM: arm64: Context swap Partitioned PMU guest registers
  KVM: arm64: Enforce PMU event filter at vcpu_load()
  perf: Add perf_pmu_resched_update()
  KVM: arm64: Apply dynamic guest counter reservations
  KVM: arm64: Implement lazy PMU context swaps
  perf: arm_pmuv3: Handle IRQs for Partitioned PMU guest counters
  KVM: arm64: Detect overflows for the Partitioned PMU
  KVM: arm64: Add vCPU device attr to partition the PMU
  KVM: selftests: Add find_bit to KVM library
  KVM: arm64: selftests: Add test case for Partitioned PMU
  KVM: arm64: selftests: Relax testing for exceptions when partitioned

Marc Zyngier (1):
  KVM: arm64: Reorganize PMU includes

 arch/arm/include/asm/arm_pmuv3.h              |  18 +
 arch/arm64/include/asm/arm_pmuv3.h            |  12 +-
 arch/arm64/include/asm/kvm_host.h             |  17 +-
 arch/arm64/include/asm/kvm_types.h            |   6 +-
 arch/arm64/include/uapi/asm/kvm.h             |   2 +
 arch/arm64/kernel/cpufeature.c                |  10 +-
 arch/arm64/kvm/Makefile                       |   2 +-
 arch/arm64/kvm/arm.c                          |   2 +
 arch/arm64/kvm/config.c                       |  41 +-
 arch/arm64/kvm/debug.c                        |  30 +-
 arch/arm64/kvm/pmu-direct.c                   | 507 ++++++++++++
 arch/arm64/kvm/pmu-emul.c                     | 684 +----------------
 arch/arm64/kvm/pmu.c                          | 720 ++++++++++++++++++
 arch/arm64/kvm/sys_regs.c                     | 271 +++++--
 arch/arm64/tools/cpucaps                      |   1 +
 arch/arm64/tools/sysreg                       |   6 +-
 drivers/perf/arm_pmuv3.c                      | 136 +++-
 include/kvm/arm_pmu.h                         |  93 ++-
 include/linux/perf/arm_pmu.h                  |   8 +
 include/linux/perf/arm_pmuv3.h                |  14 +-
 include/linux/perf_event.h                    |   3 +
 kernel/events/core.c                          |  31 +-
 tools/include/perf/arm_pmuv3.h                |  12 +-
 tools/testing/selftests/kvm/Makefile.kvm      |   1 +
 .../selftests/kvm/arm64/vpmu_counter_access.c | 112 ++-
 tools/testing/selftests/kvm/lib/find_bit.c    |   2 +
 26 files changed, 1918 insertions(+), 823 deletions(-)
 create mode 100644 arch/arm64/kvm/pmu-direct.c
 create mode 100644 tools/testing/selftests/kvm/lib/find_bit.c


base-commit: 4549871118cf616eecdd2d939f78e3b9e1dddc48
-- 
2.54.0.1136.gdb2ca164c4-goog

Thread overview: 35+ messages
2026-06-12 19:28 [PATCH v8 00/21] ARM64 PMU Partitioning Colton Lewis
2026-06-12 19:28 ` [PATCH 01/21] arm64: cpufeature: Add cpucap for HPMN0 Colton Lewis
2026-06-12 19:28 ` [PATCH 02/21] KVM: arm64: Reorganize PMU includes Colton Lewis
2026-06-12 19:28 ` [PATCH 03/21] KVM: arm64: Reorganize PMU functions Colton Lewis
2026-06-12 19:56   ` sashiko-bot
2026-06-12 19:28 ` [PATCH 04/21] perf: arm_pmuv3: Generalize counter bitmasks Colton Lewis
2026-06-12 19:28 ` [PATCH 05/21] perf: arm_pmuv3: Check cntr_mask before using pmccntr Colton Lewis
2026-06-12 19:42   ` sashiko-bot
2026-06-12 19:28 ` [PATCH 06/21] perf: arm_pmuv3: Allocate counter indices from high to low Colton Lewis
2026-06-12 19:28 ` [PATCH 07/21] perf: arm_pmuv3: Add method to partition the PMU Colton Lewis
2026-06-12 19:28 ` [PATCH 08/21] KVM: arm64: Set up FGT for Partitioned PMU Colton Lewis
2026-06-12 19:45   ` sashiko-bot
2026-06-12 19:28 ` [PATCH 09/21] KVM: arm64: Add Partitioned PMU register trap handlers Colton Lewis
2026-06-12 19:51   ` sashiko-bot
2026-06-12 19:28 ` [PATCH 10/21] KVM: arm64: Set up MDCR_EL2 to handle a Partitioned PMU Colton Lewis
2026-06-12 19:52   ` sashiko-bot
2026-06-12 19:28 ` [PATCH 11/21] KVM: arm64: Context swap Partitioned PMU guest registers Colton Lewis
2026-06-12 19:51   ` sashiko-bot
2026-06-12 19:29 ` [PATCH 12/21] KVM: arm64: Enforce PMU event filter at vcpu_load() Colton Lewis
2026-06-12 19:53   ` sashiko-bot
2026-06-12 19:29 ` [PATCH 13/21] perf: Add perf_pmu_resched_update() Colton Lewis
2026-06-12 19:29 ` [PATCH 14/21] KVM: arm64: Apply dynamic guest counter reservations Colton Lewis
2026-06-12 19:50   ` sashiko-bot
2026-06-12 19:29 ` [PATCH 15/21] KVM: arm64: Implement lazy PMU context swaps Colton Lewis
2026-06-12 19:50   ` sashiko-bot
2026-06-12 19:29 ` [PATCH 16/21] perf: arm_pmuv3: Handle IRQs for Partitioned PMU guest counters Colton Lewis
2026-06-12 19:57   ` sashiko-bot
2026-06-12 19:29 ` [PATCH 17/21] KVM: arm64: Detect overflows for the Partitioned PMU Colton Lewis
2026-06-12 19:58   ` sashiko-bot
2026-06-12 19:29 ` [PATCH 18/21] KVM: arm64: Add vCPU device attr to partition the PMU Colton Lewis
2026-06-12 19:54   ` sashiko-bot
2026-06-12 19:29 ` [PATCH 19/21] KVM: selftests: Add find_bit to KVM library Colton Lewis
2026-06-12 20:01   ` sashiko-bot
2026-06-12 19:29 ` [PATCH 20/21] KVM: arm64: selftests: Add test case for Partitioned PMU Colton Lewis
2026-06-12 19:29 ` [PATCH 21/21] KVM: arm64: selftests: Relax testing for exceptions when partitioned Colton Lewis