* [PATCH v2 00/12] arm64: Add support for Armv9.4 PMU fixed instruction counter
@ 2024-06-26 22:32 Rob Herring (Arm)
2024-06-26 22:32 ` [PATCH v2 01/12] perf: arm_pmuv3: Avoid assigning fixed cycle counter with threshold Rob Herring (Arm)
` (12 more replies)
0 siblings, 13 replies; 29+ messages in thread
From: Rob Herring (Arm) @ 2024-06-26 22:32 UTC (permalink / raw)
To: Russell King, Peter Zijlstra, Ingo Molnar,
Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter,
Will Deacon, Marc Zyngier, Oliver Upton, James Morse,
Suzuki K Poulose, Zenghui Yu, Catalin Marinas, James Clark
Cc: linux-arm-kernel, linux-kernel, linux-perf-users, kvmarm
This series adds support for the optional fixed instruction counter added
in the Armv9.4 PMU. Most of the series is a refactoring to remove the
index-to-counter-number conversion, which dates back to the Armv7 PMU
driver. Removing it is necessary in order to support more than 32
counters without a bunch of conditional code further complicating the
conversion.
Patches 1-2 are a fix and cleanup for the threshold support. Patch 1 is
a dependency of patch 12.
Patches 3-4 move the 32-bit Arm PMU drivers into drivers/perf/ and drop
non-DT probe support. These can be taken first if there are no comments
on them.
Patch 5 is new to v2 and implements the common pattern of the linux/
header including the asm/ header of the same name.
Patch 6 changes struct arm_pmu.num_events to a bitmap of available
counters and updates all the users. This removes the index-to-counter
conversion in the PMUv3 and Armv7 drivers.
Patch 7 updates various register accessors to use 64-bit values matching
the register size.
Patches 8-9 update KVM PMU register accesses to use shared accessors
from asm/arm_pmuv3.h.
Patches 10-11 rework KVM and perf PMU defines for counter indexes and
number of counters.
Patch 12 finally adds support for the fixed instruction counter.
I tested this on FVP with a VHE host and a guest. I tested the Armv7 PMU
changes with QEMU.
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
---
Changes in v2:
- Include threshold fix patches and account for threshold support in
counter assignment.
- Add patch including asm/arm_pmuv3.h from linux/perf/arm_pmuv3.h
- Fix compile error for Apple PMU
- Minor review comments detailed in individual patches
- Link to v1: https://lore.kernel.org/r/20240607-arm-pmu-3-9-icntr-v1-0-c7bd2dceff3b@kernel.org
---
Rob Herring (Arm) (12):
perf: arm_pmuv3: Avoid assigning fixed cycle counter with threshold
perf: arm_pmuv3: Drop unnecessary IS_ENABLED(CONFIG_ARM64) check
perf/arm: Move 32-bit PMU drivers to drivers/perf/
perf: arm_v6/7_pmu: Drop non-DT probe support
perf: arm_pmuv3: Include asm/arm_pmuv3.h from linux/perf/arm_pmuv3.h
perf: arm_pmu: Remove event index to counter remapping
perf: arm_pmuv3: Prepare for more than 32 counters
KVM: arm64: pmu: Use arm_pmuv3.h register accessors
KVM: arm64: pmu: Use generated define for PMSELR_EL0.SEL access
arm64: perf/kvm: Use a common PMU cycle counter define
KVM: arm64: Refine PMU defines for number of counters
perf: arm_pmuv3: Add support for Armv9.4 PMU instruction counter
arch/arm/include/asm/arm_pmuv3.h | 20 +++
arch/arm/kernel/Makefile | 2 -
arch/arm64/include/asm/arm_pmuv3.h | 55 +++++++-
arch/arm64/include/asm/kvm_host.h | 8 +-
arch/arm64/include/asm/sysreg.h | 1 -
arch/arm64/kvm/pmu-emul.c | 15 +-
arch/arm64/kvm/pmu.c | 87 +++---------
arch/arm64/kvm/sys_regs.c | 11 +-
arch/arm64/tools/sysreg | 30 ++++
drivers/perf/Kconfig | 12 ++
drivers/perf/Makefile | 3 +
drivers/perf/apple_m1_cpu_pmu.c | 4 +-
drivers/perf/arm_pmu.c | 11 +-
drivers/perf/arm_pmuv3.c | 154 +++++++++++----------
.../perf_event_v6.c => drivers/perf/arm_v6_pmu.c | 26 +---
.../perf_event_v7.c => drivers/perf/arm_v7_pmu.c | 90 ++++--------
.../perf/arm_xscale_pmu.c | 15 +-
include/kvm/arm_pmu.h | 8 +-
include/linux/perf/arm_pmu.h | 10 +-
include/linux/perf/arm_pmuv3.h | 11 +-
20 files changed, 301 insertions(+), 272 deletions(-)
---
base-commit: 1613e604df0cd359cf2a7fbd9be7a0bcfacfabd0
change-id: 20240607-arm-pmu-3-9-icntr-04375ddd0082
Best regards,
--
Rob Herring (Arm) <robh@kernel.org>
* [PATCH v2 01/12] perf: arm_pmuv3: Avoid assigning fixed cycle counter with threshold
2024-06-26 22:32 [PATCH v2 00/12] arm64: Add support for Armv9.4 PMU fixed instruction counter Rob Herring (Arm)
@ 2024-06-26 22:32 ` Rob Herring (Arm)
2024-07-01 17:09 ` Mark Rutland
2024-06-26 22:32 ` [PATCH v2 02/12] perf: arm_pmuv3: Drop unnecessary IS_ENABLED(CONFIG_ARM64) check Rob Herring (Arm)
` (11 subsequent siblings)
12 siblings, 1 reply; 29+ messages in thread
From: Rob Herring (Arm) @ 2024-06-26 22:32 UTC (permalink / raw)
To: Russell King, Peter Zijlstra, Ingo Molnar,
Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter,
Will Deacon, Marc Zyngier, Oliver Upton, James Morse,
Suzuki K Poulose, Zenghui Yu, Catalin Marinas, James Clark
Cc: linux-arm-kernel, linux-kernel, linux-perf-users, kvmarm
If the user has requested a counting threshold for the CPU cycles event,
then the fixed cycle counter can't be assigned as it lacks threshold
support. Currently, whether thresholds work is effectively random,
depending on which counter the event happens to be assigned to.
While using thresholds for CPU cycles doesn't make much sense, it can be
useful for testing purposes.
Fixes: 816c26754447 ("arm64: perf: Add support for event counting threshold")
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
---
This should go to 6.10 and stable. It is also a dependency for ICNTR
support.
v2:
- Add and use armv8pmu_event_get_threshold() helper.
v1: https://lore.kernel.org/all/20240611155012.2286044-1-robh@kernel.org/
---
drivers/perf/arm_pmuv3.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/drivers/perf/arm_pmuv3.c b/drivers/perf/arm_pmuv3.c
index 23fa6c5da82c..8ed5c3358920 100644
--- a/drivers/perf/arm_pmuv3.c
+++ b/drivers/perf/arm_pmuv3.c
@@ -338,6 +338,11 @@ static bool armv8pmu_event_want_user_access(struct perf_event *event)
return ATTR_CFG_GET_FLD(&event->attr, rdpmc);
}
+static u32 armv8pmu_event_get_threshold(struct perf_event_attr *attr)
+{
+ return ATTR_CFG_GET_FLD(attr, threshold);
+}
+
static u8 armv8pmu_event_threshold_control(struct perf_event_attr *attr)
{
u8 th_compare = ATTR_CFG_GET_FLD(attr, threshold_compare);
@@ -941,7 +946,8 @@ static int armv8pmu_get_event_idx(struct pmu_hw_events *cpuc,
unsigned long evtype = hwc->config_base & ARMV8_PMU_EVTYPE_EVENT;
/* Always prefer to place a cycle counter into the cycle counter. */
- if (evtype == ARMV8_PMUV3_PERFCTR_CPU_CYCLES) {
+ if ((evtype == ARMV8_PMUV3_PERFCTR_CPU_CYCLES) &&
+ !armv8pmu_event_get_threshold(&event->attr)) {
if (!test_and_set_bit(ARMV8_IDX_CYCLE_COUNTER, cpuc->used_mask))
return ARMV8_IDX_CYCLE_COUNTER;
else if (armv8pmu_event_is_64bit(event) &&
@@ -1033,7 +1039,7 @@ static int armv8pmu_set_event_filter(struct hw_perf_event *event,
* If FEAT_PMUv3_TH isn't implemented, then THWIDTH (threshold_max) will
* be 0 and will also trigger this check, preventing it from being used.
*/
- th = ATTR_CFG_GET_FLD(attr, threshold);
+ th = armv8pmu_event_get_threshold(attr);
if (th > threshold_max(cpu_pmu)) {
pr_debug("PMU event threshold exceeds max value\n");
return -EINVAL;
--
2.43.0
* [PATCH v2 02/12] perf: arm_pmuv3: Drop unnecessary IS_ENABLED(CONFIG_ARM64) check
2024-06-26 22:32 [PATCH v2 00/12] arm64: Add support for Armv9.4 PMU fixed instruction counter Rob Herring (Arm)
2024-06-26 22:32 ` [PATCH v2 01/12] perf: arm_pmuv3: Avoid assigning fixed cycle counter with threshold Rob Herring (Arm)
@ 2024-06-26 22:32 ` Rob Herring (Arm)
2024-07-01 17:11 ` Mark Rutland
2024-06-26 22:32 ` [PATCH v2 03/12] perf/arm: Move 32-bit PMU drivers to drivers/perf/ Rob Herring (Arm)
` (10 subsequent siblings)
12 siblings, 1 reply; 29+ messages in thread
From: Rob Herring (Arm) @ 2024-06-26 22:32 UTC (permalink / raw)
To: Russell King, Peter Zijlstra, Ingo Molnar,
Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter,
Will Deacon, Marc Zyngier, Oliver Upton, James Morse,
Suzuki K Poulose, Zenghui Yu, Catalin Marinas, James Clark
Cc: linux-arm-kernel, linux-kernel, linux-perf-users, kvmarm
The IS_ENABLED(CONFIG_ARM64) check for threshold support is unnecessary.
Its purpose is to keep thresholds disabled on arm32, but if the threshold
is non-zero, the preceding check against threshold_max() will already
have returned an error because threshold_max() is always 0 on arm32.
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
---
v2:
- new patch
---
drivers/perf/arm_pmuv3.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/perf/arm_pmuv3.c b/drivers/perf/arm_pmuv3.c
index 8ed5c3358920..3e51cd7062b9 100644
--- a/drivers/perf/arm_pmuv3.c
+++ b/drivers/perf/arm_pmuv3.c
@@ -1045,7 +1045,7 @@ static int armv8pmu_set_event_filter(struct hw_perf_event *event,
return -EINVAL;
}
- if (IS_ENABLED(CONFIG_ARM64) && th) {
+ if (th) {
config_base |= FIELD_PREP(ARMV8_PMU_EVTYPE_TH, th);
config_base |= FIELD_PREP(ARMV8_PMU_EVTYPE_TC,
armv8pmu_event_threshold_control(attr));
--
2.43.0
* [PATCH v2 03/12] perf/arm: Move 32-bit PMU drivers to drivers/perf/
2024-06-26 22:32 [PATCH v2 00/12] arm64: Add support for Armv9.4 PMU fixed instruction counter Rob Herring (Arm)
2024-06-26 22:32 ` [PATCH v2 01/12] perf: arm_pmuv3: Avoid assigning fixed cycle counter with threshold Rob Herring (Arm)
2024-06-26 22:32 ` [PATCH v2 02/12] perf: arm_pmuv3: Drop unnecessary IS_ENABLED(CONFIG_ARM64) check Rob Herring (Arm)
@ 2024-06-26 22:32 ` Rob Herring (Arm)
2024-06-26 22:32 ` [PATCH v2 04/12] perf: arm_v6/7_pmu: Drop non-DT probe support Rob Herring (Arm)
` (9 subsequent siblings)
12 siblings, 0 replies; 29+ messages in thread
From: Rob Herring (Arm) @ 2024-06-26 22:32 UTC (permalink / raw)
To: Russell King, Peter Zijlstra, Ingo Molnar,
Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter,
Will Deacon, Marc Zyngier, Oliver Upton, James Morse,
Suzuki K Poulose, Zenghui Yu, Catalin Marinas, James Clark
Cc: linux-arm-kernel, linux-kernel, linux-perf-users, kvmarm
It is preferred to put drivers under drivers/ rather than under arch/.
The PMU drivers also depend on arm_pmu.c, so it's better to place them
all together.
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
---
arch/arm/kernel/Makefile | 2 --
drivers/perf/Kconfig | 12 ++++++++++++
drivers/perf/Makefile | 3 +++
arch/arm/kernel/perf_event_v6.c => drivers/perf/arm_v6_pmu.c | 3 ---
arch/arm/kernel/perf_event_v7.c => drivers/perf/arm_v7_pmu.c | 3 ---
.../perf_event_xscale.c => drivers/perf/arm_xscale_pmu.c | 3 ---
6 files changed, 15 insertions(+), 11 deletions(-)
diff --git a/arch/arm/kernel/Makefile b/arch/arm/kernel/Makefile
index 89a77e3f51d2..aaae31b8c4a5 100644
--- a/arch/arm/kernel/Makefile
+++ b/arch/arm/kernel/Makefile
@@ -78,8 +78,6 @@ obj-$(CONFIG_CPU_XSC3) += xscale-cp0.o
obj-$(CONFIG_CPU_MOHAWK) += xscale-cp0.o
obj-$(CONFIG_IWMMXT) += iwmmxt.o
obj-$(CONFIG_PERF_EVENTS) += perf_regs.o perf_callchain.o
-obj-$(CONFIG_HW_PERF_EVENTS) += perf_event_xscale.o perf_event_v6.o \
- perf_event_v7.o
AFLAGS_iwmmxt.o := -Wa,-mcpu=iwmmxt
obj-$(CONFIG_ARM_CPU_TOPOLOGY) += topology.o
obj-$(CONFIG_VDSO) += vdso.o
diff --git a/drivers/perf/Kconfig b/drivers/perf/Kconfig
index 7526a9e714fa..aa9530b4064f 100644
--- a/drivers/perf/Kconfig
+++ b/drivers/perf/Kconfig
@@ -56,6 +56,18 @@ config ARM_PMU
Say y if you want to use CPU performance monitors on ARM-based
systems.
+config ARM_V6_PMU
+ depends on ARM_PMU && (CPU_V6 || CPU_V6K)
+ def_bool y
+
+config ARM_V7_PMU
+ depends on ARM_PMU && CPU_V7
+ def_bool y
+
+config ARM_XSCALE_PMU
+ depends on ARM_PMU && CPU_XSCALE
+ def_bool y
+
config RISCV_PMU
depends on RISCV
bool "RISC-V PMU framework"
diff --git a/drivers/perf/Makefile b/drivers/perf/Makefile
index 29b1c28203ef..d43df81d52f7 100644
--- a/drivers/perf/Makefile
+++ b/drivers/perf/Makefile
@@ -6,6 +6,9 @@ obj-$(CONFIG_ARM_DSU_PMU) += arm_dsu_pmu.o
obj-$(CONFIG_ARM_PMU) += arm_pmu.o arm_pmu_platform.o
obj-$(CONFIG_ARM_PMU_ACPI) += arm_pmu_acpi.o
obj-$(CONFIG_ARM_PMUV3) += arm_pmuv3.o
+obj-$(CONFIG_ARM_V6_PMU) += arm_v6_pmu.o
+obj-$(CONFIG_ARM_V7_PMU) += arm_v7_pmu.o
+obj-$(CONFIG_ARM_XSCALE_PMU) += arm_xscale_pmu.o
obj-$(CONFIG_ARM_SMMU_V3_PMU) += arm_smmuv3_pmu.o
obj-$(CONFIG_FSL_IMX8_DDR_PMU) += fsl_imx8_ddr_perf.o
obj-$(CONFIG_FSL_IMX9_DDR_PMU) += fsl_imx9_ddr_perf.o
diff --git a/arch/arm/kernel/perf_event_v6.c b/drivers/perf/arm_v6_pmu.c
similarity index 99%
rename from arch/arm/kernel/perf_event_v6.c
rename to drivers/perf/arm_v6_pmu.c
index d9fd53841591..f7593843bb85 100644
--- a/arch/arm/kernel/perf_event_v6.c
+++ b/drivers/perf/arm_v6_pmu.c
@@ -31,8 +31,6 @@
* enable the interrupt.
*/
-#if defined(CONFIG_CPU_V6) || defined(CONFIG_CPU_V6K)
-
#include <asm/cputype.h>
#include <asm/irq_regs.h>
@@ -445,4 +443,3 @@ static struct platform_driver armv6_pmu_driver = {
};
builtin_platform_driver(armv6_pmu_driver);
-#endif /* CONFIG_CPU_V6 || CONFIG_CPU_V6K */
diff --git a/arch/arm/kernel/perf_event_v7.c b/drivers/perf/arm_v7_pmu.c
similarity index 99%
rename from arch/arm/kernel/perf_event_v7.c
rename to drivers/perf/arm_v7_pmu.c
index a3322e2b3ea4..fdd936fbd188 100644
--- a/arch/arm/kernel/perf_event_v7.c
+++ b/drivers/perf/arm_v7_pmu.c
@@ -17,8 +17,6 @@
* counter and all 4 performance counters together can be reset separately.
*/
-#ifdef CONFIG_CPU_V7
-
#include <asm/cp15.h>
#include <asm/cputype.h>
#include <asm/irq_regs.h>
@@ -2002,4 +2000,3 @@ static struct platform_driver armv7_pmu_driver = {
};
builtin_platform_driver(armv7_pmu_driver);
-#endif /* CONFIG_CPU_V7 */
diff --git a/arch/arm/kernel/perf_event_xscale.c b/drivers/perf/arm_xscale_pmu.c
similarity index 99%
rename from arch/arm/kernel/perf_event_xscale.c
rename to drivers/perf/arm_xscale_pmu.c
index 7a2ba1c689a7..3d8b72d6b37f 100644
--- a/arch/arm/kernel/perf_event_xscale.c
+++ b/drivers/perf/arm_xscale_pmu.c
@@ -13,8 +13,6 @@
* PMU structures.
*/
-#ifdef CONFIG_CPU_XSCALE
-
#include <asm/cputype.h>
#include <asm/irq_regs.h>
@@ -745,4 +743,3 @@ static struct platform_driver xscale_pmu_driver = {
};
builtin_platform_driver(xscale_pmu_driver);
-#endif /* CONFIG_CPU_XSCALE */
--
2.43.0
* [PATCH v2 04/12] perf: arm_v6/7_pmu: Drop non-DT probe support
2024-06-26 22:32 [PATCH v2 00/12] arm64: Add support for Armv9.4 PMU fixed instruction counter Rob Herring (Arm)
` (2 preceding siblings ...)
2024-06-26 22:32 ` [PATCH v2 03/12] perf/arm: Move 32-bit PMU drivers to drivers/perf/ Rob Herring (Arm)
@ 2024-06-26 22:32 ` Rob Herring (Arm)
2024-06-26 22:32 ` [PATCH v2 05/12] perf: arm_pmuv3: Include asm/arm_pmuv3.h from linux/perf/arm_pmuv3.h Rob Herring (Arm)
` (8 subsequent siblings)
12 siblings, 0 replies; 29+ messages in thread
From: Rob Herring (Arm) @ 2024-06-26 22:32 UTC (permalink / raw)
To: Russell King, Peter Zijlstra, Ingo Molnar,
Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter,
Will Deacon, Marc Zyngier, Oliver Upton, James Morse,
Suzuki K Poulose, Zenghui Yu, Catalin Marinas, James Clark
Cc: linux-arm-kernel, linux-kernel, linux-perf-users, kvmarm
There are no non-DT based PMU users for v6 or v7, so drop the custom
non-DT probe table. Unfortunately, XScale still needs non-DT probing.
Note that this drops support for the arm1156 PMU, but there are no
arm1156-based systems supported in the kernel.
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
---
v2:
- Note that XScale still needs non-DT probe
---
drivers/perf/arm_v6_pmu.c | 17 +----------------
drivers/perf/arm_v7_pmu.c | 10 +---------
2 files changed, 2 insertions(+), 25 deletions(-)
diff --git a/drivers/perf/arm_v6_pmu.c b/drivers/perf/arm_v6_pmu.c
index f7593843bb85..0bb685b4bac5 100644
--- a/drivers/perf/arm_v6_pmu.c
+++ b/drivers/perf/arm_v6_pmu.c
@@ -401,13 +401,6 @@ static int armv6_1136_pmu_init(struct arm_pmu *cpu_pmu)
return 0;
}
-static int armv6_1156_pmu_init(struct arm_pmu *cpu_pmu)
-{
- armv6pmu_init(cpu_pmu);
- cpu_pmu->name = "armv6_1156";
- return 0;
-}
-
static int armv6_1176_pmu_init(struct arm_pmu *cpu_pmu)
{
armv6pmu_init(cpu_pmu);
@@ -421,17 +414,9 @@ static const struct of_device_id armv6_pmu_of_device_ids[] = {
{ /* sentinel value */ }
};
-static const struct pmu_probe_info armv6_pmu_probe_table[] = {
- ARM_PMU_PROBE(ARM_CPU_PART_ARM1136, armv6_1136_pmu_init),
- ARM_PMU_PROBE(ARM_CPU_PART_ARM1156, armv6_1156_pmu_init),
- ARM_PMU_PROBE(ARM_CPU_PART_ARM1176, armv6_1176_pmu_init),
- { /* sentinel value */ }
-};
-
static int armv6_pmu_device_probe(struct platform_device *pdev)
{
- return arm_pmu_device_probe(pdev, armv6_pmu_of_device_ids,
- armv6_pmu_probe_table);
+ return arm_pmu_device_probe(pdev, armv6_pmu_of_device_ids, NULL);
}
static struct platform_driver armv6_pmu_driver = {
diff --git a/drivers/perf/arm_v7_pmu.c b/drivers/perf/arm_v7_pmu.c
index fdd936fbd188..928ac3d626ed 100644
--- a/drivers/perf/arm_v7_pmu.c
+++ b/drivers/perf/arm_v7_pmu.c
@@ -1977,17 +1977,9 @@ static const struct of_device_id armv7_pmu_of_device_ids[] = {
{},
};
-static const struct pmu_probe_info armv7_pmu_probe_table[] = {
- ARM_PMU_PROBE(ARM_CPU_PART_CORTEX_A8, armv7_a8_pmu_init),
- ARM_PMU_PROBE(ARM_CPU_PART_CORTEX_A9, armv7_a9_pmu_init),
- { /* sentinel value */ }
-};
-
-
static int armv7_pmu_device_probe(struct platform_device *pdev)
{
- return arm_pmu_device_probe(pdev, armv7_pmu_of_device_ids,
- armv7_pmu_probe_table);
+ return arm_pmu_device_probe(pdev, armv7_pmu_of_device_ids, NULL);
}
static struct platform_driver armv7_pmu_driver = {
--
2.43.0
* [PATCH v2 05/12] perf: arm_pmuv3: Include asm/arm_pmuv3.h from linux/perf/arm_pmuv3.h
2024-06-26 22:32 [PATCH v2 00/12] arm64: Add support for Armv9.4 PMU fixed instruction counter Rob Herring (Arm)
` (3 preceding siblings ...)
2024-06-26 22:32 ` [PATCH v2 04/12] perf: arm_v6/7_pmu: Drop non-DT probe support Rob Herring (Arm)
@ 2024-06-26 22:32 ` Rob Herring (Arm)
2024-06-26 22:32 ` [PATCH v2 06/12] perf: arm_pmu: Remove event index to counter remapping Rob Herring (Arm)
` (7 subsequent siblings)
12 siblings, 0 replies; 29+ messages in thread
From: Rob Herring (Arm) @ 2024-06-26 22:32 UTC (permalink / raw)
To: Russell King, Peter Zijlstra, Ingo Molnar,
Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter,
Will Deacon, Marc Zyngier, Oliver Upton, James Morse,
Suzuki K Poulose, Zenghui Yu, Catalin Marinas, James Clark
Cc: linux-arm-kernel, linux-kernel, linux-perf-users, kvmarm
The arm64 asm/arm_pmuv3.h depends on defines from
linux/perf/arm_pmuv3.h. Rather than depend on include order, follow the
usual pattern of "linux" headers including "asm" headers of the same
name.
With this change, the existing include of linux/kvm_host.h from
asm/arm_pmuv3.h becomes problematic due to circular includes:
In file included from ../arch/arm64/include/asm/arm_pmuv3.h:9,
from ../include/linux/perf/arm_pmuv3.h:312,
from ../include/kvm/arm_pmu.h:11,
from ../arch/arm64/include/asm/kvm_host.h:38,
from ../arch/arm64/mm/init.c:41:
../include/linux/kvm_host.h:383:30: error: field 'arch' has incomplete type
Switching to asm/kvm_host.h solves the issue.
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
---
v2:
- new patch
---
arch/arm64/include/asm/arm_pmuv3.h | 2 +-
arch/arm64/kvm/pmu-emul.c | 1 -
drivers/perf/arm_pmuv3.c | 2 --
include/linux/perf/arm_pmuv3.h | 2 ++
4 files changed, 3 insertions(+), 4 deletions(-)
diff --git a/arch/arm64/include/asm/arm_pmuv3.h b/arch/arm64/include/asm/arm_pmuv3.h
index c27404fa4418..a4697a0b6835 100644
--- a/arch/arm64/include/asm/arm_pmuv3.h
+++ b/arch/arm64/include/asm/arm_pmuv3.h
@@ -6,7 +6,7 @@
#ifndef __ASM_PMUV3_H
#define __ASM_PMUV3_H
-#include <linux/kvm_host.h>
+#include <asm/kvm_host.h>
#include <asm/cpufeature.h>
#include <asm/sysreg.h>
diff --git a/arch/arm64/kvm/pmu-emul.c b/arch/arm64/kvm/pmu-emul.c
index a35ce10e0a9f..d1a476b08f54 100644
--- a/arch/arm64/kvm/pmu-emul.c
+++ b/arch/arm64/kvm/pmu-emul.c
@@ -14,7 +14,6 @@
#include <asm/kvm_emulate.h>
#include <kvm/arm_pmu.h>
#include <kvm/arm_vgic.h>
-#include <asm/arm_pmuv3.h>
#define PERF_ATTR_CFG1_COUNTER_64BIT BIT(0)
diff --git a/drivers/perf/arm_pmuv3.c b/drivers/perf/arm_pmuv3.c
index 3e51cd7062b9..6cbd37fd691a 100644
--- a/drivers/perf/arm_pmuv3.c
+++ b/drivers/perf/arm_pmuv3.c
@@ -25,8 +25,6 @@
#include <linux/smp.h>
#include <linux/nmi.h>
-#include <asm/arm_pmuv3.h>
-
/* ARMv8 Cortex-A53 specific event types. */
#define ARMV8_A53_PERFCTR_PREF_LINEFILL 0xC2
diff --git a/include/linux/perf/arm_pmuv3.h b/include/linux/perf/arm_pmuv3.h
index 46377e134d67..7867db04ec98 100644
--- a/include/linux/perf/arm_pmuv3.h
+++ b/include/linux/perf/arm_pmuv3.h
@@ -309,4 +309,6 @@
} \
} while (0)
+#include <asm/arm_pmuv3.h>
+
#endif
--
2.43.0
* [PATCH v2 06/12] perf: arm_pmu: Remove event index to counter remapping
2024-06-26 22:32 [PATCH v2 00/12] arm64: Add support for Armv9.4 PMU fixed instruction counter Rob Herring (Arm)
` (4 preceding siblings ...)
2024-06-26 22:32 ` [PATCH v2 05/12] perf: arm_pmuv3: Include asm/arm_pmuv3.h from linux/perf/arm_pmuv3.h Rob Herring (Arm)
@ 2024-06-26 22:32 ` Rob Herring (Arm)
2024-06-27 11:05 ` Marc Zyngier
2024-07-01 17:06 ` Mark Rutland
2024-06-26 22:32 ` [PATCH v2 07/12] perf: arm_pmuv3: Prepare for more than 32 counters Rob Herring (Arm)
` (6 subsequent siblings)
12 siblings, 2 replies; 29+ messages in thread
From: Rob Herring (Arm) @ 2024-06-26 22:32 UTC (permalink / raw)
To: Russell King, Peter Zijlstra, Ingo Molnar,
Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter,
Will Deacon, Marc Zyngier, Oliver Upton, James Morse,
Suzuki K Poulose, Zenghui Yu, Catalin Marinas, James Clark
Cc: linux-arm-kernel, linux-kernel, linux-perf-users, kvmarm
XScale and Armv6 PMUs defined the cycle counter at index 0 and event
counters starting at index 1, with a 1:1 mapping from event index to
counter number. On Armv7 and later, the cycle counter moved to counter 31
and the event counters start at counter 0. The Armv7 and PMUv3 drivers
kept the old event index numbering and introduced an event index to
counter conversion. The conversion uses masking to convert from event
index to a counter number. This operation relies on having at most 32
counters so that the cycle counter index 0 can be transformed to counter
number 31.
Armv9.4 adds support for an additional fixed-function counter
(instructions), which increases the number of possible counters to more
than 32, so the conversion no longer works as a simple subtract and mask.
The primary reason for the translation (other than history) seems to be
to have a contiguous mask of counters 0-N. Keeping that would result in
more complicated index to counter conversions. Instead, store a mask of
available counters rather than just the number of events. The mask
carries more information, and the number of counters can still be derived
from it.
No (intended) functional changes.
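For illustration, the index to counter translation being removed here is
essentially the following (taken from the PMUv3 driver; Armv7 has an
equivalent ARMV7_IDX_TO_COUNTER macro):
  #define ARMV8_IDX_CYCLE_COUNTER	0
  #define ARMV8_IDX_COUNTER0	1
  /*
   * Index 0 is the cycle counter, indices 1..N map to event counters
   * 0..N-1. The subtract-and-mask only works while every counter
   * number fits in the low 5 bits (at most 32 counters), which is
   * what lets index 0 wrap around to counter number 31.
   */
  #define ARMV8_IDX_TO_COUNTER(x) \
	(((x) - ARMV8_IDX_COUNTER0) & ARMV8_PMU_COUNTER_MASK)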
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
---
v2:
- Include Apple M1 PMU changes
- Use set_bit instead of bitmap_set(addr, bit, 1)
- Use for_each_andnot_bit() when clearing unused counters to avoid
accessing non-existent counters
- Use defines for XScale number of counters and
s/XSCALE_NUM_COUNTERS/XSCALE1_NUM_COUNTERS/
- Add and use define ARMV8_PMU_MAX_GENERAL_COUNTERS (copied from
tools/testing/selftests/kvm/aarch64/vpmu_counter_access.c)
---
arch/arm64/kvm/pmu-emul.c | 6 ++--
drivers/perf/apple_m1_cpu_pmu.c | 4 +--
drivers/perf/arm_pmu.c | 11 +++---
drivers/perf/arm_pmuv3.c | 62 +++++++++++----------------------
drivers/perf/arm_v6_pmu.c | 6 ++--
drivers/perf/arm_v7_pmu.c | 77 ++++++++++++++++-------------------------
drivers/perf/arm_xscale_pmu.c | 12 ++++---
include/linux/perf/arm_pmu.h | 2 +-
include/linux/perf/arm_pmuv3.h | 1 +
9 files changed, 75 insertions(+), 106 deletions(-)
diff --git a/arch/arm64/kvm/pmu-emul.c b/arch/arm64/kvm/pmu-emul.c
index d1a476b08f54..69be070a9378 100644
--- a/arch/arm64/kvm/pmu-emul.c
+++ b/arch/arm64/kvm/pmu-emul.c
@@ -910,10 +910,10 @@ u8 kvm_arm_pmu_get_max_counters(struct kvm *kvm)
struct arm_pmu *arm_pmu = kvm->arch.arm_pmu;
/*
- * The arm_pmu->num_events considers the cycle counter as well.
- * Ignore that and return only the general-purpose counters.
+ * The arm_pmu->cntr_mask considers the fixed counter(s) as well.
+ * Ignore those and return only the general-purpose counters.
*/
- return arm_pmu->num_events - 1;
+ return bitmap_weight(arm_pmu->cntr_mask, ARMV8_PMU_MAX_GENERAL_COUNTERS);
}
static void kvm_arm_set_pmu(struct kvm *kvm, struct arm_pmu *arm_pmu)
diff --git a/drivers/perf/apple_m1_cpu_pmu.c b/drivers/perf/apple_m1_cpu_pmu.c
index f322e5ca1114..c8f607912567 100644
--- a/drivers/perf/apple_m1_cpu_pmu.c
+++ b/drivers/perf/apple_m1_cpu_pmu.c
@@ -400,7 +400,7 @@ static irqreturn_t m1_pmu_handle_irq(struct arm_pmu *cpu_pmu)
regs = get_irq_regs();
- for (idx = 0; idx < cpu_pmu->num_events; idx++) {
+ for_each_set_bit(idx, cpu_pmu->cntr_mask, M1_PMU_NR_COUNTERS) {
struct perf_event *event = cpuc->events[idx];
struct perf_sample_data data;
@@ -560,7 +560,7 @@ static int m1_pmu_init(struct arm_pmu *cpu_pmu, u32 flags)
cpu_pmu->reset = m1_pmu_reset;
cpu_pmu->set_event_filter = m1_pmu_set_event_filter;
- cpu_pmu->num_events = M1_PMU_NR_COUNTERS;
+ bitmap_set(cpu_pmu->cntr_mask, 0, M1_PMU_NR_COUNTERS);
cpu_pmu->attr_groups[ARMPMU_ATTR_GROUP_EVENTS] = &m1_pmu_events_attr_group;
cpu_pmu->attr_groups[ARMPMU_ATTR_GROUP_FORMATS] = &m1_pmu_format_attr_group;
return 0;
diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c
index 8458fe2cebb4..398cce3d76fc 100644
--- a/drivers/perf/arm_pmu.c
+++ b/drivers/perf/arm_pmu.c
@@ -522,7 +522,7 @@ static void armpmu_enable(struct pmu *pmu)
{
struct arm_pmu *armpmu = to_arm_pmu(pmu);
struct pmu_hw_events *hw_events = this_cpu_ptr(armpmu->hw_events);
- bool enabled = !bitmap_empty(hw_events->used_mask, armpmu->num_events);
+ bool enabled = !bitmap_empty(hw_events->used_mask, ARMPMU_MAX_HWEVENTS);
/* For task-bound events we may be called on other CPUs */
if (!cpumask_test_cpu(smp_processor_id(), &armpmu->supported_cpus))
@@ -742,7 +742,7 @@ static void cpu_pm_pmu_setup(struct arm_pmu *armpmu, unsigned long cmd)
struct perf_event *event;
int idx;
- for (idx = 0; idx < armpmu->num_events; idx++) {
+ for_each_set_bit(idx, armpmu->cntr_mask, ARMPMU_MAX_HWEVENTS) {
event = hw_events->events[idx];
if (!event)
continue;
@@ -772,7 +772,7 @@ static int cpu_pm_pmu_notify(struct notifier_block *b, unsigned long cmd,
{
struct arm_pmu *armpmu = container_of(b, struct arm_pmu, cpu_pm_nb);
struct pmu_hw_events *hw_events = this_cpu_ptr(armpmu->hw_events);
- bool enabled = !bitmap_empty(hw_events->used_mask, armpmu->num_events);
+ bool enabled = !bitmap_empty(hw_events->used_mask, ARMPMU_MAX_HWEVENTS);
if (!cpumask_test_cpu(smp_processor_id(), &armpmu->supported_cpus))
return NOTIFY_DONE;
@@ -924,8 +924,9 @@ int armpmu_register(struct arm_pmu *pmu)
if (ret)
goto out_destroy;
- pr_info("enabled with %s PMU driver, %d counters available%s\n",
- pmu->name, pmu->num_events,
+ pr_info("enabled with %s PMU driver, %d (%*pb) counters available%s\n",
+ pmu->name, bitmap_weight(pmu->cntr_mask, ARMPMU_MAX_HWEVENTS),
+ ARMPMU_MAX_HWEVENTS, &pmu->cntr_mask,
has_nmi ? ", using NMIs" : "");
kvm_host_pmu_init(pmu);
diff --git a/drivers/perf/arm_pmuv3.c b/drivers/perf/arm_pmuv3.c
index 6cbd37fd691a..53ad674bf009 100644
--- a/drivers/perf/arm_pmuv3.c
+++ b/drivers/perf/arm_pmuv3.c
@@ -454,9 +454,7 @@ static const struct attribute_group armv8_pmuv3_caps_attr_group = {
/*
* Perf Events' indices
*/
-#define ARMV8_IDX_CYCLE_COUNTER 0
-#define ARMV8_IDX_COUNTER0 1
-#define ARMV8_IDX_CYCLE_COUNTER_USER 32
+#define ARMV8_IDX_CYCLE_COUNTER 31
/*
* We unconditionally enable ARMv8.5-PMU long event counter support
@@ -489,19 +487,12 @@ static bool armv8pmu_event_is_chained(struct perf_event *event)
return !armv8pmu_event_has_user_read(event) &&
armv8pmu_event_is_64bit(event) &&
!armv8pmu_has_long_event(cpu_pmu) &&
- (idx != ARMV8_IDX_CYCLE_COUNTER);
+ (idx <= ARMV8_PMU_MAX_GENERAL_COUNTERS);
}
/*
* ARMv8 low level PMU access
*/
-
-/*
- * Perf Event to low level counters mapping
- */
-#define ARMV8_IDX_TO_COUNTER(x) \
- (((x) - ARMV8_IDX_COUNTER0) & ARMV8_PMU_COUNTER_MASK)
-
static u64 armv8pmu_pmcr_read(void)
{
return read_pmcr();
@@ -521,14 +512,12 @@ static int armv8pmu_has_overflowed(u32 pmovsr)
static int armv8pmu_counter_has_overflowed(u32 pmnc, int idx)
{
- return pmnc & BIT(ARMV8_IDX_TO_COUNTER(idx));
+ return pmnc & BIT(idx);
}
static u64 armv8pmu_read_evcntr(int idx)
{
- u32 counter = ARMV8_IDX_TO_COUNTER(idx);
-
- return read_pmevcntrn(counter);
+ return read_pmevcntrn(idx);
}
static u64 armv8pmu_read_hw_counter(struct perf_event *event)
@@ -557,7 +546,7 @@ static bool armv8pmu_event_needs_bias(struct perf_event *event)
return false;
if (armv8pmu_has_long_event(cpu_pmu) ||
- idx == ARMV8_IDX_CYCLE_COUNTER)
+ idx >= ARMV8_PMU_MAX_GENERAL_COUNTERS)
return true;
return false;
@@ -595,9 +584,7 @@ static u64 armv8pmu_read_counter(struct perf_event *event)
static void armv8pmu_write_evcntr(int idx, u64 value)
{
- u32 counter = ARMV8_IDX_TO_COUNTER(idx);
-
- write_pmevcntrn(counter, value);
+ write_pmevcntrn(idx, value);
}
static void armv8pmu_write_hw_counter(struct perf_event *event,
@@ -628,7 +615,6 @@ static void armv8pmu_write_counter(struct perf_event *event, u64 value)
static void armv8pmu_write_evtype(int idx, unsigned long val)
{
- u32 counter = ARMV8_IDX_TO_COUNTER(idx);
unsigned long mask = ARMV8_PMU_EVTYPE_EVENT |
ARMV8_PMU_INCLUDE_EL2 |
ARMV8_PMU_EXCLUDE_EL0 |
@@ -638,7 +624,7 @@ static void armv8pmu_write_evtype(int idx, unsigned long val)
mask |= ARMV8_PMU_EVTYPE_TC | ARMV8_PMU_EVTYPE_TH;
val &= mask;
- write_pmevtypern(counter, val);
+ write_pmevtypern(idx, val);
}
static void armv8pmu_write_event_type(struct perf_event *event)
@@ -667,7 +653,7 @@ static void armv8pmu_write_event_type(struct perf_event *event)
static u32 armv8pmu_event_cnten_mask(struct perf_event *event)
{
- int counter = ARMV8_IDX_TO_COUNTER(event->hw.idx);
+ int counter = event->hw.idx;
u32 mask = BIT(counter);
if (armv8pmu_event_is_chained(event))
@@ -726,8 +712,7 @@ static void armv8pmu_enable_intens(u32 mask)
static void armv8pmu_enable_event_irq(struct perf_event *event)
{
- u32 counter = ARMV8_IDX_TO_COUNTER(event->hw.idx);
- armv8pmu_enable_intens(BIT(counter));
+ armv8pmu_enable_intens(BIT(event->hw.idx));
}
static void armv8pmu_disable_intens(u32 mask)
@@ -741,8 +726,7 @@ static void armv8pmu_disable_intens(u32 mask)
static void armv8pmu_disable_event_irq(struct perf_event *event)
{
- u32 counter = ARMV8_IDX_TO_COUNTER(event->hw.idx);
- armv8pmu_disable_intens(BIT(counter));
+ armv8pmu_disable_intens(BIT(event->hw.idx));
}
static u32 armv8pmu_getreset_flags(void)
@@ -786,7 +770,8 @@ static void armv8pmu_enable_user_access(struct arm_pmu *cpu_pmu)
struct pmu_hw_events *cpuc = this_cpu_ptr(cpu_pmu->hw_events);
/* Clear any unused counters to avoid leaking their contents */
- for_each_clear_bit(i, cpuc->used_mask, cpu_pmu->num_events) {
+ for_each_andnot_bit(i, cpu_pmu->cntr_mask, cpuc->used_mask,
+ ARMPMU_MAX_HWEVENTS) {
if (i == ARMV8_IDX_CYCLE_COUNTER)
write_pmccntr(0);
else
@@ -869,7 +854,7 @@ static irqreturn_t armv8pmu_handle_irq(struct arm_pmu *cpu_pmu)
* to prevent skews in group events.
*/
armv8pmu_stop(cpu_pmu);
- for (idx = 0; idx < cpu_pmu->num_events; ++idx) {
+ for_each_set_bit(idx, cpu_pmu->cntr_mask, ARMPMU_MAX_HWEVENTS) {
struct perf_event *event = cpuc->events[idx];
struct hw_perf_event *hwc;
@@ -908,7 +893,7 @@ static int armv8pmu_get_single_idx(struct pmu_hw_events *cpuc,
{
int idx;
- for (idx = ARMV8_IDX_COUNTER0; idx < cpu_pmu->num_events; idx++) {
+ for_each_set_bit(idx, cpu_pmu->cntr_mask, ARMV8_PMU_MAX_GENERAL_COUNTERS) {
if (!test_and_set_bit(idx, cpuc->used_mask))
return idx;
}
@@ -924,7 +909,9 @@ static int armv8pmu_get_chain_idx(struct pmu_hw_events *cpuc,
* Chaining requires two consecutive event counters, where
* the lower idx must be even.
*/
- for (idx = ARMV8_IDX_COUNTER0 + 1; idx < cpu_pmu->num_events; idx += 2) {
+ for_each_set_bit(idx, cpu_pmu->cntr_mask, ARMV8_PMU_MAX_GENERAL_COUNTERS) {
+ if (!(idx & 0x1))
+ continue;
if (!test_and_set_bit(idx, cpuc->used_mask)) {
/* Check if the preceding even counter is available */
if (!test_and_set_bit(idx - 1, cpuc->used_mask))
@@ -978,15 +965,7 @@ static int armv8pmu_user_event_idx(struct perf_event *event)
if (!sysctl_perf_user_access || !armv8pmu_event_has_user_read(event))
return 0;
- /*
- * We remap the cycle counter index to 32 to
- * match the offset applied to the rest of
- * the counter indices.
- */
- if (event->hw.idx == ARMV8_IDX_CYCLE_COUNTER)
- return ARMV8_IDX_CYCLE_COUNTER_USER;
-
- return event->hw.idx;
+ return event->hw.idx + 1;
}
/*
@@ -1211,10 +1190,11 @@ static void __armv8pmu_probe_pmu(void *info)
probe->present = true;
/* Read the nb of CNTx counters supported from PMNC */
- cpu_pmu->num_events = FIELD_GET(ARMV8_PMU_PMCR_N, armv8pmu_pmcr_read());
+ bitmap_set(cpu_pmu->cntr_mask,
+ 0, FIELD_GET(ARMV8_PMU_PMCR_N, armv8pmu_pmcr_read()));
/* Add the CPU cycles counter */
- cpu_pmu->num_events += 1;
+ set_bit(ARMV8_IDX_CYCLE_COUNTER, cpu_pmu->cntr_mask);
pmceid[0] = pmceid_raw[0] = read_pmceid0();
pmceid[1] = pmceid_raw[1] = read_pmceid1();
diff --git a/drivers/perf/arm_v6_pmu.c b/drivers/perf/arm_v6_pmu.c
index 0bb685b4bac5..b09615bb2bb2 100644
--- a/drivers/perf/arm_v6_pmu.c
+++ b/drivers/perf/arm_v6_pmu.c
@@ -64,6 +64,7 @@ enum armv6_counters {
ARMV6_CYCLE_COUNTER = 0,
ARMV6_COUNTER0,
ARMV6_COUNTER1,
+ ARMV6_NUM_COUNTERS
};
/*
@@ -254,7 +255,7 @@ armv6pmu_handle_irq(struct arm_pmu *cpu_pmu)
*/
armv6_pmcr_write(pmcr);
- for (idx = 0; idx < cpu_pmu->num_events; ++idx) {
+ for_each_set_bit(idx, cpu_pmu->cntr_mask, ARMV6_NUM_COUNTERS) {
struct perf_event *event = cpuc->events[idx];
struct hw_perf_event *hwc;
@@ -391,7 +392,8 @@ static void armv6pmu_init(struct arm_pmu *cpu_pmu)
cpu_pmu->start = armv6pmu_start;
cpu_pmu->stop = armv6pmu_stop;
cpu_pmu->map_event = armv6_map_event;
- cpu_pmu->num_events = 3;
+
+ bitmap_set(cpu_pmu->cntr_mask, 0, ARMV6_NUM_COUNTERS);
}
static int armv6_1136_pmu_init(struct arm_pmu *cpu_pmu)
diff --git a/drivers/perf/arm_v7_pmu.c b/drivers/perf/arm_v7_pmu.c
index 928ac3d626ed..420cadd108e7 100644
--- a/drivers/perf/arm_v7_pmu.c
+++ b/drivers/perf/arm_v7_pmu.c
@@ -649,24 +649,12 @@ static struct attribute_group armv7_pmuv2_events_attr_group = {
/*
* Perf Events' indices
*/
-#define ARMV7_IDX_CYCLE_COUNTER 0
-#define ARMV7_IDX_COUNTER0 1
-#define ARMV7_IDX_COUNTER_LAST(cpu_pmu) \
- (ARMV7_IDX_CYCLE_COUNTER + cpu_pmu->num_events - 1)
-
-#define ARMV7_MAX_COUNTERS 32
-#define ARMV7_COUNTER_MASK (ARMV7_MAX_COUNTERS - 1)
-
+#define ARMV7_IDX_CYCLE_COUNTER 31
+#define ARMV7_IDX_COUNTER_MAX 31
/*
* ARMv7 low level PMNC access
*/
-/*
- * Perf Event to low level counters mapping
- */
-#define ARMV7_IDX_TO_COUNTER(x) \
- (((x) - ARMV7_IDX_COUNTER0) & ARMV7_COUNTER_MASK)
-
/*
* Per-CPU PMNC: config reg
*/
@@ -725,19 +713,17 @@ static inline int armv7_pmnc_has_overflowed(u32 pmnc)
static inline int armv7_pmnc_counter_valid(struct arm_pmu *cpu_pmu, int idx)
{
- return idx >= ARMV7_IDX_CYCLE_COUNTER &&
- idx <= ARMV7_IDX_COUNTER_LAST(cpu_pmu);
+ return test_bit(idx, cpu_pmu->cntr_mask);
}
static inline int armv7_pmnc_counter_has_overflowed(u32 pmnc, int idx)
{
- return pmnc & BIT(ARMV7_IDX_TO_COUNTER(idx));
+ return pmnc & BIT(idx);
}
static inline void armv7_pmnc_select_counter(int idx)
{
- u32 counter = ARMV7_IDX_TO_COUNTER(idx);
- asm volatile("mcr p15, 0, %0, c9, c12, 5" : : "r" (counter));
+ asm volatile("mcr p15, 0, %0, c9, c12, 5" : : "r" (idx));
isb();
}
@@ -787,29 +773,25 @@ static inline void armv7_pmnc_write_evtsel(int idx, u32 val)
static inline void armv7_pmnc_enable_counter(int idx)
{
- u32 counter = ARMV7_IDX_TO_COUNTER(idx);
- asm volatile("mcr p15, 0, %0, c9, c12, 1" : : "r" (BIT(counter)));
+ asm volatile("mcr p15, 0, %0, c9, c12, 1" : : "r" (BIT(idx)));
}
static inline void armv7_pmnc_disable_counter(int idx)
{
- u32 counter = ARMV7_IDX_TO_COUNTER(idx);
- asm volatile("mcr p15, 0, %0, c9, c12, 2" : : "r" (BIT(counter)));
+ asm volatile("mcr p15, 0, %0, c9, c12, 2" : : "r" (BIT(idx)));
}
static inline void armv7_pmnc_enable_intens(int idx)
{
- u32 counter = ARMV7_IDX_TO_COUNTER(idx);
- asm volatile("mcr p15, 0, %0, c9, c14, 1" : : "r" (BIT(counter)));
+ asm volatile("mcr p15, 0, %0, c9, c14, 1" : : "r" (BIT(idx)));
}
static inline void armv7_pmnc_disable_intens(int idx)
{
- u32 counter = ARMV7_IDX_TO_COUNTER(idx);
- asm volatile("mcr p15, 0, %0, c9, c14, 2" : : "r" (BIT(counter)));
+ asm volatile("mcr p15, 0, %0, c9, c14, 2" : : "r" (BIT(idx)));
isb();
/* Clear the overflow flag in case an interrupt is pending. */
- asm volatile("mcr p15, 0, %0, c9, c12, 3" : : "r" (BIT(counter)));
+ asm volatile("mcr p15, 0, %0, c9, c12, 3" : : "r" (BIT(idx)));
isb();
}
@@ -853,15 +835,12 @@ static void armv7_pmnc_dump_regs(struct arm_pmu *cpu_pmu)
asm volatile("mrc p15, 0, %0, c9, c13, 0" : "=r" (val));
pr_info("CCNT =0x%08x\n", val);
- for (cnt = ARMV7_IDX_COUNTER0;
- cnt <= ARMV7_IDX_COUNTER_LAST(cpu_pmu); cnt++) {
+ for_each_set_bit(cnt, cpu_pmu->cntr_mask, ARMV7_IDX_COUNTER_MAX) {
armv7_pmnc_select_counter(cnt);
asm volatile("mrc p15, 0, %0, c9, c13, 2" : "=r" (val));
- pr_info("CNT[%d] count =0x%08x\n",
- ARMV7_IDX_TO_COUNTER(cnt), val);
+ pr_info("CNT[%d] count =0x%08x\n", cnt, val);
asm volatile("mrc p15, 0, %0, c9, c13, 1" : "=r" (val));
- pr_info("CNT[%d] evtsel=0x%08x\n",
- ARMV7_IDX_TO_COUNTER(cnt), val);
+ pr_info("CNT[%d] evtsel=0x%08x\n", cnt, val);
}
}
#endif
@@ -958,7 +937,7 @@ static irqreturn_t armv7pmu_handle_irq(struct arm_pmu *cpu_pmu)
*/
regs = get_irq_regs();
- for (idx = 0; idx < cpu_pmu->num_events; ++idx) {
+ for_each_set_bit(idx, cpu_pmu->cntr_mask, ARMPMU_MAX_HWEVENTS) {
struct perf_event *event = cpuc->events[idx];
struct hw_perf_event *hwc;
@@ -1027,7 +1006,7 @@ static int armv7pmu_get_event_idx(struct pmu_hw_events *cpuc,
* For anything other than a cycle counter, try and use
* the events counters
*/
- for (idx = ARMV7_IDX_COUNTER0; idx < cpu_pmu->num_events; ++idx) {
+ for_each_set_bit(idx, cpu_pmu->cntr_mask, ARMV7_IDX_COUNTER_MAX) {
if (!test_and_set_bit(idx, cpuc->used_mask))
return idx;
}
@@ -1073,7 +1052,7 @@ static int armv7pmu_set_event_filter(struct hw_perf_event *event,
static void armv7pmu_reset(void *info)
{
struct arm_pmu *cpu_pmu = (struct arm_pmu *)info;
- u32 idx, nb_cnt = cpu_pmu->num_events, val;
+ u32 idx, val;
if (cpu_pmu->secure_access) {
asm volatile("mrc p15, 0, %0, c1, c1, 1" : "=r" (val));
@@ -1082,7 +1061,7 @@ static void armv7pmu_reset(void *info)
}
/* The counter and interrupt enable registers are unknown at reset. */
- for (idx = ARMV7_IDX_CYCLE_COUNTER; idx < nb_cnt; ++idx) {
+ for_each_set_bit(idx, cpu_pmu->cntr_mask, ARMPMU_MAX_HWEVENTS) {
armv7_pmnc_disable_counter(idx);
armv7_pmnc_disable_intens(idx);
}
@@ -1161,20 +1140,22 @@ static void armv7pmu_init(struct arm_pmu *cpu_pmu)
static void armv7_read_num_pmnc_events(void *info)
{
- int *nb_cnt = info;
+ int nb_cnt;
+ struct arm_pmu *cpu_pmu = info;
/* Read the nb of CNTx counters supported from PMNC */
- *nb_cnt = (armv7_pmnc_read() >> ARMV7_PMNC_N_SHIFT) & ARMV7_PMNC_N_MASK;
+ nb_cnt = (armv7_pmnc_read() >> ARMV7_PMNC_N_SHIFT) & ARMV7_PMNC_N_MASK;
+ bitmap_set(cpu_pmu->cntr_mask, 0, nb_cnt);
/* Add the CPU cycles counter */
- *nb_cnt += 1;
+ set_bit(ARMV7_IDX_CYCLE_COUNTER, cpu_pmu->cntr_mask);
}
static int armv7_probe_num_events(struct arm_pmu *arm_pmu)
{
return smp_call_function_any(&arm_pmu->supported_cpus,
armv7_read_num_pmnc_events,
- &arm_pmu->num_events, 1);
+ arm_pmu, 1);
}
static int armv7_a8_pmu_init(struct arm_pmu *cpu_pmu)
@@ -1524,7 +1505,7 @@ static void krait_pmu_reset(void *info)
{
u32 vval, fval;
struct arm_pmu *cpu_pmu = info;
- u32 idx, nb_cnt = cpu_pmu->num_events;
+ u32 idx;
armv7pmu_reset(info);
@@ -1538,7 +1519,7 @@ static void krait_pmu_reset(void *info)
venum_post_pmresr(vval, fval);
/* Reset PMxEVNCTCR to sane default */
- for (idx = ARMV7_IDX_CYCLE_COUNTER; idx < nb_cnt; ++idx) {
+ for_each_set_bit(idx, cpu_pmu->cntr_mask, ARMV7_IDX_COUNTER_MAX) {
armv7_pmnc_select_counter(idx);
asm volatile("mcr p15, 0, %0, c9, c15, 0" : : "r" (0));
}
@@ -1562,7 +1543,7 @@ static int krait_event_to_bit(struct perf_event *event, unsigned int region,
* Lower bits are reserved for use by the counters (see
* armv7pmu_get_event_idx() for more info)
*/
- bit += ARMV7_IDX_COUNTER_LAST(cpu_pmu) + 1;
+ bit += bitmap_weight(cpu_pmu->cntr_mask, ARMV7_IDX_COUNTER_MAX);
return bit;
}
@@ -1845,7 +1826,7 @@ static void scorpion_pmu_reset(void *info)
{
u32 vval, fval;
struct arm_pmu *cpu_pmu = info;
- u32 idx, nb_cnt = cpu_pmu->num_events;
+ u32 idx;
armv7pmu_reset(info);
@@ -1860,7 +1841,7 @@ static void scorpion_pmu_reset(void *info)
venum_post_pmresr(vval, fval);
/* Reset PMxEVNCTCR to sane default */
- for (idx = ARMV7_IDX_CYCLE_COUNTER; idx < nb_cnt; ++idx) {
+ for_each_set_bit(idx, cpu_pmu->cntr_mask, ARMV7_IDX_COUNTER_MAX) {
armv7_pmnc_select_counter(idx);
asm volatile("mcr p15, 0, %0, c9, c15, 0" : : "r" (0));
}
@@ -1883,7 +1864,7 @@ static int scorpion_event_to_bit(struct perf_event *event, unsigned int region,
* Lower bits are reserved for use by the counters (see
* armv7pmu_get_event_idx() for more info)
*/
- bit += ARMV7_IDX_COUNTER_LAST(cpu_pmu) + 1;
+ bit += bitmap_weight(cpu_pmu->cntr_mask, ARMV7_IDX_COUNTER_MAX);
return bit;
}
diff --git a/drivers/perf/arm_xscale_pmu.c b/drivers/perf/arm_xscale_pmu.c
index 3d8b72d6b37f..638fea9b1263 100644
--- a/drivers/perf/arm_xscale_pmu.c
+++ b/drivers/perf/arm_xscale_pmu.c
@@ -53,6 +53,8 @@ enum xscale_counters {
XSCALE_COUNTER2,
XSCALE_COUNTER3,
};
+#define XSCALE1_NUM_COUNTERS 3
+#define XSCALE2_NUM_COUNTERS 5
static const unsigned xscale_perf_map[PERF_COUNT_HW_MAX] = {
PERF_MAP_ALL_UNSUPPORTED,
@@ -168,7 +170,7 @@ xscale1pmu_handle_irq(struct arm_pmu *cpu_pmu)
regs = get_irq_regs();
- for (idx = 0; idx < cpu_pmu->num_events; ++idx) {
+ for_each_set_bit(idx, cpu_pmu->cntr_mask, XSCALE1_NUM_COUNTERS) {
struct perf_event *event = cpuc->events[idx];
struct hw_perf_event *hwc;
@@ -364,7 +366,8 @@ static int xscale1pmu_init(struct arm_pmu *cpu_pmu)
cpu_pmu->start = xscale1pmu_start;
cpu_pmu->stop = xscale1pmu_stop;
cpu_pmu->map_event = xscale_map_event;
- cpu_pmu->num_events = 3;
+
+ bitmap_set(cpu_pmu->cntr_mask, 0, XSCALE1_NUM_COUNTERS);
return 0;
}
@@ -500,7 +503,7 @@ xscale2pmu_handle_irq(struct arm_pmu *cpu_pmu)
regs = get_irq_regs();
- for (idx = 0; idx < cpu_pmu->num_events; ++idx) {
+ for_each_set_bit(idx, cpu_pmu->cntr_mask, XSCALE2_NUM_COUNTERS) {
struct perf_event *event = cpuc->events[idx];
struct hw_perf_event *hwc;
@@ -719,7 +722,8 @@ static int xscale2pmu_init(struct arm_pmu *cpu_pmu)
cpu_pmu->start = xscale2pmu_start;
cpu_pmu->stop = xscale2pmu_stop;
cpu_pmu->map_event = xscale_map_event;
- cpu_pmu->num_events = 5;
+
+ bitmap_set(cpu_pmu->cntr_mask, 0, XSCALE2_NUM_COUNTERS);
return 0;
}
diff --git a/include/linux/perf/arm_pmu.h b/include/linux/perf/arm_pmu.h
index b3b34f6670cf..e5d6d204beab 100644
--- a/include/linux/perf/arm_pmu.h
+++ b/include/linux/perf/arm_pmu.h
@@ -96,7 +96,7 @@ struct arm_pmu {
void (*stop)(struct arm_pmu *);
void (*reset)(void *);
int (*map_event)(struct perf_event *event);
- int num_events;
+ DECLARE_BITMAP(cntr_mask, ARMPMU_MAX_HWEVENTS);
bool secure_access; /* 32-bit ARM only */
#define ARMV8_PMUV3_MAX_COMMON_EVENTS 0x40
DECLARE_BITMAP(pmceid_bitmap, ARMV8_PMUV3_MAX_COMMON_EVENTS);
diff --git a/include/linux/perf/arm_pmuv3.h b/include/linux/perf/arm_pmuv3.h
index 7867db04ec98..eccbdd8eb98f 100644
--- a/include/linux/perf/arm_pmuv3.h
+++ b/include/linux/perf/arm_pmuv3.h
@@ -6,6 +6,7 @@
#ifndef __PERF_ARM_PMUV3_H
#define __PERF_ARM_PMUV3_H
+#define ARMV8_PMU_MAX_GENERAL_COUNTERS 31
#define ARMV8_PMU_MAX_COUNTERS 32
#define ARMV8_PMU_COUNTER_MASK (ARMV8_PMU_MAX_COUNTERS - 1)
--
2.43.0
* [PATCH v2 07/12] perf: arm_pmuv3: Prepare for more than 32 counters
2024-06-26 22:32 [PATCH v2 00/12] arm64: Add support for Armv9.4 PMU fixed instruction counter Rob Herring (Arm)
` (5 preceding siblings ...)
2024-06-26 22:32 ` [PATCH v2 06/12] perf: arm_pmu: Remove event index to counter remapping Rob Herring (Arm)
@ 2024-06-26 22:32 ` Rob Herring (Arm)
2024-06-26 22:32 ` [PATCH v2 08/12] KVM: arm64: pmu: Use arm_pmuv3.h register accessors Rob Herring (Arm)
` (5 subsequent siblings)
12 siblings, 0 replies; 29+ messages in thread
From: Rob Herring (Arm) @ 2024-06-26 22:32 UTC (permalink / raw)
To: Russell King, Peter Zijlstra, Ingo Molnar,
Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter,
Will Deacon, Marc Zyngier, Oliver Upton, James Morse,
Suzuki K Poulose, Zenghui Yu, Catalin Marinas, James Clark
Cc: linux-arm-kernel, linux-kernel, linux-perf-users, kvmarm
Various PMUv3 registers which hold a mask of counters are 64-bit
registers, but the accessor functions take a u32. This has been fine
because the upper 32 bits are RES0 while there are at most 32 counters,
which was the case prior to Armv9.4/8.9. Armv9.4/8.9 adds a 33rd counter,
so update the accessor functions to take a u64 instead.
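As a rough sketch of the problem (not taken from the patch; bit 32 is
assumed to be the enable bit for the fixed instruction counter added
later in this series):
  u64 mask = BIT_ULL(32);
  write_pmcntenset(mask);	/* old u32 prototype: bit 32 silently truncated */
  write_pmintenset(mask);	/* same issue for the interrupt enable mask */
With u64 arguments, bit 32 reaches the registers intact.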
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
---
arch/arm64/include/asm/arm_pmuv3.h | 12 ++++++------
arch/arm64/include/asm/kvm_host.h | 8 ++++----
arch/arm64/kvm/pmu.c | 8 ++++----
drivers/perf/arm_pmuv3.c | 40 ++++++++++++++++++++------------------
include/kvm/arm_pmu.h | 4 ++--
5 files changed, 37 insertions(+), 35 deletions(-)
diff --git a/arch/arm64/include/asm/arm_pmuv3.h b/arch/arm64/include/asm/arm_pmuv3.h
index a4697a0b6835..19b3f9150058 100644
--- a/arch/arm64/include/asm/arm_pmuv3.h
+++ b/arch/arm64/include/asm/arm_pmuv3.h
@@ -71,22 +71,22 @@ static inline u64 read_pmccntr(void)
return read_sysreg(pmccntr_el0);
}
-static inline void write_pmcntenset(u32 val)
+static inline void write_pmcntenset(u64 val)
{
write_sysreg(val, pmcntenset_el0);
}
-static inline void write_pmcntenclr(u32 val)
+static inline void write_pmcntenclr(u64 val)
{
write_sysreg(val, pmcntenclr_el0);
}
-static inline void write_pmintenset(u32 val)
+static inline void write_pmintenset(u64 val)
{
write_sysreg(val, pmintenset_el1);
}
-static inline void write_pmintenclr(u32 val)
+static inline void write_pmintenclr(u64 val)
{
write_sysreg(val, pmintenclr_el1);
}
@@ -96,12 +96,12 @@ static inline void write_pmccfiltr(u64 val)
write_sysreg(val, pmccfiltr_el0);
}
-static inline void write_pmovsclr(u32 val)
+static inline void write_pmovsclr(u64 val)
{
write_sysreg(val, pmovsclr_el0);
}
-static inline u32 read_pmovsclr(void)
+static inline u64 read_pmovsclr(void)
{
return read_sysreg(pmovsclr_el0);
}
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 8170c04fde91..6243a01d9d26 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -1267,12 +1267,12 @@ void kvm_arch_vcpu_load_debug_state_flags(struct kvm_vcpu *vcpu);
void kvm_arch_vcpu_put_debug_state_flags(struct kvm_vcpu *vcpu);
#ifdef CONFIG_KVM
-void kvm_set_pmu_events(u32 set, struct perf_event_attr *attr);
-void kvm_clr_pmu_events(u32 clr);
+void kvm_set_pmu_events(u64 set, struct perf_event_attr *attr);
+void kvm_clr_pmu_events(u64 clr);
bool kvm_set_pmuserenr(u64 val);
#else
-static inline void kvm_set_pmu_events(u32 set, struct perf_event_attr *attr) {}
-static inline void kvm_clr_pmu_events(u32 clr) {}
+static inline void kvm_set_pmu_events(u64 set, struct perf_event_attr *attr) {}
+static inline void kvm_clr_pmu_events(u64 clr) {}
static inline bool kvm_set_pmuserenr(u64 val)
{
return false;
diff --git a/arch/arm64/kvm/pmu.c b/arch/arm64/kvm/pmu.c
index 329819806096..e633b4434c6a 100644
--- a/arch/arm64/kvm/pmu.c
+++ b/arch/arm64/kvm/pmu.c
@@ -35,7 +35,7 @@ struct kvm_pmu_events *kvm_get_pmu_events(void)
* Add events to track that we may want to switch at guest entry/exit
* time.
*/
-void kvm_set_pmu_events(u32 set, struct perf_event_attr *attr)
+void kvm_set_pmu_events(u64 set, struct perf_event_attr *attr)
{
struct kvm_pmu_events *pmu = kvm_get_pmu_events();
@@ -51,7 +51,7 @@ void kvm_set_pmu_events(u32 set, struct perf_event_attr *attr)
/*
* Stop tracking events
*/
-void kvm_clr_pmu_events(u32 clr)
+void kvm_clr_pmu_events(u64 clr)
{
struct kvm_pmu_events *pmu = kvm_get_pmu_events();
@@ -176,7 +176,7 @@ static void kvm_vcpu_pmu_disable_el0(unsigned long events)
void kvm_vcpu_pmu_restore_guest(struct kvm_vcpu *vcpu)
{
struct kvm_pmu_events *pmu;
- u32 events_guest, events_host;
+ u64 events_guest, events_host;
if (!kvm_arm_support_pmu_v3() || !has_vhe())
return;
@@ -197,7 +197,7 @@ void kvm_vcpu_pmu_restore_guest(struct kvm_vcpu *vcpu)
void kvm_vcpu_pmu_restore_host(struct kvm_vcpu *vcpu)
{
struct kvm_pmu_events *pmu;
- u32 events_guest, events_host;
+ u64 events_guest, events_host;
if (!kvm_arm_support_pmu_v3() || !has_vhe())
return;
diff --git a/drivers/perf/arm_pmuv3.c b/drivers/perf/arm_pmuv3.c
index 53ad674bf009..f771242168f1 100644
--- a/drivers/perf/arm_pmuv3.c
+++ b/drivers/perf/arm_pmuv3.c
@@ -505,14 +505,14 @@ static void armv8pmu_pmcr_write(u64 val)
write_pmcr(val);
}
-static int armv8pmu_has_overflowed(u32 pmovsr)
+static int armv8pmu_has_overflowed(u64 pmovsr)
{
- return pmovsr & ARMV8_PMU_OVERFLOWED_MASK;
+ return !!(pmovsr & ARMV8_PMU_OVERFLOWED_MASK);
}
-static int armv8pmu_counter_has_overflowed(u32 pmnc, int idx)
+static int armv8pmu_counter_has_overflowed(u64 pmnc, int idx)
{
- return pmnc & BIT(idx);
+ return !!(pmnc & BIT(idx));
}
static u64 armv8pmu_read_evcntr(int idx)
@@ -651,17 +651,17 @@ static void armv8pmu_write_event_type(struct perf_event *event)
}
}
-static u32 armv8pmu_event_cnten_mask(struct perf_event *event)
+static u64 armv8pmu_event_cnten_mask(struct perf_event *event)
{
int counter = event->hw.idx;
- u32 mask = BIT(counter);
+ u64 mask = BIT(counter);
if (armv8pmu_event_is_chained(event))
mask |= BIT(counter - 1);
return mask;
}
-static void armv8pmu_enable_counter(u32 mask)
+static void armv8pmu_enable_counter(u64 mask)
{
/*
* Make sure event configuration register writes are visible before we
@@ -674,7 +674,7 @@ static void armv8pmu_enable_counter(u32 mask)
static void armv8pmu_enable_event_counter(struct perf_event *event)
{
struct perf_event_attr *attr = &event->attr;
- u32 mask = armv8pmu_event_cnten_mask(event);
+ u64 mask = armv8pmu_event_cnten_mask(event);
kvm_set_pmu_events(mask, attr);
@@ -683,7 +683,7 @@ static void armv8pmu_enable_event_counter(struct perf_event *event)
armv8pmu_enable_counter(mask);
}
-static void armv8pmu_disable_counter(u32 mask)
+static void armv8pmu_disable_counter(u64 mask)
{
write_pmcntenclr(mask);
/*
@@ -696,7 +696,7 @@ static void armv8pmu_disable_counter(u32 mask)
static void armv8pmu_disable_event_counter(struct perf_event *event)
{
struct perf_event_attr *attr = &event->attr;
- u32 mask = armv8pmu_event_cnten_mask(event);
+ u64 mask = armv8pmu_event_cnten_mask(event);
kvm_clr_pmu_events(mask);
@@ -705,7 +705,7 @@ static void armv8pmu_disable_event_counter(struct perf_event *event)
armv8pmu_disable_counter(mask);
}
-static void armv8pmu_enable_intens(u32 mask)
+static void armv8pmu_enable_intens(u64 mask)
{
write_pmintenset(mask);
}
@@ -715,7 +715,7 @@ static void armv8pmu_enable_event_irq(struct perf_event *event)
armv8pmu_enable_intens(BIT(event->hw.idx));
}
-static void armv8pmu_disable_intens(u32 mask)
+static void armv8pmu_disable_intens(u64 mask)
{
write_pmintenclr(mask);
isb();
@@ -729,9 +729,9 @@ static void armv8pmu_disable_event_irq(struct perf_event *event)
armv8pmu_disable_intens(BIT(event->hw.idx));
}
-static u32 armv8pmu_getreset_flags(void)
+static u64 armv8pmu_getreset_flags(void)
{
- u32 value;
+ u64 value;
/* Read */
value = read_pmovsclr();
@@ -827,7 +827,7 @@ static void armv8pmu_stop(struct arm_pmu *cpu_pmu)
static irqreturn_t armv8pmu_handle_irq(struct arm_pmu *cpu_pmu)
{
- u32 pmovsr;
+ u64 pmovsr;
struct perf_sample_data data;
struct pmu_hw_events *cpuc = this_cpu_ptr(cpu_pmu->hw_events);
struct pt_regs *regs;
@@ -1040,14 +1040,16 @@ static int armv8pmu_set_event_filter(struct hw_perf_event *event,
static void armv8pmu_reset(void *info)
{
struct arm_pmu *cpu_pmu = (struct arm_pmu *)info;
- u64 pmcr;
+ u64 pmcr, mask;
+
+ bitmap_to_arr64(&mask, cpu_pmu->cntr_mask, ARMPMU_MAX_HWEVENTS);
/* The counter and interrupt enable registers are unknown at reset. */
- armv8pmu_disable_counter(U32_MAX);
- armv8pmu_disable_intens(U32_MAX);
+ armv8pmu_disable_counter(mask);
+ armv8pmu_disable_intens(mask);
/* Clear the counters we flip at guest entry/exit */
- kvm_clr_pmu_events(U32_MAX);
+ kvm_clr_pmu_events(mask);
/*
* Initialize & Reset PMNC. Request overflow interrupt for
diff --git a/include/kvm/arm_pmu.h b/include/kvm/arm_pmu.h
index 35d4ca4f6122..334d7c5503cf 100644
--- a/include/kvm/arm_pmu.h
+++ b/include/kvm/arm_pmu.h
@@ -19,8 +19,8 @@ struct kvm_pmc {
};
struct kvm_pmu_events {
- u32 events_host;
- u32 events_guest;
+ u64 events_host;
+ u64 events_guest;
};
struct kvm_pmu {
--
2.43.0
* [PATCH v2 08/12] KVM: arm64: pmu: Use arm_pmuv3.h register accessors
2024-06-26 22:32 [PATCH v2 00/12] arm64: Add support for Armv9.4 PMU fixed instruction counter Rob Herring (Arm)
` (6 preceding siblings ...)
2024-06-26 22:32 ` [PATCH v2 07/12] perf: arm_pmuv3: Prepare for more than 32 counters Rob Herring (Arm)
@ 2024-06-26 22:32 ` Rob Herring (Arm)
2024-06-27 10:47 ` Marc Zyngier
2024-06-26 22:32 ` [PATCH v2 09/12] KVM: arm64: pmu: Use generated define for PMSELR_EL0.SEL access Rob Herring (Arm)
` (4 subsequent siblings)
12 siblings, 1 reply; 29+ messages in thread
From: Rob Herring (Arm) @ 2024-06-26 22:32 UTC (permalink / raw)
To: Russell King, Peter Zijlstra, Ingo Molnar,
Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter,
Will Deacon, Marc Zyngier, Oliver Upton, James Morse,
Suzuki K Poulose, Zenghui Yu, Catalin Marinas, James Clark
Cc: linux-arm-kernel, linux-kernel, linux-perf-users, kvmarm
Commit df29ddf4f04b ("arm64: perf: Abstract system register accesses
away") split off the PMU register accessor functions into a standalone
header. Let's use it for the KVM PMU code and get rid of one copy of the
ugly switch macro.
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
---
v2:
- Use linux/perf/arm_pmuv3.h include instead of asm/
---
arch/arm64/include/asm/arm_pmuv3.h | 13 ++++++++
arch/arm64/kvm/pmu.c | 66 +++++---------------------------------
2 files changed, 21 insertions(+), 58 deletions(-)
diff --git a/arch/arm64/include/asm/arm_pmuv3.h b/arch/arm64/include/asm/arm_pmuv3.h
index 19b3f9150058..36c3e82b4eec 100644
--- a/arch/arm64/include/asm/arm_pmuv3.h
+++ b/arch/arm64/include/asm/arm_pmuv3.h
@@ -33,6 +33,14 @@ static inline void write_pmevtypern(int n, unsigned long val)
PMEVN_SWITCH(n, WRITE_PMEVTYPERN);
}
+#define RETURN_READ_PMEVTYPERN(n) \
+ return read_sysreg(pmevtyper##n##_el0)
+static inline unsigned long read_pmevtypern(int n)
+{
+ PMEVN_SWITCH(n, RETURN_READ_PMEVTYPERN);
+ return 0;
+}
+
static inline unsigned long read_pmmir(void)
{
return read_cpuid(PMMIR_EL1);
@@ -96,6 +104,11 @@ static inline void write_pmccfiltr(u64 val)
write_sysreg(val, pmccfiltr_el0);
}
+static inline u64 read_pmccfiltr(void)
+{
+ return read_sysreg(pmccfiltr_el0);
+}
+
static inline void write_pmovsclr(u64 val)
{
write_sysreg(val, pmovsclr_el0);
diff --git a/arch/arm64/kvm/pmu.c b/arch/arm64/kvm/pmu.c
index e633b4434c6a..a47ae311d4a8 100644
--- a/arch/arm64/kvm/pmu.c
+++ b/arch/arm64/kvm/pmu.c
@@ -5,6 +5,7 @@
*/
#include <linux/kvm_host.h>
#include <linux/perf_event.h>
+#include <linux/perf/arm_pmuv3.h>
static DEFINE_PER_CPU(struct kvm_pmu_events, kvm_pmu_events);
@@ -62,63 +63,16 @@ void kvm_clr_pmu_events(u64 clr)
pmu->events_guest &= ~clr;
}
-#define PMEVTYPER_READ_CASE(idx) \
- case idx: \
- return read_sysreg(pmevtyper##idx##_el0)
-
-#define PMEVTYPER_WRITE_CASE(idx) \
- case idx: \
- write_sysreg(val, pmevtyper##idx##_el0); \
- break
-
-#define PMEVTYPER_CASES(readwrite) \
- PMEVTYPER_##readwrite##_CASE(0); \
- PMEVTYPER_##readwrite##_CASE(1); \
- PMEVTYPER_##readwrite##_CASE(2); \
- PMEVTYPER_##readwrite##_CASE(3); \
- PMEVTYPER_##readwrite##_CASE(4); \
- PMEVTYPER_##readwrite##_CASE(5); \
- PMEVTYPER_##readwrite##_CASE(6); \
- PMEVTYPER_##readwrite##_CASE(7); \
- PMEVTYPER_##readwrite##_CASE(8); \
- PMEVTYPER_##readwrite##_CASE(9); \
- PMEVTYPER_##readwrite##_CASE(10); \
- PMEVTYPER_##readwrite##_CASE(11); \
- PMEVTYPER_##readwrite##_CASE(12); \
- PMEVTYPER_##readwrite##_CASE(13); \
- PMEVTYPER_##readwrite##_CASE(14); \
- PMEVTYPER_##readwrite##_CASE(15); \
- PMEVTYPER_##readwrite##_CASE(16); \
- PMEVTYPER_##readwrite##_CASE(17); \
- PMEVTYPER_##readwrite##_CASE(18); \
- PMEVTYPER_##readwrite##_CASE(19); \
- PMEVTYPER_##readwrite##_CASE(20); \
- PMEVTYPER_##readwrite##_CASE(21); \
- PMEVTYPER_##readwrite##_CASE(22); \
- PMEVTYPER_##readwrite##_CASE(23); \
- PMEVTYPER_##readwrite##_CASE(24); \
- PMEVTYPER_##readwrite##_CASE(25); \
- PMEVTYPER_##readwrite##_CASE(26); \
- PMEVTYPER_##readwrite##_CASE(27); \
- PMEVTYPER_##readwrite##_CASE(28); \
- PMEVTYPER_##readwrite##_CASE(29); \
- PMEVTYPER_##readwrite##_CASE(30)
-
/*
* Read a value direct from PMEVTYPER<idx> where idx is 0-30
* or PMCCFILTR_EL0 where idx is ARMV8_PMU_CYCLE_IDX (31).
*/
static u64 kvm_vcpu_pmu_read_evtype_direct(int idx)
{
- switch (idx) {
- PMEVTYPER_CASES(READ);
- case ARMV8_PMU_CYCLE_IDX:
- return read_sysreg(pmccfiltr_el0);
- default:
- WARN_ON(1);
- }
+ if (idx == ARMV8_PMU_CYCLE_IDX)
+ return read_pmccfiltr();
- return 0;
+ return read_pmevtypern(idx);
}
/*
@@ -127,14 +81,10 @@ static u64 kvm_vcpu_pmu_read_evtype_direct(int idx)
*/
static void kvm_vcpu_pmu_write_evtype_direct(int idx, u32 val)
{
- switch (idx) {
- PMEVTYPER_CASES(WRITE);
- case ARMV8_PMU_CYCLE_IDX:
- write_sysreg(val, pmccfiltr_el0);
- break;
- default:
- WARN_ON(1);
- }
+ if (idx == ARMV8_PMU_CYCLE_IDX)
+ write_pmccfiltr(val);
+ else
+ write_pmevtypern(idx, val);
}
/*
--
2.43.0
^ permalink raw reply related [flat|nested] 29+ messages in thread
* [PATCH v2 09/12] KVM: arm64: pmu: Use generated define for PMSELR_EL0.SEL access
2024-06-26 22:32 [PATCH v2 00/12] arm64: Add support for Armv9.4 PMU fixed instruction counter Rob Herring (Arm)
` (7 preceding siblings ...)
2024-06-26 22:32 ` [PATCH v2 08/12] KVM: arm64: pmu: Use arm_pmuv3.h register accessors Rob Herring (Arm)
@ 2024-06-26 22:32 ` Rob Herring (Arm)
2024-06-27 10:47 ` Marc Zyngier
2024-06-26 22:32 ` [PATCH v2 10/12] arm64: perf/kvm: Use a common PMU cycle counter define Rob Herring (Arm)
` (3 subsequent siblings)
12 siblings, 1 reply; 29+ messages in thread
From: Rob Herring (Arm) @ 2024-06-26 22:32 UTC (permalink / raw)
To: Russell King, Peter Zijlstra, Ingo Molnar,
Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter,
Will Deacon, Marc Zyngier, Oliver Upton, James Morse,
Suzuki K Poulose, Zenghui Yu, Catalin Marinas, James Clark
Cc: linux-arm-kernel, linux-kernel, linux-perf-users, kvmarm
ARMV8_PMU_COUNTER_MASK is really a mask for the PMSELR_EL0.SEL register
field. Make that clear by adding a standard sysreg definition for the
register, and using it instead.
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
---
arch/arm64/include/asm/sysreg.h | 1 -
arch/arm64/kvm/sys_regs.c | 10 +++++-----
arch/arm64/tools/sysreg | 5 +++++
include/linux/perf/arm_pmuv3.h | 1 -
4 files changed, 10 insertions(+), 7 deletions(-)
diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index af3b206fa423..b0d6c33f9ecc 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -403,7 +403,6 @@
#define SYS_PMCNTENCLR_EL0 sys_reg(3, 3, 9, 12, 2)
#define SYS_PMOVSCLR_EL0 sys_reg(3, 3, 9, 12, 3)
#define SYS_PMSWINC_EL0 sys_reg(3, 3, 9, 12, 4)
-#define SYS_PMSELR_EL0 sys_reg(3, 3, 9, 12, 5)
#define SYS_PMCEID0_EL0 sys_reg(3, 3, 9, 12, 6)
#define SYS_PMCEID1_EL0 sys_reg(3, 3, 9, 12, 7)
#define SYS_PMCCNTR_EL0 sys_reg(3, 3, 9, 13, 0)
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 22b45a15d068..f8b5db48ea8a 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -880,7 +880,7 @@ static u64 reset_pmevtyper(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r)
static u64 reset_pmselr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r)
{
reset_unknown(vcpu, r);
- __vcpu_sys_reg(vcpu, r->reg) &= ARMV8_PMU_COUNTER_MASK;
+ __vcpu_sys_reg(vcpu, r->reg) &= PMSELR_EL0_SEL_MASK;
return __vcpu_sys_reg(vcpu, r->reg);
}
@@ -972,7 +972,7 @@ static bool access_pmselr(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
else
/* return PMSELR.SEL field */
p->regval = __vcpu_sys_reg(vcpu, PMSELR_EL0)
- & ARMV8_PMU_COUNTER_MASK;
+ & PMSELR_EL0_SEL_MASK;
return true;
}
@@ -1040,8 +1040,8 @@ static bool access_pmu_evcntr(struct kvm_vcpu *vcpu,
if (pmu_access_event_counter_el0_disabled(vcpu))
return false;
- idx = __vcpu_sys_reg(vcpu, PMSELR_EL0)
- & ARMV8_PMU_COUNTER_MASK;
+ idx = SYS_FIELD_GET(PMSELR_EL0, SEL,
+ __vcpu_sys_reg(vcpu, PMSELR_EL0));
} else if (r->Op2 == 0) {
/* PMCCNTR_EL0 */
if (pmu_access_cycle_counter_el0_disabled(vcpu))
@@ -1091,7 +1091,7 @@ static bool access_pmu_evtyper(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
if (r->CRn == 9 && r->CRm == 13 && r->Op2 == 1) {
/* PMXEVTYPER_EL0 */
- idx = __vcpu_sys_reg(vcpu, PMSELR_EL0) & ARMV8_PMU_COUNTER_MASK;
+ idx = SYS_FIELD_GET(PMSELR_EL0, SEL, __vcpu_sys_reg(vcpu, PMSELR_EL0));
reg = PMEVTYPER0_EL0 + idx;
} else if (r->CRn == 14 && (r->CRm & 12) == 12) {
idx = ((r->CRm & 3) << 3) | (r->Op2 & 7);
diff --git a/arch/arm64/tools/sysreg b/arch/arm64/tools/sysreg
index a4c1dd4741a4..231817a379b5 100644
--- a/arch/arm64/tools/sysreg
+++ b/arch/arm64/tools/sysreg
@@ -2153,6 +2153,11 @@ Field 4 P
Field 3:0 ALIGN
EndSysreg
+Sysreg PMSELR_EL0 3 3 9 12 5
+Res0 63:5
+Field 4:0 SEL
+EndSysreg
+
SysregFields CONTEXTIDR_ELx
Res0 63:32
Field 31:0 PROCID
diff --git a/include/linux/perf/arm_pmuv3.h b/include/linux/perf/arm_pmuv3.h
index eccbdd8eb98f..792b8e10b72a 100644
--- a/include/linux/perf/arm_pmuv3.h
+++ b/include/linux/perf/arm_pmuv3.h
@@ -8,7 +8,6 @@
#define ARMV8_PMU_MAX_GENERAL_COUNTERS 31
#define ARMV8_PMU_MAX_COUNTERS 32
-#define ARMV8_PMU_COUNTER_MASK (ARMV8_PMU_MAX_COUNTERS - 1)
/*
* Common architectural and microarchitectural event numbers.
--
2.43.0
^ permalink raw reply related [flat|nested] 29+ messages in thread
* [PATCH v2 10/12] arm64: perf/kvm: Use a common PMU cycle counter define
2024-06-26 22:32 [PATCH v2 00/12] arm64: Add support for Armv9.4 PMU fixed instruction counter Rob Herring (Arm)
` (8 preceding siblings ...)
2024-06-26 22:32 ` [PATCH v2 09/12] KVM: arm64: pmu: Use generated define for PMSELR_EL0.SEL access Rob Herring (Arm)
@ 2024-06-26 22:32 ` Rob Herring (Arm)
2024-06-27 10:48 ` Marc Zyngier
2024-07-01 17:07 ` Mark Rutland
2024-06-26 22:32 ` [PATCH v2 11/12] KVM: arm64: Refine PMU defines for number of counters Rob Herring (Arm)
` (2 subsequent siblings)
12 siblings, 2 replies; 29+ messages in thread
From: Rob Herring (Arm) @ 2024-06-26 22:32 UTC (permalink / raw)
To: Russell King, Peter Zijlstra, Ingo Molnar,
Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter,
Will Deacon, Marc Zyngier, Oliver Upton, James Morse,
Suzuki K Poulose, Zenghui Yu, Catalin Marinas, James Clark
Cc: linux-arm-kernel, linux-kernel, linux-perf-users, kvmarm
The PMUv3 and KVM code each have a define for the PMU cycle counter
index. Move KVM's define to a shared location and use it for the
PMUv3 driver.
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
---
v2:
- Move ARMV8_PMU_CYCLE_IDX to linux/perf/arm_pmuv3.h
---
arch/arm64/kvm/sys_regs.c | 1 +
drivers/perf/arm_pmuv3.c | 19 +++++++------------
include/kvm/arm_pmu.h | 1 -
include/linux/perf/arm_pmuv3.h | 3 +++
4 files changed, 11 insertions(+), 13 deletions(-)
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index f8b5db48ea8a..22393ae7ce14 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -18,6 +18,7 @@
#include <linux/printk.h>
#include <linux/uaccess.h>
+#include <asm/arm_pmuv3.h>
#include <asm/cacheflush.h>
#include <asm/cputype.h>
#include <asm/debug-monitors.h>
diff --git a/drivers/perf/arm_pmuv3.c b/drivers/perf/arm_pmuv3.c
index f771242168f1..f58dff49ea7d 100644
--- a/drivers/perf/arm_pmuv3.c
+++ b/drivers/perf/arm_pmuv3.c
@@ -451,11 +451,6 @@ static const struct attribute_group armv8_pmuv3_caps_attr_group = {
.attrs = armv8_pmuv3_caps_attrs,
};
-/*
- * Perf Events' indices
- */
-#define ARMV8_IDX_CYCLE_COUNTER 31
-
/*
* We unconditionally enable ARMv8.5-PMU long event counter support
* (64-bit events) where supported. Indicate if this arm_pmu has long
@@ -574,7 +569,7 @@ static u64 armv8pmu_read_counter(struct perf_event *event)
int idx = hwc->idx;
u64 value;
- if (idx == ARMV8_IDX_CYCLE_COUNTER)
+ if (idx == ARMV8_PMU_CYCLE_IDX)
value = read_pmccntr();
else
value = armv8pmu_read_hw_counter(event);
@@ -607,7 +602,7 @@ static void armv8pmu_write_counter(struct perf_event *event, u64 value)
value = armv8pmu_bias_long_counter(event, value);
- if (idx == ARMV8_IDX_CYCLE_COUNTER)
+ if (idx == ARMV8_PMU_CYCLE_IDX)
write_pmccntr(value);
else
armv8pmu_write_hw_counter(event, value);
@@ -644,7 +639,7 @@ static void armv8pmu_write_event_type(struct perf_event *event)
armv8pmu_write_evtype(idx - 1, hwc->config_base);
armv8pmu_write_evtype(idx, chain_evt);
} else {
- if (idx == ARMV8_IDX_CYCLE_COUNTER)
+ if (idx == ARMV8_PMU_CYCLE_IDX)
write_pmccfiltr(hwc->config_base);
else
armv8pmu_write_evtype(idx, hwc->config_base);
@@ -772,7 +767,7 @@ static void armv8pmu_enable_user_access(struct arm_pmu *cpu_pmu)
/* Clear any unused counters to avoid leaking their contents */
for_each_andnot_bit(i, cpu_pmu->cntr_mask, cpuc->used_mask,
ARMPMU_MAX_HWEVENTS) {
- if (i == ARMV8_IDX_CYCLE_COUNTER)
+ if (i == ARMV8_PMU_CYCLE_IDX)
write_pmccntr(0);
else
armv8pmu_write_evcntr(i, 0);
@@ -933,8 +928,8 @@ static int armv8pmu_get_event_idx(struct pmu_hw_events *cpuc,
/* Always prefer to place a cycle counter into the cycle counter. */
if ((evtype == ARMV8_PMUV3_PERFCTR_CPU_CYCLES) &&
!armv8pmu_event_get_threshold(&event->attr)) {
- if (!test_and_set_bit(ARMV8_IDX_CYCLE_COUNTER, cpuc->used_mask))
- return ARMV8_IDX_CYCLE_COUNTER;
+ if (!test_and_set_bit(ARMV8_PMU_CYCLE_IDX, cpuc->used_mask))
+ return ARMV8_PMU_CYCLE_IDX;
else if (armv8pmu_event_is_64bit(event) &&
armv8pmu_event_want_user_access(event) &&
!armv8pmu_has_long_event(cpu_pmu))
@@ -1196,7 +1191,7 @@ static void __armv8pmu_probe_pmu(void *info)
0, FIELD_GET(ARMV8_PMU_PMCR_N, armv8pmu_pmcr_read()));
/* Add the CPU cycles counter */
- set_bit(ARMV8_IDX_CYCLE_COUNTER, cpu_pmu->cntr_mask);
+ set_bit(ARMV8_PMU_CYCLE_IDX, cpu_pmu->cntr_mask);
pmceid[0] = pmceid_raw[0] = read_pmceid0();
pmceid[1] = pmceid_raw[1] = read_pmceid1();
diff --git a/include/kvm/arm_pmu.h b/include/kvm/arm_pmu.h
index 334d7c5503cf..871067fb2616 100644
--- a/include/kvm/arm_pmu.h
+++ b/include/kvm/arm_pmu.h
@@ -10,7 +10,6 @@
#include <linux/perf_event.h>
#include <linux/perf/arm_pmuv3.h>
-#define ARMV8_PMU_CYCLE_IDX (ARMV8_PMU_MAX_COUNTERS - 1)
#if IS_ENABLED(CONFIG_HW_PERF_EVENTS) && IS_ENABLED(CONFIG_KVM)
struct kvm_pmc {
diff --git a/include/linux/perf/arm_pmuv3.h b/include/linux/perf/arm_pmuv3.h
index 792b8e10b72a..f4ec76f725a3 100644
--- a/include/linux/perf/arm_pmuv3.h
+++ b/include/linux/perf/arm_pmuv3.h
@@ -9,6 +9,9 @@
#define ARMV8_PMU_MAX_GENERAL_COUNTERS 31
#define ARMV8_PMU_MAX_COUNTERS 32
+#define ARMV8_PMU_CYCLE_IDX 31
+
+
/*
* Common architectural and microarchitectural event numbers.
*/
--
2.43.0
^ permalink raw reply related [flat|nested] 29+ messages in thread
* [PATCH v2 11/12] KVM: arm64: Refine PMU defines for number of counters
2024-06-26 22:32 [PATCH v2 00/12] arm64: Add support for Armv9.4 PMU fixed instruction counter Rob Herring (Arm)
` (9 preceding siblings ...)
2024-06-26 22:32 ` [PATCH v2 10/12] arm64: perf/kvm: Use a common PMU cycle counter define Rob Herring (Arm)
@ 2024-06-26 22:32 ` Rob Herring (Arm)
2024-06-27 10:54 ` Marc Zyngier
2024-06-26 22:32 ` [PATCH v2 12/12] perf: arm_pmuv3: Add support for Armv9.4 PMU instruction counter Rob Herring (Arm)
2024-07-03 14:38 ` [PATCH v2 00/12] arm64: Add support for Armv9.4 PMU fixed " Will Deacon
12 siblings, 1 reply; 29+ messages in thread
From: Rob Herring (Arm) @ 2024-06-26 22:32 UTC (permalink / raw)
To: Russell King, Peter Zijlstra, Ingo Molnar,
Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter,
Will Deacon, Marc Zyngier, Oliver Upton, James Morse,
Suzuki K Poulose, Zenghui Yu, Catalin Marinas, James Clark
Cc: linux-arm-kernel, linux-kernel, linux-perf-users, kvmarm
There are 2 defines for the number of PMU counters:
ARMV8_PMU_MAX_COUNTERS and ARMPMU_MAX_HWEVENTS. Both are the same
currently, but Armv9.4/8.9 increases the number of possible counters
from 32 to 33. With this change, the maximum number of counters will
differ for KVM's PMU emulation, which is PMUv3.4. Give KVM PMU emulation
its own define to decouple it from the rest of the kernel's number of
PMU counters.
The VHE PMU code needs to match the PMU driver, so switch it to use
ARMPMU_MAX_HWEVENTS instead.
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
---
arch/arm64/kvm/pmu-emul.c | 8 ++++----
arch/arm64/kvm/pmu.c | 5 +++--
include/kvm/arm_pmu.h | 3 ++-
include/linux/perf/arm_pmuv3.h | 2 --
4 files changed, 9 insertions(+), 9 deletions(-)
diff --git a/arch/arm64/kvm/pmu-emul.c b/arch/arm64/kvm/pmu-emul.c
index 69be070a9378..566a0e120306 100644
--- a/arch/arm64/kvm/pmu-emul.c
+++ b/arch/arm64/kvm/pmu-emul.c
@@ -233,7 +233,7 @@ void kvm_pmu_vcpu_init(struct kvm_vcpu *vcpu)
int i;
struct kvm_pmu *pmu = &vcpu->arch.pmu;
- for (i = 0; i < ARMV8_PMU_MAX_COUNTERS; i++)
+ for (i = 0; i < KVM_ARMV8_PMU_MAX_COUNTERS; i++)
pmu->pmc[i].idx = i;
}
@@ -260,7 +260,7 @@ void kvm_pmu_vcpu_destroy(struct kvm_vcpu *vcpu)
{
int i;
- for (i = 0; i < ARMV8_PMU_MAX_COUNTERS; i++)
+ for (i = 0; i < KVM_ARMV8_PMU_MAX_COUNTERS; i++)
kvm_pmu_release_perf_event(kvm_vcpu_idx_to_pmc(vcpu, i));
irq_work_sync(&vcpu->arch.pmu.overflow_work);
}
@@ -291,7 +291,7 @@ void kvm_pmu_enable_counter_mask(struct kvm_vcpu *vcpu, u64 val)
if (!(kvm_vcpu_read_pmcr(vcpu) & ARMV8_PMU_PMCR_E) || !val)
return;
- for (i = 0; i < ARMV8_PMU_MAX_COUNTERS; i++) {
+ for (i = 0; i < KVM_ARMV8_PMU_MAX_COUNTERS; i++) {
struct kvm_pmc *pmc;
if (!(val & BIT(i)))
@@ -323,7 +323,7 @@ void kvm_pmu_disable_counter_mask(struct kvm_vcpu *vcpu, u64 val)
if (!kvm_vcpu_has_pmu(vcpu) || !val)
return;
- for (i = 0; i < ARMV8_PMU_MAX_COUNTERS; i++) {
+ for (i = 0; i < KVM_ARMV8_PMU_MAX_COUNTERS; i++) {
struct kvm_pmc *pmc;
if (!(val & BIT(i)))
diff --git a/arch/arm64/kvm/pmu.c b/arch/arm64/kvm/pmu.c
index a47ae311d4a8..215b74875815 100644
--- a/arch/arm64/kvm/pmu.c
+++ b/arch/arm64/kvm/pmu.c
@@ -5,6 +5,7 @@
*/
#include <linux/kvm_host.h>
#include <linux/perf_event.h>
+#include <linux/perf/arm_pmu.h>
#include <linux/perf/arm_pmuv3.h>
static DEFINE_PER_CPU(struct kvm_pmu_events, kvm_pmu_events);
@@ -95,7 +96,7 @@ static void kvm_vcpu_pmu_enable_el0(unsigned long events)
u64 typer;
u32 counter;
- for_each_set_bit(counter, &events, 32) {
+ for_each_set_bit(counter, &events, ARMPMU_MAX_HWEVENTS) {
typer = kvm_vcpu_pmu_read_evtype_direct(counter);
typer &= ~ARMV8_PMU_EXCLUDE_EL0;
kvm_vcpu_pmu_write_evtype_direct(counter, typer);
@@ -110,7 +111,7 @@ static void kvm_vcpu_pmu_disable_el0(unsigned long events)
u64 typer;
u32 counter;
- for_each_set_bit(counter, &events, 32) {
+ for_each_set_bit(counter, &events, ARMPMU_MAX_HWEVENTS) {
typer = kvm_vcpu_pmu_read_evtype_direct(counter);
typer |= ARMV8_PMU_EXCLUDE_EL0;
kvm_vcpu_pmu_write_evtype_direct(counter, typer);
diff --git a/include/kvm/arm_pmu.h b/include/kvm/arm_pmu.h
index 871067fb2616..e08aeec5d936 100644
--- a/include/kvm/arm_pmu.h
+++ b/include/kvm/arm_pmu.h
@@ -10,6 +10,7 @@
#include <linux/perf_event.h>
#include <linux/perf/arm_pmuv3.h>
+#define KVM_ARMV8_PMU_MAX_COUNTERS 32
#if IS_ENABLED(CONFIG_HW_PERF_EVENTS) && IS_ENABLED(CONFIG_KVM)
struct kvm_pmc {
@@ -25,7 +26,7 @@ struct kvm_pmu_events {
struct kvm_pmu {
struct irq_work overflow_work;
struct kvm_pmu_events events;
- struct kvm_pmc pmc[ARMV8_PMU_MAX_COUNTERS];
+ struct kvm_pmc pmc[KVM_ARMV8_PMU_MAX_COUNTERS];
int irq_num;
bool created;
bool irq_level;
diff --git a/include/linux/perf/arm_pmuv3.h b/include/linux/perf/arm_pmuv3.h
index f4ec76f725a3..4f7a7f2222e5 100644
--- a/include/linux/perf/arm_pmuv3.h
+++ b/include/linux/perf/arm_pmuv3.h
@@ -7,8 +7,6 @@
#define __PERF_ARM_PMUV3_H
#define ARMV8_PMU_MAX_GENERAL_COUNTERS 31
-#define ARMV8_PMU_MAX_COUNTERS 32
-
#define ARMV8_PMU_CYCLE_IDX 31
--
2.43.0
^ permalink raw reply related [flat|nested] 29+ messages in thread
* [PATCH v2 12/12] perf: arm_pmuv3: Add support for Armv9.4 PMU instruction counter
2024-06-26 22:32 [PATCH v2 00/12] arm64: Add support for Armv9.4 PMU fixed instruction counter Rob Herring (Arm)
` (10 preceding siblings ...)
2024-06-26 22:32 ` [PATCH v2 11/12] KVM: arm64: Refine PMU defines for number of counters Rob Herring (Arm)
@ 2024-06-26 22:32 ` Rob Herring (Arm)
2024-07-01 17:20 ` Mark Rutland
2024-07-03 14:38 ` [PATCH v2 00/12] arm64: Add support for Armv9.4 PMU fixed " Will Deacon
12 siblings, 1 reply; 29+ messages in thread
From: Rob Herring (Arm) @ 2024-06-26 22:32 UTC (permalink / raw)
To: Russell King, Peter Zijlstra, Ingo Molnar,
Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter,
Will Deacon, Marc Zyngier, Oliver Upton, James Morse,
Suzuki K Poulose, Zenghui Yu, Catalin Marinas, James Clark
Cc: linux-arm-kernel, linux-kernel, linux-perf-users, kvmarm
Armv9.4/8.9 PMU adds optional support for a fixed instruction counter
similar to the fixed cycle counter. Support for the feature is indicated
in the ID_AA64DFR1_EL1 register PMICNTR field. The counter is not
accessible in AArch32.
Existing userspace using direct counter access won't know how to handle
the fixed instruction counter, so we have to avoid using the counter
when user access is requested.
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
---
v2:
- Use set_bit() instead of bitmap_set()
- Check for ARMV8_PMUV3_PERFCTR_INST_RETIRED first in counter assignment
- Check for threshold disabled in counter assignment
---
arch/arm/include/asm/arm_pmuv3.h | 20 ++++++++++++++++++++
arch/arm64/include/asm/arm_pmuv3.h | 28 ++++++++++++++++++++++++++++
arch/arm64/kvm/pmu.c | 8 ++++++--
arch/arm64/tools/sysreg | 25 +++++++++++++++++++++++++
drivers/perf/arm_pmuv3.c | 25 +++++++++++++++++++++++++
include/linux/perf/arm_pmu.h | 8 ++++++--
include/linux/perf/arm_pmuv3.h | 6 ++++--
7 files changed, 114 insertions(+), 6 deletions(-)
diff --git a/arch/arm/include/asm/arm_pmuv3.h b/arch/arm/include/asm/arm_pmuv3.h
index a41b503b7dcd..f63ba8986b24 100644
--- a/arch/arm/include/asm/arm_pmuv3.h
+++ b/arch/arm/include/asm/arm_pmuv3.h
@@ -127,6 +127,12 @@ static inline u32 read_pmuver(void)
return (dfr0 >> 24) & 0xf;
}
+static inline bool pmuv3_has_icntr(void)
+{
+ /* FEAT_PMUv3_ICNTR not accessible for 32-bit */
+ return false;
+}
+
static inline void write_pmcr(u32 val)
{
write_sysreg(val, PMCR);
@@ -152,6 +158,13 @@ static inline u64 read_pmccntr(void)
return read_sysreg(PMCCNTR);
}
+static inline void write_pmicntr(u64 val) {}
+
+static inline u64 read_pmicntr(void)
+{
+ return 0;
+}
+
static inline void write_pmcntenset(u32 val)
{
write_sysreg(val, PMCNTENSET);
@@ -177,6 +190,13 @@ static inline void write_pmccfiltr(u32 val)
write_sysreg(val, PMCCFILTR);
}
+static inline void write_pmicfiltr(u64 val) {}
+
+static inline u64 read_pmicfiltr(void)
+{
+ return 0;
+}
+
static inline void write_pmovsclr(u32 val)
{
write_sysreg(val, PMOVSR);
diff --git a/arch/arm64/include/asm/arm_pmuv3.h b/arch/arm64/include/asm/arm_pmuv3.h
index 36c3e82b4eec..468a049bc63b 100644
--- a/arch/arm64/include/asm/arm_pmuv3.h
+++ b/arch/arm64/include/asm/arm_pmuv3.h
@@ -54,6 +54,14 @@ static inline u32 read_pmuver(void)
ID_AA64DFR0_EL1_PMUVer_SHIFT);
}
+static inline bool pmuv3_has_icntr(void)
+{
+ u64 dfr1 = read_sysreg(id_aa64dfr1_el1);
+
+ return !!cpuid_feature_extract_unsigned_field(dfr1,
+ ID_AA64DFR1_EL1_PMICNTR_SHIFT);
+}
+
static inline void write_pmcr(u64 val)
{
write_sysreg(val, pmcr_el0);
@@ -79,6 +87,16 @@ static inline u64 read_pmccntr(void)
return read_sysreg(pmccntr_el0);
}
+static inline void write_pmicntr(u64 val)
+{
+ write_sysreg_s(val, SYS_PMICNTR_EL0);
+}
+
+static inline u64 read_pmicntr(void)
+{
+ return read_sysreg_s(SYS_PMICNTR_EL0);
+}
+
static inline void write_pmcntenset(u64 val)
{
write_sysreg(val, pmcntenset_el0);
@@ -109,6 +127,16 @@ static inline u64 read_pmccfiltr(void)
return read_sysreg(pmccfiltr_el0);
}
+static inline void write_pmicfiltr(u64 val)
+{
+ write_sysreg_s(val, SYS_PMICFILTR_EL0);
+}
+
+static inline u64 read_pmicfiltr(void)
+{
+ return read_sysreg_s(SYS_PMICFILTR_EL0);
+}
+
static inline void write_pmovsclr(u64 val)
{
write_sysreg(val, pmovsclr_el0);
diff --git a/arch/arm64/kvm/pmu.c b/arch/arm64/kvm/pmu.c
index 215b74875815..0b3adf3e17b4 100644
--- a/arch/arm64/kvm/pmu.c
+++ b/arch/arm64/kvm/pmu.c
@@ -66,24 +66,28 @@ void kvm_clr_pmu_events(u64 clr)
/*
* Read a value direct from PMEVTYPER<idx> where idx is 0-30
- * or PMCCFILTR_EL0 where idx is ARMV8_PMU_CYCLE_IDX (31).
+ * or PMxCFILTR_EL0 where idx is 31-32.
*/
static u64 kvm_vcpu_pmu_read_evtype_direct(int idx)
{
if (idx == ARMV8_PMU_CYCLE_IDX)
return read_pmccfiltr();
+ else if (idx == ARMV8_PMU_INSTR_IDX)
+ return read_pmicfiltr();
return read_pmevtypern(idx);
}
/*
* Write a value direct to PMEVTYPER<idx> where idx is 0-30
- * or PMCCFILTR_EL0 where idx is ARMV8_PMU_CYCLE_IDX (31).
+ * or PMxCFILTR_EL0 where idx is 31-32.
*/
static void kvm_vcpu_pmu_write_evtype_direct(int idx, u32 val)
{
if (idx == ARMV8_PMU_CYCLE_IDX)
write_pmccfiltr(val);
+ else if (idx == ARMV8_PMU_INSTR_IDX)
+ write_pmicfiltr(val);
else
write_pmevtypern(idx, val);
}
diff --git a/arch/arm64/tools/sysreg b/arch/arm64/tools/sysreg
index 231817a379b5..8ab6e09871de 100644
--- a/arch/arm64/tools/sysreg
+++ b/arch/arm64/tools/sysreg
@@ -2029,6 +2029,31 @@ Sysreg FAR_EL1 3 0 6 0 0
Field 63:0 ADDR
EndSysreg
+Sysreg PMICNTR_EL0 3 3 9 4 0
+Field 63:0 ICNT
+EndSysreg
+
+Sysreg PMICFILTR_EL0 3 3 9 6 0
+Res0 63:59
+Field 58 SYNC
+Field 57:56 VS
+Res0 55:32
+Field 31 P
+Field 30 U
+Field 29 NSK
+Field 28 NSU
+Field 27 NSH
+Field 26 M
+Res0 25
+Field 24 SH
+Field 23 T
+Field 22 RLK
+Field 21 RLU
+Field 20 RLH
+Res0 19:16
+Field 15:0 evtCount
+EndSysreg
+
Sysreg PMSCR_EL1 3 0 9 9 0
Res0 63:8
Field 7:6 PCT
diff --git a/drivers/perf/arm_pmuv3.c b/drivers/perf/arm_pmuv3.c
index f58dff49ea7d..3b3a3334cc3f 100644
--- a/drivers/perf/arm_pmuv3.c
+++ b/drivers/perf/arm_pmuv3.c
@@ -571,6 +571,8 @@ static u64 armv8pmu_read_counter(struct perf_event *event)
if (idx == ARMV8_PMU_CYCLE_IDX)
value = read_pmccntr();
+ else if (idx == ARMV8_PMU_INSTR_IDX)
+ value = read_pmicntr();
else
value = armv8pmu_read_hw_counter(event);
@@ -604,6 +606,8 @@ static void armv8pmu_write_counter(struct perf_event *event, u64 value)
if (idx == ARMV8_PMU_CYCLE_IDX)
write_pmccntr(value);
+ else if (idx == ARMV8_PMU_INSTR_IDX)
+ write_pmicntr(value);
else
armv8pmu_write_hw_counter(event, value);
}
@@ -641,6 +645,8 @@ static void armv8pmu_write_event_type(struct perf_event *event)
} else {
if (idx == ARMV8_PMU_CYCLE_IDX)
write_pmccfiltr(hwc->config_base);
+ else if (idx == ARMV8_PMU_INSTR_IDX)
+ write_pmicfiltr(hwc->config_base);
else
armv8pmu_write_evtype(idx, hwc->config_base);
}
@@ -769,6 +775,8 @@ static void armv8pmu_enable_user_access(struct arm_pmu *cpu_pmu)
ARMPMU_MAX_HWEVENTS) {
if (i == ARMV8_PMU_CYCLE_IDX)
write_pmccntr(0);
+ else if (i == ARMV8_PMU_INSTR_IDX)
+ write_pmicntr(0);
else
armv8pmu_write_evcntr(i, 0);
}
@@ -936,6 +944,19 @@ static int armv8pmu_get_event_idx(struct pmu_hw_events *cpuc,
return -EAGAIN;
}
+ /*
+ * Always prefer to place an instruction counter into the instruction counter,
+ * but don't expose the instruction counter to userspace access as userspace
+ * may not know how to handle it.
+ */
+ if ((evtype == ARMV8_PMUV3_PERFCTR_INST_RETIRED) &&
+ !armv8pmu_event_get_threshold(&event->attr) &&
+ test_bit(ARMV8_PMU_INSTR_IDX, cpu_pmu->cntr_mask) &&
+ !armv8pmu_event_want_user_access(event)) {
+ if (!test_and_set_bit(ARMV8_PMU_INSTR_IDX, cpuc->used_mask))
+ return ARMV8_PMU_INSTR_IDX;
+ }
+
/*
* Otherwise use events counters
*/
@@ -1193,6 +1214,10 @@ static void __armv8pmu_probe_pmu(void *info)
/* Add the CPU cycles counter */
set_bit(ARMV8_PMU_CYCLE_IDX, cpu_pmu->cntr_mask);
+ /* Add the CPU instructions counter */
+ if (pmuv3_has_icntr())
+ set_bit(ARMV8_PMU_INSTR_IDX, cpu_pmu->cntr_mask);
+
pmceid[0] = pmceid_raw[0] = read_pmceid0();
pmceid[1] = pmceid_raw[1] = read_pmceid1();
diff --git a/include/linux/perf/arm_pmu.h b/include/linux/perf/arm_pmu.h
index e5d6d204beab..4b5b83677e3f 100644
--- a/include/linux/perf/arm_pmu.h
+++ b/include/linux/perf/arm_pmu.h
@@ -17,10 +17,14 @@
#ifdef CONFIG_ARM_PMU
/*
- * The ARMv7 CPU PMU supports up to 32 event counters.
+ * The Armv7 and Armv8.8 or earlier CPU PMUs support up to 32 event counters.
+ * The Armv8.9/9.4 CPU PMU supports up to 33 event counters.
*/
+#ifdef CONFIG_ARM
#define ARMPMU_MAX_HWEVENTS 32
-
+#else
+#define ARMPMU_MAX_HWEVENTS 33
+#endif
/*
* ARM PMU hw_event flags
*/
diff --git a/include/linux/perf/arm_pmuv3.h b/include/linux/perf/arm_pmuv3.h
index 4f7a7f2222e5..3372c1b56486 100644
--- a/include/linux/perf/arm_pmuv3.h
+++ b/include/linux/perf/arm_pmuv3.h
@@ -8,7 +8,7 @@
#define ARMV8_PMU_MAX_GENERAL_COUNTERS 31
#define ARMV8_PMU_CYCLE_IDX 31
-
+#define ARMV8_PMU_INSTR_IDX 32 /* Not accessible from AArch32 */
/*
* Common architectural and microarchitectural event numbers.
@@ -228,8 +228,10 @@
*/
#define ARMV8_PMU_OVSR_P GENMASK(30, 0)
#define ARMV8_PMU_OVSR_C BIT(31)
+#define ARMV8_PMU_OVSR_F BIT_ULL(32) /* arm64 only */
/* Mask for writable bits is both P and C fields */
-#define ARMV8_PMU_OVERFLOWED_MASK (ARMV8_PMU_OVSR_P | ARMV8_PMU_OVSR_C)
+#define ARMV8_PMU_OVERFLOWED_MASK (ARMV8_PMU_OVSR_P | ARMV8_PMU_OVSR_C | \
+ ARMV8_PMU_OVSR_F)
/*
* PMXEVTYPER: Event selection reg
--
2.43.0
^ permalink raw reply related [flat|nested] 29+ messages in thread
* Re: [PATCH v2 08/12] KVM: arm64: pmu: Use arm_pmuv3.h register accessors
2024-06-26 22:32 ` [PATCH v2 08/12] KVM: arm64: pmu: Use arm_pmuv3.h register accessors Rob Herring (Arm)
@ 2024-06-27 10:47 ` Marc Zyngier
0 siblings, 0 replies; 29+ messages in thread
From: Marc Zyngier @ 2024-06-27 10:47 UTC (permalink / raw)
To: Rob Herring (Arm)
Cc: Russell King, Peter Zijlstra, Ingo Molnar,
Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter,
Will Deacon, Oliver Upton, James Morse, Suzuki K Poulose,
Zenghui Yu, Catalin Marinas, James Clark, linux-arm-kernel,
linux-kernel, linux-perf-users, kvmarm
On Wed, 26 Jun 2024 23:32:32 +0100,
"Rob Herring (Arm)" <robh@kernel.org> wrote:
>
> Commit df29ddf4f04b ("arm64: perf: Abstract system register accesses
> away") split off PMU register accessor functions to a standalone header.
> Let's use it for KVM PMU code and get rid one copy of the ugly switch
> macro.
>
> Acked-by: Mark Rutland <mark.rutland@arm.com>
> Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
Reviewed-by: Marc Zyngier <maz@kernel.org>
M.
--
Without deviation from the norm, progress is not possible.
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH v2 09/12] KVM: arm64: pmu: Use generated define for PMSELR_EL0.SEL access
2024-06-26 22:32 ` [PATCH v2 09/12] KVM: arm64: pmu: Use generated define for PMSELR_EL0.SEL access Rob Herring (Arm)
@ 2024-06-27 10:47 ` Marc Zyngier
0 siblings, 0 replies; 29+ messages in thread
From: Marc Zyngier @ 2024-06-27 10:47 UTC (permalink / raw)
To: Rob Herring (Arm)
Cc: Russell King, Peter Zijlstra, Ingo Molnar,
Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter,
Will Deacon, Oliver Upton, James Morse, Suzuki K Poulose,
Zenghui Yu, Catalin Marinas, James Clark, linux-arm-kernel,
linux-kernel, linux-perf-users, kvmarm
On Wed, 26 Jun 2024 23:32:33 +0100,
"Rob Herring (Arm)" <robh@kernel.org> wrote:
>
> ARMV8_PMU_COUNTER_MASK is really a mask for the PMSELR_EL0.SEL register
> field. Make that clear by adding a standard sysreg definition for the
> register, and using it instead.
>
> Reviewed-by: Mark Rutland <mark.rutland@arm.com>
> Acked-by: Mark Rutland <mark.rutland@arm.com>
> Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
Reviewed-by: Marc Zyngier <maz@kernel.org>
M.
--
Without deviation from the norm, progress is not possible.
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH v2 10/12] arm64: perf/kvm: Use a common PMU cycle counter define
2024-06-26 22:32 ` [PATCH v2 10/12] arm64: perf/kvm: Use a common PMU cycle counter define Rob Herring (Arm)
@ 2024-06-27 10:48 ` Marc Zyngier
2024-07-01 17:07 ` Mark Rutland
1 sibling, 0 replies; 29+ messages in thread
From: Marc Zyngier @ 2024-06-27 10:48 UTC (permalink / raw)
To: Rob Herring (Arm)
Cc: Russell King, Peter Zijlstra, Ingo Molnar,
Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter,
Will Deacon, Oliver Upton, James Morse, Suzuki K Poulose,
Zenghui Yu, Catalin Marinas, James Clark, linux-arm-kernel,
linux-kernel, linux-perf-users, kvmarm
On Wed, 26 Jun 2024 23:32:34 +0100,
"Rob Herring (Arm)" <robh@kernel.org> wrote:
>
> The PMUv3 and KVM code each have a define for the PMU cycle counter
> index. Move KVM's define to a shared location and use it for the
> PMUv3 driver.
>
> Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
Reviewed-by: Marc Zyngier <maz@kernel.org>
M.
--
Without deviation from the norm, progress is not possible.
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH v2 11/12] KVM: arm64: Refine PMU defines for number of counters
2024-06-26 22:32 ` [PATCH v2 11/12] KVM: arm64: Refine PMU defines for number of counters Rob Herring (Arm)
@ 2024-06-27 10:54 ` Marc Zyngier
0 siblings, 0 replies; 29+ messages in thread
From: Marc Zyngier @ 2024-06-27 10:54 UTC (permalink / raw)
To: Rob Herring (Arm)
Cc: Russell King, Peter Zijlstra, Ingo Molnar,
Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter,
Will Deacon, Oliver Upton, James Morse, Suzuki K Poulose,
Zenghui Yu, Catalin Marinas, James Clark, linux-arm-kernel,
linux-kernel, linux-perf-users, kvmarm
On Wed, 26 Jun 2024 23:32:35 +0100,
"Rob Herring (Arm)" <robh@kernel.org> wrote:
>
> There are 2 defines for the number of PMU counters:
> ARMV8_PMU_MAX_COUNTERS and ARMPMU_MAX_HWEVENTS. Both are the same
> currently, but Armv9.4/8.9 increases the number of possible counters
> from 32 to 33. With this change, the maximum number of counters will
> differ for KVM's PMU emulation which is PMUv3.4. Give KVM PMU emulation
> its own define to decouple it from the rest of the kernel's number PMU
> counters.
>
> The VHE PMU code needs to match the PMU driver, so switch it to use
> ARMPMU_MAX_HWEVENTS instead.
>
> Acked-by: Mark Rutland <mark.rutland@arm.com>
> Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
Reviewed-by: Marc Zyngier <maz@kernel.org>
M.
--
Without deviation from the norm, progress is not possible.
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH v2 06/12] perf: arm_pmu: Remove event index to counter remapping
2024-06-26 22:32 ` [PATCH v2 06/12] perf: arm_pmu: Remove event index to counter remapping Rob Herring (Arm)
@ 2024-06-27 11:05 ` Marc Zyngier
2024-07-01 13:52 ` Will Deacon
2024-07-01 17:06 ` Mark Rutland
1 sibling, 1 reply; 29+ messages in thread
From: Marc Zyngier @ 2024-06-27 11:05 UTC (permalink / raw)
To: Rob Herring (Arm)
Cc: Russell King, Peter Zijlstra, Ingo Molnar,
Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter,
Will Deacon, Oliver Upton, James Morse, Suzuki K Poulose,
Zenghui Yu, Catalin Marinas, James Clark, linux-arm-kernel,
linux-kernel, linux-perf-users, kvmarm
On Wed, 26 Jun 2024 23:32:30 +0100,
"Rob Herring (Arm)" <robh@kernel.org> wrote:
>
> Xscale and Armv6 PMUs defined the cycle counter at 0 and event counters
> starting at 1 and had 1:1 event index to counter numbering. On Armv7 and
> later, this changed the cycle counter to 31 and event counters start at
> 0. The drivers for Armv7 and PMUv3 kept the old event index numbering
> and introduced an event index to counter conversion. The conversion uses
> masking to convert from event index to a counter number. This operation
> relies on having at most 32 counters so that the cycle counter index 0
> can be transformed to counter number 31.
>
> Armv9.4 adds support for an additional fixed function counter
> (instructions) which increases possible counters to more than 32, and
> the conversion won't work anymore as a simple subtract and mask. The
> primary reason for the translation (other than history) seems to be to
> have a contiguous mask of counters 0-N. Keeping that would result in
> more complicated index to counter conversions. Instead, store a mask of
> available counters rather than just number of events. That provides more
> information in addition to the number of events.
>
> No (intended) functional changes.
>
> Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
[...]
> diff --git a/include/linux/perf/arm_pmu.h b/include/linux/perf/arm_pmu.h
> index b3b34f6670cf..e5d6d204beab 100644
> --- a/include/linux/perf/arm_pmu.h
> +++ b/include/linux/perf/arm_pmu.h
> @@ -96,7 +96,7 @@ struct arm_pmu {
> void (*stop)(struct arm_pmu *);
> void (*reset)(void *);
> int (*map_event)(struct perf_event *event);
> - int num_events;
> + DECLARE_BITMAP(cntr_mask, ARMPMU_MAX_HWEVENTS);
I'm slightly worried by this, as this size is never used, let alone
checked by the individual drivers. I can perfectly picture some new
(non-architectural) PMU driver having more counters than that, and
blindly setting bits outside of the allowed range.
One way to make it a bit safer would be to add a helper replacing the
various bitmap_set() calls, and enforcing that we never overflow this
bitmap.
Thanks,
M.
--
Without deviation from the norm, progress is not possible.
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH v2 06/12] perf: arm_pmu: Remove event index to counter remapping
2024-06-27 11:05 ` Marc Zyngier
@ 2024-07-01 13:52 ` Will Deacon
2024-07-01 15:32 ` Mark Rutland
2024-07-01 15:49 ` Rob Herring
0 siblings, 2 replies; 29+ messages in thread
From: Will Deacon @ 2024-07-01 13:52 UTC (permalink / raw)
To: Marc Zyngier
Cc: Rob Herring (Arm), Russell King, Peter Zijlstra, Ingo Molnar,
Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter,
Oliver Upton, James Morse, Suzuki K Poulose, Zenghui Yu,
Catalin Marinas, James Clark, linux-arm-kernel, linux-kernel,
linux-perf-users, kvmarm
On Thu, Jun 27, 2024 at 12:05:23PM +0100, Marc Zyngier wrote:
> On Wed, 26 Jun 2024 23:32:30 +0100,
> "Rob Herring (Arm)" <robh@kernel.org> wrote:
> >
> > Xscale and Armv6 PMUs defined the cycle counter at 0 and event counters
> > starting at 1 and had 1:1 event index to counter numbering. On Armv7 and
> > later, this changed the cycle counter to 31 and event counters start at
> > 0. The drivers for Armv7 and PMUv3 kept the old event index numbering
> > and introduced an event index to counter conversion. The conversion uses
> > masking to convert from event index to a counter number. This operation
> > relies on having at most 32 counters so that the cycle counter index 0
> > can be transformed to counter number 31.
> >
> > Armv9.4 adds support for an additional fixed function counter
> > (instructions) which increases possible counters to more than 32, and
> > the conversion won't work anymore as a simple subtract and mask. The
> > primary reason for the translation (other than history) seems to be to
> > have a contiguous mask of counters 0-N. Keeping that would result in
> > more complicated index to counter conversions. Instead, store a mask of
> > available counters rather than just number of events. That provides more
> > information in addition to the number of events.
> >
> > No (intended) functional changes.
> >
> > Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
>
> [...]
>
> > diff --git a/include/linux/perf/arm_pmu.h b/include/linux/perf/arm_pmu.h
> > index b3b34f6670cf..e5d6d204beab 100644
> > --- a/include/linux/perf/arm_pmu.h
> > +++ b/include/linux/perf/arm_pmu.h
> > @@ -96,7 +96,7 @@ struct arm_pmu {
> > void (*stop)(struct arm_pmu *);
> > void (*reset)(void *);
> > int (*map_event)(struct perf_event *event);
> > - int num_events;
> > + DECLARE_BITMAP(cntr_mask, ARMPMU_MAX_HWEVENTS);
>
> I'm slightly worried by this, as this size is never used, let alone
> checked by the individual drivers. I can perfectly picture some new
> (non-architectural) PMU driver having more counters than that, and
> blindly setting bits outside of the allowed range.
I tend to agree.
> One way to make it a bit safer would be to add a helper replacing the
> various bitmap_set() calls, and enforcing that we never overflow this
> bitmap.
Or perhaps we could leave the 'num_events' field intact and allocate the
new bitmap dynamically?
Rob -- what do you prefer? I think the rest of the series is ready to go.
Will
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH v2 06/12] perf: arm_pmu: Remove event index to counter remapping
2024-07-01 13:52 ` Will Deacon
@ 2024-07-01 15:32 ` Mark Rutland
2024-07-01 15:49 ` Rob Herring
1 sibling, 0 replies; 29+ messages in thread
From: Mark Rutland @ 2024-07-01 15:32 UTC (permalink / raw)
To: Will Deacon
Cc: Marc Zyngier, Rob Herring (Arm), Russell King, Peter Zijlstra,
Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim,
Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter,
Oliver Upton, James Morse, Suzuki K Poulose, Zenghui Yu,
Catalin Marinas, James Clark, linux-arm-kernel, linux-kernel,
linux-perf-users, kvmarm
On Mon, Jul 01, 2024 at 02:52:16PM +0100, Will Deacon wrote:
> On Thu, Jun 27, 2024 at 12:05:23PM +0100, Marc Zyngier wrote:
> > On Wed, 26 Jun 2024 23:32:30 +0100,
> > "Rob Herring (Arm)" <robh@kernel.org> wrote:
> > >
> > > Xscale and Armv6 PMUs defined the cycle counter at 0 and event counters
> > > starting at 1 and had 1:1 event index to counter numbering. On Armv7 and
> > > later, this changed the cycle counter to 31 and event counters start at
> > > 0. The drivers for Armv7 and PMUv3 kept the old event index numbering
> > > and introduced an event index to counter conversion. The conversion uses
> > > masking to convert from event index to a counter number. This operation
> > > relies on having at most 32 counters so that the cycle counter index 0
> > > can be transformed to counter number 31.
> > >
> > > Armv9.4 adds support for an additional fixed function counter
> > > (instructions) which increases possible counters to more than 32, and
> > > the conversion won't work anymore as a simple subtract and mask. The
> > > primary reason for the translation (other than history) seems to be to
> > > have a contiguous mask of counters 0-N. Keeping that would result in
> > > more complicated index to counter conversions. Instead, store a mask of
> > > available counters rather than just number of events. That provides more
> > > information in addition to the number of events.
> > >
> > > No (intended) functional changes.
> > >
> > > Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
> >
> > [...]
> >
> > > diff --git a/include/linux/perf/arm_pmu.h b/include/linux/perf/arm_pmu.h
> > > index b3b34f6670cf..e5d6d204beab 100644
> > > --- a/include/linux/perf/arm_pmu.h
> > > +++ b/include/linux/perf/arm_pmu.h
> > > @@ -96,7 +96,7 @@ struct arm_pmu {
> > > void (*stop)(struct arm_pmu *);
> > > void (*reset)(void *);
> > > int (*map_event)(struct perf_event *event);
> > > - int num_events;
> > > + DECLARE_BITMAP(cntr_mask, ARMPMU_MAX_HWEVENTS);
> >
> > I'm slightly worried by this, as this size is never used, let alone
> > checked by the individual drivers. I can perfectly picture some new
> > (non-architectural) PMU driver having more counters than that, and
> > blindly setting bits outside of the allowed range.
>
> I tend to agree.
It's the same size as other bitmaps and arrays in struct pmu_hw_events, e.g.
the first two fields:
| struct pmu_hw_events {
| /*
| * The events that are active on the PMU for the given index.
| */
| struct perf_event *events[ARMPMU_MAX_HWEVENTS];
|
| /*
| * A 1 bit for an index indicates that the counter is being used for
| * an event. A 0 means that the counter can be used.
| */
| DECLARE_BITMAP(used_mask, ARMPMU_MAX_HWEVENTS);
... so IMO it's fine as-is, since anything not bound by
ARMPMU_MAX_HWEVENTS would already be wrong today.
> > One way to make it a bit safer would be to add a helper replacing the
> > various bitmap_set() calls, and enforcing that we never overflow this
> > bitmap.
>
> Or perhaps we could leave the 'num_events' field intact and allocate the
> new bitmap dynamically?
I don't think we should allocate the bitmap dynamically, since then we'd
have to do likewise for all the other fields sized by
ARMPMU_MAX_HWEVENTS.
I'm not averse to a check when setting bits in the new cntr_mask (which
I guess would WARN() and not set the bit), but as above I think it's
fine as-is.
Mark.
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH v2 06/12] perf: arm_pmu: Remove event index to counter remapping
2024-07-01 13:52 ` Will Deacon
2024-07-01 15:32 ` Mark Rutland
@ 2024-07-01 15:49 ` Rob Herring
2024-07-02 16:19 ` Will Deacon
1 sibling, 1 reply; 29+ messages in thread
From: Rob Herring @ 2024-07-01 15:49 UTC (permalink / raw)
To: Will Deacon
Cc: Marc Zyngier, Russell King, Peter Zijlstra, Ingo Molnar,
Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter,
Oliver Upton, James Morse, Suzuki K Poulose, Zenghui Yu,
Catalin Marinas, James Clark, linux-arm-kernel, linux-kernel,
linux-perf-users, kvmarm
On Mon, Jul 1, 2024 at 7:52 AM Will Deacon <will@kernel.org> wrote:
>
> On Thu, Jun 27, 2024 at 12:05:23PM +0100, Marc Zyngier wrote:
> > On Wed, 26 Jun 2024 23:32:30 +0100,
> > "Rob Herring (Arm)" <robh@kernel.org> wrote:
> > >
> > > Xscale and Armv6 PMUs defined the cycle counter at 0 and event counters
> > > starting at 1 and had 1:1 event index to counter numbering. On Armv7 and
> > > later, this changed the cycle counter to 31 and event counters start at
> > > 0. The drivers for Armv7 and PMUv3 kept the old event index numbering
> > > and introduced an event index to counter conversion. The conversion uses
> > > masking to convert from event index to a counter number. This operation
> > > relies on having at most 32 counters so that the cycle counter index 0
> > > can be transformed to counter number 31.
> > >
> > > Armv9.4 adds support for an additional fixed function counter
> > > (instructions) which increases possible counters to more than 32, and
> > > the conversion won't work anymore as a simple subtract and mask. The
> > > primary reason for the translation (other than history) seems to be to
> > > have a contiguous mask of counters 0-N. Keeping that would result in
> > > more complicated index to counter conversions. Instead, store a mask of
> > > available counters rather than just number of events. That provides more
> > > information in addition to the number of events.
> > >
> > > No (intended) functional changes.
> > >
> > > Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
> >
> > [...]
> >
> > > diff --git a/include/linux/perf/arm_pmu.h b/include/linux/perf/arm_pmu.h
> > > index b3b34f6670cf..e5d6d204beab 100644
> > > --- a/include/linux/perf/arm_pmu.h
> > > +++ b/include/linux/perf/arm_pmu.h
> > > @@ -96,7 +96,7 @@ struct arm_pmu {
> > > void (*stop)(struct arm_pmu *);
> > > void (*reset)(void *);
> > > int (*map_event)(struct perf_event *event);
> > > - int num_events;
> > > + DECLARE_BITMAP(cntr_mask, ARMPMU_MAX_HWEVENTS);
> >
> > I'm slightly worried by this, as this size is never used, let alone
> > checked by the individual drivers. I can perfectly picture some new
> > (non-architectural) PMU driver having more counters than that, and
> > blindly setting bits outside of the allowed range.
>
> I tend to agree.
>
> > One way to make it a bit safer would be to add a helper replacing the
> > various bitmap_set() calls, and enforcing that we never overflow this
> > bitmap.
>
> Or perhaps we could leave the 'num_events' field intact and allocate the
> new bitmap dynamically?
>
> Rob -- what do you prefer? I think the rest of the series is ready to go.
I think the list of places we're initializing cntr_mask is short
enough to check, and additions to arm_pmu users are rare enough that I
would not be too worried about it.
If anything, I think the issue is with the bitmap API in that it has
no bounds checking. I'm sure it will get on someone's radar to fix at
some point.
But if we do want a check, this is what I have:
static inline void armpmu_set_counter_mask(struct arm_pmu *pmu,
					   unsigned int start, unsigned int nr)
{
	if (WARN_ON(start + nr > ARMPMU_MAX_HWEVENTS))
		return;
	bitmap_set(pmu->cntr_mask, start, nr);
}
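As an illustration only (a sketch assuming the helper above is adopted),
the bitmap_set()/set_bit() calls in __armv8pmu_probe_pmu() would then
look something like:

	/* illustrative only: programmable counters 0..N-1 from PMCR_EL0.N */
	armpmu_set_counter_mask(cpu_pmu, 0,
				FIELD_GET(ARMV8_PMU_PMCR_N, armv8pmu_pmcr_read()));
	/* the fixed cycle counter */
	armpmu_set_counter_mask(cpu_pmu, ARMV8_PMU_CYCLE_IDX, 1);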
Rob
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH v2 06/12] perf: arm_pmu: Remove event index to counter remapping
2024-06-26 22:32 ` [PATCH v2 06/12] perf: arm_pmu: Remove event index to counter remapping Rob Herring (Arm)
2024-06-27 11:05 ` Marc Zyngier
@ 2024-07-01 17:06 ` Mark Rutland
1 sibling, 0 replies; 29+ messages in thread
From: Mark Rutland @ 2024-07-01 17:06 UTC (permalink / raw)
To: Rob Herring (Arm)
Cc: Russell King, Peter Zijlstra, Ingo Molnar,
Arnaldo Carvalho de Melo, Namhyung Kim, Alexander Shishkin,
Jiri Olsa, Ian Rogers, Adrian Hunter, Will Deacon, Marc Zyngier,
Oliver Upton, James Morse, Suzuki K Poulose, Zenghui Yu,
Catalin Marinas, James Clark, linux-arm-kernel, linux-kernel,
linux-perf-users, kvmarm
On Wed, Jun 26, 2024 at 04:32:30PM -0600, Rob Herring (Arm) wrote:
> Xscale and Armv6 PMUs defined the cycle counter at 0 and event counters
> starting at 1 and had 1:1 event index to counter numbering. On Armv7 and
> later, this changed the cycle counter to 31 and event counters start at
> 0. The drivers for Armv7 and PMUv3 kept the old event index numbering
> and introduced an event index to counter conversion. The conversion uses
> masking to convert from event index to a counter number. This operation
> relies on having at most 32 counters so that the cycle counter index 0
> can be transformed to counter number 31.
>
> Armv9.4 adds support for an additional fixed function counter
> (instructions) which increases possible counters to more than 32, and
> the conversion won't work anymore as a simple subtract and mask. The
> primary reason for the translation (other than history) seems to be to
> have a contiguous mask of counters 0-N. Keeping that would result in
> more complicated index to counter conversions. Instead, store a mask of
> available counters rather than just number of events. That provides more
> information in addition to the number of events.
>
> No (intended) functional changes.
>
> Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
> ---
> v2:
> - Include Apple M1 PMU changes
> - Use set_bit instead of bitmap_set(addr, bit, 1)
> - Use for_each_andnot_bit() when clearing unused counters to avoid
> accessing non-existent counters
> - Use defines for XScale number of counters and
> s/XSCALE_NUM_COUNTERS/XSCALE1_NUM_COUNTERS/
> - Add and use define ARMV8_PMU_MAX_GENERAL_COUNTERS (copied from
> tools/testing/selftests/kvm/aarch64/vpmu_counter_access.c)
> ---
> arch/arm64/kvm/pmu-emul.c | 6 ++--
> drivers/perf/apple_m1_cpu_pmu.c | 4 +--
> drivers/perf/arm_pmu.c | 11 +++---
> drivers/perf/arm_pmuv3.c | 62 +++++++++++----------------------
> drivers/perf/arm_v6_pmu.c | 6 ++--
> drivers/perf/arm_v7_pmu.c | 77 ++++++++++++++++-------------------------
> drivers/perf/arm_xscale_pmu.c | 12 ++++---
> include/linux/perf/arm_pmu.h | 2 +-
> include/linux/perf/arm_pmuv3.h | 1 +
> 9 files changed, 75 insertions(+), 106 deletions(-)
FWIW, for this as-is:
Acked-by: Mark Rutland <mark.rutland@arm.com>
Mark.
>
> diff --git a/arch/arm64/kvm/pmu-emul.c b/arch/arm64/kvm/pmu-emul.c
> index d1a476b08f54..69be070a9378 100644
> --- a/arch/arm64/kvm/pmu-emul.c
> +++ b/arch/arm64/kvm/pmu-emul.c
> @@ -910,10 +910,10 @@ u8 kvm_arm_pmu_get_max_counters(struct kvm *kvm)
> struct arm_pmu *arm_pmu = kvm->arch.arm_pmu;
>
> /*
> - * The arm_pmu->num_events considers the cycle counter as well.
> - * Ignore that and return only the general-purpose counters.
> + * The arm_pmu->cntr_mask considers the fixed counter(s) as well.
> + * Ignore those and return only the general-purpose counters.
> */
> - return arm_pmu->num_events - 1;
> + return bitmap_weight(arm_pmu->cntr_mask, ARMV8_PMU_MAX_GENERAL_COUNTERS);
> }
>
> static void kvm_arm_set_pmu(struct kvm *kvm, struct arm_pmu *arm_pmu)
> diff --git a/drivers/perf/apple_m1_cpu_pmu.c b/drivers/perf/apple_m1_cpu_pmu.c
> index f322e5ca1114..c8f607912567 100644
> --- a/drivers/perf/apple_m1_cpu_pmu.c
> +++ b/drivers/perf/apple_m1_cpu_pmu.c
> @@ -400,7 +400,7 @@ static irqreturn_t m1_pmu_handle_irq(struct arm_pmu *cpu_pmu)
>
> regs = get_irq_regs();
>
> - for (idx = 0; idx < cpu_pmu->num_events; idx++) {
> + for_each_set_bit(idx, cpu_pmu->cntr_mask, M1_PMU_NR_COUNTERS) {
> struct perf_event *event = cpuc->events[idx];
> struct perf_sample_data data;
>
> @@ -560,7 +560,7 @@ static int m1_pmu_init(struct arm_pmu *cpu_pmu, u32 flags)
> cpu_pmu->reset = m1_pmu_reset;
> cpu_pmu->set_event_filter = m1_pmu_set_event_filter;
>
> - cpu_pmu->num_events = M1_PMU_NR_COUNTERS;
> + bitmap_set(cpu_pmu->cntr_mask, 0, M1_PMU_NR_COUNTERS);
> cpu_pmu->attr_groups[ARMPMU_ATTR_GROUP_EVENTS] = &m1_pmu_events_attr_group;
> cpu_pmu->attr_groups[ARMPMU_ATTR_GROUP_FORMATS] = &m1_pmu_format_attr_group;
> return 0;
> diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c
> index 8458fe2cebb4..398cce3d76fc 100644
> --- a/drivers/perf/arm_pmu.c
> +++ b/drivers/perf/arm_pmu.c
> @@ -522,7 +522,7 @@ static void armpmu_enable(struct pmu *pmu)
> {
> struct arm_pmu *armpmu = to_arm_pmu(pmu);
> struct pmu_hw_events *hw_events = this_cpu_ptr(armpmu->hw_events);
> - bool enabled = !bitmap_empty(hw_events->used_mask, armpmu->num_events);
> + bool enabled = !bitmap_empty(hw_events->used_mask, ARMPMU_MAX_HWEVENTS);
>
> /* For task-bound events we may be called on other CPUs */
> if (!cpumask_test_cpu(smp_processor_id(), &armpmu->supported_cpus))
> @@ -742,7 +742,7 @@ static void cpu_pm_pmu_setup(struct arm_pmu *armpmu, unsigned long cmd)
> struct perf_event *event;
> int idx;
>
> - for (idx = 0; idx < armpmu->num_events; idx++) {
> + for_each_set_bit(idx, armpmu->cntr_mask, ARMPMU_MAX_HWEVENTS) {
> event = hw_events->events[idx];
> if (!event)
> continue;
> @@ -772,7 +772,7 @@ static int cpu_pm_pmu_notify(struct notifier_block *b, unsigned long cmd,
> {
> struct arm_pmu *armpmu = container_of(b, struct arm_pmu, cpu_pm_nb);
> struct pmu_hw_events *hw_events = this_cpu_ptr(armpmu->hw_events);
> - bool enabled = !bitmap_empty(hw_events->used_mask, armpmu->num_events);
> + bool enabled = !bitmap_empty(hw_events->used_mask, ARMPMU_MAX_HWEVENTS);
>
> if (!cpumask_test_cpu(smp_processor_id(), &armpmu->supported_cpus))
> return NOTIFY_DONE;
> @@ -924,8 +924,9 @@ int armpmu_register(struct arm_pmu *pmu)
> if (ret)
> goto out_destroy;
>
> - pr_info("enabled with %s PMU driver, %d counters available%s\n",
> - pmu->name, pmu->num_events,
> + pr_info("enabled with %s PMU driver, %d (%*pb) counters available%s\n",
> + pmu->name, bitmap_weight(pmu->cntr_mask, ARMPMU_MAX_HWEVENTS),
> + ARMPMU_MAX_HWEVENTS, &pmu->cntr_mask,
> has_nmi ? ", using NMIs" : "");
>
> kvm_host_pmu_init(pmu);
> diff --git a/drivers/perf/arm_pmuv3.c b/drivers/perf/arm_pmuv3.c
> index 6cbd37fd691a..53ad674bf009 100644
> --- a/drivers/perf/arm_pmuv3.c
> +++ b/drivers/perf/arm_pmuv3.c
> @@ -454,9 +454,7 @@ static const struct attribute_group armv8_pmuv3_caps_attr_group = {
> /*
> * Perf Events' indices
> */
> -#define ARMV8_IDX_CYCLE_COUNTER 0
> -#define ARMV8_IDX_COUNTER0 1
> -#define ARMV8_IDX_CYCLE_COUNTER_USER 32
> +#define ARMV8_IDX_CYCLE_COUNTER 31
>
> /*
> * We unconditionally enable ARMv8.5-PMU long event counter support
> @@ -489,19 +487,12 @@ static bool armv8pmu_event_is_chained(struct perf_event *event)
> return !armv8pmu_event_has_user_read(event) &&
> armv8pmu_event_is_64bit(event) &&
> !armv8pmu_has_long_event(cpu_pmu) &&
> - (idx != ARMV8_IDX_CYCLE_COUNTER);
> + (idx <= ARMV8_PMU_MAX_GENERAL_COUNTERS);
> }
>
> /*
> * ARMv8 low level PMU access
> */
> -
> -/*
> - * Perf Event to low level counters mapping
> - */
> -#define ARMV8_IDX_TO_COUNTER(x) \
> - (((x) - ARMV8_IDX_COUNTER0) & ARMV8_PMU_COUNTER_MASK)
> -
> static u64 armv8pmu_pmcr_read(void)
> {
> return read_pmcr();
> @@ -521,14 +512,12 @@ static int armv8pmu_has_overflowed(u32 pmovsr)
>
> static int armv8pmu_counter_has_overflowed(u32 pmnc, int idx)
> {
> - return pmnc & BIT(ARMV8_IDX_TO_COUNTER(idx));
> + return pmnc & BIT(idx);
> }
>
> static u64 armv8pmu_read_evcntr(int idx)
> {
> - u32 counter = ARMV8_IDX_TO_COUNTER(idx);
> -
> - return read_pmevcntrn(counter);
> + return read_pmevcntrn(idx);
> }
>
> static u64 armv8pmu_read_hw_counter(struct perf_event *event)
> @@ -557,7 +546,7 @@ static bool armv8pmu_event_needs_bias(struct perf_event *event)
> return false;
>
> if (armv8pmu_has_long_event(cpu_pmu) ||
> - idx == ARMV8_IDX_CYCLE_COUNTER)
> + idx >= ARMV8_PMU_MAX_GENERAL_COUNTERS)
> return true;
>
> return false;
> @@ -595,9 +584,7 @@ static u64 armv8pmu_read_counter(struct perf_event *event)
>
> static void armv8pmu_write_evcntr(int idx, u64 value)
> {
> - u32 counter = ARMV8_IDX_TO_COUNTER(idx);
> -
> - write_pmevcntrn(counter, value);
> + write_pmevcntrn(idx, value);
> }
>
> static void armv8pmu_write_hw_counter(struct perf_event *event,
> @@ -628,7 +615,6 @@ static void armv8pmu_write_counter(struct perf_event *event, u64 value)
>
> static void armv8pmu_write_evtype(int idx, unsigned long val)
> {
> - u32 counter = ARMV8_IDX_TO_COUNTER(idx);
> unsigned long mask = ARMV8_PMU_EVTYPE_EVENT |
> ARMV8_PMU_INCLUDE_EL2 |
> ARMV8_PMU_EXCLUDE_EL0 |
> @@ -638,7 +624,7 @@ static void armv8pmu_write_evtype(int idx, unsigned long val)
> mask |= ARMV8_PMU_EVTYPE_TC | ARMV8_PMU_EVTYPE_TH;
>
> val &= mask;
> - write_pmevtypern(counter, val);
> + write_pmevtypern(idx, val);
> }
>
> static void armv8pmu_write_event_type(struct perf_event *event)
> @@ -667,7 +653,7 @@ static void armv8pmu_write_event_type(struct perf_event *event)
>
> static u32 armv8pmu_event_cnten_mask(struct perf_event *event)
> {
> - int counter = ARMV8_IDX_TO_COUNTER(event->hw.idx);
> + int counter = event->hw.idx;
> u32 mask = BIT(counter);
>
> if (armv8pmu_event_is_chained(event))
> @@ -726,8 +712,7 @@ static void armv8pmu_enable_intens(u32 mask)
>
> static void armv8pmu_enable_event_irq(struct perf_event *event)
> {
> - u32 counter = ARMV8_IDX_TO_COUNTER(event->hw.idx);
> - armv8pmu_enable_intens(BIT(counter));
> + armv8pmu_enable_intens(BIT(event->hw.idx));
> }
>
> static void armv8pmu_disable_intens(u32 mask)
> @@ -741,8 +726,7 @@ static void armv8pmu_disable_intens(u32 mask)
>
> static void armv8pmu_disable_event_irq(struct perf_event *event)
> {
> - u32 counter = ARMV8_IDX_TO_COUNTER(event->hw.idx);
> - armv8pmu_disable_intens(BIT(counter));
> + armv8pmu_disable_intens(BIT(event->hw.idx));
> }
>
> static u32 armv8pmu_getreset_flags(void)
> @@ -786,7 +770,8 @@ static void armv8pmu_enable_user_access(struct arm_pmu *cpu_pmu)
> struct pmu_hw_events *cpuc = this_cpu_ptr(cpu_pmu->hw_events);
>
> /* Clear any unused counters to avoid leaking their contents */
> - for_each_clear_bit(i, cpuc->used_mask, cpu_pmu->num_events) {
> + for_each_andnot_bit(i, cpu_pmu->cntr_mask, cpuc->used_mask,
> + ARMPMU_MAX_HWEVENTS) {
> if (i == ARMV8_IDX_CYCLE_COUNTER)
> write_pmccntr(0);
> else
> @@ -869,7 +854,7 @@ static irqreturn_t armv8pmu_handle_irq(struct arm_pmu *cpu_pmu)
> * to prevent skews in group events.
> */
> armv8pmu_stop(cpu_pmu);
> - for (idx = 0; idx < cpu_pmu->num_events; ++idx) {
> + for_each_set_bit(idx, cpu_pmu->cntr_mask, ARMPMU_MAX_HWEVENTS) {
> struct perf_event *event = cpuc->events[idx];
> struct hw_perf_event *hwc;
>
> @@ -908,7 +893,7 @@ static int armv8pmu_get_single_idx(struct pmu_hw_events *cpuc,
> {
> int idx;
>
> - for (idx = ARMV8_IDX_COUNTER0; idx < cpu_pmu->num_events; idx++) {
> + for_each_set_bit(idx, cpu_pmu->cntr_mask, ARMV8_PMU_MAX_GENERAL_COUNTERS) {
> if (!test_and_set_bit(idx, cpuc->used_mask))
> return idx;
> }
> @@ -924,7 +909,9 @@ static int armv8pmu_get_chain_idx(struct pmu_hw_events *cpuc,
> * Chaining requires two consecutive event counters, where
> * the lower idx must be even.
> */
> - for (idx = ARMV8_IDX_COUNTER0 + 1; idx < cpu_pmu->num_events; idx += 2) {
> + for_each_set_bit(idx, cpu_pmu->cntr_mask, ARMV8_PMU_MAX_GENERAL_COUNTERS) {
> + if (!(idx & 0x1))
> + continue;
> if (!test_and_set_bit(idx, cpuc->used_mask)) {
> /* Check if the preceding even counter is available */
> if (!test_and_set_bit(idx - 1, cpuc->used_mask))
> @@ -978,15 +965,7 @@ static int armv8pmu_user_event_idx(struct perf_event *event)
> if (!sysctl_perf_user_access || !armv8pmu_event_has_user_read(event))
> return 0;
>
> - /*
> - * We remap the cycle counter index to 32 to
> - * match the offset applied to the rest of
> - * the counter indices.
> - */
> - if (event->hw.idx == ARMV8_IDX_CYCLE_COUNTER)
> - return ARMV8_IDX_CYCLE_COUNTER_USER;
> -
> - return event->hw.idx;
> + return event->hw.idx + 1;
> }
>
> /*
> @@ -1211,10 +1190,11 @@ static void __armv8pmu_probe_pmu(void *info)
> probe->present = true;
>
> /* Read the nb of CNTx counters supported from PMNC */
> - cpu_pmu->num_events = FIELD_GET(ARMV8_PMU_PMCR_N, armv8pmu_pmcr_read());
> + bitmap_set(cpu_pmu->cntr_mask,
> + 0, FIELD_GET(ARMV8_PMU_PMCR_N, armv8pmu_pmcr_read()));
>
> /* Add the CPU cycles counter */
> - cpu_pmu->num_events += 1;
> + set_bit(ARMV8_IDX_CYCLE_COUNTER, cpu_pmu->cntr_mask);
>
> pmceid[0] = pmceid_raw[0] = read_pmceid0();
> pmceid[1] = pmceid_raw[1] = read_pmceid1();
> diff --git a/drivers/perf/arm_v6_pmu.c b/drivers/perf/arm_v6_pmu.c
> index 0bb685b4bac5..b09615bb2bb2 100644
> --- a/drivers/perf/arm_v6_pmu.c
> +++ b/drivers/perf/arm_v6_pmu.c
> @@ -64,6 +64,7 @@ enum armv6_counters {
> ARMV6_CYCLE_COUNTER = 0,
> ARMV6_COUNTER0,
> ARMV6_COUNTER1,
> + ARMV6_NUM_COUNTERS
> };
>
> /*
> @@ -254,7 +255,7 @@ armv6pmu_handle_irq(struct arm_pmu *cpu_pmu)
> */
> armv6_pmcr_write(pmcr);
>
> - for (idx = 0; idx < cpu_pmu->num_events; ++idx) {
> + for_each_set_bit(idx, cpu_pmu->cntr_mask, ARMV6_NUM_COUNTERS) {
> struct perf_event *event = cpuc->events[idx];
> struct hw_perf_event *hwc;
>
> @@ -391,7 +392,8 @@ static void armv6pmu_init(struct arm_pmu *cpu_pmu)
> cpu_pmu->start = armv6pmu_start;
> cpu_pmu->stop = armv6pmu_stop;
> cpu_pmu->map_event = armv6_map_event;
> - cpu_pmu->num_events = 3;
> +
> + bitmap_set(cpu_pmu->cntr_mask, 0, ARMV6_NUM_COUNTERS);
> }
>
> static int armv6_1136_pmu_init(struct arm_pmu *cpu_pmu)
> diff --git a/drivers/perf/arm_v7_pmu.c b/drivers/perf/arm_v7_pmu.c
> index 928ac3d626ed..420cadd108e7 100644
> --- a/drivers/perf/arm_v7_pmu.c
> +++ b/drivers/perf/arm_v7_pmu.c
> @@ -649,24 +649,12 @@ static struct attribute_group armv7_pmuv2_events_attr_group = {
> /*
> * Perf Events' indices
> */
> -#define ARMV7_IDX_CYCLE_COUNTER 0
> -#define ARMV7_IDX_COUNTER0 1
> -#define ARMV7_IDX_COUNTER_LAST(cpu_pmu) \
> - (ARMV7_IDX_CYCLE_COUNTER + cpu_pmu->num_events - 1)
> -
> -#define ARMV7_MAX_COUNTERS 32
> -#define ARMV7_COUNTER_MASK (ARMV7_MAX_COUNTERS - 1)
> -
> +#define ARMV7_IDX_CYCLE_COUNTER 31
> +#define ARMV7_IDX_COUNTER_MAX 31
> /*
> * ARMv7 low level PMNC access
> */
>
> -/*
> - * Perf Event to low level counters mapping
> - */
> -#define ARMV7_IDX_TO_COUNTER(x) \
> - (((x) - ARMV7_IDX_COUNTER0) & ARMV7_COUNTER_MASK)
> -
> /*
> * Per-CPU PMNC: config reg
> */
> @@ -725,19 +713,17 @@ static inline int armv7_pmnc_has_overflowed(u32 pmnc)
>
> static inline int armv7_pmnc_counter_valid(struct arm_pmu *cpu_pmu, int idx)
> {
> - return idx >= ARMV7_IDX_CYCLE_COUNTER &&
> - idx <= ARMV7_IDX_COUNTER_LAST(cpu_pmu);
> + return test_bit(idx, cpu_pmu->cntr_mask);
> }
>
> static inline int armv7_pmnc_counter_has_overflowed(u32 pmnc, int idx)
> {
> - return pmnc & BIT(ARMV7_IDX_TO_COUNTER(idx));
> + return pmnc & BIT(idx);
> }
>
> static inline void armv7_pmnc_select_counter(int idx)
> {
> - u32 counter = ARMV7_IDX_TO_COUNTER(idx);
> - asm volatile("mcr p15, 0, %0, c9, c12, 5" : : "r" (counter));
> + asm volatile("mcr p15, 0, %0, c9, c12, 5" : : "r" (idx));
> isb();
> }
>
> @@ -787,29 +773,25 @@ static inline void armv7_pmnc_write_evtsel(int idx, u32 val)
>
> static inline void armv7_pmnc_enable_counter(int idx)
> {
> - u32 counter = ARMV7_IDX_TO_COUNTER(idx);
> - asm volatile("mcr p15, 0, %0, c9, c12, 1" : : "r" (BIT(counter)));
> + asm volatile("mcr p15, 0, %0, c9, c12, 1" : : "r" (BIT(idx)));
> }
>
> static inline void armv7_pmnc_disable_counter(int idx)
> {
> - u32 counter = ARMV7_IDX_TO_COUNTER(idx);
> - asm volatile("mcr p15, 0, %0, c9, c12, 2" : : "r" (BIT(counter)));
> + asm volatile("mcr p15, 0, %0, c9, c12, 2" : : "r" (BIT(idx)));
> }
>
> static inline void armv7_pmnc_enable_intens(int idx)
> {
> - u32 counter = ARMV7_IDX_TO_COUNTER(idx);
> - asm volatile("mcr p15, 0, %0, c9, c14, 1" : : "r" (BIT(counter)));
> + asm volatile("mcr p15, 0, %0, c9, c14, 1" : : "r" (BIT(idx)));
> }
>
> static inline void armv7_pmnc_disable_intens(int idx)
> {
> - u32 counter = ARMV7_IDX_TO_COUNTER(idx);
> - asm volatile("mcr p15, 0, %0, c9, c14, 2" : : "r" (BIT(counter)));
> + asm volatile("mcr p15, 0, %0, c9, c14, 2" : : "r" (BIT(idx)));
> isb();
> /* Clear the overflow flag in case an interrupt is pending. */
> - asm volatile("mcr p15, 0, %0, c9, c12, 3" : : "r" (BIT(counter)));
> + asm volatile("mcr p15, 0, %0, c9, c12, 3" : : "r" (BIT(idx)));
> isb();
> }
>
> @@ -853,15 +835,12 @@ static void armv7_pmnc_dump_regs(struct arm_pmu *cpu_pmu)
> asm volatile("mrc p15, 0, %0, c9, c13, 0" : "=r" (val));
> pr_info("CCNT =0x%08x\n", val);
>
> - for (cnt = ARMV7_IDX_COUNTER0;
> - cnt <= ARMV7_IDX_COUNTER_LAST(cpu_pmu); cnt++) {
> + for_each_set_bit(cnt, cpu_pmu->cntr_mask, ARMV7_IDX_COUNTER_MAX) {
> armv7_pmnc_select_counter(cnt);
> asm volatile("mrc p15, 0, %0, c9, c13, 2" : "=r" (val));
> - pr_info("CNT[%d] count =0x%08x\n",
> - ARMV7_IDX_TO_COUNTER(cnt), val);
> + pr_info("CNT[%d] count =0x%08x\n", cnt, val);
> asm volatile("mrc p15, 0, %0, c9, c13, 1" : "=r" (val));
> - pr_info("CNT[%d] evtsel=0x%08x\n",
> - ARMV7_IDX_TO_COUNTER(cnt), val);
> + pr_info("CNT[%d] evtsel=0x%08x\n", cnt, val);
> }
> }
> #endif
> @@ -958,7 +937,7 @@ static irqreturn_t armv7pmu_handle_irq(struct arm_pmu *cpu_pmu)
> */
> regs = get_irq_regs();
>
> - for (idx = 0; idx < cpu_pmu->num_events; ++idx) {
> + for_each_set_bit(idx, cpu_pmu->cntr_mask, ARMPMU_MAX_HWEVENTS) {
> struct perf_event *event = cpuc->events[idx];
> struct hw_perf_event *hwc;
>
> @@ -1027,7 +1006,7 @@ static int armv7pmu_get_event_idx(struct pmu_hw_events *cpuc,
> * For anything other than a cycle counter, try and use
> * the events counters
> */
> - for (idx = ARMV7_IDX_COUNTER0; idx < cpu_pmu->num_events; ++idx) {
> + for_each_set_bit(idx, cpu_pmu->cntr_mask, ARMV7_IDX_COUNTER_MAX) {
> if (!test_and_set_bit(idx, cpuc->used_mask))
> return idx;
> }
> @@ -1073,7 +1052,7 @@ static int armv7pmu_set_event_filter(struct hw_perf_event *event,
> static void armv7pmu_reset(void *info)
> {
> struct arm_pmu *cpu_pmu = (struct arm_pmu *)info;
> - u32 idx, nb_cnt = cpu_pmu->num_events, val;
> + u32 idx, val;
>
> if (cpu_pmu->secure_access) {
> asm volatile("mrc p15, 0, %0, c1, c1, 1" : "=r" (val));
> @@ -1082,7 +1061,7 @@ static void armv7pmu_reset(void *info)
> }
>
> /* The counter and interrupt enable registers are unknown at reset. */
> - for (idx = ARMV7_IDX_CYCLE_COUNTER; idx < nb_cnt; ++idx) {
> + for_each_set_bit(idx, cpu_pmu->cntr_mask, ARMPMU_MAX_HWEVENTS) {
> armv7_pmnc_disable_counter(idx);
> armv7_pmnc_disable_intens(idx);
> }
> @@ -1161,20 +1140,22 @@ static void armv7pmu_init(struct arm_pmu *cpu_pmu)
>
> static void armv7_read_num_pmnc_events(void *info)
> {
> - int *nb_cnt = info;
> + int nb_cnt;
> + struct arm_pmu *cpu_pmu = info;
>
> /* Read the nb of CNTx counters supported from PMNC */
> - *nb_cnt = (armv7_pmnc_read() >> ARMV7_PMNC_N_SHIFT) & ARMV7_PMNC_N_MASK;
> + nb_cnt = (armv7_pmnc_read() >> ARMV7_PMNC_N_SHIFT) & ARMV7_PMNC_N_MASK;
> + bitmap_set(cpu_pmu->cntr_mask, 0, nb_cnt);
>
> /* Add the CPU cycles counter */
> - *nb_cnt += 1;
> + set_bit(ARMV7_IDX_CYCLE_COUNTER, cpu_pmu->cntr_mask);
> }
>
> static int armv7_probe_num_events(struct arm_pmu *arm_pmu)
> {
> return smp_call_function_any(&arm_pmu->supported_cpus,
> armv7_read_num_pmnc_events,
> - &arm_pmu->num_events, 1);
> + arm_pmu, 1);
> }
>
> static int armv7_a8_pmu_init(struct arm_pmu *cpu_pmu)
> @@ -1524,7 +1505,7 @@ static void krait_pmu_reset(void *info)
> {
> u32 vval, fval;
> struct arm_pmu *cpu_pmu = info;
> - u32 idx, nb_cnt = cpu_pmu->num_events;
> + u32 idx;
>
> armv7pmu_reset(info);
>
> @@ -1538,7 +1519,7 @@ static void krait_pmu_reset(void *info)
> venum_post_pmresr(vval, fval);
>
> /* Reset PMxEVNCTCR to sane default */
> - for (idx = ARMV7_IDX_CYCLE_COUNTER; idx < nb_cnt; ++idx) {
> + for_each_set_bit(idx, cpu_pmu->cntr_mask, ARMV7_IDX_COUNTER_MAX) {
> armv7_pmnc_select_counter(idx);
> asm volatile("mcr p15, 0, %0, c9, c15, 0" : : "r" (0));
> }
> @@ -1562,7 +1543,7 @@ static int krait_event_to_bit(struct perf_event *event, unsigned int region,
> * Lower bits are reserved for use by the counters (see
> * armv7pmu_get_event_idx() for more info)
> */
> - bit += ARMV7_IDX_COUNTER_LAST(cpu_pmu) + 1;
> + bit += bitmap_weight(cpu_pmu->cntr_mask, ARMV7_IDX_COUNTER_MAX);
>
> return bit;
> }
> @@ -1845,7 +1826,7 @@ static void scorpion_pmu_reset(void *info)
> {
> u32 vval, fval;
> struct arm_pmu *cpu_pmu = info;
> - u32 idx, nb_cnt = cpu_pmu->num_events;
> + u32 idx;
>
> armv7pmu_reset(info);
>
> @@ -1860,7 +1841,7 @@ static void scorpion_pmu_reset(void *info)
> venum_post_pmresr(vval, fval);
>
> /* Reset PMxEVNCTCR to sane default */
> - for (idx = ARMV7_IDX_CYCLE_COUNTER; idx < nb_cnt; ++idx) {
> + for_each_set_bit(idx, cpu_pmu->cntr_mask, ARMV7_IDX_COUNTER_MAX) {
> armv7_pmnc_select_counter(idx);
> asm volatile("mcr p15, 0, %0, c9, c15, 0" : : "r" (0));
> }
> @@ -1883,7 +1864,7 @@ static int scorpion_event_to_bit(struct perf_event *event, unsigned int region,
> * Lower bits are reserved for use by the counters (see
> * armv7pmu_get_event_idx() for more info)
> */
> - bit += ARMV7_IDX_COUNTER_LAST(cpu_pmu) + 1;
> + bit += bitmap_weight(cpu_pmu->cntr_mask, ARMV7_IDX_COUNTER_MAX);
>
> return bit;
> }
> diff --git a/drivers/perf/arm_xscale_pmu.c b/drivers/perf/arm_xscale_pmu.c
> index 3d8b72d6b37f..638fea9b1263 100644
> --- a/drivers/perf/arm_xscale_pmu.c
> +++ b/drivers/perf/arm_xscale_pmu.c
> @@ -53,6 +53,8 @@ enum xscale_counters {
> XSCALE_COUNTER2,
> XSCALE_COUNTER3,
> };
> +#define XSCALE1_NUM_COUNTERS 3
> +#define XSCALE2_NUM_COUNTERS 5
>
> static const unsigned xscale_perf_map[PERF_COUNT_HW_MAX] = {
> PERF_MAP_ALL_UNSUPPORTED,
> @@ -168,7 +170,7 @@ xscale1pmu_handle_irq(struct arm_pmu *cpu_pmu)
>
> regs = get_irq_regs();
>
> - for (idx = 0; idx < cpu_pmu->num_events; ++idx) {
> + for_each_set_bit(idx, cpu_pmu->cntr_mask, XSCALE1_NUM_COUNTERS) {
> struct perf_event *event = cpuc->events[idx];
> struct hw_perf_event *hwc;
>
> @@ -364,7 +366,8 @@ static int xscale1pmu_init(struct arm_pmu *cpu_pmu)
> cpu_pmu->start = xscale1pmu_start;
> cpu_pmu->stop = xscale1pmu_stop;
> cpu_pmu->map_event = xscale_map_event;
> - cpu_pmu->num_events = 3;
> +
> + bitmap_set(cpu_pmu->cntr_mask, 0, XSCALE1_NUM_COUNTERS);
>
> return 0;
> }
> @@ -500,7 +503,7 @@ xscale2pmu_handle_irq(struct arm_pmu *cpu_pmu)
>
> regs = get_irq_regs();
>
> - for (idx = 0; idx < cpu_pmu->num_events; ++idx) {
> + for_each_set_bit(idx, cpu_pmu->cntr_mask, XSCALE2_NUM_COUNTERS) {
> struct perf_event *event = cpuc->events[idx];
> struct hw_perf_event *hwc;
>
> @@ -719,7 +722,8 @@ static int xscale2pmu_init(struct arm_pmu *cpu_pmu)
> cpu_pmu->start = xscale2pmu_start;
> cpu_pmu->stop = xscale2pmu_stop;
> cpu_pmu->map_event = xscale_map_event;
> - cpu_pmu->num_events = 5;
> +
> + bitmap_set(cpu_pmu->cntr_mask, 0, XSCALE2_NUM_COUNTERS);
>
> return 0;
> }
> diff --git a/include/linux/perf/arm_pmu.h b/include/linux/perf/arm_pmu.h
> index b3b34f6670cf..e5d6d204beab 100644
> --- a/include/linux/perf/arm_pmu.h
> +++ b/include/linux/perf/arm_pmu.h
> @@ -96,7 +96,7 @@ struct arm_pmu {
> void (*stop)(struct arm_pmu *);
> void (*reset)(void *);
> int (*map_event)(struct perf_event *event);
> - int num_events;
> + DECLARE_BITMAP(cntr_mask, ARMPMU_MAX_HWEVENTS);
> bool secure_access; /* 32-bit ARM only */
> #define ARMV8_PMUV3_MAX_COMMON_EVENTS 0x40
> DECLARE_BITMAP(pmceid_bitmap, ARMV8_PMUV3_MAX_COMMON_EVENTS);
> diff --git a/include/linux/perf/arm_pmuv3.h b/include/linux/perf/arm_pmuv3.h
> index 7867db04ec98..eccbdd8eb98f 100644
> --- a/include/linux/perf/arm_pmuv3.h
> +++ b/include/linux/perf/arm_pmuv3.h
> @@ -6,6 +6,7 @@
> #ifndef __PERF_ARM_PMUV3_H
> #define __PERF_ARM_PMUV3_H
>
> +#define ARMV8_PMU_MAX_GENERAL_COUNTERS 31
> #define ARMV8_PMU_MAX_COUNTERS 32
> #define ARMV8_PMU_COUNTER_MASK (ARMV8_PMU_MAX_COUNTERS - 1)
>
>
> --
> 2.43.0
>
* Re: [PATCH v2 10/12] arm64: perf/kvm: Use a common PMU cycle counter define
2024-06-26 22:32 ` [PATCH v2 10/12] arm64: perf/kvm: Use a common PMU cycle counter define Rob Herring (Arm)
2024-06-27 10:48 ` Marc Zyngier
@ 2024-07-01 17:07 ` Mark Rutland
1 sibling, 0 replies; 29+ messages in thread
From: Mark Rutland @ 2024-07-01 17:07 UTC (permalink / raw)
To: Rob Herring (Arm)
Cc: Russell King, Peter Zijlstra, Ingo Molnar,
Arnaldo Carvalho de Melo, Namhyung Kim, Alexander Shishkin,
Jiri Olsa, Ian Rogers, Adrian Hunter, Will Deacon, Marc Zyngier,
Oliver Upton, James Morse, Suzuki K Poulose, Zenghui Yu,
Catalin Marinas, James Clark, linux-arm-kernel, linux-kernel,
linux-perf-users, kvmarm
On Wed, Jun 26, 2024 at 04:32:34PM -0600, Rob Herring (Arm) wrote:
> The PMUv3 and KVM code each have a define for the PMU cycle counter
> index. Move KVM's define to a shared location and use it for the
> PMUv3 driver.
>
> Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
> ---
> v2:
> - Move ARMV8_PMU_CYCLE_IDX to linux/perf/arm_pmuv3.h
> ---
> arch/arm64/kvm/sys_regs.c | 1 +
> drivers/perf/arm_pmuv3.c | 19 +++++++------------
> include/kvm/arm_pmu.h | 1 -
> include/linux/perf/arm_pmuv3.h | 3 +++
> 4 files changed, 11 insertions(+), 13 deletions(-)
Acked-by: Mark Rutland <mark.rutland@arm.com>
Mark.
>
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index f8b5db48ea8a..22393ae7ce14 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -18,6 +18,7 @@
> #include <linux/printk.h>
> #include <linux/uaccess.h>
>
> +#include <asm/arm_pmuv3.h>
> #include <asm/cacheflush.h>
> #include <asm/cputype.h>
> #include <asm/debug-monitors.h>
> diff --git a/drivers/perf/arm_pmuv3.c b/drivers/perf/arm_pmuv3.c
> index f771242168f1..f58dff49ea7d 100644
> --- a/drivers/perf/arm_pmuv3.c
> +++ b/drivers/perf/arm_pmuv3.c
> @@ -451,11 +451,6 @@ static const struct attribute_group armv8_pmuv3_caps_attr_group = {
> .attrs = armv8_pmuv3_caps_attrs,
> };
>
> -/*
> - * Perf Events' indices
> - */
> -#define ARMV8_IDX_CYCLE_COUNTER 31
> -
> /*
> * We unconditionally enable ARMv8.5-PMU long event counter support
> * (64-bit events) where supported. Indicate if this arm_pmu has long
> @@ -574,7 +569,7 @@ static u64 armv8pmu_read_counter(struct perf_event *event)
> int idx = hwc->idx;
> u64 value;
>
> - if (idx == ARMV8_IDX_CYCLE_COUNTER)
> + if (idx == ARMV8_PMU_CYCLE_IDX)
> value = read_pmccntr();
> else
> value = armv8pmu_read_hw_counter(event);
> @@ -607,7 +602,7 @@ static void armv8pmu_write_counter(struct perf_event *event, u64 value)
>
> value = armv8pmu_bias_long_counter(event, value);
>
> - if (idx == ARMV8_IDX_CYCLE_COUNTER)
> + if (idx == ARMV8_PMU_CYCLE_IDX)
> write_pmccntr(value);
> else
> armv8pmu_write_hw_counter(event, value);
> @@ -644,7 +639,7 @@ static void armv8pmu_write_event_type(struct perf_event *event)
> armv8pmu_write_evtype(idx - 1, hwc->config_base);
> armv8pmu_write_evtype(idx, chain_evt);
> } else {
> - if (idx == ARMV8_IDX_CYCLE_COUNTER)
> + if (idx == ARMV8_PMU_CYCLE_IDX)
> write_pmccfiltr(hwc->config_base);
> else
> armv8pmu_write_evtype(idx, hwc->config_base);
> @@ -772,7 +767,7 @@ static void armv8pmu_enable_user_access(struct arm_pmu *cpu_pmu)
> /* Clear any unused counters to avoid leaking their contents */
> for_each_andnot_bit(i, cpu_pmu->cntr_mask, cpuc->used_mask,
> ARMPMU_MAX_HWEVENTS) {
> - if (i == ARMV8_IDX_CYCLE_COUNTER)
> + if (i == ARMV8_PMU_CYCLE_IDX)
> write_pmccntr(0);
> else
> armv8pmu_write_evcntr(i, 0);
> @@ -933,8 +928,8 @@ static int armv8pmu_get_event_idx(struct pmu_hw_events *cpuc,
> /* Always prefer to place a cycle counter into the cycle counter. */
> if ((evtype == ARMV8_PMUV3_PERFCTR_CPU_CYCLES) &&
> !armv8pmu_event_get_threshold(&event->attr)) {
> - if (!test_and_set_bit(ARMV8_IDX_CYCLE_COUNTER, cpuc->used_mask))
> - return ARMV8_IDX_CYCLE_COUNTER;
> + if (!test_and_set_bit(ARMV8_PMU_CYCLE_IDX, cpuc->used_mask))
> + return ARMV8_PMU_CYCLE_IDX;
> else if (armv8pmu_event_is_64bit(event) &&
> armv8pmu_event_want_user_access(event) &&
> !armv8pmu_has_long_event(cpu_pmu))
> @@ -1196,7 +1191,7 @@ static void __armv8pmu_probe_pmu(void *info)
> 0, FIELD_GET(ARMV8_PMU_PMCR_N, armv8pmu_pmcr_read()));
>
> /* Add the CPU cycles counter */
> - set_bit(ARMV8_IDX_CYCLE_COUNTER, cpu_pmu->cntr_mask);
> + set_bit(ARMV8_PMU_CYCLE_IDX, cpu_pmu->cntr_mask);
>
> pmceid[0] = pmceid_raw[0] = read_pmceid0();
> pmceid[1] = pmceid_raw[1] = read_pmceid1();
> diff --git a/include/kvm/arm_pmu.h b/include/kvm/arm_pmu.h
> index 334d7c5503cf..871067fb2616 100644
> --- a/include/kvm/arm_pmu.h
> +++ b/include/kvm/arm_pmu.h
> @@ -10,7 +10,6 @@
> #include <linux/perf_event.h>
> #include <linux/perf/arm_pmuv3.h>
>
> -#define ARMV8_PMU_CYCLE_IDX (ARMV8_PMU_MAX_COUNTERS - 1)
>
> #if IS_ENABLED(CONFIG_HW_PERF_EVENTS) && IS_ENABLED(CONFIG_KVM)
> struct kvm_pmc {
> diff --git a/include/linux/perf/arm_pmuv3.h b/include/linux/perf/arm_pmuv3.h
> index 792b8e10b72a..f4ec76f725a3 100644
> --- a/include/linux/perf/arm_pmuv3.h
> +++ b/include/linux/perf/arm_pmuv3.h
> @@ -9,6 +9,9 @@
> #define ARMV8_PMU_MAX_GENERAL_COUNTERS 31
> #define ARMV8_PMU_MAX_COUNTERS 32
>
> +#define ARMV8_PMU_CYCLE_IDX 31
> +
> +
> /*
> * Common architectural and microarchitectural event numbers.
> */
>
> --
> 2.43.0
>
* Re: [PATCH v2 01/12] perf: arm_pmuv3: Avoid assigning fixed cycle counter with threshold
2024-06-26 22:32 ` [PATCH v2 01/12] perf: arm_pmuv3: Avoid assigning fixed cycle counter with threshold Rob Herring (Arm)
@ 2024-07-01 17:09 ` Mark Rutland
0 siblings, 0 replies; 29+ messages in thread
From: Mark Rutland @ 2024-07-01 17:09 UTC (permalink / raw)
To: Rob Herring (Arm)
Cc: Russell King, Peter Zijlstra, Ingo Molnar,
Arnaldo Carvalho de Melo, Namhyung Kim, Alexander Shishkin,
Jiri Olsa, Ian Rogers, Adrian Hunter, Will Deacon, Marc Zyngier,
Oliver Upton, James Morse, Suzuki K Poulose, Zenghui Yu,
Catalin Marinas, James Clark, linux-arm-kernel, linux-kernel,
linux-perf-users, kvmarm
On Wed, Jun 26, 2024 at 04:32:25PM -0600, Rob Herring (Arm) wrote:
> If the user has requested a counting threshold for the CPU cycles event,
> then the fixed cycle counter can't be assigned as it lacks threshold
> support. Currently, thresholds will randomly work or not, depending on
> which counter the event is assigned to.
>
> While using thresholds for CPU cycles doesn't make much sense, it can be
> useful for testing purposes.
>
> Fixes: 816c26754447 ("arm64: perf: Add support for event counting threshold")
> Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Mark.
> ---
> This should go to 6.10 and stable. It is also a dependency for ICNTR
> support.
>
> v2:
> - Add and use armv8pmu_event_get_threshold() helper.
>
> v1: https://lore.kernel.org/all/20240611155012.2286044-1-robh@kernel.org/
> ---
> drivers/perf/arm_pmuv3.c | 10 ++++++++--
> 1 file changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/perf/arm_pmuv3.c b/drivers/perf/arm_pmuv3.c
> index 23fa6c5da82c..8ed5c3358920 100644
> --- a/drivers/perf/arm_pmuv3.c
> +++ b/drivers/perf/arm_pmuv3.c
> @@ -338,6 +338,11 @@ static bool armv8pmu_event_want_user_access(struct perf_event *event)
> return ATTR_CFG_GET_FLD(&event->attr, rdpmc);
> }
>
> +static u32 armv8pmu_event_get_threshold(struct perf_event_attr *attr)
> +{
> + return ATTR_CFG_GET_FLD(attr, threshold);
> +}
> +
> static u8 armv8pmu_event_threshold_control(struct perf_event_attr *attr)
> {
> u8 th_compare = ATTR_CFG_GET_FLD(attr, threshold_compare);
> @@ -941,7 +946,8 @@ static int armv8pmu_get_event_idx(struct pmu_hw_events *cpuc,
> unsigned long evtype = hwc->config_base & ARMV8_PMU_EVTYPE_EVENT;
>
> /* Always prefer to place a cycle counter into the cycle counter. */
> - if (evtype == ARMV8_PMUV3_PERFCTR_CPU_CYCLES) {
> + if ((evtype == ARMV8_PMUV3_PERFCTR_CPU_CYCLES) &&
> + !armv8pmu_event_get_threshold(&event->attr)) {
> if (!test_and_set_bit(ARMV8_IDX_CYCLE_COUNTER, cpuc->used_mask))
> return ARMV8_IDX_CYCLE_COUNTER;
> else if (armv8pmu_event_is_64bit(event) &&
> @@ -1033,7 +1039,7 @@ static int armv8pmu_set_event_filter(struct hw_perf_event *event,
> * If FEAT_PMUv3_TH isn't implemented, then THWIDTH (threshold_max) will
> * be 0 and will also trigger this check, preventing it from being used.
> */
> - th = ATTR_CFG_GET_FLD(attr, threshold);
> + th = armv8pmu_event_get_threshold(attr);
> if (th > threshold_max(cpu_pmu)) {
> pr_debug("PMU event threshold exceeds max value\n");
> return -EINVAL;
>
> --
> 2.43.0
>
* Re: [PATCH v2 02/12] perf: arm_pmuv3: Drop unnecessary IS_ENABLED(CONFIG_ARM64) check
2024-06-26 22:32 ` [PATCH v2 02/12] perf: arm_pmuv3: Drop unnecessary IS_ENABLED(CONFIG_ARM64) check Rob Herring (Arm)
@ 2024-07-01 17:11 ` Mark Rutland
0 siblings, 0 replies; 29+ messages in thread
From: Mark Rutland @ 2024-07-01 17:11 UTC (permalink / raw)
To: Rob Herring (Arm)
Cc: Russell King, Peter Zijlstra, Ingo Molnar,
Arnaldo Carvalho de Melo, Namhyung Kim, Alexander Shishkin,
Jiri Olsa, Ian Rogers, Adrian Hunter, Will Deacon, Marc Zyngier,
Oliver Upton, James Morse, Suzuki K Poulose, Zenghui Yu,
Catalin Marinas, James Clark, linux-arm-kernel, linux-kernel,
linux-perf-users, kvmarm
On Wed, Jun 26, 2024 at 04:32:26PM -0600, Rob Herring (Arm) wrote:
> The IS_ENABLED(CONFIG_ARM64) check for threshold support is unnecessary.
> The purpose is to not enable thresholds on arm32, but if threshold is
> non-zero, the check against threshold_max() just above here will have
> errored out because threshold_max() is always 0 on arm32.
>
> Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Mark.
> ---
> v2:
> - new patch
> ---
> drivers/perf/arm_pmuv3.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/perf/arm_pmuv3.c b/drivers/perf/arm_pmuv3.c
> index 8ed5c3358920..3e51cd7062b9 100644
> --- a/drivers/perf/arm_pmuv3.c
> +++ b/drivers/perf/arm_pmuv3.c
> @@ -1045,7 +1045,7 @@ static int armv8pmu_set_event_filter(struct hw_perf_event *event,
> return -EINVAL;
> }
>
> - if (IS_ENABLED(CONFIG_ARM64) && th) {
> + if (th) {
> config_base |= FIELD_PREP(ARMV8_PMU_EVTYPE_TH, th);
> config_base |= FIELD_PREP(ARMV8_PMU_EVTYPE_TC,
> armv8pmu_event_threshold_control(attr));
>
> --
> 2.43.0
>
* Re: [PATCH v2 12/12] perf: arm_pmuv3: Add support for Armv9.4 PMU instruction counter
2024-06-26 22:32 ` [PATCH v2 12/12] perf: arm_pmuv3: Add support for Armv9.4 PMU instruction counter Rob Herring (Arm)
@ 2024-07-01 17:20 ` Mark Rutland
0 siblings, 0 replies; 29+ messages in thread
From: Mark Rutland @ 2024-07-01 17:20 UTC (permalink / raw)
To: Rob Herring (Arm)
Cc: Russell King, Peter Zijlstra, Ingo Molnar,
Arnaldo Carvalho de Melo, Namhyung Kim, Alexander Shishkin,
Jiri Olsa, Ian Rogers, Adrian Hunter, Will Deacon, Marc Zyngier,
Oliver Upton, James Morse, Suzuki K Poulose, Zenghui Yu,
Catalin Marinas, James Clark, linux-arm-kernel, linux-kernel,
linux-perf-users, kvmarm
On Wed, Jun 26, 2024 at 04:32:36PM -0600, Rob Herring (Arm) wrote:
> Armv9.4/8.9 PMU adds optional support for a fixed instruction counter
> similar to the fixed cycle counter. Support for the feature is indicated
> in the ID_AA64DFR1_EL1 register PMICNTR field. The counter is not
> accessible in AArch32.
>
> Existing userspace using direct counter access won't know how to handle
> the fixed instruction counter, so we have to avoid using the counter
> when user access is requested.
>
> Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Mark.
> ---
> v2:
> - Use set_bit() instead of bitmap_set()
> - Check for ARMV8_PMUV3_PERFCTR_INST_RETIRED first in counter assignment
> - Check for threshold disabled in counter assignment
> ---
> arch/arm/include/asm/arm_pmuv3.h | 20 ++++++++++++++++++++
> arch/arm64/include/asm/arm_pmuv3.h | 28 ++++++++++++++++++++++++++++
> arch/arm64/kvm/pmu.c | 8 ++++++--
> arch/arm64/tools/sysreg | 25 +++++++++++++++++++++++++
> drivers/perf/arm_pmuv3.c | 25 +++++++++++++++++++++++++
> include/linux/perf/arm_pmu.h | 8 ++++++--
> include/linux/perf/arm_pmuv3.h | 6 ++++--
> 7 files changed, 114 insertions(+), 6 deletions(-)
>
> diff --git a/arch/arm/include/asm/arm_pmuv3.h b/arch/arm/include/asm/arm_pmuv3.h
> index a41b503b7dcd..f63ba8986b24 100644
> --- a/arch/arm/include/asm/arm_pmuv3.h
> +++ b/arch/arm/include/asm/arm_pmuv3.h
> @@ -127,6 +127,12 @@ static inline u32 read_pmuver(void)
> return (dfr0 >> 24) & 0xf;
> }
>
> +static inline bool pmuv3_has_icntr(void)
> +{
> + /* FEAT_PMUv3_ICNTR not accessible for 32-bit */
> + return false;
> +}
> +
> static inline void write_pmcr(u32 val)
> {
> write_sysreg(val, PMCR);
> @@ -152,6 +158,13 @@ static inline u64 read_pmccntr(void)
> return read_sysreg(PMCCNTR);
> }
>
> +static inline void write_pmicntr(u64 val) {}
> +
> +static inline u64 read_pmicntr(void)
> +{
> + return 0;
> +}
> +
> static inline void write_pmcntenset(u32 val)
> {
> write_sysreg(val, PMCNTENSET);
> @@ -177,6 +190,13 @@ static inline void write_pmccfiltr(u32 val)
> write_sysreg(val, PMCCFILTR);
> }
>
> +static inline void write_pmicfiltr(u64 val) {}
> +
> +static inline u64 read_pmicfiltr(void)
> +{
> + return 0;
> +}
> +
> static inline void write_pmovsclr(u32 val)
> {
> write_sysreg(val, PMOVSR);
> diff --git a/arch/arm64/include/asm/arm_pmuv3.h b/arch/arm64/include/asm/arm_pmuv3.h
> index 36c3e82b4eec..468a049bc63b 100644
> --- a/arch/arm64/include/asm/arm_pmuv3.h
> +++ b/arch/arm64/include/asm/arm_pmuv3.h
> @@ -54,6 +54,14 @@ static inline u32 read_pmuver(void)
> ID_AA64DFR0_EL1_PMUVer_SHIFT);
> }
>
> +static inline bool pmuv3_has_icntr(void)
> +{
> + u64 dfr1 = read_sysreg(id_aa64dfr1_el1);
> +
> + return !!cpuid_feature_extract_unsigned_field(dfr1,
> + ID_AA64DFR1_EL1_PMICNTR_SHIFT);
> +}
> +
> static inline void write_pmcr(u64 val)
> {
> write_sysreg(val, pmcr_el0);
> @@ -79,6 +87,16 @@ static inline u64 read_pmccntr(void)
> return read_sysreg(pmccntr_el0);
> }
>
> +static inline void write_pmicntr(u64 val)
> +{
> + write_sysreg_s(val, SYS_PMICNTR_EL0);
> +}
> +
> +static inline u64 read_pmicntr(void)
> +{
> + return read_sysreg_s(SYS_PMICNTR_EL0);
> +}
> +
> static inline void write_pmcntenset(u64 val)
> {
> write_sysreg(val, pmcntenset_el0);
> @@ -109,6 +127,16 @@ static inline u64 read_pmccfiltr(void)
> return read_sysreg(pmccfiltr_el0);
> }
>
> +static inline void write_pmicfiltr(u64 val)
> +{
> + write_sysreg_s(val, SYS_PMICFILTR_EL0);
> +}
> +
> +static inline u64 read_pmicfiltr(void)
> +{
> + return read_sysreg_s(SYS_PMICFILTR_EL0);
> +}
> +
> static inline void write_pmovsclr(u64 val)
> {
> write_sysreg(val, pmovsclr_el0);
> diff --git a/arch/arm64/kvm/pmu.c b/arch/arm64/kvm/pmu.c
> index 215b74875815..0b3adf3e17b4 100644
> --- a/arch/arm64/kvm/pmu.c
> +++ b/arch/arm64/kvm/pmu.c
> @@ -66,24 +66,28 @@ void kvm_clr_pmu_events(u64 clr)
>
> /*
> * Read a value direct from PMEVTYPER<idx> where idx is 0-30
> - * or PMCCFILTR_EL0 where idx is ARMV8_PMU_CYCLE_IDX (31).
> + * or PMxCFILTR_EL0 where idx is 31-32.
> */
> static u64 kvm_vcpu_pmu_read_evtype_direct(int idx)
> {
> if (idx == ARMV8_PMU_CYCLE_IDX)
> return read_pmccfiltr();
> + else if (idx == ARMV8_PMU_INSTR_IDX)
> + return read_pmicfiltr();
>
> return read_pmevtypern(idx);
> }
>
> /*
> * Write a value direct to PMEVTYPER<idx> where idx is 0-30
> - * or PMCCFILTR_EL0 where idx is ARMV8_PMU_CYCLE_IDX (31).
> + * or PMxCFILTR_EL0 where idx is 31-32.
> */
> static void kvm_vcpu_pmu_write_evtype_direct(int idx, u32 val)
> {
> if (idx == ARMV8_PMU_CYCLE_IDX)
> write_pmccfiltr(val);
> + else if (idx == ARMV8_PMU_INSTR_IDX)
> + write_pmicfiltr(val);
> else
> write_pmevtypern(idx, val);
> }
> diff --git a/arch/arm64/tools/sysreg b/arch/arm64/tools/sysreg
> index 231817a379b5..8ab6e09871de 100644
> --- a/arch/arm64/tools/sysreg
> +++ b/arch/arm64/tools/sysreg
> @@ -2029,6 +2029,31 @@ Sysreg FAR_EL1 3 0 6 0 0
> Field 63:0 ADDR
> EndSysreg
>
> +Sysreg PMICNTR_EL0 3 3 9 4 0
> +Field 63:0 ICNT
> +EndSysreg
> +
> +Sysreg PMICFILTR_EL0 3 3 9 6 0
> +Res0 63:59
> +Field 58 SYNC
> +Field 57:56 VS
> +Res0 55:32
> +Field 31 P
> +Field 30 U
> +Field 29 NSK
> +Field 28 NSU
> +Field 27 NSH
> +Field 26 M
> +Res0 25
> +Field 24 SH
> +Field 23 T
> +Field 22 RLK
> +Field 21 RLU
> +Field 20 RLH
> +Res0 19:16
> +Field 15:0 evtCount
> +EndSysreg
> +
> Sysreg PMSCR_EL1 3 0 9 9 0
> Res0 63:8
> Field 7:6 PCT
> diff --git a/drivers/perf/arm_pmuv3.c b/drivers/perf/arm_pmuv3.c
> index f58dff49ea7d..3b3a3334cc3f 100644
> --- a/drivers/perf/arm_pmuv3.c
> +++ b/drivers/perf/arm_pmuv3.c
> @@ -571,6 +571,8 @@ static u64 armv8pmu_read_counter(struct perf_event *event)
>
> if (idx == ARMV8_PMU_CYCLE_IDX)
> value = read_pmccntr();
> + else if (idx == ARMV8_PMU_INSTR_IDX)
> + value = read_pmicntr();
> else
> value = armv8pmu_read_hw_counter(event);
>
> @@ -604,6 +606,8 @@ static void armv8pmu_write_counter(struct perf_event *event, u64 value)
>
> if (idx == ARMV8_PMU_CYCLE_IDX)
> write_pmccntr(value);
> + else if (idx == ARMV8_PMU_INSTR_IDX)
> + write_pmicntr(value);
> else
> armv8pmu_write_hw_counter(event, value);
> }
> @@ -641,6 +645,8 @@ static void armv8pmu_write_event_type(struct perf_event *event)
> } else {
> if (idx == ARMV8_PMU_CYCLE_IDX)
> write_pmccfiltr(hwc->config_base);
> + else if (idx == ARMV8_PMU_INSTR_IDX)
> + write_pmicfiltr(hwc->config_base);
> else
> armv8pmu_write_evtype(idx, hwc->config_base);
> }
> @@ -769,6 +775,8 @@ static void armv8pmu_enable_user_access(struct arm_pmu *cpu_pmu)
> ARMPMU_MAX_HWEVENTS) {
> if (i == ARMV8_PMU_CYCLE_IDX)
> write_pmccntr(0);
> + else if (i == ARMV8_PMU_INSTR_IDX)
> + write_pmicntr(0);
> else
> armv8pmu_write_evcntr(i, 0);
> }
> @@ -936,6 +944,19 @@ static int armv8pmu_get_event_idx(struct pmu_hw_events *cpuc,
> return -EAGAIN;
> }
>
> + /*
> + * Always prefer to place an instructions event into the instruction counter,
> + * but don't expose the instruction counter to userspace access as userspace
> + * may not know how to handle it.
> + */
> + if ((evtype == ARMV8_PMUV3_PERFCTR_INST_RETIRED) &&
> + !armv8pmu_event_get_threshold(&event->attr) &&
> + test_bit(ARMV8_PMU_INSTR_IDX, cpu_pmu->cntr_mask) &&
> + !armv8pmu_event_want_user_access(event)) {
> + if (!test_and_set_bit(ARMV8_PMU_INSTR_IDX, cpuc->used_mask))
> + return ARMV8_PMU_INSTR_IDX;
> + }
> +
> /*
> * Otherwise use events counters
> */
> @@ -1193,6 +1214,10 @@ static void __armv8pmu_probe_pmu(void *info)
> /* Add the CPU cycles counter */
> set_bit(ARMV8_PMU_CYCLE_IDX, cpu_pmu->cntr_mask);
>
> + /* Add the CPU instructions counter */
> + if (pmuv3_has_icntr())
> + set_bit(ARMV8_PMU_INSTR_IDX, cpu_pmu->cntr_mask);
> +
> pmceid[0] = pmceid_raw[0] = read_pmceid0();
> pmceid[1] = pmceid_raw[1] = read_pmceid1();
>
> diff --git a/include/linux/perf/arm_pmu.h b/include/linux/perf/arm_pmu.h
> index e5d6d204beab..4b5b83677e3f 100644
> --- a/include/linux/perf/arm_pmu.h
> +++ b/include/linux/perf/arm_pmu.h
> @@ -17,10 +17,14 @@
> #ifdef CONFIG_ARM_PMU
>
> /*
> - * The ARMv7 CPU PMU supports up to 32 event counters.
> + * The Armv7 and Armv8.8 or earlier CPU PMUs support up to 32 event counters.
> + * The Armv8.9/9.4 CPU PMU supports up to 33 event counters.
> */
> +#ifdef CONFIG_ARM
> #define ARMPMU_MAX_HWEVENTS 32
> -
> +#else
> +#define ARMPMU_MAX_HWEVENTS 33
> +#endif
> /*
> * ARM PMU hw_event flags
> */
> diff --git a/include/linux/perf/arm_pmuv3.h b/include/linux/perf/arm_pmuv3.h
> index 4f7a7f2222e5..3372c1b56486 100644
> --- a/include/linux/perf/arm_pmuv3.h
> +++ b/include/linux/perf/arm_pmuv3.h
> @@ -8,7 +8,7 @@
>
> #define ARMV8_PMU_MAX_GENERAL_COUNTERS 31
> #define ARMV8_PMU_CYCLE_IDX 31
> -
> +#define ARMV8_PMU_INSTR_IDX 32 /* Not accessible from AArch32 */
>
> /*
> * Common architectural and microarchitectural event numbers.
> @@ -228,8 +228,10 @@
> */
> #define ARMV8_PMU_OVSR_P GENMASK(30, 0)
> #define ARMV8_PMU_OVSR_C BIT(31)
> +#define ARMV8_PMU_OVSR_F BIT_ULL(32) /* arm64 only */
> /* Mask for writable bits is both P and C fields */
> -#define ARMV8_PMU_OVERFLOWED_MASK (ARMV8_PMU_OVSR_P | ARMV8_PMU_OVSR_C)
> +#define ARMV8_PMU_OVERFLOWED_MASK (ARMV8_PMU_OVSR_P | ARMV8_PMU_OVSR_C | \
> + ARMV8_PMU_OVSR_F)
>
> /*
> * PMXEVTYPER: Event selection reg
>
> --
> 2.43.0
>
* Re: [PATCH v2 06/12] perf: arm_pmu: Remove event index to counter remapping
2024-07-01 15:49 ` Rob Herring
@ 2024-07-02 16:19 ` Will Deacon
0 siblings, 0 replies; 29+ messages in thread
From: Will Deacon @ 2024-07-02 16:19 UTC (permalink / raw)
To: Rob Herring
Cc: Marc Zyngier, Russell King, Peter Zijlstra, Ingo Molnar,
Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter,
Oliver Upton, James Morse, Suzuki K Poulose, Zenghui Yu,
Catalin Marinas, James Clark, linux-arm-kernel, linux-kernel,
linux-perf-users, kvmarm
On Mon, Jul 01, 2024 at 09:49:29AM -0600, Rob Herring wrote:
> On Mon, Jul 1, 2024 at 7:52 AM Will Deacon <will@kernel.org> wrote:
> >
> > On Thu, Jun 27, 2024 at 12:05:23PM +0100, Marc Zyngier wrote:
> > > On Wed, 26 Jun 2024 23:32:30 +0100,
> > > "Rob Herring (Arm)" <robh@kernel.org> wrote:
> > > >
> > > > Xscale and Armv6 PMUs defined the cycle counter at 0 and event counters
> > > > starting at 1 and had 1:1 event index to counter numbering. On Armv7 and
> > > > later, this changed the cycle counter to 31 and event counters start at
> > > > 0. The drivers for Armv7 and PMUv3 kept the old event index numbering
> > > > and introduced an event index to counter conversion. The conversion uses
> > > > masking to convert from event index to a counter number. This operation
> > > > relies on having at most 32 counters so that the cycle counter index 0
> > > > can be transformed to counter number 31.
> > > >
> > > > Armv9.4 adds support for an additional fixed function counter
> > > > (instructions) which increases possible counters to more than 32, and
> > > > the conversion won't work anymore as a simple subtract and mask. The
> > > > primary reason for the translation (other than history) seems to be to
> > > > have a contiguous mask of counters 0-N. Keeping that would result in
> > > > more complicated index to counter conversions. Instead, store a mask of
> > > > available counters rather than just number of events. That provides more
> > > > information in addition to the number of events.
> > > >
> > > > No (intended) functional changes.
> > > >
> > > > Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
> > >
> > > [...]
> > >
> > > > diff --git a/include/linux/perf/arm_pmu.h b/include/linux/perf/arm_pmu.h
> > > > index b3b34f6670cf..e5d6d204beab 100644
> > > > --- a/include/linux/perf/arm_pmu.h
> > > > +++ b/include/linux/perf/arm_pmu.h
> > > > @@ -96,7 +96,7 @@ struct arm_pmu {
> > > > void (*stop)(struct arm_pmu *);
> > > > void (*reset)(void *);
> > > > int (*map_event)(struct perf_event *event);
> > > > - int num_events;
> > > > + DECLARE_BITMAP(cntr_mask, ARMPMU_MAX_HWEVENTS);
> > >
> > > I'm slightly worried by this, as this size is never used, let alone
> > > checked by the individual drivers. I can perfectly picture some new
> > > (non-architectural) PMU driver having more counters than that, and
> > > blindly setting bits outside of the allowed range.
> >
> > I tend to agree.
> >
> > > One way to make it a bit safer would be to add a helper replacing the
> > > various bitmap_set() calls, and enforcing that we never overflow this
> > > bitmap.
> >
> > Or perhaps we could leave the 'num_events' field intact and allocate the
> > new bitmap dynamically?
> >
> > Rob -- what do you prefer? I think the rest of the series is ready to go.
>
> I think the list of places we're initializing cntr_mask is short
> enough to check, and additions to arm_pmu users are rare enough that I
> would not be too worried about it.
>
> If anything, I think the issue is with the bitmap API in that it has
> no bounds checking. I'm sure it will get on someone's radar to fix at
> some point.
>
> But if we want to have some checking, this is what I have:
>
> static inline void armpmu_set_counter_mask(struct arm_pmu *pmu,
> unsigned int start, unsigned int nr)
> {
> if (WARN_ON(start + nr > ARMPMU_MAX_HWEVENTS))
> return;
> bitmap_set(pmu->cntr_mask, start, nr);
> }
Fair enough, for the sake of consistency, let's leave the series as-is
and we can add helpers for all the counter-bound structures later, if we
want to.
Will
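For illustration only (not part of the series as applied): a minimal sketch of
the bounds-checked helper proposed above, plus a hypothetical caller showing
how a driver's init path might use it in place of a raw bitmap_set() call. The
caller name and its counter count are invented for the example.

#include <linux/bitmap.h>
#include <linux/bug.h>
#include <linux/perf/arm_pmu.h>

/* Refuse to set counter bits beyond the fixed-size cntr_mask bitmap. */
static inline void armpmu_set_counter_mask(struct arm_pmu *pmu,
					   unsigned int start, unsigned int nr)
{
	if (WARN_ON(start + nr > ARMPMU_MAX_HWEVENTS))
		return;
	bitmap_set(pmu->cntr_mask, start, nr);
}

/* Hypothetical driver init: claim six general-purpose counters at index 0. */
static void example_pmu_init(struct arm_pmu *cpu_pmu)
{
	armpmu_set_counter_mask(cpu_pmu, 0, 6);
}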
* Re: [PATCH v2 00/12] arm64: Add support for Armv9.4 PMU fixed instruction counter
2024-06-26 22:32 [PATCH v2 00/12] arm64: Add support for Armv9.4 PMU fixed instruction counter Rob Herring (Arm)
` (11 preceding siblings ...)
2024-06-26 22:32 ` [PATCH v2 12/12] perf: arm_pmuv3: Add support for Armv9.4 PMU instruction counter Rob Herring (Arm)
@ 2024-07-03 14:38 ` Will Deacon
2024-07-10 12:36 ` Will Deacon
12 siblings, 1 reply; 29+ messages in thread
From: Will Deacon @ 2024-07-03 14:38 UTC (permalink / raw)
To: Russell King, Peter Zijlstra, Ingo Molnar,
Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter,
Marc Zyngier, Oliver Upton, James Morse, Suzuki K Poulose,
Zenghui Yu, Catalin Marinas, James Clark, Rob Herring (Arm)
Cc: kernel-team, Will Deacon, linux-arm-kernel, linux-kernel,
linux-perf-users, kvmarm
On Wed, 26 Jun 2024 16:32:24 -0600, Rob Herring (Arm) wrote:
> This series adds support for the optional fixed instruction counter
> added in Armv9.4 PMU. Most of the series is a refactoring to remove the
> index to counter number conversion which dates back to the Armv7 PMU
> driver. Removing it is necessary in order to support more than 32
> counters without a bunch of conditional code further complicating the
> conversion.
>
> [...]
Applied to will (for-next/perf), thanks!
[01/12] perf: arm_pmuv3: Avoid assigning fixed cycle counter with threshold
https://git.kernel.org/will/c/81e15ca3e523
[02/12] perf: arm_pmuv3: Drop unnecessary IS_ENABLED(CONFIG_ARM64) check
https://git.kernel.org/will/c/598c1a2d9f4b
[03/12] perf/arm: Move 32-bit PMU drivers to drivers/perf/
https://git.kernel.org/will/c/8d75537bebfa
[04/12] perf: arm_v6/7_pmu: Drop non-DT probe support
https://git.kernel.org/will/c/12f051c987dc
[05/12] perf: arm_pmuv3: Include asm/arm_pmuv3.h from linux/perf/arm_pmuv3.h
https://git.kernel.org/will/c/d688ffa26942
[06/12] perf: arm_pmu: Remove event index to counter remapping
https://git.kernel.org/will/c/b7e89b0f5bd7
[07/12] perf: arm_pmuv3: Prepare for more than 32 counters
https://git.kernel.org/will/c/12fef9fb7179
[08/12] KVM: arm64: pmu: Use arm_pmuv3.h register accessors
https://git.kernel.org/will/c/6ef2846c17a3
[09/12] KVM: arm64: pmu: Use generated define for PMSELR_EL0.SEL access
https://git.kernel.org/will/c/558fdd12c069
[10/12] arm64: perf/kvm: Use a common PMU cycle counter define
https://git.kernel.org/will/c/323bc9e17c01
[11/12] KVM: arm64: Refine PMU defines for number of counters
https://git.kernel.org/will/c/be884dd62461
[12/12] perf: arm_pmuv3: Add support for Armv9.4 PMU instruction counter
https://git.kernel.org/will/c/dc4c33f753ca
Cheers,
--
Will
https://fixes.arm64.dev
https://next.arm64.dev
https://will.arm64.dev
* Re: [PATCH v2 00/12] arm64: Add support for Armv9.4 PMU fixed instruction counter
2024-07-03 14:38 ` [PATCH v2 00/12] arm64: Add support for Armv9.4 PMU fixed " Will Deacon
@ 2024-07-10 12:36 ` Will Deacon
0 siblings, 0 replies; 29+ messages in thread
From: Will Deacon @ 2024-07-10 12:36 UTC (permalink / raw)
To: Russell King, Peter Zijlstra, Ingo Molnar,
Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter,
Marc Zyngier, Oliver Upton, James Morse, Suzuki K Poulose,
Zenghui Yu, Catalin Marinas, James Clark, Rob Herring (Arm)
Cc: kernel-team, linux-arm-kernel, linux-kernel, linux-perf-users,
kvmarm
On Wed, Jul 03, 2024 at 03:38:44PM +0100, Will Deacon wrote:
> On Wed, 26 Jun 2024 16:32:24 -0600, Rob Herring (Arm) wrote:
> > This series adds support for the optional fixed instruction counter
> > added in Armv9.4 PMU. Most of the series is a refactoring to remove the
> > index to counter number conversion which dates back to the Armv7 PMU
> > driver. Removing it is necessary in order to support more than 32
> > counters without a bunch of conditional code further complicating the
> > conversion.
> >
> > [...]
>
> Applied to will (for-next/perf), thanks!
>
> [01/12] perf: arm_pmuv3: Avoid assigning fixed cycle counter with threshold
> https://git.kernel.org/will/c/81e15ca3e523
> [02/12] perf: arm_pmuv3: Drop unnecessary IS_ENABLED(CONFIG_ARM64) check
> https://git.kernel.org/will/c/598c1a2d9f4b
> [03/12] perf/arm: Move 32-bit PMU drivers to drivers/perf/
> https://git.kernel.org/will/c/8d75537bebfa
> [04/12] perf: arm_v6/7_pmu: Drop non-DT probe support
> https://git.kernel.org/will/c/12f051c987dc
> [05/12] perf: arm_pmuv3: Include asm/arm_pmuv3.h from linux/perf/arm_pmuv3.h
> https://git.kernel.org/will/c/d688ffa26942
I've had an off-list report that this series causes a kernel crash under
KVM unit tests (panic in write_pmevtypern()).
Given that I don't have enough information to repro/debug and Catalin is
tagging the arm64 branch for 6.11 today, I've dropped patches 6-12 for
now. Please can you send a fixed version after the merge window?
Cheers,
Will
Thread overview: 29+ messages
2024-06-26 22:32 [PATCH v2 00/12] arm64: Add support for Armv9.4 PMU fixed instruction counter Rob Herring (Arm)
2024-06-26 22:32 ` [PATCH v2 01/12] perf: arm_pmuv3: Avoid assigning fixed cycle counter with threshold Rob Herring (Arm)
2024-07-01 17:09 ` Mark Rutland
2024-06-26 22:32 ` [PATCH v2 02/12] perf: arm_pmuv3: Drop unnecessary IS_ENABLED(CONFIG_ARM64) check Rob Herring (Arm)
2024-07-01 17:11 ` Mark Rutland
2024-06-26 22:32 ` [PATCH v2 03/12] perf/arm: Move 32-bit PMU drivers to drivers/perf/ Rob Herring (Arm)
2024-06-26 22:32 ` [PATCH v2 04/12] perf: arm_v6/7_pmu: Drop non-DT probe support Rob Herring (Arm)
2024-06-26 22:32 ` [PATCH v2 05/12] perf: arm_pmuv3: Include asm/arm_pmuv3.h from linux/perf/arm_pmuv3.h Rob Herring (Arm)
2024-06-26 22:32 ` [PATCH v2 06/12] perf: arm_pmu: Remove event index to counter remapping Rob Herring (Arm)
2024-06-27 11:05 ` Marc Zyngier
2024-07-01 13:52 ` Will Deacon
2024-07-01 15:32 ` Mark Rutland
2024-07-01 15:49 ` Rob Herring
2024-07-02 16:19 ` Will Deacon
2024-07-01 17:06 ` Mark Rutland
2024-06-26 22:32 ` [PATCH v2 07/12] perf: arm_pmuv3: Prepare for more than 32 counters Rob Herring (Arm)
2024-06-26 22:32 ` [PATCH v2 08/12] KVM: arm64: pmu: Use arm_pmuv3.h register accessors Rob Herring (Arm)
2024-06-27 10:47 ` Marc Zyngier
2024-06-26 22:32 ` [PATCH v2 09/12] KVM: arm64: pmu: Use generated define for PMSELR_EL0.SEL access Rob Herring (Arm)
2024-06-27 10:47 ` Marc Zyngier
2024-06-26 22:32 ` [PATCH v2 10/12] arm64: perf/kvm: Use a common PMU cycle counter define Rob Herring (Arm)
2024-06-27 10:48 ` Marc Zyngier
2024-07-01 17:07 ` Mark Rutland
2024-06-26 22:32 ` [PATCH v2 11/12] KVM: arm64: Refine PMU defines for number of counters Rob Herring (Arm)
2024-06-27 10:54 ` Marc Zyngier
2024-06-26 22:32 ` [PATCH v2 12/12] perf: arm_pmuv3: Add support for Armv9.4 PMU instruction counter Rob Herring (Arm)
2024-07-01 17:20 ` Mark Rutland
2024-07-03 14:38 ` [PATCH v2 00/12] arm64: Add support for Armv9.4 PMU fixed " Will Deacon
2024-07-10 12:36 ` Will Deacon