* [PATCH v2 00/11] perf: arm_spe: Armv8.8 SPE features
@ 2025-05-29 11:30 James Clark
2025-05-29 11:30 ` [PATCH v2 01/11] arm64: sysreg: Update PMSIDR_EL1 description James Clark
` (11 more replies)
0 siblings, 12 replies; 20+ messages in thread
From: James Clark @ 2025-05-29 11:30 UTC (permalink / raw)
To: Catalin Marinas, Will Deacon, Mark Rutland, Jonathan Corbet,
Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
Zenghui Yu, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Alexander Shishkin, Jiri Olsa, Ian Rogers,
Adrian Hunter
Cc: linux-arm-kernel, linux-kernel, linux-perf-users, linux-doc,
kvmarm, James Clark, Leo Yan
Support 3 new SPE features: FEAT_SPEv1p4 filters, FEAT_SPE_EFT extended
filtering, and FEAT_SPE_FDS data source filtering. The features are
independent and can be applied separately:
* Prerequisite sysreg changes - patches 1 - 2
* FEAT_SPEv1p4 - patch 3
* FEAT_SPE_EFT - patch 4
* FEAT_SPE_FDS - patches 5 - 8
* FEAT_SPE_FDS Perf tool changes - patches 9 - 11
The first two features work with older versions of Perf, but the last
feature requires a Perf change to parse the new config4 field.
---
Changes in v2:
- Fix detection of FEAT_SPE_FDS in el2_setup.h
- Pick up Marc Z's sysreg change instead, which matches the JSON
- Restructure and expand docs changes
- Link to v1: https://lore.kernel.org/r/20250506-james-perf-feat_spe_eft-v1-0-dd480e8e4851@linaro.org
---
James Clark (10):
arm64: sysreg: Add new PMSFCR_EL1 fields and PMSDSFR_EL1 register
perf: arm_spe: Support FEAT_SPEv1p4 filters
perf: arm_spe: Add support for FEAT_SPE_EFT extended filtering
arm64/boot: Enable EL2 requirements for SPE_FEAT_FDS
KVM: arm64: Add trap configs for PMSDSFR_EL1
perf: Add perf_event_attr::config4
perf: arm_spe: Add support for filtering on data source
tools headers UAPI: Sync linux/perf_event.h with the kernel sources
perf tools: Add support for perf_event_attr::config4
perf docs: arm-spe: Document new SPE filtering features
Marc Zyngier (1):
arm64: sysreg: Update PMSIDR_EL1 description
Documentation/arch/arm64/booting.rst | 11 ++++
arch/arm64/include/asm/el2_setup.h | 14 +++++
arch/arm64/include/asm/sysreg.h | 7 +++
arch/arm64/kvm/emulate-nested.c | 1 +
arch/arm64/kvm/sys_regs.c | 1 +
arch/arm64/tools/sysreg | 45 ++++++++++++--
drivers/perf/arm_spe_pmu.c | 100 +++++++++++++++++++++++++++++-
include/uapi/linux/perf_event.h | 2 +
tools/include/uapi/linux/perf_event.h | 2 +
tools/perf/Documentation/perf-arm-spe.txt | 97 ++++++++++++++++++++++++++---
tools/perf/tests/parse-events.c | 14 ++++-
tools/perf/util/parse-events.c | 11 ++++
tools/perf/util/parse-events.h | 1 +
tools/perf/util/parse-events.l | 1 +
tools/perf/util/pmu.c | 8 +++
tools/perf/util/pmu.h | 1 +
16 files changed, 301 insertions(+), 15 deletions(-)
---
base-commit: 90b83efa6701656e02c86e7df2cb1765ea602d07
change-id: 20250312-james-perf-feat_spe_eft-66cdf4d8fe99
Best regards,
--
James Clark <james.clark@linaro.org>
* [PATCH v2 01/11] arm64: sysreg: Update PMSIDR_EL1 description
2025-05-29 11:30 [PATCH v2 00/11] perf: arm_spe: Armv8.8 SPE features James Clark
@ 2025-05-29 11:30 ` James Clark
2025-05-29 11:30 ` [PATCH v2 02/11] arm64: sysreg: Add new PMSFCR_EL1 fields and PMSDSFR_EL1 register James Clark
` (10 subsequent siblings)
11 siblings, 0 replies; 20+ messages in thread
From: James Clark @ 2025-05-29 11:30 UTC (permalink / raw)
To: Catalin Marinas, Will Deacon, Mark Rutland, Jonathan Corbet,
Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
Zenghui Yu, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Alexander Shishkin, Jiri Olsa, Ian Rogers,
Adrian Hunter
Cc: linux-arm-kernel, linux-kernel, linux-perf-users, linux-doc,
kvmarm
From: Marc Zyngier <maz@kernel.org>
Add the missing SME, ALTCLK, FPF, EFT, CRR and FDS fields.
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
arch/arm64/tools/sysreg | 28 ++++++++++++++++++++++++++--
1 file changed, 26 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/tools/sysreg b/arch/arm64/tools/sysreg
index bdf044c5d11b..e7a8423500f7 100644
--- a/arch/arm64/tools/sysreg
+++ b/arch/arm64/tools/sysreg
@@ -2226,7 +2226,28 @@ Field 15:0 MINLAT
EndSysreg
Sysreg PMSIDR_EL1 3 0 9 9 7
-Res0 63:25
+Res0 63:33
+UnsignedEnum 32 SME
+ 0b0 NI
+ 0b1 IMP
+EndEnum
+UnsignedEnum 31:28 ALTCLK
+ 0b0000 NI
+ 0b0001 IMP
+ 0b1111 IMPDEF
+EndEnum
+UnsignedEnum 27 FPF
+ 0b0 NI
+ 0b1 IMP
+EndEnum
+UnsignedEnum 26 EFT
+ 0b0 NI
+ 0b1 IMP
+EndEnum
+UnsignedEnum 25 CRR
+ 0b0 NI
+ 0b1 IMP
+EndEnum
Field 24 PBT
Field 23:20 FORMAT
Enum 19:16 COUNTSIZE
@@ -2244,7 +2265,10 @@ Enum 11:8 INTERVAL
0b0111 3072
0b1000 4096
EndEnum
-Res0 7
+UnsignedEnum 7 FDS
+ 0b0 NI
+ 0b1 IMP
+EndEnum
Field 6 FnE
Field 5 ERND
Field 4 LDS
--
2.34.1
* [PATCH v2 02/11] arm64: sysreg: Add new PMSFCR_EL1 fields and PMSDSFR_EL1 register
2025-05-29 11:30 [PATCH v2 00/11] perf: arm_spe: Armv8.8 SPE features James Clark
2025-05-29 11:30 ` [PATCH v2 01/11] arm64: sysreg: Update PMSIDR_EL1 description James Clark
@ 2025-05-29 11:30 ` James Clark
2025-05-29 11:30 ` [PATCH v2 03/11] perf: arm_spe: Support FEAT_SPEv1p4 filters James Clark
` (9 subsequent siblings)
11 siblings, 0 replies; 20+ messages in thread
From: James Clark @ 2025-05-29 11:30 UTC (permalink / raw)
To: Catalin Marinas, Will Deacon, Mark Rutland, Jonathan Corbet,
Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
Zenghui Yu, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Alexander Shishkin, Jiri Olsa, Ian Rogers,
Adrian Hunter
Cc: linux-arm-kernel, linux-kernel, linux-perf-users, linux-doc,
kvmarm, James Clark
Add new fields and register that are introduced for the features
FEAT_SPE_EFT (extended filtering) and FEAT_SPE_FDS (data source
filtering).
Signed-off-by: James Clark <james.clark@linaro.org>
---
arch/arm64/tools/sysreg | 17 +++++++++++++++--
1 file changed, 15 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/tools/sysreg b/arch/arm64/tools/sysreg
index e7a8423500f7..e2cadf224f7e 100644
--- a/arch/arm64/tools/sysreg
+++ b/arch/arm64/tools/sysreg
@@ -2205,11 +2205,20 @@ Field 0 RND
EndSysreg
Sysreg PMSFCR_EL1 3 0 9 9 4
-Res0 63:19
+Res0 63:53
+Field 52 SIMDm
+Field 51 FPm
+Field 50 STm
+Field 49 LDm
+Field 48 Bm
+Res0 47:21
+Field 20 SIMD
+Field 19 FP
Field 18 ST
Field 17 LD
Field 16 B
-Res0 15:4
+Res0 15:5
+Field 4 FDS
Field 3 FnE
Field 2 FL
Field 1 FT
@@ -2311,6 +2320,10 @@ Field 16 COLL
Field 15:0 MSS
EndSysreg
+Sysreg PMSDSFR_EL1 3 0 9 10 4
+Field 63:0 S
+EndSysreg
+
Sysreg PMBIDR_EL1 3 0 9 10 7
Res0 63:12
Enum 11:8 EA
--
2.34.1
* [PATCH v2 03/11] perf: arm_spe: Support FEAT_SPEv1p4 filters
2025-05-29 11:30 [PATCH v2 00/11] perf: arm_spe: Armv8.8 SPE features James Clark
2025-05-29 11:30 ` [PATCH v2 01/11] arm64: sysreg: Update PMSIDR_EL1 description James Clark
2025-05-29 11:30 ` [PATCH v2 02/11] arm64: sysreg: Add new PMSFCR_EL1 fields and PMSDSFR_EL1 register James Clark
@ 2025-05-29 11:30 ` James Clark
2025-05-29 11:30 ` [PATCH v2 04/11] perf: arm_spe: Add support for FEAT_SPE_EFT extended filtering James Clark
` (8 subsequent siblings)
11 siblings, 0 replies; 20+ messages in thread
From: James Clark @ 2025-05-29 11:30 UTC (permalink / raw)
To: Catalin Marinas, Will Deacon, Mark Rutland, Jonathan Corbet,
Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
Zenghui Yu, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Alexander Shishkin, Jiri Olsa, Ian Rogers,
Adrian Hunter
Cc: linux-arm-kernel, linux-kernel, linux-perf-users, linux-doc,
kvmarm, Leo Yan, James Clark
FEAT_SPEv1p4 (optional from Armv8.8) adds some new filter bits, so
remove them from the previous version's RES0 bits using
PMSEVFR_EL1_RES0_V1P4_EXCL. It also makes some previously available bits
unavailable again, so add those back using PMSEVFR_EL1_RES0_V1P4_INCL.
E.g:
E[30], bit [30]
When FEAT_SPEv1p4 is _not_ implemented ...
FEAT_SPE_V1P3 has the same filters as V1P2 so explicitly add it to the
switch.
Reviewed-by: Leo Yan <leo.yan@arm.com>
Signed-off-by: James Clark <james.clark@linaro.org>
---
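The mask derivation described above can be sketched in userspace C. This
is only an illustration: BIT_ULL() and GENMASK_ULL() are local stand-ins
for the kernel helpers, the v1.2 mask is passed in by the caller rather
than assumed, and res0_v1p4() and event_filter_is_valid() are
hypothetical names that do not exist in the driver.

```c
#include <stdint.h>

/* Userspace stand-ins for the kernel's BIT_ULL()/GENMASK_ULL() helpers. */
#define BIT_ULL(n)        (1ULL << (n))
#define GENMASK_ULL(h, l) (((~0ULL) >> (63 - (h))) & ((~0ULL) << (l)))

/* Bits that stop being RES0 with FEAT_SPEv1p4 (values from this patch). */
#define RES0_V1P4_EXCL \
	(BIT_ULL(2) | BIT_ULL(4) | GENMASK_ULL(10, 8) | GENMASK_ULL(23, 19))
/* Bits that become RES0 again with FEAT_SPEv1p4 (values from this patch). */
#define RES0_V1P4_INCL GENMASK_ULL(31, 26)

/* Hypothetical helper: derive the v1.4 RES0 mask from a given v1.2 mask. */
static uint64_t res0_v1p4(uint64_t res0_v1p2)
{
	return RES0_V1P4_INCL | (res0_v1p2 & ~RES0_V1P4_EXCL);
}

/* Hypothetical helper: a filter is valid only if it sets no RES0 bit. */
static int event_filter_is_valid(uint64_t event_filter, uint64_t res0)
{
	return (event_filter & res0) == 0;
}
```

With this model, a filter using bit 4 is accepted for v1.4 even if bit 4
was RES0 in the v1.2 mask, while a filter using bit 30 is rejected.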
arch/arm64/include/asm/sysreg.h | 7 +++++++
drivers/perf/arm_spe_pmu.c | 5 ++++-
2 files changed, 11 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index 2639d3633073..e24042e914a4 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -354,6 +354,13 @@
(PMSEVFR_EL1_RES0_IMP & ~(BIT_ULL(18) | BIT_ULL(17) | BIT_ULL(11)))
#define PMSEVFR_EL1_RES0_V1P2 \
(PMSEVFR_EL1_RES0_V1P1 & ~BIT_ULL(6))
+#define PMSEVFR_EL1_RES0_V1P4_EXCL \
+ (BIT_ULL(2) | BIT_ULL(4) | GENMASK_ULL(10, 8) | GENMASK_ULL(23, 19))
+#define PMSEVFR_EL1_RES0_V1P4_INCL \
+ (GENMASK_ULL(31, 26))
+#define PMSEVFR_EL1_RES0_V1P4 \
+ (PMSEVFR_EL1_RES0_V1P4_INCL | \
+ (PMSEVFR_EL1_RES0_V1P2 & ~PMSEVFR_EL1_RES0_V1P4_EXCL))
/* Buffer error reporting */
#define PMBSR_EL1_FAULT_FSC_SHIFT PMBSR_EL1_MSS_SHIFT
diff --git a/drivers/perf/arm_spe_pmu.c b/drivers/perf/arm_spe_pmu.c
index 3efed8839a4e..d9f6d229dce8 100644
--- a/drivers/perf/arm_spe_pmu.c
+++ b/drivers/perf/arm_spe_pmu.c
@@ -701,9 +701,12 @@ static u64 arm_spe_pmsevfr_res0(u16 pmsver)
case ID_AA64DFR0_EL1_PMSVer_V1P1:
return PMSEVFR_EL1_RES0_V1P1;
case ID_AA64DFR0_EL1_PMSVer_V1P2:
+ case ID_AA64DFR0_EL1_PMSVer_V1P3:
+ return PMSEVFR_EL1_RES0_V1P2;
+ case ID_AA64DFR0_EL1_PMSVer_V1P4:
/* Return the highest version we support in default */
default:
- return PMSEVFR_EL1_RES0_V1P2;
+ return PMSEVFR_EL1_RES0_V1P4;
}
}
--
2.34.1
* [PATCH v2 04/11] perf: arm_spe: Add support for FEAT_SPE_EFT extended filtering
2025-05-29 11:30 [PATCH v2 00/11] perf: arm_spe: Armv8.8 SPE features James Clark
` (2 preceding siblings ...)
2025-05-29 11:30 ` [PATCH v2 03/11] perf: arm_spe: Support FEAT_SPEv1p4 filters James Clark
@ 2025-05-29 11:30 ` James Clark
2025-05-29 11:30 ` [PATCH v2 05/11] arm64/boot: Enable EL2 requirements for SPE_FEAT_FDS James Clark
` (7 subsequent siblings)
11 siblings, 0 replies; 20+ messages in thread
From: James Clark @ 2025-05-29 11:30 UTC (permalink / raw)
To: Catalin Marinas, Will Deacon, Mark Rutland, Jonathan Corbet,
Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
Zenghui Yu, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Alexander Shishkin, Jiri Olsa, Ian Rogers,
Adrian Hunter
Cc: linux-arm-kernel, linux-kernel, linux-perf-users, linux-doc,
kvmarm, Leo Yan, James Clark
FEAT_SPE_EFT (optional from Armv9.4) adds mask bits for the existing
load, store and branch filters. It also adds two new filter bits for
SIMD and floating point with their own associated mask bits. The current
filters only allow OR filtering on samples that are load OR store etc,
and the new mask bits allow setting part of the filter to an AND, for
example filtering samples that are store AND SIMD. With mask bits set to
0, the OR behavior is preserved, so unless any masks are explicitly set,
old filters will behave the same.
Add them all and make them behave the same way as the existing format
bits: hidden, and returning EOPNOTSUPP if set, when the feature doesn't
exist.
Reviewed-by: Leo Yan <leo.yan@arm.com>
Signed-off-by: James Clark <james.clark@linaro.org>
---
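As a rough userspace illustration of the new format fields, the sketch
below packs them into a raw perf_event_attr::config value using the bit
positions from the ATTR_CFG_FLD_* definitions in this patch. The enum,
spe_cfg_set() and spe_cfg_needs_eft() are hypothetical names made up for
this example. Note that the driver performs its EOPNOTSUPP check on the
derived PMSFCR_EL1 value, not on the config bits as done here.

```c
#include <stdint.h>

/* Bit positions in perf_event_attr::config, from ATTR_CFG_FLD_*_LO/HI. */
enum spe_eft_cfg_bit {
	SPE_CFG_BRANCH_FILTER_MASK = 36,
	SPE_CFG_LOAD_FILTER_MASK   = 37,
	SPE_CFG_STORE_FILTER_MASK  = 38,
	SPE_CFG_SIMD_FILTER        = 39,
	SPE_CFG_SIMD_FILTER_MASK   = 40,
	SPE_CFG_FLOAT_FILTER       = 41,
	SPE_CFG_FLOAT_FILTER_MASK  = 42,
};

/* Hypothetical helper: set one single-bit format field in config. */
static uint64_t spe_cfg_set(uint64_t config, enum spe_eft_cfg_bit bit)
{
	return config | (1ULL << bit);
}

/* Hypothetical helper: nonzero if config uses any FEAT_SPE_EFT field. */
static int spe_cfg_needs_eft(uint64_t config)
{
	/* Seven contiguous single-bit fields, bits 36 to 42 inclusive. */
	const uint64_t eft_bits = 0x7fULL << SPE_CFG_BRANCH_FILTER_MASK;

	return (config & eft_bits) != 0;
}
```

In practice the Perf tool would set these through the named format
fields exposed in sysfs rather than through raw bit positions.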
drivers/perf/arm_spe_pmu.c | 64 ++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 64 insertions(+)
diff --git a/drivers/perf/arm_spe_pmu.c b/drivers/perf/arm_spe_pmu.c
index d9f6d229dce8..9309b846f642 100644
--- a/drivers/perf/arm_spe_pmu.c
+++ b/drivers/perf/arm_spe_pmu.c
@@ -86,6 +86,7 @@ struct arm_spe_pmu {
#define SPE_PMU_FEAT_ERND (1UL << 5)
#define SPE_PMU_FEAT_INV_FILT_EVT (1UL << 6)
#define SPE_PMU_FEAT_DISCARD (1UL << 7)
+#define SPE_PMU_FEAT_EFT (1UL << 8)
#define SPE_PMU_FEAT_DEV_PROBED (1UL << 63)
u64 features;
@@ -197,6 +198,27 @@ static const struct attribute_group arm_spe_pmu_cap_group = {
#define ATTR_CFG_FLD_discard_CFG config /* PMBLIMITR_EL1.FM = DISCARD */
#define ATTR_CFG_FLD_discard_LO 35
#define ATTR_CFG_FLD_discard_HI 35
+#define ATTR_CFG_FLD_branch_filter_mask_CFG config /* PMSFCR_EL1.Bm */
+#define ATTR_CFG_FLD_branch_filter_mask_LO 36
+#define ATTR_CFG_FLD_branch_filter_mask_HI 36
+#define ATTR_CFG_FLD_load_filter_mask_CFG config /* PMSFCR_EL1.LDm */
+#define ATTR_CFG_FLD_load_filter_mask_LO 37
+#define ATTR_CFG_FLD_load_filter_mask_HI 37
+#define ATTR_CFG_FLD_store_filter_mask_CFG config /* PMSFCR_EL1.STm */
+#define ATTR_CFG_FLD_store_filter_mask_LO 38
+#define ATTR_CFG_FLD_store_filter_mask_HI 38
+#define ATTR_CFG_FLD_simd_filter_CFG config /* PMSFCR_EL1.SIMD */
+#define ATTR_CFG_FLD_simd_filter_LO 39
+#define ATTR_CFG_FLD_simd_filter_HI 39
+#define ATTR_CFG_FLD_simd_filter_mask_CFG config /* PMSFCR_EL1.SIMDm */
+#define ATTR_CFG_FLD_simd_filter_mask_LO 40
+#define ATTR_CFG_FLD_simd_filter_mask_HI 40
+#define ATTR_CFG_FLD_float_filter_CFG config /* PMSFCR_EL1.FP */
+#define ATTR_CFG_FLD_float_filter_LO 41
+#define ATTR_CFG_FLD_float_filter_HI 41
+#define ATTR_CFG_FLD_float_filter_mask_CFG config /* PMSFCR_EL1.FPm */
+#define ATTR_CFG_FLD_float_filter_mask_LO 42
+#define ATTR_CFG_FLD_float_filter_mask_HI 42
#define ATTR_CFG_FLD_event_filter_CFG config1 /* PMSEVFR_EL1 */
#define ATTR_CFG_FLD_event_filter_LO 0
@@ -215,8 +237,15 @@ GEN_PMU_FORMAT_ATTR(pa_enable);
GEN_PMU_FORMAT_ATTR(pct_enable);
GEN_PMU_FORMAT_ATTR(jitter);
GEN_PMU_FORMAT_ATTR(branch_filter);
+GEN_PMU_FORMAT_ATTR(branch_filter_mask);
GEN_PMU_FORMAT_ATTR(load_filter);
+GEN_PMU_FORMAT_ATTR(load_filter_mask);
GEN_PMU_FORMAT_ATTR(store_filter);
+GEN_PMU_FORMAT_ATTR(store_filter_mask);
+GEN_PMU_FORMAT_ATTR(simd_filter);
+GEN_PMU_FORMAT_ATTR(simd_filter_mask);
+GEN_PMU_FORMAT_ATTR(float_filter);
+GEN_PMU_FORMAT_ATTR(float_filter_mask);
GEN_PMU_FORMAT_ATTR(event_filter);
GEN_PMU_FORMAT_ATTR(inv_event_filter);
GEN_PMU_FORMAT_ATTR(min_latency);
@@ -228,8 +257,15 @@ static struct attribute *arm_spe_pmu_formats_attr[] = {
&format_attr_pct_enable.attr,
&format_attr_jitter.attr,
&format_attr_branch_filter.attr,
+ &format_attr_branch_filter_mask.attr,
&format_attr_load_filter.attr,
+ &format_attr_load_filter_mask.attr,
&format_attr_store_filter.attr,
+ &format_attr_store_filter_mask.attr,
+ &format_attr_simd_filter.attr,
+ &format_attr_simd_filter_mask.attr,
+ &format_attr_float_filter.attr,
+ &format_attr_float_filter_mask.attr,
&format_attr_event_filter.attr,
&format_attr_inv_event_filter.attr,
&format_attr_min_latency.attr,
@@ -250,6 +286,16 @@ static umode_t arm_spe_pmu_format_attr_is_visible(struct kobject *kobj,
if (attr == &format_attr_inv_event_filter.attr && !(spe_pmu->features & SPE_PMU_FEAT_INV_FILT_EVT))
return 0;
+ if ((attr == &format_attr_branch_filter_mask.attr ||
+ attr == &format_attr_load_filter_mask.attr ||
+ attr == &format_attr_store_filter_mask.attr ||
+ attr == &format_attr_simd_filter.attr ||
+ attr == &format_attr_simd_filter_mask.attr ||
+ attr == &format_attr_float_filter.attr ||
+ attr == &format_attr_float_filter_mask.attr) &&
+ !(spe_pmu->features & SPE_PMU_FEAT_EFT))
+ return 0;
+
return attr->mode;
}
@@ -341,8 +387,15 @@ static u64 arm_spe_event_to_pmsfcr(struct perf_event *event)
u64 reg = 0;
reg |= FIELD_PREP(PMSFCR_EL1_LD, ATTR_CFG_GET_FLD(attr, load_filter));
+ reg |= FIELD_PREP(PMSFCR_EL1_LDm, ATTR_CFG_GET_FLD(attr, load_filter_mask));
reg |= FIELD_PREP(PMSFCR_EL1_ST, ATTR_CFG_GET_FLD(attr, store_filter));
+ reg |= FIELD_PREP(PMSFCR_EL1_STm, ATTR_CFG_GET_FLD(attr, store_filter_mask));
reg |= FIELD_PREP(PMSFCR_EL1_B, ATTR_CFG_GET_FLD(attr, branch_filter));
+ reg |= FIELD_PREP(PMSFCR_EL1_Bm, ATTR_CFG_GET_FLD(attr, branch_filter_mask));
+ reg |= FIELD_PREP(PMSFCR_EL1_SIMD, ATTR_CFG_GET_FLD(attr, simd_filter));
+ reg |= FIELD_PREP(PMSFCR_EL1_SIMDm, ATTR_CFG_GET_FLD(attr, simd_filter_mask));
+ reg |= FIELD_PREP(PMSFCR_EL1_FP, ATTR_CFG_GET_FLD(attr, float_filter));
+ reg |= FIELD_PREP(PMSFCR_EL1_FPm, ATTR_CFG_GET_FLD(attr, float_filter_mask));
if (reg)
reg |= PMSFCR_EL1_FT;
@@ -716,6 +769,10 @@ static int arm_spe_pmu_event_init(struct perf_event *event)
u64 reg;
struct perf_event_attr *attr = &event->attr;
struct arm_spe_pmu *spe_pmu = to_spe_pmu(event->pmu);
+ const u64 feat_spe_eft_bits = PMSFCR_EL1_LDm | PMSFCR_EL1_STm |
+ PMSFCR_EL1_Bm | PMSFCR_EL1_SIMD |
+ PMSFCR_EL1_SIMDm | PMSFCR_EL1_FP |
+ PMSFCR_EL1_FPm;
/* This is, of course, deeply driver-specific */
if (attr->type != event->pmu->type)
@@ -761,6 +818,10 @@ static int arm_spe_pmu_event_init(struct perf_event *event)
!(spe_pmu->features & SPE_PMU_FEAT_FILT_LAT))
return -EOPNOTSUPP;
+ if ((reg & feat_spe_eft_bits) &&
+ !(spe_pmu->features & SPE_PMU_FEAT_EFT))
+ return -EOPNOTSUPP;
+
if (ATTR_CFG_GET_FLD(&event->attr, discard) &&
!(spe_pmu->features & SPE_PMU_FEAT_DISCARD))
return -EOPNOTSUPP;
@@ -1052,6 +1113,9 @@ static void __arm_spe_pmu_dev_probe(void *info)
if (spe_pmu->pmsver >= ID_AA64DFR0_EL1_PMSVer_V1P2)
spe_pmu->features |= SPE_PMU_FEAT_DISCARD;
+ if (FIELD_GET(PMSIDR_EL1_EFT, reg))
+ spe_pmu->features |= SPE_PMU_FEAT_EFT;
+
/* This field has a spaced out encoding, so just use a look-up */
fld = FIELD_GET(PMSIDR_EL1_INTERVAL, reg);
switch (fld) {
--
2.34.1
* [PATCH v2 05/11] arm64/boot: Enable EL2 requirements for SPE_FEAT_FDS
2025-05-29 11:30 [PATCH v2 00/11] perf: arm_spe: Armv8.8 SPE features James Clark
` (3 preceding siblings ...)
2025-05-29 11:30 ` [PATCH v2 04/11] perf: arm_spe: Add support for FEAT_SPE_EFT extended filtering James Clark
@ 2025-05-29 11:30 ` James Clark
2025-05-29 16:57 ` Marc Zyngier
2025-05-29 11:30 ` [PATCH v2 06/11] KVM: arm64: Add trap configs for PMSDSFR_EL1 James Clark
` (6 subsequent siblings)
11 siblings, 1 reply; 20+ messages in thread
From: James Clark @ 2025-05-29 11:30 UTC (permalink / raw)
To: Catalin Marinas, Will Deacon, Mark Rutland, Jonathan Corbet,
Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
Zenghui Yu, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Alexander Shishkin, Jiri Olsa, Ian Rogers,
Adrian Hunter
Cc: linux-arm-kernel, linux-kernel, linux-perf-users, linux-doc,
kvmarm, James Clark
SPE data source filtering (optional from Armv8.8) requires that traps to
the filter register PMSDSFR be disabled. Document the requirements and
disable the traps if the feature is present.
Signed-off-by: James Clark <james.clark@linaro.org>
---
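The detection sequence added to el2_setup.h can be modelled in plain C
as below. This is a sketch, not the boot code: register values are
passed in as arguments, el2_fds_untrap() is a hypothetical name, and the
PMSVer field position (ID_AA64DFR0_EL1 bits [35:32]) and the "IMP"
encoding of 1 are assumptions taken from the architecture rather than
from this patch. The FDS and nPMSDSFR_EL1 bit numbers come from patch 01
and the booting.rst hunk.

```c
#include <stdint.h>

#define PMSVER_SHIFT      32 /* assumption: ID_AA64DFR0_EL1.PMSVer [35:32] */
#define PMSVER_IMP        1  /* assumption: lowest "SPE implemented" value */
#define PMSIDR_FDS_BIT    7  /* PMSIDR_EL1.FDS */
#define NPMSDSFR_EL1_BIT  19 /* HDFG{R,W}TR2_EL2.nPMSDSFR_EL1 */

/*
 * Hypothetical model: return the fine-grained trap value with the
 * PMSDSFR_EL1 trap disabled, but only if SPE and FEAT_SPE_FDS exist.
 */
static uint64_t el2_fds_untrap(uint64_t hdfg, uint64_t id_aa64dfr0,
			       uint64_t pmsidr)
{
	unsigned int pmsver = (id_aa64dfr0 >> PMSVER_SHIFT) & 0xf;

	/* Without SPE, PMSIDR_EL1 must not be read at all. */
	if (pmsver < PMSVER_IMP)
		return hdfg;

	/* SPE is present, so PMSIDR_EL1.FDS can be checked. */
	if (!(pmsidr & (1ULL << PMSIDR_FDS_BIT)))
		return hdfg;

	return hdfg | (1ULL << NPMSDSFR_EL1_BIT);
}
```

The same value is then written to both the read and the write trap
register, matching the two msr_s instructions that follow the new block.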
Documentation/arch/arm64/booting.rst | 11 +++++++++++
arch/arm64/include/asm/el2_setup.h | 14 ++++++++++++++
2 files changed, 25 insertions(+)
diff --git a/Documentation/arch/arm64/booting.rst b/Documentation/arch/arm64/booting.rst
index dee7b6de864f..abd75085a239 100644
--- a/Documentation/arch/arm64/booting.rst
+++ b/Documentation/arch/arm64/booting.rst
@@ -404,6 +404,17 @@ Before jumping into the kernel, the following conditions must be met:
- HDFGWTR2_EL2.nPMICFILTR_EL0 (bit 3) must be initialised to 0b1.
- HDFGWTR2_EL2.nPMUACR_EL1 (bit 4) must be initialised to 0b1.
+ For CPUs with SPE data source filtering (FEAT_SPE_FDS):
+
+ - If EL3 is present:
+
+ - MDCR_EL3.EnPMS3 (bit 42) must be initialised to 0b1.
+
+ - If the kernel is entered at EL1 and EL2 is present:
+
+ - HDFGRTR2_EL2.nPMSDSFR_EL1 (bit 19) must be initialised to 0b1.
+ - HDFGWTR2_EL2.nPMSDSFR_EL1 (bit 19) must be initialised to 0b1.
+
For CPUs with Memory Copy and Memory Set instructions (FEAT_MOPS):
- If the kernel is entered at EL1 and EL2 is present:
diff --git a/arch/arm64/include/asm/el2_setup.h b/arch/arm64/include/asm/el2_setup.h
index f6d72ca03133..6d0d8c25e912 100644
--- a/arch/arm64/include/asm/el2_setup.h
+++ b/arch/arm64/include/asm/el2_setup.h
@@ -279,6 +279,20 @@
orr x0, x0, #HDFGRTR2_EL2_nPMICFILTR_EL0
orr x0, x0, #HDFGRTR2_EL2_nPMUACR_EL1
.Lskip_pmuv3p9_\@:
+ mrs x1, id_aa64dfr0_el1
+ ubfx x1, x1, #ID_AA64DFR0_EL1_PMSVer_SHIFT, #4
+ /* If SPE is implemented, */
+ cmp x1, #ID_AA64DFR0_EL1_PMSVer_IMP
+ b.lt .Lskip_spefds_\@
+ /* we can read PMSIDR and */
+ mrs_s x1, SYS_PMSIDR_EL1
+ and x1, x1, #(1 << PMSIDR_EL1_FDS_SHIFT)
+ /* if FEAT_SPE_FDS is implemented, */
+ cbz x1, .Lskip_spefds_\@
+ /* disable traps to PMSDSFR. */
+ orr x0, x0, #HDFGRTR2_EL2_nPMSDSFR_EL1
+
+.Lskip_spefds_\@:
msr_s SYS_HDFGRTR2_EL2, x0
msr_s SYS_HDFGWTR2_EL2, x0
msr_s SYS_HFGRTR2_EL2, xzr
--
2.34.1
* [PATCH v2 06/11] KVM: arm64: Add trap configs for PMSDSFR_EL1
2025-05-29 11:30 [PATCH v2 00/11] perf: arm_spe: Armv8.8 SPE features James Clark
` (4 preceding siblings ...)
2025-05-29 11:30 ` [PATCH v2 05/11] arm64/boot: Enable EL2 requirements for SPE_FEAT_FDS James Clark
@ 2025-05-29 11:30 ` James Clark
2025-05-29 16:56 ` Marc Zyngier
2025-05-29 11:30 ` [PATCH v2 07/11] perf: Add perf_event_attr::config4 James Clark
` (5 subsequent siblings)
11 siblings, 1 reply; 20+ messages in thread
From: James Clark @ 2025-05-29 11:30 UTC (permalink / raw)
To: Catalin Marinas, Will Deacon, Mark Rutland, Jonathan Corbet,
Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
Zenghui Yu, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Alexander Shishkin, Jiri Olsa, Ian Rogers,
Adrian Hunter
Cc: linux-arm-kernel, linux-kernel, linux-perf-users, linux-doc,
kvmarm, James Clark
SPE data source filtering (FEAT_SPE_FDS) adds a new register,
PMSDSFR_EL1. Add the trap configs for it.
Signed-off-by: James Clark <james.clark@linaro.org>
---
arch/arm64/kvm/emulate-nested.c | 1 +
arch/arm64/kvm/sys_regs.c | 1 +
2 files changed, 2 insertions(+)
diff --git a/arch/arm64/kvm/emulate-nested.c b/arch/arm64/kvm/emulate-nested.c
index 0fcfcc0478f9..05d3e6b93ae9 100644
--- a/arch/arm64/kvm/emulate-nested.c
+++ b/arch/arm64/kvm/emulate-nested.c
@@ -1169,6 +1169,7 @@ static const struct encoding_to_trap_config encoding_to_cgt[] __initconst = {
SR_TRAP(SYS_PMSIRR_EL1, CGT_MDCR_TPMS),
SR_TRAP(SYS_PMSLATFR_EL1, CGT_MDCR_TPMS),
SR_TRAP(SYS_PMSNEVFR_EL1, CGT_MDCR_TPMS),
+ SR_TRAP(SYS_PMSDSFR_EL1, CGT_MDCR_TPMS),
SR_TRAP(SYS_TRFCR_EL1, CGT_MDCR_TTRF),
SR_TRAP(SYS_TRBBASER_EL1, CGT_MDCR_E2TB),
SR_TRAP(SYS_TRBLIMITR_EL1, CGT_MDCR_E2TB),
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 5dde9285afc8..9f544ac7b5a6 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -2956,6 +2956,7 @@ static const struct sys_reg_desc sys_reg_descs[] = {
{ SYS_DESC(SYS_PMBLIMITR_EL1), undef_access },
{ SYS_DESC(SYS_PMBPTR_EL1), undef_access },
{ SYS_DESC(SYS_PMBSR_EL1), undef_access },
+ { SYS_DESC(SYS_PMSDSFR_EL1), undef_access },
/* PMBIDR_EL1 is not trapped */
{ PMU_SYS_REG(PMINTENSET_EL1),
--
2.34.1
* [PATCH v2 07/11] perf: Add perf_event_attr::config4
2025-05-29 11:30 [PATCH v2 00/11] perf: arm_spe: Armv8.8 SPE features James Clark
` (5 preceding siblings ...)
2025-05-29 11:30 ` [PATCH v2 06/11] KVM: arm64: Add trap configs for PMSDSFR_EL1 James Clark
@ 2025-05-29 11:30 ` James Clark
2025-05-29 11:30 ` [PATCH v2 08/11] perf: arm_spe: Add support for filtering on data source James Clark
` (4 subsequent siblings)
11 siblings, 0 replies; 20+ messages in thread
From: James Clark @ 2025-05-29 11:30 UTC (permalink / raw)
To: Catalin Marinas, Will Deacon, Mark Rutland, Jonathan Corbet,
Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
Zenghui Yu, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Alexander Shishkin, Jiri Olsa, Ian Rogers,
Adrian Hunter
Cc: linux-arm-kernel, linux-kernel, linux-perf-users, linux-doc,
kvmarm, Leo Yan, James Clark
Arm FEAT_SPE_FDS adds the ability to filter on the data source of a
packet using another 64 bits of event filtering control. As the existing
perf_event_attr::configN fields are all used up for SPE PMU, an
additional field is needed. Add a new 'config4' field.
Reviewed-by: Leo Yan <leo.yan@arm.com>
Signed-off-by: James Clark <james.clark@linaro.org>
---
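The size bump can be checked with a small sketch. The two constants
mirror PERF_ATTR_SIZE_VER8 and PERF_ATTR_SIZE_VER9 from this patch, but
are redefined locally so the example builds against headers that predate
config4. attr_size_has_config4() is a hypothetical helper, and it
assumes the usual perf_event_attr::size convention where a smaller size
means the trailing fields are absent.

```c
#include <stddef.h>
#include <stdint.h>

#define ATTR_SIZE_VER8 136 /* struct ends after config3 */
#define ATTR_SIZE_VER9 144 /* struct ends after config4 */

/* config4 is one __u64 appended directly after the VER8 layout. */
_Static_assert(ATTR_SIZE_VER9 == ATTR_SIZE_VER8 + sizeof(uint64_t),
	       "config4 should add exactly 8 bytes");

/* Hypothetical helper: does an attr of this size carry config4? */
static int attr_size_has_config4(uint32_t attr_size)
{
	return attr_size >= ATTR_SIZE_VER9;
}
```

A caller built against older headers would pass a size of 136 or less,
in which case config4 is simply not part of what it hands to the kernel.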
include/uapi/linux/perf_event.h | 2 ++
1 file changed, 2 insertions(+)
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 78a362b80027..0d0ed85ad8cb 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -382,6 +382,7 @@ enum perf_event_read_format {
#define PERF_ATTR_SIZE_VER6 120 /* Add: aux_sample_size */
#define PERF_ATTR_SIZE_VER7 128 /* Add: sig_data */
#define PERF_ATTR_SIZE_VER8 136 /* Add: config3 */
+#define PERF_ATTR_SIZE_VER9 144 /* add: config4 */
/*
* 'struct perf_event_attr' contains various attributes that define
@@ -543,6 +544,7 @@ struct perf_event_attr {
__u64 sig_data;
__u64 config3; /* extension of config2 */
+ __u64 config4; /* extension of config3 */
};
/*
--
2.34.1
* [PATCH v2 08/11] perf: arm_spe: Add support for filtering on data source
2025-05-29 11:30 [PATCH v2 00/11] perf: arm_spe: Armv8.8 SPE features James Clark
` (6 preceding siblings ...)
2025-05-29 11:30 ` [PATCH v2 07/11] perf: Add perf_event_attr::config4 James Clark
@ 2025-05-29 11:30 ` James Clark
2025-05-29 11:30 ` [PATCH v2 09/11] tools headers UAPI: Sync linux/perf_event.h with the kernel sources James Clark
` (3 subsequent siblings)
11 siblings, 0 replies; 20+ messages in thread
From: James Clark @ 2025-05-29 11:30 UTC (permalink / raw)
To: Catalin Marinas, Will Deacon, Mark Rutland, Jonathan Corbet,
Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
Zenghui Yu, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Alexander Shishkin, Jiri Olsa, Ian Rogers,
Adrian Hunter
Cc: linux-arm-kernel, linux-kernel, linux-perf-users, linux-doc,
kvmarm, Leo Yan, James Clark
FEAT_SPE_FDS adds the ability to filter on the data source of packets.
Like the other existing filters, enable filtering with PMSFCR_EL1.FDS
when any of the filter bits are set.
Each bit of the filter maps to one of the data sources 0-63, as encoded
in bits [5:0] of the data source packet (the full data source field is
16 bits wide, so data sources with higher values can't be filtered on).
The filter is an OR of all the bits, so for example setting bits 0 and 3
filters packets from data sources 0 OR 3.
Reviewed-by: Leo Yan <leo.yan@arm.com>
Signed-off-by: James Clark <james.clark@linaro.org>
---
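The bit-per-data-source encoding can be illustrated with a short sketch
that builds a value for the data_src_filter field (config4).
data_src_filter_build() is a hypothetical userspace helper, not part of
the driver or of Perf; it only encodes the rule that bit N selects data
source N and that sources above 63 cannot be expressed.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: OR together one bit per requested data source.
 * Returns 0 on success, or -1 if a source cannot be represented
 * because it is above 63. *mask is only written on success.
 */
static int data_src_filter_build(const unsigned int *srcs, size_t n,
				 uint64_t *mask)
{
	uint64_t m = 0;

	for (size_t i = 0; i < n; i++) {
		if (srcs[i] > 63)
			return -1;
		m |= 1ULL << srcs[i];
	}

	*mask = m;
	return 0;
}
```

For data sources 0 and 3 this yields 0x9, which matches the "bits 0 and
3" example above. A resulting mask of 0 leaves PMSFCR_EL1.FDS clear, so
no data source filtering is applied.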
drivers/perf/arm_spe_pmu.c | 31 +++++++++++++++++++++++++++++++
1 file changed, 31 insertions(+)
diff --git a/drivers/perf/arm_spe_pmu.c b/drivers/perf/arm_spe_pmu.c
index 9309b846f642..d04318411f77 100644
--- a/drivers/perf/arm_spe_pmu.c
+++ b/drivers/perf/arm_spe_pmu.c
@@ -87,6 +87,7 @@ struct arm_spe_pmu {
#define SPE_PMU_FEAT_INV_FILT_EVT (1UL << 6)
#define SPE_PMU_FEAT_DISCARD (1UL << 7)
#define SPE_PMU_FEAT_EFT (1UL << 8)
+#define SPE_PMU_FEAT_FDS (1UL << 9)
#define SPE_PMU_FEAT_DEV_PROBED (1UL << 63)
u64 features;
@@ -232,6 +233,10 @@ static const struct attribute_group arm_spe_pmu_cap_group = {
#define ATTR_CFG_FLD_inv_event_filter_LO 0
#define ATTR_CFG_FLD_inv_event_filter_HI 63
+#define ATTR_CFG_FLD_data_src_filter_CFG config4 /* PMSDSFR_EL1 */
+#define ATTR_CFG_FLD_data_src_filter_LO 0
+#define ATTR_CFG_FLD_data_src_filter_HI 63
+
GEN_PMU_FORMAT_ATTR(ts_enable);
GEN_PMU_FORMAT_ATTR(pa_enable);
GEN_PMU_FORMAT_ATTR(pct_enable);
@@ -248,6 +253,7 @@ GEN_PMU_FORMAT_ATTR(float_filter);
GEN_PMU_FORMAT_ATTR(float_filter_mask);
GEN_PMU_FORMAT_ATTR(event_filter);
GEN_PMU_FORMAT_ATTR(inv_event_filter);
+GEN_PMU_FORMAT_ATTR(data_src_filter);
GEN_PMU_FORMAT_ATTR(min_latency);
GEN_PMU_FORMAT_ATTR(discard);
@@ -268,6 +274,7 @@ static struct attribute *arm_spe_pmu_formats_attr[] = {
&format_attr_float_filter_mask.attr,
&format_attr_event_filter.attr,
&format_attr_inv_event_filter.attr,
+ &format_attr_data_src_filter.attr,
&format_attr_min_latency.attr,
&format_attr_discard.attr,
NULL,
@@ -286,6 +293,9 @@ static umode_t arm_spe_pmu_format_attr_is_visible(struct kobject *kobj,
if (attr == &format_attr_inv_event_filter.attr && !(spe_pmu->features & SPE_PMU_FEAT_INV_FILT_EVT))
return 0;
+ if (attr == &format_attr_data_src_filter.attr && !(spe_pmu->features & SPE_PMU_FEAT_FDS))
+ return 0;
+
if ((attr == &format_attr_branch_filter_mask.attr ||
attr == &format_attr_load_filter_mask.attr ||
attr == &format_attr_store_filter_mask.attr ||
@@ -406,6 +416,9 @@ static u64 arm_spe_event_to_pmsfcr(struct perf_event *event)
if (ATTR_CFG_GET_FLD(attr, inv_event_filter))
reg |= PMSFCR_EL1_FnE;
+ if (ATTR_CFG_GET_FLD(attr, data_src_filter))
+ reg |= PMSFCR_EL1_FDS;
+
if (ATTR_CFG_GET_FLD(attr, min_latency))
reg |= PMSFCR_EL1_FL;
@@ -430,6 +443,12 @@ static u64 arm_spe_event_to_pmslatfr(struct perf_event *event)
return FIELD_PREP(PMSLATFR_EL1_MINLAT, ATTR_CFG_GET_FLD(attr, min_latency));
}
+static u64 arm_spe_event_to_pmsdsfr(struct perf_event *event)
+{
+ struct perf_event_attr *attr = &event->attr;
+ return ATTR_CFG_GET_FLD(attr, data_src_filter);
+}
+
static void arm_spe_pmu_pad_buf(struct perf_output_handle *handle, int len)
{
struct arm_spe_pmu_buf *buf = perf_get_aux(handle);
@@ -788,6 +807,10 @@ static int arm_spe_pmu_event_init(struct perf_event *event)
if (arm_spe_event_to_pmsnevfr(event) & arm_spe_pmsevfr_res0(spe_pmu->pmsver))
return -EOPNOTSUPP;
+ if (arm_spe_event_to_pmsdsfr(event) &&
+ !(spe_pmu->features & SPE_PMU_FEAT_FDS))
+ return -EOPNOTSUPP;
+
if (attr->exclude_idle)
return -EOPNOTSUPP;
@@ -857,6 +880,11 @@ static void arm_spe_pmu_start(struct perf_event *event, int flags)
write_sysreg_s(reg, SYS_PMSNEVFR_EL1);
}
+ if (spe_pmu->features & SPE_PMU_FEAT_FDS) {
+ reg = arm_spe_event_to_pmsdsfr(event);
+ write_sysreg_s(reg, SYS_PMSDSFR_EL1);
+ }
+
reg = arm_spe_event_to_pmslatfr(event);
write_sysreg_s(reg, SYS_PMSLATFR_EL1);
@@ -1116,6 +1144,9 @@ static void __arm_spe_pmu_dev_probe(void *info)
if (FIELD_GET(PMSIDR_EL1_EFT, reg))
spe_pmu->features |= SPE_PMU_FEAT_EFT;
+ if (FIELD_GET(PMSIDR_EL1_FDS, reg))
+ spe_pmu->features |= SPE_PMU_FEAT_FDS;
+
/* This field has a spaced out encoding, so just use a look-up */
fld = FIELD_GET(PMSIDR_EL1_INTERVAL, reg);
switch (fld) {
--
2.34.1
* [PATCH v2 09/11] tools headers UAPI: Sync linux/perf_event.h with the kernel sources
2025-05-29 11:30 [PATCH v2 00/11] perf: arm_spe: Armv8.8 SPE features James Clark
` (7 preceding siblings ...)
2025-05-29 11:30 ` [PATCH v2 08/11] perf: arm_spe: Add support for filtering on data source James Clark
@ 2025-05-29 11:30 ` James Clark
2025-05-29 11:30 ` [PATCH v2 10/11] perf tools: Add support for perf_event_attr::config4 James Clark
` (2 subsequent siblings)
11 siblings, 0 replies; 20+ messages in thread
From: James Clark @ 2025-05-29 11:30 UTC (permalink / raw)
To: Catalin Marinas, Will Deacon, Mark Rutland, Jonathan Corbet,
Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
Zenghui Yu, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Alexander Shishkin, Jiri Olsa, Ian Rogers,
Adrian Hunter
Cc: linux-arm-kernel, linux-kernel, linux-perf-users, linux-doc,
kvmarm, James Clark
To pick up the config4 changes.
Signed-off-by: James Clark <james.clark@linaro.org>
---
tools/include/uapi/linux/perf_event.h | 2 ++
1 file changed, 2 insertions(+)
diff --git a/tools/include/uapi/linux/perf_event.h b/tools/include/uapi/linux/perf_event.h
index 78a362b80027..0d0ed85ad8cb 100644
--- a/tools/include/uapi/linux/perf_event.h
+++ b/tools/include/uapi/linux/perf_event.h
@@ -382,6 +382,7 @@ enum perf_event_read_format {
#define PERF_ATTR_SIZE_VER6 120 /* Add: aux_sample_size */
#define PERF_ATTR_SIZE_VER7 128 /* Add: sig_data */
#define PERF_ATTR_SIZE_VER8 136 /* Add: config3 */
+#define PERF_ATTR_SIZE_VER9 144 /* add: config4 */
/*
* 'struct perf_event_attr' contains various attributes that define
@@ -543,6 +544,7 @@ struct perf_event_attr {
__u64 sig_data;
__u64 config3; /* extension of config2 */
+ __u64 config4; /* extension of config3 */
};
/*
--
2.34.1
* [PATCH v2 10/11] perf tools: Add support for perf_event_attr::config4
2025-05-29 11:30 [PATCH v2 00/11] perf: arm_spe: Armv8.8 SPE features James Clark
` (8 preceding siblings ...)
2025-05-29 11:30 ` [PATCH v2 09/11] tools headers UAPI: Sync linux/perf_event.h with the kernel sources James Clark
@ 2025-05-29 11:30 ` James Clark
2025-05-29 17:25 ` Ian Rogers
2025-05-29 11:30 ` [PATCH v2 11/11] perf docs: arm-spe: Document new SPE filtering features James Clark
2025-05-29 16:48 ` [PATCH v2 00/11] perf: arm_spe: Armv8.8 SPE features Leo Yan
11 siblings, 1 reply; 20+ messages in thread
From: James Clark @ 2025-05-29 11:30 UTC (permalink / raw)
To: Catalin Marinas, Will Deacon, Mark Rutland, Jonathan Corbet,
Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
Zenghui Yu, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Alexander Shishkin, Jiri Olsa, Ian Rogers,
Adrian Hunter
Cc: linux-arm-kernel, linux-kernel, linux-perf-users, linux-doc,
kvmarm, Leo Yan, James Clark
perf_event_attr has gained a new field, config4, so add support for it
by extending the existing configN support.
Reviewed-by: Leo Yan <leo.yan@arm.com>
Signed-off-by: James Clark <james.clark@linaro.org>
---
tools/perf/tests/parse-events.c | 14 +++++++++++++-
tools/perf/util/parse-events.c | 11 +++++++++++
tools/perf/util/parse-events.h | 1 +
tools/perf/util/parse-events.l | 1 +
tools/perf/util/pmu.c | 8 ++++++++
tools/perf/util/pmu.h | 1 +
6 files changed, 35 insertions(+), 1 deletion(-)
diff --git a/tools/perf/tests/parse-events.c b/tools/perf/tests/parse-events.c
index 5ec2e5607987..5f624a63d550 100644
--- a/tools/perf/tests/parse-events.c
+++ b/tools/perf/tests/parse-events.c
@@ -615,6 +615,8 @@ static int test__checkevent_pmu(struct evlist *evlist)
TEST_ASSERT_VAL("wrong config1", 1 == evsel->core.attr.config1);
TEST_ASSERT_VAL("wrong config2", 3 == evsel->core.attr.config2);
TEST_ASSERT_VAL("wrong config3", 0 == evsel->core.attr.config3);
+ TEST_ASSERT_VAL("wrong config4", 0 == evsel->core.attr.config4);
+
/*
* The period value gets configured within evlist__config,
* while this test executes only parse events method.
@@ -637,6 +639,7 @@ static int test__checkevent_list(struct evlist *evlist)
TEST_ASSERT_VAL("wrong config1", 0 == evsel->core.attr.config1);
TEST_ASSERT_VAL("wrong config2", 0 == evsel->core.attr.config2);
TEST_ASSERT_VAL("wrong config3", 0 == evsel->core.attr.config3);
+ TEST_ASSERT_VAL("wrong config4", 0 == evsel->core.attr.config4);
TEST_ASSERT_VAL("wrong exclude_user", !evsel->core.attr.exclude_user);
TEST_ASSERT_VAL("wrong exclude_kernel", !evsel->core.attr.exclude_kernel);
TEST_ASSERT_VAL("wrong exclude_hv", !evsel->core.attr.exclude_hv);
@@ -813,6 +816,15 @@ static int test__checkterms_simple(struct parse_events_terms *terms)
TEST_ASSERT_VAL("wrong val", term->val.num == 4);
TEST_ASSERT_VAL("wrong config", !strcmp(term->config, "config3"));
+ /* config4=5 */
+ term = list_entry(term->list.next, struct parse_events_term, list);
+ TEST_ASSERT_VAL("wrong type term",
+ term->type_term == PARSE_EVENTS__TERM_TYPE_CONFIG4);
+ TEST_ASSERT_VAL("wrong type val",
+ term->type_val == PARSE_EVENTS__TERM_TYPE_NUM);
+ TEST_ASSERT_VAL("wrong val", term->val.num == 5);
+ TEST_ASSERT_VAL("wrong config", !strcmp(term->config, "config4"));
+
/* umask=1*/
term = list_entry(term->list.next, struct parse_events_term, list);
TEST_ASSERT_VAL("wrong type term",
@@ -2451,7 +2463,7 @@ struct terms_test {
static const struct terms_test test__terms[] = {
[0] = {
- .str = "config=10,config1,config2=3,config3=4,umask=1,read,r0xead",
+ .str = "config=10,config1,config2=3,config3=4,config4=5,umask=1,read,r0xead",
.check = test__checkterms_simple,
},
};
diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index 5152fd5a6ead..7e37f91e7b49 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -247,6 +247,8 @@ __add_event(struct list_head *list, int *idx,
PERF_PMU_FORMAT_VALUE_CONFIG2, "config2");
perf_pmu__warn_invalid_config(pmu, attr->config3, name,
PERF_PMU_FORMAT_VALUE_CONFIG3, "config3");
+ perf_pmu__warn_invalid_config(pmu, attr->config4, name,
+ PERF_PMU_FORMAT_VALUE_CONFIG4, "config4");
}
if (init_attr)
event_attr_init(attr);
@@ -783,6 +785,7 @@ const char *parse_events__term_type_str(enum parse_events__term_type term_type)
[PARSE_EVENTS__TERM_TYPE_CONFIG1] = "config1",
[PARSE_EVENTS__TERM_TYPE_CONFIG2] = "config2",
[PARSE_EVENTS__TERM_TYPE_CONFIG3] = "config3",
+ [PARSE_EVENTS__TERM_TYPE_CONFIG4] = "config4",
[PARSE_EVENTS__TERM_TYPE_NAME] = "name",
[PARSE_EVENTS__TERM_TYPE_SAMPLE_PERIOD] = "period",
[PARSE_EVENTS__TERM_TYPE_SAMPLE_FREQ] = "freq",
@@ -830,6 +833,7 @@ config_term_avail(enum parse_events__term_type term_type, struct parse_events_er
case PARSE_EVENTS__TERM_TYPE_CONFIG1:
case PARSE_EVENTS__TERM_TYPE_CONFIG2:
case PARSE_EVENTS__TERM_TYPE_CONFIG3:
+ case PARSE_EVENTS__TERM_TYPE_CONFIG4:
case PARSE_EVENTS__TERM_TYPE_NAME:
case PARSE_EVENTS__TERM_TYPE_METRIC_ID:
case PARSE_EVENTS__TERM_TYPE_SAMPLE_PERIOD:
@@ -898,6 +902,10 @@ do { \
CHECK_TYPE_VAL(NUM);
attr->config3 = term->val.num;
break;
+ case PARSE_EVENTS__TERM_TYPE_CONFIG4:
+ CHECK_TYPE_VAL(NUM);
+ attr->config4 = term->val.num;
+ break;
case PARSE_EVENTS__TERM_TYPE_SAMPLE_PERIOD:
CHECK_TYPE_VAL(NUM);
break;
@@ -1097,6 +1105,7 @@ static int config_term_tracepoint(struct perf_event_attr *attr,
case PARSE_EVENTS__TERM_TYPE_CONFIG1:
case PARSE_EVENTS__TERM_TYPE_CONFIG2:
case PARSE_EVENTS__TERM_TYPE_CONFIG3:
+ case PARSE_EVENTS__TERM_TYPE_CONFIG4:
case PARSE_EVENTS__TERM_TYPE_NAME:
case PARSE_EVENTS__TERM_TYPE_SAMPLE_PERIOD:
case PARSE_EVENTS__TERM_TYPE_SAMPLE_FREQ:
@@ -1237,6 +1246,7 @@ do { \
case PARSE_EVENTS__TERM_TYPE_CONFIG1:
case PARSE_EVENTS__TERM_TYPE_CONFIG2:
case PARSE_EVENTS__TERM_TYPE_CONFIG3:
+ case PARSE_EVENTS__TERM_TYPE_CONFIG4:
case PARSE_EVENTS__TERM_TYPE_NAME:
case PARSE_EVENTS__TERM_TYPE_METRIC_ID:
case PARSE_EVENTS__TERM_TYPE_RAW:
@@ -1274,6 +1284,7 @@ static int get_config_chgs(struct perf_pmu *pmu, struct parse_events_terms *head
case PARSE_EVENTS__TERM_TYPE_CONFIG1:
case PARSE_EVENTS__TERM_TYPE_CONFIG2:
case PARSE_EVENTS__TERM_TYPE_CONFIG3:
+ case PARSE_EVENTS__TERM_TYPE_CONFIG4:
case PARSE_EVENTS__TERM_TYPE_NAME:
case PARSE_EVENTS__TERM_TYPE_SAMPLE_PERIOD:
case PARSE_EVENTS__TERM_TYPE_SAMPLE_FREQ:
diff --git a/tools/perf/util/parse-events.h b/tools/perf/util/parse-events.h
index e176a34ab088..6e90c26066d4 100644
--- a/tools/perf/util/parse-events.h
+++ b/tools/perf/util/parse-events.h
@@ -58,6 +58,7 @@ enum parse_events__term_type {
PARSE_EVENTS__TERM_TYPE_CONFIG1,
PARSE_EVENTS__TERM_TYPE_CONFIG2,
PARSE_EVENTS__TERM_TYPE_CONFIG3,
+ PARSE_EVENTS__TERM_TYPE_CONFIG4,
PARSE_EVENTS__TERM_TYPE_NAME,
PARSE_EVENTS__TERM_TYPE_SAMPLE_PERIOD,
PARSE_EVENTS__TERM_TYPE_SAMPLE_FREQ,
diff --git a/tools/perf/util/parse-events.l b/tools/perf/util/parse-events.l
index 7ed86e3e34e3..8e2986d55bc4 100644
--- a/tools/perf/util/parse-events.l
+++ b/tools/perf/util/parse-events.l
@@ -317,6 +317,7 @@ config { return term(yyscanner, PARSE_EVENTS__TERM_TYPE_CONFIG); }
config1 { return term(yyscanner, PARSE_EVENTS__TERM_TYPE_CONFIG1); }
config2 { return term(yyscanner, PARSE_EVENTS__TERM_TYPE_CONFIG2); }
config3 { return term(yyscanner, PARSE_EVENTS__TERM_TYPE_CONFIG3); }
+config4 { return term(yyscanner, PARSE_EVENTS__TERM_TYPE_CONFIG4); }
name { return term(yyscanner, PARSE_EVENTS__TERM_TYPE_NAME); }
period { return term(yyscanner, PARSE_EVENTS__TERM_TYPE_SAMPLE_PERIOD); }
freq { return term(yyscanner, PARSE_EVENTS__TERM_TYPE_SAMPLE_FREQ); }
diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
index b7ebac5ab1d1..fc50df65d540 100644
--- a/tools/perf/util/pmu.c
+++ b/tools/perf/util/pmu.c
@@ -1427,6 +1427,10 @@ static int pmu_config_term(const struct perf_pmu *pmu,
assert(term->type_val == PARSE_EVENTS__TERM_TYPE_NUM);
pmu_format_value(bits, term->val.num, &attr->config3, zero);
break;
+ case PARSE_EVENTS__TERM_TYPE_CONFIG4:
+ assert(term->type_val == PARSE_EVENTS__TERM_TYPE_NUM);
+ pmu_format_value(bits, term->val.num, &attr->config4, zero);
+ break;
case PARSE_EVENTS__TERM_TYPE_USER: /* Not hardcoded. */
return -EINVAL;
case PARSE_EVENTS__TERM_TYPE_NAME ... PARSE_EVENTS__TERM_TYPE_HARDWARE:
@@ -1474,6 +1478,9 @@ static int pmu_config_term(const struct perf_pmu *pmu,
case PERF_PMU_FORMAT_VALUE_CONFIG3:
vp = &attr->config3;
break;
+ case PERF_PMU_FORMAT_VALUE_CONFIG4:
+ vp = &attr->config4;
+ break;
default:
return -EINVAL;
}
@@ -1787,6 +1794,7 @@ int perf_pmu__for_each_format(struct perf_pmu *pmu, void *state, pmu_format_call
"config1=0..0xffffffffffffffff",
"config2=0..0xffffffffffffffff",
"config3=0..0xffffffffffffffff",
+ "config4=0..0xffffffffffffffff",
"name=string",
"period=number",
"freq=number",
diff --git a/tools/perf/util/pmu.h b/tools/perf/util/pmu.h
index b93014cc3670..1ce5377935db 100644
--- a/tools/perf/util/pmu.h
+++ b/tools/perf/util/pmu.h
@@ -22,6 +22,7 @@ enum {
PERF_PMU_FORMAT_VALUE_CONFIG1,
PERF_PMU_FORMAT_VALUE_CONFIG2,
PERF_PMU_FORMAT_VALUE_CONFIG3,
+ PERF_PMU_FORMAT_VALUE_CONFIG4,
PERF_PMU_FORMAT_VALUE_CONFIG_END,
};
--
2.34.1
* [PATCH v2 11/11] perf docs: arm-spe: Document new SPE filtering features
2025-05-29 11:30 [PATCH v2 00/11] perf: arm_spe: Armv8.8 SPE features James Clark
` (9 preceding siblings ...)
2025-05-29 11:30 ` [PATCH v2 10/11] perf tools: Add support for perf_event_attr::config4 James Clark
@ 2025-05-29 11:30 ` James Clark
2025-05-29 16:43 ` Leo Yan
2025-05-29 16:48 ` [PATCH v2 00/11] perf: arm_spe: Armv8.8 SPE features Leo Yan
11 siblings, 1 reply; 20+ messages in thread
From: James Clark @ 2025-05-29 11:30 UTC (permalink / raw)
To: Catalin Marinas, Will Deacon, Mark Rutland, Jonathan Corbet,
Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
Zenghui Yu, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Alexander Shishkin, Jiri Olsa, Ian Rogers,
Adrian Hunter
Cc: linux-arm-kernel, linux-kernel, linux-perf-users, linux-doc,
kvmarm, James Clark
FEAT_SPE_EFT, FEAT_SPE_FDS and related features add new user-facing
format attributes, so document them. Also document the existing
'event_filter' bits that were missing from the doc, and the fact that
latency values are stored in the weight field.
Signed-off-by: James Clark <james.clark@linaro.org>
---
tools/perf/Documentation/perf-arm-spe.txt | 97 ++++++++++++++++++++++++++++---
1 file changed, 88 insertions(+), 9 deletions(-)
diff --git a/tools/perf/Documentation/perf-arm-spe.txt b/tools/perf/Documentation/perf-arm-spe.txt
index 37afade4f1b2..4092b53b58d2 100644
--- a/tools/perf/Documentation/perf-arm-spe.txt
+++ b/tools/perf/Documentation/perf-arm-spe.txt
@@ -141,27 +141,65 @@ Config parameters
These are placed between the // in the event and comma separated. For example '-e
arm_spe/load_filter=1,min_latency=10/'
- branch_filter=1 - collect branches only (PMSFCR.B)
- event_filter=<mask> - filter on specific events (PMSEVFR) - see bitfield description below
+ event_filter=<mask> - logical AND filter on specific events (PMSEVFR) - see bitfield description below
+ inv_event_filter=<mask> - logical OR to filter out specific events (PMSNEVFR, FEAT_SPEv1p2) - see bitfield description below
jitter=1 - use jitter to avoid resonance when sampling (PMSIRR.RND)
- load_filter=1 - collect loads only (PMSFCR.LD)
min_latency=<n> - collect only samples with this latency or higher* (PMSLATFR)
pa_enable=1 - collect physical address (as well as VA) of loads/stores (PMSCR.PA) - requires privilege
pct_enable=1 - collect physical timestamp instead of virtual timestamp (PMSCR.PCT) - requires privilege
- store_filter=1 - collect stores only (PMSFCR.ST)
ts_enable=1 - enable timestamping with value of generic timer (PMSCR.TS)
discard=1 - enable SPE PMU events but don't collect sample data - see 'Discard mode' (PMBLIMITR.FM = DISCARD)
+ data_src_filter=<mask> - mask to filter from 0-63 possible data sources (PMSDSFR, FEAT_SPE_FDS) - See 'Data source filtering'
+++*+++ Latency is the total latency from the point at which sampling started on that instruction, rather
than only the execution latency.
-Only some events can be filtered on; these include:
-
- bit 1 - instruction retired (i.e. omit speculative instructions)
+Only some events can be filtered on using 'event_filter' bits. The overall
+filter is the logical AND of these bits; for example, if bits 3 and 5 are set,
+only samples that have both 'L1D cache refill' AND 'TLB walk' are recorded. When
+FEAT_SPEv1p2 is implemented, 'inv_event_filter' can also be used to exclude
+samples that have any (OR) of the filter's bits set. For example, setting bits 3
+and 5 in 'inv_event_filter' will exclude any samples that are either L1D cache
+refill OR TLB walk. If the same bit is set in both filters, it's UNPREDICTABLE
+whether the sample is included or excluded. Filter bits for both event_filter
+and inv_event_filter are:
+
+ bit 1 - Instruction retired (i.e. omit speculative instructions)
+ bit 2 - L1D access (FEAT_SPEv1p4)
bit 3 - L1D refill
+ bit 4 - TLB access (FEAT_SPEv1p4)
bit 5 - TLB refill
- bit 7 - mispredict
- bit 11 - misaligned access
+ bit 6 - Not taken event (FEAT_SPEv1p2)
+ bit 7 - Mispredict
+ bit 8 - Last level cache access (FEAT_SPEv1p4)
+ bit 9 - Last level cache miss (FEAT_SPEv1p4)
+ bit 10 - Remote access (FEAT_SPEv1p4)
+ bit 11 - Misaligned access (FEAT_SPEv1p1)
+ bit 12-15 - IMPLEMENTATION DEFINED events (when implemented)
+ bit 16 - Transaction (FEAT_TME)
+ bit 17 - Partial or empty SME or SVE predicate (FEAT_SPEv1p1)
+ bit 18 - Empty SME or SVE predicate (FEAT_SPEv1p1)
+ bit 19 - L2D access (FEAT_SPEv1p4)
+ bit 20 - L2D miss (FEAT_SPEv1p4)
+ bit 21 - Cache data modified (FEAT_SPEv1p4)
+ bit 22 - Recently fetched (FEAT_SPEv1p4)
+ bit 23 - Data snooped (FEAT_SPEv1p4)
+ bit 24 - Streaming SVE mode event (when FEAT_SPE_SME is implemented), or
+ IMPLEMENTATION DEFINED event 24 (when implemented, only versions
+ less than FEAT_SPEv1p4)
+ bit 25 - SMCU or external coprocessor operation event when FEAT_SPE_SME is
+ implemented, or IMPLEMENTATION DEFINED event 25 (when implemented,
+ only versions less than FEAT_SPEv1p4)
+ bit 26-31 - IMPLEMENTATION DEFINED events (only versions less than FEAT_SPEv1p4)
+ bit 48-63 - IMPLEMENTATION DEFINED events (when implemented)
+
+For IMPLEMENTATION DEFINED bits, refer to the CPU TRM if these bits are
+implemented.
+
+The driver will reject events if requested filter bits require unimplemented SPE
+versions, but will not reject filter bits for unimplemented IMPDEF bits or when
+their related feature is not present (e.g. SME). For example, if FEAT_SPEv1p2 is
+not implemented, filtering on "Not taken event" (bit 6) will be rejected.
So to sample just retired instructions:
@@ -171,6 +209,31 @@ or just mispredicted branches:
perf record -e arm_spe/event_filter=0x80/ -- ./mybench
+When set, the following filters can be used to select samples that match any of
+the operation types (OR filtering). If only one is set then only samples of that
+type are collected:
+
+ branch_filter=1 - Collect branches (PMSFCR.B)
+ load_filter=1 - Collect loads (PMSFCR.LD)
+ store_filter=1 - Collect stores (PMSFCR.ST)
+
+When extended filtering is supported (FEAT_SPE_EFT), SIMD and floating
+point operations can also be selected:
+
+ simd_filter=1 - Collect SIMD loads, stores and operations (PMSFCR.SIMD)
+ float_filter=1 - Collect floating point loads, stores and operations (PMSFCR.FP)
+
+When extended filtering is supported (FEAT_SPE_EFT), operation type filters can
+be changed to AND using _mask fields. For example samples could be selected if
+they are store AND SIMD by setting 'store_filter=1,simd_filter=1,
+store_filter_mask=1,simd_filter_mask=1'. The new masks are as follows:
+
+ branch_filter_mask=1 - Change branch filter behavior from OR to AND (PMSFCR.Bm)
+ load_filter_mask=1 - Change load filter behavior from OR to AND (PMSFCR.LDm)
+ store_filter_mask=1 - Change store filter behavior from OR to AND (PMSFCR.STm)
+ simd_filter_mask=1 - Change SIMD filter behavior from OR to AND (PMSFCR.SIMDm)
+ float_filter_mask=1 - Change floating point filter behavior from OR to AND (PMSFCR.FPm)
+
Viewing the data
~~~~~~~~~~~~~~~~~
@@ -204,6 +267,10 @@ Memory access details are also stored on the samples and this can be viewed with
perf report --mem-mode
+The latency value from the SPE sample is stored in the 'weight' field of the
+Perf samples and can be displayed in Perf script and report outputs by enabling
+its display from the command line.
+
Common errors
~~~~~~~~~~~~~
@@ -247,6 +314,18 @@ to minimize output. Then run perf stat:
perf record -e arm_spe/discard/ -a -N -B --no-bpf-event -o - > /dev/null &
perf stat -e SAMPLE_FEED_LD
+Data source filtering
+~~~~~~~~~~~~~~~~~~~~~
+
+When FEAT_SPE_FDS is present, 'data_src_filter' can be used as a mask to filter
+on a subset (0 - 63) of possible data source IDs. The full range of data sources
+is 0 - 65535, although most of that range is unlikely to be used in practice.
+Data sources are IMPDEF, so refer to the TRM for the mappings. Each bit N of the
+filter maps to data source N. The filter is an OR of all the bits, so for
+example setting bits 0 and 3 includes only packets from data sources 0 OR 3.
+When 'data_src_filter' is set to 0, data source filtering is disabled and all
+data sources are included.
+
SEE ALSO
--------
--
2.34.1
* Re: [PATCH v2 11/11] perf docs: arm-spe: Document new SPE filtering features
2025-05-29 11:30 ` [PATCH v2 11/11] perf docs: arm-spe: Document new SPE filtering features James Clark
@ 2025-05-29 16:43 ` Leo Yan
0 siblings, 0 replies; 20+ messages in thread
From: Leo Yan @ 2025-05-29 16:43 UTC (permalink / raw)
To: James Clark
Cc: Catalin Marinas, Will Deacon, Mark Rutland, Jonathan Corbet,
Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
Zenghui Yu, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Alexander Shishkin, Jiri Olsa, Ian Rogers,
Adrian Hunter, linux-arm-kernel, linux-kernel, linux-perf-users,
linux-doc, kvmarm
On Thu, May 29, 2025 at 12:30:32PM +0100, James Clark wrote:
> FEAT_SPE_EFT and FEAT_SPE_FDS etc have new user facing format attributes
> so document them. Also document existing 'event_filter' bits that were
> missing from the doc and the fact that latency values are stored in the
> weight field.
>
> Signed-off-by: James Clark <james.clark@linaro.org>
LGTM:
Reviewed-by: Leo Yan <leo.yan@arm.com>
* Re: [PATCH v2 00/11] perf: arm_spe: Armv8.8 SPE features
2025-05-29 11:30 [PATCH v2 00/11] perf: arm_spe: Armv8.8 SPE features James Clark
` (10 preceding siblings ...)
2025-05-29 11:30 ` [PATCH v2 11/11] perf docs: arm-spe: Document new SPE filtering features James Clark
@ 2025-05-29 16:48 ` Leo Yan
11 siblings, 0 replies; 20+ messages in thread
From: Leo Yan @ 2025-05-29 16:48 UTC (permalink / raw)
To: James Clark
Cc: Catalin Marinas, Will Deacon, Mark Rutland, Jonathan Corbet,
Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
Zenghui Yu, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Alexander Shishkin, Jiri Olsa, Ian Rogers,
Adrian Hunter, linux-arm-kernel, linux-kernel, linux-perf-users,
linux-doc, kvmarm
On Thu, May 29, 2025 at 12:30:21PM +0100, James Clark wrote:
> Support 3 new SPE features: FEAT_SPEv1p4 filters, FEAT_SPE_EFT extended
> filtering, and FEAT_SPE_FDS data source filtering. The features are
> independent and can be applied separately:
>
> * Prerequisite sysreg changes - patches 1 - 2
> * FEAT_SPEv1p4 - patch 3
> * FEAT_SPE_EFT - patch 4
> * FEAT_SPE_FDS - patches 5 - 8
> * FEAT_SPE_FDS Perf tool changes - patches 9 - 11
>
> The first two features will work with old Perfs but a Perf change to
> parse the new config4 is required for the last feature.
I tested load_filter_mask / store_filter_mask (FEAT_SPE_EFT) and
data_src_filter (FEAT_SPE_FDS); all of them work as expected.
Tested-by: Leo Yan <leo.yan@arm.com>
* Re: [PATCH v2 06/11] KVM: arm64: Add trap configs for PMSDSFR_EL1
2025-05-29 11:30 ` [PATCH v2 06/11] KVM: arm64: Add trap configs for PMSDSFR_EL1 James Clark
@ 2025-05-29 16:56 ` Marc Zyngier
2025-06-03 9:50 ` James Clark
0 siblings, 1 reply; 20+ messages in thread
From: Marc Zyngier @ 2025-05-29 16:56 UTC (permalink / raw)
To: James Clark
Cc: Catalin Marinas, Will Deacon, Mark Rutland, Jonathan Corbet,
Oliver Upton, Joey Gouly, Suzuki K Poulose, Zenghui Yu,
Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Alexander Shishkin, Jiri Olsa, Ian Rogers,
Adrian Hunter, linux-arm-kernel, linux-kernel, linux-perf-users,
linux-doc, kvmarm
On Thu, 29 May 2025 12:30:27 +0100,
James Clark <james.clark@linaro.org> wrote:
>
> SPE data source filtering (SPE_FEAT_FDS) adds a new register
> PMSDSFR_EL1, add the trap configs for it.
>
> Signed-off-by: James Clark <james.clark@linaro.org>
> ---
> arch/arm64/kvm/emulate-nested.c | 1 +
> arch/arm64/kvm/sys_regs.c | 1 +
> 2 files changed, 2 insertions(+)
>
> diff --git a/arch/arm64/kvm/emulate-nested.c b/arch/arm64/kvm/emulate-nested.c
> index 0fcfcc0478f9..05d3e6b93ae9 100644
> --- a/arch/arm64/kvm/emulate-nested.c
> +++ b/arch/arm64/kvm/emulate-nested.c
> @@ -1169,6 +1169,7 @@ static const struct encoding_to_trap_config encoding_to_cgt[] __initconst = {
> SR_TRAP(SYS_PMSIRR_EL1, CGT_MDCR_TPMS),
> SR_TRAP(SYS_PMSLATFR_EL1, CGT_MDCR_TPMS),
> SR_TRAP(SYS_PMSNEVFR_EL1, CGT_MDCR_TPMS),
> + SR_TRAP(SYS_PMSDSFR_EL1, CGT_MDCR_TPMS),
> SR_TRAP(SYS_TRFCR_EL1, CGT_MDCR_TTRF),
> SR_TRAP(SYS_TRBBASER_EL1, CGT_MDCR_E2TB),
> SR_TRAP(SYS_TRBLIMITR_EL1, CGT_MDCR_E2TB),
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index 5dde9285afc8..9f544ac7b5a6 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -2956,6 +2956,7 @@ static const struct sys_reg_desc sys_reg_descs[] = {
> { SYS_DESC(SYS_PMBLIMITR_EL1), undef_access },
> { SYS_DESC(SYS_PMBPTR_EL1), undef_access },
> { SYS_DESC(SYS_PMBSR_EL1), undef_access },
> + { SYS_DESC(SYS_PMSDSFR_EL1), undef_access },
PMSDSFR_EL1 has an offset in the VNCR page (0x858), and must be
described as such. This is equally true for a bunch of other
SPE-related registers, so you might as well fix those while you're at
it.
Thanks,
M.
--
Without deviation from the norm, progress is not possible.
* Re: [PATCH v2 05/11] arm64/boot: Enable EL2 requirements for SPE_FEAT_FDS
2025-05-29 11:30 ` [PATCH v2 05/11] arm64/boot: Enable EL2 requirements for SPE_FEAT_FDS James Clark
@ 2025-05-29 16:57 ` Marc Zyngier
0 siblings, 0 replies; 20+ messages in thread
From: Marc Zyngier @ 2025-05-29 16:57 UTC (permalink / raw)
To: James Clark
Cc: Catalin Marinas, Will Deacon, Mark Rutland, Jonathan Corbet,
Oliver Upton, Joey Gouly, Suzuki K Poulose, Zenghui Yu,
Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Alexander Shishkin, Jiri Olsa, Ian Rogers,
Adrian Hunter, linux-arm-kernel, linux-kernel, linux-perf-users,
linux-doc, kvmarm
On Thu, 29 May 2025 12:30:26 +0100,
James Clark <james.clark@linaro.org> wrote:
>
> SPE data source filtering (optional from Armv8.8) requires that traps to
> the filter register PMSDSFR be disabled. Document the requirements and
> disable the traps if the feature is present.
>
> Signed-off-by: James Clark <james.clark@linaro.org>
> ---
> Documentation/arch/arm64/booting.rst | 11 +++++++++++
> arch/arm64/include/asm/el2_setup.h | 14 ++++++++++++++
> 2 files changed, 25 insertions(+)
>
> diff --git a/Documentation/arch/arm64/booting.rst b/Documentation/arch/arm64/booting.rst
> index dee7b6de864f..abd75085a239 100644
> --- a/Documentation/arch/arm64/booting.rst
> +++ b/Documentation/arch/arm64/booting.rst
> @@ -404,6 +404,17 @@ Before jumping into the kernel, the following conditions must be met:
> - HDFGWTR2_EL2.nPMICFILTR_EL0 (bit 3) must be initialised to 0b1.
> - HDFGWTR2_EL2.nPMUACR_EL1 (bit 4) must be initialised to 0b1.
>
> + For CPUs with SPE data source filtering (FEAT_SPE_FDS):
> +
> + - If EL3 is present:
> +
> + - MDCR_EL3.EnPMS3 (bit 42) must be initialised to 0b1.
> +
> + - If the kernel is entered at EL1 and EL2 is present:
> +
> + - HDFGRTR2_EL2.nPMSDSFR_EL1 (bit 19) must be initialised to 0b1.
> + - HDFGWTR2_EL2.nPMSDSFR_EL1 (bit 19) must be initialised to 0b1.
> +
> For CPUs with Memory Copy and Memory Set instructions (FEAT_MOPS):
>
> - If the kernel is entered at EL1 and EL2 is present:
> diff --git a/arch/arm64/include/asm/el2_setup.h b/arch/arm64/include/asm/el2_setup.h
> index f6d72ca03133..6d0d8c25e912 100644
> --- a/arch/arm64/include/asm/el2_setup.h
> +++ b/arch/arm64/include/asm/el2_setup.h
> @@ -279,6 +279,20 @@
> orr x0, x0, #HDFGRTR2_EL2_nPMICFILTR_EL0
> orr x0, x0, #HDFGRTR2_EL2_nPMUACR_EL1
> .Lskip_pmuv3p9_\@:
> + mrs x1, id_aa64dfr0_el1
> + ubfx x1, x1, #ID_AA64DFR0_EL1_PMSVer_SHIFT, #4
> + /* If SPE is implemented, */
> + cmp x1, #ID_AA64DFR0_EL1_PMSVer_IMP
> + b.lt .Lskip_spefds_\@
> + /* we can read PMSIDR and */
> + mrs_s x1, SYS_PMSIDR_EL1
> + and x1, x1, #(1 << PMSIDR_EL1_FDS_SHIFT)
Use PMSIDR_EL1_FDS directly, just like you do for the other register
fields.
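
To spell out the suggestion, here is a minimal sketch of the two forms. It
assumes the generated field name expands to the field's mask, as the other
field names used in this hunk appear to. The EX_ names and the bit position
are placeholders for illustration only, not the kernel's definitions.

```c
#include <stdint.h>

/* Hypothetical stand-ins: a single-bit field at an assumed position. */
#define EX_FDS_SHIFT	7
#define EX_FDS		(UINT64_C(1) << EX_FDS_SHIFT)

/* Open-coded mask built from the shift, as in the quoted hunk. */
static inline uint64_t ex_fds_open_coded(uint64_t pmsidr)
{
	return pmsidr & (UINT64_C(1) << EX_FDS_SHIFT);
}

/* Same test written with the field mask directly. */
static inline uint64_t ex_fds_via_mask(uint64_t pmsidr)
{
	return pmsidr & EX_FDS;
}
```

For a single-bit field both forms select the same bit, so the change is
purely about consistency with the surrounding code.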
M.
--
Without deviation from the norm, progress is not possible.
* Re: [PATCH v2 10/11] perf tools: Add support for perf_event_attr::config4
2025-05-29 11:30 ` [PATCH v2 10/11] perf tools: Add support for perf_event_attr::config4 James Clark
@ 2025-05-29 17:25 ` Ian Rogers
0 siblings, 0 replies; 20+ messages in thread
From: Ian Rogers @ 2025-05-29 17:25 UTC (permalink / raw)
To: James Clark
Cc: Catalin Marinas, Will Deacon, Mark Rutland, Jonathan Corbet,
Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
Zenghui Yu, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Alexander Shishkin, Jiri Olsa, Adrian Hunter,
linux-arm-kernel, linux-kernel, linux-perf-users, linux-doc,
kvmarm, Leo Yan
On Thu, May 29, 2025 at 4:33 AM James Clark <james.clark@linaro.org> wrote:
>
> perf_event_attr has gained a new field, config4, so add support for it
> extending the existing configN support.
>
> Reviewed-by: Leo Yan <leo.yan@arm.com>
> Signed-off-by: James Clark <james.clark@linaro.org>
Reviewed-by: Ian Rogers <irogers@google.com>
Thanks,
Ian
> ---
> tools/perf/tests/parse-events.c | 14 +++++++++++++-
> tools/perf/util/parse-events.c | 11 +++++++++++
> tools/perf/util/parse-events.h | 1 +
> tools/perf/util/parse-events.l | 1 +
> tools/perf/util/pmu.c | 8 ++++++++
> tools/perf/util/pmu.h | 1 +
> 6 files changed, 35 insertions(+), 1 deletion(-)
>
> diff --git a/tools/perf/tests/parse-events.c b/tools/perf/tests/parse-events.c
> index 5ec2e5607987..5f624a63d550 100644
> --- a/tools/perf/tests/parse-events.c
> +++ b/tools/perf/tests/parse-events.c
> @@ -615,6 +615,8 @@ static int test__checkevent_pmu(struct evlist *evlist)
> TEST_ASSERT_VAL("wrong config1", 1 == evsel->core.attr.config1);
> TEST_ASSERT_VAL("wrong config2", 3 == evsel->core.attr.config2);
> TEST_ASSERT_VAL("wrong config3", 0 == evsel->core.attr.config3);
> + TEST_ASSERT_VAL("wrong config4", 0 == evsel->core.attr.config4);
> +
> /*
> * The period value gets configured within evlist__config,
> * while this test executes only parse events method.
> @@ -637,6 +639,7 @@ static int test__checkevent_list(struct evlist *evlist)
> TEST_ASSERT_VAL("wrong config1", 0 == evsel->core.attr.config1);
> TEST_ASSERT_VAL("wrong config2", 0 == evsel->core.attr.config2);
> TEST_ASSERT_VAL("wrong config3", 0 == evsel->core.attr.config3);
> + TEST_ASSERT_VAL("wrong config4", 0 == evsel->core.attr.config4);
> TEST_ASSERT_VAL("wrong exclude_user", !evsel->core.attr.exclude_user);
> TEST_ASSERT_VAL("wrong exclude_kernel", !evsel->core.attr.exclude_kernel);
> TEST_ASSERT_VAL("wrong exclude_hv", !evsel->core.attr.exclude_hv);
> @@ -813,6 +816,15 @@ static int test__checkterms_simple(struct parse_events_terms *terms)
> TEST_ASSERT_VAL("wrong val", term->val.num == 4);
> TEST_ASSERT_VAL("wrong config", !strcmp(term->config, "config3"));
>
> + /* config4=5 */
> + term = list_entry(term->list.next, struct parse_events_term, list);
> + TEST_ASSERT_VAL("wrong type term",
> + term->type_term == PARSE_EVENTS__TERM_TYPE_CONFIG4);
> + TEST_ASSERT_VAL("wrong type val",
> + term->type_val == PARSE_EVENTS__TERM_TYPE_NUM);
> + TEST_ASSERT_VAL("wrong val", term->val.num == 5);
> + TEST_ASSERT_VAL("wrong config", !strcmp(term->config, "config4"));
> +
> /* umask=1*/
> term = list_entry(term->list.next, struct parse_events_term, list);
> TEST_ASSERT_VAL("wrong type term",
> @@ -2451,7 +2463,7 @@ struct terms_test {
>
> static const struct terms_test test__terms[] = {
> [0] = {
> - .str = "config=10,config1,config2=3,config3=4,umask=1,read,r0xead",
> + .str = "config=10,config1,config2=3,config3=4,config4=5,umask=1,read,r0xead",
> .check = test__checkterms_simple,
> },
> };
> diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
> index 5152fd5a6ead..7e37f91e7b49 100644
> --- a/tools/perf/util/parse-events.c
> +++ b/tools/perf/util/parse-events.c
> @@ -247,6 +247,8 @@ __add_event(struct list_head *list, int *idx,
> PERF_PMU_FORMAT_VALUE_CONFIG2, "config2");
> perf_pmu__warn_invalid_config(pmu, attr->config3, name,
> PERF_PMU_FORMAT_VALUE_CONFIG3, "config3");
> + perf_pmu__warn_invalid_config(pmu, attr->config4, name,
> + PERF_PMU_FORMAT_VALUE_CONFIG4, "config4");
> }
> if (init_attr)
> event_attr_init(attr);
> @@ -783,6 +785,7 @@ const char *parse_events__term_type_str(enum parse_events__term_type term_type)
> [PARSE_EVENTS__TERM_TYPE_CONFIG1] = "config1",
> [PARSE_EVENTS__TERM_TYPE_CONFIG2] = "config2",
> [PARSE_EVENTS__TERM_TYPE_CONFIG3] = "config3",
> + [PARSE_EVENTS__TERM_TYPE_CONFIG4] = "config4",
> [PARSE_EVENTS__TERM_TYPE_NAME] = "name",
> [PARSE_EVENTS__TERM_TYPE_SAMPLE_PERIOD] = "period",
> [PARSE_EVENTS__TERM_TYPE_SAMPLE_FREQ] = "freq",
> @@ -830,6 +833,7 @@ config_term_avail(enum parse_events__term_type term_type, struct parse_events_er
> case PARSE_EVENTS__TERM_TYPE_CONFIG1:
> case PARSE_EVENTS__TERM_TYPE_CONFIG2:
> case PARSE_EVENTS__TERM_TYPE_CONFIG3:
> + case PARSE_EVENTS__TERM_TYPE_CONFIG4:
> case PARSE_EVENTS__TERM_TYPE_NAME:
> case PARSE_EVENTS__TERM_TYPE_METRIC_ID:
> case PARSE_EVENTS__TERM_TYPE_SAMPLE_PERIOD:
> @@ -898,6 +902,10 @@ do { \
> CHECK_TYPE_VAL(NUM);
> attr->config3 = term->val.num;
> break;
> + case PARSE_EVENTS__TERM_TYPE_CONFIG4:
> + CHECK_TYPE_VAL(NUM);
> + attr->config4 = term->val.num;
> + break;
> case PARSE_EVENTS__TERM_TYPE_SAMPLE_PERIOD:
> CHECK_TYPE_VAL(NUM);
> break;
> @@ -1097,6 +1105,7 @@ static int config_term_tracepoint(struct perf_event_attr *attr,
> case PARSE_EVENTS__TERM_TYPE_CONFIG1:
> case PARSE_EVENTS__TERM_TYPE_CONFIG2:
> case PARSE_EVENTS__TERM_TYPE_CONFIG3:
> + case PARSE_EVENTS__TERM_TYPE_CONFIG4:
> case PARSE_EVENTS__TERM_TYPE_NAME:
> case PARSE_EVENTS__TERM_TYPE_SAMPLE_PERIOD:
> case PARSE_EVENTS__TERM_TYPE_SAMPLE_FREQ:
> @@ -1237,6 +1246,7 @@ do { \
> case PARSE_EVENTS__TERM_TYPE_CONFIG1:
> case PARSE_EVENTS__TERM_TYPE_CONFIG2:
> case PARSE_EVENTS__TERM_TYPE_CONFIG3:
> + case PARSE_EVENTS__TERM_TYPE_CONFIG4:
> case PARSE_EVENTS__TERM_TYPE_NAME:
> case PARSE_EVENTS__TERM_TYPE_METRIC_ID:
> case PARSE_EVENTS__TERM_TYPE_RAW:
> @@ -1274,6 +1284,7 @@ static int get_config_chgs(struct perf_pmu *pmu, struct parse_events_terms *head
> case PARSE_EVENTS__TERM_TYPE_CONFIG1:
> case PARSE_EVENTS__TERM_TYPE_CONFIG2:
> case PARSE_EVENTS__TERM_TYPE_CONFIG3:
> + case PARSE_EVENTS__TERM_TYPE_CONFIG4:
> case PARSE_EVENTS__TERM_TYPE_NAME:
> case PARSE_EVENTS__TERM_TYPE_SAMPLE_PERIOD:
> case PARSE_EVENTS__TERM_TYPE_SAMPLE_FREQ:
> diff --git a/tools/perf/util/parse-events.h b/tools/perf/util/parse-events.h
> index e176a34ab088..6e90c26066d4 100644
> --- a/tools/perf/util/parse-events.h
> +++ b/tools/perf/util/parse-events.h
> @@ -58,6 +58,7 @@ enum parse_events__term_type {
> PARSE_EVENTS__TERM_TYPE_CONFIG1,
> PARSE_EVENTS__TERM_TYPE_CONFIG2,
> PARSE_EVENTS__TERM_TYPE_CONFIG3,
> + PARSE_EVENTS__TERM_TYPE_CONFIG4,
> PARSE_EVENTS__TERM_TYPE_NAME,
> PARSE_EVENTS__TERM_TYPE_SAMPLE_PERIOD,
> PARSE_EVENTS__TERM_TYPE_SAMPLE_FREQ,
> diff --git a/tools/perf/util/parse-events.l b/tools/perf/util/parse-events.l
> index 7ed86e3e34e3..8e2986d55bc4 100644
> --- a/tools/perf/util/parse-events.l
> +++ b/tools/perf/util/parse-events.l
> @@ -317,6 +317,7 @@ config { return term(yyscanner, PARSE_EVENTS__TERM_TYPE_CONFIG); }
> config1 { return term(yyscanner, PARSE_EVENTS__TERM_TYPE_CONFIG1); }
> config2 { return term(yyscanner, PARSE_EVENTS__TERM_TYPE_CONFIG2); }
> config3 { return term(yyscanner, PARSE_EVENTS__TERM_TYPE_CONFIG3); }
> +config4 { return term(yyscanner, PARSE_EVENTS__TERM_TYPE_CONFIG4); }
> name { return term(yyscanner, PARSE_EVENTS__TERM_TYPE_NAME); }
> period { return term(yyscanner, PARSE_EVENTS__TERM_TYPE_SAMPLE_PERIOD); }
> freq { return term(yyscanner, PARSE_EVENTS__TERM_TYPE_SAMPLE_FREQ); }
> diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
> index b7ebac5ab1d1..fc50df65d540 100644
> --- a/tools/perf/util/pmu.c
> +++ b/tools/perf/util/pmu.c
> @@ -1427,6 +1427,10 @@ static int pmu_config_term(const struct perf_pmu *pmu,
> assert(term->type_val == PARSE_EVENTS__TERM_TYPE_NUM);
> pmu_format_value(bits, term->val.num, &attr->config3, zero);
> break;
> + case PARSE_EVENTS__TERM_TYPE_CONFIG4:
> + assert(term->type_val == PARSE_EVENTS__TERM_TYPE_NUM);
> + pmu_format_value(bits, term->val.num, &attr->config4, zero);
> + break;
> case PARSE_EVENTS__TERM_TYPE_USER: /* Not hardcoded. */
> return -EINVAL;
> case PARSE_EVENTS__TERM_TYPE_NAME ... PARSE_EVENTS__TERM_TYPE_HARDWARE:
> @@ -1474,6 +1478,9 @@ static int pmu_config_term(const struct perf_pmu *pmu,
> case PERF_PMU_FORMAT_VALUE_CONFIG3:
> vp = &attr->config3;
> break;
> + case PERF_PMU_FORMAT_VALUE_CONFIG4:
> + vp = &attr->config4;
> + break;
> default:
> return -EINVAL;
> }
> @@ -1787,6 +1794,7 @@ int perf_pmu__for_each_format(struct perf_pmu *pmu, void *state, pmu_format_call
> "config1=0..0xffffffffffffffff",
> "config2=0..0xffffffffffffffff",
> "config3=0..0xffffffffffffffff",
> + "config4=0..0xffffffffffffffff",
> "name=string",
> "period=number",
> "freq=number",
> diff --git a/tools/perf/util/pmu.h b/tools/perf/util/pmu.h
> index b93014cc3670..1ce5377935db 100644
> --- a/tools/perf/util/pmu.h
> +++ b/tools/perf/util/pmu.h
> @@ -22,6 +22,7 @@ enum {
> PERF_PMU_FORMAT_VALUE_CONFIG1,
> PERF_PMU_FORMAT_VALUE_CONFIG2,
> PERF_PMU_FORMAT_VALUE_CONFIG3,
> + PERF_PMU_FORMAT_VALUE_CONFIG4,
> PERF_PMU_FORMAT_VALUE_CONFIG_END,
> };
>
>
> --
> 2.34.1
>
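
For readers following the pmu.c hunks, a simplified sketch of what writing a
term into a configN word amounts to: the bits of the value are scattered into
the positions selected by a format mask. This is a standalone model, not the
perf implementation; ex_format_value() and its uint64_t mask parameter are
hypothetical (the real code works on a bitmap).

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Deposit successive bits of 'value' into the bit positions of '*config'
 * that are set in 'format_mask'. With 'zero' set, positions whose value
 * bit is 0 are cleared; otherwise they are left untouched.
 */
static inline void ex_format_value(uint64_t format_mask, uint64_t value,
				   uint64_t *config, bool zero)
{
	unsigned int fbit, vbit = 0;

	for (fbit = 0; fbit < 64; fbit++) {
		if (!(format_mask & (UINT64_C(1) << fbit)))
			continue;
		if (value & (UINT64_C(1) << vbit++))
			*config |= UINT64_C(1) << fbit;
		else if (zero)
			*config &= ~(UINT64_C(1) << fbit);
	}
}
```

With config4 added, a PMU format entry that names config4 would simply pass
&attr->config4 as the destination word.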
* Re: [PATCH v2 06/11] KVM: arm64: Add trap configs for PMSDSFR_EL1
2025-05-29 16:56 ` Marc Zyngier
@ 2025-06-03 9:50 ` James Clark
2025-06-04 15:31 ` Marc Zyngier
0 siblings, 1 reply; 20+ messages in thread
From: James Clark @ 2025-06-03 9:50 UTC (permalink / raw)
To: Marc Zyngier
Cc: Catalin Marinas, Will Deacon, Mark Rutland, Jonathan Corbet,
Oliver Upton, Joey Gouly, Suzuki K Poulose, Zenghui Yu,
Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Alexander Shishkin, Jiri Olsa, Ian Rogers,
Adrian Hunter, linux-arm-kernel, linux-kernel, linux-perf-users,
linux-doc, kvmarm
On 29/05/2025 5:56 pm, Marc Zyngier wrote:
> On Thu, 29 May 2025 12:30:27 +0100,
> James Clark <james.clark@linaro.org> wrote:
>>
>> SPE data source filtering (SPE_FEAT_FDS) adds a new register
>> PMSDSFR_EL1, add the trap configs for it.
>>
>> Signed-off-by: James Clark <james.clark@linaro.org>
>> ---
>> arch/arm64/kvm/emulate-nested.c | 1 +
>> arch/arm64/kvm/sys_regs.c | 1 +
>> 2 files changed, 2 insertions(+)
>>
>> diff --git a/arch/arm64/kvm/emulate-nested.c b/arch/arm64/kvm/emulate-nested.c
>> index 0fcfcc0478f9..05d3e6b93ae9 100644
>> --- a/arch/arm64/kvm/emulate-nested.c
>> +++ b/arch/arm64/kvm/emulate-nested.c
>> @@ -1169,6 +1169,7 @@ static const struct encoding_to_trap_config encoding_to_cgt[] __initconst = {
>> SR_TRAP(SYS_PMSIRR_EL1, CGT_MDCR_TPMS),
>> SR_TRAP(SYS_PMSLATFR_EL1, CGT_MDCR_TPMS),
>> SR_TRAP(SYS_PMSNEVFR_EL1, CGT_MDCR_TPMS),
>> + SR_TRAP(SYS_PMSDSFR_EL1, CGT_MDCR_TPMS),
>> SR_TRAP(SYS_TRFCR_EL1, CGT_MDCR_TTRF),
>> SR_TRAP(SYS_TRBBASER_EL1, CGT_MDCR_E2TB),
>> SR_TRAP(SYS_TRBLIMITR_EL1, CGT_MDCR_E2TB),
>> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
>> index 5dde9285afc8..9f544ac7b5a6 100644
>> --- a/arch/arm64/kvm/sys_regs.c
>> +++ b/arch/arm64/kvm/sys_regs.c
>> @@ -2956,6 +2956,7 @@ static const struct sys_reg_desc sys_reg_descs[] = {
>> { SYS_DESC(SYS_PMBLIMITR_EL1), undef_access },
>> { SYS_DESC(SYS_PMBPTR_EL1), undef_access },
>> { SYS_DESC(SYS_PMBSR_EL1), undef_access },
>> + { SYS_DESC(SYS_PMSDSFR_EL1), undef_access },
>
> PMSDSFR_EL1 has an offset in the VNCR page (0x858), and must be
> described as such. This is equally true for a bunch of other
> SPE-related registers, so you might as well fix those while you're at
> it.
>
> Thanks,
>
> M.
>
I got a bit stuck on what that would look like for registers that are
only undef, in case there was something that I missed. Do I just
document the offsets?
+++ b/arch/arm64/include/asm/vncr_mapping.h
@@ -87,6 +87,8 @@
#define VNCR_PMSICR_EL1 0x838
#define VNCR_PMSIRR_EL1 0x840
#define VNCR_PMSLATFR_EL1 0x848
+#define VNCR_PMSNEVFR_EL1 0x850
+#define VNCR_PMSDSFR_EL1 0x858
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -596,6 +596,16 @@ enum vcpu_sysreg {
VNCR(ICH_HCR_EL2),
VNCR(ICH_VMCR_EL2),
+ /* SPE Registers */
+ VNCR(PMBLIMITR_EL1),
+ VNCR(PMBPTR_EL1),
+ VNCR(PMBSR_EL1),
+ VNCR(PMSCR_EL1),
+ VNCR(PMSEVFR_EL1),
+ VNCR(PMSICR_EL1),
+ VNCR(PMSIRR_EL1),
+ VNCR(PMSLATFR_EL1),
And then sys_reg_descs[] remain as "{ SYS_DESC(SYS_PMBLIMITR_EL1),
undef_access }," rather than EL2_REG_VNCR() because we don't actually
want to change to bad_vncr_trap()?
There are some other parts about fine-grained traps and res0 bits for
NV, but they all already look to be set up correctly, except
HDFGRTR2_EL2.nPMSDSFR_EL1. That bit is inverted, though, none of the
FGT2 traps are configured currently, and PMSDSFR_EL1 is already trapped
by MDCR_EL2 anyway.
Thanks
James
* Re: [PATCH v2 06/11] KVM: arm64: Add trap configs for PMSDSFR_EL1
2025-06-03 9:50 ` James Clark
@ 2025-06-04 15:31 ` Marc Zyngier
2025-06-05 10:33 ` James Clark
0 siblings, 1 reply; 20+ messages in thread
From: Marc Zyngier @ 2025-06-04 15:31 UTC (permalink / raw)
To: James Clark
Cc: Catalin Marinas, Will Deacon, Mark Rutland, Jonathan Corbet,
Oliver Upton, Joey Gouly, Suzuki K Poulose, Zenghui Yu,
Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Alexander Shishkin, Jiri Olsa, Ian Rogers,
Adrian Hunter, linux-arm-kernel, linux-kernel, linux-perf-users,
linux-doc, kvmarm
On Tue, 03 Jun 2025 10:50:23 +0100,
James Clark <james.clark@linaro.org> wrote:
>
>
>
> On 29/05/2025 5:56 pm, Marc Zyngier wrote:
> > On Thu, 29 May 2025 12:30:27 +0100,
> > James Clark <james.clark@linaro.org> wrote:
> >>
> >> SPE data source filtering (SPE_FEAT_FDS) adds a new register
> >> PMSDSFR_EL1, add the trap configs for it.
> >>
> >> Signed-off-by: James Clark <james.clark@linaro.org>
> >> ---
> >> arch/arm64/kvm/emulate-nested.c | 1 +
> >> arch/arm64/kvm/sys_regs.c | 1 +
> >> 2 files changed, 2 insertions(+)
> >>
> >> diff --git a/arch/arm64/kvm/emulate-nested.c b/arch/arm64/kvm/emulate-nested.c
> >> index 0fcfcc0478f9..05d3e6b93ae9 100644
> >> --- a/arch/arm64/kvm/emulate-nested.c
> >> +++ b/arch/arm64/kvm/emulate-nested.c
> >> @@ -1169,6 +1169,7 @@ static const struct encoding_to_trap_config encoding_to_cgt[] __initconst = {
> >> SR_TRAP(SYS_PMSIRR_EL1, CGT_MDCR_TPMS),
> >> SR_TRAP(SYS_PMSLATFR_EL1, CGT_MDCR_TPMS),
> >> SR_TRAP(SYS_PMSNEVFR_EL1, CGT_MDCR_TPMS),
> >> + SR_TRAP(SYS_PMSDSFR_EL1, CGT_MDCR_TPMS),
> >> SR_TRAP(SYS_TRFCR_EL1, CGT_MDCR_TTRF),
> >> SR_TRAP(SYS_TRBBASER_EL1, CGT_MDCR_E2TB),
> >> SR_TRAP(SYS_TRBLIMITR_EL1, CGT_MDCR_E2TB),
> >> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> >> index 5dde9285afc8..9f544ac7b5a6 100644
> >> --- a/arch/arm64/kvm/sys_regs.c
> >> +++ b/arch/arm64/kvm/sys_regs.c
> >> @@ -2956,6 +2956,7 @@ static const struct sys_reg_desc sys_reg_descs[] = {
> >> { SYS_DESC(SYS_PMBLIMITR_EL1), undef_access },
> >> { SYS_DESC(SYS_PMBPTR_EL1), undef_access },
> >> { SYS_DESC(SYS_PMBSR_EL1), undef_access },
> >> + { SYS_DESC(SYS_PMSDSFR_EL1), undef_access },
> >
> > PMSDSFR_EL1 has an offset in the VNCR page (0x858), and must be
> > described as such. This is equally true for a bunch of other
> > SPE-related registers, so you might as well fix those while you're at
> > it.
> >
> > Thanks,
> >
> > M.
> >
>
> I got a bit stuck with what that would look like with registers that
> are only undef in case there was something that I missed, but do I
> just document the offsets?
>
> +++ b/arch/arm64/include/asm/vncr_mapping.h
> @@ -87,6 +87,8 @@
> #define VNCR_PMSICR_EL1 0x838
> #define VNCR_PMSIRR_EL1 0x840
> #define VNCR_PMSLATFR_EL1 0x848
> +#define VNCR_PMSNEVFR_EL1 0x850
> +#define VNCR_PMSDSFR_EL1 0x858
>
This should be enough.
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -596,6 +596,16 @@ enum vcpu_sysreg {
> VNCR(ICH_HCR_EL2),
> VNCR(ICH_VMCR_EL2),
>
> + /* SPE Registers */
> + VNCR(PMBLIMITR_EL1),
> + VNCR(PMBPTR_EL1),
> + VNCR(PMBSR_EL1),
> + VNCR(PMSCR_EL1),
> + VNCR(PMSEVFR_EL1),
> + VNCR(PMSICR_EL1),
> + VNCR(PMSIRR_EL1),
> + VNCR(PMSLATFR_EL1),
I don't see a point in having those until we actually have SPE support
for guests, if ever, as these will potentially increase the size of
the vcpu sysreg array for no good reason.
> And then sys_reg_descs[] remain as "{ SYS_DESC(SYS_PMBLIMITR_EL1),
> undef_access }," rather than EL2_REG_VNCR() because we don't actually
> want to change to bad_vncr_trap()?
This seems OK for now. We may want to refine this in the future though,
as these registers cannot trap when NV is enabled. Yes, this is a bug
in the architecture.
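
A rough model of why no trap is seen, assuming the usual VNCR behaviour: the
access is turned into a plain 64-bit load or store at the register's offset
in the VNCR page. The offsets are the ones from the vncr_mapping.h hunk
quoted above; the ex_ helpers are hypothetical and only model the page as an
array of 64-bit slots.

```c
#include <stddef.h>
#include <stdint.h>

/* Offsets as given in the quoted vncr_mapping.h hunk. */
#define VNCR_PMSLATFR_EL1	0x848
#define VNCR_PMSNEVFR_EL1	0x850
#define VNCR_PMSDSFR_EL1	0x858

/* Hypothetical: index of the 64-bit slot backing a given offset. */
static inline size_t ex_vncr_index(unsigned int offset)
{
	return offset / sizeof(uint64_t);
}

/* A redirected register read is just a memory load from the page. */
static inline uint64_t ex_vncr_read(const uint64_t *page, unsigned int offset)
{
	return page[ex_vncr_index(offset)];
}

/* A redirected register write is just a memory store to the page. */
static inline void ex_vncr_write(uint64_t *page, unsigned int offset,
				 uint64_t val)
{
	page[ex_vncr_index(offset)] = val;
}
```

Nothing in that path gives the host a chance to inject an UNDEF, which is
why undef_access alone cannot cover these registers under NV.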
> There are some other parts about fine grained traps and res0 bits for
> NV, but they all already look to be setup correctly. Except
> HDFGRTR2_EL2.nPMSDSFR_EL1, but it's inverted, none of the FGT2 traps
> are configured currently and PMSDSFR_EL1 is already trapped by
> MDCR_EL2 anyway.
Can you elaborate on that? We have:
SR_FGT(SYS_PMSDSFR_EL1, HDFGRTR2, nPMSDSFR_EL1, 0),
which seems to match the spec.
We also have full support for FEAT_FGT2 already (even if we have no
support for the stuff they trap).
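
For reference, a sketch of what the trailing 0 is taken to mean here: the bit
value at which the trap is active. An n-prefixed bit traps while it is 0,
which matches the booting.rst requirement in patch 05 that bit 19 be set to
0b1 to leave PMSDSFR_EL1 untrapped. ex_fgt_traps() is a hypothetical helper,
not the KVM code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit position from the booting.rst hunk in patch 05 of this series. */
#define EX_HDFGRTR2_nPMSDSFR_EL1_BIT	19

/*
 * 'trap_value' is the bit value that enables the trap: 0 for n-prefixed
 * (negative polarity) bits, 1 for positive polarity bits.
 */
static inline bool ex_fgt_traps(uint64_t fgt_reg, unsigned int bit,
				unsigned int trap_value)
{
	return ((fgt_reg >> bit) & 1) == trap_value;
}
```

So a register value of zero traps the access, and setting bit 19 lets it
through.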
Thanks,
M.
--
Jazz isn't dead. It just smells funny.
* Re: [PATCH v2 06/11] KVM: arm64: Add trap configs for PMSDSFR_EL1
2025-06-04 15:31 ` Marc Zyngier
@ 2025-06-05 10:33 ` James Clark
0 siblings, 0 replies; 20+ messages in thread
From: James Clark @ 2025-06-05 10:33 UTC (permalink / raw)
To: Marc Zyngier
Cc: Catalin Marinas, Will Deacon, Mark Rutland, Jonathan Corbet,
Oliver Upton, Joey Gouly, Suzuki K Poulose, Zenghui Yu,
Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Alexander Shishkin, Jiri Olsa, Ian Rogers,
Adrian Hunter, linux-arm-kernel, linux-kernel, linux-perf-users,
linux-doc, kvmarm
On 04/06/2025 4:31 pm, Marc Zyngier wrote:
> On Tue, 03 Jun 2025 10:50:23 +0100,
> James Clark <james.clark@linaro.org> wrote:
>>
>>
>>
>> On 29/05/2025 5:56 pm, Marc Zyngier wrote:
>>> On Thu, 29 May 2025 12:30:27 +0100,
>>> James Clark <james.clark@linaro.org> wrote:
>>>>
>>>> SPE data source filtering (SPE_FEAT_FDS) adds a new register
>>>> PMSDSFR_EL1, add the trap configs for it.
>>>>
>>>> Signed-off-by: James Clark <james.clark@linaro.org>
>>>> ---
>>>> arch/arm64/kvm/emulate-nested.c | 1 +
>>>> arch/arm64/kvm/sys_regs.c | 1 +
>>>> 2 files changed, 2 insertions(+)
>>>>
>>>> diff --git a/arch/arm64/kvm/emulate-nested.c b/arch/arm64/kvm/emulate-nested.c
>>>> index 0fcfcc0478f9..05d3e6b93ae9 100644
>>>> --- a/arch/arm64/kvm/emulate-nested.c
>>>> +++ b/arch/arm64/kvm/emulate-nested.c
>>>> @@ -1169,6 +1169,7 @@ static const struct encoding_to_trap_config encoding_to_cgt[] __initconst = {
>>>> SR_TRAP(SYS_PMSIRR_EL1, CGT_MDCR_TPMS),
>>>> SR_TRAP(SYS_PMSLATFR_EL1, CGT_MDCR_TPMS),
>>>> SR_TRAP(SYS_PMSNEVFR_EL1, CGT_MDCR_TPMS),
>>>> + SR_TRAP(SYS_PMSDSFR_EL1, CGT_MDCR_TPMS),
>>>> SR_TRAP(SYS_TRFCR_EL1, CGT_MDCR_TTRF),
>>>> SR_TRAP(SYS_TRBBASER_EL1, CGT_MDCR_E2TB),
>>>> SR_TRAP(SYS_TRBLIMITR_EL1, CGT_MDCR_E2TB),
>>>> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
>>>> index 5dde9285afc8..9f544ac7b5a6 100644
>>>> --- a/arch/arm64/kvm/sys_regs.c
>>>> +++ b/arch/arm64/kvm/sys_regs.c
>>>> @@ -2956,6 +2956,7 @@ static const struct sys_reg_desc sys_reg_descs[] = {
>>>> { SYS_DESC(SYS_PMBLIMITR_EL1), undef_access },
>>>> { SYS_DESC(SYS_PMBPTR_EL1), undef_access },
>>>> { SYS_DESC(SYS_PMBSR_EL1), undef_access },
>>>> + { SYS_DESC(SYS_PMSDSFR_EL1), undef_access },
>>>
>>> PMSDSFR_EL1 has an offset in the VNCR page (0x858), and must be
>>> described as such. This is equally true for a bunch of other
>>> SPE-related registers, so you might as well fix those while you're at
>>> it.
>>>
>>> Thanks,
>>>
>>> M.
>>>
>>
>> I got a bit stuck with what that would look like with registers that
>> are only undef in case there was something that I missed, but do I
>> just document the offsets?
>>
>> +++ b/arch/arm64/include/asm/vncr_mapping.h
>> @@ -87,6 +87,8 @@
>> #define VNCR_PMSICR_EL1 0x838
>> #define VNCR_PMSIRR_EL1 0x840
>> #define VNCR_PMSLATFR_EL1 0x848
>> +#define VNCR_PMSNEVFR_EL1 0x850
>> +#define VNCR_PMSDSFR_EL1 0x858
>>
>
> This should be enough.
>
Thanks, I'll resend with these added.
>> +++ b/arch/arm64/include/asm/kvm_host.h
>> @@ -596,6 +596,16 @@ enum vcpu_sysreg {
>> VNCR(ICH_HCR_EL2),
>> VNCR(ICH_VMCR_EL2),
>>
>> + /* SPE Registers */
>> + VNCR(PMBLIMITR_EL1),
>> + VNCR(PMBPTR_EL1),
>> + VNCR(PMBSR_EL1),
>> + VNCR(PMSCR_EL1),
>> + VNCR(PMSEVFR_EL1),
>> + VNCR(PMSICR_EL1),
>> + VNCR(PMSIRR_EL1),
>> + VNCR(PMSLATFR_EL1),
>
> I don't see a point in having those until we actually have SPE support
> for guests, if ever, as these will potentially increase the size of
> the vcpu sysreg array for no good reason.
>
>> And then sys_reg_descs[] remain as "{ SYS_DESC(SYS_PMBLIMITR_EL1),
>> undef_access }," rather than EL2_REG_VNCR() because we don't actually
>> want to change to bad_vncr_trap()?
>
> This seems OK for now. We may want to refine this in the future though,
> as these registers cannot trap when NV is enabled. Yes, this is a bug
> in the architecture.
>
>> There are some other parts about fine grained traps and res0 bits for
>> NV, but they all already look to be setup correctly. Except
>> HDFGRTR2_EL2.nPMSDSFR_EL1, but it's inverted, none of the FGT2 traps
>> are configured currently and PMSDSFR_EL1 is already trapped by
>> MDCR_EL2 anyway.
>
> Can you elaborate on that? We have:
>
> SR_FGT(SYS_PMSDSFR_EL1, HDFGRTR2, nPMSDSFR_EL1, 0),
>
> which seems to match the spec.
>
> We also have full support for FEAT_FGT2 already (even if we have no
> support for the stuff they trap).
Oh I think that was misleading, the version I was poking around on
didn't have the FEAT_FGT2 stuff applied yet but I see it now. And yes,
as you say, what's there matches the spec.
>
> Thanks,
>
> M.
>
end of thread, other threads:[~2025-06-05 10:35 UTC | newest]
Thread overview: 20+ messages
2025-05-29 11:30 [PATCH v2 00/11] perf: arm_spe: Armv8.8 SPE features James Clark
2025-05-29 11:30 ` [PATCH v2 01/11] arm64: sysreg: Update PMSIDR_EL1 description James Clark
2025-05-29 11:30 ` [PATCH v2 02/11] arm64: sysreg: Add new PMSFCR_EL1 fields and PMSDSFR_EL1 register James Clark
2025-05-29 11:30 ` [PATCH v2 03/11] perf: arm_spe: Support FEAT_SPEv1p4 filters James Clark
2025-05-29 11:30 ` [PATCH v2 04/11] perf: arm_spe: Add support for FEAT_SPE_EFT extended filtering James Clark
2025-05-29 11:30 ` [PATCH v2 05/11] arm64/boot: Enable EL2 requirements for SPE_FEAT_FDS James Clark
2025-05-29 16:57 ` Marc Zyngier
2025-05-29 11:30 ` [PATCH v2 06/11] KVM: arm64: Add trap configs for PMSDSFR_EL1 James Clark
2025-05-29 16:56 ` Marc Zyngier
2025-06-03 9:50 ` James Clark
2025-06-04 15:31 ` Marc Zyngier
2025-06-05 10:33 ` James Clark
2025-05-29 11:30 ` [PATCH v2 07/11] perf: Add perf_event_attr::config4 James Clark
2025-05-29 11:30 ` [PATCH v2 08/11] perf: arm_spe: Add support for filtering on data source James Clark
2025-05-29 11:30 ` [PATCH v2 09/11] tools headers UAPI: Sync linux/perf_event.h with the kernel sources James Clark
2025-05-29 11:30 ` [PATCH v2 10/11] perf tools: Add support for perf_event_attr::config4 James Clark
2025-05-29 17:25 ` Ian Rogers
2025-05-29 11:30 ` [PATCH v2 11/11] perf docs: arm-spe: Document new SPE filtering features James Clark
2025-05-29 16:43 ` Leo Yan
2025-05-29 16:48 ` [PATCH v2 00/11] perf: arm_spe: Armv8.8 SPE features Leo Yan